Ph.D
Group : Artificial Intelligence and Inference Systems
Recherche ciblée de documents sur le Web (Targeted Search for Documents on the Web)
Starts on 01/10/2001
Advisor : ROUSSET, Marie-Christine
Funding : A
Affiliation : Université Paris-Saclay
Laboratory : Orsay
Defended on 08/06/2005, committee :
AMANN Bernd
HACID Mohand-Saïd
KODRATOFF Yves
LEGER Alain
ROUSSET Marie-Christine
Research activities :
- Artificial Intelligence
- Semantic Web
Abstract :
Our work combines a search engine such as Google, or a Web crawler such as that of Xyleme, with a filtering tool that distinguishes, among the possibly thousands of Web pages returned, those that actually contain useful data for the data warehouse. The first e.dot experiments showed that guiding the Web search by keywords extracted from the domain ontology was not precise enough to guarantee that the returned Web pages were relevant to the topic of the warehouse. Our approach to designing the filtering tool is generic and declarative. We have defined and implemented a query language, called WebQueL, which enables different criteria to be combined to specify the Web pages of interest. These criteria can bear on both the content and the structure of the searched documents.
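To give a rough idea of what combining content and structure criteria in a declarative filter can look like, here is a minimal Python sketch. It is not WebQueL and does not reproduce its syntax or semantics, which this abstract does not describe. The names `contains_text`, `has_tag`, `all_of`, `any_of` and `filter_pages`, as well as the sample pages, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Callable


class _PageScanner(HTMLParser):
    """Collects the text chunks and a count of each start tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.text_parts: list[str] = []
        self.tag_counts: dict[str, int] = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names already lowercased.
        self.tag_counts[tag] = self.tag_counts.get(tag, 0) + 1

    def handle_data(self, data):
        self.text_parts.append(data)


@dataclass
class Page:
    text: str                   # lowercased text content of the page
    tag_counts: dict[str, int]  # number of occurrences of each tag


def parse_page(html: str) -> Page:
    scanner = _PageScanner()
    scanner.feed(html)
    scanner.close()
    return Page(" ".join(scanner.text_parts).lower(), scanner.tag_counts)


# A criterion is any predicate over a parsed page (hypothetical design).
Criterion = Callable[[Page], bool]


def contains_text(fragment: str) -> Criterion:
    """Content criterion: the page text contains the given fragment."""
    return lambda page: fragment.lower() in page.text


def has_tag(tag: str, at_least: int = 1) -> Criterion:
    """Structure criterion: the page has at least `at_least` such tags."""
    return lambda page: page.tag_counts.get(tag.lower(), 0) >= at_least


def all_of(*criteria: Criterion) -> Criterion:
    return lambda page: all(c(page) for c in criteria)


def any_of(*criteria: Criterion) -> Criterion:
    return lambda page: any(c(page) for c in criteria)


def filter_pages(pages_html: list[str], criterion: Criterion) -> list[str]:
    """Keeps only the pages whose parsed form satisfies the criterion."""
    return [html for html in pages_html if criterion(parse_page(html))]


# Made-up sample pages: both mention the keyword, only one has a table.
SAMPLE_PAGES = [
    "<html><body><h1>Measurements</h1>"
    "<table><tr><td>growth rate</td></tr></table></body></html>",
    "<html><body><p>A blog post that mentions growth rate.</p></body></html>",
]

# Keyword match alone would keep both pages; adding a structural
# requirement (the page must contain a table) keeps only the first.
wanted = all_of(contains_text("growth rate"), has_tag("table"))
selected = filter_pages(SAMPLE_PAGES, wanted)
```

In this sketch the keyword criterion alone accepts both sample pages, which mirrors the imprecision of keyword-guided search noted above, while the combined criterion keeps only the page that also has the expected structure.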