Ph.D
Group : Large-scale Heterogeneous DAta and Knowledge
Efficient processing of SPARQL queries with OLAP extensions for RDF warehouses (original title: Traitement efficace de requêtes SPARQL avec extensions OLAP pour entrepôts RDF)
Started on 01/09/2011
Advisor : MANOLESCU-GOUJOT, Ioana
[GOASDOUE François]
Funding :
Affiliation : Université Paris-Saclay
Laboratory : LRI
Defended on 22/09/2014, committee :
Thesis advisor :
- Ms. Ioana Manolescu, Research Director, Inria and Université Paris-Sud
Co-advisor :
- Mr. François Goasdoué, Professor, Université Rennes 1
Reviewers :
- Mr. Alon Halevy, Professor, Google Research
- Mr. Frank van Harmelen, Professor, Vrije Universiteit Amsterdam
Examiners :
- Mr. Serge Abiteboul, Research Director, Inria and ENS Cachan
- Ms. Christine Froidevaux, Professor, Université Paris-Sud
- Mr. Philippe Rigaux, Professor, Conservatoire National des Arts et Métiers
Research activities :
Abstract :
The utility and relevance of data lie in the information that can be extracted from it. The high rate of data publication and its increased complexity, for instance the heterogeneous, self-describing Semantic Web data, motivate the interest in efficient techniques for data manipulation. In this thesis we leverage mature relational data management technology for querying Semantic Web data.
The first part focuses on query answering over data subject to RDFS constraints, stored in relational data management systems. The implicit information resulting from RDF reasoning is required to correctly answer such queries. We introduce the database fragment of RDF, going beyond the expressive power of previously studied fragments. We devise novel techniques for answering Basic Graph Pattern queries within this fragment, exploring the two established approaches for handling RDF semantics, namely graph saturation and query reformulation.
In particular, we consider graph updates within each approach and propose a method for incrementally maintaining the saturation. We experimentally study the performance trade-offs of our techniques, which can be deployed on top of any relational data management engine.
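The contrast between graph saturation and query reformulation can be illustrated with a toy sketch. It is not the thesis's implementation: it assumes triples are plain Python tuples, handles only rdfs:subClassOf constraints, and answers a single-triple pattern of the form "?x rdf:type C". All names (saturate, reformulate, the ex: resources) are hypothetical.

```python
# Toy illustration only: triples are (subject, property, object) tuples,
# and rdfs:subClassOf is the only constraint considered.
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"

def saturate(graph):
    """Add implicit rdf:type and rdfs:subClassOf triples until a fixpoint."""
    saturated = set(graph)
    while True:
        sub = {(s, o) for (s, p, o) in saturated if p == SUBCLASS}
        new = set()
        for (c1, c2) in sub:
            for (c3, c4) in sub:
                if c2 == c3:  # subClassOf is transitive
                    new.add((c1, SUBCLASS, c4))
        for (s, p, o) in saturated:
            if p == TYPE:
                for (c1, c2) in sub:
                    if o == c1:  # instances propagate to superclasses
                        new.add((s, TYPE, c2))
        if new <= saturated:
            return saturated
        saturated |= new

def instances_of(graph, cls):
    """Evaluate the pattern '?x rdf:type cls' on explicit triples only."""
    return {s for (s, p, o) in graph if p == TYPE and o == cls}

def reformulate(graph, cls):
    """Return cls plus all its direct and indirect subclasses."""
    classes, frontier = {cls}, [cls]
    while frontier:
        current = frontier.pop()
        for (s, p, o) in graph:
            if p == SUBCLASS and o == current and s not in classes:
                classes.add(s)
                frontier.append(s)
    return classes

def answer_by_reformulation(graph, cls):
    """Union the answers of the reformulated patterns on the original graph."""
    answers = set()
    for c in reformulate(graph, cls):
        answers |= instances_of(graph, c)
    return answers

graph = {
    ("ex:PhDStudent", SUBCLASS, "ex:Student"),
    ("ex:Student", SUBCLASS, "ex:Person"),
    ("ex:alice", TYPE, "ex:PhDStudent"),
    ("ex:bob", TYPE, "ex:Person"),
}

by_saturation = instances_of(saturate(graph), "ex:Person")
by_reformulation = answer_by_reformulation(graph, "ex:Person")
```

Both strategies return ex:alice and ex:bob. Saturation pays the reasoning cost when the graph is loaded or updated, whereas reformulation leaves the graph untouched and pays it at query time.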
The second part of this thesis considers the new requirements for data analytics tools and methods emerging from the development of the Semantic Web. We fully redesign, from the bottom up, core data analytics concepts and tools in the context of RDF data. We propose the first complete formal framework for warehouse-style RDF analytics. Notably, we define analytical schemas tailored to heterogeneous, semantic-rich RDF graphs, and analytical queries that, going beyond relational cubes, allow flexible querying of the data and the schema as well as powerful aggregation and OLAP-style operations. Experiments on a fully implemented platform demonstrate the practical interest of our approach.
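As a rough intuition for warehouse-style aggregation over RDF, the sketch below groups resources by one property (a dimension) and aggregates the values of another (a measure). It is only an illustrative assumption about what such a query might look like, not the formal framework of the thesis; the cube function, the ex: properties, and the sample data are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data: RDF triples as (subject, property, object) tuples.
triples = [
    ("ex:alice", "ex:livesIn", "Paris"),
    ("ex:alice", "ex:wrote", "ex:post1"),
    ("ex:alice", "ex:wrote", "ex:post2"),
    ("ex:bob", "ex:livesIn", "Rennes"),
    ("ex:bob", "ex:wrote", "ex:post3"),
    ("ex:carol", "ex:wrote", "ex:post4"),  # no city: RDF data is heterogeneous
]

def values(triples, prop):
    """Map each subject to the set of objects it has for property `prop`."""
    result = defaultdict(set)
    for s, p, o in triples:
        if p == prop:
            result[s].add(o)
    return result

def cube(triples, dimension, measure, aggregate):
    """Group subjects by their `dimension` values and aggregate their `measure` values."""
    dims = values(triples, dimension)
    meas = values(triples, measure)
    groups = defaultdict(set)
    for fact, dim_values in dims.items():
        for d in dim_values:  # a subject may have several dimension values
            groups[d] |= meas.get(fact, set())
    return {d: aggregate(m) for d, m in groups.items()}

posts_per_city = cube(triples, "ex:livesIn", "ex:wrote", len)
```

Here posts_per_city maps Paris to 2 and Rennes to 1. The resource ex:carol has no ex:livesIn value and therefore falls in no group, which shows how missing or multi-valued properties, common in RDF, affect aggregation in ways a fixed relational schema would rule out.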