Français Anglais
Accueil Annuaire Plan du site
Home > Research results > Dissertations & habilitations
Research results
Ph.D de

Ph.D
Group : Artificial Intelligence and Inference Systems

View-Based Techniques for the Efficient Management of Web Data

Starts on 01/01/2009
Advisor : MANOLESCU-GOUJOT, Ioana

Funding : CDD sur contrat INRIA
Affiliation : Université Paris-Saclay
Laboratory : LRI INRIA LEO

Defended on 29/06/2012, committee :
Serge Abiteboul, Directeur de recherche, Inria Saclay and ENS Cachan (examinateur)
Alin Deutsch, Associate Professor, University of California, San Diego (rapporteur)
Christine Froidevaux, Professeur, Université Paris-Sud and Inria Saclay (examinateur)
François Goasdoué, Maître de conférences, Université Paris Sud and Inria Saclay (co-directeur de thèse)
Ioana Manolescu, Directeur de recherche, Inria Saclay and Université Paris Sud (directeur de thèse)
Philippe Rigaux, Professeur, Conservatoire National des Arts et Métiers (examinateur)
Marie-Christine Rousset, Professeur, Université de Grenoble (examinateur)
Gerhard Weikum, Directeur de recherche, Max-Planck-Institüt für Informatik (rapporteur)

Research activities :

Abstract :
Data is being published in digital formats at very high rates nowadays. A large share of this data has complex structure, typically organized as trees (Web documents such as HTML and XML being the most representative) or graphs (in particular, graph-structured Semantic Web databases, expressed in RDF). There is great interest in exploiting such complex data, whether in an Open Data access model or within companies owning it, and efficiently doing so for large data volumes remains challenging.
Materialized views have long been used to obtain significant performance improvements when processing queries. The principle is that a view stores pre-computed results that can be used to evaluate (possibly part of) a query. Adapting materialized view techniques to the Web data setting we consider is particularly challenging due to the structural and semantic complexity of the data. This thesis tackles two problems in the broad context of materialized view-based management of Web data.
First, we focus on the problem of view selection for RDF query workloads. We present a novel algorithm, which, based on a query workload, proposes the most appropriate views to be materialized in the database, in order to minimize the combined cost of query evaluation, view maintenance and view storage. Although RDF query workloads typically feature many joins, hampering the view selection process, our algorithm scales to hundreds of queries, a number unattained by existing approaches. Furthermore, we propose new techniques to account for the implicit data that can be derived by the RDF Schemas and which further complicate the view selection process.
The second contribution of our work concerns query rewriting based on materialized XML views. We start by identifying an expressive dialect of XQuery, corresponding to tree patterns with value joins, and study some important properties for these queries, such as containment and minimization. Based on these notions, we consider the problem of finding minimal equivalent rewritings of a query expressed in this dialect, using materialized views expressed in the same dialect, and provide a sound and complete algorithm for that purpose. Our work extends the state of the art by allowing each pattern node to return a set of attributes, supporting value joins in the patterns, and considering rewritings which combine many views. Finally, we show how our view-based query rewriting algorithm can be applied in a distributed setting, in order to efficiently disseminate corpora of XML documents carrying RDF annotations.

Ph.D. dissertations & Faculty habilitations
CAUSAL LEARNING FOR DIAGNOSTIC SUPPORT


CAUSAL UNCERTAINTY QUANTIFICATION UNDER PARTIAL KNOWLEDGE AND LOW DATA REGIMES


MICRO VISUALIZATIONS: DESIGN AND ANALYSIS OF VISUALIZATIONS FOR SMALL DISPLAY SPACES
The topic of this habilitation is the study of very small data visualizations, micro visualizations, in display contexts that can only dedicate minimal rendering space for data representations. For several years, together with my collaborators, I have been studying human perception, interaction, and analysis with micro visualizations in multiple contexts. In this document I bring together three of my research streams related to micro visualizations: data glyphs, where my joint research focused on studying the perception of small-multiple micro visualizations, word-scale visualizations, where my joint research focused on small visualizations embedded in text-documents, and small mobile data visualizations for smartwatches or fitness trackers. I consider these types of small visualizations together under the umbrella term ``micro visualizations.'' Micro visualizations are useful in multiple visualization contexts and I have been working towards a better understanding of the complexities involved in designing and using micro visualizations. Here, I define the term micro visualization, summarize my own and other past research and design guidelines and outline several design spaces for different types of micro visualizations based on some of the work I was involved in since my PhD.