Français Anglais
Accueil Annuaire Plan du site
Home > Research results > Dissertations & habilitations
Research results
Ph.D de

Ph.D
Group : Large-scale Heterogeneous DAta and Knowledge

Performance and optimization in peer to peer data management

Starts on 01/09/2009
Advisor : MANOLESCU-GOUJOT, Ioana

Funding : A
Affiliation : Université Paris-Saclay
Laboratory : LRI INRIA SACLAY

Defended on 05/07/2013, committee :
Alain Denise, Professeur, Université Paris Sud (examinateur)
Yanlei Diao, Professeur, University of Massachusetts Amherst (rapporteur)
Ioana Manolescu, Directeur de Recherche, INRIA Saclay & Université Paris-Sud (directrice de thèse)
Philippe Rigaux, Professeur, Conservatoire National des Arts et Métiers (rapporteur)
Patrick Valduriez, Directeur de Recherche, INRIA Sophia Antipolis – Méditerranée (examinateur)
Vasilis Vassalos, Professeur, Athens University of Economics and Business (examinateur)

Research activities :

Abstract :
XML was recommended by W3C in 1998 as a markup language to be used by device- and system-independent methods of representing information. XML is nowadays used as a data model for storing and querying large volumes of data in database systems. In spite of significant research and systems development, many performance problems are raised by processing very large amounts of XML data.

Materialized views have long been used in databases to speed up queries. Materialized views can be seen as precomputed query results that can be re-used to evaluate (part of) another query, and have been a topic of intensive research, in particular in the context of relational data warehousing.

This thesis investigates the applicability of materialized views techniques to optimize the performance of Web data management tools, in particular in distributed settings, considering XML data and queries. We make three contributions.

We first consider the problem of choosing the best views to materialize within a given space budget in order to improve the performance of a query workload. Our work is the first to address the view selection problem for a rich subset of XQuery. The challenges we face stem from the expressive power and features of both the query and view languages and from the size of the search space of candidate views to materialize. While the general problem has prohibitive complexity, we propose and study a heuristic algorithm and demonstrate its superior performance compared to the state of the art.

Second, we consider the management of large XML corpora in peer-to-peer networks, based on distributed hash tables (or DHTs, in short). We consider a platform leveraging distributed materialized XML views, defined by arbitrary XML queries, filled in with data published anywhere in the network, and exploited to efficiently answer queries issued by any network peer. This thesis has contributed important scalability oriented optimizations, as well as a comprehensive set of experiments deployed in a country-wide WAN. These experiments outgrow by orders of magnitude similar competitor systems in terms of data volumes and data dissemination throughput. Thus, they are the most advanced in understanding the performance behavior of DHT-based XML content management in real settings.

Finally, we present a novel approach for scalable content-based publish/subscribe (pub/sub, in short) in the presence of constraints on the available computational resources of data publishers. We achieve scalability by off-loading subscriptions from the publisher, and leveraging view-based query rewriting to feed these subscriptions from the data accumulated in others. Our main contribution is a novel algorithm for organizing subscriptions in a multi-level dissemination network in order to serve large numbers of subscriptions, respect capacity constraints, and minimize latency. The efficiency and effectiveness of our algorithm are confirmed through extensive experiments and a large deployment in a WAN.

Ph.D. dissertations & Faculty habilitations
CAUSAL LEARNING FOR DIAGNOSTIC SUPPORT


CAUSAL UNCERTAINTY QUANTIFICATION UNDER PARTIAL KNOWLEDGE AND LOW DATA REGIMES


MICRO VISUALIZATIONS: DESIGN AND ANALYSIS OF VISUALIZATIONS FOR SMALL DISPLAY SPACES
The topic of this habilitation is the study of very small data visualizations, micro visualizations, in display contexts that can only dedicate minimal rendering space for data representations. For several years, together with my collaborators, I have been studying human perception, interaction, and analysis with micro visualizations in multiple contexts. In this document I bring together three of my research streams related to micro visualizations: data glyphs, where my joint research focused on studying the perception of small-multiple micro visualizations, word-scale visualizations, where my joint research focused on small visualizations embedded in text-documents, and small mobile data visualizations for smartwatches or fitness trackers. I consider these types of small visualizations together under the umbrella term ``micro visualizations.'' Micro visualizations are useful in multiple visualization contexts and I have been working towards a better understanding of the complexities involved in designing and using micro visualizations. Here, I define the term micro visualization, summarize my own and other past research and design guidelines and outline several design spaces for different types of micro visualizations based on some of the work I was involved in since my PhD.