Research results
Ph.D of

Ph.D
Group : Bioinformatics

Designing scientific workflows following a structure and provenance-aware strategy

Started on 15/12/2011
Advisor : FROIDEVAUX, Christine
Co-advisor : COHEN-BOULAKIA, Sarah

Funding : ETR-BGF
Affiliation : Université Paris-Saclay
Laboratory : LRI-Bioinfo

Defended on 11/10/2013, committee :
Prof. Christine Froidevaux, Université Paris Sud (thesis advisor)
Dr. Sarah Cohen-Boulakia, Université Paris Sud (co-advisor)
Prof. Mohand-Said Hacid, Université Lyon 1 (reviewer)
Prof. Thérèse Libourel, Université Montpellier II (reviewer)
Prof. Daniela Grigori, Université Paris Dauphine (examiner)
Prof. Chantal Reynaud, Université Paris Sud (examiner)


Abstract :
Scientific workflow systems are equipped with provenance modules that collect the data produced and consumed during workflow runs in order to enhance reproducibility. For several reasons, the complexity of workflow structures and workflow execution structures is increasing over time, which clearly hinders the reuse of scientific workflows.
The overall aim of this thesis is to enhance workflow reuse by providing strategies that reduce the complexity of workflow structures while preserving provenance. Two strategies are introduced.
First, we propose an approach to rewrite any scientific workflow (represented as a directed acyclic graph, or DAG) into a series-parallel (SP) structure while preserving provenance. SP structures make it possible to design polynomial-time algorithms for complex workflow operations (e.g., comparing workflows), whereas on general DAG structures the same operations are related to an NP-hard problem. To this end, we introduce SPFlow, a provenance-preserving rewriting algorithm.
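To make the notion of an SP structure concrete, here is a minimal Python sketch that tests whether a workflow DAG with a single source and a single sink is already series-parallel. It is not SPFlow (which rewrites non-SP workflows); it only uses the classic reduction characterization: a two-terminal DAG is SP exactly when repeated parallel reductions (merging duplicate edges) and series reductions (bypassing an internal vertex with one incoming and one outgoing edge) shrink it to the single edge from source to sink. The function name `is_series_parallel` and the edge-list input format are assumptions made for illustration.

```python
from collections import Counter


def is_series_parallel(edges, source, sink):
    """Check whether a two-terminal DAG is series-parallel (illustrative sketch).

    `edges` is a list of (u, v) pairs (hypothetical input format); `source`
    and `sink` are the two terminals of the workflow graph.
    """
    # Multiset of directed edges: reductions may temporarily create parallel edges.
    multi = Counter(edges)
    changed = True
    while changed:
        changed = False
        # Parallel reduction: collapse duplicate edges u -> v into one.
        for e, k in list(multi.items()):
            if k > 1:
                multi[e] = 1
                changed = True
        # Recompute degrees on the current (parallel-free) edge set.
        indeg, outdeg = Counter(), Counter()
        for (u, v), k in multi.items():
            outdeg[u] += k
            indeg[v] += k
        # Series reduction: replace u -> v -> w by u -> w for an internal v.
        for v in set(indeg) | set(outdeg):
            if v in (source, sink) or indeg[v] != 1 or outdeg[v] != 1:
                continue
            (u,) = [a for (a, b) in multi if b == v]
            (w,) = [b for (a, b) in multi if a == v]
            del multi[(u, v)]
            del multi[(v, w)]
            multi[(u, w)] += 1
            changed = True
            break  # degrees are now stale; recompute on the next pass
    # SP iff everything collapsed into the single edge source -> sink.
    return dict(multi) == {(source, sink): 1}
```

For example, the "diamond" workflow s→a→t, s→b→t reduces to a single edge and is SP, whereas adding a cross edge a→b yields the smallest non-SP pattern, on which no reduction applies. The loop favors clarity over speed; linear-time recognition algorithms for SP DAGs are known.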
Second, we provide a methodology and a technique to reduce the redundancy present in workflows by detecting and removing the "anti-patterns" responsible for it. The DistillFlow algorithm transforms a workflow into a distilled, semantically equivalent workflow that is fully or partially free of anti-patterns and has a more concise and simpler structure.
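The abstract does not list DistillFlow's anti-patterns, so the following Python sketch only illustrates the general idea of removing redundancy under an assumed definition: two tasks are treated as redundant when they apply the same deterministic, side-effect-free operation to the same inputs. This is a common-subexpression-style merge, not necessarily one of DistillFlow's anti-patterns. The function `merge_duplicate_tasks` and the `ops`/`inputs` representation are hypothetical.

```python
from graphlib import TopologicalSorter


def merge_duplicate_tasks(ops, inputs):
    """Merge tasks that apply the same operation to the same inputs (sketch).

    ops:    task id -> operation name (hypothetical representation)
    inputs: task id -> tuple of upstream task ids, in argument order
    Assumes operations are deterministic, so merged tasks give equal outputs.
    Returns (new_ops, new_inputs, rename), where rename maps every original
    task id to the task that replaces it in the distilled workflow.
    """
    # Visit tasks so that every task comes after all of its upstream tasks.
    graph = {t: set(inputs.get(t, ())) for t in ops}
    order = TopologicalSorter(graph).static_order()
    rename, seen = {}, {}
    new_ops, new_inputs = {}, {}
    for t in order:
        # Route inputs through earlier merges so redundancy cascades downstream.
        args = tuple(rename[p] for p in inputs.get(t, ()))
        key = (ops[t], args)
        if key in seen:
            rename[t] = seen[key]  # redundant copy: reuse the surviving task
        else:
            seen[key] = t
            rename[t] = t
            new_ops[t] = ops[t]
            new_inputs[t] = args
    return new_ops, new_inputs, rename


# Usage: two identical align -> report branches collapse into one.
ops = {"fetch": "download", "a1": "align", "a2": "align",
       "r1": "report", "r2": "report"}
inputs = {"a1": ("fetch",), "a2": ("fetch",), "r1": ("a1",), "r2": ("a2",)}
new_ops, new_inputs, rename = merge_duplicate_tasks(ops, inputs)
```

Processing tasks in topological order is what lets a single pass catch cascading redundancy: once the two `align` tasks are merged, the two `report` tasks see identical inputs and are merged too. Which copy survives depends on the traversal order, which does not affect the resulting structure.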
Both approaches (SPFlow and DistillFlow) rely on a provenance model that we introduced to represent the provenance structure of workflow executions. Our tools are available at https://www.lri.fr/~chenj. They have been systematically tested on large collections of real workflows, in particular workflows from the Taverna system.

Ph.D. dissertations & Faculty habilitations
CAUSAL LEARNING FOR DIAGNOSTIC SUPPORT


CAUSAL UNCERTAINTY QUANTIFICATION UNDER PARTIAL KNOWLEDGE AND LOW DATA REGIMES


MICRO VISUALIZATIONS: DESIGN AND ANALYSIS OF VISUALIZATIONS FOR SMALL DISPLAY SPACES
The topic of this habilitation is the study of very small data visualizations, micro visualizations, in display contexts that can dedicate only minimal rendering space to data representations. For several years, together with my collaborators, I have been studying human perception, interaction, and analysis with micro visualizations in multiple contexts. In this document I bring together three of my research streams related to micro visualizations: data glyphs, where my joint research focused on the perception of small-multiple micro visualizations; word-scale visualizations, where my joint research focused on small visualizations embedded in text documents; and small mobile data visualizations for smartwatches or fitness trackers. I consider these types of small visualizations together under the umbrella term "micro visualizations." Micro visualizations are useful in multiple visualization contexts, and I have been working towards a better understanding of the complexities involved in designing and using them. Here, I define the term micro visualization, summarize past research and design guidelines (both my own and those of others), and outline several design spaces for different types of micro visualizations based on some of the work I have been involved in since my PhD.