The data-centric languages and systems thematic aims at designing and developping programming langages as well as systems that seriously take into account massive data. The purpose is to build robust and efficient platforms on well founded theoretical grounds. Internet explosion and the ever growing importance of data in applications as well as the recent emergence of cloud computing, has given birth to a whirlwind of new data models (XML, JSON) and languages (XPath, XQuery, Pig, Jaql...). Whether they are developed under the banner of NoSQL (which stands for Not Only SQL), for BigData Analytics, for Cloud computing or as domain specific languages (DSL) embedded in a host language, most of them share a common subset of SQL and/or the ability to handle semistructured data. Such languages can greatly benefit from formal uniform foundations, and we argue that such foundations should account for novel features critical to various application domains. Also, most of those languages provide limited type checking, or ignore it altogether. We believe type checking is essential for many applications, with usage ranging from error detection to optimization.
Regarding the language topic, we are currently interested in the XML format, this led us to the design and implementation of CDuce an XML centric programming language .
The Datacert thematic is highly interested in the certification of data intensive systems with the Coq proof assistant. As data, in all forms, is increasingly large in volume, as a result of computers capturing more and more of the companies activity, the scientists' work and also as a result of better and more powerful automatic data gathering tools (e.g. space telescopes, focused crawlers, archived experimental data mandatory in some types of government-funded research programs). The availability and reliability of such large data volumes is a gold mine for companies, scientists and simply citizens. It is then of crucial importance to protect data integrity and reliability by means of robust methods and tools. Program verification and certification have been intensively studied in the last decades yielding very impressive results and highly reliable software (e.g., the Compcert project). However, and surprisingly, while the amount of data stored and managed by data engines has drastically increased, little attention has been devoted to ensure that such (complex) systems are indeed reliable. The aim of this research track is to certify data intensive systems such as relational database systems, XQuery and semi structured engines using formal tools such as SMT provers (e.g., Alt-ergo) embedded in the Why(3) platform as well as interactive theorem provers (e.g., Coq).
Groups
Toccata
Joint Inria project teams
Research highlights
Semantic subtyping: dealing set theoretically with function union intersection and negation types Static and Dynamic Semantics of NoSQL Languages, accepted at ACM POPL 2013 Optimizing XML querying using type-based document projection
Contracts & grants
Typex
Software & patents
CDuce
Collaborations
Members
Ph.D. dissertations & Faculty habilitations