Ph.D
Group : Learning and Optimization

Contributions to Simulation-based High-dimensional Sequential Decision Making

Started on 01/11/2009
Advisor : TEYTAUD, Olivier

Funding :
Affiliation : Université Paris-Saclay
Laboratory : LRI INRIA

Defended on 14/04/2013, committee :
BOUZY Bruno (Reviewer) - MdC, HDR, Université Paris-Descartes (LIPADE)
CAZENAVE Tristan (Examiner) - Professor, Université Paris-Dauphine (LAMSADE)
DONCIEUX Stéphane (Examiner) - Professor, ISIR, Université Pierre & Marie Curie
DUTECH Alain (Reviewer) - CR, HDR, Inria (Loria)
MARTIN Jean-Claude (Examiner) - Professor, LIMSI, Université Paris-Sud 11
TEYTAUD Olivier - CR1, HDR, Inria (LRI), Université Paris-Sud 11

Research activities :

Abstract :
My thesis is entitled "Contributions to Simulation-based High-dimensional Sequential Decision Making". It addresses games, planning, and Markov Decision Processes.
An agent interacts with its environment by making successive decisions. The interaction runs from an initial state to a final state, in which the agent can no longer make any decision. At each time step, the agent receives an observation of the state of the environment. From this observation and its knowledge, the agent makes a decision that modifies the state of the environment. The agent then receives a reward and a new observation. The goal is to maximize the sum of rewards obtained during a simulation from an initial state to a final state. The policy of the agent is the function that maps the history of observations to a decision.
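The interaction loop described above can be sketched as follows. This is only an illustration: the `Corridor` environment, the `always_right` policy, and all other names are hypothetical and do not come from the thesis.

```python
from typing import Callable, List, Tuple


class Corridor:
    """Hypothetical toy environment: a corridor of `length` cells.

    The agent starts in cell 0. The final state is reached when the agent
    stands on the last cell or when `max_steps` decisions have been made.
    """

    def __init__(self, length: int = 5, max_steps: int = 20):
        self.length = length
        self.max_steps = max_steps
        self.pos = 0
        self.steps = 0

    def reset(self) -> int:
        self.pos = 0
        self.steps = 0
        return self.pos  # initial observation

    def is_final(self) -> bool:
        return self.pos == self.length - 1 or self.steps >= self.max_steps

    def step(self, decision: int) -> Tuple[int, float]:
        # decision is -1 (left) or +1 (right); the position is clipped
        self.pos = min(max(self.pos + decision, 0), self.length - 1)
        self.steps += 1
        reward = 1.0 if self.pos == self.length - 1 else 0.0
        return self.pos, reward  # new observation and reward


# A policy maps the history of observations to a decision.
Policy = Callable[[List[int]], int]


def run_episode(env: Corridor, policy: Policy) -> float:
    """Run one simulation and return the sum of rewards."""
    history = [env.reset()]
    total = 0.0
    while not env.is_final():
        decision = policy(history)
        observation, reward = env.step(decision)
        history.append(observation)
        total += reward
    return total


def always_right(history: List[int]) -> int:
    return +1


def always_left(history: List[int]) -> int:
    return -1
```

In this toy setting, `run_episode(Corridor(), always_right)` collects the single reward at the end of the corridor, while `always_left` never reaches it.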
We work in a context where (i) the number of states is huge, (ii) the reward carries little information, (iii) the probability of quickly reaching a good final state is low, and (iv) prior knowledge is either nonexistent or hard to exploit.
Both applications described in this thesis exhibit these constraints: the game of Go and a 3D simulator from the European project MASH (Massive Sets of Heuristics).
To make satisfactory decisions in this context, we propose several approaches:
1. Simulating with an exploration/exploitation trade-off (MCTS)
2. Reducing the complexity by local solving (GoldenEye)
3. Building a policy that improves itself (RBGP)
4. Learning prior knowledge (CluVo+GMCTS)
Monte Carlo Tree Search (MCTS) is the state of the art for the game of Go. From a model of the environment, MCTS incrementally and asymmetrically builds a tree of possible futures by performing Monte Carlo simulations. The tree starts from the current observation of the agent. The agent alternates between exploring the model and exploiting the decisions that statistically yield a good cumulative reward. We discuss two ways of improving MCTS: parallelization and the addition of prior knowledge.
Parallelization does not solve some weaknesses of MCTS; in particular, some local problems remain challenging. We propose an algorithm (GoldenEye) composed of two parts: the detection of a local problem, followed by its resolution. The resolution algorithm reuses some concepts of MCTS and solves difficult problems from a classical database.
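The MCTS loop described above (incremental tree growth driven by Monte Carlo simulations, balancing exploration and exploitation) can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation used in the thesis: the selection rule is UCB1 (the common UCT variant), the model is a hypothetical one-pile Nim game rather than Go, and all names and constants are illustrative.

```python
import math
import random


def legal_moves(stones):
    # Hypothetical toy model: one-pile Nim, take 1 to 3 stones per turn;
    # whoever takes the last stone wins.
    return [m for m in (1, 2, 3) if m <= stones]


class Node:
    def __init__(self, stones, parent=None, move=None):
        self.stones = stones      # stones left for the player to move
        self.parent = parent
        self.move = move          # move that led from parent to this node
        self.children = []
        self.untried = legal_moves(stones)
        self.visits = 0
        self.wins = 0.0           # wins for the player who moved into this node


def ucb1(child, parent_visits, c=1.4):
    # Exploitation term plus exploration bonus (assumed selection rule).
    exploit = child.wins / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore


def rollout(stones, rng):
    # Uniformly random playout; returns 1.0 if the player to move wins.
    turn = 0
    while stones > 0:
        stones -= rng.choice(legal_moves(stones))
        if stones == 0:
            return 1.0 if turn == 0 else 0.0
        turn ^= 1
    return 0.0  # no stones left: the player to move has already lost


def mcts(root_stones, iterations=2000, seed=0):
    rng = random.Random(seed)
    root = Node(root_stones)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded.
        while not node.untried and node.children:
            node = max(node.children, key=lambda ch: ucb1(ch, node.visits))
        # 2. Expansion: add one child, so the tree grows asymmetrically.
        if node.untried:
            move = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(node.stones - move, parent=node, move=move)
            node.children.append(child)
            node = child
        # 3. Simulation from the new node.
        result = rollout(node.stones, rng)
        # 4. Backpropagation, flipping the point of view at each level.
        reward = 1.0 - result
        while node is not None:
            node.visits += 1
            node.wins += reward
            reward = 1.0 - reward
            node = node.parent
    # Return the most visited move at the root.
    return max(root.children, key=lambda ch: ch.visits).move
```

With enough iterations, `mcts(5)` should favor taking one stone, which leaves the opponent with four stones, a losing position in this toy game.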
Adding prior knowledge by hand is laborious and tedious. We propose a method called Racing-based Genetic Programming (RBGP) to add prior knowledge automatically. Its strong point is that RBGP rigorously validates each addition of prior knowledge; it can also be used to build a policy, rather than only to optimize an algorithm.
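The idea of validating a candidate through a race can be sketched as follows. The abstract does not specify which statistical test RBGP uses, so this sketch assumes a simple Hoeffding confidence bound on the win rate of a candidate against the current baseline; the function `race`, its parameters, and its return labels are hypothetical.

```python
import math


def race(play_game, max_games=2000, delta=0.05):
    """Decide whether a candidate modification beats the baseline.

    `play_game()` is assumed to return 1 if the candidate wins one
    evaluation game against the baseline and 0 otherwise.
    Returns "accept", "reject", or "undecided".
    """
    wins = 0
    for n in range(1, max_games + 1):
        wins += play_game()
        mean = wins / n
        # Hoeffding radius (assumption: simplified bound, with no
        # correction for testing repeatedly after every game).
        radius = math.sqrt(math.log(2.0 / delta) / (2.0 * n))
        if mean - radius > 0.5:
            return "accept"   # confidently better than the baseline
        if mean + radius < 0.5:
            return "reject"   # confidently worse than the baseline
    return "undecided"        # budget exhausted without a clear winner
```

The race stops as soon as the confidence interval excludes a 50% win rate, so clearly good or clearly bad candidates cost only a few evaluations, while near-ties consume the full budget.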
In some applications such as MASH, simulations are too time-consuming, and there is neither prior knowledge nor a model of the environment; Monte Carlo Tree Search therefore cannot be used directly. To make MCTS usable in this context, we propose a method for learning prior knowledge (CluVo). We then use pieces of prior knowledge both to speed up the learning of the agent and to build a model. On top of this model, we use an adapted version of Monte Carlo Tree Search (GMCTS). This method solves difficult problems of MASH and gives good results in an application to a word game.

Ph.D. dissertations & Faculty habilitations
CAUSAL LEARNING FOR DIAGNOSTIC SUPPORT


CAUSAL UNCERTAINTY QUANTIFICATION UNDER PARTIAL KNOWLEDGE AND LOW DATA REGIMES


MICRO VISUALIZATIONS: DESIGN AND ANALYSIS OF VISUALIZATIONS FOR SMALL DISPLAY SPACES
The topic of this habilitation is the study of very small data visualizations, micro visualizations, in display contexts that can only dedicate minimal rendering space for data representations. For several years, together with my collaborators, I have been studying human perception, interaction, and analysis with micro visualizations in multiple contexts. In this document I bring together three of my research streams related to micro visualizations: data glyphs, where my joint research focused on studying the perception of small-multiple micro visualizations; word-scale visualizations, where my joint research focused on small visualizations embedded in text documents; and small mobile data visualizations for smartwatches or fitness trackers. I consider these types of small visualizations together under the umbrella term "micro visualizations." Micro visualizations are useful in multiple visualization contexts, and I have been working towards a better understanding of the complexities involved in designing and using them. Here, I define the term micro visualization, summarize my own and other past research and design guidelines, and outline several design spaces for different types of micro visualizations based on some of the work I have been involved in since my PhD.