Ph.D.
Group: Parallel Architecture
Coût énergétique, coût mémoire du calcul réversible (Energy cost and memory cost of reversible computing)
Started on 01/09/2007
Advisor: EISENBEIS, Christine
Funding: fixed-term contract (CDD) on an INRIA contract
Affiliation: Université Paris-Saclay
Laboratory: INRIA Saclay
Defended on 21/12/2011. Committee:
Pr. Claire Hanen, Université Paris-Ouest-Nanterre-La Défense and LIP6. Reviewer
Dr. Erven Rohou, INRIA Rennes - Bretagne Atlantique. Reviewer
Pr. Jean-Luc Gaudiot, University of California, Irvine. Examiner
Pr. Yannis Manoussakis, Université Paris-Sud 11. Examiner
Dr. Claude Tadonki, École des Mines. Examiner
Dr. Christine Eisenbeis, INRIA Saclay - Île-de-France. Thesis advisor
Abstract:
The main resources for computation are time, space and energy; reducing them is the central challenge of processor performance.
In this thesis we are interested in a fourth factor: information. Information has a direct and important impact on these three resources,
and we show how it contributes to performance optimization. Landauer suggested that, independently of the hardware on which a computation
runs, information erasure dissipates energy; this is a fundamental result of thermodynamics. Under this hypothesis, only reversible
computations, in which no information is ever lost, can be thermodynamically adiabatic and dissipate no power.
Reversibility means that data can always be retrieved at any point of the program. Information may be carried not only by a datum itself
but also by the process and input data that generate it. When a computation is reversible, information can also be recovered from other
already computed data by reverse computation. Hence reversible computing improves information locality.
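As an illustrative sketch (not taken from the thesis), the contrast between an erasing update and a reversible one can be shown on integer state; the function names here are hypothetical:

```python
# Landauer's bound: erasing one bit dissipates at least k_B * T * ln 2 joules.
# A reversible step keeps enough information to be run backwards.

def irreversible_store(x, y):
    """x = y erases the old value of x: it cannot be recovered."""
    return y

def reversible_add(x, y):
    """x += y preserves information: the old x is recoverable from (x, y)."""
    return x + y

def reverse_add(x, y):
    """Reverse computation: retrieve the previous x without having stored it."""
    return x - y

x, y = 7, 5
x = reversible_add(x, y)        # forward step: x becomes 12
assert reverse_add(x, y) == 7   # the old value is recomputed, not stored
```

The old value of `x` is retrieved from already computed data and the inverse operation, which is the sense in which reversibility improves information locality.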
This thesis develops these ideas in two directions. In the first part, we address the spatial complexity of making a computation DAG
(directed acyclic graph) reversible. We define energetic garbage as the number of additional registers needed for the reversible
computation with respect to the original one. We propose a reversible register allocator and show empirically that the garbage size
never exceeds 50% of the DAG size. In the second part, we apply this approach to the trade-off between recomputation (direct or reverse)
and storage in the context of supercomputers such as recent vector and parallel coprocessors, graphics cards (GPUs), the IBM Cell
processor, etc., where the gap between processor cycle time and memory access time keeps increasing. We show that recomputation in
general, and reverse computation in particular, reduces register requirements and memory pressure. This reverse-rematerialization
approach also increases instruction-level parallelism (Cell) and thread-level parallelism on multicore processors with a shared
register/memory file (GPU). On the latter architecture, the number of registers required by a loop kernel limits the number of
concurrently running threads and thus performance. Reverse rematerialization generates additional instructions, but their cost can be
hidden by the gain in parallelism. Experiments on the highly memory-demanding Lattice QCD simulation code on an Nvidia GPU show a
performance gain of up to 11%.
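A minimal sketch of the reverse-rematerialization idea, heavily simplified and not the thesis's actual allocator: instead of keeping an intermediate value live in a register across a long region, the value is allowed to die and is later recomputed backwards from a value derived from it, trading one extra instruction for a freed register. All names below are hypothetical.

```python
def kernel_with_reverse_remat(a, b):
    # t = a + b is needed twice, with a long region in between.
    t = a + b          # first use: t is live here
    c = t * 2          # consume t once
    # ... long region: rather than holding t in a register the whole time,
    # we let it die and rematerialize it in reverse from c,
    # using the inverse of the operation that produced c:
    t_again = c // 2   # reverse computation of t (inverse of * 2)
    d = t_again + 1    # second use
    return c, d
```

For example, `kernel_with_reverse_remat(3, 4)` returns `(14, 8)`: the second use of `t` is served by one extra division instead of one extra live register, which is exactly the kind of cost a parallelism gain can hide.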