Ph.D
Group :
Dense matrix computations: communication cost and numerical stability
Starts on 01/10/2009
Advisor : GRIGORI, Laura
Funding : contrat doctoral du Ministère
Affiliation : Université Paris-Saclay
Laboratory : LRI - GRAND LARGE
Defended on 11/02/2013, committee :
Nicholas Higham (Rapporteur), Professeur, School of Mathematics, the University of Manchester
Yves Robert (Rapporteur), Professeur, Ecole Normale Supérieure de Lyon
Iain Duff (Examinateur), Professeur, the University of Strathclde
Yannis Manoussakis (Examinateur), Professeur, Université Paris Sud
Jean-Louis Roch (Examinateur), Maître de Conférences, IMAG
Laura Grigori (Directeur de thèse), Directeur de Recherche, INRIA
Research activities :
Abstract :
This dissertation focuses on a widely used linear algebra kernel to solve linear systems, that is the LU
decomposition. Usually, to perform such a computation one uses the Gaussian elimination with partial
pivoting (GEPP). The backward stability of GEPP depends on a quantity which is referred to as the
growth factor, it is known that in general GEPP leads to modest element growth in practice. However
its parallel version does not attain the communication lower bounds. Indeed the panel factorization represents
a bottleneck in terms of communication. To overcome this communication bottleneck, Grigori
et al [60] have developed a communication avoiding LU factorization (CALU), which is asymptotically
optimal in terms of communication cost at the cost of some redundant computation. In theory, the upper
bound of the growth factor is larger than that of Gaussian elimination with partial pivoting, however
CALU is stable in practice. To improve the upper bound of the growth factor, we study a new pivoting
strategy based on strong rank revealing QR factorization. Thus we develop a new block algorithm for
the LU factorization. This algorithm has a smaller growth factor upper bound compared to Gaussian
elimination with partial pivoting. The strong rank revealing pivoting is then combined with tournament
pivoting strategy to produce a communication avoiding LU factorization that is more stable than CALU.
For hierarchical systems, multiple levels of parallelism are available. However, none of the previously
cited methods fully exploit these hierarchical systems. We propose and study two recursive algorithms
based on the communication avoiding LU algorithm, which are more suitable for architectures with
multiple levels of parallelism. For an accurate and realistic cost analysis of these hierarchical algorithms,
we introduce a hierarchical parallel performance model that takes into account processor and
network hierarchies. This analysis enables us to accurately predict the performance of the hierarchical
LU factorization on an exascale platform.