Title
Exploring the feasibility of lossy compression for PDE simulations
Abstract
Checkpoint restart plays an important role in high-performance computing (HPC) applications, allowing simulation runtime to extend beyond a single job allocation and facilitating recovery from hardware failure. Yet, as machines grow in size and in complexity, traditional approaches to checkpoint restart are becoming prohibitive. Current methods store a subset of the application’s state and exploit the memory hierarchy in the machine. However, as the energy cost of data movement continues to dominate, further reductions in checkpoint size are needed. Lossy compression, which can significantly reduce checkpoint sizes, offers a potential to reduce computational cost in checkpoint restart. This article investigates the use of numerical properties of partial differential equation (PDE) simulations, such as bounds on the truncation error, to evaluate the feasibility of using lossy compression in checkpointing PDE simulations. Restart from a checkpoint with lossy compression is considered for a fail-stop error in two time-dependent HPC application codes: PlasComCM and Nek5000. Results show that error in application variables due to a restart from a lossy compressed checkpoint can be masked by the numerical error in the discretization, leading to increased efficiency in checkpoint restart without influencing overall accuracy in the simulation.
Year
2019
DOI
10.1177/1094342018762036
Venue
Periodicals
Keywords
Lossy compression, checkpoint restart, exascale, error tolerance selection, error propagation, fault tolerance, compression
Field
Compression (physics), Propagation of uncertainty, Lossy compression, Computer science, Parallel computing, Fault tolerance
DocType
Journal
Volume
33
Issue
2
ISSN
1094-3420
Citations
6
PageRank
0.42
References
6
Authors
5
Name               Order  Citations  PageRank
Jon Calhoun        1      47         4.75
Franck Cappello    2      3775       251.47
Luke Olson         3      235        21.93
M. Snir            4      3984       520.82
William D. Gropp   5      5547       548.31