Title
Clock delta compression for scalable order-replay of non-deterministic parallel applications
Abstract
The ability to record and replay program execution helps significantly in debugging non-deterministic MPI applications by reproducing message-receive orders. However, the large amount of data that traditional record-and-replay techniques record precludes their practical application to massively parallel applications. In this paper, we propose a new compression algorithm, Clock Delta Compression (CDC), for scalable record and replay of non-deterministic MPI applications. CDC defines a reference order of message receives based on a totally ordered relation using Lamport clocks, and records only the differences between this reference logical-clock order and the observed order. Our evaluation shows that CDC significantly reduces the record data size. For example, when we apply CDC to the Monte Carlo particle transport Benchmark (MCB), which represents common non-deterministic communication patterns, CDC reduces the record size by approximately two orders of magnitude compared to traditional techniques and incurs a runtime overhead of between 13.1% and 25.5%.
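The core idea in the abstract, a reference order derived from Lamport clocks plus a record of only the deviations from it, can be illustrated with a small sketch. This is not the paper's actual encoding: the message representation `(lamport_clock, sender_rank)`, the tie-break by sender rank, and the function names `reference_order`, `encode`, and `decode` are all assumptions made for illustration, and the sketch assumes each `(clock, sender)` pair is unique at a given receiver.

```python
from typing import Dict, List, Tuple

# Hypothetical message identity: (lamport_clock, sender_rank).
# Assumed unique per receiver, since a sender's Lamport clock grows with each send.
Message = Tuple[int, int]


def reference_order(msgs: List[Message]) -> List[Message]:
    """Total order: sort by Lamport clock, break ties by sender rank (assumed tie-break)."""
    return sorted(msgs)


def encode(observed: List[Message]) -> List[Tuple[int, int]]:
    """Record only positions where the observed order departs from the reference order.

    Each entry is (observed_index, reference_index).
    """
    ref_pos: Dict[Message, int] = {m: i for i, m in enumerate(reference_order(observed))}
    return [(i, ref_pos[m]) for i, m in enumerate(observed) if ref_pos[m] != i]


def decode(msgs: List[Message], diff: List[Tuple[int, int]]) -> List[Message]:
    """Rebuild the observed order from the same messages (in any order) plus the diff."""
    ref = reference_order(msgs)
    moved = dict(diff)  # observed index -> reference index
    return [ref[moved.get(i, i)] for i in range(len(ref))]


# Two messages arrived out of clock order; the rest matched the reference order.
observed = [(3, 1), (2, 0), (5, 2), (7, 1)]
diff = encode(observed)
replayed = decode(list(reversed(observed)), diff)
```

When the observed order mostly agrees with the reference order, the diff stays short, which is the intuition behind the record-size reduction reported in the abstract.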
Year: 2015
DOI: 10.1145/2807591.2807642
Venue: International Conference for High Performance Computing, Networking, Storage, and Analysis
Keywords: Debugging tools, Non-determinism, Compression
Field: Computer science, Massively parallel, Parallel computing, Lamport timestamps, Decoding methods, Data compression, Delta encoding, Scalability, Debugging, Encoding (memory), Distributed computing
DocType: Conference
ISBN: 978-1-5090-0273-3
Citations: 5
PageRank: 0.44
References: 14
Authors: 5
Name             Order  Citations  PageRank
Kento Sato       1      192        11.43
Dong H. Ahn      2      38         4.21
Ignacio Laguna   3      239        24.56
Gregory L. Lee   4      199        14.30
Martin Schulz    5      167        19.77