Title
Accurately measuring overhead, communication time and progression of blocking and nonblocking collective operations at massive scale
Abstract
Accurate, reproducible and comparable measurement of the overheads, communication times and progression behaviour of blocking and nonblocking collective operations is a complicated task. Although different measurement schemes for blocking collective operations are implemented in well-known benchmarks, many of these schemes introduce different systematic errors in their measurements. We characterise these errors and select a window-based approach as the most accurate method. However, this approach complicates measurements significantly and introduces clock synchronisation as a new source of errors. We analyse approaches to avoid or correct those errors and develop a scalable synchronisation scheme to conduct benchmarks on massively parallel systems. Our results are compared to the window-based scheme implemented in the SKaMPI benchmarks and show a reduction of the synchronisation overhead by a factor of 16 on 128 processes. We also describe two different measurement schemes for the overhead and asynchronous progress of nonblocking collective communications. An implementation and results of both measurement schemes are presented.
Year
2010
DOI
10.1080/17445760902894688
Venue
IJPEDS
Keywords
mpi, different systematic error, collective operation, scalable synchronisation scheme, skampi benchmarks, massive scale, clock synchronisation, scalable synchronization, synchronisation overhead, communication time, comparable measurement, different measurement scheme, benchmarking, collective communication, measurement scheme, time synchronization, collective operations, col, systematic error, clock synchronization
DocType
Journal
Volume
25
Issue
4
ISSN
1744-5760
Citations
12
PageRank
0.78
References
18
Authors
3
Name              Order  Citations  PageRank
Torsten Hoefler   1      2197       163.64
Timo Schneider    2      312        18.39
Andrew Lumsdaine  3      2754       236.74