Title |
---|
Preventing the explosion of exascale profile data with smart thread-level aggregation. |
Abstract |
---|
State-of-the-art performance analysis tools, such as Score-P, record performance profiles on a per-thread basis. However, exascale systems are expected to run on the order of a billion threads, which would result in extremely large performance profiles. In most cases, users rarely inspect the individual per-thread data. In this paper, we propose aggregating per-thread performance data within each process to reduce the data volume to a reasonable size. Our goal is to aggregate the threads such that thread-level performance issues remain visible and analyzable. To this end, we implemented four aggregation strategies in Score-P: (i) SUM -- aggregates all threads of a process into a process profile; (ii) SET -- calculates statistical key data as well as the sum; (iii) KEY -- identifies three threads (i.e., key threads) of particular interest for performance analysis and aggregates the rest of the threads; (iv) CALLTREE -- clusters threads that have the same call-tree structure. For each of these strategies, we evaluate the compression ratio and how well it preserves information about thread-level performance behavior. The aggregation does not incur any additional performance overhead at application run-time. |
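
To make the strategies in the abstract more concrete, the following is a minimal Python sketch of three of them (SUM, SET, and CALLTREE) applied to a toy profile. It is not Score-P code: the data layout (a dictionary mapping thread id to per-call-path times), the function names, the particular statistics kept by SET, and the use of the set of call paths as a stand-in for "call-tree structure" are all assumptions made for illustration. KEY is omitted because the abstract does not state how the three key threads are chosen.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-thread profiles of one process:
# thread id -> {call path: time in seconds}. Not a Score-P data structure.
profiles = {
    0: {"main": 10.0, "main/solve": 6.0},
    1: {"main/solve": 5.0},
    2: {"main/solve": 7.0},
    3: {"main/solve": 6.0, "main/solve/io": 1.0},
}


def aggregate_sum(profiles):
    """SUM: collapse all threads of a process into one process profile."""
    total = defaultdict(float)
    for prof in profiles.values():
        for path, value in prof.items():
            total[path] += value
    return dict(total)


def aggregate_set(profiles):
    """SET: keep the sum plus statistics per call path.

    The choice of min/max/mean is an assumption; the statistics are taken
    over the threads that actually visited the call path.
    """
    values = defaultdict(list)
    for prof in profiles.values():
        for path, value in prof.items():
            values[path].append(value)
    return {
        path: {"sum": sum(v), "min": min(v), "max": max(v), "mean": mean(v)}
        for path, v in values.items()
    }


def aggregate_calltree(profiles):
    """CALLTREE: cluster threads with identical call paths, then sum each cluster."""
    clusters = defaultdict(list)
    for tid, prof in profiles.items():
        # Threads with the same set of call paths fall into the same cluster.
        clusters[frozenset(prof)].append(tid)
    return [
        (sorted(tids), aggregate_sum({t: profiles[t] for t in tids}))
        for tids in clusters.values()
    ]


if __name__ == "__main__":
    print(aggregate_sum(profiles))
    print(aggregate_set(profiles)["main/solve"])
    for tids, prof in aggregate_calltree(profiles):
        print(tids, prof)
```

In this toy input, SUM leaves one profile per process, SET adds a fixed number of statistics per call path regardless of thread count, and CALLTREE keeps one profile per distinct call-tree shape, so worker threads 1 and 2 are merged while threads 0 and 3 stay separate.
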
Year | DOI | Venue |
---|---|---|
2015 | 10.1145/2832106.2832107 | ESPT@SC |
DocType | Citations | PageRank |
---|---|---|
Conference | 0 | 0.34 |
References | Authors |
---|---|
17 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Daniel Lorenz | 1 | 0 | 1.01 |
Sergei Shudler | 2 | 8 | 3.24 |
Felix Wolf | 3 | 3 | 2.94 |