Title
Stack Trace Analysis for Large Scale Debugging
Abstract
We present the Stack Trace Analysis Tool (STAT) to aid in debugging extreme-scale applications. STAT can reduce problem exploration spaces from thousands of processes to a few by sampling stack traces to form process equivalence classes, groups of processes exhibiting similar behavior. We can then use full-featured debuggers on representatives from these behavior classes for root cause analysis. STAT scalably collects stack traces over a sampling period to assemble a profile of the application's behavior. STAT routines process the samples to form a call graph prefix tree that encodes common behavior classes over the program's process space and time. STAT leverages MRNet, an infrastructure for tool control and data analyses, to overcome scalability barriers faced by heavy-weight debuggers. We present STAT's design and an evaluation that shows STAT gathers informative process traces from thousands of processes with sub-second latencies, a significant improvement over existing tools. Our case studies of production codes verify that STAT supports the quick identification of errors that were previously difficult to locate.
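The core idea in the abstract, merging sampled stack traces into a call graph prefix tree and grouping processes into equivalence classes, can be illustrated with a minimal sketch. This is not STAT's implementation: the names `TraceNode`, `build_prefix_tree`, `equivalence_classes`, `pick_representatives`, and the sample data are hypothetical. It assumes one sampled trace per process, listed outermost frame first, and it runs in a single process rather than over an MRNet tree.

```python
from collections import defaultdict


class TraceNode:
    """One call frame in the prefix tree, tagged with the ranks that reached it."""

    def __init__(self, frame):
        self.frame = frame
        self.ranks = set()
        self.children = {}


def build_prefix_tree(traces):
    """Merge per-rank stack traces (outermost frame first) into one prefix tree."""
    root = TraceNode("<root>")
    for rank, frames in traces.items():
        root.ranks.add(rank)
        node = root
        for frame in frames:
            if frame not in node.children:
                node.children[frame] = TraceNode(frame)
            node = node.children[frame]
            node.ranks.add(rank)
    return root


def equivalence_classes(traces):
    """Group ranks whose sampled traces are identical (a simplifying assumption)."""
    classes = defaultdict(list)
    for rank, frames in sorted(traces.items()):
        classes[tuple(frames)].append(rank)
    return dict(classes)


def pick_representatives(classes):
    """Choose the lowest rank of each class as the process to debug in detail."""
    return {path: ranks[0] for path, ranks in classes.items()}


# Hypothetical sample: rank 2 is still computing while the others wait.
SAMPLE_TRACES = {
    0: ["main", "solve", "MPI_Waitall"],
    1: ["main", "solve", "MPI_Waitall"],
    2: ["main", "solve", "compute_flux"],
    3: ["main", "solve", "MPI_Waitall"],
}

if __name__ == "__main__":
    classes = equivalence_classes(SAMPLE_TRACES)
    for path, rank in pick_representatives(classes).items():
        print(" > ".join(path), "ranks:", classes[path], "representative:", rank)
```

In this sketch, four processes collapse into two behavior classes, so a full-featured debugger would need to attach to only two representatives. Handling repeated samples over time, as the abstract describes, would require merging several traces per rank into the same tree.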
Year
2007
DOI
10.1109/IPDPS.2007.370254
Venue
Long Beach, CA
Keywords
parallel programming, program debugging, program diagnostics, software libraries, software tools, trees (mathematics), STAT routines, Stack Trace Analysis Tool, call graph prefix tree, large scale debugging, parallel application, root cause analysis
Field
Programming language, Computer science, Root cause analysis, Parallel computing, Stack trace, Call graph, Sampling (statistics), Equivalence class, Trie, Scalability, Debugging, Distributed computing
DocType
Conference
ISBN
1-4244-0910-1
Citations
68
PageRank
2.63
References
15
Authors
7
Name                     Order  Citations  PageRank
Dorian C. Arnold         1      338        24.70
Dong H. Ahn              2      325        22.61
de Supinski, Bronis R.   3      2667       154.44
Gregory L. Lee           4      199        14.30
Barton P. Miller         5      3196       397.51
Martin Schulz            6      2227       129.64
de Supinski, B.R.        7      439        23.69