Title
Visual and algorithmic tooling for system trace analysis: a case study
Abstract
Despite advances in the application of automated statistical and machine learning techniques to system log and trace data, there will always be a need for human analysis of machine traces, because trace information on unstable systems may be incomplete or incorrect. In addition, false positives from automated analysis are unlikely to disappear, and remediation measures and candidate fix tests will need to be evaluated. We present Zinsight, a visual and analytic tool that supports performance analysis and debugging by using large event traces to understand complex systems. The tool enables analysts to quickly create and manipulate high-level structural representations linked with statistical analysis derived from the underlying event trace data. The original raw trace is annotated with module names, and a domain-specific database is incorporated to relate software functions to module names. Navigable sequence context graph views present execution flow patterns automatically extracted from arbitrarily definable sets of events and are linked to frequency, distribution, and response time views. The goal is to reduce the cognitive and computational load on the analyst while providing answers to the most natural questions in a problem determination session. We present a case study of the tool in use on field problems from the recently shipped (late 2008) IBM z10 mainframe. As a result of the industry trend toward higher parallelism and memory latency, many issues were encountered with legacy code. The tool was applied successfully to diagnose these problems.
Year: 2010
DOI: 10.1145/1740390.1740412
Venue: Operating Systems Review
Keywords: large event trace, system trace analysis, machine trace, statistical analysis, trace analysis, human analysis, original raw trace, analytic tool, case study, underlying event trace data, automated analysis, visualization, problem determination, algorithmic tooling, trace information, trace data, pattern extraction, memory latency, false positive, machine learning, complex system
Field: Information system, Data mining, Algorithmics, Visualization, Computer science, Software, Artificial intelligence, Legacy code, Legacy system, Software development, Distributed computing, Debugging
DocType: Journal
Volume: 44
Issue: 1
Citations: 8
PageRank: 0.49
References: 12
Authors: 2
Name              Order   Citations   PageRank
Wim De Pauw       1       404         31.73
Stephen Heisig    2       19          2.76