Title
Framework for enabling system understanding
Abstract
Building the effective HPC resilience mechanisms required for the viability of next-generation supercomputers will require an in-depth understanding of system and component behaviors. Our goal is to build an integrated framework for high-fidelity, long-term information storage, historic and run-time analysis, and algorithmic and visual information exploration, in order to enable system understanding, timely failure detection and prediction, and triggering of appropriate responses to failure situations. Since it is unknown what information is relevant, and since potentially relevant data may be expressed in a variety of forms (e.g., numeric, textual), this framework must provide capabilities to process different forms of data and must also support the integration of new data, data sources, and analysis capabilities. Further, to ensure ease of use as capabilities and data sources expand, it must also provide interactivity between its elements. This paper describes our integration of the capabilities mentioned above into our OVIS tool.
Year
2011
DOI
10.1007/978-3-642-29740-3_27
Venue
International Conference on Parallel Processing
Keywords
depth understanding, relevant data, failure situation, integrated framework, enabling system understanding, run-time analysis, visual information exploration, new data, analysis capability, data source, information storage
DocType
Conference
Volume
7156
ISSN
0302-9743
Citations
1
PageRank
0.35
References
4
Authors
10
Name                 Order  Citations  PageRank
Jim Brandt           1      1          0.35
Feng Chen            2      1          0.35
A. Gentile           3      1          0.35
Box Leangsuksun      4      58         4.88
Jackson Mayo         5      43         7.97
Philippe P. Pébay    6      273        27.36
D. Roe               7      1          0.35
Narate Taerat        8      88         8.13
David C. Thompson    9      308        18.14
Matthew Wong         10     1          0.35