Title
FINJ: A Fault Injection Tool for HPC Systems.
Abstract
We present FINJ, a high-level fault injection tool for High-Performance Computing (HPC) systems, with a focus on the management of complex experiments. FINJ provides support for custom workloads and allows generation of anomalous conditions through the use of fault-triggering executable programs. FINJ can also be integrated seamlessly with most other lower-level fault injection tools, allowing users to create and monitor a variety of highly complex and diverse fault conditions in HPC systems that would be difficult to recreate in practice. FINJ is suitable for experiments involving many, potentially interacting nodes, making it a very versatile design and evaluation tool.
Year
2018
DOI
10.1007/978-3-030-10549-5_62
Venue
Lecture Notes in Computer Science
Keywords
Exascale systems, Resiliency, Fault detection, Monitoring, Benchmarking, Open-source
DocType
Conference
Volume
11339
ISSN
0302-9743
Citations
0
PageRank
0.34
References
11
Authors
6
Name              Order  Citations  PageRank
Alessio Netti     1      4          2.42
Zeynep Kiziltan   2      374        27.79
Özalp Babaoglu    3      217        49.75
Alina Sîrbu       4      67         9.06
Andrea Bartolini  5      457        51.90
Andrea Borghesi   6      26         5.92