Abstract |
---|
We present FINJ, a high-level fault injection tool for High-Performance Computing (HPC) systems, with a focus on the management of complex experiments. FINJ provides support for custom workloads and allows generation of anomalous conditions through the use of fault-triggering executable programs. FINJ can also be integrated seamlessly with most other lower-level fault injection tools, allowing users to create and monitor a variety of highly-complex and diverse fault conditions in HPC systems that would be difficult to recreate in practice. FINJ is suitable for experiments involving many, potentially interacting nodes, making it a very versatile design and evaluation tool. |
Field | Value |
---|---|
Year | 2018 |
DOI | 10.1007/978-3-030-10549-5_62 |
Venue | Lecture Notes in Computer Science |
Keywords | Exascale systems, Resiliency, Fault detection, Monitoring, Benchmarking, Open-source |
DocType | Conference |
Volume | 11339 |
ISSN | 0302-9743 |
Citations | 0 |
PageRank | 0.34 |
References | 11 |
Authors | 6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Alessio Netti | 1 | 4 | 2.42 |
Zeynep Kiziltan | 2 | 374 | 27.79 |
Özalp Babaoglu | 3 | 217 | 49.75 |
Alina Sîrbu | 4 | 67 | 9.06 |
Andrea Bartolini | 5 | 457 | 51.90 |
Andrea Borghesi | 6 | 26 | 5.92 |