Name: NATHAN DEBARDELEBEN
Affiliation: Los Alamos Natl Lab, High Performance Comp Div, Los Alamos, NM 87544 USA
Papers: 59
Collaborators: 217
Citations: 490
PageRank: 31.71
Referers: 1099
Referees: 1351
References: 680
Title | Citations | PageRank | Year
Online Detection and Classification of State Transitions of Multivariate Shock and Vibration Data | 0 | 0.34 | 2022
Resiliency in numerical algorithm design for extreme scale simulations | 0 | 0.34 | 2022
Understanding the Effects of DRAM Correctable Error Logging at Scale | 0 | 0.34 | 2021
Thermal neutrons: a possible threat for supercomputer reliability | 1 | 0.40 | 2021
Quantifying Server Memory Frequency Margin and Using It to Improve Performance in HPC Systems | 1 | 0.35 | 2021
Extreme Protection Against Data Loss with Single-Overlap Declustered Parity | 0 | 0.34 | 2020
An Overview of the Risk Posed by Thermal Neutrons to the Reliability of Computing Devices | 0 | 0.34 | 2020
Thermal Neutrons: a Possible Threat for Supercomputers and Safety Critical Applications | 1 | 0.41 | 2020
TensorFI: A Flexible Fault Injection Framework for TensorFlow Applications | 2 | 0.37 | 2020
Chaser: An Enhanced Fault Injection Tool for Tracing Soft Errors in MPI Applications | 0 | 0.34 | 2020
Quantifying Memory Underutilization in HPC Systems and Using it to Improve Performance via Architecture Support | 5 | 0.41 | 2019
BinFI: an efficient fault injector for safety-critical machine learning systems | 12 | 0.63 | 2019
TSM2: optimizing tall-and-skinny matrix-matrix multiplication on GPUs | 3 | 0.39 | 2019
Do Solar Proton Events Reduce the Number of Faults in Supercomputers?: A Comparative Analysis of Faults During and without Solar Proton Events | 0 | 0.34 | 2019
SaNSA - The Supercomputer and Node State Architecture | 0 | 0.34 | 2018
Characterization and Comparison of Application Resilience for Serial and Parallel Executions | 1 | 0.35 | 2018
Enhancing HPC System Log Analysis by Identifying Message Origin in Source Code | 0 | 0.34 | 2018
Modeling Application Resilience In Large-Scale Parallel Execution | 0 | 0.34 | 2018
Using virtualization to quantify power conservation via near-threshold voltage reduction for inherently resilient applications | 1 | 0.35 | 2018
Lessons learned from memory errors observed over the lifetime of Cielo | 5 | 0.43 | 2018
Improving Application Resilience by Extending Error Correction with Contextual Information | 0 | 0.34 | 2018
Physics-Informed Machine Learning for DRAM Error Modeling | 0 | 0.34 | 2018
The Atlas Cluster Trace Repository | 0 | 0.34 | 2018
RSVP: Soft Error Resilient Power Savings at Near-Threshold Voltage Using Register Vulnerability | 0 | 0.34 | 2017
Addressing statistical significance of fault injection: empirical studies of the soft error susceptibility | 1 | 0.35 | 2017
Experimental and Analytical Study of Xeon Phi Reliability | 2 | 0.36 | 2017
Resilience Analysis of Top K Selection Algorithms | 0 | 0.34 | 2017
LetGo: A Lightweight Continuous Framework for HPC Applications Under Failures | 4 | 0.43 | 2017
Silent Data Corruption Resilient Two-sided Matrix Factorizations | 6 | 0.42 | 2017
Automating DRAM Fault Mitigation By Learning From Experience | 1 | 0.35 | 2017
Improving DRAM Fault Characterization through Machine Learning | 4 | 0.42 | 2016
SDC is in the Eye of the Beholder: A Survey and Preliminary Study | 2 | 0.36 | 2016
Design, Use and Evaluation of P-FSEFI: A Parallel Soft Error Fault Injection Framework for Emulating Soft Errors in Parallel Applications | 1 | 0.35 | 2016
On the Inherent Resilience of Integer Operations | 0 | 0.34 | 2016
Towards Practical Algorithm Based Fault Tolerance in Dense Linear Algebra | 9 | 0.48 | 2016
Differentiated Failure Remediation with Action Selection for Resilient Computing | 1 | 0.36 | 2015
Empirical Studies of the Soft Error Susceptibility of Sorting Algorithms to Statistical Fault Injection | 3 | 0.42 | 2015
Towards Building Resilient Scientific Applications: Resilience Analysis on the Impact of Soft Error and Transient Error Tolerance with the CLAMR Hydrodynamics Mini-App | 3 | 0.42 | 2015
Memory Errors in Modern Systems: The Good, The Bad, and The Ugly | 75 | 1.67 | 2015
On the Non-Suitability of Non-Volatility | 2 | 0.38 | 2015
Understanding GPU errors on large-scale HPC systems and the implications for system design and operation | 49 | 1.67 | 2015
Harnessing Unreliable Cores in Heterogeneous Architecture: The PyDac Programming Model and Runtime | 0 | 0.34 | 2014
F-SEFI: A Fine-Grained Soft Error Fault Injection Tool for Profiling Application Vulnerability | 23 | 1.12 | 2014
GPGPUs: how to combine high computational power with high reliability | 14 | 0.85 | 2014
Addressing failures in exascale computing | 123 | 3.22 | 2014
Fault Injection Experiments with the CLAMR Hydrodynamics Mini-App | 2 | 0.40 | 2014
PyDac: A Resilient Run-Time Framework for Divide-and-Conquer Applications on a Heterogeneous Many-Core Architecture | 1 | 0.38 | 2013
Feng shui of supercomputer memory: positional effects in DRAM and SRAM faults | 69 | 1.91 | 2013
Analyzing Reliability of Memory Sub-systems with Double-Chipkill Detect/Correct | 4 | 0.46 | 2013
Exploring Time and Frequency Domains for Accurate and Automated Anomaly Detection in Cloud Computing Systems | 4 | 0.43 | 2013