Title
HealthLog Monitor: A Flexible System-Monitoring Linux Service
Abstract
Error monitoring is a critical procedure for most computing systems, in domains ranging from HPC to embedded systems. Several generic architectures offering hardware-level error detection have been proposed and employed in modern processors, and the information they report is essential for isolating and/or mitigating failures. However, research has revealed many cases where indications of upcoming failures, known as symptoms, can be identified early, before the failure actually occurs. Such cases are becoming more frequent as technology trends try to exploit the conservative worst-case voltage guardbands, pushing computing systems towards more aggressive and often hazardous operating regions. In this paper, we present HealthLog Monitor, a flexible system-monitoring service that offers a generic abstraction layer combining error and symptom monitoring. HealthLog can monitor hardware measurements (performance, sensor and error data) as well as external health-related data, and it exposes an API for describing combined symptoms and the reactions they trigger. The goal of the monitor is to offer a universal standard for error reporting and system-monitoring mechanisms across all system layers. The current version of HealthLog was developed and tested on AppliedMicro's X-Gene 2 micro-server, but it is a cross-platform solution because it does not depend on a specific architecture. This work demonstrates how platform events, software metrics and external peripheral mechanisms can be combined to deliver early warnings of upcoming failures and trigger evasive reactions.
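The abstract does not spell out HealthLog's API. As a purely illustrative sketch of the general idea of combining error counters and sensor readings into symptom rules that trigger reactions, the Python below reads two standard Linux sysfs interfaces (the EDAC memory-controller corrected-error counter and a thermal-zone temperature) and evaluates simple threshold rules. The names `Symptom`, `sample_metrics`, `evaluate` and `log_reaction`, the specific sysfs paths and the thresholds are hypothetical assumptions chosen for illustration, not HealthLog's actual interface.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict, List, Optional

# Standard Linux sysfs locations (assumed; not every system exposes them).
EDAC_CE_COUNT = "/sys/devices/system/edac/mc/mc0/ce_count"
THERMAL_TEMP = "/sys/class/thermal/thermal_zone0/temp"


@dataclass
class Symptom:
    """Hypothetical symptom rule: fires when `metric` reaches `threshold`."""
    name: str
    metric: str
    threshold: float
    reaction: Callable[[str, float], None]


def read_sysfs_int(path: str) -> Optional[int]:
    """Read one integer from a sysfs file; return None if it is unavailable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def sample_metrics() -> Dict[str, float]:
    """Collect a small set of error and sensor readings, skipping missing ones."""
    metrics: Dict[str, float] = {}
    ce = read_sysfs_int(EDAC_CE_COUNT)
    if ce is not None:
        # Cumulative corrected-error count reported by the EDAC memory controller.
        metrics["edac_ce_count"] = float(ce)
    temp = read_sysfs_int(THERMAL_TEMP)
    if temp is not None:
        # The thermal sysfs interface reports millidegrees Celsius.
        metrics["temp_c"] = temp / 1000.0
    return metrics


def evaluate(symptoms: List[Symptom], metrics: Dict[str, float]) -> List[str]:
    """Run the reaction of every symptom whose metric meets its threshold."""
    fired: List[str] = []
    for symptom in symptoms:
        value = metrics.get(symptom.metric)
        if value is not None and value >= symptom.threshold:
            symptom.reaction(symptom.name, value)
            fired.append(symptom.name)
    return fired


def log_reaction(name: str, value: float) -> None:
    """Placeholder reaction; a real service might throttle, migrate or alert."""
    print(f"symptom {name!r} detected (value={value})")


if __name__ == "__main__":
    rules = [
        Symptom("correctable_error_burst", "edac_ce_count", 10.0, log_reaction),
        Symptom("overheating", "temp_c", 85.0, log_reaction),
    ]
    print("fired:", evaluate(rules, sample_metrics()))
```

Because `ce_count` is cumulative, a practical rule would more likely compare successive samples and react to the rate of new corrected errors rather than to the absolute count.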
Year
2018
DOI
10.1109/IOLTS.2018.8474119
Venue
2018 IEEE 24th International Symposium on On-Line Testing And Robust System Design (IOLTS)
Keywords
reliability, error handling, error reporting, protection mechanisms
Field
Computer science, System monitoring, Real-time computing, Exploit, Error detection and correction, Software metric, Abstraction layer, Computing systems, Embedded system
DocType
Conference
ISBN
978-1-5386-5993-9
Citations
0
PageRank
0.34
References
5
Authors
3
Name | Order | Citations | PageRank
Athanasios Chatzidimitriou | 1 | 59 | 6.83
George Papadimitriou | 2 | 36 | 4.38
Gizopoulos, D. | 3 | 105 | 9.80