Title
Run-Time Root Cause Analysis in Adaptive Distributed Systems
Abstract
In a distributed environment, several components collaborate to provide complex functionality. Adaptation is an emerging trend in distributed systems: the system reconfigures itself by adding, removing, or updating components to cope with faults. Because components are generally interdependent, a fault can propagate from one component to another. Existing root cause analysis techniques generally build a static fault dependency graph to identify the root fault. However, these dependencies change with each adaptation, which makes design-time fault dependencies invalid at run time. This paper describes the problem of deriving causal relationships between faults in adaptive distributed systems. It then presents a statechart-based solution that statically identifies the sequence of method executions in order to derive the causal relationships between faults at run time. An evaluation shows that the approach is highly scalable and time-efficient, and that it can be used to reduce the Mean Time To Recover (MTTR) of a distributed system.
Year
2013
DOI
10.1007/978-3-642-41033-8_38
Venue
Lecture Notes in Computer Science
Keywords
Distributed Systems, Root cause analysis, Fault causal relationship, adaptive system, component-based system
DocType
Conference
Volume
8186
ISSN
0302-9743
Citations
1
PageRank
0.43
References
12
Authors
3
Name             Order  Citations  PageRank
Amit Raj         1      1          0.43
Stephen Barrett  2      63         6.47
Siobhán Clarke   3      699        87.36