Title
Automatic Fault Diagnosis in Cloud Infrastructure
Abstract
With cloud computing, a cycle of fault diagnosis and recovery has become the norm. Although a large amount of monitoring data and log events is available, it is hard to determine which events or metrics are critical for fault diagnosis. Existing approaches model faults as deviations from normal behavior and are therefore less applicable in the cloud, where changes in the environment may alter what is considered normal. In this work, we propose an adaptive and flexible fault diagnosis framework that automatically identifies the key fault indicators and detects fault patterns. Leveraging ideas from social media, we represent the hierarchical relationships among metrics and events, as well as how they relate to faults, and apply the EdgeRank algorithm to determine the key events that contribute to a fault. Our approach works across different environments to detect potential faults. We evaluated our framework on a cloud-based enterprise system with a set of injected faults ranging from environmental faults (e.g., virtual machine or network) to application degradation, in both private and public clouds. Our solution achieves over 90% detection accuracy with modest overhead, and a comparison with alternative approaches from the literature shows that it is more accurate.
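The abstract says EdgeRank is applied to a graph linking metrics and events to faults, but it does not give the scoring details. The following is a minimal sketch, assuming the publicly described EdgeRank form (a node's score is the sum over its edges of affinity × edge weight × time decay, i.e. Σ u_e·w_e·d_e) and assuming each monitored metric or log event is connected to a fault by edges carrying those three factors. The `Edge` type, the exponential `time_decay` with its `half_life` parameter, and `key_indicators` are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Edge:
    """Hypothetical edge linking one metric or log event to a fault."""
    source: str      # metric or log event name (illustrative)
    affinity: float  # u_e: assumed strength of association with the fault
    weight: float    # w_e: assumed importance of this edge type
    age: float       # seconds since the event was observed


def time_decay(age: float, half_life: float = 300.0) -> float:
    """d_e: assumed exponential decay; the factor halves every half_life seconds."""
    return 0.5 ** (age / half_life)


def edgerank(edges: List[Edge], half_life: float = 300.0) -> Dict[str, float]:
    """Sum u_e * w_e * d_e over all edges that share the same source node."""
    scores: Dict[str, float] = {}
    for e in edges:
        contribution = e.affinity * e.weight * time_decay(e.age, half_life)
        scores[e.source] = scores.get(e.source, 0.0) + contribution
    return scores


def key_indicators(edges: List[Edge], top_k: int = 2,
                   half_life: float = 300.0) -> List[str]:
    """Return the top_k sources ranked by their EdgeRank-style score."""
    scores = edgerank(edges, half_life)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Usage with made-up values (not data from the paper).
edges = [
    Edge("cpu_util_high", affinity=0.9, weight=1.0, age=0.0),
    Edge("disk_io_wait", affinity=0.6, weight=1.0, age=300.0),
    Edge("net_packet_loss", affinity=0.8, weight=0.5, age=0.0),
    Edge("cpu_util_high", affinity=0.2, weight=1.0, age=600.0),
]
print(key_indicators(edges))  # ['cpu_util_high', 'net_packet_loss']
```

Summing contributions per source lets recent, strongly associated signals rise to the top. In this sketch the affinities and weights are supplied by hand; a real diagnosis system would have to learn or configure them.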
Year
2013
DOI
10.1109/CloudCom.2013.68
Venue
CloudCom (1)
Keywords
approaches model fault, fault diagnosis, automatic fault diagnosis, key fault indicator, key event, potential fault, approach work, flexible fault diagnosis framework, cloud computing, cloud infrastructure, alternative approach, fault pattern
Field
Enterprise system, Virtual machine, Fault coverage, Computer science, Business data processing, Software fault tolerance, Real-time computing, Fault indicator, Cloud computing, Distributed computing
DocType
Conference
ISSN
2330-2194
Citations
6
PageRank
0.49
References
12
Authors
3
Name | Order | Citations | PageRank
Qian Zhu | 1 | 67 | 2.57
Teresa Tung | 2 | 17 | 2.11
Qing Xie | 3 | 6 | 0.49