Title |
---|
Learning a Hierarchical Monitoring System for Detecting and Diagnosing Service Issues |
Abstract |
---|
We propose a machine-learning-based framework for building a hierarchical monitoring system to detect and diagnose service issues. We demonstrate its use in building a monitoring system for a distributed data storage and computing service consisting of tens of thousands of machines. Our solution has been deployed in production as an end-to-end system, from telemetry data collection on individual machines to a visualization tool that service operators use to examine the detection outputs. We present evaluation results on detecting 19 customer-impacting issues over the past three months. |
Year | DOI | Venue |
---|---|---|
2015 | 10.1145/2783258.2788624 | ACM Knowledge Discovery and Data Mining |
Field | DocType | Citations |
Data collection, Data mining, Monitoring system, Visualization, Computer science, Distributed data store, Telemetry, Past Three Months, Unsupervised learning, Artificial intelligence, Machine learning | Conference | 8 |
PageRank | References | Authors |
0.49 | 18 | 8 |
Name | Order | Citations | PageRank |
---|---|---|---|
Vinod Nair | 1 | 1658 | 134.40 |
Ameya Raul | 2 | 8 | 0.49 |
Shwetabh Khanduja | 3 | 8 | 0.49 |
Vikas Bahirwani | 4 | 29 | 1.85 |
Sundararajan Sellamanickam | 5 | 127 | 14.07 |
S. Sathiya Keerthi | 6 | 4455 | 527.30 |
Steve Herbert | 7 | 8 | 0.49 |
Sudheer Dhulipalla | 8 | 9 | 0.87 |