Title
Learning a Hierarchical Monitoring System for Detecting and Diagnosing Service Issues
Abstract
We propose a machine-learning-based framework for building a hierarchical monitoring system to detect and diagnose service issues. We demonstrate its use for building a monitoring system for a distributed data storage and computing service consisting of tens of thousands of machines. Our solution has been deployed in production as an end-to-end system, from telemetry data collection on individual machines to a visualization tool for service operators to examine the detection outputs. Evaluation results are presented on detecting 19 customer-impacting issues in the past three months.
Year
2015
DOI
10.1145/2783258.2788624
Venue
ACM Knowledge Discovery and Data Mining
Field
Data collection, Data mining, Monitoring system, Visualization, Computer science, Distributed data store, Telemetry, Past Three Months, Unsupervised learning, Artificial intelligence, Machine learning
DocType
Conference
Citations
8
PageRank
0.49
References
18
Authors
8
Name                         Order  Citations  PageRank
Vinod Nair                   1      1658       134.40
Ameya Raul                   2      8          0.49
Shwetabh Khanduja            3      8          0.49
Vikas Bahirwani              4      29         1.85
Sundararajan Sellamanickam   5      127        14.07
S. Sathiya Keerthi           6      4455       527.30
Steve Herbert                7      8          0.49
Sudheer Dhulipalla           8      9          0.87