Title
Locating the Clues of Declining Success Rate of Service Calls
Abstract
For many online systems with massive numbers of users, providing services continuously and steadily is vital to the business, which requires that service anomalies be located and resolved in a timely manner. As common IT infrastructure, various APM (Application Performance Management) systems and frameworks have been adopted to monitor each call request to a service. However, a call request may carry multidimensional attributes (e.g., City, ISP, Platform), each of which may take multiple values (e.g., ISP could be T-Mobile or CMCC). As a result, an anomaly such as a DSR (Declining Success Rate) of a service typically occurs with a combination of such attribute values, and the potentially huge number of combinations makes locating the root cause of the anomaly a major challenge. In this paper, we propose a novel method, ImpAPTr (Impact Analysis based on Pruning Tree), to identify in a timely manner the combinations of dimensional attributes that serve as clues to the root cause of DSR anomalies. In an evaluation on a simulated dataset, ImpAPTr detects valid clues in milliseconds with an accuracy of 99.37% (within the top 10 candidate results), 97.72% (top 5), and 94.51% (top 3), outperforming previous approaches by a large margin. A field test on a production-environment dataset indicates that ImpAPTr is able to detect valid clues within a few seconds.
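The abstract does not spell out how ImpAPTr scores or prunes combinations, so the block below is only a minimal, hypothetical sketch of the general idea under stated assumptions, not the authors' algorithm. It first counts how many attribute-value combinations exist (each dimension is either unconstrained or fixed to one value), which shows why exhaustive search blows up. It then walks the combinations breadth-first as a tree and prunes any branch whose "impact" falls below a threshold. The impact metric (a slice's excess failures relative to its baseline failure rate, as a share of the overall excess), the names `locate_clues`, `measure`, `min_impact`, `top_k`, and the toy City/ISP/Platform traffic are all assumptions made for illustration.

```python
import math
from itertools import product

# A call record: (attribute values, succeeded?). Dimension names are illustrative.
Record = tuple[dict[str, str], bool]
Combo = tuple[tuple[str, str], ...]


def num_combinations(cardinalities: list[int]) -> int:
    """Each dimension is unconstrained or fixed to one value; drop the empty combination."""
    return math.prod(n + 1 for n in cardinalities) - 1


def stats(records: list[Record], combo: Combo) -> tuple[int, int]:
    """Count (calls, failures) among records matching every (dimension, value) pair."""
    calls = fails = 0
    for attrs, ok in records:
        if all(attrs.get(dim) == val for dim, val in combo):
            calls += 1
            fails += not ok
    return calls, fails


def locate_clues(baseline: list[Record], current: list[Record], dims: list[str],
                 min_impact: float = 0.5, top_k: int = 3) -> list[tuple[Combo, float, float]]:
    """Hypothetical breadth-first search over combinations with impact-based pruning.

    Assumed metric (not from the paper): a slice's excess failures are its current
    failures minus those expected at its baseline failure rate; impact is that
    excess as a fraction of the excess over all calls.
    """
    base_calls_all, base_fails_all = stats(baseline, ())
    overall_base_rate = base_fails_all / base_calls_all if base_calls_all else 0.0

    def measure(combo: Combo) -> tuple[float, int, int, float]:
        cur_calls, cur_fails = stats(current, combo)
        base_calls, base_fails = stats(baseline, combo)
        # Slices unseen in the baseline fall back to the overall baseline rate.
        base_rate = base_fails / base_calls if base_calls else overall_base_rate
        return cur_fails - cur_calls * base_rate, cur_calls, cur_fails, base_rate

    total_excess = measure(())[0]
    if total_excess <= 0:
        return []  # no overall decline in success rate to explain
    values = {d: sorted({attrs[d] for attrs, _ in current if d in attrs}) for d in dims}
    frontier: list[Combo] = [()]
    found: list[tuple[Combo, float, float]] = []
    while frontier:
        next_frontier: list[Combo] = []
        for combo in frontier:
            # Extend only with dimensions after the last one used, so each
            # combination is generated exactly once.
            start = dims.index(combo[-1][0]) + 1 if combo else 0
            for d in dims[start:]:
                for v in values[d]:
                    child = combo + ((d, v),)
                    excess, calls, fails, base_rate = measure(child)
                    if calls == 0 or excess / total_excess < min_impact:
                        continue  # prune: this branch explains too little of the decline
                    drop = fails / calls - base_rate  # rise in failure rate = fall in success rate
                    found.append((child, round(excess / total_excess, 3), round(drop, 3)))
                    next_frontier.append(child)
        frontier = next_frontier
    # Rank surviving combinations by how sharply their success rate fell.
    return sorted(found, key=lambda item: item[2], reverse=True)[:top_k]


def make(fails_in_cell):
    """Toy traffic (hypothetical): 100 calls per (City, ISP, Platform) cell."""
    records = []
    for city, isp, platform in product("AB", "XY", ["android", "ios"]):
        attrs = {"City": city, "ISP": isp, "Platform": platform}
        failures = fails_in_cell(city, isp)
        records += [(attrs, i >= failures) for i in range(100)]
    return records


baseline = make(lambda city, isp: 1)
current = make(lambda city, isp: 31 if (city, isp) == ("A", "X") else 1)
print(num_combinations([2, 2, 2]))  # 26
print(locate_clues(baseline, current, ["City", "ISP", "Platform"], min_impact=0.8)[0])
# ((('City', 'A'), ('ISP', 'X')), 1.0, 0.3)
```

In this toy run, the failures injected into City=A and ISP=X are recovered as the top-ranked clue, while Platform-level slices (which explain only half of the excess) are pruned early. Note that the impact score is not guaranteed to shrink monotonically down the tree, so threshold pruning here is a heuristic that trades completeness for speed.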
Year
2020
DOI
10.1109/ISSRE5003.2020.00039
Venue
2020 IEEE 31st International Symposium on Software Reliability Engineering (ISSRE)
Keywords
On-line service, Continuity, Anomaly, Multiple attributes
DocType
Conference
ISSN
1071-9458
ISBN
978-1-7281-9871-2
Citations
0
PageRank
0.34
References
0
Authors
7
Name | Order | Citations | PageRank
Guoping Rong | 1 | 35 | 12.92
Hao Wang | 2 | 2 | 3.41
Yong You | 3 | 0 | 0.68
He Zhang | 4 | 817 | 65.63
Jialin Sun | 5 | 0 | 0.34
Dong Shao | 6 | 38 | 10.52
Yangchen Xu | 7 | 0 | 0.34