Title
Multi-task Hierarchical Classification for Disk Failure Prediction in Online Service Systems
Abstract
Disk failure is one of the most common threats to the reliability of online service systems. Many disk failure prediction techniques have been developed to predict failures before they occur, allowing proactive steps to be taken to minimize service disruption and increase service reliability. Existing approaches do not differentiate among types of disk failure. In industrial practice, however, different product teams in large-scale online service systems such as Microsoft 365 treat distinct types of disk failure as different prediction tasks. For example, the hardware operation team is concerned with physical disk errors, while the database service team focuses on I/O delay. In this paper, we propose MTHC (Multi-Task Hierarchical Classification) to enhance the performance of disk failure prediction for each task via multi-task learning. In addition, MTHC introduces a novel hierarchy-aware mechanism to deal with data imbalance, a severe problem in disk failure prediction. We show that MTHC can be easily applied to enhance most state-of-the-art disk failure prediction models. Our experiments on both industrial and public datasets demonstrate that disk failure prediction models enhanced by MTHC perform much better than the same models without MTHC. Our experiments also show that the hierarchy-aware mechanism underlying MTHC alleviates the data imbalance problem and thus improves the practical performance of various disk failure prediction models. More encouragingly, MTHC has been successfully applied to Microsoft 365 online service systems, where it reduces the number of virtual machine interruptions by 10% per month on average.
Year
2022
DOI
10.1145/3534678.3539176
Venue
KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
DocType
Conference
Citations
0
PageRank
0.34
References
0
Authors
15
Name              Order  Citations  PageRank
Yudong Liu        1      0          0.34
Hailan Yang       2      0          0.34
Pu Zhao           3      8          7.23
Minghua Ma        4      0          0.68
Chengwu Wen       5      0          0.34
Hongyu Zhang      6      864        50.03
Chuan Luo         7      496        41.38
Qingwei Lin       8      285        27.76
Chang Yi          9      0          0.34
Jiaojian Wang     10     0          0.34
Chenjian Zhang    11     0          0.34
Paul Wang         12     0          0.34
Yingnong Dang     13     537        26.92
Saravan Rajmohan  14     0          1.69
Dongmei Zhang     15     1439       132.94