Title |
---|
Near-Realtime Server Reboot Monitoring and Root Cause Analysis in a Large-Scale System |
Abstract |
---|
Large-scale Internet services run on a fleet of distributed servers, and the continuous availability of the hardware is key to the robustness of the services. Unplanned reboots disrupt the services running on the hardware and lower the fleet availability. Server reboots are also important signals that could indicate underlying issues such as memory leaks from the services, catastrophic hardware fa... |
Year | DOI | Venue |
---|---|---|
2021 | 10.1109/DSN-S52858.2021.00027 | 2021 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks - Supplemental Volume (DSN-S) |
Keywords | DocType | ISSN |
---|---|---|
server reboots, datacenter, availability, near realtime, large scale production system, data engineering | Conference | 1530-0889 |
ISBN | Citations | PageRank |
---|---|---|
978-1-6654-3566-6 | 0 | 0.34 |
References | Authors |
---|---|
0 | 7 |
Name | Order | Citations | PageRank |
---|---|---|---|
Fred Lin | 1 | 0 | 0.34 |
Bhargav Bolla | 2 | 0 | 0.34 |
Eric Pinkham | 3 | 0 | 0.34 |
Neil Kodner | 4 | 0 | 0.34 |
Daniel Moore | 5 | 0 | 0.34 |
Amol Desai | 6 | 0 | 0.68 |
Sriram Sankar | 7 | 1 | 1.70 |