Failure recovery: when the cure is worse than the disease - Citegraph

Paper Info

Title
Failure recovery: when the cure is worse than the disease

Abstract
Cloud services inevitably fail: machines lose power, networks become disconnected, pesky software bugs cause sporadic crashes, and so on. Unfortunately, failure recovery itself is often faulty; e.g. recovery can accidentally recursively replicate small failures to other machines until the entire cloud service fails in a catastrophic outage, amplifying a small cold into a contagious deadly plague! We propose that failure recovery should be engineered foremost according to the maxim of primum non nocere, that it "does no harm." Accordingly, we must consider the system holistically when failure occurs and recover only when observed activity safely allows for it.

Year: 2013
Venue: HotOS
Keywords: failure recovery, pesky software bug, small failure, primum non nocere, sporadic crash, catastrophic outage, entire cloud service, cloud service, observed activity, small cold
Field: Computer science, Computer security, Harm, Software bug, Maxim, Primum non nocere, Recursion, Cloud computing
DocType: Conference
Citations: 18
PageRank: 0.73
References: 14
Authors: 11

Authors (11 rows)

Name	Order	Citations	PageRank
Zhenyu Guo	1	512	39.61
Sean McDirmid	2	175	13.55
Mao Yang	3	496	30.94
Li Zhuang	4	238	10.65
Pu Zhang	5	18	0.73
Yingwei Luo	6	315	41.30
Tom Bergan	7	18	0.73
Peter Bodík	8	1182	51.66
Madan Musuvathi	9	116	7.62
Zheng Zhang	10	1193	73.82
Lidong Zhou	11	2136	147.82
