Title
Uncovering Bugs in Distributed Storage Systems during Testing (not in Production!)
Abstract
Testing distributed systems is challenging due to multiple sources of nondeterminism. Conventional testing techniques, such as unit, integration and stress testing, are ineffective in preventing serious but subtle bugs from reaching production. Formal techniques, such as TLA+, can only verify high-level specifications of systems at the level of logic-based models, and fall short of checking the actual executable code. In this paper, we present a new methodology for testing distributed systems. Our approach applies advanced systematic testing techniques to thoroughly check that the executable code adheres to its high-level specifications, which significantly improves coverage of important system behaviors. Our methodology has been applied to three distributed storage systems in the Microsoft Azure cloud computing platform. In the process, numerous bugs were identified, reproduced, confirmed and fixed. These bugs required a subtle combination of concurrency and failures, making them extremely difficult to find with conventional testing techniques. An important advantage of our approach is that a bug is uncovered in a small setting and witnessed by a full system trace, which dramatically increases the productivity of debugging.
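The abstract describes systematic testing: the harness controls the system's nondeterminism (message ordering, failures), checks a specification, and records a trace that can be replayed. The Python sketch below shows that general shape on a toy two-replica store. It is not the authors' tooling, and the record does not describe their implementation. The names `ToySystem`, `run_once` and `explore`, the seeded-random scheduling strategy, and the deliberately planted bug (the primary acknowledges a write before replicating it) are all illustrative assumptions.

```python
import random
from dataclasses import dataclass, field

@dataclass
class ToySystem:
    """Hypothetical two-replica store with a planted bug: the primary
    acknowledges a write before the write reaches the backup."""
    primary: set = field(default_factory=set)
    backup: set = field(default_factory=set)
    acked: set = field(default_factory=set)
    primary_alive: bool = True
    inbox: list = field(default_factory=list)  # pending (kind, value) events

    def deliver(self, event):
        kind, value = event
        if kind == "write" and self.primary_alive:
            self.primary.add(value)
            self.acked.add(value)  # BUG: ack sent before replication
            # Replication is the primary's own next step, so it only
            # happens if the primary is still alive when it is scheduled.
            self.inbox.append(("replicate", value))
        elif kind == "replicate" and self.primary_alive:
            self.backup.add(value)
        elif kind == "crash_primary":
            self.primary_alive = False  # injected failure

    def invariant_holds(self):
        # Safety property: every acknowledged write survives on a live replica.
        live = self.backup | (self.primary if self.primary_alive else set())
        return self.acked <= live

def run_once(seed=None, trace=None, writes=(1, 2)):
    """One test iteration. Each scheduling choice comes from a seeded RNG
    (exploration) or from a recorded trace (replay)."""
    rng = random.Random(seed)
    system = ToySystem()
    system.inbox = [("write", w) for w in writes] + [("crash_primary", None)]
    choices = []
    while system.inbox:
        if trace is not None:
            i = trace[len(choices)]
        else:
            i = rng.randrange(len(system.inbox))
        choices.append(i)
        system.deliver(system.inbox.pop(i))
    return system.invariant_holds(), choices

def explore(iterations=1000):
    """Try many schedules; return (seed, trace) of the first violation, if any."""
    for seed in range(iterations):
        ok, choices = run_once(seed=seed)
        if not ok:
            return seed, choices
    return None

if __name__ == "__main__":
    found = explore()
    if found is not None:
        seed, choices = found
        print("violation with seed", seed, "trace", choices)
        ok, _ = run_once(trace=choices)  # deterministic replay of the bug
        print("replay reproduces violation:", not ok)
```

The setting is deliberately small: two writes and one injected crash are enough to expose the bug. The recorded list of scheduling choices acts as the full trace, so the failing execution can be re-run step by step when debugging.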
Year
2016
Venue
Conference on File and Storage Technologies
Field
Black-box testing, Computer science, System testing, Concurrency, Distributed data store, Real-time computing, White-box testing, Debugging, Cloud computing, Executable
DocType
Conference
Citations
2
PageRank
0.37
References
25
Authors
11
Name | Order | Citations | PageRank
Pantazis Deligiannis | 1 | 46 | 4.29
Matt McCutchen | 2 | 2 | 1.39
Paul Thomson | 3 | 122 | 5.85
Shuo Chen | 4 | 192 | 10.08
Alastair F. Donaldson | 5 | 661 | 52.35
John Erickson | 6 | 10 | 0.87
Cheng Huang | 7 | 720 | 43.59
Akash Lal | 8 | 537 | 32.12
Rashmi Mudduluru | 9 | 14 | 1.54
Shaz Qadeer | 10 | 3257 | 239.11
Wolfram Schulte | 11 | 2342 | 153.40