Title
Toward online testing of federated and heterogeneous distributed systems
Abstract
Making distributed systems reliable is notoriously difficult. It is even more difficult to achieve high reliability for federated and heterogeneous systems, i.e., those that are operated by multiple administrative entities and have numerous interoperable implementations. A prime example of such a system is the Internet's inter-domain routing, today based on BGP. We argue that system reliability should be improved by proactively identifying potential faults using online testing. We propose DiCE, an approach that continuously and automatically explores the system behavior to check whether the system deviates from its desired behavior. DiCE orchestrates the exploration of relevant system behaviors by subjecting system nodes to many possible inputs that exercise node actions. DiCE starts exploring from the current, live system state and operates in isolation from the deployed system. We describe our experience in integrating DiCE with an open-source BGP router. We evaluate the prototype's ability to quickly detect origin misconfiguration, a recurring operator mistake that causes Internet-wide outages. We also quantify DiCE's overhead and find that it has only a marginal impact on system performance.
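The core loop described in the abstract (start from a snapshot of the live state, feed a node many candidate inputs in isolation, and flag any input that drives it away from its desired behavior) can be illustrated with a toy model of origin misconfiguration. This is a minimal Python sketch under stated assumptions, not DiCE's implementation: the router model, the per-peer import filter of (prefix, origin AS) pairs, the "expected origin" property, and all names (RouterState, explore, candidate_announcements) are hypothetical, and candidate inputs are simply enumerated rather than generated the way DiCE itself does.

```python
import copy
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Announcement:
    prefix: str
    origin_as: int
    from_peer: str


@dataclass
class RouterState:
    # Hypothetical toy model: prefix -> origin AS of the installed route.
    rib: dict[str, int] = field(default_factory=dict)
    # Hypothetical per-peer import filter: peer -> allowed (prefix, origin AS) pairs.
    # A peer absent from this dict has no filter at all, modelling a misconfigured session.
    import_filters: dict[str, set] = field(default_factory=dict)

    def process(self, ann: Announcement) -> None:
        allowed = self.import_filters.get(ann.from_peer)
        if allowed is not None and (ann.prefix, ann.origin_as) not in allowed:
            return  # rejected by the import filter
        # Naive route selection for the sketch: the newest announcement wins.
        self.rib[ann.prefix] = ann.origin_as


def origin_property_holds(state: RouterState, expected: dict[str, int]) -> bool:
    # Desired behavior (assumed): every installed route for a monitored prefix
    # carries its expected origin AS; prefixes not installed are ignored.
    return all(state.rib.get(p, o) == o for p, o in expected.items())


def candidate_announcements(peers, prefixes, origins) -> list[Announcement]:
    # Plain enumeration of inputs; a stand-in for smarter input generation.
    return [Announcement(p, o, peer) for peer in peers for p in prefixes for o in origins]


def explore(live: RouterState, inputs, expected: dict[str, int]) -> list[Announcement]:
    violations = []
    for ann in inputs:
        shadow = copy.deepcopy(live)  # isolation: the live state is never modified
        shadow.process(ann)
        if not origin_property_holds(shadow, expected):
            violations.append(ann)
    return violations


# Usage: peerA is filtered correctly, peerB has no filter configured.
live = RouterState(rib={"10.0.0.0/8": 65001},
                   import_filters={"peerA": {("10.0.0.0/8", 65001)}})
found = explore(live,
                candidate_announcements(["peerA", "peerB"], ["10.0.0.0/8"], [65001, 65002]),
                {"10.0.0.0/8": 65001})
print(found)
```

Here the only flagged input is an announcement of 10.0.0.0/8 with origin AS 65002 arriving over the unfiltered peerB session, while the live routing table is left untouched because every input is applied to a deep copy. Deep-copying the whole state per input keeps the sketch simple; a real system would need a far cheaper checkpointing mechanism.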
Year
2011
Venue
USENIX Annual Technical Conference
Keywords
opensource bgp router, system performance, live system state, high reliability, relevant system behavior, system node, heterogeneous system, system deviate, online testing, system behavior, system reliability, computer science
Field
Prime (order theory), Mistake, Computer science, Real-time computing, Implementation, Border Gateway Protocol, Operator (computer programming), Dice, The Internet, Distributed computing
DocType
Conference
Citations
5
PageRank
0.47
References
11
Authors
6
Name, Order, Citations, PageRank
Marco Canini, 1, 857, 60.21
Vojin Jovanovic, 2, 103, 5.03
Daniele Venzano, 3, 221, 16.42
Boris Spasojević, 4, 5, 0.47
Olivier Crameri, 5, 72, 4.67
Dejan Kostic, 6, 1707, 119.11