Title
Fault Localization via Risk Modeling
Abstract
Internet backbone networks are under constant flux in order to keep up with demand and offer new features. The pace of change in technology often outstrips the pace of introduction of associated fault monitoring capabilities that are built into today's IP protocols and routers. Moreover, some of these new technologies cross networking layers, raising the potential for unanticipated interactions and service disruptions, which the individual layers' built-in monitoring capabilities may not detect. In these instances, operators typically employ higher layer monitoring techniques such as end-to-end liveness probing to detect lower or cross-layer failures, but lack tools to precisely determine where a detected failure may have occurred. In this paper, we evaluate the effectiveness of using risk modeling to translate high-level failure notifications into lower layer root causes in two specific scenarios in a tier-1 ISP. We show that a simple greedy heuristic works with accuracy exceeding 99 percent for many failure scenarios in simulation, while delivering extremely high precision (greater than 80 percent). We report our operational experience using risk modeling to isolate optical component and MPLS control plane failures in an ISP backbone.
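The abstract does not spell out the greedy heuristic, so the following is only a minimal sketch of one plausible reading: treat each lower-layer component as a "risk group" (the set of higher-layer elements that fail when it fails) and greedily pick the groups that explain the most observed failures. The function name `greedy_localize`, the example component and link names, and the strict rule that a group is a candidate only when all of its members have failed are assumptions made for illustration, not details taken from the paper.

```python
from typing import Dict, List, Set


def greedy_localize(risk_groups: Dict[str, Set[str]], failed: Set[str]) -> List[str]:
    """Greedily choose risk groups until the observed failures are explained.

    risk_groups maps a (hypothetical) lower-layer component to the set of
    higher-layer elements that share its fate; failed is the set of
    elements currently reported as down.
    """
    unexplained = set(failed)
    hypothesis: List[str] = []
    while unexplained:
        # Assumption: a group is a candidate only if every member has failed,
        # and it must explain at least one still-unexplained failure.
        candidates = {
            name: members & unexplained
            for name, members in risk_groups.items()
            if members <= failed and members & unexplained
        }
        if not candidates:
            break  # remaining failures cannot be explained by any group
        # Pick the group covering the most unexplained failures;
        # ties are broken alphabetically by iterating in sorted order.
        best = max(sorted(candidates), key=lambda name: len(candidates[name]))
        hypothesis.append(best)
        unexplained -= candidates[best]
    return hypothesis


# Hypothetical topology: two fibers and one router, each carrying some IP links.
example_groups = {
    "fiber-A": {"link1", "link2", "link3"},
    "fiber-B": {"link3", "link4"},
    "router-X": {"link5"},
}
example_failed = {"link1", "link2", "link3", "link5"}

print(greedy_localize(example_groups, example_failed))
```

In this toy input, `fiber-B` is never chosen because `link4` is still up, so the sketch attributes the failures to `fiber-A` and `router-X`. A practical system would likely need to tolerate missing or noisy notifications rather than require every member of a group to have failed.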
Year: 2010
DOI: 10.1109/TDSC.2009.37
Venue: IEEE Trans. Dependable Sec. Comput.
Keywords: high-level failure notification, failure scenario, individual layer, fault localization, risk modeling, internet backbone network, higher layer monitoring technique, mpls control plane failure, cross-layer failure, associated fault monitoring capability, built-in monitoring capability, isp backbone, internet, ip routing, mpls, spatial correlation, network management system, multiprotocol label switching, tier 1 isp, ospf, failure analysis, fault management, networking layers, optical fiber, data model, risk management, topology, failure mode, ip protocols
Field: Open Shortest Path First, Internet Protocol, Multiprotocol Label Switching, Computer science, Network layer, Computer network, Real-time computing, Greedy algorithm, Risk management, Internet backbone, Liveness, Distributed computing
DocType: Journal
Volume: 7
Issue: 4
ISSN: 1545-5971
Citations: 12
PageRank: 0.75
References: 20
Authors: 4
Name | Order | Citations | PageRank
Ramana Rao Kompella | 1 | 1029 | 57.23
Jennifer Yates | 2 | 790 | 64.51
Albert G. Greenberg | 3 | 5970 | 676.74
Alex C. Snoeren | 4 | 3228 | 239.85