Title
IP fault localization via risk modeling
Abstract
Automated, rapid, and effective fault management is a central goal of large operational IP networks. Today's networks suffer from a wide and volatile set of failure modes, where the underlying fault proves difficult to detect and localize, thereby delaying repair. One of the main challenges stems from operational reality: IP routing and the underlying optical fiber plant are typically described by disparate data models and housed in distinct network management systems. We introduce a fault-localization methodology based on the use of risk models and an associated troubleshooting system, SCORE (Spatial Correlation Engine), which automatically identifies likely root causes across layers. In particular, we apply SCORE to the problem of localizing link failures in IP and optical networks. In experiments conducted on a tier-1 ISP backbone, SCORE proved remarkably effective at localizing optical link failures using only IP-layer event logs. Moreover, SCORE was often able to automatically uncover inconsistencies in the databases that maintain the critical associations between the IP and optical networks.
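The abstract does not spell out how SCORE correlates failures, so the following is only a minimal sketch of one plausible approach to risk-model localization, not the authors' algorithm. It assumes each shared risk group (for example, a fiber span) is mapped to the set of IP links that depend on it, and it uses a greedy set cover to pick the fewest groups that explain the observed IP-layer failures. The names `localize`, `risk_groups`, and `failed_links` are hypothetical.

```python
from typing import Dict, List, Set


def localize(risk_groups: Dict[str, Set[str]], failed_links: Set[str]) -> List[str]:
    """Greedy set cover over a risk model (illustrative assumption, not SCORE itself).

    risk_groups maps a shared-risk identifier (e.g. a fiber span) to the
    IP links that would fail together if that component failed.
    Returns the chosen risk groups in the order they were selected.
    """
    unexplained = set(failed_links)
    hypothesis: List[str] = []
    while unexplained and risk_groups:
        # Pick the group covering the most still-unexplained failures;
        # iterating in sorted order makes ties resolve alphabetically.
        best = max(sorted(risk_groups), key=lambda g: len(risk_groups[g] & unexplained))
        covered = risk_groups[best] & unexplained
        if not covered:
            break  # remaining failures appear in no risk group
        hypothesis.append(best)
        unexplained -= covered
    return hypothesis


if __name__ == "__main__":
    groups = {
        "fiber_A": {"ip1", "ip2", "ip3"},
        "fiber_B": {"ip3", "ip4"},
        "port_X": {"ip4"},
    }
    print(localize(groups, {"ip1", "ip2", "ip3"}))
```

A failure that matches no risk group is left unexplained rather than forced onto a group. In a setting like the one the abstract describes, such leftovers could hint at an inconsistency in the database that associates IP links with optical components.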
Year
2005
Venue
NSDI
Keywords
localizing link failure, optical link failure, risk modeling, ip fault localization, optical network, distinct network management system, ip routing, operational reality, underlying optical fiber plant, underlying fault, effective fault management, large operational ip network, fault management, optical fiber, network management system, data model, failure mode
Field
Troubleshooting, Optical link, Shared Risk Resource Group, Computer science, Computer network, Fault management, Real-time computing, Disparate system, Network monitoring, Loose Source Routing, IP forwarding, Distributed computing
DocType
Conference
Citations
72
PageRank
5.20
References
12
Authors
4
Name, Order, Citations, PageRank
Ramana Rao Kompella, 1, 1029, 57.23
Jennifer Yates, 2, 790, 64.51
Albert G. Greenberg, 3, 5970, 676.74
Alex C. Snoeren, 4, 3228, 239.85