Title
Less is More: Sparse Graph Mining with Compact Matrix Decomposition
Abstract
Given a large sparse graph, how can we find patterns and anomalies? Several important applications can be modeled as large sparse graphs, e.g., network traffic monitoring, research citation network analysis, social network analysis, and financial transactions. Low-rank decompositions, such as singular value decomposition (SVD) and CUR, are powerful techniques for revealing latent-hidden variables and associated patterns from high dimensional data. However, those methods often ignore the sparsity property of the graph, and hence usually incur too high memory and computational cost to be practical. We propose a novel method, the Compact Matrix Decomposition (CMD), to compute sparse low-rank approximations. CMD dramatically reduces both the computation cost and the space requirements over existing decomposition methods singular value decomposition (SVD) and CUR. Using CMD as the key building block, we further propose procedures to efficiently construct and analyze dynamic graphs from real-time application data. We provide theoretical guarantee for our methods, and present results on two real, large datasets, one on network flow data (100 GB trace of 22K hosts over one month) and one on DBLP (200 MB over 25 years). We show that CMD is often an order of magnitude more efficient than the state of the art (SVD and CUR): it is over 10X faster, but requires less than 1-10 of the space, for the same reconstruction accuracy. Finally, we demonstrate how CMD is used for detecting anomalies and monitoring time-evolving graphs, in which it successfully detects worm-like hierarchical scanning patterns in real network data. Copyright © 2007 Wiley Periodicals, Inc., A Wiley Company Statistical Analy Data Mining 1: 000-000, 2007
Year
DOI
Venue
2008
10.1002/sam.v1:1
Statistical Analysis and Data Mining
Keywords
Field
DocType
data mining,matrix decomposition
Flow network,Singular value decomposition,Data mining,Clustering high-dimensional data,Computer science,High memory,Matrix decomposition,Social network analysis,Artificial intelligence,Machine learning,Dense graph,Computation
Journal
Volume
Issue
Citations 
1
1
30
PageRank 
References 
Authors
2.66
15
4
Name
Order
Citations
PageRank
Jimeng Sun14729240.91
Yinglian Xie2114076.73
Hui Zhang388561002.58
Christos Faloutsos4279724490.38