Title
EP-MEANS: an efficient nonparametric clustering of empirical probability distributions
Abstract
Given a collection of m continuous-valued, one-dimensional empirical probability distributions {P1, ..., Pm}, how can we cluster these distributions efficiently with a nonparametric approach? Such problems arise in many real-world settings where representing a distribution by its moments is not appropriate, because either some of the moments are not defined or the distributions are heavy-tailed or bimodal. Examples include mining distributions of inter-arrival times and phone-call lengths. We present an efficient algorithm with a nonparametric model for clustering empirical, one-dimensional, continuous probability distributions. Our algorithm, called ep-means, is based on the Earth Mover's Distance and k-means clustering. We illustrate the utility of ep-means on various data sets and applications. In particular, we demonstrate that ep-means effectively and efficiently clusters probability distributions of mixed and arbitrary shapes, recovering ground-truth clusters exactly in cases where existing methods perform at baseline accuracy. We also demonstrate that ep-means outperforms moment-based classification techniques and discovers useful patterns in a variety of real-world applications.
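The abstract names two ingredients, the Earth Mover's Distance and k-means. The following Python sketch shows one way they could be combined for one-dimensional samples. It is not the authors' implementation. It relies on the fact that, in one dimension, the Earth Mover's Distance between two distributions equals the area between their quantile functions, and approximates that area on a fixed quantile grid. The function names, the grid size, the random initialization, and the choice of the mean quantile profile as the cluster centroid are all assumptions made for illustration.

```python
import numpy as np

def quantile_profile(sample, n_quantiles=100):
    """Summarize a 1-D sample by its empirical quantiles on a fixed grid."""
    probs = (np.arange(n_quantiles) + 0.5) / n_quantiles
    return np.quantile(np.asarray(sample, dtype=float), probs)

def emd_1d(q_a, q_b):
    """Approximate 1-D Earth Mover's Distance: mean absolute quantile gap."""
    return float(np.mean(np.abs(q_a - q_b)))

def cluster_distributions(samples, k, n_quantiles=100, n_iter=50, seed=0):
    """k-means-style clustering of empirical distributions (illustrative only)."""
    rng = np.random.default_rng(seed)
    profiles = np.stack([quantile_profile(s, n_quantiles) for s in samples])
    # Assumption: initialize centroids from k randomly chosen input distributions.
    centroids = profiles[rng.choice(len(profiles), size=k, replace=False)]
    labels = np.full(len(profiles), -1)
    for _ in range(n_iter):
        # Pairwise approximate EMD between every profile and every centroid.
        dists = np.abs(profiles[:, None, :] - centroids[None, :, :]).mean(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
        for j in range(k):
            members = profiles[labels == j]
            if len(members):
                # Assumption: the centroid is the mean quantile profile.
                centroids[j] = members.mean(axis=0)
    return labels, centroids
```

Calling `cluster_distributions(samples, k=2)` on a list of one-dimensional arrays returns one label per input distribution together with the centroid quantile profiles. No moments are computed at any point, so heavy-tailed or bimodal inputs are handled the same way as any other sample.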
Year
2015
DOI
10.1145/2695664.2695860
Venue
SAC 2015: Symposium on Applied Computing, Salamanca, Spain, April 2015
Field
Recommender system, Cluster (physics), Data set, Computer science, Algorithm, Empirical probability, Nonparametric statistics, Theoretical computer science, Probability distribution, Cluster analysis
DocType
Conference
ISBN
978-1-4503-3196-8
Citations
4
PageRank
0.40
References
9
Authors
3
Name              Order  Citations  PageRank
Keith Henderson   1      477        24.73
Brian Gallagher   2      1673       86.45
Tina Eliassi-Rad  3      1597       108.63