Title | ||
---|---|---|
EP-MEANS: an efficient nonparametric clustering of empirical probability distributions |
Abstract | ||
---|---|---|
Given a collection of m continuous-valued, one-dimensional empirical probability distributions {P1, ..., Pm}, how can we cluster these distributions efficiently with a nonparametric approach? Such problems arise in many real-world settings where keeping the moments of the distribution is not appropriate, because either some of the moments are not defined or the distributions are heavy-tailed or bimodal. Examples include mining distributions of inter-arrival times and phone-call lengths. We present an efficient algorithm with a nonparametric model for clustering empirical, one-dimensional, continuous probability distributions. Our algorithm, called ep-means, is based on the Earth Mover's Distance and k-means clustering. We illustrate the utility of ep-means on various data sets and applications. In particular, we demonstrate that ep-means effectively and efficiently clusters probability distributions of mixed and arbitrary shapes, recovering ground-truth clusters exactly in cases where existing methods perform at baseline accuracy. We also demonstrate that ep-means outperforms moment-based classification techniques and discovers useful patterns in a variety of real-world applications. |
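
The abstract's core idea, combining the Earth Mover's Distance with k-means on one-dimensional empirical distributions, can be sketched roughly as follows. This is an illustrative sketch only and is not taken from the paper: the function names (`quantile_embedding`, `emd_1d`, `cluster_distributions`), the fixed quantile grid, and the mean-based centroid update are all assumptions made for this example. It relies on the standard fact that, in one dimension, the Earth Mover's Distance between two distributions equals the area between their quantile functions, which is approximated here on a finite grid.

```python
import numpy as np

def quantile_embedding(sample, n_quantiles=100):
    """Represent a 1-D sample by its quantile function on a fixed midpoint grid (assumed design)."""
    qs = (np.arange(n_quantiles) + 0.5) / n_quantiles
    return np.quantile(sample, qs)

def emd_1d(qa, qb):
    """Approximate 1-D Earth Mover's Distance as the mean absolute gap between quantile vectors."""
    return np.mean(np.abs(qa - qb))

def cluster_distributions(samples, k, n_quantiles=100, n_iter=50, seed=0):
    """k-means-style loop over quantile vectors (hypothetical sketch, not the authors' code)."""
    rng = np.random.default_rng(seed)
    X = np.stack([quantile_embedding(s, n_quantiles) for s in samples])
    # Initialise centroids with k distinct input distributions.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Pairwise approximate EMD between every distribution and every centroid.
        dists = np.abs(X[:, None, :] - centroids[None, :, :]).mean(axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):
                # Assumed update rule: average the members' quantile functions.
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```

Calling `cluster_distributions(samples, k=2)` on a list of one-dimensional NumPy arrays returns one cluster label per input sample together with the centroid quantile vectors. Because only quantiles are compared, no moments of the distributions need to exist, which matches the motivation given in the abstract.
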
Year | DOI | Venue |
---|---|---|
2015 | 10.1145/2695664.2695860 | SAC 2015: Symposium on Applied Computing, Salamanca, Spain, April 2015 |
Field | DocType | ISBN |
Recommender system, Cluster (physics), Data set, Computer science, Algorithm, Empirical probability, Nonparametric statistics, Theoretical computer science, Probability distribution, Cluster analysis | Conference | 978-1-4503-3196-8 |
Citations | PageRank | References |
4 | 0.40 | 9 |
Authors | ||
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Keith Henderson | 1 | 477 | 24.73 |
Brian Gallagher | 2 | 1673 | 86.45 |
Tina Eliassi-Rad | 3 | 1597 | 108.63 |