EP-MEANS: an efficient nonparametric clustering of empirical probability distributions - Citegraph

Paper Info

Title
EP-MEANS: an efficient nonparametric clustering of empirical probability distributions

Abstract
Given a collection of m continuous-valued, one-dimensional empirical probability distributions {P1, ..., Pm}, how can we cluster these distributions efficiently with a nonparametric approach? Such problems arise in many real-world settings where keeping the moments of the distribution is not appropriate, because either some of the moments are not defined or the distributions are heavy-tailed or bi-modal. Examples include mining distributions of inter-arrival times and phone-call lengths. We present an efficient algorithm with a non-parametric model for clustering empirical, one-dimensional, continuous probability distributions. Our algorithm, called ep-means, is based on the Earth Mover's Distance and k-means clustering. We illustrate the utility of ep-means on various data sets and applications. In particular, we demonstrate that ep-means effectively and efficiently clusters probability distributions of mixed and arbitrary shapes, recovering ground-truth clusters exactly in cases where existing methods perform at baseline accuracy. We also demonstrate that ep-means outperforms moment-based classification techniques and discovers useful patterns in a variety of real-world applications.
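
The abstract describes ep-means as a combination of the Earth Mover's Distance and k-means clustering. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes that each distribution is summarized by a fixed number of empirical quantiles, that the one-dimensional Earth Mover's Distance is approximated by the mean absolute gap between two quantile profiles, that a cluster centroid is the pooled sample of its members, and that centroids are initialized farthest-first. The names quantile_profile, emd_1d, and cluster_distributions are hypothetical.

```python
import random


def quantile_profile(sample, n_q=50):
    """Summarize a 1-D sample by n_q evenly spaced empirical quantiles."""
    s = sorted(sample)
    n = len(s)
    return [s[min(n - 1, int((i + 0.5) * n / n_q))] for i in range(n_q)]


def emd_1d(qa, qb):
    """Approximate the 1-D EMD as the mean absolute gap between quantile profiles."""
    return sum(abs(a - b) for a, b in zip(qa, qb)) / len(qa)


def cluster_distributions(samples, k, n_iter=20):
    """k-means-style loop over empirical distributions (illustrative only)."""
    profiles = [quantile_profile(s) for s in samples]

    # Assumption: farthest-first initialization, chosen here for determinism.
    centroids = [profiles[0]]
    while len(centroids) < k:
        far = max(profiles, key=lambda p: min(emd_1d(p, c) for c in centroids))
        centroids.append(far)

    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid under the approximate EMD.
        new_labels = [
            min(range(k), key=lambda j: emd_1d(p, centroids[j])) for p in profiles
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels

        # Update step (assumption): the centroid is the pooled sample of its members.
        for j in range(k):
            pooled = [x for s, lab in zip(samples, labels) if lab == j for x in s]
            if pooled:
                centroids[j] = quantile_profile(pooled)
    return labels


# Synthetic data: unimodal and bimodal samples that share a mean of about 0.
rng = random.Random(0)
unimodal = [[rng.gauss(0, 1) for _ in range(300)] for _ in range(5)]
bimodal = [
    [rng.gauss(rng.choice([-3, 3]), 0.5) for _ in range(300)] for _ in range(5)
]
labels = cluster_distributions(unimodal + bimodal, k=2)
print(labels)
```

Both synthetic groups have a mean near zero, so a comparison of means alone would not separate them, while the quantile-based distance reflects the difference in shape.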

Year	DOI	Venue
2015	10.1145/2695664.2695860	SAC 2015: Symposium on Applied Computing, Salamanca, Spain, April 2015
Field	DocType	ISBN
Recommender system, Cluster (physics), Data set, Computer science, Algorithm, Empirical probability, Nonparametric statistics, Theoretical computer science, Probability distribution, Cluster analysis	Conference	978-1-4503-3196-8
Citations	PageRank	References
4	0.40	9
Authors
3

Authors (3 rows)

Name	Order	Citations	PageRank
Keith Henderson	1	477	24.73
Brian Gallagher	2	1673	86.45
Tina Eliassi-Rad	3	1597	108.63
