Title
Dirichlet process mixture models made scalable and effective by means of massive distribution
Abstract
Clustering with accurate results has become a topic of high interest. The Dirichlet Process Mixture (DPM) is a clustering model that discovers the number of clusters automatically and offers desirable properties such as its potential convergence to the actual clusters in the data. These advantages come at the price of prohibitive response times, which impair its adoption and make centralized DPM approaches inefficient. We propose DC-DPM, a parallel clustering solution that gracefully scales to millions of data points while remaining DPM compliant, which is the main challenge in distributing this process. Our experiments, on both synthetic and real-world data, illustrate the high performance of our approach on millions of data points. The centralized algorithm does not scale: it reaches its limit at 100K data points, where it needs more than 7 hours. On the same data, our approach needs less than 30 seconds.
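The abstract credits DPM with discovering the number of clusters automatically. The sketch below illustrates that property on a classic centralized sampler, not on DC-DPM: a collapsed Gibbs sampler in the style of Neal's Algorithm 3 for a one-dimensional Gaussian DPM. The conjugate Normal model, the known observation variance `sigma2`, the hyperparameters `alpha`, `mu0` and `tau2`, and the function names are all assumptions chosen for illustration. Note that every sweep visits every point and scores it against every current cluster, which hints at why a centralized sampler becomes slow on large datasets.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def predictive(x, n, total, mu0, tau2, sigma2):
    """Posterior predictive density of x for a cluster holding n points whose
    values sum to `total`, assuming a N(mu0, tau2) prior on the cluster mean
    and a known observation variance sigma2 (conjugate Normal-Normal model)."""
    precision = 1.0 / tau2 + n / sigma2
    post_mean = (mu0 / tau2 + total / sigma2) / precision
    post_var = 1.0 / precision
    return normal_pdf(x, post_mean, post_var + sigma2)


def dpm_gibbs(data, alpha=1.0, mu0=0.0, tau2=100.0, sigma2=1.0,
              iterations=100, seed=0):
    """Hypothetical collapsed Gibbs sampler for a 1-D Gaussian DPM.

    Returns one cluster label per point; the number of distinct labels is
    inferred by the sampler rather than fixed in advance."""
    rng = random.Random(seed)
    labels = [0] * len(data)      # start with every point in one cluster
    counts = {0: len(data)}       # cluster label -> number of points
    sums = {0: sum(data)}         # cluster label -> sum of its points
    next_label = 1
    for _ in range(iterations):
        for i, x in enumerate(data):
            # Remove point i from its current cluster; drop the cluster if empty.
            k = labels[i]
            counts[k] -= 1
            sums[k] -= x
            if counts[k] == 0:
                del counts[k], sums[k]
            # Existing clusters: size times predictive density (CRP prior) ...
            choices = list(counts)
            weights = [counts[c] * predictive(x, counts[c], sums[c], mu0, tau2, sigma2)
                       for c in choices]
            # ... and a brand-new cluster: alpha times the prior predictive.
            choices.append(next_label)
            weights.append(alpha * predictive(x, 0, 0.0, mu0, tau2, sigma2))
            k = rng.choices(choices, weights=weights)[0]
            if k == next_label:
                counts[k], sums[k] = 0, 0.0
                next_label += 1
            labels[i] = k
            counts[k] += 1
            sums[k] += x
    return labels


if __name__ == "__main__":
    gen = random.Random(42)
    data = [gen.gauss(-10, 1) for _ in range(20)] + [gen.gauss(10, 1) for _ in range(20)]
    labels = dpm_gibbs(data, iterations=50, seed=1)
    print("clusters found:", len(set(labels)))
```

With two well-separated groups, the sampler should usually settle on two dominant clusters without being told how many to look for; occasional short-lived singleton clusters are a normal part of the sampling.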
Year: 2019
DOI: 10.1145/3297280.3297327
Venue: SAC
Keywords: clustering, dirichlet process mixture model, parallelism
Field: Data point, Convergence (routing), Cluster (physics), Computer science, Dirichlet process mixture, Algorithm, Cluster analysis, Dirichlet process mixture model, Scalability
DocType: Conference
ISBN: 978-1-4503-5933-7
Citations: 0
PageRank: 0.34
References: 0
Authors: 4
Name | Order | Citations | PageRank
Khadidja Meguelati | 1 | 0 | 0.34
Benedicte Fontez | 2 | 0 | 0.34
Nadine Hilgert | 3 | 21 | 4.55
Florent Masseglia | 4 | 408 | 43.08