Title | ||
---|---|---|
Dirichlet process mixture models made scalable and effective by means of massive distribution. |
Abstract | ||
---|---|---|
Clustering with accurate results has become a topic of high interest. The Dirichlet Process Mixture (DPM) is a clustering model that discovers the number of clusters automatically and offers useful properties such as potential convergence to the actual clusters in the data. These advantages come at the price of prohibitive response times, which impair its adoption and make centralized DPM approaches inefficient. We propose DC-DPM, a parallel clustering solution that gracefully scales to millions of data points while remaining DPM compliant, which is the central challenge of distributing this process. Our experiments, on both synthetic and real-world data, illustrate the high performance of our approach on millions of data points. The centralized algorithm does not scale: it reaches its limit at 100K data points, where it needs more than 7 hours. In this case, our approach needs less than 30 seconds. |
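The abstract's claim that a DPM discovers the number of clusters automatically can be illustrated with the Chinese restaurant process, the standard sequential view of the Dirichlet process prior. The sketch below is only that prior, not the DC-DPM algorithm from the paper; the function name `crp_assignments` and the concentration parameter `alpha` are illustrative assumptions, and no data likelihood is involved.

```python
import random


def crp_assignments(n_points, alpha, seed=0):
    """Sample cluster labels from a Chinese restaurant process (illustrative only).

    alpha is the (assumed, positive) concentration parameter: larger values
    make opening a new cluster more likely.
    """
    rng = random.Random(seed)
    counts = []  # counts[k] = number of points currently assigned to cluster k
    labels = []
    for _ in range(n_points):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is chosen with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1
        labels.append(k)
    return labels, counts


labels, counts = crp_assignments(n_points=1000, alpha=1.0)
print("clusters opened:", len(counts))
```

The number of clusters is never fixed in advance: it grows as points arrive, which is the property the abstract refers to. A full DPM sampler would additionally weight each choice by how well the point fits each cluster.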
Year | DOI | Venue |
---|---|---|
2019 | 10.1145/3297280.3297327 | SAC |
Keywords | Field | DocType |
clustering, dirichlet process mixture model, parallelism | Data point,Convergence (routing),Cluster (physics),Computer science,Dirichlet process mixture,Algorithm,Cluster analysis,Dirichlet process mixture model,Scalability | Conference |
ISBN | Citations | PageRank |
978-1-4503-5933-7 | 0 | 0.34 |
References | Authors | |
0 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Khadidja Meguelati | 1 | 0 | 0.34 |
Benedicte Fontez | 2 | 0 | 0.34 |
Nadine Hilgert | 3 | 21 | 4.55 |
Florent Masseglia | 4 | 408 | 43.08 |