Abstract | ||
---|---|---|
In this paper, we address the problem of recovering arbitrary-shaped data clusters from datasets while facing emph{high space constraints}, as this is for instance the case in many real-world applications when analysis algorithms are directly deployed on resources-limited mobile devices collecting the data. We present DBMSTClu a new space-efficient density-based emph{non-parametric} method working on a Minimum Spanning Tree (MST) recovered from a limited number of linear measurements i.e. a emph{sketched} version of the dissimilarity graph $mathcal{G}$ between the $N$ objects to cluster. Unlike $k$-means, $k$-medians or $k$-medoids algorithms, it does not fail at distinguishing clusters with particular forms thanks to the property of the MST for expressing the underlying structure of a graph. No input parameter is needed contrarily to DBSCAN or the Spectral Clustering method. An approximate MST is retrieved by following the dynamic emph{semi-streaming} model in handling the dissimilarity graph $mathcal{G}$ as a stream of edge weight updates which is sketched in one pass over the data into a compact structure requiring $O(N operatorname{polylog}(N))$ space, far better than the theoretical memory cost $O(N^2)$ of $mathcal{G}$. The recovered approximate MST $mathcal{T}$ as input, DBMSTClu then successfully detects the right number of nonconvex clusters by performing relevant cuts on $mathcal{T}$ in a time linear in $N$. We provide theoretical guarantees on the quality of the clustering partition and also demonstrate its advantage over the existing state-of-the-art on several datasets. |
Year | DOI | Venue |
---|---|---|
2018 | 10.1137/1.9781611975321.2 | SDM |
Field | DocType | Citations |
Cluster (physics),Spectral clustering,Graph,Discrete mathematics,Pattern recognition,Computer science,Artificial intelligence,Partition (number theory),Cluster analysis,DBSCAN,Minimum spanning tree | Conference | 0 |
PageRank | References | Authors |
0.34 | 0 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Anne Morvan | 1 | 3 | 1.42 |
Krzysztof Choromanski | 2 | 124 | 23.56 |
Cédric Gouy-Pailler | 3 | 62 | 10.69 |
Jamal Atif | 4 | 309 | 29.49 |