Abstract | ||
---|---|---|
Current data processing tasks require efficient approaches capable of dealing with large databases. A promising strategy consists of distributing the data across several computers, each of which solves part of the problem. These partial answers are then integrated to obtain a final solution. We introduce the Distributed Shared Nearest Neighbor clustering algorithm (D-SNN), which works with disjoint partitions of the data and produces a global clustering solution whose performance is competitive with centralized approaches. Our algorithm is suited to large-scale problems (e.g., text clustering) where the data cannot be handled by a single machine due to memory constraints. Experimental results on five data sets show that our proposal is competitive in terms of standard clustering quality measures. |
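
The abstract does not spell out the D-SNN procedure, so the following is only a minimal sketch of the general idea it names, not the authors' method. It assumes the usual definition of shared nearest neighbor (SNN) similarity, in which two points are as similar as the number of k nearest neighbors they have in common, and it computes that similarity independently inside each disjoint partition. The function names, the Euclidean distance, and the equal-sized split are all illustrative assumptions; the step that integrates the partial results into a global clustering is not described here and is left out.

```python
import numpy as np

def knn_indices(points, k):
    """For each row, return the indices of its k nearest other rows (Euclidean)."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbor
    return np.argsort(dists, axis=1)[:, :k]

def snn_similarity(points, k):
    """SNN similarity matrix: entry (i, j) counts the k-NN shared by i and j.

    Assumes k is smaller than the number of points.
    """
    n = len(points)
    neighbors = knn_indices(points, k)
    member = np.zeros((n, n), dtype=int)
    # member[i, j] = 1 when j is one of the k nearest neighbors of i
    member[np.arange(n)[:, None], neighbors] = 1
    return member @ member.T

def snn_per_partition(points, n_partitions, k):
    """Split the rows into disjoint partitions and compute SNN within each.

    Hypothetical helper: each (indices, similarity) pair stands for the
    partial result that one machine could compute on its own share of the data.
    """
    index_chunks = np.array_split(np.arange(len(points)), n_partitions)
    return [(chunk, snn_similarity(points[chunk], k)) for chunk in index_chunks]
```

Because each partition only needs its own rows in memory, the quadratic distance matrix is built per partition rather than for the whole data set, which is the memory argument the abstract makes for distributing the work.
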
Year | Venue | Field |
---|---|---|
2017 | CIARP | k-nearest neighbors algorithm,Data mining,Data set,Data processing,Disjoint sets,Pattern recognition,Document clustering,Computer science,Distributed algorithm,Artificial intelligence,Cluster analysis |
DocType | Citations | PageRank |
---|---|---|
Conference | 0 | 0.34 |
References | Authors |
---|---|
8 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Juan Zamora | 1 | 16 | 2.64 |
Héctor Allende-Cid | 2 | 22 | 12.60 |
Marcelo Mendoza | 3 | 1502 | 85.81 |