Title |
---|
Hierarchical Clustering in Scalable Distributed Two-Layer Datastore for Big Data as a Service |
Abstract |
---|
In this paper, we propose a highly scalable approach to data clustering that may be applied in cloud-based big data services. We present a hierarchical approach to automatic data clustering in a Scalable Distributed Two-Layer Datastore (SD2DS) system, extending the LH* schema so that data items can be addressed based on their content. We found that as the bucket structure grows, the total clustering error decreases. Moreover, our method allows new data items to be added to the structure incrementally and enables parallel data processing. We carried out simulations for 3 different cluster shapes and 5 different noise ratios to verify the correctness of our solution. Additionally, we compare our solution with common clustering methods such as K-means, Agglomerative, and Birch. |
Year | DOI | Venue |
---|---|---|
2018 | 10.1109/ES.2018.00029 | 2018 Sixth International Conference on Enterprise Systems (ES) |
Keywords | Field | DocType |
BDaaS, Cloud Computing, Big Data, Clustering, SD2DS | Hierarchical clustering, Data mining, Data structure, Computer science, Correctness, Distributed database, Cluster analysis, Big data, Cloud computing, Scalability | Conference |
ISSN | ISBN | Citations |
2377-8636 | 978-1-5386-8389-7 | 0 |
PageRank | References | Authors |
0.34 | 8 | 2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Adam Krechowicz | 1 | 3 | 2.85 |
Stanislaw Deniziak | 2 | 45 | 13.20 |