Title
Hierarchical Clustering in Scalable Distributed Two-Layer Datastore for Big Data as a Service
Abstract
In this paper we propose a highly scalable approach to data clustering that may be applied in cloud-based big data services. We present a hierarchical approach to automatic data clustering in a Scalable Distributed Two-Layer Datastore (SD2DS) system, extending the LH* scheme so that data items can be addressed based on their content. We found that as the bucket structure grows, the total clustering error decreases. Moreover, our method allows new data items to be added to the structure incrementally and enables parallel data processing. We carried out various simulations for 3 different cluster shapes and 5 different noise ratios to demonstrate the correctness of our solution. Additionally, we compare our solution with common clustering methods such as K-means, agglomerative clustering, and BIRCH.
Year
2018
DOI
10.1109/ES.2018.00029
Venue
2018 Sixth International Conference on Enterprise Systems (ES)
Keywords
BDaaS, Cloud Computing, Big Data, Clustering, SD2DS
Field
Hierarchical clustering, Data mining, Data structure, Computer science, Correctness, Distributed database, Cluster analysis, Big data, Cloud computing, Scalability
DocType
Conference
ISSN
2377-8636
ISBN
978-1-5386-8389-7
Citations
0
PageRank
0.34
References
8
Authors
2
Name | Order | Citations | PageRank
Adam Krechowicz | 1 | 3 | 2.85
Stanislaw Deniziak | 2 | 45 | 13.20