Title
Towards an Efficient and Distributed DBSCAN Algorithm Using MapReduce
Abstract
Clustering is a major data mining technique that groups a set of objects so that objects in the same group are more similar to each other than to those in other groups. Among the several types of clustering, density-based algorithms are more efficient at detecting clusters with varied density and different shapes. One of the most important density-based clustering algorithms is DBSCAN. Because of the huge volume of data generated by the widespread diffusion of wireless technologies, and the complexity of big data analysis, new scalable algorithms are needed to process such data efficiently. In this chapter we are particularly interested in using traffic data to find congested areas in a city. For this purpose, we developed a new distributed and efficient strategy for the DBSCAN algorithm that uses MapReduce to detect dense areas based on the input parameters. We conducted experiments using real traffic data from a Brazilian city, Fortaleza, and compared our approach with the centralized and the MapReduce-based approaches. Our preliminary results confirmed that our approach is scalable and more efficient than the others. We also present an incremental version of DBSCAN based on its MapReduce version.
Year
2014
DOI
10.1007/978-3-319-22348-3_5
Venue
Lecture Notes in Business Information Processing
Keywords
DBSCAN, MapReduce, Traffic data
Field
Cluster (physics), Data mining, Wireless, Computer science, SUBCLU, Scalable algorithms, Cluster analysis, Big data, DBSCAN, Scalability
DocType
Conference
Volume
227
ISSN
1865-1348
Citations
0
PageRank
0.34
References
12
Authors
6