Title
Scalable approximation of kernel fuzzy c-means
Abstract
Virtually every sector of business and industry that uses computing, including financial analysis, search engines, and electronic commerce, incorporates Big Data analysis into its business model. Sophisticated clustering algorithms are highly desired to deduce the nature of data by assigning labels to unlabeled data. We address two main challenges in Big Data. First, by definition, the volume of Big Data is too large to be loaded into a computer's memory (this volume changes based on the computer used or available). Second, in real-time applications, the velocity of new incoming data prevents historical data from being stored and future data from being accessed. Therefore, we propose our Streaming Kernel Fuzzy c-Means (stKFCM) algorithm, which significantly reduces both computational complexity and space complexity. The proposed stKFCM requires only O(n^2) memory, where n is the (predetermined) size of a data subset (or data chunk) at each time step, which makes this algorithm truly scalable (as n can be chosen based on the available memory). Furthermore, only 2n^2 elements of the full N × N kernel matrix (where N >> n) need to be calculated at each time step, thus reducing both the computation time in producing the kernel elements and the complexity of the FCM algorithm. Empirical results show that stKFCM, even with very small n, can provide clustering performance as accurate as that of kernel fuzzy c-means run on the entire data set while achieving a significant speedup.
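To make the per-chunk cost in the abstract concrete, the following is a minimal sketch of textbook kernel fuzzy c-means run on a single chunk of n points. It is not the authors' stKFCM: it omits the streaming step that carries information from one chunk to the next, and the RBF kernel, the names rbf_kernel_matrix and kfcm_chunk, and the parameters gamma, m, iters, and seed are all assumptions chosen for illustration. It only shows that one chunk needs an n × n kernel matrix rather than the full N × N one.

```python
import math
import random


def rbf_kernel_matrix(chunk, gamma=1.0):
    """Return the n x n RBF kernel matrix for one chunk of n points (assumed kernel)."""
    n = len(chunk)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            sq = sum((a - b) ** 2 for a, b in zip(chunk[i], chunk[j]))
            K[i][j] = K[j][i] = math.exp(-gamma * sq)
    return K


def kfcm_chunk(K, c, m=2.0, iters=50, seed=0):
    """Textbook kernel fuzzy c-means on one chunk; returns c x n memberships."""
    n = len(K)
    rng = random.Random(seed)
    # Random initial memberships, normalized so each point's memberships sum to 1.
    U = [[rng.random() + 1e-3 for _ in range(n)] for _ in range(c)]
    for i in range(n):
        s = sum(U[j][i] for j in range(c))
        for j in range(c):
            U[j][i] /= s
    for _ in range(iters):
        D = []  # D[j][i]: squared feature-space distance from point i to center j
        for j in range(c):
            w = [u ** m for u in U[j]]
            total = sum(w)
            w = [x / total for x in w]  # center j as a convex combination of points
            Kw = [sum(K[i][k] * w[k] for k in range(n)) for i in range(n)]
            wKw = sum(w[i] * Kw[i] for i in range(n))
            # ||phi(x_i) - v_j||^2 = K_ii - 2 (K w)_i + w^T K w, clamped above zero
            D.append([max(K[i][i] - 2.0 * Kw[i] + wKw, 1e-12) for i in range(n)])
        for i in range(n):
            inv = [D[j][i] ** (-1.0 / (m - 1.0)) for j in range(c)]
            s = sum(inv)
            for j in range(c):
                U[j][i] = inv[j] / s
    return U


# Toy chunk (made-up data): two well-separated groups of three points each.
chunk = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
K = rbf_kernel_matrix(chunk, gamma=1.0)
U = kfcm_chunk(K, c=2)
labels = [max(range(2), key=lambda j: U[j][i]) for i in range(len(chunk))]
```

Here K holds n^2 = 36 entries. For scale, a chunk of n = 1,000 points needs 10^6 kernel entries, whereas the full kernel matrix for N = 10^6 points would need 10^12.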
Year
2013
DOI
10.1109/BigData.2013.6691749
Venue
BigData Conference
Keywords
streaming data,clustering performance,pattern clustering,stkfcm algorithm,projection,computational complexity reduction,approximation theory,fuzzy c-means,streaming kernel fuzzy c-means algorithm,scalable algorithms,kernel fuzzy c-means scalable approximation,computational complexity,big data analysis,computation time reduction,kernel clustering,big data,space complexity reduction,kernel matrix
Field
Fuzzy clustering,Data mining,CURE data clustering algorithm,Computer science,Tree kernel,Theoretical computer science,Artificial intelligence,String kernel,Cluster analysis,Data stream clustering,Kernel embedding of distributions,Variable kernel density estimation,Machine learning
DocType
Conference
ISSN
2639-1589
Citations
2
PageRank
0.38
References
25
Authors
2
Name               Order  Citations  PageRank
Zijian Zhang       1      27         9.14
Timothy C. Havens  2      2          0.38