Title
Clustering Through Probability Distribution Analysis Along Eigenpaths
Abstract
Data clustering is one of the most fundamental techniques in exploratory data analysis. It is widely used for determining the underlying data structure, classifying natural data, and compressing data in engineering, business management, social statistics, computer science, and medicine. Under the assumption that clusters are high-density regions in the feature space separated by relatively low-density neighbors, a novel approach is proposed for modeling any high-dimensional clustering problem as a one-dimensional analysis of the probability distribution. First, a special path between two vertices, called the eigenpath, is defined to represent their close connection. Second, a connectedness index based on the eigenpath is proposed to describe the connection between two vertices quantitatively. Third, the connectedness index is applied to the candidate cluster centers to measure the connection between different candidates. An indicative curve can then be drawn from the connectedness index. This approach not only provides an effective indicative curve for unknown data sets but also helps to partly eliminate the curse of dimensionality, correctly recognizes arbitrary cluster shapes, and automatically excludes outliers. Extensive experiments showed the effectiveness and efficiency of the proposed approach.
Year
2021
DOI
10.1109/TSMC.2018.2884839
Venue
IEEE Transactions on Systems, Man, and Cybernetics: Systems
Keywords
Connectedness index, density-based clustering, eigenpath
DocType
Journal
Volume
51
Issue
2
ISSN
2168-2216
Citations
0
PageRank
0.34
References
16
Authors
5
Name            Order   Citations   PageRank
WM              1       221         34.28
Changqing Hui   2       0           0.34
Daren Sun       3       0           0.34
Xiang Sun       4       10          1.48
QM              5       464         72.05