Title
Visualizing The Finer Cluster Structure Of Large-Scale And High-Dimensional Data
Abstract
Dimension reduction and visualization of high-dimensional data have become very important research topics because of the rapid growth of large databases with high dimensions in data science. A successful dimension reduction and visualization method seeks to produce a low-dimensional representation of high-dimensional data that preserves both the global and local structure of the data. In this paper, we propose using a generalized sigmoid function to model the distance similarity in both high- and low-dimensional space. In particular, a single parameter v is introduced to the generalized sigmoid function in low-dimensional space, so that we can adjust the slope and the heaviness of the function tail by changing the value of the parameter easily. Using real-world data sets with different sample sizes and dimensions, we show that our proposed method can generate visualization results that are competitive with those of the state-of-the-art methods, such as uniform manifold approximation and projection (UMAP), t-distributed stochastic neighbor embedding (t-SNE), and related methods. In addition, by adjusting the value of v, our proposed method can preserve more of both the global and finer cluster structure of the data. Furthermore, like UMAP, our proposed method can easily scale to massive high-dimensional data. Finally, we use domain knowledge to demonstrate that the finer subclusters that are revealed with small values of v are meaningful.
Year: 2021
DOI: 10.1007/978-3-030-82153-1_30
Venue: KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, PT III
Keywords: Data visualization, Manifold learning, Nonlinear dimension reduction, Cluster structure, Generalized sigmoid function
DocType: Conference
Volume: 12817
ISSN: 0302-9743
Citations: 0
PageRank: 0.34
References: 0
Authors: 3
Name
Order
Citations
PageRank
Yu Liang12112.01
A. Chaudhuri2115.82
Haoyu Wang301.69