Title
Estimating the effective dimension of large biological datasets using Fisher separability analysis
Abstract
Modern large-scale datasets are frequently said to be high-dimensional. However, their data point clouds often possess structure that significantly decreases their intrinsic dimensionality (ID): clusters, points lying close to low-dimensional varieties, or fine-grained lumping. We introduce a dimensionality estimator based on analysing the separability properties of data points, and test it on several benchmarks and real biological datasets. We show that the introduced measure of ID performs competitively with state-of-the-art measures, is efficient across a wide range of dimensions, and performs better on noisy samples. Moreover, it allows estimating the intrinsic dimension in situations where the intrinsic manifold assumption is not valid.
Year: 2019
DOI: 10.1109/IJCNN.2019.8852450
Venue: 2019 International Joint Conference on Neural Networks (IJCNN)
Keywords: high-dimensional data, intrinsic dimensionality, separability, cancer mutation, single cell RNA-Seq
Field: Data point, Effective dimension, Algorithm, Curse of dimensionality, Intrinsic dimension, Artificial intelligence, Point cloud, Mathematics, Machine learning, Manifold, Instrumental and intrinsic value, Estimator
DocType:
Journal:
Volume: abs/1901.06328
ISSN: 2161-4393
ISBN: 978-1-7281-1986-1
Citations: 1
PageRank: 0.35
References: 15
Authors: 3
Name, Order, Citations, PageRank
Luca Albergante, 1, 9, 1.99
Jonathan Bac, 2, 1, 1.70
Andrei Zinovyev, 3, 282, 27.30