Title
Machine Learning Friendly Set Version of Johnson–Lindenstrauss Lemma
Abstract
The widely discussed and applied Johnson–Lindenstrauss (JL) Lemma has an existential form: for each set of data points Q in n-dimensional space, there exists a transformation f into an $$n'$$-dimensional space ($$n'<n$$) such that for each pair $$\mathbf {u},\mathbf {v} \in Q$$, $$(1-\delta )\Vert \mathbf {u}-\mathbf {v}\Vert ^2 \le \Vert f(\mathbf {u})-f(\mathbf {v})\Vert ^2 \le (1+\delta )\Vert \mathbf {u}-\mathbf {v}\Vert ^2$$ for a user-defined error parameter $$\delta $$. Furthermore, it is asserted that with some finite probability the transformation f may be found as a random projection (with scaling) onto an $$n'$$-dimensional subspace, so that after sufficiently many repetitions of the random projection, f will be found with a user-defined success rate $$1-\epsilon $$. In this paper, we make novel use of the JL Lemma. We prove a theorem stating that the target dimensionality of a random-projection-type JL linear transformation can be chosen so that, with probability $$1-\epsilon $$, all data points from Q fall into the predefined error range $$\delta $$ for any user-predefined failure probability $$\epsilon $$ when performing a single random projection. This result is important for applications such as data clustering, where we want an a priori dimensionality-reducing transformation instead of attempting a (large) number of them, as with the traditional Johnson–Lindenstrauss Lemma. Furthermore, we investigate the important issue of whether or not the projection according to the JL Lemma is really useful when conducting data processing, that is, whether solutions to the clustering problem in the projected space apply to the original space. In particular, we take a closer look at the k-means algorithm and prove that a good solution in the projected space is also a good solution in the original space. Furthermore, under proper assumptions, local optima in the original space are also local optima in the projected space.
We also investigate the broader issue of preserving clusterability under the JL Lemma projection. We define the conditions under which the clusterability property of the original space is transmitted to the projected space, so that a broad class of clustering algorithms for the original space is applicable in the projected space.
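As a minimal numerical sketch of the single scaled random projection the abstract describes (this is an illustration using NumPy, not the paper's construction; the dimensions, sample size, and distortion threshold below are assumptions chosen for demonstration):

```python
import numpy as np

rng = np.random.default_rng(0)

n, n_prime, m = 1000, 300, 20          # original dim, target dim, number of points
Q = rng.normal(size=(m, n))            # data points Q, one per row

# Gaussian random projection scaled by 1/sqrt(n'), so that squared
# pairwise distances are preserved in expectation.
R = rng.normal(size=(n, n_prime)) / np.sqrt(n_prime)
fQ = Q @ R

# Worst-case relative distortion |delta| of squared pairwise distances,
# i.e. the smallest delta for which all pairs satisfy the JL bound
# (1-delta)||u-v||^2 <= ||f(u)-f(v)||^2 <= (1+delta)||u-v||^2.
distortions = []
for i in range(m):
    for j in range(i + 1, m):
        orig = np.sum((Q[i] - Q[j]) ** 2)
        proj = np.sum((fQ[i] - fQ[j]) ** 2)
        distortions.append(abs(proj / orig - 1.0))
max_distortion = max(distortions)
```

With a target dimension of this order ($$n' \approx \log |Q| / \delta^2$$ up to constants), a single projection already keeps all pairwise distortions small with high probability, which is the regime the paper's theorem makes precise.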
Year
DOI
Venue
2017
10.1007/s10115-019-01412-8
Knowledge and Information Systems
Keywords
Field
DocType
Johnson–Lindenstrauss lemma, Random projection, Sample distortion, Dimensionality reduction, Linear JL transform, k-means algorithm, Clusterability retention
Random projection,Discrete mathematics,Combinatorics,Existential quantification,Céa's lemma,Curse of dimensionality,Linear map,Cluster analysis,Mathematics,Lemma (mathematics),Johnson–Lindenstrauss lemma
Journal
Volume
Issue
ISSN
62
5
0219-1377
Citations 
PageRank 
References 
0
0.34
7
Authors
1
Name
Order
Citations
PageRank
Mieczyslaw A. Klopotek  1  366  78.58