Title
Identifiability of Complete Dictionary Learning
Abstract
Sparse component analysis (SCA), also known as complete dictionary learning, is the following problem: given an input matrix M and an integer r, find a dictionary D with r columns and a matrix B with k-sparse columns (that is, each column of B has at most k nonzero entries) such that M ≈ DB. A key issue in SCA is identifiability, that is, characterizing the conditions under which D and B are essentially unique (that is, unique up to permutation and scaling of the columns of D and rows of B). Although SCA has been extensively investigated over the last two decades, only a few works have tackled this issue in the deterministic scenario, and no work provides reasonable bounds on the minimum number of samples (that is, columns of M) that leads to identifiability. In this work, we provide new results in the deterministic scenario when the data has a low-rank structure, that is, when D is (under)complete. While previous bounds feature a combinatorial term (r choose k), we exhibit a sufficient condition involving O(r^3/(r-k)^2) samples that yields an essentially unique decomposition, as long as these data points are well spread among the subspaces spanned by r-1 columns of D. We also exhibit a necessary lower bound on the number of samples that contradicts previous results in the literature when k equals r-1. Our bounds provide a drastic improvement over the state of the art and imply, for example, that for a fixed proportion of zeros (constant and independent of r, e.g., 10% of zero entries in B), one requires only O(r) data points to guarantee identifiability.
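To make the problem setup concrete, the sketch below builds a synthetic SCA instance M = DB with a square (complete) dictionary and k-sparse coefficient columns, and compares the combinatorial sample count (r choose k) from earlier work with the O(r^3/(r-k)^2) count highlighted in the abstract. This is a minimal NumPy sketch written for this record, not code from the paper; the sizes r, k, n and the dropped constant in O(·) are illustrative assumptions.

    import numpy as np
    from math import comb

    # Illustrative sketch (not from the paper): a synthetic SCA instance
    # M = D B, where D is a square ("complete") r x r dictionary and every
    # column of B has at most k nonzero entries.
    rng = np.random.default_rng(0)

    r, k, n = 10, 7, 200                       # assumed sizes: atoms, sparsity, samples
    D = rng.standard_normal((r, r))            # complete dictionary with r atoms
    B = np.zeros((r, n))
    for j in range(n):
        support = rng.choice(r, size=k, replace=False)   # k-sparse support
        B[support, j] = rng.standard_normal(k)
    M = D @ B                                  # observed data matrix

    # Compare the combinatorial term (r choose k) from prior work with the
    # O(r^3 / (r - k)^2) sample count from the abstract (constant dropped).
    print("r choose k      :", comb(r, k))
    print("r^3 / (r - k)^2 :", r**3 / (r - k) ** 2)

Already at this small scale the gap is visible: (10 choose 7) = 120 versus 10^3/3^2 ≈ 111, and the combinatorial term grows far faster as r increases with k a fixed fraction of r.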
Year
2019
DOI
10.1137/18M1233339
Venue
SIAM Journal on Mathematics of Data Science
Keywords
matrix factorization, dictionary learning, sparse component analysis, identifiability, uniqueness
DocType
Journal
Volume
1
Issue
3
Citations
0
PageRank
0.34
References
0
Authors
2
Name             Order  Citations  PageRank
Jeremy E. Cohen  1      46         8.34
Nicolas Gillis   2      503        39.77