Title
Topic Discovery through Data Dependent and Random Projections
Abstract
We present algorithms for topic modeling based on the geometry of cross-document word-frequency patterns. This perspective gains significance under the so-called separability condition, which requires the existence of novel words that are unique to each topic. We present a suite of highly efficient algorithms based on data-dependent and random projections of word-frequency patterns to identify novel words and their associated topics. We also discuss the statistical guarantees of the data-dependent projections method, which rest on two mild assumptions on the prior density of the topic-document matrix. Our key insight is that the maximum and minimum values of cross-document frequency patterns projected along any direction are associated with novel words. While our sample complexity bounds for topic recovery are similar to the state of the art, the computational complexity of our random projection scheme scales linearly with the number of documents and the number of words per document. We present several experiments on synthetic and real-world datasets to demonstrate the qualitative and quantitative merits of our scheme.
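The key insight in the abstract can be illustrated with a small, noiseless sketch. This is not the authors' implementation: the synthetic corpus, the function names (`make_separable_corpus`, `novel_candidates`), and all parameter values are assumptions chosen for illustration. Under separability, each row-normalized word-frequency row is a convex combination of the novel-word rows, so the largest and smallest projections along a random direction should fall on novel words.

```python
import numpy as np

def make_separable_corpus(n_topics=3, n_novel_per_topic=2, n_other=20,
                          n_docs=200, seed=0):
    """Build an idealized (noise-free) word-document frequency matrix.

    Hypothetical helper: the first n_topics * n_novel_per_topic words are
    novel words, each with nonzero probability under exactly one topic.
    """
    rng = np.random.default_rng(seed)
    n_novel = n_topics * n_novel_per_topic
    beta = np.zeros((n_novel + n_other, n_topics))  # word-topic matrix
    for k in range(n_topics):
        rows = slice(k * n_novel_per_topic, (k + 1) * n_novel_per_topic)
        beta[rows, k] = rng.uniform(0.5, 1.0, size=n_novel_per_topic)
    # Remaining words are shared by all topics.
    beta[n_novel:] = rng.uniform(0.2, 1.0, size=(n_other, n_topics))
    beta /= beta.sum(axis=0, keepdims=True)  # each column is a distribution
    theta = rng.dirichlet(np.ones(n_topics), size=n_docs).T  # topic-document
    return beta @ theta, n_novel  # expected word frequencies per document

def novel_candidates(X, n_projections=50, seed=1):
    """Collect indices of words attaining the max or min random projection."""
    rng = np.random.default_rng(seed)
    # Row-normalize so every word's cross-document pattern sums to one.
    X_bar = X / X.sum(axis=1, keepdims=True)
    found = set()
    for _ in range(n_projections):
        direction = rng.standard_normal(X_bar.shape[1])
        projection = X_bar @ direction
        found.add(int(np.argmax(projection)))
        found.add(int(np.argmin(projection)))
    return found

X, n_novel = make_separable_corpus()
found = novel_candidates(X)
print(sorted(found), "novel word indices are 0 ..", n_novel - 1)
```

With real, finite-length documents the frequencies are noisy, so the extremes would only approximately coincide with novel words; handling that noise is where the statistical guarantees mentioned in the abstract come in.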
Year
2013
Venue
ICML
DocType
Journal
Volume
abs/1303.3664
Citations
25
PageRank
1.00
References
7
Authors
4
Name                   Order  Citations  PageRank
Weicong Ding           1      33         2.82
Mohammad H. Rohban     2      57         5.28
Prakash Ishwar         3      951        67.13
Venkatesh Saligrama    4      1350       112.74