Abstract |
---|
We present algorithms for topic modeling based on the geometry of cross-document word-frequency patterns. This perspective gains significance under the so-called separability condition, which requires the existence of novel words that are unique to each topic. We present a suite of highly efficient algorithms based on data-dependent and random projections of word-frequency patterns to identify novel words and associated topics. We also discuss the statistical guarantees of the data-dependent projections method based on two mild assumptions on the prior density of the topic-document matrix. Our key insight is that the maximum and minimum values of cross-document frequency patterns projected along any direction are associated with novel words. While our sample complexity bounds for topic recovery are similar to the state of the art, the computational complexity of our random projection scheme scales linearly with the number of documents and the number of words per document. We present several experiments on synthetic and real-world datasets to demonstrate the qualitative and quantitative merits of our scheme. |
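
The key insight in the abstract can be illustrated with a small, noiseless sketch. It is not the paper's algorithm, which must also cope with sampling noise and recover the topics themselves. It only shows that, under separability, the rows attaining the maximum or minimum along a random direction belong to novel words. The function name `novel_word_candidates`, the row normalization step, and the toy matrices `beta` and `theta` are assumptions made for illustration.

```python
import numpy as np

def novel_word_candidates(X, num_projections=50, seed=0):
    """Rows of X that attain a max or min along some random direction.

    X is a (words x documents) matrix of expected word frequencies.
    Assumes every row has a positive sum (hypothetical helper).
    """
    rng = np.random.default_rng(seed)
    # Row-normalize so each word's cross-document pattern sums to one.
    X_bar = X / X.sum(axis=1, keepdims=True)
    # One random direction per column; project every word pattern onto each.
    directions = rng.standard_normal((X.shape[1], num_projections))
    projected = X_bar @ directions
    maximizers = set(projected.argmax(axis=0).tolist())
    minimizers = set(projected.argmin(axis=0).tolist())
    return maximizers | minimizers

# Toy separable model: K topics, W words, M documents (all values assumed).
rng = np.random.default_rng(1)
K, W, M = 3, 12, 200
beta = rng.random((W, K)) + 0.1      # every word appears in every topic...
beta[:K, :] = np.eye(K)              # ...except words 0..K-1, which are novel
beta /= beta.sum(axis=0)             # each column is a distribution over words
theta = rng.dirichlet(np.ones(K), size=M).T   # K x M topic-document matrix
X = beta @ theta                     # noiseless word-document frequencies

candidates = novel_word_candidates(X)
print(sorted(candidates))
```

In this idealized setting every normalized row is a convex combination of the normalized rows of `theta`, so a linear projection can only be extremal at a vertex, that is, at a novel word. With real documents the frequencies are noisy estimates, so extremal rows are only candidates.
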
Year | Venue | DocType |
---|---|---|
2013 | ICML | Journal |
Volume | Citations | PageRank |
---|---|---|
abs/1303.3664 | 25 | 1.00 |
References | Authors |
---|---|
7 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Weicong Ding | 1 | 33 | 2.82 |
Mohammad H. Rohban | 2 | 57 | 5.28 |
Prakash Ishwar | 3 | 951 | 67.13 |
Venkatesh Saligrama | 4 | 1350 | 112.74 |