Abstract |
---|
Latent Semantic Indexing is a classical method to produce optimal low-rank approximations of a term-document matrix. However, in the context of a particular query distribution, the approximation thus produced need not be optimal. We propose VLSI, a new query-dependent (or "variable") low-rank approximation that minimizes approximation error for any specified query distribution. With this tool, it is possible to tailor the LSI technique to particular settings, often resulting in vastly improved approximations at much lower dimensionality. We validate this method via a series of experiments on classical corpora, showing that VLSI typically performs similarly to LSI with an order of magnitude fewer dimensions. |
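To make the baseline concrete, below is a minimal sketch of classical LSI via truncated SVD on a toy term-document matrix, using numpy. This illustrates the query-independent approximation the abstract contrasts VLSI against, not the paper's VLSI method itself; the matrix values and the helper name `lsi_approx` are illustrative assumptions.

```python
import numpy as np

# Toy term-document matrix (rows = terms, columns = documents);
# entries are illustrative term counts, not data from the paper.
A = np.array([
    [2, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 2, 0, 1],
    [0, 0, 1, 2],
], dtype=float)

def lsi_approx(A, k):
    """Rank-k LSI approximation of A via truncated SVD.

    By the Eckart-Young theorem this is the best rank-k approximation
    in Frobenius norm -- optimal overall, but independent of any
    particular query distribution (the gap VLSI targets).
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

A2 = lsi_approx(A, 2)
# Residual error of the rank-2 approximation; equals the energy in
# the discarded singular values, sqrt(s_3^2 + s_4^2).
err = np.linalg.norm(A - A2, "fro")
```

Queries would then be folded into the same k-dimensional latent space before computing similarities; VLSI instead chooses the subspace to minimize expected error over a given query distribution.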
Year | DOI | Venue
---|---|---
2005 | 10.1145/1081870.1081876 | KDD
Keywords | Field | DocType
---|---|---
specified query distribution, optimal low-rank approximation, classical method, variable latent semantic indexing, particular query distribution, approximation error, lsi technique, low-rank approximation, classical corpus, improved approximation, particular setting, svd, latent semantic indexing, linear algebra, low rank approximation, vlsi | Singular value decomposition, Linear algebra, Computer science, Matrix (mathematics), Theoretical computer science, Curse of dimensionality, Probabilistic latent semantic analysis, Artificial intelligence, Order of magnitude, Very-large-scale integration, Machine learning, Approximation error | Conference
ISBN | Citations | PageRank
---|---|---
1-59593-135-X | 5 | 0.76
References | Authors
---|---
19 | 4
Name | Order | Citations | PageRank
---|---|---|---
Anirban Dasgupta | 1 | 2535 | 136.99 |
Ravi Kumar | 2 | 13932 | 1642.48 |
Prabhakar Raghavan | 3 | 13351 | 2776.61 |
Andrew Tomkins | 4 | 9388 | 1401.23 |