Title
Variable latent semantic indexing
Abstract
Latent Semantic Indexing is a classical method to produce optimal low-rank approximations of a term-document matrix. However, in the context of a particular query distribution, the approximation thus produced need not be optimal. We propose VLSI, a new query-dependent (or "variable") low-rank approximation that minimizes approximation error for any specified query distribution. With this tool, it is possible to tailor the LSI technique to particular settings, often resulting in vastly improved approximations at much lower dimensionality. We validate this method via a series of experiments on classical corpora, showing that VLSI typically performs similarly to LSI with an order of magnitude fewer dimensions.
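The contrast the abstract draws between a plain low-rank approximation and a query-dependent one can be illustrated with a small NumPy sketch. This is not taken from the paper. It assumes the query distribution is summarized by a positive-definite second-moment matrix C = E[q q^T], and that error is measured as the expected squared norm of q^T (A - B). Under those assumptions the error equals the squared Frobenius norm of C^(1/2) (A - B), so truncating the SVD of C^(1/2) A and mapping back with C^(-1/2) gives a minimizer. The function names, the toy matrix A, and the diagonal C are all hypothetical.

```python
import numpy as np

def rank_k(M, k):
    """Best rank-k approximation of M in Frobenius norm (truncated SVD)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

def query_weighted_rank_k(A, C, k):
    """Rank-k B minimizing E[||q^T (A - B)||^2], assuming E[q q^T] = C
    with C symmetric positive definite (an assumption of this sketch)."""
    w, V = np.linalg.eigh(C)               # C = V diag(w) V^T
    C_half = (V * np.sqrt(w)) @ V.T        # symmetric square root of C
    C_half_inv = (V / np.sqrt(w)) @ V.T    # its inverse
    return C_half_inv @ rank_k(C_half @ A, k)

def expected_query_error(A, B, C):
    """E[||q^T (A - B)||^2] = trace((A - B)^T C (A - B))."""
    D = A - B
    return float(np.trace(D.T @ C @ D))

rng = np.random.default_rng(0)
A = rng.random((6, 8))                     # toy term-document matrix (terms x documents)
# Hypothetical query statistics: queries concentrate on the first two terms.
C = np.diag([10.0, 10.0, 1.0, 1.0, 0.1, 0.1])
k = 2

lsi = rank_k(A, k)                         # query-independent approximation
weighted = query_weighted_rank_k(A, C, k)  # query-dependent approximation

err_lsi = expected_query_error(A, lsi, C)
err_weighted = expected_query_error(A, weighted, C)
```

Under the stated assumptions, `err_weighted` can never exceed `err_lsi`, while `lsi` remains the better approximation of A in the unweighted Frobenius norm. How large the gap is depends on how unevenly the query distribution weights the terms.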
Year
2005
DOI
10.1145/1081870.1081876
Venue
KDD
Keywords
specified query distribution, optimal low-rank approximation, classical method, variable latent semantic indexing, particular query distribution, approximation error, lsi technique, low-rank approximation, classical corpus, improved approximation, particular setting, svd, latent semantic indexing, linear algebra, low rank approximation, vlsi
Field
Singular value decomposition, Linear algebra, Computer science, Matrix (mathematics), Theoretical computer science, Curse of dimensionality, Probabilistic latent semantic analysis, Artificial intelligence, Order of magnitude, Very-large-scale integration, Machine learning, Approximation error
DocType
Conference
ISBN
1-59593-135-X
Citations
5
PageRank
0.76
References
19
Authors
4
Name, Order, Citations and PageRank
Anirban Dasgupta, 1, 2535136.99
Ravi Kumar, 2, 139321642.48
Prabhakar Raghavan, 3, 133512776.61
Andrew Tomkins, 4, 93881401.23