Title
Latent semantic indexing: a probabilistic analysis
Abstract
Latent semantic indexing (LSI) is an information retrieval technique based on the spectral analysis of the term-document matrix, whose empirical success had heretofore been without rigorous prediction and explanation. We prove that, under certain conditions, LSI does succeed in capturing the underlying semantics of the corpus and achieves improved retrieval performance. We also propose the technique of random projection as a way of speeding up LSI. We complement our theorems with...
Year
2000
DOI
10.1006/jcss.2000.1711
Venue
Journal of Computer and System Sciences - Special issue on the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on principles of database systems
Keywords
collaborative filtering, probabilistic analysis, information retrieval, spectral method, latent semantic indexing
Field
Latent semantic indexing, Latent Dirichlet allocation, Information retrieval, Computer science, Probabilistic analysis of algorithms, Document-term matrix, Probabilistic latent semantic analysis
DocType
Journal
Volume
61
Issue
2
ISSN
0022-0000
Citations
234
PageRank
95.83
References
10
Authors
4
Name                       Order  Citations  PageRank
Christos H. Papadimitriou  1      16671      3192.54
Prabhakar Raghavan         2      13351      2776.61
Prabhakar Raghavan         3      13351      2776.61
Santosh Vempala            4      3546       523.21