Abstract |
---|
Latent Semantic Indexing (LSI), a vector space-based approach to information retrieval, has been proven to be an effective tool for correlating and retrieving relevant documents. While much work has been published on LSI, most of it addresses the algorithmic or theoretical basis of the model. Little, if any, addresses practical implementation issues. We describe a production-level implementation of LSI. The system integrates components including document collection and preprocessing, singular value decomposition (SVD), multilingual processing, and a tree-based access method for similarity querying. We discuss implementation issues encountered during the development of the system. In particular, we address scalability issues in the query engine and various components of the system, and present lessons learned. |
Year | DOI | Venue |
---|---|---|
2001 | 10.1109/RIDE.2001.916491 | Heidelberg |
Keywords | Field | DocType |
implementation scalability, Telcordia LSI engine, information retrieval, latent semantic indexing | Data mining, Computer science, Database, Scalability | Conference |
ISSN | ISBN | Citations |
1066-1395 | 0-7695-0957-6 | 20 |
PageRank | References | Authors |
62.25 | 12 | 6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Chung-Min Chen | 1 | 441 | 161.66 |
Ned Stoffel | 2 | 20 | 62.25 |
Mike Post | 3 | 20 | 62.25 |
Chumki Basu | 4 | 574 | 160.00 |
Devasis Bassu | 5 | 22 | 65.13 |
Clifford Behrens | 6 | 22 | 63.77 |