Title
Indexing for fast categorisation
Abstract
Automatic categorisation is an important technique for the management of large document collections. Categorisation can be used to store or locate documents that satisfy an information need when that need cannot be expressed as a concise list of query terms. Inverted indexes are used in all query-based retrieval systems to allow efficient query processing. In this paper, we propose applying inverted indexes to categorisation, with the aim of developing a fast, scalable, and accurate approach. Specifically, we propose three variants of inverted indexing that successfully reduce index size: first, quantisation of term-category weights; second, compression of the quantised weights; and, finally, storing only those weights that significantly affect the categorisation process. We show that our techniques permit fast, accurate categorisation: index size is reduced by orders of magnitude compared to conventional inverted indexing, and categorisation accuracy is preserved.
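The abstract names the three size-reduction techniques but does not detail them. The Python sketch below shows one way they could fit together in an inverted index for categorisation, under stated assumptions rather than as the authors' method: the term-category weights are taken as given input (how they are learned is not shown); quantisation is assumed to be uniform over [0, maximum weight] with `bits` bits per weight; compression is assumed to be variable-byte coding of category-id gaps and weight codes; and "weights that significantly affect categorisation" is approximated by a fixed `threshold`. All function names and the toy weights are hypothetical.

```python
from collections import defaultdict


def quantise(weight, max_weight, bits):
    """Map a weight in [0, max_weight] to an integer code in [0, 2**bits - 1] (assumed uniform scheme)."""
    levels = (1 << bits) - 1
    return round(weight / max_weight * levels)


def dequantise(code, max_weight, bits):
    """Approximate inverse of quantise(): return the weight at the code's level."""
    return code / ((1 << bits) - 1) * max_weight


def vbyte_encode(numbers):
    """Variable-byte code: 7 data bits per byte, high bit set on the last byte of each number."""
    out = bytearray()
    for n in numbers:
        chunks = []
        while True:
            chunks.append(n & 0x7F)
            n >>= 7
            if n == 0:
                break
        chunks.reverse()
        chunks[-1] |= 0x80  # mark the final byte of this number
        out.extend(chunks)
    return bytes(out)


def vbyte_decode(data):
    """Decode a byte string produced by vbyte_encode() back into a list of integers."""
    numbers, n = [], 0
    for byte in data:
        n = (n << 7) | (byte & 0x7F)
        if byte & 0x80:
            numbers.append(n)
            n = 0
    return numbers


def build_index(term_weights, bits=4, threshold=0.1):
    """Build a pruned, quantised, compressed inverted index.

    term_weights: hypothetical input {term: {category_id: weight}} with positive weights.
    Returns ({term: compressed postings bytes}, max_weight).
    """
    max_weight = max(w for cats in term_weights.values() for w in cats.values())
    index = {}
    for term, cats in term_weights.items():
        flat, prev = [], 0
        for cat in sorted(cats):
            weight = cats[cat]
            if weight < threshold:  # prune weights assumed to matter little
                continue
            # Store the gap from the previous category id, then the quantised weight code.
            flat.extend([cat - prev, quantise(weight, max_weight, bits)])
            prev = cat
        if flat:
            index[term] = vbyte_encode(flat)
    return index, max_weight


def categorise(doc_terms, index, max_weight, bits=4):
    """Accumulate dequantised weights per category over the document's terms; return the best category."""
    scores = defaultdict(float)
    for term in doc_terms:
        postings = index.get(term)
        if postings is None:
            continue
        values = vbyte_decode(postings)
        cat = 0
        for i in range(0, len(values), 2):
            cat += values[i]  # undo gap coding of category ids
            scores[cat] += dequantise(values[i + 1], max_weight, bits)
    return max(scores, key=scores.get) if scores else None


# Toy usage with made-up weights (categories 0, 1, 2).
weights = {
    "goal": {0: 0.9, 1: 0.05},
    "election": {1: 0.8, 2: 0.3},
    "match": {0: 0.6, 2: 0.02},
}
index, mw = build_index(weights, bits=4, threshold=0.1)
print(categorise(["goal", "match"], index, mw))  # → 0
```

In this sketch, each surviving posting costs two small integers, so with 4-bit codes and dense category ids most postings fit in two bytes; lowering `bits` or raising `threshold` trades accuracy for a smaller index, which is the trade-off the abstract describes.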
Year
2003
Venue
ACSC
Keywords
conventional inverted indexing, fast categorisation, accurate categorisation, efficient query processing, information need, index size, categorisation process, automatic categorisation, accurate approach, inverted indexing, inverted index, satisfiability, efficiency, indexation, compression, document management
Field
Data mining, Information needs, Information retrieval, Document management system, Computer science, Search engine indexing, Scalability
DocType
Conference
ISBN
0-909-92594-1
Citations
3
PageRank
0.40
References
27
Authors
3
Name, Order, Citations, PageRank
Vaughan R. Shanks, 1, 3, 0.74
Hugh E. Williams, 2, 1048, 93.45
Adam Cannane, 3, 88, 7.88