Title
Topic-Grained Text Representation-Based Model for Document Retrieval.
Abstract
Document retrieval enables users to find the documents they need accurately and quickly. To meet retrieval-efficiency requirements, prevalent deep neural methods adopt a representation-based matching paradigm, which saves online matching time by pre-storing document representations offline. However, this paradigm consumes vast local storage space, especially when documents are stored as word-grained representations. To tackle this, we present TGTR, a Topic-Grained Text Representation-based Model for document retrieval. Following the representation-based matching paradigm, TGTR stores document representations offline to ensure retrieval efficiency, but it significantly reduces the storage requirements by using novel topic-grained representations rather than traditional word-grained ones. Experimental results on TREC CAR and MS MARCO demonstrate that TGTR is consistently competitive with word-grained baselines in terms of retrieval accuracy while requiring less than 1/10 of their storage space. Moreover, TGTR overwhelmingly surpasses global-grained baselines in terms of retrieval accuracy.
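The storage trade-off described in the abstract can be illustrated with simple arithmetic. The sketch below is not TGTR's implementation: the collection size, vector dimension, average document length, and number of topics per document are all hypothetical values chosen for illustration, and the helper `stored_floats` is an invented name. It only counts how many values must be pre-stored offline when each document is kept as one vector per word, one vector per topic, or a single global vector.

```python
def stored_floats(num_docs, vectors_per_doc, dim):
    """Count the float values needed to pre-store document representations offline."""
    return num_docs * vectors_per_doc * dim


# All figures below are hypothetical and are not taken from the paper.
NUM_DOCS = 1_000_000        # size of the document collection
DIM = 128                   # dimension of each stored vector
AVG_TOKENS_PER_DOC = 100    # word-grained: one vector per token
TOPICS_PER_DOC = 8          # topic-grained: one vector per topic (assumed count)

word_grained = stored_floats(NUM_DOCS, AVG_TOKENS_PER_DOC, DIM)
topic_grained = stored_floats(NUM_DOCS, TOPICS_PER_DOC, DIM)
global_grained = stored_floats(NUM_DOCS, 1, DIM)

# Fraction of the word-grained footprint that the topic-grained store needs.
ratio = topic_grained / word_grained
print(f"topic/word storage ratio: {ratio:.2f}")
```

Under these assumed numbers the topic-grained store needs 8% of the word-grained footprint. The ratio depends only on the number of stored vectors per document, so any setting with fewer than one topic vector per ten words would fall below the 1/10 figure reported in the abstract.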
Year
2022
DOI
10.1007/978-3-031-15934-3_64
Venue
International Conference on Artificial Neural Networks and Machine Learning (ICANN)
DocType
Conference
Citations
0
PageRank
0.34
References
0
Authors
8
Name          Order  Citations  PageRank
Mengxue Du    1      0          0.34
Shasha Li     2      2          4.09
Jie Yu        3      41         10.55
Jun Ma        4      32         14.39
Bin Ji        5      0          2.03
Huijun Liu    6      0          1.69
Wuhang Lin    7      0          0.34
Zibo Yi       8      0          0.68