Abstract |
---|
Document retrieval enables users to find the documents they need accurately and quickly. To meet retrieval-efficiency requirements, prevalent deep neural methods adopt a representation-based matching paradigm, which saves online matching time by pre-storing document representations offline. However, this paradigm consumes a vast amount of local storage, especially when documents are stored as word-grained representations. To tackle this, we present TGTR, a Topic-Grained Text Representation-based Model for document retrieval. Following the representation-based matching paradigm, TGTR stores document representations offline to ensure retrieval efficiency, but it significantly reduces storage requirements by using novel topic-grained representations rather than traditional word-grained ones. Experimental results demonstrate that TGTR is consistently competitive with word-grained baselines on TREC CAR and MS MARCO in terms of retrieval accuracy, while requiring less than 1/10 of their storage space. Moreover, TGTR overwhelmingly surpasses global-grained baselines in terms of retrieval accuracy. |
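
The abstract contrasts word-grained storage (roughly one stored vector per token) with topic-grained storage (a small number of vectors per document), but it does not say how TGTR derives its topics or how it scores matches. The sketch below is a hypothetical illustration of that storage trade-off only, not TGTR's method: it uses plain spherical k-means over a document's token vectors as a stand-in for topic grouping, a ColBERT-style MaxSim sum as an assumed late-interaction scorer, and counts stored floats for each granularity. All helper names (`topic_vectors`, `maxsim`, `stored_floats`) and sizes (`dim=16`, 200 tokens, `k=8`) are arbitrary toy choices, not values from the paper.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def topic_vectors(token_vecs, k, iters=10, seed=0):
    """HYPOTHETICAL stand-in for topic grouping: spherical k-means over token vectors."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(token_vecs, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        # Assign each token vector to its most similar centroid.
        for v in token_vecs:
            best = max(range(k), key=lambda c: dot(v, centroids[c]))
            clusters[best].append(v)
        # Move each non-empty centroid to the normalized mean of its members.
        for c, members in enumerate(clusters):
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[c] = normalize(mean)
    return centroids


def maxsim(query_vecs, doc_vecs):
    """Assumed late-interaction score: sum over query vectors of the best doc-vector match."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def stored_floats(vecs):
    """Offline storage cost, counted as the number of stored float values."""
    return len(vecs) * len(vecs[0])


# Toy data (arbitrary sizes, not from the paper): one document, one query.
rng = random.Random(42)
dim, n_tokens, k = 16, 200, 8
doc_tokens = [normalize([rng.gauss(0, 1) for _ in range(dim)]) for _ in range(n_tokens)]
query = [normalize([rng.gauss(0, 1) for _ in range(dim)]) for _ in range(4)]

topics = topic_vectors(doc_tokens, k)  # computed offline, stored instead of token vectors

print("word-grained floats: ", stored_floats(doc_tokens))  # 200 * 16 = 3200
print("topic-grained floats:", stored_floats(topics))      # 8 * 16 = 128
print("word-grained score:  ", round(maxsim(query, doc_tokens), 3))
print("topic-grained score: ", round(maxsim(query, topics), 3))
```

In this toy setup the storage ratio is simply `k / n_tokens`; it says nothing about the accuracy or the under-1/10 storage figure reported for TGTR, which depend on how the paper actually builds and matches its topic-grained representations.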

Year | DOI | Venue |
---|---|---|
2022 | 10.1007/978-3-031-15934-3_64 | International Conference on Artificial Neural Networks and Machine Learning (ICANN) |

DocType | Citations | PageRank |
---|---|---|
Conference | 0 | 0.34 |

References | Authors |
---|---|
0 | 8 |

Name | Order | Citations | PageRank |
---|---|---|---|
Mengxue Du | 1 | 0 | 0.34 |
Shasha Li | 2 | 2 | 4.09 |
Jie Yu | 3 | 41 | 10.55 |
Jun Ma | 4 | 32 | 14.39 |
Bin Ji | 5 | 0 | 2.03 |
Huijun Liu | 6 | 0 | 1.69 |
Wuhang Lin | 7 | 0 | 0.34 |
Zibo Yi | 8 | 0 | 0.68 |