Abstract | ||
---|---|---|
The passage retriever plays an important role in obtaining answers in an open-domain textual question answering system: it selects candidate contexts from a large collection of documents and feeds them to the machine reader. Traditional de facto methods, such as TF-IDF or BM25, usually construct sparse vectors to match word co-occurrence between passages and questions. Some more advanced methods model word-level contextual semantic similarity to match the text. In this work, we present a method that encodes text with short sliding windows with built-in continuity and applies a manifold learning method to them to model a continuous representation of semantics, so as to represent similarity features at the passage level and reduce the directional sparsity difference caused by differences in text length. Compared with the traditional Lucene BM25 system in top-20 paragraph retrieval, the accuracy of our method is 5%-16% higher and the recall rate is 8%-16% higher. |
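
The abstract does not give the window length, the stride, or the encoder, so the following is only a minimal sketch of the general idea of splitting a passage into short overlapping windows. The function name `sliding_windows` and the parameters `size` and `stride` are hypothetical and are not taken from the paper; the overlap between neighbouring windows is one simple way to obtain continuity between consecutive segments.

```python
def sliding_windows(tokens, size=4, stride=2):
    """Split a token list into overlapping windows (hypothetical helper).

    Neighbouring windows share `size - stride` tokens when stride < size.
    The values of `size` and `stride` are illustrative assumptions.
    """
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    # A text no longer than one window is returned as a single window.
    if len(tokens) <= size:
        return [list(tokens)]

    windows = []
    start = 0
    while start + size <= len(tokens):
        windows.append(list(tokens[start:start + size]))
        start += stride

    # If the last full window stopped short of the end, add a final
    # window aligned to the end so that no token is dropped.
    last_start = start - stride
    if last_start + size < len(tokens):
        windows.append(list(tokens[-size:]))
    return windows


if __name__ == "__main__":
    passage = "the passage retriever selects candidate contexts for the reader"
    for window in sliding_windows(passage.split(), size=4, stride=2):
        print(" ".join(window))
```

Each window could then be encoded separately and the resulting sequence of vectors passed to a manifold learning step. Which encoder and which manifold learning method to use is not stated in the abstract and is left open here.
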
Year | DOI | Venue |
---|---|---|
2021 | 10.1109/DSC53577.2021.00052 | 2021 IEEE Sixth International Conference on Data Science in Cyberspace (DSC) |
Keywords | DocType | ISBN |
passages,TF-IDF,advanced methods model word-level contextual semantics similarities,short sliding windows,manifold learning method,passage-level,text length,traditional Lucene BM25 system,passage retrieval,open-domain question,open-domain textual question answering system,candidate contexts,machine reader,traditional defacto methods,sparse vectors | Conference | 978-1-6654-1816-4 |
Citations | PageRank | References |
0 | 0.34 | 12 |
Authors | ||
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Ruidong Ding | 1 | 0 | 0.34 |
Bin Zhou | 2 | 341 | 30.99 |
Hongkui Tu | 3 | 0 | 0.34 |