Title
Hyperlink-induced Pre-training for Passage Retrieval in Open-domain Question Answering
Abstract
To alleviate the data scarcity problem in training question answering systems, recent works propose additional intermediate pre-training for dense passage retrieval (DPR). However, a large discrepancy remains between the provided upstream signals and the downstream question-passage relevance, which limits the improvement. To bridge this gap, we propose HyperLink-induced Pre-training (HLP), a method to pre-train the dense retriever with the text relevance induced by hyperlink-based topology within Web documents. We demonstrate that the hyperlink-based structures of dual-link and co-mention can provide effective relevance signals for large-scale pre-training that better facilitate downstream passage retrieval. We investigate the effectiveness of our approach across a wide range of open-domain QA datasets under zero-shot, few-shot, multi-hop, and out-of-domain scenarios. The experiments show our HLP outperforms BM25 by up to 7 points, and other pre-training methods by more than 10 points, in terms of top-20 retrieval accuracy under the zero-shot scenario. Furthermore, HLP significantly outperforms other pre-training methods under the other scenarios.
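The abstract does not spell out how the dual-link and co-mention pairs are mined, so the following is only a minimal Python sketch of the idea under stated assumptions: a corpus of Web documents split into passages, where each passage records which documents its hyperlinks point to. The `Passage` record and the two mining functions are hypothetical names, not the paper's implementation; the mined passage pairs would then stand in for question-passage pairs when pre-training the dense retriever.

```python
from dataclasses import dataclass, field
from itertools import combinations

# Hypothetical passage record (illustrative field names): the document it
# belongs to and the documents its hyperlinks point to.
@dataclass
class Passage:
    doc_id: str
    text: str
    out_links: set = field(default_factory=set)  # ids/titles of linked documents

def mine_dual_link_pairs(passages):
    """Pairs (p, q) from different documents where p links to q's document
    and q links back to p's document (a bidirectional, dual-link topology)."""
    pairs = []
    for p, q in combinations(passages, 2):
        if p.doc_id != q.doc_id and q.doc_id in p.out_links and p.doc_id in q.out_links:
            pairs.append((p, q))
    return pairs

def mine_co_mention_pairs(passages):
    """Pairs (p, q) from different documents that both hyperlink to
    (i.e. co-mention) at least one common third document."""
    pairs = []
    for p, q in combinations(passages, 2):
        shared = (p.out_links & q.out_links) - {p.doc_id, q.doc_id}
        if p.doc_id != q.doc_id and shared:
            pairs.append((p, q))
    return pairs

# Toy usage with hypothetical documents: A and B link to each other (dual-link);
# A and C both link to B (co-mention).
corpus = [
    Passage("A", "Passage about A that mentions B.", {"B"}),
    Passage("B", "Passage about B that mentions A.", {"A"}),
    Passage("C", "Passage about C that also mentions B.", {"B"}),
]
print(mine_dual_link_pairs(corpus))   # [(A-passage, B-passage)]
print(mine_co_mention_pairs(corpus))  # [(A-passage, C-passage)]
```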
Year: 2022
DOI: 10.18653/v1/2022.acl-long.493
Venue: PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS)
DocType:
Volume:
Citations: 0
Conference: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
PageRank: 0.34
References: 0
Authors: 13
Name            Order  Citations  PageRank
Jiawei Zhou     1      0          0.68
Xiaoguang Li    2      141        19.54
Lifeng Shang    3      485        30.96
Lan Luo         4      1          0.69
Ke Zhan         5      1          1.03
Enrui Hu        6      1          1.37
Xinyu Zhang     7      0          0.34
Hao Jiang       8      0          0.34
Zhao Cao        9      6          3.85
Fan Yu          10     1          1.03
Xin Jiang       11     150        32.43
Qun Liu         12     2149       203.11
Lei Chen        13     6239       395.84