Another Look at DPR: Reproduction of Training and Replication of Retrieval - Citegraph

Paper Info

Title
Another Look at DPR: Reproduction of Training and Replication of Retrieval

Abstract
Text retrieval using learned dense representations has recently emerged as a promising alternative to "traditional" text retrieval using sparse bag-of-words representations. One foundational work that has garnered much attention is the dense passage retriever (DPR) proposed by Karpukhin et al. for end-to-end open-domain question answering. This work presents a reproduction and replication study of DPR. We first verify the reproducibility of the DPR model checkpoints by training passage and query encoders from scratch using two different implementations: the original code released by the authors and another independent codebase. After that, we conduct a detailed replication study of the retrieval stage, starting with model checkpoints provided by the authors but with an independent implementation from our group's Pyserini IR toolkit and PyGaggle neural text ranking library. Although our experimental results largely verify the claims of the original DPR paper, we arrive at two important additional findings: First, it appears that the original authors under-report the effectiveness of the BM25 baseline and hence also dense-sparse hybrid retrieval results. Second, by incorporating evidence from the retriever and improved answer span scoring, we manage to improve end-to-end question answering effectiveness using the same DPR models.

Field	Value
Year	2022
DOI	10.1007/978-3-030-99736-6_41
Venue	ADVANCES IN INFORMATION RETRIEVAL, PT I
Keywords	Open-domain QA, Dense retrieval
DocType	Conference
Volume	13185
ISSN	0302-9743
Citations	0
PageRank	0.34
References	0
Authors	5

Authors (5 rows)

Name	Order	Citations	PageRank
Xueguang Ma	1	0	1.01
Kai Sun	2	0	0.34
Ronak Pradeep	3	0	0.34
Minghan Li	4	0	0.68
Jimmy Lin	5	4800	376.93
