Abstract |
---|
Multilingual question answering tasks typically assume that answers exist in the same language as the question. Yet in practice, many languages face both information scarcity, where a language has few reference articles, and information asymmetry, where questions reference concepts from other cultures. This work extends open-retrieval question answering to a cross-lingual setting, in which questions in one language can be answered using content in another language. We construct a large-scale dataset built on questions from TyDi QA that lack same-language answers. Under our task formulation, called Cross-lingual Open Retrieval Question Answering (XOR QA), the dataset includes 40k information-seeking questions across 7 diverse non-English languages. Based on this dataset, we introduce three new tasks that involve cross-lingual document retrieval using multilingual and English resources. We establish baselines with state-of-the-art machine translation systems and cross-lingual pretrained models. Experimental results suggest that XOR QA is a challenging task that will facilitate the development of novel techniques for multilingual question answering. Our data and code are available at https://nlp.cs.washington.edu/xorqa. |
Year | Venue | DocType |
---|---|---|
2021 | NAACL-HLT | Conference |
Citations | PageRank | References |
0 | 0.34 | 0 |
Authors |
---|
6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Akari Asai | 1 | 9 | 4.28 |
Jungo Kasai | 2 | 7 | 3.85 |
Jonathan H. Clark | 3 | 0 | 1.01 |
Kenton C.T. Lee | 4 | 1176 | 38.01 |
Eunsol Choi | 5 | 287 | 15.69 |
Hannaneh Hajishirzi | 6 | 417 | 46.10 |