Title | ||
---|---|---|
DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs |
Abstract | ||
---|---|---|
Reading comprehension has recently seen rapid progress, with systems matching humans on the most popular datasets for the task. However, a large body of work has highlighted the brittleness of these systems, showing that there is much work left to be done. We introduce a new English reading comprehension benchmark, DROP, which requires Discrete Reasoning Over the content of Paragraphs. In this crowdsourced, adversarially-created, 96k-question benchmark, a system must resolve references in a question, perhaps to multiple input positions, and perform discrete operations over them (such as addition, counting, or sorting). These operations require a much more comprehensive understanding of the content of paragraphs than what was necessary for prior datasets. We apply state-of-the-art methods from both the reading comprehension and semantic parsing literature on this dataset and show that the best systems only achieve 32.7% F1 on our generalized accuracy metric, while expert human performance is 96.0%. We additionally present a new model that combines reading comprehension methods with simple numerical reasoning to achieve 47.0% F1. |
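
As a rough illustration of the discrete operations the abstract names (addition, counting, sorting), the following Python sketch applies them to numbers pulled from a made-up passage. The passage text, the `extract_numbers` helper, and the regex-based extraction are assumptions for illustration only; they are not taken from DROP and do not represent the authors' model.

```python
import re

# Hypothetical passage written for this example (not a DROP paragraph).
PASSAGE = (
    "The home team kicked field goals of 23, 41, and 35 yards. "
    "The visitors answered with a 52-yard field goal."
)


def extract_numbers(text):
    """Return every integer that appears in the text, in reading order."""
    return [int(tok) for tok in re.findall(r"\d+", text)]


numbers = extract_numbers(PASSAGE)

# Counting: how many field goals are mentioned?
count = len(numbers)

# Addition: what is the combined yardage?
total = sum(numbers)

# Sorting: rank the field goals from longest to shortest.
longest_first = sorted(numbers, reverse=True)

print(count, total, longest_first)
```

A real system would also have to decide which numbers a question refers to before combining them. That reference-resolution step is what the abstract singles out, and this sketch skips it.
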
Year | Venue | Field |
---|---|---|
2019 | arXiv: Computation and Language | Reading comprehension, Computer science, Natural language processing, Artificial intelligence |
DocType | Volume | Citations |
---|---|---|
Journal | abs/1903.00161 | 4 |
PageRank | References | Authors |
---|---|---|
0.39 | 39 | 6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Dheeru Dua | 1 | 38 | 4.95 |
Yizhong Wang | 2 | 33 | 4.70 |
Pradeep Dasigi | 3 | 131 | 12.09 |
Gabriel Stanovsky | 4 | 52 | 10.69 |
Sameer Singh | 5 | 1060 | 71.63 |
Matthew Gardner | 6 | 704 | 38.49 |