Abstract |
---|
With an ultimate goal of narrowing the gap between human and machine readers in text comprehension, we present the first collection of Challenging Chinese machine reading Comprehension datasets (C³) collected from language and professional certification exams, which contains 13,924 documents and their associated 23,990 multiple-choice questions. Most of the questions in C³ cannot be answered merely by surface-form matching against the given text. As a pilot study, we closely analyze the prior knowledge (i.e., linguistic, domain-specific, and general world knowledge) needed in these real-world reading comprehension tasks. We further explore how to leverage linguistic knowledge, including a lexicon of idioms and proverbs, graphs of general world knowledge (e.g., ConceptNet), and domain-specific knowledge such as textbooks to aid machine readers, through fine-tuning a pre-trained language model. Experimental results demonstrate that linguistic and general world knowledge may help improve the performance of the baseline reader in both general and domain-specific tasks. C³ will be available at this http URL. |
Year | Venue | DocType |
---|---|---|
2019 | arXiv: Computation and Language | Journal |
Volume | Citations | PageRank |
---|---|---|
abs/1904.09679 | 0 | 0.34 |

References | Authors |
---|---|
0 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Kai Sun | 1 | 33 | 7.71 |
Dian Yu | 2 | 64 | 11.49 |
Dong Yu | 3 | 6264 | 475.73 |
Claire Cardie | 4 | 5591 | 555.20 |