Title
Embracing data abundance: BookTest Dataset for Reading Comprehension.
Abstract
There is a practically unlimited amount of natural language data available. Still, recent work in text comprehension has focused on datasets that are small relative to current computing possibilities. This article makes a case for the community to move to larger data and, as a step in that direction, proposes the BookTest, a new dataset similar to the popular Children's Book Test (CBT) but more than 60 times larger. We show that training on the new data improves the accuracy of our Attention-Sum Reader model on the original CBT test data by a much larger margin than many recent attempts to improve the model architecture. On one version of the dataset our ensemble even exceeds the human baseline provided by Facebook. We then show in our own human study that there is still space for further improvement.
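The abstract mentions the Attention-Sum Reader (AS Reader). Purely as an illustrative aside, the sketch below shows the pointer-style attention-sum aggregation that gives that model its name: a candidate answer's score is the total attention mass over every document position where the candidate occurs. The function name, variable names, and random stand-in embeddings are our own illustration (real contextual embeddings would come from the model's recurrent encoders), not part of this record or the paper's code.

```python
import numpy as np

def attention_sum(context_emb, query_emb, context_tokens, candidates):
    """Attention-sum aggregation in the style of the AS Reader (illustrative sketch).

    context_emb: (T, d) contextual embeddings of the document tokens
    query_emb:   (d,)   embedding of the question containing the blank
    context_tokens: list of T token ids for the document
    candidates:  list of candidate answer token ids
    Returns a dict mapping each candidate to its aggregated attention score.
    """
    # Dot-product attention over document positions, normalised with a softmax.
    scores = context_emb @ query_emb                 # (T,)
    scores = scores - scores.max()                   # numerical stability
    attn = np.exp(scores) / np.exp(scores).sum()     # (T,)

    # Pointer-sum step: a candidate's score is the total attention mass
    # over all positions where that candidate token occurs in the document.
    return {c: float(sum(a for a, t in zip(attn, context_tokens) if t == c))
            for c in candidates}

# Toy usage with random embeddings standing in for the encoders.
rng = np.random.default_rng(0)
T, d = 12, 8
context_emb = rng.normal(size=(T, d))
query_emb = rng.normal(size=d)
context_tokens = [3, 7, 7, 1, 4, 3, 9, 7, 2, 4, 3, 5]
print(attention_sum(context_emb, query_emb, context_tokens, candidates=[3, 7, 4]))
```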
Year: 2016
Venue: arXiv: Computation and Language
Field: Human study, Computer science, Model architecture, Reading comprehension, Natural language, Natural language processing, Test data, Text comprehension, Artificial intelligence, Machine learning
DocType:
Volume: abs/1610.00956
Citations: 7
Journal:
PageRank: 0.52
References: 23
Authors: 3
Name             Order  Citations  PageRank
Ondrej Bajgar    1      110        5.45
Rudolf Kadlec    2      229        16.25
Jan Kleindienst  3      220        23.74