Title
An Open Corpus of Everyday Documents for Simplification Tasks
Abstract
In recent years, interest in creating statistical automated text simplification systems has increased. Many of these systems have used parallel corpora of articles taken from Wikipedia and Simple Wikipedia, or from Simple Wikipedia revision histories, and they generate Simple Wikipedia articles. In this work we motivate the need to construct a large, accessible corpus of everyday documents, paired with their simplifications, for developing and evaluating systems that make everyday documents more accessible. We present a detailed description of what this corpus will look like, along with the basic corpus of everyday documents we have already collected. The latter covers many domains, including driver's licensing, government aid, and banking, and contains over 120,000 sentences in total. We describe our preliminary work evaluating the feasibility of using crowdsourcing to generate simplifications for these documents. This work forms the basis for our future extended corpus, which will be made available to researchers interested in simplifying everyday documents.
Year
2014
DOI
10.3115/v1/W14-1210
Venue
PITR@EACL
Field
Text simplification, Information retrieval, Crowdsourcing, Computer science, Parallel corpora, Government
DocType
Conference
Citations
3
PageRank
0.38
References
15
Authors
2
Name | Order | Citations | PageRank
David Pellow | 1 | 3 | 0.38
Maxine Eskenazi | 2 | 979 | 127.53