An Open Corpus of Everyday Documents for Simplification Tasks - Citegraph

Paper Info

Title
An Open Corpus of Everyday Documents for Simplification Tasks

Abstract
In recent years, interest in creating statistical automated text simplification systems has increased. Many of these systems have used parallel corpora of articles taken from Wikipedia and Simple Wikipedia, or from Simple Wikipedia revision histories, and generate Simple Wikipedia articles. In this work, we motivate the need to construct a large, accessible corpus of everyday documents along with their simplifications for the development and evaluation of simplification systems that make everyday documents more accessible. We present a detailed description of what this corpus will look like and the basic corpus of everyday documents we have already collected. The latter contains everyday documents from many domains, including driver’s licensing, government aid, and banking, and totals over 120,000 sentences. We describe our preliminary work evaluating the feasibility of using crowdsourcing to generate simplifications for these documents. This is the basis for our future extended corpus, which will be available to the community of researchers interested in simplification of everyday documents.

Year	DOI	Venue
2014	10.3115/v1/W14-1210	PITR@EACL
Field	DocType	Citations
Text simplification,Information retrieval,Crowdsourcing,Computer science,Parallel corpora,Government	Conference	3
PageRank	References	Authors
0.38	15	2

Authors (2 rows)

Name	Order	Citations	PageRank
David Pellow	1	3	0.38
Maxine Eskenazi	2	979	127.53
