Title
Creating a bi-lingual entailment corpus through translations with Mechanical Turk: $100 for a 10-day rush
Abstract
This paper reports on experiments in the creation of a bi-lingual Textual Entailment corpus, using a non-expert workforce under strict cost and time limitations ($100, 10 days). To this aim, workers were hired for translation and validation tasks through the CrowdFlower channel to Amazon Mechanical Turk. As a result, an accurate and reliable corpus of 426 English/Spanish entailment pairs was produced in a more cost-effective way than with other crowdsourcing-based methods for acquiring translations. Focusing on two orthogonal dimensions (i.e. the reliability of annotations made by non-experts, and overall corpus creation costs), we summarize the methodology we adopted, the results achieved, the main problems encountered, and the lessons learned.
Year
2010
Venue
Mturk@HLT-NAACL
Keywords
aim worker,bi-lingual entailment corpus,orthogonal dimension,crowd-flower channel,10-day rush,spanish entailment pair,reliable corpus,non expert,main problem,amazon mechanical turk,bi-lingual textual entailment corpus,overall corpus creation cost
Field
Logical consequence,Textual entailment,Computer science,Crowdsourcing,Artificial intelligence,Natural language processing
DocType
Conference
Citations
27
PageRank
1.68
References
6
Authors
2
Name | Order | Citations | PageRank
Matteo Negri | 1 | 775 | 82.49
Yashar Mehdad | 2 | 514 | 32.04