Title
Creating language resources for under-resourced languages: methodologies, and experiments with Arabic
Abstract
Language resources are important for those working on computational methods to analyse and study languages. These resources are needed to help advance research in fields such as natural language processing, machine learning, information retrieval and text analysis in general. We describe the creation of useful resources for languages that currently lack them, taking resources for Arabic summarisation as a case study. We illustrate three different paradigms for creating language resources, namely: (1) using crowdsourcing to produce a small resource rapidly and relatively cheaply; (2) translating an existing gold-standard dataset, which is relatively easy but potentially of lower quality; and (3) using manual effort with appropriately skilled human participants to create a resource that is more expensive but of high quality. The last of these was used as a test collection for TAC-2011. An evaluation of the resources is also presented.
Year: 2015
DOI: 10.1007/s10579-014-9274-3
Venue: Language Resources and Evaluation
Keywords: Resources, Summarisation, Arabic, Under-resourced languages
Field: Arabic, Crowdsourcing, Computer science, Speech recognition, Artificial intelligence, Natural language processing
DocType: Journal
Volume: 49
Issue: 3
ISSN: 1574-020X
Citations: 5
PageRank: 0.59
References: 64
Authors: 3
Name              Order  Citations  PageRank
Mahmoud El-Haj    1      32         6.03
Udo Kruschwitz    2      387        55.73
Chris Fox         3      51         6.88