Title
metaTED: a Corpus of Metadiscourse for Spoken Language
Abstract
This paper describes metaTED, a freely available corpus of metadiscursive acts in spoken language collected via crowdsourcing. Metadiscursive acts were annotated on a set of 180 randomly chosen TED talks in English, spanning different speakers and topics. The taxonomy used for annotation comprises 16 categories, adapted from Ädel (2010). This adaptation takes into account both the material to annotate and the setting in which the annotation task is performed. The crowdsourcing setup is described, including considerations regarding training and quality control. The collected data is evaluated in terms of quantity of occurrences, inter-annotator agreement, and annotation-related measures (such as average time on task and self-reported confidence). Results show different levels of agreement among metadiscourse acts (α ∈ [0.15, 0.49]). To further assess the collected material, a subset of the annotations was submitted to experts, who validated which of the marked occurrences truly correspond to instances of the metadiscursive act at hand. As with the crowd, expert agreement varied across categories (α ∈ [0.18, 0.72]). The paper concludes with a discussion of the applicability of metaTED with respect to each of the 16 categories of metadiscourse.
Year
2016
Venue
LREC 2016 - Tenth International Conference on Language Resources and Evaluation
Keywords
metadiscourse, spoken language, crowdsourcing
Field
Metadiscourse, Computer science, Natural language processing, Corpus linguistics, Artificial intelligence, Spoken language
DocType
Conference
Citations
0
PageRank
0.34
References
4
Authors
4
Name              Order  Citations  PageRank
Rui Correia       1      14         3.83
Nuno J. Mamede    2      310        57.76
Jorge Baptista    3      92         22.45
Maxine Eskenazi   4      979        127.53