Title
Momresp: A Bayesian Model for Multi-Annotator Document Labeling
Abstract
Data annotation in modern practice often involves multiple, imperfect human annotators. Multiple annotations can be used to infer estimates of the ground-truth labels and of individual annotator error characteristics (or reliability). We introduce MOMRESP, a model that extends item response models to incorporate information both from natural data clusters and from the annotations of multiple annotators in order to infer ground-truth labels for the document classification task. We implement this model and show that MOMRESP can use unlabeled data to improve estimates of the ground-truth labels dramatically over a majority vote baseline in situations where annotations are scarce and of low quality, as well as in situations where annotators disagree consistently. Correspondingly, in those same situations, its estimates of annotator reliability are also stronger than those of the majority vote baseline. Because MOMRESP predictions are subject to label switching, we introduce a solution that finds nearly optimal predicted class reassignments in a variety of settings using only information available to the model at inference time. Although MOMRESP does not perform well in annotation-rich situations, we present evidence suggesting how this shortcoming might be overcome in future work.
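The abstract describes the label-switching fix only at a high level. As an illustrative sketch (not the paper's actual procedure), one standard way to recover a near-optimal class reassignment from information available at inference time is to align the model's predicted classes against annotation-derived majority-vote labels using the Hungarian algorithm; the function name align_labels and the choice of majority-vote labels as the alignment reference are assumptions made for this example.

```python
# Hypothetical sketch of label-switching resolution: remap predicted cluster
# labels to maximize agreement with a reference labeling (e.g., majority vote
# over annotations), via the Hungarian algorithm. Illustrative only; this is
# not necessarily the procedure used in the paper.
import numpy as np
from scipy.optimize import linear_sum_assignment

def align_labels(predicted, reference, num_classes):
    """Return predicted labels remapped to best agree with reference labels."""
    # counts[i, j] = how often predicted class i co-occurs with reference class j.
    counts = np.zeros((num_classes, num_classes), dtype=int)
    for p, r in zip(predicted, reference):
        counts[p, r] += 1
    # Hungarian algorithm minimizes cost, so negate counts to maximize agreement.
    rows, cols = linear_sum_assignment(-counts)
    mapping = dict(zip(rows, cols))
    return np.array([mapping[p] for p in predicted])

# Example: the model's clusters are a permutation of the reference classes.
pred = np.array([0, 0, 1, 1, 2, 2])
ref = np.array([2, 2, 0, 0, 1, 1])
print(align_labels(pred, ref, 3))  # -> [2 2 0 0 1 1]
```

Because the reference labeling here is itself derived from the annotations, this alignment uses only information available at inference time, consistent with the constraint stated in the abstract.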
Year
2014
Venue
LREC 2014 - Ninth International Conference on Language Resources and Evaluation
Keywords
Bayesian models, corpus annotation, crowd-sourcing, identifiability
Field
Document classification, Annotation, Imperfect, Bayesian inference, Inference, Computer science, Artificial intelligence, Label switching, Natural language processing, Majority rule, Data Annotation
DocType
Conference
Citations
3
PageRank
0.39
References
12
Authors
4
Name               Order   Citations   PageRank
Paul Felt          1       23          5.35
Robbie Haertel     2       81          7.19
Eric K. Ringger    3       272         39.24
Kevin D. Seppi     4       335         41.46