Abstract | ||
---|---|---|
This article discusses the transition from annotated data to a gold standard, that is, a subset that is sufficiently noise-free with high confidence. Unless appropriately reinterpreted, agreement coefficients do not indicate the quality of the data set as a benchmarking resource: High overall agreement is neither sufficient nor necessary to distill some amount of highly reliable data from the annotated material. A mathematical framework is developed that allows estimation of the noise level of the agreed subset of annotated data, which helps promote cautious benchmarking. |
Year | DOI | Venue |
---|---|---|
2009 | 10.1162/coli.2009.35.4.35402 | Computational Linguistics |
Keywords | Field | DocType |
high overall agreement, agreement coefficient, high confidence, gold standard, cautious benchmarking, reliable data, benchmarking resource, annotator agreement, annotated data, annotated material, mathematical framework, noise model | Data mining, Computer science, Noise level, Benchmarking | Journal |
Volume | Issue | ISSN |
35 | 4 | 0891-2017 |
Citations | PageRank | References |
27 | 1.41 | 19 |
Authors | ||
2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Beata Beigman Klebanov | 1 | 137 | 19.49 |
Eyal Beigman | 2 | 108 | 9.70 |