Evaluating the impact of pre-annotation on annotation speed and potential bias: natural language processing gold standard development for clinical named entity recognition in clinical trial announcements. - Citegraph

Paper Info

Title
Evaluating the impact of pre-annotation on annotation speed and potential bias: natural language processing gold standard development for clinical named entity recognition in clinical trial announcements.

Abstract
Objective: To present a series of experiments: (1) to evaluate the impact of pre-annotation on the speed of manual annotation of clinical trial announcements; and (2) to test for potential bias, if pre-annotation is utilized.

Methods: To build the gold standard, 1400 clinical trial announcements from the clinicaltrials.gov website were randomly selected and double annotated for diagnoses, signs, symptoms, Unified Medical Language System (UMLS) Concept Unique Identifiers, and SNOMED CT codes. We used two dictionary-based methods to pre-annotate the text. We evaluated the annotation time and potential bias through F-measures and ANOVA tests and implemented Bonferroni correction.

Results: Time savings ranged from 13.85% to 21.5% per entity. Inter-annotator agreement (IAA) ranged from 93.4% to 95.5%. There was no statistically significant difference for IAA and annotator performance in pre-annotations.

Conclusions: On every experiment pair, the annotator with the pre-annotated text needed less time to annotate than the annotator with non-labeled text. The time savings were statistically significant. Moreover, the pre-annotation did not reduce the IAA or annotator performance. Dictionary-based pre-annotation is a feasible and practical method to reduce the cost of annotation of clinical named entity recognition in the eligibility sections of clinical trial announcements without introducing bias in the annotation process.
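The quantities named in the abstract (dictionary-based pre-annotation, F-measure agreement, Bonferroni correction, percentage time savings) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy dictionary, the function names, the exact-match span criterion, and the case-insensitive lookup are hypothetical and are not taken from the paper, whose actual dictionaries and matching rules are not given in this record.

```python
import re

# Hypothetical toy dictionary (surface term -> entity type); not from the paper.
TOY_DICTIONARY = {
    "type 2 diabetes": "diagnosis",
    "hypertension": "diagnosis",
    "chest pain": "sign_or_symptom",
}


def pre_annotate(text, dictionary=TOY_DICTIONARY):
    """Return (start, end, label) spans found by case-insensitive dictionary lookup."""
    spans = set()
    for term, label in dictionary.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            spans.add((match.start(), match.end(), label))
    return spans


def f_measure(spans_a, spans_b):
    """Exact-match F1 between two span sets, treating spans_a as the reference."""
    if not spans_a and not spans_b:
        return 1.0  # two empty annotations agree trivially
    matched = len(spans_a & spans_b)
    if matched == 0:
        return 0.0
    precision = matched / len(spans_b)
    recall = matched / len(spans_a)
    return 2 * precision * recall / (precision + recall)


def bonferroni_alpha(alpha, n_comparisons):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_comparisons


def percent_time_saved(baseline_seconds, pre_annotated_seconds):
    """Relative time saving of the pre-annotated condition, in percent."""
    return 100.0 * (baseline_seconds - pre_annotated_seconds) / baseline_seconds
```

With two annotators' span sets as input, `f_measure` gives one common way to express inter-annotator agreement for entity spans; comparing it between the pre-annotated and non-labeled conditions is the kind of check the abstract describes, though the paper's own procedure may differ in detail.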

Year	2014
DOI	10.1136/amiajnl-2013-001837
Venue	Journal of the American Medical Informatics Association
Keywords	information extraction, named entity recognition, natural language processing, umls, clinical trial announcements, pre-annotation, analysis of variance
Field	Data mining, Computer science, Clinical trial, Artificial intelligence, Natural language processing, SNOMED CT, Unique identifier, Annotation, Bonferroni correction, Information retrieval, Named-entity recognition, Unified Medical Language System, Medical diagnosis
DocType	Journal
Volume	21
Issue	3
ISSN	1067-5027
Citations	18
PageRank	0.79
References	23
Authors	9

Authors (9 rows)

Name	Order	Citations	PageRank
Todd Lingren	1	114	12.78
Louise Deleger	2	234	20.13
Katalin Molnar	3	62	4.60
Haijun Zhai	4	62	7.40
Jareen Meinzen-Derr	5	20	1.85
Megan Kaiser	6	92	7.44
Laura Stoutenborough	7	82	6.09
Qi Li	8	65	6.98
Imre Solti	9	337	23.36
