Title
Supporting the annotation of chronic obstructive pulmonary disease (COPD) phenotypes with text mining workflows.
Abstract
Chronic obstructive pulmonary disease (COPD) is a life-threatening lung disorder whose recent prevalence has led to an increasing burden on public healthcare. Phenotypic information in electronic clinical records is essential in providing suitable personalised treatment to patients with COPD. However, as phenotypes are often "hidden" within free text in clinical records, clinicians could benefit from text mining systems that facilitate their prompt recognition. This paper reports on a semi-automatic methodology for producing a corpus that can ultimately support the development of text mining tools that, in turn, will expedite the process of identifying groups of COPD patients. A corpus of 30 full-text papers was formed based on selection criteria informed by the expertise of COPD specialists. We developed an annotation scheme that is aimed at producing fine-grained, expressive and computable COPD annotations without burdening our curators with a highly complicated task. This was implemented in the Argo platform by means of a semi-automatic annotation workflow that integrates several text mining tools, including a graphical user interface for marking up documents. When evaluated using gold standard (i.e., manually validated) annotations, the semi-automatic workflow was shown to obtain a micro-averaged F-score of 45.70% (with relaxed matching). Utilising the gold standard data to train new concept recognisers, we demonstrated that our corpus, although still a work in progress, can foster the development of significantly better performing COPD phenotype extractors. We describe in this work the means by which we aim to eventually support the process of COPD phenotype curation, i.e., by the application of various text mining tools integrated into an annotation workflow. Although the corpus being described is still under development, our results thus far are encouraging and show great potential in stimulating the development of further automatic COPD phenotype extractors.
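The abstract reports a micro-averaged F-score under relaxed matching without spelling out the computation. The following is a minimal sketch of one common way to compute such a score, not the authors' evaluation code. It assumes that "relaxed" means a predicted span counts as correct when it overlaps a gold span of the same concept type; the paper's exact criterion may differ. The `Span` layout and the names `overlaps` and `micro_f_score` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical annotation layout: (start offset, end offset, concept type).
Span = Tuple[int, int, str]


def overlaps(a: Span, b: Span) -> bool:
    """Assumed relaxed criterion: same type and at least one shared character."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def micro_f_score(gold_docs: List[List[Span]], pred_docs: List[List[Span]]) -> float:
    """Pool counts over all documents first, then compute a single F-score."""
    matched_pred = matched_gold = n_pred = n_gold = 0
    for gold, pred in zip(gold_docs, pred_docs):
        n_gold += len(gold)
        n_pred += len(pred)
        # Predictions that overlap some gold span count towards precision.
        matched_pred += sum(any(overlaps(p, g) for g in gold) for p in pred)
        # Gold spans hit by some prediction count towards recall.
        matched_gold += sum(any(overlaps(g, p) for p in pred) for g in gold)
    precision = matched_pred / n_pred if n_pred else 0.0
    recall = matched_gold / n_gold if n_gold else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data, invented purely for illustration.
gold_docs = [[(0, 5, "A"), (10, 20, "B")], [(3, 8, "A")]]
pred_docs = [[(2, 6, "A"), (30, 35, "B")], [(0, 2, "A"), (4, 6, "A")]]
score = micro_f_score(gold_docs, pred_docs)
```

Micro-averaging pools the match counts across the whole corpus before computing precision and recall, so documents with many annotations weigh more than documents with few.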
Year: 2015
DOI: 10.1186/s13326-015-0004-6
Venue: J. Biomedical Semantics
Keywords: corpora for clinical text mining, phenotype curation, chronic obstructive pulmonary disease, corpus annotation, automatic annotation workflows, ontology linking
Field: COPD, Data science, Data mining, Disease, Text mining, Annotation, Computer science, Lung Disorder, Public healthcare, Workflow
DocType: Journal
Volume: 6
Issue: 1
ISSN: 2041-1480
Citations: 4
PageRank: 0.51
References: 37
Authors: 4
Name                          Order  Citations  PageRank
Xiao Fu                       1      4          0.51
Riza Theresa Batista-Navarro  2      98         10.87
Rafal Rak                     3      382        18.30
Sophia Ananiadou              4      2658       183.08