Title
Improving condition severity classification with an efficient active learning based framework.
Abstract
Classification of condition severity can be useful for discriminating among sets of conditions or phenotypes, for example when prioritizing patient care or for other healthcare purposes. Electronic Health Records (EHRs) represent a rich source of labeled information that can be harnessed for severity classification. However, labeling EHRs is expensive and in many cases requires professionals with a high level of expertise. In this study, we demonstrate the use of Active Learning (AL) techniques to decrease expert labeling efforts. We employ three AL methods and demonstrate their ability to reduce labeling efforts while effectively discriminating condition severity. We incorporate these three AL methods into a new framework, based on the original CAESAR (Classification Approach for Extracting Severity Automatically from Electronic Health Records) framework, to create the Active Learning Enhancement framework (CAESAR-ALE). We applied CAESAR-ALE to a dataset containing 516 conditions of varying severity levels that were manually labeled by seven experts. Our dataset, called the "CAESAR dataset," was created from the medical records of 1.9 million patients treated at Columbia University Medical Center (CUMC). All three AL methods decreased labelers' efforts compared to the learning methods applied by the original CAESAR framework, in which the classifier was trained on the entire set of conditions; depending on the AL strategy used in the current study, the reduction ranged from 48% to 64%, which can result in significant savings in both time and money. In terms of the PPV (precision) measure, CAESAR-ALE achieved an absolute improvement of more than 13% in the predictive capabilities of the framework when classifying conditions as severe. These results demonstrate the potential of AL methods to decrease the labeling efforts of medical experts while increasing accuracy given the same (or even a smaller) number of acquired conditions. We also demonstrated that the methods included in the CAESAR-ALE framework (Exploitation and Combination_XA) are more robust to the use of human labelers with different levels of professional expertise.
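To make the general idea of pool-based active learning concrete, the following is a minimal sketch and not the CAESAR-ALE implementation. It assumes a generic margin-based uncertainty sampling rule and uses a nearest-centroid classifier as a stand-in for the paper's classifier; the function names (`train_centroids`, `margin`, `active_learning`) and the toy one-dimensional "severity score" data are hypothetical and chosen only for illustration.

```python
import math

def distance(p, q):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def train_centroids(labeled):
    """Fit a nearest-centroid model: one mean vector per class label."""
    groups = {}
    for x, y in labeled:
        groups.setdefault(y, []).append(x)
    return {y: tuple(sum(col) / len(xs) for col in zip(*xs))
            for y, xs in groups.items()}

def predict(model, x):
    """Assign x to the class whose centroid is closest."""
    return min(model, key=lambda y: distance(x, model[y]))

def margin(model, x):
    """Gap between the two nearest centroids; a small gap means uncertain.
    Assumes the model already contains at least two classes."""
    d = sorted(distance(x, c) for c in model.values())
    return d[1] - d[0]

def active_learning(pool, oracle, seed_idx, budget):
    """Query the expert (oracle) only for the most uncertain pool items."""
    seed = set(seed_idx)
    labeled = [(pool[i], oracle(pool[i])) for i in seed_idx]
    unlabeled = [x for i, x in enumerate(pool) if i not in seed]
    for _ in range(budget):
        if not unlabeled:
            break
        model = train_centroids(labeled)
        query = min(unlabeled, key=lambda x: margin(model, x))
        unlabeled.remove(query)
        labeled.append((query, oracle(query)))  # one expert label spent
    return train_centroids(labeled), labeled

# Toy data (hypothetical): 20 conditions with a 1-D score; "severe" if >= 10.
pool = [(float(i),) for i in range(20)]

def oracle(x):
    return int(x[0] >= 10)

# Seed with one example of each class, then spend three more expert labels.
model, labeled = active_learning(pool, oracle, seed_idx=[0, 19], budget=3)
```

In this toy setting the learner spends its label budget near the class boundary rather than on randomly chosen items, which is the intuition behind reducing expert labeling effort; the strategies named in the abstract may select items differently.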
Year: 2016
DOI: 10.1016/j.jbi.2016.03.016
Venue: Journal of Biomedical Informatics
Keywords: Active Learning, Condition, Electronic Health Records, Phenotyping, Severity
Field: Data mining, Active learning, Computer science, Support vector machine, Automation, Data curation, Artificial intelligence, SNOMED CT, Classifier (linguistics), Problem-based learning, Machine learning, Test set
DocType: Journal
Volume: 61
Issue: C
ISSN: 1532-0464
Citations: 5
PageRank: 0.42
References: 30
Authors: 7
Name                   Order  Citations  PageRank
Nir Nissim             1      199        19.42
Mary Regina Boland     2      100        8.63
Nicholas P. Tatonetti  3      98         7.37
Yuval Elovici          4      2583       204.53
George Hripcsak        5      1493       160.86
Yuval Shahar           6      1974       214.22
Robert Moskovitch      7      729        39.62