Title
Improving Medical Code Prediction from Clinical Text via Incorporating Online Knowledge Sources
Abstract
Clinical notes contain detailed information about the health status of patients for each of their encounters with a health system. Developing effective models to automatically assign medical codes to clinical notes has been a long-standing active research area. Despite great recent progress in medical informatics fueled by deep learning, it is still a challenge to find the specific piece of evidence in a clinical note that justifies a particular medical code out of all possible codes. Given the large number of online disease knowledge sources, which contain detailed information about the signs and symptoms of different diseases, their risk factors, and epidemiology, there is an opportunity to exploit such sources. In this paper we consider Wikipedia as an external knowledge source and propose Knowledge Source Integration (KSI), a novel end-to-end code assignment framework, which can integrate external knowledge during the training of any baseline deep learning model. The main idea of KSI is to calculate matching scores between a clinical note and disease-related Wikipedia documents, and to combine the scores with the output of the baseline model. To evaluate KSI, we experimented with automatic assignment of ICD-9 diagnosis codes to the emergency department clinical notes from the MIMIC-III data set, aided by Wikipedia documents corresponding to the ICD-9 codes. We evaluated several baseline models, ranging from logistic regression to recently proposed deep learning models known to achieve state-of-the-art accuracy on clinical notes. The results show that KSI consistently improves the baseline models and that it is particularly successful in the assignment of rare codes. In addition, by analyzing the weights of KSI models, we can gain an understanding of which words in Wikipedia documents provide useful information for predictions.
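The main idea stated in the abstract, scoring a note against each code's knowledge document and combining that score with a baseline output, can be illustrated with a toy sketch. This is not the KSI architecture from the paper: the word-overlap score, the additive combination with a `weight` parameter, and all function names, codes, texts, and logits below are assumptions made up for illustration.

```python
import math


def bag_of_words(text, vocab):
    """Binary bag-of-words vector over a fixed vocabulary."""
    tokens = set(text.lower().split())
    return [1.0 if word in tokens else 0.0 for word in vocab]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def combine_with_knowledge(note, wiki_docs, baseline_logits, vocab, weight=1.0):
    """Add a note-vs-document matching score to each code's baseline logit.

    The matching score is plain word overlap (dot product of binary vectors),
    a stand-in chosen for illustration, not the paper's learned scoring.
    """
    note_vec = bag_of_words(note, vocab)
    probs = {}
    for code, doc in wiki_docs.items():
        doc_vec = bag_of_words(doc, vocab)
        match = sum(n * d for n, d in zip(note_vec, doc_vec))
        probs[code] = sigmoid(baseline_logits[code] + weight * match)
    return probs


# Hypothetical toy data: placeholder codes, made-up texts, neutral baseline logits.
vocab = ["fever", "cough", "fracture", "pain"]
wiki_docs = {
    "code_a": "pneumonia causes fever and cough",
    "code_b": "fracture causes pain",
}
baseline_logits = {"code_a": 0.0, "code_b": 0.0}
probs = combine_with_knowledge(
    "patient has fever and cough", wiki_docs, baseline_logits, vocab
)
```

With a neutral baseline, the code whose document shares more words with the note ends up with the higher probability, which is the intuition behind letting external documents help when the baseline has seen few training examples of a code.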
Year
2019
DOI
10.1145/3308558.3313485
Venue
WWW '19: The Web Conference on The World Wide Web Conference WWW 2019
Keywords
Multi-label classification, attention mechanism, document similarity learning, healthcare
Field
Health care, Medical classification, Diagnosis code, Information retrieval, Computer science, Code assignment, Exploit, Multi-label classification, Artificial intelligence, Deep learning, Health informatics, Machine learning
DocType
Conference
ISBN
978-1-4503-6674-8
Citations
1
PageRank
0.41
References
0
Authors
2
Name | Order | Citations | PageRank
Tian Bai | 1 | 16 | 3.40
Slobodan Vucetic | 2 | 637 | 56.38