Abstract | ||
---|---|---|
The assignment of ICD-9-CM codes to a patient's clinical reports is a costly and tedious process performed manually by medical personnel, estimated to cost about $25 billion per year in the United States. Automating this process has long been a goal of researchers, but it remains an unsolved problem because of the inherent difficulty of processing unstructured clinical text. Here the problem is formulated as multi-label supervised learning, where the independent variable is the report's text and the dependent variable is the set of assigned ICD-9-CM labels. Different variations of two neural-network-based models, the Bag-of-Tricks and the Convolutional Neural Network (CNN), are investigated. The models are trained on the diabetic patient subset of the freely available MIMIC-III dataset. The results show that a CNN with three parallel convolutional layers achieves F1 scores of 44.51% for five-digit codes and 51.73% for three-digit, rolled-up codes. Although fully automated coding is not achievable, these results suggest that automated classification could be used to aid clinical staff by selecting the most probable codes. |
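
The abstract reports F1 at two granularities: full five-digit codes and three-digit, rolled-up codes. The sketch below illustrates one way such a comparison could be computed for multi-label predictions. It is not the authors' evaluation code. The helper names `roll_up` and `micro_f1` are hypothetical, the truncation rule (three characters, or four for E codes) is an assumption based on the usual ICD-9-CM category structure, and micro-averaging is also an assumption, since the abstract does not say which F1 variant was used.

```python
from typing import Iterable, Set


def roll_up(code: str) -> str:
    """Truncate an ICD-9-CM code to its category (assumed roll-up rule)."""
    code = code.replace(".", "").strip().upper()
    # E codes have a four-character category (E + 3 digits);
    # numeric and V codes have a three-character category.
    return code[:4] if code.startswith("E") else code[:3]


def micro_f1(true_sets: Iterable[Set[str]], pred_sets: Iterable[Set[str]]) -> float:
    """Micro-averaged F1 over per-report label sets (assumed F1 variant)."""
    tp = fp = fn = 0
    for true, pred in zip(true_sets, pred_sets):
        tp += len(true & pred)   # codes correctly predicted
        fp += len(pred - true)   # predicted but not assigned
        fn += len(true - pred)   # assigned but not predicted
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy labels for two reports (made up for illustration, not MIMIC-III data).
true_codes = [{"25000", "4019"}, {"25001"}]
pred_codes = [{"25002", "4019"}, {"25001", "V5867"}]

full_f1 = micro_f1(true_codes, pred_codes)
rolled_f1 = micro_f1(
    [{roll_up(c) for c in s} for s in true_codes],
    [{roll_up(c) for c in s} for s in pred_codes],
)
```

In this toy example the rolled-up score is higher than the full-code score, because a prediction that misses only the fourth or fifth digit still matches at the category level. This is consistent with the direction of the gap reported in the abstract, though the numbers here are purely illustrative.
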
Year | DOI | Venue |
---|---|---|
2018 | 10.1145/3279996.3280019 | international conference data science |
Keywords | DocType | Citations |
Electronic Health Records,Classification,Supervised Learning | Conference | 0 |
PageRank | References | Authors |
0.34 | 3 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Vitor Pereira | 1 | 1 | 3.06 |
Sérgio Matos | 2 | 415 | 29.51 |
José Luis Oliveira | 3 | 760 | 84.03 |