Abstract |
---|
Identifying a cohort of interest is challenging: it requires manually inspecting many patient records with complex structure, which may contain medical coding errors and missing data. This paper presents a computational pipeline for refining cohort selection based on medical concepts recorded in electronic health records (EHRs). The pipeline extracts EHR data for a given cohort and normalizes it using standard vocabularies. A stacked denoising autoencoder then embeds the normalized patient vectors in a low-dimensional space, where the patients are clustered into sub-cohorts. The goal is to represent the cohort in a standard format and abstract variants of sub-populations. As a use case, we applied the pipeline to 1.8 million Veterans diagnosed with major depressive disorder (MDD) and identified four meaningful sub-cohorts using the features learned by the autoencoder. Each sub-cohort was then explored using a set of keywords for interpretation. |
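
The normalization and clustering steps named in the abstract can be illustrated with a minimal sketch. It makes several assumptions that do not come from the paper: the source codes and concept identifiers in `CODE_TO_CONCEPT` are invented placeholders (not real UMLS concepts), the helper names `to_vector` and `kmeans` are hypothetical, and k-means stands in for a clustering method the abstract does not specify. The stacked denoising autoencoder is omitted, so the toy patients are clustered directly on their normalized vectors rather than on a learned embedding.

```python
# Hypothetical mapping from site-specific codes to standard concept IDs.
# All identifiers are invented for illustration.
CODE_TO_CONCEPT = {
    "local:DEP-1": "C_DEPRESSION",
    "local:DEP-2": "C_DEPRESSION",
    "local:INS-1": "C_INSOMNIA",
    "local:INS-2": "C_INSOMNIA",
    "local:ALC-1": "C_ALCOHOL_USE",
    "local:ALC-2": "C_ALCOHOL_USE",
}
CONCEPTS = sorted(set(CODE_TO_CONCEPT.values()))


def to_vector(codes):
    """Return a multi-hot vector over CONCEPTS; unmapped codes are ignored."""
    present = {CODE_TO_CONCEPT[c] for c in codes if c in CODE_TO_CONCEPT}
    return [1.0 if concept in present else 0.0 for concept in CONCEPTS]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Plain k-means, seeded deterministically with the first k distinct points."""
    distinct = []
    for p in points:
        if p not in distinct:
            distinct.append(p)
    if len(distinct) < k:
        raise ValueError("need at least k distinct points")
    centers = [list(p) for p in distinct[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy cohort: different source codes, same underlying concepts.
patients = {
    "p1": ["local:DEP-1", "local:INS-1"],
    "p2": ["local:DEP-2", "local:INS-2"],
    "p3": ["local:DEP-1", "local:ALC-1"],
    "p4": ["local:DEP-2", "local:ALC-2", "local:UNKNOWN"],
}
vectors = [to_vector(codes) for codes in patients.values()]
labels = kmeans(vectors, k=2)
print(dict(zip(patients, labels)))
```

In this toy cohort, `p1` and `p2` use different source codes but normalize to the same concept vector, so they fall into the same sub-cohort. This is the effect that normalizing to a standard vocabulary is meant to achieve before any embedding or clustering.
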
Year | DOI | Venue |
---|---|---|
2020 | 10.1109/CBMS49503.2020.00040 | 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS) |
Keywords | DocType | ISSN |
---|---|---|
electronic health records, data normalization, UMLS, cohort selection, representation learning, clustering | Conference | 2372-918X |
ISBN | Citations | PageRank |
---|---|---|
978-1-7281-9430-1 | 0 | 0.34 |
References | Authors |
---|---|
4 | 8 |
Name | Order | Citations | PageRank |
---|---|---|---|
Everett Neil Rush | 1 | 0 | 0.34 |
Özgür Özmen | 2 | 0 | 0.34 |
Kathryn Knight | 3 | 2 | 2.12 |
Byung H. Park | 4 | 6 | 2.24 |
Clifton Baker | 5 | 0 | 0.34 |
Makoto L. Jones | 6 | 0 | 0.34 |
Merry Ward | 7 | 0 | 0.34 |
Jonathan R. Nebeker | 8 | 0 | 0.34 |