Characterizing Sub-Cohorts via Data Normalization and Representation Learning - Citegraph

Paper Info

Title
Characterizing Sub-Cohorts via Data Normalization and Representation Learning

Abstract
The process of identifying a cohort of interest is a very challenging task. It requires manually inspecting many patient records of complex structure that might include medical coding errors and missing data. This paper presents a computational pipeline for refining the process of cohort selection based on medical concepts recorded in the electronic health records (EHRs). The pipeline extracts EHR data for a given cohort and normalizes this data using standard vocabularies. Then a stacked denoising autoencoder is used to embed the normalized patient vectors in a low dimensional space, where the patients are subsequently clustered into sub-cohorts. The goal is to represent the cohort in a standard format and abstract variants of sub-populations. As a use-case, we applied the pipeline to 1.8 million Veterans diagnosed with major depressive disorder (MDD), and identified four meaningful sub-cohorts using the features learned by the autoencoder. Then, each sub-cohort was explored using a set of keywords for interpretation.
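The embed-then-cluster portion of the pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic binary patient-by-concept matrix, the layer sizes (16, then 4), the masking-noise rate, the greedy layer-wise training, and the use of k-means are all assumptions chosen for illustration, and the vocabulary-normalization step is skipped (the input matrix stands in for already-normalized patient vectors).

```python
import numpy as np

def make_patients(rng, n_per_group=50, n_concepts=40, n_groups=4):
    """Synthetic binary patient-by-concept matrix (hypothetical data, not from the paper)."""
    size = n_concepts // n_groups
    blocks = []
    for g in range(n_groups):
        p = np.full(n_concepts, 0.05)          # background rate for every concept
        p[g * size:(g + 1) * size] = 0.8       # concepts typical of group g
        blocks.append(rng.random((n_per_group, n_concepts)) < p)
    return np.vstack(blocks).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_dae(X, n_hidden, rng, noise=0.2, lr=0.5, epochs=200):
    """Train one denoising autoencoder layer; return the encoder (W, b)."""
    n, d = X.shape
    W1 = rng.normal(0, 0.1, (d, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 0.1, (n_hidden, d)); b2 = np.zeros(d)
    for _ in range(epochs):
        Xn = X * (rng.random(X.shape) > noise)  # masking noise: zero out some inputs
        H = sigmoid(Xn @ W1 + b1)               # encode the corrupted input
        R = sigmoid(H @ W2 + b2)                # reconstruct
        dZ2 = (R - X) / n                       # cross-entropy gradient against the CLEAN input
        dZ1 = (dZ2 @ W2.T) * H * (1 - H)
        W2 -= lr * (H.T @ dZ2); b2 -= lr * dZ2.sum(0)
        W1 -= lr * (Xn.T @ dZ1); b1 -= lr * dZ1.sum(0)
    return W1, b1

def encode(X, layers):
    """Pass data through the stacked encoder layers (no noise at inference)."""
    for W, b in layers:
        X = sigmoid(X @ W + b)
    return X

def kmeans(Z, k, rng, iters=50):
    """Plain Lloyd's k-means on the embedded patients."""
    centers = Z[rng.choice(len(Z), k, replace=False)]
    for _ in range(iters):
        dist = ((Z[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = dist.argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(0)
    return labels

rng = np.random.default_rng(0)
X = make_patients(rng)

# Greedy layer-wise stacking: each layer is trained on the previous layer's codes.
layers, H = [], X
for n_hidden in (16, 4):
    layer = train_dae(H, n_hidden, rng)
    layers.append(layer)
    H = encode(H, [layer])

Z = encode(X, layers)              # low-dimensional patient embedding
labels = kmeans(Z, k=4, rng=rng)   # candidate sub-cohort assignments
print(Z.shape, np.bincount(labels, minlength=4))
```

On real EHR data, the number of clusters and the embedding size would need to be chosen and validated rather than fixed in advance as they are here.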

Year	2020
DOI	10.1109/CBMS49503.2020.00040
Venue	2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS)
Keywords	electronic health records, data normalization, UMLS, cohort selection, representation learning, clustering
DocType	Conference
ISSN	2372-918X
ISBN	978-1-7281-9430-1
Citations	0
PageRank	0.34
References	4
Authors	8

Authors (8 rows)

Name	Order	Citations	PageRank
Everett Neil Rush	1	0	0.34
Özgür Özmen	2	0	0.34
Kathryn Knight	3	2	2.12
Byung H. Park	4	6	2.24
Clifton Baker	5	0	0.34
Makoto L. Jones	6	0	0.34
Merry Ward	7	0	0.34
Jonathan R. Nebeker	8	0	0.34
