Title
Evaluating the Impact of Data Representation on EHR-Based Analytic Tasks.
Abstract
Different analytic techniques operate optimally with different types of data. As the use of EHR-based analytics expands to newer tasks, data will have to be transformed into different representations so that each task can be solved optimally. We classified representations into broad categories based on their characteristics and proposed a new knowledge-driven representation for clinical data mining and trajectory mining, called Severity Encoding Variables (SEVs). Additionally, we studied which characteristics make representations most suitable for particular clinical analytics tasks, including trajectory mining. Our evaluation shows that, for regression, most data representations performed similarly, with SEV achieving a slight (albeit statistically significant) advantage. For patients at high risk of diabetes, SEV outperformed the competing representation by a relative 20%. For association mining, SEV achieved the highest performance; its ability to constrain the search space of patterns through clinical knowledge was key to its success.
Year
2019
DOI
10.3233/SHTI190229
Venue
Studies in Health Technology and Informatics
Keywords
Data Science, Electronic Health Records, Data Mining
Field
Data science, Data mining, External Data Representation, Medicine
DocType
Conference
Volume
264
ISSN
0926-9630
Citations
0
PageRank
0.34
References
0
Authors
7
Name                Order  Citations  PageRank
Wonsuk Oh           1      5          2.16
Michael Steinbach   2      1760       91.22
Regina Castro       3      10         3.56
Kevin A Peterson    4      0          0.34
Vipin Kumar         5      11560      934.35
Pedro J Caraballo   6      29         7.59
Gyorgy J Simon      7      0          0.34