Title
Unlocking Data for Statistical Analyses and Data Mining: Generic Case Extraction of Clinical Items from i2b2 and tranSMART.
Abstract
In medical science, modern IT concepts are increasingly important to gather new findings out of complex diseases. Data Warehouses (DWH) as central data repository systems play a key role by providing standardized, high-quality and secure medical data for effective analyses. However, DWHs in medicine must fulfil various requirements concerning data privacy and the ability to describe the complexity of (rare) disease phenomena. Here, i2b2 and tranSMART are free alternatives representing DWH solutions especially developed for medical informatics purposes. But different functionalities are not yet provided in a sufficient way. In fact, data import and export is still a major problem because of the diversity of schemas, parameter definitions and data quality which are described variously in each single clinic. Further, statistical analyses inside i2b2 and tranSMART are possible, but restricted to the implemented functions. Thus, data export is needed to provide a data basis which can be directly included within statistics software like SPSS and SAS or data mining tools like Weka and RapidMiner. The standard export tools of i2b2 and tranSMART are more or less creating a database dump of key-value pairs which cannot be used immediately by the mentioned tools. They need an instance-based or a case-based representation of each patient. To overcome this lack, we developed a concept called Generic Case Extractor (GCE) which pivots the key-value pairs of each clinical fact into a row-oriented format for each patient sufficient to enable analyses in a broader context. Therefore, complex pivotisation routines where necessary to ensure temporal consistency especially in terms of different data sets and the occurrence of identical but repeated parameters like follow-up data. GCE is embedded inside a comprehensive software platform for systems medicine.
Year
DOI
Venue
2016
10.3233/978-1-61499-678-1-567
Studies in Health Technology and Informatics
Keywords
Field
DocType
i2b2,tranSMART,data warehouse,data export,statistical analyses,data mining
Data science,Data mining,Computer science
Conference
Volume
ISSN
Citations 
228
0926-9630
0
PageRank 
References 
Authors
0.34
0
5
Name
Order
Citations
PageRank
Daniel Firnkorn111.11
Sebastian Merker200.34
Matthias Ganzinger356.63
Thomas Muley400.34
Petra Knaup532429.30