Title
Assessing the effect of data integration on predictive ability of cancer survival models.
Abstract
Cancer is the second leading cause of death in the United States. To improve cancer prognosis and survival rates, a better understanding of the multi-level factors that contribute to cancer survival is needed. However, prior research on cancer survival has focused primarily on individual-level factors because integrated datasets have been scarce. In this study, we examined how data integration affects the performance of cancer survival prediction models. We linked data from four different sources and evaluated the performance of Cox proportional hazards models for breast, lung, and colorectal cancers under three common data integration scenarios. We showed that adding contextual-level predictors to survival models by linking multiple datasets improved model fit and performance. We also showed that different representations of the same variable or concept affect model performance differently. When building statistical models for cancer outcomes, it is important to consider cross-level predictor interactions.
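As a rough illustration of what a cross-level interaction means in a Cox proportional hazards model, the general form can be sketched as below. The notation is hypothetical and not taken from the paper: x_i stands for the individual-level predictors of patient i, z_{c(i)} for the contextual-level predictors of the area c(i) linked to that patient, and the last term for one possible interaction between an individual-level and a contextual-level predictor.

```latex
% Sketch only: symbols are illustrative assumptions, not the authors' specification.
h_i(t) = h_0(t)\,\exp\!\left(
    \beta^{\top} x_i
  + \gamma^{\top} z_{c(i)}
  + \delta\, x_{ij}\, z_{c(i)k}
\right)
```

Under this sketch, a model restricted to individual-level data corresponds to fixing \gamma = 0 and \delta = 0, so comparing it with the full model is one way to frame the effect of adding linked contextual predictors.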
Year
2020
DOI
10.1177/1460458218824692
Venue
HEALTH INFORMATICS JOURNAL
Keywords
cancer survival, data heterogeneities, data integration, interactions, model performance, multi-level data analysis
DocType
Journal
Volume
26.0
Issue
SP1.0
ISSN
1460-4582
Citations
0
PageRank
0.34
References
0
Authors
7
Name | Order | Citations | PageRank
Yi Guo | 1 | 12 | 10.16
Jiang Bian | 2 | 150 | 43.09
Francois Modave | 3 | 0 | 0.34
Qian Li | 4 | 2 | 1.06
Thomas J. George | 5 | 5 | 2.55
Mattia C. F. Prosperi | 6 | 99 | 22.97
Elizabeth Shenkman | 7 | 4 | 3.87