Title
A Novel Method For Clinical Risk Prediction With Low-Quality Data
Abstract
Predictive models for clinical risks (such as adverse drug reactions, hospital readmission, and chronic disease onset) built on real-world data constantly struggle with low-quality issues: redundant and highly correlated features, extreme class imbalance, and, most importantly, a large number of missing values. In most existing work, each patient is represented as a fixed-length value vector in some feature space, and missing values must be imputed, which introduces considerable noise into prediction when the data set is highly incomplete. The other challenges either remain unresolved or are only partially addressed during modeling, without a systematic approach. In this paper, we propose a novel framework to address these low-quality problems. We first treat each patient as a bag containing a variable number of feature-value pairs, called instances, and map them to an embedding space through our proposed feature embedding method, so that the model learns from the observed pairs directly. In this way, predictive models naturally avoid the negative impact of missing data. A novel multi-instance neural network then follows, using two computational modules to deal with correlated and redundant features: multi-head attention and attention-based multi-instance pooling. These modules capture correlations among instances and locate the valuable information in each instance or bag. The feature embedding and the multi-instance neural network are parameterized and optimized jointly in an end-to-end manner. Moreover, training is carried out under both main and auxiliary supervision with focal loss functions to mitigate the effect of a highly imbalanced label set. The proposed framework is named AMI-Net3. We evaluate it on three real-world data sets covering different clinical risk prediction tasks: adverse drug reaction to risperidone, schizophrenia relapse, and invasive fungal infection. The comprehensive experimental results demonstrate the effectiveness of our proposed method and its superiority over competitive baselines.
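The ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feature names, the embedding rule (a per-feature vector scaled by the observed value), the single attention query, and the `gamma`/`alpha` defaults are all assumptions chosen for illustration, and the multi-head attention module and auxiliary supervision are omitted. It shows a patient record turned into a bag of observed feature-value pairs without imputation, attention-based pooling of the bag into one vector, and the standard binary focal loss.

```python
import math
import random

def to_bag(record):
    """Keep only observed (feature, value) pairs; missing values are dropped, not imputed."""
    return [(name, float(value)) for name, value in record.items() if value is not None]

def embed_instance(feature_vec, value):
    """Hypothetical embedding: scale a per-feature vector by the observed value."""
    return [value * w for w in feature_vec]

def attention_pool(instances, query):
    """Softmax-weighted sum of instance embeddings (attention-based multi-instance pooling)."""
    scores = [sum(q * x for q, x in zip(query, inst)) for inst in instances]
    top = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [sum(w * inst[d] for w, inst in zip(weights, instances))
              for d in range(len(query))]
    return pooled, weights

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss: -alpha_t * (1 - p_t)^gamma * log(p_t)."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = min(max(p_t, 1e-12), 1.0)        # guard against log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# Illustrative patient record; None marks a missing measurement (names are hypothetical).
record = {"age": 42.0, "alt": None, "dose_mg": 3.0, "bmi": None}
bag = to_bag(record)

# Randomly initialised stand-ins for parameters that would be learned end to end.
DIM = 4
rng = random.Random(0)
feature_table = {name: [rng.uniform(-1, 1) for _ in range(DIM)] for name in record}
query = [rng.uniform(-1, 1) for _ in range(DIM)]

instances = [embed_instance(feature_table[name], value) for name, value in bag]
pooled, weights = attention_pool(instances, query)
```

Because the bag holds only observed pairs, its length varies per patient, while the pooled vector always has `DIM` entries and can feed a fixed-size classifier. With `gamma=0` the focal loss reduces to a class-weighted cross-entropy; larger `gamma` down-weights examples the model already classifies confidently.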
Year
2021
DOI
10.1016/j.artmed.2021.102052
Venue
ARTIFICIAL INTELLIGENCE IN MEDICINE
Keywords
Clinical risk prediction, Incomplete data, Multi-instance learning, Transformer, Feature embedding
DocType
Journal
Volume
114
ISSN
0933-3657
Citations
0
PageRank
0.34
References
0
Authors
5
Name              Order   Citations   PageRank
Zeyuan Wang       1       0           1.35
Josiah K. Poon    2       0           2.37
Shuze Wang        3       0           0.34
Shiding Sun       4       0           1.01
Simon Poon        5       0           1.01