Title
Prediction and risk stratification from hospital discharge records based on Hierarchical sLDA
Abstract
Background: The rapid development of information technology has made risk stratification easier to adopt, to the benefit of both patients and clinicians. Risk stratification supports accurate, individualized prevention and therapeutic decision making. Hospital discharge records (HDRs) routinely contain patients' confirmed diagnoses. We therefore propose an improved supervised model for risk stratification, built on HDRs of patients with coronary heart disease (CHD).

Methods: We introduce an improved four-layer supervised latent Dirichlet allocation (sLDA) approach, the Hierarchical sLDA model. It encodes the patient features in HDRs as one-hot patient feature-value pairs according to clinical guidelines for CHD laboratory tests. Missing data are handled with random forests (RFs), and class imbalance is handled with SMOTE. After TF-IDF weighting of the datasets, a variational Bayes expectation-maximization method and a generalized linear model are used to infer each patient's latent clinical state (i.e., the risk stratum) and to predict CHD. Model performance is evaluated by accuracy, macro-F1, and training and testing time.

Results: Because our datasets consist of patient feature-value pairs, we construct a supervised topic model by adding one more Dirichlet hyperparameter to sLDA. Compared with the established Multi-class sLDA model, our approach improves training time by 59.74% and testing time by 25.58% with almost no loss of average prediction accuracy on our datasets.

Conclusions: We propose an sLDA-based model for risk stratification and prediction of CHD. Experimental results show that the proposed Hierarchical sLDA model is competitive in both running time and accuracy.
Hierarchical processing of patient features substantially mitigates the low efficiency of the time-consuming Gibbs sampling used by the sLDA model.
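The one-hot feature-value encoding and TF-IDF weighting described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' code. The feature names and bin cutoffs in `BINS` are hypothetical placeholders rather than values from any CHD guideline. The TF-IDF variant, term frequency times log(N / df), is one common choice among several.

```python
import math
from collections import Counter

# HYPOTHETICAL binning rules standing in for guideline reference ranges.
# Each lab test maps to ordered (upper_bound, label) pairs; the first bound
# that the value does not exceed decides the label.
BINS = {
    "ldl": [(3.4, "normal"), (4.1, "borderline"), (float("inf"), "high")],
    "troponin": [(0.04, "normal"), (float("inf"), "elevated")],
}


def to_tokens(record):
    """Map one patient's raw lab values to one-hot "feature=value" tokens."""
    tokens = []
    for feature, value in record.items():
        for upper, label in BINS[feature]:
            if value <= upper:
                tokens.append(f"{feature}={label}")
                break
    return tokens


def tfidf(docs):
    """Weight each token by tf * log(N / df); returns one dict per document."""
    n_docs = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    weighted = []
    for doc in docs:
        tf = Counter(doc)
        weighted.append(
            {tok: (cnt / len(doc)) * math.log(n_docs / df[tok]) for tok, cnt in tf.items()}
        )
    return weighted


# Made-up records for illustration only.
patients = [
    {"ldl": 2.9, "troponin": 0.01},
    {"ldl": 4.5, "troponin": 0.20},
    {"ldl": 3.0, "troponin": 0.50},
]
docs = [to_tokens(p) for p in patients]
weights = tfidf(docs)
print(docs[0])  # ['ldl=normal', 'troponin=normal']
```

With this weighting, a feature-value pair that appears in every record gets weight 0. Rarer pairs, such as `troponin=normal` above, are weighted more heavily.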
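SMOTE, which the Methods use for class imbalance, oversamples the minority class. It does this by interpolating between a minority sample and one of its k nearest minority-class neighbours. A minimal stdlib sketch on numeric vectors follows. The abstract does not state the parameters used, so `k`, `seed`, and the random choice of base samples are illustrative assumptions. In practice, a maintained implementation such as imbalanced-learn's `SMOTE` would normally be used.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    minority: list of numeric feature vectors (lists) from the minority class,
    containing at least two samples.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbours of x by Euclidean distance, excluding x itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        # The new point lies on the segment between x and the chosen neighbour
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_samples = smote(minority, n_new=6, k=2)
```

Every synthetic point lies between two real minority samples. As a result, oversampling stays inside the region the minority class already occupies rather than duplicating records verbatim.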
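In sLDA, the class label is tied to a record's topic assignments through a generalized linear model. In the multi-class case (the Multi-class sLDA baseline named in the Results), that model is a softmax, p(c | zbar) = exp(eta_c . zbar) / sum_l exp(eta_l . zbar). Here zbar is the average topic assignment over the record's tokens. The sketch below assumes zbar is approximated at prediction time by averaging variational responsibilities `phi`. It shows only this prediction step. It does not reproduce the extra Dirichlet layer of Hierarchical sLDA, and the `phi` and `eta` values are hypothetical.

```python
import math


def mean_topic_assignment(phi):
    """Average per-token topic responsibilities phi[n][k] into a K-vector (zbar)."""
    n_tokens = len(phi)
    n_topics = len(phi[0])
    return [sum(row[k] for row in phi) / n_tokens for k in range(n_topics)]


def predict_class(phi, eta):
    """Softmax over classes of eta_c . zbar; returns (argmax class, probabilities)."""
    zbar = mean_topic_assignment(phi)
    scores = [sum(w * z for w, z in zip(eta_c, zbar)) for eta_c in eta]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


# HYPOTHETICAL responsibilities for a 2-token record over K = 2 topics,
# and HYPOTHETICAL coefficients for C = 2 risk classes.
phi = [[0.9, 0.1], [0.8, 0.2]]
eta = [[2.0, -1.0], [-1.0, 2.0]]
label, probs = predict_class(phi, eta)
```

Here zbar = [0.85, 0.15], so class 0 scores 1.55 and class 1 scores -0.55, and the record is assigned to class 0.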
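Accuracy and macro-F1, the two prediction metrics named in the Methods, can be computed as below. Macro-F1 averages the per-class F1 scores without weighting by class size. Small risk strata therefore count as much as the majority class, which matters for imbalanced data such as these. This is a generic stdlib sketch, and the example labels are made up.

```python
def accuracy(y_true, y_pred):
    """Fraction of records whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


y_true = [0, 0, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2]
print(accuracy(y_true, y_pred))  # 0.8
print(macro_f1(y_true, y_pred))  # about 0.822 (= 37/45)
```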
Year: 2022
DOI: 10.1186/s12911-022-01747-3
Venue: BMC Medical Informatics and Decision Making
Keywords: Risk stratification, Topic models, Supervised latent Dirichlet allocation, Hospital discharge records
DocType: Journal
Volume: 22
Issue: 1
ISSN: 1472-6947
Citations: 0
PageRank: 0.34
References: 0
Authors: 6
Authors (in order): Guanglei Yu, Linlin Zhang, Ying Zhang, Jiaqi Zhou, Tao Zhang, Xuehua Bi