Accuracies of Training Labels and Machine Learning Models: Experiments on Delirium and Simulated Data. - Citegraph

Paper Info

Title
Accuracies of Training Labels and Machine Learning Models: Experiments on Delirium and Simulated Data.

Abstract
Supervised predictive models require labeled training data. Complete and accurate labels are not always available, so imperfectly labeled data may have to serve instead. An important question is whether the accuracy of the labels imposes a performance ceiling on the trained model. In this study, we trained several models to recognize the presence of delirium in clinical documents using annotations that were not completely accurate. In the external evaluation, the support vector machine with a linear kernel performed best, achieving an area under the curve of 89.3% and an accuracy of 88%, surpassing the 80% accuracy of the training sample. We then generated simulated data and carried out a series of experiments, which demonstrated that models trained on imperfect data can (but do not always) exceed the accuracy of their training data.
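
The central claim of the abstract, that a model can end up more accurate than the labels it was trained on, can be illustrated with a small simulation. The sketch below is not the authors' experiment: it assumes a single Gaussian feature per class, labels flipped independently at random, and a nearest-centroid classifier in place of the paper's linear-kernel support vector machine. The function name `simulate` and all of its parameters are hypothetical choices made for this illustration.

```python
import random
import statistics


def simulate(noise_rate=0.2, n_train=4000, n_test=4000, separation=1.5, seed=0):
    """Return (training-label accuracy, model accuracy on clean test labels).

    Assumed setup (not taken from the paper): two balanced classes, one
    Gaussian feature per example, symmetric random label noise.
    """
    rng = random.Random(seed)

    def sample(n):
        # True label is 0 or 1 with equal probability; the feature is drawn
        # from a unit-variance Gaussian centered at -separation or +separation.
        data = []
        for _ in range(n):
            y = rng.randint(0, 1)
            x = rng.gauss(separation if y == 1 else -separation, 1.0)
            data.append((x, y))
        return data

    train = sample(n_train)
    test = sample(n_test)

    # Corrupt the training labels: flip each one independently with
    # probability noise_rate, so about (1 - noise_rate) remain correct.
    noisy = [(x, 1 - y if rng.random() < noise_rate else y) for x, y in train]
    label_accuracy = sum(yn == y for (_, yn), (_, y) in zip(noisy, train)) / n_train

    # Fit a nearest-centroid classifier using only the noisy labels.
    mean0 = statistics.fmean(x for x, yn in noisy if yn == 0)
    mean1 = statistics.fmean(x for x, yn in noisy if yn == 1)

    def predict(x):
        # Assign the class whose (noisy) centroid is closer to x.
        return 1 if abs(x - mean1) < abs(x - mean0) else 0

    # Score the model against the true, uncorrupted test labels.
    model_accuracy = sum(predict(x) == y for x, y in test) / n_test
    return label_accuracy, model_accuracy


if __name__ == "__main__":
    label_acc, model_acc = simulate()
    print(f"training-label accuracy: {label_acc:.3f}")
    print(f"model accuracy on clean test labels: {model_acc:.3f}")
```

Under these assumptions the random flips pull both class centroids toward each other by the same amount, so the decision boundary stays close to where clean labels would put it, and the model's accuracy is limited by class overlap rather than by label accuracy. This matches the "can" half of the abstract's conclusion. The "but do not always" half is plausible too: if the errors were systematic rather than random, or the noise rate approached 50%, the learned boundary would likely shift and the advantage could disappear.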

Year	2021
DOI	10.3233/SHTI220161
Venue	World Congress on Medical and Health (Medical) Informatics (MedInfo)
Keywords	delirium, support vector machine, weak supervised learning
DocType	Conference
Volume	290
ISSN	1879-8365
Citations	0
PageRank	0.34
References	0
Authors	6

Authors (6 rows)

Name	Order	Citations	PageRank
Yan Cheng	1	0	0.34
Yijun Shao	2	0	1.69
James Rudolph	3	0	0.34
Charlene R Weir	4	0	0.34
Beth Sahlmann	5	0	0.34
Qing Zeng-Treitler	6	0	0.34
