Title
Multiple additive regression trees with hybrid loss for classification tasks across heterogeneous clinical data in distributed environments: a case study
Abstract
Multiple additive regression trees (MART) have been widely used in the literature for various classification tasks. However, the overfitting effects of MART across heterogeneous and highly imbalanced big data structures within distributed environments have not yet been investigated. In this work, we utilize distributed MART with hybrid loss to resolve overfitting effects during the training of disease classification models in a case study with 10 heterogeneous and distributed clinical datasets. Lexical and semantic analysis methods were utilized to match heterogeneous terminologies with 80% overlap. Data augmentation was used to resolve class imbalance, yielding virtual data with a goodness of fit of 0.01 and a correlation difference of 0.02. Our results highlight the favorable performance of the proposed distributed MART on the augmented data, with average increases of 7.3% in accuracy, 6.8% in sensitivity, and 10.4% in specificity for a specific loss function topology.
Year
2021
DOI
10.1109/EMBC46164.2021.9629912
Venue
2021 43RD ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE & BIOLOGY SOCIETY (EMBC)
Keywords
distributed environments, data augmentation, lexical analysis, multiple additive regression trees, hybrid loss
DocType
Conference
Volume
2021
ISSN
1557-170X
Citations
0
PageRank
0.34
References
0
Authors
4