Title
Weighted elastic net for unsupervised domain adaptation with application to age prediction from DNA methylation data.
Abstract
Motivation: Predictive models are a powerful tool for solving complex problems in computational biology. They are typically designed to predict or classify data coming from the same unknown distribution as the training data. In many real-world settings, however, uncontrolled biological or technical factors can lead to a distribution mismatch between datasets acquired at different times, causing model performance to deteriorate on new data. A common additional obstacle in computational biology is scarce data with many more features than samples. To address these problems, we propose a method for unsupervised domain adaptation that is based on a weighted elastic net. The key idea of our approach is to compare dependencies between inputs in training and test data and to increase the cost of differently behaving features in the elastic net regularization term. In doing so, we encourage the model to assign a higher importance to features that are robust and behave similarly across domains.

Results: We evaluate our method both on simulated data with varying degrees of distribution mismatch and on real data, considering the problem of age prediction based on DNA methylation data across multiple tissues. Compared with a non-adaptive standard model, our approach substantially reduces errors on samples with a mismatched distribution. On real data, we achieve far lower errors on cerebellum samples, a tissue which is not part of the training data and poorly predicted by standard models. Our results demonstrate that unsupervised domain adaptation is possible for applications in computational biology, even with many more features than samples.

Availability and implementation: Source code is available at https://github.com/PfeiferLabTue/wenda.

Supplementary information: Supplementary data are available at Bioinformatics online.
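The key idea stated in the abstract (penalize features whose dependencies on other inputs differ between training and test data) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, which is available at the repository linked above. The function names `feature_weights` and `weighted_elastic_net`, the ridge-regression stand-in for the dependency comparison, the error-ratio weighting, and all parameter values are assumptions made for illustration only.

```python
import numpy as np

def feature_weights(X_src, X_tgt, ridge=1.0, power=1.0):
    """Hypothetical stand-in for the dependency comparison: predict each
    feature from all others (ridge regression fitted on source data) and
    score how much worse that prediction is on target data.
    Assumes columns are centered."""
    n_features = X_src.shape[1]
    scores = np.empty(n_features)
    for j in range(n_features):
        others = np.delete(np.arange(n_features), j)
        A, b = X_src[:, others], X_src[:, j]
        # Closed-form ridge solution fitted on the source domain only.
        coef = np.linalg.solve(A.T @ A + ridge * np.eye(len(others)), A.T @ b)
        err_src = np.mean((b - A @ coef) ** 2)
        err_tgt = np.mean((X_tgt[:, j] - X_tgt[:, others] @ coef) ** 2)
        scores[j] = err_tgt / (err_src + 1e-12)
    # Features that behave alike in both domains keep a weight close to 1.
    return np.maximum(scores, 1.0) ** power

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def weighted_elastic_net(X, y, weights, lam=0.1, alpha=0.5, n_iter=200):
    """Coordinate descent for
    (1/2n)||y - X b||^2 + lam * sum_j w_j * (alpha*|b_j| + (1-alpha)/2 * b_j^2)."""
    n, p = X.shape
    beta = np.zeros(p)
    resid = y.astype(float).copy()
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]          # drop feature j's contribution
            rho = X[:, j] @ resid / n
            denom = col_sq[j] + lam * (1.0 - alpha) * weights[j]
            beta[j] = soft_threshold(rho, lam * alpha * weights[j]) / denom
            resid -= X[:, j] * beta[j]          # add the updated contribution
    return beta

# Toy data (assumed): six features share one latent factor in the source
# domain; in the target domain feature 0 is decoupled from the others.
rng = np.random.default_rng(0)
n, p = 80, 6
X_src = rng.normal(size=(n, 1)) + 0.3 * rng.normal(size=(n, p))
X_tgt = rng.normal(size=(n, 1)) + 0.3 * rng.normal(size=(n, p))
X_tgt[:, 0] = rng.normal(scale=np.sqrt(1.09), size=n)
y = X_src[:, 0] + X_src[:, 1] + 0.1 * rng.normal(size=n)

# Center everything with source statistics; target labels are never used.
mu = X_src.mean(axis=0)
X_src, X_tgt, y = X_src - mu, X_tgt - mu, y - y.mean()

w = feature_weights(X_src, X_tgt)
beta_plain = weighted_elastic_net(X_src, y, np.ones(p))
beta_weighted = weighted_elastic_net(X_src, y, w)
print("weights:", np.round(w, 2))
print("plain coefficients:   ", np.round(beta_plain, 3))
print("weighted coefficients:", np.round(beta_weighted, 3))
```

In this toy setup the decoupled feature receives the largest penalty weight, so the weighted fit relies on it less than the unweighted fit does. The weights multiply both the L1 and the L2 part of the penalty, which is one possible reading of "increase the cost of differently behaving features"; other weighting schemes would fit the description equally well.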
Year: 2019
DOI: 10.1093/bioinformatics/btz338
Venue: BIOINFORMATICS
Field: Data mining, Domain adaptation, Computer science, Elastic net regularization, DNA methylation
DocType: Journal
Volume: 35
Issue: 14
ISSN: 1367-4803
Citations: 0
PageRank: 0.34
References: 0
Authors: 5
Name             Order  Citations  PageRank
Lisa Handl       1      0          0.34
Adrin Jalali     2      12         3.04
Michael Scherer  3      0          0.68
Ralf Eggeling    4      24         4.76
Nico Pfeifer     5      259        26.24