Title
On handling negative transfer and imbalanced distributions in multiple source transfer learning
Abstract
Transfer learning has benefited many real-world applications where labeled data are abundant in source domains but scarce in the target domain. As there are usually multiple relevant domains from which knowledge can be transferred, multiple source transfer learning (MSTL) has recently attracted much attention. However, applying MSTL poses two major challenges. First, without knowledge of the difference between source and target domains, negative transfer occurs when knowledge is transferred from highly irrelevant sources. Second, imbalanced class distributions, where examples in one class dominate, can lead to improper judgment of a source domain's relevance to the target task. Since existing MSTL methods are usually designed to transfer from relevant sources with balanced distributions, they fail in applications where these two challenges persist. In this article, we propose a novel two-phase framework to effectively transfer knowledge from multiple sources even when irrelevant sources and imbalanced class distributions exist. First, a supervised local weight scheme assigns a proper weight to each source domain's classifier based on its ability to predict accurately on each local region of the target domain. The second phase then learns a classifier for the target domain by solving an optimization problem that balances training error minimization against consistency with the weighted predictions obtained from the source domains. A theoretical analysis shows that as the number of source domains increases, the probability that the proposed approach has an error greater than a bound becomes exponentially small. We further extend the proposed approach to an online processing scenario to conduct transfer learning on continuously arriving data.
Extensive experiments on disease prediction, spam filtering, and intrusion detection datasets demonstrate that: (i) the proposed two-phase approach outperforms existing MSTL approaches due to its ability to tackle the negative transfer and imbalanced distribution challenges, and (ii) the proposed online approach achieves performance comparable to the offline scheme.
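A minimal sketch of the first phase described above (local accuracy-based weighting of source classifiers), assuming binary labels in {-1, +1}, a small labeled target set, and k-nearest-neighbor local regions; the function names and the neighborhood definition are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def local_weights(source_clfs, X_target, y_target, k=5):
    # Phase 1 (sketch): weight each source classifier, per target point,
    # by its accuracy on the k nearest labeled target examples.
    diff = X_target[:, None, :] - X_target[None, :, :]
    idx = np.argsort((diff ** 2).sum(-1), axis=1)[:, :k]       # k-NN neighborhoods
    W = np.zeros((len(X_target), len(source_clfs)))
    for j, clf in enumerate(source_clfs):
        pred = clf.predict(X_target)                           # source predictions on target
        for i, neigh in enumerate(idx):
            W[i, j] = np.mean(pred[neigh] == y_target[neigh])  # local accuracy
    row_sums = W.sum(axis=1, keepdims=True)
    uniform = np.full_like(W, 1.0 / len(source_clfs))          # fallback: no source is locally accurate
    return np.where(row_sums > 0, W / np.maximum(row_sums, 1e-12), uniform)

def weighted_predict(source_clfs, W, X_target):
    # Combine source predictions using the per-point local weights.
    preds = np.stack([clf.predict(X_target) for clf in source_clfs], axis=1)
    return np.sign((W * preds).sum(axis=1))
```

The paper's second phase then trades off target training error against consistency with these weighted source predictions; the sketch stops at the weighted consensus.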
Year: 2014
DOI: 10.1002/sam.11217
Venue: Periodicals
Keywords: multiple source, transfer learning, negative transfer, imbalanced distribution
Field: Data mining, Negative transfer, Existential quantification, Computer science, Transfer of learning, Filter (signal processing), Minification, Artificial intelligence, Classifier (linguistics), Intrusion detection system, Optimization problem, Machine learning
DocType: Journal
Volume: 7
Issue: 4
ISSN: 1932-1864
Citations: 21
PageRank: 0.74
References: 22
Authors (5):
Name          Order  Citations  PageRank
Liang Ge      1      81         6.73
Jing Gao      2      2723       131.05
Hung Q. Ngo   3      21         0.74
Kang Li       4      337        29.74
Aidong Zhang  5      2970       405.63