Title
On handling negative transfer and imbalanced distributions in multiple source transfer learning
Abstract
Transfer learning has benefited many real-world applications where labeled data are abundant in source domains but scarce in the target domain. As there are usually multiple relevant domains from which knowledge can be transferred, multiple source transfer learning (MSTL) has recently attracted much attention. However, applying MSTL poses two major challenges. First, without knowledge of the difference between source and target domains, negative transfer occurs when knowledge is transferred from highly irrelevant sources. Second, imbalanced class distributions, where examples in one class dominate, can lead to improper judgment of a source domain's relevance to the target task. Since existing MSTL methods are usually designed to transfer from relevant sources with balanced distributions, they fail in applications where these two challenges persist. In this article, we propose a novel two-phase framework to effectively transfer knowledge from multiple sources even when irrelevant sources and imbalanced class distributions exist. First, a supervised local weight scheme assigns a proper weight to each source domain's classifier based on its ability to predict accurately on each local region of the target domain. The second phase then learns a classifier for the target domain by solving an optimization problem that balances training error minimization against consistency with the weighted predictions obtained from the source domains. A theoretical analysis shows that as the number of source domains increases, the probability that the proposed approach has an error greater than a bound becomes exponentially small. We further extend the proposed approach to an online processing scenario to conduct transfer learning on continuously arriving data.
Extensive experiments on disease prediction, spam filtering, and intrusion detection datasets demonstrate that: (i) the proposed two-phase approach outperforms existing MSTL approaches due to its ability to tackle the negative transfer and imbalanced distribution challenges, and (ii) the proposed online approach achieves performance comparable to the offline scheme.
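A minimal sketch of the first phase described above (local accuracy-based weighting of source classifiers), assuming binary labels in {-1, +1}, a small labeled target set, and k-nearest-neighbor local regions; the function names and the neighborhood definition are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def local_weights(source_clfs, X_target, y_target, k=5):
    # Phase 1 (sketch): weight each source classifier, per target point,
    # by its accuracy on the k nearest labeled target examples.
    diff = X_target[:, None, :] - X_target[None, :, :]
    idx = np.argsort((diff ** 2).sum(-1), axis=1)[:, :k]       # k-NN neighborhoods
    W = np.zeros((len(X_target), len(source_clfs)))
    for j, clf in enumerate(source_clfs):
        pred = clf.predict(X_target)                           # source predictions on target
        for i, neigh in enumerate(idx):
            W[i, j] = np.mean(pred[neigh] == y_target[neigh])  # local accuracy
    row_sums = W.sum(axis=1, keepdims=True)
    uniform = np.full_like(W, 1.0 / len(source_clfs))          # fallback: no source is locally accurate
    return np.where(row_sums > 0, W / np.maximum(row_sums, 1e-12), uniform)

def weighted_predict(source_clfs, W, X_target):
    # Combine source predictions using the per-point local weights.
    preds = np.stack([clf.predict(X_target) for clf in source_clfs], axis=1)
    return np.sign((W * preds).sum(axis=1))
```

The paper's second phase then trades off target training error against consistency with these weighted source predictions; the sketch stops at the weighted consensus.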
Year: 2014
DOI: 10.1002/sam.11217
Venue: Periodicals
Keywords: multiple source, transfer learning, negative transfer, imbalanced distribution
Field: Data mining, Negative transfer, Existential quantification, Computer science, Transfer of learning, Filter (signal processing), Minification, Artificial intelligence, Classifier (linguistics), Intrusion detection system, Optimization problem, Machine learning
DocType: Journal
Volume: 7
Issue: 4
ISSN: 1932-1864
Citations: 21
PageRank: 0.74
References: 22
Authors (5):
Name          Order  Citations  PageRank
Liang Ge      1      81         6.73
Jing Gao      2      2723       131.05
Hung Q. Ngo   3      21         0.74
Kang Li       4      337        29.74
Aidong Zhang  5      2970       405.63