Title
A partially supervised cross-collection topic model for cross-domain text classification
Abstract
Cross-domain text classification aims to automatically train a precise text classifier for a target domain by using labeled text data from a related source domain. To this end, one of the most promising ideas is to induce a new feature representation in which the distributional difference between domains is reduced, so that a more accurate classifier can be learned in this new feature space. However, most existing methods do not exploit the duality between the marginal distribution of examples and the conditional distribution of class labels given the labeled training examples in the source domain. Moreover, few previous works attempt to explicitly distinguish domain-independent from domain-specific latent features and to align the domain-specific features to further improve cross-domain learning. In this paper, we propose the Partially Supervised Cross-Collection LDA topic model (PSCCLDA) for cross-domain learning, which addresses these two issues in a unified way. Experimental results on nine datasets show that our model outperforms two standard classifiers and four state-of-the-art methods, demonstrating the effectiveness of the proposed model.
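As a rough illustration of the general pipeline the abstract describes (learn a shared latent topic representation over source and target documents, then train a classifier on source-domain topic features and apply it to the target domain), the sketch below uses scikit-learn's CountVectorizer, LatentDirichletAllocation, and LogisticRegression as generic stand-ins on hypothetical toy data. It is not the authors' PSCCLDA model and does not separate domain-independent from domain-specific topics; it only shows the shared-feature-space idea.

# Minimal sketch of a generic topic-based cross-domain pipeline (assumed
# example, not PSCCLDA): fit LDA on source + target documents so topics span
# both domains, then train on source topic features and predict the target.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: labeled source documents, unlabeled target documents.
source_docs = ["cheap flight deals", "hotel booking refund", "laptop battery died"]
source_labels = [0, 0, 1]          # e.g. 0 = travel, 1 = electronics
target_docs = ["phone screen cracked", "train ticket discount"]

# Shared bag-of-words vocabulary built over both domains.
vectorizer = CountVectorizer()
X_all = vectorizer.fit_transform(source_docs + target_docs)

# Fit LDA on the combined collection; topic proportions become the new features.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(X_all)
Z_source = lda.transform(vectorizer.transform(source_docs))
Z_target = lda.transform(vectorizer.transform(target_docs))

# Train a classifier on source topic features and apply it to the target domain.
clf = LogisticRegression().fit(Z_source, source_labels)
print(clf.predict(Z_target))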
Year
2013
DOI
10.1145/2505515.2505556
Venue
CIKM
Keywords
target domain, cross-domain text classification, precise text classifier, related source domain, supervised cross-collection topic model, accurate classifier, topic model, cross-domain learning, source domain, labelled text data, topic modeling
Field
Data mining, Conditional probability distribution, Computer science, Duality (optimization), Artificial intelligence, Classifier (linguistics), Feature vector, Information retrieval, Pattern recognition, Topic model, Linear classifier, Marginal distribution, Machine learning
DocType
Conference
Citations
18
PageRank
0.62
References
18
Authors
3
Name | Order | Citations | PageRank
Yang Bao | 1 | 18 | 0.62
Nigel Collier | 2 | 1164 | 96.59
Anindya Datta | 3 | 842 | 127.21