An Under-Sampling Approach to Imbalanced Automatic Keyphrase Extraction. - Citegraph

Paper Info

Title
An Under-Sampling Approach to Imbalanced Automatic Keyphrase Extraction.

Abstract
The task of automatic keyphrase extraction is usually formalized as a supervised learning problem, and various learning algorithms have been applied to it. However, most existing approaches assume that the samples are uniformly distributed between the positive (keyphrase) and negative (non-keyphrase) classes, which may not hold in real keyphrase extraction settings. In this paper, we investigate the problem of supervised keyphrase extraction in the more common case where the candidate phrases are distributed between the classes in a highly imbalanced way. Motivated by the observation that the saliency of a candidate phrase can be described from the perspectives of both morphology and occurrence, a multi-view under-sampling approach, named co-sampling, is proposed. In co-sampling, two classifiers are learned separately using two disjoint sets of features, and the redundant candidate phrases reliably predicted by one classifier are removed from the training set of the peer classifier. Through this iterative and interactive under-sampling process, useless samples are continuously identified and removed while the performance of the classifier is boosted. Experimental results show that co-sampling outperforms several existing under-sampling approaches on the keyphrase extraction dataset. © 2012 Springer-Verlag.
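
The co-sampling loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the classifiers, the confidence measure, or the stopping rule, so everything below is an assumption made for illustration. It uses a nearest-centroid scorer in place of the paper's classifiers, a fixed number of rounds, a fixed removal budget per round, and it stops removing negatives once they would fall below the number of positives. The names `co_sampling`, `negative_confidence`, `view_morph`, and `view_occur` are hypothetical, and the data is synthetic.

```python
import math
import random


def centroid(rows):
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def negative_confidence(x, pos_c, neg_c):
    """Score in [0, 1]; close to 1 when x lies much nearer the negative centroid."""
    d_pos, d_neg = dist(x, pos_c), dist(x, neg_c)
    total = d_pos + d_neg
    return 0.5 if total == 0 else d_pos / total


def co_sampling(view_a, view_b, labels, rounds=5, drop_per_round=10):
    """Under-sample negatives using two disjoint feature views (illustrative only).

    In each round, a scorer fitted on one view ranks the negatives that are
    still in the peer view's training set, and the most confidently negative
    ones are removed from the peer's set. Returns the kept indices per view.
    """
    views = {"a": view_a, "b": view_b}
    keep = {"a": set(range(len(labels))), "b": set(range(len(labels)))}
    for _ in range(rounds):
        for src, dst in (("a", "b"), ("b", "a")):
            X = views[src]
            pos = [X[i] for i in keep[src] if labels[i] == 1]
            neg = [X[i] for i in keep[src] if labels[i] == 0]
            if not pos or not neg:
                continue
            pos_c, neg_c = centroid(pos), centroid(neg)
            candidates = [i for i in keep[dst] if labels[i] == 0]
            n_pos = sum(1 for i in keep[dst] if labels[i] == 1)
            # Assumed stopping rule: never leave fewer negatives than positives.
            budget = min(drop_per_round, len(candidates) - n_pos)
            if budget <= 0:
                continue
            candidates.sort(
                key=lambda i: negative_confidence(X[i], pos_c, neg_c),
                reverse=True,
            )
            keep[dst] -= set(candidates[:budget])
    return sorted(keep["a"]), sorted(keep["b"])


# Synthetic, imbalanced toy data: 5 keyphrases and 50 non-keyphrases.
random.seed(0)
labels = [1] * 5 + [0] * 50
view_morph = [[random.gauss(2.0 if y else 0.0, 1.0) for _ in range(2)] for y in labels]
view_occur = [[random.gauss(2.0 if y else 0.0, 1.0) for _ in range(2)] for y in labels]

kept_a, kept_b = co_sampling(view_morph, view_occur, labels)
print(len(kept_a), len(kept_b))
```

Only negative candidates are ever removed, so every positive sample stays in both training sets. A real system would replace the centroid scorer with the supervised classifiers being trained and would tune the removal budget on held-out data.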

Year	DOI	Venue
2012	10.1007/978-3-642-32281-5_38	WAIM
Keywords	Field	DocType
imbalanced classification,keyphrase extraction,multi-view learning,under-sampling	Training set,Data mining,Disjoint sets,Pattern recognition,Computer science,Salience (neuroscience),Phrase,Supervised learning,Sampling (statistics),Artificial intelligence,Classifier (linguistics),Machine learning	Conference
Volume	Issue	ISSN
7418 LNCS	null	1611-3349
Citations	PageRank	References
2	0.43	20
Authors
3

Authors (3 rows)

Name	Order	Citations	PageRank
Weijian Ni	1	14	8.09
Tong Liu	2	3	3.14
Qingtian Zeng	3	242	43.67
