A Weighted k-Nearest Neighborhood for BaseNP Detection under Covariate Shift - Citegraph

Paper Info

Title
A Weighted k-Nearest Neighborhood for BaseNP Detection under Covariate Shift

Abstract
Most machine learning methods assume that training data and test data are sampled from the same distribution. In practice, however, this assumption is often violated. The situation in which the training and test data are generated from different distributions is called covariate shift. In natural language processing, covariate shift is highly likely to occur because of the size of the sample space: natural language data are theoretically infinite, so the distribution of the training data cannot reflect that of the entire data. In this paper, we try to verify that the performance of natural language processing methods can be improved by reducing the error caused by covariate shift. For this purpose, we propose an importance-weighted k-NN for base noun detection. In the proposed method, the weights are set according to the difference between the training and test distributions. In theory, importance weighting can improve performance under covariate shift. In the experiments, the proposed method performs better than standard k-NN.
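
The idea of an importance-weighted k-NN can be illustrated with a minimal sketch. The abstract does not say how the paper estimates its weights, so everything below is an assumption made for illustration: the weight of each training point is taken to be a density ratio p_test(x) / p_train(x) estimated with a simple Gaussian kernel, and each of the k nearest neighbours votes with that weight. The function names (`gaussian_kde`, `importance_weights`, `weighted_knn_predict`) and the `bandwidth` parameter are hypothetical and do not come from the paper.

```python
import numpy as np

def gaussian_kde(points, queries, bandwidth):
    """Mean Gaussian kernel value of each query against `points`.

    The normalising constant is omitted; it cancels in the density ratio
    because both estimates use the same bandwidth and dimension.
    """
    diffs = queries[:, None, :] - points[None, :, :]
    sq_dist = np.sum(diffs ** 2, axis=2)
    return np.mean(np.exp(-sq_dist / (2.0 * bandwidth ** 2)), axis=1)

def importance_weights(x_train, x_test, bandwidth=1.0, eps=1e-12):
    """Assumed estimator: w(x_i) = p_test(x_i) / p_train(x_i) at each training point."""
    p_test = gaussian_kde(x_test, x_train, bandwidth)
    p_train = gaussian_kde(x_train, x_train, bandwidth)
    return p_test / (p_train + eps)

def weighted_knn_predict(x_train, y_train, weights, x_query, k=3):
    """k-NN in which each neighbour's vote counts with its importance weight."""
    predictions = []
    for q in x_query:
        sq_dist = np.sum((x_train - q) ** 2, axis=1)
        nearest = np.argsort(sq_dist)[:k]
        votes = {}
        for i in nearest:
            label = y_train[i]
            votes[label] = votes.get(label, 0.0) + weights[i]
        # The label with the largest total weight wins.
        predictions.append(max(votes, key=votes.get))
    return np.array(predictions)
```

With all weights equal to one, `weighted_knn_predict` reduces to ordinary majority-vote k-NN, which makes it easy to compare the two on the same data. Training points that lie where test data are dense receive larger weights and therefore more influence on the vote.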

Year: 2008
DOI: 10.1109/ALPIT.2008.78
Venue: ALPIT
Keywords: base noun detection, nlp, pattern clustering, covariate shift, entire data, learning (artificial intelligence), different distribution, weighted knn, machine learning, weighted k-nearest neighborhood, basenp detection, better performance, test data, natural language processing, so-called covariate shift, natural language data, training data, chromium, noun, learning artificial intelligence, bismuth, barium, natural language, information technology, copper
Field: Training set, Importance Weight, Covariate shift, Computer science, Pattern clustering, Noun, Natural language, Test data, Statistics, Sample space
DocType: Conference
ISBN: 978-0-7695-3273-8
Citations: 0
PageRank: 0.34
References: 6
Authors: 4

Authors (4 rows)

Name	Order	Citations	PageRank
Jeong-Woo Son	1	67	10.56
Seong-Bae Park	2	311	47.31
Young-Jin Han	3	4	1.14
Seyoung Park	4	76	14.48
