Title
A Weighted k-Nearest Neighborhood for BaseNP Detection under Covariate Shift
Abstract
Most machine learning methods assume that training data and test data are sampled from the same distribution. In practice, however, this assumption is often violated. The situation in which the training and test data are generated from different distributions is called covariate shift. In natural language processing, covariate shift is highly likely to occur because of the size of the sample space: natural language data are theoretically infinite, so the distribution of the training data cannot reflect that of the entire data. In this paper, we examine whether the performance of natural language processing methods can be improved by reducing the error caused by covariate shift. For this purpose, we propose an importance-weighted k-NN for base noun detection. In the proposed method, the weights are set according to the difference between the training and test distributions. In theory, importance weighting can improve performance under covariate shift. In our experiments, the proposed method performs better than standard k-NN.
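The idea of importance-weighted k-NN can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the common formulation in which each training example is weighted by an estimated density ratio p_test(x) / p_train(x), estimates both densities with a simple Gaussian kernel density estimate, and lets each of the k nearest neighbours vote with its weight. The function names, the bandwidth, the one-dimensional features, and the "B"/"O" labels are all hypothetical choices made for illustration.

```python
import math
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length feature tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kde(points, x, bandwidth):
    """Gaussian kernel density estimate at x (an assumed density estimator)."""
    dim = len(x)
    norm = len(points) * (2.0 * math.pi * bandwidth ** 2) ** (dim / 2.0)
    total = sum(math.exp(-sq_dist(p, x) / (2.0 * bandwidth ** 2)) for p in points)
    return total / norm


def importance_weights(train_x, test_x, bandwidth=0.3, eps=1e-12):
    """Weight each training point by the estimated ratio p_test(x) / p_train(x)."""
    return [
        kde(test_x, x, bandwidth) / (kde(train_x, x, bandwidth) + eps)
        for x in train_x
    ]


def weighted_knn_predict(train_x, train_y, weights, query, k=3):
    """Return the label with the largest summed weight among the k nearest points."""
    nearest = sorted(range(len(train_x)), key=lambda i: sq_dist(train_x[i], query))[:k]
    votes = defaultdict(float)
    for i in nearest:
        votes[train_y[i]] += weights[i]
    return max(votes, key=votes.get)


# Hypothetical toy data: the unlabeled test inputs cluster near x = 1.0,
# while most training inputs lie near x = 0.0.
train_x = [(0.0,), (0.2,), (1.0,)]
train_y = ["O", "O", "B"]
test_x = [(0.9,), (1.0,), (1.1,)]

weights = importance_weights(train_x, test_x)
plain = weighted_knn_predict(train_x, train_y, [1.0] * len(train_x), (0.6,))
shifted = weighted_knn_predict(train_x, train_y, weights, (0.6,))
```

On this toy data, uniform weights give a majority vote for "O", whereas the importance weights favour the training point that lies where the test inputs concentrate, so the vote moves to "B". Only the test inputs are used to compute the weights, never test labels.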
Year
2008
DOI
10.1109/ALPIT.2008.78
Venue
ALPIT
Keywords
base noun detection, nlp, pattern clustering, covariate shift, entire data, learning (artificial intelligence), different distribution, weighted knn, machine learning, weighted k-nearest neighborhood, basenp detection, better performance, test data, natural language processing, so-called covariate shift, natural language data, training data, chromium, noun, learning artificial intelligence, bismuth, barium, natural language, information technology, copper
Field
Training set, Importance Weight, Covariate shift, Computer science, Pattern clustering, Noun, Natural language, Test data, Statistics, Sample space
DocType
Conference
ISBN
978-0-7695-3273-8
Citations
0
PageRank
0.34
References
6
Authors
4
Name              Order   Citations   PageRank
Jeong-Woo Son     1       67          10.56
Seong-Bae Park    2       311         47.31
Young-Jin Han     3       4           1.14
Seyoung Park      4       76          14.48