Title
Comparing K Nearest Neighbours Methods And Linear Regression - Is There Reason To Select One Over The Other?
Abstract
Non-parametric k nearest neighbours (k-nn) techniques are increasingly used in forestry problems, especially in remote sensing. Parametric regression analysis has the advantage of well-known statistical theory behind it, whereas the statistical properties of k-nn are less studied. In this study, we compared the relative performance of k-nn and linear regression in an experiment. We examined the effect of three different properties of the data and the problem: 1) the effect of increasing non-linearity of the modelling task, 2) the effect of the assumptions concerning the population and 3) the effect of the balance of the sample data. In order to determine the effect of these three aspects, we used simulated data and simple modelling problems. K-nn and linear regression gave fairly similar results with respect to the average RMSEs. With both methods, a balanced modelling dataset gave better results than an unbalanced one. When the results were examined within diameter classes, the k-nn results were less biased than the regression model results, especially at extreme values of diameter. The differences increased with increasing non-linearity of the model and increasing imbalance of the data. The difference between the methods was more obvious when the assumed model form was not exactly correct.
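The kind of comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' experiment: the response function `0.05 * d ** power`, the noise level, the diameter range of 5 to 45, the sample sizes, `k = 5` and the 10-unit diameter classes are all assumptions chosen for the example. It fits a straight line and a one-dimensional k-nn predictor to a balanced and an unbalanced simulated sample, then reports the RMSE and the mean error (bias) within diameter classes.

```python
import random
import statistics


def simulate(n, balanced, rng, power=2.0, noise_sd=5.0):
    """Draw (diameter, response) pairs from an assumed non-linear population."""
    data = []
    for _ in range(n):
        if balanced:
            d = rng.uniform(5.0, 45.0)  # diameters spread evenly over the range
        else:
            # diameters concentrated around the middle, clipped to the same range
            d = min(45.0, max(5.0, rng.gauss(25.0, 5.0)))
        y = 0.05 * d ** power + rng.gauss(0.0, noise_sd)  # hypothetical model form
        data.append((d, y))
    return data


def fit_linear(train):
    """Ordinary least squares for y = intercept + slope * d."""
    xs = [d for d, _ in train]
    ys = [y for _, y in train]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in train)
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda d: intercept + slope * d


def fit_knn(train, k=5):
    """k-nn prediction: mean response of the k observations nearest in diameter."""
    def predict(d):
        nearest = sorted(train, key=lambda p: abs(p[0] - d))[:k]
        return statistics.fmean(y for _, y in nearest)
    return predict


def rmse(predict, test):
    return (sum((predict(d) - y) ** 2 for d, y in test) / len(test)) ** 0.5


def bias_by_class(predict, test, width=10.0):
    """Mean prediction error within diameter classes of the given width."""
    classes = {}
    for d, y in test:
        lower = int(d // width) * width  # lower bound of the diameter class
        classes.setdefault(lower, []).append(predict(d) - y)
    return {lower: statistics.fmean(errs) for lower, errs in sorted(classes.items())}


rng = random.Random(0)
test = simulate(500, balanced=True, rng=rng)  # evaluation data covering the full range
results = {}
for label, balanced in (("balanced", True), ("unbalanced", False)):
    train = simulate(200, balanced, rng)
    for name, model in (("linear", fit_linear(train)), ("k-nn", fit_knn(train))):
        results[(label, name)] = (rmse(model, test), bias_by_class(model, test))
        print(label, name, "RMSE:", round(results[(label, name)][0], 2))
        print("  bias by class:", {c: round(b, 2) for c, b in results[(label, name)][1].items()})
```

Raising `power` makes the task more non-linear, and narrowing the spread of the unbalanced sample makes it more unbalanced, so the same script can be rerun to explore the three factors named in the abstract. The numbers it prints depend entirely on the assumed settings and should not be read as reproducing the paper's results.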
Year
2012
Venue
MATHEMATICAL AND COMPUTATIONAL FORESTRY & NATURAL-RESOURCE SCIENCES
Keywords
Modelling, Regression, Imputation, Balanced Data, K Nearest Neighbour
Field
Econometrics, Regression diagnostic, Regression analysis, Linear model, Polynomial regression, Proper linear model, Statistics, Statistical theory, Mathematics, Linear regression, Segmented regression
DocType
Journal
Volume
4
Issue
1
ISSN
1946-7664
Citations
0
PageRank
0.34
References
1
Authors
2
Name              Order  Citations  PageRank
Arto Haara        1      5          0.81
Annika S. Kangas  2      5          1.39