Title: Fast k most similar neighbor classifier for mixed data based on approximating and eliminating
Abstract: The k nearest neighbor (k-NN) classifier is a widely used nonparametric technique in Pattern Recognition. To decide the class of a new prototype, the k-NN classifier performs an exhaustive comparison between the prototype to classify (the query) and the prototypes in the training set T. However, when T is large, this exhaustive comparison is expensive. To avoid this problem, many fast k-NN algorithms have been developed, some of them based on Approximating-Eliminating search, where the Approximating and Eliminating steps rely on the triangle inequality. However, in soft sciences the prototypes are usually described by qualitative and quantitative features (mixed data), and the comparison function does not always satisfy the triangle inequality. Therefore, in this work a fast k most similar neighbor classifier for mixed data (AEMD) is presented. This classifier consists of two phases. In the first phase, a binary similarity matrix among the prototypes in T is stored. In the second phase, new Approximating and Eliminating steps, which do not rely on the triangle inequality, are applied. The proposed classifier is compared against other fast k-NN algorithms adapted to work with mixed data. Experiments with real datasets are presented.
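The exhaustive baseline that the paper accelerates can be sketched as follows. This is a minimal illustration, not the paper's AEMD method: the mixed-data dissimilarity here (absolute difference for numeric features, 0/1 mismatch for qualitative ones, in the style of HEOM) and the helper names `mixed_dissimilarity` and `k_msn_classify` are assumptions for the sketch. Note that such a comparison function need not satisfy the triangle inequality, which is exactly why triangle-inequality-based Approximating-Eliminating search does not directly apply.

```python
from collections import Counter

def mixed_dissimilarity(x, y):
    """HEOM-style dissimilarity for mixed data (illustrative assumption,
    not the paper's comparison function): numeric features contribute
    their absolute difference, qualitative features contribute a 0/1
    mismatch penalty."""
    total = 0.0
    for a, b in zip(x, y):
        if isinstance(a, (int, float)) and isinstance(b, (int, float)):
            total += abs(a - b)
        else:
            total += 0.0 if a == b else 1.0
    return total

def k_msn_classify(query, T, labels, k=3):
    """Exhaustive k most similar neighbor classification: compare the
    query against every prototype in T, keep the k most similar ones,
    and return the majority class among them."""
    ranked = sorted(range(len(T)),
                    key=lambda i: mixed_dissimilarity(query, T[i]))
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]
```

Because every query costs |T| comparisons, this baseline is what fast k-NN schemes such as AEMD aim to avoid when T is large.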
Year: 2008
DOI: 10.1007/978-3-540-68125-0_66
Venue: PAKDD
Keywords: similar neighbor classifier, exhaustive comparison, triangle inequality, fast k, new approximating, k-nn classifier, fast k-nn algorithm, mixed data, proposed classifier, comparison function, search algorithm, nearest neighbor, satisfiability, k nearest neighbor, pattern recognition, nearest neighbor search
Field: Data mining, Best bin first, Computer science, M-tree, Artificial intelligence, Triangle inequality, Classifier (linguistics), Nearest neighbor search, Binary number, k-nearest neighbors algorithm, Pattern recognition, Nonparametric statistics, Machine learning
DocType: Conference
Volume: 5012
ISSN: 0302-9743
ISBN: 3-540-68124-8
Citations: 3
PageRank: 0.39
References: 12
Authors: 3