Paper Info

Title
Stochastic k-Neighborhood Selection for Supervised and Unsupervised Learning.

Abstract
Neighborhood Components Analysis (NCA) is a popular method for learning a distance metric to be used within a k-nearest neighbors (kNN) classifier. A key assumption built into the model is that each point stochastically selects a single neighbor, which makes the model well-justified only for kNN with k=1. However, kNN classifiers with k>1 are more robust and usually preferred in practice. Here we present kNCA, which generalizes NCA by learning distance metrics that are appropriate for kNN with arbitrary k. The main technical contribution is showing how to efficiently compute and optimize the expected accuracy of a kNN classifier. We apply similar ideas in an unsupervised setting to yield kSNE and ktSNE, generalizations of Stochastic Neighbor Embedding (SNE, tSNE) that operate on neighborhoods of size k, which provide an axis of control over embeddings that allow for more homogeneous and interpretable regions. Empirically, we show that kNCA often improves classification accuracy over state-of-the-art methods, produces qualitative differences in the embeddings as k is varied, and is more robust with respect to label noise.
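
For orientation, the single-neighbor assumption mentioned in the abstract can be sketched as follows. This is a minimal illustration of the standard NCA objective (the k=1 case that kNCA generalizes), not the kNCA algorithm from the paper. The function name `nca_expected_accuracy` and the arguments `X`, `y`, `A` are hypothetical names chosen for this example, and a linear map `A` with a softmax over negative squared distances is assumed.

```python
import numpy as np

def nca_expected_accuracy(X, y, A):
    """Expected leave-one-out 1-NN accuracy when each point picks one neighbor at random.

    Assumption: neighbor j is selected by point i with probability
    proportional to exp(-||A x_i - A x_j||^2), and never i itself.
    """
    Z = X @ A.T                                   # project points with the linear map A
    diff = Z[:, None, :] - Z[None, :, :]          # pairwise differences, shape (n, n, d)
    sq_dist = np.sum(diff ** 2, axis=-1)          # pairwise squared distances
    np.fill_diagonal(sq_dist, np.inf)             # a point never selects itself
    logits = -sq_dist
    logits -= logits.max(axis=1, keepdims=True)   # shift for numerical stability
    P = np.exp(logits)
    P /= P.sum(axis=1, keepdims=True)             # row i: selection distribution of point i
    same_label = (y[:, None] == y[None, :])       # True where labels match
    # Probability that the selected neighbor shares the point's label, averaged over points.
    return float(np.mean(np.sum(P * same_label, axis=1)))
```

With two well-separated clusters and `A` set to the identity, the value is close to 1. With `A` set to all zeros, every neighbor is equally likely and the value drops to the fraction of same-label neighbors. Extending this to neighborhoods of size k, as the abstract describes, requires the paper's own computation and is not reproduced here.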

Year	2013
Venue	ICML
Field	Embedding, Pattern recognition, Homogeneous, Computer science, Generalization, Metric (mathematics), Unsupervised learning, Artificial intelligence, Classifier (linguistics), Machine learning
DocType	Conference
Citations	17
PageRank	0.79
References	11
Authors	5

Authors (5 rows)

Name	Order	Citations	PageRank
Daniel Tarlow	1	514	31.62
Kevin Swersky	2	1118	52.13
Laurent Charlin	3	637	29.86
Ilya Sutskever	4	25814	1120.24
Richard S. Zemel	5	4958	425.68
