Title
Theory of the GMM Kernel.
Abstract
In web search, data mining, and machine learning, two popular measures of data similarity are the cosine and the resemblance (the latter is for binary data). In this study, we develop theoretical results for both the cosine and the GMM (generalized min-max) kernel, which is a generalization of the resemblance. GMM has direct applications in machine learning as a positive definite kernel and can be efficiently linearized via probabilistic hashing to handle big data. Owing to its discrete nature, the hashed values can also be used to build hash tables for efficient near neighbor search. We prove the theoretical limit of GMM and the consistency result, assuming that the data follow an elliptical distribution, which is a general family of distributions and includes the multivariate normal and t-distribution as special cases. The consistency result holds as long as the data have bounded first moment (an assumption which typically holds for data commonly encountered in practice). Furthermore, we establish the asymptotic normality of GMM. We also prove the limit of cosine under elliptical distributions. In comparison, the consistency of GMM requires much weaker conditions. For example, when data follow a t-distribution with ν degrees of freedom, GMM typically provides a better estimate of similarity than cosine when ν
Year
DOI
Venue
2017
10.1145/3038912.3052679
Proceedings of the 26th International Conference on World Wide Web
DocType
ISBN
Citations 
Conference
978-1-4503-4913-0
4
PageRank 
References 
Authors
0.40
32
2
Name
Order
Citations
PageRank
Ping Li1207.25
Cun-Hui Zhang217418.38