Scalable Kernel Methods via Doubly Stochastic Gradients. - Citegraph

Paper Info

Title
Scalable Kernel Methods via Doubly Stochastic Gradients.

Abstract
The general perception is that kernel methods are not scalable, so neural nets become the choice for large-scale nonlinear learning problems. Have we tried hard enough for kernel methods? In this paper, we propose an approach that scales up kernel methods using a novel concept called "doubly stochastic functional gradients". Based on the fact that many kernel methods can be expressed as convex optimization problems, our approach solves the optimization problems by making two unbiased stochastic approximations to the functional gradient (one using random training points and another using random features associated with the kernel) and performing descent steps with this noisy functional gradient. Our algorithm is simple, does not need to commit to a preset number of random features, and allows the flexibility of the function class to grow as we see more incoming data in the streaming setting. We demonstrate that a function learned by this procedure after t iterations converges to the optimal function in the reproducing kernel Hilbert space at a rate of O(1/t), and achieves a generalization bound of O(1/√t). Our approach can readily scale kernel methods up to the regimes which are dominated by neural nets. We show competitive performances of our approach as compared to neural nets on datasets such as 2.3 million energy materials from MolecularSpace, 8 million handwritten digits from MNIST, and 1 million photos from ImageNet using convolution features.
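
The two stochastic approximations described in the abstract can be illustrated with a short sketch. This is not the authors' implementation. It assumes a squared loss, a Gaussian RBF kernel approximated with random Fourier features, and a step size of theta/t. The function names (rff, doubly_sgd, predict) and the parameters (sigma, reg, theta) are hypothetical. For readability the sketch stores the sampled features rather than regenerating them on demand.

```python
import numpy as np

def rff(X, omegas, phases):
    """Random Fourier features sqrt(2)*cos(w.x + b) for an RBF kernel.

    X: (n, d), omegas: (m, d), phases: (m,). Returns an (n, m) array.
    """
    return np.sqrt(2.0) * np.cos(X @ omegas.T + phases)

def doubly_sgd(X, y, n_iters=500, sigma=1.0, reg=1e-4, theta=1.0, seed=0):
    """Sketch of doubly stochastic functional gradient descent (squared loss).

    Each iteration draws one random training point and one random feature,
    so the function class grows by one feature per step.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    omegas = np.empty((n_iters, d))
    phases = np.empty(n_iters)
    alphas = np.zeros(n_iters)
    for t in range(n_iters):
        i = rng.integers(n)                                   # random training point
        omegas[t] = rng.normal(scale=1.0 / sigma, size=d)     # random feature
        phases[t] = rng.uniform(0.0, 2.0 * np.pi)
        xi = X[i:i + 1]
        # Prediction of the current function, built from earlier features only.
        f_xi = rff(xi, omegas[:t], phases[:t])[0] @ alphas[:t]
        gamma = theta / (t + 1)                               # assumed step size
        phi_t = rff(xi, omegas[t:t + 1], phases[t:t + 1])[0, 0]
        alphas[:t] *= 1.0 - gamma * reg                       # shrink old coefficients
        alphas[t] = -gamma * (f_xi - y[i]) * phi_t            # new coefficient
    return omegas, phases, alphas

def predict(X, omegas, phases, alphas):
    """Evaluate the learned function f(x) = sum_t alpha_t * phi_t(x)."""
    return rff(X, omegas, phases) @ alphas
```

A call such as doubly_sgd(X, y) returns the sampled features and coefficients, which predict can then evaluate on new inputs. The step size and regularization values here are placeholders and would need tuning for any real dataset.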

Year	Venue	DocType
2014	ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 27 (NIPS 2014)	Journal
Volume	ISSN	Citations
27	1049-5258	55
PageRank	References	Authors
1.63	26	7

Authors (7 rows)

Name	Order	Citations	PageRank
Bo Dai	1	230	34.71
Bo Xie 0002	2	94	5.19
Niao He	3	212	16.52
Yingyu Liang	4	393	31.39
Anant Raj	5	69	5.22
Maria-Florina Balcan	6	1445	105.01
Le Song	7	2437	159.27
