Title
Asynchronous Accelerated Stochastic Gradient Descent.
Abstract
Stochastic gradient descent (SGD) is a widely used optimization algorithm in machine learning. To accelerate the convergence of SGD, several advanced techniques have been developed in recent years, including variance reduction, stochastic coordinate sampling, and Nesterov's acceleration method. Furthermore, to improve training speed and/or leverage larger-scale training data, asynchronous parallelization of SGD has also been studied. A natural question, then, is whether these techniques can be seamlessly integrated with each other, and whether the integration has a desirable theoretical guarantee on its convergence. In this paper, we provide a formal answer to this question. In particular, we consider the asynchronous parallelization of SGD, accelerated by leveraging variance reduction, coordinate sampling, and Nesterov's method. We call the new algorithm asynchronous accelerated SGD (AASGD). Theoretically, we prove a convergence rate for AASGD which indicates that (i) the three acceleration methods are complementary to each other and each makes its own contribution to the improvement of the convergence rate; and (ii) asynchronous parallelization does not hurt the convergence rate and can achieve considerable speedup under appropriate parameter settings. Empirically, we test AASGD on several benchmark datasets. The experimental results verify our theoretical findings and indicate that AASGD can be a highly effective and efficient algorithm for practical use.
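For illustration only, below is a minimal, serial sketch of the kind of update the abstract describes, combining SVRG-style variance reduction with a Nesterov-style momentum look-ahead on a toy least-squares problem. It is not the paper's AASGD: coordinate sampling and asynchronous parallelization are omitted, and the objective, step size, and momentum values are assumptions chosen only to make the sketch runnable.

# Minimal illustrative sketch (not the authors' AASGD): serial SGD with
# SVRG-style variance reduction and a Nesterov-style momentum look-ahead.
# The toy least-squares problem, step size, and momentum are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 10
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true + 0.01 * rng.standard_normal(n)

def grad_i(x, i):
    # Stochastic gradient of 0.5 * (a_i^T x - b_i)^2 at sample i.
    return A[i] * (A[i] @ x - b[i])

def full_grad(x):
    # Full gradient of the average least-squares loss.
    return A.T @ (A @ x - b) / n

eta, momentum = 0.01, 0.9
x = np.zeros(d)
v = np.zeros(d)                      # momentum buffer

for epoch in range(30):
    snapshot = x.copy()              # SVRG snapshot point
    mu = full_grad(snapshot)         # full gradient at the snapshot
    for _ in range(n):
        i = rng.integers(n)
        y = x + momentum * v         # Nesterov-style look-ahead point
        # Variance-reduced stochastic gradient evaluated at the look-ahead point.
        g = grad_i(y, i) - grad_i(snapshot, i) + mu
        v = momentum * v - eta * g
        x = x + v

print("final objective:", 0.5 * np.mean((A @ x - b) ** 2))

In the paper's setting, this inner update would additionally sample a block of coordinates per step and be executed by multiple workers with delayed (stale) parameters; the sketch keeps only the variance-reduction and momentum ingredients.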
Year
2016
Venue
IJCAI
Field
Convergence (routing), Asynchronous communication, Stochastic gradient descent, Computer science, Artificial intelligence, Sampling (statistics), Acceleration, Rate of convergence, Variance reduction, Machine learning, Speedup
DocType
Conference
Citations
2
PageRank
0.37
References
14
Authors
6
Name            Order  Citations  PageRank
Qi Meng         1      18         3.44
Wei Chen        2      166        14.55
Jingcheng Yu    3      7          0.81
Taifeng Wang    4      179        13.33
Zhi-Ming Ma     5      227        18.26
Tie-yan Liu     6      4662       256.32