Abstract |
---|
Stochastic gradient descent (SGD) is a ubiquitous algorithm for a variety of machine learning problems. Researchers and industry have developed several techniques to optimize SGD's runtime performance, including asynchronous execution and reduced precision. Our main result is a martingale-based analysis that enables us to capture the rich noise models that may arise from such techniques. Specifically, we use our new analysis in three ways: (1) we derive convergence rates for the convex case (HOGWILD!) with relaxed assumptions on the sparsity of the problem; (2) we analyze asynchronous SGD algorithms for non-convex matrix problems including matrix completion; and (3) we design and analyze an asynchronous SGD algorithm, called BUCKWILD!, that uses lower-precision arithmetic. We show experimentally that our algorithms run efficiently for a variety of problems on modern hardware. |
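The abstract's first contribution concerns HOGWILD!-style asynchronous SGD, where worker threads update a shared weight vector without locks and the resulting race conditions are treated as gradient noise. The following is a minimal sketch of that pattern, not the paper's implementation; the function name `hogwild_sgd`, the sparse-gradient convention (a dict mapping coordinate index to value), and all parameters are illustrative assumptions.

```python
import threading

def hogwild_sgd(grad_fn, w, data, n_threads=4, lr=0.1, epochs=3):
    """Hogwild!-style SGD sketch: threads write to the shared weight
    list `w` with no locks. Sparse, mostly non-overlapping updates keep
    the noise introduced by races small, which is the regime the
    paper's martingale analysis covers.

    grad_fn(w, x, y) -> dict {coordinate index: gradient value}
    data: list of (x, y) samples, split evenly across threads.
    """
    def worker(samples):
        for _ in range(epochs):
            for x, y in samples:
                g = grad_fn(w, x, y)
                for i, gi in g.items():
                    w[i] -= lr * gi  # racy, lock-free coordinate update

    chunk = max(1, len(data) // n_threads)
    threads = [threading.Thread(target=worker,
                                args=(data[i * chunk:(i + 1) * chunk],))
               for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return w
```

On a sparse problem where concurrent updates rarely touch the same coordinate, the races barely perturb the trajectory; the paper's analysis quantifies this for relaxed sparsity assumptions, and its BUCKWILD! variant additionally rounds updates to low-precision arithmetic.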
Year | Venue | Field
---|---|---
2015 | Advances in Neural Information Processing Systems 28 (NIPS 2015) | Convergence (routing), Asynchronous communication, Stochastic gradient descent, Martingale (probability theory), Matrix completion, Matrix (mathematics), Computer science, Algorithm, Regular polygon, Theoretical computer science, Artificial intelligence, Machine learning
DocType | Volume | ISSN
---|---|---
Journal | 28 | 1049-5258

Citations | PageRank | References
---|---|---
0 | 0.34 | 0
Authors |
---|
4 |

Name | Order | Citations | PageRank
---|---|---|---
Christopher De Sa | 1 | 8 | 3.83
Ce Zhang | 2 | 803 | 83.39
Kunle Olukotun | 3 | 4532 | 373.50
Christopher Ré | 4 | 3422 | 192.34