Title
Asynchronous Stochastic Gradient Descent For Extreme-Scale Recommender Systems
Abstract
Recommender systems are influential in many internet applications. As the datasets available to recommendation models grow rapidly, using such data effectively becomes critical. For a typical Click-Through-Rate (CTR) prediction model, daily samples can amount to hundreds of terabytes, reaching dozens of petabytes at extreme scale when several days of data are considered. Data at this scale makes it essential to train the model in parallel and continuously. Traditional asynchronous stochastic gradient descent (ASGD) and its variants have proved efficient but often suffer from stale gradients; hence, model convergence tends to degrade as more workers are used. Moreover, existing adaptive optimizers, which are friendly to sparse data, falter in long-term training due to the significant imbalance between new and accumulated gradients. To address the challenges posed by extreme-scale data, we propose: 1) staleness normalization and data normalization to eliminate the turbulence of stale gradients when training asynchronously across hundreds to thousands of workers; 2) SWAP, a novel framework for adaptive optimizers that balances new and historical gradients by taking the sampling period into consideration. We implement these approaches in TensorFlow and apply them to CTR tasks in real-world e-commerce scenarios. Experiments show that the number of workers in asynchronous training can be extended to 3000 with guaranteed convergence, and the final AUC is improved by more than 5 percentage points.
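To make the stale-gradient issue concrete, the sketch below shows a toy asynchronous SGD parameter server that down-weights each worker's gradient by its staleness. This is an illustrative assumption, not the paper's method: the abstract gives no formulas, and the 1/(1 + staleness) factor, the class name StalenessAwareServer, and all other identifiers are hypothetical.

# Minimal illustrative sketch (assumed, not the paper's implementation):
# asynchronous SGD where a worker's gradient is scaled down according to its
# staleness, i.e. how many global updates happened since the worker read the
# parameters. The 1/(1 + staleness) weighting is an assumption for illustration.
import numpy as np

class StalenessAwareServer:
    def __init__(self, dim, lr=0.01):
        self.w = np.zeros(dim)   # shared model parameters
        self.version = 0         # global update counter
        self.lr = lr

    def read(self):
        # A worker pulls the current parameters together with their version.
        return self.w.copy(), self.version

    def apply(self, grad, read_version):
        # Staleness = number of updates applied since the worker read the model.
        staleness = self.version - read_version
        scale = 1.0 / (1.0 + staleness)   # assumed staleness normalization
        self.w -= self.lr * scale * grad
        self.version += 1

# Toy usage: two "workers" compute gradients of f(w) = 0.5 * ||w - 1||^2.
server = StalenessAwareServer(dim=4)
w0, v0 = server.read()          # worker A reads
w1, v1 = server.read()          # worker B reads
server.apply(w0 - 1.0, v0)      # worker A pushes first (staleness 0)
server.apply(w1 - 1.0, v1)      # worker B's gradient is now stale (staleness 1)
print(server.w)

In a real deployment the version counter would be kept per parameter shard, and the paper additionally applies data normalization and the SWAP scheme for adaptive optimizers, none of which this toy attempts to reproduce.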
Year
2021
Venue
THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE
DocType
Conference
Volume
35
ISSN
2159-5399
Citations
0
PageRank
0.34
References
0
Authors
2
Name       Order  Citations  PageRank
Lewis Liu  1      0          1.35
Kun Zhao   2      11         1.92