Title: Attention Is All You Need
Abstract
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.0 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature.
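The architecture the abstract describes replaces recurrence with attention; the core operation in the paper is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. As a minimal sketch of that formula, single-head and unmasked, in NumPy (the function name and shapes here are illustrative assumptions, not the paper's full multi-head implementation):

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        # Q: (n_queries, d_k); K: (n_keys, d_k); V: (n_keys, d_v) -- illustrative shapes
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)                 # query-key similarities, scaled by sqrt(d_k)
        scores -= scores.max(axis=-1, keepdims=True)    # shift for numerical stability
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
        return weights @ V                              # attention-weighted sum of the values

    # Example: 4 queries attending over 6 key/value pairs of width 64.
    rng = np.random.default_rng(0)
    Q, K, V = (rng.normal(size=s) for s in [(4, 64), (6, 64), (6, 64)])
    out = scaled_dot_product_attention(Q, K, V)         # shape (4, 64)

Because every query attends to every key in a single matrix product, the computation parallelizes across sequence positions, which is the source of the training-time advantage the abstract claims over recurrent models.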
Year: 2017
Venue: ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017)
DocType: Conference
Volume: 30
ISSN: 1049-5258
Citations: 432
PageRank: 6.52
References: 0
Authors: 8
Name                Order  Citations  PageRank
Ashish Vaswani      1        901      32.81
Noam Shazeer        2       1089      43.70
Niki Parmar         3        522      13.34
Jakob Uszkoreit     4       1107      39.31
Llion Jones         5        523      11.93
Aidan N. Gomez      6        534      13.23
Łukasz Kaiser       7       2307      89.08
Illia Polosukhin    8        483      10.60