Abstract

This paper describes the submissions of the "Marian" team to the WNMT 2018 shared task. We investigate combinations of teacher-student training, low-precision matrix products, auto-tuning and other methods to optimize the Transformer model on GPU and CPU. By further integrating these methods with the new averaging attention networks, a recently introduced faster Transformer variant, we create a number of high-quality, high-performance models on the GPU and CPU, dominating the Pareto frontier for this shared task.
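The averaging attention network mentioned in the abstract speeds up decoding by replacing the decoder's self-attention with a cumulative average over the positions generated so far, which can be updated incrementally at each step. A minimal NumPy sketch of that core operation follows; the function name and toy shapes are illustrative assumptions, not taken from the paper or the Marian codebase.

```python
import numpy as np

def average_attention(y):
    """Cumulative-average core of an averaging attention network (AAN):
    g_j = (1/j) * sum_{k<=j} y_k, replacing decoder self-attention.
    y has shape (seq_len, dim); name and shapes are illustrative only."""
    csum = np.cumsum(y, axis=0)                     # running sums over positions
    counts = np.arange(1, y.shape[0] + 1)[:, None]  # 1, 2, ..., seq_len
    return csum / counts                            # per-position average

# Toy usage: 5 decoder positions, model dimension 4.
g = average_attention(np.random.randn(5, 4))
```

Because each position depends only on a running sum and a count, decoding needs O(1) state per step instead of attending over the whole prefix, which is the source of the speed-up the abstract refers to.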
| Year | Venue | DocType |
|---|---|---|
| 2018 | NEURAL MACHINE TRANSLATION AND GENERATION | Conference |

| Volume | Citations | PageRank |
|---|---|---|
| abs/1805.12096 | 0 | 0.34 |

| References | Authors |
|---|---|
| 4 | 5 |
| Name | Order | Citations | PageRank |
|---|---|---|---|
| Marcin Junczys-Dowmunt | 1 | 312 | 24.24 |
| Kenneth Heafield | 2 | 579 | 39.46 |
| Hieu Hoang | 3 | 1518 | 68.35 |
| Roman Grundkiewicz | 4 | 109 | 11.75 |
| Anthony Aue | 5 | 290 | 16.87 |