Abstract
---
Large Transformer models routinely achieve state-of-the-art results on
a number of tasks, but training these models can be prohibitively costly,
especially on long sequences. We introduce two techniques to improve
the efficiency of Transformers. For one, we replace dot-product attention
by one that uses locality-sensitive hashing, changing its complexity
from $O(L^2)$ to $O(L \log L)$, where $L$ is the length of the sequence.
Furthermore, we use reversible residual layers instead of the standard
residuals, which allows storing activations only once in the training
process instead of $N$ times, where $N$ is the number of layers.
The resulting model, the Reformer, performs on par with Transformer models
while being much more memory-efficient and much faster on long sequences.
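
To make the locality-sensitive-hashing idea concrete, here is a minimal sketch of angular LSH bucketing of the kind the paper builds on: vectors are projected with a shared random matrix and hashed to the index of their largest signed projection, so vectors with high cosine similarity tend to land in the same bucket, and attention can then be restricted to within-bucket pairs. The function and variable names below are illustrative, not taken from the paper's code.

```python
import numpy as np

def lsh_buckets(vectors, n_buckets, rng):
    """Assign each vector to one of `n_buckets` angular-LSH buckets.

    Each vector x is projected with a random matrix R and hashed to
    argmax over the concatenation [xR; -xR]; similar vectors are
    likely to receive the same bucket id.
    """
    assert n_buckets % 2 == 0, "need an even number of buckets"
    d = vectors.shape[-1]
    R = rng.standard_normal((d, n_buckets // 2))
    proj = vectors @ R  # shape: (seq_len, n_buckets // 2)
    return np.argmax(np.concatenate([proj, -proj], axis=-1), axis=-1)

# Toy usage: bucket a sequence of 16 eight-dimensional vectors.
rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8))
print(lsh_buckets(x, n_buckets=4, rng=rng))  # bucket ids in [0, 4)
```

Restricting attention to within-bucket pairs (after sorting by bucket and chunking) is what reduces the cost from quadratic to roughly $O(L \log L)$.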
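Likewise, the reversible-residual point can be illustrated with a small sketch (not the authors' implementation): each block computes $y_1 = x_1 + F(x_2)$, $y_2 = x_2 + G(y_1)$, so its inputs can be recomputed exactly from its outputs and intermediate activations need not be stored for backpropagation.

```python
import numpy as np

def rev_block_forward(x1, x2, f, g):
    """Reversible residual block: y1 = x1 + f(x2), y2 = x2 + g(y1)."""
    y1 = x1 + f(x2)
    y2 = x2 + g(y1)
    return y1, y2

def rev_block_inverse(y1, y2, f, g):
    """Recover the block's inputs from its outputs, so activations
    can be recomputed on the backward pass instead of stored."""
    x2 = y2 - g(y1)
    x1 = y1 - f(x2)
    return x1, x2

# Toy check with fixed random sub-layers standing in for
# attention (f) and feed-forward (g).
rng = np.random.default_rng(0)
Wf, Wg = rng.standard_normal((8, 8)), rng.standard_normal((8, 8))
f = lambda h: np.tanh(h @ Wf)
g = lambda h: np.maximum(0.0, h @ Wg)

x1, x2 = rng.standard_normal((4, 8)), rng.standard_normal((4, 8))
y1, y2 = rev_block_forward(x1, x2, f, g)
r1, r2 = rev_block_inverse(y1, y2, f, g)
assert np.allclose(x1, r1) and np.allclose(x2, r2)
```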
Year | Venue | Keywords
---|---|---
2020 | ICLR | attention, locality sensitive hashing, reversible layers

DocType | Citations | PageRank
---|---|---
Conference | 4 | 0.39

References | Authors
---|---
10 | 3

Name | Order | Citations | PageRank |
---|---|---|---|
Nikita Kitaev | 1 | 4 | 0.39 |
Łukasz Kaiser | 2 | 2307 | 89.08 |
Anselm Levskaya | 3 | 4 | 0.39 |