Abstract |
---|
We present the Compressive Transformer, an attentive sequence model that compresses past memories for long-range sequence learning. We find that the Compressive Transformer obtains state-of-the-art language modelling results on the WikiText-103 and Enwik8 benchmarks, achieving 17.1 ppl and 0.97 bpc respectively. We also find it can model high-frequency speech effectively and can serve as a memory mechanism for reinforcement learning, demonstrated on an object matching task. To promote the domain of long-range sequence learning, we propose a new open-vocabulary language modelling benchmark derived from books, PG-19. |
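
The core idea in the abstract, compressing activations as they age out of a Transformer-XL-style memory into a secondary, coarser memory, can be illustrated with a minimal sketch. This is not the authors' implementation: the mean-pooling compression function, the memory sizes, and the `update_memories` helper are illustrative assumptions.

```python
# Minimal sketch of a two-level (memory + compressed memory) update, assuming
# mean pooling as the compression function and small illustrative sizes.
import numpy as np

def update_memories(new_hidden, memory, comp_memory, mem_len=4, comp_rate=2):
    """Append new hidden states to memory; compress evicted states into comp_memory."""
    memory = np.concatenate([memory, new_hidden], axis=0)
    if memory.shape[0] > mem_len:
        evicted, memory = memory[:-mem_len], memory[-mem_len:]
        # Compress groups of `comp_rate` evicted timesteps into one vector by mean pooling.
        n = (evicted.shape[0] // comp_rate) * comp_rate
        if n > 0:
            pooled = evicted[:n].reshape(n // comp_rate, comp_rate, -1).mean(axis=1)
            comp_memory = np.concatenate([comp_memory, pooled], axis=0)
    return memory, comp_memory

# Usage: stream three segments of hidden states through the two-level memory.
d_model = 8
memory = np.zeros((0, d_model))
comp_memory = np.zeros((0, d_model))
for _ in range(3):
    segment = np.random.randn(4, d_model)  # hidden states produced for one segment
    memory, comp_memory = update_memories(segment, memory, comp_memory)
print(memory.shape, comp_memory.shape)  # (4, 8) (4, 8)
```

In the full model, attention is computed over both the recent memory and the compressed memory, so older context remains accessible at reduced granularity.
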
Year | Venue | Keywords
---|---|---
2020 | ICLR | memory, language modeling, transformer, compression

DocType | Citations | PageRank
---|---|---
Conference | 1 | 0.35

References | Authors
---|---
25 | 5

Name | Order | Citations | PageRank
---|---|---|---
Jack Rae | 1 | 75 | 8.77 |
Anna Potapenko | 2 | 1 | 0.69 |
Siddhant M. Jayakumar | 3 | 11 | 5.55 |
Chloe Hillier | 4 | 150 | 4.77 |
Timothy P. Lillicrap | 5 | 4377 | 170.65 |