Abstract |
---|
We present the Compressive Transformer, an attentive sequence model that compresses past memories for long-range sequence learning. We find that the Compressive Transformer obtains state-of-the-art language modelling results on the WikiText-103 and Enwik8 benchmarks, achieving 17.1 ppl and 0.97 bpc respectively. We also find it can model high-frequency speech effectively and can serve as a memory mechanism for reinforcement learning, demonstrated on an object matching task. To promote the domain of long-range sequence learning, we propose a new open-vocabulary language modelling benchmark derived from books, PG-19. |
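
The core idea in the abstract, compressing activations as they age out of a Transformer-XL-style memory into a secondary, coarser memory, can be illustrated with a minimal sketch. This is not the authors' implementation: the mean-pooling compression function, the memory sizes, and the `update_memories` helper are illustrative assumptions.

```python
# Minimal sketch of a two-level (memory + compressed memory) update, assuming
# mean pooling as the compression function and small illustrative sizes.
import numpy as np

def update_memories(new_hidden, memory, comp_memory, mem_len=4, comp_rate=2):
    """Append new hidden states to memory; compress evicted states into comp_memory."""
    memory = np.concatenate([memory, new_hidden], axis=0)
    if memory.shape[0] > mem_len:
        evicted, memory = memory[:-mem_len], memory[-mem_len:]
        # Compress groups of `comp_rate` evicted timesteps into one vector by mean pooling.
        n = (evicted.shape[0] // comp_rate) * comp_rate
        if n > 0:
            pooled = evicted[:n].reshape(n // comp_rate, comp_rate, -1).mean(axis=1)
            comp_memory = np.concatenate([comp_memory, pooled], axis=0)
    return memory, comp_memory

# Usage: stream three segments of hidden states through the two-level memory.
d_model = 8
memory = np.zeros((0, d_model))
comp_memory = np.zeros((0, d_model))
for _ in range(3):
    segment = np.random.randn(4, d_model)  # hidden states produced for one segment
    memory, comp_memory = update_memories(segment, memory, comp_memory)
print(memory.shape, comp_memory.shape)  # (4, 8) (4, 8)
```

In the full model, attention is computed over both the recent memory and the compressed memory, so older context remains accessible at reduced granularity.
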
Year | Venue | Keywords
---|---|---
2020 | ICLR | memory, language modeling, transformer, compression

DocType | Citations | PageRank
---|---|---
Conference | 1 | 0.35

References | Authors
---|---
25 | 5

Name | Order | Citations | PageRank
---|---|---|---
Jack Rae | 1 | 75 | 8.77 |
Anna Potapenko | 2 | 1 | 0.69 |
Siddhant M. Jayakumar | 3 | 11 | 5.55 |
Chloe Hillier | 4 | 150 | 4.77 |
Timothy P. Lillicrap | 5 | 4377 | 170.65 |