Persistent RNNs: Stashing Recurrent Weights On-Chip. - Citegraph

Paper Info

Title
Persistent RNNs: Stashing Recurrent Weights On-Chip.

Abstract
This paper introduces a new technique for mapping Deep Recurrent Neural Networks (RNN) efficiently onto GPUs. We show how it is possible to achieve substantially higher computational throughput at low mini-batch sizes than direct implementations of RNNs based on matrix multiplications. The key to our approach is the use of persistent computational kernels that exploit the GPU's inverted memory hierarchy to reuse network weights over multiple timesteps. Our initial implementation sustains 2.8 TFLOP/s at a mini-batch size of 4 on an NVIDIA TitanX GPU. This provides a 16× reduction in activation memory footprint, enables model training with 12× more parameters on the same hardware, allows us to strongly scale RNN training to 128 GPUs, and allows us to efficiently explore end-to-end speech recognition models with over 100 layers.
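
The weight-reuse idea in the abstract can be illustrated with a toy model. The following is a minimal sketch in plain Python, not the paper's GPU implementation: `OffChipMemory`, `rnn_reload_each_step`, and `rnn_persistent` are hypothetical names introduced here, the tanh recurrence is an assumed cell type, and a simple counter stands in for off-chip memory traffic. It only shows that keeping the recurrent weights resident across timesteps gives the same result with fewer weight loads.

```python
import math


def matvec(W, h):
    """Dense matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


class OffChipMemory:
    """Hypothetical stand-in for off-chip memory that counts weight reads."""

    def __init__(self, W):
        self._W = W
        self.weight_loads = 0

    def load_weights(self):
        self.weight_loads += 1
        return [row[:] for row in self._W]  # the copy models a transfer on-chip


def rnn_reload_each_step(mem, h0, inputs):
    """Baseline: weights are re-read from memory at every timestep."""
    h = h0
    for x in inputs:
        W = mem.load_weights()
        h = [math.tanh(a + b) for a, b in zip(matvec(W, h), x)]
    return h


def rnn_persistent(mem, h0, inputs):
    """Persistent style: weights are read once and reused for all timesteps."""
    W = mem.load_weights()
    h = h0
    for x in inputs:
        h = [math.tanh(a + b) for a, b in zip(matvec(W, h), x)]
    return h


# Toy data (assumed values, chosen only for illustration).
W = [[0.5, -0.2], [0.1, 0.3]]
inputs = [[0.1, 0.0], [0.0, 0.2], [0.3, -0.1]]

baseline_mem, persistent_mem = OffChipMemory(W), OffChipMemory(W)
h_baseline = rnn_reload_each_step(baseline_mem, [0.0, 0.0], inputs)
h_persistent = rnn_persistent(persistent_mem, [0.0, 0.0], inputs)

assert h_baseline == h_persistent  # same arithmetic, same result
print(baseline_mem.weight_loads, persistent_mem.weight_loads)  # prints "3 1"
```

In this sketch the baseline performs one weight load per timestep while the persistent variant performs one in total. On a real GPU the analogous saving would come from holding the weights in on-chip storage for the lifetime of a single long-running kernel.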

Year	Venue	Field
2016	ICML	Memory hierarchy, CUDA, Computer science, Parallel computing, Recurrent neural network, Artificial intelligence, Throughput, Deep learning, Memory footprint, Artificial neural network, Matrix multiplication, Machine learning
DocType	Citations	PageRank
Conference	12	0.73
References	Authors
19	9

Authors (9 rows)

Name	Order	Citations	PageRank
Gregory Frederick Diamos	1	1117	51.07
Shubho Sengupta	2	505	19.84
Bryan C. Catanzaro	3	1191	75.56
Mike Chrzanowski	4	309	12.21
Adam Coates	5	2493	160.95
Erich Elsen	6	551	29.33
Jesse H. Engel	7	326	20.21
Awni Y. Hannun	8	517	27.54
Sanjeev Satheesh	9	5591	233.55
