Abstract |
---|
With the rapidly increasing application of deep learning, LSTM-RNNs are widely used, while their complex data dependences and intensive computation limit the performance of accelerators. In this paper, we first propose a hybrid network expansion model to exploit fine-grained data parallelism. Based on this model, we implement a Reconfigurable Processing Unit (RPU) using Processing In Memory (PIM) units. Our work shows that the gates and cells in an LSTM can be partitioned into fundamental operations and then recombined and mapped onto heterogeneous computing components. The experimental results show that, implemented in a 45 nm CMOS process, the proposed RPU, with an area of 1.51 mm<sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">2</sup> and power of 413 mW, achieves a power efficiency of 309 GOPS/W, 1.7× better than the state-of-the-art reconfigurable architecture. |
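The abstract's claim that LSTM gates and cells can be partitioned into fundamental operations can be illustrated with the standard LSTM cell equations. The sketch below is a generic NumPy illustration, not the paper's RPU mapping; the function and parameter names (`lstm_step`, `W`, `U`, `b`) are assumptions for illustration. Each step is one of the primitive operation classes (matrix-vector product, element-wise activation, element-wise multiply/add) that such an accelerator would map onto its heterogeneous components.

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One LSTM step decomposed into fundamental operations.
    x: input vector; h, c: previous hidden and cell state;
    W, U, b: per-gate weight matrices and biases, keyed by
    "i" (input), "f" (forget), "o" (output), "g" (candidate).
    NOTE: a generic textbook LSTM cell, not the paper's RPU mapping."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    # Fundamental op class 1: matrix-vector products (one per gate)
    z = {k: W[k] @ x + U[k] @ h + b[k] for k in ("i", "f", "o", "g")}
    # Fundamental op class 2: element-wise activations
    i, f, o = sigmoid(z["i"]), sigmoid(z["f"]), sigmoid(z["o"])
    g = np.tanh(z["g"])
    # Fundamental op class 3: element-wise multiplies/adds
    c_new = f * c + i * g            # cell-state update
    h_new = o * np.tanh(c_new)       # hidden-state output
    return h_new, c_new
```

Because every gate reduces to the same three primitive classes, the gate computations can be recombined freely across processing units, which is what makes a fine-grained, reconfigurable mapping possible.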
Year | DOI | Venue |
---|---|---|
2017 | 10.1109/SiPS.2017.8110011 | 2017 IEEE International Workshop on Signal Processing Systems (SiPS) |
Keywords | Field | DocType |
---|---|---|
LSTM, Scalable, Reconfigurable Computing, Process In Memory | Electrical efficiency, Data modeling, Logic gate, Computer science, Parallel computing, Symmetric multiprocessor system, Complex data type, Data parallelism, Artificial intelligence, Deep learning, Computation | Conference |

ISBN | Citations | PageRank |
---|---|---|
978-1-5386-0447-2 | 0 | 0.34 |

References | Authors |
---|---|
10 | 7 |
Name | Order | Citations | PageRank |
---|---|---|---|
Yu Gong | 1 | 12 | 7.36 |
Tingting Xu | 2 | 3 | 2.43 |
Bo Liu | 3 | 6 | 5.82 |
Wei Ge | 4 | 21 | 11.72 |
Jinjiang Yang | 5 | 0 | 2.37 |
Jun Yang | 6 | 147 | 36.54 |
Longxing Shi | 7 | 116 | 39.08 |