Title | ||
---|---|---|
A 47.4µJ/epoch Trainable Deep Convolutional Neural Network Accelerator for In-Situ Personalization on Smart Devices |
Abstract | ||
---|---|---|
A scalable deep learning accelerator supporting both inference and training is implemented for device personalization of deep convolutional neural networks (CNNs). It consists of three processor cores that operate with distinct energy-efficient dataflows for the different types of computation in CNN training. Two cores conduct forward and backward propagation in convolutional layers and use a masking scheme that reduces the intermediate data stored for training by 88.3%. The third core executes the weight update process in convolutional layers and the inner product computation in fully connected layers with a novel large-window dataflow. The system supports an 8-bit fixed-point datapath with lossless training and consumes 47.4µJ/epoch for a customized deep CNN model. |
Year | DOI | Venue |
---|---|---|
2019 | 10.1109/A-SSCC47793.2019.9056972 | 2019 IEEE Asian Solid-State Circuits Conference (A-SSCC) |
Keywords | DocType | ISBN |
CNN training, convolutional layers, lossless training, deep CNN model, energy-efficient dataflow, backward propagation, processor cores, deep convolutional neural networks, device personalization, scalable deep learning accelerator, smart devices | Conference | 978-1-7281-5107-6 |
Citations | PageRank | References |
0 | 0.34 | 0 |
Authors | ||
6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Seungkyu Choi | 1 | 10 | 3.90 |
Jaehyeong Sim | 2 | 52 | 7.63 |
Myeonggu Kang | 3 | 12 | 4.00 |
Yeongjae Choi | 4 | 45 | 5.78 |
Hyeonuk Kim | 5 | 10 | 2.76 |
Lee-Sup Kim | 6 | 707 | 98.58 |