Title
A 47.4µJ/epoch Trainable Deep Convolutional Neural Network Accelerator for In-Situ Personalization on Smart Devices
Abstract
A scalable deep learning accelerator supporting both inference and training is implemented for device personalization of deep convolutional neural networks (CNNs). It consists of three processor cores, each operating with a distinct energy-efficient dataflow suited to a different type of computation in CNN training. Two cores perform forward and backward propagation in convolutional layers and use a masking scheme that reduces the intermediate data stored for training by 88.3%. The third core executes the weight update in convolutional layers and the inner product computation in fully connected layers using a novel large-window dataflow. The system enables lossless training with an 8-bit fixed-point datapath and consumes 47.4µJ/epoch for a customized deep CNN model.
Year
2019
DOI
10.1109/A-SSCC47793.2019.9056972
Venue
2019 IEEE Asian Solid-State Circuits Conference (A-SSCC)
Keywords
CNN training, convolutional layers, lossless training, deep CNN model, energy-efficient dataflow, backward propagation, processor cores, deep convolutional neural networks, device personalization, scalable deep learning accelerator, smart devices
DocType
Conference
ISBN
978-1-7281-5107-6
Citations
0
PageRank
0.34
References
0
Authors
6
Name             Order  Citations  PageRank
Seungkyu Choi    1      10         3.90
Jaehyeong Sim    2      52         7.63
Myeonggu Kang    3      12         4.00
Yeongjae Choi    4      45         5.78
Hyeonuk Kim      5      10         2.76
Lee-Sup Kim      6      707        98.58