Title
CoDeNet: Efficient Deployment of Input-Adaptive Object Detection on Embedded FPGAs
Abstract
Deploying deep learning models on embedded systems for computer vision tasks has been challenging due to limited compute resources and strict energy budgets. The majority of existing work focuses on accelerating image classification, while other fundamental vision problems, such as object detection, have not been adequately addressed. Compared with image classification, detection problems are more sensitive to the spatial variance of objects and therefore require specialized convolutions to aggregate spatial information. To address this need, recent work introduces dynamic deformable convolution to augment regular convolutions. Regular convolutions process a fixed grid of pixels across all spatial locations in an image, while dynamic deformable convolution may access arbitrary pixels in the image, with an access pattern that is input-dependent and varies with spatial location. These properties lead to inefficient memory accesses of inputs with existing hardware. In this work, we harness the flexibility of FPGAs to develop a novel object detection pipeline with deformable convolutions. We show the speed-accuracy tradeoffs for a set of algorithm modifications, including irregular-access versus limited-range and fixed-shape variants, on a flexible hardware accelerator. We evaluate these algorithmic changes with corresponding hardware optimizations and show a 1.36x and 9.76x speedup, respectively, for the full and depthwise deformable convolution on hardware with minor accuracy loss. We then co-design a network called CoDeNet with the modified deformable convolution for object detection and quantize the network to 4-bit weights and 8-bit activations. With our high-efficiency implementation, our solution reaches 26.9 frames per second with a tiny model size of 0.76 MB while achieving 61.7 AP50 on the standard object detection dataset, Pascal VOC. With our higher-accuracy implementation, our model achieves 67.1 AP50 on Pascal VOC with only 2.9 MB of parameters, 20.9x smaller but 10% more accurate than Tiny-YOLO.
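The deformable convolution described in the abstract can be illustrated with a short sketch. The following is a minimal PyTorch example, not the paper's implementation: it uses torchvision.ops.deform_conv2d, and the layer sizes and the offset-generating convolution are illustrative assumptions. A small convolution predicts per-pixel (x, y) offsets for each kernel tap, making the sampling pattern input-dependent; CoDeNet's modifications constrain these offsets (limited range, fixed shape) to make the resulting memory accesses hardware-friendly.

```python
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d

class DeformableConv3x3(nn.Module):
    """3x3 deformable convolution: a regular conv whose 9 sampling
    locations are shifted per pixel by input-dependent offsets.
    (Illustrative sketch; shapes and init are assumptions.)"""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        # Predicts an (x, y) offset for each of the 3*3 kernel taps,
        # so the offset tensor has 2 * 3 * 3 = 18 channels.
        self.offset_conv = nn.Conv2d(in_ch, 2 * 3 * 3, kernel_size=3, padding=1)
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, 3, 3) * 0.01)

    def forward(self, x):
        offset = self.offset_conv(x)  # (N, 18, H, W), varies with the input
        # Each output pixel gathers 9 bilinearly interpolated samples at
        # arbitrary, non-grid locations: the irregular access pattern the
        # paper restricts for efficient FPGA execution.
        return deform_conv2d(x, offset, self.weight, padding=1)

x = torch.randn(1, 16, 32, 32)
y = DeformableConv3x3(16, 32)(x)
print(y.shape)  # torch.Size([1, 32, 32, 32])
```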
Year: 2021
DOI: 10.1145/3431920.3439295
Venue: FPGA
DocType: Conference
Citations: 5
PageRank: 0.45
References: 0
Authors: 9
Name            Order  Citations  PageRank
Qijing Huang    1      6          1.47
Dequan Wang     2      48         2.77
Z. Dong         3      24         4.86
Yizhao Gao      4      8          3.22
Yaohui Cai      5      5          0.45
Tian Li         6      5          1.46
Bichen Wu       7      66         5.25
Kurt Keutzer    8      18         4.86
John Wawrzynek  9      2264       284.44