Title
CoDeNet: Efficient Deployment of Input-Adaptive Object Detection on Embedded FPGAs
Abstract
Deploying deep learning models on embedded systems for computer vision tasks has been challenging due to limited compute resources and strict energy budgets. The majority of existing work focuses on accelerating image classification, while other fundamental vision problems, such as object detection, have not been adequately addressed. Compared with image classification, detection problems are more sensitive to the spatial variance of objects and therefore require specialized convolutions to aggregate spatial information. To address this need, recent work introduces dynamic deformable convolution to augment regular convolutions. Regular convolutions process a fixed grid of pixels across all spatial locations in an image, while dynamic deformable convolution may access arbitrary pixels in the image, with an access pattern that is input-dependent and varies with spatial location. These properties lead to inefficient memory accesses of inputs with existing hardware. In this work, we harness the flexibility of FPGAs to develop a novel object detection pipeline with deformable convolutions. We show the speed-accuracy tradeoffs for a set of algorithm modifications, including irregular-access versus limited-range and fixed-shape variants, on a flexible hardware accelerator. We evaluate these algorithmic changes with corresponding hardware optimizations and show a 1.36x and 9.76x speedup, respectively, for the full and depthwise deformable convolution on hardware with minor accuracy loss. We then co-design a network called CoDeNet with the modified deformable convolution for object detection and quantize the network to 4-bit weights and 8-bit activations. With our high-efficiency implementation, our solution reaches 26.9 frames per second with a tiny model size of 0.76 MB while achieving 61.7 AP50 on the standard object detection dataset, Pascal VOC. With our higher-accuracy implementation, our model achieves 67.1 AP50 on Pascal VOC with only 2.9 MB of parameters, 20.9x smaller but 10% more accurate than Tiny-YOLO.
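The deformable convolution described in the abstract can be illustrated with a short sketch. The following is a minimal PyTorch example, not the paper's implementation: it uses torchvision.ops.deform_conv2d, and the layer sizes and the offset-generating convolution are illustrative assumptions. A small convolution predicts per-pixel (x, y) offsets for each kernel tap, making the sampling pattern input-dependent; CoDeNet's modifications constrain these offsets (limited range, fixed shape) to make the resulting memory accesses hardware-friendly.

```python
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d

class DeformableConv3x3(nn.Module):
    """3x3 deformable convolution: a regular conv whose 9 sampling
    locations are shifted per pixel by input-dependent offsets.
    (Illustrative sketch; shapes and init are assumptions.)"""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        # Predicts an (x, y) offset for each of the 3*3 kernel taps,
        # so the offset tensor has 2 * 3 * 3 = 18 channels.
        self.offset_conv = nn.Conv2d(in_ch, 2 * 3 * 3, kernel_size=3, padding=1)
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, 3, 3) * 0.01)

    def forward(self, x):
        offset = self.offset_conv(x)  # (N, 18, H, W), varies with the input
        # Each output pixel gathers 9 bilinearly interpolated samples at
        # arbitrary, non-grid locations: the irregular access pattern the
        # paper restricts for efficient FPGA execution.
        return deform_conv2d(x, offset, self.weight, padding=1)

x = torch.randn(1, 16, 32, 32)
y = DeformableConv3x3(16, 32)(x)
print(y.shape)  # torch.Size([1, 32, 32, 32])
```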
Year: 2021
DOI: 10.1145/3431920.3439295
Venue: FPGA
DocType: Conference
Citations: 5
PageRank: 0.45
References: 0
Authors: 9
Name            Order  Citations  PageRank
Qijing Huang    1      6          1.47
Dequan Wang     2      48         2.77
Z. Dong         3      24         4.86
Yizhao Gao      4      8          3.22
Yaohui Cai      5      5          0.45
Tian Li         6      5          1.46
Bichen Wu       7      66         5.25
Kurt Keutzer    8      18         4.86
John Wawrzynek  9      2264       284.44