Title | ||
---|---|---|
P<sup>2</sup>M-DeTrack: Processing-in-Pixel-in-Memory for Energy-efficient and Real-Time Multi-Object Detection and Tracking |
Abstract | ||
---|---|---|
Today’s high-resolution, high-frame-rate cameras in autonomous vehicles generate a large volume of data that must be transferred to and processed by a downstream processor or machine learning (ML) accelerator to enable intelligent computing tasks, such as multi-object detection and tracking. This massive data transfer incurs significant energy, latency, and bandwidth bottlenecks, which hinder real-time processing. To mitigate this problem, we propose an algorithm-hardware co-design framework called Processing-in-Pixel-in-Memory-based object Detection and Tracking (P<sup>2</sup>M-DeTrack). P<sup>2</sup>M-DeTrack is based on a custom Faster R-CNN-based model that is distributed partly inside the pixel array (front-end) and partly in a separate FPGA/ASIC (back-end). The proposed front-end in-pixel processing significantly down-samples the input feature maps with judiciously optimized strided convolution and pooling. Compared to a conventional baseline design that transfers frames of RGB pixels to the back-end, the resulting P<sup>2</sup>M-DeTrack designs reduce the data bandwidth between sensor and back-end by up to 24×. The designs also reduce the sensor and total energy per frame (obtained from in-house circuit simulations at the GlobalFoundries 22 nm technology node) by 5.7× and 1.14×, respectively. Lastly, they reduce the sensing and total frame latency by an estimated 1.7× and 3×, respectively. We evaluate our approach on the multi-object detection (tracking) task of the large-scale BDD100K dataset and observe only a 0.5% reduction in the mean average precision (0.8% reduction in the identification F1 score) compared to the state-of-the-art. |
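
The bandwidth saving described in the abstract comes from shrinking the feature map before it leaves the sensor. The sketch below is a rough illustration of that arithmetic only, not the authors' configuration: the frame size, bit depths, kernel sizes, strides, and channel count are all hypothetical values chosen for the example, and the function names are invented here.

```python
def conv_output_size(size: int, kernel: int, stride: int, padding: int = 0) -> int:
    """Spatial output size of a convolution or pooling layer (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


def bandwidth_reduction(height, width, in_channels, in_bits,
                        layers, out_channels, out_bits):
    """Ratio of raw-frame bits to bits leaving the front-end, per frame.

    `layers` is a list of (kernel, stride, padding) tuples applied in order,
    standing in for the in-pixel strided convolution and pooling stages.
    """
    h, w = height, width
    for kernel, stride, padding in layers:
        h = conv_output_size(h, kernel, stride, padding)
        w = conv_output_size(w, kernel, stride, padding)
    raw_bits = height * width * in_channels * in_bits
    front_end_bits = h * w * out_channels * out_bits
    return raw_bits / front_end_bits


# Hypothetical setup (assumed, not taken from the paper):
# a 720x1280 RGB frame at 12 bits per value, a 4x4 stride-4 convolution
# followed by 2x2 stride-2 pooling, producing 8 channels at 8 bits each.
ratio = bandwidth_reduction(
    height=720, width=1280, in_channels=3, in_bits=12,
    layers=[(4, 4, 0), (2, 2, 0)],
    out_channels=8, out_bits=8,
)
print(ratio)  # 36.0
```

With these assumed numbers the front-end emits 36 times fewer bits than the raw frame. The ratio depends entirely on the chosen strides, channel count, and output precision, so it should not be read as a reproduction of the 24× figure reported above.
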
Year | DOI | Venue |
---|---|---|
2022 | 10.1109/VLSI-SoC54400.2022.9939582 | 2022 IFIP/IEEE 30th International Conference on Very Large Scale Integration (VLSI-SoC) |
Keywords | DocType | ISSN |
autonomous vehicles,detection,tracking,processing-in-pixel-in-memory,faster R-CNN | Conference | 2324-8432 |
ISBN | Citations | PageRank |
978-1-6654-9006-1 | 0 | 0.34 |
References | Authors | |
12 | 14 |
Name | Order | Citations | PageRank |
---|---|---|---|
Gourav Datta | 1 | 0 | 0.34 |
Souvik Kundu | 2 | 9 | 4.94 |
Zihan Yin | 3 | 0 | 0.34 |
Joe Mathai | 4 | 0 | 0.34 |
Zeyu Liu | 5 | 0 | 0.34 |
Zixu Wang | 6 | 0 | 0.34 |
Mulin Tian | 7 | 0 | 0.34 |
Shunlin Lu | 8 | 0 | 0.34 |
Ravi Teja Lakkireddy | 9 | 0 | 0.34 |
Andrew Schmidt | 10 | 0 | 0.34 |
Wael Abd-Almageed | 11 | 248 | 24.52 |
Ajey Jacob | 12 | 0 | 0.34 |
Akhilesh Jaiswal | 13 | 0 | 0.34 |
Peter Beerel | 14 | 0 | 0.34 |