Abstract | ||
---|---|---|
In this paper, we propose a simple but effective encoder-decoder network for fast and accurate depth estimation on mobile devices. Unlike other depth estimation methods that rely on heavy context modeling modules, our encoder uses a fast downsampling strategy to obtain a sufficient receptive field and context at a faster rate. To obtain dense predictions, a light decoder recovers the original resolution. Additionally, to improve the representational ability of the light network, we introduce a teacher-student strategy: a distillation process ensures that the student (the proposed light network) learns from the teacher. The proposed method achieves a good trade-off between latency and accuracy. We evaluated the proposed algorithm on the MAI 2021 Monocular Depth Estimation Challenge, where it achieved a score of 129.41 and ranked first, ahead of the second-place entry by a large margin (129.41 vs. 14.51). More specifically, the proposed method achieves an si-RMSE score of 0.28 with a runtime of 97 ms on the Raspberry Pi 4. |
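
The abstract reports accuracy as si-RMSE (scale-invariant root mean squared error) but this record does not spell out the formula. The sketch below assumes the commonly used log-space definition, in which the error is the standard deviation of the per-pixel log-depth differences; the challenge's exact formula may differ. The function name `si_rmse` and the flat-list inputs are illustrative choices, not taken from the paper.

```python
import math


def si_rmse(pred, target):
    """Scale-invariant RMSE over two equal-length sequences of positive depths.

    Assumed definition: with d_i = log(pred_i) - log(target_i),
    si-RMSE = sqrt(mean(d_i^2) - mean(d_i)^2).
    """
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and equally long")

    # Per-pixel differences in log space.
    diffs = [math.log(p) - math.log(t) for p, t in zip(pred, target)]
    n = len(diffs)

    mean_of_squares = sum(d * d for d in diffs) / n
    square_of_mean = (sum(diffs) / n) ** 2

    # Clamp at zero to guard against tiny negative values from rounding.
    return math.sqrt(max(mean_of_squares - square_of_mean, 0.0))
```

Under this definition, multiplying every predicted depth by the same constant leaves the score unchanged, which is why a prediction that is correct up to a global scale scores close to zero.
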
Year | DOI | Venue |
---|---|---|
2021 | 10.1109/CVPRW53098.2021.00279 | 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2021) |
DocType | ISSN | Citations |
Conference | 2160-7508 | 0 |
PageRank | References | Authors |
0.34 | 0 | 6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Ziyu Zhang | 1 | 112 | 10.19 |
Yicheng Wang | 2 | 22 | 8.06 |
Zilong Huang | 3 | 0 | 0.68 |
Guozhong Luo | 4 | 0 | 1.01 |
Gang Yu | 5 | 382 | 19.85 |
Bin Fu | 6 | 0 | 0.68 |