Title
Semantically guided self-supervised monocular depth estimation
Abstract
Depth information plays an important role in the vision-related activities of robots and autonomous vehicles. An effective way to obtain 3D scene information is self-supervised monocular depth estimation, which exploits large and diverse monocular video datasets during training without requiring ground-truth data. A novel multi-task learning strategy is proposed that uses semantic information to guide monocular depth estimation while maintaining self-supervision. An improved differentiable direct visual odometry (DDVO) module combined with Pose-Net is applied to achieve better pose prediction. A minimum reprojection loss with auto-masking and semantic masking is used to remove the effects of low-texture areas and moving dynamic-class objects within scenes. The semantic mask is also introduced into the DDVO pose predictor to filter out moving objects and reduce the matching error between consecutive monocular frames. In addition, PackNet is employed as the backbone of the multi-task network to further improve the accuracy of depth prediction. The proposed method produces state-of-the-art results for monocular depth estimation on the KITTI Eigen split benchmark, even outperforming supervised methods trained with ground-truth depth.
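The central training signal described in the abstract is a photometric objective that combines the per-pixel minimum reprojection loss and auto-masking (as in Monodepth2) with an additional semantic mask over dynamic-class pixels. The PyTorch sketch below illustrates that combination under stated assumptions; it is a minimal illustration of the general technique, not the authors' implementation, and the function names, tensor shapes, and SSIM weighting are assumptions.

```python
# Minimal sketch (not the paper's code) of a min-reprojection loss with
# auto-masking plus a semantic mask over dynamic-class pixels.
import torch
import torch.nn.functional as F

def ssim(x, y, C1=0.01 ** 2, C2=0.03 ** 2):
    """Simplified single-scale SSIM dissimilarity over 3x3 neighbourhoods."""
    mu_x = F.avg_pool2d(x, 3, 1, 1)
    mu_y = F.avg_pool2d(y, 3, 1, 1)
    sigma_x = F.avg_pool2d(x * x, 3, 1, 1) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, 3, 1, 1) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, 3, 1, 1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + C1) * (2 * sigma_xy + C2)
    den = (mu_x ** 2 + mu_y ** 2 + C1) * (sigma_x + sigma_y + C2)
    return torch.clamp((1 - num / den) / 2, 0, 1)

def photometric_error(pred, target, alpha=0.85):
    """Weighted SSIM + L1 error, averaged over colour channels."""
    l1 = (pred - target).abs().mean(1, keepdim=True)
    return alpha * ssim(pred, target).mean(1, keepdim=True) + (1 - alpha) * l1

def masked_min_reprojection_loss(target, warped_srcs, srcs, dynamic_mask):
    """
    target:       (B,3,H,W) reference frame I_t.
    warped_srcs:  list of (B,3,H,W) source frames warped into I_t's view
                  using the predicted depth and pose.
    srcs:         list of (B,3,H,W) unwarped source frames (for auto-masking).
    dynamic_mask: (B,1,H,W), 1 where the semantic head predicts a dynamic
                  class (e.g. car, pedestrian), 0 elsewhere. Assumed input.
    """
    # Per-pixel minimum over warped source frames handles occlusions.
    reproj = torch.cat([photometric_error(w, target) for w in warped_srcs], 1)
    min_reproj, _ = reproj.min(1, keepdim=True)

    # Auto-masking: keep a pixel only where warping beats the identity
    # reprojection (filters static-camera and same-speed-object pixels).
    identity = torch.cat([photometric_error(s, target) for s in srcs], 1)
    min_identity, _ = identity.min(1, keepdim=True)
    auto_mask = (min_reproj < min_identity).float()

    # Semantic masking: additionally drop dynamic-class pixels.
    mask = auto_mask * (1 - dynamic_mask)
    return (min_reproj * mask).sum() / mask.sum().clamp(min=1)
```

The same semantic mask could, in spirit, be applied to the DDVO photometric residuals so that moving objects do not corrupt the direct pose alignment, which is what the abstract describes; that step is omitted here for brevity.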
Year
2022
DOI
10.1049/ipr2.12409
Venue
IET IMAGE PROCESSING
DocType
Journal
Volume
16
Issue
5
ISSN
1751-9659
Citations
0
PageRank
0.34
References
0
Authors
5
Name           Order  Citations  PageRank
Lu Xiao        1      123        14.27
Haoran Sun     2      0          0.34
Xiuling Wang   3      0          0.34
Zhiguo Zhang   4      0          0.34
Haixia Wang    5      132        27.85