Title
Unsupervised depth estimation from monocular videos with hybrid geometric-refined loss and contextual attention
Abstract
Most existing methods based on convolutional neural networks (CNNs) are supervised and require a large amount of ground-truth depth data for training. Recently, some unsupervised methods have recast depth estimation as a view synthesis problem, taking stereo image pairs as input, but they require a stereo camera as additional equipment for data acquisition. We therefore use more readily available monocular videos, captured with a single camera, as input, and propose an unsupervised learning framework that predicts scene depth maps from monocular video frames. First, we design a novel unsupervised hybrid geometric-refined loss, which explicitly exploits a more accurate geometric relationship between the input color image and the predicted depth map and preserves depth boundaries and fine structures in the depth maps. Then, we design a contextual attention module that captures non-local dependencies along the spatial and channel dimensions in a dual path, which improves feature representation and further preserves fine depth details. In addition, we employ an adversarial loss, training a discriminator to distinguish synthesized from real color images so that the framework produces realistic results. Experimental results demonstrate that the proposed framework achieves results comparable to or even better than methods trained on monocular videos or stereo image pairs.
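The view-synthesis formulation mentioned in the abstract is the standard unsupervised signal for monocular depth: a source frame is warped into the target view using the predicted depth map and relative camera pose, and the photometric difference between the warped and real target frames supervises training without ground truth. Below is a minimal PyTorch sketch of such a warping loss; the function names (backproject, project, view_synthesis_loss), the plain L1 penalty, and the interface (intrinsics K, pose T) are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn.functional as F

def backproject(depth, K_inv):
    """Lift each pixel to a 3-D point using the predicted depth.
    depth: (B,1,H,W), K_inv: (B,3,3) inverse camera intrinsics."""
    b, _, h, w = depth.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).float()   # (3,H,W)
    pix = pix.view(1, 3, -1).expand(b, -1, -1).to(depth.device)       # (B,3,N)
    rays = K_inv @ pix                        # viewing ray per pixel
    return rays * depth.view(b, 1, -1)        # scale rays by depth

def project(points, K, T):
    """Project 3-D points into the source view given relative pose T: (B,4,4)."""
    pts_h = torch.cat([points, torch.ones_like(points[:, :1])], dim=1)  # (B,4,N)
    cam = (T @ pts_h)[:, :3]                  # rigid transform into source frame
    uv = K @ cam
    return uv[:, :2] / uv[:, 2:].clamp(min=1e-6)  # perspective divide

def view_synthesis_loss(target, source, depth, pose, K, K_inv):
    """L1 photometric error between the target frame and the source frame
    warped into the target view via the predicted depth and pose."""
    b, _, h, w = target.shape
    uv = project(backproject(depth, K_inv), K, pose)       # (B,2,N)
    uv = uv.view(b, 2, h, w).permute(0, 2, 3, 1)
    # normalize pixel coordinates to [-1,1] as required by grid_sample
    grid = torch.stack([2 * uv[..., 0] / (w - 1) - 1,
                        2 * uv[..., 1] / (h - 1) - 1], dim=-1)
    warped = F.grid_sample(source, grid, align_corners=True)
    return (target - warped).abs().mean()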
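The contextual attention module is described as capturing non-local dependencies along the spatial and channel dimensions in a dual path. The sketch below shows one plausible dual-path block, combining position attention (every pixel attends to every pixel) with channel attention (every channel attends to every channel); the 1x1 projections and the learnable residual fusion are assumptions about the design, not the paper's exact architecture.

import torch
import torch.nn as nn

class DualPathAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        inter = max(channels // 8, 1)
        # spatial path: 1x1 convs produce query/key/value maps
        self.q = nn.Conv2d(channels, inter, 1)
        self.k = nn.Conv2d(channels, inter, 1)
        self.v = nn.Conv2d(channels, channels, 1)
        # learnable residual weights for fusing the two paths
        self.gamma_s = nn.Parameter(torch.zeros(1))
        self.gamma_c = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        b, c, h, w = x.shape
        n = h * w
        # ---- spatial path: each position attends to all other positions
        q = self.q(x).view(b, -1, n).transpose(1, 2)           # (B,N,C')
        k = self.k(x).view(b, -1, n)                           # (B,C',N)
        attn_s = torch.softmax(q @ k, dim=-1)                  # (B,N,N)
        v = self.v(x).view(b, c, n)                            # (B,C,N)
        out_s = (v @ attn_s.transpose(1, 2)).view(b, c, h, w)
        # ---- channel path: each channel attends to all other channels
        f = x.view(b, c, n)
        attn_c = torch.softmax(f @ f.transpose(1, 2), dim=-1)  # (B,C,C)
        out_c = (attn_c @ f).view(b, c, h, w)
        # fuse both paths with residual connections
        return x + self.gamma_s * out_s + self.gamma_c * out_c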
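Finally, the adversarial loss trains a discriminator to separate synthesized color images from real ones while the generator (the depth and pose networks driving view synthesis) learns to fool it. A minimal sketch follows, assuming a standard binary cross-entropy GAN objective; the paper's exact formulation may differ.

import torch
import torch.nn.functional as F

def discriminator_loss(D, real_img, synth_img):
    # D is assumed to output one logit per image (or per patch)
    real_logits = D(real_img)
    fake_logits = D(synth_img.detach())   # do not backprop into the generator here
    return (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
            + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))

def generator_adversarial_loss(D, synth_img):
    # push the discriminator to label synthesized frames as real
    fake_logits = D(synth_img)
    return F.binary_cross_entropy_with_logits(fake_logits, torch.ones_like(fake_logits))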
Year: 2020
DOI: 10.1016/j.neucom.2019.10.107
Venue: Neurocomputing
Keywords: Unsupervised, Monocular video, Attention, Hybrid geometric-refined loss
DocType: Journal
Volume: 379
ISSN: 0925-2312
Citations: 2
PageRank: 0.36
References: 0
Authors: 4
Name             Order  Citations  PageRank
Mingliang Zhang  1      2          0.36
Xinchen Ye       2      9          6.90
Xin Fan          3      42         12.48
Wei Zhong        4      9          3.89