Abstract |
---|
Constructing Birds-Eye-View (BEV) maps from monocular images is typically a complex multi-stage process involving the separate vision tasks of ground plane estimation, road segmentation and 3D object detection. However, recent approaches have adopted end-to-end solutions which warp image-based features from the image plane to BEV while implicitly taking camera geometry into account. In this work, we show how such instantaneous BEV estimation of a scene can be learnt, and how a better state estimate of the world can be achieved by incorporating temporal information. Our model learns a representation from monocular video through factorised 3D convolutions and uses it to estimate a BEV occupancy grid of the final frame. We achieve state-of-the-art results for BEV estimation from monocular images, and establish a new benchmark for single-scene BEV estimation from monocular video. |
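The abstract mentions factorised 3D convolutions for aggregating temporal information. As a rough illustration only (not the authors' implementation), a factorised 3D convolution replaces a full (t, k, k) kernel with a (1, k, k) spatial pass followed by a (t, 1, 1) temporal pass. Below is a minimal NumPy sketch of this decomposition; all function and kernel names are hypothetical:

```python
import numpy as np

def conv_spatial(video, kernel):
    """Apply a (1, k, k) convolution: a 2D kernel run independently per frame.

    video: (T, H, W) array; kernel: (k, k); zero padding keeps the shape.
    """
    T, H, W = video.shape
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(video, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros_like(video)
    for t in range(T):
        for i in range(H):
            for j in range(W):
                out[t, i, j] = np.sum(padded[t, i:i + k, j:j + k] * kernel)
    return out

def conv_temporal(video, kernel):
    """Apply a (k, 1, 1) convolution: a 1D kernel run independently per pixel."""
    T, H, W = video.shape
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(video, ((pad, pad), (0, 0), (0, 0)))
    out = np.zeros_like(video)
    for t in range(T):
        # Weighted sum over the k temporal neighbours of frame t.
        out[t] = np.tensordot(kernel, padded[t:t + k], axes=(0, 0))
    return out

def factorised_conv3d(video, spatial_kernel, temporal_kernel):
    """Factorised 3D convolution: spatial (1,k,k) pass, then temporal (k,1,1)."""
    return conv_temporal(conv_spatial(video, spatial_kernel), temporal_kernel)

video = np.random.rand(4, 8, 8)      # 4 frames of 8x8 features
s_k = np.ones((3, 3)) / 9.0          # spatial averaging kernel (illustrative)
t_k = np.ones(3) / 3.0               # temporal averaging kernel (illustrative)
out = factorised_conv3d(video, s_k, t_k)
print(out.shape)  # → (4, 8, 8)
```

The factorisation reduces the parameter count from t·k² to k² + t per kernel and lets spatial and temporal reasoning be learnt separately, which is the usual motivation for this decomposition in video models.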
Year | DOI | Venue |
---|---|---
2021 | 10.1109/ICRA48506.2021.9561169 | 2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021) |
DocType | Volume | Issue
---|---|---
Conference | 2021 | 1

ISSN | Citations | PageRank
---|---|---
1050-4729 | 0 | 0.34

References | Authors
---|---
4 | 4
Name | Order | Citations | PageRank |
---|---|---|---|
Avishkar Saha | 1 | 0 | 0.68 |
Oscar Mendez Maldonado | 2 | 3 | 2.41 |
Chris Russell | 3 | 0 | 0.68 |
Richard Bowden | 4 | 1840 | 118.50 |