Abstract |
---|
Recent advances in 3D perception have shown impressive progress in understanding geometric structures of 3D shapes and even scenes. Inspired by these advances in geometric understanding, we aim to imbue image-based perception with representations learned under geometric constraints. We introduce an approach to learn view-invariant, geometry-aware representations for network pre-training, based on multi-view RGB-D data, that can then be effectively transferred to downstream 2D tasks. We propose to employ contrastive learning under both multi-view image constraints and image-geometry constraints to encode 3D priors into learned 2D representations. This results not only in improvement over 2D-only representation learning on the image-based tasks of semantic segmentation, instance segmentation, and object detection on real-world indoor datasets, but moreover, provides significant improvement in the low data regime. We show a significant improvement of 6.0% on semantic segmentation on full data as well as 11.9% on 20% data against baselines on ScanNet. |
Year | DOI | Venue |
---|---|---|
2021 | 10.1109/ICCV48922.2021.00564 | ICCV |
DocType | Citations | PageRank |
---|---|---|
Conference | 0 | 0.34 |
References | Authors |
---|---|
0 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Ji Hou | 1 | 36 | 2.19 |
Saining Xie | 2 | 231 | 12.45 |
Benjamin Graham | 3 | 129 | 15.99 |
Angela Dai | 4 | 396 | 18.84 |
Matthias Nießner | 5 | 0 | 0.34 |