Abstract | ||
---|---|---|
In this work we introduce a novel, CNN-based architecture that can be trained end-to-end to deliver seamless scene segmentation results. Our goal is to predict consistent semantic segmentation and detection results by means of a panoptic output format, going beyond the simple combination of independently trained segmentation and detection models. The proposed architecture takes advantage of a novel segmentation head that seamlessly integrates multi-scale features generated by a Feature Pyramid Network with contextual information conveyed by a lightweight DeepLab-like module. As an additional contribution, we review the panoptic metric and propose an alternative that overcomes its limitations when evaluating non-instance categories. Our proposed network architecture yields state-of-the-art results on three challenging street-level datasets, i.e., Cityscapes, Indian Driving Dataset and Mapillary Vistas. |
Year | DOI | Venue |
---|---|---|
2019 | 10.1109/CVPR.2019.00847 | 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019) |
DocType | Volume | ISSN |
Conference | abs/1905.01220 | 1063-6919 |
Citations | PageRank | References |
3 | 0.37 | 0 |
Authors | ||
4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Lorenzo Porzi | 1 | 120 | 11.79 |
Samuel Rota Bulò | 2 | 564 | 33.69 |
Aleksander Colovic | 3 | 3 | 0.37 |
Peter Kontschieder | 4 | 376 | 21.10 |