Abstract |
---|
In this paper, we address the problem of scene parsing with deep learning, focusing on the context aggregation strategy for robust segmentation. Motivated by the fact that the label of a pixel is the category of the object the pixel belongs to, we introduce an *object context pooling (OCP)* scheme, which represents each pixel by exploiting the set of pixels that belong to the same object category as that pixel; we call this set the object context. Our implementation, inspired by the self-attention approach, consists of two steps: (i) compute the similarities between each pixel and all the pixels, forming a so-called object context map for each pixel that serves as a surrogate for the true object context, and (ii) represent the pixel by aggregating the features of all the pixels weighted by these similarities. The resulting representation is more robust than those from existing context aggregation schemes, e.g., the pyramid pooling module (PPM) in PSPNet and atrous spatial pyramid pooling (ASPP), which do not differentiate between context pixels that belong to the same object category and those that do not, limiting the reliability of the contextually aggregated representations. We empirically demonstrate our approach and two pyramid extensions with state-of-the-art performance on three semantic segmentation benchmarks: Cityscapes, ADE20K and LIP. Code has been made available at: this https URL. |
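The two steps in the abstract (a per-pixel similarity map, then similarity-weighted feature aggregation) can be sketched as a minimal self-attention-style pooling. This is a hedged illustration, not the paper's implementation: the function name `object_context_pooling`, the use of a plain dot product for similarity, and the softmax normalization are assumptions; the paper's actual OCP module operates on convolutional feature maps with learned projections.

```python
import numpy as np

def object_context_pooling(features):
    """Illustrative sketch of the two-step aggregation described above.

    features: (N, C) array, one C-dimensional feature per pixel
              (N = H * W pixels, flattened).
    Returns:  (N, C) array of object-context representations.
    """
    # Step (i): similarity between each pixel and all pixels,
    # softmax-normalized per row -- a stand-in for the "object context map".
    scores = features @ features.T                  # (N, N) dot-product similarities
    scores -= scores.max(axis=1, keepdims=True)     # numerical stability for exp
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)   # each row sums to 1

    # Step (ii): represent each pixel as the similarity-weighted
    # average of all pixel features.
    return weights @ features                       # (N, C)
```

Because the weights are softmax-normalized, pixels with similar features (ideally, pixels of the same object category) dominate each pixel's aggregated representation, which is the intuition behind using the similarity map as a surrogate for the true object context.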
Year | Venue | Field |
---|---|---|
2018 | arXiv: Computer Vision and Pattern Recognition | Pattern recognition, Segmentation, Computer science, Pooling, Pyramid, Artificial intelligence, Pixel, Parsing, Deep learning |

DocType | Volume | Citations |
---|---|---|
Journal | abs/1809.00916 | 12 |

PageRank | References | Authors |
---|---|---|
0.53 | 15 | 2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Yuhui Yuan | 1 | 34 | 3.34 |
Jingdong Wang | 2 | 4198 | 156.76 |