Title
OCNet: Object Context Network for Scene Parsing.
Abstract
In this paper, we address the problem of scene parsing with deep learning and focus on the context aggregation strategy for robust segmentation. Motivated by the fact that the label of a pixel is the category of the object the pixel belongs to, we introduce an object context pooling (OCP) scheme, which represents each pixel by exploiting the set of pixels that belong to the same object category as that pixel; we call this set of pixels the object context. Our implementation, inspired by the self-attention approach, consists of two steps: (i) compute the similarities between each pixel and all the pixels, forming a so-called object context map for each pixel that serves as a surrogate for the true object context, and (ii) represent the pixel by aggregating the features of all the pixels weighted by these similarities. The resulting representation is more robust than those of existing context aggregation schemes, e.g., the pyramid pooling module (PPM) in PSPNet and atrous spatial pyramid pooling (ASPP), which do not differentiate between context pixels that belong to the same object category and those that do not, limiting the reliability of the contextually aggregated representations. We empirically demonstrate that our approach and two pyramid extensions achieve state-of-the-art performance on three semantic segmentation benchmarks: Cityscapes, ADE20K, and LIP. Code has been made available at: this https URL.
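As a rough illustration of the two-step OCP scheme described in the abstract, the sketch below shows a self-attention-style context aggregation module in PyTorch. The module name, the 1x1 query/key/value projections, the softmax scaling, and the final concatenation with the input features are assumptions made for exposition, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ObjectContextPooling(nn.Module):
    """Minimal self-attention-style context aggregation sketch.

    Hypothetical layer names and sizes; not the paper's exact module.
    """

    def __init__(self, in_channels: int, key_channels: int):
        super().__init__()
        # 1x1 convolutions produce per-pixel query/key/value features.
        self.query = nn.Conv2d(in_channels, key_channels, kernel_size=1)
        self.key = nn.Conv2d(in_channels, key_channels, kernel_size=1)
        self.value = nn.Conv2d(in_channels, in_channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)  # (N, HW, Ck)
        k = self.key(x).flatten(2)                     # (N, Ck, HW)
        v = self.value(x).flatten(2).transpose(1, 2)   # (N, HW, C)

        # Step (i): similarity of every pixel with all pixels, normalized
        # into an "object context map" per pixel.
        context_map = F.softmax(q @ k / (q.shape[-1] ** 0.5), dim=-1)  # (N, HW, HW)

        # Step (ii): aggregate the features of all pixels, weighted by
        # the context map, to form the object context representation.
        context = context_map @ v                      # (N, HW, C)
        context = context.transpose(1, 2).reshape(n, c, h, w)

        # Fuse the aggregated context with the original features.
        return torch.cat([x, context], dim=1)
```

In a typical use, such a module would be appended to the backbone's output feature map and the concatenated features fed to a classification head; the pyramid extensions mentioned in the abstract would apply the same aggregation at multiple spatial scales.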
Year: 2018
Venue: arXiv: Computer Vision and Pattern Recognition
Field: Pattern recognition, Segmentation, Computer science, Pooling, Pyramid, Artificial intelligence, Pixel, Parsing, Deep learning
DocType:
Volume: abs/1809.00916
Citations: 12
Journal:
PageRank: 0.53
References: 15
Authors: 2
Name           Order  Citations  PageRank
Yuhui Yuan     1      34         3.34
Jingdong Wang  2      4198       156.76