Abstract |
---|
Region-based Convolutional Neural Networks (R-CNNs) have achieved great success in the field of object detection. Existing R-CNNs usually divide a Region-of-Interest (ROI) into a grid, and then localize objects by utilizing the spatial information reflected by the relative position of each grid cell in the ROI. In this paper, we propose a novel feature-encoding approach, where spatial information is represented through the spatial distributions of visual patterns. In particular, we design a Mask Weight Network (MWN) to learn a set of masks and then apply channel-wise masking operations to the ROI feature map, followed by global pooling and a cheap fully-connected layer. We integrate the newly designed feature encoder into the Faster R-CNN architecture. The resulting new Faster R-CNNs preserve the object-detection accuracy of the standard Faster R-CNNs while using substantially fewer parameters. Compared to R-FCNs using state-of-the-art PS ROI pooling and deformable PS ROI pooling, the new Faster R-CNNs produce higher object-detection accuracy with good run-time efficiency. We also show that a specifically designed and learned MWN can capture global contextual information and further improve the object-detection accuracy. Validation experiments are conducted on both the PASCAL VOC and MS COCO datasets. |
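
The encoding step described in the abstract (channel-wise masking of the ROI feature map, then global pooling, then a fully-connected layer) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function name `encode_roi`, the use of one mask per channel with the same spatial size as the ROI feature map, and the choice of max pooling are all assumptions made for illustration, since the abstract only says "global pooling". The masks here are fixed toy values, whereas in the paper they are learned by the MWN.

```python
from typing import List

Matrix = List[List[float]]  # one H x W channel


def encode_roi(roi: List[Matrix], masks: List[Matrix],
               fc_weight: List[List[float]], fc_bias: List[float]) -> List[float]:
    """Mask each channel, pool it to a scalar, then apply a fully-connected layer."""
    pooled = []
    for channel, mask in zip(roi, masks):
        # Channel-wise masking: element-wise product of a channel and its mask
        # (one mask per channel is an assumption of this sketch).
        masked = [f * m
                  for f_row, m_row in zip(channel, mask)
                  for f, m in zip(f_row, m_row)]
        # Global pooling collapses the channel to one value
        # (max pooling is assumed here; the abstract does not name the type).
        pooled.append(max(masked))
    # Fully-connected layer on C pooled values instead of C * H * W features.
    return [sum(w * p for w, p in zip(row, pooled)) + b
            for row, b in zip(fc_weight, fc_bias)]


# Toy example: C = 2 channels, 2 x 2 spatial size, 2 output units.
roi = [[[1.0, 2.0], [3.0, 4.0]],
       [[1.0, 1.0], [1.0, 1.0]]]
masks = [[[0.0, 1.0], [0.0, 0.0]],   # channel 0 attends to the top-right cell
         [[0.0, 0.0], [0.0, 0.5]]]   # channel 1 attends to the bottom-right cell
fc_weight = [[1.0, 2.0], [0.0, -1.0]]
fc_bias = [0.0, 1.0]

encoded = encode_roi(roi, masks, fc_weight, fc_bias)
print(encoded)  # [3.0, 0.5]
```

Under these assumptions the fully-connected layer sees only one value per channel, which is one plausible reading of why the abstract calls it "cheap".
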
Year | Venue | Field |
---|---|---|
2018 | arXiv: Computer Vision and Pattern Recognition | Spatial analysis, Object detection, Pattern recognition, Masking (art), Convolutional neural network, Computer science, Pooling, Artificial intelligence, Encoder, Grid, Encoding (memory) |

DocType | Volume | Citations |
---|---|---|
Journal | abs/1802.03934 | 0 |

PageRank | References | Authors |
---|---|---|
0.34 | 12 | 5 |

Name | Order | Citations | PageRank |
---|---|---|---|
Xiaochuan Fan | 1 | 52 | 5.01 |
Hao Guo | 2 | 19 | 4.03 |
Kang Zheng | 3 | 42 | 7.41 |
Wei Feng | 4 | 501 | 61.25 |
Song Wang | 5 | 954 | 79.55 |