Abstract | ||
---|---|---|
In this paper, we first investigate why typical two-stage methods are not as fast as single-stage, fast detectors like YOLO and SSD. We find that Faster R-CNN and R-FCN perform intensive computation after or before RoI warping. Faster R-CNN involves two fully connected layers for RoI recognition, while R-FCN produces a large set of score maps. Thus, these networks are slow because of the heavy-head design in their architecture. Even if we significantly reduce the base model, the computation cost does not decrease accordingly. We propose a new two-stage detector, Light-Head R-CNN, to address this shortcoming in current two-stage approaches. In our design, we make the head of the network as light as possible by using a thin feature map and a cheap R-CNN subnet (pooling and a single fully connected layer). Our ResNet-101 based Light-Head R-CNN outperforms state-of-the-art object detectors on COCO while keeping time efficiency. More importantly, by simply replacing the backbone with a tiny network (e.g., Xception), our Light-Head R-CNN gets 30.7 mmAP at 102 FPS on COCO, significantly outperforming single-stage, fast detectors like YOLO and SSD on both speed and accuracy. Code will be made publicly available. |
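
The contrast the abstract draws between a heavy head and a light head can be illustrated with simple channel arithmetic. The sketch below is not the authors' code. It assumes a position-sensitive score map whose channel count is (channels per bin) × (pooling bins), and every constant in it (81 classes, 7×7 bins, a thin multiplier of 10, an FC width of 2048) is an illustrative assumption rather than a value stated in the abstract.

```python
def score_map_channels(per_bin_channels: int, bins_per_side: int = 7) -> int:
    """Channel count of a position-sensitive map: one channel group per pooling bin."""
    return per_bin_channels * bins_per_side * bins_per_side


def fc_params(in_features: int, out_features: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features


# All values below are assumptions chosen for illustration only.
NUM_CLASSES = 81      # assumed: 80 COCO categories plus background
THIN_CHANNELS = 10    # assumed: small class-agnostic channel multiplier
BINS_PER_SIDE = 7     # assumed: 7x7 RoI pooling grid
FC_WIDTH = 2048       # hypothetical width of the single FC layer

heavy_channels = score_map_channels(NUM_CLASSES, BINS_PER_SIDE)
light_channels = score_map_channels(THIN_CHANNELS, BINS_PER_SIDE)
channel_ratio = heavy_channels / light_channels

# After pooling, each of the 7x7 bins keeps THIN_CHANNELS values, so the
# single FC layer sees THIN_CHANNELS * 7 * 7 input features.
light_fc_parameters = fc_params(light_channels, FC_WIDTH)

if __name__ == "__main__":
    print(f"heavy score-map channels: {heavy_channels}")
    print(f"thin feature-map channels: {light_channels}")
    print(f"channel reduction factor: {channel_ratio:.1f}")
    print(f"single FC layer parameters: {light_fc_parameters}")
```

Under these assumptions the thin map carries roughly an eighth of the channels of the class-dependent score map, which is the kind of saving the light-head design relies on. The figures change with the number of classes and the chosen multiplier.
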
Year | Venue | Field |
---|---|---|
2017 | arXiv: Computer Vision and Pattern Recognition | Computer vision, mmap, Image warping, Pattern recognition, Computer science, Pooling, Light head, Subnet, Artificial intelligence, Detector, Computation |
DocType | Volume | Citations |
Journal | abs/1711.07264 | 14 |
PageRank | References | Authors |
0.54 | 0 | 6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Zeming Li | 1 | 32 | 2.85 |
Chao Peng | 2 | 25 | 1.71 |
Gang Yu | 3 | 382 | 19.85 |
Xiangyu Zhang | 4 | 13044 | 437.66 |
Yangdong Deng | 5 | 429 | 44.78 |
Jian Sun | 6 | 25842 | 956.90 |