Abstract |
---|
In this paper, we address the challenging problem of detecting pedestrians who are heavily occluded or far from the camera. Unlike most existing pedestrian detection methods, which use only coarse-resolution feature maps with a fixed receptive field, our approach exploits multi-grained deep features to make the detector more robust when only parts of an occluded pedestrian are visible and when targets are small. Specifically, we jointly train a scale-aware network and a human parsing network in a semi-supervised manner, using only bounding-box annotations. We carefully design the scale-aware network to predict pedestrians of particular scales from the most appropriate feature maps, by matching the receptive field of those maps to the target sizes. The human parsing network generates a fine-grained attention map that guides the detector to focus on the visible parts of occluded pedestrians and on small instances. The two networks are computed in parallel and form a unified single-stage pedestrian detector, which offers a favorable trade-off between accuracy and speed. Experiments on two challenging benchmarks, Caltech and KITTI, demonstrate the effectiveness of the proposed approach, which also runs 2× faster than competing methods. |
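
The two ideas in the abstract, routing each target to the feature map whose receptive field best matches its size and reweighting features with an attention map, can be illustrated with a minimal sketch. It is not the authors' implementation: the level names, the receptive-field values in `RECEPTIVE_FIELDS`, the log-ratio matching rule, and the helpers `assign_level` and `apply_attention` are all assumptions made for illustration.

```python
import math

# Hypothetical receptive-field sizes in pixels for three feature levels.
# These values are illustrative assumptions, not taken from the paper.
RECEPTIVE_FIELDS = {"conv3": 40, "conv4": 92, "conv5": 196}


def assign_level(height_px, receptive_fields=RECEPTIVE_FIELDS):
    """Pick the level whose receptive field is closest to the target height.

    Closeness is measured on a log scale, so a 2x mismatch in either
    direction counts the same (an assumed matching rule).
    """
    return min(
        receptive_fields,
        key=lambda name: abs(math.log(receptive_fields[name] / height_px)),
    )


def apply_attention(features, attention):
    """Reweight a 2-D feature map elementwise by an attention map in [0, 1]."""
    return [
        [f * a for f, a in zip(feature_row, attention_row)]
        for feature_row, attention_row in zip(features, attention)
    ]


if __name__ == "__main__":
    # Route pedestrians of different pixel heights to a feature level.
    for height in (30, 100, 300):
        print(height, assign_level(height))

    # Suppress features outside the (assumed) visible-part mask.
    print(apply_attention([[1.0, 2.0], [3.0, 4.0]], [[1.0, 0.0], [0.5, 1.0]]))
```

In this sketch a small, distant pedestrian is handled by a shallow, fine-resolution level, while a large one goes to a deeper level; the attention step simply scales down features where the parsing map is low.
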
Year | DOI | Venue |
---|---|---|
2018 | 10.1109/ICME.2018.8486498 | 2018 IEEE International Conference on Multimedia and Expo (ICME) |
Keywords | Field | DocType |
Pedestrian Detection, Human Parsing, Attention, Deep Learning | Computer vision, Pattern recognition, Computer science, Robustness (computer science), Feature extraction, Image segmentation, Artificial intelligence, Parsing, Detector, Pedestrian detection, Feature learning, Minimum bounding box | Conference |
ISSN | ISBN | Citations |
1945-7871 | 978-1-5386-1738-0 | 0 |
PageRank | References | Authors |
0.34 | 3 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Chunze Lin | 1 | 6 | 1.44 |
Jiwen Lu | 2 | 3105 | 153.88 |
Jie Zhou | 3 | 2103 | 190.17 |