Abstract | ||
---|---|---|
In this paper, we are interested in the human pose estimation problem with a focus on learning reliable high-resolution representations. Most existing methods recover high-resolution representations from low-resolution representations produced by a high-to-low resolution network. Instead, our proposed network maintains high-resolution representations through the whole process. We start from a high-resolution subnetwork as the first stage, gradually add high-to-low resolution subnetworks one by one to form more stages, and connect the multi-resolution subnetworks in parallel. We conduct repeated multi-scale fusions such that each of the high-to-low resolution representations receives information from other parallel representations over and over, leading to rich high-resolution representations. As a result, the predicted keypoint heatmap is potentially more accurate and spatially more precise. We empirically demonstrate the effectiveness of our network through the superior pose estimation results over two benchmark datasets: the COCO keypoint detection dataset and the MPII Human Pose dataset. In addition, we show the superiority of our network in pose tracking on the PoseTrack dataset. |
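
The multi-scale fusion described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the resampling operators, so the sketch assumes fixed average pooling for downsampling and nearest-neighbour repetition for upsampling in place of learned layers, and it fuses by plain summation. The function names (`downsample`, `upsample`, `resize`, `fuse`) and the single-channel toy maps are hypothetical, chosen only to show how each parallel branch keeps its own resolution while receiving information from every other branch.

```python
# Illustrative sketch only. Assumptions: learned layers are replaced by
# fixed average pooling (down) and nearest-neighbour repetition (up),
# and feature maps are single-channel 2D lists of floats.

def downsample(fmap, factor):
    """Average-pool a 2D map (list of rows) by an integer factor."""
    h, w = len(fmap), len(fmap[0])
    return [
        [
            sum(fmap[i * factor + di][j * factor + dj]
                for di in range(factor) for dj in range(factor)) / factor ** 2
            for j in range(w // factor)
        ]
        for i in range(h // factor)
    ]

def upsample(fmap, factor):
    """Nearest-neighbour upsample a 2D map by an integer factor."""
    h, w = len(fmap), len(fmap[0])
    return [
        [fmap[i // factor][j // factor] for j in range(w * factor)]
        for i in range(h * factor)
    ]

def resize(fmap, src_level, dst_level):
    """Bring a map from branch src_level to the resolution of dst_level.

    Branch k is assumed to have resolution (H / 2**k, W / 2**k).
    """
    if src_level == dst_level:
        return fmap
    if src_level < dst_level:
        return downsample(fmap, 2 ** (dst_level - src_level))
    return upsample(fmap, 2 ** (src_level - dst_level))

def fuse(branches):
    """One fusion step: every output branch sums all input branches,
    each resized to that output branch's resolution."""
    fused = []
    for dst in range(len(branches)):
        resized = [resize(b, src, dst) for src, b in enumerate(branches)]
        h, w = len(resized[0]), len(resized[0][0])
        fused.append(
            [[sum(r[i][j] for r in resized) for j in range(w)] for i in range(h)]
        )
    return fused

# Three parallel branches at 8x8, 4x4 and 2x2, filled with constants.
branches = [
    [[1.0] * 8 for _ in range(8)],
    [[2.0] * 4 for _ in range(4)],
    [[3.0] * 2 for _ in range(2)],
]
out = fuse(branches)
```

After one call to `fuse`, the three outputs keep their 8x8, 4x4 and 2x2 shapes, so the highest-resolution branch is never discarded and rebuilt. Repeating the call mimics the repeated exchange the abstract describes, with the caveat that the operators here have no learned weights.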
Year | DOI | Venue |
---|---|---|
2019 | 10.1109/CVPR.2019.00584 | 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019) |
Field | DocType | Volume |
Computer vision,Pattern recognition,Computer science,Pose,Artificial intelligence,Feature learning | Journal | abs/1902.09212 |
ISSN | Citations | PageRank |
1063-6919 | 59 | 1.11 |
References | Authors | |
31 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Ke Sun | 1 | 185 | 12.57 |
Bin Xiao | 2 | 174 | 10.91 |
Dong Liu | 3 | 721 | 74.92 |
Jingdong Wang | 4 | 4198 | 156.76 |