Abstract
Neural network based approaches for 3D human pose estimation from monocular images have attracted growing interest. However, annotating 3D poses is a labor-intensive and expensive process. In this paper, we propose a novel self-supervised approach that avoids the need for manual annotations. Unlike existing weakly/self-supervised methods that require extra unpaired 3D ground-truth data to alleviate the depth ambiguity problem, our method trains the network relying only on geometric knowledge, without any additional 3D pose annotations. The proposed method follows the two-stage pipeline of 2D pose estimation and 2D-to-3D pose lifting. We design a transform re-projection loss, an effective way to exploit multi-view consistency for training the 2D-to-3D lifting network. In addition, we use the confidences of the 2D joints to weight the losses from different views, alleviating the influence of noise caused by self-occlusion. Finally, we design a two-branch training architecture that helps preserve the scale information of the re-projected 2D poses during training, resulting in accurate 3D pose predictions. We demonstrate the effectiveness of our method on two popular 3D human pose datasets, Human3.6M and MPI-INF-3DHP. The results show that our method significantly outperforms recent weakly/self-supervised approaches.
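As a rough illustration of the abstract's central idea, the sketch below shows one plausible form of a confidence-weighted transform re-projection loss in PyTorch. The tensor layout, the assumed-known relative rotation `R_ab`, the orthographic projection, and the function name are all assumptions made for illustration; they are not the authors' implementation.

```python
# Minimal sketch of a transform re-projection loss with confidence weighting.
# Shapes, names, and the projection model are illustrative assumptions.
import torch

def transform_reprojection_loss(pred_3d_a,  # (B, J, 3) 3D pose lifted from view A
                                pose_2d_b,  # (B, J, 2) detected 2D pose in view B
                                conf_2d_b,  # (B, J)    2D joint confidences in view B
                                R_ab):      # (B, 3, 3) relative rotation A -> B (assumed known)
    """Rotate the 3D pose predicted in view A into view B's camera frame,
    re-project it to 2D, and penalize the confidence-weighted distance to
    the 2D pose observed in view B."""
    # Transform the prediction into view B's coordinate frame.
    pose_3d_in_b = torch.einsum('bij,bkj->bki', R_ab, pred_3d_a)
    # Orthographic projection (an assumption; the paper may use a different model).
    reproj_2d = pose_3d_in_b[..., :2]
    # Per-joint L2 error, down-weighted for low-confidence (e.g. self-occluded) joints.
    err = torch.norm(reproj_2d - pose_2d_b, dim=-1)  # (B, J)
    return (conf_2d_b * err).sum() / conf_2d_b.sum().clamp(min=1e-8)
```

Weighting each joint's error by its 2D detection confidence directly reflects the abstract's strategy of letting reliable views dominate the multi-view consistency signal while suppressing noise from occluded joints.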
Year | Venue | DocType
---|---|---
2020 | AAAI | Conference

Volume | ISSN | Citations
---|---|---
34 | 2159-5399 | 0

PageRank | References | Authors
---|---|---
0.34 | 0 | 6
Name | Order | Citations | PageRank |
---|---|---|---
Yang Li | 1 | 659 | 125.00 |
Kan Li | 2 | 55 | 14.93 |
Shuai Jiang | 3 | 22 | 9.74 |
Ziyue Zhang | 4 | 0 | 4.06 |
Congzhentao Huang | 5 | 0 | 1.01 |
Richard Yi da Xu | 6 | 52 | 11.12 |