Title
Mining Multi-View Information: A Strong Self-Supervised Framework for Depth-based 3D Hand Pose and Mesh Estimation
Abstract
In this work, we study the cross-view information fusion problem in the task of self-supervised 3D hand pose estimation from depth images. Previous methods usually adopt a hand-crafted rule to generate pseudo labels from multi-view estimations in order to supervise the network training in each view. However, these methods ignore the rich semantic information in each view and the complex dependencies between different regions of different views. To solve these problems, we propose a cross-view fusion network to fully exploit and adaptively aggregate multi-view information. We encode diverse semantic information in each view into multiple compact nodes. Then, we introduce graph convolution to model the complex dependencies between nodes and perform cross-view information interaction. Based on the cross-view fusion network, we propose a strong self-supervised framework for 3D hand pose and hand mesh estimation. Furthermore, we propose a pseudo multi-view training strategy to extend our framework to a more general scenario in which only single-view training data is used. Results on the NYU dataset demonstrate that our method outperforms the previous self-supervised methods by 17.5% and 30.3% in the multi-view and single-view scenarios, respectively. Meanwhile, our framework achieves comparable results to several strongly supervised methods.
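The abstract's core mechanism is to pool each view's feature map into a few compact nodes, join the nodes of all views into one graph, and run a graph convolution so information flows across views. Below is a minimal sketch of that idea, not the authors' code: the module name CrossViewGraphFusion, the node count K, the affinity-based adjacency, and all dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossViewGraphFusion(nn.Module):
    def __init__(self, channels: int, num_nodes: int = 8):
        super().__init__()
        self.num_nodes = num_nodes
        # Soft-assignment maps: one spatial attention map per compact node.
        self.assign = nn.Conv2d(channels, num_nodes, kernel_size=1)
        # Node transform applied inside the graph convolution.
        self.gcn_weight = nn.Linear(channels, channels, bias=False)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, V, C, H, W) -- features from V views of the same hand.
        b, v, c, h, w = feats.shape
        x = feats.reshape(b * v, c, h, w)

        # 1) Encode each view into K compact nodes via soft spatial pooling.
        attn = self.assign(x).flatten(2).softmax(dim=-1)       # (B*V, K, H*W)
        nodes = attn @ x.flatten(2).transpose(1, 2)            # (B*V, K, C)
        nodes = nodes.reshape(b, v * self.num_nodes, c)        # all views' nodes

        # 2) Graph convolution over all V*K nodes with a data-dependent
        #    adjacency built from pairwise node affinities.
        adj = (nodes @ nodes.transpose(1, 2)).softmax(dim=-1)  # (B, VK, VK)
        nodes = F.relu(self.gcn_weight(adj @ nodes))           # message passing

        # 3) Redistribute the fused nodes back onto each view's spatial grid.
        nodes = nodes.reshape(b * v, self.num_nodes, c)
        out = nodes.transpose(1, 2) @ attn                     # (B*V, C, H*W)
        return feats + out.reshape(b, v, c, h, w)              # residual fusion
```

A quick smoke test: `CrossViewGraphFusion(256)(torch.randn(2, 3, 256, 32, 32))` returns a tensor of the same shape, with each view's features now conditioned on every other view's nodes.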
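The pseudo multi-view training strategy is only named in the abstract, so the following is a hedged sketch of one plausible reading: synthesize an extra view from a single depth image by back-projecting it to a point cloud, rotating about the hand center, and re-projecting with a z-buffer. The function name, the pinhole intrinsics (fx, fy, cx, cy), and the yaw angle are all assumptions; the paper's exact rendering procedure may differ.

```python
import numpy as np

def synthesize_pseudo_view(depth, fx, fy, cx, cy, yaw_deg=30.0):
    """Render a rotated 'pseudo view' from one depth image (zeros = background)."""
    h, w = depth.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0

    # Back-project valid pixels to 3D camera coordinates.
    z = depth[valid]
    x = (us[valid] - cx) * z / fx
    y = (vs[valid] - cy) * z / fy
    pts = np.stack([x, y, z], axis=1)

    # Rotate the point cloud about the hand center (centroid used here).
    center = pts.mean(axis=0)
    t = np.deg2rad(yaw_deg)
    rot = np.array([[np.cos(t), 0.0, np.sin(t)],
                    [0.0,       1.0, 0.0],
                    [-np.sin(t), 0.0, np.cos(t)]])
    pts = (pts - center) @ rot.T + center

    # Re-project with z-buffering to form the pseudo depth image.
    pseudo = np.full((h, w), np.inf)
    u = np.round(pts[:, 0] * fx / pts[:, 2] + cx).astype(int)
    v = np.round(pts[:, 1] * fy / pts[:, 2] + cy).astype(int)
    ok = (u >= 0) & (u < w) & (v >= 0) & (v < h) & (pts[:, 2] > 0)
    np.minimum.at(pseudo, (v[ok], u[ok]), pts[:, 2][ok])
    pseudo[np.isinf(pseudo)] = 0.0
    return pseudo
```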
Year
2022
DOI
10.1109/CVPR52688.2022.01990
Venue
IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Keywords
Face and gestures, 3D from multi-view and sensors, Pose estimation and tracking, Self- & semi- & meta- & unsupervised learning
DocType
Conference
Volume
2022
Issue
1
Citations
0
PageRank
0.34
References
0
Authors
6
Name            Order   Citations   PageRank
Pengfei Ren     1       1           1.70
Haifeng Sun     2       68          27.77
Jiachang Hao    3       1           1.70
J. Wang         4       479         95.23
Qi Qi           5       210         56.01
Jianxin Liao    6       457         82.08