Abstract | ||
---|---|---|
Reconstructing hand-object interactions is a challenging task due to strong occlusions and complex motions. This article proposes a real-time system that uses a single depth stream to simultaneously reconstruct hand poses, object shape, and rigid/non-rigid motions. To achieve this, we first train a joint learning network to segment the hand and object in a depth image, and to predict the 3D keypoints of the hand. With most layers shared by the two tasks, computation cost is saved for the real-time performance. A hybrid dataset is constructed here to train the network with real data (to learn real-world distributions) and synthetic data (to cover variations of objects, motions, and viewpoints). Next, the depth of the two targets and the keypoints are used in a uniform optimization to reconstruct the interacting motions. Benefitting from a novel tangential contact constraint, the system not only solves the remaining ambiguities but also keeps the real-time performance. Experiments show that our system handles different hand and object shapes, various interactive motions, and moving cameras. |
Year | DOI | Venue |
---|---|---|
2021 | 10.1145/3451341 | ACM Transactions on Graphics |
Keywords | DocType | Volume |
Single depth camera, hand tracking, object reconstruction, hand-object interaction | Journal | 40 |
Issue | ISSN | Citations |
3 | 0730-0301 | 0 |
PageRank | References | Authors |
0.34 | 0 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Hao Zhang | 1 | 2 | 1.45 |
Yuxiao Zhou | 2 | 0 | 1.35 |
Yifei Tian | 3 | 0 | 0.34 |
Jun-hai Yong | 4 | 620 | 61.47 |
Feng Xu | 5 | 194 | 23.14 |