Abstract |
---|
3D human pose estimation has made much progress with the development of convolutional neural networks. However, accurately estimating 3D joint locations from single-view images or videos remains challenging due to depth ambiguity and severe occlusion. Motivated by the effectiveness of vision transformers in computer vision tasks, we present a novel U-shaped spatial–temporal transformer-based network (U-STN) for 3D human pose estimation. The core idea of the proposed method is to process the human joints with a multi-scale and multi-level U-shaped transformer model. We construct a multi-scale architecture with three different scales based on the human skeletal topology, in which local and global features are processed across the three scales under kinematic constraints. Furthermore, multi-level feature representations are introduced by fusing intermediate features from different depths of the U-shaped network. With skeletal-constrained pooling and unpooling operations devised for U-STN, the network can transform features across scales and extract meaningful semantic features at all levels. Experiments on two challenging benchmark datasets show that the proposed method achieves good performance on 2D-to-3D pose estimation. The code is available at https://github.com/l-fay/Pose3D. |
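The skeletal-constrained pooling and unpooling described in the abstract could be sketched as follows: joints are grouped by skeletal topology into coarser "part" nodes, features are averaged within each group, and unpooling broadcasts part features back to their joints. The specific joint grouping, number of joints, and feature dimension below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Hypothetical grouping of 17 joints into 5 body parts (torso/head,
# arms, legs); the actual grouping used by U-STN is defined in the
# paper and may differ.
JOINT_GROUPS = [
    [0, 7, 8, 9, 10],  # torso and head
    [11, 12, 13],      # left arm
    [14, 15, 16],      # right arm
    [1, 2, 3],         # right leg
    [4, 5, 6],         # left leg
]

def skeletal_pool(features):
    """Pool per-joint features of shape (J, C) into per-part
    features of shape (P, C) by averaging within each group."""
    return np.stack([features[g].mean(axis=0) for g in JOINT_GROUPS])

def skeletal_unpool(part_features):
    """Broadcast per-part features (P, C) back to joints (J, C)."""
    n_joints = sum(len(g) for g in JOINT_GROUPS)
    out = np.zeros((n_joints, part_features.shape[1]))
    for part, group in zip(part_features, JOINT_GROUPS):
        out[group] = part
    return out

feats = np.random.randn(17, 64)   # 17 joints, 64-dim features
parts = skeletal_pool(feats)      # coarser scale: (5, 64)
restored = skeletal_unpool(parts) # back to joint scale: (17, 64)
```

A full U-shaped model would interleave such pooling/unpooling with transformer blocks at each scale and fuse the intermediate features via skip connections, as the abstract describes.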
Year | DOI | Venue |
---|---|---|
2022 | 10.1007/s00138-022-01334-6 | Machine Vision and Applications |
Keywords | DocType | Volume |
---|---|---|
Human pose estimation, Spatial–temporal transformer network, Multi-scale and multi-level feature representations | Journal | 33 |

Issue | ISSN | Citations |
---|---|---|
6 | 0932-8092 | 0 |

PageRank | References | Authors |
---|---|---|
0.34 | 5 | 4 |

Name | Order | Citations | PageRank |
---|---|---|---|
Yang Honghong | 1 | 0 | 0.34 |
Guo Longfei | 2 | 0 | 0.34 |
Yumei Zhang | 3 | 10 | 7.91 |
Xiaojun Wu | 4 | 356 | 52.89 |