Abstract |
---|
This paper addresses skeleton feature representation and the modeling of temporal dynamics for recognizing human actions composed of poses. In contrast to traditional methods, which generally use relative coordinate systems dependent on particular joints or model only long-term dependencies, we attempt to understand 3D human behavior by observing it through temporally different windows. Instead of taking raw skeletons as input, we transform the skeletons into another coordinate system to obtain robustness to scale, rotation, and translation, and extract motion features between adjacent skeletons, which together construct an efficient hybrid stream combining both pose and motion streams. We propose novel generalized Temporal Sliding Long Short-term Memory (TS-LSTM) networks, composed of multiple TS-LSTM networks with various hyper-parameters, which can capture various temporal dynamics of actions. We also propose a novel hyper-parameter search method that finds suitable hyper-parameters for the generalized TS-LSTM to handle the temporal dynamics of actions. In the experiments, we evaluate the proposed networks to verify the effectiveness of the proposed methods and compare them with other methods on three challenging datasets. Additionally, we analyze the relation between the recognized actions and the hyper-parameters, and visualize the layers of the proposed models. |
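The two ingredients the abstract names, motion features between adjacent skeletons and temporally sliding windows of various sizes, can be sketched as below. This is a minimal illustration only: the function names, window sizes, and feature layout are assumptions for the sketch, and the paper's actual TS-LSTM networks feed each window family into its own LSTM rather than stopping at windowing.

```python
import numpy as np

def motion_features(poses):
    """Frame-to-frame differences of pose features (hypothetical helper;
    the paper's exact motion-stream definition may differ)."""
    return poses[1:] - poses[:-1]

def sliding_windows(seq, window, stride):
    """Split a (T, D) feature sequence into overlapping temporal windows;
    each window would be one temporal segment seen by a TS-LSTM."""
    T = len(seq)
    return [seq[t:t + window] for t in range(0, T - window + 1, stride)]

# Toy sequence: 10 frames of 6-D (already normalized) pose features.
poses = np.arange(60, dtype=float).reshape(10, 6)
motion = motion_features(poses)                     # shape (9, 6)
short = sliding_windows(poses, window=3, stride=1)  # fine temporal dynamics
long_ = sliding_windows(poses, window=6, stride=3)  # coarse temporal dynamics
```

Running several window/stride settings in parallel, as in the last two lines, is what lets the ensemble capture both short-term and long-term dynamics of an action.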
Year | DOI | Venue
---|---|---
2021 | 10.1109/TMM.2020.2978637 | IEEE Transactions on Multimedia

Keywords | DocType | Volume
---|---|---
Human action recognition, fusion of deep learning, long short-term memory, temporal sequence analysis | Journal | 23

ISSN | Citations | PageRank
---|---|---
1520-9210 | 1 | 0.34

References | Authors
---|---
0 | 3

Name | Order | Citations | PageRank |
---|---|---|---
Inwoong Lee | 1 | 13 | 3.98 |
Do Young Kim | 2 | 22 | 8.74 |
Sanghoon Lee | 3 | 235 | 26.21 |