Title
Modeling the Uncertainty for Self-supervised 3D Skeleton Action Representation Learning
Abstract
Self-supervised learning (SSL) has proven highly effective for learning representations from unlabeled data in the language and vision domains. Yet, few effective self-supervised approaches exist for 3D skeleton action understanding, and directly applying existing SSL methods from other domains to skeleton action learning may suffer from representation misalignment and other limitations. In this paper, we posit that a good representation-learning encoder should distinguish the underlying features of different actions, pulling similar motions closer while pushing dissimilar motions apart. However, skeleton actions carry uncertainty, arising from the inherent ambiguity of 3D skeleton poses under different viewpoints and from the sampling algorithm in contrastive learning, so differentiating action features in a deterministic embedding space is ill-posed. To address these issues, we rethink the distance between action features and propose to model each action representation in a probabilistic embedding space, alleviating the uncertainty caused by ambiguous 3D skeleton inputs. To validate the effectiveness of the proposed method, extensive experiments are conducted on the Kinetics, NTU60, NTU120, and PKUMMD datasets with several alternative network architectures. Experimental evaluations demonstrate the superiority of our approach, which yields significant performance improvements without using extra labeled data.
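The abstract describes the core idea only at a high level: replacing deterministic action embeddings with probabilistic ones so that ambiguous skeleton inputs can be absorbed as uncertainty. As a minimal sketch of that general idea (not the paper's actual implementation), the PyTorch snippet below assumes a Gaussian embedding head that predicts a mean and a log-variance for each encoder feature and compares two augmented views by averaging similarities over samples drawn with the reparameterization trick; all class and function names (ProbabilisticHead, soft_contrastive_similarity) and the dimensions are hypothetical.

```python
# Hypothetical sketch of a probabilistic embedding head for contrastive
# skeleton-action SSL. This is NOT the paper's implementation; it only
# illustrates mapping an encoder feature to a Gaussian (mu, sigma^2)
# instead of a single deterministic point.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ProbabilisticHead(nn.Module):
    """Maps a deterministic encoder feature to a diagonal Gaussian embedding."""

    def __init__(self, feat_dim: int = 256, embed_dim: int = 128):
        super().__init__()
        self.mu = nn.Linear(feat_dim, embed_dim)       # mean of the embedding
        self.log_var = nn.Linear(feat_dim, embed_dim)  # log-variance (uncertainty)

    def forward(self, h: torch.Tensor, n_samples: int = 8):
        mu = self.mu(h)
        log_var = self.log_var(h)
        std = torch.exp(0.5 * log_var)
        # Reparameterization trick: z = mu + std * eps, with eps ~ N(0, I).
        eps = torch.randn(n_samples, *mu.shape, device=h.device)
        z = mu.unsqueeze(0) + std.unsqueeze(0) * eps   # (S, B, D)
        return mu, log_var, z


def soft_contrastive_similarity(z_a: torch.Tensor, z_b: torch.Tensor) -> torch.Tensor:
    """Monte-Carlo similarity between two probabilistic embeddings:
    average cosine similarity over all pairs of samples from the two distributions."""
    z_a = F.normalize(z_a, dim=-1)                 # (S, B, D)
    z_b = F.normalize(z_b, dim=-1)                 # (S, B, D)
    sim = torch.einsum('sbd,tbd->stb', z_a, z_b)   # (S, S, B)
    return sim.mean(dim=(0, 1))                    # (B,)


if __name__ == "__main__":
    head = ProbabilisticHead(feat_dim=256, embed_dim=128)
    feats_view1 = torch.randn(4, 256)  # encoder features of one augmented view
    feats_view2 = torch.randn(4, 256)  # encoder features of another view
    _, _, z1 = head(feats_view1)
    _, _, z2 = head(feats_view2)
    print(soft_contrastive_similarity(z1, z2).shape)  # torch.Size([4])
```

Under this assumption, the predicted variance lets ambiguous poses spread their probability mass, so hard-to-disambiguate views contribute softer similarity scores than confident ones.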
Year
2021
DOI
10.1145/3474085.3475248
Venue
International Multimedia Conference
DocType
Conference
Citations
0
PageRank
0.34
References
0
Authors
5
Name            Order   Citations   PageRank
Yukun Su        1       3           2.72
Guosheng Lin    2       35          6.06
Ruizhou Sun     3       0           1.01
Yun Hao         4       0           1.35
Wu Qingyao      5       259         33.46