**Abstract**

Modern self-supervised learning algorithms typically enforce persistency of instance representations across views. While highly effective for learning holistic image and video representations, such an objective becomes suboptimal for learning spatio-temporally fine-grained features in videos, where scenes and instances evolve through space and time. In this paper, we present Contextualized Spatio-Temporal Contrastive Learning (ConST-CL) to effectively learn spatio-temporally fine-grained video representations via self-supervision. We first design a region-based pretext task which requires the model to transform instance representations from one view to another, guided by context features. Further, we introduce a simple network design that successfully reconciles the simultaneous learning of both holistic and local representations. We evaluate the learned representations on a variety of downstream tasks and show that ConST-CL achieves competitive results on six datasets, including Kinetics, UCF, HMDB, AVA-Kinetics, AVA, and OTB. Our code and models will be available at https://github.com/tensorflow/models/tree/master/official/projects/const_cl.
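The pretext task described above — transforming region-level instance features from one view to another, guided by context, and contrasting the result against the target view's features — can be illustrated with a generic sketch. The attention-based `transform_with_context` helper and the InfoNCE-style loss below are simplifying assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    # Project features onto the unit sphere before computing similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def transform_with_context(regions, context):
    # Hypothetical single-head attention: each region feature from view 1
    # attends over context features from view 2 to predict its counterpart.
    logits = regions @ context.T / np.sqrt(regions.shape[-1])  # (N, M)
    logits -= logits.max(axis=1, keepdims=True)                # stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)
    return attn @ context                                      # (N, D)

def info_nce(pred, target, temperature=0.1):
    # Generic InfoNCE loss: row i of `pred` should match row i of `target`,
    # with all other rows serving as negatives.
    pred, target = l2_normalize(pred), l2_normalize(target)
    logits = pred @ target.T / temperature                     # (N, N)
    logits -= logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))
```

When predicted and target rows match, the loss approaches zero; for unrelated rows it sits near log N, the chance level over N candidates.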
Year | DOI | Venue |
---|---|---|
2022 | 10.1109/CVPR52688.2022.01359 | IEEE Conference on Computer Vision and Pattern Recognition |
Keywords | DocType | Volume
---|---|---
Video analysis and understanding, Action and event recognition, Representation learning, Self- & semi- & meta- & unsupervised learning | Conference | 2022

Issue | Citations | PageRank
---|---|---
1 | 0 | 0.34

References | Authors
---|---
0 | 8
Name | Order | Citations | PageRank |
---|---|---|---|
Liangzhe Yuan | 1 | 19 | 1.96 |
Rui Qian | 2 | 2 | 1.04 |
Yin Cui | 3 | 262 | 11.30 |
Boqing Gong | 4 | 685 | 33.29 |
Florian Schroff | 5 | 757 | 32.72 |
Ming-Hsuan Yang | 6 | 15303 | 620.69
Hartwig Adam | 7 | 1326 | 42.50 |
Ting Liu | 8 | 30 | 4.08 |