Title | ||
---|---|---|
Locality-Aware Inter-and Intra-Video Reconstruction for Self-Supervised Correspondence Learning |
Abstract | ||
---|---|---|
Our goal is to learn visual correspondence from unlabeled videos. We develop Liir, a locality-aware inter- and intra-video reconstruction method that fills in three missing pieces of the self-supervised correspondence learning puzzle: instance discrimination, location awareness, and spatial compactness. First, unlike most existing efforts, which focus on intra-video self-supervision only, we exploit cross-video affinities as extra negative samples within a unified inter- and intra-video reconstruction scheme. This enables instance-discriminative representation learning by contrasting the desired intra-video pixel association against negative inter-video correspondence. Second, we merge position information into correspondence matching and design a position-shifting strategy to remove the side effect of position encoding during inter-video affinity computation, making Liir location-sensitive. Third, to make full use of the spatial continuity of video data, we impose a compactness-based constraint on correspondence matching, yielding sparser and more reliable solutions. The learned representation surpasses self-supervised state-of-the-art methods on label propagation tasks involving objects, semantic parts, and keypoints. |
Year | DOI | Venue |
---|---|---|
2022 | 10.1109/CVPR52688.2022.00852 | IEEE Conference on Computer Vision and Pattern Recognition |
Keywords | DocType | Volume |
---|---|---|
Motion and tracking, Segmentation, grouping and shape analysis | Conference | 2022 |
Issue | Citations | PageRank |
---|---|---|
1 | 0 | 0.34 |
References | Authors | |
---|---|---|
0 | 6 | |
Name | Order | Citations | PageRank |
---|---|---|---|
Liulei Li | 1 | 0 | 0.68 |
Tianfei Zhou | 2 | 21 | 7.46 |
Wenguan Wang | 3 | 1019 | 37.24 |
Yang Lu | 4 | 4 | 0.76 |
Jianwu Li | 5 | 76 | 12.99 |
Yi Yang | 6 | 6873 | 271.72 |