Title |
---|
CoCo-BERT: Improving Video-Language Pre-training with Contrastive Cross-modal Matching and Denoising |

Abstract |
---|
BERT-type structures have revolutionized vision-language pre-training and achieved state-of-the-art results on numerous vision-language downstream tasks. Existing solutions predominantly rely on multi-modal inputs containing mask tokens to drive mask-based proxy pre-training tasks (e.g., masked language modeling and masked object/frame prediction). In this work, we argue that such masked inputs inevitably introduce noise into the cross-modal matching proxy task, and thus leave the inherent vision-language association under-explored. As an alternative, we derive a particular form of cross-modal proxy objective for video-language pre-training, i.e., Contrastive Cross-modal matching and denoising (CoCo). By viewing the masked frame/word sequences as noisy augmentations of the primary unmasked ones, CoCo strengthens video-language association by simultaneously pursuing inter-modal matching and intra-modal denoising between masked and unmasked inputs in a contrastive manner. Our CoCo proxy objective can be further integrated into any BERT-type encoder-decoder structure for video-language pre-training; the resulting model is named Contrastive Cross-modal BERT (CoCo-BERT). We pre-train CoCo-BERT on the TV dataset and a newly collected large-scale GIF video dataset (ACTION). Through extensive experiments over a wide range of downstream tasks (e.g., cross-modal retrieval, video question answering, and video captioning), we demonstrate the superiority of CoCo-BERT as a pre-trained structure. |

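
The contrastive objective summarized in the abstract can be illustrated with a generic InfoNCE loss. The Python sketch below is not the authors' implementation. It assumes that every masked and unmasked video or sentence has already been encoded into a fixed-size embedding, that items sharing a batch index form the positive pair while all other items in the batch act as negatives, and that the temperature `tau` and the pairing used in the hypothetical `coco_style_loss` helper are illustrative choices rather than the paper's exact formulation.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    # Scale to unit length so that dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(queries: List[Vector], keys: List[Vector], tau: float = 0.07) -> float:
    """Mean InfoNCE loss: queries[i] should match keys[i];
    every other key in the batch serves as a negative."""
    assert queries and len(queries) == len(keys)
    q = [l2_normalize(v) for v in queries]
    k = [l2_normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        # Temperature-scaled cosine similarity against every key in the batch.
        logits = [sum(a * b for a, b in zip(qi, kj)) / tau for kj in k]
        # Numerically stable log-sum-exp over all keys.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-softmax probability of the positive key.
        total += log_denominator - logits[i]
    return total / len(q)


def coco_style_loss(video_masked: List[Vector], video_full: List[Vector],
                    text_masked: List[Vector], text_full: List[Vector],
                    tau: float = 0.07) -> float:
    # Hypothetical pairing, for illustration only.
    # Inter-modal matching: a masked view of one modality is matched
    # against the unmasked view of the other modality.
    inter = info_nce(video_masked, text_full, tau) + info_nce(text_masked, video_full, tau)
    # Intra-modal denoising: a masked view should be closest to its own
    # unmasked counterpart within the same modality.
    intra = info_nce(video_masked, video_full, tau) + info_nce(text_masked, text_full, tau)
    return inter + intra


if __name__ == "__main__":
    masked = [[1.0, 0.0], [0.0, 1.0]]
    full = [[0.9, 0.1], [0.1, 0.9]]
    print(info_nce(masked, full))                  # small: pairs are aligned
    print(info_nce(masked, list(reversed(full))))  # large: pairs are swapped
```

Aligned masked/unmasked pairs produce a much lower loss than swapped ones, which is the behaviour the contrastive terms reward. Summing the inter-modal and intra-modal terms with equal weight is a simplifying assumption; a real model would compute these terms on encoder outputs and could weight them differently.
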
Year | DOI | Venue |
---|---|---|
2021 | 10.1145/3474085.3475703 | International Multimedia Conference |

DocType | Citations | PageRank |
---|---|---|
Conference | 1 | 0.35 |

References | Authors |
---|---|
0 | 6 |

Name | Order | Citations | PageRank |
---|---|---|---|
Jianjie Luo | 1 | 1 | 0.35 |
Yehao Li | 2 | 75 | 8.57 |
Yingwei Pan | 3 | 357 | 23.66 |
Ting Yao | 4 | 842 | 52.62 |
Hongyang Chao | 5 | 495 | 36.96 |
Tao Mei | 6 | 4702 | 288.54 |