Once and for All: Self-supervised Multi-modal Co-training on One-billion Videos at Alibaba - Citegraph

Paper Info

Title
Once and for All: Self-supervised Multi-modal Co-training on One-billion Videos at Alibaba

Abstract
Videos have grown into one of the largest media on the Internet. E-commerce platforms like Alibaba need to process millions of videos every day, across multiple modalities (e.g., visual, audio, image, and text) and for a variety of tasks (e.g., retrieval, tagging, and summarization). In this work, we aim to develop a once-and-for-all pretraining technique for diverse modalities and downstream tasks. To achieve this, we make the following contributions: (1) We propose a self-supervised multi-modal co-training framework. It takes cross-modal pseudo-label consistency as the supervision and can jointly learn representations of multiple modalities. (2) We introduce several novel techniques (e.g., sliding-window subset sampling, coarse-to-fine clustering, fast spatial-temporal convolution, and parallel data transmission and processing) to optimize the training process, making stable billion-scale training feasible. (3) We construct a large-scale multi-modal dataset consisting of 1.4 billion videos (~0.5 PB) and train our framework on it. The training takes only 4.6 days on an in-house cluster of 256 GPUs, and it simultaneously produces pretrained video, audio, image, motion, and text networks. (4) Finetuning from our pretrained models, we obtain significant performance gains and faster convergence on diverse multimedia tasks at Alibaba. Furthermore, we also validate the learned representation on public datasets. Despite the domain gap between our commodity-centric pretraining data and the action-centric evaluation data, we show superior results against state-of-the-art methods.
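
The idea in contribution (1), using pseudo-labels from one modality to supervise another, can be illustrated with a toy example. The abstract does not specify how the pseudo-labels are produced or what the loss is, so the following is only a minimal sketch under stated assumptions: pseudo-labels are taken as the nearest-centroid cluster index of modality A's features, and modality B is penalized with cross-entropy when its predictions disagree. All function names and the tiny inputs are hypothetical and are not taken from the paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def nearest_centroid(feature, centroids):
    """Assumed pseudo-labeling rule: index of the closest centroid
    (squared Euclidean distance). The paper's actual rule may differ."""
    dists = [sum((f - c) ** 2 for f, c in zip(feature, centroid))
             for centroid in centroids]
    return dists.index(min(dists))

def cross_modal_consistency_loss(feats_a, centroids_a, logits_b):
    """Mean cross-entropy of modality B's predictions against
    pseudo-labels derived from modality A's features."""
    total = 0.0
    for feat, logits in zip(feats_a, logits_b):
        label = nearest_centroid(feat, centroids_a)
        total += -math.log(softmax(logits)[label])
    return total / len(feats_a)

# Toy data: two samples, two clusters (hypothetical values).
feats_a = [[0.1, 0.0], [0.9, 1.0]]      # e.g. visual features
centroids_a = [[0.0, 0.0], [1.0, 1.0]]  # cluster centers for modality A
agree = [[5.0, 0.0], [0.0, 5.0]]        # modality B logits matching A's labels
disagree = [[0.0, 5.0], [5.0, 0.0]]     # modality B logits contradicting them

loss_agree = cross_modal_consistency_loss(feats_a, centroids_a, agree)
loss_disagree = cross_modal_consistency_loss(feats_a, centroids_a, disagree)
```

In this sketch the loss is small when the second modality's predictions match the first modality's cluster assignments and large when they contradict them, which is the sense in which cross-modal consistency acts as a training signal.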

Year	DOI	Venue
2021	10.1145/3474085.3481541	International Multimedia Conference
DocType	Citations	PageRank
Conference	0	0.34
References	Authors
0	9

Authors (9 rows)

Name	Order	Citations	PageRank
Lianghua Huang	1	0	1.69
Yu Liu	2	0	0.34
Xiangzeng Zhou	3	0	0.34
Ansheng You	4	0	0.68
Ming Li	5	0	0.34
Bin Wang	6	0	0.34
Yingya Zhang	7	0	0.68
Pan Pan	8	3	4.16
Yinghui Xu	9	172	20.23
