Title | ||
---|---|---|
Once and for All: Self-supervised Multi-modal Co-training on One-billion Videos at Alibaba |
Abstract | ||
---|---|---|
Video has grown into one of the largest media on the Internet. E-commerce platforms like Alibaba need to process millions of videos every day across multiple modalities (e.g., visual, audio, image, and text) and a variety of tasks (e.g., retrieval, tagging, and summarization). In this work, we aim to develop a once-and-for-all pretraining technique for diverse modalities and downstream tasks. To achieve this, we make the following contributions: (1) We propose a self-supervised multi-modal co-training framework. It uses cross-modal pseudo-label consistency as the supervision signal and jointly learns representations of multiple modalities. (2) We introduce several novel techniques (e.g., sliding-window subset sampling, coarse-to-fine clustering, fast spatial-temporal convolution, and parallel data transmission and processing) to optimize the training process, making stable billion-scale training feasible. (3) We construct a large-scale multi-modal dataset consisting of 1.4 billion videos (~0.5 PB) and train our framework on it. The training takes only 4.6 days on an in-house 256-GPU cluster, and it simultaneously produces pretrained video, audio, image, motion, and text networks. (4) Finetuning from our pretrained models, we obtain significant performance gains and faster convergence on diverse multimedia tasks at Alibaba. Furthermore, we validate the learned representations on public datasets. Despite the domain gap between our commodity-centric pretraining data and the action-centric evaluation data, we show superior results compared with state-of-the-art methods. |
Year | DOI | Venue |
---|---|---|
2021 | 10.1145/3474085.3481541 | International Multimedia Conference |
DocType | Citations | PageRank |
Conference | 0 | 0.34 |
References | Authors | |
0 | 9 |
Name | Order | Citations | PageRank |
---|---|---|---|
Lianghua Huang | 1 | 0 | 1.69 |
Yu Liu | 2 | 0 | 0.34 |
Xiangzeng Zhou | 3 | 0 | 0.34 |
Ansheng You | 4 | 0 | 0.68 |
Ming Li | 5 | 0 | 0.34 |
Bin Wang | 6 | 0 | 0.34 |
Yingya Zhang | 7 | 0 | 0.68 |
Pan Pan | 8 | 3 | 4.16 |
Yinghui Xu | 9 | 172 | 20.23 |