Title
Understanding Chinese Video and Language via Contrastive Multimodal Pre-Training
Abstract
Pre-trained neural models have recently achieved impressive performance in understanding multimodal content. However, pre-training neural models for video and language understanding remains very challenging, especially for Chinese video-language data, for three reasons. First, existing video-language pre-training algorithms mainly focus on the co-occurrence of words and video frames but ignore other valuable semantic and structural information in video-language content, e.g., sequential order and spatiotemporal relationships. Second, there are conflicts between video-sentence alignment and other proxy tasks. Third, there is a lack of large-scale, high-quality Chinese video-language datasets (e.g., ones including 10 million unique videos), which are a fundamental condition for the success of pre-training techniques. In this work, we propose a novel video-language understanding framework named Victor, which stands for VIdeo-language understanding via Contrastive mulTimOdal pRe-training. Besides general proxy tasks such as masked language modeling, Victor constructs several novel proxy tasks under the contrastive learning paradigm, making the model more robust and able to capture more complex multimodal semantic and structural relationships from different perspectives. Victor is trained on a large-scale Chinese video-language dataset that includes over 10 million complete videos with corresponding high-quality textual descriptions. We apply the pre-trained Victor model to a series of downstream applications and demonstrate its superior performance compared with state-of-the-art pre-training methods such as VideoBERT and UniVL.
Year
2021
DOI
10.1145/3474085.3475431
Venue
International Multimedia Conference
DocType
Conference
Citations
0
PageRank
0.34
References
0
Authors
9
Name            Order  Citations  PageRank
Chenyi Lei      1      26         1.97
Shixian Luo     2      0          0.34
Yong Liu        3      290        19.62
Wanggui He      4      0          0.34
Jiamang Wang    5      0          0.34
Guoxin Wang     6      5          4.52
Hai-Hong Tang   7      17         4.76
Chunyan Miao    8      2307       195.72
Houqiang Li     9      2090       172.30