Auto-captions on GIF: A Large-scale Video-sentence Dataset for Vision-language Pre-training - Citegraph

Paper Info

Title
Auto-captions on GIF: A Large-scale Video-sentence Dataset for Vision-language Pre-training

Abstract
In this work, we present Auto-captions on GIF (ACTION), a new large-scale pre-training dataset for generic video understanding. All video-sentence pairs are created by automatically extracting and filtering video caption annotations from billions of web pages. The Auto-captions on GIF dataset can be used to pre-train generic feature representations or encoder-decoder structures for video captioning, as well as for other downstream tasks (e.g., sentence localization in videos and video question answering). We present a detailed analysis of the Auto-captions on GIF dataset in comparison with existing video-sentence datasets. We also evaluate a Transformer-based encoder-decoder structure for vision-language pre-training, which is further adapted to the downstream video captioning task and shows compelling generalizability on MSR-VTT. The dataset is available at http://www.auto-video-captions.top/2022/dataset.

Year: 2022
DOI: 10.1145/3503161.3551581
Venue: International Multimedia Conference
DocType: Conference
Citations: 0
PageRank: 0.34
References: 0
Authors: 6

Authors

Name	Order	Citations	PageRank
Yingwei Pan	1	357	23.66
Yehao Li	2	75	8.57
Jianjie Luo	3	0	0.34
Jun Xu	4	72	2.20
Ting Yao	5	842	52.62
Tao Mei	6	4702	288.54
