Title
Weakly Supervised Temporal Sentence Grounding with Gaussian-based Contrastive Proposal Learning
Abstract
Temporal sentence grounding aims to detect the most salient moment corresponding to a natural language query in untrimmed videos. As labeling temporal boundaries is labor-intensive and subjective, weakly supervised methods have recently received increasing attention. Most existing weakly supervised methods generate proposals by sliding windows, which are content-independent and of low quality. Moreover, they train their model to distinguish positive visual-language pairs from negative ones randomly collected from other videos, ignoring the highly confusing video segments within the same video. In this paper, we propose Contrastive Proposal Learning (CPL) to overcome the above limitations. Specifically, we use multiple learnable Gaussian functions to generate both positive and negative proposals within the same video that can characterize the multiple events in a long video. Then, we propose a controllable easy-to-hard negative proposal mining strategy to collect negative samples within the same video, which eases model optimization and enables CPL to distinguish highly confusing scenes. Experiments show that our method achieves state-of-the-art performance on the Charades-STA and ActivityNet Captions datasets. The code and models are available at https://github.com/minghangz/cpl.
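The abstract does not give the exact formulation, so the following is only a minimal sketch of the two ideas it names, not the authors' implementation (which is in the linked repository). It assumes a proposal is a Gaussian mask over clip positions normalized to [0, 1], defined by a center and a width, and that negatives are Gaussians whose centers move closer to the positive as a difficulty value rises from 0 (easy) to 1 (hard). The function names, the `difficulty` parameter, and the offset rule are all hypothetical.

```python
import math


def gaussian_mask(center, width, num_clips):
    """Return Gaussian weights over num_clips positions normalized to [0, 1].

    center and width stand in for the learnable Gaussian parameters
    (an assumption; in a real model they would be predicted by the network).
    """
    width = max(width, 1e-3)  # guard against a degenerate width
    positions = [i / (num_clips - 1) for i in range(num_clips)]
    return [math.exp(-((p - center) ** 2) / (2 * width ** 2)) for p in positions]


def negative_centers(center, width, difficulty):
    """Hypothetical easy-to-hard rule for placing negatives in the same video.

    difficulty = 0 pushes negatives as far from the positive as possible;
    difficulty = 1 places them one width away, right next to the positive.
    """
    offset = width + (1.0 - difficulty) * (1.0 - width)
    left = min(1.0, max(0.0, center - offset))   # clamp to the video extent
    right = min(1.0, max(0.0, center + offset))
    return [left, right]


# Illustrative schedule: difficulty grows linearly with the training epoch.
max_epochs = 10
for epoch in (0, 5, 10):
    difficulty = min(1.0, epoch / max_epochs)
    negatives = negative_centers(center=0.5, width=0.1, difficulty=difficulty)
    masks = [gaussian_mask(c, 0.1, num_clips=11) for c in negatives]
```

Under these assumptions, early epochs draw negatives from the far ends of the video, and later epochs draw them from segments adjacent to the positive proposal, which is one way to read "controllable easy-to-hard" mining.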
Year
2022
DOI
10.1109/CVPR52688.2022.01511
Venue
IEEE Conference on Computer Vision and Pattern Recognition
Keywords
Vision + language; Recognition: detection, categorization, retrieval; Video analysis and understanding
DocType
Conference
Volume
2022
Issue
1
Citations
0
PageRank
0.34
References
0
Authors
5
Name            Order  Citations  PageRank
Minghang Zheng  1      0          0.68
Yanjie Huang    2      0          0.68
Qingchao Chen   3      0          0.68
Yuxin Peng      4      1122       74.90
Yang Liu        5      1568       126.97