Title
Wnet: Audio-Guided Video Object Segmentation via Wavelet-Based Cross-Modal Denoising Networks
Abstract
Audio-guided video object segmentation is a challenging problem in visual analysis and editing: it automatically separates foreground objects from the background in a video sequence according to referring audio expressions. However, existing referring video object segmentation works mainly focus on the guidance of text-based referring expressions, due to the lack of modeling of the semantic representations of audio-video interaction contents. In this paper, we consider the problem of audio-guided video semantic segmentation from the viewpoint of end-to-end denoising encoder-decoder network learning. We propose a wavelet-based encoder network to learn the cross-modal representations of the video contents with audio-form queries. Specifically, we adopt multi-head cross-modal attention layers to explore the potential relations between video and query contents. A 2-dimensional discrete wavelet transform is merged into the transformer encoder to decompose the audio-video features. Next, we maximize the mutual information between the encoded features and the multi-modal features after the cross-modal attention layers to enhance the audio guidance. Then, a self-attention-free decoder network is developed to generate the target masks with frequency-domain transforms. In addition, we construct the first large-scale audio-guided video semantic segmentation dataset. Extensive experiments show the effectiveness of our method. Code is available at: https://github.com/asudahkzj/Wnet.git.
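As a rough illustration of the decomposition step mentioned in the abstract, the sketch below applies a single-level 2-dimensional discrete wavelet transform to a small 2D array. It is not taken from the Wnet code: the Haar wavelet, the function name haar_dwt2, and the plain nested-list input are all assumptions made for illustration, and the abstract does not say which wavelet family or how many levels the authors use.

```python
def haar_dwt2(x):
    """Single-level 2D Haar DWT of a nested list with even height and width.

    Hypothetical helper for illustration only; returns four half-size
    subbands (ll, lh, hl, hh).
    """
    h, w = len(x), len(x[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must both be even")
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, h, 2):
        r_ll, r_lh, r_hl, r_hh = [], [], [], []
        for j in range(0, w, 2):
            # The 2x2 block of neighbouring samples.
            a, b = x[i][j], x[i][j + 1]
            c, d = x[i + 1][j], x[i + 1][j + 1]
            # Two 1/sqrt(2) filter scalings (one per axis) combine into 1/2.
            r_ll.append((a + b + c + d) / 2.0)  # low-pass on both axes
            r_lh.append((a - b + c - d) / 2.0)  # difference between columns
            r_hl.append((a + b - c - d) / 2.0)  # difference between rows
            r_hh.append((a - b - c + d) / 2.0)  # difference on both axes
        ll.append(r_ll)
        lh.append(r_lh)
        hl.append(r_hl)
        hh.append(r_hh)
    return ll, lh, hl, hh


# Decompose a toy 2x2 "feature map" into its four subbands.
ll, lh, hl, hh = haar_dwt2([[1, 2], [3, 4]])
```

With this normalization the transform is orthonormal, so the sum of squared values across the four subbands equals that of the input. In an encoder, each subband could be processed separately before recombination; how Wnet actually does this is described in the paper, not here.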
Year: 2022
DOI: 10.1109/CVPR52688.2022.00138
Venue: IEEE Conference on Computer Vision and Pattern Recognition
Keywords: Segmentation, grouping and shape analysis; Machine learning; Vision + X
DocType: Conference
Volume: 2022
Issue: 1
Citations: 0
PageRank: 0.34
References: 0
Authors: 10
Name | Order | Citations | PageRank
Wenwen Pan | 1 | 0 | 0.34
Haonan Shi | 2 | 0 | 0.34
Zhou Zhao | 3 | 773 | 90.87
Jieming Zhu | 4 | 0 | 0.34
Xiuqiang He | 5 | 312 | 39.21
Zhigeng Pan | 6 | 0 | 0.34
Lianli Gao | 7 | 550 | 42.85
Jun Yu | 8 | 2597 | 105.69
Fei Wu | 9 | 2209 | 153.88
Qi Tian | 10 | 6443 | 331.75