Title
Temporal Group Fusion Network for Deep Video Inpainting
Abstract
Video inpainting is the task of synthesizing spatio-temporally coherent content in the missing regions of a given video sequence, and it has recently drawn increasing attention. To exploit temporal information across frames, most recent deep learning-based methods first align reference frames to the target frame with explicit or implicit motion estimation, and then integrate information from the aligned frames. However, their performance relies heavily on the accuracy of frame-to-frame alignment. To alleviate this problem, this paper proposes a novel Temporal Group Fusion Network (TGF-Net) that effectively integrates temporal information through a two-stage fusion strategy. Specifically, the input frames are reorganized into different groups, each followed by an intra-group fusion module that integrates information within the group. Different groups provide complementary information for the missing region. A temporal attention model is further designed to adaptively integrate information across groups. This fusion strategy removes the dependence on alignment operations, greatly improving the visual quality and temporal consistency of the inpainted results. In addition, a coarse alignment model is introduced at the beginning of the network to handle videos with large motion. Extensive experiments on the DAVIS and Youtube-VOS datasets demonstrate the superiority of the proposed method in terms of PSNR/SSIM values, visual quality and temporal consistency.
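The two-stage fusion described in the abstract (group the input frames, fuse within each group, then attend across groups relative to the target frame) can be sketched as follows. This is a minimal illustration only: the function names are invented here, the per-channel mean stands in for the paper's learned intra-group fusion module, and plain dot-product softmax attention stands in for its temporal attention model.

```python
import math

def group_frames(frames, group_size):
    # Reorganize the frame sequence into non-overlapping groups
    # (each frame is a per-channel feature list in this toy setup).
    return [frames[i:i + group_size] for i in range(0, len(frames), group_size)]

def intra_group_fusion(group):
    # Fuse the frames within one group into a single feature vector.
    # A per-channel mean is an assumption standing in for the learned module.
    dim = len(group[0])
    return [sum(frame[c] for frame in group) / len(group) for c in range(dim)]

def temporal_attention_fusion(group_feats, target_feat):
    # Weight each group's fused feature by its softmax-normalized
    # dot-product similarity to the target frame, then sum.
    scores = [sum(g * t for g, t in zip(feat, target_feat)) for feat in group_feats]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(target_feat)
    return [sum(w * feat[c] for w, feat in zip(weights, group_feats))
            for c in range(dim)]

# Toy run: four 2-channel frames, groups of two, frame 0 as the target.
frames = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 2.0]]
groups = group_frames(frames, group_size=2)
group_feats = [intra_group_fusion(g) for g in groups]
fused = temporal_attention_fusion(group_feats, frames[0])
```

Because every group contributes (weighted by attention) rather than being warped onto the target, no frame-to-frame alignment is required in this fusion step, which is the property the abstract emphasizes.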
Year
2022
DOI
10.1109/TCSVT.2021.3117964
Venue
IEEE Transactions on Circuits and Systems for Video Technology
Keywords
Video inpainting, video completion, video object removal, video editing
DocType
Journal
Volume
32
Issue
6
ISSN
1051-8215
Citations
0
PageRank
0.34
References
19
Authors
3
Name          Order  Citations  PageRank
Ruixin Liu    1      2          3.41
Bairong Li    2      2          1.38
Zhu Yuesheng  3      112        39.21