Title
Learning based Multi-modality Image and Video Compression
Abstract
Multi-modality (i.e., multi-sensor) data is widely used in various vision tasks for more accurate or robust perception. However, the increased data modalities bring new challenges for data storage and transmission. The existing data compression approaches usually adopt individual codecs for each modality without considering the correlation between different modalities. This work proposes a multi-modality compression framework for infrared and visible image pairs by exploiting the cross-modality redun-dancy. Specifically, given the image in the reference modality (e.g., the infrared image), we use the channel-wise alignment module to produce the aligned features based on the affine transform. Then the aligned feature is used as the context information for compressing the image in the current modality (e.g., the visible image), and the corresponding affine coefficients are losslessly compressed at negligible cost. Furthermore, we introduce the Transformer-based spatial alignment module to exploit the correlation between the intermediate features in the decoding procedures for different modalities. Our framework is very flexible and easily extended for multi-modality video compression. Experimental results show our proposed framework outperforms the traditional and learning-based single modality compression methods on the FLIR and KAIST datasets.
Year
DOI
Venue
2022
10.1109/CVPR52688.2022.00599
IEEE Conference on Computer Vision and Pattern Recognition
Keywords
DocType
Volume
Low-level vision, 3D from multi-view and sensors
Conference
2022
Issue
Citations 
PageRank 
1
0
0.34
References 
Authors
0
5
Name
Order
Citations
PageRank
Guo Lu1258.02
Tianxiong Zhong200.34
Jing Geng300.34
Qiang Hu400.34
Dong Xu57616291.96