Title
CTNR: Compress-then-Reconstruct Approach for Multimodal Abstractive Summarization
Abstract
With the rapid growth of multimodal data on social media and the strong demand for short but information-rich content, multimodal summarization has drawn much attention in both industry and academia. It usually produces a textual summary from multiple sources using computer vision or natural language processing technologies. However, two challenges arise in modeling this task: 1) the feature representation is limited by the non-alignment among multimodal data; 2) massive parallel data is required during training, and collecting it is time-consuming and laborious. In this paper, we introduce an unsupervised architecture (Compress-then-Reconstruct, CTNR) that generates the summary in an end-to-end manner, and a Cross-Modal Transformer module (CMTrans) that fuses the non-aligned multimodal information. Comprehensive experiments show that the proposed CTNR framework with CMTrans outperforms mainstream unsupervised approaches in terms of BLEU, ROUGE and relevance scores on the MSMO and Youtube News datasets, with average improvements of 8.82% and 11.01%, respectively.
Year
2021
DOI
10.1109/IJCNN52387.2021.9534082
Venue
2021 International Joint Conference on Neural Networks (IJCNN)
DocType
Conference
ISSN
2161-4393
Citations
0
PageRank
0.34
References
0
Authors
5
Name, Order, Citations, PageRank
Chenxi Zhang, 1, 24, 10.25
Zijian Zhang, 2, 27, 9.14
Jiangfeng Li, 3, 0, 0.68
Qin Liu, 4, 23, 5.17
Hongming Zhu, 5, 1, 4.75