CTNR: Compress-then-Reconstruct Approach for Multimodal Abstractive Summarization - Citegraph

Paper Info

Title
CTNR: Compress-then-Reconstruct Approach for Multimodal Abstractive Summarization

Abstract
With the rapid growth of multimodal data on social media and the strong demand for short but information-rich content, multimodal summarization has drawn much attention in both industry and academia. It usually produces a textual summary from multiple sources using computer vision or natural language processing techniques. However, modeling this task poses two challenges: 1) the feature representation is limited by the non-alignment among multimodal data; 2) massive parallel data is required during training, which is time-consuming and laborious. In this paper, we introduce an unsupervised architecture (Compress-then-Reconstruct, CTNR) to generate the summary in an end-to-end manner and a Cross-Modal Transformer module (CMTrans) to fuse the non-aligned multimodal information. Comprehensive experiments show that the proposed CTNR framework with CMTrans outperforms mainstream unsupervised approaches in terms of BLEU, ROUGE, and relevance scores on the MSMO and YouTube News datasets, with average improvements of 8.82% and 11.01%, respectively.

Year	DOI	Venue
2021	10.1109/IJCNN52387.2021.9534082	2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN)
DocType	ISSN	Citations
Conference	2161-4393	0
PageRank	References	Authors
0.34	0	5

Authors (5 rows)

Name	Order	Citations	PageRank
Chenxi Zhang	1	24	10.25
Zijian Zhang	2	27	9.14
Jiangfeng Li	3	0	0.68
Qin Liu	4	23	5.17
Hongming Zhu	5	1	4.75
