Latent Memory-augmented Graph Transformer for Visual Storytelling - Citegraph

Paper Info

Title
Latent Memory-augmented Graph Transformer for Visual Storytelling

Abstract
Visual storytelling aims to automatically generate a human-like short story from an image stream. Most existing works use either scene-level or object-level representations, neglecting the interactions among objects within each image and the sequential dependency between consecutive images. In this paper, we present a novel Latent Memory-augmented Graph Transformer (LMGT), a Transformer-based framework for visual story generation. LMGT inherits the merits of the Transformer and enhances it with two carefully designed components: a graph encoding module and a latent memory unit. Specifically, the graph encoding module exploits the semantic relationships among image regions and attentively aggregates critical visual features based on the parsed scene graphs. Furthermore, to better preserve inter-sentence coherence and topic consistency, we introduce an augmented latent memory unit that learns and records highly summarized latent information from the image stream and the sentence history, which serves as the story line. Experimental results on three widely used datasets demonstrate the superior performance of LMGT over state-of-the-art methods.
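The abstract names two mechanisms without giving their equations: attention over image regions restricted by a parsed scene graph, and a memory that accumulates a summary across the image stream. The sketch below illustrates only these generic ideas and is not LMGT's actual formulation. All of the following are assumptions: a scene graph reduced to (subject, object) index pairs with relation labels ignored, masked scaled dot-product attention with no learned projections, mean pooling as the per-image summary, and a GRU-style gate whose fixed weights `w_s`, `w_m`, `bias` stand in for learned parameters. The function names `graph_attention`, `update_memory`, and `encode_story` are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def graph_attention(features: List[Vector], edges: List[Tuple[int, int]]) -> List[Vector]:
    """Scaled dot-product attention masked by a (hypothetical) scene graph.

    Each region attends only to itself and to regions it shares a
    (subject, object) edge with; relation labels are ignored in this sketch.
    """
    n, d = len(features), len(features[0])
    neighbours: Dict[int, Set[int]] = {i: {i} for i in range(n)}
    for subj, obj in edges:
        neighbours[subj].add(obj)
        neighbours[obj].add(subj)
    scale = math.sqrt(d)
    aggregated = []
    for i in range(n):
        idx = sorted(neighbours[i])
        weights = softmax([dot(features[i], features[j]) / scale for j in idx])
        # Weighted sum of neighbour features, dimension by dimension.
        aggregated.append(
            [sum(w * features[j][k] for w, j in zip(weights, idx)) for k in range(d)]
        )
    return aggregated


def mean_pool(vectors: List[Vector]) -> Vector:
    # Assumed per-image summary: average of the aggregated region features.
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def update_memory(memory: Vector, summary: Vector,
                  w_s: float = 1.0, w_m: float = 1.0, bias: float = 0.0) -> Vector:
    """GRU-style gated interpolation; w_s, w_m, bias are hypothetical fixed weights."""
    updated = []
    for m, s in zip(memory, summary):
        gate = 1.0 / (1.0 + math.exp(-(w_s * s + w_m * m + bias)))
        # Convex combination: each new value lies between old memory and summary.
        updated.append(gate * s + (1.0 - gate) * m)
    return updated


def encode_story(images: List[Tuple[List[Vector], List[Tuple[int, int]]]]) -> Vector:
    """Run graph attention per image and fold each image summary into memory."""
    d = len(images[0][0][0])
    memory = [0.0] * d
    for features, edges in images:
        summary = mean_pool(graph_attention(features, edges))
        memory = update_memory(memory, summary)
    return memory


if __name__ == "__main__":
    story = [
        ([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [(0, 1)]),  # image 1: 3 regions, 1 edge
        ([[0.5, 0.5], [0.2, 0.8]], [(0, 1)]),              # image 2: 2 regions, 1 edge
    ]
    print(encode_story(story))
```

A region with no scene-graph edges attends only to itself, so its feature passes through unchanged. Because the gate yields a convex combination, each memory value stays between its previous value and the new summary. In a trained model the gate and the attention would use learned projections instead of the fixed values assumed here.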

Year: 2021
DOI: 10.1145/3474085.3475236
Venue: International Multimedia Conference
DocType: Conference
Citations: 0
PageRank: 0.34
References: 0
Authors: 6

Authors

Name	Order	Citations	PageRank
Mengshi Qi	1	36	3.91
Jie Qin	2	0	0.34
Di Huang	3	0	2.03
Zhiqiang Shen	4	63	9.46
Yi Yang	5	6873	271.72
Jiebo Luo	6	6314	374.00