Abstract
---
Understanding and reasoning over partially observed visual clues is often regarded as a challenging real-world problem, even for human beings. In this paper, we present a new visual question answering (VQA) task, Photo Stream QA, which aims to answer open-ended questions about a narrative photo stream. Photo Stream QA is more challenging and interesting than existing VQA tasks, since the temporal and visual variance among photos in the stream is large and hard to observe. Therefore, instead of learning simple vision-text mappings, AI algorithms must fill these variance gaps with recollection, reasoning, and even knowledge from daily experience. To tackle the problems in Photo Stream QA, we propose an end-to-end baseline (E-TAA) with a novel Experienced Unit (E-unit) and Three-stage Alternating Attention (TAA). The E-unit yields a better visual representation that captures the temporal semantic relations among visual clues in the photo stream, while TAA creates three levels of attention that gradually refine the visual features, using the textual representation of the question as guidance. Experimental results on our developed dataset demonstrate that, as the first attempt at the Photo Stream QA task, E-TAA provides promising results, outperforming all the other baseline methods.
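The abstract describes TAA as repeated question-guided attention over photo-stream features. The paper's actual architecture is not given in this record, so the sketch below is only a minimal illustration of the general idea, with hypothetical shapes and a simple additive guidance update: at each stage, a guidance vector scores the photo features, a softmax pools an attended visual summary, and the summary refines the guidance for the next stage.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def alternating_attention(V, q, stages=3):
    """Illustrative question-guided attention (NOT the paper's exact TAA).

    V: (n_frames, d) visual features for the photos in the stream.
    q: (d,) question embedding used as the initial guidance.
    Each stage attends over the photos with the current guidance,
    pools an attended summary, and updates the guidance with it.
    """
    d = V.shape[1]
    guide = q
    for _ in range(stages):
        scores = V @ guide / np.sqrt(d)  # relevance of each photo to the guidance
        alpha = softmax(scores)          # attention distribution over photos
        visual = alpha @ V               # attended visual summary, shape (d,)
        guide = guide + visual           # refine guidance for the next stage
    return visual, alpha

rng = np.random.default_rng(0)
V = rng.standard_normal((5, 8))  # hypothetical: 5 photos, 8-dim features
q = rng.standard_normal(8)
summary, weights = alternating_attention(V, q)
```

The three stages mirror the "gradually refines visual features" wording: each pass sharpens the attention distribution around photos consistent with the (updated) question guidance.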
Year | DOI | Venue
---|---|---
2020 | 10.1145/3394171.3413745 | MM '20: The 28th ACM International Conference on Multimedia, Seattle, WA, USA, October 2020

DocType | ISBN | Citations
---|---|---
Conference | 978-1-4503-7988-5 | 0

PageRank | References | Authors
---|---|---
0.34 | 27 | 7
Name | Order | Citations | PageRank
---|---|---|---
Wenqiao Zhang | 1 | 3 | 2.73 |
Siliang Tang | 2 | 179 | 33.98 |
Yanpeng Cao | 3 | 30 | 6.32 |
Jun Xiao | 4 | 513 | 50.95 |
Shiliang Pu | 5 | 187 | 42.65 |
Fei Wu | 6 | 2209 | 153.88 |
Yue-Ting Zhuang | 7 | 3549 | 216.06 |