Title
Watch, Reason and Code: Learning to Represent Videos Using Program
Abstract
Humans have a surprising capacity to induce general rules that describe the specific actions portrayed in a video sequence. The rules learned through this kind of process allow us to achieve goals similar to those shown in the video, but in more general circumstances. Enabling an agent to achieve the same capacity represents a significant challenge. In this paper, we propose a Watch-Reason-Code (WRC) model to synthesise programs that describe the process carried out in a set of video sequences. The 'watch' stage is a video encoder that encodes videos into multiple feature vectors. The 'reason' stage takes as input the features from multiple diverse videos and generates a compact feature representation via a novel deviation-pooling method. The 'code' stage is a multi-round decoder whose first step generates a draft program layout with possibly useful statements and perceptions. Subsequent steps take these outputs and generate a fully structured, compilable and executable program. We evaluate the effectiveness of our model in two video-to-program synthesis environments, Karel and VizDoom, showing that we achieve state-of-the-art results under a variety of settings.
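The abstract's 'reason' stage pools features from several demonstration videos into one compact vector. The paper's exact formulation is not given here; the sketch below is a minimal, hypothetical reading in which "deviation pooling" summarises the per-video feature vectors by their mean together with the average magnitude of each video's deviation from that mean. All function names and shapes are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def encode_video(video_frames):
    # Hypothetical "watch" stage: a stand-in encoder that mean-pools
    # per-frame features (frames x dim) into one vector per video.
    return video_frames.mean(axis=0)

def deviation_pool(features):
    # Hypothetical "reason" stage: given K per-video vectors (K x d),
    # return the mean concatenated with the mean absolute deviation,
    # yielding a single compact (2d,) representation.
    mean = features.mean(axis=0)                      # (d,)
    deviation = np.abs(features - mean).mean(axis=0)  # (d,)
    return np.concatenate([mean, deviation])          # (2d,)

# Five demonstration videos, 16 frames each, 8-dim frame features.
rng = np.random.default_rng(0)
videos = rng.normal(size=(5, 16, 8))
per_video = np.stack([encode_video(v) for v in videos])  # (5, 8)
pooled = deviation_pool(per_video)                        # (16,)
```

A pooled vector of this kind could then condition a decoder that emits program tokens, mirroring the draft-then-refine 'code' stage described above.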
Year
2019
DOI
10.1145/3343031.3351094
Venue
Proceedings of the 27th ACM International Conference on Multimedia
Keywords
video embedding, video to program translation, video understanding
Field
Computer science, Multimedia
DocType
Conference
ISBN
978-1-4503-6889-6
Citations
0
PageRank
0.34
References
0
Authors
7
Name (Order)
1. Xuguang Duan
2. Qi Wu
3. Chuang Gan
4. Yiwei Zhang
5. Wen-bing Huang
6. Anton van den Hengel
7. Wenwu Zhu