Title
Learning Deep Intrinsic Video Representation by Exploring Temporal Coherence and Graph Structure
Abstract
Learning video representation is a non-trivial task, as video is an information-intensive medium in which no frame exists independently. Locally, a video frame is visually and semantically similar to its adjacent frames. Holistically, a video has an inherent structure: the correlations among its frames. For example, even frames far apart may share similar semantics. Such context information is therefore important for characterizing the intrinsic representation of a video frame. In this paper, we present a novel approach to learning deep video representations by exploring both local and holistic contexts. Specifically, we propose a triplet sampling mechanism that encodes the local temporal relationships of adjacent frames based on their deep representations. In addition, we incorporate the graph structure of the video, as a priori knowledge, to holistically preserve the inherent correlations among video frames. Our approach is fully unsupervised and trained end-to-end in a deep convolutional neural network architecture. Through extensive experiments, we show that our learned representation significantly boosts several video recognition tasks (retrieval, classification, and highlight detection) over traditional video representations.
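The triplet sampling mechanism mentioned in the abstract can be illustrated with a standard margin-based triplet loss, which pulls an anchor frame's embedding toward an adjacent (positive) frame and pushes it away from a temporally distant (negative) frame. This is a minimal sketch under common conventions for triplet losses, not the authors' actual implementation; the function name, margin value, and toy embeddings are illustrative.

```python
def triplet_loss(anchor, positive, negative, margin=1.0):
    """Margin-based triplet loss on frame embeddings (illustrative sketch).

    Encourages the squared distance from the anchor frame to an adjacent
    (positive) frame to be smaller, by at least `margin`, than its distance
    to a temporally distant (negative) frame.
    """
    d_pos = sum((a - p) ** 2 for a, p in zip(anchor, positive))
    d_neg = sum((a - n) ** 2 for a, n in zip(anchor, negative))
    return max(0.0, margin + d_pos - d_neg)

# Toy 2-D embeddings: the adjacent frame is much closer than the distant
# one, so the constraint is satisfied and the loss is zero.
anchor = [1.0, 0.0]
positive = [0.9, 0.1]    # adjacent frame (hypothetical embedding)
negative = [-1.0, 0.5]   # distant frame (hypothetical embedding)
loss = triplet_loss(anchor, positive, negative)
```

In training, such a loss would be minimized over many sampled (anchor, adjacent, distant) frame triplets, shaping the CNN's embedding space to respect local temporal coherence.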
Year: 2016
Venue: IJCAI
Field: Computer vision, ENCODE, Architecture, Convolutional neural network, Computer science, A priori and a posteriori, Motion compensation, Coherence (physics), Video tracking, Artificial intelligence, Semantics, Machine learning
DocType: Conference
Citations: 12
PageRank: 0.59
References: 23
Authors: 6
Name          Order  Citations  PageRank
Yingwei Pan   1      357        23.66
Yehao Li      2      75         8.57
Ting Yao      3      842        52.62
Tao Mei       4      4702       288.54
Houqiang Li   5      2090       172.30
Yong Rui      6      7052       449.08