Title
Words Speak for Actions: Using Text to Find Video Highlights
Abstract
Video highlights are a selection of the most interesting parts of a video. The problem of highlight detection has been explored for video domains like egocentric, sports, movies, and surveillance videos. Existing methods are limited to finding visually important parts of the video but do not necessarily learn semantics. Moreover, the available benchmark datasets contain audio-muted, single-activity, short videos, which lack any context beyond a few keyframes that can be used to understand them. In this work, we explore highlight detection in the TV series domain, which features complex interactions with the surroundings. Existing methods would fare poorly at capturing the semantics of such videos. To incorporate the importance of dialogues/audio, we propose using the descriptions of the video's shots as cues for learning visual importance. Note that while the audio information is used to determine visual importance during training, highlight detection still works using only the visual information from videos. We use publicly available text ranking algorithms to rank the descriptions. The ranking scores are used to train a visual pairwise shot ranking model (VPSR) to find the highlights of the video. Results are reported on TV series videos from the VideoSet dataset and a season of the TV series Buffy the Vampire Slayer.
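To make the two-stage pipeline in the abstract concrete, below is a minimal sketch in PyTorch: a TextRank-style scoring of shot descriptions (a PageRank iteration over a description-similarity graph), whose scores then supervise a pairwise margin ranking loss on a visual scoring network. All names here (textrank_scores, VPSR, pairwise_ranking_loss), the feature dimension, and the network shape are illustrative assumptions, not the authors' released implementation.

import itertools
import torch
import torch.nn as nn

def textrank_scores(similarity, damping=0.85, iters=50):
    """Score shots by running a PageRank iteration on an (n, n) matrix of
    pairwise text similarities between shot descriptions (diagonal zero)."""
    n = similarity.shape[0]
    # Row-normalize similarities into transition probabilities.
    trans = similarity / similarity.sum(dim=1, keepdim=True).clamp(min=1e-8)
    scores = torch.full((n,), 1.0 / n)
    for _ in range(iters):
        scores = (1 - damping) / n + damping * trans.t() @ scores
    return scores

class VPSR(nn.Module):
    """Hypothetical stand-in for the visual pairwise shot ranking model:
    maps a visual shot feature vector to a scalar importance score."""
    def __init__(self, feat_dim=2048):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, x):
        return self.score(x).squeeze(-1)

def pairwise_ranking_loss(model, feats, text_scores, margin=1.0):
    """For every shot pair (i, j) whose text-derived scores differ, push the
    visual score of the higher-ranked shot above the other by `margin`."""
    loss_fn = nn.MarginRankingLoss(margin=margin)
    s = model(feats)  # (n,) visual importance scores
    losses = []
    for i, j in itertools.combinations(range(len(text_scores)), 2):
        if text_scores[i] == text_scores[j]:
            continue  # ties carry no ranking signal
        target = torch.tensor(1.0 if text_scores[i] > text_scores[j] else -1.0)
        losses.append(loss_fn(s[i:i+1], s[j:j+1], target.unsqueeze(0)))
    return torch.stack(losses).mean()

At test time, under this sketch, only VPSR is applied to visual features; the text scores are needed solely to generate training pairs, matching the abstract's note that detection uses visual information alone.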
Year
2017
DOI
10.1109/ACPR.2017.141
Venue
2017 4th IAPR Asian Conference on Pattern Recognition (ACPR)
Keywords
Video Highlights, Pairwise Ranking, Text Rank
Field
Pairwise comparison, Learning to rank, Task analysis, Ranking, Information retrieval, Computer science, Visualization, Feature extraction, Semantics
DocType
Conference
ISSN
2327-0977
ISBN
978-1-5386-3355-7
Citations
0
PageRank
0.34
References
7
Authors
2
Name                 Order  Citations  PageRank
Sukanya Kudi         1      0          0.34
Anoop M. Namboodiri  2      255        26.36