Title
Seeing Bot
Abstract
We demonstrate a video captioning bot, named Seeing Bot, which can generate a natural language description of what it is seeing in near real time. Specifically, given a live streaming video, Seeing Bot runs two pre-learned and complementary captioning modules in parallel - one for generating an image-level caption for each sampled frame, and the other for generating a video-level caption for each sampled video clip. In particular, both the image and video captioning modules are boosted by incorporating semantic attributes, which enrich the generated descriptions and lead to human-level caption generation. A visual-semantic embedding model is then exploited to rank and select the final caption from the two parallel modules by considering the semantic relevance between the video content and the generated captions. Seeing Bot finally converts the selected description to speech and sends the speech to the end user via an earphone. Our demonstration works on arbitrary videos in the wild and supports live video captioning.
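To illustrate the caption-selection step described above, the following is a minimal sketch of how a visual-semantic embedding can be used to rank the candidate captions produced by the two parallel modules via cosine similarity in a shared embedding space. All function names, variable names, and dimensions are illustrative assumptions for exposition, not the authors' implementation.

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def select_caption(video_embedding: np.ndarray,
                   candidates: list[tuple[str, np.ndarray]]) -> str:
    """Return the candidate caption whose sentence embedding is closest
    to the video embedding in the shared visual-semantic space."""
    best_caption, best_score = "", float("-inf")
    for caption, sentence_embedding in candidates:
        score = cosine_similarity(video_embedding, sentence_embedding)
        if score > best_score:
            best_caption, best_score = caption, score
    return best_caption

# Toy usage: in the real system, the embeddings would come from the learned
# visual-semantic embedding model and the two captioning modules.
video_emb = np.random.rand(256)
candidates = [
    ("a man is riding a bicycle down the street", np.random.rand(256)),      # image-level caption
    ("a group of people are playing soccer in a park", np.random.rand(256)), # video-level caption
]
print(select_caption(video_emb, candidates))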
Year: 2017
DOI: 10.1145/3077136.3084144
Venue: SIGIR
DocType: Conference
ISBN: 978-1-4503-5022-8
Citations: 1
PageRank: 0.36
References: 0
Authors: 5
Name          Order  Citations  PageRank
Yingwei Pan   1      357        23.66
Zhaofan Qiu   2      117        10.06
Ting Yao      3      842        52.62
Houqiang Li   4      2090       172.30
Tao Mei       5      4702       288.54