Title
Seeing Bot
Abstract
We demonstrate a video captioning bot, named Seeing Bot, which can generate a natural language description of what it is seeing in near real time. Specifically, given a live streaming video, Seeing Bot runs two pre-learned and complementary captioning modules in parallel - one for generating an image-level caption for each sampled frame, and the other for generating a video-level caption for each sampled video clip. In particular, both the image and video captioning modules are boosted by incorporating semantic attributes, which enrich the generated descriptions and lead to human-level caption generation. A visual-semantic embedding model is then exploited to rank and select the final caption from the two parallel modules by considering the semantic relevance between the video content and the generated captions. Seeing Bot finally converts the selected description to speech and sends the speech to the end user via an earphone. Our demonstration works on arbitrary videos in the wild and supports live video captioning.
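To illustrate the caption-selection step described above, the following is a minimal sketch of how a visual-semantic embedding can be used to rank the candidate captions produced by the two parallel modules via cosine similarity in a shared embedding space. All function names, variable names, and dimensions are illustrative assumptions for exposition, not the authors' implementation.

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def select_caption(video_embedding: np.ndarray,
                   candidates: list[tuple[str, np.ndarray]]) -> str:
    """Return the candidate caption whose sentence embedding is closest
    to the video embedding in the shared visual-semantic space."""
    best_caption, best_score = "", float("-inf")
    for caption, sentence_embedding in candidates:
        score = cosine_similarity(video_embedding, sentence_embedding)
        if score > best_score:
            best_caption, best_score = caption, score
    return best_caption

# Toy usage: in the real system, the embeddings would come from the learned
# visual-semantic embedding model and the two captioning modules.
video_emb = np.random.rand(256)
candidates = [
    ("a man is riding a bicycle down the street", np.random.rand(256)),      # image-level caption
    ("a group of people are playing soccer in a park", np.random.rand(256)), # video-level caption
]
print(select_caption(video_emb, candidates))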
Year: 2017
DOI: 10.1145/3077136.3084144
Venue: SIGIR
DocType: Conference
ISBN: 978-1-4503-5022-8
Citations: 1
PageRank: 0.36
References: 0
Authors: 5
Name          Order  Citations  PageRank
Yingwei Pan   1      357        23.66
Zhaofan Qiu   2      117        10.06
Ting Yao      3      842        52.62
Houqiang Li   4      2090       172.30
Tao Mei       5      4702       288.54