Abstract |
---|
In this paper, we propose a novel video captioning model that utilizes the context information of correlated clips. Unlike ordinary “one clip - one caption” algorithms, we concatenate multiple neighboring clips into a chunk and train the network in a “one chunk - multiple captions” manner. We train and evaluate our algorithm on the M-VAD dataset and report the performance of caption and keyword generation. Our model is a foundation model for generating a video story from several captions; therefore, in this paper, we focus on caption generation for several videos and on trend analysis of the generated captions. In the experiments, we evaluate the intermediate results of our model both qualitatively and quantitatively. |
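
To make the “one chunk - multiple captions” pairing concrete, the sketch below shows one way neighboring clips could be grouped into training examples. It is a minimal, hypothetical illustration only: the function name `make_chunks`, the chunk size of 3, and the flat feature layout are assumptions, not details taken from the paper.

```python
from typing import List, Tuple

def make_chunks(
    clip_features: List[List[float]],  # per-clip feature vectors, in temporal order
    captions: List[str],               # one reference caption per clip
    chunk_size: int = 3,               # number of neighboring clips per chunk (assumed)
) -> List[Tuple[List[float], List[str]]]:
    """Group `chunk_size` neighboring clips into one chunk, pairing the
    concatenated clip features with the captions of every clip inside."""
    chunks = []
    for start in range(0, len(clip_features) - chunk_size + 1, chunk_size):
        # Concatenate the feature vectors of the neighboring clips ...
        feats = [x for clip in clip_features[start:start + chunk_size] for x in clip]
        # ... and keep all of their reference captions as targets.
        refs = captions[start:start + chunk_size]
        chunks.append((feats, refs))
    return chunks

# Example: six clips with dummy 2-D features become two chunks,
# each paired with three reference captions.
features = [[float(i), float(i)] for i in range(6)]
texts = [f"caption {i}" for i in range(6)]
for feats, refs in make_chunks(features, texts):
    print(len(feats), refs)
```
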
Year | DOI | Venue |
---|---|---|
2019 | 10.1109/BIGCOMP.2019.8679213 | BigComp |

Keywords | Field | DocType
---|---|---|
Videos, Feature extraction, Motion pictures, Decoding, Training, Task analysis, Data models | Data modeling, Closed captioning, Storytelling, Task analysis, Computer science, Speech recognition, Feature extraction, Concatenation, Decoding methods, CLIPS | Conference

ISSN | ISBN | Citations
---|---|---|
2375-933X | 978-1-5386-7789-6 | 0

PageRank | References | Authors
---|---|---|
0.34 | 0 | 3

Name | Order | Citations | PageRank |
---|---|---|---|
Seung Ho Han | 1 | 11 | 5.52 |
Bo-Won Go | 2 | 0 | 0.34 |
Jin Ho Choi | 3 | 18 | 8.20 |