Abstract | ||
---|---|---|
Automatically describing video content with natural language is a fundamental challenge in computer vision. Recurrent Neural Networks (RNNs), which model sequence dynamics, have attracted increasing attention for visual interpretation. However, most existing approaches generate each word locally, given the previous words and the visual content, while the relationship between sentence semantics and visual content is not holistically exploited. As a result, the generated sentences may be contextually correct while their semantics (e.g., subjects, verbs or objects) are wrong. This paper presents a novel unified framework, named Long Short-Term Memory with visual-semantic Embedding (LSTM-E), which simultaneously explores the learning of LSTM and visual-semantic embedding. The former aims to locally maximize the probability of generating the next word given the previous words and the visual content, while the latter creates a visual-semantic embedding space that enforces the relationship between the semantics of the entire sentence and the visual content. Experiments on the YouTube2Text dataset show that the proposed LSTM-E achieves the best published performance to date in generating natural sentences: 45.3% and 31.0% in terms of BLEU@4 and METEOR, respectively. Superior performance is also reported on two movie description datasets (M-VAD and MPII-MD). In addition, we demonstrate that LSTM-E outperforms several state-of-the-art techniques in predicting Subject-Verb-Object (SVO) triplets. |
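
The two objectives named in the abstract, a local next-word term and a sentence-level embedding term, can be pictured as one weighted loss. The sketch below is only an illustration: the abstract gives no formulas, so the squared Euclidean distance, the negative log-likelihood, the weighting scheme, and the names `relevance_loss`, `coherence_loss`, `joint_loss`, and `tradeoff` are all assumptions made here.

```python
import math


def relevance_loss(video_emb, sentence_emb):
    """Assumed form: squared Euclidean distance between the video and
    sentence vectors after both are mapped into a shared embedding space."""
    return sum((v - s) ** 2 for v, s in zip(video_emb, sentence_emb))


def coherence_loss(next_word_probs):
    """Assumed form: negative log-likelihood of the reference sentence,
    where each entry is the probability the decoder gave the correct next word."""
    return -sum(math.log(p) for p in next_word_probs)


def joint_loss(video_emb, sentence_emb, next_word_probs, tradeoff=0.5):
    """Weighted sum of the two terms; `tradeoff` is a hypothetical
    hyperparameter in [0, 1] that balances embedding against generation."""
    rel = relevance_loss(video_emb, sentence_emb)
    coh = coherence_loss(next_word_probs)
    return (1.0 - tradeoff) * rel + tradeoff * coh


# Toy values: 2-d embeddings and a two-word sentence.
loss = joint_loss([1.0, 2.0], [1.0, 0.0], [0.5, 0.5], tradeoff=0.5)
```

Under these assumptions, setting `tradeoff` to 1 leaves only the local word-by-word term, and setting it to 0 leaves only the sentence-level embedding term.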
Year | DOI | Venue |
---|---|---|
2015 | 10.1109/CVPR.2016.497 | 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) |
Field | DocType | Volume |
---|---|---|
Computer vision, Embedding, Pattern recognition, Computer science, Visual interpretation, Speech recognition, Natural language, Artificial intelligence, Natural language processing, Artificial neural network, Sentence, Semantics | Journal | abs/1505.01861 |
Issue | ISSN | Citations |
---|---|---|
1 | 1063-6919 | 135 |
PageRank | References | Authors |
---|---|---|
3.07 | 40 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Yingwei Pan | 1 | 357 | 23.66 |
Tao Mei | 2 | 4702 | 288.54 |
Ting Yao | 3 | 842 | 52.62 |
Houqiang Li | 4 | 2090 | 172.30 |
Yong Rui | 5 | 7052 | 449.08 |