Title | ||
---|---|---|
DORi: Discovering Object Relationships for Moment Localization of a Natural Language Query in a Video |
Abstract | ||
---|---|---|
This paper studies the task of temporal moment localization in long untrimmed videos using natural language queries. Given a query sentence, the goal is to determine the start and end of the relevant segment within the video. Our key innovation is to learn a video feature embedding through a language-conditioned message-passing algorithm, suitable for temporal moment localization, that captures the relationships between humans, objects and activities in the video. These relationships are obtained by a spatial sub-graph that contextualizes the scene representation using detected objects and human features conditioned on the language query. Moreover, a temporal sub-graph captures the activities within the video through time. Our method is evaluated on three standard benchmark datasets, and we also introduce YouCookII as a new benchmark for this task. Experiments show our method outperforms state-of-the-art methods on these datasets, confirming the effectiveness of our approach. |
Year | DOI | Venue |
---|---|---|
2021 | 10.1109/WACV48630.2021.00112 | 2021 IEEE Winter Conference on Applications of Computer Vision (WACV 2021) |
DocType | ISSN | Citations |
Conference | 2472-6737 | 0 |
PageRank | References | Authors |
0.34 | 0 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Cristian Rodriguez | 1 | 0 | 1.69 |
Edison Marrese-Taylor | 2 | 0 | 0.68 |
Basura Fernando | 3 | 0 | 0.34 |
Hongdong Li | 4 | 1724 | 101.81 |
Stephen Gould | 5 | 1378 | 87.70 |