Abstract |
---|
First-person video naturally brings the use of a physical environment to the forefront, since it shows the camera wearer interacting fluidly in a space based on his intentions. However, current methods largely separate the observed actions from the persistent space itself. We introduce a model for environment affordances that is learned directly from egocentric video. The main idea is to gain a human-centric model of a physical space (such as a kitchen) that captures (1) the primary spatial zones of interaction and (2) the likely activities they support. Our approach decomposes a space into a topological map derived from first-person activity, organizing an ego-video into a series of visits to the different zones. Further, we show how to link zones across multiple related environments (e.g., from videos of multiple kitchens) to obtain a consolidated representation of environment functionality. On EPIC-Kitchens and EGTEA+, we demonstrate our approach for learning scene affordances and anticipating future actions in long-form video. |

Year | DOI | Venue |
---|---|---|
2020 | 10.1109/CVPR42600.2020.00024 | 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) |

Keywords | DocType | ISSN |
---|---|---|
environment functionality, EPIC-Kitchens, scene affordances, long-form video, environment affordances, egocentric video, first-person video, human-centric model, primary spatial zones, first-person activity, ego-video, topological map, EGTEA+, Ego-Topo | Conference | 1063-6919 |

ISBN | Citations | PageRank |
---|---|---|
978-1-7281-7169-2 | 0 | 0.34 |

References | Authors |
---|---|
41 | 4 |

Name | Order | Citations | PageRank |
---|---|---|---|
Tushar Nagarajan | 1 | 18 | 1.62 |
Yanghao Li | 2 | 194 | 13.98 |
Christoph Feichtenhofer | 3 | 519 | 20.44 |
Kristen Grauman | 4 | 6258 | 326.34 |