Abstract |
---|
Wearable cameras capture a first-person view of the world, and offer a hands-free way to record daily experiences or special events. Yet, not every frame is worthy of being captured and stored. We propose to automatically predict "snap points" in unedited egocentric video; that is, those frames that look like they could have been intentionally taken photos. We develop a generative model for snap points that relies on a Web photo prior together with domain-adapted features. Critically, our approach avoids strong assumptions about the particular content of snap points, focusing instead on their composition. Using 17 hours of egocentric video from both human and mobile robot camera wearers, we show that the approach accurately isolates those frames that human judges would believe to be intentionally snapped photos. In addition, we demonstrate the utility of snap point detection for improving object detection and keyframe selection in egocentric video. |
Year | DOI | Venue |
---|---|---|
2014 | 10.1007/978-3-319-10602-1_19 | COMPUTER VISION - ECCV 2014, PT V |
Keywords | Field | DocType
---|---|---|
Ground Truth, Object Detection, Salient Object, Label Image, Video Summarization | Computer vision, Object detection, Wearable computer, Computer science, SNAP Points, Salient objects, Ground truth, Artificial intelligence, Mobile robot, Generative model | Conference
Volume | ISSN | Citations
---|---|---|
8693 | 0302-9743 | 31
PageRank | References | Authors
---|---|---|
0.88 | 34 | 2
Name | Order | Citations | PageRank |
---|---|---|---|
Bo Xiong | 1 | 58 | 5.74 |
Kristen Grauman | 2 | 6258 | 326.34 |