Abstract |
---|
This paper addresses a new problem of weakly-supervised online action segmentation in instructional videos. We present a framework to segment streaming videos online at test time using Dynamic Programming and show its advantages over a greedy sliding-window approach. We improve our framework by introducing the Online-Offline Discrepancy Loss (OODL) to encourage the segmentation results to have a higher temporal consistency. Furthermore, only during training, we exploit frame-wise correspondence between multiple views as supervision for training on weakly-labeled instructional videos. In particular, we investigate three different multi-view inference techniques to generate more accurate frame-wise pseudo ground-truth with no additional annotation cost. We present results and ablation studies on two benchmark multi-view datasets, Breakfast and IKEA ASM. Experimental results show the efficacy of the proposed methods both qualitatively and quantitatively in the two domains of cooking and assembly. |
Year | DOI | Venue |
---|---|---|
2022 | 10.1109/CVPR52688.2022.01341 | IEEE Conference on Computer Vision and Pattern Recognition |
Keywords | DocType | Volume |
---|---|---|
Video analysis and understanding, Action and event recognition, Self- & semi- & meta- & unsupervised learning | Conference | 2022 |
Issue | Citations | PageRank |
---|---|---|
1 | 0 | 0.34 |
References | Authors |
---|---|
0 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Reza Ghoddoosian | 1 | 0 | 0.34 |
Isht Dwivedi | 2 | 0 | 0.68 |
Nakul Agarwal | 3 | 0 | 0.34 |
Chiho Choi | 4 | 36 | 5.61 |
Behzad Dariush | 5 | 109 | 13.14 |