AVLnet - Learning Audio-Visual Language Representations from Instructional Videos. | 3 | 0.37 | 2021 |
Unsupervised Discriminative Embedding For Sub-Action Learning in Complex Activities. | 0 | 0.34 | 2021 |
Mining YouTube - A dataset for learning fine-grained action concepts from webly supervised video data. | 0 | 0.34 | 2019 |