Abstract | ||
---|---|---|
Temporal Action Localization (TAL) in untrimmed video is important for many applications, but annotating segment-level ground truth (action class and temporal boundary) is very expensive. This has raised interest in addressing TAL with weak supervision, where only video-level annotations are available during training. However, state-of-the-art weakly-supervised TAL methods focus only on generating a good Class Activation Sequence (CAS) over time and then apply simple thresholding to the CAS to localize actions. In this paper, we first develop a novel weakly-supervised TAL framework called AutoLoc that directly predicts the temporal boundary of each action instance. We then propose a novel Outer-Inner-Contrastive (OIC) loss to automatically discover the segment-level supervision needed for training such a boundary predictor. Our method achieves dramatically improved performance: under an IoU threshold of 0.5, it improves mAP on THUMOS'14 from 13.7% to 21.2% and mAP on ActivityNet from 7.4% to 27.3%. It is also very encouraging that our weakly-supervised method achieves results comparable to those of some fully-supervised methods. |
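For readers unfamiliar with the baseline the abstract contrasts against, the following is a minimal sketch of localizing actions by thresholding a CAS. It is not the AutoLoc method and not code from the paper; the function name `threshold_cas`, the default threshold of 0.5, and the half-open `(start, end)` index convention are all assumptions made for illustration.

```python
import numpy as np


def threshold_cas(cas, threshold=0.5):
    """Return half-open (start, end) index pairs where cas > threshold.

    Hypothetical helper for illustration: `cas` is a 1-D sequence of
    per-snippet activation scores for a single action class.
    """
    above = np.asarray(cas) > threshold
    # Pad with False on both sides so runs touching either end are closed.
    padded = np.concatenate(([False], above, [False]))
    # Positions where the mask flips mark alternating run starts and ends.
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    return [(int(s), int(e)) for s, e in zip(changes[::2], changes[1::2])]


if __name__ == "__main__":
    scores = [0.1, 0.7, 0.9, 0.2, 0.6, 0.1]
    print(threshold_cas(scores))
```

Each returned pair covers one contiguous run of above-threshold snippets, so the segment boundaries depend entirely on the chosen threshold. This is the kind of sensitivity that motivates predicting boundaries directly instead.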
Year | DOI | Venue |
---|---|---|
2018 | 10.1007/978-3-030-01270-0_10 | COMPUTER VISION - ECCV 2018, PT XVI |
Keywords | Field | DocType |
---|---|---|
Temporal action localization, Weak supervision, Outer-Inner-Contrastive, Class activation sequence | Computer science, Ground truth, Artificial intelligence, Thresholding, Machine learning | Conference |
Volume | ISSN | Citations |
---|---|---|
11220 | 0302-9743 | 23 |
PageRank | References | Authors |
---|---|---|
0.69 | 32 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Zheng Shou | 1 | 155 | 7.92 |
Hang Gao | 2 | 28 | 1.47 |
Lei Zhang | 3 | 31 | 5.43 |
Kazuyuki Miyazawa | 4 | 49 | 4.76 |
Shih-Fu Chang | 5 | 13015 | 1101.53 |