Title
CUHK & ETHZ & SIAT Submission to ActivityNet Challenge 2016.
Abstract
This paper presents the method underlying our submission to the untrimmed video classification task of the ActivityNet Challenge 2016. We follow the basic pipeline of temporal segment networks and further raise performance via a number of additional techniques. Specifically, we use the latest deep model architectures, e.g., ResNet and Inception V3, and introduce new aggregation schemes (top-k and attention-weighted pooling). Additionally, we incorporate audio as a complementary channel, extracting relevant information via a CNN applied to spectrograms. With these techniques, we derive an ensemble of deep models that together attains a high classification accuracy on the testing set (mAP 93.23%) and secured first place in the challenge.
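The aggregation schemes named in the abstract (top-k and attention-weighted pooling) combine segment-level class scores into a single video-level prediction. Below is a minimal NumPy sketch of how such pooling could look; it is an illustrative assumption rather than the authors' implementation, and the function names, the choice of k, and the randomly generated attention logits are hypothetical.

import numpy as np

def topk_pooling(segment_scores, k=3):
    # Average the k highest scores per class across a video's segments.
    # segment_scores: array of shape (num_segments, num_classes).
    k = min(k, segment_scores.shape[0])
    topk = np.sort(segment_scores, axis=0)[-k:, :]
    return topk.mean(axis=0)

def attention_weighted_pooling(segment_scores, attention_logits):
    # Weight each segment by a softmax-normalized attention score, then sum.
    # attention_logits: shape (num_segments,); in the submission these would
    # come from a learned module, here they are placeholders.
    weights = np.exp(attention_logits - attention_logits.max())
    weights /= weights.sum()
    return (weights[:, None] * segment_scores).sum(axis=0)

# Usage on dummy data: 5 segments, 200 ActivityNet classes.
scores = np.random.rand(5, 200)
video_scores_topk = topk_pooling(scores, k=3)
video_scores_attn = attention_weighted_pooling(scores, np.random.rand(5))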
Year
2016
Venue
arXiv: Computer Vision and Pattern Recognition
Field
Data mining, Spectrogram, Model architecture, Computer science, Pooling, Communication channel, Artificial intelligence, Residual neural network, Machine learning
DocType
Journal
Volume
abs/1608.00797
Citations
4
PageRank
0.52
References
10
Authors
10
Name            Order   Citations   PageRank
Yuanjun Xiong   1       331         18.71
LiMin Wang      2       816         48.41
Zhe Wang        3       199         19.26
Bowen Zhang     4       80          4.49
Hang Song       5       16          2.28
Wei Li          6       74          5.27
Dahua Lin       7       1117        72.62
Yu Qiao         8       2267        152.01
Luc Van Gool    9       27566       1819.51
Xiaoou Tang     10      15728       670.19