Abstract |
---|
This paper presents advances in analyzing audio content information to detect events in videos, such as a parade or a birthday party. We developed a set of tools for audio processing within the predominantly vision-focused deep neural network (DNN) framework Caffe. Using these tools, we show, for the first time, the potential of using only a DNN for audio-based multimedia event detection. Training DNNs for event detection using the entire audio track from each video causes a computational bottleneck. Here, we address this problem by developing a sparse audio frame-sampling method that improves event-detection speed and accuracy. We achieved a 10 percentage-point improvement in event classification accuracy, with a 200x reduction in the number of training input examples as compared to using the entire track. This reduction in input feature volume led to a 16x reduction in the size of the DNN architecture and a 300x reduction in training time. We applied our method using the recently released YLI-MED dataset and compared our results with a state-of-the-art system and with results reported in the literature for TRECVID MED. Our results show much higher MAP scores than a baseline i-vector system, at a significantly reduced computational cost. The speed improvement is relevant for processing videos on a large scale, and could enable more effective deployment in mobile systems. |
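
The abstract does not say how the sparse audio frame sampling selects frames, so the following is only a minimal sketch of one possible scheme: uniform strided sampling, where one frame out of every `keep_every` is kept. The function name `sample_frames`, its parameters, and the stride of 200 (chosen only to mirror the reported 200x reduction in training examples) are all assumptions for illustration, not the authors' implementation.

```python
def sample_frames(frames, keep_every=200, offset=0):
    """Keep one frame out of every `keep_every` (hypothetical uniform scheme).

    `frames` is any sequence of per-frame feature vectors; `offset` selects
    which frame inside each stride window is kept.
    """
    if keep_every < 1:
        raise ValueError("keep_every must be at least 1")
    if not 0 <= offset < keep_every:
        raise ValueError("offset must lie in [0, keep_every)")
    # Slicing with a step keeps frames offset, offset + keep_every, ...
    return frames[offset::keep_every]


# Stand-in for an audio track: 1000 frame indices instead of real features.
track = list(range(1000))
sparse = sample_frames(track, keep_every=200)
reduction = len(track) / len(sparse)
```

With a stride of 200, a 1000-frame track is reduced to 5 frames, which is the kind of reduction in input volume the abstract describes; whether the original method samples uniformly, randomly, or by some other criterion is not stated in this record.
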
Year | DOI | Venue |
---|---|---|
2015 | 10.1145/2671188.2749396 | ICMR |
Keywords | Field | DocType |
Multimedia Event Detection, Audio, Video, Deep Neural Networks, Caffe | Bottleneck, Software deployment, Computer science, Artificial intelligence, Audio signal processing, Artificial neural network, Deep neural networks, Pattern recognition, Caffe, Speech recognition, Sampling (statistics), Multimedia, Machine learning | Conference |
Citations | PageRank | References |
7 | 0.58 | 6 |
Authors |
7 |
Name | Order | Citations | PageRank |
---|---|---|---|
Khalid Ashraf | 1 | 8 | 0.98 |
Benjamin Elizalde | 2 | 359 | 22.38 |
Forrest N. Iandola | 3 | 352 | 17.25 |
Matthew W. Moskewicz | 4 | 1743 | 102.93 |
Julia Bernd | 5 | 19 | 4.98 |
Gerald Friedland | 6 | 11 | 1.31 |
Kurt Keutzer | 7 | 5040 | 801.67 |