Title
Dual-Modality Seq2seq Network For Audio-Visual Event Localization
Abstract
Audio-visual event localization requires identifying an event that is both visible and audible in a video, at either the frame or the video level. To address this task, we propose a deep neural network named Audio-Visual sequence-to-sequence dual network (AVSDN). By jointly taking audio and visual features at each time segment as inputs, our proposed model learns global and local event information in a sequence-to-sequence manner, and can be trained in either a fully supervised or a weakly supervised setting. Empirical results confirm that our proposed method performs favorably against recent deep learning approaches in both settings.
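The abstract's per-segment fusion of audio and visual features, followed by sequence-to-sequence prediction of global and local event information, can be sketched roughly as below. All sizes, the concatenation fusion, and the plain tanh RNN cell are illustrative assumptions, not the paper's actual AVSDN architecture, which this sketch only approximates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy dimensions: 10 one-second segments, audio/visual feature sizes,
# hidden size, and number of event classes (all hypothetical).
T, d_a, d_v, d_h, n_events = 10, 128, 512, 64, 29

# One audio and one visual feature vector per time segment, as in the abstract.
audio = rng.standard_normal((T, d_a))
visual = rng.standard_normal((T, d_v))

# Encoder: fuse modalities by concatenation, then run a plain RNN over time
# (a stand-in for the paper's recurrent encoder).
W_in = rng.standard_normal((d_a + d_v, d_h)) * 0.01
W_h = rng.standard_normal((d_h, d_h)) * 0.01
h = np.zeros(d_h)
states = []
for t in range(T):
    x = np.concatenate([audio[t], visual[t]])  # joint audio-visual input
    h = np.tanh(x @ W_in + h @ W_h)            # local, per-segment state update
    states.append(h)
global_ctx = h  # final hidden state as a global summary of the video

# Decoder: score event classes per segment from local state plus global
# context, i.e. the "global and local event information" the abstract mentions.
W_out = rng.standard_normal((2 * d_h, n_events)) * 0.01
logits = np.stack([np.concatenate([s, global_ctx]) @ W_out for s in states])
preds = logits.argmax(axis=1)  # one event label per segment (frame level)
print(preds.shape)  # (10,)
```

In the weakly supervised setting only a video-level label is available, so the per-segment scores would instead be aggregated (e.g. pooled over time) before computing the loss.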
Year
2019
DOI
10.1109/icassp.2019.8683226
Venue
2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP)
Keywords
Audio-Video Features, Dual Modality, Event Localization, Deep Learning
Field
Pattern recognition, Computer science, Artificial intelligence, Deep learning, Artificial neural network
DocType
Journal
Volume
abs/1902.07473
ISSN
1520-6149
Citations
0
PageRank
0.34
References
0
Authors
3
Name                    Order  Citations  PageRank
Yan-Bo Lin              1      0          1.01
Yu-Jhe Li               2      4          1.05
Yu-Chiang Frank Wang    3      914        61.63