Abstract |
---|
Audio-visual event localization requires identifying the event that is both visible and audible in a video (at either the frame or video level). To address this task, we propose a deep neural network named Audio-Visual sequence-to-sequence dual network (AVSDN). By jointly taking both audio and visual features at each time segment as inputs, our proposed model learns global and local event information in a sequence-to-sequence manner, which can be realized in either fully supervised or weakly supervised settings. Empirical results confirm that our proposed method performs favorably against recent deep learning approaches in both settings. |
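The abstract describes fusing audio and visual features at each time segment and modeling the fused sequence recurrently to produce per-segment event labels. The toy sketch below illustrates that idea only in outline; the concatenation fusion, the scalar tanh recurrence, and all function names are illustrative assumptions, not the actual AVSDN architecture:

```python
import math

def fuse(audio_feat, visual_feat):
    # Per-segment fusion by concatenation (an assumed, simplified fusion).
    return audio_feat + visual_feat

def rnn_encode(segments, w=0.5, u=0.3):
    # Toy recurrent encoder: each hidden state summarizes segments seen so far.
    h = [0.0] * len(segments[0])
    states = []
    for x in segments:
        h = [math.tanh(w * xi + u * hi) for xi, hi in zip(x, h)]
        states.append(h)
    return states

def localize(states, threshold=0.0):
    # Per-segment event decision: 1 if the mean activation exceeds the threshold.
    return [1 if sum(h) / len(h) > threshold else 0 for h in states]

# Two time segments, each with 2-dim audio and 2-dim visual features.
audio = [[0.9, 0.8], [-0.5, -0.6]]
visual = [[0.7, 0.6], [-0.4, -0.3]]
fused = [fuse(a, v) for a, v in zip(audio, visual)]
labels = localize(rnn_encode(fused))  # one 0/1 event label per segment
```

In the fully supervised setting such per-segment labels would be trained against frame-level annotations; in the weakly supervised setting only a video-level label is available.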
Year | DOI | Venue
---|---|---
2019 | 10.1109/icassp.2019.8683226 | 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Keywords | Field | DocType
---|---|---
Audio-Video Features, Dual Modality, Event Localization, Deep Learning | Pattern recognition, Computer science, Artificial intelligence, Deep learning, Artificial neural network | Conference

Volume | ISSN | Citations
---|---|---
abs/1902.07473 | 1520-6149 | 0

PageRank | References | Authors
---|---|---
0.34 | 0 | 3
Name | Order | Citations | PageRank
---|---|---|---
Yan-Bo Lin | 1 | 0 | 1.01
Yu-Jhe Li | 2 | 4 | 1.05
Yu-Chiang Frank Wang | 3 | 914 | 61.63