Abstract
---

Audio-visual representation learning is an important task from the perspective of designing machines with the ability to understand complex events. To this end, we propose a novel multimodal framework that instantiates multiple instance learning. We show that the learnt representations are useful for classifying events and localizing their characteristic audio-visual elements. The system is trained using only video-level event labels without any timing information. An important feature of our method is its capacity to learn from unsynchronized audio-visual events. We achieve state-of-the-art results on a large-scale dataset of weakly-labeled audio event videos. Visualizations of localized visual regions and audio segments substantiate our system's efficacy, especially when dealing with noisy situations where modality-specific cues appear asynchronously.
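The paper's exact architecture is not given on this page; as a rough illustration of the multiple-instance-learning setup the abstract describes (per-segment instance scores pooled into a single video-level prediction, trained with only video-level labels), here is a minimal PyTorch sketch. All feature dimensions, the max-pooling choice, and the late-fusion step are illustrative assumptions, not the authors' method.

```python
import torch
import torch.nn as nn

class AudioVisualMIL(nn.Module):
    """Sketch of weakly-supervised multiple instance learning over
    audio segments and visual regions: each segment ("instance") is
    scored per class, and scores are pooled into one video-level
    ("bag") prediction, so only video-level labels are needed."""

    def __init__(self, audio_dim=128, visual_dim=512, num_classes=10):
        super().__init__()
        # Per-instance classifiers (dimensions are assumptions).
        self.audio_head = nn.Linear(audio_dim, num_classes)
        self.visual_head = nn.Linear(visual_dim, num_classes)

    def forward(self, audio_feats, visual_feats):
        # audio_feats:  (batch, n_audio_segments, audio_dim)
        # visual_feats: (batch, n_visual_regions, visual_dim)
        audio_scores = self.audio_head(audio_feats)     # instance scores
        visual_scores = self.visual_head(visual_feats)
        # Max-pool over instances: a bag is positive if at least one
        # instance is. Pooling each stream independently means audio
        # and visual evidence need not be synchronized in time.
        audio_bag = audio_scores.max(dim=1).values
        visual_bag = visual_scores.max(dim=1).values
        video_logits = audio_bag + visual_bag           # assumed late fusion
        # Instance scores double as localization cues at test time.
        return video_logits, audio_scores, visual_scores

# Training uses only video-level event labels, no timing information:
model = AudioVisualMIL()
audio = torch.randn(2, 20, 128)   # e.g., 20 one-second audio segments
visual = torch.randn(2, 36, 512)  # e.g., 36 spatial region proposals
logits, a_scores, v_scores = model(audio, visual)
loss = nn.functional.cross_entropy(logits, torch.tensor([3, 7]))
loss.backward()
```

Because each stream is pooled over its own instances before fusion, a positive audio segment and a positive visual region need not co-occur, which mirrors the abstract's point about learning from unsynchronized audio-visual events.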
Year | Venue | DocType
---|---|---
2018 | CVPR Workshops | Conference

Volume | Citations | PageRank
---|---|---
abs/1804.07345 | 0 | 0.34

References | Authors
---|---
0 | 6

Name | Order | Citations | PageRank |
---|---|---|---
Sanjeel Parekh | 1 | 3 | 2.48 |
Slim Essid | 2 | 212 | 32.00 |
Alexey Ozerov | 3 | 637 | 37.14 |
Ngoc Q. K. Duong | 4 | 288 | 21.11 |
Patrick Pérez | 5 | 6529 | 391.34 |
Gaël Richard | 6 | 1220 | 110.40 |