Title
Investigating Multimodal Audiovisual Event Detection and Localization
Abstract
This paper investigates a multisensory speaker-tracking approach that combines sound localization with visual object detection and tracking. The sound localization module estimates the position of the speaker whenever a new spatial audio event is detected (i.e., a sound-source position/speaker change). Besides localization, spatial audio events can be detected/verified through various decision-making systems utilizing multichannel audio features. Visual object detection and tracking is also considered, either in parallel with the sound system or subsequently, after the initial sound localization. The scenario examined in this paper combines energy-based sound localization using a core cross-shaped coincident microphone array with state-of-the-art machine vision, such as OpenCV's pretrained face-detection classifiers and the open Tracking-Learning-Detection (openTLD) framework. A modular multisensory architecture is employed, allowing microphone array(s) to be combined with multi-camera sequences and other signals (e.g., depth/motion imaging). The proposed approach is presented and demonstrated in focused real-world scenarios (e.g., capturing and live-streaming cultural/theatrical shows, meetings and press conferences, recording and broadcasting video lectures, teleconferences, etc.).
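The paper does not publish its implementation, but the energy-based localization step it describes can be illustrated with a minimal sketch. Assuming a hypothetical cross-shaped coincident array whose four channels face front/back/left/right, a coarse azimuth estimate can be derived from per-channel RMS energy differences along the two array axes; the channel layout and function name below are illustrative assumptions, not the authors' method.

```python
import math

def estimate_azimuth(front, back, left, right):
    """Coarse azimuth (degrees, 0 = front, 90 = right) from per-channel
    RMS energies of a hypothetical cross-shaped coincident array."""
    def rms(samples):
        return math.sqrt(sum(s * s for s in samples) / len(samples))
    # Energy differences along the two orthogonal array axes
    ex = rms(front) - rms(back)   # front/back axis
    ey = rms(right) - rms(left)   # left/right axis
    return math.degrees(math.atan2(ey, ex))

# A source exciting only the right-facing channel maps to ~90 degrees
tone = [1.0, -1.0] * 128
silence = [0.0] * 256
print(round(estimate_azimuth(silence, silence, silence, tone)))  # → 90
```

In the pipeline the paper outlines, such an estimate would trigger a new spatial audio event and steer the visual modules (face detection, openTLD tracking) toward the candidate speaker region.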
Year
2016
DOI
10.1145/2986416.2986426
Venue
Audio Mostly Conference
Keywords
Spatial Audio Features, Audio Segmentation, Audio Localization, Audiovisual Event Detection, Multimodal Decision Making
Field
Broadcasting, Object detection, Computer vision, Machine vision, Computer science, Microphone array, Speech recognition, Artificial intelligence, Sound localization, Modular design, Face detection, Coincident
DocType
Conference
Citations
1
PageRank
0.36
References
20
Authors
4
Name | Order | Citations | PageRank
N. Vryzas | 1 | 4 | 2.46
R. Kotsakis | 2 | 36 | 5.76
Charalampos Dimoulas | 3 | 104 | 12.35
G. Kalliris | 4 | 277 | 14.72