Speaker Detection and Applications to Cross-Modal Analysis of Planning Meetings - Citegraph

Paper Info

Title
Speaker Detection and Applications to Cross-Modal Analysis of Planning Meetings

Abstract
Detection of meeting events is one of the most important tasks in multimodal analysis of planning meetings. Speaker detection is a key step in the extraction of most meaningful meeting events. In this paper, we present an approach to speaker localization that combines visual and audio information in multimodal meeting analysis. When talking, people produce speech accompanied by mouth movements and hand gestures. By computing the correlation of audio signals, mouth movements, and hand motion, we detect a talking person both spatially and temporally. Three kinds of features are extracted for speaker localization: hand movements are expressed by hand motion efforts; audio features are expressed by 12 mel-frequency cepstral coefficients computed from the audio signals; and mouth movements are expressed by normalized cross-correlation coefficients of the mouth area between two successive frames. A time delay neural network is trained to learn the correlation relationships and is then applied to perform speaker localization. Experiments and applications in planning meeting environments are provided.
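
The mouth-movement feature named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the zero-mean form of normalized cross-correlation is assumed here, and the function name and the toy patches are hypothetical, not taken from the paper.

```python
import numpy as np

def normalized_cross_correlation(patch_a, patch_b):
    """Zero-mean normalized cross-correlation of two equally sized patches.

    Assumption: the zero-mean variant is used; the abstract does not say
    which variant the authors applied. The result lies in [-1, 1].
    """
    a = np.asarray(patch_a, dtype=np.float64)
    b = np.asarray(patch_b, dtype=np.float64)
    if a.shape != b.shape:
        raise ValueError("patches must have the same shape")
    # Remove the mean so a uniform brightness offset does not affect the score.
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    if denom == 0.0:
        # A flat patch has no structure to correlate; report 0 by convention.
        return 0.0
    return float((a * b).sum() / denom)

# Hypothetical mouth-area patches from two successive frames.
frame_t = np.arange(12, dtype=np.float64).reshape(3, 4)
frame_t1 = 2.0 * frame_t + 5.0  # same pattern, different gain and offset
score = normalized_cross_correlation(frame_t, frame_t1)
```

A score near 1 means the mouth area barely changed between the two frames, while lower scores indicate movement. How such scores are combined with the audio and hand features is described in the paper itself and is not reproduced here.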

Year	DOI	Venue
2009	10.1109/ISM.2009.66	ISM
Keywords	Field	DocType
audio signal,mouth movements,motion compensation,audio signals,audio information,cross-modal analysis,audio signal analysis,time delay neural network,speaker localization,speaker detection,hand motion effort,multimodal meeting analysis,planning,meeting event detection,speaker recognition,audio feature,hand movement,planning meetings,meeting analysis,meaningful meeting event,gesture recognition,audio signal processing,hand motion,mouth movement,hand gesture,neural nets,planning meeting,face,modal analysis,data mining,signal analysis,mel frequency cepstral coefficient,normalized cross correlation,skin,feature extraction	Mel-frequency cepstrum,Audio signal,Computer vision,Computer science,Gesture,Gesture recognition,Speech recognition,Speaker recognition,Time delay neural network,Speaker diarisation,Artificial intelligence,Audio signal processing	Conference
ISBN	Citations	PageRank
978-0-7695-3890-7	0	0.34
References	Authors
13	3

Authors (3 rows)

Name	Order	Citations	PageRank
Bing Fang	1	61	8.64
Yingen Xiong	2	437	30.25
Francis Quek	3	60	6.56