Title
Detection and separation of speech event using audio and video information fusion and its application to robust speech interface
Abstract
A method of detecting speech events in a multiple-sound-source condition using audio and video information is proposed. To detect speech events, sound localization with a microphone array and human tracking with stereo vision are combined by a Bayesian network. From the inference results of the Bayesian network, the time and location of speech events can be obtained. The information on the detected speech events is then utilized in a robust speech interface. A maximum-likelihood adaptive beamformer is employed as a preprocessor of the speech recognizer to separate the speech signal from environmental noise. The coefficients of the beamformer are continuously updated based on the speech-event information, which is also used by the speech recognizer to extract the speech segment.
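As a rough illustration of the preprocessing step described above, the following is a minimal narrowband sketch of a minimum-variance (maximum-likelihood under Gaussian noise) adaptive beamformer. The array geometry, frequency, and source directions are illustrative assumptions, not the paper's actual setup.

```python
# Hedged sketch: narrowband MVDR beamformer with weights updated from a
# noise covariance estimate. All parameters below are illustrative assumptions.
import numpy as np

def steering_vector(mic_pos, angle_rad, freq, c=343.0):
    """Far-field plane-wave steering vector for a linear array."""
    delays = mic_pos * np.sin(angle_rad) / c
    return np.exp(-2j * np.pi * freq * delays)

def mvdr_weights(R, d):
    """w = R^{-1} d / (d^H R^{-1} d): unit gain toward d, minimum output power."""
    Rinv_d = np.linalg.solve(R, d)
    return Rinv_d / (d.conj() @ Rinv_d)

rng = np.random.default_rng(0)
mics = np.arange(4) * 0.05                              # 4 mics, 5 cm spacing
f = 1000.0                                              # one frequency bin (Hz)
d_target = steering_vector(mics, np.deg2rad(0.0), f)    # assumed speech direction
d_noise = steering_vector(mics, np.deg2rad(60.0), f)    # assumed interferer direction

# Simulate noise-only snapshots (interferer + sensor noise) to estimate R,
# as would be done during segments where no speech event is detected.
n_snap = 2000
s_noise = rng.standard_normal(n_snap) + 1j * rng.standard_normal(n_snap)
X = np.outer(d_noise, s_noise) + 0.1 * (rng.standard_normal((4, n_snap))
                                        + 1j * rng.standard_normal((4, n_snap)))
R = X @ X.conj().T / n_snap

w = mvdr_weights(R, d_target)
gain_target = abs(w.conj() @ d_target)   # distortionless constraint: ~1.0
gain_noise = abs(w.conj() @ d_noise)     # interferer strongly attenuated
print(gain_target, gain_noise)
```

In the paper's setting, the look direction (here fixed at 0 degrees) would instead follow the speech-event location inferred by the Bayesian network, and the covariance estimate would be refreshed as the acoustic environment changes.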
Year
2004
DOI
10.1155/S1110865704402303
Venue
EURASIP J. Adv. Sig. Proc.
Keywords
robust speech interface, human tracking, environmental noise, Bayesian network, speech recognizer, speech signal, speech event, speech segment, video information fusion, maximum likelihood adaptive beamformer, video information, speech recognition, adaptive beamformer, sound localization
Field
Speech processing, Adaptive beamformer, Speech coding, Computer science, Voice activity detection, Audio mining, Speech recognition, Codec2, Linear predictive coding, Acoustic model
DocType
Journal
Volume
2004
Issue
11
ISSN
1687-6180
Citations
22
PageRank
2.46
References
11
Authors
8
Name               Order  Citations  PageRank
Futoshi Asano      1      502        51.20
Kiyoshi Yamamoto   2      73         9.76
Isao Hara          3      105        9.79
Jun Ogata          4      46         6.07
Takashi Yoshimura  5      30         4.04
Yoichi Motomura    6      312        40.26
Naoyuki Ichimura   7      78         16.80
Hideki Asoh        8      705        89.85