Speech recognition in living rooms: Integrated speech enhancement and recognition system based on spatial, spectral and temporal modeling of sounds

Paper Info

Title
Speech recognition in living rooms: Integrated speech enhancement and recognition system based on spatial, spectral and temporal modeling of sounds

Abstract
Research on noise robust speech recognition has mainly focused on dealing with relatively stationary noise that may differ from the noise conditions in most living environments. In this paper, we introduce a recognition system that can recognize speech in the presence of multiple rapidly time-varying noise sources as found in a typical family living room. To deal with such severe noise conditions, our recognition system exploits all available information about speech and noise; that is spatial (directional), spectral and temporal information. This is realized with a model-based speech enhancement pre-processor, which consists of two complementary elements, a multi-channel speech-noise separation method that exploits spatial and spectral information, followed by a single channel enhancement algorithm that uses the long-term temporal characteristics of speech obtained from clean speech examples. Moreover, to compensate for any mismatch that may remain between the enhanced speech and the acoustic model, our system employs an adaptation technique that combines conventional maximum likelihood linear regression with the dynamic adaptive compensation of the variance of the Gaussians of the acoustic model. Our proposed system approaches human performance levels by greatly improving the audible quality of speech and substantially improving the keyword recognition accuracy.

Year	DOI	Venue
2013	10.1016/j.csl.2012.07.006	Computer Speech & Language
Keywords	Field	DocType
integrated speech enhancement,noise robust speech recognition,time-varying noise source,model-based speech enhancement pre-processor,stationary noise,recognition system,severe noise condition,noise condition,temporal modeling,enhanced speech,clean speech example,acoustic model	Speech enhancement,Speech processing,Recognition system,Computer science,Voice activity detection,Communication channel,Speech recognition,Temporal modeling,Linear predictive coding,Acoustic model	Journal
Volume	Issue	ISSN
27	3	0885-2308
Citations	PageRank	References
11	0.63	27
Authors
14

Authors (14 rows)

Name	Order	Citations	PageRank
Marc Delcroix	1	699	62.07
Keisuke Kinoshita	2	494	54.81
Tomohiro Nakatani	3	1327	139.18
Shoko Araki	4	1726	158.79
Atsunori Ogawa	5	151	25.35
Takaaki Hori	6	408	45.58
Shinji Watanabe	7	1158	139.38
Masakiyo Fujimoto	8	393	34.28
Takuya Yoshioka	9	585	49.20
Takanobu Oba	10	53	12.09
Yotaro Kubo	11	61	5.47
Mehrez Souden	12	195	14.68
Seongjun Hahm	13	73	8.20
Atsushi Nakamura	14	11	0.97
