Abstract |
---|
A computational auditory scene analysis (CASA) system is described, in which sound separation according to spatial location is combined with the 'missing data' approach for automatic speech recognition. Time-frequency masks for the missing data recognizer are derived from the statistics of interaural time and level differences; these masks identify acoustic features that constitute reliable evidence of the target speech signal. It is demonstrated that this approach yields good performance in a challenging environment, in which a target voice is contaminated by another talker and reverberation. The ability of the system to generalize to source-receiver configurations that were not encountered during training is discussed. |
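The abstract combines two ideas: a time-frequency mask derived from interaural cue statistics, and a recognizer that scores only the "reliable" features that mask selects. The sketch below illustrates both under stated assumptions. It is not the authors' implementation. The per-channel Gaussian models of ITD and ILD, the assumption that the two cues are independent, the target-versus-interferer likelihood comparison, and all function names (`reliability_mask`, `marginal_loglik`) are hypothetical choices made for illustration. The recognizer side uses plain marginalization over a diagonal-covariance Gaussian, a common form of missing-data scoring, and not necessarily the variant used in the paper.

```python
import math

def gaussian_logpdf(x, mean, var):
    """Log density of a 1-D Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def reliability_mask(itd, ild, target_stats, interferer_stats):
    """Binary mask [channel][frame]: 1 where the target source is more likely.

    itd, ild: per-unit cue values, indexed [channel][frame].
    *_stats[ch]: hypothetical trained (mean_itd, var_itd, mean_ild, var_ild)
    for frequency channel ch. ITD and ILD are assumed independent here.
    """
    mask = []
    for ch, (itd_row, ild_row) in enumerate(zip(itd, ild)):
        t, n = target_stats[ch], interferer_stats[ch]
        row = []
        for x_itd, x_ild in zip(itd_row, ild_row):
            log_t = gaussian_logpdf(x_itd, t[0], t[1]) + gaussian_logpdf(x_ild, t[2], t[3])
            log_n = gaussian_logpdf(x_itd, n[0], n[1]) + gaussian_logpdf(x_ild, n[2], n[3])
            row.append(1 if log_t > log_n else 0)  # reliable if target dominates
        mask.append(row)
    return mask

def marginal_loglik(frame, mask_col, means, variances):
    """Diagonal-Gaussian log-likelihood over reliable features only.

    Unreliable features (mask 0) are integrated out; for a diagonal
    Gaussian that simply drops their terms from the sum.
    """
    total = 0.0
    for x, reliable, mu, var in zip(frame, mask_col, means, variances):
        if reliable:
            total += gaussian_logpdf(x, mu, var)
    return total

if __name__ == "__main__":
    # Toy example: target straight ahead (ITD 0 ms, ILD 0 dB),
    # interferer to one side (ITD 0.5 ms, ILD 6 dB); 2 channels x 2 frames.
    target = [(0.0, 0.01, 0.0, 4.0)] * 2
    interferer = [(0.5, 0.01, 6.0, 4.0)] * 2
    itd = [[0.0, 0.5], [0.05, 0.45]]
    ild = [[0.0, 6.0], [1.0, 5.0]]
    mask = reliability_mask(itd, ild, target, interferer)
    print(mask)
    # Score frame 0 (one feature per channel) against one hypothetical state.
    frame0 = [1.0, 100.0]
    mask_col = [mask[ch][0] for ch in range(len(mask))]
    print(marginal_loglik(frame0, mask_col, [1.0, 0.0], [1.0, 1.0]))
```

In this toy setup, units whose cues sit near the target's trained statistics are marked reliable, and the recognizer's score for a frame depends only on those units. A large value in an unreliable unit (here `100.0`) therefore has no effect on the likelihood.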
Year | DOI | Venue |
---|---|---|
2006 | 10.1109/ICASSP.2006.1661434 | 2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13 |
Keywords | Field | DocType |
---|---|---|
computational auditory scene analysis, robustness, missing data, image analysis, time frequency analysis, time frequency, automatic speech recognition, reverberation, statistics, speech coding, speech recognition | Sound separation, Speech coding, Computer science, Robustness (computer science), Artificial intelligence, Missing data, Computational auditory scene analysis, Reverberation, Pattern recognition, Speech recognition, Time–frequency analysis, Binaural recording, Statistics | Conference |
ISSN | Citations | PageRank |
---|---|---|
1520-6149 | 1 | 0.46 |
References | Authors |
---|---|
4 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Guy J. Brown | 1 | 760 | 97.54 |
Sue Harding | 2 | 37 | 4.49 |
Jon P. Barker | 3 | 48 | 4.74 |