Abstract | ||
---|---|---|
Speech stream segregation is presented as a new speech enhancement for automatic speech recognition. Two issues are addressed: speech stream segregation from a mixture of sounds, and interfacing speech stream segregation with automatic speech recognition. Speech stream segregation is modeled as a process of extracting harmonic fragments, grouping these extracted harmonic fragments, and substituting non-harmonic residue for non-harmonic parts of groups. The main problem in interfacing speech stream segregation with HMM-based speech recognition is how to reduce the degradation of recognition performance due to spectral distortion of segregated sounds, which is caused mainly by the transfer function of a binaural input. Our solution is to re-train the parameters of HMM with training data binauralized for four directions. Experiments with 500 mixtures of two women's utterances of a word showed that the cumulative accuracy of word recognition up to the 10th candidate of each woman's utterance is, on average, 75%. |
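
The "cumulative accuracy up to the 10th candidate" reported in the abstract can be read as the fraction of utterances whose correct word appears among the recognizer's top 10 candidates. The following is a minimal sketch of that metric under this reading; the function name `cumulative_accuracy`, its parameters, and the toy word lists are hypothetical illustrations and are not taken from the paper.

```python
from typing import List, Sequence


def cumulative_accuracy(
    ranked: Sequence[Sequence[str]],
    truth: Sequence[str],
    max_rank: int = 10,
) -> List[float]:
    """Return accuracy at ranks 1..max_rank (index 0 holds rank 1).

    ranked[i] is the recognizer's candidate list for utterance i, best first.
    truth[i] is the correct word for utterance i.
    """
    if len(ranked) != len(truth):
        raise ValueError("ranked and truth must have the same length")
    if not truth:
        raise ValueError("at least one utterance is required")

    hits = [0] * max_rank
    for candidates, word in zip(ranked, truth):
        top = list(candidates[:max_rank])
        if word in top:
            first = top.index(word)
            # A hit at rank r also counts for every rank beyond r.
            for k in range(first, max_rank):
                hits[k] += 1

    n = len(truth)
    return [h / n for h in hits]


# Toy data (made up for illustration): three utterances, two candidates each.
toy_ranked = [["a", "b"], ["c", "d"], ["x", "y"]]
toy_truth = ["a", "d", "z"]
toy_result = cumulative_accuracy(toy_ranked, toy_truth, max_rank=2)
```

With the toy data, one of three utterances is correct at rank 1 and two of three within rank 2, so the last element of the returned list is the cumulative figure at the maximum rank.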
Year | DOI | Venue |
---|---|---|
1996 | 10.1109/ICSLP.1996.607281 | ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4 |
Keywords | Field | DocType |
hidden markov models,layout,cumulant,feature extraction,speech processing,training data,automatic speech recognition,speech recognition,transfer function,word recognition,human voice | Speech enhancement,Speech processing,Human voice,Computer science,Audio mining,Word recognition,Speech recognition,Feature extraction,Hidden Markov model,Acoustic model | Conference |
Citations | PageRank | References |
3 | 1.09 | 8 |
Authors | ||
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Hiroshi G. Okuno | 1 | 2092 | 233.19 |
Tomohiro Nakatani | 2 | 1327 | 139.18 |
Takeshi Kawabata | 3 | 296 | 51.73 |