Abstract | ||
---|---|---|
Time-frequency mask estimation has shown considerable success recently. In this paper, we demonstrate its utility as a feature enhancement frontend for large vocabulary conversational speech recognition. Additionally, we investigate how masking compares with feature denoising, which directly reconstructs clean features from noisy ones. We train a mask estimator that predicts ideal ratio masks. Experimental results on Google voice search evaluation sets demonstrate that masking is superior to feature denoising, and that a lightweight masking frontend produces significant improvements over a strong baseline. We also show that masking improves the performance of a multi-condition trained (MTR) acoustic model. |
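
For readers unfamiliar with the training target named in the abstract, the following is a minimal sketch of one common ideal ratio mask (IRM) definition, not necessarily the exact formulation used in the paper. It assumes per-bin speech and noise power are available separately (as they are when noisy training data is simulated), that speech and noise add in the power domain, and that the exponent `beta = 0.5` is used; the function names `ideal_ratio_mask` and `apply_mask` are hypothetical.

```python
import numpy as np


def ideal_ratio_mask(speech_power, noise_power, beta=0.5, eps=1e-12):
    """Common IRM definition (an assumption, not taken from the paper):
    (S / (S + N)) ** beta per time-frequency bin, with values in [0, 1]."""
    speech_power = np.asarray(speech_power, dtype=float)
    noise_power = np.asarray(noise_power, dtype=float)
    # eps guards against division by zero in bins with no energy at all.
    return (speech_power / (speech_power + noise_power + eps)) ** beta


def apply_mask(noisy_features, mask):
    """Masking frontend: scale each noisy time-frequency bin by its mask value."""
    return np.asarray(noisy_features, dtype=float) * mask


# Toy 2x2 "spectrogram" (frames x frequency bins) of power values.
speech = np.array([[4.0, 0.0], [1.0, 9.0]])
noise = np.array([[0.0, 2.0], [3.0, 0.0]])

mask = ideal_ratio_mask(speech, noise)

# Under the additive-power assumption the noisy magnitude is sqrt(S + N),
# so with beta = 0.5 the masked magnitude equals the clean magnitude sqrt(S).
noisy_magnitude = np.sqrt(speech + noise)
enhanced_magnitude = apply_mask(noisy_magnitude, mask)
```

In a masking frontend, a network is trained to predict `mask` from noisy features and the prediction is applied as above. Feature denoising, by contrast, trains the network to output the clean features directly, so its targets are unbounded rather than confined to [0, 1].
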
Year | Venue | Keywords |
---|---|---|
2015 | 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5 | Robust speech recognition, time-frequency masking, deep neural network, feature denoising |
Field | DocType | Citations |
Pattern recognition,Computer science,Speech recognition,Artificial intelligence,Time frequency masking | Conference | 2 |
PageRank | References | Authors |
0.36 | 6 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Yu-Xuan Wang | 1 | 650 | 32.68 |
Ananya Misra | 2 | 77 | 11.46 |
Kean K. Chin | 3 | 42 | 3.49 |