Abstract | ||
---|---|---|
Real telephony speech recognition is challenging because of 1) diverse channel distortions and 2) limited access to real data owing to data privacy considerations. In this paper, assuming no real telephony data are available, we employ a data augmentation method based on diversified audio codec simulation to train a telephony speech recognition system. Specifically, we assume only wide-band 16 kHz data are available. We first down-sample the 16 kHz data to 8 kHz, then pass the down-sampled data through various categories of audio codecs to simulate real channel distortion, and train our speech recognition system on the resulting distorted data. To analyze the effectiveness of different audio codec simulation methods, we classify them into three main categories according to their distortion severity, as judged by spectrogram analysis. We conduct experiments on various real telephony test sets to show the effectiveness of the proposed data augmentation method. The results show that real data are closer to highly distorted simulation data: the model trained with highly distorted data reduces the word error rate (WER) by 7.28%-12.78% compared to the baseline. |
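
The two-step augmentation described in the abstract (down-sample 16 kHz audio to 8 kHz, then pass it through a codec) can be illustrated with a minimal sketch. The abstract does not name the codecs used, so the mu-law companding round trip below is only an assumed stand-in for one mild, G.711-style distortion and is not the authors' implementation. The function names `downsample_16k_to_8k` and `mu_law_roundtrip`, and the synthetic test tone, are hypothetical.

```python
import numpy as np
from scipy.signal import resample_poly


def downsample_16k_to_8k(wave_16k):
    """Halve the sampling rate with a polyphase anti-aliasing filter."""
    return resample_poly(wave_16k, up=1, down=2)


def mu_law_roundtrip(wave, mu=255, bits=8):
    """Compress, quantize, and expand a waveform with mu-law companding.

    Assumption: this approximates the quantization noise of a simple
    telephony codec; it is not a bit-exact codec implementation.
    """
    x = np.clip(wave, -1.0, 1.0)
    # Compress: logarithmic mapping of amplitudes onto [-1, 1].
    compressed = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    # Quantize the compressed signal to 2**bits uniform levels.
    levels = 2 ** bits
    quantized = np.round((compressed + 1) / 2 * (levels - 1)) / (levels - 1) * 2 - 1
    # Expand: invert the logarithmic mapping.
    return np.sign(quantized) * np.expm1(np.abs(quantized) * np.log1p(mu)) / mu


# Synthetic stand-in for one second of wide-band speech: a 440 Hz tone at 16 kHz.
t = np.arange(16000) / 16000.0
wave_16k = 0.5 * np.sin(2 * np.pi * 440.0 * t)

wave_8k = downsample_16k_to_8k(wave_16k)   # step 1: 16 kHz -> 8 kHz
distorted = mu_law_roundtrip(wave_8k)      # step 2: simulated codec distortion
```

In a fuller setup, each training utterance would presumably be routed through one of several codec simulations of differing severity, and `distorted` would replace the clean waveform as acoustic-model input.
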
Year | DOI | Venue |
---|---|---|
2019 | 10.1109/APSIPAASC47483.2019.9023257 | 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) |
Keywords | DocType | ISSN |
data privacy,telephony data,data augmentation,telephony speech recognition system,channel distortion,highly distorted simulation data,audio codec simulation,distortion severity classification,spectrogram analysis | Conference | 2640-009X |
ISBN | Citations | PageRank |
978-1-7281-3249-5 | 0 | 0.34 |
References | Authors |
0 | 4 |

Name | Order | Citations | PageRank |
---|---|---|---|
Thi-Ly Vu | 1 | 0 | 0.34 |
Zhiping Zeng | 2 | 1 | 3.06 |
Haihua Xu | 3 | 55 | 11.41 |
Eng Siong Chng | 4 | 970 | 106.33 |