Title
Audio Codec Simulation based Data Augmentation for Telephony Speech Recognition
Abstract
Real-world telephony speech recognition is challenging because of 1) diverse channel distortions and 2) limited access to real data owing to data privacy considerations. In this paper, assuming no real telephony data are available, we employ a data augmentation method based on diversified audio codec simulation to train a telephony speech recognition system. Specifically, we assume only wide-band 16 kHz data are available. We first down-sample the 16 kHz data to 8 kHz, then pass the down-sampled data through various categories of audio codecs to simulate real channel distortion, and finally train our speech recognition system on the distorted data. To analyze the effectiveness of different audio codec simulation methods, we classify them into three main categories according to their distortion severity, as judged from spectrogram analysis. We conduct experiments on various real telephony test sets to show the effectiveness of the proposed data augmentation method. The results show that real data are closer to highly distorted simulated data: the model trained with highly distorted data reduces the word error rate (WER) by 7.28%-12.78% compared to the baseline.
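The pipeline described in the abstract (down-sample 16 kHz audio to 8 kHz, then pass it through a codec) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the codecs used, so G.711-style mu-law companding stands in here as one assumed example of a telephony codec. The function names `downsample_16k_to_8k`, `mu_law_codec`, and `simulate_telephony` are hypothetical, and the simple windowed-sinc filter is only a placeholder for a production resampler.

```python
import numpy as np


def downsample_16k_to_8k(x, num_taps=101):
    """Low-pass filter at 4 kHz, then keep every second sample."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    # Windowed-sinc low-pass; cutoff is a quarter of the 16 kHz sample rate.
    h = 0.5 * np.sinc(0.5 * n) * np.hamming(num_taps)
    h /= h.sum()  # normalize to unity gain at DC
    return np.convolve(x, h, mode="same")[::2]


def mu_law_codec(x, mu=255, bits=8):
    """Assumed stand-in codec: mu-law compress, quantize, and expand."""
    x = np.clip(x, -1.0, 1.0)
    compressed = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    # Uniform quantization of the compressed signal to 2**bits levels.
    steps = 2 ** bits - 1
    quantized = np.round((compressed + 1) / 2 * steps) / steps * 2 - 1
    # Inverse mu-law expansion back to the linear amplitude domain.
    return np.sign(quantized) * np.expm1(np.abs(quantized) * np.log1p(mu)) / mu


def simulate_telephony(x_16k):
    """Down-sample wide-band audio, then apply the simulated codec."""
    return mu_law_codec(downsample_16k_to_8k(x_16k))


# Usage: one second of a 440 Hz tone sampled at 16 kHz.
t = np.arange(16000) / 16000
tone = 0.5 * np.sin(2 * np.pi * 440 * t)
distorted = simulate_telephony(tone)
```

In practice one would likely swap the stand-in codec for real encoders and decoders, and vary the codec per utterance so that the training data covers several distortion severities.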
Year
2019
DOI
10.1109/APSIPAASC47483.2019.9023257
Venue
2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
Keywords
data privacy, telephony data, data augmentation, telephony speech recognition system, channel distortion, highly distorted simulation data, audio codec simulation, distortion severity classification, spectrogram analysis
DocType
Conference
ISSN
2640-009X
ISBN
978-1-7281-3249-5
Citations
0
PageRank
0.34
References
0
Authors
4
Name              Order   Citations   PageRank
Thi-Ly Vu         1       0           0.34
Zhiping Zeng      2       1           3.06
Haihua Xu         3       55          11.41
Eng Siong Chng    4       970         106.33