MIXSPEECH: DATA AUGMENTATION FOR LOW-RESOURCE AUTOMATIC SPEECH RECOGNITION - Citegraph

Paper Info

Title
MIXSPEECH: DATA AUGMENTATION FOR LOW-RESOURCE AUTOMATIC SPEECH RECOGNITION

Abstract
In this paper, we propose MixSpeech, a simple yet effective data augmentation method based on mixup for automatic speech recognition (ASR). MixSpeech trains an ASR model by taking a weighted combination of two different speech features (e.g., mel-spectrograms or MFCC) as the input and recognizing both text sequences, where the two recognition losses are combined with the same weight used for the inputs. We apply MixSpeech to two popular end-to-end speech recognition models, LAS (Listen, Attend and Spell) and Transformer, and conduct experiments on several low-resource datasets, including TIMIT, WSJ, and HKUST. Experimental results show that MixSpeech achieves better accuracy than the baseline models without data augmentation and outperforms a strong data augmentation method, SpecAugment, on these recognition tasks. Specifically, MixSpeech outperforms SpecAugment with a relative PER improvement of 10.6% on the TIMIT dataset and achieves a strong WER of 4.7% on the WSJ dataset.
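
The mixing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `mix_features` and `mix_losses`, the zero-padding of the shorter utterance, and the Beta(0.5, 0.5) draw for the weight are all assumptions made for illustration (the abstract only states that the inputs and the two losses share one combination weight).

```python
import numpy as np


def mix_features(x_i, x_j, lam):
    """Weighted combination of two feature matrices shaped (frames, bins).

    Assumption: the shorter utterance is zero-padded along the time axis
    so that both inputs have the same number of frames.
    """
    n_frames = max(x_i.shape[0], x_j.shape[0])

    def pad(x):
        # Pad only at the end of the time axis; np.pad fills with zeros by default.
        return np.pad(x, ((0, n_frames - x.shape[0]), (0, 0)))

    return lam * pad(x_i) + (1.0 - lam) * pad(x_j)


def mix_losses(loss_i, loss_j, lam):
    """Combine the two recognition losses with the same weight as the inputs."""
    return lam * loss_i + (1.0 - lam) * loss_j


# Hypothetical usage with random stand-ins for 80-bin mel-spectrograms.
rng = np.random.default_rng(0)
lam = rng.beta(0.5, 0.5)  # assumed prior for the weight, as is common in mixup
x_i = rng.normal(size=(120, 80))
x_j = rng.normal(size=(95, 80))
x_mix = mix_features(x_i, x_j, lam)

# In training, loss_i and loss_j would be the model's losses on x_mix against
# the transcripts of utterance i and utterance j; the values here are placeholders.
total_loss = mix_losses(2.0, 4.0, lam)
```

The key design point is that a single weight drives both the input mixture and the loss mixture, so an utterance that contributes more to the input also contributes more to the training signal.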

Year	2021
DOI	10.1109/ICASSP39728.2021.9414483
Venue	2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021)
Keywords	Speech Recognition, Data Augmentation, Low-resource, Mixup
DocType	Conference
Citations	0
PageRank	0.34
References	8
Authors	6

Authors (6 rows)

Name	Order	Citations	PageRank
Linghui Meng	1	0	1.69
Jin Xu	2	6	3.22
Xu Tan	3	88	23.94
Jindong Wang	4	247	16.56
Tao Qin	5	2384	147.25
Bo Xu	6	130	9.43
