SpecAugment on Large Scale Datasets - Citegraph

Paper Info

Title
SpecAugment on Large Scale Datasets

Abstract
Recently, SpecAugment, an augmentation scheme for automatic speech recognition that acts directly on the spectrogram of input utterances, has been shown to be highly effective in enhancing the performance of end-to-end networks on public datasets. In this paper, we demonstrate its effectiveness on tasks with large scale datasets by investigating its application to the Google Multidomain Dataset (Narayanan et al., 2018). We achieve improvement across all test domains by mixing raw training data augmented with SpecAugment and noise-perturbed training data when training the acoustic model. We also introduce a modification of SpecAugment that adapts the time mask size and/or multiplicity depending on the length of the utterance, which can potentially benefit large scale tasks. By using adaptive masking, we are able to further improve the performance of the Listen, Attend and Spell model on LibriSpeech to 2.2% WER on test-clean and 5.2% WER on test-other.
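
The adaptive masking idea in the abstract, where the time mask size and/or the number of time masks grows with utterance length, could be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the function name `adaptive_time_mask`, the parameters `size_ratio`, `mult_ratio`, and `max_masks`, their default values, and the choice to fill masked frames with zeros are all assumptions made for this example and are not stated on this page.

```python
import numpy as np


def adaptive_time_mask(spec, size_ratio=0.04, mult_ratio=0.04, max_masks=20, rng=None):
    """Apply time masks to a (time, freq) spectrogram, scaled to its length.

    All parameter names and defaults are hypothetical, chosen for illustration.
    """
    rng = np.random.default_rng() if rng is None else rng
    num_frames = spec.shape[0]

    # Adaptive size: the maximum mask width is a fraction of the utterance length.
    max_width = int(size_ratio * num_frames)
    # Adaptive multiplicity: longer utterances get more masks, up to a cap.
    num_masks = min(max_masks, int(mult_ratio * num_frames))

    out = spec.copy()  # leave the caller's array untouched
    for _ in range(num_masks):
        width = int(rng.integers(0, max_width + 1))           # width in [0, max_width]
        start = int(rng.integers(0, num_frames - width + 1))  # mask stays inside the utterance
        out[start:start + width, :] = 0.0                     # assumed fill value
    return out
```

With these assumed defaults, a 500-frame utterance receives up to 20 masks of at most 20 frames each, while a 10-frame utterance receives none, so short inputs are not over-masked.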

Year	DOI	Venue	DocType	Citations	PageRank	References	Authors
2020	10.1109/ICASSP40776.2020.9053205	ICASSP	Conference	3	0.40	0	8

Authors (8 rows)

Name	Order	Citations	PageRank
Daniel S. Park	1	22	3.46
Yu Zhang	2	442	41.79
Chung-Cheng Chiu	3	248	28.00
Youzheng Chen	4	3	0.40
Bo Li	5	206	42.46
William Chan	6	357	24.67
Quoc V. Le	7	8501	366.59
Yonghui Wu	8	1065	72.78
