Paper Info

Title
AISPEECH-SJTU ASR SYSTEM FOR THE ACCENTED ENGLISH SPEECH RECOGNITION CHALLENGE

Abstract
This paper describes the AISpeech-SJTU ASR system for the Interspeech 2020 Accented English Speech Recognition Challenge (AESRC). The task is challenging because of variation in pronunciation accuracy, intonation, speaking rate, and the pronunciation of certain syllables. All participants were restricted to developing their systems using only the speech and text corpora provided by the organizer. To work around the resulting data scarcity, data augmentation was first explored, including noise simulation, SpecAugment, speed perturbation, and TTS-based simulation. Moreover, a state-of-the-art (SOTA) CNN-Transformer-based joint CTC-attention system was built, and accent adaptation was proposed to train an accent-robust system. Finally, the first-pass recognition hypotheses generated by the CTC head were rescored by forward and backward LSTM language models (LSTM-LMs) and by the attention head. Our system with the best configuration achieved second place in the challenge, with a word error rate (WER) of 4.00% on the dev set and 4.47% on the test set; the test-set WERs of the top-performing system, the second runner-up, and the official baseline are 4.06%, 4.52%, and 8.29%, respectively.
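The abstract says the CTC head's first-pass hypotheses were rescored by forward and backward LSTM-LMs and the attention head, but it does not give the combination rule. A common choice, assumed here, is a weighted (log-linear) sum of per-model log-probabilities over an N-best list. The names `Hypothesis`, `combined_score`, `rescore`, and `WEIGHTS`, the weight values, and the toy scores are all hypothetical; this is a minimal sketch, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Hypothesis:
    """One entry of a first-pass N-best list with per-model log-probabilities."""
    text: str
    ctc: float     # log-prob from the CTC head (first pass)
    att: float     # log-prob from the attention decoder head
    fwd_lm: float  # log-prob under a left-to-right LSTM-LM
    bwd_lm: float  # log-prob under a right-to-left LSTM-LM


# Hypothetical interpolation weights; in practice they would be tuned on a dev set.
WEIGHTS: Dict[str, float] = {"ctc": 0.3, "att": 0.7, "fwd_lm": 0.25, "bwd_lm": 0.25}


def combined_score(h: Hypothesis, w: Dict[str, float]) -> float:
    """Weighted sum of the four model log-probabilities (assumed log-linear rule)."""
    return (w["ctc"] * h.ctc + w["att"] * h.att
            + w["fwd_lm"] * h.fwd_lm + w["bwd_lm"] * h.bwd_lm)


def rescore(nbest: List[Hypothesis],
            w: Dict[str, float] = WEIGHTS) -> List[Tuple[float, str]]:
    """Return (score, text) pairs sorted from best to worst combined score."""
    scored = [(combined_score(h, w), h.text) for h in nbest]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    # Toy N-best list with made-up scores (not taken from the paper).
    nbest = [
        Hypothesis("i want to go hole", ctc=-3.5, att=-5.0, fwd_lm=-14.0, bwd_lm=-13.0),
        Hypothesis("i want to go home", ctc=-4.0, att=-3.0, fwd_lm=-10.0, bwd_lm=-11.0),
    ]
    for score, text in rescore(nbest):
        print(f"{score:.2f}  {text}")
```

In this toy list the CTC head alone prefers "i want to go hole" (higher CTC log-prob), but the attention head and both LMs favor "i want to go home", so the rescored ranking flips. Scoring each hypothesis in both directions lets the backward LM contribute right-context evidence that a left-to-right LM cannot see.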

Year	DOI	Venue
2021	10.1109/ICASSP39728.2021.9414471	2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021)
Keywords	DocType	Citations
accent speech recognition, accent adaptation, data augmentation, RNNLM	Conference	0
PageRank	References	Authors
0.34	0	6

Authors

Name	Order	Citations	PageRank
Tian Tan	1	99	6.27
Yizhou Lu	2	1	3.72
Rao Ma	3	0	1.35
Sen Zhu	4	0	0.34
Jiaqi Guo	5	0	1.01
Yanmin Qian	6	295	44.44
