End-To-End Multilingual Speech Recognition System With Language Supervision Training - Citegraph

Paper Info

Title
End-To-End Multilingual Speech Recognition System With Language Supervision Training

Abstract
End-to-end (E2E) multilingual automatic speech recognition (ASR) systems aim to recognize speech in multiple languages within a unified framework. In current E2E multilingual ASR frameworks, the output prediction for a specific language is not constrained to the scope of that language's modeling units. This paper proposes a language supervision training strategy that uses language masks to constrain the output distribution of the neural network. To simulate multilingual ASR when the language identity is unknown, a language identification (LID) classifier is used to estimate the language masks. On four Babel corpora, the proposed E2E multilingual ASR system achieved an average absolute word error rate (WER) reduction of 2.6% compared with the multilingual baseline system.
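
The idea of a language mask constraining the output distribution can be illustrated with a minimal sketch. This is not the authors' implementation: the five-unit vocabulary, the language names `lang_a` and `lang_b`, the helper names `masked_softmax` and `estimate_mask`, and the choice to pick the mask of the most probable language are all assumptions made for illustration. The abstract does not say how the LID output is turned into a mask.

```python
import math

# Hypothetical shared vocabulary of 5 modeling units: index 0 is shared,
# indices 1-2 belong to lang_a, indices 3-4 belong to lang_b.
LANGUAGE_MASKS = {
    "lang_a": [1, 1, 1, 0, 0],
    "lang_b": [1, 0, 0, 1, 1],
}


def masked_softmax(logits, mask):
    """Softmax over `logits` restricted to positions where `mask` is 1.

    Masked-out units receive probability 0, so the distribution is
    confined to the modeling units of one language.
    """
    if len(logits) != len(mask):
        raise ValueError("logits and mask must have the same length")
    allowed = [x for x, m in zip(logits, mask) if m]
    if not allowed:
        raise ValueError("mask must allow at least one unit")
    peak = max(allowed)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) if m else 0.0 for x, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def estimate_mask(lid_posteriors, language_masks):
    """Pick the mask of the most probable language (an assumed strategy).

    `lid_posteriors` maps language name to the probability produced by
    some LID classifier, which is not modeled here.
    """
    best_language = max(lid_posteriors, key=lid_posteriors.get)
    return language_masks[best_language]


if __name__ == "__main__":
    logits = [0.5, 2.0, 1.0, 3.0, 0.1]
    # Known language identity: use its mask directly.
    print(masked_softmax(logits, LANGUAGE_MASKS["lang_a"]))
    # Unknown language identity: estimate the mask from LID posteriors.
    mask = estimate_mask({"lang_a": 0.2, "lang_b": 0.8}, LANGUAGE_MASKS)
    print(masked_softmax(logits, mask))
```

In this toy example the unmasked logits favor unit 3, which belongs to `lang_b`. With the `lang_a` mask applied, that unit gets zero probability and the most likely unit becomes index 1.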

Year	2020
DOI	10.1587/transinf.2019EDL8214
Venue	IEICE Transactions on Information and Systems
Keywords	multilingual speech recognition, language-adaptive training, hybrid attention/CTC
DocType	Journal
Volume	E103D
Issue	6
ISSN	1745-1361
Citations	0
PageRank	0.34
References	0
Authors	3

Authors (3 rows)

Name	Order	Citations	PageRank
Danyang Liu	1	0	0.34
Ji Xu	2	3	4.14
Pengyuan Zhang	3	50	19.46
