Title
Multilingual sequence-to-sequence speech recognition: architecture, transfer learning, and language modeling.
Abstract
The sequence-to-sequence (seq2seq) approach to low-resource ASR is a relatively new direction in speech research. The approach benefits from model training that requires neither a lexicon nor alignments. However, this poses a new problem: more data are required compared to conventional DNN-HMM systems. In this work, we use data from 10 BABEL languages to build a multilingual seq2seq model as a prior model, and then port it to 4 other BABEL languages using a transfer learning approach. We also explore different architectures for improving the prior multilingual seq2seq model. The paper also discusses the effect of integrating a recurrent neural network language model (RNNLM) with a seq2seq model during decoding. Experimental results show that transfer learning from the multilingual model brings substantial gains over monolingual models across all 4 BABEL languages. Incorporating an RNNLM also yields significant improvements in %WER and achieves recognition performance comparable to models trained with twice as much training data.
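A minimal sketch of the transfer-learning recipe the abstract describes, written in PyTorch rather than the authors' actual toolkit: the multilingual prior's weights are copied into a target-language model, the output projection is re-initialized for the new language's grapheme set (its shape no longer matches), and all parameters are then fine-tuned. The `Seq2SeqASR` class, its layer sizes, and the vocabulary sizes are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class Seq2SeqASR(nn.Module):
    """Toy encoder-decoder ASR model; sizes are placeholders."""
    def __init__(self, feat_dim=80, hidden=320, vocab_size=500):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden, num_layers=4, batch_first=True)
        self.decoder = nn.LSTMCell(hidden, hidden)
        self.output = nn.Linear(hidden, vocab_size)  # grapheme softmax layer

# Prior model trained on the pooled 10-language BABEL data
# (pretrained weights would be loaded here, e.g. via load_state_dict).
multilingual = Seq2SeqASR(vocab_size=500)

# Target-language model: same encoder/decoder, new grapheme inventory.
target = Seq2SeqASR(vocab_size=64)

# Transfer every layer except the shape-mismatched output projection.
pretrained = {k: v for k, v in multilingual.state_dict().items()
              if not k.startswith("output")}
target.load_state_dict(pretrained, strict=False)

# Fine-tune all parameters on the target-language data.
optimizer = torch.optim.Adam(target.parameters(), lr=1e-4)
```

At decoding time, the RNNLM is commonly integrated by shallow fusion, i.e. adding a weighted language-model log-probability to the seq2seq score of each beam-search hypothesis, with the weight tuned on held-out data.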
Year
2018
Venue
2018 IEEE Spoken Language Technology Workshop (SLT)
Keywords
Decoding, Training, Data models, Mathematical model, Convolution, Two dimensional displays, Speech recognition
DocType
Conference
Volume
abs/1810.03459
ISSN
2639-5479
Citations
1
PageRank
0.35
References
9
Authors
9
Name | Order | Citations | PageRank
Jaejin Cho | 1 | 8 | 2.91
Murali Karthick Baskar | 2 | 8 | 4.99
Ruizhi Li | 3 | 51 | 12.01
Matthew Wiesner | 4 | 5 | 2.85
Sri Harish Reddy Mallidi | 5 | 1 | 0.35
Nelson Yalta | 6 | 11 | 2.17
Martin Karafiát | 7 | 154 | 12.74
Shinji Watanabe | 8 | 1158 | 139.38
Takaaki Hori | 9 | 408 | 45.58