Title
CMU's IWSLT 2022 Dialect Speech Translation System.
Abstract
This paper describes CMU's submissions to the IWSLT 2022 dialect speech translation (ST) shared task for translating Tunisian-Arabic speech to English text. We use additional paired Modern Standard Arabic (MSA) data to directly improve the speech recognition (ASR) and machine translation (MT) components of our cascaded systems. We also augment the paired ASR data with pseudo translations via sequence-level knowledge distillation from an MT model and use these artificial triplet ST data to improve our end-to-end (E2E) systems. Our E2E models are based on the Multi-Decoder architecture with searchable hidden intermediates. We extend the Multi-Decoder by orienting the speech encoder towards the target language, applying ST supervision as a hierarchical connectionist temporal classification (CTC) multi-task objective. During inference, we apply joint decoding of the ST CTC and ST autoregressive decoder branches of our modified Multi-Decoder. Finally, we apply ROVER voting, posterior combination, and minimum Bayes-risk decoding with combined N-best lists to ensemble our various cascaded and E2E systems. Our best systems reached 20.8 and 19.5 BLEU on test2 (blind) and test1, respectively. Without any additional MSA data, we reached 20.4 and 19.2 BLEU on the same test sets.
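As a rough illustration of the ensembling step described above, the sketch below shows minimum Bayes-risk (MBR) decoding over an N-best list pooled from several systems. The helper name mbr_decode, the uniform weighting, and the use of sacrebleu's sentence_bleu as the utility function are illustrative assumptions under a simplified setup, not the authors' exact implementation.

```python
# Minimal MBR-decoding sketch over a combined N-best list (illustrative only).
from sacrebleu import sentence_bleu


def mbr_decode(nbest, weights=None):
    """Return the hypothesis with the highest expected utility.

    nbest   : candidate translations (strings) pooled from all systems
    weights : optional per-hypothesis weights (e.g., normalized model scores);
              uniform if None
    """
    if weights is None:
        weights = [1.0 / len(nbest)] * len(nbest)

    best_hyp, best_utility = None, float("-inf")
    for hyp in nbest:
        # Expected sentence-level BLEU of `hyp` against every pooled hypothesis,
        # treating the weighted N-best list as a pseudo-posterior over outputs.
        utility = sum(
            w * sentence_bleu(hyp, [ref]).score
            for ref, w in zip(nbest, weights)
        )
        if utility > best_utility:
            best_hyp, best_utility = hyp, utility
    return best_hyp


# Example: pool N-best outputs from cascaded and E2E systems, then pick the
# consensus translation.
pooled = [
    "the weather is nice today",
    "today the weather is nice",
    "weather is nice",
]
print(mbr_decode(pooled))
```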
Year: 2022
DOI: 10.18653/v1/2022.iwslt-1.27
Venue: International Conference on Spoken Language Translation (IWSLT)
DocType: Conference
Volume: Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022)
Citations: 0
PageRank: 0.34
References: 0
Authors: 9
Name               Order  Citations  PageRank
Brian Yan          1      0          1.01
Patrick Fernandes  2      0          1.01
Siddharth Dalmia   3      0          1.35
Jiatong Shi        4      1          4.08
Yifan Peng         5      0          0.68
Dan Berrebbi       6      0          1.35
Xinyi Wang         7      4          3.82
Graham Neubig      8      0          4.06
Shinji Watanabe    9      1158       139.38