Title
Two-pass Decoding and Cross-adaptation Based System Combination of End-to-end Conformer and Hybrid TDNN ASR Systems
Abstract
Fundamental modelling differences between hybrid and end-to-end (E2E) automatic speech recognition (ASR) systems create large diversity and complementarity among them. This paper investigates multi-pass rescoring and cross adaptation based system combination approaches for hybrid TDNN and Conformer E2E ASR systems. In multi-pass rescoring, state-of-the-art hybrid LF-MMI trained CNN-TDNN system featuring speed perturbation, SpecAugment and Bayesian learning hidden unit contributions (LHUC) speaker adaptation was used to produce initial N-best outputs before being rescored by the speaker adapted Conformer system using a 2-way cross system score interpolation. In cross adaptation, the hybrid CNN-TDNN system was adapted to the 1-best output of the Conformer system or vice versa. Experiments on the 300-hour Switchboard corpus suggest that the combined systems derived using either of the two system combination approaches outperformed the individual systems. The best combined system obtained using multi-pass rescoring produced statistically significant word error rate (WER) reductions of 2.5% to 3.9% absolute (22.5% to 28.9% relative) over the stand alone Conformer system on the NIST Hub5'00, Rt03 and Rt02 evaluation data.
Year
DOI
Venue
2022
10.21437/INTERSPEECH.2022-696
Conference of the International Speech Communication Association (INTERSPEECH)
DocType
Citations 
PageRank 
Conference
0
0.34
References 
Authors
0
10
Name
Order
Citations
PageRank
Mingyu Cui102.03
Jiajun Deng201.69
Shoukang Hu3610.90
Xurong Xie468.57
Tianzi Wang502.03
Shujie Hu601.35
Mengzhe Geng715.42
Boyang Xue800.68
Xunying Liu933052.46
Helen M. Meng101078172.82