Two-pass Decoding and Cross-adaptation Based System Combination of End-to-end Conformer and Hybrid TDNN ASR Systems - Citegraph

Paper Info

Title
Two-pass Decoding and Cross-adaptation Based System Combination of End-to-end Conformer and Hybrid TDNN ASR Systems

Abstract
Fundamental modelling differences between hybrid and end-to-end (E2E) automatic speech recognition (ASR) systems create large diversity and complementarity among them. This paper investigates multi-pass rescoring and cross-adaptation-based system combination approaches for hybrid TDNN and Conformer E2E ASR systems. In multi-pass rescoring, a state-of-the-art hybrid LF-MMI-trained CNN-TDNN system featuring speed perturbation, SpecAugment and Bayesian learning hidden unit contributions (LHUC) speaker adaptation was used to produce initial N-best outputs, which were then rescored by the speaker-adapted Conformer system using 2-way cross-system score interpolation. In cross adaptation, the hybrid CNN-TDNN system was adapted to the 1-best output of the Conformer system, or vice versa. Experiments on the 300-hour Switchboard corpus suggest that the combined systems derived using either of the two system combination approaches outperformed the individual systems. The best combined system, obtained using multi-pass rescoring, produced statistically significant word error rate (WER) reductions of 2.5% to 3.9% absolute (22.5% to 28.9% relative) over the standalone Conformer system on the NIST Hub5'00, Rt03 and Rt02 evaluation data.
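
The N-best rescoring step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact form of the 2-way score interpolation, so the linear, log-domain interpolation below is an assumption, and the names `Hypothesis`, `interpolate`, `rescore_nbest` and `weight`, as well as all scores, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    hybrid_score: float     # log-domain score from the first-pass hybrid system (illustrative)
    conformer_score: float  # log-domain score from the second-pass Conformer system (illustrative)


def interpolate(hyp: Hypothesis, weight: float) -> float:
    """Assumed linear interpolation of the two systems' log-domain scores."""
    return (1.0 - weight) * hyp.hybrid_score + weight * hyp.conformer_score


def rescore_nbest(nbest: list[Hypothesis], weight: float = 0.5) -> Hypothesis:
    """Return the hypothesis with the highest interpolated score."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if not nbest:
        raise ValueError("nbest list is empty")
    return max(nbest, key=lambda h: interpolate(h, weight))


# Made-up 2-best list: the hybrid system prefers the first entry,
# the Conformer system prefers the second.
nbest = [
    Hypothesis("i see", hybrid_score=-10.0, conformer_score=-14.0),
    Hypothesis("icy", hybrid_score=-11.0, conformer_score=-9.0),
]

best = rescore_nbest(nbest, weight=0.5)
print(best.text)
```

With `weight=0.0` the sketch falls back to the first-pass ranking, and with `weight=1.0` it ranks purely by the second-pass scores; in practice such a weight would presumably be tuned on held-out data.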

Year	DOI	Venue
2022	10.21437/INTERSPEECH.2022-696	Conference of the International Speech Communication Association (INTERSPEECH)
DocType	Citations	PageRank
Conference	0	0.34
References	Authors
0	10

Authors (10 rows)

Name	Order	Citations	PageRank
Mingyu Cui	1	0	2.03
Jiajun Deng	2	0	1.69
Shoukang Hu	3	6	10.90
Xurong Xie	4	6	8.57
Tianzi Wang	5	0	2.03
Shujie Hu	6	0	1.35
Mengzhe Geng	7	1	5.42
Boyang Xue	8	0	0.68
Xunying Liu	9	330	52.46
Helen M. Meng	10	1078	172.82
