Performance-Efficiency Trade-Offs in Unsupervised Pre-Training for Speech Recognition. - Citegraph

Paper Info

Title
Performance-Efficiency Trade-Offs in Unsupervised Pre-Training for Speech Recognition.

Abstract
This paper is a study of performance-efficiency trade-offs in pre-trained models for automatic speech recognition (ASR). We focus on wav2vec 2.0, and formalize several architecture designs that influence both the model performance and its efficiency. Putting together all our observations, we introduce SEW (Squeezed and Efficient Wav2vec), a pre-trained model architecture with significant improvements along both performance and efficiency dimensions across a variety of training setups. For example, under the 100h-960h semi-supervised setup on LibriSpeech, SEW achieves a 1.9x inference speedup compared to wav2vec 2.0, with a 13.5% relative reduction in word error rate. With a similar inference time, SEW reduces word error rate by 25-50% across different model sizes.
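The two headline figures in the abstract, a relative reduction in word error rate and an inference speedup, are both simple ratios against a baseline. The sketch below shows how such figures are commonly computed. The function names and the input numbers are hypothetical illustrations, not values or code from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the fraction by which WER drops relative to the baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


def inference_speedup(baseline_seconds: float, new_seconds: float) -> float:
    """Return how many times faster the new model is than the baseline."""
    if new_seconds <= 0:
        raise ValueError("new_seconds must be positive")
    return baseline_seconds / new_seconds


if __name__ == "__main__":
    # Hypothetical measurements chosen only to illustrate the arithmetic.
    reduction = relative_wer_reduction(baseline_wer=10.0, new_wer=8.65)
    speedup = inference_speedup(baseline_seconds=3.8, new_seconds=2.0)
    print(f"relative WER reduction: {reduction:.1%}")
    print(f"inference speedup: {speedup:.1f}x")
```

Note that a relative reduction is measured against the baseline error rate, so the same percentage corresponds to a smaller absolute change when the baseline WER is already low.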

Year: 2022
DOI: 10.1109/ICASSP43922.2022.9747432
Venue: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
DocType: Conference
Citations: 0
PageRank: 0.34
References: 0
Authors: 6

Authors (6 rows)

Name	Order	Citations	PageRank
Felix Wu	1	246	27.21
Kwangyoun Kim	2	2	4.11
Jing Pan	3	0	0.68
Kyu Han	4	0	0.34
Kilian Q. Weinberger	5	4072	227.22
Yoav Artzi	6	0	0.34
