Large scale weakly and semi-supervised learning for low-resource video ASR - Citegraph

Paper Info

Title
Large scale weakly and semi-supervised learning for low-resource video ASR

Abstract
Many semi- and weakly-supervised approaches have been investigated for overcoming the labeling cost of building high quality speech recognition systems. On the challenging task of transcribing social media videos in low-resource conditions, we conduct a large scale systematic comparison between two self-labeling methods on one hand, and weakly-supervised pretraining using contextual metadata on the other. We investigate distillation methods at the frame level and the sequence level for hybrid, encoder-only CTC-based, and encoder-decoder speech recognition systems on Dutch and Romanian languages using 27,000 and 58,000 hours of unlabeled audio respectively. Although all approaches improved upon their respective baseline WERs by more than 8%, sequence-level distillation for encoder-decoder models provided the largest relative WER reduction of 20% compared to the strongest data-augmented supervised baseline.

Year	DOI	Venue	DocType	Citations	PageRank	References	Authors
2020	10.21437/Interspeech.2020-1917	INTERSPEECH	Conference	0	0.34	0	10

Authors (10 rows)

Name	Order	Citations	PageRank
Kritika Singh	1	0	0.34
Vimal Manohar	2	0	0.34
Alex Xiao	3	3	2.44
Sergey Edunov	4	204	10.37
Ross B. Girshick	5	21921	927.22
Vitaliy Liptchinsky	6	8	3.16
Christian Fuegen	7	9	6.58
Yatharth Saraf	8	0	0.34
Geoffrey Zweig	9	3406	320.25
Abdel-rahman Mohamed	10	3772	266.13
