Paper Info

Title
TOWARDS DATA SELECTION ON TTS DATA FOR CHILDREN'S SPEECH RECOGNITION

Abstract
Although great progress has been made on automatic speech recognition (ASR) systems, children's speech recognition remains a challenging task. General ASR systems for children's speech suffer from the lack of corpora and the mismatch between children's and adults' speech. Previous efforts have tried to reduce this mismatch by applying normalization methods to generate modified adults' speech for ASR training. However, modified adults' data reflect the characteristics of children's speech only to a very limited extent. In this work, we adopt text-to-speech (TTS) data augmentation to improve the performance of a children's speech recognition system. We find that the children's TTS model generates speech of inconsistent quality because of children's nonstandard pronunciation of phonemes, and that the ASR system degrades when trained with these additional synthesized data. To solve this problem, we propose data selection strategies for the TTS-augmented data, which substantially boost the effectiveness of the synthesized data for children's ASR modeling. We show that the data selection strategy based on speaker embedding similarity performs best, achieving relative character error rate (CER) reductions of 14.0% and 14.7% on the child conversation and child reading test sets, respectively, compared with the baseline model trained on real data.
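The abstract does not describe how the speaker embedding similarity criterion is computed, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes speaker embeddings (for example, x-vectors) have already been extracted for every synthesized utterance and for each target child speaker's real utterances; each synthesized utterance is scored by cosine similarity against the centroid of its target speaker's real embeddings, and only the top fraction is kept. The function names, the centroid reference, and the `keep_ratio` parameter are all hypothetical choices made for illustration.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; returns 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid(vectors: List[List[float]]) -> List[float]:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def select_by_speaker_similarity(
    synth_embs: Dict[str, List[float]],        # hypothetical: utt id -> embedding of TTS output
    synth_speaker: Dict[str, str],             # hypothetical: utt id -> target speaker id
    real_embs: Dict[str, List[List[float]]],   # hypothetical: speaker id -> real-speech embeddings
    keep_ratio: float,                         # hypothetical: fraction of synthesized data to keep
) -> List[str]:
    """Keep the synthesized utterances whose embeddings best match the target speaker."""
    # Reference embedding per speaker: mean of that speaker's real-speech embeddings.
    refs = {spk: centroid(embs) for spk, embs in real_embs.items()}
    # Score each synthesized utterance against its own target speaker's reference.
    scores = {
        utt: cosine_similarity(emb, refs[synth_speaker[utt]])
        for utt, emb in synth_embs.items()
    }
    # Rank from most to least similar and keep the top fraction.
    ranked = sorted(scores, key=lambda utt: scores[utt], reverse=True)
    n_keep = int(len(ranked) * keep_ratio)
    return ranked[:n_keep]


if __name__ == "__main__":
    real = {"spk1": [[1.0, 0.0], [1.0, 0.2]]}
    synth = {"u1": [1.0, 0.1], "u2": [0.0, 1.0], "u3": [0.9, 0.2], "u4": [-1.0, 0.0]}
    speakers = {utt: "spk1" for utt in synth}
    print(select_by_speaker_similarity(synth, speakers, real, keep_ratio=0.5))
```

Ranking with a keep ratio is one possible design; a fixed similarity threshold would be an equally plausible alternative, and the abstract does not say which, if either, was used.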

Year: 2021
DOI: 10.1109/ICASSP39728.2021.9413930
Venue: 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021)
Keywords: children's speech recognition, data augmentation, text-to-speech, data selection
DocType: Conference
Citations: 0
PageRank: 0.34
References: 0
Authors: 6

Authors

Name	Order	Citations	PageRank
Wei Wang	1	0	0.34
Zhikai Zhou	2	0	0.68
Yizhou Lu	3	1	3.72
Hongji Wang	4	2	1.40
Chenpeng Du	5	0	1.69
Yanmin Qian	6	295	44.44
