Title
Data Augmentation Based on Frequency Warping for Recognition of Cleft Palate Speech
Abstract
In this paper, we present an automatic speech recognition (ASR) system for the speech of a person with a cleft lip and palate (CLP). Speech recognition accuracy for a person with CLP is lower than for a physically unimpaired (PU) person, both because CLP speech has characteristics that differ from those of PU speech and because the amount of available training data is quite limited. In the field of ASR for PU people, data augmentation and self-supervised learning have been studied to tackle this problem of data scarcity. We evaluate the effectiveness of those approaches for CLP speech recognition and propose a data augmentation technique based on frequency warping. The formants of CLP speech tend to fluctuate more than those of PU speech. To compensate for this large variation in formant components, our data augmentation method stretches or contracts the spectrogram along the frequency axis. Experimental results on an ASR task with two CLP subjects showed that both data augmentation and self-supervised learning were effective for CLP speech recognition, and that our proposed method further improved the performance of those two approaches when they were based on conventional SpecAugment techniques.
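The abstract describes stretching or contracting the spectrogram along the frequency axis but does not give the warping function. The following is a minimal sketch of one way this could look, assuming a single linear warping factor per utterance and linear interpolation between frequency bins. The names `warp_frequency` and `augment`, the factor `alpha`, and the sampling range 0.9 to 1.1 are hypothetical and are not taken from the paper.

```python
import numpy as np


def warp_frequency(spec, alpha):
    """Linearly warp a spectrogram along the frequency axis.

    spec  : array of shape (n_freq, n_frames)
    alpha : hypothetical warping factor; alpha > 1 stretches the spectrum
            toward higher bins, alpha < 1 contracts it toward lower bins.
    """
    n_freq = spec.shape[0]
    bins = np.arange(n_freq, dtype=float)
    # Output bin k reads from source position k / alpha.
    # Positions beyond the last bin are clipped, so a contraction
    # fills the top of the spectrum with the last bin's value.
    src = np.clip(bins / alpha, 0, n_freq - 1)
    warped = np.empty(spec.shape, dtype=float)
    for t in range(spec.shape[1]):
        # Linear interpolation of each frame at the warped positions.
        warped[:, t] = np.interp(src, bins, spec[:, t])
    return warped


def augment(spec, rng, low=0.9, high=1.1):
    """Apply a randomly drawn warp; the range is an assumption."""
    alpha = rng.uniform(low, high)
    return warp_frequency(spec, alpha)
```

In a training pipeline, a function like `augment` would typically be called on each utterance with a fresh random factor, for example `augment(spec, np.random.default_rng())`, so that the model sees the same utterance with slightly shifted formant positions.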
Year
2021
Venue
2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
Keywords
speech recognition, data augmentation, self-supervised learning, cleft lip and palate, dysarthria
DocType
Conference
ISSN
2640-009X
ISBN
978-1-6654-4162-9
Citations
0
PageRank
0.34
References
0
Authors
7