Title
Automatic Hypernasality Detection in Cleft Palate Speech Using CNN
Abstract
Automatic hypernasality detection in cleft palate speech can assist diagnosis by speech-language pathologists. This paper describes a feature-independent, end-to-end algorithm that uses a convolutional neural network (CNN) to detect hypernasality in cleft palate speech, taking the speech spectrogram as input. The average F1-scores for hypernasality detection are 0.9485 on a dataset of children's speech and 0.9746 on a dataset of adult speech. The experiments explore how spectral resolution influences detection performance: higher spectral resolution highlights the vocal tract characteristics of hypernasality, such as formants and spectral zeros. The CNN learns efficient features through two-dimensional filtering, whereas the feature extraction capability of shallow classifiers is limited. Compared with a deep neural network and shallow classifiers, the CNN achieves the highest F1-score, 0.9485. Among the network architectures compared, a convolutional filter of size 1 × 8 achieves the highest F1-score. This filter covers more frequency information and is better suited to hypernasality detection than filters of size 3 × 3, 4 × 4, 5 × 5, and 6 × 6. An analysis of hypernasality-sensitive vowels indicates that /i/ is the vowel most sensitive to hypernasality. The proposed CNN-based system also achieves better detection performance than the state-of-the-art systems in the literature. Experiments on a heterogeneous corpus show that the CNN handles speech variability better than the shallow classifiers.
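The remark that a 1 × 8 filter takes in more frequency information than a square filter can be illustrated with a minimal sketch. It is not the authors' network: the toy spectrogram, the averaging kernels, and the helper `conv2d_valid` are hypothetical, and the layout (rows as time frames, columns as frequency bins) is an assumption, since the abstract does not state which axis is frequency.

```python
def conv2d_valid(spec, kernel):
    """Valid-mode 2D cross-correlation, the sliding-window operation a CNN layer applies."""
    rows, cols = len(spec), len(spec[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    out = []
    for r in range(rows - k_rows + 1):
        out_row = []
        for c in range(cols - k_cols + 1):
            acc = 0.0
            for i in range(k_rows):
                for j in range(k_cols):
                    acc += spec[r + i][c + j] * kernel[i][j]
            out_row.append(acc)
        out.append(out_row)
    return out


# Toy "spectrogram": 4 time frames (rows) x 16 frequency bins (columns).
# Assumed layout and made-up values, for illustration only.
spec = [[float((t + f) % 5) for f in range(16)] for t in range(4)]

# 1 x 8 averaging kernel: one time frame, eight adjacent frequency bins.
wide = [[1.0 / 8] * 8]
# 3 x 3 averaging kernel: three time frames, three adjacent frequency bins.
square = [[1.0 / 9] * 3 for _ in range(3)]

wide_out = conv2d_valid(spec, wide)
square_out = conv2d_valid(spec, square)

print(len(wide_out), len(wide_out[0]))      # output shape for the 1 x 8 kernel
print(len(square_out), len(square_out[0]))  # output shape for the 3 x 3 kernel
```

Under this assumed layout, each output of the 1 × 8 kernel summarizes eight frequency bins of a single frame, while each output of the 3 × 3 kernel mixes three bins across three frames.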
Year
2019
DOI
10.1007/s00034-019-01141-x
Venue
Circuits, Systems, and Signal Processing
Keywords
Cleft palate speech, Hypernasality, Convolutional neural network, End-to-end, Speech spectrogram
Field
Convolutional neural network, Spectrogram, Control theory, Filter (signal processing), Feature extraction, Speech recognition, Vowel, Formant, Artificial neural network, Mathematics, Vocal tract
DocType
Journal
Volume
38
Issue
8
ISSN
0278-081X
Citations
1
PageRank
0.36
References
16
Authors
6
Name
Order
Citations
PageRank
Xiyue Wang152.49
Ming Tang26626.92
Sen Yang373.55
Heng Yin42153111.33
Hua Huang580362.97
Ling He6526.94