Automatic Hypernasality Detection in Cleft Palate Speech Using CNN - Citegraph

Paper Info

Title
Automatic Hypernasality Detection in Cleft Palate Speech Using CNN

Abstract
Automatic hypernasality detection in cleft palate speech can facilitate diagnosis by speech-language pathologists. This paper describes a feature-independent, end-to-end algorithm that uses a convolutional neural network (CNN) to detect hypernasality in cleft palate speech, taking a speech spectrogram as the input. The average F1-scores for the hypernasality detection task are 0.9485 on a dataset of children's speech and 0.9746 on a dataset of adults' speech. The experiments explore how spectral resolution influences hypernasality detection performance: higher spectral resolution highlights the vocal tract parameters of hypernasality, such as formants and spectral zeros. The CNN learns efficient features via a two-dimensional filtering operation, whereas the feature extraction performance of shallow classifiers is limited. Compared with a deep neural network and shallow classifiers, the CNN achieves the highest F1-score, 0.9485. Among the network architectures compared, a convolutional filter of size 1 × 8 achieves the highest F1-score in the hypernasality detection task. This filter covers more frequency information and is better suited to hypernasality detection than filters of size 3 × 3, 4 × 4, 5 × 5, and 6 × 6. An analysis of hypernasality-sensitive vowels indicates that /i/ is the vowel most sensitive to hypernasality. Compared with state-of-the-art methods in the literature, the proposed CNN-based system achieves better detection performance. The results of an experiment on a heterogeneous corpus demonstrate that the CNN handles speech variability better than the shallow classifiers.
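
The contrast between a 1 × 8 filter and a square filter can be illustrated with a small sketch. This is not the authors' network: the framing parameters, the random placeholder signal, the random filter weights, and the helper names `spectrogram` and `conv2d_valid` are all assumptions made for illustration. It also assumes the spectrogram is laid out as (time frames × frequency bins), so that a 1 × 8 filter spans eight frequency bins within a single frame.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram with shape (num_frames, frame_len // 2 + 1).

    frame_len and hop are illustrative values, not taken from the paper.
    """
    window = np.hanning(frame_len)
    num_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(num_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1))

def conv2d_valid(image, kernel):
    """'Valid' 2-D cross-correlation, the operation a CNN layer applies."""
    windows = np.lib.stride_tricks.sliding_window_view(image, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)

rng = np.random.default_rng(0)
signal = rng.standard_normal(4096)         # placeholder for a speech recording
spec = spectrogram(signal)                 # rows: time frames, cols: frequency bins

kernel_1x8 = rng.standard_normal((1, 8))   # one frame, eight frequency bins
kernel_3x3 = rng.standard_normal((3, 3))   # three frames, three frequency bins

out_1x8 = conv2d_valid(spec, kernel_1x8)
out_3x3 = conv2d_valid(spec, kernel_3x3)

print(spec.shape, out_1x8.shape, out_3x3.shape)
```

Under these assumptions, each output value of the 1 × 8 filter depends on eight adjacent frequency bins of one frame, while each output of the 3 × 3 filter depends on only three frequency bins spread over three frames.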

Year	DOI	Venue
2019	10.1007/s00034-019-01141-x	Circuits, Systems, and Signal Processing
Keywords	Field	DocType
Cleft palate speech, Hypernasality, Convolutional neural network, End-to-end, Speech spectrogram	Convolutional neural network,Spectrogram,Control theory,Filter (signal processing),Feature extraction,Speech recognition,Vowel,Formant,Artificial neural network,Mathematics,Vocal tract	Journal
Volume	Issue	ISSN
38	8	0278-081X
Citations	PageRank	References
1	0.36	16
Authors
6

Authors (6 rows)

Name	Order	Citations	PageRank
Xiyue Wang	1	5	2.49
Ming Tang	2	66	26.92
Sen Yang	3	7	3.55
Heng Yin	4	2153	111.33
Hua Huang	5	803	62.97
Ling He	6	52	6.94
