Title
Cued Speech automatic recognition in normal-hearing and deaf subjects
Abstract
This article addresses the automatic recognition of Cued Speech in French based on hidden Markov models (HMMs). Cued Speech is a visual communication mode that uses hand shapes placed in different positions, in combination with the lip patterns of speech, to make all the sounds of a spoken language clearly understandable to deaf people. Its aim is to overcome the limitations of lipreading and thus enable deaf children and adults to understand spoken language completely. In the current study, the authors demonstrate that these visible gestures are as discriminant as audible orofacial gestures. Phoneme recognition and isolated word recognition experiments were conducted using data from a normal-hearing cuer. The results were very promising, and the study was extended by applying the proposed methods to a deaf cuer; the results showed no significant difference from automatic Cued Speech recognition in a normal-hearing subject. Automatic recognition of Cued Speech requires both lip shape and hand gesture recognition, and the integration of the two modalities is of great importance. In this study, the lip shape component is fused with the hand component to realize Cued Speech recognition. Using concatenative feature fusion and multi-stream HMM decision fusion, vowel, consonant, and isolated word recognition experiments were conducted. For vowel recognition, an 87.6% vowel accuracy was obtained, a 61.3% relative improvement over the sole use of lip shape parameters. For consonant recognition, a 78.9% accuracy was obtained, a 56% relative improvement over the use of lip shape alone. In addition, a complete phoneme recognition experiment using concatenated feature vectors and Gaussian mixture model (GMM) discrimination yielded a 74.4% phoneme accuracy.
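Concatenative feature fusion, as named in the abstract, amounts to stacking the per-frame lip-shape and hand feature vectors into a single observation vector before HMM training. A minimal sketch, assuming frame-synchronous streams; the feature dimensions below are illustrative, not those of the paper:

```python
import numpy as np

def concatenative_fusion(lip_feats, hand_feats):
    """Stack per-frame lip-shape and hand features into one observation
    vector per frame, giving shape (frames, d_lip + d_hand)."""
    assert lip_feats.shape[0] == hand_feats.shape[0], "streams must be frame-synchronous"
    return np.hstack([lip_feats, hand_feats])

# Illustrative dimensions: 8 lip-shape parameters, 4 hand parameters, 100 frames.
lip = np.random.randn(100, 8)
hand = np.random.randn(100, 4)
obs = concatenative_fusion(lip, hand)
print(obs.shape)  # (100, 12)
```

The fused vectors can then be modeled by a single set of HMM emission densities, exactly as in an ordinary single-stream recognizer.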
Isolated word recognition experiments were also conducted with both normal-hearing and deaf subjects, yielding word accuracies of 94.9% and 89%, respectively. These results were compared with those obtained using the audio signal, and comparable accuracies were observed.
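In multi-stream HMM decision fusion, each state's emission likelihood is a weighted product of the per-stream likelihoods, i.e. a weighted sum in the log domain. A hedged sketch of the per-state score combination; the stream weights and the example log-likelihood values are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def multistream_log_likelihood(log_b_lip, log_b_hand, w_lip=0.4, w_hand=0.6):
    """Combine per-stream state log-likelihoods with exponent weights:
    log b(o) = w_lip * log b_lip(o_lip) + w_hand * log b_hand(o_hand).
    The weights here are illustrative; in practice they are tuned on
    held-out data to balance the reliability of the two streams."""
    return w_lip * log_b_lip + w_hand * log_b_hand

# Example: log-likelihoods of one frame under three candidate states,
# computed separately from the lip-shape stream and the hand stream.
log_b_lip = np.array([-12.0, -9.5, -11.2])
log_b_hand = np.array([-8.0, -10.0, -7.5])
combined = multistream_log_likelihood(log_b_lip, log_b_hand)
best_state = int(np.argmax(combined))  # state favored after fusion
```

Because the combination is per state and per frame, it plugs directly into Viterbi decoding in place of a single-stream emission score.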
Year: 2010
DOI: 10.1016/j.specom.2010.03.001
Venue: Speech Communication
Keywords: french cued speech, phoneme recognition, deaf subject, speech recognition, automatic recognition, cued speech automatic recognition, gesture recognition, hidden markov models, vowel recognition, multi-stream hmm decision fusion, automatic cued speech recognition, feature fusion, isolated word recognition experiment, complete phoneme recognition experiment, consonant recognition, cued speech, feature vector, shape parameter, gaussian mixture model, hidden markov model, word recognition
Field: Consonant, Speech processing, Pattern recognition, Computer science, Gesture, Word recognition, Gesture recognition, Cued speech, Speech recognition, Speaker recognition, Artificial intelligence, Vowel
DocType: Journal
Volume: 52
Issue: 6
ISSN:
Citations: 3
PageRank: 0.43
References: 4
Authors: 3

Order  Name                  Citations  PageRank
1      Panikos Heracleous    68         16.27
2      Denis Beautemps       57         16.31
3      Noureddine Aboutabit  16         3.68