Abstract | | |
---|---|---|
This article compares two approaches to describing ultrasound vocal tract images for use in a "silent speech interface": one based on tongue contour modeling, and a second, global coding approach in which images are projected onto a feature space of Eigentongues. A curvature-based method for extracting lip profile features is also presented. The extracted visual features are input to a neural network, which learns the relation between the vocal tract configuration and the line spectrum frequencies (LSFs) contained in a one-hour speech corpus. An examination of the quality of the LSFs derived from the two approaches demonstrates that the Eigentongues approach has a more efficient implementation and provides superior results based on a normalized mean squared error criterion. |
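
The abstract names a normalized mean squared error criterion but this record does not give its formula. The sketch below shows one common definition, assumed here rather than taken from the paper: the mean squared error between predicted and reference values of a single LSF coefficient, divided by the variance of the reference. The function name `nmse` and its arguments are hypothetical.

```python
def nmse(reference, predicted):
    """Normalized mean squared error for one LSF coefficient track.

    Assumption: normalization by the variance of the reference values.
    The paper's exact normalization is not stated in this record.
    """
    if len(reference) != len(predicted) or len(reference) == 0:
        raise ValueError("sequences must be non-empty and of equal length")

    n = len(reference)
    mean_ref = sum(reference) / n

    # Average squared prediction error.
    mse = sum((r - p) ** 2 for r, p in zip(reference, predicted)) / n

    # Population variance of the reference, used as the normalizer.
    var_ref = sum((r - mean_ref) ** 2 for r in reference) / n
    if var_ref == 0:
        raise ValueError("reference has zero variance; NMSE is undefined")

    return mse / var_ref
```

Under this definition a value of 0 means perfect prediction, and a value of 1 means the predictor does no better than always outputting the mean of the reference.
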
Year | DOI | Venue |
---|---|---|
2007 | 10.1109/ICASSP.2007.366140 | 2007 IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol I, Pts 1-3, Proceedings |
Keywords | Field | DocType |
image processing, speech synthesis, neural network applications, communication systems, silent speech interface | Speech corpus, Speech processing, Speech synthesis, Feature vector, Pattern recognition, Computer science, Image processing, Feature extraction, Speech recognition, Artificial intelligence, Silent speech interface, Vocal tract | Conference |
ISSN | Citations | PageRank |
1520-6149 | 27 | 1.62 |
References | Authors | |
9 | 8 |
Name | Order | Citations | PageRank |
---|---|---|---|
Thomas Hueber | 1 | 150 | 14.21 |
Guido Aversano | 2 | 94 | 7.02 |
Gérard Chollet | 3 | 725 | 129.74 |
B. Denby | 4 | 268 | 26.69 |
Gérard Dreyfus | 5 | 475 | 58.97 |
Y. Oussar | 6 | 294 | 26.32 |
Pierre Roussel-Ragot | 7 | 45 | 4.38 |
M. Stone | 8 | 81 | 11.74 |