Abstract | ||
---|---|---|
We present an end-to-end automatic speech recognition (ASR) system for a person with an articulation disorder resulting from athetoid cerebral palsy. The speech of a person with this type of articulation disorder differs markedly from that of a physically unimpaired person, and little speech data is available for training because speaking places a heavy strain on the speech muscles. As a result, the performance of ASR systems for people with an articulation disorder degrades significantly. We propose an end-to-end ASR framework trained not only on the speech of a Japanese person with an articulation disorder but also on the speech of a physically unimpaired Japanese person and of a non-Japanese person with an articulation disorder, to compensate for the scarcity of training data for the target speaker. An end-to-end ASR model encapsulates the acoustic model and the language model jointly. In the proposed model, the acoustic-model portion is shared between the speakers with dysarthria, and a language-model portion is assigned to each language regardless of dysarthria. Experimental results show the merit of using multiple databases for speech recognition. |
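
The sharing scheme described in the abstract can be illustrated with a minimal routing sketch. It is not the authors' implementation: the class names, the `"ja"`/`"other"` language tags, and the separate encoder for the unimpaired speaker are all assumptions made for illustration (the abstract only states that the acoustic portion is shared among dysarthric speakers and that the language portion is assigned per language). `Module` is a stand-in for a trainable network block.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Corpus:
    """Hypothetical corpus descriptor (names are illustrative)."""
    name: str
    language: str     # e.g. "ja" or "other"; the tags are assumptions
    dysarthric: bool


class Module:
    """Stand-in for a trainable block; only counts its updates."""
    def __init__(self, name):
        self.name = name
        self.updates = 0


class MultiCorpusASR:
    """Picks an acoustic (encoder) and language (decoder) block per corpus."""
    def __init__(self):
        self.encoders = {}  # keyed by speaker group
        self.decoders = {}  # keyed by language

    def route(self, corpus):
        # Assumption: unimpaired speech gets its own encoder.
        group = "dysarthric" if corpus.dysarthric else "unimpaired"
        if group not in self.encoders:
            self.encoders[group] = Module(f"encoder[{group}]")
        if corpus.language not in self.decoders:
            self.decoders[corpus.language] = Module(f"decoder[{corpus.language}]")
        return self.encoders[group], self.decoders[corpus.language]

    def train_step(self, corpus):
        # A real step would run a forward/backward pass; here we only
        # record which blocks a batch from this corpus would update.
        enc, dec = self.route(corpus)
        enc.updates += 1
        dec.updates += 1
        return enc.name, dec.name
```

Under these assumptions, a batch from the target speaker updates the same encoder as a batch from the non-Japanese dysarthric speaker and the same decoder as a batch from the unimpaired Japanese speaker, which is how the additional corpora would contribute to the target speaker's model.
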
Year | DOI | Venue |
---|---|---|
2019 | 10.1109/icassp.2019.8683803 | 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) |
Keywords | Field | DocType |
Speech recognition, multilingual, assistive technology, end-to-end model, dysarthria | Training set, Athetoid cerebral palsy, Dysarthric speech, Computer science, End-to-end principle, Dysarthria, Language model, Database, Acoustic model | Conference |
ISSN | Citations | PageRank |
1520-6149 | 0 | 0.34 |
References | Authors | |
0 | 3 | |
Name | Order | Citations | PageRank |
---|---|---|---|
Yuki Takashima | 1 | 4 | 3.84 |
Tetsuya Takiguchi | 2 | 85 | 8.77 |
Yasuo Ariki | 3 | 519 | 88.94 |