Personalizing TTS Voices for Progressive Dysarthria - Citegraph

Paper Info

Title
Personalizing TTS Voices for Progressive Dysarthria

Abstract
Amyotrophic lateral sclerosis (ALS) patients experience progressive speech deterioration due to muscle paralysis, leading to eventual loss of verbal communication capability. Text-to-speech synthesis (TTS) is an important technology for speech-generating devices, enabling users to communicate using generic electronic voices, but often without the vocal identity of the users. Our work is aimed at personalizing TTS voices for people with ALS-induced dysarthria by integrating the machine learning and speech processing techniques of voice conversion (VC) and TTS. This is challenging because only small quantities of dysarthric speech are available from individual patients. Our system includes both timbre and prosody conversion for VC, neural TTS to generate TTS speech, and a neural feature converter to interface VC and TTS. We collected speech data from 4 ALS target speakers with mild to severe dysarthria. Subjective listening tests showed that, on average, our approach improved speech intelligibility by about 72% over the target speakers’ speech, the converted voice was 2 to 3 times more similar to the ALS targets than to the TTS sources, and the converted speech quality fell in the fair-to-good range of the MOS scale.

Year	2021
DOI	10.1109/BHI50953.2021.9508522
Venue	2021 IEEE EMBS International Conference on Biomedical and Health Informatics (BHI)
Keywords	voice conversion, feature conversion, neural TTS, dysarthria, amyotrophic lateral sclerosis
DocType	Conference
ISSN	2641-3590
ISBN	978-1-6654-4770-6
Citations	0
PageRank	0.34
References	0
Authors	4

Authors (4 rows)

Name	Order	Citations	PageRank
Yunxin Zhao	1	807	121.74
Minguang Song	2	0	2.37
Yanghao Yue	3	0	0.34
Mili Kuruvilla-Dugdale	4	0	0.34

Cited by (0 rows)

References (0 rows)
