Title
Building Statistical Parametric Multi-speaker Synthesis for Bangladeshi Bangla.
Abstract
We present a text-to-speech (TTS) system designed for the dialect of Bengali spoken in Bangladesh. This work is part of an ongoing effort to address the needs of new under-resourced languages. We propose a process for streamlining the bootstrapping of TTS systems for under-resourced languages. First, we use crowdsourcing to collect data from multiple ordinary speakers, each recording a small number of sentences. Second, we leverage an existing text normalization system for a related language (Hindi) to bootstrap a linguistic front-end for Bangla. Third, we employ statistical techniques to construct multi-speaker acoustic models using Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) and Hidden Markov Model (HMM) approaches. We then describe experiments showing that the resulting TTS voices score well in perceived quality, as measured by Mean Opinion Score (MOS) evaluations.
Year: 2016
DOI: 10.1016/j.procs.2016.04.049
Venue: Procedia Computer Science
Keywords: TTS, Bangladesh, HMM, LSTM-RNN, acoustic modeling
Field: Crowdsourcing, Hindi, Computer science, Bootstrapping, Recurrent neural network, Speech recognition, Mean opinion score, Bengali, Artificial intelligence, Natural language processing, Hidden Markov model, Text normalization
DocType: Conference
Volume: 81
ISSN: 1877-0509
Citations: 2
PageRank: 0.40
References: 15
Authors: 6
Name, Order, Citations, PageRank
Alexander Gutkin, 1, 2, 1.08
Linne Ha, 2, 5, 3.19
Martin Jansche, 3, 257, 23.92
Oddur Kjartansson, 4, 6, 4.89
Knot Pipatsrisawat, 5, 358, 20.44
Richard Sproat, 6, 31, 7.34