Title | ||
---|---|---|
Non-native speech conversion with consistency-aware recursive network and generative adversarial network. |
Abstract | ||
---|---|---|
This paper deals with the problem of automatically correcting the pronunciation of non-native speakers. Since the pronunciation characteristics of non-native speakers depend heavily on the context (such as words), conversion rules for correcting pronunciation should be learned from a sequence of features rather than a single-frame feature. For the on-line conversion of local sequences of features, we construct a neural network (NN) that takes a sequence of features as input and output, generates a sequence of features in a segment-by-segment fashion, and guarantees the consistency of the generated features within overlapped segments. Furthermore, we apply a recently proposed generative adversarial network (GAN)-based postfilter to the generated feature sequence with the aim of synthesizing natural-sounding speech. Through subjective and quantitative evaluations, we confirmed the superiority of our proposed method over a conventional NN approach in terms of conversion quality. |
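
The segment-by-segment generation with consistent overlaps described in the abstract can be illustrated with a minimal sketch. This is not the authors' network: `convert_segmentwise`, `convert_segment`, `seg_len`, and `hop` are hypothetical names, and the consistency rule used here (copying the previously generated frames into the overlapped part of the next segment) is only one simple way to obtain the property, assumed for illustration.

```python
import numpy as np

def convert_segmentwise(features, convert_segment, seg_len=8, hop=4):
    """Convert a (T, D) feature sequence one overlapping segment at a time.

    convert_segment(src_seg, prev_overlap) is a hypothetical stand-in for a
    trained model; it must return an array of shape (seg_len, D).
    """
    T, D = features.shape
    if T < seg_len or not 0 < hop < seg_len:
        raise ValueError("need T >= seg_len and 0 < hop < seg_len")
    overlap = seg_len - hop
    n_segments = (T - seg_len) // hop + 1
    covered = (n_segments - 1) * hop + seg_len  # frames actually converted
    out = np.zeros((covered, D))
    prev_overlap = np.zeros((overlap, D))
    for i in range(n_segments):
        start = i * hop
        src = features[start:start + seg_len]
        seg = np.array(convert_segment(src, prev_overlap), dtype=float)
        if i > 0:
            # Assumed consistency rule: frames already generated for the
            # overlapped region are kept unchanged in the new segment.
            seg[:overlap] = prev_overlap
        out[start:start + seg_len] = seg
        prev_overlap = seg[hop:].copy()  # last `overlap` frames of this segment
    return out
```

Because each segment only needs the current input frames and the previously generated overlap, the loop can run on-line as frames arrive. Trailing frames that do not fill a whole segment are dropped in this sketch.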
Year | Venue | Field |
---|---|---|
2017 | Asia-Pacific Signal and Information Processing Association Annual Summit and Conference | Pronunciation, Generative adversarial network, Task analysis, Quantitative evaluations, Computer science, Speech recognition, Feature extraction, Artificial neural network, Recursion |
DocType | ISSN | Citations |
Conference | 2309-9402 | 0 |
PageRank | References | Authors |
0.34 | 0 | 6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Keisuke Oyamada | 1 | 2 | 0.73 |
Hirokazu Kameoka | 2 | 801 | 79.06 |
Takuhiro Kaneko | 3 | 104 | 16.80 |
Hiroyasu Ando | 4 | 4 | 2.15 |
Kaoru Hiramatsu | 5 | 88 | 19.94 |
Kunio Kashino | 6 | 285 | 68.41 |