Title
Pronunciation Learning with RNN-Transducers
Abstract
Most speech recognition systems rely on pronunciation dictionaries to provide accurate transcriptions. Typically, some pronunciations are crafted manually, but many are produced by pronunciation learning algorithms. Successful algorithms must be able to generate rich pronunciation variants, e.g. to accommodate words of foreign origin, while remaining robust to artifacts of the training data, e.g. noise in the acoustic segments from which pronunciations are learned when the method uses acoustic signals. We propose a general finite-state transducer (FST) framework to describe such algorithms. This representation is flexible enough to accommodate a wide variety of pronunciation learning algorithms, including approaches that rely on the availability of acoustic data and methods that rely only on the spelling of the target words. In particular, we show that the pronunciation FST can be built from a recurrent neural network (RNN) and tuned to provide rich yet constrained pronunciations. This new approach reduces the number of incorrect pronunciations learned from Google Voice traffic by up to 25% relative.
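The abstract describes a pronunciation FST tuned to yield rich yet constrained pronunciation variants. As a loose illustration only (not the paper's RNN-based construction), the sketch below enumerates the cheapest phoneme paths through a toy weighted pronunciation lattice, pruning high-cost variants with a score threshold; the word, arcs, and weights are all hypothetical.

```python
import heapq

# Toy weighted pronunciation lattice (an FST-like sketch; the paper builds
# its FSTs from an RNN, which is not reproduced here). States are integers;
# arcs map state -> list of (phoneme, cost, next_state), where cost is a
# hypothetical negative log-probability. Example word: "data".
ARCS = {
    0: [("d", 0.1, 1)],
    1: [("ae", 0.7, 2), ("ey", 0.4, 2)],
    2: [("t", 0.2, 3), ("dx", 0.9, 3)],
    3: [("ah", 0.3, 4)],
}
FINAL = 4

def best_pronunciations(arcs, final, k=3, max_cost=2.0):
    """Return up to k cheapest phoneme paths whose total cost <= max_cost."""
    results = []
    # Priority queue of (cost so far, current state, phonemes so far).
    heap = [(0.0, 0, ())]
    while heap and len(results) < k:
        cost, state, path = heapq.heappop(heap)
        if state == final:
            results.append((cost, " ".join(path)))
            continue
        for phoneme, weight, nxt in arcs.get(state, []):
            if cost + weight <= max_cost:  # prune unlikely variants
                heapq.heappush(heap, (cost + weight, nxt, path + (phoneme,)))
    return results

print(best_pronunciations(ARCS, FINAL))
```

The threshold plays the role hinted at in the abstract: it keeps the variant set rich (several plausible pronunciations survive) while constraining it against noise-induced, high-cost paths.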
Year: 2017
DOI: 10.21437/Interspeech.2017-47
Venue: 18th Annual Conference of the International Speech Communication Association (INTERSPEECH 2017), Vols 1-6: Situated Interaction
Keywords: speech recognition, pronunciation learning
Field: Pronunciation, Computer science, Speech recognition
DocType: Conference
ISSN: 2308-457X
Citations: 0
PageRank: 0.34
References: 7
Authors: 5
Name                      Order  Citations  PageRank
Antoine Bruguier          1      6          3.50
Danushen Gnanapragasam    2      0          0.34
Leif Johnson              3      37         4.34
Kanishka Rao              4      189        11.94
Françoise Beaufays        5      27         2.84