Abstract |
---|
Most speech recognition systems rely on pronunciation dictionaries to provide accurate transcriptions. Typically, some pronunciations are crafted manually, but many are produced by pronunciation learning algorithms. Successful algorithms must be able to generate rich pronunciation variants, e.g. to accommodate words of foreign origin, while being robust to artifacts of the training data, e.g. noise in the acoustic segments from which pronunciations are learned when the method uses acoustic signals. We propose a general finite-state transducer (FST) framework to describe such algorithms. This representation is flexible enough to accommodate a wide variety of pronunciation learning algorithms, including approaches that rely on the availability of acoustic data and methods that rely only on the spelling of the target words. In particular, we show that the pronunciation FST can be built from a recurrent neural network (RNN) and tuned to provide rich yet constrained pronunciations. This new approach reduces the number of incorrect pronunciations learned from Google Voice traffic by up to 25% relative. |
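The core idea in the abstract, producing pronunciation variants that are rich but constrained, can be sketched with a toy weighted grapheme-to-phoneme lattice. This is only an illustration, not the paper's method: the arc weights here are made-up numbers, whereas in the paper they would be derived from an RNN and encoded in an FST.

```python
import heapq

# Hypothetical per-grapheme phoneme options with weights (negative log
# probabilities). In the paper's setting these weights would come from an
# RNN G2P model; the values below are illustrative only.
G2P_ARCS = {
    "c": [("k", 0.2), ("s", 1.6)],
    "a": [("ae", 0.3), ("ah", 1.2)],
    "t": [("t", 0.1)],
}

def pronunciation_lattice(word):
    """Build a simple sausage lattice: one slot of weighted arcs per grapheme."""
    return [G2P_ARCS[g] for g in word]

def n_best(lattice, n=3):
    """Enumerate the n lowest-cost phoneme sequences through the lattice."""
    paths = [([], 0.0)]
    for arcs in lattice:
        paths = [(phones + [p], cost + w)
                 for phones, cost in paths
                 for p, w in arcs]
        # Prune to a beam of size n: this is what keeps the variants
        # "constrained" while still allowing alternatives like "k" vs "s".
        paths = heapq.nsmallest(n, paths, key=lambda x: x[1])
    return [(" ".join(phones), round(cost, 2)) for phones, cost in paths]

print(n_best(pronunciation_lattice("cat")))
```

Running this prints the three cheapest pronunciations of "cat" with their costs, e.g. `k ae t` first; a real system would compose such a lattice with acoustic evidence or spelling constraints before selecting variants.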
Year | DOI | Venue |
---|---|---|
2017 | 10.21437/Interspeech.2017-47 | 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION |
Keywords | Field | DocType |
---|---|---|
speech recognition, pronunciation learning | Pronunciation, Computer science, Speech recognition | Conference |
ISSN | Citations | PageRank |
---|---|---|
2308-457X | 0 | 0.34 |
References | Authors |
---|---|
7 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Antoine Bruguier | 1 | 6 | 3.50 |
Danushen Gnanapragasam | 2 | 0 | 0.34 |
Leif Johnson | 3 | 37 | 4.34 |
Kanishka Rao | 4 | 189 | 11.94 |
Françoise Beaufays | 5 | 27 | 2.84 |