Title
Grapheme-to-phoneme conversion using Long Short-Term Memory recurrent neural networks
Abstract
Grapheme-to-phoneme (G2P) models are key components in speech recognition and text-to-speech systems, as they describe how words are pronounced. We propose a G2P model based on a Long Short-Term Memory (LSTM) recurrent neural network (RNN). In contrast to traditional joint-sequence based G2P approaches, LSTMs have the flexibility to take the full context of graphemes into account and transform the problem from a series of grapheme-to-phoneme conversions into a single word-to-pronunciation conversion. Training joint-sequence based G2P models requires explicit grapheme-to-phoneme alignments, which are not straightforward to obtain since graphemes and phonemes do not correspond one-to-one. The LSTM-based approach forgoes the need for such explicit alignments. We experiment with a unidirectional LSTM (ULSTM) with different kinds of output delays and a deep bidirectional LSTM (DBLSTM) with a connectionist temporal classification (CTC) layer. The DBLSTM-CTC model achieves a word error rate (WER) of 25.8% on the public CMU dataset for US English. Combining the DBLSTM-CTC model with a joint n-gram model yields a WER of 21.3%, a 9% relative improvement over the previous best WER of 23.4% from a hybrid system.
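To illustrate how a CTC output layer, as used in the DBLSTM-CTC model above, turns per-step network outputs into a phoneme sequence without explicit alignments, here is a minimal sketch of greedy CTC decoding. The phoneme labels and per-step probabilities are invented for illustration; a real model would produce these distributions from the DBLSTM.

```python
# Hedged sketch: greedy CTC decoding, i.e. how a phoneme sequence is
# read off the per-step outputs of an LSTM-CTC G2P model.
# The phoneme set and probabilities below are made up for illustration.

BLANK = "<b>"  # the CTC blank symbol

def ctc_greedy_decode(step_outputs):
    """Pick the most likely label at each step, merge consecutive
    repeats, then drop blanks -- the standard CTC collapse rule."""
    best = [max(step, key=step.get) for step in step_outputs]
    collapsed = []
    prev = None
    for label in best:
        if label != prev:
            collapsed.append(label)
        prev = label
    return [p for p in collapsed if p != BLANK]

# Toy per-step distributions for the word "cat" -> /K AE T/.
steps = [
    {"K": 0.7, "AE": 0.1, "T": 0.1, BLANK: 0.1},
    {"K": 0.6, "AE": 0.2, "T": 0.1, BLANK: 0.1},  # repeated K merges
    {"K": 0.1, "AE": 0.1, "T": 0.1, BLANK: 0.7},  # blank is dropped
    {"K": 0.1, "AE": 0.7, "T": 0.1, BLANK: 0.1},
    {"K": 0.1, "AE": 0.1, "T": 0.7, BLANK: 0.1},
]
print(ctc_greedy_decode(steps))  # -> ['K', 'AE', 'T']
```

Because CTC marginalizes over all alignments of this collapsed form, the network never needs the hand-built grapheme-to-phoneme alignments that joint-sequence models require.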
Year
2015
DOI
10.1109/ICASSP.2015.7178767
Venue
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Keywords
neural nets, speech recognition, speech synthesis, synchronisation, CTC layer, DBLSTM-CTC model, G2P models, RNN, ULSTM, US English, WER, connectionist temporal classification, deep bidirectional LSTM, grapheme-to-phoneme alignments, grapheme-to-phoneme conversion, grapheme-to-phoneme models, hybrid system, joint n-gram model, joint-sequence based G2P, long short-term memory recurrent neural networks, public CMU dataset, text-to-speech systems, unidirectional LSTM, word error rate, word-to-pronunciation conversion, CTC, G2P, LSTM, pronunciation
Field
Grapheme, Computer science, Word error rate, Recurrent neural network, Long short term memory, US English, Speech recognition, Natural language processing, Artificial intelligence, Hybrid system, Connectionism
DocType
Conference
ISSN
1520-6149
Citations
21
PageRank
0.97
References
9
Authors
4
Name                Order  Citations  PageRank
Kanishka Rao        1      189        11.94
Fuchun Peng         2      1378       85.75
Hasim Sak           3      690        39.56
Françoise Beaufays  4      27         2.84