Title: Listen, Attend and Spell
Abstract: We present Listen, Attend and Spell (LAS), a neural network that learns to transcribe speech utterances to characters. Unlike traditional DNN-HMM models, this model learns all the components of a speech recognizer jointly. Our system has two components: a listener and a speller. The listener is a pyramidal recurrent network encoder that accepts filter bank spectra as inputs. The speller is an attention-based recurrent network decoder that emits characters as outputs. The network produces character sequences without making any independence assumptions between the characters. This is the key improvement of LAS over previous end-to-end CTC models. On a subset of the Google voice search task, LAS achieves a word error rate (WER) of 14.1% without a dictionary or a language model, and 10.3% with language model rescoring over the top 32 beams. By comparison, the state-of-the-art CLDNN-HMM model achieves a WER of 8.0%.
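The abstract's two components can be illustrated with a minimal numpy sketch. This is not the paper's implementation (which uses pyramidal BLSTM layers and an LSTM speller with learned parameters); it only demonstrates the shape mechanics the abstract describes: the pyramidal listener halving the time resolution of filter bank features, and the speller forming a context vector by content-based attention. All dimensions and function names here are illustrative assumptions.

```python
import numpy as np

def pyramidal_reduce(x):
    """One pyramid layer of the listener: halve the time axis by
    concatenating consecutive frame pairs (doubles the feature size).
    The real model additionally passes the result through a BLSTM."""
    T, d = x.shape
    if T % 2:                      # pad to an even number of frames
        x = np.vstack([x, np.zeros((1, d))])
    return x.reshape(-1, 2 * d)

def attend(query, keys):
    """One speller attention step: softmax-weighted average of the
    listener features, scored by dot product with a decoder query."""
    scores = keys @ query                       # one score per frame
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                    # softmax
    return weights @ keys                       # context vector

# 8 frames of 40-dimensional filter bank features (random stand-in)
feats = np.random.randn(8, 40)
h = pyramidal_reduce(pyramidal_reduce(feats))   # two pyramid layers: 8 -> 2 frames
context = attend(np.random.randn(h.shape[1]), h)
print(h.shape, context.shape)                   # (2, 160) (160,)
```

Two pyramid layers reduce 8 input frames to 2 higher-level features, which is what lets the attention in the speller scan a short sequence instead of every acoustic frame.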
Year: 2015
Venue: CoRR
Field: Computer science, Filter bank, Word error rate, Speech recognition, Artificial intelligence, Natural language processing, Encoder, Spell, Artificial neural network, Voice search, Machine learning, Language model
DocType: Journal
Volume: abs/1508.01211
Citations: 50
PageRank: 2.43
References: 10
Authors: 4
Name             Order  Citations  PageRank
William Chan     1      357        24.67
Navdeep Jaitly   2      2988       166.08
Quoc V. Le       3      8501       366.59
Oriol Vinyals    4      9419       418.45