An Exploration of Directly Using Word as Acoustic Modeling Unit for Speech Recognition. - Citegraph

Paper Info

Title
An Exploration of Directly Using Word as Acoustic Modeling Unit for Speech Recognition.

Abstract
Conventional acoustic models for automatic speech recognition (ASR) are usually constructed from sub-word units (e.g., context-dependent phonemes, graphemes, or wordpieces). Recent studies demonstrate that connectionist temporal classification (CTC) based acoustic-to-word (A2W) models are also promising for ASR. Such structures have drawn increasing attention because they can directly target words as output units, which simplifies the ASR pipeline by avoiding an additional pronunciation lexicon, or even a language model. In this study, we systematically explore using words as the acoustic modeling unit for conversational speech recognition. By replacing senone alignment with word alignment in a convolutional bidirectional LSTM architecture and employing lexicon-free weighted finite-state transducer (WFST) based decoding, we greatly simplify the conventional hybrid speech recognition system. On the Hub5-2000 Switchboard/CallHome test sets with 300 hours of training data, we achieve a WER close to that of senone-based hybrid systems with WFST-based decoding.

Year	DOI	Venue
2018	10.1109/SLT.2018.8639623	SLT
Keywords	Field	DocType
Hidden Markov models,Acoustics,Decoding,Training,Speech recognition,Lattices,Neural networks	Pronunciation,Computer science,Grapheme,Speech recognition,Lexicon,Decoding methods,Hidden Markov model,Artificial neural network,Hybrid system,Language model	Conference
ISSN	ISBN	Citations
2639-5479	978-1-5386-4334-1	0
PageRank	References	Authors
0.34	0	5

Authors (5 rows)

Name	Order	Citations	PageRank
Chunlei Zhang	1	37	7.43
Chengzhu Yu	2	16	3.77
Chao Weng	3	113	19.75
Jia Cui	4	6	2.80
Dong Yu	5	6264	475.73
