Title
An Exploration of Directly Using Word as Acoustic Modeling Unit for Speech Recognition
Abstract
Conventional acoustic models for automatic speech recognition (ASR) are usually built on sub-word units (e.g., context-dependent phonemes, graphemes, or wordpieces). Recent studies demonstrate that connectionist temporal classification (CTC) based acoustic-to-word (A2W) models are also promising for ASR. Such models have drawn increasing attention because they directly target words as output units, which simplifies the ASR pipeline by removing the need for a pronunciation lexicon, or even a language model. In this study, we systematically explore using words as the acoustic modeling unit for conversational speech recognition. By replacing senone alignments with word alignments in a convolutional bidirectional LSTM architecture and employing lexicon-free weighted finite-state transducer (WFST) based decoding, we greatly simplify the conventional hybrid speech recognition system. On the Hub5-2000 Switchboard/CallHome test sets with 300 hours of training data, we achieve a word error rate (WER) close to that of senone-based hybrid systems with WFST-based decoding.
Year: 2018
DOI: 10.1109/SLT.2018.8639623
Venue: SLT
Keywords: Hidden Markov models, Acoustics, Decoding, Training, Speech recognition, Lattices, Neural networks
Field: Pronunciation, Computer science, Grapheme, Speech recognition, Lexicon, Decoding methods, Hidden Markov model, Artificial neural network, Hybrid system, Language model
DocType: Conference
ISSN: 2639-5479
ISBN: 978-1-5386-4334-1
Citations: 0
PageRank: 0.34
References: 0
Authors: 5
Name | Order | Citations | PageRank
Chunlei Zhang | 1 | 37 | 7.43
Chengzhu Yu | 2 | 16 | 3.77
Chao Weng | 3 | 113 | 19.75
Jia Cui | 4 | 6 | 2.80
Dong Yu | 5 | 6264 | 475.73