Title
Large Context End-To-End Automatic Speech Recognition Via Extension Of Hierarchical Recurrent Encoder-Decoder Models
Abstract
This paper describes a novel end-to-end automatic speech recognition (ASR) method that takes into consideration long-range sequential context information beyond utterance boundaries. In spontaneous ASR tasks such as those for discourses and conversations, the input speech often comprises a series of utterances. Accordingly, the relationships between the utterances should be leveraged for transcribing the individual utterances. While most previous end-to-end ASR methods only focus on utterance-level ASR that handles single utterances independently, the proposed method (which we call "large-context end-to-end ASR") can explicitly utilize relationships between a current target utterance and all preceding utterances. The method is modeled by combining an attention-based encoder-decoder model, which is one of the most representative end-to-end ASR models, with hierarchical recurrent encoder-decoder models, which are effective language models for capturing long-range sequential contexts beyond the utterance boundaries. Experiments on Japanese discourse speech tasks demonstrate that the proposed method yields significant ASR performance improvements compared with the conventional utterance-level end-to-end ASR system.
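The core idea of the abstract can be illustrated with a minimal sketch: an utterance-level encoder produces a fixed-size encoding per utterance, and a discourse-level recurrent state accumulates the encodings of all preceding utterances, so that the decoder for the current utterance is conditioned on both. All class names, layer sizes, and the plain tanh-RNN cells below are illustrative assumptions, not the paper's actual architecture or configuration.

```python
import numpy as np

rng = np.random.default_rng(0)


def tanh_rnn_last(x, W_in, W_h):
    """Run a simple tanh RNN over a frame sequence; return the final hidden state."""
    h = np.zeros(W_h.shape[0])
    for t in range(x.shape[0]):
        h = np.tanh(x[t] @ W_in + h @ W_h)
    return h


class LargeContextEncoder:
    """Hypothetical hierarchical encoder sketch.

    A lower RNN encodes each utterance's acoustic frames; an upper RNN
    carries a discourse-level state across utterance boundaries. The
    decoder (not shown) would attend over the current utterance while
    being conditioned on the discourse state.
    """

    def __init__(self, feat_dim=8, utt_dim=16, ctx_dim=16):
        scale = 0.1  # small random weights for a stable demo
        self.W_in_u = rng.standard_normal((feat_dim, utt_dim)) * scale
        self.W_h_u = rng.standard_normal((utt_dim, utt_dim)) * scale
        self.W_in_c = rng.standard_normal((utt_dim, ctx_dim)) * scale
        self.W_h_c = rng.standard_normal((ctx_dim, ctx_dim)) * scale
        self.ctx = np.zeros(ctx_dim)  # long-range context over past utterances

    def encode_utterance(self, frames):
        """Encode one utterance; return the decoder conditioning vector."""
        u = tanh_rnn_last(frames, self.W_in_u, self.W_h_u)
        # Conditioning vector is built BEFORE updating the context state,
        # so the context covers only *preceding* utterances.
        cond = np.concatenate([u, self.ctx])
        self.ctx = np.tanh(u @ self.W_in_c + self.ctx @ self.W_h_c)
        return cond


enc = LargeContextEncoder()
c1 = enc.encode_utterance(rng.standard_normal((5, 8)))  # first utterance: empty context
c2 = enc.encode_utterance(rng.standard_normal((7, 8)))  # second: sees utterance 1
```

Note the ordering choice: the context state is updated after the conditioning vector is formed, matching the abstract's claim that a target utterance is related to all preceding (not including the current) utterances.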
Year: 2019
DOI: 10.1109/icassp.2019.8683843
Venue: 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP)
Keywords: End-to-end automatic speech recognition, attention based encoder-decoder, hierarchical recurrent encoder-decoder
Field: Transcription (linguistics), Encoder decoder, Computer science, End-to-end principle, Utterance, Speech recognition, Language model
DocType: Conference
ISSN: 1520-6149
Citations: 0
PageRank: 0.34
References: 0
Authors: 6
Name               Order  Citations  PageRank
Ryo Masumura       1      252        8.24
Tomohiro Tanaka    2      5          5.11
Takafumi Moriya    3      3          5.45
Yusuke Shinohara   4      88         10.26
Takanobu Oba       5      53         12.09
Yushi Aono         6      7          11.02