Title
HIERARCHICAL TRANSFORMER-BASED LARGE-CONTEXT END-TO-END ASR WITH LARGE-CONTEXT KNOWLEDGE DISTILLATION
Abstract
We present a novel large-context end-to-end automatic speech recognition (E2E-ASR) model and an effective training method for it based on knowledge distillation. Conventional E2E-ASR models have mainly focused on utterance-level processing, in which each utterance is transcribed independently. Large-context E2E-ASR models, on the other hand, take long-range sequential contexts beyond utterance boundaries into account and can therefore handle sequences of utterances, such as discourses and conversations, effectively. However, the transformer architecture, which has recently achieved state-of-the-art performance among utterance-level ASR systems, has not yet been introduced into large-context ASR systems. We expect that the transformer architecture can be leveraged to capture not only input speech contexts but also long-range sequential contexts beyond utterance boundaries. This paper therefore proposes a hierarchical transformer-based large-context E2E-ASR model that combines the transformer architecture with hierarchical encoder-decoder based large-context modeling. In addition, to enable the proposed model to exploit long-range sequential contexts, we also propose large-context knowledge distillation, which distills knowledge from a pre-trained large-context language model during training. We evaluate the effectiveness of the proposed model and training method on Japanese discourse ASR tasks.
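The abstract describes the model and training objective only at a high level. The PyTorch sketch below is one plausible reading of the two ideas, not the authors' implementation: a hierarchical encoder-decoder in which a context encoder summarizes preceding utterances and conditions the current utterance's decoder, plus a distillation loss toward the soft labels of a pre-trained large-context language model. The module layout (mean-pooled utterance summaries, context vectors concatenated with the speech encoding as decoder memory) and all names and hyper-parameters (`HierarchicalContextASR`, `large_context_kd_loss`, `d_model`, `temperature`, `alpha`) are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class HierarchicalContextASR(nn.Module):
    """Utterance-level transformer encoder-decoder whose decoder also attends
    to a context encoding of preceding utterances (hierarchical modeling).
    A sketch under stated assumptions, not the paper's exact architecture."""

    def __init__(self, vocab_size=1000, feat_dim=80, d_model=256,
                 n_heads=4, n_layers=2):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, d_model)   # acoustic frames -> model dim
        self.embed = nn.Embedding(vocab_size, d_model)  # token embeddings
        # Lower level: encode the current utterance's speech frames.
        self.speech_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True), n_layers)
        # Upper level: encode one summary vector per preceding utterance.
        self.context_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True), n_layers)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True), n_layers)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, feats, prev_utts, tgt_tokens):
        # feats: (B, T_frames, feat_dim); prev_utts: list of (B, T_i) token
        # tensors for earlier utterances; tgt_tokens: (B, T_out) decoder input.
        speech = self.speech_encoder(self.feat_proj(feats))
        # Summarize each previous utterance by mean-pooling its embeddings
        # (one of several plausible lower-level summarizers).
        summaries = torch.stack([self.embed(u).mean(dim=1) for u in prev_utts], dim=1)
        context = self.context_encoder(summaries)        # (B, n_prev, d_model)
        memory = torch.cat([speech, context], dim=1)     # attend to both sources
        T = tgt_tokens.size(1)                           # causal mask for decoding
        causal = torch.triu(torch.full((T, T), float("-inf"),
                                       device=tgt_tokens.device), diagonal=1)
        hidden = self.decoder(self.embed(tgt_tokens), memory, tgt_mask=causal)
        return self.out(hidden)                          # (B, T_out, vocab_size)


def large_context_kd_loss(student_logits, teacher_logits, labels,
                          temperature=2.0, alpha=0.5):
    """Reference cross-entropy plus KL divergence toward the soft labels of a
    pre-trained large-context LM (the distillation teacher). The temperature
    and mixing weight alpha are illustrative, not the paper's values."""
    ce = F.cross_entropy(student_logits.transpose(1, 2), labels)
    kd = F.kl_div(F.log_softmax(student_logits / temperature, dim=-1),
                  F.softmax(teacher_logits / temperature, dim=-1),
                  reduction="batchmean") * temperature ** 2
    return alpha * ce + (1.0 - alpha) * kd
```

In a training step of this kind, the teacher LM would be run over the full discourse history to produce `teacher_logits`, and `large_context_kd_loss` would replace the plain cross-entropy objective; at inference time the teacher is discarded and only the hierarchical model is used.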
Year
2021
DOI
10.1109/ICASSP39728.2021.9414928
Venue
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021)
Keywords
large-context end-to-end automatic speech recognition, transformer, hierarchical encoder-decoder, knowledge distillation
DocType
Conference
Citations
0
PageRank
0.34
References
0
Authors
6
Name                Order  Citations  PageRank
Ryo Masumura        1      252        8.24
Naoki Makishima     2      1          4.06
Mana Ihori          3      1          5.41
Akihiko Takashima   4      1          4.40
Tomohiro Tanaka     5      17         8.61
Orihashi, S.        6      3          5.50