Title
Enhancing Transformer with Horizontal and Vertical Guiding Mechanisms for Neural Language Modeling
Abstract
Language modeling is an important problem in Natural Language Processing (NLP), and the multi-layer Transformer network is currently the most advanced and effective model for this task. However, its multi-head self-attention structure has two inherent defects: (1) attention information loss: lower-level attention weights cannot be explicitly passed to upper layers, which may cause the network to lose pivotal attention information captured by lower layers; (2) multi-head bottleneck: the dimension of each head in the vanilla Transformer is relatively small and each head is computed independently, which introduces an expressiveness bottleneck and fundamentally limits subspace learning. To overcome these two weaknesses, a novel neural architecture named Guide-Transformer is proposed in this paper. Guide-Transformer utilizes horizontal and vertical attention information to guide the multi-head self-attention sublayer without introducing excessive complexity. Experimental results on three authoritative language modeling benchmarks demonstrate the effectiveness of Guide-Transformer. On the popular perplexity (ppl) and bits-per-character (bpc) evaluation metrics, Guide-Transformer achieves moderate improvements over a strong baseline model.
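The record contains no code, so the following is only a minimal, illustrative sketch of how the two guiding ideas described in the abstract could look inside a multi-head self-attention layer: "vertical" guidance blends in the attention weights of the layer below, and "horizontal" guidance lets heads exchange information. The function name guided_multi_head_attention, the gate alpha, the head_mix matrix, and the exact blending scheme are assumptions for illustration, not the authors' published method.

# Hypothetical sketch (not the paper's released code): multi-head self-attention
# augmented with vertical guidance (previous layer's attention weights) and
# horizontal guidance (mixing across heads). All names are assumptions.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def guided_multi_head_attention(x, w_q, w_k, w_v, prev_attn=None,
                                n_heads=4, alpha=0.5, head_mix=None):
    """x: (seq_len, d_model); w_q/w_k/w_v: (d_model, d_model) projections.
    prev_attn: (n_heads, seq_len, seq_len) attention weights from the layer
    below (vertical guiding). head_mix: (n_heads, n_heads) non-negative mixing
    matrix that lets heads see each other's attention maps (horizontal guiding)."""
    seq_len, d_model = x.shape
    d_head = d_model // n_heads

    def split(h):  # (seq_len, d_model) -> (n_heads, seq_len, d_head)
        return h.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d_head))   # (n_heads, L, L)

    # Vertical guiding: blend in the lower layer's attention distribution so
    # information captured below is not lost (alpha is a hypothetical gate).
    if prev_attn is not None:
        attn = alpha * attn + (1.0 - alpha) * prev_attn

    # Horizontal guiding: each head receives a weighted combination of the
    # other heads' attention maps, easing the per-head bottleneck.
    if head_mix is not None:
        attn = np.einsum('ij,jlm->ilm', head_mix, attn)
        attn = attn / attn.sum(axis=-1, keepdims=True)           # re-normalize rows

    out = attn @ v                                                # (n_heads, L, d_head)
    out = out.transpose(1, 0, 2).reshape(seq_len, d_model)
    return out, attn  # attn can be passed as prev_attn to the next layer

In this sketch, returning the attention maps and feeding them to the next layer as prev_attn is what provides the vertical information path; the actual Guide-Transformer mechanisms may differ.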
Year
2021
DOI
10.1109/ICC42927.2021.9500450
Venue
IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC 2021)
Keywords
neural language modeling, transformer, attention mechanism, information guiding
DocType
Conference
ISSN
1550-3607
Citations
0
PageRank
0.34
References
0
Authors
3
Name          Order  Citations  PageRank
Anlin Qu      1      0          0.34
Jianwei Niu   2      1643       141.54
Shasha Mo     3      3          2.43