Title
Enhancing Transformer with Horizontal and Vertical Guiding Mechanisms for Neural Language Modeling
Abstract
Language modeling is an important problem in Natural Language Processing (NLP), and the multi-layer Transformer network is currently the most advanced and effective model for this task. However, its multi-head self-attention structure has two inherent defects: (1) attention information loss: lower-level attention weights cannot be explicitly passed to upper layers, which may cause the network to lose pivotal attention information captured by lower layers; (2) multi-head bottleneck: the dimension of each head in the vanilla Transformer is relatively small and each head is computed independently, which introduces an expressiveness bottleneck and fundamentally limits subspace learning. To overcome these two weaknesses, a novel neural architecture named Guide-Transformer is proposed in this paper. Guide-Transformer utilizes horizontal and vertical attention information to guide the multi-head self-attention sublayer without introducing excessive complexity. Experimental results on three authoritative language modeling benchmarks demonstrate the effectiveness of Guide-Transformer. On the popular perplexity (ppl) and bits-per-character (bpc) evaluation metrics, Guide-Transformer achieves moderate improvements over a strong baseline model.
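The record contains no code, so the following is only a minimal, illustrative sketch of how the two guiding ideas described in the abstract could look inside a multi-head self-attention layer: "vertical" guidance blends in the attention weights of the layer below, and "horizontal" guidance lets heads exchange information. The function name guided_multi_head_attention, the gate alpha, the head_mix matrix, and the exact blending scheme are assumptions for illustration, not the authors' published method.

# Hypothetical sketch (not the paper's released code): multi-head self-attention
# augmented with vertical guidance (previous layer's attention weights) and
# horizontal guidance (mixing across heads). All names are assumptions.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def guided_multi_head_attention(x, w_q, w_k, w_v, prev_attn=None,
                                n_heads=4, alpha=0.5, head_mix=None):
    """x: (seq_len, d_model); w_q/w_k/w_v: (d_model, d_model) projections.
    prev_attn: (n_heads, seq_len, seq_len) attention weights from the layer
    below (vertical guiding). head_mix: (n_heads, n_heads) non-negative mixing
    matrix that lets heads see each other's attention maps (horizontal guiding)."""
    seq_len, d_model = x.shape
    d_head = d_model // n_heads

    def split(h):  # (seq_len, d_model) -> (n_heads, seq_len, d_head)
        return h.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d_head))   # (n_heads, L, L)

    # Vertical guiding: blend in the lower layer's attention distribution so
    # information captured below is not lost (alpha is a hypothetical gate).
    if prev_attn is not None:
        attn = alpha * attn + (1.0 - alpha) * prev_attn

    # Horizontal guiding: each head receives a weighted combination of the
    # other heads' attention maps, easing the per-head bottleneck.
    if head_mix is not None:
        attn = np.einsum('ij,jlm->ilm', head_mix, attn)
        attn = attn / attn.sum(axis=-1, keepdims=True)           # re-normalize rows

    out = attn @ v                                                # (n_heads, L, d_head)
    out = out.transpose(1, 0, 2).reshape(seq_len, d_model)
    return out, attn  # attn can be passed as prev_attn to the next layer

In this sketch, returning the attention maps and feeding them to the next layer as prev_attn is what provides the vertical information path; the actual Guide-Transformer mechanisms may differ.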
Year
2021
DOI
10.1109/ICC42927.2021.9500450
Venue
IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC 2021)
Keywords
neural language modeling, transformer, attention mechanism, information guiding
DocType
Conference
ISSN
1550-3607
Citations
0
PageRank
0.34
References
0
Authors
3
Name          Order  Citations  PageRank
Anlin Qu      1      0          0.34
Jianwei Niu   2      1643       141.54
Shasha Mo     3      3          2.43