Title
A Hierarchical Clustering Approach to Fuzzy Semantic Representation of Rare Words in Neural Machine Translation
Abstract
Rare words are usually replaced with a single <unk> token in the current encoder–decoder style of neural machine translation, which obscures the context and makes translation modeling harder. In this article, we propose a fuzzy semantic representation (FSR) method for rare words that groups them through hierarchical clustering and is integrated into the encoder–decoder framework. The hierarchical structure can compensate for the lost semantic information on both the source and target sides, and it provides fuzzy context information that captures the semantics of rare words. The introduced FSR can also alleviate data sparseness, which is the bottleneck in dealing with rare words in neural machine translation. In particular, our method is easily extended to the transformer-based neural machine translation model, where it learns the FSRs of all in-vocabulary words, in addition to rare words, to enhance the sentence representations. Our experiments on Chinese-to-English translation tasks confirm a significant improvement in translation quality brought by the proposed method.
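The idea of grouping rare words instead of collapsing them into one <unk> token can be illustrated with a minimal sketch. It is not the authors' implementation: the toy corpus, the two-dimensional word vectors, the `min_count` threshold, the average-linkage criterion, and the `<unk_k>` token names are all assumptions made for illustration only.

```python
import math
from collections import Counter


def cosine_distance(u, v):
    """Cosine distance between two non-zero vectors (assumed non-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def agglomerative_clusters(vectors, num_clusters):
    """Bottom-up average-linkage clustering over a {word: vector} dict."""
    clusters = [[w] for w in sorted(vectors)]
    while len(clusters) > num_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dists = [cosine_distance(vectors[a], vectors[b])
                         for a in clusters[i] for b in clusters[j]]
                avg = sum(dists) / len(dists)
                if best is None or avg < best[0]:
                    best = (avg, i, j)
        _, i, j = best
        # Merge the closest pair of clusters.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def build_rare_word_map(corpus, vectors, min_count, num_clusters):
    """Map each rare word to a cluster-level token (hypothetical naming)."""
    counts = Counter(tok for sent in corpus for tok in sent)
    rare = sorted(w for w, c in counts.items() if c < min_count)
    # Rare words without a vector fall back to the plain <unk> token.
    mapping = {w: "<unk>" for w in rare}
    with_vec = {w: vectors[w] for w in rare if w in vectors}
    if with_vec:
        k_max = min(num_clusters, len(with_vec))
        for k, cluster in enumerate(agglomerative_clusters(with_vec, k_max)):
            for w in cluster:
                mapping[w] = f"<unk_{k}>"
    return mapping


def replace_rare_words(sentence, mapping):
    """Replace rare tokens with their cluster token; keep other tokens."""
    return [mapping.get(tok, tok) for tok in sentence]


# Toy data, assumed for illustration only.
corpus = [
    ["the", "cat", "saw", "the", "ocelot"],
    ["the", "cat", "saw", "the", "lynx"],
    ["the", "cat", "ate", "the", "quinoa"],
    ["the", "cat", "ate", "the", "farro"],
]
vectors = {
    "ocelot": [1.0, 0.1], "lynx": [0.9, 0.2],
    "quinoa": [0.1, 1.0], "farro": [0.2, 0.9],
}
rare_map = build_rare_word_map(corpus, vectors, min_count=2, num_clusters=2)
print(replace_rare_words(corpus[0], rare_map))
```

In this sketch the two animal words end up sharing one cluster token and the two grain words share another, so the surrounding context keeps a coarse semantic signal that a single <unk> would discard.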
Year: 2020
DOI: 10.1109/TFUZZ.2020.2969399
Venue: IEEE Transactions on Fuzzy Systems
Keywords: Fuzzy semantic representation (FSR), hierarchical clustering, neural network, neural machine translation (NMT)
DocType: Journal
Volume: 28
Issue: 5
ISSN: 1063-6706
Citations: 3
PageRank: 0.38
References: 13
Authors: 6
Name              Order   Citations   PageRank
Yang Muyun        1       112         29.50
Shujie Liu        2       338         37.84
Kehai Chen        3       43          16.34
Hongyang Zhang    4       3           0.38
Enbo Zhao         5       3           0.38
Tiejun Zhao       6       643         102.68