Title |
---|
A Hierarchical Clustering Approach to Fuzzy Semantic Representation of Rare Words in Neural Machine Translation |
Abstract |
---|
In current encoder–decoder neural machine translation, rare words are usually replaced with a single `<unk>` token, which obscures the context and makes translation modeling harder. In this article, we propose a fuzzy semantic representation (FSR) method for rare words that uses hierarchical clustering to group rare words together, and we integrate it into the encoder–decoder framework. This hierarchical structure can compensate for missing semantic information on both the source and target sides and provides fuzzy context information that captures the semantics of rare words. The introduced FSR also alleviates data sparseness, which is the bottleneck in handling rare words in neural machine translation. In particular, our method extends easily to the Transformer-based neural machine translation model, where it learns FSRs for all in-vocabulary words, not only rare words, to enhance sentence representations. Our experiments on Chinese-to-English translation tasks confirm that the proposed method significantly improves translation quality. |
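
The abstract does not state the clustering criterion or how a rare word is mapped onto the cluster hierarchy, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes that (1) words are grouped by bottom-up (agglomerative) average-linkage clustering on cosine similarity, and (2) a rough vector for the rare word is available (for example, from a larger monolingual corpus), so that its "fuzzy" representation can be formed as a softmax-weighted mix of cluster centroids. The functions `agglomerative_clusters` and `fuzzy_representation`, the `temperature` parameter, and the toy embeddings are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mean(vectors: List[Vector]) -> Vector:
    """Component-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def agglomerative_clusters(embeddings: Dict[str, Vector], num_clusters: int) -> List[List[str]]:
    """Hypothetical helper: bottom-up average-linkage clustering. Start with one
    cluster per word and repeatedly merge the two clusters with the highest mean
    pairwise cosine similarity until `num_clusters` clusters remain."""
    clusters = [[w] for w in sorted(embeddings)]

    def linkage(c1: List[str], c2: List[str]) -> float:
        sims = [cosine(embeddings[a], embeddings[b]) for a in c1 for b in c2]
        return sum(sims) / len(sims)

    while len(clusters) > num_clusters:
        best = None  # (similarity, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = linkage(clusters[i], clusters[j])
                if best is None or s > best[0]:
                    best = (s, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


def fuzzy_representation(vec: Vector, clusters: List[List[str]],
                         embeddings: Dict[str, Vector],
                         temperature: float = 0.1) -> Tuple[Vector, List[float]]:
    """Hypothetical FSR: soft (softmax) membership of `vec` in every cluster,
    then the membership-weighted sum of the cluster centroids."""
    centroids = [mean([embeddings[w] for w in c]) for c in clusters]
    scores = [cosine(vec, c) / temperature for c in centroids]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    rep = [sum(w * c[d] for w, c in zip(weights, centroids)) for d in range(len(vec))]
    return rep, weights


# Toy 2-D "embeddings" with made-up values, for illustration only.
emb = {"cat": [1.0, 0.1], "dog": [0.9, 0.2], "car": [0.1, 1.0], "bus": [0.2, 0.9]}
clusters = agglomerative_clusters(emb, num_clusters=2)
# A rough vector for a rare, animal-like word that is out of vocabulary.
rep, weights = fuzzy_representation([0.95, 0.15], clusters, emb)
print(clusters, [round(w, 3) for w in weights])
```

Here the rare word's vector receives almost all of its membership weight from the `cat`/`dog` cluster, so its representation lands near that cluster's centroid instead of collapsing to a shared `<unk>` embedding. A lower `temperature` makes membership sharper (closer to a hard cluster assignment); a higher one spreads weight across clusters, which is one plausible reading of "fuzzy" context.
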
Year | DOI | Venue |
---|---|---|
2020 | 10.1109/TFUZZ.2020.2969399 | IEEE Transactions on Fuzzy Systems |

Keywords | DocType | Volume |
---|---|---|
Fuzzy semantic representation (FSR), hierarchical clustering, neural network, neural machine translation (NMT) | Journal | 28 |

Issue | ISSN | Citations |
---|---|---|
5 | 1063-6706 | 3 |

PageRank | References | Authors |
---|---|---|
0.38 | 13 | 6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Yang Muyun | 1 | 112 | 29.50 |
Shujie Liu | 2 | 338 | 37.84 |
Kehai Chen | 3 | 43 | 16.34 |
Hongyang Zhang | 4 | 3 | 0.38 |
Enbo Zhao | 5 | 3 | 0.38 |
Tiejun Zhao | 6 | 643 | 102.68 |