Zero-Shot North Korean To English Neural Machine Translation By Character Tokenization And Phoneme Decomposition - Citegraph

Paper Info

Title
Zero-Shot North Korean To English Neural Machine Translation By Character Tokenization And Phoneme Decomposition

Abstract
The primary limitation of North Korean-to-English translation is the lack of a parallel corpus, without which high translation accuracy cannot be achieved. To address this problem, we propose a zero-shot approach that uses South Korean data, which are remarkably similar to North Korean data. We tokenize South Korean text at the character level, decompose each character into phonemes, and train a neural machine translation model on the result. We demonstrate that our method can effectively learn North Korean-to-English translation and improves the BLEU score by 1.01 points over the baseline.
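The preprocessing described in the abstract (character-level tokenization, then decomposition of each Hangul character into its phonemes, or jamo) can be illustrated with a minimal Python sketch. It relies only on the Unicode Standard's arithmetic for precomposed Hangul syllables. The function names `decompose_syllable` and `tokenize` and the word-boundary marker `▁` are hypothetical, and the paper's exact token format is not given on this page, so this is an assumption-based illustration rather than the authors' implementation.

```python
# Constants for precomposed Hangul syllables, as defined in the Unicode Standard.
S_BASE = 0xAC00  # first syllable, "가"
L_BASE = 0x1100  # first leading consonant (choseong)
V_BASE = 0x1161  # first vowel (jungseong)
T_BASE = 0x11A7  # one before the first trailing consonant; index 0 means "no final"
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT  # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT  # 11172 precomposed syllables in total


def decompose_syllable(ch: str) -> list[str]:
    """Split one precomposed Hangul syllable into its conjoining jamo.

    Characters outside the Hangul syllable block are returned unchanged.
    """
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return [ch]
    lead = L_BASE + s_index // N_COUNT
    vowel = V_BASE + (s_index % N_COUNT) // T_COUNT
    tail = s_index % T_COUNT
    jamo = [chr(lead), chr(vowel)]
    if tail:  # only emit a final consonant if the syllable has one
        jamo.append(chr(T_BASE + tail))
    return jamo


def tokenize(sentence: str, boundary: str = "\u2581") -> list[str]:
    """Character-level tokenization followed by jamo decomposition.

    Assumption: each whitespace-separated word is prefixed with a
    hypothetical boundary token so word breaks survive decomposition.
    """
    tokens: list[str] = []
    for word in sentence.split():
        tokens.append(boundary)
        for ch in word:  # character-level split
            tokens.extend(decompose_syllable(ch))  # then phoneme (jamo) split
    return tokens


if __name__ == "__main__":
    print(tokenize("한국 말"))
```

For example, `tokenize("한국 말")` yields a boundary marker followed by ᄒ ᅡ ᆫ ᄀ ᅮ ᆨ, then a second marker followed by ᄆ ᅡ ᆯ. Working at the jamo level reduces the symbol inventory from thousands of syllable blocks to a few dozen jamo. For precomposed syllables, Python's `unicodedata.normalize("NFD", s)` produces the same jamo sequence.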

Year: 2020
Venue: ACL
DocType: Conference
Volume: 2020.acl-srw
Citations: 0
PageRank: 0.34
References: 0
Authors: 3

Authors (3)

Name	Order	Citations	PageRank
Hwichan Kim	1	0	0.68
Tosho Hirasawa	2	0	3.38
Mamoru Komachi	3	241	44.56
