Abstract | ||
---|---|---|
We discuss and solve the task of Khmer name Romanization. Although several standard Romanization systems exist for Khmer, conventional transcription methods prevail in practice. These are inconsistent and, in some cases, complicated, because the phonemic, orthographic, and etymological principles behind them are unstable. Consequently, statistical approaches are required for the task. We collect and manually align 7,658 Khmer name Romanization instances. The alignment scheme is designed to reach a precise, consistent, and monotonic correspondence between the two writing systems at the grapheme level, which facilitates various machine learning approaches. Experimental results demonstrate that standard approaches based on conditional random fields and support vector machines, supervised by the manual alignment, achieve a precision of 0.99 at the grapheme level, outperforming a state-of-the-art recurrent neural network approach applied in a pure sequence-to-sequence manner. The manually aligned data have been released under a CC BY-NC-SA license for the research community. |
Year | DOI | Venue |
---|---|---|
2017 | 10.1007/978-981-10-8438-6_15 | Communications in Computer and Information Science |
DocType | Volume | ISSN |
Conference | 781 | 1865-0929 |
Citations | PageRank | References |
0 | 0.34 | 0 |
Authors | ||
6 |
Name | Order | Citations | PageRank |
---|---|---|---|
Chenchen Ding | 1 | 3 | 1.21 |
Vichet Chea | 2 | 0 | 0.34 |
Masao Utiyama | 3 | 714 | 86.69 |
Eiichiro Sumita | 4 | 1466 | 190.87 |
Sethserey Sam | 5 | 0 | 0.34 |
Sopheap Seng | 6 | 0 | 0.34 |