Title |
---|
Utilization of Multi-word Expressions to Improve Statistical Machine Translation of Statutory Sentences |
Abstract |
---|
Statutory sentences are generally difficult to read because of their complicated expressions and length, and this difficulty is one reason for the low quality of their statistical machine translation (SMT). Multi-word expressions (MWEs) also complicate statutory sentences and extend their length. We therefore propose a method that utilizes MWEs to improve the SMT of statutory sentences. In our method, we extract monolingual MWEs from a parallel corpus, automatically acquire their translation equivalents based on the Dice coefficient, and integrate the resulting bilingual MWEs into an SMT system using a single-tokenization strategy. In our experiments, the SMT system using the proposed method significantly improved translation quality. Although automatic acquisition of translation equivalents using the Dice coefficient is not perfect, the best system's score was close to that of a system that used bilingual MWEs whose equivalents were translated by hand. |
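
The pipeline summarized in the abstract (score candidate translation pairs with the Dice coefficient, then join each MWE into a single token) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the naive substring matching, the underscore joining convention, and the toy sentence pairs are all assumptions made for illustration; the only element taken from the abstract is the use of the Dice coefficient, in its standard form 2·f(x, y) / (f(x) + f(y)) over sentence-level frequencies, and the idea of single-tokenization.

```python
from collections import Counter
from itertools import product


def dice(cooc, freq_src, freq_tgt):
    """Dice coefficient 2*f(x, y) / (f(x) + f(y)) over sentence counts."""
    denom = freq_src + freq_tgt
    return 2.0 * cooc / denom if denom else 0.0


def acquire_equivalents(pairs, src_mwes, tgt_candidates):
    """Pick, for each source MWE, the target candidate with the highest Dice score.

    pairs is a list of (source_sentence, target_sentence) strings.
    Matching is plain substring search, a simplification for this sketch.
    """
    src_freq, tgt_freq, cooc = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        src_hits = {m for m in src_mwes if m in src}
        tgt_hits = {c for c in tgt_candidates if c in tgt}
        src_freq.update(src_hits)          # sentences containing each MWE
        tgt_freq.update(tgt_hits)          # sentences containing each candidate
        cooc.update(product(src_hits, tgt_hits))  # sentence pairs containing both

    equivalents = {}
    for mwe in src_mwes:
        best = max(
            tgt_candidates,
            key=lambda c: dice(cooc[(mwe, c)], src_freq[mwe], tgt_freq[c]),
        )
        score = dice(cooc[(mwe, best)], src_freq[mwe], tgt_freq[best])
        if score > 0:
            equivalents[mwe] = (best, score)
    return equivalents


def single_tokenize(sentence, mwes):
    """Rewrite each MWE as one token by replacing its spaces with underscores."""
    # Longest MWEs first, so a shorter MWE cannot split a longer one.
    for mwe in sorted(mwes, key=len, reverse=True):
        sentence = sentence.replace(mwe, mwe.replace(" ", "_"))
    return sentence


# Invented toy data (not from the paper): English source, romanized target.
PAIRS = [
    ("the fee is paid in accordance with the rules",
     "kisoku ni shitagai tesuryo wo harau"),
    ("in accordance with this act the minister decides",
     "kono horitsu ni shitagai daijin ga kimeru"),
    ("notwithstanding the provisions of article 3 the fee is waived",
     "dai 3 jo no kitei ni kakawarazu tesuryo wo menjo suru"),
]
SRC_MWES = ["in accordance with", "notwithstanding the provisions of"]
TGT_CANDIDATES = ["ni shitagai", "no kitei ni kakawarazu", "tesuryo wo"]

EQUIVALENTS = acquire_equivalents(PAIRS, SRC_MWES, TGT_CANDIDATES)
print(EQUIVALENTS)
print(single_tokenize(PAIRS[0][0], SRC_MWES))
```

On this toy data, "in accordance with" co-occurs with "ni shitagai" in both sentences where either appears, so that pair scores higher than the distractor "tesuryo wo". After single-tokenization, each MWE reaches the downstream SMT toolkit as one vocabulary item, for example `in_accordance_with`. A real system would match on token boundaries and would likely apply a score threshold before accepting a pair.
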
Year | DOI | Venue |
---|---|---|
2015 | 10.1007/978-3-319-50953-2_18 | Lecture Notes in Artificial Intelligence |
Keywords | Field | DocType |
Multi-word expressions,Statistical machine translation,Legal information sharing | Rule-based machine translation,Expression (mathematics),Statutory law,Sørensen–Dice coefficient,Computer science,Machine translation,Speech recognition,Natural language processing,Artificial intelligence,Automatic translation | Conference |
Volume | ISSN | Citations |
10091 | 0302-9743 | 0 |
PageRank | References | Authors |
0.34 | 0 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Satomi Sakamoto | 1 | 0 | 0.34 |
Yasuhiro Ogawa | 2 | 1 | 3.08 |
Makoto Nakamura | 3 | 28 | 7.99 |
Tomohiro Ohno | 4 | 31 | 10.06 |
Katsuhiko Toyama | 5 | 39 | 11.41 |