Abstract |
---|
In this paper, we present our research on developing Natural Language Processing technologies for Ainu, a critically endangered language isolate spoken by the Ainu people, the native inhabitants of the northern parts of the Japanese archipelago. In particular, we focus on improving the existing tools for transcription normalization, word segmentation (tokenization), and part-of-speech tagging. In the experiments, we applied two Ainu language dictionaries from different domains (literary and colloquial) and created a new data set by combining them. The experiments confirmed the positive effect of these modifications on the overall performance of the tools, especially on test samples unrelated to the training data. We also discuss further improvements obtained by applying corpus-driven language models to word segmentation and by using a state-of-the-art tool for training part-of-speech taggers. |
Year | DOI | Venue |
---|---|---|
2017 | 10.1007/978-3-030-66527-2_10 | LTC |
DocType | Citations | PageRank |
---|---|---|
Conference | 0 | 0.34 |
References | Authors |
---|---|
0 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Karol Nowakowski | 1 | 0 | 0.34 |
Michal Ptaszynski | 2 | 132 | 25.47 |
Fumito Masui | 3 | 87 | 27.22 |