Abstract | ||
---|---|---|
This paper introduces a method for segmenting a given word into word parts, including affixes, the word stem, and word roots. In our approach, the word parts (affixes and word roots) in a given training dataset are counted and the relevant probability values are estimated. The method involves training a probabilistic model on a set of annotated word segmentations, finding the most probable word stem and affixes, and finally segmenting the word stem further into word roots. At run time, we first strip the affixes from the given word to derive the stem. Then we segment the stem into word roots. We enumerate all possible segmentations and return the most probable one as the best morphological segmentation of the given word. Moreover, we adjust our probabilistic model by considering the rules for adding suffixes to word roots and the positions of prefixes and suffixes in a word. Preliminary evaluation shows that the proposed method is competitive with previous work. |
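
The count-then-enumerate idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a plain unigram model over word parts, a hypothetical toy training set (`TRAIN`), and a fixed penalty (`unseen`) for parts absent from training. It also collapses affix stripping and root segmentation into a single enumeration, and it leaves out the suffixation rules and affix-position adjustments the abstract mentions.

```python
from collections import Counter
from math import log

# Hypothetical toy training data: each word annotated with its word parts.
TRAIN = [
    ["un", "help", "ful"],
    ["help", "less"],
    ["un", "kind"],
    ["kind", "ness"],
    ["care", "ful"],
    ["care", "less", "ness"],
]


def train(segmented_words):
    """Estimate unigram probabilities of word parts from their counts."""
    counts = Counter(part for parts in segmented_words for part in parts)
    total = sum(counts.values())
    return {part: count / total for part, count in counts.items()}


def enumerate_segmentations(word):
    """Yield every split of `word` into contiguous, non-empty parts."""
    if not word:
        yield []
        return
    for i in range(1, len(word) + 1):
        for rest in enumerate_segmentations(word[i:]):
            yield [word[:i]] + rest


def best_segmentation(word, probs, unseen=1e-6):
    """Return the split with the highest summed log-probability.

    `unseen` is an assumed probability for parts not seen in training.
    """
    def score(parts):
        return sum(log(probs.get(part, unseen)) for part in parts)

    return max(enumerate_segmentations(word), key=score)


probs = train(TRAIN)
print(best_segmentation("unhelpful", probs))  # ['un', 'help', 'ful']
```

Exhaustive enumeration produces 2^(n-1) candidates for a word of n letters. That is manageable for single words, but a dynamic-programming search would scale better for long inputs.
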
Year | DOI | Venue |
---|---|---|
2020 | 10.1109/TAAI51410.2020.00056 | 2020 International Conference on Technologies and Applications of Artificial Intelligence (TAAI) |
Keywords | DocType | ISSN |
morphology,word root,affix,probabilistic model | Conference | 2376-6816 |
ISBN | Citations | PageRank |
978-1-6654-4737-9 | 0 | 0.34 |
References | Authors |
0 | 2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Tsen Hsieh | 1 | 0 | 0.34 |
Jason S. Chang | 2 | 345 | 62.64 |