Title
Probabilistic Segmentation of Word Forms into Affixes and Word Roots
Abstract
This paper introduces a method for segmenting a given word into its parts: affixes, the word stem, and word roots. In our approach, the word parts (affixes and word roots) in a training dataset are counted, and the relevant probabilities are estimated from those counts. The method involves training a probabilistic model on a set of annotated word segmentations, finding the most probable word stem and affixes, and then further segmenting the word stem into word roots. At run time, we first strip the affixes from the given word to derive the stem, and then segment the stem into word roots. We enumerate all possible segmentations and return the most probable one as the best morphological segmentation of the given word. Moreover, we adjust the probabilistic model to account for the rules for adding suffixes to word roots and for the positions of prefixes and suffixes in a word. Preliminary evaluation shows that the proposed method is competitive with previous work.
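The count, enumerate, and select steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy training data, the function names (`train`, `segmentations`, `best_segmentation`), and the unigram scoring are all assumptions made for illustration. The sketch enumerates affixes and roots jointly instead of stripping affixes first, and it handles only the prefix-root-suffix ordering, not the spelling rules for adding suffixes.

```python
from collections import Counter
from math import log

# Hypothetical toy annotations: each word is a list of (part, kind) pairs.
TRAINING = [
    [("un", "prefix"), ("break", "root"), ("able", "suffix")],
    [("re", "prefix"), ("read", "root"), ("able", "suffix")],
    [("un", "prefix"), ("read", "root"), ("able", "suffix")],
    [("break", "root"), ("ing", "suffix")],
    [("read", "root"), ("er", "suffix")],
]

# Position constraint: prefixes come before roots, roots before suffixes.
ORDER = {"prefix": 0, "root": 1, "suffix": 2}


def train(annotated):
    """Estimate log relative frequencies of (part, kind) pairs from counts."""
    counts = Counter(part for word in annotated for part in word)
    total = sum(counts.values())
    return {part: log(c / total) for part, c in counts.items()}


def segmentations(word, model, min_rank=0):
    """Yield (log_prob, parts) for every split of `word` into known parts
    whose kinds never move backwards in the prefix/root/suffix order."""
    if not word:
        yield 0.0, []
        return
    for end in range(1, len(word) + 1):
        piece = word[:end]
        for kind, rank in ORDER.items():
            if rank >= min_rank and (piece, kind) in model:
                for lp, rest in segmentations(word[end:], model, rank):
                    yield model[(piece, kind)] + lp, [(piece, kind)] + rest


def best_segmentation(word, model):
    """Return the most probable segmentation containing at least one root,
    or None if the word cannot be covered by known parts."""
    candidates = [
        (lp, parts)
        for lp, parts in segmentations(word, model)
        if any(kind == "root" for _, kind in parts)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[0])[1]


MODEL = train(TRAINING)
print(best_segmentation("unreadable", MODEL))
# [('un', 'prefix'), ('read', 'root'), ('able', 'suffix')]
```

Exhaustive enumeration is exponential in the worst case; for words of ordinary length this is usually tolerable, and a dynamic-programming search over the same scores would be a natural replacement if it is not.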
Year
2020
DOI
10.1109/TAAI51410.2020.00056
Venue
2020 International Conference on Technologies and Applications of Artificial Intelligence (TAAI)
Keywords
morphology, word root, affix, probabilistic model
DocType
Conference
ISSN
2376-6816
ISBN
978-1-6654-4737-9
Citations
0
PageRank
0.34
References
0
Authors
2
Name              Order   Citations   PageRank
Tsen Hsieh        1       0           0.34
Jason S. Chang    2       345         62.64