Title
Tibetan number identification based on classification of number components in Tibetan word segmentation
Abstract
Tibetan word segmentation is essential for Tibetan information processing. Because no segmented Tibetan corpus is available for training, current systems mainly segment Tibetan words with basic dictionary-based machine matching. Dictionary-based matching, however, is poorly suited to identifying Tibetan numbers. This paper studies the characteristics of Tibetan numbers and proposes a method to identify them based on classification of number components. During segmentation, the method first tags every number component with the class it belongs to, and then updates the tag sequence according to predefined rules. Finally, adjacent number components are combined into a Tibetan number if they meet a certain requirement. In tests on a 7938K Tibetan corpus, the identification accuracy is 99.21%.
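The three stages described in the abstract (tag components, update tags by rule, combine adjacent components) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the class names `DIGIT`, `UNIT` and `LINK`, the tiny lexicon of Wylie-style transliterations, the single context rule for the connective, and the "maximal run" combination requirement are hypothetical and are not taken from the paper, whose actual classes and rules the abstract does not list.

```python
# Hypothetical component classes (assumed for illustration, not from the paper).
DIGIT, UNIT, LINK, OTHER = "DIGIT", "UNIT", "LINK", "O"

# Tiny illustrative lexicon using Wylie-style transliterations (assumption).
LEXICON = {
    "gcig": DIGIT, "gnyis": DIGIT, "gsum": DIGIT,   # one, two, three
    "bcu": UNIT, "brgya": UNIT, "stong": UNIT,      # ten, hundred, thousand
    "dang": LINK,                                   # "and": ambiguous connective
}


def tag_components(tokens):
    """Stage 1: tag each segmented token with its (assumed) component class."""
    return [LEXICON.get(tok, OTHER) for tok in tokens]


def update_tags(tags):
    """Stage 2: apply an assumed rule to the tag sequence.

    A LINK stays a number component only when it sits directly between
    two DIGIT/UNIT components; otherwise it is retagged as OTHER.
    """
    updated = list(tags)
    for i, tag in enumerate(tags):
        if tag != LINK:
            continue
        prev_ok = i > 0 and tags[i - 1] in (DIGIT, UNIT)
        next_ok = i + 1 < len(tags) and tags[i + 1] in (DIGIT, UNIT)
        if not (prev_ok and next_ok):
            updated[i] = OTHER
    return updated


def combine(tokens, tags):
    """Stage 3: merge each maximal run of number components into one token."""
    out, run = [], []
    for tok, tag in zip(tokens, tags):
        if tag != OTHER:
            run.append(tok)
            continue
        if run:
            out.append(" ".join(run))
            run = []
        out.append(tok)
    if run:
        out.append(" ".join(run))
    return out


def identify_numbers(tokens):
    """Run the three stages on an already segmented token list."""
    return combine(tokens, update_tags(tag_components(tokens)))
```

In this sketch, a connective between two number components is absorbed into the number, while the same connective between ordinary words is left alone, which is the kind of context-dependent decision a plain dictionary lookup cannot make.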
Year
2010
Venue
COLING (Posters)
Keywords
tibetan number identification,number component,tibetan information processing,tibetan number,segmented tibetan corpus,identification accuracy,segment tibetan word,last adjacent number component,tibetan word segmentation,tibetan corpus,word segmentation
Field
Market segmentation,Information processing,Pattern recognition,Computer science,Text segmentation,Natural language processing,Artificial intelligence
DocType
Conference
Volume
C10-2
Citations
2
PageRank
0.40
References
1
Authors
6
Name          Order  Citations  PageRank
Huidan Liu    1      16         5.09
Weina Zhao    2      2          0.73
Minghua Nuo   3      11         4.22
Li Jiang      4      2          0.40
Jian Wu       5      2          0.40
Yeping He     6      77         14.64