Title
Simultaneous tokenization and part-of-speech tagging for Arabic without a morphological analyzer
Abstract
We describe an approach to simultaneous tokenization and part-of-speech tagging that is based on separating the closed-class and open-class items, and focusing on the likelihood of the possible stems of the open-class words. By encoding some basic linguistic information, the machine learning task is simplified, while achieving state-of-the-art tokenization results and competitive POS results, although with a reduced tag set and some evaluation difficulties.
Year
2010
Venue
ACL (Short Papers)
Keywords
reduced tag set,openclass word,evaluation difficulty,open-class item,simultaneous tokenization,state-of-the-art tokenization result,morphological analyzer,part-of-speech tagging,basic linguistic information,competitive pos result
Field
Rule-based machine translation,Tokenization (data security),Arabic,Lexical analysis,Computer science,Part-of-speech tagging,Speech recognition,Natural language processing,Artificial intelligence,Spectrum analyzer,Encoding (memory)
DocType
Conference
Volume
P10-2
Citations
6
PageRank
0.70
References
2
Authors
1
Name
Seth Kulick
Order
1
Citations
221
PageRank
29.66