Title | ||
---|---|---|
Simultaneous tokenization and part-of-speech tagging for Arabic without a morphological analyzer |
Abstract | ||
---|---|---|
We describe an approach to simultaneous tokenization and part-of-speech tagging that is based on separating the closed and open-class items, and focusing on the likelihood of the possible stems of the open-class words. By encoding some basic linguistic information, the machine learning task is simplified, while achieving state-of-the-art tokenization results and competitive POS results, although with a reduced tag set and some evaluation difficulties. |
Year | Venue | Keywords |
---|---|---|
2010 | ACL (Short Papers) | reduced tag set,openclass word,evaluation difficulty,open-class item,simultaneous tokenization,state-of-the-art tokenization result,morphological analyzer,part-of-speech tagging,basic linguistic information,competitive pos result |
Field | DocType | Volume |
---|---|---|
Rule-based machine translation,Tokenization (data security),Arabic,Lexical analysis,Computer science,Part-of-speech tagging,Speech recognition,Natural language processing,Artificial intelligence,Spectrum analyzer,Encoding (memory) | Conference | P10-2 |
Citations | PageRank | References |
---|---|---|
6 | 0.70 | 2 |
Authors | ||
---|---|---|
1 |
Name | Order | Citations | PageRank |
---|---|---|---|
Seth Kulick | 1 | 221 | 29.66 |