Title
POS Tagging for Arabic Tweets.
Abstract
Part-of-Speech (POS) tagging is a key step in many NLP algorithms. However, tweets are difficult to POS tag because many phenomena that appear frequently on Twitter are uncommon, or entirely absent, in other domains: tweets are short, are not always written with formal grammar and proper spelling, and often use abbreviations to fit within their restricted length. Arabic tweets show a further range of linguistic phenomena, such as the use of different dialects, romanised Arabic and borrowed foreign words. In this paper, we present an evaluation and a detailed error analysis of state-of-the-art POS taggers for Arabic when applied to Arabic tweets. The accuracy of standard Arabic taggers is typically excellent (96-97%) on Modern Standard Arabic (MSA) text; however, it declines to 49-65% on Arabic tweets. We also present our initial approach to improving the taggers’ performance: by making improvements based on the observed errors, we reach 79% tagging accuracy.
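Tagging accuracy of the kind quoted above is commonly measured per token. The following is a minimal sketch, assuming accuracy is the fraction of tokens whose predicted tag matches the gold tag (the abstract does not state the exact metric or tagset). It computes that figure and tallies the (gold, predicted) confusions that a per-tag error analysis would start from. The function names `tagging_accuracy` and `confusion_counts` and the toy tag sequences are hypothetical illustrations, not the authors' code or data.

```python
from collections import Counter


def tagging_accuracy(gold, predicted):
    """Token-level accuracy: share of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted sequences must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty sequence")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def confusion_counts(gold, predicted):
    """Count (gold_tag, predicted_tag) pairs for mistagged tokens only."""
    return Counter((g, p) for g, p in zip(gold, predicted) if g != p)


if __name__ == "__main__":
    # Hypothetical toy example: five tokens, two of them mistagged.
    gold = ["NOUN", "VERB", "NOUN", "PUNC", "NOUN"]
    pred = ["NOUN", "VERB", "ADJ", "PUNC", "VERB"]
    print(tagging_accuracy(gold, pred))  # 0.6
    print(confusion_counts(gold, pred).most_common())
```

Counting confusions rather than only the overall score shows which gold tags a tagger systematically mislabels, which is the kind of information an error-driven improvement step would need.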
Year: 2015
Venue: RANLP
Field: Arabic, Computer science, Part-of-speech tagging, Speech recognition, Modern Standard Arabic, Natural language processing, Spelling, Artificial intelligence, Formal grammar
DocType: Conference
Citations: 0
PageRank: 0.34
References: 0
Authors: 3
Name | Order | Citations | PageRank
Fahad Albogamy | 1 | 0 | 0.68
Allan Ramsay | 2 | 23 | 8.97
Allan Ramsay | 3 | 23 | 8.97