Abstract | ||
---|---|---|
Part-of-Speech (POS) tagging is a key step in many NLP algorithms. However, tweets are difficult to POS tag because there are many phenomena that frequently appear in Twitter that are not as common, or are entirely absent, in other domains: tweets are short, are not always written maintaining formal grammar and proper spelling, and abbreviations are often used to overcome their restricted lengths. Arabic tweets also show a further range of linguistic phenomena such as usage of different dialects, romanised Arabic and borrowing foreign words. In this paper, we present an evaluation and a detailed error analysis of stateof-the-art POS taggers for Arabic when applied to Arabic tweets. The accuracy of standard Arabic taggers is typically excellent (96-97%) on Modern Standard Arabic (MSA) text; however, their accuracy declines to 49-65% on Arabic tweets. Further, we present our initial approach to improve the taggers’ performance. By doing some improvements based on observed errors, we are able to reach 79% tagging accuracy. |
Year | Venue | Field |
---|---|---|
2015 | RANLP | Arabic,Computer science,Part-of-speech tagging,Speech recognition,Modern Standard Arabic,Natural language processing,Spelling,Artificial intelligence,Formal grammar |
DocType | Citations | PageRank |
Conference | 0 | 0.34 |
References | Authors | |
0 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Fahad Albogamy | 1 | 0 | 0.68 |
Allan Ramsay | 2 | 23 | 8.97 |
Allan Ramsay | 3 | 23 | 8.97 |