POS Tagging for Arabic Tweets. - Citegraph

Paper Info

Title
POS Tagging for Arabic Tweets.

Abstract
Part-of-Speech (POS) tagging is a key step in many NLP algorithms. However, tweets are difficult to POS tag because there are many phenomena that frequently appear in Twitter that are not as common, or are entirely absent, in other domains: tweets are short, are not always written maintaining formal grammar and proper spelling, and abbreviations are often used to overcome their restricted lengths. Arabic tweets also show a further range of linguistic phenomena such as usage of different dialects, romanised Arabic and borrowing foreign words. In this paper, we present an evaluation and a detailed error analysis of state-of-the-art POS taggers for Arabic when applied to Arabic tweets. The accuracy of standard Arabic taggers is typically excellent (96-97%) on Modern Standard Arabic (MSA) text; however, their accuracy declines to 49-65% on Arabic tweets. Further, we present our initial approach to improve the taggers’ performance. By doing some improvements based on observed errors, we are able to reach 79% tagging accuracy.

Year	Venue	Field
2015	RANLP	Arabic, Computer science, Part-of-speech tagging, Speech recognition, Modern Standard Arabic, Natural language processing, Spelling, Artificial intelligence, Formal grammar

DocType	Citations	PageRank
Conference	0	0.34

References	Authors
0	3

Authors (3 rows)

Name	Order	Citations	PageRank
Fahad Albogamy	1	0	0.68
Allan Ramsay	2	23	8.97
Allan Ramsay	3	23	8.97
