Abstract | ||
---|---|---|
With the sheer size and diversity of web content and the emergence of more social media platforms and blog websites, more people are contributing content of varying quality. Many users prefer to remain anonymous when posting material to the web, so more texts (articles, blogs, essays and emails) are published under assumed identities or with no known author. This can lead to copyright and other legal issues, hence the need for good authorship attribution systems. The problem may be more acute for Arabic texts due to restrictions, actual and perceived, on electronic content publication and to prevailing social norms. In this paper we study Arabic author attribution (AAA): assigning an Arabic (Modern Standard Arabic, MSA) article to a particular author from a given set of potential authors. We considered many features for training and testing our AAA models, studying the effects of part-of-speech (PoS) tags, stylistic features such as punctuation usage and sentence characteristics, word types, and word diversity. In general, PoS features, word n-gram features and rare words proved the most informative for our task. We also investigated the effect of the number of potential authors, the number of articles per author, and the size of the text chunks used, and we report the results. |
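
To make the feature families named in the abstract more concrete, here is a minimal Python sketch of how word n-grams, punctuation usage, sentence length, word diversity and rare words might be computed from raw text. It is an illustration only, not the authors' implementation: the function names, the whitespace tokenizer, the punctuation set, and the use of once-occurring words as a stand-in for "rare words" are all assumptions, and PoS tagging is omitted because it needs an external Arabic tagger.

```python
import re
from collections import Counter

# Assumed punctuation set: common Latin marks plus the Arabic comma,
# semicolon and question mark. Not taken from the paper.
PUNCTUATION = set(".,;:!?\u060C\u061B\u061F")


def tokenize(text):
    """Split on whitespace and strip punctuation from token edges (a simplification)."""
    strip_chars = "".join(PUNCTUATION)
    tokens = []
    for raw in text.split():
        tok = raw.strip(strip_chars)
        if tok:
            tokens.append(tok)
    return tokens


def word_ngrams(tokens, n=2):
    """Count word n-grams as tuples of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def style_features(text):
    """Return a few illustrative stylometric ratios for one text chunk."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    n_tokens = len(tokens)
    # Sentences are approximated by splitting on terminal punctuation.
    sentences = [s for s in re.split(r"[.!?\u061F]+", text) if s.strip()]
    if n_tokens == 0:
        return {"type_token_ratio": 0.0, "hapax_ratio": 0.0,
                "punct_per_token": 0.0, "avg_sentence_len": 0.0}
    return {
        # Word diversity: distinct words over total words.
        "type_token_ratio": len(counts) / n_tokens,
        # Rare-word proxy: words occurring exactly once in the chunk.
        "hapax_ratio": sum(1 for c in counts.values() if c == 1) / n_tokens,
        # Punctuation usage relative to chunk length.
        "punct_per_token": sum(ch in PUNCTUATION for ch in text) / n_tokens,
        # Sentence characteristic: mean tokens per sentence.
        "avg_sentence_len": n_tokens / len(sentences) if sentences else 0.0,
    }
```

Feature dictionaries like these, computed per text chunk, could be combined with n-gram counts and passed to any standard classifier. The abstract does not say which classifier or feature encoding the authors used.
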
Year | DOI | Venue |
---|---|---|
2019 | 10.1007/978-3-030-32959-4_14 | Communications in Computer and Information Science |
Keywords | DocType | Volume |
Arabic authorship attribution,Arabic plagiarism detection,Writing style recognition,Arabic special features,Arabic text author identification | Conference | 1108 |
ISSN | Citations | PageRank |
1865-0929 | 0 | 0.34 |
References | Authors | |
0 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Maha Hajja | 1 | 0 | 0.34 |
Ahmad Yahya | 2 | 0 | 0.34 |
Adnan Yahya | 3 | 68 | 4.77 |