Arabic Text Categorization Based on Arabic Wikipedia - Citegraph

Paper Info

Title
Arabic Text Categorization Based on Arabic Wikipedia

Abstract
This article describes an algorithm for categorizing Arabic text, relying on highly categorized corpus-based datasets obtained from the Arabic Wikipedia by using manual and automated processes to build and customize categories. The categorization algorithm was built by adopting a simple categorization idea then moving forward to more complex ones. We applied tests and filtration criteria to reach the best and most efficient results that our algorithm can achieve. The categorization depends on the statistical relations between the input (test) text and the reference (training) data supported by well-defined Wikipedia-based categories. Our algorithm supports two levels for categorizing Arabic text; categories are grouped into a hierarchy of main categories and subcategories. This introduces a challenge due to the correlation between certain subcategories and overlap between main categories. We argue that our algorithm achieved good performance compared to other methods reported in the literature.

Year	2014
DOI	10.1145/2537129
Venue	ACM Trans. Asian Lang. Inf. Process.
Keywords	corpus-based datasets, automated process, simple categorization idea, categorization algorithm, arabic text categorization, certain subcategories, categorizing arabic text, arabic wikipedia, main category, customize category, efficient result, text analysis
Field	Categorization, Arabic, Computer science, Computational linguistics, Natural language processing, Artificial intelligence, Arabic natural language processing, Hierarchy, Text categorization
DocType	Journal
Volume	13
Issue	1
Citations	1
PageRank	0.35
References	5
Authors	2

Authors (2 rows)

Name	Order	Citations	PageRank
Adnan Yahya	1	68	4.77
Ali Salhi	2	1	0.69
