Effective Multi-Dialectal Arabic POS Tagging

Paper Info

Title
Effective Multi-Dialectal Arabic POS Tagging

Abstract
This work introduces robust multi-dialectal part-of-speech (POS) tagging trained on an annotated data set of Arabic tweets in four major dialect groups: Egyptian, Levantine, Gulf, and Maghrebi. We implement two different sequence tagging approaches. The first uses conditional random fields (CRFs), while the second combines word- and character-based representations in a deep neural network with stacked layers of convolutional and recurrent networks and a CRF output layer. We successfully exploit a variety of features that help generalize our models, such as Brown clusters and stem templates. We also develop robust joint models that tag multi-dialectal tweets and outperform uni-dialectal taggers. We achieve a combined accuracy of 92.4% across all dialects, with per-dialect results ranging between 90.2% and 95.4%. We obtained the results using a train/dev/test split of 70/10/20 for a data set of 350 tweets per dialect.
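To make the split sizes in the abstract concrete, here is a minimal sketch of a 70/10/20 split applied per dialect. The abstract does not say how tweets were assigned to each portion, so the random shuffle, the seed, the function name `split_dialect`, and the placeholder tweet IDs are all assumptions for illustration only.

```python
import random

# Dialect groups named in the abstract.
DIALECTS = ["Egyptian", "Levantine", "Gulf", "Maghrebi"]
TWEETS_PER_DIALECT = 350  # figure taken from the abstract


def split_dialect(tweets, seed=0, train_pct=70, dev_pct=10):
    """Shuffle a list of tweets and split it into train/dev/test portions.

    Hypothetical helper: the shuffling and the seed are assumptions,
    not the procedure described in the paper. Integer percentages
    avoid floating-point rounding when computing the cut points.
    """
    items = list(tweets)
    random.Random(seed).shuffle(items)
    n_train = len(items) * train_pct // 100
    n_dev = len(items) * dev_pct // 100
    train = items[:n_train]
    dev = items[n_train:n_train + n_dev]
    test = items[n_train + n_dev:]  # the remainder (20% here)
    return train, dev, test


# Placeholder tweet IDs stand in for the real annotated tweets.
splits = {
    dialect: split_dialect([f"{dialect}-{i}" for i in range(TWEETS_PER_DIALECT)])
    for dialect in DIALECTS
}

for dialect, (train, dev, test) in splits.items():
    print(dialect, len(train), len(dev), len(test))
```

Under these assumptions each dialect contributes 245 training, 35 development, and 70 test tweets, which gives 980, 140, and 280 tweets respectively when the four dialects are pooled for a joint model.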

Field	Value
Year	2020
DOI	10.1017/S1351324920000078
Venue	Natural Language Engineering
Keywords	Part-of-speech tagging, Arabic, Dialects, Deep neural network, Brown clusters
DocType	Journal
Volume	26
Issue	6
ISSN	1351-3249
Citations	0
PageRank	0.34
References	0
Authors	8

Authors (8 rows)


Name	Order	Citations	PageRank
Kareem Darwish	1	615	52.39
Mohammed Attia	2	146	16.51
Hamdy Mubarak	3	140	19.60
Younes Samih	4	38	11.26
Ahmed Abdelali	5	152	25.84
Lluís Màrquez	6	0	0.34
Mohamed Eldesouki	7	2	2.05
Laura Kallmeyer	8	165	38.11
