Title
Unsupervised Arabic dialect segmentation for machine translation
Abstract
Resource-limited and morphologically rich languages pose many challenges to natural language processing tasks. Their highly inflected surface forms inflate the vocabulary size and increase sparsity in an already scarce data situation. In this article, we present an unsupervised learning approach to vocabulary reduction through morphological segmentation. We demonstrate its value in the context of machine translation for dialectal Arabic (DA), the primarily spoken, orthographically unstandardized, morphologically rich, and yet resource-poor variants of Standard Arabic. Our approach exploits the existence of monolingual and parallel data. We show comparable performance to state-of-the-art supervised methods for DA segmentation.
Year
2022
DOI
10.1017/S1351324920000455
Venue
NATURAL LANGUAGE ENGINEERING
Keywords
Machine translation, Morphology, Arabic dialects, Unsupervised learning
DocType
Journal
Volume
28
Issue
2
ISSN
1351-3249
Citations
0
PageRank
0.34
References
39
Authors
2
Name
Order
Citations
PageRank
Wael Salloum1596.86
Nizar Habash21833145.59