Title
Expanding N-grams for Code-Switch Language Models.
Abstract
It has become common, especially among urban youth, for people to use more than one language in their everyday conversations - a phenomenon linguists refer to as "code-switching". With the rise of globalization and the widespread use of code-switching in multilingual societies, there is a growing demand for Natural Language Processing (NLP) applications that can handle such mixed data. In this paper, we present our efforts in language modeling for code-switch Arabic-English. Training a language model (LM) requires large amounts of text data in the respective language; the main challenge in language modeling for code-switch languages is therefore the lack of available data. We propose an approach to artificially generate code-switch Arabic-English n-grams and thereby improve the language model, by expanding the relatively small available corpus and its corresponding n-grams using translation-based approaches. The final LM achieved relative improvements in perplexity and OOV rate of 1.97% and 16.36%, respectively.
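The translation-based expansion described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the toy English-to-Arabic lexicon and the substitution scheme (replacing one word of a monolingual n-gram at a time with its translation) are invented for demonstration, not the paper's actual resources or method.

```python
# Illustrative sketch of translation-based n-gram expansion for
# code-switch language modeling. The lexicon and n-grams below are
# invented toy data, not the paper's actual resources.

# Toy English-to-Arabic lexicon (romanized for readability).
LEXICON = {"big": "kabir", "house": "bayt", "book": "kitab"}

def expand_ngram(ngram):
    """Generate synthetic code-switched variants of an n-gram by
    replacing each translatable word, one at a time, with its
    translation from the lexicon."""
    variants = []
    for i, word in enumerate(ngram):
        translation = LEXICON.get(word)
        if translation is not None:
            variants.append(tuple(ngram[:i]) + (translation,) + tuple(ngram[i + 1:]))
    return variants

# Example: one monolingual trigram yields two synthetic code-switch trigrams.
print(expand_ngram(["the", "big", "house"]))
# -> [('the', 'kabir', 'house'), ('the', 'big', 'bayt')]
```

The synthetic n-grams produced this way could then be added to the n-gram counts of the small code-switch corpus before LM estimation, which is the general idea of expanding the corpus and its n-grams that the abstract outlines.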
Year
2018
DOI
10.1007/978-3-319-99010-1_20
Venue
PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON ADVANCED INTELLIGENT SYSTEMS AND INFORMATICS 2018
Keywords
Code-switching, Code-mixing, Arabic-English, Language modeling, Natural language generation
DocType
Conference
Volume
845
ISSN
2194-5357
Citations
0
PageRank
0.34
References
0
Authors
3
Name, Order, Citations, PageRank
Injy Hamed, 1, 1, 3.08
Mohamed Elmahdy, 2, 13, 4.57
Slim Abdennadher, 3, 3946, 0.95