Abstract |
---|
It has become common, especially among urban youth, for people to use more than one language in their everyday conversations - a phenomenon referred to by linguists as "code-switching". With the rise in globalization and the widespread use of code-switching in multilingual societies, a great demand has been placed on Natural Language Processing (NLP) applications to handle such mixed data. In this paper, we present our efforts in language modeling for code-switched Arabic-English. Training a language model (LM) requires large amounts of text data in the respective language; the main challenge in language modeling for code-switched languages is the lack of available data. We propose an approach to artificially generate code-switched Arabic-English n-grams and thus improve the language model. This is done by expanding the relatively small available corpus and its corresponding n-grams using translation-based approaches. The final LM achieved relative improvements of 1.97% in perplexity and 16.36% in OOV rate. |
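The translation-based expansion the abstract describes can be illustrated, in a much simplified form, as word-level substitution from a bilingual dictionary: monolingual sentences are expanded into code-switched variants, which then supply additional n-grams for LM training. The function, sentence, and dictionary below are hypothetical illustrations under that assumption, not the authors' actual method:

```python
from itertools import combinations

def generate_code_switched(sentence, translations, max_switches=2):
    """Generate code-switched variants of a monolingual sentence by
    substituting up to max_switches words with their dictionary
    translations (a toy stand-in for translation-based corpus expansion)."""
    words = sentence.split()
    # positions whose word has an entry in the bilingual dictionary
    switchable = [i for i, w in enumerate(words) if w in translations]
    variants = []
    for k in range(1, max_switches + 1):
        for idxs in combinations(switchable, k):
            variant = list(words)
            for i in idxs:
                variant[i] = translations[variant[i]]
            variants.append(" ".join(variant))
    return variants

# Hypothetical example: a transliterated Arabic sentence with two
# dictionary entries yields three code-switched variants.
variants = generate_code_switched(
    "ana baheb el kora",
    {"baheb": "love", "kora": "football"},
)
```

Each variant can then be fed through the usual n-gram counting pipeline, so the augmented counts cover Arabic-English transition patterns absent from the small original corpus.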
Year | DOI | Venue |
---|---|---|
2018 | 10.1007/978-3-319-99010-1_20 | PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON ADVANCED INTELLIGENT SYSTEMS AND INFORMATICS 2018 |
Keywords | DocType | Volume
---|---|---
Code-switching, Code-mixing, Arabic-English, Language modeling, Natural language generation | Conference | 845
ISSN | Citations | PageRank
---|---|---
2194-5357 | 0 | 0.34
References | Authors
---|---
0 | 3
Name | Order | Citations | PageRank |
---|---|---|---|
Injy Hamed | 1 | 1 | 3.08 |
Mohamed Elmahdy | 2 | 13 | 4.57 |
Slim Abdennadher | 3 | 394 | 60.95 |