Abstract |
---|
In this article, we introduce the task of word-based language identification in multilingual texts, in which every word needs to be classified with regard to its language. This task is necessary for multilingual texts in which language switches can occur within sentences, often more than once, as is the case in the texts in The Chymistry of Isaac Newton collection. We present a novel method based on character n-grams in combination with a weighting scheme that allows us to model the probability of language switches at different points in sentences. This method reaches the highest accuracy of 89.94% when 5-grams are used. |
Year | DOI | Venue |
---|---|---|
2015 | 10.1093/llc/fqu032 | Digital Scholarship in the Humanities |
Field | DocType | Volume |
Computer science, Artificial intelligence, Language identification, Natural language processing, A-weighting, Linguistics | Journal | 30 |
Issue | ISSN | Citations |
4 | 2055-7671 | 2 |
PageRank | References | Authors |
0.41 | 6 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Levi King | 1 | 10 | 2.79 |
Sandra Kübler | 2 | 56 | 13.29 |
Wallace Hooper | 3 | 2 | 0.41 |
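
The abstract describes per-word language identification from character n-grams combined with a weighting for language switches. The sketch below illustrates that general idea only and is not the authors' implementation: the add-one smoothing, the greedy left-to-right decoding, the `stay_bonus` parameter, and all function names are assumptions introduced here, and the toy word lists in the usage note are made up for illustration.

```python
import math
from collections import Counter


def char_ngrams(word, n):
    """Return the character n-grams of a word padded with boundary markers."""
    padded = "^" + word.lower() + "$"
    if len(padded) < n:
        return [padded]
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train(words_by_lang, n):
    """Count character n-grams per language from lists of example words."""
    return {
        lang: Counter(g for w in words for g in char_ngrams(w, n))
        for lang, words in words_by_lang.items()
    }


def word_score(word, counts, vocab_size, n):
    """Add-one smoothed log-probability of a word's n-grams under one language."""
    total = sum(counts.values())
    return sum(
        math.log((counts[g] + 1) / (total + vocab_size))
        for g in char_ngrams(word, n)
    )


def tag_words(words, models, n, stay_bonus=0.5):
    """Greedily label each word with a language.

    stay_bonus is a hypothetical stand-in for a switch weighting: it is
    added to the score of the previous word's language, so a switch only
    happens when the n-gram evidence outweighs it.
    """
    vocab_size = len(set().union(*models.values()))
    labels, prev = [], None
    for w in words:
        scores = {}
        for lang, counts in models.items():
            s = word_score(w, counts, vocab_size, n)
            if lang == prev:
                s += stay_bonus
            scores[lang] = s
        prev = max(scores, key=scores.get)
        labels.append(prev)
    return labels
```

As a usage example, `train({"en": ["the", "and", "with", "which"], "la": ["et", "in", "quod", "aqua"]}, 2)` builds two toy bigram models, and `tag_words(["the", "aqua"], models, 2)` then labels each word separately. The n-gram order is a parameter; the abstract reports that 5-grams gave the best accuracy on the Newton texts, which would require far more training data than this toy example.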