Title
Word-Level Language Identification in The Chymistry of Isaac Newton
Abstract
In this article, we introduce the task of word-based language identification in multilingual texts, in which every word must be classified with regard to its language. This task is necessary for multilingual texts in which language switches can occur within sentences, often more than once, as is the case in the texts of The Chymistry of Isaac Newton collection. We present a novel method based on character n-grams in combination with a weighting scheme that allows us to model the probability of language switches at different points in sentences. The method reaches its highest accuracy, 89.94%, when 5-grams are used.
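The general idea in the abstract, scoring each word with per-language character n-gram models and weighting against language switches, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy training words, the add-one smoothing, the language labels `en` and `la`, and the single constant `switch_penalty` are all assumptions made for illustration. In particular, the article's weighting scheme models switch probability at different points in a sentence, whereas this sketch applies one fixed penalty everywhere.

```python
import math
from collections import Counter


def char_ngrams(word, n):
    """Return the character n-grams of a word, padded with boundary markers."""
    padded = "^" * (n - 1) + word.lower() + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramModel:
    """Add-one smoothed character n-gram counts for one language (assumed design)."""

    def __init__(self, words, n):
        self.n = n
        self.counts = Counter(g for w in words for g in char_ngrams(w, n))
        self.total = sum(self.counts.values())
        self.vocab = len(self.counts) + 1  # one extra bucket for unseen n-grams

    def log_prob(self, word):
        """Sum of smoothed log-probabilities of the word's n-grams."""
        return sum(
            math.log((self.counts[g] + 1) / (self.total + self.vocab))
            for g in char_ngrams(word, self.n)
        )


def tag_sentence(words, models, switch_penalty=2.0):
    """Label each word with a language using Viterbi decoding.

    Each word contributes its n-gram log-probability under the chosen
    language; every change of language between adjacent words costs a
    fixed switch_penalty (a hypothetical stand-in for a switch weighting).
    """
    if not words:
        return []
    langs = list(models)
    # best[lang] = (score, path) of the best labeling that ends in lang
    best = {lang: (models[lang].log_prob(words[0]), [lang]) for lang in langs}
    for word in words[1:]:
        new_best = {}
        for lang in langs:
            prev_score, prev_path = max(
                (
                    (score - (0.0 if prev == lang else switch_penalty), path)
                    for prev, (score, path) in best.items()
                ),
                key=lambda item: item[0],
            )
            new_best[lang] = (prev_score + models[lang].log_prob(word), prev_path + [lang])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]


# Toy training data, invented for this example only.
models = {
    "en": NgramModel(["the", "and", "with", "water", "metal"], n=2),
    "la": NgramModel(["et", "in", "aqua", "cum", "est"], n=2),
}
tags = tag_sentence(["the", "aqua", "est"], models)
```

Raising `switch_penalty` makes the tagger more reluctant to change language inside a sentence, and a very large value forces a single language for the whole sentence. A realistic setup would train on far more data and, following the abstract, try different n-gram sizes.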
Year
2015
DOI
10.1093/llc/fqu032
Venue
Digital Scholarship in the Humanities
Field
Computer science, Artificial intelligence, Language identification, Natural language processing, A-weighting, Linguistics
DocType
Journal
Volume
30
Issue
4
ISSN
2055-7671
Citations
2
PageRank
0.41
References
6
Authors
3
Name              Order   Citations   PageRank
Levi King         1       10          2.79
Sandra Kübler     2       56          13.29
Wallace Hooper    3       2           0.41