Abstract | ||
---|---|---|
Multilingual text processing has attracted growing attention in recent years. This trend has been accentuated by
the global integration of European states and the vanishing of cultural and social boundaries. Multilingual text processing has
become an important field that raises many new and interesting problems. This paper describes a novel approach to multilingual
plagiarism detection. We propose a new method, called MLPlag, for plagiarism detection in a multilingual environment. The method
is based on the analysis of word positions. It uses the EuroWordNet thesaurus, which transforms words into a language-independent
form. This makes it possible to identify documents plagiarized from sources written in other languages. Special techniques, such as semantic-based
word normalization, were incorporated to refine the method. The refined method identifies synonym replacements that plagiarists use
to hide a document match. We evaluated the method experimentally on monolingual and multilingual corpora, and the results
are presented in this paper.
|
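
The idea of mapping words to a language-independent form can be illustrated with a minimal sketch. This is not the MLPlag implementation: the dictionary `TOY_THESAURUS`, its concept IDs, and the functions `to_concepts` and `concept_overlap` are hypothetical stand-ins, and the Jaccard overlap used here is a simple substitute for the paper's own position-based measure, which the abstract does not specify.

```python
from typing import Dict, List, Tuple

# Hypothetical toy thesaurus standing in for EuroWordNet:
# (language, word) -> language-independent concept ID.
# Synonyms and translations share one concept ID.
TOY_THESAURUS: Dict[Tuple[str, str], str] = {
    ("en", "car"): "C1",
    ("en", "automobile"): "C1",
    ("cs", "auto"): "C1",
    ("en", "fast"): "C2",
    ("en", "quick"): "C2",
    ("cs", "rychlé"): "C2",
    ("en", "red"): "C3",
    ("cs", "červené"): "C3",
}


def to_concepts(tokens: List[str], lang: str) -> List[Tuple[int, str]]:
    """Return (position, concept_id) for each token found in the thesaurus."""
    result = []
    for pos, tok in enumerate(tokens):
        cid = TOY_THESAURUS.get((lang, tok.lower()))
        if cid is not None:
            result.append((pos, cid))
    return result


def concept_overlap(a: List[Tuple[int, str]], b: List[Tuple[int, str]]) -> float:
    """Jaccard overlap of the concept sets of two documents (assumed measure)."""
    set_a = {cid for _, cid in a}
    set_b = {cid for _, cid in b}
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


# An English sentence that uses synonyms, and a Czech counterpart.
english = to_concepts(["The", "quick", "automobile"], "en")
czech = to_concepts(["rychlé", "auto"], "cs")
score = concept_overlap(english, czech)
```

Because "quick" and "automobile" map to the same concept IDs as "rychlé" and "auto", the two token lists yield identical concept sets even though they share no surface words. The recorded positions are kept so that a position-aware comparison could replace the set overlap.
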
Year | DOI | Venue |
---|---|---|
2008 | 10.1007/978-3-540-85776-1_8 | Artificial Intelligence: Methodology, Systems, Applications |
Keywords | Field | DocType |
word position,multilingual environment,multilingual corpus,multilingual plagiarism detection,eurowordnet thesaurus,document match,plagiarism,eurowordnet,multilingual text processing,copy detection,thesaurus,semantic-based word normalization,plagiarism detection,new method,european state,lemmatization,natural language processing | Lemmatisation,Normalization (statistics),Plagiarism detection,Copy detection,Computer science,Artificial intelligence,Natural language processing,EuroWordNet,Text processing | Conference |
Volume | ISSN | Citations |
5253 | 0302-9743 | 16 |
PageRank | References | Authors |
0.91 | 9 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Zdenek Ceska | 1 | 42 | 2.56 |
Michal Toman | 2 | 17 | 1.65 |
Karel Jezek | 3 | 110 | 11.77 |