Abstract | ||
---|---|---|
Cross-language plagiarism is on the rise for three reasons: (i) speakers of under-resourced languages often consult documentation in a foreign language, (ii) people immersed in a foreign country can still consult material written in their native language, and (iii) people are often interested in writing in a language other than their native one. Most efforts to detect cross-language plagiarism automatically depend on a preliminary translation, which is not always available. In this paper we propose a freely available architecture for plagiarism detection across languages covering the entire process: heuristic retrieval, detailed analysis, and post-processing. On top of this architecture we explore the suitability of three cross-language similarity estimation models: Cross-Language Alignment-based Similarity Analysis (CL-ASA), Cross-Language Character n-Grams (CL-CNG), and Translation plus Monolingual Analysis (T+MA); the three models differ inherently in nature and in the resources they require. The three models are tested extensively under the same conditions on the different plagiarism detection sub-tasks, something never done before. The experiments show that T+MA produces the best results, closely followed by CL-ASA. Still, CL-ASA obtains higher precision, an important factor in plagiarism detection when less user intervention is desired. |
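
To give a rough feel for the simplest of the three models named in the abstract, the following is a minimal sketch of a character n-gram similarity in the spirit of CL-CNG. It is not the authors' implementation: the function names (`char_ngrams`, `cosine`, `cl_cng_similarity`), the choice of n = 3, the normalization step, and the use of raw n-gram counts with cosine similarity are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams after simple normalization.

    Assumed normalization: lowercase, drop punctuation, collapse whitespace.
    """
    normalized = re.sub(r"[^\w\s]", "", text.lower())
    normalized = re.sub(r"\s+", " ", normalized).strip()
    # Texts shorter than n characters yield an empty profile.
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cl_cng_similarity(doc_a: str, doc_b: str, n: int = 3) -> float:
    """Hypothetical helper: similarity of two texts via character n-gram profiles."""
    return cosine(char_ngrams(doc_a, n), char_ngrams(doc_b, n))
```

A measure of this kind needs no translation resources and can only pick up overlap between languages that share a script and some vocabulary, for example `cl_cng_similarity("international", "internacional")` scores above zero because the two words share several trigrams.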
Year | DOI | Venue |
---|---|---|
2013 | 10.1016/j.knosys.2013.06.018 | Knowl.-Based Syst. |
Keywords | Field | DocType |
monolingual analysis,native language,under-resourced language,plagiarism detection,different plagiarism detection,foreign language,cross-language plagiarism detection,cl-asa obtains,cross-language character n-grams,different model,cross-language alignment-based similarity analysis | Similarity analysis,Heuristic,Architecture,Plagiarism detection,Computer science,Natural language processing,Artificial intelligence,Documentation,First language,Foreign language | Journal |
Volume | Issue | ISSN |
50 | C | 0950-7051 |
Citations | PageRank | References |
26 | 0.95 | 18 |
Authors | ||
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Alberto Barrón-Cedeño | 1 | 346 | 29.35 |
Parth Gupta | 2 | 118 | 13.78 |
Paolo Rosso | 3 | 1831 | 188.74 |