Title
Semantic and Similarity Measure Methods for Plagiarism Detection of Students' Assignments.
Abstract
This paper aims to detect semantic plagiarism in Czech texts. It combines a similarity measure previously used for text compression with a structured thesaurus of synonyms and a stemming algorithm to detect rewording and restructuring of texts in the Czech language. From a 100 GB corpus, we extracted 884 files of B.A., M.A., and Ph.D. students' assignments, semester works, and theses in the Computer Science major. The extracted test data for our initial experiment amounted to 1.98 GB of plain text. The method is tested first on short texts and then applied to the longer texts of students' assignments. On short texts, the method was more accurate at detecting paraphrased, semantically similar texts, but less accurate for identical texts with rearranged paragraphs. The experiment on the long-text corpus of students' assignments and theses shows a semantic plagiarism rate of 23.9%. However, manual inspection of the documents revealed some noise in the results, caused by different documents sharing the same technical terms, scientific definitions, and bibliography references. In future work, these results will be fine-tuned by building a file-specific stop word list, adding an exact-match method, and removing references and other standard text templates often used in certain parts of students' assignments and theses.
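The combination described in the abstract (stemming, synonym normalization, then a compression-based similarity measure) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not name the specific measure, so normalized compression distance over `zlib` is used as a stand-in; `TOY_THESAURUS` and `toy_stem` are hypothetical English placeholders for the Czech thesaurus and stemmer, and none of the function names come from the paper.

```python
import zlib

# Hypothetical toy thesaurus: maps synonyms to one canonical word.
# A stand-in for the structured Czech thesaurus mentioned in the abstract.
TOY_THESAURUS = {
    "automobile": "car",
    "vehicle": "car",
    "quick": "fast",
    "rapid": "fast",
}


def toy_stem(word: str) -> str:
    """Crude suffix stripping; a placeholder for a real Czech stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def normalize(text: str) -> str:
    """Lowercase, strip punctuation, stem, and map synonyms to a canonical form."""
    words = [w.strip(".,;:!?()\"'").lower() for w in text.split()]
    stems = [toy_stem(w) for w in words if w]
    return " ".join(TOY_THESAURUS.get(s, s) for s in stems)


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input."""
    return len(zlib.compress(data, 9))


def ncd(a: str, b: str) -> float:
    """Normalized compression distance (assumed measure, not the paper's)."""
    x, y = a.encode("utf-8"), b.encode("utf-8")
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def semantic_distance(a: str, b: str) -> float:
    """Distance between two texts after synonym and stem normalization."""
    return ncd(normalize(a), normalize(b))


original = ("The quick automobile passed the old bridge and stopped "
            "near the river bank before sunset.")
reworded = ("The rapid vehicle passed the old bridge and stopped "
            "near the river bank before sunset.")
unrelated = ("Database indexes speed up queries by keeping sorted keys "
             "in a balanced tree structure on disk.")

print(semantic_distance(original, reworded))
print(semantic_distance(original, unrelated))
```

In this sketch, a lower distance suggests that two texts share more content after normalization. A compressor-based measure of this kind is sensitive to text length, so any threshold for flagging a pair would have to be tuned on real data.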
Year
2015
DOI
10.1007/978-3-319-29504-6_12
Venue
PROCEEDINGS OF THE SECOND INTERNATIONAL AFRO-EUROPEAN CONFERENCE FOR INDUSTRIAL ADVANCEMENT (AECIA 2015)
Keywords
Semantic plagiarism detection, Plagiarism detection methods, Similarity measures, Data compression, Czech thesaurus, Synonymy, Plagiarism detection techniques
DocType
Conference
Volume
427
ISSN
2194-5357
Citations
0
PageRank
0.34
References
0
Authors
4
Name | Order | Citations | PageRank
Hussein Soori | 1 | 1 | 1.72
Michal Prilepok | 2 | 32 | 6.45
Jan Platos | 3 | 286 | 58.72
Václav Snasel | 4 | 1261 | 210.53