Title
Utilizing Text Similarity Measurement for Data Compression to Detect Plagiarism in Czech
Abstract
This paper attempts to apply a data-compression-based similarity method to plagiarism detection. The method has previously been used for plagiarism detection in Arabic and English. In this paper we apply it to Czech text drawn from a local multi-domain Czech corpus comprising 50 original documents with non-plagiarized parts and 100 suspicious documents. The documents were generated so that each contains from 1 to 5 paragraphs. The suspicion rate of each document was chosen randomly from 0.2 to 0.8. The findings show that similarity measurement based on Lempel-Ziv comparison algorithms is efficient at detecting the plagiarized parts of Czech text documents, with a success rate of 82.60%. Future studies may enhance the efficiency of the algorithms by incorporating combined and more sophisticated methods.
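As a rough illustration of what a Lempel-Ziv-based similarity measure can look like, the sketch below parses each text into LZ78-style dictionary phrases and scores two texts by the share of phrases they have in common. This is an assumption-laden sketch, not the authors' implementation: the character-level parsing, the function names lz78_phrases and lz_similarity, and the normalization by the smaller dictionary are all illustrative choices that the abstract does not specify.

```python
def lz78_phrases(text: str) -> set:
    """Parse text into the set of LZ78-style dictionary phrases.

    Each phrase is the shortest prefix of the remaining input that has
    not been seen before (character-level parsing is an assumption).
    """
    phrases = set()
    current = ""
    for ch in text:
        current += ch
        if current not in phrases:
            phrases.add(current)
            current = ""
    # Any leftover 'current' is already in the dictionary, so it is dropped.
    return phrases


def lz_similarity(a: str, b: str) -> float:
    """Share of common phrases, normalized by the smaller dictionary.

    Returns a value in [0, 1]; the normalization is a hypothetical choice.
    """
    dict_a, dict_b = lz78_phrases(a), lz78_phrases(b)
    if not dict_a or not dict_b:
        return 0.0
    return len(dict_a & dict_b) / min(len(dict_a), len(dict_b))


if __name__ == "__main__":
    original = "the quick brown fox jumps over the lazy dog"
    suspicious = "a quick brown fox jumps over a sleeping dog"
    print(lz_similarity(original, suspicious))
```

A suspicious paragraph would be flagged when its score against an original exceeds some threshold; the threshold itself would have to be tuned on the corpus and is not given in the abstract.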
Year
2014
DOI
10.1007/978-3-319-13572-4_13
Venue
AFRO-EUROPEAN CONFERENCE FOR INDUSTRIAL ADVANCEMENT, AECIA 2014
Keywords
similarity measurement, plagiarism detection, Lempel-Ziv compression algorithm, plagiarism in Czech, data compression, plagiarism detection tools
Field
Czech, Plagiarism detection, Information retrieval, Arabic, Computer science, Data compression
DocType
Conference
Volume
334
ISSN
2194-5357
Citations
0
PageRank
0.34
References
11
Authors
4
Name              Order  Citations  PageRank
Hussein Soori     1      0          0.34
Michal Prilepok   2      32         6.45
Jan Platos        3      286        58.72
Václav Snasel     4      1261       210.53