Utilizing Text Similarity Measurement for Data Compression to Detect Plagiarism in Czech - Citegraph

Paper Info

Title
Utilizing Text Similarity Measurement for Data Compression to Detect Plagiarism in Czech

Abstract
This paper attempts to apply a data-compression-based similarity method to plagiarism detection. The method has previously been used for plagiarism detection in Arabic and English. Here we apply it to Czech text drawn from a local multi-domain Czech corpus comprising 50 original documents with non-plagiarized parts and 100 suspicious documents. The documents were generated so that each contains from 1 to 5 paragraphs. The suspicion rate of each document was chosen randomly from 0.2 to 0.8. The findings show that similarity measurement based on Lempel-Ziv comparison algorithms is efficient at detecting the plagiarized parts of the Czech text documents, with a success rate of 82.60%. Future studies may enhance the efficiency of the algorithms by including combined and more sophisticated methods.
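
The abstract does not spell out the similarity formula, so the following is only a minimal sketch of one common Lempel-Ziv-style comparison, not necessarily the authors' method. It assumes an LZ78-like parse over characters and scores two texts by the share of dictionary phrases they have in common; the function names `lz78_phrases` and `lz_similarity`, the character-level tokenization, and the normalization by the smaller dictionary are all illustrative assumptions.

```python
def lz78_phrases(text):
    """Return the set of phrases produced by an LZ78-style parse of `text`.

    Assumption: the parse works on single characters. A trailing fragment
    that is already in the dictionary is simply dropped.
    """
    phrases = set()
    current = ""
    for ch in text:
        current += ch
        if current not in phrases:
            # A phrase not seen before becomes a dictionary entry.
            phrases.add(current)
            current = ""
    return phrases


def lz_similarity(a, b):
    """Share of common phrases, normalized by the smaller dictionary.

    Hypothetical scoring rule chosen for illustration: 0.0 means no
    shared phrases, 1.0 means one dictionary is contained in the other.
    """
    phrases_a = lz78_phrases(a)
    phrases_b = lz78_phrases(b)
    if not phrases_a or not phrases_b:
        return 0.0
    common = phrases_a & phrases_b
    return len(common) / min(len(phrases_a), len(phrases_b))


if __name__ == "__main__":
    original = "Praha je hlavní město České republiky."
    suspicious = "Hlavní město České republiky je Praha."
    print(lz_similarity(original, suspicious))
```

In a setting like the one described, a suspicious paragraph would presumably be compared against each original document and flagged when the score exceeds some threshold; the abstract does not state how that threshold is chosen.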

Year	2014
DOI	10.1007/978-3-319-13572-4_13
Venue	AFRO-EUROPEAN CONFERENCE FOR INDUSTRIAL ADVANCEMENT, AECIA 2014
Keywords	similarity measurement, plagiarism detection, Lempel-Ziv compression algorithm, plagiarism in Czech, data compression, plagiarism detection tools
Field	Czech, Plagiarism detection, Information retrieval, Arabic, Computer science, Data compression
DocType	Conference
Volume	334
ISSN	2194-5357
Citations	0
PageRank	0.34
References	11
Authors	4

Authors (4 rows)

Name	Order	Citations	PageRank
Hussein Soori	1	0	0.34
Michal Prilepok	2	32	6.45
Jan Platos	3	286	58.72
Václav Snasel	4	1261	210.53
