Similarity Based on Data Compression - Citegraph

Paper Info

Title
Similarity Based on Data Compression

Abstract
Similarity detection is one of the most important areas in document processing. Its applications range from spam detection, through identification of plagiarism on the web and in bachelor's or master's theses, to identification of copied scientific papers. This paper presents an improvement to a plagiarism detection algorithm built on the dictionary-based compression algorithm of Lempel and Ziv. The improvement consists of applying stop-word removal, and the algorithm is tested on a real dataset. A visualization of the relationships between plagiarized documents is also presented. The results confirm the algorithm's ability to detect plagiarized parts of text, as well as the improvement achieved when the suggested modifications are applied.
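To make the idea of compression-based similarity with stop-word removal more concrete, here is a minimal sketch. It is not the authors' implementation. It assumes an LZ78-style dictionary (one member of the Lempel-Ziv family) built over word tokens instead of characters, a small hypothetical stop-word list, and a hypothetical similarity score: the share of the suspect document's dictionary phrases that also appear in the reference document's dictionary. The names `tokenize`, `lz78_phrases`, `similarity` and `STOP_WORDS` are illustrative only.

```python
import re

# Small illustrative stop-word list (an assumption, not the paper's list).
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "in",
    "is", "it", "of", "on", "or", "that", "the", "this", "to", "with",
}


def tokenize(text: str, remove_stop_words: bool = True) -> list[str]:
    """Split text into lower-case word tokens, optionally dropping stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if remove_stop_words:
        words = [w for w in words if w not in STOP_WORDS]
    return words


def lz78_phrases(words: list[str]) -> set[tuple[str, ...]]:
    """Build an LZ78-style dictionary whose symbols are words, not characters.

    Each new entry is the longest phrase already in the dictionary
    extended by one more word; the current phrase then restarts.
    """
    dictionary: set[tuple[str, ...]] = set()
    phrase: tuple[str, ...] = ()
    for word in words:
        candidate = phrase + (word,)
        if candidate in dictionary:
            phrase = candidate          # keep extending a known phrase
        else:
            dictionary.add(candidate)   # record a new phrase
            phrase = ()                 # start the next phrase from scratch
    return dictionary


def similarity(reference: str, suspect: str, remove_stop_words: bool = True) -> float:
    """Hypothetical score: share of the suspect's phrases found in the reference's dictionary."""
    ref = lz78_phrases(tokenize(reference, remove_stop_words))
    sus = lz78_phrases(tokenize(suspect, remove_stop_words))
    if not sus:
        return 0.0
    return len(ref & sus) / len(sus)


if __name__ == "__main__":
    a, b = "the cat and the dog", "the bird and the fish"
    print(similarity(a, b, remove_stop_words=False))  # 0.5: only "the" and "and" are shared
    print(similarity(a, b))                           # 0.0: no content words are shared
```

The usage example suggests why stop-word removal can help. Without it, phrases made of function words such as "the" and "and" are shared by two unrelated sentences and inflate the score. Once they are removed, only content-bearing phrases contribute to the match.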

Year: 2013
DOI: 10.1007/978-3-642-45111-9_24
Venue: MICAI (2)
Field: Data mining, Plagiarism detection, Information retrieval, Visualization, Computer science, Document processing, Data compression, Stop words
DocType: Conference
Citations: 1
PageRank: 0.36
References: 14
Authors: 3

Authors

Name	Order	Citations	PageRank
Michal Prilepok	1	32	6.45
Jan Platos	2	286	58.72
Václav Snasel	3	1261	210.53
