Title
Similarity Based on Data Compression
Abstract
Similarity detection is one of the most important areas in document processing. Its applications range from spam detection, through identification of plagiarism on the web and in bachelor's or master's theses, to identification of copied scientific papers. This paper presents an improvement of a plagiarism detection algorithm based on the Lempel and Ziv dictionary-based compression algorithm, obtained by removing stop words, and tests the algorithm on a real dataset. A visualization of the relationships between plagiarized documents is also presented. The results confirm the algorithm's ability to detect plagiarized parts of text and show the improvement achieved when the suggested improvements are applied.
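The abstract only names the ingredients (a Lempel-Ziv style dictionary and stop-word removal), so the following is a minimal sketch of how such a similarity measure could look, not the authors' implementation. The stop-word list, the word-level tokenization, the function names, and the similarity formula (shared phrases divided by the smaller dictionary size) are all assumptions made for illustration.

```python
# Hypothetical sketch: stop-word list, tokenization, and the similarity
# formula are illustrative assumptions, not taken from the paper.
STOP_WORDS = frozenset({"a", "an", "and", "in", "is", "of", "on", "the", "to"})


def tokenize(text, remove_stop_words=True):
    """Lowercase, split on whitespace, strip punctuation, optionally drop stop words."""
    words = [w.strip(".,;:!?\"'()") for w in text.lower().split()]
    words = [w for w in words if w]
    if remove_stop_words:
        words = [w for w in words if w not in STOP_WORDS]
    return words


def lz78_dictionary(words):
    """Build an LZ78-style phrase dictionary over a sequence of words.

    Each stored phrase is the shortest run of words not yet in the dictionary.
    """
    phrases = set()
    current = ()
    for word in words:
        current = current + (word,)
        if current not in phrases:
            phrases.add(current)
            current = ()
    return phrases


def similarity(text_a, text_b, remove_stop_words=True):
    """Share of dictionary phrases in common, relative to the smaller dictionary."""
    dict_a = lz78_dictionary(tokenize(text_a, remove_stop_words))
    dict_b = lz78_dictionary(tokenize(text_b, remove_stop_words))
    if not dict_a or not dict_b:
        return 0.0
    return len(dict_a & dict_b) / min(len(dict_a), len(dict_b))
```

Under these assumptions, two identical texts score 1.0 and two texts with no words in common score 0.0. Passing remove_stop_words=False gives a baseline for comparing the effect of stop-word removal.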
Year
2013
DOI
10.1007/978-3-642-45111-9_24
Venue
MICAI (2)
Field
Data mining, Plagiarism detection, Information retrieval, Visualization, Computer science, Document processing, Data compression, Stop words
DocType
Conference
Citations
1
PageRank
0.36
References
14
Authors
3
Name              Order  Citations  PageRank
Michal Prilepok   1      32         6.45
Jan Platos        2      286        58.72
Václav Snasel     3      1261       210.53