Title
Fuzzy Cross Language Plagiarism Detection (Arabic-English) Using Wordnet In A Big Data Environment
Abstract
Cross-language plagiarism is the unacknowledged reuse of a text that has been translated from one natural language into another without proper reference to the original source. Efficient large-scale text comparison, especially comparison based on semantic similarity, is a common problem in data processing, driven by the growing number of publications and of suspicious documents that may be sources of plagiarism. Cross-language plagiarism can be more complicated than a simple copy, translate, and paste, so detecting it calls for vague concepts and fuzzy-set techniques in a big data environment to reveal dishonest practices in Arabic documents. In this paper, we propose a new Cross-Language Plagiarism Detection (CLPD) method based on fuzzy semantic similarity, using WordNet and two semantic similarity measures, Wu & Palmer and Lin. The work is parallelized using Apache Hadoop, with its distributed file system HDFS and the MapReduce programming model. The experimental results show that Fuzzy Wu & Palmer performs better than Fuzzy Lin.
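For readers unfamiliar with the two measures named in the abstract, the sketch below computes the standard Wu & Palmer and Lin similarities on a tiny hand-made taxonomy. It is only an illustration under stated assumptions: the taxonomy `PARENT`, the counts `COUNTS`, and every function name are hypothetical, WordNet itself is not used, and the fuzzy aggregation and Hadoop/MapReduce parallelization described in the paper are not reproduced here.

```python
import math

# Hypothetical toy taxonomy (child -> parent); not taken from WordNet or the paper.
PARENT = {
    "entity": None,
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
}

# Hypothetical corpus counts, used only to derive information content for Lin.
COUNTS = {"entity": 0, "animal": 2, "vehicle": 1, "dog": 5, "cat": 4, "car": 8}


def path_to_root(concept):
    """Return the chain [concept, parent, ..., root]."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def depth(concept):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(concept))


def lcs(a, b):
    """Least common subsumer: the deepest ancestor shared by a and b."""
    ancestors_of_a = set(path_to_root(a))
    for node in path_to_root(b):  # walks upward from b, so the first hit is deepest
        if node in ancestors_of_a:
            return node
    raise ValueError("concepts share no ancestor")


def wu_palmer(a, b):
    """Wu & Palmer: 2 * depth(lcs) / (depth(a) + depth(b))."""
    return 2 * depth(lcs(a, b)) / (depth(a) + depth(b))


def subtree_count(concept):
    """Count of the concept plus the counts of all its descendants."""
    return sum(c for node, c in COUNTS.items() if concept in path_to_root(node))


def information_content(concept):
    """IC(c) = -log p(c), with p(c) estimated from subtree counts."""
    total = subtree_count("entity")
    return -math.log(subtree_count(concept) / total)


def lin(a, b):
    """Lin: 2 * IC(lcs) / (IC(a) + IC(b)); defined as 0 when both ICs are 0."""
    ic_sum = information_content(a) + information_content(b)
    if ic_sum == 0:
        return 0.0
    return 2 * information_content(lcs(a, b)) / ic_sum


print(round(wu_palmer("dog", "cat"), 3))  # 0.667
```

Both measures return 1 for identical concepts and fall toward 0 as the shared ancestor moves closer to the root. Wu & Palmer depends only on taxonomy depth, while Lin also depends on corpus frequencies.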
Year: 2018
DOI: 10.1145/3264560.3264562
Venue: Proceedings of 2018 2nd International Conference on Cloud and Big Data Computing (ICCBDC 2018)
Keywords: CLPD, Fuzzy sets, Semantic Similarity, Hadoop, HDFS, MapReduce
Field: Semantic similarity, Plagiarism detection, Programming paradigm, Computer science, Fuzzy logic, Fuzzy set, Natural language, Natural language processing, Artificial intelligence, WordNet, Big data
DocType: Conference
Citations: 0
PageRank: 0.34
References: 5
Authors: 4
Name              Order  Citations  PageRank
Hanane Ezzikouri  1      1          1.72
Mohamed Oukessou  2      0          0.68
Youness Madani    3      4          4.13
Mohamed Erritali  4      0          1.35