Fuzzy Cross Language Plagiarism Detection (Arabic-English) Using Wordnet In A Big Data Environment

Paper Info

Title
Fuzzy Cross Language Plagiarism Detection (Arabic-English) Using Wordnet In A Big Data Environment

Abstract
Cross-Language Plagiarism refers to the unacknowledged reuse of a text involving its translation from one natural language to another without proper reference to the original source. A common problem in data processing is efficient large-scale text comparison, especially semantic similarity, given the growing number of publications and of suspicious documents that may be sources of plagiarism. Cross-Language Plagiarism Detection (CLPD) can be more complicated than a simple copy, translate and paste, so the detection process calls for vague concepts and fuzzy set techniques in a big data environment to reveal dishonest practices in Arabic documents. In this paper, we propose a new Cross-Language Plagiarism Detection approach based on fuzzy semantic similarity using WordNet and two semantic measures, Wu & Palmer and Lin; the processing is done in parallel using Apache Hadoop with its distributed file system HDFS and the MapReduce programming model. The experimental results show that Fuzzy Wu & Palmer performs better than Fuzzy Lin.
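The two WordNet measures named in the abstract have standard textbook definitions: Wu & Palmer scores two concepts as 2·depth(LCS) / (depth(c1) + depth(c2)), and Lin scores them as 2·IC(LCS) / (IC(c1) + IC(c2)), where LCS is the least common subsumer and IC(c) = -log p(c). The sketch below illustrates both on a tiny hand-built is-a hierarchy, assuming the Arabic text has already been translated or mapped to English concepts. The taxonomy, the corpus counts, and the fuzzy aggregation (membership μ = 1 - Π(1 - sim), averaged over words) are hypothetical choices for illustration; the aggregation is one common fuzzy-set formulation and not necessarily the one used in the paper. A real system would query WordNet itself rather than the toy `PARENT` table.

```python
import math
from collections import Counter

# Hypothetical toy is-a taxonomy (child -> parent), standing in for WordNet synsets.
PARENT = {
    "entity": None,
    "object": "entity",
    "living_thing": "object",
    "animal": "living_thing",
    "dog": "animal",
    "cat": "animal",
    "artifact": "object",
    "vehicle": "artifact",
    "car": "vehicle",
}

def path_to_root(c):
    """Return [c, parent(c), ..., root]."""
    path = []
    while c is not None:
        path.append(c)
        c = PARENT[c]
    return path

def depth(c):
    # The root has depth 1, so the Wu & Palmer denominator is never zero.
    return len(path_to_root(c))

def lcs(c1, c2):
    """Least common subsumer: the deepest concept that is an ancestor of both."""
    ancestors_of_c2 = set(path_to_root(c2))
    for a in path_to_root(c1):  # walks upward from c1, so the first hit is the deepest
        if a in ancestors_of_c2:
            return a
    raise ValueError("concepts share no ancestor")

def wu_palmer(c1, c2):
    return 2 * depth(lcs(c1, c2)) / (depth(c1) + depth(c2))

# Hypothetical corpus counts; each occurrence also counts toward every ancestor.
# Concepts with zero propagated frequency would need smoothing before computing IC.
RAW_COUNTS = {"dog": 30, "cat": 20, "car": 40, "vehicle": 10}
FREQ = Counter()
for concept, n in RAW_COUNTS.items():
    for a in path_to_root(concept):
        FREQ[a] += n
TOTAL = FREQ["entity"]

def ic(c):
    # Information content IC(c) = -log p(c)
    return -math.log(FREQ[c] / TOTAL)

def lin(c1, c2):
    denom = ic(c1) + ic(c2)
    if denom == 0:  # both concepts are the root
        return 1.0 if c1 == c2 else 0.0
    return 2 * ic(lcs(c1, c2)) / denom

def word_membership(word, other_words, sim):
    """Fuzzy membership of `word` in the other sentence: 1 - prod(1 - sim)."""
    prod = 1.0
    for w in other_words:
        prod *= 1.0 - sim(word, w)
    return 1.0 - prod

def fuzzy_similarity(s1, s2, sim):
    """Average membership of s1's words in s2 (asymmetric by design)."""
    return sum(word_membership(w, s2, sim) for w in s1) / len(s1)
```

For example, `fuzzy_similarity(["dog", "car"], ["cat", "vehicle"], wu_palmer)` compares a suspicious sentence against a candidate source using Wu & Palmer, and passing `lin` instead swaps in the Lin measure. In a Hadoop setting, these independent pairwise scores are the kind of work that could be distributed across MapReduce tasks.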

Year	DOI	Venue
2018	10.1145/3264560.3264562	PROCEEDINGS OF 2018 2ND INTERNATIONAL CONFERENCE ON CLOUD AND BIG DATA COMPUTING (ICCBDC 2018)
Keywords	Field	DocType
CLPD, Fuzzy sets, Semantic Similarity, Hadoop, HDFS, MapReduce	Semantic similarity, Plagiarism detection, Programming paradigm, Computer science, Fuzzy logic, Fuzzy set, Natural language, Natural language processing, Artificial intelligence, WordNet, Big data	Conference
Citations	PageRank	References
0	0.34	5
Authors
4

Authors

Name	Order	Citations	PageRank
Hanane Ezzikouri	1	1	1.72
Mohamed Oukessou	2	0	0.68
Youness Madani	3	4	4.13
Mohamed Erritali	4	0	1.35