Title
Candidate document retrieval for Arabic-based text reuse detection on the web
Abstract
Given an input document d, the problem of local text reuse detection is to find, within a given document collection, all passages reused between d and the other documents. Comparing the passages of d with the passages of every other document in the collection is infeasible, especially for large collections such as the Web. Selecting a subset of documents that potentially contain text reused with d therefore becomes a major step in the detection problem. This paper describes a new, efficient query formulation approach for retrieving Arabic-based candidate source documents from the Web. We evaluated the work using a collection of documents specially constructed for this purpose. The experiments show that, on average, 79.97% of the Web documents used in the reuse cases were successfully retrieved.
Year
2016
DOI
10.1109/INNOVATIONS.2016.7880048
Venue
2016 12th International Conference on Innovations in Information Technology (IIT)
Keywords
Web Document Retrieval, Query Generation, Text Reuse Detection, Fingerprinting
Field
Data mining, World Wide Web, Search engine, Arabic, Information retrieval, Information technology, Reuse, Computer science, Ranking (information retrieval), Document retrieval, Source document, Query formulation
DocType
Conference
ISSN
1819-9127
ISBN
978-1-5090-5344-5
Citations
0
PageRank
0.34
References
9
Authors
3
Name                   Order  Citations  PageRank
Leena Lulu             1      17         4.11
Boumediene Belkhouche  2      55         17.44
Saad Harous            3      85         23.19