Paper Info

Title
Candidate document retrieval for Arabic-based text reuse detection on the web

Abstract
Given an input document d, the problem of local text reuse detection is to detect, within a given document collection, all possible reused passages between d and the other documents. Comparing the passages of d with those of every other document in the collection is infeasible, especially for large collections such as the Web. Therefore, selecting a subset of documents that potentially share reused text with d becomes a major step in the detection problem. This paper describes a new, efficient query-formulation approach for retrieving Arabic-based candidate source documents from the Web. We evaluated the approach on a document collection constructed specifically for this study. The experiments show that, on average, 79.97% of the Web documents used in the reuse cases were successfully retrieved.
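The abstract does not spell out the query-formulation method itself. To illustrate where query formulation sits in candidate retrieval, here is a minimal Python sketch of a common chunk-and-top-terms baseline. It is not the authors' approach. The suspicious document d is split into fixed-size token chunks, and each chunk becomes a query made of its highest tf-idf terms. Retrieval is then scored as the fraction of each case's true source documents that were retrieved, averaged over cases, which is one plausible reading of the 79.97% figure. The stopword list, chunk size, terms per query, and all function names are hypothetical, and no search-engine API is called.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real system would use a full Arabic list.
STOPWORDS = {"في", "من", "على", "إلى", "عن", "أن", "هذا", "التي", "الذي"}

def tokenize(text):
    """Split text into Unicode word tokens (Arabic letters included) and drop stopwords."""
    return [t for t in re.findall(r"\w+", text) if t not in STOPWORDS]

def idf_table(background):
    """Return a smoothed inverse-document-frequency function over a background corpus."""
    n = len(background)
    df = Counter()
    for doc in background:
        df.update(set(tokenize(doc)))
    return lambda term: math.log((n + 1) / (df[term] + 1)) + 1.0

def formulate_queries(doc, background, chunk_size=20, terms_per_query=4):
    """Hypothetical baseline: one query per fixed-size chunk, built from its top tf-idf terms."""
    idf = idf_table(background)
    tokens = tokenize(doc)
    queries = []
    for start in range(0, len(tokens), chunk_size):
        tf = Counter(tokens[start:start + chunk_size])
        # Rank by tf-idf; ties are broken alphabetically so the output is deterministic.
        ranked = sorted(tf, key=lambda t: (-tf[t] * idf(t), t))
        query = " ".join(ranked[:terms_per_query])
        if query not in queries:  # skip duplicate queries
            queries.append(query)
    return queries

def mean_source_recall(cases):
    """Average over reuse cases of |retrieved ∩ sources| / |sources|."""
    recalls = [len(set(retrieved) & set(sources)) / len(sources)
               for retrieved, sources in cases]
    return sum(recalls) / len(recalls)
```

In a full pipeline, each query would be sent to a Web search engine and the top results pooled into the candidate set. Arabic text would normally also be normalized (letter variants, diacritics) before tokenizing, a step this sketch omits.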

Year: 2016
DOI: 10.1109/INNOVATIONS.2016.7880048
Venue: 2016 12th International Conference on Innovations in Information Technology (IIT)
Keywords: Web Document Retrieval, Query Generation, Text Reuse Detection, Fingerprinting
Field: Data mining, World Wide Web, Search engine, Arabic, Information retrieval, Information technology, Reuse, Computer science, Ranking (information retrieval), Document retrieval, Source document, Query formulation
DocType: Conference
ISSN: 1819-9127
ISBN: 978-1-5090-5344-5
Citations: 0
PageRank: 0.34
References: 9
Authors: 3

Authors

Name	Order	Citations	PageRank
Leena Lulu	1	17	4.11
Boumediene Belkhouche	2	55	17.44
Saad Harous	3	85	23.19
