Abstract |
---|
Given an input document d, local text reuse detection aims to find all passages reused between d and the other documents of a given collection. Comparing the passages of d with the passages of every other document in the collection is infeasible, especially for large collections such as the Web. Selecting a subset of documents that potentially share reused text with d is therefore a major step in the detection problem. This paper describes a new, efficient query-formulation approach for retrieving candidate Arabic source documents from the Web. We evaluated the approach on a document collection constructed specifically for this work. The experiments show that, on average, 79.97% of the Web documents used in the reuse cases were successfully retrieved. |

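
The abstract does not spell out how queries are formulated. As a rough illustration of candidate source retrieval in general, the Python sketch below is a generic baseline, not the authors' method. It splits d into fixed-size word chunks, turns each chunk's most frequent non-stopword terms into one search query, and measures how many true source documents appear among the retrieved candidates. The function names `formulate_queries` and `source_retrieval_recall`, the chunk size, the number of terms per query, the stopword list, and the per-case (macro) averaging are all hypothetical choices. The abstract only says "on average".

```python
import re
from collections import Counter


def formulate_queries(document, stopwords, chunk_size=50, terms_per_query=10):
    """Build one search query per fixed-size word chunk of `document`.

    Hypothetical baseline: each query is the chunk's most frequent terms after
    dropping stopwords and very short tokens. Not the paper's method.
    """
    # \w is Unicode-aware in Python 3, so Arabic letters count as word characters.
    words = re.findall(r"\w+", document.lower())
    queries = []
    for start in range(0, len(words), chunk_size):
        chunk = words[start:start + chunk_size]
        counts = Counter(w for w in chunk if w not in stopwords and len(w) > 2)
        # most_common orders equal counts by first occurrence, so output is deterministic.
        top_terms = [term for term, _ in counts.most_common(terms_per_query)]
        if top_terms:
            queries.append(" ".join(top_terms))
    return queries


def source_retrieval_recall(cases):
    """Average, over reuse cases, of the fraction of true source documents retrieved.

    `cases` is a list of (true_source_urls, retrieved_urls) pairs. Averaging per
    case (macro-averaging) is an assumption, not taken from the paper.
    """
    per_case = []
    for true_sources, retrieved in cases:
        sources = set(true_sources)
        if sources:
            per_case.append(len(sources & set(retrieved)) / len(sources))
    return sum(per_case) / len(per_case) if per_case else 0.0


if __name__ == "__main__":
    doc = "the quick brown fox jumps over the lazy dog the fox"
    print(formulate_queries(doc, stopwords={"the", "over"}, terms_per_query=3))
    # ['fox quick brown']
    print(source_retrieval_recall([(["a", "b"], ["a", "c"]), (["x"], ["x"])]))
    # 0.75
```

Fixed word windows are the simplest chunking choice. Sentence-based chunks, Arabic-specific normalization (for example, removing diacritics, which `\w` does not treat as word characters), or tf-idf term weighting are equally plausible variants. In practice, each query would be sent to a Web search engine and the union of the returned URLs would form the candidate set.
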
Year | DOI | Venue |
---|---|---|
2016 | 10.1109/INNOVATIONS.2016.7880048 | 2016 12th International Conference on Innovations in Information Technology (IIT) |

Keywords | Field | DocType |
---|---|---|
Web Document Retrieval, Query Generation, Text Reuse Detection, Fingerprinting | Data mining, World Wide Web, Search engine, Arabic, Information retrieval, Information technology, Reuse, Computer science, Ranking (information retrieval), Document retrieval, Source document, Query formulation | Conference |

ISSN | ISBN | Citations |
---|---|---|
1819-9127 | 978-1-5090-5344-5 | 0 |

PageRank | References | Authors |
---|---|---|
0.34 | 9 | 3 |

Name | Order | Citations | PageRank |
---|---|---|---|
Leena Lulu | 1 | 17 | 4.11 |
Boumediene Belkhouche | 2 | 55 | 17.44 |
Saad Harous | 3 | 85 | 23.19 |