Paper Info

Title
Measuring documents similarity in large corpus using MapReduce algorithm

Abstract
Similarity measures between documents and queries have been extensively studied in information retrieval. Measuring the similarity of documents is a crucial component of many text-analysis tasks, including information retrieval, document classification, and document clustering. However, a growing number of tasks require computing the similarity between two very short segments of text. A large corpus contains a large number of composed documents, and for most of them the similarity must be computed for validation. In this paper, we propose an approach for measuring the similarity between documents in a large corpus. For evaluation, we use our new MapReduce algorithm to compare the proposed approach with previously presented approaches. Simulation results on the Hadoop framework show that our new MapReduce algorithm outperforms the classical ones in terms of running time and yields higher similarity values.

Year	2016
DOI	10.1109/ICMCS.2016.7905587
Venue	2016 5th International Conference on Multimedia Computing and Systems (ICMCS)
Keywords	Hadoop cluster, document similarity, MapReduce programming model, similarity measure
DocType	Conference
ISSN	2472-7652
ISBN	978-1-5090-5147-2
Citations	1
PageRank	0.35
References	3
Authors	3

Authors

Name	Order	Citations	PageRank
Marouane Birjali	1	14	3.57
Abderrahim Beni-Hssane	2	1	0.35
Mohammed Erritali	3	14	10.03