Cross-Language high similarity search using a conceptual thesaurus - Citegraph

Paper Info

Title
Cross-Language high similarity search using a conceptual thesaurus

Abstract
This work addresses the issue of cross-language high similarity and near-duplicates search, where, for a given document, a highly similar one is to be identified from a large cross-language collection of documents. We propose a concept-based similarity model for the problem which is very light in computation and memory. We evaluate the model on three corpora of different nature and two language pairs, English-German and English-Spanish, using the Eurovoc conceptual thesaurus. Our model is compared with two state-of-the-art models, and we find that, although the proposed model is very generic, it produces competitive results and is significantly stable and consistent across the corpora.

Year	DOI	Venue
2012	10.1007/978-3-642-33247-0_8	CLEF
Keywords	Field	DocType
large cross-language collection,cross-language high similarity search,eurovoc conceptual thesaurus,cross-language high similarity,language pairs english-german,competitive result,state-of-the-art model,near-duplicates search,different nature,concept-based similarity model,artificial intelligence,similarity search	Information retrieval,Plagiarism detection,Computer science,Machine translation,Natural language processing,Artificial intelligence,Nearest neighbor search,Computation	Conference
Citations	PageRank	References
12	0.76	18
Authors
3

Authors (3 rows)

Name	Order	Citations	PageRank
Parth Gupta	1	118	13.78
Alberto Barrón-Cedeño	2	346	29.35
Paolo Rosso	3	1831	188.74
