Title
Cross-Language High Similarity Search Using a Conceptual Thesaurus
Abstract
This work addresses cross-language high similarity and near-duplicate search, where, for a given document, a highly similar one is to be identified in a large cross-language collection of documents. We propose a concept-based similarity model for the problem that is light in both computation and memory. We evaluate the model on three corpora of different natures and on two language pairs, English-German and English-Spanish, using the Eurovoc conceptual thesaurus. We compare our model with two state-of-the-art models and find that, although the proposed model is very generic, it produces competitive results and is markedly stable and consistent across the corpora.
Year
2012
DOI
10.1007/978-3-642-33247-0_8
Venue
CLEF
Keywords
large cross-language collection, cross-language high similarity search, Eurovoc conceptual thesaurus, cross-language high similarity, language pairs English-German, competitive result, state-of-the-art model, near-duplicates search, different nature, concept-based similarity model, artificial intelligence, similarity search
Field
Information retrieval, Plagiarism detection, Computer science, Machine translation, Natural language processing, Artificial intelligence, Nearest neighbor search, Computation
DocType
Conference
Citations
12
PageRank
0.76
References
18
Authors
3
Name                     Order  Citations  PageRank
Parth Gupta              1      118        13.78
Alberto Barrón-Cedeño    2      346        29.35
Paolo Rosso              3      1831       188.74