Abstract |
---|
This work addresses cross-language high-similarity and near-duplicate search: given a document, a highly similar one must be identified in a large cross-language collection of documents. We propose a concept-based similarity model for the problem that is very light in computation and memory. We evaluate the model on three corpora of different nature and two language pairs, English-German and English-Spanish, using the Eurovoc conceptual thesaurus. We compare our model with two state-of-the-art models and find that, although the proposed model is very generic, it produces competitive results and is notably stable and consistent across the corpora. |
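
The abstract describes the model only at a high level. As a hedged illustration of the general idea behind concept-based cross-language similarity (not the authors' actual model, whose weighting and details are not given here), the sketch maps words in each language to shared, language-independent concept labels and ranks collection documents by cosine similarity over those labels. The lexicons, the concept labels (`C_TAX`, `C_TRADE`, `C_AGRI`), and the function names `concept_vector`, `cosine`, and `most_similar` are all hypothetical stand-ins for what a real thesaurus such as Eurovoc would provide.

```python
import math
import re
from collections import Counter

# HYPOTHETICAL toy lexicons: each word maps to a language-independent concept
# label. These labels are invented for illustration and are NOT real Eurovoc
# descriptors; a real system would derive the mapping from the thesaurus.
LEXICON = {
    "en": {
        "tax": "C_TAX", "taxation": "C_TAX",
        "trade": "C_TRADE", "import": "C_TRADE",
        "agriculture": "C_AGRI", "farm": "C_AGRI",
    },
    "es": {
        "impuesto": "C_TAX", "fiscalidad": "C_TAX",
        "comercio": "C_TRADE", "importación": "C_TRADE",
        "agricultura": "C_AGRI", "granja": "C_AGRI",
    },
}


def concept_vector(text: str, lang: str) -> Counter:
    """Map a document to a bag of concept labels; words not in the lexicon are dropped."""
    lexicon = LEXICON[lang]
    tokens = re.findall(r"\w+", text.lower())
    return Counter(lexicon[t] for t in tokens if t in lexicon)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse concept-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(query: str, query_lang: str,
                 collection: list[str], coll_lang: str) -> tuple[int, float]:
    """Return (index, score) of the collection document closest to the query."""
    q = concept_vector(query, query_lang)
    scores = [cosine(q, concept_vector(doc, coll_lang)) for doc in collection]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


if __name__ == "__main__":
    spanish_docs = [
        "La agricultura y el comercio",
        "Impuesto sobre la agricultura y el comercio",
        "Una granja",
    ]
    print(most_similar("Agriculture, trade and tax", "en", spanish_docs, "es"))
```

In this toy run the second Spanish document shares all three concepts with the English query, so it is returned as the closest match. Because documents in both languages are projected into the same concept space, no translation step is needed, and the vectors are bounded by the size of the concept inventory rather than by each language's vocabulary; this is one plausible reason such models can be light in memory, though the paper's own design choices may differ.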
Year | DOI | Venue |
---|---|---|
2012 | 10.1007/978-3-642-33247-0_8 | CLEF |
Keywords | Field | DocType |
---|---|---|
large cross-language collection, cross-language high similarity search, eurovoc conceptual thesaurus, cross-language high similarity, language pairs english-german, competitive result, state-of-the-art model, near-duplicates search, different nature, concept-based similarity model, artificial intelligence, similarity search | Information retrieval, Plagiarism detection, Computer science, Machine translation, Natural language processing, Artificial intelligence, Nearest neighbor search, Computation | Conference |
Citations | PageRank | References |
---|---|---|
12 | 0.76 | 18 |
Authors |
---|
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Parth Gupta | 1 | 118 | 13.78 |
Alberto Barrón-Cedeño | 2 | 346 | 29.35 |
Paolo Rosso | 3 | 1831 | 188.74 |