Abstract |
---|
This paper describes our experiments on two cross-lingual and one monolingual English text retrieval tasks at CLEF in the ad-hoc track. The cross-language task includes the retrieval of English documents in response to queries in two of the most widely spoken Indian languages, Hindi and Bengali. For our experiment, we had access to a Hindi-English bilingual lexicon, 'Shabdanjali', consisting of approx. 26K Hindi words. However, we had neither an effective Bengali-English bilingual lexicon nor any parallel corpora from which to build a statistical lexicon. Given these limited resources, we mostly depended on our phoneme-based transliterations to generate equivalent English queries from the Hindi and Bengali topics. We adopted an Automatic Query Generation and Machine Translation approach for our experiment. Other language-specific resources included a Bengali morphological analyzer, a Hindi stemmer, and sets of 200 Hindi and 273 Bengali stop-words. The Lucene framework was used for stemming, indexing, retrieval and scoring of the corpus documents. The CLEF results suggest the need for a rich bilingual lexicon for CLIR involving Indian languages. The best MAP values for Bengali, Hindi and English queries in our experiment were 7.26, 4.77 and 36.49 respectively. |
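The MAP figures quoted above are mean average precision scores, the standard ad-hoc retrieval metric: for each query, precision is taken at every rank where a relevant document appears, averaged over the query's relevant set, and then averaged across all queries. A minimal sketch of the computation (the function names and toy data below are illustrative, not from the paper):

```python
def average_precision(ranked_docs, relevant):
    """Average precision for one query: mean of precision@k
    at each rank k where a relevant document appears."""
    hits = 0
    precisions = []
    for k, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """MAP over a list of (ranked_docs, relevant_set) query pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Toy example with two queries (hypothetical document ids):
runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),  # AP = (1/1 + 2/3) / 2
    (["d5", "d6"], {"d6"}),                    # AP = (1/2) / 1
]
print(round(mean_average_precision(runs), 4))  # → 0.6667
```

Scores such as the paper's 7.26 for Bengali are this quantity expressed as a percentage over the full CLEF topic set.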
Year | Venue | Keywords |
---|---|---|
2007 | CLEF (Working Notes) | hindi, transliteration, cross-language text retrieval, clef evaluation, bengali, measurement, performance |
Field | DocType | Citations |
---|---|---|
Hindi, Computer science, Machine translation, Search engine indexing, Bengali, Lexicon, Natural language processing, Artificial intelligence, Linguistics, Text retrieval, CLEF, Stop words | Conference | 2 |
PageRank | References | Authors |
---|---|---|
0.40 | 9 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Debasis Mandal | 1 | 4 | 1.54 |
Sandipan Dandapat | 2 | 73 | 15.17 |
Mayank Gupta | 3 | 118 | 10.60 |
Pratyush Banerjee | 4 | 52 | 6.57 |
Sudeshna Sarkar | 5 | 423 | 210.58 |