Title
Bengali and Hindi to English Cross-language Text Retrieval under Limited Resources.
Abstract
This paper describes our experiments on two cross-lingual and one monolingual English text retrieval tasks in the ad-hoc track at CLEF. The cross-language tasks involve retrieving English documents in response to queries in the two most widely spoken Indian languages, Hindi and Bengali. For our experiments, we had access to a Hindi-English bilingual lexicon, 'Shabdanjali', consisting of approximately 26K Hindi words. However, we had neither an effective Bengali-English bilingual lexicon nor any parallel corpora from which to build a statistical lexicon. Given these limited resources, we depended mostly on phoneme-based transliteration to generate equivalent English queries from the Hindi and Bengali topics. We adopted an Automatic Query Generation and Machine Translation approach. Other language-specific resources included a Bengali morphological analyzer, a Hindi stemmer, and stop-word lists of 200 Hindi and 273 Bengali words. The Lucene framework was used for stemming, indexing, retrieval and scoring of the corpus documents. The CLEF results suggest the need for a rich bilingual lexicon for CLIR involving Indian languages. The best MAP values for Bengali, Hindi and English queries in our experiments were 7.26, 4.77 and 36.49 respectively.
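The query generation step described in the abstract (stop-word removal, bilingual lexicon lookup, and transliteration when a word is missing from the lexicon) might be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `STOP_WORDS`, `LEXICON`, `CHAR_MAP`, `transliterate` and `generate_query` are hypothetical, the entries are toy data rather than content from 'Shabdanjali', and the character map is a crude stand-in for phoneme-based transliteration.

```python
# Toy resources (assumptions for illustration only; not the paper's data).
STOP_WORDS = {"में"}                      # stand-in for the 200-word Hindi stop list
LEXICON = {"चुनाव": ["election"]}         # stand-in for a Hindi-English lexicon

# Crude character-to-Latin map standing in for phoneme-based transliteration.
CHAR_MAP = {
    "\u0926": "d",   # DEVANAGARI LETTER DA
    "\u093f": "i",   # DEVANAGARI VOWEL SIGN I
    "\u0932": "l",   # DEVANAGARI LETTER LA
    "\u094d": "",    # DEVANAGARI SIGN VIRAMA (suppresses the inherent vowel)
    "\u0940": "i",   # DEVANAGARI VOWEL SIGN II
}


def transliterate(word):
    """Map each character through CHAR_MAP; unknown characters pass through."""
    return "".join(CHAR_MAP.get(ch, ch) for ch in word)


def generate_query(tokens):
    """Build English query terms from source-language tokens."""
    terms = []
    for token in tokens:
        if token in STOP_WORDS:
            continue                          # drop stop-words
        if token in LEXICON:
            terms.extend(LEXICON[token])      # use lexicon translations
        else:
            terms.append(transliterate(token))  # fall back to transliteration
    return terms


query = generate_query(["\u0926\u093f\u0932\u094d\u0932\u0940", "में", "चुनाव"])
print(query)
```

In this sketch the out-of-lexicon word is transliterated while the lexicon word is translated, which mirrors why a sparse lexicon pushes most of the query onto transliteration.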
Year: 2007
Venue: CLEF (Working Notes)
Keywords: hindi, transliteration, cross-language text retrieval, clef evaluation, bengali, measurement, performance
Field: Hindi, Computer science, Machine translation, Search engine indexing, Bengali, Lexicon, Natural language processing, Artificial intelligence, Linguistics, Text retrieval, Clef, Stop words
DocType: Conference
Citations: 2
PageRank: 0.40
References: 9
Authors: 5
Name | Order | Citations PageRank
Debasis Mandal | 1 | 41.54
Sandipan Dandapat | 2 | 7315.17
Mayank Gupta | 3 | 11810.60
Pratyush Banerjee | 4 | 526.57
Sudeshna Sarkar | 5 | 423210.58