Abstract |
---|
This paper describes our experiments on two cross-lingual and one monolingual English text retrieval tasks at CLEF in the ad-hoc track. The cross-language task includes the retrieval of English documents in response to queries in two of the most widely spoken Indian languages, Hindi and Bengali. For our experiment, we had access to a Hindi-English bilingual lexicon, 'Shabdanjali', consisting of approx. 26K Hindi words. However, we had neither an effective Bengali-English bilingual lexicon nor any parallel corpora from which to build a statistical lexicon. Given these limited resources, we mostly depended on our phoneme-based transliterations to generate equivalent English queries from the Hindi and Bengali topics. We adopted an Automatic Query Generation and Machine Translation approach for our experiment. Other language-specific resources included a Bengali morphological analyzer, a Hindi stemmer, and sets of 200 Hindi and 273 Bengali stop-words. The Lucene framework was used for stemming, indexing, retrieval and scoring of the corpus documents. The CLEF results suggest the need for a rich bilingual lexicon for CLIR involving Indian languages. The best MAP values for Bengali, Hindi and English queries in our experiment were 7.26, 4.77 and 36.49 respectively. |
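The MAP figures quoted above are mean average precision scores, the standard ad-hoc retrieval metric: for each query, precision is taken at every rank where a relevant document appears, averaged over the query's relevant set, and then averaged across all queries. A minimal sketch of the computation (the function names and toy data below are illustrative, not from the paper):

```python
def average_precision(ranked_docs, relevant):
    """Average precision for one query: mean of precision@k
    at each rank k where a relevant document appears."""
    hits = 0
    precisions = []
    for k, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """MAP over a list of (ranked_docs, relevant_set) query pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Toy example with two queries (hypothetical document ids):
runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),  # AP = (1/1 + 2/3) / 2
    (["d5", "d6"], {"d6"}),                    # AP = (1/2) / 1
]
print(round(mean_average_precision(runs), 4))  # → 0.6667
```

Scores such as the paper's 7.26 for Bengali are this quantity expressed as a percentage over the full CLEF topic set.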
Year | Venue | Keywords |
---|---|---|
2007 | CLEF (Working Notes) | hindi, transliteration, cross-language text retrieval, clef evaluation, bengali, measurement, performance |
Field | DocType | Citations |
---|---|---|
Hindi, Computer science, Machine translation, Search engine indexing, Bengali, Lexicon, Natural language processing, Artificial intelligence, Linguistics, Text retrieval, CLEF, Stop words | Conference | 2 |
PageRank | References | Authors |
---|---|---|
0.40 | 9 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Debasis Mandal | 1 | 4 | 1.54 |
Sandipan Dandapat | 2 | 73 | 15.17 |
Mayank Gupta | 3 | 118 | 10.60 |
Pratyush Banerjee | 4 | 52 | 6.57 |
Sudeshna Sarkar | 5 | 423 | 210.58 |