Title
IIT at TREC 2002: Linear Combinations Based on Document Structure and Varied Stemming for Arabic Retrieval
Abstract
For TREC 10 we participated in the Named Page Finding Task and the Cross-Lingual Task. In the web track, we explored the use of linear combinations of term collections based on document structure. Our goal was to examine the effects of different term collection statistics based on document structure with respect to known-item retrieval. We parsed documents into structural components and built specific term indexes based on that document structure. Each of those indexes has its own collection statistics for term weighting, based on the type of language used for that structure in the collection. To produce a single ranked list, we examined a weighted linear combination approach to merging results. Our approach to known-item retrieval was equal to or above the median 58% of the time and above the mean score of submitted runs 71% of the time. In the Arabic track we participated in Arabic Cross-Language Information Retrieval (CLIR) and in Arabic monolingual information retrieval. For the monolingual retrieval, we examined the use of two stemming algorithms. The first is a deeper approach, and the second is a pattern-based approach. For the Arabic CLIR, we explored retrieval effectiveness using a machine translation (MT) system and translation probabilities obtained from a parallel document collection provided by the United Nations (UN).
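The weighted linear combination mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the structure names (`title`, `body`, `anchor`), the weights, and the scores are hypothetical, and the sketch assumes each per-structure index already returns scores on a comparable scale (for example, after normalization).

```python
from collections import defaultdict


def combine_scores(runs, weights):
    """Merge per-structure result lists into one ranking by weighted sum.

    runs:    {structure_name: {doc_id: score}}  (hypothetical layout)
    weights: {structure_name: weight}           (hypothetical values)
    """
    combined = defaultdict(float)
    for structure, scores in runs.items():
        w = weights.get(structure, 0.0)  # structures without a weight contribute nothing
        for doc_id, score in scores.items():
            combined[doc_id] += w * score
    # Highest combined score first; ties broken by document id for determinism.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))


# Illustrative scores from three assumed structural indexes.
runs = {
    "title": {"d1": 0.9, "d2": 0.2},
    "body": {"d1": 0.3, "d2": 0.5, "d3": 0.4},
    "anchor": {"d2": 0.7},
}
weights = {"title": 0.5, "body": 0.25, "anchor": 0.25}

ranking = combine_scores(runs, weights)
for doc_id, score in ranking:
    print(doc_id, round(score, 3))
```

A document missing from one index simply receives no contribution from that structure, which is one of several reasonable choices; the abstract does not say how such cases were handled.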
Year
2002
Venue
TREC
Keywords
known-item search,linear combination of retrieval strategies,document structure retrieval,pattern-based stemming,named page finding task,light-stemming,cross-lingual arabic retrieval,indexation,information retrieval,document structure,machine translation
Field
Linear combination,Weighting,Arabic,Information retrieval,Ranking,Computer science,Document Structure Description,Machine translation,Natural language processing,Artificial intelligence,Parsing,Merge (version control)
DocType
Conference
Citations
3
PageRank
1.25
References
20
Authors
6
Name, Order, Citations, PageRank
Abdur Chowdhury, 1, 2013, 160.59
Mohammed Aljlayl, 2, 102, 7.58
Eric C. Jensen, 3, 696, 46.72
Steven M. Beitzel, 4, 696, 46.72
David A. Grossman, 5, 399, 46.60
Ophir Frieder, 6, 3300, 419.55