Title
Document Structure with IR Tools
Abstract
The IRTools software toolkit was modified for 2003 to use a MySQL database for its inverted index. Every occurrence of every term in the collection was indexed, together with its HTML structure, location offset, paragraph, and subdocument weight. This structure supports more sophisticated queries than a "bag of words" approach. Post hoc results from the TREC 2002 Named Page Web task are presented, in which a staged fall-through approach to topic processing yielded good results, with exact precision of 0.49. The paper also gives an overview of IRTools and its interactive interface, and invites IR researchers to take part in the GridIR standards formation process.
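As a rough illustration of the two ideas in the abstract, a per-occurrence index and a staged fall-through query, the following minimal sketch may help. It is not the IRTools implementation: the table layout, the column names (`term`, `doc_id`, `tag`, `pos`, `paragraph`), the function names, and the three stages are all assumptions made for illustration, the subdocument weight is omitted, and SQLite from the Python standard library stands in for MySQL so the example is self-contained.

```python
import sqlite3


def build_index(docs):
    """Index every term occurrence. docs maps doc_id -> list of (html_tag, text)."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per occurrence, not the paper's actual schema.
    conn.execute(
        "CREATE TABLE postings "
        "(term TEXT, doc_id TEXT, tag TEXT, pos INTEGER, paragraph INTEGER)"
    )
    for doc_id, segments in docs.items():
        pos = 0  # running word offset within the document
        for paragraph, (tag, text) in enumerate(segments):
            for word in text.lower().split():
                conn.execute(
                    "INSERT INTO postings VALUES (?, ?, ?, ?, ?)",
                    (word, doc_id, tag, pos, paragraph),
                )
                pos += 1
    conn.execute("CREATE INDEX idx_term ON postings (term)")
    return conn


def docs_with_all_terms(conn, terms, tag=None):
    """Sorted ids of documents containing every term, optionally within one tag."""
    placeholders = ",".join("?" * len(terms))
    sql = f"SELECT doc_id FROM postings WHERE term IN ({placeholders})"
    params = list(terms)
    if tag is not None:
        sql += " AND tag = ?"
        params.append(tag)
    sql += " GROUP BY doc_id HAVING COUNT(DISTINCT term) = ? ORDER BY doc_id"
    params.append(len(set(terms)))
    return [row[0] for row in conn.execute(sql, params)]


def docs_with_any_term(conn, terms):
    """Sorted ids of documents containing at least one term."""
    placeholders = ",".join("?" * len(terms))
    sql = (
        f"SELECT DISTINCT doc_id FROM postings "
        f"WHERE term IN ({placeholders}) ORDER BY doc_id"
    )
    return [row[0] for row in conn.execute(sql, list(terms))]


def staged_search(conn, query):
    """Try the strictest stage first and fall through until one returns hits."""
    terms = query.lower().split()
    if not terms:
        return "no match", []
    # Assumed stages, ordered from most to least restrictive.
    stages = [
        ("all terms in title", lambda: docs_with_all_terms(conn, terms, tag="title")),
        ("all terms anywhere", lambda: docs_with_all_terms(conn, terms)),
        ("any term", lambda: docs_with_any_term(conn, terms)),
    ]
    for name, run in stages:
        hits = run()
        if hits:
            return name, hits
    return "no match", []


# Tiny made-up collection, used only to exercise the sketch.
docs = {
    "d1": [("title", "Grid IR home"), ("p", "standards for retrieval")],
    "d2": [("title", "Welcome"), ("p", "grid ir standards process")],
    "d3": [("p", "unrelated grid text")],
}
conn = build_index(docs)
print(staged_search(conn, "grid ir"))
print(staged_search(conn, "ir standards"))
```

Because each posting row records where a term occurred, a strict stage such as a title-only match can be written as an ordinary filter, which a plain bag-of-words index could not express. In this sketch the looser stages run only when the stricter ones return nothing.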
Year
2003
Venue
TREC
DocType
Conference
Citations
1
PageRank
0.63
References
1
Authors
1
Name
Order
Citations
PageRank
Gregory B. Newby 122032.13