Title
Structural Term Extraction for Expansion of Template-Based Genomic Queries
Abstract
This paper describes our experiments run to address the ad hoc task of the TREC 2005 Genomics track. The collection used for the task included a subset of the MEDLINE database, 50 test topics, and 10 additional sample topics with associated partial relevance information. The topics were expressed with 5 different structures called Generic Topic Templates (GTTs). We hypothesized that the title and abstract fields of documents relevant to an instance of a GTT contain two kinds of terms: terms showing relevance to the GTT structure and terms showing relevance to the particular instance of the GTT, i.e. the topic. Terms relevant to a GTT are expected to express the generic information present in all instances of that GTT, such as interactions and relationships, while terms relevant to an instance are expected to express the particular entities specific to that topic. Our experiments aimed at isolating and selecting candidate structural terms for each GTT, using both relevance feedback and pseudo-relevance feedback. The selected GTT-specific terms were used to expand initial queries, and the quality of the term selection was measured by the impact of the expansion on the initial search results. The evaluation used the trec_eval program with the sample topics and their partial relevance information, and the Físreál (1) search engine, developed at Dublin City University, was used to generate the rankings. This paper describes the two term extraction methods used in the experiments and the resulting two runs submitted to NIST for evaluation. The paper is organized as follows: Section 2 introduces background on the collection and on relevance and pseudo-relevance feedback methods; Section 3 describes our experimental method and its analysis; Section 4 concludes with future work and experiments.
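To illustrate the kind of pipeline the abstract describes, the following is a minimal Python sketch of pseudo-relevance feedback term selection followed by query expansion. The tokenizer, the tf-idf style scoring, and all function and parameter names (select_expansion_terms, top_k, n_terms) are illustrative assumptions; they do not reproduce the authors' two extraction methods or the Físreál engine's interface.

```python
# Minimal sketch of pseudo-relevance feedback (PRF) term selection for query
# expansion. The scoring formula and all names are illustrative assumptions,
# not the authors' actual method.

import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case word tokenizer; a stand-in for a real analysis chain."""
    return re.findall(r"[a-z0-9]+", text.lower())


def select_expansion_terms(query, ranked_docs, collection, top_k=10, n_terms=5):
    """Score terms from the top_k pseudo-relevant documents and return the
    n_terms best candidates not already present in the query.

    ranked_docs: document texts ordered by the initial retrieval run.
    collection:  all document texts, used only for document frequencies.
    """
    query_terms = set(tokenize(query))
    feedback = ranked_docs[:top_k]

    # Term frequencies within the pseudo-relevant (feedback) documents.
    tf = Counter(t for doc in feedback for t in tokenize(doc))

    # Document frequencies over the whole collection, for an idf weight.
    df = Counter()
    for doc in collection:
        df.update(set(tokenize(doc)))
    n_docs = len(collection)

    scores = {
        term: freq * math.log((n_docs + 1) / (df[term] + 1))
        for term, freq in tf.items()
        if term not in query_terms
    }
    return [t for t, _ in sorted(scores.items(), key=lambda x: -x[1])[:n_terms]]


def expand_query(query, expansion_terms):
    """Append the selected candidate structural terms to the original query."""
    return query + " " + " ".join(expansion_terms)
```

With true relevance feedback, the feedback set would instead be the documents judged relevant in the training data, which is the other extraction setting mentioned in the abstract.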
Year: 2005
Venue: TREC
Keywords: query expansion, search engine, information retrieval
Field: Data mining, Information retrieval, Computer science, NIST, Natural language processing, Artificial intelligence, Template
DocType: Conference
Citations: 2
PageRank: 0.53
References: 1
Authors: 5
Name                 Order  Citations  PageRank
Fabrice Camous       1      38         4.26
Stephen Blott        2      1002       168.03
Cathal Gurrin        3      1031       139.37
Gareth J. F. Jones   4      2709       300.77
Alan F. Smeaton      5      4656       518.60