Title
DIMACS at the TREC 2004 Genomics Track
Abstract
DIMACS participated in the text categorization and ad hoc retrieval tasks of the TREC 2004 Genomics track. For the categorization task, we tackled the triage and annotation hierarchy subtasks.

… and biology of the laboratory mouse. In particular, the Mouse Genome Database (MGD) contains information on the characteristics and functions of genes in the mouse, and on where this information appeared in the scientific literature. Human curators encode this information using controlled vocabulary terms from the Gene Ontology (GO), and provide citations to documents that report each piece of information. GO consists of three structured networks of terms describing attributes of genes and gene products: Biological Process (BP), Molecular Function (MF), and Cellular Component (CC).

The TREC 2004 Genomics track defined a categorization task with three subtasks based on simplified versions of this curation process. DIMACS participated in two of those subtasks, triage and annotation hierarchy, but not in the annotation hierarchy plus evidence subtask. We discuss our two subtasks below; full details are available in the track overview paper (4).

… the articles from the test set had, during MGI's operational manual triage process, been chosen for sending to GO curators. (Whether curators had or had not actually linked to a given document from any MGD entry was not an issue.) We can view this as a binary text classification problem, with articles chosen for curation during the triage process being positive examples, and those rejected during triage being negative examples. Logs from MGI were used to produce relevance judgments for the subtask data. Subtask participants were given the relevance judgments for the training set, which showed that 375 of the training set articles were positive examples (had been selected for curation) and 5462 training articles were negative examples. The test set relevance judgments, revealed after official runs were submitted, …
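The triage subtask is framed above as binary text classification with a heavily skewed class distribution (375 positives vs. 5462 negatives, a base rate of about 6.4%). The sketch below illustrates that framing with MAP logistic regression under a Gaussian prior on the weights (equivalent to L2-regularized logistic regression), one simple form of the Bayesian logistic regression named in the keywords. It is not DIMACS's actual system: the function names (`base_rate`, `train_map_logreg`, `predict_proba`), the hyperparameters (`prior_var`, `lr`, `epochs`), and the toy two-feature data are hypothetical, and a real run would use sparse term-weight vectors built from the article text.

```python
import math

# Training-set class counts reported for the TREC 2004 triage subtask.
N_POS, N_NEG = 375, 5462


def base_rate(n_pos, n_neg):
    """Fraction of training articles that were selected for curation."""
    return n_pos / (n_pos + n_neg)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Estimated probability that document vector x is a positive (triage) example."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_map_logreg(X, y, prior_var=1.0, lr=0.1, epochs=500):
    """Hypothetical helper: MAP logistic regression with an independent
    Gaussian prior N(0, prior_var) on each weight, i.e. L2-regularized
    logistic regression. The bias gets a flat (unpenalized) prior.

    Runs batch gradient descent on the negative log posterior divided by n,
    which has the same minimizer as the unscaled objective.
    X: list of equal-length feature lists; y: list of 0/1 labels.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # data-term gradient factor
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        for j in range(d):
            grad_w[j] += w[j] / prior_var  # gradient of the Gaussian prior term
            w[j] -= lr * grad_w[j] / n
        b -= lr * grad_b / n
    return w, b


if __name__ == "__main__":
    print(f"{base_rate(N_POS, N_NEG):.4f}")  # prints 0.0642

    # Toy data (hypothetical): two term-count features per article.
    X = [[3, 2], [2, 3], [4, 2], [0, 0], [1, 0], [0, 1]]
    y = [1, 1, 1, 0, 0, 0]
    w, b = train_map_logreg(X, y)
    print(predict_proba(w, b, [3, 3]), predict_proba(w, b, [0, 0]))
```

With a base rate this low, the decision threshold applied to `predict_proba` generally needs tuning rather than a fixed 0.5 cutoff.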
Year: 2004
Venue: TREC
Keywords: logistic regression
Field: Data mining, Information retrieval, Computer science, Genomics, Triage, Text categorization, Bayesian logistic regression
DocType: Conference
Citations: 12
PageRank: 0.98
References: 6
Authors: 7
Name
Order
Citations
PageRank
Aynur A. Dayanik1646.24
Dmitriy Fradkin234419.25
Alexander Genkin323027.92
Paul B. Kantor4716115.67
David Madigan535836.10
David D. Lewis64560737.43
Vladimir Menkov7829.22