Title
Knowledge-Intensive and Statistical Approaches to the Retrieval and Annotation of Genomics MEDLINE Citations
Abstract
Retrieving and annotating relevant information sources in the genomics literature are difficult but common tasks undertaken by biologists. The research presented here addresses these issues by exploring methods for retrieving MEDLINE® citations that answer real biologists' information needs and by addressing the initial tasks required to annotate MEDLINE citations having genomic content with terms from the Gene Ontology (GO). We approached the retrieval task using two methods: aggressive, knowledge-intensive query expansion and text neighboring. Our approaches to the triage subtask for annotation consisted of traditional machine learning (ML) methods as well as a novel ML algorithm for thematic analysis. Finally, we used a statistical, n-gram heuristic to decide which of the GO hierarchies should be used to annotate a given MEDLINE citation.
Year: 2004
Venue: TREC
Keywords: medline, genomics, mesh, information need, machine learning
Field: Thematic analysis, Data mining, Heuristic, Information needs, Annotation, Information retrieval, Query expansion, Computer science, Citation, Triage, MEDLINE
DocType: Conference
Citations: 8
PageRank: 0.85
References: 8
Authors: 12
Name | Order | Citations, PageRank
Alan R. Aronson | 1 | 2551260.67
Susanne M. Humphrey | 2 | 56163.27
Nicholas C. Ide | 3 | 9110.78
Won Kim | 4 | 173171.15
Russell R. Loane | 5 | 303.36
James G. Mork | 6 | 64765.22
Lawrence H. Smith | 7 | 19614.48
Lorraine Tanabe | 8 | 38329.80
W. John Wilbur | 9 | 42443.91
Natalie Xie | 10 | 1239.13
Dina Demner Fushman | 11 | 1717147.70
Hongfang Liu | 12 | 1479160.66