Title
Annotating protein function through lexical analysis
Abstract
We now know the full genomes of more than 60 organisms. The experimental characterization of the newly sequenced proteins is deemed to lag behind this explosion of naked sequences (the sequence-function gap). Expert annotation, which adds experimental information to the more or less controlled vocabularies of databases, snails along at an even slower pace. Most methods that annotate protein function exploit sequence similarity by transferring experimental information from homologues. Such transfer is crucially aided by large-scale, work- and management-intensive projects that aim to develop a comprehensive ontology for gene-protein function, such as the Gene Ontology project. In parallel, fully automatic or semiautomatic methods have successfully begun to mine the existing data through lexical analysis. Some of these tools parse controlled vocabularies from databases; others venture into mining free text from MEDLINE abstracts or full scientific papers. Automated text analysis has become a rapidly expanding discipline in bioinformatics, and a few of these tools have already been embedded in research projects.
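The abstract combines two ideas: transferring experimental information from sequence homologues, and lexical parsing of controlled vocabularies. A minimal sketch of how the two can fit together is shown below, assuming a hypothetical input of homologue hits, each given as a fractional sequence identity plus a semicolon-separated keyword field loosely modelled on SWISS-PROT keyword lines. The function names, the identity-weighted vote, and both thresholds (`min_identity`, `min_support`) are illustrative assumptions, not the method described in this paper.

```python
from collections import defaultdict


def parse_keywords(kw_field: str) -> set[str]:
    """Lexical step: split a semicolon-separated controlled-vocabulary field
    into normalized terms (trimmed, trailing period removed, lowercased)."""
    terms = (t.strip().rstrip(".").strip().lower() for t in kw_field.split(";"))
    return {t for t in terms if t}


def transfer_annotation(hits, min_identity=0.3, min_support=0.5):
    """Homology step (hypothetical scheme): each homologue votes for its
    keywords with a weight equal to its sequence identity to the query.
    Hits below `min_identity` are ignored as too distant to trust.
    Returns, sorted, the keywords whose share of the total weight is at
    least `min_support`."""
    votes = defaultdict(float)
    total = 0.0
    for identity, kw_field in hits:
        if identity < min_identity:
            continue  # skip distant homologues
        total += identity
        for term in parse_keywords(kw_field):
            votes[term] += identity
    if total == 0.0:
        return []  # no trustworthy homologue, so no transfer
    return sorted(t for t, w in votes.items() if w / total >= min_support)


# Hypothetical homologue hits: (fractional sequence identity, keyword field).
example_hits = [
    (0.82, "Nuclear protein; DNA-binding; Transcription regulation."),
    (0.55, "Nuclear protein; Zinc-finger."),
    (0.41, "Nuclear protein; DNA-binding."),
    (0.18, "Membrane; Transmembrane."),
]
print(transfer_annotation(example_hits))  # ['dna-binding', 'nuclear protein']
```

Weighting votes by identity encodes the common intuition that annotation transfer grows less reliable for more distant homologues; the 0.3 identity cutoff and the majority threshold are placeholders here, not validated values.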
Year
2004
DOI
10.1609/aimag.v25i1.1746
Venue
AI Magazine
Keywords
experimental characterization, annotating protein function, automated text analysis, controlled vocabulary, lexical analysis, full genomes, annotate protein function, experimental information, gene-protein function, databases snail, full scientific paper, text analysis
Field
Ontology, Gene Ontology Project, Pace, Information retrieval, Computer science, Controlled vocabulary, Exploit, Protein function, Artificial intelligence, Parsing, Lexical analysis
DocType
Journal
Volume
25
Issue
1
ISSN
0738-4602
Citations
0
PageRank
0.34
References
38
Authors
2
Name | Order | Citations | PageRank
Rajesh Nair | 1 | 73 | 7.60
Burkhard Rost | 2 | 795 | 88.14