Title
Text mining in the biocuration workflow: applications for literature curation at WormBase, dictyBase and TAIR.
Abstract
WormBase, dictyBase and The Arabidopsis Information Resource (TAIR) are model organism databases containing information about Caenorhabditis elegans and other nematodes, the social amoeba Dictyostelium discoideum and related Dictyostelids, and the flowering plant Arabidopsis thaliana, respectively. Each database curates multiple data types from the primary research literature. In this article, we describe the curation workflow at WormBase, with particular emphasis on our use of text-mining tools (BioCreative 2012, Workshop Track II). We then describe the application of a specific component of that workflow, Textpresso for Cellular Component Curation (CCC), to Gene Ontology (GO) curation at dictyBase and TAIR (BioCreative 2012, Workshop Track III). We find that, with organism-specific modifications, Textpresso can be used by dictyBase and TAIR to annotate gene products to GO's Cellular Component (CC) ontology.
Year: 2012
DOI: 10.1093/database/bas040
Venue: DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION
Keywords: data mining, information, ontology, community, biology, workflow, databases, resource
Field: Data mining, Ontology, Multiple data, DictyBase, Gene ontology, Computer science, WormBase, The Arabidopsis Information Resource, Bioinformatics, Molecular Sequence Annotation, Workflow
DocType: Journal
Volume: 2012
ISSN: 1758-0463
Citations: 12
PageRank: 0.56
References: 15
Authors: 13
Name | Order | Citations | PageRank
Kimberly Van Auken | 1 | 579 | 60.06
Petra Fey | 2 | 106 | 8.75
Tanya Z Berardini | 3 | 516 | 51.20
Robert Dodson | 4 | 60 | 3.08
Laurel Cooper | 5 | 80 | 7.27
Donghui Li | 6 | 401 | 22.86
Juancarlos Chan | 7 | 437 | 66.45
Yuling Li | 8 | 140 | 7.99
Siddhartha Basu | 9 | 41 | 4.69
Hans-Michael Müller | 10 | 628 | 79.29
Rex L Chisholm | 11 | 151 | 13.82
Eva Huala | 12 | 541 | 58.82
Paul W Sternberg | 13 | 739 | 118.90