Title
Functional bioinformatics for Arabidopsis thaliana.
Abstract
Motivation: The genome of Arabidopsis thaliana, which has the best understood plant genome, still has approximately one-third of its genes with no functional annotation at all from either MIPS or TAIR. We have applied our Data Mining Prediction (DMP) method to the problem of predicting the functional classes of these protein sequences. This method is based on using a hybrid machine-learning/data-mining method to identify patterns in the bioinformatic data about sequences that are predictive of function. We use data about sequence, predicted secondary structure, predicted structural domain, InterPro patterns, sequence similarity profile and expressions data.
Results: We predicted the functional class of a high percentage of the Arabidopsis genes with currently unknown function. These predictions are interpretable and have good test accuracies. We describe in detail seven of the rules produced.
Availability: Rulesets are available at http://www.aber.ac.uk/compsci/Research/bio/dss/arabpreds/ and predictions are available at http://www.genepredictions.org
Year: 2006
DOI: 10.1093/bioinformatics/btl051
Venue: Bioinformatics
Keywords: expressions data, arabidopsis thaliana, functional class, data-mining method, functional annotation, functional bioinformatics, arabidopsis gene, bioinformatic data, plant genome, protein sequence, sequence similarity profile, machine learning, data mining
Field: Genome, Arabidopsis, Data mining, Gene, Annotation, Biology, Arabidopsis thaliana, Bioinformatics, Protein secondary structure, InterPro
DocType: Journal
Volume: 22
Issue: 9
ISSN: 1367-4803
Citations: 10
PageRank: 0.72
References: 12
Authors: 4
Name              Order  Citations  PageRank
Amanda Clare      1      592        47.37
Andreas Karwath   2      228        21.60
Helen Ougham      3      10         0.72
Ross D. King      4      1774       194.85