Title
Reannotation of protein-coding genes based on an improved graphical representation of DNA sequence.
Abstract
Overannotation of protein-coding genes is a common phenomenon in microbial genomes. The genome of Amsacta moorei entomopoxvirus (AmEPV) is a typical case, because more than 63% of its annotated ORFs are hypothetical. In this article, we propose an improved graphical representation, the I-TN (Improved curve based on TriNucleotides) curve, which allows direct inspection of the composition and distribution of codons and of asymmetric gene structure. This improved graphical representation also provides convenient tools for genome analysis. From this representation, 18 variables are derived as numerical descriptors that quantitatively represent the specific features of protein-coding genes, and with them we reannotate the protein-coding genes in several viral genomes. Using parameters trained on the experimentally validated genes, all 30 experimentally validated genes and 63 putative genes in the AmEPV genome are recognized correctly as protein coding; the accuracies of the present method for self-test and cross-validation are both 100%. Twenty-eight annotated hypothetical genes are predicted to be noncoding, so the number of reannotated protein-coding genes in AmEPV should be 266 instead of the 294 reported in the original annotation. When the present method, trained on AmEPV, is applied directly to other entomopoxvirus genomes, such as Melanoplus sanguinipes entomopoxvirus (MsEPV), all 123 annotated function-known and putative genes are recognized correctly as protein coding, and 17 hypothetical genes are recognized as noncoding. The present method could also be extended to other genomes, with or without adaptation of the training sets, with high accuracy. © 2010 Wiley Periodicals, Inc. J Comput Chem 31: 2126-2135, 2010
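The abstract does not define the I-TN curve or its 18 descriptors, so the sketch below is not the authors' method. It only illustrates the underlying idea of inspecting trinucleotide (codon) composition in a chosen reading frame, which is the kind of raw signal a trinucleotide-based representation could build on. The function names `codon_counts` and `codon_frequencies` are hypothetical, as is the toy sequence.

```python
from collections import Counter


def codon_counts(seq: str, frame: int = 0) -> Counter:
    """Count non-overlapping trinucleotides starting at offset `frame` (0, 1, or 2).

    Illustrative helper only; not the I-TN curve from the paper.
    """
    seq = seq.upper()
    # Step through the sequence three bases at a time; the range bound
    # guarantees every slice is a complete trinucleotide.
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    # Skip trinucleotides containing ambiguous bases such as N.
    return Counter(c for c in codons if set(c) <= set("ACGT"))


def codon_frequencies(seq: str, frame: int = 0) -> dict:
    """Return the relative frequency of each trinucleotide in the given frame."""
    counts = codon_counts(seq, frame)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()} if total else {}


if __name__ == "__main__":
    toy_orf = "ATGGCCATGTAA"  # made-up sequence, for illustration only
    for frame in range(3):
        print(frame, dict(codon_counts(toy_orf, frame)))
```

Comparing the counts across the three frames gives a rough view of how codon usage differs between the coding frame and the shifted frames, which is one simple way to think about asymmetric gene structure.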
Year: 2010
DOI: 10.1002/jcc.21500
Venue: JOURNAL OF COMPUTATIONAL CHEMISTRY
Keywords: graphical representation, protein coding gene, reannotation, numerical descriptor
Field: Genome, Mathematical optimization, Gene, Annotation, Microbial Genomes, Chemistry, Coding (social sciences), DNA sequencing, ORFS, Bioinformatics, Computational biology, Numerical descriptors
DocType: Journal
Volume: 31
Issue: 11
ISSN: 0192-8651
Citations: 5
PageRank: 0.62
References: 1
Authors: 2
Name, Order, Citations, PageRank
Jia-Feng Yu, 1, 8, 2.04
Xiao Sun, 2, 209, 17.20