Abstract | ||
---|---|---|
A new, simple method is presented for efficient and accurate identification of coding sequences in prokaryotic genomes. The method employs a Shannon description of artificial language for DNA sequences. It consists of translating a DNA sequence into a pseudo-amino acid sequence with 20 fundamental words according to the universal genetic code. With an entropy-density profile (EDP), the method maps a sequence of finite length to a vector and then analyzes its position in the 20-dimensional phase space depending on its nature. It is found that the ratio of the relative distance to an averaged coding and non-coding EDP over a small number (up to one) of open reading frames (ORFs) can serve as a good coding potential. An iterative algorithm is designed for finding a set of "root" sequences using this coding potential. A multivariate entropy distance (MED) algorithm is then proposed for the identification of prokaryotic genes; it combines a coding potential with an EDP-based sequence similarity analysis. The current version of MED is unsupervised, parameter-free and simple to implement. It is demonstrated to detect 95-99% of genes, with 10-30% additional genes, when tested against the RefSeq database of NCBI, and to detect 97.5-99.8% of confirmed genes with known functions. It is also shown to find a set of (functionally known) genes that are missed by other well-known gene finding algorithms. All measurements show that the MED algorithm reaches a performance level similar to that of algorithms such as GeneMark and Glimmer for prokaryotic gene prediction. |
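
The mapping from a DNA sequence to a 20-dimensional vector can be illustrated with a minimal Python sketch. The abstract does not state the EDP formula, so the code assumes that each component is one amino acid's share of the Shannon entropy, s_i = -(p_i log p_i) / H with H = -Σ p_j log p_j. The Euclidean distance, the function names (`translate`, `edp`, `coding_potential`), and the two centre vectors are likewise illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter
from itertools import product

# Standard ("universal") genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
AMINO_ACIDS = sorted(set(AMINO) - {"*"})  # the 20 "fundamental words"


def translate(dna):
    """Translate DNA in frame 0; stop codons and unknown codons are skipped."""
    dna = dna.upper()
    residues = (CODON_TABLE.get(dna[i:i + 3]) for i in range(0, len(dna) - 2, 3))
    return "".join(r for r in residues if r and r != "*")


def edp(dna):
    """Assumed EDP: s_i = -(p_i * log p_i) / H, a 20-component vector."""
    protein = translate(dna)
    if not protein:
        raise ValueError("sequence contains no translatable codons")
    counts = Counter(protein)
    total = len(protein)
    freqs = [counts[a] / total for a in AMINO_ACIDS]
    terms = [-p * math.log(p) if p > 0 else 0.0 for p in freqs]
    entropy = sum(terms)
    if entropy == 0:
        raise ValueError("entropy is zero (only one residue type present)")
    return [t / entropy for t in terms]


def coding_potential(vec, coding_center, noncoding_center):
    """Ratio of distances to hypothetical coding and non-coding centre EDPs.

    Under this sketch's convention, values below 1 mean the sequence
    lies closer to the coding centre than to the non-coding one.
    """
    d_coding = math.dist(vec, coding_center)
    d_noncoding = math.dist(vec, noncoding_center)
    return math.inf if d_noncoding == 0 else d_coding / d_noncoding
```

In this sketch the centre vectors would be averages of `edp` over sets of known coding and non-coding ORFs; how the paper selects those sets (the iterative "root" sequence search) is not reproduced here.
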
Year | DOI | Venue |
---|---|---|
2004 | 10.1142/S0219720004000624 | J. Bioinformatics and Computational Biology |
Keywords | Field | DocType |
entropy,gene finding algorithm,linguistic description of dna | Genome,Sequence alignment,Small number,Gene,Biology,Iterative method,Genetic code,Coding (social sciences),DNA sequencing,Bioinformatics | Journal |
Volume | Issue | ISSN |
2 | 2 | 0219-7200 |
Citations | PageRank | References |
7 | 1.11 | 6 |
Authors | ||
4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Zhengqing Ouyang | 1 | 8 | 1.47 |
Huaiqiu Zhu | 2 | 162 | 15.27 |
Jin Wang | 3 | 7 | 1.11 |
Zhen-Su She | 4 | 125 | 9.43 |