Abstract | ||
---|---|---|
The exponential growth of available DNA sequences and the increased interoperability of biological information are triggering intergovernmental efforts aimed at improving access to, dissemination of, and analysis of sequence data. Efficient storage and processing of DNA material is an important goal that aligns well with the coding standardization foreseen on the horizon. This paper proposes novel coding approaches, for both the dissemination and the processing of sequences, in which DNA processing is shown to be accelerated by using more than the customary eight bits to encode a single nucleotide. Further gains are achieved by encoding each nucleotide together with its trailing alignment information as a single 64-bit data structure. The paper also proposes a slight modification to the established FASTA scheme to improve its representation of alignment information. Empirical tests yield encouraging results that support the significance of these proposals. |
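
The abstract does not specify the bit layout of the 64-bit structure, so the following is only a minimal sketch of the general idea under assumed conventions: the low 8 bits hold the ASCII code of a nucleotide and the remaining 56 bits hold the number of gap characters (`-`) that trail it in an alignment. The names `pack`, `unpack`, `encode_aligned`, and `decode_aligned` are hypothetical and do not come from the paper.

```python
from typing import List, Tuple

GAP = "-"
BASE_BITS = 8                            # assumption: low 8 bits = ASCII nucleotide code
BASE_MASK = (1 << BASE_BITS) - 1
MAX_GAPS = (1 << (64 - BASE_BITS)) - 1   # assumption: upper 56 bits = trailing gap count


def pack(base: str, gaps: int) -> int:
    """Pack one nucleotide and its trailing gap count into a 64-bit integer.

    An empty base (code 0) marks gaps that precede the first nucleotide.
    """
    if not 0 <= gaps <= MAX_GAPS:
        raise ValueError("gap count does not fit in 56 bits")
    code = ord(base) if base else 0
    if code > BASE_MASK:
        raise ValueError("nucleotide code does not fit in 8 bits")
    return (gaps << BASE_BITS) | code


def unpack(word: int) -> Tuple[str, int]:
    """Recover (nucleotide, trailing gap count) from a packed word."""
    code = word & BASE_MASK
    return (chr(code) if code else "", word >> BASE_BITS)


def encode_aligned(seq: str) -> List[int]:
    """Encode an aligned sequence as one word per nucleotide."""
    words: List[int] = []
    base, gaps = "", 0
    for ch in seq:
        if ch == GAP:
            gaps += 1
        else:
            if base or gaps:             # flush the previous nucleotide or leading gaps
                words.append(pack(base, gaps))
            base, gaps = ch, 0
    if base or gaps:                     # flush the final pending entry
        words.append(pack(base, gaps))
    return words


def decode_aligned(words: List[int]) -> str:
    """Rebuild the aligned sequence from packed words."""
    return "".join(base + GAP * gaps for base, gaps in map(unpack, words))
```

Under these assumptions, `encode_aligned("AC--G-T")` produces four words rather than seven characters, and `decode_aligned` restores the original aligned string. Whether the paper uses a comparable split of the 64 bits is not stated in the abstract.
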
Year | DOI | Venue |
---|---|---|
2010 | 10.1016/j.cmpb.2010.03.014 | Computer Methods and Programs in Biomedicine |
Keywords | Field | DocType |
novel coding approach,sequence data,alignment information,available dna sequence,single nucleotide,coding standardization,dna material,biological information,omics processing,dna encoding,64-bit data structure,dna processing,nucleotides,dna sequence,data structure,exponential growth | Data mining,Interoperability,Computer science,Theoretical computer science,Coding (social sciences),DNA sequencing,Artificial intelligence,Standardization,Exponential growth,Data structure,Computer vision,DNA,Encoding (memory) | Journal |
Volume | Issue | ISSN |
100 | 2 | 1872-7565 |
Citations | PageRank | References |
0 | 0.34 | 4 |
Authors | ||
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Bostjan Murovec | 1 | 5 | 1.35 |
James M. Tiedje | 2 | 295 | 63.36 |
Blaz Stres | 3 | 2 | 1.23 |