Abstract |
---|
The quantitative underpinning of the information contents of biosequences represents an elusive goal and yet also an obvious prerequisite to the quantitative modeling and study of biological function and evolution. Previous studies have consistently exposed a tenacious lack of compressibility on behalf of biosequences. This leaves the question open as to what distinguishes them from random strings, the latter being clearly unpalatable to the living cell. This paper assesses the randomness of biosequences in terms of newly introduced parameters that relate to the vocabulary of their (suitably constrained) subsequences rather than their substrings. Results from experiments show the potential of the method in distinguishing a protein sequence from its random reshuffling, as well as in tasks of classification and clustering. |
Year | DOI | Venue |
---|---|---|
2009 | 10.1109/DCC.2009.60 | DCC |
Keywords | Field | DocType |
random reshuffling, living cell, biological function, previous study, quantitative underpinning, information content, subsequence composition, elusive goal, random string, obvious prerequisite, quantitative modeling, clustering, probability density function, dna, data mining, data compression, classification, protein sequence, organisms, construction industry, proteins, genetics | Substring, Computer science, Living cell, Theoretical computer science, Construction industry, Subsequence, Cluster analysis, Probability density function, Vocabulary, Randomness | Conference |
ISSN | Citations | PageRank |
1068-0314 | 0 | 0.34 |
References | Authors | |
4 | 2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Alberto Apostolico | 1 | 1441 | 182.20 |
Fabio Cunial | 2 | 72 | 9.68 |