Title
Probing the Randomness of Proteins by Their Subsequence Composition
Abstract
The quantitative underpinning of the information content of biosequences represents an elusive goal, yet also an obvious prerequisite for the quantitative modeling and study of biological function and evolution. Previous studies have consistently exposed a tenacious lack of compressibility on the part of biosequences. This leaves open the question of what distinguishes them from random strings, the latter being clearly unpalatable to the living cell. This paper assesses the randomness of biosequences in terms of newly introduced parameters that relate to the vocabulary of their (suitably constrained) subsequences rather than their substrings. Experimental results show the potential of the method in distinguishing a protein sequence from its random reshuffling, as well as in tasks of classification and clustering.
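To illustrate the contrast between a substring vocabulary and a subsequence vocabulary, here is a minimal sketch. The abstract does not say how the subsequences are "suitably constrained," so this code assumes a hypothetical constraint: a subsequence counts only if its occurrence fits inside a window of `window` consecutive residues. The null model is a random reshuffling that preserves residue composition. All function names and parameters (`distinct_gapped_subsequences`, `window`, `compare_to_shuffles`, and so on) are illustrative. They are not the authors' parameterization.

```python
import random
from itertools import combinations


def distinct_substrings(seq: str, k: int) -> set[str]:
    """All distinct contiguous substrings (k-mers) of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def distinct_gapped_subsequences(seq: str, k: int, window: int) -> set[str]:
    """Distinct length-k subsequences whose occurrence lies within `window`
    consecutive positions (a hypothetical constraint, assumed for illustration)."""
    found = set()
    n = len(seq)
    for start in range(n):
        # The first character sits at `start`; the other k-1 positions are
        # drawn, in increasing order, from the next window-1 positions.
        span = range(start + 1, min(start + window, n))
        for rest in combinations(span, k - 1):
            found.add(seq[start] + "".join(seq[j] for j in rest))
    return found


def shuffled(seq: str, rng: random.Random) -> str:
    """Random reshuffling: same residue composition, order destroyed."""
    chars = list(seq)
    rng.shuffle(chars)
    return "".join(chars)


def compare_to_shuffles(seq: str, k: int = 3, window: int = 5,
                        trials: int = 20, seed: int = 0) -> tuple[int, float]:
    """Observed subsequence-vocabulary size vs. its mean over reshufflings."""
    rng = random.Random(seed)
    observed = len(distinct_gapped_subsequences(seq, k, window))
    null = [len(distinct_gapped_subsequences(shuffled(seq, rng), k, window))
            for _ in range(trials)]
    return observed, sum(null) / trials


if __name__ == "__main__":
    toy = "MKTAYIAKQRQISFVKSHFSRQ"  # toy string, not a curated protein
    print(compare_to_shuffles(toy))
```

With `window == k` the constrained subsequences reduce to ordinary k-mers. Widening the window lets the vocabulary capture gapped patterns that a substring count cannot see. A large gap between the observed value and the reshuffled mean is the kind of signal the abstract describes, although the paper's actual parameters may be defined differently.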
Year
2009
DOI
10.1109/DCC.2009.60
Venue
DCC
Keywords
random reshuffling,living cell,biological function,previous study,quantitative underpinning,information content,subsequence composition,elusive goal,random string,obvious prerequisite,quantitative modeling,clustering,probability density function,dna,data mining,data compression,classification,protein sequence,organisms,construction industry,proteins,genetics
Field
Substring,Computer science,Living cell,Theoretical computer science,Construction industry,Subsequence,Cluster analysis,Probability density function,Vocabulary,Randomness
DocType
Conference
ISSN
1068-0314
Citations
0
PageRank
0.34
References
4
Authors
2
Name | Order | Citations | PageRank
Alberto Apostolico | 1 | 1441 | 182.20
Fabio Cunial | 2 | 72 | 9.68