Title
A Complex Prime Numerical Representation of Amino Acids for Protein Function Comparison
Abstract
Computationally assessing the functional similarity between proteins is an important task in bioinformatics research. It helps molecular biologists transfer knowledge about certain proteins to others and hence reduces the amount of tedious and costly benchwork. The representation of amino acids, the building blocks of proteins, plays an important role in achieving this goal. Compared with symbolic representation, representing amino acids numerically expands our ability to analyze proteins, including comparing their functional similarity. Among state-of-the-art methods, the electron-ion interaction pseudopotential (EIIP) is widely adopted for the numerical representation of amino acids. However, by design EIIP can suffer from degeneracy: two different amino acid sequences can have the same numerical representation. To address this challenge, we propose a complex prime numerical representation (CPNR) of amino acids, inspired by the similarity between a pattern among prime numbers and the number of codons of amino acids. To empirically assess the effectiveness of the proposed method, we compare CPNR against EIIP. Experimental results demonstrate that the proposed CPNR always achieves better performance than EIIP. We also develop a framework that combines the advantages of CPNR and EIIP, which enables us to improve performance and to study the unique characteristics of the different representations.
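The degeneracy mentioned in the abstract can be illustrated with a minimal sketch. The table below uses the commonly tabulated EIIP values (in Rydberg), in which leucine and isoleucine share one value and serine and cysteine share another; the values should be checked against a primary source before use. The function names `encode` and `degenerate_groups` and the two example sequences are hypothetical, chosen only for illustration. The CPNR mapping itself is not given in the abstract, so it is not reproduced here.

```python
# Commonly tabulated EIIP values (Rydberg) per one-letter amino acid code.
# Assumption: values quoted from the standard table; verify before relying on them.
EIIP = {
    "L": 0.0000, "I": 0.0000, "N": 0.0036, "G": 0.0050, "V": 0.0057,
    "E": 0.0058, "P": 0.0198, "H": 0.0242, "K": 0.0371, "A": 0.0373,
    "Y": 0.0516, "W": 0.0548, "Q": 0.0761, "M": 0.0823, "S": 0.0829,
    "C": 0.0829, "T": 0.0941, "F": 0.0946, "R": 0.0959, "D": 0.1263,
}


def encode(sequence, table=EIIP):
    """Map a one-letter amino acid sequence to a list of numbers."""
    return [table[residue] for residue in sequence.upper()]


def degenerate_groups(table=EIIP):
    """Return groups of residues that share the same numerical value."""
    groups = {}
    for residue, value in table.items():
        groups.setdefault(value, []).append(residue)
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)


# Two hypothetical sequences that differ at two positions (L/I and S/C).
seq_a = "MLSK"
seq_b = "MICK"

print(encode(seq_a) == encode(seq_b))  # the two encodings are identical
print(degenerate_groups())             # residues that collide under this table
```

Any mapping that assigns a distinct number to each of the 20 amino acids avoids this collision, since distinct sequences of the same length then always yield distinct numerical sequences.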
Year: 2016
DOI: 10.1089/cmb.2015.0178
Venue: JOURNAL OF COMPUTATIONAL BIOLOGY
Keywords: CPNR, EIIP, numerical representation, protein, sequence comparison
Field: Prime (order theory), Computer science, Amino acid, Algorithm, Degeneracy (mathematics), Artificial intelligence, Protein function, Bioinformatics, Machine learning
DocType:
Journal:
Volume: 23
Issue: 8
ISSN: 1066-5277
Citations: 0
PageRank: 0.34
References: 2
Authors: 4
Name | Order | Citations | PageRank
Duo Chen | 1 | 7 | 1.50
Jiasong Wang | 2 | 5 | 1.14
Ming Yan | 3 | 99 | 8.39
Sheng Bao | 4 | 215 | 26.77