Title
A Fast and Efficient Method for Estimating Amino Acid Substitution Models
Abstract
Amino acid substitution models (matrices) play an important role in protein phylogenetic analysis and protein sequence alignment. Different approaches have been proposed to estimate amino acid substitution matrices since the work of Dayhoff in 1972. Currently, maximum likelihood approaches are widely used to estimate popular matrices such as WAG, LG, and FLU. Although maximum likelihood approaches produce high-quality matrices, they are slow and not applicable to very large datasets. The most time-consuming step in estimating matrices is building phylogenetic trees from protein alignments. In this paper, we propose new methods that overcome this obstacle by splitting large alignments into small ones that still contain enough evolutionary information for estimating matrices. Experiments with both Pfam and FLU datasets showed that the proposed methods were about three to nine times faster than the best current method, while the quality of the estimated matrices is nearly the same. Thus, our methods will enable researchers to estimate matrices from very large datasets.
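The abstract does not say how the large alignments are split, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes an alignment is held as a dictionary from sequence name to aligned sequence and simply groups sequences in input order; the function name `split_alignment` and the parameter `max_seqs` are hypothetical.

```python
from typing import Dict, List


def split_alignment(alignment: Dict[str, str], max_seqs: int) -> List[Dict[str, str]]:
    """Split an alignment (name -> aligned sequence) into sub-alignments of
    about max_seqs sequences each. Columns are left untouched.

    Assumption: sequences are grouped in input order. The paper's actual
    splitting criterion is not given in the abstract.
    """
    if max_seqs < 2:
        raise ValueError("each sub-alignment needs at least two sequences")
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) > 1:
        raise ValueError("aligned sequences must all have the same length")

    names = list(alignment)
    chunks = [names[i:i + max_seqs] for i in range(0, len(names), max_seqs)]

    # A lone trailing sequence carries no substitution signal on its own,
    # so merge it into the previous group (that group then has max_seqs + 1).
    if len(chunks) > 1 and len(chunks[-1]) < 2:
        chunks[-2].extend(chunks.pop())

    return [{name: alignment[name] for name in chunk} for chunk in chunks]
```

Each sub-alignment could then be passed to a tree-building step independently, which is where the abstract locates most of the running time.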
Year: 2011
DOI: 10.1109/KSE.2011.21
Venue: KSE
Keywords: amino acid substitution model, large datasets, protein sequence alignment, estimating amino acid substitution, high quality matrix, evolution (biological), amino acid substitution matrices, trees (mathematics), genetics, protein alignment, maximum likelihood estimation, proteins, amino acid substitution models, splitting large alignment, maximum likelihood approaches, biology computing, flu datasets, evolutionary information, maximum likelihood, protein phylogenetics analysis, phylogenetics trees, efficient method, maximum likelihood methods, amino acid substitution matrix, phylogenetic trees, maximum likelihood method, protein sequence, phylogenetic tree
Field: Time of day, Phylogenetic tree, Pattern recognition, Matrix (mathematics), Computer science, Amino acid, Maximum likelihood, Artificial intelligence, Protein sequence alignment, Evolutionary information
DocType: Conference
ISBN: 978-1-4577-1848-9
Citations: 1
PageRank: 0.37
References: 1
Authors: 4
Name              Order   Citations   PageRank
Van Dat Le        1       1           0.37
Cuong Cao Dang    2       6           1.41
Le Si Quang       3       5           1.86
Vinh Sy Le        4       41          9.52