Title
Atypical structural tendencies among low-complexity domains in the Protein Data Bank proteome.
Abstract
A variety of studies have suggested that low-complexity domains (LCDs) tend to be intrinsically disordered and are relatively rare within structured proteins in the Protein Data Bank (PDB). Although LCDs are often treated as a single class, we previously found that LCDs enriched in different amino acids can exhibit substantial differences in protein metabolism and function. Therefore, we wondered whether the structural conformations of LCDs are likewise dependent on which specific amino acids are enriched within each LCD. Here, we directly examined relationships between enrichment of individual amino acids and secondary structure tendencies across the entire PDB proteome. Secondary structure tendencies varied as a function of the identity of the amino acid enriched and its degree of enrichment. Furthermore, divergence in secondary structure profiles often occurred for LCDs enriched in physicochemically similar amino acids (e.g. valine vs. leucine), indicating that LCDs composed of related amino acids can have distinct secondary structure tendencies. Comparison of LCD secondary structure tendencies with numerous pre-existing secondary structure propensity scales resulted in relatively poor correlations for certain types of LCDs, indicating that these scales may not capture secondary structure tendencies as sequence complexity decreases. Collectively, these observations provide a highly resolved view of structural tendencies among LCDs parsed by the nature and magnitude of single amino acid enrichment.
Author summary
The structures that proteins adopt are directly related to their amino acid sequences. Low-complexity domains (LCDs) in protein sequences are unusual regions made up of only a few different types of amino acids. Although this is the key feature that classifies sequences as LCDs, the physical properties of LCDs will differ based on the types of amino acids that are found in each domain. For example, the sequences "AAAAAAAAAA", "EEEEEEEEEE", and "EEKRKEEEKE" will have very different properties, even though they would all be classified as LCDs by traditional methods. In a previous study, we developed a new method to further divide LCDs into categories that more closely reflect the differences in their physical properties. In this study, we apply that approach to examine the structures of LCDs when sorted into different categories based on their amino acids. This allowed us to define relationships between the types of amino acids in the LCDs and their corresponding structures. Since protein structure is closely related to protein function, this has important implications for understanding the basic functions and properties of LCDs in a variety of proteins.
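The point made with the three example sequences can be illustrated with a minimal Python sketch. It is not the authors' classification method: it assumes Shannon entropy as a generic measure of sequence complexity and uses the fraction of the most common residue as a stand-in for "enrichment". The function names (`composition`, `shannon_entropy`, `dominant_residue`) are hypothetical and chosen only for this illustration.

```python
from collections import Counter
from math import log2


def composition(seq):
    """Return the fraction of each amino acid in seq."""
    counts = Counter(seq)
    n = len(seq)
    return {aa: c / n for aa, c in counts.items()}


def shannon_entropy(seq):
    """Shannon entropy in bits; lower values mean lower complexity.

    Assumption: entropy is used here as a generic complexity measure,
    not as the specific criterion applied in the study.
    """
    return sum(-p * log2(p) for p in composition(seq).values())


def dominant_residue(seq):
    """Return the most common residue and its fraction of the sequence."""
    comp = composition(seq)
    aa = max(comp, key=comp.get)
    return aa, comp[aa]


# The three example sequences from the author summary.
for seq in ("AAAAAAAAAA", "EEEEEEEEEE", "EEKRKEEEKE"):
    aa, frac = dominant_residue(seq)
    print(seq, aa, round(frac, 2), round(shannon_entropy(seq), 2))
```

Under these assumptions, all three sequences have low entropy, yet they differ in which residue dominates and by how much, which is the distinction the author summary draws between composition-based LCD categories.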
Year
2020
DOI
10.1371/journal.pcbi.1007487; 10.1371/journal.pcbi.1007487.r001; 10.1371/journal.pcbi.1007487.r002; 10.1371/journal.pcbi.1007487.r003; 10.1371/journal.pcbi.1007487.r004
Venue
PLOS COMPUTATIONAL BIOLOGY
Keywords
Low-complexity domain, α-helix propensity, β-sheet propensity, intrinsic disorder, protein structure
DocType
Journal
Volume
16
Issue
1
ISSN
1553-734X
Citations
0
PageRank
0.34
References
0
Authors
3
Name               Order  Citations  PageRank
Sean M Cascarina   1      0          0.34
Mikaela R Elder    2      0          0.34
Eric D. Ross       3      2          1.00