Sequence-Based Properties that Identify Intrinsically Disordered Phase-Separating Protein Regions
We aimed to investigate the molecular mechanisms that drive liquid-liquid phase separation (LLPS) of intrinsically disordered proteins (IDPs). This phenomenon is critical in many cellular processes (including RNA metabolism, chromatin rearrangement, and signal transduction) and is known to be driven primarily, but not exclusively, by IDPs. To fully understand how these processes occur and are regulated, we must understand the interactions and sequence properties underlying phase separation behavior. IDPs are proteins that contain intrinsically disordered regions (IDRs), which do not adopt stable tertiary or secondary structures. While at least 40% of the human proteome is classified as IDPs, only a subset exhibits phase separation behavior. Previous work created a computer algorithm called ParSe (partition sequence) that successfully predicts folded, intrinsically disordered (ID), and phase-separating (PS) ID regions from the protein primary sequence. This algorithm uses the polymer scaling exponent, v, and a conformational parameter, the intrinsic beta-turn propensity, to distinguish the three protein classes (folded, F; disordered, D; and phase-separating disordered, P). Here, we confirmed that the v and beta-turn propensity values follow a normal distribution in three expanded protein sequence sets (PS-IDR, IDR, and Folded). Next, we determined the ability of 568 intrinsic sequence-based properties to define the F, D, and P populations in the sequence sets. We found that most of these properties yield statistically significant differences in the means of the sequence sets. Principal component analysis identified two principal modes of variance in the human proteome: one corresponding to physicochemical properties, like hydrophobicity, charge, or v, and the other to conformational propensities, like preferences for alpha-helix, beta-turn, or beta-sheet.
These results established that a hydrophobicity scale could accurately distinguish between folded and ID populations, and that an alpha-helix scale paired with v could optimally identify PS-IDR from IDR. Using those three parameters, a second-generation version of ParSe was developed.
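As an illustration of how a sequence-based property can separate two sequence sets by their means, the following minimal Python sketch averages a hydrophobicity scale over each sequence and compares the two sets with Welch's t statistic. It is not the ParSe implementation: the Kyte-Doolittle scale is used here only as a familiar example of a hydrophobicity scale (the abstract does not say which scale was selected), the helper names `mean_property` and `welch_t` are hypothetical, and the short sequences are invented strings, not real proteins or data from the thesis.

```python
from math import sqrt
from statistics import mean, variance

# Kyte-Doolittle hydropathy values, used only as an example scale (assumption).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_property(sequence, scale):
    """Average a per-residue scale over a sequence, skipping unknown symbols."""
    values = [scale[res] for res in sequence.upper() if res in scale]
    if not values:
        raise ValueError("sequence contains no residues covered by the scale")
    return mean(values)


def welch_t(sample_a, sample_b):
    """Welch's t statistic for the difference in means of two samples."""
    standard_error = sqrt(
        variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b)
    )
    return (mean(sample_a) - mean(sample_b)) / standard_error


# Invented toy sequences for illustration only (not real proteins).
folded_like = ["MKVLAAGIVGLLLA", "AILVFWMCLIVAGL", "VVIALFGCMLTAIV"]
disordered_like = ["SGRGGSEEKKDDSPQ", "GSYGQQSSRGDNEKP", "PEKRSDDGSQQNGRS"]

folded_scores = [mean_property(s, KYTE_DOOLITTLE) for s in folded_like]
disordered_scores = [mean_property(s, KYTE_DOOLITTLE) for s in disordered_like]

t_statistic = welch_t(folded_scores, disordered_scores)
print(f"Welch t statistic (folded-like vs disordered-like): {t_statistic:.2f}")
```

A large positive statistic here only reflects that the toy "folded-like" strings were written to be hydrophobic. Repeating such a comparison for each of many property scales, on real curated sequence sets, is the kind of screening the abstract describes.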
intrinsically disordered proteins, liquid-liquid phase separation
Ibrahim, A. (2022). Sequence-based properties that identify intrinsically disordered phase-separating protein regions (Unpublished thesis). Texas State University, San Marcos, Texas.