Polymer Properties that Predict Protein Structure Class from the Primary Sequence
Within cells, membrane-free compartments form spontaneously and reversibly in a process referred to as “phase separation”. By forming specific compartments and microenvironments, these membraneless organelles (for example, Cajal bodies, the nucleolus, stress granules, and P-bodies) regulate a myriad of cellular functions by controlling the spatial organization of biological matter and thereby modulating biochemical reactivity. Proteins play a prominent role in driving phase separation and, among phase-separating (PS) proteins, many have intrinsically disordered regions (IDRs) that are needed for phase separation to occur. Previous work produced a computer algorithm called ParSe (Partition Sequence) that successfully identifies PS IDRs from the protein primary sequence, starting from predictions of hydrodynamic size, which is indicative of the strength of intramolecular interactions relative to interactions with solvent. The key assumption of ParSe is that the intramolecular cohesion that compacts monomeric proteins is correlated with the intermolecular cohesion that drives phase separation. To assess hydrodynamic size, ParSe uses a sequence-based model of the polymer scaling exponent, νmodel, that, when paired with a second sequence-based parameter, the intrinsic propensity of a sequence to form β-turns, can distinguish between sequences belonging to one of three classes of protein regions: folded, intrinsically disordered (ID), and PS ID. However, the prior study did not test whether the combination of νmodel and β-turn propensity is unique in its predictive power, as would be required if hydrodynamic size and turn structures are indeed mechanistically linked to protein phase separation. Here, it is shown that νmodel and β-turn propensity are not unique in their ability to identify PS IDRs; rather, PS IDRs can be identified with similar fidelity using νmodel paired with any of a range of conformational propensity scales or hydrophobicity scales.
Thus, structural hypotheses relating to the mechanistic details of protein phase separation cannot be established based on these results. Moreover, when applying ParSe to verified globular proteins, we noticed that these proteins often contain short regions that are incorrectly predicted to be intrinsically disordered (ID). We hypothesize that these short predicted IDRs within known folded regions represent segments of a folded domain that have low structural stability. To test this hypothesis, ParSe calculations were compared to hydrogen-deuterium exchange (HDX) data measured for four folded proteins. Good agreement was found between the locations of ParSe-predicted IDRs and regions of low stability as inferred from HDX rates.
LLPS, phase separation, proteins, primary sequence prediction
Khaodeuanepheng, N. (2022). Polymer properties that predict protein structure class from the primary sequence (Unpublished thesis). Texas State University, San Marcos, Texas.