In short...

It is generally accepted that many functional proteins do not have well-defined folded structures. These so-called intrinsically disordered proteins (IDPs) are encoded abundantly in the human genome and are involved in a variety of biological processes, including cell signalling, cell cycle control, molecular recognition, and nucleic acid transcription and replication, as well as in the development of neurodegenerative diseases and cancer. The study of IDPs is an emerging field of research, and general rules describing their conformational behaviour and mechanisms are still missing. Expanding the body of experimental data from different systems and developing new techniques to characterize their properties are therefore essential to improving our knowledge of this family of proteins.

We are interested in using state-of-the-art nuclear magnetic resonance techniques and other biophysical methods, in combination with novel computational modelling, to explore the structural propensities and dynamics of IDPs and the mechanisms of their interactions with other proteins or nucleic acids at atomic resolution.

Intrinsically disordered proteins

After a decade in the post-genome era, determining the functions of the proteins encoded in DNA sequences is still one of the major challenges. It is widely accepted that the function of a protein is determined by its three-dimensional structure. The numerous protein structures deposited with functional interpretations in the Protein Data Bank (PDB) over the last fifty years strongly support this idea. However, this structure-function paradigm has been reassessed extensively in recent years. Bioinformatics studies have shown that intrinsically disordered proteins are abundant in all kingdoms of life. It is estimated that approximately 50% of mammalian proteins contain long disordered regions (more than 30 residues), and approximately 25% of these proteins are expected to be fully disordered under physiological conditions. The lack of a folded structure provides several advantages: a larger solvent-exposed surface increases the chance of encountering binding partners via the so-called “fly-casting” mechanism, and the flexible chain can act as a scaffold that interacts with several different proteins. One of the most intriguing aspects of disordered proteins is that they often undergo structural transitions from disordered to folded forms upon binding to their physiological partners. This folding-upon-binding mechanism opens a new view of protein-protein and protein-DNA/RNA interactions. Despite the advantages of being unstructured, the disorder of some of these proteins also leads to disease-related aggregation or fibrillization. Bioinformatics studies also estimate that about 80% of cancer-associated proteins contain consecutive disordered regions. This new class of proteins is now mostly termed intrinsically disordered proteins (IDPs), or intrinsically disordered regions (IDRs) when the disorder occurs within otherwise structured proteins.
With these key studies establishing the importance of IDPs, “protein disorder” has become an emerging research field. From the growing number of studies, it is now generally believed that IDPs play key roles in many physiological processes, including cell signalling, cell cycle control, molecular recognition, and nucleic acid transcription and replication, as well as in the development of neurodegenerative diseases, cardiovascular diseases, amyloidoses, and type II diabetes.

Physiological and biochemical results have drawn attention to the importance of IDPs, but several aspects of how IDPs function are still unknown. How are IDPs recognized by their partner proteins in the absence of a folded structure? Does any specific pre-recognition conformation exist despite their flexible nature? Can we derive a general rule for the conformational behaviour of these proteins from the primary sequence? In other words, can we predict the functions and mechanisms of IDPs from the primary sequence? Insight into the dynamics and conformational propensities of these proteins at the atomic level will be a critical step towards answering these questions. Conventional approaches to structure determination or characterization are less feasible because of the structural heterogeneity of IDPs. Novel biophysical methods and computational models are therefore essential for dealing with their rapidly interconverting conformations.

Our group is interested in using nuclear magnetic resonance (NMR) spectroscopy, which gives specific information for almost all atoms with minimal interference, to characterize IDPs. In particular, two recently developed NMR parameters, residual dipolar couplings (RDCs) and paramagnetic relaxation enhancements (PREs), which are extremely sensitive to local conformational sampling and transient long-range interactions in unstructured proteins, will be applied to the systems studied. Other biophysical methods such as small angle X-ray scattering (SAXS), circular dichroism spectroscopy, and fluorescence spectroscopy will be used as complementary methods. In addition, because IDPs are structurally heterogeneous, statistical ensemble-based computational models will be used to characterize their structural propensities. We are using experimental data as constraints to obtain representative conformational ensembles of IDPs. We are also developing new methods that we hope will predict the function of IDPs solely on the basis of primary sequence. We hope the studies carried out in our group will improve our understanding of the structural dynamics, conformational behaviour, and related biological processes of IDPs, and of the onset of their pathological aggregation or fibrillization.

Nuclear Magnetic Resonance Spectroscopy

NMR spectroscopy, which gives specific information for almost all atoms with minimal interference, is one of the most powerful tools for the experimental characterization of disordered proteins. In addition to the routinely measured parameters (chemical shifts, scalar couplings, nuclear Overhauser effects, and relaxation rates), two more recently developed experimental parameters, residual dipolar couplings (RDCs) and paramagnetic relaxation enhancements (PREs), will be applied to probe local conformational sampling and long-range distance information in IDPs.

Residual dipolar couplings

The size of RDCs can be calculated very precisely as ensemble and time averages from the well-understood geometric dependence of nucleus-nucleus dipolar interactions. In isotropic solution, this interaction averages to zero because of molecular tumbling. However, a small part of the dipolar interaction (the residual dipolar coupling) can be reintroduced by dissolving the protein in a weak alignment medium such as a stretched polyacrylamide gel or bicelles. As an illustrative example, the RDCs between the amide nitrogen and proton (NH) are negative in extended parts of a disordered protein: assuming the chain is aligned parallel to the magnetic field, the NH vector is close to perpendicular to the field, so the second-order Legendre polynomial of the cosine of that angle, (3cos²θ − 1)/2, approaches its minimum. In contrast, if a helical conformation is significantly populated, the angle is close to zero and the RDC values are positive. RDCs are therefore extremely useful for studying local conformation, even when the structural propensities are only transiently populated.
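The sign argument in the paragraph above can be checked numerically. The following is a minimal sketch that evaluates only the angular factor, the second-order Legendre polynomial of cos θ, and leaves out the prefactor (its magnitude and sign convention depend on the nuclei and the alignment strength). The function names and the 80%/20% mixture are hypothetical and chosen purely for illustration.

```python
import math

def p2(theta_deg):
    """Second-order Legendre polynomial of cos(theta); theta in degrees."""
    c = math.cos(math.radians(theta_deg))
    return 0.5 * (3.0 * c * c - 1.0)

def averaged_angular_factor(angles_deg, weights=None):
    """Population-weighted average of P2 over a set of conformers."""
    if weights is None:
        weights = [1.0] * len(angles_deg)
    total = sum(weights)
    return sum(w * p2(a) for a, w in zip(angles_deg, weights)) / total

# NH vector perpendicular to the field direction (extended chain segment)
extended = p2(90.0)
# NH vector parallel to the field direction (helical segment)
helical = p2(0.0)
# Hypothetical mixture: 80% extended, 20% helical conformers
mixed = averaged_angular_factor([90.0, 0.0], [0.8, 0.2])

print(f"extended: {extended:+.2f}, helical: {helical:+.2f}, mixed: {mixed:+.2f}")
```

Because the measured value is a population-weighted average, even a minor helical population shifts the averaged factor away from the purely extended value, which is why RDCs can report on transient structure.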

Paramagnetic relaxation enhancement

In contrast to RDCs, which report on local conformational sampling, PREs provide information about transient long-range contacts within or between proteins. PREs can be observed after introducing a suitable paramagnetic tag, such as the commercially available nitroxide MTSL or a lanthanide-chelating tag. Because the gyromagnetic ratio of the electron spin is over 600 times larger than that of the proton spin, the line broadening caused by paramagnetic relaxation enhancement provides long-range probes of distances over 25 Å, even if the contacts are weakly or transiently populated. In addition to the common practice of estimating PREs from NMR line broadening, relaxation rates for different types of nuclei will be measured explicitly to reduce the uncertainties arising from the complexity of correlation times in unfolded proteins, and to provide sufficient and precise distance information for the characterization of IDPs.
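The sensitivity of PREs to weakly populated contacts follows from the steep distance dependence of the effect, which scales with the ensemble average of r⁻⁶ between the unpaired electron and the observed nucleus. The sketch below illustrates this with a hypothetical two-state ensemble; the distances, populations, and function name are assumptions for illustration, and the physical prefactor (which includes the correlation time) is omitted.

```python
def mean_inverse_sixth(distances_angstrom, populations):
    """Population-weighted average of r^-6 over conformers (units: Å^-6)."""
    total = sum(populations)
    return sum(p * r ** -6 for r, p in zip(distances_angstrom, populations)) / total

# Hypothetical two-state ensemble: 5% compact (10 Å), 95% extended (40 Å)
distances = [10.0, 40.0]
populations = [0.05, 0.95]

avg = mean_inverse_sixth(distances, populations)

# Fraction of the averaged r^-6 contributed by the compact state
compact_share = populations[0] * distances[0] ** -6 / avg

# Single distance that would give the same averaged r^-6
effective_distance = avg ** (-1.0 / 6.0)

print(f"compact state share of <r^-6>: {compact_share:.3f}")
print(f"apparent distance: {effective_distance:.1f} Å")
```

In this toy case the compact state, although populated only 5% of the time, accounts for almost all of the averaged r⁻⁶, so the apparent distance is far shorter than the 40 Å of the dominant extended state.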

Computational modelling

Statistical coil model and constrained subensemble selection

The so-called statistical coil model consists of an ensemble of structures in which the backbone dihedral angles sample amino acid-specific energy potentials based on their occurrence in the non-α-helical and non-β-sheet regions of high-resolution X-ray structures. An extremely efficient algorithm, flexible-Meccano, can be used to construct such a model. This approach has been shown to provide theoretical RDCs that compare well with experimental values in several cases. Deviations between predicted and experimental values indicate the presence of long-range contacts or residual structure. Furthermore, using the genetic algorithm Asteroids, developed in Blackledge's group, a subensemble of structures that fulfils the experimental data can be selected from the pool generated by flexible-Meccano. Residue-specific information about IDPs can be obtained from the selected subensembles using experimental observables such as RDCs, PREs, chemical shifts (CSs), and SAXS.
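The sample-and-select idea described above can be sketched in a few lines. The code below is a toy genetic algorithm written for illustration only; it is not flexible-Meccano or Asteroids, and it makes no claim about how those programs work internally. The pool of "conformers" is simply a list of random observable vectors, and all names, sizes, and rates are assumptions.

```python
import random

def chi2(subset, pool, target):
    """Squared deviation between subensemble-averaged and target observables."""
    n = len(subset)
    total = 0.0
    for j, t in enumerate(target):
        avg = sum(pool[i][j] for i in subset) / n
        total += (avg - t) ** 2
    return total

def select_subensemble(pool, target, size, generations=200, population=30, seed=0):
    """Toy genetic algorithm that evolves index sets of `size` conformers."""
    rng = random.Random(seed)
    indices = range(len(pool))
    candidates = [rng.sample(indices, size) for _ in range(population)]
    for _ in range(generations):
        candidates.sort(key=lambda s: chi2(s, pool, target))
        survivors = candidates[: population // 2]      # keep the fitter half
        children = []
        while len(survivors) + len(children) < population:
            a, b = rng.sample(survivors, 2)
            merged = list(set(a) | set(b))             # crossover: pool two parents
            child = rng.sample(merged, size)
            if rng.random() < 0.3:                     # mutation: swap in a new conformer
                replacement = rng.choice([i for i in indices if i not in child])
                child[rng.randrange(size)] = replacement
            children.append(child)
        candidates = survivors + children
    return min(candidates, key=lambda s: chi2(s, pool, target))

# Hypothetical pool: 200 conformers, each with 5 predicted observables
demo_rng = random.Random(1)
pool = [[demo_rng.uniform(-10, 10) for _ in range(5)] for _ in range(200)]
# Synthetic "experimental" data: the average over a hidden set of 20 conformers
hidden = demo_rng.sample(range(200), 20)
target = [sum(pool[i][j] for i in hidden) / 20 for j in range(5)]

best = select_subensemble(pool, target, size=20)
print("chi2 of selected subensemble:", chi2(best, pool, target))
```

Only the average over the selected conformers is compared with the data, so no individual structure is required to match the experiment on its own.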

Restrained molecular dynamics simulation

As an alternative to the sample-and-select approach, molecular dynamics (MD) simulation gives access to dynamic properties and the evolution of the energy. Because of limited computational power and the immaturity of force fields for unstructured proteins, restraint-free MD simulation is still challenging. Currently, simulation assisted by experimental observables is a more feasible approach. In restrained MD simulation, a pseudo-energy term is added to the total energy function of the simulated system to minimize the difference between calculated values and experimental data. In addition, because unstructured systems are heterogeneous, a single conformer is neither sufficient nor realistic for fulfilling all experimental restraints. Therefore, multiple replicas of the structure are simulated in parallel, and only the calculated values averaged over all replicas are required to match the experimental restraints.
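The replica-averaged restraint can be illustrated with a minimal sketch. A harmonic form and a force constant k are assumed here for simplicity; the text does not specify the functional form, and the function names and numbers are hypothetical.

```python
def restraint_energy(replica_values, experimental, k=1.0):
    """Harmonic pseudo-energy on the replica-averaged observable (assumed form)."""
    avg = sum(replica_values) / len(replica_values)
    return k * (avg - experimental) ** 2

def restraint_gradient(replica_values, experimental, k=1.0):
    """Derivative of the pseudo-energy with respect to each replica's value."""
    n = len(replica_values)
    avg = sum(replica_values) / n
    return [2.0 * k * (avg - experimental) / n] * n

experimental = 1.0

# Two replicas, neither matching the data alone, but matching on average
two_replicas = [-5.0, 7.0]
e_ensemble = restraint_energy(two_replicas, experimental)

# The same restraint applied to a single conformer
e_single = restraint_energy([-5.0], experimental)

print("ensemble-averaged restraint energy:", e_ensemble)
print("single-conformer restraint energy:", e_single)
```

With averaging, the penalty vanishes as long as the replicas jointly reproduce the measurement, whereas a single conformer is penalized for not matching it individually. The gradient is shared equally among the replicas, so each one feels only a fraction of the restoring force.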

Other biophysical techniques

Small angle X-ray scattering

SAXS has been used to characterize the shape of interacting proteins and the overall dimensions of unfolded peptide chains. Unlike crystallography, SAXS measures samples in solution, as NMR spectroscopy does. Accordingly, SAXS is widely applied as a complementary method in NMR studies. The National Synchrotron Radiation Research Center has a beamline (BL23A) specifically dedicated to SAXS studies, providing convenient access for SAXS measurements.

Spectroscopic techniques

Fluorescence and circular dichroism (CD) spectroscopy provide a rapid assay of protein disorder. Far-UV CD is also sensitive to the poly-proline II helix conformation often populated in IDPs. These techniques will be used as a preliminary check of the level of protein disorder.
