Advertisement

Successes and challenges in simulating the folding of large proteins

  • Anne Gershenson
    Correspondence
    To whom correspondence may be addressed: Dept. of Biochemistry and Molecular Biology, University of Massachusetts, Amherst, MA 01003. Tel.: 413-545-1250; Fax: 413-545-1289
    Affiliations
    Department of Biochemistry and Molecular Biology, University of Massachusetts, Amherst, Massachusetts 01003

    Molecular and Cellular Biology Graduate Program, University of Massachusetts, Amherst, Massachusetts 01003
    Search for articles by this author
  • Shachi Gosavi
    Correspondence
    To whom correspondence may be addressed: Simons Centre for the Study of Living Machines, National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bangalore-560065, India. Tel.: 91-80-2366-6106
    Affiliations
    Simons Centre for the Study of Living Machines, National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bangalore-560065, India
    Search for articles by this author
  • Pietro Faccioli
    Correspondence
    To whom correspondence may be addressed: Dipartimento di Fisica, Universitá degli Studi di Trento, 38122 Povo (Trento), Italy. Tel.: 39-0461281698
    Affiliations
    Dipartimento di Fisica, Universitá degli Studi di Trento, 38122 Povo (Trento), Italy

    Trento Institute for Fundamental Physics and Applications, 38123 Povo (Trento), Italy
    Search for articles by this author
  • Patrick L. Wintrode
    Correspondence
    To whom correspondence may be addressed: Dept. of Pharmaceutical Sciences, University of Maryland School of Pharmacy, Baltimore, Maryland 21201. Tel.: 410-706-6639; Fax: 410-706-0886
    Affiliations
    Department of Pharmaceutical Sciences, University of Maryland School of Pharmacy, Baltimore, Maryland 21201
    Search for articles by this author
Open AccessPublished:November 11, 2019DOI:https://doi.org/10.1074/jbc.REV119.006794
      Computational simulations of protein folding can be used to interpret experimental folding results, to design new folding experiments, and to test the effects of mutations and small molecules on folding. However, whereas major experimental and computational progress has been made in understanding how small proteins fold, research on larger, multidomain proteins, which comprise the majority of proteins, is less advanced. Specifically, large proteins often fold via long-lived partially folded intermediates, whose structures, potentially toxic oligomerization, and interactions with cellular chaperones remain poorly understood. Molecular dynamics based folding simulations that rely on knowledge of the native structure can provide critical, detailed information on folding free energy landscapes, intermediates, and pathways. Further, increases in computational power and methodological advances have made folding simulations of large proteins practical and valuable. Here, using serpins that inhibit proteases as an example, we review native-centric methods for simulating the folding of large proteins. These synergistic approaches range from Gō and related structure-based models that can predict the effects of the native structure on folding to all-atom-based methods that include side-chain chemistry and can predict how disease-associated mutations may impact folding. The application of these computational approaches to serpins and other large proteins highlights the successes and limitations of current computational methods and underscores how computational results can be used to inform experiments. These powerful simulation approaches in combination with experiments can provide unique insights into how large proteins fold and misfold, expanding our ability to predict and manipulate protein folding.

      Introduction

      To function, structured proteins need to reproducibly fold to a unique three-dimensional structure in a biologically reasonable timescale. The observation that proteins reliably fold despite having astronomical numbers of possible conformations has been the impetus behind decades of experimental and theoretical folding studies (
      • Onuchic J.N.
      • Luthey-Schulten Z.
      • Wolynes P.G.
      Theory of protein folding: the energy landscape perspective.
      ,
      • Dill K.A.
      • MacCallum J.L.
      The protein-folding problem, 50 years on.
      ,
      • Gruebele M.
      • Dave K.
      • Sukenik S.
      Globular protein folding in vitro in vivo.
      ). However, protein-folding pathways and folding intermediates are of interest not only in fundamental biophysics. Partially folded states expose surfaces that are normally buried. If these states are populated for extended periods of time, they may be recognized by elements of the cell's protein quality control machinery that can assist in folding or target proteins for degradation (
      • Hebert D.N.
      • Molinari M.
      In and out of the ER: protein folding, quality control, degradation, and related human diseases.
      ,
      • Hartl F.U.
      • Hayer-Hartl M.
      Converging concepts of protein folding in vitro in vivo.
      ,
      • Labbadia J.
      • Morimoto R.I.
      The biology of proteostasis in aging and disease.
      ,
      • Dubnikov T.
      • Ben-Gedalya T.
      • Cohen E.
      Protein quality control in health and disease.
      ). Further, in many protein-folding diseases, pathological mutant proteins populate partially folded nonnative conformations and these conformations may result in nonnative protein-protein associations or oligomerization. A detailed understanding of protein folding and misfolding pathways thus has the potential to aid in the development of therapeutic interventions that prevent misfolding or reduce the population of intermediates. Alternately, in diseased cells, drugs could be designed that promote misfolding and drive cells into apoptosis.
      A challenge to developing such a detailed understanding is posed by the transient nature of the intermediates. Even the most long-lived folding intermediates rarely persist beyond timescales of a few minutes at most. Experimentally, transient intermediate states have been detected and characterized using spectroscopic methods such as fluorescence and CD as well as by small-angle X-ray scattering and other scattering methods, and, although technical advances have allowed the detection of rare and excited states by NMR (
      • Jiang Y.
      • Kalodimos C.G.
      NMR studies of large proteins.
      ,
      • Sekhar A.
      • Kay L.E.
      An NMR view of protein dynamics in health and disease.
      ) and X-ray crystallography (
      • Hekstra D.R.
      • White K.I.
      • Socolich M.A.
      • Henning R.W.
      • Šrajer V.
      • Ranganathan R.
      Electric-field-stimulated protein mechanics.
      ), these molecularly detailed methods may not be widely available, or such experiments may not be sufficient to detect folding intermediates.
      By contrast, computer simulations provide a detailed picture of protein folding not easily accessible to experiment. Specifically, protein-folding simulations can provide valuable, detailed, testable data on how proteins fold and misfold and may be used to formulate hypotheses on how protein folding might be manipulated. In the last decade, special purpose supercomputers such as ANTON (
      • Piana S.
      • Klepeis J.L.
      • Shaw D.E.
      Assessing the accuracy of physical models used in protein-folding simulations: quantitative evidence from long molecular dynamics simulations.
      ) and massively distributed computing schemes such as [email protected] (
      • Pande V.S.
      Understanding protein folding using Markov state models.
      ) have made it possible to simulate the folding of small proteins in all-atom detail using realistic empirical force fields, without the aid of any biasing forces. Whereas these simulations have provided invaluable insights, even special purpose, high-performance computing platforms are limited to simulating the folding of smaller chains (currently ∼100 amino acids with folding times up to milliseconds). However, the median length of a protein is 532, 365, and 329 amino acids in eukaryotes, bacteria, and archaea, respectively (
      • Wang M.
      • Kurland C.G.
      • Caetano-Anollés G.
      Reductive evolution of proteomes and protein structures.
      ), and folding times range from microseconds to tens of minutes. Even with anticipated continuing increases in computing power, simulating the folding of these larger slower folding proteins using standard molecular dynamics (MD)
      The abbreviations used are: MD
      molecular dynamics
      AAT
      α1-antitrypsin
      SBM
      structure-based model
      BF
      bias functional
      SCPS
      self-consistent path sampling
      RCL
      reactive center loop
      ER
      endoplasmic reticulum
      AWSEM
      associative memory, water-mediated, structure and energy model
      DHFR
      dihydrofolate reductase
      DLD
      discontinuous loop domain
      ABD
      adenosine-binding domain
      WSME
      Wako-Saito-Muñoz-Eaton
      AKE
      adenylate kinase
      HDX
      hydrogen-deuterium exchange
      PDB
      Protein Data Bank
      FEP
      folding free energy profile(s).
      simulations will remain out of reach for the foreseeable future. In addition to the increased computational demands due to size, large slow folding proteins often fold through long-lived intermediates corresponding to deep local energy minima. For such proteins, even very long simulations are likely to simply observe the protein exploring limited conformational space within a single local minimum because transitions between minima are rare.
      To simulate such rare transitions, numerous methods for accelerating or enhancing sampling during MD simulations have been developed (
      • Gō N.
      Statistical mechanics of protein folding, unfolding and fluctuation.
      ,
      • Karanicolas J.
      • Brooks 3rd, C.L.
      Improved Gō-like models demonstrate the robustness of protein folding mechanisms towards non-native interactions.
      ,
      • Papoian G.A.
      • Wolynes P.G.
      AWSEM-MD: from neural networks to protein structure prediction and functional dynamics of complex biomolecular assemblies.
      ,
      • Bolhuis P.G.
      • Chandler D.
      • Dellago C.
      • Geissler P.L.
      Transition path sampling: throwing ropes over rough mountain passes, in the dark.
      ,
      • Barducci A.
      • Bonomi M.
      • Parrinello M.
      Metadynamics.
      ,
      • Bello-Rivas J.M.
      • Elber R.
      Simulations of thermodynamics and kinetics on rough energy landscapes with milestoning.
      ,
      • Elber R.
      A new paradigm for atomically detailed simulations of kinetics in biophysical systems.
      ,
      • Cabriolu R.
      • Skjelbred Refsnes K.M.
      • Bolhuis P.G.
      • van Erp T.S.
      Foundations and latest advances in replica exchange transition interface sampling.
      ,
      • Husic B.E.
      • Pande V.S.
      Markov state models: From an art to a science.
      ,
      • Orioli S.
      • a Beccara S.
      • Faccioli P.
      Self-consistent calculation of protein folding pathways.
      ). One particularly effective approach to efficiently simulate the folding of large proteins is to take advantage of the evidence that protein folding is to a large extent encoded by the contacts in the native, folded protein. Computational schemes in which the force field explicitly includes a biasing term favoring native contacts allow the simulation of protein folding with many orders of magnitude less computational effort than is required with conventional MD methods. These studies, including folding simulations of large proteins such as adenylate kinase (
      • Li W.
      • Terakawa T.
      • Wang W.
      • Takada S.
      Energy landscape and multiroute folding of topologically complex proteins adenylate kinase and 2ouf-knot.
      ,
      • Giri Rao V.V.H.
      • Gosavi S.
      In the multi-domain protein adenylate kinase, domain insertion facilitates cooperative folding while accommodating function at domain interfaces.
      ), GFP (
      • Reddy G.
      • Liu Z.
      • Thirumalai D.
      Denaturant-dependent folding of GFP.
      ), TIM barrels (
      • Halloran K.T.
      • Wang Y.
      • Arora K.
      • Chakravarthy S.
      • Irving T.C.
      • Bilsel O.
      • Brooks 3rd, C.L.
      • Matthews C.R.
      Frustration and folding of a TIM barrel protein.
      ), dihydrofolate reductase (
      • Inanami T.
      • Terada T.P.
      • Sasai M.
      Folding pathway of a multidomain protein depends on its topology of domain connectivity.
      ), a DNA polymerase (
      • Wang Y.
      • Chu X.
      • Suo Z.
      • Wang E.
      • Wang J.
      Multidomain protein solves the folding problem by multifunnel combined landscape: theoretical investigation of a Y-family DNA polymerase.
      ), and serpins (
      • Giri Rao V.V.H.
      • Gosavi S.
      On the folding of a structurally complex protein to its metastable active state.
      ,
      • Wang F.
      • Orioli S.
      • Ianeselli A.
      • Spagnolli G.
      • a Beccara S.
      • Gershenson A.
      • Faccioli P.
      • Wintrode P.L.
      All-atom simulations reveal how single-point mutations promote serpin misfolding.
      ), have been successful in generating folding pathways and intermediates that agree with experimental results and provide testable hypotheses on what intermediate states are likely to be populated during folding.
      This review focuses on native-centric simulation methods that are applicable to the folding of large and slow-folding proteins. We consider two categories of techniques, both of which rely on the native protein contact map but have different levels of spatial resolution and chemical detail: (i) Gō and related structure-based models (SBMs) that provide knowledge on the effects of structure on folding and (ii) all-atom-based methods that take into account the effects of side-chain chemistry on folding and can therefore predict how mutations may affect folding.
      As a prototypical case study, we discuss how both classes of methods have been used to simulate the folding of the canonical inhibitory serpin α1-antitrypsin (AAT) a 394-amino acid protein with folding times as long as tens of minutes (
      • Kim D.
      • Yu M.H.
      Folding pathway of human α1-antitrypsin: characterization of an intermediate that is active but prone to aggregation.
      ,
      • Tsutsui Y.
      • Dela Cruz R.
      • Wintrode P.L.
      Folding mechanism of the metastable serpin α1-antitrypsin.
      ,
      • Stocks B.B.
      • Sarkar A.
      • Wintrode P.L.
      • Konermann L.
      Early hydrophobic collapse of α1-antitrypsin facilitates formation of a metastable state: insights from oxidative labeling and mass spectrometry.
      ). The interest in this specific protein resides in the fact that it has a topologically complex native structure, and the functional conformation is a kinetically trapped state, not the lowest free energy conformation (
      • Honeycutt J.D.
      • Thirumalai D.
      Metastability of the folded states of globular proteins.
      ,
      • Mottonen J.
      • Strand A.
      • Symersky J.
      • Sweet R.M.
      • Danley D.E.
      • Geoghegan K.F.
      • Gerard R.D.
      • Goldsmith E.J.
      Structural basis of latency in plasminogen activator inhibitor-1.
      ,
      • Lawrence D.A.
      • Olson S.T.
      • Palaniappan S.
      • Ginsburg D.
      Engineering plasminogen activator inhibitor 1 mutants with increased functional stability.
      ,
      • Auer S.
      • Miller M.A.
      • Krivov S.V.
      • Dobson C.M.
      • Karplus M.
      • Vendruscolo M.
      Importance of metastable states in the free energy landscapes of polypeptide chains.
      ,
      • Im H.
      • Woo M.-S.
      • Hwang K.Y.
      • Yu M.-H.
      Interactions causing the kinetic trap in serpin protein folding.
      ). Furthermore, specific point mutations are known to enhance its misfolding propensity, giving rise to misfolding diseases (
      • Stein P.E.
      • Carrell R.W.
      What do dysfunctional serpins tell us about molecular mobility and disease?.
      ,
      • Greene C.M.
      • Marciniak S.J.
      • Teckman J.
      • Ferrarotti I.
      • Brantly M.L.
      • Lomas D.A.
      • Stoller J.K.
      • McElvaney N.G.
      α1-Antitrypsin deficiency.
      ). We show how the two theoretical methods discussed above provide complementary results and how these results may be used to inform experiments, interpret experimental results, and generate hypotheses on how folding proteins may interact with the protein quality control machinery.

      The importance of native interactions and the native-centric approach to protein folding

      A physical basis and justification for native-centric approaches to modeling protein folding was provided by the energy landscape picture of protein folding and the principle of minimal frustration developed in the 1980s and 1990s (
      • Onuchic J.N.
      • Luthey-Schulten Z.
      • Wolynes P.G.
      Theory of protein folding: the energy landscape perspective.
      ). The effective potential energy (averaged over solvent degrees of freedom) as a function of chain conformation defines a protein's energy landscape (Fig. 1). This multidimensional energy landscape results from multiple driving forces and constraints, including the drive to bury hydrophobic residues, to satisfy hydrogen bond donors and acceptors, and to solvate or pair charged residues and the constraints of chain connectivity and steric clashes. According to arguments adapted from the physics of disordered systems, random amino acid sequences will be characterized by irreconcilable conflicts between these multiple driving forces and constraints, termed energetic frustration, resulting in many unrelated structures of similar energy (
      • Bryngelson J.D.
      • Wolynes P.G.
      Spin glasses and the statistical mechanics of protein folding.
      ,
      • Shakhnovich E.I.
      • Gutin A.M.
      Formation of unique structure in polypeptide chains: theoretical investigation with the aid of a replica approach.
      ).
      Figure thumbnail gr1
      Figure 1Cartoons showing funneled energy landscapes of protein folding. A, energy landscape of a realistic protein in which the funneled landscape is rugged and contains local minima and barriers that can lead to long-lived intermediate states. B, idealized perfectly smooth energy landscape of a much less frustrated protein. This type of smooth landscape is encoded in the simplest Gō models. Images were obtained from http://dillgroup.org/#/landscapes6 and used with permission under Creative Commons BY 4.0 license.
      The principle of minimal frustration states that, unlike random sequences, the sequences of naturally occurring proteins have been selected by evolution to minimize energetic conflicts between interactions in the native conformation (
      • Onuchic J.N.
      • Luthey-Schulten Z.
      • Wolynes P.G.
      Theory of protein folding: the energy landscape perspective.
      ). As a result, compared with alternative structures, the native structure is a better-optimized solution to the problem of satisfying the large number of competing interactions. The energy landscape of such a minimally frustrated protein resembles a high-dimensional funnel with the native state at the bottom. Although local minima and barriers still exist (the funnel is “rugged”), the global energetic bias toward the native structure promotes efficient folding (Fig. 1).
      A funneled energy landscape implies that the potential energy of the system should decrease with increasing numbers of native contacts, suggesting that the number of native contacts formed should serve as a good reaction coordinate for folding. Early support for a native-centric picture of protein folding was provided by simulations of simplified lattice proteins (
      • Gō N.
      Statistical mechanics of protein folding, unfolding and fluctuation.
      ,
      • Taketomi H.
      • Ueda Y.
      • Gō N.
      Studies on protein folding, unfolding and fluctuations by computer simulation. I. The effect of specific amino acid sequence represented by specific inter-unit interactions.
      ,
      • Camacho C.J.
      • Thirumalai D.
      Modeling the role of disulfide bonds in protein folding: entropic barriers and pathways.
      ,
      • Klimov D.K.
      • Thirumalai D.
      Multiple protein folding nuclei and the transition state ensemble in two-state proteins.
      ,
      • Gin B.C.
      • Garrahan J.P.
      • Geissler P.L.
      The limited role of nonnative contacts in the folding pathways of a lattice protein.
      ). More recently, Best et al. (
      • Best R.B.
      • Hummer G.
      • Eaton W.A.
      Native contacts determine protein folding mechanisms in atomistic simulations.
      ) analyzed ultralong unbiased all-atom simulations of several small (<100-aa) fast-folding proteins and found that during transitions between unfolded and folded states, native contacts persisted significantly longer than nonnative contacts.

      Gō models encode native structure and can be used to understand protein-folding dynamics

      The funneled shape of a protein's energy landscape implies that there is a minimal chance that the polypeptide chain remains trapped in local energy minima (
      • Dill K.A.
      • MacCallum J.L.
      The protein-folding problem, 50 years on.
      ,
      • Bryngelson J.D.
      • Onuchic J.N.
      • Socci N.D.
      • Wolynes P.G.
      Funnels, pathways, and the energy landscape of protein folding: a synthesis.
      ,
      • Wolynes P.G.
      • Onuchic J.N.
      • Thirumalai D.
      Navigating the folding routes.
      ). In the ideal funnel limit, proteins can smoothly flow from unfolded states at the top of the funnel to the folded ensemble. On the other hand, this process is associated with a large configurational entropy reduction, which can give rise to a significant free energy barrier. As a result, even in the minimally rugged funnels, protein folding remains a rare, thermally activated reaction.
      Gō models of proteins further simplify the nature of the folding landscape by encoding the native structure of the protein in the potential energy and, for the most part, ignoring all nonnative interactions (
      • Gō N.
      Statistical mechanics of protein folding, unfolding and fluctuation.
      ,
      • Taketomi H.
      • Ueda Y.
      • Gō N.
      Studies on protein folding, unfolding and fluctuations by computer simulation. I. The effect of specific amino acid sequence represented by specific inter-unit interactions.
      ,
      • Hills Jr., R.D.
      • Brooks 3rd, C.L.
      Insights from coarse-grained Gō models for protein folding and dynamics.
      ,
      • Noel J.K.
      • Onuchic J.N.
      The many faces of structure-based potentials: from protein folding landscapes to structural characterization of complex biomolecules.
      ,
      • Hyeon C.
      • Thirumalai D.
      Capturing the essence of folding and functions of biomolecules using coarse-grained models.
      ). This reduces the complexity of the calculations and makes the rare folding events computationally accessible. The term structure-based models (SBMs) has been used to refer to Gō-type models, which, in addition to native structure-derived terms, may contain knowledge-based terms that are derived from sequence information (
      • Karanicolas J.
      • Brooks 3rd, C.L.
      Improved Gō-like models demonstrate the robustness of protein folding mechanisms towards non-native interactions.
      ,
      • Azia A.
      • Levy Y.
      Nonnative electrostatic interactions can modulate protein folding: molecular dynamics with a grain of salt.
      ), nonnative interactions (
      • Yadahalli S.
      • Hemanth Giri Rao V.V.
      • Gosavi S.
      Modeling non-native interactions in designed proteins.
      ), information from additional native structures (
      • Whitford P.C.
      • Miyashita O.
      • Levy Y.
      • Onuchic J.N.
      Conformational transitions of adenylate kinase: switching by cracking.
      ), etc. However, because this review deals mostly with models that include few non-structure-derived interactions, we use the terms Gō models and SBMs interchangeably. Gō models have commonly been simulated using either Monte Carlo or MD methods and are computationally relatively inexpensive. A further reduction in complexity can be achieved through coarse-graining to include either a single Cα bead or a few beads per residue (Fig. 2). Thus, Gō models are simple protein models that enable extensive sampling of potential energy landscapes. Additionally, because so few beads are involved, the data are easier to interpret. Further, standard enhanced sampling methods, such as replica exchange, used in MD simulations can be used when folding proteins with large barriers (
      • Okamoto Y.
      Generalized-ensemble algorithms: enhanced sampling techniques for Monte Carlo and molecular dynamics simulations.
      ,
      • Bernardi R.C.
      • Melo M.C.R.
      • Schulten K.
      Enhanced sampling techniques in molecular dynamics simulations of biological systems.
      ). Gō-model simulations have been successfully used to understand both the structural (folding mechanisms, intermediate populations, etc.) and the kinetic features (barrier heights, rates of different events, etc.) of protein folding (
      • Gō N.
      Statistical mechanics of protein folding, unfolding and fluctuation.
      ,
      • Hills Jr., R.D.
      • Brooks 3rd, C.L.
      Insights from coarse-grained Gō models for protein folding and dynamics.
      ,
      • Noel J.K.
      • Onuchic J.N.
      The many faces of structure-based potentials: from protein folding landscapes to structural characterization of complex biomolecules.
      ,
      • Hyeon C.
      • Thirumalai D.
      Capturing the essence of folding and functions of biomolecules using coarse-grained models.
      ).
      Figure thumbnail gr2
      Figure 2Gō (SBM) model schematic. Coarse graining sets the chain connectivity while encoding the native structure. Two types of constraints are encoded in these models: (i) local along the polypeptide chain constraints consisting of bond constraints between two consecutive beads, angular constraints between three consecutive beads, and dihedral potentials between four consecutive beads; (ii) longer-distance contact interactions, which are attractive when two beads are within the contact distance in the native structure and are otherwise repulsive, accounting for the excluded volume of the beads. In some implementations, nonnative attractive interactions replace these repulsive interactions.

      Encoding structure in Gō models

      Gō models encode protein structure through two features: (i) the chain connectivity of the protein and (ii) contact interactions present in the native state (Fig. 2). Chain connectivity is encoded by having strong bonds (that cannot break during the folding simulation) between those beads that represent atoms or groups of atoms that are connected by chemical bonds in the protein. Additional local (along the backbone) interactions can include strong angular constraints between three consecutive beads (connected by bonds) and dihedral interactions between four consecutive beads. Strong dihedral interactions may be used to preserve chain chirality or other structural constraints, such as the planarity of rings. Weaker structure-derived or statistical dihedral interactions serve as a proxy for secondary structural propensity. These interactions are weak enough to break and form on the timescale of the simulation. Contact interactions are defined between two beads that do not interact through any local interactions and that are close in space in the folded structure of the protein. Two beads are considered to be close in space if either they or, alternately, the atoms that they represent are within a cutoff distance of each other. Other more complex ways of defining contact interactions exist and sometimes work better in folding simulations than the cutoff-based contact map (
      • Noel J.K.
      • Whitford P.C.
      • Onuchic J.N.
      The shadow map: a general contact definition for capturing the dynamics of biomolecular folding and function.
      ). Contacts may also all have the same (homogeneous) strengths or be assigned contact strengths based on the atoms or the residues that are in contact. Finally, an excluded volume interaction is present between those beads that are not in contact in the native state. This ensures that beads do not pass through each other and the chain does not cross. Different flavors of Gō models have been tested on a few proteins, and for the most part, as long as little to no frustration in the form of nonnative interactions is encoded, diverse features of protein folding calculated from the different models are both similar to each other and to experimental results (
      • Karanicolas J.
      • Brooks 3rd, C.L.
      Improved Gō-like models demonstrate the robustness of protein folding mechanisms towards non-native interactions.
      ,
      • Chavez L.L.
      • Onuchic J.N.
      • Clementi C.
      Quantifying the roughness on the free energy landscape: entropic bottlenecks and protein folding rates.
      ,
      • Cho S.S.
      • Levy Y.
      • Wolynes P.G.
      Quantitative criteria for native energetic heterogeneity influences in the prediction of protein folding kinetics.
      ). However, functional regions of proteins can be either energetically or topologically frustrated, and in such cases, different unfrustrated Gō models can give different folding features (
      • Terse V.L.
      • Gosavi S.
      The sensitivity of computational protein folding to contact map perturbations: the case of ubiquitin folding and function.
      ,
      • Yadahalli S.
      • Gosavi S.
      Functionally relevant specific packing can determine protein folding routes.
      ), and energetic frustration in the form of nonnative interactions (Fig. 2) needs to be explicitly included in the model (
      • Azia A.
      • Levy Y.
      Nonnative electrostatic interactions can modulate protein folding: molecular dynamics with a grain of salt.
      ,
      • Han J.-H.
      • Batey S.
      • Nickson A.A.
      • Teichmann S.A.
      • Clarke J.
      The folding and evolution of multidomain proteins.
      ,
      • a Beccara S.
      • Škrbić T.
      • Covino R.
      • Micheletti C.
      • Faccioli P.
      Folding pathways of a knotted protein with a realistic atomistic force field.
      ,
      • Chen T.
      • Chan H.S.
      Native contact density and nonnative hydrophobic effects in the folding of bacterial immunity proteins.
      ).
      A key advantage of using a Gō model is that it can be easily modified to include specific native or nonnative terms (increasing the complexity of the model) when data from the simplest model do not agree with a specific experimental observable, such as the population of an intermediate. When the addition of an extra term to the potential energy function leads to an agreement with experiments, then it can be concluded that that term leads to that specific feature. In other words, in some cases, extra terms can lead to a better understanding of the physical basis of experimental findings.

      Encoding frustration in Gō models

      As stated earlier, the funneled energy landscape of structured proteins is a result of selection for protein sequences in which interactions present in the folded state are much more stabilizing than nonnative interactions, thus reducing frustration (
      • Dill K.A.
      • MacCallum J.L.
      The protein-folding problem, 50 years on.
      ,
      • Bryngelson J.D.
      • Onuchic J.N.
      • Socci N.D.
      • Wolynes P.G.
      Funnels, pathways, and the energy landscape of protein folding: a synthesis.
      ,
      • Wolynes P.G.
      • Onuchic J.N.
      • Thirumalai D.
      Navigating the folding routes.
      ). However, residues that perform function in the folded state need to be conserved and cannot always be chosen to reduce energetic frustration. The interactions of functional residues (e.g. amino acids in the active site or at protein-protein interfaces) with residues that have been selected to reduce frustration and promote folding can lead to trapping during folding (
      • Giri Rao V.V.H.
      • Gosavi S.
      Using the folding landscapes of proteins to understand protein function.
      ,
      • Ferreiro D.U.
      • Komives E.A.
      • Wolynes P.G.
      Frustration in biomolecules.
      ). Such functional frustration can show up as an increase in complexity of the folded state due to the presence of additional functional secondary structural elements in the native state. This can lead to larger barriers to folding, backtracking during folding, etc. An alternate signature of functional frustration is a loss of contacts because functional amino acids are chosen to stabilize interprotein interactions rather than intraprotein interactions. This can lead to lower barriers to folding, stalling of folding and the population of intermediates, etc. Such effects of function, which induce localized changes in protein structure, are detectable even in Gō models that do not encode frustration (
      • Giri Rao V.V.H.
      • Gosavi S.
      Using the folding landscapes of proteins to understand protein function.
      ,
      • Ferreiro D.U.
      • Komives E.A.
      • Wolynes P.G.
      Frustration in biomolecules.
      ).
      However, for some proteins, nonnative interactions need to be explicitly encoded in the Gō model for key folding features such as the population of a folding intermediate to agree with experiments (
      • Azia A.
      • Levy Y.
      Nonnative electrostatic interactions can modulate protein folding: molecular dynamics with a grain of salt.
      ,
      • Han J.-H.
      • Batey S.
      • Nickson A.A.
      • Teichmann S.A.
      • Clarke J.
      The folding and evolution of multidomain proteins.
      ,
      • a Beccara S.
      • Škrbić T.
      • Covino R.
      • Micheletti C.
      • Faccioli P.
      Folding pathways of a knotted protein with a realistic atomistic force field.
      ,
      • Chen T.
      • Chan H.S.
      Native contact density and nonnative hydrophobic effects in the folding of bacterial immunity proteins.
      ). Nonspecific nonnative interactions can be added to purely structure-based Gō models by introducing an attractive interaction at a chosen interaction distance between all of those pairs of beads that are not in native contact (Fig. 2). Alternately, nonnative interactions may be added between selected groups of amino acids, such as the hydrophobic amino acids or the charged amino acids. Such nonspecific nonnative interactions, their forms, and their utility have been reviewed in detail elsewhere (
      • Yadahalli S.
      • Hemanth Giri Rao V.V.
      • Gosavi S.
      Modeling non-native interactions in designed proteins.
      ,
      • Chen T.
      • Song J.
      • Chan H.S.
      Theoretical perspectives on nonnative interactions and intrinsic disorder in protein folding and binding.
      ).
      Several proteins undergo conformational transitions, converting from one structural ensemble to a distinct structural ensemble upon ligand binding or chemical modification. In such cases, interactions that are “nonnative” in the unbound (or unmodified) structural ensemble are formed in the bound (or modified) ensemble. Such specific nonnative interactions can be appended to the Gō model of the unbound structure (
      • Hills Jr., R.D.
      • Brooks 3rd, C.L.
      Insights from coarse-grained Gō models for protein folding and dynamics.
      ,
      • Whitford P.C.
      • Miyashita O.
      • Levy Y.
      • Onuchic J.N.
      Conformational transitions of adenylate kinase: switching by cracking.
      ,
      • Whitford P.C.
      • Sanbonmatsu K.Y.
      • Onuchic J.N.
      Biomolecular dynamics: order-disorder transitions and energy landscapes.
      ). This class of models, termed dual-structure–based models, has often been used to understand conformational transitions of proteins but has rarely been used to understand folding. However, when a single large conformational transition dominates the function of a protein, which is the case for serpins, it is likely that simulations using such dual-structure–based models will be a computationally inexpensive way to capture the functional frustration present in the folding energy landscape.

      All-atom-enhanced sampling based on a native-centric biasing force

      An alternative strategy for simulating rare macromolecular transitions consists of retaining full all-atom resolution with chemically motivated realistic forces, but resorting to more sophisticated algorithms, possibly combined with additional approximations, to lower the cost of characterizing rare transitions. Although computationally more expensive, such methods have the advantage of accounting for the chemistry of the side chains and can be used to investigate the effects of mutations on folding. Many such enhanced path-sampling techniques have been developed during the last 2 decades (for a recent review, see, for example, Ref.
      • Elber R.
      A new paradigm for atomically detailed simulations of kinetics in biophysical systems.
      ). Some of these methods are based on reconstructing the reaction kinetics from a statistical analysis of many short MD trajectories (an incomplete list includes transition interface sampling (
      • Bolhuis P.G.
      • Chandler D.
      • Dellago C.
      • Geissler P.L.
      Transition path sampling: throwing ropes over rough mountain passes, in the dark.
      ,
      • Cabriolu R.
      • Skjelbred Refsnes K.M.
      • Bolhuis P.G.
      • van Erp T.S.
      Foundations and latest advances in replica exchange transition interface sampling.
      ), Markov state models (
      • Husic B.E.
      • Pande V.S.
      Markov state models: From an art to a science.
      ), and milestoning (
      • Bello-Rivas J.M.
      • Elber R.
      Simulations of thermodynamics and kinetics on rough energy landscapes with milestoning.
      )). This way, it is in principle possible to obtain predictions statistically consistent with plain MD simulations while massively distributing the computational load. In practice, however, these schemes still require huge computational resources and cannot be applied to slow, complex reactions, such as the folding or conformational changes of large proteins, which can occur on time scales of minutes to hours.
      A computationally efficient way to further lower the computational cost of atomistic simulations consists of introducing biasing forces to promote escape from metastable states (e.g. see Refs.
      • Barducci A.
      • Bonomi M.
      • Parrinello M.
      Metadynamics.
      and
      • Paci E.
      • Karplus M.
      Forced unfolding of fibronectin type 3 modules: an analysis by biased molecular dynamics simulations.
      ,
      • Camilloni C.
      • Broglia R.A.
      • Tiana G.
      Hierarchy of folding and unfolding events of protein G, CI2, and ACBP from explicit-solvent simulations.
      ,
      • a Beccara S.
      • Škrbić T.
      • Covino R.
      • Faccioli P.
      Dominant folding pathways of a WW domain.
      ,
      • a Beccara S.
      • Fant L.
      • Faccioli P.
      Variational scheme to compute protein reaction pathways using atomistic force fields with explicit solvent.
      ). One particularly useful biasing force is implemented in ratchet-and-pawl MD, in which a history-dependent force is only applied to prevent the system from backtracking (
      • Paci E.
      • Karplus M.
      Forced unfolding of fibronectin type 3 modules: an analysis by biased molecular dynamics simulations.
      ,
      • Camilloni C.
      • Broglia R.A.
      • Tiana G.
      Hierarchy of folding and unfolding events of protein G, CI2, and ACBP from explicit-solvent simulations.
      ). In native-centric simulations, the collective coordinate of interest for backtracking is the contact map distance from the native state. In this implementation of ratchet-and-pawl MD, no biasing force is present as long as the protein is not moving away from the native state (e.g. as long as the total number of native contacts formed remains constant or increases or no nonnative contacts are formed). When the biasing force is off, the simulation is identical to a conventional MD simulation, and the motion of the system is determined entirely by the physical forces between atoms. When the system evolves in such a way that the total number of native contacts decreases or nonnative contacts are formed, then a biasing force is applied to discourage (but not absolutely prevent) the move from occurring. Unlike Gō type models, the energy landscape is not a smooth ideal funnel, and local minima and barriers still exist. However, the ratchet force facilitates escape from local minima and therefore allows for computationally efficient simulations of folding (Fig. 3).
      Figure thumbnail gr3
      Figure 3The BF method for simulating protein folding using all-atom force fields and ratchet-and-pawl MD. A, schematic view of multiple folding trajectories from the BF approach to transition path sampling. In the BF approach, the force field, V(R), is a conventional, all-atom force field (e.g. Amber or CHARMM) plus the history-dependent ratchet-and-pawl bias allowing for the efficient production of multiple, trial folding trajectories (lines in the funnel). The ratchet-and-pawl (right) bias limits backtracking, and the least biased trajectories (red) are selected for analysis. (The ratchet figure is from Antoni Espinosa (commons.wikimedia.org/wiki/File:Trinquete.png).6 The funnel is from the Oas laboratory at Duke University (https://oaslab.com/Drawing_funnels.html).6 B, schematic explanation of the steps involved in implementing the BF method for folding simulations adapted from Wang et al. (
      • Wang F.
      • Orioli S.
      • Ianeselli A.
      • Spagnolli G.
      • a Beccara S.
      • Gershenson A.
      • Faccioli P.
      • Wintrode P.L.
      All-atom simulations reveal how single-point mutations promote serpin misfolding.
      ). In the initial step, the protein of interest is unfolded using high-temperature MD simulations. BF folding simulations are then performed using the force field defined above, and multiple trial folding trajectories are generated. Note that folding is not always successful, and some protein molecules fail to fold completely or misfold, and these results may be particularly pronounced for mutant proteins. Folding and misfolding trajectories with minimum biasing (yellow lines) are identified and analyzed.
      One of the first challenges in introducing a bias is that if the reaction coordinate chosen for biasing is suboptimal, uncontrolled systematic errors may occur. A number of approaches have been proposed to keep these errors to a minimum. In particular, the bias functional (BF) method (
      • a Beccara S.
      • Fant L.
      • Faccioli P.
      Variational scheme to compute protein reaction pathways using atomistic force fields with explicit solvent.
      ) relies on generating a large number of trial transition pathways using ratchet and pawl MD and subsequently scoring these trial pathways according to a specific penalty function. It can be shown that the paths corresponding to the lowest value of this penalty function are the most realistic, in the sense that they have the largest probability to occur in the absence of the biasing force.
      Such post-processing of ensembles of possible transition pathways has the advantage of keeping the systematic error introduced by the biasing force to a minimum while enabling extremely slow and complex reactions to be simulated on typical computer clusters available to most computational biophysics/biochemistry laboratories.
      The BF approach has been benchmarked against the results of plain MD simulations for the folding of small proteins (
      • a Beccara S.
      • Fant L.
      • Faccioli P.
      Variational scheme to compute protein reaction pathways using atomistic force fields with explicit solvent.
      ) and directly against experimental data for folding kinetics (
      • Wang F.
      • Cazzolli G.
      • Wintrode P.
      • Faccioli P.
      Folding mechanism of proteins Im7 and Im9: insight from all-atom simulations in implicit and explicit solvent.
      ,
      • Ianeselli A.
      • Orioli S.
      • Spagnolli G.
      • Faccioli P.
      • Cupellini L.
      • Jurinovich S.
      • Mennucci B.
      Atomic detail of protein folding revealed by an ab initio reappraisal of circular dichroism.
      ). It has since been applied to simulate very slow folding reactions of large proteins, including serpins (
      • Wang F.
      • Orioli S.
      • Ianeselli A.
      • Spagnolli G.
      • a Beccara S.
      • Gershenson A.
      • Faccioli P.
      • Wintrode P.L.
      All-atom simulations reveal how single-point mutations promote serpin misfolding.
      ) and even proteins with knotted native structures (
      • a Beccara S.
      • Škrbić T.
      • Covino R.
      • Micheletti C.
      • Faccioli P.
      Folding pathways of a knotted protein with a realistic atomistic force field.
      ). These processes are far too slow to be simulated by plain MD, even by massively distributed computing or by resorting to the largest special-purpose supercomputing facility. The BF approach successfully predicted differences in the folding kinetics of two structurally homologous proteins by showing that one of these proteins had to overcome an additional free-energy barrier to reach the native state (
      • Wang F.
      • Cazzolli G.
      • Wintrode P.
      • Faccioli P.
      Folding mechanism of proteins Im7 and Im9: insight from all-atom simulations in implicit and explicit solvent.
      ). In this case, the chemical information present in the atomistic description of the amino acids was required to distinguish between the folding pathways of the structurally homologous proteins.
      Despite these promising results, it should be emphasized that if the collective coordinate (e.g. the contact map) adopted in the definition of the biasing force is not a good reaction coordinate, then the minimum-bias paths identified by the BF scheme will still be plagued by large systematic errors. In principle, these errors could affect the reliability of the BF calculations, in particular when they are applied to study the folding of large chains with complex folding mechanisms, where a biasing coordinate based on the distance from the native contact map may not be a good reaction coordinate.
      To tackle this problem, an important improvement of the BF method, called self-consistent path sampling (SCPS) has been recently introduced (
      • Orioli S.
      • a Beccara S.
      • Faccioli P.
      Self-consistent calculation of protein folding pathways.
      ). In this scheme, the biasing collective coordinate is not chosen a priori. Instead, it is calculated self-consistently, through an iterative procedure, starting from an initial guess. This way, at convergence, the dynamics is accelerated by a bias that acts along a direction set by a realistic reaction coordinate (technically, a parametrization of the so-called committor function). A number of numerical tests on toy systems designed to emphasize the problems of the BF method have shown that SCPS can lead to significant improvement. The computational cost of SCPS simulations is, however, about 1 order of magnitude larger than that of BF simulations. Although SCPS simulations are significantly more computationally challenging, they are still feasible using existing supercomputing facilities or clusters based on a hybrid GPU-CPU computing platform.
      Finally, it should be emphasized that unlike structure-based methods discussed above, BF and SCPS are intrinsically nonequilibrium methods. As a result, the statistical methods that are commonly adopted to estimate equilibrium properties (such as free energy barriers) from molecular simulations cannot be applied. Thus, most of the applications of BF and SCPS made to date are based on semiquantitative analyses. For example, studies have been performed to estimate the relative change in free-energy generated by specific point mutations. Recently, more elaborate statistical methods have been proposed for recovering equilibrium distributions from SCPS and BF nonequilibrium simulations. For example, a recently proposed scheme makes it possible to sample the Boltzmann distribution in the transition region by means of specific ratchet-and-pawl simulations (
      • Bartolucci G.
      • Orioli S.
      • Faccioli P.
      Transition path theory from biased simulations.
      ). To date, however, these techniques have only been validated against MD simulations for simple systems, and further validation is required to assess their accuracy for realistic and biologically relevant systems.
      Both Gō and BF computational methods have been used to simulate the folding of the human serpin AAT (
      • Giri Rao V.V.H.
      • Gosavi S.
      On the folding of a structurally complex protein to its metastable active state.
      ,
      • Wang F.
      • Orioli S.
      • Ianeselli A.
      • Spagnolli G.
      • a Beccara S.
      • Gershenson A.
      • Faccioli P.
      • Wintrode P.L.
      All-atom simulations reveal how single-point mutations promote serpin misfolding.
      ). We therefore next describe the structural properties of AAT and highlight how these two approaches have led to complementary insights into the complex folding mechanisms of this inhibitory serpin.

      Inhibitory serpins: A prototype for folding large, topologically complex proteins

      Inhibitory serpins are the most common inhibitors of serine and cysteine proteases and are found in all kingdoms of life (
      • Lucas A.
      • Yaron J.R.
      • Zhang L.
      • Ambadapadi S.
      Overview of serpins and their roles in biological systems.
      ). Large conformational changes of these two-domain, ∼400-amino-acid-long, topologically complex proteins are required for both regulation and function (Fig. 4). The ubiquity of this multidomain protein family, the functionally required metastability of the active structure, and human diseases associated with serpin misfolding motivated both Gō- and BF-based folding simulations of AAT, the canonical human inhibitory serpin (
      • Giri Rao V.V.H.
      • Gosavi S.
      On the folding of a structurally complex protein to its metastable active state.
      ,
      • Wang F.
      • Orioli S.
      • Ianeselli A.
      • Spagnolli G.
      • a Beccara S.
      • Gershenson A.
      • Faccioli P.
      • Wintrode P.L.
      All-atom simulations reveal how single-point mutations promote serpin misfolding.
      ).
      Figure thumbnail gr4
      Figure 4Inhibitory serpin structure and function. Shown is an active, metastable AAT structure (PDB entry 1QLP) (
      • Elliott P.R.
      • Pei X.Y.
      • Dafforn T.R.
      • Lomas D.A.
      Topography of a 2.0 Å structure of α1-antitrypsin reveals targets for rational drug design to prevent conformational disease.
      ) with a solvent-accessible RCL (purple). The structure is colored from blue to pink from the N to the C terminus. The α/β domain (CATH domain 2) is shown in blue (residues 23–190) and yellow (residues 290–340), whereas the mainly β domain (CATH domain 1), which includes the solvent-exposed RCL (residues 341–361), is shown in green (residues 191–290), purple (RCL), and pink (residues 362–394). Spontaneous insertion of the RCL into sheet A remodels the domains, adding the RCL to the α/β domain, resulting in the lower-free-energy inactive latent state (PDB entry 1IZ2 (
      • Im H.
      • Woo M.-S.
      • Hwang K.Y.
      • Yu M.-H.
      Interactions causing the kinetic trap in serpin protein folding.
      )), and the latency transition is important for regulating the activity of some serpins (
      • Mottonen J.
      • Strand A.
      • Symersky J.
      • Sweet R.M.
      • Danley D.E.
      • Geoghegan K.F.
      • Gerard R.D.
      • Goldsmith E.J.
      Structural basis of latency in plasminogen activator inhibitor-1.
      ). Cleavage of the RCL by target serine and cysteine proteases results in the formation of an acyl enzyme bond between the protease active site and the RCL, cleavage of the RCL, and insertion of the cleaved RCL into sheet A translocating the covalently attached protease 70 Å from one pole of the serpin to the other as shown by the structure of the kinetically trapped trypsin-AAT inhibitory complex (PDB entry 1EZX (
      • Huntington J.A.
      • Read R.J.
      • Carrell R.W.
      Structure of a serpin-protease complex shows inhibition by deformation.
      )). Trypsin is in gray with the catalytic triad in red. Ser-195 in the trypsin catalytic triad and AAT Met-358, which form the intermolecular bond, are shown in red and purple spacefill, respectively. The N-terminal 22–23 residues in AAT lack electron density in the X-ray crystal structures, indicating that the extreme N terminus is disordered.
      Inhibitory serpins regulate target proteases by mechanically deforming the protease active site (
      • Gettins P.G.W.
      Serpin structure, mechanism, and function.
      ,
      • Huntington J.A.
      • Read R.J.
      • Carrell R.W.
      Structure of a serpin-protease complex shows inhibition by deformation.
      ,
      • Dementiev A.
      • Dobó J.
      • Gettins P.G.W.
      Active site distortion is sufficient for proteinase inhibition by serpins: structure of the covalent complex of α1-proteinase inhibitor with porcine pancreatic elastase.
      ). The energy for this mechanical process is stored in the metastable, stressed serpin conformation characterized by the solvent-exposed reactive center loop (RCL), which acts as bait for proteases (Fig. 4). The initial stages in interactions between the RCL and the Ser or Cys target protease are the same as those for a normal protease substrate; the target protease docks to the RCL, the acyl intermediate with a covalent bond between the catalytic Ser in the protease and the RCL is formed, and the peptide bond is cleaved. In inhibitory serpins, RCL cleavage leads to insertion of the cleaved RCL into β sheet A, the central β sheet, as a sixth, central strand, translocating the covalently attached protease ∼70 Å relative to the serpin, increasing the distance between the catalytic Ser and His residues in the protease catalytic triad from ∼3 to ∼6 Å and trapping the acyl intermediate with its covalent bond between the catalytic Ser and the cleaved RCL (
      • Huntington J.A.
      • Read R.J.
      • Carrell R.W.
      Structure of a serpin-protease complex shows inhibition by deformation.
      ,
      • Dementiev A.
      • Dobó J.
      • Gettins P.G.W.
      Active site distortion is sufficient for proteinase inhibition by serpins: structure of the covalent complex of α1-proteinase inhibitor with porcine pancreatic elastase.
      ,
      • Dementiev A.
      • Simonovic M.
      • Volz K.
      • Gettins P.G.W.
      Canonical inhibitor-like interactions explain reactivity of α1-proteinase inhibitor Pittsburgh and antithrombin with proteinases.
      ,
      • Ye S.
      • Cech A.L.
      • Belmares R.
      • Bergstrom R.C.
      • Tong Y.
      • Corey D.R.
      • Kanost M.R.
      • Goldsmith E.J.
      The structure of a Michaelis serpin-protease complex.
      ). This reaction results in a protease-serpin covalent complex containing an inhibited, mechanically deformed protease and a serpin in the relaxed, RCL-inserted conformation The energy for this large conformational change is stored in the active, metastable stressed serpin structure, and mutations in regions critical to this functionally required conformational lability are often associated with misfolding and disease (
      • Stein P.E.
      • Carrell R.W.
      What do dysfunctional serpins tell us about molecular mobility and disease?.
      ,
      • Gettins P.G.W.
      Serpin structure, mechanism, and function.
      ,
      • Gooptu B.
      • Lomas D.A.
      Conformational pathology of the serpins: themes, variations, and therapeutic strategies.
      ).
      Why would such a complex—and potentially dangerous—inhibitory mechanism have been favored during evolution? We can only speculate, but proteolysis is a rather dangerous process for cells and organisms because runaway proteolysis is likely to result in severe injury or death. For example, in animals, many of the processes regulated by serpins, such as complement activation, fibrinolysis, and hemostasis, involve proteolytic cascades in which a very small amount of protease at the beginning of the cascade can, through amplification, produce large numbers of activated proteases at the end. For example, a small amount of active factor IXa can result in large amounts of active thrombin. It has been suggested that the irreversible suicide inhibition resulting from the serpin inhibitory mechanism enables tighter control of potential proteolytic cascades than noncovalent inhibition mechanisms associated with other families of protease inhibitors (
      • Huntington J.A.
      Shape-shifting serpins–advantages of a mobile mechanism.
      ).
      In addition to protease inhibition, serpins can spontaneously deactivate by releasing strand 1C (C-terminal to the RCL) from β sheet C, thus lengthening the flexible RCL and allowing insertion of the intact RCL into β sheet A as a sixth, central strand resulting in the latent conformation (Fig. 4) (
      • Mottonen J.
      • Strand A.
      • Symersky J.
      • Sweet R.M.
      • Danley D.E.
      • Geoghegan K.F.
      • Gerard R.D.
      • Goldsmith E.J.
      Structural basis of latency in plasminogen activator inhibitor-1.
      ,
      • Gettins P.G.W.
      Serpin structure, mechanism, and function.
      ). The latent structure resembles the relaxed form in the protease-serpin complex, but β sheet C is disrupted, and the RCL is still intact. For some serpins, modulating the probability of transitioning to the latent state provides another means of regulating serpin activity and protease inhibition (
      • Mottonen J.
      • Strand A.
      • Symersky J.
      • Sweet R.M.
      • Danley D.E.
      • Geoghegan K.F.
      • Gerard R.D.
      • Goldsmith E.J.
      Structural basis of latency in plasminogen activator inhibitor-1.
      ,
      • Gettins P.G.W.
      Serpin structure, mechanism, and function.
      ).
      The latent conformation has lower free energy than does the active state; however, to our knowledge, direct folding to the latent state has not been observed for any serpin. Even serpins that readily transition to the latent state make this transition via the metastable, active conformation (
      • Hansen M.
      • Busse M.N.
      • Andreasen P.A.
      Importance of the amino-acid composition of the shutter region of plasminogen activator inhibitor-1 for its transitions to latent and substrate forms.
      ,
      • Zhang Q.
      • Buckle A.M.
      • Law R.H.P.
      • Pearce M.C.
      • Cabrita L.D.
      • Lloyd G.J.
      • Irving J.A.
      • Smith A.I.
      • Ruzyla K.
      • Rossjohn J.
      • Bottomley S.P.
      • Whisstock J.C.
      The N terminus of the serpin, tengpin, functions to trap the metastable native state.
      ).
      The native structures of AAT and other serpins are topologically complex and consist of three β sheets and 8–9 α helices (Fig. 4). These secondary structural elements form two nonsequential domains (
      • Dawson N.L.
      • Lewis T.E.
      • Das S.
      • Lees J.G.
      • Lee D.
      • Ashford P.
      • Orengo C.A.
      • Sillitoe I.
      CATH: an expanded resource to predict protein function through structure and sequence.
      ) with three connections between domains. The α/β domain (CATH domain 2) contains seven of the nine α helices, including four at the N terminus, and the central A β sheet. The mainly β domain (CATH domain 1), which includes the C terminus, is composed of β sheets B and C and the remaining two α helices. Unlike many multidomain proteins, the two domains of the serpin fold are interdigitated, and both contain residues from the N- and C-terminal regions of the sequence. The RCL switches between domains forming part of the mainly β domain (CATH domain 1) when solvent-accessible in the metastable, active conformation and part of the α/β domain (CATH domain 2) in the latent state and protease-serpin complex, where the RCL forms strand 4A in the central β sheet.
      How inhibitory serpins with diverse sequences from a wide range of organisms fold to the kinetically trapped higher-free-energy metastable state while avoiding the lower-free-energy latent state has long been a puzzle in the field.

      Gō-model simulations of inhibitory serpin folding to the active and latent states

      The consistency of inhibitory serpin folding suggests that folding to the metastable active state is encoded in the structure rather than in specific sequences, implying that Cα based SBMs should be sufficient to explain this phenomenon.
      To address this question, Giri Rao and Gosavi (
      • Giri Rao V.V.H.
      • Gosavi S.
      On the folding of a structurally complex protein to its metastable active state.
      ) used Gō models to simulate folding of human AAT to the metastable, active, and latent conformations. Proteins that undergo large conformational transitions between stable structures have often been studied using dual-structure–based models in which favorable energies are included for contacts present in either of the two conformations, leading to an energy landscape with two alternative minima. However, to address the basis of serpin folding to a metastable conformation, a different approach was taken. Two independent Gō model-folding simulations were performed. In one, the native (target) state was taken to be the metastable active structure, and in the other, the native state was the more stable latent structure (see Figure 4, Figure 5).
      Figure thumbnail gr5
      Figure 5Comparison of the folding free energy profiles (FEP) calculated at their respective folding temperatures (Tf) using the SBMs of active and latent AAT structures. Simulations were performed using replica exchange umbrella sampling (see Ref.
      • Giri Rao V.V.H.
      • Gosavi S.
      On the folding of a structurally complex protein to its metastable active state.
      for further details). The FEP, the change in Gibbs free energy relative to the thermal energy at the folding temperature, ΔG/kBTf, as a function of the fraction of formed native contacts, Q, for latent and active AAT are plotted in gray and black, respectively. The native ensembles, N, are at Q ≈ 0.84; the transition state ensemble of latent AAT, TSlatent, and the intermediate ensemble, Iactive, of active AAT are at Q ≈ 0.4; and the unfolded ensembles, U, are at Q ≈ 0.1. The relative changes in enthalpy, ΔΔH (active minus latent), and entropy ΔΔS (active minus latent), between the folding of active and latent AAT, plotted versus Q are shown in red and blue, respectively. ΔΔS at Q ≈ 0.4 is higher than ΔΔH at Q ≈ 0.4. Aligned representative structures from the intermediate ensembles are shown with the N-terminal unfolded regions shown in gray. Folded structures (active: PDB entry 1QLP (
      • Elliott P.R.
      • Pei X.Y.
      • Dafforn T.R.
      • Lomas D.A.
      Topography of a 2.0 Å structure of α1-antitrypsin reveals targets for rational drug design to prevent conformational disease.
      ); latent: PDB entry 1IZ2 (
      • Im H.
      • Woo M.-S.
      • Hwang K.Y.
      • Yu M.-H.
      Interactions causing the kinetic trap in serpin protein folding.
      )) are also shown with the same coloring as the intermediate structures (N→C-terminal: red through green to blue). The C-terminal region and the RCL are structured in both TSlatent and Iactive. The FEP graph was adapted from Giri Rao and Gosavi (
      • Giri Rao V.V.H.
      • Gosavi S.
      On the folding of a structurally complex protein to its metastable active state.
      ). This research was originally published in Proceedings of the National Academy of Sciences U.S.A. Giri Rao, V. V. H., and Gosavi, S. On the folding of a structurally complex protein to its metastable active state. Proc. Natl. Acad. Sci. U.S.A. 2018; 115:1998–2003. © National Academy of Sciences.
      The Gō-model simulations of AAT folding to the metastable, active structure found that the mainly β domain (CATH domain 1), including most of sheets B and C but lacking the two C-terminal β strands, 4 and 5B, folds early, whereas the α/β domain remains largely unfolded (Fig. 5). This partially folded structure with a largely folded β domain comprises a major intermediate along the folding pathway consistent with the experimental observations that folding to the active, metastable conformation involves at least one intermediate (
      • Kim D.
      • Yu M.H.
      Folding pathway of human α1-antitrypsin: characterization of an intermediate that is active but prone to aggregation.
      ,
      • Tsutsui Y.
      • Dela Cruz R.
      • Wintrode P.L.
      Folding mechanism of the metastable serpin α1-antitrypsin.
      ,
      • Stocks B.B.
      • Sarkar A.
      • Wintrode P.L.
      • Konermann L.
      Early hydrophobic collapse of α1-antitrypsin facilitates formation of a metastable state: insights from oxidative labeling and mass spectrometry.
      ,
      • Im H.
      • Woo M.-S.
      • Hwang K.Y.
      • Yu M.-H.
      Interactions causing the kinetic trap in serpin protein folding.
      ,
      • Zhang Q.
      • Buckle A.M.
      • Law R.H.P.
      • Pearce M.C.
      • Cabrita L.D.
      • Lloyd G.J.
      • Irving J.A.
      • Smith A.I.
      • Ruzyla K.
      • Rossjohn J.
      • Bottomley S.P.
      • Whisstock J.C.
      The N terminus of the serpin, tengpin, functions to trap the metastable native state.
      ,
      • Bruch M.
      • Weiss V.
      • Engel J.
      Plasma serine proteinase inhibitors (serpins) exhibit major conformational changes and a large increase in conformational stability upon cleavage at their reactive sites.
      ,
      • Bottomley S.P.
      The folding pathway of α1-antitrypsin: avoiding the unavoidable.
      ). Subsequently, the α/β domain folds, and, in one of the last folding steps, strands 4 and 5B are incorporated into the mainly β domain. This order of events is in good agreement with available data on AAT-folding kinetics from hydrogen/deuterium exchange coupled to MS (
      • Tsutsui Y.
      • Dela Cruz R.
      • Wintrode P.L.
      Folding mechanism of the metastable serpin α1-antitrypsin.
      ), fast photochemical oxidation coupled to MS (
      • Stocks B.B.
      • Sarkar A.
      • Wintrode P.L.
      • Konermann L.
      Early hydrophobic collapse of α1-antitrypsin facilitates formation of a metastable state: insights from oxidative labeling and mass spectrometry.
      ), and tryptophan fluorescence spectroscopy (
      • Kim D.
      • Yu M.H.
      Folding pathway of human α1-antitrypsin: characterization of an intermediate that is active but prone to aggregation.
      ).
      Simulations of folding to the latent state showed significant differences from folding to the metastable structure (Fig. 5). In the latent state, the inserted RCL is strand 4A in the α/β domain, hydrogen-bonding with strands 3 and 5A (Figure 4, Figure 5). During folding to the latent state, stable contacts between the RCL (4A) and β strand 5A formed early, and consolidation of these two strands resulted in concerted folding of the mainly β and α/β domains, with no significantly populated intermediate states along the folding pathway. This lack of an intermediate is consistent with experimental observations that unfolding of serpin species with cleaved, inserted RCLs appears to be two-state (
      • Bruch M.
      • Weiss V.
      • Engel J.
      Plasma serine proteinase inhibitors (serpins) exhibit major conformational changes and a large increase in conformational stability upon cleavage at their reactive sites.
      ).
      This simultaneous folding of both domains observed for folding to the latent state resulted in a large concerted loss of conformational entropy. This entropy loss leads to a folding free energy barrier that is higher than that seen for the stepwise folding pathway that leads to the active, metastable structure (Fig. 5). Based on these results, it is suggested that if RCL-strand 5A contacts form early during folding, they will subsequently break and allow folding to proceed along the lower free energy barrier pathway to the active metastable structure.
      This finding, that folding to the active, metastable state results from entropic contributions to folding barriers is experimentally testable. Giri Rao and Gosavi (
      • Giri Rao V.V.H.
      • Gosavi S.
      On the folding of a structurally complex protein to its metastable active state.
      ) suggested incorporating a disulfide bond between the RCL and strand 5A. Under oxidizing conditions with an intact disulfide bond, this AAT variant should fold directly to the latent state with slower kinetics than would be observed for folding of the reduced AAT variant to the active, metastable state.
      These results demonstrate how simple, structure-based models can be used to address folding conundrums even for large proteins with complicated topologies, such as serpins.

      Investigating the effects of mutations on serpin folding using enhanced sampling based on biased all-atom simulations

      Because all-atom biased MD simulations retain the chemistry of the side chains they allow for the investigation of how sequence specific factors (e.g. mutations) affect folding. These investigations are particularly important for proteins such as serpins, where mutations can lead to disease-associated misfolding and aggregation (
      • Greene C.M.
      • Marciniak S.J.
      • Teckman J.
      • Ferrarotti I.
      • Brantly M.L.
      • Lomas D.A.
      • Stoller J.K.
      • McElvaney N.G.
      α1-Antitrypsin deficiency.
      ,
      • Gooptu B.
      • Lomas D.A.
      Conformational pathology of the serpins: themes, variations, and therapeutic strategies.
      ). Therefore, Wang et al. (
      • Wang F.
      • Orioli S.
      • Ianeselli A.
      • Spagnolli G.
      • a Beccara S.
      • Gershenson A.
      • Faccioli P.
      • Wintrode P.L.
      All-atom simulations reveal how single-point mutations promote serpin misfolding.
      ) used the BF method to examine how known pathological mutations associated with human disease perturb AAT folding.
      The most common AAT mutation linked to severe disease is the Z mutation, E342K, which converts a Glu-Lys salt bridge at the base the RCL to a repulsive Lys-Lys interaction. In vitro, folding Z populates an aggregation-prone intermediate that can persist for hours (
      • Yu M.H.
      • Lee K.N.
      • Kim J.
      The Z type variation of human α1-antitrypsin causes a protein folding defect.
      ), and Z unfolds from the native to the intermediate state significantly faster than does WT (
      • Knaupp A.S.
      • Levina V.
      • Robertson A.L.
      • Pearce M.C.
      • Bottomley S.P.
      Kinetic instability of the serpin Z α1-antitrypsin promotes aggregation.
      ). In cells, this folding defect results in severe misfolding, degradation of nascent Z chains, and the formation and accumulation of insoluble polymers in the endoplasmic reticulum (ER) (
      • Lomas D.A.
      • Evans D.L.
      • Finch J.T.
      • Carrell R.W.
      The mechanism of Z α1-antitrypsin accumulation in the liver.
      ). The resulting low level of circulating AAT leads to lung disease, whereas accumulation of polymers in hepatocytes can lead to cell death and liver disease (
      • Greene C.M.
      • Marciniak S.J.
      • Teckman J.
      • Ferrarotti I.
      • Brantly M.L.
      • Lomas D.A.
      • Stoller J.K.
      • McElvaney N.G.
      α1-Antitrypsin deficiency.
      ). The structure(s) of the partially folded Z AAT species that mediate polymer formation is therefore of considerable medical interest, but its high aggregation propensity has hindered structural studies.
      The serpin fold consists of two nonsequential domains with extensive interdomain and nonlocal contacts (Fig. 4). Perhaps unsurprisingly, in all-atom BF folding simulations for WT, Z, and other AAT variants, folding began with the independent formation of local, sequential structural units (Fig. 6). For WT AAT, subsequent successful folding involved at least two pathways in which these structural units dock to each other in a defined order. In the major pathway, strands 4 and 5B at the C terminus and the α helices at the N terminus docked last. The N-terminal helical region is highly frustrated (
      • Ferreiro D.U.
      • Komives E.A.
      • Wolynes P.G.
      Frustration in biomolecules.
      ). Thus, the finding that the last step in folding is docking of the N-terminal α helices highlights the difficulties in folding frustrated regions. This result, that the C-terminal β strands 4 and 5B are incorporated into the AAT structure prior to incorporation of the N-terminal α helices in the major folding pathway, is a novel, experimentally testable prediction of the BF simulations.
      Figure thumbnail gr6
      Figure 6WT and Z AAT BF folding results. Shown are kinetic free energy landscapes from least biased trajectories plotted as the root mean square deviation from the metastable active X-ray crystal structure (PDB entry 1QLP (
      • Elliott P.R.
      • Pei X.Y.
      • Dafforn T.R.
      • Lomas D.A.
      Topography of a 2.0 Å structure of α1-antitrypsin reveals targets for rational drug design to prevent conformational disease.
      )) versus the fraction of native contacts, Q. The heat map is colored by the number of frames. A random sampling of the conformational ensembles from highly populated local minima 2 and 5 for WT and Z is shown with one randomly chosen colored conformation. The landscapes and conformational ensembles show that, within the simulated time interval, WT AAT does fold to the native conformation (local minimum 5) in some of the trajectories. Compared with the WT trajectories, Z begins misfolding at low Q (e.g. the conformational ensemble from local minimum 2), and even the conformations in local minimum 5 are not fully folded. Adapted from Wang et al. (
      • Wang F.
      • Orioli S.
      • Ianeselli A.
      • Spagnolli G.
      • a Beccara S.
      • Gershenson A.
      • Faccioli P.
      • Wintrode P.L.
      All-atom simulations reveal how single-point mutations promote serpin misfolding.
      ). This research was originally published in Biophysical Journal. Wang, F., Orioli, S., Ianeselli, A., Spagnolli, G., a Beccara, S., Gershenson, A., Faccioli, P., and Wintrode, P. L. All-atom simulations reveal how single-point mutations promote serpin misfolding. Biophys. J. 2018; 114:2083–2094. © American Society for Biochemistry and Molecular Biology.
      In BF simulations, folding of the pathological Z mutant diverged from WT folding relatively early despite the use of the same all-atom force field (Amber99) and ratchet-and-pawl native-centric biasing force. In other words, the E342K mutation was sufficient to drive the variant to a nonnative structure. In WT AAT folding simulations, interactions were formed between β strands 5 and 6A in the α/β domain and sheets B and C in the mainly β domain. This interdomain association occurred early in the folding as the local structural elements began to dock with each other, and once formed, these interdomain interactions were preserved for the remainder of the WT folding pathway. In Z folding simulations, this association failed to occur, and as a result, the majority of simulations led to final structures in which β sheets in both domains failed to form correctly.
      In cellular studies, a number of AAT mutations have been made to rescue or better understand Z misfolding (
      • Brantly M.
      • Courtney M.
      • Crystal R.G.
      Repair of the secretion defect in the Z form of α1-antitrypsin by addition of a second mutation.
      ,
      • Sifers R.N.
      • Hardick C.P.
      • Woo S.L.
      Disruption of the 290–342 salt bridge is not responsible for the secretory defect of the PiZ α1-antitrypsin variant.
      ). Folding simulations for these and other variants suggest that unfavorable electrostatic interactions and steric clashes both play a role in Z misfolding and that there are a number of different ways to rescue Z misfolding.
      Experimental studies of the kinetics of AAT folding have mainly focused on WT AAT (
      • Kim D.
      • Yu M.H.
      Folding pathway of human α1-antitrypsin: characterization of an intermediate that is active but prone to aggregation.
      ,
      • Tsutsui Y.
      • Dela Cruz R.
      • Wintrode P.L.
      Folding mechanism of the metastable serpin α1-antitrypsin.
      ,
      • Stocks B.B.
      • Sarkar A.
      • Wintrode P.L.
      • Konermann L.
      Early hydrophobic collapse of α1-antitrypsin facilitates formation of a metastable state: insights from oxidative labeling and mass spectrometry.
      ) with limited data on the kinetics of Z unfolding (
      • Yu M.H.
      • Lee K.N.
      • Kim J.
      The Z type variation of human α1-antitrypsin causes a protein folding defect.
      ,
      • Knaupp A.S.
      • Levina V.
      • Robertson A.L.
      • Pearce M.C.
      • Bottomley S.P.
      Kinetic instability of the serpin Z α1-antitrypsin promotes aggregation.
      ). The results of the BF simulations are in good agreement with data from kinetic and equilibrium AAT folding studies. Importantly, these simulations make new, testable predictions on how AAT variants fold and misfold (e.g. structural predictions for likely long-lived intermediates in the Z misfolding/folding pathways) and how Z misfolding might be rescued.

      Comparing all-atom biased simulations with Gō-model predictions

      An important issue to address is how the results of BF protein-folding simulations, performed using all-atom force fields, compare with the results of MD simulations based on simplified native centric (Gō) force fields. For small globular proteins, typically characterized by native structure with a simple native topology, both the BF and Gō-model approximation schemes lead to remarkably similar folding mechanisms (
      • a Beccara S.
      • Škrbić T.
      • Covino R.
      • Faccioli P.
      Dominant folding pathways of a WW domain.
      ). On the other hand, some discrepancies seem to emerge for medium-sized proteins (e.g. consisting of more than 100 amino acids) and for proteins with nontrivial native topology (
      • a Beccara S.
      • Škrbić T.
      • Covino R.
      • Micheletti C.
      • Faccioli P.
      Folding pathways of a knotted protein with a realistic atomistic force field.
      ,
      • Wang F.
      • Cazzolli G.
      • Wintrode P.
      • Faccioli P.
      Folding mechanism of proteins Im7 and Im9: insight from all-atom simulations in implicit and explicit solvent.
      ).
      A prototypical case of disagreement is the folding of the two evolutionarily related bacterial colicin immunity proteins, Im7 and Im9. These chains have nearly identical α-helical native structures; thus, Gō-type models cannot distinguish between them and predict identical folding kinetics (
      • Chen T.
      • Chan H.S.
      Native contact density and nonnative hydrophobic effects in the folding of bacterial immunity proteins.
      ). However, a number of kinetic folding experiments have shown that at neutral pH, Im7 populates a folding intermediate and shows three-state kinetics, whereas Im9 shows two-state folding kinetics (
      • Ferguson N.
      • Capaldi A.P.
      • James R.
      • Kleanthous C.
      • Radford S.E.
      Rapid folding with and without populated intermediates in the homologous four-helix proteins Im7 and Im9.
      ,
      • Capaldi A.P.
      • Shastry M.C.
      • Kleanthous C.
      • Roder H.
      • Radford S.E.
      Ultrarapid mixing experiments reveal that Im7 folds via an on-pathway intermediate.
      ,
      • Capaldi A.P.
      • Kleanthous C.
      • Radford S.E.
      Im7 folding mechanism: misfolding on a path to the native state.
      ). This difference is due to transient nonnative interactions that stabilize the folding intermediate in Im7 and can be resolved in Gō-type models augmented with sequence-dependent nonnative hydrophobic interactions (
      • Chen T.
      • Chan H.S.
      Native contact density and nonnative hydrophobic effects in the folding of bacterial immunity proteins.
      ). However, these Gō-type simulations required several model iterations before the correct model was arrived at. In contrast, the Im7 folding intermediate was correctly observed in BF simulations, based on the CHARMM36 force field in explicit water without any modifications to the BF method (
      • Wang F.
      • Cazzolli G.
      • Wintrode P.
      • Faccioli P.
      Folding mechanism of proteins Im7 and Im9: insight from all-atom simulations in implicit and explicit solvent.
      ).
      Differences between Gō-type and biased dynamics simulations were also observed in the folding of the smallest known polypeptide chain that folds into a topologically knotted native structure (
      • a Beccara S.
      • Škrbić T.
      • Covino R.
      • Micheletti C.
      • Faccioli P.
      Folding pathways of a knotted protein with a realistic atomistic force field.
      ). However, remarkable agreement between these two approaches was recovered after the effective potential in the Gō model was modified to include nonnative interactions that implicitly account for the hydrophobic/hydrophilic property of the amino acids (
      • Wallin S.
      • Zeldovich K.B.
      • Shakhnovich E.I.
      The folding mechanics of a knotted protein.
      ). These results for both the knotted protein and Im7 emphasize that, as stated previously (see “Encoding frustration in Gō models”), inclusion of nonnative interactions is sometimes required to correctly simulate folding using Gō-type models (Fig. 2).
      In the case of serpin folding, the Gō model (
      • Giri Rao V.V.H.
      • Gosavi S.
      On the folding of a structurally complex protein to its metastable active state.
      ) and the BF approach (
      • Wang F.
      • Orioli S.
      • Ianeselli A.
      • Spagnolli G.
      • a Beccara S.
      • Gershenson A.
      • Faccioli P.
      • Wintrode P.L.
      All-atom simulations reveal how single-point mutations promote serpin misfolding.
      ) are complementary and show considerable agreement. For WT AAT, both sets of simulations found that the mainly β domain folds early and that there is a highly populated intermediate state in which the β domain is mostly folded, whereas the α/β domain is still largely unformed (Figure 5, Figure 6). However, even at this stage in the all-atom BF simulations, significant amounts of local secondary structure were formed in the α/β domain.
      WT AAT is at least a three-state folder (
      • Kim D.
      • Yu M.H.
      Folding pathway of human α1-antitrypsin: characterization of an intermediate that is active but prone to aggregation.
      ,
      • Tsutsui Y.
      • Dela Cruz R.
      • Wintrode P.L.
      Folding mechanism of the metastable serpin α1-antitrypsin.
      ,
      • Stocks B.B.
      • Sarkar A.
      • Wintrode P.L.
      • Konermann L.
      Early hydrophobic collapse of α1-antitrypsin facilitates formation of a metastable state: insights from oxidative labeling and mass spectrometry.
      ,
      • Bottomley S.P.
      The folding pathway of α1-antitrypsin: avoiding the unavoidable.
      ), and this is reflected in both sets of simulations. In the Gō-model simulations, a highly populated intermediate is formed when the fraction of native contacts, Q, is ∼0.4 (Fig. 5), and in the BF simulations, similar intermediates are populated between Q values of 0.5 and 0.7 (local minima 1 and 2 in Fig. 6). The somewhat larger fraction of native contacts in the BF simulations may be because folding was carried out at room temperature, whereas the Gō-model simulations were carried out at a higher temperature, Tfold, the temperature at which the native and unfolded states are equally populated. In addition, because the native-centric BF approach discourages backtracking, secondary structural elements may be overstabilized. The approach to this intermediate state is different for the two simulations; in the BF simulations, nonnative contacts between sheets B and C in the β domain and a strand from sheet A in the α/β domain must be resolved before these intermediates can form. These stabilizing nonnative interactions are not present in the Gō model.
      As noted above, positioning of the highly mobile RCL is critical for correct serpin folding to the active, metastable state (Fig. 4). Although the timing is slightly different, in both sets of simulations, premature insertion of the RCL into sheet A is prevented by the formation of contacts between the RCL and sheets B and C locking the C-terminal end of the RCL into place before the β domain has completed folding in agreement with the results of kinetic folding experiments (
      • Stocks B.B.
      • Sarkar A.
      • Wintrode P.L.
      • Konermann L.
      Early hydrophobic collapse of α1-antitrypsin facilitates formation of a metastable state: insights from oxidative labeling and mass spectrometry.
      ). Interestingly, based on Frustratometer (
      • Ferreiro D.U.
      • Komives E.A.
      • Wolynes P.G.
      Frustration in biomolecules.
      ,
      • Parra R.G.
      • Schafer N.P.
      • Radusky L.G.
      • Tsai M.-Y.
      • Guzovsky A.B.
      • Wolynes P.G.
      • Ferreiro D.U.
      Protein Frustratometer 2: a tool to localize energetic frustration in protein molecules, now with electrostatics.
      ) calculations of frustration, the C-terminal end of the RCL shows a lower degree of frustration than does the N-terminal end, again emphasizing that frustrated regions can be more difficult to fold. Both Gō-model and BF simulations also agree that packing of the C-terminal strands 4 and 5B, which complete the formation of the β domain, is a late event in folding, a finding that is supported by both pulsed oxidative footprinting (
      • Stocks B.B.
      • Sarkar A.
      • Wintrode P.L.
      • Konermann L.
      Early hydrophobic collapse of α1-antitrypsin facilitates formation of a metastable state: insights from oxidative labeling and mass spectrometry.
      ) and fragment complementation studies (
      • Dolmer K.
      • Gettins P.G.W.
      How the serpin α1-proteinase inhibitor folds.
      ).
      Thus, as demonstrated by investigations of AAT folding, Gō-type and all-atom approaches can provide complementary insights into protein folding, and method selection will depend on the question being asked.

      Simulating the folding of other large proteins

      In addition to AAT, folding simulations have been performed for a number of large multidomain proteins using both SBMs and all-atom methods (
      • Li W.
      • Terakawa T.
      • Wang W.
      • Takada S.
      Energy landscape and multiroute folding of topologically complex proteins adenylate kinase and 2ouf-knot.
      ,
      • Giri Rao V.V.H.
      • Gosavi S.
      In the multi-domain protein adenylate kinase, domain insertion facilitates cooperative folding while accommodating function at domain interfaces.
      ,
      • Reddy G.
      • Liu Z.
      • Thirumalai D.
      Denaturant-dependent folding of GFP.
      ,
      • Halloran K.T.
      • Wang Y.
      • Arora K.
      • Chakravarthy S.
      • Irving T.C.
      • Bilsel O.
      • Brooks 3rd, C.L.
      • Matthews C.R.
      Frustration and folding of a TIM barrel protein.
      ,
      • Inanami T.
      • Terada T.P.
      • Sasai M.
      Folding pathway of a multidomain protein depends on its topology of domain connectivity.
      ,
      • Wang Y.
      • Chu X.
      • Suo Z.
      • Wang E.
      • Wang J.
      Multidomain protein solves the folding problem by multifunnel combined landscape: theoretical investigation of a Y-family DNA polymerase.
      ,
      • Tanaka T.
      • Hori N.
      • Takada S.
      How co-translational folding of multi-domain protein Is affected by elongation schedule: molecular simulations.
      ). The domain architecture in multidomain proteins may be classified as sequential, proteins in which the domains are translated sequentially from the N to the C terminus, and nonsequential or discontinuous, proteins such as serpins in which one or more domain is discontinuously translated. Often, the folding of proteins with sequential domains is hypothesized to also be sequential, but more complicated folding has also been observed (
      • Batey S.
      • Nickson A.A.
      • Clarke J.
      Studying the folding of multidomain proteins.
      ,
      • Braselmann E.
      • Chaney J.L.
      • Clark P.L.
      Folding the proteome.
      ,
      • Liu K.
      • Maciuba K.
      • Kaiser C.M.
      The ribosome cooperates with a chaperone to guide multi-domain protein folding.
      ). A special class of sequential multidomain proteins are the repeat proteins, and the folding and misfolding of this class of proteins has recently been reviewed elsewhere (
      • Perez-Riba A.
      • Synakewicz M.
      • Itzhaki L.S.
      Folding cooperativity and allosteric function in the tandem-repeat protein class.
      ,
      • Lafita A.
      • Tian P.
      • Best R.B.
      • Bateman A.
      Tandem domain swapping: determinants of multidomain protein misfolding.
      ).
      Another promising native-centric approach to study the folding and misfolding of multidomain proteins is AWSEM-MD (associative memory, water-mediated, structure and energy model) developed by Wolynes and colleagues (
      • Papoian G.A.
      • Wolynes P.G.
      AWSEM-MD: from neural networks to protein structure prediction and functional dynamics of complex biomolecular assemblies.
      ,
      • Zheng W.
      • Schafer N.P.
      • Wolynes P.G.
      Frustration in the energy landscapes of multidomain protein misfolding.
      ). AWSEM is a coarse-grained protein force field in which a native centric bias can be introduced to varying degrees, ranging from a full Gō-type potential based on the native structure to a potential with local conformational preferences derived from fragment libraries (
      • Papoian G.A.
      • Wolynes P.G.
      AWSEM-MD: from neural networks to protein structure prediction and functional dynamics of complex biomolecular assemblies.
      ). Zheng et al. (
      • Zheng W.
      • Schafer N.P.
      • Wolynes P.G.
      Frustration in the energy landscapes of multidomain protein misfolding.
      ) used AWSEM-MD to study the folding of fused dimers of either SH3 domains or Ig domains from human titin. Prior to these simulations, single-molecule studies on Ig domain folding had shown that the monomers could domain-swap (a process in which adjacent monomers form intermolecular contacts that mimic the corresponding intramolecular native contacts in the monomer) and that monomers with identical sequences were more likely to domain-swap than monomers with divergent sequences (
      • Borgia M.B.
      • Borgia A.
      • Best R.B.
      • Steward A.
      • Nettels D.
      • Wunderlich B.
      • Schuler B.
      • Clarke J.
      Single-molecule fluorescence reveals sequence-specific misfolding in multidomain proteins.
      ). Domain swapping of monomers with identical sequences was also observed in force unfolding experiments and simulations of polyubiquitin chains (
      • Xia F.
      • Thirumalai D.
      • Gräter F.
      Minimum energy compact structures in force-quench polyubiquitin folding are domain swapped.
      ).
      The AWSEM-MD simulations performed by Zheng and et al. (
      • Zheng W.
      • Schafer N.P.
      • Wolynes P.G.
      Frustration in the energy landscapes of multidomain protein misfolding.
      ) suggested that misfolding of repeat proteins could occur in multiple ways, including the experimentally observed domain swapping and an alternate mechanism involving the formation of short stretches of intermolecular β-strands that formed amyloid-like contacts. Subsequently, the amyloid-like misfolding mechanism was supported by both ensemble and single-molecule kinetic folding experiments (
      • Borgia A.
      • Kemplen K.R.
      • Borgia M.B.
      • Soranno A.
      • Shammas S.
      • Wunderlich B.
      • Nettels D.
      • Best R.B.
      • Clarke J.
      • Schuler B.
      Transient misfolding dominates multidomain protein folding.
      ). Unlike domain-swapped contacts, these amyloid-like contacts had no counterpart in the native structure. More generally, Zheng et al. (
      • Zheng W.
      • Schafer N.P.
      • Wolynes P.G.
      Frustration in the energy landscapes of multidomain protein misfolding.
      ) found that the frequency of misfolding through both mechanisms was reduced by lowering the sequence identity between monomers in the fused dimer, a finding that supports the hypothesis that evolution has disfavored high sequence identity between neighboring domains in repeat proteins as a means of minimizing misfolding (
      • Han J.-H.
      • Batey S.
      • Nickson A.A.
      • Teichmann S.A.
      • Clarke J.
      The folding and evolution of multidomain proteins.
      ). These findings are important not just for repeat proteins but also for proteins such as serpins where misfolding of some disease-associated mutants is proposed to result in domain-swapped oligomers (
      • Yamasaki M.
      • Sendall T.J.
      • Pearce M.C.
      • Whisstock J.C.
      • Huntington J.A.
      Molecular basis of α1-antitrypsin deficiency revealed by the structure of a domain-swapped trimer.
      ,
      • Gooptu B.
      • Dickens J.A.
      • Lomas D.A.
      The molecular and cellular pathology of α1-antitrypsin deficiency.
      ) as well as for amyloidgenic proteins.
      Unlike repeat proteins, many sequential multidomain proteins are made of diverse structural units, decreasing the probability of misfolding by domain swapping. Jin Wang and colleagues (
      • Wang Y.
      • Chu X.
      • Suo Z.
      • Wang E.
      • Wang J.
      Multidomain protein solves the folding problem by multifunnel combined landscape: theoretical investigation of a Y-family DNA polymerase.
      ) used SBMs to simulate the unfolding and folding of DNA polymerase IV from Sulfolobus solfataricus (DPO4), a 342-amino-acid-long protein. With the exception of the N-terminal eight amino acids, which form a strand in the palm domain, DPO4 is a four-domain sequential protein with the finger, palm, thumb, and little finger domains in order from the N to the C terminus (Fig. 7A). In the SBM simulations, DPO4 folded via six parallel pathways, a “divide and conquer” strategy where each domain could fold independently and some, but not all, domain interfaces helped template the folding of other domains. Wang and co-workers suggest that this divide and conquer strategy can lead to multiple folding intermediates but that the formation of each intermediate decreases the degrees of freedom, thereby speeding the folding of multidomain proteins.
      Figure thumbnail gr7
      Figure 7Domain structures and connectivities for DPO4, DHFR, AKE, and Suf1. A, DPO4 with the finger (light blue), palm (N-terminal strand in blue and the rest in green), thumb (gold), and little finger (pink) domains (PDB entry 2RDI (
      • Wong J.H.
      • Fiala K.A.
      • Suo Z.
      • Ling H.
      Snapshots of a Y-family DNA polymerase in replication: substrate-induced conformational transitions and implications for fidelity of Dpo4.
      )). Domain assignments are from CATH (
      • Dawson N.L.
      • Lewis T.E.
      • Das S.
      • Lees J.G.
      • Lee D.
      • Ashford P.
      • Orengo C.A.
      • Sillitoe I.
      CATH: an expanded resource to predict protein function through structure and sequence.
      ). B, DHFR showing the discontinuous DLD (blue and pink) and the continuous ABD (gold) domains (PDB entry 1RX1 (
      • Sawaya M.R.
      • Kraut J.
      Loop and subdomain movements in the mechanism of Escherichia coli dihydrofolate reductase: crystallographic evidence.
      )). The domain assignments are from Inanami et al. (
      • Inanami T.
      • Terada T.P.
      • Sasai M.
      Folding pathway of a multidomain protein depends on its topology of domain connectivity.
      ). C, AKE showing the discontinuous CORE domain (blue, green, and pink) and the two continuous insertions, NMP (light blue) and Lid (gold) (PDB entry 4AKE (
      • Müller C.W.
      • Schlauderer G.J.
      • Reinstein J.
      • Schulz G.E.
      Adenylate kinase motions during catalysis: an energetic counterweight balancing substrate binding.
      )). The domain assignments are from Giri Rao and Gosavi (
      • Giri Rao V.V.H.
      • Gosavi S.
      In the multi-domain protein adenylate kinase, domain insertion facilitates cooperative folding while accommodating function at domain interfaces.
      ). D, SufI showing the three sequential domains as assigned by CATH (
      • Dawson N.L.
      • Lewis T.E.
      • Das S.
      • Lees J.G.
      • Lee D.
      • Ashford P.
      • Orengo C.A.
      • Sillitoe I.
      CATH: an expanded resource to predict protein function through structure and sequence.
      ) (PDB entry 2UXT (
      • Tarry M.
      • Arends S.J.R.
      • Roversi P.
      • Piette E.
      • Sargent F.
      • Berks B.C.
      • Weiss D.S.
      • Lea S.M.
      The Escherichia coli cell division protein and model Tat substrate SufI (FtsP) localizes to the septal ring and has a multicopper oxidase-like structure.
      )). There are missing loops in the M domain. All structures are colored from the N terminus in blue to the C terminus in pink. Nonsequential, discontinuous domains are multicolored.
      Simulations of the folding of the protein dihydrofolate reductase (DHFR) and a circular permutant by Inanami et al. (
      • Inanami T.
      • Terada T.P.
      • Sasai M.
      Folding pathway of a multidomain protein depends on its topology of domain connectivity.
      ) provide further insight into the divide and conquer folding strategy. The 159-amino acid DHFR has two discontinuous domains, a discontinuous loop domain (DLD) (residues 1–37 and 107–159) and a continuous adenosine-binding domain (ABD) (residues 38–106) (Fig. 7B), whereas the circular permutant resolves the discontinuities resulting in two sequential domains. To simulate the folding of DHFR and other proteins with discontinuous domains, Inanami and co-workers had to modify the Wako-Saito-Muñoz-Eaton (WSME) method (
      • Wako H.
      • Saitô N.
      Statistical mechanical theory of the protein conformation. I. General considerations and the application to homopolymers.
      ,
      • Wako H.
      • Saitô N.
      Statistical mechanical theory of the protein conformation. II. Folding pathway for protein.
      ,
      • Muñoz V.
      • Eaton W.A.
      A simple model for calculating the kinetics of protein folding from three-dimensional structures.
      ), a kind of SBM. They then simulated the folding of both WT DHFR and the circular permutant using their extended WSME approach.
      In the simulations, folding occurred through two major intermediates, one in which the continuous ABD domain was mostly formed and the second in which the DLD domain was mostly formed. In the WT protein, formation of the DLD was entropically unfavorable due to the loop closure penalty that arises when the N and C termini are brought close together. Thus, the DLD first pathway was disfavored, and the folding pathway in which the ABD forms before the DLD dominated. In the circular permutant, the entropic penalty for forming the DLD was much reduced, and the two pathways (ABD first or DLD first) had similar probabilities analogous to the divide and conquer folding observed for DPO4. The DLD loop closure penalty is similar to the barrier for folding to the serpin latent state (
      • Giri Rao V.V.H.
      • Gosavi S.
      On the folding of a structurally complex protein to its metastable active state.
      ), and both of these studies reiterate the importance of entropic folding barriers.
      Escherichia coli adenylate kinase (AKE) is a complex nonsequential protein composed of two continuous domains (LID and NME) inserted into a discontinuous domain (CORE) split into three parts (Fig. 7C). SBM simulations of AKE found, in agreement with bulk thermodynamic experiments, that AKE folds cooperatively (
      • Giri Rao V.V.H.
      • Gosavi S.
      In the multi-domain protein adenylate kinase, domain insertion facilitates cooperative folding while accommodating function at domain interfaces.
      ). This cooperativity was enabled by the insertion of the smaller less stable domains into the larger more stable domain. The smaller domains transiently folded and unfolded, stabilizing only upon folding of the larger discontinuous domain, whose folding constrained the termini of the smaller domains into near-native conformations and reduced the entropic cost of folding these domains. Similar entropic considerations go into the physics-based sequential collapse model in which the closure of loops with lengths on the order of 65 amino acids, too large to crowd the side chains and too small to have very large entropic penalties, are postulated to be one of the earliest events in protein folding (
      • Bergasa-Caceres F.
      • Haas E.
      • Rabitz H.A.
      Nature's shortcut to protein folding.
      ). This nonlocal loop formation would encourage the formation of local structure, such as foldons (
      • Panchenko A.R.
      • Luthey-Schulten Z.
      • Wolynes P.G.
      Foldons, protein structural modules, and exons.
      ,
      • Englander S.W.
      • Mayne L.
      The nature of protein folding pathways.
      ). Application of the sequential collapse–based method to AKE agreed with the results of time-resolved FRET experiments (
      • Bergasa-Caceres F.
      • Haas E.
      • Rabitz H.A.
      Nature's shortcut to protein folding.
      ) and the previous SBM folding simulations.
      Results of the SBM simulations suggested that for AKE, the number of intermediates and in turn folding pathways could be modulated by increasing the stabilities of the inserted domains and making their folding independent of the discontinuous domain (
      • Giri Rao V.V.H.
      • Gosavi S.
      In the multi-domain protein adenylate kinase, domain insertion facilitates cooperative folding while accommodating function at domain interfaces.
      ). In separate SBM simulations, the stabilities of the inserted domains were tuned by introducing energetic heterogeneity in the strengths of the contacts stabilizing the smaller domains (
      • Li W.
      • Terakawa T.
      • Wang W.
      • Takada S.
      Energy landscape and multiroute folding of topologically complex proteins adenylate kinase and 2ouf-knot.
      ). The results of these simulations, specifically both the number of folding intermediates and the number of folding pathways, broadly matched with experiments. Further tuning of domain stabilization and folding pathways was achieved by mutating AKE residues in or near the NME domain (
      • Kantaev R.
      • Riven I.
      • Goldenzweig A.
      • Barak Y.
      • Dym O.
      • Peleg Y.
      • Albeck S.
      • Fleishman S.J.
      • Haran G.
      Manipulating the folding landscape of a multidomain protein.
      ). Upon experimentally testing the effects of these mutations, it was found that stabilizing the NME domain mainly preserved WT-like folding while shunting a minor fraction (9%) of the folding flux through a pathway that bypassed folding intermediates and folded directly to the native state. Although the effects of the specific mutations used in the experiments on folding have not been tested using SBMs, the experimental results only partially agree with predictions from the SBM simulations with homogeneous contact strengths. This partial agreement between experimental and computational folding results provides clues on how to improve simulation approaches. For instance, heterogeneous contact strengths may be necessary to predict the specifics of intermediate populations in AKE folding. As shown for other proteins, such as Im7 (
      • Chen T.
      • Chan H.S.
      Native contact density and nonnative hydrophobic effects in the folding of bacterial immunity proteins.
      ) and the knotted protein (
      • Wallin S.
      • Zeldovich K.B.
      • Shakhnovich E.I.
      The folding mechanics of a knotted protein.
      ) mentioned earlier, SBMs may need to be supplemented with nonnative interactions (see Fig. 2) to improve agreement with experiment.
      In contrast, for the 443-amino-acid-long E. coli cell division protein SufI (also called ftsP), the ability to access a number of parallel folding pathways leads to misfolding (
      • Tanaka T.
      • Hori N.
      • Takada S.
      How co-translational folding of multi-domain protein Is affected by elongation schedule: molecular simulations.
      ). SufI is composed of three sequential domains (Fig. 7D), and Ignatova and colleagues (
      • Zhang G.
      • Hubalewska M.
      • Ignatova Z.
      Transient ribosomal attenuation coordinates protein synthesis and co-translational folding.
      ) showed experimentally that stretches of slow translation are required for proper SufI folding. Tanaka et al. (
      • Tanaka T.
      • Hori N.
      • Takada S.
      How co-translational folding of multi-domain protein Is affected by elongation schedule: molecular simulations.
      ) used their SBM to simulate the folding of full-length SufI and mimicked co-translational folding by vectorially synthesizing SufI at various rates during the simulations. For the full-length protein and for fast vectorial synthesis, SufI folded via multiple pathways, many of which resulted in long-lived misfolded species. In contrast, vectorial synthesis regimes that allowed for folding during synthesis restricted the number of possible pathways and increased the efficiency of folding. These results are consistent with recent work showing that rare codon regions, which may attenuate translation, are conserved across species (
      • Jacobs W.M.
      • Shakhnovich E.I.
      Evidence of evolutionary selection for cotranslational folding.
      ,
      • Chaney J.L.
      • Steele A.
      • Carmichael R.
      • Rodriguez A.
      • Specht A.T.
      • Ngo K.
      • Li J.
      • Emrich S.
      • Clark P.L.
      Widespread position-specific conservation of synonymous rare codons within coding sequences.
      ). In general, the balance between translation rates and folding is protein-dependent. For proteins such as Suf1, translational attenuation at particular places in the sequence can aid in folding, whereas for other proteins, fast translation can reduce the probability of populating misfolded intermediates (
      • O'Brien E.P.
      • Ciryam P.
      • Vendruscolo M.
      • Dobson C.M.
      Understanding the influence of codon translation rates on cotranslational protein folding.
      ,
      • Sharma A.K.
      • O'Brien E.P.
      Non-equilibrium coupling of protein structure and function to translation-elongation kinetics.
      ).
      In summary, when larger proteins consist of multiple small domains with minimal interdomain interfaces, the folding of individual domains may be only partially dependent on or completely independent of the folding of other domains, and this can lead to multiple pathways. Interdomain interfaces can template the folding of neighboring domains and reduce the number of pathways that are available to the protein. Evolution may further modulate the folding of such sequential multidomain proteins by having divergent amino acid sequences in adjacent domains or by tuning translation rates. However, on the whole, conclusions drawn from single-domain proteins can be applicable to the folding of the individual domains of such modular multidomain proteins. In contrast, in multidomain proteins, where either the domains are connected through multiple linkers or there is an extensive interface between them, protein folding depends on the structure of the entire protein. In such proteins, topological frustration created by the need to fold a complex structure may stall folding and reduce free energy barriers but enable the population of folding intermediates. Alternately, the entropy loss required to fold the entire protein in an all or nothing fashion can lead to large folding energy barriers.

      Harnessing the predictive power of folding simulations

      Existing experimental folding data are often used to validate the results of folding simulations; as shown for AKE, where simulations motivated the design of AKE mutants and single-molecule FRET (smFRET) studies of their folding (
      • Kantaev R.
      • Riven I.
      • Goldenzweig A.
      • Barak Y.
      • Dym O.
      • Peleg Y.
      • Albeck S.
      • Fleishman S.J.
      • Haran G.
      Manipulating the folding landscape of a multidomain protein.
      ), the conformational ensembles along the simulated folding trajectories can be used to design new kinetic folding experiments and to help interpret experimental results. As has been demonstrated for near-UV CD spectra calculated from BF folding simulations (
      • Ianeselli A.
      • Orioli S.
      • Spagnolli G.
      • Faccioli P.
      • Cupellini L.
      • Jurinovich S.
      • Mennucci B.
      Atomic detail of protein folding revealed by an ab initio reappraisal of circular dichroism.
      ), these conformational ensembles may be used to predict how average protein properties change during folding. However, different conformational ensembles can lead to similar average properties, and the real power of the conformational ensembles available from folding simulations lies in comparisons with and prediction of results from methods that can elucidate how conformational distributions change as a protein folds.
      Many methods, including time-resolved fluorescence, lifetime FRET, smFRET, 19F NMR, and EPR, use site-specific labels that report on the local environment around the label or changes in the relative distance between labels. For these local probes, simulations may be used to make informed decisions as to where to locate these local reporters to maximize signal changes as the protein folds and to predict the experimental results. Simulations may also be helpful in interpreting observed signals. For example, for smFRET, all-atom conventional MD simulations performed on the Anton supercomputer have been used to explain observed folding and unfolding rate constants (
      • Chung H.S.
      • Piana-Agostinetti S.
      • Shaw D.E.
      • Eaton W.A.
      Structural origin of slow diffusion in protein folding.
      ). More recently, Schug and co-workers (
      • Reinartz I.
      • Sinner C.
      • Nettels D.
      • Stucki-Buchli B.
      • Stockmar F.
      • Panek P.T.
      • Jacob C.R.
      • Nienhaus G.U.
      • Schuler B.
      • Schug A.
      Simulation of FRET dyes allows quantitative comparison against experimental data.
      ) used coarse-grained SBMs that explicitly included the FRET dyes to simulate smFRET folding data, resulting in good agreement with experimental results.
      Although label-dependent methods can provide information on how the local environment of the probe changes as the protein folds, other methods, including NMR, pulsed oxidative labeling coupled to MS (
      • Konermann L.
      • Pan Y.
      • Stocks B.B.
      Protein folding mechanisms studied by pulsed oxidative labeling and mass spectrometry.
      ,
      • Johnson D.T.
      • Di Stefano L.H.
      • Jones L.M.
      Fast photochemical oxidation of proteins (FPOP): a powerful mass spectrometry based structural proteomics tool.
      ), and hydrogen-deuterium exchange MS (HDX-MS) or NMR (
      • Englander S.W.
      • Mayne L.
      • Bai Y.
      • Sosnick T.R.
      Hydrogen exchange: the modern legacy of Linderstrøm-Lang.
      ), provide conformationally sensitive data with peptide to single-residue resolution. Particularly for NMR experiments, experimental data may be incorporated into simulations to help provide a molecular interpretation of the experimental results (e.g. see Ref.
      • Fossat M.J.
      • Dao T.P.
      • Jenkins K.
      • Dellarole M.
      • Yang Y.
      • McCallum S.A.
      • Garcia A.E.
      • Barrick D.
      • Roumestand C.
      • Royer C.A.
      High-resolution mapping of a repeat protein folding free energy landscape.
      ). In addition, conformational distributions from folding simulations may be used to predict time-resolved experimental results. For example, oxidative labeling of amino acid side chains depends on side-chain solvent accessibility (
      • Konermann L.
      • Pan Y.
      • Stocks B.B.
      Protein folding mechanisms studied by pulsed oxidative labeling and mass spectrometry.
      ,
      • Johnson D.T.
      • Di Stefano L.H.
      • Jones L.M.
      Fast photochemical oxidation of proteins (FPOP): a powerful mass spectrometry based structural proteomics tool.
      ), and the time evolution of the probability that a given residue is solvent-accessible may be simply computed from conformational ensembles along the folding trajectory.
      For folded proteins, deuterium uptake curves from HDX-MS experiments have been predicted with reasonable accuracy from all-atom or coarse-grained simulations even on relatively short submicrosecond timescales (
      • Vendruscolo M.
      • Paci E.
      • Dobson C.M.
      • Karplus M.
      Rare fluctuations of native proteins sampled by equilibrium hydrogen exchange.
      ,
      • Mohammadiarani H.
      • Shaw V.S.
      • Neubig R.R.
      • Vashisth H.
      Interpreting hydrogen-deuterium exchange events in proteins using atomistic simulations: case studies on regulators of G-protein signaling proteins.
      ). Similarly, conformational ensembles along folding pathways, such as those shown in Figure 5, Figure 6 for AAT, could be used to predict the results of kinetic folding experiments monitored by HDX-MS. However, conformational ensembles from native-centric simulations may not capture structural fluctuations, possibly leading to underestimation of exchange. An alternative approach would be to better account for structural fluctuations by subjecting one or more conformations from the major long-lived intermediates identified in the simulations to 0.1–1-μs MD simulations. Deuterium uptake at short pulse-labeling timescales (typically 5–10 s in the case of manual labeling) could then be predicted from ensembles harvested directly from the folding simulations or from the MD-generated ensembles, allowing direct comparisons between predicted and observed folding intermediates.
      In addition to predicting experimental results, the simulations may be used to test the effects of mutations as demonstrated for both AAT (
      • Wang F.
      • Orioli S.
      • Ianeselli A.
      • Spagnolli G.
      • a Beccara S.
      • Gershenson A.
      • Faccioli P.
      • Wintrode P.L.
      All-atom simulations reveal how single-point mutations promote serpin misfolding.
      ) and AKE (
      • Giri Rao V.V.H.
      • Gosavi S.
      In the multi-domain protein adenylate kinase, domain insertion facilitates cooperative folding while accommodating function at domain interfaces.
      ). Furthermore, as has been demonstrated for AKE (
      • Kantaev R.
      • Riven I.
      • Goldenzweig A.
      • Barak Y.
      • Dym O.
      • Peleg Y.
      • Albeck S.
      • Fleishman S.J.
      • Haran G.
      Manipulating the folding landscape of a multidomain protein.
      ), the simulations may be used as the basis for designing and experimentally characterizing new mutations.

      Extrapolating from folding simulations to folding in more complicated environments

      As discussed above, the results of folding simulations are generally compared with experimental results from the folding of purified, full-length proteins. But, physiologically, proteins are synthesized and fold in the crowded and complicated intracellular milieu (
      • Balchin D.
      • Hayer-Hartl M.
      • Hartl F.U.
      In vivo aspects of protein folding and quality control.
      ,
      • Sontag E.M.
      • Samant R.S.
      • Frydman J.
      Mechanisms and functions of spatial protein quality control.
      ,
      • Adams B.M.
      • Oster M.E.
      • Hebert D.N.
      Protein quality control in the endoplasmic reticulum.
      ). In cells, proteins may fold co-translationally as they are synthesized and both co- and post-translationally proteins interact with the protein homeostasis machinery, including modifying enzymes (e.g. kinases and oligosaccharyltransferases) and molecular chaperones. Although folding simulations are performed in a much simpler environment, the insights generated into how the protein sequence folds are still relevant to in-cell folding.
      Folding pathways and intermediates generated using the computational methods described above may allow the formulation of specific hypotheses regarding which structural regions and motifs of a protein interact with molecular chaperones and other components of the protein homeostasis machinery. Because folding simulations are usually performed on full-length proteins, we focus on what is known about post-translational interactions between folding proteins and cellular quality control. (For more comprehensive reviews of the protein homeostasis machinery, see Ref.
      • Dubnikov T.
      • Ben-Gedalya T.
      • Cohen E.
      Protein quality control in health and disease.
      .) To our knowledge, there have not been rigorous attempts to extrapolate substrate-chaperone interactions from folding simulations; nonetheless, a general strategy is summarized below.
      The first step is to use protein-folding simulations to predict and structurally characterize the conformational ensembles of the relevant long-lived intermediates. Next, the solvent accessibility of regions that are likely to bind molecular chaperones or other components of the protein homeostasis machinery can be determined.
      For the bacterial Hsp70 molecular chaperones DnaK and BiP, the Hsp70 resident in the ER, seven amino acid sequence motifs that are likely to bind in the Hsp70 substrate-binding site have been determined using peptide arrays combined with Hsp70 structures and computational methods (
      • Van Durme J.
      • Maurer-Stroh S.
      • Gallardo R.
      • Wilkinson H.
      • Rousseau F.
      • Schymkowitz J.
      Accurate prediction of DnaK-peptide binding via homology modelling and experimental data.
      ,
      • Schneider M.
      • Rosam M.
      • Glaser M.
      • Patronov A.
      • Shah H.
      • Back K.C.
      • Daake M.A.
      • Buchner J.
      • Antes I.
      BiPPred: Combined sequence- and structure-based prediction of peptide binding to the Hsp70 chaperone BiP.
      ). Using the available webservers (Limbo (switchlab.org/bioinformatics/limbo)
      Please note that the JBC is not responsible for the long-term archiving and maintenance of this site or any other third party hosted site.
      for DnaK and BipPred (https://www.ncbi.nlm.nih.gov/pubmed/27287023) for BiP), likely Hsp70 binding sites can be determined from the protein sequence. Then the relative solvent accessibility of these regions could be determined for the conformational ensembles of folding intermediates from simulations. Similarly, recent work (
      • Behnke J.
      • Mann M.J.
      • Scruggs F.-L.
      • Feige M.J.
      • Hendershot L.M.
      Members of the Hsp70 family recognize distinct types of sequences to execute ER quality control.
      ) suggests that binding motifs for the ER-resident Hsp40s Erdj4 and Erdj5 as well as the mammalian ER-resident Hsp110 ortholog Grp170 may be identified using the TANGO algorithm (tango.crg.es)6 (
      • Fernandez-Escamilla A.-M.
      • Rousseau F.
      • Schymkowitz J.
      • Serrano L.
      Prediction of sequence-dependent and mutational effects on the aggregation of peptides and proteins.
      ) developed to identify sequences with a high aggregation propensity. For proteins that fold and mature in the ER, these sequence motifs may also be used to try to identify likely binding sites for molecular chaperones. Other classes of molecular chaperones, including the chaperonin GroEL (
      • Bandyopadhyay B.
      • Goldenzweig A.
      • Unger T.
      • Adato O.
      • Fleishman S.J.
      • Unger R.
      • Horovitz A.
      Local energetic frustration affects the dependence of green fluorescent protein folding on the chaperonin GroEL.
      ), may preferentially recognize dynamic, frustrated protein regions (
      • He L.
      • Hiller S.
      Frustrated interfaces facilitate dynamic interactions between native client proteins and holdase chaperones.
      ). Frustrated regions in folding simulations may be identified using the Frustratometer (
      • Parra R.G.
      • Schafer N.P.
      • Radusky L.G.
      • Tsai M.-Y.
      • Guzovsky A.B.
      • Wolynes P.G.
      • Ferreiro D.U.
      Protein Frustratometer 2: a tool to localize energetic frustration in protein molecules, now with electrostatics.
      ,
      • Das A.
      • Plotkin S.S.
      SOD1 exhibits allosteric frustration to facilitate metal binding affinity.
      ), again allowing the identification of possible accessible recognition sites in the conformational ensembles of likely long-lived folding intermediates. Although these proposed methods are relatively primitive and not applicable to all chaperones (
      • Koldewey P.
      • Horowitz S.
      • Bardwell J.C.A.
      Chaperone-client interactions: non-specificity engenders multifunctionality.
      ), they have the potential to provide structural information on transient folding intermediates that may be recognized by chaperone networks.
      These predictions may also be experimentally tested either in vitro for folding experiments performed in the presence of molecular chaperones or in cells. Interactions between the chaperones and folding proteins may be captured using cross-linking, and the location of the interactions may be mapped using MS (
      • Sinz A.
      Cross-linking/mass spectrometry for studying protein structures and protein-protein interactions: where are we now and where should we go from here?.
      ). This combination of simulations and experiments has the potential to aid in understanding how chaperones and other components of the protein homeostasis machinery recognize and interact with folding and misfolding client proteins.
      Another area where native centric folding simulations can find application is in understanding the basis of diseases caused by protein misfolding. In misfolding diseases, such as AAT deficiency, ALS, and Alzheimer's disease, toxic gain-of-function phenotypes are correlated with aggregate accumulation, and oligomerization and aggregation are believed to proceed from specific partially folded or misfolded states (
      • Gámez A.
      • Yuste-Checa P.
      • Brasil S.
      • Briso-Montiano Á.
      • Desviat L.R.
      • Ugarte M.
      • Pérez-Cerdá C.
      • Pérez B.
      Protein misfolding diseases: prospects of pharmacological treatment.
      ). Such oligomerization-prone states could potentially be targeted by small molecules. High-throughput screening is one strategy for identifying such compounds, but for a rational design strategy to be pursued, knowledge of the structure of these pathological protein states, ideally at the level of atomic detail, is needed. In the AAT folding simulations described above, it was encouraging to see that the propensity of disease-associated mutants to misfold correlated with observed disease severity in patients. Similar correlations between MD trajectories, mutations, and disease severity have been observed for superoxide dismutase 1 and ALS (
      • Das A.
      • Plotkin S.S.
      Mechanical probes of SOD1 predict systematic trends in metal and dimer affinity of ALS-associated mutants.
      ). These results suggest that the misfolded states generated computationally may adequately represent the pathological misfolded states formed in cells. Although this hypothesis requires further experimental validation, if correct, then these computationally generated misfolded states could serve as targets for in silico drug design. We suggest that such a combination of folding simulations, in silico small molecule screening/design, and experiments could serve as a general approach for identifying potential therapies for diseases linked to protein misfolding.

      Conclusions and future directions

      Native centric simulation methods have made it possible to extend the computation of detailed folding pathways and intermediates beyond the small fast-folding model proteins that have traditionally been favored by protein-folding researchers.
      Collectively, comparisons between studies suggest that the results of all-atom approaches, such as BF/SPCS and native-centric structure-based (Gō-type) model (SBM) folding simulations share many common features, likely reflecting the importance of native contacts. SBMs are particularly helpful when interrogating how the same or similar chains fold to different structures, as demonstrated by simulating serpin folding to the native and latent conformations (
      • Giri Rao V.V.H.
      • Gosavi S.
      On the folding of a structurally complex protein to its metastable active state.
      ). BF/SPCS and other all-atom methods can address similar questions while extending the reach of simulations to questions of chemistry.
      For example, these all-atom methods can be used to predict or explain how differences in the primary structure alter folding pathways (
      • Wang F.
      • Cazzolli G.
      • Wintrode P.
      • Faccioli P.
      Folding mechanism of proteins Im7 and Im9: insight from all-atom simulations in implicit and explicit solvent.
      ). Additional, important differences between all-atom and SBM approaches are likely to arise for reactions in which transient nonnative interactions play an important role in restricting the exploration of the configuration space. Thus, the choice of method is likely to depend on computational resources and efficiencies as well as the question being asked.
      All-atom biased simulations have been successfully applied to elucidate the mechanism through which a specific small molecule accelerates the PAI-1 serpin latency transition (
      • Cazzolli G.
      • Wang F.
      • a Beccara S.
      • Gershenson A.
      • Faccioli P.
      • Wintrode P.L.
      Serpin latency transition at atomic resolution.
      ) and to predict how point mutations alter the misfolding propensity (
      • Wang F.
      • Orioli S.
      • Ianeselli A.
      • Spagnolli G.
      • a Beccara S.
      • Gershenson A.
      • Faccioli P.
      • Wintrode P.L.
      All-atom simulations reveal how single-point mutations promote serpin misfolding.
      ). Future applications are likely to include explicitly simulating interactions between proteins and small molecules as well as simulating the formation of protein oligomers during and after folding.
      SBMs have already been used to simulate the complex conformational dynamics of ribosomes (
      • Levi M.
      • Noel J.K.
      • Whitford P.C.
      Studying ribosome dynamics with simplified models.
      ), the membrane insertion of proteins (
      • Giri Rao V.V.H.
      • Desikan R.
      • Ayappa K.G.
      • Gosavi S.
      Capturing the membrane-triggered conformational cransition of an α-helical pore-forming toxin.
      ), and co-translational folding (
      • Tanaka T.
      • Hori N.
      • Takada S.
      How co-translational folding of multi-domain protein Is affected by elongation schedule: molecular simulations.
      ). Additionally, in the absence of a structure, protein or DNA sequence derived information and experimental constraints have been used in conjunction with coarse-grained and structure-based models to understand protein conformations and assembly (
      • Noel J.K.
      • Onuchic J.N.
      The many faces of structure-based potentials: from protein folding landscapes to structural characterization of complex biomolecules.
      ,
      • Morcos F.
      • Jana B.
      • Hwa T.
      • Onuchic J.N.
      Coevolutionary signals across protein lineages help capture multiple protein conformations.
      ,
      • Morcos F.
      • Pagnani A.
      • Lunt B.
      • Bertolino A.
      • Marks D.S.
      • Sander C.
      • Zecchina R.
      • Onuchic J.N.
      • Hwa T.
      • Weigt M.
      Direct-coupling analysis of residue coevolution captures native contacts across many protein families.
      ), chromosome folding (
      • Zhang B.
      • Wolynes P.G.
      Topology, structures, and energy landscapes of human chromosomes.
      ), etc. With increasing computational power, and the accumulation of biological data, we expect that both structure- and data-derived models of larger biological machines will be used to simulate and understand folding, assembly, and dynamics in cells and other physiologically relevant environments (
      • McGuffee S.R.
      • Elcock A.H.
      Diffusion, crowding & protein stability in a dynamic molecular model of the bacterial cytoplasm.
      ,
      • Earnest T.M.
      • Cole J.A.
      • Luthey-Schulten Z.
      Simulating biological processes: stochastic physics from whole cells to colonies.
      ,
      • Yu I.
      • Mori T.
      • Ando T.
      • Harada R.
      • Jung J.
      • Sugita Y.
      • Feig M.
      Biomolecular interactions modulate macromolecular structure and dynamics in atomistic model of a bacterial cytoplasm.
      ). We expect that such simulations will aid in the analysis of biological experiments at diverse length scales. We also hope that such detailed computational models will stimulate new and interesting experiments.
      The increasing availability of affordable graphics-processing units capable of running MD codes brings both native-centric simulations and simulations based on a native biasing force within the reach of many research groups at reasonable cost. (Also, freely distributed software packages such as SMOG (and the associated web server) (
      • Noel J.K.
      • Levi M.
      • Raghunathan M.
      • Lammert H.
      • Hayes R.L.
      • Onuchic J.N.
      • Whitford P.C.
      SMOG 2: a versatile software package for generating structure-based models.
      ), CafeMol (
      • Kenzaki H.
      • Koga N.
      • Hori N.
      • Kanada R.
      • Li W.
      • Okazaki K.
      • Yao X.-Q.
      • Takada S.
      CafeMol: a coarse-grained biomolecular simulator for simulating proteins at work.
      ), Go-Kit (
      • Neelamraju S.
      • Wales D.J.
      • Gosavi S.
      Go-Kit: a tool to enable energy landscape exploration of proteins.
      ), and eSBMTools (
      • Lutz B.
      • Sinner C.
      • Heuermann G.
      • Verma A.
      • Schug A.
      eSBMTools 1.0: enhanced native structure-based modeling tools.
      ) are available to facilitate the use of Gō-type simulations by nonexperts.) We hope this review will convince both experimentalists and MD simulation experts that native-biased simulations have come of age, can bridge the timescale gap between experiments and simulations, and are just as easily accessible to everyone as atomistic MD simulations. Importantly, these methods can aid in the interpretation of experiments and generate novel testable hypotheses.
      There are insights to be gained from both types of native-biased simulations in the folding of large proteins, and such insights may be hard to come by in experiments. Altogether, these computational platforms can provide useful tools for translational research aiming to identify new therapeutic strategies. Much remains to be done to expand the application of native structure-based techniques at both the larger and the atomistic length scales, and we hope this review will inspire further developments in the models, methods, and applications of these promising simulation techniques.

      Acknowledgments

      We thank current and former members of the Faccioli, Gershenson, Gosavi, and Wintrode research groups for their research on folding large proteins and helpful discussions.

      References

        • Onuchic J.N.
        • Luthey-Schulten Z.
        • Wolynes P.G.
        Theory of protein folding: the energy landscape perspective.
        Annu. Rev. Phys. Chem. 1997; 48 (9348663): 545-600
        • Dill K.A.
        • MacCallum J.L.
        The protein-folding problem, 50 years on.
        Science. 2012; 338 (23180855): 1042-1046
        • Gruebele M.
        • Dave K.
        • Sukenik S.
        Globular protein folding in vitro in vivo.
        Annu. Rev. Biophys. 2016; 45 (27391927): 233-251
        • Hebert D.N.
        • Molinari M.
        In and out of the ER: protein folding, quality control, degradation, and related human diseases.
        Physiol. Rev. 2007; 87 (17928587): 1377-1408
        • Hartl F.U.
        • Hayer-Hartl M.
        Converging concepts of protein folding in vitro in vivo.
        Nat. Struct. Mol. Biol. 2009; 16 (19491934): 574-581
        • Labbadia J.
        • Morimoto R.I.
        The biology of proteostasis in aging and disease.
        Annu. Rev. Biochem. 2015; 84 (25784053): 435-464
        • Dubnikov T.
        • Ben-Gedalya T.
        • Cohen E.
        Protein quality control in health and disease.
        Cold Spring Harb. Perspect. Biol. 2017; 9 (27864315)a023523
        • Jiang Y.
        • Kalodimos C.G.
        NMR studies of large proteins.
        J. Mol. Biol. 2017; 429 (28728982): 2667-2676
        • Sekhar A.
        • Kay L.E.
        An NMR view of protein dynamics in health and disease.
        Annu. Rev. Biophys. 2019; 48 (30901260): 297-319
        • Hekstra D.R.
        • White K.I.
        • Socolich M.A.
        • Henning R.W.
        • Šrajer V.
        • Ranganathan R.
        Electric-field-stimulated protein mechanics.
        Nature. 2016; 540 (27926732): 400-405
        • Piana S.
        • Klepeis J.L.
        • Shaw D.E.
        Assessing the accuracy of physical models used in protein-folding simulations: quantitative evidence from long molecular dynamics simulations.
        Curr. Opin. Struct. Biol. 2014; 24 (24463371): 98-105
        • Pande V.S.
        Understanding protein folding using Markov state models.
        Adv. Exp. Med. Biol. 2014; 797 (24297278): 101-106
        • Wang M.
        • Kurland C.G.
        • Caetano-Anollés G.
        Reductive evolution of proteomes and protein structures.
        Proc. Natl. Acad. Sci. U.S.A. 2011; 108 (21730144): 11954-11958
        • Gō N.
        Statistical mechanics of protein folding, unfolding and fluctuation.
        Adv. Biophys. 1976; 9 (1015397): 65-113
        • Karanicolas J.
        • Brooks 3rd, C.L.
        Improved Gō-like models demonstrate the robustness of protein folding mechanisms towards non-native interactions.
        J. Mol. Biol. 2003; 334 (14607121): 309-325
        • Papoian G.A.
        • Wolynes P.G.
        AWSEM-MD: from neural networks to protein structure prediction and functional dynamics of complex biomolecular assemblies.
        in: Papoian G.A. Coarse-grained Modeling of Biomolecules. CRC Press, Inc., Boca Raton, FL2017: 121-190
        • Bolhuis P.G.
        • Chandler D.
        • Dellago C.
        • Geissler P.L.
        Transition path sampling: throwing ropes over rough mountain passes, in the dark.
        Annu. Rev. Phys. Chem. 2002; 53 (11972010): 291-318
        • Barducci A.
        • Bonomi M.
        • Parrinello M.
        Metadynamics.
        Wiley Interdiscip. Rev. Comput. Mol. Sci. 2011; 1: 826-843
        • Bello-Rivas J.M.
        • Elber R.
        Simulations of thermodynamics and kinetics on rough energy landscapes with milestoning.
        J. Comput. Chem. 2016; 37 (26265358): 602-613
        • Elber R.
        A new paradigm for atomically detailed simulations of kinetics in biophysical systems.
        Q. Rev. Biophys. 2017; 50 (29233220): e8
        • Cabriolu R.
        • Skjelbred Refsnes K.M.
        • Bolhuis P.G.
        • van Erp T.S.
        Foundations and latest advances in replica exchange transition interface sampling.
        J. Chem. Phys. 2017; 147 (29055317)152722
        • Husic B.E.
        • Pande V.S.
        Markov state models: From an art to a science.
        J. Am. Chem. Soc. 2018; 140 (29323881): 2386-2396
        • Orioli S.
        • a Beccara S.
        • Faccioli P.
        Self-consistent calculation of protein folding pathways.
        J. Chem. Phys. 2017; 147 (064108) (28810783)
        • Li W.
        • Terakawa T.
        • Wang W.
        • Takada S.
        Energy landscape and multiroute folding of topologically complex proteins adenylate kinase and 2ouf-knot.
        Proc. Natl. Acad. Sci. U.S.A. 2012; 109 (22753508): 17789-17794
        • Giri Rao V.V.H.
        • Gosavi S.
        In the multi-domain protein adenylate kinase, domain insertion facilitates cooperative folding while accommodating function at domain interfaces.
        PLoS Comput. Biol. 2014; 10 (25393408)e1003938
        • Reddy G.
        • Liu Z.
        • Thirumalai D.
        Denaturant-dependent folding of GFP.
        Proc. Natl. Acad. Sci. U.S.A. 2012; 109 (22778437): 17832-17838
        • Halloran K.T.
        • Wang Y.
        • Arora K.
        • Chakravarthy S.
        • Irving T.C.
        • Bilsel O.
        • Brooks 3rd, C.L.
        • Matthews C.R.
        Frustration and folding of a TIM barrel protein.
        Proc. Natl. Acad. Sci. U.S.A. 2019; 116 (31346089): 16378-16383