Advertisement

Computational tools help improve protein stability but with a solubility tradeoff

      Abstract

      Accurately predicting changes in protein stability upon amino acid substitution is a much sought after goal. Destabilizing mutations are often implicated in disease, whereas stabilizing mutations are of great value for industrial and therapeutic biotechnology. Increasing protein stability is an especially challenging task, with random substitution yielding stabilizing mutations in only ∼2% of cases. To overcome this bottleneck, computational tools that aim to predict the effect of mutations have been developed; however, achieving accuracy and consistency remains challenging. Here, we combined 11 freely available tools into a meta-predictor (meieringlab.uwaterloo.ca/stabilitypredict/). Validation against ∼600 experimental mutations indicated that our meta-predictor has improved performance over any of the individual tools. The meta-predictor was then used to recommend 10 mutations in a previously designed protein of moderate thermodynamic stability, ThreeFoil. Experimental characterization showed that four mutations increased protein stability and could be amplified through ThreeFoil's structural symmetry to yield several multiple mutants with >2-kcal/mol stabilization. By avoiding residues within functional ties, we could maintain ThreeFoil's glycan-binding capacity. Despite successfully achieving substantial stabilization, however, almost all mutations decreased protein solubility, the most common cause of protein design failure. Examination of the 600-mutation data set revealed that stabilizing mutations on the protein surface tend to increase hydrophobicity and that the individual tools favor this approach to gain stability. Thus, whereas currently available tools can increase protein stability and combining them into a meta-predictor yields enhanced reliability, improvements to the potentials/force fields underlying these tools are needed to avoid gaining protein stability at the cost of solubility.
      Most natural proteins have modest thermodynamic stability, limiting their development as effective biocatalysts, biosensors, and therapeutics (
      • Rockah-Shmuel L.
      • Tóth-Petróczy Á.
      • Tawfik D.S.
      Systematic mapping of protein mutational space by prolonged drift reveals the deleterious effects of seemingly neutral mutations.
      ). Generally, increasing stability tends to increase protein production yields (
      • Kwon W.S.
      • Da Silva N.A.
      • Kellis J.T.
      Relationship between thermal stability, degradation rate and expression yield of barnase variants in the periplasm of Escherichia coli.
      ,
      • McLendon G.
      • Radany E.
      Is protein turnover thermodynamically controlled?.
      ), lengthen shelf-life (
      • Sauerborn M.
      • Brinks V.
      • Jiskoot W.
      • Schellekens H.
      Immunological mechanism underlying the immune response to recombinant human protein therapeutics.
      ,
      • Manning M.C.
      • Chou D.K.
      • Murphy B.M.
      • Payne R.W.
      • Katayama D.S.
      Stability of protein pharmaceuticals: an update.
      • Boulet-Audet M.
      • Byrne B.
      • Kazarian S.G.
      High-throughput thermal stability analysis of a monoclonal antibody by attenuated total reflection ft-ir spectroscopic imaging.
      ), and improve survival under challenging solution conditions (pH, proteases, cosolvents, etc.). Increased stability also allows the use of elevated reaction temperatures, improving reaction rates and reducing unwanted bacterial growth in industrial reactions (
      • Bommarius A.S.
      • Paye M.F.
      Stabilizing biocatalysts.
      ). Increased stability may also decrease local and/or global unfolding rates, thereby reducing the formation of aggregates and, consequently, immunogenicity (
      • Sauerborn M.
      • Brinks V.
      • Jiskoot W.
      • Schellekens H.
      Immunological mechanism underlying the immune response to recombinant human protein therapeutics.
      ,
      • Manning M.C.
      • Chou D.K.
      • Murphy B.M.
      • Payne R.W.
      • Katayama D.S.
      Stability of protein pharmaceuticals: an update.
      • Boulet-Audet M.
      • Byrne B.
      • Kazarian S.G.
      High-throughput thermal stability analysis of a monoclonal antibody by attenuated total reflection ft-ir spectroscopic imaging.
      ,
      • Redington J.M.
      • Breydo L.
      • Uversky V.N.
      When good goes awry: the aggregation of protein therapeutics.
      ). Despite its great significance, the reliable engineering of protein stability via sequence changes is still a challenging and often laborious task (
      • Magliery T.J.
      Protein stability: computation, sequence statistics, and new experimental methods.
      ). Various experimental methods, including directed evolution (
      • Romero P.A.
      • Arnold F.H.
      Exploring protein fitness landscapes by directed evolution.
      ) and deep sequencing (
      • Tripathi A.
      • Varadarajan R.
      Residue specific contributions to stability and activity inferred from saturation mutagenesis and deep sequencing.
      ,
      • Foit L.
      • Morgan G.J.
      • Kern M.J.
      • Steimer L.R.
      • von Hacht A.A.
      • Titchmarsh J.
      • Warriner S.L.
      • Radford S.E.
      • Bardwell J.C.A.
      Optimizing protein stability in vivo.
      ), are commonly used to improve protein stability in practice, but these are time-consuming and costly (
      • Bornscheuer U.T.
      • Huisman G.W.
      • Kazlauskas R.J.
      • Lutz S.
      • Moore J.C.
      • Robins K.
      Engineering the third wave of biocatalysis.
      ).
      To allow simple and cost-effective protein engineering, numerous computational tools have been developed to predict the effect of mutations on stability. These tools are readily employed by both experts and non-experts due to simple inputs (a PDB
      The abbreviations used are: PDB, Protein Data Bank; MCC, Matthews correlation coefficient; GuSCN, guanidine thiocyanate.
      1The abbreviations used are: PDB, Protein Data Bank; MCC, Matthews correlation coefficient; GuSCN, guanidine thiocyanate.
      structure) and output (a predicted change in stability). Some tools, like EGAD (
      • Pokala N.
      • Handel T.M.
      Energy functions for protein design: adjustment with protein-protein complex affinities, models for the unfolded state, and negative design of solubility and specificity.
      ), rely on physical force fields that seek to recapitulate the forces felt by a solvated protein. Others, such as PoPMuSiC (
      • Dehouck Y.
      • Grosfils A.
      • Folch B.
      • Gilis D.
      • Bogaerts P.
      • Rooman M.
      Fast and accurate predictions of protein stability changes upon mutations using statistical potentials and neural networks: Popmusic-2.0.
      ), use statistical potentials, based on the probability of particular amino acids or atoms being in contact in known structures. Combining some elements of each of the above with empirical terms (such as an explicit term for hydrogen bonding) is often referred to as an empirical potential, employed by well-established tools, such as FoldX (
      • Schymkowitz J.
      • Borg J.
      • Stricher F.
      • Nys R.
      • Rousseau F.
      • Serrano L.
      The foldx web server: an online force field.
      ) and Rosetta (
      • Kellogg E.H.
      • Leaver-Fay A.
      • Baker D.
      Role of conformational sampling in computing mutation-induced changes in protein structure and stability.
      ). Still others, like IMutant3 (
      • Capriotti E.
      • Fariselli P.
      • Rossi I.
      • Casadio R.
      A three-state prediction of single point mutations on protein stability changes.
      ), utilize a set of specific features, such as change in hydrophobicity or size, in conjunction with machine learning techniques. To date, however, only a few of these tools (FoldX (
      • Schymkowitz J.
      • Borg J.
      • Stricher F.
      • Nys R.
      • Rousseau F.
      • Serrano L.
      The foldx web server: an online force field.
      ), Rosetta-ddG (
      • Kellogg E.H.
      • Leaver-Fay A.
      • Baker D.
      Role of conformational sampling in computing mutation-induced changes in protein structure and stability.
      ), and PoPMuSiC (
      • Dehouck Y.
      • Grosfils A.
      • Folch B.
      • Gilis D.
      • Bogaerts P.
      • Rooman M.
      Fast and accurate predictions of protein stability changes upon mutations using statistical potentials and neural networks: Popmusic-2.0.
      )) have been employed to improve protein stability experimentally through the use of point mutations (
      • Gilis D.
      • McLennan H.R.
      • Dehouck Y.
      • Cabrita L.D.
      • Rooman M.
      • Bottomley S.P.
      In vitro and in silico design of α1-antitrypsin mutants with different conformational stabilities.
      • Cabrita L.D.
      • Gilis D.
      • Robertson A.L.
      • Dehouck Y.
      • Rooman M.
      • Bottomley S.P.
      Enhancing the stability and solubility of TEV protease using in silico design.
      ,
      • Yang D.-F.
      • Wei Y.-T.
      • Huang R.-B.
      Computer-aided design of the stability of pyruvate formate-lyase from Escherichia coli by site-directed mutagenesis.
      ,
      • Zhang S.-B.
      • Wu Z.-L.
      Identification of amino acid residues responsible for increased thermostability of feruloyl esterase a from Aspergillus niger using the popmusic algorithm.
      ,
      • Komor R.S.
      • Romero P.A.
      • Xie C.B.
      • Arnold F.H.
      Highly thermostable fungal cellobiohydrolase i (cel7a) engineered using predictive methods.
      ,
      • Silva I.R.
      • Larsen D.M.
      • Jers C.
      • Derkx P.
      • Meyer A.S.
      • Mikkelsen J.D.
      Enhancing RGI lyase thermostability by targeted single point mutations.
      ,
      • Song X.
      • Wang Y.
      • Shu Z.
      • Hong J.
      • Li T.
      • Yao L.
      Engineering a more thermostable blue light photo receptor Bacillus subtilis YtvA LOV domain by a computer aided rational design method.
      ,
      • Wijma H.J.
      • Floor R.J.
      • Jekel P.A.
      • Baker D.
      • Marrink S.J.
      • Janssen D.B.
      Computationally designed libraries for rapid enzyme stabilization.
      ,
      • Floor R.J.
      • Wijma H.J.
      • Colpa D.I.
      • Ramos-Silva A.
      • Jekel P.A.
      • Szymański W.
      • Feringa B.L.
      • Marrink S.J.
      • Janssen D.B.
      Computational library design for increasing haloalkane dehalogenase stability.
      ,
      • Deng Z.
      • Yang H.
      • Li J.
      • Shin H.-D.
      • Du G.
      • Liu L.
      • Chen J.
      Structure-based engineering of alkaline α-amylase from alkaliphilic alkalimonas amylolytica for improved thermostability.
      ,
      • Larsen D.M.
      • Nyffenegger C.
      • Swiniarska M.M.
      • Thygesen A.
      • Strube M.L.
      • Meyer A.S.
      • Mikkelsen J.D.
      Thermostability enhancement of an endo-1,4-β-galactanase from talaromyces stipitatus by site-directed mutagenesis.
      ,
      • Heselpoth R.D.
      • Yin Y.
      • Moult J.
      • Nelson D.C.
      Increasing the stability of the bacteriophage endolysin plyc using rationale-based foldx computational modeling.
      • Bednar D.
      • Beerens K.
      • Sebestova E.
      • Bendl J.
      • Khare S.
      • Chaloupkova R.
      • Prokop Z.
      • Brezovsky J.
      • Baker D.
      • Damborsky J.
      Fireprot: energy- and evolution-based computational design of thermostable multiple-point mutants.
      ), whereas many have only been tested on existing mutation data sets. Entirely sequence-based approaches, such as consensus design and ancestral reconstruction, have also been applied to the protein engineering problem. Whereas these methods often generate sequences with high stability, functional specificity may be sacrificed (
      • Risso V.A.
      • Gavira J.A.
      • Sanchez-Ruiz J.M.
      Thermostable and promiscuous precambrian proteins.
      ). Making specific consensus mutations, rather than changing the entire sequence, has also proven to be effective (
      • Magliery T.J.
      Protein stability: computation, sequence statistics, and new experimental methods.
      ). However, the size and quality of the multiple sequence alignments needed for this method can vary considerably from one target to another, and too much or too little diversity can be detrimental (
      • Porebski B.T.
      • Buckle A.M.
      Consensus protein design.
      ). Thus, choosing a particular tool when undertaking protein engineering to enhance protein stability can be fraught.
      To decrease reliance on any one tool and increase the chance of success when using computational methods, which remains notably low (
      • Magliery T.J.
      Protein stability: computation, sequence statistics, and new experimental methods.
      ,
      • Potapov V.
      • Cohen M.
      • Schreiber G.
      Assessing computational methods for predicting protein stability upon mutation: good on average but not in the details.
      ,
      • Li Z.
      • Yang Y.
      • Zhan J.
      • Dai L.
      • Zhou Y.
      Energy functions in de novo protein design: current challenges and future prospects.
      ), we combined diverse tools into a single meta-predictor. Meta-predictors have performed well in related predictions, such as covalent protein modification (
      • Wan J.
      • Kang S.
      • Tang C.
      • Yan J.
      • Ren Y.
      • Liu J.
      • Gao X.
      • Banerjee A.
      • Ellis L.B.M.
      • Li T.
      Meta-prediction of phosphorylation sites with weighted voting and restricted grid search parameter selection.
      ), protein–ligand binding (
      • Yang J.
      • Roy A.
      • Zhang Y.
      Protein-ligand binding site recognition using complementary binding-specific substructure comparison and sequence profile alignment.
      ), protein–protein interfaces (
      • Qin S.
      • Zhou H.-X.
      meta-ppisp: a meta web server for protein-protein interaction site prediction.
      ), disordered regions of proteins (
      • Ishida T.
      • Kinoshita K.
      Prediction of disordered regions in proteins based on the meta approach.
      ,
      • Kozlowski L.P.
      • Bujnicki J.M.
      Metadisorder: a meta-server for the prediction of intrinsic disorder in proteins.
      ), and protein aggregation (
      • Emily M.
      • Talvas A.
      • Delamarche C.
      Metamyl: a meta-predictor for amyloid proteins.
      ), where the combination of tools with differing and perhaps complementary strengths and weaknesses may allow retention of strengths while ameliorating weaknesses. We show that this stability meta-predictor performs better than any single tool when tested against a large set of experimental mutations from the Protherm database (
      • Bava K.A.
      • Gromiha M.M.
      • Uedaira H.
      • Kitajima K.
      • Sarai A.
      Protherm, version 4.0: thermodynamic database for proteins and mutants.
      ). In addition, we tested the meta-predictor by using it to predict mutations to stabilize a previously designed protein, ThreeFoil, which has moderate thermodynamic stability (
      • Broom A.
      • Ma S.M.
      • Xia K.
      • Rafalia H.
      • Trainor K.
      • Colón W.
      • Gosavi S.
      • Meiering E.M.
      Designed protein reveals structural determinants of extreme kinetic stability.
      ), as is often the case in initial design iterations and generally limits further development of protein function (
      • Bloom J.D.
      • Labthavikul S.T.
      • Otey C.R.
      • Arnold F.H.
      Protein stability promotes evolvability.
      ,
      • Tokuriki N.
      • Tawfik D.S.
      Stability effects of mutations and protein evolvability.
      ). Experimental characterization of 10 of the top predicted stabilizing mutations resulted in four stabilizing mutations that combine to yield a substantial improvement in stability. However, the mutation predicted to be the most stabilizing was experimentally the most destabilizing. Moreover, the highly stabilized multiple mutants lost considerable solubility due to increases in surface hydrophobicity. Critically, analysis of the Protherm database and individual prediction tools shows that increased surface hydrophobicity is a common and significant caveat to enhancing stability. Thus, we find that, although increasing accuracy and accounting for solubility warrant further attention, meta-prediction offers an effective, inexpensive, and extensible way to model and improve protein stability, which may be valuable for a great diversity of applications.

      Results

       Meta-prediction of stability change

      There are numerous freely available and easy-to-use tools for predicting the change in protein stability upon point mutation. These include FoldX (
      • Schymkowitz J.
      • Borg J.
      • Stricher F.
      • Nys R.
      • Rousseau F.
      • Serrano L.
      The foldx web server: an online force field.
      ), Rosetta-ddG (
      • Kellogg E.H.
      • Leaver-Fay A.
      • Baker D.
      Role of conformational sampling in computing mutation-induced changes in protein structure and stability.
      ), PoPMuSiC (
      • Dehouck Y.
      • Grosfils A.
      • Folch B.
      • Gilis D.
      • Bogaerts P.
      • Rooman M.
      Fast and accurate predictions of protein stability changes upon mutations using statistical potentials and neural networks: Popmusic-2.0.
      ), EGAD (
      • Pokala N.
      • Handel T.M.
      Energy functions for protein design: adjustment with protein-protein complex affinities, models for the unfolded state, and negative design of solubility and specificity.
      ), IMutant3 (
      • Capriotti E.
      • Fariselli P.
      • Rossi I.
      • Casadio R.
      A three-state prediction of single point mutations on protein stability changes.
      ), CUPSAT (
      • Parthiban V.
      • Gromiha M.M.
      • Schomburg D.
      Cupsat: prediction of protein stability upon point mutations.
      ), SDM (
      • Worth C.L.
      • Preissner R.
      • Blundell T.L.
      Sdm–a server for predicting effects of mutations on protein stability and malfunction.
      ), Hunter (
      • Cohen M.
      • Potapov V.
      • Schreiber G.
      Four distances between pairs of amino acids provide a precise description of their interaction.
      ), MuPro (
      • Cheng J.
      • Randall A.
      • Baldi P.
      Prediction of protein stability changes for single-site mutations using support vector machines.
      ), MultiMutate (
      • Deutsch C.
      • Krishnamoorthy B.
      Four-body scoring function for mutagenesis.
      ), and DFire (
      • Yang Y.
      • Zhou Y.
      Ab initio folding of terminal segments with secondary structures reveals the fine difference between two closely related all-atom statistical energy functions.
      ). Examining the individual performance of each of these 11 stability prediction tools against a data set of 605 mutations (supplemental Table S1) from the Protherm database (
      • Bava K.A.
      • Gromiha M.M.
      • Uedaira H.
      • Kitajima K.
      • Sarai A.
      Protherm, version 4.0: thermodynamic database for proteins and mutants.
      ), which had not been used in their training or parameterization, revealed significant differences between tools, depending on the type of mutation (e.g. solvent-exposed versus buried) (Fig. 1, a–e).
      Figure thumbnail gr1
      Figure 1.Building the meta-predictor. The performance of each tool as measured by MCC (
      • Matthews B.W.
      Comparison of the predicted and observed secondary structure of t4 phage lysozyme.
      ) is shown for different classifications of point mutations. a, polarity: decreased (red), similar (green), increased (blue). b, size: smaller (red), similar (green), larger (blue). c, solvent-accessible surface area of the WT residue: buried (red), partially exposed (green), exposed (blue). d, secondary structure of the backbone at the mutated position: helical (red), strand (green), turn (cyan), unstructured (purple). e, whether the mutation is to/from a glycine (red) or non-glycine (cyan). f, the more tools that were combined in the meta-predictor, the better the average performance, shown for MCC (black), ρ (orange), and R (green). Error bars (a–e), S.D. from 1000 analyses using 50% of the mutation data set. Individual predictors are grouped based on methodology: physical force field (EGAD), empirical potentials (FoldX and Rosetta), statistical potentials without machine learning (CUPSAT, DFire, Hunter, MultiMutate, and SDM), statistical potentials with machine learning (PoPMuSiC), and feature-based with machine learning (IMutant3 and MuPro).
      Notably, there are many cases where specific tools are particularly well or poorly suited to certain types of mutations. For instance, tools using physical or empirical potentials (EGAD, FoldX, and Rosetta-ddG) appear to more accurately predict mutations that increase hydrophobicity than those that reduce it or leave it essentially unaltered (Fig. 1a, red versus green and blue bars). By contrast, many of the tools using statistical potentials (CUPSAT, SDM, PoPMuSiC, and MuPro) show the opposite trend (Fig. 1a). Interestingly, whereas all tools are better at predicting changes to buried residues compared with partially buried or exposed residues (Fig. 1c, red versus green and blue bars), some tools (EGAD, FoldX, Rosetta-ddG, DFire, and PoPMuSiC) do fairly well overall, whereas others like MultiMutate and IMutant3 are not reliable for surface-exposed residues (Fig. 1c, blue bars).
      To utilize all of these tools to maximum effect, a meta-predictor was constructed such that the weight given to the prediction from any particular tool was based on that tool's performance against similar types of mutations from the training set of 605 mutations (see “Experimental procedures”). Cross-validation (see “Experimental procedures”) showed that the meta-predictor has improved performance as more individual tools are incorporated (Fig. 1f). The performance of the meta-predictor was superior to any individual tool based on numerous statistical measures, including correlation coefficients, precision, accuracy, and error (Fig. 2 and Table 1). Thus, combining any number of stability prediction tools weighted according to their individual performance is a useful and extensible strategy for improving performance and may be valuable for many applications, such as predicting stability changes for disease-causing mutations and identifying mutations to stabilize a protein of biotechnological importance. A web-server for the meta-predictor can be found at meieringlab.uwaterloo.ca/stabilitypredict/.
      Please note that the JBC is not responsible for the long-term archiving and maintenance of this site or any other third party hosted site.
      Figure thumbnail gr2
      Figure 2.Comparison of individual tools with the meta-predictor. The predicted ΔΔG for a point mutation is compared with the experimentally determined value for 605 mutations from the Protherm database (
      • Bava K.A.
      • Gromiha M.M.
      • Uedaira H.
      • Kitajima K.
      • Sarai A.
      Protherm, version 4.0: thermodynamic database for proteins and mutants.
      ). This comparison is performed for each of the individual prediction tools and the meta-predictor (bottom right). The equations for the linear fit are given as well as the Pearson (R) correlation coefficient based on the mean of 1000 samples of 50% of the data, as shown in .
      Table 1Comparison of individual tools with the meta-predictor
      ToolMCCRPrecisionAccuracyS.E.
      %%kcal/mol
      EGAD0.340.5250741.61
      FoldX0.380.5452781.78
      Rosetta-ddG0.320.5446752.34
      CUPSAT0.240.5544751.77
      DFire0.430.6449761.84
      Hunter0.160.3234681.89
      MultiMutate0.190.5432622.34
      SDM0.260.4637681.96
      PoPMuSiC0.330.6859791.32
      IMutant30.140.5141751.52
      MuPro0.180.4957781.52
      Meta-predictor0.480.7363821.29

       Improving ThreeFoil stability

      In an effort to improve the thermodynamic stability of ThreeFoil (which is ∼6 kcal/mol) (
      • Broom A.
      • Ma S.M.
      • Xia K.
      • Rafalia H.
      • Trainor K.
      • Colón W.
      • Gosavi S.
      • Meiering E.M.
      Designed protein reveals structural determinants of extreme kinetic stability.
      ), the meta-predictor was used to predict the change in thermodynamic stability (ΔΔG) for all point mutations (excluding cysteine) to each of ThreeFoil's 141 residues, excluding 21 residues probably involved in carbohydrate or sodium binding function (supplemental Fig. S1). Because ThreeFoil possesses 3-fold sequence and structural symmetry, the predicted ΔΔG values were averaged across symmetric positions. Thus, small structural variations between symmetric positions can provide an ensemble-like representation, improving prediction accuracy (
      • Davey J.A.
      • Chica R.A.
      Improving the accuracy of protein stability predictions with multistate design using a variety of backbone ensembles.
      ). The outcome was a total of 720 predictions, 83 (12%) of which were expected to be stabilizing. At some positions there were multiple predicted stabilizing mutations; these may be unidentified functional sites (
      • Meiering E.M.
      • Serrano L.
      • Fersht A.R.
      Effect of active site residues in barnase on activity and stability.
      ,
      • Shoichet B.K.
      • Baase W.A.
      • Kuroki R.
      • Matthews B.W.
      A relationship between protein stability and protein function.
      • Tokuriki N.
      • Stricher F.
      • Serrano L.
      • Tawfik D.S.
      How protein stability and new functions trade off.
      ) or areas where the initial design was poor (
      • Broom A.
      • Doxey A.C.
      • Lobsanov Y.D.
      • Berthin L.G.
      • Rose D.R.
      • Howell P.L.
      • McConkey B.J.
      • Meiering E.M.
      Modular evolution and the origins of symmetry: reconstruction of a three-fold symmetric globular protein.
      ). To test diverse positions, only the most stabilizing mutation at each residue was considered. An exception was the selection of two mutations at Glu-66, a completely solvent-exposed position for which 10 of the 18 possible substitutions were predicted to stabilize and where poor agreement between Rosetta and consensus design was observed during the initial development of ThreeFoil (
      • Broom A.
      • Doxey A.C.
      • Lobsanov Y.D.
      • Berthin L.G.
      • Rose D.R.
      • Howell P.L.
      • McConkey B.J.
      • Meiering E.M.
      Modular evolution and the origins of symmetry: reconstruction of a three-fold symmetric globular protein.
      ). In total, 10 point mutations predicted to stabilize were tested experimentally (Fig. 3a). These mutations were made in ThreeFoil's second symmetric subdomain, and the effect of each mutation was determined using kinetic unfolding and folding measurements (Fig. 3b, Table 2, and supplemental Fig. S2).
      Figure thumbnail gr3
      Figure 3.Experimental characterization of mutations to ThreeFoil. a, ThreeFoil's structure with sites of mutation shown as sticks. Stabilizing mutations are shown in green, destabilizing mutations in orange, and neutral in gray. b, folding and unfolding kinetics of ThreeFoil and mutants. The same colors are used as in a, with wild type shown in black and the highly stabilized multiple mutants in blue. Solid lines, fits of the data to a two-state unfolding model using Equation 3. c, predicted (light gray) and experimentally determined (dark gray) change in thermodynamic stability for ThreeFoil mutants, ΔΔGU F. Positive values indicate increased stability, calculated using folding and unfolding rate constants in the absence of denaturant (see “Experimental procedures” and ). d, fractions of mutations that are stabilizing (green, ΔΔGU F ≥ 0.2 kcal/mol), neutral (gray, 0.2 kcal/mol > ΔΔGU F > −0.2 kcal/mol), or destabilizing (orange, ΔΔGU F ≤ −0.2 kcal/mol) determined for ThreeFoil mutants designed using the meta-predictor, the Protherm database, and random mutagenesis (from deep sequencing studies (
      • Foit L.
      • Morgan G.J.
      • Kern M.J.
      • Steimer L.R.
      • von Hacht A.A.
      • Titchmarsh J.
      • Warriner S.L.
      • Radford S.E.
      • Bardwell J.C.A.
      Optimizing protein stability in vivo.
      ,
      • Deng Z.
      • Huang W.
      • Bakkalbasi E.
      • Brown N.G.
      • Adamski C.J.
      • Rice K.
      • Muzny D.
      • Gibbs R.A.
      • Palzkill T.
      Deep sequencing of systematic combinatorial libraries reveals β-lactamase sequence constraints at high resolution.
      ,
      • Araya C.L.
      • Fowler D.M.
      • Chen W.
      • Muniez I.
      • Kelly J.W.
      • Fields S.
      A fundamental protein property, thermodynamic stability, revealed solely from large-scale measurements of protein function.
      • Klesmith J.R.
      • Bacik J.-P.
      • Wrenbeck E.E.
      • Michalczyk R.
      • Whitehead T.A.
      Trade-offs between enzyme fitness and solubility illuminated by deep mutational scanning.
      ); see supplemental Table S2). Error bars, uncertainty from linear fit using Origin 5 software.
      Table 2Kinetic parameters of ThreeFoil mutants
      Mutantln (kf)ln(ku)mfmuCmidΔΔGU − F
      kcal mol−1 m−1kcal mol−1 m−1mkcal/mol
      WT−9.57 ± 0.06−22.0 ± 0.2−6.22 ± 0.333.20 ± 0.050.79 ± 0.020
      K53V−8.07 ± 0.05−21.9 ± 0.3−7.10 ± 0.243.24 ± 0.080.80 ± 0.030.8 ± 0.2
      A62V−8.66 ± 0.05−22.0 ± 0.3−6.78 ± 0.283.25 ± 0.070.79 ± 0.030.5 ± 0.2
      D85P−9.14 ± 0.06−21.9 ± 0.3−5.97 ± 0.333.19 ± 0.070.83 ± 0.030.2 ± 0.2
      D49N−9.07 ± 0.03−21.9 ± 0.1−6.37 ± 0.173.23 ± 0.030.80 ± 0.020.2 ± 0.2
      A68G−9.59 ± 0.05−22.1 ± 0.2−6.12 ± 0.293.28 ± 0.040.79 ± 0.020.0 ± 0.2
      R90L−9.99 ± 0.05−22.4 ± 0.2−5.78 ± 0.293.34 ± 0.040.81 ± 0.020.0 ± 0.2
      D93P−9.45 ± 0.06−21.9 ± 0.2−6.26 ± 0.363.13 ± 0.050.79 ± 0.030.0 ± 0.2
      E66Y−9.73 ± 0.07−21.2 ± 0.3−6.76 ± 0.383.09 ± 0.060.69 ± 0.03−0.6 ± 0.2
      E66L−9.58 ± 0.06−20.6 ± 0.2−5.60 ± 0.313.02 ± 0.050.76 ± 0.02−0.9 ± 0.2
      Q78I
      Only a single point was determined on the refolding branch for Q78I due to aggregation at intermediate denaturant concentrations. The mf from WT was used to estimate ln(kf), and values affected by this approximation are given in italic type.
      8.40 ± 0.13−16.3 ± 0.16.22 ± 0.332.70 ± 0.030.52 ± 0.032.7 ± 0.2
      MMut1−4.50 ± 0.3−20.4 ± 0.3−6.34 ± 0.312.46 ± 0.071.08 ± 0.052.1 ± 0.3
      MMut2−3.15 ± 0.17−19.6 ± 0.2−6.75 ± 0.162.32 ± 0.061.08 ± 0.032.4 ± 0.2
      a Only a single point was determined on the refolding branch for Q78I due to aggregation at intermediate denaturant concentrations. The mf from WT was used to estimate ln(kf), and values affected by this approximation are given in italic type.
      The experimental effects of the mutations on stability were mixed (Fig. 3c). Four mutations (K53V, A62V, D85P, and D49N) increased the folding rate, stabilizing ThreeFoil by 0.8, 0.5, 0.2, and 0.2 kcal/mol, respectively (Table 2 and Fig. 3 (green mutations)). Three mutations (A68G, R90L, and D93P) had no significant effect on stability, although in the case of A68G and D93P, the predicted effect was also small (Table 2). The remaining three mutations (E66Y, E66L, and Q78I), which had been predicted to be the most stabilizing, resulted in faster unfolding and stability losses of 0.6, 0.9, and 2.7 kcal/mol, respectively. Consistent with mutations being, by design, outside ThreeFoil's carbohydrate binding sites, none of the stabilizing mutations affected carbohydrate binding affinity (supplemental Fig. S3). Predictions of the individual tools differed considerably, and no single tool was reliable (supplemental Fig. S4). Overall, the 40% success rate of the meta-predictor compares favorably with an expected rate of ∼2% for random mutagenesis (based on recent deep sequencing experiments; supplemental Table S2 (
      • Foit L.
      • Morgan G.J.
      • Kern M.J.
      • Steimer L.R.
      • von Hacht A.A.
      • Titchmarsh J.
      • Warriner S.L.
      • Radford S.E.
      • Bardwell J.C.A.
      Optimizing protein stability in vivo.
      ,
      • Deng Z.
      • Huang W.
      • Bakkalbasi E.
      • Brown N.G.
      • Adamski C.J.
      • Rice K.
      • Muzny D.
      • Gibbs R.A.
      • Palzkill T.
      Deep sequencing of systematic combinatorial libraries reveals β-lactamase sequence constraints at high resolution.
      ,
      • Araya C.L.
      • Fowler D.M.
      • Chen W.
      • Muniez I.
      • Kelly J.W.
      • Fields S.
      A fundamental protein property, thermodynamic stability, revealed solely from large-scale measurements of protein function.
      • Klesmith J.R.
      • Bacik J.-P.
      • Wrenbeck E.E.
      • Michalczyk R.
      • Whitehead T.A.
      Trade-offs between enzyme fitness and solubility illuminated by deep mutational scanning.
      )).

       Taking advantage of symmetry

      By using the most stabilizing mutations and not only combining them, but repeating them in all three symmetric subdomains (Fig. 4), ThreeFoil's symmetry was leveraged to achieve substantial stabilization. Two multiple mutants were made, the first (MMut1) incorporating K53V, A62V, and D85P at all symmetric sites, and the second (MMut2) adding D49N at each symmetric site. Both multiple mutants stabilize the protein by >2 kcal/mol and increase the denaturation midpoint as a result of greatly increased refolding rates (Fig. 3 (b and c), blue mutants). For instance, whereas the wild-type has a refolding half-life in the absence of denaturant of several hours, the multiple mutants have refolding half-lives of 1 min or less under the same conditions. Although the stabilization and improvement in refolding rates represent a marked improvement, both multiple mutants are considerably less soluble (0.1–0.2 mg/ml) than the original protein (>20 mg/ml) (Table 3). Given the nature of the individual mutations (K6V/K53V/K100V and D38P/D85P/D132P replace charged and exposed side chains with hydrophobic ones, A15V/A62V/A109V increases the size of a hydrophobic side chain, and D2N/D49N/D96N replaces a charged and exposed side chain with a polar one), these results highlight a potential problem wherein selecting for the singular characteristic of improved stability has given the unintended outcome of lower solubility, probably due to increased surface hydrophobicity.
      Figure thumbnail gr4
      Figure 4.Taking advantage of symmetry to improve stability. ThreeFoil structure (PDB entry 3PG0) is shown from a side view, perpendicular to the 3-fold axis of symmetry (top) and rotated 90° to look along the axis of symmetry toward the capping β-hairpin triplets (bottom). The first, second, and third subdomains are shown in magenta, cyan, and yellow, respectively. Sites of mutations used for MMut1 (K6V/K53V/K100V, A15V/A62V/A109V, and D38P/D85P/D132P) and MMut2 (same as MMut1 plus D2N/D49N/D96N) are shown as thick sticks.
      Table 3Experimental and predicted solubility of ThreeFoil variants
      MutantSolubilityAggrescan3DCamSolPASTATANGOZipperDBZyggregatorHydrophobicity
      mg/ml
      WT23.81.001.001.001.001.001.001.00
      K53V22.50.970.780.980.010.941.080.95
      A62V21.11.000.981.001.000.880.990.98
      D85P16.90.960.991.000.991.000.920.94
      D49N25.00.980.991.001.000.940.910.95
      A68G21.91.001.011.001.001.001.001.01
      R90L16.80.920.871.000.421.001.080.95
      D93P21.20.930.981.001.001.000.920.94
      E66Y19.90.950.981.001.001.000.900.93
      E66L17.00.960.991.001.001.000.900.92
      Q78I1.41.000.960.910.021.000.990.97
      MMut10.20.790.280.860.000.631.010.72
      MMut20.10.760.150.860.010.580.730.65
      R0.710.750.930.750.690.300.75
      ρ0.600.630.580.640.200.150.59

       Stabilizing mutations tend to increase surface hydrophobicity

      Whereas protein function often relies on having at least some stability (maintenance of the native form of the protein), it is also crucial to have enough solubility (protein available in solution) to perform that function adequately. Examining the source of reduced solubility of the multiple mutants, and minor reduction in solubility of the single mutants, reveals critical problems with the potentials/force fields underlying current protein stabilization tools.
      A diverse set of factors that may contribute to reduced protein solubility have been recognized, from specific structural features like exposed hydrophobic patches (
      • Voynov V.
      • Chennamsetty N.
      • Kayser V.
      • Helk B.
      • Trout B.L.
      Predictive tools for stabilization of therapeutic proteins.
      ) and regions with high propensity to form amyloid-like structures (
      • Thompson M.J.
      • Sievers S.A.
      • Karanicolas J.
      • Ivanova M.I.
      • Baker D.
      • Eisenberg D.
      The 3D profile method for identifying fibril-forming segments of proteins.
      ) to overall properties, such as net charge (
      • Lawrence M.S.
      • Phillips K.J.
      • Liu D.R.
      Supercharging proteins can impart unusual resilience.
      ) and amino acid content (
      • Warwicker J.
      • Charonis S.
      • Curtis R.A.
      Lysine and arginine content of proteins: computational analysis suggests a new tool for solubility design.
      ,
      • Rauscher S.
      • Baud S.
      • Miao M.
      • Keeley F.W.
      • Pomès R.
      Proline and glycine control protein self-organization into elastomeric or amyloid fibrils.
      ). Many solubility prediction tools exist that take these factors into account. Applying six tools available online showed they successfully predicted the greatly reduced solubility of the multiple mutants but could not discriminate well the differences in solubility between the single mutants (Table 3). Notably, however, simply accounting for the change in hydrophobicity of the mutation (measured by change in solvation free energy of the side chains) (
      • Wimley W.C.
      • Creamer T.P.
      • White S.H.
      Solvation energies of amino acid side chains and backbone in a family of host-guest pentapeptides.
      ) gave an equally good prediction of the multiple mutants' poor behavior, as measured by both linear and rank correlation coefficients (Table 3). Indeed, 9 of the 10 mutations are to more hydrophobic residues, and this apparent preference for hydrophobic mutations, even on the protein surface, suggested a potential general problem with computational prediction.
      Whereas a few experimental studies have suggested that increasing hydrophobicity at the protein surface improves stability (
      • Cordes M.H.
      • Sauer R.T.
      Tolerance of a protein to multiple polar-to-hydrophobic surface substitutions.
      • Poso D.
      • Sessions R.B.
      • Lorch M.
      • Clarke A.R.
      Progressive stabilization of intermediate and transition states in protein folding reactions by introducing surface hydrophobic residues.
      ,
      • Funahashi J.
      • Takano K.
      • Yamagata Y.
      • Yutani K.
      Positive contribution of hydration structure on the surface of human lysozyme to the conformational stability.
      ,
      • Machius M.
      • Declerck N.
      • Huber R.
      • Wiegand G.
      Kinetic stabilization of Bacillus licheniformis α-amylase through introduction of hydrophobic residues at the surface.
      • Ayuso-Tejedor S.
      • Abián O.
      • Sancho J.
      Underexposed polar residues and protein stabilization.
      ), and the favoring of increased hydrophobicity has been reported for a few computational tools (
      • Pokala N.
      • Handel T.M.
      Energy functions for protein design: adjustment with protein-protein complex affinities, models for the unfolded state, and negative design of solubility and specificity.
      ,
      • Dantas G.
      • Kuhlman B.
      • Callender D.
      • Wong M.
      • Baker D.
      A large scale test of computational protein design: folding and stability of nine completely redesigned globular proteins.
      ,
      • Jacak R.
      • Leaver-Fay A.
      • Kuhlman B.
      Computational protein design with explicit consideration of surface hydrophobic patches.
      ), our analysis indicates that this is, in fact, a widespread phenomenon. Examining the Protherm data set shows that stabilizing mutations on the protein surface are often observed when making substitutions to more hydrophobic residues, whereas destabilizing mutations tend to be substitutions to more hydrophilic residues (Fig. 5, Protherm). Critically, we find that 10 of the 11 individual tools have a bias toward recommending hydrophobic mutations on the protein surface as stabilizing. Overall, this results in predicted stabilizing mutations increasing solvation free energy by 0.84 kcal/mol on average (Fig. 5), similar to mutating alanine to valine (
      • Wimley W.C.
      • Creamer T.P.
      • White S.H.
      Solvation energies of amino acid side chains and backbone in a family of host-guest pentapeptides.
      ). Although the tools may be recapitulating a real tendency for hydrophobic surface mutations to stabilize, the fact that protein design attempts most frequently fail because of poor solubility (
      • Huang P.-S.
      • Boyken S.E.
      • Baker D.
      The coming of age of de novo protein design.
      ) makes this a critical concern for protein engineering and design.
      Figure thumbnail gr5
      Figure 5.Stabilizing mutations on the protein surface favor increasing hydrophobicity. The changes in side chain solvation free energy, ΔΔGsolvation (kcal/mol) (
      • Wimley W.C.
      • Creamer T.P.
      • White S.H.
      Solvation energies of amino acid side chains and backbone in a family of host-guest pentapeptides.
      ) for all 229 solvent-exposed mutations in the Protherm-derived data set (positive ΔΔGsolvation indicates a mutation to a more hydrophobic residue with reduced solubility in water (
      • Wimley W.C.
      • Creamer T.P.
      • White S.H.
      Solvation energies of amino acid side chains and backbone in a family of host-guest pentapeptides.
      )) are shown as box plots. Red boxes, mutations predicted to destabilize (ΔΔGU F ≤ −0.2 kcal/mol); blue boxes, mutations predicted to stabilize (ΔΔGU F ≥ 0.2 kcal/mol). For the Protherm-derived data set (far left), red indicates mutations experimentally determined to destabilize, and blue indicates mutations experimentally determined to stabilize relative to the wild-type protein. The notched region around the median (horizontal line) of each box represents the 95% confidence interval for the median, indicating that for Protherm and the majority of the predictors, mutations on the protein surface that increase stability tend to be to more hydrophobic amino acids, whereas mutations that decrease stability tend to be to more hydrophilic amino acids (CUPSAT is the singular exception). The colored region of each box includes the middle 50% of the data, with dashed whiskers showing 1.5 times the interquartile range above and below the middle 50% (the interquartile range is the difference between the top and bottom of the colored region, or the difference between the 25th and 75th percentiles). Outliers (mutations outside the whiskers) are shown as semitransparent single points.

       Prediction tools underestimate the importance of buried polar groups

      In addition to favoring hydrophobic mutations on the protein surface, the individual tools and meta-predictor may underestimate the importance of buried polar groups. This is suggested by the failure of the Q78I mutation, predicted to stabilize by seven of the individual tools (CUPSAT, DFire, FoldX, MultiMutate, PoPMuSiC, Rosetta-ddG, and SDM, with the remaining predicting essentially no change) (supplemental Fig. S4) and estimated by the meta-predictor to give ∼1.5 kcal/mol of stability, yet resulting in much faster unfolding than WT, and a loss of ∼3 kcal/mol of stability (Fig. 3, b and c). A single buried hydrogen bond has been estimated to contribute ∼1 kcal/mol to protein stability (
      • Pace C.N.
      • Fu H.
      • Lee Fryar K.
      • Landua J.
      • Trevino S.R.
      • Schell D.
      • Thurlkill R.L.
      • Imura S.
      • Scholtz J.M.
      • Gajiwala K.
      • Sevcik J.
      • Urbanikova L.
      • Myers J.K.
      • Takano K.
      • Hebert E.J.
      • et al.
      Contribution of hydrogen bonds to protein stability.
      ). Because Gln-78 makes three buried hydrogen bonds (one side chain–side chain and two side chain–backbone) (Fig. 6), the loss of ∼3 kcal/mol of stability may be accounted for by the loss of these specific interactions. Whereas buried hydrophobic groups (e.g. -CH2−) may contribute a similar ∼1 kcal/mol to stability (
      • Pace C.N.
      • Fu H.
      • Lee Fryar K.
      • Landua J.
      • Trevino S.R.
      • Schell D.
      • Thurlkill R.L.
      • Imura S.
      • Scholtz J.M.
      • Gajiwala K.
      • Sevcik J.
      • Urbanikova L.
      • Myers J.K.
      • Takano K.
      • Hebert E.J.
      • et al.
      Contribution of hydrogen bonds to protein stability.
      ), isoleucine has only 1 additional -CH2− group compared with glutamine, suggesting that the buried hydrogen bonds were either undervalued or unrecognized by most of the prediction tools. In the case of physical or empirical force fields, this could arise from solvation terms or parameterization that penalizes the burial of polar or charged residues (
      • Bazzoli A.
      • Kelow S.P.
      • Karanicolas J.
      Enhancements to the rosetta energy function enable improved identification of small molecules that inhibit protein-protein interactions.
      ), whereas statistical potentials may inherently undervalue certain interactions (
      • Rykunov D.
      • Fiser A.
      Effects of amino acid composition, finite size of proteins, and sparse statistics on distance-dependent statistical pair potentials.
      ,
      • Rooman M.
      • Gilis D.
      Different derivations of knowledge-based potentials and analysis of their robustness and context-dependent predictive power.
      • Thomas P.D.
      • Dill K.A.
      Statistical potentials extracted from protein structures: how accurate are they?.
      ). Here, the mutation of a buried polar sidechain to a hydrophobic one illustrates the dramatic failure that results when computational methods either do not recognize the presence of buried hydrogen bonds or inaccurately predict their contribution to protein stability.
      Figure thumbnail gr6
      Figure 6.Thermodynamic contributions of buried hydrogen bonds are poorly predicted. In ThreeFoil, the wild-type glutamine at position 78 is completely buried from solvent yet forms three hydrogen bonds. a and b, views of the entire protein are shown with a 90° rotation to illustrate the buried nature and position of Gln-78. c, enlarged view illustrating hydrogen bonds. Two hydrogen bonds are with backbone oxygens, and one is with the oxygen of a serine side chain. Bonds to the glutamine's nitrogen involve the backbone oxygens from serines 22 and 24, whereas the single bond to the glutamine's oxygen involves the side chain oxygen of serine 59. Hydrogen bonds are shown as magenta dashed lines; Gln-78 is shown as orange sticks; and Ser-22, Ser-24, and Ser-59 are shown as cyan sticks.

      Discussion

      Stabilizing mutations give multiple desirable attributes to proteins: enhanced capacity for directed evolution and design of novel functions (
      • Bloom J.D.
      • Labthavikul S.T.
      • Otey C.R.
      • Arnold F.H.
      Protein stability promotes evolvability.
      ,
      • Tokuriki N.
      • Tawfik D.S.
      Stability effects of mutations and protein evolvability.
      ), increased resistance to high temperature and harsh solution conditions (
      • Bommarius A.S.
      • Paye M.F.
      Stabilizing biocatalysts.
      ,
      • Magliery T.J.
      Protein stability: computation, sequence statistics, and new experimental methods.
      ), and reduced immunogenicity caused by local or global unfolding and subsequent aggregation (
      • Sauerborn M.
      • Brinks V.
      • Jiskoot W.
      • Schellekens H.
      Immunological mechanism underlying the immune response to recombinant human protein therapeutics.
      ,
      • Manning M.C.
      • Chou D.K.
      • Murphy B.M.
      • Payne R.W.
      • Katayama D.S.
      Stability of protein pharmaceuticals: an update.
      • Boulet-Audet M.
      • Byrne B.
      • Kazarian S.G.
      High-throughput thermal stability analysis of a monoclonal antibody by attenuated total reflection ft-ir spectroscopic imaging.
      ,
      • Redington J.M.
      • Breydo L.
      • Uversky V.N.
      When good goes awry: the aggregation of protein therapeutics.
      ). The meta-prediction approach used here, which incorporated 11 diverse computational tools for predicting ΔΔG upon mutation, was better than any single tool at predicting the effects of more than 600 experimentally characterized point mutations. The meta-predictor also successfully identified stabilizing mutations to a previously designed protein, ThreeFoil. Most mutations to ThreeFoil resulted in stability changes close to the predicted values (Fig. 3), with a success rate of 40%, comparing favorably with other studies (
      • Komor R.S.
      • Romero P.A.
      • Xie C.B.
      • Arnold F.H.
      Highly thermostable fungal cellobiohydrolase i (cel7a) engineered using predictive methods.
      ,
      • Silva I.R.
      • Larsen D.M.
      • Jers C.
      • Derkx P.
      • Meyer A.S.
      • Mikkelsen J.D.
      Enhancing RGI lyase thermostability by targeted single point mutations.
      ,
      • Wijma H.J.
      • Floor R.J.
      • Jekel P.A.
      • Baker D.
      • Marrink S.J.
      • Janssen D.B.
      Computationally designed libraries for rapid enzyme stabilization.
      ,
      • Floor R.J.
      • Wijma H.J.
      • Colpa D.I.
      • Ramos-Silva A.
      • Jekel P.A.
      • Szymański W.
      • Feringa B.L.
      • Marrink S.J.
      • Janssen D.B.
      Computational library design for increasing haloalkane dehalogenase stability.
      ,
      • Heselpoth R.D.
      • Yin Y.
      • Moult J.
      • Nelson D.C.
      Increasing the stability of the bacteriophage endolysin plyc using rationale-based foldx computational modeling.
      ). Moreover, the stabilizing single mutations could be combined to yield highly stabilized multiple mutants. Because the meta-predictor weighs the contribution of each tool according to its performance on different mutation types (increased size, decreased polarity, etc.), it is readily extensible, allowing for future improvements using complementary or improved individual tools.
      The experimental results are also notable because functional residues, which are often sources of instability and easy targets for improvement, were excluded (
      • Meiering E.M.
      • Serrano L.
      • Fersht A.R.
      Effect of active site residues in barnase on activity and stability.
      ,
      • Shoichet B.K.
      • Baase W.A.
      • Kuroki R.
      • Matthews B.W.
      A relationship between protein stability and protein function.
      • Tokuriki N.
      • Stricher F.
      • Serrano L.
      • Tawfik D.S.
      How protein stability and new functions trade off.
      ). Indeed, binding function was maintained upon mutation. The amplification of stability (>2 kcal/mol) gained by leveraging the sequence and structural symmetry of ThreeFoil (
      • Broom A.
      • Doxey A.C.
      • Lobsanov Y.D.
      • Berthin L.G.
      • Rose D.R.
      • Howell P.L.
      • McConkey B.J.
      • Meiering E.M.
      Modular evolution and the origins of symmetry: reconstruction of a three-fold symmetric globular protein.
      ), demonstrates the tractability of engineering using symmetric (
      • Broom A.
      • Trainor K.
      • MacKenzie D.W.
      • Meiering E.M.
      Using natural sequences and modularity to design common and novel protein topologies.
      ) or repeat protein scaffolds (
      • Boersma Y.L.
      • Plückthun A.
      Darpins and other repeat protein scaffolds: advances in engineering and applications.
      ,
      • Wetzel S.K.
      • Settanni G.
      • Kenig M.
      • Binz H.K.
      • Plückthun A.
      Folding and unfolding mechanism of highly stable full-consensus ankyrin repeat proteins.
      ), both of which are abundant in nature (
      • Balaji S.
      Internal symmetry in protein structures: prevalence, functional relevance and evolution.
      ).
      Although the stabilization offered by the multiple mutants is substantial, there was a considerable loss in solubility, probably resulting from increases in surface hydrophobicity. All but 1 of the 10 tested mutations either introduced a hydrophobic amino acid or changed a charged residue to a polar one, and the majority of these changes occurred on the protein surface. Although it may seem counterintuitive that hydrophobic mutations at solvent-exposed positions would stabilize, it may be that even on the protein surface, hydrophobic side chains can bury relatively more surface area when in the native rather than the denatured state (
      • Poso D.
      • Sessions R.B.
      • Lorch M.
      • Clarke A.R.
      Progressive stabilization of intermediate and transition states in protein folding reactions by introducing surface hydrophobic residues.
      ). Here, an analysis of more than 200 experimentally characterized point mutations on protein surfaces reveals that when computational tools attempt to improve protein stability, they often rely on the stabilizing effect of hydrophobic mutations at the cost of solubility. Even the tools that are the most widely used and/or perform the best (high correlation coefficients and/or low error) fall into this trap, such as Rosetta, FoldX, DFire, and PoPMuSiC. Although protein solubility may be generally improved by introducing a high density of similarly charged residues on the protein surface, this leads to decreases in stability (
      • Lawrence M.S.
      • Phillips K.J.
      • Liu D.R.
      Supercharging proteins can impart unusual resilience.
      ). However, specifically designing optimized networks of surface electrostatics using a small number of mutations has been shown to improve both stability and foldability, illustrating that stability and solubility tradeoffs are not mandatory (
      • Tzul F.O.
      • Schweiker K.L.
      • Makhatadze G.I.
      Modulation of folding energy landscape by charge-charge interactions: linking experiments with computational modeling.
      ).
      The tendency of the prediction tools to favor hydrophobic mutations is not limited to the protein surface. Mutation of a buried glutamine (Gln-78) to isoleucine was predicted to be the most stabilizing, with the majority of individual tools predicting considerable stabilization and the remainder predicting a nearly neutral effect. Mutations to valine, leucine, and methionine at the same position were also predicted to stabilize. Surprisingly, the Q78I mutation was the most destabilizing mutation tested, dramatically accelerating unfolding. The results for Gln-78 point to an imbalance in force field terms (
      • Bazzoli A.
      • Kelow S.P.
      • Karanicolas J.
      Enhancements to the rosetta energy function enable improved identification of small molecules that inhibit protein-protein interactions.
      ) or systematic errors and biases in statistical potentials (
      • Rykunov D.
      • Fiser A.
      Effects of amino acid composition, finite size of proteins, and sparse statistics on distance-dependent statistical pair potentials.
      ,
      • Rooman M.
      • Gilis D.
      Different derivations of knowledge-based potentials and analysis of their robustness and context-dependent predictive power.
      • Thomas P.D.
      • Dill K.A.
      Statistical potentials extracted from protein structures: how accurate are they?.
      ); properly recognizing the key role of buried polar side-chains that can hydrogen-bond (
      • Worth C.L.
      • Blundell T.L.
      On the evolutionary conservation of hydrogen bonds made by buried polar amino acids: the hidden joists, braces and trusses of protein architecture.
      ,
      • Bolen D.W.
      • Rose G.D.
      Structure and energetics of the hydrogen-bonded backbone in protein folding.
      • Bolon D.N.
      • Marcus J.S.
      • Ross S.A.
      • Mayo S.L.
      Prudent modeling of core polar residues in computational protein design.
      ) is key to future improvements in prediction reliability.
      Accurate and reliable prediction of point mutations is critical to protein design and engineering. Because destabilizing mutations, particularly buried ones, often have larger effects than stabilizing ones (
      • Tokuriki N.
      • Stricher F.
      • Schymkowitz J.
      • Serrano L.
      • Tawfik D.S.
      The stability effects of protein mutations appear to be universally distributed.
      ), even a single incorrectly predicted mutation, like Q78I here, can jeopardize the stability of the entire protein and counteract a larger number of stabilizing mutations. Thus, getting the details right when predicting changes in stability will allow confident engineering of multiple mutants without testing individual mutations and result in reliable success during de novo design (
      • Magliery T.J.
      Protein stability: computation, sequence statistics, and new experimental methods.
      ,
      • Potapov V.
      • Cohen M.
      • Schreiber G.
      Assessing computational methods for predicting protein stability upon mutation: good on average but not in the details.
      ,
      • Li Z.
      • Yang Y.
      • Zhan J.
      • Dai L.
      • Zhou Y.
      Energy functions in de novo protein design: current challenges and future prospects.
      ). For design targets with homologous sequences, incorporating information from multiple-sequence alignments may improve reliability (
      • Bednar D.
      • Beerens K.
      • Sebestova E.
      • Bendl J.
      • Khare S.
      • Chaloupkova R.
      • Prokop Z.
      • Brezovsky J.
      • Baker D.
      • Damborsky J.
      Fireprot: energy- and evolution-based computational design of thermostable multiple-point mutants.
      ,
      • Broom A.
      • Trainor K.
      • MacKenzie D.W.
      • Meiering E.M.
      Using natural sequences and modularity to design common and novel protein topologies.
      ,
      • Berliner N.
      • Teyra J.
      • Colak R.
      • Garcia Lopez S.
      • Kim P.M.
      Combining structural modeling with ensemble machine learning to accurately predict protein fold stability and binding affinity effects upon mutation.
      ,
      • Goldenzweig A.
      • Goldsmith M.
      • Hill S.E.
      • Gertman O.
      • Laurino P.
      • Ashani Y.
      • Dym O.
      • Unger T.
      • Albeck S.
      • Prilusky J.
      • Lieberman R.L.
      • Aharoni A.
      • Silman I.
      • Sussman J.L.
      • Tawfik D.S.
      • Fleishman S.J.
      Automated structure- and sequence-based design of proteins for high bacterial expression and stability.
      • Bendl J.
      • Stourac J.
      • Sebestova E.
      • Vavra O.
      • Musil M.
      • Brezovsky J.
      • Damborsky J.
      Hotspot wizard 2.0: automated design of site-specific mutations and smart libraries in protein engineering.
      ). However, the availability of such sequences, and thus the benefit of their use, will differ from one target to another. Furthermore, choosing the appropriate level of sequence diversity can be challenging (
      • Porebski B.T.
      • Buckle A.M.
      Consensus protein design.
      ). Predictions may also be improved by accounting for backbone flexibility, either by building this directly into the tool itself (
      • Kellogg E.H.
      • Leaver-Fay A.
      • Baker D.
      Role of conformational sampling in computing mutation-induced changes in protein structure and stability.
      ) or through multistate design, which could improve the accuracy of existing predictors, although at the cost of additional computational resources (
      • Davey J.A.
      • Damry A.M.
      • Euler C.K.
      • Goto N.K.
      • Chica R.A.
      Prediction of stable globular proteins using negative design with non-native backbone ensembles.
      ). Further improvements in the reliability of meta-prediction approaches may be gained by incorporating these additional tools and methods.
      In summary, the meta-prediction approach presented here combines individual predictors to improve performance, as demonstrated in testing against a large data set of experimentally characterized mutations and by finding new stabilizing mutations to a previously designed protein while retaining its function. This approach, wherein the weight given to any one tool's prediction is proportional to its performance on similar types of mutations, is extensible to future tools and offers a simple method to gain additional reliability when improving protein stability. Critically, however, the protein engineering tools used here have a marked tendency to improve protein stability by mutating residues on a protein's surface to more hydrophobic side chains, which commonly results in a loss of solubility. This loss of solubility is a key issue that must be resolved for protein engineering to be employed widely and reliably, with a transformative impact on biotechnology.

      Experimental procedures

       Stability prediction by individual tools

      The following individual tools were used for ΔΔG predictions: CUPSAT (web server: cupsat.tu-bs.de/)3 (
      • Parthiban V.
      • Gromiha M.M.
      • Schomburg D.
      Cupsat: prediction of protein stability upon point mutations.
      ), DFIRE2 (stand-alone executable, version 1.1; wild-type and mutant structures were generated using SCWRL4 (
      • Krivov G.G.
      • Shapovalov M.V.
      • Dunbrack R.L.
      Improved prediction of protein side-chain conformations with scwrl4.
      )) (
      • Yang Y.
      • Zhou Y.
      Ab initio folding of terminal segments with secondary structures reveals the fine difference between two closely related all-atom statistical energy functions.
      ), EGAD (stand-alone library) (
      • Pokala N.
      • Handel T.M.
      Energy functions for protein design: adjustment with protein-protein complex affinities, models for the unfolded state, and negative design of solubility and specificity.
      ), FoldX (stand-alone executable, version 3.0) (
      • Schymkowitz J.
      • Borg J.
      • Stricher F.
      • Nys R.
      • Rousseau F.
      • Serrano L.
      The foldx web server: an online force field.
      ), Hunter (stand-alone executable) (
      • Cohen M.
      • Potapov V.
      • Schreiber G.
      Four distances between pairs of amino acids provide a precise description of their interaction.
      ), IMutant3 (stand-alone executable, version 3.0.7) (
      • Capriotti E.
      • Fariselli P.
      • Rossi I.
      • Casadio R.
      A three-state prediction of single point mutations on protein stability changes.
      ), MultiMutate (stand-alone executable) (
      • Deutsch C.
      • Krishnamoorthy B.
      Four-body scoring function for mutagenesis.
      ), MuPro (stand-alone executable, version 1.1) (
      • Cheng J.
      • Randall A.
      • Baldi P.
      Prediction of protein stability changes for single-site mutations using support vector machines.
      ), PoPMuSiC (web server, version 2.0; currently only version 2.1 is available) (
      • Dehouck Y.
      • Grosfils A.
      • Folch B.
      • Gilis D.
      • Bogaerts P.
      • Rooman M.
      Fast and accurate predictions of protein stability changes upon mutations using statistical potentials and neural networks: Popmusic-2.0.
      ), Rosetta-ddG (stand-alone executable (ddg_monomer) using protocol 16, which incorporates backbone flexibility, Rosetta version 3.5) (
      • Kellogg E.H.
      • Leaver-Fay A.
      • Baker D.
      Role of conformational sampling in computing mutation-induced changes in protein structure and stability.
      ), and SDM (web server: mordred.bioc.cam.ac.uk/∼sdm/sdm.php)3 (
      • Worth C.L.
      • Preissner R.
      • Blundell T.L.
      Sdm–a server for predicting effects of mutations on protein stability and malfunction.
      ). For predictions of point mutations to ThreeFoil, the PDB structure 3PG0 was used. For predictions of mutations in the Protherm database (
      • Bava K.A.
      • Gromiha M.M.
      • Uedaira H.
      • Kitajima K.
      • Sarai A.
      Protherm, version 4.0: thermodynamic database for proteins and mutants.
      ), the structure associated with the particular entry in the database was used and reduced to the residues indicated in the database (i.e. for multidomain proteins). For prediction tools requiring temperature or pH as part of the input, a temperature of 27 °C was used and a pH of 8.1 for consistency with experimental conditions used in testing ThreeFoil. Note that values of the change in free energy for the mutants is based on the convention of measuring the difference in free energy changes from folded to unfolded,
      ΔΔGUF=ΔGUF(mut)ΔGUF(WT)
      (Eq. 1)


      where ΔGU F = GUGF.
      Thus, positive ΔΔGU F values represent increased stability (output from individual tools is converted to this convention).

       Curation of the Protherm database

      We searched the Protherm database (
      • Bava K.A.
      • Gromiha M.M.
      • Uedaira H.
      • Kitajima K.
      • Sarai A.
      Protherm, version 4.0: thermodynamic database for proteins and mutants.
      ) for unique point mutations with experimentally measured ΔΔG values where the temperature and pH of the experiment was between 20 and 30 °C and between 5 and 9, respectively. Additionally, most of the prediction tools cannot account for co-factors or prosthetic groups, so proteins with such groups were also removed. This left 1663 point mutations in total. Testing the prediction tools against mutations used for their training or parameterization may give misleading results. To avoid this, we removed all 1058 mutations used in the training or parameterization of any of the tools, leaving 605 mutations to 60 proteins, including all major structural classes (results of testing against the 1058 removed mutations are given as supplemental Table S3).
      A known problem with the Protherm database is entry of ΔΔG values with the wrong sign (
      • Bava K.A.
      • Gromiha M.M.
      • Uedaira H.
      • Kitajima K.
      • Sarai A.
      Protherm, version 4.0: thermodynamic database for proteins and mutants.
      ). To correct such entries, we manually reviewed the publications reporting the 605 mutations. In addition to fixing numerous cases where the sign was entered incorrectly, we found several cases where the value reported in the publication was in kJ/mol but had been entered in kcal/mol. A table summarizing the 605 curated mutations is included as supplemental Table S1. As above, we use ΔΔGU F, with positive values indicating increased stability.

       Building the meta-predictor

      For the 605-mutation data set (supplemental Table S1), predictions were made using each of the 11 tools. Mutations were then categorized based on five criteria: 1) change in polarity as measured by ΔG of solvation (less, the same, more); 2) change in size (smaller, the same, larger); 3) solvent-accessible surface area of the wild-type residue (buried, partially exposed, fully exposed); 4) secondary structure at the site of mutation (α, β, turn, unstructured); and 5) whether the mutation was to or from glycine (because performance on mutations involving glycine tended to differ most dramatically from other amino acids).
      Measures of solvation free energy were taken from the work of Wimley et al. (
      • Wimley W.C.
      • Creamer T.P.
      • White S.H.
      Solvation energies of amino acid side chains and backbone in a family of host-guest pentapeptides.
      ), with a difference of >0.1 kcal/mol (ΔΔGsolvation) required to consider the mutation more or less polar. Measures of size were taken from the work of Darby and Creighton (
      • Creighton T.
      ), and a difference of at least 19 Å3 (approximately the size of a CH3 group) was considered larger or smaller. Exposure of the WT residues was calculated using VMD (www.ks.uiuc.edu/Research/vmd/)3 (
      • Humphrey W.
      • Dalke A.
      • Schulten K.
      VMD: visual molecular dynamics.
      ), by dividing the solvent-accessible surface area of the amino acid in the PDB structure by that of the fully exposed amino acid. Ranges for the ratios were as follows: buried, ≤0.05; partially buried, 0.05–0.20; exposed, ≥0.20. The secondary structure was calculated using DSSP (
      • Kabsch W.
      • Sander C.
      Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features.
      ).
      Predictions from each tool were weighted by the Matthews correlation coefficient (
      • Matthews B.W.
      Comparison of the predicted and observed secondary structure of t4 phage lysozyme.
      ) (MCC) for each type of mutation (Fig. 1). The MCC values were determined through cross-validation, in which the data set of 605 mutations was split into halves, with one half used to determine MCC values as weights and the other half used to test overall performance (Fig. 1f); this was repeated 1000 times. Final reported performance values are an average of those 1000 tests (Fig. 1f and Table 1), and the scatter plot in Fig. 2 shows the average value for each mutation when it was not used to train the weights. Overall, the meta-predictor score was calculated as follows,
      ΔΔGmeta=itoolsjtypeswi,j×ΔΔGtooliitoolsjtypeswi,j
      (Eq. 2)


      where wi,j is the weight for the ith tool on the jth mutation type.
      Weights for each tool and each type of mutation are given in supplemental Table S4. In the case of the work presented here, all 11 tools were used, but as shown in Fig. 1, benefit can still be derived from combining a subset of those tools if not all are readily available. The web-server for the meta-predictor (meieringlab.uwaterloo.ca/stabilitypredict/) does not include the PoPMuSiC predictor, but it can be used individually from the authors' website (dezyme.com).

       Meta-prediction of ThreeFoil point mutations

      The ΔΔG of point mutations to ThreeFoil were predicted with each of the 11 tools and then combined into a meta-ΔΔG as above (see Equation 1). To preserve function, we excluded 18 residues in the three symmetric carbohydrate-binding sites (Asp-17/64/111, Ile-30/77/124, Tyr-32/79/126, Ser-35/82/129, Asn-39/86/133, and Gln-40/87/134) and three residues (Asn-28/75/122) in the single sodium-binding site, 21 residues in total. This left 120 residues to test. ThreeFoil is well-behaved in solution, and to preserve this, we chose to avoid incorporation of cysteine, which could cause aberrant cross-linking and aggregation (ThreeFoil has no cysteine residues). Given the remaining set of 18 natural amino acids (not including the WT amino acid) and 120 positions, we tested a total of 2160 mutations with each tool (except for EGAD, which cannot predict mutations to or from glycine or proline (
      • Pokala N.
      • Handel T.M.
      Energy functions for protein design: adjustment with protein-protein complex affinities, models for the unfolded state, and negative design of solubility and specificity.
      )).
      Because ThreeFoil has a 3-fold symmetric sequence and structure, we reasoned that the quality of the predictions could be improved by averaging the results across symmetric positions, because the mutation impact ought to be very similar at each position. Experimental testing of point mutations was performed using the symmetric position in the second/middle subdomain module.

       Expression and purification of ThreeFoil mutants

      ThreeFoil was expressed from a pET-28a plasmid (Novagen) in BL21 DE3 Escherichia coli cells, as described previously (
      • Broom A.
      • Ma S.M.
      • Xia K.
      • Rafalia H.
      • Trainor K.
      • Colón W.
      • Gosavi S.
      • Meiering E.M.
      Designed protein reveals structural determinants of extreme kinetic stability.
      ,
      • Broom A.
      • Doxey A.C.
      • Lobsanov Y.D.
      • Berthin L.G.
      • Rose D.R.
      • Howell P.L.
      • McConkey B.J.
      • Meiering E.M.
      Modular evolution and the origins of symmetry: reconstruction of a three-fold symmetric globular protein.
      ). Expression was induced with 1 mm isopropyl 1-thio-β-d-galactopyranoside, and cells were harvested after 24 h at 37 °C. Inclusion bodies were solubilized in buffered urea (6 m urea, 100 mm sodium phosphate, 10 mm Tris, pH 8.1), bound to a nickel-nitrilotriacetic acid column, and eluted with the same buffer as above but at pH 4.5. The protein was then refolded by dialysis (1:10, 10 times) in lyophilization buffer in 50 mm ammonium acetate and lyophilized.

       Kinetic analysis of ThreeFoil mutants

      All measurements were performed at 27 °C. Folding and unfolding were monitored by fluorescence using a SpectraMax M5 plate reader (Molecular Devices) with excitation at 274 nm and emission at 317 nm. Fluorescence was measured from the bottom of 96-well UV Star® (Greiner Bio-One) black-well plates with clear, UV-transparent bottoms and the top sealed with HD Clear sealing tape (Hampton Research) to prevent evaporation. To folded or unfolded protein (10 mg/ml ThreeFoil (∼550 μm) in standard buffer (50 mm Tris, 150 mm NaCl, pH 8.1) with or without 4 m GuSCN), varying concentrations of GuSCN in standard buffer were added, such that the final volume in each well was 250 μl at a protein concentration of 3.3 μm. For mutants with much reduced solubility under folded conditions (Q78I, MMut1, and MMut2), the final concentration was kept at 3.3 μm, but the initial stock was of lower concentration; thus, the range of denaturant concentrations that could be explored was less.
      The shortest experiments ran for 30 min, whereas the longest ran for 4 days, and the interval for taking plate readings (10 s to 30 min) was chosen such that each run would have ∼200 measurements. Each kinetic trace was fit to a single exponential equation,
      S=B+Cekt
      (Eq. 3)


      where S is the fluorescence signal, B is the offset, C is the amplitude, k is the rate constant, and t is the time in seconds. The chevron of observed rate constants, kobs, as a function of denaturant activity, A, was fit to the equation for a two-state transition between folded (f) and unfolded (u) states of the protein (
      • Maxwell K.L.
      • Wildes D.
      • Zarrine-Afsar A.
      • De Los Rios M.A.
      • Brown A.G.
      • Friel C.T.
      • Hedberg L.
      • Horng J.-C.
      • Bona D.
      • Miller E.J.
      • Vallée-Bélisle A.
      • Main E.R.G.
      • Bemporad F.
      • Qiu L.
      • Teilum K.
      • et al.
      Protein folding: defining a “standard” set of experimental conditions and a preliminary kinetic data set of two-state proteins.
      ),
      ln(kobs)=ln(ef+mfA+eu+muA)
      (Eq. 4)


      where mfand mu are the linear denaturant dependence of folding and unfolding, respectively, and f = ln(kfH2O) and u = ln(kuH2O) are the natural logarithms of the respective folding and unfolding rate constants in water (measured in s−1). The denaturant activity, A, was calculated as by Cota and Clarke (
      • Cota E.
      • Clarke J.
      Folding of β-sandwich proteins: three-state transition of a fibronectin type iii module.
      ),
      A=[GuSCN]×(C0.5/(C0.5+[GuSCN]))
      (Eq. 5)


      where C0.5 is the [GuSCN] at half activity, equal to 6.47 m. The Gibbs free energy of unfolding, also referred to as the thermodynamic stability of the protein, was calculated from the ratio of the folding and unfolding rate constants in the absence of denaturant,
      ΔGUF=−RTln(kuH2O/kfH2O)
      (Eq. 6)


      where R is the universal gas constant, and T, the temperature, is 300 K.

       Solubility measurements and predictions

      A saturated protein solution was obtained by adding 250 μl of buffer (150 mm NaCl, 50 mm Tris, pH 8.1) to 10 mg of lyophilized protein. After incubating for 2 h at room temperature, the sample was centrifuged at 13,300 × g for 10 min to pellet undissolved protein. The protein concentration of the supernatant was measured spectrophotometrically after a 1:20 dilution in buffer, using a molar absorption coefficient of 33,600 μl mol−1 cm−1 (
      • Broom A.
      • Doxey A.C.
      • Lobsanov Y.D.
      • Berthin L.G.
      • Rose D.R.
      • Howell P.L.
      • McConkey B.J.
      • Meiering E.M.
      Modular evolution and the origins of symmetry: reconstruction of a three-fold symmetric globular protein.
      ).
      Prediction of mutant solubility was performed using the web servers for Aggrescan3D (
      • Zambrano R.
      • Jamroz M.
      • Szczasiuk A.
      • Pujols J.
      • Kmiecik S.
      • Ventura S.
      Aggrescan3d (a3d): server for prediction of aggregation properties of protein structures.
      ) (biocomp.chem.uw.edu.pl/A3D/),3CamSol (
      • Sormanni P.
      • Aprile F.A.
      • Vendruscolo M.
      The camsol method of rational design of protein mutants with enhanced solubility.
      ), PASTA (
      • Walsh I.
      • Seno F.
      • Tosatto S.C.E.
      • Trovato A.
      Pasta 2.0: an improved server for protein aggregation prediction.
      ) (protein.bio.unipd.it/pasta2/),3 TANGO (
      • Fernandez-Escamilla A.-M.
      • Rousseau F.
      • Schymkowitz J.
      • Serrano L.
      Prediction of sequence-dependent and mutational effects on the aggregation of peptides and proteins.
      ) (tango.crg.es/),3 ZipperDB (
      • Goldschmidt L.
      • Teng P.K.
      • Riek R.
      • Eisenberg D.
      Identifying the amylome, proteins capable of forming amyloid-like fibrils.
      ) (services.mbi.ucla.edu/zipperdb/intro),3 and Zyggregator (
      • Tartaglia G.G.
      • Vendruscolo M.
      The zyggregator method for predicting protein aggregation propensities.
      ). The hydrophobicity scale for side chains based on solvation free energy was taken from Table 2 of Wimley et al. (
      • Wimley W.C.
      • Creamer T.P.
      • White S.H.
      Solvation energies of amino acid side chains and backbone in a family of host-guest pentapeptides.
      ). All values were normalized such that the WT score was 1 and predictions of reduced solubility would score lower than 1. For predictors that represent reduced solubility as lower scores, this was done by dividing the WT and mutant scores by the WT score. For predictors that represent reduced solubility as higher scores (i.e. higher aggregation propensity), this was done by dividing the inverse of each score by the inverse of the WT score.

      Author contributions

      A. B. and E. M. M. conceived of and coordinated the study and wrote the paper. A. B. and Z. J. designed, performed, and analyzed the computational predictions/modeling and wet laboratory experiments. A. B. and K. T. curated and analyzed the Protherm database. All authors reviewed the results and approved the final version of the manuscript.

      Supplementary Material

      References

        • Rockah-Shmuel L.
        • Tóth-Petróczy Á.
        • Tawfik D.S.
        Systematic mapping of protein mutational space by prolonged drift reveals the deleterious effects of seemingly neutral mutations.
        PLoS Comput. Biol. 2015; 11: e1004421
        • Kwon W.S.
        • Da Silva N.A.
        • Kellis J.T.
        Relationship between thermal stability, degradation rate and expression yield of barnase variants in the periplasm of Escherichia coli.
        Protein Eng. 1996; 9: 1197-1202
        • McLendon G.
        • Radany E.
        Is protein turnover thermodynamically controlled?.
        J. Biol. Chem. 1978; 253: 6335-6337
        • Sauerborn M.
        • Brinks V.
        • Jiskoot W.
        • Schellekens H.
        Immunological mechanism underlying the immune response to recombinant human protein therapeutics.
        Trends Pharmacol. Sci. 2010; 31: 53-59
        • Manning M.C.
        • Chou D.K.
        • Murphy B.M.
        • Payne R.W.
        • Katayama D.S.
        Stability of protein pharmaceuticals: an update.
        Pharm. Res. 2010; 27: 544-575
        • Boulet-Audet M.
        • Byrne B.
        • Kazarian S.G.
        High-throughput thermal stability analysis of a monoclonal antibody by attenuated total reflection ft-ir spectroscopic imaging.
        Anal. Chem. 2014; 86: 9786-9793
        • Bommarius A.S.
        • Paye M.F.
        Stabilizing biocatalysts.
        Chem. Soc. Rev. 2013; 42: 6534-6565
        • Redington J.M.
        • Breydo L.
        • Uversky V.N.
        When good goes awry: the aggregation of protein therapeutics.
        Protein Pept. Lett. 2017; 24: 340-347
        • Magliery T.J.
        Protein stability: computation, sequence statistics, and new experimental methods.
        Curr. Opin. Struct. Biol. 2015; 33: 161-168
        • Romero P.A.
        • Arnold F.H.
        Exploring protein fitness landscapes by directed evolution.
        Nat. Rev. Mol. Cell Biol. 2009; 10: 866-876
        • Tripathi A.
        • Varadarajan R.
        Residue specific contributions to stability and activity inferred from saturation mutagenesis and deep sequencing.
        Curr. Opin. Struct. Biol. 2014; 24: 63-71
        • Foit L.
        • Morgan G.J.
        • Kern M.J.
        • Steimer L.R.
        • von Hacht A.A.
        • Titchmarsh J.
        • Warriner S.L.
        • Radford S.E.
        • Bardwell J.C.A.
        Optimizing protein stability in vivo.
        Mol. Cell. 2009; 36: 861-871
        • Bornscheuer U.T.
        • Huisman G.W.
        • Kazlauskas R.J.
        • Lutz S.
        • Moore J.C.
        • Robins K.
        Engineering the third wave of biocatalysis.
        Nature. 2012; 485: 185-194
        • Pokala N.
        • Handel T.M.
        Energy functions for protein design: adjustment with protein-protein complex affinities, models for the unfolded state, and negative design of solubility and specificity.
        J. Mol. Biol. 2005; 347: 203-227
        • Dehouck Y.
        • Grosfils A.
        • Folch B.
        • Gilis D.
        • Bogaerts P.
        • Rooman M.
        Fast and accurate predictions of protein stability changes upon mutations using statistical potentials and neural networks: Popmusic-2.0.
        Bioinformatics. 2009; 25: 2537-2543
        • Schymkowitz J.
        • Borg J.
        • Stricher F.
        • Nys R.
        • Rousseau F.
        • Serrano L.
        The foldx web server: an online force field.
        Nucleic Acids Res. 2005; 33: W382-W388
        • Kellogg E.H.
        • Leaver-Fay A.
        • Baker D.
        Role of conformational sampling in computing mutation-induced changes in protein structure and stability.
        Proteins. 2011; 79: 830-838
        • Capriotti E.
        • Fariselli P.
        • Rossi I.
        • Casadio R.
        A three-state prediction of single point mutations on protein stability changes.
        BMC Bioinformatics. 2008; 9: S6
        • Gilis D.
        • McLennan H.R.
        • Dehouck Y.
        • Cabrita L.D.
        • Rooman M.
        • Bottomley S.P.
        In vitro and in silico design of α1-antitrypsin mutants with different conformational stabilities.
        J. Mol. Biol. 2003; 325: 581-589
        • Cabrita L.D.
        • Gilis D.
        • Robertson A.L.
        • Dehouck Y.
        • Rooman M.
        • Bottomley S.P.
        Enhancing the stability and solubility of TEV protease using in silico design.
        Protein Sci. 2007; 16: 2360-2367
        • Yang D.-F.
        • Wei Y.-T.
        • Huang R.-B.
        Computer-aided design of the stability of pyruvate formate-lyase from Escherichia coli by site-directed mutagenesis.
        Biosci. Biotechnol. Biochem. 2007; 71: 746-753
        • Zhang S.-B.
        • Wu Z.-L.
        Identification of amino acid residues responsible for increased thermostability of feruloyl esterase a from Aspergillus niger using the popmusic algorithm.
        Bioresour. Technol. 2011; 102: 2093-2096
        • Komor R.S.
        • Romero P.A.
        • Xie C.B.
        • Arnold F.H.
        Highly thermostable fungal cellobiohydrolase i (cel7a) engineered using predictive methods.
        Protein Eng. Des. Sel. 2012; 25: 827-833
        • Silva I.R.
        • Larsen D.M.
        • Jers C.
        • Derkx P.
        • Meyer A.S.
        • Mikkelsen J.D.
        Enhancing RGI lyase thermostability by targeted single point mutations.
        Appl. Microbiol. Biotechnol. 2013; 97: 9727-9735
        • Song X.
        • Wang Y.
        • Shu Z.
        • Hong J.
        • Li T.
        • Yao L.
        Engineering a more thermostable blue light photo receptor Bacillus subtilis YtvA LOV domain by a computer aided rational design method.
        PLoS Comput. Biol. 2013; 9: e1003129
        • Wijma H.J.
        • Floor R.J.
        • Jekel P.A.
        • Baker D.
        • Marrink S.J.
        • Janssen D.B.
        Computationally designed libraries for rapid enzyme stabilization.
        Protein Eng. Des. Sel. 2014; 27: 49-58
        • Floor R.J.
        • Wijma H.J.
        • Colpa D.I.
        • Ramos-Silva A.
        • Jekel P.A.
        • Szymański W.
        • Feringa B.L.
        • Marrink S.J.
        • Janssen D.B.
        Computational library design for increasing haloalkane dehalogenase stability.
        Chembiochem. 2014; 15: 1660-1672
        • Deng Z.
        • Yang H.
        • Li J.
        • Shin H.-D.
        • Du G.
        • Liu L.
        • Chen J.
        Structure-based engineering of alkaline α-amylase from alkaliphilic alkalimonas amylolytica for improved thermostability.
        Appl. Microbiol. Biotechnol. 2014; 98: 3997-4007
        • Larsen D.M.
        • Nyffenegger C.
        • Swiniarska M.M.
        • Thygesen A.
        • Strube M.L.
        • Meyer A.S.
        • Mikkelsen J.D.
        Thermostability enhancement of an endo-1,4-β-galactanase from talaromyces stipitatus by site-directed mutagenesis.
        Appl. Microbiol. Biotechnol. 2015; 99: 4245-4253
        • Heselpoth R.D.
        • Yin Y.
        • Moult J.
        • Nelson D.C.
        Increasing the stability of the bacteriophage endolysin plyc using rationale-based foldx computational modeling.
        Protein Eng. Des. Sel. 2015; 28: 85-92
        • Bednar D.
        • Beerens K.
        • Sebestova E.
        • Bendl J.
        • Khare S.
        • Chaloupkova R.
        • Prokop Z.
        • Brezovsky J.
        • Baker D.
        • Damborsky J.
        Fireprot: energy- and evolution-based computational design of thermostable multiple-point mutants.
        PLoS Comput. Biol. 2015; 11: e1004556
        • Risso V.A.
        • Gavira J.A.
        • Sanchez-Ruiz J.M.
        Thermostable and promiscuous precambrian proteins.
        Environ. Microbiol. 2014; 16: 1485-1489
        • Porebski B.T.
        • Buckle A.M.
        Consensus protein design.
        Protein Eng. Des. Sel. 2016; 29: 245-251
        • Potapov V.
        • Cohen M.
        • Schreiber G.
        Assessing computational methods for predicting protein stability upon mutation: good on average but not in the details.
        Protein engineering, design & selection. 2009; 22: 553-560
        • Li Z.
        • Yang Y.
        • Zhan J.
        • Dai L.
        • Zhou Y.
        Energy functions in de novo protein design: current challenges and future prospects.
        Annu. Rev. Biophys. 2013; 42: 315-335
        • Wan J.
        • Kang S.
        • Tang C.
        • Yan J.
        • Ren Y.
        • Liu J.
        • Gao X.
        • Banerjee A.
        • Ellis L.B.M.
        • Li T.
        Meta-prediction of phosphorylation sites with weighted voting and restricted grid search parameter selection.
        Nucleic Acids Res. 2008; 36: e22
        • Yang J.
        • Roy A.
        • Zhang Y.
        Protein-ligand binding site recognition using complementary binding-specific substructure comparison and sequence profile alignment.
        Bioinformatics. 2013; 29: 2588-2595
        • Qin S.
        • Zhou H.-X.
        meta-ppisp: a meta web server for protein-protein interaction site prediction.
        Bioinformatics. 2007; 23: 3386-3387
        • Ishida T.
        • Kinoshita K.
        Prediction of disordered regions in proteins based on the meta approach.
        Bioinformatics. 2008; 24: 1344-1348
        • Kozlowski L.P.
        • Bujnicki J.M.
        Metadisorder: a meta-server for the prediction of intrinsic disorder in proteins.
        BMC Bioinformatics. 2012; 13: 111
        • Emily M.
        • Talvas A.
        • Delamarche C.
        Metamyl: a meta-predictor for amyloid proteins.
        PLoS One. 2013; 8: e79722
        • Bava K.A.
        • Gromiha M.M.
        • Uedaira H.
        • Kitajima K.
        • Sarai A.
        Protherm, version 4.0: thermodynamic database for proteins and mutants.
        Nucleic Acids Res. 2004; 32: D120-D121
        • Broom A.
        • Ma S.M.
        • Xia K.
        • Rafalia H.
        • Trainor K.
        • Colón W.
        • Gosavi S.
        • Meiering E.M.
        Designed protein reveals structural determinants of extreme kinetic stability.
        Proc. Natl. Acad. Sci. U.S.A. 2015; 112: 14605-14610
        • Bloom J.D.
        • Labthavikul S.T.
        • Otey C.R.
        • Arnold F.H.
        Protein stability promotes evolvability.
        Proc. Natl. Acad. Sci. U.S.A. 2006; 103: 5869-5874
        • Tokuriki N.
        • Tawfik D.S.
        Stability effects of mutations and protein evolvability.
        Curr. Opin. Struct. Biol. 2009; 19: 596-604
        • Parthiban V.
        • Gromiha M.M.
        • Schomburg D.
        Cupsat: prediction of protein stability upon point mutations.
        Nucleic Acids Res. 2006; 34: W239-W242
        • Worth C.L.
        • Preissner R.
        • Blundell T.L.
        Sdm–a server for predicting effects of mutations on protein stability and malfunction.
        Nucleic Acids Res. 2011; 39: W215-W222
        • Cohen M.
        • Potapov V.
        • Schreiber G.
        Four distances between pairs of amino acids provide a precise description of their interaction.
        PLoS Comput. Biol. 2009; 5: e1000470
        • Cheng J.
        • Randall A.
        • Baldi P.
        Prediction of protein stability changes for single-site mutations using support vector machines.
        Proteins. 2006; 62: 1125-1132
        • Deutsch C.
        • Krishnamoorthy B.
        Four-body scoring function for mutagenesis.
        Bioinformatics. 2007; 23: 3009-3015
        • Yang Y.
        • Zhou Y.
        Ab initio folding of terminal segments with secondary structures reveals the fine difference between two closely related all-atom statistical energy functions.
        Protein Sci. 2008; 17: 1212-1219
        • Davey J.A.
        • Chica R.A.
        Improving the accuracy of protein stability predictions with multistate design using a variety of backbone ensembles.
        Proteins. 2014; 82: 771-784
        • Meiering E.M.
        • Serrano L.
        • Fersht A.R.
        Effect of active site residues in barnase on activity and stability.
        J. Mol. Biol. 1992; 225: 585-589
        • Shoichet B.K.
        • Baase W.A.
        • Kuroki R.
        • Matthews B.W.
        A relationship between protein stability and protein function.
        Proc. Natl. Acad. Sci. U.S.A. 1995; 92: 452-456
        • Tokuriki N.
        • Stricher F.
        • Serrano L.
        • Tawfik D.S.
        How protein stability and new functions trade off.
        PLoS Comput. Biol. 2008; 4: e1000002
        • Broom A.
        • Doxey A.C.
        • Lobsanov Y.D.
        • Berthin L.G.
        • Rose D.R.
        • Howell P.L.
        • McConkey B.J.
        • Meiering E.M.
        Modular evolution and the origins of symmetry: reconstruction of a three-fold symmetric globular protein.
        Structure. 2012; 20: 161-171
        • Deng Z.
        • Huang W.
        • Bakkalbasi E.
        • Brown N.G.
        • Adamski C.J.
        • Rice K.
        • Muzny D.
        • Gibbs R.A.
        • Palzkill T.
        Deep sequencing of systematic combinatorial libraries reveals β-lactamase sequence constraints at high resolution.
        J. Mol. Biol. 2012; 424: 150-167
        • Araya C.L.
        • Fowler D.M.
        • Chen W.
        • Muniez I.
        • Kelly J.W.
        • Fields S.
        A fundamental protein property, thermodynamic stability, revealed solely from large-scale measurements of protein function.
        Proc. Natl. Acad. Sci. U.S.A. 2012; 109: 16858-16863
        • Klesmith J.R.
        • Bacik J.-P.
        • Wrenbeck E.E.
        • Michalczyk R.
        • Whitehead T.A.
        Trade-offs between enzyme fitness and solubility illuminated by deep mutational scanning.
        Proc. Natl. Acad. Sci. U.S.A. 2017; 114: 2265-2270
        • Voynov V.
        • Chennamsetty N.
        • Kayser V.
        • Helk B.
        • Trout B.L.
        Predictive tools for stabilization of therapeutic proteins.
        mAbs. 2009; 1: 580-582
        • Thompson M.J.
        • Sievers S.A.
        • Karanicolas J.
        • Ivanova M.I.
        • Baker D.
        • Eisenberg D.
        The 3D profile method for identifying fibril-forming segments of proteins.
        Proc. Natl. Acad. Sci. U.S.A. 2006; 103: 4074-4078
        • Lawrence M.S.
        • Phillips K.J.
        • Liu D.R.
        Supercharging proteins can impart unusual resilience.
        J. Am. Chem. Soc. 2007; 129: 10110-10112
        • Warwicker J.
        • Charonis S.
        • Curtis R.A.
        Lysine and arginine content of proteins: computational analysis suggests a new tool for solubility design.
        Mol. Pharm. 2014; 11: 294-303
        • Rauscher S.
        • Baud S.
        • Miao M.
        • Keeley F.W.
        • Pomès R.
        Proline and glycine control protein self-organization into elastomeric or amyloid fibrils.
        Structure. 2006; 14: 1667-1676
        • Wimley W.C.
        • Creamer T.P.
        • White S.H.
        Solvation energies of amino acid side chains and backbone in a family of host-guest pentapeptides.
        Biochemistry. 1996; 35: 5109-5124
        • Cordes M.H.
        • Sauer R.T.
        Tolerance of a protein to multiple polar-to-hydrophobic surface substitutions.
        Protein Sci. 1999; 8: 318-325
        • Poso D.
        • Sessions R.B.
        • Lorch M.
        • Clarke A.R.
        Progressive stabilization of intermediate and transition states in protein folding reactions by introducing surface hydrophobic residues.
        J. Biol. Chem. 2000; 275: 35723-35726
        • Funahashi J.
        • Takano K.
        • Yamagata Y.
        • Yutani K.
        Positive contribution of hydration structure on the surface of human lysozyme to the conformational stability.
        J. Biol. Chem. 2002; 277: 21792-21800
        • Machius M.
        • Declerck N.
        • Huber R.
        • Wiegand G.
        Kinetic stabilization of Bacillus licheniformis α-amylase through introduction of hydrophobic residues at the surface.
        J. Biol. Chem. 2003; 278: 11546-11553
        • Ayuso-Tejedor S.
        • Abián O.
        • Sancho J.
        Underexposed polar residues and protein stabilization.
        Protein Eng. Des. Sel. 2011; 24: 171-177
        • Dantas G.
        • Kuhlman B.
        • Callender D.
        • Wong M.
        • Baker D.
        A large scale test of computational protein design: folding and stability of nine completely redesigned globular proteins.
        J. Mol. Biol. 2003; 332: 449-460
        • Jacak R.
        • Leaver-Fay A.
        • Kuhlman B.
        Computational protein design with explicit consideration of surface hydrophobic patches.
        Proteins. 2012; 80: 825-838
        • Huang P.-S.
        • Boyken S.E.
        • Baker D.
        The coming of age of de novo protein design.
        Nature. 2016; 537: 320-327
        • Pace C.N.
        • Fu H.
        • Lee Fryar K.
        • Landua J.
        • Trevino S.R.
        • Schell D.
        • Thurlkill R.L.
        • Imura S.
        • Scholtz J.M.
        • Gajiwala K.
        • Sevcik J.
        • Urbanikova L.
        • Myers J.K.
        • Takano K.
        • Hebert E.J.
        • et al.
        Contribution of hydrogen bonds to protein stability.
        Protein Sci. 2014; 23: 652-661
        • Bazzoli A.
        • Kelow S.P.
        • Karanicolas J.
        Enhancements to the rosetta energy function enable improved identification of small molecules that inhibit protein-protein interactions.
        PLoS One. 2015; 10: e0140359
        • Rykunov D.
        • Fiser A.
        Effects of amino acid composition, finite size of proteins, and sparse statistics on distance-dependent statistical pair potentials.
        Proteins. 2007; 67: 559-568
        • Rooman M.
        • Gilis D.
        Different derivations of knowledge-based potentials and analysis of their robustness and context-dependent predictive power.
        Eur. J. Biochem. 1998; 254: 135-143
        • Thomas P.D.
        • Dill K.A.
        Statistical potentials extracted from protein structures: how accurate are they?.
        J. Mol. Biol. 1996; 257: 457-469
        • Broom A.
        • Trainor K.
        • MacKenzie D.W.
        • Meiering E.M.
        Using natural sequences and modularity to design common and novel protein topologies.
        Curr. Opin. Struct. Biol. 2016; 38: 26-36
        • Boersma Y.L.
        • Plückthun A.
        Darpins and other repeat protein scaffolds: advances in engineering and applications.
        Curr. Opin. Biotechnol. 2011; 22: 849-857
        • Wetzel S.K.
        • Settanni G.
        • Kenig M.
        • Binz H.K.
        • Plückthun A.
        Folding and unfolding mechanism of highly stable full-consensus ankyrin repeat proteins.
        J. Mol. Biol. 2008; 376: 241-257
        • Balaji S.
        Internal symmetry in protein structures: prevalence, functional relevance and evolution.
        Curr. Opin. Struct. Biol. 2015; 32: 156-166
        • Tzul F.O.
        • Schweiker K.L.
        • Makhatadze G.I.
        Modulation of folding energy landscape by charge-charge interactions: linking experiments with computational modeling.
        Proc. Natl. Acad. Sci. U.S.A. 2015; 112: E259-E266
        • Worth C.L.
        • Blundell T.L.
        On the evolutionary conservation of hydrogen bonds made by buried polar amino acids: the hidden joists, braces and trusses of protein architecture.
        BMC Evol. Biol. 2010; 10: 161
        • Bolen D.W.
        • Rose G.D.
        Structure and energetics of the hydrogen-bonded backbone in protein folding.
        Annu. Rev. Biochem. 2008; 77: 339-362
        • Bolon D.N.
        • Marcus J.S.
        • Ross S.A.
        • Mayo S.L.
        Prudent modeling of core polar residues in computational protein design.
        J. Mol. Biol. 2003; 329: 611-622
        • Tokuriki N.
        • Stricher F.
        • Schymkowitz J.
        • Serrano L.
        • Tawfik D.S.
        The stability effects of protein mutations appear to be universally distributed.
        J. Mol. Biol. 2007; 369: 1318-1332
        • Berliner N.
        • Teyra J.
        • Colak R.
        • Garcia Lopez S.
        • Kim P.M.
        Combining structural modeling with ensemble machine learning to accurately predict protein fold stability and binding affinity effects upon mutation.
        PLoS One. 2014; 9: e107353
        • Goldenzweig A.
        • Goldsmith M.
        • Hill S.E.
        • Gertman O.
        • Laurino P.
        • Ashani Y.
        • Dym O.
        • Unger T.
        • Albeck S.
        • Prilusky J.
        • Lieberman R.L.
        • Aharoni A.
        • Silman I.
        • Sussman J.L.
        • Tawfik D.S.
        • Fleishman S.J.
        Automated structure- and sequence-based design of proteins for high bacterial expression and stability.
        Mol. Cell. 2016; 63: 337-346
        • Bendl J.
        • Stourac J.
        • Sebestova E.
        • Vavra O.
        • Musil M.
        • Brezovsky J.
        • Damborsky J.
        Hotspot wizard 2.0: automated design of site-specific mutations and smart libraries in protein engineering.
        Nucleic Acids Res. 2016; 44: W479-W487
        • Davey J.A.
        • Damry A.M.
        • Euler C.K.
        • Goto N.K.
        • Chica R.A.
        Prediction of stable globular proteins using negative design with non-native backbone ensembles.
        Structure. 2015; 23: 2011-2021
        • Krivov G.G.
        • Shapovalov M.V.
        • Dunbrack R.L.
        Improved prediction of protein side-chain conformations with scwrl4.
        Proteins. 2009; 77: 778-795
        • Creighton T.
        Proteins: Structure and molecular properties. W. H. Freeman & Co., NY1993: 4
        • Humphrey W.
        • Dalke A.
        • Schulten K.
        VMD: visual molecular dynamics.
        J. Mol. Graph. 1996; 14: 33-38
        • Kabsch W.
        • Sander C.
        Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features.
        Biopolymers. 1983; 22: 2577-2637
        • Matthews B.W.
        Comparison of the predicted and observed secondary structure of t4 phage lysozyme.
        Biochim. Biophys. Acta. 1975; 405: 442-451
        • Maxwell K.L.
        • Wildes D.
        • Zarrine-Afsar A.
        • De Los Rios M.A.
        • Brown A.G.
        • Friel C.T.
        • Hedberg L.
        • Horng J.-C.
        • Bona D.
        • Miller E.J.
        • Vallée-Bélisle A.
        • Main E.R.G.
        • Bemporad F.
        • Qiu L.
        • Teilum K.
        • et al.
        Protein folding: defining a “standard” set of experimental conditions and a preliminary kinetic data set of two-state proteins.
        Protein Sci. 2005; 14: 602-616
        • Cota E.
        • Clarke J.
        Folding of β-sandwich proteins: three-state transition of a fibronectin type iii module.
        Protein Sci. 2000; 9: 112-120
        • Zambrano R.
        • Jamroz M.
        • Szczasiuk A.
        • Pujols J.
        • Kmiecik S.
        • Ventura S.
        Aggrescan3d (a3d): server for prediction of aggregation properties of protein structures.
        Nucleic Acids Res. 2015; 43: W306-W313
        • Sormanni P.
        • Aprile F.A.
        • Vendruscolo M.
        The camsol method of rational design of protein mutants with enhanced solubility.
        J. Mol. Biol. 2015; 427: 478-490
        • Walsh I.
        • Seno F.
        • Tosatto S.C.E.
        • Trovato A.
        Pasta 2.0: an improved server for protein aggregation prediction.
        Nucleic Acids Res. 2014; 42: W301-W307
        • Fernandez-Escamilla A.-M.
        • Rousseau F.
        • Schymkowitz J.
        • Serrano L.
        Prediction of sequence-dependent and mutational effects on the aggregation of peptides and proteins.
        Nat. Biotechnol. 2004; 22: 1302-1306
        • Goldschmidt L.
        • Teng P.K.
        • Riek R.
        • Eisenberg D.
        Identifying the amylome, proteins capable of forming amyloid-like fibrils.
        Proc. Natl. Acad. Sci. U.S.A. 2010; 107: 3487-3492
        • Tartaglia G.G.
        • Vendruscolo M.
        The zyggregator method for predicting protein aggregation propensities.
        Chem. Soc. Rev. 2008; 37: 1395-1401