The Denatured State Dictates the Topology of Two Proteins with Almost Identical Sequence but Different Native Structure and Function*

The protein folding problem is often studied by comparing the mechanisms of proteins sharing the same structure but different sequence. The recent design of the two proteins GA88 and GB88, displaying different structures and functions while sharing 88% sequence identity (49 out of 56 amino acids), allows the unique opportunity for a complementary approach. At which stage of its folding pathway does a protein commit to a given topology? Which residues are crucial in directing folding mechanisms to a given structure? By using a combination of biophysical and computational techniques, we have characterized the folding of both GA88 and GB88. We show that, contrary to expectation, GB88, characterized by a native α+β fold, displays in the denatured state a content of native-like helical structure greater than GA88, which is all-α in its native state. Both experiments and simulations indicate that such residual structure may be tuned by changing pH. Thus, despite the high sequence identity, the folding pathways for these two proteins appear to diverge as early as in the denatured state. Our results suggest a mechanism whereby protein topology is committed very early along the folding pathway, being imprinted in the residual structure of the denatured state.

Understanding the rules that govern the folding of proteins is one of the main unsolved problems in modern science (1). Current knowledge on the protein folding reaction has been achieved by extensively characterizing the folding mechanisms of simple globular proteins (2), and a comprehension of the folding pathways of larger multidomain systems is still far from being achieved. Given the diversity of protein structures and amino acid compositions, it is extremely difficult to draw general rules by studying folding kinetics of individual proteins. In fact, when considering the folding of different proteins, a comparison may be jeopardized by the variability in amino acid sequence and in the three-dimensional structure of the native and denatured states.
A powerful approach to elucidate relationships between sequence information and folding mechanism is to study proteins that differ in sequence but share the same overall fold (3-11). This strategy assumes that general correlations between amino acid sequences and folding pathways may be extrapolated by comparing folding processes of different members of a given protein family.
Generally, proteins with significant sequence similarity are expected to have a similar fold. In fact, analysis of the Protein Data Bank (PDB) reveals that a sequence similarity of 40% nearly always leads to a similar fold (12). This observation provoked Rose and Creamer (13) in 1994 to issue the "Paracelsus Challenge," whereby the protein folding community was confronted with the task of designing two proteins that are at least 50% identical but possess different folds. Amazingly, this goal was fully achieved in only 3 years, when Regan and co-workers (14) designed a sequence that, despite being 50% identical to a mostly β-sheet protein, folded into a four-helix bundle. Since then, others have achieved similarly impressive results (15). Recently, ambitious work by Bryan and co-workers (16, 17) led to the design of pairs of proteins with an extraordinarily high degree of sequence identity but different folds and different functions. In particular, the sequences of two domains from streptococcal protein G were subjected to an iterative design of heteromorphic pairs, leading the authors to produce two protein G variants: GA88, which is mostly α-helical (the 3-helix bundle protein A fold), and GB88, which displays the α+β protein G fold (Fig. 1). These two proteins share 88% sequence identity (49 out of 56 amino acids), yet they display two different structures and functions that are similar to those of the respective wild-type proteins. In parallel with studies on protein families, this protein engineering achievement offers the unique opportunity for a complementary study on protein folding mechanism addressing two key questions. 1) At which stage of its folding pathway does a protein commit to a given topology? 2) Which residues are crucial in directing folding to a given native structure?

* This work was partially supported by grants from the Italian Ministero dell'Istruzione dell'Università e della Ricerca (Grants 2007B57EAB_004, 20074TJ3ZB_005, and RBRN07BMCT_007). This work was supported, in whole or in part, by National Institutes of Health Grants GM50789 (to V. D.) and GM062154 (to P. N. B.). This work was also supported by a grant from the Department of Defense through the National Defense Science and Engineering Graduate Fellowship Program (to M. E. M.) for the MD studies. The on-line version of this article (available at http://www.jbc.org) contains supplemental Figs. S1-S3.
Here we present for the first time an extensive characterization of the folding mechanisms of GA88 and GB88 by experiment and molecular dynamics (MD) simulation. The results obtained under a variety of solvent conditions suggest the presence in the denatured state of GB88 of pH-sensitive residual structure, as indicated by a pH dependence of its mD-N value, which is not observed for GA88. The MD simulations are consistent with these findings, showing that for GB88 the non-polar solvent-accessible surface area decreases markedly at low pH. Interestingly, in analogy with earlier observations on a similar heteromorphic protein A/protein G pair sharing 59% sequence identity (i.e. A219 and G311) (18), the extent of native-like helical structure in the denatured state of GB88 (the protein G α+β fold) is greater than that of denatured GA88, which is all-α in its native conformation. Both our current and our earlier studies of this system suggest that protein topology is committed very early along the folding pathway, being "imprinted" in the residual structure of the denatured state; this weak, loosely defined topology is sufficient to dictate the pathway of folding. The significance of these observations from the perspective of previous work on the folding of proteins with the same topology but very different sequences is discussed.

EXPERIMENTAL PROCEDURES
The buffers used were 50 mM sodium phosphate from pH 8.0 to 6.3, 50 mM sodium acetate from pH 5.5 to 3.8, 50 mM sodium formate from pH 3.4 to 3.0, and 50 mM sodium phosphate/phosphoric acid from pH 2.8 to 2.0. In an effort to check whether the buffer composition had any effect on the folding experiments, we performed additional control experiments in the presence of 50 mM Tris at pH 7.2 and 8.0, 50 mM Mes at pH 6.2 and 5.5, and 50 mM sodium phosphate (first protonation) at pH 2.0. Both the folding and the unfolding reactions were essentially independent from buffer composition. All reagents were analytical grade.

Protein Expression and Purification
Bryan and co-workers (16) cloned the GA88 and GB88 genes into the vector pG58, which encodes an engineered subtilisin pro-sequence as the N terminus of the fusion protein. Proteins were purified by a previously developed affinity chromatography procedure (19). Soluble cell extract of pro-domain fusion proteins was injected onto a 5-ml Bio-Scale™ Mini Profinity eXact cartridge at 5 ml/min to allow binding, and the column was then washed with 10 column volumes of 100 mM NaPO4 (pH 7.2) to remove impurities. To cleave and elute the purified target proteins, 15 ml of 100 mM NaF in the presence of 100 mM NaPO4 (pH 7.2) was injected at 5 ml/min. After the first 10 ml, the flow was stopped, and the column was incubated for 30 min to allow complete cleavage. After elution, the purified proteins were dialyzed to remove the fluoride salt.

Equilibrium Unfolding
Equilibrium denaturations were followed on a JASCO circular dichroism (CD) spectropolarimeter (JASCO, Inc., Easton, MD), in a 1-cm quartz cuvette (Hellma). CD spectra of GA88 and GB88 were recorded between 250 and 200 nm. Protein concentrations were typically 6 μM.

Stopped-flow Measurements
Single mixing kinetic folding experiments were carried out on a Pi-star stopped-flow instrument (Applied Photophysics, Leatherhead, UK); the excitation wavelength was 280 nm, and the fluorescence emission was measured using a 320 nm cutoff glass filter. In all experiments, performed at 25 and 10 °C, refolding and unfolding were initiated by an 11-fold dilution of the denatured or the native protein with the appropriate buffer. Final protein concentrations were typically 1 μM. The observed kinetics were always independent of protein concentration (from 0.5 to 5 μM), as expected for monomolecular reactions without effects due to transient aggregation (20).

Data Analysis
Equilibrium Experiments-Assuming a standard two-state model, the urea-induced denaturation transitions were fit to a linear free-energy relation in which mD-N is the slope of the transition (proportional to the increase in solvent-accessible surface area on going from the native to the denatured state) and D50 is the midpoint of the denaturation transition. An equation that takes into account the pre- and post-transition baselines was used to fit the observed unfolding transition (21).
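The two-state analysis described above can be sketched as follows. This is a minimal illustration, not the authors' fitting code: it assumes the common linear form ΔGD-N = mD-N (D50 − [urea]) and linear native and denatured baselines, and the function names, baseline parameterization, and default temperature (10 °C) are choices made here for the example.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def unfolding_free_energy(urea, m_dn, d50):
    """Assumed linear relation: dG(D-N) = m(D-N) * (D50 - [urea])."""
    return m_dn * (d50 - urea)


def fraction_denatured(urea, m_dn, d50, temp_k=283.15):
    """Two-state population of the denatured state at a given urea concentration."""
    dg = unfolding_free_energy(urea, m_dn, d50)
    k_unfold = math.exp(-dg / (R_KCAL * temp_k))  # equilibrium constant [D]/[N]
    return k_unfold / (1.0 + k_unfold)


def two_state_signal(urea, m_dn, d50, native=(0.0, 0.0), denatured=(1.0, 0.0),
                     temp_k=283.15):
    """Observed signal with linear pre- and post-transition baselines.

    Each baseline is an (intercept, slope) pair; both are illustrative parameters.
    """
    f_d = fraction_denatured(urea, m_dn, d50, temp_k)
    y_native = native[0] + native[1] * urea
    y_denatured = denatured[0] + denatured[1] * urea
    return (1.0 - f_d) * y_native + f_d * y_denatured
```

At [urea] = D50 the two states are equally populated, so the signal lies midway between the two baselines; a non-linear least squares routine would adjust mD-N, D50, and the baseline parameters to match the measured CD signal.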
Kinetic Experiments-Analysis was performed by non-linear least squares fitting of single exponential phases using the fitting procedures provided in the Applied Photophysics software. The chevron plots were fitted, using the Kaleidagraph software package, by numerical analysis based on a two-state model in which the observed rate constant is the sum of kF and kU, the folding and unfolding rate constants, respectively. The logarithm of each microscopic rate constant was assumed to vary linearly with denaturant concentration (22).
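A two-state chevron of the kind described above can be sketched as follows. The sketch assumes the usual form in which the observed rate is the sum of the folding and unfolding rate constants, each varying exponentially with denaturant concentration; expressing mF and mU in kcal mol⁻¹ M⁻¹ (so that they sum to a kinetic mD-N) is a convention chosen here, and all names and numbers are hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def observed_rate(denaturant, k_f_water, k_u_water, m_f, m_u, temp_k=283.15):
    """Two-state chevron: k_obs = k_F + k_U.

    ln(k_F) decreases and ln(k_U) increases linearly with denaturant
    concentration; m_f and m_u are given in kcal mol^-1 M^-1.
    """
    rt = R_KCAL * temp_k
    k_f = k_f_water * math.exp(-m_f * denaturant / rt)
    k_u = k_u_water * math.exp(m_u * denaturant / rt)
    return k_f + k_u


def kinetic_m_value(m_f, m_u):
    """Kinetic estimate of m(D-N) for a two-state protein."""
    return m_f + m_u


def kinetic_stability(k_f_water, k_u_water, temp_k=283.15):
    """Kinetic estimate of dG(D-N) in water: RT ln(k_F / k_U)."""
    return R_KCAL * temp_k * math.log(k_f_water / k_u_water)
```

Comparing `kinetic_m_value` with the equilibrium mD-N is the consistency check used in the Results: agreement supports a two-state mechanism, whereas a discrepancy points to residual structure or altered exposure in the denatured state.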

Molecular Dynamics Simulations
We performed 11 all-atom, explicit solvent MD simulations for each protein, GA88 and GB88: one at 298 K (30 ns) and, at 498 K, five each at neutral and low pH (3 × 50 ns and 2 × 5 ns per pH condition), for a combined total of 700 ns (0.7 μs) of simulation time over the two proteins. The starting structures were taken from the published NMR ensembles (17) (GA88, PDB id 2jws, Model 1; GB88, PDB id 2jwu, Model 3 (298 K), Model 1 (498 K)). Low pH systems were created by protonating all aspartate and glutamate residues (neither protein contains a histidine). The simulations were all performed using our in-house MD software, in lucem molecular mechanics (ilmm), with the Levitt et al. (24) all-atom force field and the microcanonical ensemble (NVE: constant number of particles, system volume, and total energy). Non-bonded terms were treated with an 8 Å (at 498 K) or 12 Å (at 298 K) force-shifted cutoff. The proteins were minimized and solvated with explicit F3C flexible waters (25) using our standard protocol (26). Briefly, the protein was treated in vacuo for 1000 steps of steepest descent minimization. Pre-equilibrated F3C water was added within 1.8 Å of the protein to a box extending at least 10-12 Å from the protein on all sides. The water was then minimized for 1000 steps and then subjected to 500 ps of dynamics with 2-fs time steps and an additional 500 steps of steepest descent minimization. Finally, the protein was minimized for 500 steps.
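As a quick consistency check on the simulation budget quoted above, the run lengths can be tallied as follows. The constant and function names are illustrative only; the run lengths are those listed in the text.

```python
# Run lengths in ns for ONE protein, as listed in the text.
NATIVE_298K_NS = [30]                    # single control simulation at 298 K
UNFOLDING_498K_NS = [50, 50, 50, 5, 5]   # per pH condition at 498 K
PH_CONDITIONS = ("neutral", "low")


def runs_per_protein():
    """Number of independent simulations for one protein."""
    return len(NATIVE_298K_NS) + len(PH_CONDITIONS) * len(UNFOLDING_498K_NS)


def time_per_protein_ns():
    """Total simulated time for one protein, in ns."""
    return sum(NATIVE_298K_NS) + len(PH_CONDITIONS) * sum(UNFOLDING_498K_NS)


def total_time_ns(n_proteins=2):
    """Total simulated time over both proteins (GA88 and GB88), in ns."""
    return n_proteins * time_per_protein_ns()
```

The tally gives 11 runs and 350 ns per protein, so the 700 ns figure refers to the two proteins together.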
The denatured state ensemble was defined as the final 30 ns of the three longer (50 ns) 498 K simulations. The Cα root mean square deviation (r.m.s.d.) was calculated over all 56 residues as well as for just the core residues. The core was defined as the consecutive residues beginning with the N-terminal residue of the first secondary structure element as defined by He et al. (17) to the C terminus of the final element (GA88, residues 9-51; GB88, residues 1-55). The percentages of helix and solvent-accessible surface area (SASA) were calculated using our in-house implementations of the Dictionary of Protein Secondary Structure (DSSP) (27) and Lee and Richards (28) algorithms, respectively. The percentage of helix or β-structure was reported over the total 56 residues as the number of structured residues as defined by DSSP, and nonpolar SASA was reported as the sum of the SASA for all nonpolar residues (Ala, Val, Phe, Pro, Met, Ile, Leu, Trp, and Gly). When values were reported relative to the native state, the average value over the 30-ns 298 K simulation was used as the native value.
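The bookkeeping for nonpolar SASA and for values reported relative to the native state can be sketched as follows. The per-residue SASA values would come from a Lee and Richards style calculation that is not shown; the input layout and function names are assumptions made for this example.

```python
# Residue types counted as nonpolar in the text.
NONPOLAR = {"ALA", "VAL", "PHE", "PRO", "MET", "ILE", "LEU", "TRP", "GLY"}


def nonpolar_sasa(per_residue_sasa):
    """Sum SASA (in square angstroms) over nonpolar residues only.

    per_residue_sasa: iterable of (three_letter_residue_name, sasa) pairs.
    """
    return sum(area for name, area in per_residue_sasa
               if name.upper() in NONPOLAR)


def percent_of_native(value, native_value):
    """Express a property relative to its average in the 298 K native simulation."""
    return 100.0 * value / native_value


def percent_helix(n_helical_residues, n_residues=56):
    """Helix content over all 56 residues, from a DSSP-style residue count."""
    return 100.0 * n_helical_residues / n_residues
```

For example, a denatured-state frame with half as many helical residues as the native average would be reported as 50% of native helix.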
Multidimensional scaling of the all-against-all Cα r.m.s.d. matrix was performed using the R statistical package to assign conformational ensembles. The exit from the native-like cluster, or point of no return, in the three-dimensional projection of the multidimensional scaling was defined as the transition state (TS), and the preceding 5 ps was defined as the transition state ensemble, as described previously (29, 30). The average Cα r.m.s.d. to the starting structure was reported for each TS structure, as was the average pairwise Cα r.m.s.d. between all TS structures in the ensemble.
The fraction of time that residues were in contact was calculated for the native ensembles (all 30 ns of the native state), the TS ensembles (5 ps from each 498 K unfolding simulation), and the denatured state (the last 30 ns of the three long 498 K simulations). Two residues were considered in contact if they contained carbon atoms that were <5.4 Å apart or any other two non-hydrogen atoms that were <4.6 Å apart. Hydrogen bonds were measured in the denatured state (last 30 ns of the three 50-ns 498 K simulations) for specific residue pairs using the following criteria: 1) the distance between the donor hydrogen and acceptor atom was ≤2.6 Å; 2) the donor-hydrogen-acceptor angle was within 45° of linearity; and 3) the charges on the donor and acceptor atoms were < −0.3, and the charge on the hydrogen was > +0.3.
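The residue-residue contact criterion stated above can be sketched as follows. This is an illustration under assumed data structures (each residue as a list of element and coordinate records), not the in-house implementation; the function names are hypothetical.

```python
import math


def residues_in_contact(res_a, res_b, carbon_cutoff=5.4, other_cutoff=4.6):
    """Apply the contact criterion to two residues.

    Each residue is a list of (element, (x, y, z)) records in angstroms.
    Carbon-carbon pairs use the 5.4 A cutoff; any other pair of
    non-hydrogen atoms uses the 4.6 A cutoff.
    """
    for elem_a, xyz_a in res_a:
        for elem_b, xyz_b in res_b:
            if elem_a == "H" or elem_b == "H":
                continue  # hydrogens are ignored
            both_carbon = elem_a == "C" and elem_b == "C"
            cutoff = carbon_cutoff if both_carbon else other_cutoff
            if math.dist(xyz_a, xyz_b) < cutoff:
                return True
    return False


def contact_fraction(frames):
    """Fraction of frames in which a residue pair is in contact.

    frames: sequence of (res_a, res_b) tuples, one per trajectory snapshot.
    """
    frames = list(frames)
    if not frames:
        return 0.0
    hits = sum(1 for res_a, res_b in frames if residues_in_contact(res_a, res_b))
    return hits / len(frames)
```

Applied to every residue pair over an ensemble, `contact_fraction` yields the kind of fraction-of-time contact map described for the native, TS, and denatured ensembles.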

RESULTS
Equilibrium Unfolding of GA88 and GB88-To study the folding mechanism of GA88 and GB88, we carried out both equilibrium and kinetic (un)folding experiments. The results of urea-induced equilibrium denaturation of GA88 and GB88 measured at 10 °C, pH 7.2 in 50 mM sodium phosphate buffer, as monitored by far-UV circular dichroism (CD) spectroscopy, are provided in Fig. 2 (representative spectra for native and denatured GA88 and GB88 are also provided in supplemental Fig. S1). The observed transitions follow a simple two-state behavior, suggesting the absence of stable equilibrium intermediates for both proteins and indicating that these designed variants are capable of cooperative (un)folding reactions. Furthermore, the reaction was fully reversible under all conditions explored.
At physiological pH and in the absence of denaturant, the unfolding free energy of GA88 derived from a two-state analysis and a global fit of 60 wavelengths (from 250 to 220 nm) is ΔGD-N = 3.00 ± 0.18 kcal/mol, with an mD-N value of 0.62 ± 0.04 kcal mol⁻¹ M⁻¹. In the case of GB88, ΔGD-N = 2.35 ± 0.30 kcal/mol, and the mD-N value is 1.10 ± 0.08 kcal mol⁻¹ M⁻¹. Considering that the GB88 construct contains ~10 structured residues more than GA88, both mD-N values are consistent with those expected for proteins of this size, according to the BPPred database (31). Hence, the seeming difference in cooperativity, as reflected by the different mD-N values, can be accounted for by the difference in the number of structured residues between the two proteins.
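As a rough worked example, and assuming the linear relation ΔGD-N = mD-N (D50 − [urea]), the denaturation midpoints implied by the values above follow from dividing the stability in water by the m-value. These midpoints are derived here for illustration and are not values reported in the text.

```latex
\[
D_{50} = \frac{\Delta G_{D\text{-}N}^{\mathrm{H_2O}}}{m_{D\text{-}N}}, \qquad
D_{50}(\mathrm{G_A88}) \approx \frac{3.00}{0.62} \approx 4.8\ \mathrm{M}, \qquad
D_{50}(\mathrm{G_B88}) \approx \frac{2.35}{1.10} \approx 2.1\ \mathrm{M}
\]
```

Under this assumption, the higher m-value of GB88 combined with its lower stability places its transition at a markedly lower urea concentration than that of GA88.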
Folding and Unfolding Kinetics-We carried out extensive fluorescence kinetic experiments on both proteins under a variety of different experimental conditions. In particular, the folding and unfolding kinetics were investigated at several pH values, ranging from 10 to 2. In the case of GA88, it was not possible to measure reliable folding and unfolding rate constants at 25 °C over a wide range of denaturant concentrations because the rates were too fast for our stopped-flow apparatus. Thus, kinetic folding data for the two proteins were recorded at 10 °C to slow the process for GA88 and to obtain values under the same conditions for comparison in the case of GB88. In all cases, the folding and unfolding time courses were fitted satisfactorily to a single exponential decay at any final denaturant concentration (representative folding and unfolding time courses are reported as supplemental Fig. S2). Semi-logarithmic plots of the observed folding/unfolding rate constants of GA88 and GB88 versus denaturant concentration (i.e. chevron plots) at pH 7.2 are presented in Fig. 3. Both proteins displayed a V-shaped chevron plot, a hallmark of two-state folding (22). In the case of GA88, there was excellent agreement between the thermodynamic parameters obtained by equilibrium and kinetic data; however, in the case of GB88, there was a minor deviation, the mD-N value being 1.10 ± 0.08 kcal mol⁻¹ M⁻¹ from equilibrium experiments and 0.90 ± 0.05 kcal mol⁻¹ M⁻¹ from chevron plot analysis. As observed previously for other small single domain proteins (32-34), a significant difference between the mD-N values obtained from equilibrium and kinetic data suggests that there is residual structure and/or a change in the exposure of non-polar residues in the denatured state of the protein. Because the small deviation observed for GB88 is at the limit of experimental detection, we performed additional experiments under various conditions as well as MD simulations to further investigate these options, as described below.
A powerful method to address the global properties of folding transition and denatured states is the analysis of chevron plots recorded under different experimental conditions or on various site-directed variants (35). In fact, because the m-values (slopes of the unfolding and refolding arms of the chevron plots) reflect the change in accessible surface area upon (un)folding, analysis of their dependence on reaction conditions may be of diagnostic value to identify transition state movements along the reaction coordinate, as well as denatured state collapse or residual structure. In this study, we compared the folding kinetics of GA88 and GB88 at various pH values, ranging from 10 to 2. Inspection of Fig. 4 reveals that although both the stability and the m-values of GA88 are insensitive to pH and this protein is fully native even at pH 2.0 (supplemental Fig. S3), GB88 is destabilized at pH <5. However, the low stability of the latter protein and the poor definition of the observed refolding arms prevented a quantitative analysis of the m-values. Therefore, as detailed below, we resorted to investigating the folding of GB88 under stabilizing conditions.
Certain inorganic salts, such as phosphates and sulfates, favor compact protein conformations because of the preferential exclusion of solvent from the protein surface (36); this makes them potent stabilizers of both the native and the partially folded states. The chevron plots of GB88 measured at different pH values and in the presence of 0.4 M sodium sulfate are provided in Fig. 5A. As expected, the stabilizing salt allows for a better definition of the refolding arms of the chevron plots. Consequently, we carried out a quantitative analysis of folding parameters over a wide range of pH conditions in the presence of salt. Fig. 5B shows the dependence on pH of calculated mD-N, mF, and mU values for GB88. The data fit to the protonation of a single titratable group with an apparent pKa ~5. Interestingly, the mD-N decreases with decreasing pH values, suggesting that the denatured state of this small single domain protein becomes more compact at acidic conditions. Importantly, however, even at neutral pH, the observed mD-N is lower than that calculated from equilibrium experiments (i.e. 0.93 ± 0.05 kcal mol⁻¹ M⁻¹), suggesting the presence of residual structure in the denatured state under physiological conditions.
Molecular Dynamics Simulations-To further investigate the differences between GA88 and GB88, MD simulations were conducted. Five independent thermal unfolding simulations were performed for each protein at 498 K at both neutral and low pH, in addition to a simulation at room temperature (298 K) for each protein as a control. Snapshots from the thermal unfolding at neutral pH are presented in Fig. 6. The sequence positions where the two proteins display different amino acids are highlighted as balls. Interestingly, in GB88, the first hairpin (β1/β2) had a tendency to be loosely maintained in the denatured state, whereas the second hairpin (β3/β4) was more extended. In GA88, the C-terminal region tended to collapse down more than in GB88, leading to slightly more interactions involving β3 and β4 (residues 42-55). Specifically, these residues had 21.6 ± 5.5 internal residue-residue contacts in GA88 versus 18.8 ± 4.5 contacts for GB88. Finally, the central helix (Fig. 6, displayed in green) was fairly well preserved in the denatured state of GB88.
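The pH dependence of the m-values in Fig. 5B, described above as the protonation of a single titratable group with an apparent pKa of about 5, corresponds to a curve of the following general form. This is a sketch of the standard single-site titration expression; the function name and the limiting values used in the example are hypothetical and are not fitted parameters from the study.

```python
def titration_curve(ph, value_protonated, value_deprotonated, pka):
    """Observable governed by the protonation of a single titratable group.

    The observable moves from value_protonated (low pH limit) to
    value_deprotonated (high pH limit) with its midpoint at pH = pKa.
    """
    ratio = 10.0 ** (ph - pka)  # [deprotonated] / [protonated]
    return (value_protonated + value_deprotonated * ratio) / (1.0 + ratio)


# Hypothetical limits for an m-value in kcal mol^-1 M^-1, for illustration only.
example = [titration_curve(ph, 0.6, 1.0, 5.0) for ph in (2.0, 5.0, 8.0)]
```

In a fit, the two limiting values and the pKa would be the adjustable parameters, so a decrease in mD-N on acidification appears as a low-pH limit below the high-pH limit.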
A quantitative comparison of the unfolding simulations is provided in Table 1, where we report the average properties over the three independent unfolding simulations for each protein. The Cα r.m.s.d. relative to the starting structure reached over 11 Å in all cases. Although there was little change in the Cα r.m.s.d. upon lowering the pH for GA88, in the case of GB88, the Cα r.m.s.d. increased at low pH. The overall residual helical content in the denatured state at neutral pH was similar for GA88 and GB88. The helix content of both proteins increased when the pH was lowered. Interestingly, the helical content in the denatured state of GB88 at low pH was surprisingly high, being 104% of the native extent of helix content. In contrast, GA88 contained 39% of its native helix content. Although we could not directly address the helical content of the denatured states, we experimentally observed the mD-N value of GB88 to decrease with decreasing pH (Fig. 5B). This observation is consistent with the MD simulations, suggesting the denatured state of GB88 to be more structured at acidic than at neutral pH. Importantly, such a dependence was not observed in GA88, whose mD-N value was found insensitive to pH.

FEBRUARY 4, 2011 • VOLUME 286 • NUMBER 5 JOURNAL OF BIOLOGICAL CHEMISTRY 3867

The relative compaction of GB88 as compared with GA88 can also be seen in Fig. 6. This effect is reflected in the nonpolar SASAs of the two denatured states relative to their control native states (Table 1). The non-polar SASA at neutral pH was ~1000 Å² lower in the starting structures (1330 ± 83 and 1185 ± 72 Å² for GA88 and GB88, respectively). Relative to the native state, the non-polar SASA of the denatured state at neutral pH increased by 182% in the case of GA88 and by 174% for GB88. Although both proteins had some reduction of non-polar SASAs when the pH was lowered, GB88 was more sensitive to acidification, and the increase in non-polar exposure upon unfolding was reduced relative to neutral pH. This finding parallels the experimental observations, which clearly indicate the denatured state of GB88 to be more compact at acidic conditions, as reflected by a decreased mD-N value (Fig. 5B).
TS ensembles were identified for each of the simulations described above, as well as the two shorter simulations for each protein at both pH conditions (Fig. 7). The representative TS structures for GA88 show that their helical contents and overall size were quite similar at both pHs. On the other hand, the GB88 TS ensemble was very sensitive to pH, and the structures at neutral pH were much more native-like than those at low pH. The β1/β2 hairpin was more robust than the β3/β4 hairpin in the TS. Furthermore, the two hairpins did not appear to interact directly and instead behaved as different entities physically separated by the central helix.
Interestingly, the heterogeneity of the GA88 and GB88 TS ensembles at neutral pH was similar, particularly when the effect of the unstructured N terminus of GA88 was accounted for (compare the core Cα r.m.s.d. with the "To self" values in Table 2). Both ensembles are ~5 Å Cα r.m.s.d. from their respective starting structures. The pH sensitivity of the GB88 TS ensemble was dramatic; the Cα r.m.s.d. to the starting structure increased by 1.5 Å, and the average Cα r.m.s.d. between structures within the TS ensemble also increased by 1.4 Å. The core Cα r.m.s.d. was 6.2 Å for the GB88 TS ensemble at low pH, and it was quite distorted with a Cα r.m.s.d. of 7.0 Å from its starting structure (Fig. 7). In contrast, the GA88 TS remained within ~4 Å of its starting structure, and the spread within a TS ensemble was ~4.5 Å (excluding the N terminus of GA88).

FIGURE 6. Denatured state at neutral pH. GA88 (above) and GB88 (below) are colored from red to blue from the N terminus to the C terminus. The Cα atoms of differing residues (24, 25, 30, 33, 45, 49 …

TABLE 1 footnotes:
a All properties were averaged over the final 30 ns of the three long 498 K simulations, and the S.D. is indicated in parentheses. The values for the 298 K simulations are defined as 100% native.
b α-Helix was defined using DSSP (27).
c Percentage of native α-helix was calculated relative to the percent of α-helix in the 30-ns 298 K native simulation.
d Nonpolar SASA (NP SASA) was defined as the sum of the SASA for all hydrophobic residues (see "Experimental Procedures").
e Percentage of native nonpolar SASA (NP SASA) was calculated relative to the nonpolar SASA in the 30-ns 298 K native simulation.
The overall folding pathways from representative simulations (run 1 in each case) of GA88 and GB88 are presented in Fig. 8 as the reverse of the simulated unfolding process. It appears that the topology, residual structure, and interactions in the denatured state direct whether the protein will fold into the helical GA88 structure or the mixed α/β GB88 structure. In GA88, the protein has some dynamic residual structure, whereas the main chain is fairly fluid with different main chain interactions occurring over time within the collapsed state; the productive interactions with respect to folding are local along the sequence, with folding of kernels of helical structure that then dock together and consolidate in the TS ensemble. In contrast, in the case of GB88, the approximate topology of the native state was already apparent in the denatured state, with segregation of the two hairpin regions by the central helix. This helix was more stable in GB88 than in GA88 due to improved packing interactions between residues 30 and 33, which in GA88 are both Ile, whereas in the case of GB88, they are Phe and Tyr. Moreover, substitution of Gly to Ala at residue 24 and Ile to Thr at residue 25 in GB88 also increased the helical propensity of the region.
Interestingly, although the sequence of the N-terminal region is identical in the two proteins, there was a tendency for the region to form a helix in GA88 and a loose hairpin in GB88 (Figs. 7 and 8). This difference was, in large part, due to a hydrogen bond between Thr1 and Glu19, which was present in 99.8% of the GB88 denatured state structures at neutral pH. At low pH, however, the loose β1/β2 hairpin was not present because Glu19 was protonated. Although positions 1 and 19 are identical in GA88 and GB88, in the case of GA88 we did not observe this hydrogen bond in the denatured state at neutral pH. Instead, Glu19 tended to interact with the solvent, and Thr1 either interacted with the solvent or formed hydrogen bonds with Asp47 and Glu48 (over 67 and 60% of the denatured state, respectively). The backbone φ/ψ angles of residues 47 and 48 when interacting with Thr1 were compatible with forming α3 in GA88. These observations indicate that long range interactions play a critical role in the residual structure in the denatured state of these proteins.
The β3/β4 turn was also fractionally present in the denatured state of GB88 due to the presence of two side chain hydrogen bonds: Asp47-Lys50 and Asp47-Tyr45 (57 and 14% of the time at neutral pH, respectively). Of note, these hydrogen bonds were not present in the low pH denaturing simulations of GB88 due to the protonation of Asp47, nor were they present in the denatured state of GA88, where both Lys50 and Tyr45 are mutated to Leu. This interaction appears to stabilize the β3/β4 turn in GB88, thus preventing these residues from assuming an α-helical structure, as they do in GA88.

DISCUSSION
Critical insights on many problems in biology have been classically achieved using simplified model systems. Although a comprehensive understanding of the folding of large multidomain proteins is still an aspiration, the successful design of two heteromorphic proteins sharing 88% sequence identity, called GA88 and GB88 (16), provides a unique opportunity to unveil the mechanism whereby a few key residues commit the polypeptide chain to its characteristic and functionally competent native topology. Here we have characterized the folding and unfolding kinetics of GA88 and GB88 by experiment and simulation. The key findings show that both engineered proteins appear to fold via a two-state mechanism, and protein topology is committed very early along the folding pathway.

Understanding the Mechanism by Which Proteins Commit to Their Topology Highlights the Role of Long Range Interactions in Denatured
States-For the purpose of this study, it is essential to understand the mechanism whereby a few key residues univocally determine native topology, i.e. which structural determinants preclude the sequence of GA88 from adopting the GB88 topology and vice versa? Experimentally, we clearly detected a difference in the denatured state properties of the two proteins. In fact, we observed the mD-N value of GB88 to decrease with decreasing pH (Fig. 5B). This observation is consistent with the MD simulations, which suggest that the denatured state of GB88 is more structured at acidic than at neutral pH, as mirrored both by its native helical content and by its solvent-accessible surface area (Table 1). Importantly, such a dependence was not observed in GA88, whose mD-N value was found experimentally to be insensitive to pH. In summary, although only 7 of the 56 amino acids differ between GA88 and GB88, only the latter displays a detectable residual structure in its denatured state. Such a structure may be tuned by changing pH, as reflected both by analysis of mD-N values as a function of pH (apparent pKa ~5) and by comparison of the MD simulations at neutral and low pH. Surprisingly, the residues that differ between the two proteins (Fig. 1) do not include amino acids titrating below neutral pH, suggesting that the observed compaction of the denatured state of GB88 originates from non-local effects. Indeed, the MD simulations of GB88 highlight the presence of side chain hydrogen bonds in the β3/β4 hairpin turn in the denatured state at neutral pH. These hydrogen bonds involve residue Asp47, which is protonated in the low pH simulations and does not form hydrogen bonds with either Tyr45 or Lys50 as it does at neutral pH. Interestingly, residues 45 and 50 are both mutated to Leu in GA88 and are part of the α3 helix. However, Asp47 along with Glu48 tends to interact with Thr1 in the denatured state of GA88. This interaction favored backbone dihedral angles that were compatible with an α-helix rather than with the β3/β4 turn, as in GB88.
(24, 25, 30, 33, 45, 49, and 50) are shown as red balls.
The fraction of time in contact for GA88 (above) and GB88 (below) was plotted with non-native contacts in the top-left triangle of each panel and with native contacts (present in the starting structure) in the bottom-right triangle of each panel. Native contacts are reported for the full 30-ns 298 K simulation (n = 30,000). Transition state contacts are reported for all transition state ensembles (n = 30); denatured state contacts are plotted for the last 30 ns of the three long 498 K simulations (n = 90,000). Two residues were considered in contact if they contained carbon atoms that were ≤5.4 Å apart or any other non-hydrogen atoms that were ≤4.6 Å apart. Contacts were colored from white (never occurred) to black (present 100% of the time).
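The contact criterion stated in the contact-map legend (carbon atoms ≤5.4 Å apart, or any other non-hydrogen atoms ≤4.6 Å apart) can be illustrated with a minimal Python sketch. It is not the analysis code used in this work. The data layout (each residue as a list of element/coordinate tuples), the function names, and the reading of "any other non-hydrogen atoms" as every heavy-atom pair that is not carbon-carbon are assumptions made for illustration only.

```python
import math

CARBON_CUTOFF = 5.4  # Å, applied to carbon-carbon atom pairs
OTHER_CUTOFF = 4.6   # Å, applied to all other heavy-atom pairs (assumed reading)


def residues_in_contact(res_a, res_b):
    """Return True if two residues satisfy the distance criterion.

    Each residue is a list of (element, (x, y, z)) tuples; this layout
    is hypothetical and chosen only to keep the sketch self-contained.
    """
    for elem_a, xyz_a in res_a:
        for elem_b, xyz_b in res_b:
            if elem_a == "H" or elem_b == "H":
                continue  # hydrogens are ignored by the criterion
            both_carbon = elem_a == "C" and elem_b == "C"
            cutoff = CARBON_CUTOFF if both_carbon else OTHER_CUTOFF
            if math.dist(xyz_a, xyz_b) <= cutoff:
                return True
    return False


def contact_fraction(frames, i, j):
    """Fraction of frames in which residues i and j are in contact.

    `frames` is a list of structures, each a list of residues.
    """
    hits = sum(residues_in_contact(frame[i], frame[j]) for frame in frames)
    return hits / len(frames)
```

Evaluating `contact_fraction` for every residue pair over a trajectory would give the kind of matrix that the legend describes as shaded from white (never in contact) to black (always in contact).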
According to AGADIR (37) and our MD simulations, the 7-residue difference in sequence between GA88 and GB88 yields an increased helical propensity for GB88. In particular, the 4 residues forming the loop connecting α1 and α2 in GA88 adopt a helical conformation in the structure of native GB88. Furthermore, MD reveals that substitution of Gly to Ala at residue 24 and Ile to Thr at residue 25 increases the helical propensity of the region. Thus, experiments and simulations converge in supporting the hypothesis that a longer, more stable α-helix in GB88 prevents the latter sequence from folding to the GA88 structure. Interestingly, it was recently shown that a switch between the GA and GB structures may be obtained even with a single amino acid substitution (38). Under conditions where the GA fold is >90% populated, mutation of Leu45 into Tyr shifts the population to >90% of the GB fold. Surprisingly, position 45 is not in the loop where GA88 and GB88 display a different helical propensity (Fig. 1), indicating that the greater helical content of the denatured state of GB88 is affected by long range interactions.
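To give a feeling for the energetic scale of the single-substitution switch mentioned above, the following Python sketch converts the quoted populations into a free energy difference. It assumes that only the two folds are populated and that they are in equilibrium, and it uses a temperature of 298 K; both are assumptions of this illustration and are not conditions taken from reference 38. The function name is hypothetical.

```python
import math

R = 1.987e-3  # gas constant in kcal/(mol*K)


def fold_free_energy(frac_gb, temperature=298.0):
    """Free energy of the GB fold relative to the GA fold (kcal/mol).

    Assumes a simple two-fold equilibrium, K = [GB] / [GA].
    Negative values mean the GB fold is favored.
    """
    k_eq = frac_gb / (1.0 - frac_gb)
    return -R * temperature * math.log(k_eq)


# Populations at the ">90%" limits quoted in the text.
before = fold_free_energy(0.10)  # GA fold 90% populated
after = fold_free_energy(0.90)   # GB fold 90% populated
shift = after - before           # change caused by the substitution
```

Under these assumptions the shift is roughly 2.6 kcal/mol in magnitude, and because both populations are reported as greater than 90%, this figure is a lower bound on the change in relative stability.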
Overall, comparison of the folding of GA88 and GB88 highlights a conundrum; although only a few residues (or even a single one) are responsible for the selective stabilization of the two alternative topologies, information on the folding mechanism indicates that no single residue appears to act as a unique gatekeeper in the selection of protein topology. Both experiments and simulations on the folding of GA88 and GB88 suggest that native topology might be presculpted in the denatured state, where incipient nuclei are present. Stabilization of such nuclei is affected by long range interactions, and commitment to the native fold occurs by selective stabilization of these incipient nuclei rather than by actively blocking alternative pathways.
Do Engineered Proteins Display Cooperative Folding?-An intriguing general question is whether folding is under evolutionary pressure. This issue was recently discussed by Baker and co-workers (39) in a study on the folding of a de novo designed protein, Top7, characterized by a novel non-natural topology. Although most small naturally occurring proteins fold in a cooperative mode (40), Top7 displays a non-cooperative folding mechanism, suggesting that cooperativity may be a result of natural selection. In this context, it may seem somewhat puzzling that both GA88 and GB88 fold in a cooperative manner, displaying single exponential folding and unfolding kinetics as well as V-shaped chevron plots, a hallmark of two-state folding (22). However, a possible explanation to reconcile these apparently conflicting observations is that although the structure of Top7 is non-natural (41), both GA88 and GB88 have been engineered starting from naturally occurring frameworks (16).
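The V shape of a two-state chevron plot follows from the standard model in which the observed rate constant is the sum of the folding and unfolding rate constants, each depending exponentially on denaturant concentration. The Python sketch below illustrates this textbook relationship. The parameter values are hypothetical, chosen only to produce a clear minimum, and are not fitted values for GA88 or GB88.

```python
import math


def observed_rate(denaturant, kf_water, ku_water, m_f, m_u):
    """Two-state observed rate constant (s^-1) at a denaturant concentration (M).

    kf_water, ku_water: folding and unfolding rate constants in water (s^-1).
    m_f, m_u: kinetic m-values (M^-1) giving the slopes of the two arms.
    """
    k_fold = kf_water * math.exp(-m_f * denaturant)
    k_unfold = ku_water * math.exp(m_u * denaturant)
    return k_fold + k_unfold


# Hypothetical parameters, for illustration only.
concs = [0.5 * i for i in range(17)]  # 0 to 8 M in 0.5 M steps
log_k = [
    math.log(observed_rate(c, kf_water=1000.0, ku_water=0.1, m_f=2.0, m_u=0.5))
    for c in concs
]
```

Plotting `log_k` against `concs` gives two nearly linear arms that meet at a single minimum. Curvature in either arm, or additional kinetic phases, would usually be taken as evidence against a simple two-state mechanism.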
It was reported recently that malleability of protein folding pathways stems from the existence of multiple nuclei within a given protein structure (42). Implicit in this view, a natural topology would contain one or more nucleation motifs, representing the minimal units to encode for the final structure (43). Nucleation of the motif is the basis for cooperative folding, and the number of accessible pathways is related to the number of nucleation motifs within a protein (44). On the basis of the folding pathways observed for Top7, GA88, and GB88, it is tempting to speculate that because naturally occurring topologies contain nucleation motifs, productive folding of these substructures would result in cooperative transitions. In contrast, non-natural proteins may not contain stable nucleation motifs and, thus, they appear to fold non-cooperatively. We may conclude that evolution does not directly select for cooperative folding, but rather it selects for topologies that can fold in a cooperative manner.
Comparison with Studies on Protein Families-The study of homologous proteins has represented a powerful approach to obtain insight into protein folding (3-11, 34, 45, 46), especially when combined with structural information on intermediate or transition states. In a recent study, we addressed the structural features of the early and late transition states of two homologous three-state proteins, PSD-95 PDZ3 and PTP-BL PDZ2 (3). For different PDZ domains, we observed that the late folding transition states (TS2) are more similar to each other than the early transition states (TS1). This observation would suggest that although native topology defines the late stages of folding in a unique way, significant freedom in creating structural contacts is observed for the early events. In this perspective, it is of interest to compare the case of GA88/GB88, whose folding appears to diverge early in the denatured state, with that of the PDZ family, where a strong native bias is seen only at the late stages of folding. A plausible scenario to reconcile these apparently contrasting results would imply the presence of multiple nucleation motifs at the early stages of PDZ folding. In fact, the apparent structural divergence of TS1 for PDZ2 and PDZ3 most likely arises from selective stabilization of alternative nuclei in the two denatured proteins, which may then appear to explore distinct early folding pathways. When folding proceeds to the native state, the alternative nuclei all consolidate in a native-like conformation, and folding pathways appear to converge. Support for this hypothesis comes from circular permutation experiments on PDZ2, whereby alternative nuclei may be selectively stabilized and alter the early events of folding without affecting the late ones (47, 48).
On the other hand, GA88 and GB88 do not appear to contain alternative nuclei; moreover, because they display two completely different structures, their respective nucleation motifs may be completely independent such that their folding pathways diverge from the very early stages. Future work based on protein engineering experiments, MD simulations, and Φ-value analysis (23) will further address structural determinants in the folding of this heteromorphic protein pair in an effort to identify crucial residues for the stabilization of these two alternative topologies and their respective folding motifs and to extend the experimental and theoretical work to more complex multidomain systems.