Changes of Protein Folding Pathways by Circular Permutation

The evolved properties of proteins are not limited to structure and stability but also include their propensity to undergo local conformational changes. The latter, dynamic property is related to structural cooperativity and is controlled by the folding-energy landscape. Here we demonstrate that the structural cooperativity of the ribosomal protein S6 is optimized by geometric overlap of two competing folding nuclei: they both include the central β-strand 1. In this way, folding of one nucleus catalyzes the formation of the other, contributing to make the folding transition more concerted overall. The experimental evidence is provided by an extended set of circular permutations of S6 that allows quantitative analysis of pathway plasticity at the level of individual side chains. Because similar overlap between competing nuclei also has been discerned in other proteins, we hypothesize that the coupling of several small nuclei into extended “supernuclei” represents a general principle for propagating folding cooperativity across large structural distances.

Proteins are highly evolved molecules that are selected and optimized by the functional requirements of the cell. The optimization, however, works not only on the native structures but also involves their physical properties (1). A well-known example is protein stability, i.e. the equilibrium between folded states and their denatured counterparts, which needs to be maintained within certain limits to assure functionality and yet allow degradation. Protein sequences are moreover tuned to resist aggregation and erroneous interactions with other biomolecules. In part, such tuning is achieved by electrostatic repulsion (2,3) and negative design, i.e. side chains in the form of gatekeepers that specifically reduce sequence stickiness (4) or otherwise obstruct unwanted interactions between or within proteins (4-8). At a more generic level, protein aggregation also seems to be prevented by the folding potential itself. The factors that govern structural specificity automatically disfavor non-native contacts between molecules and bias the proteins to aggregate by native-like interactions (4,9,10). The archetypical example of such aggregates is the domain swap (11,12). A seemingly different type of aggregation is the ordered assembly into fibrils driven by sequence signatures that allow linear propagation of non-native β-structure (13). In either case, aggregation relies on the exposure of amino acid segments that are normally hidden within the native structure, for example by local unfolding. There is growing evidence that proteins suppress such detrimental unfolding events by increasing the unfolding cooperativity (14-16). It is interesting to note that, unlike native structure and stability, cooperativity is a property that is primarily determined by the folding-energy landscape (14). The principle can be illustrated as follows.
If the folding-energy landscape allows only one folding route, thermal motions of the native state would repeatedly lead to excursions along the same unfolding trajectory. Such narrow sampling of the conformational space could make the affected parts of the protein loose and susceptible to local unfolding by even modest fluctuations (14). If, in contrast, the energy landscape is tuned to have a broad, diffuse folding progression, there is no preferred way of unfolding the protein: each microscopic unfolding attempt involves different parts of the structure (17). In effect, this uniform probability of unfolding will produce a native structure that is relatively "rigid" because the simultaneous occupancy of partly unfolded states that are matched for aggregation is minimized. From experimental studies of small two-state proteins it is apparent that most natural proteins belong to the latter category (18). Folding seems optimized for high cooperativity manifested in transition-state structures, i.e. folding nuclei, that resemble diffuse versions of the native structures (14). Consistent with the idea that the diffuse transition-state structure is a biological adaptation, it can readily be changed by circular permutation into polarized, less cooperative counterparts where half of the protein is fully structured and the other half is disordered (19,20). The transition-state structure also responds readily to extensive changes in the side-chain contacts, as observed in comparative studies of sequence-divergent homologs (21-23). Even so, the factors governing folding cooperativity are still poorly understood, and cooperativity is also the molecular trait of naturally evolved proteins that is most difficult to reproduce in silico and by de novo design. One possibility is that cooperativity is achieved by balancing long sequence separation between interacting residues with strong contacts.
Such equalization of the folding probability across the protein structure has been seen to promote diffuse nuclei in both computational (17,24,25) and experimental studies (14). Consistently, circular permutations of S6 and SH3 show that the contribution of individual side chains to the folding nucleus is directly related to changes in sequence separation of the interacting side chains, implying that chain entropy is indeed a responsive factor for tuning the folding trajectory. But there is more to it. The plasticity of the S6 folding reaction seems restrained to two competing pathways corresponding to nucleation in either half of the protein structure (19,20). It is further apparent that the competing nuclei of S6 are to some degree overlapping, providing a clue to how the folding cooperativity can be extended from one part of the structure to another (20): strand 1 (β1) in the center of the β-sheet is part of both nuclei (see Fig. 1). To shed further light on how this putative coupling of cooperativity works, here we specifically investigate the behavior of the nuclei overlap. The study is based on φ-value analysis of a new, expanded set of S6 permutants that includes incisions between all six secondary structure elements. The fate of each part of the S6 structure is thus monitored and compared on six different settings of the folding-energy landscape, yielding statistics that are sufficiently exact to map out folding changes at the level of individual side chains. The results show that the majority of φ-values respond to circular permutation as predicted from changes in sequence separation. A distinct exception, however, is the set of φ-values in β1, which remain constant around 0.5. The insensitivity of β1 to pathway changes suggests that its side chains form an approximately equal number of interactions with each of the two S6 nuclei. One side of β1 is part of the α1 nucleus, whereas the other is part of the α2 nucleus.
Thus, when folding shifts between the α1 and α2 nuclei, the β1 contacts simply shift from one side to the other with little effect on the macroscopic φ-values. From a general perspective, this balanced shift of nucleating contacts around β1 provides a concrete example of how structural order can be propagated from one nucleus to the other without the accumulation of partly structured intermediates. On this basis, we conclude that folding cooperativity is a modular property that can be extended through protein structures by multiple, competing folding nuclei that share structural overlap.

MATERIALS AND METHODS
Protein Engineering-All circular permutants of S6 are labeled according to the position of the incisions, and their amino acids are numbered according to the wild-type sequence, Protein Data Bank code 1RIS (26). The wild-type numbering was used for all point mutations in this study. The circular permutants P13-14 and P68-69 were designed and constructed as described previously (27), and the circular permutants P54-55, P33-34, and P81-82 were designed and constructed according to the procedures in Ref. 19. The genes designed for P33-34 and P81-82 were purchased from Entelechon. With these five constructs, all possible incisions between the secondary structure elements of S6 have been covered. The point mutations used for mapping out the interactions in the transition-state ensembles of the permuted proteins were chosen to cover all parts of the hydrophobic core.
Protein Expression and Purification-Mutations were performed with the QuikChange site-directed mutagenesis kit (Stratagene), oligonucleotides were purchased from DNA Technology, and all mutations were confirmed by sequencing (Eurofins MWG Operon). The mutant proteins were overexpressed in competent Escherichia coli strain BL21 or C41 DE3 and then purified by two-step precipitation followed by cation-exchange chromatography (CM-Sepharose) and gel filtration (Sephacryl S-100) as described previously (28). The buffer was 50 mM Tris, pH 7.5. The identity of the purified protein was confirmed by mass spectrometry.
Kinetic Measurements-Stopped-flow measurements and curve fitting were performed on SX-18MV and PiStar instruments (Applied Photophysics, Leatherhead, UK). The excitation wavelength was 280 nm, and the emission was collected with a 305-nm-cutoff filter. Mixing was 1 + 10, and the final protein concentration was 0.8 μM. All measurements were conducted at 25°C in 50 mM MES at pH 6.3 (Sigma) using GdmCl as denaturant (UltraPure, Invitrogen).
Chevron Analysis-To minimize the effect of chevron curvatures at low and high GdmCl concentrations, the kinetic m-values were derived from the linear regime at the bottom of the chevron plots, i.e. midpoint ± 2 M, using the standard equation (Equation 1), where log k_f^H2O and log k_u^H2O are the logarithms of the refolding and unfolding rate constants at [GdmCl] = 0 M, and m_f and m_u are the slopes of the refolding and unfolding chevron limbs, respectively. Data analysis was performed with the software KaleidaGraph 4.0 and GraphPad Prism 4.0.
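The chevron relation described above can be sketched in a few lines. This is a minimal illustration, assuming the usual two-state form log k_obs = log(10^(log k_f^H2O + m_f[GdmCl]) + 10^(log k_u^H2O + m_u[GdmCl])); the function names and the parameter values in the usage note are hypothetical and are not taken from the S6 data.

```python
import math


def log_k_obs(gdmcl, log_kf_h2o, m_f, log_ku_h2o, m_u):
    """log10 of the observed rate constant at a given [GdmCl] (M).

    Assumes a two-state chevron in which both limbs are linear in denaturant.
    """
    k_f = 10 ** (log_kf_h2o + m_f * gdmcl)  # refolding limb
    k_u = 10 ** (log_ku_h2o + m_u * gdmcl)  # unfolding limb
    return math.log10(k_f + k_u)


def transition_midpoint(log_kf_h2o, m_f, log_ku_h2o, m_u):
    """[GdmCl] (M) at which k_f equals k_u, i.e. the bottom of the chevron."""
    return (log_kf_h2o - log_ku_h2o) / (m_u - m_f)
```

With hypothetical parameters log k_f^H2O = 2.5, m_f = -1.0, log k_u^H2O = -4.0 and m_u = 0.5, `transition_midpoint` places the bottom of the chevron near 4.3 M, so the fitting window of midpoint ± 2 M would run from roughly 2.3 to 6.3 M.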
φ-Value Analysis-By systematically truncating side chains while measuring the effects on the folding and unfolding kinetics, it is possible to map out the interaction patterns in the transition-state ensemble (30). In essence, mutations that slow down the refolding reaction are considered to target stabilizing contacts in the transition-state structure. The strengths of these interactions are measured by the φ-values. A φ-value of 0 indicates that the targeted side chain experiences a denatured-like environment in the transition-state ensemble, whereas a φ-value of 1 indicates that it has a fully native-like environment. Fractional values of φ are thus taken to indicate the degree of native-like interactions, i.e. φ = 0.5 indicates that the truncated side-chain moieties form half of the native contacts in the folding transition state. To reduce the effects of extrapolation errors and transition-state shifts occurring at high [GdmCl] (29), the φ-values (30) were calculated from chevron fits (Equation 1) according to Equation 2, where A = 1 M and B = 4 M, referring to the GdmCl concentrations at which the refolding and unfolding rate constants were taken. The change of φ upon circular permutation (Δφ) was then calculated according to Equation 3.
In the cases where the transition midpoints were too low to allow precise fits of the refolding limb, i.e. midpoints < 2 M GdmCl, m_f was locked to the average value of the permutant in question. The current procedure for calculating the φ-values is slightly different from that in Refs. 14 and 19 and in a few cases causes small deviations from previously published data. The magnitude of these differences, however, is within the experimental errors and has no significance for the interpretation of data. Data for mutations with ΔΔG < 0.7 kcal/mol were excluded together with data where the m-value changes produced artificially high values of ΔΔG, i.e. V40A and F60A in S6wt, L19A in P33-34, and F60A in P81-82 (see Tables 1-6). Notably, derivation of φ according to Equation 2 emphasizes mainly the structures of the early transition-state ensembles. Analysis of the later transition states needs to include the downward kinks in the unfolding limb appearing at high [GdmCl] (28,29) (see data in Fig. 2). Slight shifts to parallel folding pathways upon point mutation were in some cases indicated by anti-Hammond behavior but cannot currently be accounted for in an accurate manner. The error due to such shifts is likely to be small but could contribute to underestimating the φ-values in the α1 nucleus (see scheme in Fig. 8).
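As a hedged illustration of the φ-value procedure, the sketch below evaluates the refolding limb at A = 1 M and the unfolding limb at B = 4 M and takes the ratio φ = Δlog k_f(A) / (Δlog k_f(A) + Δlog k_u(B)). This ratio is an assumed form consistent with the description above and with the conventional definition of φ, not a transcription of Equation 2. The dictionary keys, function names, and numbers are hypothetical.

```python
def log_k_on_limb(log_k_h2o, m, gdmcl):
    """Linear extrapolation of one chevron limb to a given [GdmCl] (M)."""
    return log_k_h2o + m * gdmcl


def phi_value(wt, mut, a=1.0, b=4.0):
    """phi from chevron parameters of a reference protein and its point mutant.

    `wt` and `mut` are dicts with keys log_kf_h2o, m_f, log_ku_h2o, m_u.
    Refolding rates are compared at [GdmCl] = a, unfolding rates at b.
    """
    d_log_kf = (log_k_on_limb(wt["log_kf_h2o"], wt["m_f"], a)
                - log_k_on_limb(mut["log_kf_h2o"], mut["m_f"], a))
    d_log_ku = (log_k_on_limb(mut["log_ku_h2o"], mut["m_u"], b)
                - log_k_on_limb(wt["log_ku_h2o"], wt["m_u"], b))
    return d_log_kf / (d_log_kf + d_log_ku)


def delta_phi(phi_permutant, phi_wild_type):
    """Change of phi at a fixed sequence position upon circular permutation."""
    return phi_permutant - phi_wild_type
```

A mutation that slows refolding and speeds unfolding by the same factor gives φ = 0.5 in this scheme. Because the denominator is the total destabilization in log units, mutations with very small ΔΔG make the ratio numerically unstable, which is the practical reason for an exclusion threshold of the kind described above.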
Calculation of Changes in Sequence Separation, ΔL-The average sequence separation between contacts lost (L) upon point mutation (14) was calculated from the Protein Data Bank structures of S6wt and the circular permutants according to Equation 4, where L_i is the sequence separation (loop length) between the individual carbon-carbon contacts (within a radius of 6 Å) lost upon mutation, and n is the total number of contacts lost. To emphasize the contribution of tertiary contacts, four residues on either side of the target side chain were excluded from the calculation. The change of sequence separation upon circular permutation, ΔL, was then calculated according to Equation 5.
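The following sketch shows how an average sequence separation and its change upon permutation could be computed. It makes several simplifying assumptions that are not in the source: contacts are given as a residue-level partner list rather than found by a 6 Å carbon-carbon search in the PDB structure, the permuted chain is modeled by re-indexing the wild-type sequence around the incision, and the length of the linker joining the old termini is a free parameter. All names and numbers are hypothetical.

```python
def separation(i, j, n_res, cut=None, linker=0):
    """Loop length between residues i and j (wild-type numbering).

    cut=c models a permutant opened between residues c and c+1, with the
    original termini joined by `linker` extra residues (assumed value).
    """
    i, j = sorted((i, j))
    if cut is None or not (i <= cut < j):
        return j - i  # contact does not span the incision
    # The contact spans the incision, so the chain now connects the two
    # residues through the joined wild-type termini.
    return (n_res - j) + linker + i


def mean_separation(target, partners, n_res, cut=None, linker=0, exclude=4):
    """Average separation L over contacts lost when `target` is truncated.

    Partners within `exclude` residues of the target are skipped to
    emphasize tertiary contacts.
    """
    seps = [separation(target, p, n_res, cut, linker)
            for p in partners if abs(p - target) > exclude]
    if not seps:
        raise ValueError("no tertiary contacts left after exclusion")
    return sum(seps) / len(seps)


def delta_l(target, partners, n_res, cut, linker=0):
    """Change of L on going from the wild-type chain to the permutant."""
    return (mean_separation(target, partners, n_res, cut, linker)
            - mean_separation(target, partners, n_res))
```

For a hypothetical 100-residue chain in which residue 88 contacts residues 6 and 60, an incision between residues 68 and 69 changes the two loop lengths from 82 and 28 to 18 and 72, so L drops from 55 to 45 and ΔL is -10.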
The weighted parameter L_w encompasses the influence from sequence neighbors within a given secondary structure element. That is, amino acid X is linked to the "actions" of its sequence neighbors X − 1 and X + 1 according to Equation 6. For mutations at the edges of the secondary structure elements, either L(X − 1) or L(X + 1) was set to 0.
"Bootstrap" Analysis of the Correlation between Δφ and ΔL_w-To investigate the contribution of individual data points to the ΔL_w versus Δφ correlation, we performed a "bootstrap-type" analysis. In the analysis, a sliding window of three consecutive mutations along the S6 sequence was omitted, and for each step the correlation (r) between Δφ and ΔL_w was recalculated.
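The sliding-window omission can be sketched as follows. This is one possible reading of the procedure, assuming that the window runs over consecutive mutated positions and that every data point at an omitted position (one per permutant) is dropped; the function names and the data layout are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient; NaN if either variable is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def sliding_window_omission(points, window=3):
    """Recompute r after leaving out each run of `window` mutated positions.

    `points` is a list of (position, delta_lw, delta_phi) tuples.
    Returns (first omitted position, r of the remaining data) per step.
    """
    positions = sorted({pos for pos, _, _ in points})
    results = []
    for start in range(len(positions) - window + 1):
        omitted = set(positions[start:start + window])
        kept = [(x, y) for pos, x, y in points if pos not in omitted]
        r = pearson_r([x for x, _ in kept], [y for _, y in kept])
        results.append((positions[start], r))
    return results
```

A sequence region whose omission raises the magnitude of r well above that of the other windows is the region that does not follow the overall trend.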

The Effect of Circular Permutation on Folding Kinetics and Stability-The stopped-flow data show that the stability alterations following circular permutation of S6 are mainly exerted through increased unfolding rate constants (k_u) (Fig. 1 and Tables 1-6). The accompanying changes in the refolding rate constants (k_f) are smaller overall and indicate, for several of the permutants, a stabilization of the transition-state ensemble (‡). A minimalist interpretation of these results is that the circular permutation involves two energetic components that are partly compensating: first, a structural stabilization due to the linkage of the N and C termini by a well designed hairpin (27), and second, an energetic penalty from cutting the backbone in a new position. In the case of P54-55, where the incision is made in a flexible loop, the effect of the new linkage between the N and C termini dominates and is manifested in an overall stabilization (Fig. 1 and Tables 1-6). Notably, the kinetic m-values and transition-state placement (β‡) remain relatively unaffected by circular permutation, with the exception of P33-34, which shows a decreased m_u (Fig. 1 and Tables 1-6). The folding nuclei of the different S6 permutants thus seem to maintain a similar solvent-accessible surface area despite considerable redistributions of the transition-state contacts, as indicated by the φ-value data below (Figs. 2 and 3). An interesting detail of P81-82 is that this permutant seems to dimerize in its native state at low GdmCl concentrations, giving rise to an additional unfolding phase (data not shown). For the purpose of this study, however, the complex kinetics of P81-82 was resolved by unfolding the permutant at 0.8 M GdmCl, where it remains monomeric.
The φ-values of S6wt and the permutants are listed in Tables 1-6. On the whole, the results show that the S6 φ-values respond readily to circular permutation and that the response of the individual S6 permutants is different. The effects are clearly seen by comparing the φ-value changes at the level of secondary structure elements.
For example, the φ-values of β4 are low overall in S6wt, P68-69, and P81-82 but considerably higher in P13-14, P33-34, and P54-55 (Fig. 3). Notably, the latter group also shows the most pronounced increase of k_f in Fig. 1. A similar pattern is observed for the adjacent α2, whereas α1 at the opposite side of the protein shows the reversed response by having the highest φ-values in S6wt, P68-69, and P81-82. It is also apparent that the φ-values of the central β-strand 1 remain quite similar around 0.5 in all permutants, thus presenting a pivot point around which the distribution of nucleating contacts can move to either side of the S6 structure depending on the detailed wiring of the polypeptide backbone. In addition to this general redistribution of the nucleating contacts upon circular permutation, we note that an isolated cluster of neighboring side chains in P13-14 shows φ-values that are anomalously high, i.e. L75A and V88A (Table 2). The cause of the anomaly is not yet established but seems to stem from misfolding in the transition-state ensemble of P13-14. The detailed, quantitative assessment of how the φ-value distribution of S6 changes upon circular permutation is presented in the discussion below.
Tables 1-6 (legend). φ is calculated from log k_f at 1 M and log k_u at 4 M GdmCl (Equation 1), and ΔL and ΔL_w are calculated according to Equations 4-6. ΔG_D-N and ΔΔG_D-N are in units of kcal/mol. φ_element is the average φ-value for the individual secondary structure elements. MP is the transition midpoint. ND, not determined.

Changes of the Folding Process upon Circular Permutation: Influence of Sequence Separation and Backbone Neighbors-From the mutant data in Figs. 2 and 3, it is apparent that circular permutation of the S6 structure leads to significant changes of the folding process (27,31). The effect is particularly clear in α-helix 1 and β-strand 4, where the φ-values display consistent changes of more than 0.2 units, and in the radical response at positions 75 and 88 in P13-14. To examine these φ-value changes quantitatively, we have previously modeled the effect of circular permutation by the site-specific parameter ΔL (Equations 4 and 5) (19). In some analogy with the global parameter contact order (32), ΔL provides a measure of sequence separation between side-chain contacts but now for the individual long range contacts probed by the φ-value mutations (14). The simplistic analysis shows that the φ-values increase in positions where ΔL goes down and diminish where ΔL goes up. The correlation is not unique for S6 but has also been resolved for permutants of the SH3 domain (19), suggesting that the φ-value response to increased sequence separation is general: if the loops between individual contact pairs are extended, then their contribution to the folding nucleus is reduced. In this study, the addition of the final permutants ... (Fig. 4). The reason for this apparent reversion is in part that some of the most influential data points at positive ΔL values break the correlation by being too high. Importantly, these deviating data points are unlikely to be due to experimental errors but indicate that the minimalist treatment of the sequence separation in Equation 4 is in part unrealistic: it assumes that the residues act independently of their sequence neighbors. The problem is best illustrated by the φ-value of V88A in P81-82, which changes in good accord with those of the neighboring probes in β4, i.e. V85A and V90A. Yet V88A yields a ΔL value that is of opposite sign to that of its sequence neighbors, making the mutation an outlier in the plot (Fig. 4).
The cause of this deviation is simply that some of the long range interactions of V88A extend to α2, hence bridging the gap of the backbone incision. When it comes to control of the folding reaction, however, the contribution of these interactions is small in relation to the more extensive and conflicting contact bias of the other residues in β4. Thus, to introduce some persistence length of the backbone, we included a weight from the nearest neighbors in the L calculation according to Equation 6, L_w. The correction leads to a small improvement of the correlation with Δφ: r increases to 0.69 (Fig. 4). The result indicates, not surprisingly, that the action of a residue in the folding transition state is coupled to the action of its sequence neighbors. Even so, the most significant obstruction to the correlation between Δφ and ΔL_w seems to persist. A group of φ-values at large negative values of ΔL_w appears to be insensitive to circular permutation, giving rise to a trumpet-like shape of the plot. Again, this pattern is not due to experimental errors but stems from the mechanistic details of the S6 folding reaction.
Folding at the Level of Individual Side Chains and Secondary Structure Elements-Upon closer examination of how the transition-state ensemble responds to circular permutation, it is apparent that the Δφ values at fixed sequence positions display considerable variation. It is also apparent that the Δφ values can be grouped according to their positions in the S6 structure. For example, the side chains in β4 are biased to negative ΔL_w and display correspondingly the highest values of Δφ (Fig. 5). Vice versa, the data points for α1 are found at the bottom left-hand side of the correlation (Fig. 5). The favorable span of Δφ and ΔL_w values associated with β4 and α1 also results in relatively high individual correlation coefficients of r = 0.64 and r = 0.80, respectively. The effect of circular permutation on ΔL_w for mutations in β3 and α2, however, is smaller overall, manifested in data points that are grouped rather symmetrically around the origin. Accordingly, the contribution from β3 and α2 to the global correlation between Δφ and ΔL_w is relatively modest. In apparent contrast, the Δφ values of β1 seem not to respond to circular permutation at all. Despite the large negative ΔL_w values of this secondary structure element, the Δφ values fall along the x axis, save the odd data point of V6A in the anomalously condensed transition state of P13-14 (Fig. 5). Intriguingly, it turns out that this deviating behavior of β1 is a direct prediction from a folding mechanism with two competing folding nuclei that are partly overlapping.
Folding at the Level of Overlapping Nuclei-In our previous studies of the S6 permutants P13-14 (14), P54-55, and P68-69 (19) and the sequence-divergent S6 from Aquifex aeolicus (22), it was indicated that the S6 structure is composed of two competing folding nuclei: β1 + α1 + β3 and β1 + α2 + β4. These two-strand helix motifs match the size of an independently folding unit, a so-called foldon (33,34), and also represent alternative starting points for the folding reaction (20). Depending on permutation and sequence details, however, the different S6 variants will choose different routes across this folding landscape (see Ref. 35). The diffuse transition-state structure of P81-82 suggests that this protein folds by both channels, i.e. P81-82 has similar probabilities to fold by either the α1 or the α2 nucleus, whereas the polarized transition state of P13-14 indicates a strong preference for the α2 channel. Thus, although the overall features of the S6 energy landscape are determined by topology, the folding pathway of a specific S6 variant is ultimately determined by the sequence details and could vary. This modular organization of the protein structure into competing foldons seems not to be unique for S6 but is also discerned in proteins with other types of topologies (35-38). An interesting feature of the S6 nuclei, however, is that they are partly overlapping: by sharing β1, the folding of one side of the protein is coupled to the folding of the other. To facilitate the comparison of the different S6 permutants, we have schematically color-coded the secondary structure elements according to their average φ-values (Tables 1-6 and Fig. 6). First, it can be seen that the φ-value distributions of all S6 variants encompass β1.
In addition, all proteins indicate the involvement of either or both of the helices. S6wt and P81-82 show average φ-values above 0.15 in both α1 and α2, whereas P13-14, P33-34, P54-55, and P68-69 tend to be polarized to one or the other (Tables 1-6 and Fig. 6). Finally, the φ-values indicate the contribution of a flanking strand on either side of β1, making it possible to outline the contours of the two-strand helix motif in each of the permutant transition-state structures. Taking into account also the lower φ-values, it is possible to discern a second, partly overlapping, two-strand helix motif in the diffuse transition-state ensemble of P81-82 (Fig. 6).

Additional Support for Parallel Pathways: the Nuclei Overlap Region Displays Constant φ-Values-The critical question is then: what indicates that the diffuse φ-value distributions observed for S6wt and P81-82 represent parallel folding over two competing pathways rather than folding over a "single" pathway with a globally expanded transition-state structure? In support of the former scenario, all S6 variants have very similar folding rate constants despite radical changes in the appearance of the folding nucleus. The energetic costs of forming the structurally analogous α1 and α2 nuclei are essentially the same. On the other hand, if the folding reaction were uniform, the large changes in the character of the transition state would be expected to have a larger kinetic impact (17). The folding nuclei for the different S6 variants also have a very similar solvent-accessible surface area as measured from the kinetic m-values (Fig. 1 and Tables 1-6). Such invariable m-values are in perfect accord with a shift between two equivalent foldons but are difficult to reconcile with a single, structurally variable nucleus. Yet another point is that the existence of two competing pathways provides a simple explanation for why the folding kinetics of the S6 variants are relatively insensitive to changes in protein stability and contact order of the folded states (31). In a system with two competing pathways, even major entropic or energetic perturbations in one part of the structure could have a relatively modest impact on the global refolding rate constant: the flux of molecules is simply redistributed from the penalized nucleus to the other. Now, coming back to the relation between Δφ and ΔL_w, a direct test of the competing pathway model is provided by how the φ-values in the foldon-overlap region change with circular permutation. As the interactions anchoring the pivotal β1 will change from one side of the protein to the other, the associated φ-values are expected to change less than those in the other secondary structure elements.
In the balanced case, the φ-values of β1 could even be completely unaffected by pathway changes: half of the contacts are with the α1 nucleus, and half are with the α2 nucleus, yielding constant, fractional φ-values below 0.5 regardless of how the protein progresses down the folding funnel. Consistent with this scenario, the experimentally observed φ-values of β1 display no appreciable change upon circular permutation despite substantial changes in ΔL_w (Fig. 5). The sole exception is the high φ-value of V6A in the anomalously condensed transition state of P13-14. Accordingly, the correlation between Δφ and ΔL undergoes a marked improvement upon elimination of the β1 data: r increases to 0.82 or to 0.89 when the data are averaged over the individual secondary structure elements (Fig. 4). As a blind test of this result, we used a bootstrap approach to determine how r depends on the exclusion of all possible sequence segments. The data from a sliding window of three consecutive φ-values along the S6wt sequence are shown in Fig. 7. In support of the conclusions above, the bootstrap analysis singles out β1 as the sequence region that does not comply with the correlation between Δφ and ΔL_w (see data in Fig. 5).
Figure legend. Spatial distribution of the φ-values, color-coded onto the structures of S6wt, P13-14, P33-34, P54-55, P68-69, and P81-82. The data indicate that the folding nucleus of S6 undergoes pronounced changes upon circular permutation. A schematic top view of the same data showing the average φ-value for the individual secondary structure elements is presented in Fig. 6. For clarification, two of the residues in β4 of S6wt show φ > 0.15, but the impact of these low φ-values is lost in the average in Fig. 6. This illustrates that the graphical representation of the φ-value distribution is sensitive to the thresholds used and could thus display slight variation between different studies even though the underlying data are within experimental errors. The detailed description of how the φ-value distribution changes upon circular permutation is given by the numeric data in Tables 1-6.
OCTOBER 10, 2008 • VOLUME 283 • NUMBER 41
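The parallel-pathway argument above can be made concrete with a toy kinetic model. The sketch assumes, beyond what the source states, that the global refolding rate constant is the sum of the rate constants of the two channels and that the apparent φ-value of a side chain is the flux-weighted average of its φ-values in the two nuclei, which is the first-order response of k1 + k2 to a small perturbation. All names and numbers are hypothetical.

```python
def total_refolding_rate(k_alpha1, k_alpha2):
    """Global refolding rate constant for two independent parallel channels."""
    return k_alpha1 + k_alpha2


def apparent_phi(k_alpha1, k_alpha2, phi_in_alpha1, phi_in_alpha2):
    """Flux-weighted phi for one side chain (assumed toy model)."""
    total = k_alpha1 + k_alpha2
    return (k_alpha1 * phi_in_alpha1 + k_alpha2 * phi_in_alpha2) / total


# A strand-1-like residue with equal contacts to both nuclei, and a
# helix-1-like residue that is structured only in the alpha1 nucleus.
balanced = apparent_phi(1.0, 1.0, 0.5, 0.5)      # both channels populated
polarized = apparent_phi(0.1, 1.0, 0.5, 0.5)     # alpha1 channel penalized
helix_before = apparent_phi(1.0, 1.0, 0.8, 0.0)
helix_after = apparent_phi(0.1, 1.0, 0.8, 0.0)
```

In this toy model a tenfold penalty on one channel lowers the global rate constant by less than a factor of two, the residue shared by both nuclei keeps the same apparent φ, and the residue belonging to only one nucleus changes strongly, which mirrors the qualitative pattern described for β1 versus α1.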

Circular Permutants of S6
The Wild-type -Values Are Partly a Function of Heterogeneous Contact Energies-It is thus apparent that changes of upon circular permutants can be simply predicted from sequence separation, but can this rationale also be used to predict the original -values of S6 wt ? The answer is no. The reason is that the relation between ⌬ and ⌬L only captures the chain entropy component of the folding process and does not explicitly account for heterogeneities in the contact potential. As S6 wt is the base line, any heterogeneity in the contact free energies is cancelled by subtraction leaving just the effect of the chain entropy differences. In support of this assumption, the stability loss upon mutation in a specific sequence position (⌬⌬G D-N ) shows little variation across the different S6 constructs (Tables  1-6). Looking at the absolute levels of , however, these are substantially skewed by heterogeneities in the contact free energies: the interactions keeping together the entropically penalized N-and C-terminal regions are stronger than those joining regions that are closer in sequence (14). As a consequence of this energy-entropy compensation, the folding probability becomes rather uniform along the S6 wt sequence, producing the characteristically diffuse -value distribution of S6 wt (14,17). If this heterogeneity is not accounted for, computational analysis of the S6 wt folding reaction will yield an overly polarized nucleus at positions with low sequence separation. On the other hand, if the potential is tuned to match the experimental contact free energies, as done by Matysiak and Clementi (24), simulations reproduce remarkably well the experimental -values not only for S6 wt but also for the circular permutants. A similar, good correspondence with experimental data was also obtained by correlating long loops with strong contacts as done by Wang and co-workers (25,39). Note also the strategies used in Refs. 40 and 41.  Equations 4 -6). 
A, plot of Δφ versus ΔL according to Equations 4 and 5. B, plot of Δφ versus ΔL_w according to Equation 6. The improvement of the correlation for ΔL_w suggests that the action of residues that are close in sequence is to some extent coupled. Even so, the plot of Δφ versus ΔL_w shows a trumpet-like shape because the φ-values in β1 (the overlap of the two competing S6 nuclei) seem insensitive to permutation. C, upon removal of the φ-values in β1 the correlation between Δφ and ΔL_w increases to r = 0.82. D, plot of Δφ versus ΔL_w where the individual data points are averaged over the individual secondary structure elements and β1 is excluded (r = 0.89). The improved correlation of this plot indicates that the persistence length of the S6 sequence with respect to folding nucleation approaches the length of a secondary structure element. With the sole exception of β1, the individual secondary structure elements seem to comply with the global ΔL_w correlation. Data points from the anomalously high φ-values of L75A and V88A in P 13-14 are not shown because they fall outside the plots of α2 and β4. Possibly the outlier V6A in β1 of P 13-14 is also part of this contiguous cluster of anomalous φ-values, the origin of which seems to be non-native contacts in the transition-state ensemble.
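The comparisons described in the caption above (correlation over all positions, correlation with one element excluded, and correlation after averaging within each secondary structure element) can be illustrated with a minimal Python sketch. This is not the analysis code used in the study: the function names, the `(element, dLw, dphi)` tuple layout, and all numbers in `POINTS` are hypothetical placeholders chosen only to make the example run, and the sign and size of the resulting coefficients carry no meaning.

```python
from collections import defaultdict
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation(points, exclude=()):
    """Correlate dphi with dLw, skipping the elements named in `exclude`."""
    kept = [(x, y) for element, x, y in points if element not in exclude]
    return pearson_r([x for x, _ in kept], [y for _, y in kept])


def average_by_element(points):
    """Average the (dLw, dphi) pairs within each secondary structure element."""
    grouped = defaultdict(list)
    for element, d_lw, d_phi in points:
        grouped[element].append((d_lw, d_phi))
    return {
        element: (
            sum(p[0] for p in pairs) / len(pairs),
            sum(p[1] for p in pairs) / len(pairs),
        )
        for element, pairs in grouped.items()
    }


def element_averaged_correlation(points, exclude=()):
    """Correlate the per-element averages instead of the individual positions."""
    averaged = average_by_element(points)
    kept = [pair for element, pair in averaged.items() if element not in exclude]
    return pearson_r([x for x, _ in kept], [y for _, y in kept])


# Hypothetical placeholder values, NOT data from the study.
# Each entry: (secondary structure element, dLw, dphi).
POINTS = [
    ("beta1", -20.0, 0.02), ("beta1", 15.0, -0.01),
    ("alpha1", -30.0, 0.35), ("alpha1", -25.0, 0.25),
    ("beta2", 10.0, -0.10), ("beta2", 12.0, -0.20),
    ("alpha2", 30.0, -0.30), ("alpha2", 28.0, -0.40),
]

if __name__ == "__main__":
    print("all positions:      ", correlation(POINTS))
    print("beta1 excluded:     ", correlation(POINTS, exclude=("beta1",)))
    print("averaged, no beta1: ",
          element_averaged_correlation(POINTS, exclude=("beta1",)))
```

In the placeholder set, the two `beta1` entries are deliberately given near-zero changes so that excluding them strengthens the correlation, mimicking the qualitative behavior described for β1.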
An explanation for the biased contact potential of S6 is that there is a selective pressure to optimize the folding cooperativity to avoid misfolding and aggregation (14). The potential is tuned to make each folding and unfolding event equally (un)likely (17). In principle, such equalization of the folding probability reduces the preference for any specific trajectory that could otherwise be associated with native-state fraying or a population of structurally promiscuous intermediates. From a macroscopic view, such a broad folding progression is manifested in a globally diffuse transition-state ensemble where each contact is present to a fractional extent (17). Within the context of two competing pathways, optimization of folding cooperativity would mean that both pathways are about equally populated as indicated for S6 wt and P 81-82. The data presented in this study suggest that the balance between the two pathways is buffered by structural overlap.
Competing Nuclei: a Simple Reason for Complex Behavior-In summary, the folding landscape of S6 can be schematically visualized as shown in Fig. 8, where the main trajectories are linked to folding of either of the two competing nuclei: a funnel of funnels (see Refs. 36 and 42). Notably, these global features of the S6 landscape are linked to topology but do not imply that all variants or divergent members of the S6 family need to sample both trajectories (22, 35) (see Fig. 3). Rather, the landscape sets the boundaries for how the folding reaction is free to vary. Depending on the specific sequence details, one protein could favor trajectory 1 and another protein could favor trajectory 2, but upon directed perturbations of the systems the missing trajectory is expected to show up. In addition to the nuclei in Fig. 8, there are likely to be other combinatorial ways of nucleating the S6 structure that have been omitted because they seem relatively rare. Examples are the simultaneous formation of both nuclei and an additional path linked to a third nucleus composed of strands β1, β3, and β4 (see the structure of WW domains (43)). Despite these limitations, the two-component landscape captures at a structural level the most essential characteristics of the S6 folding reaction. It identifies the mutations inducing a Hammond postulate shift between consecutive transition-state ensembles (28) and mutations that shift the folding reaction to parallel transition states (20). Notably, shifts between the α1 and α2 trajectories upon point mutation are in several cases associated with anti-Hammond behavior (44, 45), indicating that the early α2 nucleus is slightly more expanded than the α1 nucleus (see Fig. 8). In parallel studies, Werbeck and Itzhaki (37) and Lowe and Itzhaki (38) have arrived at an analogous folding scheme for modular ankyrin repeat proteins based on pathway perturbations following point mutations.
This indicates that deconvolution of protein structures into component nuclei presents a general strategy for dissecting complex folding reactions. Even so, the simplistic landscape in Fig. 8 fails to shed any light on the exaggerated Δφ response in part of P 13-14 (see Fig. 5). The cluster of anomalously high φ-values in this permutant (i.e. positions 75, 88, and possibly also 6) seems due to transient misfolding and needs an extended description based on nonnative contacts.⁴ The distinctive detail of the S6 reaction in Fig. 8, however, is the resolution of the nuclei overlap. Identification of such a structural coupling between multiple, elementary nucleation motifs not only has implications for structural evolution but also presents a simple explanation for how folding cooperativity is mechanistically controlled in larger proteins with more complex topology.
⁴ E. Haglund and M. Oliveberg, unpublished data.
FIGURE 6. Top, schematic outline of the S6 secondary structure elements and the two overlapping folding nuclei. These nuclei also represent two competing channels in the S6 folding-energy landscape (see Fig. 8). The structural overlap, β1, forms a similar number of contacts with both the α1 and α2 nuclei, 78 and 49, respectively. The arrangement renders the φ-values of this secondary structure element relatively insensitive to the changes between the two competing folding channels; upon pathway alterations the contacts simply switch from one nucleus to the other, maintaining a similar contribution to the stability of the transition-state ensemble. Bottom, the dominant folding nucleus of S6 wt and the individual circular permutants. The secondary structure elements are colored according to average φ-values (see Fig. 3 and Tables 1-6).
FIGURE 8. Schematic view of the competing folding pathways of the S6 energy landscape (see the structural outline in Fig. 6).
If the two competing nuclei of S6 represent independent folding units, their combined action can be treated as two folding funnels merging into a common native state. The free energy projection of such a landscape will produce a broad folding barrier with two parallel, early transition states and a more convergent late transition-state ensemble. These early and late maxima are separated by minima where one of the nuclei is formed and the other is still unstructured. The stability of these intermediates is determined by the energetic coupling between the two folding units. If the coupling is weak, they may form independently of one another and appear as populated intermediates. If the coupling is strong and the stability of one folding unit depends on the degree of folding of the other, as is the case with a structural overlap, the intermediates will be unstable, and folding cooperativity will be optimized. Notably, the φ-values in this study emphasize mainly the structures of the early transition-state ensembles (Fig. 3). Analysis of the later transition states needs to include the downward kinks in the unfolding limb appearing at high [GdmCl] (28, 29) (see Fig. 2).
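The dependence of intermediate stability on the coupling between the two folding units can be illustrated with a minimal equilibrium sketch in Python. It is a generic four-state Boltzmann model (unfolded, nucleus 1 only, nucleus 2 only, both formed) rather than the landscape used in the study, and it addresses only equilibrium populations, not the barriers or kinetics. The function names, the state labels, and all free-energy values are hypothetical choices made for illustration; the two cases are set up to share the same native-state stability.

```python
from math import exp

RT = 0.593  # kcal/mol at about 298 K


def populations(g1, g2, coupling, rt=RT):
    """Boltzmann populations of a four-state model of two folding units.

    g1, g2   : free energy (kcal/mol) of forming nucleus 1 or 2 alone,
               relative to the unfolded state U
    coupling : extra free energy gained when both nuclei are formed
               (negative values mean the units stabilize each other)
    """
    energies = {
        "U": 0.0,                   # neither nucleus formed
        "I1": g1,                   # only nucleus 1 formed
        "I2": g2,                   # only nucleus 2 formed
        "N": g1 + g2 + coupling,    # both formed
    }
    weights = {state: exp(-g / rt) for state, g in energies.items()}
    total = sum(weights.values())
    return {state: w / total for state, w in weights.items()}


def intermediate_fraction(g1, g2, coupling, rt=RT):
    """Summed equilibrium population of the two half-folded states."""
    pops = populations(g1, g2, coupling, rt)
    return pops["I1"] + pops["I2"]


# Hypothetical parameter sets with the same native-state stability (-5 kcal/mol).
WEAK = dict(g1=-2.5, g2=-2.5, coupling=0.0)    # units fold independently
STRONG = dict(g1=2.0, g2=2.0, coupling=-9.0)   # each unit is stable only with the other

if __name__ == "__main__":
    print("weak coupling, intermediates:  ", intermediate_fraction(**WEAK))
    print("strong coupling, intermediates:", intermediate_fraction(**STRONG))
```

With weak coupling, each unit is stable on its own and the half-folded states are measurably populated. With strong coupling, the same overall stability is reached only when both units are formed, so the half-folded states become negligible, which is the sense in which coupling promotes cooperativity in the text above.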