Trimming Down a Protein Structure to Its Bare Foldons

Background: Structural cooperativity safeguards native proteins but remains poorly understood. Results: Guided by the folding nucleus, we reduced the size of a protein without compromising its structural integrity. Conclusion: Folding nuclei are used in a modular fashion to extend or reduce the cooperative units of proteins. Significance: Understanding cooperativity is crucial for understanding protein function and for the rational design of structural properties. Folding of the ribosomal protein S6 is a malleable process controlled by two competing, and partly overlapping, folding nuclei. Together, these nuclei extend over most of the S6 structure, except the edge strand β2, which is consistently missing in the folding transition states; despite being part of the four-stranded S6 sheet, β2 seems not to be part of the cooperative unit of the protein. The question is then whether β2 can be removed from the S6 structure without compromising folding cooperativity or native state integrity. To investigate this, we constructed a truncated variant of S6 lacking β2 (S6Δβ2), reducing the size of the protein from 96 to 76 residues. The new S6 variant expresses well in Escherichia coli and has a well dispersed heteronuclear single quantum correlation spectrum and a perfectly wild-type-like crystal structure, but with a smaller three-stranded β-sheet. Moreover, S6Δβ2 displays an archetypical v-shaped chevron plot with a decreased slope of the unfolding limb, as expected for a protein with maintained folding cooperativity and reduced size. The results support the notion that foldons, as defined by the structural distribution of the folding nuclei, represent a property-based level of hierarchy in the build-up of larger protein structures, and suggest that the role of β2 in S6 is mainly in intermolecular binding, consistent with the position of this strand in the ribosomal assembly.

The biological function of proteins is not only a matter of shape and stability, but also relies on the energy landscape that controls the structural dynamics and the occupancy of alternatively structured states (1). As a painful reminder of the latter, protein misfolding diseases (2,3) arise recurrently from mutations that would appear perfectly benign if judged by native structure and activity alone (3,4). Of particular interest here are the role and molecular origins of folding cooperativity, which safeguards the integrity of the native structure by reducing the occupancy of partly unfolded or misfolded states (5,6). A clue to how folding cooperativity is maintained and propagated over large structural distances was uncovered by the folding behavior of the ribosomal protein S6 (7,8). The structure of S6 seems to be composed of two cooperative subunits, or "foldons," 1 and 2, that also act as competing nuclei in the folding process (8) (Fig. 1). S6 can thus fold along either of two parallel, and sequentially opposed, pathways, one starting with the nucleation of 1 and the other starting with the nucleation of 2 (9) (Fig. 1). Even so, the nucleation events of 1 and 2 are not entirely independent, but are coupled by a structural overlap in the form of the shared strand β1 (10). Such overlap between competing nuclei stands out as an efficient way of linking small cooperative subunits into larger structures without compromising global cooperativity or folding kinetics (10). In principle, the folding of large proteins could thus be reduced to the parallel folding of smaller, structurally overlapping units. However, studies of S6 in which the bias between the 1 and 2 pathways has been systematically shifted by circular permutation show that the edge strand β2 never participates in the nucleation process (6,8,10). Despite being an integral part of the native sheet, β2 appears to be outside the cooperative unit of S6 (Fig. 1).
It is then expected that S6 should also be able to fold cooperatively without β2, provided that the 1 and 2 foldons are sufficiently stable on their own. In this study, we have tested this possibility by simply removing β2 from the S6 structure: the strand was shuffled to the end of the sequence through circular permutation and then cut off (see Fig. 2). The truncated protein (S6Δβ2) displays a highly dispersed solution NMR spectrum, characteristic of rigid tertiary structure, and a crystal structure that closely matches that of the wild-type protein. Moreover, S6Δβ2 retains the cooperative folding behavior of wild-type S6 and of the circular permutant from which it is derived. Taken together, this shows that β2 is not required for folding or native state integrity, but rather plays another role in the S6 structure, possibly in functional shape adjustment. This division of globular domains into a cooperative core and subsidiary secondary structure is not only useful for guiding protein engineering, but also sheds light on dynamic behavior and structural constraints in protein evolution.

EXPERIMENTAL PROCEDURES
Protein Expression and Purification-Mutations were performed with the QuikChange Multi site-directed mutagenesis kit (Stratagene, Agilent Technologies, Santa Clara, CA). Oligonucleotides were purchased from Eurofins MWG Operon (Ebersberg, Germany), and all mutations were confirmed by sequencing (Eurofins MWG Operon). The protein was overexpressed in Escherichia coli strain BL21 and purified as described in Ref. 11.
NMR Spectroscopy-All NMR data were obtained at 25 °C and pH 6.3, with a protein concentration of 0.5 mM, on a Bruker 700-MHz spectrometer (Bruker Avance, Karlsruhe, Germany) equipped with a cryogenically cooled triple-resonance probe. Assignment was based on standard 15N-{1H} HSQC², HNCA, HN(CO)CA, HN(CA)CO, HNCO, 15N-edited NOESY, and total correlation spectroscopy experiments. Dynamics were measured by standard T1, T2, and steady-state heteronuclear NOE experiments. Spectra were transformed using NMRPipe (12) and analyzed with the program Sparky (45). In the T1 and T2 experiments, the signal attenuation over 10 different relaxation delays was fitted to a single exponential decay to determine the relaxation rates. The fitting routine was performed using MATLAB (MathWorks, Natick, MA).
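The relaxation rates above were obtained by fitting single exponential decays in MATLAB. As a minimal sketch of the same idea, and not the authors' routine, the Python function below estimates $I_0$ and $R$ in $I(t) = I_0 e^{-Rt}$ by linear least squares on $\ln I$. The function name `fit_single_exponential` and the log-linear approach are assumptions made for illustration; a log-linear fit weights noisy, low-intensity points differently from a direct nonlinear fit, so the two can disagree on real data.

```python
import math


def fit_single_exponential(delays, intensities):
    """Fit I(t) = I0 * exp(-R * t) by linear least squares on ln(I).

    Returns (I0, R). Intensities must be positive. This log-linear fit is an
    illustrative simplification, not a weighted nonlinear fit.
    """
    if len(delays) != len(intensities) or len(delays) < 2:
        raise ValueError("need at least two paired (delay, intensity) points")
    xs = list(delays)
    ys = [math.log(i) for i in intensities]  # linearize: ln I = ln I0 - R t
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Hypothetical, noise-free decay sampled at 10 relaxation delays (seconds).
example_delays = [0.05 * k for k in range(1, 11)]
example_intensities = [1000.0 * math.exp(-1.5 * t) for t in example_delays]
I0_est, R_est = fit_single_exponential(example_delays, example_intensities)
```

With noise-free synthetic data the generating rate is recovered up to rounding; on experimental peak heights, a nonlinear fit (for example with `scipy.optimize.curve_fit`) is the more common choice.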
X-ray Crystallography-Crystals of S6Δβ2 were grown using the hanging-drop vapor diffusion method at 20 °C from drops containing equal volumes of protein (18 mg/ml) and a reservoir solution composed of 1.9 M ammonium sulfate, 0.1 M MES, pH 6.5, and 8% (v/v) 1,4-dioxane. The addition of dioxane was necessary to avoid the formation of crystal clusters due to excessive nucleation. Crystals appeared after 4 days and reached a maximum size of 0.2 × 0.2 × 0.1 mm in 10 days. The crystals were cryoprotected using 20% glycerol in the mother liquor. Crystallographic data were collected at 100 K at station I911-2 of MAX-lab, Lund, Sweden, using a 165-mm charge-coupled device (CCD) detector from MarResearch (Norderstedt, Germany). Images were indexed, integrated, and scaled using XDS (13). Crystals diffracted to 0.96 Å resolution, and data collection statistics are shown in Table 1.
The structure of S6Δβ2 was determined by molecular replacement using the program Phaser (14), with the structure of ribosomal protein S6 from Thermus thermophilus (Protein Data Bank (PDB) code 1RIS) (15) as the starting model. The model was built using the automatic building procedure in ARP/wARP (16) and completed by manual building in the molecular graphics program Coot (17). In the initial stages, all refinement was carried out to a resolution of 1.2 Å using phenix.refine (18), with 5% of the total reflections randomly set aside for cross-validation. Subsequently, the resolution was gradually extended to the full range, and refinement was carried out using SHELXL-97 (19) in conjugate-gradient least-squares mode. Disordered side chains were modeled, and their occupancies were refined with the restraint that the occupancies should sum to unity. Hydrogen atoms were not refined, but were included in "riding" positions. Anisotropic refinement was carried out in a series of rounds with SHELXL. After each refinement round, the model was inspected manually in Coot against 2|Fo| − |Fc| and |Fo| − |Fc| Fourier maps generated from SHELX fcf files. The refinement statistics are summarized in Table 1. The stereochemistry of the model was evaluated using MolProbity (20). Structural alignments and illustrations were made using the SSM algorithm (21) and PyMOL (22), respectively.

FIGURE 1. … Notably, β2 seems to be outside this cooperative unit of the S6 structure. C, as folding can nucleate in either 1 (green) or 2 (blue), the folding reaction of S6 partitions into two competing pathways: 1 → 2 and 2 → 1.

² The abbreviation used is: HSQC, heteronuclear single quantum correlation.
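The cross-validation step above (5% of reflections set aside) can be illustrated with a small, hypothetical Python sketch: reflections are randomly partitioned into a working set and a test ($R_{free}$) set, and a crystallographic R-factor $R = \sum \big||F_o| - |F_c|\big| / \sum |F_o|$ is computed on either set. The helper names are hypothetical, the calculated amplitudes are assumed to be already on the same scale as the observed ones, and phenix.refine applies its own flagging scheme; the sketch only shows the principle.

```python
import random


def split_free_set(reflections, free_fraction=0.05, seed=0):
    """Randomly partition reflections into (work, free) lists.

    A fraction `free_fraction` of the reflections (rounded) is flagged for
    cross-validation; a fixed seed keeps the flags reproducible.
    """
    rng = random.Random(seed)
    n_free = round(free_fraction * len(reflections))
    free_idx = set(rng.sample(range(len(reflections)), n_free))
    work = [r for i, r in enumerate(reflections) if i not in free_idx]
    free = [r for i, r in enumerate(reflections) if i in free_idx]
    return work, free


def r_factor(f_obs, f_calc):
    """R = sum(||Fo| - |Fc||) / sum(|Fo|); assumes Fc is already scaled to Fo."""
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have equal length")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


# Hypothetical Miller indices; real reflections would come from the XDS output.
hkl = [(h, k, l) for h in range(10) for k in range(10) for l in range(10)]
work_set, free_set = split_free_set(hkl)
```

Keeping the free set fixed across all refinement rounds is what makes $R_{free}$ an unbiased check; re-drawing it in each round would defeat that purpose.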
Stopped-flow Kinetics-Stopped-flow measurements and curve fitting were performed as in Ref. 10. All kinetic parameters were derived from the linear regime near the bottom of the chevron plots, i.e. within ±2 M of the transition midpoint, to avoid contributions from the curvatures in the unfolding and refolding limbs at high and low [GdmCl]. The data were fitted according to the standard equation

$$\log k_{obs} = \log\left(k_f^{H_2O}\,10^{m_f[\mathrm{GdmCl}]} + k_u^{H_2O}\,10^{m_u[\mathrm{GdmCl}]}\right)$$

where $k_f^{H_2O}$ and $k_u^{H_2O}$ are the extrapolated values of the refolding ($k_f$) and unfolding ($k_u$) rate constants at 0 M GdmCl, and $m_f$ and $m_u$ are the slopes of the refolding and unfolding limbs, respectively.
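The two-state chevron equation above can be written out as a short Python sketch. It evaluates $\log k_{obs}$ for given parameters, computes the GdmCl concentration at which $k_f = k_u$ (the transition midpoint), and selects data within the ±2 M window used for parameter estimation. Function names and the example parameter values are hypothetical and are not the S6 data; the actual fits were performed as in Ref. 10, and in practice the parameters would be obtained by nonlinear least squares (for example `scipy.optimize.curve_fit`) over the windowed points.

```python
import math


def log_kobs(gdmcl, kf_h2o, ku_h2o, m_f, m_u):
    """Two-state chevron: log10 k_obs = log10(kf_h2o*10^(m_f*c) + ku_h2o*10^(m_u*c))."""
    k_f = kf_h2o * 10 ** (m_f * gdmcl)  # refolding limb (m_f < 0)
    k_u = ku_h2o * 10 ** (m_u * gdmcl)  # unfolding limb (m_u > 0)
    return math.log10(k_f + k_u)


def midpoint(kf_h2o, ku_h2o, m_f, m_u):
    """[GdmCl] at which k_f = k_u: log10(kf_h2o / ku_h2o) / (m_u - m_f)."""
    return math.log10(kf_h2o / ku_h2o) / (m_u - m_f)


def within_window(points, center, half_width=2.0):
    """Keep (concentration, log k) pairs within +/- half_width M of center."""
    return [(c, lk) for c, lk in points if abs(c - center) <= half_width]


# Hypothetical parameters, chosen only to illustrate the calls.
params = dict(kf_h2o=100.0, ku_h2o=1e-3, m_f=-1.0, m_u=1.0)
mid = midpoint(**params)
curve = [(c / 2, log_kobs(c / 2, **params)) for c in range(0, 13)]  # 0 to 6 M
fit_points = within_window(curve, mid)
```

Restricting the fit to the window keeps the model linear in each limb, which is why the curvature at extreme [GdmCl] does not bias the extrapolated $k^{H_2O}$ values.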

RESULTS
Design of S6Δβ2-The new construct S6Δβ2 was derived from the circular permutant P54–55, which has a linker between the wild-type N and C termini and an incision between Lys-54 and Asp-55 in the flexible loop between β2 and β3 (Fig. 2). The purpose of the permutation is to move β2 to the end of the sequence, where it can easily be truncated by introducing a stop codon at Arg-78₃₆ (subscripts refer to the numbering in S6wt) (Fig. 2). To prevent S6Δβ2 oligomerization and better adapt the newly exposed surface to solvent, we introduced polar side-chain gatekeepers (23) in β3 (F7T₆₀ and L8T₆₁) and α1 (A62Q₂₀ and L63N₂₁) (Fig. 2). These gatekeepers were grafted from the mixed α/β-protein heterogeneous nuclear ribonucleoprotein K (HNRNP K) (24) (PDB code 1KHM), which has a structure similar to that of S6Δβ2.
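The construction route for S6Δβ2 (join the termini through a linker, open the chain in the β2-β3 loop, then truncate with a stop codon) amounts to simple sequence bookkeeping. The Python sketch below performs it on a hypothetical toy sequence and tracks which wild-type residue number each new position carries, which is what the subscript notation records. The function names, the toy sequence, and the convention that a stop codon replacing position n keeps residues 1 to n − 1 are assumptions for illustration; the real S6 sequence and construct boundaries are not reproduced here.

```python
def circular_permutant(seq, cut_after, linker):
    """Link the old C terminus to the old N terminus via `linker` and open the
    chain between wild-type residues cut_after and cut_after + 1 (1-based).

    Returns (new_seq, wt_numbers); wt_numbers[i] is the wild-type residue number
    at new position i + 1, or None for linker residues.
    """
    if not 0 < cut_after < len(seq):
        raise ValueError("cut point must lie inside the sequence")
    new_seq = seq[cut_after:] + linker + seq[:cut_after]
    wt_numbers = (
        list(range(cut_after + 1, len(seq) + 1))  # old C-terminal segment first
        + [None] * len(linker)                    # linker has no wild-type number
        + list(range(1, cut_after + 1))           # old N-terminal segment last
    )
    return new_seq, wt_numbers


def truncate_at_stop(seq, wt_numbers, stop_position):
    """Emulate a stop codon replacing residue `stop_position` (1-based)."""
    return seq[: stop_position - 1], wt_numbers[: stop_position - 1]


# Toy example: a 10-residue "protein", incised between residues 4 and 5.
perm, numbering = circular_permutant("ABCDEFGHIJ", cut_after=4, linker="xy")
# A stop at new position 11 removes wild-type residues 3-4 (the "edge strand").
short, short_numbering = truncate_at_stop(perm, numbering, stop_position=11)
```

In this toy case residues C and D play the role of the truncated edge strand, and the numbering list plays the role of the S6wt subscripts.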
NMR Spectroscopy-The solution HSQC spectrum of S6Δβ2 is well dispersed and displays all the characteristics of a folded protein with rigid tertiary structure (Fig. 3). Assignment of the backbone was complete, and the secondary structure elements of S6Δβ2 were identified from the N, HN, Hα, Cα, and C′ backbone chemical shifts. The results indicate three β-strands (residues 6–12, 33–39, and 45–51) and two α-helices (residues 15–29 and 56–76), the positions of which overlap precisely with those of the parent protein S6wt (supplemental Fig. S1). No peak splitting was found in sequential vicinity to the prolines, which indicates that S6Δβ2 resides in a single isomeric state. In contrast, P54–55 exhibits proline cis-trans isomerization near the C terminus, which is truncated in S6Δβ2, and around Pro-3 near the N terminus, which is homogeneous in S6Δβ2 (25). The dynamic motions of the S6Δβ2 structure were mapped by NMR relaxation experiments. At all positions except the outermost N-terminal residues, the relaxation data indicate a well ordered protein with a low degree of dynamic motion. The heteronuclear 15N-{1H} NOE values are high and uniform along the S6Δβ2 backbone, with an average value of 0.81 ± 0.17. Although no significant dips are seen along the sequence, residues 20–25 at the center of helix 1 stand out by displaying higher NOE values (Fig. 3), indicating that the central part of helix 1 is particularly rigid. The rotational correlation time (τc) of the S6Δβ2 molecule was determined according to the approximation (26)

$$\tau_c \approx \frac{1}{4\pi\nu_N}\sqrt{6\frac{R_2}{R_1}-7}$$

where $\nu_N$ is the Larmor frequency of nitrogen, and $R_1$ and $R_2$ are the longitudinal and transverse 15N relaxation rates. … The gatekeepers (23) designed into the edge-exposed β3 of S6Δβ2 thus seem to work as predicted.

FIGURE 2. Schematic outline of S6Δβ2 construction. Red lettering denotes altered sequence positions, and strand β2 is colored gray. First, the N and C termini of S6wt were linked by the TTPG loop, and the protein was incised between Lys-54 and Asp-55 to localize β2 at the C terminus (permutant P54–55). Second, β2 was truncated from the C terminus of P54–55 to obtain S6Δβ2. The latter step also included the insertion of gatekeepers in β3 (F7T₆₀ and L8T₆₁) and in α1 (A62Q₂₀ and L63N₂₁) to better adapt the newly exposed protein surface to solvent and avoid unwanted aggregation. The positions of these gatekeepers in the tentative S6Δβ2 structure are shown in red.

FIGURE 3. … A, reference HSQC spectrum of P54–55. B, the HSQC spectrum of S6Δβ2 has sharp and well dispersed cross-peaks similar to those of P54–55, consistent with a folded, monomeric protein. C, the relaxation parameters R1, R2, and NOE of S6Δβ2 indicate overall low dynamic motion of the S6Δβ2 backbone, as expected for a globally ordered structure.

X-ray Structure-S6Δβ2 crystallizes in space group P2₁2₁2₁ with one molecule in the asymmetric unit. The exceptionally high resolution of the diffraction data (0.96 Å) and the low mosaicity of the crystals are consistent with a compact molecule with little flexibility (Fig. 4). The structure consists of a three-stranded β-sheet with two α-helices on the same side. The radius of gyration including the hydration shell, calculated using CRYSOL (27), is 16.6 Å, in good agreement with the NMR-derived value of 17.4 Å.
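The τc approximation (26) quoted in the NMR analysis can be evaluated directly from average R1 and R2 values. The Python sketch below does so; the function name is hypothetical, rates are assumed to be in s⁻¹, and $\nu_N$ must be given in Hz. For a 700-MHz spectrometer the 15N Larmor frequency is roughly 71 MHz; the constant below is an approximate value assumed here, not one taken from the text. The approximation presumes isotropic tumbling and no conformational exchange, so rigid residues are typically selected before averaging.

```python
import math

NU_N_700 = 70.96e6  # approximate 15N Larmor frequency (Hz) at 700 MHz; assumption


def correlation_time(r1, r2, nu_n_hz):
    """tau_c ≈ sqrt(6*R2/R1 - 7) / (4*pi*nu_N), returned in seconds.

    r1, r2: longitudinal and transverse 15N relaxation rates (s^-1).
    nu_n_hz: 15N Larmor frequency in Hz (not angular frequency).
    """
    arg = 6.0 * r2 / r1 - 7.0
    if arg <= 0:
        raise ValueError("6*R2/R1 - 7 must be positive for this approximation")
    return math.sqrt(arg) / (4.0 * math.pi * nu_n_hz)


# Hypothetical average rates, for illustration only.
tau_example = correlation_time(1.5, 10.0, NU_N_700)
```

Because τc depends only on the ratio R2/R1, systematic scaling errors that affect both rates equally cancel, which is one reason this ratio-based estimate is popular.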
The S6Δβ2 structure is essentially identical to the in silico model of P54–55, except for the deletion of strand β2 and the following loop. It is also very similar to the structure of S6wt. Because this is the first reported crystal structure of a circular permutant of S6, it provides an important confirmation that the permutation causes little perturbation of the overall structure. No extended β-sheets are formed between molecules in the crystal. This is in contrast to another artificial variant, S6Alz, in which hydrophobic residues were engineered into β2, leading to a β-zippered tetramer in the crystal (23). The lack of oligomerization of S6Δβ2 follows from the introduction of edge-strand gatekeepers in β3 and is consistent with the NMR data.
Structural perturbations caused by the removal of β2 are minimal and mostly limited to rearrangements of side chains in β3 (Fig. 4). This strand relaxes to new positions due to the removal of packing constraints from side chains in β2. For example, Trp-9₆₂ repacks toward the central strand β1, facilitated by a rearrangement of Asn-49₇ (Fig. 4). Removal of β2 also causes a small rigid-body shift of helix α1 (residues 58–76) relative to the rest of the structure. Shifts in Cα positions in this helix between S6wt and S6Δβ2 range from 1.2 to 2.6 Å when the rest of the structure is superimposed. This indicates that β2 has a role in the exact positioning of α1 in the wild-type protein. An important interaction in this positioning could be the salt bridge between Lys-65₂₃ and wild-type position 42 in β2 (supplemental Fig. S2). In contrast, α2 (residues 15–28) is not shifted upon β2 removal, except for some rearrangement at its N terminus. This rearrangement seems to be due to interactions with the N- and C-terminal linker (residues 41–44) introduced in the circular permutation.

FIGURE 4. Structure of β2-truncated protein S6Δβ2. The crystal structure of S6Δβ2 at 0.96 Å resolution (beige) is overlaid with the in silico structure (designed from PDB entry 1RIS (9)) used as the basis for the P54–55 design (white); this model is used in the absence of a crystal structure for P54–55. The data show that the protein also folds into a native-like structure without the edge strand β2. The most notable effect of β2 truncation is that the new N-terminal edge strand in S6Δβ2 (β3) is slightly shorter than in the S6wt structure; the new N terminus folds away from β1. There is also a slight alteration in the position of α1. The residues on the surface of the β-sheet whose conformations change most as a result of β2 removal, namely Trp-9₆₂ and Asn-49₇, are highlighted.
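The Cα shifts of α1 were reported after superimposing the rest of the structure. A minimal NumPy sketch of that kind of analysis, using the Kabsch algorithm for the least-squares superposition, is shown below. It assumes two matched (N, 3) Cα coordinate arrays with residues in corresponding order; it is not the SSM procedure used for the published alignments, and the function names are hypothetical.

```python
import numpy as np


def kabsch_fit(mobile, target):
    """Rotation `rot` and translation `trans` minimizing
    ||mobile @ rot.T + trans - target|| for matched (N, 3) arrays."""
    mobile_c = mobile.mean(axis=0)
    target_c = target.mean(axis=0)
    h = (mobile - mobile_c).T @ (target - target_c)  # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # guard against improper rotations
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    trans = target_c - mobile_c @ rot.T
    return rot, trans


def displacement_after_core_fit(coords_a, coords_b, core_idx):
    """Superimpose on the `core_idx` atoms only, then return per-atom distances."""
    rot, trans = kabsch_fit(coords_a[core_idx], coords_b[core_idx])
    moved = coords_a @ rot.T + trans
    return np.linalg.norm(moved - coords_b, axis=1)


# Synthetic check: b is a rigidly moved copy of a, with the last two atoms
# (standing in for a shifted helix) displaced by an extra 2 Å.
rng = np.random.default_rng(0)
a = rng.normal(size=(10, 3)) * 5.0
theta = np.deg2rad(30.0)
rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
b = a @ rz.T + np.array([1.0, 2.0, 3.0])
b[8:] += np.array([0.0, 0.0, 2.0])
shifts = displacement_after_core_fit(a, b, list(range(8)))
```

Fitting on the core only is the key design choice: if the shifted helix were included in the superposition, its displacement would be partly absorbed into the fit and spread over the rest of the structure.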
Folding Kinetics-S6wt displays a v-shaped chevron plot with approximately linear GdmCl dependence of the logarithms of the refolding and unfolding rate constants ($\log k_f$ and $\log k_u$) (Fig. 5), characteristic of cooperative, two-state folding according to Scheme 1,

$$\mathrm{D} \rightleftharpoons \ddagger \rightleftharpoons \mathrm{N}$$

where D, ‡, and N are the denatured ensemble, the transition state, and the native state, respectively. The protein stability for such a system is defined by

$$\Delta G_{D-N} = -2.3RT\log K_{D-N} = 2.3RT\left[\log\frac{k_f^{H_2O}}{k_u^{H_2O}} - m_{D-N}[\mathrm{GdmCl}]\right]$$

where $K_{D-N} = [D]/[N] = k_u/k_f$ is the equilibrium constant between the denatured and native states, and $m_{D-N} = m_u - m_f$ is the equilibrium m-value. Because $m_f$ and $m_u$ are measures of the changes in solvent-exposed surface area for the transitions D → ‡ and N → ‡ (Scheme 1), they are commonly used as experimental reaction coordinates for folding (28–30). … (Table 2). The major part of this stabilization is likely due to the linkage of the N and C termini, as mutations/incisions in the dynamic loop between β2 and β3 generally have small effects on protein stability (10). Coupled to the stability gain, P54–55 displays an increased refolding rate constant, which has previously been suggested to arise from a stabilization of the 2 pathway (10). Finally, the removal of β2 from P54–55 to obtain S6Δβ2 results in a pronounced decrease of the transition midpoint; the stability decreases by 5.1 kcal/mol to $\Delta G_{D-N}^{H_2O}$ = 4.0 kcal/mol (Table 2). Most of this stability loss is due to an increased unfolding rate constant (Fig. 5), yielding an overall φ-value for β2 removal of approximately zero (Equation 5),

$$\phi = \frac{\Delta\log k_f}{\Delta\log k_f - \Delta\log k_u}$$

where $\Delta\log k_f$ and $\Delta\log k_u$ are the changes in rate constants upon β2 removal (31). The result shows that β2 has only a marginal impact on the folding transition state ensemble and selectively stabilizes the folded state (10). This supports the conclusion that β2 is outside the cooperative core of the S6 structure. Moreover, S6Δβ2 displays a pronounced decrease of $m_u$ (Eq. A), indicating that its folded state buries less surface area than the parent protein P54–55 (Fig. 5 and Table 2). Such a decrease in buried surface area is expected, as the truncated S6Δβ2 is effectively a smaller protein than P54–55 and wild-type S6.
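Equation 5 and the stability expression above reduce to a few lines of arithmetic. The Python sketch below computes $\Delta G_{D-N}$ from a pair of rate constants and the φ-value from the changes in $\log k_f$ and $\log k_u$. The names are hypothetical, $\Delta\log k$ is taken as log k(variant) − log k(parent), and the temperature of 298.15 K is an assumed default rather than a value stated in this section.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def stability(k_f, k_u, temp_k=298.15):
    """Delta G_D-N = 2.3RT log10(k_f/k_u) in kcal/mol (2.3 taken as ln 10).

    Positive values mean the native state is favored.
    """
    return math.log(10.0) * R_KCAL * temp_k * math.log10(k_f / k_u)


def phi_value(dlog_kf, dlog_ku):
    """phi = dlog k_f / (dlog k_f - dlog k_u); undefined if stability is unchanged."""
    denominator = dlog_kf - dlog_ku
    if denominator == 0:
        raise ValueError("no change in stability; phi is undefined")
    return dlog_kf / denominator
```

The limiting cases mirror the argument in the text: a change that alters only $k_u$ gives φ = 0 (the transition state is unaffected, as found for β2 removal), whereas a change that alters only $k_f$ gives φ = 1.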

Foldons as Property-based Level in Structural Hierarchy-
Comparison of structurally divergent protein domains indicates that structural evolution to some extent occurs progressively by insertions, deletions, and substitutions of smaller structural units (32). Mutation or insertion of additional residues in a flexible part of the protein can cause it to condense into an ordered secondary structure element that adds to the mother domain. When the mother domain is a well optimized two-state protein (1), the structural integrity of this newly acquired secondary structure element will not be critical for global stability or folding; it can unfold locally or be replaced with a suitable disordered sequence without rupturing the scaffold to which it docks. In sharp contrast, the corresponding alteration of a secondary structure element within the cooperative unit of the mother protein is expected to cause global unfolding, because the stabilities of two-state domains are generally very sensitive to core substitutions (1,31). To distinguish between such property-based differences within protein structures, we have employed an additional level of structural hierarchy based on foldons (7,8,10,33) (Fig. 1): a minimal cooperative unit that, in principle, would be able to fold cooperatively on its own (cf. Ref. 34). This does not necessarily mean that the foldon would be thermodynamically stable on its own, or even contiguous in sequence,³ but its folding transition, albeit to a high-energy state, would proceed over a free-energy barrier (1). The structure of S6 seems to be based on two partly overlapping foldons, 1 and 2, identified from the spatial boundaries of two competing folding nuclei (10) (Fig. 1). A notable feature of this arrangement is that β2, regardless of how the protein is rewired through circular permutation, falls outside the boundaries of the two folding nuclei (10); despite being part of the S6 sheet, β2 seems not to be part of the cooperative core. The ability to remove β2 without compromising folding or stability lends further support to the notion that the properties of the various parts of a protein can be predicted from the spatial distribution of foldons. In the simplest case, the foldons are then expected to outline the inflexible parts of the protein machinery. However, there are also instances where the actual unfolding/folding transitions of foldons are used to control functional rearrangements. For example, reversible unfolding/folding of foldons and domains can lend elasticity to arrayed repeat proteins (35), and the "cracking" of strategically placed foldons can relieve local stress and thereby lower the barriers for specific global motions (36,37).

TABLE 2. Kinetic data derived from the chevron plots in Figure 5
Foldon Expansion and Reduction-The solution structures and backbone dynamics of S6wt and P54–55 have previously shown that β2 displays dynamic motions in the region connecting to the seemingly flexible β2-β3 loop (25). Together with the autonomous stability of S6Δβ2 (Fig. 5 and Table 2), this indicates that β2 is free to unfold locally in the folded ground state of the wild-type protein. Even so, it is reasonable to believe that S6 could evolve to integrate β2 into the cooperative core, e.g. to better resist aggregation (2) or proteolytic digestion (38,39), or to allow further extension of the mother domain. Evidence that such changes of the foldon architecture indeed occur is provided by the S6-like protein muscle acylphosphatase (PDB code 1APS (40)), which comprises a C-terminal extension that folds up against β2 to form a five-stranded sheet (Fig. 6). Kinetic analysis suggests that the fifth strand is also integrated in the folding nucleus of acylphosphatase, together with β2 and α1 (41) (Fig. 6). It is not yet established, however, whether this expansion of the cooperative unit leads to the introduction of a third foldon and thus to three competing pathways in the folding energy landscape (cf. Ref. 8). Conversely, it should be possible to reduce the S6 structure further to a single foldon comprising just two strands and a helix. This minimal folding unit would then correspond to the smallest observed proteins that are able to fold into globular structures without cofactors (8). Indications that such a reduction could indeed occur are provided by the ground-state rupture of β4 in the pre-equilibrium of the unfolding reaction at high [GdmCl] (42). As the remaining structure still undergoes cooperative unfolding, it is expected to persist as an independent cooperative unit even in the absence of GdmCl. Consistently, the structures of α2 and β4 show relatively low protection factors as measured by hydrogen-exchange NMR (9); in contrast to the 1 foldon (Fig. 1), which exchanges by global unfolding (EX1), α2 and β4 exchange by local motions in the native free-energy basin (EX2), suggesting that these secondary structure elements can be removed while maintaining the folding barrier. Although S6 can be expressed and purified with extensive side-chain truncations in the α2 and β4 region,⁴ we have not yet been able to isolate the 1 foldon as an independent domain. The isolated 1 foldon requires further optimization of core packing and solvent interface to be populated under equilibrium conditions. An example of such an optimization is provided by the sequence-divergent domain of the flavoprotein dodecin from Halobacterium salinarum (PDB code 2V18) (43), which displays a topology very similar to S6 as it would appear without α2 and β4 (Fig. 6).

FIGURE 6. Expansion/reduction of foldons and role of β2 in assembly of S6-like proteins. A, structure of S6wt. B, the C-terminal extension of acylphosphatase forms an additional fifth strand that extends the sheet by docking to β2. The high φ-value of the mutation F94L in β5 (41) suggests that this sheet extension also involves an expansion/shift of the foldons of the cooperative core. C, conversely, dodecin from H. salinarum (43) seems to represent an example of foldon reduction, because the structure of this protein matches S6wt without α2 and β4. D, upon truncation of edge-strand gatekeepers in β2 (S6Alz), S6 associates by intermolecular β2 contacts (23). The figure shows the dimeric unit of the S6Alz tetramer. E, similar association patterns are observed for the S6-like monomers of monooxygenase from S. coelicolor (PDB code 1LQ9) (44) and several other viral analogues, implying that β2 plays a key role in the preferred mode of dimerization. F, in the ribosomal complex (PDB code 1G1X), β2 of S6wt provides an anchoring point for the loop that coordinates S18.
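The distinction drawn above between EX1 and EX2 exchange can be made quantitative for the EX2 case. There, for small opening equilibrium constants, the observed exchange rate is approximately the product of the opening equilibrium constant and the intrinsic (unprotected) rate, so the protection factor PF = k_int/k_ex gives an apparent opening free energy ΔG_op = RT ln PF. The Python sketch below encodes only this textbook EX2 relation. The function names are hypothetical, intrinsic rates are assumed to be known from elsewhere, and the relation does not hold under EX1 conditions, where the exchange rate reports the opening rate instead.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def protection_factor(k_int, k_ex):
    """PF = k_int / k_ex (both rates in the same units)."""
    return k_int / k_ex


def ex2_opening_free_energy(k_int, k_ex, temp_k=298.15):
    """EX2 limit: k_ex ≈ K_op * k_int, so dG_op = -RT ln K_op = RT ln PF (kcal/mol)."""
    return R_KCAL * temp_k * math.log(protection_factor(k_int, k_ex))
```

A low protection factor, as reported for α2 and β4, thus corresponds to a small apparent opening free energy, i.e. these elements can open locally without crossing the global unfolding barrier.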
Possible Roles of β2-The question then arises: if it is not required for folding (Fig. 5) or structural integrity (Figs. 3 and 4), what is the role of β2? One possibility is that β2 originates from a loop between α1 and β3, which serves to co-localize the N and C termini in the superfamily of ferredoxin-like folds. An advantage of such co-localization, which is common among globular proteins, could be that it facilitates structural evolution by gene shuffling; the fold can be freely evolved, inserted into, or deleted from loops of other domains. A more straightforward explanation, however, is that β2 is maintained for functional reasons. In the S6 structure, β2 provides an anchor for the loop that wraps over S18 in the ribosomal assembly (Fig. 6). Furthermore, comparison with other ferredoxin-like structures shows that β2 is the preferred interface for dimerization (see, e.g., the ferredoxin-like folds in the SCOP (Structural Classification of Proteins) database). A representative example is monooxygenase from Streptomyces coelicolor (PDB code 1LQ9 and several others) (44), where β2 joins two monomers by forming an antiparallel sheet (Fig. 6). Interestingly, S6 can be made to assemble in an analogous fashion by truncation of charged gatekeeper side chains in β2 (23) (Fig. 6). In the wild-type protein, these side chains are employed as aggregation gatekeepers that prevent promiscuous edge-to-edge interactions, and upon their removal, S6 assembles into ordered homotetramers (23) (Fig. 6). It is conceivable that the position of β2 outside the cooperative unit of the protein allows such quaternary assemblies to evolve relatively freely, with only minor penalties for the structural properties of the constituent monomers. A basic two-component recipe for designing proteins would then be to assemble suitable foldons in an overlapping fashion to shape the cooperative scaffold, and then decorate this unit with mutationally tolerant loop and border segments for function and binding.