Collagen triple helix formation can be nucleated at either end

The directional dependence of folding rates for rod-like macromolecules such as parallel α-helical coiled-coils, DNA double helices and collagen triple helices is largely unexplored. This is mainly due to technical difficulties in measuring rates in different directions. Folding of collagens is nucleated by trimeric non-collagenous domains. These are usually located at the C-terminus, suggesting that triple helix folding proceeds from the C- to the N-terminus. Evidence is presented here that effective nucleation is possible at both ends of the collagen-like peptide (Gly-Pro-Pro) 10, using designed proteins in which this peptide is fused either N- or C-terminal to a nucleation domain, either T4-phage foldon or the disulfide knot of type III collagen. The location of the nucleation domain influences triple-helical stability, which might be explained by differences in the linker sequences and the presence or absence of repulsive charges at the carboxy-terminal end of the triple helix. Triple-helical folding rates are found to be independent of the site of nucleation and consistent with cis-trans isomerization being the rate-limiting step.

been performed (2) and, for collagen, NMR diffusion was proposed as a potential method to determine directionality (3). The collagen triple helix is a linear structure composed of three left-handed polyproline-II-type helices. Chains in this conformation are not stable as individual structures, but associate to form a right-handed triple helix, which is stabilized by hydrogen bonds between chains (4). Formation of the triple helix is only possible if every third residue in the sequence of the chains is glycine (G). Imino acid residues favour polyproline-II-type helices, and therefore the frequency of proline (P) residues in the typical collagen-like sequence (Gly-X-Y) n is high in the X- and Y-positions. 4-Hydroxyproline plays a special role in stabilizing the triple helix (5,6) and is frequently found in the Y-position. Collagen triple helices are very abundant structures. They occur in all collagens as long rod-like elements and form short rods in the first component of complement C1q, lung surfactant protein SPA, mannose binding protein, scavenger receptors and many other extracellular proteins (7). Collagen triple helices are slowly folding structures with peptide cis-trans isomerization as the rate-limiting step (8). Cis-trans isomerization is often rate limiting for globular proteins as well (9) and dominates for collagens because of the many X-Pro bonds (X stands for any amino acid residue), which have a high probability of forming cis peptide bonds in the unfolded state. Another peculiarity of collagens is the need for nucleation domains for correct and sufficiently fast triple-helical folding. It was found that trimerizing non-collagenous domains are essential for initial chain association and chain registration. This requirement for oligomerization and registration was explained by the need to establish a high local concentration at the nucleus and to prevent mismatched structures.
For thermodynamic and kinetic studies, collagen-like peptides with designed oligomerization and nucleation domains have been particularly useful (10)(11)(12)(13). Spontaneous folding of the collagen triple helix is extremely slow, follows third-order kinetics and is therefore strongly concentration dependent (11,14). For most collagens, NC-1 domains are found at the C-terminus, leading to the idea that triple helices may only or preferentially fold from the C-terminus. Recently it was suggested that a three-stranded coiled-coil domain at the N-terminus of collagen XIII can induce collagen folding from the N-terminus (15). Similar suggestions have been made for other collagens (16). This stimulated us to measure the rate of collagen triple helix folding in both directions, using designed proteins in which a globular oligomerization domain or a disulfide knot is attached to either the N- or C-terminus of (Gly-Pro-Pro) 10 .

Construction of expression plasmids and production of recombinant proteins.
The designed proteins (GPP) 10 -Cys 2 and (GPP) 10 -foldon were expressed in E. coli as described (11,17). These models were used in the cited studies to examine the increase in stability caused by trimerization of the fused phage protein foldon (17) or by linkage of the three chains through the disulfide knot Cys 2 = GPPGPCCGGG of type III collagen (11).
The inverted protein with the disulfide knot at the N-terminus, Cys 2 -(GPP) 10 , was expressed using the same strategy, but an additional Gly-Ser spacer was inserted between Cys 2 and (GPP) 10 . (GPP) 10 -Cys 2 and Cys 2 -(GPP) 10 were trimerized at 20 °C by their (GPP) 10 -domains, and formation of the disulfide knot was achieved by oxidation of the fully reduced material with a mixture of oxidized and reduced glutathione at a molar ratio of 9:1. Trimer formation was verified by mass spectrometry and by analytical ultracentrifugation.

Mass Spectral Analysis.
For mass spectral analysis, the peptides were chromatographed on a 100 µm i.d. (inner diameter) column packed with Vydac C18 reverse-phase material (5 µm particle size). The proteins were eluted with a linear 20-minute gradient from 0.1% TFA to 80% acetonitrile/0.1% TFA at a flow rate of 1 µl/min. The outlet of the column was directed to a microspray needle, which was pulled from 100 µm i.d., 280 µm o.d. fused silica capillaries (LC Packings) on a model P-2000 quartz micropipette puller (Sutter Instrument Company). The needle was placed into an XYZ micropositioner and the voltage was applied directly to the sample stream through the capillary union (18). Spray voltages were usually between 1100 and 1400 V. Mass determinations were carried out on a TSQ7000 triple quadrupole mass spectrometer (Finnigan). Formation of trimers of the predicted mass was verified for (GPP) 10 -Cys 2 and Cys 2 -(GPP) 10 . In foldon-(GPP) 10 and (GPP) 10 -foldon the three chains are not connected by covalent bonds, and mass spectrometry demonstrated monomers of the predicted mass.

Analytical ultracentrifugation.
Sedimentation equilibrium experiments were performed on a Beckman Optima XL-A analytical ultracentrifuge (Beckman Instruments) equipped with 12-mm Epon double-sector cells in an An-60 Ti rotor. The peptides were analysed in 5 mM sodium phosphate buffer, pH 7.4, containing 150 mM NaCl. Sedimentation velocity runs were performed at a rotor speed of 56,000 rpm and sedimenting material was assayed by absorbance at 234 nm.
Molecular masses were evaluated from ln A versus r² plots, where A is the absorbance and r is the distance from the rotor center (19). A partial specific volume of 0.73 ml/g was used for all calculations.
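The evaluation from ln A versus r² plots rests on the standard relation for a single ideal species at sedimentation equilibrium, d(ln A)/d(r²) = M(1 − v̄ρ)ω²/(2RT). The following Python sketch illustrates that calculation. The function name, the solvent density of 1.0 g/ml, the rotor speed of 20,000 rpm, the temperature and the 9,000 g/mol test mass are illustrative assumptions and not values from this study; only the partial specific volume of 0.73 ml/g is taken from the text.

```python
import math
import numpy as np

R_GAS = 8.314462618  # gas constant, J mol^-1 K^-1


def molar_mass_from_equilibrium(r_cm, absorbance, rpm, temp_k,
                                vbar_ml_per_g=0.73, rho_g_per_ml=1.0):
    """Molar mass (g/mol) from the slope of ln A versus r^2.

    Assumes a single ideal species. rho_g_per_ml is an assumed solvent
    density; it is not given in the text.
    """
    r_m = np.asarray(r_cm, dtype=float) * 1e-2           # cm -> m
    ln_a = np.log(np.asarray(absorbance, dtype=float))
    slope, _intercept = np.polyfit(r_m ** 2, ln_a, 1)    # slope in m^-2
    omega = 2.0 * math.pi * rpm / 60.0                   # rad/s
    buoyancy = 1.0 - vbar_ml_per_g * rho_g_per_ml        # dimensionless
    mass_kg_per_mol = 2.0 * R_GAS * temp_k * slope / (buoyancy * omega ** 2)
    return mass_kg_per_mol * 1000.0                      # kg/mol -> g/mol


# Synthetic round trip with hypothetical run parameters (not from the paper).
TRUE_MASS = 9000.0   # g/mol, hypothetical
RPM = 20000.0        # hypothetical equilibrium speed
TEMP_K = 293.15      # hypothetical temperature

omega_demo = 2.0 * math.pi * RPM / 60.0
slope_demo = (TRUE_MASS / 1000.0) * (1.0 - 0.73) * omega_demo ** 2 / (2.0 * R_GAS * TEMP_K)
radii_cm = np.linspace(6.8, 7.1, 20)
radii_m = radii_cm * 1e-2
signal = 0.2 * np.exp(slope_demo * (radii_m ** 2 - radii_m[0] ** 2))

recovered = molar_mass_from_equilibrium(radii_cm, signal, RPM, TEMP_K)
print(f"recovered molar mass: {recovered:.1f} g/mol")
```

With real data, curvature in the ln A versus r² plot would indicate that the single-species assumption does not hold, in which case a straight-line fit of this kind yields only an apparent average mass.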

Results and Discussion
The thermal transition profile of Cys 2 -(GPP) 10 was measured by the change in circular dichroism at 221 nm, and a midpoint transition temperature T m = 67 °C was obtained, which is 15 °C lower than the value for (GPP) 10 -Cys 2 (11) (Fig. 1a, Table 1). As previously observed for (GPP) 10 (Table 1). In the presence of guanidine HCl the transition was fully reversible according to two criteria: (i) after heating for 5 min to 50 °C for Cys 2 -(GPP) 10 and to 70 °C for (GPP) 10 -Cys 2 , the sample was cooled to 20 °C and the value before heating returned after 2 hours; (ii) a second transition profile recorded after this refolding was identical to the initial one. The thermal transition of foldon-(GPP) 10 was measured by circular dichroism at 210 nm, where the spectrum of foldon alone shows no change upon increasing the temperature (11). The midpoint transition temperature T m = 53 °C corresponds to the (GPP) 10 -triple helix and is 17 °C lower than the value for the (GPP) 10 -domain in (GPP) 10 -foldon measured by the same method (Fig. 1b). Full reversibility was demonstrated by repeated scanning. Comparison of the T m -values with those of the (GPP) 10 -triple helix without oligomerization domains (Table 1) demonstrates that nucleation at either the N- or the C-terminus stabilizes the triple helix. The mode of stabilization by foldon at the C-terminus was investigated in some detail (11,17) and the crystal structure of (GPP) 10 -foldon has recently been solved (20). It was concluded that the stabilization is entropic in nature such that the oligomerization domain creates an internal concentration of about 1 M at the junction between foldon (or the disulfide knot) and (GPP) 10 .
The designed proteins with oligomerization domains either at the N- or C-terminus were then used to compare the rates of refolding. It is known that nucleation of triple helices from free (GPP) 10 chains is extremely slow, concentration dependent and of apparent third order (11).
Connection of three chains by oligomerization domains increases the intrinsic chain concentrations to a level at which cis-trans isomerization steps in helix propagation, rather than chain finding steps, become rate limiting (11). Given the very high concentration dependence of nucleation, it has to be expected that helix propagation will be preferentially initiated at the end at which the oligomerization domain is attached, although we cannot exclude the possibility that a small fraction of helix initiations may also occur at other sites. Figure 2 shows the time-resolved phases, which follow a fast and kinetically unresolved jump of the circular dichroism signal. The fast phase mainly consists of the temperature-dependent change of circular dichroism, which is also visible in temperature regions where no transition takes place (Fig. 1). It may also include a very small fast phase of triple helix formation before the first cis-peptide bond is met (8,21). The slow phase reflects the major part of the transition of the (GPP) 10 -domain, and long triple helices of real collagens fold entirely with a slow phase determined by cis-trans isomerization (8).
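The difference between apparent third-order nucleation of free chains and first-order folding of linked chains shows up in how the half-time responds to concentration. For simple n-th order kinetics, dc/dt = −k cⁿ, the half-time scales as c₀^(1−n). The Python sketch below illustrates this textbook relation only; the function names are hypothetical and the unit rate constant is a placeholder, not a fitted value from this work.

```python
import math


def half_time(order, k, c0):
    """Half-time of dc/dt = -k * c**order starting from concentration c0."""
    if order == 1:
        return math.log(2.0) / k
    return (2.0 ** (order - 1) - 1.0) / ((order - 1) * k * c0 ** (order - 1))


def half_time_ratio(order, fold_increase):
    """Factor by which the half-time changes when c0 is raised fold_increase-fold."""
    return half_time(order, 1.0, fold_increase) / half_time(order, 1.0, 1.0)


# A 4-fold concentration increase shortens a third-order half-time 16-fold
# and leaves a first-order half-time unchanged.
print(half_time_ratio(3, 4.0))
print(half_time_ratio(1, 4.0))
```

Under these idealized rate laws, a time course that does not change when the concentration is raised is the signature of a first-order, intramolecular rate-limiting step.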
For Cys 2 -(GPP) 10 and foldon-(GPP) 10 it was verified that the kinetic time courses did not change upon increasing the concentration by a factor of 4 at 10 or 20 °C. For (GPP) 10 -Cys 2 this was shown previously (11). Figures 2 and 3 clearly demonstrate that the first-order rate constant of folding is identical for proteins carrying the oligomerization domains at opposite ends. The refolding rate constant for Cys 2 -(GPP) 10 was 0.00037 s⁻¹, measured at 20 °C and a peptide concentration of 100 µM.
For foldon-(GPP) 10 a value of 0.00098 s⁻¹ was determined for a peptide concentration of 51 µM at the same temperature. Also, the activation energies obtained from an Arrhenius plot of rate constants measured at different temperatures turned out to be identical within the limits of experimental error (Fig. 3). Values of activation energy larger than 50 kJ/mol confirm cis-trans isomerization steps as rate-limiting events. Deviations from the maximum activation energies predicted for this step have been discussed (11).
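An activation energy is obtained from an Arrhenius plot as the slope of ln k versus 1/T, since ln k = ln A − Eₐ/(RT). The Python sketch below illustrates the fit and converts a first-order rate constant into a half-time. The temperature grid and the rate constants in the demonstration are synthetic, generated from an assumed activation energy of 50 kJ/mol and anchored at 0.00098 s⁻¹ at 20 °C; they are not the measured data behind Fig. 3, and the function names are hypothetical.

```python
import math
import numpy as np

R_GAS = 8.314462618  # gas constant, J mol^-1 K^-1


def activation_energy_kj(temps_c, rate_constants):
    """Activation energy (kJ/mol) from a linear fit of ln k versus 1/T."""
    inv_t = 1.0 / (np.asarray(temps_c, dtype=float) + 273.15)
    ln_k = np.log(np.asarray(rate_constants, dtype=float))
    slope, _intercept = np.polyfit(inv_t, ln_k, 1)   # slope = -Ea / R
    return -slope * R_GAS / 1000.0


def half_time_s(k_per_s):
    """Half-time (s) of a first-order process with rate constant k."""
    return math.log(2.0) / k_per_s


# Synthetic demonstration data (assumed values, see lead-in).
EA_ASSUMED = 50.0e3                                  # J/mol
K_20 = 0.00098                                       # s^-1 at 20 degrees C
temps_c = np.array([10.0, 15.0, 20.0, 25.0])         # hypothetical grid
temps_k = temps_c + 273.15
rates = K_20 * np.exp(-EA_ASSUMED / R_GAS * (1.0 / temps_k - 1.0 / 293.15))

ea_fit = activation_energy_kj(temps_c, rates)
print(f"fitted activation energy: {ea_fit:.1f} kJ/mol")
print(f"half-time at k = 0.00037 s^-1: {half_time_s(0.00037) / 60.0:.0f} min")
print(f"half-time at k = 0.00098 s^-1: {half_time_s(0.00098) / 60.0:.0f} min")
```

For the two rate constants quoted in the text, this gives half-times of roughly 31 min and 12 min at 20 °C, which conveys how slow the isomerization-limited phase is.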
In summary, the results demonstrate that collagen triple helix formation can be nucleated at both ends. The rates are equal within experimental error. In order to exclude possible effects of the nature of the nucleation domain, two very different oligomerization sequences were employed. It was shown that their influence on protein stability was slightly different and also that the location of the domains at either end of the collagen-like domain was an important factor. These differences may be explained by energetic differences in the linker sequences between the collagen triple helix and the oligomerization domains and by the presence of charges at the end of the triple helix. The rate-limiting step of cis-trans isomerization is, however, independent of the direction of folding.

Figure 2
Refolding rates of (GPP) 10 with disulfide knots.
Refolding rates of (GPP) 10 with foldon domains. experimental data, dashed lines were fitted according to (11). b. Temperature dependence of the refolding rate of foldon-(GPP) 10 (red circles and line) and (GPP) 10 -foldon (blue circles and line). The activation energies are 50.0 kJ/mol and 54.5 kJ/mol, respectively.
Table 1
Midpoint transition temperatures T m of the (GPP) 10 -domain in designed proteins and their molecular masses.