Sequence Dependence of Renucleation after a Gly Mutation in Model Collagen Peptides*

Missense mutations in the collagen triple helix that replace one Gly residue in the (Gly-X-Y)n repeating pattern by a larger amino acid have been shown to delay triple helix folding. One hypothesis is that such mutations interfere with the C- to N-terminal directional propagation and that the identity of the residues immediately N-terminal to the mutation site may determine the delay time and the degree of clinical severity. Model peptides are designed to clarify the role of tripeptide sequences N-terminal to the mutation site, with respect to length, stability, and nucleation propensity, to complete triple helix folding. Two sets of peptides with different N-terminal sequences, one with the natural sequence α1(I) 886-900, which is just adjacent to the Gly901 mutation, and one with a GPO(GAO)3 sequence, which occurs at α1(I) 865-879, are studied by CD and NMR. Placement of the five tripeptides of the natural α1(I) collagen sequence N-terminal to the Gly to Ala mutation site results in a peptide that is folded only C-terminal to the mutation site. In contrast, the presence of the Hyp-rich sequence GPO(GAO)3 N-terminal to the mutation allows complete refolding in the presence of the mutation. The completely folded peptide contains an ordered central region with unusual hydrogen bonding while maintaining standard triple helix structure at the N- and C-terminal ends. These peptide results suggest that the location and sequences of downstream regions favorable for renucleation could be the key factor in the completion of a triple helix N-terminal to a mutation.

which there is defective mineralization of bones in type I collagen (2,3). Missense mutations that change one Gly in the repetitive (Gly-X-Y) sequence are the most common mutations (4). Such Gly mutations are found all along the collagen chain, suggesting that the loss of a Gly at any site in the triple helix has pathological consequences. The phenotype of the disease varies widely, depending on the type of amino acid substitution and the site of the mutation (5,6). There is evidence of abnormal folding of collagen in OI and other collagen diseases, which may relate to the pathology (7,8).
Folding of the triple helix is a complex, multistep process that includes association of three chains to form the supercoiled polyproline II triple helical structure (8). Collagen is synthesized in a procollagen form, with N- and C-terminal globular propeptides flanking the (Gly-X-Y)n central domain (9,10). Posttranslational hydroxylation of Pro and Lys residues in the Y positions and further glycosylation of Hyl occur while the chains are unfolded (11,12). Trimerization occurs through the association of three C-terminal propeptides, and then nucleation of the triple helix takes place at the (Gly-Pro-Hyp)n-rich sequence found at the C terminus of the (Gly-X-Y)n region. After nucleation of the three chains, the triple helix propagates in a zipper-like fashion from the C to the N terminus, with a rate-limiting step of cis-trans isomerization of Pro and Hyp (13,14).
OI has been characterized as a human folding disease (2). Enzymatic digestion studies show that OI mutations result in slower collagen folding rates (2,7,15). All OI collagens show increased levels of lysyl hydroxylation and glycosylation N-terminal to the mutation site (4,16). This has been suggested to result from delayed folding, since posttranslational enzymatic modifications can occur only on the unfolded chain. Thus, the appearance of increased amounts of these modifications N-terminal to the Gly substitution site suggests that the mutation delays triple helix formation at the site, extending the time during which collagens can be modified by enzymes that act only on the unfolded state.
In order to study details of the folding mechanism, biophysical studies, including NMR and CD, have been applied to collagen model peptides (17). Peptides that satisfy the stringent (Gly-X-Y)n sequence constraint and have a high content of imino acids will form stable triple helices in solution and have been used to model folding of the triple helix (17-19). One system that has been well characterized is peptide T1-892, which contains a C-terminal (Gly-Pro-Hyp)4 sequence modeling the nucleation domain of the type I sequence and an N-terminal sequence of residues 892-909 from the α1(I) chain of type I collagen (GPAGPAGPVGPAGARGPA) (20-22). Peptide T1-892 includes the site of a Gly to Ser mutation at position 901, which leads to a mild form of OI. NMR and CD studies have indicated that T1-892 forms a rigid, uniform triple helix along the entire length of the peptide. Substitution of Gly to Ala or Gly to Ser to model the OI mutation results in the formation of a peptide that is folded at the C-terminal end and unfolded at the mutation site and N-terminal to it (23). Based on these data, it has been suggested that the mutation interrupts the C- to N-terminal folding of the triple helix in the peptide.
In OI collagens, however, full-length triple helical molecules are formed, indicating that triple helix can be present on both sides of the substitution site. To better model OI collagen, peptide studies were initiated to develop models that could fold on both sides of the mutation. The sequences N-terminal to the mutation were varied with respect to length, stability, and nucleation propensity to better understand the requirements for completion of a triple helix. The presence of five tripeptides N-terminal to the mutation with a Gly-Ala-Hyp-rich sequence was necessary and sufficient to lead to a fully trimeric conformation, with triple helix formed both N- and C-terminal to the substitution site. This GAO-rich sequence promotes stability and was shown to act as an effective nucleus for triple helix formation. In contrast, the presence of five tripeptide units of the natural α1(I) collagen sequence N-terminal to the α1(901) OI mutation does not lead to complete peptide folding, suggesting that the triple helix may not continue past a mutation with the expected zipper-like C to N folding.

EXPERIMENTAL PROCEDURES
Peptides-Peptides were purchased from SynPep (Dublin, CA) and Alta Bioscience (Birmingham, UK). Peptides were purified on a C-18 column by high pressure liquid chromatography, and their identity was confirmed by mass spectroscopy.
NMR Spectroscopy-Samples T1-886, T1-886(G16A), T1-865, and T1-865(G16A) were prepared at a pH of 2.5 and a concentration of 7 mM. Concentration was confirmed by UV spectroscopy. Samples T1-892 and T1-892(GAO)3 were prepared at a concentration of 9 mM. All NMR experiments were carried out on a Varian INOVA 500-MHz spectrometer. Heteronuclear single quantum coherence spectroscopy (HSQC) spectra were recorded at 0°C with a sweep width of 6000 Hz in the 1H dimension and 1800 Hz in the 15N dimension. Temperatures were calibrated with methanol (+0.1°C). All of the pulse sequences employed enhanced sensitivity pulsed field gradient techniques (24). The {1H-15N} NOE relaxation measurements were performed in the presence and absence of 1H saturation at 0, 10, 20, 30, and 40°C (25).
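As an illustration of how a steady-state {1H-15N} NOE value is commonly obtained from such a pair of spectra, the sketch below takes the ratio of a cross-peak intensity recorded with 1H saturation to the intensity recorded without it. The function names and the example intensities are hypothetical and are not taken from this study; the sign-based reading (positive for rigid, negative for flexible) simply mirrors the interpretation used in the text.

```python
def heteronuclear_noe(intensity_saturated: float, intensity_reference: float) -> float:
    """Steady-state NOE as the ratio of peak intensity with 1H saturation
    to peak intensity without saturation (the reference spectrum)."""
    if intensity_reference == 0:
        raise ValueError("reference intensity must be non-zero")
    return intensity_saturated / intensity_reference


def mobility_label(noe: float) -> str:
    """Qualitative reading used in the text: positive NOE -> rigid,
    zero or negative NOE -> flexible. This threshold is an assumption."""
    return "rigid" if noe > 0 else "flexible"


# Hypothetical intensities (arbitrary units), not measured data.
trimer_noe = heteronuclear_noe(80.0, 100.0)
monomer_noe = heteronuclear_noe(-30.0, 100.0)
labels = (mobility_label(trimer_noe), mobility_label(monomer_noe))
```

In practice the intensities would be cross-peak volumes or heights taken from the saturated and unsaturated HSQC-type spectra at each temperature.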
The NMR folding experiments were performed as previously described (20). The sample was denatured outside of the NMR spectrometer by heating for 10 min at 50°C and then immersed in an ice water bath at 0°C for 30 s to quickly reduce the temperature to 0°C. The sample was then transferred to the spectrometer, which was precooled at 0°C. The total dead time for sample transfer was 1 min. A series of two-dimensional 1H-15N HSQC spectra was acquired every 3.5 min at 0°C. For each two-dimensional 1H-15N HSQC spectrum, 32 t1 increments were acquired with two scans per increment. Each folding experiment was performed three times per peptide, and the intensities were averaged. The kinetics of folding were monitored by measuring cross-peak volumes as a function of time. All two-dimensional data were processed on a Silicon Graphics work station with the Felix 97 software package (MSI, San Diego, CA).
Circular Dichroism Spectroscopy-Circular dichroism spectra were recorded on an Aviv model 62DS spectrophotometer. Cuvettes of 1-mm path length were used, and the temperature of the cells was controlled using a Peltier thermoelectric temperature controller. Samples were dissolved in phosphate-buffered saline buffer, pH 7.0, and pre-equilibrated at 4°C for 2-4 days prior to recording spectra. Wavelength scans were collected with a 2-s averaging time, from 260 to 210 nm at 0.5-nm steps at 5°C. For temperature-induced denaturation, the ellipticity was monitored at 225 nm. Peptides were equilibrated for 2 min with steps of 0.3°C, giving an average heating rate of 0.1°C/min (26). The melting temperature, Tm, was defined as the temperature at which the fraction folded (F) is equal to 0.5. For this monomer to trimer system, F was calculated as follows,
\[ F = \frac{\theta_{\mathrm{observed}} - \theta_{\mathrm{monomer}}}{\theta_{\mathrm{trimer}} - \theta_{\mathrm{monomer}}} \]
where $\theta_{\mathrm{observed}}$ represents the observed ellipticity, $\theta_{\mathrm{trimer}}$ is the trimer signal, and $\theta_{\mathrm{monomer}}$ is the monomer signal. The CD folding experiments were carried out by heating the sample to 70°C for 15 min and then rapidly quenching in an ice water bath and placing in a pre-equilibrated CD cell at 5°C (21). The dead time was on the order of 25 s. The ellipticity at 225 nm was monitored, with a time constant of 2 s and time interval of 10 s. The half time of refolding, t1/2, was determined as the time for the fraction folded to reach 0.5.
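The definitions of Tm and t1/2 above both reduce to finding where the fraction folded crosses 0.5. The following is a minimal sketch of that calculation, assuming the standard two-state normalization of ellipticity between monomer and trimer signals and simple linear interpolation between neighboring data points. The function names and the numerical values are hypothetical illustrations, not data or code from this study.

```python
from typing import List, Sequence


def fraction_folded(theta_obs: float, theta_monomer: float, theta_trimer: float) -> float:
    """F = (theta_obs - theta_monomer) / (theta_trimer - theta_monomer)."""
    return (theta_obs - theta_monomer) / (theta_trimer - theta_monomer)


def crossing_point(x: Sequence[float], f: Sequence[float], level: float = 0.5) -> float:
    """Return the x value at which f first crosses `level`,
    using linear interpolation between adjacent points."""
    for i in range(1, len(x)):
        f0, f1 = f[i - 1], f[i]
        if (f0 - level) * (f1 - level) <= 0 and f0 != f1:
            return x[i - 1] + (level - f0) * (x[i] - x[i - 1]) / (f1 - f0)
    raise ValueError("curve never crosses the requested level")


# Hypothetical melting curve: ellipticity at 225 nm versus temperature (deg C).
temps = [5.0, 15.0, 25.0, 35.0]
ellipticity = [5000.0, 4000.0, 1000.0, 0.0]
f_melt: List[float] = [fraction_folded(e, 0.0, 5000.0) for e in ellipticity]
tm = crossing_point(temps, f_melt)          # temperature where F = 0.5

# Hypothetical refolding trace: fraction folded versus time (s).
times = [0.0, 10.0, 20.0, 30.0]
f_refold = [0.0, 0.25, 0.75, 1.0]
t_half = crossing_point(times, f_refold)    # time where F reaches 0.5
```

The same helper serves both cases because it only looks for a crossing of the chosen level, whether F falls with temperature or rises with time. Real melting curves would normally also need sloping monomer and trimer baselines, which this sketch treats as constants.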

RESULTS
Peptides were designed to consider the effects of length, imino acid content, and nucleation propensity on the completion of folding in the presence of a Gly mutation (Fig. 1, a and b).
Study of Peptides with Natural Sequences N-terminal to a Gly Mutation-Peptide T1-886 contains a (GPO)4 C-terminal nucleation sequence and seven tripeptide units taken from the α1(I) sequence (residues 886-906) of collagen. A homologous peptide T1-886(G16A) was synthesized with a Gly to Ala mutation at position 16, with five tripeptide units N-terminal to the mutation site (Fig. 1b).
CD spectroscopy was used to characterize the conformation and thermal stability of T1-886 and T1-886(G16A) (Fig. 2). Peptide T1-886 gave a characteristic triple helix CD spectrum at low temperature with a maximum near 224 nm and a minimum at 198 nm (Fig. 2, inset). A sharp thermal transition with a Tm of 25°C is observed as the temperature is increased. Peptide T1-886(G16A), which contains a Gly to Ala substitution at position 16, shows a significant drop in mean residue ellipticity (MRE) at 225 nm from 5194 degrees cm² dmol⁻¹ to 1988 degrees cm² dmol⁻¹ and a substantial decrease in melting temperature from 25 to 7°C, indicating a large destabilization of the peptide and loss of triple helix content. The reduced ellipticity and stability could arise from an alternate conformation in the central region of the peptide or could be due to interruption of the folding at the mutation site and the formation of a partially folded triple helix.
NMR studies were carried out on 15N-labeled peptides T1-886 and T1-886(G16A) in order to ascertain the conformation of individual residues. The NMR spectrum of peptide T1-886 contains both monomer and trimer resonances for all labeled positions, Gly28, Ala18, and Gly7, consistent with the trimer form being in equilibrium with the monomer form of the peptide (Fig. 3). The appearance of three trimer peaks and positive {1H-15N} NOEs (Table 1) for Gly7 and Ala18 is consistent with the nonrepetitive environment surrounding Gly7 and Ala18 and the formation of a rigid triple helix along the peptide chain. As seen previously, in addition to the monomer resonance, a single trimer resonance with positive {1H-15N} NOE is seen for the rigid C-terminal repetitive (GPO)4 triple helical environment at Gly28 (27). In contrast, the HSQC spectrum of T1-886(G16A) shows distinct monomer and trimer peaks only for resonances that are C-terminal to the substitution site, suggesting that the Gly to Ala mutation disrupts the triple helical conformation. The C-terminal Gly28 has a trimer resonance with a positive NOE value, indicating that the C-terminal end forms a rigid triple helix. Residues Gly7 at the N-terminal end and Ala16 at the mutation site show no trimer resonances.
CD and NMR studies on T1-886(G16A) show that using five tripeptides of native sequences adjacent to the mutation site to increase the length N-terminal to the substitution site is not sufficient to reinitiate folding after the mutation site. Computational analysis suggests that further increasing the length alone will not promote N-terminal refolding because of the inherent instability of the N-terminal triplets adjacent to 886, GPVGPAGKS (28) (data not shown). These sequences do not have high stability or propensity to nucleate. However, at 865-879 there is a sequence of triplets, GPO(GAO)3GPV, with high imino acid content and high nucleation propensity (Fig. 1). It is possible that this GPO(GAO)3 sequence could serve as a good renucleation sequence N-terminal to the Gly to Ala mutation site, and peptides were designed to test this hypothesis.
Sequences with High Imino Acid Content at the N Terminus Allow Folding after a Gly Mutation-Peptides T1-865 and T1-865(G16A) were designed similarly to T1-886 but now include the GPO(GAO)3GPV sequence from positions 865-879 of the collagen sequence N-terminal to the substitution site (Fig. 1). CD experiments indicate that T1-865 forms a highly stable triple helix with an MRE at 225 nm of 4460 degrees cm² dmol⁻¹ and a melting temperature of 35°C (Fig. 4). Peptide T1-865(G16A) with a Gly to Ala substitution at position 16 has a very similar MRE of 4037 degrees cm² dmol⁻¹ but a significantly reduced Tm of 15°C relative to T1-865.
NMR of peptide T1-865(G16A) 15N-labeled at different positions is used to examine the conformation of residues in different parts of the molecule. In the HSQC spectrum, both trimer and monomer resonances are seen for all labeled residues: Gly7 and Gly28, which represent the N- and C-terminal regions, and Gly16, which is the site of the Gly to Ala mutation (Fig. 5). In addition to the monomer resonances, residues Gly7 and Gly28 show only a single trimer resonance and positive {1H-15N} NOEs, suggesting that they are both in a rigid repetitive triple helical environment consisting of (GPO)4 at the C-terminal end and GPO(GAO)3 at the N-terminal end (Table 1). Three trimer resonances with positive NOEs are seen for Ala16, indicating that the environment surrounding the mutation site is different for the three chains but rigid on the nanosecond-picosecond time scale. A large downfield chemical shift perturbation is seen for one of the Ala resonances (9.5 ppm) relative to the other two trimer resonances. Based on the NMR data, placement of the GPO(GAO)3 sequence at the N-terminal end of the peptide allows formation of a rigid triple helix at both the C- and N-terminal ends of the peptide as well as the formation of an ordered trimer in the central mutation region.

The effect of the N-terminal GPO(GAO)3 nucleation sequence on the directionality of triple helix folding can be seen in the real time NMR folding experiments of peptides T1-886, T1-886(G16A), T1-865, and T1-865(G16A) (Fig. 6, a-d). Comparison of the folding of native peptides T1-886 and T1-865 shows that T1-886 folds from the C- to N-terminal direction, whereas T1-865 appears to fold as a cooperative unit. Substitution of Gly to Ala in T1-886 results in a peptide that is only partially folded at the C-terminal end but has folding rates at the C-terminal (GPO)4 end that are similar to those seen in the native peptide. The Gly to Ala substitution in T1-865(G16A), which contains the GPO(GAO)3 imino acid-rich N-terminal sequence, results in the loss of the cooperative folding observed in the parent T1-865 peptide, but folding still occurs around the mutation site. However, relative to the C-terminal folding, folding at the mutation site and N-terminal to it appears to be slowed down.
Evidence That GPO(GAO)3 Is a Nucleation Domain-A peptide T1-892(GAO)3 was designed to investigate whether the sequence GPO(GAO)3 can function as a nucleation domain when located at the C terminus (Fig. 1c). CD data indicate that T1-892(GAO)3 has an MRE of 4750 degrees cm² dmol⁻¹, similar to T1-892, indicating the formation of triple helix (22). The Tm of T1-892(GAO)3 is 14°C, lower than the Tm of 20°C seen for T1-892. HSQC spectra of T1-892(GAO)3 are typical of spectra obtained for folded trimers, and {1H-15N} NOE data indicate that the peptide forms a rigid triple helix along the entire peptide chain (Table 1). Real time NMR folding experiments indicate that the C-terminal Gly25 residue folds faster than the N-terminal Gly7 residue, indicating that T1-892(GAO)3 folds in a directional manner from the C- to N-terminal direction (data not shown), similar to folding of T1-892 (20,27). The ability of T1-892(GAO)3 to form a stable triple helix that folds from the C- to N-terminal direction indicates that GPO(GAO)3 can function as a C-terminal nucleation domain.
To investigate the molecular basis of the more efficient nucleation capacity of (GPO)4 compared with GPO(GAO)3, NMR dynamics experiments were used to assess the flexibility of residue Gly25 in the monomer and trimer forms of T1-892 and T1-892(GAO)3 as a function of temperature (Fig. 8). The all-trans monomer Gly25 residues are more flexible than the trimer Gly25 residues as seen by the negative {1H-15N} NOE values for the monomer and the positive values for the trimer. Substitution of GPO(GAO)3 in the C-terminal end results in more negative NOEs at position Gly25 with a more distinct effect at higher temperatures. The more negative NOEs in T1-892(GAO)3 indicate that the lower imino acid content of GPO(GAO)3 relative to (GPO)4 in T1-892 results in more flexibility and mobility in the nucleation domain.

DISCUSSION
Redesign of sequences has allowed refolding of a triple helix on the N-terminal as well as the C-terminal side of a Gly substitution in the context of an OI mutation site. Peptides that contain the natural collagen sequence found directly N-terminal to the α1(I) 901 mutation site did not generate refolding around the Gly to Ala mutation. Substitution of the N-terminal imino acid-poor sequence with a more stable imino acid-rich GPO(GAO)3 sequence found further downstream of the mutation results in a stable triple helix that folds around the Gly to Ala mutation site, with rigid triple helical N- and C-terminal regions. Further studies will be performed to understand whether the naturally occurring heterotrimers have similar features to the homotrimer models studied here.
Although the presence of a Gly substitution prevents a continuous triple helix, the NMR data indicate that the central region around the substitution site is ordered. The NMR data on T1-865(G16A) indicate the presence of three distinct trimer resonances at the mutation site, indicating that the three chains are in different environments. The high values of the NOEs indicate that each residue adopts a specific ordered conformation at the mutation site. The 1H chemical shift for one of the Ala NH resonances at the mutation site is highly downfield shifted (~9.5 ppm). This unusual downfield shift is also seen in the NMR spectra of a different peptide model of OI, (POG)10 with a Gly to Ala substitution (31). Details of the structural perturbations at the mutation site of the (POG)10 peptide with a Gly to Ala substitution have been described by an x-ray crystal structure (32-34), and it has been shown that the breaking of the repeating (Gly-X-Y)n sequence by the Gly to Ala substitution results in an alteration of the conformation with a local untwisting of the triple helix. At the substitution site, direct interchain hydrogen bonds are replaced with interstitial water bridges between the peptide chains. The significant downfield NMR chemical shifts observed for peptide T1-865(G16A) and for peptide (POG)10 with the Gly to Ala substitution may reflect unusual hydrogen bonding patterns similar to the interstitial water bridges seen in the x-ray structure of the (POG)10 Gly to Ala peptide.
The success in complete folding around a mutation site in our peptide systems depended on having the sequence GPO(GAO)3 adjacent to the mutation. Further peptide designs show that the GPO(GAO)3 sequence is a good C-terminal nucleation domain, although less efficient than (GPO)4, particularly at low concentrations.
The nature of a renucleation sequence has not been defined in collagen, but it is proposed that a good renucleation sequence may be similar to a good nucleation sequence that allows folding to be initiated. Factors that promote nucleation have been shown to relate to high imino acid content and high stability (9,35). Experiments performed on a set of host-guest peptides with different tripeptide sequences have shown that GPO sequences fold faster than GAO sequences, but both are good nucleation sequences (35). Further studies have shown that nucleation propensity may also be related to the dynamics of the monomer state with a highly constrained monomer state related to the efficiency of folding (36). The CD and NMR results on peptides T1-892 and T1-892(GAO)3 indicate that imino acid-rich sequences of the form (GPO)4 or GPO(GAO)3 can act as nucleation domains when placed at the C-terminal end of the peptide.
The observation that the GPO(GAO)3 sequence, which occurs at α1(I) 865-876, but not the natural sequence α1(I) 886-900, allows complete folding on both sides of the mutation when placed adjacent to the Gly901 mutation site has implications for considering the folding of OI collagens with a Gly substitution. It was previously suggested that the sequence of the triplet located N-terminal to the mutation site is important for renucleation and completion of collagen triple helix formation (2,23,37). The implication is that if renucleation occurs, it will take place adjacent to the mutation site and continue in a unidirectional propagation (Fig. 9a). The fact that the sequence that was directly adjacent to the mutation did not result in refolding suggests that renucleation sequences further downstream from the mutation rather than sequences immediately adjacent to it may be required for folding to resume in collagen. The stable and imino acid-rich GPO(GAO)3 sequence found further downstream in the collagen sequence may act as a second nucleation domain, allowing folding to restart in collagen.
The discontinuity in folding at the mutation site suggests that alternate folding mechanisms may be required to explain OI collagen folding. Assuming that a renucleation site must be encountered in order to restart folding downstream of the mutation, then reverse folding will be required as well in order to form a complete triple helix around the mutation site (Fig. 9b). To investigate the nature of reverse folding, a set of peptides with N-terminal (GPO)4 nucleation domains was previously designed (27). One peptide, T1′-892n, is similar in design to peptide T1-892(GAO)3 and contains a (GPO)4 nucleation domain N-terminal to the imino acid-poor 892 sequence. NMR and CD studies have shown that the peptide forms a stable rigid triple helix along the entire chain but that the directional nature of the folding is lost when nucleation takes place at the N terminus. To further investigate the details of folding at the N terminus, a second peptide, T1′-892-G10S, was designed to model the effect of an OI mutation N-terminal to a nucleation domain and contains a Gly to Ser replacement at position 10. This peptide folds up to the Gly to Ser mutation site from the N-terminal end, indicating that formation of a triple helix N-terminal to a mutation site is possible in the presence of a strong (GPO)4 nucleating sequence at the N-terminal end (27). Reverse N- to C-terminal folding has also been observed in peptides that contain disulfide bonds or foldons at the N-terminal ends (38,39).
FIGURE 9. A continuous folding model (a) and a discontinuous folding model (b). In both models, nucleation of the three chains occurs at the C-terminal end (depicted by the circle), and folding is interrupted at the α1(I) 901 mutation site (depicted with a red line). In the continuous folding model (a), renucleation occurs adjacent to the 901 mutation site (depicted by the circle), followed by propagation to the N-terminal end (depicted with an arrow), whereas in the discontinuous model (b), renucleation (depicted by the circle) occurs further downstream of the 901 mutation site at the GAO-rich site (depicted by a blue line), requiring reverse and forward folding to form full-length collagen (depicted with the arrows).
The existence of internal renucleation sites and the possibility of folding up to the mutation site from either the N- or C-terminal end support the potential for a bidirectional mechanism to complete the folding of collagens with OI mutations. Crystal structure and NMR data of OI peptides support the existence of water-mediated hydrogen bonds at the interruption site, and the recruitment of these waters during the bidirectional folding may contribute to the folding delay that is observed in OI collagens. Examination of the collagen sequence shows that there are a number of regions that contain two sequential (GXO-GXO) triplets that may serve as internal renucleation sites. With the discontinuity in the folding and the lack of a unidirectional zipper-like folding model, the distance from the mutation site to the next renucleation site may also play a role in the observed folding delay in OI collagens and the degree of clinical severity.
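The kind of sequence examination mentioned above, looking for two sequential GXO triplets, could be sketched as follows. This is an illustrative scan only, assuming a one-letter sequence written in the Gly-X-Y reading frame with O denoting Hyp; the function names are hypothetical, and the two test sequences are the GPO(GAO)3GPV and 892-909 stretches quoted in the text, not a full collagen chain.

```python
from typing import List


def split_triplets(seq: str) -> List[str]:
    """Split an in-frame (Gly-X-Y)n sequence into tripeptide units."""
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def is_gxo(triplet: str) -> bool:
    """True for a Gly-X-Hyp triplet (one-letter code, O = Hyp)."""
    return len(triplet) == 3 and triplet[0] == "G" and triplet[2] == "O"


def gxo_pair_starts(seq: str) -> List[int]:
    """0-based residue offsets at which two sequential GXO triplets begin."""
    units = split_triplets(seq)
    return [
        3 * i
        for i in range(len(units) - 1)
        if is_gxo(units[i]) and is_gxo(units[i + 1])
    ]


# Sequences quoted in the text (one-letter code, O = Hyp).
gao_rich = "GPOGAOGAOGAOGPV"          # GPO(GAO)3GPV
imino_poor = "GPAGPAGPVGPAGARGPA"     # residues 892-909 of T1-892

gao_hits = gxo_pair_starts(gao_rich)
poor_hits = gxo_pair_starts(imino_poor)
```

Overlapping pairs are reported separately, so a run of four GXO triplets yields three start offsets. Whether any such site actually renucleates folding is an experimental question that a pattern scan of this kind cannot answer.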