Molecular Mechanism of α1(I)-Osteogenesis Imperfecta/Ehlers-Danlos Syndrome

We demonstrate that 85 N-terminal amino acids of the α1(I) chain participate in a highly stable folding domain, acting as the stabilizing anchor for the amino end of the type I collagen triple helix. This anchor region is bordered by a microunfolding region, 15 amino acids in each chain, which include no proline or hydroxyproline residues and contain a chymotrypsin cleavage site. Glycine substitutions and amino acid deletions within the N-anchor domain induce its reversible unfolding above 34 °C. The overall triple helix denaturation temperature is reduced by 5–6 °C, similar to complete N-anchor removal. N-propeptide partially restores the stability of mutant procollagen but not sufficiently to prevent N-anchor unfolding and a conformational change at the N-propeptide cleavage site. The ensuing failure of N-proteinase to cleave at the misfolded site leads to incorporation of pN-collagen into fibrils. Similar, but weaker, effects are caused by G88E substitution in the adjacent triplet, which appears to alter N-anchor structure as well. As in Ehlers-Danlos syndrome (EDS) VIIA/B, fibrils containing pN-collagen are thinner and weaker causing EDS-like laxity of large and small joints and paraspinal ligaments. However, distinct structural consequences of N-anchor destabilization result in a distinct α1(I)-osteogenesis imperfecta (OI)/EDS phenotype.

Mature type I collagen is a heterotrimer of two α1(I) and one α2(I) chains folded into a 300-nm-long triple helix with short, unstructured telopeptides at the N- and C-terminal ends. The distinguishing feature of the collagen triple helix is an obligatory Gly residue in every third position on each chain. Collagen precursor (procollagen) is synthesized and folded within the rough endoplasmic reticulum. It is secreted by cells with N- and C-terminal propeptides still attached on each side of the molecule. Subsequent cleavage of the propeptides by specialized N- and C-proteinases triggers self-assembly of the mature collagen into fibrils. The fibrils co-assemble with a variety of different extracellular matrix molecules into the structural scaffold of skin, bone, and other connective tissues. Most mutations disrupting the structure of the type I collagen triple helix (usually substitutions of one of the obligatory Gly residues) cause moderately severe to lethal forms of osteogenesis imperfecta (OI), which is a clinically heterogeneous group of disorders characterized by bone fragility and skeletal deformity (1-3).
In our previous study (4) we described a distinct group of α1(I)-OI/EDS patients with combined clinical symptoms of type III or IV OI as well as laxity of large and small joints and paraspinal ligaments more characteristic of Ehlers-Danlos syndrome (EDS). All of them had structural mutations within the first 90 N-terminal residues of the helical region of the α1(I) chain (five Gly substitutions and a 15-aa deletion). Their mutations altered procollagen structure within the N-propeptide cleavage site, resulting in incorporation of pN-collagen with uncleaved N-propeptides into fibrils. We hypothesized that the nature of EDS-like symptoms in α1(I)-OI/EDS patients is similar to type VII EDS (5,6) caused primarily by deletions of the N-propeptide cleavage site in α1(I) and α2(I) chains (EDS VIIA and VIIB, respectively) or by N-proteinase deficiency (EDS VIIC). However, it remained unclear why α1(I)-OI/EDS patients had a somewhat different EDS phenotype (e.g. pronounced early scoliosis and no bilateral hip dysplasia) and why their collagen fibrils had a more rounded cross-section under electron microscopy.
Based on accumulated structural knowledge and analysis of the amino acid sequence, we proposed that the long suspected (7-9) domain organization of the collagen triple helix plays a crucial role in α1(I)-OI/EDS (4). Specifically, we postulated that a distinct, highly stable "N-anchor" folding domain is formed by the first 85 N-terminal amino acids of each chain of the type I collagen triple helix. It is separated from the rest of the triple helix by a highly flexible microunfolding region lacking the Pro and Hyp residues essential for triple helix stability. The N-anchor is responsible for the proper folding and stability of the N-terminal end of the triple helix. Disruption of the structure of the anchor domain by α1(I)-OI/EDS mutations propagates into the adjacent N-propeptide, thereby altering the folding of its cleavage site.
In the present study, we provide further biochemical and biophysical evidence for the existence of the N-anchor domain and demonstrate that the putative structural change in α1(I)-OI/EDS mutations is a complete unfolding of this domain above 34°C. We discuss implications of these findings for understanding the structural origin of the difference between α1(I)-OI/EDS and EDS VIIA/B patients, both in the appearance of collagen fibrils and in the EDS phenotype.

EXPERIMENTAL PROCEDURES
Preparation of Procollagen-Skin fibroblast cultures were established from dermal punch biopsies collected from OI and α1(I)-OI/EDS patients. Normal control cells (CRL-2127) were purchased from American Type Culture Collection (Manassas, VA). Mutant and control fibroblasts were grown to confluence at 37°C in Dulbecco's modified Eagle's medium (Invitrogen) containing 10% fetal bovine serum (Invitrogen) and 2 mM glutamine (Sigma) in the presence of 5% CO2. Culture medium was removed, and fresh, serum-free Dulbecco's modified Eagle's medium supplemented with 2 mM glutamine and 50 μg/ml ascorbate (Sigma) was added to the cell cultures. The medium was harvested after 24 h, buffered with 100 mM Tris-HCl, pH 7.4, protected with protease inhibitors (25 mM EDTA, 0.2% NaN3, 1 mM phenylmethylsulfonyl fluoride, 5 mM benzamidine, and 10 mM N-ethylmaleimide), and cooled to 4°C. Procollagen was precipitated by gradual addition of ammonium sulfate to a final concentration of 176 mg/ml and incubation at 4°C overnight, followed by centrifugation at 12,000 × g for 2 h. Ammonium sulfate precipitates were dissolved in 0.1 M sodium carbonate, 0.5 M NaCl, pH 9.3, or 0.2 M sodium phosphate, 0.5 M glycerol, pH 7.4. Total protein concentration was measured by Micro BCA protein assay (Pierce) and collagen concentration was measured by Sircol assay (Accurate Chemical & Scientific Corp., Westbury, NY). Based on these assays, collagen constituted 30-50% of the protein mix.
Preparation of Pepsin-treated Collagen-Ammonium sulfate protein precipitates were suspended in 0.5 M acetic acid and digested by pepsin (EMD Biosciences, Darmstadt, Germany) overnight at 4°C (~0.1 mg/ml, ~1:10 pepsin:collagen). After digestion, collagen was precipitated by addition of NaCl to 0.7-0.9 M final concentration. The pellet was redissolved in 0.5 M acetic acid and dialyzed against 2 mM HCl. Purity of the preparation and the content of type III collagen were evaluated by gel electrophoresis as described below.
Fluorescent Labeling and Gel Electrophoresis-For fluorescent labeling, ammonium sulfate procollagen precipitates were dissolved in 0.1 M sodium carbonate, 0.5 M NaCl, pH 9.3. Pepsin-treated collagen was transferred into the same buffer by mixing a 2 mM HCl solution of the protein 1:1 with the 2× concentrated buffer. The proteins were labeled by monoreactive Cy5 or Cy3 (Amersham Biosciences) as described (10). For gel electrophoresis analysis, aliquots of fluorescently labeled samples were mixed with lithium dodecyl sulfate sample buffer (Invitrogen) and analyzed on precast 3-8% Tris-acetate (collagen) or 6% Tris-glycine (procollagen) mini gels (Invitrogen). The gels were scanned on an FLA5000 fluorescence scanner (Fuji Medical Systems, Stamford, CT). Intensity profiles for each lane were extracted using ScienceLab software supplied with the scanner. Quantitative analysis of band intensities was performed by ScienceLab software or by using PeakFit software (Systat, Point Richmond, CA), when band deconvolution was required. The fraction of type III collagen in each sample was estimated from the intensities of α1(III)3 and α1(I) + α1(III) bands without and with the addition of dithiothreitol.
Proteolytic Cleavage and Peptide Mapping-Collagen labeled by Cy5 or Cy3 (0.1 mg/ml) was transferred into 50 mM Tris, 0.2 M NaCl, 2 mM CaCl2, pH 7.5, using AutoSeq G-50 micro-spin columns (Amersham Biosciences) and treated by chymotrypsin (Sigma; 0.5 mg/ml) or trypsin (Sigma; 1 mg/ml) at different temperatures (24-38°C). Binary mixtures of collagens (mutant and control), one labeled by Cy5 and one labeled by Cy3, were prepared so that they could be co-processed by an enzyme in the same tube under identical conditions. Sample aliquots were collected after 20 min at each temperature or after different time intervals at a fixed temperature, mixed with the gel sample buffer (with added acetic acid to stop the reaction), and analyzed on 3-8% Tris-acetate mini gels. When necessary, the samples mixed with the gel buffer were rapidly frozen on dry ice until the end of the experiment. Gel slices containing major bands were excised from the gel, treated with CNBr, and re-analyzed on pre-cast 12% Tris-glycine or 12% BisTris mini gels (Invitrogen). For better visualization of CNBr peptides, higher collagen concentration (up to 0.3 mg/ml) and/or more intense fluorescent labeling were used.
Differential Scanning Calorimetry (DSC)-DSC measurements were performed with 0.1-0.4 mg/ml procollagen solutions in 0.2 M sodium phosphate, 0.5 M glycerol, pH 7.4, as well as with pepsin-treated collagen solutions in the same phosphate/glycerol buffer or in 2 mM HCl. A study of normal control and several OI mutants revealed that additional chromatographic purification is not required for DSC measurement of procollagen denaturation thermograms. Other proteins co-precipitated with procollagen by 176 mg/ml ammonium sulfate do not contribute measurable heat in the temperature range of procollagen denaturation. The thermograms were recorded from 10 to 60°C in a Nano II calorimeter (Calorimetry Sciences Corp., Lindon, UT). Most measurements were performed at 0.125°C/min heating rate, which provides the best resolution of different denaturation peaks. In cases of limited protein amount, 1°C/min thermograms were recorded. Due to the effect of the scanning rate on collagen denaturation (11), the thermograms measured at 1°C/min could be compared only with each other. Each thermogram was background-corrected by subtraction of a linear baseline and normalized per unit collagen concentration (for quantitative analysis) or to give the same peak height (for easier visual comparison). To extract the contribution of type I collagen, the thermogram of purified human type III collagen (SouthernBiotech, Birmingham, AL) was rescaled to represent the amount corresponding to the fraction of type III collagen in the sample and subtracted from the thermogram of the mixture. For the rescaling procedure we utilized the similarity of type I and type III denaturation enthalpies (12). Deconvolution of mutant peaks in DSC thermograms was performed with the PeakFit software.
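The thermogram corrections described above (linear baseline subtraction, then subtraction of a rescaled type III collagen thermogram) can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure or software actually used: the function names, the choice to anchor the baseline on the first and last few points, and the synthetic Gaussian peaks are all assumptions made for the example. The rescaling step relies on the stated similarity of type I and type III denaturation enthalpies, so that peak area is taken as proportional to mass fraction.

```python
import math

def trapezoid_area(x, y):
    """Area under y(x) by the trapezoid rule."""
    return sum((x[i + 1] - x[i]) * (y[i + 1] + y[i]) / 2 for i in range(len(x) - 1))

def subtract_linear_baseline(temps, heat, n_edge=3):
    """Subtract a straight line anchored on the first and last n_edge points.

    Assumption: both ends of the scan are free of denaturation peaks.
    """
    t0 = sum(temps[:n_edge]) / n_edge
    h0 = sum(heat[:n_edge]) / n_edge
    t1 = sum(temps[-n_edge:]) / n_edge
    h1 = sum(heat[-n_edge:]) / n_edge
    slope = (h1 - h0) / (t1 - t0)
    return [h - (h0 + slope * (t - t0)) for t, h in zip(temps, heat)]

def remove_type_iii(temps, mixture, type_iii, fraction_iii):
    """Subtract the type III thermogram rescaled to its mass fraction.

    Assumes equal specific denaturation enthalpy for type I and type III,
    so the type III contribution is fraction_iii of the total peak area.
    """
    scale = fraction_iii * trapezoid_area(temps, mixture) / trapezoid_area(temps, type_iii)
    return [m - scale * c for m, c in zip(mixture, type_iii)]

def gaussian(t, center, width=1.0):
    """Synthetic peak shape used only to build the demonstration data."""
    return math.exp(-((t - center) / width) ** 2 / 2)

# Synthetic demonstration data (not measured values): 10 to 60 degrees C.
temps = [10 + 0.5 * i for i in range(101)]
raw = [0.9 * gaussian(t, 42) + 0.1 * gaussian(t, 39) + 0.01 * t + 0.2 for t in temps]
type_iii_ref = [gaussian(t, 39) for t in temps]

corrected = subtract_linear_baseline(temps, raw)
type_i = remove_type_iii(temps, corrected, type_iii_ref, fraction_iii=0.10)
type_i_share = trapezoid_area(temps, type_i) / trapezoid_area(temps, corrected)
```

In this synthetic case the remaining type I thermogram carries 90% of the original peak area, as expected for a 10% type III fraction.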
Circular Dichroism-CD spectra of 0.2 mg/ml collagen in 0.2 M sodium phosphate, 0.5 M glycerol, pH 7.4, were recorded from 215 to 250 nm at 100 nm/min scanning rate, 1 cm optical path length, 5 nm bandwidth in a J810 (Jasco, Easton, MD) spectropolarimeter equipped with a PFD-425S thermoelectric temperature controller (Jasco). To minimize ultraviolet damage, the shutter was closed during temperature equilibration between measurements (total UV exposure <15-20 min). The change in the fraction of triple helical conformation was estimated from the change in ellipticity at 223.8 nm. This wavelength was selected to minimize the direct effect of temperature on the ellipticity not associated with changes in collagen conformation. It represents an apparent isosbestic point, at which the CD signal from the normal control collagen is independent of temperature from 10 to 30°C.

Thermal Stability of the Triple Helix
DSC Thermograms-Normalized DSC thermograms (see "Experimental Procedures") of normal control and mutant collagens in different buffers are shown in Fig. 1. Heating of normal control collagen at a constant rate produces a single denaturation peak in DSC with a maximum at the apparent denaturation temperature Tm. This peak contains a small contribution from type III collagen also present in the sample (Fig. 1A, Normal control). Whenever possible, we measured the content of type III collagen in the sample by gel electrophoresis (Fig. 1C) and corrected the thermograms for its contribution as described under "Experimental Procedures." Because of the relatively low content of type III collagen (Table 1) the correction was small and not critical for qualitative interpretation of the data (Fig. 1A).
Mutations within the collagen triple helix result in additional denaturation peaks, typically with reduced Tm. We find that each peak corresponds to a distinct collagen form rather than melting of different domains within the same molecule. For instance, three denaturation peaks of molecules with no (Tm ≈ 42°C), one (Tm ≈ 40°C), or two (Tm ≈ 37.5°C) mutant α1(I) chains can be distinguished in G76E (Fig. 1A). Although most mutant peaks have lower Tm, their specific denaturation enthalpy appears to be the same as in normal control collagen, within the accuracy of collagen concentration measurements. Thus, the area under each peak represents the fraction of the corresponding molecules, allowing one to quantify each of the fractions from peak deconvolution. The results of quantitative collagen composition analysis based on deconvolution of DSC peaks from G76E and other mutant samples are collected in Table 1.
Note that additional peaks on thermograms could also result from degraded collagen. Occasionally we did observe DSC peaks originating from cleavage of mutant triple helices upon prolonged exposure to pepsin. To eliminate such artifacts, each sample was verified by gel electrophoresis, and whenever necessary, the length of the pepsin treatment was adjusted. For instance, due to the increased susceptibility, the length of G88E treatment was reduced to 6 h.
The assignment of each peak in the G76E thermogram relies on the following. In heterozygous mutants, ~25% of the synthesized molecules are expected to have no mutant chains, ~50% one mutant chain, and ~25% two mutant chains. Normally, mutant molecules have a higher probability of being retained and degraded inside cells. Thus, the ~1:2 ratio of the areas of the normal Tm peak and the middle peak indicates that the latter represents molecules with a single mutant chain, almost all of which appear to be secreted. Then the smallest, third peak represents molecules with two mutant chains, about 50% of which appear to be retained and degraded by cells. This assignment is consistent with the expected larger effect on Tm of two mutant chains with large charged residues in place of the obligatory glycine.
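The expected 25:50:25 distribution follows from random incorporation of normal and mutant α1(I) chains into the two α1(I) positions of each trimer. The short Python sketch below reproduces that arithmetic and shows how selective retention changes the secreted composition. The function names and the secretion efficiencies (1.0, 1.0, 0.5) are illustrative assumptions chosen to mirror the qualitative statements above; they are not measured values.

```python
from math import comb

def chain_distribution(p_mutant=0.5, n_alpha1=2):
    """Binomial fractions of trimers with 0..n_alpha1 mutant alpha1(I) chains.

    Assumes mutant and normal chains are synthesized in proportion
    p_mutant : (1 - p_mutant) and incorporated at random.
    """
    return [
        comb(n_alpha1, k) * p_mutant**k * (1 - p_mutant) ** (n_alpha1 - k)
        for k in range(n_alpha1 + 1)
    ]

def secreted_fractions(synthesized, secretion_efficiency):
    """Renormalized composition after each species is secreted with its own efficiency."""
    weights = [s * e for s, e in zip(synthesized, secretion_efficiency)]
    total = sum(weights)
    return [w / total for w in weights]

synthesized = chain_distribution()  # no, one, two mutant chains
# Hypothetical efficiencies: full secretion of molecules with no or one
# mutant chain, ~50% retention of molecules with two mutant chains.
secreted = secreted_fractions(synthesized, [1.0, 1.0, 0.5])
ratio_normal_to_single = secreted[0] / secreted[1]
```

Under these assumptions the normal and single-mutant species keep their 1:2 ratio, while molecules with two mutant chains fall to about one-seventh of the secreted collagen.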
Fig. 1 legend: …, and 2 mM HCl, pH 2.7 (B), at 0.125°C/min heating rate. C, gel electrophoresis of the normal control sample with (+) and without (−) dithiothreitol. Each thermogram shows the additional heat (compared with a reference solution) required for raising the sample temperature at a constant rate. A peak in the thermogram is the excess heat of collagen triple helix denaturation. The peak maximum corresponds to the apparent Tm at given buffer composition and heating rate. The area under the peak is the denaturation enthalpy. Thin solid lines represent background-corrected sample thermograms, normalized to the same peak height. Bold lines represent thermograms of type I collagen obtained by subtraction of the expected type III collagen contribution. The fraction of type III collagen was evaluated from the intensities of α1(III)3 and α1(I) + α1(III) bands (Table 1). The contribution of type III collagen to each thermogram (shown by a dashed line in the normal control sample) was estimated based on the independently measured DSC thermogram of purified type III collagen as described under "Experimental Procedures." Dotted lines show the denaturation peaks of different type I collagen species obtained by deconvolution of the corresponding thermograms. All three possible peaks for molecules with no, one, and two mutant α1(I) chains (α1m) could be distinguished only in G76E and in G34R (at acid pH) thermograms. In other cases, the glycine substitutions result in approximately the same effect on Tm regardless of the number of mutant chains so that only two denaturation peaks can be resolved, one for normal molecules and one for both types of mutant helices.
Table 1 footnotes: a, α1m and α1 indicate mutant and normal chains, respectively. b, All cleavage experiments were performed at 34°C, except for chymotrypsin cleavage of G88E, which was performed at 30°C to avoid excessive proteolytic damage. c, Estimated error …
Interpretation of the peaks and the composition analysis for thermograms of other α1(I) mutations is less straightforward when only two or one denaturation peaks are distinguishable. The loss of peaks corresponding to molecules with one or two mutant chains could be related to (a) reduced synthesis and/or secretion of such molecules and (b) weaker effect of the mutant chains on Tm so that two or even all three peaks corresponding to different molecular forms of type I collagen overlap with each other. In some cases, the three peaks can still be resolved in a different buffer, as illustrated in Fig. 1B for G34R. Different effects of pH on different denaturation peaks can be explained as follows. Glu-32 is likely to be protonated and uncharged at pH 2.7 (below the glutamic acid pK). The protonation of Glu-32 would eliminate the potentially favorable electrostatic interaction between the positively charged Arg-34 and the negatively charged Glu-32. Such interaction might reduce the destabilizing effect of G34R substitution at pH 7.4, particularly in molecules with two mutant chains. The lack of this favorable interaction might explain the larger effect of the substitution on the molecules at pH 2.7, resulting in the larger Tm shift and in the separation of the denaturation peaks for molecules with one and two mutant chains. Similarly, the loss of favorable interaction between Arg-75 and Glu-76 might explain a significant shift of the denaturation peak of G76E molecules with two mutant chains to lower temperature at pH 2.7 compared with pH 7.4.
In other cases, we can only determine the total fraction of mutant molecules but cannot distinguish the molecules with one and two mutant chains (e.g. for G25V). Sometimes only one peak is observed (e.g. G352S and G832S, Fig. 2, K and M). In this case, the only conclusion that can be drawn from DSC of media collagen is that the mutation has little effect on collagen thermal stability (provided that the mutant molecules are secreted).
Effect of Propeptides-We previously reported an unusual feature of mutations in the first 90 N-terminal amino acids (4). Their DSC thermograms showed distinct differences between procollagen and the corresponding collagen. It appeared that the presence of N-propeptides had a stabilizing effect on the triple helix in molecules containing mutant chains, resulting in up to 3°C increase in their apparent Tm (Fig. 2, B-F). No effect of propeptides on the apparent Tm was observed in normal collagen (Fig. 2A) or molecules with mutations immediately beyond this region (G121D and G136R, Fig. 2, H and I). Our subsequent, more detailed studies revealed no differences in the thermal stability of procollagen and collagen for mutations further along the triple helix (e.g. G193S, G352S, G448S and G832S, Fig. 2, J-M), except in the immediate vicinity of the C-terminal end. Stability of the latter molecules appears to be affected by cleavage of C-propeptides, but this effect is beyond the scope of the present study and will be discussed elsewhere.

Mutation Effect on Microunfolding
Proteolytic Cleavage-To verify the existence of a distinct N-terminal folding domain and to determine how mutations alter its structure, we investigated susceptibility of collagen to proteolytic cleavage by trypsin and chymotrypsin. After 20-min equilibration in 0.5 mg/ml chymotrypsin at 34°C, we observed no substantial proteolytic degradation of control collagen, G136R, G193S, and G448S. In contrast, all tested molecules with mutations in the first 90 N-terminal amino acids of the α1(I) chain exhibited pronounced cleavage (Fig. 3A and Table 1). Each mutant was co-processed with control collagen in the same tube to ensure identical conditions. The cleavage patterns were identical within each of the following groups of molecules: 1) normal control, G136R, G193S, and G448S; 2) G13D and ΔE7; 3) G25V and G34R; and 4) G76E and G88E. Thus, only one representative from each group is shown in Fig. 3A.
Analysis of CNBr peptides from the products of this cleavage revealed increased chymotrypsin susceptibility within CB5 (aa 87-123) of the α1(I) chain, most likely at Phe-92 (Fig. 3, B and C). In addition, G121D had an increased susceptibility to proteolysis at the N-terminal end of CB8 (aa 124-402). The results of collagen treatment by trypsin were qualitatively similar, but the reaction products were more difficult to detect and interpret due to the much larger number of potential cleavage sites.
Thermal Stability of Truncated Collagen-By utilizing similar gel electrophoretic analysis and CNBr peptide mapping, we found that 4-h treatment of G25V collagen by 0.5 mg/ml chymotrypsin at 30°C produced a mixture of intact, full-length collagen with no mutant chains and truncated collagen whose N-terminal domain was completely removed by chymotrypsin cleavage (most likely at Phe-92 in α1(I) and Phe-86 in α2(I)). DSC of this mix in 2 mM HCl at 1°C/min produced two peaks consistent with denaturation of intact molecules at normal Tm and N-truncated molecules ~7°C lower (Fig. 4B). Since truncated molecules do not contain the mutation, this reduced Tm simply represents thermal stability of the triple helical fragment without 90 N-terminal amino acids.
Deconvolution of DSC thermograms of untreated mutant collagens in 2 mM HCl at 1°C/min revealed low temperature peaks in intact G25V, G34R and G76E with Tm similar to the truncated collagen (Fig. 4B). These peaks correspond to the denaturation of triple helices with two mutant chains, which therefore have the same thermal stability as the triple helix without the first 90 N-terminal amino acids. In these mutants the N-terminal domain becomes so weak that it does not contribute to the thermal stability of the triple helix at all, in sharp contrast to the normal control. In combination with significant chymotrypsin sensitivity within α1(I)-CB5 at 34°C, stabilization of the molecule by the N-propeptide, and the inability of N-proteinase to cleave the N-propeptide, this effect suggests that the first 90 N-terminal amino acids might be completely unfolded at body temperature.
The lack of such a low temperature peak in ΔE7 and G13D means that the N-terminal domain contributes to the thermal stability even when these proteins contain two mutant chains. Their susceptibility to chymotrypsin cleavage within α1(I)-CB5 at 34°C still indicates that the conformational change in the triple helix propagates through the whole N-terminal domain, but the conformational change appears to be less drastic than in G25V, G34R, or G76E, e.g. it might involve loosening rather than complete unfolding of the N-terminal domain.
Reversible Partial Denaturation of G25V Collagen-To confirm unfolding of the N-terminal domain in G25V, we measured temperature-induced changes in the circular dichroism at 223.8 nm. This wavelength was selected to minimize changes in the CD signal directly associated with temperature rather than with changes in the secondary structure of collagen. Control and G25V samples were equilibrated at 10°C and tested by similar trains of 34.5°C pulses shown in Fig. 5B. Comparison of the observed CD changes (Fig. 5A) is consistent with reversible loss of ~1.5-2% of the triple helix signal. Based on the results of N-propeptide processing (Ref. 4 and Table 1) and the DSC analysis discussed above, it is reasonable to assume that these changes are associated with partial denaturation of triple helices in molecules with two mutant chains, which comprise ~15-25% of secreted collagen. Then, the 1.5-2% loss of the triple helix signal indicates reversible unfolding of a domain containing 100 ± 40 amino acids in each chain, consistent with the results of proteolytic cleavage analysis and the truncated collagen study. (The upper and lower bounds in this estimate were calculated as 1014 × 2%/15% = 135 and 1014 × 1.5%/25% = 61, respectively, where 1014 is the length of each chain within the triple helical domain.)
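The bounds quoted in the parenthetical estimate can be checked with a few lines of Python. The function and variable names are ours and purely illustrative; the only inputs are the numbers given in the text (1014 residues per chain, 1.5-2% signal loss, 15-25% of molecules with two mutant chains).

```python
HELIX_LENGTH = 1014  # residues per chain in the triple helical domain (from the text)

def unfolded_residues(signal_loss, fraction_two_mutant, helix_length=HELIX_LENGTH):
    """Residues per chain that must unfold to explain the CD signal loss.

    Assumes the whole loss comes from molecules with two mutant chains and
    that the CD signal is proportional to the number of helical residues.
    """
    return helix_length * signal_loss / fraction_two_mutant

upper = unfolded_residues(0.02, 0.15)    # largest loss, smallest mutant fraction
lower = unfolded_residues(0.015, 0.25)   # smallest loss, largest mutant fraction
midpoint = (upper + lower) / 2
half_range = (upper - lower) / 2
```

The midpoint and half-range come out close to 98 and 37 residues, which rounds to the 100 ± 40 amino acids quoted above.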

DISCUSSION
The present study provides biochemical and biophysical evidence corroborating our hypothesis (4) of a highly stable 85-aa domain, which serves as the stabilizing N-anchor for the amino end of the collagen triple helix. Regardless of their location, five different glycine substitutions and a deletion within this region of the α1(I) chain have qualitatively similar effects on collagen stability. They all result in a significant loss of the thermal stability of collagen triple helix, a smaller change in the stability of procollagen, altered N-propeptide cleavage, and appearance of chymotrypsin sensitivity at Phe-92 in the α1(I) chain. Finer, quantitative effects on the thermal stability and proteolytic sensitivity appear to vary with the mutation location but not with its type. Specifically, within each of the two pairs of closely located mutations (ΔE7/G13D and G25V/G34R) these effects were virtually indistinguishable despite the difference in the nature of the mutation. Based on the present data we arrive at the following conclusions.
Molecules with Two Mutant Chains Exhibit Cooperative N-anchor Unfolding-Mutations within N-anchor (ΔE7, G13D, G25V, G34R, G76E) or in its immediate vicinity (G88E) appear to disrupt the structure of the whole domain. Indeed, increased thermal stability in the presence of N-propeptides (Fig. 2) and altered N-propeptide cleavage (Table 1 and Ref. 4) show that the disruption propagates from the mutation site all the way to the N-terminal end of the triple helix. The chymotrypsin sensitivity indicates that the disruption propagates from the mutation toward Phe-92. Mutations beyond this region do not produce similar effects.
Minimal effect of N-propeptides on thermal stability of G88E molecules and minimal changes in N-propeptide cleavage in this mutant are consistent with the location of this mutation at the domain boundary. However, rather low content of G88E molecules with one or two mutant chains estimated from DSC (57% versus expected 75%) and high content of type III collagen in the cell culture media (Table 1) …
G25V, G34R, and G76E substitutions appear to cause fast, cooperative N-anchor unfolding at temperatures as low as 34°C. This conclusion is suggested by a similar reduction in thermal stability of molecules with two mutant chains and truncated collagen without N-anchor (Fig. 4). It is further supported by direct observation of reversible microunfolding in G25V by CD (Fig. 5). Structural disruption of N-anchor in ΔE7, G13D, and G88E is less drastic. These mutants are more thermally stable. However, chymotrypsin sensitivity in ΔE7 and G13D and N-propeptide effects in G88E still indicate some triple helix loosening affecting the whole domain at 34°C and possible N-anchor unfolding at body temperature.
N-anchor Is a Well Defined Structural Domain-Analysis of amino acid sequence of the N-terminal region of human type I collagen triple helix suggests a structural reason for the high thermal stability and cooperative unfolding. Fig. 6 shows the high content of Gly-Xaa-Yaa triplets with Pro in Xaa or Hyp in Yaa positions within the first 85 residues, corresponding to the N-anchor. Evaluation of local thermal stability based on the stability scores proposed by Bachinger et al. (13) or on host-guest peptide data (14-16) suggests that this is one of the most thermally stable regions along the collagen triple helix. It is bounded by a stretch of five triplets with no imino acids in either α1(I) or α2(I) chain. The latter is one of the least stable sections of the triple helix based on the local stability calculations (Fig. 6), which was previously recognized as a likely flexible or microunfolding region (17-19).
Apparently, due to its inherent flexibility, the microunfolding region prevents structural disruptions of the triple helix on either side of it from affecting the other side, thereby creating the well defined N-anchor domain. The microunfolding region limits propagation of the disruptions as long as they do not affect the register of the chains. In molecules containing two mutant α1(I) chains, any of the studied N-anchor mutations causes unfolding or significant loosening of the whole domain, which propagates both into the N-propeptide and into the microunfolding site but not beyond the latter. The resulting structural change within the N-propeptide cleavage site prevents its processing by N-proteinase (Table 1 …
Fig. 5 legend: … The error bars represent the standard deviation calculated by pooling together all data points measured for each protein at each temperature. The actual temperature in the cell was within 1°C from the program temperature at all measurement points (verified in a trial run with a thermocouple calibrated against NIST-certified thermometers). To achieve ≤0.3% measurement error, we optimized the measurement conditions as described under "Experimental Procedures," resulting in ≥300 signal/noise ratio.
A Single Mutant Chain Is Not Sufficient to Unfold the Whole Domain-Due to the proximity of ⌬E7 and G13D to the N-propeptide, these helices with one mutant chain do have altered N-propeptide processing but are not susceptible to chymotrypsin. In contrast, the molecules with a single mutant G25V, G34R, or G76E chain are susceptible to chymotrypsin, but their N-propeptides are cleaved, and these molecules are more thermally stable than truncated collagen. Apparently, the N-cap of the first seven highly stable triplets in these molecules remains at least partially folded. Understandably, the G88E substitution, located at the boundary between the N-anchor and the microunfolding site, has a strong effect on the chymotrypsin sensitivity. Its effect on N-propeptide cleavage is so weak that we cannot distinguish whether it originates only from molecules with two mutant chains or not.
N-anchor Unfolding Might Contribute to the Distinct ␣1(I)-OI/EDS Phenotype-As mentioned earlier (4), the EDS symptoms in ␣1(I)-OI/ EDS appear to be related to incomplete or delayed N-propeptide cleavage and incorporation of the resulting pN-collagen into matrix fibrils. The abnormal N-propeptide cleavage is similar to EDS VIIA/B caused by deletions of exon 6 in COL1A1/COL1A2, which encodes for the cleavage site and adjacent telopeptide (6). However, ␣1(I)-OI/EDS patients exhibit a distinct phenotype with early progressive scoliosis, no bilateral hip dysplasia, and thin dermal fibrils with more regular shape and more uniform diameters (4). Abnormal N-propeptide cleavage was also observed in several cases of ␣2(I)-OI/EDS caused by deletions of exon 9 or 11 in the N-anchor region of COL1A2 (20 -22). The phenotype of ␣2(I)-OI/EDS patients was reported to be generally consistent with EDSVIIA/B. Their dermal fibrils were not studied except for one unusual case of a large duplication of exons 12-32 in COL1A2 (23). The latter patient had EDS VIIA/B phenotype with bilateral congenital hip dislocation but thin, rounded fibrils with uniform diameters similar to ␣1(I)-OI/EDS. N-propeptide cleavage in cell culture from this patient appeared to be complete, unlike in ␣1(I)-OI/EDS or EDS VIIA/B.
We hypothesize that the more regular shape and more uniform fibril diameters in α1(I)-OI/EDS are related to nonspecific proteolytic cleavage within the unfolded N-anchor region by enzymes present in the dermis of the patient. The irregular fibril shape observed in EDS VII by electron microscopy is believed to be associated with incorporation of pN-collagen, whose uncleaved N-propeptides cover the fibril surface and inhibit normal fibril formation (24,25). Although significant matrix incorporation of pN-collagen was observed in α1(I)-OI/EDS cell cultures (4), analysis of skin collagen composition from one of the patients (G13D) by guanidine HCl extraction revealed rather low (<2%) pN-collagen content. One possible explanation is that the unfolded N-anchor domain might prevent the specific N-proteinase cleavage while being susceptible to cleavage by other, nonspecific proteases before or after incorporation into fibrils.
In addition to producing more regular fibril shape and more uniform diameters, N-anchor unfolding could directly affect functional properties of the fibrils. For instance, Hyl-87 at the N-anchor boundary is a known glycosylation and cross-linking site. The surrounding region is also one of the proposed "hot spots" for binding of a variety of matrix molecules to collagen, including integrins (26). N-anchor unfolding is likely to alter the structure of this region and might thereby affect, e.g., formation and function of cross-links, interaction of collagen with other matrix components and cells, and fibril hydration. Note that the potential cross-linking abnormality in α1(I)-OI/EDS is likely to be different from the known cross-link deficiency in EDS VIIA/B (6), which is caused by the loss of other important Hyl residues encoded in exon 6.
We suggest that the distinct features of the EDS phenotype and more pronounced OI symptoms in α1(I)-OI/EDS patients result from unfolding/loosening of the structurally well defined N-anchor domain. We suspect that similar N-anchor mutations in the α2(I) chain, particularly Gly substitutions, might result in a different phenotype. Since only one mutant α2(I) chain can be incorporated into the triple helix, the resulting structural disruption of the N-anchor might be more localized and not affect the whole domain, as discussed above. In the future, we hope to test this hypothesis and gain further insights into molecular mechanisms of various phenotypes by direct comparison of structural and physical properties of collagen fibrils in different forms of OI/EDS and EDS VII.
In conclusion, this work is only a first step toward full characterization of structural domains in the collagen triple helix. We presented evidence delineating the N-terminal domain, described structural changes in it caused by mutations, and presented a model relating these changes to a very specific disease phenotype. In an ongoing project we are utilizing similar approaches to systematically examine structural consequences of glycine substitutions in other parts of the triple helix. We are finding evidence for the existence of other structural domains and hope that their mapping and characterization will be equally or even more revealing. However, there is still much work to be done before the domain map of the type I collagen triple helix is complete.
FIGURE 6. Amino acid sequence (A) and local stability models (B) for the N-terminal end of type I collagen triple helix. Proline (P) and hydroxyproline (O) residues required for triple helical stability are underlined and highlighted by green color. The fragment highlighted by yellow has no imino acids and is expected to have low triple helix propensity and thermal stability. It is homologous to a known microunfolding region of mouse and rat collagens readily cleaved by chymotrypsin above 25-30 °C at Phe-92 (17). The profile of local relative stability score (shown in blue) was described by Bachinger et al. (13). The local Tm profile (shown in red) was generously provided to us by Dr. Persikov. It was calculated with the same seven triplet averaging window as in Bachinger et al. (13) but on the basis of measured thermal stability of triple helical host-guest peptides in water (16).