Structural Heterogeneity of Type I Collagen Triple Helix and Its Role in Osteogenesis Imperfecta

We investigated regions of different helical stability within human type I collagen and discussed their role in intermolecular interactions and osteogenesis imperfecta (OI). By differential scanning calorimetry and circular dichroism, we measured and mapped changes in the collagen melting temperature (ΔTm) for 41 different Gly substitutions from 47 OI patients. In contrast to peptides, we found no correlations of ΔTm with the identity of the substituting residue. Instead, we observed regular variations in ΔTm with the substitution location in different triple helix regions. To relate the ΔTm map to peptide-based stability predictions, we extracted the activation energy of local helix unfolding (ΔG‡) from the reported peptide data. We constructed the ΔG‡ map and tested it by measuring the H-D exchange rate for glycine NH residues involved in interchain hydrogen bonds. Based on the ΔTm and ΔG‡ maps, we delineated regional variations in the collagen triple helix stability. Two large, flexible regions deduced from the ΔTm map aligned with the regions important for collagen fibril assembly and ligand binding. One of these regions also aligned with a lethal region for Gly substitutions in the α1(I) chain.

The mature type I collagen molecule is a 300-nm-long triple helix formed by two α1(I) chains and one α2(I) chain and flanked by short terminal peptides. Because each chain contains 338 repeats of the Gly-Xaa-Yaa triplet, where Xaa and Yaa are variable residues (1), the triple helix is commonly viewed as a single domain. However, this picture may not accurately represent variations in the stability of different regions within the triple helix that fold and unfold cooperatively (2–4). The triple helix is only metastable or marginally stable at physiological temperature (3, 5–8). Its local structure appears to be highly dynamic and intimately related to local stability. The more labile regions may exist in a loose conformation, constantly undergoing unfolding/refolding transitions, while more stable "clamp" regions prevent unfolding of the whole molecule (3, 9–12). Such structural and dynamic heterogeneity is believed to play an important role in the self-assembly (9, 13) and function (3) of collagen fibers.
Significant progress in understanding regional variations in triple helix stability in different collagens has been reported in recent years. For example, some flexible sites were localized by observing triple helix bending in electron microscopy (14) and/or increased susceptibility to proteolytic cleavage (9, 15). Genetically engineered reshuffling of different triple helix regions was shown to have a significant effect on the overall stability of the molecule (11, 16). Relative local stability maps were proposed based on scoring different sequences (17) or on the denaturation temperatures (Tm) measured for triple-helical host-guest peptides (18).
The existence of looser, less stable and tighter, more stable structural regions within the triple helix is now commonly accepted. However, knowledge of their locations, properties, and role in type I collagen function is still incomplete. In particular, it is still not understood whether and how these regions affect the osteogenesis imperfecta (OI) phenotype.
OI is an autosomal dominant bone disorder caused primarily by mutations in type I collagen (19, 20). Over 80% of lethal to moderately severe cases result from substitutions of the obligatory Gly residues that disrupt the triple helix, which clearly points to structural defects as the underlying cause (21). However, OI severity was found to correlate neither with the predicted local stability of the helix at the substitution site nor with the change in the measured denaturation temperature of the whole helix (ΔTm) (17, 21–28). Regional variations in collagen structure and stability may modulate the structural and functional consequences of otherwise similar mutations, thereby contributing to the lack of simple correlations between the structural disruptions and the OI phenotype.
Note that by local properties or disruptions of the triple helix we mean those involving fewer than 5–10 adjacent Gly-Xaa-Yaa (GXY) triplets. This is the characteristic distance scale of variations in the reported local stability maps (17, 18). We refer to disruptions or average helix properties on much larger distance scales as regional.
One approach to better understanding regional variations in triple helix structure and stability is a systematic study of molecules with Gly substitutions by the same techniques under similar conditions. For instance, such studies of α1(I) mutations at the N-terminal end of collagen allowed us to delineate a highly stable N-anchor region separated from the rest of the triple helix by a flexible microunfolding site (29). This region is formed by ~85 N-terminal residues in each chain (see "Discussion"). Mutations within this region cause structural changes in the molecule, resulting in a distinct OI/EDS phenotype with significant ligament laxity characteristic of Ehlers-Danlos syndrome (EDS) (30). Analysis of these structural changes allowed us to propose a molecular mechanism for the EDS-like symptoms (29, 30).
Another approach is to continue developing local stability models. Considerable effort has been invested in detailed studies of collagen-like peptides with different sequences (see, e.g., Refs. 18, 28, 31, 32 and references therein). We used the peptide-based local stability models (17, 18) as one of the tools for delineating the N-anchor region (29). However, accurately evaluating the local stability of full-length collagens from such data remains a challenge (18), and the existing models have not yet been tested experimentally.
In the present study, we combine both approaches. (i) We report measurements of ΔTm for 41 Gly substitutions from OI patients, all performed under the same conditions by differential scanning calorimetry and circular dichroism. We test the effects of the identity and the position of the substitution along the triple helix. We differentiate contributions from molecules with different chain compositions and map ΔTm for molecules with one mutant chain. (ii) We generalize the peptide-based models for quantitative prediction of the local stability of type I collagen. We test the predictions and determine the corresponding model parameters by measuring the H-D exchange kinetics of glycine amides involved in interchain hydrogen bonds. (iii) Based on the measured ΔTm map and the calculated local stability map, we discuss regional variations in triple helix stability and structure. We delineate two large, flexible regions, which may be important for collagen-collagen interactions, ligand binding, and OI phenotypes.
To better understand our findings and eliminate potential artifacts, we utilize a variety of techniques and approaches. In the main text, we focus only on the most crucial results and ideas. The auxiliary techniques, results, and analyses are presented in the supplemental material.

Collagen Preparation
Human Procollagen-Skin fibroblast cultures were established from dermal punch biopsies collected from OI patients under an IRB-approved protocol. Normal control cells CRL-2127 were purchased from the American Type Culture Collection (Manassas, VA); GM04501, GM04503, GM07525, and GM07753 were purchased from the Coriell Cell Repositories (Coriell Institute for Medical Research, Camden, NJ). Fibroblasts were grown to confluence at 37°C in Dulbecco's modified Eagle's medium containing 10% fetal bovine serum and 2 mM glutamine in the presence of 5% CO2. Cultures were stimulated with Dulbecco's modified Eagle's medium supplemented with 0.1% fetal bovine serum, 2 mM glutamine, and 50 μg/ml ascorbate. After 24 h, the medium was harvested, buffered with 100 mM Tris-HCl, pH 7.4, mixed with protease inhibitors (25 mM EDTA, 1 mM phenylmethylsulfonyl fluoride, 5 mM benzamidine, 10 mM N-ethylmaleimide, and 0.2% NaN3), and cooled to 4°C. Procollagen was precipitated from the media with 176 mg/ml ammonium sulfate.
Pepsin-treated Human Collagen-The ammonium sulfate precipitates were suspended in 0.5 M acetic acid and digested by pepsin (~0.1 mg/ml, ~1:10 pepsin:collagen) overnight at 4°C. After digestion, collagen was selectively precipitated with 0.9 M NaCl (final concentration), suspended in 0.5 M acetic acid, and reprecipitated with 0.7 M NaCl. The pellet was washed with 70% ethanol, redissolved, and dialyzed against 0.2 M sodium phosphate, 0.5 M glycerol buffer, pH 7.4. Purity of the preparation, triple helix resistance to pepsin at 4°C, and the content of type III and type V collagens were evaluated by gel electrophoresis as described previously (29). Collagen concentration in each preparation was measured by circular dichroism (see below).
Rat Tail Tendon Fibers-Frozen tails of 6-week-old rats were purchased from Pel-Freez Biologicals and stored at −80°C. Tendons were excised, washed in 3.5 M NaCl, 10 mM Tris, 20 mM EDTA, 2 mM N-ethylmaleimide, 1 mM phenylmethylsulfonyl fluoride, pH 7.5, and stored in this buffer at 4°C. The crystalline organization of collagen in tendon fibrils was disrupted by overnight equilibration in 0.2 M sodium phosphate, 1 M glycerol, pH 7.5, at 4°C. Tendons were then washed in 20 mM NaCl, 10 mM HEPES, pH 7.5, for several hours at 25°C. A small fiber was separated from the tendon with needlepoint tweezers, stretched, covered by a droplet of the NaCl/HEPES buffer, sandwiched between two 13 × 2 mm round CaF2 infrared windows (one window with "in" and "out" ports for solution exchange), and sealed by a thin layer of ultra-high-vacuum Apiezon L grease (M&I Materials, Manchester, UK) around the window perimeter.

Thermal Stability of Human Collagen
Differential Scanning Calorimetry-DSC measurements were performed with 0.1–0.4 mg/ml procollagen and pepsin-treated collagen solutions in 0.2 M sodium phosphate, 0.5 M glycerol, pH 7.4. Thermograms were recorded from 10 to 60°C in Nano II or Nano III calorimeters (Calorimetry Sciences Corporation, Lindon, UT). Except for α2(I)-G805D, all mutant collagens were scanned at a heating rate of 0.125 or 0.25°C/min. We found that this 2-fold change in heating rate shifts the whole thermogram by 0.4 ± 0.1°C with negligible distortion of the shape of the denaturation peaks. Because of the limited amount of protein, α2(I)-G805D was scanned only at 1°C/min, which causes a detectable distortion of the peak shape and reduces the accuracy of the ΔTm measurement to ~0.5°C. All thermograms were corrected by subtraction of a linear baseline and normalized per unit collagen concentration (for quantitative analysis) or to the same peak height (for visual comparison). To extract the contribution of type I collagen, whenever necessary, normalized thermograms of purified human type III and type V collagens (SouthernBiotech, Birmingham, AL) were multiplied by their fractions in the sample and subtracted from the thermogram of the mixture (29). Mutant peaks in DSC thermograms were deconvoluted with the PeakFit software (Systat, Point Richmond, CA).
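The baseline and contaminant corrections described above can be sketched in NumPy. This is a minimal illustration rather than the analysis pipeline actually used. It assumes that the linear baseline passes through the first and last points of the scan and that each reference thermogram (e.g., type III collagen) is already normalized per unit concentration and sampled on the same temperature grid. The function name `correct_thermogram` is hypothetical.

```python
import numpy as np


def correct_thermogram(temp, cp, conc_mg_ml, refs=()):
    """Return a baseline-corrected, concentration-normalized thermogram.

    temp, cp   : temperature grid (deg C) and raw excess heat capacity
    conc_mg_ml : total collagen concentration used for normalization
    refs       : iterable of (normalized_reference, fraction) pairs, e.g. a
                 type III collagen thermogram and its fraction in the sample
    Assumption: the linear baseline runs through the first and last points.
    """
    temp = np.asarray(temp, dtype=float)
    cp = np.asarray(cp, dtype=float)
    # Straight line through the scan end points
    slope = (cp[-1] - cp[0]) / (temp[-1] - temp[0])
    baseline = cp[0] + slope * (temp - temp[0])
    # Normalize per unit collagen concentration
    norm = (cp - baseline) / conc_mg_ml
    # Subtract each contaminating collagen, weighted by its fraction
    for ref_norm, fraction in refs:
        norm = norm - fraction * np.asarray(ref_norm, dtype=float)
    return norm
```

Subtracting fraction-weighted, pre-normalized references keeps the result on the same per-unit-concentration scale as the mixture. This is what allows the type I collagen peak areas to be compared across samples.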
Circular Dichroism-CD spectra of pepsin-treated, purified collagen in 0.2 M sodium phosphate, 0.5 M glycerol, pH 7.4 (PGB), or in 2 mM HCl, pH 2.7, were measured in a quartz microcuvette with a 1-cm optical path length in a J810 spectropolarimeter equipped with a PFD-425S thermoelectric temperature controller (Jasco, Easton, MD). Collagen concentration in the solution was evaluated from the ellipticity at 221 nm at 20–25°C. Triple helix unfolding upon heating was evaluated from the ellipticity change at 223.8 nm (see supplemental material). The temperature in the cuvette was calibrated by a thermocouple probe inserted into the buffer solution. CD spectra were recorded after 2–5 min of equilibration at the selected temperature.

Infrared Spectroscopy of Rat Tail Tendon Fibers
Infrared (IR) spectra of solvated rat tail tendon fibers were measured in a Continuum IR microscope attached to a Nexus 670 Fourier transform infrared spectrometer with a 15× Reflechromat IR objective/condenser and a narrow-band MCT/A detector (Thermo Electron Corp., Madison, WI). Tendon fibers sandwiched between CaF2 IR windows were mounted in a custom-built, thermostated (±0.05°C), flow-through IR cell with built-in passive compensation for thermal expansion (United States Patent Applications 10/926,405; 11/826,806). The cell was designed for high mechanical and optical stability, allowing highly reproducible spectra (±10⁻⁴ OD units) of solvated samples to be measured (33). The mounted sample was flattened by slight mechanical pressure to an optical path of 5–7 μm and extensively washed with 3 mM CAPS, 5 mM NaCl, pH 10.0. The IR beam was restricted by a 150 × 40 μm rectangular mask and positioned in the middle of a ~70-μm-wide collagen sample. To obtain isotropic spectra, the beam was polarized before the sample with a BaF2 holographic polarizer at the "magic angle" of 55° with respect to the fiber direction. Spectra were collected from 900 to 7000 cm⁻¹ at 4 cm⁻¹ resolution by accumulating 50–200 interferograms, depending on the desired time resolution. The H-D exchange reaction was initiated by rapid (≤10 s) replacement of the H2O buffer with a D2O buffer of matching composition and pD (glass electrode reading + 0.4). The disappearance of the water O-H bands (3000–3700 cm⁻¹) indicated that all interstitial H2O had been replaced by D2O within 1 min. Spectra were collected every 1–30 min for up to a week at constant temperature. The fraction of unexchanged Gly-NH groups involved in interchain hydrogen bonds was measured at pD 10 from the intensities of the amide II (integrated from 1515 to 1600 cm⁻¹) and amide A (integrated from 3150 to 3450 cm⁻¹) bands. The intensities within these spectral regions for collagen with fully deuterated Gly amides were measured after pre-equilibration at 50°C, pD 10, for 10 h.
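The band integration and normalization described above can be sketched as follows. The integration windows and the fully deuterated reference come from the text. The normalization to an initial band area (`area_initial`) is an assumption, because the exact reference for the fully protonated Gly-NH pool is not stated here. The function names are hypothetical.

```python
import numpy as np

AMIDE_II = (1515.0, 1600.0)  # integration window from the text, cm^-1
AMIDE_A = (3150.0, 3450.0)   # integration window from the text, cm^-1


def band_area(wavenumber, absorbance, window):
    """Trapezoidal integral of absorbance over a wavenumber window (cm^-1)."""
    w = np.asarray(wavenumber, dtype=float)
    a = np.asarray(absorbance, dtype=float)
    order = np.argsort(w)  # spectra are often stored from high to low wavenumber
    w, a = w[order], a[order]
    mask = (w >= window[0]) & (w <= window[1])
    w, a = w[mask], a[mask]
    return float(np.sum(0.5 * (a[1:] + a[:-1]) * np.diff(w)))


def gly_nh_fraction(area_t, area_initial, area_deuterated):
    """Fraction of slowly exchanging Gly-NH remaining at time t.

    area_deuterated: band area after full deuteration (50 C, pD 10, 10 h).
    area_initial:    hypothetical reference area for the unexchanged pool.
    """
    return (area_t - area_deuterated) / (area_initial - area_deuterated)
```

Sorting by wavenumber first keeps the integral positive regardless of how the spectrometer software orders the data points.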

Model Analysis
For quantitative analysis and comparison with experiments, we describe the local stability of collagen in terms of the activation energy for local triple helix unfolding. We calculate this activation energy from the amino acid sequence of collagen and the apparent denaturation temperatures (Tm) of host-guest peptides reported by Persikov et al. (18), using the following model.
Denaturation of Host-guest Peptides-The Tm values of host-guest peptides have been measured mostly in a kinetically limited regime, in which the transition is irreversible on the experimental time scale (34). At the same heating rate, the activation free energy of the transition, G‡, is related to the apparent Tm by Equation 1 (see supplemental material).
Here T0 is a reference temperature, δTm = Tm − T0, and δG‡ = G‡(T0) − G0‡, where G0‡ and H0‡ are the activation free energy and enthalpy for a reference peptide that denatures at T0. Hereafter, we refer to thermodynamic potentials only at the reference temperature T0 and omit the corresponding argument in their notation.
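A first-order relation consistent with these definitions can be derived under two assumptions. First, unfolding follows Arrhenius-type kinetics, k = A exp(−G‡/RT), with the same prefactor for all peptides. Second, at a fixed heating rate the apparent Tm is reached at approximately the same rate constant for every peptide. This is an assumed reconstruction and may differ from the exact form of Equation 1 given in the supplemental material.

```latex
\begin{aligned}
&\frac{G^{\ddagger}(T_m)}{T_m} \approx \frac{G_0^{\ddagger}}{T_0}
  &&\text{(equal rate at the apparent } T_m\text{, assumed)}\\
&G^{\ddagger}(T_m) \approx G^{\ddagger}(T_0) - \bigl(H^{\ddagger} - G^{\ddagger}(T_0)\bigr)\frac{\delta T_m}{T_0}
  &&\text{(linearization with } S^{\ddagger} = (H^{\ddagger} - G^{\ddagger}(T_0))/T_0\text{)}\\
&\Rightarrow\ \delta G^{\ddagger} \approx \frac{H^{\ddagger}\,\delta T_m}{T_m}
  \approx H_0^{\ddagger}\,\frac{\delta T_m}{T_0}
  &&(|\delta T_m| \ll T_0,\ H^{\ddagger} \approx H_0^{\ddagger})
\end{aligned}
```

In this approximation, a peptide that melts 10 K above the reference has an activation free energy larger by about H0‡ × 10 K/T0.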
Based on the empirical δTm values (18) and Equation 1, the change in activation energy δG_GXY upon changing one GXY triplet can be approximated by Equation 2. In Equation 2, δTm_GXY is the component of δTm that is independent of adjacent triplets, and δTint_GXY is a correction for interactions with nearest-neighbor triplets. The values of δTm_GXY and δTint_GXY for all common triplet sequences were tabulated by Persikov et al. (18).
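Applied triplet by triplet, a first-order relation of this kind turns tabulated peptide Tm shifts into activation-energy changes. The sketch assumes that Equation 2 takes the form δG_GXY ≈ H0‡(δTm_GXY + δTint_GXY)/T0. H0‡ = 45 kcal/mol is the value reported under "Results." The reference temperature T0 = 300 K, the function name, and the example shifts are hypothetical.

```python
# Hypothetical sketch: convert peptide Tm shifts into activation free-energy changes.
# Assumed form of Equation 2: dG_GXY ~= H0 * (dTm_GXY + dTint_GXY) / T0.

H0_KCAL_PER_MOL = 45.0  # activation enthalpy H0 (value reported in Results)
T0_KELVIN = 300.0       # hypothetical reference temperature T0


def triplet_delta_g(dtm_gxy, dt_int, h0=H0_KCAL_PER_MOL, t0=T0_KELVIN):
    """Change in activation free energy (kcal/mol) for one guest triplet.

    dtm_gxy: neighbor-independent Tm shift of the triplet (K)
    dt_int:  nearest-neighbor interaction correction (K)
    """
    return h0 * (dtm_gxy + dt_int) / t0


if __name__ == "__main__":
    # Hypothetical triplet 3 K more stable than the reference,
    # with a -0.5 K nearest-neighbor correction.
    print(round(triplet_delta_g(3.0, -0.5), 3))  # 45 * 2.5 / 300 = 0.375
```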
Local Stability and Unfolding of Type I Collagen-For full-length collagen triple helices, we use the following additional assumptions. (i) Local unfolding is the opening of at least one GXY triplet i, which breaks interchain hydrogen bonds and exposes buried Gly-NH groups to water. It affects adjacent triplets j, whose contributions are weighted by exp[−(i − j)²/n²], where n characterizes the extent of the perturbation. (ii) The sequence dependence of the activation energies is the same in collagen and in host-guest peptides. (iii) The contribution of each chain in type I collagen is independent and additive, even though type I collagen is a heterotrimer while all studied host-guest peptides were homotrimers (18). (iv) The H-D exchange kinetics is determined entirely by local unfolding that exposes glycine amides to water, after which exchange is instantaneous. We approximate the number of unexchanged Gly-NH groups using Equations 3-5. As above, the index i labels GXY triplets in the triple helix. k_i, defined by Equation 5, is the characteristic rate constant for local triple helix unfolding at triplet i. R is the universal molar gas constant, and k0 is the local H-D exchange rate constant within the same reference sequence used to define G‡_ref.
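The sketch below implements one plausible reading of the local unfolding model, not the exact Equations 3-5. It has three parts. The activation energy at triplet i is a Gaussian-weighted, unnormalized sum of per-triplet δG values summed over chains. Unfolding rates follow k_i = k0 exp(−ΔG‡_i/RT). The unexchanged Gly-NH fraction is the mean of exp(−k_i t) over triplets, because exchange is taken as instantaneous once a triplet opens. The function names, the lack of weight normalization, and equal weighting of triplets are assumptions. n = 5 matches the fitted value reported under "Results."

```python
import numpy as np

R_KCAL = 1.987e-3  # universal gas constant, kcal/(mol*K)


def activation_profile(delta_g_chains, n=5.0):
    """Assumed form: dG_i = sum_j dG_j * exp(-(i - j)**2 / n**2).

    delta_g_chains: array (n_chains, n_triplets) of per-triplet dG (kcal/mol)
    relative to the reference sequence; chains are summed (additivity).
    """
    dg = np.atleast_2d(np.asarray(delta_g_chains, dtype=float)).sum(axis=0)
    idx = np.arange(dg.size)
    weights = np.exp(-((idx[:, None] - idx[None, :]) ** 2) / n**2)
    return weights @ dg


def unfolding_rates(profile, k0, temp_k):
    """Assumed form: k_i = k0 * exp(-dG_i / (R * T))."""
    return k0 * np.exp(-np.asarray(profile, dtype=float) / (R_KCAL * temp_k))


def unexchanged_fraction(times, rates):
    """Mean over triplets of exp(-k_i * t): buried Gly-NH not yet exchanged."""
    t = np.atleast_1d(np.asarray(times, dtype=float))
    return np.exp(-np.outer(t, rates)).mean(axis=1)


if __name__ == "__main__":
    # Hypothetical 3-chain, 10-triplet helix with one stabilizing triplet.
    dg = np.zeros((3, 10))
    dg[:, 4] = 0.5
    rates = unfolding_rates(activation_profile(dg), k0=0.5, temp_k=298.0)
    print(unexchanged_fraction([0.0, 1.0, 10.0], rates))
```

Fitting H0‡ and n would then amount to comparing `unexchanged_fraction` with the measured IR kinetics over a grid of parameter values.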

RESULTS
Thermal Stability of Gly Substitution Mutants-We systematically examined the thermal stability of type I collagen secreted by cultured fibroblasts from 5 normal controls and 47 heterozygous patients with 41 different Gly substitutions. We measured DSC thermograms of pepsin-treated collagen at heating rates of 0.125–1°C/min (Figs. 1 and 2). Note that the phosphate-glycerol buffer was used to suppress fibrillogenesis upon slow heating, as suggested before (6). We confirmed that the buffer had the same effect on molecules with and without mutant chains by testing procollagens, which are more soluble under physiological conditions. In the phosphate-glycerol buffer, DSC thermograms of procollagens from normal controls and from mutants with Gly substitutions outside the N- and C-terminal regions of the triple helix were identical to those of the corresponding pepsin-treated collagens (29). In phosphate-buffered saline at the same pH, procollagen thermograms were simply shifted by −1.7°C compared with the phosphate-glycerol buffer. This shift was previously predicted (6) and confirmed for normal control molecules (8) and some mutants (supplemental material).
At the same heating rate, DSC thermograms of pepsin-treated type I collagen secreted by normal control cells were identical. They had a single sharp maximum (Fig. 1A) at the apparent denaturation temperature (Tm). The values of Tm varied by up to ±0.3°C between cell lines and even between different cultures of the same cell line (see supplemental material). We observed a similar Tm variation in pepsin-treated collagen purified from tail tendons of wild-type mice (35). DSC thermograms from the same collagen batch (from the same human cell culture or the same mouse tail tendon) were reproducible within 0.1°C. Therefore, the ±0.3°C variation is related not to the measurement technique or the instrument but to variation in the properties of the collagen itself. One possible source is variability in post-translational modification, e.g. due to uncontrolled changes in cell culture conditions. For instance, overmodification of collagen upon incubation of cells at 40°C increased Tm by almost 1°C (36).
In most mutants, we observed additional peaks with lower Tm (Figs. 1, B–D, and 2), corresponding to denaturation of molecules containing one or two mutant chains. Pepsin-treated, secreted collagens with the same mutation from unrelated patients had denaturation peaks at the same temperatures. All major DSC peaks corresponded to denaturation of different molecules rather than of different domains within the same molecule (supplemental material).
Three peaks, corresponding to (α1)₂α2, α1α1ᵐα2, and (α1ᵐ)₂α2 (the superscript m labels the mutant chain), were observed in α1(I)-G76E (29) and α1(I)-G523C (Fig. 1B). The area under each peak (estimated by peak deconvolution) represents the fraction of the corresponding molecules in the mixture (Table 1). The fractions measured from DSC thermogram deconvolution (Table 1) differ from those expected for heterozygous α1(I) mutations (25% (α1)₂α2, 50% α1α1ᵐα2, and 25% (α1ᵐ)₂α2). These differences are related to intracellular degradation of a fraction of mutant molecules. This conclusion was confirmed by studies of intracellular collagen denaturation (Table 1 and supplemental material). Note that within the 2–5% experimental accuracy, the fraction of molecules with two mutant chains obtained by DSC thermogram deconvolution for α1(I)-G523C (Fig. 1B) was identical to the fraction deduced for the same sample from the intensity of the S-S dimer band in gel electrophoresis (Table 1).
FIGURE 1. All thermograms were corrected to physiological buffer by a temperature shift of −1.7°C (6). The contribution of type III collagen was subtracted as described previously (29). DSC thermograms of mutant collagens were deconvoluted into component peaks shown by dotted lines (B–D), representing denaturation of molecules with none, one, and two mutant chains. The fraction of each component was determined from the area under the corresponding peak (29). ΔTm for each glycine substitution was defined as the difference between the positions of the peaks corresponding to molecules with no and with one mutant chain, so that ΔTm for α1(I) and α2(I) mutations could be compared. When the corresponding DSC peaks fused into one (D), the relative fractions of the components could not be determined because of uncertainties associated with the unknown shapes of the mutant peaks. However, except for α2(I)-G805D, ΔTm could still be measured with an accuracy of ±0.3°C or better.
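The expected 25%/50%/25% composition for heterozygous α1(I) mutations follows from binomial statistics. The sketch below assumes equal synthesis of normal and mutant chains, random association of chains into trimers, and no intracellular degradation, which are the same idealizations the comparison above departs from. The function name and the `mutant_chain` labels are hypothetical.

```python
from math import comb


def expected_composition(mutant_chain="alpha1", p_mutant=0.5):
    """Expected fractions of molecules by number of mutant chains.

    Type I collagen contains two alpha1(I) chains and one alpha2(I) chain,
    so an alpha1(I) mutation can appear 0, 1, or 2 times per molecule and an
    alpha2(I) mutation 0 or 1 times. Assumes random chain association, no
    degradation, and a mutant chain fraction p_mutant (0.5 for equal alleles).
    """
    if mutant_chain not in ("alpha1", "alpha2"):
        raise ValueError("mutant_chain must be 'alpha1' or 'alpha2'")
    n = 2 if mutant_chain == "alpha1" else 1
    return {k: comb(n, k) * p_mutant**k * (1 - p_mutant) ** (n - k)
            for k in range(n + 1)}


print(expected_composition("alpha1"))  # {0: 0.25, 1: 0.5, 2: 0.25}
print(expected_composition("alpha2"))  # {0: 0.5, 1: 0.5}
```

Deviations of the measured fractions from these values then point to selective intracellular retention or degradation of molecules containing mutant chains.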
Two peaks were observed in pepsin-treated, secreted collagens from most other mutants. A single peak was observed in α1(I)-G148D, -G148R, -G352S, -G448S, and -G832S as well as in α2(I)-G706S, -G721S, and -G805D (Figs. 1 and 2). There are two possible explanations for observing fewer peaks than there are possible collagen compositions. (i) A small difference in Tm between molecules with different numbers of mutant chains may cause the corresponding peaks to fuse into one (Fig. 1D). (ii) Molecules with one and/or two mutant chains may be retained and degraded by cells and thus become undetectable by DSC (<5% of total collagen) in preparations from the cell culture media.
In all cases with fewer peaks than expected, we found that secretion of molecules with one mutant chain was sufficient for detection by DSC (supplemental material). Secretion of molecules with two mutant chains was also clearly sufficient in α1(I)-G313C and -G967C, where S-S dimers were quantified by SDS-PAGE (Table 1). Previously, we reported evidence of the secretion of such molecules in α1(I)-G13D, -G25V, and -G34R as well (29). However, we could not determine with certainty whether molecules with two affected chains contributed to the DSC thermograms of other α1(I) mutants.
From DSC thermograms, we determined the difference (ΔTm) between the denaturation temperatures of molecules with one and with no mutant chains, as illustrated in Fig. 1. In mutants with one broad DSC peak, ΔTm was estimated by thermogram deconvolution (Fig. 1D). Only α2(I)-G805D had no detectable effect on Tm, although this mutant was measured by lower-resolution DSC at 1°C/min because of limited protein availability. All other studied substitutions reduced the triple helix Tm by 0.8°C to 4.6°C (Table 1). Based on a comparison of ΔTm for 5 different cell lines with the same α1(I)-G589S substitution, we estimated the reproducibility of ΔTm as ±0.3°C, consistent with the Tm reproducibility in normal controls.
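The ΔTm definition used here is the difference between the peak positions of the components with one and with no mutant chain. The sketch below shows that definition, assuming the component curves have already been separated by deconvolution (done with PeakFit in this work). The function name and the Gaussian test curves are hypothetical, and the result is only as precise as the temperature grid.

```python
import numpy as np


def delta_tm(temp, normal_component, one_mutant_component):
    """dTm = T_peak(one mutant chain) - T_peak(no mutant chain), in deg C.

    Both deconvoluted components must be sampled on the same temperature
    grid; a negative value means the mutant molecules are less stable.
    """
    temp = np.asarray(temp, dtype=float)
    t_normal = temp[np.argmax(normal_component)]
    t_mutant = temp[np.argmax(one_mutant_component)]
    return t_mutant - t_normal


if __name__ == "__main__":
    # Hypothetical components peaking at 41.0 and 39.0 deg C
    grid = np.linspace(30.0, 50.0, 2001)
    normal = np.exp(-((grid - 41.0) / 0.8) ** 2)
    mutant = 0.5 * np.exp(-((grid - 39.0) / 0.8) ** 2)
    print(round(delta_tm(grid, normal, mutant), 2))  # -2.0
```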
Mapping ΔTm versus substitution location in the triple helix for pepsin-treated molecules with one mutant chain revealed a regular pattern (Fig. 3). At the same time, we found no apparent correlation of ΔTm with substitution identity when all mutations were pooled together (Fig. 3, inset). Note that α2(I)-G247S and -G247C differed in post-translational overmodification (37).
These observations suggested that ΔTm variations may be related to heterogeneous local stability of the triple helix and to concomitant variations in the extent of structural disruptions caused by substitutions at different locations.
Triple Helix Disruption by Gly Substitutions-We previously found that the whole N-anchor region of ~30 N-terminal GXY (Gly-Xaa-Yaa) triplets was disrupted by Gly substitutions within it. DSC of procollagen and altered N-propeptide cleavage suggested that the disruptions propagated all the way to the N-terminal end (30). Chymotrypsin susceptibility indicated that the disruptions propagated all the way to α1(I)-F92, where the cleavage was monitored (29).
As with N-anchor mutations, the thermal stability of secreted α1(I)-G997S procollagen was higher than that of the pepsin-treated collagen (Fig. 4), indicating that the mutation disrupted all five GPO triplets on its C-terminal side. The C-propeptide limited the extent of the disruption, thereby contributing to the stability of the mutant. In α1(I)-G967C and other mutants outside the N-anchor, secreted procollagen had the same stability as pepsin-treated collagen. Neither α1(I)-G997S nor α1(I)-G967C had any significant effect on chymotrypsin susceptibility at α1(I)-F935 (supplemental material), in contrast to the increased susceptibility of N-anchor mutants at α1(I)-F92 (29). Gly substitutions thus appear to cause more limited disruptions within the last 25 triplets at the C-terminal end than within the first 25 triplets at the N-terminal end.
Interestingly, we found evidence of a ~100-triplet disruption in pepsin-treated α2(I)-G898V collagen. A shoulder in the DSC thermogram (not expected in α2(I) mutants) indicated melting of a large domain below the Tm of the whole molecule (supplemental material). The contributions from molecules with and without the mutant chain were better resolved in 2 mM HCl (supplemental material). Subsequent CD measurements in 2 mM HCl revealed reversible unfolding of ~30% of the triple helix below the Tm of the whole molecule (Fig. 5). We applied a train of temperature pulses, heating the sample from 20 to 35.5°C and cooling it back to 20°C, as shown in Fig. 5B. In the normal control, the first 35.5°C pulse decreased the ellipticity at 223.8 nm because of denaturation of the ~5% of damaged molecules present in the sample (Fig. 5A). Further pulses caused no additional changes. In α2(I)-G898V, the first 35.5°C pulse changed the ellipticity by ~20%, of which only ~5% was due to irreversible denaturation of damaged molecules.
Subsequent pulses caused ~15% oscillations due to reversible unfolding of a large triple-helical region within mutant molecules (Fig. 5A), consistent with the DSC observations. The upward drift of the curve was caused by irreversible complete unfolding of a small fraction of mutant molecules. Because the reversible unfolding occurred only in molecules with the mutant chain, which comprised ~50% of the mixture (Table 1), we estimated that it involved ~30% of the triple helix length, or ~100 triplets. The same region also appeared to be disrupted by α2(I)-G922S. Domain melting of these molecules was revealed by DSC in 2 mM HCl after C-terminal truncation at or near α1(I)-F935 by chymotrypsin (supplemental material).
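The estimate above is a short rescaling. The ~15% ellipticity oscillation is measured on the whole sample, but only the ~50% of molecules carrying the mutant chain unfold reversibly. Rescaling therefore gives ~30% of the helix, or about 101 of the 338 triplets. The sketch assumes that the ellipticity change at 223.8 nm is proportional to the fraction of unfolded helix, and the function name is hypothetical.

```python
TRIPLETS_PER_CHAIN = 338  # (Gly-Xaa-Yaa)338 repeat of the type I collagen helix


def reversible_region(ellipticity_oscillation, mutant_fraction,
                      triplets=TRIPLETS_PER_CHAIN):
    """Fraction and number of triplets that unfold reversibly.

    Assumes the relative ellipticity change is proportional to the
    unfolded helix fraction and that only mutant molecules contribute.
    """
    helix_fraction = ellipticity_oscillation / mutant_fraction
    return helix_fraction, helix_fraction * triplets


frac, n_triplets = reversible_region(0.15, 0.50)
print(f"{frac:.0%} of the helix, ~{n_triplets:.0f} triplets")  # 30% of the helix, ~101 triplets
```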

Analysis of Local Stability by H-D Exchange-The heterogeneity in the local stability of the type I collagen triple helix can be studied experimentally using the hydrogen-deuterium (H-D) exchange kinetics of amide NH groups. Under physiological conditions, H-D exchange is catalyzed by OD⁻ ions in the surrounding water (38, 39). Because Gly-NH groups in the GXY triplets are buried in the core of the helix, their H-D exchange rate is limited. Local triple helix unfolding is required to expose them to the surrounding water and OD⁻ (2, 38, 40). Different rates of local unfolding within different regions of the helix may contribute to the observed wide range of slow exchange rates (from minutes to days). However, previously reported H-D exchange measurements in collagen (2, 40–42) did not differentiate Gly-NH within GXY triplets from other amides and did not distinguish the contributions of the unfolding and catalytic exchange processes to the overall reaction rate.
TABLE 1. The numbers in parentheses indicate ΔTm and the percentage of molecules with two mutant chains, determined from DSC deconvolution (G76E and G523C) and from S-S dimer intensity in gel electrophoresis (G313C, G523C, and G967C). Type III collagen content was evaluated from gel electrophoresis with and without dithiothreitol. The fraction of mutant collagen was evaluated from DSC deconvolution for secreted collagen and from CD (supplemental material) for intracellular collagen. Error values represent S.D. (for multiple measurements) or estimated errors (e.g., for DSC deconvolution).
To capture the kinetics of local triple helix unfolding, we studied the slow (>1 min) time evolution of the amide II and amide A bands in infrared (IR) spectra of rat tail tendon fibrils at pD 10. The high pD shortened the OD⁻-catalyzed H-D exchange of partially exposed NH groups to less than a second. This eliminated the contributions of non-Gly amides and of the exchange step itself. The observed frequency of the amide A band of the slowly exchanging groups (>3331 cm⁻¹) was consistent with that expected (43–46) for Gly-NH involved in interchain hydrogen bonds within the triple helix and much higher than that observed for other amides (3280–3310 cm⁻¹). The relative amplitude of the slow exchange was consistent with the fraction of Gly-NH calculated from the amino acid sequence. The exchange rate was independent of pD, confirming that it was determined entirely by the rate of local unfolding.
The measured time evolution of the fraction of unexchanged amides (Fig. 6) was therefore determined only by the kinetics of local triple helix unfolding. At the same time, high pD/pH did not cause appreciable changes in high-definition IR spectra of the triple helix at or below 50°C, indicating that the normal collagen structure and dynamics were fully preserved.
The normalized fraction of unexchanged Gly-NH (Fig. 6) indicates a broad variation in the stabilities of different triple helical regions. The data exhibit a broad distribution of helix unfolding rates k, from k_max ≈ 0.5 min⁻¹ to k_min ≈ 4 × 10⁻³ min⁻¹. The range of variation in the activation energy of local unfolding estimated from the unfolding rates was ΔG‡ ≈ 4.5 kcal/mol (ΔG‡ ≈ RT ln(k_max/k_min)).
Mapping of Local Stability from Peptide Data-Mapping of local triple helix stability was previously based on relative stability scores (17) or on the Tm of peptides with different guest GXY triplets inserted in the middle of a (GXY)n host (18). Both models, however, expressed stability in relative units appropriate only for qualitative analysis (18). From the same host-guest peptide data, we calculated the sequence dependence of the activation free energy for local helix unfolding and the corresponding H-D exchange rate for rat type I collagen (see "Experimental Procedures," Equations 1-5).
FIGURE 3. The inset summarizes the dependence of ΔTm on substitution identity. Analysis of the inset suggests no statistically significant correlation of ΔTm with substitution identity. Note that small differences in the apparent average ΔTm for different residues may be related to the special locations of a few outlying mutations. For instance, the average ΔTm is the largest in magnitude for V (−2.6°C) and the smallest for D (−1.6°C). However, excluding just α1(I)-G25V and α2(I)-G805D from these sets makes the average ΔTm for V (−1.9°C) and D (−2.0°C) the same and consistent with the overall average ΔTm (−2.1°C). Both excluded mutations are located within special triple helix regions (see "Discussion"). Moreover, substitutions of the same Gly with these residues, α1(I)-G121V and -G121D, result in the same ΔTm within the 0.3°C reproducibility of the measurements (Table 1).
FEBRUARY 22, 2008 • VOLUME 283 • NUMBER 8

We observed the best agreement between the calculated and measured H-D exchange kinetics at H0‡ = 45 kcal/mol and n = 5 (Fig. 6). These two unknown parameters, necessary for calculating the local stability profile, were consistent with H0‡ = 30–45 kcal/mol estimated from the reported (34) CD denaturation curves of the host-guest peptides and with n < 10 estimated from the length dependence of peptide Tm (supplemental material). Because these are intrinsic parameters of the triple helix, independent of the collagen source, we used them to compute the profile of the activation free energy ΔG‡ (Equation 3) for local unfolding of human collagen, shown in the top panel of Fig. 7.

DISCUSSION
ΔTm Map of Human Type I Collagen
Substitution Identity-At the onset of this study, we expected systematic DSC measurements to reveal larger ΔTm for Gly substitutions with Val, Arg, Asp, or Glu, which are more destabilizing in peptides (24) and tend to be more clinically severe (21). Instead, we found no overall correlation of ΔTm with the substitution identity (Fig. 3, inset). Moreover, substitutions of the same Gly and most substitutions within adjacent GXY triplets resulted in the same ΔTm (±0.3°C), regardless of their identity or the chain in which they occurred (Table 1 and Fig. 3).
At first glance, these findings seem counterintuitive because different substitutions cause different local disruptions of the triple helix (24). However, ΔTm measures the global stability of the molecule, which may be the same for different local disruptions. For instance, two mutant α1(I) chains cause stronger local disruptions than one chain but the same ΔTm in G13D, G25V, and G34R (29). One mutant chain disrupts a large region surrounding the mutation, apparently eliminating that region's contribution to Tm. Two mutant chains disrupt this region at a lower temperature, but ΔTm remains the same, being determined by the helix stability outside the disrupted region. Similar ΔTm in most molecules with two and one mutant chains may explain why we observed three DSC peaks in only 2 of the 23 studied α1(I) mutations.
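The populations behind the normal, one-mutant-chain, and two-mutant-chain DSC peaks follow from the chain composition of type I collagen (two α1(I) chains and one α2(I) chain per molecule). The sketch below assumes equal expression of the normal and mutant alleles and random chain association; neither is guaranteed in patient cells, and the function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def molecule_fractions(n_copies: int, p_mutant: Fraction) -> dict[int, Fraction]:
    """Fraction of trimers carrying k = 0..n_copies mutant chains of one chain
    type, assuming chains associate at random (binomial distribution)."""
    return {
        k: comb(n_copies, k) * p_mutant**k * (1 - p_mutant) ** (n_copies - k)
        for k in range(n_copies + 1)
    }


# Heterozygous mutation with equal allele expression (an assumption)
p = Fraction(1, 2)
alpha1 = molecule_fractions(2, p)  # two alpha1(I) chains per molecule
alpha2 = molecule_fractions(1, p)  # one alpha2(I) chain per molecule

for k, frac in alpha1.items():
    print(f"alpha1(I) mutation, {k} mutant chain(s): {frac}")
for k, frac in alpha2.items():
    print(f"alpha2(I) mutation, {k} mutant chain(s): {frac}")
```

Under these assumptions, an α1(I) mutation yields 1/4 normal, 1/2 one-mutant, and 1/4 two-mutant molecules (3/4 mutant overall), whereas an α2(I) mutation yields only 1/2 mutant molecules, each with a single mutant chain.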
More importantly, in the tested cases we found no effect of the substitution identity or chain on ΔTm at the same site. These cases included three pairs of different substitutions of the same Gly in the same chain (G121D/G121V and G148D/G148R in α1(I) and G247S/G247C in α2(I)), two pairs of the same substitution in different chains (G121D/G121D and G247S/G247S), three pairs of different substitutions in different chains (G121D/G121V, G247S/G247C, and G898S/G898V), and one pair of adjoining substitutions (G922S/G925R in α2(I)). More testing would be valuable, but because too few cell lines were available, in the present study we focused on the role of the mutation location instead.
Substitution Location-Mapping of ΔTm versus mutation location revealed a regular pattern (Figs. 3 and 7). Moreover, the pattern for Gly→Ser substitutions alone, at 16 different locations, appeared to be the same as for various other substitutions pooled together. Evidently, ΔTm depends on which region is disrupted by the mutation (17). Larger ΔTm results from disruption of more stable regions, which contribute more to the stability of the whole helix; smaller ΔTm results from disruption of weaker regions, which contribute less. The ΔTm map therefore reveals regional contributions to the global triple helix stability. For most mutations we observed an average ΔTm ≈ −2.1°C (±2σ, where σ = 0.3°C is the ΔTm reproducibility), except in three distinct regions marked on the ΔTm map in Fig. 7.

N-anchor is a highly stable region extending from Gly-1 to ~Gly-85 (29). For all Gly substitutions within this region we observed large ΔTm, from −3.2 to −4.6°C, in molecules with one and/or two mutant chains (Table 1). These substitutions disrupted the whole region (29,30). Even α1(I)-G88E, located at the region boundary, appeared to affect N-propeptide cleavage on the other side of the region (30).
Mid-flex is a flexible region in the middle of the molecule, whose triple helix weakness results in a less than average contribution to the overall stability. This region includes substitutions of four Gly residues from Gly-352 through Gly-436, with ΔTm from −0.8 to −1.5°C (Fig. 7 and Table 1). The exact boundaries of this region are presently difficult to pinpoint. For instance, we excluded the adjacent α2(I)-G337S and α1(I)-G448S, which have ΔTm = −1.6°C. Given the 0.3°C uncertainty in ΔTm, such exclusion is somewhat arbitrary.
C-flex is a flexible region near the C-terminal end of the molecule, which has even lower stability. It includes substitutions of five residues from Gly-676 through Gly-832, with ΔTm from 0 to −1.2°C. As with the Mid-flex region, the exclusion of α2(I)-G610C and -G649R (both with ΔTm = −1.6°C) from this region is somewhat arbitrary. The exact C-terminal boundary of this region is not yet defined either, because the closest studied mutations are located at Gly-898.
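The band criterion behind these region assignments can be sketched as a simple classifier. This is a minimal illustration, assuming the band is exactly −2.1°C ± 2σ with σ = 0.3°C as stated above; the function name and labels are hypothetical, and it judges single values only, whereas the region assignments above also rely on contiguity along the helix.

```python
MEAN_DTM = -2.1  # average dTm for most mutations, degrees C
SIGMA = 0.3      # dTm reproducibility, degrees C


def classify_dtm(dtm: float, mean: float = MEAN_DTM, sigma: float = SIGMA) -> str:
    """Place one dTm value relative to the mean +/- 2*sigma band.

    'large'   : more negative than the band (stable-region candidate)
    'small'   : less negative than the band (flexible-region candidate)
    'average' : inside the band
    """
    lower = mean - 2 * sigma  # about -2.7
    upper = mean + 2 * sigma  # about -1.5
    if dtm < lower:
        return "large"
    if dtm > upper:
        return "small"
    return "average"


# Values quoted in the text (degrees C)
examples = {
    "N-anchor, -4.6 end of range": -4.6,
    "N-anchor, -3.2 end of range": -3.2,
    "alpha2(I)-G511S/alpha1(I)-G523C": -3.4,
    "alpha2(I)-G337S, alpha1(I)-G448S": -1.6,
    "alpha1(I)-G148D/G148R": -1.2,
    "Mid-flex, -0.8 end of range": -0.8,
    "C-flex, 0 end of range": 0.0,
}
for label, dtm in examples.items():
    print(f"{label}: {dtm} -> {classify_dtm(dtm)}")
```

A value sitting on the band edge, such as −1.5°C, cannot be meaningfully separated from the band given σ; this mirrors the "somewhat arbitrary" boundary calls for the Mid-flex and C-flex regions.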
Presently, we do not know whether α1(I)-G148D/G148R (ΔTm ≈ −1.2°C) and α2(I)-G511S/α1(I)-G523C (ΔTm ≈ −3.4°C) are just "outliers" or whether they are located within other distinct structural regions, like the mutations discussed above. Our current collection of mutant cells is not sufficient to answer this question.

FIGURE 7. Regional variations of triple helix properties in human type I collagen. ΔG‡ is the deviation of the activation energy for local triple helix unfolding from its average value, calculated at 37°C as described under "Experimental Procedures" (H0‡ = 45 kcal/mol and n = 5, Fig. 6). White boxes show D-periods within collagen fibrils. Cyan boxes show MLBRs (54). Green box shows the region of α1(I)-OI/EDS mutations (30). Red boxes show extended lethal regions (LR) in the α1(I) chain where virtually all Gly substitutions are lethal (21). Yellow boxes show clustering of lethal mutations (LC) in the α2(I) chain (21). The ΔTm map for glycine substitutions (bottom) is the same as in Fig. 3, except for the symbols. The gray box indicates the average ΔTm = −2.1 ± 0.6°C. Double arrows show the stable N-anchor region delineated in (29,30) and tentative assignments for the flexible Mid-flex and C-flex regions, in which ΔTm lie below the gray box.

Local Stability (ΔG‡) Map
To test whether ΔTm correlates with the local stability of the triple helix at the substitution site (17), we extended the model proposed in (18). From the host-guest peptide stabilities reported in (18), we calculated the sequence-dependent activation free energy ΔG‡ for local helix opening. We measured H-D exchange upon the concomitant exposure of normally buried Gly amide groups to water. We found the exchange kinetics to be in good agreement with that predicted from the ΔG‡ calculations (Fig. 6). This agreement suggests that the ΔG‡ map (Fig. 7) is a reasonable working model. However, it does not guarantee that every detail of the map is accurate, because the exchange kinetics does not resolve such details (Equations 4 and 5).
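Why exchange kinetics cannot resolve map details can be illustrated with a toy calculation. The sketch assumes each Gly-NH exchanges independently by first-order kinetics at a rate k_i = k_ref·exp(−ΔG‡_i/RT), so the unexchanged fraction is the average of exp(−k_i·t) over sites; this is a generic stand-in, not the paper's Equations 4 and 5, and the profile, k_ref, and function names are hypothetical. Because the average does not depend on the order of its terms, rearranging the same ΔG‡ values along the helix leaves the predicted curve unchanged.

```python
import math
import random

R_KCAL = 1.987204e-3  # kcal/(mol*K)


def site_rates(dg_profile: list[float], k_ref: float, temp_c: float) -> list[float]:
    """Per-site opening rate k_i = k_ref*exp(-dG_i/RT), with dG_i in kcal/mol
    measured relative to the site whose rate is k_ref (hypothetical model)."""
    rt = R_KCAL * (temp_c + 273.15)
    return [k_ref * math.exp(-dg / rt) for dg in dg_profile]


def unexchanged_fraction(rates: list[float], t: float) -> float:
    """Mean of exp(-k_i*t): each Gly-NH exchanges independently, first order."""
    return sum(math.exp(-k * t) for k in rates) / len(rates)


# Hypothetical profile (not data): 15 sites spanning 0 to 4.2 kcal/mol
profile = [0.3 * i for i in range(15)]
shuffled = profile[:]
random.Random(0).shuffle(shuffled)  # same values, different positions along the helix

rates_a = site_rates(profile, k_ref=0.5, temp_c=37.0)
rates_b = site_rates(shuffled, k_ref=0.5, temp_c=37.0)
for t in (1.0, 10.0, 100.0, 1000.0):
    print(t, unexchanged_fraction(rates_a, t), unexchanged_fraction(rates_b, t))
```

The two columns agree to rounding error: the curve constrains the distribution of ΔG‡ values, not their positions along the helix.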
When comparing the ΔTm and ΔG‡ maps (Fig. 7), it is important to keep in mind that Gly substitutions may disrupt large regions of the helix during ΔTm measurement. Disruptions localized to just a few adjacent triplets were reported only in peptide crystals (47) or at 4°C (48). At relevant temperatures in solution, Gly substitutions cause complete unfolding of the same 8–11 triplet peptides (24,48). In human type I collagen, α1(I)-G13D, -G25V, -G34R, and -G76E disrupt ~30 triplets at the N-terminal end (29,30). α2(I)-G898V and, probably, α1(I)-G898S cause cooperative, reversible unfolding of ~100 surrounding triplets (supplemental material).
Disruption of 10–100 triplets affects several peaks and valleys in the ΔG‡ map, most of which span fewer than 10 triplets (Fig. 7). As a result, ΔTm depends not only on ΔG‡ at the mutation site but also on the adjacent peaks and valleys. Therefore, the ΔTm and ΔG‡ patterns should be complementary rather than similar.
Based on this argument, one may expect a large ΔTm when a mutation disrupts a wide region (over 10 triplets) of high local stability, as within the N-anchor; a small ΔTm when a mutation occurs in a wide region (over 10 triplets) of low local stability, as within the Mid-flex region; and an average ΔTm when a mutation occurs in a region with more rapidly varying ΔG‡, which is the case for most other mutations. At present, it is difficult to say why the C-flex region does not conform to these simple rules. One possibility is that the inherent simplifications of the underlying model (18) cause large inaccuracies in the ΔG‡ map within this region.
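The three rules above can be illustrated by averaging a local-stability profile over the stretch a substitution is assumed to disrupt. This is a toy sketch only: the profile is hypothetical (arbitrary units, not the Fig. 7 map), the 11-triplet window merely stands in for "over 10 triplets", and ΔTm is not literally a window average of ΔG‡.

```python
def window_mean(profile: list[float], center: int, half_width: int) -> float:
    """Average of a local-stability profile over the triplets a substitution is
    assumed to disrupt: center +/- half_width, clipped at the helix ends."""
    lo = max(0, center - half_width)
    hi = min(len(profile), center + half_width + 1)
    window = profile[lo:hi]
    return sum(window) / len(window)


# Toy dG profile in arbitrary units (hypothetical):
# 20 uniformly stable triplets, 20 triplets alternating every 3, 20 uniformly weak.
middle = [1.0 if (j // 3) % 2 == 0 else -1.0 for j in range(20)]
profile = [1.0] * 20 + middle + [-1.0] * 20

for center in (10, 27, 30, 50):
    print(center, profile[center], round(window_mean(profile, center, 5), 3))
```

Sites 10 and 50 sit in wide uniform regions, so the window averages (+1.0 and −1.0) match the local values. Sites 27 and 30 have opposite local values (+1 and −1), yet both window averages are close to zero (about −0.09 and +0.09), which is the "average ΔTm despite local peaks and valleys" case.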

Potential Implications
It is instructive to overlay the ⌬T m and ⌬G ‡ maps with the triple helix regions important for collagen function and pathology (Fig. 7).
Fibrillogenesis-From the D-period map of type I collagen packing in fibers (49–52), we see that the Mid-flex region aligns with the gap region of the D2 segment. This is exactly where the most significant triple helix bending occurs in fibrils (53), which is probably not a coincidence.
Ligand Binding-The C-flex region aligns with MLBR2, one of the three Major Ligand Binding Regions proposed in Ref. 54. Increased helix flexibility and less tight packing of the chains within this region are important for binding and triple helix cleavage by collagenases (55). Because a tight triple helix is likely to reduce sequence recognition, the increased flexibility may be important for binding of other ligands as well.
Interestingly, MLBR1 may also align with a flexible region in the triple helix, as suggested by the α1(I)-G148D/R mutations with small ΔTm in the middle of MLBR1. We hope to test this hypothesis as soon as we have enough mutant cell lines within the region.
OI Phenotypes of α1(I) Mutations-To date, close to 400 cases of glycine substitutions in the α1(I) chain have been described, but only two large (>20 triplets) regions with distinct OI phenotypes have been discovered (21).
Mutations within the N-anchor result in an OI/EDS phenotype (29,30). Except for one G76R case, no lethal Gly substitutions were found within this region (21). In a previous study, we argued that these mutations cause unfolding of the whole N-anchor in procollagen secreted from cells, preventing proper N-propeptide cleavage (29). Incorporation of the uncleaved pN-collagen molecules into fibers leads to symptoms reminiscent of EDS VII A and B, caused by mutations in the N-propeptide cleavage site (30).
Essentially all (14 of 15 cases) α1(I) Gly substitutions from Gly-691 through Gly-823 were found to be lethal (21). The clinical outcome of the only case reported as non-lethal is unknown, since the pregnancy was terminated after early ultrasound detection of severe deformities. Furthermore, the abnormally low density of substitutions discovered within this region (0.34/triplet versus an average of 1.16/triplet in the entire α1(I) chain) may point to an even more severe, embryonic lethal phenotype in addition to the detected cases (21). This lethal phenotype region aligns almost perfectly with the C-flex region and MLBR2 (Fig. 7). Presently, we do not know whether this alignment is coincidental. We can only hypothesize that weaker destabilization of the triple helix (smaller ΔTm) within this region may allow more complete folding and secretion of molecules containing mutant chains. Secretion of molecules with an altered MLBR2 structure may be particularly detrimental to bone.
OI Phenotypes of α2(I) Mutations-Among about 300 reported cases of Gly substitutions in the α2(I) chain, clustering of lethal mutations was delineated (21). Most lethal cases were found within eight clusters that do not show any obvious correlation with either the ΔTm or the ΔG‡ map (Fig. 7). Evidently, changes in helical stability are not directly involved in the lethal outcome of heterozygous α2(I) mutations, perhaps because of the lower fraction of mutant molecules. Instead, the pattern of lethal α2(I) mutations may be determined, e.g., by altered ligand binding at disrupted mutation sites (54,56).

Conclusions
In summary, we found: 1) no significant overall correlation of ΔTm with Gly substitution identity; 2) no straightforward correlation of ΔTm with the local helix stability at the substitution site; 3) a regular ΔTm variation with the substitution location, revealing regional variations in the stability and dynamic structure of the triple helix; and 4) some correlations of these regions with those important for fiber formation, ligand binding, and OI phenotypes of α1(I) mutations.
We argue that OI severity may be related to the extent of structural disruptions caused by Gly substitutions and the resulting ΔTm, but this relationship is neither simple nor straightforward. For instance, the tendency of Val, Arg, Asp, and Glu to cause more severe OI than Ser or Ala substitutions of the same Gly (21) may be related to more extensive local helix disruptions (24). However, ΔTm may simply not reveal differences in the extent of local disruptions, for the reasons discussed above.
It is also useful to keep in mind that the disruption of collagen structure may be the initial cause of the disease and yet only one of many factors affecting the disease outcome. Significant OI phenotype variations were reported in patients with the same mutation (21). Moreover, the lethal-to-moderate phenotype range observed in the Brtl IV mouse model with the α1(I)-G349C substitution (57) was shown to be unrelated to variations in collagen properties (35,58,59).
Thus, we focus on large regions of distinct structural disruptions and distinct OI phenotypes rather than on overall correlations between ΔTm and OI severity. We hope this approach may better reveal any underlying structure-function relationships. We believe our progress is encouraging, but many challenges remain. The map of known OI mutations is rapidly saturating the possible glycine substitution sites, but systematic studies of the properties and interactions of collagens carrying these mutations are only beginning.