Recombinant Procollagen II: Deletion of D Period Segments Identifies Sequences That Are Required for Helix Stabilization and Generates a Temperature-sensitive N-Proteinase Cleavage Site

A cDNA cassette system was used to synthesize recombinant versions of procollagen II in which one of the four blocks of 234 amino acids that define the repeating D periods of the collagen triple helix was deleted. All the proteins were triple-helical, and all underwent a helix-to-coil transition between 25 and 42 °C as assayed by circular dichroism. However, the details of the melting curves varied. The procollagen lacking the D1 period unfolded 3 °C lower than the full-length molecule. With the procollagen lacking the D4 period, the first 25% of unfolding occurred at a lower temperature than with the full-length molecule, but the rest of the structure unfolded at the same temperature. The procollagen lacking the terminal D0.4 period unfolded 3 °C lower than the full-length molecule, and stably transfected clones secreted a smaller fraction of it than of the other recombinant procollagens. The results confirmed previous suggestions that the collagen triple helix contains regions of varying stability, and they demonstrated that the D periods at the ends of the molecule contain sequences that serve as clamps for folding and for stabilizing the triple helix. Reaction of the recombinant procollagens with procollagen N-proteinase indicated that the procollagen lacking the D1 period assumed an unusual temperature-sensitive conformation at 35 °C that allowed cleavage at an otherwise resistant Gly-Ala bond between residues 394 and 395 of the α1(II) chain.

The fibrillar collagens are major structural proteins that largely define the size, shape, and strength of tissues in most complex organisms (1-4). In addition, collagen fibrils are encountered in unusual biological settings, such as the byssal threads with which mussels adhere to solid surfaces (5) and tube worms that live near deep-sea hydrothermal vents (6). Collagen fibrils are formed by the spontaneous self-assembly of collagen monomers (1-4). The monomers of the major fibrillar collagens, types I, II, and III, are very similar in structure in that the triple-helical domain of each chain consists of 338 or 343 consecutive -Gly-Xxx-Yyy- tripeptide units. The Xxx position is frequently proline, and the Yyy position is frequently hydroxyproline. Tripeptides with proline in the Xxx position and hydroxyproline in the Yyy position drive folding of the polypeptide chains into the unique triple-helical conformation of collagen. Other amino acids between the obligate glycine residues form hydrophobic or hydrophilic clusters on the surface of the molecules that direct the self-assembly of the proteins into stable fibrils. Because of the constraints conferred by the triple-helical conformation, collagen monomers are generally regarded as rigid rods. However, there are many indications that different regions of the collagen triple helix vary in stability and undergo microunfolding in the physiological range of temperatures (7-16).
Evidence for microunfolding of the monomer includes the effects of partially denaturing and then renaturing the protein (8, 9), experiments involving reversible inhibition of hydroxylation of proline and lysine residues during biosynthesis (10), comparisons of the helix-forming properties of synthetic peptides with repetitive -Gly-Xxx-Yyy- sequences (11-16), measurements of enthalpy changes by microcalorimetry (7), and the effects of temperature on the kinetics of fibril formation (17). One of the most direct indications of regions of varying stability in the triple helix came from mutations that convert different obligate glycine codons to codons for amino acids with bulkier side chains and cause the brittleness of bones and other tissues characteristic of the heritable disease osteogenesis imperfecta (3, 18-20). For example, a mutation that converted the glycine codon at position 631 of the α1(I) chain to a serine codon had no measurable effect on the thermal stability of type I collagen, whereas a similar mutation that converted the glycine codon at position 598 to a serine codon markedly lowered the melting temperature of the protein (20). Similar results were recently obtained with synthetic peptides containing collagen sequences: a serine-for-glycine substitution at α1-913 destabilized the triple helix more than a serine-for-glycine substitution at α1-901 in the same peptide (16).
We recently developed a cDNA cassette system to synthesize recombinant versions of procollagen II with deletions of one of the blocks of 234 amino acids that define the four repeating D periods of the collagen triple helix (21). Expression of the D period deletion cassettes in a mammalian system provided modified procollagen II that was secreted and was triple-helical. Here we have compared the thermal stabilities of the recombinant procollagens missing specific D periods. The results provide direct evidence that some regions of the monomer are rich in sequences that stabilize the triple helix and thereby provide clamps for folding and unfolding of the molecule. The results also demonstrate that deletion of the sequences of the D1 period allows the protein to assume an unusual temperature-sensitive conformation at 35°C, so that procollagen N-proteinase cleaves an otherwise resistant Gly-Ala bond in the D2 period of the α1(II) chain.

MATERIALS AND METHODS
cDNA Constructs-Discrete regions of cDNAs for the pro-α1(II) chain of procollagen II were subcloned to generate seven cassettes that encoded (a) the N-propeptide and the N-telopeptide, (b) the D1 period, (c) the D2 period, (d) the D3 period, (e) the D4 period, (f) the D0.4 period at the C terminus of the triple helix, and (g) the C-telopeptide¹ and the C-propeptide (21). The cassettes were amplified by PCR using primers designed to introduce new restriction sites so that the cassettes could be assembled into a variety of different constructs (see Ref. 21). The DNA cassettes were cloned in the bacterial strain DH5α (Life Technologies, Inc.). The cassettes were excised with HindIII and BsrBI and then used to assemble a series of DNA constructs that were inserted into the HindIII and EcoRV sites of the mammalian expression vector pcDNA3 (Invitrogen), which contains a cytomegalovirus promoter and a gene encoding neomycin resistance. The structures of all the junctions in the constructs were verified by DNA sequencing.
Cell Transfection-HT-1080 cells (American Type Culture Collection CCL 121) were cultured in Dulbecco's modified Eagle's medium supplemented with 10% (v/v) fetal calf serum (22). The cells were transfected with one of the functional DNA constructs by the calcium phosphate precipitation method using a commercial kit (Profection Mammalian Transfection System Kit; Promega). Briefly, cells were split 18 h before transfection, grown to a density of approximately 10⁶ cells in a 10-cm cell culture dish, and given fresh culture medium 3 h before transfection. Each DNA construct used for transfection was linearized by cleavage with PvuI. Approximately 12 μg of DNA was precipitated with calcium phosphate and incubated with the cells for approximately 17 h, after which fresh culture medium was applied. After the cells reached confluency, they were split 1:10. After 24 h, medium was added that contained G418 (Life Technologies, Inc.) at an active concentration of 400 μg/ml. The selection medium was replaced with fresh selection medium every 48 h over a 12-day period.
Screening of Transfected Clones-Isolated G418-resistant cell colonies were expanded in 6-, 12-, or 24-well plates (22). Upon reaching confluency, cells were cultured for 24 h with 1 ml of serum-free Dulbecco's modified Eagle's medium supplemented with 41 μg/ml L-ascorbic acid phosphate magnesium salt n-hydrate (Wako Pure Chemical Industries, Ltd.) and 0.5 μCi/ml of a uniformly ¹⁴C-labeled amino acid mixture (NEN Life Science Products Inc.). The medium was collected, and proteins were precipitated with 5% (w/v) polyethylene glycol (Mr 8000; Sigma). Precipitated proteins were pelleted by centrifugation, recovered in storage buffer (0.4 M NaCl, 25 mM EDTA, and 0.04% NaN3 in 0.1 M Tris/HCl buffer, pH 7.4), separated by SDS-polyacrylamide gel electrophoresis under reducing conditions, and electroblotted. The samples were assayed both with a phosphor storage plate (Storm System; Molecular Dynamics) for ¹⁴C-labeled protein and by Western blotting. For Western blotting, the primary antibody was a guinea pig anti-human antibody specific for the C-telopeptide region of procollagen II (kindly provided by Dr. Carmen Merryman, Department of Biochemistry and Molecular Biology, Thomas Jefferson University, Philadelphia, PA). The secondary antibody was goat anti-guinea pig antibodies conjugated with alkaline phosphatase (Sigma).
Recombinant Procollagen II Production-Clones that secreted recombinant procollagen II were grown in Dulbecco's modified Eagle's medium supplemented with 10% (v/v) fetal calf serum. Confluent cells from ten 175-cm² cell culture flasks (22) were expanded into four interconnected flasks (6000 cm² each; Nunc Cell Factories; Nunc). When the cells reached approximately 80% confluency, the culture medium was removed and the cell layers were washed briefly with phosphate-buffered saline. Labeling medium without fetal calf serum but containing 0.17 μCi/ml of a uniformly ¹⁴C-labeled amino acid mixture (NEN Life Science Products Inc.) was added for 24 h, then collected and replaced with fresh labeling medium. After another 24 h, the second labeling medium was collected and replaced for a further 24 h with medium that did not contain ¹⁴C-labeled amino acids. Following collection of the third medium, the cell layers were washed twice with phosphate-buffered saline containing 1 mM EDTA. The cells were then put through the same cycle of three consecutive 24-h incubations to generate a total of six 24-h collections.
Purification of Recombinant Procollagens-The method of Fertala et al. (22) was used with minor modifications. Medium harvested from each 24-h period was filtered through a 1.6-μm glass fiber filter (Millipore) and supplemented with stock solutions to give the following final concentrations: 0.1 M Tris-HCl buffer, 0.4 M NaCl, 25 mM EDTA, 10 mM N-ethylmaleimide, 1 mM p-aminobenzamidine, and 0.04% NaN3. The medium was adjusted to pH 7.4. High molecular mass proteins were concentrated approximately 10-fold with filter cartridges with a 100-kDa molecular mass cut-off (Prep/Scale TFF; Millipore). The proteins were precipitated overnight by adding (NH4)2SO4 to a final concentration of 175 mg/ml, and the precipitate was collected by centrifugation at 15,000 × g for 1 h. Pellets from each 24-h collection were pooled, resuspended overnight in storage buffer (22), and then dialyzed twice against 200 volumes of DEAE-cellulose column I buffer (2 M urea, 0.2 M NaCl, 5 mM EDTA, and 0.04% NaN3 in 0.1 M Tris/HCl buffer, pH 7.4). Insoluble material was removed by centrifugation. The supernatant was chromatographed on a DEAE-cellulose anion-exchange column (2.6 × 15 cm) equilibrated and eluted with DEAE-cellulose column I buffer. The elution profile was monitored by ¹⁴C content in a liquid scintillation counter (Beckman). The flow-through fraction was collected and dialyzed against 200 volumes of DEAE-cellulose column II buffer (2 M urea, 2 mM EDTA, and 0.04% NaN3 in 0.075 M Tris/HCl buffer, pH 7.8). The sample was chromatographed on a second DEAE-cellulose column (2.6 × 15 cm) equilibrated and eluted with DEAE-cellulose column II buffer. The flow-through fraction was collected and applied in the same buffer to a third anion-exchange column (1.6 × 5 cm; Q-Sepharose, Pharmacia). The column was washed with DEAE-cellulose column II buffer, and the recombinant procollagen II was eluted with 0.4 M NaCl in the same buffer.
The eluted protein was dialyzed against 200 volumes of storage buffer and stored at −80°C. For further analysis by CD, the proteins were concentrated on a membrane filter (YM-100; Amicon), and the storage buffer was exchanged for storage buffer that did not contain EDTA.
Amino Acid Composition and N-terminal Amino Acid Sequence Analysis-The amino acid compositions and protein concentrations of the purified procollagens were determined on protein hydrolysates by the Wistar Protein Microchemistry Core Facility, Philadelphia, PA. All N-terminal amino acid sequencing by Edman degradation was also performed by the same facility.
Additional Assays for Expression of the −D0.4 Period Construct-Because expression of the −D0.4 period construct could not be detected from a more rapid migration of the recombinant protein on electrophoretic gels, mRNA from cells expressing the construct was assayed by RT-PCR. Total RNA was extracted from the cells with a selective resin (RNeasy; Qiagen), and the RNA was reverse transcribed with random primers (First Strand cDNA synthesis kit; Pharmacia Biotech Inc.). The cDNA for pro-α1(II) chains was amplified by PCR with a primer pair spanning from the codon for Ser at amino acid position 898 (numbered from the first Gly of the triple helix) to the codon for Trp at amino acid position 1,136 within the C-propeptide. The PCR products were separated on an agarose gel.
To assay secretion of the protein, confluent clones expressing the FL and −D0.4 constructs were incubated in 12-well microtiter plates (5 cm²) for 24 h (22). Proteins in the medium were precipitated with 5% polyethylene glycol. Samples were centrifuged at about 15,000 × g for 30 min at 4°C, and pellets were resuspended in 32 μl of storage buffer. Cells were lysed in 200 μl of buffer containing 1% SDS, 1% sodium deoxycholate, 0.1% Triton X-100, 10 mM EDTA, 0.5 unit/ml aprotinin (Sigma), and 3% β-mercaptoethanol in phosphate buffer adjusted to pH 7.4. The cell lysates and the proteins precipitated from the media were analyzed by Western blotting with guinea pig anti-human collagen II antibodies and secondary anti-guinea pig IgG/horseradish peroxidase antibodies (Sigma). Bands were detected with the ECL kit (Amersham) on x-ray film (Hyperfilm ECL; Amersham) at two different exposure times. The film was analyzed by densitometry (Personal Densitometer SI; Molecular Dynamics).

¹ The abbreviations used are: C-telopeptide, C-terminal telopeptide; C-propeptide, C-terminal propeptide; N-telopeptide, N-terminal telopeptide; N-propeptide, N-terminal propeptide; CD, circular dichroism; FL, full-length procollagen II molecule; residues 1 to 1,014, amino acid positions of the α1(II) chain numbered from the first glycine in the major triple helix; pNα1(II) chains, pro-α1(II) chains lacking the C-propeptide; pCα1(II) chains, pro-α1(II) chains lacking the N-propeptide; pXα chains, pro-α1(II) chains lacking the D1 period that are cleaved at the Gly-Ala bond between amino acid positions 394 and 395; RT-PCR, reverse transcriptase-polymerase chain reaction.
To assay the protease resistance of procollagen lacking the D0.4 period, medium proteins from ten 175-cm² flasks of confluent cells were precipitated with 176 mg/ml ammonium sulfate and resuspended in storage buffer (22). The samples were preincubated at 25-43°C and then digested at the same temperature for 2 min with 250 μg/ml α-chymotrypsin and 100 μg/ml trypsin (27). Protease-resistant α1(II) chains were assayed by SDS-polyacrylamide gel electrophoresis and densitometry of gels stained with colloidal Coomassie Blue (Sigma).
Cleavage of Recombinant Proteins with Procollagen N- and C-Proteinases-Purified ¹⁴C-labeled recombinant procollagens II were used as substrates for procollagen N-proteinase and procollagen C-proteinase purified from chick embryo tendons (23, 24). Each 27-μl reaction contained approximately 1 μg of procollagen and either 2.2 units of N-proteinase or 1.0 unit of C-proteinase in 7 mM CaCl2, 0.1 M NaCl, 0.015% Brij, and 0.02% NaN3 in 25 mM Tris/HCl buffer, pH 7.5. One unit of each of these enzymes is defined as the amount of enzyme needed to cleave 1 μg of substrate in 1 h at 35°C under the buffer conditions just stated. Unless otherwise noted, digestions were performed at 35°C for 3 or 4 h. Products of proteinase cleavage were separated on SDS-polyacrylamide gels and analyzed with a phosphor storage plate.
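Read literally, the unit definition implies a nominal excess of enzyme over the approximately 1 μg of substrate in each reaction. The arithmetic below is our own illustration and assumes that enzyme activity stays constant over the whole incubation and that the recombinant substrates are cleaved as efficiently as the substrate used to define the unit:

```latex
\begin{aligned}
\text{N-proteinase:}\quad & 2.2\ \text{units} \times 1\ \mu\text{g}\,\text{unit}^{-1}\,\text{h}^{-1} \times 3\ \text{h} = 6.6\ \mu\text{g} \quad (8.8\ \mu\text{g at } 4\ \text{h})\\
\text{C-proteinase:}\quad & 1.0\ \text{unit} \times 1\ \mu\text{g}\,\text{unit}^{-1}\,\text{h}^{-1} \times 3\ \text{h} = 3.0\ \mu\text{g} \quad (4.0\ \mu\text{g at } 4\ \text{h})
\end{aligned}
```

These figures give only a nominal cleavage capacity, not a prediction of the extent of digestion. The slower N-proteinase digestion of the procollagen lacking the D1 period shows that the second assumption does not hold for every substrate.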
Unfolding and Folding of the Recombinant Procollagens-CD was measured in a spectropolarimeter (JASCO J-500A) using thermostatted quartz cells with a path length of 0.05 cm as described by Davis and Bächinger (13). The temperature of the sample was monitored by a thermistor and a digital thermometer (Omega Engineering, Inc.), and the temperature of the circulating water bath (Lauda RCS20D) was controlled by a temperature programmer (Lauda PM350). The CD spectrum of the sample was scanned from 180 to 260 nm. For melting experiments, the temperature of the sample was increased at a rate of 10°C/h, and the CD signal at 221 nm was monitored. The CD and temperature signals were recorded in X-Y mode with an HP7090A measurement plotting system. The degree of conversion was calculated as described by Bruckner et al. (25). Because the protein yields were too low for CD assays, the thermal stability of the procollagen lacking the D0.4 period was assayed by brief digestion with chymotrypsin and trypsin (26, 27). A clone secreting the protein was incubated with ¹⁴C-labeled amino acids, the medium proteins were digested with the proteases, and the protease-resistant α1(II) chains were assayed by SDS-polyacrylamide gel electrophoresis followed by analysis with a phosphor storage plate analyzer (26, 27).
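To make "degree of conversion" concrete, the sketch below uses a common two-state treatment. Straight lines are fitted to the native (low-temperature) and unfolded (high-temperature) baselines, and the fraction unfolded is then computed as (θN − θ)/(θN − θU). The temperatures at which that fraction reaches 0.25, 0.5, and 0.75 (the quantities reported in Table I) are found by linear interpolation. This is an assumption on our part: the procedure of Bruckner et al. (25) cited above may differ in detail. The function names, the baseline windows, and the synthetic melting curve are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def linear_fit(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def baseline(temps, signal, window) -> List[float]:
    """Fit a line to points whose temperature lies in `window` (inclusive)
    and evaluate it at every temperature (extrapolating outside the window)."""
    lo, hi = window
    pts = [(t, s) for t, s in zip(temps, signal) if lo <= t <= hi]
    slope, intercept = linear_fit([t for t, _ in pts], [s for _, s in pts])
    return [slope * t + intercept for t in temps]


def degree_of_conversion(temps, signal, native_window, unfolded_window) -> List[float]:
    """Fraction unfolded, (theta_N - theta) / (theta_N - theta_U), clipped to [0, 1]."""
    theta_n = baseline(temps, signal, native_window)
    theta_u = baseline(temps, signal, unfolded_window)
    return [min(1.0, max(0.0, (n - s) / (n - u)))
            for n, s, u in zip(theta_n, signal, theta_u)]


def transition_temperature(temps, alpha, level: float) -> float:
    """First temperature at which alpha reaches `level` (linear interpolation)."""
    for i, a in enumerate(alpha):
        if a >= level:
            if i == 0:
                return temps[0]
            t0, t1, a0 = temps[i - 1], temps[i], alpha[i - 1]
            return t0 + (level - a0) * (t1 - t0) / (a - a0)
    raise ValueError(f"degree of conversion never reaches {level}")


def synthetic_melt(temps, tm: float = 38.0, width: float = 0.8) -> List[float]:
    """Hypothetical two-state melting curve (logistic in T) with sloping
    native and unfolded baselines; for illustration only, not real data."""
    out = []
    for t in temps:
        f = 1.0 / (1.0 + math.exp(-(t - tm) / width))
        out.append((10.0 - 0.05 * t) * (1.0 - f) + (-1.0 - 0.01 * t) * f)
    return out
```

For example, with 1000 temperatures from 15 to 50°C (matching the 1000 points per melting curve noted for Fig. 2), `degree_of_conversion(temps, synthetic_melt(temps), (15.0, 25.0), (45.0, 50.0))` followed by `transition_temperature(temps, alpha, 0.5)` recovers the synthetic midpoint of about 38°C. Fitting sloping baselines, rather than taking the end-point signals as fixed 0% and 100% values, keeps the temperature dependence of the native and unfolded CD signals from being read as unfolding.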

Recombinant Procollagens Lacking a Complete D Period-
Cassettes of cDNA for the pro-α1(II) chain of type II procollagen were assembled (21) into six DNA constructs that coded for either full-length pro-α1(II) chains or pro-α1(II) chains lacking one of the D periods (D1, D2, D3, D4, or D0.4) (Fig. 1). The constructs were then used to prepare stably transfected clones of a mammalian cell line (HT-1080) that secreted the recombinant proteins. Screening the medium from the G418-resistant cells with antibodies to type II procollagen identified a few clones that secreted small amounts of FL molecules even though the constructs used for transfection coded for a truncated pro-α1(II) chain, apparently because a small subpopulation of HT-1080 cells expresses the endogenous COL2A1 gene. However, such clones were readily eliminated because pro-α1(II) chains with deletions of a complete D period migrate more rapidly on gels (Fig. 1). Chromatographic separation of the medium proteins from the clones expressing the −D1, −D2, −D3, and −D4 constructs gave preparations that were homogeneous by SDS-gel electrophoresis and that had the expected amino acid composition, except for lower values for glutamate and higher values for histidine and tyrosine (21, 29, 30). The yields of secreted protein from the −D0.4 construct were too small to purify the recombinant protein.
After digestion of the crude medium with trypsin and chymotrypsin, both α chains and two bands of degraded α chains were obtained (upper panel, Fig. 1). Similar bands of degradation products of α1(II) chains were previously obtained after protease digestion of recombinant collagen II (27, 28) and tissue-extracted collagen II (29, 30).
CD spectra of the procollagens lacking the D1, D2, D3, or D4 period showed that the magnitude of the maximum at 221 nm for each of these procollagens was consistent with predicted values, except that the maximum was slightly higher for the recombinant protein lacking the D3 period (21). The thermal stabilities of these four recombinant procollagens were assayed by CD. All of the proteins underwent a sharp helix-to-coil transition between 25 and 42°C (Fig. 2). However, there were significant differences among the proteins. With procollagen lacking the D1 period, the temperatures for 25, 50, and 75% unfolding were 2 to 4°C lower than for the FL molecule (Table I). With procollagen lacking the D2 period, all three values were indistinguishable from the control. With procollagen lacking the D3 period, the values for 50 and 75% unfolding were higher than the control. With the protein lacking the D4 period, the value for 25% unfolding was lower than the control, but the other two values were the same.
FIG. 1. Proteins secreted into the media by clones expressing the −D1, −D2, −D3, and −D4 constructs were precipitated with 175 mg/ml ammonium sulfate, digested with chymotrypsin and trypsin to remove the propeptides and telopeptides, and separated on 7.5% polyacrylamide gels in SDS. The gel was stained with Coomassie Blue. Recombinant collagen α chains with deletions of a complete D period migrate more rapidly than full-length α chains but differently from each other because of variations in post-translational modifications (22). The yield of secreted protein from clones expressing the −D0.4 construct was too low for purification of the recombinant protein. Therefore, the crude medium was precipitated with 175 mg/ml ammonium sulfate, and the proteins were dialyzed and digested with 100 μg/ml trypsin and 250 μg/ml chymotrypsin at room temperature for 7 min before electrophoretic separation. The −D0.4 protein migrated similarly to the FL chain. In addition, two bands of degradation products of α1(II) chains were seen (see Refs. 27-30). Lower panel, schematic of the DNA constructs used for synthesis of recombinant procollagens II.

Recombinant Procollagen Lacking the D0.4 Period-A separate construct was prepared that lacked the cassette for the D0.4 period, which codes for the last 78 amino acids of the triple-helical domain of the α1(II) chain. Because deletion of 78 amino acids produced only a minimal shift in the migration of pro-α1(II) chains (Fig. 1), clones expressing the construct were identified by RT-PCR. As expected, the RT-PCR products differed by 234 base pairs (top right panel in Fig. 3). Fewer positive clones were obtained than with constructs lacking a complete D period: only 2% of stably transfected clones secreted detectable levels of recombinant protein (0.01 μg/ml), whereas 7-26% of clones obtained with the other constructs secreted high levels (0.1-0.2 μg/ml).
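The expected 234-bp difference follows from codon arithmetic, because each deleted residue removes one three-nucleotide codon. The same arithmetic gives the shortening expected for deletion of a complete 234-residue D period:

```latex
\Delta_{\mathrm{D}0.4} = 78\ \text{codons} \times 3\ \tfrac{\text{bp}}{\text{codon}} = 234\ \text{bp},
\qquad
\Delta_{\mathrm{D}n} = 234\ \text{codons} \times 3\ \tfrac{\text{bp}}{\text{codon}} = 702\ \text{bp} \quad (n = 1, \dots, 4)
```

The primer pair described under Materials and Methods (codon 898 to codon 1,136) spans the whole D0.4 period (residues 937 to 1014), so the entire 234-bp deletion falls inside the amplified region.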
The amount of intracellular pro-α1(II) and partially processed pro-α1(II) chains in clones expressing the −D0.4 construct was about the same as in cells expressing the FL construct (top left panel in Fig. 3). However, the recombinant protein was not secreted as efficiently (middle panel in Fig. 3). As a result, the ratio of medium to intracellular recombinant protein after 24 h was 1.7 in the clones expressing the −D0.4 construct, whereas it was 23 in the clone expressing the FL construct. Because the yields were too low for CD assays, the thermal stability of the procollagen lacking the D0.4 period was assayed by brief digestion with chymotrypsin and trypsin (26, 27). The assays indicated that the midpoint for unfolding of the recombinant procollagen II lacking the D0.4 period was about 3°C lower than that of the full-length protein (lower panel in Fig. 3, and Table I).
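The medium-to-cell ratios r can be re-expressed as the approximate fraction of recombinant protein found in the medium after 24 h. This conversion is our own and assumes that the densitometric signals are proportional to the amount of protein in each compartment and that losses to degradation are negligible:

```latex
f_{\text{medium}} = \frac{r}{1 + r},
\qquad
\text{FL: } \frac{23}{24} \approx 0.96,
\qquad
-\mathrm{D}0.4\text{: } \frac{1.7}{2.7} \approx 0.63
```

On this reading, roughly a third of the −D0.4 protein remained cell-associated after 24 h, compared with about 4% of the FL protein.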
FIG. 2. CD melting curves of recombinant procollagens II. In each case, the melting curve obtained for a procollagen lacking a complete D period is compared with that obtained for FL procollagen. CD data were collected at 221 nm. Samples were heated at a rate of 10°C/h. Each melting curve contains 1000 data points. Symbols as in Fig. 1.

Table I notes: Tm(0.25), Tm(0.5), and Tm(0.75) correspond to the temperatures at which the degree of conversion is 0.25, 0.5, and 0.75, respectively. All assays were repeated at least two times. Significant differences from values for FL type II procollagen are in bold underlined type.

Digestion of Recombinant Procollagens with C-Proteinase and N-Proteinase-Five of the purified ¹⁴C-labeled recombinant procollagens were tested as substrates for procollagen C-proteinase and N-proteinase (23, 24). The C-proteinase apparently cleaved all five of the recombinant proteins (Fig. 4). After digestion with the N-proteinase, fragments of the expected size were obtained with four of the procollagens, i.e. the FL protein and the procollagens lacking the D2, D3, or D4 period (Fig. 4). However, the procollagen lacking the D1 period generated a different pattern of fragments. Under a variety of conditions, the protein lacking the D1 period was digested more slowly (Figs. 4, 5, and 6). Also, two different large fragments were generated. One fragment co-migrated with the expected cleavage product, pCα1(II) chains. This fragment had the same N-terminal sequence as authentic pCα1(II) chains (24) and therefore arose from cleavage at the normal N-proteinase cleavage site. The second and more abundant fragment, designated pXα chains, migrated more rapidly. Cleavage to pXα chains was temperature-dependent: pXα chains were the major large fragment generated at 35°C, but pCα1(II) chains were the major products generated at 25°C (Fig. 6). Specifically, the ratio of pCα1(II) to pXα chains was 3.4:1 at 25°C but decreased to 2.2:1 at 30°C and to 0.2:1 at 35°C.
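The pCα1(II):pXα ratios r can be converted into the fraction of the large N-proteinase products that are pXα chains. This conversion is our own arithmetic. It treats the reported ratios as molar ratios, which is only approximate because the two ¹⁴C-labeled fragments differ in length and therefore in label content:

```latex
f_{\mathrm{pX}} = \frac{1}{1 + r},
\qquad
r_{25^\circ\mathrm{C}} = 3.4 \Rightarrow f_{\mathrm{pX}} \approx 0.23,
\quad
r_{30^\circ\mathrm{C}} = 2.2 \Rightarrow f_{\mathrm{pX}} \approx 0.31,
\quad
r_{35^\circ\mathrm{C}} = 0.2 \Rightarrow f_{\mathrm{pX}} \approx 0.83
```

The shift from roughly one pXα chain in four large fragments at 25°C to more than four in five at 35°C is the temperature sensitivity described in the text.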
Cleavage to pXα chains was not explained by nonspecific cleavage of partially unfolded protein, since cleavage to pXα chains was not seen when the same protein was digested with a mixture of trypsin and chymotrypsin at temperatures ranging from 25 to 43°C under the conditions of the experiment in Fig. 3 (bottom panel).
To define the structure of pXα chains, the band was excised from a polyacrylamide gel and its N-terminal sequence was determined. The sequence was Ala-Arg-Gly-Gln-Pro-Gly-Val-Met-Gly-Phe. Therefore, the N-proteinase cleaved the Gly-Ala bond between residues 394 and 395 of the α1(II) chain (31).

DISCUSSION
The results here extend previous indications that some sequences of the collagen molecule form a triple helix that is more thermally stable than other sequences in the same molecule (2, 4, 7-16). Since the procollagen II lacking the D1 period unfolded at a lower temperature than the FL molecule, the 234 residues of the D1 period must be rich in sequences that stabilize the triple helix and therefore serve as an N-terminal clamp for the helix. Since the procollagen II lacking the D2 period had the same melting profile as the full-length molecule, the sequences of residues 235 to 468 must be relatively neutral in their effects. A similar conclusion was reached earlier from the observation that a spontaneous deletion of residues 157 to 447 did not significantly lower the melting temperature of a recombinant procollagen II (27). Since the procollagen II lacking the D3 period had a higher melting temperature than the FL molecule, residues 469 to 702 must contain sequences that are less helix-stabilizing than most of the sequences in the molecule. With the procollagen II lacking the D4 period, the first 25% of unfolding occurred at a lower temperature than with the full-length molecule, but the rest of the structure unfolded at the same temperature. The data therefore suggest that some of the sequences of residues 703 to 936 in the D4 period serve to stabilize the triple helix. The D4 period contains the sequences around the vertebrate collagenase cleavage site at residues 775 and 776, which were previously shown to form a relatively unstable triple helix (32).
Apparently, more C-terminal sequences within the D4 period compensate for the effects of the sequences around the vertebrate collagenase site. With procollagen II lacking the 78 amino acids of the C-terminal D0.4 period (amino acids 937 to 1014), the secreted monomers unfolded about 3°C lower than the full-length protein. Therefore, the sequences of the D0.4 period, which end in five -Gly-Pro-Hyp- triplets (33), together with the C-terminal sequences of the D4 period stabilize the C terminus of the triple helix. The presence of clamp-like sequences at both ends of the collagen II monomer is consistent with previous observations on the kinetics of folding and unfolding of fragments of type III collagen (13).
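The residue ranges used in this argument can be tabulated to check which block each site falls in. The sketch below is a hypothetical helper of our own (the names `D_PERIODS`, `d_period`, and `deletion_bp` are illustrative, not from the study). It is built only from the ranges stated in the text: four 234-residue D periods followed by the 78-residue D0.4 period, with D4 taken to begin at residue 703, immediately after D3 ends at 702. It locates the sites discussed and recomputes the cDNA length removed by each deletion.

```python
from typing import List, Optional, Tuple

# Hypothetical lookup table: inclusive residue ranges of the alpha1(II)
# triple-helical blocks, numbered from the first Gly of the major triple
# helix as in the text (full-length numbering, also used for the -D1 protein).
D_PERIODS: List[Tuple[str, int, int]] = [
    ("D1", 1, 234),
    ("D2", 235, 468),
    ("D3", 469, 702),
    ("D4", 703, 936),      # assumed to start right after D3 ends at 702
    ("D0.4", 937, 1014),
]


def d_period(residue: int) -> Optional[str]:
    """Name of the block containing `residue`, or None outside the helix."""
    for name, start, end in D_PERIODS:
        if start <= residue <= end:
            return name
    return None


def deletion_bp(name: str) -> int:
    """cDNA length removed by deleting a whole block (3 nt per codon)."""
    for block, start, end in D_PERIODS:
        if block == name:
            return 3 * (end - start + 1)
    raise KeyError(name)
```

For example, `d_period(394)` and `d_period(395)` both return `"D2"`, which places the pXα cleavage site in the D2 period. `d_period(775)` returns `"D4"` for the collagenase site, and `deletion_bp("D0.4")` returns 234, matching the size difference seen by RT-PCR.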
Fewer clones expressing the −D0.4 construct were obtained, and the yields of recombinant procollagen from the positive clones were far lower than with the other constructs. Also, a smaller fraction of the −D0.4 procollagen was secreted into the medium. The results therefore indicate that the 78 amino acids of the D0.4 period are required for both efficient folding and secretion of the protein, apparently because they are required for rapid nucleation of the triple helix. These results complement those of Bulleid et al. (34), who recently examined folding and assembly of procollagen in a semi-permeabilized cell system. They demonstrated that a single transmembrane domain can replace the roles of the C-propeptide and the C-telopeptide in chain association, but that a minimum of two hydroxyproline-containing Gly-X-Y triplets at the C terminus of the triple helix is required for nucleation.
To self-assemble into tightly packed and flexible fibrils, the collagen monomer must be synthesized not as a rigid rod but as a flexible structure that undergoes extensive microunfolding in solution (1-4, 17). In mammals, the monomers completely unfold at about 41°C, and large regions show microunfolding at 37°C. At lower temperatures of 30-32°C, the triple helix becomes more rigid, but the monomers of collagen II do not assemble into fibrils (35), and the monomers of collagen I form unusually thick and rigid fibrils (2, 17). Collagen monomers from poikilotherms consistently unfold at about 4°C above body temperature (1). Therefore, there has apparently been selective pressure over about 500 million years of evolution (36) for synthesis of monomers that microunfold at the temperatures at which they self-assemble into fibrils. Probably, there is also selective pressure for clamp-like sequences at the ends of the monomers. The requirements both for regions of microunfolding and for regions of more stable triple-helical conformation help to explain why some single amino acid substitutions in fibrillar collagens produce lethal phenotypes, whereas the same amino acid substitutions at other sites in the same monomers produce milder phenotypes that are difficult to distinguish from osteoporosis or osteoarthritis (18-20).
Procollagen N-proteinase is a metallopeptidase that specifically cleaves at single Pro-Gln or Ala-Gln sequences in each of the three pro-α chains of type I or type II procollagen, but it does not cleave any peptide bonds in either the isolated pro-α chains or partially unfolded forms of the proteins (37-39). The dependence of the reaction on the correct conformation of the substrate was previously used to demonstrate the order in which the chains are cleaved in type I procollagen, since cleavage of the first two pro-α chains slowed cleavage of the third pro-α chain, apparently because the protein partially unfolded (38). Also, the resistance to cleavage that developed between 37 and 42°C was used to assay thermal unfolding of the protein (39). Quantitative scanning transmission electron microscopy of mixed fibrils containing both collagen I and pN-collagen I demonstrated that the N-propeptide is in a "bent-back" conformation (40). A similar bent-back conformation was observed by rotary shadowing and electron microscopy of procollagen monomers (41). In addition, data on type I collagen from N-telopeptide binding studies (42, 43), conformational calculations (42-44), and peptide inhibitors of lysyl oxidase (45) indicated that the N-telopeptide is also folded into a bent-back conformation. Therefore, the bonds specifically cleaved by N-proteinase are probably exposed in a specific conformation. The conformation of the cleaved bonds was apparently maintained in the procollagens lacking the D2, D3, or D4 period. In the procollagen lacking the sequences of the D1 period, the protein assumed an unusual conformation at 35°C that allowed the N-proteinase to cleave a Gly-Ala bond in the D2 period that resists cleavage both in the native monomer and in isolated pro-α1(II) and pro-α1(I) chains (Fig. 7). However, the unusual conformation is temperature-sensitive, since the monomer is cleaved at the normal site as the temperature is lowered.