Production of Human Type I Collagen in Yeast Reveals Unexpected New Insights into the Molecular Assembly of Collagen Trimers*

Substantial evidence supports the role of the procollagen C-propeptide in the initial association of procollagen polypeptides and for triple helix formation. To evaluate the role of the propeptide domains on triple helix formation, human recombinant type I procollagen, pN-collagen (procollagen without the C-propeptides), pC-collagen (procollagen without the N-propeptides), and collagen (minus both propeptide domains) heterotrimers were expressed in Saccharomyces cerevisiae. Deletion of the N- or C-propeptide, or both propeptide domains, from both proα-chains resulted in correctly aligned triple helical type I collagen. Protease digestion assays demonstrated folding of the triple helix in the absence of the N- and C-propeptides from both proα-chains. This result suggests that sequences required for folding of the triple helix are located in the helical/telopeptide domains of the collagen molecule. Using a strain that does not contain prolyl hydroxylase, the same folding mechanism was shown to be operative in the absence of prolyl hydroxylase. Normal collagen fibrils were generated showing the characteristic banding pattern using this recombinant collagen. This system offers new opportunities for the study of collagen expression and maturation.

Collagen is the single most abundant protein found in animals. In the human body, it is expressed in most tissues and plays a structural, as well as a signaling, role in the development, maintenance, and repair of tissues and organs. Twenty different collagen types are encoded by more than 30 genes. Intracellular assembly of trimeric collagen and formation of collagen fibers in the extracellular matrix are the result of a complex multistep process (1, 2). Within the endoplasmic reticulum, the individual procollagen polypeptides undergo several co- and post-translational modifications, including hydroxylation of specific prolyl and lysyl residues, selection and alignment of three procollagen polypeptides, and disulfide bond formation among the C-propeptides. Experimental evidence suggests that folding of the triple helix begins at the C terminus and propagates toward the N terminus. Prior to triple helix formation, prolyl hydroxylase converts proline in the Y position of GXY triplets to hydroxyproline. Hydrogen bonding between the α-chains of the triple helix increases the denaturation temperature of the molecule, preventing it from unfolding at the animal's body temperature (3). Triple helical procollagen is secreted from the cell, and the N- and C-terminal propeptides are removed by specific N- and C-proteinases. The resulting collagen monomers, consisting of triple helical and telopeptide regions, undergo a self-assembly process to generate collagen fibril intermediates and then mature collagen fibers. These fibers are further stabilized by covalent cross-links within the triple helix and telopeptide regions, providing strength and support to the surrounding tissue.
Collagen biosynthesis has been studied extensively for many years, but the mechanism by which three procollagen polypeptides initially associate and come into correct registration is not well understood. Early work studying procollagen assembly suggested that the C-propeptide plays an undefined but critical role in assembly of a triple helical molecule (4-6). More recently, the procollagen C-propeptide has been implicated in selection of appropriate procollagen polypeptides and correct registration of the triple helix (7). The C-propeptide of type III procollagen is reported to direct the initial association of polypeptides and is required for helix formation (8). Expression of α1(III)-α2(I) procollagen C-propeptide chimeras led to the identification of a discontinuous sequence of 15 amino acids, GNPELPEDVLDV...SSR, that directed procollagen self-association (9). Other reports implicate the last 10 amino acids of the α2(I) procollagen C-propeptide, as well as intramolecular disulfide bond formation, as prerequisites for assembly of the type I procollagen heterotrimer (10, 11).
Our studies further address the role of the procollagen polypeptides, especially the role of the C-propeptide of type I procollagen, in folding of the triple helix. Different yeast strains have been shown to express and correctly fold human procollagens (12-14). The Saccharomyces cerevisiae expression system allowed us to evaluate the minimal requirements for collagen synthesis and folding, because these cells most likely lack the specialized genes for collagen biosynthesis that are found in fibroblastic cells. We show that type I collagen heterotrimers expressed in yeast were correctly assembled into triple helical collagen monomers in the absence of the N- and C-propeptide domains from the proα1(I) and proα2(I) collagen polypeptides. Furthermore, we show that this same folding mechanism is operative in yeast in the absence of prolyl hydroxylase. The recombinant collagen forms normal fibrils in vitro.

EXPERIMENTAL PROCEDURES
Plasmid Constructions-Plasmid pGET323 (similar to pGET462 shown in Fig. 1A minus the preproHSA-α2(I) procollagen-PGKt; described in Ref. 12) was used to create plasmids encoding α1(I) procollagen lacking the N-propeptide, C-propeptide, or both propeptides. pGET323 contains the yeast TRP1 gene, 2-micron origin, Escherichia coli β-lactamase gene, and ColE1 origin (15), as well as the yeast GAL1-10 dual promoter (16), driving the production of preproHSA-α1(I) procollagen. A plasmid encoding type I pC-collagen homotrimer was created using pGET323 as a template for PCR with the following primers: 5′-ACGCGTCGACAGCTGTCTTATGGCTATGATGAG-3′ (sense) and 5′-TTGGAAGCCTTGGGGACCAGGTGCA-3′ (antisense). This and all PCRs used the following program: 94°C, 1 min; 60°C, 1 min; 72°C, 3 min; 35 cycles. All DNAs generated by PCR were sequenced to confirm that only the desired sequence was obtained. The PCR DNA was purified and digested with SalI-ApaLI. pGET323 was digested with SalI and DraIII, treated with calf intestinal phosphatase, and the 8.1-kb fragment was isolated. The 1.1-kb DraIII-ApaLI fragment from pGET323 was isolated, and the three fragments were ligated to create pDO243858, encoding α1(I) pC-collagen with the N-propeptide sequence removed and the preproHSA secretion signal fused directly to the N-telopeptide. pDO243858 was further modified by introduction of two stop codons after the last amino acid of the C-terminal telopeptide. This was accomplished using an intermediate plasmid that contained the 3′ half of the α1(I) procollagen cDNA from the BamHI site at nucleotide 2803, relative to the ATG initiator codon, to the SspI site in the 3′ untranslated region.
The plasmid was digested with Eco47III and BspEI, and the following sequence was cloned using overlapping oligonucleotides: 5′-GCTGGTTTCGACTTCAGCTTCCTCCCCCAGCCACCTCAAGAGAAGGCTCACGATGGCGGCCGCTACTACCGGGCTGATGATGCCAATGTGGTTCGTGACCGTGACCTCGAGGTCGACACCACCCTCAAGAGCCTGAGCCAGCAGATCGAGAACAT-3′ (sense) and the corresponding antisense oligonucleotide to create a double-stranded oligonucleotide that introduces NotI and SalI sites flanking the C-telopeptide/propeptide junction. The plasmid containing the 3′ α1(I) procollagen sequence with the newly introduced SalI and NotI sites was digested with these two enzymes, and the following oligonucleotides, containing two stop codons in tandem after the last amino acid of the C-telopeptide, were inserted: 5′-GGCCGCTACTACCGGGCTTAATGAGATGATGCCAATGTGGTTCGTGACCGTGACCTCGAGG-3′ (sense) and 5′-TCGACCTCGAGGTCACGGTCACGAACCACATTGGCATCATCTCATTAAGCCCGGTAGTAGC-3′ (antisense). DNA sequencing was performed on the resulting plasmid, pDO24805, to confirm the sequence. pDO24805 was digested with BamHI and ClaI, and the 1.5-kb fragment was isolated. pGET323 was digested with SalI and ClaI and treated with calf intestinal phosphatase, and the 5.5-kb vector backbone was isolated. pGET323 was also digested with SalI and BamHI, and the 2.7-kb fragment containing the 5′ portion of the α1(I) procollagen cDNA was isolated. The three fragments were ligated to create pDO248015, which encodes α1(I) pN-collagen. To construct a plasmid expressing only α1-rhcollagen I sequences lacking both N- and C-propeptide sequences but containing the telopeptide sequences, the 5.5-kb SalI-ClaI fragment from pGET323, the 1.5-kb BamHI-ClaI fragment from pDO248015, and the 2.2-kb SalI-BamHI fragment from pDO243858 were ligated. The resulting plasmid, pDO248010 encoding α1-rhcollagen I, was confirmed by restriction digestion and DNA sequence analysis.
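The design of the stop-codon insert described above can be checked with a short script. The sketch below is illustrative only and is not part of the original procedure: it takes the two printed oligonucleotides (any hyphens in the printed sequences are assumed to be line-break artifacts and are omitted), confirms that they anneal into a duplex with 4-nucleotide 5′ overhangs, and confirms that the tandem stop codons fall in the reading frame of the sense strand. The function names are hypothetical.

```python
# Complement table for the four unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Oligonucleotides as printed in the text (hyphens assumed to be line-break artifacts).
SENSE = "GGCCGCTACTACCGGGCTTAATGAGATGATGCCAATGTGGTTCGTGACCGTGACCTCGAGG"
ANTISENSE = "TCGACCTCGAGGTCACGGTCACGAACCACATTGGCATCATCTCATTAAGCCCGGTAGTAGC"


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def annealed_duplex(sense: str, antisense: str, overhang: int = 4):
    """Return (sense 5' overhang, duplex region, antisense 5' overhang).

    Raises ValueError if the strands are not complementary once the
    5' overhang of each strand is set aside.
    """
    top = sense[overhang:]
    bottom = reverse_complement(antisense)[:-overhang]
    if top != bottom:
        raise ValueError("oligonucleotides do not anneal as expected")
    return sense[:overhang], top, antisense[:overhang]


def codons(seq: str):
    """Split a sequence into consecutive triplets, starting at the first base."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


if __name__ == "__main__":
    left, duplex, right = annealed_duplex(SENSE, ANTISENSE)
    print("sense 5' overhang:", left)       # compatible with a NotI-cut end
    print("antisense 5' overhang:", right)  # compatible with a SalI-cut end
    print("duplex length (bp):", len(duplex))
    print("first eight codons:", codons(SENSE)[:8])
```

Read in frame from its first base, the sense strand ends the coding sequence with TAA followed by TGA, matching the description of two stop codons in tandem.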
Deletion of the N-propeptide and C-propeptide sequences from the α2(I) procollagen cDNA was achieved by first subcloning the 1.84-kb KpnI-SacII fragment from pGET462 (illustrated in Fig. 1A) into the SacII-KpnI sites of Bluescript SKII+, creating plasmid pGET758. Deletion of the N-propeptide was accomplished by PCR using primers 5′-AGTACGCGTCGACAGTATGATGGAAAAGGAGTTGGA-3′ (sense) and 5′-TCCAGGAGTTCCAGGGAAACCACG-3′ (antisense) with pGET737 (similar to the illustration in Fig. 1 except that the preproHSA signal was replaced with the native procollagen signals) as the template DNA. The PCR DNA was purified, digested with SalI and PstI, and cloned into SalI-PstI-digested pGET758, creating plasmid pDO243861. The portion of the plasmid containing the PCR DNA was sequenced to confirm that only the desired mutations were present. To reconstruct the promoter region in pDO243861, the 845-base pair SalI fragment from pGET758 was cloned into the SalI site. Isolates with the promoter in the correct orientation were identified by restriction digestion and DNA sequence analysis. The isolate with the desired configuration is pDO243863. To delete the C-propeptide sequence, a 3.1-kb SacII-ClaI fragment from pGET737 was subcloned into Bluescript KSII+ at the SacII-ClaI sites, creating pDO248019. This plasmid was digested with AvrII and CelII, and a DNA fragment, created by PCR using primers 5′-GGTCCTGCTGGTCCTAGGGGCCCT-3′ (sense) and 5′-AGAAGGTGCTGAGCGAGGCTGGTCTCATTAAGCCCTGTAGAAGTCTCCATCGTAACC-3′ (antisense) and pGET737 as template DNA, was inserted after purification and digestion with AvrII-CelII. Several isolates were sequenced and one, pDO248050, had the desired sequence.
A plasmid expressing type I pC-collagen heterotrimer, lacking the N-propeptide sequences from both α1(I) and α2(I) procollagen polypeptides, was constructed by ligating the 5.494-kb NsiI-AgeI fragment from pDO243858, the 1.213-kb SacII-AgeI fragment from pDO243863, and the 6.922-kb NsiI-SacII fragment from pGET737. This ligation created plasmid pDO243865. A plasmid encoding type I pN-collagen heterotrimer was constructed in two steps. First, the 2.3-kb CelII-SacII fragment from pDO248050 was cloned into the SacII-CelII sites in pGET462, creating pDO243869. pDO243869 was digested with AgeI-NsiI, and the 8.141-kb fragment was ligated to the 5.494-kb NsiI-AgeI fragment from pDO248015, creating plasmid pDO243873. Finally, to create a plasmid encoding rhcollagen I heterotrimer lacking all four propeptide domains, the 6.922-kb NsiI-SacII fragment from pDO243873, the 1.213-kb SacII-AgeI fragment from pDO243863, and the 5.494-kb NsiI-AgeI fragment from pDO248010 were ligated, creating pDO248053. Plasmids pGET462, pDO243873, pDO243865, and pDO248053 were transformed into yeast GY5382 to create strains CYT89, -87, -90, and -59, respectively, and grown at 30°C. Plasmid pDO248053 was also transformed into yeast GY5196 to create strain CYT38, which was grown at 20°C. Yeast strains were transformed by electroporation (17) and grown in 2% glucose, 0.67% yeast nitrogen base with ammonium sulfate, and 0.5% casamino acids. Induction of protein expression was performed in medium similar to that described above except that a mixture of 0.5% glucose and 0.5% galactose was the carbon source, and the medium was buffered with 1% sodium citrate, pH 6.5. For GY5196 strains, the induction medium contained 2% galactose as the sole carbon source, and cultures were supplemented with 34 µg/ml uracil.
Fermentation and Purification-Recombinant collagen was purified from yeast cells following fermentation at the 10-liter scale (Biostat C; B. Braun, Goettingen, Germany). Fermentation conditions were maintained at 30°C (strain GY5382) or 20°C (strain GY5196) (both strains as described in Ref. 12), pH 6.5, and aeration at 1 air volume/liquid volume/min. Dissolved oxygen was controlled at 20% by automatically adjusting the impeller speed. The fermentation was started batchwise with medium composed of 5 g/liter casamino acids, 40 g/liter glucose, 0.5 g/liter galactose, and 13.4 g/liter yeast nitrogen base (YNB; Difco). Galactose levels were monitored throughout the fermentation and maintained at 0.5 g/liter. When the glucose was depleted from the medium, the culture was fed to achieve a final concentration of 20 g/liter glucose with 6.7 g/liter YNB added to supplement the growth medium. When the glucose from the first feed was depleted, a second feed of 20 g/liter glucose and 6.7 g/liter YNB was added to the fermentor. The yeast were harvested when cell growth ceased and were frozen at −80°C. Frozen cell paste was resuspended in 0.1 M Tris-HCl, pH 7.4, 0.4 M NaCl and passed through a DYNO Mill (Type KDLA; Bachofen AG, Basel, Switzerland) to rupture the cells. Cell debris was removed by centrifugation at 10,000 × g for 30 min at 4°C, and the supernatant, containing total soluble protein, was collected and delipidated using Celite 512. After delipidation, collagen was recovered from the soluble fraction by NaCl precipitation at neutral pH values as described (18). The precipitate was resuspended and dialyzed into 50 mM sodium acetate, pH 4.5 (Buffer A), at 4°C. The precipitate that formed during dialysis was removed by centrifugation, and the supernatant containing the collagen was further purified by cation exchange chromatography.
A column of SP-Sepharose (5 × 18 cm) was equilibrated in Buffer A at 4°C, and the sample was applied to the column at a flow rate of 5 ml/min. Bound proteins were eluted with a linear NaCl gradient (0-0.5 M NaCl, 20 column volumes). Fractions containing purified collagen were pooled, and the protein was recovered by precipitation at an acidic pH value with NaCl (18). The precipitate was solubilized in 10 mM HCl at a concentration of ~3 mg/ml.
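As a rough guide to the scale of this chromatography step, the column geometry can be turned into a bed volume and gradient slope. The sketch below is a back-of-the-envelope calculation, not part of the original method. It assumes that "5 × 18 cm" means a 5-cm internal diameter and an 18-cm bed height, and that the gradient was run at the same 5 ml/min used for sample loading; the text states neither explicitly.

```python
import math


def column_volume_ml(diameter_cm: float, height_cm: float) -> float:
    """Bed volume of a cylindrical column (1 cm^3 equals 1 ml)."""
    radius = diameter_cm / 2
    return math.pi * radius ** 2 * height_cm


def gradient_summary(diameter_cm, height_cm, start_m, end_m,
                     column_volumes, flow_ml_per_min):
    """Summarize a linear salt gradient expressed in column volumes."""
    cv = column_volume_ml(diameter_cm, height_cm)
    total_ml = cv * column_volumes
    return {
        "column_volume_ml": cv,
        "gradient_volume_ml": total_ml,
        # Change in salt concentration per column volume, in mM.
        "slope_mM_per_cv": (end_m - start_m) * 1000 / column_volumes,
        # Run time is only meaningful if the assumed flow rate applies.
        "run_time_min": total_ml / flow_ml_per_min,
    }


if __name__ == "__main__":
    # Assumed geometry: 5 cm diameter, 18 cm bed height; assumed flow: 5 ml/min.
    summary = gradient_summary(5, 18, 0.0, 0.5, 20, 5)
    for key, value in summary.items():
        print(f"{key}: {value:.1f}")
```

Under these assumptions the bed volume is about 353 ml, so the 20-column-volume gradient spans roughly 7 liters at 25 mM NaCl per column volume.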
Protease Digestion Assays-rhcollagen I samples were assayed for helix formation and thermal stability by pepsin or trypsin/chymotrypsin digestion (19). Samples were digested with 150 µg/ml pepsin for 30 min at 10, 20, 30, 40, and 45°C. The reaction was terminated by neutralization of the sample with 1 N NaOH. Trypsin/chymotrypsin digestions were performed as described previously (19), with minor modifications. Prior to addition of the proteases, samples were preincubated at the desired temperatures (20, 25, 30, 35, 40, and 45°C) for 15 min. The proteases were added, and incubation was continued for an additional 2 min. Partially purified type I procollagen was obtained from serum-free conditioned medium of human skin fibroblast cultures by 20% ammonium sulfate precipitation. Procollagen was treated with proteases as a control in these assays. Reaction products were analyzed by SDS-PAGE using 6% gels, and proteins were visualized by staining with GelCode Blue (Pierce, Rockford, IL). Scanning densitometry was done using the Bio-Rad GelDoc 1000 gel scanner.
Procollagenase Digestion-Human fibroblast procollagenase (provided by Howard Welgus, Washington University Medical Center, St. Louis, MO) was activated by treatment with 10 µg/ml trypsin at 25°C for 30 min. The activation reaction was stopped by the addition of soybean trypsin inhibitor to a final concentration of 50 µg/ml. Samples were digested with activated collagenase in 0.05 M Tris-HCl, pH 7.5, 0.15 M NaCl, 0.01 M CaCl2 for 16 h at 25°C. Digests were analyzed by SDS-PAGE using 4-12% gradient gels, and proteins were visualized by staining with GelCode Blue.
Circular Dichroism Analysis-Purified samples to be analyzed by CD were diluted to a concentration of ~100 µg/ml using 200 mM sodium phosphate, pH 7.0. A 200-µl aliquot was placed in a rectangular 1-mm cuvette. The samples were analyzed using a Jasco model J-715 CD spectropolarimeter with a six-position Peltier-controlled sample holder. The samples were allowed to equilibrate for 5 min at each temperature and were then scanned from 250 to 185 nm. The scan was repeated over a temperature range spanning the expected melting temperature (Tm). The results were plotted as the molar ellipticity at a given wavelength as a function of temperature.
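One common way to read a Tm from such a plot is to take the temperature at which the signal is halfway between its low-temperature (folded) and high-temperature (unfolded) values. The sketch below illustrates that midpoint estimate by linear interpolation. It is an assumption about how a melting curve might be analyzed, not a description of the analysis used here, and the data points are invented for illustration.

```python
def melting_temperature(temps, signal):
    """Estimate Tm as the temperature where the signal crosses the midpoint
    between its first (folded) and last (unfolded) values.

    temps must be in ascending order; signal holds the ellipticity, or any
    normalized signal, measured at each temperature.
    """
    if len(temps) != len(signal) or len(temps) < 2:
        raise ValueError("need matching lists with at least two points")
    half = (signal[0] + signal[-1]) / 2
    points = list(zip(temps, signal))
    for (t0, s0), (t1, s1) in zip(points, points[1:]):
        # A sign change (or exact hit) means the midpoint lies in this interval.
        if (s0 - half) * (s1 - half) <= 0 and s0 != s1:
            return t0 + (half - s0) * (t1 - t0) / (s1 - s0)
    raise ValueError("signal never crosses its midpoint")


if __name__ == "__main__":
    # Hypothetical normalized melting curve, for illustration only.
    temps = [20, 25, 30, 35, 40, 45, 50]
    signal = [1.00, 0.98, 0.90, 0.70, 0.30, 0.05, 0.00]
    print(f"estimated Tm: {melting_temperature(temps, signal):.1f} C")
```

A sigmoidal fit would use all of the data points and would be less sensitive to noise near the transition; the interpolation above is only the simplest version of the idea.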
Amino Acid Analysis-Aliquots of purified collagen samples were dried and then subjected to vapor phase hydrolysis overnight at 116°C under N2, in vacuo. The hydrolyzed amino acids were derivatized with the AccQ-Tag chemistry kit (Waters Corporation, Milford, MA) and analyzed on an AccQ-Tag column using a Hewlett Packard model 1100 high pressure liquid chromatograph.
Fibril Formation-Fibrils were formed by dialysis against 20 mM sodium phosphate, pH 7.2, at 15°C (20). The suspension of collagen fibrils was diluted with 20 mM sodium phosphate to 0.125, 0.25, 0.5, and 1.0 mg/ml. A drop of the fibril suspension was transferred to Formvar/carbon-coated grids (Polysciences, Inc., Warrington, PA), washed with water, and air-dried. The grids were negatively stained with 1% phosphotungstic acid, pH 7 (21, 22), and examined at 75 kV using a Hitachi 7000 transmission electron microscope. The magnification was calibrated using a line grating. Each fibril diameter was determined as the average of three measurements and normalized to the periodicity of the collagen fibril as an internal standard.
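The normalization step can be illustrated with a short calculation. The sketch below is a hypothetical reading of the procedure: the banding period measured on the same micrograph is used as a ruler, so that raw readings in arbitrary image units convert to nanometers. The 67-nm period is the value reported for these fibrils in the results; the function name and the example readings are invented.

```python
from statistics import mean

# Banding periodicity used as the internal length standard (nm),
# taken from the value reported for these fibrils.
D_PERIOD_NM = 67.0


def fibril_diameter_nm(diameter_readings, period_reading,
                       d_period_nm=D_PERIOD_NM):
    """Convert three raw diameter readings to nanometers.

    diameter_readings: three width measurements of one fibril, in image units.
    period_reading: length of one banding period on the same image,
                    in the same image units.
    """
    if len(diameter_readings) != 3:
        raise ValueError("expected exactly three diameter measurements")
    if period_reading <= 0:
        raise ValueError("period reading must be positive")
    nm_per_unit = d_period_nm / period_reading
    return mean(diameter_readings) * nm_per_unit


if __name__ == "__main__":
    # Hypothetical readings in arbitrary image units.
    print(f"{fibril_diameter_nm([40, 42, 41], 10.0):.1f} nm")
```

Because the scale comes from the fibril itself, the result does not depend on small errors in the nominal magnification of an individual image.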

RESULTS
Expression of Collagen with Deleted Propeptide Regions-A previously described heterologous yeast expression system (12) was used to directly examine the role of the propeptides in folding of the type I procollagen triple helix. A plasmid was constructed that contained α1(I) and α2(I) procollagen cDNAs fused to the human serum albumin signal sequence under the control of the GAL1-10 promoter. Additional constructs were made that did not contain the N-propeptide, C-propeptide, or both propeptides (Fig. 1). This series of plasmids was transformed into an S. cerevisiae strain that was previously engineered to express functional chicken prolyl hydroxylase. Collagen expression levels were measured with a quantitative immunoassay that specifically detects native type I collagen heterotrimers but is unable to detect denatured type I collagen or type I collagen homotrimers (12). Expression of α1(I) and α2(I) procollagen heterotrimer lacking the N-propeptide (pC-collagen) resulted in a 3.9-fold increase in expression relative to full-length procollagen (Table I). Surprisingly, a construct expressing α1(I) and α2(I) procollagen polypeptides lacking the C-propeptide (pN-collagen) was detected by the immunoassay; furthermore, expression was 5.9-fold higher than that of procollagen in the same strain. Because the assay detected pN-collagen, it strongly suggested that the pN-collagen polypeptides had folded into a triple helix. The generation of an expression plasmid that codes only for the telopeptide and triple helical region of collagen (rhcollagen I) led to further increases in heterotrimer expression levels; expression was 18-fold higher than that of procollagen. These unexpected results suggest that the propeptides are not required for folding, but their presence may act to limit expression in yeast.
Characterization of rhcollagen I-The propeptide domains of procollagen are sensitive to proteolytic cleavage by pepsin, trypsin, and chymotrypsin, whereas the triple helix is resistant to these proteases. Additionally, collagen α-chains that are not correctly folded into a triple helix are susceptible to proteolysis (18). Protease digestion assays were performed to probe the helical conformation of the yeast-derived rhcollagen I and demonstrated that the chains were correctly registered and that the rhcollagen I was of the expected molecular weight. Purified non-hydroxylated and hydroxylated rhcollagen I and procollagen were treated with pepsin at different temperatures (Fig. 2, A-C). Non-hydroxylated rhcollagen I was resistant to pepsin digestion at 10 and 20°C, whereas hydroxylated rhcollagen I was resistant to pepsin treatment up to 30°C. The non-hydroxylated collagen polypeptides migrated slightly faster in SDS-PAGE gels than hydroxylated polypeptides (Fig. 2A, compare lanes 2 and 3), a finding consistent with previous reports of altered α-chain migration in gels because of the lack of hydroxylation (23). The fibroblast-derived procollagen control was converted to collagen and was pepsin-resistant up to 30°C but was degraded at 40°C. Resistance to pepsin alone should be interpreted with caution when judging the formation of a collagen triple helix; a combination of trypsin and chymotrypsin is recommended to probe the integrity and proper alignment of the triple helix (24). The rhcollagen I and fibroblast-derived procollagen were therefore also treated with a mixture of trypsin and chymotrypsin, and the results of digestions with pepsin were confirmed with this mixture of proteases (Fig. 2, D-F).
With the exception of the slight increase in mobility of the non-hydroxylated collagen, the sizes of the protease-resistant bands in both sets of digestions were the same for native and recombinant collagen, indicating that recombinant non-hydroxylated and hydroxylated rhcollagen I molecules have formed a correctly aligned, full-length triple helix. Furthermore, these results suggest that folding of the triple helix in yeast can occur not only in the absence of both N- and C-propeptides but also without prolyl hydroxylase.
The ratio of pepsin-resistant α1 and α2 collagen polypeptides from hydroxylated rhcollagen I was not 2:1 as expected for type I collagen heterotrimer but was estimated to be 5:1 by scanning densitometry of SDS-PAGE gels. Our interpretation of this finding is that both heterotrimeric and homotrimeric (α1)3 rhcollagen I molecules are present in approximately equal amounts. The presence of homotrimeric type I collagen has been observed in tissue, certain fibroblasts producing type I collagen, and in other recombinant expression systems (25-27).
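The arithmetic behind this interpretation is simple chain counting: each heterotrimer contributes two α1 chains and one α2 chain, and each homotrimer contributes three α1 chains. The sketch below works through that relationship. It assumes that the densitometric band ratio approximates the molar ratio of chains, which the text does not state; the function names are hypothetical.

```python
from fractions import Fraction


def alpha1_to_alpha2_ratio(heterotrimers, homotrimers):
    """Chain ratio for a mixture of (a1)2a2 heterotrimers and (a1)3 homotrimers."""
    hetero = Fraction(heterotrimers)
    homo = Fraction(homotrimers)
    if hetero <= 0:
        raise ValueError("ratio is undefined without heterotrimers (no a2 chains)")
    alpha1 = 2 * hetero + 3 * homo  # two a1 per heterotrimer, three per homotrimer
    alpha2 = hetero                 # one a2 per heterotrimer
    return alpha1 / alpha2


def homotrimer_fraction(ratio):
    """Fraction of trimers that are homotrimers, given an a1:a2 chain ratio r.

    From r = (2h + 3m) / h it follows that m / (h + m) = (r - 2) / (r + 1).
    """
    r = Fraction(ratio)
    if r < 2:
        raise ValueError("a ratio below 2:1 cannot arise from these two species")
    return (r - 2) / (r + 1)


if __name__ == "__main__":
    print(alpha1_to_alpha2_ratio(1, 1))  # equal numbers of both trimers
    print(homotrimer_fraction(5))        # observed 5:1 ratio
    print(homotrimer_fraction(2))        # pure heterotrimer
```

An equimolar mixture gives (2 + 3):1 = 5:1, and conversely a 5:1 ratio corresponds to half of the trimers being homotrimers, in line with the interpretation given above.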
A further test of the integrity and alignment of the triple helix of rhcollagen I was cleavage by mammalian collagenase. Mammalian collagenase makes one specific cleavage in all three polypeptides in the triple helix at position 775-776. In addition to a specific primary sequence, Gly-(Ile/Leu)-(Ala/Leu), the local structure and fold of the helix are also critical for collagenase cleavage (28, 29). If the rhcollagen I triple helix were out of alignment by as little as 3 amino acids, collagenase cleavage would not occur (30). Digestion of native type I collagen and hydroxylated rhcollagen I with mammalian collagenase resulted in the generation of typical collagenase reaction products (Fig. 3), indicating that both rhcollagen I hetero- and homotrimers were cleaved by collagenase. The larger fragments of both α1(I) and α2(I) collagen (TC^A) are visible on the gel, as well as the smaller fragments (TC^B) from native type I collagen. Consistent with the lower ratio of α2 versus α1 rhcollagen I polypeptides, the proteolytic TC^B fragment of α2 rhcollagen I was not visible on this gel. These results show that the polypeptides of rhcollagen I are properly aligned.
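The primary-sequence part of this requirement is easy to scan for, which helps show why sequence alone cannot explain the specificity: the tripeptide pattern can occur at many sites, yet only one is cleaved. The sketch below locates every Gly-(Ile/Leu)-(Ala/Leu) tripeptide in a one-letter amino acid sequence. The example peptide is a made-up toy sequence, not a real collagen chain, and the function name is hypothetical.

```python
import re

# Gly, then Ile or Leu, then Ala or Leu, in one-letter amino acid code.
# Matches cannot overlap, because neither the second nor the third
# position of the pattern can be Gly.
CLEAVAGE_MOTIF = re.compile(r"G[IL][AL]")


def find_candidate_sites(sequence: str):
    """Return (zero-based start index, tripeptide) for each motif occurrence."""
    return [(m.start(), m.group()) for m in CLEAVAGE_MOTIF.finditer(sequence.upper())]


if __name__ == "__main__":
    # Toy sequence for illustration only; not taken from any collagen chain.
    toy_peptide = "GPQGIAGQRGVVGLAGPP"
    for start, tripeptide in find_candidate_sites(toy_peptide):
        print(f"{tripeptide} at index {start}")
```

Applied to a real α-chain sequence, a scan like this would list the candidate sites by sequence only; it says nothing about the local helix structure that determines which site is actually cleaved.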
Hydroxylated and non-hydroxylated rhcollagen I were compared with human placental type I collagen by CD spectroscopy over a range of temperatures. The Tm values determined by CD analysis were 30°C for hydroxylated rhcollagen I, 24°C for non-hydroxylated rhcollagen I, and 40°C for native type I collagen (Fig. 4). These results suggest that the rhcollagen I produced by the strain containing the prolyl hydroxylase genes was partially hydroxylated. This finding was confirmed by amino acid analysis showing that the rhcollagen I had 49 ± 2% (mean ± standard deviation) of the hydroxyproline expected for collagen isolated from tissue and less than procollagen expressed in the same strain (12). Hydroxylated lysine residues were not detected.

[Fig. 3 legend, partial] ... 3 and 4) and yeast-produced hydroxylated rhcollagen I (lanes 5 and 6) were treated with mammalian collagenase (lanes 4 and 6) at 25°C for 16 h. Lane 1 contains a molecular weight marker (broad range marker; Bio-Rad), and lane 2 contains collagenase. Digests were fractionated by SDS-PAGE using 4-12% gradient gels, and reaction products were visualized by staining with GelCode Blue. TC^A denotes a fragment of the collagen molecule from the N terminus to amino acid 775; TC^B is the fragment from amino acid 776 to the C terminus.
Fiber Formation-The ability of rhcollagen I to assemble into collagen fibrils was examined by electron microscopy. rhcollagen I was purified without the use of any proteases, ensuring that the telopeptides remained intact. Fibrils formed with rhcollagen I showed the typical banding pattern of fibrils seen in tissues (Fig. 5). Striated collagen fibrils were long and cylindrical and had tapered ends similar to those seen in vivo (21, 22). The fibrils showed a periodicity of 67 nm, comparable with in vivo-formed fibrils from type I collagen-rich tissues such as tendon. The shafts of the formed fibrils had a mean diameter of 275 ± 97 nm and ranged from 134 to 470 nm. The diameters of these fibrils are comparable with those obtained in other in vitro fibrillogenesis experiments using type I collagen and with fibers found in most tissues. The melting temperature of fibrils made from rhcollagen I was measured at 48°C using differential scanning calorimetry, comparable with native bovine collagen isolated from hide (51-52°C; data not shown).

DISCUSSION
Previous studies showed that triple helical type I procollagen could be generated in transfected mammalian cells with α1(I) and α2(I) procollagen polypeptides lacking the N-propeptide (31). To further study the role of the propeptide domains in procollagen triple helix assembly, engineered α1(I) and α2(I) procollagen genes were generated that encode procollagen lacking either the N- or C-terminal propeptide regions or both propeptide regions. Using an S. cerevisiae heterologous expression system, type I collagen polypeptides were assembled into correctly aligned triple helix without the presence of the C-propeptide. These results indicated that the triple helix and telopeptide regions of the molecule alone are sufficient for assembly of a triple helix.
These results were completely unexpected based on all of the reports describing the essential role of the C-propeptide in proα-chain selection and registration to bring the sequences at the C terminus of the triple helical region in close proximity to each other to form a folding nucleus (1, 2, 4-11).
The triple helix of rhcollagen I was further examined using protease digestion. Proteolytic enzymes digest single polypeptide chains and incorrectly folded collagen molecules. In two separate protease digestion experiments, using either pepsin or trypsin and chymotrypsin, we demonstrated resistance of the triple helix of rhcollagen I to these proteases, indicating that correct folding of rhcollagen I had occurred. Additionally, correctly folded full-length rhcollagen I was produced by yeast cells that were not engineered to express prolyl hydroxylase. Thus, neither proline hydroxylation nor the prolyl hydroxylase enzyme is essential for polypeptide chain association and folding of the triple helix under our experimental conditions. We questioned whether the pepsin- and trypsin/chymotrypsin-resistant rhcollagen I, which appears full-length by SDS-PAGE, may contain α1 and α2 rhcollagen I chains that are out of register by only a few residues. If this were the case, proteolytic removal of a few N- or C-terminal amino acids in a nonhelical configuration because of α-chain misalignment would not be detected by SDS-PAGE. To distinguish between correctly aligned and slightly misaligned α-chains, we used mammalian collagenase digestion as a probe. Collagenase cleaves collagen at one specific locus, the Gly-Ile bond at position 775-776. In addition to this primary sequence requirement for cleavage, other parameters affect cleavage (28). Type I, II, and III collagens contain the partial sequence Gly-(Ile/Leu)-(Ala/Leu) at 27 other locations throughout their helical domains. None of these other loci is cleaved by collagenase. The reason for this strict specificity is not completely understood, but it suggests that extended sequences around the cleavage site or a specific folded structure are required for activity. It has been proposed that the helix is tightly folded up to and including the Gly residue at the cleavage site.
Because the four triplets following Gly-775 are imino acid deficient, a loosely folded helical structure is thought to exist at the C-terminal side of the cleavage site (29). The border between the tight and loose forms of the helix may act to present the cleavage site more efficiently to collagenase. We reasoned that if our collagen chains were even slightly out of alignment, these additional structural requirements of collagenase would be altered or lost. Furthermore, mutagenesis experiments have shown that alteration of the primary sequence of α1 inhibits cleavage, and the presence of non-cleavable α1-chains in a heterotrimer with wild-type α2-chains results in inhibition of α2-chain cleavage (32). Cleavage of the α1- and α2-chains of rhcollagen I was complete, indicating that both hetero- and homotrimers were cleaved by collagenase. This finding suggests that the α-chains of rhcollagen I are properly aligned.
Fibrillogenesis data attest to the integrity and quality of this recombinant collagen. Electron micrographic analysis of fibers formed with rhcollagen I shows the typical banding pattern of fibers seen in tissues. Telopeptides play an important role in collagen fibrillogenesis, as their removal alters both the kinetics of fiber formation and fiber morphology, including loss of diameter uniformity and unidirectional packing (33). The presence of the telopeptide domains gives rhcollagen I physical properties that may be lacking in protease-treated collagen. Even though the rhcollagen I monomer has a hydroxyproline level at ~50% of the level of fully hydroxylated collagen, these molecules produce a fibril with thermal stability similar to that of tissue-derived collagen.
Deletion of the N-propeptide and C-propeptide and the combination of the two deletions resulted in dramatic increases in collagen levels. One possible explanation of this phenomenon is that synthesis of proα-chains represents the rate-limiting step in collagen production in yeast. Removal of ~45 kDa of protein (~15 kDa from the N-propeptide and ~30 kDa from the C-propeptide) from each α-chain may remove a significant biosynthetic burden, resulting in higher expression levels. Additional post-translational modifications occur in the propeptides, including N-linked glycosylation of the C-propeptide and the formation of several disulfide bonds in both propeptides. The elimination of these processing steps could also enhance expression.
A second explanation relates to the role of the propeptides in folding. The C-propeptide has been thought to play a key role in proα-chain selection and registration (4-7). Additionally, the C-propeptide may also affect the rate of folding, acting to limit helix formation, thus allowing prolyl hydroxylase to bind to the unfolded proα-chains and completely hydroxylate proline residues in the Y position of GXY triplets. If the C-propeptides were removed and the chains were allowed to fold faster, expression levels would be expected to increase. An additional consequence of this increased folding rate would be decreased hydroxylation levels, because helix formation limits hydroxylation. Our data indicate that the rhcollagen I is underhydroxylated, supporting this hypothesis. Furthermore, when the same yeast background was used to express procollagen, hydroxylation levels were 33% higher (12). The lower levels of hydroxylation seen in the rhcollagen I strain may be overcome by increasing the levels of prolyl hydroxylase.
In this study we demonstrated selection and folding of rhcollagen I subunits into (α1)2α2 heterotrimers and (α1)3 homotrimers in yeast in the absence of propeptide sequences. We have engineered type I collagen genes that encode the N- and C-telopeptides with the entire triple helical domain and show that these sequences are sufficient for assembly of a triple helix in S. cerevisiae. Other fibrillar collagens (types II, III, V, and XI) have a similar structure and thus would be expected to fold into triple helices without the propeptide regions in an analogous system. Additionally, manipulating the collagen molecule in this manner has allowed us to evaluate folding mechanisms separately from hydroxylation and other roles played by prolyl hydroxylase.
Yeast has been used as a model system to study the expression, trafficking, and assembly of eukaryotic proteins, and this organism is now being utilized for studying the mechanisms of collagen synthesis and folding (12-14). The minimal requirements for collagen synthesis and folding were evaluated, because yeast do not express collagen and are unlikely to possess the specialized genes required for collagen biosynthesis in other cells (such as fibroblasts). Our studies in this system have provided new insights into the minimal structural requirements for collagen expression and folding. We are in the process of testing additional collagen types using this system and are also testing mammalian cells for the functional requirements of structural domains to form a triple helical collagen. The ability to dissect the process of collagen assembly using this heterologous expression system exemplifies its utility for furthering the understanding of the key components of collagen triple helix assembly and collagen biology.