Production of Recombinant Human Type I Procollagen Trimers Using a Four-gene Expression System in the Yeast Saccharomyces cerevisiae

The expression of stable recombinant human collagen requires an expression system capable of post-translational modifications and assembly of the procollagen polypeptides. Two genes were expressed in the yeast Saccharomyces cerevisiae to produce both propeptide chains that constitute human type I procollagen. Two additional genes were expressed coding for the subunits of prolyl hydroxylase, an enzyme that post-translationally modifies procollagen and that confers thermal stability to the triple helical conformation of the collagen molecule. Type I procollagen was produced as a stable heterotrimeric helix similar to type I procollagen produced in tissue culture. Glutamate was identified as a key medium supplement required to obtain high expression levels of type I procollagen as heat-stable heterotrimers in Saccharomyces. Expression of these four genes was sufficient for correct assembly and processing of type I procollagen in a eucaryotic system that does not produce collagen.

Collagen is the single most abundant protein in animals and is found in all of them, including sponges; it is not expressed in yeast. In mammals, it is expressed in most tissues and plays both a structural and a signaling role in the development, maintenance, and repair of tissues and organs. More than 30 gene products compose the collagen family of molecules (1). Procollagens have several features and require numerous steps for production of functional molecules, including post-translational modifications (2). Key features in the collagen family are the formation of a triple helix composed of three polypeptide chains and the post-translational modification of proline residues to hydroxyproline, which provides stability of the triple helix against thermal denaturation and unfolding (Tm) at the animal's body temperature (3). The content of proline and hydroxyproline is correlated with the temperature of an animal's environment (4). The triple helical domain of procollagen consists of -(GXY)n- repeats, where X and/or Y is frequently proline or hydroxyproline in the mature molecule. Prolyl 4-hydroxylase, an α2β2 tetrameric enzyme composed of the prolyl hydroxylase α-subunit (αPH) and the protein-disulfide isomerase (PDI) subunit in higher eucaryotes, is the enzyme that modifies proline residues to hydroxyproline. Additional steps for procollagen production include carbohydrate attachment, folding into a triple helix, secretion into the extracellular matrix, and cleavage by specific proteases to remove the propeptide domains to form mature collagen helices. A C-terminal non-helical propeptide facilitates the assembly of trimeric collagen molecules, leading to helix formation (5); the N-terminal propeptide may limit fiber diameter (6).
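The -(GXY)n- repeat described above can be illustrated with a short sketch. It is not part of the original study: the toy peptide is hypothetical, and the one-letter symbol O for hydroxyproline is a convention assumed here for illustration. The functions check that every third residue is glycine and report how much of the X and Y positions is occupied by proline or hydroxyproline.

```python
def gxy_triplets(seq):
    """Split a sequence into consecutive three-residue frames (Gly-X-Y)."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def is_gxy_repeat(seq):
    """True if the sequence is a whole number of triplets, each starting with Gly."""
    triplets = gxy_triplets(seq)
    return bool(triplets) and len(seq) % 3 == 0 and all(t[0] == "G" for t in triplets)


def imino_fraction(seq, imino="PO"):
    """Fraction of X and Y positions that are proline (P) or hydroxyproline (O)."""
    xy_residues = [res for t in gxy_triplets(seq) for res in t[1:]]
    return sum(res in imino for res in xy_residues) / len(xy_residues)


# Hypothetical toy peptide, not a real collagen sequence; O = hydroxyproline (assumed symbol).
toy_peptide = "GPOGPAGEOGPO"
print(is_gxy_repeat(toy_peptide), imino_fraction(toy_peptide))
```

For this toy peptide, six of the eight X and Y positions are P or O, so the reported fraction is 0.75.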
The association and folding steps of three polypeptide chains that compose the triple helix potentially require chaperone functions in the endoplasmic reticulum, with PDI (7) and Hsp47 (8) as two proteins that have been implicated in the assembly of a procollagen trimer.
A fundamental question regarding collagen biosynthesis is which genes are essential for the expression of collagen in cells and which are nonessential. Expression of recombinant collagen has been performed using mammalian, baculoviral, and transgenic systems. Single procollagen genes were expressed in mammalian cells to produce homotrimeric type I procollagen (9), type II procollagen (10), and homotrimeric type V collagen (11). In baculovirus, prolyl hydroxylase was transfected and shown to be a functional enzyme (12). Subsequently, type I and III procollagens were transfected and expressed and were shown to be capable of modification by prolyl hydroxylase (13-15). Recently, homotrimeric type I procollagen and an engineered form of α2(I) procollagen have been expressed in the milk of transgenic mice (16,17). In contrast, no report of procollagen expression and assembly has been published using a bacterial expression system.
The yeast Pichia pastoris was first engineered to express prolyl hydroxylase and subsequently shown to produce functional type III procollagen if the gene for type III procollagen was introduced (18,19). Like Saccharomyces, Pichia contains endogenous PDI, but not αPH, and it does not synthesize procollagen. It was therefore a useful system to test the requirements for genes to produce type III procollagen. In this system, the type III procollagen gene and the two genes for prolyl hydroxylase were sufficient to produce stable type III procollagen molecules. However, Saccharomyces is evolutionarily divergent from Pichia. Furthermore, type I procollagen is composed of polypeptides generated from two distinct genes to form an (α1)2α2 structure, whereas type II and III procollagens require only one gene product to form an (α1)3 structure.
To our knowledge, this is the first report to describe a multigene system in Saccharomyces that results in both the assembly and non-native post-translational modification of a multimeric protein to produce a functional heterologous molecule. A total of four gene products were required in Saccharomyces to generate a thermally stable triple helical type I procollagen: two genes that code for the polypeptide chains of type I procollagen and two additional genes that code for the subunits of prolyl hydroxylase. No other added genes were required to produce a functional procollagen. We optimized the expression system not only at the molecular level but also through the addition of medium components, which significantly increased the level of expression of type I procollagen in Saccharomyces.

EXPERIMENTAL PROCEDURES
Plasmid Constructions-The precursor plasmid pGET100 and plasmid pGET150, which contains the GAL1/GAL10 dual promoter and is the base plasmid for other constructions, were made as follows. YEp9T, containing yeast (2μ origin, FLP1 gene terminator in 2μ DNA, and the yeast TRP1 gene) and bacterial (pBR322 functions) sequences (20), was modified between an NdeI site in the 2μ DNA and a second NdeI site near the origin of pBR322 with the polylinker sequence (NdeI)-PvuII-ApaI-BglII-ClaI-NheI-XhoI-EcoRI-BamHI-AflII-NotI-(NdeI) to create pGET100 (PvuII site closest to 2μ DNA). Genomic DNA from yeast strain S1799D (MATα trp5 his4 ade6 gal2) was used as the template for PCR with primers (based on sequence information (21)) containing BamHI placed at −6 of the GAL1 promoter side and EcoRI placed at −1 of the GAL10 promoter side. The 687-bp EcoRI/BamHI GAL1/GAL10 product was subcloned into pUC. This fragment was then placed into the EcoRI/BamHI sites of pGET100 to make pGET150. The specific structure of circular plasmid pGET150 is as follows: ApR (ampicillin resistance gene)-yeast TRP1-yeast 2μ origin-FLP1 terminator-PvuII-polylinker-EcoRI-GAL10 promoter/GAL1 promoter-BamHI-AflII-NotI-(NdeI)-SapI-Escherichia coli origin of replication-ApR. Both sides of this dual promoter are inducible with galactose and repressible by glucose.
Plasmid pGET333, expressing human α1(I) preprocollagen, was constructed by cloning an SspI/XbaI fragment containing the human α1(I) preprocollagen cDNA (22) coding region between the PvuII and NheI sites in the polylinker of pGET150. To express human α1(I) procollagen using other secretion signals known to work well in yeast, a SalI site was introduced at the pre/pro junction (just upstream of amino acid 23 in preprocollagen), removing the 22-amino acid presequence using PCR. This site was used to fuse two secretion signals to the α1(I) procollagen gene using the artificial SalI site and the EcoRI site adjacent to the GAL promoter in pGET333. Plasmids pGET323 and pGET335 contain the prepro-human serum albumin (HSA) (23) secretion signal and the yeast prepro-α-factor (24) secretion signal, respectively. The prepro-α-factor signal was isolated using PCR, whereas the prepro-HSA signal was constructed from synthetic oligonucleotides. Both sequences were isolated as EcoRI/SalI fragments with the SalI site containing the Arg-Arg KEX2 protease cleavage site (23) at the end of these prosequences to give authentic procollagen protein.
The general structure of all other plasmids is as follows: ApR-yeast TRP1-2μ origin-FLP1 terminator-PvuII-SspI-α1(I) preprocollagen-XbaI-NheI-XhoI-EcoRI-GAL10 promoter/GAL1 promoter-BamHI-AflII-α2(I) preprocollagen-3-phosphoglycerate kinase gene terminator-NotI ± PMA1 promoter-yeast invertase secretion signal-PDI gene-ADH1 terminator-NotI-(NdeI)-SapI ± 3-phosphoglycerate kinase gene promoter-αPH-GAL10 terminator-SapI-E. coli origin of replication-ApR. PMA1 and ADH1 refer to the plasma membrane ATPase 1 gene and the alcohol dehydrogenase 1 gene, respectively, isolated from yeast. The full-length cDNA for human α2(I) preprocollagen has been described (25). The PDI gene used was from either chicken (26) or human (27) utilizing the yeast invertase secretion signal (23), replacing the first 22 amino acids of the chicken PDI gene. The αPH gene cDNA was from chicken (28) or human (29). The 3-phosphoglycerate kinase gene promoter (828 bp from the natural ClaI site to the introduced EcoRI site upstream of ATG), PMA1 promoter (the 939-bp fragment from the natural HindIII site to the introduced EcoRI site upstream of ATG), 3-phosphoglycerate kinase gene terminator (the 301-bp BamHI/SmaI fragment to the HindIII/NotI fragment using PCR to add NotI to HindIII and to make sites devoid of the 3-phosphoglycerate kinase structural gene), ADH1 terminator (the natural 330-bp HindIII fragment), and GAL10 terminator (the 360-bp BamHI/SphI fragment) elements were originally isolated by PCR based on sequences and references in the Saccharomyces Genome Data Base (Department of Genetics, School of Medicine, Stanford University). Some ends and junctions were created using synthetic oligonucleotides.
Plasmid pGET737 contains only human α1(I) and α2(I) preprocollagen genes as described above. The human or chicken PDI and αPH expression units were added to plasmid pGET737 as NotI or SapI fragments, respectively, to create plasmids pGET837 (chicken PDI and human αPH), pGET901 (chicken PDI and αPH), and pGET903 (human PDI and αPH). Strain GY5382 contains integrated chicken αPH and PDI cDNA expression units in the yeast TRP1 locus, resulting in a trp1Δ strain that expresses both of these genes under the control of the GAL10 promoter/GAL1 promoter elements. The EcoRI/PstI fragment of the TRP1 gene (30) was cloned into the pBluescript II SK+ vector. An EcoRI/BamHI fragment was then placed into pBR322 with subsequent deletion of the MfeI/BstXI fragment within the TRP1 structural gene and replacement with the polylinker (MfeI)-NotI-BglII-XhoI-(BstXI). The HindIII fragment containing the URA3 yeast gene (31) was converted to a SalI fragment and placed into the above XhoI site. A NotI fragment containing the ADH1 terminator-chicken PDI-yeast invertase secretion signal-AflII-GAL10 promoter/GAL1 promoter-AflII-chicken αPH-3-phosphoglycerate kinase gene terminator was added to the NotI site in the polylinker. This new plasmid (pGET829) containing the URA3 gene and dual expression units for chicken PDI and αPH within the disrupted TRP1 gene was cut with PmlI (61 bp from EcoRI in the promoter region) and ApaLI (10 bp in from the PstI site on the other end of the TRP1 gene) and therefore has homology to both ends of the TRP1 gene. Integration of this linear fragment was performed by a double crossover during yeast transformation. Western assays using antibodies to chicken αPH and PDI (32) identified the highest producing transformants. Subsequent analysis of the resulting strain, GY5382, indicated that it contains multiple integrations of the chicken PDI and αPH expression unit at the TRP1 locus.
Each strain was grown in base medium consisting of yeast nitrogen base buffered with 1% sodium citrate (pH 6.5) and supplemented with a carbon source (20 g/liter galactose for GY5196 and 10 g/liter glucose and 5 g/liter galactose for GY5382) and 0.5% casamino acids unless otherwise described. Supplementation using arginine (110 mg/liter), glutamate (765 mg/liter), and/or lysine (286 mg/liter) was at concentrations equivalent to the concentrations in casamino acids. Each culture was grown at 20°C (without αPH and PDI) or 30°C (with αPH and PDI) unless indicated otherwise and harvested at 60-70 h. The cells were collected by centrifugation, resuspended in phosphate-buffered saline plus 5 mM EDTA and 1 mM phenylmethylsulfonyl fluoride, mixed with an equal volume of acid-washed glass beads, and frozen at −70°C. The cells were thawed and lysed by vortexing for 6-15 min and centrifuged to remove cellular debris.
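The equivalence between the individual amino acid supplements and 0.5% casamino acids can be checked with simple arithmetic. The sketch below is illustrative only and assumes that 0.5% means weight per volume (5 g/liter); the mass fractions it prints are implied by the stated concentrations, not independently measured values.

```python
# Assumption: 0.5% casamino acids is w/v, i.e. 0.5 g per 100 ml = 5 g/liter.
CASAMINO_G_PER_LITER = 5.0

# Supplement concentrations as stated in the text (mg/liter).
SUPPLEMENTS_MG_PER_LITER = {"arginine": 110.0, "glutamate": 765.0, "lysine": 286.0}


def implied_mass_fraction(mg_per_liter, casamino_g_per_liter=CASAMINO_G_PER_LITER):
    """Mass fraction of casamino acids that a supplement concentration corresponds to."""
    return mg_per_liter / (casamino_g_per_liter * 1000.0)


fractions = {name: implied_mass_fraction(conc)
             for name, conc in SUPPLEMENTS_MG_PER_LITER.items()}

for name, fraction in fractions.items():
    print(f"{name}: {fraction * 100:.2f}% of casamino acids by mass")
```

Under this assumption, glutamate corresponds to about 15% of the casamino acid mass, arginine to about 2%, and lysine to about 6%.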
Quantitative Assay for Collagen-Collagen yield was determined by a luminometric immunoassay utilizing a goat anti-type I collagen antibody from BIODESIGN International (Kennebunkport, ME) derivatized with either biotin or ruthenium chelate. Samples were prepared by lysing cells as described above and centrifuging to remove cell debris. The clarified supernatant samples from cell lysis were diluted in matrix buffer (100 mM PIPES (pH 6.8) and 1% (w/v) bovine serum albumin) in duplicate. A 25-μl aliquot was mixed with 50 μl of an antibody solution containing 1 μg/ml ruthenium chelate-conjugated antibody and 1.5 μg/ml biotin-conjugated antibody in diluent (matrix buffer plus 1.5% Tween 20). Samples were incubated for 2 h at ~20°C with shaking. A 25-μl aliquot of 1 mg/ml solution of streptavidin-conjugated magnetic beads (in diluent) was added to each sample, and the samples were shaken for 30 min. A 200-μl aliquot of ORIGEN assay buffer (IGEN Inc., Gaithersburg, MD) was added to each sample, which was then placed in an ORIGEN analyzer (IGEN Inc.). Total protein was determined using the BCA assay (Pierce) in a microtiter plate format.
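The reagent amounts per assay well follow directly from the volumes and concentrations given above. The sketch below is a bookkeeping aid only, reading the aliquot volumes as microliters and the antibody concentrations as μg/ml; the function and variable names are hypothetical.

```python
def mass_ng(volume_ul, conc_ug_per_ml):
    """Mass in ng: 1 ul of a 1 ug/ml solution contains 1 ng."""
    return volume_ul * conc_ug_per_ml


# Volumes added per sample, in microliters, in the order described in the text.
volumes_ul = {"sample": 25, "antibody_mix": 50, "beads": 25, "assay_buffer": 200}

ruthenium_ab_ng = mass_ng(volumes_ul["antibody_mix"], 1.0)   # 1 ug/ml conjugate
biotin_ab_ng = mass_ng(volumes_ul["antibody_mix"], 1.5)      # 1.5 ug/ml conjugate
beads_ug = mass_ng(volumes_ul["beads"], 1000.0) / 1000.0     # 1 mg/ml = 1000 ug/ml
total_volume_ul = sum(volumes_ul.values())

print(ruthenium_ab_ng, biotin_ab_ng, beads_ug, total_volume_ul)
```

Each well therefore receives 50 ng of ruthenium-labeled antibody, 75 ng of biotinylated antibody, and 25 μg of beads in a final volume of 300 μl.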
Gels and Western Blots-The equivalent of 20 ml of cells at A600 = 1.0 were collected, resuspended in 200 μl of buffer, and lysed as described above. SDS sample buffer was added; the samples were incubated at 100°C for 5 min; and the debris was collected by centrifugation. Clarified supernatants were loaded onto 5 or 10% SDS-polyacrylamide gels, electrophoresed, and stained using either GELCODE blue (Pierce) or silver stain.
Carbohydrate Analysis-Plasmid pGET327 in strain GY5344, an early strain containing the integrated αPH and PDI expression cassette, was grown, and the cells were lysed as described above. Procollagen was precipitated from the clarified supernatant with 4.5 M NaCl and resuspended in 0.1 M Tris-HCl (pH 7.4). Recombinant procollagen C-proteinase/BMP-1 (35) was used to cleave at the C-propeptide junction (36). The digest was treated with endoglycosidase H (New England Biolabs Inc., Beverly, MA) to remove N-linked carbohydrates as described by the manufacturer. The digests were analyzed by Western blotting using the α1(I) procollagen C-propeptide-specific antibody LF-41 (34).
Determination of Thermal Stability-Pepsin digestions were performed on yeast extracts at pH 2.5 using 640 units/ml pepsin with incubation for 15 min. The samples were neutralized with 1 M Tris base. SDS sample buffer was added, and the samples were boiled and then loaded onto a 5% SDS-polyacrylamide gel. Type I procollagen purified from yeast cells or from conditioned medium of human skin fibroblasts was treated with a mixture of trypsin (100 μg/ml) and chymotrypsin (250 μg/ml) (37). The samples were preheated to the desired temperature for 15 min in a thermal cycler (Perkin-Elmer Model 480), followed by addition of proteases and further incubation for 2 min. The digestion was stopped by addition of SDS sample buffer, followed by immediate boiling of the samples.
Amino Acid Analysis-Aliquots of the purified protein samples were dried and then subjected to vapor-phase hydrolysis overnight at 116°C under N2 in vacuo. The hydrolyzed amino acids were derivatized with the AccQ-Tag chemistry kit from Waters and analyzed on an AccQ-Tag column using a Hewlett-Packard Model 1100 high pressure liquid chromatography apparatus.
Circular Dichroism Analysis-Purified samples were diluted to ~100 μg/ml using 200 mM sodium phosphate (pH 7.0). Aliquots of 200 μl were analyzed using a Jasco (Easton, MD) Model J-715 CD spectropolarimeter with a Peltier controlled sample holder. The samples were equilibrated for 5 min at each temperature and then scanned from 250 to 185 nm. The results were plotted as the molar ellipticity at a given wavelength as a function of temperature. A first derivative plot of the data was used to determine Tm.

RESULTS
Expression of α1(I) Procollagen Gene and Analysis of Collagen Protein-The initial approach to producing human type I procollagen was to express the α1(I) procollagen polypeptide. The native human α1(I) procollagen signal sequence was tested as well as the prepro-HSA (23) and prepro-α-factor (38) sequences that are commonly used to express heterologous proteins in yeast (Fig. 1). Yeast cells were transfected with plasmids that contain the α1(I) procollagen gene constructs varying only in their signal sequence (Fig. 2A). The native procollagen signal sequence resulted in the highest level of α1(I) procollagen produced based on the intensity of the bands on the Western blot; the prepro-α-factor and prepro-HSA regions also directed the synthesis of human procollagen, but to a lesser degree. As expression of α1(I) procollagen in the absence of α2(I) procollagen results in homotrimer formation in mammalian cells (9), yeast extracts were treated with pepsin to digest susceptible proteins. A light band was detected at the expected size for collagen on SDS-polyacrylamide gel (Fig. 2B), indicating the presence of a homotrimeric collagen triple helix.
Since earlier reports described hyperglycosylation in several yeast strains (39), we compared the sizes of N-linked oligosaccharide at the single acceptor site located in the C-propeptide of α1(I) procollagen (40). The C-propeptide trimer was removed from procollagen with C-proteinase/BMP-1 (41). Following deglycosylation of human skin fibroblast- and yeast-derived human procollagen C-propeptides with endoglycosidase H, identical decreases in molecular mass were seen by Western blotting (Fig. 2C). This result suggests that similar levels of carbohydrate were added to human procollagen expressed in Saccharomyces and in fibroblast cultures.
The same signal sequence substitutions tested for α1(I) procollagen plus the yeast invertase signal were used for expression of the individual gene products α2(I) procollagen, αPH, and PDI. The native signal sequences for α2(I) procollagen and αPH gave the highest expression detected by Western blotting, whereas PDI was more efficiently expressed and secreted into the endoplasmic reticulum with the preinvertase signal sequence than with its native signal sequence. In addition, no difference was measured in the expression levels or retention of PDI in the endoplasmic reticulum by Western blotting when the endoplasmic reticulum retention signal KDEL, present in PDI of higher eucaryotes, was replaced with HDEL, present in yeast PDI (data not shown).
The formation of a triple helix of homotrimeric type I procollagen stable at >25°C would indicate the functionality of the prolyl hydroxylase enzyme. The Tm for non-hydroxylated procollagen is ~25°C (3). We compared pepsin-digested extracts from Saccharomyces expressing human α1(I) procollagen at 30°C with and without the genes that code for prolyl hydroxylase. One prominent band was seen on SDS-polyacrylamide gel that comigrated with the expected size of α1(I) procollagen in extracts from cells containing the αPH and PDI genes, but this band was absent from cells without prolyl hydroxylase (Fig. 2D). In the absence of the hydroxylation system, the pro-α1 chains were not able to fold into a triple helix that was stable at the 30°C growth temperature. These data suggest the yeast cells are expressing and assembling an active prolyl hydroxylase tetramer, resulting in the formation of hydroxyproline, which stabilizes the triple helix at elevated temperatures.
Expression of Heterotrimeric Type I Procollagen-The next step was to express four genes in Saccharomyces to test whether they are sufficient for production of type I procollagen.
To accurately measure procollagen expression levels, an immunoassay was developed to quantify human heterotrimeric type I collagen. When this assay was challenged with thermally denatured human placental type I collagen, pepsin-resistant human type I collagen homotrimer expressed in our yeast system, or bovine type I collagen, no signal was detected. Several different expression units were generated that either placed the genes on a 2μ vector or integrated them into the yeast genome (Fig. 1). Both procollagen genes were derived from human sequences; the αPH and PDI genes were either of chicken or human origin.
Expression constructs that contained chicken/chicken αPH and PDI subunits expressed type I procollagen at higher levels than human/human or human/chicken prolyl hydroxylase subunits (Table I). The integration of the chicken prolyl hydroxylase genes into the yeast genome further increased type I procollagen expression. Other plasmid constructs tested but not reported here included different combinations of yeast promoters driving the four genes. In addition, a new strain was created by integrating the α1(I) and α2(I) procollagen genes into the yeast chromosome. The results of these experiments were lower or undetectable levels of procollagen expression (data not shown).
Characterization of Recombinant Type I Collagen-The folding of type I procollagen into a heterotrimeric helix, the Tm of the resulting helix, and the level of hydroxyproline in the helical region were determined. The thermal stability of recombinant type I collagen was evaluated by treatment with a mixture of trypsin and chymotrypsin at various temperatures (37). The yeast-derived recombinant collagen heterotrimer (Fig. 3A) was resistant to the proteases at temperatures as high as 40°C. The melting curves for the fibroblast-derived collagen (Fig. 3B) suggested that this collagen had a slightly higher Tm relative to the recombinant collagen. Circular dichroism measurement of purified type I collagen showed a Tm of 35°C (Fig. 4), which is slightly below the Tm of tissue-derived collagen. To directly demonstrate the presence of hydroxyproline in the recombinant collagen, amino acid analysis was performed on a purified sample. This analysis showed hydroxyproline levels that were 82 ± 2% (n = 7) of values for collagen from tissue-derived sources, which is in agreement with the CD measurements. Direct detection of hydroxyproline residues by amino acid analysis unequivocally demonstrated the functionality of the prolyl hydroxylase enzyme in our strains.
Medium Optimization-During development of the expression system, experiments had shown that 0-2% casamino acid supplementation of the medium influenced the level of detectable human procollagen, with 0.5% casamino acids as the optimal concentration. Further analysis of the medium using three additional supplementations at 0-1% was undertaken to optimize procollagen production (Table II). Casamino acid supplementation was compared with the medium supplements Bacto-Tryptone, Bacto-peptone, and yeast extract for their influence on procollagen expression. A level of 0.5% casamino acid supplementation supported the highest levels of procollagen production of the different supplements tested.
Since casamino acids were the most stimulatory for procollagen production, simpler amino acid mixtures, based on the concentrations that would be found in medium containing 0.5% casamino acids, were tested to identify the stimulatory component(s). Several combinations of amino acid mixtures were tried. Increased levels of proline and glycine had no effect on procollagen production levels. Ultimately, yeast nitrogen base supplemented with arginine, glutamate, and lysine or with glutamate alone supplied the component necessary for high level procollagen production (Table II). Procollagen expression levels were 3-4 μg/mg of total protein, or 0.3-0.4%. This requirement of hydroxylated procollagen for precursors of α-ketoglutarate in the media suggests that not enough α-ketoglutarate is made in vivo for the hydroxylation reaction. Ascorbic acid (another cofactor of prolyl hydroxylase) supplementation of the medium had no effect on hydroxylation and production of the heterotrimer.

FIG. 2. Characterization of type I procollagen homotrimer. A, Western blot of yeast extracts from strains expressing α1(I) procollagen with different signal sequences. Transfected Saccharomyces cells were grown at 20°C, and expression of the α1(I) procollagen gene was induced by galactose. Yeast extracts were subjected to electrophoresis and Western blotting using antibody LF-39, which recognizes the N-propeptide of human α1(I) procollagen. Procollagen containing its native signal sequence is pGET327; that containing the prepro-HSA signal is pGET323; and that containing the α-factor signal is pGET335. B, pepsin digestions at different temperatures of yeast extracts expressing α1(I) procollagen (pGET327). Pepsin-digested yeast extracts were electrophoresed on SDS-polyacrylamide gel, and proteins were visualized by silver staining. C, carbohydrate analysis of the homotrimeric type I procollagen C-propeptide region expressed from plasmid pGET327. Endoglycosidase H (EndoH) was used to cleave N-linked oligosaccharide from the C-propeptide of both the human skin fibroblast (HSF)- and yeast-expressed human type I procollagen homotrimers. The digests were reduced and electrophoresed on SDS-polyacrylamide gel, followed by Western blotting. The C-propeptide was identified using the LF-41 antibody. D, pepsin digestions of the type I procollagen homotrimer from pGET327 expressed in yeast with and without prolyl hydroxylase/protein-disulfide isomerase genes integrated into the yeast. Pepsin-digested yeast extracts were electrophoresed on SDS-polyacrylamide gel, and proteins were visualized by silver staining.

Table footnotes: a, h denotes the human gene; c denotes the chicken gene. b, Multiple αPH/PDI expression units were integrated into the chromosome.

DISCUSSION
The results of this study demonstrate that a complex, multisubunit procollagen molecule can be synthesized and assembled into a triple helix in Saccharomyces cerevisiae containing a functional multisubunit prolyl hydroxylase enzyme. Three procollagen chains, coded by two genes, are synthesized and assembled into a triple helix. Two genes coding for prolyl hydroxylase form an active enzyme, presumably a tetramer. Prolyl hydroxylase post-translationally modified procollagen, resulting in a thermally stable molecule. Therefore, the post-translational modification must occur in the same location within the endoplasmic reticulum as the assembly of the three procollagen chains. Saccharomyces apparently lacks only the prolyl hydroxylase required for production of thermally stable human type I procollagen. The prolyl hydroxylase stabilizes the triple helix through the generation of hydroxyproline residues, but also is associated with the folding of the triple helix. PDI supports the folding and disulfide formation of procollagen C-propeptides (7). The low levels of triple helix detected by expressing the α1(I) procollagen gene in the absence of transfected prolyl hydroxylase genes suggest the existence of a less efficient or rate-limited mechanism in Saccharomyces for the generation of triple helical procollagen. It can be postulated that the endogenous yeast PDI may assemble the three procollagen polypeptide chains, but the winding of the triple helix is more efficient in the presence of prolyl hydroxylase.
The procollagen molecule produced in this expression system had several features of tissue-derived procollagen synthesized by mammalian cells. The procollagen molecule was triple helical and thermally stable with a Tm of 35°C based on CD analysis. Non-hydroxylated collagen has a Tm of 23-25°C, whereas collagen isolated from mammalian tissues has a Tm of 39-40°C (3). The Tm was consistent with the stability of the triple helix to proteolytic digestion and was in agreement with the level of hydroxyproline determined by amino acid analysis equivalent to 82% of levels found in tissue-derived type I collagen. N-Linked carbohydrate was also detected on recombinant type I procollagen. Since earlier reports describe hyperglycosylation of heterologous proteins in several yeast strains, the size of N-linked oligosaccharide at the single acceptor site located in the C-propeptide of α1(I) procollagen was determined. The C-propeptide contained carbohydrate with a mass equivalent to that found in procollagen isolated from skin fibroblasts. No evidence of hyperglycosylation was observed for the α1(I) procollagen polypeptide.
Both homotrimeric and heterotrimeric type I procollagens were synthesized by Saccharomyces using plasmids that contained one or both procollagen genes, respectively. The procollagen genes and the genes for prolyl hydroxylase were sufficient to assemble procollagen polypeptides and to generate thermally stable procollagen. To obtain and to optimize expression of procollagen, gene location, use of different promoters, signal sequence modifications, and species origin of the post-translational machinery were varied. All of these parameters played key roles in the production of thermally stable human type I procollagen, with some combinations producing little or no detectable procollagen material. The configuration that gave the highest type I procollagen yield was placement of the human α1(I) and α2(I) procollagen genes on a 2μ vector with integration of the two chicken prolyl hydroxylase genes into the yeast genome. We initially tested chicken αPH and PDI, but expected human αPH and PDI to offer equal or better post-translational modification efficiency and to potentially increase procollagen levels indirectly through increased folding and procollagen thermal stability. Instead, higher expression of human procollagen was measured with chicken prolyl hydroxylase genes (αPH and PDI) compared with their human counterparts. Two possibilities to explain the higher expression of human procollagen using chicken prolyl hydroxylase are potentially higher enzymatic activity and enhanced interaction of the gene with its promoter to increase transcription. Sequence homologies between chicken and human αPH protein and cDNA sequences are 88 and 81%, respectively (28). Chicken and human PDI protein sequences are >90% homologous, whereas cDNA sequence comparison shows only 76% homology (26).

FIG. 3. Collagen purified from pGET737 in the GY5382 strain (A) or procollagen from human skin fibroblast-conditioned medium (B) was treated with a mixture of trypsin and chymotrypsin at various temperatures. Digests were electrophoresed on SDS-polyacrylamide gel, and proteins were visualized by staining with GELCODE blue. Lane 1, human collagen purified from placenta; lane 2, undigested sample; lanes 3-10, enzyme digests at 25, 30, 33, 35, 38, 40, 42, and 45°C, respectively.

FIG. 4. Circular dichroism analysis of type I procollagen purified from S. cerevisiae. A, purified human type I collagen was equilibrated at each temperature and then scanned. The temperature measurements are indicated by the data points. Normalized ellipticity was plotted against temperature. The scale of 100 indicates triple helical collagen, and 0 is denatured collagen. B, the Tm was determined as 35°C from the maximum of the first derivative of the curve in A.
Developmental work on this yeast expression system showed the requirement of casamino acids for higher level production of recombinant procollagen. Addition of individual components of casamino acids to the medium showed that glutamate alone was sufficient to provide procollagen expression levels equivalent to those observed using the entire mixture of amino acids found in casamino acids. One possible explanation for the role of glutamate is its ability to undergo intracellular oxidation to create α-ketoglutarate, an intermediate in the tricarboxylic acid cycle and an essential cofactor for proline hydroxylation. Glycine and proline supplementation did not affect procollagen expression levels even though procollagen consists of high amounts of these two amino acids. In addition, this medium does not contain ascorbic acid, a cofactor for prolyl hydroxylase and necessary for the expression of thermally stable procollagens in other recombinant expression systems (14,18). Saccharomyces must generate sufficient levels of this cofactor to supply to the prolyl hydroxylase enzyme. These results are important for the production of proteins that could be used in medical applications, as use of a medium free of animal-derived components provides an extra margin of safety and avoids a potential regulatory hurdle defining the source of a raw material.
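The chemistry behind the proposed role of glutamate can be written out as two reactions. The first is the standard oxidative deamination of glutamate; glutamate dehydrogenase is assumed here as one plausible route and is not named in this study. The second is the generally accepted overall reaction of prolyl 4-hydroxylase, which consumes α-ketoglutarate and requires Fe2+ and ascorbate. This is a textbook-level sketch, not a pathway demonstrated by the present data.

```latex
\begin{align*}
\text{glutamate} + \mathrm{NAD(P)^{+}} + \mathrm{H_2O}
  &\longrightarrow \alpha\text{-ketoglutarate} + \mathrm{NH_3} + \mathrm{NAD(P)H} + \mathrm{H^{+}} \\
\text{peptidyl proline} + \alpha\text{-ketoglutarate} + \mathrm{O_2}
  &\xrightarrow{\;\mathrm{Fe^{2+}},\ \text{ascorbate}\;}
  \text{peptidyl 4-hydroxyproline} + \text{succinate} + \mathrm{CO_2}
\end{align*}
```

Written this way, each hydroxylated proline consumes one molecule of α-ketoglutarate, which is consistent with the suggestion that glutamate supplementation supports hydroxylation by replenishing this substrate.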
Fibroblasts have been used to delineate the biosynthetic pathway of collagen for 3 decades. Since fibroblasts express procollagen ubiquitously, many suggestions have been made about the function of various proteins in the biosynthesis of procollagen at the level of transcription, translation, and assembly. Hsp47 is a serpin-like collagen-specific chaperone localized in the endoplasmic reticulum that transiently binds to type I-V collagens and that is involved in the assembly and/or packaging of collagens (8). Hsp47 is associated with polysome-bound α1(I) procollagen chains (42), and it prevents overmodification of type III procollagen in transfected 293 kidney cells (43). BiP/Grp78 and Grp94 form a complex with Hsp47 during the maturation of newly synthesized type IV collagen (44). PDI interacts with the C-propeptide of collagen chains prior to trimer formation, and prolyl hydroxylase remains associated with the triple helical domain if triple helical formation is prevented (7). Yeast was chosen as a model system for the synthesis of procollagen because it is a well characterized eucaryotic organism that does not normally synthesize procollagen. To synthesize type I procollagen, only four genes were shown to be required. This study shows that other fibroblast-specific genes are not needed for the basic mechanism of recognition and assembly of the procollagen polypeptides to form a stable triple helical molecule. Recombinant Hsp47 was not required for assembly of triple helical type I procollagen. PDI expressed with αPH increased homotrimeric type I procollagen synthesis, suggesting an enhancement of the assembly of procollagen with increased PDI and/or αPH protein.
In summary, we have demonstrated the minimum requirements for type I procollagen expression in Saccharomyces. A total of four genes are required. Two genes code for the two polypeptide chains of human type I procollagen. Two additional genes code for prolyl hydroxylase, a modification enzyme that post-translationally hydroxylates proline residues within the triple helical domain of the procollagen polypeptides. Prolyl hydroxylase or its individual subunits also enhance the level of procollagen synthesized. Glutamate, possibly acting as a precursor for the synthesis of α-ketoglutarate, is required for generating high levels of triple helical procollagen molecules. Other proteins associated with procollagen synthesis in mammalian cells are not required in Saccharomyces. It remains to be determined if specific chaperone proteins play a subtle role in the assembly of procollagens in eucaryotic cells.