Engineering Mammalian Mucin-type O-Glycosylation in Plants*

Background: Plants lack mammalian GalNAc-type protein O-glycosylation. Results: Transient expression of a Glc(NAc) C4-epimerase and a polypeptide GalNAc-transferase in Nicotiana benthamiana resulted in O-glycosylation. Conclusion: Mammalian O-glycosylation can be established in plants. Significance: Plants may serve as host cells for recombinant production of custom-designed O-glycoproteins. Mucin-type O-glycosylation is an important post-translational modification that confers a variety of biological properties and functions to proteins. This post-translational modification has a particularly complex and differentially regulated biosynthesis rendering prediction and control of where O-glycans are attached to proteins, and which structures are formed, difficult. Because plants are devoid of GalNAc-type O-glycosylation, we have assessed requirements for establishing human GalNAc O-glycosylation de novo in plants with the aim of developing cell systems with custom-designed O-glycosylation capacity. Transient expression of a Pseudomonas aeruginosa Glc(NAc) C4-epimerase and a human polypeptide GalNAc-transferase in leaves of Nicotiana benthamiana resulted in GalNAc O-glycosylation of co-expressed human O-glycoprotein substrates. A chimeric YFP construct containing a 3.5 tandem repeat sequence of MUC1 was glycosylated with up to three and five GalNAc residues when co-expressed with GalNAc-T2 and a combination of GalNAc-T2 and GalNAc-T4, respectively, as determined by mass spectrometry. O-Glycosylation was furthermore demonstrated on a tandem repeat of MUC16 and interferon α2b. In plants, prolines in certain classes of proteins are hydroxylated and further substituted with plant-specific O-glycosylation; unsubstituted hydroxyprolines were identified in our MUC1 construct. 
In summary, this study demonstrates that mammalian type O-glycosylation can be established in plants and that plants may serve as a host cell for production of recombinant O-glycoproteins with custom-designed O-glycosylation. The observed hydroxyproline modifications, however, call for additional future engineering efforts.

Most recombinant produced biological therapeutics are glycoproteins, and selection of host cells for their production is critical for therapeutic effects and safety because the glycan moieties can interact with lectin scavenger receptors and immune cells. Traditionally, selection of host cells has aimed at producing glycoproteins with normal human glycosylation, i.e. mature glycans similar to those found on the natural glycoproteins, typically complex type N-glycans and core 1 O-glycans, all with sialic acid capping (1). However, aberrant glycosylation may be desirable, e.g. with respect to design of vaccines aimed at eliciting immunity to specific glycoforms found on virus particles and virus-infected cells or on cancer cells (2). We have previously shown that the cancer-associated mucin MUC1 contains immunodominant aberrant O-glycopeptide epitopes to which cancer-specific IgG antibodies can be elicited in man (3), and we have developed a chemoenzymatic strategy for synthesis of such vaccine glycopeptides (3). However, sustainable recombinant expression systems for production of vaccines targeting cancer cells with immature O-glycosylation or envelope virus glycoproteins are still in demand.
O-Glycosylation is controlled by a family of up to 20 UDP-GalNAc:polypeptide N-acetylgalactosaminyltransferases (GalNAc-Ts)³; the repertoire of GalNAc-Ts present in a given cell defines the pattern and density with which proteins are decorated with O-glycans. Normal mammalian cells typically produce elongated and/or branched O-glycans capped with sialic acids or different blood group-related structures. The most frequently used host cell for recombinant expression of therapeutic proteins is CHO, which generally produces mono- and disialylated core 1 (Galβ1-3GalNAcα1-O-Ser/Thr) O-glycan structures. Recently, it was shown that CHO-K1 expresses four GalNAc-Ts (GalNAc-T2, -T7, -T11, and -T19) (4). Because all eumetazoan cells express multiple GalNAc-Ts and generally extend and cap O-glycans, it is desirable to develop a cell system for recombinant production in which the entire O-glycosylation pathway can be custom-built both with respect to the GalNAc-T repertoire for custom design of sites of O-glycosylation, as well as the repertoire of glycosyltransferases involved in extending and capping O-glycans.
Yeast and plants offer cell systems suitable for recombinant expression without the capacity for mammalian GalNAc O-glycosylation (5, 6). Neither contains genes homologous to the GalNAc-T gene family that initiates O-glycosylation (7). Although GalNAc and the donor substrate UDP-GalNAc are not found in yeast, the presence of these compounds in plants is still disputed (5, 8-10). Most of the enzymes known to elongate O-glycans in mammals have no close homologs in plants (11-13). These features should make it possible to custom design and build a capacity for O-glycosylation from the bottom up in plants. In yeast, the endogenous protein O-mannosylation initiated in the ER is likely to compete for substrate sites with the Golgi-localized GalNAc O-glycosylation. Despite this, Narimatsu and co-workers (14) introduced O-GalNAc glycosylation into yeast and obtained recombinant production of reporter constructs with GalNAc and core 1 O-glycosylation but also a degree of O-mannosylation. With the use of a rhodanine-3-acetic acid derivative added to the culture medium, O-mannosylation was reduced but not completely eliminated.
Plants do not have competing O-glycosylation pathways for Ser/Thr residues. Plants produce another type of protein O-glycosylation, whereby prolines are hydroxylated to yield hydroxyproline (Hyp), which may be substituted with various Hyp O-glycans (15). Hyp conversion occurs in the ER and extends into the Golgi apparatus (16, 17), and this modification is preferentially found at proline (Pro)-rich repetitive sequence motifs (15). Whereas the short Pro-rich hinge region of IgA1 in humans is substituted with mucin-type O-linked glycans, recombinant expression in plants results in several Hyp modifications and further substitution with short arabinosides (18). More recently, a tandem repeat of the human mucin MUC1, which in humans is decorated with O-glycans, was also found to carry Hyp O-glycosylation when expressed in plants (19). The key step in plant-specific Hyp O-glycosylation is regulated by 13 prolyl 4-hydroxylases (P4Hs), which individually, according to initial studies (17, 20-22), are not required for viability and growth; in principle, one or more of these could be mutated to eliminate undesirable modifications on recombinant glycoproteins. It is therefore conceivable that plants can be engineered to eliminate specific unwanted Hyp modifications that are acquired by proteins of interest during recombinant expression. Plants should therefore offer a suitable and safe system for recombinant production of glycoprotein pharmaceuticals.
In a recent study, Daskalova et al. (23) examined transient expression of a GalNAc-T in combination with a Glc(NAc) C4-epimerase and a UDP-Gal(NAc) transporter in Nicotiana benthamiana L. plants. They used a chimeric reporter construct with a short sequence from the human mucin MUC1 as acceptor for O-glycosylation, and they used questionable Vicia villosa agglutinin (VVA) lectin blotting results to conclude that GalNAc O-glycosylation could be established in plants with co-expression of GalNAc-T2 alone. They also concluded that co-expression of both an epimerase and a transporter was required for enhancing lectin reactivity with the reporter, whereas neither the epimerase nor the transporter alone improved lectin reactivity. The results were not corroborated by structural analysis, and it is known that lectin reactivity poses particular problems in plants (5, 24). For example, the original proposal of the presence of sialic acids in plant glycoproteins (25, 26) was later shown likely to be due to contaminants (27). Furthermore, Daskalova and co-workers (28) more recently reported that the VVA lectin recognizes noncarbohydrate epitopes by Western blotting in plants, thus casting serious doubts on the conclusions drawn in their previous publication (23).
In this study, we have carefully tested different design strategies for the introduction of GalNAc mucin-type O-glycosylation into N. benthamiana plants. The results are essentially in complete disagreement with those reported by Daskalova et al. (23), which highlights the need for structural verification of products produced with glycoengineering. We used transient expression of enzymes and reporter substrates to identify requirements for establishing efficient glycosylation of mammalian O-glycoproteins (see Fig. 1). We found no evidence of GalNAc O-glycosylation when a reporter substrate was expressed in plants alone or with an appropriate GalNAc-T, GalNAc-T2. Basic O-glycosylation capacity was, however, achieved by introduction of a Glc(NAc) C4-epimerase and a GalNAc-T, which was validated through co-expression of three O-glycosylation target proteins. We did observe Hyp modifications in a MUC1-based substrate, which indicates that use of plants for production of O-glycoprotein therapeutics will require additional strategies to eliminate the corresponding endogenous proline hydroxylases.

EXPERIMENTAL PROCEDURES
Inoculation and Growth Conditions of N. benthamiana-Growth conditions were as described in Egelund et al. (29). Agrobacterium tumefaciens strain C58C1 pGV3850 was used for Agrobacterium-mediated expression in N. benthamiana, which was performed essentially as described by Sainsbury and Lomonossoff (30). In brief, agrobacteria were transformed with constructs by electroporation and selected with the appropriate antibiotics. Agrobacteria cultures were grown overnight in Luria-Bertani (LB) medium, harvested by centrifugation, resuspended in a buffer containing 10 mM MES, 10 mM MgCl2, and 100 µM acetosyringone (A600 = 0.5), and left at 20°C for 2 h. Leaves of 3-4-week-old N. benthamiana plants were infiltrated with the bacterial cell suspensions using 1-ml syringes, and leaf material was collected for analysis after 5-6 days. To increase expression of transgenes, all experiments included co-infiltration of agrobacteria carrying a p19 gene construct, which encodes a viral protein that specifically inhibits post-transcriptional gene silencing (31).
DNA Constructs for Plant Transformation and Transient Expression-Vector constructs used are depicted in Fig. 2A. Open source vectors used for Agrobacterium-mediated expression and transformation are as follows: pBI121 (GenBank™ accession number AY781296); pCAMBIA 2300 (GenBank™ accession number AF234315); pCAMBIA 1302 (GenBank™ accession number AF234298). For legacy of open source pCAMBIA and pBI121 binary vectors see online. pPS48 (32) is an intermediate Escherichia coli-only vector, which contains the cauliflower mosaic virus 35S promoter followed by the 35S terminator interspaced by a multiple cloning site, into which genes of interest were cloned. Entire transcriptional units (35S-Pro-goi-35S-term) were then excised using XbaI or HindIII and ligated into the multiple cloning site of the pCAMBIA-derived plant expression plasmids. The Nicotiana tabacum ubiquitin promoter and terminator were synthesized by GenScript. O-Glycosylation target constructs were directed to the secretory pathway by N-terminal fusion of nucleotides encoding one of the following signal peptide sequences: 1) NtSP derived from N. tabacum proline-rich protein 3 (UniProt accession number Q40502), encoding the sequence MGKMASLFASLLVVLVSLSLA; and 2) PpSP derived from Physcomitrella patens aspartic protease (EMBL accession number AJ586914), encoding the sequence MGASRSVRLAFFLVVLVVLAALAEA.
The following constructs expressing mucins and other O-glycosylation target peptides were assembled or synthesized. The MUC1-3.5 tandem repeat (33), codon optimized for expression in Arabidopsis thaliana, was synthesized with an N-terminal PpSP sequence (EUROFINS, MWG, Germany) and then used as a template for PCR amplification using the primers SPMUC1For and MUC1Rev (sequences of all primers are listed in supplemental Table S1). The resulting PCR product was then N-terminally fused to YFP by cloning into the pC2300u vector containing YFP (34), yielding pC2300u-35SPro-PpSP-MUC1-3.5TR-YFP(His)6.
Construction of O-glycosylation machineries encoding C4-epimerase and GalNAc-Ts, with each gene brought under control of a separate promoter, was carried out as follows. 1) Cloning of the N-terminal FLAG-tagged (MDYKDDDD) epimerase into the pCAMBIA2300 vector (pC2300) involved PCR amplification using pET23-WbpP (35) as template and the primers PBY7For and PBY7Rev; the product was subcloned into the SacI site of pPS48 under control of the 35S promoter and 35S terminator sequence. The entire transcriptional unit (35SPro-CytoEpi-35STerm) was then excised using HindIII and cloned into the HindIII site of pC2300, yielding pC2300-35SPro-CytoEpi-35STerm (CytoEpi). ER-targeted epimerase was constructed by PCR amplification of CytoEpi using the primers ERWbpPFor and ERWbpPRev for addition of the secretion signal peptide and C-terminal KDEL ER retention signal, resulting in the construct EREpi, which was subcloned into pCAMBIA2300 using the SacI site. 2) Full-length GalNAc-T2 was excised from an existing pBKS-GalNAc-T2-containing plasmid (GenBank™ accession number X85019) (38) using EcoRI, blunted with Klenow, and inserted into the StuI site of pC1302, thus placing GalNAc-T2 under the control of the 35S promoter and terminator and resulting in the construct pC1302D-35SPro-T2-35STerm (T2). 3) An O-glycosylation machinery construct encoding cytosolic epimerase and Golgi-targeted GalNAc-T2 from separate transcripts was made to compare the Golgi UDP-GalNAc pool with the cytosolic pool. This construct was assembled by inserting the XbaI-35SPro-CytoEpi-35STerm-XbaI fragment of construct CytoEpi into the XbaI site of construct T2, resulting in pC1302D-35SPro-T2-35STerm;35SPro-CytoEpi-35STerm (CytoEpi-T2). 4) An O-glycosylation machinery construct encoding ER epimerase and Golgi-targeted GalNAc-T2 (EREpi-T2) was cloned by the same procedure as CytoEpi-T2. 5) Full-length GalNAc-T4 (codon optimized for Arabidopsis) was synthesized by EUROFINS, MWG, Germany, and subcloned into pCAMBIA2300 using the HindIII site, resulting in pC2300D-35SPro-T4-35STerm.
Three additional O-glycosylation machineries were made as 2A-linked polycistronic constructs. 1) GalNAc-T2 was PCR-amplified with a primer-introduced N-terminal hemagglutinin tag (HA tag) using the primers HAT2For and HAT2Rev. N-terminal FLAG-tagged epimerase was PCR-amplified using the primer set PFwbppFor and PFwbppRev. The two PCR fragments were then ligated and interspaced with a sequence encoding the 2A sequence in pC130035Su in accordance with the USER cloning method delineated by Nour-Eldin et al.
Preparation of Leaf Total Protein Extracts-Approximately 1 g of freshly harvested leaves was frozen in liquid N2 and comminuted using a pestle and mortar with 2 ml of extraction buffer A (50 mM NaPO4, 250 mM NaCl, 5 mM imidazole, pH 8.0) containing Complete Proteinase Inhibitor (Roche Applied Science) and 1 mM phenylmethanesulfonyl fluoride. The sample was incubated for 10 min on ice, insoluble material was pelleted by centrifugation (20,000 × g) for 10 min, and the supernatant was recovered and stored at −20°C.
SDS-PAGE Western Blotting-SDS-PAGE Western blot analysis was performed as described previously (40). Monoclonal antibodies (mAbs) to the T7 tag, and M11 to MUC16 were obtained from Invitrogen and Dako, respectively. mAbs to MUC1 and GalNAc-MUC1, 5E10 and 5E5, respectively, have been described previously (41). Lectin blot analysis was performed essentially as Western blot analysis, except the antibody was HRP-conjugated Vicia villosa lectin (VVA, EY Laboratories).
Endo-Asp Digestion of MUC1-3.5TR and Sample Purification-Purified MUC1-3.5TR-YFP (~25 µg) was incubated with 1 µg of endoproteinase Asp-N from Pseudomonas fragi (Roche Applied Science) in 300 µl of 100 mM Tris-HCl, pH 8.0, for 16 h at 37°C with shaking. The digest was cleaned on a C18 Zip-Tip (Millipore). Briefly, the digestion mixture was dissolved in 20 µl of 0.1% trifluoroacetic acid (TFA) and drawn through the column, desalted with 0.5% formic acid, and eluted with acetonitrile. In some instances, the released 20-mer MUC1 tandem repeat peptide was further isolated by HPLC using a Dionex system. TFA was added to samples (0.05% v/v) before application to a C12 column (150 × 4.6 mm Jupiter Proteo with 90-Å pore size, 4-µm particle size, Phenomenex). Chromatographic separation was obtained in a two-eluent system, where eluent A was 0.05% TFA in water, eluent B was 0.05% TFA in acetonitrile, and the pump speed was a constant 0.5 ml min⁻¹. From 0 to 5 min, the eluent was 5% B; from 5 to 35 min, eluent B increased in a linear gradient to 40%; and from 35 to 45 min, eluent B increased to 100%.
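The gradient program above can be written out as a small function, which may help when relating retention times to eluent composition. This is only a sketch: the function name `percent_b` is hypothetical, and the 35 to 45 min segment is assumed to be linear, which the text does not state explicitly.

```python
def percent_b(t_min: float) -> float:
    """Return % eluent B at time t_min (minutes) for the stated HPLC program.

    Segments taken from the text:
      0-5 min   : isocratic at 5% B
      5-35 min  : linear gradient from 5% to 40% B
      35-45 min : increase from 40% to 100% B (ASSUMED linear here)
    """
    if not 0 <= t_min <= 45:
        raise ValueError("time must lie within the 0-45 min program")
    if t_min <= 5:
        return 5.0
    if t_min <= 35:
        # 35 percentage points spread over 30 minutes
        return 5.0 + (t_min - 5) * (40 - 5) / 30
    # 60 percentage points spread over 10 minutes
    return 40.0 + (t_min - 35) * (100 - 40) / 10


if __name__ == "__main__":
    for t in (0, 5, 20, 35, 45):
        print(f"{t:>4} min: {percent_b(t):.1f}% B")
```

Under these assumptions, all peptides discussed in this work elute within the shallow 5 to 40% B segment.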
Matrix-assisted Laser Desorption Ionization Time-of-Flight (MALDI-TOF)-Lyophilized peptides were dissolved in 20 µl of water. The MALDI matrix was 25 g/liter 2,5-dihydroxybenzoic acid (Sigma) dissolved in a 1:1 mixture of water and methanol. Samples were prepared for analysis by placing 0.5 µl of sample solution on a probe tip followed by 0.5 µl of matrix. All spectra were obtained in the linear mode and calibrated externally. All mass spectra were acquired on a Voyager-Elite MALDI time-of-flight mass spectrometer (PerSeptive Biosystem Inc., Framingham, MA) equipped with delayed extraction.
Characterization of O-Glycosylation Sites by ETD-MS2-Products of O-glycosylation were characterized by electrospray ionization-linear ion trap-Fourier transform mass spectrometry in an LTQ-Orbitrap XL hybrid spectrometer (Thermo-Scientific) equipped for ETD for peptide sequence analysis by MS/MS (MS2) with retention of glycan site-specific fragments. Samples were dissolved in methanol/water (1:1) containing 1% formic acid and introduced by direct infusion via a TriVersa NanoMate ESI-Chip interface (Advion BioSystems) at a flow rate of ~100 nl/min and 1.4-kV spray voltage. Mass spectra were acquired in positive ion FT mode using parameters similar to previous studies (42), except at a nominal resolving power of either 30,000 or 60,000. MS1 spectra, in which multiple charge states were observed, were deconvoluted for clarity of presentation using the Xtract function in the Xcalibur data analysis software package (Thermo Fisher). ETD-MS2 spectra were analyzed by comparison with theoretical c- and z-fragment m/z values calculated for all positional combinations of HexNAc residues distributed on all the potential Ser and Thr glycosylation sites in the sequence. Where potential hydroxylation of Pro was observed, positional combinations including one or more Hyp residues were also calculated. Calculations were performed using the web-based Protein Prospector MS-Product software routine.

RESULTS
Depicted in Fig. 1 are the principal constituents needed for GalNAc O-glycosylation. Introduction of UDP-GalNAc synthesis requires a Glc/GlcNAc C4-epimerase. High concentrations of nucleotide sugars in the Golgi are maintained by transporters, and it has been suggested that plants lack a specific transporter (43, 44) for UDP-GalNAc (23). Most UDP-Gal transporters, however, function as UDP-Gal/GalNAc transporters (43, 45-47). We therefore first investigated whether introduction of an epimerase, P. aeruginosa cytoplasmic epimerase (WbpP) (35), and a human GalNAc-T, GalNAc-T2 (48), would be sufficient for O-glycosylation. A series of construct designs were prepared and tested to address the optimal order of coding regions, the efficiency of introducing a 2A self-cleaving sequence, and the effect of targeting the epimerase to the ER, Golgi, or cytoplasm (Fig. 2A). For expression and localization of O-glycosylation machinery components, see supplemental Fig. S1.
Expression of a MUC1-YFP Reporter Substrate in Plants-We initially assessed O-glycosylation capacity by expressing a known acceptor substrate for GalNAc-T2, which was based on the tandem repeat sequence of MUC1 and linked to YFP (MUC1-YFP). When expressed alone, the reporter MUC1-YFP substrate migrated as an ~45-kDa protein and reacted with mAb 5E10 (reactive with unglycosylated and all glycoforms of MUC1) but not with mAb 5E5 (reactive with GalNAc-glycosylated MUC1), indicating that the MUC1 reporter substrate was not substituted with GalNAc in wild type plants (Fig. 2B, lane 1). This was subsequently confirmed by mass spectrometric analysis of the MUC1 peptide enzymatically released from the expressed MUC1-YFP protein (Fig. 3D), where a product corresponding to unmodified MUC1 (m/z 1887) was present, but no products corresponding to glycosylated MUC1 were observed. This analysis did uncover evidence of Hyp modification of MUC1. The presence of an ~16-Da increment (m/z 1903) in the MS spectrum of MUC1-YFP corresponds to hydroxylation of a single proline. Electrospray ionization-electron transfer dissociation Fourier transform-MS2 analysis further demonstrated that Pro¹¹ of the MUC1 tandem repeat (DTRPAPGSTAP¹¹PAHGVTSAP) was hydroxylated.⁴ Based on peak sizes, this Hyp modification was a minor constituent, and evidence of other modifications, including Hyp O-glycosylation, was not observed (Fig. 3D).
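To make the site numbering concrete, the following sketch lists the candidate Ser/Thr and Pro positions in the 20-mer tandem repeat quoted above and enumerates the positional combinations that a site-assignment search has to consider. It is an illustration only and does not reproduce the Protein Prospector calculation; the helper names `sites` and `placements` are hypothetical.

```python
from itertools import combinations

# 20-mer MUC1 tandem repeat as quoted in the text (1-based residue numbering)
TANDEM_REPEAT = "DTRPAPGSTAPPAHGVTSAP"


def sites(sequence: str, residues: str) -> list[int]:
    """Return 1-based positions of any of the given residues in the sequence."""
    return [i + 1 for i, aa in enumerate(sequence) if aa in residues]


def placements(site_list: list[int], n_mods: int) -> list[tuple[int, ...]]:
    """All ways of distributing n_mods identical modifications over the sites."""
    return list(combinations(site_list, n_mods))


ser_thr_sites = sites(TANDEM_REPEAT, "ST")  # potential GalNAc O-glycosylation sites
pro_sites = sites(TANDEM_REPEAT, "P")       # potential hydroxylation sites

if __name__ == "__main__":
    print("Ser/Thr sites:", ser_thr_sites)
    print("Pro sites:", pro_sites)
    for n in range(len(ser_thr_sites) + 1):
        print(f"{n} HexNAc: {len(placements(ser_thr_sites, n))} positional combinations")
```

The five Ser/Thr positions found this way (2, 8, 9, 17, and 18) match the five potential sites referred to in the text, and position 11 is one of the five Pro residues.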

Construct Designs of O-Glycosylation Machinery-Co-expression of either C4-epimerase WbpP or GalNAc-T2 alone with the MUC1-YFP reporter substrate did not result in detectable O-glycosylation (Fig. 2B, lanes 2 and 3). In contrast, co-expression of MUC1-YFP with a combination of C4-epimerase and GalNAc-T2 resulted in detectable O-glycosylation as evidenced by reactivity with mAb 5E5 and a slight mobility shift (1-2 kDa) on SDS-PAGE-derived Western blotting (Fig. 2B, lanes 4-8). The 2A auto-cleaving motif is a 20-amino acid sequence derived from the foot-and-mouth disease virus, which through an intra-ribosomal skipping event during translation results in synthesis of two protein products (49). Insertion of the 2A sequence between two coding regions in a polyprotein construct allows for expression of two independent proteins under control of a single promoter, and it has been widely tested in many eukaryotic expression systems (49, 50). The 2A construct design is an attractive approach for achieving coordinated expression of multiple proteins. Tests of different construct designs for expression of the epimerase and GalNAc-T2 (Fig. 2B, lanes 4 and 5). The attempts to direct the epimerase to the Golgi (GolgiEpi-2A-T2) or to the ER (EREpi-T2) did not result in substantial glycosylation (Fig. 2B, lanes 6 and 7). Furthermore, expression of the cytoplasmic epimerase and GalNAc-T2 from separate promoters resulted in detectable but substantially lower efficiency in glycosylation (Fig. 2B, lane 8), indicating that the 2A-linked design is more efficient than expression from separate promoters, although it is not strictly required for glycosylation.
Structural Analysis of MUC1-YFP Expressed in Glycoengineered N. benthamiana-MUC1-YFP co-expressed with or without T2-2A-CytoEpi was purified by metal-affinity chromatography and analyzed via SDS-PAGE and VVA lectin blotting (Fig. 3A). The increase in molecular mass exhibited by MUC1-YFP when co-expressed with T2-2A-CytoEpi was readily apparent on a highly separated SDS-polyacrylamide gel (Fig. 3A, left panel). HRP-conjugated VVA lectin, specific for GalNAc moieties, reacted with MUC1-YFP co-expressed with T2-2A-CytoEpi but not with MUC1-YFP expressed alone (Fig. 3A, right panel). Purified MUC1-YFP was enzymatically digested with endo-Asp-N to release individual MUC1 tandem repeats (Fig. 3B). Analysis of the products of this digest by HPLC enabled the separation of differentially modified MUC1 tandem repeats (Fig. 3C). The released MUC1 expressed in plants without O-glycosylation machinery eluted at 22.9 min, and MS analysis of the isolated product confirmed this assignment.⁴ MS analysis of the total MUC1-YFP digest revealed the presence of unmodified MUC1, but no products corresponding to GalNAc modifications (Fig. 3D). The HPLC of digested MUC1-YFP from plants co-expressing the O-glycosylation machinery, T2-2A-CytoEpi, revealed additional products eluting between 20.5 and 22.3 min (Fig. 3C, lower panel). MS analysis of these products identified MUC1 tandem repeats with 1-3 GalNAc moieties (Fig. 3E). This HPLC and MS analysis revealed that a substantial amount of the tandem repeat sequence was modified by 1-3 mol of GalNAc, which is in agreement with the fact that GalNAc-T2 can attach GalNAc to up to three of the five potential sites in the MUC1 tandem repeat sequence (Ser⁸, Thr⁹, and Thr¹⁷) (38). Two minor peaks at 21.1 and 22.3 min were found when MUC1-YFP was expressed alone (Fig. 3C, upper panel). MALDI-TOF analysis of these fractions showed the presence of MUC1-1TR with 1 Hyp (only in the 22.3-min fraction), YFP peptide fragments, and apparent contaminants.
Analysis of MUC1-YFP co-expressed with the CytoEpi-2A-T2 or GolgiEpi-2A-T2 O-glycosylation machinery constructs confirmed incorporation of GalNAc, albeit at lower levels than observed with the T2-2A-CytoEpi construct (Fig. 3, F and G). Approximately 65% of total MUC1-YFP was glycosylated by the most efficient O-glycosylation machinery, T2-2A-CytoEpi, and MUC1-YFP was estimated to accumulate to ~12 µg/g (fresh weight), equivalent to ~1% of the total soluble protein.
Co-expression of GalNAc-T2 and -T4 Completes GalNAc O-Glycosylation of MUC1-Complete O-glycosylation of all five potential sites of the MUC1 tandem repeat requires the coordinate action of GalNAc-T4 and GalNAc-T2 (38). We therefore co-expressed GalNAc-T4 (T4) with T2-2A-CytoEpi and MUC1-YFP, which resulted in a further minor shift in SDS-PAGE mobility of MUC1-YFP as evidenced by 5E5 mAb-mediated Western blot analysis (Fig. 4A). MS analysis of purified MUC1-YFP digested with endo-Asp-N confirmed incorporation of up to 5 mol of GalNAc per tandem repeat of MUC1 (m/z 2902) when co-expressed with GalNAc-T2 and -T4 (Fig. 4B). Again, a minor amount of hydroxylation of Pro¹¹ was observed (m/z 1903).
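The m/z values quoted for the tandem repeat follow a simple additive pattern, which the sketch below reproduces as a consistency check. It uses nominal (rounded) increments of 203 per GalNAc (HexNAc) residue and 16 per proline hydroxylation, starting from the reported unmodified value of m/z 1887; the function name `expected_mz` is hypothetical, and exact masses would differ slightly from these nominal figures.

```python
UNMODIFIED_MZ = 1887  # unmodified 20-mer MUC1 tandem repeat, as reported in the text
HEXNAC = 203          # nominal mass added per GalNAc (HexNAc) residue
HYDROXYL = 16         # nominal mass added per Pro -> Hyp conversion


def expected_mz(n_galnac: int, n_hyp: int = 0) -> int:
    """Nominal m/z of the tandem repeat carrying the given modifications."""
    if n_galnac < 0 or n_hyp < 0:
        raise ValueError("modification counts must be non-negative")
    return UNMODIFIED_MZ + n_galnac * HEXNAC + n_hyp * HYDROXYL


# Ladder of glycoforms from 0 to 5 GalNAc residues
ladder = {n: expected_mz(n) for n in range(6)}

if __name__ == "__main__":
    for n, mz in ladder.items():
        print(f"{n} GalNAc: m/z {mz}")
    print("1 Hyp, no GalNAc: m/z", expected_mz(0, n_hyp=1))
```

With these increments, five GalNAc residues give m/z 2902 and a single hydroxylation gives m/z 1903, in line with the values reported above.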
O-Glycosylation of Human MUC16 and INFα2B-Two other human O-glycoproteins were tested as substrates. A construct expressing 1.2 tandem repeats of MUC16 (MUC16-T7), i.e. encoding 223 amino acids with ~30 putative O-glycosylation sites and 3 N-glycosylation sites, was co-expressed with the T2-2A-CytoEpi glycosylation machinery. A clear shift in SDS-PAGE mobility (~45 to ~50 kDa) of the T7-tagged protein combined with reactivity with VVA was observed with co-expression of T2-2A-CytoEpi (Fig. 5A). Co-expression of a construct encoding the full secreted INFα2B cytokine tagged with T7 and an arabinogalactan protein-type (SP)10 glycomodule (INFα2B-T7-AGP) with the T2-2A-CytoEpi construct resulted in a mobility shift of a minor fraction of the protein (~30 to ~31 kDa) and reactivity with VVA (Fig. 5B). INFα2B has a single O-glycosylation site (GVGVT¹³²ETPLM¹³⁷), which is known to be O-glycosylated by GalNAc-T2 (51). The observed mobility shifts and VVA labeling are indicative of O-glycosylation, but further MS analysis is needed to confirm glycosylation sites.

DISCUSSION
Plants represent one eukaryotic cell system in which animal O-glycosylation can be built from scratch and custom-designed for a particular purpose. Because the capacity for GalNAc-type O-glycosylation is highly regulated and dynamic in mammalian cells, it is desirable to develop cell systems with defined capacity for O-glycosylation for production of recombinant therapeutics and vaccines. In this study, we established GalNAc O-glycosylation capacity in N. benthamiana plant cells by expression of a UDP-Glc(NAc) C4-epimerase and GalNAc-Ts, and we showed that three different co-expressed secreted substrates were O-glycosylated. We expressed GalNAc-T2 and epimerase transiently in plants using both monocistronic constructs and bicistronic 2A-linked constructs to screen different vector designs for the glycosylation machinery. In the present transient implementations, the bicistronic 2A construct designs proved most efficient (Fig. 2B).
The GolgiEpi-2A-T2 and EREpi-T2 constructs were designed to test whether ectopic expression of the C4-epimerase in the ER-Golgi lumen would convert ER-Golgi lumen UDP-GlcNAc to UDP-GalNAc in situ and allow GalNAc O-glycosylation. However, targeting the epimerase to the cytoplasm was most effective, as evidenced by glycosylation of the co-expressed acceptor substrate MUC1 (Fig. 2B). Factors including local substrate (UDP-GlcNAc) and resulting product (UDP-GalNAc) concentrations, the presence of chaperones, and pH differences in various compartments may influence the glycosylation efficiency. Other factors may be inefficient 2A sequence cleavage and aberrant localization of the individual proteins. For example, de Felipe and Ryan (52) reported that a construct with two 2A sequences encoding a Golgi-targeted cyan fluorescence protein, Golgi-targeted YFP, and a cytosolic puromycin resistance protein (i.e. GT-CFP-2A-GT-YFP-2A-PAC) resulted in Golgi-localized cyan fluorescence protein and YFP aberrantly localized in the mitochondria. Our attempts to exploit the ER/Golgi GlcNAc pools by targeting the epimerase to the ER or the Golgi proved less efficient than targeting it to the cytosol. This suggests that plant cells efficiently transport UDP-GalNAc into the secretory pathway and either do not produce GalNAc and UDP-GalNAc or do so only in minute amounts. The results provide a basis for further exploration of plant cells as glycoengineered hosts for recombinant expression of O-glycoproteins.
Considerable efforts have been devoted to glycoengineering host cells to accommodate human type glycosylation on recombinantly expressed therapeutics, which have led to remarkable improvements especially for protein N-glycosylation. Human type N-glycosylation has been engineered in yeast with unprecedented control of homogeneity of N-glycan structures added to proteins (53). In contrast, engineering of the abundant GalNAc-type O-glycosylation has lagged behind. Yeast cells were originally engineered to enable formation of GalNAcα and Galβ1-3GalNAcα O-glycosylation (14). The engineered glycosylation machinery included a UDP-GlcNAc C4-epimerase, a UDP-Gal(NAc) transporter, and the glycosyltransferases GalNAc-T2 and C1GalT1. Although yeast is a desirable expression system, it suffers from competing O-mannosylation initiated in the ER. O-Mannosylation is essential for yeast (54), but O-mannosyl glycans are immunogenic in man (55, 56). Although it is not possible to knock out all O-mannosyltransferases, some reduction of O-mannosylation was achieved by adding a rhodanine-3-acetic acid derivative to the culture medium (14).
While this study was being completed, Daskalova et al. (23) reported on similar glycoengineering attempts in N. benthamiana. Based on VVA lectin labeling experiments, the authors concluded that co-expression of a GalNAc-T with a reporter substrate was sufficient to produce low levels of glycosylation, whereas co-expression of both a UDP-Gal(NAc) transporter and a C4-epimerase was required for enhanced O-glycosylation. These results are in complete disagreement with the results presented here, where efficient glycosylation was achieved by co-expression of a GalNAc-T with an epimerase. We can only reconcile the different findings with erroneous interpretation of VVA lectin reactivity, as was in fact reported by Daskalova and co-workers only recently (28). In that paper, Daskalova and co-workers presented evidence of noncarbohydrate-specific binding of VVA lectin to tobacco protein extracts, which could not be inhibited by hapten sugar, such as D-GalNAc. Furthermore, the claim that a specific UDP-Gal(NAc) transporter is required for GalNAc O-glycosylation in plants (23) is controversial. Although UDP-GalNAc transport capacity has not been directly demonstrated in plants, many transporters are known to have UDP-Glc/Gal and UDP-GlcNAc/GalNAc transport capacities (57). Here we clearly demonstrate efficient O-glycosylation without exogenous introduction of a dedicated UDP-GalNAc transporter. This was also found with the additional functional expression of the GalNAc-T4 isoform, which is unique among the GalNAc-T isoforms in being required for complete GalNAc O-glycosylation of all five potential sites in the MUC1 tandem repeat sequence as well as in having a very high Km value for UDP-GalNAc (38).
Plants have a unique hydroxyproline-linked O-glycosylation, which has been found on recombinantly expressed human IgA1 (18) and most recently MUC1 (19). This modification is initiated by a family of prolyl 4-hydroxylases (P4Hs), and the resulting Hyps may subsequently be glycosylated with arabinogalactan or shorter arabinosides. P4Hs are membrane-bound type II proteins found in the ER and extending into the Golgi apparatus (16). Thirteen putative P4Hs have been identified in Arabidopsis (58). In this study a minor amount of hydroxylation of Pro 11 of the MUC1 tandem repeat was observed. Other post-translational modifications, including hydroxyproline-linked O-glycosylation, were not encountered. In contrast, the recent findings of Pinkhasov et al. (19) showed that either Pro 11 or Pro 12 of the MUC1 tandem repeat was further substituted with three L-arabinofuranoses. This difference in hydroxyproline-linked O-glycosylation may be rationalized by the use of different promoter systems driving the transient expression of the MUC1 reporter: Pinkhasov et al. (19) employed a potent virus-based expression system, perhaps causing increased contact with the endogenous plant post-translational modification machinery.
Our preliminary studies suggest that both the number of Pro residues in the MUC1 tandem repeat undergoing hydroxylation and the degree of hydroxylation increase in Arabidopsis plants and tobacco suspension BY-2 cells stably transformed with the same MUC1 reporter construct used in this study.⁴ In plants, protein O-glycosylation is primarily found in Hyp-rich glycoproteins of the cell wall, where Ser and in particular Hyp residues may be O-glycosylated. Dipeptidyl proline sequences prone to hydroxylation in angiosperm Hyp-rich glycoproteins were recently summarized by Kieliszewski et al. (59). Consistently hydroxylated sequences included the motifs AP, SP, and PP; the latter, an extensin-type motif, is found in the APPA sequence of the MUC1 tandem repeat. Several reports suggest involvement of three P4Hs, AtP4H2, AtP4H5, and AtP4H13, in hydroxylation of consecutive Pro motifs in particular (17, 20-22), perhaps pointing to these enzymes as first targets for engineering.
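The motif argument above can be illustrated with a short scan of a repeat sequence for the dipeptides AP, SP, and PP. This is only a minimal sketch: the 20-residue sequence used below is the commonly cited MUC1 tandem repeat and is an assumption, since the sequence is not given in this section, and the names `MUC1_REPEAT`, `find_motifs`, and `ser_thr_sites` are hypothetical. Positions are 1-based relative to the assumed reading frame and need not coincide with the residue numbering used elsewhere in this paper.

```python
# Assumption: commonly cited 20-residue MUC1 tandem repeat (not stated in this section).
MUC1_REPEAT = "HGVTSAPDTRPAPGSTAPPA"

# Dipeptide motifs reported as consistently hydroxylated (Kieliszewski et al., ref. 59).
MOTIFS = ("AP", "SP", "PP")


def find_motifs(seq, motifs=MOTIFS):
    """Return sorted (1-based start position, motif) pairs, allowing overlaps."""
    hits = []
    for i in range(len(seq) - 1):
        dipeptide = seq[i:i + 2]
        if dipeptide in motifs:
            hits.append((i + 1, dipeptide))
    return sorted(hits)


def ser_thr_sites(seq):
    """Return 1-based positions of Ser/Thr residues (potential GalNAc O-glycosylation sites)."""
    return [i + 1 for i, residue in enumerate(seq) if residue in "ST"]


if __name__ == "__main__":
    for position, motif in find_motifs(MUC1_REPEAT):
        print(f"{motif} at position {position}")
    print("Ser/Thr positions:", ser_thr_sites(MUC1_REPEAT))
```

Under the assumed sequence, the scan finds three AP dipeptides and a single PP dipeptide, the latter within APPA, and no SP dipeptide; the five Ser/Thr positions correspond to the five potential GalNAc O-glycosylation sites mentioned above.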
Glycoengineering of host cells generally seeks to achieve the human glycosylation produced by normal cells, which involves complex-type N-glycans and/or core1 O-glycans capped with sialic acids so that injectable therapeutics remain in circulation long enough for therapeutic effect. However, another important purpose of engineering O-glycans is to modulate immunogenic glycoforms produced by cancer cells and virus-infected cells. Changes in O-glycosylation are a hallmark of cancer cells, and immunogenic short immature aberrant O-glycans are termed pan-carcinoma antigens (60). Our interest in engineering O-glycosylation from scratch stems from our previous identification of immunodominant aberrant O-glycopeptide epitopes in the cancer-associated MUC1 mucin, which are not covered by immunological tolerance (41,61,62). Vaccination with MUC1 glycopeptides carrying the truncated GalNAc-Ser/Thr O-glycoform produced by GalNAc-T2 and -T4 in combination produces IgG antibodies with cancer-specific reactivity (61), and spontaneous IgG antibodies to the same epitope are found in many cancer patients at the time of diagnosis (3,63). Recombinant production of such vaccines will require a host cell that produces similar truncated O-glycans, and glycoengineered plants as reported here may provide such a system. Interestingly, the acquisition of Hyp in the MUC1 repeat may not adversely affect immunogenicity, but may rather stimulate immunity, as recently reported for an unglycosylated MUC1 vaccine produced in plants (19).
In this study we have elucidated requirements for engineering plants with the capacity to initiate human type O-glycosylation. O-Glycosylation capacities may be engineered in, e.g., tobacco with humanized N-glycosylation (64-69), which may serve as a more general platform for transient (70) as well as stable expression of glycoprotein pharmaceuticals. The results highlight that additional engineering is needed for plants to become versatile production platforms for therapeutics.