Sequential in vitro enzymatic N-glycoprotein modification reveals site-specific rates of glycoenzyme processing

N-glycosylation is an essential eukaryotic posttranslational modification that affects various glycoprotein properties, including folding, solubility, protein–protein interactions, and half-life. N-glycans are processed in the secretory pathway to form varied ensembles of structures, and diversity at a single site on a glycoprotein is termed ‘microheterogeneity’. To understand the factors that influence glycan microheterogeneity, we hypothesized that local steric and electrostatic factors surrounding each site influence glycan availability for enzymatic modification. We tested this hypothesis via expression of reporter N-linked glycoproteins in N-acetylglucosaminyltransferase MGAT1-null HEK293 cells to produce immature Man5GlcNAc2 glycoforms (38 glycan sites total). These glycoproteins were then sequentially modified in vitro from high mannose to hybrid and on to biantennary, core-fucosylated, complex structures by a panel of N-glycosylation enzymes, and each reaction time course was quantified by LC-MS/MS. Substantial differences in rates of in vitro enzymatic modification were observed between glycan sites on the same protein, and differences in modification rates varied depending on the glycoenzyme being evaluated. In comparison, proteolytic digestion of the reporters prior to N-glycan processing eliminated differences in in vitro enzymatic modification. Furthermore, comparison of in vitro rates of enzymatic modification with the glycan structures found on the mature reporters expressed in WT cells correlated well with the enzymatic bottlenecks observed in vivo. These data suggest higher order local structures surrounding each glycosylation site contribute to the efficiency of modification both in vitro and in vivo to establish the spectrum of microheterogeneity in N-linked glycoproteins.

N-glycosylation is an essential eukaryotic posttranslational modification that affects various glycoprotein properties, including folding, solubility, protein-protein interactions, and half-life. N-glycans are processed in the secretory pathway to form varied ensembles of structures, and diversity at a single site on a glycoprotein is termed 'microheterogeneity'. To understand the factors that influence glycan microheterogeneity, we hypothesized that local steric and electrostatic factors surrounding each site influence glycan availability for enzymatic modification. We tested this hypothesis via expression of reporter N-linked glycoproteins in N-acetylglucosaminyltransferase MGAT1-null HEK293 cells to produce immature Man 5 GlcNAc 2 glycoforms (38 glycan sites total). These glycoproteins were then sequentially modified in vitro from high mannose to hybrid and on to biantennary, core-fucosylated, complex structures by a panel of N-glycosylation enzymes, and each reaction time course was quantified by LC-MS/MS. Substantial differences in rates of in vitro enzymatic modification were observed between glycan sites on the same protein, and differences in modification rates varied depending on the glycoenzyme being evaluated. In comparison, proteolytic digestion of the reporters prior to N-glycan processing eliminated differences in in vitro enzymatic modification. Furthermore, comparison of in vitro rates of enzymatic modification with the glycan structures found on the mature reporters expressed in WT cells correlated well with the enzymatic bottlenecks observed in vivo. These data suggest higher order local structures surrounding each glycosylation site contribute to the efficiency of modification both in vitro and in vivo to establish the spectrum of microheterogeneity in N-linked glycoproteins.
Glycans are important modulators of protein properties and functions across all clades of life (1,2). N-glycosylation is a conserved and essential cotranslational and posttranslational modification in higher eukaryotes (3) and plays an important role in protein homeostasis (4). N-glycans are cotranslationally and/or posttranslationally attached en bloc by oligosaccharyltransferase (5) to the conserved motif N-X-S/T(C), also known as a "sequon," where 'X' can be any amino acid except proline (6)(7)(8). The initial oligosaccharide is mannose rich, and it is trimmed by a series of glycoside hydrolases in the endoplasmic reticulum (ER), eventually exposing the core structure of N-glycans, which is made up of a chitobiose core (GlcNAc-GlcNAc) with branching mannoses, Man 5 GlcNAc 2 . N-glycans are generally categorized as belonging to one of three classes based on the extent of their processing: high mannose, hybrid, or complex. High mannose glycans are the least processed and most closely resemble the initial oligosaccharide that is transferred onto proteins, while complex glycans are the most processed and can take a variety of forms. This can include branching, extensions, and core fucosylation (9). However, the efficiency of glycan maturation at a given acceptor site on glycoproteins can often be incomplete, most notably (but not solely) because of steric or electrostatic factors that impact enzyme-substrate recognition (9)(10)(11). This often results in heterogeneous ensembles of glycan structures on glycoprotein acceptor (12) and even on individual glycosites on the same glycoprotein (13). This phenomenon is termed 'microheterogeneity' and is a hallmark of protein glycosylation that has been a focus of biochemical analysis for several decades (14,15).
An important branching point in N-glycan processing is the addition of a branching β-2-linked GlcNAc to the ⍺3 mannose of the Man 5 GlcNAc 2 structure by the GT-A family glycosyltransferase MGAT1 (16,17). This step marks a class switch from high mannose (e.g., Man 5 GlcNAc 2 ) to hybrid (e.g., GlcNAcMan 5 GlcNAc 2 ) glycans, as this branching GlcNAc can be further elaborated by other glycosyltransferases into a variety of structures. The activity of MGAT1 is also necessary for the subsequent trimming of the two terminal mannoses from the ⍺6 mannose by the glycoside hydrolase MAN2A1 (12), which results in a GlcNAcMan 3 GlcNAc 2 structure. This GlcNAcMan 3 GlcNAc 2 structure acts as the substrate for MGAT2 (13), which marks the point at which hybrid N-glycans transition to complex N-glycans, with two branching GlcNAc moieties that serve as a base for highly elaborated biantennary, triantennary, or tetra-antennary structures.
There is much interest in understanding the underlying criteria that define N-glycan microheterogeneity. N-glycans have been shown to be important modulators in antibodyreceptor interactions, both with respect to glycosylation of the antibody (18)(19)(20) and their receptors (21,22). In particular, the contribution of glycosylation to the properties of therapeutics is of particular interest in the development and manufacturing of biologics (4,23) and biosimilars (24). Glycosylation of these therapeutics is known to impact their stability and pharmacokinetics (4). Additionally, N-glycans are a vital component of viral glycoprotein properties and are known to impact host immune surveillance (25,26) and host receptor interactions involved in viral entry (27).
There have been several approaches to studying N-glycan microheterogeneity. Early studies demonstrated that Nglycan microheterogeneity is reproducible on a site-by-site basis (28) and that access by glycosyl hydrolases is predictive of N-glycan processing (29). NMR studies of N-glycan structures suggest that glycan interactions with the protein backbone can alter glycan conformations in ways that can impact N-glycan processing (30)(31)(32). There is evidence that changing nearby amino acids can alter N-glycan heterogeneity (10,33). Some recent studies have taken a systems approach, often by monitoring in vivo processing of N-glycans in cell culture systems (10,11). The involvement of the peptide-glycan interactions in affecting glycan conformation and thus potentially N-glycan processing, has also been supported by molecular dynamics (MD) simulations using yeast protein disulfide isomerase (PDI) as a model reporter glycoprotein (34,35). Additionally, a meta-analysis of sitespecific glycoproteomics papers found that solvent accessibility is related to the extent of branching and core fucosylation (36). A recent study by Mathew et al. studied early Nglycan processing steps through MD simulations and in vitro processing of N-glycans on the yeast protein disulfide isomerase, a similar approach as this study (37). They found that the shape of the surrounding protein environment and subsequent glycan conformation can influence the rate at which individual sites are processed. They studied early mannose trimming and class switching from high mannose to hybrid glycans using three mannosidases and the GlcNAc transferase MGAT1 on the processing of PDI. Our work here expands on this by using enzymes involved in later N-glycan processing using not only the model yeast protein PDI but also multiple N-linked glycoproteins of interest to human health.
In this study, we report extensive site-specific in vitro Nglycan processing data for five multiply N-linked glycosylated proteins, with 38 different sites of N-glycosylation in total. By enriching all sites of all glycoproteins with a common Man 5-GlcNAc 2 substrate and then monitoring N-glycan processing through time-course reactions, we were able to identify key bottlenecks that prevent specific sites on glycoproteins from being converted from high mannose to complex N-glycans. These bottlenecks appear to persist in vivo upon microheterogeneity analysis of each site of the reporter proteins when expressed in WT cells. Additionally, we found that removing the tertiary structure of the protein abolished all site specificity of N-glycan processing, highlighting the importance of protein tertiary structure in defining N-glycan microheterogeneity.

Expression of reporter proteins in WT-HEK293F cells
In order to probe individual steps of N-glycan processing, we first established a set of reporter proteins to be used as case studies (Table 1, Fig. S1). These proteins were selected based on their various applications in biology, virology, and use as therapeutics as well as their diversity in displayed glycans and the availability of quality crystal structures. CD16a (Fc ᵧ receptor IIIa) is an IgG receptor that is known to have differential affinities to antibodies depending on its glycan presentation, which impacts downstream signaling (22,33,38). PDI is a resident ER glycoprotein that has been used as a model protein for studying N-glycan processing due to its ease of expression, analysis, and well-defined site-specific glycan heterogeneity (10,11,34,35,37). Etanercept is a bioengineered therapeutic fusion protein of a TNF⍺ receptor and an IgG1 Fc domain commonly used to help treat autoimmune disorders (39). Erythropoietin is a therapeutic glycoprotein that stimulates red blood cell growth, and its glycosylation is known to impact its pharmacokinetics (40)(41)(42). Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spike glycoprotein is a highly glycosylated trimer that is responsible for the viral entry of the associated coronavirus SARS-CoV-2 via binding to the human receptor ACE2 (27,43).
The reporter proteins were first transiently expressed in high yields in WT HEK293F cells and then harvested from supernatant and purified with Ni-NTA chromatography (Fig. S1, B-F). Glycopeptide analysis using LC-MS/MS was performed on these purified proteins in order to determine their glycan occupancy and diversity when expressed in a "WT" background ( Fig. 1). With LC-MS/MS techniques, we were able to obtain a detailed characterization of the N-glycan profile at each site on the reporter proteins, some of which contained dozens of different glycan moieties with a variety of terminal structures including sialylation, as well as core fucosylation (Fig. 2, A-C). Monitoring N-glycan microheterogeneity All classes of glycans were observed at most sites (Figs. 3, S5-S42), with SARS-CoV-2 spike glycoprotein pictured separately due to its large number of sites (Fig. S2A). Of particular interest are the sites on reporter glycoproteins that greatly differ from other sites on the same protein: sequons 2 and 4 on CD16a (Fig. 3A) and sequon 4 on PDI (Fig. 3B) are predominantly less processed high mannose and hybrid structures, while the other sites of N-glycosylation on the same proteins are mostly highly processed complex structures. This is in contrast to etanercept (Fig. 3C) and erythropoietin (Fig. 3D), which have more homogenous N-glycan presentations. The SARS-CoV-2 spike glycoprotein had a diversity of N-glycan presentations on its 22 sites, with most sites enriched with complex N-glycans and certain sites mostly presenting high mannose N-glycans (Fig. S2). Interestingly, we noted that Man5GlcNAc2 was always the most abundant high mannose structure on 36 of our 38 sites with two exceptions being N0234 and N0717 of SARS-CoV-2 that both contain less than 15% complex structures (Figs. S2, S27, S35).

Expression of reporter proteins in Lec1-HEK293F cells
Next, these same proteins were transiently overexpressed in HEK293S GnTI-(MGAT1 null) cells. The activity of MGAT1 is necessary for the formation of both hybrid and complex Nglycans, as the addition of GlcNAc to the nonreducing end α3linked mannose is needed for further elaboration and capping by downstream enzymes. KO of MGAT1 substantially reduces the diversity of glycans at all sites of N-glycosylation and causes a significant enrichment of Man 5 GlcNAc 2 structures N-glycans on expressed glycoproteins, as shown on sequon 2 of CD16a (Fig. 2D), as well as the other reporter sites (Figs. S5-S42). This is useful because it allows for the in vitro processing of all N-glycans on a glycoprotein to begin from a common substrate. This enrichment was successful for most sites on all reporter proteins (Figs. 3, S2B). However, there is an exception at Sequon 2 of etanercept, which contained a significant population of apparent hybrid N-glycans by an unknown processing event that we are currently exploring. These appear to be true hybrid structures with an attached GlcNAc on the ⍺-3-linked mannose based on MS2 fragmentation data ( Fig. S16B).

Conversion of high mannose glycans to hybrid glycans
In order to probe the effects of tertiary structure on N-glycan processing, we first monitored the conversion of Man 5 GlcNAc 2 N-glycans to GlcNAcMan 5 GlcNAc 2 N-glycans via the addition of GlcNAc by the glycosyltransferase MGAT1 on intact reporter proteins expressed and purified from MGAT1-deficient cells. We did this through a series of timecourse reactions using purified protein and MGAT1 in the presence of the nucleotide sugar donor UDP-GlcNAc followed by analysis and quantitation via LC-MS/MS (Fig. 4, example TIC can be found in Fig. S43). The ratio of enzymes to molarity of reporter N-glycan sites was kept constant so that we could compare the interprotein as well as intraprotein rates of N-glycan processing. We observed site-specific rates of GlcNAc addition across the range of our 38 sites of N-glycosylation. Generally, these rates corresponded well to the distributions of N-glycans that were found on the respective sites when the reporters were expressed in WT HEK293F cells Monitoring N-glycan microheterogeneity (Figs. 3, S5-42). For example, sequons 2 and 4 on CD16a are both modified relatively slowly by MGAT1 (Fig. 4A) and are also enriched with high mannose structures when expressed in WT HEK293 cells (Fig. 3A). This also holds for sequon 4 of PDI (Fig. 3B), which is the only site on PDI that is modified slowly by MGAT1 (Fig. 4B). This pattern is also demonstrated on SARS-CoV-2 spike glycoprotein: we observed a broad distribution of MGAT1 activity rates across its 22 sites of Nglycosylation (Figs. 4E, S3), and the fastest and slowest sites are enriched in complex and high mannose N-glycans, respectively (Fig. 4F). This is best represented by sites N0234 and N0717 on Spike glycoprotein, which had slow transfer rates and primarily were occupied by high mannose N-glycans when expressed in a WT background (Figs. 4F, S2).
In contrast to the demonstration of site specificity for PDI and CD16a, all sites on etanercept have relatively high levels of high mannose glycans (Fig. 3C), and all are processed slowly by MGAT1 (Fig. 4C). Additionally, erythropoietin's three sites of N-glycosylation are all processed efficiently by MGAT1 (Fig. 4D) and when expressed in WT HEK293 cells mostly produce complex N-glycans (Fig. 3D).

Core fucosylation of N-glycans by FUT8
Following the aforementioned experiments, we wanted to see if similar patterns of site-specific N-glycan processing rates would apply to core fucosylation. Core fucosylation is the attachment of an ⍺1,6-linked fucose to the GlcNAc that is directly attached to the asparagine at the core of N-linked glycans, a reaction which is catalyzed by the fucosyltransferase FUT8. This reaction is generally specific to complex Nglycans (44), and thus, we generated GlcNAc 2 Man 3 GlcNAc 2 glycans on our collection of reporter proteins by reacting with an excess of MGAT1, MAN2A1, and MGAT2 in the presence of the UDP-GlcNAc sugar donor. We then examined the rates of modification of the respective glycans with FUT8 (Fig. 7). Interestingly, at many sites we found substantial core fucosylation prior to in vitro processing despite the reporter proteins being expressed in an MGAT1-null cell line and thus lacking complex N-glycans (Fig. 7, A, B, D and E, S5-S42). Otherwise, we observed a diversity of fucosylation rates among the different sites. Sequon 2 on CD16a (Fig. 7A) and sequon 1 and 5 on PDI were markedly slow (Fig. 7B), as well as sequon 3 on etanercept (Fig. 7C). All sites on erythropoietin were fucosylated rapidly, which possibly reflects their high initial levels of fucosylation even before the FUT8 reaction (Fig. 7D). Most sites on SARS-CoV-2 spike glycoprotein were efficiently fucosylated (Fig. 7E), with a few exceptions (N0122, N0801, N1074, and N1098) (Fig. S3). Glycopeptide analysis data reflects that sites modified more rapidly by FUT8 in vitro had higher levels of fucosylation in vivo (Fig. 7F).

Impact of tertiary structure on site specificity
In order to determine whether these site-specific differences in glycosyltransferase rates were due to tertiary structure, we repeated the transfer of GlcNAc onto PDI-Man 5 GlcNAc 2 glycans using MGAT1 but first digested the protein with trypsin to cleave the fully folded protein substrate into glycopeptides. Without the reporter protein tertiary structure, all site specificity of GlcNAc transfer rate was lost and the overall rate of transfer was reduced (Fig. 8A compared to Fig. 3B). Since FUT8 activity requires access to the core GlcNAc linked to the Asn residue of the peptide backbone, we were curious to see if the glycoprotein being cleaved to glycopeptides would eliminate the site specificity observed on intact protein. Similar to MGAT1, all site-specific modification by FUT8 was lost following cleavage of CD16a to glycopeptides, with all sites (including the slow site 2 (Fig. 7A)) exhibiting similar rates of modification (Fig. 8B). CD16a was used instead of PDI for the FUT8 experiment to more fully illustrate that local tertiary structure was important for more than just one of our reporter proteins and because site 4 on PDI cannot adequately form complex N-glycans due to low MAN2A1 activity (Fig. 5B).

Discussion
While N-glycans are a crucial component in the production of membrane bound and secreted glycoproteins, the determinants that define the diversity of N-glycan structures at any given site are not well understood. Factors that have been suggested to influence N-glycan microheterogeneity include the expression of glycosyltransferases and glycoside hydrolases (45,46), the availability of nucleotide sugar donors (47), secretory pathway trafficking (11,34,46), and the accessibility of the acceptor site (10,37,48). The impact of enzyme availability in cells has mostly been probed through genetic engineering approaches (45). However, availability of enzymes and sugar nucleotides cannot sufficiently explain site-specific differences on the same polypeptide nor can protein trafficking in the secretory pathway. Thus, we hypothesize that it is the impact of acceptor site accessibility that allows for site-specific differences on the same protein. This could occur by multiple mechanisms including the substrate glycan interacting with the substrate protein backbone at a specific site and thus hampering engagement of the glycoenzyme with the substrate glycan. An example of this is site 4 on PDI that was elegantly demonstrated by Aebi et al. (10). Additionally, the glycosylation on the Fc fragment of etanercept (site 3, N317) differs from what is normally seen on IgG as there is an abundance of high mannose glycans accompanying the expected biantennary complex structures (49), which may indicate altered accessibility to the N-glycan at that site in this non-naturally occurring fusion protein. Another possibility is local secondary and tertiary structure of the substrate protein at individual sites of modification and the glycoenzyme active site resulting in steric or electrostatic clashes that prohibit optimal binding for catalysis. Interestingly, our analysis revealed that proteolytically digesting substrate proteins before transfer reactions abolished site-specific rate differences (Fig. 8) and in the case of MGAT1 reduced the overall rate of reaction. This strongly supports higher order structure of the substrate playing an important role in transfer rates and in agreement with proposed mechanisms for microheterogeneity.
MD studies and Markov state modeling by Mathew et al. demonstrated that the relative amount of time a glycan spends extended away from the protein and exposed to solvent correlates with site specificity of glycan-processing rates on the yeast model protein disulfide isomerase (37). Additionally, they monitored in vitro N-glycan processing rates with ER mannosidases as well as MGAT1 and MAN2A1, and the results with overlapping enzymes in our present study agree well. Site 4 of PDI was identified as a "slow" site (Figs. 4B and 5B), and their Michaelis-Menten analysis of PDI processing kinetics are also consistent with our observations (Figs. 4B and 5B). Additionally, their studies on the kinetics of earlier glycanprocessing steps involving ER mannosidase I and Golgi mannosidase IB show that this site specificity is conserved in earlier steps of N-glycan processing. However, their approach to reduce tertiary structure through reduction and alkylation prior to in vitro modification led to results contrasting with our glycopeptide experiments (Fig. 8), and they observed differences in modification rate at different sites. This may be due to some secondary structures of the protein not being completely disrupted without the use of protease digestion to cleave the model protein utilized or perhaps the denatured protein can still influence site kinetics.
Generally, these results indicate that the tertiary structure specific to an acceptor site can be an important factor in defining the types of N-glycans seen at a given site. In particular, the efficiency (or lack thereof) of MGAT1 and MAN2A1 appears to be highly predictive of high mannosetype glycans at a sequon. In vivo, it is likely that MGAT1 is rate limiting as the most common high mannose structure at most sites when expressed in a WT background is the MGAT1 substrate Man 5 GlcNAc 2 (Figs. S5-S42), and its activity is required for downstream processing by enzymes like MAN2A1. The rate-limiting role of MAN2A1 was also observed in our in vitro studies, particularly at sites that were also poorly modified with MGAT1. This may be partly due to lower activity of the recombinant enzyme employed in our in vitro studies; four times as much MAN2A1 had to be used in assays compared to the glycosyltransferases (1:250 enzyme:substrate molar ratio for MAN2A1 versus 1:1000 for glycosyltransferases). MAN2A1 processing has previously been identified as a potential bottleneck in N-glycan processing (37). Potential steric barriers to MAN2A1 action are suggested by the structure of MAN2A1:substrate complex that demonstrates a significant portion of the total N-linked glycan must fit into the active site of the enzyme for efficient binding and catalysis (50). If a site is not processed totally by MAN2A1, it may form hybrid structures but cannot form complex structures due to the necessity of mannose trimming on the ⍺6 branch of the trimannosyl N-glycan core. By contrast, the active site structure of MGAT1 involved in acceptor recognition has not yet been determined but likely also presents significant steric barriers for access to some poorly modified sites. While these studies provide a sound starting point for determining what structural features may be important in determining N-glycan destiny, much work remains. We purposely chose reporter proteins and processing enzymes with experimentally determined structures (Table 1), (43,(51)(52)(53)(54).
We are currently utilizing MD simulations of glycosylated reporter proteins and site-specific docking of specific glycan modified reporters with glycoenzymes to determine sitespecific glycans interacting with the reporter protein as well as clashes between the reporter sites and the glycoenzymes. This will guide future work involving mutagenesis studies to influence the rate at which glycosyltransferases and glycosyl hydrolases are able to modify acceptor sites. There is evidence that this approach can indeed alter the distribution of N-glycans at a specific site, as evidenced through modification of a tyrosine residue proximal to sequon 4 on protein disulfide isomerase (10). Taking a systematic approach that involves site-specific rate monitoring coupled with modeling and mutagenesis should result in common rules that not only will allow prediction of microheterogeneity but will allow us to tune it.

Expression and purification of glycoprotein reporters and glycosylation enzymes for in vitro modification
Expression constructs encoding the reporter proteins were generated with either NH 2 -terminal fusion tags (CD16a (low affinity immunoglobulin gamma Fc region receptor III-A, FCGR3A), UniProt P08637, residues 19 to 193; Erythropoietin (EPO), UniProt P01588, residues 28 to 193; Etanercept (TNF receptor-IgG1 fusion), GenBank AKX26891, residues 1-467) or C-terminal fusion tags (yeast PDI1 (protein disulfide-isomerase), UniProt P17967, residues 1-494; SARS-CoV-2 Spike glycoprotein, UniProt P0DTC2, residues 1-1208). The constructs employing N-terminal fusion sequences employed the pGEn2 expression vector while the PDI1 construct was generated in the PGEc2 vector as previously described (55). For the pGEn2 constructs, the fusion protein coding region was comprised of a 25 aa signal sequence, an His 8 tag, AviTag, the "superfolder" GFP coding region, the 7 aa recognition sequence of the tobacco etch virus (TEV) protease followed by the catalytic domain region for reporter proteins (55). Constructs encoding MGAT1, MAN2A1, MGAT2, and FUT8 employed the pGEn2 vector and were expressed and purified as previously described (55). For the PDI1 construct, the pGEc2 vector was employed and encoded the segment of Saccromyces cerevisiae PDI1 indicated, followed by an SGSG tetrapeptide, the 7 aa TEV recognition sequence, the "superfolder" GFP coding region, and an His 8 tag (55). For SARS-CoV-2 Spike, the construct contained an additional COOH-terminal trimerization sequence and His6 tag as previously described (56). The recombinant reporter proteins were expressed as a soluble secreted proteins by transient transfection of suspension culture HEK293F cells (FreeStyle 293-F cells, Thermo Fisher Scientific) for WT glycosylated structures and in HEK293S (GnTI-) cells (ATCC) to generate Man 5 GlcNAc 2 -Asn glycan structures (55,57). Cultures were maintained at 0.5 to 3.0 × 10 6 cells/ml in a humidified CO 2 platform shaker incubator at 37 C with 50% humidity. Transient transfection was performed using expression medium comprised of a 9:1 ratio of Freestyle293 expression medium (Thermo Fisher Scientific) and EX-Cell expression medium including Glutmax (Sigma-Aldrich). Transfection was initiated by the addition of plasmid DNA and PEI as transfection reagent (linear 25 kDa PEI, Polysciences, Inc). Twenty-four hours post-transfection, the cell cultures were diluted with an equal volume of fresh media supplemented with valproic acid (2.2 mM final concentration) and protein production was continued for an additional 5 days at 37 C (3). The cell cultures were harvested, clarified by sequential centrifugation at 1200 rpm for 10 min and 3500 rpm for 15 min at 4 C, and passed through a 0.8 μM filter (Millipore). The protein preparation was adjusted to contain 20 mM Hepes, 20 mM imidazole, 300 mM NaCl, pH 7.5, and subjected to Ni-NTA Superflow (Qiagen) chromatography using a column preequilibrated with 20 mM Hepes, 300 mM NaCl, 20 mM imidazole, pH 7.5 (Buffer I). Following loading of the sample, the column was washed with 3 column volumes of Buffer I followed by 3 column volumes of Buffer I containing 50 mM imidazole and eluted with Buffer I containing 300 mM imidazole at pH 7.0. The protein was concentrated to approximately 3 mg/ml using an ultrafiltration pressure cell (Millipore) with a 10 kDa molecular mass cutoff membrane and buffer exchanged with 20 mM Hepes, 100 mM NaCl, pH 7.0, 0.05% sodium azide, and 10% glycerol.

In vitro N-Glycan processing
For the time-course reactions, purified reporter proteins generated in HEK293S (GnTI-) cells were used. Reactions were performed at 37 C in 1.5 ml Eppendorf tubes in a reaction volume of 150 μl, with 20 mM Hepes (VWR) pH 7.5, and 300 mM NaCl (Fisher). For glycosyltransferases, the corresponding nucleotide sugar, UDP-GlcNAc (Sigma) for MGAT1 and MGAT2 and GDP-Fucose (CarboSynth) for FUT8 was kept in excess at 1 mM. MGAT1 and MGAT2 reactions were supplemented with 1 mM MnCl 2 (Sigma). The concentration of total N-glycans for each reaction was kept at 5 μM; for example, for a reporter protein with 5 sites of N-glycosylation, the concentration of the protein would be 1.25 μM. For MGAT1, MGAT2, and FUT8 reactions, a 1:1000 enzyme-toglycan ratio was used with the concentration of respective enzyme at 5 nM; for MAN2A1 reactions, a 1:250 enzyme-toglycan ratio was used with the concentration of MAN2A1 at 20 nM. Prior to adding enzymes, time-course reaction vessels were equilibrated at 37 C for 15 min. At each time point, 20 μl of samples were taken and reactions were deactivated by heating at 95 C for 5 min. The samples were then digested by proteases and processed for LC-MS/MS analysis. In order to prepare substrate reporter proteins for N-glycan processing steps downstream of MGAT1 (e.g., MAN2A1, MGAT2, FUT8), reporters were reacted with the appropriate combination of enzymes and sugar nucleotides at a 1:100 enzyme:glycan ratio for 3 h. Reporters were confirmed to have been >80% converted to desired product by LC-MS/MS, detailed later.
Enzymatic digestion of PDI1, etanercept, EPO, CD16a, and SARS-CoV-2 spike from WT and HEK293S (GnTI-) cells All proteins were reduced by incubating with 10 mM of DTT (Sigma) at 56 C and alkylated by 27.5 mM of iodoacetamide (Sigma) at room temperature in dark. For the intact glycopeptide analysis, aliquots of PDI1 proteins were digested respectively using trypsin (Promega), a combination of trypsin and Glu-C (Promega), or a combination of trypsin and AspN (Promega); aliquots of etanercept proteins were digested respectively using trypsin (Promega) or AspN (Promega); aliquots of EPO proteins were digested respectively using a combination of trypsin and Glu-C (Promega) or Glu-C (Promega); aliquots of CD16a proteins were digested respectively using chymotrypsin (Athens Research and Technology), AspN (Promega), or a combination of chymotrypsin (Athens Research and Technology) and Glu-C (Promega); aliquots of S proteins were digested respectively using alpha lytic protease (New England BioLabs), chymotrypsin (Athens Research and Technology), a combination of trypsin and Glu-C (Promega), or a combination of Glu-C and AspN (Promega). For the analysis of deglycosylated glycopeptides, aliquots of PDI1 proteins were digested respectively using trypsin (Promega) or a combination of trypsin and Glu-C (Promega); aliquots of etanercept proteins were digested respectively using trypsin (Promega) or AspN (Promega); aliquots of EPO proteins were digested respectively using a combination of trypsin and Glu-C (Promega) or Glu-C (Promega); aliquots of CD16a proteins were digested respectively using chymotrypsin (Athens Research and Technology) or AspN (Promega); aliquots of S proteins were digested respectively using chymotrypsin (Athens Research and Technology), a combination of trypsin and Glu-C (Promega), or AspN (Promega). Following digestion, the proteins were deglycosylated by Endo-H (Promega) followed by PNGaseF (Promega) treatment in the presence of 18 O water (Cambridge Isotope Laboratories).

LC-MS/MS analysis of glycopeptides of PDI1, etanercept, EPO, CD16a, and SARS-CoV-2 spike from WT and HEK293S (GnTI-) cells
The resulting peptides from respective enzymatic digestion of each protein were separated on an Acclaim PepMap RSLC C18 column (75 μm × 15 cm) and eluted into the nanoelectrospray ion source of an Orbitrap Fusion Lumos Tribrid or an Orbitrap Eclipse Tribrid mass spectrometer (Thermo Fisher Scientific) at a flow rate of 200 nl/min. The elution gradient for PDI1, etanercept, EPO, and CD16a proteins consists of 1% to 40% acetonitrile in 0.1% formic acid over 220 min followed by 10 min of 80% acetonitrile in 0.1% formic acid. The elution gradient for S protein consists of 1% to 40% acetonitrile in 0.1% formic acid over 370 min followed by 10 min of 80% acetonitrile in 0.1% formic acid. The spray voltage was set to 2.2 kV and the temperature of the heated capillary was set to 275 C. For the intact glycopeptide analysis, full MS scans were acquired from m/z 200 to 2000 at 60k resolution, and MS/MS scans following higher energy collisional dissociation with stepped collision energy (15%, 25%, 35%) were collected in the orbitrap at 15k resolution. For the deglycosylated glycopeptide analysis, full MS scans were acquired from m/z 200 to 2000 at 60k resolution, and MS/MS scans following collision-induced dissociation (CID) at 38% collision energy were collected in the ion trap.
For time-course reactions, a shorter LC gradient was used, and digests of the same reporters were combined prior to analysis for higher throughput. The elution gradient used for PDI, etanercept, EPO, and CD16a proteins was 1% to 80% acetonitrile in 0.1% formic acid over 60 min followed by 5 min of 80% acetonitrile in 0.1% formic acid. The peptides were eluted into the source of an Orbitrap Fusion Tribrid mass spectrometer (Thermo Fisher Scientific). The spray voltage was set to 2.25 kV and the temperature of the heated capillary was set to 280 C. Full MS scans were acquired from m/z 300 to 2000 at 60k resolution, and MS/MS scans following CID at 38% collision energy were collected in the ion trap. The elution gradient used for SARS-CoV-2 spike glycoprotein was 1% to 80% acetonitrile in 0.1% formic acid over 300 min followed by 10 min of 80% acetonitrile in 0.1% formic acid. The peptides were eluted into the source of an Orbitrap Eclipse Tribrid mass spectrometer (Thermo Fisher Scientific). The spray voltage was set to 2.25 kV and the temperature of the heated capillary was set to 275 C. Full MS scans were acquired from m/z 300 to 1900 at 60k resolution, and MS/MS scans following CID at 38% collision energy were collected in the ion trap.

MS data analysis
For the intact glycopeptide analysis, the raw spectra were analyzed using pGlyco3 (58) for database searches with mass tolerance set as 20 ppm for both precursors and fragments.
The database search output was filtered to reach a 1% false discovery rate for glycans and 10% for peptides. The filtered result was further validated by manual examination of the raw spectra. For isobaric glycan compositions, fragments in the MS/MS spectra were evaluated to provide the most probable topologies. Quantitation was performed by calculating spectral counts for each glycan composition at each site. Any N-linked glycan compositions identified by only one spectra were removed from the quantitation. For the deglycosylated glycopeptide analysis, the spectra were analyzed using SEQUEST (Proteome Discoverer 1.4 and 2.5, Thermo Fisher Scientific) with mass tolerance set as 20 ppm for precursors and 0.5 Da for fragments. The search output from Proteome Discoverer 1.4 was filtered using ProteoIQ (v2.7, Premier Biosoft) to reach a 1% false discovery rate at protein level and 10% at peptide level. The search output from Proteome Discoverer 2.5 was filtered within the program to reach a 1% false discovery rate at protein level and 10% at peptide level. Occupancy of each Nlinked glycosylation site was calculated using spectral counts assigned to the 18 O-Asp-containing (PNGaseF-cleaved) and/or HexNAc-modified (EndoH-cleaved) peptides and their unmodified counterparts.
For time-course reactions, quantitation was performed through manual inspection of MS1 spectra using Thermo Freestyle 1.7 (Thermo Fischer Scientific). The intensities of monoisotopic peak heights for all observable charge states for reactants and products were determined and then summed and averaged in triplicate to determine percent conversion to product over time. Plots generated using RStudio (1.4.1717).

Data availability
All data generated or analyzed during this study are included in this article and supporting information files. The glycopeptide analysis MS data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier PXD032149. MS data for time-course reactions available upon request.