Installation of O-glycan sulfation capacities in human HEK293 cells for display of sulfated mucins

The human genome contains at least 35 genes that encode Golgi sulfotransferases that function in the secretory pathway, where they are involved in decorating glycosaminoglycans, glycolipids, and glycoproteins with sulfate groups. Although a number of important interactions by proteins such as selectins, galectins, and sialic acid–binding immunoglobulin-like lectins are thought to mainly rely on sulfated O-glycans, our insight into the sulfotransferases that modify these glycoproteins, and in particular GalNAc-type O-glycoproteins, is limited. Moreover, sulfated mucins appear to accumulate in respiratory diseases, arthritis, and cancer. To explore further the genetic and biosynthetic regulation of sulfated O-glycans, here we expanded a cell-based glycan array in the human embryonic kidney 293 (HEK293) cell line with sulfation capacities. We stably engineered O-glycan sulfation capacities in HEK293 cells by site-directed knockin of sulfotransferase genes in combination with knockout of genes to eliminate endogenous O-glycan branching (core2 synthase gene GCNT1) and/or sialylation capacities in order to provide simplified substrates (core1 Galβ1–3GalNAcα1–O-Ser/Thr) for the introduced sulfotransferases. Expression of the galactose 3-O-sulfotransferase 2 in HEK293 cells resulted in sulfation of core1 and core2 O-glycans, whereas expression of galactose 3-O-sulfotransferase 4 resulted in sulfation of core1 only. We used the engineered cell library to dissect the binding specificity of galectin-4 and confirmed binding to the 3-O-sulfo-core1 O-glycan. This is a first step toward expanding the emerging cell-based glycan arrays with the important sulfation modification for display and production of glycoconjugates with sulfated O-glycans.

Sulfated mucoproteins were originally characterized by use of the periodic acid-Schiff and toluidine blue staining method (1). Sulfated mucins were isolated from gastrointestinal, bronchial, reproductive, and other tissues (2-4), and sulfation was analyzed by 35S-radiolabeling and chromatography (5,6). With the development of mass spectrometry (MS), insight into the detailed structures of sulfated glycans on N-glycoproteins and O-glycoproteins has evolved. Khoo et al. (7,8) established a highly selective enrichment and sensitive MS mapping method for sulfated N-glycans and/or O-glycans based on MALDI-MS/MS analysis of permethylated glycans in negative ion mode. The MALDI-MS/MS was further extended to more comprehensive and targeted LC-MS/MS analysis, resulting in rapid selection and sequencing of sulfated glycans at high resolution (9-13). Recently, a higher-energy collisional dissociation (HCD) fragmentation technique was used to identify sulfated O-glycans on mucin5B, providing confident identification of larger sulfated glycans (14). A number of sulfated O-glycans were identified in the mouse secondary lymph nodes by MALDI-MS mapping of fractionated permethylated O-glycans (7). A multiple reaction monitoring approach was applied to decipher O-glycan isomers from lubricin using porous graphitized carbon chromatography in the negative ion mode, and a large number of isomeric O-glycans with sulfate and sialic acid groups were identified on lubricin (15) and the salivary mucin7 (MG2) (16).
The sulfotransferases involved in sulfation of O-glycans are poorly defined. A total of 50 sulfotransferase genes are found in the human genome, and these include 13 cytosolic sulfotransferases that function with proteins, lipids, and steroids in detoxification (31,32), two protein tyrosine sulfotransferases (33,34), and 35 Golgi sulfotransferases that use glycans and glycosaminoglycans (GAGs) as acceptor substrates (35). Studies of the latter enzymes have mainly been performed with in vitro enzyme assays, and their functions in sulfation of O-glycans remain largely predictions (please see Table S1 with references).
Here, we took a first step toward stable genetic installation of sulfation capacities for O-glycans in the human embryonic kidney 293 cell line (HEK293) to display and produce sulfated variants of the cancer-associated T (Galβ1-3GalNAcα1-O-Ser/Thr) and Tn (GalNAcα1-O-Ser/Thr) O-glycans on human mucins (36,37). Genetic engineering of the glycosylation capacities in a cell line can be used to dissect biosynthesis and genetic regulation of specific glycosylation features, and with more comprehensive and systematic engineering to develop a cell-based glycan array that displays the glycome on cells for biological interrogation (38,39). We obtained engineered HEK293 cells that display and produce 3-O-sulfated core1 and core2 O-glycans by installation of the GAL3ST2 and GAL3ST4 sulfotransferases and used these to express small GFP-tagged reporters containing tandem repeat (TR) sequences derived from human mucins (40). We used the platform to dissect the binding specificity of galectin-4 (GAL-4) for 3-O-sulfo-core1 O-glycans.

Prediction of human sulfotransferases serving simple O-glycans
Fig. S1 provides an overview of identified human Golgi sulfotransferase genes and their predicted roles in sulfation of glycolipids, glycoproteins, and GAGs (19,38,41). The largest groups of sulfotransferase paralogs are predicted to act in the biosynthesis of GAGs, whereas three groups (CHST8-10, CHST1-7, and GAL3ST1-4) are predicted to serve glycolipids and glycoproteins, including keratan sulfate (KS). These latter groups of sulfotransferases cannot be unambiguously assigned to specific types of glycoconjugates and glycosylation pathways, except for GAL3ST1 that only serves glycolipids (42). Reviewing the available reported data on substrate specificities of the remaining 13 sulfotransferases (please see references in Table S1), we predicted that GAL3ST2 and GAL3ST4 were the primary candidates for directing 3-O-sulfation of the Gal residue in the simple core1 O-glycan (T) structure, and that CHST1 and CHST3 potentially directed 6-O-sulfation of core1 and/or the GalNAc residue of the Tn O-glycan (Fig. 1A). GAL3ST2 was previously found in in vitro assays to exhibit 3-O-sulfotransferase activity with both Galβ1-3GalNAc-O-benzyl and Galβ1-3/4GlcNAc disaccharide substrates, whereas GAL3ST4 was shown to only use the Galβ1-3GalNAc-O-benzyl substrate (43,44). The CHST1 6-O-sulfotransferase functions not only in KS biosynthesis (45) but also in 6-O-sulfation of Gal residues in extended core1 or with GalNAc in sialylated core1 (mSTa) as demonstrated in in vitro studies in Chinese hamster ovary cells (36). CHST3 appears to be the evolutionarily closest sulfotransferase to CHST1 and is reported to function in biosynthesis of chondroitin sulfate and KS (46). To install sulfation of ST, T, and Tn O-glycans, we therefore considered these four sulfotransferases (Fig. 1A).

Engineering strategy for 3-O-sulfated core1 O-glycans
The HEK293 cell line is widely used for recombinant expression of glycoproteins, and the wildtype cell has capacity for production of core1 and core2 O-glycans capped by sialic acids (Fig. S2). RNA-Seq transcriptomics analysis of suspension HEK293 wildtype (HEK293 WT) cells indicated that most of the sulfotransferases predicted to serve glycoproteins, including GAL3ST2 and GAL3ST4, were not expressed (fragments per kilobase of transcript per million mapped reads [FPKMs] <1) (Fig. S3). We used a targeted knockin (KI) strategy to introduce GAL3ST2 or GAL3ST4 in combination with knockout (KO) of competing glycosyltransferases using a validated guide RNA (gRNA) targeting library previously described (41) (Figs. 1A and S2). We first installed GAL3ST2 or GAL3ST4 to generate HEK293 KI GAL3ST2 and HEK293 KI GAL3ST4 and then eliminated core2 O-glycan branching by KO of GCNT1 to establish HEK293 KO GCNT1, KI GAL3ST2 and HEK293 KO GCNT1, KI GAL3ST4. Subsequently, we eliminated sialylation by double KO of ST3GAL1/2 encoding the main α2-3 sialyltransferases capping core1 O-glycans to generate HEK293 KO GCNT1, ST3GAL1/2, KI GAL3ST2 and HEK293 KO GCNT1, ST3GAL1/2, KI GAL3ST4 cell lines. This engineering strategy is predicted to provide homogeneous unsubstituted core1 substrate for the introduced sulfotransferases and optimal conditions for sulfation. Note that we did not eliminate the α2-6 sialyltransferases (ST6GALNAC1-6) that function on core1 to form dST and monosialyl-T (mSTb) with or without the terminal α2-3 linked sialic acid. We also used the same KO designs without installation of a sulfotransferase as counterparts to generate HEK293 KO GCNT1 and HEK293 KO GCNT1, ST3GAL1/2 (Table S2).
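The combinatorial design described in this paragraph can be summarized in a short sketch. It is only an illustration of how the named cell lines arise from three KO backgrounds combined with three KI options; the variable and function names are hypothetical, and the list covers only the lines named in this paragraph, not the full library in Table S2.

```python
from itertools import product

# KO backgrounds named in the text: none, core2 KO, core2 + alpha2-3 sialylation KO.
KO_BACKGROUNDS = [(), ("GCNT1",), ("GCNT1", "ST3GAL1/2")]
# KI options named in the text: no sulfotransferase, GAL3ST2, or GAL3ST4.
KI_OPTIONS = [None, "GAL3ST2", "GAL3ST4"]


def line_name(knockouts, knockin):
    """Build a cell line label in the naming style used in the text."""
    parts = []
    if knockouts:
        parts.append("KO " + ", ".join(knockouts))
    if knockin:
        parts.append("KI " + knockin)
    return "HEK293 " + (", ".join(parts) if parts else "WT")


# Every combination of KO background and KI option.
CELL_LIBRARY = [line_name(ko, ki) for ko, ki in product(KO_BACKGROUNDS, KI_OPTIONS)]

if __name__ == "__main__":
    for name in CELL_LIBRARY:
        print(name)
```

Enumerating the design this way makes it easy to check that each sulfotransferase line has a matching control without the KI in the same KO background.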

Evaluation of sulfation capacities in engineered cell lines
To evaluate effects of the engineering, we first explored lectin and antibody profiling of live nonpermeabilized cells.
Exposure of core1 T O-glycans with lectin (peanut agglutinin [PNA]) and mAbs (3C9, HH8) (47) was used to evaluate the effects of KI of GAL3ST2 and GAL3ST4 (Fig. 2). HEK293 WT and HEK293 KO GCNT1 cells did not expose T O-glycans, but strong binding of all reagents was induced by neuraminidase treatment. Combining KO of GCNT1 and ST3GAL1/2 (HEK293 KO GCNT1, ST3GAL1/2 ) also induced the same strong binding with all anti-T reagents, and the binding was not affected by neuraminidase treatment for the mAbs but slightly for PNA. In contrast, HEK293 KO GCNT1, ST3GAL1/2, KI GAL3ST2 or to greater extent HEK293 KO GCNT1, ST3GAL1/2, KI GAL3ST4 with KI of sulfotransferases showed reduced or no binding. These reaction patterns were not substantially affected by neuraminidase pretreatment suggesting that the GAL3ST4 and to a lesser extent GAL3ST2 were effective in capping exposed core1 T O-glycans with sulfate.
Previously, GAL3ST2 was suggested to preferentially function with Galβ1-4GlcNAc (LacNAc) substrates compared with core1 (43). We therefore took advantage of the mAb NUH2 directed against the disialyl-I branched glycolipid with α2-3Neu5Ac capping and reactivity with HEK293 WT cells (38,48). KO of GCNT1 slightly enhanced binding of NUH2 (approximately twofold) for unknown reasons, but KI of GAL3ST2 completely abrogated binding of NUH2, to a level similar to that obtained by KO of the I-branching GCNT2 gene (38), whereas KI of GAL3ST4 reduced the binding to levels comparable to HEK293 WT cells (Fig. 2). This suggests that GAL3ST2 efficiently sulfates the I branched glycolipid and presumably competes with sialylation by ST3GAL6 (38).
To evaluate whether CHST1 and CHST3 sulfated Tn O-glycans, we monitored exposure of Tn with the Vicia villosa agglutinin lectin and the mAb 5F4. HEK293 WT cells were barely reactive with these, whereas HEK293 KO COSMC cells were strongly reactive, and this was abolished by KI of ST6GALNAC1 that results in STn (NeuAcα2-6GalNAcα1-O-Ser/Thr) O-glycans as expected (Fig. S4A). KI of CHST1 and CHST3 did not produce substantial changes, suggesting that sulfation of Tn by these sulfotransferases even in the absence of competing galactosylation and sialylation does not occur or is limited.

Sulfation of recombinant expressed secreted mucin TR reporters
To further evaluate sulfation capacities on O-glycosylation, we expressed previously designed human secreted mucin TR O-glycan reporters in the engineered cells (Figs. 1B and S5) (38,40). We first analyzed the purified mucin1 reporter by SDS-PAGE and Western blot analysis and observed clear shifts in mobility when expressed in cells with different O-glycosylation capacities (Fig. 3A). Loss of core2 O-glycans (HEK293 KO GCNT1) enhanced migration, whereas further loss of core1 sialylation (HEK293 KO GCNT1, ST3GAL1/2) slowed migration. Importantly, KI of GAL3ST2 and GAL3ST4 in the latter cells with O-glycosylation limited to core1 without α2-3 sialylation (HEK293 KO GCNT1, ST3GAL1/2, KI GAL3ST2 and HEK293 KO GCNT1, ST3GAL1/2, KI GAL3ST4) reverted the faster migration and for GAL3ST4 even increased the migration further, indicating efficient sulfation. The effect of sialic acids on migration was evaluated by neuraminidase treatment, and the migration changes induced by GAL3ST2 and GAL3ST4 were unaffected (Fig. S6A). We further analyzed the mucin1 reporter expressed in HEK293 WT cells with endogenous sialylation capacity (HEK293 KI GAL3ST2 and HEK293 KI GAL3ST4) and without core2 (HEK293 KO GCNT1, KI GAL3ST2 and HEK293 KO GCNT1, KI GAL3ST4) (Fig. S6B). KI of GAL3ST2 in the background of core2 O-glycosylation in HEK293 WT cells induced a marked shift in migration of the mucin1 reporter, whereas this did not occur in the background of core1 O-glycosylation (HEK293 KO GCNT1, KI GAL3ST2). This indicates that GAL3ST2 can override 3-O-sulfation on the LacNAc disaccharide of the core2 branch. In contrast, KI of GAL3ST4 did not change mobility of the mucin1 reporter in the WT background but produced a slight shift in migration in combination with KO of GCNT1, which suggests that GAL3ST4 prefers the core1 substrate (Fig. S6B). We also analyzed the purified mucin1 reporters by ELISA with PNA lectin and anti-T mAbs (Fig. 3C), which corroborated the findings.
We then analyzed the potential of CHST1 or CHST3 for 6-O-sulfation of Tn with the mucin1 reporter expressed in HEK293 KO COSMC (Fig. 3B). While KI of CHST1 did not appear to affect the migration of the reporter, we did observe a distinct shift toward faster migration with KI of CHST3. This suggested that CHST3 does confer some degree of sulfation of Tn in the absence of competition by sialylation, although we could not demonstrate this with flow cytometry analysis relying on loss of anti-Tn reactivity (Fig. S4). It is important to note, though, that low levels of sulfation capping may not substantially inhibit binding of the anti-Tn reagents.
Finally, we tested the engineering of 3-O-sulfation of core1 O-glycans with mucin2 and mucin7 TR reporters (Fig. S7). The reporter design enables facile exchange of mucin TRs (38,40), and we stably expressed secreted TR reporters for mucin2 or mucin7 in HEK293 KO GCNT1, ST3GAL1/2, KI GAL3ST2 and HEK293 KO GCNT1, ST3GAL1/2, KI GAL3ST4 cells. Interestingly, the mucin7 reporter produced the same shift to faster migration as found for the MUC1 reporter, whereas the mucin2 reporter only produced a slight shift suggesting that the mucin2 TR is less prone to core1 sulfation.

Analysis of the mucin1 TR reporter by MS
The human mucin1 TRs of 20 amino acids contain a conserved PDTR sequence motif enabling endo-AspN digestion and bottom-up MS analysis. The mucin1 reporter comprises 6.5 TRs, each with five potential O-glycosites and a maximum of 34 O-glycosites (Fig. S5). LC-MS/MS analysis of the mucin1 reporter expressed with Tn (HEK293 KO COSMC) and T (HEK293 KO GCNT1, ST3GAL1/2) O-glycans showed that all potential O-glycosites were nearly fully occupied, with the Thr in PDTR being the most variable site and slightly lower occupancy for T compared with Tn O-glycosylation (Fig. S8, A and B). We observed a very low amount of an excess of one HexNAc compared with the total number of Ser/Thr potential O-glycosites, as discussed previously (40). A previous study has predicted the presence of GalNAc-GalNAc O-glycan in human meconium (49). Glycosylation of this site is performed by the GalNAc-T4 isoenzyme and requires prior glycosylation at other sites in the mucin1 TR (50). Analysis of the sulfated mucin1 reporters expressed in cells with 3-O-sulfation capacities (HEK293 KO GCNT1, ST3GAL1/2, KI GAL3ST2 and HEK293 KO GCNT1, ST3GAL1/2, KI GAL3ST4) showed that the predominant mucin1 TR glycoforms were not sulfated, but glycoforms with 1 to 3 sulfate groups on core1 O-glycans per TR were clearly found, and the general O-glycan occupancy was slightly reduced from 4-5 to 3-4 O-glycans per TR (Fig. S8, C and D). Because of the low quality of MS/MS data for sulfated glycopeptides, quantitative analysis of the characteristic precursor ions was performed purely by extracted-ion chromatogram of the theoretical masses corresponding to sulfo-T glycopeptides with various distributions of T and sulfated T epitopes per TR. MS/MS spectra from HCD scans confirmed the existence of the sulfated T on the TR (Fig. S8D), but the quality of the obtained MS/MS spectra did not enable us to identify specific sites of sulfation.
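As an illustration of how theoretical precursor masses for such extracted-ion chromatograms could be tabulated, the following minimal sketch computes m/z values for one TR glycopeptide carrying T and sulfo-T O-glycans. It is not the authors' workflow. The peptide sequence (the 20-residue mucin1 repeat as released by AspN cleavage N-terminal to Asp), the charge states, the use of protonated ions, and all function names are assumptions made for the example; the monoisotopic masses are standard values.

```python
# Monoisotopic residue masses (Da) for the amino acids present in the repeat.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "D": 115.02694, "H": 137.05891, "R": 156.10111,
}
WATER = 18.010565
PROTON = 1.007276
HEXNAC = 203.07937   # GalNAc residue
HEX = 162.05282      # Gal residue
SO3 = 79.95682       # mass added by one sulfate ester

# Assumed AspN peptide of one mucin1 tandem repeat (not given in the text).
MUC1_TR = "DTRPAPGSTAPPAHGVTSAP"


def peptide_mass(seq):
    """Neutral monoisotopic mass of the unmodified peptide."""
    return sum(RESIDUE[aa] for aa in seq) + WATER


def glycopeptide_mz(seq, n_t, n_sulfo_t, charge):
    """m/z of the [M+zH]z+ ion with n_t core1 and n_sulfo_t sulfated core1 glycans."""
    core1 = HEXNAC + HEX
    mass = peptide_mass(seq) + (n_t + n_sulfo_t) * core1 + n_sulfo_t * SO3
    return (mass + charge * PROTON) / charge


def xic_targets(seq, glycan_range=(3, 5), max_sulfates=3, charges=(2, 3, 4)):
    """List (n_T, n_sulfo_T, charge, m/z) for the glycoform range noted in the text."""
    rows = []
    for n_glycans in range(glycan_range[0], glycan_range[1] + 1):
        for n_sulfo in range(0, min(max_sulfates, n_glycans) + 1):
            for z in charges:
                mz = glycopeptide_mz(seq, n_glycans - n_sulfo, n_sulfo, z)
                rows.append((n_glycans - n_sulfo, n_sulfo, z, mz))
    return rows


if __name__ == "__main__":
    for n_t, n_s, z, mz in xic_targets(MUC1_TR):
        print(f"T={n_t} sulfo-T={n_s} z={z} m/z={mz:.4f}")
```

Each added sulfate shifts the precursor by 79.95682/z in m/z, which is the spacing one would look for between the unsulfated and sulfated glycoforms at a given charge state.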
To further address stoichiometry of the O-glycosylation, we released the TR O-glycodomain from the GFP-mucin1 reporter by Lys-C digestion and performed intact MS analysis of the C4 HPLC-isolated domain (Fig. 4A). It was determined that the mucin1 TRs are highly occupied with Tn as well as T with the predominant identified glycoforms being those with 33 to 34 HexNAc or Hex-HexNAc residues and thus not reflecting the slightly lower occupancy observed with the T glycoform by the bottom-up analysis (Fig. S8B). Note that samples were pretreated with neuraminidase to remove potential α2-6 sialic acids to the internal GalNAc residue. Next, we analyzed the sulfated mucin1 reporter expressed in cells with 3-O-sulfation capacities (HEK293 KO GCNT1, ST3GAL1/2, KI GAL3ST2 and HEK293 KO GCNT1, ST3GAL1/2, KI GAL3ST4 ), which confirmed that sulfation was incomplete with the predominant glycoforms predicted to have 24 to 30 core1 O-glycans and 4 to 7 sulfated core1 O-glycans (Fig. 4, C and D). The most sulfated detectable proteoform for both GAL3ST2 and GAL3ST4 KI was 21 T and 13 sulfate groups.
Finally, despite the finding that mucin1 expressed in HEK293 KO COSMC, KI CHST3 resulted in a slight change in SDS-PAGE migration (Fig. 3B), we were unable to identify evidence of sulfation in the intact mass and bottom-up analysis (data not shown).

Discussion
The mammalian cell lines, HEK293 and Chinese hamster ovary, widely used for recombinant expression of glycoproteins do not have sulfation capacities for N-glycoproteins and O-glycoproteins and therefore offer nascent systems to reconstruct sulfation of different types of glycans. The immature truncated T, Tn, and STn O-glycans represent prominent cancer-associated glycosylation features found only on O-glycoproteins, including mucins and mucin-like glycoproteins (19), and capping of these by sulfation has been reported (36,37). Here, we aimed at shedding light on the genetic and biosynthetic regulation of sulfation of these cancer-associated O-glycans, and we were able to establish stable engineered HEK293 cells with 3-O-sulfation capacities. Introduction of GAL3ST4 provided selective 3-O-sulfation of core1 O-glycans (Galβ1-3GalNAc), whereas introduction of GAL3ST2 provided broader 3-O-sulfation of core1 and core2 O-glycans (Galβ1-3GalNAc and Galβ1-4GlcNAc) as well as for glycolipids and N-glycans. We were unable to convincingly demonstrate efficient 6-O-sulfation capacity for the truncated Tn O-glycan by introduction of CHST1 or CHST3, but CHST3 remains a candidate since KI of this gene induced a shift in migration of the Tn-mucin1 reporter (Fig. 3B).
We used prior knowledge of sulfotransferases mainly derived from in vitro enzyme assays and phylogenetic analysis to identify the most likely candidate sulfotransferases for sulfation of the T and Tn cancer-associated O-glycans (Fig. S1). The putative sulfotransferases involved in the synthesis of 3-O-Sulfo-T and 6-O-Sulfo-Tn O-glycans were predicted to be found among the 3-O-sulfotransferase isoenzymes, GAL3ST2 and GAL3ST4, and the 6-O-sulfotransferase isoenzymes, CHST1 and CHST3, respectively (Fig. S1) ) that potentially outcompete sulfation. We may not have identified and studied the most optimal enzymes. The 3-O-sulfation of core1 by GAL3ST2 and GAL3ST4 was only demonstrated convincingly in the absence of α2-3 sialylation (Fig. S6, A and B), and the sulfation with these was incomplete with most T O-glycans unmodified despite lack of competition for the T-acceptor substrate as evaluated by intact and bottom-up MS analysis (Figs. 4 and S8). Interestingly though, the immunochemical analysis using flow cytometry (Fig. 2) and ELISA (Fig. 3) indicated that sulfation by GAL3ST4, and to a lesser extent GAL3ST2, substantially reduced binding with anti-T reagents. This was particularly evident not only with the mAbs HH8 and 3C9 but also with the more sensitive PNA lectin. Moreover, KI of the GAL3ST sulfotransferases produced clear shifts in migration of the rather distinct mucin1 reporter bands by SDS-PAGE analysis, and with GAL3ST2, the band comigrated with the sialylated glycoforms, whereas with GAL3ST4, the band migrated even faster than this (Figs. 3A and S6). Structural analysis of sulfated glycopeptides by LC-MS/MS is challenging. Enrichment of glycopeptides is needed, and sulfated glycans pose particular challenges because of poor ionization and instability of sulfate groups during the MS/MS fragmentation process. Stabilization of the SO3 group by an ion-pairing reagent is a method of choice for identification of glycosites in sulfated glycopeptides, and the use of 3K (Lys-Lys-Lys) ion pair complexes has improved analysis of sulfated N-glycopeptides (53). Recent improvements in analysis in negative ion mode after sialic acid neutralization have advanced analysis of sulfated glycans (8,10), and use of the negative ion mode is a promising approach for analysis of sulfoglycopeptides (54,55). Here, we were unable to further define the degree and position of the sulfation introduced, but our study suggests that 3-O-sulfation, at least for core1 O-glycans, may be an inefficient process compared with sialylation.
Our interest in enabling the production of core1 (T) and Tn O-glycans with sulfate partly stems from a hypothesis that these glycoforms may represent important tumor-associated epitopes that have hitherto been overlooked partly because of analytic constraints and partly because of lack of appropriate antibodies and binding proteins. The T, Tn, and STn O-glycan epitopes constitute the most prevalent cancer-associated glycan epitopes (56,57), and 3-O-sulfation of the Gal and 6-O-sulfation of both Gal and GalNAc in the core1 O-glycan have been identified on mucin1 purified from a breast cancer cell line, as well as on mucin7 isolated from arthritis patients (15,36,37). We previously demonstrated how availability of cells with homogeneous O-glycoforms allows for the production of immunogens that can be used to elicit novel antibodies with highly select specificity for aberrant O-glycoforms of glycoproteins (58), and we envisage that the developed cell expression system for sulfated core1 O-glycans can be used to develop such probes.
We were unable to convincingly demonstrate sulfation of Tn, although the SDS-PAGE analysis of the mucin1 reporter expressed in cells with CHST3 and not CHST1 did suggest some degree of sulfation (Fig. 3B). CHST1 was initially found to 6-O-sulfate the Gal residue in KS and later reported to have activity with mSTa but relatively lower or no activity toward T and Tn structures by in vitro assays (36,45). More recently, CHST1 was shown to induce strong binding of Siglec-7 and 15 to core1 O-glycans (25,36,45). None of the 6-O-sulfotransferases have been reported to sulfate GalNAc in the Tn structure, whereas CHST1 is reported to sulfate GalNAc in mSTa, and CHST3, which can sulfate the GalNAc residue in chondroitin sulfate, is evolutionarily the closest to CHST1 (36,46,59) (Fig. S1).
Engineering sulfation capacities in HEK293 cells provides a valuable platform for cell-based display of sulfated glycans and interrogation of the specificities of glycan-binding proteins (25,38,60,61). Glycoengineering in mammalian cells has mainly focused on sialylation, and only few studies have addressed sulfation (13,62). We used the cell-based glycan display platform combined with expression of mucin reporters to confirm the binding specificity of GAL-4 for sulfated O-glycans. GAL-4 was previously shown to bind core1 O-glycans with strong preference for 3-O-sulfated O-glycans (51,52), and this was fully recapitulated with the display platform where GAL-4 binding to WT and core1 engineered HEK293 cells was negligible or weak, respectively, whereas KI of GAL3ST2 and GAL3ST4 induced strong binding that was further amplified by expression of mucin reporters (Fig. 5). Recently, we used the same cell engineering strategy and discovered natural ligands related to sulfation for several Siglecs (Siglec-3, Siglec-7, Siglec-8, and Siglec-15) that required expression of the CHST1 6-O-sulfotransferase (25). We were unable to demonstrate clear functions for CHST1 as well as CHST3 in 6-O-sulfation of the cancer-associated Tn O-glycan. CHST1 is the key player for the synthesis of KS, which is the ligand of Siglec-8, and is predicted to primarily 6-O-sulfate the galactose residue in LacNAc (Galβ1-4GlcNAc) and α2-3 sialylated LacNAc (63-65). Moreover, CHST1 was reported to have activity on GalNAc residues with mSTa but has relatively lower or no activity toward T or Tn O-glycans in vitro (36), and recently, CHST1 was found to affect core1 O-glycosylation and Siglec-7/15 binding (25). CHST3 was the other likely candidate for synthesis of 6-O-sulfated Tn given its known role in 6-O-sulfation of GalNAc residues (46,59).
We simplified the complexity of glycosylation by KO of the core2 synthase (GCNT1) and sialyltransferases (ST3GAL1/2), which resulted in production of highly homogeneous molecules, with homogeneity further ensured by neuraminidase treatment of isolated reporters before MS analysis. The mucin1 TR reporters with simplified (Tn and T) and homogeneous O-glycans enabled intact mass analysis of the released O-glycodomain (Fig. 4), which provided results comparable to the bottom-up analysis (Fig. S8). While intact MS analysis is a powerful tool to explore the glycoproteoform distribution (66), it is clearly still a challenge to apply this for analysis of the inherent microheterogeneity of glycoproteins (67). Combining intact MS and bottom-up LC-MS/MS analysis is often necessary to deduce glycostructures and glycosites in full. Although the MS/MS spectra quality did not provide confident evidence to identify sulfoglycopeptides, HCD MS/MS spectra showed the presence of sulfo-T oxonium fragment ions in the mucin1 TRs (Fig. S8D, inset). A combination of the intact mass analysis of the full mucin1 TR construct together with the extracted ion chromatogram for the predicted sulfated glycopeptide masses from corresponding bottom-up data suggested that the most abundant glycoproteoform was fully occupied by the core1/T structure with one sulfate group for at least four of seven TRs with KI of GAL3ST2 (Figs. 4C and S8C). The most abundant glycoproteoform with KI of GAL3ST4 was predicted to be with at least one sulfo-T residue per TR (Fig. 4D). For both samples, the highest glycoproteoforms detected were with up to 21 T and 13 sulfo-T epitopes, which would correspond to at least two sulfate groups per TR. We were unable to identify sulfation of the Tn O-glycans on the mucin1 reporter expressed in cells with KI of CHST3 despite the clear shift in SDS-PAGE migration, and further advancements are clearly needed.
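The stoichiometry quoted for the most sulfated proteoform can be checked with simple arithmetic. The sketch below is only an illustration using the numbers stated in the text (34 potential O-glycosites, 6.5 TRs, 21 T and 13 sulfo-T epitopes); the function and key names are hypothetical.

```python
TOTAL_SITES = 34   # maximum O-glycosites on the mucin1 reporter (from the text)
N_REPEATS = 6.5    # tandem repeats in the reporter (from the text)


def summarize(n_t, n_sulfo_t):
    """Occupancy and sulfation ratios for a proteoform with n_t T and n_sulfo_t sulfo-T."""
    occupied = n_t + n_sulfo_t
    return {
        "occupied_sites": occupied,
        "occupancy": occupied / TOTAL_SITES,
        "sulfated_fraction": n_sulfo_t / occupied,
        "sulfates_per_tr": n_sulfo_t / N_REPEATS,
    }


if __name__ == "__main__":
    # Most sulfated proteoform reported for both GAL3ST2 and GAL3ST4 KI.
    print(summarize(21, 13))
```

With 21 T and 13 sulfo-T, all 34 sites are occupied and the 13 sulfates spread over 6.5 TRs average two per repeat, in line with the estimate given above.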
In summary, we have developed a sustainable cell-based platform for display and production of mucins with sulfated O-glycans. The strategy can be expanded to other sulfotransferases and with focus on more complex O-glycans or other glycoconjugates.

Experimental procedures
Gene targeting in HEK293 cells
HEK293 cells were cultured in Dulbecco's modified Eagle's medium (Sigma) supplemented with 10% fetal bovine serum (Sigma) and 4 mM GlutaMAX (Gibco) at 37 °C and 5% CO2. Gene targeting was performed by CRISPR/Cas9 KO with a validated gRNA library for all human glycosyltransferases (GlycoCRISPR) and by site-directed KI using a modified ObLiGaRe-targeted KI strategy as previously described (41,68). Briefly, HEK293 cells were seeded in 6-well plates 1 day prior to transfection and cotransfected by using Lipofectamine 3000 with 1 μg of gRNA and 1 μg of GFP-tagged Cas9-PBKS for CRISPR/Cas9 KO or with 1.5 μg of each ZFN tagged with GFP/crimson and 3.5 μg donor plasmid for targeted KI. Cells were harvested 24 h after transfection, and the GFP-labeled bulk pool of cells was enriched by fluorescence-activated cell sorting (SH800; SONY). After 1 week in culture, cells were single sorted into 96-well plates. KI clones were screened by junction PCR with primers specific for the junction area between the donor plasmid and the AAVS1 locus, and a primer set flanking the targeted KI locus was used to characterize the allelic insertion status. All KO clones were screened by the indel profile of target genes by Indel Detection by Amplicon Analysis (69), and selected clones were further verified by Sanger sequencing.

NHS-biotin labeling of GAL-4
Recombinant intact human GAL-4 (amino acids 1-323; NP_006140.1) without added tags was produced from vector pET28c in Escherichia coli BL21 Star (DE3) cells (Invitrogen) and then purified by affinity chromatography on lactosyl-sepharose as previously described for galectin-3 (70). A 20-fold molar excess of NHS-Biotin (Thermo Fisher Scientific) dissolved in dimethyl sulfoxide was added to a 2 mg/ml GAL-4 solution in PBS buffer. After 2 h incubation on ice, labeled GAL-4 was separated from the unreacted biotin reagent by a buffer change to PBS on a PD10 column.
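For orientation, the amount of reagent implied by a 20-fold molar excess can be estimated as in the sketch below. The protein molecular weight (about 36 kDa for the 323-residue GAL-4), the 1 ml reaction volume, and the NHS-Biotin molecular weight (341.38 g/mol) are assumptions that are not stated in the text and should be replaced with the values for the actual protein batch and reagent lot; the function name is hypothetical.

```python
def nhs_biotin_dose(protein_mg_per_ml, protein_mw_da, volume_ml,
                    fold_excess=20.0, reagent_mw=341.38):
    """Return micromoles of protein and the reagent amount for a given molar excess.

    reagent_mw defaults to an assumed NHS-Biotin molecular weight (g/mol).
    """
    # mg divided by g/mol gives mmol; multiply by 1000 for micromoles.
    protein_umol = protein_mg_per_ml * volume_ml / protein_mw_da * 1000.0
    reagent_umol = protein_umol * fold_excess
    # micromoles times g/mol gives micrograms; divide by 1000 for mg.
    reagent_mg = reagent_umol * reagent_mw / 1000.0
    return {
        "protein_umol": protein_umol,
        "reagent_umol": reagent_umol,
        "reagent_mg": reagent_mg,
    }


if __name__ == "__main__":
    # 2 mg/ml GAL-4 (from the text); 36 kDa and 1 ml are illustrative assumptions.
    print(nhs_biotin_dose(2.0, 36000.0, 1.0))
```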

Cell binding assays
Cell binding assays were performed on ice with mAbs, biotinylated lectins, and biotinylated GAL-4. Optionally, cells were treated with 150 mU/ml Clostridium perfringens neuraminidase (Sigma) for 1 h at 37 °C. All mouse mAbs, 3C9, HH8, 5F4, and NUH2 (71) and biotinylated PNA and Vicia villosa agglutinin lectins (Vector Laboratories) were incubated at different concentrations for 1 h, followed by washing and incubating with Alexa Fluor 488-conjugated rabbit anti-mouse IgG/IgM (Thermo Fisher Scientific) or Alexa Fluor 488-conjugated streptavidin (Thermo Fisher Scientific) for 1 h. To assess GAL-4 binding, cells were prewashed in 100 mM lactose to remove the endogenous binding and incubated with biotinylated GAL-4 on ice for 30 min followed by incubation with Alexa Fluor 647-conjugated streptavidin (Thermo Fisher Scientific) for 30 min. Washing was performed with PBS-bovine serum albumin (BSA) (1× PBS containing 1% BSA [Sigma-Aldrich]), and cells were resuspended for flow cytometry analysis (SA3800; SONY).

Expression and purification of human mucin reporter proteins
Secreted and transmembrane human mucin TR reporters for mucin1, mucin2, and mucin7 were designed and synthesized as previously reported (Fig. 1B) (38,40). HEK293 cells were transfected with the secreted mucin TR reporters, and stable pools were established by G418 selection. After enrichment of the GFP-positive population by fluorescence-activated cell sorting (SH800; SONY), cells were seeded at a density of 5 × 10^5 cells/ml in 200 ml of serum-free F17 culture medium (Thermo Fisher Scientific) supplemented with 0.1% Kolliphor P188 (Sigma) and 4 mM GlutaMAX and cultured at 37 °C and 5% CO2 under constant agitation (120 rpm). After 4 to 5 days of culture, the supernatant was harvested and purified by nickel affinity chromatography. Media were mixed 3:1 (v/v) with 4× binding buffer (100 mM sodium phosphate, pH 7.4, and 2 M NaCl) and applied to a self-packed nickel-nitrilotriacetic acid affinity resin column (Qiagen) pre-equilibrated in washing buffer (25 mM sodium phosphate, pH 7.4, 500 mM NaCl, and 20 mM imidazole). After washing, bound protein was eluted with 200 mM imidazole in washing buffer, analyzed by SDS-PAGE with Coomassie staining, and desalted on a PD-10 column (GE Healthcare). Purified mucins were quantified with the BCA Protein Assay Kit (Thermo Fisher Scientific). Western blot analysis was performed using NuPAGE Bis-Tris 4 to 12% gels (Thermo Fisher Scientific) with transfer to nitrocellulose membranes (0.45 μm; Bio-Rad) for 60 min at 100 V constant. Membranes were blocked with 5% skimmed milk in Tris-buffered saline with Tween-20 for 1 h at room temperature (RT), incubated at 4 °C overnight with 0.2 μg/ml anti-FLAG M2-Peroxidase-HRP-conjugated mAb (Sigma) diluted in 5% skimmed milk in Tris-buffered saline with Tween-20, developed with Pierce ECL substrate (Thermo Fisher Scientific), and visualized on a BioSpectrum (UVP BioImaging Systems).
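The 3:1 (v/v) mixing of medium with 4× binding buffer can be checked with a simple dilution calculation. The sketch below assumes that the culture medium itself contributes no sodium phosphate or NaCl, which is a simplification; the function name is hypothetical.

```python
def final_concentration(stock_conc, stock_parts, sample_parts):
    """Concentration of a buffer component after mixing stock with sample.

    Assumes the sample contributes none of the component (simplification).
    """
    return stock_conc * stock_parts / (stock_parts + sample_parts)


# 3 parts medium + 1 part 4x binding buffer (100 mM phosphate, 2 M NaCl)
phosphate_mM = final_concentration(100.0, stock_parts=1, sample_parts=3)
nacl_mM = final_concentration(2000.0, stock_parts=1, sample_parts=3)

# Total cells seeded: 5 x 10^5 cells/ml in 200 ml
cells_seeded = 5e5 * 200

if __name__ == "__main__":
    print(f"phosphate: {phosphate_mM} mM, NaCl: {nacl_mM} mM")
    print(f"cells seeded: {cells_seeded:.1e}")
```

Under this assumption the loaded sample contains 25 mM sodium phosphate and 500 mM NaCl, matching the washing buffer apart from the 20 mM imidazole, and the 200 ml culture starts from 1 × 10^8 cells.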

Intact mass analysis
Samples were analyzed by EASY-nLC 1200 UHPLC (Thermo Fisher Scientific) interfaced via a nanoSpray Flex ion source to an Orbitrap Fusion/Lumos instrument (Thermo Fisher Scientific) using the "high" mass range setting in the m/z range 700 to 4000. The instrument was operated in "low pressure" mode to provide optimal detection of intact protein masses. MS parameter settings were as follows: spray voltage, 2.2 kV; source fragmentation energy, 35 V. All ions were detected in the Orbitrap at a resolution of 7500 (at m/z 200). The number of microscans was set to 20. The nLC was operated in a single analytical column setup using PicoFrit Emitters (New Objective; 75 μm inner diameter) packed in-house with C4 phase (Dr Maisch; particle size of 3.0 μm, column length of 16-20 cm). [...] (m/z 350-2000) of intact peptides was acquired in the Orbitrap at the nominal resolution setting of 120,000, followed by Orbitrap HCD-MS2 at the nominal resolution setting of 60,000 of the five most abundant multiply charged precursors in the MS1 spectrum; a minimum MS1 signal threshold of 50,000 was used for triggering data-dependent fragmentation events. Targeted MS/MS analysis was performed by setting up a targeted MSn Scan Properties pane.
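To illustrate what the m/z 700 to 4000 acquisition window used for intact mass analysis means in practice, the following sketch lists which protonated charge states of a protein fall inside that window. The 30 kDa neutral mass is a hypothetical example value, not a mass reported in this study, and only simple [M+zH]z+ ions are considered.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton


def charge_states_in_window(neutral_mass, mz_min=700.0, mz_max=4000.0):
    """Return (z, m/z) pairs of [M+zH]z+ ions that fall inside the window."""
    states = []
    z = 1
    while True:
        mz = (neutral_mass + z * PROTON_MASS) / z
        if mz < mz_min:
            break  # m/z decreases with z, so no higher charge state can fit
        if mz <= mz_max:
            states.append((z, mz))
        z += 1
    return states


if __name__ == "__main__":
    # Hypothetical 30 kDa glycoprotein (assumption for illustration)
    for z, mz in charge_states_in_window(30000.0):
        print(f"z = {z:2d}  m/z = {mz:.3f}")
```

For this example mass, charge states 8+ through 42+ are covered by the window.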

MS data analysis
Glycan and glycopeptide compositional analysis was performed from m/z features extracted from LC-MS data using in-house written SysBioWare software (72). For m/z feature recognition from full MS scans, the Minora Feature Detector node of Proteome Discoverer 2.2 (Thermo Fisher Scientific) was used. The list of precursor ions (m/z, charge, and peak area) was imported as ASCII data into SysBioWare, and compositional assignment within 3 ppm mass tolerance was performed. The main building blocks used for the compositional analysis were the theoretical monoisotopic mass increments of NeuAc, Hex, HexNAc, and dHex and the theoretical monoisotopic mass increment of the most prominent peptide corresponding to each potential glycosite. To generate the potential glycopeptide list, all glycoforms with an abundance higher than 5% of the most abundant glycoform were used for glycan feature analysis. Raw spectra for intact mass analysis were deconvoluted to zero charge with BioPharma Finder software (Thermo Fisher Scientific) using default settings. Glycoproteoforms were annotated with in-house written SysBioWare software (72) using the average masses of hexose and N-acetylhexosamine and the known backbone mass increment of the mucin1 TR reporter.
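The compositional assignment described above can be sketched as a brute-force search over building-block counts. This is a minimal illustration and not the SysBioWare implementation: the function names, the maximum count per building block, and the example peptide mass are hypothetical, and the residue masses are standard monoisotopic values that are not given in the text.

```python
from itertools import product

# Standard monoisotopic residue mass increments (Da) of the building blocks
# named in the text; the values themselves are not from the article.
RESIDUE_MASS = {
    "NeuAc": 291.095417,
    "Hex": 162.052824,
    "HexNAc": 203.079373,
    "dHex": 146.057909,
}


def assign_compositions(neutral_mass, peptide_mass, max_count=4, tol_ppm=3.0):
    """Return (composition, ppm error) pairs matching within tol_ppm."""
    names = list(RESIDUE_MASS)
    hits = []
    for counts in product(range(max_count + 1), repeat=len(names)):
        theo = peptide_mass + sum(n * RESIDUE_MASS[k]
                                  for n, k in zip(counts, names))
        ppm = (neutral_mass - theo) / theo * 1e6
        if abs(ppm) <= tol_ppm:
            hits.append((dict(zip(names, counts)), ppm))
    return hits


def filter_by_relative_abundance(glycoforms, threshold=0.05):
    """Keep (name, area) pairs whose area exceeds threshold x the largest area."""
    top = max(area for _, area in glycoforms)
    return [(name, area) for name, area in glycoforms if area > threshold * top]


if __name__ == "__main__":
    peptide = 2000.0  # hypothetical peptide monoisotopic mass (Da)
    observed = peptide + 291.095417 + 162.052824 + 203.079373
    for comp, ppm in assign_compositions(observed, peptide):
        print(comp, f"{ppm:+.2f} ppm")
    print(filter_by_relative_abundance([("A", 100.0), ("B", 4.0), ("C", 6.0)]))
```

More than one composition can fall within the tolerance for a given mass, so in practice candidate assignments would still need to be checked against MS/MS data.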

Data availability
All data generated or analyzed during this study are included in this article and supporting information files. The MS proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE (73) partner repository with the dataset identifier PXD028982.
Supporting information-This article contains supporting information (5, 19, 36, 38, 42-46, 60, 74-108, 109-112).

Conflict of interest-University of Copenhagen has filed a patent application on the cell-based display platform. GlycoDisplay ApS, Copenhagen, Denmark, has obtained a license to the field of the patent application. Y. N. and H. C. are cofounders of GlycoDisplay ApS and hold ownerships in the company. H. L. is cofounder and consultant with Galecto, Inc. All other authors declare that they have no conflicts of interest with the contents of this article.