Probing the contribution of individual polypeptide GalNAc-transferase isoforms to the O-glycoproteome by inducible expression in isogenic cell lines

The GalNAc-type O-glycoproteome is orchestrated by a large family of polypeptide GalNAc-transferase isoenzymes (GalNAc-Ts) with partially overlapping contributions to the O-glycoproteome besides distinct nonredundant functions. Increasing evidence indicates that individual GalNAc-Ts co-regulate and fine-tune specific protein functions in health and disease, and deficiencies in individual GALNT genes underlie congenital diseases with distinct phenotypes. Studies of GalNAc-T specificities have mainly been performed with in vitro enzyme assays using short peptide substrates, but recently quantitative differential O-glycoproteomics of isogenic cells with and without GALNT genes has enabled a more unbiased exploration of the nonredundant contributions of individual GalNAc-Ts. Both approaches suggest that fairly small subsets of O-glycosites are nonredundantly regulated by specific GalNAc-Ts, but how these isoenzymes orchestrate regulation among competing redundant substrates is unclear. To explore this, here we developed isogenic cell model systems with Tet-On inducible expression of two GalNAc-T genes, GALNT2 and GALNT11, in a knockout background in HEK293 cells. Using quantitative O-glycoproteomics with tandem-mass-tag (TMT) labeling, we found that isoform-specific glycosites are glycosylated in a dose-dependent manner and that induction of GalNAc-T2 or -T11 produces discrete glycosylation effects without affecting the major part of the O-glycoproteome. These results support previous findings indicating that individual GalNAc-T isoenzymes can serve in fine-tuned regulation of distinct protein functions.

GalNAc-type (mucin-type) O-glycosylation (hereafter simply O-glycosylation) is arguably one of the most abundant and complex types of protein glycosylation, and it is predicted that over 80% of proteins passing through the secretory pathway are O-glycosylated (1). O-Glycosylation is initiated by the transfer of GalNAc moieties to the hydroxyl group of Ser, Thr, and, less frequently, Tyr residues, a reaction catalyzed by up to 20 polypeptide GalNAc-transferases (GalNAc-Ts).² O-Glycans are further sequentially elaborated by a large number of glycosyltransferases to produce diverse structures with various biological functions (2-4). O-Glycans are found both in dense mucin-like regions of proteins and in more isolated positions, and the latter may serve to fine-tune protein functions and, for example, co-regulate limited pro-protein processing, ectodomain shedding, signaling, and receptor dimerization (2,3,5-7). No apparent consensus motifs have emerged for O-glycosylation in general or for individual GalNAc-Ts, and apart from features such as glycosites often having adjacent Pro, Ser, or Thr residues, we still have little understanding of how specific sites are selected for glycosylation (1,8,9).
The large family of GalNAc-T isoenzymes clearly orchestrates the selection of sites, and there is now substantial evidence demonstrating that distinct GalNAc-Ts initiate O-glycosylation, whereas others follow up by incorporating additional O-glycans at adjacent sites (9,10). In addition, "long-range" O-glycosylation is directed by the GalNAc-T C-terminal lectin domain, which recognizes partially glycosylated substrates and directs glycosylation toward sites ~10 residues away (8,11-14). Studies of the initiating type of GalNAc-T enzymes (mainly GalNAc-T1, -T2, -T3, -T6, and -T11) have revealed quite broad substrate specificities with only a small subset of glycosites identified as being uniquely glycosylated by individual isoforms (15-19). The substrate specificities of GalNAc-Ts have mainly been explored by in vitro enzyme assays using short peptide and glycopeptide substrates (12,20,21), but more recently, use of isogenic cell lines with knockout (KO) and/or knock-in (KI) of individual GALNT genes coupled with the development of quantitative differential O-glycoproteomics has provided more comprehensive insight into the contributions of individual GalNAc-Ts to the O-glycoproteome (22-25). Perhaps surprisingly, this latter approach has indicated that the nonredundant contributions of GalNAc-T isoenzymes to the O-glycoproteome appear to be minor, and the majority of O-glycosites are redundantly glycosylated among the isoforms present in a cell (22). However, this may be in agreement with the subtle phenotypes found in mice deficient in Galnt genes (26-29) as well as the distinct phenotypes found in individuals with congenital deficiencies in GALNT3 and GALNT2 (25,30).
The authors declare that they have no conflicts of interest with the contents of this article. This article contains Data S1, Tables S1-S3, and Figs. S1-S15. The mass spectrometry raw data have been deposited to the ProteomeXchange Consortium (70) via the PRIDE partner repository with the data set identifier PXD010155.
¹ To whom correspondence should be addressed: Copenhagen Center for Glycomics, Dept. of Cellular and Molecular Medicine, University of Copenhagen, Blegdamsvej 3, DK-2200 Copenhagen N, Denmark. Tel.: 45-35327797; Fax: 45-35367980; E-mail: schjoldager@sund.ku.dk.
² The abbreviations used are: GalNAc-T, GalNAc-transferase isoenzyme; KO, knockout; KI, knock-in; GWAS, genome-wide association study; HEK, HEK293; GOI, gene of interest; CHO SH, Chinese hamster "safe harbor"; MG, malachite green; TMT, tandem-mass-tag; PNA, peanut agglutinin; LWAC, lectin weak-affinity chromatography; FT, flow-through; HDL, high-density lipoprotein; RT, room temperature; HCD, higher-energy c-trap dissociation; ETD, electron transfer dissociation; ZFN, zinc finger nuclease; AAVS1, adeno-associated virus "safe harbor".
Congenital deficiencies caused by structural defects in the coding regions of GALNT genes are very rare (31); however, an increasing number of genome-wide association studies (GWASs) points to GALNT genes for diverse disease traits or dispositions (32). We recently pursued one GWAS candidate, GALNT2, associated with plasma HDL-cholesterol and triglyceride levels (33-35) and demonstrated that two humans with homozygous deleterious mutations in GALNT2 and animal Galnt2 deficiency models exhibited lowered plasma HDL-cholesterol, thus validating the GWAS. These genome-wide association variants are positioned in GALNT2 intron 1, in close proximity to a liver-specific transcriptional regulator, and we and others have demonstrated that these polymorphisms can affect GALNT2 transcription levels (25,36,37). Moreover, it appears that most human glycosyltransferase genes identified as GWAS candidates have diagnostic nucleotide variants positioned within intronic regions (32) and thus may also represent gene variants with altered regulation. Surveying the degrees of potential transcriptional regulation of glycosyltransferase genes by reported RNA-Seq transcriptome data indicated indeed that genes with predicted high transcriptional regulation were overrepresented among GWAS candidate genes (32). These findings pose a fundamental question: what is the need for fine-tuned regulation of the glycosylation process, and how may this regulation be achieved? The glycosylation process involves a large number of glycosyltransferases that act sequentially and competitively on diverse substrates, and the kinetic properties, substrate availabilities, and subcellular topologies of these enzymes direct the glycosylation outcome (19,38). However, many glycosyltransferases appear to be supersaturated in cells because, for example, proficient glycosylation can be achieved for recombinant glycoproteins over a wide range of expression levels (39).
Moreover, glycosyltransferase isoenzymes have overlapping redundant functions that may affect regulation of specific functions. The latter is particularly apparent for the large group of GalNAc-T isoenzymes that have a plethora of possible peptide/protein substrates and a high likelihood for redundant activity toward these, but presumably with great variation in kinetic properties. It is therefore of considerable interest to probe the effects of regulating such enzymes.
In this study, we developed isogenic human embryonic kidney HEK293 (HEK) cell lines with inducible expression of GalNAc-T2 or -T11, two isoenzymes endogenously expressed in these cells and in kidney. We constructed isogenic HEK cell lines with KO of GALNT2 or GALNT11 and by KI installed corresponding doxycycline-inducible expression cassettes with the coding region of GALNT2 or GALNT11. We applied a quantitative differential O-glycoproteomics approach to probe effects of gradual induction, up to and beyond the normal endogenous levels of the enzymes, and found selective and gradual induction of glycosylation of specific nonredundant substrates for each isoenzyme in an isoform-specific and dose-dependent manner. The study clearly demonstrates that regulation of glycosyltransferase isoenzymes can fine-tune glycosylation and the functions exerted on proteins.

Establishing isogenic cells with inducible GalNAc-T2 and -T11 expression
HEK cells express a subset of the 20 GALNT genes with the major isoenzymes expressed being GALNT1 and -T2; with lower levels of GALNT3, -T7, -T11, and -T18; and with barely detectable levels of GALNT6 and -T16 (Fig. S1). HEK cells produce mainly sialylated core 1 O-glycans with a minor amount of core 2 structures (40).
We first established HEK WT cells with KO of GALNT2 (HEKΔT2 clone 2A6) or GALNT11 (HEKΔT11 clone 7E5) using zinc-finger nucleases (ZFNs) (the generation of these cell lines will be described elsewhere).³ Loss of expression of GalNAc-T2 and -T11 was verified by direct immunocytology of cells either trypsinized or grown on coverslips with validated mAbs to GalNAc-T2 (4C4) and -T11 (1B2) (15,41). For GalNAc-T2, flow cytometry of permeabilized cells was also performed. Next, we installed doxycycline-inducible expression cassettes with full coding constructs of human GalNAc-T2 or -T11 in intron 1 of the PPP1R12C gene by KI at a locus known as the adeno-associated virus "safe harbor" (AAVS1) (42) in the corresponding KO cells (Fig. 1A).
The design was built on the PrIITE platform, where the Tet-On 3G (Tet-On) transactivator gene and inducible gene of interest (GOI) are introduced at separate loci (43). We predicted that it would be possible to stably incorporate both the Tet-On 3G gene and the inducible GOI at the same locus without loss of performance, and we used a tandem KI strategy where the first cassette provides a landing pad for the second cassette. This strategy continuously provides a fresh target sequence for the alternating pairs of ZFN used. Paramount to attaining low basal expression or "leakiness" is to prevent transcriptional overlap from the first cassette to the second cassette harboring the inducible GOI, which we expected to achieve by including insulator sequences (500 nucleotides) flanking the cassettes (Fig. 1A).
First, the pCMV-TETOn3G cassette was stably integrated at the AAVS1 locus in HEKΔT2 and HEKΔT11 cell lines using ZFN-mediated KI (Fig. 1B). Cells expressing Tet-On, as confirmed by AAVS1 junction PCR and the ability to induce a GFP reporter construct, were isolated either by single cell sorting or
FACS. Tet-On-positive cells were subjected to a second round of ZFN-mediated KI for insertion of the coding regions of GALNT2 or GALNT11 under control of the pTRE3G-inducible promoter, at the Chinese hamster "safe harbor" (CHO SH) landing pad within the pCMV-TETOn3G cassette (Fig. 1A and Fig. S2A). Single-sorted cells were screened by immunocytology for the ability to induce expression of the relevant GalNAc-T using mAbs. The AAVS1 KI architecture of the final HEK ind T2 clone (clone 385) and HEK ind T11 clone (clone 40) (designated HEK ind T2 and HEK ind T11, respectively) was mapped by AAVS1 junction PCRs spanning the left, internal, and right junction (Fig. S2A). The HEK ind T2 clone was concluded to have a monoallelic insertion of the Tet-On cassette, likely with a tandem integration of the GALNT2 cassette, based on the finding that the internal junction PCR was negative, whereas the right junction PCR was positive, indicating that the cassette had been inserted with a compromised 5′-end (Fig. S2B). We investigated this further by using different internal junction PCR primer pairs and found that a truncated form of the GALNT2 cassette had been integrated in tandem (Fig. S3A). However, we were still unable to obtain a positive internal junction between the Tet-On and the GALNT2 cassette with this
primer pair (G-E). The HEK ind T11 clone produced positive PCR products for all junctions and was therefore considered to have biallelic integration of the Tet-On cassette and monoallelic integration of the GALNT11 cassette (Fig. S2C). Proposed structures, based on junction PCRs, of the KI architectures for HEK ind T2 and HEK ind T11 are shown in Fig. S3 (B and C, respectively). The genetic engineering did not appear to affect growth behavior or gross morphology, and both clones induced reproducibly over more than 20 passages and several freeze/thaw cycles. The HEK ind T2 and HEK ind T11 clones were initially characterized by immunocytology, and there was no detectable staining with mAbs to GalNAc-T2 or -T11 when cells were cultured in the absence of doxycycline. Growth in 50 ng/ml doxycycline hyclate (hereafter simply doxycycline) induced strong juxtanuclear Golgi-like staining similar to the subcellular localization found in WT HEK cells, and this staining pattern coincided with the staining produced for the Golgi marker Giantin (Fig. S4). GalNAc-T2 and -T11 were readily induced, and enzyme expression could be detected after overnight doxycycline induction. To conveniently study the effect of induced GalNAc-T2 or -T11, while allowing time for biosynthesis of new glycosylation acceptor proteins, we chose to fix the induction time at 48 h for all experiments.
To further characterize the induction dynamics and determine dose windows of interest, we used indirect antibody staining and flow cytometry to quantify the expression of GalNAc-T2 in fixed and permeabilized HEK ind T2 cells with graded induction from 0 to 100 ng/ml doxycycline for 48 h (Fig. 2). In the absence of inducer, there were no cells with detectable expression of GalNAc-T2, in agreement with the immunocytology analysis. The graded induction resulted in a discrete global right shift of the lower-expressing population as the concentration of doxycycline was increased from 0 to 16 ng/ml (Fig. 2B). Importantly, this low to intermediate response corresponds to the expression levels of the endogenous enzyme in HEK WT cells (Fig. 2A, bottom). By increasing the doxycycline concentration further from 50 to 100 ng/ml, we observed a biphasic induction mode with intermediate and high-level expressing cells and a maximum limit of induction ~10-fold above the endogenous HEK WT levels. At these concentrations of doxycycline, it has been reported that mitochondrial functions may be disturbed (44). Thus, in the following experiments, we chose to study HEK ind T2 induced with 0-16 ng/ml doxycycline within the range of WT expression. In parallel, we performed secondary antibody and IgG isotype controls for all samples. The controls showed no change in staining upon GalNAc-T2 induction (Fig. S5). GalNAc-T11 expression in HEK WT is relatively low compared with GalNAc-T2 (Fig. S1), and we were only able to detect weak expression by immunocytology and not by flow cytometry (not shown). However, induction of GalNAc-T11 expression with 0-10 ng/ml doxycycline appeared by immunocytology to reach HEK WT levels (not shown).

Functional validation of GalNAc-T2 induction with a biosensor
We took advantage of a recently developed GalNAc-T2-specific cell-based biosensor based on an isoform-specific glycosylation site that regulates proprotein convertase processing of angiopoietin-like 3 (ANGPTL3) (45,46). The biosensor is composed of a fluorogen-activating protein (FAP) fused to a blocking domain (BD) by a linker sequence containing the O-glycosite and proprotein convertase cleavage site of ANGPTL3 (Fig. 1D). The absence of GalNAc-T2-mediated glycosylation leads to furin cleavage of the linker, departure of the blocking domain, and dimerization of the fluorogen-activating protein domains, which at the cell surface bind and activate the dye malachite green (MG). The biosensor is GFP-tagged, allowing measurement of both sensor expression and activation. Cells stably expressing the T2 biosensor or a control ΔGly biosensor without the T2-specific glycosylation site were generated by puromycin selection and FACS enrichment and subsequently used for sensor activation studies. Live cell images of sensor activation are shown in Fig. S6. Already at 4 ng/ml doxycycline we observed a decrease in MG fluorescence for the T2 biosensor, and further induction resulted in a gradual decrease in MG fluorescence quantitated by flow cytometry (Fig. 3). This is in agreement with induction of GalNAc-T2 leading to an increase in linker glycosylation and blockage of furin cleavage. As expected, activation of the control ΔGly biosensor was not affected by GalNAc-T2 induction.

Quantitative analysis of the O-glycoproteome during graded induction of GalNAc-Ts
We modified our quantitative O-glycoproteomic workflow by using multiplex tandem-mass-tag (TMT) amine-reactive labeling (47) of tryptic peptide digests from cell lysates followed by peanut agglutinin lectin weak-affinity chromatography (PNA LWAC) enrichment of O-glycopeptides (25) (Fig. 1E). The TMTsixplex allowed us to study four levels of induction, the uninduced state, and the original HEK WT cells. Based on the immunocytology, flow cytometry, and biosensor experiments, we chose to induce HEK ind T2 or HEK ind T11 cells for 48 h at the following concentrations of doxycycline: 0 (uninduced), 4, 8, 12, and 16 ng/ml for HEK ind T2 and 0 (uninduced), 2.5, 5, 7.5, and 10 ng/ml for HEK ind T11. HEK WT cells were cultured in parallel in the absence of doxycycline. In a separate experiment, we also performed a quantitative differential analysis of cell lysates from HEK WT and HEKΔT2 to identify GalNAc-T2-specific substrates.
We performed an LC-MS3 analysis of the TMT-labeled tryptic digest before O-glycopeptide enrichment (ratio check) and an LC-MS/MS deep proteome analysis of the LWAC flow-through (FT) (Fig. 1E). Each induction level serves as a technical replicate, and in the ratio check (Figs. S7A and S8A and Data S1 (B and F)), we found no substantial variations in the proteome except for increased abundance of peptides from the induced GalNAc-T enzyme in the corresponding HEK ind T2 and HEK ind T11 cells (Fig. 4). This trend was also seen in the deep proteome analysis of the LWAC FT (Figs. S11 and S12 and Data S1 (D and H)), where both of the induced enzyme protein sequences had ~40% coverage and peptides showed comparable induction levels. Comparing quantitation of GalNAc-T2- and GalNAc-T11-derived peptides across all levels of induction to HEK WT, we confirmed that the induced GalNAc-T protein levels are comparable with those of WT cells.
The LWAC-enriched O-glycopeptides were analyzed by LC-MS/MS to produce the HEK ind T2 and HEK ind T11 O-glycoproteome data, which can be found in Data S1 (C and G, respectively) and are summarized in Data S1A. Two representative mass spectra showing O-glycopeptide identification and quantification are shown in Fig. S13. The identified O-glycopeptides and glycoproteins were comparable between the two data sets (Fig. 5) and similar to previously obtained data sets derived from human cell lines, plasma, and platelets (Fig. S9) (25,48). Further, the data included 136 glycoproteins not previously identified in HEK cells, and in agreement with previously reported data, the majority of the identified O-glycopeptides (~90%) contained one or two O-glycans, and only a minor fraction contained three or more (Data S1A).
Figure legend (fragment): LC-MS analysis of TMT-labeled tryptic peptides (ratio check) was performed. Protein abundance was normalized to HEK ind T2 and HEK ind T11 cultured in the absence of doxycycline (Dox) and log2-transformed. HEK ind T2 reaches cumulative HEK WT GalNAc-T2 levels at 8 ng/ml doxycycline and GalNAc-T11 in HEK ind T11 at less than 2.5 ng/ml. For GalNAc-T2, the data are representative of analysis of two independent inducible clones, and for GalNAc-T11, data represent analysis of one clone.
To identify O-glycopeptides differentially affected by induction of GalNAc-T2 or -T11, we first normalized quantified O-glycopeptides to 0 ng/ml (uninduced), log2-transformed the -fold change values, and collapsed unambiguous O-glycopeptides identified multiple times. We analyzed histograms of the transformed quantitation ratios for each of the two isoforms, and for both GalNAc-T2 and -T11, we observed a rather narrow normal distribution with the appearance of right-skewed positive quant ratios upon induction (Figs. S7B and S8B). Given the highly uniform TMT-labeling and the low intersample variation observed for induced cultures (Figs. S7A and S8A), we selected a log2-fold change threshold of 0.8 to group differentially induced O-glycopeptides. For the HEKΔT2 versus HEK WT differential O-glycoproteome (Data S1J), we selected a rather conservative cut-off ΔT2/WT log2-fold change of −3.4 (mean minus 2 S.D. values) (Fig. S10) for selection of candidates for GalNAc-T2-specific glycosylation, similar to what was used in previous studies (22).
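The normalization, log2 transformation, collapsing, and thresholding steps described for the differential analysis can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function names, the input layout (a glycopeptide identifier paired with a dose-to-intensity mapping), the use of the median to collapse repeated identifications, and the use of the sample standard deviation for the KO cut-off are all assumptions, because the text does not specify them.

```python
import math
from collections import defaultdict
from statistics import mean, median, stdev


def log2_fold_changes(intensities, reference_dose=0):
    """Normalize reporter intensities to the uninduced channel and log2-transform."""
    reference = intensities[reference_dose]
    return {
        dose: math.log2(value / reference)
        for dose, value in intensities.items()
        if dose != reference_dose
    }


def collapse(identifications):
    """Collapse repeated identifications of one glycopeptide.

    The median is an assumption; the text only says duplicates were collapsed.
    """
    grouped = defaultdict(list)
    for peptide_id, intensities in identifications:
        grouped[peptide_id].append(log2_fold_changes(intensities))
    return {
        peptide_id: {dose: median(fc[dose] for fc in fcs) for dose in fcs[0]}
        for peptide_id, fcs in grouped.items()
    }


def induced_at(collapsed, dose, threshold=0.8):
    """Glycopeptides whose log2 fold change exceeds the threshold at a given dose."""
    return sorted(pid for pid, fc in collapsed.items() if fc[dose] > threshold)


def ko_cutoff(log2_ratios):
    """Mean minus 2 S.D. cut-off for KO/WT ratios (sample S.D. is an assumption)."""
    return mean(log2_ratios) - 2 * stdev(log2_ratios)


# Hypothetical toy intensities (doxycycline dose in ng/ml -> reporter intensity).
toy = [
    ("pepA", {0: 100.0, 4: 200.0, 8: 400.0}),
    ("pepA", {0: 100.0, 4: 400.0, 8: 400.0}),
    ("pepB", {0: 100.0, 4: 110.0, 8: 120.0}),
]
collapsed = collapse(toy)
print(induced_at(collapsed, 4))
```

With real data, the same functions would be applied per TMT channel, and the 0.8 threshold would be justified by the observed intersample variation, as described in the text.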

GalNAc-T2
At the lowest concentration of doxycycline (4 ng/ml) 17 O-glycopeptides (hereafter simply glycopeptides) were induced above the threshold, 105 at 8 ng/ml, 76 at 12 ng/ml, and 172 at 16 ng/ml. A line-plot summary of the O-glycoproteome showing the level of induction for each glycopeptide at each doxycycline concentration is shown in Fig. 6. Comparing the differentially induced individual glycopeptides revealed a group of glycopeptides induced already at 4 ng/ml, the lowest concentration of doxycycline (green lines in Fig. 6), and groups of glycopeptides only surpassing the threshold at the higher levels of induction (blue and orange lines in Fig. 6). This made us classify the induced glycopeptides into low-dose or early responders and higher-dose or late responders. The induced glycopeptides from each group partly overlapped with the glycopeptides identified as lost in the differential O-glycoproteome of HEK WT and isogenic KO cells, and interestingly, the overlap with low-dose responders (4 ng/ml, 8 of 12, 66%) was higher compared with higher-dose responders (8 ng/ml, 24 of 73, 32%; 12 ng/ml, 16 of 47, 34%; and 16 ng/ml, 15 of 118, 12%). Comparing the induced glycosylation sites with our previous in vitro enzyme specificity analysis (21,22,49) revealed that seven of eight in vitro T2-specific sites were among the low-dose responders and one was among the high-dose responders (Table S1). Nine of nine peptides with an isoform specificity other than T2 or no in vitro activity were among the high-dose responders (Table S2).
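The grouping into low-dose and higher-dose responders, and the overlap with glycopeptides lost in the KO cells, could be expressed along these lines. This is a hedged sketch: assigning each glycopeptide to the lowest dose at which it exceeds the threshold is an assumption about how the groups were formed, and the identifiers and helper names are hypothetical. The denominators reported in the text (for example 8 of 12) presumably count only glycopeptides quantified in both data sets, so the overlap helper expects input already restricted in that way.

```python
def first_response_dose(fold_changes, threshold=0.8):
    """Lowest dose at which the log2 fold change exceeds the threshold, else None."""
    for dose in sorted(fold_changes):
        if fold_changes[dose] > threshold:
            return dose
    return None


def classify(fold_changes, low_dose=4, threshold=0.8):
    """Label a glycopeptide by the dose at which it first responds (assumed rule)."""
    dose = first_response_dose(fold_changes, threshold)
    if dose is None:
        return "unaffected"
    return "low-dose responder" if dose <= low_dose else "higher-dose responder"


def overlap_fraction(induced_ids, ko_lost_ids):
    """Fraction of induced glycopeptides that were also lost in the KO comparison."""
    induced = set(induced_ids)
    if not induced:
        return 0.0
    return len(induced & set(ko_lost_ids)) / len(induced)


# Hypothetical log2 fold changes keyed by doxycycline dose (ng/ml).
early = {4: 1.0, 8: 1.5, 12: 2.0, 16: 2.2}
late = {4: 0.1, 8: 0.3, 12: 0.9, 16: 1.4}
flat = {4: 0.0, 8: 0.1, 12: 0.1, 16: 0.2}
print(classify(early), classify(late), classify(flat))
```

Applied to the counts reported above, `overlap_fraction` corresponds to ratios such as 8/12 for the 4 ng/ml group.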
Glycopeptides from nine glycoproteins were induced at 4 ng/ml (green in Fig. 6 and Data S1C), and among these was a glycopeptide from GalNAc-T2 itself (55 DLHHSNGEEK 64) (unambiguous glycosites in boldface type). This glycosite in GalNAc-T2 is not a specific substrate for the enzyme, and we interpret that this finding reflects induction of the enzyme protein and subsequent glycosylation by the other endogenous GalNAc-Ts expressed in HEK cells (Fig. S1) (Fig. 6A and Data S1C). In addition to the glycopeptides induced at the lowest level, we found specifically induced glycosylation on, for example, neuroligin-1/NLGN1 (677 QQPSPFSVDQR 687), ATF6-α/ATF6 (405 MNPSVSPANQR 415), G protein-coupled receptor 64/GPR64 (252 GPPFSSSQSIPVVPR 266), and solute carrier family 38 member 10/SLC38A10 (787 PGGRPAPSQDLNQR 801). Further increasing the level of induction (above 8 ng/ml) resulted in broader glycosylation effects with more general induction of a larger number of glycopeptides.
Importantly, there was no overlap among the low-dose responders in the HEK ind T2 and HEK ind T11 data sets (Fig. 7). Direct comparison of the 367 overlapping unambiguous glycopeptides from the two data sets showed that glycopeptides induced in HEK ind T2 were not induced in HEK ind T11 and vice versa, with the exception of a single glycopeptide from GRP-78/HSPA5/BiP (634 LYGSAGPPPTGEEDTAEKDEL 654). We also aligned the unambiguous single-site glycopeptides of each group for both the HEK ind T2 and HEK ind T11 data sets (Figs. S14 and S15). The number of aligned sequences is small, especially for the low-dose responders, because few peptides are both unambiguous and single-site. Still, we could validate GalNAc-T2's preference for Pro at position −1 and identified an enrichment of Pro at position +3 for medium- to high-dose responders. For GalNAc-T11, the LA module linker sequence XXC6XXXTC1XX is highly enriched for low- to medium-dose responders, and a more general GalNAc-T motif with Pro at position −1 appears for high-dose responders.
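The positional preferences mentioned above (for example Pro at position −1 or +3 relative to the glycosite) come from aligning single-site glycopeptides on the modified residue. A minimal sketch of such a tally is shown below; the helper name and the toy sequences are hypothetical and are not taken from the data sets, and a real analysis would use longer flanking windows and a background frequency for comparison.

```python
from collections import Counter


def residue_frequencies(glycopeptides, offset):
    """Residue frequencies at a fixed offset from the glycosite.

    glycopeptides: iterable of (sequence, zero-based index of the glycosylated residue).
    Peptides too short to cover the requested offset are skipped.
    """
    counts = Counter()
    for sequence, site_index in glycopeptides:
        position = site_index + offset
        if 0 <= position < len(sequence):
            counts[sequence[position]] += 1
    total = sum(counts.values())
    return {residue: n / total for residue, n in counts.items()} if total else {}


# Hypothetical toy peptides, each with the glycosite at index 3.
toy_peptides = [("AAPTAAPA", 3), ("GGPSGGAG", 3), ("AAATAAPA", 3)]
minus_one = residue_frequencies(toy_peptides, -1)
plus_three = residue_frequencies(toy_peptides, 3)
print(minus_one, plus_three)
```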
Taken together, we interpret the data to demonstrate that induction of GalNAc-T2 and -T11 in HEK cells produces discrete glycosylation of distinct O-glycosites in a tight and dose-dependent manner without affecting the major part of the O-glycoproteome, which is covered by functional redundancy among GalNAc-T isoforms, even though many of the unaffected sites serve as substrates for the induced enzymes in in vitro assays (21).

Discussion
Whereas it seems logical that an increase in the amount of a glycosyltransferase enzyme in a cell should lead to an increase in glycosylation efficiency, this may not be the case given the complex and intertwined pathways of glycan assembly, where multiple enzymes utilize the same substrates, and isoenzymes have overlapping functions but potentially different subcellular localizations. Moreover, it is conceivable that many glycosyltransferases are present at supersaturated levels given that, for example, high expression levels of recombinant therapeutic glycoproteins in mammalian cells do not appear to exhaust the glycosylation capacity (39). We chose to study the effects of enzyme dose in the context of the large family of GalNAc-T isoenzymes, especially because several of these enzymes appear to be involved in human disorders through dysregulation of transcription rather than through structural defects in the enzyme proteins (32). We chose to study two isoenzymes, GalNAc-T2, a major contributor to the O-glycoproteome of all cells and co-regulator of lipid metabolism, and GalNAc-T11, which selectively glycosylates the LDLR-related receptors (24,25,34). Although both enzymes have been shown to have broad substrate specificities by in vitro analyses, we found that gradual induction of either enzyme in isogenic cell lines resulted in an exquisite concomitant increase in glycosylation, selectively of their nonredundant substrate sites. The global O-glycoproteome was unaffected by the increase in GalNAc-T2 or -T11 except at induction levels 2-3-fold above the endogenous levels of these enzymes, where a more general but minor increase in the O-glycoproteome was found (Fig. 6).
Figure legend (fragment): ... (Fig. 4). When inducing GalNAc-T2 or -T11, we observe an increase in glycosylation, selectively of their nonredundant substrate sites, whereas the global O-glycoproteome is unaffected (gray lines). There is a strong agreement between reporter ion intensities in HEK WT and the response of the intensities upon induction. O-Glycopeptides selectively lost when comparing uninduced cells to HEK WT are rescued by enzyme induction in HEK ind T2 or HEK ind T11. For GalNAc-T2, the data are representative of analysis of two independent inducible clones, and for GalNAc-T11, data represent analysis of one clone.
Our insights into the functions of GalNAc-Ts and their isoform-specific contributions to the O-glycoproteome have so far been obtained using in vitro assays with recombinant soluble enzymes and short peptide substrates (10,20,21) and isogenic cell lines with complete loss or gain of individual GALNT genes (22,50). Additional information has been obtained from studies of deficiencies in model organisms (5,15,27,51-53) and, for GALNT3 and GALNT2, also in humans (25,30). Whereas in vitro studies with short synthetic peptides demonstrate quite broad substrate specificities with considerable overlap among isoforms, the available analyses of deficiencies in GALNTs in cells or organisms suggest more restricted and specific functions. Perhaps the most illustrative example is the isoform GalNAc-T3, which in vitro glycosylates a wide range of peptide substrates (54) but, by differential O-glycoproteomics of cell lines with KO of GALNT3, was identified to have only a small subset of nonredundant substrates (22,23), whereas in vivo studies suggest that GalNAc-T3 primarily co-regulates the proprotein processing of FGF23 and hence phosphate homeostasis (29,55). Another important example is GalNAc-T11, originally shown to have substrate specificities very similar to and overlapping with those of GalNAc-T1-T3 in vitro (15) but later identified by us as the only isoform glycosylating a highly conserved O-glycosite in the short linker regions XC6XXXTC1X present between the ligand-binding LDL class A repeats (LA modules) of all LDLR-related receptors (56). Moreover, these O-glycans and, in particular, their sialic acids potentiate receptor/ligand binding affinity and endocytosis, at least for LDLR and VLDLR (24). Interestingly, the Drosophila ortholog, dGalNAcT1 or l(2)35a, is essential for embryonal development (15) and appears to have an analogous function, glycosylating LA modules of the fly lipophorin receptors (24).
Human GALNT11 is a GWAS candidate gene associated with chronic kidney decline, and GalNAc-T11 is critical for glycosylation of the linker regions between the LA modules of the endocytic receptor LRP2. Whereas the functions of GalNAc-T11-mediated O-glycosylation of LRP2 remain to be established, deficiency of Lrp2 in mice causes proteinuria and kidney disease (57), suggesting that O-glycans on LRP2 are important for function (58).
Recent studies have demonstrated that common genetic variants in a regulatory locus of GALNT2 are likely drivers of the low-HDL and high-triglyceride genome-wide association traits (25,36,37), pointing for the first time to an important role of tight regulation and enzyme dose of a GalNAc-T isoenzyme. Further, the respective roles of GalNAc-T3 and -T11 in regulating the function of FGF23 and LDLR-related receptors suggest that expression of these isoenzymes could also be regulated as a means to fine-tune important and disease-relevant functions (3). In the present report, we produced isogenic cell models, which enabled us to study gradually increased levels of GalNAc-T2 or -T11 expression up to and beyond the endogenous levels of a HEK cell and to evaluate the global effect of enzyme dose on the O-glycoproteome. We demonstrate that a small subset of glycopeptides was induced at the lowest level of enzyme dose (early responders), whereas the majority of detectable glycopeptides were unaffected (Fig. 6). Previously, we identified a number of GalNAc-T2-specific candidates, including apolipoprotein C-III, angiopoietin-like 3, and phospholipid transfer protein, but none of these are expressed in HEK cells. Instead, for GalNAc-T2, the early responders were mainly localized to the endoplasmic reticulum (calnexin and GRP-78) or Golgi (GP73/GOLM1, TGN46, β4Gal-T3/B4GALT3, and PAPS transporter 1/SLC35B2). One explanation for these substrates being found among the early responders may be that they represent the limited exposure of the induced biosynthetic pool of GalNAc-T2 that is moving from
the endoplasmic reticulum to the Golgi. Interestingly, Gal-NAc-T2 was recently associated with PAPS transporter 1/SLC35B2, GP73/GOLM1, GOLIM4, and GLG1 as proteins depleted from endocytic vesicles in response to KO of AP-5 adaptor protein complex, suggesting involvement in the late endosome-to-Golgi retrieval process (59). For GalNAc-T11, the early responders identified were LRP1 and LRP2, which is in agreement with our previous findings and supports our hypothesis that O-glycosylation of these receptors serves regulatory functions (24). We recently characterized O-glycosylation of LDLR-related proteins in cell lines and rat organs, identifying LA linker glycosylation in LDLR, VLDLR, LRP1, LRP1B, SorLA, LRP8, and LRP2 (24), but in HEK cells, most of these genes are expressed at very low levels. In addition to the LRPs, we also identified a novel O-glycosite in the single LA linker of SPINT1, the matriptase inhibitor also known as HAI-1. In SPINT1, the LA domain affects the Kunitz domain 1 inhibition of matriptase (60), and it is possible that an O-glycan in the LA linker may modulate this effect. Besides glycosylation of LA linkers, increasing GalNAc-T11 expression induced a number of O-glycosites located in cysteine-rich globular domains. For example, an O-glycosite in the C terminus of XXYLT1 is located just 2 residues from one of four cysteine residues involved in disulfide bridge formation, providing structure to the catalytic domain (61). Similarly, the induced O-glycosite in podocalyxin is located in the most C-terminal part of the mucin domain, just N-terminal to a globular domain with two disulfide bridges. O-Glycosylation of podocalyxin affects ezrin/NHERF2 interaction and is essential to protect the integrity of the glomerulus (62).
The molecular basis for the observed tight regulation of highly select substrates is still unclear. The large group of common substrates that are not affected by the induction levels of a particular isoenzyme should in principle serve as a reservoir of competing substrates. The relative kinetic properties for individual O-glycosites are believed to drive selectivity and order of incorporation in peptides (19), but the isoform-specific glycosites utilized by individual GalNAc-Ts do not appear to be kinetically preferred, at least as evaluated by in vitro assays in the past (21,55). In particular, we were unable to show activity of GalNAc-T11 with peptides derived from LDLR linker regions, and only very weak activity was demonstrated with a folded construct containing multiple LA domains in vitro (56). We cannot rule out the possibility that higher-order organization of GalNAc-Ts in the Golgi and trans-Golgi network may drive the selectivity. Hassinen et al. (63) have suggested that heteromeric glycosyltransferase complex formation and Golgi stack pH drive specificity and velocity of the respective enzymes, and the GalNAc-Ts are believed to be differentially distributed throughout the Golgi, although this has only been studied in detail for GalNAc-T1-T3 (38). Further studies are definitely needed to resolve the mechanism, but it is clear that altering the level of expression of a single GalNAc-T drives specific and tight changes in the O-glycoproteome.
In conclusion, our study demonstrates that the expression levels of GalNAc-T2 and -T11 in HEK cells are not supersaturated at or below the endogenous expression levels of these enzymes. The presented data indicate that within reasonable expression levels, there is a tight correlation between enzyme dose and glycosylation efficiency and hence stoichiometry of O-glycans on highly select glycoproteins. Thus, it is plausible that tight regulation of GalNAc-Ts directly co-regulates specific functions, such as proprotein processing and ectodomain shedding. The results further suggest that even slight dysregulation of GalNAc-T expression levels can have adverse effects on glycosylation and resulting biological functions, and this may explain the many GWAS traits associated with the GALNT gene family.

ZFN gene-targeting plasmids and construction of donor plasmids
An AAVS1 CompoZr ZFN pair, targeting intron 1 of the human PPP1R12C gene at the AAV integration safe harbor site (ACCCCACAGTGGggccacTAGGGACAGGAT), and a CHO SH CompoZr ZFN pair targeting the landing pad (TCTTCCCCGACCCAGGTCACTTCTGGgttataGCTGAGACTCCGGACAGCATGCAACC) of EPB71 were obtained from Sigma. ZFN binding sites are shown in capital letters, and the linker cut site is shown in lowercase. To enable FACS enrichment and improve editing efficiency, ZFNs are tagged with a 2A peptide fused to GFP or Crimson (64).
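The upper/lowercase convention used for the ZFN target sites can be made explicit with a short script. The following is a minimal sketch in Python and is not part of the original methods; the function name `split_zfn_site` is hypothetical, and the example uses the AAVS1 site quoted above, with any layout hyphens stripped before parsing.

```python
import re

def split_zfn_site(site: str) -> tuple[str, str, str]:
    """Split a ZFN target site into (left binding site, linker, right binding site).

    Assumes the convention stated in the text: binding sites in capitals,
    linker/cut region in lowercase. Hyphens are treated as layout debris.
    """
    cleaned = site.replace("-", "")
    match = re.fullmatch(r"([ACGT]+)([acgt]+)([ACGT]+)", cleaned)
    if match is None:
        raise ValueError("expected an UPPER-lower-UPPER sequence layout")
    return match.group(1), match.group(2), match.group(3)

# AAVS1 site as quoted in the text
left, linker, right = split_zfn_site("ACCCCACAGTGGggccacTAGGGACAGGAT")
print(left, linker, right)
```

The regular expression deliberately rejects sequences with more than one lowercase stretch, so a site that does not follow the stated convention raises an error rather than being split silently.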
Knock-in donor plasmids were based on obligate ligation-gated recombination (ObLiGaRe) and designed as in Maresca et al. (65), the only modification being that inverted ZFN binding sites were positioned flanking the donor cassette (42). Synthetic ObLiGaRe donor vector frameworks possessing a multiple cloning site, a universal single-stranded donor oligonucleotide (ssODN) site, and a ZFN landing pad flanked by insulators and inverted ZFN binding sites were synthesized by Genewiz USA. EPB71 (Addgene ID 90018) was used for AAVS1 targeting and EPB69 for CHO SH targeting.

Cell culture and tandem knock-in
HEK cells were maintained in Dulbecco's modified Eagle's medium supplemented with 10% fetal bovine serum and 2 mM L-glutamine. ObLiGaRe-mediated KI was performed as described previously (43). In brief, ~60% confluent HEKΔT2 (2A6) and HEKΔT11 (7E5) cells in 6-well plates were transfected with GFP/E2 Crimson-tagged AAVS1 ZFN plasmids and pAAVS1-EPB71-TETOn3G-ObLiGaRe donor construct in a ratio of 1:1:3 μg, using Lipofectamine 3000 according to the manufacturer's instructions. 24 h after transfection, cells expressing both GFP and E2 Crimson were sorted out in a FACS ARIA III (BD Biosciences) as described previously (64). HEKΔT2 stably expressing the Tet-On transactivator were selected after eight passages by the ability to express EYFP when transfected with pTRE3G-EYFP and cultured in medium supplemented with 50 ng/ml doxycycline (D9891, Sigma). EYFP-positive cells were sorted, expanded, and subjected to a second round of ObLiGaRe-mediated KI by transfection with GFP/E2 Crimson-tagged CHO SH ZFN plasmids and pCHO_SH-EPB69-pTRE3G-GALNT2-ObLiGaRe donor construct in a ratio of 1:1:3 μg. Bulk-sorted cells were expanded and single cell-sorted.

Initial immunocytology screen
Expanded single-cell clones were cultured in 50 ng/ml doxycycline (D9891, Sigma) for 24 h before being harvested and seeded on Teflon-coated glass slides for microscopy. Slides with dried cells were fixed in ice-cold acetone for 7 min, dried, and incubated with anti-T2 hybridoma supernatant (4C4) overnight at 4°C. After washing, a rabbit anti-mouse IgG FITC-labeled secondary antibody (F0261, Dako A/S, Glostrup, Denmark) diluted 1:500 in PBS, 1% BSA was added for 1 h at room temperature (RT), protected from light. After a final wash, the cells were mounted with Fluoromount-G™ (00-4958-02, Thermo Scientific).

Junction PCR
Integration of the cassettes at the AAVS1 locus was verified by junction PCR using primer pairs shown in Fig. S2 and Fig. S3 and listed in Table S2. We used TEMPase Hot Start DNA Polymerase (Ampliqon) in combination with an ammonium buffer system and a final DMSO concentration of 2.5%. The touch-down PCR program is shown in Table S3. PCR products were visualized on 2% agarose gels with GelStar stain (Lonza).

Immunocytology
2.5 × 10⁴ cells were seeded on glass coverslips in 24-well plates. Cells were induced with doxycycline after 24 h by complete replacement of the culture medium and again after 24 h of induction. After 72 h of total culture time, cells were fixed in 4% paraformaldehyde for 15 min and washed three times in PBS. Fixed cells were permeabilized in block buffer (PBS + 5% FBS, 0.5% Triton X-100, and 50 mM glycine) for 30 min at RT, stained with anti-GalNAc-T2 (4C4) or anti-GalNAc-T11 (1B2) mouse hybridoma supernatant and polyclonal rabbit anti-giantin (1:1000) (66) for 1 h at RT, washed with PBS, and incubated for 1 h at RT with Alexa Fluor 647-conjugated goat anti-mouse and Alexa Fluor 488-conjugated goat anti-rabbit diluted 1:500 in block buffer. After a final wash in PBS, coverslips were mounted with Fluoromount-G™ (00-4958-02, Thermo Scientific). Images were captured by spinning disk confocal microscopy using an FEI CorrSight system utilizing 488- and 640-nm lasers and a ×63/1.4 numerical aperture oil objective. Captured z-stacks were collapsed for maximum intensity projection in ImageJ.

Characterization of induction dynamics by indirect flow cytometric measurement of GalNAc-T2 expression
1 × 10⁶ HEK ind T2 cells were seeded in T25 flasks and cultured for a total of 72 h before analysis. Cells were induced with 0, 4, 8, 12, 16, 50, and 100 ng/ml doxycycline after 24 h by complete replacement of the culture medium and again after 24 h of induction. HEK WT and HEKΔT2 cells were cultured in parallel in the absence of doxycycline. Cells were stained as described (67). In brief, trypsinized cells were washed in PBS; fixed with 4% paraformaldehyde for 10 min; permeabilized in PBS, 0.1% Triton X-100 for 4 min; washed in PBS; and blocked in PBS, 5% BSA for 30 min. 0.5-1 × 10⁶ cells were stained in a 200-μl volume with GalNAc-T2 antibody (4C4), at a concentration of 10 μg/ml in PBS, 1% BSA, for 1 h at RT. After washing in PBS, cells were incubated with Alexa Fluor 647-conjugated goat anti-mouse IgG (A-21235, Thermo Scientific) (1:1000 in PBS, 1% BSA) for 1 h at RT, washed, and stored in PBS, 1% BSA at 4°C until analysis. Secondary-only and IgG isotype controls were performed in parallel by replacing 4C4 with negative control mouse IgG1 (X0931, Dako A/S) diluted 1:100 in PBS, 1% BSA. A Fortessa flow cytometer (BD Biosciences) using a 540-nm laser and a 570/10-nm bandpass filter was used to analyze cells. A population of 1 × 10⁵ events was recorded for each analysis. Gates were set using FSC-A versus SSC-H for exclusion of debris and FSC-H versus FSC-W to gate for single cells. Events were plotted using FlowJo. Two independent clones, clone 385 (HEK ind T2) and clone 468, were analyzed.

Functional validation of GalNAc-T2 induction with an established biosensor
HEK ind T2 cells stably expressing biosensors were generated and imaged as described previously (46). In brief, HEK ind T2 cells were transfected with T2 biosensor (linker sequence: -KPRAPRGTPF-) or ΔGly T2 biosensor (linker sequence: -KPRAPRGGPF-) and selected with puromycin for 2 weeks. GFP-positive cells were sorted on a FACSVantage fluorescence cell sorter (BD Biosciences), expanded, and used for induction experiments. Sensor activation response to doxycycline was first investigated by live cell imaging. Cells were passed onto coverslips, induced for 48 h with 50 ng/ml doxycycline, and right before imaging incubated with 110 nM MG dye. Images were captured using an LSM 510 Meta DuoScan spectral confocal microscope equipped with a ×40 objective (Zeiss).
For flow experiments, 5 × 10⁴ HEK ind T2 sensor-expressing cells were seeded in 24-well plates and cultured for a total of 72 h before analysis. Cells were induced with 0, 4, 8, 12, 16, 50, and 100 ng/ml doxycycline after 24 h by complete replacement of the culture medium and again after 24 h of induction. Cells were released with 300 μl of PBS containing 110 nM MG dye and 5 mM EDTA for 5 min at 37°C. Up to 50,000 events were recorded on a FACSVantage flow cytometer (BD Biosciences). GFP was detected using a 488-nm laser and a 530/30-nm filter. MG was detected using a 635-nm laser and a 685/35-nm filter. Events were gated through FSC-A versus SSC-H to select viable cells before producing histograms and dot plots. Geometric means of the gated populations (Fig. 3A) were used to calculate the MG/GFP ratio (Fig. 3B). All analysis was performed in FlowJo.
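The MG/GFP ratio calculation can be illustrated with a short script. This is a minimal sketch in Python, not the FlowJo workflow used in the study; it assumes the ratio is the geometric mean of the MG channel divided by the geometric mean of the GFP channel for one gated population. The function names and the intensity values are hypothetical.

```python
import math

def geometric_mean(values):
    """Geometric mean of strictly positive fluorescence intensities."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be non-empty and strictly positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))

def mg_gfp_ratio(mg_events, gfp_events):
    """Ratio of geometric means (MG over GFP) for one gated population."""
    return geometric_mean(mg_events) / geometric_mean(gfp_events)

# Made-up intensities for illustration only (not data from the study)
mg = [100.0, 400.0]
gfp = [50.0, 200.0]
ratio = mg_gfp_ratio(mg, gfp)
print(round(ratio, 3))
```

Dividing by the GFP signal normalizes sensor activation to sensor expression, so the ratio can be compared between doxycycline doses even if expression of the biosensor itself varies between samples.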

Differential O-glycoproteomics
2.5 × 10⁶ HEK ind T2 or HEK ind T11 cells were seeded in T75 flasks and cultured for a total of 72 h and induced for 48 h before harvest. Cells were induced with doxycycline at 24 and 48 h by medium replacement. HEK ind T2 was induced at 0, 4, 8, 12, and 16 ng/ml and HEK ind T11 at 0, 2.5, 5, 7.5, and 10 ng/ml of doxycycline. 2.5 × 10⁶ HEK WT cells were cultured in parallel in the absence of doxycycline. Total cell lysates were prepared as described previously (22). In brief, packed cell pellets were lysed in 50 mM ammonium bicarbonate, 0.2% RapiGest SF Surfactant (Waters Corp.), and the lysate was homogenized by sonication. Cleared lysates were diluted in 50 mM ammonium bicarbonate to bring the final concentration of RapiGest below 0.2% before being subjected to reduction with DTT, alkylation with iodoacetamide, and digestion with trypsin (Roche Applied Science). Each tryptic digest was purified using a 1-ml Sep-Pak C18 column (Waters Corp.), and peptide concentration was measured on a NanoDrop. Equal amounts of each digest were analyzed by LC-MS to ensure sample uniformity before labeling; 200 μg of each digest was labeled by TMTsixplex™ isobaric labeling (Thermo Scientific) following the manufacturer's instructions, providing one channel for HEK WT and five channels for uninduced and induced HEK ind T2 or HEK ind T11. 1% of each labeling reaction was combined, and labeling efficiency was verified in an LC-MS ratio check. Labeled peptides were pooled and treated with neuraminidase (N3001, Sigma), and O-glycopeptides were enriched by PNA LWAC. Selected elution fractions were StageTip-purified, pooled, and fractionated using the Pierce™ high-pH reversed-phase peptide fractionation kit (Thermo Scientific) following the manufacturer's instructions. PNA LWAC flow-through (FT) was purified on a 1-ml Sep-Pak C18 column, and 100 μg of FT peptides were fractionated using the Pierce™ high-pH reversed-phase peptide fractionation kit.
For GalNAc-T2, two independent clones were analyzed, HEK ind T2 (clone 385) and clone 468. One inducible GalNAc-T11 clone HEK ind T11 (clone 40) was analyzed.

Mass spectrometry
An EASY-nLC 1000 UHPLC (Thermo Scientific) interfaced via a PicoView nanoSpray ion source (New Objectives) to an Orbitrap Fusion mass spectrometer (Thermo Scientific) was used for glycoproteomic and proteomic studies. Nano-LC was operated in a single analytical column setup using PicoFrit Emitters (New Objectives, 75-μm inner diameter) packed in-house with Reprosil-Pure-AQ C18 phase (Dr. Maisch, 1.9-μm particle size, ~19-cm column length), with a flow rate of 200 nl min⁻¹. All samples dissolved in 0.1% formic acid were injected onto the column and eluted in a gradient from 2 to 25% acetonitrile in either 95 (for glycoproteomic samples) or 155 min (for proteomic samples), from 25 to 80% acetonitrile in 10 min, followed by isocratic elution at 80% acetonitrile for 15 min (total elution time 120 or 180 min, respectively). The nanoSpray ion source was operated at 2.1-kV spray voltage and 300°C heated capillary temperature. A precursor MS1 scan (m/z 350–1,700) of intact peptides was acquired in the Orbitrap at a nominal resolution setting of 120,000. For glycoproteomic samples, the five most abundant multiply charged precursor ions in the MS1 spectrum at a minimum MS1 signal threshold of 50,000 were triggered for sequential Orbitrap HCD MS2 and ETD MS2 (m/z of 100–2,000). MS2 spectra were acquired at a resolution of 50,000 for both HCD MS2 and ETD MS2. Activation times were 30 and 200 ms for HCD and ETD fragmentation, respectively; isolation width was 4 mass units, and 1 microscan was collected for each spectrum. Automatic gain control targets were 1,000,000 ions for Orbitrap MS1 and 100,000 for MS2 scans, and the automatic gain control for the fluoranthene ion used for ETD was 300,000. Supplemental activation (20%) of the charge-reduced species was used in the ETD analysis to improve fragmentation. Dynamic exclusion for 60 s was used to prevent repeated analysis of the same components.
For proteomic samples, the 10 most abundant multiply charged precursor ions in the MS1 spectrum at a minimum MS1 signal threshold of 100,000 were triggered for sequential Orbitrap HCD MS2 at a resolution of 60,000. In addition, for some proteomic samples, a synchronous-precursor selection MS3 method was used for quantitative analysis (68,69). Polysiloxane ions at m/z 445.12003 were used as a lock mass in all runs. Raw data have been deposited to the ProteomeXchange Consortium (70) via the PRIDE partner repository with the data set identifier PXD010155.

Mass spectrometry data analysis
Data processing was performed using Proteome Discoverer version 1.4 software (Thermo Scientific) using the Sequest HT node as described previously (1) with minor changes. Briefly, all spectra were initially searched with full cleavage specificity, filtered according to the confidence level (medium, low, and unassigned), and further searched with the semi-specific enzymatic cleavage. In all cases, the precursor mass tolerance was set to 6 ppm and fragment ion mass tolerance to 20 milli-mass units. Carbamidomethylation on cysteine residues was used as a fixed modification. Methionine oxidation and HexNAc attachment to serine, threonine, and tyrosine were used as variable modifications for ETD MS2. All HCD MS2 data were preprocessed as described (1) and searched under the same conditions mentioned above using only methionine oxidation as a variable modification.
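To make the two tolerance settings concrete, the following minimal Python sketch shows how a relative tolerance in ppm and an absolute tolerance in milli-mass units can be checked. It is an illustration only and does not reproduce how Sequest HT applies these settings internally; the function names and example m/z values are hypothetical.

```python
def within_ppm(observed_mz, theoretical_mz, tol_ppm=6.0):
    """Relative tolerance: the allowed window scales with the theoretical m/z."""
    ppm_error = (observed_mz - theoretical_mz) / theoretical_mz * 1e6
    return abs(ppm_error) <= tol_ppm

def within_mmu(observed_mz, theoretical_mz, tol_mmu=20.0):
    """Absolute tolerance: 1 milli-mass unit is 0.001 on the m/z scale."""
    return abs(observed_mz - theoretical_mz) * 1000.0 <= tol_mmu

# Hypothetical precursor: a 5 ppm deviation passes, a 7 ppm deviation fails
print(within_ppm(1000.005, 1000.0), within_ppm(1000.007, 1000.0))
# Hypothetical fragment: a 15 mmu deviation passes, a 25 mmu deviation fails
print(within_mmu(500.015, 500.0), within_mmu(500.025, 500.0))
```

At m/z 1000, a 6 ppm window corresponds to 0.006 m/z units, which is narrower than the 0.020 allowed for fragment ions.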
For the quantitative analysis, only HCD MS2 spectra were used. In the case of ETD MS2 spectra, the group of TMT reporter fragment ions (m/z range of 126–132) was extracted from the adjacent HCD MS2 spectrum (the same precursor ions), concatenated with the corresponding ETD MS2 spectrum, and later used for quantification. Processing of the TMT MS3 data was performed using Proteome Discoverer version 2.1 software (Thermo Scientific).
All spectra were searched against a concatenated forward/reverse human-specific database (UniProt, January 2013, containing 20,232 canonical entries and another 251 common contaminants) using a target false discovery rate of 1%. False discovery rate was calculated using the target decoy peptide-spectrum match validator node. The resulting list was filtered to include only peptides with glycosylation as a modification.
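The principle behind target-decoy false discovery rate estimation can be sketched in a few lines of Python. This is a simplified illustration and not the algorithm implemented in the Proteome Discoverer validator node: it assumes each peptide-spectrum match carries a single score where higher is better, estimates the FDR as decoys divided by targets above a cutoff, and ignores tied scores. The function name and the example scores are hypothetical.

```python
def score_threshold_at_fdr(psms, max_fdr=0.01):
    """Return the lowest score cutoff at which decoys/targets stays <= max_fdr.

    psms: iterable of (score, is_decoy) pairs; higher scores are better.
    Returns None if no cutoff satisfies the requested FDR.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    threshold = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR among all matches scoring at or above this one
        if targets > 0 and decoys / targets <= max_fdr:
            threshold = score
    return threshold

# Made-up example: 300 target hits (scores 1000 down to 701) and 4 decoy hits
example = [(float(1000 - i), False) for i in range(300)]
example += [(750.5, True), (720.5, True), (710.5, True), (705.5, True)]
print(score_threshold_at_fdr(example))
```

In this made-up example the third decoy pushes the estimated FDR above 1%, so the cutoff settles on the last target score before it.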

Visualization of MS data
Raw lists of quantified O-glycopeptides were imported into R. O-Glycopeptides with missing quantification(s) were dropped. Glycopeptides with HCD misassigned glycans were dropped. Next, data were normalized to HEK ind T2 or HEK ind T11 cultured in 0 ng/ml doxycycline, and the ratios were log2-transformed. Subsequently, unambiguous HCD and ETD O-glycopeptides were independently collapsed to reduce redundancy. The two collapsed lists were rejoined with the list of ambiguous HCD glycopeptides, generating Data S1C (HEK ind T2), Data S1G (HEK ind T11), and Data S1J (HEKΔT2). A similar pipeline was applied to proteins quantified in the ratio check (Data S1, B and F) and LWAC FT (Data S1, E and I).
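The filtering and normalization steps can be sketched as follows. The original analysis was done in R; this Python version is only an illustration of the first steps (dropping incomplete entries, normalizing to the uninduced channel, and log2 transformation) and does not cover the collapsing of redundant glycopeptides. The function name, the channel labels such as `dox_0`, the peptide labels, and the intensities are all hypothetical.

```python
import math

def log2_ratios_to_uninduced(rows, reference="dox_0"):
    """Normalize TMT reporter intensities to the uninduced channel.

    rows: list of dicts with a 'peptide' label and an 'intensities' dict
    mapping channel name -> reporter intensity (None if missing).
    Entries with any missing or non-positive value are dropped.
    """
    result = []
    for row in rows:
        values = row["intensities"]
        if any(v is None or v <= 0 for v in values.values()):
            continue  # drop glycopeptides with missing quantification
        ref = values[reference]
        log2 = {ch: math.log2(v / ref) for ch, v in values.items()}
        result.append({"peptide": row["peptide"], "log2": log2})
    return result

# Made-up intensities for illustration only (not data from the study)
rows = [
    {"peptide": "pepA", "intensities": {"dox_0": 100.0, "dox_4": 200.0, "dox_16": 800.0}},
    {"peptide": "pepB", "intensities": {"dox_0": 100.0, "dox_4": None, "dox_16": 90.0}},
]
normalized = log2_ratios_to_uninduced(rows)
print(normalized)
```

On the log2 scale the uninduced channel is 0 by construction, so a dose-responsive glycopeptide shows increasing positive values across the doxycycline series, while unaffected glycopeptides stay near 0.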