The Lectin Domain of the Polypeptide GalNAc Transferase Family of Glycosyltransferases (ppGalNAc Ts) Acts as a Switch Directing Glycopeptide Substrate Glycosylation in an N- or C-terminal Direction, Further Controlling Mucin Type O-Glycosylation*

Background: ppGalNAc transferases, which initiate O-glycosylation, possess a poorly understood lectin domain.
Results: The lectin domain directs glycosylation in an N- or C-terminal direction in an isoform-specific manner.
Conclusion: Unanticipated isoform-specific directionality was revealed for modification of glycopeptide substrates.
Significance: A novel mechanism for controlling mucin type O-glycosylation has been discovered, based on tethered lectin domains specifying N- or C-terminal modification of glycopeptide substrates.

Mucin type O-glycosylation is initiated by a large family of polypeptide GalNAc transferases (ppGalNAc Ts) that add α-GalNAc to the Ser and Thr residues of peptides. Of the 20 human isoforms, all but one are composed of two globular domains linked by a short flexible linker: a catalytic domain and a ricin-like lectin carbohydrate binding domain. Presently, the roles of the catalytic and lectin domains in peptide and glycopeptide recognition and specificity remain unclear. To systematically study the role of the lectin domain in ppGalNAc T glycopeptide substrate utilization, we have developed a series of novel random glycopeptide substrates containing a single GalNAc-O-Thr residue placed near either the N or C terminus of the glycopeptide substrate. Our results reveal that the presence and N- or C-terminal placement of the GalNAc-O-Thr can be important determinants of overall catalytic activity and specificity that differ between transferase isoforms. For example, ppGalNAc T1, T2, and T14 prefer C-terminally placed GalNAc-O-Thr, whereas ppGalNAc T3 and T6 prefer N-terminally placed GalNAc-O-Thr. Several transferase isoforms, ppGalNAc T5, T13, and T16, display equally enhanced N- or C-terminal activities relative to the nonglycosylated control peptides. This N- and/or C-terminal selectivity is presumably due to weak glycopeptide binding to the lectin domain, whose orientation relative to the catalytic domain is dynamic and isoform-dependent. Such N- or C-terminal glycopeptide selectivity provides an additional level of control or fidelity for the O-glycosylation of biologically significant sites and suggests that O-glycosylation may in some instances be exquisitely controlled.


Mucin type protein O-glycosylation, as defined by the α-GalNAc-O-Ser/Thr linkage, is one of the most common types of protein glycosylation found in higher organisms. This modification is initiated in the Golgi by a large family (~20 members in mammals and about half that number in the fly and Caenorhabditis elegans; see Ref. 1 for a review) of polypeptide GalNAc transferases (ppGalNAc Ts) that transfer GalNAc from UDP-GalNAc to Ser and Thr. Subsequent chain elongation of the GalNAc residue by the stepwise action of additional specific transferases results in a vast array of heterogeneous O-linked glycan structures. In contrast to the initiating ppGalNAc Ts, nearly all of these O-glycan elongating transferases occur in small families of 1-4 members. Despite the prevalence of mucin type O-glycosylation, its biological roles and control remain incompletely understood.
Mucin type O-glycans have typically been associated with so-called mucin domains, where high numbers of Ser and Thr residues are glycosylated, producing a chemical- and protease-resistant extended peptide structural motif. Glycoproteins containing O-glycosylated mucin domains commonly function in the protection of the cell surface and the modulation of cell-cell interactions and hence play important roles, for example in inflammation, the immune response, metastasis, and tumorigenesis (2-6). Because the role of the mucin domain in these glycoproteins has been thought to be the production of a stable extended hydrophilic motif, the actual glycosylation pattern has usually been considered of relatively low importance. This is supported by the very low sequence conservation across mammalian and insect species of the glycosylated domains of the secreted mucins (7,8). Nevertheless, it cannot be ruled out that in some cases, O-glycosylated domains may present a molecular code for the specific recognition of additional binding partners, thus playing potential roles in protein sorting, targeting, and turnover (9-15).
Recent proteome-wide analysis demonstrates that mucin type O-glycosylation is widely distributed on most proteins passing through the secretory pathway and that most have only one or a few isolated O-linked glycans (16-18). Thus, it is likely that O-linked glycans at specific sites may play other important biological roles. One emerging role is the co-regulation of proprotein processing (19,20), where O-glycans modulate cleavage at nearby proprotein convertase cleavage sites (e.g. the phosphaturic factor FGF23 and angiopoietin-like protein 3 (ANGPTL3) (21,22)). Other studies have shown that specific O-glycosylation can modulate receptor activities (e.g. TGF-βR II and the death receptor DR5) by presently unknown mechanisms (23,24). Several of the above O-glycosylation events are targeted by specific ppGalNAc T isoforms (i.e. ppGalNAc T2, T3, T14, and T16); therefore, some occurrences of O-glycosylation may be actively regulated to serve as modulators of complex biological processes and perhaps even signaling. Unsurprisingly, a number of distinct ppGalNAc T isoforms have been shown to cause or be associated with human disease. The loss of ppGalNAc T3 causes familial tumoral calcinosis due to abnormal processing of FGF23 (21,25). ppGalNAc T19 (also known as WBSCR17), T5, and T2 are linked to Williams-Beuren syndrome (15), hereditary multiple exostoses (26), and levels of HDL cholesterol and coronary artery disease (22,27,28), respectively, whereas ppGalNAc T3, T6, T7, T9, and T12 are associated with various cancers (29-34). Except for the known role of ppGalNAc T3 in modulating the cleavage and inactivation of FGF23, the mechanisms by which the other transferase isoforms cause disease are still uncertain.
Recent studies have further demonstrated the critical role of O-glycosylation in vertebrate and invertebrate development (35-39). In the fly, several ppGalNAc T isoforms are required to complete development (38), whereas in the mouse, ppGalNAc T1 modulates salivary gland organogenesis (40). Interestingly, the loss of the elongating transferase, T-synthase (which adds a β-Gal to the 3-position of the peptide GalNAc), leads to embryonic lethality in the mouse (35). Although a few target proteins have been implicated in the above studies, little is known about the actual site(s) of glycosylation and their specificity, and even less is known of the actual molecular mechanism(s) leading to their biological function.
Structurally, members of the ppGalNAc T family (except h-ppGalNAc T20 (41)) possess a unique two-domain architecture consisting of a transmembrane tethered N-terminal catalytic domain linked via a short flexible segment to a C-terminal ricin-like lectin domain containing three potential carbohydrate binding sites (see Fig. 1 for the crystal structure of ppGalNAc T2 with bound EA2 peptide substrate (42)). Presently, the roles of the catalytic and lectin domains in peptide and glycopeptide recognition and specificity remain unclear. Our recent studies utilizing short random (glyco)peptide substrates have shown that the ppGalNAc Ts possess specific binding preferences that vary among isoforms for peptide and even GalNAc-O-Ser/Thr-containing glycopeptide substrates (43-45). Interestingly, we and others have shown that the placement of a neighboring GalNAc-O-Ser/Thr residue near a glycosylation site produces a range of transferase-specific effects: a relative inhibition of glycosylation (ppGalNAc T1 and T2 (46,47)), an alteration or shift in glycosylation site (ppGalNAc T2 and T4 (48-51)), or even a large apparent rate enhancement (i.e. a requirement for glycosylated substrate) (ppGalNAc T7 and T10 (45,51-53)). The alteration in site preference observed for ppGalNAc T2 and T4 resides in their lectin domain (48,50,51), whereas the nearly absolute glycopeptide requirement of ppGalNAc T10 resides completely in its catalytic domain (45,51). Recent studies characterizing the properties of the lectin domains of ppGalNAc T2 and T3 further suggest that they may recognize glycopeptide sequence context (54). To further characterize ppGalNAc T glycopeptide substrate utilization, we have now extended our studies to the function of the lectin domain, utilizing a series of random glycopeptides containing N- or C-terminally placed GalNAc-O-Thr residues.
Our results show that the presence and the N- or C-terminal location of a GalNAc-O-Thr site in these glycopeptides are indeed important determinants of the overall catalytic activity and specificity of these enzymes, which can significantly differ between transferase isoforms. Because of the large differences in glycosylation rates observed for some isoforms between an N- or C-terminally placed GalNAc-O-Thr, we believe we have now uncovered another level of control of mucin type O-glycosylation that will further advance our understanding of the regulation of this important modification.

Reagents and Random Peptide Substrates-Fully N-acety[...] Table 1 for sequences) were custom synthesized by Sussex Research (Ottawa, Canada). Random (glyco)peptide substrates were lyophilized from water several times, adjusted to pH 7.5 with dilute NaOH and/or HCl, and stored frozen at a concentration of 50 mg/ml (15-17 mM). Similar random (glyco)peptide X and Z residue compositions were confirmed by Edman amino acid sequencing (data not shown). Edman amino acid sequencing was performed on an Applied Biosystems Procise 494 peptide sequencer as described previously (43-45,56). Liquid scintillation counting was performed in a Beckman model LS6500 scintillation counter.
Transferases-N-terminally truncated and affinity-tagged ppGalNAc Ts used in this work were obtained from multiple sources and multiple expression systems and used in one or more of the following forms: medium supernatant, affinity-purified soluble transferase, or transferase bound to affinity beads. Experiments completed using transferases from different sources yielded consistent results. Soluble affinity-purified bovine ppGalNAc T1 (from Sf9 insect cells) and human ppGalNAc T2 and T3 (from Pichia pastoris) have been described previously (43-45,56). ppGalNAc T3 was also used as medium supernatant from P. pastoris as described (43). Poly-His-tagged ppGalNAc T3, T6, and T13, expressed from Xpress SF+ insect cells (Protein Sciences Corp., Meriden, CT), were used directly bound to Ni-NTA affinity beads (Thermo Fisher) after extensive washing. Soluble N-terminal poly-His-tagged ppGalNAc T5 and T16 were expressed from HEK293f cells and purified by Ni-NTA superflow (Qiagen) nickel affinity chromatography by procedures analogous to those for the production of rat ST6GAL1 (57). Soluble N-terminal poly-His-tagged ppGalNAc T14 and T16 were expressed from High Five insect cells and purified on Ni-NTA-agarose (Invitrogen) and by MonoQ 5/50 GL ion exchange chromatography (GE Healthcare) essentially as described previously (58).
Glycosylation of Random Lectin Glycopeptide Substrates-Typical reaction conditions utilizing random lectin glycopeptide substrates GPIV, GPIV-C, GPV, and GPV-C were as follows: 100 mM HEPES, pH 7.5 (ppGalNAc T1, T2, and T3) or 68 mM sodium cacodylate, pH 6.5 (ppGalNAc T3, T5, T6, T13, T14, and T16), 1.8 mM 2-mercaptoethanol, 10 mM MnCl2, 50 µM 3H-radiolabeled UDP-GalNAc (~6 × 10^8 dpm/µmol), 1.5-1.7 mM (5 mg/ml) glycopeptide substrate, and up to 50 µl of soluble transferase or transferase-bound affinity beads. UDP-GalNAc concentrations were kept low (0.5-50 µM) to achieve high specific activity for subsequent Edman sequence analysis (described below). After the addition of transferase, reaction mixtures (75 µl in 500-µl capped Eppendorf tubes) were incubated at 37 °C in a TAITEC shaking Microincubator M-36. Aliquots of 15 µl were removed at 15, 45, 120, 240, and ~1200 min (overnight) after initiating the reaction and quenched by the addition of 1 volume of 250 mM EDTA. UDP and nonhydrolyzed UDP-GalNAc were removed by passing the sample through a column of ~3 ml of Dowex 1×8 anion exchange resin. [3H]GalNAc incorporation was determined by scintillation counting 1/10 or 1/20 of the sample before and after passing through the Dowex column. Data are typically reported as plots of the mole fraction of total UDP-GalNAc transferred (including hydrolysis) versus time. Confirmation of [3H]GalNAc transfer to peptide and the extent of UDP-[3H]GalNAc hydrolysis were determined by Sephadex G10 gel filtration analysis of selected reaction time points as described previously (44). For all transferases reported in this study, the extent of hydrolysis was negligible (see "Results").
Incubations were performed using a range of transferase concentrations (determined by trial and error) that would transfer between 10 and 50% of the total [3H]GalNAc to the optimal substrate after an overnight incubation (giving a range of ~0.003-0.016 mol of GalNAc/mol of glycopeptide). In total, 2-4 independent experiments were performed at 2-3 different transferase concentrations (except for ppGalNAc T14).
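The arithmetic behind these figures can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names are hypothetical, and it assumes that the same aliquot fraction is counted before and after the Dowex column and that the default concentrations are the 50 µM UDP-GalNAc and 1.5 mM (glyco)peptide quoted above.

```python
def mole_fraction_transferred(dpm_flowthrough: float, dpm_total: float) -> float:
    """Fraction of the total [3H] label recovered in the Dowex flow-through.

    Assumes equal aliquot fractions were counted before (dpm_total) and
    after (dpm_flowthrough) the anion exchange column.
    """
    if dpm_total <= 0:
        raise ValueError("dpm_total must be positive")
    return dpm_flowthrough / dpm_total


def galnac_per_glycopeptide(fraction_transferred: float,
                            udp_galnac_uM: float = 50.0,
                            glycopeptide_mM: float = 1.5) -> float:
    """mol of GalNAc incorporated per mol of (glyco)peptide substrate."""
    glycopeptide_uM = glycopeptide_mM * 1000.0
    return fraction_transferred * udp_galnac_uM / glycopeptide_uM


if __name__ == "__main__":
    # 10% and 50% transfer, the target range stated in the text.
    for fraction in (0.10, 0.50):
        print(fraction, round(galnac_per_glycopeptide(fraction), 4))
```

Under these assumptions, 10-50% transfer corresponds to roughly 0.003-0.017 mol of GalNAc per mol of glycopeptide, in line with the range given above; complete transfer at about 1.6 mM substrate would give about 0.03 mol/mol.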

Determination of Optimal Reaction Conditions and Initial Observations-Initial use of the GPIV and GPV series of substrates (see Table 1) with ppGalNAc T1 and T2 gave results that seemed to vary with (glyco)peptide concentration. For example, at the lowest substrate concentrations (~0.16 or ~0.016 mM (glyco)peptide with T1 and T2, respectively), all four substrates were essentially inactive, whereas at the highest substrate concentration (1.5-1.7 mM), differential activities were observed, with the glycopeptide substrates displaying significantly greater activity compared with the control peptide substrates (see Fig. 2). We attribute these differential effects to peptide concentrations initially below the minimum Km value of the substrate and therefore have chosen to perform our studies utilizing substrate concentrations of 1.5-1.7 mM (5 mg/ml). Because of the random and multiple acceptor site nature of these substrates, a continuum of Km and Vmax values is expected to be observed; therefore, we have not pursued a detailed kinetic analysis of these substrates. It was further observed that the relative rates of glycosylation of the four substrates varied somewhat with the amount of transferase used in the assay; nevertheless, overall transferase preferences were preserved (data not shown).
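The concentration dependence described above can be illustrated with a simple Michaelis-Menten calculation. This is only a sketch: the Km of 2 mM is a hypothetical value chosen for illustration (the text reports no Km and expects a continuum of values for these random substrates), and `mm_rate` is an illustrative helper name.

```python
def mm_rate(substrate_mM: float, km_mM: float, vmax: float = 1.0) -> float:
    """Michaelis-Menten rate v = Vmax * [S] / (Km + [S])."""
    if substrate_mM < 0 or km_mM <= 0:
        raise ValueError("concentrations must be non-negative and Km positive")
    return vmax * substrate_mM / (km_mM + substrate_mM)


# Hypothetical Km, for illustration only; not a measured value.
KM_HYPOTHETICAL_mM = 2.0

low = mm_rate(0.016, KM_HYPOTHETICAL_mM)   # lowest concentration used with T2
high = mm_rate(1.6, KM_HYPOTHETICAL_mM)    # within the 1.5-1.7 mM working range
ratio = high / low

if __name__ == "__main__":
    print(f"relative rate at 0.016 mM: {low:.4f}")
    print(f"relative rate at 1.6 mM:   {high:.4f}")
    print(f"fold difference:           {ratio:.1f}")
```

When the substrate concentration is far below Km, the rate falls almost in proportion to the concentration, which is one way substrates that are clearly active at 1.5-1.7 mM could appear essentially inactive at 100-fold lower concentration.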
A commonly observed feature of the transferase activity plots shown below is the plateau in transfer of GalNAc that is often observed with high transferase activities, even for the least active substrates. We attribute this plateau to a combination of several likely factors. For the most active substrates (approaching 50% GalNAc utilization), the plateau in incorporation is most likely dominated by the depletion of UDP-GalNAc donor. For the less active substrates, we attribute the slowing of the rate to the sequential loss of the optimal or best acceptor sites as glycosylation of the random (glyco)peptide progresses. Because the half-lives of ppGalNAc T1 and T2 are about 5 and 2.5 days (59), it is unlikely that transferase inactivation is the source of the plateau with these transferases. However, transferase inactivation cannot be ruled out for the remaining transferases.
Determination of GalNAc Acceptor Sites of Incorporation by Edman Sequencing-Lyophilized post-Dowex [3H]GalNAc-glycosylated peptide products were isolated by Sephadex G10 chromatography, and the glycopeptide fraction was pooled according to its 3H radioactivity. After multiple lyophilizations from water, glycopeptides were subjected to Edman degradation using a custom PTH analyzer program that diverted the PTH-derivatives from each cycle directly into 7-ml HDPE scintillation vials (Fisher) loaded in a 100-tube rack on an ISCO Foxy 200 fraction collector. It was observed that collecting directly into scintillation vials significantly reduced nonspecific losses of the radiolabeled PTH-derivatives compared with collecting in glass or plastic test tubes. To improve the signal/noise ratio, counting was performed for 5 min/vial and typically repeated 2-3 times, with the resulting dpm values averaged before plotting. To reduce nonspecific binding of the [3H]GalNAc-O-Thr-PTH-derivative to the glass fiber sample loading disk (determined by counting the disk after sequencing), ~0.25 mg of nonreacted "cold" (glyco)peptide substrate was commonly added to the sample to be analyzed. Nevertheless, significant variability in recovery of radiolabel was observed. A preview in radioactivity observed in the N-terminal residues, particularly noticeable in the GPV and GPV-C (glyco)peptides, is thought to be due to nonspecific release of (glyco)peptide from the glass filter during the Edman sequencing.

Design of Lectin Glycopeptide Substrates-The goal of these studies was to develop a series of "universal" substrates that could be used with any ppGalNAc T isoform to assess the role of each isoform's lectin domain. Peptides were designed with a single C- or N-terminal GalNAc-O-Thr residue (T*), intended to interact with the lectin domain, flanked on each side by five positions of randomized residues (Z) containing no acceptor (Ser or Thr) residues (Table 1). The Z regions flanking the T* contained no acceptor Thr sites specifically to eliminate glycopeptide activities arising from the direct binding of GalNAc-O-Thr in the catalytic domain, as is observed for ppGalNAc T10, which optimally glycosylates a Ser/Thr directly preceding a GalNAc-O-Ser/Thr (45,51). Beyond these residues, in an N- or C-terminal direction, respectively, are 12 positions of randomized X residues that include Thr and serve as the acceptor region through interaction with the catalytic domain. As a control for lectin domain specificity, peptides (GPIV-C and GPV-C) were also synthesized with the T* replaced with an Ala residue. The rationale for utilizing randomized residues, X and Z, rather than a specific peptide sequence was our goal of eliminating (or at least reducing) transferase-specific bias due to a particular peptide sequence. With this series of (glyco)peptides, the role of the lectin domain of the ppGalNAc Ts against glycopeptide substrates can now be systematically assessed, specifically addressing 1) the contribution of the lectin domain to overall transferase activity; 2) the preferred N- or C-terminal orientation, if any, of the T* relative to the site of glycosylation; and 3) the optimal number of residues between the lectin-bound T* and those residues glycosylated by the catalytic domain.
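The substrate layout described above can be sketched in a few lines to make the T*-to-acceptor spacing explicit. This is a schematic only: it models just the Z5-T*-Z5 lectin-binding arm and the 12-residue X acceptor region named in the text, omits the terminal capping residues (the full sequences are in Table 1), and uses hypothetical function names.

```python
def core_layout(glyco_terminus: str, n_z: int = 5, n_x: int = 12) -> list[str]:
    """Return the schematic residue layout, N to C terminus.

    glyco_terminus = "C": T* arm C-terminal of the acceptor region (GPIV-like).
    glyco_terminus = "N": T* arm N-terminal of the acceptor region (GPV-like).
    """
    lectin_arm = ["Z"] * n_z + ["T*"] + ["Z"] * n_z
    acceptor_region = ["X"] * n_x
    if glyco_terminus == "C":
        return acceptor_region + lectin_arm
    if glyco_terminus == "N":
        return lectin_arm + acceptor_region
    raise ValueError("glyco_terminus must be 'N' or 'C'")


def acceptor_offsets(layout: list[str]) -> list[int]:
    """Signed distance (in residues) of each X position from T*.

    Negative values are N-terminal of T*, positive values C-terminal.
    """
    t_star = layout.index("T*")
    return [i - t_star for i, residue in enumerate(layout) if residue == "X"]


if __name__ == "__main__":
    for name, terminus in (("GPIV-like", "C"), ("GPV-like", "N")):
        offsets = acceptor_offsets(core_layout(terminus))
        print(name, "acceptor offsets from T*:", min(offsets), "to", max(offsets))
```

In this schematic, every potential acceptor position lies 6-17 residues from T*, N-terminal of it in the GPIV-like layout and C-terminal of it in the GPV-like layout, which bounds the spacings that the Edman sequencing experiments can report.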
Glycosylation of the GPV-GPIV Series of (Glyco)peptides by ppGalNAc Ts-Representative time course plots and gel filtration profiles demonstrating the differential transfer of [3H]GalNAc into peptides GPIV, GPIV-C, GPV, and GPV-C by the eight ppGalNAc Ts under study are given in Fig. 3. As shown by the right-hand panels of Fig. 3, the G10 gel filtration analysis reveals relatively little UDP-GalNAc hydrolysis activity for all of the transferases characterized (i.e. high 3H in peptide fractions ~27-33 and little 3H where free GalNAc would appear, fractions ~37-43). Therefore, the time plots of the ion exchange column pass-through (Fig. 3, left-hand panels) can be taken as a direct measure of transfer of GalNAc to (glyco)peptide. To aid in the discussion, a phylogenetic tree and a subfamily classification of the human transferases (1) are shown in Fig. 4, where a graphical summary of glycopeptide utilization deduced from the data in Fig. 3 is also given. We will discuss transferase isoforms with common behaviors below.

GPIV-preferring Transferases; ppGalNAc T1, T2, and T14-As shown in Fig. 3, A-C (and data not shown), these subfamily Ia and Ib transferases are significantly more active against the C-terminal glycosylated glycopeptide, GPIV, compared with its nonglycosylated control and with the N-terminal glycosylated analog and its control, GPV and GPV-C. This observation suggests that the binding of the C-terminal T* of GPIV to the transferase lectin domain enhances its activity compared with the other substrates. This may be due to an effective increase in substrate concentration arising from repeated lectin binding and release and/or to an optimal alignment of the acceptor in the catalytic domain arising from specific lectin binding.
To confirm that lectin binding is involved in the enhanced GPIV activity, the effect of adding free GalNAc as a competitor was examined with ppGalNAc T2. As shown in Fig. 5, A and B, the addition of free GalNAc significantly reduced the activity against GPIV while not affecting the activity against the nonglycosylated control peptide. It should be further noted that under the higher (glyco)peptide and transferase concentrations and lower UDP-GalNAc concentrations utilized in this and many of our initial experiments, GPV also shows an elevated activity relative to GPV-C (see Fig. 2C) that is completely inhibited by added free GalNAc (Fig. 5, C and D). This suggests that the GPV glycopeptide substrate can interact with the lectin domain of ppGalNAc T2 under certain conditions, also leading to a rate enhancement. The x-ray crystal structure of ppGalNAc T2 (see Fig. 1) shows nine residues of the EA2 peptide substrate bound to the catalytic domain in a specific N to C terminus orientation (42). By model building, we have extended the EA2 peptide to mimic the bound GPIV and GPV glycopeptides, as shown in Fig. 6A (left- and right-hand panels, respectively). In the models, we observe that the glycosylated C terminus (~XZZZZZT*ZZZZZAG) of GPIV is extended to the right of the catalytic domain, whereas the glycosylated N terminus (GAGAZZZZZT*ZZZZZX~) of GPV is extended to the left of the catalytic domain. Note that for simplicity, the extended regions were modeled as Ala residues in an extended conformation; in reality, an ensemble of dynamic structures would be expected. The observed elevated activity of GPIV over GPV for ppGalNAc T1, T2, and T14 therefore suggests that the lectin domain of these transferases is positioned toward the right of the catalytic domain, binding the T* of GPIV such that its Xaa residues are aligned optimally in the catalytic domain. Such domain flexibility seems plausible in solution because the relative positions of the catalytic and lectin domains are highly variable among the several crystal structures of ppGalNAc T1, T2, and T10 shown in Fig. 6B (42,60,61). This orientational flexibility is most likely the result of the ~20-residue extended linker that connects the domains (see Table 2).
Glycopeptide GPV-preferring Transferases; ppGalNAc T3 and T6-These subfamily Ic transferases (Fig. 3, D and E, and data not shown) display preferences opposite to those of ppGalNAc T1, T2, and T14, showing GPV as the most active substrate with the remaining substrates being relatively inactive. This suggests that the lectin domains of ppGalNAc T3 and T6 preferentially interact with T* residues N-terminal of the site of subsequent glycosylation and therefore that the lectin domain of these transferases must be positioned toward the left of the catalytic domain, as shown in the right panel of Fig. 6A.
Transferases with Preferences for both Glycopeptide Substrates GPIV and GPV; ppGalNAc T5, T13, and T16-As shown by the plots in Fig. 3, F-H (and data not shown), ppGalNAc T5, T13, and T16 possess nearly equally enhanced preferences for both glycopeptide substrates GPIV and GPV compared with their control nonglycosylated peptides. ppGalNAc T5 is the only member of subfamily Id, whereas ppGalNAc T13 and T16 are members of subfamilies Ia and Ib, respectively, whose other members, ppGalNAc T1 and ppGalNAc T2 and T14, were shown above to prefer the GPIV glycopeptide substrate (see Fig. 3, A-C). This indicates that even within a subfamily, the N- and C-glycopeptide preferences can vary. The lack of directional glycopeptide specificity for ppGalNAc T5, T13, and T16 may be due to extensive lectin domain conformational dynamics, suggesting that the domain may be equally populated at the right and left of the catalytic domain, and/or to an elevated local concentration of glycopeptide substrate due to lectin binding and release.

Edman Sequencing of GPIV and GPV Series of (Glyco)peptides-In an attempt to determine whether there is an optimal distance between the lectin-bound T* residue and the Xaa residues glycosylated by the catalytic domain, selected (glyco)peptide products were Edman sequenced, and each residue's [3H]GalNAc content was determined. Fig. 7, A-D, shows representative plots for ppGalNAc T1, T2, T3, and T13 with glycopeptides GPIV and GPV (left- and right-hand panels, respectively, roughly scaled to approximate their relative levels of incorporation) (footnote 6). Fig. 7E shows incorporation for the control peptides, GPIV-C and GPV-C, glycosylated by ppGalNAc T13. Rather than a sharp peak of maximal [3H]GalNAc incorporation, a relatively broad distribution of incorporation is observed for all substrates. As might be expected, the largest differences are found between ppGalNAc T3 and ppGalNAc T1, T2, and T13 for both GPIV and GPV, whereas only subtle differences are observed in the patterns among ppGalNAc T1, T2, and T13. The distance from the T* to the peak in incorporation in GPIV is ~10 residues for ppGalNAc T1, T2, and T13 and 16 residues for T3. For GPV, the distance from the T* to the peak in incorporation is 10-11 residues for ppGalNAc T1, T2, and T13 and 8-9 residues for ppGalNAc T3.
6 Note that two different incorporation time points are plotted in Fig. 7, A-C, for ppGalNAc T1, T2, and T3 and that the distributions obtained at the two different time points are identical when plotted at the same relative scale (data not shown). This observation suggests that comparisons of the patterns of incorporation among the different transferase isoforms can be made even between data obtained with different extents of glycosylation. This is reasonable because under the typical reaction conditions, the maximum possible incorporation of the UDP-GalNAc would give a (glyco)peptide GalNAc content of about 3 mol %.

7 GPIV-C and GPV-C control sequencing data were not successfully obtained from most transferases due to very low incorporation; therefore, the successfully obtained data from both control peptides for ppGalNAc T13 were utilized. The only other successfully obtained control data, from ppGalNAc T2, were nearly identical to those from ppGalNAc T13.

FIGURE 6. A, model building of (glyco)peptide substrates GPIV and GPV onto the EA2-bound ppGalNAc T2 crystal structure (42). Left, GPIV modeled with its Xaa residues placed in the catalytic domain in the same N- to C-terminal orientation as the 9 residues of the bound EA2 peptide. Right, GPV modeled in the catalytic domain. For simplicity, the 9 EA2 residues were maintained as in the original structure (see Fig. 1), whereas 3 additional Ala residues were added to complete the representation of the 12-Xaa acceptor region. Likewise, Ala residues were used to represent the Zaa residues, whereas a Ser residue (green) was used to represent the location of the GalNAc-O-Thr. Note that for simplicity, both regions were modeled as static extended structures but in reality would be relatively [...] Table 2) is space-filled. Protein Data Bank accession numbers are as follows: 2FFU (42), 2FFV (42), 2D7I (61), and 1XHB (60), respectively.

TABLE 2 ppGalNAc T linker domain and lectin domain motif alignments
a Linker domain (L) residues conserved (or similar) in all eight transferases are shaded dark grey, whereas residues conserved only within a transferase class are shaded light grey.
b α-, β-, and γ-ricin lectin subdomain canonical motif residues, CLD and QXW, are shaded in green, those residues that are similar to the canonical motif are shaded in yellow, and dissimilar, non-canonical residues are shaded in red (66-69).
c Note that the last residues of the catalytic domain (C) and the first residues of the lectin domain, CLD, are contiguous with the linker domain residues (L).

In an attempt to reveal the extent to which the lectin domain may alter the pattern of [3H]GalNAc incorporation, the distributions of the nonglycosylated control peptides, GPIV-C and GPV-C (glycosylated by ppGalNAc T13, Fig. 7E), were subtracted from the distributions obtained for the glycosylated peptides GPIV and GPV, respectively (footnote 7). These difference plots for all eight transferase isoforms, grouped by glycopeptide utilization, are shown in Fig. 8. It was anticipated that these plots would reveal the effects of lectin domain binding on the pattern of glycosylation and perhaps further reveal differences between transferase class and N- or C-terminal preference. Except for ppGalNAc T3 on GPIV (Figs. 7C and 8D), the difference plots were not appreciably different from the original distribution, although in a few cases, the distributions may have shifted slightly (i.e. for ppGalNAc T2, T5, and T16 on GPIV). From these results, we conclude that the binding of glycopeptide substrate for the most part (i.e. except for ppGalNAc T3 on GPIV) does not greatly alter the pattern of glycosylation compared with the nonglycopeptide substrate. Furthermore, no consistent differences were observed between isoform classes or between transferases within a class that had different N- or C-terminal glycopeptide preferences (i.e. ppGalNAc T1, T2, and T14 versus ppGalNAc T13, T16, and T5). These findings are consistent with a highly mobile lectin domain where the lectin-bound glycopeptide substrate would have a broad range of acceptor interacting distances with the catalytic domain. Alternatively, these results could be consistent with just a regiospecific increase in substrate concentration simply due to rapid binding and release from a conformationally fixed lectin domain.
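The control subtraction behind the difference plots (Fig. 8) can be sketched as below. This is a minimal illustration under a stated assumption: the text does not say how the two distributions were scaled before subtraction, so here each per-cycle dpm profile is normalized to a total of 1. The function names and the example numbers are hypothetical and are not data from this study.

```python
def normalize(dpm_by_cycle: list[float]) -> list[float]:
    """Scale a per-cycle dpm profile so that its values sum to 1."""
    total = sum(dpm_by_cycle)
    if total <= 0:
        raise ValueError("profile must contain positive total radioactivity")
    return [value / total for value in dpm_by_cycle]


def difference_profile(glycopeptide_dpm: list[float],
                       control_dpm: list[float]) -> list[float]:
    """Normalized glycopeptide profile minus normalized control profile.

    Positive values mark Edman cycles enriched in the glycopeptide
    (lectin-assisted) reaction relative to the nonglycosylated control.
    """
    if len(glycopeptide_dpm) != len(control_dpm):
        raise ValueError("profiles must cover the same Edman cycles")
    glyco = normalize(glycopeptide_dpm)
    control = normalize(control_dpm)
    return [g - c for g, c in zip(glyco, control)]


if __name__ == "__main__":
    # Made-up three-cycle profiles, for illustration only.
    example = difference_profile([10.0, 30.0, 60.0], [20.0, 30.0, 50.0])
    print([round(value, 3) for value in example])
```

With this normalization the difference profile always sums to zero, so it reports only a redistribution of incorporation among positions and not a change in overall activity.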

DISCUSSION
Among eukaryotic glycosyltransferases, the ppGalNAc T family is unique in possessing a separate carbohydrate binding lectin domain. Interestingly, such carbohydrate binding modules are common to bacterial glycoside hydrolases, serving both to increase local enzyme-substrate concentrations and to impart substrate specificity to nominally nonspecific catalytic domains by targeting the enzyme to specific substrates or substrate features (62-65). Numerous studies on the ppGalNAc Ts have indeed demonstrated that their lectin domains serve similar functions to modulate glycopeptide substrate recognition and specificity (48-53), but to date there has been no systematic study of the family against a common set of (glyco)peptide substrates. In this work, we have characterized eight nonglycopeptide-requiring ppGalNAc T isoforms against the series of random (glyco)peptide substrates listed in Table 1. Our findings have unambiguously revealed that prior GalNAc-O-Thr(Ser) substrate glycosylation can be recognized by these transferases in a specific N- or C-terminal direction that varies with ppGalNAc T isoform (see Figs. 3 and 4). Thus, for all eight transferases, at least one of the glycopeptide substrates, GPIV and/or GPV, shows significantly elevated activity over its nonglycosylated control. We attribute this N- or C-terminal specificity (i.e. elevated activity) to the binding of the glycosylated Thr residue of the substrate to the lectin domain in such a manner that the acceptor region of the substrate is oriented at the catalytic domain for optimal glycosylation.
These findings strongly suggest that the tethered lectin domain on the ppGalNAc Ts may be mobile and that its location relative to the catalytic domain varies among isoforms. Such domain mobility is supported by the superimposition of the x-ray crystal structures of ppGalNAc T1, T2, and T10, as shown in Fig. 6B (42,60,61). For those transferases with similar N- and C-terminal glycopeptide enhancements (Fig. 4), it is possible that the lectin domain may be sufficiently mobile that it can enhance glycosylation from both an N- and a C-terminal direction. Alternatively, the lectin domain may serve to equally increase the N- and C-terminal glycopeptide substrate concentrations by a simple bind-and-release mechanism, as shown for the glycoside hydrolases (62-65). Further evidence for a highly flexible lectin domain may be found in the broad distribution of glycosylation that is observed in the Xaa region of both the glycosylated and nonglycosylated substrates (see Figs. 7 and 8). The only significant alteration in distribution is observed for ppGalNAc T3 glycosylating its nonpreferred glycopeptide substrate GPIV (Fig. 7). This suggests that the lectin domain of this transferase is involved to some extent in directing the glycosylation of this substrate. Clearly, further studies are necessary to fully understand the dynamics of the lectin domain of these transferases.

FIGURE 8. Plots of [³H]GalNAc incorporation into the Xaa residues of random glycopeptides GPIV and GPV (open bars) and after normalization by subtraction of control peptides GPIV-C or GPV-C that were glycosylated by ppGalNAc T13 (filled bars). A-H, plots of GPIV (left) and GPV (right) for ppGalNAc T1, T2, T14, T3, T6, T5, T13, and T16, respectively. Note that for A and B, substrates were glycosylated using 0.5 μM, 100-fold specific activity UDP-[³H]GalNAc, whereas in C-H, substrates were glycosylated using 50 μM standard specific activity UDP-[³H]GalNAc.
In an attempt to correlate the different N- and C-terminal glycopeptide specificities with transferase peptide sequence, we have compared the sequences of the linker domains and selected lectin domain motifs, as shown in Table 2. From the alignment of the linker domains, no obvious differences in length or sequence can readily account for the different behavior observed within a class (i.e. the linker domains of ppGalNAc T2 and T14, which prefer glycopeptide GPIV, are only 1 residue shorter than that of T16, which shows preferences for both glycopeptide substrates, whereas the linker domains of ppGalNAc T1 and T13 are identical except for two conservative substitutions, although these transferases show different glycopeptide specificities). By contrast, the linker domains of ppGalNAc T3 and T6 are only 50% identical but have the same elevated GPV preferences.
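The percent identity comparison described above can be illustrated with a minimal sketch. The two aligned strings below are hypothetical placeholders, not the linker sequences of Table 2, and the function name `percent_identity` is ours. The sketch also assumes one particular convention (identity counted only over alignment columns in which neither sequence has a gap); other conventions for the denominator exist and would give different values.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only the columns that contain a residue in both sequences.
    compared = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not compared:
        return 0.0
    matches = sum(a == b for a, b in compared)
    return 100.0 * matches / len(compared)


# Hypothetical placeholder alignment (NOT real ppGalNAc T linker sequences).
linker_x = "PLKDGA-TRE"
linker_y = "PLRDGASTKE"
print(f"{percent_identity(linker_x, linker_y):.1f}% identical")
```

Applied to the real alignment, a calculation of this kind would only quantify sequence similarity; as noted above, similarity of the linker domains does not by itself predict glycopeptide preference.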
Numerous studies of the ricin lectin α-, β-, and γ-subdomain repeats across multiple organisms have revealed several binding motifs, including the CLD and QXW sequences (66-69). Mutagenesis studies of the ppGalNAc T lectin domains have shown that the Asp residue of the α-subdomain CLD motif is typically required for lectin binding activities (i.e. ppGalNAc T1-D444A (54,70,71), ppGalNAc T2-D458H (50,54), and ppGalNAc T3-D519H (54,55) are lectin-inactivating mutants). A similar mutation in the ppGalNAc T1 β-ricin subdomain (D484A) only modestly decreases lectin-modulated activity (71,72), whereas mutation of the γ-ricin subdomains of ppGalNAc T1 and T2 (D525A and D541A, respectively) shows no effect on their lectin domain activities (50,71,72). Mutation of the α-subdomain of ppGalNAc T4 (D459H) also eliminates its lectin activity (48). As shown in Table 2, the β-subdomains of ppGalNAc T2, T14, T16, T3, T6, and T5 and the γ-subdomains of T3 and T6 lack this Asp. Several transferases possess Glu instead of Asp (i.e. ppGalNAc T14 and T16 in the α-subdomain and T16 and T5 in the γ-subdomain); whether these domains can bind GalNAc is presently unknown. The β-subdomains of ppGalNAc T6 and T5 lack the QXW motif, having EEW and LKW instead, and would be expected to be inactive. Interestingly, the co-crystal structure of ppGalNAc T10 shows GalNAc-O-Ser bound to its β-subdomain (having canonical motifs CFD and QLW) (61). The ricin subdomain motifs of ppGalNAc T5 would be expected to have the weakest lectin binding activities because all three subdomains lack the critical Asp residue, although, in its γ-subdomain, the Asp is replaced by a Glu. Nevertheless, because ppGalNAc T5 displays clear glycopeptide specificities, one or more of its lectin subdomains (perhaps its γ-subdomain) must possess significant binding activity to provide glycopeptide enhancements.
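The motif reasoning above can be summarized in a short sketch. It rests on two simplifying assumptions of ours: a CLD-type tripeptide is judged only by its third residue (Asp, Glu, or other), and a QXW-type tripeptide is canonical if it is Q, any residue, then W. The function names and category labels are ours, and "CLE" is a hypothetical tripeptide used only to illustrate a Glu substitution; the other tripeptides (CLD, CFD, QLW, EEW, LKW) are those quoted in the text.

```python
import re


def classify_cld(tripeptide: str) -> str:
    """Classify a CLD-type motif by the residue at the critical third (Asp) position."""
    if len(tripeptide) != 3:
        raise ValueError("expected a three-residue motif")
    third = tripeptide[2]
    if third == "D":
        return "Asp present"
    if third == "E":
        return "Glu substitution (GalNAc binding unknown)"
    return "Asp absent"


def is_canonical_qxw(tripeptide: str) -> bool:
    """True if the tripeptide matches Q, any residue, W."""
    return re.fullmatch(r"Q[A-Z]W", tripeptide) is not None


for motif in ("CLD", "CFD", "CLE"):  # "CLE" is hypothetical
    print(motif, "->", classify_cld(motif))

for motif in ("QLW", "EEW", "LKW"):
    print(motif, "-> canonical QXW:", is_canonical_qxw(motif))
```

A classification of this kind only restates the sequence-based expectation; as the ppGalNAc T5 results indicate, motif content alone does not reliably predict lectin activity.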
Generally, glycopeptide binding to any of the ppGalNAc T lectin domains has been difficult to detect directly, further suggesting that it is weak (50,54,55). As with the analysis of the linker domains, we cannot discern any obvious correlation between the likely activities of specific lectin subdomain motifs and a given transferase's observed glycopeptide specificity.
From the above discussion of the lectin domain motifs listed in Table 2, it is likely, for all but ppGalNAc T5, that glycopeptide binding would occur predominantly at the α-subdomain. As shown in the superimposed crystal structures of the ppGalNAc Ts in Fig. 6B, the critical Asp residue of the CLD motif in the α-subdomain (highlighted in the structures) is found in a wide range of positions relative to the catalytic domain. It is reasonable to assume that the conformational flexibility of the lectin domain relative to the catalytic domain in solution would be greater than that observed in Fig. 6B. Thus, both the structural and experimental results are consistent with a highly mobile lectin domain whose dynamics and position relative to the catalytic domain vary widely with transferase isoform. Whether additional domain-domain interactions modulate the position of the lectin domain relative to the catalytic domain is presently unknown, but it seems highly plausible that such interactions could assist in the positioning of the lectin domain. Future studies will address these questions.
Prior Evidence for the Lectin Domain Targeting N- or C-terminal Glycopeptide Glycosylation. The glycosylation of the Muc7 peptide (PTPSATT⁷PAPPSS¹³S¹⁴APPET¹⁹T²⁰AAK) by the subclass Ia transferases ppGalNAc T1 and T13 has been examined by Zhang et al. (73). Interestingly, they observed that although Thr⁷ was initially glycosylated by both transferases, only ppGalNAc T13 was capable of further glycosylating the boldface C-terminal Ser and Thr residues of the peptide. These findings are entirely consistent with the transferase glycopeptide preferences summarized in Fig. 4, where ppGalNAc T1 prefers to glycosylate sites N-terminal of a prior site of glycosylation, whereas ppGalNAc T13 is capable of glycosylating sites both N- and C-terminal of a prior site of glycosylation. Raman et al. (51) have extensively characterized the glycosylation of a series of MUC5AC glycopeptides by ppGalNAc T2 and its catalytic domain. Based on comparative binding studies, it was concluded that ppGalNAc T2 bound glycopeptide when the acceptor site was 10 residues N-terminal of an existing site of GalNAc glycosylation but did not bind a glycopeptide with the reverse orientation and that this was due to the presence of the lectin domain, an observation consistent with our studies of ppGalNAc T2 (see Fig. 4).⁸ Likewise, studies on the glycan binding requirements of the lectin domain of ppGalNAc T3 suggest that this transferase utilizes its lectin domain to glycosylate residues C-terminal of the site of prior glycosylation (55), again in keeping with our findings for ppGalNAc T3. ppGalNAc T3 was previously shown to be the only isoform capable of glycosylating the proprotein processing region of FGF23 (IHFNT¹⁷¹PIPRRHT¹⁷⁸R↓SAEDD) (21,25,74), and in unpublished work,⁹ Kato et al. have found that ppGalNAc T3 glycosylates the Thr¹⁷⁸ site in a lectin-dependent manner by first glycosylating Thr¹⁷¹ in a lectin-independent manner, both in in vitro enzyme assays and ex vivo in CHO cells stably transfected with ppGalNAc T3 variants. The Thr¹⁷¹ site is also glycosylated by several other ppGalNAc T isoforms (21). Our previous work (43) utilizing transferase-specific preferences, obtained from random peptide studies, also suggests that Thr¹⁷¹ would be a moderate to good substrate for ppGalNAc T1, T2, and T3, whereas Thr¹⁷⁸ would not be a substrate for ppGalNAc T1 or T2 and would be only a modest substrate for ppGalNAc T3 (see Table 2 of Ref. 43). This suggests that the prior glycosylation of Thr¹⁷¹ may serve to target ppGalNAc T3 glycosylation of Thr¹⁷⁸, 7 residues C-terminal of Thr¹⁷¹. The glycosylation distribution plots for ppGalNAc T3 for glycopeptide GPV (Figs. 7C and 8D) are entirely consistent with this, showing a peak of glycosylation 8-9 residues C-terminal of the original site of glycosylation. By contrast, the prior glycosylation of Thr¹⁷¹ would fail to enhance the glycosylation of Thr¹⁷⁸ by ppGalNAc T1 and T2 because these transferases possess the reverse glycopeptide specificities. Thus, glycosylation of Thr¹⁷¹ enhances the glycosylation of Thr¹⁷⁸ by ppGalNAc T3 while effectively reducing the activities of those transferases with the reverse glycopeptide preferences. In this way, prior glycosylation of Thr¹⁷¹ serves as a targeting switch or enhancer for the glycosylation of Thr¹⁷⁸, whose glycosylation is required to inhibit the proprotein convertase cleavage of FGF23 in vivo and in vitro (20,21,74).
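The directional argument above can be sketched as a simple lookup. This is an illustrative simplification, not a predictive model: the table encodes only the qualitative preferences summarized in this work (ppGalNAc T1, T2, and T14 favor acceptor sites N-terminal of a prior GalNAc; T3 and T6 favor C-terminal sites; T5, T13, and T16 show both), and it ignores the distance between sites and the peptide sequence preferences of the catalytic domain. The names `PREFERRED_DIRECTION`, `acceptor_direction`, and `lectin_enhancement_expected` are ours, and the first residue number of the FGF23 segment (167) is inferred from the Thr171 label, assuming a contiguous segment.

```python
# Direction of the acceptor site relative to the prior GalNAc that each isoform favors.
# "N" = acceptor N-terminal of the prior site, "C" = acceptor C-terminal of it.
PREFERRED_DIRECTION = {
    "T1": {"N"}, "T2": {"N"}, "T14": {"N"},
    "T3": {"C"}, "T6": {"C"},
    "T5": {"N", "C"}, "T13": {"N", "C"}, "T16": {"N", "C"},
}

# FGF23 processing region quoted in the text; numbering assumes the segment is
# contiguous and that the first residue is 167 (so that Thr171 falls at index 4).
FGF23_SEGMENT = "IHFNTPIPRRHTRSAEDD"
FGF23_FIRST_RESIDUE = 167


def acceptor_direction(prior_site: int, acceptor_site: int) -> str:
    """Return 'C' if the acceptor lies C-terminal of the prior site, else 'N'."""
    if acceptor_site == prior_site:
        raise ValueError("acceptor and prior site must differ")
    return "C" if acceptor_site > prior_site else "N"


def lectin_enhancement_expected(isoform: str, prior_site: int, acceptor_site: int) -> bool:
    """Qualitative check: does the acceptor lie in a direction the isoform favors?"""
    return acceptor_direction(prior_site, acceptor_site) in PREFERRED_DIRECTION[isoform]


# Sanity check of the residue numbering against the quoted segment.
for site in (171, 178):
    assert FGF23_SEGMENT[site - FGF23_FIRST_RESIDUE] == "T"

print("Thr178 is", 178 - 171, "residues", acceptor_direction(171, 178) + "-terminal of Thr171")
for isoform in ("T1", "T2", "T3"):
    print(isoform, lectin_enhancement_expected(isoform, 171, 178))
```

Under these assumptions the lookup reproduces the qualitative pattern discussed above: with Thr171 glycosylated first, enhancement at Thr178 is expected for ppGalNAc T3 but not for T1 or T2, and for the Muc7 peptide, sites C-terminal of Thr7 fall in a direction favored by ppGalNAc T13 but not by T1.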
The control of site-specific mucin type O-glycosylation and the need for such a large family of initiating ppGalNAc Ts are not well understood.¹⁰ In this work, we have identified a previously unappreciated level of control, whereby prior O-glycosylation is used to target and enhance the glycosylation of specific N- or C-terminal sites in an isoform-specific manner. Our studies and those of others strongly suggest that this N- or C-terminal selectivity is due to weak glycopeptide binding to the lectin domain, whose orientation relative to the catalytic domain is highly mobile and isoform-dependent. This glycopeptide selectivity can provide an additional level of control or fidelity for the glycosylation of biologically significant sites and suggests that O-glycosylation in some instances may be exquisitely controlled. Furthermore, our observation that homologous ppGalNAc T isoforms within a given subfamily (that presumably have similar peptide substrate specificities; see Ref. 56) may possess different N- or C-terminal glycopeptide preferences may help explain the large number of ppGalNAc T family members, which would maintain peptide specificity while altering glycopeptide specificity. These studies clearly demonstrate that the biological control of mucin type O-glycosylation is highly complex and that further structural, biochemical, and biological studies are necessary to fully understand this important modification.
⁸ Note that Raman et al. (51) have also proposed in their kinetic analysis that the ppGalNAc T2 lectin domain may direct glycosylation to sites C-terminal of a GalNAc-O-Thr residue by assisting in product release rather than in substrate binding.
⁹ K. Kato, C. Jeanneau, I. Dar, E. P. Bennett, K. T. Schjoldager, A. Benet-Pagès, T. M. Strom, and H. Clausen, unpublished results.
¹⁰ ppGalNAc T1 and T2 are ubiquitously expressed in nearly all mammalian tissues and cell lines, whereas the remaining ppGalNAc T isoforms studied in this work are selectively expressed (see Refs. 1, 16, and 75). For example, ppGalNAc T3 is the dominant transferase expressed in the testis and is also highly expressed in the kidney (21,75,76), and ppGalNAc T16 is more highly expressed than ppGalNAc T1 in the heart and brain (41), whereas ppGalNAc T13 is specifically expressed in neurons (73). How the multiple transferases work in concert in glycosylating their target proteins is largely unknown, as is their regulation at both the protein and transcriptional levels, although, judging from observations in our laboratory and other laboratories, large differences in protein stabilities exist.