Cloning and Characterization of a Ninth Member of the UDP-GalNAc:Polypeptide N-Acetylgalactosaminyltransferase Family, ppGaNTase-T9

We have cloned, expressed and characterized the gene encoding a ninth member of the mammalian UDP-GalNAc:polypeptide N-acetylgalactosaminyltransferase (ppGaNTase) family, termed ppGaNTase-T9. This type II membrane protein consists of a 9-amino acid N-terminal cytoplasmic region, a 20-amino acid hydrophobic/transmembrane region, a 94-amino acid stem region, and a 480-amino acid conserved region. Northern blot analysis revealed that the gene encoding this enzyme is expressed in a broadly distributed manner across many adult tissues. Significant levels of 5- and 4.2-kilobase transcripts were found in rat sublingual gland, testis, small intestine, colon, and ovary, with lesser amounts in heart, brain, spleen, lung, stomach, cervix, and uterus. In situ hybridization to mouse embryos (embryonic day 14.5) revealed significant hybridization in the developing mandible, maxilla, intestine, and mesencephalic ventricle. Constructs expressing this gene transiently in COS7 cells resulted in no detectable transferase activity in vitro against a panel

Mucin type O-linked glycosylation is initiated by the action of a family of UDP-GalNAc:polypeptide N-acetylgalactosaminyltransferases (ppGaNTase, EC 2.4.1.41), which catalyze the transfer of GalNAc from the nucleotide sugar UDP-GalNAc to the hydroxyl group of either serine or threonine. A number of functional roles for O-glycans have been suggested (reviewed in Ref. 1), including protection from proteolytic degradation (2), alteration of substrate structural conformation (3), aiding in sperm-egg binding during fertilization in mice (4), and coordination of leukocyte rolling along endothelial cells upon inflammation and injury (5). However, the exact biological functions of O-linked glycosylation remain largely unknown, as studies involving chemical/enzymatic cleavage of sugars and/or mutagenesis of acceptor residues on proteins can result in secondary effects unrelated to sugar removal or absence. Since carbohydrates can only be "mutated" indirectly by modifying the enzymatic activities of the glycosyltransferases responsible for their synthesis, our efforts have focused on the characterization of the enzyme family responsible for the initiation of O-glycan addition.
Thus far, seven distinct mammalian isoforms from this gene family have been identified and functionally characterized: ppGaNTase-T1 (6, 7), -T2 (8), -T3 (9, 10), -T4 (11), -T5 (12), -T6 (13), and -T7 (14,15). An eighth putative isoform was ablated in mice without any obvious phenotypic effects (16,17); however, the enzymatic activity and the gene encoding this isoform remain uncharacterized. Whereas some isoforms display a broad range of expression in adult tissues and act on a robust set of substrates (ppGaNTase-T1, -T2, and -T3), others are more restricted in both expression and substrate preference (ppGaNTase-T4, -T5, and -T7). ppGaNTase-T7 (14) has the distinction of being the only transferase identified thus far that requires a GalNAc-containing glycopeptide as a substrate; glycosylation of the peptide substrate by ppGaNTase-T1 is required before ppGaNTase-T7 will further glycosylate additional residues. This result indicates that not all O-linked glycosylation occurs simultaneously and suggests that a hierarchy of action within this family may be responsible for the complex patterns of multisite substrate glycosylation seen in vivo.
Here, we report the cloning of another member of this transferase family, termed ppGaNTase-T9. In common with ppGaNTase-T7, ppGaNTase-T9 demonstrated no transferase activity against a panel of unmodified peptide substrates in vitro. However, when the MUC5AC peptide substrate was first glycosylated by ppGaNTase-T1, the resultant glycopeptides were readily glycosylated further by ppGaNTase-T9 in a manner distinct from that of ppGaNTase-T7. ppGaNTase-T9 and ppGaNTase-T7 transcript expression patterns differed as well; ppGaNTase-T9 was expressed more widely across adult tissues and exhibited distinct expression patterns within developing mouse embryos. These results suggest that glycosylation of multisite substrates occurs through the specific and hierarchical action of multiple members of this enzyme family, whose expression is uniquely regulated both during development and in adult tissues.

EXPERIMENTAL PROCEDURES
Isolation of ppGaNTase-T9 Probes and Full-length cDNAs-Previously, the conserved amino acid regions EIWGGEN and VWMDEYK were used to design sense and antisense PCR primers to amplify products from rat sublingual gland (rat SLG) cDNA. These products were cloned, sequenced and used to screen a rat SLG cDNA library as described (12). A probe previously used to clone the rat ppGaNTase-T5 cDNA (12) resulted in the detection of additional isoforms when screening an oligo(dT)-primed Uni-Zap XR rat SLG cDNA library according to standard procedures (18). A novel isoform, designated ppGaNTase-T9, was identified by cross-hybridization with the probe derived from positions 2076-2240 of ppGaNTase-T5 (12). One clone containing a truncated 3′ end was initially isolated (rTA-0). An oligonucleotide (d(AGACGTTGTGGCCCAGAAAAAACTCCGAGGCTCC)) based on the 3′-most sequence of this partial cDNA clone was end-labeled and used to screen the cDNA library a second time. A cDNA clone containing a complete open reading frame was isolated (rTA-3). The coding region within this clone was completely sequenced and given the designation ppGaNTase-T9. The N-terminal transmembrane domain was determined by a Kyte-Doolittle hydrophobicity plot.
Northern Blot Analysis-Total RNA from Wistar rat tissues was extracted according to the single-step isolation method described by Ausubel et al. (20). Following electrophoresis in a 1% formaldehyde-agarose gel, rat total RNA samples were transferred to Hybond-N membranes (Amersham Pharmacia Biotech) according to Sambrook et al. (18). A 325-bp segment of the ppGaNTase-T9 cDNA region (from the vector pBSmTA-423, containing a 325-bp ppGaNTase-T9 insert in the HindIII site of pBluescript KS+) from nucleotides 1334-1756 of the amino acid coding region was labeled using the Random Primers DNA labeling system (Life Technologies, Inc.) according to manufacturer's instructions and used as a probe for ppGaNTase-T9 transcripts. ppGaNTase-T7 and -T1 were detected as described previously (12,14). Antisense 18 S ribosomal subunit oligonucleotide d(TATTGGAGCTGGAATTACCGCGGCTGCTGG) was end-labeled as described (18) and used to normalize sample loading by hybridizing with 5 M excess of probe. All hybridizations were performed in 5× SSPE, 50% formamide at 42°C with two final washes in 2× SSC, 0.1% SDS at 65°C for 20 min.
In Situ Hybridization-In situ hybridization studies were performed using a modification of procedures described by Wilkinson and Green (21). Mouse embryos were fixed overnight in freshly prepared ice-cold 4% paraformaldehyde in phosphate-buffered saline. The embryos were dehydrated through ethanol into xylene and embedded in paraffin using a Tissue-Tek V.I.P. automatic processor (Miles). Sections (5 µm) were adhered to commercially modified glass slides (Super Frost Plus, VWR), dewaxed in xylene, rehydrated through graded ethanols, and treated with proteinase K (to enhance probe accessibility) and with acetic anhydride (to reduce nonspecific background). Single-stranded RNA probes were prepared by standard techniques with specific activities of 5 × 10⁹ dpm/µg. ppGaNTase-T9 was detected using the plasmid pBSrT9-IS as a template for RNA production, ppGaNTase-T7-specific RNA probes were prepared using the plasmid pBSrT7-IS, and ppGaNTase-T1 transcripts were detected using the plasmid pBSmT1-IS. pBSrT9-IS contains nucleotides 199-381 of the rat ppGaNTase-T9 amino acid coding region generated by PCR amplification using the primers mTAIS+ (d(ATAGGTACCAAGCTTGCTGAACAAAGGCTGAAGGA)) and mTAIS− (d(ATAGAGCTCGAGAGAGCGATTCAGGGAGATT)). pBSrT7-IS contains a segment of the rat ppGaNTase-T7 (14) amino acid coding region from nucleotide position 1759 to 1964 generated by PCR amplification using the primers mT5IS+ (d(ATAGGTACCAAGCTTGACCAAGGGACCCGACGGATCC)) and mT5IS− (d(ATAGAGCTCGAGGATGTTATTCATCTCCCACTTCTGAT)). pBSmT1-IS contains nucleotides 1376-1676 of the mouse ppGaNTase-T1 amino acid coding region generated by PCR amplification using the primers mT1insitu+ (d(ATAGGTACCAAGCTTGTCATGGTATGGGAGGTAATCAGG)) and mT1insitu− (d(ATAGAGCTCGAGAATATTTCTGGAAGGGTGACAT)). All of the above-mentioned PCR products were cloned into the KpnI and SacI sites of pBluescript KS+.
All vectors were linearized at the introduced HindIII site and transcribed with T7 RNA polymerase to produce labeled antisense RNA. Sections were hybridized at Tm −15°C, washed at high stringency (Tm −7°C) and treated with RNase A to further diminish nonspecific adherence of probe. Autoradiography with NBT-2 emulsion (Eastman Kodak Co.) was performed for 25 days. Slides were developed with D19 (Eastman Kodak), and the tissue counterstained with hematoxylin. Brightfield and darkfield images were captured with a Polaroid Digital Microscope camera and processed using Adobe Photoshop (Adobe Systems) with Image Processing Toolkit (Reindeer Games, Asheville, NC).
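The in situ primers described above each carry a short 5′ tail that introduces the cloning sites (KpnI, SacI) and, in the sense primers, the HindIII site later used for linearization. The following is a minimal sketch that scans the primer sequences as printed (with line-break hyphens removed) for those sites. The recognition sequences are the standard ones for these enzymes; the function name `find_sites` and the dictionary layout are illustrative choices, not part of the original methods.

```python
# Standard recognition sequences for the enzymes named in the text.
SITES = {"KpnI": "GGTACC", "HindIII": "AAGCTT", "SacI": "GAGCTC"}

# Primer sequences as printed in the text (line-break hyphens removed).
PRIMERS = {
    "mTAIS+": "ATAGGTACCAAGCTTGCTGAACAAAGGCTGAAGGA",
    "mTAIS-": "ATAGAGCTCGAGAGAGCGATTCAGGGAGATT",
    "mT5IS+": "ATAGGTACCAAGCTTGACCAAGGGACCCGACGGATCC",
    "mT5IS-": "ATAGAGCTCGAGGATGTTATTCATCTCCCACTTCTGAT",
    "mT1insitu+": "ATAGGTACCAAGCTTGTCATGGTATGGGAGGTAATCAGG",
    "mT1insitu-": "ATAGAGCTCGAGAATATTTCTGGAAGGGTGACAT",
}


def find_sites(seq):
    """Return {enzyme: [0-based start positions]} for each site present in seq."""
    hits = {}
    for enzyme, site in SITES.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start)
            start = seq.find(site, start + 1)
        if positions:
            hits[enzyme] = positions
    return hits


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, find_sites(seq))
```

In this sketch every sense primer should report a KpnI site followed directly by a HindIII site, and every antisense primer a SacI site, which matches the cloning and linearization strategy stated in the text.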
Generation of Secretion Constructs for ppGaNTase-T9-cDNA clones containing the 1.8-kb coding region of ppGaNTase-T9 were isolated from the rat sublingual gland cDNA library described previously (12). An MluI site was introduced into cDNA clone rTA-0 by PCR amplification using the primers rTA-MluI-S (d(CCTACGCGTCTCCTGGGGGTTCCGG)) and rTA-PCR-AS (d(GGTCAAGCAAAGGGGGGAGCCAGTT)). This amplified product was digested with MluI and EagI and cloned into the vector pBS-IMKF3 to create the vector pBS-rTAmut#7. Sequencing was performed to verify that no PCR-induced mutations had been sustained in the cloned product. A 650-bp MluI-EagI(blunt) fragment from pBS-rTAmut#7 was then cloned into the MluI-Bsp120(blunt) sites of pIMKF4 to generate the vector pF4-rTA-Mut-7. (pIMKF4 is identical to pIMKF3 (11) except that the multiple cloning site is expanded between the BglII and NotI sites using the annealed oligonucleotides Bgl-Not-S (d(GATCTAGAGCTCACCGGTAAGC)) and Not-Bgl-AS (d(GGCCGCTTACCGGTGAGCTCTA)).) A 1.2-kb BspEI-Bsu36I(blunt) fragment from the cDNA clone rTA-3 was then cloned into the BspEI-Ecl136II sites of pF4-rTA-Mut-7 to generate the mammalian expression vector pF4-rT9. pF4-rT9 is an SV40-based expression vector, which generates a fusion protein containing the following, in order: an insulin secretion signal, a metal binding site, a heart muscle kinase site, a FLAG epitope tag, and the truncated rat ppGaNTase-T9 cDNA.
The purified products of the reaction with Pichia-derived ppGaNTase-T1 were used as substrates in subsequent incubations with COS7 cell-derived ppGaNTase-T9 and ppGaNTase-T7 media to generate the data in Figs. 4-6. Equal relative amounts of each recombinant enzyme were used in each reaction as determined by SDS-PAGE analysis (12). Reactions were carried out in a total volume of 50 µl at the following concentrations: 15 µg of each peptide or glycopeptide, 125 mM cacodylate buffer (pH 7.0) containing 0.2% (v/v) Triton X-100, 12.5 mM MnCl2, 1 mM aprotinin, 1 mM leupeptin, 1 mM E64, 1 mM phenylmethanesulfonyl fluoride, and 1.25 mM AMP. The enzyme samples were preincubated in this reaction mixture for 5 min at 37°C, and then the reaction was initiated with the addition of 2 nmol of UDP-[14C]GalNAc (54.7 mCi·mmol⁻¹; 2.02 GBq·mmol⁻¹; 0.02 mCi·ml⁻¹) and 20 nmol of cold UDP-GalNAc. Reactions were performed for 96 h at 37°C with additional enzyme and UDP-GalNAc (22 nmol) being added after each 24-h interval. Reactions were then stopped by the addition of an equal volume of 10 mM EDTA, purified on a Waters 265 HPLC, and analyzed by capillary electrophoresis and MALDI-TOF as described above.
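Because the labeled donor (2 nmol at 54.7 mCi/mmol) is mixed with 20 nmol of unlabeled UDP-GalNAc, the effective specific activity in the reaction is lower than that of the stock. The sketch below works through that isotopic dilution and converts the result to dpm per pmol using the definition of the curie. The function names are illustrative, and the calculation covers only the initial donor mix; it does not model the later 22-nmol additions or counting efficiency, neither of which is specified in enough detail in the text.

```python
HOT_NMOL = 2.0               # UDP-[14C]GalNAc per reaction (from the text)
COLD_NMOL = 20.0             # unlabeled UDP-GalNAc per reaction (from the text)
STOCK_MCI_PER_MMOL = 54.7    # specific activity of the labeled stock (from the text)
DPM_PER_UCI = 2.22e6         # definition of the curie: 1 uCi = 2.22e6 dpm


def diluted_specific_activity(hot_nmol, cold_nmol, stock_mci_per_mmol):
    """Specific activity (mCi/mmol) after mixing labeled and unlabeled donor."""
    return stock_mci_per_mmol * hot_nmol / (hot_nmol + cold_nmol)


def dpm_per_pmol(mci_per_mmol):
    """Convert mCi/mmol to dpm/pmol (1 mCi/mmol = 1e-6 uCi/pmol)."""
    return mci_per_mmol * 1e-6 * DPM_PER_UCI


if __name__ == "__main__":
    sa = diluted_specific_activity(HOT_NMOL, COLD_NMOL, STOCK_MCI_PER_MMOL)
    print(f"diluted specific activity: {sa:.2f} mCi/mmol")
    print(f"equivalent to {dpm_per_pmol(sa):.2f} dpm per pmol GalNAc")
```

Converting measured cpm to pmol of GalNAc transferred would additionally require the counter's efficiency, which is an unknown here.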
Periodate Oxidation, Sodium Borohydride Reduction, and Enzyme Assays-Purified MUC5AC glycopeptide (100 nmol) and MUC5AC parent peptide (100 nmol) were oxidized in side-by-side reactions with 200 µl of 0.08 M NaIO4 in 0.05 M acetate buffer (pH 4.5) at 4°C for 60 h in the dark (28). Excess periodate was destroyed by adding 20 µl of ethylene glycol. The reaction mixtures were adjusted to pH 7.5 with 1 N NaOH. Sodium borohydride was added to a final concentration of 0.2 M and reduction continued for 4 h at 4°C. Excess borohydride was destroyed by the addition of 20 µl of glacial acetic acid, and released boric acid was evaporated several times with methanol. The reaction mixtures were purified by HPLC as described above. Periodate-treated and untreated MUC5AC and MUC5AC glycopeptide were then used as substrates in reactions with COS7 cell-derived ppGaNTase-T1, ppGaNTase-T9, or mock-transfected media (pIMKF1) as described above to generate the data in Table III. Reactions were performed in duplicate for 24 h at 37°C. All reactions were stopped by the addition of an equal volume of 10 mM EDTA. Reaction products were passed through AG1-X8 resin and eluted with 1 ml of water, and incorporation was determined by scintillation counting. Background values obtained from controls incubated without peptide substrate were subtracted from each experimental value.
Radiolabeling, Digestion, and Amino Acid Sequencing of Glycopeptide Products-In order to determine the glycosylation sites of products of ppGaNTase-T1, -T7, and -T9 reactions, glycopeptides were subjected to Edman degradation on an Applied Biosystems 473A sequencer. Samples of 2000-5000 pmol were dried on trifluoroacetic acid-washed glass fiber filters spotted with 1.5 mg of BioBrene Plus. Amino acid phenylthiohydantoin (PTH) derivatives were chromatographed on standard ABI 5-µm C18 PTH columns using the Fast Normal gradient program and were monitored by absorbance at 272 nm. The PTH-Thr-O-GalNAc diastereomers were found to elute as two peaks at unique positions in the chromatogram, near the positions of PTH-Ser and PTH-Thr; we were unable to separate PTH-Thr from the PTH-Thr-O-GalNAc derivative. The PTH derivative of Ser-O-GalNAc is identified as an unresolved doublet peak near the position of PTH-Asp (29,30).
To confirm sites of glycosylation in the tetra-glycopeptide, the MUC5AC tri-glycopeptide was incubated with ppGaNTase-T9 in the presence of labeled UDP-[14C]GalNAc as described above. Products were purified as described above and subjected to Edman degradation, where counts were measured after each cycle.
Edman degradation sequencing of the MUC5AC hexa-glycopeptide was confirmed by limited proteinase K digestion. Digestion of the glycopeptide was performed in 25 µl of 0.05 M phosphate buffer at pH 7.5 with an enzyme to substrate ratio of 1:25. The reaction was stopped after 2 h by addition of 25 µl of 10 mM EDTA, and products were purified by HPLC as described above and analyzed by MALDI-TOF to determine the mass of each fragment. Fractions were then pooled according to the molecular mass and used for sequence analysis by Edman degradation.

RESULTS
cDNA Cloning and Sequence Analysis of ppGaNTase-T9-A PCR strategy based on conserved regions within the ppGaNTase family was performed on cDNA from rat SLG; the resultant products were purified, cloned, and sequenced to identify the nature of the insert as described previously (12). A PCR product that previously identified the rat ppGaNTase-T5 cDNA (12) resulted in the detection of another novel cDNA, which shared homology with previously identified isoforms. The cDNA clone (rTA-0) contained a 3′ truncation within the coding region. To obtain a full-length clone, an oligonucleotide based on the 3′-most sequence of the partial cDNA clone (d(AGACGTTGTGGCCCAGAAAAAACTCCGAGGCTCC)) was end-labeled and used to screen the cDNA library a second time. A cDNA clone containing a complete open reading frame was isolated (rTA-3), sequenced, and given the designation ppGaNTase-T9.
As shown in Fig. 1, the cDNA encoding ppGaNTase-T9 contains a 1812-bp insert encoding a unique 603-amino acid protein. Conceptual translation of this cDNA revealed a type II membrane protein architecture, typical of the ppGaNTase family. The enzyme consists of a 9-amino acid N-terminal cytoplasmic region, a 20-amino acid hydrophobic/transmembrane region, a 94-amino acid stem region, and a 480-amino acid putative catalytic region. Table I summarizes the degree of amino acid similarity between each of the known isoforms within the conserved putative catalytic region. ppGaNTase-T9 displays the greatest degree of similarity within this region to ppGaNTase-T1 and the lowest degree of similarity to ppGaNTase-hT6. Amino acid similarity within the C-terminal ricin-like lectin motif is shown in Table II. This domain displays homology to the carbohydrate binding region of the plant lectin, ricin, and has been hypothesized to be involved in enzyme recognition of carbohydrate moieties on glycopeptide substrates (19,31). Within this region, ppGaNTase-T9 has the greatest similarity to ppGaNTase-mT3 and the least to ppGaNTase-mT2.
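The domain lengths and insert size quoted above can be cross-checked with simple arithmetic. The following sketch sums the four regions and compares the result with the stated protein length and with the 1812-bp insert, under the assumption (not stated explicitly in the text) that the 1812 bp comprise the coding codons plus one stop codon. The dictionary keys are illustrative labels.

```python
# Region lengths in amino acids, as given in the text.
DOMAINS = {
    "cytoplasmic": 9,
    "transmembrane": 20,
    "stem": 94,
    "catalytic": 480,
}

STATED_PROTEIN_LENGTH = 603   # amino acids (from the text)
STATED_INSERT_BP = 1812       # base pairs (from the text)

# Total protein length implied by the four regions.
protein_length = sum(DOMAINS.values())

# Assumption: the insert is the open reading frame including one stop codon.
orf_bp_with_stop = (protein_length + 1) * 3

if __name__ == "__main__":
    print("sum of regions:", protein_length, "aa")
    print("ORF incl. stop codon:", orf_bp_with_stop, "bp")
    print("matches stated values:",
          protein_length == STATED_PROTEIN_LENGTH
          and orf_bp_with_stop == STATED_INSERT_BP)
```

Under that assumption the four regions account for the full 603 residues and the insert size is consistent with the reported open reading frame.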
Northern Blot Analysis-Northern blots of rat total RNA were probed with a ppGaNTase-T9-specific probe (Fig. 2) as well as probes specific for the previously characterized ppGaNTase-T7 and -T1 (14). The highest levels of the 5- and 4.2-kb ppGaNTase-T9 message were found in the SLG, testis, small intestine, colon, and ovary. Smaller amounts were detectable in heart, brain, spleen, lung, stomach, cervix, and uterus. ppGaNTase-T7 transcripts were much more restricted in their expression, whereas ppGaNTase-T1 transcripts were more ubiquitous, as seen previously (14). The 18S ribosomal probe was employed to control for RNA integrity and loading variations.
Mouse Embryonic in Situ Hybridization Analysis-Given the degree of amino acid conservation and nucleic acid homology for each specific isoform across species as well as similarity of expression patterns seen in adult tissues (14), we examined ppGaNTase-T9 gene expression during mouse development using parasagittal sections of embryos during late organogenesis (Theiler stage 22-23, embryonic day 14.5). The region of the rat ppGaNTase-T9 gene used as a probe is 96% homologous to the corresponding mouse EST. Sections were hybridized with RNA probes specific for ppGaNTase-T9, -T7 and -T1 and compared with each other (Fig. 3). ppGaNTase-T9 is expressed relatively abundantly compared with ppGaNTase-T7 (Fig. 3, A versus B) and in a more restricted pattern than ppGaNTase-T1 (Fig. 3C). A higher magnification view of the developing hindbrain region in these animals (Fig. 3, D-F) shows discrete accumulation of ppGaNTase-T9 transcripts in the rapidly dividing, undifferen-
The substrates and products of the aforementioned reactions were then sequenced by Edman degradation to determine the sites of GalNAc addition by each enzyme (Fig. 5). Fig. 5A shows the HPLC profiles for residues 1-3 and 9-13 of the MUC5AC parent peptide, the di- and tri-glycosylated species produced by ppGaNTase-T1 and the hexa-glycosylated species produced by ppGaNTase-T7. The * and ** denote the diastereomeric peaks indicative of PTH-Thr-O-GalNAc, and the *** indicates the unresolved doublet peak of PTH-Ser-O-GalNAc. The HPLC profiles indicate that ppGaNTase-T1 glycosylates threonines 3 and 13 in the di-glycosylated species and threonines 3, 12, and 13 in the tri-glycosylated species. (Our earlier work indicating that T1 glycosylates serine 5 was in error due to misinterpretation of an additional proline peak in the serine 5 HPLC profile; proline 4 in the MUC5AC sequence gave a peak near the position of PTH-Ser-O-GalNAc, which carried over into the serine 5 HPLC profile and was mistakenly assumed to indicate a glycosylated serine. Since then we have repeated the Edman degradation multiple times to conclusively assign modified positions.) Upon incubation with the tri-glycopeptide, ppGaNTase-T7 glycosylates threonines 2 and 10 and serine 11 to form the hexa-glycopeptide (Fig. 5A). The sites of GalNAc addition in the hexa-glycopeptide were confirmed by limited proteinase K digestion of this species and analysis of the products by mass spectrometry and Edman degradation (Fig. 5B). The two peaks produced by this analysis correspond to the first 9 residues of MUC5AC substituted with 2 GalNAc residues (m/z = 1284) and the last 7 residues substituted with 4 GalNAc residues (m/z = 1498). Edman degradation of these fragments confirmed previous sequencing of the unfragmented glycopeptide (data not shown). We recovered insufficient penta-glycopeptide formed by ppGaNTase-T7 to perform sequence analysis.

FIG. 2. Northern blot analysis of ppGaNTase-T9, -T7 and -T1.
Total RNA from Wistar rats was extracted from glands and organs listed above each lane. After electrophoresis on 1% formaldehyde-agarose gels and transfer to Hybond-N membranes, RNA was hybridized with a ppGaNTase-T9-specific probe (T9), a -T7-specific probe (T7), a -T1-specific probe (T1), and an 18S rRNA probe (18S).

The tetra-glycopeptide produced by ppGaNTase-T9 was also analyzed to determine the site of GalNAc addition. Since the quantities of the tetra-glycopeptide recovered were limited, we employed radiochemical sequencing to verify the site of GalNAc addition; ppGaNTase-T9 was found to add [14C]GalNAc to threonine 2 of the tri-glycopeptide substrate, as determined by both radioactive counting and HPLC analysis of Edman degradation products (Fig. 5, C and D). Unfortunately, the recovery of the tri-glycopeptide formed by ppGaNTase-T9 was insufficient to perform sequence analysis.
To further define the requirement of the ppGaNTase-T9 isoform for a GalNAc-containing substrate, we modified GalNAc residues by periodate oxidation and sodium borohydride reduction. The purified glycopeptides obtained from incubation with ppGaNTase-T1, along with the MUC5AC parent peptide, were subjected to mild periodate oxidation followed by sodium borohydride reduction. Periodate-treated and untreated glycopeptides and MUC5AC parent peptide were purified by HPLC, analyzed for integrity by capillary electrophoresis (data not shown), and incubated with COS7 cell-derived ppGaNTase-T1, ppGaNTase-T9, or mock-transfected (pIMKF1) media. Table III compares the counts incorporated into each substrate by each enzyme. The ability of ppGaNTase-T9 to use the glycopeptide as a substrate is clearly reduced upon treatment with periodate and sodium borohydride (compare 17080 cpm incorporated into untreated material to 381 cpm incorporated into treated material) (Table III). However, this reduction in incorporation by ppGaNTase-T9 is not due to the peptide being compromised during periodate treatment, as ppGaNTase-T1 works equally well on both treated and untreated MUC5AC (compare 21545 cpm to 18100 cpm) (Table III). These data suggest that ppGaNTase-T9, like ppGaNTase-T7, requires the presence of intact GalNAc on the MUC5AC peptide for it to be used as a substrate.
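The contrast between the two enzymes is easier to see when the quoted counts are expressed as the fraction of incorporation retained after periodate/borohydride treatment. The sketch below does that arithmetic using only the four cpm values quoted above. For ppGaNTase-T1 the text does not say which of the two values belongs to the treated peptide, so the assignment of 21545 cpm to untreated material is an assumption made here for illustration; the function name is likewise illustrative.

```python
def percent_retained(untreated_cpm, treated_cpm):
    """Incorporation into treated substrate as a percentage of untreated."""
    return 100.0 * treated_cpm / untreated_cpm


# ppGaNTase-T9 acting on the MUC5AC glycopeptide (values from the text).
T9_UNTREATED, T9_TREATED = 17080, 381

# ppGaNTase-T1 acting on the MUC5AC parent peptide (values from the text).
# Assumption: the larger value is the untreated sample.
T1_UNTREATED, T1_TREATED = 21545, 18100

if __name__ == "__main__":
    print(f"T9 on glycopeptide: {percent_retained(T9_UNTREATED, T9_TREATED):.1f}% retained")
    print(f"T1 on parent peptide: {percent_retained(T1_UNTREATED, T1_TREATED):.1f}% retained")
```

On these numbers ppGaNTase-T9 retains roughly 2% of its incorporation after treatment, whereas ppGaNTase-T1 retains roughly 84%, which is the basis for attributing the loss to modification of GalNAc and not to damage to the peptide backbone.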
To begin to address the activity of these enzyme hierarchies on peptides other than MUC5AC, we incubated P. pastoris-derived ppGaNTase-T1 with the EA2 peptide (m/z = 1340) for extended periods of time in the presence of excess UDP-GalNAc as described for MUC5AC. MALDI-TOF analysis and Edman degradation of the product of this incubation indicate that ppGaNTase-T1 produces a mono-glycopeptide (m/z = 1543) with GalNAc at threonine 7 (Fig. 6, A and B). The same mono-glycosylated species is also produced upon incubation of EA2 with ppGaNTase-T2 (data not shown). When this mono-glycosylated species is then used as a substrate in subsequent incubations with ppGaNTase-T9 or -T7, a di-glycosylated species (m/z = 1746) is formed by both enzymes (Fig. 6A). (The additional small peak present in the ppGaNTase-T7 sample most likely represents a trace amount of tri-glycosylated species (m/z = 1949) that was variably present and in quantities too low to be recovered for further analysis.) Both di-glycosylated species showed an additional GalNAc at threonine 6, indicating that ppGaNTase-T9 and -T7 are transferring GalNAc to the same residue in this glycopeptide, producing the same final product; this is in contrast to their respective activities on the MUC5AC glycopeptides.

FIG. 4. Capillary electrophoresis (CE) and MALDI-TOF (MS) profiles of MUC5AC glycopeptide substrates and reaction products.
The capillary electrophoresis and MALDI-TOF profiles of the purified di-glycopeptide (Di) and tri-glycopeptide (Tri) species formed by the action of Pichia-purified ppGaNTase-T1 on the MUC5AC parent peptide are shown at the top. The profiles of the products formed by reaction with ppGaNTase-T9 (T9) or ppGaNTase-T7 (T7) are shown at the bottom. Masses are shown next to each peak. Tri, tri-glycopeptide; Tetra, tetra-glycopeptide; Penta, penta-glycopeptide; Hexa, hexa-glycopeptide.

DISCUSSION
We report the cloning of a novel member of the UDP-GalNAc:polypeptide N-acetylgalactosaminyltransferase family, termed ppGaNTase-T9. ppGaNTase-T9 encodes a type II integral membrane protein similar in structure to previously identified family members. In common with ppGaNTase-T7, ppGaNTase-T9 fails to act on a panel of unmodified peptide substrates, but rather catalyzes the transfer of GalNAc from UDP-GalNAc to a GalNAc-containing peptide substrate. This activity requires the prior activity of another member of the UDP-GalNAc:polypeptide N-acetylgalactosaminyltransferase family (ppGaNTase-T1 or -T2) and results in the modification of an additional hydroxyamino acid within the glycopeptide substrate. Periodate oxidation further demonstrated that ppGaNTase-T9 also requires the presence of an intact GalNAc residue on the glycopeptide substrate. We have determined that ppGaNTase-T9 will act on di- and tri-glycosylated MUC5AC and mono-glycosylated EA2, indicating that ppGaNTase-T9 will recognize different glycoforms of a given peptide as well as more than one type of glycopeptide substrate.
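The assignment of EA2 glycoforms from MALDI-TOF data rests on a regular mass ladder: each added GalNAc shifts the peak by the same increment. As a minimal sketch, the code below rebuilds that ladder from the parent EA2 value (m/z 1340) and compares it with the reported mono-, di-, and tri-glycosylated values (1543, 1746, 1949). The increment of 203 is the nominal mass of one N-acetylhexosamine residue and matches the spacing of the reported peaks; the function names are illustrative.

```python
GALNAC_INCREMENT = 203   # nominal mass added per GalNAc residue
EA2_PARENT_MZ = 1340     # unglycosylated EA2 peptide (from the text)

# Reported m/z values keyed by number of GalNAc residues (from the text).
REPORTED = {0: 1340, 1: 1543, 2: 1746, 3: 1949}


def glycoform_ladder(parent_mz, max_galnac):
    """Expected nominal m/z for 0..max_galnac added GalNAc residues."""
    return [parent_mz + n * GALNAC_INCREMENT for n in range(max_galnac + 1)]


def count_galnac(observed_mz, parent_mz):
    """Infer the number of GalNAc residues from a nominal m/z, or None if off-ladder."""
    delta = observed_mz - parent_mz
    if delta < 0 or delta % GALNAC_INCREMENT != 0:
        return None
    return delta // GALNAC_INCREMENT


if __name__ == "__main__":
    for n, mz in enumerate(glycoform_ladder(EA2_PARENT_MZ, 3)):
        print(f"{n} GalNAc: expected {mz}, reported {REPORTED[n]}")
```

Real spectra would need a mass tolerance in place of the exact integer match used in `count_galnac`; the exact comparison is kept here only because the text reports nominal values.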
Although both ppGaNTase-T9 and ppGaNTase-T7 require glycosylated substrates, their activities on the MUC5AC glycopeptide substrates are clearly distinct. Whereas ppGaNTase-T7 catalyzes the formation of hexa-glycopeptides from the tri-glycopeptide substrate, ppGaNTase-T9 produces a tetra-glycopeptide species, even after extended incubation. Edman degradation revealed that both ppGaNTase-T7 and ppGaNTase-T9 glycosylate threonine 2; however, ppGaNTase-T7 additionally acts upon threonine 10 and serine 11. In contrast to the results with the MUC5AC glycopeptide, ppGaNTase-T9 and -T7 act similarly on mono-glycosylated EA2; ppGaNTase-T9 and -T7 both form a di-glycopeptide by transferring GalNAc onto threonine 6. These results clearly demonstrate that these two family members have both overlapping and distinct enzymatic activities in vitro. How these in vitro observations translate into in vivo specificities remains to be determined.
In both the MUC5AC and EA2 glycopeptides examined, ppGaNTase-T9 adds GalNAc to the position immediately N-terminal to a previously glycosylated residue. ppGaNTase-T7 also transfers to the position immediately N-terminal to a glycosylated threonine in the EA2 glycopeptide and N-terminal to glycosylated threonines in the MUC5AC tri-glycopeptide. Increased glycosylation of sites vicinal to preexisting GalNAc residues has also been observed in vitro using an undefined mixture of transferases present in human milk (32). It is known from the analysis of mucins that sites of glycosylation tend to be clustered in vivo (33). This clustering may reflect specific GalNAc recognition and subsequent local addition of GalNAc by the glycopeptide transferases. The production of large quantities of specifically designed glycopeptides is necessary to be able to conclusively address the effects of number and position of preexisting GalNAc residues on the activity and subsequent GalNAc addition by the glycopeptide transferases.
Amino acid comparisons of all known functional mammalian ppGaNTases have not uncovered regions of greater conservation between ppGaNTase-T7 and ppGaNTase-T9 relative to the other family members (Tables I and II), including regions within the ricin-like lectin motif. However, a larger panel of glycopeptide-specific enzymes on which to base comparisons may aid in deciphering regions involved in the specific recognition of a glycopeptide substrate. Previous work in the nematode, Caenorhabditis elegans, identified nine ppGaNTase isoforms (34), but enzymatic activity was detectable for only five. It is possible that the remaining four isoforms may also require a previously glycosylated peptide as a substrate. One recent study reports that a single amino acid change within the ricin-like lectin motif of ppGaNTase-T4 compromises the glycopeptide transferase activity of this enzyme (31). However, ppGaNTase-T4 can act as both a peptide and glycopeptide transferase, and it is unclear what specific effect this mutation had on substrate binding and/or catalytic activity, as kinetic parameters were not investigated.

FIG. 6. MALDI-TOF profiles and Edman degradation of EA2 and resultant EA2 glycopeptides.
A, mass spectra of EA2 mono-glycopeptide (Mono) produced by the action of ppGaNTase-T1 and the di-glycopeptide species (Di) produced by ppGaNTase-T9 (T9) and -T7 (T7). Masses are shown next to each peak. Amino acid sequence of the peptide is shown below each panel. The * denotes glycosylated residues. B, Edman degradation profiles for threonines 6 and 7 of the EA2 parent peptide (EA2), the mono-glycopeptide (Mono) formed by ppGaNTase-T1, and the di-glycopeptide (Di) formed by ppGaNTase-T7. Retention time is shown at the bottom. The * and ** denote peaks indicative of the PTH-Thr-O-GalNAc diastereomers. T, threonine; Dm, dimethylphenylthiourea by-product.

The gene expression patterns of ppGaNTase-T9 and ppGaNTase-T7, like their enzymatic activities, display some overlap yet are quite distinct. By Northern analysis, ppGaNTase-T9 is broadly expressed across many adult tissues in the rat, including the sublingual gland, digestive tract, female reproductive tract, testis, heart, brain, spleen, and lung. This tissue distribution is more restricted than the near ubiquitous expression seen for ppGaNTase-T1 yet not as specific as that seen for ppGaNTase-T7, which is found primarily in the sublingual gland and digestive tract. Furthermore, ppGaNTase-T9 transcripts are found in tissues where other more restricted isoforms (ppGaNTase-T5 and ppGaNTase-T7) have not been seen (testis, lung, spleen, brain, and heart). Within the developing embryo, expression of ppGaNTase-T9 is quite abundant relative to ppGaNTase-T7, being found in the developing craniofacial region, intestine, and specific regions of the hindbrain. In contrast, ppGaNTase-T7 is only minimally expressed at this developmental stage and is confined to very discrete regions. Earlier embryonic stages revealed similarly disparate expression patterns for ppGaNTase-T7 and -T9, with ppGaNTase-T9 being expressed in a consistently broader pattern than ppGaNTase-T7 (35).
These results provide new information suggesting additional layers of regulation in the glycosylation process. Thus far, each peptide transferase member has displayed, to varying degrees, unique patterns of activity in vitro (11,12,36,37). The addition of a subfamily of glycopeptide transferases displaying unique activities may result in a potentially complex network of sequential action and regulation within this family. The complexity of the network is further elaborated by the spatial and temporal expression of each transferase as well as their specific location within the Golgi apparatus (38). For example, certain glycopeptide transferases will only be able to act if the requisite peptide transferase has been expressed in the same cell type at the appropriate time. Furthermore, there is evidence suggesting that modification of preexisting GalNAc residues by the addition of other sugars results in a decrease in the subsequent glycosylation of other sites (32); consequently, the respective locations within the Golgi apparatus of the glycopeptide transferases relative to the peptide transferases and other transferases involved in chain elongation may govern the extent to which certain sites are glycosylated. Therefore, the specific enzymatic activities of each isoform and their unique expression patterns in adult tissues and during development, as well as the hierarchy of action established within this large family, may be responsible for the complex patterns of glycosylation observed for in vivo substrates.
One powerful approach to gain insight into the biological role of O-linked glycosylation involves ablation of the genes encoding these enzymes in mice. However, given the overlapping enzymatic specificities exhibited by this family of glycosyltransferases, single gene ablations may result in subtle or perhaps uninformative phenotypes. There exist a number of examples where deletions of single genes from other glycosyltransferase families have resulted in viable, fertile mice without distinguishable phenotypes (39). Therefore, the ablation of multiple isoforms displaying similar enzymatic activity (e.g. ppGaNTase-T7 and -T9) and/or expression profiles may be necessary. Our current efforts to characterize spatial and temporal expression and activity of each member of this family will aid in making informed choices as to which combination of gene ablations may provide insightful phenotypes.