Cloning and Expression of a Novel, Tissue Specifically Expressed Member of the UDP-GalNAc:Polypeptide N-Acetylgalactosaminyltransferase Family*

We report the cloning and expression of the fifth member of the mammalian UDP-GalNAc:polypeptide N-acetylgalactosaminyltransferase (ppGaNTase) family. Degenerate polymerase chain reaction amplification and hybridization screening of a rat sublingual gland (RSLG) cDNA library were used to identify a novel isoform termed ppGaNTase-T5. Conceptual translation of the cDNA reveals a uniquely long stem region not observed for other members of this enzyme family. Recombinant proteins expressed transiently in COS7 cells displayed transferase activity in vitro. Relative activity and substrate preferences of ppGaNTase-T5 were compared with previously identified isoforms (ppGaNTase-T1, -T3, and -T4); ppGaNTase-T5 and -T4 glycosylated a restricted subset of peptides whereas ppGaNTase-T1 and -T3 glycosylated a broader range of substrates. Northern blot analysis revealed that ppGaNTase-T5 is expressed in a highly tissue-specific manner; abundant expression was seen in the RSLG, with lesser amounts of message in the stomach, small intestine, and colon. Therefore, the pattern of expression of ppGaNTase-T5 is the most restricted of all isoforms examined thus far. The identification of this novel isoform underscores the diversity and complexity of the family of genes controlling O-linked glycosylation.

O-Glycosidically linked oligosaccharides are responsible for the unique physical properties and extended structural conformation of molecules such as mucins and a number of membrane receptors (1). Additionally, O-glycans function as ligands for receptors mediating such diverse functions as sperm-egg adhesion (2) and leukocyte trafficking (3). The initiation of O-linked glycosylation occurs through the action of UDP-GalNAc:polypeptide N-acetylgalactosaminyltransferase (ppGaNTase, EC 2.4.1.41), which catalyzes the transfer of GalNAc from the nucleotide sugar UDP-GalNAc to the hydroxyl group of either serine or threonine.
In vitro assays indicate that subtle differences in substrate preferences exist among the more widely expressed ppGaNTase-T1, -T2, and -T3, although there appears to be overlap (7,8,12). In contrast, ppGaNTase-T4 displays limited substrate preferences in vitro and a more restricted pattern of expression (9). Thus, while it remains unclear why so many isoforms of ppGaNTase have evolved, the conservation of this family of enzymes from nematode to humans suggests that there is functional importance to the diversity of the enzymes expressed. In the present study we have cloned and expressed a new form of ppGaNTase, which we designate ppGaNTase-T5. Conceptual translation of the cDNA that encodes this isoform reveals the unusual structural feature of a long putative stem region. In addition, both the pattern of expression and in vitro substrate preferences are more restricted in nature.

EXPERIMENTAL PROCEDURES
Construction of Rat Sublingual Gland cDNA Library-Poly(A)+ RNA was isolated from Wistar RSLGs using the QD™ Rapid Poly(A)+ mRNA Isolation System (5 Prime → 3 Prime, Inc., Boulder, CO). All mRNA was doubly poly(A)+-selected using two oligo(dT)-cellulose columns. The integrity of the purified mRNA was verified by Northern analysis using multiple RSLG probes (data not shown). Prior to cDNA synthesis, 5 μg of poly(A)+ mRNA was treated with 10 mM CH3HgOH for 1 min to eliminate secondary structure. cDNA synthesis was performed using the Stratagene cDNA Synthesis Kit. cDNA size fractionation, quantitation, and cloning into the Uni-Zap XR vector were performed according to the manufacturer's instructions, as were packaging, amplification, and titering of the resultant Uni-Zap XR library.
Isolation of ppGaNTase-T5 Probes and Full-length cDNAs-The conserved amino acid regions EIWGGEN and VWMDEYK were used to design sense and antisense PCR primers, d(GARATHTGGGGNGGNGARAA) (321-S) and d(TTRTAYTCRTCCATCCANAC) (379-AS), respectively. These primers were then employed in PCR reactions using RSLG cDNA as the template under conditions described previously (9). The expected 200-bp PCR products were cloned into the SrfI site of an M13 cloning vehicle and screened for putative ppGaNTase fragments using plaque hybridization with the 32P-end-labeled primers mentioned above. From this screen, 24 positive clones were obtained and sequenced using infrared fluorescent dye-labeled primers on a LI-COR DNA 4000L DNA sequencer. The insert from one unique clone was used to generate a 140-bp PCR fragment using the oligonucleotides 2-6-S (d(AATATGGAGCTATCATTCAAGGTCT)) and 2-6-A (d(AGATTCCGTTCCACCGTCTTC)). This 140-bp fragment was labeled by asymmetric PCR with oligo 2-6-A as described previously (23) and subsequently used to screen 1 × 10^6 plaques from the oligo(dT)-primed Uni-Zap XR RSLG cDNA library according to standard procedures (13). Six positive clones were obtained during this screen. One clone was sequenced in its entirety on both strands. The N-terminal transmembrane domain was determined by a Kyte-Doolittle hydrophobicity plot. Sequence alignments were performed using the Clustal method of Megalign (DNASTAR) and began at the conserved region FNXXXSD in the putative lumenal domain. This conserved region was also used as the putative beginning of the catalytic domains in Fig. 6.
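The relationship between the two conserved peptide blocks and the degenerate primers can be checked with a short script. The following is a minimal sketch, not part of the original procedure: it assumes the standard genetic code and the standard IUPAC nucleotide codes, and the function and variable names are hypothetical. It counts how many distinct oligonucleotides each degenerate primer represents and confirms that every one of them encodes the intended peptide.

```python
from itertools import product

# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Complement of each IUPAC code (e.g. R = A/G pairs with Y = T/C).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Primer sequences as given in the text.
SENSE_321S = "GARATHTGGGGNGGNGARAA"
ANTISENSE_379AS = "TTRTAYTCRTCCATCCANAC"


def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for base in primer:
        count *= len(IUPAC[base])
    return count


def expand(primer):
    """All non-degenerate sequences represented by a degenerate primer."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in primer))]


def reverse_complement(primer):
    """Reverse complement, keeping IUPAC degeneracy codes."""
    return primer.translate(COMPLEMENT)[::-1]


def encoded_peptides(primer):
    """Set of peptides encoded by the complete codons of every expansion."""
    peptides = set()
    for seq in expand(primer):
        usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
        peptides.add(
            "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))
        )
    return peptides


if __name__ == "__main__":
    print(degeneracy(SENSE_321S), encoded_peptides(SENSE_321S))
    antisense_coding = reverse_complement(ANTISENSE_379AS)
    print(degeneracy(ANTISENSE_379AS), encoded_peptides(antisense_coding))
```

Under these assumptions the sense primer is 192-fold degenerate and the antisense primer 32-fold degenerate. Each 20-mer covers six complete codons plus the first two bases of the seventh, so the translations correspond to EIWGGE and VWMDEY, the first six residues of each conserved block.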
Amino Acid Similarity Determinations-Amino acid sequences were aligned, one pair at a time, using the pairwise ClustalW (1.4) algorithm in MacVector (Oxford Molecular Group). The following alignment modes and parameters were used: slow alignment, open gap penalty = 10, extended gap penalty = 0.1, similarity matrix = blosum, delay divergence = 40%, and no hydrophilic gap penalty. The percent aa sequence similarity displayed in Tables I and II represents the sum of the percent identities and similarities. Sequences comprising the conserved domain used in Table II begin with the first aa in Fig. 2 and end with a conserved Pro (aa position 425 in ppGaNTase-T1, 440 in ppGaNTase-T2, 499 in ppGaNTase-T3, 438 in ppGaNTase-T4, and 796 in ppGaNTase-T5). The segment of conserved sequences is approximately 340 aa in length in the various isoforms.
Generation of Secretion Constructs for ppGaNTase-T5-The full-length 3.5-kb cDNA of ppGaNTase-T5 was cloned into the EcoRI and XhoI sites of an M13 vector. The secretion construct of ppGaNTase-T5 was generated by site-directed oligonucleotide mutagenesis (14) using the antisense oligonucleotide d(CCTGTGTTGAACGCGTTGAATGAGA), which generates an MluI site at nucleotide position 113 of ppGaNTase-T5 to form the vector, TBMluIAS. pF1-rT5 was constructed by cloning the MluI/BstEII fragment of TBMluIAS into the MluI/BamHI sites of pIMKF1 (9). pF1-rT5 is an SV40-based expression vector that generates a fusion protein containing the following, in order: an insulin secretion signal, a metal binding site, a heart muscle kinase site, a FLAG™ epitope tag, and the truncated rat ppGaNTase-T5 gene, beginning 4 aa after the hydrophobic domain.
Purification, Labeling, and Gel Analysis of Secreted Isoforms-COS7 cells were grown to 90% confluency in Dulbecco's modified Eagle's medium (Life Technologies, Inc.) + 10% fetal calf serum at 37°C and 5% CO2. One μg of pIMKF1 (9), pF1-mT1 (9), pF1-mT3 (8), pF1-mT4 (9) or pF1-rT5, and 8 μl of LipofectAMINE™ (Life Technologies, Inc.) were used to transfect a 35-mm well of COS7 cells as described previously (9). Recombinant enzymes were purified from the culture media of transfected cells using anti-FLAG™ M2 affinity gel (Eastman Kodak Co.) as described previously (9). Levels of partially purified recombinant enzymes were analyzed by Tricine/SDS-PAGE (15) after labeling with [γ-32P]rATP using heart muscle kinase (HMK) as described previously (16). Gels were dried under vacuum and exposed to film (XAR-Kodak) or quantitated on a PhosphorImager (Molecular Dynamics). A dilution series of certain isoforms was labeled and separated by Tricine/SDS-PAGE to verify the accuracy and linearity of gel quantitation.
Functional Expression Assays of Secreted Recombinant ppGaNTase-T5-Enzyme activity was measured against a panel of peptides including: EA2 (PTTDSTTPAPTTK), from the rat submandibular gland mucin repeat (17); HIV (RGPGRAFVTIGKIGNMR), from HIV gp120 protein (7); MUC2 (PTTTPISTTTMVTPTPTPTC), from human intestinal mucin (18); MUC1b (PDTRPAPGSTAPPAC), from human Muc1 mucin (19); EPO-T (PPDAATAAPLR), from human erythropoietin (4); rMUC-2 (SPTTSTPISSTPQPTS), from rat intestinal mucin (20); and mG-MUC (QTSSPNTGKTSTISTT), from mouse gastric mucin (21). To compare the peptide preferences of each isoform, FLAG-enriched recombinant enzymes were quantitated relative to one another using Tricine/SDS-PAGE as described above. Enzyme assays were conducted using similar amounts of each partially purified enzyme based on gel densitometric measurements. All assays were performed in duplicate at 37°C for 1 h as described previously (9). Reaction products were purified using anion exchange chromatography (AG 1X-8, Bio-Rad). Reactions lacking peptide yielded a background value which was averaged for each isoform and subtracted from each duplicate value. Adjusted duplicate measurements were then averaged to give the values shown in Fig. 4. Enzyme activity is expressed as cpm/h/densitometric unit of partially purified enzyme. Recovered material from cells transfected with vector alone (pIMKF1) gave no activity above background.
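The background correction and normalization described above can be written out as a short calculation. This is a minimal sketch under stated assumptions: the function name, argument names, and the numbers in the usage example are hypothetical and are not data from this study.

```python
from statistics import mean


def adjusted_activity(duplicate_cpm, no_peptide_cpm, densitometric_units, hours=1.0):
    """Return enzyme activity in cpm/h/densitometric unit.

    duplicate_cpm       -- raw counts from the duplicate reactions with peptide
    no_peptide_cpm      -- raw counts from reactions lacking peptide for the
                           same isoform (averaged to give the background)
    densitometric_units -- relative amount of enzyme from gel densitometry
    hours               -- incubation time (assays here ran for 1 h)
    """
    background = mean(no_peptide_cpm)
    # Subtract the averaged background from each duplicate, then average.
    adjusted = [cpm - background for cpm in duplicate_cpm]
    return mean(adjusted) / hours / densitometric_units


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(adjusted_activity([1200.0, 1300.0], [100.0, 140.0], 2.0))
```

Subtracting the background before averaging gives the same mean as subtracting it afterwards; the order used here simply mirrors the description in the text.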
Kinetic Analyses-Kinetic constants for the ppGaNTases were determined for substrates which gave a measurable Km for ppGaNTase-T5 (EA2 and mG-MUC) as well as for UDP-GalNAc. Reactions were performed as described previously (9)  1-250 μM. In an effort to ensure that less than 10% of the substrate UDP-[14C]GalNAc was consumed in each assay, amounts of each enzyme used in these assays were not normalized relative to one another due to substantial differences in activity between isoforms. Assays to determine kinetic parameters of UDP-GalNAc were performed with saturating concentrations of EA2 (500 μM), and assays to determine parameters of EA2 and mG-MUC were performed with 500 μM UDP-GalNAc. All assays were performed in duplicate. Km and Vmax values were estimated using Lineweaver-Burk and Hanes plots.
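Both plots named above are linear transformations of the Michaelis-Menten equation. The Lineweaver-Burk plot uses 1/v = (Km/Vmax)(1/[S]) + 1/Vmax, and the Hanes plot uses [S]/v = [S]/Vmax + Km/Vmax. The sketch below shows how Km and Vmax could be estimated from each by ordinary least squares. It is an illustration only: the fitting procedure actually used is not described in the text, the function names are hypothetical, and the data in the usage example are synthetic.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def lineweaver_burk(substrate, velocity):
    """Estimate (Km, Vmax) from a plot of 1/v against 1/[S]."""
    slope, intercept = linear_fit(
        [1.0 / s for s in substrate], [1.0 / v for v in velocity]
    )
    vmax = 1.0 / intercept   # y-intercept is 1/Vmax
    km = slope * vmax        # slope is Km/Vmax
    return km, vmax


def hanes(substrate, velocity):
    """Estimate (Km, Vmax) from a plot of [S]/v against [S]."""
    slope, intercept = linear_fit(
        substrate, [s / v for s, v in zip(substrate, velocity)]
    )
    vmax = 1.0 / slope       # slope is 1/Vmax
    km = intercept * vmax    # y-intercept is Km/Vmax
    return km, vmax


if __name__ == "__main__":
    # Synthetic, noise-free Michaelis-Menten data (hypothetical Km and Vmax).
    true_km, true_vmax = 50.0, 100.0
    conc = [1.0, 5.0, 10.0, 25.0, 50.0, 100.0, 250.0]
    rate = [true_vmax * s / (true_km + s) for s in conc]
    print(lineweaver_burk(conc, rate))
    print(hanes(conc, rate))
```

With noise-free data the two transformations return the same constants. With real measurements they weight errors differently (the double-reciprocal plot emphasizes the lowest substrate concentrations), which is one reason to compare estimates from both.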
Northern Blot Analysis-Total RNA was extracted from Wistar rat tissues according to the single step isolation method described in Ausubel et al. (22). Following electrophoresis in a 1% formaldehyde agarose gel, rat total RNA samples were transferred to Hybond-N membranes (Amersham Pharmacia Biotech) according to Sambrook et al. (13). ppGaNTase-T5 transcripts were detected by asymmetric PCR labeling (23) of the ~1-kb MscI/NheI fragment from the ppGaNTase-T5 cDNA clone using the antisense oligonucleotide, d(CAGAATGTGGTTGGGTTTAG). ppGaNTase-T1 and -T4 were detected as described previously (9). The antisense oligonucleotide for the 18 S ribosomal subunit (d(TATTGGAGCTGGAATTACCGCGGCTGCTGG)) was 32P-end-labeled as described (13) and hybridized in 5 molar excess to the Northern blots, to normalize for sample loading. All hybridizations were performed in 5× SSPE, 50% formamide at 42°C with two final washes in 2× SSC, 0.1% SDS at 65°C for 20 min.
FIG. 2. Amino acid sequence alignments of ppGaNTase-T1, -T2, -T3, -T4, and -T5 from human, murine, and rat clones. A consensus sequence is shown below alignment blocks for positions that are 60% conserved among isoforms shown (only one of the three T1 isoforms was used in this determination). Alignments began at the conserved region, FNXXXSD, in the putative lumenal domain of the enzymes. Segments of amino acid sequences that were used to make hybridization probes or PCR primers are boxed and underlined with a bold arrow.
TABLE II. Percent amino acid similarity is shown for an ~340-aa conserved domain, using pairwise ClustalW alignments, as described under "Experimental Procedures." hT1 = human ppGaNTase-T1; mT1 = mouse ppGaNTase-T1; rT1 = rat ppGaNTase-T1; hT2 = human ppGaNTase-T2; mT3 = mouse ppGaNTase-T3; mT4 = mouse ppGaNTase-T4; rT5 = rat ppGaNTase-T5; NA = not applicable.

RESULTS
cDNA Cloning of ppGaNTase-T5 and Sequence Analysis-PCR amplification was performed on RSLG cDNA using degenerate primers designed from within a 420-amino acid conserved region of the ppGaNTase family of proteins. A novel PCR product was identified that shared homology with previously characterized isoforms and was subsequently used as a probe to screen a RSLG cDNA library. One cDNA clone containing a complete open reading frame was isolated, sequenced, and designated ppGaNTase-T5 (GenBank™/EBI Data Bank accession number AF049344). The cDNA that encodes ppGaNTase-T5 contains a 3519-bp insert (Fig. 1) which can be conceptually translated into a 930-aa type II membrane protein consisting of a 13-aa N-terminal cytoplasmic region, a 24-aa hydrophobic region, a 416-aa stem region, and a 477-aa lumenal region. The 416-aa stem region is ~4-7 times longer than those of other isoforms. As shown in Fig. 2, ppGaNTase-T5 is distinct from previously identified mammalian isoforms yet shares many blocks of sequence similarity or identity between consensus aa 454 and 930. Amino acid similarity comparisons were performed between rat ppGaNTase-T5 and other known isoforms (Tables I and II). Similarity values range between 52 and 57% when looking at the full-length proteins (Table I) and increase to 64-68% when comparisons are performed within an approximately 340-aa conserved domain (Table II). During this screen, a partial cDNA clone was isolated that shared complete homology with ppGaNTase-T5 except for an additional 100-bp region inserted at nucleotide position 2410. The conceptually translated sequence of this region revealed a stop codon that would yield a 3′ truncation of the protein. Based on preliminary sequence data and Southern analysis, this clone may represent a splice variant of ppGaNTase-T5 (data not shown).
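The domain lengths reported above can be checked for internal consistency with a few lines of arithmetic. The sketch below is illustrative only; the names are hypothetical, and it assumes the four regions are contiguous and that the open reading frame ends with a single stop codon.

```python
# Domain lengths (aa) as reported for ppGaNTase-T5, from N to C terminus.
DOMAINS = [
    ("cytoplasmic", 13),
    ("hydrophobic", 24),
    ("stem", 416),
    ("lumenal", 477),
]


def domain_boundaries(domains):
    """Map each domain name to its (first, last) residue number, 1-indexed."""
    boundaries = {}
    start = 1
    for name, length in domains:
        boundaries[name] = (start, start + length - 1)
        start += length
    return boundaries


BOUNDS = domain_boundaries(DOMAINS)
TOTAL_AA = sum(length for _, length in DOMAINS)
# Coding codons plus one stop codon (assumption: a single stop codon).
ORF_BP = 3 * (TOTAL_AA + 1)

if __name__ == "__main__":
    for name, (first, last) in BOUNDS.items():
        print(f"{name}: aa {first}-{last}")
    print(TOTAL_AA, ORF_BP)
```

The four lengths sum to the reported 930 aa, and the lumenal region spans aa 454-930, matching the range over which sequence similarity to other isoforms is described. Under the single-stop-codon assumption the open reading frame occupies 2793 bp of the 3519-bp insert.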
Functional Expression-The coding region of ppGaNTase-T5 was cloned downstream of the insulin secretion signal, HMK site, and FLAG™ epitope tag of the vector pIMKF1 (9). The ppGaNTase-T5 expression vector, as well as similar constructs containing the previously identified ppGaNTase-T1, -T3, and -T4 cDNAs, were transfected into COS7 cells, and the expressed products were enriched by immunoadsorption to an anti-FLAG affinity matrix. Enriched material for each isoform was γ-32P-labeled and resolved by Tricine/SDS-PAGE (Fig. 3). Equivalent amounts of each partially purified isoform (as judged by densitometric scanning of Tricine/SDS-PAGE gels) were used to assess substrate preferences in vitro (Fig. 4). Both ppGaNTase-T4 and -T5 showed restricted preferences; ppGaNTase-T5 demonstrated significant transferase activity against substrate EA2, with lower levels of glycosylation seen with the rMUC-2, mG-MUC, and EPO-T peptides. Furthermore, mixing experiments between ppGaNTase-T4 and -T5 resulted in an additive effect, demonstrating that these recombinant preparations do not contain differing levels of transferase inhibitors (data not shown). In contrast, ppGaNTase-T1 and -T3 were more promiscuous and incorporated sugar into virtually all peptides tested, albeit at varying levels. When we attempted to express the putative splice variant described above, neither secreted protein nor enzymatic activity could be detected.
Kinetic parameters for ppGaNTase-T1, -T3, -T4, and -T5 were determined for peptide substrates, EA2 and mG-MUC, as well as the sugar donor UDP-GalNAc. Amounts of enzymes used in this analysis were not normalized relative to one another due to substantial differences in activity between isoforms. Km and Vmax values obtained are shown in Table III. All four enzymes displayed similar Km values for UDP-GalNAc, with ppGaNTase-T4 and -T5 having the highest values (0.069 mM and 0.051 mM, respectively). All enzymes, with the exception of ppGaNTase-T4, seemed to prefer EA2 as a substrate over mG-MUC. Km values for ppGaNTase-T5 with the other peptides shown in Fig. 4 were not measurable.
Northern Blot Analysis-Northern blots of rat total RNA from multiple tissues were hybridized with specific probes for ppGaNTase-T1, -T4, and -T5 (Fig. 5). In contrast to 3.7- and 4.3-kb ppGaNTase-T1 transcripts, which were present in all samples examined, the expression of the 4.4-kb ppGaNTase-T5 message was restricted primarily to the SLG, with lower levels seen in stomach, small intestine, and colon. A minor transcript band was also detected at ~8 kb. The distribution of the 4.0-kb and 2.8-kb ppGaNTase-T4 transcripts was similar to that of ppGaNTase-T5, detected principally in the SLG, digestive tract, and additionally, in the female reproductive tract.

DISCUSSION
In an effort to further define the repertoire of ppGaNTases, we have employed degenerate PCR to clone a novel isoform from a rat cDNA library. ppGaNTase-T5 has the attributes of a type II membrane protein, characteristic of previously identified glycosyltransferases (Fig. 6). The N-terminal, hydrophobic, and catalytic regions of ppGaNTase-T5 are similar in length to those of ppGaNTase-T1, -T2, -T3, and -T4. However, ppGaNTase-T5 has an unusually large stem region (416 aa), which is ~4-7 times longer than those of other known isoforms. Many type II membrane proteins are converted into soluble, circulating species after cleavage within the stem region by a group of enzymes referred to as "secretases" (24). Of interest is the large number of hydroxyamino acids and potential N-glycosylation sites found in the ppGaNTase-T5 stem region. The presence of O- or N-glycans in this region could inhibit the action of secretases, thereby keeping this isoform resident within the Golgi complex. Alternatively, a heavily O-glycosylated stem may function as an extended stalk (1), inserting the catalytic region of this enzyme further into the Golgi lumen.
The aa conservation of an individual ppGaNTase isoform is greater than 90% across mammalian species, whereas different isoforms within the same species show more variability. ppGaNTase-T5 demonstrates the highest degree of similarity to ppGaNTase-T1; they share 68% aa similarity within the conserved region and 57% aa similarity across the full-length protein (Tables I and II). The reduced similarity observed when comparing full-length proteins is most likely the result of diminished conservation within the N terminus and the long stem region of ppGaNTase-T5. Previously identified isoforms (ppGaNTase-T1, -T2, -T3, and -T4) have regions of highly conserved or invariant sequences within the lumenal domain. Although ppGaNTase-T5 contains most of these conserved regions, it demonstrates the lowest aa similarity to other isoforms and has a very poorly conserved CHGXGGNQ block (which has been used in the past to identify novel isoforms) (9). This result points out the limitation of using a restricted set of conserved sequences to identify new isoforms; the clones recovered by such methods may be biased against the population of transferases lacking these blocks. When taken together with the results of a site-directed mutagenesis study demonstrating that the histidine residue in this sequence is non-essential (25), it further suggests that this conserved block per se may not play a significant functional role in the transferase activity of these enzymes.
Isoforms of ppGaNTase that are regulated in a highly tissue-specific manner may be uniquely programmed to act only on a limited subset of substrates from a particular cell or tissue type. This may account for the relatively modest activities seen for ppGaNTase-T5 and -T4 against our limited panel of peptides. Thus, it is plausible that ppGaNTase-T5 acts on a substrate unique to the RSLG. Future studies will focus on the identification of substrates whose expression patterns mimic those of a given isoform of ppGaNTase.
The highly restricted expression and unique structural features of ppGaNTase-T5 further emphasize the diversity and complexity of this family of enzymes. It is unclear why so many distinct isoforms are needed in vivo. It remains possible that some isoforms serve more general, housekeeping functions while others act on specific molecules only. However, an extensive network of transferases may be in place with enough redundancy to ensure appropriate glycosylation. Through the continued identification of novel isoforms and gene ablation studies on the diverse repertoire of members available currently, we can begin to dissect the biological significance of this family of enzymes.