cDNA Cloning and Expression of a Novel Human UDP-N-acetyl-α-D-galactosamine

The glycosylation of serine and threonine residues during mucin-type O-linked protein glycosylation is carried out by a family of UDP-GalNAc:polypeptide N-acetylgalactosaminyltransferases (GalNAc-transferases). Previously two members, GalNAc-T1 and -T2, have been isolated and the genes cloned and characterized. Here we report the cDNA cloning and expression of a novel GalNAc-transferase termed GalNAc-T3. The gene was isolated and cloned based on the identification of a GalNAc-transferase motif (61 amino acids) that is shared between GalNAc-T1 and -T2 as well as a homologous Caenorhabditis elegans gene. The cDNA sequence has a 633-amino acid coding region indicating a protein of 72.5 kDa with a type II domain structure. The overall amino acid sequence similarity with GalNAc-T1 and -T2 is approximately 45%; 12 cysteine residues that are shared between GalNAc-T1 and -T2 are also found in GalNAc-T3. GalNAc-T3 was expressed as a soluble protein without the hydrophobic transmembrane domain in insect cells using a baculovirus vector, and the expressed GalNAc-transferase activity showed substrate specificity different from that previously reported for GalNAc-T1 and -T2. Northern analysis of human organs revealed a very restricted expression pattern of GalNAc-T3.

The glycosylation of serine and threonine residues during mucin-type O-linked protein glycosylation is carried out by a family of UDP-GalNAc:polypeptide N-acetylgalactosaminyltransferases (GalNAc-transferases) (EC 2.4.1.41). Two distinct human GalNAc-transferase genes, GalNAc-T1 and -T2, have been cloned and characterized to date (1-3). Analysis of the acceptor substrate specificity of GalNAc-T2 has revealed substrates that this transferase does not utilize (3,4). In the present study we have analyzed the acceptor substrate specificity of GalNAc-T1 and found that neither GalNAc-T1 nor -T2 utilizes all substrates identified, thus suggesting the existence of additional GalNAc-transferases. The existence of additional GalNAc-transferases has also been suggested by Hagen et al. (5) by analysis of sequence similarities of expressed sequence tag clones with those of GalNAc-T1 and -T2. O-Glycosylation in yeast has similarly been shown by Tanner and colleagues (6-8) to be initiated by at least four mannosyltransferases.
Families of glycosyltransferases with related acceptor and/or donor substrate specificities may be encoded by homologous genes showing segments of sequence similarity (9,10). Initially, no sequence similarities were found between enzymes having the same donor substrate specificity (11), but as more enzymes have been cloned, several families of homologous glycosyltransferase genes have been identified. Livingston and Paulson (12) originally identified a sialyltransferase motif in a segment of 55 amino acids that has now been found to be conserved within all identified members of the sialyltransferase family (13). Similarly, sequence similarities are also found within α3/4-fucosyltransferases (10,14), α2-fucosyltransferases (15), β6-N-acetylglucosaminyltransferases (16), β4-N-acetylgalactosaminyltransferases (17), the histo-blood group A/B transferases, and an α3-galactosyltransferase (18).
The human GalNAc-transferases T1 and T2 share a segment of 61 amino acids with 82% sequence similarity, and this segment is also found in a deduced homologue, ZK688.8 (see Fig. 1), which has been observed to exhibit GalNAc-transferase activity (5). In the present study we have utilized this potential GalNAc-transferase motif to develop a PCR strategy that identified two novel cDNAs with sequence similarity. Here we report the cloning of cDNA containing the complete coding sequence of one of these and show by expression that the gene encodes a GalNAc-transferase with an acceptor substrate specificity partly different from that of GalNAc-T1 and -T2. Northern analysis showed that the expression of GalNAc-T3 is highly tissue-restricted, in contrast to that of GalNAc-T1 and -T2.
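As an illustration of how a figure such as 82% over a 61-residue segment can be obtained, the following minimal sketch scores two pre-aligned sequences position by position. It assumes that similarity is counted as simple identity; the scoring scheme actually used is not stated here, and the function name and the two 10-residue strings are hypothetical, chosen only for the example.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions at which two equal-length sequences match.

    Gap characters ('-') never count as matches.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b and a != "-" for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Hypothetical aligned fragments (not taken from GalNAc-T1 or -T2).
toy_a = "ACDEFGHIKL"
toy_b = "ACDEFGHIKV"
toy_identity = percent_identity(toy_a, toy_b)

# Under the identity assumption, 82% of a 61-residue motif corresponds to
# roughly this many matching positions.
matching_positions = round(0.82 * 61)
```

If conservative substitutions were also scored as similar, the same percentage would correspond to fewer strictly identical positions.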

EXPERIMENTAL PROCEDURES
Identification of cDNA Homologous to GalNAc-T1 and -T2 by RT-PCR and Restriction Enzyme Analysis-Multiple sequence alignment analysis (DNASIS, Hitachi) of GalNAc-T1 and -T2 was applied to identify areas with the highest degree of sequence similarity. Based upon a 61-amino acid segment shared by GalNAc-T1 and -T2 as well as a more recently reported sequence derived from a homologous Caenorhabditis elegans gene (5), a pair of sense and antisense primers (EBHC100, 5′-TGGGGAGGAGARAACCTAGA-3′, and EBHC106, 5′-ATTCATCCATCCATACTTCT-3′, respectively) was used in RT-PCR amplifications of poly(A+) RNA from several sources (see Figs. 1 and 2). The mRNA from human organs (liver, brain, and submaxillary gland) was obtained from Clontech, and mRNA from human cancer cell lines (MKN45, Colo205, and WI38) was prepared as reported previously (3). A restriction enzyme search identified a common BstNI site within the expected 196-bp RT-PCR product of GalNAc-T1 and -T2, which would
* This work was supported by the Danish Medical Research Council, the Danish Natural Science Research Council, the Lundbeck Foundation, Ib Henriksens Foundation, the Gangsted Foundation, the Novo Nordisk Foundation, and the Ingeborg Roikjer's Foundation. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EBI Data Bank with accession number(s) X92689.
The 196-bp products from RT-PCR of MKN45 mRNA that were resistant to BstNI cleavage were isolated using the prep-A-gene kit (Bio-Rad) and cloned into the pT7T3U19 vector (Pharmacia Biotech Inc.). Plasmids from 40 individual clones were purified using Qiagen-tip 20 columns (Qiagen), and the clones were sequenced. Two sets of sequences differing from GalNAc-T1 and -T2 but exhibiting a high degree of similarity were identified, and sequence information from one set of identical clones designated TE3 was used for the isolation of 5′ and 3′ sequences outside the GalNAc-transferase motif.
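The sense primer EBHC100 used for the RT-PCR contains the IUPAC ambiguity code R (A or G), so it represents a small pool of oligonucleotides. The sketch below expands such a degenerate primer into its individual sequences. The helper name and the reduced IUPAC table are illustrative assumptions; only the primer sequence itself is taken from the text.

```python
from itertools import product

# Subset of the IUPAC nucleotide ambiguity codes (enough for this example).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",   # purine
    "Y": "CT",   # pyrimidine
    "W": "AT",
    "S": "CG",
    "N": "ACGT",
}


def expand_degenerate(primer: str) -> list[str]:
    """Return every unambiguous sequence represented by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(bases) for bases in product(*choices)]


EBHC100 = "TGGGGAGGAGARAACCTAGA"   # sense primer as given in the text
ebhc100_variants = expand_degenerate(EBHC100)
```

With a single twofold-degenerate position, the primer pool contains two 20-mers that differ only at the R position.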
Cloning and Sequencing of GalNAc-T3 by Rapid cDNA Library Screening-Rapid library screening was performed by diluting 1 × 10⁶ pfu of a human salivary gland λgt11 library (Clontech) into 40 sublibraries (designated numbers 1-40), each possessing approximately 2.5 × 10⁴ pfu. All sublibraries were subjected to phage amplification (approximately 40-fold) by liquid culture phage amplification (19), giving a sublibrary titer of 1 × 10⁶ pfu. Phage amplification was performed in 1 ml of LB MgSO4 maltose medium in a shaking incubator at 37°C for 5 h. After amplification, 20 μl of chloroform was added to each sublibrary, cellular debris was pelleted, and the phage supernatants were titrated and used in subsequent screenings. All 40 sublibraries were screened to identify phage clones containing TE3. One μl of each sublibrary (approximately 10⁴-10⁵ pfu) was lysed in a volume of 10 μl in the presence of 0.45% Nonidet P-40 and Tween 20, 100 g/l proteinase K at 56°C for 30 min. Proteinase K was heat-inactivated by boiling for 15 min, and 2 μl of phage lysate was amplified by PCR using primers EBHC100 and EBHC204 at 0.5 μM using 40 cycles of 95°C for 45 s, 55°C for 5 s, and 72°C for 30 s. Thirteen sublibraries found to contain TE3 λgt11 clones were further assayed by PCR using EBHC202 (5′-GCGGATCCGCAGCAAAAGCCCTCATAGCTTT-3′) or EBHC204 (5′-GCGGATCCTCTAGCAATCACCTGAGTGCC-3′) primers combined with the λgt11 vector primers to estimate lengths of cDNA inserts for selection of sublibraries with the most 3′ or 5′ sequence. Amplifications were performed for 35 cycles of 95°C for 45 s, 55°C for 1 s, and 72°C for 2 min.
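The sublibrary figures quoted above follow from simple arithmetic, sketched below. The second half of the block goes beyond what is reported: it assumes that TE3 clones were distributed among the sublibraries as a Poisson process, which allows a rough estimate of how many independent TE3 clones the original library contained from the fraction of sublibraries that scored negative. That estimate is an illustration under the stated assumption, not a result of this study, and all variable names are hypothetical.

```python
import math

TOTAL_PFU = 1_000_000        # 1 x 10^6 pfu of library, as stated in the text
N_SUBLIBRARIES = 40
AMPLIFICATION_FOLD = 40      # "approximately 40-fold" in the text

# Phage per sublibrary before and after liquid culture amplification.
pfu_per_sublibrary = TOTAL_PFU / N_SUBLIBRARIES
titer_after_amplification = pfu_per_sublibrary * AMPLIFICATION_FOLD

# Assumption (not made in the paper): independent TE3 clones fall into
# sublibraries at random, so the fraction of negative sublibraries is
# exp(-m), where m is the mean number of TE3 clones per sublibrary.
POSITIVE_SUBLIBRARIES = 13
negative_fraction = (N_SUBLIBRARIES - POSITIVE_SUBLIBRARIES) / N_SUBLIBRARIES
mean_clones_per_sublibrary = -math.log(negative_fraction)
estimated_clones_in_library = mean_clones_per_sublibrary * N_SUBLIBRARIES
```

Amplification changes the titer of each sublibrary but not the number of independent clones it holds, which is why the Poisson estimate uses only the counts of positive and negative sublibraries.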
Two sublibraries generated 3′ PCR products (EBHC204/λgt11 vector) of approximately 1000 bp, and two sublibraries generated TE3 5′ PCR products of approximately 1200 bp. PCR products were subcloned into pT7T3U19 and sequenced. These PCR products were used to probe and isolate cDNA clones from the corresponding sublibraries. Both strands of the subcloned cDNAs were sequenced (20) using internal primers spaced 300-400 bp apart. Partly overlapping sequence data from cDNA clones were utilized to derive the complete coding sequence.
Expression of GalNAc-T3 in Sf9 Cells-A partial cDNA sequence of the putative GalNAc-T3 gene was produced as an RT-PCR product using primers EBHC219 (5′-AGCGGATCCTCAACGATGGAAAGGAACATG-3′) and EBHC215 (5′-AGCGGATCCAGGAACACTTAATCATTTTGGC-3′), which introduced BamHI restriction sites, and was cloned to give pAcGP67-GalNAc-T3-sol (see Fig. 3). The PCR product was designed to yield a putative soluble form of the GalNAc-T3 protein with an NH2-terminal end positioned immediately COOH-terminal to the potential transmembrane domain and including the entire sequence expected to contain the catalytic domain. The PCR product was cloned into a BamHI site of the expression vector pAcGP67 (Pharmingen), and the expression construct was sequenced to verify the sequence and correct insertion into the cloning site. Control constructs included pAcGP67-GalNAc-T2-sol prepared as described previously (3), pAcGP67-GalNAc-T1-sol prepared similarly by RT-PCR with human submaxillary gland mRNA and designed to mimic the originally identified amino terminus of the soluble bovine GalNAc-transferase protein (1), and pAcGP67-O2-sol containing the histo-blood group O2 cDNA and prepared as described previously for the blood group A cDNA (21). Co-transfection of Sf9 cells with pAcGP67 constructs and Baculo-Gold DNA was performed according to the manufacturer's description. Briefly, 0.5 μg of construct was mixed with 0.05 μg of Baculo-Gold DNA and co-transfected in Sf9 cells in 24-well plates. At 96 h post-transfection, recombinant virus was amplified in 6-well plates at dilutions of 1:10 and 1:50. The titer of amplified virus was estimated by titration in 24-well plates with monitoring of enzyme activities. Transferase assays were performed on supernatants of Sf9 cells in 6-well plates infected with virus at titers of 1:1000 to 1:5000, representing end point dilutions giving optimal enzyme activities.
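Both expression primers carry a BamHI recognition sequence (GGATCC) near the 5′ end to allow cloning into the BamHI site of pAcGP67. A minimal sketch of how the presence and position of such a site can be verified in a primer is given below; the helper function is hypothetical, and the primer sequences are those given above written without internal hyphens.

```python
BAMHI_SITE = "GGATCC"   # BamHI recognition sequence

primers = {
    "EBHC219": "AGCGGATCCTCAACGATGGAAAGGAACATG",
    "EBHC215": "AGCGGATCCAGGAACACTTAATCATTTTGGC",
}


def site_positions(sequence: str, site: str) -> list[int]:
    """Return the 0-based start index of every occurrence of site in sequence."""
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions


bamhi_positions = {
    name: site_positions(seq, BAMHI_SITE) for name, seq in primers.items()
}
```

In both primers the site is preceded by three extra 5′ nucleotides (AGC), consistent with the common practice of adding a short overhang so that the enzyme can cut close to the end of a PCR product.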

Identification of DNA Homologous to GalNAc-T1 and -T2-A set of primers (EBHC100/EBHC106) corresponding to sequences flanking a putative GalNAc-transferase motif (Fig. 1) was used in RT-PCR reactions with mRNA from a variety of human organs and cell lines. A single DNA fragment of approximately 196 bp corresponding to that predicted for GalNAc-T1 and -T2 was amplified from all templates (Fig. 2). Hybridization with oligonucleotide probes specific for GalNAc-T1 and -T2 served as a control for the identities of the products observed. A restriction enzyme (BstNI) that selectively cut the products of both GalNAc-T1 and -T2 was used to detect potentially novel DNA from homologous genes. As seen in Fig. 2, RNA from several organs and cell lines yielded RT-PCR products that were not cleaved by BstNI, indicating the presence of a novel DNA fragment. The BstNI-uncleaved RT-PCR product from the gastric carcinoma cell line MKN45 was subcloned and sequenced. Forty independent clones were sequenced, and of these eight clones contained sequences homologous to but different from GalNAc-T1 and -T2. Six independent clones had a novel sequence designated TE3, and two clones had a novel sequence designated TE4. The DNA sequence of TE3 was clearly similar to GalNAc-T1 and -T2 with a sequence similarity of approximately 80%. The deduced amino acid sequence containing the putative GalNAc-transferase motif is presented in Fig. 1.
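The selection logic of the BstNI step can be sketched in a few lines. BstNI recognizes CCWGG (W = A or T); because this site is its own reverse complement, scanning one strand is sufficient. Amplicons that contain the site behave like GalNAc-T1 and -T2 and are cleaved, whereas amplicons lacking it remain intact and are candidates for novel homologues. The two short sequences used below are hypothetical and serve only to exercise the function; the clone counts are those reported above.

```python
import re

# BstNI recognition sequence CCWGG, where W is A or T.
BSTNI_SITE = re.compile(r"CC[AT]GG")


def has_bstni_site(amplicon: str) -> bool:
    """True if the amplicon contains at least one BstNI recognition site."""
    return BSTNI_SITE.search(amplicon.upper()) is not None


def classify_amplicon(amplicon: str) -> str:
    """Label an RT-PCR product by its expected behaviour on BstNI digestion."""
    if has_bstni_site(amplicon):
        return "cleaved (consistent with GalNAc-T1/-T2)"
    return "uncleaved (candidate novel sequence)"


# Hypothetical toy sequences, not real amplicons.
toy_known = "ATGCCAGGTTACG"
toy_novel = "ATGCCGGGTTACG"
labels = [classify_amplicon(toy_known), classify_amplicon(toy_novel)]

# Clone tally reported in the text: 6 TE3 + 2 TE4 among 40 sequenced clones.
novel_fraction = (6 + 2) / 40
```

That only a fraction of the sequenced clones from the uncleaved band were novel suggests that digestion enriched for, rather than fully purified, the new sequences, in line with the description of the method as reducing background.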
Cloning of Human GalNAc-T3 Using the TE3 DNA Sequence-Cloning and sequencing of the complete coding sequence of GalNAc-T3 was achieved by PCR screening of 40 sublibraries from a human salivary gland λgt11 library, which yielded two sublibraries (number 8 and number 1) containing long 3′ and 5′ sequences outside the TE3 probe area. This strategy facilitated identification of cDNA clones with long 5′ and 3′ inserts and allowed us to compare multiple 5′ and 3′ sequences obtained within the isolated cDNA clones to identify and avoid intron-containing sequences. Two PCR products of 1000 bp from sublibrary 8 and two PCR products of 1200 bp from sublibrary 1 were selected, subcloned, and sequenced. The sequences of these PCR products exhibited similarity to the sequences of GalNAc-T1 and -T2. One cDNA clone from each sublibrary was isolated, and inserts were subcloned and sequenced. The sequences found in the PCR products were identical to the corresponding sequences in the selected cDNA clones. The 3′ cDNA clone 8.3′ possessed a 3-kb insert with a single 900-bp open reading frame followed by multiple stop codons and a consensus polyadenylation box (Fig. 3). The 5′ end of the insert of clone 8.3′ apparently contained an intron sequence, and this has been confirmed by sequence comparison of several RT-PCR and cDNA clones as well as a genomic clone. One 5′ cDNA clone, 1.5′, possessed a 1300-bp open reading frame but was not considered to contain the complete coding sequence, because it lacked a putative hydrophobic transmembrane region. A second screen using an antisense primer, EBHC211 (5′-ACCGGATCCAGTGTTTAGCTTCCCCACG) (5′ region of clone 1.5′), yielded another 5′ clone, 12.5′, which contained an additional 550 bp of 5′ sequence including a potential transmembrane region. As shown in Fig. 4 (Table I). GalNAc-transferase activity with the Muc2 acceptor substrate peptide was increased 20-fold, and activity with the HIV-V3 peptide was increased nearly 100-fold. In contrast, expression of GalNAc-T1 and -T2 constructs only increased the GalNAc-trans-
Background levels of GalNAc-transferase activity in uninfected cell medium were higher than in control infected cell medium, probably as a result of the production and release of endogenous Sf9 GalNAc-transferase due to the larger number of cells in uninfected cultures. Furthermore, background enzyme activity varied significantly among different acceptor substrate peptides. The Muc2 peptide yielded the highest background activity and the HIV-V3 peptide the lowest. In an early attempt to express functional pAcGP67-GalNAc-T3, constructs were made that were truncated at either the 5′ end or the 3′ end (data not shown). Interestingly, constructs lacking the 14 COOH-terminal or 55 NH2-terminal amino acids were completely inactive, indicating that both the stem region and the COOH-terminal region are important for maintaining a catalytically active protein, a feature also found for the α3-galactosyltransferase (14).
Northern Blot Analysis of Human Organs-Northern blots with mRNA from 16 human adult and 5 fetal organs were probed with GalNAc-T1, -T2, and -T3 (Fig. 6). Similar to previous results using the multiple tissue Northern blot, MTN I, GalNAc-T1 hybridized to two mRNAs of approximately 3.4 and 4.1 kb (1), whereas GalNAc-T2 hybridized to a 4.5-kb mRNA. Variable amounts of a smaller 2-3-kb mRNA were also detected with this probe (3). Hybridization of these probes to multiple tissue Northern blots, MTN II and fetal MTN, resulted in slightly different estimated mRNA sizes for all GalNAc-Ts. This discrepancy is probably due to differences in the parameters of gel electrophoresis and the marker positions assigned by the supplier. GalNAc-T3 hybridized to a 3.6-kb mRNA (estimated from MTN I) that was highly expressed in pancreas and testis and weakly expressed in kidney, prostate, ovary, intestine, and colon. A very low level of GalNAc-T3 mRNA was also detected in adult placenta and lung as well as fetal lung and kidney. In adult spleen GalNAc-T3 hybridized to a larger 4.2-kb mRNA (estimated from MTN II).

DISCUSSION
This study presents data on the cloning, sequencing, and expression of a third member of a growing family of polypeptide GalNAc-transferases. A putative GalNAc-transferase motif of 61 amino acids that is highly conserved in sequence among GalNAc-T1, GalNAc-T2, and a C. elegans homologue was used to search for potential additional members of the polypeptide GalNAc-transferase family. The screening strategy included an RT-PCR strategy similar to that reported for the sialyltransferase family (12,23) followed by restriction enzyme analysis as a selection procedure. This method allowed us to eliminate or reduce "background" from the two known GalNAc-transferases, GalNAc-T1 and GalNAc-T2, and clearly distinguish novel RT-PCR products of the same size as those for GalNAc-T1 and -T2. Two novel DNA fragments were identified and sequenced. The present study presents data about one of these.
The novel cDNA was shown by expression in insect cells to have polypeptide GalNAc-transferase activity (Table I); it therefore may be classified as GalNAc-T3. The GalNAc-T3 gene encodes a protein with a predicted type II transmembrane domain structure similar to GalNAc-T1 and -T2 as well as all other glycosyltransferases characterized thus far (10). The GalNAc-T3 protein shows an overall amino acid sequence similarity of approximately 45% to either GalNAc-T1 or -T2, which is similar to the sequence similarity between GalNAc-T1 and -T2 (3). The lowest degree of sequence similarity is found in the amino-terminal region, including the transmembrane domain as well as the putative stem region. GalNAc-T3 is more than 50 amino acids longer than GalNAc-T1 or -T2 in the putative stem region. More than 80% of the COOH-terminal sequence of GalNAc-T1, -T2, and -T3 can be aligned by sequence similarity including the GalNAc-transferase motif, a number of minor segments of sequence similarity, and most of the cysteine residues (Fig. 5). Despite the relatively low overall amino acid sequence similarity between GalNAc-T1, -T2, and -T3, 12 cysteine residues that are evenly spaced within the major part of the proteins are conserved. These may be involved in intramolecular disulfide bonding, or they may be directly involved in the catalytic activity of the enzymes. The significance of conserved cysteine residues was originally noted by Drickamer (24) within the sialyltransferase family. The functional importance of cysteine residues involved in intramolecular disulfide bonding as well as possibly the catalytic site of the β4-galactosyltransferase was recently demonstrated (25). The number of conserved cysteine residues in the polypeptide GalNAc-transferases far exceeds the number of cysteine residues reported in other glycosyltransferases to date (10). Interestingly, it appears that in vitro measurable GalNAc-transferase activity is increased by the presence of reducing agents (3,26).
GalNAc-T3 was found to have a different acceptor substrate specificity than GalNAc-T1 and -T2. Among a panel of acceptor substrate peptides (Table I), GalNAc-T3 was found to glycosylate a peptide derived from the HIV envelope glycoprotein gp120, which did not serve as substrate for GalNAc-T1 or -T2 (4, Table I). This peptide was identified as an acceptor substrate during analysis of enzyme activity in total extracts of various cell lines and organs (4,27). GalNAc-T3 also catalyzed glycosylation of mucin-type acceptor sequences such as Muc2 and Muc5, which can also be glycosylated by GalNAc-T1 and -T2. In a previous study we found that the enzyme activity that mediated glycosylation of the HIV peptide also utilized the Muc2 substrate by cross-competitive glycosylation (4). This finding is consistent with the substrate specificity reported here for GalNAc-T3, suggesting that GalNAc-T3 may represent this particular enzyme; however, additional enzymes may also show related specificities. Detailed analysis of individual GalNAc-transferases with a large panel of peptides and structural confirmation of the specific acceptor sites utilized for GalNAc-glycosylation will be necessary to fully understand the specificity of the individual members of the enzyme family.
The first step of mucin-type O-glycosylation is mediated by at least three and probably more GalNAc-transferases. The data presented here clearly show that GalNAc-T3 exhibits a different acceptor substrate specificity than GalNAc-T1 and -T2, using short synthetic peptides with no or little predicted secondary structure. The finding that the three GalNAc-transferases share mucin-type acceptor substrates such as the Muc2 and Muc5 peptide sequences may indicate overlap in specificity, but further structural studies of the products formed to identify the sites utilized by each enzyme on these peptides with multiple serine and threonine residues are needed to clarify this. It is clear that in vivo models displaying differential expression of GalNAc-transferases are needed to evaluate the contribution of a given GalNAc-transferase.
a One unit of enzyme is defined as the amount of enzyme that transfers 1 μmol of GalNAc in 1 min using the standard reaction mixture as described under "Experimental Procedures" with 50 μg of peptide as acceptor substrate.
In contrast to GalNAc-T1 and -T2, expression of GalNAc-T3 appears to be highly regulated and mainly found in pancreas and testis; weak expression is found in a few other organs including placenta. Interestingly, approximately 200 bp covering part of the GalNAc-transferase motif of GalNAc-T3 were recently sequenced from a pancreatic expressed sequence tag library (EMBL accession number T11328). The lack of GalNAc-T3 expression in human liver correlates with the finding that organ extracts from human liver lacked GalNAc-transferase activity utilizing the HIV peptide, whereas expression of GalNAc-T3 mRNA in human placenta is in agreement with GalNAc-transferase activity using the HIV peptide in extracts of placenta (4). One interpretation of these data is that differential expression of different GalNAc-transferases can result in O-glycosylation of distinct sites on a given protein. The biological significance of this is unclear. There are a few studies on O-glycosylation sites, and these are limited to analysis of the functional activity or stability of a protein with or without a single O-glycosylation site (28), because assignments of O-glycosylation sites are difficult to perform (29).
The results presented here suggest that cell-, organ-, and species-specific differences in the position of O-glycosylation may occur as a result of differential expression of polypeptide GalNAc-transferases. In searching for potential motifs of O-glycosylation by analyzing serine and threonine residues carrying O-glycans in glycoproteins, one may need to consider the GalNAc-transferase repertoire of the cell of origin (30,31). The existence of a transferase family of unknown size, possibly exceeding three members, displaying differential acceptor substrate specificity and cell/organ distribution suggests that mucin-type O-glycosylation is a much more defined and controlled process than previously recognized. In this respect, previous studies aimed at identifying consensus sequence motifs for O-glycosylation may not have identified such because of the unknown level of complexity. The data reported here suggest that O-glycosylation in terms of sites is less random than previously suggested and that more defined acceptor substrate peptide sequences may be recognized for each of the individual GalNAc-transferases. With the individual GalNAc-transferases expressed as recombinant proteins, it may be possible to determine primary peptide sequence motifs for the individual enzymes, which could be useful for predicting O-glycosylation in vivo by a given cell type.