Molecular Cloning and Expression of a Novel β-1,6-N-Acetylglucosaminyltransferase That Forms Core 2, Core 4, and I Branches*

Mucin-type O-glycans are classified according to their core structures. Among them, cores 2 and 4 are important for having N-acetyllactosamine side chains, which can be further modified to express various functional oligosaccharides. Previously, we discovered by cloning cDNAs that the core 2 branching enzyme, termed core 2 β-1,6-N-acetylglucosaminyltransferase-leukocyte type (C2GnT-L), is highly homologous to the I branching β-1,6-N-acetylglucosaminyltransferase (IGnT) (Bierhuizen, M. F. A., Mattei, M.-G., and Fukuda, M. (1993) Genes Dev. 7, 468–478). Using these homologous sequences as probes, we identified an expressed sequence tag in dbEST, which has significant homology to C2GnT-L and IGnT. This approach, together with 5′ and 3′ rapid amplification of cDNA ends, yielded a human cDNA that encompasses the whole coding region of an enzyme, termed C2GnT-mucin type (C2GnT-M). C2GnT-M has 48.2 and 33.8% identity with C2GnT-L and IGnT, respectively, at the amino acid level. The expression of C2GnT-M cDNA directed the expression of core 2 branched oligosaccharides and I antigen on the cell surface. Moreover, a soluble chimeric C2GnT-M had core 4 branching activity in addition to core 2 and I branching activities. A soluble chimeric C2GnT-L, in contrast, has almost exclusively core 2 branching activity. Northern blot analysis demonstrated that the C2GnT-M transcripts are heavily expressed in colon, small intestine, trachea, and stomach, where mucin is produced. In contrast, the transcripts of C2GnT-L were more widely detected, including the lymph node and bone marrow. These results indicate that the newly cloned C2GnT-M plays a critical role in O-glycan synthesis in mucins and might have distinctly different roles in oligosaccharide ligand formation compared with C2GnT-L.

Those sialyl Le^x and sulfated sialyl Le^x in O-glycans have been shown to be preferential ligands for P- and L-selectin (6-10). It has also been shown that poly-N-acetyllactosamines can be extended from core 2 branches (3,5,7,11,12). Poly-N-acetyllactosamines provide the backbone for additional modifications such as sialyl Le^x. Moreover, a linear poly-N-acetyllactosamine, (Galβ1→4GlcNAcβ1→3)n, can be converted to branched poly-N-acetyllactosamine, Galβ1→4GlcNAcβ1→3(Galβ1→4GlcNAcβ1→6)Gal→R, by the I branching enzyme IGnT (13,14). These linear and branched poly-N-acetyllactosamines represent, respectively, the i and I antigens, which are expressed in a cell type-specific manner (1). It has been shown that there exist two IGnTs, which use two different sets of acceptor substrates (15-17).
In the gastrointestinal tract, oligosaccharides with core 3, GlcNAcβ1→3GalNAc, can be frequently found. These tissues also contain core 4, Galβ1→4GlcNAcβ1→6(GlcNAcβ1→3)GalNAc, which was originally discovered in sheep gastric mucosa (18). Core 4 is formed from core 3 by core 4 β-1,6-N-acetylglucosaminyltransferase (C4GnT). It has been reported that the amount of core 4 oligosaccharides is reduced in colonic carcinoma cells, whereas the amount of core 2 oligosaccharides is maintained or increased (19,20). More recently, the increase in the transcript of C2GnT was found to be associated with the progression of colonic carcinomas (21).
To understand the roles of these oligosaccharides in normal and pathological conditions, it is essential to understand the regulation of their biosynthesis. To this end, we previously cloned cDNAs encoding C2GnT (termed in the present study C2GnT-leukocyte type (C2GnT-L)) from human HL-60 promyelocytic cells (22) and IGnT from human PA-1 embryonic carcinoma cells (23). C2GnT-L and IGnT share homologous sequences in three regions of their catalytic domains (23). Neither C2GnT-L nor IGnT, however, forms the core 4 structure. On the other hand, a C2GnT purified from mucin-producing cells, termed C2GnT-mucin type (C2GnT-M), was reported to contain C2GnT and C4GnT activities in the same enzyme (2,15,16). In the present study we describe the isolation of a cDNA encoding such an enzyme using an EST sequence, which has strong homology with C2GnT-L and IGnT. We found that this newly cloned enzyme is unique in having C2GnT activity as well as core 4 and I branching activities. We termed this new enzyme C2GnT-M, because it is mainly expressed in mucin-producing tissues.

* This work was supported by MERIT Award R37 CA33000 and in part by Grant PO1 CA71932 from the NCI, National Institutes of Health. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™

EXPERIMENTAL PROCEDURES
Isolation of cDNA Encoding C2GnT-M-The coding sequences of human C2GnT and IGnT homologous region A (nucleotides 315-702 for C2GnT-L and nucleotides 252-618 for IGnT; Refs. 22 and 23) were used to search dbEST using the TBLASTN program. A single query gene (AA307800) of 554 base pairs was found to have 67 and 62% identity with the coding regions of C2GnT and IGnT, respectively. Judging from the nucleotide sequence, the EST clone AA307800 did not encompass the whole coding sequence. Using the primers based on the sequence of this EST clone and human fetal brain Marathon Ready cDNA (CLONTECH) as a template, PCR resulted in a product of the expected sequence of 247 base pairs.
Human fetal brain Marathon Ready cDNA was then used as a template to obtain both the 5′- and 3′-sequences. 5′ RACE and 3′ RACE were separately carried out by PCR using antisense primer or sense primer and AP1 adapter (CLONTECH). Antisense and sense primers used were CCL2AS (5′-CCAGCTTACTGGCTATGAAGACATTTGG-3′) and CCL1 (5′-GAGCACTTCAAGGCTGAAAGGAAGTTC-3′). These primers were designed to have a 247-base pair overlapping sequence between these two RACE products. PCR products were cloned into pCR2.1-TOPO (Invitrogen), and colony hybridization was performed to identify the RACE products using a 32P-labeled DNA fragment of 247 base pairs, obtained by PCR using CCL1 and CCL2AS primers described above. The nucleotide sequence was identical among several clones examined. Based on the obtained sequences, a cDNA was prepared by reverse transcription-PCR of poly(A)+ RNA from HT-29 colonic carcinoma cells, and the sequence of genomic DNA was analyzed by PCR. Reverse transcription-PCR was carried out as described previously (24), except that Thermoscript (Life Technologies) was used. Because the two nucleotide sequences were found to be identical, the cDNA obtained by the above reverse transcription-PCR was cloned into pcDNA3.1/Zeo(−) vector, resulting in pcDNA3.1-C2GnT-M.
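As a rough illustration of the primer pair described above, the following Python sketch reports the length, GC content, and reverse complement of the two RACE primers. The primer sequences are copied from the text; the helper names (`reverse_complement`, `gc_fraction`) are hypothetical and are not part of the original protocol.

```python
# Illustrative sketch only; helper names are assumptions, not from the paper.

# Primer sequences as printed in the text (written 5' to 3').
PRIMERS = {
    "CCL2AS": "CCAGCTTACTGGCTATGAAGACATTTGG",  # antisense primer
    "CCL1": "GAGCACTTCAAGGCTGAAAGGAAGTTC",     # sense primer
}

# Translation table mapping each base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Return the fraction of bases in the sequence that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, len(seq), round(gc_fraction(seq), 2), reverse_complement(seq))
```

Taking the reverse complement of the antisense primer gives the sequence as it would read on the sense strand, which is the orientation needed to locate it relative to CCL1.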
Construction of Vector Encoding a Chimeric C2GnT-M-The cDNA fragment encoding a putative catalytic domain plus stem region was prepared by PCR using pcDNA3.1-C2GnT-M as a template. The 5′ primer for the PCR contained the sequence of BglII site and nucleotides 99-124 (nucleotides 1-3 encode the initiation methionine), whereas the 3′ primer contained the 3′ end of the coding sequence, stop codon, and XbaI site. The cDNA fragment encoding amino acid residues 34-438 of C2GnT-M was ligated into BamHI and XbaI sites of pcDNA3-A harboring a cDNA encoding a signal peptide and IgG binding domain of protein A (24), resulting in pcDNA3-A·C2GnT-M. Similarly, the catalytic domain of IGnT (amino acid residues 30-400) was cloned into pcDNAI-A (24), resulting in pcDNAI-A·IGnT. pcDNAI-A·C2GnT-L was constructed exactly as described before (22), except that pcDNAI-A described recently (24) was used.
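The relation between residue numbers and nucleotide positions used above can be sketched as follows, assuming only the numbering stated in the text (nucleotides 1-3 encode the initiation methionine). The function names are hypothetical. Under this numbering, the codon for residue 34 begins at nucleotide 100, which lies within the 99-124 stretch covered by the 5′ primer, and the codon for residue 438 ends at nucleotide 1314.

```python
from typing import Tuple


def codon_span(residue: int) -> Tuple[int, int]:
    """Return the 1-based (first, last) nucleotide positions of a residue's codon.

    Assumes nucleotide 1 is the first base of the initiation codon,
    as stated in the text.
    """
    if residue < 1:
        raise ValueError("residue numbers start at 1")
    first = 3 * (residue - 1) + 1
    return first, first + 2


def region_span(first_residue: int, last_residue: int) -> Tuple[int, int]:
    """Return the nucleotide range that encodes an inclusive range of residues."""
    if last_residue < first_residue:
        raise ValueError("last_residue must not precede first_residue")
    return codon_span(first_residue)[0], codon_span(last_residue)[1]


if __name__ == "__main__":
    # Residues 34-438 are the portion of C2GnT-M fused to protein A.
    print(region_span(34, 438))
```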
Enzyme Assays-pcDNA3-A·C2GnT-M, pcDNAI-A·C2GnT-L, or pcDNAI-A·IGnT was transiently transfected into CHO cells as described previously (25). Twenty-four h after the transfection, the medium was replaced with serum-free Opti-MEM (Life Technologies) and cultured for an additional 48 h. The spent medium obtained was then concentrated by Centriprep 30 (Amicon) and directly used for C2GnT-L and IGnT assay. For assaying C2GnT-M, the chimeric enzyme was adsorbed to IgG-Sepharose from the concentrated spent medium, and the enzyme bound to IgG-Sepharose was used as the enzyme source as described before (26) 20 µl of the enzyme solution in total of 50 µl of 50 mM 4-morpholineethanesulfonic acid, pH 7.0. After incubation at 37°C for 1 h, the reaction was stopped by diluting with 1 ml of water, and reaction products were purified by a C18 reversed-phase column (Alltech) as described previously (27).
CHO cells transfected with pcDNAI-A vector were used as a negative control, and the radioactivity obtained was subtracted from the radioactivity obtained in experiments with the enzyme. Because CHO cells contain no C2GnT or IGnT activity (22,23,28,29), the radioactivity obtained in mock experiments was <0.1% of the radioactivity incorporated in the reactions with the highest incorporation.
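The background correction described above amounts to a simple subtraction, sketched here in Python. The function names and the counts in the usage example are hypothetical and chosen only for illustration; they are not data from this study.

```python
def net_incorporation(sample_cpm: float, mock_cpm: float) -> float:
    """Subtract the mock-transfected (vector only) control from a sample reading."""
    return sample_cpm - mock_cpm


def relative_activity(net_cpm: float, reference_net_cpm: float) -> float:
    """Express a background-corrected reading as a percentage of a reference."""
    if reference_net_cpm <= 0:
        raise ValueError("reference activity must be positive")
    return 100.0 * net_cpm / reference_net_cpm


if __name__ == "__main__":
    # Hypothetical counts per minute, for illustration only.
    mock = 10.0
    reference = net_incorporation(12000.0, mock)  # acceptor with highest incorporation
    other = net_incorporation(850.0, mock)        # a second acceptor
    print(round(relative_activity(other, reference), 1))
```

Expressing each acceptor relative to the one with the highest incorporation is one common way to compare substrate preferences; whether the authors normalized in exactly this way is not stated in the text.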
To identify the reaction products, the product, obtained after C18 reversed phase column chromatography, was subjected to high performance liquid chromatography using a column (4 × 300 mm) of NH2
Expression of C2GnT-L, C2GnT-M, and IGnT in CHO Cells-CHO cells were transiently transfected with pcDNAI-C2GnT-L, pcDNA3.1-C2GnT-M, or pcDNAI-IGnT, and the expression of core 2 branched oligosaccharides and I antigen was detected by T305 antibody and anti-I serum (Ma), respectively, as described previously (22,23). For expression of core 2 branched oligosaccharides, CHO cells were cotransfected with pcDSRα-leukosialin (30) and the above vector encoding the glycosyltransferase, because T305 reacts with core 2 oligosaccharides expressed in leukosialin (22,30).
Northern Blot Analysis-Northern blots of multiple human tissues and cancer cell lines (CLONTECH) were hybridized with 32P-labeled cDNA fragments of pcDNAI-C2GnT-L and pcDNA3.1-C2GnT-M, as described previously (25). The cDNA fragments were prepared by
The maximum likelihood analysis was carried out by submitting the results to the RH server at the Stanford Genome Center and NCBI GeneMAP '98 (http://www.ncbi.nlm.nih.gov/genemap/), as described previously (31,32). Similarly, chromosomal mapping was carried out on the C2GnT-L and IGnT genes using PCR and the Stanford Human Genome Center G3 RH panel. The chromosomal location was also obtained using 3′-EST mapping data from the National Center for Biotechnology Information.

RESULTS
Isolation of cDNA Encoding C2GnT-M-Comparison of cloned C2GnT and IGnT demonstrated that there are three regions highly shared by these two members of the β-1,6-N-acetylglucosaminyltransferase gene family (23). By searching the EST data base for identifying a novel cDNA related to C2GnT and/or IGnT, one cDNA sequence was found to be highly homologous to one of these shared regions (A region, see below). Because this novel cDNA was isolated from human colon cancer cells, we thought that the encoded enzyme may be involved in mucin oligosaccharide synthesis. To obtain a full-length cDNA, 5′ and 3′ RACE reactions were performed. Finally, the full-length cDNA was prepared by reverse transcription-PCR of poly(A)+ RNA from HT-29 colonic carcinoma cells.
The full-length cDNA obtained encodes an open reading frame of 1314 base pairs, predicting a protein of 438 amino acid residues (50,863 Da), which we subsequently termed C2GnT-M (Fig. 1). A hydropathy plot predicts that this protein has a type II membrane topology seen for almost all mammalian glycosyltransferases so far cloned (33). The amino acid sequence of C2GnT-M has 48.2 and 33.8% identity with that of C2GnT-L and IGnT, respectively, indicating that newly cloned C2GnT-M is related more closely to C2GnT-L than to IGnT (Fig. 2). In particular, the amino acid sequence of C2GnT-M is highly homologous to that of C2GnT-L in three regions in the catalytic domain; both share 72.1, 57.7, and 75.0% identity in A, B, and C regions, respectively (Fig. 2). No other cDNA in the data base has a significant similarity with C2GnT-M.
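The figures in this paragraph rest on two simple calculations, sketched below in Python. The first converts an open reading frame length to a residue count, assuming the stated 1314 base pairs exclude the stop codon. The second computes percent identity over aligned positions; the text does not say how the identity values were calculated, so the choice of denominator here (columns in which neither sequence has a gap) is an assumption, and the toy alignment in the usage example is hypothetical.

```python
def orf_length_to_residues(orf_bp: int) -> int:
    """Return the number of residues encoded by an ORF of the given length.

    Assumes the length excludes the stop codon and is a multiple of three.
    """
    if orf_bp <= 0 or orf_bp % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    return orf_bp // 3


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Return percent identity between two pre-aligned sequences.

    Only columns where neither sequence has a gap are counted (an assumed
    convention; other definitions use a different denominator).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if gap not in (a, b)]
    if not columns:
        raise ValueError("no comparable columns")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    print(orf_length_to_residues(1314))          # ORF length reported in the text
    print(percent_identity("MKV-LT", "MRVALT"))  # hypothetical toy alignment
```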
There are two potential N-glycosylation sites (Fig. 1, asterisks) in the C2GnT-M sequence. A consensus sequence for polyadenylation signal is present in nucleotides 1747-1752 followed by a poly(A) tail (Fig. 1). Judging from the size of the mRNA (see below), the cDNA (2128 base pairs) mostly covers the whole transcript.
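Locating a polyadenylation signal in a cDNA sequence can be sketched as a simple motif search. The example below assumes the canonical hexamer AATAAA; the text gives only the position of the signal (a six-nucleotide span), not which variant is present, and the function name and test sequence are hypothetical.

```python
import re
from typing import List, Tuple


def find_polya_signals(seq: str, signal: str = "AATAAA") -> List[Tuple[int, int]]:
    """Return 1-based (start, end) positions of every occurrence of the signal.

    A lookahead is used so that overlapping occurrences are also reported.
    """
    pattern = re.compile("(?=" + re.escape(signal.upper()) + ")")
    return [
        (m.start() + 1, m.start() + len(signal))
        for m in pattern.finditer(seq.upper())
    ]


if __name__ == "__main__":
    # Hypothetical toy sequence, not the C2GnT-M cDNA.
    print(find_polya_signals("GGCAATAAATTTCCAATAAAGG"))
```

Positions are reported 1-based and inclusive so that they can be compared directly with nucleotide ranges quoted in the form used in the text.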

C2GnT-M Directs the Expression of Core 2 Branched Oligosaccharides and I Antigens on the Cell Surface-Because C2GnT-M is highly homologous to C2GnT-L and IGnT, we tested whether C2GnT-M can direct the expression of core 2 branched oligosaccharides and I antigen on the cell surface. After transfecting with C2GnT-L or C2GnT-M together with a leukosialin cDNA, CHO cells showed strong staining with T305 antibody, which reacts with core 2 oligosaccharides on leukosialin (Fig. 3). In contrast, no staining was observed when CHO cells were transfected with pcDNAI-IGnT under the same conditions (Fig. 3, upper right). On the other hand, CHO cells were moderately and strongly stained with anti-I antibody after transfecting with pcDNA3.1-C2GnT-M and pcDNAI-IGnT, respectively (Fig. 3). The I antigen was not, however, expressed after transfecting C2GnT-L (Fig. 3, lower left). These results indicate that C2GnT-M has a unique property in having both C2GnT and IGnT activities.

FIG. 3. Immunofluorescent staining of transfected CHO cells by T305 or anti-I antibody. CHO cells were transfected with pcDNAI-C2GnT-L (C2GnT-L), pcDNA3.1-C2GnT-M (C2GnT-M), or pcDNAI-IGnT (IGnT) and then incubated with T305 monoclonal antibody or anti-I human serum followed by fluorescein isothiocyanate-conjugated goat anti-mouse IgG (for T305) or anti-human IgM (for anti-I serum). For T305 staining, the cells were co-transfected with pcDSRα-leukosialin and cDNA encoding the glycosyltransferases. Bar, 40 µm.

C2GnT-M Has Core 2, Core 4, and I Branching Enzyme Activities-Previously it has been shown that a partially purified preparation of C2GnT from bovine lung trachea contained the activities of C4GnT, C2GnT, and IGnT (16). Because our newly cloned enzyme has both the C2GnT and IGnT activities, we tested whether the cloned enzyme may be related to the enzyme previously described.
As shown in Fig. 4A, C2GnT-L exhibited a strong activity toward Galβ1→3GalNAcα→pNP and a barely detectable amount of activity toward GlcNAcβ1→3GalNAcα→pNP. As expected, C2GnT-L did not show any activity toward acceptors for IGnT. C2GnT-M, in contrast, exhibited a substantial activity also to GlcNAcβ1→3GalNAcα→pNP, indicating that C2GnT-M exhibits C4GnT activity as well. Interestingly, I branching enzyme activity of C2GnT-M was detected more as a dIGnT than a cIGnT (Fig. 4B). These results indicate that the newly cloned C2GnT-M has a weak but detectable activity of IGnT in addition to C2GnT and C4GnT activities.
The recombinant IGnT exhibited strong activity as a cIGnT, as shown previously (34), but also moderate activity as a dIGnT and very weak activity as a C4GnT (Fig. 4C).
Expression of C2GnT Transcripts in Various Human Tissues and Cancer Cell Lines-Northern blot analysis demonstrates that transcripts of C2GnT-M are expressed predominantly in colon, small intestine, trachea, stomach, and thyroid and are barely expressed in testis, prostate, kidney, and pancreas (Fig.  5). The transcripts of C2GnT-L, on the other hand, are more widely expressed and were detected also in heart, brain, placenta, spleen, peripheral leukocytes, lymph node, and bone marrow, where the transcripts of C2GnT-M were not detected (Fig. 5). Moreover, the transcripts of C2GnT-L were detected in HL-60, MOLT-4, and Raji leukemic cell lines, colon adenocarcinoma SW480, and cervical carcinoma HeLa cells, where the transcripts of C2GnT-M were not detected. On the other hand, the C2GnT-M transcripts were detected in A549 lung carcinoma cell line, where the C2GnT-L transcripts were not detected (Fig. 5, far right). These results indicate that C2GnT-M and C2GnT-L are differentially expressed in different tissues.
Chromosomal Mapping of the C2GnT-M Gene-To determine the chromosomal localization of the C2GnT-M gene, PCR analysis was carried out using the Stanford G3 RH panel. This analysis placed the C2GnT-M gene between two chromosome markers, D15S1160 and D15S1347, thus mapping the gene to the q22.1 region of chromosome 15. Similar analysis placed the C2GnT-L gene and IGnT gene at q13 of chromosome 9 and at p24 of chromosome 6, respectively.

DISCUSSION
In the present study, we have isolated a cDNA encoding a novel C2GnT, C2GnT-M. C2GnT-M is unique in catalyzing more than one enzymatic reaction, forming core 4 branches and I branches in addition to core 2 branches. Previously, fucosyltransferase III and ST3Gal III, an α-2,3-sialyltransferase, were shown to add, respectively, fucose and sialic acid residues to both Galβ1→4GlcNAc and Galβ1→3GlcNAc acceptors (35-37). As far as we know, we are the first to demonstrate in a definitive manner that a single enzyme catalyzes three different but related reactions.
C2GnT-M is almost exclusively expressed in small and large intestines, trachea, stomach, and thyroid among tissues examined (Fig. 5), thus likely involved in the synthesis of mucin-type oligosaccharides. Accordingly, we termed this new enzyme C2GnT-M (mucin-type). It is not apparent why the C2GnT-M is also expressed in thyroid. However, it was recently demonstrated that a major glycoprotein in calf thyroid contains core 2 branched O-glycans and I-branched N-glycans (38), indicating that thyroid synthesizes mucin-type O-glycans. It is noteworthy that C2GnT-M exhibits a significant activity of C4GnT. This finding is consistent with previous reports that the presence of C4GnT activity is always associated with the presence of C2GnT activity when various mucin-producing tissues were examined (15,16,19,20). It is very likely that C2GnT-M is responsible for C4GnT activity in these tissues. C2GnT-L, on the other hand, lacks C4GnT activity (Fig. 4). In small intestine, colon, and stomach, significant amounts of the transcripts for C2GnT-L and C2GnT-M were detected (Fig. 5), suggesting that both enzymes contribute to the C2GnT activity in these tissues. In fact, recent studies on C2GnT-L knock-out mice demonstrated that a residual C2GnT activity exists in small and large intestines among tissues examined after C2GnT-L is inactivated (39). It is most likely that C2GnT-M cloned in the present study is responsible for this activity.
C2GnT-M is also unique in transferring a GlcNAc residue preferentially to predistal galactose residues of GlcNAcβ1→3Galβ1→4GlcNAcβ1→R, forming GlcNAcβ1→3(GlcNAcβ1→6)Galβ1→4GlcNAcβ1→R. Although the presence of this enzyme activity, dIGnT, was reported previously (14-17), a cDNA encoding an enzyme acting more as a dIGnT than as a cIGnT was not reported before. The activities of dIGnT reported in hog gastric mucosa and rat liver (14,17) may be attributable to the newly cloned C2GnT-M. However, those enzymatic activities reported previously had much less C2GnT or C4GnT activity compared with dIGnT activity, whereas the opposite is true for C2GnT-M. It is thus likely that both IGnT and C2GnT-M contribute to dIGnT activity in these tissues. Our cloned enzyme most closely resembles the C2GnT preparation isolated from bovine trachea (16). However, dIGnT activity of the bovine trachea enzyme preparation was almost 40% of its C2GnT activity, whereas it is only 7% of C2GnT activity in the cloned C2GnT-M, although similar acceptors were used in both studies. On the other hand, it is noteworthy that the ratio of C2GnT and dIGnT activities in LS180, HT29, and NCI498 colonic carcinomas (20) is similar to that of C2GnT-M. Further studies are necessary to clarify this discrepancy.
The results obtained in the present study reveal that the genes of C2GnT-M, C2GnT-L, and IGnT are located in chromosome 15, q22, chromosome 9, q13, and chromosome 6, p24, respectively. The chromosomal mapping of C2GnT-L and IGnT genes was corroborated by a recent report using the Genebridge 4 RH panel, which was developed independently from the Stanford G3 RH panel (40). The gene locus for IGnT was originally reported to be in chromosome 9, q21 (23). The reason for this discrepancy in the IGnT chromosomal mapping is not clear but may be related to a more sensitive method used in the present study. Similar to the results obtained in the present study, the genes for two highly related polysialyltransferases, PST and STX, were located at different chromosomes by fluorescence in situ hybridization (41). These results, as a whole, suggest that Golgi-associated glycosyltransferases diverged relatively early in evolution.
In the present study, we could detect dIGnT activity in the IGnT cloned (23), as shown in Fig. 4, although the previous studies detected only cIGnT activity (34). This discrepancy is probably attributable to the fact that the chimeric soluble IGnT was assayed after concentration of the culture supernatant in the present study, whereas a culture supernatant was directly used in the previous study (34). These results, as a whole, indicate that in vitro assay of glycosyltransferase is a very powerful tool for detecting activity. On the other hand, the expression of I antigen by C2GnT-M, assessed by immunofluorescent staining was apparently as much as the I antigen expressed by IGnT when the full-length cDNAs were expressed in CHO cells (Fig. 3). This is strikingly in contrast to the results that C2GnT-M had only 7% activity of dIGnT compared with C2GnT activity when measured by in vitro assay (Fig. 4). These results suggest that the enzymatic activities measured by in vitro assay may not quantitatively reflect its actual activities in cells. It is possible that the protein A portion of the chimeric protein may bring a constraint on its catalytic domain. Future studies will be important to determine how C2GnT-M synthesizes these different oligosaccharides in a given cell.