Cloning of a Novel Member of the UDP-Galactose:β-N-Acetylglucosamine β1,4-Galactosyltransferase Family, β4Gal-T4, Involved in Glycosphingolipid Biosynthesis*

A novel putative member of the human UDP-galactose:β-N-acetylglucosamine β1,4-galactosyltransferase family, designated β4Gal-T4, was identified by BLAST analysis of expressed sequence tags. The sequence of β4Gal-T4 encoded a type II membrane protein with significant sequence similarity to other β1,4-galactosyltransferases. Expression of the full coding sequence and a secreted form of β4Gal-T4 in insect cells showed that the gene product had β1,4-galactosyltransferase activity. Analysis of the substrate specificity of the secreted form revealed that the enzyme catalyzed glycosylation of glycolipids with terminal β-GlcNAc; however, in contrast to β4Gal-T1, -T2, and -T3, this enzyme did not transfer galactose to asialo-agalacto-fetuin, asialo-agalacto-transferrin, or ovalbumin. The catalytic activity of β4Gal-T4 with monosaccharide acceptor substrates, N-acetylglucosamine as well as glucose, was markedly activated in the presence of α-lactalbumin. The coding region of β4Gal-T4 was contained in six exons. All intron/exon boundaries were similarly positioned in β4Gal-T1, -T2, and -T3. β4Gal-T4 represents a new member of the β4-galactosyltransferase family. Its kinetic parameters suggest unique functions in the synthesis of neolactoseries glycosphingolipids.

The greater β4Gal-T gene family may include members with both distinct and conserved donor and/or acceptor substrate specificities. For example, a snail gene, previously identified by hybridization to a β4Gal-T1 probe, has acceptor substrate specificity similar to β4Gal-T1, but different donor substrate specificity as it is a β4GlcNAc-transferase (13). This β4GlcNAc-transferase is not responsive to α-lactalbumin modulation (14). In contrast, a snail β4GalNAc-transferase activity with acceptor substrate specificity similar to β4Gal-T1 exhibits sensitivity to α-lactalbumin modulation of the acceptor specificity (15). The donor substrate specificity of β4Gal-T1 is modulated by α-lactalbumin to include UDP-GalNAc, and the donor substrate specificity of the snail β4GalNAc-transferase activity is modulated to include UDP-Gal, albeit at much lower efficiencies (15, 16). Given the similarities in donor substrate specificities and α-lactalbumin modulation, it is likely that the snail β1,4GalNAc-transferase will be homologous to the mammalian β4Gal-T gene family (15, 17). The GalNAcβ1-4GlcNAcβ1-R structure exists in man, but is mainly associated with N-linked glycans found on glycoprotein hormones (18). A putative glycoprotein β4GalNAc-transferase with selective activity for N-glycans associated with a specific peptide sequence has been characterized (19); however, this enzyme may be unrelated to the β4Gal-T gene family.
In the present study, we used human EST sequence information to identify and clone a novel member of the β4Gal-T gene family, designated β4Gal-T4. β4Gal-T4 is an active UDP-Gal:βGlcNAc β1,4Gal-transferase with specificity for glycolipid substrates; however, it does not catalyze glycosylation of several glycoprotein acceptors, which are good substrates for other β4Gal-transferases. β4Gal-T4 exhibits α-lactalbumin modulation that is similar to a previously characterized snail β1,4GalNAc-transferase activity (15). The data demonstrate that members of the β4Gal-T gene family have distinct functions in galactosylation of different glycoconjugates, and suggest that β4Gal-T4 mainly plays a role in glycolipid biosynthesis.

EXPERIMENTAL PROCEDURES
Identification of β4Gal-T4 -BLASTn and tBLASTn were used with the coding sequence of human β4Gal-T3 to search the dbEST database at the National Center for Biotechnology Information (NCBI) as described previously (2). Overlapping segments of EST sequences were compiled and compared with known members of the human β4Gal-T family. cDNA clones of ESTs with the longest inserts (Fig. 1) were obtained from Genome Systems Inc.
Cloning and Sequencing of the Full Coding Sequence of β4Gal-T4 -A large number of overlapping ESTs derived from a putative gene were identified and assembled by using Unigene (NCBI, transcript map A004F36). Five ESTs representing nearly the full coding sequence were selected (Fig. 1). Sequencing of EST clone 489768 revealed an open reading frame that encoded a sequence similar to β4Gal-T3, except that the 5′ sequence was shorter and the clone lacked a translational initiation codon. The genomic organizations of β4Gal-T1, -T2, and -T3 genes were previously shown to be identical (2). Since the 5′ sequence available from the β4Gal-T4 EST composite was incomplete but likely to extend into the first coding exon, the 5′ position of the open reading frame was obtained by sequencing a genomic P1 clone. Confirmatory sequencing was performed on a cDNA clone obtained by PCR on total cDNA from the human MKN45 gastric cancer cell line with the sense primer TSHC25 (5′-GTCCATCGGGGATGGGTTTTC-3′) and the antisense primer TSHC12 (5′-CCACTGTCAGGCACAAAGTCAAC-3′), for 30 cycles at 95°C, 15 s; 55°C, 20 s; 72°C, 2 min 30 s. The entire sequence was confirmed by sequencing genomic P1 clones. The composite sequence contained an open reading frame of 1032 base pairs encoding a putative protein with a type II domain structure (Fig. 2).
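As a rough cross-check of the figures in this paragraph, the following sketch computes the length, GC fraction, and reverse complement of the two confirmatory primers and converts the 1032-base pair open reading frame into a codon count. The helper names are illustrative and not part of the original methods; no melting temperatures are computed because the text does not state how they were estimated.

```python
# Primer sequences copied from the text (written 5'->3').
PRIMERS = {
    "TSHC25_sense": "GTCCATCGGGGATGGGTTTTC",
    "TSHC12_antisense": "CCACTGTCAGGCACAAAGTCAAC",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def codons_in_orf(orf_bp: int) -> int:
    """Number of codons in an open reading frame of the given length."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of three")
    return orf_bp // 3


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, len(seq), f"GC={gc_fraction(seq):.2f}",
              "revcomp:", reverse_complement(seq))
    # 1032 bp is the ORF length given in the text.
    print("codons:", codons_in_orf(1032))
```

The codon count of 344 agrees with the residue numbering used for the expression constructs (residues 42-344).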
Expression of β4Gal-T4 in Insect Cells-An expression construct designed to encode amino acid residues 42-344 of β4Gal-T4 was prepared by PCR using EST clone 489768, and the primer pair TSHC30 (5′-AGCGGATCCTAAAGCAAAGGAGTTCATGG-3′) and TSHC36 (5′-AGCGAATTCCAGGGTCATGCACCAAACCAG-3′), which include BamHI and EcoRI restriction sites, respectively (Fig. 2). The PCR product was cloned directionally between the BamHI and EcoRI sites of pAcGP67A (PharMingen), and the construct was sequenced to verify insertion orientation and sequence fidelity. An expression construct designed to encode the full coding sequence was prepared by PCR on cDNA of clone 489768, using the primer pair TSHC29 (5′-CATGGGCTTCAACCTGACTTTCCACCTTTCCTAC-3′) and TSHC36. The 12 bases missing from the 5′-end were added during PCR (Fig. 2), and the product was cloned into pBluescript KS+ (Stratagene). Product encoding the full-length β4Gal-T4 was cloned directionally between the BamHI and EcoRI sites of pVL1393 (PharMingen). Plasmids pAcGP67-β4Gal-T4-sol and pVL-β4Gal-T4-full were co-transfected with Baculo-Gold™ DNA (PharMingen) as described previously (21). Expression constructs pAcGP67-β4Gal-T2-sol and pAcGP67-β4Gal-T3-sol were prepared as described previously (2). Recombinant baculoviruses were obtained after two successive amplifications in Sf9 cells grown in serum-containing medium, and titers of virus were estimated by titration in 24-well plates with monitoring of enzyme activities. Controls included the pAcGP67-GalNAc-T3-sol (21). The kinetic properties were determined with secreted enzymes expressed in High Five™ cells grown in serum-free medium (SF-900 II, Life Technologies, Inc.) as suspension cultures in upright roller bottles shaking at 140 rpm in 27°C water baths. Semipurification of enzymes was performed by consecutive chromatography on DEAE or Amberlite and S-Sepharose as described previously (22). Table I for structures).
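The statement that TSHC30 and TSHC36 carry BamHI and EcoRI sites can be checked directly from the printed primer sequences. The sketch below assumes the standard recognition sequences GGATCC (BamHI) and GAATTC (EcoRI); the function name is illustrative only.

```python
# Standard recognition sequences for the two enzymes named in the text.
SITES = {"BamHI": "GGATCC", "EcoRI": "GAATTC"}

# Primer sequences copied from the text (written 5'->3').
TSHC30 = "AGCGGATCCTAAAGCAAAGGAGTTCATGG"
TSHC36 = "AGCGAATTCCAGGGTCATGCACCAAACCAG"


def find_sites(seq: str) -> dict:
    """Map enzyme name -> 0-based position of its first site in seq."""
    seq = seq.upper()
    return {name: seq.find(site) for name, site in SITES.items() if site in seq}


def residues_in_range(first: int, last: int) -> int:
    """Number of residues in an inclusive residue range."""
    return last - first + 1


if __name__ == "__main__":
    print("TSHC30:", find_sites(TSHC30))
    print("TSHC36:", find_sites(TSHC36))
    # The soluble construct is described as encoding residues 42-344.
    print("soluble construct residues:", residues_in_range(42, 344))
```

Each primer contains exactly one of the two sites immediately after a three-base 5′ extension, which is consistent with directional cloning of the product.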
The soluble constructs were assayed with 10 µl of culture supernatant from infected cells. The full-length construct was assayed with 1% Triton X-100 homogenates of washed cells. Purified β4Gal-transferase from bovine milk (Sigma) and recombinant bovine β4Gal-T1 expressed in insect cells (Calbiochem) were used as controls. Reaction products were quantified by chromatography on Dowex-1. Assays with glycoprotein acceptors were performed with the standard reaction mixture modified to contain 100 mM Bis-Tris (pH 7) and 0.5 mg of the glycoprotein acceptor. The transfer of Gal was evaluated after acid precipitation by filtration through Whatman GF/C glass fiber filters. Assays for determination of Km of acceptor substrates were performed with semipurified enzyme in the standard reaction mixture modified to include 180 µM UDP-[14C]Gal. Assays for donor substrate Km were performed with 0.625 mM (for bovine β4Gal-T1) and 20 mM (for β4Gal-T4) β-D-GlcNAc-1-benzyl.
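To put the 180 µM UDP-Gal concentration used in the acceptor Km assays in context, the sketch below estimates how close to saturation the donor substrate would be, using the apparent Km values for UDP-Gal reported in the Results (31 µM for β4Gal-T4 and 20 µM for recombinant bovine β4Gal-T1). It assumes simple single-substrate Michaelis-Menten behavior, which is an assumption made here for illustration and is not a claim about how the kinetic data were actually fitted.

```python
def fractional_saturation(substrate_um: float, km_um: float) -> float:
    """v / Vmax for one substrate under Michaelis-Menten kinetics (assumed)."""
    if substrate_um < 0 or km_um <= 0:
        raise ValueError("substrate must be >= 0 and Km must be > 0")
    return substrate_um / (km_um + substrate_um)


# Donor concentration used in the acceptor Km assays (from the text).
UDP_GAL_ASSAY_UM = 180.0

# Apparent Km values for UDP-Gal quoted in the Results (Table IV).
APPARENT_KM_UM = {
    "beta4Gal-T4": 31.0,
    "bovine beta4Gal-T1": 20.0,
}

if __name__ == "__main__":
    for enzyme, km in APPARENT_KM_UM.items():
        frac = fractional_saturation(UDP_GAL_ASSAY_UM, km)
        print(f"{enzyme}: {frac:.2f} of Vmax at {UDP_GAL_ASSAY_UM:.0f} uM UDP-Gal")
```

Under this assumption the donor would support roughly 85% and 90% of the maximal rate for β4Gal-T4 and bovine β4Gal-T1, respectively, so the acceptor Km values are apparent values measured at a near-saturating rather than fully saturating donor concentration.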
Northern Analysis-The cDNA fragment of soluble β4Gal-T4 was used as a probe. The probe was random primer-labeled using [α-32P]dCTP and an oligonucleotide labeling kit (Amersham Pharmacia Biotech). A human multiple tissue Northern blot, MTN I (CLONTECH), was probed overnight at 42°C as described previously (21), and washed twice for 10 min each at room temperature with 2× SSC, 0.1% SDS; twice for 10 min each at 55°C with 1× SSC, 0.1% SDS; and once for 10 min with 0.1× SSC, 0.1% SDS at 55°C.

RESULTS
Identification and Cloning of Human β4Gal-T4 -The strategy outlined in Fig. 1 produced a novel gene with significant sequence similarity to β4Gal-T3 and other members of the β4Gal-T gene family. A multiple sequence alignment of six human β4Gal-transferases is shown in Fig. 3. The β4Gal-T4 gene has highest sequence similarity to β4Gal-T3. Sequence similarities among the six human genes are found predominantly in the central regions; there were no significant similarities in the NH2-terminal regions. Several sequence motifs in the putative catalytic domains are conserved among all the transferases (1). Importantly, four cysteine residues are conserved in all β4Gal-transferases; a fifth cysteine residue in the C-terminal end of β4Gal-T1 is substituted by a tyrosine in the other transferases (Fig. 3) (3). N-Linked glycosylation sites are not generally conserved in glycosyltransferase species homologues or within different members of glycosyltransferase gene families; however, a single N-linked site in the C-terminal regions of β4Gal-T2, -T3, -T4, -T5, and -T6 is conserved (Fig. 4). Similarly, a single site in the central region of the putative catalytic domains of four β3Gal-transferases was conserved (26).
The predicted coding region of β4Gal-T4 has a single initiation codon in agreement with Kozak's rule (27), which precedes a sequence encoding a potential hydrophobic transmembrane segment (Figs. 2-4). The predicted coding sequence indicates that β4Gal-T4 is a type II transmembrane glycoprotein with an N-terminal cytoplasmic domain of 14 residues, a transmembrane segment of 20 residues, and a stem region and catalytic domain of 310 residues with three potential N-linked glycosylation sites (28). One N-linked site is located in the putative cytoplasmic sequence and therefore may not be utilized. A hydropathy plot (29) of β4Gal-T4 indicated that the putative stem region was highly hydrophilic, similar to β4Gal-T1, -T2, and -T3 (Fig. 4). In contrast, β4Gal-T5 and -T6 have unusually long hydrophobic regions at the putative signal anchor sequences, which are not clearly defined. A comparison of four members of a β3Gal-transferase family showed that one member with exclusive substrate specificity for glycolipids, the GM1 synthase β3Gal-T4, differed from the other three members of the family by having a unique hydrophobic stem region (26). A hydrophobic stem region is not found in all glycosyltransferases acting on glycolipids (30). The 3′-untranslated region contains a polyadenylation signal at base pair 1796 (+761).
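The domain lengths and the hydropathy analysis described above can be illustrated with a short sketch. It assumes the Kyte-Doolittle hydropathy scale with a sliding-window average, which is a common choice but is not confirmed by the text (the method is only cited as reference 29). The toy sequence is hypothetical and is not the β4Gal-T4 sequence; it merely mimics a short charged tail, a hydrophobic segment, and a hydrophilic stretch.

```python
# Kyte-Doolittle hydropathy values (assumed scale; see lead-in).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Domain lengths stated in the text for the type II topology.
CYTOPLASMIC = 14
TRANSMEMBRANE = 20
STEM_AND_CATALYTIC = 310


def hydropathy_profile(seq: str, window: int = 19) -> list:
    """Mean hydropathy of every full window along the sequence."""
    seq = seq.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    scores = [KD[aa] for aa in seq]
    return [sum(scores[i:i + window]) / window
            for i in range(len(seq) - window + 1)]


if __name__ == "__main__":
    # Hypothetical toy sequence, NOT beta4Gal-T4.
    toy = "MKRSDEQN" + "LIVALLVIAFLLVAIVLLAV" + "SDEKRQNTSDEKRQNT"
    profile = hydropathy_profile(toy, window=19)
    peak = max(range(len(profile)), key=profile.__getitem__)
    print("most hydrophobic window starts at residue", peak + 1)
    print("total residues:", CYTOPLASMIC + TRANSMEMBRANE + STEM_AND_CATALYTIC)
```

The three stated domain lengths sum to 344 residues, matching the length implied by the 1032-base pair open reading frame.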
Expression of β4Gal-T4 -Expression of a soluble construct of β4Gal-T4 in insect cells resulted in a marked increase in galactosyltransferase activity with a number of βGlcNAc-containing acceptor substrates, compared with uninfected cells or cells infected with a control construct (Table I). All identified substrates had βGlcNAc at the nonreducing end. Of the simple saccharide derivatives tested, only the disaccharide GlcNAcβ1-6GlcNAcα1-benzyl was better than monosaccharide derivatives. β4Gal-T4 did have significant activity with disaccharide derivatives representing N-linked and O-linked core structures (GlcNAcβ1-6Manα1-Me, GlcNAcβ1-2Man, and GlcNAcβ1-3GalNAcα1-pNP) and with the biantennary pentasaccharide. In contrast, no activity was found with three glycoproteins that served as substrates for bovine milk β4Gal-T and human β4Gal-T2 and -T3 (Table II). Interestingly, β4Gal-T2 showed the highest relative activity with glycoprotein substrates. Previously, it was found that this enzyme has a low apparent Km for βGlcNAc-benzyl and UDP-Gal (2). Analysis with glycolipid substrates showed that β4Gal-T4 had good activity with Lc3Cer and 4-fold lower activity with nLc5Cer (Table III). Lower activity with the longer lactoseries glycolipids was previously found to be more pronounced for β4Gal-T3, which had 10-fold lower activity with nLc5Cer (2). β4Gal-T4 had a higher apparent Km for UDP-Gal (31 µM) than recombinant bovine β4Gal-T1 (20 µM) (Table IV). No significant differences in activity of the full coding construct were found with the simple saccharide derivatives (data not shown). Although the activities of both human β4Gal-T1 and -T2 with GlcNAc concentrations above apparent Km are inhibited by α-lactalbumin (2, 7, 34), β4Gal-T4 showed a marked increase in N-acetyllactosamine synthase activity in the presence of α-lactalbumin (Fig. 6A).
Two-fold activation was achieved at 0.25 mg/ml and almost 8-fold at 20 mg/ml, which is substantial when compared with 50% reduction of activities of β4Gal-T1 and -T2 at 0.040 mg/ml and 0.2 mg/ml, respectively (2). Activation of β4Gal-T4 by α-lactalbumin was observed at all concentrations of GlcNAc acceptor (Fig. 7A), and α-lactalbumin had no significant effect on the activity with βGlcNAc-benzyl at concentrations tested (Fig. 7B). Importantly, the apparent Km of β4Gal-T4 for GlcNAc could not be determined because of the low activity with this substrate even at 200 mM. β4Gal-T (T1) from bovine milk shows a slight degree of activation at concentrations below the Km for GlcNAc (Fig. 7A), which is in agreement with previous reports (7, 35-37). The activity of milk β4Gal-T with βGlcNAc-benzyl was partly inhibited by α-lactalbumin (Fig. 7B). Free glucose was not an acceptor for β4Gal-T4, but in the presence of increasing concentrations of α-lactalbumin, a low level of lactose synthase activity was observed (Fig. 6B). Interestingly, a low level of catalysis of xylose glycosylation (βXyl-MU) was also induced (data not shown). This was also found for milk β4Gal-T activity (37), and may suggest that the N-acetyllactosamine synthases could be structurally related to the β4Gal-T involved in synthesis of the proteoglycan core structure Galβ1-3Galβ1-4Xylβ1-O-Ser (38, 39). The concentration of α-lactalbumin required for induction of lactose synthase activity was 1 mg/ml with maximum activity at 20 mg/ml (Fig. 6), which is considerably higher than previously observed for the bovine milk β4Gal-T activity and β4Gal-T2, which required 400 µg/ml and 100 µg/ml, respectively, to achieve maximum lactose synthase activity (2). As shown in Fig. 7, β4Gal-T4 was not inhibited at high concentrations of either βGlcNAc-benzyl or free N-acetylglucosamine, which is in contrast to other β4Gal-transferases (2, 40).
β4Gal-T4 showed strict donor substrate specificity for UDP-Gal and did not utilize UDP-GalNAc or UDP-GlcNAc with the acceptor substrates tested (data not shown). The soluble and full coding constructs exhibited the same modulation of activity by α-lactalbumin (data not shown).
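The α-lactalbumin concentrations quoted in this section are given in mixed units (mg/ml and µg/ml). The sketch below only converts them to a common unit and expresses the comparison as a ratio; the function name is illustrative, and the calculation adds no data beyond the concentrations stated in the text.

```python
def fold_higher(conc_mg_per_ml: float, reference_ug_per_ml: float) -> float:
    """Ratio of a concentration in mg/ml to a reference given in ug/ml."""
    if reference_ug_per_ml <= 0:
        raise ValueError("reference concentration must be positive")
    return conc_mg_per_ml * 1000.0 / reference_ug_per_ml


# alpha-lactalbumin giving maximum lactose synthase activity (from the text).
T4_MAX_MG_PER_ML = 20.0
REFERENCE_MAX_UG_PER_ML = {
    "bovine milk beta4Gal-T": 400.0,
    "beta4Gal-T2": 100.0,
}

if __name__ == "__main__":
    for enzyme, ref in REFERENCE_MAX_UG_PER_ML.items():
        ratio = fold_higher(T4_MAX_MG_PER_ML, ref)
        print(f"beta4Gal-T4 needs {ratio:.0f}-fold more alpha-lactalbumin "
              f"than {enzyme}")
```

On these figures, maximum lactose synthase activity of β4Gal-T4 required about 50-fold and 200-fold more α-lactalbumin than the bovine milk enzyme and β4Gal-T2, respectively.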
Expression Pattern of β4Gal-T4 -Since a large number of ESTs from β4Gal-T4 have been identified, the cDNA library sources from which these are derived may provide information about the expression pattern. Based on this information, β4Gal-T4 is expressed in brain, central nervous system, colon, heart, lung, muscle, ovary, placenta, testis, and uterus.
Northern analysis with mRNA from eight human adult organs showed expression in most adult organs with highest levels observed in heart, placenta, kidney, and pancreas (Fig. 9). The transcript size of β4Gal-T4 was approximately 2.5 kilobases, which is similar to the transcript sizes of 2.2 kilobases for β4Gal-T2 and -T3. Two transcripts of 3.9 and 4.1 kilobases from β4Gal-T1 have been fully characterized, and shown to be differentially regulated (41, 42).

DISCUSSION
The human β4Gal-transferase gene family includes at least six members, which are involved in the synthesis of the N-acetyllactosamine disaccharide in oligosaccharides and glycoconjugates (2, 4-6, 9, 10). This large number of enzymes covering a single glycosidic linkage suggests either a high degree of redundancy in functions, or it may suggest that the enzymes have different functions. The high degree of divergence in primary sequence of the enzymes, studies of the acceptor substrate specificities of recombinant β4Gal-Ts (2, 6), and the findings that mice deficient in β4Gal-T1 exhibit a severe phenotype (43, 44), clearly point to different functional roles for each enzyme. Hence, the regulation of β1-4-galactosylation is
The sixth human member of the β4Gal-transferase family, β4Gal-T4, characterized in the study presented here, was found to exhibit unique kinetic properties. A dendrogram analysis (ClustalW) of the six human β4Gal-transferases indicates that the following pairs of enzymes, β4Gal-T1 and -T2, β4Gal-T3 and -T4, and β4Gal-T5 and -T6, are especially related (10). Since the first four identified human β4Gal-transferases, β4Gal-T1, -T2, -T3, and -T5, have similar donor and acceptor substrate specificities, it was expected that β4Gal-T4 also would have similar activity. β4Gal-T1, -T2, -T3, and -T4 all utilize βGlcNAc-terminating glycolipid acceptors (Lc3Cer, nLc5Cer) (2) (Table III).
β4Gal-T4 resembled its closest homologue, β4Gal-T3, in showing strong preference for the shorter glycolipid substrate compared with nLc5Cer (Table III) (2). This is in contrast to the bovine milk β4Gal-T, which efficiently utilized both glycolipid substrates (2). No natural glycoconjugate acceptor for β4Gal-T5 has been reported (6), but its close homologue, β4Gal-T6, transfers Gal to glucosylceramide (9).
An important finding was that the β4Gal-transferases showed different catalytic activities with glycoprotein acceptors. β4Gal-T1, -T2, and -T3 catalyzed transfer to asialo-agalacto-fetuin, asialo-agalacto-transferrin, and ovalbumin with varying efficiency, whereas β4Gal-T4 was inactive with these substrates (Table II). β4Gal-T5 was also reported to be inactive with asialo-agalacto-transferrin (6). The panel of glycoprotein substrates tested in the present study does not represent a complete set of possible N-linked glycan acceptor sequences, and the actual acceptor sequences for β4Gal-T1, -T2, and -T3 were not determined in the present study. Ovalbumin contains a single N-glycan with considerable heterogeneity, and mainly one potential acceptor sequence, GlcNAcβ1-2Manα (45). Transferrin has two complex biantennary N-linked glycans with GlcNAcβ1-2Manα1-3Man and GlcNAcβ1-2Manα1-6Man acceptor sites (46). Fetuin contains three complex N-glycans of the biantennary form or of the 2,4-branched triantennary type (47, 48). Although no acceptor glycoprotein for β4Gal-T4 was identified, it is possible that this enzyme does catalyze transfer of galactose to glycoproteins since disaccharides and a pentasaccharide representing complex N-glycans served as substrates (Table I). However, the activities with these structures were less than with monosaccharide derivatives, suggesting that indeed these oligosaccharides do not represent the glycoconjugate substrates. Fetuin also contains three O-glycans, of which some are of the complex type (49). Although β4Gal-T4 catalyzed glycosylation of the disaccharide structure GlcNAcβ1-3GalNAcα1-pNP (the O-linked core 3 structure), the enzyme apparently did not work with the O-linked acceptors of asialo-agalacto-fetuin (Tables I and II). If β4Gal-T4 functions with glycoprotein acceptors, it may be with more complex structures.
Preliminary studies with O-GlcNAc glycopeptides indicate that most enzymes can catalyze transfer to this type of protein glycosylation (50), but β4Gal-T4 showed the poorest activity. Collectively, it appears likely that the main function of β4Gal-T4 is in the biosynthesis of neolactoseries glycosphingolipids.
The distinct response of β4Gal-T4 to α-lactalbumin resembles the response reported for a snail UDP-GalNAc:βGlcNAc β1-4-N-acetylgalactosaminyltransferase activity (15). The β4GalNAc-transferase activity with GlcNAc concentrations below Km was activated nearly 3-fold in the presence of α-lactalbumin, and the activity with Glc was increased 20-fold (12 mg/ml). For β4Gal-T1 and -T2 the relative lactose synthase activity inducible by α-lactalbumin is over 2-fold higher as compared with N-acetyllactosamine synthase activity without α-lactalbumin (2, 7, 52). A comparable analysis of human β4Gal-T4 and snail β4GalNAc-transferase activity also shows approximately 2-fold higher rates; however, it should be noted that GlcNAc is a poor substrate for these enzymes without α-lactalbumin (Table I) (15). If the induced lactose synthase activity is compared with N-acetyllactosamine synthase in the presence of the same concentration of α-lactalbumin, the lactose synthase activity is lower than the N-acetyllactosamine synthase activity (15) (Fig. 6). The snail β4GalNAc-transferase activity also resembles bovine milk β4Gal-transferase in that α-lactalbumin broadens its donor substrate specificity to include UDP-Gal. The equivalent was not found for β4Gal-T4, which only showed activity with UDP-Gal. It has been suggested that the β4GalNAc-transferase activity found in snails and other invertebrates could be homologous to the β4Gal-transferase family, which was the case for the snail β4GlcNAc-transferase (17). Similarities in properties of β4Gal-T4 and the snail β4GalNAc-transferase suggest that β4Gal-T4 could represent a human homologue of this enzyme, and the human β4Gal-T4 could potentially be a better probe than β4Gal-T1 for identification and cloning of the snail β4GalNAc-transferase (13).
The β4Gal-transferase gene family appears to be derived by gene duplication with subsequent divergence in sequences. The strongest evidence for this is the finding that four of the human β4Gal-Ts have identical genomic organizations that include conservation of five intron positions within the translated region. Importantly, all five introns are also found in the chick homologues of β4Gal-T1 and -T2 (3), and in a homologous snail β4GlcNAc-transferase (54). Shaper et al. (3) suggested that these two genes represented distinct ancestral lineages, and that the β4Gal-T2 lineage had given rise to several additional genes including β4Gal-T3, -T4, and -T5 in man, based on sequence analysis and chromosomal synteny of the location of chick and human β4Gal-T1 and -T2 homologues. Related to this, only two putative members of the β4Gal-transferase gene family have been identified in Caenorhabditis elegans, ce1 (GenBank accession no. Z29095) and ce2 (GenBank accession no. X98132) (1, 55), and these exhibit most of the highly conserved motifs found in the chick and mammalian enzymes. The gene designated ce2 shows the highest sequence similarity to β4Gal-T5 and -T6, and the least to β4Gal-T1. ce2 contains all four cysteine residues conserved among β4Gal-T1, -T2, -T3, -T4, -T5, and -T6. The gene designated ce1 shows highest similarity to a more distant member of the human β4Gal-transferase gene family, which has not been fully characterized yet. ce1 does not contain the four conserved cysteine residues, and shows several differences in other conserved motifs among the β4Gal-transferases. The evolutionary history of the β4Gal-transferase gene family thus remains to be clarified.