A Family of Human β4-Galactosyltransferases

BLAST analysis of expressed sequence tags (ESTs) using the coding sequence of the human UDP-galactose:β-N-acetylglucosamine β1,4-galactosyltransferase, designated β4Gal-T1, revealed a large number of ESTs with identical as well as similar sequences. ESTs with sequences similar to that of β4Gal-T1 could be grouped into at least two non-identical sequence sets. Comparison of the predicted amino acid sequences of the novel ESTs with that of β4Gal-T1 revealed conservation of short sequence motifs as well as of cysteine residues previously shown to be important for the function of β4Gal-T1. The likelihood that the identified ESTs represented novel galactosyltransferase genes was tested by cloning and sequencing the full coding regions of two distinct genes, followed by expression. Expression of soluble secreted constructs in the baculovirus system showed that these genes represented genuine UDP-galactose:β-N-acetylglucosamine β1,4-galactosyltransferases, thus designated β4Gal-T2 and β4Gal-T3. Genomic cloning revealed that the genomic organizations of the two genes are identical to that of β4Gal-T1. The two novel genes were located on chromosomes 1p32-33 and 1q23, respectively. The results demonstrate the existence of a family of homologous galactosyltransferases with related functions. The existence of multiple β4-galactosyltransferases with the same or overlapping functions may be relevant for interpretation of biological functions previously assigned to β4Gal-T1.

During the last decade, more than 40 mammalian glycosyltransferases have been cloned and characterized (1,2). The initial strategy for cloning glycosyltransferases was cumbersome purification of labile enzyme proteins, followed by screening of cDNA libraries with antibodies or with DNA probes based on amino acid sequence information (3-11). The introduction of transfection cloning by Lowe and co-workers (12) resulted in a marked increase in the cloning of novel glycosyltransferase genes (13-17). A third successful approach has taken advantage of conserved sequences in glycosyltransferases that share donor and/or acceptor substrates. Thus, searches for novel members of homologous glycosyltransferase gene families using conserved sequence motifs for RT-PCR¹ cloning with degenerate primers have resulted in the identification and cloning of a number of novel genes (18,19).
One part of the human genome project is the establishment of a data base of expressed sequence tags (ESTs), which currently contains over 700,000 unique sequences. ESTs represent short 5′- and 3′-sequences (200-500 bp) of cDNA clones from a large variety of human and animal organs (20). The EST data base is now estimated to contain sequence information from more than half of all human genes, and it therefore provides a unique source for identifying novel members of homologous gene families (21). The EST data base has recently been used successfully in searches for novel glycosyltransferase genes of the UDP-GalNAc:polypeptide N-acetylgalactosaminyltransferase family, in which several novel members of this homologous gene family have been isolated by identifying ESTs with sequence similarity (22,23).
A number of homologous families of glycosyltransferase genes exist. The largest family identified so far is the sialyltransferase family, for which at least 11 distinct members have been identified (24). Sequence motifs shared between sialyltransferases are part of the catalytic domain, suggesting that conservation of these sequences is related to functional requirements (25). Additional homologous glycosyltransferase gene families include the α2- and α3/4-fucosyltransferase families (26), the α3-GalNAc-transferase family (27), the β6-GlcNAc-transferase family (28), the ceramide galactosyltransferase family (29), and the polypeptide GalNAc-transferase family (23). Analysis of the sequence similarities within these glycosyltransferase gene families reveals that conserved sequences are generally limited to short stretches in the putative catalytic domains, which are located in the central or C-terminal portions of these enzymes, which are type II transmembrane proteins. An additional characteristic is that cysteine residues in these regions are conserved in spacing (23,30). Cysteine residues are important for intramolecular disulfide bonding as well as for the catalytic activity of glycosyltransferases (31-34).
The UDP-galactose:β-N-acetylglucosamine β1,4-galactosyltransferase (designated β4Gal-T1; GenBank™/EMBL Data Bank accession number X14085) was the first glycosyltransferase to be isolated and cloned (3-5, 7, 35). Early searches for homologous genes by low-stringency Southern hybridization suggested that this gene was unique. Characterization of β4-Gal-transferase activities from different sources, however, indicated that distinct activities exist (36,37). Emerging evidence now suggests that several β4-galactosyltransferase genes may exist. Shaper et al. (38) have identified two different chick cDNA sequences with 65 and 48% sequence similarity to human β4Gal-T1, and both chick cDNAs were shown to encode catalytically active β4-Gal-transferases. Thus, the β4Gal-T1 gene is likely to be part of a homologous gene family with recognizable sequence motifs. This conclusion is supported by the large number of human ESTs in EST data bases with sequence similarity to β4Gal-T1.² Two independent groups have analyzed β4-Gal-transferase activities in mice homozygously deficient in β4Gal-T1 (39,40). Both studies showed residual β4-Gal-transferase activity, providing clear evidence for the existence of additional β4-Gal-transferases. Uehara and Muramatsu (41) have identified and cloned a mouse cDNA that was shown to encode a protein with low β4-galactosyltransferase activity when expressed in Escherichia coli. This putative β4-galactosyltransferase exhibited little or no sequence similarity to β4Gal-T1, indicating that the gene is unlikely to be an evolutionarily related member of a galactosyltransferase family. Moreover, the gene was expressed in E. coli, an expression system shown to be poor for glycosyltransferases (42), so its kinetic parameters are difficult to evaluate.
In this study, we used available human EST sequence information to identify novel genes homologous to β4Gal-T1. The β4Gal-T1 gene was found to be highly represented in the EST data base by ESTs with nearly identical sequences. In addition, a number of ESTs with similar sequences were identified, and these shared motifs and conserved cysteine residues previously shown to be functionally important for β4Gal-T1. The full coding sequences of two of these genes were established, and expression demonstrated that these genes encode active UDP-Gal:β-GlcNAc β1,4-Gal-transferases.

EXPERIMENTAL PROCEDURES
Identification of Genes Homologous to β4Gal-T1-Data base searches were performed with the coding sequence of human β4Gal-T1 (43) using the BLASTn and tBLASTn algorithms against the dbEST data base at the NCBI. BLASTn was used to identify ESTs representing the query gene (identities of ≥95%), whereas tBLASTn was used to identify non-identical but similar EST sequences. ESTs with 50-90% nucleotide sequence identity were regarded as different from the query sequence. The results of tBLASTn searches were evaluated by visual inspection after elimination of ESTs regarded as identical to the query sequence (>95% nucleotide sequence identity). ESTs with several apparent short sequence motifs and cysteine residues arranged with similar spacing were selected for further sequence analysis. Initially, the identified ESTs (5′-sequence) were used in BLASTn searches of the dbEST data base to find overlapping ESTs (95-100% identity over at least 30 bp) (see Fig. 1). If new ESTs were identified, the procedure was repeated, and the sequences were merged. In addition, all identified ESTs were analyzed in the Unigene data base for three purposes: to confirm that they derived from the same gene transcript, to select the cDNA clones with the longest inserts, and to identify additional ESTs with non-overlapping 5′-sequences. Composites of all the sequence information for each set of ESTs were compiled and analyzed for sequence similarity to human β4Gal-T1. The EST cDNA clones with the longest inserts (see Fig. 1) were obtained from Genome Systems Inc.
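The triage and merging rules above can be summarized in a short sketch. The thresholds come from the text: ≥95% identity for ESTs of the query gene, 50-90% for candidate homologs, and ≥95% identity over at least 30 bp for merging overlapping ESTs. Several parts are illustrative assumptions rather than the procedure actually run at the NCBI:

- the function names;
- the "unassigned" label for hits falling between 90 and 95%;
- the simple ungapped suffix/prefix overlap test.

```python
from typing import Optional

# Minimal sketch of the EST triage and merging rules described above.
# Thresholds come from the text; function names, the "unassigned" bucket,
# and the ungapped overlap test are illustrative assumptions.

QUERY_MIN_IDENTITY = 95.0        # >=95% identity: EST of the query gene
CANDIDATE_RANGE = (50.0, 90.0)   # 50-90% identity: candidate homolog
MERGE_MIN_IDENTITY = 95.0        # identity required within an overlap
MERGE_MIN_OVERLAP = 30           # minimum overlap length (bp)


def classify_hit(percent_identity: float) -> str:
    """Assign a BLAST hit to a category from its nucleotide identity."""
    if percent_identity >= QUERY_MIN_IDENTITY:
        return "query gene"
    low, high = CANDIDATE_RANGE
    if low <= percent_identity <= high:
        return "candidate homolog"
    return "unassigned"  # hypothetical bucket for values the text does not cover


def overlap_identity(a: str, b: str) -> float:
    """Percent identity of two equal-length, ungapped sequences."""
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def merge_ests(upstream: str, downstream: str) -> Optional[str]:
    """Join two same-strand ESTs if a suffix of `upstream` overlaps a prefix
    of `downstream`; the longest qualifying overlap is tried first, and the
    upstream bases are kept where the overlap contains mismatches."""
    longest = min(len(upstream), len(downstream))
    for k in range(longest, MERGE_MIN_OVERLAP - 1, -1):
        if overlap_identity(upstream[-k:], downstream[:k]) >= MERGE_MIN_IDENTITY:
            return upstream + downstream[k:]
    return None  # no overlap of >=30 bp at >=95% identity
```

Repeating `merge_ests` whenever a new overlapping EST is found mirrors the iterative merging described in the text. Real ESTs would also need gapped alignment and strand checks, which this sketch omits.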
Cloning and Sequencing of the Full Coding Sequence of β4Gal-T2-Two partly overlapping ESTs were identified (see Fig. 1). Sequencing of the inserts revealed an open reading frame potentially encoding a sequence similar to that of β4Gal-T1, but the 5′-sequence was truncated and lacked an initiation codon. Additional 5′-sequence was obtained by 5′-rapid amplification of cDNA ends. This used human fetal brain Marathon-Ready cDNA (CLONTECH) with the antisense primers EBER102 (5′-GAAACTGAGCCTTACTCAGGC) and EBER104 (5′-TCCACATCGCTGAAGATGAAGC), run for 35 cycles of 95°C for 45 s, 55°C for 15 s, and 68°C for 3 min with the Expand kit enzyme (Boehringer Mannheim). The products from the 5′-rapid amplification of cDNA ends were cloned into the BamHI/NotI site of pT7T3U19, and multiple clones were sequenced. The entire sequence was confirmed by sequencing genomic P1 clones. The composite sequence contained an open reading frame of 1191 bp potentially encoding a protein with a type II domain structure (see Fig. 2) and an overall sequence identity of ~63% to β4Gal-T1.
Cloning and Sequencing of the Full Coding Sequence of β4Gal-T3-One EST clone (184081) with a 1980-bp insert was identified by its 3′-EST sequence in Unigene (National Center for Biotechnology Information) (see Fig. 1). Sequencing of the insert revealed an intact open reading frame of 1179 bp potentially encoding a protein with a type II domain structure (see Fig. 3) and an overall sequence identity of ~54% to β4Gal-T1.
Expression of β4Gal-T2 and β4Gal-T3 in Sf9 Cells-An expression construct designed to encode amino acid residues 56-397 of β4Gal-T2 was prepared by RT-PCR with mRNA from the Colo205 cell line using the primer pair EBER100FOR (5′-TACTTTGACGTCTACGCCCAG) and EBER114 (5′-GAAAACAGAGCCCAGTCCAG) with BamHI restriction sites (see Fig. 2). An expression construct designed to encode amino acid residues 23-393 of β4Gal-T3 was prepared by RT-PCR with RNA from the MKN45 gastric carcinoma cell line using the primer pair EBER200FOR (5′-CATGATGTACCTGTCACTGGGG) and EBER214 (5′-TAGCACGGCACCAGAGTTCAG) with BamHI restriction sites (see Fig. 3). The PCR products were cloned into the BamHI site of pAcGP67 (Pharmingen), and the constructs were sequenced to verify correct insertion and sequence. A full-length coding expression construct of β4Gal-T3 was prepared by RT-PCR with MKN45 RNA using the primer pair EBER200FUL (5′-CCAGGATGTTGCGGAGGCTGC) and EBER214 (see Fig. 3), and the product was cloned into the BamHI site of pVL1193 (Pharmingen). Plasmids pAcGP67-β4Gal-T2-sol, pAcGP67-β4Gal-T3-sol, and pVL-β4Gal-T3-full were cotransfected with BaculoGold™ DNA (Pharmingen) as described previously (19). Recombinant baculovirus was obtained after two successive amplifications in Sf9 cells grown in serum-containing medium. Virus titers were estimated by titration in 24-well plates with monitoring of enzyme activities. Controls included pAcGP67-GalNAc-T3-sol (19). Standard assays were performed in 50 µl of total reaction mixture containing 25 mM Tris (pH 7.5), 10 mM MnCl2, 0.25% Triton X-100, 100 µM UDP-[¹⁴C]Gal (2300 cpm/nmol; Amersham Corp.), and varying concentrations of acceptor substrates (Sigma) (see Table I for structures). The soluble constructs were assayed with 5-20 µl of culture supernatant from infected cells, whereas the full-length construct was assayed with 1% Triton X-100 homogenates of washed cells. Bovine milk β1,4-Gal-transferase (Sigma) was used as a control.
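Radioactivity incorporated in these assays can be converted into amounts of transferred galactose using the stated donor specific activity (2300 cpm/nmol). The sketch below rests on three assumptions, none taken from the assays reported here:

- counts are background-subtracted against a no-acceptor blank;
- the incubation time and enzyme volume are hypothetical inputs;
- the conventional enzyme unit applies (1 unit = 1 µmol of product per min, so 1 milliunit = 1 nmol/min).

```python
# Sketch: convert 14C counts into nmol of Gal transferred and into activity.
# 2300 cpm/nmol is the donor specific activity stated in the text; the
# counts, incubation time, and enzyme volume are hypothetical inputs.

DONOR_SPECIFIC_ACTIVITY = 2300.0  # cpm per nmol UDP-[14C]Gal


def nmol_galactose_transferred(sample_cpm: float, blank_cpm: float) -> float:
    """Net counts (sample minus assumed no-acceptor blank) / specific activity."""
    net_cpm = max(sample_cpm - blank_cpm, 0.0)
    return net_cpm / DONOR_SPECIFIC_ACTIVITY


def milliunits_per_ml(sample_cpm: float, blank_cpm: float,
                      minutes: float, enzyme_volume_ml: float) -> float:
    """Activity in milliunits/ml, taking 1 milliunit = 1 nmol/min."""
    nmol = nmol_galactose_transferred(sample_cpm, blank_cpm)
    return nmol / minutes / enzyme_volume_ml
```

For instance, 4600 net cpm corresponds to 2 nmol of Gal. If that were obtained in a hypothetical 10-min assay with 20 µl (0.02 ml) of supernatant, it would give about 10 milliunits/ml.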
Assays used for determination of the Km of acceptor substrates were modified to include 200 µM UDP-[¹⁴C]Gal, and assays for the donor substrate Km were performed with 2 mM (for β4Gal-T3 and bovine milk Gal-transferase) or 0.25 mM (for β4Gal-T2) benzyl-β-GlcNAc.
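The text does not state how Km values were derived from these assays. One common option is the Hanes-Woolf linearization of the Michaelis-Menten equation, [S]/v = [S]/Vmax + Km/Vmax, fitted by ordinary least squares, and the sketch below uses that approach. This choice is an assumption. Nonlinear regression on v versus [S] is generally preferred for real data, and the data points here are synthetic.

```python
# Sketch: estimate Km and Vmax from a Hanes-Woolf plot ([S]/v versus [S]).
# An illustrative method choice, not necessarily the one used in the study.

def least_squares(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def hanes_woolf(substrate: list[float], rates: list[float]) -> tuple[float, float]:
    """Return (Km, Vmax) using slope = 1/Vmax and intercept = Km/Vmax."""
    ratio = [s / v for s, v in zip(substrate, rates)]
    slope, intercept = least_squares(substrate, ratio)
    vmax = 1.0 / slope
    return intercept * vmax, vmax
```

With exact Michaelis-Menten data generated from Km = 160 µM (the apparent Km reported below for β4Gal-T2 with benzyl-β-GlcNAc), the fit recovers the input parameters. Measurement noise would shift both estimates.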
Characterization of the Product Formed with β4Gal-T3-Soluble β4Gal-T3 was partially purified from serum-free medium by sequential DEAE and S-Sepharose chromatography as described previously (44). Terminal glycosylation of Lc3Cer (see Table III for definitions of Svennerholm nomenclature) was performed in a final volume of 100 µl. The reaction mixture consisted of 3 milliunits of β4Gal-T3 (specific activity determined with benzyl-β-GlcNAc), 250 µg of Lc3Cer, 25 mM Tris (pH 7.4), 10 mM MnCl2, 0.25% Triton CF-54, and 3.5 µmol of UDP-Gal. The Lc3Cer substrate was prepared from sialosyllactoneotetraosylceramide of bovine erythrocytes (45) in three steps (46,47):

- desialylation with 5% HAc in 2-propanol/hexane/H2O (55:25:20, v/v/v; lower phase) at 80°C for 4.5 h;
- isolation of the nLc4Cer product by passage through DEAE-Sephadex A-25 (Na+ form) in CHCl3/MeOH/H2O (30:60:8, v/v/v);
- treatment of the product with jack bean β-galactosidase, as described previously (46,47).

The Lc3Cer product was evaluated by high performance TLC and ¹H NMR (47). The glycosylation of Lc3Cer with β4Gal-T3 was monitored by high performance TLC and allowed to proceed for 24 h until completion. The crude reaction mixture was then processed as follows:

- it was taken up in ~3 ml of 0.1 M NaCl and sonicated thoroughly;
- it was applied to a 1-ml disposable octadecylsilica cartridge (Bakerbond, J. T. Baker Inc.) and washed with 4 ml each of 0.1 M NaCl and H2O;
- it was eluted with 4-ml portions of H2O containing increasing amounts of MeOH from 10 to 90%, followed by 100% MeOH.

Each fraction was concentrated, and high performance TLC analysis showed that the glycosphingolipid product eluted as a single band with 100% MeOH, whereas the detergent eluted at 80% MeOH. The purified product was deuterium-exchanged by dissolving it in CDCl3/CD3OD (2:1) and evaporating thoroughly under dry nitrogen (repeated twice). It was then dissolved in 0.5 ml of Me2SO-d6 containing 2% D2O (48). One-dimensional ¹H NMR spectroscopy was performed on a Bruker AMX-500 spectrometer (temperature, 308 K; spectral width, 5000 Hz acquired over 16,000 data points; relaxation delay, 2 s; solvent suppression by presaturation pulse). NMR spectra of both the substrate and product were interpreted by reference to spectra of relevant glycosphingolipid standards acquired previously under identical conditions (46,47,49).

² R. Almeida and H. Clausen, unpublished observation.
Northern Analysis-Human multiple tissue Northern blots were obtained from CLONTECH. The soluble expression construct of β4Gal-T2 and the full coding construct of β4Gal-T3 were used as probes. Probes were random prime-labeled using [α-³²P]dCTP (Amersham Corp.) and an oligonucleotide labeling kit (Pharmacia Biotech Inc.). The blots were probed overnight at 42°C as described previously (19). They were then washed in three stages:

- twice for 10 min each at room temperature with 2× SSC and 1% Na4P2O7;
- twice for 20 min each at 65°C with 0.2× SSC, 1% SDS, and 1% Na4P2O7;
- once for 10 min with 0.2× SSC at room temperature.
In Situ Hybridization to Metaphase Chromosomes-Fluorescence in situ hybridization was performed on normal human lymphocyte metaphase chromosomes essentially as described previously (51). Briefly, P1 DNA was labeled with biotin-14-dATP using the bioNICK labeling system (Life Technologies, Inc.). The labeled DNA was precipitated with ethanol in the presence of herring sperm DNA. A total of 300 ng of P1 DNA was precipitated with 50× human Cot1 DNA (Life Technologies, Inc.). It was dissolved in 12 µl of hybridization solution (2× SSC, 10% dextran sulfate, 1% Tween 20, and 50% formamide (pH 7.0)). Prior to hybridization, the probe was heat-denatured at 80°C for 10 min, chilled on ice, and incubated at 37°C to allow re-annealing of highly repetitive sequences. After denaturation of the slides, probe incubations were carried out under an 18 × 18-mm coverslip in a moist chamber for 45 h. The probe was detected immunochemically using fluorescein isothiocyanate-labeled avidin (Vector Laboratories, Inc.). This was followed by several successive steps with rabbit anti-fluorescein isothiocyanate- and mouse anti-rabbit fluorescein isothiocyanate-conjugated antibodies. The chromosomal slides were evaluated with a Zeiss epifluorescence microscope equipped with appropriate filters for visualization of fluorescein isothiocyanate. Hybridization signals and 4′,6-diamidino-2-phenylindole-counterstained chromosomes were converted into pseudo-colored images using image analysis software. For precise localization and chromosome identification, banding patterns were generated from the 4′,6-diamidino-2-phenylindole counterstain using the BDS-image™ software package (ONCOR).

RESULTS
Identification and Cloning of Human β4Gal-T2 and β4Gal-T3-The strategy outlined in Fig. 1 yielded two novel genes with significant sequence similarity to β4Gal-T1 (Figs. 2 and 3). In addition, two genes with significant similarity in the 3′-region were identified (data not shown). Fig. 4 shows a multiple sequence alignment of β4Gal-T1, the two novel human β4Gal-T enzymes, two homologous sequences from chick (38), and a snail β4-GlcNAc-transferase (52). The β4Gal-T1 gene shows high sequence similarity (65%) throughout the coding region to one of the chick genes (GenBank™/EMBL Data Bank accession number U19890). Likewise, β4Gal-T2 shows high similarity (72%) to the second chick gene (GenBank™/EMBL Data Bank accession number U19889), suggesting that each pair may represent the same gene in the two species. The amino acid sequence similarities among the three human β4-Gal-transferases and the snail β4-GlcNAc-transferase are limited to the central regions; there were no significant similarities in the NH2-terminal regions. Human β4Gal-T1 is closest in sequence to human β4Gal-T2 (52% identity), more distant from human β4Gal-T3 (44% identity), and most distant from snail β4-GlcNAc-transferase (34% identity).
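Identity percentages such as those above depend on how an alignment is scored. The sketch below is a minimal example under three assumptions:

- a pairwise alignment has already been produced, for example extracted from the multiple alignment;
- identity is counted only over columns where neither sequence has a gap;
- the sequences are toy examples, not the β4Gal-T sequences.

Other conventions divide by the full alignment length or by the shorter sequence, and they give different numbers.

```python
# Sketch: percent identity of two pre-aligned sequences, ignoring gap columns.
# The gap-handling convention is an assumption; tools differ on this point.

def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Identical residues / ungapped columns, as a percentage."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


print(percent_identity("ACDE-FG", "ACDQHF-"))  # hypothetical toy alignment
```

Sequence similarity (as quoted for the chick genes) additionally counts conservative substitutions and would need a substitution matrix. It is not computed here.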
Several sequence motifs in the putative catalytic domains are conserved among all the sequences. More importantly, four cysteine residues are conserved in all coding sequences (Fig. 4). The predicted coding region of β4Gal-T2 has three potential initiation codons, and the most 5′ of these is in agreement with Kozak's rule (53) (Fig. 3). All three initiation codons are relatively far upstream of the sequence encoding the transmembrane segment: 25, 30, and 37 residues. The predicted coding sequence depicts a type II transmembrane glycoprotein with the following features (Fig. 2):

- three potential N-terminal cytoplasmic domains of different lengths;
- a transmembrane segment of 21 residues;
- a stem region and catalytic domain of 339 residues;
- three potential N-linked glycosylation sites.

A 3′-untranslated region without polyadenylation signals was included in the oligo(dT)-primed EST cDNA clones sequenced. The 3′-ESTs (STsG4681) were linked to chromosome 1 between the D1S2861 and D1S211 microsatellite markers at 73-75 centimorgans (National Center for Biotechnology Information).
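Kozak's rule, cited above for the β4Gal-T2 initiation codons, weights two positions most heavily: a purine (A or G) at position -3 and a G at position +4, counting the A of ATG as +1. The sketch below scores every ATG in a cDNA string on these two positions only. The three-level labels are a common simplification of the rule, and the example sequence is hypothetical.

```python
# Sketch: classify each ATG by the two strongest Kozak positions (-3 and +4).
# Labels are a simplification; other flanking positions are ignored.

def kozak_contexts(seq: str) -> list[tuple[int, str]]:
    """Return (0-based index of the A in ATG, context label) for every ATG."""
    seq = seq.upper()
    results = []
    start = seq.find("ATG")
    while start != -1:
        minus3 = seq[start - 3] if start >= 3 else None            # position -3
        plus4 = seq[start + 3] if start + 3 < len(seq) else None   # position +4
        score = (minus3 in ("A", "G")) + (plus4 == "G")
        results.append((start, ("weak", "adequate", "strong")[score]))
        start = seq.find("ATG", start + 1)
    return results


print(kozak_contexts("CCACCATGGTTTTTATGCGAAATGT"))  # hypothetical sequence
```

Only context is scored here. Whether an upstream ATG is actually used, as discussed for the first β4Gal-T2 exon, cannot be decided from sequence alone.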
The predicted coding region of β4Gal-T3 has a single initiation codon, in agreement with Kozak's rule (53), placed immediately upstream of a sequence encoding a potential hydrophobic transmembrane segment (Fig. 3). The predicted coding sequence yields a type II transmembrane glycoprotein with the following features:

- an N-terminal cytoplasmic domain of four residues;
- a transmembrane segment of 18 residues;
- a stem region and catalytic domain of 371 residues;
- four potential N-linked glycosylation sites.

A 3′-untranslated region with a polyadenylation signal at position +486 was included in the EST clones sequenced. The 3′-ESTs (STsG2055) were linked to chromosome 1 between the D1S484 and D1S426 microsatellite markers at 173-181 centimorgans (National Center for Biotechnology Information).
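The domain lengths reported for the two proteins can be checked against the open reading frames given in Experimental Procedures. The check gives the following:

- β4Gal-T2, counting from the most 5′ initiation codon: 37 + 21 + 339 = 397 residues, and 1191 bp / 3 = 397 codons.
- β4Gal-T3: 4 + 18 + 371 = 393 residues, and 1179 bp / 3 = 393 codons.

Both totals also match the last residues of the soluble expression constructs (397 and 393). The text does not say so, but this agreement suggests that the stated ORF lengths exclude the stop codon. A short sketch of the check, with an illustrative data layout:

```python
# Sketch: check that reported ORF lengths match the sum of domain lengths.
# All numbers are those given in the text; the dictionary layout is illustrative.

PROTEINS = {
    "β4Gal-T2": {"orf_bp": 1191, "cytoplasmic": 37, "transmembrane": 21, "stem_catalytic": 339},
    "β4Gal-T3": {"orf_bp": 1179, "cytoplasmic": 4, "transmembrane": 18, "stem_catalytic": 371},
}


def codon_and_domain_totals(name: str) -> tuple[int, int]:
    """Return (codons in the ORF, residues summed over the domains)."""
    entry = PROTEINS[name]
    if entry["orf_bp"] % 3 != 0:
        raise ValueError(f"{name}: ORF length is not a multiple of 3")
    codons = entry["orf_bp"] // 3
    residues = entry["cytoplasmic"] + entry["transmembrane"] + entry["stem_catalytic"]
    return codons, residues


for protein in PROTEINS:
    codons, residues = codon_and_domain_totals(protein)
    print(f"{protein}: {codons} codons, {residues} residues")
# prints "β4Gal-T2: 397 codons, 397 residues" and "β4Gal-T3: 393 codons, 393 residues"
```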
Expression of β4Gal-T2-Expression of a soluble construct of β4Gal-T2 in Sf9 cells resulted in a marked increase in galactosyltransferase activity with the benzyl-β-GlcNAc acceptor substrate (Table I). The comparison was made with uninfected cells and with cells infected with control constructs for polypeptide GalNAc-transferases or histo-blood group A and O genes (19,54). Analysis of the substrate specificity of soluble β4Gal-T2 showed that only benzyl-β-GlcNAc, and not benzyl-α-GlcNAc or benzyl-α-GalNAc, served as an acceptor substrate. Free glucose was not an acceptor on its own. In the presence of increasing concentrations of α-lactalbumin, however, incorporation rates similar to those of bovine milk β4-Gal-transferase were observed (Fig. 5A). The concentration of α-lactalbumin needed to achieve maximum activity with Glc differed: 400 µg/ml was required for β4Gal-T2 but only 100 µg/ml for the bovine milk enzyme. The activities of both β4Gal-T2 and the bovine milk enzyme with GlcNAc were inhibited by α-lactalbumin, but β4Gal-T1 was overall more sensitive to inhibition (Fig. 5B). The apparent Km for benzyl-β-GlcNAc was 160 µM, and the Km for UDP-Gal with benzyl-β-GlcNAc as acceptor was 11 µM (Table II). Bovine milk β4-galactosyltransferase showed a higher Km for UDP-Gal, in agreement with previous studies (37,55-58). Its measured Km for GlcNAc was similar to that determined in some studies (59,60) but 5-10-fold higher than in others (55-58). As shown in Fig. 6, β4Gal-T2 was inhibited at high concentrations of both benzyl-β-GlcNAc and free N-acetylglucosamine. This inhibition was stronger than that of bovine milk β4-Gal-transferase and β4Gal-T3 (61). β4Gal-T2 showed strict donor substrate specificity for UDP-Gal and did not utilize UDP-GalNAc or UDP-GlcNAc with the acceptor substrates tested (data not shown).
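The inhibition at high acceptor concentrations described above is often modeled with the substrate-inhibition rate law v = Vmax[S]/(Km + [S] + [S]²/Ki). The text does not fit such a model, so the sketch below is illustrative only. It uses the reported apparent Km of β4Gal-T2 for benzyl-β-GlcNAc (160 µM) and a hypothetical Ki. Setting dv/d[S] = 0 gives Km - [S]²/Ki = 0, so activity peaks at [S] = √(Km·Ki) and declines at higher concentrations.

```python
import math

# Sketch of the standard substrate-inhibition rate law (an assumed model).
KM_UM = 160.0      # apparent Km of β4Gal-T2 for benzyl-β-GlcNAc (from the text)
KI_UM = 40_000.0   # hypothetical inhibition constant (40 mM), for illustration only


def rate(s_um: float, vmax: float = 1.0, km: float = KM_UM, ki: float = KI_UM) -> float:
    """v = Vmax*S / (Km + S + S^2/Ki)."""
    return vmax * s_um / (km + s_um + s_um * s_um / ki)


def optimal_substrate(km: float = KM_UM, ki: float = KI_UM) -> float:
    """Concentration of maximal rate, from dv/dS = 0: S* = sqrt(Km*Ki)."""
    return math.sqrt(km * ki)


s_star = optimal_substrate()
for s in (0.5 * s_star, s_star, 2.0 * s_star):
    print(f"S = {s:8.1f} uM   relative rate = {rate(s):.3f}")
```

A smaller Ki moves the peak to lower concentrations and makes the decline steeper. In this model, stronger inhibition of β4Gal-T2 than of the other enzymes would correspond to a smaller Ki.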
Expression of β4Gal-T3-Expression of the soluble construct of β4Gal-T3 in Sf9 cells produced a marked increase in Gal-transferase activity with the benzyl-β-GlcNAc acceptor substrate, compared with uninfected cells or cells infected with the control construct for GalNAc-T3 (Table I). A similar increase in activity was found in Sf9 cell homogenates after expression of the full-length construct of β4Gal-T3 (data not shown). The substrate specificity was similar to that of β4Gal-T2. β4Gal-T3 was largely insensitive to α-lactalbumin and did not efficiently utilize glucose (Fig. 5). The apparent Km for benzyl-β-GlcNAc was 580 µM, and the Km for UDP-Gal with benzyl-β-GlcNAc as acceptor was 84 µM (Table II). β4Gal-T3 showed strict donor substrate specificity for UDP-Gal. With glycolipid substrates, β4Gal-T3 efficiently utilized Lc3Cer, whereas Glcβ1-Cer and nLc5Cer were poor substrates (Table III). In contrast, the bovine milk β4-Gal-transferase efficiently utilized both Lc3Cer and nLc5Cer.
Northern Analysis of β4Gal-T2 and β4Gal-T3-Northern analysis with mRNA from 16 human adult organs showed a single transcript of ~2.2 kb for each gene (Fig. 8). β4Gal-T2 was weakly expressed in several adult organs, with the highest expression in prostate, testis, ovary, intestine, and muscle. β4Gal-T3 appeared to be more strongly expressed than β4Gal-T2 but with a similar pattern. The exception was placenta, where strong expression of β4Gal-T3 was also found.
Genomic Organization and Chromosomal Localizations-The coding regions of β4Gal-T2 and β4Gal-T3 were found in seven and six exons, respectively (Fig. 9). Human and mouse β4Gal-T1 are each encoded in six exons (62,63). The first putative coding exon of β4Gal-T2 contains the first potential initiation codon and encodes only eight amino acid residues of the N-terminal sequence of the longest form (Fig. 9). It is therefore possible that the most 5′-initiation codon is not used. An intron in the 5′-untranslated region of β4Gal-T3 was found in a position similar to that of the most 5′-intron in β4Gal-T2 (Fig. 10). Comparison of the intron/exon boundaries of β4Gal-T1, β4Gal-T2, and β4Gal-T3 revealed that the five introns in the coding regions of the three genes are placed identically (Fig. 10). The central coding exons of all three genes are of nearly identical length. The variation in the lengths of the coding regions may therefore be attributable to different initiation and stop codons in the first and last coding exons. Human β4Gal-T1 was previously localized to chromosome 9p13 (64). The β4Gal-T2 and β4Gal-T3 genes were localized to chromosomes 1p32-33 and 1q23, respectively, by fluorescence in situ hybridization (Fig. 11). Nearly identical localizations were obtained from the EST mapping. No specific hybridization signals were observed at other chromosomal sites. For each gene, a total of 20 cells in metaphase were analyzed.

DISCUSSION
The data presented here demonstrate that the human β4Gal-T1 gene is a member of a large family of homologous glycosyltransferase genes. At least three members of this family, β4Gal-T1, β4Gal-T2, and β4Gal-T3, encode β1,4-galactosyltransferases with similar kinetic parameters. The EST strategy used in this study has been applied successfully to other homologous gene families (22,23,65).
The short sequence obtained from a single EST (~200-400 bp) may contain sequencing artifacts and cannot by itself be considered strong evidence for a particular gene sequence of interest. However, in our experience, EST sequences are highly reliable. Finding protein sequence motifs and conserved cysteine residues similar to those of the query sequence strongly indicates that a novel homologous gene has been identified. The Unigene data base further supports such an identification by grouping other ESTs from the same gene. More sequence can then be compiled from these ESTs, either by direct comparison of sequences or from their available insert sizes. Thus, for the two genes identified in this study, most of the coding region could be assembled from merged EST sequences, allowing a more detailed assessment of the authenticity of each putative homologous gene. One drawback of the EST cloning strategy is that genes with long 3′-untranslated sequences may not be represented, a feature associated with some glycosyltransferase genes (8,10,66). A strength of the EST strategy is that it offers access to expressed sequences from many different adult and fetal tissues. This makes it possible to identify genes with very restricted expression patterns.
Some evidence had suggested that β4Gal-T1 was a member of a homologous gene family. One distant member of this family was found to use UDP-GlcNAc and not UDP-Gal (52), indicating that members may have different donor substrate specificities and functions. Interestingly, UDP-Gal:β-GlcNAc β1,3-galactosyltransferases have no sequence similarity with β4Gal-T1 (GenBank™/EMBL Data Bank accession number E07739).³ Several features of the two identified genes suggested that they represented genuine glycosyltransferase genes:

- The genes had higher sequence similarity to β4Gal-T1 than to a homologous β4-GlcNAc-transferase from snail (52).
- Four cysteine residues were conserved in all sequences, and in β4Gal-T1 these have been found to be involved in intramolecular disulfide bonding and catalysis (31,32).
- The genomic organizations of both genes were shown to be identical to that of β4Gal-T1, with the positions of five intron/exon boundaries conserved (Fig. 10).

This last finding strongly suggests that this gene family arose through complete gene duplication late in evolution. Furthermore, the homologous snail β4-GlcNAc-transferase gene was recently shown to be organized similarly, with conservation of the same five intron positions but with two additional exons (Fig. 10) (67). More importantly, the two chick β4-Gal-transferases identified by Shaper et al. (38) were also found to have genomic organizations identical to that of the β4Gal-T1 gene. The three β4-Gal-transferase genes have different chromosomal locations (64). Several large glycosyltransferase gene families have now been found to have members at different chromosomal locations (23,24), whereas others are clustered in one region (26,28,66).
The β4Gal-T1 gene differs from most other known glycosyltransferase genes in having two translational initiation sites controlled by two different promoters (35,68). The two isoforms of β4Gal-T1 differ by having a short (11 residues) or long (24 residues) cytoplasmic sequence. Studies indicate that the short form localizes primarily to the Golgi apparatus and the long form primarily to the cell membrane (69). A number of biological roles have been attributed to a cell membrane form of β4Gal-T1 (70). The β4Gal-T2 gene shows the highest similarity to β4Gal-T1 and also resembles it in having multiple initiation codons. β4Gal-T2 has three, potentially yielding isoforms with 25-, 30-, or 37-residue cytoplasmic sequences. A sequence motif (L(E/Q)RXC) is found in the cytoplasmic sequences of the human and chick genes immediately preceding the hydrophobic putative transmembrane signal sequence (Fig. 4). This conserved motif may be important, because the major parts of the N-terminal regions of these genes show little or no sequence similarity. Bendiak et al. (71) have shown that β4-Gal-transferase appears to exist as a high molecular weight aggregate, and this motif could potentially play a role in such aggregation. Aoki et al. (34) showed that a cysteine residue in the hydrophobic transmembrane sequence was important for Golgi retention. This cysteine is conserved in β4Gal-T2 but not in β4Gal-T3 or the snail β4-GlcNAc-transferase (Fig. 4). The implications of this are not known at present.

³ M. Amado and H. Clausen, manuscript in preparation.
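The cytoplasmic motif L(E/Q)RXC can be located with a regular expression in which X stands for any residue. In the sketch below, a lookahead lets overlapping occurrences be reported as well. The example sequence is hypothetical and not taken from any β4Gal-T protein.

```python
import re

# L, then E or Q, then R, then any residue (X), then C.
# The lookahead (?=...) lets overlapping occurrences be reported.
MOTIF = re.compile(r"(?=(L[EQ]R.C))")


def find_motif(protein: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched residues) for each L(E/Q)RXC occurrence."""
    return [(m.start(), m.group(1)) for m in MOTIF.finditer(protein.upper())]


print(find_motif("MSLERVCAAALQRKC"))  # hypothetical sequence
# → [(2, 'LERVC'), (10, 'LQRKC')]
```

In practice, matches would be judged relevant only if they fall immediately before the predicted transmembrane segment, as described in the text. This sketch does not predict that segment.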
The fine specificity of the three β4-Gal-transferases for different types of glycoconjugates and branch points of oligosaccharide structures needs to be determined in comparative assays with recombinant forms of the enzymes. Here, we have compared the activities of recombinant soluble β4Gal-T2 and β4Gal-T3 with that of purified bovine milk enzyme, which may contain β-Gal-transferase activities other than β4Gal-T1. The substrate specificities with monosaccharides and simple derivatives were similar (Table I). β4Gal-T2 had a lower Km for UDP-Gal than both bovine milk β4-Gal-transferase and β4Gal-T3 (Table II). It also had a lower Km for the best acceptor substrates used in this study (benzyl-β-GlcNAc and GlcNAc). At high concentrations of these acceptor substrates, all three enzymes showed inhibition of activity, but β4Gal-T2 was the most sensitive (Fig. 6). Inhibition of β4Gal-T1 was previously noted at concentrations in excess of 20-30 mM (61), but in the present study it was observed only at higher concentrations.
Because β4Gal-T1 and β4Gal-T2 were most similar in sequence and appeared to have similar properties, the specificity of β4Gal-T3 was studied in more detail. Analysis of β4Gal-T3 and bovine milk β-Gal-transferase with glycolipid substrates showed that both enzymes utilize Lc3Cer efficiently. Characterization by ¹H NMR clearly established the product formed by β4Gal-T3 as the expected nLc4Cer with a Galβ1-4GlcNAc linkage. Interestingly, marked differences in activity were found with the extended substrate nLc5Cer (Table III). In contrast to the bovine milk enzyme, β4Gal-T3 poorly utilized nLc5Cer. This may suggest that β4Gal-T1 is involved in all steps of poly-N-acetyllactosamine chain extension, whereas β4Gal-T3 may be mainly involved in the synthesis of the first N-acetyllactosamine unit. Differential expression of β4-Gal-transferases may be part of the regulation of poly-N-acetyllactosamine chain extension, which appears to be developmentally regulated (76,77).
The differences found in the kinetic properties of the three β-Gal-transferases may relate to past findings of β-Gal-transferase activities in different organs and in B cells of rheumatoid arthritis patients. Furukawa et al. (37) showed that liver β4-Gal-transferase activity was nearly 20-fold higher with asialo-agalactotransferrin than with asialo-agalacto-IgG, whereas the activity found in T and B cells showed only a 4-5-fold difference between the two substrates. The β4-Gal-transferase activity in B cells of rheumatoid arthritis patients appears to be similar to that in B cells of healthy controls with several substrates, including asialo-agalactotransferrin (37) and β-GlcNAc-phenylisothiocyanate-bovine serum albumin (78), but different with asialo-agalacto-IgG (37). Furthermore, the Km for UDP-Gal of the β4-Gal-transferase activity in B cells of rheumatoid arthritis patients was 2-fold higher (35.6 μM) than that in normal B cells (17.6 μM) (37). Finally, the activity in B cells with asialo-agalactotransferrin was more sensitive to α-lactalbumin inhibition than the activity with asialo-agalacto-IgG. It is intriguing that β4Gal-T2 has a lower Km for UDP-Gal than the other two enzymes and that this value is close to that found in normal B cells (37). In addition, β4Gal-T2 was slightly less sensitive to α-lactalbumin inhibition, in agreement with the activity with asialo-agalacto-IgG found in normal B cells. A number of studies have concluded that there is no change in β-Gal-transferase activity in B cells of rheumatoid arthritis patients (79,80). With the new insight into the β-Gal-transferase family, which includes at least two additional genes, it is possible that the contradictory findings of Furukawa et al. (37) can be explained by a model in which two β-Gal-transferases with different kinetic parameters are expressed in normal B cells and one of them is selectively down-regulated in B cells of rheumatoid arthritis patients. β4Gal-T2 may be a candidate for such a second enzyme, and studies are now in progress to determine this.
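The two-enzyme model proposed above can be made concrete with a short Python sketch. It assumes that each enzyme follows simple Michaelis-Menten kinetics for UDP-Gal and that the activity measured in a cell extract is the sum of the individual activities; the Km and Vmax values are hypothetical placeholders, not the B-cell values cited above. The apparent Km of the mixture is taken as the UDP-Gal concentration giving half of the combined Vmax, found by bisection.

```python
"""Hypothetical two-enzyme model for the apparent Km of a cell extract.

All Km and Vmax values are illustrative placeholders, not measured data.
"""

from dataclasses import dataclass


@dataclass(frozen=True)
class Enzyme:
    km: float    # Michaelis constant for UDP-Gal (uM), hypothetical
    vmax: float  # maximal rate (arbitrary units), hypothetical

    def rate(self, s: float) -> float:
        """Michaelis-Menten rate at UDP-Gal concentration s (uM)."""
        return self.vmax * s / (self.km + s)


def total_rate(enzymes: list[Enzyme], s: float) -> float:
    """Activity measured in an extract: the sum over all enzymes present."""
    return sum(e.rate(s) for e in enzymes)


def apparent_km(enzymes: list[Enzyme], lo: float = 0.0, hi: float = 1e6) -> float:
    """Concentration giving half of the combined Vmax, found by bisection."""
    half = 0.5 * sum(e.vmax for e in enzymes)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if total_rate(enzymes, mid) < half:
            lo = mid  # rate still below half-maximal: root lies above mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


low_km = Enzyme(km=10.0, vmax=1.0)   # hypothetical low-Km enzyme
high_km = Enzyme(km=50.0, vmax=1.0)  # hypothetical high-Km enzyme

normal = [low_km, high_km]
# Selective down-regulation: the low-Km enzyme drops to 10% of its level.
down_regulated = [Enzyme(km=10.0, vmax=0.1), high_km]

print(f"apparent Km, both enzymes:        {apparent_km(normal):.1f} uM")
print(f"apparent Km, low-Km enzyme at 10%: {apparent_km(down_regulated):.1f} uM")
```

With these placeholder values, reducing the low-Km enzyme to 10% roughly doubles the apparent Km of the mixture (from about 22 to about 44 μM), illustrating how selective down-regulation of one enzyme could shift the measured Km without any change in the kinetics of either enzyme.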
β4Gal-T2 and β4Gal-T3 were differentially expressed in human organs (Fig. 6). In contrast, β4Gal-T1 is considered to be more widely expressed, and extensive analysis of its regulatory elements indicated that this gene is a ubiquitously expressed housekeeping gene (81). However, a detailed comparative analysis of the expression patterns of the three enzymes at the level of individual cell types is not available at present. Two additional members of this gene family have been cloned and expression studies are ongoing; it is possible that these represent β4-Gal-transferases with different expression patterns. Analysis of β4-Gal-transferase activities in extracts of organs from mice deficient in β4Gal-T1 showed that only 5% of the activity remained in liver and spleen (40); however, 30% of the activity remained in brain, although the total activity in this organ was low (39). Interestingly, expression of β4Gal-T2 and β4Gal-T3 could not be detected in liver, but β4Gal-T3 was weakly expressed in spleen and brain. Studies of β4-Gal-transferase activities by Shur and co-workers (39) have generally been performed with 20-30 mM N-acetylglucosamine, based on kinetic properties of cell-surface β4-Gal-transferase activity in mouse sperm and embryonic carcinoma cell lines (6,61). In studies of sperm β4-Gal-transferase activity, the optimal acceptor substrate concentration was determined to be 20 mM GlcNAc, with some inhibition at higher concentrations. In this study, we show that all three β4-Gal-transferases are inhibited, but that β4Gal-T2 is inhibited at lower concentrations than both bovine milk β4-Gal-transferase and β4Gal-T3 (Fig. 6). Thus, analysis of galactosyltransferase activities at high acceptor substrate concentrations may misrepresent the activity actually present.
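The caution about fixed high acceptor concentrations can be illustrated under an explicit assumption: a Haldane-type substrate-inhibition rate law, v = Vmax·S/(Km + S + S²/Ki), with hypothetical Km and Ki values (in mM) for two enzymes that differ only in their sensitivity to acceptor inhibition. Neither the model nor the parameter values come from this study. The sketch reports how much of each enzyme's peak activity an assay at 20 mM acceptor would capture.

```python
"""Fraction of peak activity recovered at a fixed acceptor concentration,
under an assumed Haldane-type substrate-inhibition rate law.

Km and Ki values below are hypothetical, chosen only for illustration.
"""

import math


def haldane_rate(s: float, vmax: float, km: float, ki: float) -> float:
    """v = Vmax*S / (Km + S + S^2/Ki); all concentrations in mM."""
    return vmax * s / (km + s + s * s / ki)


def peak(vmax: float, km: float, ki: float) -> tuple[float, float]:
    """Return (S_opt, v_opt); S_opt = sqrt(Km*Ki) maximizes the rate."""
    s_opt = math.sqrt(km * ki)
    return s_opt, haldane_rate(s_opt, vmax, km, ki)


ASSAY_S = 20.0  # mM, a fixed acceptor concentration like those discussed

# Hypothetical enzymes: same Vmax, different sensitivity to acceptor inhibition.
enzymes = {
    "weakly inhibited": dict(vmax=1.0, km=2.0, ki=200.0),
    "strongly inhibited": dict(vmax=1.0, km=0.5, ki=5.0),
}

for name, p in enzymes.items():
    s_opt, v_opt = peak(**p)
    frac = haldane_rate(ASSAY_S, **p) / v_opt
    print(f"{name}: S_opt = {s_opt:.1f} mM, "
          f"activity at {ASSAY_S:.0f} mM = {frac:.0%} of peak")
```

With these placeholder values, the 20 mM assay recovers essentially all of the weakly inhibited enzyme's peak activity but only about a third of the strongly inhibited enzyme's, so an extract containing both would under-report the contribution of the latter.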
Since mice deficient in β4Gal-T1 developed normally before birth and exhibited reduced growth and lethality only after birth, it is likely that β4-Gal-transferases other than β4Gal-T1 function in fetal life. Analysis of serum glycoproteins of 10-week-old mice deficient in β4Gal-T1 showed a marked decrease in galactosylation (40), suggesting that β4Gal-T1 is the major β4-Gal-transferase involved in the synthesis of these glycoproteins. In contrast, Lu et al. (39) found that despite a lack of detectable β4-Gal-transferase activity in salivary glands, lectin analysis and α2,6-sialylation revealed a normal level of Galβ1-4GlcNAc-terminating chains in salivary gland extracts. It is therefore likely that a β4-Gal-transferase other than β4Gal-T1 is active in salivary glands.
It is clear that the single step of forming the Galβ1-4GlcNAcβ linkage is performed by a family of β4-Gal-transferases and that each member of this family may play a distinct role in different cells. A similar finding was made for the initiation step of GalNAc and Man O-glycosylation in animals and yeast, where large families of homologous enzymes with distinct but overlapping specificities and different expression patterns perform a reaction previously considered to be carried out by a single enzyme (23,54). The α2-fucosyltransferases (82), α3/4-fucosyltransferases (83), and some of the sialyltransferases (24) are other examples of this phenomenon of partial redundancy. The findings that the genes in most glycosyltransferase families have diverged considerably and show only 30-50% sequence similarity, combined with their differential expression patterns, suggest that each member of these gene families plays a distinct role. The seemingly similar enzyme activities determined in in vitro assays may not fully reveal the in vivo functions of these enzymes.
From the present status of cloning of glycosyltransferases, one may conclude that glycosyltransferases utilizing the same donor or acceptor substrates need not, per se, share primary sequence similarity. Analysis of the known glycosyltransferase gene families shows that their members share specificity for the nucleotide of the donor substrate and the anomeric configuration of the linkage formed, whereas the acceptor substrate and the linkage position seem to be more variable. The sialyltransferases all utilize CMP-NeuAc and form α-anomeric linkages, but they form linkages to several different acceptors (Gal, GalNAc, GlcNAc, and NeuAc) and at different positions (C-3, C-6, and C-8) (24). Although all members of this gene family share some sequence similarity, members forming the same linkage type have higher sequence similarity. The α1,3/4-fucosyltransferase family utilizes GDP-Fuc and forms either α1-3 or α1-4 linkages to GlcNAc (84). The α3-GalNAc-transferase family utilizes UDP-Gal or UDP-GalNAc and forms α1-3 linkages to Gal (27). The β6-GlcNAc-transferases use different acceptor substrates (28). The polypeptide GalNAc-transferase family generally utilizes UDP-GalNAc, but one member (GalNAc-T2) was recently found also to utilize UDP-Gal (23,44). Finally, the snail β4-GlcNAc-transferase homologous to the β4-Gal-transferase family utilizes UDP-GlcNAc but forms the same linkage. Therefore, the specific function of a newly identified gene in a glycosyltransferase family is not easily predicted, even though homologous glycosyltransferases recognize the same nucleotide of the donor substrate and form the same anomeric configuration of linkage.
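The comparison of families above can be summarized as a small lookup table. The entries paraphrase only properties stated in the text (the β6-GlcNAc-transferase and polypeptide GalNAc-transferase families are left out because their acceptors and positions are not listed here), and the property names are hypothetical labels. The helper simply reports which properties take a single value within every family.

```python
"""Glycosyltransferase families as summarized in the text, and the
properties that are shared by all members of each family.

Property names are hypothetical labels; values paraphrase the text.
"""

FAMILIES: dict[str, dict[str, set]] = {
    "sialyltransferases": {
        "nucleotide": {"CMP"}, "anomer": {"alpha"},
        "donor_sugar": {"NeuAc"},
        "acceptor": {"Gal", "GalNAc", "GlcNAc", "NeuAc"},
        "position": {3, 6, 8},
    },
    "alpha1,3/4-fucosyltransferases": {
        "nucleotide": {"GDP"}, "anomer": {"alpha"},
        "donor_sugar": {"Fuc"}, "acceptor": {"GlcNAc"}, "position": {3, 4},
    },
    "alpha3-GalNAc-transferases": {
        "nucleotide": {"UDP"}, "anomer": {"alpha"},
        "donor_sugar": {"Gal", "GalNAc"}, "acceptor": {"Gal"}, "position": {3},
    },
    # Includes the snail beta4-GlcNAc-transferase, which forms the same linkage.
    "beta4-Gal-transferases": {
        "nucleotide": {"UDP"}, "anomer": {"beta"},
        "donor_sugar": {"Gal", "GlcNAc"}, "acceptor": {"GlcNAc"}, "position": {4},
    },
}


def conserved_properties(families: dict[str, dict[str, set]]) -> list[str]:
    """Properties that take a single value within every family."""
    props = next(iter(families.values())).keys()
    return [p for p in props if all(len(f[p]) == 1 for f in families.values())]


print("shared within each family:", conserved_properties(FAMILIES))
```

For these entries the helper returns `['nucleotide', 'anomer']`, matching the observation that the donor nucleotide and the anomeric configuration are conserved within families while the donor sugar, acceptor, and linkage position vary.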
The existence of multiple β4-Gal-transferases has implications for the interpretation of past studies of the expression pattern and biological functions previously assigned to β4Gal-T1, as illustrated by the analysis of mice deficient in β4Gal-T1. Detection of β4Gal-T1 gene expression by Northern or in situ hybridization is likely to reflect genuine β4Gal-T1 expression, but enzyme activities measured in extracts or body fluids may represent any or all of the β4-Gal-transferases, and knowledge of the kinetic parameters of each enzyme is required to ensure that all activities are measured. Likewise, antibodies prepared against purified rather than recombinant β4-Gal-transferases may recognize any one or several of the enzymes in the family.