Genomic Organization and Chromosomal Assignment of the Human β1,4-N-Acetylgalactosaminyltransferase Gene

The β1,4-N-acetylgalactosaminyltransferase (β1,4GalNAc-T) (EC 2.4.1.92) gene is expressed in normal brain tissues and in various malignant transformed cells, such as malignant melanoma, neuroblastoma, and adult T cell leukemia. To analyze the regulatory mechanisms of gene expression, we determined the genomic organization of the β1,4GalNAc-T gene. The gene consists of at least 11 exons and spans >8 kilobase pairs. The coding region is located in exons 2-11. To determine the transcription initiation sites, 5′-rapid amplification of cDNA ends analysis and ribonuclease protection assays were performed using RNA obtained from the human melanoma cell line SK-MEL-31. Consequently, we defined three transcription initiation sites and the alternative usage of three exons. Exons 1a and 1b partially overlap; the latter is part (3′-side) of the former and corresponds to the 5′-noncoding region of the cDNA clone previously isolated. The third transcript, exon 1c, corresponds to nucleotides −520 to −412 (position +1 = A of ATG of β1,4GalNAc-T cDNA), which are considered to be in intron 1 based on the cloned cDNA sequence. Ribonuclease protection assays revealed the corresponding protection bands in samples of the gene-expressing cell lines. 5′-Flanking regions of individual initiation sites showed promoter activity when analyzed by chloramphenicol acetyltransferase assay in SK-MEL-31 cells. The multiple transcription initiation sites and their promoters/enhancers identified here might be differentially involved in the cell type-specific expression of the β1,4GalNAc-T gene. This gene was assigned to human chromosome 12q13.3 by means of fluorescence in situ hybridization.

Glycosphingolipids are amphipathic molecules consisting of a hydrophilic carbohydrate moiety and lipophilic ceramides (1). They are expressed mainly on the cell membrane and have various roles in cell-cell or cell-extracellular matrix recognition and in the regulation of signaling events (2). Although carbohydrate structures on the glycosphingolipids as well as those on glycoproteins are enormously diverse, they are usually characteristic of different stages of development, a distinct cell lineage, and various steps of malignant transformation (3). These cell type-specific profiles of glycolipid components are determined by a combination of the glycosylation machineries expressed in individual tissues and cells (4). However, little is known about the mechanisms regulating glycosylation systems (5).
Studies of the regulation of carbohydrate synthesis and expression have been hampered by the lack of cloned glycosyltransferase genes. However, the isolation of cDNAs of these genes (6) has improved the understanding of the genomic organization and the control of the cell type-specific expression of glycosyltransferases. The structure and tissue-specific expression of the β-galactoside α2,6-sialyltransferase (7-10) and β1,4-galactosyltransferase (11-13) genes have been analyzed. However, the enzymes encoded by these two genes catalyze the synthesis of carbohydrate chains on various glycoproteins, and the glycosyltransferase genes that are mainly responsible for the synthesis of glycosphingolipids have not been studied.
The level of β1,4-N-acetylgalactosaminyltransferase (β1,4GalNAc-T) (GM2/GD2 synthase; EC 2.4.1.92) activity is generally high in neuroectoderm-derived tumor cells, such as neuroblastomas and malignant melanomas (14). Northern blots and reverse transcription-polymerase chain reactions have revealed that the mRNA levels of the gene correlate closely with enzyme activities and GM2/GD2 expression in individual cell lines (15). Moreover, the specific expression of GD2 and the possible transactivation of the β1,4GalNAc-T gene by human T cell lymphotropic virus type I p40 tax protein in adult T cell leukemia cells have been reported (16). In addition, we demonstrated that the mRNA level of β1,4GalNAc-T in mouse brain gradually increases during development until birth (17), which corresponds with the reported changes in enzyme activity (18). These results indicate that the synthesis and expression of GM2/GD2 are mainly controlled at the level of transcription of the β1,4GalNAc-T gene, with additional modification by many other epigenetic factors, and they suggest that there are complex regulatory systems of gene expression depending on the state of individual cells, such as neural development, malignant transformation or progression, and viral infection.
To understand the basis for the cell type-specific expression of the β1,4GalNAc-T gene, we isolated genomic clones of the gene, analyzed the genomic organization, and defined multiple transcription initiation sites and their promoters. We also determined the chromosomal location of the β1,4GalNAc-T gene by fluorescence in situ hybridization (FISH).

EXPERIMENTAL PROCEDURES
Materials-Restriction enzymes used for the cloning and sequencing of DNA were purchased from Takara Shuzo Corp. (Kyoto, Japan) and Nippon Gene Corp. (Toyama, Japan). A library of a Sau3AI partial digest of human placental DNA cloned into EMBL-3 SP6/T7 (CLONTECH, Palo Alto, CA) was screened using probes derived from a human β1,4GalNAc-T cDNA clone (M2T1-1) (19). The human melanoma lines SK-MEL-31 and MeWo as well as the astrocytoma line AS were obtained from Dr. L. J. Old (Sloan-Kettering Cancer Center, New York).
Isolation and Characterization of Genomic Sequences-The amplified genomic library containing ~3 × 10⁶ plaques was screened by hybridization with a radiolabeled XbaI fragment of human β1,4GalNAc-T cDNA (pM2T1-1) (19). Insert DNA fragments were initially characterized by restriction enzyme digestion and Southern blotting using oligonucleotide probes based on the cDNA sequences. Human genomic DNA fragments that hybridized with β1,4GalNAc-T cDNA probes were subcloned into pBluescript II SK+ (Stratagene). All genomic exon regions were completely sequenced to determine intron interruptions and to confirm that the genomic sequences were in complete agreement with known β1,4GalNAc-T cDNA (pM2T1-1) data (19). Double-stranded sequencing was performed using the Taq Dye Primer Cycle sequencing kit and a Model 370A DNA sequencer (both from Applied Biosystems, Foster City, CA).
5′-Rapid Amplification of cDNA Ends (RACE) Analysis-A modified RACE analysis was performed to clone gene-specific 5′-ends using the 5′-AmpliFINDER RACE kit (CLONTECH) according to the manufacturer's instructions. First strand cDNAs were synthesized by avian myeloblastosis virus reverse transcriptase from 2 μg of poly(A)+ mRNA of SK-MEL-31 with the gene-specific antisense reverse transcription primer corresponding to nucleotides +235 to +254 (5′-CTGGACTCACAACTGCAGTT-3′; position +1 = A of ATG of β1,4GalNAc-T cDNA) (19) downstream of the ATG codon. RNA was digested with NaOH, and the remaining oligonucleotide primers were removed. The purified cDNA was ligated with the AmpliFINDER R anchor using T4 RNA ligase and then amplified by PCR using an anchor primer complementary to the anchor sequences combined with a nested antisense primer corresponding to nucleotides +98 to +118 (5′-ACGGCGCAAGAGGTAGCCGGA-3′) of β1,4GalNAc-T cDNA (see Fig. 2). The amplified product was cloned into pT7Blue(R) (Novagen) and sequenced as described above.
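As a quick consistency check on the two antisense primers quoted above, the following minimal Python sketch confirms that each primer's length matches its stated cDNA coordinate span and derives the corresponding sense-strand sequence. The dictionary keys and function names are illustrative and do not come from the original work.

```python
# Antisense primers as quoted in the RACE procedure, with their stated cDNA
# coordinates (position +1 = A of the ATG). Key names are illustrative only.
PRIMERS = {
    "rt_primer": ("CTGGACTCACAACTGCAGTT", 235, 254),
    "nested_primer": ("ACGGCGCAAGAGGTAGCCGGA", 98, 118),
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def span_length(start: int, end: int) -> int:
    """Number of nucleotides covered by an inclusive coordinate span."""
    return end - start + 1


def check_primers(primers=PRIMERS):
    """Compare each primer's length with its coordinate span and report the
    sense-strand (cDNA-orientation) sequence it anneals to."""
    report = {}
    for name, (seq, start, end) in primers.items():
        report[name] = {
            "length_matches": len(seq) == span_length(start, end),
            "sense_strand": reverse_complement(seq),
        }
    return report


if __name__ == "__main__":
    for name, result in check_primers().items():
        print(name, result)
```

Both primers pass the length check (20 and 21 nucleotides), and the nested primer lies 5′ of the reverse transcription primer, as a nested design requires.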
RNase Protection Analysis-The cRNA probes were synthesized from subcloned templates. To construct the plasmid used to prepare RNA probe a (see Fig. 3A), a 5′-AvaII/3′-BamHI fragment (nucleotides −1141 to −948 of β1,4GalNAc-T genomic DNA) in which the AvaII end was blunt-ended with Klenow enzyme and ligated with a HindIII linker was cloned into the HindIII and BamHI sites of pBluescript II SK+. For RNA probes b and c (see Fig. 3A), two genomic DNA fragments spanning the regions from nucleotides −823 to −638 (probe b) and from nucleotides −651 to −441 (probe c) were amplified by PCR. PCR products b and c were subcloned into the BamHI and XbaI sites of pBluescript II SK+ and into pT7Blue(R), respectively, and the PCR products were confirmed by sequencing. Total RNA was isolated from cell lines as described (20). The RNase protection assay was performed using the ribonuclease protection assay kit RPA II (Ambion Inc., Austin, TX) according to the manufacturer's instructions. Total RNA (50 μg) was hybridized with 4 × 10⁴ to 10⁵ cpm each of the [α-³²P]UTP-labeled riboprobes at 55°C overnight, followed by digestion with RNase A and RNase T1. The protected products were compared on a 6% sequencing gel with a known DNA sequencing ladder. The protected bands were detected by exposing the gel to an imaging plate (BAS-III, Fuji Photo Film Co.) for 10-12 h.
Primer Extension-Primer extension analysis was performed using the avian myeloblastosis virus reverse transcriptase primer extension system (Promega). An oligonucleotide primer with a sequence complementary to exon 1c of the β1,4GalNAc-T gene (nucleotides −491 to −459) was radiolabeled at the 5′-end with T4 polynucleotide kinase and [γ-³²P]ATP (3000 Ci/mmol). The labeled primer was added to 50 μg of total RNA isolated from SK-MEL-31, AS, or NALM6 in avian myeloblastosis virus primer extension buffer. The reaction mixtures were heated at 68°C for 10 min, annealed at 58°C for 90 min, and then left at room temperature to cool for 10 min. Avian myeloblastosis virus reverse transcriptase was added, and primer extension was performed at 42°C for 1 h. The reactions were terminated, and the extension products were precipitated and then analyzed on an 8% DNA sequencing gel. The size was determined based on a DNA sequencing ladder.
Chloramphenicol Acetyltransferase (CAT) Assay-Each of the following fragments of the β1,4GalNAc-T gene was ligated to the CAT reporter gene (see Figs. 5A and 6A): nucleotides −2228 (SalI) to −692 (BanII), −2228 to −948 (BamHI), −947 to −692, −2228 to +156 (BglII), and −947 to +156. To construct pGTCAT−2228 (SalI)/−692 (BanII) and pGTCAT−2228 (SalI)/+156 (BglII), the BanII and BglII sites were blunt-ended and ligated with a HindIII linker; then, the SalI/ΔBanII (Δ means a modified restriction enzyme site) or SalI/ΔBglII fragment of the genomic DNA was ligated into the SalI and HindIII sites of pSV00CAT (Nippon Gene Corp.). For pGTCAT−947/−692 and pGTCAT−947/+156, fragment −2228 to −948 was deleted from pGTCAT−2228/−692 and pGTCAT−2228/+156, respectively. Plasmid pGTCAT−2228/−948 was constructed by deleting fragment −947 to −692 from pGTCAT−2228/−692. For constructs pGTCAT−662/−441 and pGTCAT−441/−662 (see Fig. 6B), the region between nucleotides −662 and −441 of the β1,4GalNAc-T gene was amplified by PCR and sequenced for confirmation. This PCR product (fragment −662 to −441) was ligated with the promoterless CAT gene in the sense or antisense direction. To prepare 5′-deletion constructs, fragment −2228 to +156 of the β1,4GalNAc-T gene linked to the CAT gene was ligated into the XbaI and SalI sites of pUC19 to make convenient restriction sites for deletion and was then deleted from the 5′-end (see Fig. 6B) using a deletion kit (Takara Shuzo Corp.). Cell lines were seeded at 1.5-2.0 × 10⁶ cells per 60-mm dish and were transfected using the calcium phosphate procedure as described (21). Briefly, the cells were transfected with 7 μg of DNA mixed with 3 μg of the pSV-β-galactosidase plasmid (Promega) or a luciferase expression vector (pCEV/Luc) provided by Dr. J. Takeda (Osaka University) to quantify transfection efficiencies and incubated at 37°C for 4-6 h. Thereafter, the growth medium was changed, and the cells were incubated at 37°C for an additional 48 h, washed with phosphate-buffered saline, harvested, and assayed for CAT reporter gene activity by thin layer chromatography as described (21). β-Galactosidase activities were measured according to the manufacturer's instructions (Promega). The luciferase assay was performed using Pica Gene (Toyo Ink Corp., Tokyo) according to the manufacturer's instructions.
FISH-One of the phage clones (C4-1), in which the β1,4GalNAc-T gene was inserted, was labeled with biotin-16-dUTP using a nick translation labeling kit (Boehringer Mannheim). FISH was performed on non-banded and R-banded normal human metaphase chromosomes using human Cot-1 DNA (Life Technologies, Inc.) as a competitor as described (22). FISH signals in 50 mitotic cells were detected with fluorescein isothiocyanate-conjugated avidin, and chromosomes were counterstained with propidium iodide. Photomicroscopy was performed under a fluorescence microscope using Nikon B-2A and B-2E filters.

Genomic Analysis and Mapping of the β1,4GalNAc-T Gene
We screened a human genomic library in the vector EMBL3 and isolated six clones (C2-1, C3-1, C4-1, C7-2, C8-1, and C10-1) based on restriction mapping. These clones hybridized with 5′-cDNA (nucleotides −60 to +327 of cDNA) and 3′-cDNA (nucleotides +2025 to +2450 of cDNA) probes after digestion by restriction enzymes. Fragments of clones C3-1 and C8-1 hybridized with both probes, so the C3-1 clone, which was longer than C8-1, was subcloned into pBluescript II SK+ and sequenced. The insert of clone C3-1 was found to span almost the entire β1,4GalNAc-T gene, containing all exons corresponding to the 2.5-kilobase β1,4GalNAc-T cDNA clone (pM2T1-1). Table I shows that the β1,4GalNAc-T cDNA was divided into at least 11 exons, ranging from 41 to over 1000 bp and spanning 8 kilobases of genomic DNA (Fig. 1). Each exon was sequenced to determine its exact size as summarized in Table I. The sizes of the introns were determined by sequencing or calculated from the fragments generated by restriction enzyme digestion of the subcloned β1,4GalNAc-T gene as summarized in Table II. We confirmed that the sequences adjacent to the exon junctions were compatible with the consensus splice site sequences reported by Breathnach and Chambon (23). Although a sequence of 1950 nucleotides downstream from the first nucleotide of exon 11 (883 nucleotides from the end of cDNA clone pM2T1-1) was analyzed, a poly(A) signal was not found.
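The search for a poly(A) signal mentioned above can be illustrated with a short Python sketch. It scans a sequence for the canonical AATAAA hexamer and the common ATTAAA variant; which motifs were actually searched in this study is not stated, so the motif list and the function name are assumptions made for illustration.

```python
import re

# Canonical polyadenylation hexamer plus one common variant. Treating these
# two motifs as "the poly(A) signal" is an assumption of this sketch.
POLYA_SIGNALS = ("AATAAA", "ATTAAA")


def find_polya_signals(seq: str, signals=POLYA_SIGNALS):
    """Return sorted (1-based position, motif) pairs for every occurrence of a
    candidate poly(A) signal, including overlapping matches."""
    seq = seq.upper()
    hits = []
    for motif in signals:
        # A zero-width lookahead reports overlapping occurrences as well.
        for match in re.finditer(f"(?={motif})", seq):
            hits.append((match.start() + 1, motif))
    return sorted(hits)


if __name__ == "__main__":
    # Toy sequence, not taken from the gene.
    print(find_polya_signals("GGGAATAAACCC"))
```

An empty result for a given downstream sequence corresponds to the situation described in the text, in which no signal was found within the analyzed region.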

Identification of the Transcription Initiation Sites
To define the transcription initiation sites for the β1,4GalNAc-T gene, 5′-RACE analysis and RNase protection assays were performed (Figs. 2 and 3).
5′-RACE Assay-After PCR amplification of the 5′-ends of cDNAs prepared from SK-MEL-31 mRNA, two bands at ~550 and ~280 bp were obtained (data not shown). These extended cDNAs were cloned and Southern-blotted using oligonucleotide probes generated from exons 1 (nucleotides −46 to −28 of cDNA) and 2 (nucleotides +66 to +87 of cDNA) as shown in Fig. 1 to discriminate aberrant clones. The clones that hybridized with these probes were sequenced and compared with the genomic sequence of the β1,4GalNAc-T gene. The sequences of the 5′-end of 550-bp cDNA clones started at nucleotides −1073 to −1036 of the genomic sequence as shown in Fig. 2B. The sequences of the 5′-end of 280-bp cDNA clones matched either one of two portions of the genomic sequence. About 10% (5/48) of the 280-bp RACE products corresponded to the original exon 1 and started at nucleotides −814 to −809 or at −720 as shown in Fig. 2B. The remaining RACE clones (~90%, 43/48) did not hybridize with an oligonucleotide probe derived from exon 1. In these clones, the region of the cDNA sequence corresponding to exon 1 was replaced by a different sequence, which was found in intron 1 of the original genomic sequence at nucleotides −520 to −412 and was termed exon 1c as shown in Fig. 2 (B and C). The sequence of the 3′-end (nucleotide −412) of exon 1c was joined to the original exon 2 junction and matched the consensus splice site sequences shown in Fig. 2B. Consequently, the results of 5′-RACE suggested the presence of three sites of transcription initiation as shown in Fig. 2C. The transcripts starting from exons 1a, 1b, and 1c were tentatively named a, b, and c, respectively. In transcripts a and b, exon 1b was completely covered by exon 1a, sharing ~170 bp, and it might consist of two different initiation sites. Transcript c started from the new exon 1c, used as an alternative to the two former transcripts. Although three possible transcription initiation sites were detected by RACE analysis, all the transcripts contained suitable Kozak sequences (24-26) prior to the ATG codon of exon 2, resulting in identical single peptide sequences. Although other ATG codons are also found in exons 1a and 1c, they are not in a favorable context for initiating translation.
RNase Protection Assay-To confirm the results from 5′-RACE analysis, an RNase protection assay using three cRNA probes surrounding exons 1a, 1b, and 1c was performed (Fig. 3A). Total RNA was prepared from the melanoma cell line SK-MEL-31, the astrocytoma cell line AS, and the pre-B cell lymphocytic leukemia cell line NALM6. SK-MEL-31 and AS cells express high levels of β1,4GalNAc-T mRNA, and NALM6 cells do not express this mRNA (15). As shown in Fig. 3B, with all three cRNA probes the protected fragments were detected in SK-MEL-31 and AS cells but not in NALM6 cells or yeast tRNA. The quality of RNA preparations used for RNase protection experiments was confirmed by Northern blotting using a β-actin probe (Fig. 3B, panel d). A 139-base segment in cRNA probe a was protected. A 204-base segment in cRNA probe b and a 130-base segment in cRNA probe c were also protected. Since RNA has less mobility than DNA of the same size on urea-polyacrylamide gels (27), we compared the size of the cRNA prepared using a known cDNA template with that estimated based on the sequencing ladder. The correct sizes were likely to be ~20% smaller than those obtained based on the sequencing ladder. According to these results, the sizes of the protected bands for exons 1a, 1b, and 1c were ~111, 163, and 104 bp, respectively. The sizes of the protected bands for exons 1a and 1b were almost in accordance with the results of 5′-RACE analysis, although the size for exon 1c was slightly larger than that calculated from 5′-RACE analysis.
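The size correction described above is simple arithmetic and can be reproduced with the following minimal Python sketch. It applies the ~20% reduction to the apparent sizes read from the DNA sequencing ladder; the variable and function names are illustrative, and the 20% figure is the approximate value given in the text rather than an exact calibration.

```python
# Apparent sizes (bases) of the protected fragments read against a DNA
# sequencing ladder, keyed by exon.
APPARENT_SIZES = {"1a": 139, "1b": 204, "1c": 130}

# RNA runs more slowly than DNA of the same length on urea-polyacrylamide
# gels; the text estimates the true size to be about 20% smaller.
MOBILITY_CORRECTION = 0.20


def corrected_size(apparent: int, correction: float = MOBILITY_CORRECTION) -> int:
    """Estimate the true RNA fragment length from its apparent ladder size."""
    return round(apparent * (1 - correction))


def corrected_sizes(sizes=APPARENT_SIZES):
    """Apply the correction to every protected fragment."""
    return {exon: corrected_size(size) for exon, size in sizes.items()}


if __name__ == "__main__":
    print(corrected_sizes())
```

With the values above, the sketch yields 111, 163, and 104 bases for exons 1a, 1b, and 1c, matching the corrected sizes reported in the text.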

Primer Extension Analysis
Since the protected band for exon 1c (Fig. 3B, panel c) was faint and the transcription initiation site estimated by RNase protection assay was ~20 bp upstream from the site estimated by 5′-RACE analysis, primer extension analysis was also performed to determine the transcription initiation site of exon 1c. As shown in Fig. 3C, the extension product obtained with the total RNA from SK-MEL-31 and AS cells was 66 nucleotides in length, corresponding to position −524 relative to the ATG codon. The results from primer extension analysis were almost in accordance with the results from 5′-RACE analysis.
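The position quoted above follows from the primer coordinates given under "Experimental Procedures." The primer is complementary to nucleotides −491 to −459, so its 5′-end pairs with nucleotide −459 and the 66-nucleotide product extends upstream from there. The minimal Python sketch below reproduces the calculation; the function name is illustrative, and it assumes all coordinates stay on the negative side of the ATG so that the absence of a position 0 does not matter.

```python
def extension_start(primer_5prime_pairs_with: int, product_length: int) -> int:
    """Template coordinate of the 5' end of the transcript, given the template
    position paired with the primer's 5' end and the product length.

    Assumes every coordinate involved is negative (upstream of the ATG).
    """
    # The product includes the primer itself, hence length - 1 steps upstream.
    return primer_5prime_pairs_with - (product_length - 1)


if __name__ == "__main__":
    start = extension_start(-459, 66)
    race_start = -520  # 5' end of exon 1c as defined by 5'-RACE
    print(start, abs(start - race_start))
```

The computed start, −524, lies 4 nucleotides upstream of the −520 end defined by 5′-RACE, which is the sense in which the two methods are "almost in accordance."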

Sequence Analysis of the 5′-Flanking Region
The sequence of the 5′-flanking region of the β1,4GalNAc-T gene is shown in Fig. 4. The three transcription initiation sites that were determined by 5′-RACE analysis and RNase protection assay are marked by arrows. In the 5′-flanking region of exon 1a, there are consensus binding sites for the transcription factors EGR-1, HNF-5, and Sp-1. In the 5′-flanking region of exon 1b, there are three binding sites for Sp-1 and one for AP-2. In the 5′-flanking region of exon 1c, one AP-2 site and two S1 HS sites are present. There is a TATA box at nucleotide −1730, although we did not detect another exon near the TATA box in the transcripts of SK-MEL-31.

Promoter/Enhancer Activity of the 5′-Flanking Region of the β1,4GalNAc-T Gene
As an initial test of whether the 5′-flanking regions of these initiation sites had promoter activity, we prepared constructs in which the 5′-flanking region (~1 kilobase upstream from exon 1a) or the 5′-flanking and intron 1 regions were inserted before a promoterless CAT gene in plasmid pSV00CAT (Fig. 5A). Construct pGTCAT−2228/−692 promoted a low level of CAT activity in SK-MEL-31 cells, but it was significantly higher than the background level in pSV00CAT-transfected cells. The CAT activity of pGTCAT−947/−692 was minimal in this system. A high level of CAT activity was detected using pGTCAT−2228/+156 or pGTCAT−947/+156, which contained the original intron 1 region in addition to the 5′-flanking region (Fig. 5B, panel a). Similar results were obtained when these pGTCAT constructs were transfected into AS cells (Fig. 5B, panel b). CAT activity was scarcely detectable when these constructs were transfected into MeWo cells, another melanoma line that does not express the β1,4GalNAc-T gene.

Promoter/Enhancer Activity of Transcripts a, b, and c
As shown in Figs. 2 and 3, at least three initiation sites were detected by 5′-RACE analysis and RNase protection assay. To examine the promoter activity associated with these three transcripts, the CAT constructs pGTCAT−2228/−948, pGTCAT−947/−692, and pGTCAT−662/−441 were prepared and examined by transfection into SK-MEL-31 cells. pGTCAT−2228/−948 showed activity almost equivalent to that of pGTCAT−2228/−692, and pGTCAT−662/−441 showed much higher activity. On the other hand, only background levels of CAT activity were detected with pGTCAT−947/−692 (Fig. 6A).
To examine the region essential for the promoter/enhancer activity of the β1,4GalNAc-T gene, several deletion constructs were prepared using pGTCAT−2228/+156, which showed high levels of CAT activity as shown in Fig. 5. CAT activity was measured following transfection of the constructs into SK-MEL-31, AS, and MeWo cells. As shown in Fig. 6B, plasmids pGTCAT−2228/+156 and pGTCAT−947/+156 promoted considerable CAT activity in SK-MEL-31 and AS cells, although the CAT activity decreased as the 5′-side was progressively shortened. These constructs did not show any CAT activity in MeWo cells. Constructs with further 5′-deletions showed very low levels of CAT activity even though they contained intron 1. However, pGTCAT−688/+156 and shorter constructs showed definite CAT activity not only in SK-MEL-31 and AS cells, but also in MeWo cells. pGTCAT−662/−441, in which the CAT gene was ligated to exon 1c, also promoted CAT activity in all three cell lines. The reverse construct, pGTCAT−441/−662, scarcely promoted CAT activity, suggesting the presence of promoter/enhancer activity upstream of exon 1c (data not shown).
Summarizing these results, promoter/enhancer activity was detected upstream of each exon (exons 1a, 1b, and 1c), although the 5′-flanking region of exon 1b (nucleotides −947 to −692) required the addition of the original intron 1 for definite activity. The regions between nucleotides −2228 and −947 and between nucleotides −662 and −441 showed apparent CAT activity by themselves. Particularly high promoter activity was detected in the latter.
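The relative CAT activities compared in this section are ratios: each raw CAT value is divided by the activity of the cotransfected reporter (luciferase or β-galactosidase) to correct for transfection efficiency, and the result is scaled so that a chosen reference construct equals 100. The Python sketch below illustrates that normalization. All numerical values in it are hypothetical placeholders, not data from this study, and the function name is an assumption.

```python
def relative_cat_activity(cat: dict, reporter: dict, reference: str) -> dict:
    """Normalize raw CAT activity by the cotransfected reporter activity and
    scale the results so that the reference construct equals 100."""
    normalized = {name: cat[name] / reporter[name] for name in cat}
    scale = 100.0 / normalized[reference]
    return {name: value * scale for name, value in normalized.items()}


if __name__ == "__main__":
    # Hypothetical raw readings for illustration only.
    raw_cat = {"pGTCAT-662/-441": 50.0, "pGTCAT-947/-692": 2.0}
    raw_reporter = {"pGTCAT-662/-441": 400.0, "pGTCAT-947/-692": 380.0}
    result = relative_cat_activity(raw_cat, raw_reporter, "pGTCAT-662/-441")
    for construct, value in result.items():
        print(f"{construct}: {value:.1f}")
```

Because every construct is divided by its own reporter reading before scaling, modest differences in transfection efficiency between dishes do not change the ranking of constructs.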

Chromosomal Mapping of the Human β1,4GalNAc-T Gene
FISH signals were consistently detected on chromosome band 12q13.3 among the 50 cells analyzed (Fig. 7).

DISCUSSION
Since cDNAs of a number of glycosyltransferase genes have been isolated, the genomic organizations of several glycosyltransferases have been characterized. These results revealed that there are two types of genomic structure. The first group is represented by human β1,2-N-acetylglucosaminyltransferase I (28) and human α1,3-fucosyltransferase (29,30) genes, in which the coding sequences are completely contained within a single large exon. The second group is represented by β1,4-galactosyltransferase (31), α2,6-sialyltransferase (9, 32), and α1,3-galactosyltransferase (33) genes, in which the coding sequences are scattered over several exons. The β1,6-N-acetylglucosaminyltransferase gene forming the core 2 O-glycan branch belongs to the former group (34), and the blood group A synthase gene is of the latter type (35). The β1,4GalNAc-T gene analyzed here is also of the latter type. As in many other glycosyltransferase genes, the presence of an untranslated 5′-exon and of one unusually long coding exon has also been recognized in this gene (5).
The β1,4GalNAc-T gene is expressed under various biological conditions. The gene is abundantly expressed in the brain tissues of vertebrates. We demonstrated that this gene is expressed at high levels in mouse brain at the late stage of development (17), when differentiation of neuronal cells proceeds and synapses are formed. Among malignant tumor cells, human cancer cells derived from the neural crest characteristically express the products of this gene. Almost all neuroblastoma cells express the β1,4GalNAc-T gene and GD2 (36), while malignant melanoma cells usually express this gene at the progressed and "vertical" phase (37). Furthermore, the expression of GD2 appears to be due to the transactivation of the β1,4GalNAc-T gene by human T cell lymphotropic virus type I p40 tax protein (16). These results suggest that this gene is regulated by several units of promoters and transcripts, provided there are no other glycosyltransferases catalyzing a similar reaction. This regulatory system of gene expression appears common among glycosyltransferase genes, as shown in β1,4-galactosyltransferase (13). The promoter units defined here should be differentially involved in the appropriate expression of this gene under the control of cell type-specific transcription factors. Among consensus binding sites for the transcription factors in the 5′-noncoding region of individual transcription initiation sites, EGR-1, also known as NGFI-A, Krox-24, and zif268, is especially abundant in mouse brain, thymus, lung, and heart (38-41). HNF-5 is a liver-specific DNA-binding protein, and HNF-binding sites are present in multiple regulatory sequences of other genes with liver-specific expression (42,43). AP-2 is also a transcription factor expressed in the neural crest lineage (44,45). The S1 HS site is the S1 nuclease-sensitive site in the epidermal growth factor receptor gene promoter associated with a DNA-binding protein (46). Which consensus sequences among these elements are really significant remains to be elucidated.
Fig. 6. A, the average of the CAT activity normalized for transfection efficiency by the luciferase activity derived from cotransfected pCEV/Luc was calculated from three independent experiments. The transfection efficiency was 380-460 when analyzed as described under "Experimental Procedures," and it did not essentially affect the relative CAT activity among constructs. An arbitrary value of 100 was given to the transcriptional activity resulting from the pGTCAT−662/−441 construct. B, constructs and CAT activity of the β1,4GalNAc-T/CAT deletion mutant clones in SK-MEL-31, AS, and MeWo cells. The structures of the deletion mutant plasmids are shown by indicating the 5′-ends with nucleotide numbers at the left. The experiments were repeated three times independently using SK-MEL-31, AS, and MeWo cells, and the results were reproducible. The relative CAT activity that was normalized for transfection efficiency by the luciferase activity derived from cotransfected pCEV/Luc is presented by shaded bars (SK-MEL-31 cells), hatched bars (AS cells), and solid bars (MeWo cells). The transfection efficiencies were ~315, 385, and 310 for AS, MeWo, and SK-MEL-31 cells, respectively. An arbitrary value of 100 was given to the transcriptional activity resulting from the pGTCAT−2228/+156 construct in SK-MEL-31 cells.
Alternative splicing between exons 1 (1a or 1b) and 1c should also be involved in the differential expression of this gene. As has been shown, the α2,6-sialyltransferase gene is regulated by complicated alternative splicing mechanisms (47,48). A few exons located in the 5′-noncoding region are alternatively used in a lineage- or differentiation-specific manner in B lymphoblastoid cell lines (48). The alternative usage of several exons coding the enzyme protein was also identified as a basis of tissue-specific molecular form and gene expression (47). The significance of the alternative usage of three exons in this gene also remains to be investigated.
The regulatory mechanisms for transcription from each initiation site of this gene appear to be very complicated. Promoter/enhancer activity between nucleotides −2228 and −810 paralleled the mRNA levels of the β1,4GalNAc-T gene in the three cell lines. The precise characterization of enhancer activity in intron 1 remains to be analyzed. Promoter/enhancer activity in the 5′-flanking region of exon 1c (nucleotides −688 to −634) was observed even in MeWo cells, although at a lower level than in the two other cell lines and although no expression of the gene was detectable in this cell line. These results suggest the possibility that nucleotides −810 to −688 have a regulatory activity, repressing downstream promoter activity. Whether such repression is a major mechanism for the lack of expression of the gene is a very interesting issue and is now under investigation in our laboratory.