Genomic Organization and Chromosomal Mapping of the Galβ1,3GalNAc/Galβ1,4GlcNAc α2,3-Sialyltransferase

In this report we describe the chromosome mapping and genomic organization of the human Galβ1,3GalNAc/Galβ1,4GlcNAc α2,3-sialyltransferase gene. The gene is localized to human chromosome 11(q23-q24) by in situ hybridization of metaphase chromosomes. It spans more than 25 kilobases of human genomic DNA and is distributed over 14 exons that range in size from 61 to 679 base pairs. Previous characterization of cDNAs encoding the Galβ1,3GalNAc/Galβ1,4GlcNAc α2,3-sialyltransferase revealed that the gene produces at least three transcripts in human placenta, which code for identical protein sequences except at the 5′ ends (Kitagawa, H., and Paulson, J. C. (1994a) J. Biol. Chem. 269, 1394-1401). Repeated screening for clones that contain the 5′ end of the cDNA has identified two additional distinct mRNAs that are expressed in human placenta. Comparison of the genomic DNA sequence with that of the five different mRNAs indicates that these transcripts are produced by a combination of alternative splicing and alternative promoter utilization. Northern analysis indicated that one of them is specifically expressed in placenta, testis, and ovary, indicating that its expression is independently regulated from the others.

Sialic acid-containing oligosaccharide structures found on glycoproteins and glycolipids are known to vary with species, tissue type, and stage of development. The structural diversity of these carbohydrates is believed to be used by the cell to mediate specific cellular recognition processes including protein targeting, cell adhesion, and cellular differentiation and development (Kornfeld, 1987; Rademacher et al., 1988; Paulson, 1989; Brandley et al., 1990; Varki, 1992; Powell and Varki, 1995). The high degree of structural diversity observed in the terminal glycosylation sequences of glycoprotein carbohydrates is generally believed to be specified by the glycosyltransferases produced by the cell. Accumulating evidence suggests that the regulated expression of these enzymes may account for the synthesis of cell type-specific carbohydrate structures (Kleene and Berger, 1993; Kitagawa and Paulson, 1994b; Natsuka and Lowe, 1994). Despite the growing number of glycosyltransferase cDNAs that have been cloned, limited information is available concerning the organization and regulation of the expression of glycosyltransferase genes (Joziasse, 1992; Kleene and Berger, 1993).
In this report we have examined the gene for the human Galβ1,3GalNAc/Galβ1,4GlcNAc α2,3-sialyltransferase. The cDNA has recently been cloned from the human melanoma cell line WM266-4 and from human placenta, and a partial characterization of the human gene has recently been reported (Sasaki et al., 1993; Kitagawa and Paulson, 1994a; Chang et al., 1995). The gene was found to span more than 25 kb and to produce at least five distinct transcripts in human placenta. Northern analysis indicated that one of them is specifically expressed in placenta, testis, and ovary. The results suggest that the human α2,3-sialyltransferase gene is expressed in a tissue-specific manner through a combination of alternative splicing and alternative promoter utilization. Finally, we document that this gene and the human Galβ1,3(4)GlcNAc α2,3-sialyltransferase gene, which has the highest homology to this gene, reside on different human chromosomes, 11q23-q24 and 1p34-p33, respectively.

* This work was supported in part by United States Public Health Service Grant GM27904.

EXPERIMENTAL PROCEDURES
Isolation of Human Genomic Clones-Human genomic DNA (Clontech) was partially digested with Sau3AI and then ligated into XhoI-digested λFIXII (Stratagene). The resultant library was packaged using a Stratagene Gigapack II XL packaging extract and plated on E. coli XL1-Blue MRA(P2) (Stratagene). Approximately 1 million plaques were screened with radiolabeled sialyltransferase cDNA as described (Kitagawa and Paulson, 1994a). Multiple clones (C1-C21) were isolated, and three of them, C1, C11, and C21, were characterized in detail. Insert DNA fragments were initially characterized by restriction digestion and Southern blot analysis (Sambrook et al., 1989). Human genomic DNA fragments that hybridized to the sialyltransferase cDNA probes were subcloned into Bluescript plasmid vectors. Nucleotide sequencing was carried out on double-stranded templates using a Sequenase sequencing kit (United States Biochemical Corp.).
Northern Analysis-Multiple tissue Northern blots of poly(A)+ RNAs were purchased from Clontech Laboratories for the analysis. The blots were probed with a gel-purified, radiolabeled (>1 × 10^9 cpm/μg), 1.3-kb EcoRI fragment isolated from STZ-2 (Kitagawa and Paulson, 1994a). The type A1 (former Long A) form-specific fragment (191 bp) was generated by PCR using the 5′ primer, 5′-CTTCATCTTGAAGGACAGTGG-3′, and the 3′ primer, 5′-CTTCCAGCCTGCAGGACACAT-3′. The PCR reaction was carried out with Pfu polymerase for 30 cycles of 95°C for 45 s, 55°C for 45 s, and 73°C for 90 s. The PCR fragment was purified, radiolabeled (>1 × 10^9 cpm/μg), and then used as a probe for Northern blots. The hybridized blots were washed at room temperature in 2× SSC, 0.1% SDS for 10 min, then twice at 65°C in 0.1× SSC, 0.1% SDS for 20 min, and exposed to x-ray film for several days.
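As a quick sanity check on the two primers quoted above, the following minimal Python sketch computes their length, GC content, and a rough melting temperature. The Wallace rule used here (2°C per A/T, 4°C per G/C) is a rule of thumb introduced only for illustration; the paper does not state how the annealing temperature was chosen, and the function names are hypothetical.

```python
# Illustrative only: primer sequences are taken from the text above;
# the Wallace-rule Tm estimate is an assumption, not the authors' method.

PRIMERS = {
    "five_prime": "CTTCATCTTGAAGGACAGTGG",
    "three_prime": "CTTCCAGCCTGCAGGACACAT",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_count(seq: str) -> int:
    """Number of G and C bases in the sequence."""
    seq = seq.upper()
    return seq.count("G") + seq.count("C")


def wallace_tm(seq: str) -> int:
    """Rough melting temperature in degrees C: 2*(A+T) + 4*(G+C)."""
    gc = gc_count(seq)
    at = len(seq) - gc
    return 2 * at + 4 * gc


def reverse_complement(seq: str) -> str:
    """Reverse complement, i.e. the primer as it reads on the sense strand."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, len(seq), gc_count(seq), wallace_tm(seq))
    print(reverse_complement(PRIMERS["three_prime"]))
```

Under this rough estimate both primers melt above the 55°C annealing step used in the protocol, which is consistent with the cycling conditions reported.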
In Situ Chromosome Hybridization-In situ hybridization was carried out on chromosome preparations obtained from phytohemagglutinin-stimulated human lymphocytes cultured for 72 h. 5-Bromodeoxyuridine was added for the final 7 h of culture (60 μg/ml medium) to ensure posthybridization chromosomal banding of good quality. The clones STZ-1 (Kitagawa and Paulson, 1994a) and ST3NHP-1 (Kitagawa and Paulson, 1993) were used as probes, respectively. The conditions for labeling of probes and for hybridization and washing were as described previously (Nguyen et al., 1986). After coating with nuclear track emulsion (Kodak NTB2), the slides were exposed for 18 days at 4°C and were developed. To avoid any slipping of silver grains during the banding procedure, chromosome spreads were first stained with a buffered Giemsa solution, and metaphases were photographed. R-banding was then performed by the fluorochrome-photolysis-Giemsa method, and metaphases were rephotographed before analysis.

RESULTS
Identification of the Novel Isoforms of the Human Galβ1,3GalNAc/Galβ1,4GlcNAc α2,3-Sialyltransferase-The cDNAs of Galβ1,3GalNAc/Galβ1,4GlcNAc α2,3-sialyltransferase were reported previously to consist of at least three isoforms in human placenta (Kitagawa and Paulson, 1994a). To identify the 5′ ends of the Galβ1,3GalNAc/Galβ1,4GlcNAc α2,3-sialyltransferase transcripts and possibly identify other mRNA isoforms that differ in the 5′ end, we employed a rapid amplification of cDNA ends-polymerase chain reaction (RACE-PCR) cloning strategy combined with reverse transcription polymerase chain reaction. Human placenta poly(A)+ RNA was reverse-transcribed using a primer common to the three isoforms cloned previously (type A1, type A2, and type B; see "Experimental Procedures"). PCR amplification was then performed using the anchor primer and each isoform-specific internal primer, revealing five distinct PCR products. Two bands were identified with the type A1 or type A2 form-specific primers, respectively, and three bands were detected with the type B form-specific primer. After subcloning the PCR products and sequencing individual clones, we found two novel isoforms related to the previously described type B form (Kitagawa and Paulson, 1994a); these differ in their 5′ noncoding sequences and are referred to as type B2 and type B3. The nucleotide sequences of the nonhomologous region at the 5′ end of the three type B forms are presented in Fig. 1. These two additional type B transcripts indicate that the human α2,3-sialyltransferase gene produces at least five transcripts in human placenta.
Isolation and Characterization of the Human α2,3-Sialyltransferase Gene-To gain information about the organization and regulation of the Galβ1,3GalNAc/Galβ1,4GlcNAc α2,3-sialyltransferase gene in human tissues, it was necessary to isolate the genomic sequences containing this gene. A cDNA (type A1) was used as a probe to screen a human placenta genomic DNA library. Several independent clones were isolated and subjected to Southern blot analysis. Three were found to overlap with each other and contained all of the exons corresponding to the type A1 form of the α2,3-sialyltransferase cDNA. As summarized in Fig. 2, the coding sequences for the type A1 and type A2 protein forms of the human α2,3-sialyltransferase gene are divided among 9 exons, and that for the type B protein form among 10 exons; the exons range in size from 61 bp to 679 bp. Exons E3 and E6-E14 contain the coding sequence indicated in Fig. 2. Exon E14 also contains the 3′-untranslated region, which includes the poly(A) attachment site, ATTAAA. The locations of the α2,3-sialyltransferase exons were determined by sequencing of the genomic clones. Sequencing was also used to determine the exact size of the exons, as shown in Table I, and the sequence of the intron/exon junctions, as shown in Table II. All the intron/exon junctions were found to follow the GT/AG rule and were flanked by conserved sequences (Breathnach and Chambon, 1981). The sequence encoded by these exons is identical with the sequence of the human α2,3-sialyltransferase cDNA reported by us (Kitagawa and Paulson, 1994a). Sasaki et al. (1993) have also cloned the type B1 form cDNA from the human melanoma cell line WM266-4 by expression cloning with a cytotoxic lectin. However, their cDNA lacks 12 bp, from nucleotide position 87 to 108, in the stem region of the type B1 form reported by us (Kitagawa and Paulson, 1994a).
The sequence of the cDNA at the boundary of the 12-bp deletion matches the consensus splice donor site sequence. From both this observation and our analysis of the intron/exon structure of the gene, it became apparent that alternative splicing of the mRNA takes place within exon E6. Although we have screened human placenta cDNA extensively, this type of isoform has not been detected. Although no genomic clones were identified that contained the 5′-most exon (exon E1) of the type B1 form, restriction mapping by Southern blot analysis indicates that the exon lies more than 15 kb upstream of exon E2 (data not shown).
In summary, the analysis suggests that the entire α2,3-sialyltransferase gene spans over 25 kb of human genomic DNA. It should be noted that, in contrast to the genomic organization of the α2,6-sialyltransferase (Svensson et al., 1990), the highly conserved sialylmotif, used to clone this α2,3-sialyltransferase cDNA (Kitagawa and Paulson, 1994a), is divided between two exons, E10 and E11 (Fig. 2). In addition, the unique sequences found at the 5′ ends of the type B2 and type B3 forms (see Fig. 1) were mapped between exon E1 and exon E3, indicating that these forms were produced by alternative promoter utilization (Fig. 2). Exon E2, which encodes the 5′ end of the type B2 form, was located 397 bp upstream of exon E3. The 5′-most transcriptional start site of the type B3 form was located only 44 bp upstream of exon E3, and the mRNA was formed through the 3′ end of exon E3 without splicing. These results suggest that the five mRNA isoforms are produced by a combination of alternative splicing and alternative promoter utilization and, consequently, that the mRNAs are formed from combinations of the 14 exons of the α2,3-sialyltransferase gene.
Figure legend (fragment): Shown below are the splicing patterns for each type of message described here. Abbreviations for each message are used in the text. In the previous publication (Kitagawa and Paulson, 1994a), types A1, A2, and B1 were referred to as Long A, Short, and Long B, respectively.

TABLE I
Exons of the human α2,3-sialyltransferase gene. For each exon, the cDNA residues and positions of the amino acid sequence predicted by the open reading frame are given as numbered in Fig. 2 and Fig. 3 by Kitagawa and Paulson (1994a).

Analysis of the Transcriptional Start Sites and the Sequence of the α2,3-Sialyltransferase Promoter Region-The sequences of the 5′-flanking regions of the type A1, type A2, and type B3 forms and of the type B2 form of the α2,3-sialyltransferase gene are shown in Figs. 3 and 4, respectively. The transcriptional start sites were determined by sequencing the RACE-PCR products as described above and are marked by arrows in the figures. Sequencing of the PCR products of the type A1, type A2, type B2, and type B3 forms also revealed that transcription of the former two forms, type A1 and type A2, initiates at two positions and that of the latter two forms, type B2 and type B3, initiates at several positions, indicated by arrows in Figs. 3 and 4. As expected from the observation that there are multiple sites of transcription initiation, both of the 5′-flanking regions lack canonical TATA or CCAAT boxes but do contain several other well characterized promoter elements, as shown in Tables III and IV. As shown in Fig. 3 and Table III, this region contains six sequence motifs similar to the AP2 recognition element, of which four are just upstream of the transcriptional start sites of the type B3 form and two are just upstream of those of the type A1 and type A2 forms. In addition, three potential mammary cell activating factor (MAF) consensus sequences (Mink et al., 1992) were identified; one potential Sp1 consensus sequence (Kadonaga et al., 1986) was found; and four ETF consensus sequences, which stimulate transcription of promoters lacking TATA boxes (Kageyama et al., 1989), were identified. Moreover, two LF-A1 consensus sequences (Hardon et al., 1988), one HLH consensus sequence (Blackwell and Weintraub, 1990), and one NF-1-like protein binding site were found. As shown in Fig. 4 and Table IV, the 5′-flanking region of the type B2 form also contains three sequence motifs similar to the MAF recognition element, one sequence similar to the AP1 binding site, four sequences similar to the AP2 binding site, seven ETF consensus sequences, one HLH consensus sequence, and one NF-1-like protein binding site. Moreover, three additional sequence motifs were detected. Three CArG consensus binding sites, a sequence motif required for expression of smooth muscle-specific genes (Reddy et al., 1990), are present; one OCT (octamer binding transcription factor) consensus binding site (Cox et al., 1988), a sequence motif recognized by octamer-related proteins that have been implicated in the control of the histone 2b gene and the melanocyte-specific tyrosinase-related protein TRP1 (Lowings et al., 1992), was identified; and one PEA3 consensus sequence is also present (Faisst and Meyer, 1992).
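Consensus elements of the kind listed above are written in a degenerate notation such as G(A/G)(A/G)G(C/G)AAG(G/T). The following minimal Python sketch shows how a pattern in that notation could be turned into a regular expression and scanned against a 5′-flanking sequence. It is only an illustration: the paper does not describe the search procedure used, the function names are hypothetical, the toy sequence is invented rather than taken from Figs. 3 or 4, and only the forward strand is searched.

```python
import re


def consensus_to_regex(consensus: str) -> str:
    """Convert '(A/G)'-style alternatives into regex character classes."""
    return re.sub(
        r"\(([ACGT](?:/[ACGT])+)\)",
        lambda m: "[" + m.group(1).replace("/", "") + "]",
        consensus.upper(),
    )


def find_sites(consensus: str, sequence: str):
    """Return (0-based start, matched site) pairs, allowing overlapping hits."""
    # The lookahead wrapper lets overlapping occurrences be reported.
    pattern = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


# Degenerate pattern as it appears in the text; the sequence below is a
# made-up example, not genomic sequence from the paper.
CONSENSUS = "G(A/G)(A/G)G(C/G)AAG(G/T)"
TOY_SEQUENCE = "TTGAGGCAAGTCCGGGGGAAGGAA"

if __name__ == "__main__":
    print(consensus_to_regex(CONSENSUS))
    for start, site in find_sites(CONSENSUS, TOY_SEQUENCE):
        print(start, site)
```

A fuller search would also scan the reverse complement and tolerate mismatches, since the text speaks of motifs "similar to" each element rather than exact matches.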
Tissue-specific Expression of the Type A1 Form-As described in the previous paper (Kitagawa and Paulson, 1994a), the α2,3-sialyltransferase exhibits a unique tissue-specific pattern of expression. To determine whether the α2,3-sialyltransferase transcripts are expressed in a tissue-specific manner by a combination of alternative splicing and alternative promoter utilization, Northern blots with mRNAs from human adult and fetal tissues were probed with the type A1 form-specific fragment (Fig. 5b). For comparison, the same Northern blot was probed with a full-length cDNA probe of the α2,3-sialyltransferase, shown in Fig. 5a, which should detect all five transcripts. This result indicates that the type A1 form mRNA is specifically expressed in placenta, ovary, and testis. For the three type B forms, however, the cDNA sequence specific for each type is relatively short, so it was not possible to obtain a visible signal.

Table fragment: MAF consensus, G(A/G)(A/G)G(C/G)AAG(G/T)
Chromosomal Mapping of Two Human α2,3-Sialyltransferase Genes-Using a cDNA probe of the Galβ1,3GalNAc/Galβ1,4GlcNAc α2,3-sialyltransferase, in situ hybridization to normal metaphase chromosomes was performed to determine the chromosomal localization of the human gene. Of 100 metaphase cells examined from this hybridization, 197 silver grains were associated with chromosomes, and 42 of these (21.3%) were located on chromosome 11. The distribution of grains on this chromosome was not random, and 83.3% of them were mapped to the q23-q24 region of the chromosome 11 long arm (Fig. 6a). These results clearly indicate that the gene is located at human chromosome 11q23-q24.
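The grain counts above can be rechecked with a few lines of arithmetic. The Python sketch below recomputes the reported percentages and, purely as an illustration, asks how likely 42 or more grains on chromosome 11 would be if grains fell on chromosomes in proportion to their size. The 4.4% share assigned to chromosome 11 is an approximate assumption introduced here, not a value from the paper, and the binomial model is not the authors' analysis.

```python
from math import comb

# Counts reported in the text.
TOTAL_GRAINS = 197
CHR11_GRAINS = 42
Q23_Q24_SHARE = 0.833  # fraction of chromosome 11 grains in q23-q24

# Assumption for illustration only: chromosome 11 as roughly 4.4% of the genome.
ASSUMED_CHR11_FRACTION = 0.044


def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


def binomial_upper_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed exactly term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


chr11_percent = percent(CHR11_GRAINS, TOTAL_GRAINS)
band_grains = round(Q23_Q24_SHARE * CHR11_GRAINS)  # grains implied in q23-q24
expected_chr11 = TOTAL_GRAINS * ASSUMED_CHR11_FRACTION
tail_probability = binomial_upper_tail(
    TOTAL_GRAINS, CHR11_GRAINS, ASSUMED_CHR11_FRACTION
)

if __name__ == "__main__":
    print(chr11_percent, band_grains, round(expected_chr11, 1), tail_probability)
```

The reported 83.3% corresponds to 35 of the 42 chromosome 11 grains. Under the stated size assumption fewer than 9 grains would be expected on chromosome 11 by chance, so the observed 42 represents a strong enrichment.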

DISCUSSION
The gene for human Galβ1,3GalNAc/Galβ1,4GlcNAc α2,3-sialyltransferase is distributed over 14 exons that span at least 25 kb of genomic sequence. Transcription of this gene results in the production of five distinct mRNAs (type A1, type A2, type B1, type B2, and type B3 forms) in human placenta, each approximately 2.0 kb in size, that are generated by a combination of alternative splicing and alternative promoter utilization. Translation of these individual mRNAs predicts the biosynthesis of three related protein isoforms of the α2,3-sialyltransferase, previously referred to as the Long A, Long B, and Short forms, of 332, 333, and 322 amino acids, respectively, which has been confirmed by in vitro translation. Structurally, these three protein isoforms differ from each other only at their N termini, which comprise the cytoplasmic tail and part of the transmembrane domain (Kitagawa and Paulson, 1994a). The biological significance of the three different protein isoforms is presently unclear.
Multiple transcripts, as found for this α2,3-sialyltransferase gene, have also been observed for other glycosyltransferase genes, including those of the α2,6-sialyltransferase and the β1,4-galactosyltransferase (Russo et al., 1990; Wen et al., 1992a; Aasheim et al., 1993; Wang et al., 1993). In the case of the α2,6-sialyltransferase, at least six different transcripts are produced via alternative splicing and alternative promoter usage. The best characterized is a 4.3-kb mRNA found almost exclusively in the liver (Wen et al., 1992a), which is generated from six exons of the gene (Svensson et al., 1990). Two distinct forms of a 4.7-kb mRNA have been identified, one highly expressed in B-cells and another expressed at low levels in most tissues (Aasheim et al., 1993; Wang et al., 1993). These two transcripts are produced from the same six exons as the 4.3-kb mRNA with the addition of one or two 5′-untranslated exons (Aasheim et al., 1993; Wang et al., 1993). Thus, these three transcripts have identical coding sequences but different 5′-untranslated sequences. Since the coding sequences are identical, the different mRNA species in this case are a consequence of the cell type-specific regulation of the expression of this complex gene. In addition, three forms of a 3.6-kb mRNA have been isolated from rat kidney. Although these transcripts are generated from the α2,6-sialyltransferase gene, they retain less than 50% of the coding region and do not have sialyltransferase activity (Wen et al., 1992a; Harduin-Lepers et al., 1993). Moreover, they have been detected only in the kidney in rat and not in human (Kitagawa and Paulson, 1994b).
In the case of the α2,3-sialyltransferase, as shown in this report, at least five transcripts are produced from a single gene locus by a combination of alternative splicing and alternative promoter usage, each of which codes for an identical protein, Galβ1,3GalNAc/Galβ1,4GlcNAc α2,3-sialyltransferase, except at the 5′ ends. Why are alternative promoters and alternative splicing used for the production of five mRNAs coding for the same enzyme? Northern analysis of their tissue distribution demonstrated that one of the α2,3-sialyltransferase mRNA isoforms, the type A1 form, is specifically expressed in placenta, testis, and ovary, although one or another of the isoforms is constitutively expressed in all the tissues examined (Fig. 5). This pattern of expression is likely a consequence of differential regulation at the level of transcription, as demonstrated for the α2,6-sialyltransferase and the β1,4-galactosyltransferase (Svensson et al., 1992; Harduin-Lepers et al., 1993).
The sequence analysis of the 5′-flanking regions of the α2,3-sialyltransferase isoforms revealed heterogeneous transcriptional start sites and the absence of typical TATA and CCAAT boxes, coupled with the presence of GC boxes (Figs. 3 and 4). These structural features are believed to be typical of the so-called housekeeping genes, which are expressed at low levels in essentially all tissues (Kadonaga et al., 1986), suggesting that its regulation would be governed, at least in part, by the Sp1 binding sites, like that of the β1,4-galactosyltransferase (Harduin-Lepers et al., 1993). Further work is required to confirm this mechanism.