Comparative Analysis of Retroviral and Native Promoters Driving Expression of β1,3-Galactosyltransferase β3Gal-T5 in Human and Mouse Tissues*

β1,3-Galactosyltransferase β3Gal-T5 is highly expressed in the colons of humans and certain primates due to a retroviral long terminal repeat (LTR) acting as a strong promoter. Because this promoter is inactive in other human tissues or mice, we attempted to understand how adoption of a retrotransposon allowed the gene to acquire tissue-specific expression. We identified three novel 5′-UTRs of β3Gal-T5 mRNA, types A, B, and C, and found widespread expression of the type A transcript at much lower levels than the LTR transcript, the expression of which is restricted to organs of the gastrointestinal tract. Expression of the type C 5′-UTR transcript was mostly restricted to the ileum, where it was expressed at high levels. We cloned the 5′-flanking regions of both types A and B 5′-UTRs, found deletion constructs functionally active as promoters, and identified CCAAT-binding factor (CBF) and hepatocyte nuclear factor 1 (HNF-1) as the principal nuclear factors controlling the promoters of types A and B 5′-UTR transcripts, respectively. The CCAAT-binding factor binding site and the entire downstream sequence driving the expression of type A transcripts in humans are structurally and functionally conserved in mice, where they constitute a unique β3Gal-T5 promoter that appears to be the ancestral promoter of the gene. The HNF-1 binding motif of the second human promoter is identical to the HNF-1/Cdx binding motif of the LTR promoter but is in the antisense orientation, resulting in much lower binding affinity and promoter strength. These data may explain the successful insertion of the transposon during evolution.

β1,3-Galactosyltransferase 5 (β3Gal-T5) is a late-acting glycosyltransferase responsible for the synthesis of type 1 chain carbohydrates, including Lewis antigens, in human epithelial cells (1,2). β3Gal-T5 is highly expressed in human colon mucosa and down-regulated in colon cancers (3), and large amounts of its reaction products are found in the bile and small intestine (4,5). In some colon cancer cells, the first exon (exon 1) in the 5′-UTR of the β3Gal-T5 transcript is under the control of a promoter located ~150 bp upstream and regulated primarily by the transcription factors HNF-1 and Cdx (6). Interestingly, the entire exon 1, as well as the HNF-1/Cdx binding motif, belongs to a retroviral long terminal repeat (LTR) sequence that is the dominant promoter for the β3Gal-T5 gene in the colon but is inactive in other tissues where β3Gal-T5 is expressed (7).
Transposable elements within eukaryotic genomes from plants (8) to mammals (9) greatly affect gene expression at both the protein (10) and regulatory sequence (11) levels. The most common transposable elements in the human genome are of retroviral origin (retroelements). Retroelements often directly donate transcriptional regulatory signals that may be either excluded from some genes or taken up by others. The former group probably includes highly conserved genes with basic functions, whereas the latter likely includes gene classes that have recently expanded, such as those involved in cellular responses to external stimuli (12). The glycosylation machinery includes gene products from both classes; core enzymes probably belong to the former class (13), whereas terminal glycosyltransferases probably belong to the latter (14). In addition to β3Gal-T5, a retroviral LTR has been detected in the 5′-flanking region of two other glycosyltransferases, a fucosyltransferase and a sialyltransferase (12), but little is known about their promoter activity.
With respect to β3Gal-T5, the same investigators found an alternative 5′-UTR that is expressed in brain whose first exon, exon −3, is located >100 kb upstream of the coding sequence (7). This 5′-UTR also contains two additional exons, −2 and −1, not present in the LTR-derived 5′-UTR. More recently, they also demonstrated that the β3Gal-T5 LTR promoter is active in certain non-human primates but not in the mouse genome, thus dating the transposon insertion to ~25-30 million years ago (15).
It is not clear whether the brain 5′-UTR transcript is tissue-specific, whether other 5′-UTR promoters for the β3Gal-T5 gene exist in humans, whether the mouse ortholog of the gene has one or more 5′-UTRs, or whether the mouse ortholog shares promoters with the human transcripts. Assuming that the LTR sequence was active at the time of insertion (15), clarifying these aspects would reveal the putative context of β3Gal-T5 expression at the time of LTR insertion and thereby help explain why the insertion was tolerated and what stabilized the mobile element. The present paper is aimed at identifying the native β3Gal-T5 promoter(s) present in the human genome, with particular attention to that (if any) common to the mouse genome, and at detecting features in native promoter(s) that may explain stabilization of the retroviral promoter. To address these issues, we studied the 5′-UTR of β3Gal-T5 transcripts through rapid amplification of cDNA ends (RACE) analysis of RNA isolated from different human sources and determined their relative expression in various tissues and cell lines using competitive reverse transcription (RT)-PCR. We then focused on the 5′-flanking region of novel 5′-UTR transcripts, trying to identify sequences active as promoters using the luciferase reporter assay and then binding sites for transcription factors through electrophoretic mobility shift assay (EMSA). We also compared the sequences of the novel 5′-flanking regions with that flanking the mouse β3Gal-T5 transcript, looking for possible homologies. Because we found a putative β3Gal-T5 promoter apparently conserved in both species as well as a human sequence poorly active as a promoter but able to bind the same transcription factors as the LTR promoter, we studied such sequences in detail.
The results suggest that we found the ancestral mammalian promoter of the gene as well as an apparently defective promoter probably related to the successful insertion of the transposable element in some primates.

EXPERIMENTAL PROCEDURES
Cell Lines, Tissues, and RNAs-Human breast cancer cell lines MCF-7 and MDA-MB-231 (a gift of Dr. Fabio Dall'Olio, University of Bologna, Italy) and the pancreas carcinoma cell line Capan-2 (American Type Culture Collection HTB 80) were cultured in Dulbecco's modified Eagle's medium containing 10% fetal bovine serum, 100 units/ml penicillin, 1 mg/ml streptomycin, and 2 mM L-glutamine. Human bile duct carcinoma cells HuCC-T1 (Japanese Collection of Research Bioresources 0425) were cultured in RPMI 1640 medium containing the same supplements as above. Human colon adenocarcinoma cells Caco-2 and SW-1116 and human gastric carcinoma cells MKN-45 were cultured as previously described (3,16). Human ileum and colon samples were collected at surgery and kindly donated by Dr. Filippo Mare (Hospital di Circolo, Varese, Italy). Total RNA from human tissues was obtained from Ambion (mammary gland), Stratagene (colon, stomach, duodenum, pancreas, mammary gland, thymus, and trachea), or Clontech (stomach, pancreas, and mammary gland). Mouse total RNA was purchased from Clontech. Total and poly(A)+ RNAs were prepared from human tissues or cell lines as previously reported (3,16).
5′-RACE-RNA ligase-mediated RACE was performed essentially according to the manufacturer's instructions with the FirstChoice RLM RACE kit (Ambion). Each reaction, containing RNase inhibitor, was treated with phenol/chloroform and then mixed with 1 μl of 20 mg/ml glycogen (Fermentas), and the RNA was precipitated with sodium acetate/ethanol. Five μg of total RNA or 0.125 μg of poly(A)+ RNA was used as the starting material. A tobacco acid pyrophosphatase reaction was performed on one-half of the calf intestinal phosphatase-treated RNA, and the whole amount was ligated to an RNA adaptor. One-fourth of the obtained material was then used for individual RTs in a final volume of 5.0 μl using either avian myeloblastosis virus or Moloney murine leukemia virus reverse transcriptase (GE Healthcare). Moloney murine leukemia virus RTs were incubated 20 min at 37°C, 40 min at 42°C, and 20 min at 45°C. Avian myeloblastosis virus RTs were incubated 40 min at 42°C, 20 min at 45°C, and 20 min at 50°C. The coding sequence-specific RT primer was 5′-AGGAGGAAGAATGTC-3′, and the exon −1-specific RT primer was 5′-CTTACTAGCCAATAAC-3′. Amplifications (35 cycles) were performed in 25 μl with LA Taq (Takara) according to the manufacturer's recommendations, with 4 μl of the RTs as the template. Nested PCRs (30 cycles) were performed using 0.4-2.0 μl of the first amplifications and inner primers designed to contain restriction sites. Amplified fragments were column-cleaned, digested with appropriate restriction enzymes, gel-isolated, and cloned into the pGL3 vector for sequencing.
RT-PCR-Competitive RT-PCR was performed as previously described (3,16). The human and mouse β-actin and human β3Gal-T5 coding sequence competitors and oligonucleotide primers used were those previously described (3). To quantify the different β3Gal-T5 5′-UTRs, a competitor was prepared by subcloning the entire sequence of the short splicing variant of the type A 5′-UTR in the HindIII-XhoI sites of the pCDM8 vector. Subsequently, a double-stranded oligonucleotide with NdeI/HindIII ends, carrying the primer sequences for exons 1C and 1B, was ligated into the corresponding cohesive ends of the construct. Its two strands were 5′-TATGAAGGGCATTGGAGACCCAGGGACCGCCTGCCCATTACAGTGACTCACA (upper strand) and 5′-AGCTTGTGAGTCACGTAATGGGCATGGCGGTCCCTGGGTCTCCAATGCCCTTCA (lower strand). The resultant construct was truncated by PdmI digestion and self-ligated. The LTR competitor was prepared by PdmI digestion and self-ligation of a plasmid with the LTR transcript cloned in the pCDM8 vector. Other exon-specific upper strand primers were 5′-CTGCCGGGGCTGCCCCGCGC (exon 1A), 5′-CCTGCCCTTGGACATCAGAGCTGC (exon 1), and 5′-CAAACCAGAGGTTCCTCTTACCCAGC (exon 3); the lower strand primer for the coding sequence was 5′-GAATGGTGGGTACCTGTCCCACGG. Mouse β3Gal-T5 cDNA was obtained by PCR and cloned in the pCDM8 vector using an upper strand primer that annealed to the mouse first exon (5′-CGTCCCCTGCCCTGTCCCTTCTC) and a lower strand primer that annealed to the end of the coding sequence (5′-GGTCCTAGGTGCCCCCAGGGTCC). The upper strand primer for the mouse ortholog of exon 3 was 5′-GTGGGACCTCACTCACCGGCTGC.
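As an illustration of how band intensities from a competitive RT-PCR can be turned into amounts, the following Python sketch fits a log-log standard curve and normalizes the result to β-actin. It is only a sketch under stated assumptions: the exact quantification procedure is described in the cited references (3,16), and the function names, the log-log linear model, and all numeric values below are hypothetical.

```python
import math


def fit_standard_curve(known_amounts, band_ratios):
    """Least-squares fit of log10(target/competitor ratio) vs log10(amount).

    Assumption: the standard curve is linear on a log-log scale.
    Returns (slope, intercept).
    """
    xs = [math.log10(a) for a in known_amounts]
    ys = [math.log10(r) for r in band_ratios]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("standards must span more than one amount")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def amount_from_ratio(band_ratio, slope, intercept):
    """Invert the standard curve to estimate the target amount in a sample."""
    if slope == 0:
        raise ValueError("flat standard curve cannot be inverted")
    return 10 ** ((math.log10(band_ratio) - intercept) / slope)


def normalize_to_actin(target_fg, actin_pg):
    """Express the target as fg per pg of beta-actin."""
    return target_fg / actin_pg


# Hypothetical standards: amounts in fg and densitometric band ratios.
slope, intercept = fit_standard_curve([0.1, 1.0, 10.0], [0.1, 1.0, 10.0])
sample_fg = amount_from_ratio(5.0, slope, intercept)
sample_per_actin = normalize_to_actin(sample_fg, actin_pg=2.0)
```

With these made-up standards the curve has unit slope, so a band ratio of 5 maps to about 5 fg of target, or about 2.5 fg/pg of β-actin.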
Luciferase Assay-A DNA fragment encompassing the 5′-flanking region of β3Gal-T5 at positions −808 to +148 relative to the beginning of exon 1A was amplified by PCR (30 cycles) using LA Taq and GC II buffer (Takara) following the manufacturer's recommendations. The 25-μl reaction contained 150 ng of human placenta genomic DNA as the template and primers having restriction sites at their 5′-ends. The resultant fragment was cloned in the corresponding sites of the vector pGL3 (Promega) upstream of the firefly luciferase coding sequence. Deletion constructs were prepared by PCR (25 cycles) under similar conditions with the proper primer pairs but using the plasmid pGL3-808+148 (5 ng/reaction) as the template. The 5′-flanking region of the mouse ortholog was prepared similarly using corresponding primers. Site-directed mutagenesis of the CCAAT-binding factor (CBF) sites was performed using Pfu Turbo DNA polymerase (Stratagene) under the conditions described in the QuikChange mutagenesis kit protocol. A DNA fragment encompassing the 5′-flanking region of β3Gal-T5 from positions −1011 to +20 relative to the beginning of exon 1B, as well as the cognate deletion constructs, were obtained and cloned in pGL3 using the same approach but under standard LA Taq reaction conditions.
For transfection, host cells were plated 20 h in advance in 96-well plates with 50,000 cells/well in 0.1 ml of culture medium. Transfection solutions were prepared by mixing 300 ng of test DNA with 15 ng of the Renilla luciferase expression vector pRL-CMV (Promega) in 25 μl of serum-free medium and then adding 25 μl of serum-free medium containing 0.9 μl of Lipofectamine (Invitrogen). Liposomes were allowed to form for 20 min at room temperature. Cells were washed twice with serum-free medium, fed with transfection solution, and incubated under regular growing conditions for 3 h. After the addition of 0.1 ml of standard medium, the incubation was continued for 20 h. The cells were then washed with phosphate-buffered saline and lysed with 40 μl of Passive Lysis buffer (Promega). The luciferase activity of lysate aliquots was assayed using the dual luciferase reporter system (Promega).
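The normalization behind a dual luciferase readout can be sketched as follows: firefly activity is divided by the Renilla activity of the same well, and the resulting ratio is expressed as fold over a promoterless background construct. This is a minimal Python sketch, not the authors' analysis; the function names and the luminometer readings are hypothetical, and the use of a promoterless vector as background is an assumption consistent with the negative control described in the text.

```python
from statistics import mean, stdev


def relative_activity(firefly, renilla):
    """Firefly signal normalized to the co-transfected Renilla signal."""
    if renilla <= 0:
        raise ValueError("Renilla reading must be positive")
    return firefly / renilla


def fold_over_background(sample_wells, background_wells):
    """Mean and S.D. of the sample ratio relative to the background ratio.

    Each well is a (firefly, renilla) pair of raw readings.
    """
    background = mean(relative_activity(f, r) for f, r in background_wells)
    folds = [relative_activity(f, r) / background for f, r in sample_wells]
    spread = stdev(folds) if len(folds) > 1 else 0.0
    return mean(folds), spread


# Hypothetical duplicate wells (arbitrary luminometer units).
promoter_wells = [(900.0, 100.0), (1100.0, 100.0)]
basic_wells = [(100.0, 100.0), (100.0, 100.0)]
fold_mean, fold_sd = fold_over_background(promoter_wells, basic_wells)
```

With these invented readings the test construct is 10-fold over background on average; dividing by Renilla first corrects for well-to-well differences in transfection efficiency.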
EMSA-EMSAs were performed with the LightShift chemiluminescence kit (Pierce) following the manufacturer's recommendations, but the binding reaction volume was scaled down to 10 μl. In the case of G/C-rich sequences, the binding reactions contained 50 mM KCl, 5.0% glycerol, 5.0 ng/μl poly(dI·dC), and no detergent or MgCl2. Nuclear extracts were prepared using the NE-PER extraction kit (Pierce), and 1 μl of extract (3-4 μg of protein) was added to each binding reaction.
Anti-HNF-1 (Santa Cruz Biotechnology catalog number sc-8986X) or anti-CBF-B (catalog number sc-10779X), at 1 μl/reaction, was incubated for 30 min on ice in the binding reaction before adding the probes. Competitor consensus sequences for CBF (catalog number sc-2591), Ets-1 (sc-2549), and Oct-1 (sc-2506) were double-stranded oligonucleotides whose sequences corresponded to those suggested by Santa Cruz Biotechnology, as indicated. HNF-1 competitor DNA was the 29-bp SIF3 sequence previously reported (17). DNA probes were biotinylated using the biotin 3′-end labeling kit (Pierce). DNA/protein complexes were separated by 5% native PAGE, transferred to a nylon membrane, cross-linked under UV light, and detected according to the kit protocol.

RESULTS
Identification of Different 5′-UTRs in β3Gal-T5 Transcripts-To identify novel 5′-UTRs, we first performed 5′-RACE analysis using ileum RNA and an RT primer specific for the coding sequence and found a single major band of ~650 bp. Sequencing of the cloned fragment revealed exon 3, exon −1, and a novel exon of 92 bp, named exon 1C, that was located ~4 kb upstream of exon −1 in chromosome 21 (Fig. 1). RT-PCR analysis with an upper strand primer specific to this novel exon and a lower strand primer localized in the coding sequence yielded no amplification using either mammary gland or MKN-45 gastric carcinoma cell RNAs. We could, however, successfully amplify products in these reactions using an upper strand primer specific to exon −1 and the same lower strand primer. We then performed 5′-RACE analysis starting from these same RNAs using a primer that annealed to exon −1 for RT. We obtained two major bands of ~250 and ~350 bp from mammary gland RNA and two bands of ~250 and ~400 bp from MKN-45 RNA. Sequencing of the fragments cloned from the mammary gland revealed a short novel sequence of 23 bp (exon 1A) joined to exon −1 and a longer one having the same 23-bp sequence at the 5′-end and an additional 108-bp sequence (exon 2A) in between. In the genome, they indicate a transcription start site located in a G/C-rich region ~33 kb from exon −1 (Fig. 1). Sequencing of the fragments cloned from MKN-45 cells also indicated the presence of exon 1A and, in addition, a novel sequence of 200 bp (exon 1B) joined to exon −1, suggesting another transcription start site located ~16 kb upstream of exon −1 (Fig. 1). RT-PCR performed with primers specific to the novel exons and subsequent sequencing of the products enabled us to delineate the exon structure of the β3Gal-T5 5′-UTRs (Fig. 1). Type A 5′-UTRs begin with exon 1A and comprise two main splice variants, a longer one that includes exon −1 and a shorter one without it.
A third very minor type A variant that includes exon 2A is the longest and is almost undetectable in competitive experiments. Type B and C 5′-UTR transcripts begin with exon 1B and 1C, respectively, and both comprise the two main splice variants described above. Exon 3 is present at the 3′-end of all of the novel 5′-UTR-containing transcripts, as is true for the LTR- and brain-derived 5′-UTR mRNAs (1,7).
Distribution of β3Gal-T5 Transcripts with Different 5′-UTRs in Various Human Tissues and Cell Lines-We amplified β3Gal-T5 starting from normalized cDNA prepared from different sources using a common lower strand primer that annealed to the coding sequence and different upper strand primers specific to exons 1 (LTR), 1A, 1B, 1C, −3, and 3. Amplification with the exon −3-specific primer yielded no products regardless of the cDNA (data not shown). The LTR-derived 5′-UTR mRNA predominated in the colon, duodenum, stomach, and pancreas, where the actual expression levels ranged from 0.3 to 10 fg/pg of β-actin (Fig. 2). These tissues also contained types A and B 5′-UTR transcripts but at much lower levels. On the other hand, type A 5′-UTR transcripts were the predominant β3Gal-T5 mRNA in the mammary gland, thymus, and trachea, which lacked the LTR transcript. The actual expression levels of type A transcripts in such tissues ranged from 0.1 to 0.2 fg/pg of β-actin, suggesting that they may be under the control of a promoter weaker than the LTR promoter. The type C 5′-UTR-containing mRNA predominated in the ileum (15 fg/pg of β-actin) and was relevant in only one other tissue, the colon. By contrast, type B 5′-UTR transcripts were not the major species in any of the normal tissues tested. The actual expression levels of type B transcripts in any tissue were below 0.15 fg/pg of β-actin, suggesting that they may be under the control of a very weak promoter. In some gastrointestinal cancer cell lines, type B transcripts were expressed at significant levels, and in the case of SW-1116 (colon), Capan-2 (pancreas), and HuCC-T1 (bile duct) cells, LTR transcript expression was low or absent. Analysis of additional independent RNA samples provided substantial sample-to-sample reproducibility; nevertheless, type C transcripts were present in one stomach and colon sample but not in the other (Fig. 2, boxed panel).
Moreover, the total amount of β3Gal-T5 transcript measured in the sample pairs varied from 20 to 150% (Fig. 2), as previously reported in human colon (3,6).
FIGURE 2 (partial legend). [...] Fig. 1. Quantification was performed by densitometric scanning of gel images. The amounts of amplified target cDNAs, presented in the bar graphs, were calculated from their respective standard curves and normalized to the amounts calculated for β-actin. The boxed panel shows the more relevant differences obtained analyzing independent RNA samples derived from the corresponding tissues (marked with a caret). The results are the means for two independent quantifications. Experimental variation was always less than 15% of the mean values.

Identification of Promoter Activity in the 5′-Flanking Regions of β3Gal-T5 Transcripts with Types A or B 5′-UTRs-We focused on types A and B 5′-UTR transcripts because the tissue distribution and expression levels made them candidates to be under the control of native promoters. In addition, no cell line is currently available as a model for further studies of type C transcripts. To identify such putative promoters, we cloned 800-1000-bp fragments of each 5′-flanking region in the luciferase reporter vector pGL3, prepared successive deletion mutants, and studied their ability to drive luciferase expression upon transfection into the appropriate cells. We used either MKN-45 cells, which express the LTR, type A, and type B 5′-UTR transcripts, or, as a control, MDA-MB-231 cells, which do not express these transcripts. In the case of the 5′-flanking region of exon 1A (Fig. 3A), luciferase activity was evident when the whole fragment was assayed in MKN-45 cells (−808 to +20). Inclusion of the intron sequence did not affect this activity (−808 to +148), whereas removal of the first 47 bp upstream of the exon improved activity (−223 to −48). Activity was maintained with fragment −200 to −48 (~9-fold over background), was drastically reduced with fragments −175 to −48 and −150 to −48 (~3.5-fold over background), and was lost with fragment −100 to −48, suggesting that the sequences from −200 to −175 and from −150 to −100 may contain binding sites for relevant transcription factors. The highest luciferase activity measured was ~10-fold lower than that of the LTR promoter, whereas the expression levels of type A transcripts in MKN-45 cells were only ~4-fold less than that of LTR transcripts (Fig. 2). To further assess this aspect, we performed additional reporter assays in MCF-7 cells, which mainly express type A 5′-UTR transcripts and no detectable LTR transcripts, as shown in Fig. 2. The luciferase activity driven by the LTR promoter was found to be dramatically decreased in MCF-7 cells compared with MKN-45 cells but was still ~11-fold higher than the background, whereas the activity of the −200 to −48 fragment flanking the type A 5′-UTR was maintained at ~10-fold over the background in these cells (Fig. 3B). These results suggest that the LTR construct per se is more highly active than the type A promoter in reporter systems.
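The reasoning used to map candidate regulatory regions from a 5′-deletion series can be written out explicitly: whenever removing a stretch of upstream sequence causes a marked drop in reporter activity, that stretch is flagged as possibly containing a binding site. The Python sketch below applies this to the approximate fold-over-background values quoted for the exon 1A flanking fragments. The function name and the 2-fold threshold are illustrative assumptions, and the value 1.0 for the −100 to −48 fragment is an assumption standing in for "activity lost".

```python
def regions_with_activity_loss(deletions, min_fold_drop=2.0):
    """Flag intervals whose removal reduces reporter activity.

    deletions: (five_prime_end, fold_over_background) pairs that share the
    same 3' end. The threshold is an arbitrary illustrative choice.
    """
    ordered = sorted(deletions)  # most upstream 5' end first
    flagged = []
    for (start_a, act_a), (start_b, act_b) in zip(ordered, ordered[1:]):
        # Compare each construct with the next shorter one.
        if act_b <= 0 or act_a / act_b >= min_fold_drop:
            flagged.append((start_a, start_b))
    return flagged


# Approximate values from the text; 1.0 (background level) is assumed for
# the fragment whose activity was reported as lost.
exon_1a_series = [(-200, 9.0), (-175, 3.5), (-150, 3.5), (-100, 1.0)]
candidate_regions = regions_with_activity_loss(exon_1a_series)
```

Under these assumptions the flagged intervals are −200 to −175 and −150 to −100, the same two regions singled out in the text.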
For the type B 5′-UTR, the reporter activity of the whole fragment (−1011 to +20) was detectable but very low in MKN-45 cells (Fig. 3C). Truncation of the 3′-end up to nucleotide −553 was found to be necessary to stimulate luciferase activity (compare −872 to +20 and −872 to −313 with −872 to −553 and also −772 to −313 with −772 to −553), suggesting the presence of an inhibitory region in this part of the sequence (see last paragraph). However, the most active fragment (−663 to −553) generated only ~4.5-fold higher luciferase activity than the background, even though the expression levels of type B 5′-UTR transcripts are similar to those of type A in MKN-45 cells. Fragments that included nucleotides up to −633 at the 5′-end had activity, whereas shorter ones did not, suggesting that the sequence from −633 to −553 may contain stimulatory binding regions.
FIGURE 3 (partial legend). [...] were cloned in the pGL3 vector carrying the firefly luciferase gene and transfected in MKN-45 cells that express all three types of transcripts (types A and B both 1.5 fg/pg of β-actin; LTR 5.9 fg/pg of β-actin) or in MDA-MB-231 cells that express no β3Gal-T5 transcripts together with a Renilla luciferase reporter expression plasmid. Firefly luciferase activity was measured 24 h later and expressed relative to the Renilla luciferase activity determined for each sample. pGL3-basic, lacking any promoter sequence, and pGL3-control, containing the SV-40 promoter, were used as negative and positive controls, respectively. pGL3-LTR, having the sequence −148 to −28 flanking exon 1 and known as the LTR promoter, is shown for comparison. The constructs representing the putative promoter sequence of type A 5′-UTR transcripts and its CBF site-directed mutant were also transfected in MCF-7 cells (B), a mammary gland cell line expressing mainly type A 5′-UTR transcripts (1.3 fg/pg of β-actin) and no detectable LTR transcripts. The mutated CBF sequence has CACGCGTCA instead of the wild type (wt) CACCTATCA. Values are the means ± S.D. for at least two experiments in duplicate.

Characterization of the Promoter Driving Expression of Type A 5′-UTR Transcripts-To characterize the candidate promoter of the type A 5′-UTR transcripts, we prepared four overlapping DNA probes, each one ~40 bp long, together representing the sequence from −215 to −102 from the transcription initiation site. We used these probes in EMSAs with nuclear extracts prepared from MKN-45 cells. A major complex was found using probe −193 to −152, and other minor complexes were evident with probes −162 to −127 and −140 to −102 (Fig. 4A). Each complex appeared to be specific, because it disappeared in the presence of a molar excess of the unlabeled probe. A search for potential transcription factor binding sites in the −193 to −152 sequence using the TRANSFAC database suggested high affinity sites for transcription factors GATA-1 and Ets-1 and a lower affinity site for CBF. In competition EMSAs with three substitution mutants of the probe (Fig. 4B), one (m2), potentially affecting both GATA-1 and CBF binding, failed to compete for complex formation. The CBF competitor prevented complex formation, whereas competitors with GATA or Ets family consensus sequences did not affect the complex. In addition, preincubation of the binding reaction with an antibody specific to CBF yielded a supershifted complex, indicating that CBF is involved in complex formation. On this basis, we prepared a CBF site-directed mutant of the pGL3-200-48 construct and found that the luciferase activity measured upon transfection in MKN-45 or MCF-7 cells was drastically reduced with respect to the wild type construct (Fig. 3B). Altogether, the data confirm that the sequence is actually a β3Gal-T5 native promoter driving expression of the type A 5′-UTR transcripts.
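The extent of the CBF site-directed mutation can be read directly from the two 9-nucleotide sequences quoted in the text (wild type CACCTATCA, mutant CACGCGTCA). The short Python sketch below lists the substituted positions; the function name is illustrative, and positions are counted from 1 within the quoted 9-mer only, not relative to the transcription start site.

```python
def substitution_positions(wild_type, mutant):
    """Return (position, wild-type base, mutant base) for each mismatch.

    Positions are 1-based within the compared sequences.
    """
    if len(wild_type) != len(mutant):
        raise ValueError("sequences must have the same length")
    return [
        (index + 1, wt_base, mut_base)
        for index, (wt_base, mut_base) in enumerate(zip(wild_type, mutant))
        if wt_base != mut_base
    ]


WT_CBF = "CACCTATCA"   # wild type sequence quoted in the text
MUT_CBF = "CACGCGTCA"  # mutated sequence quoted in the text
changes = substitution_positions(WT_CBF, MUT_CBF)
```

The comparison shows three adjacent substitutions (positions 4 to 6 of the 9-mer), with the flanking bases left unchanged.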
Comparison of the Human and Mouse β3Gal-T5 Genes-The 5′-UTR of the mouse β3Gal-T5 transcript is reported to consist of two exons (GenBank™ accession number NM_033149). The first one, rich in G/C, is at least 47 bp long and may contain an additional 43 bp at the 5′-end (15). The second exon is 177 bp and appears homologous to the human exon 3. By competitive RT-PCR, we found that the amount of transcript originating at exon 1 corresponds to the total amount of transcript expressed in mouse colon, stomach, and mammary gland, suggesting that only one promoter is used in mouse tissues (Fig. 5A). The relative expression level in the colon was much higher than in the stomach or the mammary gland, as found in human tissues, but the actual amount, relative to β-actin, was much lower than in the human colon. We compared 10 kb of mouse genomic sequence from chromosome 16 (18), ~5 kb upstream and 5 kb downstream from the transcription initiation site, with the corresponding human sequences flanking the three novel transcription initiation sites deduced from exons 1A, 1B, and 1C. No significant alignment was found in the latter two cases, whereas a surprisingly high level of homology was found between the mouse sequence and the human sequence flanking exon 1A, mostly in the region just upstream of the exon (Fig. 5B). In particular, the CBF binding site and the downstream sequence responsible for the other complexes detected by EMSA are almost completely conserved. To assess whether they are functional, we cloned the sequence in the pGL3 vector and transfected the construct in MKN-45 cells. Constructs with mutated CBF sites were also transfected as controls. The luciferase activities measured for the human and mouse sequences were very similar and were similarly reduced by the CBF mutation (Fig. 5C).
Moreover, in competitive EMSAs, the mouse sequence containing the putative CBF site competed with the corresponding human probe and, when used as a probe itself, generated an identical complex that was competed out by the human sequence, the CBF consensus sequence, or anti-CBF (Fig. 5D). Similarly, the downstream sequence of the mouse promoter prevented complex formation with the downstream sequence of the human promoter. When labeled and used as a probe, the mouse sequence generated three complexes that were apparently identical to the human complexes and whose formation was prevented by a molar excess of the human sequence (Fig. 5E). These data indicate that the promoter driving the mouse and the human type A transcripts is conserved through evolution, suggesting that it is the ancestral β3Gal-T5 promoter.

FIGURE 5. Expression of β3Gal-T5 in mouse tissues is under the control of a promoter structurally and functionally similar to the human promoter driving type A transcripts. A, quantitative expression of total (Tot) β3Gal-T5 transcript versus transcripts originating at exon 1 (1) was determined in three mouse tissues by competitive RT-PCR, as in Fig. 2. The amounts of amplified target cDNAs, presented in the right panel, were calculated from their respective standard curves and normalized to the amounts calculated for β-actin. B, ClustalW alignment of genomic regions flanking human type A and mouse β3Gal-T5 transcription initiation sites. Human chromosome 21 sequence from GenBank™ accession number AF064860 (nucleotides 37,000-47,000) and mouse chromosome 16 sequence from GenBank™ accession number AC165170 (nucleotides 78,000-88,000) were used as input sequences. The first exons in the 5′-UTRs are bold, and translation initiation sites (+1) are shown in reverse type (white on black background). The mouse-extended first exon is italicized, the CBF binding site is boxed, and differences from the consensus sequence are in lower case. C, the mouse genomic sequence homologous to the human promoter of type A transcripts also contains a functional CBF binding site. The indicated DNA fragments flanking mouse exon 1 were cloned in the pGL3 vector and transfected in MKN-45 cells, and luciferase activity was measured and expressed as in Fig. 3. Mutated CBF sequences for both mouse and human have CACGCGTCA instead of the wild type (WT) CACCTATCA. pGL3-200-48, containing the human promoter driving type A transcripts, was mutated identically and used as a control. pGL3-basic, lacking any promoter sequence, is shown as a reference. D, competitive EMSAs were performed with human probe −193 to −152 in the presence of mouse oligonucleotide −109 to −67 and vice versa in the presence or absence of the CBF consensus sequence or GATA consensus sequence or anti-CBF. E, competitive EMSAs were performed with human probe −162 to −127 in the presence of mouse oligonucleotide −91 to −51 and vice versa.
Characterization of the Promoter Driving Expression of Type B 5′-UTR Transcripts-Because the type A promoter of β3Gal-T5 has no apparent features that can explain the insertion and stabilization of the LTR promoter, we tried to characterize the putative promoter region of the type B 5′-UTR transcripts of the gene, despite its low levels of activity in luciferase reporter experiments. To this end, EMSAs were performed using three overlapping DNA probes covering nucleotides −643 to −500 from the origin of exon 1B. A major complex and a very minor one were detected with probe −614 to −574 (Fig. 6A); both were inhibited by a molar excess of the probe and by the shorter sequence −606 to −582 (Fig. 6B; WT, wild type). Among six substitution mutants of fragment −606 to −582, m5 failed to inhibit both complexes and m6 failed to inhibit the minor one only (Fig. 6B). A TRANSFAC database search using both the parental and mutated sequences suggested the presence of recognition sites for transcription factors Oct-1 and HNF-1. In competitive EMSAs (Fig. 6C), formation of the major complex was prevented by the addition of the HNF-1 binding sequence of the SIF3 oligonucleotide as well as by an antibody specific to HNF-1, indicating the presence of an HNF-1 binding site, as reported for the LTR promoter (6,7). To directly compare these two promoters, competitive EMSAs were performed using both sequences (Fig. 6D). The LTR sequence completely abolished both the HNF-1 complex and the minor complex, suggesting that the minor complex is due to a Cdx binding site (6). Identical complexes were found with the LTR probe, but only the major complex was inhibited by a 200-fold molar excess of the −614 to −574 sequence, suggesting a lower affinity of the −614 to −574 sequence for the factor forming the minor complex. By varying the competitor concentrations, we found that the relative affinity of the −614 to −574 fragment for the HNF-1 site of SIF3 was also much lower than that of the LTR sequence (Fig. 6E).
A comparison of these two sequences (Fig. 6F) further revealed a string of nine identical nucleotides that includes the HNF-1 binding site and the overlapping sequence suggested to be a Cdx binding site (6). This same sequence is also present in the chimpanzee ortholog located in chromosome 22 (19). In the SIF3 and LTR sequences, the HNF-1/Cdx binding sequence is found in the sense orientation, whereas in both the −614 to −572 fragment and the chimpanzee ortholog, this element is present in the antisense orientation. Taken together, our present data support the hypothesis that the sequence represents the promoter for the type B 5′-UTR transcripts, suggesting that the low luciferase activity associated with this fragment in our reporter experiments may be due to technical constraints probably related to the truncation of the 3′-region. To evaluate the possibility that this 3′-region may bind inhibitory factors, as suggested by luciferase experiments, we prepared five overlapping DNA probes of ~40 bp in length covering the sequence from −494 to −314 from the transcription initiation site. EMSAs performed with these probes and nuclear extracts of MKN-45 cells yielded many specific bands, including a major complex with probe −353 to −314 (Fig. 7A). The addition of a molar excess of the −353 to −330 fragment did not efficiently inhibit this complex; it inhibited only another, minor complex formed with probe −353 to −314 (Fig. 7B). The major complex could be competed out using fragment −337 to −314, and substitution mutations of this fragment affected competition (Fig. 7B). The data indicate that the 3′-region of the putative type B promoter specifically binds putative regulatory factors in MKN-45 cells.
Fig. 4. B, competitive EMSAs were performed with probe −614 to −574 in the presence of oligonucleotide −606 to −582 and its mutants. The bands at the top of the gel are unspecific. Mutated oligonucleotide sequences are shown at the bottom of the panel.
C, competitive EMSAs were performed with probe −614 to −574 in the presence of consensus sequences of candidate transcription factors or specific antibodies. SIF3 is an HNF-1-binding oligonucleotide. D, competitive EMSAs were performed with probe −614 to −574 in the presence of the LTR oligonucleotide or vice versa. E, competitive EMSAs were performed with SIF3 as the probe in the presence of different concentrations of oligonucleotides −614 to −574, SIF3, or LTR. F, comparison of the SIF3 oligonucleotide sequence (antisense), the corresponding sequence of the LTR (antisense), the corresponding sequence of the −614 to −574 oligonucleotide (sense), and the corresponding sequence of the chimpanzee ortholog (sense) (GenBank™ accession number AL954227, nucleotides 55926-55954). The HNF-1/Cdx binding motif of the LTR sequence (see Ref. 6) is boxed.

DISCUSSION
Our current data show that multiple promoters are present in the human β3Gal-T5 gene. Our quantitative analysis of the expression of four 5′-UTR transcripts suggests that they account for the large majority of the β3Gal-T5 mRNA molecules expressed in the tissues that we studied. However, because the total β3Gal-T5 mRNA detected is not always equal to the sum of the individual 5′-UTR transcripts, and also because of inherent experimental error in these analyses, we cannot rule out the possibility that additional 5′-UTR transcripts, driven by alternative cognate promoters, are present in some tissues.
The LTR and type C promoters, both active in the gastrointestinal tract, appear to be strong promoters; on the other hand, the ubiquitous type B promoter and the brain-specific promoter appear to be very weak promoters whose physiological relevance is uncertain. The type A promoter, which is also quite weak, is indeed responsible for β3Gal-T5 expression in various tissues, such as the mammary gland, thymus, and trachea. This type A promoter is mostly responsive to CBF stimulation and is conserved in mouse, where it acts as a unique β3Gal-T5 promoter. This ancestral promoter provides for low β3Gal-T5 expression (0.1-0.4 fg/pg of β-actin) in both humans and mice but has substantial gastrointestinal specificity in mice, as previously suggested (15). The intrinsic weakness of such a promoter is explained by its substantial divergence from the CBF consensus sequence. In fact, it has been reported that any variant of the CCAAT core sequence, including CCTAT, strongly reduces the binding affinity (20). These structural and functional features are reflected in model cell systems, where luciferase activity driven by a type A promoter construct is much lower than that driven by the SV-40 promoter control. A common role of β3Gal-T5 in some general mammalian cell function may be proposed when it is transcribed under the control of this promoter. Little information is currently available concerning type 1 chain expression in the mouse (21,22), where α1,4 fucosylation does not occur (23). Moreover, it is not presently clear how β3Gal-T5 predominates in the mouse gastrointestinal tract, as it does in the human counterpart, using only the conserved CBF-dependent promoter.
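The sequence relationship described above can be made concrete with a minimal sketch. It uses only the wild-type and mutated CBF sites quoted in the figure legend (CACCTATCA and CACGCGTCA) and the canonical CCAAT core; the function names are illustrative, and the mismatch count is a simple string comparison, not a measure of binding affinity.

```python
WT_SITE = "CACCTATCA"     # wild-type CBF site quoted in the figure legend
MUT_SITE = "CACGCGTCA"    # mutated CBF site quoted in the figure legend
CONSENSUS_CORE = "CCAAT"  # canonical CCAAT box core


def mismatches(a: str, b: str) -> list[tuple[int, str, str]]:
    """Return (0-based position, base in a, base in b) for every difference."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


def best_core_match(site: str, core: str) -> tuple[int, str, int]:
    """Slide the core along the site and return (offset, window, mismatch count)
    for the window with the fewest mismatches (first one wins on ties)."""
    best = None
    for offset in range(len(site) - len(core) + 1):
        window = site[offset:offset + len(core)]
        n = sum(1 for x, y in zip(window, core) if x != y)
        if best is None or n < best[2]:
            best = (offset, window, n)
    return best


if __name__ == "__main__":
    print(mismatches(WT_SITE, MUT_SITE))
    print(best_core_match(WT_SITE, CONSENSUS_CORE))
    print(best_core_match(MUT_SITE, CONSENSUS_CORE))
```

In this comparison the wild-type site carries the CCTAT variant, one substitution away from CCAAT, whereas the three substituted positions in the mutant remove any close match to the core.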
On the other hand, the LTR promoter provides humans and some primates with not only gastrointestinal tissue specificity but also high expression levels (>10 fg/pg of β-actin in the colon). In the human ileum, however, high expression levels and possibly specificity are determined by another promoter not yet characterized, leading to the proposal of a distinct role for β3Gal-T5 in intestinal cell-specific functions (6). The intrinsic strength of the LTR promoter is also consistent with the conserved and proper orientation of both the HNF-1 and Cdx binding sites (6,24) and is recapitulated by our in vitro experiments showing very high luciferase activity for the LTR reporter construct in model cells. However, nothing is currently apparent with regard to the ancestral type A promoter that can explain the insertion and stabilization of the LTR retrotransposon. On this basis, we attempted to characterize the putative type B promoter, which is active in certain human gastrointestinal cell lines but has only very weak activity in the normal tissues we tested. We found that the luciferase activity driven by the type B promoter construct was lower than expected in MKN-45 cells, given that these cells express types A and B 5′-UTR transcripts at equivalent levels. However, to measure luciferase activity, it was necessary to remove a 552-bp segment from the 3′-end of the sequence, and this caused the possible deletion of specific binding sites for inhibitory factors. In fact, we found that this region binds various potential regulatory factors in MKN-45 cells. We speculate also that this truncation physically affects the ability of the promoter to drive luciferase activity in vitro. Moreover, additional binding sites for stimulatory factors may also have been present in this deleted region, affecting the activity measured for the type B reporter construct.
Following on from these hypotheses, we pursued the characterization of the putative type B promoter and found that this promoter and the LTR promoter share an almost identical HNF-1/Cdx binding motif, but placed in the opposite orientation. Because such a perfect sequence match probably helped stabilize the element at the time of its transposition, a suggestive hypothesis is that the isolated sequence represents a remnant of a true native promoter that was relevant at the time of the LTR insertion but has since been overwhelmed by the mobile element. The present disposition of binding sites for stimulatory and inhibitory factors may not allow for full expression of the promoter in a reporter system.
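The notion of one motif occurring in opposite orientations in two promoters can be illustrated with a short sketch that scans a sequence for a motif and for its reverse complement. The motif and the two flanking sequences below are hypothetical placeholders chosen only for illustration; they are not the HNF-1/Cdx motif or the β3Gal-T5 sequences analyzed here, and the function names are likewise assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_motif_orientations(sequence: str, motif: str) -> list[tuple[str, int]]:
    """Report every exact occurrence of the motif as (orientation, 0-based start).

    "sense" means the motif itself is found on the given strand;
    "antisense" means its reverse complement is found there instead.
    A palindromic motif would be reported in both orientations.
    """
    sequence = sequence.upper()
    motif = motif.upper()
    hits = []
    for orientation, pattern in (("sense", motif),
                                 ("antisense", reverse_complement(motif))):
        start = sequence.find(pattern)
        while start != -1:
            hits.append((orientation, start))
            start = sequence.find(pattern, start + 1)
    return hits


# Hypothetical placeholder data, NOT the sequences from this study.
MOTIF = "GATTACAGG"                                       # made-up 9-nt motif
LTR_LIKE = "TTTT" + MOTIF + "CCCC"                        # motif in sense orientation
NATIVE_LIKE = "AAAA" + reverse_complement(MOTIF) + "GGGG"  # same motif, antisense

if __name__ == "__main__":
    print(find_motif_orientations(LTR_LIKE, MOTIF))
    print(find_motif_orientations(NATIVE_LIKE, MOTIF))
```

Exact string matching is used for simplicity; a comparison of real promoter fragments would more plausibly tolerate mismatches or use a position weight matrix.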
Because other human (but not mouse) genes have been reported to have an LTR with or without similar expression patterns (12), it will be interesting to verify whether the context of the LTR insertion in these cases mirrors that proposed here for the β3Gal-T5 gene.
We also found that the LTR transcript is poorly expressed or not expressed at all in some gastrointestinal cancer cell lines. In some of these lines, the type B 5′-UTR transcript is readily detectable, indicating that its promoter, with an antisense HNF-1/Cdx motif, is active to some extent. Given the dependence of the two promoters on the same transcription factors (but the higher affinity and strength of the LTR promoter), we hypothesize that an unknown strong inhibitory mechanism specifically affects the LTR promoter in cancer cells. This is in good agreement with the finding that the negative regulation of β3Gal-T5 in colon cancers (3,6) exceeds the simple reduction of HNF-1 and Cdx binding (6) and represents another source of regulatory variations potentially introduced by retroelements.
Fig. 4. B, competitive EMSAs were performed with probe −353 to −314 in the presence of oligonucleotide −353 to −330 or oligonucleotide −337 to −314 and its mutants. Mutated oligonucleotide sequences are shown at the left of the panel.
Using an exon −3-specific primer, we were unable to detect β3Gal-T5 by PCR in any cDNA we tested, suggesting that this exon belongs to a brain-specific transcript placed under the control of a distinct, as yet unknown promoter. Because the expression levels of β3Gal-T5 in brain are extremely low (7), this promoter is presumably very weak. In fact, other β1,3-galactosyltransferases, such as β3Gal-T1 (25-27) and β3Gal-T2 (28), are expected to play a major role in tissues and cells of neuroectodermal origin. In conclusion, the strict conservation of the ancestral promoter from mice to humans, together with the adoption of multiple promoters during evolution, including the strong LTR promoter, suggests that more work is needed to understand the contribution of β3Gal-T5 and its reaction products in specific cell functions.