Organization and regulatory aspects of the human intestinal mucin gene (MUC2) locus.

The human MUC2 gene maps to chromosome 11p15, where three additional mucin genes have been located, and encodes the most abundant gastrointestinal mucin normally expressed in the intestinal goblet cell lineage. However, in pathological conditions, including colorectal cancer, MUC2 can be abnormally expressed. Therefore, it is of considerable interest to understand the regulation of the MUC2 gene and how the mechanism is altered in colon cancer. Toward this goal, we have isolated a group of overlapping clones (contig) spanning 85 kilobases harboring the entire MUC2 locus, including sequences located upstream of the gene. Detection of two DNase I-hypersensitive sites in the 5′ region of the MUC2 gene suggests the presence of DNA regulatory elements. To better characterize this region, we have sequenced 12 kilobases of the upstream region and analyzed it for functional activity by cloning portions of it into a luciferase reporter vector and assaying for promoter/enhancer activity using a transient transfection assay. A fragment from the AUG translational initiation codon +1 to −848 confers maximal transcriptional activity in several intestinal cell lines. Elements located further upstream exert a negative effect on the expression of the reporter gene when tested in conjunction with homologous or heterologous promoters. The same pattern of expression is observed when the MUC2/luciferase constructs are transfected into HeLa cells, which do not express the endogenous MUC2 gene. 
However, the level of activity in HeLa cells is at least an order of magnitude higher, suggesting that additional sequences, singly or in combination, are responsible for the tissue- and cell lineage-specific expression of MUC2. Finally, we have identified an additional mucin-like gene (MUCX), located upstream of MUC2. We show that this MUCX gene, which is transcribed in the opposite orientation to MUC2, is expressed with a pattern distinct from that of MUC2, yet similar to that of MUC5B and MUC6, two additional mucin genes located at chromosome 11p15. Recent information on the order of the mucin genes at chromosome 11p15 suggests that MUCX may be MUC6, one of the already identified mucin genes, or a novel one, yet to be fully characterized.

Mucins are the major components of mucus, the visco-elastic substance that protects and lubricates epithelial mucosa, including that of the gastrointestinal tract. They are highly glycosylated molecules, and up to 80% of their mass consists of O-linked glycosyl residues. Recently, the cloning of full-length or partial cDNA sequences of mucins expressed in different tissues has greatly facilitated investigations of the polypeptide moieties (reviewed in Refs. 1-3). In the intestinal epithelium the major mucin is MUC2, and the corresponding gene has been mapped to human chromosome 11p15 (4). cDNAs most likely corresponding to three distinct mucin genes, MUC6 (5), MUC5AC, and MUC5B (6, 7), encoding a gastric and two tracheobronchial mucins, respectively, have been mapped to this same band. Although only the MUC2 cDNA has been characterized completely (8), the partial cDNA sequences of the gastric and tracheobronchial mucins suggest that these are distinct genes. Thus, chromosome 11p15 may be a locus that contains a cluster of mucin genes.
The expression of individual mucin genes is relatively organ-specific (9-11) and in the case of MUC2 is also cell type-specific, MUC2 being expressed almost exclusively in goblet cells (10, 12). In the intestine, this lineage, as well as columnar, enteroendocrine, and Paneth cells, most likely arises from a common precursor, the stem cell located near the bottom of the crypt. Stem cells differentiate as they migrate upwards to the crypt surface, where they are exfoliated into the intestinal lumen (13). The molecular mechanisms that regulate the spatial and temporal differentiation of normal colonic epithelial cells are poorly understood. Moreover, the process is most likely under the influence of many external signals, some of which may be recapitulated in in vitro studies. For example, we have shown that in HT29 cells, a human adenocarcinoma cell line that is considered multipotent since it can express distinct cell lineage-specific markers upon exposure to appropriate inducers, MUC2 gene expression can be modulated through protein kinase C- and protein kinase A-dependent signal transduction pathways (14).
Although alterations of the level of mucin glycosylation have been well studied as a characteristic of colon cancer (15-17), only recently, with the availability of cDNA probes, has it been possible to study alterations in apomucin expression. Both up-regulation and down-regulation of mucin gene expression have been reported in cancer cells (10, 11, 18). Indeed, we have shown that in several cell lines that are derived from human mucinous tumors, which are characterized by the synthesis of large quantities of mucins, there is a constitutive high level of expression of the MUC2 gene, suggesting that in this subset of tumors MUC2 is deregulated (14).
Thus, to investigate the mechanisms that govern MUC2 expression during the differentiation of the goblet cell lineage and to determine the causes of its abnormal expression in colon cancer, we undertook cloning of the MUC2 locus to identify DNA elements capable of directing tissue- and cell lineage-specific expression of MUC2.
In this report we present a partial characterization of a contig isolated by chromosome walking from a human chromosome 11 cosmid library. The group of overlapping clones contains the entire MUC2 locus, including several thousand nucleotides of DNA extending from the 5′ terminus of MUC2, for which we have determined the DNA sequence. Detection of DNase I-hypersensitive sites in the untranscribed 5′-flanking sequence of MUC2 suggested the presence of DNA regulatory elements. To further characterize the potential cis-acting regulatory elements in this region we tested the functional activity of distinct segments of the MUC2 upstream region by performing in vitro transient transfection assays in several epithelial cell lines. We present evidence that a region of 850 nucleotides extending upstream from the MUC2 initiation of translation confers maximal activity to a luciferase reporter gene, while sequences that are located further upstream exert an inhibitory effect both on homologous and heterologous promoters. These data suggest that the regulation of the MUC2 gene most likely depends on the interaction of both positive and negative regulatory elements.
Finally, we have identified an additional mucin-like gene (MUCX) located upstream of MUC2. The pattern of expression of MUCX, which is transcribed in the opposite direction compared with MUC2, is also presented. Comparative analysis of DNA and deduced amino acid sequences, the pattern of expression of MUCX and the information on the physical order of the mucin genes clustered at 11p15 (19) suggest that the genomic sequence of MUCX may correspond to a portion of the amino terminus of MUC6 or to a novel, yet unidentified, mucin gene.

MATERIALS AND METHODS
Oligonucleotides and DNA Probes-Oligonucleotides corresponding to the antisense sequence of a portion of the tandem repeats in MUC6 (5), MUC5AC (20), and MUC5B (21) were obtained from the oligosynthesis facility at the Albert Einstein Cancer Center and are as follows: MUC6/LA184, 5′ AAGCTTGGAACGTGAGTGGGAAGTGTGGT 3′ (5); MUC5AC/LA175, 5′ TGGAGTAGAGGTTGTGCTGGTTGT 3′ (20); MUC5B/AV-1, 5′ GGCTGTGGTGGTCAGCACTGTGAGGGTGTGGGCAG 3′ (21). BO2 corresponds to the antisense of the MUC2 cDNA sequence spanning nucleotides 82-101 according to the sequence published in Gum et al. (8). Cl2B is a cDNA representing a portion of the MUC2 tandem repeats (14). Probe C was generated from human placenta DNA by polymerase chain reaction amplification using primers 2S and 2A based on the cDNA sequence of human mucin-like protein (H-MLP) (22). The product is a 1-kb fragment containing sequences spanning from nucleotide 13486 to nucleotide 13911 of the MUC2 cDNA. Glyceraldehyde-3-phosphate dehydrogenase was analyzed with a rat probe. All DNA probes were purified inserts labeled by random priming using a Random Primers DNA labeling system from Life Technologies, Inc.
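Because the three oligonucleotide probes are antisense, the mRNA stretch each one targets is its reverse complement. The following is a minimal sketch of that conversion, together with probe length and G + C content. It assumes that the hyphens appearing inside the printed sequences are line-break artifacts and omits them; the function names are illustrative only.

```python
# Antisense probe sequences as listed in the text. Assumption: hyphens inside
# the printed sequences are typesetting artifacts and are left out here.
PROBES = {
    "MUC6/LA184": "AAGCTTGGAACGTGAGTGGGAAGTGTGGT",
    "MUC5AC/LA175": "TGGAGTAGAGGTTGTGCTGGTTGT",
    "MUC5B/AV-1": "GGCTGTGGTGGTCAGCACTGTGAGGGTGTGGGCAG",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C residues in the sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


for name, antisense in PROBES.items():
    sense = reverse_complement(antisense)
    print(f"{name}: {len(antisense)} nt, GC {gc_fraction(antisense):.2f}, "
          f"sense target 5' {sense} 3'")
```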
Genomic Cloning, Characterization, and Sequencing-High density filters of a human chromosome 11 cosmid library (SRL library) (23) were screened initially with Cl2B in 0.5 M sodium phosphate buffer, pH 7.2, 8% SDS, 100 μg/filter of sonicated single-stranded salmon sperm DNA, at 65°C. The 5′ and the 3′ end fragments of the initial clone were isolated and used to screen the same high density filters to isolate overlapping clones. Partial restriction map analysis determined the extent of overlapping. For sequence analysis, restriction fragments covering the contig were subcloned into pBluescript SK+ vector (Stratagene), and double-stranded DNA templates were sequenced with both T3 and T7 promoter primers and internal primers using the dideoxy nucleotide chain termination method with Sequenase kit, version 2 (U. S. Biochemical Corp.). When artifact banding due to the presence of G + C-rich areas was a problem, termination reactions were carried out in the presence of terminal deoxynucleotidyl transferase and excess dNTPs (24). Analysis of nucleic acid and protein sequence data was performed using PC/Gene (Intelligenetics) and Wisconsin Sequence Analysis Package GCG (Genetics Computer Group, Madison, WI) software.
MUC2/Luciferase Plasmid Construction-The vectors used were pGL2 basic, pGL2 enhancer, and pGL2 promoter, which contain no regulatory elements, the SV40 enhancer, and the SV40 promoter, respectively (Promega). Different portions of the 12-kb 5′-flanking sequence of MUC2 were cloned in these vectors. A 525-nucleotide BamHI-NcoI fragment was blunt-ended at the NcoI site, corresponding to the AUG initiation codon (8), HindIII linkers were added, and the fragment was then cloned into the HindIII-BglII sites of pGL2 basic and enhancer. The resulting construct is the MUC2/luciferase 0.5 plasmid. Plasmid 0.5 was linearized with NcoI, which cuts at a second internal NcoI site in the MUC2 sequence, and XhoI, in the pGL2 polylinker; a different NcoI-XhoI fragment, containing the same sequence present in the original NcoI-BglII region of the 0.5 fragment plus 375 nucleotides of the BamHI fragment that is located upstream and contiguous to the 0.5 fragment, was then inserted, generating the 0.8 plasmid. Plasmid 0.8 was partially digested with BamHI, and a BamHI-SacI fragment of 4.9 kb, containing sequences upstream of the 0.8 fragment, was cloned into the 0.8 basic and enhancer constructs, generating the 5.0-kb plasmid. A 6-kb SacI-KpnI fragment, carrying the region further upstream of MUC2, was isolated from the 15-kb EcoRI fragment, previously cloned into pBSSK+, and subcloned in the MUC2/luciferase 5.0 plasmid cut with SacI and KpnI. The resulting construct is the 9.0-kb MUC2/luciferase construct. Plasmid p4.3 was generated by subcloning a 4656-nucleotide BamHI-SacI fragment, derived by partial BamHI digestion of a 5.2-kb SacI clone in pBSSK+, which excludes the 0.5 fragment, into the promoter/luciferase vector cut with BglII and SacI. A schematic representation of these plasmids is shown in Fig. 1B. The identity of each of the MUC2/luciferase plasmids was verified by restriction enzyme and partial sequence analyses.
Cell Culture and Transient Transfection Assays-HT29, LS174T, and HeLa cells were maintained in minimal essential medium supplemented with nonessential amino acids and 10% fetal calf serum. HT29 cells were induced with TPA and forskolin as described previously (14).
For transfection, cells were seeded at 5 × 10⁴/well in 24-well plates, and liposome-mediated transfections were performed 3 days later. The intestinal cell lines were transfected with tfx50 (Promega) using a ratio of tfx:DNA of 3:1. Typically, 1 μg of plasmid DNA consisting of 0.1 μg of test plasmid DNA, 0.2 μg of CMV-βgal, to correct for transfection efficiency, and 0.8 μg of carrier DNA were mixed with tfx50 in 200 μl of serum- and antibiotic-free medium. Cells were incubated with the transfection mixture for 2 h at 37°C followed by the addition of 1 ml of complete medium. Cells were harvested 48 h later. HeLa cells were transfected with lipofectAMINE (Life Technologies, Inc.) following the supplier's recommendations.
For the luciferase assay, cells were lysed on plates using the reporter lysis buffer (Promega), and cell extracts were prepared following the supplier's instructions. 5-20 μl of cell extract were used to determine luciferase activity, using the Promega detection kit and a Turner TD-20-e luminometer. The β-galactosidase activity was measured using 20-40 μl of cell extract. The luciferase activity of test plasmids is expressed as fold induction of the test plasmid activity compared with that of the corresponding control, after correction for transfection efficiency as measured by the β-galactosidase activity.
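The normalization described above can be sketched as a two-step calculation: each luciferase reading is divided by the β-galactosidase reading from the same extract, and the corrected test value is then divided by the corrected control value. The readings below are hypothetical and the function names are illustrative; the exact arithmetic used for the published figures is not stated in the text.

```python
def normalized_activity(luciferase: float, beta_gal: float) -> float:
    """Correct a luciferase reading for transfection efficiency."""
    if beta_gal <= 0:
        raise ValueError("beta-galactosidase activity must be positive")
    return luciferase / beta_gal


def fold_induction(test_luc: float, test_bgal: float,
                   control_luc: float, control_bgal: float) -> float:
    """Fold induction of a test plasmid over its control vector."""
    return (normalized_activity(test_luc, test_bgal)
            / normalized_activity(control_luc, control_bgal))


# Hypothetical readings (arbitrary light units and beta-gal units).
fold = fold_induction(test_luc=1200.0, test_bgal=0.8,
                      control_luc=500.0, control_bgal=1.0)
print(f"fold induction: {fold:.2f}")
```

Dividing by the β-galactosidase value first means that a well with poor transfection efficiency is not mistaken for a weak promoter.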
DNase I-hypersensitive Site Detection-Nuclei from the different cell lines were prepared according to the protocol of Enver et al. (25). The nuclear pellet was resuspended at 500 μg of DNA/ml. 20-μl aliquots were incubated at 37°C for 10 min with increasing amounts of DNase I, ranging from 0.05 to 0.8 unit of enzyme. DNase I digestion was blocked by the addition of EDTA, pH 8, at a final concentration of 25 mM, followed by incubation with RNase A, at 20 μg/ml, for 30 min at 37°C. Samples were digested overnight with proteinase K, at 50 μg/ml in NTE buffer (0.1 M NaCl, 50 mM Tris, pH 8, 1 mM EDTA) containing 0.25% SDS, final concentration. DNA was extracted twice with phenol:chloroform and ethanol-precipitated. DNA, digested with EcoRI, was analyzed by Southern blot using a 2.6-kb PstI fragment located downstream of the first exon of the MUC2 gene as a probe (8).
RNA Isolation and Analysis-Total RNA was isolated and analyzed as described previously (14).

RESULTS
Partial Characterization of a Contig Encompassing the Human MUC2 Locus and an Additional Mucin Gene-We used a cDNA probe (Cl2B), corresponding to a portion of the repetitive region of the MUC2 gene, to screen a human chromosome 11 cosmid library and isolated a contig of approximately 85 kb.
Results from partial sequence analysis and Southern blot hybridization of restriction enzyme-digested DNA of the contig with probes corresponding to unique regions in the 5′ (BO2) and 3′ end (C probe) of MUC2, in addition to Cl2B (see "Materials and Methods"), indicate that the contig spans the entire MUC2 locus, including the cap site (8) and approximately 50 kb of DNA 5′ to it. A partial restriction map as well as locations of the start site, the tandem repeat region, and the 3′ unique portion of MUC2 in the contig are shown in Fig. 1A.
The MUC2 gene has been localized to chromosome 11p15.5 (4). At least three additional mucin genes have been mapped at the same band position: the tracheobronchial MUC5AC and MUC5B (6, 7) and the gastric MUC6 gene (5). To determine whether the contig we isolated contained additional mucin-related sequences, we used different fragments of the contig as probes to determine whether an additional mucin-like mRNA was detected in intestinal cell lines that do or do not synthesize mucins and in unrelated cell lines. Fig. 2 shows that a probe spanning the 5′ end of the group of overlapping clones, the 13-kb NotI fragment (probe X in Fig. 1A), detects an mRNA in LS174T that has the physical characteristics of mucin mRNA, namely high molecular weight and polydispersity (1). This mRNA is not expressed in HT29 cells, uninduced or induced with forskolin (Fig. 2) and TPA (not shown), two agents that we have shown previously induce MUC2 expression in these cells. The same pattern of expression was seen using a smaller 2.1-kb fragment located at the end of the contig (probe X1 in Fig. 1A).
In addition, no mucin-like mRNA was detected by these probes in HeLa (Fig. 2) or HL60 cells (not shown).
To investigate the nature of the mRNA detected in LS174T by the 13-kb NotI fragment, which we refer to as MUCX, we determined whether the MUC5AC, MUC5B, and MUC6 genes had the same pattern of expression as the mRNA detected by probes X and X1 in LS174T cells and HT29 cells stimulated with forskolin and TPA. As probes we utilized oligonucleotides, described under "Materials and Methods," derived from published partial cDNA sequences corresponding to the tandem repeat regions of these other mucin genes. Fig. 3 shows the results of this analysis using probes corresponding to MUC5AC (A), MUC5B (B), and MUC6 (C). The pattern of expression of the different MUC genes is summarized in Table I.
The data presented in Fig. 3 and Table I show that the MUCX gene shares the same pattern of expression as both MUC5B and MUC6. In addition, both HeLa and HL60 are negative for the expression of any of the mRNA species detected by these probes.
To characterize the MUCX gene further, we determined the sequence of approximately 4 kb of DNA located at the 3′ end of the contig, and homology searches were done with the BLAST network service at the National Center for Biotechnology Information. The MUCX gene showed three areas of homology with the hMUC2 gene at the nucleotide level. Most important, the homology was maintained between the derived amino acid sequences of the regions in the MUCX DNA identified in the search and the corresponding portions in the MUC2 protein (Fig. 4, A and B). These regions of similarity lay in the D2 and D3 domains of the MUC2 protein. The D domains (D1-D4) in MUC2 are characterized by a high degree of sequence similarity to four D-domains in prepro-von Willebrand factor and by the presence of cysteine residues whose position in MUC2 and other mucin sequences is maintained invariant (26). In addition, one of the MUCX-deduced polypeptides also exhibits homology to a portion of the HGM-1 (gastric mucin) sequence (Fig. 4B; Ref. 27). Although the homology among the distinct portions of MUCX and MUC2 is only about 50%, it is noteworthy, as shown in Fig. 4, A and B, that the position of the Cys residues is perfectly conserved in the three sequences. It has been suggested, based on analogy with the von Willebrand factor model, that these Cys residues are critical for promoting mucin oligomerization (26).
Therefore, the pattern of expression of the MUCX gene and nature of the transcript, sequence homology, and conservation of the position of Cys residues with other apomucins, all suggest that MUCX encodes a mucin-like peptide. However, our data do not resolve whether MUCX corresponds to MUC5B or MUC6 or is a novel mucin gene (see "Discussion").
DNase I-hypersensitive Sites Are Located in the 5′ Region of MUC2-In a first approach to determine whether the proximal 5′-flanking region of MUC2 harbors sequences that regulate MUC2 transcription, we analyzed upstream DNA for the presence of DNase I-hypersensitive sites. We had demonstrated previously that MUC2 mRNA could be induced in HT29 cells treated with several agents, including forskolin and TPA (14). Thus, the presence of DNase I-hypersensitive sites was investigated in the chromatin of untreated HT29 cells, which do not express MUC2 mRNA, and forskolin- and TPA-treated cells, which express MUC2. As shown in Fig. 5, two major DNase I-hypersensitive sites, located approximately 600 and 1600 nucleotides upstream of the start sites, were detected. The location of these sites did not change with induction (Fig. 5A). These results are in agreement with our previous data suggesting that the regulation of MUC2 in HT29 cells by forskolin and TPA was predominantly post-transcriptional. Additionally, lack of alteration of the DNase I-hypersensitive sites as a function of expression of MUC2 is further documented in LS174T cells. This tumor cell line is characterized by the expression of a very high basal level of MUC2 that cannot be further induced by the same agents that promote MUC2 mRNA accumulation in HT29 cells. Indeed, Fig. 5B shows that the same hypersensitive sites, present in uninduced or induced HT29 cells, are detected in the chromatin of LS174T. In non-intestinal cell lines the pattern of DNase I-hypersensitive sites is quite distinct: in HL60, a non-epithelial cell line, there is a single hypersensitive site located at a different position (Fig. 5B). Thus, our data suggest that the two hypersensitive sites detected in intestinal cells may be related to the expression of the MUC2 gene.
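In indirect end labeling, the probe abuts one end of the restriction fragment, so the length of each DNase I-generated sub-band equals the distance from that end to the cut site. The following sketch converts sub-band lengths into coordinates relative to the AUG. The position of the anchoring EcoRI site and the sub-band sizes are hypothetical values chosen only to illustrate the arithmetic; the text reports the resulting site positions, not the band sizes.

```python
def cut_site_relative_to_aug(subband_nt: int, anchor_offset_nt: int) -> int:
    """Map a DNase I sub-band to a coordinate relative to the AUG.

    subband_nt: length of the sub-band detected on the Southern blot.
    anchor_offset_nt: position of the EcoRI site next to the probe, counted in
        nucleotides downstream of the AUG (assumed to lie downstream).
    Returns a coordinate in which negative values lie upstream of the AUG.
    """
    return anchor_offset_nt - subband_nt


# Hypothetical anchor position and sub-band sizes, for illustration only.
ANCHOR_OFFSET_NT = 3000
for band in (3600, 4600):
    position = cut_site_relative_to_aug(band, ANCHOR_OFFSET_NT)
    print(f"sub-band of {band} nt -> hypersensitive site at {position}")
```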
Characterization of the 5′ Sequence and Functional Analysis of the hMUC2 Promoter-To further characterize the potential cis-acting regulatory elements in the DNA region 5′ of the MUC2 gene, we determined the entire sequence of the 12 kb of the MUC2 upstream region. The full sequence has been deposited in GenBank™ under accession number U68061. A partial sequence starting at the AUG translational initiation codon and extending 2600 nucleotides upstream is presented in Fig. 6. Putative recognition sequences for transcription factors are shown. Some of the motifs are repeated several times, including GC boxes (putative Sp1 binding sites), the CCACCA sequence, which has been described in the SV40 enhancer, though the identity of the putative binding factor(s) (HC3) is not known (28), and a motif, CCCGG, which is present in the maize Adh1 promoter (29). In addition, several binding sites that play a role in the induction of gene expression and in cell proliferation and differentiation were noted. These include cyclic AMP-responsive and TPA-responsive elements, and Myc, AP-2, and CAAT/enhancer-binding protein binding sites. For some of these elements, namely Sp1 and Sp1-like binding factor, AP-2, and CAAT/enhancer-binding protein, a role in the transcription of other intestinal genes has been suggested (30-34). The presence of DNase I-hypersensitive sites and several putative cis-regulatory motifs in the 5′ region of the MUC2 gene is consistent with its role as promoter for MUC2. To explore this further, we tested for the presence of promoter/enhancer activity in the MUC2 5′-flanking sequence. As shown in Fig. 1B, distinct portions of the 15-kb EcoRI fragment were subcloned into the pGL2 vector series harboring the luciferase reporter gene.
These plasmids contain DNA segments starting from the MUC2 translation initiation site (+1), which is located 25 nucleotides downstream from the mRNA cap sites, and extending to −364 in the 0.3 construct; −516 in 0.5; −848 in 0.8; −5183 in 5.0 kb; and −9062 in the 9.0-kb construct, respectively (Fig. 1B). These fragments were cloned into the basic vector, which does not contain any regulatory element, as well as into the enhancer vector, which harbors the SV40 enhancer sequence downstream of the luciferase gene. The resulting MUC2/luciferase plasmids are labeled "b" (basic) and "enh" (enhancer) to indicate the vector background.
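The survey of putative recognition sequences in the 5′-flanking region amounts to a pattern search on both DNA strands. The following is a minimal sketch of such a scan, run on a short made-up sequence rather than on the deposited U68061 sequence. CCACCA and CCCGG are the motifs named in the text; GGGCGG is used here as an assumed core sequence for a GC box.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motif(seq: str, motif: str) -> list:
    """Return sorted (start, strand) hits; start is 0-based on the top strand."""
    hits = set()
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        # A zero-width lookahead reports overlapping occurrences as well.
        for match in re.finditer(f"(?={pattern})", seq):
            hits.add((match.start(), strand))
    return sorted(hits)


# Made-up 30-nt sequence used only to exercise the scan.
TOY_SEQUENCE = "TTCCACCAGGGCGGAACCCGGTTTGGTGGA"
MOTIFS = {
    "CCACCA (from the text)": "CCACCA",
    "CCCGG (from the text)": "CCCGG",
    "GC box core (assumed GGGCGG)": "GGGCGG",
}

for label, motif in MOTIFS.items():
    print(label, find_motif(TOY_SEQUENCE, motif))
```

Scanning the reverse complement as well as the motif itself matters because a binding site can lie on either strand of the flanking DNA.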
FIG. 3 (legend fragment). ... 11p15 (MUC5AC, MUC5B, and MUC6). Total RNA was isolated from the indicated cell lines, which were or were not induced with the indicated agents, and analyzed by Northern blot for the expression of MUC5AC (A), MUC5B (B), and MUC6 (C), using, as probes, oligonucleotides LA175, AV-1, and LA184, respectively, as described under "Materials and Methods." The location of the 28 and 18 S rRNA is indicated. Also, in B, control sample (lane C) of LS174T cells shows partial RNA degradation.
We transiently transfected the MUC2/luciferase plasmids into two different human intestinal cell lines that show distinct patterns of expression of the endogenous MUC2 gene. HT29 cells do not express MUC2 unless stimulated by any one of several agents, including forskolin and TPA. In contrast, LS174T cells have a constitutive very high level of MUC2 mRNA expression. Although only a modest increase in luciferase activity of the MUC2 reporter constructs, ranging between 2- and 3-fold compared with control vectors, was detected, a general pattern of expression emerged. When the MUC2 fragments were tested for transcriptional activity in the enhancer background (Fig. 7A), maximal activity was associated with fragment 0.8, which extends up to −848 relative to the AUG, both in HT29 and LS174T cells. In HT29 cells the luciferase activity associated with both enh0.5 and enh0.8 plasmids was significantly greater than that of control (p < 0.05, signed rank test). A reduced activity was associated with sequences located further upstream (fragments 5.0 and 9.0 in Fig. 1B) when tested in the enhancer background. However, in the basic background, the 0.8 and 5.0 fragments were equally active, in both HT29 and LS174T cells.
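The significance statement above rests on a signed rank test. Assuming this refers to the Wilcoxon signed-rank test applied to paired test and control activities, the exact two-sided p value can be obtained by enumerating all sign assignments, as in the sketch below. The paired differences are hypothetical; the text does not give the individual measurements or the number of experiments.

```python
from itertools import product


def signed_rank_exact(differences):
    """Exact two-sided Wilcoxon signed-rank test.

    Zero differences are dropped and tied magnitudes receive average ranks.
    Returns (W_plus, p_value). Enumeration is only practical for small n.
    """
    d = [x for x in differences if x != 0]
    n = len(d)
    if n == 0:
        raise ValueError("no non-zero differences")
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    total = sum(ranks)
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    observed = min(w_plus, total - w_plus)
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if min(w, total - w) <= observed:
            extreme += 1
    return w_plus, extreme / 2 ** n


# Hypothetical paired differences (test minus control), six experiments.
diffs = [1.4, 0.9, 2.1, 1.1, 0.7, 1.8]
w_plus, p_value = signed_rank_exact(diffs)
print(f"W+ = {w_plus}, two-sided p = {p_value}")
```

With this test, six pairs that all change in the same direction are the smallest sample that can reach p < 0.05 two-sided; five such pairs give p = 0.0625.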
The expression data obtained with MUC2 fragments inserted in the enhancer background suggest the presence of negative regulatory elements. This was confirmed in experiments using a construct harboring the DNA sequence located between −516 and −5183 (fragment 4.3, Fig. 1B) inserted 5′ to the SV40 promoter in the promoter/luciferase vector, the "p" vector. Consistently, we observed that the luciferase activity, as driven by the SV40 promoter, was repressed up to 80% in the presence of the MUC2 4.3 fragment (Fig. 7C).
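The construct boundaries quoted above can be tabulated to check how the fragment names relate to their coordinates, and the repression figure can be expressed as a simple ratio. The sketch below uses the coordinates given in the text; the luciferase values passed to the repression function are hypothetical.

```python
# Upstream boundary of each construct relative to the AUG (+1), from the text.
UPSTREAM_END = {"0.3": -364, "0.5": -516, "0.8": -848, "5.0": -5183, "9.0": -9062}


def span_between(proximal: int, distal: int) -> int:
    """Nucleotides separating two upstream coordinates (both negative)."""
    return abs(distal) - abs(proximal)


def percent_repression(with_fragment: float, without_fragment: float) -> float:
    """Percent loss of reporter activity caused by an inserted fragment."""
    return 100.0 * (without_fragment - with_fragment) / without_fragment


# Fragment 4.3 as described above: between -516 and -5183.
span_4_3 = span_between(UPSTREAM_END["0.5"], UPSTREAM_END["5.0"])
print(f"fragment 4.3 spans about {span_4_3} nt")

# Hypothetical normalized activities of the SV40 promoter vector.
print(f"repression: {percent_repression(20.0, 100.0):.0f}%")
```

The computed span of 4667 nucleotides is close to the nominal 4.3-kb label and to the 4656-nucleotide BamHI-SacI fragment described under "Materials and Methods."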
HeLa cells, a human cell line derived from a cervical carcinoma, do not express MUC2 whether uninduced or induced with forskolin or TPA (data not shown). Nevertheless, a pattern of expression similar to that observed in HT29 and LS174T cells was obtained when these plasmids were transfected into HeLa cells, as shown in Fig. 7, A-C. There were also marked differences, however. First, the level of activity of the MUC2 promoter was at least an order of magnitude higher in HeLa cells than in any of the intestinal cells tested. The highest luciferase activity was associated distinctly with plasmids containing the MUC2 fragment +1 to −848 (fragment 0.8), both in the basic and enhancer background, but, unlike the intestinal lines, this fragment conferred much higher activity when assayed in the basic vector as compared with the enhancer (100-fold induction of b0.8 versus 35-fold of enh0.8 in Fig. 7, A and B). The luciferase activity associated with both the enh0.8 and b0.8 plasmids was significantly greater than that of controls (p < 0.05). Other aspects of expression in HeLa cells are similar to those seen in the intestinal cell lines, including the dramatic inhibition of luciferase activity associated with the MUC2 sequence spanning −848 to −5183 (fragment 4.3, Fig. 1B) in the p4.3 construct.

DISCUSSION
The human MUC2 gene encodes the major mucin peptide expressed in the intestine (10, 12). In normal intestine, MUC2 is almost exclusively localized in goblet cells; thus it is an important marker for the study of differentiation of this cell lineage. In addition, altered expression of apomucins has been reported to occur in cancer (10, 11, 18). Thus, to investigate the molecular mechanisms that govern the expression of MUC2 during the differentiation process of goblet cells and the alterations that occur in malignant transformation, we undertook cloning of the MUC2 locus.
In this paper we report the isolation of a contig spanning the entire MUC2 locus and a partial physical and functional characterization of the MUC2 5′-flanking sequence. In addition, the analysis of the DNA sequence upstream of MUC2 revealed the presence of an additional mucin-like gene, which we refer to as MUCX. This is not a surprising observation, since on chromosome 11p15, at the same general band where MUC2 resides, at least three additional mucin genes (MUC6, MUC5AC, and MUC5B) have been localized (4-6). The evidence we present for the presence of an additional mucin gene close to MUC2 is twofold. First, a contig fragment (probe X, Fig. 1A), used as a probe, detected an mRNA with typical physical characteristics of some of the other mucin mRNAs, namely polydispersity (1), in a mucinous cell line. Second, three areas in probe X showed homology with distinct portions of the MUC2 cDNA, both at the nucleotide and corresponding amino acid level. These regions of similarity lay in the D2 and D3 domains located in the amino-terminal portion of the MUC2 protein, which are characterized by the presence of cysteine residues and by a high degree of sequence similarity to four D-domains in prepro-von Willebrand factor (26). The conservation of number and position of Cys residues is a hallmark of apomucins, and based on analogy with the von Willebrand factor model, it has been suggested that these Cys residues are critical for promoting mucin oligomerization (26).
FIG. 5 (legend fragment). ... Genomic DNA was isolated, digested with EcoRI, and analyzed by Southern blot. Hypersensitive sites were revealed by indirect end labeling using a PstI probe spanning the initial 2.6 kb of the EcoRI fragment shown in Fig. 1A, localized downstream of the MUC2 AUG initiation codon. B, nuclei were isolated from LS174T and HL60 and processed as in A. Also indicated are the molecular weight DNA markers.
The identity of MUCX is not clear, but our data and a recent report (19) on the organization of mucin genes at 11p15 narrow the possibilities. The pattern of expression of mRNA detected with the MUCX probe (probe X in Fig. 1A) in HT29 and LS174T cells corresponds to that of both MUC6 and MUC5B, all of which were expressed exclusively in LS174T cells (Table I). In contrast, MUC5AC expression was induced in HT29 cells by forskolin and TPA. Similar results for the expression of MUC6 and MUC5AC have been reported recently by others (35). Thus, MUC5AC is eliminated as a candidate, even though it flanks MUC2 (19). MUC6 has also been reported to flank MUC2, and thus, MUC6 is a stronger candidate for MUCX than is MUC5B.
Our sequence data on MUCX do not yet resolve this issue. Identity between MUCX and MUC6 (5) could not be established, since the partial cDNA sequences for MUC6 in the literature correspond to the tandem repeats and 3′ unique regions of the gene, while the MUCX sequence reported herein is, based on homology to MUC2, most likely in the 5′ end portion of a mucin gene. That MUCX represents a 5′ region is consistent with the good homology found between MUCX and HGM-1, a potential cDNA spanning a more 5′ unique region of the MUC5AC gene (27), but again the lack of sequence identity between MUC5AC and MUCX reinforces the conclusion from the expression data that MUCX is not MUC5AC. Thus, it seems that the genomic sequence of MUCX that we have isolated may correspond to the 5′ portion of MUC6 or of a novel mucin gene, not yet fully characterized. The definite identification of the MUCX gene will require the isolation and characterization of the corresponding cDNA.
The MUCX gene is transcribed in the opposite direction as compared with MUC2, raising the possibility that the two genes may share regulatory elements and pattern of expression. However, our data on the expression of MUCX suggest otherwise. In fact, MUCX is exclusively expressed in LS174T and undetected in HT29 cells, whereas MUC2 expression is high in LS174T and can be induced by TPA and forskolin in HT29 cells. These data suggest that the two genes are independently regulated. This notion is consistent with the specific tissue distribution of mucins whose genes have been mapped on chromosome 11p15 (9).
The contig we have isolated contains 50 kb of DNA upstream of the MUC2 gene. We have sequenced 12 kb of this 5′-flanking region of MUC2, where elements that impart MUC2 tissue-specific and differentiation-dependent regulation may reside. Furthermore, functional analysis was performed using distinct portions of the 12-kb upstream sequence linked to the luciferase reporter gene and transfection of these MUC2/luciferase plasmids into two different intestinal cell lines characterized by a unique pattern of expression of the endogenous MUC2 gene. Our data indicate that elements that impart promoter activity are present in the 5′ region of the MUC2 gene. Although two different lines (LS174T and HT29 cells) are characterized by either constitutive high or well inducible levels of MUC2 expression, and they correspond to mucinous and stem-like cells, respectively, there is a clear consistency in the activity associated with the different segments of the MUC2 promoter in these cell lines. A fragment extending from the AUG to nucleotide −848 (fragment 0.8 in Fig. 1B) gives the maximal transcriptional increase over the corresponding control plasmid. This increase is modest, varying between 2- and 5-fold, and was detected both when the 0.8 fragment was inserted in an enhancer or basic background. This modest increase is consistent with our previous data suggesting that the MUC2 gene is transcribed at a low rate (14). Moreover, in a basic background, the 5.0 fragment conferred an activity similar to that associated with the 0.8 fragment. Consistent with our results it has been reported that sequences located between −1308 and −641, relative to the cap site, are important for promoter activity when linked to a reporter gene (36). In an enhancer background, however, this fragment exerts a negative effect.
Further analysis of this region will establish whether, as the data suggest, sequences located between −848 and −5183 in the MUC2 5′ region compete with the SV40 enhancer for trans-acting factors.
The down-modulation of the luciferase gene by MUC2 DNA sequences extending further upstream from the 0.8-kb fragment is further documented, in all the cell lines we have tested, by the consistent repression of the luciferase gene when it is driven by the SV40 promoter in the presence of the 4.3-kb MUC2 fragment located between −516 and −5183. The repressor activity of the 4.3-kb fragment is much greater when tested in conjunction with the SV40 promoter. We are investigating whether this repressor activity shows promoter specificity by testing fragment 4.3 in conjunction with additional promoters. Although our data do not distinguish between nonspecific mechanisms due to competition for transcription factors, which are important for MUC2 expression, and the presence of specific negative elements, it is noteworthy that several intestinal genes have been shown to be regulated by the combined action of both positive and negative elements, which ensure both cell lineage-specific expression and proper temporal and spatial expression of the gene in the intestine (37,38).
Localized to the 0.8-kb fragment is one of the two DNase I-hypersensitive sites that we have mapped in the MUC2 promoter at approximately −600 and −1600 from the AUG. These two hypersensitive sites are present in both unstimulated and stimulated HT29 cells, consistent with our previous data indicating that MUC2 induction in HT29 cells occurs mainly through a post-transcriptional mechanism. The same sites are present in LS174T cells, which have very high levels of MUC2 mRNA. A specific role of these hypersensitive sites in the expression of MUC2 in intestinal cells is suggested by the presence of a single different site in the chromatin of HL60 cells, an unrelated cell line that does not express MUC2.
HT29 cells are considered equivalent to stem cells and do not express markers of differentiation, yet they have the same DNase I-hypersensitive sites as LS174T cells, which express high levels of MUC2. This observation suggests that HT29 cells are already committed to differentiation, although not yet lineage-restricted. Indeed, we have shown that, depending on the stimuli, these cells can express simultaneously markers that in the mature cells are cell lineage-restricted (39). Whether the process of lineage restriction is accompanied by a reorganization of the chromatin of those genes that are not expressed in the mature cells is presently not known.
FIG. 7. Functional analysis of the 5′-flanking region of the MUC2 gene. A, the activity of each MUC2/luciferase construct, as schematically represented in Fig. 1B, in the enhancer background, is presented for each of the cell lines tested. B, the activity of constructs, harboring the same MUC2 fragments assayed in A, in the basic background is presented. C shows the repressor activity of a 4.3-kb MUC2 fragment when tested in conjunction with the SV40 promoter. The luciferase activity for all three panels is calculated relative to the activity of the corresponding control plasmid (fold of induction) after correction for transfection efficiency, as determined by the activity of a CMV-βgal reference plasmid. Names of the plasmids correspond to the sizes of the fragment tested, as illustrated in Fig. 1B. Prefixes enh, b, and p indicate the background of the corresponding plasmids: enhancer, basic, and promoter, respectively. Each construct was tested in at least three separate experiments in duplicate. Standard errors are indicated with bars.
We found that in HeLa cells the pattern of transcriptional activity for each MUC2/luciferase plasmid was similar to that observed in the intestinal cells. However, the level of activity was much greater, being at least 20-fold higher than that detected in intestinal cells, upon correction for transfection efficiency. Although surprising, since HeLa cells do not express the MUC2 gene under any tested conditions (data not shown), it is possible that the activity detected in HeLa cells reflects the absence from the transfected DNA of the proper chromatin structure associated with regulatable MUC2 gene expression. For example, we have previously reported that MUC2 mRNA is induced by forskolin and TPA in HT29 cells predominantly at a post-transcriptional level (14).
Accordingly, the activity of the MUC2/luciferase plasmids was not significantly modulated in HT29 cells by treatment with these agents, confirming that transcription does not play a prominent role in the regulation of MUC2 in these colonic cells (data not shown). However, the same constructs could be modestly induced by TPA in HeLa cells (data not shown). Inspection of the first 2600 nucleotides upstream from the MUC2 AUG reveals the presence of several consensus AP1 binding sites, the TPA-responsive elements, which can mediate TPA induction, as well as an AP2 consensus site, which can also confer TPA responsiveness (40). Most likely, these sites mediate TPA induction of the MUC2/luciferase constructs in HeLa cells, whereas they are inactive in HT29 cells, suggesting that, in the context of the MUC2 promoter, these sites are not functional.
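The normalization described in the legend to Fig. 7 (luciferase activity corrected for transfection efficiency with the CMV-βgal reference plasmid, then expressed relative to the corresponding control plasmid) can be illustrated with a minimal sketch. The function names and the readings in the usage note are hypothetical and are not taken from our data; the sketch assumes that the correction is a simple ratio of luciferase to β-galactosidase activity and that duplicates are averaged before the fold of induction is computed.

```python
from math import sqrt
from statistics import mean, stdev


def normalized_activity(luciferase, beta_gal):
    """Correct one raw luciferase reading for transfection efficiency.

    Assumption: the correction is the ratio luciferase / beta-gal.
    """
    if beta_gal <= 0:
        raise ValueError("beta-gal reference activity must be positive")
    return luciferase / beta_gal


def fold_induction(construct_readings, control_readings):
    """Fold of induction of a construct over its control plasmid.

    Each argument is a list of (luciferase, beta_gal) pairs, e.g. the
    duplicate transfections of one experiment.
    """
    construct = mean(normalized_activity(l, b) for l, b in construct_readings)
    control = mean(normalized_activity(l, b) for l, b in control_readings)
    return construct / control


def mean_and_sem(values):
    """Mean and standard error of the mean across separate experiments."""
    values = list(values)
    n = len(values)
    sem = stdev(values) / sqrt(n) if n > 1 else 0.0
    return mean(values), sem
```

As a usage example with made-up numbers, `fold_induction([(300, 10), (330, 11)], [(100, 10), (120, 12)])` gives a 3-fold induction for one experiment, and `mean_and_sem` can then summarize the fold values obtained in three or more separate experiments.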
The presence in the 5′-flanking region of MUC2 of a number of consensus sequences for transcription factors is not unexpected, and their functional significance requires further investigation. Nonetheless, comparison with the promoter of other genes expressed in the intestine may provide insight into the regulation of MUC2 during differentiation and transformation. Recently, it has been reported that an Sp1-like factor binds to a GC box in the MUC5B gene (41) and may play a role in its regulation in HT29-MTX cells, a methotrexate-resistant clone of HT29 cells characterized by the expression of several mucin genes (42). Whether this factor has any general relevance for the regulation of additional mucin genes, including MUC2, which contains several GC boxes in its 5′ region, has yet to be investigated.
An Sp1-like factor has also been implicated in the regulation of the carcinoembryonic antigen gene (CEA) (29,30), which can be expressed in goblet cells, as is MUC2, as well as in the absorptive cell lineage. The partially overlapping pattern of expression of CEA and MUC2 (39) suggests that these genes may share some common regulatory elements. Moreover, cell lines derived from mucinous tumors and expressing high levels of MUC2 are also characterized by very high levels of CEA secretion (43). Mucinous tumors generally have a higher frequency of c-myc amplification as compared with more common colorectal tumors (44), although no association between c-myc alterations and MUC2 or CEA deregulation has been established. However, it is worth pointing out that a consensus c-myc binding site, an E box, is located at −1329 from the AUG of MUC2 and that a similar site, present in the CEA promoter, binds USF (29), a member of the basic helix-loop-helix leucine zipper proteins (45), as is c-Myc (46). All the Myc family members recognize an identical core sequence, CACGTG, and bind to the corresponding site as homo- or heterodimers (46-48). Whether the USF binding site in the CEA promoter can be modulated by c-Myc in conjunction with any of its partners is not known. However, it is tempting to speculate that MUC2 and CEA may be coordinately deregulated via alterations of common transcription factors, resulting in the progression of the malignant phenotype.
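A search for the E box core sequence CACGTG in an upstream sequence, with hits reported relative to the AUG, can be sketched as follows. This is only an illustration: the function names and the toy sequence in the usage note are hypothetical, and the numbering convention (the A of the AUG is +1, the base immediately upstream is −1, with no position 0) is an assumption. Because CACGTG is identical to its own reverse complement, scanning one strand is sufficient for this particular motif.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def relative_position(index, aug_index):
    """Convert a 0-based index to a position relative to the AUG.

    Assumed convention: the A of the AUG is +1 and the base just
    upstream of it is -1 (there is no position 0).
    """
    offset = index - aug_index
    return offset + 1 if offset >= 0 else offset


def find_motif_positions(sequence, aug_index, motif="CACGTG"):
    """List the AUG-relative position of the first base of every motif hit.

    Overlapping hits are reported; the search is case-insensitive.
    """
    seq = sequence.upper()
    motif = motif.upper()
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(relative_position(start, aug_index))
        start = seq.find(motif, start + 1)
    return positions
```

For the toy sequence `GGCACGTGTTATGAAA`, in which the A of the ATG is at 0-based index 10, `find_motif_positions("GGCACGTGTTATGAAA", aug_index=10)` returns `[-8]`.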