Initiation of transcription of the MUC3A human intestinal mucin from a TATA-less promoter and comparison with the MUC3B amino terminus.

Human intestinal mucin genes MUC3A and MUC3B are members of a membrane mucin gene family residing at chromosome 7q22. In this paper, we utilized genomic and cDNA cloning to elucidate the sequence of the 5'-region of the MUC3A gene including the gene promoter and the amino terminus coding sequence. Following its 21-residue signal peptide, the amino terminus of the mucin consists of a 233-residue Thr-, Ser-, and Pro-rich nonrepetitive sequence that is contiguous with its hypervariable domain of 375-residue repeats. RNase protection analysis and 5'-GeneRacer PCR indicated that MUC3A gene transcripts initiate from multiple start sites along a region spanning approximately 180 bases. The 5'-flanking region of the gene had promoter activity when fused to a luciferase reporter gene in all of the tested cell lines. This region contained binding sites for several transcription factors, including those implicated in the regulation of intestinal genes, but lacked a cognate TATA box. These features of the gene promoter may enable the gene to be expressed at variable levels in several cell types with different repertoires of transcription factors. We also utilized 5'-GeneRacer PCR to determine the sequence of the 5'-terminus of the MUC3B message. The amino termini of the MUC3A and MUC3B mucins are 91% conserved at the amino acid level. Thus, MUC3A and MUC3B have highly conserved amino and carboxyl termini, suggesting a recent duplication of the entire ancestral gene. It remains to be determined whether other members of the 7q22 membrane mucin gene family have amino-terminal domains similar to MUC3A and MUC3B.

Membrane mucins are an important class of glycoproteins with diverse structures and functions. These molecules contain extracellular domains that serve as a scaffold for O-glycosylation. The O-glycans associated with membrane mucins are thought to function in cytoprotection and have been demonstrated to confer anti-adhesion properties upon cells (1). This latter characteristic may play a role in the dissemination and spread of cancer cells. In addition to conferring these electrostatic/physical properties upon cells, membrane mucins can anchor carbohydrate moieties with specific functions. Selectin ligands associated with membrane mucin glycans, for example, play a role in cancer cell extravasation during metastases (2). Certain membrane mucins function in signal transduction as well (3-5). Several membrane mucins also serve as clinically important tumor antigens (6,7).
All membrane mucins contain O-glycosylated extracellular domains; however, the size of these domains varies widely, from a few hundred amino acid residues to several thousand residues (8-11). The composition and structure of other domains of these molecules also vary, suggesting different functions. Moreover, the alternative splicing of several membrane mucins leads to forms with deleted domains and to truncated forms lacking membrane-binding domains, rendering them soluble (12-15).
The membrane mucin gene cluster located at human chromosome 7q22 contains at least four members including MUC3A, MUC3B, MUC17, and MUC11/12 (which may be one or two genes) (16,17). These mucins contain large extracellular glycosylation domains that exhibit variable number of tandem repeat (VNTR) polymorphism. They also contain extracellular EGF-like domains, a transmembrane domain, and a cytoplasmic domain. We have focused much attention upon the MUC3 mucins, MUC3A and MUC3B (12,16,18,19). The extracellular regions of these glycoproteins both contain two polymorphic tandem repeat domains, one containing 375-residue tandem repeats and another containing 17-residue tandem repeats (19). The carboxyl termini of these mucins contain a mucin-like (Ser/Thr-rich) unique sequence, two EGF-like domains separated by a SEA module, a transmembrane domain, and a cytoplasmic domain (12,16). The exonic and intronic sequences of the carboxyl termini of MUC3A and MUC3B are ~95% identical, suggesting a recent gene duplication (16).
In this report we describe the sequences of the MUC3A gene 5'-flanking region and amino terminus, thus completing the MUC3A sequence. The coding sequence of the 5'-region of MUC3A consists of a signal peptide and a mucin-like but nonrepetitive amino terminus domain. Transcription is initiated from a TATA-less promoter at multiple start sites. We also describe the homologous coding region for MUC3B, which is highly conserved, indicating that the entire ancestral gene was duplicated.

EXPERIMENTAL PROCEDURES
In this paper, the term MUC3 refers to a clone derived from MUC3A or MUC3B or a hybridization probe or procedure capable of detecting both genes and their derived products. This terminology is used because almost all clones obtained in previous studies could have been derived from either gene and will hybridize to either gene (16). The human small intestinal cDNA library used previously to obtain MUC2 and MUC3 clones (18-20) was screened using SIB227 as a hybridization probe. SIB227 consists entirely of sequence derived from a 1125-bp MUC3 tandem repeat domain (19). Hybridizing phage were plaque-purified, DNA was isolated, and inserts were retrieved by EcoRI digestion. DNA was sequenced following its cloning into pBluescript. All of the exonic sequence and most of the noncoding sequence were obtained using manual sequencing methods; however, some of the noncoding sequence was determined by automated sequencing performed by the University of California at San Francisco Cancer Center Genome Analysis Facility.
RNA was extracted from human LS174T colon cancer cells and from normal duodenal biopsy samples pooled from 10 patients using Tri reagent as described (21). DNA was extracted from the lymphocytes of individual donors using previously described procedures (21). RNA blot analysis was performed using SIB124 as a probe (18). Reverse transcription and PCR were performed as described (22).
Clones representative of 5'-termini of MUC3A and MUC3B messages were obtained using a GeneRacer kit purchased from Invitrogen. The protocol accompanying the kit was used essentially without modification. Briefly, 2 µg of total RNA was treated with calf intestinal alkaline phosphatase and subsequently with tobacco acid pyrophosphatase. This procedure is designed to leave a 5'-phosphate moiety only on RNAs that contain a cap structure, thus selecting for 5'-full-length messages. This material was then joined to the GeneRacer 5'-RNA oligonucleotide (5'-CGACUGGAGCACGAGGACACUGACAUGGACUGAAGGAGUAGAAA) using RNA ligase. The ligated mRNA was then reverse transcribed using Superscript II reverse transcriptase and a gene-specific deoxyribonucleotide primer, 5'-TGGGTGGAGTGGTTTCAAC (reverse complement of bases 2558-2576 in Fig. 1B). The initial PCR utilized GeneRacer 5'-primer (5'-CGACTGGAGCACGAGGACACTGA) and 5'-TGAAGGCAAACTTGGACG as gene-specific primer (reverse complement of bases 2538-2555 in Fig. 1B). Nested PCR, when performed, utilized GeneRacer 5'-nested primer (5'-GGACACTGACATGGACTGAAGGAGTA) and 5'-AGGTGTTGGAACTGACTGG as gene-specific nested primer (reverse complement of bases 2507-2525 in Fig. 1B). PCR products were cloned into pCR4-TOPO vector for sequencing.
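The primer relationships described above can be checked with a short script. The following is a minimal sketch, not part of the original protocol: it takes the sequences as quoted in the text (with line-break hyphens and stray spaces removed), confirms that the two GeneRacer DNA primers lie within the GeneRacer RNA oligonucleotide once U is read as T, and confirms that each gene-specific primer has the length implied by its quoted base range in Fig. 1B. The helper names are illustrative only.

```python
# Consistency check on the GeneRacer and gene-specific primers quoted in the text.
GENERACER_RNA_OLIGO = "CGACUGGAGCACGAGGACACUGACAUGGACUGAAGGAGUAGAAA"
GENERACER_5P_PRIMER = "CGACTGGAGCACGAGGACACTGA"
GENERACER_5P_NESTED = "GGACACTGACATGGACTGAAGGAGTA"

# (sequence, first base, last base) with positions as quoted for Fig. 1B
GENE_SPECIFIC_PRIMERS = {
    "reverse transcription": ("TGGGTGGAGTGGTTTCAAC", 2558, 2576),
    "initial PCR": ("TGAAGGCAAACTTGGACG", 2538, 2555),
    "nested PCR": ("AGGTGTTGGAACTGACTGG", 2507, 2525),
}


def rna_to_dna(seq):
    """Return the DNA spelling of an RNA sequence (U replaced by T)."""
    return seq.upper().replace("U", "T")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


oligo_dna = rna_to_dna(GENERACER_RNA_OLIGO)

# Offsets (0-based) of the outer and nested 5'-primers within the adapter.
outer_offset = oligo_dna.find(GENERACER_5P_PRIMER)
nested_offset = oligo_dna.find(GENERACER_5P_NESTED)

# Each gene-specific primer should span exactly the quoted number of bases.
lengths_ok = all(
    len(seq) == last - first + 1
    for seq, first, last in GENE_SPECIFIC_PRIMERS.values()
)

print("outer primer offset in adapter:", outer_offset)
print("nested primer offset in adapter:", nested_offset)
print("gene-specific primer lengths match quoted ranges:", lengths_ok)
```

Because the nested adapter primer lies downstream of the outer one and the nested gene-specific primer lies upstream of the initial one, the nested amplicon falls entirely inside the initial amplicon, which is the point of the nested design.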
RNase protection mapping of transcription start sites was performed using kits (RPA II and MAXIscript) purchased from Ambion. Templates for riboprobes were prepared by PCR using deoxyribonucleotide primers 5'-GGAGGAAGGGCAGCTTCCAG (bases -278 to -259 in Fig. 1B) and 5'-AGCGGGCAGGGGGCTGG (reverse complement of bases 35-51 in Fig. 1B). DNA isolated from human chromosome 7 BAC clone CTC-786K12 (purchased from Research Genetics, Inc.) was used in this reaction. The amplicon was ligated into pCR2.1, and clones containing insert in both orientations were identified by sequence analysis. The templates were linearized by digestion with BamHI, and the reactions were stopped and made RNase-free by digestion with proteinase K. The material was then extracted with chloroform/phenol and then chloroform and precipitated with sodium acetate/ethanol. After drying, the templates were dissolved in diethylpyrocarbonate-treated water. The riboprobes were synthesized from 1 µg of template using the protocol described in the MAXIscript kit. To achieve high specific activity labeling, 2.0 µM unlabeled CTP was used in conjunction with 100 µCi of [α-32P]CTP (specific activity > 3000 Ci/mmol; ICN) in a final volume of 20 µl. The ribonuclease protection reactions were conducted using 25 µg of total RNA, 100,000 cpm riboprobe, and the protocol described in the RPA II kit. The final products were subjected to electrophoresis on a standard sequencing gel, using the products derived from a sequencing reaction as size standards as described (20).
Transient transfections of MUC3A promoter/luciferase reporter constructs were performed using the pGL3-Basic vector (Promega). Six constructs were prepared using CTC-786K12 DNA and the two forward and three reverse primers described in Table I. Forward primers incorporated SstI sites, and reverse primers incorporated SmaI sites to facilitate cloning into pGL3-Basic. For transfections, the cells were seeded into 24-well plates in Dulbecco's modified Eagle's medium with 10% fetal bovine serum. The transfections were carried out using Superfect (Qiagen) according to the manufacturer's instructions. Transfection conditions were optimized for each cell line. For AsPC, H498, SW1990, HPAF, and LS174T cell lines, 0.68 µg of test plasmid, 0.12 µg of pRLO (Promega; promoter-less Renilla luciferase internal control vector), and 4 µl of Superfect were added in a transfection volume of 500 µl for each well. For Caco2 and Panc-1 cells, 1 µg of test plasmid, 0.12 µg of pRLO, and 10 µl of Superfect were used. After 2 days, the cells were harvested, and the luciferase activities were assessed using a dual luciferase assay kit (Promega). The results were normalized within cell lines by dividing them by the results obtained with pGL3-Control vector (Promega), which utilizes the SV40 promoter and enhancer to drive luciferase transcription.
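The normalization described above can be illustrated with a small calculation. This is a sketch under stated assumptions rather than the authors' exact procedure: it assumes that each firefly luciferase reading is first divided by the Renilla reading from the co-transfected internal control (the usual practice with dual luciferase assays) and that the resulting ratio is then divided by the corresponding ratio for pGL3-Control in the same cell line. The function name and all readings are hypothetical.

```python
def relative_promoter_activity(firefly, renilla, control_firefly, control_renilla):
    """Promoter activity of a test construct relative to pGL3-Control.

    Assumption: firefly/Renilla ratios correct for transfection efficiency,
    and the test ratio is then expressed as a fraction of the control ratio
    measured in the same cell line.
    """
    if renilla <= 0 or control_renilla <= 0 or control_firefly <= 0:
        raise ValueError("luciferase readings must be positive")
    test_ratio = firefly / renilla
    control_ratio = control_firefly / control_renilla
    return test_ratio / control_ratio


# Hypothetical readings (arbitrary light units) for one construct in one cell line.
activity = relative_promoter_activity(
    firefly=12000.0, renilla=400.0,
    control_firefly=90000.0, control_renilla=300.0,
)
print(f"activity relative to pGL3-Control: {activity:.2f}")
```

Expressing every construct as a fraction of the SV40-driven control within each cell line makes patterns comparable between lines that differ in transfection efficiency.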

RESULTS
The MUC3A Amino Terminus-To obtain MUC3A or MUC3B amino terminus clones, the human intestinal cDNA library was first screened using SIB227 as a hybridization probe. Approximately 20 clones were obtained, successfully plaque-purified, and sequenced. Many of these clones consisted entirely of elements of the 1125-base tandem repeat domain, as does SIB227. Some were chimeras with unrelated DNA fragments, as commonly occurs when cloning repetitive cDNAs (19). However, one clone, designated SIB275, encoded a mucin-like unique sequence upstream of and in frame with SIB227. Thus, SIB275 was considered a likely candidate as an amino terminus MUC3A or MUC3B cDNA clone.
We obtained a genomic clone containing MUC3A sequence by BLAST searching the GSS data base using SIB227 as query sequence. BAC clone CTC-786K12 was identified as containing this sequence at one terminus, reading in a 3' to 5' direction. The other terminus of this clone contained sequence identifiable as part of the zonadhesin gene. Thus, we concluded that this clone should contain the 5'-portion of at least one of the MUC3 genes and associated flanking sequence. (Later work identified this clone as containing MUC3A (16).) This clone was therefore purchased from Research Genetics, Inc. to enable additional sequencing. We sequenced a 14-kb segment of this clone initiating from the terminus containing SIB227 sequence, a portion of which is shown in Fig. 1. Later, clones and contigs confirming and extending this sequence became available in public data bases (GenBank accession numbers AC118759 and BL000001).
TABLE I, footnotes. a, Underlined bases were changed to introduce SstI sites in forward primers and SmaI sites in reverse primers. b, Position specified indicates the termination point of the resulting insert in the pGL3 construct (positions as in Fig. 1B).
FIG. 1. A, the 5'-untranslated region initiates from multiple start sites as described in the text and therefore varies in length. The BAC clone CTC-786K12 initiates in the second exon of MUC3A and contains the 5'-portion of the gene and its 5'-flanking region, which resides on chromosome 7q22. The location of the primers used to perform GeneRacer PCR to obtain message sequence is also given. The initiation points for the clones obtained are given in B. R.T., reverse transcription primer. B, genomic DNA and message sequence of the 5'-region of MUC3A. Potential transcription factor binding sites were determined using TFSEARCH and MatInspector web sites (150.82.196.184/research/db/TFSEARCH.html and www.genomatix.de; Refs. 23 and 24). The initiation points for GeneRacer clones derived from butyrate-treated LS174T cell RNA are indicated by boxes above the sequence, and those derived from duodenal RNA are indicated by circles. The number of clones obtained initiating from a given base is indicated by the number within the box or circle. Solid dots above the sequence indicate major transcription start sites as determined by RNase protection analysis. Double underlines indicate a signal sequence (25). The start site of cDNA clone SIB275 is indicated by an arrow; the sequence terminates at the end of this cDNA clone. Splice sites demarcating the single intron are also indicated by arrows, as are the boundaries of the mucin-like unique sequence and the 375-amino acid tandem repeat array. The sequence was submitted to GenBank with accession number AY307930.
The clone CTC-786K12 sequence obtained confirmed the sequence of cDNA clone SIB275. By also extending the sequence, it revealed the presence of a possible initiation codon and signal peptide (Fig. 1). A single intron of 2167 bases was present in the genomic sequence, beginning at the termination point of the putative signal peptide. It was not certain from these data, however, that the MUC3A amino terminus had been reached. A first approach in examining this further was to screen the human intestinal cDNA library with a probe made by PCR from the first 250 bases of clone SIB275. Several clones were obtained, and the ends were sequenced (Fig. 1A). Two of these clones, SIB324 and SIB326, extended the SIB275 sequence read by 64 and 37 bases in the 5'-direction, respectively. The fact that cDNA clones derived from further upstream genomic sequences were not obtained in this screening suggested that these clones may represent the 5'-terminal region of the message. To examine this still further, 5'-GeneRacer PCR was conducted. Previous work demonstrated that duodenal RNA and butyrate-treated LS174T cell RNA contained high levels of MUC3 message (17); thus, these were used for this study. We were able to obtain product visible on an ethidium bromide-stained agarose gel after initial PCR with duodenal RNA; however, butyrate-treated LS174T cell RNA required that nested PCR be employed. A total of 16 clones were obtained with butyrate-treated LS174T cell RNA, of which 15 were similar to sequence shown in Fig. 1B. Interestingly, however, these clones initiated in different places along a 179-base segment as shown in Fig. 1B. A total of 15 clones were obtained from the duodenal RNA. Five of these clones had slightly different sequences and appear to be derived from MUC3B, as will be described later. The other 10 were derived from MUC3A, and the points at which they initiate are depicted in Fig. 1B. Three of these clones were derived from initial PCR, and seven were from nested PCR.
Multiple start sites were also observed with duodenal RNA. The experimentation described above suggested that MUC3A transcripts initiate from multiple start sites. This possibility was tested using RNase protection analysis. We have used this technique previously to map the start sites of MUC2 (20), DPPIV (26), and a MUC2 promoter/growth hormone transgene (21). Preliminary results indicated that the technique was not sensitive enough to map protected fragments utilizing butyrate-treated LS174T cell RNA (data not shown). The fact that GeneRacer PCR was successful after initial PCR with duodenal RNA but required nested PCR with butyrate-treated LS174T cell RNA suggested that duodenum is the richer source of MUC3A message. We therefore directed our efforts entirely upon utilizing duodenal RNA to map start sites by RNase protection analysis. We were able to detect protected fragments utilizing this technique with 25 µg of duodenal RNA and the high specific activity probe described under "Experimental Procedures." Multiple protected fragments were obtained with the antisense probe when duodenal RNA was used but not with the tRNA control (Fig. 2). In contrast, no bands were observed with the sense probe with RNA from either source. The actin control provided with the kit gave a single major band at the expected location (Fig. 2). The protected bands observed with duodenal RNA indicated the presence of eight major start sites for MUC3A transcription. Their locations, which span 141 bases, are indicated in Fig. 1B. In addition, many bands of lower intensity are evident as well, suggesting that MUC3A transcription can initiate from many start sites, in addition to the major ones. Thus, RNase protection analysis confirms the notion drawn from GeneRacer PCR that MUC3A transcripts initiate from multiple start sites.
Promoter Activity of the MUC3A Gene 5'-Flanking Sequence-The genomic region flanking the MUC3A transcription start sites contains the consensus sequences for many transcription factor-binding sites as depicted in Fig. 1B. Many of these elements including Cdx-2, GATA, USF, CACCC, and HNF-1 have been shown to be present within the promoters of genes expressed in the intestine (26-29). Noticeably absent, however, is a cognate TATA box, an important element that binds transcription factor IID, orienting and nucleating the transcription initiation complex of many tissue-specific genes (30,31). Because data from both GeneRacer PCR and RNase protection analysis indicated that the MUC3A message initiated from this region, we sought to determine whether the sequence encoded a functional promoter. To examine this we made luciferase promoter-reporter constructs that terminated before the most proximal transcription start site, within the region containing start sites, or after the most distal start site. As illustrated in Fig. 3A, two sets of these constructs were made, one initiating at base -775 in Fig. 1B and one initiating at base -623. Our rationale for this experimental design was that if the 5'-flanking region was a functional promoter, then constructs terminating downstream of the transcription start sites should produce luciferase activity in transient transfection assays. Constructs terminating upstream of the first transcription start site would not be expected to be active, however. This follows from the observation that transcripts are not initiated from upstream sequences in vivo.
These constructs were then transfected into four pancreatic and three colon cancer cell lines, and luciferase was assayed as a measure of promoter activity (Fig. 3B). The constructs were active in all of the tested cell lines; however, the relative levels of luciferase activity obtained varied markedly in magnitude. The pattern of activity elicited by the various constructs was remarkably similar in all tested lines. The constructs that terminated downstream of the most distal start site (-775/+57 and -623/+57) produced high levels of luciferase activity. Those that terminated within the region of start sites (-775/-62 and -623/-62) had intermediate levels of activity. Most importantly, however, constructs that terminated upstream of the first known MUC3A start site (-775/-242 and -623/-242) produced only minimal levels of luciferase activity. Little difference was observed whether the constructs initiated at base -775 or base -623. These results demonstrate that the MUC3A 5'-flanking region has functional promoter activity, but for this to be manifested a region containing transcription start sites must be included.
Comparison of MUC3A and MUC3B Amino Terminus Sequences-As described above, five clones from 5'-GeneRacer PCR with duodenal RNA were noted that had slightly different sequences from the MUC3A sequence described in Fig. 1B. We hypothesized that these clones may represent MUC3B sequence but also considered the possibility that they could be an allelic variant of MUC3A. As a test of these possibilities, we posed the question: Is the putative MUC3B sequence invariably present in the DNA of individual donors? If so, then this would indicate that the sequence represented a second gene rather than an allele of MUC3A. To amplify the sequences, we chose forward and reverse PCR primers from DNA segments in which both sequences were identical (forward, 5'-CTTTATCCACGGCCACATCC (bases 2289-2308 in Fig. 1B) and reverse, 5'-AGGCAAACTTGGACGTCGGG (reverse complement of bases 2533-2552 in Fig. 1B)). Genomic DNA samples from the lymphocytes of eight donors were used for the reactions. The products were then digested with a series of restriction enzymes chosen because they digest the MUC3A sequence or the putative MUC3B sequence but not both sequences (Fig. 4A). In this way we were able to test whether the individual samples contained both sequences. Results from two representative samples are shown in Fig. 4B. The size of the uncut PCR product was 263 bases. BfaI and HgaI are expected to digest the MUC3B sequence but not the MUC3A sequence, as described in Fig. 4A. This occurred in all of the tested samples, examples of which are shown in Fig. 4B. HhaI, on the other hand, is expected to digest MUC3A but not MUC3B. This result was also obtained with all of the samples. As a control, we digested DNA isolated from BAC clone CTC-786K12 with these enzymes. This was digested with HhaI but not with BfaI or HgaI (Fig. 4B), the expected result because this clone contains MUC3A sequence. These results indicate that all eight tested genomic DNA samples contain both the MUC3A and MUC3B sequences.
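The logic of choosing enzymes that cut one sequence but not the other can be sketched in a few lines. The example below is illustrative only: the two short sequences are invented stand-ins, not the MUC3A or MUC3B amplicons, and the function names are hypothetical. The recognition sequences used (BfaI, CTAG; HgaI, GACGC; HhaI, GCGC) are the standard ones for these enzymes. Because the HgaI site is not palindromic, both strands are searched.

```python
RECOGNITION_SITES = {"BfaI": "CTAG", "HgaI": "GACGC", "HhaI": "GCGC"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def has_site(seq, enzyme):
    """True if the enzyme's recognition sequence occurs on either strand."""
    site = RECOGNITION_SITES[enzyme]
    seq = seq.upper()
    return site in seq or reverse_complement(site) in seq


def diagnostic_enzymes(seq_a, seq_b):
    """Enzymes that recognize exactly one of the two sequences.

    Returns {enzyme: (cuts_a, cuts_b)} for each discriminating enzyme.
    """
    result = {}
    for enzyme in RECOGNITION_SITES:
        cuts_a, cuts_b = has_site(seq_a, enzyme), has_site(seq_b, enzyme)
        if cuts_a != cuts_b:
            result[enzyme] = (cuts_a, cuts_b)
    return result


# Invented toy sequences standing in for two closely related amplicons.
toy_a = "ATTGCGCATTCAAGTT"
toy_b = "ATTGACGCATTCTAGTT"

for enzyme, (cuts_a, cuts_b) in diagnostic_enzymes(toy_a, toy_b).items():
    print(f"{enzyme}: cuts A = {cuts_a}, cuts B = {cuts_b}")
```

A sample that contains both sequences gives partial digestion with every diagnostic enzyme, whereas a sample containing only one sequence is cut completely by some enzymes and not at all by the others.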
This result is consistent with the MUC3B sequence representing a separate gene, because it is highly unlikely that all samples would contain two alleles of the same gene, even if the gene frequency was 0.5 for both alleles (probability = 0.5^8 < 0.004).
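The probability quoted above can be reproduced directly. The sketch below assumes Hardy-Weinberg proportions and independent donors, which the text implies but does not state: with two alleles at frequency 0.5 each, the chance that one donor is heterozygous is 2 x 0.5 x 0.5 = 0.5, and the chance that all eight donors are heterozygous is 0.5 raised to the eighth power. The function name is illustrative.

```python
def prob_all_heterozygous(allele_freq, n_donors):
    """Probability that every donor carries both alleles of a two-allele locus.

    Assumes Hardy-Weinberg proportions and independent sampling of donors.
    """
    heterozygote_freq = 2.0 * allele_freq * (1.0 - allele_freq)
    return heterozygote_freq ** n_donors


p = prob_all_heterozygous(allele_freq=0.5, n_donors=8)
print(f"P(all 8 donors heterozygous) = {p:.6f}")
```

An allele frequency of 0.5 maximizes the heterozygote frequency, so this is the most favorable case for the allelic hypothesis; any other frequency makes the observed result even less likely.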
FIG. 2. RNase protection analysis of MUC3A start sites. Samples were hybridized to the antisense and sense probes as indicated, and RNase protection analysis was conducted as described in the text. The legend indicates the position of protected bands in the sequence shown in Fig. 1B. The probes used for analysis were run on the same gel as was the actin control described in the text. Duo., duodenum.
We performed reverse transcription-PCR to amplify the complete MUC3B amino terminus for sequencing. Duodenal RNA was used for reverse transcription in conjunction with random primers. For the forward PCR primer we chose a sequence present in MUC3B but not MUC3A (5'-GATGCTCAAGTCCTCCCCAG). For the reverse primer we chose a highly conserved segment within the 1125-base tandem repeat domain (5'-GGTGGTCTCAGATGTAGGTGTAG). This generated a 1.0-kb fragment, which was ligated into pCR2.1 for sequence analysis. The sequence confirmed and extended that of the GeneRacer PCR clones (Fig. 5). There are four changes in the 21-residue signal peptide. The 234-residue mucin-like unique sequence that constitutes the amino terminus domain of MUC3B is 91% conserved with that of MUC3A, with a single 1-residue insertion being present. The MUC3A and MUC3B amino termini are both contiguous with 375-residue tandem repeat domains described earlier (19). Thus, the amino termini of MUC3A and MUC3B are highly conserved, as are the carboxyl termini of the two glycoproteins (16).
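A conservation figure such as the 91% quoted above is typically derived from a pairwise alignment. The following minimal sketch shows one common convention, identical residues divided by the number of alignment columns, with gap columns counted in the denominator; the convention actually used for the reported figure is not stated in the text, so this is an assumption. The aligned strings are invented toy peptides, not the MUC3A or MUC3B sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over all alignment columns.

    Both inputs must be rows of the same alignment (equal length, with
    gaps written as '-'). Columns that are gaps in both rows are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b)
        if not (a == gap and b == gap)
    ]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


# Toy alignment with one substitution and one single-residue insertion.
toy_a = "MKTSPT-LT"
toy_b = "MKTAPTSLT"
identity = percent_identity(toy_a, toy_b)
print(f"identity: {identity:.1f}%")
```

Other conventions divide by the length of the shorter sequence or exclude gap columns entirely, which can shift the reported value by a small amount when insertions are present.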

DISCUSSION
MUC3A and MUC3B are highly conserved members of the membrane mucin gene family residing on chromosome 7q22. The initial cDNAs encoding these genes were thought to represent a single, secretory mucin (18,19). Later work demonstrated the existence of the two separate, closely related genes (16). Moreover, the most common splice variants of these genes were shown to encode membrane-bound mucins (12,32). These genes contain repetitive sequences that often rearrange or form chimeras during cloning (19). This has hampered efforts to obtain their sequences by both manual and automated methods. Only recently has the sequence of a genomic clone containing the amino terminus of MUC3A become available in the public data bank (BAC clone RP13-650G11; GenBank accession number AC118759). This sequence contains a large deletion encompassing both the 1125-base and the 51-base tandem repeat domains. Errors in mucin sequences in genomic data banks may be due to the software employed by the human genome project, which often handles repetitive sequences by deleting them from assembled contigs (33). There are still no clones available in the public data bank for the MUC3B amino terminus or promoter. Thus, the sequencing of these genes has been difficult, and the sequences presented here for the MUC3A and MUC3B amino termini represent the first complete coding sequence available for 7q22 family mucins.
Several types of evidence support the argument that the conceptual translations shown in Figs. 1 and 5 represent the true amino termini of MUC3A and MUC3B: (i) 5'-GeneRacer PCR revealed no sequences initiating upstream of those shown in Fig. 1. (ii) There is an in-frame stop codon at base +19 of the sequence, 40 bases upstream of the putative initiation codon. There is also a potential 3'-splice site at base +38, and we considered the possibility that this may be functional. This does not appear to be the case, however, because none of the 30 5'-GeneRacer MUC3A and MUC3B clones that we sequenced was spliced at this site. (iii) Attempts to isolate upstream clones by screening the intestinal cDNA library with a probe derived from the upstream region of SIB275 were unsuccessful. (iv) No ATG triplets occur in the MUC3A message sequence upstream of the proposed initiation codon. (v) The translated sequences were subjected to analysis for the presence of signal peptides using the SignalP program (www.cbs.dtu.dk; Ref. 25). This analysis strongly indicated the presence of 21-residue signal peptides for both MUC3A and MUC3B. Downstream of the signal peptide, the MUC3A and MUC3B amino termini unique sequences are mucin-like, containing more than 40% Thr, Ser, and Pro residues. They do not, however, contain repetitive elements, nor do they have sequence similarity with any other known protein. It is interesting that the initiation codons for both MUC3A and MUC3B occur in the same DNA context, i.e. GGCCCATGC, which deviates at the -3 position (the first C of the CCC triplet preceding ATG) from the consensus sequences for initiation codons. Only 3% of eukaryotic messages have pyrimidines at this position (34). The significance of this deviation from the consensus in MUC3A and MUC3B messages is not known.
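The position numbering used for the initiation codon context can be made explicit with a short check. In the usual convention the A of ATG is +1 and the base immediately upstream is -1, so the -3 base lies three positions before the A. The sketch below applies this to the context quoted in the text, GGCCCATGC; the function name is illustrative.

```python
PYRIMIDINES = {"C", "T"}


def minus3_base(context, start_codon="ATG"):
    """Return the base at position -3 relative to the A of the start codon.

    Convention: the A of ATG is +1 and the preceding base is -1, so -3 is
    three positions upstream of the A. Uses the first ATG in the context.
    """
    context = context.upper()
    atg_index = context.find(start_codon)
    if atg_index < 3:
        raise ValueError("context must include at least 3 bases before ATG")
    return context[atg_index - 3]


base = minus3_base("GGCCCATGC")
print(f"-3 base: {base}; pyrimidine: {base in PYRIMIDINES}")
```

The consensus context for initiation favors a purine (A or G) at -3, so a C at this position is the deviation the text describes.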
FIG. 5. Comparison of the amino termini of MUC3A and MUC3B. The MacVector 6.5 ClustalW routine was used to perform the alignment; identical residues are boxed. Arrows demarcate the boundaries of the signal peptides, the mucin-like unique sequences, and the 375-residue tandem repeat array. The MUC3B cDNA sequence was deposited in GenBank with accession number AY307931. AA, amino acid.
The MUC3A upstream sequence contains binding sites for many transcription factors, although it lacks a TATA box. This sequence has functional promoter activity in all of the tested cell lines and initiates transcription from a region spanning ~180 bases (Figs. 1-3). Genes with promoters lacking TATA boxes often have multiple transcription start sites (35-37). Other mucin genes including MUC2 and MUC5AC have promoters with TATA boxes and initiate transcription from a region spanning only a few bases (20,38). However, these genes have very restricted tissue- and cell type-specific patterns of expression. Conversely, MUC3 transcripts can be found in many disparate tissues including small and large intestine, thymus, liver, lymph nodes, and heart (17,39). Because the probes used to detect MUC3 expression in most previous studies would recognize either MUC3A or MUC3B, it is possible that the wide range of tissues in which MUC3 is detected is due to either one or the other gene being expressed. However, this cannot entirely explain the wide range of tissues expressing MUC3. Previous work has demonstrated that both MUC3A and MUC3B are expressed in adult and fetal small and large intestine and Caco-2 cells (16). Data from this work indicate that both genes are expressed in duodenum. It seems likely that the wide range of transcription initiation sites for MUC3A transcripts noted in this study may allow the gene to be active in different tissues with different repertoires of transcription factors.
It may also be significant that the region encompassing the first 450 bases of the MUC3A promoter is GC-rich, i.e. it contains 62% G and C residues. This region contains 16 CpG dinucleotides, and it is possible that methylation-mediated gene silencing plays a role in regulating this gene.
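The two quantities cited above, G+C content and the number of CpG dinucleotides, are simple to compute from a sequence. The sketch below shows the calculation on an invented 10-base toy sequence; it is not the MUC3A promoter, whose sequence is given only in Fig. 1B, and the function names are illustrative.

```python
def gc_fraction(seq):
    """Fraction of bases in the sequence that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence is empty")
    return sum(1 for base in seq if base in "GC") / len(seq)


def cpg_count(seq):
    """Number of CpG dinucleotides (a C followed by a G) on the given strand.

    'CG' cannot overlap itself, so a plain non-overlapping count is exact.
    """
    return seq.upper().count("CG")


toy_region = "GCGCATCGGA"  # invented sequence for illustration only
print(f"G+C content: {100 * gc_fraction(toy_region):.0f}%")
print(f"CpG dinucleotides: {cpg_count(toy_region)}")
```

Because the reverse complement of CG is also CG, the count is the same whichever strand is read.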
There is some discrepancy between the mapping of transcription start sites by GeneRacer PCR and RNase protection analysis (Figs. 1 and 2). Some of this may be due to cloning biases. This possibility is suggested by the observation that several instances of multiple clones initiating at the same base were noted. This occurred with both LS174T cell and duodenal RNA. RNase protection analysis revealed the presence of eight major and many minor start sites (Fig. 2). Some of the faint bands may be due to nicking of the hybridized riboprobe during RNase digestion or at some other step of the procedure. The presence of MUC3B messages may complicate this analysis as well. The procedure did yield a major band with the actin control provided with the kit, however, and we obtained a narrow range of protected fragments when we applied RNase protection analysis to map the start of MUC2 and MUC2/hGH transgene transcription (20,21). Thus, it seems highly probable that the MUC3A promoter initiates transcription from multiple start sites. Interestingly, the MUC3B 5'-GeneRacer clones obtained all initiated from the same base. Since only five such clones were obtained, it is difficult to draw conclusions, but this may indicate that the MUC3B promoter is fundamentally different from the MUC3A promoter.
The structure of MUC3A and MUC3B is compared with that of other fully sequenced human membrane mucins in Fig. 6. The MUC3 mucins are large, with mucin-like unique sequences in their amino termini. They contain two separate Thr-, Ser-, and Pro-rich tandem repeat domains that exhibit VNTR polymorphism within the human population (19). The carboxyl termini of the MUC3 mucins are alternatively spliced, with the most abundant splice variants containing two EGF-like domains, a transmembrane domain, a cytoplasmic domain, and a SEA module domain (12,16,32). This latter domain is often associated with the extracellular region of O-glycosylated proteins (40) and contains a conserved proteolytic processing cleavage site (41). MUC4 is also a large membrane-associated mucin; in fact most alleles of MUC4 are larger than those of MUC3A and MUC3B (11). The amino terminus of MUC4 contains three 126-130-residue degenerate tandem repeats and a mucin-like unique sequence. The carboxyl terminus lacks a cognate SEA module domain, but it does contain a proteolytic cleavage site (10). MUC4 also contains a cognate NIDO domain, an AMOP domain, and a vWF-D domain (42). The function of these conserved structural domains in MUC4 is unknown. The EGF-like domains of the MUC4 carboxyl terminus may be functionally important, however, because the EGF-like domains of the rat homolog of MUC4 bind to ErbB2 and modulate its phosphorylation and interaction with ErbB3 (3,43). MUC4 is highly expressed in airways and colon; however, appreciable levels are also found in stomach, lung and prostate (17). Thus, similar to the MUC3 mucins, MUC4 is expressed in several tissues. MUC1, which is associated with normal breast and pancreas and many types of carcinomas (44), is smaller than the MUC3 mucins and MUC4.
This mucin lacks cognate extracellular structural domains other than its set of tandem repeats; however, its cytoplasmic domain is evolutionarily conserved and can function as a scaffold for signal transduction (4,5). The MUC13 (8) and MUC15 (9) mucins are very small in size; however, the carboxyl terminus of MUC13 has a domain structure similar to that of MUC3A and MUC3B, with EGF-like domains and a SEA module.
FIG. 6. The domain structures of human membrane mucins. The relative sizes of the molecules are drawn approximately to scale. The MUC1 mucin is depicted as having a 1200-residue VNTR array, the average allele size (45). The total sizes of MUC3 and MUC4 mucins were estimated from the size of their respective messages (11). MUC13 and MUC15 do not appear to display VNTR polymorphism (8,9). The hatched boxes represent cognate structural domains.
Membrane mucins vary tremendously in size and sequence but often share conserved structural domains. They function as a scaffold for glycans that can alter the electrostatic properties of cells or serve as specific ligands. Certain membrane mucins have been shown to function in signal transduction, and others may as well. The challenge now is to understand how the expression of these molecules affects cell function in both normal tissues and in various pathological states in which their expression is often altered.