J. Biol. Chem., Vol. 278, Issue 49, 49600-49609, December 5, 2003
From the ¶Gastrointestinal Research Laboratory, Department of Veterans Affairs Medical Center, San Francisco, California 94121 and the Departments of Anatomy, ||Medicine, and Pathology, University of California at San Francisco, San Francisco, California 94121
Received for publication, June 2, 2003, and in revised form, September 3, 2003.
| ABSTRACT |
|---|
|
|
|---|
180 bases. The 5'-flanking region of the gene had promoter activity when fused to a luciferase reporter gene in all of the tested cell lines. This region contained binding sites for several transcription factors, including those implicated in the regulation of intestinal genes, but lacked a cognate TATA box. These features of the gene promoter may enable the gene to be expressed at variable levels in several cell types with different repertoires of transcription factors. We also utilized 5'-GeneRacer PCR to determine the sequence of the 5'-terminus of the MUC3B message. The amino termini of the MUC3A and MUC3B mucins are 91% conserved at the amino acid level. Thus, MUC3A and MUC3B have highly conserved amino and carboxyl termini, suggesting a recent duplication of the entire ancestral gene. It remains to be determined whether other members of the 7q22 membrane mucin gene family have amino-terminal domains similar to MUC3A and MUC3B.
| INTRODUCTION |
|---|
|
|
|---|
All membrane mucins contain O-glycosylated extracellular domains; however, the size of these domains varies widely, from a few hundred amino acid residues to several thousand residues (8–11). The composition and structure of other domains of these molecules also vary, suggesting different functions. Moreover, the alternative splicing of several membrane mucins leads to forms with deleted domains and to truncated forms lacking membrane-binding domains, rendering them soluble (12–15).
The membrane mucin gene cluster located at human chromosome 7q22 contains at least four members including MUC3A, MUC3B, MUC17, and MUC11/12 (which may be one or two genes) (16, 17). These mucins contain large extracellular glycosylation domains that exhibit variable number of tandem repeat (VNTR) polymorphism. They also contain extracellular EGF-like domains, a transmembrane domain, and a cytoplasmic domain. We have focused much attention upon the MUC3 mucins, MUC3A and MUC3B (12, 16, 18, 19). The extracellular regions of these glycoproteins both contain two polymorphic tandem repeat domains, one containing 375-residue tandem repeats and another containing 17-residue tandem repeats (19). The carboxyl termini of these mucins contain a mucin-like (Ser/Thr-rich) unique sequence, two EGF-like domains separated by a SEA module, a transmembrane domain, and a cytoplasmic domain (12, 16). The exonic and intronic sequences of the carboxyl termini of MUC3A and MUC3B are 95% identical, suggesting a recent gene duplication (16).
In this report we describe the sequences of the MUC3A gene 5'-flanking region and amino terminus, thus completing the MUC3A sequence. The coding sequence of the 5'-region of MUC3A consists of a signal peptide and a mucin-like but nonrepetitive amino terminus domain. Transcription is initiated from a TATA-less promoter at multiple start sites. We also describe the homologous coding region for MUC3B, which is highly conserved, indicating that the entire ancestral gene was duplicated.
| EXPERIMENTAL PROCEDURES |
|---|
|
|
|---|
RNA was extracted from human LS174T colon cancer cells and from normal duodenal biopsy samples pooled from 10 patients using Tri reagent as described (21). DNA was extracted from the lymphocytes of individual donors using previously described procedures (21). RNA blot analysis was performed using SIB124 as a probe (18). Reverse transcription and PCR were performed as described (22).
Clones representative of 5'-termini of MUC3A and MUC3B messages were obtained using a GeneRacer kit purchased from Invitrogen. The protocol accompanying the kit was used essentially without modification. Briefly, 2 µg of total RNA was treated with calf intestinal alkaline phosphatase and subsequently with tobacco acid pyrophosphatase. This procedure is designed to leave a 5'-phosphate moiety only on RNAs that contain a cap structure, thus selecting for 5'-full-length messages. This material was then joined to the GeneRacer 5'-RNA oligonucleotide (5'-CGACUGGAGCACGAGGACACUGACAUGGACUGAAGGAGUAGAAA) using RNA ligase. The ligated mRNA was then reverse transcribed using Superscript II reverse transcriptase and a gene-specific deoxyribonucleotide primer, 5'-TGGGTGGAGTGGTTTCAAC (reverse complement of bases 2558–2576 in Fig. 1B). The initial PCR utilized GeneRacer 5'-primer (5'-CGACTGGAGCACGAGGACACTGA) and 5'-TGAAGGCAAACTTGGACG as gene-specific primer (reverse complement of bases 2538–2555 in Fig. 1B). Nested PCR, when performed, utilized GeneRacer 5'-nested primer (5'-GGACACTGACATGGACTGAAGGAGTA) and 5'-AGGTGTTGGAACTGACTGG as gene-specific nested primer (reverse complement of bases 2507–2525 in Fig. 1B). PCR products were cloned into pCR4-TOPO vector for sequencing.
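As a consistency check on the primer descriptions above, the following minimal Python sketch computes the reverse complement of each gene-specific primer and compares its length with the Fig. 1B coordinate range quoted for it. It assumes the ranges read 2558–2576, 2538–2555, and 2507–2525 (inclusive, 1-based); the function and dictionary names are illustrative and are not part of the published methods.

```python
# Primer sequences are quoted from the text. The coordinate ranges are the
# Fig. 1B positions given for each primer, assumed inclusive and 1-based.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def span_length(start: int, end: int) -> int:
    """Number of bases in an inclusive coordinate range."""
    return end - start + 1


# name -> (primer 5'->3', first base, last base); names are illustrative labels.
GENE_SPECIFIC_PRIMERS = {
    "reverse transcription": ("TGGGTGGAGTGGTTTCAAC", 2558, 2576),
    "initial PCR": ("TGAAGGCAAACTTGGACG", 2538, 2555),
    "nested PCR": ("AGGTGTTGGAACTGACTGG", 2507, 2525),
}


def check_primers(primers):
    """For each primer, report the matching sense-strand sequence and whether
    the primer length agrees with its quoted coordinate range."""
    report = {}
    for name, (seq, start, end) in primers.items():
        report[name] = {
            "sense_strand": reverse_complement(seq),
            "length_matches": len(seq) == span_length(start, end),
        }
    return report


if __name__ == "__main__":
    for name, entry in check_primers(GENE_SPECIFIC_PRIMERS).items():
        print(name, entry["sense_strand"], entry["length_matches"])
```

Because each primer is the reverse complement of the message strand, the `sense_strand` value is what one would expect to find at the quoted positions in Fig. 1B.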
[α-32P]CTP (specific activity > 3000 Ci/mmol; ICN) in a final volume of 20 µl. The ribonuclease protection reactions were conducted using 25 µg of total RNA, 100,000 cpm riboprobe, and the protocol described in the RPA II kit. The final products were subjected to electrophoresis on a standard sequencing gel, using the products derived from a sequencing reaction as size standards as described (20). Transient transfections of MUC3A promoter/luciferase reporter constructs were performed using the pGL3-Basic vector (Promega). Six constructs were prepared using CTC-786K12 DNA and the two forward and three reverse primers described in Table I. Forward primers incorporated SstI sites, and reverse primers incorporated SmaI sites to facilitate cloning into pGL3-Basic. For transfections, the cells were seeded into 24-well plates in Dulbecco's modified Eagle's medium with 10% fetal bovine serum. The transfections were carried out using Superfect (Qiagen) according to the manufacturer's instructions. Transfection conditions were optimized for each cell line. For AsPC, H498, SW1990, HPAF, and LS174T cell lines, 0.68 µg of test plasmid, 0.12 µg of pRLO (Promega; promoter-less Renilla luciferase internal control vector), and 4 µl of Superfect were added in a transfection volume of 500 µl for each well. For Caco2 and Panc-1 cells, 1 µg of test plasmid, 0.12 µg of pRLO, and 10 µl of Superfect were used. After 2 days, the cells were harvested, and the luciferase activities were assessed using a dual luciferase assay kit (Promega). The results were normalized within cell lines by dividing them by the results obtained with pGL3-Control vector (Promega), which utilizes the SV40 promoter and enhancer to drive luciferase transcription.
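The normalization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes that each firefly reading is first divided by the Renilla reading from the co-transfected pRLO internal control, and that the resulting ratio is then divided by the corresponding ratio for pGL3-Control in the same cell line. The function name and the example readings are hypothetical and are not taken from the paper.

```python
def relative_promoter_activity(test_firefly: float, test_renilla: float,
                               control_firefly: float, control_renilla: float) -> float:
    """Express a test construct's activity relative to pGL3-Control.

    Assumed scheme (not spelled out in the text): firefly signal is corrected
    for transfection efficiency with the Renilla internal control, then the
    corrected value is divided by the corrected pGL3-Control value measured
    in the same cell line.
    """
    if test_renilla <= 0 or control_renilla <= 0 or control_firefly <= 0:
        raise ValueError("Renilla and control readings must be positive")
    test_ratio = test_firefly / test_renilla
    control_ratio = control_firefly / control_renilla
    return test_ratio / control_ratio


if __name__ == "__main__":
    # Hypothetical luminometer readings, for illustration only.
    activity = relative_promoter_activity(
        test_firefly=5000.0, test_renilla=1000.0,
        control_firefly=50000.0, control_renilla=1000.0,
    )
    print(f"activity relative to pGL3-Control: {activity:.3f}")
```

Normalizing within each cell line in this way makes constructs comparable across lines that differ in transfection efficiency.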
| RESULTS |
|---|
|
|
|---|
We obtained a genomic clone containing MUC3A sequence by BLAST searching the GSS data base using SIB227 as query sequence. BAC clone CTC-786K12 was identified as containing this sequence at one terminus, reading in a 3' to 5' direction. The other terminus of this clone contained sequence identifiable as part of the zonadhesin gene. Thus, we concluded that this clone should contain the 5'-portion of at least one of the MUC3 genes and associated flanking sequence. (Later work identified this clone as containing MUC3A (16).) This clone was therefore purchased from Research Genetics, Inc. to enable additional sequencing. We sequenced a 14-kb segment of this clone initiating from the terminus containing SIB227 sequence, a portion of which is shown in Fig. 1. Later, clones and contigs confirming and extending this sequence became available in public data bases (GenBank™ accession numbers AC118759 and BL000001).
The clone CTC-786K12 sequence obtained confirmed the sequence of cDNA clone SIB275. By also extending the sequence, it revealed the presence of a possible initiation codon and signal peptide (Fig. 1). A single intron of 2167 bases was present in the genomic sequence, beginning at the termination point of the putative signal peptide. It was not certain from these data, however, that the MUC3A amino terminus had been reached. A first approach in examining this further was to screen the human intestinal cDNA library with a probe made by PCR from the first 250 bases of clone SIB275. Several clones were obtained, and the ends were sequenced (Fig. 1A). Two of these clones, SIB324 and SIB326, extended the SIB275 sequence read by 64 and 37 bases in the 5'-direction, respectively. The fact that cDNA clones derived from further upstream genomic sequences were not obtained in this screening suggested that these clones may represent the 5'-terminal region of the message. To examine this still further, 5'-GeneRacer PCR was conducted. Previous work demonstrated that duodenal RNA and butyrate-treated LS174T cell RNA contained high levels of MUC3 message (17); thus, these were used for this study. We were able to obtain product visible on an ethidium bromide-stained agarose gel after initial PCR with duodenal RNA; however, butyrate-treated LS174T cell RNA required that nested PCR be employed. A total of 16 clones were obtained with butyrate-treated LS174T cell RNA, of which 15 were similar to the sequence shown in Fig. 1B. Interestingly, however, these clones initiated in different places along a 179-base segment as shown in Fig. 1B. A total of 15 clones were obtained from the duodenal RNA. Five of these clones had slightly different sequences and appear to be derived from MUC3B, as will be described later. The other 10 were derived from MUC3A, and the points at which they initiate are depicted in Fig. 1B. Three of these clones were derived from initial PCR, and seven were from nested PCR. Multiple start sites were also observed with duodenal RNA.
The experimentation described above suggested that MUC3A transcripts initiate from multiple start sites. This possibility was tested using RNase protection analysis. We have used this technique previously to map the start sites of MUC2 (20), DPPIV (26), and a MUC2 promoter/growth hormone transgene (21). Preliminary results indicated that the technique was not sensitive enough to map protected fragments utilizing butyrate-treated LS174T cell RNA (data not shown). The fact that GeneRacer PCR was successful after initial PCR with duodenal RNA but required nested PCR with butyrate-treated LS174T cell RNA suggested that duodenum is the richer source of MUC3A message. We therefore directed our efforts entirely upon utilizing duodenal RNA to map start sites by RNase protection analysis. We were able to detect protected fragments utilizing this technique with 25 µg of duodenal RNA and the high specific activity probe described under "Experimental Procedures." Multiple protected fragments were obtained with the antisense probe when duodenal RNA was used but not with the tRNA control (Fig. 2). In contrast, no bands were observed with the sense probe with RNA from either source. The actin control provided with the kit gave a single major band at the expected location (Fig. 2). The protected bands observed with duodenal RNA indicated the presence of eight major start sites for MUC3A transcription. Their locations, which span 141 bases, are indicated in Fig. 1B. In addition, many bands of lower intensity are evident as well, suggesting that MUC3A transcription can initiate from many start sites, in addition to the major ones. Thus, RNase protection analysis confirms the notion drawn from GeneRacer PCR that MUC3A transcripts initiate from multiple start sites.
Comparison of MUC3A and MUC3B Amino Terminus Sequences
As described above, five clones from 5'-GeneRacer PCR with duodenal RNA were noted that had slightly different sequences from the MUC3A sequence described in Fig. 1B. We hypothesized that these clones may represent MUC3B sequence but also considered the possibility that they could be an allelic variant of MUC3A. As a test of these possibilities, we posed the question: Is the putative MUC3B sequence invariably present in the DNA of individual donors? If so, then this would indicate that the sequence represented a second gene rather than an allele of MUC3A. To amplify the sequences, we chose forward and reverse PCR primers from DNA segments in which both sequences were identical (forward, 5'-CTTTATCCACGGCCACATCC (bases 2289–2308 in Fig. 1B) and reverse, 5'-AGGCAAACTTGGACGTCGGG (reverse complement of bases 2533–2552 in Fig. 1B)). Genomic DNA samples from the lymphocytes of eight donors were used for the reactions. The products were then digested with a series of restriction enzymes chosen because they digest the MUC3A sequence or the putative MUC3B sequence but not both sequences (Fig. 4A). In this way we were able to test whether the individual samples contained both sequences. Results from two representative samples are shown in Fig. 4B. The size of the uncut PCR product was 263 bases. BfaI and HgaI are expected to digest the MUC3B sequence but not the MUC3A sequence, as described in Fig. 4A. This occurred in all of the tested samples; examples are shown in Fig. 4B. HhaI, on the other hand, is expected to digest MUC3A but not MUC3B. This result was also obtained with all of the samples. As a control, we digested DNA isolated from BAC clone CTC-786K12 with these enzymes. This was digested with HhaI but not with BfaI or HgaI (Fig. 4B), the expected result because this clone contains MUC3A sequence. These results indicate that all eight tested genomic DNA samples contain both the MUC3A and MUC3B sequences.
This result is consistent with the MUC3B sequence representing a separate gene, because it is highly unlikely that all eight samples would contain two alleles of the same gene, even if the gene frequency were 0.5 for both alleles (each donor would then be heterozygous with probability 0.5, so the probability for all eight is $0.5^{8} \approx 0.0039 < 0.004$).
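The arithmetic behind this probability can be reproduced with a short Python sketch. It assumes a biallelic locus in Hardy-Weinberg proportions and independent (unrelated) donors, which is the simplest reading of the argument in the text; the function name is illustrative.

```python
def prob_all_heterozygous(allele_freq: float, n_donors: int) -> float:
    """Probability that every one of n independent donors is heterozygous
    at a biallelic locus, assuming Hardy-Weinberg proportions."""
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    p = allele_freq
    q = 1.0 - p
    heterozygote_freq = 2.0 * p * q  # equals 0.5 when p = q = 0.5
    return heterozygote_freq ** n_donors


if __name__ == "__main__":
    # Eight donors, both alleles at frequency 0.5 (the case most favorable
    # to the allelic-variant hypothesis).
    print(prob_all_heterozygous(0.5, 8))
```

Since 2pq is maximal at p = q = 0.5, any other allele frequency would make the observation even less likely under the single-gene hypothesis.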
| DISCUSSION |
|---|
|
|
|---|
Several types of evidence support the argument that the conceptual translations shown in Figs. 1 and 5 represent the true amino termini of MUC3A and MUC3B: (i) 5'-GeneRacer PCR revealed no sequences initiating upstream of those shown in Fig. 1. (ii) There is an in-frame stop codon at base +19 of the sequence, 40 bases upstream of the putative initiation codon. There is also a potential 3'-splice site at base +38, and we considered the possibility that this may be functional. This does not appear to be the case, however, because none of the 30 5'-GeneRacer MUC3A and MUC3B clones that we sequenced was spliced at this site. (iii) Attempts to isolate upstream clones by screening the intestinal cDNA library with a probe derived from the upstream region of SIB275 were unsuccessful. (iv) No ATG triplets occur in the MUC3A message sequence upstream of the proposed initiation codon. (v) The translated sequences were subjected to analysis for the presence of signal peptides using the SignalP program (www.cbs.dtu.dk; Ref. 25). This analysis strongly indicated the presence of 21-residue signal peptides for both MUC3A and MUC3B. Downstream of the signal peptide, the MUC3A and MUC3B amino-terminal unique sequences are mucin-like, containing more than 40% Thr, Ser, and Pro residues. They do not, however, contain repetitive elements, nor do they have sequence similarity with any other known protein. It is interesting that the initiation codons for both MUC3A and MUC3B occur in the same DNA context, i.e. GGCCCATGC, which deviates from the consensus sequences for initiation codons at the -3 position (the C three bases upstream of the ATG, where a purine is usual). Only 3% of eukaryotic messages have pyrimidines at this position (34). The significance of this deviation from the consensus in MUC3A and MUC3B messages is not known.
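The observation about the -3 position can be checked directly from the quoted context GGCCCATGC. The sketch below is illustrative only; it uses the usual convention that the A of the ATG is +1 and the base immediately upstream is -1, and the function names are not from the paper.

```python
PURINES = {"A", "G"}


def minus3_base(context: str, atg_index: int) -> str:
    """Return the base at position -3 relative to the A (+1) of an ATG codon.

    `atg_index` is the 0-based index of the A within `context`.
    """
    context = context.upper()
    if context[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    if atg_index < 3:
        raise ValueError("need at least three bases upstream of the ATG")
    return context[atg_index - 3]


def has_purine_at_minus3(context: str, atg_index: int) -> bool:
    """True when the -3 position carries a purine (A or G)."""
    return minus3_base(context, atg_index) in PURINES


if __name__ == "__main__":
    context = "GGCCCATGC"          # initiation context quoted in the text
    atg_index = context.index("ATG")
    print(minus3_base(context, atg_index), has_purine_at_minus3(context, atg_index))
```

For the quoted context the base at -3 is a pyrimidine, which is the deviation from the consensus that the text describes.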
The MUC3A upstream sequence contains binding sites for many transcription factors, although it lacks a TATA box. This sequence has functional promoter activity in all of the tested cell lines and initiates transcription from a region spanning 180 bases (Figs. 1, 2, 3). Genes with promoters lacking TATA boxes often have multiple transcription start sites (35–37). Other mucin genes including MUC2 and MUC5AC have promoters with TATA boxes and initiate transcription from a region spanning only a few bases (20, 38). However, these genes have very restricted tissue- and cell type-specific patterns of expression. Conversely, MUC3 transcripts can be found in many disparate tissues including small and large intestine, thymus, liver, lymph nodes, and heart (17, 39). Because the probes used to detect MUC3 expression in most previous studies would recognize either MUC3A or MUC3B, it is possible that the wide range of tissues in which MUC3 is detected is due to either one or the other gene being expressed. However, this cannot entirely explain the wide range of tissues expressing MUC3. Previous work has demonstrated that both MUC3A and MUC3B are expressed in adult and fetal small and large intestine and Caco-2 cells (16). Data from this work indicate that both genes are expressed in duodenum. It seems likely that the wide range of transcription initiation sites for MUC3A transcripts noted in this study may allow the gene to be active in different tissues with different repertoires of transcription factors. It may also be significant that the region encompassing the first 450 bases of the MUC3A promoter is GC-rich, i.e. it contains 62% G and C residues. This region contains 16 CpG dinucleotides, and it is possible that methylation-mediated gene silencing plays a role in regulating this gene.
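The two sequence statistics quoted for the promoter region (G+C content and the number of CpG dinucleotides) are straightforward to compute. The following Python sketch shows one way to do so; it is not the authors' procedure, and the short test sequence is a made-up placeholder rather than MUC3A promoter sequence.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases in `seq` that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence is empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


def count_cpg(seq: str) -> int:
    """Number of CpG dinucleotides (a C immediately followed by a G)
    on the given strand. 'CG' cannot overlap itself, so str.count is exact."""
    return seq.upper().count("CG")


if __name__ == "__main__":
    # Placeholder sequence for illustration only; substitute the first
    # 450 bases of the promoter to reproduce the figures in the text.
    toy = "ACGCGTTACG"
    print(f"G+C: {gc_fraction(toy):.0%}, CpG dinucleotides: {count_cpg(toy)}")
```

Counting on one strand is sufficient here because every CpG on one strand is paired with a CpG on the complementary strand.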
There is some discrepancy between the mapping of transcription start sites by GeneRacer PCR and RNase protection analysis (Figs. 1 and 2). Some of this may be due to cloning biases. This possibility is suggested by the observation that several instances of multiple clones initiating at the same base were noted. This occurred with both LS174T cell and duodenal RNA. RNase protection analysis revealed the presence of eight major and many minor start sites (Fig. 2). Some of the faint bands may be due to nicking of the hybridized riboprobe during RNase digestion or at some other step of the procedure. The presence of MUC3B messages may complicate this analysis as well. The procedure did yield a major band with the actin control provided with the kit, however, and we obtained a narrow range of protected fragments when we applied RNase protection analysis to map the start of MUC2 and MUC2/hGH transgene transcription (20, 21). Thus, it seems highly probable that the MUC3A promoter initiates transcription from multiple start sites. Interestingly, the MUC3B 5'-GeneRacer clones obtained all initiated from the same base. Since only five such clones were obtained, it is difficult to draw conclusions, but this may indicate that the MUC3B promoter is fundamentally different from the MUC3A promoter.
The structure of MUC3A and MUC3B is compared with other fully sequenced human membrane mucins in Fig. 6. The MUC3 mucins are large, with mucin-like unique sequences in their amino termini. They contain two separate Thr-, Ser-, and Pro-rich tandem repeat domains that exhibit VNTR polymorphism within the human population (19). The carboxyl termini of the MUC3 mucins are alternatively spliced, with the most abundant splice variants containing two EGF-like domains, a transmembrane domain, a cytoplasmic domain, and a SEA module domain (12, 16, 32). This latter domain is often associated with the extracellular region of O-glycosylated proteins (40) and contains a conserved proteolytic processing cleavage site (41). MUC4 is also a large membrane-associated mucin; in fact most alleles of MUC4 are larger than those of MUC3A and MUC3B (11). The amino terminus of MUC4 contains three 126–130-residue degenerate tandem repeats and a mucin-like unique sequence. The carboxyl terminus lacks a cognate SEA module domain, but it does contain a proteolytic cleavage site (10). MUC4 also contains a cognate NIDO domain, an AMOP domain, and a vWF-D domain (42). The function of these conserved structural domains in MUC4 is unknown. The EGF-like domains of the MUC4 carboxyl terminus may be functionally important, however, because the EGF-like domains of the rat homolog of MUC4 bind to ErbB2 and modulate its phosphorylation and interaction with ErbB3 (3, 43). MUC4 is highly expressed in airways and colon; however, appreciable levels are also found in stomach, lung and prostate (17). Thus, similar to the MUC3 mucins, MUC4 is expressed in several tissues. MUC1, which is associated with normal breast and pancreas and many types of carcinomas (44), is smaller than the MUC3 mucins and MUC4. 
This mucin lacks cognate extracellular structural domains other than its set of tandem repeats; however, its cytoplasmic domain is evolutionarily conserved and can function as a scaffold for signal transduction (4, 5). The MUC13 (8) and MUC15 (9) mucins are very small in size; however, the carboxyl terminus of MUC13 has a domain structure similar to that of MUC3A and MUC3B, with EGF-like domains and a SEA module.
| FOOTNOTES |
|---|
* This work was supported by the Department of Veterans Affairs Medical Research Service. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
** Present address: University of Auckland, Private Bag 92019, Auckland, New Zealand.
To whom correspondence should be addressed. Tel.: 415-750-2095; Fax: 415-750-6972; E-mail: jgum{at}maelstrom.ucsf.edu.
1 The abbreviations used are: EGF, epidermal growth factor; contig, group of overlapping clones; SEA, domain found in sea urchin sperm protein, enterokinase, and agrin; GSS, genome survey sequence.
| REFERENCES |
|---|
|
|
|---|