Cloning of the amino-terminal and 5'-flanking region of the human MUC5AC mucin gene and transcriptional up-regulation by bacterial exoproducts.

To obtain gene regulatory sequence for the mucin gene MUC5AC, we have isolated the MUC5AC amino terminus cDNA and 5'-flanking region. This was possible through the use of rapid amplification of cDNA ends-polymerase chain reaction (RACE-PCR) in which the 5' sequence of the human gastric mucin cDNA HGM-1 (1) was used to design the first MUC5AC-specific primer. Primers for subsequent rounds of RACE were designed from the 5'-ends of amplified RACE products. After five rounds of RACE-PCR, we could no longer generate upstream extensions of the cDNA and hypothesized that we had reached the 5'-end. Primer extension and RNase protection analysis confirmed this. Combined nucleotide sequence for the RACE-PCR products was 3.3 kb with an open reading frame encoding 1100 amino acids. A putative translation start site was found at nucleotide +48. This was followed by a 45-nucleotide putative signal sequence. This amino-terminal sequence contains no tandem repeats but is >60% similar to the amino-terminal nucleotide sequence of MUC2. The positions of cysteine residues in this MUC2-similar region are almost 100% conserved between the two genes. Northern analysis showed expression of cognate RNA in the stomach and airway but not in muscle or esophagus. This pattern was the same as that obtained using previously reported 3'-MUC5AC sequences. We have cloned approximately 4 kb of genomic DNA upstream of the transcription start site and have sequenced 1366 nucleotides containing a TATA box, a CACCC box, and putative binding sites for NF-κB and Sp1. Within 4 kb of the transcription start site are elements mediating transcriptional up-regulation in response to bacterial exoproducts.

Mucin is a glycoprotein secreted from epithelial cells at many body surfaces. In the airways, mucin interacts with cilia to trap and clear pathogens and irritants. This mucociliary mechanism is impaired when mucin is produced excessively as in cystic fibrosis, chronic bronchitis, and asthma. Mucociliary impairment leads to airway mucus plugging, which promotes chronic infection, airflow obstruction, and sometimes death.
Both MUC2 and MUC5AC map to chromosome 11p15.5 and may have arisen from a common ancestral gene. The structure of MUC2 is known. Its central region, comprising >50% of the polypeptide, contains two tandem repeat sequences rich in threonine, serine, and proline (4,17); this is flanked up- and downstream by cysteine-rich regions (17,18). The threonine and serine residues represent O-glycosylation sites, whereas the cysteine residues are thought to mediate intermolecular interactions underlying mucus gel formation. The isolation of the amino terminus of the MUC2 cDNA by anchor PCR provided sequence for probing a genomic library to obtain the 5'-flanking sequence (17). Using portions of this sequence in luciferase vectors, we identified DNA elements controlling the MUC2 response to the common cystic fibrosis pathogen Pseudomonas aeruginosa (14).
Much less information is available regarding MUC5AC. Understanding the transcriptional control of this gene will require isolation of the amino terminus and 5'-flanking region. To date, MUC5AC amino-terminal cDNAs have not been reported. Although PCR-based techniques can in principle extend existing cDNA fragments over long distances, the large size of the MUC5AC mRNA (10-12 kb) (8,9) and the potential presence of a central repetitive region present obstacles to extending the existing cDNA sequences to the 5'-end.
A significant aid in this regard was provided by publication of the sequence of cDNA HGM-1 cloned from the human stomach (1). This cDNA likely derives from MUC5AC as nucleotides 1942-2281 are 99% similar to the MUC5AC clone JUL 32 (19) and nucleotides 2190-2541 are 92% similar to the 5'-end of MUC5AC clone NP3a (1). As noted by Klomp et al. (1), the ~8% discrepancy between HGM-1 and NP3a suggests that portions of HGM-1 are repeated twice in MUC5AC. HGM-1's similarity to NP3a would place one HGM-1-like sequence near the 3'-end since NP3a contains a polyadenylation signal; its ~60% similarity to the MUC2 D3-domain (1) would place another HGM-1-like sequence near the 5'-end since the MUC2 D3 domain is within 3 kb of the MUC2 transcription start site (17). Hypothesizing that HGM-1 itself is present near the MUC5AC 5'-end, we used an HGM-1 sequence as the first gene-specific primer in repetitive 5'-RACE-PCR reactions. This approach ultimately permitted amplification of a 3.3-kb upstream extension of HGM-1, which we call MUC5AC-5'-RACE product (MUC5AC-5'RP). Primer extension, RNase protection assays, and the presence of a translation start site and putative signal sequence indicate that this sequence is at the gene's 5'-end. Genomic DNA immediately upstream of MUC5AC-5'RP has the structural properties of a promoter and contains elements mediating transcriptional up-regulation in response to bacterial exoproducts. We conclude that our cloned sequences are the amino-terminal and 5'-flanking region of MUC5AC. The availability of these sequences should aid identification of the elements controlling MUC5AC overexpression in disease.
* This work was supported by National Institutes of Health Public Health Service Grants HL 24136 and HL 43762 and a grant from the state of California Tobacco Research and Development Program. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EBI Data Bank with accession number(s) AF015521, AF016834.

MATERIALS AND METHODS
Cell Culture-The human lung epithelial carcinoma cell line NCIH292 was grown in RPMI 1640 medium supplemented with 10% heat-inactivated fetal calf serum (Life Technologies, Inc.). The human colon carcinoma line HM3 (4) was grown in Dulbecco's modified Eagle's medium with high glucose and 10% fetal calf serum. In some experiments, cells were exposed for 6 or 24 h to P. aeruginosa.
Bacterial Culture and Preparation of Cell-free Supernatants-P. aeruginosa strain PAO1 was grown in M9 buffer (20) for 72 h at 37°C (to late log phase). Cell-free supernatant was obtained by centrifugation at 10,000 rpm for 60 min at 4°C followed by filtration through a 0.22-µm filter (Corning). Supernatant was aliquoted and stored at -80°C until used.
Exposure of Tissues and Cells to Bacterial Cell-free Filtrates-To look at the effects of P. aeruginosa on MUC5AC steady state mRNA, incubation was as described (21). Briefly, cells were washed twice with phosphate-buffered saline at 37°C. Samples were then incubated with bacterial supernatant or buffer (M9) diluted 1:4 with mammalian cell culture medium for 6 h. Total RNA was obtained from pelleted cells scraped from the culture dish (22). Lactate dehydrogenase release was measured (LDH 320, Sigma) to detect any cell lysis.
cDNA Synthesis and 5'-RACE-PCR-Sources known to contain abundant MUC5AC mRNA (P. aeruginosa-exposed NCIH292 cells or human stomach) were subjected to RNA extraction (22). Total RNA (3 µg) was used to generate double-stranded cDNA using the Marathon cDNA Amplification kit (CLONTECH). The double-stranded cDNA was ligated with the Marathon cDNA adaptor and purified on a Chroma Spin +TE-1000 column (CLONTECH) in a total volume of 100 µl. 5'-RACE was performed using the double-stranded cDNA as template with one HGM-1 gene-specific primer (Gm1) and the adaptor primer AP1 or AP2. Additional gene-specific primers (Gm5, Gm9, Gm9G, and Gm9H) were generated based on the sequences of progressively amplified 5'-RACE products.
Northern Blot Analysis of Tissue Distribution of MUC5AC mRNA, Results Using MUC5AC-5'RP and NP3a Probes-Total RNA was extracted from human tissues according to previously described methods (22). RNA samples (20 µg) were separated on 1.0% agarose gels containing 2.2 M formaldehyde and then transferred to a positively charged nylon membrane (Gene Screen, NEN Life Science Products). cDNA probes were labeled with [α-32P]dCTP using a Life Technologies, Inc. random primer labeling kit. For the MUC5AC 3'-end, a cDNA fragment was amplified from tissue mRNA using primers NP3a3' and NP3a5'. The insert for the probe was gel-purified from a construct made by TA cloning the PCR product into pCRII vector (Invitrogen). For the new sequence MUC5AC-5'RP, probes were made from amplified fragments using primers TER and GM9. Labeled probe was added to 10 ml of hybridization buffer containing 50% formamide, 10% dextran sulfate, 0.2% Denhardt's, 50 mM Tris-HCl, pH 7.5, 1 M NaCl, and 0.1% sodium pyrophosphate to give a concentration of 2-5 × 10^6 cpm/ml. Membrane hybridization and washing were performed using conditions described previously (23).
Primers-Primers used for 5'-RACE, construction of Northern blot probes, genomic library screening and DNA walking, primer extension, and RNase protection assays are shown in Table I.
DNA Cloning and Sequencing-After RACE-PCR, amplified fragments were purified by low-melting point agarose gel electrophoresis, cut with appropriate restriction enzymes and cloned into pBluescript II SK(-) (Stratagene) or sequenced directly. Escherichia coli (SURE strain, Stratagene) was transformed with plasmids containing these fragments. Transformants were grown at 37°C or 30°C. Both sense and antisense strands were sequenced. Sequencing reactions were carried out using SequiTherm Long-Read cycle sequencing kits (Epicentre Technologies) and Thermo Sequenase fluorescent labeled primer cycle sequencing kits (Amersham Pharmacia Biotech) with the IRD41 (Licor) labeled primers. Sequence data were assembled by Lasergene software (DNAstar). Homology and transcription factor binding site searches were performed using MatInspector release 2.1, Transcription Element Search Software (TESS, University of Pennsylvania), and MacVector software (IBI).
Chromosome Localization of PCR-amplified DNA Fragments-Two mouse/human hybrid cell line DNA panels were purchased from Bios. Cell line 1049 contained human chromosomes 5 and 11. Cell line 1079 contained human chromosomes 2 and 5. DNA from each cell line was used as a PCR template with RACE product primers to determine the chromosomal location of RACE products.
Primer Extension Analysis of Transcription Start Site-When progressive 5'-RACE reactions could no longer amplify additional sequence from either the stomach tissue or airway cell cDNA templates, we performed primer extension using primer Gm9H (approximately 100 bp from the putative 5'-end of the mRNA) to confirm that we had reached the transcription start site. Primer extension was done using the Promega avian myeloblastosis virus reverse transcriptase primer extension system. Briefly, 0.1 pmol of 32P-end-labeled primer Gm9H was incubated with 5 µl (40-50 µg) total RNA from tissue or cells and 5 µl of 2× PE buffer at 58°C for 20 min. After cooling to room temperature, 9 µl of a master mix containing 2× PE buffer, 6.25 mM sodium pyrophosphate and 1 µl of avian myeloblastosis virus reverse transcriptase was added to each sample. After 30 min of incubation at 42°C, the samples were diluted with 20 µl of loading dye, denatured by heating for 10 min at 90°C, and run on a 6% acrylamide, 7 M urea, TBE gel, along with sequencing ladder and size markers.
RNase Protection Analysis of Transcription Start Site-To confirm transcription start site location as determined by RACE-PCR and primer extension assays, we performed RNase protection assays. The labeled RNA probe required for this assay was generated from a PCR product designed to incorporate the T7 promoter. This PCR fragment was amplified from a 12-kb genomic clone (7"A) derived from screening a human genomic library in the Lambda FIX II vector (Stratagene) and was known from sequencing data to contain the putative exon I of MUC5AC. The library was screened with a probe generated from PCR of a 5'-RACE product with primers GM9 and GM2.6 using methods described in Ref. 23. The primers used to generate the RNA probe template from the genomic clone were RPA-T7, containing sequence from exon I and the T7 promoter, and RPA-5', containing upstream genomic sequence (see Table I). This enabled us to generate high specific activity [32P]UTP-labeled RNA probes using RNA polymerase. For RPA analysis of MUC5AC mRNA levels in cells exposed to P. aeruginosa, primers NP3a5' and NP3a3' were used to PCR-amplify a 294-bp fragment that was then cloned into pCRII vector (Invitrogen). To monitor amounts of RNA used in each reaction, we used p-TRI-cyclophilin or p-TRI GAPDH vectors (Ambion) to generate antisense RNA probes. For the assay, total RNA was hybridized with 5 × 10^5 cpm of probe overnight at 42°C. The RNA:RNA template was digested for 15 min at room temperature with 0.5 units of RNase A and 20 units of RNase T1, precipitated and run on a 6% polyacrylamide/urea-sequencing gel with a sequencing ladder for size determination.
5'-Genomic DNA Walking-Genomic DNA was amplified from DNA provided in the human PromoterFinder™ DNA walking kit (CLONTECH) according to instructions provided by the manufacturer. For long sequence amplifications, we used the LA PCR kit (TaKaRa) and high fidelity expand PCR kit (Boehringer Mannheim) using primer GM9H5' and adaptor primers AP1 and AP2.
Construction of a Cell Line Stably Transfected with the MUC5AC 5'-Flanking Region and Determination of Luciferase Activity After Treatment with P. aeruginosa Exoproducts-A DNA fragment extending from -4.0 kb to +68 bp was cloned into the MluI/SmaI site of pGL3 basic vector (Promega). This construct, referred to as M4-2, was cotransfected with pcDNA3 into the epithelial cell line HM3. G418-selected colonies were pooled, expanded, and used in luciferase reporter assays. Stably transfected HM3 cells were seeded at 10^5 cells/well in 96-well tissue culture plates (Dynatech) in Dulbecco's modified Eagle's medium with high glucose, 10% fetal bovine serum and 200 µg/ml G418 (Life Technologies, Inc.). Six days later (1 day post-confluence), cells were exposed for 24 h to P. aeruginosa supernatant diluted at 5, 25, or 50% into culture medium. Cells were washed once with phosphate-buffered saline and stored frozen at -80°C. After thawing, cells were assayed for luciferase activity using LucLite reagent (Packard) and a TopCount luminometer (Packard).

RESULTS
RACE-PCR, cDNA Cloning and Sequence Determination of the MUC5AC Amino Terminus-Based on the 99% sequence identity between MUC5AC clone JUL 32 (19) and HGM-1 nucleotides 1947-2278 (1), we hypothesized that HGM-1 is a part of MUC5AC. Based on >60% similarity between HGM-1 and the amino-terminal cysteine-rich domain of MUC2 (D-domain 3), we hypothesized that HGM-1 is an amino-terminal sequence. This led us to initiate 5'-RACE-PCR experiments aimed at extending HGM-1 to the MUC5AC transcription start site (Fig. 1).
In the first round of 5'-RACE-PCR, we used an HGM-1-specific primer (GM1) and an adaptor primer (AP1). This yielded a 900-bp PCR fragment. Sequence data showed that this fragment was the 5' extension of human gastric mucin (HGM) (1) and was >65% similar to the MUC2 D-domain 3 just 5' to the central repeat region. Primer GM5 was designed based on the 5'-end of this fragment and was used in a second round of 5'-RACE-PCR. This generated an 1100-bp PCR fragment whose 5'-end was used to design primer GM9. When used in a third round of 5'-RACE-PCR, GM9 generated a 700-bp fragment. Primer GM9G was designed based on the 5'-end of this fragment and was used in a fourth round of RACE-PCR to generate a 600-bp fragment. Primer GM9H, 103 bp downstream of the 5'-end of the fourth round RACE-PCR product, was used in a fifth round of RACE-PCR and generated a 110-bp product. Repeated efforts to generate larger products with primer GM9H from both gastric tissue and NCIH292 (airway) cell cDNA yielded PCR products with identical sequence that were ~100 bp in length. This suggested that GM9H was approximately 100 bp from the 5'-end of the mRNA as processed in both gastric tissue and NCIH292 cells.
The overall cDNA sequence obtained by 5'-RACE is about 3.3 kb (Fig. 2). There is an open reading frame of 3300 nucleotides, 290 of which directly overlap and are in frame with those encoding human gastric mucin. At +48 is an ATG codon embedded in a Kozak consensus sequence (24); this is a putative translation start site. Following this is a putative secretory protein signal sequence. The entire open reading frame encodes 1100 amino acids. The nucleotide sequence is approximately 65% similar to the MUC2 amino-terminal sequence (1-3500, Fig. 3A). No tandem repeat sequence is present, but there are three cysteine-rich domains (D1-D3) in which the cysteine positions correspond almost exactly to those previously described for the amino terminus of human MUC2 (Fig. 3B).
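The relationship between the reading-frame length and the encoded polypeptide, and the notion of an ATG "embedded in a Kozak consensus," can be illustrated with a short script. This is a minimal sketch only: the function names and the toy sequences are hypothetical (they are not MUC5AC sequence), and the Kozak test is reduced, as an assumption, to the two positions generally regarded as most influential, a purine at -3 and a G at +4 relative to the A of the ATG.

```python
def orf_length_in_codons(orf_nucleotides: int) -> int:
    """Return the number of codons in an open reading frame of this length."""
    if orf_nucleotides % 3 != 0:
        raise ValueError("an in-frame ORF must be a multiple of 3 nucleotides")
    return orf_nucleotides // 3


def has_strong_kozak_context(seq: str, atg_index: int) -> bool:
    """Simplified Kozak check (assumption: only positions -3 and +4 are tested).

    atg_index is the 0-based index of the A of the candidate ATG.
    With the A numbered +1 and no position 0, -3 is atg_index - 3
    and +4 is atg_index + 3.
    """
    seq = seq.upper()
    if seq[atg_index:atg_index + 3] != "ATG":
        return False
    if atg_index < 3 or atg_index + 3 >= len(seq):
        return False  # not enough flanking sequence to judge
    purine_at_minus3 = seq[atg_index - 3] in "AG"
    g_at_plus4 = seq[atg_index + 3] == "G"
    return purine_at_minus3 and g_at_plus4


# 3300 nucleotides of open reading frame correspond to 1100 codons.
codons = orf_length_in_codons(3300)

# Hypothetical toy sequences, for illustration only.
strong = has_strong_kozak_context("GCCACCATGGCG", 6)
weak = has_strong_kozak_context("TTTTTTATGTTT", 6)
```

A fuller analysis would score the whole consensus rather than two positions; the reduced test is used here only to keep the example short.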
Northern Blot Analysis of Tissue Distribution of RNA Corresponding to Newly Cloned Sequence-Our interpretation that the 5' extension of HGM-1 (MUC5AC-5'RP) is at the 5'-end of MUC5AC rests primarily on the 99% similarity between a portion of HGM-1 and the MUC5AC cDNA JUL 32 (19). Further confirmation of the identity between our new sequence and MUC5AC was provided by Northern blot analysis in which we observed that a probe from our new sequence showed tissue-specific hybridization identical to that obtained using a probe from the previously described MUC5AC C-terminal cDNA NP3a (8) (Fig. 4).
Table I footnotes: a, Also used to generate the Northern probe. b, Also used for primer extension. c, Numbered according to Meezaman et al. (8) and also used for Northern probes.

Chromosome Mapping-Human chromosome 11p15 contains a mucin gene cluster currently known to include MUC5AC as well as MUC5B, MUC6 and MUC2. To obtain further supporting evidence that the newly cloned RACE-PCR sequence is part of MUC5AC, we performed chromosomal mapping experiments. As shown in Fig. 5, MUC5AC-5' primers amplified a product from mouse-human hybrid cell line 1049, but not from cell line 1079. As both cell lines contained DNA from chromosome 5 but only 1049 contained DNA from chromosome 11, the results indicate that our RACE product MUC5AC-5'RP maps to chromosome 11. This is consistent with identification of this product as part of MUC5AC.
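The deduction from the hybrid panel can be written as simple set logic: a candidate chromosome must be present in every cell line that gave a PCR product and absent from every cell line that did not. The sketch below is illustrative; the function name and data layout are assumptions, and only the chromosome content of cell lines 1049 and 1079 and the amplification results are taken from the text.

```python
def candidate_chromosomes(panel, amplified):
    """Return human chromosomes consistent with the PCR results.

    panel:     dict mapping cell line name -> set of human chromosomes it carries
    amplified: dict mapping cell line name -> True if a PCR product was obtained
    """
    positives = [panel[line] for line in panel if amplified[line]]
    negatives = [panel[line] for line in panel if not amplified[line]]
    # Must be present in every positive line...
    candidates = set.intersection(*positives) if positives else set()
    # ...and absent from every negative line.
    for chromosomes in negatives:
        candidates -= chromosomes
    return candidates


panel = {"1049": {"5", "11"}, "1079": {"2", "5"}}
amplified = {"1049": True, "1079": False}
result = candidate_chromosomes(panel, amplified)  # {"11"}
```

With only two cell lines the panel discriminates among chromosomes 2, 5, and 11; it does not by itself exclude chromosomes absent from both lines.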
Primer Extension and RNase Protection Analysis-MUC5AC-5'RP contains a putative translation start site and signal sequence near its 5'-end (Fig. 2), suggesting that its 5'-end is at or near the transcription start site. To investigate this, we performed primer extension and RNase protection analysis. For primer extension, we used primer GM9H, which is approximately 100 bp downstream of the 5'-end of our RACE-PCR product as estimated from agarose gels. The primer extension reaction yielded a product of 114 bp (Fig. 6A) when RNA from gastric tissue or airway cells was used as a template, supporting the view suggested by RACE-PCR that the transcription start site was approximately 100 bp upstream of primer GM9H.
FIG. 3. Similarity matrix of cDNA and alignment of protein for MUC5AC-5'RP and MUC2. A, the cDNAs were compared using the MacVector similarity matrix routine with a cutoff of 65% similarity. Lines indicate regions of similarity. B, alignment of amino acids was done using MacVector alignment program. Solid lines represent identity and double dots (:) indicate conservative substitutions. Conserved cysteine residues are indicated by asterisks (*) and the D-domains are indicated by bent lines.
For RNase protection analysis, we used as probe a portion of the genomic clone 7"A containing the putative exon I and upstream sequence (see "Materials and Methods"). We examined a total of three RNA samples (Fig. 6B). These were taken from gastric tissue, colon carcinoma cells (HM3) and lung carcinoma cells (NCIH292). RNA from each sample protected the same three probe fragments, indicating putative start sites at 1, 6, and 8 bp upstream of the start site predicted by primer extension. The start site predicted by computer program NNPP (promoter prediction by neural network, Lawrence Berkeley National Laboratory, Human Genome Center) was at 4 bp upstream of the site indicated by primer extension. As it fell approximately in the middle of the range of possible start sites, we designated the computer-predicted start site as +1.
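The choice of +1 can be followed numerically. In the sketch below, all offsets are expressed in bp upstream of the primer-extension site, as in the text; treating the primer-extension site itself as one end of the range, and converting to +1-based coordinates with no position 0, are assumptions made for illustration, and the names are hypothetical.

```python
# Offsets in bp upstream of the start site indicated by primer extension (PE).
PE_OFFSET = 0
RPA_OFFSETS = [1, 6, 8]   # RNase protection start sites
NNPP_OFFSET = 4           # computer-predicted start site, designated +1


def midpoint(offsets):
    """Midpoint of the range spanned by the candidate start sites."""
    return (min(offsets) + max(offsets)) / 2


def to_plus_one_numbering(offset, plus_one_offset=NNPP_OFFSET):
    """Convert an upstream-of-PE offset to a position relative to +1.

    Uses the usual convention with no position 0: the base immediately
    upstream of +1 is -1.
    """
    delta = plus_one_offset - offset  # bp downstream of the +1 base
    return delta + 1 if delta >= 0 else delta


middle = midpoint([PE_OFFSET] + RPA_OFFSETS)
positions = {off: to_plus_one_numbering(off) for off in [PE_OFFSET] + RPA_OFFSETS}
```

Under these assumptions the midpoint of the 0 to 8 bp range is 4 bp upstream of the primer-extension site, which coincides with the NNPP prediction.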
Cloning and Sequencing of DNA Upstream of the Transcription Start Site-To obtain DNA immediately flanking the transcription start site, we performed 5'-genomic DNA walking using the gene-specific primer GM9H5' (+68/-39) and two adaptor primers, AP1 and AP2 (see "Materials and Methods"). This yielded a 4-kb genomic DNA fragment (M4-2), the sequence of which is shown in Fig. 7. We have confirmed the sequence -300/+1 as well as downstream sequence through exon 1 (+1 to +120) by sequencing a subclone of genomic clone 7"A. The upstream sequence contains a TATA box at -23/-29, further supporting the view that our RACE-PCR product MUC5AC-5'RP is at the 5'-end of the mRNA and that the designated transcription start site, +1, is accurate. Present in the putative promoter region are NF-κB, Sp1, GRE, AP-2, and CACCC box sites.
Up-regulation of MUC5AC Transcriptional Activity by P. aeruginosa-Availability of the upstream regulatory region permits analysis of potential abnormalities in MUC5AC transcription in disease models. We observed large inductions of MUC5AC RNA in epithelial cells exposed to P. aeruginosa or its exoproducts in cell-free supernatants (Fig. 8A). That this was controlled at the transcriptional level was indicated by 15-20-fold induction of transcriptional activity in epithelial cells stably transfected with MUC5AC-luciferase reporter constructs and exposed to P. aeruginosa (Fig. 8B). These findings indicate the presence of elements responsive to P. aeruginosa in the 4-kb DNA fragment immediately upstream of the MUC5AC transcription start site. Analysis of deletion mutants will permit precise identification of these elements and open the way to identification of cognate transcription factors.

DISCUSSION
In this series of studies, we isolated the amino terminus and 5'-flanking region of the MUC5AC mucin gene as a first step toward understanding the dysregulation of mucin mRNA production in the airways of cystic fibrosis patients. Hypothesizing that the previously reported cDNA HGM-1 was relatively upstream in the MUC5AC sequence, we performed progressive RACE-PCR amplifications that eventually reached the transcription start site. We used a similar approach to isolate the 5'-flanking region from genomic DNA.
Evidence That the MUC5AC RACE-PCR Product Contains the Gene's 5'-End-That 5'-RACE-PCR yielded products with identical 5'-ends after several successive amplifications, regardless of whether stomach or airway cDNA was used as a template, first suggested we had reached the 5'-end. The results of subsequent primer extension and RNase protection assays supported this. Further support was provided by characteristics of the DNA both upstream and downstream of the putative transcription start site: 25 bp upstream of the start site is a TATA box, and 48 bp downstream is a putative translation start codon (ATG) embedded in a Kozak consensus sequence (24) followed by a 45-bp signal sequence.
(Northern blot figure legend, partial.) Lanes are as follows: 1, stomach; 2, esophagus; 3, muscle; 4, bronchus; 5, lung. An RNA ladder was used for size markers as indicated. The blot was stripped and reprobed with GAPDH to assess amount and quality of RNA loaded.
Current Model of MUC5AC-The overall structure of MUC5AC, as pieced together from evidence currently available, is compared with the structure of MUC2 in Fig. 9. The structure of the MUC5AC carboxyl terminus has been known since the cloning of NP3a, a cDNA isolated from a nasal polyp library. Its identification as part of MUC5AC rests on the fact that cDNAs containing part of the NP3a sequence had previously been designated as MUC5 (25) and were later designated as MUC5AC (19). The recognition that NP3a comprises the gene's 3'-end rests on its containing a polyadenylation signal and poly(A) tail. It also contains a homologue of the MUC2 D-domain 4 (8). A similar cDNA, L31, was isolated from an HT29 (colon carcinoma) cell library (9).
Other than the positioning of NP3a and L31 at the 3'-end, it has not been possible to assign any of the other known MUC5AC cDNAs to particular positions in the coding sequence. cDNAs containing threonine/serine/proline-rich repetitive sequences (JER 47, JER 58, Mar 2, 10, 11, and CEL 2) from a tracheobronchial library (19) and cDNA 4F from a stomach library (15) are assumed to occupy positions in a central part of the gene based on comparisons with MUC2. Our Northern blots (Fig. 4) suggest that the size of the full-length MUC5AC mRNA is 12-14 kb and possibly larger. Subtracting the amount of currently known sequence at the 5'- and 3'-ends, we estimate the size of the repeat region in MUC5AC to be at least 6 kb. The size of the repeat region in MUC2 is 8.4 kb.
Despite considerable interest in the gene 5'-end and promoter, prior to this report no 5' cDNAs had been conclusively identified. Klomp et al. (1) had noted, however, that HGM-1, a cDNA isolated from a gastric cDNA library, is >60% similar to the MUC2 D-domain 3, which is approximately 3 kb downstream of the MUC2 transcription start site. Taken together with evidence that HGM-1 is part of MUC5AC (see above), its similarity to this upstream region of MUC2 suggested that HGM-1 might comprise an upstream region of MUC5AC. Our 5'-RACE-PCR studies yielded a 3.3-kb 5' extension of HGM-1. Within the MUC5AC sequence reported here are homologues of MUC2 D-domains 1 and 2 and the 5'-end of D-domain 3. Upstream of D-domain 1 is a signal peptide and translation start site.
Sequence similarity between MUC2 and MUC5AC cDNAs has suggested a common ancestral origin. With the extension of the 5'-end of MUC5AC and the demonstration of a conserved domain structure between the two genes, this theory gains increased support.
The cloning work described here has provided insights not only into mucin gene structure and evolution but also into the mechanisms by which mucin is overproduced in human disease. Although MUC5AC has been recognized for some time to encode an airway mucin, it was only recently discovered that its expression is up-regulated in airway disease. The work reported here is the first to establish that this up-regulation is controlled at the transcriptional level and that key cis- and trans-activating factors operate within 4 kb upstream of the transcription start site. Availability of the newly cloned sequence will permit precise identification of transcriptional control mechanisms and will facilitate elucidation of upstream signaling pathways as well.
FIG. 6. Primer extension and RNase protection assay. A, autoradiography showing a 114-bp primer extension product obtained from human stomach RNA (lane 2). Lane 1 shows a sequencing ladder, and lane 3 is a 32P-labeled φX174 HinfI marker with band sizes as indicated. B, autoradiography after electrophoresis of the protected fragment shows a major band and two minor bands. Lane 1 is a sequencing ladder, lane 2 is total RNA from NCIH292 cells exposed to P. aeruginosa, lane 3 is total RNA from gastric tissue, and lane 4 is total RNA from HM3 cells exposed to P. aeruginosa. The arrows extend from individual nucleotides in the sequence of MUC5AC-5'RP that correspond to the protected bands. The nucleotide corresponding to the primer extension (P.E.) result and the nucleotide predicted by the computer search to be the start of transcription are also indicated.