The Carboxyl-terminal Sequence of the Human Secretory Mucin, MUC6

The distribution of MUC6 suggests that its primary function is protection of vulnerable epithelial surfaces from damaging effects of constant exposure to a wide range of endogenous caustic or proteolytic agents. A combination of genomic, cDNA, and 3′ rapid amplification of cDNA ends techniques was used to isolate the carboxyl-terminal end of MUC6. The 3′ nontandem repeat region contained 1083 base pairs of coding sequence (361 amino acids) followed by 632 base pairs of 3′-untranslated region. The coding sequence consists of two distinct regions: region 1 contains the initial 270 amino acids (62% Ser-Thr-Pro with no Cys residues), and region 2 contains the COOH-terminal 91 amino acids (22% Ser-Thr-Pro with 12% Cys). Although region 1 had no homology to any sequences in GenBank, region 2 had approximately 25% amino acid homology to the COOH-terminal regions of human mucins MUC2, -5, and -5B and von Willebrand factor. The shortness of region 2 would leave little of the peptide backbone exposed to a potentially hostile environment. Antibody studies suggest that MUC6 in its native form exists as a disulfide-bonded multimer. The conservation of the 11 cysteine positions in region 2 suggests the importance of this short region to mucin polymerization.

Mucins are highly glycosylated glycoproteins that are the major structural component of the mucus gel. Because of their characteristic structural features, their physiological functions are thought to include cytoprotection, mechanical protection, maintenance of viscosity in secretions, and cellular recognition, among others. Mucins combine a peptide backbone with oligosaccharide side chains (attached in O-glycosidic linkages through N-acetylgalactosamine to threonine or serine residues) (1). These O-glycosylation sites are not randomly distributed but appear to be concentrated in certain regions of these genes, most notably in the extended arrays of tandemly repeated peptides that are the major structural feature of mucins. Each human mucin species thus far described has a unique peptide sequence in its repeat unit, which has served as the major means of identifying the different genes. Additionally, gel-forming mucins appear to have cysteine-rich regions flanking the tandem repeat arrays bilaterally. At this time eight distinct human mucin genes have been described (2-21). The human genes that have thus far been examined also appear to be highly polymorphic (2,22).
In addition to sharing the general needs of mucosal surfaces for protection from pathogenic organisms, desiccation, and mechanical damage, the gastrointestinal epithelium has specific needs that are different from other organ systems, i.e. protection against damage by low pH, bile, and proteolytic enzyme digestion (23). Evidence suggests that gastric mucin may be involved in the mechanism of gastric mucosal injury caused by Helicobacter pylori leading to gastritis, peptic ulceration, and possibly gastric cancer (24-26). Alteration in mucin expression from colonic to gastric type is associated with changes in the biological phenotype of cancer cells (27). To date two distinct gastric mucin species have been described, MUC5 and MUC6. MUC5 was originally isolated from a human tracheobronchial cDNA library and appears to have considerable homology to MUC2, MUC5B, and von Willebrand factor (vWF) throughout the entire coding region 3′ to the tandem repeat region (16,18,19,28). MUC6 was originally isolated from a human gastric cDNA library. It has the longest described tandem repeat unit at 169 amino acids and is expressed primarily in the upper gastrointestinal tract (20). MUC5 and MUC6 are expressed in different cell types, with the former found in surface mucous cells of the cardia, fundus, and antrum and the latter in the mucous neck cells of the fundus, antral-type glands of the antrum and cardia, and Brunner's glands of the duodenum (17,29). MUC6 has also been found in the gall bladder, pancreas, seminal vesicles, and female reproductive tract, all of which are exposed on a constant basis to bile or proteases (30-32). The protective functions of mucus are thought to depend on its gel-forming ability (oligomerization), which is known to be destroyed by exposure to proteases. MUC2 has considerable homology to vWF and is thought to oligomerize by similar mechanisms (28).
However, MUC2 also has fairly extensive regions of low glycosylation outside of its tandem repeats which would be relatively unprotected from the action of proteases. The expression of MUC6 in tissues exposed to hostile environments suggests that it may have some unique features imparting increased resistance in such a milieu.
In this manuscript we describe the isolation and sequencing of the 3′ nontandem repeat sequence from gastric mucosal RNA. The translated sequence was analyzed, and a peptide was synthesized to a region of the coding sequence containing minimal potential O-glycosylation sites. The peptide was then used to immunize rabbits, and the resulting antibodies were used both to confirm the accuracy of the translated sequence and to examine the structure of the mature MUC6 glycoprotein.

MATERIALS AND METHODS
Mucin Isolation and Antibody Production-Gastric mucin was isolated from eight paired fundic and antral specimens of human gastric tissue by a combination of Sepharose CL-4B column chromatography and cesium chloride density gradient centrifugation as described previously (20). For production of polyclonal antibodies to gastric apomucins, the mucin was then deglycosylated by exposure to hydrogen fluoride for 3 h at room temperature, conditions that have been shown to remove essentially all of the oligosaccharide side chains (33). Polyclonal rabbit antisera were developed against the deglycosylated mucin by standard methods (34). For the development of polyclonal antibodies to the carboxyl-terminal nontandem repeat portion of MUC6, the protein sequence was deduced from the cDNA nucleotide sequence, and a synthetic peptide was made (CRPLHSYEQQLELPCPDPSTPGRR) corresponding to a 24-amino acid region with a high antigenic index (35). This peptide will be subsequently referred to as M6CNTRP1 (MUC6 carboxyl nontandem repeat peptide 1). Antibodies to the keyhole limpet hemocyanin conjugate of this synthetic peptide were developed in rabbits as described previously (9).
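The residue composition of the synthetic peptide can be checked directly from the sequence given above. The following is a minimal sketch, not the analysis software used in this study; the function name `residue_percentages` and the grouping labels are illustrative assumptions. It counts Ser/Thr (potential O-glycosylation sites), Ser/Thr/Pro, and Cys residues in M6CNTRP1.

```python
from collections import Counter

# Peptide sequence as quoted in the text (M6CNTRP1, 24 residues).
M6CNTRP1 = "CRPLHSYEQQLELPCPDPSTPGRR"


def residue_percentages(peptide, groups):
    """Return the percentage of residues in `peptide` falling in each group.

    `groups` maps a label to a string of one-letter amino acid codes.
    """
    if not peptide:
        raise ValueError("peptide must not be empty")
    counts = Counter(peptide)
    total = len(peptide)
    return {
        label: 100.0 * sum(counts[res] for res in residues) / total
        for label, residues in groups.items()
    }


# Hypothetical grouping labels chosen for this illustration.
GROUPS = {"Ser+Thr": "ST", "Ser+Thr+Pro": "STP", "Cys": "C"}

composition = residue_percentages(M6CNTRP1, GROUPS)
for label, pct in composition.items():
    print(f"{label}: {pct:.1f}%")
```

The same function could be applied to region 1 or region 2 of the translated sequence to compare their Ser-Thr-Pro and Cys content, provided the full sequences are supplied.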
Nucleic Acid Isolation-Total RNA from gastric biopsy specimens (collected according to approved Committee on Human Research protocol) was isolated by homogenization immediately following harvest in a guanidinium thiocyanate solution followed by extraction with phenol:chloroform:isoamyl alcohol (25:24:1) as described previously (36). High molecular weight DNA from peripheral leukocytes was isolated by the method of Blin and Stafford (37).
cDNA and Genomic DNA Library Screening-The human stomach cDNA library in λgt11 was obtained from CLONTECH. Expression library screening was performed as described previously (20). Genomic cloning was performed using a human placental DNA library in the λ-FIX vector (Stratagene), also as described previously (22). Plaques giving a positive signal were isolated, plated, and rescreened, with this process being repeated until clonality was obtained. The sequence of MUC6 3′ nonrepetitive cDNA was generated by reverse transcription of total RNA followed by oligo(dT) anchor polymerase chain reaction of the 3′ end of the gene (3′ RACE; rapid amplification of cDNA ends). The following oligonucleotides were used: MUC6NTR1 (+): 5′-CTACAGCCTCTTCTTCCTTCATATCCT-3′; 3′ RACE dT(57) (−): 5′-AAGGATCCGTCGACATCGATAATACGACTCACTATAGGG-A(T)17-3′; 3′ RACE adaptor (−): 5′-GATCAAGCTTGGATCCTGCAGA-CATCGATAATACGAC-3′.
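Basic properties of the gene-specific primer can be derived from its sequence. The sketch below is an illustration only and is not part of the original protocol; the helper names `reverse_complement` and `gc_percent` are assumptions. It reports the length, GC content, and reverse complement of MUC6NTR1 as printed above.

```python
# Gene-specific (+) primer as quoted in the text.
MUC6NTR1 = "CTACAGCCTCTTCTTCCTTCATATCCT"

# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def gc_percent(seq):
    """Return the percentage of G and C bases in `seq`."""
    if not seq:
        raise ValueError("sequence must not be empty")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


primer_length = len(MUC6NTR1)
primer_gc = gc_percent(MUC6NTR1)
primer_rc = reverse_complement(MUC6NTR1)

print(f"length: {primer_length} nt")
print(f"GC content: {primer_gc:.1f}%")
print(f"reverse complement: 5'-{primer_rc}-3'")
```

The reverse complement is the strand to which the (+) primer anneals, which is useful when locating the primer site in a sequence reported on the opposite strand.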
One μg of gastric total RNA was reverse transcribed using Superscript II RT (Life Technologies, Inc.) and 100 ng of the 3′ RACE dT(57) oligonucleotide, and the cDNA was purified using silica matrix spin columns (GlassMAX spin cartridges; Life Technologies, Inc.). Primary amplification of the cDNA was then performed using the gene-specific primer MUC6NTR1 and the 3′ RACE adaptor. Thermal amplification of DNA was performed using a DNA thermal cycler and GeneAmp core reagents from Perkin-Elmer Corp.
Subcloning and DNA Sequencing-Sequencing was performed after subcloning of appropriately sized restriction fragments into Bluescript (Stratagene) or thermal amplification products into polymerase chain reaction 2 (Invitrogen) vectors. Both strands were sequenced for all clones described in this manuscript. The sequencing was performed using the dideoxynucleotide chain termination method with a modified T7 DNA polymerase (Sequenase, version 2.0; U. S. Biochemical Co.) or a modified Taq polymerase (Sequitherm, Epicentre Technologies) and 35S-dATP (Amersham) (38). Analysis of nucleic acid and protein sequence data was performed using MacVector (IBI) and Intelligenetics Suite (Intelligenetics, Inc.) software.
Immunoblotting-Slot blotting was performed using human gastric mucin (isolated as described earlier) solubilized in phosphate-buffered saline. Ovine submaxillary mucin was used as a control. Amounts of 0, 2, 4, and 6 μg were treated with β-mercaptoethanol (βME) at 50°C for 10 min or left untreated and placed on a 0.45-μm nitrocellulose membrane using a slot blotting apparatus (BioRad BIODOT SF apparatus). Each slot of the nitrocellulose membrane was then washed with 500 μl of phosphate-buffered saline, blocked with 3% gelatin in Tris-buffered saline (TBS; 20 mM Tris-HCl, pH 7.5, 500 mM NaCl) for 1 h, washed three times in TBS, exposed to the anti-M6CNTRP1 polyclonal serum for 2 h at room temperature, washed three times in TBS-Tween 20 (TBST; TBS + 0.05% Tween 20), and then incubated at room temperature with horseradish peroxidase-conjugated goat anti-rabbit IgG (Zymed) for 1 h. After washing twice with TBST and once with TBS, tetramethyl benzidine (Zymed) was used as the colorimetric agent. The nitrocellulose membrane was then washed in distilled water, dried in the dark, scanned, and stored as a digitized image that was analyzed using NIH Image version 1.54 software.

RESULTS
Standard cDNA cloning did not identify any clones containing both MUC6 tandem repeat and 3′ nontandem repeat sequence. We therefore probed a genomic library, isolating several genomic clones. One of these, clone GMUC6-4, contained the tandem repeat region on its 5′ end and nontandem repeat sequence on its 3′ end. After restriction mapping and subcloning of the insert, the junction of the tandem repeats and the nontandem repeat region was identified by DNA sequencing. An oligonucleotide was then synthesized to the beginning of the nontandem repeat region (MUC6NTR1).
Using total gastric RNA obtained from biopsy specimens as a target, cDNA was synthesized using Superscript II (Life Technologies, Inc.) reverse transcriptase and the 3′ anchor-dT primer. The resulting cDNA was then used as a template for thermal amplification using the primer pair MUC6NTR1 and 3′ RACE adaptor. This gave a 1735-bp polymerase chain reaction product that was cloned into Bluescript KS vector (Stratagene) and sequenced in both directions (Fig. 1).
Mapping of the genomic clone GMUC6-4 showed that the corresponding genomic DNA sequence contained three exons as shown in Fig. 2. Exon 1 appears to be quite long, encompassing the tandem repeats as well as the first 805 bp of the 3′ nontandem repeat region. This was followed by an intron of approximately 2500 bp, a second exon extending from 806 to 908 bp, a second intron of 273 bp, and finally by a third exon from 909 to the stop codon at 1083. Analysis of the translated DNA sequence showed that the carboxyl-terminal nontandem repeat portion of MUC6 consists of two distinct regions. Region 1 (270 amino acids) corresponds closely to exon 1 and contains a high Thr-Ser-Pro content (61.5%) but no cysteine residues. The amino acid composition of region 1 is very similar to that of the tandem repeats regarding the percentages of polar, nonpolar, acidic, and basic residues. In contrast, region 2 (91 amino acids), which corresponds closely to exons 2 and 3, contains a much lower Thr-Ser-Pro content (21.7%) but a high content of cysteine residues (12.0%) (Table I). Two potential N-glycosylation sites (Asn-X-Thr/Ser) are present, one in the Thr-Ser-Pro-rich region 1 (residue 224) and one in the Cys-rich region 2 (residue 288).
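The exon coordinates and region sizes reported above can be cross-checked with simple arithmetic. The following sketch uses only the numbers stated in the text; the variable and function names are illustrative assumptions. It confirms that the three exon segments sum to the 1083-bp coding sequence, that this corresponds to 361 codons, and that the region 1/region 2 boundary lies within a few base pairs of the exon 1/exon 2 boundary.

```python
# Coordinates (in bp, 1-based, inclusive) within the 3' nontandem repeat
# coding sequence, as stated in the text.
CODING_BP = 1083
EXON_SPANS = {
    "exon 1 (nontandem repeat portion)": (1, 805),
    "exon 2": (806, 908),
    "exon 3": (909, 1083),
}
REGION_AA = {"region 1": 270, "region 2": 91}


def span_length(start, end):
    """Length of an inclusive 1-based coordinate span."""
    return end - start + 1


exon_lengths = {name: span_length(*span) for name, span in EXON_SPANS.items()}
total_exon_bp = sum(exon_lengths.values())
codon_count = CODING_BP // 3
total_region_aa = sum(REGION_AA.values())

# Last base pair encoding region 1, and its offset from the end of exon 1.
region1_end_bp = REGION_AA["region 1"] * 3
boundary_offset_bp = region1_end_bp - EXON_SPANS["exon 1 (nontandem repeat portion)"][1]

for name, length in exon_lengths.items():
    print(f"{name}: {length} bp")
print(f"total exon sequence: {total_exon_bp} bp")
print(f"codons: {codon_count}; residues in regions 1 + 2: {total_region_aa}")
print(f"region 1 ends at bp {region1_end_bp}, "
      f"{boundary_offset_bp} bp past the end of exon 1")
```

The small offset between the region boundary and the exon boundary is consistent with the statement that region 1 corresponds closely, rather than exactly, to exon 1.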
Like several of the other mucins that are thought to be capable of forming gels, MUC6 has homology to vWF. However, unlike MUC2, MUC5, and MUC5B, which have homology through almost their entire carboxyl-terminal nontandem repeat regions, the homology in MUC6 is confined entirely to the 91 amino acids comprising region 2. The homology among MUC2, MUC5, MUC5B, vWF, and region 2 of MUC6 ranges between 24 and 29%, much of which is accounted for by conservation of position of the 11 cysteine residues (Fig. 3) (10, 18, 19, 39, 40).
To confirm the amino acid sequence of the carboxyl-terminal nontandem repeat portion of MUC6, polyclonal antibodies were raised against a 24-amino acid sequence that computer analysis predicted would be antigenic (M6CNTRP1). After this serum was shown to recognize M6CNTRP1 by enzyme-linked immunosorbent assay (data not shown), it was used in a slot-blot assay using purified human gastric mucin (Fig. 4) with purified ovine submaxillary mucin used as a control. Because the cysteine-rich region is thought to be important in forming disulfide bonds, one set of samples was treated with 5% βME at 50°C for 10 min prior to loading. This had the effect of increasing the signal for the gastric mucin approximately 2-fold (Fig. 4) as determined by densitometry (NIH Image 1.54) while not affecting the ovine submaxillary mucin controls appreciably (data not shown). The signal seen in the non-βME-treated gastric mucin sample is thought to result from a combination of dissociation of the mucin multimers during the isolation procedure and isolation of newly synthesized monomers present in the mucosal cells. Colonic mucin gave a signal of intermediate intensity which was not affected by incubation with βME (data not shown).
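The fold change in signal can be computed from background-subtracted slot densities. The sketch below is a generic illustration and does not reproduce the NIH Image analysis; the function names are assumptions, and all pixel density values are hypothetical numbers chosen only to show the calculation, not measured data from this study.

```python
def background_subtracted(raw_densities, background):
    """Subtract a background value from each slot density, clamping at zero."""
    return [max(value - background, 0.0) for value in raw_densities]


def mean_fold_change(treated, untreated):
    """Mean of per-slot treated/untreated ratios, skipping zero denominators."""
    ratios = [t / u for t, u in zip(treated, untreated) if u > 0]
    if not ratios:
        raise ValueError("no slots with a nonzero untreated signal")
    return sum(ratios) / len(ratios)


# Hypothetical pixel densities for 2, 4, and 6 ug of mucin (illustration only).
mucin_ug = [2, 4, 6]
background = 10.0
raw_with_bme = [50.0, 90.0, 130.0]
raw_without_bme = [30.0, 50.0, 70.0]

with_bme = background_subtracted(raw_with_bme, background)
without_bme = background_subtracted(raw_without_bme, background)
fold = mean_fold_change(with_bme, without_bme)

for amount, treated, untreated in zip(mucin_ug, with_bme, without_bme):
    print(f"{amount} ug: +bME {treated:.0f}, -bME {untreated:.0f}")
print(f"mean fold change with reducing treatment: {fold:.1f}")
```

Averaging per-slot ratios is one possible design choice; comparing the slopes of signal against amount loaded would be an alternative when the response is linear over the range tested.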

DISCUSSION
Certain regions of the gastrointestinal tract, particularly the gastric mucosa, have specialized protective needs because of their harsh environment. The constant exposure to acid and protease-rich secretions would rapidly lead to some degree of proteolysis for most proteins and hence to mucosal damage. Mucins are known to have protease-resistant regions conferred by the high content of O-linked oligosaccharide side chains. In a harsh environment, one would expect that the mucins making up the structural component of the mucus gel would maximize the protection of the core peptide by maximizing their Thr-Ser-Pro-rich regions. The distribution of MUC6 suggests that its primary function is protection of vulnerable epithelial surfaces from damaging effects of constant exposure to a wide range of endogenous caustic or proteolytic agents such as proteases (stomach, seminal vesicles, endocervix, endometrium, and pancreas), acid (stomach and duodenum), and bile (gall bladder and bile ducts) (17, 20, 30-32). It is therefore not surprising that MUC6 differs markedly from the other human mucins which are thought to be capable of gel formation, i.e. MUC2, MUC5, and MUC5B, all of which are located on chromosome 11p15.5. Whereas the other putative gel-forming mucins are cysteine-rich and have considerable homology to each other and vWF throughout most of their carboxyl nontandem repeat regions (853 amino acids for MUC5B, 984 amino acids for MUC2, and 1043+ amino acids for MUC5AC), the MUC6 carboxyl terminus is considerably shorter (361 amino acids) and has homology to these peptides only in its terminal 91 amino acids. The homology between region 2 of MUC6, the other 11p15.5 mucins, and vWF is approximately 25%, of which almost half is the result of the conservation of position of the 11 cysteine residues.
The 270 amino acids of region 1 have no homology to vWF or other mucins and contain 61% Thr-Ser-Pro with no Cys residues, a composition that is virtually identical to the tandem repeat region. These residues would presumably be glycosylated and hence protected to the same degree as the tandem repeats.
The carboxyl-terminal nontandem repeat region of MUC6 has two potential N-glycosylation sites. Human gastric mucin precursors appear to be N-glycosylated and, although inhibition of N-glycosylation by tunicamycin has been shown to inhibit the oligomerization, maturation, and secretion of rat gastric mucin, the effect of such inhibition on oligomerization of human gastric mucin is unclear (41,42).
The original intent of raising the antibodies against M6CNTRP1 was to confirm the reading frame of the translated amino acid sequence and to confirm the histological staining pattern for MUC6. Although the antibody recognized the synthetic peptide well, it was not satisfactory for immunohistochemistry on tissue sections, raising the question of whether the amino acid sequence was incorrect. We then performed a slot immunoblot with purified human gastric mucin using ovine submaxillary mucin as a control. This indicated specific binding of the antibody to human gastric mucin and suggested that the epitope recognized by this antibody was masked in native mucin, possibly by disulfide bonding between the mucin monomers. Repeating this experiment with and without βME treatment indicated that reduction of disulfide bonds considerably increased the intensity of the binding for α-M6CNTRP1 (Fig. 4). Oligomerization is known to occur early in the biosynthesis of rat gastric mucin (42) and if, as the data suggest, multimerization affects recognition of the α-M6CNTRP1 epitope, this would explain the lack of reactivity by immunohistochemistry on fixed tissues, in which mucins should remain intact. The reactivity of α-M6CNTRP1 to non-βME-treated human gastric mucin suggests that isolation of apomucin monomers and/or degradation of oligomers is occurring during mucin isolation, although reactivity with other sites on the apomucin cannot be ruled out entirely.
The formation of vWF multimers is known to first involve dimerization by disulfide bond formation in the carboxyl-terminal 151 amino acids (43,44). Similarly, porcine submaxillary mucin is known to form disulfide-bonded dimers in its carboxyl-terminal 240 amino acids. The formation of these dimers is not affected by inhibition of N-glycosylation (45). The data from slot immunoblots with and without βME treatment indicate that reduction of the disulfide bonds in region 2 of the carboxyl terminus is necessary to expose the epitope recognized by α-M6CNTRP1, i.e. MUC6 in its native form exists as a disulfide-bonded multimer, the secondary structure of which limits access to this epitope. The conservation of the positions of cysteines between the COOH termini of vWF (87 amino acids), porcine submaxillary mucin (88 amino acids), and MUC6 (91 amino acids) and the necessity for disulfide-linked multimerization for mucus gel formation strongly suggest that the cysteine-rich region necessary for dimerization in secretory mucins and vWF is even shorter than previously suspected.

Fig. 4 (legend, partially recovered): results with βME treatment at varying amounts of purified human gastric mucin; the bottom lanes are without βME treatment. Panel B, graphic representation of data with amounts of human gastric mucin plotted against pixel density within slots (background subtracted) as calculated by NIH Image version 1.54 software.