The N terminus of the MUC2 mucin forms trimers that are held together within a trypsin-resistant core fragment.

The N terminus of the human MUC2 mucin (amino acids 1-1397) has been expressed as a recombinant tagged protein in Chinese hamster ovary cells. The intracellular form was found to be an endoglycosidase H-sensitive monomer, whereas the secreted form was an oligomer that gave monomers upon disulfide bond reduction. The secreted MUC2 N terminus contained a trypsin-resistant core fragment. Edman sequencing and mass spectrometry of the peptides obtained localized this core fragment to the C-terminal end of the recombinant protein. This core retained its oligomeric nature with an apparent mass of approximately 240 kDa. Upon reduction, peptides of approximately 85 kDa were found, suggesting that the N terminus forms trimers. This interpretation was also supported by gel electrophoresis and gel filtration of the intact MUC2 N terminus. Electron microscopy revealed three globular domains each linked via an extended and flexible region to a central part in a trefoil-like manner. Immunostaining with gold-labeled antibodies localized the N-terminal end to the three globular structures, and the antibodies directed against the Myc and green fluorescent protein tags attached at the C terminus localized these to the stalk side of the central trefoil. The N terminus of the MUC2 mucin is thus assembled into trimers that contain proteolytically stable parts, suggesting that MUC2 can only be partly degraded by intestinal proteases and thus is able to maintain a mucin network protecting the intestine.

Mucins are highly glycosylated proteins protecting the mucosal surfaces of the body. All mucins are characterized by their heavy O-glycosylation, which is clustered in mucin domains rich in the amino acids Ser, Thr, and Pro. Mucins can be structured in different ways, either as membrane-bound with a transmembrane domain or secreted as monomers or as polymers (1,2). The latter mucins are also described as gel-forming, as these give the mucus its viscoelastic properties. Four human gel-forming mucins have been found, MUC2, MUC5AC, MUC5B, and MUC6, which are clustered on chromosome 11p15 (3). However, the best studied mucin is the porcine submaxillary mucin (PSM), which has become a model also for the human mucins of this type (3-6).
The MUC2 mucin is the main gel-forming mucin of the small and large intestines and is produced by the intestinal goblet cells (2,7-10). It is a major structural component of the mucous barrier covering the epithelium, protecting the epithelial cells from microorganisms as well as digestive enzymes. In fact, one of the remarkable features of this mucin is that it can withstand the pancreatic digestive enzymes. That the highly glycosylated mucin domains are totally resistant to these enzymes is easy to understand. However, the globular, less glycosylated ends must also be relatively proteolytically resistant, as the polymers must remain intact to maintain the mucous gel. The MUC2 mucin is difficult to solubilize, and it is insoluble in 4 M guanidinium chloride (8,10), a property that might be related to the appearance of nonreducible covalent linkages in addition to the disulfide bonds (10,11). The MUC2 mucin has been reported to be aberrantly expressed in the airways upon Pseudomonas aeruginosa infection in cystic fibrosis, where its properties might contribute to the abnormal mucous viscosity in this disorder (12).
Both N and C termini of human MUC2 show sequence similarities to the respective termini of the von Willebrand factor (vWF) and PSM. The vWF is dimerized in the endoplasmic reticulum via intermolecular disulfide bond formation of its C terminus. Multimerization takes place in the Golgi via disulfide bonds between cysteines in the N terminus, where the vWF forms end-to-end oligomers (13,14). Thus, the vWF will form linear polymers (13,14). This observation also suggests that the polymerized gel-forming mucins are linear, which is consistent with biophysical studies of mucins (15,16). However, when the N terminus of PSM was recombinantly expressed, the results suggested that it formed trimers instead of the predicted dimers (5,6). This suggestion has been disputed, and it has also been suggested that PSM may be oligomerized differently from other mucins.
To study how the N terminus takes part in the multimerization of the MUC2 mucin, the MUC2 N terminus was expressed in Chinese hamster ovary (CHO) cells. The secreted oligomerized mucin was purified and shown by electron microscopy to be held together in a central condensed core surrounded by three large globular domains. The oligomer was also shown to have a large portion that was resistant to trypsin digestion. The trypsin-resistant "core fragment" was localized and shown to be a trimer.

EXPERIMENTAL PROCEDURES
Construction of Expression Vector pSNMUC2-MG-DNA clones D8, SMUC343 (7), and SMUC313 (Footnote 2) contain bp 1-617, 565-1792, and 1665-4208, respectively, of the published sequence (GenBank/EBI accession number L21998) (7). D8 and SMUC343 were joined by a natural EcoRI site at bp 566, and SMUC313 was added to the end of SMUC343 by the natural SfiI site at bp 1676. RNA from the cell line LS174T was used to obtain a 241-bp reverse transcription-PCR product using primers 5'-ACGACGGTGACCGAGAACCATTT and 5'-TCTAGAGGTGATACACTTATCCATGGGCCA. The PCR product was ligated to the 3'-end of the D8-SMUC343-SMUC313 construct using a natural BamHI site to give a unique XbaI site at the 3'-end. The construct was made in pBluescript (Stratagene). The DNA was sequenced, compared with the published sequence, and found to be identical with one exception: at positions 4079 and 4080, AT was substituted with TA. This change gives rise to an amino acid substitution of His to Leu and has previously been found to be a conflict (17). This change was not corrected. Enhanced GFP was removed from the pEGFP vector (Clontech) by digestion with NheI and EcoRI, and the murine Ig κ-chain leader sequence from pSecTagA (Invitrogen) replaced enhanced GFP using the same sites. The multicloning site was changed by digesting with HindIII and XbaI and ligating the annealed oligonucleotides 5'-AGCTCGAGTGAGCTCACTAGTGAGCAGAAGCTGATCAGCGAGGAGGACCTGTCTAGAAAGCTTA and 5'-CTAGTAAGCTTTCTAGACAGGTCCTCCTCGCTGATCAGCTTCTGCTCACTAGTGAGCTCACTCG to the vector. The new multicloning site makes the HindIII and XbaI sites used in the cloning nonfunctional and adds new sites: SacI, XhoI, SpeI, and XbaI. The sequence between the SpeI and XbaI sites encodes a Myc tag, and a stop codon is situated after the XbaI site. The MUC2 N terminus (SacI to XbaI) was inserted into the pS-M vector via SacI and SpeI. Use of the SacI site (bp 96) removes the natural signal sequence and gives an in-frame ligation with the Myc tag, creating pSNMUC2-M.
Mutagenesis of pEGFP-C1, enabling in-frame ligation, was carried out using a QuikChange site-directed mutagenesis kit (Stratagene) with primers 5'-GCTAGCGCTACCGGTGCCACCATGGTGAGCAAGG and 5'-CCTTGCTCACCATGGTGGCACCGGTAGCGCTAGC. Enhanced GFP was cut out with NheI and XbaI and inserted in-frame into pSNMUC2-M in the XbaI site, resulting in the pSNMUC2-MG expression vector.
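The cloning design described above lends itself to a few quick consistency checks. The following Python sketch uses only the clone coordinates and oligonucleotide sequences quoted in the text (with line-break hyphens removed); the helper names are illustrative and not part of the original work. It checks that each join site lies within the overlap of the adjacent clones, that the two linker oligonucleotides anneal to leave 5' overhangs compatible with HindIII-cut (AGCT) and XbaI-cut (CTAG) ends, that the sequence between the SpeI and XbaI sites translates to the Myc epitope EQKLISEEDL (the commonly cited 9E10 epitope, an outside fact rather than something stated above), and that the two mutagenesis primers are exact reverse complements of each other.

```python
# Sanity checks on the cloning design; all helper names are illustrative.

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

# Clone coordinates (bp, inclusive) and join sites as quoted in the text.
clones = {"D8": (1, 617), "SMUC343": (565, 1792), "SMUC313": (1665, 4208)}
joins = [("D8", "SMUC343", 566), ("SMUC343", "SMUC313", 1676)]

def join_in_overlap(left: str, right: str, site: int) -> bool:
    """True if the join site falls inside the region shared by both clones."""
    start = max(clones[left][0], clones[right][0])
    end = min(clones[left][1], clones[right][1])
    return start <= site <= end

# Linker oligonucleotides from the text (hyphens from line breaks removed).
top = "AGCTCGAGTGAGCTCACTAGTGAGCAGAAGCTGATCAGCGAGGAGGACCTGTCTAGAAAGCTTA"
bottom = "CTAGTAAGCTTTCTAGACAGGTCCTCCTCGCTGATCAGCTTCTGCTCACTAGTGAGCTCACTCG"

def overhangs(top: str, bottom: str, n: int = 4):
    """Return both 5' overhangs if the strands anneal with n-nt overhangs, else None."""
    if revcomp(bottom)[:-n] != top[n:]:
        return None
    return top[:n], bottom[:n]

# Partial codon table: only the codons that occur in the Myc-coding stretch.
CODONS = {"GAG": "E", "CAG": "Q", "AAG": "K", "CTG": "L",
          "ATC": "I", "AGC": "S", "GAC": "D"}

def translate(seq: str) -> str:
    return "".join(CODONS[seq[i:i + 3]] for i in range(0, len(seq), 3))

# Coding stretch between the SpeI (ACTAGT) and XbaI (TCTAGA) sites of the linker.
spe_end = top.index("ACTAGT") + 6
xba_start = top.index("TCTAGA")
myc = translate(top[spe_end:xba_start])

# Mutagenesis primers from the text.
fwd = "GCTAGCGCTACCGGTGCCACCATGGTGAGCAAGG"
rev = "CCTTGCTCACCATGGTGGCACCGGTAGCGCTAGC"

print([join_in_overlap(*j) for j in joins])
print(overhangs(top, bottom))
print(myc)
print(revcomp(fwd) == rev)
```

Such checks only confirm internal consistency of the quoted sequences; they say nothing about the reading frame across the vector junctions, which depends on vector sequence not given here.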
Expression of SNMUC2-MG-The expression vector pSNMUC2-MG was transfected into CHO-K1 cells by electroporation using the Gene Pulser II RF module (Bio-Rad). Cells were trypsinized; washed once with 272 mM sucrose, 7 mM sodium phosphate buffer (pH 7.4), and 1 mM MgCl2; and incubated at 4°C for 10 min. Electroporation was performed using 400 V and 50% modulation at 40 kHz for 15 pulses of 1 ms each at 1-s intervals. Transfected cells were selected by adding 250 µg/ml G418 48 h post-transfection. Positive clones, determined by fluorescence, were subcloned in microtiter plates, and strongly fluorescent clones were expanded and investigated by radiolabeling and immunoprecipitation. One high-level expression clone was selected for further investigation.
Metabolic Labeling-Newly confluent CHO-pSNMUC2-MG cells were starved for 2 h at 37°C in 2 ml of Met- and Cys-free minimal essential medium (Invitrogen) with 10% fetal bovine serum per 27-cm² Petri dish and metabolically labeled at 37°C using 150 µCi of [35S]Met/Cys (Redivue PRO-MIX 35S cell labeling mixture, Amersham Biosciences) per Petri dish. The medium was collected and adjusted to 1 mM phenylmethylsulfonyl fluoride, 20 µg/ml aprotinin, and 5 mM N-ethylmaleimide. The cells were washed twice with cold phosphate-buffered saline before lysis in 2 ml of 50 mM Tris-HCl (pH 7.9), 150 mM NaCl, 1% Triton X-100, 1 mM phenylmethylsulfonyl fluoride, 20 µg/ml aprotinin, and 5 mM N-ethylmaleimide. Lysates were sonicated three times for 5 s each (MSE Soniprep 100, intensity of 15) and clarified by centrifugation.
Gel Electrophoresis-The Laemmli system was used for gel electrophoresis, which was carried out on 0.75-mm thick gels in a Mini-Protean II cell (Bio-Rad). Precision protein standards (Bio-Rad) were used as molecular mass markers. Silver staining of SDS-polyacrylamide gels was carried out according to the method described by Blum et al. (18).
Antibodies and Affinity Purification of Rabbit Antiserum-Rabbit anti-MUC2N1 antiserum was raised against the synthetic peptide EDPEEEVAPASC (GPEP11, amino acids 207-218 of MUC2) as previously described for the anti-MUC2N3 antiserum (19). This antiserum was used directly or purified on a 3-ml gel prepared from 1 g of thiopropyl-Sepharose 6B (Amersham Biosciences) and washed with 20 column volumes of binding buffer (100 mM Tris-HCl (pH 7.4), 500 mM NaCl, and 1 mM EDTA). 7 mg of peptide (GPEP11) dissolved in binding buffer were pumped through the gel at 40 µl/min for 16 h. After washing the column at 350 µl/min with 30 ml each of 10 mM Tris-HCl (pH 7.4), 100 mM glycine HCl (pH 2.5), and 10 mM Tris-HCl (pH 7.4), the anti-MUC2N1 antiserum (10 ml), dialyzed against 10 mM Tris-HCl (pH 7.4) and diluted with the same buffer to 50 ml, was applied to the column at 200 µl/min. The column was washed at 500 µl/min with 60 ml of 10 mM Tris-HCl (pH 7.4), followed by 60 ml of 10 mM Tris-HCl (pH 7.4) containing 500 mM NaCl. The antibodies were eluted with 30 ml of 100 mM glycine HCl (pH 2.5) into a tube with 3 ml of 1 M Tris-HCl (pH 8.0). Additional 2 M Tris base was added to bring the eluate to a pH of ~7.0. The eluate was concentrated, and the buffer was exchanged to 25 mM Tris-HCl (pH 7.4), 150 mM NaCl, and 0.02% (w/v) NaN3 by ultrafiltration (3000 × g at +4°C; Vivaspin 2 RC, Mr 30,000 cutoff, Sartorius Corp.). The anti-MUC2N3 antiserum (19) was used unpurified, and the anti-Myc mAb (clone 9E10, American Type Culture Collection, CRL-1729) was used as hybridoma spent culture medium. The anti-GFP mAb was from BD Biosciences.
Immunoblotting-Proteins were blotted onto nitrocellulose membranes (Trans-Blot transfer medium, Bio-Rad) in 25 mM Tris and 192 mM glycine using a semidry transfer cell (Trans-Blot SD, Bio-Rad) at 2 mA/cm² for 2 h. The membrane was blocked for 3 h at room temperature in 10% milk powder in phosphate-buffered saline with 0.1% Tween 20 and incubated at 4°C overnight with the primary antibody. The membrane was incubated for 1 h at room temperature in the secondary antibody (1:1000 dilution; anti-rabbit or anti-mouse alkaline phosphatase-conjugated IgG, Dako Corp.) and developed with nitro blue tetrazolium/5-bromo-4-chloro-3-indolyl phosphate.
Immunoprecipitation-50 µl of Dynabeads M-450 (goat anti-mouse IgG; 4 × 10^8 beads/ml; Dynal, Inc.) were washed three times with 500 µl of phosphate-buffered saline, 0.1% bovine serum albumin, and 0.1% sodium azide and incubated on a shaker for 1 h at room temperature with 200 µl of anti-Myc mAb hybridoma supernatant. The beads were washed four times with 500 µl of 50 mM Tris-HCl (pH 7.9), 150 mM NaCl, and 1% Triton X-100 and resuspended in 50 µl of the same buffer. 2 ml of metabolically labeled culture medium and cell lysate, respectively, were added and incubated overnight at 4°C. After washing 10 times with 500 µl of 50 mM Tris-HCl (pH 7.9), 150 mM NaCl, and 1% Triton X-100, the beads were eluted in 50 µl of 50 mM Tris-HCl (pH 6.8) and 1% SDS for 5 min at 95°C.
Purification of SNMUC2-MG-100 ml of culture medium and cell lysates, respectively, containing SNMUC2-MG were filtered in an Amicon Model 8400 stirred cell through an Amicon YM100 ultrafiltration membrane (Millipore Corp.). The volume was reduced to 25 ml, diluted to 300 ml with BT buffer (20 mM BisTris-HCl (pH 6.0)), and reduced again to 25 ml. This procedure was repeated four times. Aliquots (5 ml) of the concentrate were loaded onto a Superose 6 preparation grade column (1000 × 26 mm; Amersham Biosciences) in BT buffer at a flow rate of 1.5 ml/min. The eluting proteins were monitored at 280 nm and collected in 6-ml fractions. The SNMUC2-MG-containing fractions, as determined by SDS-PAGE, were pooled and directly loaded onto an ion-exchange column (Mono Q 5/5, Amersham Biosciences) in BT buffer. Medium-derived SNMUC2-MG was washed with 210 mM NaCl in BT buffer and eluted with 250 mM NaCl in BT buffer. Lysate-derived SNMUC2-MG was washed with 170 mM NaCl in BT buffer and eluted with 210 mM NaCl in BT buffer. The collected 1-ml fractions were individually analyzed by SDS-PAGE and silver-stained. After gel filtration as well as ion-exchange chromatography, the fractions were also subjected to SDS-PAGE and Western blotting using the anti-Myc mAb and anti-MUC2N1 antiserum. The fractions containing SNMUC2-MG were pooled (no forms of intact SNMUC2-MG were excluded), dialyzed against H2O, and lyophilized. This purification finally resulted in a yield of ~0.5 mg of SNMUC2-MG protein.
Footnote 2: J. R. Gum, Jr., unpublished data.
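The repeated concentrate-and-dilute cycles in this protocol act as a buffer exchange. As a rough illustration, and assuming that small solutes pass the 100-kDa membrane freely and that each cycle takes the retentate from 25 ml to 300 ml and back to 25 ml (idealizations that are not stated in the text), the fraction of the original low molecular mass medium components remaining after n cycles is (25/300)^n. The function name below is hypothetical.

```python
def residual_fraction(v_conc_ml: float, v_diluted_ml: float, cycles: int) -> float:
    """Fraction of freely permeating solute left after repeated dilute/concentrate cycles.

    Assumes ideal behavior: the solute concentration is the same on both sides
    of the membrane, so each cycle reduces it by v_conc_ml / v_diluted_ml.
    """
    return (v_conc_ml / v_diluted_ml) ** cycles

# Volumes from the protocol: 25 ml retentate, diluted to 300 ml per cycle.
for n in range(1, 5):
    print(n, residual_fraction(25, 300, n))
```

Under these assumptions each cycle gives a 12-fold depletion, so a handful of cycles is enough to replace the culture medium with BT buffer before chromatography.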
Preparation of Tryptic Core Fragment-Purified medium-derived SNMUC2-MG (100 µg) was incubated in 0.1% SDS for 5 min at 95°C and subsequently diluted with 9 volumes of 100 mM ammonium acetate (pH 8.5). After addition of 5 µg of tosylphenylalanyl chloromethyl ketone-treated trypsin (Sigma), the sample was incubated overnight at 37°C. For deglycosylation, the trypsin digest was stopped with 100 µg/ml aprotinin and 1 mM phenylmethylsulfonyl fluoride at room temperature for 4 h, and 4 units of N-glycosidase F (Roche Molecular Biochemicals) were added to the sample and incubated overnight at 37°C. Purification was done on lyophilized material that was redissolved in 0.1 M ammonium acetate (pH 8.5) and separated by gel filtration on a Superose 6 PC 3.2/30 column (Amersham Biosciences) using a Smart HPLC system (Amersham Biosciences). Fractions containing the tryptic core fragment were lyophilized with and without prior reduction in 10 mM dithiothreitol.
Deglycosylation with Hydrogen Fluoride-Immunoprecipitated 35S-labeled SNMUC2-MG was lyophilized before addition of 100 µl of anhydrous hydrogen fluoride and incubated for 90 min on ice. Hydrogen fluoride was evaporated under a stream of nitrogen and finally for 15 min in a SpeedVac concentrator. The material was redissolved in 20 µl of SDS sample buffer with 200 mM dithiothreitol and analyzed on a 5% SDS-polyacrylamide gel.
Enzymatic Deglycosylation-35S-Labeled SNMUC2-MG was immunoprecipitated, and the beads were resuspended in 50 µl of 50 mM Tris-HCl (pH 7.9), 150 mM NaCl, and 1% Triton X-100. Aliquots (10 µl) were washed twice with 100 µl of 50 mM sodium acetate (pH 6.0), 4 mM calcium chloride, and 100 µg/ml bovine serum albumin and resuspended in 50 µl of the same buffer before addition of 0.5 µl of neuraminidase (1 unit/ml), 0.5 µl of O-glycosidase (0.5 units/ml), or 0.5 µl of endoglycosidase H (5 units/ml) (Roche Molecular Biochemicals), respectively. Aliquots (10 µl) were washed twice with 100 µl of 50 mM sodium phosphate buffer (pH 7.0), 10 mM EDTA, and 100 µg/ml bovine serum albumin and resuspended in 50 µl of the same buffer before addition of 0.5 µl of N-glycosidase F (1000 units/ml). All samples were incubated on a shaker overnight at 37°C. After removal of the supernatant, the beads were eluted in 20 µl of SDS sample buffer containing 10% mercaptoethanol for 5 min at 95°C and analyzed by SDS-PAGE and autoradiography.
In-gel Digest-The deglycosylated tryptic core fragment was separated by SDS-PAGE on 4-15% gradient Ready-Gel (Bio-Rad) and stained with Coomassie Blue. After destaining in 30% methanol and 7% acetic acid, the gel was soaked in H2O overnight. The excised band was cut into 1-mm pieces. After successive washing steps for 30 min at room temperature on a shaker in 0.2 M ammonium bicarbonate, 0.2 M ammonium bicarbonate and 25% acetonitrile, H2O and 25% acetonitrile, H2O and 50% acetonitrile, and 100% acetonitrile, the gel pieces were air-dried for 2 h. 1 volume of trypsin (sequencing grade, Roche Molecular Biochemicals; 20 µg/ml in 0.2 M ammonium bicarbonate) was added to the dried gel, and 10 min later, an additional volume of 0.2 M ammonium bicarbonate was added and incubated overnight at 37°C. The peptides were eluted from the gel pieces in 4 volumes of 0.1% trifluoroacetic acid (for MALDI-MS) or 0.2% formic acid (for ESI-MS/MS) for 6 h at room temperature.
MALDI-MS-The reduced and alkylated tryptic core fragment without deglycosylation was in-gel trypsin-digested (see above), lyophilized, and purified on a Zip-Tip C18 column (Millipore Corp.) according to the manufacturer's instructions. The MALDI mass spectra were collected on a Micromass Tof-Spec E time-of-flight mass spectrometer with delayed extraction in the reflectron mode. 0.5 µl of matrix solution (10 mg of α-cyano-4-hydroxycinnamic acid (Sigma) in 1 ml of 1:1 acetonitrile/water) were mixed on the target with 0.5 µl of sample and then left to dry. The monoisotopic peptide masses from the MALDI mass spectra were matched against theoretical tryptic peptides from the full-length SNMUC2-MG sequence for identification.
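The matching step described above can be illustrated with a minimal in-silico digest. The sketch below is not the software used in the study; it assumes the common simplified trypsin rule (cleavage C-terminal to Lys or Arg unless the next residue is Pro), no missed cleavages, and unmodified cysteines. The demonstration sequence is a short stretch assembled from the two C-terminal peptides named in the Results (VNCCWPMDK and CITSSEQK) followed by the remainder of the standard Myc epitope; the observed m/z list is hypothetical, and all function names are illustrative.

```python
import re

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # mass of H2O added for the free N and C termini
PROTON = 1.00728   # mass of a proton

def tryptic_digest(seq: str) -> list[str]:
    """Split after K or R, except before P; no missed cleavages."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", seq) if p]

def mono_mass(peptide: str) -> float:
    """Monoisotopic neutral mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER

def match(observed_mh: list[float], peptides: list[str], tol: float = 0.2):
    """Pair each observed [M+H]+ value with theoretical peptides within tol Da."""
    hits = []
    for mz in observed_mh:
        for pep in peptides:
            if abs(mono_mass(pep) + PROTON - mz) <= tol:
                hits.append((mz, pep))
    return hits

demo = "VNCCWPMDKCITSSEQKLISEEDL"       # illustrative stretch, see lead-in
peptides = tryptic_digest(demo)
observed = [895.4, 1200.0]               # hypothetical peak list
print(peptides)
print(match(observed, peptides))
```

A real search would also consider missed cleavages and the cysteine modifications used in the experiment (carboxyamidomethylation or mercaptoethanol adducts), which shift the theoretical masses.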
ESI-MS/MS-0.1 volume of the in-gel trypsin-digested core fragment was purified on a Zip-Tip C18 column according to the manufacturer's instructions. The eluted volume (50%, ~2 µl) was analyzed by nanoflow ESI-MS/MS on a Micromass Q-Tof 1 mass spectrometer. Argon was used as the collision gas with a collision energy of 40 eV.
Edman Sequencing-The lyophilized tryptic core fragment (100 pmol) was redissolved in SDS sample buffer containing 100 mM dithiothreitol and separated by SDS-PAGE on 5% Ready-Gel. After Western blotting onto polyvinylidene difluoride membrane (Immobilon-P, Millipore Corp.) in 50 mM sodium borate buffer (pH 8.9) and 10% methanol for 90 min at 2 mA/cm², the membrane was stained with 0.1% Coomassie Blue R-250 (Bio-Rad). The stained band was cut out and N-terminally sequenced by Edman degradation on a Procise 492 Protein Sequencer (Applied Biosystems). A pulsed-liquid sequencing method for polyvinylidene difluoride-blotted protein was used according to the manufacturer's instructions.
Electron Microscopy-SNMUC2-MG was analyzed by negative staining and electron microscopy as described previously (20). The usual sample concentration was ~20 µg/ml in 50 mM Tris-HCl (pH 7.4) and 0.15 M NaCl. Aliquots (5 µl) were adsorbed onto carbon-coated grids for 1 min, washed with 2 drops of water, and stained with 2 drops of 0.75% uranyl formate. The grids were rendered hydrophilic by glow discharge at low pressure in air. In some experiments, the N-terminal particles were identified with the anti-Myc mAb (9E10), the anti-GFP mAb (BD Biosciences), or the purified rabbit anti-MUC2N1 antiserum, all labeled with colloidal thiocyanate gold (21). Specimens were observed in a Jeol JEM 1230 electron microscope operating at an accelerating voltage of 60 kV. Images were recorded with a Gatan Multiscan 791 CCD camera.

RESULTS
Expression of the MUC2 N Terminus-A plasmid expressing the MUC2 N terminus up to amino acid 1397 (7) followed by a Myc tag and GFP was constructed and called pSNMUC2-MG. The MUC2 sequence contains all of the MUC2 N terminus, including the last cysteine (position 1395), and stops at the beginning of the first and small mucin domain. The plasmid was transfected into CHO-K1 cells, and stable cell lines were selected from fluorescent colonies. These were further cloned and analyzed for secretion of the MUC2 N terminus. Several clones showed high-level and consistent secretion of proteins migrating identically upon gel electrophoresis when stained chemically or with antibodies reacting with SNMUC2-MG (anti-MUC2N3 antiserum, anti-Myc mAb, and anti-GFP mAb). One of these clones was selected, expanded, and used for the purification of SNMUC2-MG.
To study the expression and sizes of the MUC2 N terminus expressed in the selected CHO cell lines, the cells were metabolically labeled, immunoprecipitated using the anti-Myc mAb, and analyzed by SDS-PAGE. Only one band was precipitated from the cell lysate with an apparent mass of ~220 kDa, and one band from the medium with an apparent mass of ~260 kDa, both analyzed in their reduced forms (using the sizes of the Precision protein standards) (Fig. 1A). The nonreduced secreted MUC2 N terminus gave only one large band with a size exceeding that of apoB-100 (512 kDa), which was used as a molecular mass marker. The lack of a marker larger than the N terminus made estimation of the size of the secreted oligomeric product difficult. However, the fact that it migrated significantly slower than the apoB-100 marker suggested that its oligomeric state was higher than a dimer.
To analyze the processing of the N terminus in CHO cells, pulse-chase studies were performed. The cells were metabolically labeled for 15 min, followed by a chase for up to 3 h, and analyzed by immunoprecipitation using the anti-Myc mAb and SDS-PAGE. The intracellular band was observed already after the pulse period, and this band vanished at the same time (90-135 min) that the secreted oligomer appeared in the medium (Fig. 1B). These results suggest that the N terminus of MUC2 is oligomerized relatively late in the secretory pathway and that the secreted form is present as a disulfide bond-stabilized oligomer. To further study the reason for the difference in size between the intracellular form and the secreted oligomer, both the intracellular and secreted forms were purified.
Purification of SNMUC2-MG-To purify SNMUC2-MG in its secreted oligomeric form as well as in its intracellular monomeric form, identical techniques were used. First, both the culture medium and cell lysate were concentrated by four rounds of ultrafiltration through a 100-kDa cutoff filter with intermediate dilution with buffer. The concentrates thus lacked small components and were further purified by gel filtration on Superose 6. Fractions containing SNMUC2-MG were pooled and directly loaded onto a Mono Q ion-exchange column. The bound proteins were stepwise eluted with increasing amounts of NaCl. The secreted oligomeric form of SNMUC2-MG could be purified to homogeneity, as confirmed by SDS-PAGE, revealing the reduced monomers at an apparent molecular mass of 260 kDa (Fig. 2, lane 1). The purified intracellular monomers with an apparent molecular mass of 220 kDa were still contaminated by a weak band of ~170 kDa (Fig. 2, lane 2). On Western blots, the 260- and 220-kDa bands were detected by the anti-MUC2N3 antiserum, the anti-GFP mAb, and the anti-Myc mAb. However, the 170-kDa band was stained only by the anti-MUC2N3 antiserum, suggesting that it was a product of proteolytic cleavage, where the remaining protein lacked the Myc tag and GFP (data not shown).
Glycosylation Status of the MUC2 N Terminus-When the intracellular and secreted forms of SNMUC2-MG were analyzed in their reduced forms, a difference in size of ~40 kDa was revealed (Fig. 2). As glycosylation is one of the possible explanations for this, N-glycosylation and O-glycosylation were analyzed. As CHO cells preferentially make mono- and disialylated Galβ1-3GalNAc-(Ser/Thr), O-glycosylation can be studied by neuraminidase treatment, followed by O-glycosidase treatment. As shown in Fig. 3A, when analyzed in their reduced forms, the intracellular form (lysate) was unaffected by these enzymes, whereas the secreted form was reduced in size by this treatment. The secreted form was still larger than the intracellular one, suggesting additional differences or incomplete O-deglycosylation.
When the N-glycans were studied by endoglycosidase H digestion, the intracellular N terminus was reduced in size, whereas the secreted one was unaffected (Fig. 3A). Both forms were affected by the N-glycosidase F treatment. That the intracellular form was endoglycosidase H-sensitive and lacked O-glycans suggested that this N terminus was trapped in the endoplasmic reticulum. This information, together with the pulse-chase studies, suggests a relatively slow folding process and an efficient endoplasmic reticulum exit control for the MUC2 N terminus, resulting in accumulation in this compartment.
The secreted form of SNMUC2-MG was still larger than the intracellular form after removal of the O-glycans (Fig. 3A). Part of this is due to the N-glycans attached at the 10 potential sites, but not all of the difference is accounted for in this way. To further study this, the secreted and intracellular forms of the N terminus were subjected to anhydrous hydrogen fluoride treatment, known to effectively remove both N- and O-linked glycans, followed by PAGE (Fig. 3B). The sizes of both forms decreased, but the secreted form of SNMUC2-MG was still slightly larger (10-20 kDa) than the intracellular form. The sizes of the endoglycosidase H- and hydrogen fluoride-treated intracellular forms of SNMUC2-MG were similar, suggesting that the hydrogen fluoride treatment effectively removed all glycans. This suggests that the secreted MUC2 N terminus has modifications in addition to glycosylation giving an apparent higher molecular mass. It is less likely that the intracellular form is smaller due to proteolytic cleavage, as this band was stained by all available antibodies against SNMUC2-MG, but this cannot be excluded.
Trypsin-resistant Core Fragment-To further study how the secreted oligomerized MUC2 N terminus was assembled, it was subjected to proteolytic digestion. The purified nonreduced and reduced/alkylated forms of SNMUC2-MG from the spent culture medium were treated with trypsin and analyzed by SDS-PAGE under reducing conditions. As expected, the reduced/alkylated form of SNMUC2-MG did not give any larger peptides analyzable by SDS-PAGE. However, the trypsin-digested nonreduced form of SNMUC2-MG gave a single distinct band migrating with an apparent molecular mass of 85 kDa when analyzed on a reducing gel (Fig. 4) compared with the non-trypsinized molecule (260 kDa). When trypsin-digested nonreduced SNMUC2-MG was analyzed under nonreducing conditions, a band with an apparent molecular mass of 240 kDa was observed (Fig. 4B). This suggests that the disulfide bond-stabilized N-terminal oligomer is held together within this fragment, here called a core. Kinetic studies revealed that the SNMUC2-MG protein was converted to the 85-kDa band within 1 h, and no further degradation was observed for up to 24 h (data not shown). It was possible to digest SNMUC2-MG to the core fragment also without the preincubation with 0.1% SDS, but SDS pretreatment gave more consistent results. A band with an apparent mass of ~85 kDa was also obtained when nonreduced SNMUC2-MG was treated (without SDS pretreatment) with the proteases chymotrypsin, Glu-C, and thermolysin. Proteolytic digestion with Pronase gave a single band of ~60 kDa (data not shown). Subtilisin was the only protease tested that degraded nonreduced SNMUC2-MG to smaller peptides.
To delineate the part of the MUC2 N terminus that was involved in the oligomerization corresponding to the core fragment, epitope mapping and protein sequencing studies were performed. Two antisera (anti-MUC2N1 and anti-MUC2N3) against the MUC2 N terminus as well as the anti-Myc mAb were used in Western blot experiments on trypsin-digested SNMUC2-MG separated by SDS-PAGE under reducing conditions (Fig. 4A). The anti-MUC2N1 antiserum did not react with the 85-kDa core fragment, whereas the anti-MUC2N3 antiserum did, revealing that the C-terminal part of MUC2 in SNMUC2-MG was part of the core. The Myc epitope was inserted directly after the MUC2 N terminus, replacing the first and small mucin domain of MUC2 (7). As the anti-Myc mAb did not react with the core fragment, the core was localized within the last part of the D3 domain of the MUC2 N terminus as outlined in Fig. 5.
The N-terminal end of the core fragment was determined by Edman sequencing of the 85-kDa core fragment after SDS-PAGE and blotting of trypsin-treated SNMUC2-MG. A single sequence was obtained, EAPTXPD, where X corresponds to a cysteine that could not be analyzed in this experiment (Fig. 5 and Table I). The trypsin-resistant core fragment thus had a homogeneous N-terminal end and was due to cleavage after Arg1022 in the MUC2 sequence. The C terminus of the core fragment has to extend at least over amino acids 1168-1181, as the anti-MUC2N3 antiserum showed reactivity (see Fig. 5). To further determine the C-terminal end of the core fragment, it was purified and subjected to SDS-PAGE under reducing conditions. The 85-kDa band was in-gel trypsin-digested, and the peptides obtained were analyzed by MALDI-MS and ESI-MS using nanospray on a Q-Tof mass spectrometer. The trypsin cleavage sites are shown in Fig. 5, and the observed peptides in Table I. The molecular mass of the peptide corresponding to the N-terminal end as revealed by Edman sequencing was found at m/z 2534.1 in both the MALDI-MS and ESI-MS experiments. Several of the internal peptides were observed, and some were possible to sequence by ESI-MS/MS (underlined sequences in Fig. 5). The peptides containing amino acids 1147-1166 and 1228-1251 contained one and two potential N-glycosylation sites, respectively, and were detected only after N-glycanase treatment.
The most C-terminal peptide found was at m/z 1022.4 (2+) and 1012.9 (2+) (masses of 2043.8 and 2024.8 Da for the mercaptoethanol- and carboxyamidomethyl-modified cysteine, respectively) and corresponded to amino acids 1367-1383. This peptide could be sequenced by the electrospray mass spectrometer as shown in Fig. 6. The sequence can be read from the C-terminal end as the y ion series all the way up to y15 at m/z 1800.8 and from the N terminus as the b ion series up to b5 at m/z 634.3. The masses revealed that the cysteine had been modified by mercaptoethanol and that Asn1373 in the original MUC2 sequence had been deamidated into Asp. That the plasmid constructed coded for Asn at this position was verified by DNA sequencing. The deamidation of Asn was observed in several of the peptides (Table I) and is probably due to unintended deamidation during the preparation procedures. This last peptide was thus produced by cleavage after Lys1383. The next potential trypsin cleavage site is only two amino acids farther C-terminal, and such a dipeptide will not be detected. There are two potential additional peptides toward the C terminus (Fig. 5). The first of these peptides (VNCCWPMDK) is derived only from MUC2, and the second (CITSSEQK) is from the last amino acids of MUC2 (Thr1397), two cloning artifacts (SS), and the first three amino acids of the Myc tag (EQK). We have not been able to find these two peptides despite careful searches. This means that it is likely that the core fragment ends at amino acid 1383 (or 1385).
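The relation between the doubly charged m/z values and the masses quoted above can be made explicit. The sketch below assumes a proton mass of 1.00728 Da and standard monoisotopic mass shifts for the two cysteine modifications (about 75.998 Da for a mercaptoethanol adduct and 57.021 Da for carboxyamidomethylation); these constants are outside values, not taken from the text. With them, the quoted masses of 2043.8 and 2024.8 Da appear to correspond to the singly protonated form, [M+H]+, rather than to the neutral peptide, and the spacing between the two species agrees with the difference between the two modifications.

```python
PROTON = 1.00728  # Da

def neutral_mass(mz: float, z: int) -> float:
    """Neutral mass M from an [M + zH]z+ ion observed at m/z."""
    return z * (mz - PROTON)

def singly_protonated(mz: float, z: int) -> float:
    """Equivalent [M+H]+ value for an ion observed at charge z."""
    return neutral_mass(mz, z) + PROTON

me_form = singly_protonated(1022.4, 2)   # mercaptoethanol-modified Cys
cm_form = singly_protonated(1012.9, 2)   # carboxyamidomethyl-modified Cys

# Assumed monoisotopic shifts of the two cysteine modifications (Da).
ME_SHIFT, CM_SHIFT = 75.99829, 57.02146

print(round(me_form, 1), round(cm_form, 1))
print(round(me_form - cm_form, 2), round(ME_SHIFT - CM_SHIFT, 2))
```

The small residual difference between the observed and expected spacing is within the one-decimal precision of the reported m/z values.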
It is thus assumed that the core fragment is made up of 361 (or 363) amino acids with a predicted molecular mass of 40 kDa (or 42 kDa). This is considerably smaller than the apparent molecular mass determined by SDS-PAGE (85 kDa). The reason for this apparent difference is probably glycosylation. Removal of the five potential N-glycans by N-glycosidase F gave products that migrated at ~65 kDa (data not shown). The core fragment also has 70 Ser and Thr residues, several of which are gathered in a mucin-type domain sequence containing 23 Thr and 6 Ser residues out of 44 amino acids (Fig. 5). The average O-glycan found in proteins expressed in CHO cells has one NeuAc, one Gal, and one N-acetylgalactosaminitol residue, suggesting that only a fraction of the hydroxyamino acids need to be glycosylated to account for the size discrepancy.
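The size argument above can be followed as a back-of-the-envelope budget. The sketch below is only an order-of-magnitude illustration under loud assumptions: an average residue mass of 110 Da (a rule of thumb, so the result differs slightly from the sequence-based 40 or 42 kDa quoted above), average masses of 291.26, 162.14, and 203.19 Da for linked NeuAc, Gal, and GalNAc residues, and the further simplification that apparent SDS-PAGE masses are additive, which is known to be unreliable for glycoproteins.

```python
# Rough mass budget for the tryptic core fragment (illustrative only).

FIRST_RESIDUE = 1023            # cleavage after Arg1022
AVG_RESIDUE_DA = 110.0          # rule-of-thumb average residue mass (assumption)

def length(last_residue: int) -> int:
    return last_residue - FIRST_RESIDUE + 1

lengths = {last: length(last) for last in (1383, 1385)}
approx_kda = {last: n * AVG_RESIDUE_DA / 1000 for last, n in lengths.items()}

# Apparent SDS-PAGE masses from the text (kDa).
intact, after_pngase_f, polypeptide = 85.0, 65.0, 40.0

n_glycan_part = intact - after_pngase_f        # attributed to N-glycans
remaining = after_pngase_f - polypeptide       # left to explain

# One NeuAc + one Gal + one GalNAc per O-glycan (assumed average residue masses, Da).
O_GLYCAN_DA = 291.26 + 162.14 + 203.19
n_o_glycans = remaining * 1000 / O_GLYCAN_DA

SER_THR_SITES = 70
print(lengths, approx_kda)
print(n_glycan_part, remaining)
print(round(n_o_glycans), round(n_o_glycans / SER_THR_SITES, 2))
```

Under these assumptions the residual size difference would be covered by glycans on roughly half of the 70 Ser and Thr residues, in line with the statement that only a fraction of the hydroxyamino acids need to be glycosylated.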
Oligomeric Status of the Intact Secreted MUC2 N Terminus-As already suggested from Fig. 1, the intact secreted form of SNMUC2-MG has a disulfide bond-stabilized oligomeric state larger than a dimer. To further analyze this, size-exclusion gel chromatography was performed with 4 M guanidinium chloride on a Superose 6 column. Nonreduced SNMUC2-MG eluted at 10.35 ml, slightly before the largest molecular mass standard, thyroglobulin, with a mass of 669 kDa, which eluted at 10.70 ml (see "Experimental Procedures"). The size of the MUC2 N terminus could thus not be accurately determined, but was estimated to be 690-750 kDa. As reduced SNMUC2-MG had an estimated mass of 260 kDa, the MUC2 N terminus appears to be organized into a trimer.
Oligomeric Status of the Trypsin-resistant Core Fragment of the MUC2 N Terminus-To determine whether the core fragment was still held together as an oligomer, nonreduced SNMUC2-MG purified from the spent culture medium was treated with trypsin and analyzed before and after reduction by SDS-PAGE. The nonreduced core fragment gave a band at ~240 kDa (Fig. 4B). Upon reduction, this band shifted to 85 kDa. Both of these bands were now within range of the molecular mass markers, allowing accurate size estimations. The size difference between the nonreduced and reduced core fragments suggests that the core is held together as a trimer.
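The trimer assignment rests on the ratio of nonreduced to reduced masses. A minimal sketch of that arithmetic, using only the mass estimates quoted in the two preceding subsections, is given below; the helper name `subunit_ratio` and the dictionary labels are hypothetical, and the rounding to the nearest integer is an illustrative simplification rather than a statistical treatment.

```python
def subunit_ratio(oligomer_kda: float, monomer_kda: float) -> float:
    """Ratio of nonreduced (oligomer) to reduced (monomer) apparent mass."""
    return oligomer_kda / monomer_kda


# (nonreduced mass, reduced mass) in kDa, taken from the text.
estimates = {
    "intact N terminus, gel filtration (lower bound)": (690.0, 260.0),
    "intact N terminus, gel filtration (upper bound)": (750.0, 260.0),
    "trypsin-resistant core, SDS-PAGE": (240.0, 85.0),
}

for label, (oligomer, monomer) in estimates.items():
    ratio = subunit_ratio(oligomer, monomer)
    print(f"{label}: ratio {ratio:.2f}, nearest integer {round(ratio)}")
```

All three ratios fall between 2.6 and 2.9, so each is closer to three subunits than to two or four, which is the basis for interpreting both the intact N terminus and the core fragment as trimers.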
Electron Microscopy of the Secreted MUC2 N-terminal Trimer-To further analyze the oligomeric status as well as macromolecular structure, the nonreduced N terminus was subjected to electron microscopy after negative staining. To be able to localize the domains, anti-Myc, anti-GFP, and immunopurified anti-MUC2N1 antibodies were gold-labeled, and SNMUC2-MG was immunostained. The overall picture of the MUC2 N terminus is a variable three-dimensional structure (Fig. 7). However, a general molecular structure with a central domain connected with single flexible threads to three identical globular domains could be observed. These flexible parts make the images variable, with many molecules less well spread and exposed on the grid. The central domain, holding the molecule together, was stained by both the anti-GFP (Fig. 7, A-C, E, and F) and anti-Myc (Fig. 7D) antibodies. Only one antibody dot could be observed per molecule despite the fact that each trimer should contain three sites. The reason for this is not known, but is most likely due to steric hindrance, an interpretation supported by the observation that it is impossible to catch and detect this protein with the same antibody (anti-Myc or anti-GFP) in sandwich enzyme-linked immunosorbent assay experiments. When the target was a little more strongly negatively stained, the three large globular domains were less pronounced, whereas the central domain was more easily visualized (Fig. 7, E and F). In some molecules, the central part had an electron-dense trefoil-like structure (Fig. 7F). This trefoil could be part of the D3 domain where the MUC2 N terminus is held together.
Footnotes to Table I: a, aa, amino acid; Cys(Cm), carboxyamidomethylated Cys; Cys(ME), mercaptoethanol-modified Cys. b, Asn deamidated to Asp, preparatory artifact. c, Asn to Asp due to deglycosylation with N-glycosidase F.
The flexible thread is probably a single peptide chain linking the central core with the large globular domain made up of the D1 and D2 domains (see Fig. 5). This interpretation is supported by the gold-labeled anti-MUC2N1 antibody, generated against a peptide from the D1 domain (Fig. 5). This antibody can stain one, two, or three of these globular domains (Fig. 7, G-I). The gold particles are located farthest away from the central parts, suggesting that the N-terminal D1 domain is located in the outer parts of the molecule. Some images (Fig. 7G) suggest a substructure in the large globular parts, possibly due to the D1 and D2 domains. This interpretation of the electron microscopy images is schematically explained in Fig. 7K. The size of the large globular domain is estimated to be ~10 nm, and the length of one subunit is estimated to be 20-25 nm when maximally extended. The extended parts have not been localized in detail due to a lack of functional antibodies. It can thus be concluded that the MUC2 mucin N terminus is oligomerized into a trimer and that this is held together with disulfide bonds in a trypsin-resistant, relatively compact domain made up by amino acids 1023-1383 of MUC2.
DISCUSSION
CHO cells permanently expressing and secreting the N terminus of the MUC2 mucin in high quantities were instrumental in the studies presented here. The selection of clones was greatly facilitated by the presence of GFP at the C terminus, as only green cell colonies were picked and screened for secretion. The initial use of COS cells was abandoned, as these cells secrete only small amounts of the recombinant protein, probably due to a less efficient endoplasmic reticulum folding machinery for this protein. However, CHO cells also seem to have some difficulties in folding the MUC2 N terminus, as suggested by the presence of relatively large amounts of endoglycosidase H-sensitive MUC2 monomers in the cell lysates. Despite this, the spent culture medium contained large quantities of the recombinant protein, allowing an easy purification procedure and subsequent biochemical studies.
Already at the time of cDNA sequencing of the MUC2 mucin (7), it was suggested that the assembly of this mucin followed the principles of the vWF (13,14). The assembly process is initiated in the endoplasmic reticulum, where dimers are formed by the C terminus via its cysteine knot. This has been shown to be the case for the full-length vWF (14), MUC2 (19,22), and MUC5AC (23) as well as for the recombinantly expressed C terminus of PSM (4), rat Muc2 (24), and human MUC2. 3 That this process is localized to the endoplasmic reticulum is suggested by the rapid formation of the dimers, the endoglycosidase H sensitivity, the subcellular localization of dimers, and the effects of drugs (brefeldin A) on the process (3,4,19,22).
The formation of the N-terminal oligomer was late in the assembly process, as the secretion peaked almost 2 h after translation. The oligomerization and secretion of MUC2 were fast, as no fully glycosylated monomers or oligomers were detected inside the cell. This suggests that the oligomerization is late in the secretory pathway, maybe occurring in the trans-Golgi network or secretory vesicles concomitantly with secretion.
Following the initial C-terminal dimerization, the N terminus is responsible for further oligomerization in vWF, PSM, and MUC2 proteins (4-6, 11, 13, 14). This oligomerization is based on the formation of N-terminal dimers in the case of the vWF; and due to the mucin sequence similarities to the vWF, it was predicted that mucins should also be assembled similarly. This assumption was based not only on these sequence similarities, but also on the majority of biophysical information that was pointing in the direction of mucins being linear polymers (15,16,25). It thus came as a surprise when Hill and co-workers (5,6) expressed the N terminus of PSM and observed bands suggesting that it formed trimers instead of dimers. This interpretation was based on migration on SDS-polyacrylamide gels, where the large size of the trimer made accurate size estimations difficult. It has also been discussed whether PSM could be an exception and different from other mucins. Here, we have shown that the N terminus of the MUC2 mucin also forms trimers, supporting the previous results with PSM. Four lines of evidence support the conclusion that the MUC2 N terminus forms trimers. First, the nonreduced secreted MUC2 N terminus migrated as an oligomer larger than a dimer, most likely a trimer, upon SDS-PAGE. Second, the nonreduced secreted MUC2 N terminus eluted before thyroglobulin upon gel filtration, giving an estimated size of ~700 kDa. Third, trypsin cleavage of the nonreduced secreted MUC2 N terminus gave a core fragment that was three times larger than its constituent peptides obtained after reduction. Last, electron microscopy of the MUC2 N terminus revealed three globular domains held together in a central domain that looks like a trefoil. Thus, it must be concluded that MUC2 forms a trimer structure in its N terminus.
The formation of N-terminal trimers in MUC2 and PSM suggests that at least two mucin species do not form linear polymers. The easiest interpretation is that these mucins will form branched networks. This may not be consistent with most of the biophysical information for soluble mucins, including PSM, which has predicted linear polymers. This interpretation was based on light scattering, ultracentrifugation, and molecular electron microscopy studies (15,16,25). One possibility is that MUC2 and PSM are different from the other gel-forming mucins. This is true for MUC2, as it is known to form insoluble complexes (8,10), as is a minor portion of MUC5B (26); but there is no information suggesting that PSM can become insoluble. To understand this, we have to await further studies on the other gel-forming mucins to determine whether they follow the suggested assembly for MUC2 and PSM or that of vWF with linear polymers. We observed the secretion of only N-terminal trimers; and thus, it is less likely that forms of MUC2 other than N-terminal trimers will be secreted from normal cells. However, it cannot be excluded that the mucin domain that should follow the N terminus could interfere or modulate the formation of N-terminal trimers in vivo. During studies of the assembly of full-length MUC2 in LS174T cells, we observed that not only C-terminal dimers of MUC2 are transferred out of the endoplasmic reticulum and become glycosylated in the Golgi apparatus, but also O-glycosylated monomers were detected (11). The reason for this is not known, but could be due to some MUC2 escaping the dimerization process or a yet undefined proteolytic cleavage. As the dimerization in the C terminus is believed to be an endoplasmic reticulum-specific process, these monomers cannot be extended once they enter into the Golgi apparatus or other later parts of the secretory pathway.
The N terminus of these MUC2 monomers can most likely be incorporated into N-terminal trimers and could as such block further elongation. There is currently no proof of such a hypothetical model of mucin assembly, but regulation of the level of mucin monomers passing out of the endoplasmic reticulum could be one out of several ways to regulate mucin structure and thus its properties (11).
The three identical peptides that made up the trypsin-resistant core fragment were localized to the most C-terminal end of the MUC2 N terminus. This shows that the disulfide bonds holding the N terminus together must be localized in this part of the molecule. This is not a surprise, as this has been suggested for the vWF and PSM (4). Site-directed mutagenesis of recombinantly expressed PSM (6) suggests that Cys 1130 of MUC2 in the conserved sequence ECEWHY is one of the cysteines forming the intermolecular disulfide bonds. The localization of this cysteine in the core fragment is marked in Fig. 5. To allow for trimerization, there must be at least one additional intermolecular disulfide bond stabilizing the N-terminal trimer. The localization of this cysteine is not known today.
The molecular structure of the MUC2 N-terminal trimer as revealed by electron microscopy is consistent with the sizes of the domains and their potential function. The D1 and D2 domains are likely to harbor the disulfide bond isomerase activity catalyzing the trimerization. These domains are probably localized to the three large globular domains that were attached to the central core with a single flexible peptide. The precise localization of this peptide has not been determined, but it could be in the D′ domain or, maybe more likely, a part of the D3 domain, as this contains a long sequence (72 amino acids) lacking Cys (positions 917-988). The flexibility of this link could allow the enzymes to move and come close to their catalytic targets in the central core. This core, probably largely made up by the D3 domain, could be localized with two antibodies directed toward the recombinantly attached Myc tag and GFP. This core was revealed as a very dense trefoil-like structure, consistent with the localization of the trypsin-resistant core to this region. The apparent localization of the GFP and Myc tag to the stalk of the trefoil (Fig. 7) suggests that the first small mucin domain of MUC2 is localized here, close to the central core. That the flexible link between the large globular domains and the central core is susceptible to trypsin cleavage is understandable. The D1 and D2 domains can probably also be cleaved off under physiological circumstances as shown for the vWF (4) and MUC5B from salivary glands (26). However, no cleavage of the MUC2 N terminus was observed in CHO cells, nor has this been found for PSM expressed in COS cells. This may suggest that despite the flexible nature of this part, there is a need for specific enzymes present only in certain cells.
The native MUC2 mucin produced in intestinal cells, both in cell culture and in cells in situ, is also held together by additional non-disulfide intermolecular bonds (10,11). The nature of these bonds, as well as their location within the MUC2 mucin, is presently unknown. Reducing the SNMUC2-MG protein expressed in CHO cells gave only monomers; and thus, we have not been able to find any nonreducible linkage during these studies. The reason for this could be either that these bonds are formed in other parts of MUC2 or that necessary components for the formation of this type of linkage are missing in CHO cells.
The goblet cells of the small and large intestines are the normal site for MUC2 mucin biosynthesis. One of the main challenges of a digestive system is protection against digestive enzymes. The MUC2 mucin is the main intestinal mucin localized to the mucous gel covering the epithelial cells. This suggests that it has an important role in this protective system and that it must be able to withstand the action of pancreatic enzymes. How this works is easy to understand for the mucin domains, as the peptide core is protected by the dense O-glycosylation. However, it is even more important to protect the oligomeric nature of MUC2 as organized by the N and C termini. The present finding of a trypsin-resistant core fragment still held together as a trimer located in close proximity to the highly glycosylated domain and therefore able to maintain the oligomeric nature of the mucin must be physiologically very important. This proteolytic resistance is not maintained by a lack of amino acids where trypsin can cleave. The core fragment is resistant to cleavage by not only trypsin, but also by most proteolytic enzymes tested except subtilisin. Although no information is currently available on how this proteolytic resistance is maintained, the high number of intramolecular disulfide bonds must be crucial, as reduction makes the MUC2 N terminus just as degradable as other proteins. We have thus been able to express the MUC2 N terminus and to show that it forms trimers held together within a trypsin-resistant core fragment, suggesting that mucins will form branched networks and not simple linear polymers.