Human Mucin Genes MUC2, MUC3, MUC4, MUC5AC, MUC5B, and MUC6 Express Stable and Extremely Large mRNAs and Exhibit a Variable Length Polymorphism

Of the nine mucin genes that have been characterized, only MUC1 and MUC7 have been fully sequenced, and their transcripts can be detected as distinct bands of predicted size by Northern blot analysis. In contrast, the RNA patterns observed for each of the other MUC genes have usually shown a very high degree of polydispersity. This polydispersity has been believed to be one of the typical features of the mucin mRNAs, but until now, its origin has remained unexplained. In the work described in the present paper, we investigated two possible kinds of explanation for this phenomenon: namely that the extensive polydispersity results from a biological mechanism or that it is artifactual in origin. The data obtained, as a result of improving the purification and blotting methods, allowed us to show that in all of the tissues analyzed, each of the genes, MUC2–6, expresses mRNAs that are stable and are of an unusually large size to be found in eukaryotes (14–24 kilobases). Moreover, allelic variations in length of these mucin transcripts were observed. We demonstrate that these variations are directly related to the variable number of tandem repeat polymorphisms seen at the DNA level.

Mucins are a heterogeneous family of O-linked glycoproteins expressed in epithelial cells. These molecules, which are secreted or membrane-associated glycoproteins, are believed to play an important role in the protection and function of the epithelial surfaces.
Eight human mucin genes have been characterized and given MUC gene symbols numbered 1, 2, 3, 4, 5AC, 5B, 6, and 7 (1). More recently, a cDNA isolated from a bronchus library by Shankar et al. has been called MUC8 (2). Of these genes, MUC1 and MUC7 have been fully sequenced (3,5). MUC1 encodes PEM, an integral membrane mucin glycoprotein expressed on the apical surface of most simple secretory epithelia (3). Expression of this gene has been extensively studied in numerous cell lines and epithelial tissues. Two transcripts are usually detected by Northern blot analysis in most individuals when a tandem repeat probe is used, their sizes being estimated between 4 and 7 kb¹ (6,7). The presence of these two messenger RNAs is a result of allelic variation due to a variable number of tandem repeats (VNTR) per allele of a 20-amino acid motif (from 20 to 125 repeats) (3,4). The other MUC genes are believed to encode secreted mucins. The full-length cDNA sequence of MUC7 has been published recently (5). The MUC7 gene encodes a small secreted mucin expressed by submandibular and sublingual glands. MUC7 mRNA is 2.4 kb in length. This messenger appears as a distinct and moderately polymorphic band, most individuals exhibiting a unique allelic form corresponding to six repeats in their repetitive domains (5,8).
Sequencing of the genes MUC2–6 is at different stages of progress. The complete MUC2 cDNA sequence, spanning 15,563 base pairs, has been published (9). The MUC2 tandem repeat domain is polymorphic, and variations in length of the MUC2 mRNA might be expected among individuals; however, the MUC2 message is not limited to two allelic forms of one transcript but is very often polydisperse by Northern blot analysis (10–12). The MUC5B gene is entirely cloned and spans approximately 40 kb. The central exon of this gene, encoding the whole tandem repeat domain, was completely sequenced recently and is 10,713 base pairs in length (13). MUC5B shows the same polydisperse signal on Northern blots as the MUC2 probe (14). The complete genomic organization of the 3'-region of MUC5B was published very recently (15). No other precise information is available for any of these genes on the genomic organization, the existence of alternative splicing, or the number and the length of their messenger RNAs. In our previous studies, as mentioned for MUC2, we reported smears or very wide bands on Northern blot analyses of the three mucin genes we cloned: MUC4 (16), MUC5AC (17), and MUC5B (14). This has also been reported for MUC3 and MUC6 by the group who cloned them (12,18). A large number of investigators, using other partial mucin cDNA probes (11, 19–21) or comparing human mucin gene expression in various mucosae (healthy and pathological) and in cell lines (22–29), have also observed the same polydisperse pattern. Moreover, similar smeared signals have been obtained with animal mucin probes (30–36). Many reports on mucin transcripts have emphasized the polydispersity of the signal observed by Northern blot analyses, and this polydispersity has been considered as one of the typical characteristics of the mucin mRNAs.
Two kinds of hypotheses have been proposed to explain these polydisperse signals: those that propose a biological origin due for example to rapid turnover, the existence of several related genes encoding similar transcripts varying in length, or the occurrence of alternative splicing; and those that suggest an artifactual origin due to specific properties of the mucin transcripts such as their unusual length and thus sensitivity to mechanical damage or a particular conformation due to a large tandem repeat domain that perturbs the migration of the molecules in electric fields.
To elucidate this phenomenon, we chose three different approaches. (i) We determined the half-life of secreted mucin transcripts (MUC2, MUC3, MUC5AC, and MUC5B) expressed by the HT29-MTX cell line cultured in the presence of actinomycin D with or without cycloheximide and compared it with that determined for MUC1. (ii) We studied the heterogeneity of mucin messenger RNAs isolated from polysome preparations from the HT29-MTX cell line and from human colonic mucosa and compared the patterns given by nuclear, cytoplasmic, and total RNA preparations by Northern blot analysis. (iii) We explored the hypothesis that conventional methods of mRNA purification are not adapted to the study of very large mucin messenger RNAs, and we developed a modified protocol for the extraction and purification of mucin transcripts with the aim of reducing mechanical degradation. Moreover, we improved the blotting method and investigated the efficiency of poly(A)+ selection.
The data obtained allowed us to conclude that all the tissues analyzed express MUC mRNAs that are stable and of unusually large size to be found in eukaryotes (14–24 kb) and in most cases show allelic length variations. We demonstrate that these variations are directly related to the variable number of tandem repeat polymorphisms seen at the level of the DNA.

Study of the Half-life of Mucin mRNAs Expressed by HT29-MTX Cell Line-Parental HT29 cells selected by adaptation to 10⁻⁵ M methotrexate (HT29-MTX) were obtained from Drs. T. Lesuffleur and A. Zweibaum (Unité INSERM 178, Villejuif, France) (37,38). Cells were grown in Dulbecco's modified Eagle's minimum essential medium supplemented with 10% inactivated bovine fetal serum. 1.5 × 10⁶ cells were seeded in 75-cm² flasks (Corning Glassworks, Corning, NY) and grown at 37°C in 10% CO₂, 90% air atmosphere. The medium was changed daily. Cells were maintained in culture until day 16, before the addition of 8 µg/ml actinomycin D (Boehringer Mannheim). In some experiments, protein synthesis was blocked by the addition of 10 µg/ml cycloheximide (Sigma) 2 h before the addition of actinomycin D (39).
Total cellular RNA from HT29-MTX cells was extracted using the guanidinium isothiocyanate-cesium chloride ultracentrifugation method as described previously (40). Total RNA was analyzed by Northern blot and by dot blot using previously described technical procedures (15,41). Total RNA was hybridized with the following repetitive mucin probes: MUC2 (10), MUC3 (18), MUC5B (14), and MUC5AC (42). A MYC probe was used as a control of actinomycin D activity (43), and a 28 S rRNA probe was used as a quantitative reference marker for measurement of mucin mRNA half-life. After the hybridization and washing steps, the intensity of the radioactive signal was analyzed by using the PhosphorImager 445SI (Molecular Dynamics Inc., Sunnyvale, CA) and the ImageQuant™ program. Hybridization of the samples with 28 S rRNA was used as a control for the amount of total RNA analyzed in each assay. The relative mucin mRNA/28 S rRNA levels were expressed as the percentage of the corresponding value measured in untreated cells (100% at 0 h) and plotted against the time (hours) after Act D treatment. Between the successive hybridization steps, removal of the probe was achieved by two washings for 15 min in 0.1% SDS at 90°C. Complete removal of each probe was checked by using the PhosphorImager.
Normal Tissues-Normal tissues were obtained from uninvolved areas of surgical specimens; immediately following surgery, they were snap-frozen in liquid nitrogen and stored in liquid nitrogen until used. Tissue specimens were collected in accordance with the requirements of the local ethical committee.
Isolation of Nuclear, Polysomal, and Cytoplasmic RNAs-We adapted a protocol that combined two procedures published previously (44,45), which we applied to tissues (~0.5 g) or to cells (10⁸ cells). Solid tissue was taken from liquid nitrogen storage, whereas the HT29-MTX cells were harvested at day 16 and used immediately for polysome isolation (two 75-cm² flasks). To minimize RNase contamination, all glassware was heated at 180°C for 2 h, and all buffers were prepared using sterile 0.1% diethylpyrocarbonate-treated water and were autoclaved, except for the magnesium acetate stock solution, which was sterilized by filtration before use. All experiments were performed at 4°C. The biological material was homogenized in liquid nitrogen in 10 ml of a buffer containing 20 mM Tris-HCl, pH 7.6, 100 mM KCl, 5 mM magnesium acetate, 10 mM 2-mercaptoethanol, 2 mM dithiothreitol, 1 unit/µl RNasin, and 0.25 M sucrose. The homogenate was gently thawed and disrupted with 10 strokes of a Potter-Elvehjem homogenizer. Cell lysis was monitored by using trypan blue dye. The mixture was centrifuged at 10,000 × g for 10 min at 4°C, and a pellet of nuclei was thus obtained. The supernatant was carefully and gently removed and layered over 3 ml of a 1 M sucrose solution containing 20 mM Tris-HCl, pH 7.6, 100 mM KCl, 5 mM magnesium acetate, 10 mM 2-mercaptoethanol, 2 mM dithiothreitol, 0.1 unit/µl RNasin. Ultracentrifugation was performed at 38,000 rpm in a Beckman SW41 rotor for 3 h at 4°C, resulting in a polysomal fraction (pellet) and a cytoplasmic fraction (supernatant). The three different fractions containing nuclear, polysomal, and cytoplasmic RNAs, respectively, were resuspended in guanidinium isothiocyanate solution and stored at -80°C. Purification of RNA was performed from these three fractions as described above.
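The ultracentrifugation step above is given as a rotor speed rather than a relative centrifugal force. As a rough guide for readers using a different rotor, the conversion can be sketched as follows. The maximum radius used for the SW41 rotor (15.31 cm) is an assumption taken from commonly quoted manufacturer specifications, not a value stated in this paper, and the function name is illustrative only.

```python
# Convert rotor speed (rpm) to relative centrifugal force (x g).
# RCF = 1.118e-5 * r * rpm^2, with r in centimetres (standard formula).

SW41_R_MAX_CM = 15.31  # assumed maximum radius of the SW41 rotor, not from the paper


def relative_centrifugal_force(rpm: float, radius_cm: float) -> float:
    """Return the relative centrifugal force (multiples of g) at a given radius."""
    return 1.118e-5 * radius_cm * rpm ** 2


# Polysome pelleting step described above: 38,000 rpm.
polysome_rcf = relative_centrifugal_force(38_000, SW41_R_MAX_CM)
print(f"38,000 rpm at r_max: about {polysome_rcf:,.0f} x g")
```

The value refers to the bottom of the tube, where the polysomal pellet forms; the force at the top of the sucrose cushion is lower because the radius is smaller.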
Improved Method for the Isolation and Transfer of Large RNAs-Large RNAs were extracted from different human tissues (bronchus, antrum, fundus, colon) using an improved method we developed, derived from the guanidinium isothiocyanate protocol (40) as follows.
Guanidinium isothiocyanate buffer (4 M) was prepared by dissolving 23.6 g of guanidinium isothiocyanate, 73.5 mg of sodium citrate, and 250 mg of sodium N-lauroylsarcosine in 50 ml of water, treated with 0.1% diethylpyrocarbonate, and it was then autoclaved. 2-Mercaptoethanol was added to 100 mM before use. The cesium chloride cushion (5.7 M) was prepared by dissolving 24 g of cesium chloride in 25 ml of 0.1 M EDTA, pH 7.5, treated with 0.1% diethylpyrocarbonate, and then autoclaved.
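The nominal concentrations quoted for these two solutions can be checked from the weighed masses. The sketch below assumes standard molar masses (118.16 g/mol for guanidinium isothiocyanate, 168.36 g/mol for cesium chloride), which are not given in the paper, and treats the stated solvent volume as the final volume, as the nominal molarities imply.

```python
# Nominal concentration check for the extraction buffer and the CsCl cushion.
# Molar masses are standard values and are an assumption of this sketch.
MW_GUANIDINIUM_ISOTHIOCYANATE = 118.16  # g/mol
MW_CESIUM_CHLORIDE = 168.36             # g/mol


def molarity(mass_g: float, molar_mass: float, volume_ml: float) -> float:
    """Return mol/l for a solute mass dissolved in a nominal volume."""
    return (mass_g / molar_mass) / (volume_ml / 1000.0)


def percent_w_v(mass_g: float, volume_ml: float) -> float:
    """Return grams of solute per 100 ml."""
    return 100.0 * mass_g / volume_ml


gitc_molar = molarity(23.6, MW_GUANIDINIUM_ISOTHIOCYANATE, 50.0)
cscl_molar = molarity(24.0, MW_CESIUM_CHLORIDE, 25.0)
sarcosine_percent = percent_w_v(0.250, 50.0)

print(f"Guanidinium isothiocyanate: {gitc_molar:.2f} M")
print(f"Cesium chloride cushion:    {cscl_molar:.2f} M")
print(f"N-lauroylsarcosine:         {sarcosine_percent:.2f}% (w/v)")
```

Both results agree with the 4 M and 5.7 M figures quoted in the text to within rounding.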
Tissue (optimal weight of 1 g) was ground to a fine powder in a mortar and pestle in liquid nitrogen and mixed with 10 ml of guanidinium isothiocyanate buffer, still in liquid nitrogen. The homogenate was then allowed to thaw gradually at room temperature, during which time the guanidinium isothiocyanate and 2-mercaptoethanol efficiently solubilized the tissue mixture. The homogenate obtained was then transferred onto 3.2 ml of 5.7 M cesium chloride cushion followed by ultracentrifugation for 16 h at 29,500 rpm in a Beckman SW41 rotor. The supernatant was removed, and the pellet of total RNA was very carefully resuspended in 0.1 × Tris-EDTA buffer, pH 8.0 (10 mM Tris-HCl, pH 8.0, and 1 mM EDTA, pH 8.0), 0.1% SDS by using wide mouth pipettes and avoiding all shear forces. RNA was purified by the addition of two volumes of chloroform/n-butyl alcohol (4:1) mixture; the tubes were mixed by gentle inversion, and vortexing was strictly avoided. The top aqueous phase, which contained the RNA, was carefully removed with wide mouth pipettes and transferred to a fresh tube, and the RNA was precipitated by adding 0.1 volume of 3 M sodium acetate, pH 5.5, and 2.5 volumes of ethanol, at -80°C for 15 h. After centrifugation at 10,000 × g for 30 min at 4°C, the pellet of RNA was washed with ice-cold 95% ethanol and then 100% ethanol, centrifuged as above, and dried in a vacuum desiccator. The RNA pellet was very carefully resuspended in diethylpyrocarbonate-treated water and quantified by measuring the A₂₆₀ of an aliquot, and the final preparation was stored at -80°C. Extreme care was taken at each step during the experiments to prevent any risk of mechanical degradation. Homogenization, the use of syringes, and vortexing were strictly avoided.
RNA samples (the optimal quantity was 10 µg) were denatured in loading buffer containing 50% formamide, 18% deionized formaldehyde, and 0.02 M MOPS; heated at 68°C for 10 min; and then fractionated through a 0.9% agarose denaturing gel (13 × 18 × 0.3 cm) containing 18% formaldehyde and 0.02 M MOPS. The gel running buffer was 0.02 M MOPS, pH 7.0. The electrophoresis was performed for 16 h at 30 V. Prior to transfer, the gel was soaked for exactly 20 min in 0.05 N NaOH (this time is optimal for a 3-mm-thick gel). The gel was then rinsed in RNase-free water and soaked for 45 min in 20 × SSC (46). The RNA was transferred onto Hybond™-N+ membrane (Amersham Corp.) via vacuum blotting (1 h in 20 × SSC) and immobilized by baking at 80°C in a vacuum for 30 min and by exposure to UV light for 4 min. Standard RNA markers were obtained from Life Technologies, Inc. and from Boehringer Mannheim (Germany). The apoB-100 DNA fragment cloned in pUC9 was purchased from ATCC (number 57699) and used as a probe.
Southern Blot-For polymorphism studies, DNA and RNA were prepared from the same tissue specimens. DNA from three individuals was prepared from colon mucosae. Tissues were ground to a fine powder in a mortar and pestle in liquid nitrogen, mixed with 40 ml of lysis buffer (10 mM Tris-HCl, pH 7.4, 10 mM EDTA), and then purified as described by Jeanpierre (49).
DNA was digested with the following restriction endonucleases: EcoRI/PstI, BglII, and HinfI, according to the manufacturer's protocol (Boehringer Mannheim). Fragments were separated by electrophoresis in 1% (w/v) agarose gels and then transferred onto Hybond™-N+ membranes by vacuum blotting and immobilized by exposure to UV light for 3 min. Prehybridization and hybridization were performed at 65°C in 6 × SSC, 5 × Denhardt's solution, 0.5% SDS, 250 µg/ml sheared salmon sperm DNA, 10% dextran sulfate buffer; the final stringent wash was carried out with 2 × SSC, 0.1% SDS (w/v) buffer for 30 min at 65°C.
Poly(A)+ RNA Isolation-Two methods were used: poly(A)+ RNAs were selected either by affinity chromatography on oligo(dT)-cellulose (Pharmacia Biotech Inc.) using standard protocols (50) or by using the Poly(A)tract® mRNA Isolation System obtained from Promega according to the manufacturer's protocol.

Study of the Half-life of Mucin mRNAs in Act D-treated HT29-MTX Cells-To test whether a rapid turnover of mucin mRNA could account for the polydisperse pattern shown on Northern blots, we blocked the synthesis of mRNA using actinomycin D (Act D), which is a transcription inhibitor. HT29-MTX cells were harvested at various times from 3 to 24 h after Act D treatment, and total RNA was extracted according to Ref. 40. The RNA transcripts in each sample were analyzed by Northern blotting and hybridized with probes for MUC2, MUC3, MUC5AC, and MUC5B. Several experiments were performed using different times of treatment with Act D. In the first experiment, the treatment time with Act D was 3 and 9 h. For RNA from cells cultured without Act D (T), the hybridization signal appeared in most cases as a band with a smear as described previously (51). After 3 and 9 h of contact with Act D, there was a general decrease in intensity of both the band and the smear detected with all of the probes, reflecting inhibition of transcription. MUC5AC is shown as an example in Fig. 1. The persistence of the hybridization signal of MUC2, MUC3, MUC5AC, and MUC5B mRNA at high levels in the presence of Act D at 9 h indicates that mucin transcripts in these cells are not particularly unstable and that the polydisperse pattern cannot be explained by a rapid turnover. The disappearance of MYC mRNA was a control of Act D efficiency (Fig. 1) (43).
To confirm this result and to estimate the half-life of mucin mRNAs, other experiments were performed adding longer incubation times with Act D: 3, 6, 9, 16, and 24 h. MUC2, MUC3, MUC5AC, and MUC5B messenger RNAs expressed in the HT29-MTX cell line were quantified by dot-blot analysis using 28 S rRNA as a control for the amount of total RNA analyzed in each assay. For each probe and for each incubation time with Act D, dot-blot analysis was performed four times, and the mean of these four values was calculated. The mucin mRNA/28 S rRNA levels were expressed as a percentage of the corresponding value measured in untreated cells (100% in T) and plotted against the time (hours) after Act D treatment (Fig. 2A). Very similar kinetics of the decrease of mucin mRNAs were observed with all of the probes. The half-life was estimated to be 13 h for the MUC2 and MUC3 messengers and 16 h for the MUC5AC and MUC5B messengers; however, these values must be considered relatively rough estimates because of the cytotoxicity during prolonged treatment.
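The paper does not state how the half-life was derived from the plotted values. One common approach, shown here only as a sketch under the assumption of first-order decay, is to normalize each mucin signal to its 28 S rRNA signal, express it as a percentage of the untreated value, and fit a straight line to the logarithm of the percentages against time. The signal values below are synthetic, generated from a 13-h half-life purely to illustrate the calculation; they are not data from this study.

```python
import math


def normalize_to_untreated(mucin_signal, rrna_signal):
    """Mucin/28 S rRNA ratio at each time point, as % of the first (untreated) point."""
    ratios = [m / r for m, r in zip(mucin_signal, rrna_signal)]
    return [100.0 * ratio / ratios[0] for ratio in ratios]


def fit_half_life(times_h, percent_remaining):
    """Least-squares fit of ln(percent) against time; returns (half-life, decay constant)."""
    log_values = [math.log(p) for p in percent_remaining]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(log_values) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, log_values))
    decay_constant = -sxy / sxx          # per hour
    return math.log(2) / decay_constant, decay_constant


# Time points used in the second series of experiments (0 h = untreated cells).
times_h = [0, 3, 6, 9, 16, 24]
# Synthetic signals (arbitrary units), assuming an ideal 13-h half-life.
mucin_signal = [1000.0 * 0.5 ** (t / 13.0) for t in times_h]
rrna_signal = [500.0] * len(times_h)

percent_remaining = normalize_to_untreated(mucin_signal, rrna_signal)
half_life_h, decay_constant = fit_half_life(times_h, percent_remaining)
print(f"Estimated half-life: {half_life_h:.1f} h")
```

With real dot-blot data the points scatter around the fitted line, and, as noted above, late time points are affected by Act D cytotoxicity, so the fitted value is best read as an approximation.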
The same Northern blot and dot blot were also analyzed with a MUC1 probe. On Northern blot, MUC1 transcripts appeared as two bands of 4 and 6.5 kb. A significant and identical decrease in the intensity of the two transcripts was observed after 9 h of incubation with Act D (data not shown). The half-life of the two MUC1 messengers was calculated to be 17 h (Fig. 2B).
When HT29-MTX cells were grown in the same conditions but in the presence of Act D together with cycloheximide, which is a translation inhibitor, similar kinetics of mucin mRNA decrease were observed (by Northern blotting and by dot blotting) for each MUC probe, and approximately the same half-life values were calculated (Fig. 2, A and B).
Isolation of Nuclear, Polysomal, and Cytoplasmic RNAs-We isolated RNAs from human colon mucosa from three subcellular fractions (RNA from the nucleus, RNA specifically associated with polysomes, and RNA from cytoplasm), and we compared their patterns by Northern blot analysis. Analysis of these three fractions was aimed at discriminating subpopulations of mRNAs, namely newly translated RNA (associated with polysomes), pre-mRNA (found in the nucleus), and degraded RNA (found in the cytoplasm). The RNAs isolated from these three fractions and the corresponding total RNA were analyzed by Northern blotting (Fig. 3). The expression pattern of MUC2 was compared in the four preparations, the β-actin probe being used as a control of RNA integrity. The same polydisperse signal was observed in the total RNA fraction as well as in each of the three subfractions: nuclear, polysomal, and cytoplasmic. No variations in the distribution of mucin mRNAs in the smear were observed, indicating that the different fractions of RNA cannot be distinguished by length, i.e. RNAs that have the ability to be translated and are therefore physiologically important were no longer than the mRNA species in the cytoplasm, and the pre-mRNA in the nucleus was also not different. The same results were obtained with HT29-MTX cells (data not shown).
These results indicate that a biological mechanism is not responsible for the polydisperse pattern of mucin mRNAs.
Improved Method for the Isolation and Transfer of Large RNAs-Total RNA was prepared from human colon mucosa using the modifications we made to the original guanidinium isothiocyanate-ultracentrifugation method. These modifications mainly involved taking extreme care to prevent all risk of mechanical degradation by shear forces. The efficiency of this improved method for RNA isolation, compared with the original protocol, is illustrated in Fig. 4, A and B. Using our method, the Northern blot hybridized here with the MUC2 probe exhibits one distinct band (Fig. 4B). The size of this transcript is larger than 10 kb. The intensity of the smear was significantly decreased.
The efficiency of the transfer of large mRNAs was also improved; two different pretransfer treatments of the gel were compared. The first method, which had previously been used routinely, included three rinses in 10 × SSC (3 × 10 min) and one rinse in 20 × SSC (10 min). In the second method, the gel was soaked in 0.05 N NaOH (for a 3-mm-thick gel, the optimal signal was obtained after treatment for 20 min) prior to rinsing in RNase-free water and soaking for 45 min in 20 × SSC (46). Aliquots (10 µg) of the same RNA sample prepared using our improved method (Fig. 4, B and C) were used. The intensity of the RNA band was significantly higher when the gel was soaked in 0.05 N NaOH before transfer. Transfer efficiency was increased approximately 10-fold. Moreover, only a very weak smear was observed.
Comparison with Another Large mRNA, ApoB-100-To confirm this result and to determine whether the degradation is an exclusive feature of mucin mRNA or affects other large messenger RNAs, we compared the patterns obtained with mucin probes and with the apoB-100 probe. The apoB-100 cDNA has been entirely sequenced; the apoB-100 transcript is 14.1 kb long (52–54). We compared two samples of RNA prepared from the same tissue specimen (small intestine) using the original guanidinium isothiocyanate ultracentrifugation protocol and our improved method. Each Northern blot was hybridized with MUC3 and apoB-100 probes. Using the guanidinium isothiocyanate ultracentrifugation protocol, a smear was obtained with the MUC3 probe as expected, and a polydisperse pattern was also obtained with the apoB-100 probe (Fig. 5A). Using our improved method (purification and transfer), distinct bands were obtained with both the MUC3 and apoB-100 probes (Fig. 5B).
Transcripts Expressed by the Mucin Genes MUC2 to MUC6-To determine the number of the transcripts expressed by each of the mucin genes, MUC2 to MUC6, preparations of RNA from human mucosae (bronchus, antrum, fundus, and colon) available from seven individuals were analyzed by Northern blotting using the improved RNA purification method and introducing the treatment with 0.05 N NaOH before the transfer. All of the mucin probes (MUC2 to MUC6) revealed either a single band or a double band where both bands had the same intensity. Moreover, the size of the band(s) can vary between individuals (Fig. 6). On the other hand, when the same specimens were extracted and purified using other methods (described under "Experimental Procedures"), smears were always obtained on Northern blots (data not shown).

FIG. 4. Comparison of the two methods for RNA purification (A and B) and for RNA transfer (B and C).
Total RNA (from the same human colon sample) was isolated by the original guanidinium isothiocyanate-ultracentrifugation protocol (A) or by our improved method (B) and transferred without treatment in 0.05 N NaOH. The same RNA sample used in B was transferred with treatment in 0.05 N NaOH (C). The three electrophoreses were performed in the same conditions, and the Northern blots were hybridized with the MUC2 and β-actin probes. Autoradiography was performed for 10 h.

FIG. 5. Comparison between Northern blot patterns obtained with the MUC3 and the apoB-100 probes, using the two different methods for RNA purification.
Total RNA (from the same tissue specimen) was prepared using the original guanidinium isothiocyanate-ultracentrifugation protocol (A) or using our improved method (B).

The standard RNA molecular size markers used ranged from 0.24 to 9.49 kb. All MUC2–6 transcripts were much larger than 9.49 kb. To determine whether our standard curve could be meaningfully extrapolated, we used the apoB-100 probe, which gave a positive signal on Northern blots of colonic RNA, but this overestimated the size of the apoB-100 mRNA, giving a value of 22 rather than 14 kb, showing that the use of standard RNA molecular size markers is misleading in the case of very large RNAs. So, we used another standard curve using three points corresponding to apoB-100, 28 S rRNA, and β-actin signals. These points were not on a perfect straight line, so they were joined to form a nonlinear curve (Fig. 7), and where necessary (up to 14 kb), we used as an additional point MUC5B, which has been shown to be 17.6 kb in length,² is expressed in colon, and does not show common variation.³ Under these conditions, the sizes of mucin mRNAs were estimated from 14 to 24 kb. The results are shown in Table I.
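The construction of the size curve can be illustrated with a short sketch. It joins the internal standards point to point on a log10(size) scale, which is one simple way to obtain a nonlinear curve of the kind described; the paper does not specify how the points were joined. The migration distances, and the sizes assumed for 28 S rRNA (about 5.0 kb) and β-actin mRNA (about 2.0 kb), are hypothetical values chosen for illustration. Only the apoB-100 (14.1 kb) and MUC5B (17.6 kb) sizes come from the text.

```python
import math


def estimate_size_kb(distance_mm, standards):
    """Estimate transcript size by piecewise-linear interpolation of log10(size)
    against migration distance. Outside the range of the standards, the nearest
    segment is extended, so such values are extrapolations.
    `standards` is a list of (distance_mm, size_kb) pairs with at least two entries.
    """
    points = sorted(standards)
    for (d0, s0), (d1, s1) in zip(points, points[1:]):
        if distance_mm <= d1:
            break  # segment found (or first segment, if before all standards)
    # If the loop ends without a break, the last segment is used.
    slope = (math.log10(s1) - math.log10(s0)) / (d1 - d0)
    return 10 ** (math.log10(s0) + slope * (distance_mm - d0))


# (migration distance in mm, size in kb); all distances are hypothetical.
internal_standards = [
    (20.0, 17.6),  # MUC5B, size from the text
    (26.0, 14.1),  # apoB-100, size from the text
    (60.0, 5.0),   # 28 S rRNA, assumed size
    (95.0, 2.0),   # beta-actin mRNA, assumed size
]

between_standards = estimate_size_kb(23.0, internal_standards)
beyond_standards = estimate_size_kb(15.0, internal_standards)
print(f"Band at 23 mm: about {between_standards:.1f} kb")
print(f"Band at 15 mm: about {beyond_standards:.1f} kb (extrapolated)")
```

Any band running above the largest standard is sized by extrapolation, which is why the largest transcripts can only be given approximate sizes.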
Polymorphism Studies: Comparison between Southern Blot and Northern Blot Patterns-With mucin probes MUC2, MUC3, MUC4, MUC5AC, MUC5B, and MUC6, a single band or a double band was observed, depending on the individuals (Fig. 6). When two bands were present, they appeared with the same intensity, suggesting that two alleles are co-expressed. Moreover, RNA sizes vary among the individuals, except for MUC5B. To demonstrate that a double band corresponds to the expression of the two alleles and a single band to two RNAs with similar sizes, we compared the RNA sizes on Northern blot with the size of a fragment encompassing the genomic tandem repeat domain on Southern blot. It was possible to perform this study on the MUC2, MUC4, and MUC5B genes for which partial genomic organization is known. We selected restriction endonucleases with recognition sites on either side of the tandem repeat domains: HinfI for MUC2 (55), EcoRI/PstI for MUC4 (56), and BglII for MUC5B (13). To evaluate the DNA polymorphism of the tandem repeat region, DNA from three of the seven individuals was digested with these four enzymes, analyzed by electrophoresis, and hybridized with the corresponding repetitive mucin probes: MUC2, MUC4, and MUC5B (Fig. 8A).
In the case of MUC4, four distinct alleles were identified for the three individuals tested, either at the DNA or at the RNA level. The allele sizes of the tandem repeat domain range from 9 to 19 kb, and the corresponding RNA sizes range from 16.5 to 24 kb. There is a notable similarity between Southern (Fig. 8A) and Northern blot (Fig. 8B) [...] tandem repeats, had the smallest transcript, and conversely, individual number 6, exhibiting an allele encompassing the greatest number of tandem repeats, had the largest transcript (Fig. 8, A and B). Nevertheless, because of the lack of RNA markers larger than 14.1 kb (apoB-100), it was not possible to determine the precise size of the largest MUC4 transcript. Four distinct alleles were also identified for the three individuals for the MUC2 gene, and the same similarity was observed between the two patterns. The MUC5B gene had the distinction that only one allele was present in the three individuals studied.
In all cases, the sizes of mRNAs correlated with the size of the DNA restriction fragments containing the VNTR domain. Thus, the mucin gene polymorphism was reflected at the RNA level, and expression of alleles is co-dominant.
Study of the Efficiency of Large Poly(A)+ RNA Selection-The efficiency of poly(A)+ RNA selection in the case of large RNAs has not yet been investigated in the literature. This last step of purification, usually recommended prior to Northern blot, has been studied. An mRNA purification kit purchased from Promega was used, and the original protocol using selection by an oligo(dT)-cellulose column was also tested. Total RNA prepared using our improved method and corresponding poly(A)+ RNA was analyzed by Northern blot (Fig. 9). The yield of purification of MUC4 mRNA was estimated by comparing the intensity of the hybridization signals obtained with the β-actin probe. Whatever the protocol of purification, the yield of poly(A)+ selection was very low for mucin messengers when compared with smaller messengers such as the β-actin. Moreover, in some cases, total RNA showed two distinct bands, but in the corresponding poly(A)+ fraction, only the smallest transcript was selected.

DISCUSSION

Since the initial observation in the case of the MUC2 gene, it has been a common characteristic of genes encoding secreted mucins (MUC2, MUC3, MUC4, MUC5AC, MUC5B, and MUC6) that the pattern observed by Northern blot analysis is in the form of a smear or very wide bands, suggesting heterogeneity of transcripts. In the literature, other examples of polydisperse patterns have been described that are of biological origin, for example the rat insulin-like growth factor I (57) and the apopolysialoglycoprotein of rainbow trout eggs (58). The insulin-like growth factor I gene is transcribed and processed in a complex manner to produce a large number of mature mRNAs (from 0.8 to 7.5 kb). These mRNAs result from multiple transcription start sites, the use of different polyadenylation sites, and alternative splicing. The multiple apopolysialoglycoprotein mRNAs (from 0.8 to 6 kb) are transcribed from multiple genes containing divergent numbers of exact 39-base pair tandem repeats.
On the contrary, other examples of polydisperse patterns have been attributed to artifactual causes; messenger RNAs encoding human titin (23 kb) (59), apoB-100 (14.1 kb) (60), and dystrophin (14 kb) (61) each show one band followed by a weak smear on Northern blot analyses.
Initially, the presence of a polydisperse signal for mucin mRNAs was considered to be an artifact due to bad preparation of RNA, but rapidly this polydispersity became considered as an original and inherent feature of secreted mucins, because the hybridization of the same blots with a control probe (such as GAPDH or ␤-actin) showed a single band of the expected size, suggesting that the RNA was not degraded. Moreover, smears were also obtained using commercial mRNA preparations and Northern blots. The origin of this polydisperse pattern was unknown. Two kinds of hypotheses have been proposed in the literature to explain these polydisperse signals: those that propose a biological origin (such as rapid turnover, several related genes encoding similar transcripts varying in length, or alternative splicing); and those that suggest an artifactual polydispersity due to specific properties of the mucin mRNAs (such as their unusual sensitivity to mechanical damage related to their large size or the occurrence of a particular conformation due to the large tandem repeat domains, which perturbs the migration of molecules in electric fields). In this work, different experiments were carried out to explore the hypothesis of a rapid turnover by studying the half-life of the mucin mRNAs in HT29-MTX cells and by isolating subpopulations of RNAs from nucleus, polysomes, and cytoplasm. Subsequently, the susceptibility to mechanical degradation of the mucin transcripts during preparation was studied by adapting a protocol suitable for the preparation of large RNAs and by increasing the efficiency of transfer onto membrane.
The colon adenocarcinoma cell line HT29 adapted to methotrexate (HT29-MTX) that expresses the five mucin genes MUC1, MUC2, MUC3, MUC5AC, and MUC5B was chosen to study the decay of mucin mRNAs in HT29-MTX cells treated with Act D with the aim of following the pathway of degrada-  tion of the mRNAs and of estimating their half-life. Degradation of mucin transcripts proceeds by a slow and progressive decrease of all of the messenger RNAs, and the half-lives were all clearly long. Drugs such as Act D, which inhibit transcription, are commonly used to determine message half-life and this works well for messages with short half-lives. However, the long half-lives determined for the MUC genes (13 h for MUC2 and MUC3, 16 h for MUC5AC and MUC5B, and 17 h for MUC1) must be considered as cautious estimates, since prolonged treatment with Act D can inhibit metabolic processes other than transcription (62).
We also compared the half-life of the mRNA of a mucin gene exhibiting defined transcripts (MUC1) with that of mucin genes showing polydisperse RNA patterns, and we cultured the cells in the presence of both Act D and cycloheximide. Since the mRNA decay showed similar kinetics in each case, we concluded that de novo synthesis of a regulatory protein was not involved in the generation of a polydisperse pattern of mucin messenger RNAs. These results indicate that in HT29-MTX cells the half-lives of the mRNAs encoding both membrane-associated and secreted mucins are very close to the average mRNA half-life, which is 10–20 h in eukaryotic cells (62). Thus, rapid turnover of mucin mRNAs could be ruled out as an explanation of the polydisperse pattern.
Isolated nuclear, polysomal, and cytoplasmic fractions exhibited the same polydisperse pattern of their mucin mRNAs, and it was not possible to discriminate, by size, the subpopulations of RNA that were translated, degraded, or immature. This result is in complete agreement with the study of the RNA turnover, i.e. the mucin transcripts are stable. We therefore decided to investigate an artifactual origin.
We chose to improve and adapt the original guanidinium isothiocyanate method for the isolation of large RNAs. This technique is recognized for the purity and quality of the RNA obtained. The guanidinium isothiocyanate and 2-mercaptoethanol mixture combines the strong denaturing characteristic of guanidine with the chaotropic action of isothiocyanate and efficiently solubilizes tissue homogenates. Effective disruption can be obtained without the use of a homogenizer, which has otherwise been used routinely. Great care was taken to prevent mechanical degradation. Using this modified method, we were able to isolate large and intact mucin mRNAs from various human tissues (bronchus, fundus, antrum, and colon). Moreover, the RNA signal was considerably stronger when, prior to transfer, the gel was soaked in NaOH. This treatment partially hydrolyzes the RNA and improves the efficiency of the transfer of large RNA and thus the detection of the signal. Hence, we demonstrated that the smear observed is due to partial degradation of very large mucin RNAs during purification and is exaggerated because the degradation products transfer more efficiently than the intact RNAs. We used the apoB-100 probe to confirm these results, because the apoB-100 gene is known to encode a large transcript, the cDNA being fully sequenced (14.1 kb), and it is expressed in the small intestine and in the colon. The same kind of pattern was observed with the mucin and apoB-100 probes whatever the method used for RNA preparation. We concluded that degradation of RNA during purification is not a specific feature of mucin RNAs but occurs with other large messenger RNAs. We can therefore conclude that demonstrating the integrity of an RNA preparation using a probe such as β-actin, GAPDH, or 28S rRNA is misleading in the case of large RNAs, because these probes hybridize to short messages, for which the degradation problem is not encountered.
Moreover, standard RNA molecular size markers are not suited to determining large sizes. We therefore recommend the use of apoB-100 and MUC5B for these two purposes (verifying RNA integrity and sizing large transcripts). Nevertheless, precise determination of sizes larger than 17.6 kb remains difficult because of the lack of RNA markers. Moreover, the use of large DNA fragments as size markers is inappropriate, because DNA and RNA molecules do not behave in the same way in denaturing gels. In the future, once it has been precisely sized after sequencing, it will also be possible to use the largest MUC4 allele of 24 kb in Figs. 6 and 7 (which is common) as a size standard (since it will be present in many of the samples and can be demonstrated by Southern blot analysis). We also studied the isolation of poly(A+) RNA, and we concluded that selection of poly(A+) RNA is not recommended in the case of large mRNAs because of a very poor yield and because of additional risks of mechanical damage. Moreover, the preferential loss of the largest transcript observed in our experiment suggests that there is a risk of misinterpretation of the data.
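The use of two large transcripts as size standards can be sketched as follows. The example assumes the common approximation that log10(size) varies linearly with migration distance, which is only roughly valid for very large molecules. It also assumes that the 17.6 kb upper limit quoted above corresponds to the MUC5B marker. The function name and the migration distances are hypothetical and serve only to show the calculation.

```python
import math

def size_from_migration(distance_mm, markers):
    """Estimate transcript size (kb) from migration distance.

    Uses a straight line through log10(size) versus distance defined by
    two markers, each given as (distance_mm, size_kb).
    """
    (d1, s1), (d2, s2) = markers
    slope = (math.log10(s2) - math.log10(s1)) / (d2 - d1)
    return 10 ** (math.log10(s1) + slope * (distance_mm - d1))

# Hypothetical migration distances; sizes are 14.1 kb (apoB-100) and
# 17.6 kb (assumed here to be the MUC5B marker).
markers = [(30.0, 14.1), (24.0, 17.6)]
unknown_kb = size_from_migration(27.0, markers)
print(round(unknown_kb, 2))
```

Bands that migrate less far than the upper marker can only be sized by extrapolation, which is why estimates above 17.6 kb remain approximate.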
To study the migration of large mRNAs that have large repetitive domains, we adapted a protocol using pulsed field gel electrophoresis for the separation of RNAs. With the aim of reducing mechanical damage of large RNAs, we attempted to carry out the electrophoresis directly from cells embedded in agarose plugs, as is done for very large DNA fragments. Although large amounts of RNasin were used, we did not succeed in separating large RNAs in this way. Nevertheless, we were able to compare patterns on Northern blots performed with the same RNA preparation, using conventional electrophoresis and pulsed field gel electrophoresis. We observed that resolution of large mRNAs prepared according to our improved protocol was good using a contour-clamped homogeneous electric field apparatus and electrophoretic conditions adapted to separation of molecules from 15 to 50 kb (63). However, the separation of RNAs of similar size was no better than with conventional electrophoresis (data not shown). Therefore, we recommend the use of conventional electrophoresis as described in this work.
In the case of MUC2, MUC4, and MUC5B, it was possible to compare the size of a fragment encompassing the genomic repetitive array with the RNA length, and we demonstrated that when size variations of transcripts exist among individuals, they are directly related to the VNTR polymorphism that also exists at the DNA level. We speculate that the same observation will be made for the MUC3, MUC5AC, and MUC6 genes, which also show variation in the size of transcripts in different individuals, and thus putative VNTR variation at the DNA level, but for which full restriction maps of the genes are not available. From this preliminary study, we can postulate that mucin genes exhibit a variable degree of VNTR polymorphism, the MUC4 gene appearing to be the most polymorphic (7.5 kb between the largest and the smallest allele), whereas the MUC5B gene seems not to be polymorphic. However, more samples will be required to confirm these results.
This work demonstrates clearly that mucin mRNAs are not polydisperse but are very large. This had been suggested only recently by Baeckström and colleagues (64). Human mucin transcripts are among the largest reported for eukaryotes. In the literature, two examples of proteins whose transcript sizes are larger than 20 kb have been described. An avian protein, the megadalton protein titin, is a single polypeptide chain of ~2800–3000 kDa that constitutes 8–10% of the myofibrillar proteins in striated muscle cells. Titin is encoded by a single RNA species of ~23 kb (59). The second protein is the human nebulin, a myofibrillar protein. Its full-length cDNA sequence is 20.7 kb long (65).
Our results are consistent with the message size necessary to synthesize large apomucin polypeptides. Recent studies have assessed the molecular mass of several mucin precursors immunoprecipitated with monoclonal antibodies and analyzed on reducing SDS-polyacrylamide gel electrophoresis (MUC2, MUC3, MUC4, MUC5AC, and MUC6 apomucins) (66). Mucins are synthesized as large N-glycosylated precursors (~500 kDa), which oligomerize and are converted by O-glycosylation into very large glycoproteins. The molecular masses found are consistent with the size of mucin mRNA estimated using our Northern blot analysis (Table I).
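The consistency between transcript size and precursor mass can be checked with a rough upper-bound calculation, sketched below. It assumes that the entire transcript is coding (untranslated regions are ignored) and uses a generic average residue mass of 110 Da. Both are simplifying assumptions introduced here, and the serine-, threonine-, and proline-rich repeats of mucins would give a somewhat lower mass per residue. The function and constant names are hypothetical.

```python
AVG_RESIDUE_DA = 110.0  # assumed generic average amino acid residue mass

def max_polypeptide_kda(mrna_kb):
    """Upper bound on unglycosylated polypeptide mass for a transcript.

    Treats the whole mRNA as coding sequence (3 nucleotides per codon).
    """
    codons = mrna_kb * 1000 / 3
    return codons * AVG_RESIDUE_DA / 1000

# Lower and upper ends of the 14-24 kb range reported for MUC2-6 transcripts
for kb in (14, 24):
    print(kb, "kb ->", round(max_polypeptide_kda(kb)), "kDa at most")
```

Under these assumptions a 14 kb message can encode a polypeptide of roughly 500 kDa, the same order of magnitude as the precursor masses cited above.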
In Some Individuals, Mucin RNAs Are Detectable as a Double Band with Equal Intensity. We have shown that this finding reflects the expression of two different alleles, which could encode two mucin precursors of different sizes that are probably synthesized in comparable amounts. The observation of co-expression of the two alleles is in accordance with other studies on mucin precursors. Evidence for co-expression of two alleles for rat gastric and colon apomucins, with variation in molecular mass among animals, has already been reported (67,68).
Moreover, our data agree with the previous studies on MUC1. The MUC1 gene exhibits allelic variations due to the variation in the number of tandem repeats; the same polymorphism is demonstrated at the protein level, and the allele products are co-dominant (4).
In addition to improving the methodology for the isolation of intact large RNA, which is applicable to mucin mRNAs and to all other large transcripts, this work provides an important step in understanding the role of mucin gene expression in mucus-associated pathologies. VNTR polymorphism results in the existence of mucin core proteins of different sizes, bearing different numbers of potential O-glycosylation sites, in different individuals. This work should now stimulate research into the biological implications of the length of the VNTR domain for mucin functional properties. Further studies will be required to demonstrate whether mucin gene polymorphism is implicated in increased susceptibility to any pathology, as proposed for the MUC1 gene in gastric cancer (69). Such studies are in progress in our laboratory.