Molecular Cloning and Sequencing of a Novel Invertebrate Intestinal Mucin cDNA

The first invertebrate intestinal mucin, termed insect intestinal mucin (IIM), was recently identified from Trichoplusia ni larvae (Wang, P., and Granados, R. R. (1997) Proc. Natl. Acad. Sci. U. S. A., in press). We report the cDNA cloning and sequencing of IIM, which is only the second completely sequenced intestinal mucin after human intestinal mucin, MUC2. To clone and sequence the cDNA for IIM, a T. ni larval midgut cDNA expression library was constructed and screened with an anti-IIM antiserum. Two full-length cDNA clones for IIM were identified and sequenced. The deduced proteins from the two cDNA clones contained 807 and 788 amino acid residues, respectively. The structural organization of IIM is similar to that of MUC2, containing a 25-amino acid signal peptide and two threonine/proline/alanine-rich tandem repeat domains flanked by cysteine-rich sequences. One tandem repeat domain contained two repeating units, TTTQAP and AATTP, and the other contained one repeating unit, TAAP. The cysteine-rich regions showed potential chitin binding features. Immunolocalization in tissue sections showed that IIM is expressed in midgut tissue. The IIM mRNA is abundant in the midgut tissue, and Northern blot analysis indicated that IIM transcripts are not polydisperse, in contrast to mammalian mucin transcripts.

Vertebrate epithelial organs are covered, throughout the body, with a mucus lining, which serves as a selective physical barrier between the extracellular milieu and the epithelial surface. The mucus lining, especially in the gastrointestinal tract, is highly resistant to various digestive enzymes and provides protection and lubrication for the underlying cells (2,3). The function of mucus depends on one major component, mucins, which are highly O-glycosylated proteins. Carbohydrate moieties on mucins commonly account for more than 50% of the protein by weight. The biochemistry and molecular biology of vertebrate mucins have been broadly studied, with human epithelial mucins being the most extensively studied (for reviews, see Refs. 4-6). Several mucins from humans and other vertebrates have been completely or partially sequenced, and this has contributed to a greater understanding of their structure and function. Full cDNA sequences for human mucins MUC1 (7), MUC2 (8), and MUC7 (9) have been obtained. In addition, mucins from other vertebrates, including mouse MUC-1 (10), rat ascites sialoglycoprotein-1 (11), canine tracheobronchial mucin (12), bovine submaxillary mucin-like protein (13), and frog FIM-A.1 (14), have also been fully sequenced by cDNA cloning.
Studies on invertebrate mucins are very limited in comparison with vertebrate mucins. Drosophila glue proteins from salivary glands (15) have structural characteristics of mucin-like proteins, although these mucin characteristics have not been studied. Mucin-like proteins have also been reported in protozoans (16,17). A secretory mucin involved in squid egg mass formation was identified from the nidamental gland (18,19). A glycoprotein from Drosophila melanogaster cultured cells was reported to be a mucin-like protein (20). Recently, a membrane-associated mucin from the hemocytes of D. melanogaster was identified, and a cDNA for the mucin was subsequently cloned (21). There were no reports, however, on mucins identified from invertebrate digestive tracts until the insect intestinal mucin (IIM) 1 was identified from the midgut of Trichoplusia ni larvae. 2 Unlike vertebrates, insects do not possess a mucus layer lining the digestive tract or other epithelial cells. The digestive tract in insects is commonly lined with an invertebrate-unique structure, the peritrophic membrane (PM). PMs are noncellular matrices of chitin and proteins that are suggested to have functions similar to those of the mucus layer in vertebrates (i.e. a selective barrier protecting the digestive tract from physical damage and microbial infections) (22,23). The two major constituents of PMs, chitin and IIM, play very important roles in the protection of insects from microbial infections (24). 2 Biochemical analysis of IIM by Wang and Granados 2 revealed properties similar to secreted epithelial mucins from vertebrates. Identification of this first insect intestinal mucin has provided a basis for the study of the novel strategy adopted by viruses to overcome the insect host's defensive mechanism by encoding a mucin-degrading enzyme. 2 Knowledge of the structure of IIM is necessary for the comparative analysis of invertebrate and vertebrate intestinal mucins, the structure and function of mucins, and the mechanism of mucin-pathogen interaction.
In this study, we have cloned and sequenced full-length cDNAs for IIM from T. ni, enabling further characterization of IIM. We report that IIM has a similar structural organization to human intestinal mucin, MUC2, and is expressed in midgut tissue. Sequence analysis indicates potential chitin binding domains that may interact with the chitin present within the PM.
Fig. 1 (legend, beginning truncated): ... Fig. 3, are marked with arrows indicating the regions. The translation initiation codon ATG and the stop codon TAA are double underlined. The potential polyadenylation signal sequences are underlined. The broken line indicates gaps in nucleotide sequence. Four potential N-glycosylation sites are indicated with upward arrows. The N-terminal amino acid sequence obtained from purified IIM is also included in this figure. X in the N-terminal amino acid sequence indicates unidentified amino acid residues from N-terminal amino acid sequencing. A potential alternative 3′-splicing site, AG, is indicated in boldface letters.

EXPERIMENTAL PROCEDURES
Preparation of IIM-Midgut PMs were dissected from mid-fifth instar T. ni larvae and stored at −70°C before use. IIM was isolated by preparative SDS-PAGE (25) as described by Wang and Granados, 2 and a phosphate-buffered SDS-PAGE system (26) was used to prepare IIM for N-terminal amino acid sequencing. IIM was recovered from excised gel slices by electroelution, and the eluted IIM solution was desalted and concentrated as described previously. 2
N-terminal Amino Acid Sequencing-N-terminal amino acid sequencing of IIM was performed using a Perkin-Elmer/Applied Biosystems Division 470A gas phase protein sequencer at the Analytical Chemistry and Peptide/DNA Synthesis Facility, Biotechnology Program, Cornell Center for Advanced Technology (Ithaca, NY).
Construction of a cDNA Expression Library-Midgut epithelial tissues were dissected from early to mid-fifth instar T. ni larvae in cold Rinaldini's solution (27). PMs with food contents and other attached tissues (i.e. fat bodies, trachea, and Malpighian tubules) were quickly removed from the midgut epithelium. Isolated midgut epithelia were rinsed with cold Rinaldini's solution, quickly frozen in liquid nitrogen, and stored at −70°C prior to use.
Midgut mRNA was isolated using the RNeasy total RNA isolation kit and the Oligotex mRNA isolation kit (Qiagen Inc., Chatsworth, CA), according to the manufacturer's specifications. The quality of mRNA was confirmed by Northern blot analysis, which showed no detectable degradation of mRNA after probing with β-tubulin DNA.
A cDNA library was constructed from T. ni midgut mRNA using the ZAP-cDNA Gigapack Cloning Kit (Stratagene, La Jolla, CA), following the manufacturer's instructions. cDNA was unidirectionally ligated into the Uni-ZAP XR vector (Stratagene, La Jolla, CA) between the EcoRI and XhoI sites and packaged with the Gigapack II Gold packaging extract. The resultant cDNA library was amplified once at 50,000 plaques/15-cm plate in XL1-Blue MRF′ E. coli host cells.
cDNA Library Screening-Screening of the cDNA expression library for IIM cDNA clones was conducted using an IIM-specific polyclonal antiserum 2 in conjunction with the picoBlue™ Immunoscreening Kit (Stratagene, La Jolla, CA), according to the manufacturer's specifications. The first round of screening was performed at a high density (i.e. 50,000 plaques/15-cm plate). Positive plaques were selected and further purified by screening at a low plating density (i.e. 20-50 plaques/10-cm plate). From purified positive phages, the pBluescript® SK(−) phagemid (Stratagene, La Jolla, CA) was excised in vivo following the ZAP-cDNA Gigapack cloning kit protocol.
DNA Sequencing and Sequence Analysis-After restriction enzyme analysis of 20 purified cDNA clones, two different clones were chosen for sequencing. Nested deletions from both orientations of the cDNA inserts were constructed using the Erase-a-Base System (Promega Corp., Madison, WI). Both strands of the cDNA were sequenced by automated cycle sequencing using T3 and T7 primers, complementary to the pBluescript® SK(−) sequences flanking the cDNA inserts. DNA sequence analysis and a database search were conducted using the DNASTAR software package (DNASTAR Inc., Madison, WI) and BLAST database search programs (28). Protein O-glycosylation sites were predicted following an O-GLYCBASE search (29).
Immunolocalization of IIM in T. ni Larvae-Fourth instar T. ni larvae were fixed in 4% paraformaldehyde overnight at 4°C and embedded in paraffin. After tissue sectioning and dewaxing, immunostaining was performed as follows: sections on glass slides were blocked for nonspecific staining with 3% bovine serum albumin in phosphate-buffered saline, followed by incubation with antiserum against IIM in phosphate-buffered saline containing 3% bovine serum albumin. After incubation with the first antiserum, the sections were washed with phosphate-buffered saline and incubated with a secondary antibody against rabbit IgG conjugated with colloidal gold (Sigma). Following secondary antibody incubation and subsequent washing, the sections were fixed with 2.5% glutaraldehyde. Immunogold staining was intensified by silver enhancement using the Silver Enhancer kit (Sigma). The immunostained sections were counterstained with hematoxylin and eosin and examined by microscopy.
Western Blot Analysis of T. ni Larval Tissues-Tissues were isolated from fifth instar T. ni larvae and rinsed with phosphate-buffered saline. The tissues were then homogenized and boiled in 0.0625 M Tris-HCl (pH 6.8) containing 2% SDS, 5% β-mercaptoethanol, and 10% glycerol. Undissolved materials were removed by centrifugation. Protein concentrations in the supernatants were estimated using the Bradford protein assay (30). One microgram of protein from each tissue extract, except for the PM extract, for which 0.04 μg of protein was used, was loaded onto the gel. Proteins were separated by SDS-PAGE, followed by blotting onto Immobilon membrane (Millipore Corp., Bedford, MA), and probed with anti-IIM antiserum, as described by Wang and Granados. 2
SDS-PAGE Analysis of IIM from PM-To examine the contribution of disulfide bonds to the stability of IIM, PMs were treated with 10 mM DTT in the presence and absence of active PM endogenous proteases. Inactivation of PM endogenous proteases was performed by heating PMs in 2% SDS and subsequent washing, as described by Wang and Granados. 2 After treatments with DTT, the PMs were washed in deionized water. The PMs and the spent incubation solutions were boiled in SDS-PAGE sample buffer and analyzed by SDS-PAGE (25).

RESULTS
cDNA Library Construction and Library Screening-A cDNA expression library was constructed from T. ni midgut mRNA. The library has a complexity of 2.35 × 10⁶ plaques, of which over 99.5% were recombinants. Screening of the library with an antiserum specific to IIM indicated that the mRNA for IIM was abundant; 50 positive plaques were obtained from 50,000 plaques. Since only one in three inserts will be in the correct reading frame for protein expression, the actual frequency of IIM cDNA clones could be about 1 in 333. From these 50 plaques, 20 positive plaques were further purified. From these 20 plaques, the pBluescript® SK(−) phagemids were rescued by in vivo excision. Following restriction enzyme analysis to map the selected clones, two different full-length clones, pIIM14 and pIIM22, were chosen for sequencing.
cDNA Sequences-The cDNAs from both pIIM14 and pIIM22 were full-length clones, encoding proteins of 788 and 807 amino acid residues, respectively (Fig. 1). The open reading frame in the cDNA from IIM14 was 57 base pairs shorter than that in IIM22; otherwise, the open reading frames in these two clones were identical (Fig. 1). IIM22 contains a putative polyadenylation signal consensus, AATAAA, located 331 base pairs downstream of the translation stop codon, TAA, and 17 base pairs upstream of the poly(A) (Fig. 1). IIM14 contains a putative polyadenylation signal, AATTAA, located 15 base pairs upstream of the poly(A) (Fig. 1).
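The two pieces of arithmetic in this passage (the reading-frame correction to the screening hit rate, and the correspondence between the 57-base pair difference and the 19-residue difference in protein length) can be checked with a minimal sketch. The function names are hypothetical and are introduced here only for illustration; the numbers are those reported above.

```python
from fractions import Fraction


def clone_frequency(positives, plaques_screened, reading_frames=3):
    """Estimate the fraction of library clones carrying the target cDNA.

    Immunoscreening only detects inserts expressed in the correct reading
    frame, so the observed hit rate is multiplied by the number of possible
    frames (assumed to be three, as in the text).
    """
    observed = Fraction(positives, plaques_screened)
    return observed * reading_frames


def codons_from_bp(base_pairs):
    """Convert an in-frame length difference in base pairs to codons."""
    if base_pairs % 3 != 0:
        raise ValueError("length difference is not a whole number of codons")
    return base_pairs // 3


freq = clone_frequency(50, 50_000)
print(f"estimated frequency: about 1 in {round(1 / freq)}")
print(f"57 bp = {codons_from_bp(57)} codons; 807 - 788 = {807 - 788} residues")
```

The 57-base pair difference is a multiple of three, so the shorter open reading frame stays in frame with the longer one, which is consistent with the two deduced proteins being otherwise identical.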
The deduced protein sequences from IIM14 and IIM22 showed a hydrophilicity profile characteristic of a signal sequence (31) at the N terminus (Fig. 2). The N-terminal amino acid sequence determined from purified IIM indicated that the cDNA clones encode a protein containing a signal peptide 25 amino acids long and confirmed that the cDNA clones code for IIM (Fig. 1). The amino acid compositions of the deduced proteins from IIM14 and IIM22 were very similar to the composition of IIM isolated from T. ni 2 (Table I), further confirming that the cDNA clones code for IIM. Protein sequence data reveal that there are four potential N-glycosylation sites (Fig. 1). This is in agreement with the biochemical analysis results reported by Wang and Granados, which demonstrated that IIM has N-linked glycosylation. 2 The overall IIM sequence can be divided into six distinct regions based upon its sequence features (Fig. 3). The amino acid composition of each region shows characteristics of a secreted epithelial mucin (Table II). Both the N-terminal and C-terminal domains (II and VI in Fig. 3) are rich in cysteine, which accounts for 8.2 and 7.8% of the total amino acid residues, respectively. Region III is rich in threonine, proline, and alanine (49.2, 16.2, and 21.5%, respectively) and contains two types of tandem repeats, TTTQAPT and AATTP, which are typical features for a mucin (6,32). Region IV is similar to regions II and VI and contains 9.0% cysteine residues. Region V is another threonine-, proline-, and alanine-rich section, containing a repetitive sequence, TAAP. This region differed between IIM14 and IIM22 in sequence length, but the sequence features of the two cDNA clones were similar. Region V contains 25 TAAP repeats in IIM22.
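Regional mol % values and tandem repeat counts of the kind summarized above can be computed directly from a deduced protein sequence. The sketch below shows one way to do this; it is not the authors' procedure (they used DNASTAR), the helper names are hypothetical, and `toy_region` is an invented peptide used only for illustration, not part of the IIM sequence.

```python
import re
from collections import Counter


def mol_percent(seq):
    """Return the mol % of each amino acid (one-letter code) in seq."""
    counts = Counter(seq)
    total = len(seq)
    return {aa: 100.0 * n / total for aa, n in counts.items()}


def longest_tandem_run(seq, unit):
    """Return the largest number of back-to-back copies of unit in seq."""
    runs = re.findall(f"(?:{re.escape(unit)})+", seq)
    return max((len(run) // len(unit) for run in runs), default=0)


# Invented example: TAAP repeats interrupted by a non-repeat stretch.
toy_region = "TAAP" * 5 + "TTVTVPP" + "TAAP" * 3

composition = mol_percent(toy_region)
for aa in sorted(composition, key=composition.get, reverse=True):
    print(f"{aa}: {composition[aa]:.1f} mol %")
print("total TAAP units:", toy_region.count("TAAP"))
print("longest uninterrupted TAAP run:", longest_tandem_run(toy_region, "TAAP"))
```

Counting total units and the longest uninterrupted run separately is useful for a region such as V, where the repeat array is reported to be interrupted at several positions.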
Northern Blot Analysis of IIM Transcripts-Northern blot analysis of T. ni midgut RNA with a probe made from IIM22 showed a single band with a molecular size of 3.1 kilobases (Fig. 4), indicating that IIM transcripts do not show the polydispersity found in mammalian mucin transcripts.
Localization of IIM-The IIM from T. ni larvae was localized by immunocytochemistry with the antiserum to IIM. Microscopic observations (Fig. 5) showed that IIM was localized in the peritrophic membrane and in the area surrounding the midgut epithelial brush border (Fig. 5A). Observation at a high magnification suggested that IIM could be secreted from goblet cells of the midgut epithelium (Fig. 5B). Immunostaining with preimmune serum from the same rabbit used to generate the anti-IIM antiserum did not show any positive reaction (Fig. 5C). In addition to the midgut, positive staining was occasionally observed in Malpighian tubules on the lumen side (Fig. 5A, indicated by an arrow). To verify whether this occasional positive staining in Malpighian tubules was specific to IIM and to test whether IIM was present in other tissues, a Western blot analysis of extracts from various tissues of T. ni larvae using anti-IIM antiserum was conducted (Fig. 6). The Western blot analysis showed that IIM was primarily present in the noncellular PM (Fig. 6, lane 1). A broad band at 200 kDa could also be detected in the PM extract when this sample was overloaded (Fig. 6, lane 1). We consider this band a product of IIM degradation by active midgut digestive enzymes, since the PM moves through the digestive tract. The midgut was the only tissue in which a significant amount of IIM was detected (Fig. 6, lane 2). Besides the IIM band, some lower molecular weight bands were also present in the midgut extract (Fig. 6, lane 2). These bands were possibly IIM protein in the process of glycosylation but not yet fully glycosylated. The extract from Malpighian tubules did not show any positive staining at the gel position for IIM, although a very faint smear appeared at a position slightly over 200 kDa (Fig. 6, lane 3). Some weak positive staining was detected in the extract from hemolymph, with a major broad band between 66 and 97 kDa (Fig. 6, lane 6). Salivary gland, fat body, and epidermis extracts did not show any positive reaction to the anti-IIM antiserum (Fig. 6, lanes 4, 5, and 7).
Effect of Reducing Reagent on IIM-Disulfide bonds appear to be essential for the stability of IIM in the PM (Fig. 7), since in the presence of DTT, IIM was quickly degraded by endogenous digestive enzymes associated with the PM (Fig. 7, lane 2). Once the endogenous proteases were inactivated, IIM was stable in the PM following treatment with DTT (Fig. 7, lane 6). IIM showed a strong association with the chitin-containing PM structure. 2 Treatment of PMs by boiling in 2% SDS and subsequent incubation in DTT to reduce disulfide cross-linking bonds did not result in any release of IIM from the PM (Fig. 7, lane 8).

DISCUSSION
Biochemical analysis by Wang and Granados 2 has shown that IIM from T. ni midgut peritrophic membranes is a novel invertebrate intestinal mucin. The cDNA sequence presented here confirms the identity of this secreted invertebrate intestinal mucin. Among human and other mammalian mucins, the overall structural organization of IIM is most similar to that of human intestinal mucin, MUC2 (8), in the following respects: (a) like a secreted mucin, IIM contains a 25-amino acid signal peptide at the N terminus (region I); (b) like MUC2, which has two different tandem repeat domains separated by a cysteine-rich region that distinguishes MUC2 from other mucins, IIM contains two threonine-rich tandem repeat regions (regions III and V) where potential O-glycosylation sites are located; and (c) the two tandem repeat regions are flanked by cysteine-rich regions (regions II, IV, and VI) (Fig. 3).
In comparison with MUC2, which contains more than 5100 amino acid residues, the apoprotein in IIM is relatively small. The mature IIM contains 763-782 amino acid residues. Prediction of O-glycosylation using the O-GLYCBASE search program (29) indicated that 127 of the 147 threonine residues and 5 of the 23 serine residues in IIM22 (excluding the signal peptide) were potential O-glycosylation sites. In regions III and V, all threonine residues, except the two at the boundaries of region III (at position 99) and region V (at position 486), were potential O-glycosylation sites. There is only one threonine in the non-tandem repeat domains (at position 314) marginally predicted as a potential O-glycosylation site. A PROSITE database search using DNASTAR suggested four tentative N-glycosylation sites. All four sites were located within region V, disrupting the tandem repeat (Fig. 1).
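The mature protein lengths quoted above follow from subtracting the 25-residue signal peptide from the two precursor lengths, and potential N-glycosylation sites of the kind reported by PROSITE can be located with a simple pattern scan. The sketch below is an illustration under stated assumptions: it uses the PROSITE N-glycosylation pattern PS00001, N-{P}-[ST]-{P}, rather than the DNASTAR implementation used in the study; the function names are hypothetical; and `toy_peptide` is an invented sequence, not part of IIM.

```python
import re

SIGNAL_PEPTIDE_LENGTH = 25  # residues, from the N-terminal sequencing result


def mature_length(precursor_length, signal_length=SIGNAL_PEPTIDE_LENGTH):
    """Length of the mature protein after signal peptide removal."""
    return precursor_length - signal_length


# PROSITE PS00001: Asn, any residue but Pro, Ser or Thr, any residue but Pro.
# The lookahead lets overlapping candidate sites be reported independently.
# Note that an N-X-S/T at the very end of a sequence is not reported, because
# the pattern requires a fourth residue.
SEQUON = re.compile(r"N(?=[^P][ST][^P])")


def n_glycosylation_sites(protein):
    """Return 1-based positions of Asn residues matching PS00001."""
    return [match.start() + 1 for match in SEQUON.finditer(protein)]


print("mature lengths:", mature_length(788), "and", mature_length(807))

# Invented peptide: NATA matches; NPSA and NGTP are excluded by the Pro rules.
toy_peptide = "TAAPNATAAPNPSAAPNGTP"
print("candidate N-glycosylation sites:", n_glycosylation_sites(toy_peptide))
print(f"predicted O-glycosylated threonines in IIM22: {127 / 147:.1%}")
```

A pattern match only marks a candidate site; whether it is actually glycosylated has to be established biochemically, as was done for IIM by Wang and Granados.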
Regions III and V have high levels of threonine, alanine, and proline and do not contain any aromatic or sulfur-containing amino acids (Table II), which is similar to the corresponding domains in MUC2 (8). IIM contains multiple repeating units. These repeating units are short compared with those found in mammalian mucins (4,6,32). Region III contains two tandem repeating sequences, TTTQAPT and AATTP, throughout the whole region. Region V contains an even shorter repeating unit, TAAP. The repeating units in this region are interrupted at four potential N-glycosylation sites and several other locations. Sequences TTVT(V/S)PP and TTAVPEI occur frequently at the interrupted locations in region V (Fig. 1). The repeating sequences in IIM did not exhibit similarity to any known repeating sequences from other mucins (6).
The difference between cDNAs IIM14 and IIM22 is in region V. In this region, IIM14 contains 19 fewer amino acids than IIM22, which could be due to genetic polymorphism, as reported for human and other vertebrate mucin genes (33-35). Both IIM cDNAs contain G + C-rich repeated sequence units in regions III and V. These G + C-rich repeated sequences (with -like sequence features) could be responsible for the evolution of genetic polymorphisms (4,36,37). This difference between IIM14 and IIM22 could also be the result of alternative splicing during RNA processing. Such a phenomenon has been observed in mucin gene expression (38). The AG at positions 2005 and 2006 in IIM22 (Fig. 1) could potentially serve as a 3′-splicing site, which would lead to an mRNA corresponding to IIM14.
TABLE II. Regional amino acid composition of IIM. Regional amino acid composition of IIM as presented in Fig. 3. Mol % values were calculated from the deduced protein sequences of the IIM cDNA clones.
Localization of IIM by immunocytochemistry indicates that IIM is primarily expressed in the midgut tissue (Fig. 5) and is likely to be secreted by goblet cells (Fig. 5B). Interestingly, this is similar to the secretion of mucins by goblet cells in vertebrate intestinal epithelium (39). Although positive staining could occasionally be observed in Malpighian tubules by immunocytochemistry (Fig. 5A), Western blot analysis of various tissue extracts from T. ni larvae showed that IIM was expressed in the midgut (Fig. 6, lane 2). Very weak positive reactions were detected in the extracts of Malpighian tubules and hemolymph (Fig. 6, lanes 3 and 6). However, the bands detected in the Malpighian tubules and hemolymph did not have the molecular weight expected for IIM, and their reactivity to the anti-IIM serum was very low. Therefore, these bands, whose identity has yet to be determined, do not appear to indicate the presence of IIM.
The protein sequence features of IIM are in agreement with the data from the biochemical analysis of IIM reported by Wang and Granados. 2 The presence of N-glycosylation motifs and mucin-characteristic threonine-rich tandem repeats in the IIM sequence is consistent with the N-glycosylation and extensive O-glycosylation of IIM previously demonstrated by carbohydrate-specific lectin binding and specific glycosidase analyses. 2 Cysteine-rich domains are common in mucins and have been suggested to cause oligomerization of mucins by disulfide bonding (8, 40-42). These cysteine-rich regions might also contain globular structures with intramolecular disulfide bonds (43). These protein regions could become exposed once the disulfide bonds are reduced (44). Our results (Fig. 7) demonstrate that disulfide bonds in the non-heavily O-glycosylated regions of IIM were involved in maintaining a digestive protease-resistant structure. However, protein sequence analysis did not show significant sequence similarity between the cysteine-rich regions in IIM and the cysteine-rich regions from MUC2 (or other mammalian mucins). This is not surprising, since insects are phylogenetically distant from mammals and since IIM is a constituent of a unique invertebrate chitin-containing structure. Studies on sequence characteristics and the disulfide bond formation in the IIM cysteine-rich regions have yet to be conducted.
Previous studies in our laboratory have shown that IIM is tightly associated with the PM 2 and that IIM is a major structural constituent of the PM. These results indicate that IIM may have a high affinity for the chitinous fibril network of PMs. By computer-assisted sequence analysis, a protein fragment in region IV was aligned to two chitin binding domains in chitinases from a yeast, Saccharomyces cerevisiae (45), and a fungus, Rhizopus oligosporus (46) (Fig. 8A). In addition to region IV, sequences in regions II and VI also show a certain degree of similarity to the chitin binding domains described above; however, the levels of similarity were lower than that found in region IV (data not shown). In a recent report, a non-mucin insect PM protein from Lucilia cuprina, peritrophin-44, showed binding capability to chitin, but it did not show significant sequence similarity to known chitin binding sequences (1), as demonstrated in this study (Fig. 8A). However, the cysteine-rich domains of peritrophin-44 shared one structural feature with chitinases, a six-cysteine-containing sequence present in the chitinase cysteine-rich domains (1). Notably, the sequence features of IIM in the cysteine-rich regions are similar to what Elvin et al. (1) proposed for peritrophin-44. Almost all sequences in regions II, IV, and VI are composed of such a six-cysteine consensus (Fig. 8B). This result supports our finding that IIM may bind tightly to the chitin network of the PM through the nonglycosylated cysteine-rich regions. The strong binding of IIM to chitin could be a very important factor for the formation of PMs in invertebrates and aid in the stability of the chitin network. Based on the structural characteristics of IIM and the strong binding between IIM and chitin, it is possible that the chitin fibrils in PMs are protected from enzymatic degradation.
Considering the biochemical properties of IIM and the putative chitin binding sequences in nonglycosylated regions in IIM, we propose that the IIM protein backbone may be protected from degradation in the hydrolytic enzyme-rich midgut environment by two possible mechanisms: (a) the densely O-glycosylated regions (regions III and V) are protected by oligosaccharide moieties; and (b) the cysteine-rich nonglycosylated or less glycosylated regions (regions II, IV, and VI) are protected by disulfide covalent bonding forming a "buried" structure or by the protein binding to chitin in the PM. The mucin nature and chitin binding capability of IIM can explain the high resistance of IIM to midgut digestive enzymes and the protective functions of PMs in invertebrates, especially in insects. Any reagents with the potential to damage IIM, such as baculovirus enhancins or reducing agents, will result in the destruction or attenuation of the protective role of the PM against parasites and other microorganisms.