Cloning, Expression, and Characterization of the Galα1,3Gal High Affinity Lectin from the Mushroom Marasmius oreades *

The purification and unique carbohydrate binding properties, including blood group B-specific agglutination and preferential binding to Galα1,3Gal-containing sugar epitopes, of the Marasmius oreades agglutinin (MOA) are reported in an accompanying paper (Winter, H. C., Mostafapour, K., and Goldstein, I. J. (2002) J. Biol. Chem. 277, 14996–15001). Here we describe the cloning, characterization, and expression of MOA. MOA was digested with trypsin and endoproteinase Asp-N, and the peptide fragments were purified by high performance liquid chromatography. Amino acid sequence data were obtained for eight peptides. Using oligonucleotides deduced from the peptide sequences for a reverse transcriptase-PCR, a 41-base pair cDNA was obtained. The 41-base pair fragment allowed the generation of a full-length cDNA using 5′ and 3′ rapid amplification of cDNA ends. MOA cDNA encodes a protein of 293 amino acids that contains a ricin domain. These carbohydrate binding domains were first described in subunits of bacterial toxins and are also commonly found in polysaccharide-degrading enzymes. Whereas these proteins are known to display a variety of sugar binding specificities, none to date are known to share MOA's high affinity for Galα1,3Gal and Galα1,3Galβ1,4GlcNAc. Recombinantly expressed and purified MOA retains the specificity and affinity observed with the native protein. This study provides the basis for analyzing the underlying cause for the unusual binding specificity of MOA.

The Galα1,3Gal epitope has received considerable attention, stemming from its presence in the glycoproteins of most mammals and its conspicuous absence in humans, apes, and Old World monkeys (3). This absence is attributable to lack of the specific α1,3-galactosyltransferase because of frameshift mutations in its gene (4). The resulting immunogenicity of the Galα1,3Gal epitope is a significant barrier to xenotransplantation (5). Despite the importance of Galα1,3Gal epitope recognition, MOA is currently the only lectin known to have exclusive specificity for this disaccharide (2).
Few proteins have been shown to bind with any specificity to Galα1,3Gal. While Clostridium difficile toxin A and antibodies recognizing the α-galactosyl epitope both bind well to some Galα1,3Gal-containing oligosaccharides (6), the size and species of origin of MOA suggest that it is fundamentally dissimilar to these proteins. On these grounds, the blood group B-specific Griffonia simplicifolia I-B4 isolectin is perhaps more appropriate for comparison (7). A recent x-ray crystallographic structural analysis of G. simplicifolia I-B4 isolectin complexed with Galα1,3Gal revealed that its binding pocket is restricted to the terminal nonreducing sugar, consistent with data showing the lectin to have similar affinity for monosaccharide and the various positional isomers of the disaccharide (8). However, MOA is expected to have an extended binding site to explain its overwhelming preference for Galα1,3Gal-containing di- and trisaccharides.
To study the basis for its unique carbohydrate binding specificity, we have cloned, recombinantly expressed, and characterized MOA. These studies reveal that MOA is a member of the ricin superfamily.

EXPERIMENTAL PROCEDURES
Peptide Sequencing and Analysis-Peptide sequences were determined by the Macromolecular Structure Facility at Michigan State University. Briefly, purified protein was digested with trypsin or endoproteinase Asp-N. Proteolytic fragments were bound to a C-18 column and eluted with a gradient of acetonitrile. Purified peptides were then sequenced by automated Edman degradation.
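The two digestions described above follow simple cleavage rules, which can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the toy sequence are hypothetical (the toy sequence is not taken from MOA), and the rules are the idealized textbook ones, with trypsin cleaving C-terminal to Lys or Arg unless the next residue is Pro and Asp-N cleaving N-terminal to Asp. Real digests also show missed and nonspecific cleavages.

```python
import re


def trypsin_digest(seq: str) -> list[str]:
    """Idealized trypsin: cleave after K or R, but not before P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", seq) if p]


def aspn_digest(seq: str) -> list[str]:
    """Idealized endoproteinase Asp-N: cleave on the N-terminal side of D."""
    return [p for p in re.split(r"(?=D)", seq) if p]


# Hypothetical toy sequence, for illustration only (not MOA).
toy = "MKTDAGRPLDWK"
print(trypsin_digest(toy))  # ['MK', 'TDAGRPLDWK']
print(aspn_digest(toy))     # ['MKT', 'DAGRPL', 'DWK']
```

Splitting on zero-width matches requires Python 3.7 or later. Because the two enzymes cut at different residues, their fragments overlap, which is how overlapping peptide sequences such as those used for primer design can arise.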
RNA Isolation, cDNA Cloning, and Northern Analysis-M. oreades mushrooms were collected in grassy plots in Ann Arbor, Michigan, frozen immediately in dry ice, and stored at −80 °C until extracted. The frozen tissue was ground under liquid nitrogen to a medium-fine powder with a mortar and pestle resting in dry ice. Subsequent steps in the RNA purification followed recommendations given with the Plant RNA Isolation Aid as an accessory to the RNAqueous-Midi kit (Ambion). Using this protocol, 7.2 μg of total RNA/g of mushroom was isolated.
Oligonucleotides for RT-PCR were designed from the available peptide sequences. The two regions with lowest degeneracy were within a region of four peptides whose sequences overlap one another. The forward primer (5′-GGNTGGCARTTYACNCC-3′) was reverse trans-
* This work was supported in part by the National Institutes of Health and the Walther Cancer Institute. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
RT-PCR was conducted with Moloney murine leukemia virus reverse transcriptase (Invitrogen) and Amplitaq Gold polymerase (Applied Biosystems). Template mRNA was purified from 0.86 μg of total RNA using the mRNA capture kit (Roche Molecular Biochemicals). A PCR product of appropriate size (~40 bp) was cloned using the TA TOPO PCR cloning kit (Invitrogen). Sequencing of this product yielded a total of 11 unambiguous bases. 5′ and 3′ RACE was performed essentially as described in the First-Choice RLM-RACE kit (Ambion). Two overlapping primers were designed for each 5′ and 3′ RACE. These primers include all or part of the 11 unambiguous bases. The two primers used for subsequent amplification steps in 5′ RACE (primer 1, 5′-ARYTGRTGCCARTTRATCGT-3′; primer 2, 5′-TGCCARTTRATCGTGTCTGG-3′) are 32- and 4-fold degenerate, respectively, with the degeneracy weighted toward the 5′-ends. Similarly, the two primers used for 3′ RACE (primer 1, 5′-GGNTGGCARTTYACRCCAGA-3′; primer 2, 5′-CARTTYACRCCAGACACGAT-3′) are 32- and 8-fold degenerate, respectively.
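The stated fold degeneracies can be checked by multiplying, over all positions, the number of bases each IUPAC ambiguity code represents. The short Python sketch below does this for the primers quoted above. The function and variable names are illustrative assumptions. The final check assumes that the 11-base non-degenerate 3′ tail shared by the two inner primers corresponds to the 11 unambiguous bases; the text implies this but does not spell it out.

```python
# Number of bases represented by each IUPAC nucleotide code.
IUPAC_COUNTS = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def degeneracy(primer: str) -> int:
    """Fold degeneracy = product of per-position base counts."""
    count = 1
    for base in primer.upper():
        count *= IUPAC_COUNTS[base]
    return count


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous (A/C/G/T) sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[b] for b in reversed(seq.upper()))


# Primer sequences as quoted in the text (5' to 3').
PRIMERS = {
    "RT-PCR forward": "GGNTGGCARTTYACNCC",
    "5' RACE primer 1": "ARYTGRTGCCARTTRATCGT",
    "5' RACE primer 2": "TGCCARTTRATCGTGTCTGG",
    "3' RACE primer 1": "GGNTGGCARTTYACRCCAGA",
    "3' RACE primer 2": "CARTTYACRCCAGACACGAT",
}

for name, seq in PRIMERS.items():
    print(f"{name}: {degeneracy(seq)}-fold degenerate")

# The inner 5' and 3' RACE primers end in the same 11 non-degenerate bases,
# read on opposite strands.
tail_3 = PRIMERS["3' RACE primer 2"][-11:]
tail_5 = PRIMERS["5' RACE primer 2"][-11:]
print(reverse_complement(tail_3) == tail_5)
```

Placing the non-degenerate bases at the 3′ end of each primer, where polymerase extension begins, is consistent with the statement that degeneracy is weighted toward the 5′-ends.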
For Northern analysis, total RNA (10 μg) was run on a prerun formamide gel and transferred to a nylon membrane (Nytran) with the Bios blotting system. The cDNA probe was generated by random primer labeling with Klenow (Roche Molecular Biochemicals) incorporating
FIG. 1. MOA cloning strategy. Relative position of degenerate primers with respect to derived sequence from the overlapping peptides 1-4 is shown by arrows labeled F and R. RT-PCR from M. oreades total RNA yields a product of the expected size. Subsequent cloning and sequencing confirm the size of the product (41 bp) and show it to encode the intervening amino acids. Sequencing of this product yielded 11 nondegenerate bases. Primers utilizing this nondegenerate sequence were used for 5′ and 3′ RACE to generate a full-length sequence.
Expression and Characterization of Recombinant MOA-A full-length coding sequence PCR product incorporating NdeI and EcoRI sites into its forward and reverse primers, respectively, was cloned into PCR Blunt II using topoisomerase (Invitrogen) and subsequently subcloned into an isopropyl-1-thio-β-D-galactopyranoside-inducible pT7 expression vector (MOApT7LO). Recombinant MOA was expressed in a Nova Blue DE3 strain of Escherichia coli. Induced bacteria were collected by centrifugation and resuspended in a lysis buffer consisting of 50 mM NaH2PO4, 300 mM NaCl, 10 mM imidazole, 10 mM 2-mercaptoethanol, 1 mM phenylmethylsulfonyl fluoride, 1% Nonidet P-40, and a protease inhibitor mixture. The extract was run twice through a French press. The insoluble fraction was removed by centrifugation (10,000 × g, 15 min).
Purification of Recombinant MOA and Native Intact MOA-Recombinant MOA was purified from the soluble fraction by adsorption on a column of melibiose-Sepharose and elution by lactose, as described previously (2). As similarly described therein, further purification was carried out on an affinity column of Synsorb B. Except for the diaminopropane elution, all affinity purifications were carried out in PBS, pH 7.2, containing 1.25 mM EDTA. After purification, lectin solutions were dialyzed against distilled water and lyophilized. Salt-free lyophilizates were readily soluble in distilled water or buffer, with retention of full agglutinating activity. Intact native MOA was prepared by the purification procedure described using the protease inhibitor mixture and metal-free buffers (2).

RESULTS AND DISCUSSION
Here we report the deduced amino acid sequence and recombinant expression of the only known Galα1,3Gal-specific lectin. This is the first recorded protein sequence from the fairy ring mushroom M. oreades.
Enzymatic digestion, purification of peptide fragments, and Edman degradation of the native protein yielded eight peptide sequences (Table I). Inspection of the peptides reveals that four have overlapping amino acid sequences, designated peptides 1-4. The two low degeneracy oligonucleotides used for RT-PCR were designed from the overlapping region (Fig. 1). These oligonucleotides were used to obtain a 41-base pair product whose sequence generated 11 unambiguous bases. The 11 unambiguous nucleotides proved a sufficient starting point for the generation of a full-length sequence via 5′ and 3′ RACE. Cloning and sequencing of 5′ and 3′ RACE products generated 169 and 881 bp of 5′ and 3′ sequence, predicting a total message size, not including polyadenylation, of 1062 bp (GenBank™ accession number AY06613). This corresponds well with Northern analysis showing a major band at ~1.5 kb and a minor band at ~1.1 kb (data not shown). Sequencing of multiple clones revealed that the mRNA apparently contains four nucleotide polymorphisms, only one of which confers an amino acid ambiguity. Specifically, position 200 can be either aspartic acid or asparagine (Fig. 2A). This polymorphism seems unlikely to alter binding specificity, since it lies outside of the predicted ricin domain (discussed below).
Analysis of the cDNA indicates an open reading frame encoding a protein of 293 amino acids (Fig. 2A). Inspection of the predicted amino acid sequence shows the presence of all eight of the sequenced peptides. MOA also apparently lacks a signal peptide and is therefore probably cytosolic. The predicted molecular weight of this protein is 32,299. This is consistent with MALDI-TOF mass spectrometric analysis of the native protein, which gave a molecular mass for the full-length native protein of approximately 32,290. Mass spectrometric analysis of the native protein and tryptic digests thereof showed remarkable correlation between the observed molecular weights and those predicted from the deduced amino acid sequence (data not shown). This strongly suggests that the isolated lectin and the cloned cDNA product are the same protein.
The MOA open reading frame was cloned into a T7 expression vector. The protein was produced in E. coli and purified as described above. Recombinant MOA had an electrophoretic mobility in SDS-PAGE identical to that of native protein at 32 kDa (Fig. 3) and eluted as a single, symmetrical peak at the same elution volume as native MOA from a G2000 SWXL molecular sieve column (not shown; see Ref. 2). Moreover, polyclonal rabbit antisera prepared against either the native MOA or the recombinant MOA formed precipitin bands of identity with the native and recombinant MOA preparations. The recombinant protein and native protein were also subjected to MALDI-TOF mass spectrometry. Recombinant MOA showed a molecular mass of 32,090 ± 20 Da, whereas the native protein had a slightly higher mass, 32,132 ± 17 Da. Both the native and recombinant proteins were found to contain less than 0.25 mol of neutral sugar/mol of 32-kDa protein by the phenol-sulfuric acid assay (9), provided that the proteins were purified by adsorption to Synsorb B and elution at high pH with diaminopropane, a procedure not involving elution with a sugar. Similarly, neither protein was stained on SDS-PAGE gels by the periodate-Schiff stain. Since the native protein appears to be blocked at the N terminus, the difference in the molecular mass of the native versus recombinant proteins might be caused by the presence of a blocking group, such as an N-acetyl moiety (M = 42 Da) on the native protein.
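The arithmetic behind the N-acetyl suggestion can be made explicit. The sketch below subtracts the two MALDI-TOF masses and combines their uncertainties in quadrature. It assumes, as an illustration only, that the quoted ± values are independent standard uncertainties, which the text does not state. The function name is hypothetical, and 42.04 Da is the average mass added by N-acetylation (C2H2O).

```python
import math

ACETYL_DA = 42.04  # average mass added by an N-acetyl group (C2H2O)


def mass_difference(a: float, a_err: float, b: float, b_err: float):
    """Return (a - b, combined uncertainty), assuming independent errors."""
    return a - b, math.hypot(a_err, b_err)


# Values quoted in the text (Da).
diff, err = mass_difference(32132.0, 17.0, 32090.0, 20.0)
print(f"native - recombinant = {diff:.0f} +/- {err:.0f} Da")
print(f"consistent with one acetyl group: {abs(diff - ACETYL_DA) <= err}")
```

Under that assumption the difference is 42 Da with an uncertainty of about 26 Da. This is compatible with a single acetyl group, but the measurement alone cannot distinguish it from other small modifications of similar mass.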
Binding constants of several relevant oligosaccharides to recombinant and intact native MOA were determined calorimetrically. As shown in Table II, little or no difference was observed between the two preparations. The original isolation of MOA had produced protein containing a mixture of full-length (32 kDa) and "clipped" protein (23- and 10-kDa fragments (2)). Not surprisingly, both intact MOA and recombinant MOA show slightly stronger binding to Galα1,3-linked oligosaccharides than the "clipped" form of the native lectin.
BLAST searches of the MOA sequence showed highest similarity to the ricin domains of a xylanase-arabinofuranosidase from Streptomyces chattanoogensis (NCBI accession number AAD32559), a β-mannanase from Polyangium cellulosum (NCBI accession number AAK19890), the α chain of coagulation factor G from horseshoe crab (NCBI accession number BAA04044), and mosquitocidal toxin 21 from Bacillus sphaericus (NCBI accession number S27514). The presence of a ricin domain is best shown by the alignment of MOA with this subset of ricin domain-containing proteins (Fig. 2B). Other ricin domain-containing proteins showed less identity with MOA. Outside of the prospective ricin domain, however, no convincing homology was observed with these or any other known proteins. Many of the ricin domain-containing proteins, like the ricin B chain itself, promote the internalization of disulfide-linked toxic protomers through their binding to glycosylated cell surface receptors (10); however, there is no evidence that MOA functions in this manner.
Structural analysis of ricin domains suggests that they are composed of three repeating subdomains that may have originated from an ancestral galactose-binding motif (11). Closer analysis of the three subdomains of MOA indicates strong conservation with the key residues in the 1α and 2γ subdomains of ricin and ebulin (Fig. 4). Structural determination of these proteins in the presence of sugar shows binding to these two subdomains (11,12). All of the MOA subdomains have the conserved QXW motif (13). In ricin, the conserved tryptophan is necessary for hydrophobic packing of the core structure, whereas the glutamine coordinates the conserved aspartic acid that hydrogen-bonds with the third and fourth oxygens of the galactosyl moiety. The asparagine prior to the QXW motif also hydrogen-bonds with the O-3 and O-4 of the sugar. The corresponding histidine found in the MOA subdomains could function similarly. Additionally, there is a conserved hydrophobic position occupied by tryptophan, tyrosine, or phenylalanine between the conserved aspartic acid and asparagine. This residue forms a stacking interaction with the sugar ring. In the MOA subdomains, this position is also occupied by a tryptophan.
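Locating candidate QXW motifs in a protein sequence is a simple pattern search, sketched below in Python. The function name and the toy sequence are hypothetical, and the toy sequence is not the MOA sequence. A match only flags a Gln-any-Trp tripeptide; whether it belongs to a ricin-type subdomain still has to be judged from an alignment such as the one described above.

```python
import re


def find_qxw(seq: str) -> list[tuple[int, str]]:
    """Return (1-based position, tripeptide) for every Q-X-W occurrence.

    A lookahead is used so that overlapping candidates are all reported.
    """
    return [(m.start() + 1, m.group(1))
            for m in re.finditer(r"(?=(Q[A-Z]W))", seq.upper())]


# Hypothetical toy sequence, for illustration only (not MOA).
toy = "AHNQLWTTDGQYWSSNQQWK"
for pos, motif in find_qxw(toy):
    print(pos, motif)
```

For the toy sequence this reports QLW at position 4, QYW at position 11, and QQW at position 17.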
Because the essential features required for galactosyl binding are conserved in MOA, it is interesting that the specificity of ricin is very different from that of MOA. While MOA is specific for Galα1,3Gal-containing sugars, ricin binds well with β-1,3- or β-1,4-linked galactose-terminated sugars (14). Like MOA, ricin shows higher affinity for larger, more complex saccharides than for simple sugars (2,15). The affinity constant for lactose binding to ricin is 10-fold greater than for galactose alone (15). Similarly, MOA binds Galα1,3Gal with an affinity constant 44-fold greater than that for Meα-Gal (2). While the structure of ricin shows hydrogen bonding exclusively to the terminal sugar, it is clear that elements outside of the main binding pocket are important for determining the strength and specificity of binding.
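The fold differences in affinity constant quoted above can be converted into differences in binding free energy with ΔΔG = −RT ln(K1/K2). The sketch below does this for the 44-fold (MOA) and 10-fold (ricin) ratios. The temperature of 298.15 K is an assumption made for illustration, since the measurement temperature is not given in this passage, and the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ddg_from_ratio(ratio: float, temp_k: float = 298.15) -> float:
    """Free energy difference (kcal/mol) for a given ratio of affinity constants.

    Negative values mean the tighter-binding ligand is favored.
    temp_k defaults to 298.15 K, an assumed value.
    """
    return -R_KCAL * temp_k * math.log(ratio)


print(f"MOA, 44-fold:   {ddg_from_ratio(44):.2f} kcal/mol")
print(f"ricin, 10-fold: {ddg_from_ratio(10):.2f} kcal/mol")
```

At the assumed temperature, the 44-fold preference corresponds to roughly 2.2 kcal/mol of additional binding free energy and the 10-fold preference to roughly 1.4 kcal/mol. Increments of this size are consistent with a small number of extra contacts outside the primary binding pocket.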
Of particular interest in explaining the difference between MOA and other ricin domain proteins could be the loop region between the stacking hydrophobic residue and the sugar binding asparagine/histidine. Unlike other subdomain segments, this loop does not model well onto ricin. It is longer in MOA than in ebulin and ricin by 1-3 residues and appears structurally different, since it lacks a conserved proline following the hydrophobic stacking residue. This region could provide an additional hydrophobic stacking interface or hydrogen bonding specific for Galα1,3Gal-containing sugars either through direct side chain contact or water-mediated interactions and would be appropriately positioned to sterically block sugars not in the 1,3 orientation.
The cloning and expression of the recombinant MOA provides a route for understanding the structure and unique carbohydrate binding specificity of this novel lectin. Crystallographic structure determination of MOA in the presence of bound sugar should provide the rationale for the specific binding of Galα1,3Gal. This study also emphasizes the flexibility of the ricin domain in sugar binding specificity and suggests that the ricin superfamily will be a continuing source for the discovery of novel lectins that, like MOA, are specific in recognition for both sugar moiety and linkage.