Characteristic recognition of N-acetylgalactosamine by an invertebrate C-type Lectin, CEL-I, revealed by X-ray crystallographic analysis.

CEL-I is a C-type lectin, purified from the sea cucumber Cucumaria echinata, that shows a high specificity for N-acetylgalactosamine (GalNAc). We determined the crystal structures of CEL-I and its complex with GalNAc at 2.0 and 1.7 Å resolution, respectively. CEL-I forms a disulfide-linked homodimer and contains two intramolecular disulfide bonds, although it lacks one intramolecular disulfide bond that is widely conserved among various C-type carbohydrate recognition domains (CRDs). Although the sequence similarity of CEL-I with other C-type CRDs is low, the overall folding of CEL-I was quite similar to that of other C-type CRDs. The structure of the complex with GalNAc revealed that the basic recognition mode of GalNAc was very similar to that for the GalNAc-binding mutant of the mannose-binding protein. However, the acetamido group of GalNAc appeared to be recognized more strongly by the combination of hydrogen bonds to Arg115 and van der Waals interaction with Gln70. Mutational analyses, in which Gln70 and/or Arg115 were replaced by alanine, confirmed that these residues contributed to GalNAc recognition in a cooperative manner.

In vertebrate C-type lectins, the C-type CRDs mostly occur as carbohydrate-binding modules linked to other domains with distinct functions. They fall into seven groups (4) and function through cooperation of the CRDs with other individual domains. In contrast, invertebrate C-type lectins are mostly single-domain proteins. One of their possible roles is thought to be inactivation or opsonization of foreign microorganisms in place of immunoglobulins in vertebrates. Tertiary structures have been determined for only a limited number of C-type CRDs. Mannose-binding protein (MBP) was the first C-type lectin for which a crystal structure was determined (5). This protein contains an N-terminal collagenous domain, followed by a link domain and a C-type CRD. The latter two domains have been expressed in a recombinant form, and the structures of this fragment and of various mutant forms have been analyzed by x-ray crystallography (6-10). In addition to MBP, the crystal structures of CRDs complexed with carbohydrates have been solved for DC-SIGN, DC-SIGNR (11), P- and E-selectins (12), tunicate lectin TC14 (13), and rattlesnake venom lectin RSL (14). In addition to these lectins, several C-type lectin-like domains (CTLDs) without carbohydrate-binding ability have also been found (2). These lack the essential residues for carbohydrate binding, and instead many are thought to function as receptors for noncarbohydrate ligands. Although these proteins share only slight sequence similarity (~20%), their tertiary structures are basically similar to each other (15).
We previously isolated four Ca²⁺-dependent galactose/N-acetylgalactosamine-specific lectins (CEL-I, -II, -III, and -IV) from the marine invertebrate Cucumaria echinata (Holothuroidea) (16). Based on their amino acid sequences, CEL-I and CEL-IV are clearly categorized into the C-type lectin family, whereas CEL-III is a novel Ca²⁺-dependent lectin with strong hemolytic activity as well as cytotoxicity (17-20). CEL-III shows sequence similarity with β-trefoil lectins, such as the B-chains of ricin and abrin (21). The similarity of the three-dimensional structure of CEL-III with these proteins has recently been revealed by x-ray crystallographic analysis (22). CEL-I, which is the smallest lectin in C. echinata, is composed of two identical 16-kDa subunits linked by a single interchain disulfide bond (23). CEL-I has very high specificity for N-acetylgalactosamine (GalNAc), and its affinity for GalNAc is ~1000-fold stronger than that for galactose as judged by hemagglutination inhibition assays. To date, no other C-type lectins with such high specificity for GalNAc have been identified.
It is therefore of great interest to clarify the CEL-I-specific carbohydrate-recognition mechanism, not only to understand carbohydrate recognition by C-type lectins in general but also to provide a structural basis for designing molecules with novel carbohydrate-specific affinity. We previously obtained single crystals of CEL-I that were suitable for x-ray diffraction studies (24). Recently, we also succeeded in crystallizing the CEL-I-GalNAc complex. In this report, we describe x-ray crystallographic analyses of CEL-I and the CEL-I-GalNAc complex to elucidate the mechanism of its high specificity for GalNAc, together with site-directed mutagenesis experiments on a recombinant CEL-I (rCEL-I) expressed in Escherichia coli cells.

EXPERIMENTAL PROCEDURES
Purification of CEL-I. CEL-I was purified from C. echinata according to a previously reported method (19). The proteins extracted from a homogenate of C. echinata were applied to a lactose-Cellulofine column equilibrated with 0.15 M NaCl, 10 mM Tris-HCl, pH 7.5 (TBS), containing 10 mM CaCl₂. Adsorbed lectins (CEL-I, -III, and -IV) were eluted with TBS containing 20 mM EDTA. The lectins were then separated on a GalNAc-Cellulofine column, utilizing the differences in their carbohydrate-binding specificities. After elution of CEL-III with TBS containing 0.1 M lactose, CEL-I and CEL-IV were eluted with TBS containing 20 mM EDTA. CEL-I and CEL-IV were finally separated by gel filtration through Sephadex G-75 in TBS.
Crystallization and Data Collection. The purification and crystallization of native CEL-I have already been reported (24). Briefly, the protein solution (5 mg/ml, 2-4 µl) in TBS was mixed with the same volume of reservoir solution (0.1 M Tris-HCl, pH 8.0, 60% MPD) and subjected to hanging or sitting drop vapor diffusion at 20°C. X-ray data collection was performed on beamline BL-18B at the High Energy Accelerator Research Organization, Tsukuba, Japan, using an ADSC Quantum 4R CCD camera (25). The space group was monoclinic C2 with unit cell parameters of a = 92.4, b = 69.9, c = 76.7 Å, and β = 136.5°. Reduction of a total of 72,844 reflections from CEL-I crystals yielded 19,924 independent reflections with 92.3% completeness for a resolution range of 52.7-2.0 Å and an overall Rmerge of 6.1%. CEL-I-GalNAc complex crystals were prepared under essentially the same conditions as the native crystals except for the addition of 1 mM GalNAc to the samples. Because the complex crystals were initially formed as spherulites, these were used as seeds for a microseeding method. Single crystals of the CEL-I-GalNAc complex grew to sufficient sizes for data collection within one month. Data collection was performed at 100 K using a Rigaku R-AXIS VII imaging plate area detector equipped with a Rigaku FR-E SuperBright rotating-anode generator. Image data were processed by the programs MOSFLM (26) and SCALA (27). The space group and unit cell parameters of the CEL-I-GalNAc crystals were determined to be P2₁ and a = 39.6, b = 52.2, c = 136.6 Å, and β = 91.7°, respectively. Assuming two molecules of CEL-I, each a homodimer, in the asymmetric unit, the value of the Matthews constant, Vm (28), is 2.20 Å³ Da⁻¹, corresponding to a solvent content of 44.2%. The final data for the CEL-I-GalNAc complex crystals comprised 59,176 independent reflections with 96.3% completeness for a range of 33.8-1.7 Å and an overall Rmerge of 6.2%. The completeness and Rmerge for the highest resolution shell (1.79-1.70 Å) were 84.9 and 16.2%, respectively.
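As a rough check on the Matthews constant and solvent content quoted above for the complex crystal, the calculation can be sketched in Python. The sketch assumes a nominal subunit mass of exactly 16 kDa (the text gives only "16-kDa subunits"), two asymmetric units per P2₁ cell, and the common approximation that the solvent fraction is 1 - 1.23/Vm; the function names are illustrative and are not taken from any crystallographic package.

```python
import math

def monoclinic_cell_volume(a, b, c, beta_deg):
    """Volume (cubic angstroms) of a monoclinic cell: V = a * b * c * sin(beta)."""
    return a * b * c * math.sin(math.radians(beta_deg))

def matthews_coefficient(cell_volume, n_asym_units, mass_per_asym_unit_da):
    """Vm = cell volume / (asymmetric units per cell * protein mass per asymmetric unit)."""
    return cell_volume / (n_asym_units * mass_per_asym_unit_da)

def solvent_fraction(vm):
    """Approximate solvent fraction, 1 - 1.23 / Vm (assumes a typical protein density)."""
    return 1.0 - 1.23 / vm

# Unit cell of the CEL-I-GalNAc crystal as reported in the text.
volume = monoclinic_cell_volume(39.6, 52.2, 136.6, 91.7)

# Assumptions: P2(1) has 2 asymmetric units per cell, and each asymmetric unit
# holds two homodimers, i.e. four subunits of nominally 16,000 Da.
vm = matthews_coefficient(volume, n_asym_units=2, mass_per_asym_unit_da=4 * 16000)
solvent = solvent_fraction(vm)

print(f"Vm = {vm:.3f} A^3/Da, solvent fraction = {solvent:.3f}")
```

With these nominal inputs the result is close to the reported 2.20 Å³ Da⁻¹ and 44.2%; the small residual difference depends on the exact subunit mass used.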
Structure Determination and Refinement. The crystal structures of native CEL-I and the CEL-I-GalNAc complex were solved by the molecular replacement method. The search model for native CEL-I was based on the human lithostathine structure (PDB code 1LIT) (29), which shares 28% amino acid sequence identity with CEL-I. Molecular replacement calculations were performed using the program AMoRe (30) from the CCP4 suite (26). The model was refined using the program CNS (31) with noncrystallographic symmetry restraints. Five percent of the reflections were set aside for Rfree calculations (32). Manual fitting of the model was carried out with the program Xfit (33). The final R and Rfree factors for all reflections between 52.7- and 2.0-Å resolution for native CEL-I were 0.140 and 0.179, respectively. The structure of the CEL-I-GalNAc complex was also solved by the molecular replacement method using the native CEL-I coordinates as a search model. The positions and orientations of GalNAc in the complex were clearly shown by the Fo - Fc difference Fourier map at a contour level of 3σ. The model was finally refined by the program REFMAC (34) without noncrystallographic symmetry restraints. The final R and Rfree factors between 33.8- and 1.7-Å resolution for the CEL-I-GalNAc complex were 0.158 and 0.186, respectively. The quality of the final models for native and complexed CEL-I was assessed by Ramachandran plots and analysis of the model geometry with the program PROCHECK (35). The crystallographic statistics are summarized in Table I.
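The R and Rfree values quoted above follow the conventional definition R = Σ||Fo| - |Fc|| / Σ|Fo|, with Rfree evaluated only on the reflections withheld from refinement. The Python sketch below illustrates that bookkeeping under simplifying assumptions: the calculated amplitudes are taken to be already on the scale of the observed ones, the 5% free set is chosen at random with an arbitrary seed, and the amplitudes are toy numbers rather than data from this study. It does not reproduce how CNS or REFMAC select or handle the free set.

```python
import random

def r_factor(f_obs, f_calc):
    """Conventional R = sum(| |Fo| - |Fc| |) / sum(|Fo|), assuming Fc is already scaled to Fo."""
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator

def split_free_set(n_reflections, free_fraction=0.05, seed=0):
    """Return (work, free) index lists; the free reflections are kept out of refinement."""
    rng = random.Random(seed)
    indices = list(range(n_reflections))
    rng.shuffle(indices)
    n_free = round(n_reflections * free_fraction)
    return sorted(indices[n_free:]), sorted(indices[:n_free])

# Toy amplitudes (hypothetical values, for illustration only).
f_obs = [100.0, 80.0, 60.0, 40.0]
f_calc = [90.0, 85.0, 66.0, 38.0]
r_toy = r_factor(f_obs, f_calc)

# Size of a 5% free set for the 59,176 reflections of the complex data set.
work, free = split_free_set(59176)
print(r_toy, len(work), len(free))
```

R computed over the `work` indices corresponds to the working R factor, and the same function applied to the `free` indices gives Rfree.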
Expression of Recombinant CEL-I. Wild-type rCEL-I was constructed from an artificial synthetic gene as reported previously (36). The rCEL-I gene was inserted into the pET-3a vector (Novagen), and

FIG. 1. Stereo view of the ribbon model of CEL-I (protomer A).
The α-helices, β-strands, loops, and intramolecular disulfide bonds are shown in purple, blue, gray, and green, respectively. Bound calcium ions are shown by cyan spheres. Secondary structures were determined by the program PROMOTIF (45). Fig. 1 was drawn by the program MOLSCRIPT (46) and rendered by the program Raster3D (47).

The rCEL-I was pooled and dialyzed against TBS, and the active protein was separated by affinity chromatography through a GalNAc-Cellulofine column (18). DNAs encoding mutant proteins, in which Arg115 and/or Gln70 were replaced by Ala (R115A, Q70A, and Q70A/R115A), were prepared by polymerase chain reaction using two oligonucleotides (30-mers) containing the mutation site and the plasmid containing wild-type rCEL-I DNA as a template. The mutant proteins were purified by the same procedure used for the wild-type protein.
The N-terminal sequences of these proteins were confirmed using a Shimadzu PPSQ-21 protein sequencer.
Hemagglutination Assay. Serial 2-fold dilutions of a sample (50 µl) were mixed with the same volume of a 5% (v/v) suspension of rabbit erythrocytes in the wells of round-bottomed microtiter plates. Incubation was performed in TBS containing 10 mM CaCl₂. The extent of agglutination was examined visually after incubation for 1 h at room temperature. The hemagglutinating activity was expressed as a titer, i.e. the reciprocal of the highest dilution producing detectable agglutination. Hemagglutination inhibition assays were performed by incubating 50-µl aliquots of the protein solutions (titer 2) containing various concentrations of carbohydrates with the same volume of a 5% (v/v) suspension of rabbit erythrocytes in TBS containing 10 mM CaCl₂.
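The titer definition above (the reciprocal of the highest dilution producing detectable agglutination) can be expressed as a short Python sketch. Two points are assumptions and not part of the protocol as stated: the first well is taken to be a 1:2 dilution, and the readout of an inhibition series is summarized as the lowest carbohydrate concentration that abolishes agglutination. The well readings and concentrations are hypothetical.

```python
def hemagglutination_titer(agglutinated, first_dilution=2):
    """Reciprocal of the highest dilution that still shows agglutination.

    agglutinated[i] is True if well i, at dilution 1:(first_dilution * 2**i),
    shows visible agglutination. Returns 0 if no well is positive.
    """
    titer = 0
    for well, positive in enumerate(agglutinated):
        if positive:
            titer = first_dilution * 2 ** well
    return titer

def minimum_inhibitory_concentration(concentrations, agglutinated):
    """Lowest carbohydrate concentration at which agglutination is abolished, or None."""
    inhibiting = [c for c, positive in zip(concentrations, agglutinated) if not positive]
    return min(inhibiting) if inhibiting else None

# Hypothetical plate reading: wells at 1:2, 1:4, 1:8 agglutinate; 1:16 and 1:32 do not.
titer = hemagglutination_titer([True, True, True, False, False])

# Hypothetical inhibition series (arbitrary concentration units).
mic = minimum_inhibitory_concentration([50.0, 25.0, 12.5, 6.25],
                                       [False, False, True, True])
print(titer, mic)
```

Because the readout moves in 2-fold steps, differences between two samples are only resolved to within a factor of two.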

RESULTS AND DISCUSSION
The crystal structure of CEL-I was solved by the molecular replacement method using human lithostathine, which contains a CTLD (29), as a search model. Native CEL-I is composed of one disulfide-linked homodimer (protomers A and B) in each asymmetric unit (Fig. 1). As in other C-type CRDs, the three-dimensional structure of CEL-I consists of two parts, a lower part composed of two α-helices (α1 and α2) and four β-strands (β0, β1, β1′, and β5), and an upper part composed of four β-strands (β2, β2′, β3, and β4), the latter of which contains the calcium-binding sites (Figs. 1 and 2). Regardless of the low sequence similarity (16-30%) with other C-type CRDs and CTLDs, structural comparison with known three-dimensional structures by DALI (37)

(10) were aligned. The colors for the secondary structures are the same as those described in the legend to Fig. 1. Conserved residues are illustrated in red. Conserved residues among both the galactose- and mannose-binding types and the galactose-binding type only of the C-type lectins are shown in yellow and orange, respectively. Cysteine residues widely conserved among CTLDs are shown in green. The structure-based sequence alignment was performed by DALI (37), except for the GalNAc-specific binding starfish lectin. The residues that coordinate with carbohydrate-binding Ca²⁺ ions are boxed in blue. This figure was drawn by the program ALSCRIPT (50) and manually modified.

insertion (residues 36-41) between α1 and β1′ containing a 3₁₀ helix (residues 37-39) (Fig. 3).
There are two intramolecular disulfide bonds in CEL-I (Cys3-Cys14 and Cys31-Cys135). The bond between Cys31 and Cys135 is equivalent to a widely conserved disulfide bond among CTLDs, whereas that between Cys3 and Cys14 is only observed in long form C-type CRDs (38). However, CEL-I does not have a disulfide bond connecting β3 and the loop between β4 and β5, which is well conserved in CTLDs. In CEL-I, the corresponding cysteine residues are replaced by Tyr111 and Ala127 (Fig. 3). The space resulting from the lack of this disulfide bond is filled by Tyr111, which may contribute to the stabilization of the protein structure instead of the corresponding disulfide bond.
The two identical protomers of CEL-I are linked by an interchain disulfide bond between the Cys36 residues (Fig. 4A). In addition to this disulfide bond, there are several interactions between the two subunits around the N-terminal region (residues 1-4), the C-terminal region (residues 137-140), and the insertion loop (residues 36-41) (Fig. 5). Eight hydrogen bonds are present between the two protomers (Gln2-Gln2, Gln2-Gly11, Glu140-Thr5, Tyr34 (protomer A)-Thr38 (protomer B), and Thr38 (protomer A)-Ser35 (protomer B)). In particular, Gln2 appears to be especially important for dimeric interactions, because the Oε1 and Nε2 atoms of its side chain in one subunit make hydrogen bonds to the main-chain amide group of Gln2 and the main-chain carbonyl group of Gly11 in the other subunit, respectively (Fig. 5A). There are also hydrophobic interactions between the two subunits through Pro4, Leu39, Val41, Leu137, and Phe139 (Fig. 5B). Among the available C-type lectin crystal structures, only tunicate lectin TC14 has a homodimeric structure composed of two C-type CRDs (39). The two subunits of the TC14 homodimer associate with each other through noncovalent interactions between β-strands and α-helices (Fig. 4B): two N-terminal β-strands (β1) make an antiparallel β-sheet, and two α-helices (α2) associate by hydrophobic interactions.
Recently, another C-type CRD linked by intermolecular disulfide bonds has been reported in the decameric structure of rattlesnake venom lectin RSL (14). Meanwhile, the coagulation factor-binding proteins also form a disulfide-linked dimer composed of CTLDs, which have long loops to make dimeric interactions (40,41).

Three Ca²⁺ ions (designated CA1, CA2, and CA3) are present in each subunit (Fig. 2). They are bound in a similar way to those of MBP-A (6), lung surfactant protein D (42), DC-SIGN, and DC-SIGNR (11). CA1 is coordinated by the side chains of Asp77, Glu81, Asn104, and Asp110, the main-chain carbonyl oxygen of Glu109, and one molecule of water (Fig. 2A). CA2 is coordinated by the side chains of Gln101, Asp103, Glu109, Asn123, and Asp124 and the main-chain carbonyl oxygen of Asp124 (Fig. 2B). In addition, two hydroxyl groups of MPD, used as a precipitant for crystallization, coordinate with CA2. They also form hydrogen bonds with the side chains of Gln101, Asp103, Glu109, and Asn123, and the main-chain carbonyl oxygen of Asp124. CA3 shares the side chains of Glu81 and Asp110 with CA1, and the other ligands for CA3 are the main-chain carbonyl group of Gly93 in the symmetry-related protomer B (only for CA3 in protomer A) and three water molecules (Fig. 2A).
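The calcium ligands listed above can be collected in a small lookup table, which makes the shared ligands easy to see. The Python sketch below only restates the ligands named in the text for native CEL-I; the labels "HOH" for water and "Gly93*" for the symmetry-related residue are conventions chosen here, and the counts are numbers of listed ligand groups, not coordination numbers (a carboxylate side chain may contribute more than one oxygen).

```python
# Ligands of each Ca2+ site in native CEL-I, as (residue or molecule, group).
calcium_ligands = {
    "CA1": [("Asp77", "side chain"), ("Glu81", "side chain"), ("Asn104", "side chain"),
            ("Asp110", "side chain"), ("Glu109", "main-chain O"), ("HOH", "water")],
    "CA2": [("Gln101", "side chain"), ("Asp103", "side chain"), ("Glu109", "side chain"),
            ("Asn123", "side chain"), ("Asp124", "side chain"), ("Asp124", "main-chain O"),
            ("MPD", "hydroxyl"), ("MPD", "hydroxyl")],
    "CA3": [("Glu81", "side chain"), ("Asp110", "side chain"), ("Gly93*", "main-chain O"),
            ("HOH", "water"), ("HOH", "water"), ("HOH", "water")],
}

NON_PROTEIN = {"HOH", "MPD"}

def protein_ligands(site):
    """Protein-derived ligand groups of one site (waters and MPD excluded)."""
    return {lig for lig in calcium_ligands[site] if lig[0] not in NON_PROTEIN}

def residues(site):
    """Names of the residues contributing any group to one site."""
    return {name for name, _ in protein_ligands(site)}

# Side chains bridging CA1 and CA3, and the residue contacting both CA1 and CA2.
shared_ca1_ca3 = protein_ligands("CA1") & protein_ligands("CA3")
bridging_ca1_ca2 = residues("CA1") & residues("CA2")
print(sorted(shared_ca1_ca3), sorted(bridging_ca1_ca2))
```

In the GalNAc complex described in the text, the two MPD hydroxyl entries of CA2 would be replaced by the 3-OH and 4-OH groups of the sugar.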
There are two CEL-I homodimer molecules (the subunit structure is designated 1A/1B and 2A/2B for each molecule) in the asymmetric unit of the CEL-I-GalNAc complex crystal. Four bound GalNAc molecules coordinate to CA2 with their 3-OH and 4-OH in place of the two hydroxyl groups of MPD found in the native CEL-I crystal (Fig. 6A). These hydroxyl groups also form hydrogen bonds with the side chains of Gln101, Asp103, Glu109, and Asn123. This binding mode is basically consistent with those of the carbohydrate complexes of MBP-C (9) and the GalNAc-binding mutant of MBP (10) (Fig. 6B). Although the indole rings of Trp105 in protomers 1B and 2B are flipped relative to those in protomers 1A and 2A, because of hydrogen bonding with the main-chain carbonyl oxygen of Gln101 in the symmetry-related molecule of protomers 1B and 2B, the Trp105 residues in all protomers make van der Waals contact with the C-6 atom of GalNAc, thereby stabilizing the binding of GalNAc. Tryptophan residues have been found to stabilize the binding of carbohydrates in the Gal/GalNAc-recognizing C-type CRDs (7,13). However, the interaction between Trp105 and the C-6 atom of GalNAc appears to be relatively weak in the case of CEL-I, compared with that of Trp189 in the GalNAc-binding mutant of MBP, which makes van der Waals contacts with the C-3, C-4, C-5, and C-6 carbon atoms of GalNAc (Fig. 6B). In the mutant MBP, five residues (192-196) inserted as a glycine-rich loop (7) appear to lead to closer contact between Trp189 and GalNAc.
In the case of the CEL-I-GalNAc complex, there are two hydrogen bonds between the guanidinium group of Arg115 and the carbonyl oxygen of the acetamido group of GalNAc (Fig. 6C). An arginine residue (Arg122) corresponding to Arg115 in CEL-I has been found in starfish (Asterina pectinifera) lectin, which also specifically binds to GalNAc (43). In addition, the acetamido group of GalNAc also makes van der Waals contacts with the side chains of Gln70 and Asn123 (Fig. 6C). Therefore, these interactions are assumed to be closely related to the high affinity binding of GalNAc to CEL-I. CEL-I binds to the β-anomer of GalNAc, except for protomer 1B, which preferentially binds the α-anomer because of a hydrogen bond between 1-OH of the bound GalNAc and Asp83 in protomer 2A. In protomers 1A and 2A, the terminal NH₂ group of the side chain of Arg115 also makes a hydrogen bond with 1-OH of GalNAc, which preferentially stabilizes the binding of the β-anomer of GalNAc (Fig. 6C). Because Asn123 is essential as a ligand for CA2, we constructed mutants of rCEL-I, in which Arg115 and/or Gln70 were replaced by alanine (R115A, Q70A, and Q70A/R115A), to evaluate the contributions of these residues to GalNAc recognition. As reported previously, the wild-type rCEL-I expressed in E. coli cells exhibited a carbohydrate-binding specificity very similar to that of the native protein, although with slightly lower affinity, in hemagglutination inhibition assays (36). Therefore, we evaluated the carbohydrate-binding specificity of the CEL-I mutants through the inhibition of hemagglutination by several simple carbohydrates. As shown in Table II, hemagglutination was inhibited by galactose-related carbohydrates in the wild-type as well as the mutant rCEL-I. In particular, GalNAc exhibited much stronger inhibition than the other carbohydrates (more than 250-fold in the case of the wild-type). However, inhibition by GalNAc decreased to ~1/8, 1/32, and 1/64 for the R115A, Q70A, and Q70A/R115A mutants, respectively, compared with the wild-type. On the other hand, galactose and other galactose-containing carbohydrates showed only slight changes in their inhibitory profiles, compared with the wild-type. This confirms that Arg115 and Gln70 make significant contributions to the recognition of GalNAc. In particular, Gln70 appears to be more important than Arg115 despite the hydrogen bonds between the latter and GalNAc. As shown in Table II, the Q70A/R115A mutant still had the same affinity for Gal as well as a moderately high affinity for GalNAc. This indicates that Gln70 and Arg115 have little influence on the recognition of the galactose moiety and also suggests that other residue(s), such as Asn123, may be involved in recognizing the acetamido group of GalNAc in addition to Gln70 and Arg115.
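One way to read the 1/8, 1/32, and 1/64 values is as a double-mutant cycle. If the two residues acted independently, the losses would multiply (8 × 32 = 256-fold), whereas the double mutant shows only a 64-fold loss. The Python sketch below converts the fold changes into apparent free-energy terms. It is an illustration and not an analysis performed in this study: it assumes that the fold change in inhibitory potency approximates the fold change in dissociation constant and that the temperature is 298 K, and the underlying assay resolves only 2-fold steps.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)

def ddg_from_fold(fold_loss, temperature_k=298.0):
    """Apparent free-energy penalty (kcal/mol) for an n-fold loss of potency: RT * ln(n)."""
    return R_KCAL * temperature_k * math.log(fold_loss)

# Fold losses of GalNAc inhibition relative to wild-type rCEL-I, from the text.
fold = {"R115A": 8, "Q70A": 32, "Q70A/R115A": 64}
ddg = {name: ddg_from_fold(value) for name, value in fold.items()}

# Expectation if the two substitutions were independent (losses multiply).
expected_if_independent = fold["R115A"] * fold["Q70A"]

# Coupling term of the cycle; a nonzero value means the effects are not additive.
coupling = ddg["Q70A/R115A"] - (ddg["R115A"] + ddg["Q70A"])

for name, value in ddg.items():
    print(f"{name}: {value:.2f} kcal/mol")
print(f"expected if independent: {expected_if_independent}-fold; coupling = {coupling:.2f} kcal/mol")
```

Under these assumptions the coupling term is negative, i.e. the double mutant loses less than the sum of the single-mutant penalties, which is consistent with the two residues acting together on the acetamido group.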
A GalNAc-binding mutant of MBP was constructed based on sequence comparison with rat hepatic lectin, which shows higher affinity for GalNAc than for Gal (44), and its complex structure with GalNAc was reported (10). Its preferential affinity for GalNAc compared with that for Gal is achieved by van der Waals contact between the methyl group of the acetamido group of GalNAc and the Cε1 and Nε2 atoms of His202, which was introduced in place of threonine, the corresponding amino acid in rat hepatic lectin. This MBP mutant attained a 60-fold higher affinity for GalNAc than for Gal, indicating that such van der Waals interactions are very important. As shown in Fig. 6, B and C, the orientation of GalNAc bound to CEL-I is very similar to that in the GalNAc-binding mutant of MBP. In the case of CEL-I, the methyl group of the acetamido group of GalNAc makes van der Waals contact with Gln70 as mentioned above. Therefore, it appears that Gln70 may play a similar role to His202 in the GalNAc-binding mutant of MBP. On the other hand, Gln113, which is located at the corresponding position to His202 in the MBP mutant, may have the potential to interact with the methyl group of GalNAc, although a mutational analysis investigating this was not performed in the present study.
CEL-I exhibits strong cytotoxicity (36), which suggests an actual biological role for this protein as a defense toxin against predators. The toxicity is inhibited in the presence of GalNAc, suggesting that it is mediated by binding to specific carbohydrate chains on the cell surface. One probable mechanism is that the binding of CEL-I to cell surface carbohydrate chains triggers intracellular signaling, leading to cell death. Identification of the natural carbohydrate ligands for CEL-I that are present on the target cell surface seems to be important for understanding the cytotoxicity at the molecular level. Further investigation of the involvement of the residues around the carbohydrate-binding site of CEL-I using recombinant proteins should provide important clues regarding the recognition mechanism for natural ligands on the target cell surface.