Novel Galactose-binding Proteins in Annelida

Lectins of a novel type were found in the phylum Annelida, i.e. in the earthworm, tubifex, leech, and lugworm. The lectins (29–31 kDa) were extracted from the worms without the use of detergent and purified by affinity chromatography on asialofetuin-agarose. On the basis of the partial primary structures of the earthworm Lumbricus terrestris 29-kDa lectin (EW29), degenerate primers were synthesized for use in the reverse transcriptase-polymerase chain reaction. An amplified 155-base pair fragment was used to screen a cDNA library. Four types of full-length clones were obtained, all of which encoded 260 amino acids, but which were found to differ at 29 nucleotide positions. Since three of these differences resulted in non-silent substitutions, EW29 mRNA was considered to be a mixture of at least three distinct polynucleotides encoding the following proteins: Ala44-Gln197-Ile213 (clone 5), Gly44-Gln197-Val213 (clone 7), and Ala44-His197-Ile213 (clones 8 and 9; different at the nucleotide level, but encoding an identical polypeptide). Genomic polymerase chain reaction using DNA from a single worm revealed that a single worm already carried all four types of sequence. The EW29 protein showed two notable features. First, the lectin was composed of two homologous domains (14,500 Da) showing 27% identity with each other. When each of the domains was separately expressed in Escherichia coli, the C-terminal domain was found to bind to asialofetuin-agarose as strongly as the whole protein, whereas the N-terminal domain did not bind and only retardation was observed. EW29 was found to exist as a monomer under non-denaturing conditions. It had significant hemagglutinating activity, which was inhibited by a wide range of galactose-containing saccharides. Second, EW29 contained multiple short conserved motifs, “Gly-X-X-X-Gln-X-Trp.” Similar motifs have been found in many carbohydrate-recognizing proteins from an extensive variety of organisms, e.g. plant lectin ricin B-chain and Clostridium botulinum 33-kDa hemagglutinin. Therefore, these carbohydrate-recognition proteins appear to form a protein superfamily.

Galactose-binding proteins (lectins) represent a prominent group among lectins because statistics have shown that more than 60% of the lectins reported thus far are galactose-specific (1). A possible explanation for such a preference is that galactose was selected as an important "recognition" saccharide especially in higher organisms. In fact, galactose is used more frequently in higher organisms like mammals, whereas glucose and mannose are recognized by microorganisms such as bacteria and fungi. From a glycochemical viewpoint, galactose has a nature inherently distinct from that of glucose, mannose, or fructose. The latter three monosaccharides are related by the "Lobry de Bruyn-Alberda van Ekenstein transformation" (2). It is well established that in N-linked oligosaccharide biosynthesis galactose is incorporated at later stages after removal of glucosyl and mannosyl residues from the common precursor Glc3Man9GlcNAc2 (3,4). These observations led one of the authors to present a hypothesis on the origin of elementary hexoses that galactose is a "latecomer" saccharide relative to glucose and mannose (5). Galactose is exposed at the outermost surface of cells unless it is masked by sialic acids, so that it is easily recognized by various communication molecules through homophilic carbohydrate-carbohydrate (6) or heterophilic carbohydrate-protein interactions (7).
Among galactose-specific lectins, galectins are unique in that all of the members belonging to this family are galactose-specific (8,9). In general, they are soluble and require no metal ion for their activity. Galectins bind most preferentially to lactosamine-containing saccharides of both glycoproteins and glycolipids. Although galectins had long been believed to occur in vertebrates only, they proved to be distributed in much lower organisms as well (10-15). Although the biological roles of galectins are not fully understood, they are supposed to play multiple roles basic to multicellular organisms, such as in development, differentiation, tumorigenesis, metastasis, and apoptosis. In this regard, if galectins are essential for these biological processes, they would be expected to exist in all animal species. However, there have been only a few reports of galectins other than in two animal phyla representing deuterostomes and protostomes, i.e. Vertebrata and Nematoda, respectively. On the other hand, if this is not the case, a possibility emerges that some other galactose-binding lectins compensate for the absence of galectins in such animal phyla.
We have undertaken screening of galectin-like proteins from the phylum Annelida, by employing the same purification strategy as that for galectins, i.e. lactose-specific extraction in the absence of detergent and metal ion, and affinity chromatography on asialofetuin-agarose. As a result, 29-31-kDa lectins were purified from four annelids as follows: earthworm, tubifex (both belonging to the class Oligochaeta), lugworm (Polychaeta), and leech (Hirudinea). These proteins immunologically cross-reacted with one another and had biochemical properties similar to those of galectins with respect to specificity, solubility, and metal independence. Detailed structural analysis including cDNA cloning was carried out for the 29-kDa lectin from the earthworm Lumbricus terrestris. As a result, the earthworm 29-kDa lectin (designated hereafter EW29) proved to be a "tandem repeat"-type lectin, which consists of two tandemly repeated homologous domains (14.5 kDa). Contrary to our expectation, it showed no sequence homology to the known galectins. However, it showed resemblance to some carbohydrate-recognizing proteins such as ricin B-chain and C. botulinum hemagglutinin. It also contained conserved multiple repeats of short motifs.

EXPERIMENTAL PROCEDURES
Purification of Annelid Lectins-Lectins were extracted from worms by essentially the same procedure previously described for galectins (16,17); briefly, worms were disrupted by homogenization in a Polytron (Kinematica) with 5 volumes of cold EDTA-MEPBS (4 mM β-mercaptoethanol, 2 mM EDTA, 20 mM sodium phosphate, pH 7.2, 150 mM NaCl). After the bulk of soluble proteins had been removed by centrifugation (15,000 rpm, 4°C, 25 min), the lectins were specifically extracted from the precipitate (ppt-1) with EDTA-MEPBS containing 20 mM lactose by shaking for 30 min at 4°C. After centrifugation as above, the obtained extract (sup-2) was extensively dialyzed to remove lactose and then applied to a column of asialofetuin-agarose (bed volume, 10 ml) prepared according to De Waard et al. (18). After washing of the column with EDTA-MEPBS (400 ml), bound protein was eluted with the same buffer containing 20 mM lactose.
Protein concentration was determined by use of the Bio-Rad Protein Assay Dye Reagent. Protein-containing fractions were subjected to sodium dodecyl sulfate-polyacrylamide gel (14%) electrophoresis (SDS-PAGE) under reducing conditions, unless otherwise stated. Protein was visualized with a Wako Silver Stain Kit.
Separation of EW29 by High Performance Gel-permeation Chromatography-The main component (29 kDa) of the affinity purified lectin fraction from earthworms (EW29) was further purified by high performance gel-permeation chromatography on a TSK-G2000SWXL column (7.5 × 300 mm). The column was equilibrated and eluted with EDTA-PBS both in the presence and absence of 20 mM lactose at a flow rate of 0.5 ml/min. Protein elution was monitored by absorbance at 280 nm. For the subsequent hemagglutination assay, lactose was removed by extensive dialysis against EDTA-PBS.
Hemagglutination Assay-Basically, the conventional assay system described by Nowak et al. (19) was used; briefly, 25 µl of serially diluted samples was mixed in each well of a 96-well microtiter V plate with 25 µl of EDTA-PBS, 25 µl of 1% (w/v) bovine serum albumin in saline, and 25 µl of trypsinized rabbit erythrocytes. After the plate had stood at room temperature for 1 h, "dot" (no agglutination) or "mat" (agglutination) formation was judged. For assessment of the inhibitory effect of various mono- and oligosaccharides, 25 µl of maximally diluted lectin solution that gave mat formation and 25 µl of serially diluted inhibitor saccharides were used in place of serially diluted samples and EDTA-PBS, respectively. Minimum concentrations that gave negative dot formation under the above conditions were defined as I50.
Effect of Saccharides on Extraction of Earthworm Lectin-The effect of various saccharides on lectin extraction was investigated as described previously (12); earthworm lectin was extracted from small portions of ppt-1 (equivalent to 1-g wet weight of the worm) in the presence of a 0.1 M concentration of various sugars. After centrifugation, the supernatant solutions were subjected to SDS-PAGE, followed by Western blotting on a nitrocellulose membrane. The lectin was stained by a conventional double antibody method using anti-tubifex lectin antiserum (described below) and horseradish peroxidase-conjugated goat anti-rabbit IgG antiserum (Seikagaku Co., Tokyo). Both antisera were used at a 1,000-fold dilution. For peroxidase detection, POD Immunostain Set or High Sensitive Immunoblotting Kit (both from Wako Chemicals, Tokyo) was used.
Production of Antisera-Antisera were raised in rabbits against either affinity purified tubifex lectin or affinity and gel filtration-purified EW29 by injecting the animals several times at 10-14-day intervals with 0.1-0.2 mg of lectin emulsified with Freund's complete adjuvant. Titer of the produced antisera was evaluated by both dot and Western blotting analyses. The antisera were stored at 4°C in the presence of 0.02% NaN3.
Protein Structural Analyses-Affinity purified lectins from earthworm, tubifex, and leech were further purified by reversed phase chromatography on a TSK TMS250 column (4.6 × 75 mm). Protein was eluted by a linear gradient of acetonitrile (20-60%, v/v) in 0.1% trifluoroacetic acid. The separated lectin fractions were lyophilized, dissolved in 10 mM Tris-HCl, pH 9.0, and digested with Achromobacter protease I (Wako Chemicals, Tokyo). Generated peptides were separated by reversed phase chromatography on a TSK-ODS-80TM column (7.5 × 250 mm) and were analyzed by a protein sequencer (Applied 477A) as described (17). The peptides were designated "Lys"-1, -2, -3 and so on, in the order of elution. In addition, in the case of earthworm 29-kDa lectin, reduction and S-carboxymethylation were also performed prior to the protease digestion. Thus derived peptides were prefixed "Cm-Lys."

Preparation of Genomic DNA from Earthworms-Genomic DNA was prepared from two earthworm species, A. japonica and L. terrestris, by a conventional procedure (20). A few worms were used for the preparation from A. japonica, and a single worm was used for that from L. terrestris to assess the observed gene polymorphism (described below).
cDNA Cloning of EW29-A probe DNA was prepared by means of the polymerase chain reaction (PCR). For amplification of a part of the earthworm lectin gene, four sets of convergent oligonucleotide primers were synthesized based on the determined peptide sequences (Lys-4, Lys-7, Lys-8, and Lys-10), designated as Lys-4F/4R, Lys-7F/7R, Lys-8F/8R, and Lys-10F/10R (see Table IV for detail). PCR was performed by use of all combinations of the above forward (F) and reverse (R) primers and a Takara LA-PCR Kit (94°C, 30 s; 52°C, 1 min; 72°C, 3 min for 35 cycles, then 72°C, 10 min). Genomic DNA prepared from the earthworm A. japonica was used as a template. The amplified DNA (155 bp) obtained with primers Lys-4F and Lys-7R was cloned into pCRII (Invitrogen) by the TA-cloning strategy according to the manufacturer's instructions, and the nucleotide sequence was confirmed with an Applied 373S DNA sequencer.
For cloning of full-length cDNAs, a λZAP cDNA library (kindly provided by Dr. Giebing, Düsseldorf, Germany) was screened with the above 155-bp PCR fragment labeled with [α-32P]dCTP by use of an Amersham Megaprime Kit. A standard plaque hybridization was performed for screening 3 × 10^5 phages as described by Sambrook et al. (20). Eight positive phages were obtained and converted into plasmid pBluescript (Stratagene) by an automatic in vivo excision system according to the manufacturer's instructions. The plasmids were cut with EcoRI and XhoI to confirm the insert size and subjected to sequence analysis using an Applied 373S sequencer.
Production of Recombinant Earthworm Lectins, Wh, Nh, and Ch-Expression plasmids for recombinant EW29 protein (Wh), N-terminal domain (Nh; meaning N-half), and C-terminal domain (Ch) were constructed by a PCR procedure; for the production of Wh, a pair of oligonucleotide primers was designed, i.e. G ATG GCT GGA AGG CCT TTT CTG (Met1-Ala-Gly-Arg-Pro-Phe-Leu; designated Wh-F; see Table IV) and CGAGTGGAGT TTA CTC GGA TTC G (antisense, Glu258-Ser-Glu-Ter; Wh-R), and a full-length fragment (794-bp) encoding 260 amino acids was amplified with a full-length plasmid (clone 5) used as a template. The derived fragment was cloned into pCRII in frame with β-galactosidase α-peptide, and the derived plasmid was used to transform Escherichia coli TOP10F' (Invitrogen). Positive clones were selected by colony immunoblot analysis using specific antiserum against EW29 and horseradish peroxidase-conjugated goat anti-rabbit IgG as described previously (21). The selected E. coli clone was grown to full density in 1 liter of LB medium in the presence of antibiotics at 37°C. Expression of the recombinant protein was induced by adding isopropyl-β-thiogalactoside to give a final concentration of 0.2 mM and incubating the culture for 2 h at the same temperature. The cells were collected and lysed by sonication, and recombinant protein was adsorbed to asialofetuin-agarose, as described previously (10).
For construction of plasmids for expression of individual domains (Nh and Ch), pairs of oligonucleotide primers for an interdomain region were designed, i.e. C ATG AAG CCG AAG TTC TTC TAC ATC (Met129-Lys-Pro-Lys-Phe-Phe-Tyr-Ile; designated Ch-F; underline denotes introduced mutations) and TTA GAA GAA CTT CGG CTT CAA GTG (antisense, His128-Leu-Lys-Pro-Lys-Phe-Phe-Ter; Nh-R). For production of Nh and Ch, primer sets Wh-F/Nh-R and Ch-F/Wh-R were used, respectively. All other procedures were the same as those employed for Wh.
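The amino acid annotations given for the four expression primers can be checked by translating each primer in the stated frame. The following is a minimal sketch, assuming the standard genetic code and taking the primer sequences exactly as printed above; the helper names (`reverse_complement`, `translate`) are illustrative and not part of the original protocol.

```python
# Standard genetic code, built from the conventional T, C, A, G codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Primer sequences as printed in the text (spaces removed).
WH_F = "GATGGCTGGAAGGCCTTTTCTG"
WH_R = "CGAGTGGAGTTTACTCGGATTCG"    # antisense
CH_F = "CATGAAGCCGAAGTTCTTCTACATC"
NH_R = "TTAGAAGAACTTCGGCTTCAAGTG"   # antisense


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq, offset=0):
    """Translate codons from `offset`, stopping after the first stop codon ('*')."""
    protein = []
    for i in range(offset, len(seq) - 2, 3):
        residue = CODON_TABLE[seq[i:i + 3]]
        protein.append(residue)
        if residue == "*":
            break
    return "".join(protein)


print(translate(WH_F, 1))                      # Wh-F, frame starts at ATG
print(translate(reverse_complement(WH_R), 1))  # Wh-R, sense strand
print(translate(CH_F, 1))                      # Ch-F, frame starts at ATG
print(translate(reverse_complement(NH_R)))     # Nh-R, sense strand
```

Under these assumptions the sense-strand readings are Met-Ala-Gly-Arg-Pro-Phe-Leu (Wh-F), Glu-Ser-Glu-Ter (Wh-R), Met-Lys-Pro-Lys-Phe-Phe-Tyr-Ile (Ch-F), and His-Leu-Lys-Pro-Lys-Phe-Phe-Ter (Nh-R), in agreement with the annotations in the text.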
Single Worm Genomic PCR-In order to assess a cause for the polymorphism found in the earthworm lectin cDNAs (described under "Results"), genomic DNA was prepared from a single worm, and genomic PCR was performed with primers Wh-F and Wh-R to amplify a full-length region. After cloning of the derived fragments (0.8 kilobase pairs) into pCRII and transformation of E. coli TOP10F', 9 independent genomic clones were selected and their nucleotide sequences were compared with one another and to those derived from the cDNA cloning.

RESULTS

Identification of Galactose-binding Lectins in Annelids-Based on the concept that galectins are bound to insoluble glycoconjugates but are solubilized with a competing sugar, lactose, we applied the same procedure to purify annelid lectins as employed for galectins (10,16,17) as follows: (i) galactose-specific extraction in the absence of metal ions, and (ii) affinity chromatography on asialofetuin-agarose. A typical result of affinity chromatography on asialofetuin-agarose is shown for earthworm lectin in Fig. 1A, and the results of SDS-PAGE for all four annelid species, i.e. earthworm, tubifex, leech and lugworm, are shown in Fig. 1B. Major molecular species observed after lactose elution from the column were the following: 29 kDa (earthworm), 31 kDa (tubifex), 30 kDa (leech), and 30 kDa (lugworm). The observed sizes were also similar to those reported for either mammalian chimera-type (29-35 kDa) (22) or nematode tandem repeat-type galectins (32 kDa; see Refs. 10, 11, and 13). Since these lectins were obtained in the absence of metal ion, they were considered not to be C-type lectins. The fact that no detergent was necessary for the extraction also excluded the possibility that the annelid lectins are membrane-bound proteins. The yields of purified lectins from the four annelid species are summarized in Table I.
Besides the major molecular species (29-31 kDa), earthworm, tubifex, and leech preparations also showed the presence of multiple smaller fragments of 15-17 kDa (Fig. 1B). However, the following observations suggest that these fragments were derived by proteolysis of the 29-31-kDa species. (i) When purification of earthworm lectin was processed as rapidly as possible (e.g. within 2 days) and special care was taken to keep the homogenate at a low temperature, the proportion of the 16-kDa species in Fig. 1B was considerably decreased (data not shown). (ii) Among the four species of annelids, the amount of smaller fragments was highest in the most fragile worm, tubifex (see Fig. 1B). (iii) The sizes of the smaller fragments were almost half of those of the main species (29-31 kDa). To confirm the above possibility, we isolated the 16-kDa earthworm protein by reversed phase chromatography and compared its peptide sequences with those of EW29. Since both molecular species gave identical peptide sequences (data not shown), at least some of the smaller fragments were considered to be degradation products of the 29-31-kDa proteins, probably due to cleavage at the connecting region between tandemly repeated domains (described below).
Comparison of Structures of Annelid Lectins-In order to elucidate the structural relationship between the annelid lectins, we first attempted Western blotting analysis by using anti-tubifex lectin antiserum. As shown in Fig. 2, all of the lectins purified from the other annelids were also positive, together with the smaller fragments described above. The degree of cross-reactivity with leech (Hirudinea) and lugworm (Polychaeta) proteins was relatively weak, probably reflecting the closer phylogenetic kinship between tubifex and earthworm (both belonging to the same class, Oligochaeta). When crude lectin extracts (sup-2) were used instead of purified fractions, only high molecular species (29-31 kDa) were observed. This is consistent with the above assumption that smaller fragments were digestion products.
For more direct structural comparison, affinity purified lectins from earthworm, tubifex, and leech were further purified by reversed phase chromatography on a TSK-TMS250 column (data not shown) and digested with lysine-specific Achromobacter protease I. The derived peptides were separated on a TSK ODS-80TM column and were analyzed by a protein sequencer. For earthworm lectin, we also carried out reduction and S-carboxymethylation to identify cysteine residues. The results of peptide analyses, summarized in Table II, strongly suggest that these annelid lectins are structurally related. In particular, most of the lectins had multiple similar sequences represented by "Gly-X-X-X-Gln-X-Trp." Moreover, some of the sequences containing this short motif showed the highest similarity; in the case of earthworm lectin, these included His-Gly-Gln-Asp-Ala-Gln-Gln-Trp (Lys-4), His-Gly-Gly-Thr-Asn-Gln-Gln-Trp (Lys-7), and Asn-Gly-Gly-Pro-Asn-Gln-Ala-Trp (Lys-10; conserved residues underlined). This indicates that the annelid lectin contained a significant number of repeats.
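The "Gly-X-X-X-Gln-X-Trp" motif can be expressed as a simple pattern and searched for in one-letter peptide sequences. The sketch below is illustrative only: the three peptides are the earthworm sequences quoted above, converted to one-letter code, and the function name `find_motifs` is an assumption introduced here. A lookahead is used so that overlapping occurrences in a longer sequence would also be reported.

```python
import re

# Gly, any three residues, Gln, any residue, Trp; the lookahead allows overlaps.
MOTIF = re.compile(r"(?=(G.{3}Q.W))")

# Earthworm peptides quoted in the text, in one-letter code.
PEPTIDES = {
    "Lys-4": "HGQDAQQW",   # His-Gly-Gln-Asp-Ala-Gln-Gln-Trp
    "Lys-7": "HGGTNQQW",   # His-Gly-Gly-Thr-Asn-Gln-Gln-Trp
    "Lys-10": "NGGPNQAW",  # Asn-Gly-Gly-Pro-Asn-Gln-Ala-Trp
}


def find_motifs(sequence):
    """Return (0-based start, matched segment) for every motif occurrence."""
    return [(m.start(), m.group(1)) for m in MOTIF.finditer(sequence)]


for name, peptide in PEPTIDES.items():
    print(name, find_motifs(peptide))
```

Each of the three peptides contains exactly one occurrence, starting at the glycine in the second position. To admit the Tyr or Phe variants noted in the legend of Table II, the final `W` in the pattern could be replaced by `[WYF]`.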
Effect of Saccharides on Lectin Extraction-Since it became evident that annelid lectins are structurally related, subsequent studies were focused on the earthworm 29-kDa lectin (EW29), because its purification yield was the best among the species examined (Table I). First, various saccharides were tested for their ability to extract the lectin from the earthworm precipitate (ppt-1) as described under "Experimental Procedures." As a result, among various simple saccharides, only galactose, lactose, and melibiose were found to be effective. On the other hand, glucose, α-methylmannoside, L-fucose, maltose, and sucrose had no effect even at the highest concentration used (100 mM; data not shown). As the next step, concentrations of the effective saccharides were changed (i.e. 1, 10, and 100 mM), and the amounts of extracted lectin were compared (Fig. 3). Lactose and melibiose showed almost the same ability to extract the lectin at 10 and 100 mM, whereas galactose showed slightly poorer ability. It is notable that EW29 was extractable with not only lactose, but also galactose and melibiose (Galβ1,6Glc), a linkage isomer of lactose (Galβ1,4Glc). Such a feature has never been observed for galectins.

Oligomeric Structure and Hemagglutinating Activity of EW29-High performance gel filtration analysis was carried out to estimate the molecular organization of EW29 under non-denaturing conditions. The affinity purified earthworm lectin (a mixture of major 29-kDa and minor 15-17-kDa fragments) was applied to a TSK G2000SWXL column in the presence of 20 mM lactose (Fig. 4A), since in its absence the protein elution was considerably retarded. Fractions showing UV absorption (280 nm) were subjected to SDS-PAGE analysis (Fig. 4B). By comparing eluting positions of marker proteins, the 29-kDa lectin was considered to exist as a monomer (calculated mass, 30,000 Da). SDS-PAGE analysis showed that the purified fraction was >95% pure and almost free of the 16-kDa species. We used this fraction as highly purified EW29 for the production of specific antiserum and for the following hemagglutination test.
Thus purified EW29 agglutinated trypsinized rabbit erythrocytes at the minimum concentration of approximately 10 µg/ml (final concentration in the assay, 2.5 µg/ml). This agglutinating ability implies that EW29 has more than one sugar-binding site per 29-kDa polypeptide, as it was proved to exist as a monomer. A panel of saccharides was tested for inhibitory activity toward hemagglutination as described under "Experimental Procedures." No saccharides lacking galactose showed inhibition. This is consistent with the result of lectin extraction as described above. On the other hand, galactose-containing saccharides showed varied inhibitory effects on hemagglutination, which were expressed as I50 in Table III. The results are summarized as follows. (i) As regards monosaccharides, galactose had inhibitory activity similar to that of its N-acetylated (GalNAc) and α-methylated (α-Me-Gal) derivatives, but its β-glycosides were significantly stronger inhibitors than galactose; i.e. Gal, GalNAc, α-Me-Gal < β-Me-Gal < β-pAP-Gal. (ii) Among disaccharides, melibiose showed the same inhibitory power as lactose, but their N-acetyl derivatives, N-acetyllactosamine (type 2) and Galβ1,6GlcNAc as well as another linkage isomer, Galβ1,3GlcNAc (type 1), were still better inhibitors. Notably, similar inhibitory power was also observed for Galβ1,3GalNAc (T antigen). However, the non-reducing synthetic disaccharide thiodigalactoside (Galβ1-S-1βGal) was the best inhibitor among those tested. (iii) L-Fucosylation of lactose at either the galactose (L-Fucα1,2Galβ1,4Glc) or the glucose moiety (Galβ1,4(L-Fucα1,3)Glc) reduced the activity by 1 order of magnitude.
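The relation between the sample concentration and the final concentration in the well follows from the assay volumes given under "Experimental Procedures" (25 µl of sample mixed with three further 25-µl components). A minimal sketch of this arithmetic is shown below; the function name and argument layout are illustrative assumptions.

```python
def final_concentration(stock_conc, sample_ul, other_volumes_ul):
    """Concentration after mixing `sample_ul` of stock with the other well components."""
    total_ul = sample_ul + sum(other_volumes_ul)
    return stock_conc * sample_ul / total_ul


# 25 µl of lectin sample plus EDTA-PBS, BSA solution, and erythrocytes (25 µl each).
well_conc = final_concentration(10.0, 25, [25, 25, 25])
print(well_conc)  # 2.5 (µg/ml), a 4-fold dilution of the 10 µg/ml sample
```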
The above binding features were common to galectins in some ways but significantly distinct in others (8). They were similar in showing (i) moderate affinity to lactose, (ii) significantly higher affinity to both N-acetyllactosamine (type 2) and its linkage isomer Galβ1,3GlcNAc (type 1), and (iii) the strongest affinity to thiodigalactoside among disaccharides. However, EW29 showed strong affinities for monosaccharides, particularly β-Me-Gal and β-pAP-Gal, whereas galectins never have such a preference. EW29 also showed a wider preference for disaccharides such as melibiose and Galβ1,3GalNAc, to which galectins in general do not bind. However, the latter is exceptionally well recognized by rat galectin-4 (23) and nematode 16-kDa galectin. Thus, EW29 is basically galactose-specific but seems to exhibit much wider preference for galactose-containing saccharides than galectins.
cDNA Cloning of EW29-Based on the peptide sequences of EW29 (Lys-4, Lys-7, Lys-8, and Lys-9), four pairs of degenerate primers were synthesized (for their sequences, see Table IV). PCR was performed by using all combinations of these forward and reverse primers, with genomic DNA prepared from the earthworm as a template. A significant amplification product (155 bp) was obtained only when the combination of primers Lys-4F and Lys-7R was used (data not shown). The amplified fragment was cloned into pCRII, and the nucleotide sequence was determined. The deduced amino acid sequence was found to follow the sequence of Lys-4 (used as a forward primer) with no termination codon in frame and also included the determined sequence of Lys-3, "Asp-Val-Val-His-His-Arg-Asn-Asp-Lys" (data included in Fig. 5). This confirmed that the derived fragment was a genuine PCR product corresponding to EW29.
Next, the above 155-bp PCR fragment was used as a probe to screen a λ-ZAP cDNA library of the earthworm L. terrestris (kindly provided by Dr. T. Giebing, Düsseldorf). As a result of screening 3 × 10^5 plaques, 8 positive clones were obtained (designated clones 1, 2, 3, 5, 7, 8, 9 and 10), and their sequences were determined following conversion into plasmid Bluescript as described under "Experimental Procedures." We found the following. (i) Clones 5, 7, 8, and 9 were full-length clones, all of which consisted of 261 codons including ATG (Met) initiation and TAA termination codons. (ii) Clones 1, 3, 5, and 10 contained identical sequences, but their lengths were different (non-full-length clones 1, 3 and 10 started with G48, C76, and G16, respectively; numbers begin at the initiation codon A1TG). Clone 2 was almost identical to clone 5, but differed at two positions, i.e. CGC (clone 2)/CGA (clone 5), both encoding Arg225, and GTC (clone 2)/GTT (clone 5), both encoding Val228. (iii) The remaining clones 7, 8, and 9 were apparently distinct. They differed from clone 5 at 12, 15, and 10 positions, respectively, in the coding region (sequence identities to clone 5: 98.5, 98.1, and 98.7%, respectively). All of these differences were single nucleotide substitutions, and no deletion or insertion was observed.

FIG. 2. Immunological cross-reactivity between purified annelid lectins. Affinity purified lectins from earthworm, tubifex, leech, and lugworm (each 100 ng) were subjected to Western blotting by a conventional double-antibody method using antiserum raised against tubifex lectin as the first antibody and horseradish peroxidase-conjugated goat anti-rabbit IgG as the second antibody, as described under "Experimental Procedures." The antibody binding was visualized by the enhanced chemiluminescence procedure.
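The sequence identities to clone 5 quoted for clones 7, 8, and 9 can be reproduced from the number of differing positions. The sketch below assumes that the identities were computed over the full coding region of 261 codons (783 nucleotides, including the initiation and termination codons); the text does not state the denominator explicitly, so this is an assumption, and the names used are illustrative.

```python
# Assumed denominator: 261 codons including ATG and TAA = 783 nucleotides.
CODING_LENGTH_NT = 261 * 3

# Number of positions at which each clone differs from clone 5 (from the text).
DIFFERENCES_VS_CLONE_5 = {"clone 7": 12, "clone 8": 15, "clone 9": 10}


def percent_identity(n_differences, length=CODING_LENGTH_NT):
    """Percent identity, to one decimal place, for substitution-only differences."""
    return round(100.0 * (length - n_differences) / length, 1)


identities = {clone: percent_identity(n) for clone, n in DIFFERENCES_VS_CLONE_5.items()}
print(identities)
```

Under this assumption the computed values are 98.5, 98.1, and 98.7%, matching the figures given in the text.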
Nucleotide and deduced amino acid sequences of clone 5, as a representative, are presented in Fig. 5. The sequences of clones 5, 7, 8, and 9 are also available with GenBank/EMBL/DDBJ accession numbers AB010783, AB010784, AB010785, and AB010786, respectively.
Most of the above differences between clones were found at the third letter of codons (24 cases out of 29; see Table V). Three of the substitutions were non-silent, so that the clones encoded three distinct polypeptides: Ala44-Gln197-Ile213 (clone 5), Gly44-Gln197-Val213 (clone 7), and Ala44-His197-Ile213 (clones 8 and 9). Calculated molecular masses for these variants were 29,006, 28,978, and 29,015 Da, respectively. The values agreed well with those obtained by SDS-PAGE (29,000 Da). The sequences did not contain any hydrophobic signal sequence for secretion or membrane integration, or any signal for N-linked oligosaccharide attachment (Asn-X-(Ser/Thr)). These characteristics apparently resemble those of galectins. The fact that 50% of the obtained cDNA clones (clones 1, 3, 5, and 10) were identical suggests that the corresponding transcript is more abundant than the transcripts for the others (clones 7, 8, and 9).
Two additional aspects should be mentioned. First, the lectin had no homology to galectins despite having many similar properties described above (solubility, metal independence, basic saccharide specificity, molecular size, absence of sugar chain, and signal sequence). Second, the lectin consisted of two homologous domains (14.5 kDa) having 27% amino acid identity (described in detail under "Discussion"). Homology was observed over the entire region of the 29-kDa polypeptide. However, similarity was more evident between the regions which encompassed the above-mentioned consensus motif Gly-X-X-X-Gln-X-Trp. In such regions, amino acid identities reached as high as 44%.
Single Worm Genomic PCR-To clarify whether the polymorphism described above occurs at the species or individual level, we performed genomic PCR using DNA prepared from a single worm of L. terrestris. The entire coding region was amplified by using newly synthesized primers, Wh-F and Wh-R (Table IV), and the derived fragments (0.8 kilobase pairs) were analyzed after cloning into pCRII. As a result, all of the 9 genomic clones analyzed proved to be correlated to one of the four types of cDNA clones as follows: 1, 2, 3, and 3 genomic clones were assigned to cDNA clones 5, 7, 8, and 9, respectively. This result clearly demonstrates that the observed polymorphism already existed at the individual level.

TABLE II. Summary of peptide sequences derived by lysylendopeptidase digestion of annelid lectins. Numbers on the first amino acid residues of most earthworm peptides are the position numbers (initial Met as position 1). Lowercase letters represent amino acids for which determination is ambiguous. The prefix Cm represents peptides derived by lysylendopeptidase digestion after EW29 was denatured and S-carboxymethylated. Bold letters denote the three conserved amino acid residues in the repetitive motif, GxxxQxW (Y or F). Underlined residues indicate the corresponding nucleotide sequence used as a PCR primer.
NQLwYQDQSG LIr (Mix) GxEIxAYtFK Lys 5 NGPNQrxTIVYmtk Lys 6 NQLwYQDQSGLIRSSLNDFV
a These fractions contained a mixture of two peptides.

Bacterial Expression of Recombinant Earthworm Lectins, Wh, Nh, and Ch-To prove that the obtained cDNAs were derived from mRNAs for functional 29-kDa lectin, we chose one of the cDNAs (clone 5) for an expression experiment in E. coli. As expected, a 35-kDa protein (a fusion product with β-galactosidase α-peptide) was produced and purified by affinity chromatography on asialofetuin-agarose (Fig. 6). However, many smaller fragments (15-17 kDa) were again observed, as in the case of natural proteins (Fig. 1). Since these small fragments were able to bind to the affinity column, they were considered to be either N-terminal or C-terminal domains that retained binding ability even after degradation. To confirm this, we transferred the small fragments that had been adsorbed to asialofetuin-agarose to a polyvinylidene difluoride membrane (indicated by a, b, c, and d in Fig. 6), and subjected them to direct sequence analysis. Partial N-terminal sequences suggested that the bands a, c, and d were C-terminal fragments beginning from Lys105, His127, and Lys132, respectively. Since all these positions were located in the connecting region between N-terminal and C-terminal domains, this result indicates that only the C-terminal domain retained binding activity. On the other hand, the band "b" did not give any significant amino acid peak even when a sufficient amount of protein was applied. This might be attributable to artificial N-terminal cyclization to form pyroglutamate, because many glutamine residues, i.e. Gln110, Gln122, and Gln123, are located in the connecting region.
We further constructed expression plasmids for both individual N-terminal and C-terminal domains of EW29. Two mutant primers, 384F (Ch-5') and 405R (Nh-3'), were synthesized (Table IV). The individual N-half and C-half regions were expressed in E. coli, and their binding properties were tested. The C-terminal domain bound to the column (Fig. 7B) as stably as the whole molecule, whereas the N-terminal domain did not bind (data not shown). However, even in the latter case, elution from the asialofetuin-agarose column was significantly retarded (Fig. 7A). Therefore, we concluded that the C-terminal domain has stronger binding ability than the N-terminal domain, at least toward asialofetuin-agarose, although the latter retains some affinity.

DISCUSSION
When we first chose tubifex (class Oligochaeta) for purification of galactose-binding proteins, multiple protein bands around 15-17 kDa together with a faint 31-kDa band were observed on SDS-PAGE after purification by asialofetuin-agarose chromatography. However, antiserum raised against this mixed fraction detected only the 31-kDa species in the lactose extract of tubifex (data not shown). This implied that the smaller fragments were degradation products generated by proteolysis during purification from the parental 31-kDa protein. This antiserum also detected approximately 30-kDa proteins in the lactose extracts prepared from other annelids, i.e. earthworm (Oligochaeta), lugworm (Polychaeta), and leech (Hirudinea).

FIG. 3. Extraction of earthworm lectin with various saccharides. Earthworm was homogenized with EDTA-MEPBS in the absence of saccharide. After centrifugation, the soluble fraction (sup-1) was removed, and an insoluble fraction (ppt-1) was then divided into 10 portions, from which lectin was extracted with MEPBS containing one of the following saccharides: 100, 10, 1 mM lactose (lanes 1-3, respectively), 100, 10, 1 mM melibiose (lanes 4-6), and 100, 10, 1 mM galactose (lanes 7-9). Each lane contained 10 μg of protein. Glucose, methyl-α-mannoside, L-fucose, maltose, and sucrose had no effect on lectin extraction (data not shown).

FIG. 4. Separation of EW29 by high performance gel-permeation chromatography. A, affinity purified earthworm lectin (50 μg), consisting of 29 kDa and smaller molecular species (15-17 kDa), was applied to a column of TSK G2000SWXL (7.5 × 300 mm) equilibrated with EDTA-PBS containing 40 mM sodium phosphate and 20 mM lactose. Protein was eluted at a flow rate of 0.5 ml/min while monitoring absorbance at 280 nm and was fractionated every 0.5 min from when the first protein elution began (15 min). Marker proteins used for calibration were as follows: bovine serum albumin (66,000 Da), chicken ovalbumin (45,000 Da), bovine carbonic anhydrase B (29,000 Da), and bovine pancreatic ribonuclease A (13,800 Da). B, the result of SDS-PAGE for the fractionated proteins. Electrophoresis was carried out under reducing conditions using 14% gel. Protein was stained with silver.
During the course of this study, Cole and Zipser (24,25) reported the presence of similar galactose-specific lectins, LL16, LL35, and LL63, from the leech Hemopis marmorata. LL16 and LL35 seem to have biochemical properties similar to those of the earthworm lectin investigated in the present study, i.e. in terms of molecular weight, metal independence for sugar binding, and specificity for simple saccharides. As they used detergent to extract the leech lectins, the detergent should have solubilized the endogenous ligand glycoconjugates to which the lectins were bound. They also reported that LL16 and LL35 showed broader sugar binding specificity than LL63; binding of the former lectins to asialofetuin-agarose was inhibited by galactosamine and α-methylgalactoside as in the case of our preparation, whereas that of the latter was not inhibited by these saccharides. Since the reported molecular size (35 kDa) of LL35 was close to those of our annelid lectins (29-31 kDa), and specific antiserum raised against highly purified EW29 reacted significantly with LL35, 3

Although these annelid lectins were prepared essentially by the same strategy as used for galectins, partial peptide analyses suggested that they are not related to galectins but form another protein family having the key consensus motif Gly-X-X-X-Gln-X-Trp. In fact, the earthworm lectin could be extracted with galactose and melibiose as well as lactose, whereas galectins in general have only weak affinity for the former two saccharides. Although we added β-mercaptoethanol to the buffer for lectin extraction, it proved to be unimportant, because its removal did not significantly reduce the recovery or activity of the earthworm lectin.

The result of cDNA cloning of EW29 was striking in several respects. First of all, the lectin proved not to be a member of the galectin family, contrary to our expectation. Moreover, the annelid lectins were suggested to be members of a larger family that includes many other carbohydrate-recognition proteins. At least two groups of proteins have been shown to contain the multiple short consensus motif Gly-X-X-X-Gln-X-Trp (Fig. 8).

FIG. 5. Nucleotide and deduced amino acid sequences of EW29, clone 5. Solid lines with arrowheads represent the locations of gene-specific primers used for nucleotide sequencing (−1F, 247F, 360R, 354F, 405R, and 793R), and dotted lines with arrowheads represent degenerate primers used for the first genomic PCR (Lys-4F, Lys-7R). Initial "ATG" codon and termination codon "TAA" are double-overlined and double-underlined, respectively. Potential poly(A) addition signal "AATAAA" is wavy underlined. Dots above nucleotides denote substitution positions observed between the analyzed cDNA clones. Three amino acids, which are the result of "non-silent" substitutions, are shaded (see Table V for details).

TABLE IV
Oligonucleotide primers used in this study
Numbers in parentheses are nucleotide numbers. Underlines denote introduced mutations to make initiation codon ATG in 384F (Ch-5′) and termination codon TTA (antisense) for 405R (Nh-3′).

FIG. 6. Construction and production of recombinant whole EW29. A, the recombinant earthworm lectin was expressed as a fusion product with β-galactosidase α-peptide in E. coli TOP10F′ cells as described under "Experimental Procedures." B, E. coli cells were lysed by sonication, and the obtained extract was applied to an asialofetuin-agarose column (bed volume, 10 ml) equilibrated with EDTA-MEPBS. After extensive washing of the column, the bound protein was eluted with EDTA-MEPBS containing 20 mM lactose.
C, the peak fraction of the eluted fractions was separated by SDS-PAGE, and the protein was stained with silver.

(Fig. 6), both individual domains (Nh and Ch) were produced. E. coli extracts were applied to an asialofetuin-agarose column, and the flow-through, retarded, and bound fractions were analyzed by SDS-PAGE. A, the binding of Nh; there was no adsorption to the column, but the elution was strongly retarded. B, the binding of Ch; the adsorbed protein was eluted with 20 mM lactose from fraction 16.
(Sambucus sieboldina), S. sieboldina agglutinin (31), and sieboldin-b (32). Modeccin (33), abrin (34), and viscumin (35) are toxins of the same type found in other plants. The second group consists of various hydrolytic enzymes (see Fig. 8): (i) the glycosidases Oerskovia xanthineolytica β-1,3-glucanase (36) and Streptomyces lividans xylanase A (37), and (ii) the proteases Rarobacter faecitabitus protease I (38) and the α subunit of horseshoe crab coagulation factor G (the β subunit is a serine protease; see Ref. 39). Notably, all members of the lectin group are known to be specific for galactose/N-acetylgalactosamine except for S. sieboldina agglutinin. Since S. sieboldina agglutinin binds to sialic acid that is α2,6-linked to galactose (Siaα2,6Gal), it may be a variant that has diverged from an ancestral galactose-specific protein. All of these members, including EW29, consist of two homologous domains (14-16 kDa), both of which retain carbohydrate binding activity in every case examined so far. This fact implies that acquisition of multivalency is critical for preservation of efficient lectin function(s). On the other hand, carbohydrate-binding properties are not known for most of the enzyme group. As the sole exception, the CRD of R. faecitabitus protease I has been shown to have mannose binding activity (38). The possibility that EW29 exists as a larger conjugate via disulfide bridge(s), like ricin and coagulation factor G, is excluded, because Western blotting detected only the 29-kDa species even when β-mercaptoethanol was omitted throughout the experiment.
An x-ray crystallographic study has been carried out for ricin complexed with lactose (40). As described, the ricin B-chain has two galactose-binding sites corresponding to the two homologous domains. The amino acid residues forming the binding sites are as follows: Asp22, Gln35, Trp37, Lys40, and Asn46 for the N-terminal CRD, and Asp234, Asn255, and Tyr248 for the C-terminal CRD (shown as inverted boldface letters in Fig. 8B). These residues are mapped to the first repeat segment of the N-terminal CRD and to the third one of the C-terminal CRD, respectively. Inclusion of tryptophan (or tyrosine) residues at the seventh position of the consensus motif is also intriguing, because such aromatic residues are frequently found in the carbohydrate-recognition sites of various sugar-binding proteins, particularly in galactose-specific lectins (41,42).
Although total amino acid identities between these carbohydrate-recognition proteins were low (<10%), it is noteworthy that these CRDs had striking similarity in their segment architecture; all of the CRDs consisted of three segments, each consisting of 40-50 amino acids (5 kDa) and containing 2 consensus motifs in the latter positions. Since the segment size agrees with a moderate "exon" size (43,44), it is possible to speculate that the original CRD (15 kDa) was the result of several gene duplication events of this ancestral peptide, or "module," which is presumed to have had primitive carbohydrate binding activity. According to this scenario, one group of the descendants might have evolved as "lectins" (30 kDa) by acquiring divalency through further duplication of the 15-kDa CRD, whereas the other group might have evolved into "enzymes" by ligation with peptides having catalytic domain(s). Their CRDs could have been useful, e.g. for facilitated binding to target cells covered by carbohydrate chains, as has been suggested for R. faecitabitus protease I (38). X-ray crystallography studies should support this hypothesis. "Ricin superfamily" or "R-type" lectin (where R stands for ricin and repeat) seems to be an appropriate name for this novel superfamily.
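As an illustration of how the consensus motif Gly-X-X-X-Gln-X-Trp could be located in a protein sequence, the following is a minimal sketch using a regular expression. It allows tyrosine as well as tryptophan at the seventh position, following the remark above about aromatic residues. The function name `find_motifs` and the toy sequence are assumptions introduced only for this example; the toy sequence is not the EW29 sequence.

```python
import re

# Gly-X-X-X-Gln-X-Trp, with Tyr also accepted at the seventh position.
# The lookahead wrapper lets overlapping occurrences be reported as well.
MOTIF = re.compile(r"(?=(G...Q.[WY]))")


def find_motifs(seq):
    """Return (1-based start position, heptapeptide) for each motif hit."""
    return [(m.start() + 1, m.group(1)) for m in MOTIF.finditer(seq.upper())]


# Hypothetical toy sequence for illustration only (NOT EW29).
toy = "MAGKLVQSWTTGPAEQNYKK"
hits = find_motifs(toy)
for start, peptide in hits:
    print(start, peptide)
```

Counting the hits per 40-50-residue segment would be one simple way to check the "two motifs per segment" architecture on a real sequence.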
The present results on cDNA (genome) cloning showed that EW29 is encoded by multiple genes, i.e. genome polymorphism. The analyzed 8 cDNA and 9 genomic clones completely overlapped except for cDNA clone 2 (Table V), which differed from the major cDNA type (clones 1, 3, 5, and 10) at only two positions. Therefore, the actual number of EW29 genes in a single worm would not seem to far exceed four (i.e. the four types represented by cDNA clones 5, 7, 8, and 9; or genomic clones 1, 5, 2, and 3, respectively; see Table V). Differences among the four types of genes (transcripts) span the whole coding region at 29 positions, although most of them (26 cases) were silent substitutions. Even the non-silent substitutions were rather conservative ones, and therefore, the three resulting polypeptides would not be expected to show very different properties, although their stability or fine sugar-binding specificity might differ. In this regard, the three closely similar bands detected by anti-tubifex antibody in the extraction experiment (Fig. 3) might represent these three polypeptides. On the other hand, the fact that half of the obtained cDNA clones were identical to clone 5 suggests that the corresponding gene is more strongly expressed than the other genes (corresponding to cDNA clones 7, 8, and 9). Clearly, further functional studies are necessary to evaluate these variant genes.
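The distinction between silent and non-silent substitutions can be sketched as follows: a nucleotide change is silent when both codons translate to the same amino acid under the standard genetic code. The function name `classify` and the codon pairs are hypothetical, chosen only to mirror the kinds of residue changes discussed (Ala/Gly, Gln/His, Ile/Val); they are not the actual EW29 codons.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def classify(codon_a, codon_b):
    """Label a codon substitution as silent or non-silent."""
    aa_a = CODON_TABLE[codon_a.upper()]
    aa_b = CODON_TABLE[codon_b.upper()]
    if aa_a == aa_b:
        return "silent"
    return f"non-silent ({aa_a}->{aa_b})"


# Hypothetical codon pairs for illustration only (NOT taken from EW29).
examples = [("GCT", "GCC"), ("GCT", "GGT"), ("CAA", "CAT"), ("ATT", "GTT")]
results = [classify(a, b) for a, b in examples]
for (a, b), label in zip(examples, results):
    print(a, b, label)
```

Applied to each of the 29 variable positions, a tally of the two labels would reproduce the kind of silent versus non-silent count reported in the text.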
The physiological function(s) of the investigated annelid lectins is not known. However, based on the observation that LL35 is localized in leech photoreceptors in sensory afferents, Zipser and colleagues (45) speculated that the lectin has some function in axon targeting in the central nervous system of this organism. At the moment, it is not known whether the earthworm and other annelids have galectins or not. However, if they do, such galectins would not seem to be dominant relative to the investigated lectins. In this context, the hypothesis described in the Introduction has again emerged, namely that the absence of galectins in some invertebrate phyla can be compensated for by similar but distinct types of galactose-binding lectins. Conversely, it is of interest to note that no homologues of the investigated annelid lectins have been reported so far in vertebrates and nematodes, in which galectins are dominant. Alternatively, galectins may exist in almost all animal species but may not be extractable with simple saccharides such as lactose, as has been demonstrated for some tandem repeat-type galectins (46,47). In any case, the present study has shown the occurrence of relatively abundant, novel galactose-binding lectins in the phylum Annelida. Our results add further evidence of the ubiquitous occurrence of galactose-binding proteins in multicellular organisms and thus evidence for the fundamental importance of "galactose recognition."