Sialylated oligosaccharide-specific plant lectin from Japanese elderberry (Sambucus sieboldiana) bark tissue has a homologous structure to type II ribosome-inactivating proteins, ricin and abrin. cDNA cloning and molecular modeling study.

Bark lectins from the elderberry species belonging to the genus Sambucus have a unique carbohydrate binding specificity for sialylated glycoconjugates containing the NeuAc(α2-6)Gal/GalNAc sequence. To elucidate the structure of the elderberry lectin, a cDNA library was constructed from the mRNA isolated from the bark tissue of Japanese elderberry (Sambucus sieboldiana) with λgt11 phage and screened with anti-S. sieboldiana agglutinin (SSA) antibody. The nucleotide sequence of a cDNA clone encoding full-length SSA (LecSSA1) showed the presence of an open reading frame with 1902 base pairs, which corresponded to 570 amino acid residues. This open reading frame encoded a signal peptide and a linker region (19 amino acid residues) between the two subunits of SSA, the hydrophobic (A-chain) and hydrophilic (B-chain) subunits. This indicates that SSA is synthesized as a preproprotein and post-translationally cleaved into two mature subunits. Homology searching as well as molecular modeling studies unexpectedly revealed that each subunit of SSA has a highly homologous structure to the galactose-specific lectin subunit and ribosome-inactivating subunit of plant toxic proteins such as ricin and abrin, indicating a close evolutionary relationship between these carbohydrate-binding proteins.

Plant lectins with defined carbohydrate binding specificities have been isolated from various origins and used as invaluable tools for the detection, fractionation, and isolation of glycoconjugates. The biological roles of these plant lectins, however, are still not clear compared with some animal or microbial lectins that have been shown to play important roles in biological recognition systems. Structural studies on these molecules may provide useful information not only on the molecular basis for the binding specificity, but also on their biological function through the comparison of their structure with other functional proteins.
Bark lectins from the elderberry species belonging to the genus Sambucus specifically bind to the NeuAc(α2-6)Gal/GalNAc sequence (1-3) and have been used as tools for the analysis of sialylated glycoconjugates (4-8). These lectins are tetrameric glycoproteins consisting of two types of subunits, one with a carbohydrate-binding site and one with an unknown function (1,3). The elderberry bark lectin behaved basically as a galactose-binding lectin, the most common group of plant lectins, but its affinity for the sialylated galactose unit was several thousandfold higher than for galactose itself. This was true only when the sialic acid was attached to the 6-position of the galactopyranose (1,3). This remarkable increase in affinity, caused by the attachment of sialic acid at a specific site of the Gal/GalNAc residue, prompted us to elucidate the molecular basis for such a unique binding specificity as well as the possible evolutionary relationships to other galactose-binding lectins.
We report here that the molecular cloning of a cDNA encoding the bark lectin from Japanese elderberry (Sambucus sieboldiana agglutinin (SSA))¹ as well as molecular modeling studies unexpectedly revealed that each subunit of SSA has a highly homologous structure to the galactose-specific lectin subunit and ribosome-inactivating subunit of plant toxic proteins such as ricin and abrin. This would indicate a close evolutionary relationship between these carbohydrate-binding proteins. We also show that both subunits that constitute the tetrameric lectin molecule are encoded on a single mRNA, which suggests the presence of post-translational processing to form two mature subunits.

EXPERIMENTAL PROCEDURES
Plant Material-The twigs of Japanese elderberry were collected weekly at the beginning of October from the plantation of the National Institute of Forestry and Forest Product Research Institute. The bark tissue removed from the twigs was immediately frozen in liquid nitrogen and kept at −80°C until use.
Preparation of cDNA Library, Cloning, and Nucleotide Sequencing-Total RNA was extracted from the bark tissue using the guanidinium thiocyanate procedure (9), and the poly(A+) RNA fraction was obtained by oligo(dT)-cellulose column chromatography (Pharmacia LKB Biotech, Uppsala, Sweden). The cDNA library was constructed with double-stranded cDNAs with the expression vector λgt11. The cDNA library was amplified with Escherichia coli Y1090 and screened with an affinity-purified anti-SSA antibody until individual plaques could be identified and isolated (at least three times). The nucleotide sequence of subcloned cDNA in the BamHI site of a Bluescript II KS+ plasmid vector was analyzed using an automatic DNA sequencing system (Model 373A, Applied Biosystems, Inc., Foster City, CA). The deleted cDNA mutant clones in the Bluescript vector on both strands were prepared using a deletion kit (Takara Shuzo, Kyoto, Japan) according to the manufacturer's instructions.
* This work was supported in part by Grant-in-aid BMP 95-V-4-2 (Bio Media Program) from the Ministry of Agriculture, Forestry, and Fisheries. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) D25317.
§ To whom correspondence should be addressed. Tel.: 81-298-38-8364; Fax: 81-298-38-8397.
Northern and Southern Blot Analyses-Northern blot analysis was carried out with total RNA (10 μg) from the bark as described (9). Southern blot analysis was performed as described previously (10). The [α-32P]dCTP-labeled cDNA probe was prepared by random primer labeling (11). The blots on the membrane were exposed to Kodak XAR films at −80°C for 2 days with intensifying screens.
Preparation of SSA and Anti-SSA Antibody-A bark lectin (SSA) was purified from the extract of the twigs of Japanese elderberry by affinity chromatography on fetuin-agarose as previously reported (2). The determination of lectin activity was conducted as described (7). The rabbit anti-SSA antiserum was prepared and purified by affinity chromatography on immobilized SSA as described previously (2). The anti-SSA antibody was further purified by passing through an affinity column of immobilized bark extract, which was devoid of SSA itself. This treatment was very effective in removing antibodies against glycan chains and increased the specificity of the antibody.
Cyanogen Bromide Cleavage of SSA-SSA (10 mg) was reduced and alkylated with 4-vinylpyridine as described previously (3). Cleavage of the methionine residues of alkylated SSA was carried out as previously reported (12). The resulting peptides were analyzed by two-dimensional polyacrylamide gel electrophoresis (PAGE) (13) and transferred to a nitrocellulose membrane. The nitrocellulose membrane was stained with Coomassie Brilliant Blue, and the peptide spots were cut and sequenced by a gas-phase protein sequencer (PPSQ-10, Shimadzu, Kyoto, Japan).

Determination of C-terminal Amino Acid Residues of SSA A-chain-The SSA A-chain was isolated from reduced and alkylated SSA by reversed-phase HPLC using an Inertsil 300-C8 column (4.6 × 100 mm; GL Science Inc., Tokyo, Japan) as previously reported (3). After the lyophilized SSA A-chain was dissolved with 5% SDS in 0.5 M citrate buffer (pH 5.6) and kept at 60°C for 20 min, the solution was diluted 10 times with distilled water and incubated with 1:200 (w/w) carboxypeptidase Y (Boehringer Mannheim GmbH) at 25°C. Aliquots of the reaction mixture were taken at appropriate time intervals, added to a sulfosalicylic acid solution (final concentration of 1%), kept for 15 min at 25°C, and centrifuged. The released amino acid in the supernatant was identified using a Hitachi Model L-8500A amino acid analyzer.
In Vitro Translation Experiment-In vitro protein synthesis was carried out with total RNA (2.9 μg) using rabbit reticulocyte lysate (Amersham Japan, Tokyo, Japan) according to the manufacturer's protocol. The radioactive bands were detected by autoradiography.
Molecular Modeling of SSA and Ricin-The molecular model of SSA was constructed using the crystal structure of ricin obtained from the Protein Data Bank as template, which was refined in a crystallographic sense (14-17). The amino acid residues of SSA were aligned with the sequence of ricin to optimize identities. Amino acid substitutions, insertions, and deletions in the structure of ricin were performed using the program Quanta 4.0 on the Iris Indigo workstation (Silicon Graphics Inc.). The model for SSA was subjected to 350 cycles of energy minimization to reduce the number of unfavorable van der Waals contacts.
Analysis of Inhibition of in Vitro Protein Synthesis-Inhibition of in vitro protein synthesis by SSA was analyzed by the method of Rojo et al. (18). Monomeric monovalent SSA (MSSA) derivatives were prepared as described previously (7) and used for the same assay. Varying amounts of SSA or MSSA were mixed with rabbit reticulocyte lysate (Sigma) in 20 mM Tris-HCl buffer (pH 7.8) containing L-[3H]valine (4 μCi/ml), 5 mM dithiothreitol, 50 mM KCl, and 1.5 mM MgCl2 and incubated for 20 min at 30°C. The 3H-labeled translation products were precipitated with alkali/trichloroacetic acid and collected on glass-fiber filters (Whatman GF/A). The radioactivity of the filters was determined by liquid scintillation counting.
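As an illustration of how an IC50 value could be read off a dose-response series from such an assay, the following minimal sketch interpolates linearly on a log10 concentration axis between the two doses that bracket 50% of control incorporation. The function name and the data points are hypothetical and are not taken from this study; the method of Rojo et al. (18) may derive the value differently.

```python
import math

def ic50_log_interp(concs, responses, threshold=50.0):
    """Estimate the concentration giving `threshold` % of control activity.

    concs: increasing inhibitor concentrations (ng/ml).
    responses: residual incorporation at each dose, as % of control.
    Interpolation is linear in log10(concentration).
    """
    points = list(zip(concs, responses))
    for (c_lo, r_lo), (c_hi, r_hi) in zip(points, points[1:]):
        # Find the pair of neighbouring doses that brackets the threshold.
        if r_lo >= threshold >= r_hi and r_lo != r_hi:
            frac = (r_lo - threshold) / (r_lo - r_hi)
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    raise ValueError("responses never cross the threshold")

# Hypothetical dose-response data, for illustration only.
doses = [10, 100, 1000, 10000]        # ng/ml
activity = [95.0, 75.0, 25.0, 5.0]    # % of control incorporation
ic50 = ic50_log_interp(doses, activity)
print(f"IC50 is about {ic50:.0f} ng/ml")
```

Interpolating on a log axis is a common convention for dose-response data spanning several orders of magnitude; a fitted sigmoid would be the more rigorous choice when enough points are available.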

RESULTS
Cloning of cDNA Encoding Japanese Elderberry Bark Lectin (SSA)-Total RNA (4.4 mg from 10 g of fresh tissue) was prepared from the bark tissue of Japanese elderberry (S. sieboldiana) at the beginning of October, when the tissue was actively synthesizing SSA (data not shown). Poly(A+) RNA was obtained by oligo(dT)-cellulose chromatography. In vitro translation using rabbit reticulocyte lysate followed by coprecipitation with anti-SSA antibody showed the synthesis of one major band corresponding to Mr 58,000, along with several weaker bands (Fig. 1).
A bark cDNA library was constructed using the EcoRI site of the expression vector λgt11 phage with the double-stranded cDNA and was screened three times with an affinity-purified anti-SSA antibody. Four positive clones were obtained and subcloned into the BamHI site of the Bluescript II KS+ plasmid vector. Sequence as well as Southern blot analyses of these four cDNA clones showed that they had significant overlapping regions, indicating that they were derived from the same gene. Assembly of the sequences of these four clones yielded an 834-base pair sequence, which corresponded to 278 amino acid residues. Portions of the deduced sequence coincided with those of two internal peptides isolated by CNBr cleavage of SSA, suggesting that these clones encoded the cDNA of SSA. However, none of these four clones contained the region corresponding to the N-terminal sequences of the two subunits of SSA, indicating that they were not full-length clones. To isolate a full-length clone, the cDNA library was rescreened by plaque hybridization using a 243-base pair probe corresponding to the 5′-terminal region of the 834-base pair sequence. Seven positive clones, each 2000 base pairs in size, were isolated. Southern blot analyses indicated that these clones belonged to the same group. Analysis of the clone with the longest insert (LecSSA1) showed a sequence of 1902 base pairs with an open reading frame encoding a polypeptide of 570 amino acid residues (Fig. 2).
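The base-pair and residue counts quoted above can be cross-checked with simple codon arithmetic. The sketch below assumes that the 1902-base pair figure is the full length of the LecSSA1 insert and that the open reading frame ends with a single stop codon; the variable names are illustrative only.

```python
def codons(nucleotides):
    """Return (complete codons, leftover bases) for a stretch of nucleotides."""
    return divmod(nucleotides, 3)

# Assembled sequence of the first four (partial) clones.
partial_contig = codons(834)

# Nucleotides needed to encode the 570-residue precursor.
orf_coding_nt = 570 * 3
# Assumption: one stop codon terminates the open reading frame.
orf_with_stop = orf_coding_nt + 3
# What remains of the 1902-bp insert would be 5' and 3' untranslated sequence.
noncoding_nt = 1902 - orf_with_stop

print(partial_contig, orf_coding_nt, noncoding_nt)
```

Under these assumptions the 834-base pair contig is an exact multiple of three, and the 570-residue reading frame accounts for most, but not all, of the 1902-base pair insert.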
Alignment of the known sequences of the N-terminal regions as well as the internal peptides of SSA revealed that both subunits of SSA were encoded in this open reading frame (Fig. 2). This indicated that a precursor polypeptide synthesized from the mRNA corresponding to this open reading frame was post-translationally cleaved into two subunits. From the N-terminal sequences of the two subunits of SSA (3), it was shown that the hydrophobic subunit (SSA A-chain) was encoded at the 5′-terminal side of the cDNA and that the hydrophilic subunit (SSA B-chain) was encoded at the 3′-terminal side. To determine the coding region for the first subunit, the SSA A-chain, the C-terminal sequence of the mature A-subunit was analyzed. Reduced and alkylated A-subunit was isolated by reversed-phase HPLC using conditions similar to those reported previously (3). Carboxypeptidase Y treatment of the A-subunit liberated serine, threonine, and valine successively, which corresponded to the sequence of Val287-Ser289 in the structure of the precursor polypeptide (Fig. 2). Combining this information with the known N-terminal sequences of both subunits, coding regions for the A- and B-subunits were determined as Val29-Ser289 and Gly309-Ala570, respectively. These results also showed the presence of the linker peptide portion, Ser290-Arg308, between the A- and B-subunits.
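The residue boundaries given above can be written down as a small segment map and checked for consistency. This is only a bookkeeping sketch: the segment labels are chosen here for illustration, and the entry for residues 1-28 simply covers the part of the precursor that precedes Val29.

```python
# Inclusive residue ranges in the 570-residue precursor (numbering as in Fig. 2).
segments = {
    "N-terminal presequence": (1, 28),   # residues preceding Val29
    "A-chain": (29, 289),                # Val29-Ser289
    "linker": (290, 308),                # Ser290-Arg308
    "B-chain": (309, 570),               # Gly309-Ala570
}

def span_length(span):
    """Number of residues in an inclusive (start, end) range."""
    start, end = span
    return end - start + 1

lengths = {name: span_length(span) for name, span in segments.items()}

# Each segment should begin immediately after the previous one ends.
spans = list(segments.values())
contiguous = all(prev[1] + 1 == nxt[0] for prev, nxt in zip(spans, spans[1:]))

for name, n in lengths.items():
    print(f"{name}: {n} residues")
print("contiguous:", contiguous, "| total:", sum(lengths.values()))
```

The linker comes out at 19 residues, matching the figure quoted in the abstract, and the four segments tile the precursor without gaps or overlaps.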
Hydropathy plot analysis (Fig. 3) as well as the N-terminal sequence of the hydrophobic subunit indicated the presence of a signal peptide consisting of 28 amino acid residues (Met1-Arg28; Fig. 2). Thus, SSA is synthesized as a single preproprotein and processed into two mature subunits by post-translational removal of the signal peptide and the internal linker peptide between the two subunits. The hydropathy plot also supported the identification of the region for the hydrophobic and hydrophilic subunits as described above.
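A hydropathy profile of the kind referred to above is usually computed as a sliding-window average of per-residue hydropathy values. The sketch below assumes the Kyte-Doolittle scale and a 7-residue window; the text does not state which scale or window was used for Fig. 3, and the test sequence is a made-up peptide, not the SSA sequence.

```python
# Kyte-Doolittle hydropathy values (assumed scale; not specified in the text).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(sequence, window=7):
    """Mean hydropathy of every full window along a one-letter sequence."""
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[residue] for residue in sequence]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]

# Hypothetical peptide: a hydrophobic stretch followed by a polar one.
toy_sequence = "MLLLAVLLVALA" + "DEKRSNQDEKRS"
profile = hydropathy_profile(toy_sequence, window=7)

# Positive means hydrophobic on this scale, negative means hydrophilic.
print(f"first window: {profile[0]:.2f}, last window: {profile[-1]:.2f}")
```

On such a profile, a signal peptide typically appears as a strongly positive stretch near the N terminus, which is the kind of feature the hydropathy plot is used to support here.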
The calculated molecular weights of the A- and B-subunits were 28,774 and 29,055, respectively. The discrepancy between these values and those previously obtained for the two subunits by SDS-PAGE, 31,000 and 35,000, may be explained by the presence of sugar chains in each subunit (2).

Comparison of SSA Sequence with Other Protein Sequences-The deduced amino acid sequence of the SSA precursor showed extensive homology to the sequences of the precursor proteins of the well known plant toxic proteins abrin and ricin, which are composed of a galactose-specific lectin subunit and a toxin subunit with RNA N-glycosidase activity (Fig. 4). The hydrophilic subunit of SSA (SSA B-chain) showed 46.3 and 44.7% identity to the lectin subunits of ricin and abrin, respectively. The hydrophobic subunit (SSA A-chain) also showed 34.8 and 40.1% identity to the toxin subunits of ricin and abrin, respectively. Moreover, the amino acid residues that have been reported to be highly conserved among these ribosome-inactivating proteins were completely conserved within the SSA sequence, except Gln27, which replaced the conserved arginine residue in the other ribosome-inactivating proteins (Fig. 4). Comparison of the SSA sequence with that of the castor bean lectin (Ricinus communis agglutinin (RCA)) precursor (19) also showed the presence of high homology (45.2% identity). These results indicate that SSA belongs, at least structurally, to the family of these toxin/lectins and probably originated from the same ancestral gene(s), although the carbohydrate binding specificity is significantly different compared with these proteins.
Ricin is a glycoprotein and contains two asparagine-linked oligosaccharides at Asn374 and Asn414 in the B-chain and another at Asn10 in the A-chain (20). Both SSA subunits were also shown to be glycosylated by periodic acid-Schiff staining of the SDS-polyacrylamide gel as well as by lectin blotting using several horseradish peroxidase-labeled lectins (data not shown). The sequence of SSA indicated the presence of six potential N-glycosylation sites in the A-chain and two sites in the B-chain, although none of them coincided with the glycosylation sites of ricin, nor were the real glycosylation sites identified.
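Potential N-glycosylation sites are conventionally predicted by scanning for the sequon Asn-X-Ser/Thr, where X is any residue except proline. The sketch below implements that scan with a lookahead so that overlapping sequons are all reported. The function name and the test peptide are hypothetical, and a match indicates only a potential site, not actual glycosylation.

```python
import re

# Asn, then any residue except Pro, then Ser or Thr.
# The lookahead lets overlapping sequons be found.
SEQUON = re.compile(r"(?=(N[^P][ST]))")

def find_sequons(sequence):
    """Return (1-based position of Asn, tripeptide) for each N-X-S/T sequon."""
    return [(match.start() + 1, match.group(1)) for match in SEQUON.finditer(sequence)]

# Hypothetical peptide for illustration (not the SSA sequence).
toy_peptide = "MNGTANPSLNNSTQ"
sites = find_sequons(toy_peptide)
for position, tripeptide in sites:
    print(position, tripeptide)
```

In this toy peptide the Asn at position 6 is skipped because it is followed by proline, while the adjacent Asn residues at positions 10 and 11 each start a valid sequon.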
Molecular Modeling-A computer-assisted, three-dimensional model of SSA was constructed based on the crystal structure of ricin (14-17). The amino acid residues of both subunits of SSA were aligned with those of the corresponding subunits of ricin to optimize identities. The amino acid substitutions, insertions, and deletions were made on this model, which was further stabilized using an energy minimization program. Fig. 5 shows the α-carbon backbone structures of ricin (panel B) and SSA (panel A) thus obtained, indicating the clear similarity in overall structure of these proteins, even though ricin has 15 additional residues in the whole molecule.
The disulfide linkages in the SSA molecule were generated based on their corresponding position in ricin as the positions of the cysteine residues in the primary structure of SSA coincided well with those in ricin, except for Cys351 in the SSA B-chain. One of these disulfide linkages (Cys284-Cys316) connects the A- and B-subunits, and the other is present within the B-subunit. The free cysteine residue in the B-subunit (Cys351) that was not found in the ricin B-chain was located on the surface of the SSA molecule (Fig. 5A, indicated by the arrow).
Biological Activity of SSA-The effect of intact SSA and of dissociated and stabilized subunits of SSA (MSSA derivatives) on in vitro protein synthesis was analyzed using rabbit reticulocyte lysate. SSA and MSSA showed only very weak inhibitory activity (IC50 = 985 and 540 ng/ml, respectively) in this system (Table I). The inhibitory potencies of SSA and MSSA on in vitro protein synthesis were 2700-5000-fold weaker than that of ricin and 100-200-fold weaker than that of RCA. SSA was also 100-300-fold less potent than nigrin b and ebulin l, ribosome-inactivating proteins isolated from the bark tissue of plants belonging to the same genus, Sambucus (21,22).

DISCUSSION
Molecular cloning of the cDNA for SSA (LecSSA1) revealed that both the hydrophobic and hydrophilic subunits of SSA (SSA A- and B-chains) are encoded as a single preproprotein in the cDNA. From the C- and N-terminal sequences of the SSA A- and B-chains, the sites for post-translational cleavage were determined to lie between Ser289 and Ser290 and between Arg308 and Gly309 (Fig. 2). The presence of similar post-translational processing has been reported for some lectins as well as for some storage proteins. Interestingly, however, the specificity of the putative endopeptidase for the cleavage of the SSA precursor seems to be different from those for ricin (20), RCA (19), abrin (23), other legume lectins (24,25), and plant storage proteins such as soybean glycinin (26) and pea legumin (27). A polypeptide corresponding to the approximate size of the unprocessed A-B chain, although associated with some other minor bands, was detected by in vitro translation experiments using rabbit reticulocyte lysate, which may lack such a specific endopeptidase.
Although there is no direct evidence for the identification of an SSA subunit that carries the carbohydrate-binding site, the extensive homology of the hydrophilic subunit (SSA B-chain) to the lectin subunit of abrin/ricin suggests the presence of the binding site in this subunit. Structural similarity of the B-subunit of SSA to that of ricin in the three-dimensional model (Fig. 5) also supports this. We recently found, by a chemical modification study, that histidine and tyrosine residues in SSA play an important role in the binding to sialylated oligosaccharides.² The absence of a histidine residue in the coding region for the hydrophobic subunit (SSA A-chain) further supports that the carbohydrate-binding site is located in the SSA B-chain.
Each subunit of SSA is connected to the other subunits by disulfide linkage to form the tetrameric glycoprotein molecule. As the SSA A-chain has only one cysteine residue (Cys284) near the C terminus, it must form a disulfide linkage with an SSA B-chain. Otherwise, SSA cannot form the tetrameric molecule connected through disulfide linkage. On the other hand, the sequence of the SSA B-chain contains 10 cysteine residues. For the same reason, one of them should form a disulfide linkage with the SSA A-chain, and at least one other should participate in the cross-linkage with another SSA B-chain. The previous finding that the selective reduction and alkylation of the disulfide linkage between the SSA subunits yielded 1.4 pyridylethylated cysteines per subunit (7) indicates that the A- and B-subunits are connected through one disulfide linkage to form a tetrameric molecule in the form of A-B-B-A. Two cysteine residues, Cys284 and Cys316, which connect the A- and B-subunits, could be assigned because of their conserved position in the corresponding subunits of ricin/abrin. Concerning the cysteine residue responsible for the connection between two B-subunits, the SSA B-chain was shown to contain one additional cysteine residue (Cys351; Fig. 2) that is not present in the ricin B-chain. This additional cysteine residue in the SSA B-chain might be responsible for the connection of two B-subunits as these connections are not present in dimeric molecules such as ricin and abrin. The presence of this cysteine residue on the surface of the three-dimensional model of SSA further supports the possible involvement of this residue in the cross-linkage between the subunits (Fig. 5A).
The striking structural similarity of SSA to ricin/abrin-type ribosome-inactivating proteins as well as to RCA revealed the close evolutionary relationship of SSA to these toxic plant proteins (28,29), although elderberry is taxonomically very far from those plants that produce these toxins (for example, elderberry and R. communis, which produces ricin, belong to different subclasses). However, there are significant differences between the properties of SSA and these proteins. First, despite the fact that the three-dimensional model of the SSA B-chain suggested the presence of two domains corresponding to two carbohydrate-binding domains of the ricin B-chain (Fig. 5B), SSA has only one carbohydrate-binding site, probably in this subunit. Also, the carbohydrate binding specificity of SSA is significantly different from that of these toxic proteins. Ricin/abrin-type toxic proteins or RCA is basically specific to D-galactose residues, but the elderberry bark lectins including SSA are specific to the NeuAc(α2-6)Gal/GalNAc sequence (1,3). Although it is difficult to indicate the amino acid residues responsible for such differences in the binding specificity at present, further comparison of their structure coupled with site-directed mutagenesis and chemical modification and crystallographic/NMR studies will eventually clarify why the elderberry lectins recognize 2,6-linked sialylated oligosaccharides so specifically.
2 H. Kaku and N. Shibuya, unpublished data.
[Figure legend, partial: ... (19). Identical amino acid residues are indicated (*), as are conserved amino acid residues among ribosome-inactivating proteins (#) (31). The amino acid residues of ricin involved in the binding to galactose are also indicated (+) (16).]
The structural similarity of the SSA A-chain to the A-chain of ricin and abrin raised the question of the biological function of this subunit. The A-chain of ricin and abrin is an N-glycosidase that hydrolyzes a very specific site of rRNA, resulting in the inhibition of protein synthesis at the ribosome (30-32). The invariant amino acid residues known for most of the ribosome-inactivating proteins including those of bacterial origin (Shiga-like toxin) were conserved in the SSA A-chain, except for Gln55 (33). However, SSA showed only a very weak (several thousandfold weaker than ricin) inhibitory activity against the in vitro protein synthesis of rabbit reticulocyte lysate (Table I). MSSA (free and stabilized subunits of SSA) also showed only a very limited activity, suggesting that the inability of SSA to terminate protein synthesis does not relate to its tetrameric structure and reflects its intrinsic property. SSA is also quite different from the structurally related tetrameric lectin RCA-120, which was reported to inhibit strongly the in vitro protein synthesis of rabbit reticulocyte lysate (34). In this context, it is noteworthy that a new group of ribosome-inactivating proteins (RIPs), ebulin l and nigrin b, has recently been isolated from the bark tissue of elderberry species (Table I) (21,22). These proteins are composed of two different subunits and have a molecular size corresponding to that of ricin/abrin, although their structure has not yet been elucidated. Interestingly, they could inactivate mammalian ribosomes in vitro, but were inactive on the cell itself. Actually, we recently discovered the presence of a similar RIP in bark extract from S. sieboldiana,³ and this makes it difficult to determine whether the very weak inhibitory activity detected in the SSA preparation reflects the property of SSA itself or the contamination of a trace amount of such an RIP.
Thus, it can be said that, despite the structural similarity to ricin/abrin-type toxins, SSA has only a very weak activity as an RIP or actually does not have such activity. Structural comparison of these proteins with SSA, ricin/abrin, and RCA combined with their biological activity will give more insight into their evolutionary relationship as well as the structure/function relationships of these proteins.
[Table I footnotes: b, MSSA was prepared as previously described (7). c, Data are taken from Citores et al. (34). d, Data are taken from Girbes et al. (22). e, Data are taken from Girbes et al. (21).]