Novel Fold and Carbohydrate Specificity of the Potent Anti-HIV Cyanobacterial Lectin from Oscillatoria agardhii

Oscillatoria agardhii agglutinin (OAA) is a recently discovered cyanobacterial lectin that exhibits potent anti-HIV activity. Until now, only its primary structure and carbohydrate binding data have been available. To elucidate the structural basis for the antiviral mechanism of OAA, we determined the structure of this lectin by x-ray crystallography at 1.2 Å resolution and mapped the specific carbohydrate recognition sites of OAA by NMR spectroscopy. The overall architecture of OAA comprises 10 β-strands that fold into a single, compact, β-barrel-like domain, creating a unique topology compared with all known protein structures in the Protein Data Bank. OAA sugar binding was tested against Man-9 and various disaccharide components of Man-9. Two symmetric carbohydrate-binding sites were located on the protein, and a preference for Manα(1–6)Man-linked sugars was found. Altogether, our structural results explain the antiviral activity of OAA and add to the growing body of knowledge about antiviral lectins.

HIV infection occurs via virus-cell and cell-cell fusion mediated by the two viral envelope glycoproteins gp120 and gp41 (1-3). gp120 interacts with the CD4 receptor of the host cell, inducing a conformational change in gp120 that eventually leads to insertion of the gp41 fusion peptide into the target membrane and thus to membrane fusion (4). The gp120 glycoprotein is remarkably enriched in high mannose N-linked sugars (5), and targeting these sugars may open novel avenues for controlling HIV infection.
Here, we determined the crystal structure of Oscillatoria agardhii agglutinin (OAA), a recently discovered cyanobacterial lectin with potent anti-HIV activity (19), mapped the carbohydrate-binding sites on the protein by NMR spectroscopy, and determined its oligosaccharide specificity. Our results demonstrate that OAA is unique with respect to both its structure and its specific carbohydrate binding. Altogether, this work provides the molecular basis for understanding the potent anti-HIV activity of OAA and may aid in its further development as a diagnostic and pharmacological reagent in the fight against HIV transmission.

EXPERIMENTAL PROCEDURES
OAA Expression, Purification, and Crystallization-A synthetic OAA gene encoding residues 1-133 (20) was cloned into the pET-26b(+) expression vector (Novagen) using NdeI and XhoI restriction sites at the 5′- and 3′-ends, respectively. For protein expression, Escherichia coli Rosetta 2(DE3) cells (Novagen) were transformed with the pET-26b(+)-OAA vector. Cells were initially grown at 37°C, induced with 1 mM isopropyl β-D-thiogalactopyranoside at 16°C, and grown for ~18 h at 16°C for protein expression. Isotopic labeling of the protein for NMR studies was carried out by growth in modified M9 minimal medium containing ¹⁵NH₄Cl and/or [¹³C]glucose as the sole nitrogen and/or carbon source, respectively. For SeMet labeling, cells were grown in modified M9 minimal medium, and SeMet (100 mg/liter) was added to the culture 1 h before induction. For unlabeled samples, cells were cultured in LB medium.
Protein was prepared from the soluble fraction of E. coli: cells were lysed by sonication, cell debris was removed by centrifugation, and the supernatant was dialyzed overnight against 20 mM Tris-HCl buffer (pH 8.5). Further purification involved anion-exchange chromatography on a Q HP column (GE Healthcare) with elution by a linear NaCl gradient (20-1000 mM), followed by gel filtration on Superdex 75 (GE Healthcare) in 50 mM sodium acetate, 100 mM NaCl, and 3 mM NaN₃ (pH 5.0). Purified protein fractions were pooled and concentrated to up to 40 mg/ml using Centriprep devices (Millipore). For crystallization, the buffer was exchanged to 20 mM Tris-HCl, 100 mM NaCl, and 3 mM NaN₃ (pH 8.0); for NMR, it was exchanged to 20 mM sodium acetate, 3 mM NaN₃, and 90:10% H₂O/D₂O (pH 5.0).
Crystallization trials were carried out by the sitting-drop vapor diffusion method at room temperature using drops consisting of 2 µl of protein and 2 µl of reservoir solution. Well diffracting crystals were obtained in 1…

Diffraction Data Collection and Structure Determination of OAA-X-ray diffraction data for the SeMet derivative and native crystals were collected at the SER-CAT sector 22-BM beamline of the Advanced Photon Source at Argonne National Laboratory (Argonne, IL). The multiwavelength anomalous dispersion (MAD) data were collected at wavelengths corresponding to the leading edge, the peak, and a high energy remote point of the selenium anomalous scattering curve (0.9795, 0.9793, and 0.9718 Å, respectively). All diffraction data used for analysis were collected from crystals grown in 1.2 M NaH₂PO₄/0.8 M K₂HPO₄ (pH 5.5), 0.2 M Li₂SO₄, and 0.1 M CAPS (pH 10.5), because they diffracted better than crystals grown in 2.0 M (NH₄)₂SO₄ (pH 5.4), 0.2 M Li₂SO₄, and 0.1 M CAPS (pH 10.5). Data for the SeMet and native crystals were collected to 1.50 and 1.10 Å resolution, respectively. All diffraction data were processed, integrated, and scaled using d*TREK software (21) and converted to MTZ format using the CCP4 package (22).
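The selenium absorption edge is often quoted as a photon energy rather than a wavelength. The short Python sketch below converts the three MAD wavelengths listed above using E = hc/λ, with hc ≈ 12.3984 keV·Å. It is only a unit conversion for orientation and is not part of the authors' processing pipeline.

```python
# Convert the MAD wavelengths quoted in the text (angstroms) to photon energies (keV).
HC_KEV_ANGSTROM = 12.39842  # Planck constant times speed of light, in keV * angstrom

MAD_WAVELENGTHS = {"edge": 0.9795, "peak": 0.9793, "remote": 0.9718}


def wavelength_to_kev(wavelength_angstrom: float) -> float:
    """Return the photon energy (keV) for an X-ray wavelength given in angstroms."""
    return HC_KEV_ANGSTROM / wavelength_angstrom


energies = {name: wavelength_to_kev(wl) for name, wl in MAD_WAVELENGTHS.items()}

for name, wl in MAD_WAVELENGTHS.items():
    print(f"{name:>6}: {wl:.4f} A -> {energies[name]:.3f} keV")
```

The edge and peak wavelengths correspond to about 12.658 and 12.660 keV, close to the selenium K edge. The remote point, at about 12.758 keV, lies roughly 100 eV higher.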
The selenium atom sites and MAD phases were determined automatically using the AutoSol program in Phenix (23). Initial model building was also carried out automatically using the AutoBuild program in Phenix (23). The automatically generated initial model was then examined, edited, and rebuilt using the program Coot (24). In all cases, only half of the protein sequence (66 residues) (see Fig. 1A) was considered, and positions with amino acid differences between the highly homologous sequence repeats were treated as glycines at this stage (see "Results").
The final refined atomic coordinates for the complete OAA structure (code 3OBL) have been deposited in the Protein Data Bank. All structure figures were generated using the program Chimera (25). Pertinent data collection and refinement statistics are summarized in Table 1.
Carbohydrate Binding Studies by NMR Spectroscopy-Binding of Man-9 (V-Labs) was investigated at 25°C by ¹H-¹⁵N HSQC spectroscopy at 600 MHz, using 0.020 mM ¹⁵N-labeled OAA in 20 mM sodium acetate, 3 mM NaN₃, and 90:10% H₂O/D₂O (pH 5.0). Because Man-9 is scarce and very expensive, spectra were recorded at only two titration points, at protein/Man-9 molar ratios of 1:0.8 and 1:2.4 (corresponding to binding site/Man-9 ratios of 1:0.4 and 1:1.2, because each protein molecule carries two binding sites). OAA was also titrated with the disaccharide components of Man-9 at protein/sugar molar ratios of up to 1:20. Two-dimensional ¹H-¹⁵N HSQC spectra were recorded after each addition of carbohydrate.
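The procedure above does not state how, or whether, dissociation constants were extracted from the disaccharide titrations. One common approach is sketched below. It assumes fast exchange on the NMR timescale and two identical, independent sites, so that the site concentration is twice the protein concentration. The data are fitted to the exact 1:1 quadratic binding model by a grid search over Kd, and at each grid point the maximal shift change is solved by linear least squares. The titration values in the usage lines are synthetic and hypothetical, not data from this study.

```python
import numpy as np


def fraction_bound(site_total, ligand_total, kd):
    """Fraction of binding sites occupied under an exact 1:1 binding model.

    Uses the quadratic solution, so ligand depletion is accounted for.
    All concentrations must be in the same unit (mM here).
    """
    s = site_total + ligand_total + kd
    return (s - np.sqrt(s * s - 4.0 * site_total * ligand_total)) / (2.0 * site_total)


def fit_kd(site_total, ligand_total, delta_obs, kd_grid):
    """Grid search over Kd; for each Kd the best delta_max has a closed form.

    Returns (kd, delta_max, residual sum of squares) for the best grid point.
    """
    ligand_total = np.asarray(ligand_total, dtype=float)
    delta_obs = np.asarray(delta_obs, dtype=float)
    best = None
    for kd in kd_grid:
        f = fraction_bound(site_total, ligand_total, kd)
        delta_max = np.dot(f, delta_obs) / np.dot(f, f)  # linear least squares
        rss = float(np.sum((delta_obs - delta_max * f) ** 2))
        if best is None or rss < best[2]:
            best = (float(kd), float(delta_max), rss)
    return best


# Synthetic example: 0.020 mM protein with two sites gives 0.040 mM sites,
# with sugar added up to a 1:20 protein/sugar ratio (0.4 mM).
SITES_MM = 2 * 0.020
ligand_mM = np.array([0.0, 0.02, 0.05, 0.1, 0.2, 0.4])
true_kd, true_dmax = 0.5, 0.10  # hypothetical values, in mM and ppm
delta_ppm = true_dmax * fraction_bound(SITES_MM, ligand_mM, true_kd)

kd_fit, dmax_fit, rss = fit_kd(SITES_MM, ligand_mM, delta_ppm, np.logspace(-3, 1, 2001))
print(f"Kd ~ {kd_fit:.3f} mM, delta_max ~ {dmax_fit:.4f} ppm")
```

A grid search keeps the sketch dependency-light and avoids local minima in the single nonlinear parameter. A nonlinear least-squares routine with error estimates would be the usual choice for real data.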

RESULTS
OAA Three-dimensional Structure-The atomic structure of OAA was determined by x-ray crystallography for the protein comprising Ala-2-Thr-133 (Fig. 1A). The mature protein starts at Ala-2 because Met-1 is completely removed by the E. coli N-terminal methionine aminopeptidase during protein expression (verified by NMR and mass spectrometry). Nevertheless, we retained the numbering of Sato and Hori (20) for consistency.
Initial indexing of diffraction data from the SeMet derivative crystal indicated the tetragonal system, space group I4₁, with unit cell dimensions of a = 59.55 and c = 42.66 Å. Although the data processed well in this space group with a reasonable merging R value, only half of a protein molecule per asymmetric unit could be accommodated, because a full molecule would give an extremely low or negative solvent content according to a Matthews probability calculator. Thus, the tetragonal space group was physically impossible. However, the fact that the data merged and scaled well in this space group, with only half of a molecule in the asymmetric unit, implied that the protein molecule itself must possess a high degree of internal 2-fold symmetry. This was not surprising given the high sequence similarity between the two halves of the OAA sequence, with ~77% identity (51/66 residues) and ~86% similarity (57/66 residues) (Fig. 1A).
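The solvent-content argument above can be reproduced in simplified form with the Matthews coefficient, V_M = V_cell/(Z · n · M). Here Z is the number of asymmetric units per cell (8 for I4₁), n is the number of molecules per asymmetric unit, and M is the molecular mass. The solvent fraction is approximately 1 - 1.23/V_M. This sketch assumes an average residue mass of 110 Da for the 132 residues Ala-2-Thr-133, which approximates the mass of OAA rather than giving it exactly. It is not the probability calculator the authors used.

```python
# Simplified Matthews analysis for the tetragonal SeMet cell (I4_1).
A_EDGE, C_EDGE = 59.55, 42.66   # unit cell edges in angstroms (a = b for tetragonal)
Z_I41 = 8                        # asymmetric units per I4_1 cell
N_RESIDUES = 132                 # Ala-2 to Thr-133
MASS_DA = N_RESIDUES * 110.0     # ASSUMPTION: ~110 Da average residue mass


def matthews(cell_volume, z, molecules_per_asu, mass_da):
    """Return (V_M in A^3/Da, approximate solvent fraction)."""
    vm = cell_volume / (z * molecules_per_asu * mass_da)
    # 1.23 = 1.66 * 0.74, i.e. a protein partial specific volume of ~0.74 cm^3/g
    solvent = 1.0 - 1.23 / vm
    return vm, solvent


cell_volume = A_EDGE * A_EDGE * C_EDGE  # tetragonal: V = a^2 * c
for n in (1.0, 0.5):
    vm, solvent = matthews(cell_volume, Z_I41, n, MASS_DA)
    print(f"{n} molecule(s)/ASU: V_M = {vm:.2f} A^3/Da, solvent ~ {solvent:.0%}")
```

Under these assumptions, one molecule per asymmetric unit gives V_M ≈ 1.30 Å³/Da (solvent ≈ 6%), far below the range usually seen for protein crystals. Half a molecule gives V_M ≈ 2.60 Å³/Da (solvent ≈ 53%), in line with the conclusion drawn in the text.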
As expected from the Matthews probability calculation, solution of the structure from the tetragonal MAD data at 1.50 Å resolution revealed a symmetric molecule with half of a chain per asymmetric unit (supplemental Fig. S1A) and disorder at positions with sequence differences. To determine the entire molecular structure, diffraction data were collected to 1.10 Å resolution from a native crystal and indexed in lower symmetry space groups. These were monoclinic C2, with unit cell dimensions of a = 84.21, b = 42.77, and c = 59.58 Å and β = 134.88°, and triclinic P1, with unit cell dimensions of a = 42.75, b = 47.28, and c = 47.38 Å and α = 78.16°, β = 62.99°, and γ = 63.14°. The half-molecule determined from the MAD data was then used as a molecular replacement probe with the program Phaser (30) for the native data in both the C2 and P1 space groups, and apparent molecular replacement solutions were readily obtained in both cases. Note that two and four half-molecules are expected in the C2 and P1 asymmetric units, respectively. Surprisingly, the two half-molecules in the C2 asymmetric unit are arranged as two independent units (supplemental Fig. S1B) instead of forming an intact protein chain. The symmetry-related half-molecules that complete the unit cell (supplemental Fig. S1C) cannot be connected into an intact single chain either, even when all half-molecules, symmetry operations, and cell translations are considered. In contrast, the molecular replacement solution for the high resolution data in the triclinic cell contained four half-molecules that properly paired up into two intact molecules, likely constituting the correct solution. The resulting electron density map also clearly revealed the few expected sequence differences between the two halves of the molecule, thus confirming the structure.
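As a consistency check, each of the three cells should provide a similar volume per intact OAA molecule. The Python sketch below uses the general triclinic cell-volume formula, which reduces to a²c for the tetragonal cell and to abc·sinβ for the monoclinic cell. It combines this with the molecule counts implied by the text: half a molecule per asymmetric unit for I4₁ (Z = 8), one molecule (two halves) for C2 (Z = 4), and two molecules (four halves) for P1 (Z = 1).

```python
import math


def cell_volume(a, b, c, alpha=90.0, beta=90.0, gamma=90.0):
    """Unit cell volume (A^3) from edges (A) and angles (degrees), general triclinic formula."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1.0 - ca**2 - cb**2 - cg**2 + 2.0 * ca * cb * cg)


# (cell volume, asymmetric units per cell, intact molecules per asymmetric unit)
CELLS = {
    "I4_1 (SeMet)": (cell_volume(59.55, 59.55, 42.66), 8, 0.5),
    "C2 (native)": (cell_volume(84.21, 42.77, 59.58, beta=134.88), 4, 1.0),
    "P1 (native)": (cell_volume(42.75, 47.28, 47.38, 78.16, 62.99, 63.14), 1, 2.0),
}

volume_per_molecule = {
    name: volume / (z * n_mol) for name, (volume, z, n_mol) in CELLS.items()
}
for name, v in volume_per_molecule.items():
    print(f"{name:>13}: {v:,.0f} A^3 per intact molecule")
```

All three settings give roughly 38,000 Å³ per intact molecule, within about 1% of each other. This is consistent with the half-molecule counts stated above.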
This new model was ultimately refined with the REFMAC program (31) in the CCP4 package (22) to a resolution of 1.2 Å, with working and free R factors of 14.0 and 16.9%, respectively (Table 1).
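For reference, the working and free R factors quoted above follow the standard crystallographic definitions. In these definitions, W is the set of reflections used in refinement, T is a small test set of reflections excluded from refinement, and k is a scale factor. The size of the test set used for OAA is not stated in this section.

$$R_{\text{work}} = \frac{\sum_{hkl \in W} \bigl|\,|F_{\text{obs}}(hkl)| - k\,|F_{\text{calc}}(hkl)|\,\bigr|}{\sum_{hkl \in W} |F_{\text{obs}}(hkl)|}, \qquad R_{\text{free}} = \frac{\sum_{hkl \in T} \bigl|\,|F_{\text{obs}}(hkl)| - k\,|F_{\text{calc}}(hkl)|\,\bigr|}{\sum_{hkl \in T} |F_{\text{obs}}(hkl)|}$$

Because the test-set reflections do not influence the refinement, R_free is less susceptible to overfitting than R_work and is normally the somewhat higher of the two.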
Within each sequence repeat, the linkers connecting strands β2 and β3 and strands β7 and β8 pass across the top or bottom of the barrel. Therefore, the first two β-strands of each sequence repeat (β1-β2 and β6-β7) and the next three β-strands (β3-β4-β5 and β8-β9-β10) are positioned on opposite sides of the barrel (Fig. 1, C and D). In this manner, the two β-strands of the first sequence repeat (β1-β2) are positioned between strands β6 and β7 on one side and strands β8, β9, and β10 on the other side of the second sequence repeat (Fig. 1C). Similarly, strands β3, β4, and β5 are located between the two-stranded (β6-β7) and three-stranded (β8-β9-β10) segments of the other sequence repeat (Fig. 1C). This swap of β-strands between the two sequence repeats creates an almost perfect 2-fold symmetric arrangement, with the conformations of the five β-strands in each sequence repeat being extremely similar (Fig. 1D).

FIGURE 1. Amino acid sequence and crystal structure of OAA. A, amino acid sequence alignment of the two sequence repeats in OAA. The first and second sequence repeats comprise Met-1-Leu-66 and Asn-69-Thr-133, respectively, with a Gly-67-Asn-69 three-residue linker between them. All conserved amino acids (51/66) are in black, whereas different but similar residues are shown in magenta. B, ribbon representation of the triclinic P1 crystal structure, including the four bound CAPS molecules in stick representation. Two molecules of OAA are present in the asymmetric unit; the first and second molecules are colored dark and light blue, respectively. C, ribbon representation of the structure comprising the two sequence repeats in one of the monomers. The first and second repeats are colored blue and light gray, respectively. The β-strands are numbered 1-5 (first repeat) and 6-10 (second repeat), and the connecting region between strands β5 and β6 is colored orange. The pseudo 2-fold symmetry axes are indicated. D, best fit superposition of the backbone Cα atoms of the two sequence repeats in ribbon representation using the same color scheme as in C.
Comparison of OAA with Other Protein Structures Reveals Its Novel Fold-An automated search of the Protein Data Bank for related structures with the program DALI (33) did not uncover any proteins with significant overall structural similarity (Z score > 4.3). The search used either the entire structure (the 10-stranded β-barrel) (see Fig. 1C) or the structure of an individual sequence repeat (the chain of five consecutive β-strands) (see Fig. 1D). This result may not be surprising, given that OAA lacks amino acid sequence similarity to any known sequences for which structures are available. The closest match for the entire structure of OAA is quinohemoprotein amine dehydrogenase (QAD; residues 173-272; Protein Data Bank code 1JJU) (34). For QAD, 67 residues can be superimposed onto the OAA domain with a Cα root mean square deviation of ~2.7 Å (Fig. 3A). The next three top-ranked structures identified by DALI are rhizavidin and avidin homologs (supplemental Fig. S3). Their folds are similar to that of QAD but clearly differ from that of OAA.

Table 1 footnotes:
a Data were obtained from the best diffracting crystal according to crystallization conditions (see "Results" for details).
b The MAD data in the tetragonal I4₁ space group were used as a molecular replacement probe using the program Phaser for the native crystal.
c Refinement was carried out using REFMAC5.
d Values in parentheses are for the highest resolution shell.
e r.m.s.d., root mean square deviation.
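The Cα root mean square deviation quoted for the QAD superposition is computed after optimally superimposing the matched atoms. The sketch below implements the Kabsch algorithm in Python with NumPy for two equally sized coordinate sets whose atom pairs are already matched. It does not reproduce DALI, which also determines the residue correspondence (here, the 67 aligned pairs) itself. The coordinates in the usage lines are random test data, not OAA or QAD atoms.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """RMSD (input units) between paired (N, 3) coordinates after optimal superposition."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of paired coordinates")
    p_c = p - p.mean(axis=0)  # remove translation
    q_c = q - q.mean(axis=0)
    h = p_c.T @ q_c  # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # guard against an improper rotation (reflection)
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p_c @ rot.T - q_c
    return float(np.sqrt(np.mean(np.sum(diff**2, axis=1))))


# Usage with random test coordinates (67 "C-alpha" positions, as in the QAD comparison).
rng = np.random.default_rng(0)
ca_a = rng.normal(scale=10.0, size=(67, 3))
rot_true, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(rot_true) < 0:
    rot_true[:, 0] *= -1.0  # make it a proper rotation
ca_b = ca_a @ rot_true.T + np.array([5.0, -3.0, 12.0])
print(f"RMSD after superposition: {kabsch_rmsd(ca_a, ca_b):.2e}")
```

A rigidly rotated and translated copy gives an RMSD near zero. A mirror image does not, because the determinant check keeps the fitted transformation a proper rotation.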
We also submitted the OAA structure to the PDBeFold server to search for structurally similar proteins. Again, no significant match was found (Q score < 0.14). As in the DALI search, the closest matches found by PDBeFold were rhizavidin (Protein Data Bank code 3EW1) and other avidin homologs (supplemental Fig. S3).
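The PDBeFold Q score combines alignment length and RMSD into a single number. As published for the SSM algorithm underlying PDBeFold, N_align is the number of aligned residues and N₁ and N₂ are the lengths of the two compared structures. The PDBeFold documentation should be consulted for implementation details.

$$Q = \frac{N_{\text{align}}^{2}}{\left(1 + \left(\dfrac{\text{RMSD}}{R_0}\right)^{2}\right) N_1 N_2}, \qquad R_0 = 3\ \text{Å}$$

Q equals 1 only for identical structures and falls toward 0 as the alignment shortens or the RMSD grows.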
Although the overall β-barrel shape of OAA is similar to that of QAD, the secondary structure topology is clearly different. In QAD, the eight antiparallel β-strands are arranged contiguously in space, without any crossover loops between strands (Fig. 3B). In OAA, the 10 antiparallel β-strands contain two crossovers: one between strands β2 and β3 and one between strands β7 and β8. Thus, strand β3 is flanked by strands β4 and β8, and strand β7 is flanked by strands β2 and β6 (Fig. 3D). A top view of the 10 antiparallel β-strands in OAA is shown in Fig. 3E. Loops that cross over the top and bottom of the β-barrel are structural hallmarks of the jellyroll fold found in many lectins. Therefore, although no close match to previously determined lectin structures was found, the general architecture is clearly of the lectin jellyroll type. However, given the different connectivities between the β-strands in OAA, we believe that this particular secondary structure arrangement is novel and has not been observed for any lectin to date.
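The strand adjacencies described above can be encoded as a circular order of strands around the barrel. The Python sketch below uses one order consistent with the statements in this section: β1-β2 lie between β6-β7 and β8-β9-β10, β3 is flanked by β4 and β8, and β7 is flanked by β2 and β6. This order was inferred from the text, not read from the deposited 3OBL coordinates, so it is an assumption. The script lists sequence-consecutive strand pairs that are not spatial neighbours (the crossovers). It also checks that mapping strand i onto strand i ± 5, the pseudo-2-fold relating the two repeats, maps the ring onto itself.

```python
# ASSUMPTION: one circular strand order consistent with the adjacencies stated
# in the text; it was not read from the 3OBL coordinates.
BARREL_ORDER = [1, 2, 7, 6, 5, 4, 3, 8, 9, 10]


def neighbours(order):
    """Map each strand to the set of its two spatial neighbours in a closed barrel."""
    adj = {strand: set() for strand in order}
    for i, strand in enumerate(order):
        nxt = order[(i + 1) % len(order)]  # wrap around: the barrel is closed
        adj[strand].add(nxt)
        adj[nxt].add(strand)
    return adj


def crossovers(order, n_strands=10):
    """Sequence-consecutive strand pairs (i, i+1) that are not spatial neighbours."""
    adj = neighbours(order)
    return [(i, i + 1) for i in range(1, n_strands) if (i + 1) not in adj[i]]


def same_ring(r1, r2):
    """True if r2 is a rotation of r1 or of r1 reversed (same circular arrangement)."""
    n = len(r1)
    for candidate in (r1, r1[::-1]):
        doubled = candidate + candidate
        if any(doubled[k:k + n] == r2 for k in range(n)):
            return True
    return False


def is_pseudo_two_fold(order, repeat_len=5):
    """Check that swapping the repeats (strand i <-> i + repeat_len) preserves the ring."""
    mapped = [s + repeat_len if s <= repeat_len else s - repeat_len for s in order]
    return same_ring(order, mapped)


print("crossovers:", crossovers(BARREL_ORDER))
print("pseudo-2-fold:", is_pseudo_two_fold(BARREL_ORDER))
print("up-and-down 8-strand barrel:", crossovers(list(range(1, 9)), n_strands=8))
```

With this order the script reports exactly the two crossovers named in the text, (2, 3) and (7, 8). A simple up-and-down 8-stranded barrel, used here as a hypothetical stand-in for the contiguous QAD arrangement, has none.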
Structural Basis for Man-9 Binding and Carbohydrate Specificity of OAA-The two-dimensional ¹H-¹⁵N HSQC spectrum of OAA (Fig. 4) exhibited well dispersed, narrow resonances indicative of a natively folded structure. Complete backbone assignments were obtained using three-dimensional HNCACB, CBCA(CO)NH, and ¹H-¹⁵N NOESY-HSQC spectra. All expected backbone amide resonances were observed, except for that of Asn-69, which is too broad to be detected under the conditions described under "Experimental Procedures." Note that the amide resonances of Gly-26 and Gly-93 are significantly upfield-shifted in the proton dimension (Fig. 4, inset).
With the structure in hand and NMR assignments available, Man-9 binding and the carbohydrate specificity of OAA can be investigated directly by ¹H-¹⁵N HSQC NMR titrations. Titration data sets were recorded for ¹⁵N-labeled OAA at 0.020 mM. Spectra recorded without Man-9, with ~0.016 mM Man-9 (OAA/Man-9 molar ratio of 1:0.8), and with ~0.048 mM Man-9 (OAA/Man-9 molar ratio of 1:2.4) are provided in Fig. 5. As reported previously (19), OAA possesses two carbohydrate-binding sites. Therefore, the corresponding Man-9/binding site ratios were 0:1 (free), 0.4:1, and 1.2:1.
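The ratios above follow directly from the concentrations. The short Python check below divides each Man-9 concentration by the protein concentration and by the binding-site concentration, using two sites per OAA molecule as stated above. The last line applies the same arithmetic to the 1:20 protein/sugar ratio used for the disaccharide titrations.

```python
PROTEIN_MM = 0.020        # 15N-labeled OAA concentration (mM)
SITES_PER_PROTEIN = 2     # two carbohydrate-binding sites per OAA molecule


def ratios(ligand_mm):
    """Return (ligand/protein, ligand/binding-site) molar ratios, rounded to 2 decimals."""
    per_protein = ligand_mm / PROTEIN_MM
    per_site = ligand_mm / (PROTEIN_MM * SITES_PER_PROTEIN)
    return round(per_protein, 2), round(per_site, 2)


man9_points = {"free": 0.0, "point 1": 0.016, "point 2": 0.048}
for label, conc in man9_points.items():
    print(label, ratios(conc))

# Disaccharides: up to a 1:20 protein/sugar ratio, i.e. 0.4 mM sugar
print("disaccharide max", ratios(20 * PROTEIN_MM))
```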
Chemical shift mapping of ¹H and ¹⁵N resonances for free and Man-9-bound OAA allows direct delineation of the ligand-binding sites on the protein as well as determination of apparent binding affinity. As evidenced by the data in Fig. 5…

FIGURE 4 (legend excerpt). The unusually high field-shifted proton resonance frequencies of Gly-26 and Gly-93 are most likely due to hydrogen bond formation between these amide protons and the π-electron clouds of the indole rings of Trp-90 and Trp-23, respectively (inset). Three residue pairs (Ser-40/Ser-107, Gly-41/Gly-108, and Asp-42/Asp-109, labeled in red) were not unambiguously assigned because their Cα/Cβ chemical shifts are also degenerate. All side chain resonances are colored green (Trp, Arg, Asn, and Gln), and the amide resonance of Asn-69 was too broad for detection under the present sample conditions (20 mM sodium acetate, 3 mM NaN₃, and 90:10% H₂O/D₂O (pH 5.0) at 25°C).
Structural mapping of the Man-9-interacting residues onto the structure of OAA revealed two areas at opposite ends of the molecule (Fig. 6A), one at the top and the other at the bottom of the space-filling representation. All residues involved in Man-9 binding (see above) are colored blue and magenta; the magenta ones are also affected by the Manα(1-6)Man disaccharide. This subset of residues lies along the short, narrow clefts within the two binding surfaces, possibly explaining why OAA recognizes only the Manα(1-6)Man disaccharide and not the other disaccharides of Man-9.
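The text does not specify how perturbed residues were selected. A common approach is sketched below. It combines the ¹H and ¹⁵N shift changes into one value, Δδ = sqrt(Δδ_H² + (α·Δδ_N)²), and flags residues above a cutoff. The weighting factor α = 0.14 and the mean + 1 SD cutoff are assumptions, because both conventions vary between laboratories. The residue labels and shift values in the usage lines are hypothetical.

```python
import math

N_WEIGHT = 0.14  # ASSUMPTION: 15N scaling factor; values of roughly 0.1-0.2 are used in practice


def combined_shift(d_h_ppm, d_n_ppm, n_weight=N_WEIGHT):
    """Weighted combined 1H/15N chemical shift change (ppm)."""
    return math.sqrt(d_h_ppm**2 + (n_weight * d_n_ppm) ** 2)


def perturbed_residues(shift_changes, n_weight=N_WEIGHT, n_sd=1.0):
    """Residues whose combined shift change exceeds mean + n_sd * (population) SD.

    shift_changes maps a residue label to (delta_H, delta_N) in ppm.
    Returns (sorted list of flagged residues, cutoff in ppm).
    """
    values = {res: combined_shift(dh, dn, n_weight) for res, (dh, dn) in shift_changes.items()}
    mean = sum(values.values()) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values.values()) / len(values))
    cutoff = mean + n_sd * sd
    return sorted(res for res, v in values.items() if v > cutoff), cutoff


# Hypothetical shift changes (ppm) for six residues; not data from this study.
example = {
    "r1": (0.01, 0.05), "r2": (0.02, 0.10), "r3": (0.15, 0.80),
    "r4": (0.00, 0.02), "r5": (0.12, 0.60), "r6": (0.01, 0.08),
}
hits, cutoff = perturbed_residues(example)
print(f"cutoff = {cutoff:.3f} ppm; perturbed: {hits}")
```

Under these assumptions, applying the same selection separately to Man-9 and Manα(1-6)Man titration data would yield two residue sets analogous to the blue and magenta sets mapped in Fig. 6A.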

DISCUSSION
We have determined the 1.2 Å x-ray structure of OAA, an HIV-inactivating cyanobacterial lectin. Previous suggestions implied that OAA would be structurally similar to bacterial lectins or to lectins from cyanobacteria or marine algae, such as Eucheuma serra agglutinin 2 (ESA-2) and Myxobacterium hemagglutinin (MBHA) (19). In contrast, the three-dimensional structure of OAA is clearly distinct from those of all known lectins. Indeed, fold similarity searches using the DALI server failed to uncover any homologs. Therefore, to the best of our knowledge, OAA represents a novel lectin fold. The structural novelty of OAA is intimately related to its unique tandem primary sequence repeat.
Inspection of the OAA structure reveals that each sequence repeat does not constitute an individual domain. This is similar to what was found for other lectins that contain sequence repeats, such as cyanovirin-N and members of the CVNH family (35-37). However, the exchange of β-strands between the two sequence repeats differs considerably between OAA and these proteins. In OAA, the interchange of β-strands creates a β-barrel in which all neighboring β-strands are hydrogen-bonded. In cyanovirin-N and other CVNH proteins, the exchange positions two hairpins on top of two three-stranded β-sheets, yielding a bilobal structure.
As described above, two amide resonances of OAA (Gly-26 and Gly-93) exhibit unusual, upfield-shifted proton resonances in the two-dimensional ¹H-¹⁵N HSQC spectrum (Fig. 4). Inspection of the crystal structure reveals that the corresponding amide hydrogens are located directly above the indole rings of Trp-90 and Trp-23 (Fig. 4, inset). Thus, π-hydrogen bonds can form between the amide protons of these two Gly residues and the π-electron clouds of the Trp residues. Such bonds have been discussed previously (38, 39) but are not often observed.
NMR spectroscopy allowed us to determine which OAA residues are involved in Man-9 binding. The two carbohydrate-binding sites are located in two crevices on the protein surface, positioned symmetrically at opposite ends of the molecule. In addition, we determined which disaccharide linkage is recognized by OAA: the protein preferentially binds Manα(1-6)Man over all other disaccharide linkages. A schematic model of OAA/Man-9 recognition at one of the two binding sites is depicted in Fig. 6 (B and C), with the two panels showing the two possible orientations of the GlcNAc moieties. In either case, one or both of the Manα(1-6)Man linkages can be accommodated in the binding site cleft. In site 1, this cleft is formed by the loops connecting strands β1-β2, β7-β8, and β9-β10; in site 2, it is formed by the loops connecting strands β6-β7, β2-β3, and β4-β5.
Because OAA recognizes the Manα(1-3)Manα(1-6)Manβ(1-4)GlcNAcβ(1-4)GlcNAc pentasaccharide, one may speculate that it uses specificity determinants similar to those of MVL, which recognizes the Manα(1-6)Manα(1-3)Manβ(1-4)GlcNAcβ(1-4)GlcNAc pentasaccharide (16). However, inspection of the MVL complex structure revealed quite different sugar determinants. The key components in carbohydrate recognition by MVL are the two GlcNAc moieties, which sit in a deep pocket of MVL, supplemented by additional interactions with the α(1-6)-linked mannoses. This makes the Manα(1-6)Manβ(1-4)GlcNAcβ(1-4)GlcNAc tetrasaccharide the central element for recognition (16). In the present OAA structure, on the other hand, the binding cleft is too short to accommodate a tetrasaccharide. This is borne out by our titration data, which revealed no interaction between the protein and the GlcNAcβ(1-4)GlcNAc disaccharide. Therefore, OAA mainly recognizes either of the two Manα(1-6)Man disaccharide units embedded within the pentasaccharide, resulting in a mode of Man-9 binding distinctly different from that of MVL.

FIGURE 6 (legend excerpt). A, residues whose amide resonances were affected by Man-9 binding are colored blue and magenta in the surface rendering of the OAA structure, whereas those also perturbed upon Manα(1-6)Man disaccharide addition are colored magenta. Note that the magenta-colored residues constitute a subset of the blue residues. Non-perturbed resonances are colored light gray, and Ala-2 is colored orange. B and C, two possible binding scenarios for Man-9. Mannose and GlcNAc moieties of Man-9 are depicted as hexagons and ovals, respectively. The two Manα(1-6)Man linkages in Man-9 that are most likely positioned in the cleft on the surface of OAA are indicated by the light gray dashed lines.
Interestingly, the Man-9-binding sites in OAA are close to the positions of the bound CAPS molecules in the free protein structure. A similar observation was made for cyanovirin-N: in its crystal structure with high mannose oligosaccharides, a well defined CHES molecule was bound in one of the sugar-binding sites (13). Both ionic and hydrophobic interactions between CAPS and OAA are observed, with the cyclohexyl ring and the hydrophobic chain of CAPS packing against the aromatic rings of the Trp-10 or Trp-77 side chains. The sulfonate moiety forms a hydrogen bond with the backbone amide proton as well as a water-mediated hydrogen bond with the backbone carbonyl oxygens of Gly-124 or Gly-57. Because CHES and CAPS are structurally related molecules, similar types of interactions are observed in both cases, although binding of CHES to cyanovirin-N (13) or of CAPS to OAA is very weak. Therefore, the presence of these organic molecules in the structures is simply a result of their inclusion in the crystallization buffers. Nevertheless, in certain cases, they may indicate the positions of the actual sugar ligands. The presence of CAPS molecules close to the Man-9-binding sites on OAA may also explain our unsuccessful attempts to obtain a structure of the OAA·Man-9 complex by either soaking or co-crystallization.
In summary, we have elucidated the tertiary structure of OAA and investigated its carbohydrate specificity. Our results provide insights into the molecular basis and a mechanistic understanding of the protein's potent anti-HIV activity. In turn, our data may be useful for the development of OAA as a novel reagent in the quest to combat HIV transmission.