Recognition of the Thomsen-Friedenreich Pancarcinoma Carbohydrate Antigen by a Lamprey Variable Lymphocyte Receptor*

Background: Variable lymphocyte receptors (VLRs) bind tumor-associated carbohydrates, such as Thomsen-Friedenreich antigen (TFα), with high selectivity. Results: The crystal structure of a VLR-TFα complex coupled with thermodynamic analysis revealed the basis for selectivity. Conclusion: VLRs utilize leucine-rich repeats to recognize glycans with affinity comparable to that of lectins and antibodies. Significance: The VLR-TFα structure provides a template for engineering VLRs to bind biomedically relevant glycans. Variable lymphocyte receptors (VLRs) are leucine-rich repeat proteins that mediate adaptive immunity in jawless vertebrates. VLRs were recently shown to recognize glycans, such as the tumor-associated Thomsen-Friedenreich antigen (TFα; Galβ1–3GalNAcα), with a selectivity rivaling or exceeding that of lectins and antibodies. To understand the basis for TFα recognition by one such VLR (VLRB.aGPA.23), we measured thermodynamic parameters for the binding interaction and determined the structure of the VLRB.aGPA.23-TFα complex to 2.2 Å resolution. In the structure, four tryptophan residues form a tight hydrophobic cage encasing the TFα disaccharide that completely excludes buried water molecules. This cage together with hydrogen bonding of sugar hydroxyls to polar side chains explains the exquisite selectivity of VLRB.aGPA.23. The topology of the glycan-binding site of VLRB.aGPA.23 differs markedly from those of lectins or antibodies, which typically consist of long, convex grooves for accommodating the oligosaccharide. Instead, the TFα disaccharide is sandwiched between a variable loop and the concave surface of the VLR formed by the β-strands of the leucine-rich repeat modules. Longer oligosaccharides are predicted to extend perpendicularly across the β-strands, requiring them to bend to match the concavity of the VLR solenoid.

The normal processes of glycosylation are often severely altered during oncogenic transformation, such that nearly all human tumors express glycoproteins with aberrant glycosylation patterns (1). These changes can affect interactions between tumor cell surface glycans and glycan-binding receptors, which may determine the metastatic potential of the cancer cell. The best-studied tumor-associated carbohydrates are truncated glycans, such as T-nouvelle antigen (GalNAcα) and Thomsen-Friedenreich antigen (TFα; Galβ1-3GalNAcα), which are linked to serine or threonine as O-glycans (1). These saccharides decorate mucin-type glycoproteins in ~90% of human cancers but are absent from nearly all normal tissues (1,2). A number of carbohydrate-binding lectins and antibodies have been developed to detect expression of T-nouvelle antigen, TFα, and other cancer-specific glycans for diagnostic and prognostic applications (2-4). However, most of these reagents display either broad specificity or low affinity for their target glycans and, therefore, have limited clinical utility (5,6).
A variety of approaches have been pursued for obtaining proteins that bind cancer-specific carbohydrates and other biomedically relevant glycans with high selectivity and affinity. Most often, monoclonal antibodies are isolated from mice immunized with the glycan of interest, but this process is laborious, and most human glycans are not particularly immunogenic in mice, probably because their structures are conserved between the two species, which gives rise to self-tolerance. Other strategies include directed evolution of antibodies (7) or lectins (3,8), carbohydrate-binding peptides (9), and glycan-specific aptamers (10), but none has proven generally useful. We recently described a rapid and cost-effective strategy using yeast surface display for isolating monoclonal variable lymphocyte receptors (VLRs) from lamprey that selectively bind various human glycans, including the pancarcinoma antigens T-nouvelle antigen and TFα (11). VLRs are leucine-rich repeat (LRR) proteins that mediate adaptive immunity in jawless vertebrates (lamprey and hagfish). Although VLRs are structurally unrelated to the immunoglobulin-based antibodies of jawed vertebrates, they are able to recognize as broad a spectrum of antigens as antibodies, with comparable affinity and specificity (12).

* This work was supported, in whole or in part, by National Institutes of Health (NIH) Grants AI036900 (to R. A. M.) and AI083892 (to Z. P.) and by the intramural research program of the NIH, NCI (to J. J. B. and J. C. G.). This work was also supported by National Science Foundation Grant MCB-0614672 (to Z. P.). The atomic coordinates and structure factors (codes 4K79 and 4K5U) have been deposited in the Protein Data Bank (http://wwpdb.org/). 1 Both authors contributed equally to this work.
VLR genes are assembled from multiple LRR-encoding cassettes by DNA recombination in a process that generates a vast repertoire of >10^14 unique receptors, which is sufficiently diverse to recognize most, if not all, antigens (13-15).
Among the lamprey VLRs isolated by screening naïve VLR libraries displayed on yeast for glycan binders was VLRB.aGPA.23, which recognized TFα (11). Glycan array profiling showed that VLRB.aGPA.23 possessed a high degree of selectivity for TFα. Indeed, its overall selectivity was superior to that of most lectins and antibodies (5,6). Furthermore, a dimeric form of VLRB.aGPA.23 bound TFα with nanomolar affinity (11). In human tissue microarrays, VLRB.aGPA.23 selectively detected tumor-associated carbohydrate antigens in a variety of adenocarcinomas and squamous cell carcinomas. Moreover, lung cancer patients whose tumors stained with VLRB.aGPA.23 had significantly worse survival than patients whose tumors did not (11). Thus, highly selective VLRs such as VLRB.aGPA.23 provide a powerful means for detecting aberrantly glycosylated proteins on tumor cells, which may lead to an entirely new class of diagnostic and prognostic reagents.
Recent structural studies of VLRs bound to protein or carbohydrate ligands have begun to reveal the features that endow these LRR-based adaptive immune receptors with specificity and affinity for diverse antigens (12). However, the structural database for VLRs, which at present comprises only four VLR-antigen complexes (16-19), is far smaller than that for antibodies, for which hundreds of structures have been determined. Here we report the structure of VLRB.aGPA.23 bound to its tumor-associated ligand TFα. This structure, in conjunction with a thermodynamic analysis of glycan binding by this VLR, provides a basis for understanding the remarkable selectivity of VLRB.aGPA.23 at the atomic level and for engineering VLRs as highly specific reagents for recognizing tumor-associated and other biomedically important carbohydrates.
Isothermal Titration Calorimetry-ITC measurements were carried out using a MicroCal iTC200 titration microcalorimeter. Purified VLRB.aGPA.23 was exhaustively dialyzed against 10 mM phosphate (pH 7.2), 136 mM NaCl, and 4 mM KCl. In a typical experiment, 1- or 2-µl aliquots of 2.46-7.40 mM glycan solution were injected from a 40-µl rotating syringe into the sample cell containing 200 µl of 0.157-0.377 mM VLRB.aGPA.23 solution at 10°C. For each titration experiment, an identical buffer dilution correction was conducted; these heats of dilution were subtracted from the corresponding binding experiment. Equilibrium binding constants (K_b) were obtained by non-linear least-squares fit of the ITC data to a single-site binding model. Data acquisition and analysis were performed with the software package ORIGIN.
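As a rough check on the titration design, the following sketch computes the dimensionless c value (c = n × K_b × [M]) for the two ends of the cell concentration range quoted above. It assumes n = 1, in line with the single-site model, and uses the K_b reported for TFα in the Results (2.21 × 10^4 M^-1). The function name is hypothetical and the calculation is illustrative; it is not part of the authors' ORIGIN analysis.

```python
def wiseman_c(k_b: float, cell_conc_molar: float, n: float = 1.0) -> float:
    """Dimensionless c value for an ITC titration: c = n * K_b * [M]."""
    return n * k_b * cell_conc_molar


# K_b for TF-alpha as reported in the Results (M^-1).
K_B_TF_ALPHA = 2.21e4

# Lower and upper VLR concentrations in the sample cell (mM), from the text.
for conc_mm in (0.157, 0.377):
    c = wiseman_c(K_B_TF_ALPHA, conc_mm * 1e-3)
    print(f"{conc_mm} mM VLR -> c = {c:.1f}")
```

Under these assumptions c falls between roughly 3 and 8, which suggests why millimolar protein concentrations are needed to obtain a fittable isotherm for a binding constant of this size.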
Surface Plasmon Resonance (SPR) Analysis-All SPR binding experiments were performed at 25°C in phosphate-buffered saline (pH 7.4) using a BIAcore T100 biosensor. Biotinylated TFα-PAA (33 nM), aGPA (2.8 nM), and GPA (4.6 nM) were immobilized on streptavidin-coated BIAcore SA chips followed by blocking the remaining streptavidin sites with 1 M biotin solution. An additional flow cell was injected with only free biotin to serve as a blank control. Solutions containing different concentrations of VLRB.aGPA.23 were then serially injected over the chips with immobilized ligands and the blank until SPR signals reached a plateau. Equilibrium affinity measurements were performed at different flow rates (30-50 µl/min). Specific binding data were fitted with a 1:1 Langmuir binding model using BIAevaluation 4.1 software (BIAcore) to calculate K_b values.
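The 1:1 Langmuir model relates the plateau response to analyte concentration C as R_eq = R_max × K_b × C / (1 + K_b × C). The sketch below illustrates this with synthetic, noise-free data. The concentrations, the R_max of 100 response units, and both function names are assumptions made for the example. The fit uses a double-reciprocal linearization solved by ordinary least squares, which stands in for, and is not the same as, the fitting performed by BIAevaluation.

```python
def langmuir_response(conc: float, k_b: float, r_max: float) -> float:
    """Equilibrium response for 1:1 binding at analyte concentration conc (M)."""
    return r_max * k_b * conc / (1.0 + k_b * conc)


def fit_double_reciprocal(concs, responses):
    """Estimate (k_b, r_max) from a least-squares line through 1/R versus 1/C.

    For 1:1 binding, 1/R = 1/r_max + (1 / (r_max * k_b)) * (1/C).
    """
    xs = [1.0 / c for c in concs]
    ys = [1.0 / r for r in responses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept / slope, 1.0 / intercept


# Synthetic example: hypothetical analyte concentrations (M) and R_max (RU).
true_k_b, true_r_max = 1.5e4, 100.0
concs = [10e-6, 25e-6, 50e-6, 100e-6, 200e-6]
responses = [langmuir_response(c, true_k_b, true_r_max) for c in concs]

k_b_fit, r_max_fit = fit_double_reciprocal(concs, responses)
print(f"K_b = {k_b_fit:.3g} M^-1, R_max = {r_max_fit:.3g} RU")
```

With real sensorgram data a direct non-linear fit is generally preferable, because the double-reciprocal transform amplifies noise at low concentrations.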
Crystallization and Data Collection-For crystallization of the VLRB.aGPA.23-TFα complex, VLRB.aGPA.23 (15 mg/ml) was mixed with TFα-Ser in a 1:10 molar ratio. Crystals grew at 25°C in hanging drops in 20% (w/v) polyethylene glycol (PEG) 3350, 4% (v/v) glycerol, and 0.1 M Tris-HCl (pH 7.0). Crystals of the VLRB.aGPA.23-BG-H complex grew in 28.5% (w/v) PEG 3350, 0.6 M NaCl, and 0.1 M Tris-HCl (pH 7.0) from solutions containing a 5-fold molar excess of BG-H. For data collection, both complexes were cryoprotected in 100% paraffin oil before flash-cooling. X-ray diffraction data were recorded in-house at 100 K using a Rigaku R-AXIS IV++ image plate detector. All data were indexed, integrated, and scaled with the program CrystalClear (Rigaku). Both crystals contain four molecules in the asymmetric unit. Data collection statistics are summarized in Table 1.
Structure Determination and Refinement-The structure of the VLRB.aGPA.23-TFα complex was solved by molecular replacement with the Phaser program (20) using VLRB.RBC36 (PDB accession code 3E6J) (16) as a search model. Refinement was performed with RefMac 5.0 (21). Modeling and rebuilding were accomplished with COOT (22) using σ_A-weighted 2F_o − F_c and F_o − F_c electron density maps. The structure of the VLRB.aGPA.23-BG-H complex was solved by molecular replacement with Phaser (20). The search model used in the calculations was the refined VLRB.aGPA.23-TFα structure. Final refinement statistics are presented in Table 1.
We used ITC to determine thermodynamic parameters for the binding of VLRB.aGPA.23 to TFα, TFα-Ser, TFα-Thr, BG-H, and galactose (Table 2). For this purpose, VLRB.aGPA.23 was expressed as a soluble monomeric protein by in vitro folding from bacterial inclusion bodies. As measured by ITC, VLRB.aGPA.23 bound the TFα disaccharide with an equilibrium binding constant (K_b) of 2.21 × 10^4 M^-1 (Fig. 1, A and B). Importantly, this affinity is as high as has been reported for animal or plant lectin binding to disaccharide ligands (24). It is also nearly identical to the affinities determined for TFα-Ser (K_b = 2.04 × 10^4 M^-1) (Fig. 1, C and D) and TFα-Thr (2.43 × 10^4 M^-1) (see Table 2), showing that the amino acid moiety made no appreciable contribution to binding VLRB.aGPA.23, at least in solution. In terms of energetics, all three reactions were exothermic (negative ΔH_b), indicating favorable contacts between the sugars and protein. Whereas the binding of TFα and TFα-Ser was enthalpically driven, that of TFα-Thr was characterized by a large favorable entropy (positive TΔS_b), which compensated for a reduced enthalpy to achieve similar affinity.
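To put these binding constants on more familiar scales, the sketch below converts each K_b to a dissociation constant (K_d = 1/K_b) and to a standard binding free energy (ΔG_b = −RT ln K_b). The temperature is passed as a parameter, and the 298.15 K default is an assumption: the text quotes 10°C for the titrations and 25°C alongside the tabulated values. The function names are hypothetical.

```python
import math

GAS_CONSTANT = 8.314462618  # J mol^-1 K^-1


def dissociation_constant_um(k_b: float) -> float:
    """Dissociation constant in micromolar from a binding constant in M^-1."""
    return 1e6 / k_b


def binding_free_energy_kj(k_b: float, temp_k: float = 298.15) -> float:
    """Standard binding free energy in kJ/mol: dG = -R * T * ln(K_b)."""
    return -GAS_CONSTANT * temp_k * math.log(k_b) / 1000.0


# K_b values (M^-1) quoted in the text for the three TF-alpha ligands.
ligands = {"TFa": 2.21e4, "TFa-Ser": 2.04e4, "TFa-Thr": 2.43e4}

for name, k_b in ligands.items():
    k_d = dissociation_constant_um(k_b)
    dg = binding_free_energy_kj(k_b)
    print(f"{name}: K_d = {k_d:.1f} uM, dG_b = {dg:.2f} kJ/mol")
```

Because ΔG_b depends on the logarithm of K_b, the three ligands differ by well under 1 kJ/mol in free energy, even though the text indicates that the balance between ΔH_b and TΔS_b differs for TFα-Thr.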
VLRB.aGPA.23 bound the BG-H disaccharide ~2-fold less tightly than TFα (K_b = 0.96 × 10^4 M^-1) (Fig. 1, E and F) (Table 2). However, VLRB.aGPA.23 showed no detectable binding to galactose (not shown), which demonstrates the contribution of fucose at the non-reducing end of BG-H and of N-acetylgalactosamine at the reducing end of TFα to ligand recognition by this VLR.
To independently confirm our K_b measurements from ITC, we used SPR. In these experiments a biotinylated synthetic polyacrylamide glycoconjugate of TFα (TFα-PAA) was immobilized on a streptavidin-coated biosensor surface, over which soluble monomeric VLRB.aGPA.23 was then injected (Fig. 2A). Under equilibrium binding conditions, a K_b of 1.5 × 10^4 M^-1 was obtained, which is very similar to the K_b from ITC (2.21 × 10^4 M^-1). The much higher effective affinity from SPR reported previously (K_b = 1.3 × 10^8 M^-1) (11) may be explained by avidity effects. In that study we used a dimeric, not monomeric, form of VLR.aGPA.23 in which the VLR was fused to the Fc region of an antibody. In addition, the configuration of the SPR assay was reversed; multivalent TFα-PAA was flowed over immobilized dimeric (VLR.aGPA.23)_2-Fc protein (11), which allowed multipoint binding that produced a large avidity effect. We also compared the binding of VLRB.aGPA.23 to immobilized aGPA versus native GPA (Fig. 2, B and C). Whereas VLRB.aGPA.23 bound aGPA with K_b = 0.56 × 10^4 M^-1, which is ~3-fold lower than its affinity for TFα, no binding to native GPA was detected, in agreement with previous results showing high selectivity for the desialylated protein (11).
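The size of the avidity effect can be estimated from the two SPR values quoted above. The sketch below computes the fold difference between the dimeric and monomeric measurements and the corresponding apparent free energy gain (ΔΔG = −RT ln of that ratio). It assumes 25°C for both measurements, which is the temperature stated for the present SPR experiments but is not confirmed here for the earlier study. The multivalent value is treated as an effective constant instead of a true single-site K_b, and the function names are hypothetical.

```python
import math

GAS_CONSTANT_KJ = 8.314462618e-3  # kJ mol^-1 K^-1


def fold_enhancement(k_multivalent: float, k_monovalent: float) -> float:
    """Ratio of the effective multivalent constant to the monovalent K_b."""
    return k_multivalent / k_monovalent


def apparent_free_energy_gain_kj(
    k_multivalent: float, k_monovalent: float, temp_k: float = 298.15
) -> float:
    """Apparent ddG in kJ/mol attributable to avidity: -R * T * ln(ratio)."""
    ratio = fold_enhancement(k_multivalent, k_monovalent)
    return -GAS_CONSTANT_KJ * temp_k * math.log(ratio)


K_DIMER_SPR = 1.3e8    # M^-1, dimeric VLR-Fc with multivalent TF-alpha-PAA (11)
K_MONOMER_SPR = 1.5e4  # M^-1, monomeric VLR over immobilized TF-alpha-PAA

ratio = fold_enhancement(K_DIMER_SPR, K_MONOMER_SPR)
ddg = apparent_free_energy_gain_kj(K_DIMER_SPR, K_MONOMER_SPR)
print(f"avidity enhancement ~ {ratio:.0f}-fold, ddG ~ {ddg:.1f} kJ/mol")
```

Under these assumptions the enhancement is close to four orders of magnitude, consistent with the multipoint binding described in the text.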
Overview of the VLRB.aGPA.23-TFα Complex-We determined the crystal structure of VLRB.aGPA.23 bound to TFα-Ser to 2.2 Å resolution by molecular replacement using VLRB.RBC36 (16) as a search model (Table 1; Fig. 3A). The root mean squared deviation in α-carbon positions for the four VLRB.aGPA.23 molecules in the asymmetric unit of the VLRB.aGPA.23-TFα crystal was in the range of 0.14-0.25 Å, indicating close similarity. Unambiguous electron density corresponding to the Galβ1-3GalNAcα moiety of TFα-Ser was found in the binding site, as evident from the 2F_o − F_c electron density map (Fig. 3B). However, no density was observed for the serine residue, suggesting flexibility due to lack of stable contacts with VLRB.aGPA.23. In agreement with this interpretation, TFα and TFα-Ser bound VLRB.aGPA.23 with very similar affinities and enthalpy changes (Table 2). The average temperature factor (B) for the Galβ1-3GalNAcα portion of TFα-Ser was 34 Å^2, compared with 27 Å^2 for main-chain atoms of the protein, indicating that the disaccharide is well ordered in the crystal.
We also determined the structure of VLRB.aGPA.23 in complex with the BG-H disaccharide Fucα1-2Galβ to 1.9 Å resolution (Table 1). The galactose moiety of BG-H was clearly defined in the electron density (Fig. 3C). However, no density was observed for fucose, even though this moiety contributes significantly to binding (see above). The average B factor for the galactose was 35 Å^2, compared with 20 Å^2 for main-chain atoms of VLRB.aGPA.23.
The main structural difference between VLRB.RBC36 and VLRB.aGPA.23 resides in their LRRCT inserts, which differ in both sequence and secondary structure. In VLRB.RBC36, the 10-residue insert forms a β-hairpin, whereas in VLRB.aGPA.23 the 9-residue insert is a loop whose overall shape differs markedly from that of the VLRB.RBC36 insert (Fig. 6B). In VLRB.RBC36, Trp-204 is located at the end of the first β-strand of the LRRCT insert, before the β-hairpin turn, and interacts only with the galactose of BG-H2. The corresponding residue of VLRB.aGPA.23, Trp-187, is located at the tip of the insert loop and interacts with both the galactose and N-acetylgalactosamine of TFα, thereby covering much more of the ligand (Fig. 4A).

The stoichiometry (n) values ranged from 0.989 to 1.09 with uncertainties from 0.4 to 10.0%. The values in parentheses represent uncertainties of fit. The temperature was 25°C.

DISCUSSION
Carbohydrate recognition by proteins is of considerable interest, and understanding the specifics of the molecular recognition process has practical implications for basic research and clinical applications. For carbohydrate-binding proteins to achieve high selectivity, they must meet the unique challenge of discriminating among a vast array of sugar structures arising from the stereochemistry of hydroxyl groups in monosaccharides and different sugar linkage possibilities. A substantial amount of thermodynamic and structural information is available on glycan binding by plant and animal lectins (24,26) and by anti-carbohydrate antibodies (27-31). These studies have shown that lectins and antibodies recognize sugars through the stacking of aromatic residues against the sugar ring and hydrogen bonding of sugar hydroxyls to polar amino acid side chains.
VLRs, such as VLRB.aGPA.23, represent a new type of binding protein with exceptional selectivity for carbohydrates, as demonstrated previously by glycan array profiling (11). Moreover, the affinity of VLRB.aGPA.23 for TFα measured here by ITC (K_b = 2.21 × 10^4 M^-1) was at the upper end of the spectrum for plant or animal lectins binding to disaccharide ligands (24). For example, the anti-HIV algal lectin griffithsin binds dimannoses with K_b values up to 2 × 10^4 M^-1 (32). More typically, lectins display ~10-fold lower affinity for disaccharides (24). Affinities in excess of 10^4 M^-1 generally require longer oligosaccharides, which can interact with a more extended surface on the lectin (24,26).
For anti-carbohydrate monoclonal antibodies, affinities in the range of 10^4-10^6 M^-1 have been reported for tri-, tetra-, and pentasaccharides derived from Salmonella and Shigella lipopolysaccharide antigens (27-30). It should be noted, however, that these antibodies were obtained from mice immunized with the corresponding bacteria, whereas VLRB.aGPA.23 was isolated from lampreys that were not challenged with TFα (11). In addition, it should be possible to increase the affinity of glycan-specific VLRs such as VLRB.aGPA.23 by in vitro directed evolution, as in the case of VLRs reactive with the protein lysozyme, for which binding improvements of up to 1300-fold were achieved (33).
In common with other protein-carbohydrate interactions (24,26), the binding of VLRB.aGPA.23 to all four glycans tested was characterized by a favorable enthalpy term, presumably due to the formation of hydrogen bonds and van der Waals contacts with the ligands (Table 2). In the case of TFα and TFα-Ser, the entropic contribution to the binding energy is slightly favorable (TFα-Ser) or slightly unfavorable (TFα), whereas for TFα-Thr and BG-H the entropy term contributes as much as (BG-H) or more than (TFα-Thr) the enthalpy term to driving binding. This is somewhat atypical for sugar binding to proteins, where entropy is usually unfavorable due to loss of ligand conformational flexibility, and binding is enthalpically driven (24,26). In this regard, structural studies of small glycopeptides have shown that GalNAcα attached to serine and threonine prefer very different rotamer populations, both in solution (34) and in silico (35), which could affect binding thermodynamics.
The absence of electron density for fucose in the VLRB.aGPA.23-Fucα1-2Galβ structure is surprising, as fucose contributes significantly to binding. Although we do not have a satisfactory explanation for this result, the lack of density does not necessarily mean there is no interaction at all between fucose and VLR.aGPA.23 but only that the interaction may not be sufficiently stable to be visualized in the crystal. It is also conceivable that the conformational entropy of the Fucα1-2Galβ ligand increases upon binding VLR.aGPA.23, thereby contributing favorably to the binding affinity. Although this may appear counterintuitive, NMR relaxation studies of galectin-3 have demonstrated that ligand binding increases the conformational entropy of the protein, which contributes favorably to the free energy of binding (36).
The architecture of the glycan-binding site of VLRB.aGPA.23 differs markedly from those of lectins or antibodies, which typically consist of long grooves on the protein surface for accommodating oligosaccharide chains (24,26,28-30,32). In the VLRB.aGPA.23-TFα complex, the disaccharide is sandwiched between the LRRCT insert loop and the concave surface of the VLR solenoid formed by the short β-strands of the LRR and CP modules. This parallel β-sheet does not lend itself to the construction of carbohydrate-binding grooves of the type seen in lectins and antibodies (37), and none is present in VLRB.aGPA.23 or VLRB.RBC36 (16). Based on the VLRB.aGPA.23-TFα structure, longer oligosaccharides with extensions at the reducing end of the TFα disaccharide (GalNAcα) would project perpendicularly across the β-strands, contacting additional LRR modules beyond LRRV1 (Fig. 6A). Although VLRB.aGPA.23 contains only two additional N-terminal modules (LRRNT and LRR1), which would limit the potential contacting surface to perhaps a tetrasaccharide, other VLRs contain as many as six additional modules (12,15). However, the curvature of the binding surface would increase progressively (37), requiring the oligosaccharide to bend to conform to the concavity of the rigid VLR solenoid to maintain contacts with the protein. By contrast, the carbohydrate-binding surfaces of lectins and antibodies are generally flat or convex (24,26,28-30,32). Despite these striking differences in binding site architecture, our structural and thermodynamic study of VLRB.aGPA.23 has revealed how VLRs utilize the LRR scaffold to recognize glycans with an affinity and selectivity rivaling that of lectins and antibodies, making VLRs a highly promising class of natural glycan-binding proteins for basic research and clinical applications.