Structural Diversity of the Hagfish Variable Lymphocyte Receptors

Variable lymphocyte receptors (VLRs) are recently discovered leucine-rich repeat (LRR) family proteins that mediate adaptive immune responses in jawless fish. Phylogenetically, they are the oldest adaptive immune receptors and the first with a non-immunoglobulin fold. We present the crystal structures of one VLR-A and two VLR-B clones from the inshore hagfish. The hagfish VLRs have the characteristic horseshoe-shaped structure of LRR family proteins. The backbone structures of their LRR modules are highly homologous, and the sequence variation is concentrated on the concave surface of the protein. The conservation of key residues suggests that our structures are likely to represent the LRR structures of the entire repertoire of jawless fish VLRs. The analysis of sequence variability, prediction of protein interaction surfaces, amino acid composition analysis, and structural comparison with other LRR proteins suggest that the hypervariable concave surface is the most probable antigen binding site of the VLR.

The immunoglobulins of jawed vertebrates are a large family of proteins with enormous sequence and structural diversity that can specifically recognize a virtually unlimited number of antigens (1). Recently a new type of immune receptor of comparable diversity was discovered in jawless fishes. These variable lymphocyte receptors (VLRs) have characteristic properties resembling the adaptive immune receptors in jawed vertebrates (2). In the lamprey, a large repertoire of VLR genes is reported to be generated from a single germ line VLR gene by genomic rearrangement of flanking LRR (leucine-rich repeat) cassettes. Hagfishes have two germ line VLR genes, called VLR-A and VLR-B, that produce comparably diverse receptor repertoires (3). An individual lymphocyte cell of the sea lamprey expresses a unique VLR protein in a monoallelic manner that can specifically recognize the corresponding antigen in the humoral response. When lampreys were immunized with a mixture of antigen and mitogen, the titer of VLR protein specifically interacting with the antigen increased dramatically in the plasma; this was accompanied by an increase of VLR-positive large lymphocytes in the blood samples (2,4). Hagfish and lamprey are the only surviving jawless fishes belonging to the cyclostome taxon and appear to be the earliest creatures with an adaptive immune system (5-7). They have lymphocyte-like cells comparable with those of jawed vertebrates; these cells express various genes reminiscent of the mammalian adaptive immune system (8-10).
The sequences of VLR proteins are completely unrelated to those of the immunoglobulins. They have a variable number of the LRR modules that are frequently found in the innate immune receptors of multicellular organisms (2). The LRR domains of VLRs contain a signal peptide, an N-terminal cap (LRRNT), a variable number of LRR modules, and a C-terminal cap (LRRCT). A threonine/proline-rich stalk and a hydrophobic tail region are attached to the C-terminal end of the LRRCT. Based on sequence alignment and variability, the LRR modules are subdivided into LRR1, LRRV (LRR variable), LRRVe (LRR variable end), and LRRCP (LRR connecting peptide) modules (4) (Fig. 1). In a sequence analysis of 517 sea lamprey VLR and 139 Pacific hagfish VLR-A clones, the number of LRRVs varied from 0 to 7 (average, 1.31) and from 1 to 5 (average, 3.1), respectively (4). The enormous diversity of VLRs derives from differences in the number and sequence of the LRR modules.
To gain precise structural insight into the diversity and antigen specificity of the VLRs, we performed a crystallographic study of inshore hagfish Eptatretus burgeri VLR-A and -B proteins.

EXPERIMENTAL PROCEDURES
Protein Expression and Purification-VLRs (Eb7VLRA.29, residues 1-292; Eb8VLRB.59, residues 1-234; Eb8VLRB.61, residues 1-200) fused to the Fc domain of human IgG1 were expressed in Hi5 insect cells (Invitrogen). The primer sequences used for the cloning of the VLR-As are 5′-actacggatccATGATGGGTCCGGTCTTGGCTGCAT-3′ (forward) and 5′-actacgcggccgctATTTGGAGAGACGCAATCTGAGGCCGC-3′ (reverse). The primer sequences used for the cloning of the VLR-Bs are 5′-actacggatccATGAAGTTCGCACTGAGAGGAACCT-3′ (forward) and 5′-actacgcggccgctAGTAGGGCAGATGATACTTCGGACGGG-3′ (reverse). The PCR products were inserted between the BamHI and the NotI sites of the pVL1393 recombinant baculovirus expression vector (BD Pharmingen). The Fc tag was cloned between the NotI and the BglII sites of the vector using primers 5′-actacgcggccgcCTGGTTCCGCGTGGTTCCGAGCCCAAATCTTGTGAC-3′ and 5′-atccaagatctTCATTTACCCGGAGACAGGGA-3′. The secreted VLR and Fc fusion proteins were purified using protein A-Sepharose (APBiotech) affinity chromatography. The Fc tag of the fusion protein was removed overnight with 0.5% (w/w) thrombin at 4°C, and the cleaved VLRs were further purified using ion exchange chromatography and Superdex 200 gel filtration chromatography (APBiotech). All VLR proteins were eluted as monomers in the gel filtration chromatography. They were concentrated to 10-15 mg/ml in 20 mM Tris, pH 8.0, 200 mM NaCl buffer and used for crystallization.
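As a quick consistency check on the cloning strategy described above, the following minimal sketch searches each primer for the recognition sequence of the restriction enzyme it is expected to carry. The primer strings are transcribed from the text with typesetting hyphens and spaces removed, which is an assumption about the printed layout and not a re-verified sequence; the dictionary and function names are hypothetical.

```python
# Recognition sequences of the three restriction enzymes named in the text.
SITES = {"BamHI": "GGATCC", "NotI": "GCGGCCGC", "BglII": "AGATCT"}

# Primers as printed (5' to 3'), with line-break hyphens and spaces removed.
# This clean-up is an assumption about the typesetting.
PRIMERS = {
    "VLR-A forward": "actacggatccATGATGGGTCCGGTCTTGGCTGCAT",
    "VLR-A reverse": "actacgcggccgctATTTGGAGAGACGCAATCTGAGGCCGC",
    "VLR-B forward": "actacggatccATGAAGTTCGCACTGAGAGGAACCT",
    "VLR-B reverse": "actacgcggccgctAGTAGGGCAGATGATACTTCGGACGGG",
    "Fc forward": "actacgcggccgcCTGGTTCCGCGTGGTTCCGAGCCCAAATCTTGTGAC",
    "Fc reverse": "atccaagatctTCATTTACCCGGAGACAGGGA",
}


def sites_in(primer):
    """Return the sorted names of the enzymes whose site occurs in the primer."""
    seq = primer.upper()  # the primers mix lower-case adapters and upper-case gene sequence
    return sorted(name for name, site in SITES.items() if site in seq)


if __name__ == "__main__":
    for label, primer in PRIMERS.items():
        print(f"{label}: {', '.join(sites_in(primer)) or 'none'}")
```

With the sequences as transcribed, the forward VLR primers carry a BamHI site, the reverse VLR primers and the forward Fc primer carry a NotI site, and the reverse Fc primer carries a BglII site, in line with the insertion sites stated in the text.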
Crystallization and Data Collection-Crystals were grown at 22°C using the hanging-drop vapor diffusion method by mixing 1 μl of the protein solution and 1 μl of crystallization buffer. Final crystallization conditions were 33% polyethylene glycol 1000, 0.2 M MgCl2, 0.1 M bis-Tris, pH 5.5, for Eb7VLRA.29; 33% polyethylene glycol 1000, 0.2 M NaCl, 0.1 M bis-Tris, pH 5.5, for Eb8VLRB.59; and 30% polyethylene glycol 4000, 0.2 M NaCl, 0.1 M Tris-HCl, pH 8.5, for Eb8VLRB.61. Crystals suitable for diffraction experiments were formed after 1 week. For data collection, the crystals were flash-frozen at −170°C in the crystallization buffer supplemented with 30% glycerol. The diffraction data were collected at the 4A beam line of the Pohang Accelerator Laboratory and the ID29 beam line of the European Synchrotron Radiation Facility and processed using the HKL package (11) (Table 1).
Structure Determination and Refinement-The crystal structures of the VLRs were determined by molecular replacement with the program PHASER (12,13). For each VLR target, three entries with the highest sequence homology were selected from the PDB data base using the BLAST server (www.ncb.nih.gov) and tested as probes in the molecular replacement calculation. Among these, the LRR structures of the Nogo receptor (PDB code 1OZN) (14) and glycoprotein Ibα (PDB code 1M10) (15) gave the lowest initial R-factors for Eb7VLRA.29 and Eb8VLRB.61, respectively, and were employed as search probes (supplemental Fig. 1). The LRRNT and LRRCT modules of the search probes were removed before molecular replacement. Some of the LRR modules in the probes were also removed so that the number of LRR modules in each search probe matched that of its target VLR. The initial models of the VLRs were refined with the program CNS (16), and the refined models were used to calculate the electron density maps (supplemental Fig. 2). The electron density corresponding to the LRRNT and LRRCT modules was clearly visible in the calculated maps, and the atomic models were built with the program O (17). For Eb8VLRB.59 we generated an artificial probe based on the Eb8VLRB.61 structure. The LRRV2 module was duplicated, and the copy was inserted between the LRRV2 and LRRV3 modules of Eb8VLRB.61. To make room for the inserted LRR, residues from LRRV3 to LRRCT were shifted ~4.5 Å while retaining the curvature and the twist of the horseshoe shape. The resulting structures were further refined. Ramachandran plots calculated with the final structures have no non-glycine residues in the generously allowed and prohibited regions.
Homology Modeling and Prediction of Protein Interaction Surfaces-The sequences of the inshore hagfish VLRs were downloaded from the GenBank™ data base (www.ncbi.nlm.nih.gov). Model structures for all 84 E. burgeri VLR sequences (supplemental Table 1) were either taken from the crystal structures or generated by homology modeling using the SWISS-MODEL (swissmodel.expasy.org) server in the first approach mode (18). The SWISS-MODEL server requires both template structures and target sequences for automatic homology modeling. For VLRs containing a number of LRRV modules different from those of the crystallized VLRs, template structures were generated by deleting or inserting LRRV modules in the experimentally determined structures. To estimate the accuracy of the modeling we generated a template structure for Eb8VLRB.61 based on the structure of Eb8VLRB.59 by deleting one LRRV module. The model, generated by the SWISS-MODEL server, could be superimposed on the experimentally determined structure with a Cα r.m.s. deviation of 0.85 Å. The 81 VLR models and three experimentally determined VLR structures were submitted to the PPI-Pred (protein-protein interface prediction) (19) and ProMate (20) servers to predict the protein-protein interaction surface. The default score configuration (atoms distribution, chemical character, amino acid pairs distribution, evolutionarily conserved positions, non-regular secondary structure length, sequence distances within a circle, secondary structure, hydrophobic patch rank, water molecules) was used for the ProMate calculation.
Sequence Variability Analysis-For reliable sequence alignment, VLRs with an equal number of LRRV modules were chosen for analysis. For VLR-A, all 43 inshore hagfish VLR-A sequences with three LRRV modules in the Swiss-Prot data base were aligned using the ClustalW program (21). The structural model generated by homology modeling and the aligned sequence file were submitted to the ConSurf server (22). The conservation scores were calculated using an empirical Bayesian method. For VLR-B, the structure of Eb8VLRB.61 and 10 inshore hagfish VLR-B sequences with two LRRV modules were analyzed using the ConSurf server. The calculation was repeated with Pacific hagfish VLRs, and similar results were obtained.
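The idea of scoring each alignment column for variability can be illustrated with a minimal sketch. Note that it does not reproduce the empirical Bayesian conservation score used by the ConSurf server; it substitutes plain Shannon entropy per column, a much simpler measure, and the toy alignment and function names are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of one alignment column, ignoring gap characters."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    # sum of p * log2(1/p); an invariant column gives exactly 0.0
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def alignment_variability(aligned):
    """Per-column entropy for a list of equal-length aligned sequences."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy(col) for col in zip(*aligned)]


if __name__ == "__main__":
    # Hypothetical five-column alignment: columns 1, 3, and 4 are invariant.
    toy = ["LTNLQ", "LSNLE", "LTNLK", "LSNLY"]
    print(alignment_variability(toy))
```

Higher entropy marks a more variable position. Restricting the input to sequences with the same number of LRRV modules, as described above, keeps the columns comparable.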

RESULTS
Structure Determination-We determined the crystal structures of three inshore hagfish VLRs (Table 1). The GenBank™ accession numbers are ABB21039 for clone Eb7VLRA.29, AAZ16360 for clone Eb8VLRB.59, and AAZ16361 for clone Eb8VLRB.61. The "stalk domain" was not included in the crystallized proteins because its amino acid sequence is invariant and therefore unlikely to be involved in antigen recognition. The structures of Eb7VLRA.29 and Eb8VLRB.61 were determined by the molecular replacement technique using the reported structures of the Nogo receptor and glycoprotein Ibα as search probes, respectively. The LRRNT and LRRCT modules of the search probes were removed before making the molecular replacement calculations, because their structures are predicted to differ from those of their VLR counterparts. The refined atomic model of Eb8VLRB.61 was used for the molecular replacement calculation of Eb8VLRB.59.
The Structure of VLR-A-Eb7VLRA.29 adopts a horseshoe-like solenoid structure common to LRR family proteins (Fig. 2). The LRR modules of all VLR proteins possess 24-amino acid repeats of the consensus sequence XLXXLXXLXLXXNXLXXLPXXXFX. Eb7VLRA.29 contains 8 LRR modules: LRR1, LRRV1 to V5, LRRVe, and LRRCP. Conserved LXXLXLX repeats form the concave surface containing 8 parallel β-strands. The remaining 17 amino acids of the consensus sequences form a short 3₁₀ helix and connecting loops in the convex part of the molecule. The VLR-A LRR sequence shows strict length conservation with virtually no insertions or deletions. The only exception is the last LRR module, LRRCP, which has only 16 amino acid residues, because the convex region of LRRCP is replaced by the LRRCT module. Structurally, VLR-A belongs to the "typical" subclass of the LRR family; the asparagine ladder and phenylalanine spine found in other typical LRR proteins are present in the VLR-A structure (23,24).
The conserved leucines and phenylalanines are involved in forming the extended hydrophobic core. The hydrophobic cores of LRR proteins are usually capped with cysteine-rich N- and C-terminal modules called LRRNTs and LRRCTs, respectively. VLR-A also contains these capping modules. The LRRNT module of VLR-A contains a β-hairpin structure stabilized by two disulfide bridges, whereas the LRRCT of VLR-A belongs to the CF1 class and contains a characteristic α-helix and two disulfide bridges.
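The 24-residue consensus quoted above can be turned into a simple pattern search. The sketch below is only an illustration: it treats X as "any residue" and demands exact matches at the conserved positions, which is stricter than a real module search would be, and the example module sequence is hypothetical (it is not taken from any VLR clone).

```python
import re

# Consensus of the 24-residue LRR module as given in the text (X = any residue).
CONSENSUS = "XLXXLXXLXLXXNXLXXLPXXXFX"


def consensus_to_regex(consensus):
    """Compile the consensus into a regular expression, mapping X to '.'."""
    return re.compile("".join("." if c == "X" else c for c in consensus))


def conserved_matches(window, consensus=CONSENSUS):
    """Count the conserved (non-X) consensus positions matched by a window."""
    if len(window) != len(consensus):
        raise ValueError("window and consensus must have the same length")
    return sum(1 for w, c in zip(window, consensus) if c != "X" and w == c)


def find_modules(sequence, consensus=CONSENSUS):
    """Start indices of non-overlapping windows that match the consensus exactly."""
    return [m.start() for m in consensus_to_regex(consensus).finditer(sequence)]


if __name__ == "__main__":
    toy_module = "NLTKLTYLSLGYNQLQSLPKGVFD"  # hypothetical 24-residue module
    print(find_modules(toy_module * 2))      # two tandem copies
    print(conserved_matches(toy_module))     # conserved positions matched
    print(toy_module[4:11])                  # residues 5-11, the LXXLXLX segment
```

Residues 5-11 of the consensus correspond to the LXXLXLX segment that the text assigns to the concave surface, so slicing each matched window at those positions is one way to collect concave-surface residues for later analysis.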
The Structure of VLR-Bs-Eb8VLRB.61 contains five LRR modules and also belongs to the "typical" subclass of the LRR family (23,24). The leucines forming the hydrophobic core and the asparagines and phenylalanines involved in the asparagine ladder and phenylalanine spine, respectively, are strictly conserved (Fig. 3A). The LRR modules of VLR-B and VLR-A can be superimposed with r.m.s. deviations of the Cα positions of less than 0.84 Å (Fig. 3, B and C). The disulfide bridges in the LRRNT and LRRCT modules are found at similar locations. However, these modules show recognizable structural variations from those of VLR-A. The β-hairpin structure in the LRRNT of VLR-B is considerably shorter because of amino acid deletions between the disulfide bridges. The backbone structure of the C-terminal α-helix is conserved in the LRRCT modules of both VLR-A and VLR-B. However, the structure and orientation of the connecting loops differ.
The structure of Eb8VLRB.59 is highly homologous to that of Eb8VLRB.61 except that it has one additional LRRV module and an eight-amino acid insertion in the LRRCT module (Fig. 4A). The two structures can be superimposed with r.m.s. deviations of the Cα positions of less than 0.49 Å (Fig. 4, B and C). The addition of the LRRV module does not cause detectable changes in the radius or twist of the horseshoe structure, and the effect of the eight-amino acid insertion in the LRRCT module is limited to the local area, with a minimal effect on the overall structure of the protein.
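For readers unfamiliar with the r.m.s. deviation figures quoted above, the quantity is the square root of the mean squared distance between paired Cα atoms. The minimal sketch below assumes the two coordinate sets are already optimally superimposed and paired one-to-one; it does not perform the superposition itself, and the coordinates are invented for illustration.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """R.m.s. deviation (same units as the input) between paired 3D coordinates.

    Assumes both lists are already superimposed and list equivalent atoms
    in the same order; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


if __name__ == "__main__":
    # Invented coordinates in angstroms: the second set is displaced by 0.5 along y.
    a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    b = [(0.0, 0.5, 0.0), (3.8, 0.5, 0.0), (7.6, 0.5, 0.0)]
    print(ca_rmsd(a, b))
```

In practice the superposition step (for example, a least-squares fit of one structure onto the other) is carried out first, and the value reported is the deviation that remains after that fit.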
Structure of Other VLR Proteins-Our structural studies of the three hagfish VLR proteins demonstrate that the structures of their LRR modules are highly homologous and their backbone atoms can be accurately superimposed. The VLRs of several other jawless fish, the Pacific hagfish and three lamprey species, have been reported (2-4). Although the LRRNT and LRRCT modules show significant interspecies sequence variation, key residues in their LRR modules are strictly conserved; the leucines and phenylalanines in the hydrophobic core and asparagines and prolines important for the convex structure are strictly conserved in all known VLR sequences. The length of the LRR modules is practically invariant, and the deletion or addition of residues is extremely rare. This high sequence conservation suggests that our structures are likely to represent the LRR structures of the entire repertoire of jawless fish VLRs.
Prediction of the Antigen Binding Site-Identification and structural analysis of the antigen binding site is important for understanding the function of the VLR proteins. Because there is no reported antigen and hagfish VLR complex, experimental identification of the antigen binding site is not yet possible. Therefore we tried to predict the antigen binding site by sequence and structure analysis.
Sequence variability analysis has proved to be useful for identifying antigen binding sites in immunoglobulins and MHC (major histocompatibility complex) molecules (25). Analysis of the VLR sequences shows variable patches in both the concave and convex surfaces (Fig. 5). However, the variable patches on these two surfaces are differently distributed; the variable patches in the concave surfaces are clustered together forming a large hypervariable surface, whereas those in the convex surfaces are small and scattered over the surface. As shown for immunoglobulins, the antigen binding site of VLR is likely to form an extended surface for binding globular proteins. Therefore, our variability analysis indicates that the concave surface is the best candidate for the antigen binding site of VLRs.
Fig. 5. Sequence variability of inshore hagfish VLRs. Variability is indicated by the color gradation from red to blue. Red, the most variable patch; blue, the least variable patch. A, the variability was calculated with the inshore hagfish VLR-A sequences with three LRRV modules. The surface was generated with a model structure produced by homology modeling (see "Experimental Procedures"). B, the variability was calculated with the inshore hagfish VLR-B sequences with two LRRV modules. The surface was calculated with the structure of Eb8VLRB.61. Left, concave views; right, convex views.
The hypervariable antigen binding loops of immunoglobulins contain an unusually high percentage of tyrosines (26,27). Tyrosine appears to be important for protein interactions, because it is hydrophobic but can be exposed on the protein surface. We analyzed 66 inshore hagfish VLR-A sequences and 18 VLR-B sequences in the Swiss-Prot data base. In the VLR-As, 9.1% of the amino acids on the concave surfaces are tyrosines compared with 2.1% in the VLR-A proteins as a whole. In the VLR-Bs the corresponding frequencies are 11.7 and 3.4%, respectively. A similar pattern was found in the VLRs of Pacific hagfish and sea lamprey. The high frequency of tyrosine residues in the concave surface supports the hypothesis that it is the antigen binding site.
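The tyrosine comparison above amounts to computing the same frequency over two residue sets, the concave surface and the whole protein. A minimal sketch follows; how the concave-surface residues were selected in the actual analysis is not specified here, so the caller supplies the positions, and the example sequence and indices are hypothetical.

```python
def tyrosine_fraction(residues):
    """Fraction of tyrosine (Y) among the given one-letter residues."""
    residues = list(residues)
    if not residues:
        raise ValueError("no residues given")
    return sum(1 for r in residues if r == "Y") / len(residues)


def surface_vs_whole(sequence, surface_positions):
    """Return (tyrosine fraction on the chosen positions, fraction in the whole sequence).

    surface_positions are 0-based indices into sequence, chosen by the caller.
    """
    surface = [sequence[i] for i in surface_positions]
    return tyrosine_fraction(surface), tyrosine_fraction(sequence)


if __name__ == "__main__":
    toy = "NLTKLTYLSLGYNQLQSLPKGVFD"   # hypothetical 24-residue module
    concave = range(4, 11)             # assumed concave segment, residues 5-11
    on_surface, overall = surface_vs_whole(toy, concave)
    print(f"surface {on_surface:.3f}, overall {overall:.3f}")
```

Taking the ratio of the percentages reported in the text gives roughly a 4.3-fold enrichment of tyrosine on the concave surface for VLR-A (9.1/2.1) and roughly 3.4-fold for VLR-B (11.7/3.4).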
Bradford and Westhead (19) recently reported a new prediction method, PPI-Pred, for protein-protein interaction surfaces. This technique utilizes the shape, electrical property, hydrophobicity, and aromaticity of the protein surface for prediction, and its success rate is reported to exceed 70%. We applied this technique to predict the antigen binding site in the VLRs. For this analysis we built 81 models of the inshore hagfish VLRs using the SWISS-MODEL automatic homology modeling server, as described under "Experimental Procedures." As shown in Table 2, the concave surface is predicted to be either the most probable or the second most probable surface for protein interaction. In contrast, the convex surface is consistently predicted to be the least probable surface for protein interaction in the majority of the VLRs. To test this prediction, we employed a different program called ProMate (20). ProMate also utilizes various structural properties of protein surfaces for prediction but with an independent algorithm. As expected, the concave surfaces of the majority of VLRs scored higher than the convex surfaces in the ProMate calculation (Table 2).
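The legend of Table 2 maps ProMate scores onto four ranks (100.0 to 85.0, 85.0 to 70.0, 70.0 to 55.0, and 55.0 to 0.0). The sketch below shows one way such a binning and tally could be written. The treatment of the shared boundary values (a score of exactly 85.0 is counted in the higher rank) is an assumption, since the legend does not say which bin they fall in, and the example scores are invented.

```python
from collections import Counter

RANKS = (
    "most probable",
    "2nd most probable",
    "3rd most probable",
    "least probable",
)


def promate_rank(score):
    """Map a ProMate score (0-100) to one of the four ranks used in Table 2.

    Boundary scores (85.0, 70.0, 55.0) are assigned to the higher rank;
    this is an assumption, not something stated in the table legend.
    """
    if not 0.0 <= score <= 100.0:
        raise ValueError("score must lie between 0 and 100")
    if score >= 85.0:
        return RANKS[0]
    if score >= 70.0:
        return RANKS[1]
    if score >= 55.0:
        return RANKS[2]
    return RANKS[3]


def tally(scores):
    """Count how many surfaces fall into each rank, in rank order."""
    counts = Counter(promate_rank(s) for s in scores)
    return [counts.get(rank, 0) for rank in RANKS]


if __name__ == "__main__":
    invented_scores = [92.0, 85.0, 71.5, 60.0, 12.3]
    print(tally(invented_scores))
```

Applying such a tally separately to the concave and convex surfaces of every analyzed structure yields one column of counts per surface, each summing to the number of structures analyzed.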
Structural comparison is often useful for predicting the functional site of a protein. To search for structurally related proteins, the VLR structures were submitted to the Dali data base. The Slit D3 domain (PDB code 1W8A), Nogo receptor (PDB code 1OZN), glycoprotein Ibα (PDB code 1M10), and TLR3 (PDB code 1ZIW) were identified by the server as the closest structural relatives of VLRs. These proteins have similar modular structures and patterns of amino acid sequence conservation (Fig. 1B). Interestingly, crystallographic analysis and data from mutation experiments indicate that the concave surface is the major protein interaction surface in Slit, the Nogo receptor, and glycoprotein Ibα (14,15,28). Therefore, structural comparison with other LRR family proteins also supports the hypothesis that the concave surface is the most probable antigen binding site of VLRs.
Our proposal regarding the antigen binding site is consistent with a previous prediction by Alder et al. (4). Based on evolutionary analysis, they proposed that the concave surface of VLRs is under positive evolutionary selection and is the most likely site for antigen binding (4). Our view and that of Alder et al. clearly require future experimental validation.

DISCUSSION
In this report, we have presented the first crystal structures of jawless fish VLRs. Both VLR-A and VLR-B have the horseshoe-shaped structure common to LRR family proteins. All of the VLRs appear to have a highly homologous backbone structure, and the sequence variation is concentrated in the concave surface of the protein. Based on various analyses, we propose that the hypervariable concave surface of the protein is the most probable antigen interaction site.
As noted previously, the concave surfaces of LRR family proteins are frequently used for protein interaction (6,23,24). However, in some LRR proteins, including TLR3 and CD14, surfaces other than the concave surface are known to play the major role in ligand binding (29-31). Both TLR3 and CD14 interact with non-protein ligands, double-stranded RNA and lipopolysaccharide, respectively. The convex surfaces of TLR3 and CD14, unlike those of other LRR proteins, have nonrepetitive sequences and structural patterns. As a result, nonrepetitive secondary structures are frequently inserted into the convex part of the molecule. Therefore, unlike the majority of the LRR family proteins, their convex surfaces contain grooves or pockets ideal for ligand binding. The LRR modules of VLRs show strict length and sequence conservation even in the convex portion of the module. The resulting convex surfaces of the VLRs have a repeated pattern of 3₁₀ helices and connecting loops, without grooves or pockets large enough for ligand binding.
The diversity of the antigen binding sites of immunoglobulins and VLRs is generated by gene rearrangement. However, their structures are completely different. In immunoglobulins, antigen binding is mediated by six hypervariable loops from the heavy and light chains. These loops are highly diverse in length, sequence, and backbone structure. By exploiting the structural diversity in these loops, immunoglobulins can recognize a wide variety of antigen structures. In contrast, the concave surfaces of VLRs are composed of rigid β-strands, and structural changes of the backbone atoms are minimal. The average surface area of immunoglobulins that binds protein antigens is ~800 Å². The concave surface area of the most abundant form of hagfish VLRs with three LRRV modules is estimated to be more than 1500 Å². As shown for other LRR family proteins (15,32,33), this large binding area of the VLRs may promote high affinity interaction with macromolecular antigens.
Sequence variation in immunoglobulins is essentially confined to the short antigen binding loops. The constant domain constitutes most of the immunoglobulin molecule and appears to be important either for stability of the protein or for interaction with effector proteins. Unlike the immunoglobulins, sequence variation in VLRs is present not only on the concave but also on the convex surface. It is not clear why the convex surface has variable patches. It is possible that it has a secondary role in antigen binding via long range interactions. Another possibility is that the LRR scaffold is very stable and some evolutionary variation is allowed without compromising the stability of the proteins.
Table 2. To predict the protein interaction site, 64 VLR-A and 17 VLR-B structures generated by SWISS-MODEL and the three crystal structures were analyzed by PPI-Pred and ProMate (details under "Experimental Procedures"). The ProMate scores for the most, second most, third most, and least probable binding sites range from 100.0 to 85.0, from 85.0 to 70.0, from 70.0 to 55.0, and from 55.0 to 0.0, respectively.

                                  PPI-Pred                  ProMate
                                  Concave     Convex        Concave     Convex
                                  surface     surface       surface     surface
Most probable binding site        44          6             33          5
2nd most probable binding site    24          4             38          13
3rd most probable binding site    5           4             11          43
Least probable binding site       11          70            2           23
Total analyzed VLR structures     84

Several non-immunoglobulin frameworks have been tested for their ability to generate random artificial libraries that mimic the natural immunoglobulin library (34-38). The LRR library reported by Plückthun and co-workers (38) is a successful example of this approach. Based on the structure of ribonuclease inhibitor, they produced proteins with 4-14 LRR modules containing randomized amino acids in their concave surfaces. All the clones chosen at random from the resulting library, with a theoretical diversity of greater than 10¹², produced monomeric, stable, and highly soluble proteins. Interestingly, the design principles of the artificial LRR library and the naturally evolved VLR library are strikingly similar. 1) Residues with an obvious structural role are strictly conserved. 2) Insertions and deletions in the LRR modules are essentially not allowed. 3) Sequence variability is focused on the concave surface. 4) The first and the last LRR modules contain special residues and their sequence conservation differs from that of the other LRR modules. The success of the artificial LRR library demonstrates that modularity is inherent in the LRR scaffold and that residues in the concave surface can be changed without affecting the overall stability of the protein (38). These intrinsic properties of the LRR scaffold may have sustained the role of VLRs as adaptive immune receptors during evolution.