High Resolution NMR-based Model for the Structure of a scFv-IL-1β Complex

Monoclonal antibodies have recently started to deliver on their promise as highly specific and active drugs; however, a more effective, knowledge-based approach to the selection, design, and optimization of potential therapeutic antibodies is currently limited by the surprising lack of detailed structural information for complexes formed with target proteins. Here we show that complexes formed with minimal antigen binding single chain variable fragments (scFv) reliably reflect all the features of the binding interface present in larger Fab fragments, which are commonly used as therapeutics, and report the development of a robust, reliable, and relatively rapid approach to the determination of high resolution models for scFv-target protein complexes. This NMR spectroscopy-based approach combines experimental determination of the interaction surfaces and relative orientations of the scFv and target protein, with NMR restraint-driven, semiflexible docking of the proteins to produce a reliable and highly informative model of the complex. Experience with scFvs and Fabs targeted at a number of secreted regulatory proteins suggests that the approach will be applicable to many therapeutic antibodies targeted at proteins, and its application is illustrated for a potential therapeutic antibody targeted at the cytokine IL-1β. The detailed structural information that can be obtained by this approach has the potential to have a major impact on the rational design and development of an increasingly important class of biological pharmaceuticals.

The ability of antibodies to bind to an almost unlimited number of target proteins with high specificity makes them one of the fastest growing classes of therapeutics in the biological drugs market (1). Since the first description of monoclonal antibodies (2), dramatic progress has been made in the expression, engineering, humanization, and applications of antibodies as therapeutics. A wide variety of antibody fragments have been evaluated as potential therapeutics, including the well characterized antigen binding fragment (Fab), which contains the light chain (VL and CL domains) and N-terminal portion of the heavy chain (VH and CH domains). The smallest fragment to retain full binding activity has also attracted considerable interest: the so-called single chain variable fragment (scFv) (3), which consists of the two variable domains joined by a short peptide.
A detailed understanding of the interactions between candidate therapeutic antibodies and target proteins is key to further progress in rational design and humanization. Currently, identification of the binding sites for antibodies on target proteins is achieved via one or a combination of indirect methods such as protease protection, peptide scanning, site-directed mutagenesis, or analysis of backbone amide exchange (4-6). Although providing valuable information, each of these approaches has drawbacks; in particular, they may not detect discontinuous epitopes and do not provide information on the spatial organization of epitopes. For such an important area of biotherapeutics, relatively few crystal structures have been determined for potential therapeutic antibody-target protein complexes, which probably reflects the inherent flexibility and solubility of antibodies, resulting in limited success in crystallization trials.
Continued developments in NMR spectroscopy mean that it is now possible to obtain detailed structural information for proteins and complexes of up to 80 kDa in solution (7), which makes this an attractive approach for determining the structures of isolated scFvs (28 kDa) and Fabs (50 kDa), as well as complexes formed with target proteins. NMR spectroscopy is an ideal tool for mapping the precise interaction sites on both the antibodies and the target proteins. To date, only a few limited NMR studies of functional antibody fragments have been reported, including scFv, Fv, and isolated VL domains (8,9), with broad line widths limiting the experiments possible. In the case of scFvs, the formation of domain-swapped dimers at even relatively low concentrations is now well documented (10,11) and presumably accounts for the line width problems encountered in previous attempts to obtain detailed structural information for scFvs using NMR-based methods.
In this study, we report the successful use of NMR spectroscopy to determine a reliable model for the scFv-IL-1β complex, which reveals details of the scFv residues involved in IL-1β recognition, as well as the binding site on IL-1β. We also provide direct evidence that a scFv binds to a target protein in the same manner as an equivalent Fab, indicating that high resolution models for scFv-target protein complexes can be used as reliable guides for the rational design and development of therapeutic antibodies.

EXPERIMENTAL PROCEDURES
Protein Expression and Purification-The scFv, Fab, and IL-1β were expressed as soluble proteins in Escherichia coli and purified using a combination of affinity and size-exclusion chromatography. Full details of the expression vectors used and purification protocols are given in the supplemental materials.
Analysis of the Monomer to Domain-swapped Multimer Equilibrium for the scFv-The concentration dependence (28-395 μM) of the monomer to the domain-swapped dimer ratio for the isolated scFv was determined by analytical gel filtration on a Superdex 75 16/60 column using a 25 mM sodium phosphate, 100 mM sodium chloride, 0.02% sodium azide buffer at pH 6.5. The column was calibrated using a range of molecular mass protein standards (6.5, 13.7, 29.0, 43.0, and 75.0 kDa) supplied by GE Healthcare.
The two-dimensional and TROSY-based three-dimensional spectra recorded to obtain sequence-specific backbone assignments for the scFv-IL-1β complex were: 15N/1H HSQC, 15N/13C/1H HNCACB, HN(CO)CACB, HNCA, HN(CO)CA, and HNCO (12). Typical acquisition times/spectral widths for three-dimensional experiments were 8-9 ms/24-65 ppm in F1 (13C) (except for the HNCO, which was 24 ms/11 ppm), 21-24 ms/36 ppm in F2 (15N), and 70 ms/14 ppm in F3 (1H). The three-dimensional spectra were collected for between 64 and 80 h, and the 15N/1H HSQC spectra were collected for between 0.5 and 2 h, with acquisition times/spectral widths of 60 ms/14 ppm in F2 (1H) and 50 ms/36 ppm in F1 (15N). NOE data were obtained using 15N/1H NOESY-HSQC spectra.
Residual dipolar coupling (RDC) data were collected using 4 mg/ml Pf1 phage (ASLA BIOTECH Ltd.) in samples of 0.2-0.3 mM scFv-IL-1β complex with either the scFv or IL-1β 2H/15N labeled. Backbone amide RDC values were derived from the differences between the 15N-1H scalar couplings for isotropic and partially aligned samples using 15N/1H HSQC and TROSY spectra (13), with acquisition times/spectral widths of 60 ms/14 ppm in F2 (1H) and 50 ms/36 ppm in F1 (15N), and collected for ~10 h. Digital resolution in the spectra used for RDC measurements was 1.1 Hz. All NMR data were processed using TopSpin (Bruker Biospin Ltd.) and analyzed using Sparky (University of California, San Francisco).
Backbone amide proton line widths were measured for a selection of well resolved cross peaks in two-dimensional 15N/1H HSQC spectra of several 15N-labeled proteins (Pdcd4 MA-3 domains, ESAT-6-CFP-10 complex, scFv, Fab, and scFv-IL-1β complex; 15-45 kDa (14, 15)), with the line width determined at half-height.
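The RDC extraction described above reduces to a per-residue subtraction once the 15N-1H couplings have been measured in both samples. The following is a minimal sketch, not the authors' processing script: the function name, the dictionary layout, and the sign convention (aligned minus isotropic) are assumptions made for illustration, and the coupling values in the usage example are invented.

```python
def rdc_values(iso_couplings, aligned_couplings):
    """Return backbone amide RDCs (Hz) as aligned minus isotropic coupling.

    Both arguments map residue number -> measured 15N-1H coupling in Hz.
    Residues missing from either data set (for example, because of peak
    overlap) are skipped. The sign convention is an assumption.
    """
    return {
        res: aligned_couplings[res] - iso_couplings[res]
        for res in sorted(iso_couplings)
        if res in aligned_couplings
    }


if __name__ == "__main__":
    # Invented example couplings, for illustration only.
    iso = {5: -93.0, 6: -92.5, 7: -93.2}
    aligned = {5: -85.0, 6: -100.5}
    print(rdc_values(iso, aligned))
```

Residue 7 is dropped in this example because it has no coupling in the aligned sample, which mirrors how overlap reduces RDC coverage.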
Sequence-specific resonance assignments (N, HN, Cα, Cβ, and C′) were obtained for the scFv-IL-1β complex from the identification of intra- and inter-residue connectivities in TROSY-based three-dimensional triple-resonance spectra, with additional supporting evidence provided by sequential NOEs observed in 15N/1H NOESY-HSQC spectra. The chemical shift index (16) and TALOS (17) programs were used to determine the positions of elements of regular secondary structure from the chemical shift data.
The minimal shift approach, as described previously (18,19), was used to identify scFv residues involved in IL-1β binding. The minimal shift values were obtained from the combined chemical shift difference in 15N and 1H for each assigned peak in the 15N/1H HSQC spectrum of the 15N-labeled scFv bound to unlabeled IL-1β when compared with all peaks observed in the 15N/1H HSQC spectrum of the free 15N-labeled scFv. A histogram of minimal combined shift versus the protein sequence was used to reveal residues from the scFv with significantly perturbed backbone amide signals. IL-1β residues involved in scFv binding were determined in a similar manner by the determination of actual combined backbone amide 15N and 1H shifts from assigned 15N/1H HSQC spectra of free and scFv-bound IL-1β.
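The difference between the minimal shift and the actual combined shift can be made concrete with a short sketch. It assumes a combined shift of the form sqrt(ΔH² + (ΔN × s)²); the text does not give the 15N scaling factor s, so the value 0.2 below is an illustrative assumption, as are all function names and the example peak positions.

```python
import math

# Assumed weighting of 15N relative to 1H shifts; not stated in the text.
N_SCALE = 0.2


def combined_shift(delta_h, delta_n, n_scale=N_SCALE):
    """Combined 1H/15N chemical shift difference in ppm."""
    return math.sqrt(delta_h ** 2 + (delta_n * n_scale) ** 2)


def minimal_shifts(bound_peaks, free_peaks, n_scale=N_SCALE):
    """Minimal shift per assigned bound-state peak.

    bound_peaks: dict residue -> (1H ppm, 15N ppm), assigned.
    free_peaks: list of (1H ppm, 15N ppm), unassigned.
    Each bound peak is compared with every free peak and the smallest
    combined shift is kept, giving a lower bound on the true perturbation.
    """
    return {
        res: min(combined_shift(h_b - h_f, n_b - n_f, n_scale)
                 for h_f, n_f in free_peaks)
        for res, (h_b, n_b) in bound_peaks.items()
    }


def actual_shifts(bound_peaks, free_peaks, n_scale=N_SCALE):
    """Actual combined shift when both states are assigned (dicts)."""
    return {
        res: combined_shift(bound_peaks[res][0] - free_peaks[res][0],
                            bound_peaks[res][1] - free_peaks[res][1],
                            n_scale)
        for res in bound_peaks if res in free_peaks
    }


if __name__ == "__main__":
    # Invented peak positions, for illustration only.
    bound = {"A51": (8.00, 120.0), "T69": (7.50, 115.0)}
    free = [(8.00, 120.0), (7.40, 115.5)]
    print(minimal_shifts(bound, free))
```

Because the minimal shift never exceeds the actual shift, residues flagged by it are perturbed at least that much, which is why it is usable when the free protein cannot be assigned.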
Homology Modeling of the scFv-Initially, two BLAST (20) searches were completed against the Protein Data Bank (PDB), one with the VH domain of the scFv and the other with the VL domain. The closest antibody structures were retained and aligned to the scFv using FUGUE (21). Each of the pair-wise alignments was scored based on sequence similarity within the framework region, the conservation of solvent-inaccessible residues, the similarity in complementarity-determining region (CDR) length, and the resolution of the template structure. The alignment with the best score (PDB code 1L7I (22)) was used as the input to MODELLER (23) to build a homology model of the Fv region. The model was then subjected to manual refinement. CDR H2 was identified by HARMONY3 as a potentially problematic region, and following manual examination, it was replaced by that from the deposited structure 1KNO (24). The resulting model of the Fv region was converted to a scFv by adding the flexible linker.
Calculation of a Reliable Model for the scFv-IL-1β Complex-The structure of the scFv-IL-1β complex was determined by NMR restraint-driven docking of IL-1β and the scFv using HADDOCK (25), in which residues involved in interaction sites were defined as semiflexible. A homology model of the scFv and a crystal structure of free IL-1β (PDB code 2I1B (26)) were used as starting points for the docking calculations. To dock the scFv and IL-1β, ambiguous interaction restraints were selected to define the protein-protein interaction surface using either active or passive residues. Active residues are ones that have been experimentally identified as being involved in the interaction and are solvent-exposed, with passive residues being all solvent-accessible neighbors of active residues. Analysis of the chemical shift perturbation data and of solvent accessibilities using NACCESS (27) resulted in the identification of active residues as scFv residues with a minimal shift of over 0.075 ppm and 20% solvent accessibility and IL-1β residues with a combined shift of over 0.1 ppm and 20% solvent accessibility. The active and passive residues selected are summarized in supplemental Table 1.
The axial (Da) and rhombic (R) components of the alignment tensor for partially aligned samples of the scFv-IL-1β complex were calculated using PALES (28), and together with the backbone amide RDC data, they were used to incorporate restraints defining the orientation of the scFv and IL-1β. In addition, intervector projection angle restraints (29) were derived from the RDC data. A substantial number of long range, intramolecular HN to HN NOEs identified were also included as distance restraints during the docking calculations (387 for the scFv and 240 for IL-1β). The NOEs were calibrated on the basis of peak intensity and determined to correspond to 1H-1H distance restraints of <5, 5-6.5, or 6.5-8 Å.
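The active residue criteria given above (a shift cut-off combined with a solvent accessibility cut-off) can be sketched as a simple filter. This is an illustration under stated assumptions, not the authors' selection script: the function name and data layout are hypothetical, accessibility is assumed to be the relative per-residue value in percent as reported by a program such as NACCESS, and treating both cut-offs as strict "greater than" is an assumption because the text does not specify how boundary values were handled.

```python
def select_active(shifts, accessibility, shift_cutoff, access_cutoff=20.0):
    """Return sorted residue numbers meeting both active-residue criteria.

    shifts: dict residue -> combined or minimal shift (ppm).
    accessibility: dict residue -> relative solvent accessibility (%).
    Residues without an accessibility value are treated as buried.
    """
    return sorted(
        res for res, shift in shifts.items()
        if shift > shift_cutoff
        and accessibility.get(res, 0.0) > access_cutoff
    )


if __name__ == "__main__":
    # Invented values, for illustration only.
    shifts = {10: 0.08, 11: 0.05, 12: 0.20}
    access = {10: 35.0, 11: 60.0, 12: 5.0}
    # 0.075 ppm is the scFv cut-off quoted in the text;
    # 0.1 ppm would be used for IL-1beta.
    print(select_active(shifts, access, shift_cutoff=0.075))
```

In this example only residue 10 qualifies: residue 11 is exposed but not perturbed enough, and residue 12 is strongly perturbed but buried.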
In the first stage of the docking calculations, 1000 initial complex structures were generated by rigid body energy minimization. The 200 complexes with the lowest overall energy were then selected and refined in an explicit water shell (25). After refinement, the complexes were initially sorted into structurally related groups using the default clustering cut-off of 7.5 Å, which placed all the structures in one group with an overall backbone r.m.s.d. of 2.2 Å. The complexes were therefore reclustered using a more stringent clustering cut-off of 2.2 Å, which yielded one dominant cluster and eight sparsely populated ones. To verify the scFv-IL-1β complex obtained, backbone amide RDCs calculated from the complex structures were compared with the experimentally determined RDCs using PALES. Analysis of the calculated structures was carried out

RESULTS
Domain-swapped Dimer Formation-A number of previous studies have reported the formation of domain-swapped dimers for purified scFv proteins at relatively modest concentrations (20 μM) (10,11,31). We have investigated the behavior of two distinct scFvs selected for specific and high affinity binding to IL-1β. Analysis by gel filtration revealed that they were predominantly monomeric at concentrations below 10 μM, but at concentrations required for detailed NMR structural analysis (>200 μM), they approached nearly 50% domain-swapped dimer, as illustrated in Fig. 1a. 15N/1H HSQC spectra obtained from samples of the scFvs containing nearly 50% domain-swapped dimer were surprisingly good (Fig. 2a), which in part reflects the conservation of the VL-VH domain interface in the dimer (supplemental Fig. 1) and results in only a few signals being shifted on dimer formation. In contrast, dimer formation has a marked influence on the signal line width, resulting in significantly reduced sensitivity and resolution, with average backbone amide proton line widths of 31.5 ± 5.5 Hz when compared with less than 25 Hz expected for a monomeric scFv. This predicted value was based on extrapolation from the experimentally determined amide proton line widths for a selection of proteins (15-45 kDa).
Mapping of Binding Sites-Residues involved in the interaction between the scFv and IL-1β were identified by the comparison of 15N/1H HSQC spectra acquired for the free and bound proteins (Figs. 3a and 4a), as described previously (18,19). It was not possible to obtain assignments for the free scFv (see above), and so significantly perturbed backbone amide signals were identified by minimal shift analysis (18,19). Assignments have been previously reported for IL-1β (32), and it proved relatively straightforward to obtain nearly complete backbone resonance assignments for both the free and the scFv-bound protein under our experimental conditions, which allowed the actual combined shifts in amide 15N and 1H to be determined (Fig. 4). For the scFv, residues with significantly perturbed backbone amides mainly lie in the CDRs (Fig. 3b), as expected for antibody binding.
Restraint-driven Docking-Analysis of the Cα, Cβ, and C′ assignments obtained for the scFv-IL-1β complex revealed that both proteins have an unchanged secondary structure in the complex (supplemental Fig. 2), which strongly suggests that neither protein undergoes a significant conformational change on complex formation. This clearly indicates that the backbone amide signal perturbations seen on complex formation reflect changes at the interaction sites and supports a restraint-driven docking approach to determine the structure of the scFv-IL-1β complex.
NOVEMBER 13, 2009 • VOLUME 284 • NUMBER 46

NMR-based Model of a scFv-IL-1β Complex Structure
Backbone amide chemical shift perturbation data were used to identify the scFv-IL-1β interface and to define ambiguous interaction restraints for docking, which resulted in 17 active and 25 passive residues being selected for the scFv and 26 active and 19 passive residues being selected for IL-1β. Information concerning the relative orientation of the two proteins was obtained from backbone amide RDC measurements for the complex (supplemental Fig. 3), with partial alignment achieved using Pf1 phage. RDC values were obtained for 107 residues of IL-1β (70%) and 166 residues of the scFv (65%), with data for the remaining residues missing due to overlap in the 15N/1H HSQC and TROSY spectra. For the scFv, 387 long range, intramolecular HN to HN NOE-derived distance restraints were also included in the docking calculations (119 sequential (i, i + 1), 102 medium (i, i ≤ 4), and 166 long range (i, i ≥ 5)), and for IL-1β, 240 long range, intramolecular restraints were included (78 sequential (i, i + 1), 58 medium (i, i ≤ 4), and 104 long range (i, i ≥ 5)).
The docking calculations using HADDOCK (25) produced one main cluster for the scFv-IL-1β complex, which contained 77 of the 200 calculated structures and is shown in Fig. 5a. A best fit superposition of the structures for the backbone atoms of residues in elements of regular secondary structure in both IL-1β and the scFv gives backbone atom (N, Cα, and C′) r.m.s.d.s to the mean of 0.7 ± 0.1 Å for IL-1β and 0.9 ± 0.2 Å for the scFv. The remaining structures were grouped into eight sparsely populated clusters, which all showed poorer agreement with the NMR data and higher energies.
The agreement between the measured RDC data and the two proteins in the scFv-IL-1β complex was greatly improved after docking and refinement, with Cornilescu quality factor (Q) values (33) reducing from 0.31 to 0.11 for IL-1β and from 0.48 to 0.11 for the scFv. The final set of converged scFv-IL-1β complexes is entirely consistent with the NMR-derived constraints used for docking, with no significant or consistent violations. The family of structures obtained was also validated by comparison of the backbone dihedral angle ranges indicated by TALOS analysis of the resonance assignments (276 pairs of φ and ψ angle ranges for the scFv and 166 pairs for IL-1β) with those observed in the converged complex structures. Consistent differences were found for only 2 residues in IL-1β (Glu-96 and Ile-106) and 6 residues in the scFv (Ala-51, Thr-69, Phe-91, Gly-184, Asn-229, and Lys-230), which probably reflects the known 2% error rate for TALOS predictions (15). The excellent agreement between the substantial NMR data available and the complex structures obtained clearly indicates that the approach reported here can produce reliable, high resolution models for scFv-target protein complexes. The scFv-IL-1β complex structures, together with the NMR constraints, have been deposited in the Protein Data Bank under accession number 2KH2.
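The quality factor quoted above compares measured and back-calculated RDCs, with lower values indicating better agreement. The sketch below uses the commonly cited form Q = rms(Dobs − Dcalc) / rms(Dobs); the text does not state which normalization was used in this work, so this form is an assumption, and the function name and example values are illustrative only.

```python
import math


def q_factor(observed, calculated):
    """Quality factor Q = rms(Dobs - Dcalc) / rms(Dobs).

    observed, calculated: equal-length sequences of RDCs in Hz, in the
    same residue order. Q is 0 for perfect agreement.
    """
    if len(observed) != len(calculated) or len(observed) == 0:
        raise ValueError("RDC lists must be non-empty and of equal length")
    residual = sum((o - c) ** 2 for o, c in zip(observed, calculated))
    norm = sum(o ** 2 for o in observed)
    if norm == 0:
        raise ValueError("observed RDCs are all zero")
    return math.sqrt(residual / norm)


if __name__ == "__main__":
    # Invented RDC values, for illustration only.
    d_obs = [10.0, -10.0, 4.0, -6.5]
    d_calc = [9.0, -11.0, 4.5, -6.0]
    print(round(q_factor(d_obs, d_calc), 3))
```

The number of RDCs cancels between numerator and denominator, so the ratio of sums of squares is equivalent to the ratio of root-mean-square values.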
ScFv-IL-1β Complex Structure-The structure obtained for the scFv-IL-1β complex is shown in Fig. 5 and is typical of that expected for an antibody-target protein complex, with residues in the CDR loops responsible for most of the contacts with IL-1β. The complex features a fairly large interface between the two proteins, with the buried surface area upon complex formation corresponding to 1930 ± 130 Å². Residues were considered to be involved in intermolecular contacts at the interface if the distance between neighboring atoms corresponded to less than the sum of their van der Waals radii plus 0.5 Å (34). On this basis, 18 residues from both the scFv and IL-1β contribute to the protein-protein interface, as summarized in Table 1. For the scFv, four of the six available CDRs (L1, L3, H2, and H3) make significant contacts with IL-1β, and interestingly, 3 framework residues (Asp-1, Asp-191, and Lys-194) are also found at the interface.
Figure legend fragment: 15N-labeled IL-1β in blue, with assignments indicated for both. Signals from a number of residues, such as Cys-8 and Val-19, clearly undergo significant shifts on complex formation, whereas others, such as Val-72, remain unperturbed. b, the histogram shows the combined backbone amide shifts seen for IL-1β on binding to the scFv. Regions of regular secondary structure are indicated by red bars for helices and blue arrows for β-sheets. c, the shift data are mapped onto a space-filled view of IL-1β, with significantly perturbed residues (shift >0.1 ppm) colored on a gradient from white to red. Residues highlighted in yellow are ones for which no chemical shift perturbation data were obtained. d, a ribbon representation of the IL-1β structure in the same orientation as c.
Docking Protocol Robustness-The robustness of the docking procedure and in particular its dependence on the completeness of the backbone amide RDC data were evaluated by repeating the docking calculations with subsets of the RDC data randomly removed.
The full set of backbone amide RDC data obtained covered 65% of the scFv residues and 70% of the IL-1β residues. Removal of up to 30% of the RDC data obtained, resulting in coverage of 46% of the scFv residues and 48% of the IL-1β residues, led to the production of several equally populated clusters from the docking calculations. However, the cluster of complex structures with the lowest overall energy and best agreement with the NMR data was closest to the scFv-IL-1β complex obtained with the full data set (backbone r.m.s.d.s of 1.5 ± 0.15 Å for all residues). As expected, removal of over 50% of the RDC data collected resulted in significant variability between the clusters of scFv-IL-1β complexes produced by docking, with no reliable criteria on which to select the correct complex structure. This analysis clearly illustrates the importance of obtaining a good coverage of RDC measurements to obtain a reliable complex structure by NMR restraint-driven docking. For the scFv-IL-1β complex reported here, this corresponds to backbone amide RDC data for over 60% of the residues in each protein, which is likely to reflect the requirement for tight protein complexes in general.
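The random removal of RDC data used in this robustness test can be sketched as follows. This is a hypothetical helper, not the script used in the study: the function names, the seeding, and the rounding of the number of restraints removed are all assumptions, and the RDC values in the usage example are placeholders.

```python
import random


def drop_rdcs(rdcs, fraction, seed=None):
    """Return a copy of rdcs with a random fraction of entries removed.

    rdcs: dict residue -> RDC (Hz). fraction: 0.0 to 1.0.
    A seed can be supplied so that a given subset is reproducible.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)
    residues = sorted(rdcs)
    n_keep = len(residues) - round(len(residues) * fraction)
    kept = rng.sample(residues, n_keep)
    return {res: rdcs[res] for res in sorted(kept)}


def coverage(n_rdcs, n_residues):
    """Percentage of residues in a protein with an RDC value."""
    return 100.0 * n_rdcs / n_residues


if __name__ == "__main__":
    # 166 placeholder RDCs, matching the number reported for the scFv.
    full = {res: 0.0 for res in range(1, 167)}
    subset = drop_rdcs(full, 0.30, seed=1)
    print(len(subset))
```

With 166 scFv RDCs, removing 30% leaves 116 values, which is consistent with the drop from 65% to roughly 46% coverage quoted above.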
The impact of the long range, intramolecular HN to HN NOE-derived distance restraints on the docking calculations was assessed by comparing the structures obtained for the scFv-IL-1β complex with and without these restraints included. The removal of the NOE-derived restraints had no significant effect on the complex structures obtained. This probably reflects the initial correctness of the input homology model for the scFv and input crystal structure for IL-1β, which both fully satisfied all relevant NOE restraints. However, the inclusion of NOE data would be very important in correcting any significant errors in homology models for scFvs or in interpreting large conformational changes in the proteins on complex formation.
ScFvs as Models for Fabs-To the best of our knowledge, no direct evidence has been previously presented to show that a scFv and Fab with identical VL and VH domains bind in an essentially identical manner to a target protein. A comparison of the 15N/1H HSQC spectra for 15N-labeled IL-1β bound to an equivalent scFv and Fab is shown in Fig. 6. The spectra of IL-1β bound to the scFv and Fab look nearly identical in terms of the positions of signals, which provides direct evidence that scFvs bind in a very similar manner to equivalent Fabs. Very subtle chemical shift differences can be seen in the N- and C-terminal regions of IL-1β, which are involved in interactions with the antibody fragments (Fig. 6 and supplemental Fig. 4). However, these are approximately one-tenth of those induced by scFv or Fab binding (Fig. 3b) and must reflect only very subtle variations in the interactions with the Fab and scFv, which are unlikely to be detectable by other NMR measurements, such as intermolecular NOEs, or by x-ray crystallography.

DISCUSSION
The work reported here clearly demonstrates that an NMR restraint-driven docking approach can be successfully used to determine a reliable and informative model for the structure of a scFv-target protein complex. One limitation of the approach described is the lack of direct experimental data to define the conformations of amino acid side chains in scFv-target protein interfaces, which in the model reported here are simply a best computational solution to optimize the interactions of side chains involved in the interface. The precise scFv-IL-1β interactions involving side chains should therefore be viewed as reasonable possibilities, rather than confirmed interactions, but provide a good basis for the assessment of potentially key interactions by further experimental work, such as site-directed mutagenesis. Detailed analysis of the scFv-IL-1β interface and comparison with other antibody-antigen structures are summarized in Table 1 and supplemental Table 2. The nature of the interface, in particular the high content of aromatic and non-polar residues and the size of the buried surface area, is consistent with a representative sample of previously reported antibody-target protein complexes (35-42).

TABLE 1 Summary of the interface residues and interactions made in the scFv-IL-1β complex
In addition to the expected contacts from CDR loops in the scFv, three framework (FR) residues also interact with IL-1β. Residues highlighted with an asterisk are involved in hydrogen bonds across the interface.
Of particular importance for the design and humanization of therapeutic antibodies is a knowledge of which residues from the antibody make interactions with the target protein. In the scFv-IL-1β complex, 18 of the scFv residues make contacts with IL-1β and are distributed equally between the variable heavy and light domains, as summarized in Table 1. Residues in CDRs L1 and H2 make the most interactions with IL-1β, with CDRs L2 and H1 playing no role in binding. Interestingly, of the eight representative antibody complexes selected for comparison from the Protein Data Bank (43), three (2GHW, 1LK3, and 1DZB) had interactions with the target protein involving framework residues as well as CDR loops (defined using the International ImMunoGeneTics (IMGT) information system (44)). This feature is also seen in the scFv-IL-1β structure reported here, with 3 scFv framework residues contacting IL-1β (Table 1). A comparison of the interface characteristics of antibodies selected as therapeutics with ones not selected specifically for this purpose (supplemental Table 3) reveals no obvious differences in the size of the contact surface, the types of interactions, or the locations of residues involved in interactions.
The strikingly similar spectra seen for IL-1β bound to equivalent scFv and Fab antibody fragments (Fig. 6) provide direct evidence that scFvs bind target proteins in an essentially identical manner to Fabs, which makes scFv-target protein complexes an attractive target for structural studies directed at a knowledge-based approach to rational design and humanization of antibodies. In addition to scFvs and Fabs targeted at IL-1β, we have obtained equivalent quality NMR data for complexes formed with a number of secreted proteins, including sclerostin (45). This strongly suggests that NMR spectroscopy will provide a reliable and fairly widely applicable approach for determining high resolution models of the structures of scFv-target protein complexes, which has the potential to have a major impact on the rational design, optimization, and humanization of therapeutic antibodies.