Kinetic and Affinity Predictions of a Protein-Protein Interaction Using Multivariate Experimental Design*

We measured the influence of 14 mutations and 5 environmental variables (buffer perturbation) on the association and dissociation rates of a camel single domain antibody (cAb-Lys3) interacting with hen egg white lysozyme using a surface plasmon resonance-based biosensor. Based on this data set, we constructed quantitative predictive models for both the kinetic constants (k a and k d) and the affinity constant (K d). Mutations, after parameterization by quantitative descriptors, and buffers were selected using multivariate experimental design. These models were able to predict the corresponding parameters of four new variants of cAb-Lys3. Moreover, the models provide insights into the important chemical aspects of the interacting residues, which are difficult to deduce from the crystal structure. Our approach provides useful physicochemical information on protein-protein interactions in general. The information obtained from this kind of analysis complements and goes beyond that of conventional methods like alanine scanning and substitution by closely related amino acids. The mathematical modeling may contribute to a rational approach in the optimization of biomolecules of biotechnological interest.

Protein-protein interactions attract a lot of attention in this era of proteomics and antibody technology. Naive and synthetic antibody libraries in combination with phage and ribosome display are now routinely used to obtain antibody molecules with desired properties and specificities (1-6). However, the affinity and/or stability of the original binder retrieved from the library often requires improvement. Molecular evolution techniques (1-8) and rational design using sophisticated computer programs (9-16) are two approaches to achieve this goal. Drawbacks to the former strategy are its dependence on the generation of large libraries, the difficulty in discriminating between binders with small differences in affinity, and the bias in favor of binders with improved expression levels (1,3). Additionally, codon bias and the generation of nonfunctional or deleterious mutants are major drawbacks of this strategy (1,3). The rational design strategy largely depends on the availability of the structure of the protein-protein complex, and, although predictive algorithms have been improved significantly over the years, predictions of interaction energies, especially those related to amino acid modifications, remain highly problematic (9-16).
It is generally accepted that experimental data on the energetic contribution of the modified amino acids are required to develop algorithms relating binding energetics to particular amino acid modifications. Many analyses have used site-directed mutagenesis, especially alanine scanning (17-21) and substitution by closely related amino acids (21-23), to determine the energetic contributions of the amino acid side chains and functional groups. These strategies have proven very attractive because of their simplicity. However, they often lead to misinterpretations and inconclusive results (21-25). One of the possible reasons for these failures is that quantitative measures like kinetic and affinity constants have been related to amino acid substitutions in a qualitative or intuitive way, effectively reducing the analysis to the presence or absence of a side chain or functional group. Other chemical property differences between amino acids are not considered. For instance, a tyrosine residue is often considered to have the same chemical nature as a phenylalanine, except for an additional hydroxyl group (19,21,23). It is not taken into account that addition or removal of functional groups of a residue may influence the chemical properties of the residue as a whole, thereby changing the chemical nature of the interaction more than intended. To analyze the effect of various physicochemical properties of the amino acids, replacement of a residue by a range of different amino acids may be required. In addition, the introduced modification has to be quantified to extract a mathematical correlation model. "Quantitative structure activity relationship" (QSAR) is a method of choice to construct such models and has already been used extensively in the field of rational drug design. QSAR uses descriptive scales (e.g. electronic, steric, and hydrophobic properties) to account for the observed correlation between the structure and activity of protein-ligand interactions (26-32). The QSAR models are predictive and are often combined with powerful multivariate statistical methods, thereby extracting a maximum of information from a minimum

* This work was supported by the Vlaams Interuniversitair Instituut voor Biotechnologie and the Fonds voor Wetenschappelijk Onderzoek Vlaanderen. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

In this paper, we demonstrate the use of QSAR to investigate the interaction of the camel single domain antibody, cAb-Lys3, with its antigen, hen egg white lysozyme (33,34). The cAb-Lys3 antibody fragment is a well studied model system, of which several crystal structures in complex with lysozyme are available (33)(34)(35).
Two residues of the CDR3 loop of cAb-Lys3 were replaced simultaneously by various amino acids, and quantitative descriptions for the substituted residues were used. In our analysis we related the sequence of these two positions to the kinetic parameters of the interaction. In addition, the binding of all mutants to lysozyme was characterized in several buffers containing a variety of chemical additives, to explore the sensitivity of binding resulting from changes in chemical environment for each mutant. Both the amino acid replacements and the buffer additives were selected using a multivariate experimental design.

MATERIALS AND METHODS
Reagents-All reagents were analytical grade. Hen egg white lysozyme (HEL) was purchased from Roche Molecular Biochemicals. PDEA, EDC, NHS, ethanolamine, and dithioerythritol were obtained from Biacore AB, and cystamine was obtained from Sigma-Aldrich. Restriction enzymes were purchased from Invitrogen.
Amino Acid Numbering and Notation-The amino acids are numbered as they occur in the linear sequence of cAb-Lys3. Positions 101 and 105 correspond to 97 and 100a, respectively, in the Kabat numbering (36). The cAb-Lys3 mutants are referred to by a two-letter code in which the first and second letters represent the amino acid (single-letter code) at positions 101 and 105, respectively. The wild type cAb-Lys3 is thus represented by TS because it possesses threonine and serine at the respective positions.
Selection of the Positions for Modification-We needed to identify positions of cAb-Lys3 where an amino acid substitution would modulate antigen binding but not abolish the interaction. Such positions are found at the periphery of the interaction interface (37). We analyzed the crystal structure data of cAb-Lys3 in complex with HEL (34) and looked for (i) two amino acids, (ii) within the CDR3 loop, (iii) that were not in direct contact with each other, and (iv) that were at the edge of the antibody/antigen interaction surface. The residues at positions 101 and 105 comply with those criteria. In addition, we were attracted by Ser-105 because it is located at the tip of the CDR3 loop of cAb-Lys3 and is in contact with the catalytic residues Glu-35 and Asp-52 of HEL (34). Furthermore, we knew from previous (unrelated) experiments that Ser-105 could be replaced without dramatic effects on the cAb-Lys3 affinity for HEL.
Experimental Design of the Amino Acid Replacements-In the selection of the amino acid substitutions at positions 101 and 105 of cAb-Lys3, we excluded tryptophan, tyrosine, and phenylalanine at either position because inspection of the crystal structure revealed that these residues at those positions would lead to large steric clashes with the antigen. Additionally, cysteine was not included to avoid dimerization and folding problems of the recombinant mutant proteins.
The amino acids at positions 101 and 105 were parameterized using three scales: ZZ1, ZZ2, and ZZ3, representing, respectively, hydrophilicity, size/polarizability, and electronic properties (polarity, electrophilicity, charge, electronegativity) (27) (Table I). To resolve six parameters in the model, more than six mutants are required. We designed eight new mutants, mainly double mutants, and added six already available single mutants (TG, TA, TP, TH, TQ, TN). Therefore, 14 single and double mutants at position 101 and 105 of cAb-Lys3 were envisaged. The final selection of the eight new mutants was based on the optimization of the condition number of the design matrix (29,38). The condition number is a measure of the degree of linear dependence between the rows of the descriptor-matrix. For a statistically perfect design, the condition number would be 1 (29). Condition numbers between 1 and 5 are considered to be good; above 5 the design becomes progressively less acceptable (29). The optimization of the condition number was performed by manually testing various combinations of double amino acid mutations (positions 101 and 105) combined with the preset single mutants (position 105), until a condition number of 2.81 was obtained. An ordinary D-optimal design generator could not be applied here, because this algorithm assumes that all descriptors can be varied independently, which is not the case for amino acids (three parameters are fixed per amino acid).
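The condition-number criterion can be illustrated with a short sketch. It assumes that the condition number is the ratio of the largest to the smallest singular value of the descriptor matrix after centering and unit-variance scaling of each column; the exact scaling used by the design software is an assumption here. The two matrices are hypothetical placeholders, not the ZZ-scale values of Table I.

```python
import numpy as np

def condition_number(descriptors):
    """Ratio of largest to smallest singular value of the column-centered,
    unit-variance-scaled descriptor matrix (scaling choice is an assumption)."""
    x = np.asarray(descriptors, dtype=float)
    x = x - x.mean(axis=0)
    x = x / x.std(axis=0, ddof=1)
    singular_values = np.linalg.svd(x, compute_uv=False)
    return singular_values[0] / singular_values[-1]

# Hypothetical 8 x 3 design: a two-level full factorial in three descriptors.
orthogonal = np.array([[a, b, c] for a in (-1, 1) for b in (-1, 1) for c in (-1, 1)])

# Same design, but with the third descriptor nearly collinear with the first.
collinear = orthogonal.astype(float).copy()
collinear[:, 2] = orthogonal[:, 0] + 0.1 * orthogonal[:, 2]

cond_orthogonal = condition_number(orthogonal)
cond_collinear = condition_number(collinear)
print(f"orthogonal design: {cond_orthogonal:.2f}")
print(f"collinear design:  {cond_collinear:.2f}")
```

A perfectly orthogonal design gives a value of 1, whereas strongly correlated descriptor columns push the value well above the threshold of 5 mentioned in the text.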
Generation of the cAb-Lys3 Mutants-Single mutants TG, TA, TP, TM, TH, TQ, and TN were initially constructed by randomization of the codon at position 105 in a cAb-Lys3 pHEN4 vector (39) by PCR. The sequences of the single mutants were analyzed on an ALF automatic sequencer (Amersham Biosciences), and the mutants were recloned in a vector in which the hemagglutinin tag and geneIII between the NotI and EcoRI sites were replaced by a fragment encoding six consecutive histidine codons followed by a termination codon. This modified expression vector is called pHEN6 cAb-Lys3 (39). The mutants LS, VN, RT, PG, SQ, HD, QP, and MV of cAb-Lys3 were constructed in a pHEN6 cAb-Lys3 vector by site-directed mutagenesis of closed circular DNA in vitro.
Expression and Purification-Periplasmic expression, extraction, and purification by immobilized metal affinity chromatography and gel filtration of mutants and wild type cAb-Lys3 were performed according to Lauwereys et al. (39). The concentration of the mutants and wild type cAb-Lys3 was determined spectrophotometrically at 280 nm using the calculated sequence-based extinction coefficient (41), which is equal to 27,220 M−1 cm−1 for all mutants.
Immobilization of PDEA-modified Lysozyme-Lysozyme was modified with PDEA to introduce reactive disulfide groups before immobilization via surface thiol coupling, as described in the BIAapplications Handbook (42). This immobilization method was chosen based on the closest approximation to a 1:1 binding model for the binding of eight mutants and the wild type cAb-Lys3 to the immobilized ligand.
PDEA-modified lysozyme dissolved in 10 mM NaAc buffer, pH 5.5, was coupled to CM5 chips at a flow rate of 5 μl/min. Two flow cells (fc1 and fc3) were incubated with 35 μl of EDC/NHS (mixture of EDC (200 mM) and NHS (50 mM)) and subsequently with 35 μl of 1 M ethanolamine, pH 8.5, and served as reference surfaces for the kinetic measurements. Two other flow cells (fc2 and fc4) were used to immobilize PDEA-modified lysozyme at 80 RU and 500 RU, respectively. These two flow cells were treated with 10 μl of EDC/NHS and subsequently with 15 μl of cystamine dihydrochloride, followed by 15 μl of dithioerythritol. Manual injection of the PDEA-modified lysozyme was then performed until the targeted immobilization levels were obtained. Excess reactive thiols on the chip were deactivated by 20 μl of PDEA-NaCl.
Experimental Design of the Perturbation Buffers-All buffers used were phosphate-buffered saline (PBS)-based. The chemical factors NaCl, urea, Me2SO, KSCN, and pH, and their ranges, were chosen as detailed by Andersson et al. (29) and Choulier et al. (38). To analyze the main effects of these five chemical factors on the (mutant) antibody-antigen interaction, a standard resolution III fractional factorial design was applied, which consists of 11 experiments (2^(5-2) = 8 buffers to probe the high and low values of the five factors without resolving interaction effects, plus three replicate center points) (see Table II) (43).
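The structure of such a design can be sketched as follows. The sketch builds a coded two-level 2^(5-2) design from a full factorial in three base factors and adds three center points, giving 11 runs. The generators (fourth column = product of the first two, fifth column = product of the first and third) and the assignment of factors to columns are assumed, conventional choices; the actual levels are those of Table II and are not reproduced here.

```python
from itertools import product

FACTORS = ["NaCl", "urea", "Me2SO", "KSCN", "pH"]

def fractional_factorial_2_5_2(n_center=3):
    """Coded (-1 / 0 / +1) 2^(5-2) design plus replicate center points.
    Generators D = A*B and E = A*C are an assumption for illustration."""
    runs = [(a, b, c, a * b, a * c) for a, b, c in product((-1, 1), repeat=3)]
    runs.extend([(0, 0, 0, 0, 0)] * n_center)
    return runs

design = fractional_factorial_2_5_2()
for number, run in enumerate(design, start=1):
    print(number, dict(zip(FACTORS, run)))
```

Because all five columns are mutually orthogonal, the main effect of each factor can be estimated independently, while two-factor interactions remain confounded with main effects, as expected for resolution III.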
Measurements/Fittings/Calculations-Four Biacore® 3000 instruments and four Sensor Chips (CM5) were used to obtain the data at 25°C and 30 μl/min, with PBS supplemented with 3 mM EDTA and 0.005% surfactant P20 (Table II) as running buffer. Replicate runs were performed on a different chip and instrument. Each mutant was characterized in PBS at four protein concentrations ranging from 12 μM to 31.25 nM, depending on the affinity of the mutant for HEL, and was subsequently tested in 11 perturbation buffers (in randomized run order). The mutants were injected for 2 min, and dissociation data were collected during 5 min (perturbation buffers) or 10 min (PBS). Regeneration after each cycle was performed using 10 mM glycine-HCl, pH 1.0, for 2 min.
Control injections using 1500 nM TG mutant in PBS were performed before and after two kinetic cycles to confirm that the activity of the immobilized lysozyme did not change over time.
Zero concentration data (injection of buffer alone) were always subtracted from the sensorgrams before fitting. Kinetic parameters were obtained by global fitting of the sensorgrams to a 1:1 model using BIAevaluation 3.1 software. The R max parameter, representing the binding capacity of the chip, was fixed to the average equilibrium response of the TG injections.
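For orientation, the 1:1 model underlying the global fit can be written in closed form. The sketch below evaluates the standard Langmuir expressions for an injection followed by dissociation; it is not the BIAevaluation implementation, it ignores mass transport and baseline drift, and all numerical values (rate constants, concentration, R max) are hypothetical.

```python
import numpy as np

def sensorgram_1to1(t, conc, ka, kd, rmax, t_inject=120.0):
    """Response (RU) for a 1:1 interaction: association until t_inject (s),
    followed by exponential dissociation in running buffer."""
    t = np.asarray(t, dtype=float)
    k_obs = ka * conc + kd                      # observed association rate (1/s)
    r_eq = rmax * ka * conc / k_obs             # equilibrium response at this conc
    assoc = r_eq * (1.0 - np.exp(-k_obs * np.minimum(t, t_inject)))
    decay = np.exp(-kd * np.clip(t - t_inject, 0.0, None))
    return assoc * decay

# Hypothetical example: 2-min injection, 5-min dissociation, one time point per second.
t = np.linspace(0.0, 420.0, 421)
r = sensorgram_1to1(t, conc=1e-6, ka=1e4, kd=1e-2, rmax=80.0)
print(f"response at end of injection: {r[120]:.1f} RU")
print(f"response at end of dissociation: {r[420]:.1f} RU")
```

In a global fit, one k a and one k d are shared across all concentrations of a mutant, with R max held fixed as described above.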
Mathematical Modeling-All mathematical modeling was performed in Modde 5.0 (Umetrics AB, Umeå, Sweden) essentially according to Andersson et al. and Choulier et al. (29,38). A "quantitative sequence kinetic relationship" (QSKR) (29,38) mathematical model relating the logarithm of k a , k d , and K d in PBS buffer to the ZZ scales at position 101 and 105 of the mutants in the design was obtained using multiple linear regression (MLR). Logarithmic transformation of the kinetic and affinity constants was required to get normally distributed data (a prerequisite for MLR) (27,29).
A "quantitative buffer kinetic relationship" (QBKR) (29, 38) model relating the logarithm of k a , k d , and K d of the mutants in the design to the buffer composition was performed for each replicate separately using MLR. Again, logarithmic transformation of the kinetic constants and affinity constant was necessary to obtain normally distributed data. For each QBKR, a chemical sensitivity fingerprint was derived according to Andersson et al. (29) and Choulier et al. (38) by normalizing the coefficients (dividing the coefficients by the constant term in the linear model). "Quantitative sequence perturbation relationship" (QSPR) was performed by relating the sequence descriptors (ZZ scales) of the mutants with the chemical sensitivity fingerprints using partial least squares (PLS).
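The derivation of a chemical sensitivity fingerprint can be sketched as follows: a linear model is fitted to the logarithm of a binding constant as a function of the coded buffer factors, and each factor coefficient is divided by the constant term. The design generators, the coefficient values, and the resulting log K d values are hypothetical and noise-free; they are not measured data.

```python
import numpy as np
from itertools import product

# Coded 2^(5-2) design with three center points (generators are assumed).
base = [(a, b, c, a * b, a * c) for a, b, c in product((-1, 1), repeat=3)]
design = np.array(base + [(0, 0, 0, 0, 0)] * 3, dtype=float)

def sensitivity_fingerprint(design, log_response):
    """Fit log_response = b0 + design @ b by least squares and return b / b0."""
    x = np.column_stack([np.ones(len(design)), design])
    coef, *_ = np.linalg.lstsq(x, log_response, rcond=None)
    return coef[1:] / coef[0]

# Hypothetical model for one variant: constant term and five factor coefficients.
true_b0 = -7.5
true_b = np.array([0.30, -0.15, 0.05, 0.00, -0.45])
log_kd = true_b0 + design @ true_b

fingerprint = sensitivity_fingerprint(design, log_kd)
labels = ["NaCl", "urea", "Me2SO", "KSCN", "pH"]
print({name: round(float(value), 3) for name, value in zip(labels, fingerprint)})
```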
The quality of all the mathematical models was estimated using leave-one-out cross-validation (Q 2) and the correlation coefficient (R 2). Q 2 reflects the fraction of the variance of the data that can be predicted by the model, as opposed to the correlation coefficient R 2, which reflects the fraction of the variance in the responses that can be explained by the model. A large Q 2 (≥0.7) indicates that the model has good predictive ability and will have small prediction errors (44).
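The two quality measures can be made concrete with a minimal sketch of multiple linear regression with leave-one-out cross-validation. It assumes the usual definitions R 2 = 1 − SS(residual)/SS(total) and Q 2 = 1 − PRESS/SS(total); whether Modde computes them in exactly this way is an assumption. The descriptor matrix and responses are synthetic (14 hypothetical variants, 6 descriptors), not the data of this study.

```python
import numpy as np

def fit_mlr(x, y):
    """Least-squares coefficients for y = b0 + x @ b (intercept first)."""
    design = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict(coef, x):
    return coef[0] + np.asarray(x) @ coef[1:]

def r2_q2(x, y):
    """R2 from the full fit; Q2 from leave-one-out predictions."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    ss_total = np.sum((y - y.mean()) ** 2)
    ss_resid = np.sum((y - predict(fit_mlr(x, y), x)) ** 2)
    press = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i           # leave variant i out
        coef = fit_mlr(x[keep], y[keep])
        press += (y[i] - predict(coef, x[i])) ** 2
    return 1.0 - ss_resid / ss_total, 1.0 - press / ss_total

# Synthetic example: 14 variants x 6 descriptors with hypothetical coefficients.
rng = np.random.default_rng(0)
x = rng.normal(size=(14, 6))
beta = np.array([-0.4, -0.6, -0.3, 0.5, -0.2, -0.7])
y = 5.0 + x @ beta + rng.normal(scale=0.1, size=14)

r2, q2 = r2_q2(x, y)
print(f"R2 = {r2:.2f}, Q2 = {q2:.2f}")
```

For ordinary least squares, Q 2 computed this way can never exceed R 2, because each leave-one-out residual is at least as large in magnitude as the corresponding fitted residual.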

RESULTS
Properties of the Target Positions-The crystal structure of cAb-Lys3 in complex with HEL revealed that Thr-101 and Ser-105 are at the edge of the interaction interface (34). These residues are part of the protruding CDR3 loop of cAb-Lys3, and their atoms are not in direct contact with each other in the folded protein (Fig. 1). The residues contact lysozyme via their side chains. The methyl group of Thr-101 fills a hydrophobic cavity formed by residues Asn-103 and Ala-107 of lysozyme and the aromatic ring atoms of Tyr-103 of cAb-Lys3. The polar hydroxyl group of Thr-101 points toward the solvent (Fig. 2A). Ser-105 is located at the tip of the CDR3 loop of cAb-Lys3 and contacts Gln-52, Glu-35, and Asp-52 of HEL (Fig. 2B). With the possible exception of large aromatic side chains, mutants can be generated at these positions that disturb the overall antibody-antigen contact only minimally. Thus, the cAb-Lys3 amino acids at these positions (101 and 105) were chosen to be replaced. The mutations are expected to modulate binding rather than abolish it. Indeed, previous affinity measurements of the single mutants at position 105 revealed that Ser-105 in the wild type cAb-Lys3 is not crucial for binding, although the affinity can be severely influenced (see, for example, Table IV (parts a and b) with TN, TH, TQ, TA, TG, TP, and TM having a K d ranging from 700 to 9 nM).
Experimental Design of the Amino Acid Replacements-The selection of the mutants at positions 101 and 105 was performed as described (see "Experimental Design of the Amino Acid Replacements" under "Materials and Methods"). The result of the experimental design is a matrix of 14 mutants × 6 ZZ scales (Table III, part a) with condition number 2.81. The double mutants SS and PS (generated fortuitously by an artifactual chemical synthesis of the mutagenic primer intended to encode SQ and PG, respectively), the mutant TM, and the wild type cAb-Lys3 (TS) (Table III, part b) were not included in the design, because they were not necessary to improve the condition number.
The mutants cover a broad range in k a, k d, and K d (1000-100,000 M−1 s−1, 0.0004-0.3 s−1, and 20,000-10 nM, respectively; Table IV, parts a and b).
The values of the kinetic constants and affinity constant were highly reproducible for the replicate runs, as well as between the two immobilized flow cells on the same sensor chip. No mass transfer limitations were detected. All binding traces fitted well to the 1:1 binding model.
The QSKR analysis resulted in models with very good statistical properties: (R 2 , Q 2 ) of (0.93, 0.88), (0.83, 0.85), (0.85, 0.77) for log k a , log k d , and log K d , respectively, and they are described by the following mathematical equations. Equations 1-3 were used to predict the values of k a , k d and K d for three additional mutants of cAb-Lys3 not included in the design, as well as for the wild type cAb-Lys3 (Tables III (part b) and IV (part b)). Fig. 3 (A-C) shows the observed versus predicted values for log k a , log k d , and log K d for all cAb-Lys3 variants. The plot shows good predictability for all three constants. From Fig. 3 we also see that the absolute values of k a , k d , and K d of the four extra proteins are predicted by their models within an order of magnitude.
Inspection of the residuals versus predicted responses (Fig. 3, D-F) shows that the fit to these linear models is very good, that there are no important interaction terms between the amino acids at the two mutated positions, and that there are no important interactions between the ZZ scales per position. This is also reflected in the two double mutant cycles present in the data, namely SQ-TQ-TS-SS-SQ and PG-TG-TS-PS-PG.
Plots of scaled and centered coefficients of the models are presented in Fig. 4. We observe that both positions affect the affinity and kinetics of the interaction. The largest negative effects on log k a are those of ZZ2 101, ZZ3 105, ZZ3 101, ZZ2 105, and ZZ1 101 (in descending order). ZZ1 105 has a quite significant positive effect on log k a.
The large negative effect of ZZ3 105 is a striking feature in the log k d model. ZZ3 101 , ZZ2 101 , ZZ1 105 , ZZ1 101 , and ZZ2 105 in descending order of importance determine positive effects.
The log K d model, obtained by subtracting the log k a model from the log k d model (log K d = log k d − log k a), reflects the global binding energy of the interaction as a function of the factors under consideration. In descending order of importance, ZZ3 105 and ZZ1 105 are the negative factors and ZZ3 101, ZZ2 101, ZZ2 105, and ZZ1 101 are the positive factors of the log K d model (Fig. 4C).
By comparing the importance of each ZZ scale (Fig. 4) to the physicochemical properties that they represent (Table I), we find that, at position 101, small hydrophobic/apolar amino acids are optimal for all three parameters of binding. At position 105 we observe that a small polar and electrophilic amino acid would be optimal for k d and K d , whereas a fast on-rate (k a ) would be favored by a small apolar (and nonelectrophilic) amino acid.
FIG. 3. Observed versus predicted values for k a (A), k d (B), and K d (C) and the corresponding residuals versus predicted log k a (D), log k d (E), and log K d (F).

TABLE IV. Kinetic and affinity constants of the mutants. The measurements were performed in replicate: k a 1 and k a 2 represent the k a of the first and the second independent measurement, respectively. The same notation was used for the replicates of k d and K d. Part a, kinetic and affinity constants of the mutants in the design.

QBKR-The QSKR analysis uses sequence modifications to perturb the protein-protein interaction, to extract information on the physicochemical characteristics of the interaction. However, the QSKR perturbation is restricted in this case by the fact that only 19 mutations can be constructed per residue position. Alternatively, perturbation of the interaction can also be achieved by changing environmental parameters. Therefore, we extended our analysis by measuring the binding kinetics of all our mutants in a number of perturbation buffers. The data for each of the 18 cAb-Lys3 variants (14 designed mutants, TM, SS, PS, and wild type cAb-Lys3 (TS)) in this analysis consist of the kinetic and affinity constants measured in 11 perturbation buffers in which five chemical factors were varied: NaCl, urea, KSCN, Me2SO, and pH.
The values of the kinetic constants and affinity constant were highly reproducible for the replicate runs as well as between the two immobilized flow cells on the same sensor chip. No mass transfer limitations were detected. All binding traces fitted well to the 1:1 binding model. Because of very fast kinetics, no data for k a and k d could be obtained for the binding of the SQ mutant in buffers 1, 3, 6, 9, and 11. For these runs K d data were obtained from the equilibrium responses.
For most mutants we obtained very good statistical properties for the QBKR models. Some mutants (e.g. SQ and TH) resulted in models of poor quality caused by high or low k d values, which were barely perturbed by the buffer additives. These data were therefore excluded during further analysis.
Coefficient plots for log k a , log k d , and log K d for the wild type cAb-Lys3 are shown in Fig. 5. The sensitivity fingerprints for log K d are given for both replicates of each mutant in Table V. (The sensitivity fingerprints for log k a and log k d can be obtained from the corresponding author upon request.) From Table V we observe that the normalized coefficients for KSCN of the TQ, TH, and RT mutants differ in sign between the replicates because the QBKR coefficient for KSCN falls within the noise level.
QSPR-In this analysis we correlated the sensitivity fingerprints of the different mutants to the ZZ scale matrix. The result is a sensitivity model for the kinetic constants for a given chemical substance related to the ZZ scales at positions 101 and 105. For example, the sensitivity of log k a to changes in NaCl concentration is represented by the following model:
In contrast to the coefficients in the QSKR and QBKR models, which are directly proportional to the binding parameter, the coefficients in the QSPR model are proportional to the variation of the binding parameter. Depending on the sign (+/−) of the coefficients in the sensitivity fingerprint, the interaction parameter increases or decreases in magnitude upon increasing levels of the added chemical. Correlation of the sensitivity fingerprint and the ZZ scale matrix was obtained by PLS. Only the sensitivity models for NaCl yielded Q 2 values above 0.5, and we therefore limit our analysis to these models. The NaCl sensitivity models are called NaCl Sens k a, NaCl Sens k d, and NaCl Sens K d for k a, k d, and K d, respectively, and yielded Q 2 values of 0.72, 0.52, and 0.68.
From Fig. 6 we see that ZZ2 101 has a large positive effect on the salt sensitivity for K d , whereas ZZ2 105 has the largest negative effect. This salt sensitivity is predominantly manifested in the k a parameter for position 101 and in the k d parameter at position 105.
The interaction of cAb-Lys3 with lysozyme will have a larger decrease in K d when the salt concentration increases if one or both of the following two conditions are fulfilled: 1) large hydrophobic and electrophilic amino acids at position 101 and 2) a small hydrophilic amino acid at position 105.

FIG. 4. Scaled and centered coefficients of the models for log k a (A), log k d (B), and log K d (C). The coefficients were scaled and centered by orthogonal scaling (Modde 5.0 user guide and tutorial (44)). The error bars represent the 95% confidence interval.

FIG. 5. Coefficient plots of the wild type cAb-Lys3 models for log k a (A), log k d (B), and log K d (C) of the QBKR analysis, a so-called "fingerprint." The coefficients were scaled and centered by orthogonal scaling to be comparable (Modde 5.0 user guide and tutorial (44)). Both replicate runs were used to obtain the plots. The R 2 and Q 2 values of the MLR are presented to demonstrate the validity of the models. The error bars represent the 95% confidence interval.

DISCUSSION
QSKR-Affinity maturation in vitro is a major challenge in antibody engineering. Binders from naive or synthetic scFv libraries with the required antigen specificity often have suboptimal affinities. Molecular evolution techniques have been employed as a strategy to improve the affinities of moderate binders (1-8). However, these methods are based on the random insertion of mutations and reselection, so that a favorable mutation might be cancelled by a negative mutation elsewhere in the sequence. Therefore, rational design of modifications within the sequence of these moderate binders constitutes, in the longer term, a preferred approach to optimize the affinity (9-16). Although many attempts have already been made to predict the effect of amino acid modifications on the affinity, these predictions remain highly problematic, and the molecular evolution techniques remain largely the methods of choice. In the present study, we aimed at a rational approach to probe two interacting residues in a protein-protein interface and to extract a model that quantifies the contribution of three descriptor variables, ZZ1-3, to the interaction parameters, k a, k d, and K d, measured in a standard buffer. The cAb-Lys3/lysozyme complex was taken, and positions 101 and 105 of cAb-Lys3 were chosen to be modified. These residues are located at the edge of the interface and interact with lysozyme via their side chains. The residues at the edge of the interface are the best targets for affinity optimization, because they possess more mutational flexibility than the amino acids located at the center of the protein-protein interface, which often determine the bulk of the affinity of the interaction (37). It has also been shown for antibodies that in vivo maturation is accompanied by an increased variability of the amino acids of the CDR regions located at the edges of the interface (45). Mutations at these positions exert moderate effects on the association and dissociation kinetics of the interaction (37).
The results of our analysis show that the linear models correlating the logarithm of k a, k d, and K d to the ZZ scales of the amino acids introduced by mutagenesis describe and predict the observed data very well. With Q 2 values of 0.88, 0.83, and 0.77, they are as good as or even better than previously obtained models for kinetic and binding parameters using QSAR methods tested on enzyme/substrate or antibody/oligopeptide systems (26-32, 38). The mutants not included in the design (Tables III (part b) and IV (part b); TM, SS, PS, and TS (wild type)), but within the ZZ scale ranges of the design, were predicted successfully within an order of magnitude. This is remarkable because, as a result of the logarithmic transformation of the data, small deviations from the model lead to quite large deviations in absolute value. The physical interpretation of the logarithm of the kinetic and affinity constants is an evaluation of the activation and binding energy of the interaction, as opposed to the absolute values of k a, k d, and K d (ΔG°‡(k a) = −RT ln k a, ΔG°‡(k d) = −RT ln k d, ΔG° = −RT ln K d) (20,21). In this regard, we consider the model to be a function of energy terms corresponding to the coefficients of the ZZ scales.
In energy terms, the standard deviations of the prediction residuals for association, dissociation, and affinity are 0.35, 0.49, and 0.56 kcal/mol, respectively, for all mutants and the four variants not included in the design, indicating that the models have good predictive power. These values are surprisingly precise because binding energy measurements have typical errors of 0.5 kcal/mol (24,25), and computational modeling analyses of binding energies of mutants frequently have errors exceeding 3 kcal/mol (9-12, 14-16, 21), even when training sets are used.
The models clearly quantify the contributions of the ZZ1-3 scales at each position in all three models, and no evidence for major interaction between the modified positions could be detected. At position 101, small hydrophobic, apolar, nonelectrophilic amino acids are optimal for all three parameters of binding, whereas at position 105 different patterns are observed among the three models. A small, hydrophilic (nonelectrophilic) amino acid would be optimal for a high k a, whereas a polar and electrophilic amino acid would be optimal for a low k d and K d. This leads to different amino acid preferences for the three models at position 105. Glycine, threonine, and glutamine are predicted to be optimal to achieve a high k a. To obtain a low k d, proline, aspartic acid, and serine are predicted to be optimal, whereas aspartic acid, serine, glycine, and proline would result in an optimal (i.e. low) K d constant. This clearly shows that the importance of the chemical properties of the amino acid at this position is different when considering activation energy (reflected in the k a parameter) and the Gibbs free energy of binding (ΔG°) (reflected in the K d parameter). Energetic contributions of interacting amino acids in protein-protein interactions have mostly been evaluated by the ΔG° of the interaction. Evaluation of activation energy could also be very useful in elucidating docking trajectories of protein-protein interactions (20,21). Fig. 4 reveals that the effect of ZZ1 105, important in both the log k a and log k d models, is neutral in the log K d model, because of the positive coefficient (of approximately equal magnitude) of this factor in the log k a and log k d models (log K d = log k d − log k a). If the analysis were limited to K d (ΔG°), the importance of this factor could not be retrieved.
FIG. 6. Column plot representation of the PLS coefficients for the NaCl Sens k a (A), NaCl Sens k d (B), and NaCl Sens K d (C) models of the QSPR analysis. The coefficients were scaled by unit variance to be comparable (44). The error bars represent the 95% confidence interval.

Thr-101 in wild type cAb-Lys3 has a methyl group occupying a relatively hydrophobic cavity in the interface of cAb-Lys3 and lysozyme (Fig. 2A). In conventional mutagenesis strategies, T101S is considered to be a conserved amino acid replacement. We calculated from the crystal structure that this replacement would introduce a cavity of 34 Å3 in the interface. From Table IV, part b, we infer that the SS mutant leads to a 20-fold reduction in affinity compared with TS. The introduction of a cavity results in this case in a loss of 1.8 kcal/mol, which is about 1 kcal/mol larger than the estimated loss in binding energy for a cavity of this size (24-34 cal/mol Å3) (25). However, our QSKR model confirms that ZZ3 101 is an important factor in binding. ZZ3 corresponds to the polarity and electrophilicity of the amino acid (27). Serine, as seen from Table I, has a higher ZZ3 value than threonine; therefore, steric aspects alone do not determine the loss in binding energy of the T101S replacement. This rationale is corroborated by the observation that a valine at position 101 (VN compared with TN in Table IV, part a) leads to a higher affinity than the wild type residue because of its lower ZZ3 value. This shows that estimation of the contribution of functional groups by conventional methods, where the functional group is considered to be independent from the rest of the side chain, would lead to incorrect interpretation of the contribution to the binding energy of the interaction.
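The energy values quoted for the T101S replacement can be checked with a short calculation. The sketch assumes that a fold change in K d is converted to a free-energy difference by RT ln(fold change) at the measurement temperature of 25°C; the function and variable names are illustrative only.

```python
import math

R_KCAL = 1.987e-3    # gas constant in kcal mol^-1 K^-1
T_KELVIN = 298.15    # 25 degrees C

def ddg_from_fold_change(fold):
    """Free-energy difference (kcal/mol) for a fold change in Kd: RT ln(fold)."""
    return R_KCAL * T_KELVIN * math.log(fold)

# 20-fold reduction in affinity of SS compared with TS.
ddg_t101s = ddg_from_fold_change(20.0)

# Expected loss for a 34 cubic-angstrom cavity at 24-34 cal/mol per cubic angstrom.
cavity_low = 34.0 * 24.0 / 1000.0    # kcal/mol
cavity_high = 34.0 * 34.0 / 1000.0   # kcal/mol

# Energy equivalent of one log10 unit (one order of magnitude) in Kd.
one_log_unit = ddg_from_fold_change(10.0)

print(f"T101S: {ddg_t101s:.2f} kcal/mol")
print(f"cavity estimate: {cavity_low:.2f}-{cavity_high:.2f} kcal/mol")
print(f"one order of magnitude: {one_log_unit:.2f} kcal/mol")
```

The difference between the observed loss and the lower bound of the cavity estimate is close to the 1 kcal/mol stated in the text.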
We observed that the wild type amino acid sequence at positions 101 and 105 is quasi-optimal for binding, as can be expected from an antibody matured in vivo. However, our analysis suggests that the affinity can still be improved. At position 101 a valine (V) is predicted to give a better affinity than the wild type residue, and at position 105 an aspartic acid (D) would be favored, although the predicted differences between wild type TS and the predicted VD mutant are small (30 nM compared with 6 nM). It is clear from the comparison of the TN with the VN mutant (Table IV, part a) that the latter mutant has a 2-fold lower kd value than the former. Because of the high accuracy of kd constants in surface plasmon resonance (29), we consider this difference to be significant. This shows that, although differences in binding strength are small, QSKR gives supportive evidence that these small differences in observed affinities and kinetic constants are real. Extension of this analysis to all interacting residues in the binding interface could result in a considerable increase in the binding strength of the interaction by optimizing a number of contacts that show small differences in binding energy compared with the wild type protein. This could also lead to a rational design strategy to improve affinities or kinetics of binders isolated from naive or synthetic antibody libraries (1)(2)(3)(4)(5)(6). Using a limited set of double or triple mutants and generating predictive models, we may be able to increase the affinity considerably. We showed that the combination of two mutated residues in the binding interface can lead to a 2000-fold difference in affinity (Fig. 3C and Table IV), even though the residues play a secondary role in binding.
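To put the quoted affinities on an energy scale, the predicted gain for the VD mutant and the 2000-fold spread among the double mutants can be converted to free energy differences. This is a sketch under the same assumptions as any such conversion: ΔΔG = RT ln(Kd ratio) and a temperature of 25 °C, which is assumed here and not taken from the text. The helper name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)
TEMP_K = 298.15       # ASSUMPTION: 25 degrees C


def ddg_between(kd_ref, kd_new, temp_k=TEMP_K):
    """ddG (kcal/mol) of kd_new relative to kd_ref; negative means tighter binding."""
    return R_KCAL * temp_k * math.log(kd_new / kd_ref)


# Kd values quoted in the text (nM): wild type TS and the predicted VD mutant.
ddg_vd = ddg_between(30.0, 6.0)

# A 2000-fold difference in affinity, expressed as a ratio of 2000 to 1.
ddg_span = ddg_between(1.0, 2000.0)

print(f"VD versus TS:        {ddg_vd:.2f} kcal/mol")
print(f"2000-fold difference: {ddg_span:.2f} kcal/mol")
```

With these inputs the predicted improvement for VD is a little under 1 kcal/mol, whereas the full range covered by the double mutants is about 4.5 kcal/mol.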
QBKR and QSPR-In the QBKR analysis, we obtained reliable models relating the effect of NaCl, urea, KSCN, Me2SO, and pH to the kinetics and affinity of the interaction of the cAb-Lys3 variants with lysozyme. The models were converted to chemical sensitivity fingerprints, which were related to the ZZ scale matrix (Table III, part a) in a QSPR analysis. Here we probe the effect of ZZ1-3 at positions 101 and 105 in the sequence on the sensitivity of the binding parameters to the different chemicals. Only the salt sensitivity could be reliably correlated to the ZZ scales. The other models had a low signal-to-noise ratio as a result of the limited differences in the perturbation of the kinetic or affinity constants by most of the chemical additives.
In the kd model, we see that position 101 does not significantly contribute to the salt sensitivity, whereas at position 105 a small hydrophilic amino acid increases the salt sensitivity. Smaller amino acids at this position are accompanied by a decrease of the dissociation rate parameter at higher salt levels, whereas larger amino acids are accompanied by an increase in dissociation rate at high salt levels. At this site there are two possible interaction partners for sodium ions at pH above 7.0, namely Asp-52 and Glu-35 of lysozyme (Fig. 2B). In the vicinity of position 101, there are no charged groups present that may interact with sodium ions, possibly leading to a relative insensitivity of the kd to sodium chloride (Fig. 2A).
In the ka model for salt sensitivity, we see that larger amino acids at both positions increase the salt sensitivity. Comparison of Figs. 6A and 4A reveals an opposite pattern in the coefficient plots, showing that mutants with a higher salt sensitivity of ka also have a lower intrinsic ka. Because all mutants show an increase of the association rate parameter upon increasing salt concentration, a higher sensitivity of ka to salt corresponds to a larger salt-induced increase in ka. The wild type protein, which has a high association rate and a low NaCl sensitivity, could therefore have optimized the amino acids at these positions in vivo to partially resolve unfavorable long-range electrostatic repulsions in the association process.