A Procedure for Detection and Quantitation of Cavity Volumes in Proteins

Accurate identification of cavities is important in the study of protein structure, stability, design, and ligand binding. Identification and quantitation of cavities is a nontrivial problem because most cavities are connected to the protein exterior. We describe a computational procedure for quantitating cavity volumes and apply this to derive an estimate of the hydrophobic driving force in protein folding. A grid-based Monte Carlo procedure is used to position water molecules on the surface of a protein. A Voronoi procedure is used to identify and quantitate empty space within the solvated protein. Additional cavities not detected by other existing procedures can be identified. Most of these are close to surface concavities. Residue volumes for both the interior and the surface residues as well as cavity volumes are in good agreement with volumes calculated from fully hydrated protein structures obtained from molecular dynamic simulations. We show that the loss of stability because of cavity-creating mutations correlates better with cavity volumes determined by this procedure than with cavity volumes determined by other methods. Available structural and thermodynamic data for a number of cavity-containing mutants were analyzed to obtain estimates of 26.1 cal·mol−1·Å−3 and 18.5 cal·mol−1·Å−2 for the relative contributions of cavity formation and the hydrophobic effect to the observed stability changes. The present estimate for the hydrophobic driving force is at the lower end of estimates derived from model compound studies and considerably lower than previous estimates of approximately 50 cal·mol−1·Å−2 derived from protein mutational data. In the absence of structural rearrangement, on average, deletion of a single methylene group is expected to result in losses in stability of 0.41 and 0.70 kcal·mol−1resulting from decrease in hydrophobicity and packing, respectively.

An understanding of the folding and stability of proteins depends on an accurate evaluation of the interaction of buried groups with each other and the interaction of exposed groups with solvents. The basic concepts of packing density, packing interaction, atomic volume, and solvent-accessible surface area have been extensively applied to such studies for many years (1)(2)(3)(4). These studies have shown that close packing is an important feature of protein structure and stability. While studying the packing density of lysozyme and ribonuclease, it was determined that the average value of packing density in the protein interior is close to 0.75 (1). Although on average the protein interior is highly packed, there are variations in the packing efficiency. In bovine pancreatic ribonuclease (RNase A), for example, this number ranges from 0.66 to 0.84 in different regions of the protein (1). Cavities in proteins are a major contributor to low packing densities and decreased stability (5). Cavities and surface grooves are also potential sites for binding of drugs, ligands, and other proteins. Hence, identification and detection of cavities is important in the study of protein structure, stability, and design.
There are several existing procedures/programs such as VOI-DOO (6), MS package (7,8), VOLBL (9,10), and CAST (11) for cavity detection in proteins. However, some of them are often unable to detect cavities close to the protein surface or to identify surface grooves. We present here a simple and robust procedure to detect cavities in proteins. The method uses a Monte Carlo (MC) 1 procedure described previously (12) to solvate the protein and to identify internal cavities. We compare the performances of the existing programs with that of the MC method here.
For an infinite set of arbitrary points/atoms in space, a geometrical procedure called the Voronoi procedure (13) divides up space with a unique volume assigned to each point or atom. Voronoi volume calculations are one of the best methods available to study packing in proteins (1). To determine the volume of space associated with an atom, the procedure requires the location of all neighbors. The procedure works well for atoms in the protein core. However, for protein atoms on the surface, many of the neighbors are often disordered water molecules that are not detected in protein crystal structures or nuclear magnetic resonance (NMR) structures (14,15). To address this problem, using bovine pancreatic trypsin inhibitor as an example, molecular dynamic (MD) simulations have been used (16) to realistically position water molecules around the protein. Such simulations are in principle the most appropriate method to overcome the difficulty in calculation of surface residue volume using the Voronoi procedure but are computa-tionally very expensive. We make use of a simple procedure described in earlier work (12) to overcome the problem of calculating volumes of surface protein atoms. The results of our procedure are in good agreement with volumes calculated from MD simulations.
It is generally thought that the hydrophobic effect is one of the major factors in stabilizing folded proteins (17). However, there has been considerable debate regarding the quantitative contribution of the hydrophobic effect to protein stability. Solubility and vapor pressure measurements on nonpolar solutes indicate a value of 25-30 cal⅐mol Ϫ1 ⅐Å Ϫ2 (18 -20). However, surface tension measurements at a hydrocarbon water interface suggest a value as high as 72 cal⅐mol Ϫ1 ⅐Å Ϫ2 . It was suggested that effects of curvature, surface roughness, and disorder account for this apparent difference and that a revised value for the strength of the hydrophobic effect is 47 cal⅐mol Ϫ1 ⅐Å Ϫ2 (4,21,22). This value was also in agreement with results of some protein engineering studies (23). In a typical protein engineering experiment, a hydrophobic residue within the core of a protein is substituted by a smaller hydrophobic residue. The resulting change in the stability is taken as measure of the difference between the hydrophobic stabilization provided by the two amino acids. However, such experiments carried out with different proteins or at different sites in the same protein have given varied results (23)(24)(25)(26)(27). This is possibly because of the differences in local packing interactions and cavity formation, which were not taken into account. A recent analysis of protein mutational data and small molecule transfer studies highlighted some significant differences between the two sets of data (28). It was suggested that the strength of the hydrophobic driving force was likely to be somewhat lower than the estimates of 25-30 cal⅐mol Ϫ1 ⅐Å Ϫ2 derived from the transfer studies. Other work has concluded that the unfavorable transfer free energies of nonpolar molecules from nonpolar solvents to water are primarily because of more favorable dispersion interactions in the nonpolar solvent, rather than an unfavorable free energy of interaction between nonpolar solute and water in the aqueous phase (29). In the present work we have tried to quantitate the relative strengths of packing and hydrophobic interactions in a number of cavity-containing mutants for which both structural and thermodynamic data were available. The data were analyzed to obtain relative estimates of the contributions of cavity formation and the hydrophobic effect to the observed stability changes in proteins.
MC Cavity Volume Calculations-Internal water molecules obtained using the MC procedure (12) in each of the individual iterations are used to localize the cavities in the MC method. These are clustered into distinct groups using a distance based criterion. The clustering results in distinct groups such that no member of a particular group will be within 2.5 Å of any member of any other group. The Voronoi volumes of these clusters were determined to obtain an estimate of the cavity volume.
To obtain volumes of cavities close to the site of a large to small substitution, the wild type protein was superposed on the mutant protein (31) using only the C␣ atoms. The number of internal water molecules, N m and N w (for the mutant and the wild type, respectively), in a sphere of radius 5.5 Å from the side-chain centroid of the wild type residue were counted. The difference between the sum of the Voronoi volumes of the N m and N w internal waters corresponded to the cavity volume (⌬[cavity volume]) created because of the substitution. A distance of 5.5 Å was chosen because, in the examples examined in this work, additional cavities (present only in the mutant protein but not in the wild type) of sufficient size to accommodate a water molecule were found only in the vicinity of the mutation. All appreciable structural changes were also local. It is desirable to have a cut-off to avoid considering artifactual small cavities and surface grooves that occur all over the protein. Small differences in the coordinates of wild type and mutants that are within the coordinate error sometimes result in differences in the locations and values of such cavity volumes. It is very difficult to evaluate the cumulative energetic effects of small, distributed changes in coordinates, but the excellent correlation observed (see below) between changes in stability and observed cavity volumes suggests that, in the systems examined, these effects are not appreciable.
Cavity Detection by VOIDOO, MS, and CAST-Cavity volumes were computed using VOIDOO (6), MS package (8), and CAST (11) and compared with those of the MC method. The VOIDOO program was obtained from x-ray.bmc.uu.se/usf/voidoo.html. For all our comparisons, a probe radius of 1.2 Å was used for VOIDOO and the MS package. This radius was shown previously to be optimal for cavity detection by the MS program (32). Each of these cavity detection programs measures different kinds of cavity volumes so the numbers are not directly comparable. CAST uses an alpha shape-based procedure to measure the cavity enclosed by the molecular surface of atoms that surround the cavity. VOIDOO is a grid-based procedure that measures the cavity volume defined by the van der Waals surface of atoms lining the cavity. Both CAST and VOIDOO have options to measure cavities enclosed by the solvent-accessible surface, but we did not use this. The MS program measures cavity volumes accessible to a probe sphere enclosed by the molecular surface of cavity lining atoms. The MC procedure developed in the present work measures the Voronoi volume of a cluster of overlapping spheres that map the cavity. The MS program reports the center of the cavity, whereas CAST reports only the atoms in the boundary of the cavity. The center of a cavity detected by CAST was generated from the center of mass of the boundary atoms. The centers of the cavities were represented by water molecule for visual examination using computer graphics. The results of CAST were received from cast.engr.uic.edu/cgi-bin/cast1/index.pl (11). For all the CAST calculations, the server default radius of 1.4 Å was used. In all discussions above, the volume of the cavity created because of large to small residue substitutions was obtained from the difference between the cavity volume in the mutant and the wild type protein (⌬cavity volume). Only cavities close to the site of mutation were selected, and the total cavity volume of these selected cavities was calculated. To compare cavity volumes calculated by different methods, model mutants were constructed and the cavity volumes calculated by each method were normalized as described under "Model Mutants." ⌬⌬G and ASA Calculations-The loss of stability in cavity-containing mutants, ⌬⌬G 0 U3 F is obtained as follows.
Solvent-accessible surface area (ASA) was calculated using the implementation of the Lee and Richards (33) algorithm with a probe radius of 1.4 Å and a z-section of 0.05Å. Solvent accessibility of a residue X was calculated as the ratio of the observed ASA of X in the folded state to its ASA in a Gly-X-Gly peptide of extended conformation (18). The ⌬⌬ASA resulting from mutation of a particular residue was determined as follows.
⌬ASA ϭ ASA(folded state) MD Simulations-The MD simulations were performed using the AMBER 5.0 suite of programs (34). NPT simulations at 300 K were carried out on 11 different solvated (3 wild type and 8 mutant proteins) (Table III) cavity-containing proteins. Starting models for these proteins were obtained from high resolution x-ray crystallography (RNase S: 1RNV, 1D5H, 1RBC; barnase: 1BNI, 1BRH, 1BRJ; T4 lysozyme: 1L63, 1L85, 243L, 244L, 243L). All the hetero atoms including crystallographic water molecules were removed from the respective Protein Data Bank coordinate files prior to equilibration. Using the BOX option in the EDIT module of AMBER, the protein coordinates were placed at the center of a pre-equilibrated box of TIP3P Monte Carlo water molecules. The box was then copied in all three spatial directions until the solute was completely solvated and the minimum distance between an atom of the protein and the wall of the box was 5.5 Å. For most surface atoms, this distance was much larger. In the RNase S simulation, for example, the average distance between a surface atom (greater than 80% accessibility) and the nearest wall was 11.5 Å. Those waters that made bad van der Waals contact with any protein atom were discarded. This resulted in the RNase S and the barnase group of proteins being solvated by ϳ2500 water molecules and the T4 lysozyme group by ϳ3000 water molecules. The total number of protein atoms for RNase S and barnase groups were ϳ9000 and ϳ12,000 for T4 lysozyme. The solvated proteins were heated from 0 to 300 K in 12 ps. The simulations consisted of 10 ps of equilibration followed by a 40-ps production run at 300 K. Coordinates were saved every 1 ps, resulting in 40 structures for each protein. The fluctuations in the total energy and temperature of all the proteins during the production run were less than 0.25 and 1.5%, respectively. The root mean square deviation of the MD average structures of each protein with the respective starting structures was less than 1 Å in all cases. The average excess Voronoi volume associated with atoms surrounding the respective cavities was determined from the MD structures. These were taken to be cavity volumes averaged over the MD simulations and were compared with corresponding cavity volumes calculated using the MC and CAST procedures. No solvent penetration into cavities was observed during the simulation. Relatively short simulation times were used, as the objective was only to obtain sets of protein structures with properly positioned solvent. In the case of 1BRJ and 244L, simulations were carried out for 200 ps. However, the results were similar whether only 40 ps or all 200 ps were analyzed.

RESULTS AND DISCUSSION
Procedure for Volume Calculations by MC Method-The protein molecule of interest is placed at the center of a pre-equilibrated box of transferable intermolecular potential (TIP3P) water obtained from a Monte Carlo simulation. Different configurations of solvent ensembles around the protein are generated in an iterative fashion as described previously (12). In every iteration water molecules that sterically overlap with protein are removed, leaving internal water molecules (those present in the cavities and surface grooves) and bulk water molecules (those surrounding the protein) (12). Shown in Fig. 1 is a schematic of a protein structure depicting positions of water molecules in the internal cavity, C, and surface groove, G, of a protein along with other hypothetical water molecules detected in a single iteration. At the end of several iterations (usually 25), internal water molecules detected in grooves and cavities are clustered into distinct groups. An empty space at which a cluster of internal water molecules is detected is considered to be a cavity or a surface pocket. The Voronoi volume of these clusters is calculated to obtain an estimate of the cavity volume. The Voronoi volume calculations are also performed on a solvated protein with and without the cluster of internal waters. Residue volumes calculated in the presence and absence of the cluster of internal waters give us an estimate of the excess space associated with residues that surround cavities.
Because of the empirical nature of the above procedure, there are some inherent limitations. The minimum size of detected cavities is on the order of ϳ9 Å 3 , close to the van der Waals volume of water molecule ϳ10 Å 3 . The maximum size of a spherical cavity detected in this procedure is ϳ310 Å 3 , the size of a sphere of radius 4.2 Å. However, most large cavities in proteins are nonspherical in nature, so this is not a serious limitation.
Calculation of Residue Volumes-After removal of internal water molecules in each iteration, volumes of protein atoms in the solvated protein structures were calculated using the Voronoi procedure (1,13,16,30). The Voronoi polyhedra were constructed using the radical plane method (35). Residue volumes were determined from the sum of the constituent atom volumes. In all our calculations, 25 iterations were used to generate different ensembles of surface water molecules. The volume of a residue is taken as the average of its volumes calculated in 25 iterations. To check the accuracy of the method, a comparison was made with similar calculations performed with solvated protein structures obtained during the course of MD simulations. Structures were extracted at different time points in the course of simulations. The simulations were carried out on RNase S (124 residues, 1RNV), barnase (110 residues, 1BNI), and T4 lysozyme (162 residues, 1L63) solvated by 2443, 2248, and 3096 water molecules, respectively. A comparison of the residue volumes calculated using structures from MD simulations with those calculated using the MC procedure is shown in Fig. 2. The figure shows the percentage of change in residue volume calculated using our procedure with respect to that obtained using structures from MD simulations as a function of residue depth (12). Residue volumes calculated with our procedure are in good agreement with those from MD simulations, although surface residue volumes are slightly higher using our MC method (Fig. 2). Approximately 25% of residues show 10% or higher volume in the MC method. The average depth of residues showing 10% or higher volume is 4.2 Å, indicating that only surface residues show this increase in the MC method. During the course of MD simulation, some solvent water molecules are able to come close to the protein surface atoms because of highly directional, specific charge and dipolar interaction. This results in slightly smaller Voronoi polyhedra than those constructed in our approach, which uses only a simple distance-based criterion for solvent positioning.
Model Mutants-Because each of the cavity volume procedures (CAST, VOIDOO, MS, and MC) computes different measures of the cavity volume, it is necessary to normalize the calculated values to allow comparisons. To carry out such a normalization as well as to compare the accuracy of volumes determined by the various procedures, we carried out calculations on artificial/model mutants of 14 proteins where large residues were replaced with smaller residues. Model mutants were created by deleting the side-chain atoms of the selected residue from the Protein Data Bank structure file. For example in a Phe 3 Ala model mutant, all Phe side-chain atoms after the CB atom would be deleted. We constructed a total of 43 X 3 Ala model mutants where X stands for Trp, Phe, Met, Leu, Ile, and Val. Only bulky residues with side-chain depth 8.0 Å were chosen. This was done to ensure proper comparison with the expected volumes as Voronoi volumes of only buried (18,19,30) residues are accurately known. Fig. 3 shows comparisons of cavity volumes determined by the various methods. The cavity volumes determined by all four methods were compared with the side-chain Voronoi volume (19) of the wild type bulky residue. The average expected cavity volume for an X 3 Ala model mutant was computed by subtracting the known Voronoi volume of Ala from that of residue X (19). The average expected cavity volumes of X 3 Ala model mutants were found to be related to the calculated volumes by linear relationships in all four cases (Fig. 3, Table I). This indicates that all the methods overestimate cavity volume at buried sites in a consistent and correctable fashion. In the MC method, this is probably the result of the use of a uniform cut-off distance of 2.6 Å to select the internal waters instead of using atom-specific van der Waals radius for atoms. The actual Voronoi volume of a particular cavity can be obtained using the following corrections, where m1 and m2 are the slope and intercept, respectively, of a plot of expected Voronoi volume as a function of [⌬cavity volume].
Expected Voronoi volume ϭ m1[⌬cavity volume] ϩ m 2 (Eq. 5) Unless otherwise stated, all cavity volumes reported hereafter are corrected using the above equations with values of m1 and m2 listed in Table I. Normalizing calculated volumes to Voronoi volumes is useful for many applications, as the Voronoi volume is the closest measure of the amount of space functionally occupied by an atom or residue in a protein. Furthermore, use of these normalized volumes allows one to compare cavity volumes calculated by different methods with those obtained from MD simulations (see below).
Although  (Table II). RNase S is a two-component system comprising S-peptide (residues 1-20) and S-protein (residues 21-124). The structures of the mutants analyzed here have a truncated S-peptide (residues 1-15) instead of the wild type peptide (36 -39). The VOIDOO and MS procedures failed to detect cavities in 17 of 34 and 13 of 34 cases, respectively. In the case of the MS procedure, cavities located close to the surface (as in several of the RNase S mutants) are not detected, suggesting that rolling probe methods are inefficient at detecting cavities close to the surface. VOIDOO also had difficulty with detection of small cavities (especially those associated with Val 3 Ala mutations) located close to the surface. In contrast, the MC and CAST methods both detected cavities in 32 of 34 cases and the calculated cavity volumes are similar to each other except in a few cases (Table II). Hence, for further analysis, we restrict ourselves to cavity volumes calculated by CAST and MC only. The cavity volumes determined by CAST in a few cases differ by ϳ30 Å 3 from those determined by our method. Fig. 4 shows stereo representations of the cavity detected by the MC procedure in one such case (barnase I88A mutant (1BRJ)). The empty space or cavity formed because of the replacement of Ile 3 Ala in the barnase mutant (Fig. 4a) is shown to be detected well by the space-filled grid points of the MC method (Fig. 4c). Fig. 4b shows that the cavity occupies the same physical space as the wild type Ile residue side-chain atoms. In the other cases where discrepancy existed, visual examination suggested that cavity volumes calculated by the MC procedure were more appropriate than those calculated by CAST.
The reason for the differences in calculated cavity volumes in the above mutants detected is not apparent. To examine which cavity calculation (MC or CAST) was more accurate, we carried out MD simulations on eight different solvated cavity-containing mutants of T4 lysozyme, RNase S, and barnase, which showed significant differences in normalized cavity volumes calculated by the two procedures. The excess Voronoi volumes associated with atoms surrounding cavities in these mutants were calculated using solvated MD structures of the wild type and mutant proteins extracted at different time points during the course of simulation. These excess volumes are taken to be realistic estimates of cavity volume in the mutants and were compared with volumes calculated using the two methods.
MD Simulations of Cavity-containing Mutants-Molecular dynamics simulations of solvated cavity-containing mutants of barnase (L14A, I88A), RNase S (F8A, M13A), and T4 lysozyme (F153A, I58A, I100A, V103A), as well as for the corresponding wild type proteins, were carried out to obtain structures with energetically correct positioning of water molecules around the proteins (Table III). Voronoi volume calculations were per- formed on solvated protein structures extracted at every 1 ps. The excess volume associated with atoms surrounding a particular cavity was determined as follows. The sum of Voronoi volumes of atoms surrounding a particular cavity was noted in the wild type, as well as in the mutant. In each case the Voronoi volume was averaged over the entire simulation. The difference of the average of these sums gives the Voronoi volume of the cavity averaged over the simulation (Table III). I100A and I58A mutants of T4 lysozyme and L14A and I88A mutants of barnase were chosen, as cavity sizes in these mutants determined by the MC method differed significantly from those calculated by CAST. Fig. 5 (a and b) shows the plot of Voronoi volume of cavities determined during the course of simulation versus cavity volume determined by the MC method and by CAST. The correlation of the MD cavity volumes with the MC method (r 2 ϭ 0.77) is better than CAST (r 2 ϭ 0.31). This indicates that, in the above mutants, the cavity volumes detected by the MC method are more realistic than those determined by CAST. Other advantages of the MC procedure are that the cavity space is partitioned among the surrounding residues and also that it is straightforward to visualize the cavity space. This partitioning is useful in deciding the nature and the positions of cavity filling substitutions. However, CAST is a much faster program than the current version of MC and, in several cases, the two programs yield similar cavity volumes (Table II). Cavity calculations for a 100-residue protein on SGI indigo 2 (195-MHz R10000 processor with 128 Mb of memory) requires less than 1 min with CAST, whereas calculations using the MC procedure take approximately 7 min.
Energetics of Cavity Formation in Proteins-Previous studies have attempted to correlate the energetic effects of large to small substitutions in proteins with cavity volume produced as a result of such substitutions (24,28,40). With mutants of T4 lysozyme, a good correlation was observed. However, this did not extend to other proteins (28), possibly because the existing methods had problems with cavity detection in several cases.
We have therefore calculated cavity volumes for a larger set of mutants with large 3 small residue replacements (Table II). Mutants of RNase S (28, 36 -39, 41), T4 lysozyme (24,40), barnase (42,43), human lysozyme (44), and Gene V protein (25,26) were chosen for this study. Only mutants with 15% or less accessibility in the wild type protein were considered for the study. Volume calculations were performed on the wild type as well as on the mutant proteins. ⌬[cavity volume], the difference between the cavity volumes in the mutant and wild type, is shown to correlate with the loss of stability ⌬⌬G 0 U3 F (r 2 ϭ 0.71, slope ϭ 37.5 Ϯ 2.9 cal⅐mol Ϫ1 ⅐Å Ϫ3 , intercept ϭ 0.59 Ϯ 0.12 kcal⅐mol Ϫ1 ) in the MC method. A poorer correlation is seen with cavity volumes calculated using CAST (r 2 ϭ 0.52, slope ϭ 36.1 Ϯ 4.2 cal⅐mol Ϫ1 ⅐Å Ϫ3 , intercept ϭ 0.49 Ϯ 0.30 kcal⅐mol Ϫ1 ) (Fig. 6, a and b). A high linear correlation between MC and CAST cavity volumes (r 2 ϭ 0.87, slope ϭ 0.88, intercept ϭ 10 Å 3 ) was seen, indicating that cavity volumes detected by both the methods are close to each other except in a few cases. In cases where there were large differences, MD simulations were carried out as described above. The results of the MD simulations clearly showed that the volumes calculated by the MC method were more accurate. ⌬⌬G 0 U3 F showed a much poorer correlation (r 2 ϭ 0.3 and 0.44) with cavity volumes calculated using the MS and VOIDOO programs, respectively (data not shown). This is probably caused by the fact that cavities close to the surface are often not detected by either program.
The loss of stability (⌬⌬G 0 U3 F ) resulting from such large to small substitutions was partitioned into two components: 1) resulting from the loss of packing at the site of mutation and 2) resulting from the change in the hydrophobic driving force as a result of the mutation. The latter term is quantitated in terms of the change in the free energy (⌬⌬G 0 tr ) of transfer of the amino acid at the site of mutation from solvent to the protein interior. The first component is proportional to the ⌬[cavity volume] and the second to the ⌬⌬ASA. ⌬⌬ASA is the difference between the mutant and the wild type residue ASA buried upon folding. To get an estimate of the relative contributions of these two factors, the data were fit to the following equation.
The values of m1 and m2 were 26.1 Ϯ 5.6 cal⅐mol Ϫ1 ⅐Å Ϫ3 and 18.5 Ϯ 5.4 cal⅐mol Ϫ1 ⅐Å Ϫ2 , respectively, in the MC method. The incorporation of the hydrophobic effect term improves the agreement between observed and calculated values of ⌬⌬G 0 U3 F . In Fig. 5, the root mean square deviation difference between the volumes obtained by MC and MD methods is ϳ10 Å 3 . In an attempt to see the effect of an error in cavity volume estimate of 10 Å 3 on the fitted parameters in Equation 6, the following two exercises were carried out. (a) Each cavity vol- FIG. 3. Cavity volumes in model mutants. The expected average cavity volume for a set of X 3 Ala model mutants is plotted against the average cavity volume calculated using MC method (a), CAST (b), MS (c), or VOIDOO (d). The expected average cavity volume is given by the difference between the average Voronoi volumes of X and Ala (19).  Fig. 6 was randomly incremented by either ϩ10 or Ϫ10 Å 3 , and the data were refitted. (b) Each cavity volume in Fig. 6 was incremented by a random number between ϩ10 and Ϫ10 Å 3 and the data refitted. These two procedures were repeated 25 times. The refitted parameters were virtually identical to the parameters originally obtained. This suggests that errors in estimating cavity volumes would not affect any of the conclusions arrived at. A methylene group has average values of 27 Å 3 and 22 Å 2 for Voronoi volume and ASA, respectively. Thus, deletion of a single methylene group in the absence of any structural rearrangement would result in a decrease in stability of 0.7 and 0.41 kcal⅐mol Ϫ1 because of loss of packing and hydrophobicity, respectively. Hence, the magnitude of the contribution made by a single methylene group is 1.1 kcal⅐mol Ϫ1 . This is in excellent agreement with the average value of 1.1 kcal⅐mol Ϫ1 per methylene group reported from cavity-filling mutations (45) but is slightly lower than the average value of 1.3 kcal⅐mol Ϫ1 reported earlier for another data set of cavity-creating mutations (46). In the present data set, there are mutants with ⌬⌬G 0 U3 F either higher or lower than the corresponding average values, e.g. L99A of T4 lysozyme (⌬⌬G 0 U3 F ϭ 5.0 kcal⅐mol Ϫ1 ) and I17A and L46A (⌬⌬G 0 U3 F ϭ 2.7 kcal⅐mol Ϫ1 ). In such cases the cavity volumes are either larger or smaller than the corresponding expected cavity volume, i.e. difference between the side-chain Voronoi volume of wild type and the mutant residue (74 Å 3 in this case). From the fitted values described above, the approximate maximal values associated with deletion of one (e.g. Ile 3 Val), two (e.g. Val 3 Ala) or three methylene (e.g. Ile 3 Ala, Leu 3 Ala) are 1.1, 2.2, and 3.3 kcal/mol, respectively. However, it should be remembered that the packing contribution obtained above is an average value and in general will depend on the actual packing density around the site of mutation. The close agreement between observed and calculated values of ⌬⌬G 0 U3 F at several sites in multiple proteins suggests that the variation in packing density is not large, at least for the present set of mutational data.
There are several mutants of SNase such as L108A, I72A, V111A, L125A, and L103A, where the experimentally observed values of ⌬⌬G 0 U3 F (5.8, 5.1, 4.7, 4.9, and 4.6 kcal⅐mol Ϫ1 ) are larger than expected from the present analysis (27). Cavity volume calculations of these mutants could not be carried because structures are not available. The stability changes for a The cavity volume of 1D5H (F8A of RNase S) is corrected from the graphics as Phe-120 moves to fill up the space created by removal of Phe-8. The value of the new cavity generated due to the movement of Phe-120 is added to obtain the total cavity volume.
b The ⌬⌬G 0 U¡F and cavity volumes of Gene V protein are shown per monomer.
these mutants were obtained using guanidine hydrochloride induced denaturation and assuming two-state unfolding (27). The m-values (an index of the cooperativity of folding) of these mutants are substantially less than that of the wild type. Subsequently Carra and Privalov (47) have shown that the SNase mutants of this nature, termed as m Ϫ mutants, thermally denature via a three-state mechanism rather than a two-state mechanism. It was shown for some of the mutants that ⌬⌬G 0 U3 F determined from calorimetric studies (with the three-state assumption) was substantially smaller in magnitude than the ⌬⌬G 0 U3 F previously determined from guanidine hydrochlorine denaturation studies (Table II of Carra and Privalov (Ref. 47)). Hence, consistent with the present work, the true ⌬⌬G 0 U3 F change of the above SNase mutants is likely to be in the range observed for the proteins examined here.
Earlier studies by Eriksson et al. (24) showed that destabiliza- FIG. 4. A stereoplot of the barnase I88A structure (Protein Data Bank identification code 1BRJ) in the region around the cavity. Atoms within a sphere of radius of 7.0 Å from the centroid of the wild type Ile-88 are shown in the mutant structure. The van der Waals dot surface of these atoms around the cavity is shown to highlight the cavity that is formed. The CA and CB atoms of Ala-88 are shown in black. a, cavity is seen in the mutant protein. b, the side-chain atoms of Ile-88 (generated by superposing the wild type structure on the mutant protein) are shown to fill the space, indicating that the wild type protein is devoid of a cavity at this site. c, space-filled grid points (MC method) are shown to fill the empty space of the cavity, indicating proper detection of cavity. tion of several T4 lysozyme Leu 3 Ala cavity-containing mutants was approximated by a constant term (ϳ2.0 kcal⅐mol Ϫ1 ) plus a term that increases in proportion to the size of the cavity (24). Cavity volumes in this study were measured by the MS procedure. The constant term arising as a result of the loss of three -(CH 2 )-groups in Leu 3 Ala mutants was shown to be approximately equal to the transfer free energy of leucine relative to alanine as determined from partitioning between aqueous phase and octanol. Hence, for a single -(CH 2 )-, this would correspond to ϳ0.6 kcal⅐mol Ϫ1 and is slightly larger than our estimate of 0.41 kcal⅐mol Ϫ1 . The estimate of the cavity size-dependent energy term by Eriksson et al. (24) was shown to range from 24 to 33 cal⅐mol Ϫ1 ⅐Å Ϫ3 . Our estimate of the cavity dependent energy term of 26.1 cal⅐mol Ϫ1 ⅐Å Ϫ3 is in numerical agreement with this previous result, although it should be noted that the volumes measured in the MS procedure (before normalization) are ϳ50% larger than the Voronoi volumes obtained after normalization by the MC procedure (Table I). Our estimate of the strength of the hydrophobic driving force is similar to but somewhat lower than the value of 25 cal⅐mol Ϫ1 ⅐Å Ϫ2 obtained in those studies, but is obtained using a much larger collection of mutants in five different proteins. Eisenhaber (48) reported a similar value of 18 cal⅐mol Ϫ1 ⅐Å Ϫ2 for the specific free energy of solvation of hydrophobic surface regions of proteins, which was estimated from the area distribution of hydrophobic surface patches. Recently a series of MD simulations of aliphatic solutes in water (49) was used to obtain a microscopic estimate of the hydrophobic driving force of 24 cal⅐mol Ϫ1 ⅐Å Ϫ2 . The present work shows that our previous attempt (28) to correlate values of ⌬⌬G 0 U3 F with cavity volumes was unsuccessful because the methods of cavity detection used earlier failed to accurately detect and quantitate the cavity spaces. In a very recent study by Sundberg et al. (50), the decrease in affinity between hen egg white lysozyme and its antibody D1.3 because of substitution of an interface hotspot residue with smaller residue was shown to correlate with loss of apolar buried surface area caused by the mutation (50). The correlation between change in the binding free energy and the change in apolar buried surface area corresponded to 21 cal⅐mol Ϫ1 ⅐Å Ϫ2 for the effective hydrophobicity at the mutation site residue 92 of D1.3. Our estimate of 18.5 cal⅐mol Ϫ1 ⅐Å Ϫ2 for the hydrophobicity is very close to their estimate but quite different from the estimate of 47 cal⅐mol Ϫ1 ⅐Å Ϫ2 proposed earlier (4,21,22). An earlier analysis that yielded this higher estimate was based on thermodynamic data from small molecule transfer studies (4). Although the analysis was carefully done, several implicit assumptions were involved, the most important being the applicability of such data to protein folding/stability studies. Although this analysis accounts very well for the maximal changes in stability that arise from small to large mutations, it is not able to satisfactorily account for the large amount of mutational data which have changes smaller than predicted by such an analysis. The analysis predicts that the loss of stability caused by removal of a -CH 2 -group is approximately 2 kcal/ mol, i.e. the loss of stability is approximately 6 kcal/mol for Leu 3 Ala mutations (Table II of  Cavities play a crucial role in protein function, dynamics, stability, and ligand binding. Even in proteins of known threedimensional structure, accurate identification and quantitation of cavities is not straightforward. This is because proteins are not solid objects and most atoms in a protein are connected to the surface by narrow channels. We have developed a simple, Voronoi-based procedure for calculating volumes of atoms as well as cavities in proteins. The method is able to detect cavities not detectable by other methods. Volumes calculated by the present (MC) procedure are in good agreement with accurate residue and cavity volumes calculated from computationally expensive molecular dynamics simulations.
The hydrophobic effect is widely asserted to be one of the most important driving forces in biological phenomena such as protein folding and macromolecular association. Most estimates of the hydrophobic driving force are obtained from small molecule transfer studies. It has been difficult to obtain direct quantitative estimates of the strength of the hydrophobic effect from thermodynamic studies of actual proteins because the overall free energy of folding, ⌬G 0 U3 F , has many contributions. The present cavity detection procedure was used to analyze structural and thermodynamic data for a large data set of cavity-containing mutants. The observed changes in ⌬G 0 U3 F upon mutation correlated better with cavity volumes calculated using the present procedure than with other procedures. The analysis yielded relative estimates of changes in packing and the hydrophobic driving force to the observed stability change. A value of ϳ18.5 cal⅐mol Ϫ1 ⅐Å Ϫ2 was obtained for the hydrophobic driving force. This is at the lower end of estimates of approximately 20 cal⅐mol Ϫ1 ⅐Å Ϫ2 from small molecule transfer studies and demonstrates that the destabilization in cavitycontaining mutants results primarily from loss of van der Waals interactions rather than a reduced hydrophobic driving force. The studies also confirm the dominant role of close packing in stabilization of the folded state of proteins.