Additivity Principles in Biochemistry

We cannot yet reliably fold proteins or RNA molecules by computer, predict ligand binding affinities, compute conformational transitions, or use the sequence information in the Human Genome very effectively to understand biomolecular function and disease. Why not? Perhaps some of our models in computational biology are based on flawed assumptions. Thermodynamic additivity principles are the foundations of chemistry, but few additivity principles have yet been found successful in biochemistry.

According to the principles of thermodynamics, to predict how molecules act we must account for the free energies. Free energies are expressed in the language of van der Waals interactions, hydrogen bonding, ion pairing, solvation and hydrophobic interactions, and entropies due to translations, vibrations, rotations, and configurations. Free energies or entropies of protein or RNA folding, mutations, enzyme kinetics, or ligand binding are often modeled as sums, based on group additivities,

$$\Delta G = \Delta G_{\text{amino acid 1}} + \Delta G_{\text{amino acid 2}} + \dots + \Delta G_{\text{amino acid } n}$$

or

$$\Delta G = \Delta G_{\mathrm{CH_2}} + \Delta G_{\mathrm{OH}} + \Delta G_{\text{aromatics}} + \dots + \Delta G_{\text{other substituents}} \qquad \text{(Eq. 1)}$$

or

$$\Delta G = \Delta G_{\text{backbone}} + \Delta G_{\text{side chain}}$$

or free energy component additivities,

$$\Delta G = \Delta G_{\text{nonpolar solvation}} + \Delta G_{\text{hydrogen bonding}} + \Delta G_{\text{PDB-derived statistical potentials}} + \Delta G_{\text{polar burial or electrostatic effects}} + \Delta G_{\text{van der Waals or packing effects}} + \Delta G_{\phi\psi} - T\Delta S_{\text{conformational}} - T\Delta S_{\text{side chain rotations}} + \text{etc.} \qquad \text{(Eq. 2)}$$

or, for entropies,

$$\Delta S = \Delta S_{\text{translations}} + \Delta S_{\text{rotations}} + \Delta S_{\text{vibrations}} + \text{etc.} \qquad \text{(Eq. 3)}$$
Thermodynamic additivity is the principle that if two components, A and B, contribute independently to some process, then the total change in free energy (or enthalpy or entropy) is the sum of the component changes, $\Delta G = \Delta G_A + \Delta G_B$. However, additivity applies only if components A and B are independent.
We call these sums "thermodynamic additivity models." Are there "true" independent free energies of a hydrogen bond, a salt bridge, or a hydrophobic contact that we could add together to compute binding or folding free energies? Are enzyme rate accelerations sums of translational, rotational, and vibrational free energy terms? If a peptide, say tetraalanine, binds a protein, does the free energy equal four times that of the binding of alanine? Is protein folding the sum of oil-to-water-like transfers of each amino acid? Can we add surface area-based solvation terms to molecular dynamics force fields? Are the conformational entropies of biopolymers simple sums of the monomer entropies or of backbone plus side chain entropies?
Modelers using expressions like Equations 1-3 must pay attention to decisions about the interactions and entropies. What is the right balance of interactions in a model? What is the relative importance of a hydrogen bond and a hydrophobic contact? How should we estimate packing and side chain energies and entropies? Is the protein interior like a hydrocarbon liquid, diketopiperazine crystal, or something else? What mathematical forms should we use for the interactions? Clearly, flawed answers to such questions can be sources of errors in models.
But perhaps some of our models in computational biology are failing at a deeper level. The concept of additivity is a fundamental premise (1, 2) that is widely taken for granted. It may be that every choice of parameters in Equations 1-3 will fail. Perhaps the problem is additivity itself. Perhaps it is inappropriate to sum free energies no matter what the relative weights and no matter what choices we make for the individual terms in the sum.

A Major Culprit? The "4th Law" of Thermodynamics: the Assumptions of Independence and Additivity
Without additivity, chemistry would have limited predictive power (3). Additivity has been called the 4th law of thermodynamics (3). For example, if the heat of formation of covalent compounds were not equal to the sum of the bond enthalpies (if the heat of formation of carbon dioxide were not equal to twice the heat of a C-O bond) then chemical equilibria and kinetics would not be predictable from simpler reactions. Every chemical equilibrium would require its own separate measurement. We could not look up bond energies in tables and compute the energetics of ATP cycles, the breakdown of glucose, or other equilibria.
Thermodynamic additivity principles could be equally important in biochemistry (noncovalent processes). When we design drugs using quantitative structure-activity relationship substituent constants, when we use single-site mutagenesis as a basis for protein engineering, when we design folding and binding algorithms based on models of amino acid partitioning, or when we compute the melting behavior of DNA and RNA as sums of nearest neighbor interaction energies, we assume additivity relationships. The search for additivity principles is based on the hope that we will be able to make independent measurements or calculations for subcomponents of a system and add them together using equations such as 1-3 to predict the structures and properties of biomolecules.
In chemistry, the "state" called "a carbon dioxide molecule" and the "state" called "carbon and oxygen atoms" are sharply defined and distinguishable from each other, differing in stability by tens of kcal/mol, and not much dependent on temperature or solvent conditions. However, for biology, as for polymer science, the individual monomer interactions are much weaker, involving noncovalent interactions nearer to thermal energies ($kT = 0.6$ kcal/mol at physiological temperatures). In these cases, broad ensembles of microscopic conformations comprise macroscopic "states" (denatured states, molten globules, folding intermediates and transition states, some bound complexes). States are sometimes not so distinct as they are in covalent chemistry, and simple additivity may not apply.
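To make the comparison with thermal energy concrete, the following minimal sketch computes kT per mole at body temperature and the Boltzmann population ratio for two free energy gaps. The temperature (310.15 K) and the two gaps (1 and 20 kcal/mol) are illustrative assumptions chosen to stand for "near thermal energy" and "tens of kcal/mol"; they are not values taken from the text.

```python
import math

R_KCAL = 1.987204e-3  # molar gas constant in kcal/(mol K)
BODY_TEMP_K = 310.15  # assumed "physiological temperature" (37 degrees C)

def thermal_energy(temp_k):
    """Thermal energy RT (kT per mole) in kcal/mol."""
    return R_KCAL * temp_k

def population_ratio(delta_g, temp_k):
    """Boltzmann ratio of the less stable to the more stable state."""
    return math.exp(-delta_g / thermal_energy(temp_k))

kt = thermal_energy(BODY_TEMP_K)
weak = population_ratio(1.0, BODY_TEMP_K)     # hypothetical 1 kcal/mol gap
strong = population_ratio(20.0, BODY_TEMP_K)  # hypothetical 20 kcal/mol gap

print(f"kT = {kt:.2f} kcal/mol")
print(f"population ratio for a 1 kcal/mol gap:  {weak:.2f}")
print(f"population ratio for a 20 kcal/mol gap: {strong:.1e}")
```

With a gap near kT, the less stable state remains substantially populated, which is one way to see why weakly stabilized "states" are ensembles rather than sharply defined species.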

How Good Does Additivity Need to Be?
Whether an additivity assumption is good or poor depends on what errors we can tolerate. Suppose we want an energy function to model protein folding. If an energy function has errors totaling 10 kcal/mol for a protein of 100 amino acids, it will be useless for predicting the native structure. Ten kcal/mol is about the difference in free energy between native and denatured conformations, so such functions will have no meaningful ability to discriminate among protein conformations. On the other hand, an energy function with a total error of 1 kcal/mol, even though not perfect, would be useful. Since random errors grow with the square root of the number of monomers (5), a 100-amino acid protein has about $\sqrt{100} = 10$ times the error of one amino acid, so an adequate energy function must err by less than about 100 cal/mol per amino acid (1 kcal/mol divided by 10). (Only 10 cal/mol would be tolerable for systematic errors, which scale linearly with size.) This is a crude estimate, but it gives a rough target: if we can model contact, transfer, or binding interactions to better than 100 cal/mol for molecules about the size of amino acids or nucleotides, then our energy functions will be useful for models of folding or other large conformational changes.
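The error budget above can be reproduced with a few lines of arithmetic. This is a minimal sketch; the function name `per_residue_tolerance` is introduced here for illustration, and the 1 kcal/mol total tolerance and 100-residue chain are the values used in the text.

```python
import math

def per_residue_tolerance(total_tolerance, n_residues, systematic=False):
    """Largest per-residue error that keeps the whole-chain error within
    total_tolerance (result is in the same units as total_tolerance).

    Random errors are assumed to add in quadrature (sqrt(N) growth);
    systematic errors are assumed to add linearly (N growth).
    """
    if systematic:
        return total_tolerance / n_residues
    return total_tolerance / math.sqrt(n_residues)

TOTAL_TOLERANCE_CAL = 1000.0  # 1 kcal/mol, expressed in cal/mol
N_RESIDUES = 100

random_tol = per_residue_tolerance(TOTAL_TOLERANCE_CAL, N_RESIDUES)
systematic_tol = per_residue_tolerance(TOTAL_TOLERANCE_CAL, N_RESIDUES,
                                       systematic=True)

print(f"random errors:     {random_tol:.0f} cal/mol per residue")
print(f"systematic errors: {systematic_tol:.0f} cal/mol per residue")
```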

Group Additivities in Biochemistry
Group additivities account successfully for the partitioning of alkanes, alkenes, alkadienes, alcohols, and other homologous series of hydrocarbons from one medium to another (6,7). The free energy of transfer depends linearly on the number of monomer units in the chain. Additivity predicts effects of double mutations on enzyme reaction rates (8), binding, and protein stability (9) (see Fig. 1) as sums of single mutations, at least when the mutation sites are spatially separated. Sometimes mutational effects on protein stability correlate well with oil/water partitioning (10). The stabilities ($\Delta G$) and melting behavior ($\Delta H$) of oligomers of DNA (11) or RNA (12) are well predicted as sums of free energies and enthalpies of nearest neighbor pairs of nucleotides. Additivity also predicts well the mutational changes in the binding of proteinases to their inhibitors.
But these successes of additivity may owe as much to uniformity of the neighboring environment as to additivity per se. The remarkable linearities in homologous series may depend on the constancy of next neighbors; for long chains, each added substituent has the same neighbors as the previous substituent. Often the monomers and dimers in homologous series do not fall on the same line as the longer chains. The free energy of transferring the tripeptide Gly-Gly-Gly from oil to water minus the free energy of transferring Gly-Gly should equal the free energy of Gly-Gly minus the free energy of glycine, since both differences represent the transfer of a single glycine. However, Nozaki and Tanford (14) showed that the former difference is 1270 cal/mol while the latter is 895 cal/mol. This discrepancy of 375 cal/mol is larger than the target error of 100 cal/mol. Moreover, the rank ordering of partitioning amino acids into oil is different for different oil phases (15), and solute partition coefficients depend on the "ordering" in oil phases (16).
The additive free energies of binding observed in protease inhibitors apply to a particular site. What happens at one site may differ from what happens at another. It is fundamental to biology that biochemical environments (in complexes, inside proteins and RNAs, at binding sites) differ in their structures and energetics. It may be that "effective medium" models (like oils or solids), where additivity applies, will not adequately model complex biochemical environments.
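The glycine comparison is a simple group-additivity test: if each added glycine contributed independently, successive increments in the transfer free energy would be equal. The sketch below uses only the two increments quoted in the text and the 100 cal/mol target from the preceding section; the variable names are illustrative.

```python
TARGET_ERROR_CAL = 100.0  # per-monomer target from the error estimate above

# Increments in transfer free energy quoted in the text (cal/mol).
triglycine_minus_diglycine = 1270.0  # Gly-Gly-Gly minus Gly-Gly
diglycine_minus_glycine = 895.0      # Gly-Gly minus Gly

# Under strict group additivity the two increments would be identical,
# so their difference measures the non-additivity directly.
discrepancy = triglycine_minus_diglycine - diglycine_minus_glycine
exceeds_target = abs(discrepancy) > TARGET_ERROR_CAL

print(f"non-additivity: {discrepancy:.0f} cal/mol")
print(f"exceeds the {TARGET_ERROR_CAL:.0f} cal/mol target: {exceeds_target}")
```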
The limit of current experimental errors is probably around 200-400 cal/mol (see the deviations from the straight line in Fig. 1, for example). If so, it implies a possible fundamental limitation on our ability to find useful thermodynamic additivity principles in biochemistry.

Energy Component Additivities in Biochemistry
Even more problematic than group additivity is energy component additivity (1, 18). Mark and van Gunsteren (1) have proven that adding entropies or free energy terms, as in Equation 2, is generally not justified, although sometimes the nonadditivities may be small. More broadly justified is the summing of energies or enthalpies: $\Delta H_{\text{total}} = \Delta H_{\text{van der Waals}} + \Delta H_{\text{solvation}} + \Delta H_{\text{electrostatics}} + \Delta H_{\text{hydrogen bonding}}$, etc., provided the terms describe independent forces (1). If we combine this with $\Delta G = \Delta H - T\Delta S$, an additivity relationship that is always justified by thermodynamics, then we might hope that expressions such as $\Delta G = \Delta H_{\text{van der Waals}} + \Delta H_{\text{solvation}} + \Delta H_{\text{electrostatics}} + \Delta H_{\text{hydrogen bonding}} - T\Delta S$ are on a sounder footing than sums of free energy components.
How should we divide the total enthalpy into truly independent component terms? For example, what experiments will isolate enthalpies of solvation from hydrogen bonding? Making isosteric mutations does not mean "no change" in van der Waals interactions, because two different shapes having the same volume can pack a cavity differently.
If we cannot yet predict free energies, it is even harder to predict enthalpies because of enthalpy/entropy compensation (24); perturbations that increase the enthalpy can also increase the entropy, with little or no effect on the free energy. Compensation, which occurs broadly throughout biochemistry, means that the enthalpy is not independent of the entropy. Fig. 2, taken from a review by Sturtevant (25), shows compensation for some mutants in two different systems: the unfolding of ribonuclease H 1 and the binding of modified S-peptides to S-protein to form ribonuclease S. The figure shows that a free energy change due to a mutation can be less than 1 kcal/mol, while the corresponding enthalpy change can be 10 kcal/mol.
In another example, Breslauer et al. (26) bound netropsin and distamycin A to two very similar DNA molecules: A, ATATAT... on one strand and TATATA on the other, and B, AAAAAA on one strand and TTTTTT on the other. The free energy of binding netropsin is nearly identical, $-12.7$ kcal/mol to molecule A and $-12.2$ kcal/mol to molecule B, and the dependence on NaCl is similar in the two cases. Nevertheless the driving forces are completely different; binding to molecule A is dominated by enthalpy ($\Delta H = -11.2$ kcal/mol), while binding to molecule B is dominated by entropy ($\Delta H = -2.2$ kcal/mol). A similar result is found for distamycin. A better understanding of the enthalpy and entropy components of free energies may lead to better microscopic models.
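The entropic contribution in the netropsin example follows directly from the quoted free energies and enthalpies through the relation that the free energy change equals the enthalpy change minus T times the entropy change. The sketch below applies that relation to the values in the text; the dictionary layout and function name are illustrative choices.

```python
def entropic_term(delta_g, delta_h):
    """Return -T*dS (kcal/mol) from dG = dH - T*dS."""
    return delta_g - delta_h

# Netropsin binding values quoted in the text, in kcal/mol: (dG, dH).
netropsin = {
    "A (alternating AT duplex)": (-12.7, -11.2),
    "B (A-tract / T-tract duplex)": (-12.2, -2.2),
}

minus_t_ds = {}
for duplex, (dg, dh) in netropsin.items():
    minus_t_ds[duplex] = entropic_term(dg, dh)
    print(f"{duplex}: dG = {dg:.1f}, dH = {dh:.1f}, "
          f"-T*dS = {minus_t_ds[duplex]:.1f} kcal/mol")
```

The two free energies differ by only 0.5 kcal/mol, while the enthalpic and entropic parts each shift by roughly 9 kcal/mol between the duplexes.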

Repairing Incorrect Additivity Assumptions: the Role of Statistical Mechanics in Biochemistry
While thermodynamic models such as Equations 1-3 require additivity assumptions, statistical mechanical models do not. Statistical mechanics provides rigorous tools for relating molecular structures to thermodynamic quantities. Statistical mechanical theories give a good accounting for some cooperativities in biochemistry. 1) The Zimm-Bragg and related models (27,28) describe helix-coil transitions. The tendency of a monomer to form a helical turn is not independent of the helical tendency of its neighbor. 2) Ligand binding can involve a cooperativity of binding at different sites (20,21). While thermodynamic additivity models often rely on untested assumptions (about independence, additivity, averaged medium models of the environment, or ways to lump degrees of freedom together), statistical mechanical models need not be limited in this way. Statistical mechanical models and atomic simulations can aim to identify all the relevant degrees of freedom without bias and to weight them according to the Boltzmann distribution law.
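As an illustration of how a statistical mechanical model captures neighbor dependence, here is a minimal sketch of a simplified helix-coil partition function in the spirit of the Zimm-Bragg model. It assumes two states per residue, a propagation weight `s` for a helical residue following a helical residue, and a nucleation weight `sigma * s` for a helical residue following a coil residue (or starting the chain). The parameter values are hypothetical, and the original Zimm-Bragg formulation differs in its bookkeeping details.

```python
import math

def zimm_bragg_partition(n, s, sigma):
    """Partition function of an n-residue chain, simplified helix-coil model.

    w_coil and w_helix hold the summed statistical weights of all partial
    chains whose last residue is coil or helix, respectively.
    """
    w_coil, w_helix = 1.0, 0.0  # empty chain, treated as ending in coil
    for _ in range(n):
        new_coil = w_coil + w_helix                    # coil residue: weight 1
        new_helix = sigma * s * w_coil + s * w_helix   # nucleate or propagate
        w_coil, w_helix = new_coil, new_helix
    return w_coil + w_helix

def log_z_deviation(n, s, sigma):
    """ln Z(n) minus n * ln Z(1): zero if residues were independent."""
    return (math.log(zimm_bragg_partition(n, s, sigma))
            - n * math.log(zimm_bragg_partition(1, s, sigma)))

# Hypothetical parameters: s = 1.5; sigma = 1 means no cooperativity.
independent = log_z_deviation(20, 1.5, 1.0)
cooperative = log_z_deviation(20, 1.5, 1e-3)
print(f"deviation with sigma = 1:    {independent:.3f}")
print(f"deviation with sigma = 1e-3: {cooperative:.3f}")
```

When `sigma` is 1 the partition function factorizes into a product over residues and the free energy is additive; for small `sigma` the chain free energy is no longer a multiple of the single-residue value.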

Entropies Are Important in Biology; They Are Often Not Additive
An important class of non-additivities pertains to the entropies of conformational change. Polymer conformational entropies are often large and seldom additive. When chains obey random-flight statistics, as in denatured states, or when interactions are dominated by local factors, as in helical peptides, chain entropies are sums of monomer entropies (29). However, when nonlocal contacts are involved, in molten globules or folded or compact states, they are not. One contact between the chain ends can globally restrict the options available to all the monomers (30).
What is the error due to non-additivities? It can be large. If $z$ is the number of conformational isomers per monomer ($z \approx 3$ to $10$), the entropy (per monomer) of a random-flight chain is approximately $S = R \ln z$, where $R$ is the gas constant. But the "correction," the excluded volume entropy, when a chain is constrained to be compact is estimated to be (31) $\Delta S = R \ln e = R$, a free energy error of $RT \approx 600$ cal/mol. This exceeds the target of 100 cal/mol. For a 100-mer protein, this "non-additivity" is about 60 kcal/mol (32,33), a large driving force.
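The numbers in this estimate can be checked with a short calculation. The sketch below assumes a temperature of 300 K (the text does not state one) and uses the per-monomer excluded volume entropy of one gas constant quoted above; the function name is illustrative.

```python
import math

R_CAL = 1.987    # molar gas constant in cal/(mol K)
TEMP_K = 300.0   # assumed temperature; not specified in the text
N_MONOMERS = 100

def random_flight_entropy(z):
    """Per-monomer entropy R ln z of a random-flight chain, cal/(mol K)."""
    return R_CAL * math.log(z)

# Range of random-flight entropies for z between 3 and 10 isomers.
s_low, s_high = random_flight_entropy(3), random_flight_entropy(10)

# Excluded volume correction per monomer: R ln e = R.
excluded_volume_entropy = R_CAL * math.log(math.e)
per_monomer_cal = TEMP_K * excluded_volume_entropy
chain_total_kcal = N_MONOMERS * per_monomer_cal / 1000.0

print(f"R ln z for z = 3..10: {s_low:.2f} to {s_high:.2f} cal/(mol K)")
print(f"T * dS per monomer:   {per_monomer_cal:.0f} cal/mol")
print(f"for a 100-mer:        {chain_total_kcal:.0f} kcal/mol")
```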
Non-additivities may be the rule for biopolymer conformational entropies and free energies. Theory predicts that the free energy of folding is not a simple sum of water-to-oil transfers of the constituent amino acids since the denatured state (under native conditions) can harbor some buried residues (34). The conformational entropy of the denatured state is not independent of external conditions; it depends on solvent and temperature since strongly denatured chains are expanded (high entropy), while under native conditions denatured states are compact (low entropy). The conformational entropy of protein folding is predicted not to equal the sum of backbone plus side chain entropies, i.e. $\Delta S \neq \Delta S_{\mathrm{sc}} + \Delta S_{\mathrm{bb}}$ (35). As the backbone decreases its radius of gyration, it "freezes out" side chain conformations. Different loops, bulges, hairpins, and pseudoknots in DNAs and RNAs or disulfide-bonded regions, hinges, flaps, and interfaces in proteins are predicted to be not independent of each other (36), as is assumed in random-flight models (30). Experiments have not yet tested these predictions. In many cases, non-additivities in entropies and free energies could be measured by summing state functions around a thermodynamic cycle to determine their deviations from zero.
Statistical potentials are amino acid contact pairing frequencies, derived from databases, and are used in protein folding algorithms. They assume pairing free energies are additive: $\Delta G = \Delta G_{\text{Ala-Ala}} + \Delta G_{\text{Gly-Tyr}} + \Delta G_{\text{Pro-Leu}} + \text{etc.}$ (37,38). However, model tests show that pairing frequencies are not independent (39). For example, if the hydrophobic amino acids drive protein collapse, they also indirectly drive polar groups to protein surfaces, so polar group pairings are not independent of the nonpolar group pairings.
Chemistry texts express entropies as sums of translations, rotations, and vibrations for small molecules in the gas phase. Are such sums, $\Delta S = \Delta S_{\text{translations}} + \Delta S_{\text{rotations}} + \Delta S_{\text{vibrations}} + \Delta S_{\text{conformations}} + \Delta S_{\text{solvation}}$, also justified for liquids, solvation, biomolecule binding, and enzyme reaction kinetics (40)? Remarkably little evidence bears on this question.
Statistical mechanical theory shows that in oil phases and other polymer solutions, translations are coupled to polymer conformations (i.e. $\Delta S \neq \Delta S_{\text{translation}} + \Delta S_{\text{conformation}}$) (31,41). In liquid crystalline solutions, rotations are coupled to translations (42,43). Free volume may be coupled to translations in some solutions (23). In general the shapes of solutes and solvents affect the interdependence of their degrees of freedom (13,41), but models of ligand binding and enzyme mechanisms depend on separability assumptions (40). If the hydrophobic effect involves restricted water orientations around nonpolar solutes, then the water translational and rotational degrees of freedom are coupled. Such considerations have led Sharp et al. (4) to argue that the hydrophobic effect may involve much more free energy (45 cal/(mol Å²)) than previously thought (25 cal/(mol Å²)) (7). Since the binding of proteases to their inhibitors can involve 600-1800 Å² of contact area (2), such uncertainties can amount to more than 10 kcal/mol.
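One common way to set up such a cycle test is a double-mutant cycle, in which the same free energy (for example, of folding or binding) is measured for the wild type, two single variants, and the double variant. The sketch below is a minimal illustration under that assumption; the function name and all four input values are hypothetical and are not data from the text.

```python
def coupling_free_energy(dg_wt, dg_a, dg_b, dg_ab):
    """Deviation from additivity around a double-mutant cycle.

    Each argument is the same measured free energy for one corner of the
    cycle. If changes A and B act independently, the result is zero.
    """
    return dg_ab - dg_a - dg_b + dg_wt

# Hypothetical folding free energies in kcal/mol, for illustration only.
dg_wild_type = -8.0
dg_variant_a = -6.5
dg_variant_b = -7.0
dg_variant_ab = -4.9

additive_prediction = dg_variant_a + dg_variant_b - dg_wild_type
coupling = coupling_free_energy(dg_wild_type, dg_variant_a,
                                dg_variant_b, dg_variant_ab)

print(f"additive prediction for AB: {additive_prediction:.1f} kcal/mol")
print(f"measured value for AB:      {dg_variant_ab:.1f} kcal/mol")
print(f"deviation from additivity:  {coupling:.1f} kcal/mol")
```

The result is only meaningful when it exceeds the combined experimental uncertainty of the four measurements.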
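The size of this uncertainty follows from multiplying the difference between the two hydrophobic coefficients by the buried contact area. The sketch below uses only the values quoted in the text; the variable names are illustrative.

```python
# Hydrophobic free energy per unit area quoted in the text, cal/(mol A^2).
COEFF_PREVIOUS = 25.0
COEFF_REVISED = 45.0

# Range of protease-inhibitor contact areas quoted in the text, in A^2.
contact_areas = (600.0, 1800.0)

# Uncertainty in kcal/mol for each end of the contact-area range.
uncertainty_kcal = [
    (COEFF_REVISED - COEFF_PREVIOUS) * area / 1000.0
    for area in contact_areas
]

for area, delta in zip(contact_areas, uncertainty_kcal):
    print(f"{area:.0f} A^2 of contact: uncertainty of {delta:.0f} kcal/mol")
```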

Conclusions
A wide class of models in computational biology assumes thermodynamic additivity and independence (of energy types, of neighbor interactions, of conformational freedom, of monomer contact pairing frequencies, etc.). Biomolecules may achieve stability in the face of thermal uncertainty, as polymers do, by compounding many small interactions, but this summing trick works against modelers, since it compounds our errors. Weak interactions imply ensembles of states and possible non-additivities of entropies and free energies.
If thermodynamic additivity principles can be found having variances smaller than about 0.1 kcal/mol of monomer units, they may be as important to biochemistry as the great symmetry principles are to physics. At the present time, however, additivity principles appear to be few and limited in scope in biochemistry. Neighborhood and environment effects on additivities need to be better understood. To measure non-additivities, experimentalists could test the closure of thermodynamic cycles. Statistical mechanical models and molecular dynamics simulations may contribute to more predictive theories in biochemistry.