CH···O Hydrogen Bonds at Protein-Protein Interfaces

For the first time, a statistical potential has been developed to quantitatively describe the CH···O hydrogen bonding interaction at the protein-protein interface. The calculated energies of the CH···O pair interaction show a favorable valley at ∼3.3 Å, exhibiting a feature typical of an H-bond and similar to the ab initio quantum calculation result (Scheiner, S., Kar, T., and Gu, Y. (2001) J. Biol. Chem. 276, 9832–9837). The potentials have been applied to a set of 469 protein-protein complexes to calculate the contribution of different types of interactions to each protein complex: the average energy contribution of a conventional H-bond is ∼30%; that of a CH···O H-bond is 17%; and that of a hydrophobic interaction is 50%. In some protein-protein complexes, the contribution of the CH···O H-bond can reach as high as ∼40–50%, indicating the importance of the CH···O H-bond at the protein interface. At the interfaces of these complexes, CαH···O H-bonds frequently occur between adjacent strands in both parallel and antiparallel orientations, having the obvious structural motif of bifurcated H-bonds. Our study suggests that the weak CH···O H-bond makes an important contribution to the association and stability of protein complexes and needs more attention in protein-protein interaction studies.


[ABSTRACT]
For the first time, a statistical potential has been developed to quantitatively describe the CH···O hydrogen bonding interaction at the protein-protein interface. The calculated energies of the CH···O pair interaction show a favorable valley at about 3.3 Å, exhibiting a feature typical of a hydrogen bond and similar to the ab initio quantum calculation result (39). The potentials have been applied to a set of 469 protein-protein complexes to calculate the contribution of different types of interactions to each protein complex: the average energetic contribution of the conventional hydrogen bond is around 30%, that of the CH···O hydrogen bond is 17%, and that of the hydrophobic interaction is 50%. In some protein-protein complexes the contribution of the CH···O H-bond can reach as high as 40–50%, indicating the importance of the CH···O H-bond at the protein interface. At the interfaces of these complexes, CαH···O H-bonds frequently occur between adjacent strands in both parallel and antiparallel orientations, showing the clear structural motif of bifurcated hydrogen bonds. Our study suggests that the weak CH···O hydrogen bond makes an important contribution to the association and stability of protein complexes and needs more attention in protein-protein interaction studies.
However, the energetic contribution of this kind of interaction to the stability of protein-protein complexes, as compared with other forces, remains to be explored. Despite the finding of numerous CH···O contacts at protein-protein interfaces, their relative importance in protein recognition or drug binding remains unclear.
Although some structural analyses have provided a wealth of information about the average hydrophobicities and residue compositions of protein-protein complexes (29–33), they did not provide any quantitative information on the strengths of the different sorts of interactions (such as hydrophobic interaction, conventional hydrogen bonding, and CH···O hydrogen bonding). Yet it is the quantitative magnitude of each interaction that matters most for understanding the role it plays in the association of a protein complex.
To describe the energetic aspects of CH···O interactions, theoretical calculations have been developed to evaluate their strength (34–37).
Ab initio quantum calculations have shown that a small molecule such as CH2F2 can establish an H-bond of a strength similar to that of a conventional OH···O interaction (38).
Recently, an ab initio quantum calculation of the CαH···O hydrogen bond for some representative amino acid residues (39) demonstrated that the peptide CH group is a potent proton donor and that the CH···O interaction appears to be a true hydrogen bond.
Furthermore, the study shows that some CH···O bonds are even stronger than a conventional OH···O interaction, suggesting that CH···O hydrogen bonds in proteins deserve more attention. Here, for the first time, we calculate the average percentage contribution of the CH···O interaction to the total binding free energy, using our established mean-field potential for describing protein-protein complexes (40). Potentials of mean force (PMF), a useful tool in protein fold recognition, generally use a training database of known protein structures to extract 'pseudo-potentials' for predicting unknown structures (46–49). Many applications have demonstrated their usefulness in studies of protein-ligand binding (50–55) and protein-protein association (56, 57).
Based on the new definition of atom types and our previously developed method, distance-dependent potentials have been derived to describe the different types of interactions at the protein-protein interface, and the energetic aspects of CH···O interactions are discussed quantitatively. The calculated energies of CH···O pair interactions exhibit a feature typical of a hydrogen bond. The quantitative study of the energetic percentages shows the importance of the CH···O hydrogen bond at the protein interface. The structural motif of the bifurcated hydrogen bond is highlighted in the stereochemical analysis of CH···O contacts in representative examples with a high CH···O percentage. We expect the method to be helpful for understanding the interactions at protein-protein interfaces and how they drive protein-protein association.

[METHOD]
We use our established empirical approach to describe protein-protein association in energetic terms. The mean-field potentials are derived from the same training set as that used before (40). Using the methodology described in our previous paper, we treat solvation and entropic effects implicitly and estimate the total binding free energies of protein-protein complexes directly, without any knowledge of experimental binding affinities and without fitting procedures.

New definition of five atom types
To extract and characterize the CH···O hydrogen bonding interactions, five atom types are defined: hydrogen bond donor (D); hydrogen bond acceptor (A); both donor and acceptor (B); CH type, a neutral carbon atom bonded to hydrogen atoms (CH); and neutral type, neither donor nor acceptor (N). In this definition, primary and secondary amines are donors; oxygen and nitrogen atoms with no bound hydrogen are acceptors; hydroxyl oxygens, the ND and NE atoms of His, and the carboxyl oxygens at the C-terminus are both donor and acceptor; carbon atoms bearing hydrogen are CH type; and carbon atoms with no bound hydrogen, together with sulfur atoms, are neither donor nor acceptor. The atom occupancy given in the crystal structure file is used as a weighting factor.
The distance-dependent Helmholtz free energies of protein-protein complexes are extracted from the non-redundant training set (40) taken from the Brookhaven Protein Data Bank (43, 44). Atoms of the 'receptor' part and the 'ligand' part are treated as distinct, so the five atom types give 5 × 5 = 25 atom pair interaction terms (see Table 1). Metal atoms and hetero-atoms are excluded from our training set. All crystallized complexes are assumed to have water as the medium. Water molecules are neglected, because solvation effects are treated implicitly.
Hydrogen atoms are omitted in all analyses.
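The typing rules above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function `atom_type`, its arguments, and the `both` flag (set by the caller for the His ring nitrogens and the C-terminal carboxyl oxygens) are hypothetical, and the number of bound hydrogens is assumed to come from standard residue templates, since hydrogens are absent from the coordinates.

```python
from itertools import product

# The five atom types named in the text.
ATOM_TYPES = ("D", "A", "B", "CH", "N")


def atom_type(element: str, bound_h: int, both: bool = False) -> str:
    """Assign one of the five atom types to a heavy atom (hypothetical helper).

    element : chemical element symbol of the heavy atom ("C", "N", "O", "S")
    bound_h : number of hydrogens bonded to it (assumed known from templates)
    both    : caller-supplied flag for atoms the text lists as donor AND
              acceptor that the simple rules below cannot detect
    """
    if both or (element == "O" and bound_h > 0):
        return "B"                      # hydroxyl O, His ND/NE, C-terminal O
    if element == "N":
        return "D" if bound_h > 0 else "A"
    if element == "O":
        return "A"                      # oxygen with no bound hydrogen
    if element == "C" and bound_h > 0:
        return "CH"                     # carbon bearing hydrogen
    return "N"                          # carbon without hydrogen, sulfur


# Receptor and ligand atoms are distinguished, so ordered pairs are kept.
PAIR_TYPES = list(product(ATOM_TYPES, repeat=2))
```

For example, `atom_type("C", 1)` classifies a Cα carbon as `"CH"`, a backbone carbonyl oxygen `atom_type("O", 0)` is an acceptor `"A"`, and `len(PAIR_TYPES)` gives the 25 receptor-ligand pair terms.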

Statistical potentials
Pair potentials are derived from the training set using the same methodology described in the previous paper (40). Here we only give a brief description of the method.
According to the reverse Boltzmann relationship, the free energy between a 'receptor' atom of type i and a 'ligand' atom of type j at a distance r can be written as

$$A_{ij}(r) = -kT \ln f_{ij}(r) \qquad (1)$$

where k is the Boltzmann constant, T is the absolute temperature, and $f_{ij}(r)$ is the frequency of ij contacts occurring at distance r. In practice, our statistical potential $\Delta A_{ij}(r)$ is the difference relative to a reference potential (Eq. 2, as described in ref. 40), where $m_{ij}$ is the total number of contacts between types i and j, $g_{ij}(r)$ is the distribution of these contacts occurring at distance r, and $f(r)$ is the distribution of all contacts of all types at distance r. The atom pair distance r uses a histogram-based representation, so r here refers to a given bin of width 0.2 Å. We simplify the reference energy by assigning a large positive value at very short distances (i.e., where no statistics are available), in order to capture the strong van der Waals repulsion in this distance range.
The derived potentials for the interaction of 'receptor' atom type i and 'ligand' atom type j are summed to give the total PMF value (Eq. 3).
For each atom, the occupancy ratio $p_i$ is used as a weighting coefficient.
Atom pair occurrences at distances of less than 8.0 Å are recorded. If the total number of atom pairs in a shell r ± Δr (Δr = 0.1 Å) is less than 30, the contributions of all atom pairs in that distance interval are ignored because the data are statistically insufficient.
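The histogram-based procedure described in this subsection can be sketched as follows. The sketch makes several assumptions that are not taken from the source: the potential is computed as a plain reverse Boltzmann ratio of the normalized pair distribution to the normalized all-pair distribution, which may differ from the explicit expression in ref. 40; the minimum count of 30 is applied per pair type; `KT` is the thermal energy near 298 K in kcal/mol; and the short-range repulsive value of 10.0 is an arbitrary placeholder. Occupancy weighting is left out for brevity.

```python
import math

BIN_WIDTH = 0.2   # Å, bin width stated in the text
R_MAX = 8.0       # Å, contact cutoff stated in the text
MIN_COUNT = 30    # shells with fewer pairs are ignored
KT = 0.593        # kcal/mol near 298 K (assumed temperature and units)
N_BINS = int(round(R_MAX / BIN_WIDTH))


def histogram(dists):
    """Count distances below R_MAX in bins of width BIN_WIDTH."""
    counts = [0] * N_BINS
    for d in dists:
        if 0.0 <= d < R_MAX:
            counts[min(int(d / BIN_WIDTH), N_BINS - 1)] += 1
    return counts


def pair_potential(pair_dists, all_dists, kt=KT, repulsive=10.0):
    """Distance-dependent potential for one pair type (illustrative form).

    pair_dists : distances observed for the pair type ij
    all_dists  : distances observed for all pair types (reference state)
    Returns one value per bin: the potential, `repulsive` for bins closer
    than the first populated bin, or None for later under-populated bins.
    """
    n_ij = histogram(pair_dists)
    n_all = histogram(all_dists)
    tot_ij = sum(n_ij) or 1
    tot_all = sum(n_all) or 1
    potential, seen = [], False
    for c_ij, c_all in zip(n_ij, n_all):
        if c_ij >= MIN_COUNT and c_all > 0:
            seen = True
            g = c_ij / tot_ij          # normalized distribution for type ij
            f = c_all / tot_all        # normalized distribution, all types
            potential.append(-kt * math.log(g / f))
        else:
            potential.append(None if seen else repulsive)
    return potential
```

With this form, a pair type that is enriched at some distance relative to the all-pair reference gets a negative (favorable) value in that bin, which is how a valley such as the one near 3.3 Å would appear.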

The contribution of different interactions to the total binding free energy in each protein-protein complex
We apply the statistical potentials to a larger set of protein-protein complexes. The X-ray and NMR structures of protein-protein complexes having a more flexible threshold of …

For each protein complex, the contributions of the different interaction types to the total binding energy are calculated by

$$p_z = \frac{\sum_{(i',j') \in z} \Delta A_{ij}(r_{i'j'})}{A} \times 100\% \qquad (4)$$

where z represents a given type of pair interaction (see Table 1), and $p_z$ is the percentage contribution of interactions of type z to the total binding free energy of the calculated protein-protein complex. The sum runs over the atom pairs i′ and j′ whose interaction is of type z according to the definition in Table 1, and $r_{i'j'}$ is their separation. The denominator A represents the total PMF energy calculated by Eq. 3.
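The percentage calculation described above can be sketched as below. The function name, the input format, and in particular the `EXAMPLE_CATEGORIES` mapping are hypothetical: the mapping covers only four of the 25 pair types for illustration and does not reproduce Table 1, and the energies in the usage note are invented numbers chosen to make the arithmetic easy to follow.

```python
from collections import defaultdict

# Illustrative subset only; the real assignment of the 25 pair types
# to interaction classes is given in Table 1 of the paper.
EXAMPLE_CATEGORIES = {
    ("D", "A"): "conventional H-bond",
    ("CH", "A"): "CH...O H-bond",
    ("CH", "CH"): "hydrophobic",
    ("A", "A"): "steric",
}


def energetic_percentages(pair_energies, categories):
    """Share of each interaction class in the total PMF energy, in percent.

    pair_energies : iterable of ((receptor_type, ligand_type), energy)
    categories    : dict mapping a pair type to an interaction class
    """
    pair_energies = list(pair_energies)
    total = sum(energy for _, energy in pair_energies)
    if total == 0:
        raise ValueError("total PMF energy is zero; percentages undefined")
    sums = defaultdict(float)
    for pair_type, energy in pair_energies:
        sums[categories[pair_type]] += energy
    return {z: 100.0 * s / total for z, s in sums.items()}
```

Calling `energetic_percentages([(("D", "A"), -3.0), (("CH", "A"), -1.7), (("CH", "CH"), -5.0), (("A", "A"), -0.3)], EXAMPLE_CATEGORIES)` on these made-up energies gives shares of about 30%, 17%, 50% and 3%, mirroring the form of the averages reported in the paper.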

Analysis of the stereochemistry of CH···O hydrogen bond
The stereochemical details of the CH···O hydrogen bonds were surveyed in some representative examples with a high energetic percentage of CH···O interactions. Hydrogen atoms were added to these PDB coordinate files using CHARMM (61). Each CH···O hydrogen bond is analyzed using three geometrical parameters: the C···O distance (d), the H···O distance (dH), and the C-H···O angle (θ). These parameters are defined in Table 2 and Table 3. In the analysis, only contacts with a θ angle greater than 100° were accepted.
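As a minimal sketch of how the three parameters can be computed from Cartesian coordinates, the following Python function takes the positions of the C, H and O atoms and returns d, dH and θ, with θ measured at the hydrogen atom. The function names are hypothetical, and the coordinates in the usage note are idealized values, not taken from any of the structures analyzed.

```python
import math


def ch_o_geometry(c, h, o):
    """Return (d, d_h, theta) for one C-H...O contact.

    c, h, o : (x, y, z) coordinates in Å of the carbon, hydrogen and oxygen
    d       : C...O distance
    d_h     : H...O distance
    theta   : C-H...O angle in degrees, measured at the hydrogen
    """
    d = math.dist(c, o)
    d_h = math.dist(h, o)
    h_to_c = [ci - hi for ci, hi in zip(c, h)]
    h_to_o = [oi - hi for oi, hi in zip(o, h)]
    cos_theta = sum(a * b for a, b in zip(h_to_c, h_to_o)) / (math.dist(c, h) * d_h)
    cos_theta = max(-1.0, min(1.0, cos_theta))   # guard against rounding
    return d, d_h, math.degrees(math.acos(cos_theta))


def is_accepted(theta, min_angle=100.0):
    """Angular filter from the text: keep contacts with theta > 100 degrees."""
    return theta > min_angle
```

For a perfectly linear arrangement such as `ch_o_geometry((0, 0, 0), (1.09, 0, 0), (3.3, 0, 0))`, d is 3.3 Å and θ is 180°, so the contact passes the filter; a contact with θ of 90° would be rejected.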
Because we could not generate realistic binding energies using van der Waals interactions and partial charge electrostatics alone, a distance constraint is used to force d(C···O) to take on the observed value of 3.2–3.4 Å in order to model the CH···O=C weak hydrogen bond.
In accordance with the shape of the CH···O=C weak H-bond potential, two different force constants are used for the constraint in the short- and long-distance ranges, respectively.
In the resultant CH···O=C weak interaction potential, the van der Waals and electrostatic terms are calculated with the values taken from CHARMM (60). The distance constraint is artificially stable and adequately simulates the van der Waals and partial charge interactions between the C and O atoms in the CH···O=C weak hydrogen bond.
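One possible reading of a constraint with two force constants is a flat-bottomed, asymmetric harmonic restraint on the C···O distance, sketched below. The functional form is an assumption, as are both force constants and the choice to make the short-range side stiffer; the source states only the target range of 3.2–3.4 Å and the use of different force constants at short and long distances.

```python
def c_o_restraint(d, lower=3.2, upper=3.4, k_close=10.0, k_long=2.0):
    """Illustrative restraint energy on the C...O distance d (in Å).

    lower, upper : target range from the text (3.2 to 3.4 Å)
    k_close      : hypothetical force constant for d below the range
    k_long       : hypothetical force constant for d above the range
    Returns 0 inside the target range and a harmonic penalty outside it.
    """
    if d < lower:
        return k_close * (d - lower) ** 2
    if d > upper:
        return k_long * (d - upper) ** 2
    return 0.0
```

With these placeholder constants, a contact compressed to 3.0 Å is penalized more strongly than one stretched to 3.6 Å, which loosely mimics a potential well that is steeper on the repulsive side.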

(2) Analysis of the stereochemistry of the CH···O hydrogen bond based on the minimized structures with and without the inclusion of the CH···O hydrogen bonding interaction energies
The crystal structures of the protein complexes served as the starting point. The initial hydrogen atoms were added to these PDB coordinate files using CHARMM (60).
Two different strategies are adopted during the CHARMM minimization. In the first, 2500 cycles of the conjugate gradient minimizer (CONJ) are run until energy convergence; no atoms are fixed, and the van der Waals and electrostatic interactions are calculated as the nonbonded interactions of each atom pair during the minimization. In the second, 2500 cycles of the conjugate gradient minimizer (CONJ) are likewise run until energy convergence, and the distance constraint on the C···O pair distance is added to the interaction energies in order to model …

[RESULT]

The Helmholtz free energy for general CH···O hydrogen bonding interactions
Based on the definition of the five atom types, the distance-dependent potentials are extracted from the training set. In Table 1 …

The steric interaction between atom types such as hydrogen bond acceptor-acceptor and donor-donor is also analyzed; its mean percentage of 3% shows that it plays a weaker role at protein interfaces than the three major types of interactions (Figure 2b).

Structural features of CH···O interaction in some protein complexes with high CH···O interaction percentage
The protein-protein complexes with a relatively high percentage of CH···O interaction have been analyzed (see Figure 2c). We carefully surveyed the geometrical parameters and structural features of these contacts in some representative examples, focusing on the contacts between adjacent β-strands and α-helices located at protein-protein interfaces.

Adjacent β-strands in parallel and anti-parallel orientations
In the representative examples with a high contribution from the CH···O interaction, the stereochemistry of close CH···O contacts between adjacent β-strands at the interface has been analyzed. The PDB entries include 2kin, 2gac, 2bqp, 1pya, 1prt, 1lya, 1kvd, 1fi8, 1dgw and 1apy. Figure …

[DISCUSSION]

The simple atom type definition
Biological interfaces of protein-protein complexes contain many specific interactions, including hydrogen bonding and water bridging interactions, as well as nonspecific interactions (32, 33). In our method, the two atoms of each atom pair at a protein-protein interface make contact through either hydrogen bonding (conventional or CH···O) or hydrophobic interactions, which are pertinent to protein interfaces and tend to reflect the forces driving the association of protein complexes (see Table 1). In all the statistics, the number of occurrences of each atom pair is sufficiently large. Moreover, we omit the statistics for distance shells with few atom pair occurrences in order to reduce errors caused by insufficient statistics. We believe that the details of the potentials obtained are meaningful and provide a sound basis for comparing the different interactions.

Is the CH···O pair interaction a real hydrogen bond?
Because the favorable potential of the CH···O interaction at protein-protein interfaces has not yet been obtained experimentally, we discuss here whether the CH···O interaction is a real hydrogen bond or one of the nonspecific interactions. Reasonable Helmholtz free energies of the CH···O interaction at protein-protein interfaces are extracted using our mean-field potential method, and the calculated energies clearly show the character of a hydrogen bonding interaction. A recent ab initio quantum calculation between water molecules and some representative amino acids shows that the binding energies of ideal CH···O pair interactions are comparable to hydrogen bond energies (39). Here, using statistical potentials derived from a training set of real protein-protein complexes, the calculated free energy has a reasonable potential form that is similar to the quantum calculation on the ideal model. At the same time, we find that the favorable valley of our calculated CH···O potential is very different from the potentials of nonspecific interactions, which generally reflect random contacts between atoms. In Figure 1c, the van der Waals repulsive potential of the N···N atom pair is one such nonspecific interaction, representing the distribution of non-bonded, randomly distributed non-polar atoms. These nonspecific interactions have a flat potential close to zero at longer distances and a rapidly climbing potential at short distances (up to 4.0 Å). More importantly, they have no clearly favorable valley at any distance. Thus the favorable valley of the CH···O potential shows that, unlike random contacts, the CH···O pair interaction is a specific interaction with the character of a hydrogen bond. Moreover, the shape of this potential is similar to that of a conventional hydrogen bond, despite the different strength and optimum distance.
Thus the CH···O contact at protein-protein interfaces is a specific interaction similar to a conventional hydrogen bond. According to our PMF calculation at protein-protein interfaces, the CH···O contact should be considered one of the hydrogen bonding interactions. Our conclusion is supported by the ab initio quantum calculations of Scheiner et al. (38, 39), in which CH···O contacts appear to be true hydrogen bonds. Compared with the conventional hydrogen bond, the general CH···O hydrogen bond is weaker and has a longer optimum distance: its strength is about one third that of the conventional hydrogen bond, and its optimum distance is around 3.3 Å.

The role of the three types of interactions at protein-protein interfaces: the quantitative calculation on energetic aspect
First, we quantitatively calculate the energetic contribution of the different types of interactions at protein-protein interfaces, which indicates the role of each force at the interface. At the protein-protein interface, the average contribution of the conventional hydrogen bond is around 30%; that of the CH···O hydrogen bond is around 17%; that of the hydrophobic interaction is 50%; and that of the other steric interactions is 3%.
Conventional hydrogen bonding and hydrophobic interactions are generally considered to play important roles in protein association. That hydrophobic interactions make the largest contribution indicates that a large number of hydrophobic atom pairs occur at the protein-protein interface, because the energy of each hydrophobic atom pair is smaller than that of the other pair interactions. Likewise, the 30% contribution of conventional hydrogen bonds shows the significant involvement of charged or polar residues in protein-protein association, which is widely accepted (20, 21).
The most important finding here is that the energetic contribution of the weak CH···O hydrogen bonding interaction at protein-protein interfaces cannot be neglected. Close CH···O contacts abound at protein-protein interfaces. Although each is normally weaker than a conventional hydrogen bond, CH···O contacts are too numerous to be neglected. It should be noted that, in our calculation for individual protein-protein complexes, the energetic percentage of CH···O hydrogen bonding interactions reaches 30–50% in some examples.

β-strands
We have analyzed the protein-protein complexes with a relatively high percentage of CH···O interactions, focusing on the geometrical parameters of the CH···O hydrogen bonds between adjacent β-strands and α-helices at the interface. These CH···O H-bonds appear to be a specific interaction favoring the formation and stabilization of adjacent α-helices and β-strands (acting between β-strands as a secondary hydrogen bond alongside the conventional C=O···HN hydrogen bond). In particular, CαH···O H-bonds between adjacent strands in parallel and antiparallel orientations make an important contribution at the protein interface. It is interesting to note that the structural motif of a bifurcated hydrogen bond is found between adjacent strands in some representative examples of real protein interfaces (see Figure 3a and 3b), where the weak CH···O hydrogen bond makes a secondary contribution to the stabilization of the β-strands. In these examples, the CH···O interaction accounts for a high percentage of the total binding free energy, and the bifurcated H-bond motif appears to make an important contribution to the formation and stabilization of the parallel and antiparallel β-strands.
In the protein complexes with strong CH···O interactions, close CαH···O contacts frequently occur between parallel and antiparallel β-strands, and the structural motif of bifurcated hydrogen bonds has been found between those adjacent strands. The CαH···O hydrogen bond is indispensable for forming the bifurcated H-bond motif between β-strands. In these complexes, the CH···O hydrogen bonding interaction accounts for a high percentage of the binding free energy and thus plays an important role in the formation of β-strand interactions at protein-protein interfaces. Moreover, the observed geometrical parameters of all the close contacts are very close to those expected for hydrogen bonds. Recent stereochemical analyses of close CH···O contacts have reported that these contacts exhibit stereochemical features typical of hydrogen bonds in proteins, membrane proteins, and the active sites of proteases (17, 18, 58–60). Together with our observations at protein-protein interfaces, these studies show that the CH···O H-bond appears to be a specific interaction with a favorable valley, whether in protein interiors, membrane proteins, or protein-protein interfaces. However, popular crystal refinement programs often treat the CH···O contact as a repulsive interaction. The importance of the CH···O interaction, indicated by our quantitative energetic calculation, calls for a revision of these refinement programs.

[CONCLUSION]
An empirical approach for quantitatively describing the forces at protein interfaces (including the CH···O hydrogen bond) in energetic terms is presented here. Our calculated Helmholtz free energies of the CH···O pair have a favorable valley exhibiting a feature typical of a hydrogen bond. When applying the scoring function to the set of protein-protein complexes, we found a significant contribution from CH···O hydrogen bonds. The average energetic contribution of the CH···O H-bond is 17%, and in some complexes this value can reach as high as 40–50%. In these complexes, the structural motif of a bifurcated H-bond, combining the CαH···O H-bond with the conventional C=O···HN H-bond, is found between adjacent strands in both parallel and antiparallel orientations at the interface.
In conclusion, the importance of CH···O hydrogen bonding interactions calls for a revision of the view that CH···O contacts should be treated as repulsive. In studies of protein-protein interfaces, CH···O hydrogen bonds should be given appropriate consideration.

[Acknowledgement]
We thank Dr. Chao Tang …

a. the interatomic C···O distance (d) in the crystal structure
b. the interatomic C···O distance (d') in the minimized structure without the inclusion of the CH···O hydrogen bonding interaction energies
c. the interatomic C···O distance (d'') in the minimized structure with the inclusion of the CH···O hydrogen bonding interaction energies