Probing the Energetic and Structural Role of Amino Acid/Nucleobase Cation-π Interactions in Protein-Ligand Complexes*

X-ray structures of proteins bound to ligand molecules containing a nucleic acid base were systematically searched for cation-π interactions between the base and a positively charged or partially charged side chain group located above it, using geometric criteria. Such interactions were found in 38% of the complexes and are thus even more frequent than π-π stacking interactions. They are moreover well conserved in families of related proteins. The overwhelming majority of cation-π contacts involve Ade bases, as these constitute by far the most frequent ligand building block; Arg-Ade is the most frequent cation-π pair. Ab initio energy calculations at MP2 level were performed on all recorded pairs. Though cation-π interactions involving the net positive charge carried by Arg or Lys side chains are the most favorable energetically, those involving the partial positive charge of Asn and Gln side chain amino groups (sometimes referred to as amino-π interactions) are favorable too, owing to the electron correlation energy contribution. Chains of cation-π interactions with a nucleobase bound simultaneously to two charged groups or a charged group sandwiched between two aromatic moieties are found in several complexes. The systematic association of these motifs with specific ligand molecules in unrelated protein sequences raises the question of their role in protein-ligand structure, stability, and recognition.

To follow the protein folding process or the association of a protein with other (macro)molecules, whether by computer or by real experiments, it is essential to have a fundamental knowledge of the energetic and entropic factors that drive these processes, or to be able to reproduce them precisely by means of the correct descriptors. Indeed, even though the inter- and intramolecular non-covalent interactions, including electrostatic, hydrogen bond, London, Pauli, and electron charge transfer contributions, are known in principle, we still have a limited appreciation of their precise nature and of the way in which they compete with or reinforce one another in complex systems. In particular, the importance of cation-π interactions between aromatic rings and positive charges has only recently started to be appreciated in the biomolecular context (1,2), and the question of their true nature is far from settled.
Both experimental and theoretical studies have emphasized the presence of this favorable interaction in protein structures (3,4), as well as in biomolecular association processes such as ligand-antibody binding and receptor-ligand interactions (5-9). Cation-π interactions have also been shown to be quite common at the interface between protein and DNA (10,11), where they involve positively charged Arg or Lys side chains and aromatic rings of nucleic acid bases. According to ab initio calculations, the most favorable energy is obtained with Gua bases; hence, cation-π interactions contribute to the specificity of protein-DNA recognition.
Related interactions, termed amino-π or simply cation-π interactions, involve a partial instead of a full positive charge, carried by the polar amino acids Asn or Gln (10-15). These amino acids, albeit globally neutral, possess a partial positive charge δ(+) on the amino group of their side chain amide group, which, when positioned above the aromatic ring, forms a cation-π interaction. Such interactions have been observed at the interface between protein and DNA (10,11) and between receptor and ligand (15), but their favorable nature has not yet been demonstrated.
Biological processes such as enzymatic reactions generally need cofactor molecules to perform catalytic transformations of substrates. Most of these intervening molecules include nucleic acid bases in their molecular structure, and we shall focus on them. These nucleobases contribute to the protein-ligand binding but are not directly involved in the biological processes and are thus not chemically altered during the catalytic reaction. One main group of cofactors includes ATP/ADP, GTP/GDP, and CTP/CDP, containing an Ade, Gua, and Cyt base, respectively. When bound to their target, they liberate a phosphate group by hydrolysis of the high energy phosphoanhydride bonds and constitute the most important energy source for the cell. Another important group of cofactors comprises NAD and FAD, which are involved in the transfer of an electron pair from reduced NAD or FAD to the final electron acceptor in an elaborate electron transport cascade. The electrons pass through a series of discrete steps that permit the gradual harvesting of electron energy, whose dissipation is coupled to the synthesis of ATP molecules.

* The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The present study focuses on probing the structural and biological role of amino acid-nucleobase cation-π interactions and lies within the scope of our recent research devoted to protein-DNA complexes (10,11). Here, we consider proteins bound to ligand molecules and cation-π interactions linking the nucleobases contained in these molecules to amino acid side chain groups carrying a net or partial positive charge. In particular, we searched a non-redundant structure set of protein-ligand complexes for cation-π interactions and performed ab initio quantum mechanics energy calculations on all recorded cation-π pairs. The conservation of these interactions in families of related proteins was also investigated, so as to further evaluate their importance compared with other non-covalent interactions. The agreement, and sometimes disagreement, between the frequent occurrence of specific cation-π interactions in x-ray structures and their computed energies is examined. Finally, we discuss the predominance of Ade-containing ligands.

MATERIALS AND METHODS
Structural Dataset of Protein-Ligand Complexes-To generate a non-redundant set of protein-ligand complexes, we first extracted from the ReLiBase (16) data base the list of all protein structures complexed with a ligand containing Ade, Gua, Cyt, or Thy, deposited in the Protein Data Bank (PDB) (17) and having a resolution of 2.0 Å or better. This yielded 580 protein-ligand complexes. We then extracted from these complexes a subset that contained only protein chains displaying less than 25% sequence identity. Information on sequence identity was gathered from the FSSP fold classification data base (18). In this way, we obtained a final set of 188 protein-ligand structures. The subset of these structures containing protein-ligand cation-π interactions is listed under "Supplementary Material."

Geometric Definition of Cation-π Interactions-We define cation-π interactions geometrically by a distance and an angle criterion (10). The distance criterion requires that at least one of the atoms of the aromatic ring is located no further than 4.5 Å from one of the atoms carrying the positive charge (i.e. Nη1, Nη2, Cζ, or Nε for Arg and Nζ or Cε for Lys) or the partial positive charge (i.e. Nε2 or Cδ for Gln and Nδ2 or Cγ for Asn). Note that nitrogen atoms are considered instead of hydrogen atoms because the latter are lacking in x-ray structures. Atoms carrying a partial positive charge designate atoms where the intramolecular electron distribution presents a positive peak due to a lack of electrons. The angle criterion requires that the (partially) positively charged atom is situated above the plane defined by the aromatic ring, more precisely, inside a cylinder of height 4.5 Å, whose base includes the ring and has a radius equal to the ring diameter.
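The two geometric criteria can be illustrated with a short sketch. It is not the authors' program: the function name `is_cation_pi`, the choice of the ring centroid as cylinder axis, the acceptance of either face of the ring, and the estimate of the ring diameter as twice the largest centroid-to-atom distance are all assumptions made here for illustration.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def is_cation_pi(ring_atoms, charged_atoms, cutoff=4.5):
    """Return True if any charged atom satisfies both criteria.

    ring_atoms: (x, y, z) tuples in Å, listed in order around the ring.
    charged_atoms: (x, y, z) tuples of the atoms carrying the (partial) charge.
    """
    n = len(ring_atoms)
    centroid = tuple(sum(p[i] for p in ring_atoms) / n for i in range(3))
    # Ring normal: sum of cross products of consecutive centroid-to-atom vectors.
    normal = (0.0, 0.0, 0.0)
    for i in range(n):
        c = _cross(_sub(ring_atoms[i], centroid),
                   _sub(ring_atoms[(i + 1) % n], centroid))
        normal = (normal[0] + c[0], normal[1] + c[1], normal[2] + c[2])
    length = math.sqrt(_dot(normal, normal))
    normal = tuple(x / length for x in normal)
    # Assumption: ring diameter = twice the largest centroid-to-atom distance.
    radius = 2.0 * max(math.dist(p, centroid) for p in ring_atoms)
    for q in charged_atoms:
        # Distance criterion: within the cutoff of at least one ring atom.
        near = any(math.dist(q, p) <= cutoff for p in ring_atoms)
        # Cylinder criterion: height along the normal and distance from the axis.
        v = _sub(q, centroid)
        height = abs(_dot(v, normal))
        radial = math.sqrt(max(_dot(v, v) - height ** 2, 0.0))
        if near and height <= cutoff and radial <= radius:
            return True
    return False
```

Testing the distance and the cylinder conditions separately mirrors the two-part definition: an atom lying beside the ring can pass the distance test and still be rejected by the cylinder test.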
Simplified Representation of Cation-π Partners-Each pair was reduced to a simple system that could be studied computationally. The ligands were reduced to the nucleic acid base they contain. Lys and Arg were represented as ammonium and guanidinium groups, respectively, and Gln and Asn as formamide groups. In the case of Lys, two different orientations of the ammonium moiety toward the π-electron cloud were considered, with one of the N-H bonds pointing either toward the aromatic ring or in the opposite direction.
Ab Initio Quantum Mechanics Energy Calculations-All ab initio energy calculations were carried out using the Gaussian 98 suite of programs (19). In a first step, the nucleic acid bases as well as the ammonium, guanidinium, and formamide groups, which represent the cation-π partners, were optimized separately, using the Hartree-Fock method (HF) and the 6-31G** basis set. The optimized coordinates were then superimposed onto the crystal coordinates using the U3BEST algorithm (20).
In a second step, the energies of the cation-π systems were calculated at the second order level of Møller-Plesset perturbation theory (MP2) (21). The basis set used is 6-31G**(0.2), which corresponds to the standard 6-31G** basis set where the Gaussian α_d exponent of the d-polarization functions on the heavy atoms C, N, and O is equal to 0.8, with an additional α_d exponent equal to 0.2. It has indeed been shown that this extended description of the d-polarization functions allows a more accurate description of cation-π interactions (21,22).¹

The interaction energy ΔE_MP2 is calculated as the sum of the Hartree-Fock energy ΔE_HF and a correlation energy contribution ΔE_COR, evaluated here in Equation 1 as the second order term of the Møller-Plesset perturbation expansion.
For a given complex A-B, ΔE_HF and ΔE_COR are evaluated as the differences at HF and MP2 level, respectively, between the energy of the complex E(A-B) and the energies of the isolated partners of the system, E(A) and E(B), as shown in Equations 2 and 3.
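The numbered equations are not reproduced in this text. A form consistent with the definitions given above would read as follows; the labels HF and (2), which mark the Hartree-Fock energy and the second order Møller-Plesset correction, are notational assumptions made here rather than the original typography.

```latex
\begin{align}
\Delta E_{\mathrm{MP2}} &= \Delta E_{\mathrm{HF}} + \Delta E_{\mathrm{COR}} \tag{1}\\
\Delta E_{\mathrm{HF}}  &= E_{\mathrm{HF}}(A\text{-}B) - E_{\mathrm{HF}}(A) - E_{\mathrm{HF}}(B) \tag{2}\\
\Delta E_{\mathrm{COR}} &= E_{(2)}(A\text{-}B) - E_{(2)}(A) - E_{(2)}(B) \tag{3}
\end{align}
```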
ΔE_HF contains electrostatic, polarization, electron transfer, and exchange energy contributions, while ΔE_COR contains electron correlation (dispersion) contributions. Together, they are expected to represent the major contributions to the van der Waals energy. All calculations were corrected for the basis set superposition error (BSSE) by using the standard function counterpoise (CP) method (24,25). Details on the ab initio calculation methods can be found in Ref. 10.
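As a worked illustration of the energy decomposition described above, the following sketch combines total energies into ΔE_HF, ΔE_COR, and ΔE_MP2. The function name, the dictionary layout, and the numerical values in the usage note are hypothetical. In the counterpoise scheme, the monomer energies passed in would be those computed in the full basis set of the complex; that step is assumed to have been done beforehand.

```python
HARTREE_TO_KCAL_PER_MOL = 627.5095  # standard conversion factor

def interaction_energies(hf, corr):
    """Combine total energies (in hartree) into interaction energies (kcal/mol).

    hf, corr: dicts with keys "AB", "A", "B" holding the Hartree-Fock energies
    and the second order correlation energies of the complex and of the two
    isolated partners. For a counterpoise-corrected result, "A" and "B" must
    have been computed in the basis set of the whole complex.
    """
    delta_hf = hf["AB"] - hf["A"] - hf["B"]
    delta_cor = corr["AB"] - corr["A"] - corr["B"]
    return {
        "HF": delta_hf * HARTREE_TO_KCAL_PER_MOL,
        "COR": delta_cor * HARTREE_TO_KCAL_PER_MOL,
        "MP2": (delta_hf + delta_cor) * HARTREE_TO_KCAL_PER_MOL,
    }
```

With made-up inputs such as `hf = {"AB": -10.010, "A": -6.0, "B": -4.0}` and `corr = {"AB": -1.005, "A": -0.6, "B": -0.4}`, the function returns an HF term of about -6.3 kcal/mol and a correlation term of about -3.1 kcal/mol, which add up to the MP2 interaction energy.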
Conservation of Cation-π Interactions in Protein Families-To investigate the conservation of the cation-π motifs in related proteins, we scanned the PIR annotation and similarity data base.² The PIR protein entry displaying the highest degree of sequence identity with the considered PDB sequence (found to vary between 98 and 100%) was used as probe. The sequences similar to each probe and belonging to the same family, as classified in the PIR data base, were collected with the FASTA sequence alignment program (27) using an e-value threshold below 0.0001. Within each family, we discarded protein sequences shorter than 75% of the original PDB sequence. The conservation of the amino acids forming the cation-π motif was evaluated on the basis of multiple sequence alignments performed with the ClustalW program (28).
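The length filter and the residue conservation measure can be sketched as follows. This is only an illustration under simple assumptions: sequences are plain strings, the alignment is given as equal-length strings with "-" for gaps, and conservation is taken as the fraction of aligned sequences carrying a given residue at a given column. The function names are hypothetical and do not correspond to FASTA or ClustalW options.

```python
def filter_by_length(sequences, reference, min_fraction=0.75):
    """Keep sequences at least min_fraction as long as the reference."""
    return [s for s in sequences if len(s) >= min_fraction * len(reference)]

def column_conservation(aligned, column, residue):
    """Fraction of aligned sequences that carry `residue` at `column`.

    aligned: list of equal-length strings taken from a multiple alignment.
    column: zero-based alignment column of the motif residue.
    """
    observed = [seq[column] for seq in aligned]
    return observed.count(residue) / len(observed)
```

A stricter variant could also count substitutions that preserve the aromatic or (partially) charged character of the side chain, which would require a residue class table in addition to the exact match used here.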

Cation-π Interactions in X-ray Protein-Ligand Complexes-A non-redundant dataset of 188 high-resolution x-ray structures of protein-ligand complexes, defined as described under "Materials and Methods," was searched for cation-π interactions linking protein to ligand. All the ligand molecules considered contain a nucleic acid base, which is however not directly involved in the enzymatic reaction. The overwhelming majority of the ligands (81%) contain an Ade; Gua, Thy, and Cyt only occur in 12, 4, and 3% of the ligands, respectively. These numbers could be biased, as our structure set contains few membrane proteins that are known to frequently bind Gua-containing ligands; however, a survey of sequence data bases confirms the abundance of Ade in ligands. The most frequent ligands are adenosine mono-, di-, or tri-phosphates (AMP, ADP, ATP) and ligands containing a nicotinamide adenine dinucleotide moiety (NAD, NAP, NAQ, NDP): they occur 37 and 38 times in the structure set, respectively.
We focused on cation-π interactions between an aromatic ring of the ligand's nucleic acid base and an Arg or Lys side chain carrying a net positive charge or an Asn or Gln side chain carrying a partial positive charge on its amino group. We did not consider cation-π interactions involving other aromatic moieties of the ligand, nor positively charged ligand groups interacting with aromatic amino acid side chains. We also left out His residues because of their ambiguous nature: they play the role of cations in their protonated form and behave like simple aromatic systems in their unprotonated form. All these interactions are equally interesting to study, but fall outside the scope of the present work.
In the set of 188 complexes, 77 protein-ligand cation-π interactions were identified using the geometrical criterion described under "Materials and Methods"; they are distributed over 71 structures (listed under "Supplementary Material"). More than half of these interactions (40 occurrences) involve Arg, followed by Lys (19), Asn (12), and Gln (6). The frequency differences are even more marked for nucleic acid bases: by far the most frequent base is Ade (54 occurrences), followed by Gua (19), Thy (2), and Cyt (2). In the case of Gua, the cation-π partner is always positioned above the 5-member cycle, whereas for Ade, this happens in 72% of the cases. With its 36 occurrences, Arg-Ade is the preferred cation-π pair. Note that in 60% of the observed cation-π interactions, the charged group is positioned right above the aromatic cycle, and in the remaining interactions it is situated above the extracyclic atoms.
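The counts quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text; the variable and function names are illustrative.

```python
# Occurrences of each amino acid and each base among the observed interactions.
residue_counts = {"Arg": 40, "Lys": 19, "Asn": 12, "Gln": 6}
base_counts = {"Ade": 54, "Gua": 19, "Thy": 2, "Cyt": 2}

def percent(part, whole):
    """Percentage rounded to the nearest integer."""
    return round(100 * part / whole)

total_interactions = sum(residue_counts.values())  # also equals sum(base_counts.values())
structures_with_interaction = percent(71, 188)     # share of complexes with a cation-π contact
arg_ade_share = percent(36, total_interactions)    # share of Arg-Ade among all pairs
```

Both tallies sum to the same total, and the two percentages agree with the figures of 38% of the complexes and 47% of the pairs quoted elsewhere in the text.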
The preferential occurrence of Arg residues in cation-π interactions is in agreement with previous observations in proteins (3,12,29,30) and in protein-DNA complexes (10,31). The lower frequency of Asn and Gln residues compared with Arg and Lys also agrees with previous observations (13) and is related to the fact that they carry only a partial positive charge on their side chain amino group. The observation that Ade is almost three times more frequently involved in cation-π interactions than Gua seems a priori in contradiction with what happens in protein-DNA complexes, where Gua is more frequent (10). It is not, however: we find that 58% of the Gua-containing ligands form a cation-π interaction, against 40% of the Ade-containing ligands.
To probe the orientation of the Arg, Asn, and Gln side chains relative to the nucleic acid base, we computed the angle between the aromatic plane and the plane formed by the guanidinium group of Arg or the amide group of Asn or Gln side chains. This angle is, on average, equal to 32, 35, and 41° for Arg, Gln, and Asn, respectively. The planes are thus usually somewhat more parallel than perpendicular, which has already been noted for guanidinium groups in aqueous solution (32) and in proteins (13,33), although the perpendicular or T-shaped conformation seems more stable in vacuum (3,30). Moreover, the deviation from the stacked conformation is largest for Gln and Asn. This allows the partial positive charge carried by the amino group to be closer to the aromatic ring than the partial negative charge carried by the oxygen of the carbonyl group, thereby improving the electrostatic contribution (13). Note that the deviation from the stacked conformation of Arg, Gln, and Asn is larger here than in the protein-DNA context (11), where the simultaneous formation of a double H-bond between the guanidinium or amide group and a subsequent nucleic acid base along the DNA stack imposes geometrical constraints favoring the parallel conformation.

FIG. 1. Three-dimensional representation of cation-π chain motifs. The ligand is colored using the following code: carbon (green), nitrogen (blue), oxygen (red), and phosphate (purple). The images were generated using Insight II (Accelrys Inc.). a, +–π–+(o) motif in 1QF5, involving the GDP ligand, Lys (A18) and Lys (A331); b, +–π–+(s) motif in 1C1Y, involving the GTP ligand, Asn (A116) and Lys (A117); c, π–+–π motif in 1LVK, involving the MNT ligand, Asn (127) and Phe (129); d, π–+–π motif in 1H8E, involving the ADP ligand, Gln (A432) and Tyr (A433).
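The angle between the aromatic plane and the guanidinium or amide plane can be computed from three atoms of each group. The sketch below is a minimal illustration, assuming each plane is defined by three non-collinear atoms and that the angle is folded into the 0-90° range, so that 0° corresponds to the stacked and 90° to the T-shaped arrangement; the function names are hypothetical.

```python
import math

def plane_normal(p1, p2, p3):
    """Normal vector of the plane through three non-collinear points."""
    u = (p2[0] - p1[0], p2[1] - p1[1], p2[2] - p1[2])
    v = (p3[0] - p1[0], p3[1] - p1[1], p3[2] - p1[2])
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def interplanar_angle(plane_a, plane_b):
    """Angle in degrees (0 to 90) between two planes, each given by 3 points."""
    n1 = plane_normal(*plane_a)
    n2 = plane_normal(*plane_b)
    dot = abs(n1[0] * n2[0] + n1[1] * n2[1] + n1[2] * n2[2])
    norms = math.sqrt(sum(x * x for x in n1)) * math.sqrt(sum(x * x for x in n2))
    # Clamp to guard against rounding slightly above 1.
    return math.degrees(math.acos(min(1.0, dot / norms)))
```

Taking the absolute value of the dot product makes the result independent of the order in which the atoms of each plane are listed.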
Cation-π Chains in Protein-Ligand Complexes-In some instances, two or more cation-π interactions occur concomitantly, hence forming cation-π chain motifs (11). Two types of chain motifs can be distinguished: either a side chain carrying a net or partial positive charge interacts with two aromatic rings (type π–+–π chain, where π denotes an aromatic cycle; +, a net or partial positive charge; and –, a cation-π interaction) or an aromatic ring interacts with two side chains carrying a net or partial positive charge (type +–π–+ chain). A total of 10 cation-π chains are identified in the set of protein-ligand complexes, 5 of each type (Table I).
In the 5 occurrences of type π–+–π cation-π chains, a guanidinium or amide group of Arg, Asn, or Gln is sandwiched between an Ade base and an aromatic Trp, Tyr, or Phe side chain. The protein-ligand cation-π interaction is thus prolonged with a protein-protein cation-π interaction (Fig. 1, c and d).
The +–π–+ cation-π chains can be subdivided into two subsets, one in which the two side chains are on the same side of the aromatic ring (noted +–π–+(s)) and one in which they are on opposite sides (+–π–+(o)). There is only one occurrence of a +–π–+(o) chain, with the 5-member cycle of a Gua base sandwiched between two Lys side chains (Fig. 1a), whereas there are four occurrences of +–π–+(s) chains, where a Gua base interacts simultaneously with two consecutive residues along the polypeptide chain, an Asn and a Lys, both situated above the 5-member cycle (Fig. 1b). Note the simultaneous presence of one partial and one full positive charge above the aromatic cycle; the presence of two net positive charges would probably be unfavorable.
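The distinction between the same-side and opposite-side variants reduces to comparing the sign of each charged atom's height along the ring normal. The following sketch assumes the ring centroid and a normal vector are already known (for instance from the geometric detection step); the function name and the returned labels "s" and "o" are illustrative.

```python
def classify_plus_pi_plus(centroid, normal, atom_1, atom_2):
    """Return "s" if both charged atoms lie on the same face of the ring,
    "o" if they lie on opposite faces.

    centroid, normal: ring centroid and any vector normal to the ring plane.
    atom_1, atom_2: coordinates of the two (partially) charged atoms.
    An atom lying exactly in the ring plane is not handled specially and
    yields "o"; such a geometry would not be a cation-π contact anyway.
    """
    def signed_height(atom):
        return sum((atom[i] - centroid[i]) * normal[i] for i in range(3))

    return "s" if signed_height(atom_1) * signed_height(atom_2) > 0 else "o"
```

Only the sign of the product matters, so the normal vector does not need to be normalized or oriented in any particular direction.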
Though the number of cation-π chains is too limited to allow firm conclusions, it is striking that π–+–π cation-π chains always involve Ade bases included in ADP-like ligands and that the +–π–+ chains always involve Gua bases contained in GDP-like ligands. Furthermore, the five proteins presenting the π–+–π motif present neither sequence nor structure similarity. In contrast, the four proteins having the +–π–+(s) motif have unrelated sequences but are structurally similar; the structure superposition program SoFiSt (34) superposes about 130 residues with a root mean square deviation of heavy backbone atoms in the 1.0-2.7 Å range. Strikingly, in the resulting sequence alignments, the Asn-Lys pair forming the cation-π chain with the ligand is aligned in all four structures. Among the other residues that interact with the ligand, in particular through H-bonds, the pattern Gly-Lys-Thr-Thr, or its variant Gly-Lys-Ser-(Ser/Ala), is also well conserved and aligned, as well as an Asp residue at position A119 in 1C1Y numbering. There is thus a residue conservation in the immediate neighborhood of the ligand but not in the rest of the protein.
In general, it can be argued that the presence of a specific cation-π chain motif could be due to a local sequence conservation (a global sequence similarity is excluded because of the requirement of less than 25% protein sequence identity, see "Materials and Methods") and/or have a structural role. In particular, the conservation of the +–π–+(s) motif could possibly reflect some sequence similarity left after (highly) divergent evolution or acquired during convergent evolution. Alternatively, this motif, as well as the π–+–π motif, could play a particular role in the recognition process, especially as the former is exclusively found in GDP/protein complexes and the latter in ADP/protein complexes. However, it must be noted that not all ADP- or GDP-containing proteins exhibit these motifs. At this point, the exact role of these chain motifs is thus far from settled.
the nucleobases and the aromatic residues Tyr, Trp, and Phe, among which 49 correspond to π-π stacking interactions. Hence, the nucleobase-amino acid cation-π interactions, with their 77 occurrences, appear to be more frequent than π-π stacking interactions in protein-ligand complexes, and their importance should thus not be overlooked. Furthermore, most of the amino acid side chains involved in cation-π interactions simultaneously make additional interactions with their local environment. These interactions vary among the complexes and involve solvent molecules, other amino acid side chains, or other ligand moieties adjacent to the purine or pyrimidine ring.
This observation demonstrates an important difference between protein-ligand and protein-DNA cation-π interactions. Indeed, the latter occur almost systematically with an H-bond, generally double, that links the guanidinium, amide, or ammonium group with the next base along the DNA stack, hence forming a recurrent motif with a typical stair shape (11). This concomitant occurrence has sometimes been used to cast doubt on the reality of cation-π interactions, suspected to result only from the geometric constraint imposed by the H-bond. In contrast, in 9% of the protein-ligand cation-π interactions no concomitant H-bond is observed, and in the remaining 91% of cases, the H-bond forms with different partners positioned differently with respect to the cation-π partners, as a consequence of a much more flexible environment. Thus, the cation-π interaction is here certainly not due to a geometric constraint.

Conservation of Cation-π Interactions in Protein Families-The conservation of the amino acid residues involved in cation-π interactions in families of related proteins was analyzed by scanning the PIR sequence data base, as described under "Materials and Methods." For the sake of simplicity, we focused on the cation-π chain motifs given in Table I. The results are summarized in Table II. The two residues carrying a partial or net positive charge, involved in the +–π–+(s) cation-π chain motifs, are almost perfectly conserved in the 4 protein families in which they occur. In the only family presenting a +–π–+(o) motif, at least one of the Lys residues, and thus at least one of the two cation-π interactions, is conserved. Finally, in the 5 families displaying a π–+–π chain, the conservation of the charged residue, which forms the cation-π interaction with the ligand, is quite fair; the conservation of the aromatic residue, which extends the cation-π motif inside the protein, is somewhat lower. Note that the observed substitutions often maintain the aromatic or (partially) charged nature of the amino acid side chains. In summary, the rate of residue conservation of the different motifs is somewhat variable, but overall quite good, which supports the importance of cation-π interactions in the formation and/or stabilization of protein-ligand complexes.
Energy Calculations on Cation-π Partners-With the purpose of explaining the preferred occurrence of specific cation-π partners, we performed ab initio quantum mechanics energy calculations at MP2/6-31G**(0.2) level on the 77 cation-π pairs extracted from their environment, as described under "Materials and Methods." The energy values are given under "Supplementary Material" and their averages in Table III; the cation-π interactions involving Ade bases and guanidinium groups are depicted and colored according to their energy in Fig. 2. For Lys residues, two different orientations of the ammonium group relative to the nucleic acid base were considered; because both orientations gave almost the same energy value (see Table III), only the most favorable energy was retained.
The first remarkable feature, clearly visible in Fig. 2, is that roughly all positions in a slice situated about 4 Å above the aromatic cycles are favorable energetically to groups carrying a net or partial positive charge. The positions with the lowest energies are just above the cycles or above some of the extracyclic atoms; the energies become less favorable when moving further away from the cycles.

FIG. 2. Distribution of cation-π pairs involving guanidinium groups and Ade ring systems, observed in the protein set. The Ade bases are superimposed and, for a better comparison, the guanidinium groups have all been located "above" the Ade plane, by projection with respect to that plane. Yellow and red indicate energetically favorable and unfavorable positions, respectively. Carbon atoms of the Ade base are colored in green and nitrogen atoms in blue. The images were generated using Insight II (Accelrys Inc.). a, back view; b, side view.
Another observation is that the most favorable energies are reached for cation-π interactions involving the positively charged Lys or Arg side chains and the purine bases Ade or Gua. The Arg-Gua, Arg-Ade, Lys-Ade, and Lys-Gua pairs display an average interaction energy of −10.4, −5.6, −4.5, and −3.9 kcal/mol, respectively. Note that the significance of the average energy of Arg-Gua pairs is questionable, as it is computed on the basis of only two occurrences. The interaction energies of cation-π pairs between purine bases and Asn/Gln residues, which involve only a partial positive charge, are somewhat less favorable but usually still negative: −3.7, −2.4, and −1.3 kcal/mol on average for Asn-Ade, Asn-Gua, and Gln-Ade. Finally, cation-π interactions involving the pyrimidine bases Cyt and Thy are very rare: there is at most one occurrence of each pair. Their energy is equal to −1.6 or −6.1 kcal/mol when involving Arg, and to −0.5/−1.6 kcal/mol when involving Asn/Gln. Cation-π interactions involving pyrimidine bases thus seem slightly less favorable than those involving purine bases, but there are too few examples to be sure of this. Note that these conclusions do not depend on whether the charged group is located above the aromatic ring itself or above the extracyclic atoms: the average energies remain essentially unchanged.
A feature that distinguishes Lys as cation-π partner is that the ΔE_HF and ΔE_MP2 interaction energies are almost equal, whereas for Arg, Asn, and Gln, ΔE_MP2 is of the order of 3 kcal/mol lower than ΔE_HF. This means that the cation-π interaction energy is essentially of electrostatic nature when Lys residues are involved, whereas the electron correlation contribution is non-negligible for Arg, Asn, and Gln residues, owing to the stacking between the aromatic and the guanidinium or amide planes when these are parallel and to the delocalization of the charge. This difference could perhaps explain why Arg is more frequently involved in cation-π interactions than Lys. Note that it is the electron correlation contribution that generally renders the cation-π interactions involving Asn or Gln favorable. Indeed, ΔE_HF is usually positive or zero and ΔE_MP2 is usually negative, both for the parallel and perpendicular (or T-shaped) conformations.
The lowest cation-π interaction energies are found for specific geometries. The Lys-Gua pair in the 1QF5 complex, with the lowest of all ΔE_MP2 energies computed (−16.4 kcal/mol), has one hydrogen atom of the ammonium group close to the extracyclic oxygen atom O6, yielding a strongly negative electrostatic term. Interaction energies of −10 to −11 kcal/mol are reached for Arg-Ade pairs in the 1E2F, 1QFL, and 1ZIN complexes, where the guanidinium ion is stacked above the 6-member aromatic ring in close proximity to the exocyclic nitrogen; this particularly favorable conformation has already been noted before (30,37). The two Asn-Ade cation-π pairs identified in the 1A82 and 1DXY structures are also quite particular; the same partners form both a cation-π and an H-bond interaction, which explains the low energy values obtained (−8.2 and −7.4 kcal/mol). More precisely, the H-bond links the Oδ1 atom of Asn to the H-atom on N6 of Ade, whereas the Nδ2 atom of Asn is in cation-π interaction above the N7 atom of Ade; these cation-π interactions are however at the limit of our detection criteria, as they are situated near the border of the cylinder defining them (see "Materials and Methods").
In protein-DNA complexes, the Arg-Gua pairs were shown to constitute the most stable and frequent cation-π interactions (10). Here, in the protein-ligand context, only two such associations have been identified, and their energies are so different (−5.6 and −15.2 kcal/mol) that it is difficult to determine whether they are really more stable than the other cation-π pairs. As a matter of fact, there is no clear correlation between the energy values and the frequency of occurrence, in contrast to the protein-DNA cation-π interactions.
An important difference between the protein-DNA and protein-ligand contexts is that in the former the amino acid side chains can approach only some sides of the nucleic acid bases and remain in general far from the center, because of steric constraints. Moreover, since the exocyclic heteroatoms accessible to amino acid side chains are different in the four nucleic acid bases, the cation-π energy values are very different. This renders the protein-DNA cation-π interactions quite specific. This view is in agreement with the recent observation that the strength of cation-π interactions is very sensitive to the electron density on the face of the aromatic ring (38). In protein-ligand complexes, there are far fewer constraints, and amino acid side chains can come right above the aromatic cycles of the ligands, with the consequence that the cation-π energy values are more similar for the four nucleic acid bases and that these interactions lose specificity.

DISCUSSION
Protein-ligand cation-π interactions between charged residues and nucleic acid bases are observed in 38% of the complexes of the dataset and are thus rather common. They are even more common than π-π interactions between aromatic residues and these bases. The more specific cation-π chain motifs that simultaneously involve two cation-π interactions appear in 5% of the complexes. These observations concur with the generally favorable energies of cation-π pairs estimated using ab initio quantum mechanics calculations at MP2 level. These energies are, on average, in the −4 to −10 kcal/mol range when involving the net charge of Arg or Lys, and in the −1 to −4 kcal/mol range when involving the partial positive charge of Asn or Gln. The frequent occurrence of protein-ligand cation-π pairs in protein structures and their favorable energy lead to the conclusion that they contribute to the stability of the complex.
We would like to stress that the energy of the cation-π interactions involving Asn/Gln, also termed amino-π interactions, is on average slightly positive at HF calculation level and becomes negative at MP2 level (see Table III and "Supplementary Material"). These interactions are thus only computed as being favorable when electron correlation contributions are taken into account.
An a priori surprising result of the present analysis is the poor correlation between the frequency of occurrence of specific cation-π pairs and their computed energy. Indeed, Arg-Ade is more frequent and more favorable energetically than Lys-Ade, on average, but Arg-Gua is less frequent than Lys-Gua although it has a more negative average energy. By far the most frequent cation-π pair is Arg-Ade, which constitutes 47% of all observed pairs, whereas its energy is not more favorable than that of Arg-Gua pairs, which form less than 3% of the cation-π pairs. There is thus only a very limited correlation between the frequency of cation-π pairs and their energies, in contrast with what happens in the protein-DNA context, where an almost perfect correlation has been noted (10).
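One way to put a number on such a frequency-energy relationship is a rank correlation between the number of occurrences of each pair type and its mean interaction energy. The sketch below is not the procedure used in this work; the function names `average_ranks` and `spearman` and the toy inputs are assumptions made for illustration, and the numbers are placeholders, not values from Table III.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation, i.e. the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Placeholder inputs only (hypothetical, NOT taken from Table III).
toy_counts = [40, 10, 5, 2]               # occurrences per pair type
toy_energies = [-6.0, -5.0, -7.0, -3.0]   # mean energies, kcal/mol
# Negate the energies so that "more favorable" corresponds to a higher rank.
rho = spearman(toy_counts, [-e for e in toy_energies])
```

A value of `rho` close to 1 would mean that the most favorable pairs are also the most frequent ones; a value near 0 would correspond to the limited correlation described above. A rank-based measure is chosen here because it does not assume a linear relationship between counts and energies.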
There are several possible explanations for this limited correlation. First, the ab initio energy computations were performed in vacuum and may not be directly transposed to more realistic environments consisting of water and/or protein residues. Ab initio energies of H-bonds and especially salt bridges are largely overestimated in vacuum compared with water (4, 39), whereas stacking interactions are less overestimated (39). In other words, electrostatic contributions, estimated at the HF level, are more overestimated than electron correlation contributions introduced at the MP2 level. Since both the electrostatic and electron correlation contributions are important in cation-π interactions (Table III), albeit to different degrees according to the type of cation-π partners and their relative positioning, the transposition of vacuum energy values to energy values in solvent can modify the energy ranking of the different cation-π pairs. Note that we chose not to optimize the geometry of the interactions. Indeed, optimization is likely to induce non-realistic geometries, because the relative weight of the different interactions is not reliable, and also because extracting two partners from their environment overlooks the natural structural constraints (40).
Another reason for the preferential occurrence of certain cation-π pairs could be their possible role in protein-ligand recognition and/or in functional features. This view is supported by the conservation in unrelated sequences of the +-π-+ cation-π chain, with a Gua from a GDP-type ligand in simultaneous cation-π interaction with an Asn and a Lys, and of the π-+-π chain motif, where a (partially) charged side chain group is sandwiched between an aromatic amino acid and an Ade base from an ADP-type ligand. However, considering that the proteins in which these chain motifs occur do not seem to have particular functional features compared with proteins that are bound to the same ligands but that present no or different cation-π interaction patterns, this view appears rather dubious. The comparison of protein subsets with and without cation-π interactions (not necessarily cation-π chains) leads to a similar conclusion. We indeed found that each type of ligand is linked to its target protein sometimes through cation-π interactions and sometimes not. In particular, ATP is involved in cation-π interactions in some transferases and kinases and not in others. It remains nevertheless conceivable that cation-π (chain) interactions are associated with a more precise, not yet identified, recognition or functional feature. Alternatively, there could be several ways to achieve a given goal, either by means of cation-π (chain) motifs or through other types of interactions.
Finally, the fact that the overwhelming majority of protein-ligand cation-π interactions involve Ade bases, although their energy is not more favorable than that of interactions involving Gua, can simply be explained by the abundance of Ade as a natural ligand building block. A rapid calculation indeed shows that a slightly larger fraction of Gua-containing ligands than of Ade-containing ligands forms cation-π interactions. This result shifts the question to why Ade is the most frequent ligand skeleton. A possible answer is that Ade was the first purine to be incorporated in living matter (41), but this prebiotic hypothesis is still a subject of debate (26). It is in agreement with the fact that purine bases are easier to synthesize than pyrimidine bases, thereby explaining the very low number of Cyt- and Thy-containing ligands. However, de novo purine nucleotide biosynthesis requires, at least today and in most eukaryotes, an equivalent number of steps. The predominance of Ade could also be a random initial choice of nature that was kept through evolution, or be related to the smaller number of possible H-bonds of Ade compared with Gua and thus to its lesser specificity. It could also be related to the property of Gua to be easily oxidizable (23), which could be problematic for certain protein-ligand interactions. Clearly, this issue is far from being understood.