The Structure of an Ancient Conserved Domain Establishes a Structural Basis for Stable Histidine Phosphorylation and Identifies a New Family of Adenosine-specific Kinases*

Phosphorylation of both small molecules and proteins plays a central role in many biological processes. In proteins, phosphorylation most commonly targets the oxygen atoms of Ser, Thr, and Tyr. In contrast, stably phosphorylated His residues are rarely found, due to the lability of the N-P bond, and histidine phosphorylation features most often in transient processes. Here we present the crystal structure of a protein of previously unknown function, which proves to contain a stably phosphorylated histidine residue. The protein is the product of open reading frame PAE2307, from the hyperthermophilic archaeon Pyrobaculum aerophilum, and is representative of a highly conserved protein family found in archaea and bacteria. The crystal structure of PAE2307, solved at 1.45-Å resolution (R = 0.208, Rfree = 0.227), forms a remarkably tightly associated hexamer. The phosphorylated histidine at the proposed active site, pHis85, occupies a cavity that is at the interface between two subunits and contains a number of fully conserved residues. Stable phosphorylation is attributed to favorable hydrogen bonding of the phosphoryl group and a salt bridge with pHis85 that provides electronic stabilization. In silico modeling suggested that the protein may function as an adenosine kinase, a conclusion that is supported by in vitro assays of adenosine binding, using fluorescence spectroscopy, and crystallographic visualization of an adenosine complex of PAE2307 at 2.25-Å resolution.

Phosphorylation of appropriate functional groups provides one of the key mechanistic devices in biology. Thus, phosphoryl transfer reactions involving molecules such as ATP and GTP, and the phosphorylation state of small molecule substrates, are critical to many enzyme reactions. Likewise, the phosphorylation and dephosphorylation of amino acid side chains in proteins form the basis of many signaling and regulatory processes. The most common targets are the -OH groups of amino acid side chains such as serine, threonine, and tyrosine and of small molecules such as sugars. In part this reflects the ubiquitous occurrence of such groups, but it also depends on the relative stability of the O-P bond to hydrolysis, which gives long-lived phosphorylation.
In contrast, the phosphorylation of nitrogen atoms, with the formation of N-P bonds, is not often observed (1). In proteins, the phosphorylation of histidine residues produces a phosphoramidate bond that has a large standard free energy of hydrolysis. This makes phosphohistidines the least stable of the known phosphoamino acids (2) and favors the utilization of histidine in rapid processes involving transient phosphorylation. Examples include the use of phosphohistidines as enzyme intermediates, as in the mechanisms of succinyl-CoA synthetase (3) and nucleoside diphosphate kinase (4), or for rapid signaling processes such as the two-component systems of bacteria (5,6). In the latter, a sensor is connected to a regulator through histidine phosphorylation and a subsequent phosphotransfer to an aspartate residue. Another well characterized system, the bacterial phosphoenolpyruvate:sugar phosphotransferase system (PTS), consists of proteins that carry out four successive phosphoryl transfers until the final phosphorylation of the sugar, concomitant with transport into the cell across the bacterial cell membrane (7). In the PTS, Enzyme I (EI) first autophosphorylates, using phosphoenolpyruvate as a substrate, and then transfers the phosphoryl group to the histidine-containing protein. Subsequently the phosphoryl group is transferred to the sugar-specific Enzymes II (EIIA and EIIB), which can consist of multisubunit enzymes or separate proteins. Each of the four proteins, EI, histidine-containing protein, EIIA, and EIIB, is transiently phosphorylated at a histidine residue, except certain EIIBs, which instead can be phosphorylated at a cysteine residue. During the phosphotransfer in the PTS, the phosphoryl group, commencing with the Nε2 atom on EI, alternates between being bound on the Nε2 atom of a histidine residue of one component and the Nδ1 atom of a histidine of the next component (8).
The characteristic instability of phosphorylated histidines means that only a few examples have been defined structurally. The structure of the phosphohistidine form of histidine-containing protein has been determined by NMR spectroscopy (9), but none of the phosphohistidine forms of the PTS proteins have been elucidated by x-ray crystallography. The structures of phosphohistidine intermediates have been determined crystallographically for three enzymes, succinyl-CoA synthetase (3), nucleoside diphosphate kinase (4), and a cofactor-dependent phosphoglycerate mutase (10). From these, it is clear that phosphohistidine residues can be stabilized in certain environments. Indeed, in histone H4, a phosphohistidine (at residue 75) has been shown to have a half-life of 12 days at room temperature and pH 7.6 (11).
Here we describe the crystal structure of a functionally uncharacterized protein, PAE2307, which has been discovered to contain a stably phosphorylated histidine residue. This protein was selected as a target in a pilot structural genomics enterprise, focused on open reading frames (ORFs) from the hyperthermophilic archaeon Pyrobaculum aerophilum (12), which were annotated as unknowns and for which no functional or structural predictions could be made based on amino acid sequence. PAE2307 is a representative of a conserved family of proteins found in both archaeal and bacterial species, which has been described as an "ancient conserved domain" (13) and presumably has an evolutionarily distant origin, prior to the divergence of the bacteria and the archaea. The high level of conservation in this family suggests some important but previously uncharacterized biological function (14,15). The discovery of a phosphorylated histidine strongly suggests that PAE2307 is involved in a phosphotransfer reaction common to both bacteria and archaea and enables us to analyze the structural features that contribute to stable histidine phosphorylation. Additionally, bioinformatic evidence coupled with weak structural similarity to a thermostable DNA polymerase suggests that nucleotides or nucleosides are the most likely substrate for the phosphorylation reaction thought to be catalyzed by PAE2307. In silico modeling identified a binding site for adenosine that places the C5′-hydroxyl group of ribose adjacent to the phosphorylated histidine, suggesting that the protein may function as a nucleoside kinase. This model is supported by in vitro measurements using fluorescence spectroscopy and the visualization by x-ray crystallography of the adenosine-bound form of the protein.

EXPERIMENTAL PROCEDURES
ORF Selection-ORFs that were annotated as "unknown" in the preliminary annotation of the P. aerophilum genome (12) were selected as an initial target set. These ORFs were then screened for putative transmembrane helices using DAS (16) and TMHMM (17). PSI-BLAST (18) searches using the ORFs that lacked predicted transmembrane helices were manually inspected, and those with any significant similarity to characterized genes were removed. Finally, the remaining ORFs were passed through a fold prediction algorithm (19), and those that gave a Z-score of >5 were removed. The remaining ORFs were considered "true" unknowns and were selected for further study.
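The selection steps above amount to a chain of exclusion filters. The following is a minimal Python sketch of that logic, assuming each ORF has already been annotated with the outputs of the tools named in the text. The `Orf` record, its field names, and the example entries are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Orf:
    """Hypothetical per-ORF summary of the screening results."""
    name: str
    annotated_unknown: bool       # "unknown" in the preliminary genome annotation
    has_tm_helix: bool            # transmembrane helix predicted by DAS or TMHMM
    has_characterized_hit: bool   # significant PSI-BLAST similarity to a characterized gene
    fold_z_score: float           # best Z-score from the fold prediction step


def select_true_unknowns(orfs, z_cutoff=5.0):
    """Keep ORFs that pass every exclusion filter described in the text."""
    return [
        orf for orf in orfs
        if orf.annotated_unknown
        and not orf.has_tm_helix
        and not orf.has_characterized_hit
        and orf.fold_z_score <= z_cutoff   # Z-scores above the cutoff are removed
    ]


# Made-up example entries, for illustration only.
candidates = [
    Orf("orf_a", True, False, False, 2.1),
    Orf("orf_b", True, True, False, 1.0),    # removed: predicted membrane protein
    Orf("orf_c", True, False, True, 0.5),    # removed: similar to a characterized gene
    Orf("orf_d", True, False, False, 7.3),   # removed: confident fold prediction
]
print([orf.name for orf in select_true_unknowns(candidates)])
```

In the study the PSI-BLAST results were inspected manually, so the boolean `has_characterized_hit` stands in for a human judgment rather than an automated threshold.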
Protein Expression, Purification, and Crystallization-PAE2307 was overexpressed in Escherichia coli, using the expression vector pET28, and was purified by immobilized metal ion affinity chromatography and size-exclusion chromatography as described (20). The expression vector adds an N-terminal polyhistidine tag, which was used for purification, but was not cleaved, and therefore remained on the protein during subsequent studies. Native PAE2307 crystals were initially obtained in two crystal forms, only one of which was suitable for x-ray analysis (20). This crystal form was grown from an unbuffered solution of 50 mM KH2PO4 containing 20% polyethylene glycol 8000, and proved to be tetragonal, space group I4₁22, with unit cell dimensions a = b = 120.0, c = 156.5 Å and three copies of the protein monomer in the asymmetric unit. The crystals diffracted to 1.45-Å resolution, and they were used for the native structure determination but were not reproducible. For subsequent ligand soaking experiments, a third crystal form (Type III) was obtained by mixing protein solution (6.8 mg/ml in 20 mM HEPES, 150 mM NaCl, pH 8.0) in a 1:1 ratio with mother liquor (21% (w/v) methoxypolyethylene glycol 5000, 0.2 M N-(1,1-dimethyl-2-hydroxyethyl)-3-amino-2-hydroxypropanesulfonic acid/KOH, pH 8.3) and incubating in a sitting-drop experiment at 18°C. Thick, plate-like crystals appeared after 3 days and reached their maximum size after 14 days. These crystals proved to be orthorhombic, space group P2₁2₁2₁, a = 76.9, b = 109.8, c = 112.5 Å, with six molecules in the asymmetric unit.
Data Collection, Structure Determination, and Refinement-The structure of native apoPAE2307 was solved by single-wavelength anomalous diffraction phasing from a platinum derivative, as described previously (20). Manual building using O (21) into a map calculated from experimental phases to 2.1 Å yielded a partial model, which was refined against a 1.45-Å resolution native data set, collected at 110 K at the Stanford Synchrotron Radiation Laboratory (beamline 9-2, Area Detector Systems Corp. Quantum 4 detector, wavelength 0.9792 Å). The data statistics are in Table 1. A single round of simulated annealing using CNS (22) reduced the Rfree from 53.3% to 43.4%, and subsequent automatic model building using ARP/wARP (23) resulted in an almost complete amino acid chain for all protomers. Two further rounds of model building using O, interspersed with simulated annealing, energy minimization, and individual atomic B-factor refinement using CNS, resulted in a final model comprising residues 5-167 of monomers A and B, and residues 6-167 of monomer C, with Rcryst = 20.8% and Rfree = 22.1%. No density was found for the N-terminal His tag on any of the three molecules in the asymmetric unit. Full refinement statistics are shown in Table 1. The Ramachandran plot produced by PROCHECK (24) showed that 90.8% of all residues fall in the most favored regions, and only two residues per monomer, Ala26 and Phe28, are in disallowed regions; both are adjacent to the active site, and both have unambiguous electron density.
The complex with adenosine was obtained by soaking a Type III native crystal in a series of cryoprotectant solutions, which consisted of mother liquor containing 1%, 5%, 10%, 15%, and 20% glycerol, and which were all saturated with adenosine. Adenosine was poorly soluble in the mother liquor, but its concentration was estimated to be at least 250 μM. The crystal was then flash-cooled in liquid nitrogen for data collection. Data were collected at 110 K at the European Synchrotron Radiation Facility, Grenoble (beamline ID-29, wavelength 0.9793 Å). The data were processed with MOSFLM and SCALA from the CCP4 program suite (25), giving a full data set to 2.25 Å (Table 1). The structure of the adenosine complex was solved by molecular replacement with MOLREP (26), using the 1.45-Å native structure as search model. Refinement was with , with manual rebuilding achieved by using COOT (28). Clear electron density for a bound adenosine molecule was found in the putative active sites of each of the six molecules in the asymmetric unit; these were modeled in and refined with full occupancy. The final structure comprises residues 5-167 of molecules A, B, D, and E, and 6-167 of molecules C and F; six adenosine molecules; two phosphate ions; and 407 water molecules. Final values of Rcryst and Rfree were 18.2% and 24.1%, respectively (see Table 1 for full details).
Crystallographic Verification of His85 Phosphorylation in ApoPAE2307-The first electron density maps of the I4₁22 crystal form strongly suggested that His85 was phosphorylated in each of the three protomers. The initial refinement, in which this residue was modeled as an unmodified histidine, returned B-factors for His85 and the associated Asp139 that were consistent and similar: 10-14 Å² and 13-15 Å², respectively, for protomer A, 12-15 Å² and 11-13 Å² for protomer B, and 11-13 Å² and 14-15 Å² for protomer C. Difference maps after this refinement showed unequivocal electron density for phosphoryl groups attached to each histidine, at levels of 4σ, 2σ, and 4σ, respectively, for monomers A, B, and C, and with P-Nε2 (His) distances of 1.7-1.8 Å. Phosphoryl groups were then added to each His85 side chain, at full occupancy, and a further round of refinement was carried out. The B-factors for the three phosphoryl groups were all in the range 20-23 Å², slightly higher than the values for the non-hydrogen atoms of the side chains, which remained at 12-17 Å². However, difference maps showed some negative density around the phosphorus atoms, so a round of grouped occupancy refinement was performed for the histidine phosphate groups, which lowered the occupancy of the phosphate groups to 0.74 for protomer A, 0.60 for protomer B, and 0.67 for protomer C. These values were retained in further rounds of refinement.
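As a plausibility check on the stated asymmetric unit contents, the Matthews coefficient (unit cell volume per dalton of protein) can be estimated from the cell dimensions above. The sketch below is not part of the original analysis. It assumes the monomer mass of 20,674 Da reported from mass spectrometry later in this section, and the standard numbers of symmetry-equivalent general positions for the two space groups (16 for I4₁22, 4 for P2₁2₁2₁). The function name is illustrative only.

```python
def matthews_coefficient(a, b, c, n_sym_ops, n_copies, monomer_mass_da):
    """Matthews coefficient in cubic angstroms per dalton.

    Valid only for cells whose angles are all 90 degrees, which holds for
    both the tetragonal and the orthorhombic crystal forms described here.
    """
    cell_volume = a * b * c
    protein_mass_in_cell = n_sym_ops * n_copies * monomer_mass_da
    return cell_volume / protein_mass_in_cell


MONOMER_MASS_DA = 20674  # His-tagged monomer, from the mass spectrometry result

# Tetragonal native form: three monomers per asymmetric unit.
vm_tetragonal = matthews_coefficient(120.0, 120.0, 156.5, 16, 3, MONOMER_MASS_DA)

# Orthorhombic Type III form: six monomers per asymmetric unit.
vm_orthorhombic = matthews_coefficient(76.9, 109.8, 112.5, 4, 6, MONOMER_MASS_DA)

print(f"I4(1)22:      {vm_tetragonal:.2f} A^3/Da")
print(f"P2(1)2(1)2(1): {vm_orthorhombic:.2f} A^3/Da")
```

Both values fall within the range of roughly 1.7 to 3.5 Å³/Da that is commonly quoted for protein crystals, which is consistent with three and six copies per asymmetric unit, respectively.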
In Silico Docking Experiments-The structures of potential ligands at the highest resolution available were obtained from the HIC-Up data base (29) and were energy-minimized and converted to MOL2 format using the PRODRG server (30). In silico ligand binding was performed using the program GOLD (31).
Fluorescence Spectroscopy-Spectroscopic experiments were performed on a Hitachi F-4500 fluorescence spectrophotometer, with excitation and emission slit widths of 5 nm, and a scan speed of 240 nm/min. Excitation and emission wavelengths of 280 nm and 347 nm, respectively, were used for all samples, using a quartz cuvette with 0.5-cm path length. Analysis by mass spectrometry gave a mass of 20,674 Da, compared with 20,675 Da expected for the full-length His-tagged protein minus its N-terminal Met residue. This showed that the protein used in these assays was homogeneous and not phosphorylated. Protein (5 μM) and ligands (0.5-500 μM) were each dissolved in a buffer solution containing 20 mM HEPES and 150 mM NaCl at pH 8.0. The data were corrected for inner filter effects using a correction factor as described by Eftink (32). The true protein fluorescence intensity was determined by subtracting the buffer fluorescence intensity from the sample fluorescence intensity, and multiplying by the correction factor (C) for that particular sample, where $C = 10^{(\Delta A_{ex}/2) + (\Delta A_{em}/2)}$, $\Delta A_{ex}$ is the absorbance change due to substrate at the excitation wavelength, and $\Delta A_{em}$ is the absorbance change due to substrate at the emission wavelength. Ligand binding data were fitted using SigmaPlot v9.01 (Systat Software Inc.) using a one-site saturation model:
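The correction and fitting steps described above can be sketched in Python. The inner filter correction follows the formula given in the text. The hyperbolic form used for the one-site saturation model, signal = Bmax·[L]/(KD + [L]), is the conventional definition and is an assumption here, as is the fitting routine: a simple grid search over KD with a closed-form least-squares Bmax, which stands in for the SigmaPlot fit and is not the authors' procedure. All function names and the synthetic data are illustrative.

```python
import math


def inner_filter_correction(delta_a_ex, delta_a_em):
    """Correction factor C = 10^((dA_ex / 2) + (dA_em / 2))."""
    return 10 ** ((delta_a_ex / 2) + (delta_a_em / 2))


def corrected_fluorescence(f_sample, f_buffer, delta_a_ex, delta_a_em):
    """Buffer-subtracted intensity multiplied by the inner filter correction."""
    return (f_sample - f_buffer) * inner_filter_correction(delta_a_ex, delta_a_em)


def one_site(ligand, b_max, k_d):
    """Assumed one-site saturation model: b_max * L / (k_d + L)."""
    return b_max * ligand / (k_d + ligand)


def fit_one_site(ligand, signal, kd_min=0.01, kd_max=1e4, rounds=6, points=50):
    """Least-squares fit by repeatedly refining a logarithmic grid over k_d.

    For a fixed k_d the model is linear in b_max, so the best b_max has a
    closed form. Returns (b_max, k_d) in the units of the input data.
    """
    lo, hi = math.log10(kd_min), math.log10(kd_max)
    best = None  # (sum of squared errors, b_max, k_d)
    for _ in range(rounds):
        step = (hi - lo) / (points - 1)
        for i in range(points):
            k_d = 10 ** (lo + i * step)
            g = [x / (k_d + x) for x in ligand]
            b_max = sum(gi * yi for gi, yi in zip(g, signal)) / sum(gi * gi for gi in g)
            sse = sum((yi - b_max * gi) ** 2 for gi, yi in zip(g, signal))
            if best is None or sse < best[0]:
                best = (sse, b_max, k_d)
        centre = math.log10(best[2])
        lo, hi = centre - step, centre + step  # zoom in around the current best
    return best[1], best[2]


# Synthetic, noise-free quenching data (micromolar ligand, arbitrary signal units).
ligand_um = [0.5, 1, 5, 10, 50, 100, 500]
quench = [one_site(x, 100.0, 15.0) for x in ligand_um]
fitted_b_max, fitted_k_d = fit_one_site(ligand_um, quench)
print(f"Bmax = {fitted_b_max:.2f}, KD = {fitted_k_d:.2f} uM")
```

With real titration data the signal passed to the fit would be the change in corrected fluorescence relative to the ligand-free sample, and a proper nonlinear least-squares routine with error estimates would be preferable to the grid search shown here.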

RESULTS
Protein Sequence Analysis-A PSI-BLAST query of the NCBI non-redundant protein sequence data base (release date: December 15, 2005), with the PAE2307 sequence, shows convergence after three iterations (using an E-value cutoff of 0.05) to produce a set of protein sequences derived from 42 distinct species. The proteins form a discrete family of well conserved sequences, all of which are at least 35% identical to each other in amino acid sequence. The members of this protein family are drawn roughly equally from archaeal species (18 examples) and bacterial species (24 examples). They cover a broad phylogenetic range, including representatives from the crenarchaeota, euryarchaeota, α, β, and δ proteobacteria, thermotogae, actinobacteria, Deinococci, and cyanobacteria. However, the family contains no eukaryotic members. A multiple sequence alignment of a phylogenetically diverse selection of these protein sequences is shown in Fig. 1. The multiple sequence alignment reveals 24 residues that are absolutely conserved, including the phosphorylated histidine and several other residues that surround the putative active site. This protein family is classified in the Pfam and Interpro databases as protein of unknown function DUF355 (Pfam accession number PF04008; Interpro accession number IPR007153) and is annotated with the comment that "The high level of conservation in this family suggests some as yet unknown important biological function." The genomic context of the PAE2307 gene is uninformative, because it does not sit in an obvious operon and is flanked in the genome by poorly characterized ORFs, which are annotated as a putative resistance protein and putative transcriptional regulator. There is also no clear synteny, as the context of orthologs in other species is variable.
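The two alignment-derived quantities quoted above, absolutely conserved positions and pairwise percent identity, can be computed from a multiple sequence alignment as in the following Python sketch. The toy alignment is invented for illustration and is not taken from Fig. 1. Percent identity is calculated here over columns in which neither sequence has a gap, which is one of several conventions in use and is an assumption.

```python
def conserved_columns(alignment):
    """Return 0-based indices of columns where every sequence has the same residue."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    first = alignment[0]
    return [
        i for i in range(length)
        if first[i] != "-" and all(seq[i] == first[i] for seq in alignment)
    ]


def percent_identity(seq_a, seq_b):
    """Percent identity over columns that are ungapped in both sequences."""
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


# Invented toy alignment; '-' marks a gap.
toy_alignment = [
    "MKHFG-A",
    "MRHFGTA",
    "MKHWG-A",
]
print(conserved_columns(toy_alignment))
print(round(percent_identity(toy_alignment[0], toy_alignment[1]), 1))
```

Applied to the real family alignment, `conserved_columns` would return the positions reported as absolutely conserved, and the minimum of `percent_identity` over all sequence pairs would correspond to the 35% lower bound stated in the text.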
Overall Fold of PAE2307-The PAE2307 monomer (Fig. 2a) is folded into a single domain with an extended C-terminal arm, culminating in an α-helix, residues 148-162, that plays a major role in oligomerization. The monomer folds around a central five-stranded antiparallel β-sheet, comprising (in order) strands β6-β2-β5-β3-β4 (Fig. 2b). On one face of this sheet are packed two α-helices, α1 (residues 28-42) and α3 (residues 99-105). The other face is covered by a third helix α2 (residues 67-82) and a second, three-stranded, antiparallel β-sheet, comprising strands β1, β7, and β8. The phosphorylated histidine, pHis85, described later, is found at the start of β5, the central strand of the main β-sheet, in a cavity bounded by loops from two adjacent monomers. A striking feature of the structure, possibly related to protein thermostability, given that it is derived from a hyperthermophilic organism, is the large number of charged side chains (7 Asp, 13 Glu, 13 Lys, 10 Arg in 167 residues; 25% overall), which give rise to many salt bridges (Table 2). Eleven intramolecular salt bridges are found, and a further eight help to stabilize the quaternary structure (see below). Also notable are two internal cysteine residues that are close enough to form a disulfide bond (inter-sulfur distance = 3.8 Å) but in the present structure do not do so.
FIGURE 2. Structure of PAE2307. a, ribbon diagram of the PAE2307 monomer colored blue to red from N to C termini, with secondary structure elements labeled. This and subsequent figures were produced using PyMol (W. L. DeLano (2002) The PyMol Molecular Graphics System, DeLano Scientific LLC, San Carlos, CA, available at www.pymol.org). b, topology diagram of the PAE2307 monomer, colored in the same way as in a, with the start and end residues marked for each secondary structure element. c, the PAE2307 hexamer, with each monomer colored separately. The red, blue, and green monomers form a trimer, which represents the contents of the asymmetric unit. A crystallographic 2-fold axis produces the complete hexamer by generating the monomers shown in yellow, brown, and pink. The left-hand picture shows the view looking down the 3-fold non-crystallographic axis and indicates the position of each active site in the trimer, outlined with a light blue oval. The right-hand picture shows the orthogonal (side-on) view of the hexamer.
Quaternary Structure-The protein monomers are arranged in a tightly associated hexamer (Fig. 2c), best described as a dimer of trimers. Dynamic light scattering analysis showed that PAE2307 has a particle size of 123 ± 10 kDa in solution, consistent with a hexameric species. The trimer is formed by the three monomers of the crystal asymmetric unit, with the hexamer being completed by 2-fold crystallographic symmetry. At the center of the trimer, the outer strands (β8; residues 131-140) of the small β-sheet from each of the three monomers face each other around the 3-fold axis. Taking molecule A as the reference, residues 131-135 run antiparallel to 136-140 of molecule B and residues 136-140 run antiparallel to 131-135 of molecule C. Many hydrophobic side chains pack between these three small β-sheets to give stability to the trimer.
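The statement that the light scattering result is consistent with a hexamer can be checked with simple arithmetic. The Python sketch below assumes the monomer mass of 20,674 Da determined by mass spectrometry (see "Experimental Procedures") and asks which integer oligomeric states fall within the quoted error range. The function name and the upper limit on the oligomer size searched are illustrative choices.

```python
MONOMER_MASS_DA = 20674  # His-tagged monomer mass from mass spectrometry


def compatible_oligomers(measured_kda, error_kda, monomer_da, max_n=12):
    """List oligomeric states whose mass lies within measured +/- error."""
    low_da = (measured_kda - error_kda) * 1000.0
    high_da = (measured_kda + error_kda) * 1000.0
    return [n for n in range(1, max_n + 1) if low_da <= n * monomer_da <= high_da]


states = compatible_oligomers(123.0, 10.0, MONOMER_MASS_DA)
print(states)
print(f"hexamer mass: {6 * MONOMER_MASS_DA / 1000.0:.1f} kDa")
```

Under these assumptions only the hexamer falls inside the 113 to 133 kDa window, since the pentamer and heptamer masses lie roughly 20 kDa below and above the measured value.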
Residues 139-144 of molecule A continue on to make extensive inter-subunit contacts, passing through a cleft made by the β3-β4 and the β7-β8 loops from subunit C, with a salt bridge (Arg59-Glu130) completing the encirclement of the peptide as it passes between them. Residues 144-147 from molecule A then run parallel to residues 59-63 of molecule C, making a number of hydrogen bonds, both main chain and side chain; Arg62 and Glu147, both invariant in all the sequences in Fig. 1, form a salt bridge and make other hydrogen bonds that contribute to this very intimate subunit association.
Finally, the C-terminal helix has a striking amino acid composition with no fewer than 11 charged residues out of 16, and extends as an arm to make extensive interactions with the adjacent monomer in the trimer (A with C, C with B, and B with A). Three arginine residues on the inner face of the helix, Arg152, Arg155, and Arg160, play a central role, making three salt bridges, and additional hydrogen bonds, with residues from β1 and β4 of the adjacent subunit, and three intrahelix salt bridges. Three further salt bridges involving Glu147 and Lys166 from this C-terminal arm link it with the β1-β2 loop of the adjacent subunit. These intermolecular salt bridges are listed in Table 1 and clearly play a large part in the trimer stability.
A phosphate ion is found within the narrow tunnel that runs along the 3-fold axis of the trimer, bound by the Arg111 side chain of each protomer in the trimer. The P-O2 bond lies along the 3-fold axis such that O2 interacts with the NH2 groups of three Arg111 side chains. In addition, Nε of each Arg111 side chain in the trimer interacts with one of the other phosphate oxygens.
The trimer can be described as a domed assembly with a rather flat surface as its base. The flat face is formed by the three sets of helices α1 and α3 that back against the central β-sheets of the three monomers; the hexamer then forms by the packing of these two flat faces against each other. Overall, each monomer buries 1830 Å² of surface area (18.6% of its total) in contact with one adjacent monomer in the trimer, and 1910 Å² of surface area (19.4% of its total) in contact with the other adjacent monomer. A total of 3740 Å² of surface area per monomer is therefore buried in intra-trimer subunit contacts, out of a total surface area of 9838 Å² per monomer, indicating an extremely tightly associated trimer. A further 1015 Å² of surface area per monomer is buried in inter-trimer contacts, which have a majority (60%) hydrophobic character. The center of the hexamer is an open, water-filled chamber, with a noticeably positively charged surface.
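The buried surface fractions quoted above follow directly from the stated areas. The short Python sketch below reproduces the arithmetic using only the figures given in the text; the variable and function names are illustrative.

```python
TOTAL_SURFACE_A2 = 9838.0   # total surface area per monomer, in square angstroms

# Area buried by one monomer against each neighbour within the trimer.
BURIED_NEIGHBOUR_1_A2 = 1830.0
BURIED_NEIGHBOUR_2_A2 = 1910.0
# Area buried per monomer at the trimer-trimer interface.
BURIED_INTER_TRIMER_A2 = 1015.0


def percent_buried(area, total=TOTAL_SURFACE_A2):
    """Buried area as a percentage of the monomer's total surface area."""
    return 100.0 * area / total


intra_trimer_total = BURIED_NEIGHBOUR_1_A2 + BURIED_NEIGHBOUR_2_A2

print(f"neighbour 1:  {percent_buried(BURIED_NEIGHBOUR_1_A2):.1f}%")
print(f"neighbour 2:  {percent_buried(BURIED_NEIGHBOUR_2_A2):.1f}%")
print(f"intra-trimer: {intra_trimer_total:.0f} A^2 ({percent_buried(intra_trimer_total):.1f}%)")
print(f"inter-trimer: {percent_buried(BURIED_INTER_TRIMER_A2):.1f}%")
```

Taken together, the intra-trimer and inter-trimer contacts bury close to half of each monomer's surface, which is the quantitative basis for describing the assembly as tightly associated.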
Fold Similarity-Searching the Protein Data Bank with the PAE2307 monomer using DALI (33) or SSM (34) gives two obvious matches: the conserved hypothetical proteins TT1634 from Thermus thermophilus (PDB code: 1VGG) and TA1353 from Thermoplasma acidophilum (PDB code: 1RLH), which are both orthologs of PAE2307. The structure of the T. thermophilus protein is extremely similar to PAE2307. It is also a hexamer, and the two structures superimpose with a root-mean-square (r.m.s.) difference in Cα atomic positions of only 0.78 Å. However, its active site histidine is not phosphorylated. The T. acidophilum homologue lacks the C-terminal α4 helix, and the crystal structure contains only a monomer in the asymmetric unit, but forms trimers of the same sort as PAE2307 via 3-fold crystallographic symmetry. The T. acidophilum protein and PAE2307 superimpose with an r.m.s. difference in Cα positions of 1.24 Å.
Besides these obvious cases, the structural similarity to known protein structures detected by fold comparison is limited to a partial match of the central β-sheet and the α1 and α3 helices. Many proteins display a topologically similar packing of two helices against a 3- or 4-stranded antiparallel sheet (a ferredoxin-like folding arrangement) in a variety of different contexts, but the packing of helices on either side of a five-stranded antiparallel sheet appears to be novel. The most similar such arrangement is found in the N-terminal domain of a B-type DNA polymerase from Thermococcus gorgonarius (1TGO (35)). Interestingly, this domain has an adjacent three-stranded antiparallel β-sheet, albeit with different topological connections, which is involved in inter-domain interactions within the polymerase (Fig. 3). The inference that can be drawn from this similarity is limited as the functional importance of this domain for the polymerase is unknown.
Active Site-The phosphorylated histidine residue, pHis85, is located in a cavity at the interface between two subunits. This is the presumed active site, giving 6 active sites per hexamer. The cavity is formed by residues 26-30 and 53-58 of one subunit (the β2-α1 and β3-β4 loops, respectively) and residues 115-119 (the β6-β7 loop) and 138-142 of the adjacent subunit. At the center of this cavity, His85 is phosphorylated on its Nε2 atom. Each of the oxygen atoms of the phosphoryl group on pHis85 is hydrogen-bonded, with good geometry. The O2P atom is at the N terminus of a 3₁₀ helix with a hydrogen bond to the peptide NH of Phe28, O1P hydrogen bonds to the peptide NH of Ala55, and O3P bonds to Oγ of Ser56 and Nδ2 of Asn118 of the adjacent subunit. The stabilizing environment of the phosphoryl group is illustrated in Fig. 4.
Many of the groups in and around the putative active site are fully or mostly conserved in all homologous sequences (Fig. 1). The phosphorylated His85, together with Phe28, Phe87, Asn118, Asp139, Gly140, and Tyr165 are all invariant, residue 95 is always aromatic, residue 97 is always either Ile or Val, and Asn20 and His27 are only replaced by similar hydrogen-bonding residues. Between them, these residues account for many of the conserved residues in the sequence alignment in Fig. 1, with the other residues presumably conserved for conformational reasons (Gly and Pro) or because they make multiple stabilizing hydrogen bonds or salt bridges.
Functional Hypothesis and in Silico Ligand Binding-The presence of a stably phosphorylated histidine strongly suggested that PAE2307 has a biochemical role that involves phosphate transfer. The putative substrate for this phosphorylation is not clear, but two lines of evidence imply a role in DNA or nucleotide metabolism. First, there is the weak structural similarity to a B-type DNA polymerase described above. Secondly, when analyzed using a phylogenetic profiling technique based on the comparison of orthologs from 81 microbial genomes, PAE2307 is found to cluster with DNA-binding proteins from Methanopyrus kandleri, Sulfolobus solfataricus, and Thermoplasma acidophilum, a gyrase from S. solfataricus, and adenylate and predicted nucleotide kinases from M. kandleri.
To test the feasibility of a role for PAE2307 in nucleotide metabolism, in silico docking studies were carried out with a number of nucleosides and nucleotides, using the program GOLD (31). The phosphate group of pHis85 was taken as the center of the active site, and a radius of 20 Å was used to define the possible binding surface, which encompassed the whole inter-subunit cleft. Phosphorylated nucleotides bound in a number of configurations, with the binding being primarily dictated by the interaction of their phosphate groups with the side chains of Arg62 and Lys58, and the base ring making few contacts with the protein. However, nucleosides reproducibly bound in the hydrophobic pocket adjacent to the active site, with GOLD fitness scores (~80) comparable to those obtained for tight protein-ligand complexes (31). The predicted binding mode of adenosine (Fig. 5a) had the adenine base stacked between the conserved side chains of Phe28 and Trp95 and hydrogen-bonded to the phenolic oxygen of Tyr165. The involvement of Tyr165 in this site emphasizes the importance of the many interactions that hold the C-terminal helix α4 (residues 148-162) in place. The predicted positioning of the ring places the ribose moiety adjacent to the phosphate of pHis85, with the ribose O5′-C5′ bond gauche to the C4′-O4′ bond and trans to the C4′-C3′ bond, correctly positioning the O5′ atom to accept the phosphoryl group from the pHis85. This binding mode suggested that PAE2307 may function as a nucleoside kinase. The nature of the phosphate donor could not be predicted, although it would presumably bind on the opposite side of pHis85, in the other half of the binding cleft, in such a way as to be able to donate a phosphate group.
The active site region of molecule A from the trimer is shown as a stereo diagram. The electron density around residue pHis85 is the SIGMAA-weighted, 2m|Fo| − D|Fc| density from the final refined structure, contoured at 1.6σ, and calculated using CNS (22). Only the amino acid residues making interactions with the side chain of pHis85 are shown. Residues from molecule A are labeled in green, and residues from molecule B are labeled in blue.
In Vitro Ligand Binding-To test the hypothesis that PAE2307 may function as a nucleoside kinase, in vitro ligand binding experiments were carried out to establish whether nucleosides were able to bind to the protein. The presence of Trp95 in the predicted nucleoside binding site enabled intrinsic protein fluorescence to be used to monitor ligand binding.
Addition of various nucleosides to the protein in solution resulted in the reduction of observed fluorescence and allowed the measurement of binding curves, as shown in Fig. 6. Adenosine showed relatively tight binding in comparison to the other nucleosides tested (Fig. 6a), with a calculated KD of 15 µM compared with 515, 223, and 302 µM for guanosine, thymidine, and deoxyuridine, respectively. Therefore, PAE2307 appears to be specific in binding adenosine over other nucleosides. To test whether the phosphorylation state of the nucleoside was important in binding, the three phosphorylated forms of adenosine, AMP, ADP, and ATP, were tested for their ability to bind (Fig. 6b). AMP bound only slightly less tightly than adenosine alone, with a calculated KD of 26 µM. However, the di- and triphosphates bound much more weakly, with KD values of 740 and 741 µM, respectively. Interestingly, adenine base alone bound to the protein more tightly than either ADP or ATP, with an estimated KD of 198 µM. This preference for binding non- or mono-phosphorylated forms of adenosine over the higher phosphorylated forms supports the hypothesis that PAE2307 acts as a kinase, with adenosine and/or AMP as preferred substrate.

FIGURE 5. Adenosine binding to PAE2307. a, the binding cleft is shown with a molecule of adenosine docked into place, with the key interacting residues labeled. The residues making contacts with the ligand are all absolutely conserved, apart from Trp 95, which is always aromatic, and Ser 56, which is conserved in most members of the protein family. The accessible molecular surface is shown as a semi-transparent skin, colored green for chain A and blue for chain B. b, electron density, in red, for the bound adenosine, from a SIGMAA-weighted, m|Fo| − D|Fc| electron density map calculated using the refined structure, prior to modeling any ligand and contoured at 3σ. The final refined position of the adenosine molecule is superimposed. c, stereoview showing the predicted and observed adenosine binding modes. The conformation of the bound adenosine and interacting side chains as predicted by in silico docking are shown in white, with predicted hydrogen bonds in magenta; the observed conformation is shown in yellow, with hydrogen bonds in green.
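As a rough illustration of what these dissociation constants mean for site occupancy, the following sketch assumes a simple one-site binding model in which free ligand is approximately equal to total ligand. The model, the function name `fraction_bound`, the 100 µM example concentration, and the reading of the reported values as micromolar are assumptions made for illustration; they are not a description of how the binding curves in Fig. 6 were fitted.

```python
# KD values (uM) as quoted in the text; micromolar units are assumed.
KD_UM = {
    "adenosine": 15.0,
    "AMP": 26.0,
    "adenine": 198.0,
    "thymidine": 223.0,
    "deoxyuridine": 302.0,
    "guanosine": 515.0,
    "ADP": 740.0,
    "ATP": 741.0,
}


def fraction_bound(ligand_um: float, kd_um: float) -> float:
    """One-site binding isotherm: theta = [L] / (KD + [L]).

    Assumes a single independent site and free ligand ~ total ligand.
    """
    if ligand_um < 0 or kd_um <= 0:
        raise ValueError("ligand must be >= 0 and KD must be > 0")
    return ligand_um / (kd_um + ligand_um)


if __name__ == "__main__":
    # Hypothetical ligand concentration chosen only for illustration.
    ligand_um = 100.0
    for name, kd in sorted(KD_UM.items(), key=lambda item: item[1]):
        theta = fraction_bound(ligand_um, kd)
        print(f"{name:13s} KD = {kd:6.1f} uM  occupancy = {theta:.2f}")
```

Under this model a ligand at a concentration equal to its KD occupies half of the sites, so at any given concentration adenosine and AMP would occupy a much larger fraction of sites than ADP or ATP.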
Structure of Adenosine Complex-To confirm that adenosine binds as predicted by in silico modeling, the crystal structure of an adenosine complex of PAE2307 was determined and refined at a resolution of 2.25 Å (Table 1). Electron density maps showed unambiguous positive difference peaks at the predicted adenosine binding sites, as shown in Fig. 5b. Adenosine modeled into this difference density refined well, with atomic B-factors that were very similar to surrounding residues (25-30 Å²), and no residual difference electron density, thus validating the predictions made from in silico modeling. A comparison of the predicted and observed conformations of adenosine is shown in Fig. 5c. The binding mode is largely as predicted, but there are several small changes in the active site, the most notable of which is that the side chain of Trp 95 has rotated about Cβ-Cγ to enable better stacking with the ring of the adenine base. The conformation of the observed ligand is slightly different to that predicted by modeling, with a hydrogen bond made from the N1 rather than the N6 nitrogen of the adenine base to the phenolic oxygen of Tyr 165, and a rotation around the N9-C1′ bond, which enables hydrogen bonds to be made from the O2′- and O3′-hydroxyl groups on the ribose ring to the side chain of Asn 20 and the backbone nitrogen of Ala 117, respectively. One additional change is that, because this form of the protein is not phosphorylated, a hydrogen bond is able to form between the Nε2 nitrogen of His 85 and the O5′ of the adenosine ribose. Overall, the in silico modeling had accurately placed the adenosine ligand, with a root-mean-square difference in atomic positions of only 1.85 Å between the predicted and observed conformations.
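For readers unfamiliar with the measure, a root-mean-square difference of this kind can be computed as in the minimal sketch below. It assumes that the predicted and observed ligand atoms are listed in the same order and are already expressed in a common coordinate frame (for example, after superposing the protein); whether and how any superposition was done is not stated in the text. The function name `rmsd` and the toy coordinates are hypothetical and are not taken from the PAE2307 structures.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(predicted: Sequence[Coord], observed: Sequence[Coord]) -> float:
    """Root-mean-square difference between matched atomic positions (in Å).

    No superposition is performed; both sets must share one frame.
    """
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(predicted, observed):
        squared += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(squared / len(predicted))


if __name__ == "__main__":
    # Toy coordinates: every atom is displaced by 2 Å along z.
    docked = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    crystal = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
    print(f"RMSD = {rmsd(docked, crystal):.2f} Å")
```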

DISCUSSION
The structure of PAE2307 demonstrates the powerful role that structural biology can play in providing functional hypotheses for proteins of previously unknown function. The unexpected discovery of a phosphorylated histidine in PAE2307, in a solvent-exposed cavity that contains a number of fully or mostly conserved residues, leaves little doubt that this is the active site and that the function of this protein and its homologs is in phosphoryl transfer. Given the rarity of stably phosphorylated histidine residues, it also allows us to examine the factors that stabilize such modifications and how these may be related to function.
There are currently three other examples of crystal structures of phosphohistidine-containing proteins, nucleoside diphosphate kinase (4), succinyl-CoA synthetase (3), and cofactor-dependent phosphoglycerate mutase (10). A common factor in each of these structures is that the non-phosphorylated imidazole nitrogen of the active histidine side chain is hydrogen-bonded to an oxygen atom of the protein. In nucleoside diphosphate kinase, Nδ1 of the histidine is phosphorylated, while Nε2 interacts with a carboxylate oxygen from a glutamate side chain. In succinyl-CoA synthetase, Nε2 is phosphorylated while Nδ1 also interacts with a carboxylate oxygen of a glutamate side chain. In cofactor-dependent phosphoglycerate mutase, Nε2 is phosphorylated while the Nδ1 hydrogen bonds to a peptide carbonyl oxygen. Histidine phosphorylation involves donation of a pair of electrons from the imidazole nitrogen to form the N-P bond and thus generates a positive charge on the imidazole ring. The ion pair interactions in nucleoside diphosphate kinase and succinyl-CoA synthetase, in which the phosphohistidine ring nitrogen interacts with negatively charged carboxylate oxygen, thus likely stabilize the phosphohistidine group. In the cofactor-dependent phosphoglycerate mutase structure, the interaction between the phosphohistidine and the carbonyl oxygen atom is only a hydrogen bond and would have minimal ionic character.
We conclude that histidine phosphorylation is stabilized by appropriate ion pair hydrogen bonds involving the non-phosphorylated imidazole nitrogen. This is consistent with the full occupancy for the phosphoryl groups in nucleoside diphosphate kinase and succinyl-CoA synthetase, compared with the partial occupancy (0.28) in cofactor-dependent phosphoglycerate mutase where there is a hydrogen bond but no ionic interaction. In agreement with this, the salt bridge in PAE2307 between pHis 85 Nδ1 and the carboxylate oxygen of Asp 139 stabilizes the phosphoryl group on Nε2 and is reflected in an occupancy of ~0.7 in each of the three crystallographically independent protomers.

FIGURE 6. In vitro assays of nucleoside and nucleotide binding to PAE2307. The change in intrinsic protein fluorescence at 347 nm was monitored as a function of ligand concentration. Fluorescence values were normalized to the fluorescence of protein alone and of buffer alone and were corrected for inner filter effects. a, binding of various purine and pyrimidine nucleosides to PAE2307. b, binding of adenine, adenosine, and adenosine nucleotides to PAE2307.
The in vivo substrate for the phosphorylation reaction predicted to be catalyzed by PAE2307 is not known, but the phylogenetic linkage of orthologous proteins to DNA-binding proteins and nucleotide kinases, and a weak structural similarity to a domain (of unknown function) from a thermostable DNA polymerase, gave the first suggestion that a nucleotide or nucleoside may be the substrate. The favorable adenosine-binding mode predicted by in silico ligand binding analyses led to our functional hypothesis that the in vivo biochemical function of this highly conserved protein family is as an adenosine-specific nucleoside kinase. This hypothesis is strongly supported by our experimental binding studies. The in vitro binding studies demonstrated that adenosine, but not other nucleosides, bound tightly to the protein (KD = 15 µM) and that di- or triphosphorylated adenosine nucleotides bind much less well than adenosine or AMP. The crystallographic analysis of adenosine binding confirmed its specific and stereochemically favorable binding mode, with the base snugly accommodated between invariant or highly conserved residues and the O5′ oxygen of the adenosine ribose in a position to receive a phosphate transferred from the phosphorylated His 85.
A priori, it is possible to hypothesize that AMP may be the phosphate donor rather than the substrate of the proposed kinase activity. If this were the case, AMP would lose a phosphate to become adenosine in the course of the reaction. However, as adenosine binds more tightly than AMP (KD of 15 µM compared with 26 µM for AMP), this scenario would lead to a dead-end adenosine-protein complex. If, on the other hand, AMP or adenosine is the acceptor substrate and phosphorylation leads to formation of ADP or ATP, the di- or tri-phosphorylated product would dissociate from the protein, as both have a much lower binding affinity (30- to 50-fold weaker). We therefore propose that AMP or adenosine is the acceptor in a phosphoryl transfer from another (unknown) donor.
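The quoted fold differences follow directly from the ratios of the measured dissociation constants, as the short sketch below shows. The helper name `fold_weaker` is hypothetical, and the values are read as micromolar, which is an assumption about the units of the reported constants.

```python
# KD values (uM) quoted in the text; micromolar units are assumed.
KD_UM = {"adenosine": 15.0, "AMP": 26.0, "ADP": 740.0, "ATP": 741.0}


def fold_weaker(weak_kd: float, tight_kd: float) -> float:
    """Ratio of two dissociation constants.

    A value above 1 means the first ligand binds more weakly than the second.
    """
    if weak_kd <= 0 or tight_kd <= 0:
        raise ValueError("dissociation constants must be positive")
    return weak_kd / tight_kd


if __name__ == "__main__":
    # Compare each candidate product with each candidate acceptor.
    for product in ("ADP", "ATP"):
        for acceptor in ("adenosine", "AMP"):
            ratio = fold_weaker(KD_UM[product], KD_UM[acceptor])
            print(f"{product} vs {acceptor}: {ratio:.1f}-fold weaker")
```

The ratios relative to adenosine are about 49, and those relative to AMP are about 28, which is consistent with the rounded 30- to 50-fold range given above.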
Adenosine kinases have been characterized from a number of species, with structures known for the human (36) and Toxoplasma gondii (37) enzymes. These belong to a wider family of carbohydrate kinases called the ribokinase family after the archetypal member (38). They share a common fold comprising a large α/β domain and a smaller, more variable, domain, and have an ordered, associative mechanism, in which the phosphate is transferred directly from the donor ATP to the substrate, without formation of a phosphoenzyme intermediate (37,39). A basic group, usually Asp, activates the hydroxyl of the acceptor molecule.
PAE2307 has a different fold and appears to be representative of a novel family of nucleoside kinases. Although it is possible that the active site histidine, His 85, could act as a base to remove the C5′ hydroxyl proton, the observed stable phosphorylation of His 85 and its favorable orientation for phosphoryl transfer to the hydroxyl group strongly suggest that its mechanism involves a phosphohistidine intermediate. Unlike the enzymes of the ribokinase family, however, PAE2307 does not undergo any conformational change outside of the active site as a result of adenosine binding; the native and adenosine-bound structures are essentially identical.
Although the phosphate donor in the reaction catalyzed by PAE2307 is not yet identified, its likely binding site can be proposed from an examination of the structure. The cavity in the protein surface in which the active site histidine is found is 'L'-shaped, with His 85 sitting at the corner of the 'L.' The adenosine molecule binds into the short arm of the 'L,' and it is possible that the phosphate donor will bind into the cavity along the long arm of the 'L,' in such a way as to be able to transfer phosphate to adenosine via His 85. Several strongly or absolutely conserved residues line this region, including Glu 54, Ser 56, and Gly 57 on one side and Ile 29 and Arg 155 on the other, with Glu 54 and Arg 62 forming the cavity floor. However, attempts to model the binding of potential phosphate donor molecules to this region in silico were not successful.
PAE2307 exhibits a strong preference for adenine-containing ligands (KD of 15 µM for adenosine compared with 515, 223, and 302 µM for guanosine, thymidine, and deoxyuridine, respectively). The reason for this specificity is apparent upon inspection of the adenosine binding site, as the C2 carbon of the adenine ring packs closely against the side chain of Ile 97. The C2 carbon of the base ring is modified in guanine by the addition of an amino group, and in thymine and cytosine by a carbonyl oxygen, and either of these modifications would cause a clear steric clash, disfavoring productive binding. Ile 97 is itself part of a strongly conserved sequence motif (96 P(I/L)N(I/V)L 100) in helix α3, which forms the back of the binding site, indicating that the observed binding specificity is likely to be shared by the whole protein family.
In summary, the structure of PAE2307 identifies it as a likely nucleoside kinase, with a strong preference for adenosine over other nucleosides. The nature of the donor molecule for the phosphorylation reaction is unknown, but the reaction appears to proceed via a phosphohistidine intermediate, in contrast to known nucleoside kinases. The biological purpose of this phosphorylation reaction is unclear, but the strong conservation of sequence within the protein family and its wide phylogenetic distribution among the archaea and bacteria imply that it is likely to have an important role.