High Resolution Structure of the Phosphohistidine-activated Form of Escherichia coli Cofactor-dependent Phosphoglycerate Mutase*

The active conformation of the dimeric cofactor-dependent phosphoglycerate mutase (dPGM) from Escherichia coli has been elucidated by crystallographic methods to a resolution of 1.25 Å (R-factor 0.121; R-free 0.168). The active site residue His 10, central in the catalytic mechanism of dPGM, is present as a phosphohistidine with occupancy of 0.28. The structural changes on histidine phosphorylation highlight various features that are significant in the catalytic mechanism. The C-terminal 10-residue tail, which is not observed in previous dPGM structures, is well ordered and interacts with residues implicated in substrate binding; the displacement of a loop adjacent to the active histidine brings previously overlooked residues into positions where they may directly influence catalysis. E. coli dPGM, like the mammalian dPGMs, is a dimer, whereas previous structural work has concentrated on monomeric and tetrameric yeast forms. We can now analyze the sequence differences that cause this variation of quaternary structure.

Phosphoglycerate mutases (PGMs) 1 are enzymes involved in glycolysis and gluconeogenesis. They can be subdivided into two types: cofactor-dependent PGM (dPGM) and cofactor-independent PGM (iPGM). Whereas vertebrates, yeasts, and many bacteria have only dPGM, and higher plants, nematodes, archaea, and many other bacteria have only iPGM, a small number of bacteria including Escherichia coli have both (1).
The crystal structure of Saccharomyces cerevisiae dPGM 2 was first published in 1974 (Protein Data Bank code 3PGM (2, 3)), and structures of different crystal forms and inhibitor complexes at increasing resolution have followed (4PGM, 5PGM, 1BQ3, 1BQ4, 1QHF (4-7)). Schizosaccharomyces pombe dPGM has been studied by NMR, and a backbone assignment has been published (8). In most organisms for which a dPGM has been characterized, including E. coli and mammals, the active enzyme exists as a dimer. S. cerevisiae dPGM, however, is tetrameric, and S. pombe dPGM is monomeric. Most recently, the crystal structure of the iPGM from Bacillus stearothermophilus has been solved (9, 10), highlighting the absence of any similarity to dPGM in all aspects except its main mutase activity.
dPGM is the archetype of the "phosphoglycerate mutase-like" protein fold superfamily (SCOP (11)), which also contains the phosphatase domain of the 6-phosphofructo-2-kinase/fructose-2,6-bisphosphatase family as well as prostatic acid phosphatase and phytase. The common fold of these proteins is commensurate with their use of phosphohistidine as a catalytic intermediate.
Other examples of N-phosphorylation at histidine occur in bacterial signaling proteins (12) and in enzymes such as fructose permease (13), nucleoside diphosphate kinase (14), and succinyl-CoA synthetase (15). Structures of intact phosphohistidine-containing proteins are particularly rare, those of HPr by NMR (16) and succinyl-CoA synthetase (15) and nucleoside diphosphate kinase (14) by x-ray crystallography being the only examples to date. None of these structures represent the phosphoglycerate mutase-like fold family.
We now report the structure of E. coli dPGM in its phosphorylated, active conformation. The structure of a dimeric dPGM provides a basis for examining the residues involved in interactions in the varying oligomerization states observed in dPGMs. The establishment of this structure as representative of the active conformation of the enzyme and comparison with the available dephosphorylated structures provide new information regarding the roles of specific residues in the complex catalytic mechanism of this class of enzymes.

EXPERIMENTAL PROCEDURES
Cloning, Expression, and Purification-The E. coli (K12) pgm1 gene was amplified from genomic DNA by a polymerase chain reaction using the 5′ and 3′ end-specific primers 5′ CCC-GCG-CAT-ATG-GCT-GTA-ACT-AAG 3′ and 5′ CGC-GGA-TCC-TTA-CTT-CGC-TTT-ACC-CTG 3′. These oligonucleotides (Amersham Pharmacia Biotech) introduced NdeI and BamHI restriction sites (CAT-ATG and GGA-TCC, respectively). Taq polymerase, DNA ligase, and the relevant restriction enzymes were obtained from Promega. The polymerase chain reaction product (~0.75 kilobases) was gel-purified (Qiaex extraction kit, Qiagen) and cloned into pUC18 (SureClone, Amersham Pharmacia Biotech), and positive clones were identified by restriction digest. The DNA fragment was ligated into the BamHI/NdeI-cleaved plasmid pET3a (Novagen) to give the pET3a-pgm construct that was amplified in E. coli JM109 (Novagen), and the integrity of the gene was confirmed by sequencing. E. coli strain BL21(DE3)pLysS (Novagen) was heat shock-transformed with pET3a-pgm and selected on Luria-Bertani agar plates containing ampicillin and chloramphenicol. Single colonies were cultured, and the expression of protein with isopropyl-β-D-thiogalactopyranoside was tested under a range of conditions. E. coli dPGM was then overexpressed and purified as described previously (1).

* This work was funded by the Wellcome Trust. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

1 The abbreviations used are: PGM, phosphoglycerate mutase; dPGM, cofactor-dependent PGM; iPGM, cofactor-independent PGM.

2 Sequence numbering used throughout is that based on the gene sequence of the E. coli protein. The S. cerevisiae sequence has extensions at the N and C termini and one insertion and one deletion compared with the E. coli sequence, occurring at positions 226 and 229, respectively. Hence for the majority of the sequence residue n in the E. coli sequence corresponds to residue n − 2 in the S. cerevisiae sequence.
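The positions of the two restriction sites within the primers can be checked with a few lines of Python. This is only an illustrative sketch: the recognition sequences used (CATATG for NdeI, GGATCC for BamHI) are the standard ones for these enzymes, and the helper name `find_site` is hypothetical, not part of any published protocol.

```python
# Standard recognition sequences for the two enzymes named in the text.
SITES = {"NdeI": "CATATG", "BamHI": "GGATCC"}

# Primers as printed in the text, with the triplet hyphens removed.
FORWARD = "CCC-GCG-CAT-ATG-GCT-GTA-ACT-AAG".replace("-", "")
REVERSE = "CGC-GGA-TCC-TTA-CTT-CGC-TTT-ACC-CTG".replace("-", "")


def find_site(primer: str, enzyme: str) -> int:
    """Return the 0-based start of the enzyme's site in the primer, or -1 if absent."""
    return primer.find(SITES[enzyme])


nde_pos = find_site(FORWARD, "NdeI")
bam_pos = find_site(REVERSE, "BamHI")

print(f"NdeI site starts at index {nde_pos} of the forward primer")
print(f"BamHI site starts at index {bam_pos} of the reverse primer")
```

In the forward primer the NdeI site overlaps the ATG that begins the coding triplets, which is the usual reason for choosing this enzyme when cloning into a pET vector.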
Data Processing and Refinement-Data (Table I) were collected at SRS Daresbury station 9.6 (λ = 0.87 Å) on an ADSC Quantum-4 CCD detector, processed with DENZO, and scaled with SCALEPACK (17). Molecular replacement was performed with AMORE (18) using data to 2.0 Å and a search model derived from S. cerevisiae dPGM (the highest resolution structure available at 1.7 Å, Protein Data Bank code 1QHF (7)), using one monomer and truncating all side chains to Cβ. The suitability of the best solution was confirmed by the presence of the correct dimer interface provided by the crystallographic 2-fold. Subsequent phase improvement and automated building were achieved using wARP (19), resulting in a model with an R-factor of 0.229.
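For orientation, the scattering angle needed to record 1.25 Å data at the wavelength quoted above follows from Bragg's law, λ = 2d sin θ. The short sketch below is an illustration only; the function name is hypothetical, and the 1.25 Å limit is taken from the resolution stated for the final structure.

```python
import math


def bragg_two_theta_deg(wavelength: float, d_spacing: float) -> float:
    """Scattering angle 2-theta (degrees) from Bragg's law, lambda = 2 d sin(theta)."""
    theta = math.asin(wavelength / (2.0 * d_spacing))
    return 2.0 * math.degrees(theta)


# Wavelength and resolution limit as quoted in the text (both in angstroms).
two_theta = bragg_two_theta_deg(wavelength=0.87, d_spacing=1.25)
print(f"2-theta at 1.25 A resolution: {two_theta:.1f} degrees")
```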
Refinement with SHELXL (20), addition of water molecules, manual intervention using O (21), use of restrained anisotropic thermal parameter refinement, and the inclusion of 17 dual side chain conformers resulted in a structure containing 2069 nonhydrogen protein atoms (residues 1-247), two sulfates, and a chloride; this structure had an R-factor of 0.121 and R-free of 0.168. Multiple conformers were refined with total occupancy restrained to 1.0. Occupancies were refined for the two sulfates in the active site, whereas the chloride, located on a 2-fold axis at the dimer interface, is modelled with 0.5 occupancy.
At an early stage in the refinement it was noted that three waters lie adjacent to the Nε2 of His 10 in a tetrahedral arrangement. In the center of these atoms was a prominent peak of residual electron density corresponding to ~0.5 that of an omitted ordered solvent molecule. His 10 is the nucleophilic histidine that becomes phosphorylated during the catalytic cycle. Although there are two histidines that have roles in catalysis, the term "active site histidine" will be used exclusively for His 10. The geometry of the imidazole, three water molecules, and peak (N-P distance, 1.74 Å; O-P distances, 1.50 ± 0.02 Å) was in agreement with the structure of phosphorylimidazole found in the Cambridge Structural Database ((22, 23); CADPIM (24)). The outstanding quality of the diffraction data allowed us to successfully refine His 10 as a partially occupied phosphohistidine with its occupancy coupled to a histidine and three solvent water molecules (Fig. 1). The resulting model has 0.28 occupancy phosphohistidine and 0.72 occupancy histidine plus three water molecules. We had no reason to expect histidine phosphorylation prior to crystallization, because the half-life of the phosphohistidine is expected to be of the order of 35 min, as observed with the S. cerevisiae enzyme (25).
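As a rough plausibility check, the refined occupancy can be compared with the height of the residual peak described above. The sketch below assumes, purely for illustration, that peak height scales with electron count multiplied by occupancy, that an ordered solvent molecule can be counted as a single 8-electron oxygen, and that differences in thermal parameters can be ignored. None of these simplifications is part of the refinement itself.

```python
# Electron counts for neutral atoms (atomic numbers).
P_ELECTRONS = 15  # phosphorus
O_ELECTRONS = 8   # oxygen of an ordered water; hydrogens ignored (assumption)

# Coupled occupancies from the refined model.
occ_phosphohistidine = 0.28
occ_histidine_plus_waters = 0.72

# The two alternatives share one site, so their occupancies must sum to 1.
total_occupancy = occ_phosphohistidine + occ_histidine_plus_waters

# Expected height of the phosphorus peak relative to a fully occupied water oxygen.
relative_peak = occ_phosphohistidine * P_ELECTRONS / O_ELECTRONS

print(f"total occupancy: {total_occupancy:.2f}")
print(f"P peak relative to an ordered water: {relative_peak:.3f}")
```

Under these assumptions the expected ratio is close to the value of about 0.5 noted at the early stage of refinement.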
Quality of the Model-The excellent quality, high resolution data have led to a reliable, precise model with root mean square deviations from Engh and Huber bond lengths and angle distances of 0.014 and 0.030 Å (26). No residues have disallowed φ/ψ angles, and only Ala 182 has generously allowed values, as defined by PROCHECK (27).

RESULTS AND DISCUSSION
Overall Fold-The α/β fold of the monomer of E. coli dPGM is the same as S. cerevisiae dPGM, as expected from their 54% sequence identity (Fig. 2a). In summary, the protein core consists of a six-stranded β-sheet, C-B-D-A-E-F, with all but E being parallel, flanked by six α-helices (Fig. 2b). The active site is located at the C-terminal edge of the β-sheet and is constructed from stretches of sequence dispersed throughout the amino acid sequence.
Quaternary Structure-The active dimer is formed by the antiparallel alignment of the C strands of two monomers. Comparison of the dimeric structure with the S. cerevisiae tetramer highlights the structural basis for the different oligomerization states. The S. cerevisiae and E. coli monomers superpose in LSQMAN (28), with a root mean square deviation of 1.2 Å over 227 Cα atoms. Despite having a largely similar backbone structure, the region of lowest sequence similarity (residues 124-145) provides a number of important differences at the S. cerevisiae tetramerization interface. Significant substitutions include Asp 141 (S. cerevisiae)-Ser 143 (E. coli), which abolishes a hydrogen bond to Trp 162* (* signifies a residue of another subunit); Pro 142-Glu 144, which alters the local backbone conformation significantly for a stretch of four residues; Val 144-Glu 146, which produces a steric clash with Gln 163*; Asp 164-Glu 166, which breaks a salt bridge to Arg 83*; and Lys 168-Pro 170, which causes a steric clash with the main chain of Tyr 139*. The S. cerevisiae K168P mutant has indeed been shown to reduce the Km of tetramerization (29); but whereas the introduction of this proline into a helix does disrupt the hydrogen bond network, the change in backbone conformation is insignificant, and it is the position of the proline side chain that disrupts the interface interaction (Fig. 3).
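The substitution pairs listed above follow the numbering convention stated in the footnote on sequence numbering, in which E. coli residue n corresponds to S. cerevisiae residue n − 2 for most of the sequence. A minimal sketch of that bookkeeping is given below. The function name is hypothetical, and refusing to map residues from position 226 onward is a conservative assumption, because the footnote places an insertion and a deletion at positions 226 and 229 without giving the resulting offsets.

```python
OFFSET = 2          # E. coli residue n ~ S. cerevisiae residue n - 2 (numbering footnote)
INDEL_START = 226   # first position affected by the insertion/deletion (numbering footnote)


def ecoli_to_yeast(n: int) -> int:
    """Map an E. coli residue number to the S. cerevisiae number (valid before the indels)."""
    if n >= INDEL_START:
        raise ValueError("offset is not specified at or beyond position 226")
    return n - OFFSET


# (yeast residue, yeast number, E. coli residue, E. coli number), as listed in the text.
SUBSTITUTIONS = [
    ("Asp", 141, "Ser", 143),
    ("Pro", 142, "Glu", 144),
    ("Val", 144, "Glu", 146),
    ("Asp", 164, "Glu", 166),
    ("Lys", 168, "Pro", 170),
]

consistent = all(ecoli_to_yeast(ec) == sc for _, sc, _, ec in SUBSTITUTIONS)
print("all listed pairs follow the n - 2 rule:", consistent)
```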
An explanation for the inability of S. pombe dPGM to dimerize is less clear, particularly given the lack of a three-dimensional structure. Sequence comparison suggests a number of interactions present in the E. coli and S. cerevisiae proteins that are absent in S. pombe. These dimer-forming interactions come from two stretches of sequence including residues 58-77 and 136-139. The first stretch forms helix α3 and the adjacent strand βC, whereas the second stretch is part of the region discussed above that promotes tetramerization in S. cerevisiae. The positions where the S. pombe sequence differs from both E. coli and S. cerevisiae include Val 58 (E. coli)-Ala 63 (S. pombe), which causes the loss of a hydrophobic contact, and Ala 76-Pro 81, which is likely to cause some distortion of the backbone. A significant hydrophobic packing interaction of Trp 77 with Arg 138* and Tyr 139* is also lost on substitution of Trp to Asn 82 and deletion of the loop from 124-147. These comparisons reveal that there are no major structural rearrangements between dPGMs. Rather, the differences are restricted to amino acid changes at the subunit interfaces. Given that all dPGMs, whether monomeric, dimeric, or tetrameric, retain essentially the same activity, a question remains as to the biological function of these distinct quaternary assemblies.

a Mechanically twinned crystals and ice rings caused a reduction in completeness in some resolution ranges.

b B(eq) is the isotropic equivalent to anisotropic thermal parameters.
Structural Consequences of Histidine Phosphorylation-Although no measurements have been made of the phosphohistidine half-life of E. coli dPGM, there was no reason to expect it to be any longer than the 35 min observed for S. cerevisiae dPGM; yet the presence of phosphohistidine in the crystals indicates that a certain level of phosphorylation must persist for considerably longer. Whereas the native structure has only 0.28 occupancy phosphohistidine, the remainder of dephosphorylated protein also adopts the active conformation, with water molecules occupying the vacant phosphate oxygen positions. We cannot rule out the possible contribution of crystal-packing forces in aiding the dephosphorylated protein to adopt the active conformation.
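To make the argument about persistence concrete, the sketch below evaluates simple first-order decay with the 35-min half-life reported for the S. cerevisiae enzyme. It assumes, for illustration only, that decay is first order, that no rephosphorylation occurs, and that the enzyme starts fully phosphorylated. These are not measured properties of the E. coli protein, and the function names are hypothetical.

```python
import math

HALF_LIFE_MIN = 35.0  # value reported for the S. cerevisiae enzyme


def fraction_remaining(t_min: float, half_life_min: float = HALF_LIFE_MIN) -> float:
    """Fraction of phosphohistidine left after t_min minutes of first-order decay."""
    return 0.5 ** (t_min / half_life_min)


def time_to_fraction(fraction: float, half_life_min: float = HALF_LIFE_MIN) -> float:
    """Minutes needed to decay from full occupancy to the given fraction."""
    return half_life_min * math.log2(1.0 / fraction)


t_to_observed = time_to_fraction(0.28)
after_one_day = fraction_remaining(24 * 60)

print(f"time to reach 0.28 occupancy: {t_to_observed:.1f} min")
print(f"fraction left after one day: {after_one_day:.1e}")
```

Under these assumptions the observed occupancy would be reached in roughly an hour and would be negligible after a day, which illustrates why the phosphorylation seen in the crystals implies a considerably longer persistence.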
The native structure presented here is representative of the enzyme in its competent, phosphorylated form and is used as such in the following discussion; the S. cerevisiae structures are typical of an inactive or inhibited dephosphorylated form. Comparison of these two forms indicates significant structural differences. The C-terminal tail of the protein, with the exception of the final two residues, is ordered in the phosphorylated form. This tail, the subject of much speculation regarding its possible role in the catalytic cycle (30), is not modelled in the S. cerevisiae structures because of disorder. The conformation of His 10 when phosphorylated is distinct from that in the yeast structures, and the adjacent residues in the loop from Arg 9 to Thr 22 have moved up to 1.7 Å (Cα-Cα distance) relative to the equivalent residues in the S. cerevisiae structure.
Active Site-dPGM has a cup-shaped active site that is 16 Å deep and 10 by 8 Å wide, with a volume of ~1200 Å³ containing up to 36 ordered solvents and 2 sulfates. This extensive cavity is lined by atoms from 43 residues: 9-23, 36, 61, 88-91, 99, 111-116, 183-188, 203-209, and 239-247. The roles of these residues can, to a large extent, be divided into three categories: the catalytic machinery, the residues responsible for substrate binding, and the site of access where substrates enter and products leave (Fig. 4a).
The residues at the base of the active site that surround His 10 are strictly conserved among the E. coli, S. cerevisiae, S. pombe, and human dPGM family (black circles in Fig. 2a). Interactions between the phosphohistidine and the rest of the protein are depicted in Fig. 4b, with interatomic distances given in Table II. The His 10 side chain is held in position in both the phosphorylated and dephosphorylated forms by a hydrogen bond between Nδ1 and the amide oxygen of the adjacent Gly 11. The length of this hydrogen bond decreases on phosphorylation, accompanied by the movement of residues 9-22. One of these residues, Asn 16, alters its side chain conformation to allow Nδ2 to form a hydrogen bond with a phosphate oxygen (3.18 Å) and the Oδ1 to participate in a CH···O hydrogen bond with Cε1 of His 10 (3.14 Å). A second phosphate oxygen accepts a hydrogen bond from His 183 Nδ1 (2.72 Å), which is itself hydrogen-bonded via its Nε2 atom to Ser 57 (data not shown). His 183 may also serve as a proton source during catalysis, because it is spatially adjacent to the active site acid Glu 88. The other basic residues that bind the phosphate group are Arg 61, which forms a hydrogen bond to the same oxygen as His 183 via its Nε atom (2.96 Å), and Arg 9, which forms a hydrogen bond to the third phosphate oxygen via Nε (2.74 Å). This oxygen is also hydrogen-bonded to the amide nitrogen of Gly 184, which is the N-terminal residue of a 12-residue α-helix (α8). This helix (gray in Fig. 2b) is conspicuous on a ribbon diagram because it lies more perpendicular to the β-sheet than do the other flanking helices and is oriented such that the N-terminal helix dipole contributes to stabilization of the phosphohistidine, rather than to the stabilization of another substrate phosphate group, as previously proposed (31). In addition to the polar interactions with the phosphoryl group, the aliphatic segments of the side chains of Arg 9 and Arg 61 also provide a series of hydrophobic contacts that contribute to the orientation of the imidazole ring of His 10. Whereas the interactions with Arg 61 are conserved on dephosphorylation, those with Arg 9 are completely abolished.

FIG. 1. Electron density (blue) at the active site histidine. a, 6σ Fo − Fc αcalc electron density omitting the phosphorus. b, 2σ 2Fo − Fc αcalc electron density calculated omitting the atoms shown. The phosphohistidine, at an occupancy of 0.28, is shown as semitransparent ball and stick. The solid ball and stick shows the remaining 0.72 occupancy histidine (blue) with three water molecules (red). Figs. 1, 2b, 3, 4b, and 5 were prepared using Molscript (32) and Raster3D (33).
FIG. 2. a, alignment of four dPGM sequences from E. coli (PMG1_ECOLI), human brain (PMGB_HUMAN), S. cerevisiae (PMG1_YEAST), and S. pombe (PMGY_SCHPO) from the Swiss Protein Database (34), numbered according to the E. coli sequence. Secondary structure elements assigned by PROMOTIF (35) to the E. coli structure are marked on, colored, and labeled as in b. Dark blue boxes signify identity, and cyan boxes signify similarity. Where the S. pombe sequence has deletions, identity and similarity are calculated for the other three sequences. Black circles highlight phosphohistidine binding residues; red boxes underline residues involved in dimerization; green boxes underline residues involved in tetramerization; magenta boxes indicate the C-terminal tail. Prepared using CLUSTALW (36) and ALSCRIPT (37). b, a ribbon diagram of E. coli dPGM. The β-strands labeled A-F are shown as yellow arrows, most of the α- and 3₁₀-helices are colored cyan. Helix α10 and the C-terminal tail are magenta, and helix α8 is gray. Blue spheres represent the active phosphohistidine. The two active site sulfates (red and yellow) and a chloride ion (green) are also depicted as spheres.

FIG. 3. Stereo view highlighting differences between E. coli and S. cerevisiae dPGMs at the tetramerization interface of the latter. Subunits of S. cerevisiae are shown in magenta and black, and subunits of E. coli are shown in green. Side chains are shown for residues that may be responsible for the inability of E. coli dPGM to tetramerize.

The binding and presentation of substrates for phosphorylation or dephosphorylation is mediated by active site residues between 2 and 12 Å from the active histidine. In addition to the phosphohistidine, the active site of the E. coli structure contains two sulfate ions derived from the crystallization medium. It is likely that these binding sites are formed by residues that are involved in binding the phosphate groups of the mono- and bisphosphoglycerate substrates. Two of the S. cerevisiae crystal forms also have two sulfates in the active site, and in the case of 1QHF a partially occupied 3-phosphoglycerate has been modelled overlapping one of these sulfates (7). It is of particular interest that the two pairs of sulfate binding sites in S. cerevisiae and E. coli are different (their positions are displaced by 3.1 and 4.0 Å, respectively) and thus in combination describe four sites where the phosphate moieties of the substrates may bind, with implications for the enzyme mechanism. For simplicity, the designations E1 and E2 are used to identify the two sites observed in the E. coli structure, and Y1 and Y2 are used to identify those sites identified in the S. cerevisiae structures. When the protein structures are superposed, Y1 is 3.9 Å from the position of the phosphoryl group of the phosphohistidine forming hydrogen bonds to the phosphohistidine-stabilizing residues Arg 61 and Asn 16 and to Ser 13. Site E1 is located further from the phosphohistidine and also participates in a hydrogen bond with Arg 61, but its most important interactions are with the amide nitrogens of Thr 22 and Gly 23.
Sites Y2 and E2 are formed, in the main part, by interactions with Arg 115 and Arg 116, both of which are strictly conserved residues. These arginines form hydrogen bonds with residues in the access site (discussed below) and probably contribute to linking catalytic events to structural change at the access site.
Access Site-The structure of the C-terminal tail of dPGMs has remained a mystery throughout previous structural and [...] (Fig. 5). This observation strongly implies that the ordering of the C terminus is commensurate with enzyme phosphorylation and thus typical only of the active enzyme. The access site as a whole consists of the rim of the active site cavity and the C-terminal tail, which forms a lid. The rim is formed by residues Glu 12, Lys 17, Asn 19, Lys 32, Glu 36, Ala 100, Asp 108, Lys 112, Glu 204, Asn 206, and Thr 209, whereas the C-terminal tail consists of residues 238-249 (Lys-Ala-Ala-Ala-Val-Ala-Asn-Gln-Gly-Lys-Ala-Lys) and is highlighted in magenta in Figs. 2 and 4a.
The basic secondary structure of the tail is a β-hairpin based around a β-turn at Ala 243-Gly 246. This motif extends away from helix α10 across the active site opening, forming a number of hydrogen bonds with residues of the rim and substrate binding region (Fig. 5).
Ala 239-O forms a hydrogen bond with Asn 206-Nδ2, whereas Asn 206-Oδ1 accepts a hydrogen bond from the substrate binding residue Arg 116. The side chain of Val 242 interacts with Lys 247, whereas its amide oxygen forms a hydrogen bond with the side chain of another substrate binding residue, Arg 115. Asn 244-Oδ1 also forms a hydrogen bond to Arg 115. In the S. cerevisiae structures these two arginine side chains are oriented differently to bind sulfate rather than to form the interactions listed above, which may promote the disorder of the tail. The hydrogen bond between Asn 244-O and the side chain of Asn 19 is the only direct link between the C-terminal tail and the stretch of residues from 9-22.
Ala 243-O and Gly 246-N form the hydrogen bond that makes the β-turn. This places Asn 244 and Gln 245 in the correct orientation to hydrogen-bond to the active site rim residue, Asp 108, a residue that is conserved throughout the full-length dPGMs and was previously proposed to bind the C-terminal lysine residues. The observed interactions serve as a clasp to pin the tail over the active site.
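The tail residues named in the preceding paragraphs can be cross-checked against the 238-249 sequence given for the C-terminal tail. The sketch below simply numbers that sequence and compares it with the residues quoted in the text. The variable names are illustrative and not taken from any analysis software.

```python
# Tail sequence and first residue number as given in the text (residues 238-249).
TAIL = "Lys-Ala-Ala-Ala-Val-Ala-Asn-Gln-Gly-Lys-Ala-Lys".split("-")
START = 238

# Residue number -> residue name for the whole tail.
numbered = {START + i: name for i, name in enumerate(TAIL)}

# Tail residues quoted in the description of the hairpin and its hydrogen bonds.
quoted = {239: "Ala", 242: "Val", 243: "Ala", 244: "Asn", 245: "Gln", 246: "Gly", 247: "Lys"}

# Any entry here would indicate a numbering inconsistency.
mismatches = {n: (name, numbered[n]) for n, name in quoted.items() if numbered[n] != name}

print("last tail residue:", max(numbered), numbered[max(numbered)])
print("mismatches:", mismatches)
```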
It has been proposed that adoption of an ordered conformation by the tail when the protein is phosphorylated prevents solvent access and thus phosphoenzyme hydrolysis (31). The present work shows this to be unlikely because up to 36 ordered water molecules are found in the cavity, but there may be a less direct role in phosphohistidine stabilization. S. pombe dPGM is typical of a group of "short" dPGMs (also including Zymomonas mobilis and Haemophilus influenzae) that have no C-terminal tail. Limited proteolysis of S. cerevisiae dPGM, removing the C-terminal seven residues, produces a protein of similar character to S. pombe dPGM with markedly reduced mutase activity and enhanced phosphatase activity (30). Most of the residues involved in interactions that hold the tail in place are conserved among all "full-length" dPGMs and link, via one or two residues, directly to the substrate binding region. This hydrogen-bonding network is shown in Fig. 4a.
We propose that Arg 115 and Arg 116 provide the switch with which the substrate in the active site induces a change between active and inactive forms by making interactions with the tail residues more or less favorable. When the active form is selected, the interactions of the tail with Asp 108 provide further stability to the conformation. The real key to preserving the phosphohistidine is the conformation of Asn 16. This residue forms two hydrogen bonds with the phosphohistidine and lies on the loop from residues 9-21, which may well be constrained in the active form by the interaction of Asn 19 with the C-terminal tail.
The crystal structure of a dPGM in its competent, phosphorylated form advances our understanding of the contributions from various parts of the dPGM structure in determining and regulating catalysis. Of particular importance is the ordered structure of the C-terminal portion of the polypeptide, which has been an enigma for many years. The structure of this tail differs from predictions and redefines the proposed roles of a number of residues. The dimeric structure has allowed us to identify the determinants of the distinct oligomerization states of dPGMs, although it remains unclear why such variety exists. The analysis of the structural changes on phosphorylation suggests important roles for previously overlooked residues, such as Asn 16 and Asn 19, which can now be investigated through site-directed mutagenesis. The existence of a crystal form of dPGM that diffracts to atomic resolution provides an excellent basis for such studies and also allows further investigation of substrate and inhibitor binding.