Structural and Functional Insights into DR2231 Protein, the MazG-like Nucleoside Triphosphate Pyrophosphohydrolase from Deinococcus radiodurans

Deinococcus radiodurans is among the very few bacterial species extremely resistant to ionizing radiation, UV light, oxidizing agents, and cycles of prolonged desiccation. The proteome of D. radiodurans reflects the evolutionary pressure exerted by chronic exposure to (nonradioactive) forms of DNA and protein damage. A clear example of this adaptation is the overrepresentation of protein families involved in the removal of non-canonical nucleoside triphosphates (NTPs) whose incorporation into nascent DNA would promote mutagenesis and DNA damage. The three-dimensional structure of the DR2231 protein has been solved at 1.80 Å resolution. This protein had been classified as an all-α-helical MazG-like protein. The present study confirms that it holds the basic structural module characteristic of the MazG superfamily; two helices form a rigid domain, and two helices form a mobile domain and connecting loops. Contrary to what is known of MazG proteins, DR2231 protein shows a functional affinity with dUTPases. Enzymatic and isothermal calorimetry assays have demonstrated high specificity toward dUTP but an inability to hydrolyze dTTP, a typical feature of dUTPases. Co-crystallization with the product of hydrolysis, dUMP, in the presence of magnesium or manganese cations, suggests similarities with the dUTP/dUDP hydrolysis mechanism reported for dimeric dUTPases. The genome of D. radiodurans encodes for all enzymes required for dTTP synthesis from dCMP, thus bypassing the need of a dUTPase. We postulate that DR2231 protein is not essential to D. radiodurans and rather performs “house-cleaning” functions within the framework of oxidative stress response. We further propose DR2231 protein as an evolutionary precursor of dimeric dUTPases.

Bacteria possess a wide variety of enzymes with the common physiological function of sanitizing the cell of wasteful or even toxic endogenous metabolites. Such enzymes also modulate the accumulation of certain intermediates in biochemical pathways. These enzymes have been coined "house-cleaning" enzymes, because they protect the cell from the harmful effects resulting from the unbalanced presence of potentially toxic compounds.
Pyrophosphohydrolysis, the enzymatic cleavage of the α-β phosphodiester bond of nucleoside triphosphates (NTPs), generates the corresponding nucleoside monophosphate and inorganic pyrophosphate (PPi), which is subsequently hydrolyzed into free phosphate by other enzymes present in the cytosol. These exergonic reactions allow the recycling of nucleoside monophosphates (NMPs) back into metabolic pathways as well as intracellular scavenging for inorganic phosphate. Pyrophosphohydrolysis, however, has yet another role in cellular metabolism and is used in the removal of non-canonical nucleotide triphosphates arising from oxidation, deamination, or other modifications of canonical nucleotides (1), thereby preventing their incorporation into DNA or RNA. Proteins hydrolyzing NTPs with such cleansing functions can be found in structurally different superfamilies, such as the MutT-related hydrolases (Nudix) (α + β), dUTPase (all-β), dITPase (Maf/HAM1) (α/β), all-α-NTP pyrophosphatases (MazG), and phosphoribosyl-ATP pyrophosphatase (HisE).
Analysis of the complete genome of Deinococcus radiodurans revealed the specific expansion of certain protein families believed to be connected to the organism's response to stress and damage resistance, repair mechanisms, and signal transduction (2). Orthologs of almost all known genes involved in stress response in other bacteria are present in Deinococcus (3). In the case of some missing gene families, function is maintained by nonorthologous proteins with similar functions (4). Among the highly represented protein families are hydrolases, such as the house-cleaning Nudix pyrophosphatases and other pyrophosphohydrolases, the calcineurin-like phosphoesterases, phosphatases, lipase/epoxidase-like (α/β) hydrolases, subtilisin-like proteases, and sugar deacetylases (5). A considerable number of these stress response-related genes are clustered in unusual gene arrays. Some appear to have evolved by tandem duplication (see DR2254/DR2255 or DR0783/DR0784, for example) or gene translocation (DR0675 to DR0677), within the Deinococcus lineage. However, the majority of unusual gene clusters suggest that acquisition took place through horizontal gene transfer from various archaeal, bacterial, and even eukaryotic sources (4).
Like other bacteria, D. radiodurans seems to lack entirely monomeric, trimeric, and archaeal dUTPases, as has been pointed out by Moroz et al. (6). Recently, the crystal structures of dUTPases from Campylobacter jejuni and Trypanosoma cruzi revealed a new all-α-helix fold with a homodimeric arrangement in contrast to the classical trimeric dUTPase reported until then. No homologues of this dimeric dUTPase could be identified in D. radiodurans.
2 To whom correspondence may be addressed. E-mail: daniele.de_sanctis@esrf.fr.
3 To whom correspondence may be addressed. E-mail: seanmcs@esrf.fr.
Through sequence analysis restricted to the active site motif, Moroz et al. (6,7) identified a "basic module" of the dUTPase/dCTPase family in the genomes of several Gram-positive bacteria and respective phages. This basic module, consisting of only five active site-forming helices, is conserved in two other families: the nonspecific NTP-PPase MazG and phosphoribosyl-ATP pyrophosphatase HisE. These three enzyme families, which share similar function, were unified into a single superfamily, the all-α-NTP-PPase superfamily. Following the same criteria, DR2231 was identified by sequence analysis as a putative member of this superfamily, as were two other genes encoding MazG family proteins, DR1022 and DR1183 (7). We performed a broader sequence search using as a query the sequence of Escherichia coli MazG protein (8) against the D. radiodurans genome and identified only DR1022 and DR1183 as MazG-like proteins. Furthermore, there are no annotated phosphoribosyl-ATP pyrophosphatase HisE entries in the D. radiodurans genome. Determining the three-dimensional structure of DR2231 protein and clarifying its function acquire a particular significance regarding the genetic versatility of D. radiodurans in stress response and DNA damage.
Here, we report the crystal structure of DR2231 from D. radiodurans at 1.8 Å resolution. The enzyme has been identified as the prototype of a subfamily of the NTP pyrophosphohydrolase superfamily (7). It has significant structural resemblance to MazG but is functionally related to the dimeric dUTPases, exhibiting activity exclusively on deoxy-NTPs with a very high specificity toward dUTP and none toward dTTP. The crystal structures of the native protein in the apo form and with Mn²⁺ coordinated to the active site and DR2231 in complex with its product from dUTP hydrolysis with either Mn²⁺ or Mg²⁺ are presented. We propose that the DR2231 protein is a dUTPase with marked specificity in hydrolyzing dUTP into dUMP and thus reduces the dUTP/dTTP ratio, which, otherwise, would compromise DNA integrity.

EXPERIMENTAL PROCEDURES
Cloning, Expression, and Purification of DR2231-The open reading frame annotated as DR2231 was amplified by PCR from the genomic DNA of D. radiodurans. The PCR product comprising the 5′-CACC overhang was inserted into the expression pET151/D-TOPO directional vector (Invitrogen) with His6 at the N terminus and a recombinant tobacco etch virus protease cleavage site. The resulting expression vector was used to transform E. coli BL21 (DE3) competent cells (Invitrogen). Transformed cells carrying pET151-DR2231 were grown in Luria-Bertani enriched growth medium containing 100 µg/ml ampicillin; the medium was inoculated with a sample of an overnight culture and grown at 310 K until an optical density of ~0.65 at 600 nm was reached, at which point isopropyl β-D-1-thiogalactopyranoside was added to a final concentration of 0.3 mM, thus inducing overexpression. Cultures were allowed to grow for a further 4 h, after which the cells were harvested at 7000 × g for 20 min at 277 K and resuspended in lysis buffer (50 mM Tris-HCl, pH 7.0, 300 mM NaCl, 2 mM β-mercaptoethanol, 5% (v/v) glycerol). The cells were frozen at 193 K, thawed, and lysed by two passes through a French press. DNase I was added to a final concentration of 20 µg/ml together with an EDTA-free protease inhibitor tablet (Roche Applied Science). The lysed culture was centrifuged at 40,000 × g for 30 min at 277 K, and the soluble fraction was loaded onto a 5-ml His-Trap column (GE Healthcare) equilibrated with buffer A (50 mM Tris-HCl, pH 7.5, 300 mM NaCl, 5% (v/v) glycerol). The first wash consisted of a step elution to 12% buffer B (buffer A supplemented with 500 mM imidazole) followed by a gradient elution from 12 to 100% buffer B. DR2231 was eluted and pooled near 250 mM imidazole, as confirmed by SDS-PAGE. The pooled fractions were exchanged into a suitable buffer for His tag cleavage by enzymatic digestion with tobacco etch virus protease. The tag-free protein was again loaded onto a 5-ml His-Trap column, and the flow-through was collected. Further purification was performed, and oligomerization states were confirmed using a Superdex S200 HRHiPrep 16/60 column (GE Healthcare) equilibrated with 25 mM Tris-HCl, pH 8.0, 150 mM NaCl. The oligomerization state and monodispersity of DR2231 in solution were estimated by dynamic light scattering (Malvern Instruments, Zetasizer, Nano series). The purified protein was concentrated to 7.5 mg/ml and cryo-cooled to −70°C when necessary.
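The relation between the percentage of buffer B and the imidazole concentration in the wash and elution steps can be sketched as follows. This is a minimal illustration that assumes ideal linear mixing and that buffer A contains no imidazole, as its stated composition suggests; the function names are hypothetical and not part of the original protocol.

```python
# Imidazole delivered by mixing buffer A (no imidazole) with buffer B.
# Assumption: ideal linear mixing of the two buffers.
BUFFER_B_IMIDAZOLE_MM = 500.0  # buffer B = buffer A + 500 mM imidazole (from the text)


def imidazole_mm(percent_b: float) -> float:
    """Imidazole concentration (mM) at a given percentage of buffer B."""
    if not 0.0 <= percent_b <= 100.0:
        raise ValueError("percent_b must be between 0 and 100")
    return BUFFER_B_IMIDAZOLE_MM * percent_b / 100.0


def percent_b_for(imidazole: float) -> float:
    """Percentage of buffer B needed to reach a given imidazole concentration (mM)."""
    if not 0.0 <= imidazole <= BUFFER_B_IMIDAZOLE_MM:
        raise ValueError("imidazole concentration outside the reachable range")
    return 100.0 * imidazole / BUFFER_B_IMIDAZOLE_MM


wash_mm = imidazole_mm(12.0)          # imidazole in the 12% B wash step
elution_percent_b = percent_b_for(250.0)  # %B at which DR2231 was pooled
```

Under these assumptions the 12% wash step corresponds to 60 mM imidazole, and the pooled fractions near 250 mM imidazole elute at about the midpoint of the gradient.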
Crystallization, Data Collection, and Structure Determination-DR2231 was successfully crystallized in three different crystallization conditions, each producing a different crystal form. All conditions were obtained through an initial screening (with or without preincubation of substrates or analogues) using the high throughput crystallization robot, Cartesian PixSys, and automated imaging system, both available at the High Throughput Crystallization Laboratory at EMBL (Grenoble, France) (9). The first crystals (crystal type A) appeared in 0.01 M magnesium chloride, 0.05 M sodium cacodylate, pH 6.0, and 1 M lithium sulfate (Natrix screen formulation, Hampton Research). Crystals were reproduced without further optimization and achieved dimensions of ~0.4 × 0.3 × 0.2 mm within 4-5 days. For data collection, 16% (v/v) glycerol was included in the mother liquor as cryoprotectant, and crystals were immediately flash-cooled in liquid nitrogen (100 K) after soaking for a few seconds. The second crystal form (crystal type B) refers to crystals obtained from 0.18 M lithium acetate and 20% (w/v) PEG 3350 (optimized from the PEG/Ion Formulation Screen, Hampton Research). Prior to setting up the crystallization drops, protein was incubated with 10 mM dUTP and 10 mM manganese chloride (DR2231_dUMP_Mn) or 10 mM magnesium chloride (DR2231_dUMP_Mg). Single crystals grew in 7-8 days as thick plates. These were cryoprotected by soaking in crystallization solutions progressively increased in glycerol content up to 16% (w/v). The third successful crystallization condition (crystal type C) refers to crystals with a diamond-like shape that grew in 1 M LiCl, 0.1 M citric acid, pH 5, 10% (w/v) PEG 6000 (obtained from Grid Screen PEG6000 KIT, Hampton Research). DR2231 protein was co-crystallized in the presence of 10 mM dUpCpp or dUpNHpp and 10 mM MgCl2 (or MnCl2). These crystals were collected directly from the robot screen and subjected to cryogenic cooling after immersion in paraffin oil.
Type A crystals belong to the space group P2₁2₁2₁ with six molecules in the asymmetric unit and were used in three different data collection experiments: DR2231_Apo_1, DR2231_Mn, and DR2231_Gd. For de novo phasing (DR2231_Gd), a heavy atom derivative crystal was prepared by soaking a type A crystal in 20 mM GdCl3 added to the mother liquor for ~18 h. DR2231_Mn corresponds to a data set collected on a type A crystal that was soaked overnight in a modified precipitant: 250 mM lithium sulfate, 750 mM manganese sulfate.
For the crystal type B form, the space group was P2₁2₁2 with four molecules in the asymmetric unit, arranged in two dimers. This crystal type has been pursued in order to obtain a protein complexed with the substrate or the product in the active site and with the metal atoms.
Crystal type C was produced in order to obtain DR2231 with a bound non-hydrolyzable analog and followed the unsuccessful trials with type B. Unfortunately, neither crystal form when examined revealed the substrate analog in the postulated active site. Crystals obtained correspond to another apo form (similar to crystal type A) and belong to space group P6₁22 with one molecule in the asymmetric unit, the dimer being generated by a crystallographic 2-fold axis.
All data collection statistics are summarized in Table 1. Diffraction data from crystal type A were processed with MOSFLM (10) and scaled with SCALA (11). Data from crystal forms B and C were integrated and scaled with XDS (12). In both cases, further data analysis was carried out using the CCP4 suite 6.0.2 (13).
The structure of DR2231 was solved by single wavelength anomalous dispersion from a highly redundant data set from the gadolinium derivative collected to 2.0 Å resolution and at a wavelength of 1.245 Å using x-rays at the European Synchrotron Radiation Facility (ESRF) beam line ID14eh4 (14). After analyzing the structure factors using SHELXC and SHELXD with HKL2MAP version 0.2 (15,16), the position of one gadolinium site per monomer could be determined. Initial phases were calculated with SHELXE (17), which also applied density modification. Structure factors and density modified phases were input in ARP/wARP (18), which produced, with the automatic model building feature, a partial starting model. The initial model was then completed by rounds of manual building with Coot (19) and refinement with REFMAC (20). The structure of the native protein was obtained by rigid body refinement using the gadolinium derivative structure as an initial model; for space groups other than P2₁2₁2₁, molecular replacement with the program Phaser (21) using the gadolinium derivative structure as a starting model was used for structure determination. Protein/substrate structures were determined again with Phaser using the higher resolution native structure as a search model.
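For orientation, the data collection wavelength and resolution limit quoted above can be related to photon energy and scattering angle. The sketch below uses only the standard relations E = hc/λ and Bragg's law, λ = 2d sin θ; the function names are illustrative and do not come from any of the programs cited.

```python
import math

HC_KEV_ANGSTROM = 12.398419843  # Planck constant times speed of light, in keV*Å


def photon_energy_kev(wavelength_a: float) -> float:
    """Photon energy (keV) for an x-ray wavelength given in Å."""
    return HC_KEV_ANGSTROM / wavelength_a


def bragg_two_theta_deg(d_spacing_a: float, wavelength_a: float) -> float:
    """Scattering angle 2θ (degrees) for a d-spacing, from λ = 2 d sin θ."""
    sin_theta = wavelength_a / (2.0 * d_spacing_a)
    if sin_theta > 1.0:
        raise ValueError("d-spacing is not reachable at this wavelength")
    return 2.0 * math.degrees(math.asin(sin_theta))


energy_kev = photon_energy_kev(1.245)            # wavelength of the Gd data set
two_theta_max = bragg_two_theta_deg(2.0, 1.245)  # angle at the 2.0 Å resolution limit
```

A wavelength of 1.245 Å corresponds to a photon energy of about 9.96 keV, and reflections at the 2.0 Å limit are recorded at a scattering angle of roughly 36°.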
All structures were refined using bulk solvent correction and a maximum likelihood target function. Resulting models were all subject to translation/libration/screw (TLS) refinement within REFMAC (20), using the rigid and mobile domains as separate TLS groups except for DR2231_dUMP_Mn and DR2231_dUMP_Mg, which were refined with PHENIX, in which TLS groups were defined differently for the open and closed conformations. TLS groups were chosen after analysis of the structure with TLSMD (22). All of the data measured, apart from a 5% test set used for Rfree calculation, were included in the refinement. The quality of the structures was assessed using PROCHECK (23) and MolProbity (24). Refinement statistics and geometry analysis of the structures are summarized in Table 2.
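The role of the 5% test set can be illustrated with a short sketch. It assumes the conventional definition R = Σ||Fobs| − |Fcalc|| / Σ|Fobs|, with calculated amplitudes already on the same scale as the observed ones; the helper names and the random selection scheme are hypothetical and are not the procedure used by REFMAC or PHENIX.

```python
import random


def r_factor(f_obs, f_calc):
    """Conventional crystallographic R-factor over paired amplitudes.

    Assumption: f_calc has already been scaled to f_obs.
    """
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("amplitude lists must be non-empty and of equal length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


def split_test_set(n_reflections, fraction=0.05, seed=0):
    """Randomly flag a fraction of reflection indices as the free (test) set."""
    rng = random.Random(seed)
    n_test = max(1, round(n_reflections * fraction))
    test = set(rng.sample(range(n_reflections), n_test))
    work = [i for i in range(n_reflections) if i not in test]
    return work, sorted(test)


work_idx, free_idx = split_test_set(1000)
```

Rwork is then `r_factor` evaluated over the reflections in `work_idx`, which are used in refinement, and Rfree is the same quantity over `free_idx`, which are excluded from it.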
NTP Pyrophosphatase Assay-NTP pyrophosphatase assays for DR2231 were performed using the malachite green phosphate assay (26,27). For ELISA plates (200-µl total volume/well) DR2231 (2 µg or 2 ng, as indicated) in reaction buffer (100 mM Tris/HCl, pH 7.2, 20 mM MgCl2) was incubated with varying concentrations of NTPs or dNTPs, ranging from 1.0 to 20 µM at room temperature (delivered from 100-fold stocks) for 10 min. Each reaction contained inorganic pyrophosphatase enzyme (0.05 unit) (Sigma) in order to convert released PPi into Pi and allow full binding to malachite green dye. Reactions were terminated upon the addition of 40 µl of dye reagent, whereupon the dye color changes from orange to green after 10 min. Control reactions lacking DR2231 were performed for each substrate (10 µM) in the presence of inorganic pyrophosphatase, in order to guarantee the absence of β-γ phosphate cleavage. Another set of control reactions lacking inorganic pyrophosphatase produced no color development, further proving that DR2231 releases PPi and not phosphate in free form.
Initial velocities were followed by incubating 1.8 ml of reaction buffer at 37°C, with agitation, in the presence of inorganic pyrophosphatase (0.05 unit) and substrate. DR2231 at 0.634 nM (2 ng/µl) was added, and 160-µl aliquots were withdrawn at 0, 1, 2, 4, 6, 8, and 10 min and added to 40 µl of malachite green dye each. After a 10-min incubation, absorbance was measured at 630 nm; phosphate quantification was calculated against a previously determined calibration curve for inorganic phosphate in the buffer conditions used. For metal preference tests, magnesium was replaced by 20 mM MnCl2, CaCl2, ZnCl2, NiCl2, or EDTA.
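The conversion from absorbance readings to an initial velocity can be sketched as below. This is only an illustration: the calibration standards and time-course absorbances are invented numbers, the calibration is assumed to be linear and measured under the same dilution as the samples, and the factor of two reflects the fact that inorganic pyrophosphatase converts each PPi into two Pi.

```python
def linear_fit(x, y):
    """Ordinary least-squares line through (x, y); returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def initial_velocity(times_min, a630, cal_slope, cal_intercept):
    """Initial velocity in µM nucleotide hydrolyzed per minute.

    Absorbance is converted to µM Pi with the calibration line, the Pi
    time course is fitted with a straight line, and the slope is halved
    because one PPi yields two Pi.
    """
    phosphate_um = [(a - cal_intercept) / cal_slope for a in a630]
    slope, _ = linear_fit(times_min, phosphate_um)
    return slope / 2.0


# Hypothetical calibration standards (µM Pi) and their absorbance at 630 nm.
cal_conc = [0.0, 5.0, 10.0, 20.0, 40.0]
cal_abs = [0.050, 0.150, 0.250, 0.450, 0.850]
cal_slope, cal_intercept = linear_fit(cal_conc, cal_abs)

# Hypothetical time course sampled at the time points given in the text.
times_min = [0, 1, 2, 4, 6, 8, 10]
a630 = [0.050, 0.070, 0.090, 0.130, 0.170, 0.210, 0.250]
v0 = initial_velocity(times_min, a630, cal_slope, cal_intercept)
```

In practice only the early, linear part of the time course should enter the fit, so later points would be dropped if the progress curve bends.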
Isothermal Calorimetry (ITC) Assay-Isothermal calorimetry was carried out using a VP-ITC titration microcalorimeter (Microcal Inc., Northampton, MA). Reaction cells were filled with solutions and equilibrated at the experimental temperature. After equilibration, an additional delay period was allowed to generate the base line used in subsequent data acquisition. All solutions were degassed prior to an experiment. The stirring speed was set at 326 rpm, thermal power was recorded every 2 s, and heat flow (reference power of 15 µcal/s) was recorded as a function of time.
Ligand solutions were prepared in the buffer from the last dialysis change. ITC measurements were routinely performed in 20 mM Tris/HCl, pH 7.0, 5 mM NaCl, 25 mM MgCl2 (or MnCl2). Controls were performed by injecting ligand into buffer under identical experimental conditions. Raw data were collected and corrected for the ligand heat of dilution. Kinetic parameters were determined by the single injection method, in which the change in thermal power as substrate is depleted is continuously monitored (28,29).
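The principle of the single injection method can be sketched as follows. The sketch assumes a baseline- and dilution-corrected power trace, complete conversion of the injected substrate, and no product inhibition, so that the fraction of total heat released at any time equals the fraction of substrate consumed. The trace below is synthetic (a first-order decay with a hypothetical rate constant), not experimental data, and the function names are illustrative.

```python
import math


def cumulative_heat(time_s, power):
    """Running trapezoidal integral of thermal power (heat released so far)."""
    heat = [0.0]
    for i in range(1, len(time_s)):
        dt = time_s[i] - time_s[i - 1]
        heat.append(heat[-1] + 0.5 * (power[i] + power[i - 1]) * dt)
    return heat


def single_injection_kinetics(time_s, power, s0_um):
    """Return (substrate, rate) series from one complete-turnover power trace.

    substrate[i] is the remaining substrate (µM) and rate[i] the reaction
    rate (µM/s) at time_s[i]; both follow from scaling by the total heat.
    """
    heat = cumulative_heat(time_s, power)
    q_total = heat[-1]
    if q_total == 0.0:
        raise ValueError("no net heat in the trace")
    substrate = [s0_um * (1.0 - q / q_total) for q in heat]
    rate = [p * s0_um / q_total for p in power]
    return substrate, rate


# Synthetic trace: first-order substrate decay sampled every 2 s.
K_OBS = 0.05   # 1/s, hypothetical
S0_UM = 100.0  # µM substrate after injection, hypothetical
time_s = [2.0 * i for i in range(201)]
power = [K_OBS * math.exp(-K_OBS * t) for t in time_s]  # arbitrary power units
substrate, rate = single_injection_kinetics(time_s, power, S0_UM)
```

Pairs of `rate` and `substrate` values obtained this way form a continuous rate-versus-concentration curve, to which a kinetic model such as the Michaelis-Menten equation can then be fitted.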

Overall Fold of DR2231 and Structure of the Dimer-DR2231 is a single chain protein composed of 148 amino acids with an estimated molecular mass of 16,056 Da. The recombinant protein reported here corresponds to the full-length sequence and includes six extra N-terminal residues from the cloning vector (Gly, Ile, Asp, Pro, Phe, Thr). Three different crystal forms belonging to three different space groups, P2₁2₁2₁, P2₁2₁2, and P6₁22, were obtained, depending on whether a substrate (or substrate analog) was used for co-crystallization or not. All crystal forms show the same basic fold.
The dimer (Fig. 1B) is composed of two monomers with essentially the same structure, interacting in a head to tail orientation with overall approximate dimensions of 45 × 49 × 97 Å. Upon dimer formation, helices 2 and 3 from one monomer stack antiparallel to helices 2′ and 3′ of the other monomer, respectively. A stable four-helix bundle is formed in the center. On either side, flanking the central bundle, are the "winged" regions composed of H1 from one monomer and H4′ and H5′ from the other. Helix 1 anchors H4′ into its position relative to H3 and gives further stability to the "fork." The intertwining of the two hairpin structures produces an extensive subunit-subunit interface, burying a total surface of 2832 Å², where, among the 73 residues participating in the interface, 58% are non-polar and 42% are polar (analyzed using the protein-protein interaction server (30)). This value increases to ~2922 Å² for the dimer in complex with dUMP. Interestingly, the long loop (residues 23-33) connecting H1 and H2, which seems to stabilize the residues in the kink (residues 93-96) between H3 and H4, confers further sturdiness to the rigid helix bundle. A considerable array of interactions secures this stabilization through hydrogen bonds: Ala 23 ... There is yet another hydrogen bond between Pro 32 (C=O) and Gln 62 (Nε2) in helix 2 of the partner monomer. Structures belonging to the P2₁2₁2₁ space group (6 molecules/asymmetric unit) (DR2231_Gd, DR2231_Apo_1, and DR2231_Mn) show very good electron density for the N-terminal residues. These residues (Gly −5 to Pro 8), including those provided by the cloning vector, are anchored in the substrate binding pocket of a symmetry-related molecule. The disulfide bond formed between Cys 6 residues of each dimer is an artifact induced by crystallographic packing in this crystal form and does not appear in the other two forms reported. The final five residues (Ala 144-Asp 148) are always unstructured, with the exception of some chains in the DR2231_Mn structure. Structures determined with crystal type B (4 molecules/asymmetric unit; DR2231_dUMP_Mg and DR2231_dUMP_Mn) showed electron density attributable to residue Pro 7 and onward. One monomer in each dimer lacks density for residues 116-123 as well as for residues 145-148. Finally, the structure corresponding to space group P6₁22 (DR2231_Apo_2) holds one protein chain per asymmetric unit, and, similar to the P2₁2₁2₁ structures, the first four N-terminal residues from a symmetry-related molecule are embedded in the active site.
In accordance with the literature (6-8), compared with dUTPases and MazG proteins, DR2231 also holds the Mg²⁺ (or Mn²⁺) coordination motif, EXXE 12-28 EXXD, crucial for biocatalysis. For DR2231, however, the first metal binding conserved motif is EEXXE, similar to all MazG proteins, whereas the second conserved motif, EXXD, does not have the extra glutamate, similar to most dimeric dUTPase conserved motifs (Fig. 2). The Mg²⁺-coordinating side chains in DR2231 are Glu 47, Glu 50, Glu 79, and Asp 82, the first two located on H2 and the latter two on H3, and are directed toward the pocket formed by the central helix bundle and the "winged" helical elements. This crevice composes the substrate binding pocket (described below). The loop between H4 and H5 (Ser 114-Glu 127) is very mobile and may possibly have a role as a lid for substrate accessibility and recognition. Different structures of DR2231 reported here have shown this loop in an "open" conformation in the P2₁2₁2₁ and P6₁22 structures (DR2231_Apo1, DR2231_Mn, and DR2231_Apo2) and in a "closed" conformation for those complexed with the product, dUMP.
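The metal coordination motif lends itself to a simple pattern search. The sketch below reads EXXE 12-28 EXXD as two short motifs separated by 12 to 28 arbitrary residues, which is an interpretation of the notation, and applies it to an invented toy sequence built so that the motif falls at the residue numbers quoted for DR2231. The toy sequence is not the real DR2231 sequence.

```python
import re

# E-x-x-E, then 12 to 28 residues of any type, then E-x-x-D.
METAL_MOTIF = re.compile(r"E..E.{12,28}E..D")


def find_metal_motif(sequence):
    """Return 1-based positions of the four motif residues for each hit.

    Each hit is (first E, second E, third E, D). Overlapping hits are not
    reported, because finditer scans for non-overlapping matches.
    """
    hits = []
    for match in METAL_MOTIF.finditer(sequence.upper()):
        first = match.start() + 1  # convert 0-based index to residue number
        last = match.end()         # 1-based position of the closing Asp
        hits.append((first, first + 3, last - 3, last))
    return hits


# Toy 148-residue sequence (hypothetical): poly-Ala with the motif placed so
# that it spans residues 47-82, i.e. with a 28-residue spacer.
toy_sequence = "A" * 46 + "EAAE" + "A" * 28 + "EAAD" + "A" * 66
hits = find_metal_motif(toy_sequence)
```

For the toy sequence the single hit maps onto positions 47, 50, 79, and 82, matching the numbering of the coordinating residues Glu 47, Glu 50, Glu 79, and Asp 82.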
Comparison with dUTPase and MazG Structures-As already mentioned, the divalent ion binding motif is the identity signature for dUTPase/dCTPase, MazG, and HisE proteins. Outside this stretch of residues, sequence homology is usually poor even among subfamily members (Fig. 2). However, despite low sequence homology with other family members, the structure similarity is strikingly high when compared with known structures of both dimeric dUTPases and MazG proteins.
By superimposing the structures of a dimeric dUTPase (C. jejuni dUTPase, PDB code 1W2Y) and two MazG structures (S. solfataricus MazG, PDB code 1VMG; Bacillus subtilis YpjD, PDB code 2GTA) with DR2231_Mn, the structural elements in common become evident, highlighting the basic module as the unifying structural element throughout the all-α-NTP pyrophosphohydrolase superfamily. For C. jejuni dUTPase, the structural overlay is performed with the monomer on the DR2231 protein dimer; the superimposition clearly supports the notion of a gene duplication of the basic module for the dUTPase (Fig. 3A). Within this four-helix bundle, two helices (DR2231 H2(3) and H3(4), from monomer A, aligning with C. jejuni dUTPase H3 and H4, respectively) are particularly rigid. The perfect conservation of the Mg²⁺-binding motif is noteworthy. In contrast to the DR2231 protein dimer, which holds two putative active sites, the C. jejuni dUTPase monomer, despite structural resemblance, has evolved into an enzyme with a single active site. Besides these features, no other similarity between the proteins is found from the superposition.
Superposition of the structure of DR2231 with "shorter" versions of MazG structures produced very good overlays (Fig. 3B). For B. subtilis YpjD protein (PDB code 2GTA), the calculated root mean square deviation over all Cα atoms is <2.01 Å, and for S. solfataricus MazG (PDB code 1VMG), the root mean square deviation is <1.85 Å. The overlay includes elements beyond the rigid helices H2(3) and H3(4) and is extended to the entire basic module, including H1, H4, and the loop connecting H3 and H4. In overlaid structures, the magnesium-binding residues are wedged between H2(3) and H3(4) and are structurally coincidental with those of DR2231. No similarity is found in the loops between the helices H1, H2, and H3. There is no structural equivalent for the loop between H4 and H5 (which is involved in substrate recognition) and the H5 helix itself in either MazG structure.
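The root mean square deviation quoted for these overlays is the square root of the mean squared distance between paired Cα atoms. A minimal sketch of that calculation is given below; it assumes the two coordinate sets have already been superimposed and paired atom by atom, and it does not perform the superposition itself. The function name and example coordinates are illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD (in the units of the input, e.g. Å) between paired 3D points.

    Assumption: the structures are already superimposed and the i-th point
    of coords_a corresponds to the i-th point of coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))


# Two hypothetical Cα traces displaced by 2 Å along z.
trace_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
trace_b = [(0.0, 0.0, 2.0), (3.8, 0.0, 2.0)]
example_rmsd = rmsd(trace_a, trace_b)
```

Because every atom pair in the example is offset by the same 2 Å, the function returns 2.0 for it.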
Open and Closed States of DR2231-The superposition of a native structure of DR2231 protein (DR2231_Apo1, DR2231_Apo2, or DR2231_Mn) with a structure in complex with product (DR2231_dUMP_Mg or DR2231_dUMP_Mn) shows major conformational changes concerning H5 and the lidlike loop between H4 and H5 (Fig. 4). The loop region (residues Ser 114-Glu 127) contains the only β-turns present in DR2231; the first is a type I β-turn and comprises residues Arg 118-Gly 121, and the second is a type II β-turn encompassing residues Pro 126-Trp 129.
DR2231_dUMP_Mg and DR2231_dUMP_Mn structures (P2₁2₁2; with two dimers in the asymmetric unit) configure two situations; in monomers A and C, the active sites are each occupied by a dUMP molecule, and the lidlike loops adopt the conformation that corresponds to the closed state. Interestingly, the second situation shows the respective monomers, B and D, with an unoccupied active site and a flexible latch such that no electron density is detected for residue range 116-123. Nevertheless, this conformation suggests the true marked mobility of the lidlike loop when in the open state.
Structural overlay of DR2231_dUMP_Mn and DR2231_Apo1 shows that the latch is displaced ~10 Å when in the presence of the product, with a rigid rotation of the residue 109-131 fragment of about 30°.
Divergence between open and closed states appears at the end of H4 with residues Asn 109-Lys 112 (Fig. 4). The Nδ2 from the side chain of Asn 109 is bound to a water molecule, whereas the Oδ1 is bound to two waters in the open state; in the closed state, Nδ2 hydrogen-bonds O3′ of the deoxyribose, and Oδ1 keeps one of the bound water molecules and simultaneously makes a hydrogen bond to the O4′ of the deoxyribose of dUMP and the Nζ of the Lys 112 side chain.
The torsion angle (Ψ) of Ser 111 changes considerably between open and closed structures, and this implies major modifications on Lys 112, changing in particular the direction of its side chain and the number of hydrogen bonds in which this residue is involved. The nitrogen of the peptide bond is hydrogen-bound to the carbonyl oxygen of Ala 108 in the open state, but in the closed state, this bond is broken, and the nitrogen becomes involved in two different hydrogen bonds: with the carbonyl oxygen of Asn 109 and Leu 110, simultaneously. This results in a reorientation of the side chain, and the Nδ2 is directed toward the cavity in order to bind O4′ (of the deoxyribose) and O5′ in dUMP, two water molecules, and Oδ1 of Asn 109. The torsion angles of Ala 113 and Ser 114 (the side chain Oγ, bound to a water molecule in the open state, has a hydrogen bond with the carbonyl oxygen of Ser 111 in the closed state) change a great deal as a result of these rearrangements. Curiously, both Gly 115 and Pro 116 keep their hydrogen bonding intact in both conformations, and only minor rearrangements occur with the residues that follow up to residue 122. Here, both Lys 122 and Gln 123 play an important role in substrate recognition and participate in the active site arrangement. Lys 122 Nζ is pointing outside the substrate cleft in the absence of dUMP, but once in the presence of the product, the peptide nitrogen binds to Oδ1 of Asp 120, and Nζ binds in a tetrahedral fashion to a water molecule, Oε1 and Oε2 of Glu 46 of the second subunit, and OP1 (of the α-phosphate coordinated by Mn2) of dUMP. The Oε of Gln 123 loses a solvent water molecule and becomes bound to Nε and Nη1 of Arg 117, whereas the Nε2 of Gln 123 binds the carbonyl oxygen of Gly 115 in both states, but in the closed state, it also binds the carbonyl oxygen of Lys 112. This observation is particularly relevant because the angle produced by the loop movement upon substrate/product binding seems to be stabilized by this hydrogen bond. Another interesting feature of this residue is that the carbonyl oxygen that binds a bridging water is bound in turn to OP3 of dUMP, whereas the nitrogen of the peptide bond is binding O4 of the uracil in dUMP. Finally, another important residue involved in product binding is Lys 125, which normally has the side chain Nζ hydrated by two water molecules but in the closed state binds OP2 (of the α-phosphate) of dUMP.
Comparing DR2231 with the dUTPases and MazG structures available, a lack of structural equivalents to this loop is evident. The closest possible resemblance would come from C. jejuni dUTPase (PDB code 1W2Y) rather than MazG structures. C. jejuni dUTPase reveals a short latch between helices 3 and 4 that is swapped between subunits and reaches into the active site region. This latch contains residues responsible for substrate binding and specificity. iMazG from Vibrio sp. DAT722 (PDB codes 2Q5Z and 2Q73) exhibited an open and closed conformation contributed by a moveable helix, H1, postulated to be involved in substrate capture. Nevertheless, in all structures of DR2231 protein, H1 is rigid and shows no movement or reorientation relative to the central helix bundle.
The structural elements from the lidlike latch in DR2231 protein responsible for substrate recognition and catalysis only partly overlap with structural elements with the same role in C-terminal E. coli MazG (PDB code 3CRC) (namely H10 and the short loop between H6 and H7). This difference strongly suggests that the high structural variability in the vicinity of the binding pocket, other than the rigid helix bundle, conveys to the enzyme its specificity toward substrates and establishes the subfamily identity.
Structure of the Active Site; Mg²⁺ Versus Mn²⁺-The wild-type DR2231 protein was co-crystallized with the reaction substrate, dUTP, and in the presence of Mg²⁺ or Mn²⁺. In both crystal structures obtained (DR2231_dUMP_Mg and DR2231_dUMP_Mn), well defined electron density was located in the ligand binding pocket and subsequently modeled as the hydrolyzed product, dUMP, in one active site per dimer (Fig. 5, A and B).
In the DR2231_dUMP_Mg structure, one magnesium ion is present per subunit, each coordinated in the complexed subunit. The Mg²⁺ is coordinated by three conserved glutamate residues (Glu 47, Glu 50, and Glu 79) and one conserved aspartate (Asp 82) (Fig. 5, A and B). Bond distances correspond well to those expected for the Mg²⁺ ion (31,32). A further coordination is established with water W1, which is bridging between the metal and the α-phosphate of the product.
Regarding DR2231_dUMP_Mn, the unoccupied active sites hold only one Mn²⁺ ion (Mn1) with an identical coordination to DR2231_Mn of crystal type A (and identical to gadolinium coordination in DR2231_Gd). In the dUMP-occupied site, two manganese divalent cations (Mn1 and Mn2) can be modeled. Mn1 is in the same position observed for Mg1, whereas Mn2 is found in the vicinity of the first and is always hexacoordinated to Glu 46 (Oε2), Glu 47 (Oε2), and Glu 50 (Oε1) as well as two water molecules and OP1 of the α-phosphate of dUMP (Fig. 5A). W1 water is located in the same position as in DR2231_dUMP_Mg and is coordinated between the two Mn²⁺ ions. The two Mn²⁺ atoms have been confirmed by anomalous difference Fourier map. In the DR2231_dUMP_Mg structure, a water molecule was modeled in the position of Mn2 because the coordination distance was not typical for an Mg²⁺ ion. An overlay of both structures, DR2231_dUMP_Mg and DR2231_dUMP_Mn, gives an excellent superposition with a minor exception for the residues surrounding Mn2. In the presence of Mg1, Glu 50 is bound to this metal in a monodentate manner, through Oε1, binding also the "bridging" water and a second water molecule. The Glu 46 side chain is pointing away from this region. However, for DR2231_dUMP_Mn, when Mn2 is in the position occupied by the second water in DR2231_dUMP_Mg, the side chain of Glu 50 rotates in order to bind Mn1 and Mn2 in a bidentate fashion. The presence of Mn2 reorients Glu 46, which in turn is also binding an oxygen atom from the α-phosphate of dUMP.
It is not known whether DR2231 binds manganese in vivo. Enzymatic studies reported here have shown a similar preference for both divalent cations. The first crystallization condition obtained (P2₁2₁2₁) produced an apo form of the protein with no Mg²⁺ in the metal-binding site (DR2231_Apo1), and attempts to produce crystal type A with Mg²⁺ in the metal-binding site by soaking, co-crystallization, or replacing Li₂SO₄ with varying concentrations of MgSO₄ (250 to 750 mM) failed. In contrast, when the same procedure was repeated with varying concentrations of manganese sulfate, all soaks produced structures with high occupancy of the Mn²⁺ ion coordinated by the EXXE 28 EXXD motif. This may be explained by the more labile nature of magnesium coordination when compared with manganese (33).
Specific Substrate-interacting Residues of the Active Site-The substrate-binding pocket is lined by both hydrophobic and polar residues. The uracil ring is sandwiched between the hydrophobic residues Phe17 and Leu43, which make van der Waals contacts on either side of the ring. Furthermore, O2, N3, and O4 of the uracil moiety interact with three water molecules, W3, W4, and W5, respectively (Fig. 5B). W3 bonds to Arg40 (Nη1) and Arg40 (Nη2), and W4 bonds to Arg40 (Nη1) and His18 (Nε2), whereas W5 is involved in a hydrogen bond to W4. The two uracil-anchoring residues, His18 and Arg40, are highly conserved within the DR2231 subfamily (Fig. 9).
Atom O4 of the uracil is also bound to the nitrogen of the peptide bond of Gln123. This suggests that there may still be space for dATP or dGTP, depending on the extent of accommodation given by the lid-like latch, although a network of interactions similar to that seen with dUMP is not foreseeable with these two nucleotides. On the other hand, the proximity of the backbone of the lid-like loop of the second subunit appears to sterically hinder the methyl group of thymine.
One of the divalent cation-binding residues, Asp82 (through Oδ1), coordinates the O3′ of the deoxyribose moiety, along with Asn109 (from the second subunit). In the vicinity of the C2′ of the deoxyribose are Tyr85 and Val86, whose hydrophobic character would not leave room for the 2′-hydroxyl group of a ribose moiety. Conservation of hydrophobic residues in this region is a feature of the dimeric dUTPase, dCTPase, and HisE families but not of the MazG family. In the DR2231 subfamily, Tyr85 and Val86 are included in a highly conserved stretch of residues (positions 85-89). Another feature, quite exclusive to DR2231, concerns the deoxyribose: Lys112 binds both O4′ and O5′, securing a proper orientation of the triphosphate moiety for nucleophilic attack by the catalytic water. This residue is strictly conserved within the subfamily.
Finally, the α-phosphate is bound by Glu46 (when the second divalent cation is present), Lys122, and Lys125. The latter two belong to the swapped lid-like loop, a hallmark motif of this subfamily.
(Deoxy)NTP Pyrophosphatase Activity-Recent literature on the enzymatic activity of MazG (nucleoside triphosphate pyrophosphohydrolase)-related proteins has suggested that these enzymes are active on canonical (d)NTPs (8,34,35). However, there has also been debate as to whether MazG-like proteins may be equally involved in the removal of non-canonical nucleotides (36). Purified DR2231 was tested for (d)NTP-PPase activity against ATP, GTP, UTP, CTP, dATP, dGTP, dTTP, dCTP, and dUTP, using an indirect colorimetric malachite green assay. Cleavage of the NTP is expected to occur at the α-β phosphate bond, with the release of PPi as one of the products. Proper binding of the dye and colorimetric measurement require the pyrophosphate to be further cleaved into free Pi. To this end, inorganic pyrophosphatase is used, and magnesium must be present in the assay. A simple setup allows quick evaluation of specificity (Fig. 6, A and B).
No hydrolysis was observed for the canonical NTPs (ATP, GTP, CTP, and UTP). The same amount of enzyme showed no specificity toward dTTP. The general order of reactivity for dNTPs is dUTP ≥ dATP > dGTP > dCTP ≥ dTTP. Note that the hydrolysis of dUTP was by far the strongest. This screen was repeated with a 10³-fold dilution of the enzyme, showing a strong specificity for dUTP and an almost negligible specificity for the others (Fig. 6B). Linear velocities were measured at a 5 μM concentration of each of these dNTPs, confirming the results obtained from the screen (Fig. 6C). As expected, DR2231 cleaves the α-β phosphate bond, releasing (deoxy)NMP and PPi. Quite unexpected was its marked preference for dUTP, although none of the available reports on MazG proteins has included tests for dUTPase activity (8,34,35).
Metal preference was also tested by determining initial velocities for dUTP hydrolysis in the presence of Mn²⁺, Ca²⁺, Ni²⁺, and Zn²⁺ (Fig. 6D). Consistent with the crystallographic data, both manganese and magnesium favor enzymatic activity, whereas calcium and all other metals tested support only residual activity.
Isothermal Calorimetric Assays-The requirement of magnesium for inorganic pyrophosphatase activity and the fact that the colorimetric assay is in itself an indirect method called for a more direct technique, such as ITC. This technique allows assessment of possible differences in catalysis between manganese and magnesium. Fig. 7A shows a typical experimental thermogram for the titration of DR2231 protein with dUTP in the presence of 25 mM MgCl₂ (at pH 7.0 and 25 °C). Three (single) injections were performed, and the profile of the trace is that of an enzyme-catalyzed reaction with product inhibition. The raw data from Fig. 7A (indicated by a box) were transformed into a plot of reaction rate versus substrate concentration (Fig. 7D). Kinetic parameters were calculated by fitting the curve to the Michaelis-Menten equation using non-linear regression, following the method described by Todd and Gomez (28). The experiment was performed under conditions where product inhibition may be considered negligible for the first injection. The subsequent injections show a shallower peak, due to inhibition by dUMP. Such behavior has been reported for Plasmodium falciparum dUTPase (29) and T. cruzi dUTPase (37). Apparent kinetic parameters (Km(app), kcat, and catalytic turnover) are listed in Table 3. These values fall within a range similar to those reported for P. falciparum (29), a trimeric dUTPase. They show, nonetheless, a weaker specificity and catalytic turnover, when compared with other
An identical experiment was conducted with 25 mM MnCl₂ in place of MgCl₂, and parameters were calculated in the same fashion (Fig. 7D, gray curve). Km(app) and kcat values are similar, although the enzyme appears to be catalytically less efficient with manganese (Table 3). The dUTPase from P. falciparum showed a marked loss of catalytic turnover, ~17-fold, when Mn²⁺ was exchanged for Mg²⁺, against a 1.4-fold difference for DR2231 protein.
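As an illustration of the non-linear regression step, the following Python sketch fits the Michaelis-Menten equation, v = Vmax[S]/(Km + [S]), to reaction rate versus substrate concentration data. It is a minimal sketch and not the fitting routine used in this study: the function names, the search bounds for Km, and the synthetic data points are assumptions introduced for illustration, and none of the numbers correspond to Table 3. For a fixed Km the best Vmax has a closed form (linear least squares), so the fit reduces to a one-dimensional search over Km.

```python
import math


def michaelis_menten(s, vmax, km):
    """Michaelis-Menten rate for substrate concentration s."""
    return vmax * s / (km + s)


def best_vmax(substrate, rates, km):
    """Least-squares Vmax for a fixed Km (the model is linear in Vmax)."""
    x = [s / (km + s) for s in substrate]
    return sum(xi * vi for xi, vi in zip(x, rates)) / sum(xi * xi for xi in x)


def sse(substrate, rates, km):
    """Sum of squared residuals at a given Km, using the optimal Vmax."""
    vmax = best_vmax(substrate, rates, km)
    return sum((v - michaelis_menten(s, vmax, km)) ** 2
               for s, v in zip(substrate, rates))


def fit_michaelis_menten(substrate, rates, log_km_lo=-3.0, log_km_hi=3.0):
    """Return (vmax, km) minimizing the squared error.

    Hypothetical helper: a coarse grid over log10(Km) brackets the minimum,
    and a golden-section search refines it inside that bracket.
    """
    n = 61
    step = (log_km_hi - log_km_lo) / (n - 1)
    grid = [log_km_lo + i * step for i in range(n)]
    errors = [sse(substrate, rates, 10 ** g) for g in grid]
    i = errors.index(min(errors))
    a = grid[max(i - 1, 0)]
    b = grid[min(i + 1, n - 1)]

    phi = (math.sqrt(5) - 1) / 2
    c = b - phi * (b - a)
    d = a + phi * (b - a)
    fc = sse(substrate, rates, 10 ** c)
    fd = sse(substrate, rates, 10 ** d)
    for _ in range(100):
        if fc < fd:
            # Minimum lies in [a, d]: shrink from the right.
            b, d, fd = d, c, fc
            c = b - phi * (b - a)
            fc = sse(substrate, rates, 10 ** c)
        else:
            # Minimum lies in [c, b]: shrink from the left.
            a, c, fc = c, d, fd
            d = a + phi * (b - a)
            fd = sse(substrate, rates, 10 ** d)
    km = 10 ** ((a + b) / 2)
    return best_vmax(substrate, rates, km), km


# Synthetic, noise-free data in arbitrary units (NOT measured values).
TRUE_VMAX, TRUE_KM = 10.0, 4.0
substrate = [0.5, 1.0, 2.0, 5.0, 10.0, 20.0, 50.0]
rates = [michaelis_menten(s, TRUE_VMAX, TRUE_KM) for s in substrate]

if __name__ == "__main__":
    vmax_fit, km_fit = fit_michaelis_menten(substrate, rates)
    print(f"Vmax = {vmax_fit:.3f}, Km = {km_fit:.3f}")
```

With experimental data, the same function would be applied to the rates derived from the first injection, where product inhibition is considered negligible; the units of the fitted Vmax and Km follow those of the input data.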
Activity assays testing for dTTP hydrolysis by DR2231 protein were also performed (Fig. 7E). Dilution peaks of the substrate in the calorimetric vessel in the absence of enzyme were of the same amplitude as those measured in the presence of the enzyme. This confirms the complete lack of specificity of DR2231 protein for dTTP, in accordance with what had been observed in the colorimetric assays and with what has been reported for true dUTPases.

DISCUSSION
analysis, it was assigned to a distinct family. Unpredictable, however, was its striking affinity for dUTP, a feature untested in all reported MazG proteins to date, and it was shown to have a rather broad selectivity for (deoxy)nucleotides. Unique features of operon organization regarding this superfamily open debate as to whether DR2231 is the "missing" dUTPase in the D. radiodurans genome (4) or another MazG-like NTP-PPase in the notably overrepresented hydrolase families of this bacterium. To date, three entries encoding all-α NTP-PPases have been identified in the D. radiodurans genome. DR1183 is annotated as encoding the MazG protein, and both DR1022 and DR2231 are annotated as genes encoding conserved hypothetical proteins. There is no biochemical or structural information available on these proteins in the literature. A close focus on the genomic context of these genes may provide clues toward their function (40).
In E. coli the genetic module mazEF consists of two adjacent genes, mazE and mazF, located downstream from the relA gene. The mazG gene is the fourth in the operon, located immediately downstream of mazF (41). In D. radiodurans, however, the mazEF operon is organized differently; the addiction module mazE/mazF (corresponding to loci DR0416/DR0417, respectively) is not preceded by the relA gene, or rather the spoT gene, as it is named in Gram-positive bacteria (putatively DR1838), and the mazG gene (assigned to DR1183) does not follow the addiction module. This gene dispersion is not unique among bacteria regarding this operon, and for D. radiodurans, in particular, it further highlights horizontal gene transfer as a rich source of genetic diversity (42).
In contrast with other bacteria, D. radiodurans appears to lack MazG(-like) proteins with a tandem arrangement of two domains; both DR1183 and DR1022 proteins hold, according to sequence analysis, only one copy of the core helical domain, with one active site per monomer, as does DR2231. DR1183 and DR1022 each neighbor a Nudix hydrolase gene (43,44). Deducing possible functions from genomic context has led to the proposal that MazG(-like) domains may be involved in non-canonical NTP processing. Curiously, however, DR2231 has no gene neighbors with pyrophosphohydrolase or phosphatase activity that might provide clues toward its cellular function.
The crystal structures of DR2231 complexed with dUMP reported here show that the ligand-binding pocket cannot accommodate canonical nucleotides. This not only explains the lack of hydrolytic activity on all nucleoside triphosphates tested but also rules out, in particular, the stringent response molecule, ppGpp, as a substrate for DR2231 protein. In E. coli MazG, nonpolar aromatic residues constitute a hydrophobic pocket for bases without any interaction with the 2′-OH of the ribose ring, explaining how both oxy- and deoxyribonucleotides could be substrates of this enzyme (8). Although DR2231 has a remarkable structural and topological resemblance to MazG proteins, structural details within the active site relating to the binding of the (deoxy)ribose ring reveal its strong affinity to dUTPases.
Through their catalytic activity, dUTPases provide the precursor for the formation of dTMP by thymidylate synthase but simultaneously have a crucial role in maintaining a low dUTP/dTTP ratio in the cell in order to limit the incorporation of deoxyuridylate into DNA by DNA polymerases. Misincorporation of deoxyuridine can impair DNA integrity by producing a base pair change of C:G into U(T):A and, thus, a stable point mutation (for reviews, see Refs. 45 and 46).
Regarding dTTP synthesis, the precursor for de novo pyrimidine biosynthesis is the cytosine ring, which must undergo deamination to yield uracil, to which the methyl group can then be added to produce thymine. Cytosine deamination may occur at different levels; one is through the direct input provided by dCMP deaminases, which directly produce dUMP and are widespread in eukaryotes and most Gram-positive bacteria. The other pathway, common to Gram-negative bacteria, operates at the dCTP level and is catalyzed by dCTP deaminases, producing dUTP that must subsequently be converted into dUMP by a dUTPase. For some organisms, such as Mycobacteria and Plasmodia, this is the only route to dUMP (47,48). In the first pathway, however, dUMP supply from the dUTPase-catalyzed reaction has a minor role. Fig. 8 illustrates the de novo and salvage pathways for dTTP biosynthesis. D. radiodurans genes that could be identified with the various enzymes participating in these pathways are represented. Although D. radiodurans has many features reminiscent of Gram-negative bacteria and many others that are entirely unique, it is classified as Gram-positive (49,50). Accordingly, we could not identify a gene that encodes a dCTP deaminase with significant homology to those known to exist in Gram-negative bacteria. This suggests that in D. radiodurans, the de novo production of dTMP relies mostly on the dCMP pathway or on the salvage alternative directly from thymidine. We propose that the dUTPase activity detected in DR2231 protein may correspond to the house-cleaning task of removing dUTP arising from deamination of dCTP and, occasionally, as yet unidentified non-canonical NTPs. It would be interesting to assess the latter experimentally with naturally occurring oxidative products of cytosine, such as 5-hydroxycytosine and 5-hydroxyuracil.
There are reasons to believe that the catalytic mechanism of DR2231 protein is similar, if not identical, to that of C. jejuni dUTPase (6). The orientation of the product relative to the metal-binding site is very similar to that observed in the structure of C. jejuni dUTPase complexed with dUpNHpp. By comparing the DR2231_dUMP_Mn structure with the C. jejuni structure, more information may be inferred; W1 possibly occupies the position of the γ-phosphate of dUTP after catalysis, and another water molecule, W2 (water molecule 107), bound to Oδ2 of Asp82, may correspond to the catalytic water identified in C. jejuni dUTPase, thus regenerating the active site. A third manganese ion was never observed in our co-crystallizations with dUMP. Still, the residues that bind Mg3 in C. jejuni dUTPase (Glu49 and Glu79) are aligned with those of DR2231 protein, Glu79 and Glu50, leaving open the possibility of a third, albeit labile, coordination site, probably occupied when substrate is bound. Other elements in the pocket appear likely to be involved in β- and γ-phosphate binding and stabilization. These residues are possibly His78 and Lys125, the latter from the second subunit. We further postulate that one of the most crucial functional residues is Lys112, located at the end of H4. Together with Asn109, it has a dual role, providing both orientation and stabilization of the deoxynucleotide. As the substrate binds, it induces conformational changes in the hinge, forcing the entire latch to move and close upon the binding pocket.
From sequence alignment analysis of a non-redundant database, it is clear that only putative DR2231 family members show high homology for the "lid" region (Fig. 9). This structural feature is unique to the DR2231 subfamily and can be considered as its identity signature.
The affinity of DR2231 for Mn²⁺ is not entirely understood. It has been reported that D. radiodurans accumulates very high intracellular manganese levels and that such accumulation may be related to radiation and desiccation resistance (51). This accumulation suggests that Mn(II) facilitates recovery from radiation injury. In particular, it is essential for the detoxification of reactive oxygen species in most bacteria and also prevents the production of iron-dependent reactive oxygen species (52). More recently, other studies showed that the extraordinary robustness of D. radiodurans to proteome oxidation (i.e. protein carbonylation) depends on efficient proteome protection (not DNA protection) conferred by low molecular weight cytosolic compounds (molecular species of less than 3 kDa) (53). Daly et al. (54) showed that ultrafiltered, protein-free preparations of D. radiodurans cell extract prevent protein oxidation at massive doses of ionizing radiation, whereas those of radiation-sensitive bacteria have no protective effect at all. The D. radiodurans ultrafiltrate is enriched in manganese, phosphate, nucleosides, bases (of which the most highly represented are uridine, uracil, adenosine, and inosine), and peptides.
Given its putative house-cleaning role, we may speculate whether expression of DR2231 protein is connected to situations of oxidative stress in which intracellular Mn²⁺ levels are high. It should be noted, however, that according to reported transcriptome dynamics and proteomic analysis studies, DR2231 protein is not among the gene products identified as overexpressed upon prolonged irradiation (55,56).
The gene product of DR2231 may not be an essential protein for D. radiodurans as far as the pyrimidine metabolic pathway is concerned, but it ensures cell survival by balancing dUTP availability in the cell pool and by delivering dUMP, which is further processed into uridine and uracil elsewhere, as well as inorganic phosphate, important in proteome radioprotection (53). It stands out in the genome of D. radiodurans as one among many in the set of NTP pyrophosphatases that have possibly been acquired through horizontal gene transfer. The findings reported here on the structure and function of DR2231 protein support the view that the MazG-like subunit may be considered the common ancestor of other family members, in particular, dimeric dUTPases.