Functional Diversity of Haloacid Dehalogenase Superfamily Phosphatases from Saccharomyces cerevisiae

Background: Haloacid dehalogenase (HAD)-like hydrolases represent the largest superfamily of phosphatases.
Results: Biochemical, structural, and evolutionary studies of the 10 uncharacterized soluble HADs from Saccharomyces cerevisiae provided insight into their substrates, active sites, and evolution.
Conclusion: Evolution of novel substrate specificities of HAD phosphatases shows no strict correlation with sequence divergence.
Significance: Our work contributes to a better understanding of an important model organism.

The haloacid dehalogenase (HAD)-like enzymes comprise a large superfamily of phosphohydrolases present in all organisms. The Saccharomyces cerevisiae genome encodes at least 19 soluble HADs, including 10 uncharacterized proteins. Here, we biochemically characterized 13 yeast phosphatases from the HAD superfamily, which includes both specific and promiscuous enzymes active against various phosphorylated metabolites and peptides with several HADs implicated in detoxification of phosphorylated compounds and pseudouridine. The crystal structures of four yeast HADs provided insight into their active sites, whereas the structure of the YKR070W dimer in complex with substrate revealed a composite substrate-binding site. Although the S. cerevisiae and Escherichia coli HADs share low sequence similarities, the comparison of their substrate profiles revealed seven phosphatases with common preferred substrates. The cluster of secondary substrates supporting significant activity of both S. cerevisiae and E. coli HADs includes 28 common metabolites that appear to represent the pool of potential activities for the evolution of novel HAD phosphatases. Evolution of novel substrate specificities of HAD phosphatases shows no strict correlation with sequence divergence. Thus, evolution of the HAD superfamily combines the conservation of the overall substrate pool and the substrate profiles of some enzymes with remarkable biochemical and structural flexibility of other superfamily members.

Characterization of proteins with unknown functions is one of the major challenges of modern biology (1). Global genome and metagenome sequencing efforts have already produced millions of new sequences, in which 30-40% of the genes have no known function (2-4). Even for the two best characterized model organisms, Escherichia coli and Saccharomyces cerevisiae, ~20% of the genes have no function assigned or have only a general function predicted (e.g. putative hydrolase), and experimental data are available for only 54% of E. coli proteins (5,6). In addition, a substantial and growing number of genes have inaccurate functional annotations. Focused analyses of new sequences for members of 37 protein families deposited in 2005 have shown that about 40% of these proteins remain misannotated (7,8). Our knowledge gap also includes over 1,000 of the known enzyme activities in the Enzyme Classification (of the 4,997 EC numbers) that have no associated gene sequence (orphan enzymes) (9,10).
To infer gene function, complementary computational and experimental approaches and their combinations have been used, including sequence analysis, comparative genomics, gene expression and disruption, protein interaction, and protein structure, but ultimately all depend on experimental testing (3,11-21). Recently, the Computational Bridge to Experiments (COMBREX) consortium coordinated a community of computational and experimental scientists to generate functional predictions for the most interesting families of unknown proteins and to carry out experimental characterization (22). Another large scale project, the Enzyme Function Initiative (EFI), merges experimental approaches with computationally based function predictions (23). The EFI combines bioinformatics, protein structure analysis, and enzymology to assign reliable functions to unknown enzymes from microbial genomes. Both projects have considerable potential to accelerate functional characterization of unknown proteins.
Most enzymes are organized into families of sequence-related proteins whose members catalyze the same or similar reactions but have evolved different substrate preferences and specific biological functions. These families represent a challenge for functional annotation because their catalytic activities and substrates are likely to be very similar. One such group comprises the haloacid dehalogenase (HAD)-like hydrolases, which represent one of the largest enzyme superfamilies found in all organisms, with 479,051 sequences in databases (InterPro IPR023214) and 33 major families (24,25). Most genomes are predicted to contain multiple HAD-like proteins, including 28 genes in E. coli, 45 genes in S. cerevisiae, and 183 genes in humans (26). This superfamily was originally named after the haloacid dehalogenases, but it is dominated by putative phosphatases (~79%) and ATPases (~20%) and also includes phosphonatases and phosphomutases (25-27). Specifically, the HAD-like phosphatases are a family of diverse enzymes that are responsible for the majority of metabolic phosphomonoester hydrolysis reactions in all kingdoms of life (26,28). Extensive sequence comparisons show that the HAD-like hydrolases are defined by the presence of four conserved sequence motifs with the characteristic N-terminal motif I containing two Asp residues (DXD), whereas motifs II and III contain a highly conserved Thr (or Ser) and Lys, respectively (24,25). Most HAD-like hydrolases contain a highly conserved α/β core domain with a mobile cap domain, which can be inserted between motifs I and II (type I or C1) or between motifs II and III (type II or C2) of the core domain (25,29). Type III HAD-like hydrolases have no cap domain (C0).
HAD-like phosphatases can perform different cellular roles, including primary and secondary metabolism, regulation of enzyme activity or protein assembly, cell housekeeping, and nutrient uptake (26). To date, many HAD-like phosphatases from different organisms have been characterized both biochemically and structurally, including the phosphoserine phosphatase SerB from Methanococcus jannaschii (30), phosphoglycolate phosphatase TA0175 from Thermoplasma acidophilum (31), UMP nucleotidase NagD from E. coli (32), and inorganic pyrophosphatase BT2127 from Bacteroides thetaiotaomicron (33). In particular, phosphoprotein phosphatase activity has been demonstrated for several eukaryotic HAD-like hydrolases: human CTDP1 (or FCP1; Ser(P) phosphatase) (34), Drosophila and mammalian Eyes absent (Tyr(P) phosphatase) (35-37), and mammalian chronophin (Ser(P) phosphatase) (38,39). In a previous study, we experimentally characterized the substrate specificities of the 19 soluble HAD-like phosphatases from E. coli and demonstrated that most of the E. coli HADs show remarkably broad and overlapping substrate profiles, being active against phosphorylated carbohydrates, nucleotides, organic acids, and coenzymes (40). A phylogenetic analysis of the E. coli HADs suggested that their secondary activities might have no direct physiological function, but they could comprise a reservoir for the evolution of phosphatases with novel specificities.
Here, we present the results of biochemical and structural studies of soluble HAD-like hydrolases from S. cerevisiae with a focus on the 10 uncharacterized HADs. Eight previously uncharacterized HADs had phosphatase activity. Additional substrates were found for the five known S. cerevisiae HAD phosphatases. Collectively, these were active against 2-phosphoglycolate, thiamine monophosphate, pyridoxal phosphate, phosphoserine, glycerol 1-phosphate (Gly-1-P), nucleotides, and phosphorylated peptides. The crystal structures of four S. cerevisiae HADs provided insight into the molecular basis of their substrate specificity. Although the S. cerevisiae and E. coli HADs share low overall sequence similarities, these enzymes show remarkable conservation of their biochemical activities.

Experimental Procedures
Gene Cloning, Protein Purification, and Mutagenesis-The genes encoding 15 selected yeast HADs (Table 1) were amplified by PCR from S. cerevisiae genomic DNA and cloned into a modified pET15b vector (Novagen) as described previously (40). Purification of proteins for screening and biochemical characterization was performed as described previously (48). The oligomeric state of purified proteins was determined using gel filtration analysis on a Superdex 200 10/300 column (GE Healthcare). Site-directed mutagenesis of YKR070W was performed using a protocol based on the QuikChange site-directed mutagenesis kit (Stratagene) as described previously (48). The presence of mutations was verified by DNA sequencing, and the mutant proteins were overexpressed and purified in the same manner as the wild-type YKR070W.
Enzymatic Screens and Assays-Purified yeast HADs were initially screened for the presence of phosphatase activity against the general phosphatase substrates pNPP and acetyl phosphate as described previously (40). Secondary phosphatase screens with 93 phosphorylated metabolites (supplemental Table 2) were then performed to identify the preferred in vitro substrates for these proteins (40). Phosphatase activity against phosphorylated metabolites and phosphopeptides was measured spectrophotometrically using the malachite green reagent or a mild phosphate detection method (48). The dependence of phosphatase activity on divalent metal cations was determined using saturating concentrations of the indicated substrates and cations (5 mM Mg²⁺ or 0.5 mM for other ions), whereas the pH dependence was determined using a mixed buffer system (MEGA buffer) (49). For the determination of kinetic parameters (Km and kcat), enzymatic assays were performed using a range of substrate concentrations (0.05-10 mM) or a range of metal ion concentrations (0.005-2.5 mM) in the presence of saturating substrate concentrations (0.5-0.8 mM). Kinetic parameters were calculated by non-linear regression analysis of raw data to fit to the Michaelis-Menten function using GraphPad Prism Software (version 4.00 for Windows, GraphPad Software, San Diego, CA). A heat map of HAD activity was generated using the gplots heatmap.2 function, whereas hierarchical clustering of HADs (based on substrate profiles) was calculated using Euclidean distance, and groups were clustered using the complete linkage method (R Foundation for Statistical Computing).
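The non-linear regression step described above can be illustrated with a minimal sketch. It is not the GraphPad Prism procedure used in this work; it assumes a plain damped Gauss-Newton least-squares fit of v = Vmax·[S]/(Km + [S]), and the function names and the synthetic data are hypothetical.

```python
def michaelis_menten(s, vmax, km):
    """Initial rate predicted by the Michaelis-Menten function."""
    return vmax * s / (km + s)


def fit_michaelis_menten(substrate, rates, iterations=100, tol=1e-12):
    """Least-squares estimate of (Vmax, Km) by damped Gauss-Newton."""
    # Starting guesses: the highest observed rate, and the substrate
    # concentration whose rate lies closest to half of that maximum.
    vmax = max(rates)
    km = min(zip(substrate, rates), key=lambda p: abs(p[1] - vmax / 2))[0]

    def sse(vm, k):
        return sum((v - michaelis_menten(s, vm, k)) ** 2
                   for s, v in zip(substrate, rates))

    for _ in range(iterations):
        # Accumulate the 2x2 normal equations (J^T J) d = J^T r.
        a11 = a12 = a22 = b1 = b2 = 0.0
        for s, v in zip(substrate, rates):
            d_vmax = s / (km + s)                # df/dVmax
            d_km = -vmax * s / (km + s) ** 2     # df/dKm
            r = v - michaelis_menten(s, vmax, km)
            a11 += d_vmax * d_vmax
            a12 += d_vmax * d_km
            a22 += d_km * d_km
            b1 += d_vmax * r
            b2 += d_km * r
        det = a11 * a22 - a12 * a12
        if det == 0.0:
            break
        step_vmax = (a22 * b1 - a12 * b2) / det
        step_km = (a11 * b2 - a12 * b1) / det

        # Halve the step until Km stays positive and the error does not grow.
        current = sse(vmax, km)
        scale = 1.0
        while scale > 1e-8:
            new_vmax = vmax + scale * step_vmax
            new_km = km + scale * step_km
            if new_km > 0 and sse(new_vmax, new_km) <= current:
                break
            scale /= 2.0
        else:
            break  # no acceptable step was found
        converged = abs(new_vmax - vmax) < tol and abs(new_km - km) < tol
        vmax, km = new_vmax, new_km
        if converged:
            break
    return vmax, km


# Synthetic, noise-free example spanning the 0.05-10 mM range used in the assays.
example_conc = [0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0]
example_rates = [michaelis_menten(s, 12.0, 0.4) for s in example_conc]
fitted_vmax, fitted_km = fit_michaelis_menten(example_conc, example_rates)
```

Dividing the fitted Vmax by the molar enzyme concentration in the assay gives kcat.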
The reactions were performed in a reaction mixture (20 μl final volume) containing 50 mM HEPES-K (pH 7.5), 10 mM CoCl2, 1 mM pseudouridine triphosphate (Ψ-UTP), and 2.5 μg of YOR111W or BSU28050 for Ψ-UTPase reactions (to generate Ψ-UMP) followed by the addition of purified yeast HADs (PHM8, SDT1, or YKL033W-A; 5 μg) for Ψ-UMPase assays. The reactions were carried out at 30°C for 2 h and analyzed using reversed phase chromatography on a Varian ProStar HPLC system equipped with a Varian Pursuit C18 column as described previously for Maf proteins (50). Standard solutions of Ψ-UTP and Ψ were used to confirm the identity of the observed peaks and products.
The S. cerevisiae haploid mutant ycr015cΔ (genotype BY4741; Mat a; his3D1; leu2D0; met15D0; ura3D0; YCR015c::kanMX4) was obtained from the Euroscarf collection, and the YCR015c deletion was verified by PCR. The ycr015cΔ and the corresponding wild type strain BY4741 were grown at 30°C in a synthetic defined medium without thiamine (SD −thiamine) that consisted of yeast nitrogen base without amino acids, without ammonium sulfate, and without thiamine (1.9 g/liter; ForMedium), −Ura DO supplement (0.77 g/liter; Clontech), and uracil (180 mg/liter; Sigma) with glucose at 2% (w/v) final concentration. Cells were harvested when A600 reached 0.6, frozen in liquid nitrogen, and stored at −80°C. Cell pellets were resuspended in 0.5 ml of 7.2% (v/v) perchloric acid and sonicated. The sonicate was held on ice for 15 min with periodic vortex mixing and then cleared by centrifugation at 4°C (2,000 × g, 15 min). Thiamine and its phosphates were analyzed by oxidation to thiochrome derivatives followed by HPLC with fluorometric detection (51). The oxidation reagent was a freshly prepared solution of 12.14 mM potassium ferricyanide in 3.35 M NaOH. Samples or standards (160 μl) were mixed with 15 μl of methanol; 100 μl of oxidation agent was added and mixed for 60 s, and 100 μl of 1.43 M phosphoric acid was then added. The standards (thiamine, TMP, and thiamine pyrophosphate (TPP) dissolved in 0.1 M HCl) were made up in 7.2% (v/v)
Analysis of Protein-Protein Interactions-The physical interactions of the S. cerevisiae YKR070W were analyzed using an affinity tagging and purification mass spectrometry approach essentially as described previously (52). The C-terminally GFP-tagged YKR070W was purified using anti-GFP MicroBeads, and the associated proteins were analyzed using a high performance linear trap quadrupole Orbitrap Velos Pro mass spectrometer (Thermo Scientific, Waltham, MA). After filtering the data set with a confidence score of 95% probability and 2 or more peptides, we were able to retain 119 protein-protein interactions. This data set was merged with previously known interactions extracted from the literature and public databases (BioGRID, MINT, IntAct, and DIP), resulting in 127 unique protein-protein interactions. These were then subjected to the GO Slim Mapper deployed in SGD (53) to group the interacting proteins into specific processes.
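The filtering and merging logic described above can be sketched as follows. This is only an illustration under stated assumptions: the record layout, the function names, and the prey identifiers (PREY_A to PREY_D) are hypothetical, and the thresholds are taken from the text (probability of at least 95% and two or more peptides). Interactions are treated as unordered pairs, so that A-B and B-A from different sources count once.

```python
from typing import NamedTuple


class Interaction(NamedTuple):
    bait: str
    prey: str
    probability: float  # confidence score between 0 and 1
    peptides: int       # number of peptides supporting the prey


def filter_confident(hits, min_probability=0.95, min_peptides=2):
    """Keep hits that pass both the probability and the peptide-count cutoffs."""
    return [h for h in hits
            if h.probability >= min_probability and h.peptides >= min_peptides]


def merge_unique(experimental, known_pairs):
    """Union of experimental and literature pairs, ignoring pair orientation."""
    pairs = {frozenset((h.bait, h.prey)) for h in experimental}
    pairs.update(frozenset(pair) for pair in known_pairs)
    return pairs


# Made-up records for illustration only; they are not data from this study.
hits = [
    Interaction("YKR070W", "PREY_A", 0.99, 5),
    Interaction("YKR070W", "PREY_B", 0.97, 1),   # fails the peptide cutoff
    Interaction("YKR070W", "PREY_C", 0.80, 4),   # fails the probability cutoff
    Interaction("YKR070W", "PREY_D", 0.95, 2),
]
known = [("PREY_A", "YKR070W"), ("YKR070W", "PREY_E")]

confident = filter_confident(hits)
merged = merge_unique(confident, known)
```

In this toy case the reversed literature pair (PREY_A, YKR070W) collapses onto the experimental hit, which mirrors how a merged set can be smaller than the sum of its sources.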
Protein Crystallization-The S. cerevisiae HAD-like proteins were crystallized at room temperature using the sitting or hanging drop vapor diffusion protocols. The crystals of RHR2 were grown in a crystallization solution containing 100 mM Bistris propane buffer (pH 7.0) and 2.5 M ammonium sulfate. Mercury-labeled crystals were obtained by overnight soaking of crystals in 10 mM HgCl2 followed by cryoprotection in 3.6 M ammonium sulfate and flash freezing in liquid nitrogen. The selenomethionine-labeled YKR070W (54) was crystallized in a solution containing 100 mM potassium/sodium phosphate (pH 6.2) and 26% (w/v) polyethylene glycol (PEG) 3350. The crystals were treated with paratone oil as the cryoprotectant and flash-frozen in liquid nitrogen. The crystals of the YKR070W complex with glycerol 3-phosphate (Gly-3-P) were obtained by co-crystallization in a solution containing 100 mM HEPES-K (pH 7.5), 1.2 M sodium citrate, 0.2 mM MgCl2, and 4 mM Gly-3-P (added to the crystallization sitting drop prior to set-up). The SDT1 crystals were grown using native protein in a solution containing 100 mM BisTris (pH 6.5) and 25% (w/v) PEG 2000 MME (monomethylether). Prior to freezing, 25% (v/v) ethylene glycol was added for cryoprotection. Mercury labeling was performed by adding 6 mM HgCl2 to the crystallization drop followed by a 4-h incubation.
Data Collection, Structure Determination, and Refinement-Single-wavelength anomalous diffraction (SAD) data sets were collected for the RHR2 (Hg-SAD) and YKR070W (Se-SAD) crystals at 12.66 keV at the Structural Biology Center at the Advanced Photon Source (19-ID) (55). The low resolution data sets were collected on mercury-labeled SDT1 crystals (data not shown) using a Rigaku Micromax-007 HF generator equipped with a CR Raxis-4++ detector and producing chromium Kα radiation, whereas the refinement of the SDT1 structure was performed with native crystal data sets collected using copper Kα radiation. Data collection, integration, and scaling on all data were performed with the HKL3000 suite of programs (56).
The initial phases of RHR2, SDT1, and YKR070W were determined by SAD phasing, and the initial protein models were built using the HKL3000 software package (57-62). A summary of the crystallographic data can be found in Table 2. The structure of the YKR070W·Gly-3-P complex was solved by molecular replacement using the program MOLREP from the CCP4 program suite and the YKR070W structure as a search model (60,63). All protein models required substantial rebuilding and refining using the program COOT in order to obtain the final models (57). The models were refined against all reflections in the resolution range except for a randomly selected 5% of reflections, which were used for monitoring Rfree. The quality of the structures was checked using the validation tools included in the programs COOT and Molprobity (64). The final refinement statistics are shown in Table 2.
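The idea of holding out a random 5% of reflections for Rfree can be sketched as below. This is a simplified illustration rather than the procedure implemented in the refinement programs used here: the function names are hypothetical, reflections are represented only as pairs of observed and calculated amplitudes, and the scale factor between them is assumed to be 1. The residual is the conventional R = Σ||Fobs| − |Fcalc|| / Σ|Fobs|, evaluated on the working set for Rwork and on the held-out set for Rfree.

```python
import random


def split_free_set(reflections, free_fraction=0.05, seed=0):
    """Randomly reserve a fraction of reflections as the free (test) set."""
    rng = random.Random(seed)
    n_free = round(len(reflections) * free_fraction)
    free_idx = set(rng.sample(range(len(reflections)), n_free))
    work = [r for i, r in enumerate(reflections) if i not in free_idx]
    free = [r for i, r in enumerate(reflections) if i in free_idx]
    return work, free


def r_factor(pairs):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over (Fobs, Fcalc) pairs."""
    numerator = sum(abs(f_obs - f_calc) for f_obs, f_calc in pairs)
    denominator = sum(f_obs for f_obs, _ in pairs)
    return numerator / denominator


# Made-up amplitudes for illustration only.
reflections = [(100.0 + i, 95.0 + i) for i in range(200)]
work_set, free_set = split_free_set(reflections)
r_work = r_factor(work_set)   # monitored and minimized during refinement
r_free = r_factor(free_set)   # never refined against, used for cross-validation
```

Because the free set is excluded from refinement, a large gap between the two residuals is a common warning sign of over-fitting.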
Accession Numbers-The atomic coordinates and structure factors have been deposited in the Protein Data Bank with accession codes 2QLT (RHR2), 3NUQ (SDT1), 3KC2 (the wild type YKR070W), and 3RF6 (the YKR070W D19A in complex with Gly-3-P).

Results
The Complement of Soluble HADs in S. cerevisiae-The S. cerevisiae genome encodes at least 45 HAD-like hydrolases, of which 19 proteins have been predicted to be membrane-bound ATPases (PEDANT database) (supplemental Table 1). Previously characterized yeast HADs include the RNA polymerase II subunit A C-terminal domain phosphatase FCP1 (YMR277W); the phosphatidate phosphatase PAH1 (YMR165C); the three trehalose-phosphate synthase-phosphatases TPS2 (YDR074W), TSL1 (YML100W), and TPS3 (YMR261C); the PAH1 protein phosphatase NEM1 (YHR004C); the DNA 3′-phosphatase TPP1 (YMR156C); the mitochondrial phosphatidylglycerophosphatase GEP4 (YHR100C); the phosphomannomutase SEC53 (YFL045C); the pyrimidine- and NMN-specific 5′-nucleotidases ISN1 (YOR155C) and SDT1 (YGL224C); and the protein-fructosamine-6-phosphatase MDP-1 (YER134C) (43-47,65-70). In addition, specific substrates have been identified for two pairs of paralogous HADs, namely the 2-deoxyglucose-6-phosphate phosphatases DOG1 and DOG2 (92% sequence identity) and the glycerol-1-phosphate phosphohydrolases RHR2 (GPP1) and HOR2 (GPP2) (95% sequence identity) (41,42). Although purified PHM8 protein has been shown to specifically hydrolyze lysophosphatidic acid, the reported specific activities were at a low nanomolar level (up to 3 nmol min⁻¹ mg⁻¹ protein) (71). Sequence analysis of the 15 selected HAD-like hydrolases confirmed the presence of conserved HAD motifs in all of these proteins (supplemental Fig. 1). Based on the cap domain architecture, they mostly belong to the HAD C1 group (the cap domain is inserted between the first and second HAD motifs) with only two C2 proteins (the cap domain is located between the second and third HAD motifs) (Fig. 1). According to a previous sequence analysis (25), the yeast HADs of the C1 group can be further divided into five families, including PSP, Epo, BGPM, Dehr, and Eno. Members of each family that are likely to be (co)orthologs of the respective yeast HADs were also detected in E. coli and, for three of the families, in humans as well (Fig. 1).
Screening of Purified Yeast HADs for Phosphatase Activity-We expressed in E. coli and purified 10 uncharacterized S. cerevisiae HAD-like proteins as well as five HAD enzymes with identified substrates (DOG1, DOG2, HOR2, RHR2, and SDT1) to confirm these activities and to explore the possibility of their having additional activities (Table 1). The proteins were affinity-purified to over 95% homogeneity and first analyzed for Mg²⁺-dependent phosphatase activity using the generic phosphatase substrate p-nitrophenyl phosphate (pNPP) (21). Except for YKL033W-A, most yeast HADs showed readily detectable Mg²⁺-dependent hydrolysis of pNPP (Table 1), with the highest activity observed for PHO13 (~200 μmol min⁻¹ mg⁻¹ protein), which has been annotated as a pNPP phosphatase (72). Like the E. coli HADs (40), the majority of S. cerevisiae HADs also hydrolyzed the small phosphodonor substrate acetyl phosphate (data not shown).
Several HAD domain-containing proteins have protein phosphatase activity (37-39,73). Therefore, we screened the purified yeast HADs against a library of 65 synthetic oligopeptides containing Ser(P), Thr(P), or Tyr(P) (supplemental Table 3). The sequences of the 46 phosphopeptides include the most common protein phosphorylation sites found in the S. cerevisiae phosphoproteome, whereas the remaining 19 phosphopeptides represent common protein phosphatase substrates related to various protein kinases (74,75). Four S. cerevisiae HADs showed Mg²⁺-dependent protein phosphatase activity, including SDT1 (Tyr(P)- and Ser(P)-containing peptides), YKR070W (Tyr(P)), YNL010W (Tyr(P) and Ser(P)), and YOR131C (Tyr(P) and Ser(P)) (supplemental Table 4). The targeted peptide sequences are present in the S. cerevisiae MAPKs HOG1, FUS3, and KSS1; in the regulatory proteins REG1 and MMF1; and in subunits of several complexes (Pat1p, RIF1, and NMD2). Although the presence of protein phosphatase activity has not yet been reported for E. coli HADs (25,40), this activity has been proposed for several microbial effector proteins with HAD domains from the pathogenic bacteria Porphyromonas gingivalis and Coxiella burnetii (76,77).
substrate for SDT1 (70). Indeed, our in vitro assays with purified SDT1 revealed high activity of this protein toward this substrate, which was comparable with that using CMP and XMP (Table 3). However, SDT1 showed lower Km for nucleoside monophosphates compared with NMN.
SER2 is annotated as a putative phosphoserine phosphatase based on sequence similarity to known Ser(P) phosphatases from the HAD superfamily. The S. cerevisiae SER2 shows 31.9% sequence identity to the E. coli SerB, but phosphoserine phosphatase activity of SER2 has not yet been verified experimentally. Our screens confirmed the Ser(P) phosphatase activity in purified SER2, which exhibited high catalytic activity against Ser(P) (Fig. 2, Table 3, and supplemental Table 5). SER2 and SerB had similar Km values for Ser(P), but SER2 was at least 2 times more active than SerB (Table 3) (40). The high catalytic efficiency of SER2 is consistent with its cellular function, because this enzyme catalyzes the last step in the biosynthesis of serine from carbohydrates (79).
We screened the remaining HADs and revealed phosphatase activity in seven proteins. Two enzymes showed nucleotidase activity (PHM8 and YKL033W-A), two had glycerol phosphatase activity (YNL010W and YKR070W), and three enzymes showed phosphohydrolase activity against 2-phosphoglycolate (P-glycolate; PHO13), thiamine monophosphate (thiamine-P; YCR015C), or pyridoxal 5-phosphate (PLP; YOR131C) (Fig. 2). In addition to glycerol phosphates, YKR070W was also active against phosphorylated metabolites with 4-6 carbon atoms as well as toward fructose 1,6-bisphosphate (Fig. 2). We could not detect phosphatase activity for UTR4 and YMR130W, although both proteins appear to be properly folded (based on their circular dichroism spectra; data not shown). This suggests that these (predicted) enzymes are specific for substrates that are missing in the substrate library used in this work. UTR4 is predicted to be involved in L-methionine biosynthesis, and its phosphatase domain is predicted to catalyze dephosphorylation of 5-(methylthio)-2,3-dioxopentyl phosphate (UniProt P32626). The natural substrate for YMR130W is not known. Other HAD members have β-phosphoglucomutase or dehalogenase activity, but neither was detected for YMR130W (data not shown).
Similar to the E. coli HADs (40), phosphatase activities of yeast HADs against natural substrates had slightly acidic or neutral pH optima (pH 6.5-7.5) and were strictly dependent on the addition of a divalent metal cation, with Mg²⁺, Mn²⁺, and Co²⁺ being the preferred metal ions for most enzymes (supplemental Table 6). In contrast to E. coli HADs, Ni²⁺ also supported phosphatase activity of PHO13, PHM8, SDT1, and SER2, whereas Zn²⁺ was found to be inefficient for all tested yeast HADs. The yeast HADs required higher concentrations of Mg²⁺ for maximal activity (Kd = 0.03-1.4 mM, depending on substrate used, mostly 0.2-0.5 mM), whereas Mn²⁺, Co²⁺, and Ni²⁺ were saturating at lower concentrations (Kd = 0.1-80 μM) (supplemental Table 6). Mg²⁺ was the best metal cofactor for most yeast HADs except for YKR070W, which showed a preference for Co²⁺. Like E. coli HADs, the yeast HADs usually showed higher Km in the presence of Mg²⁺ and lower Km with other metal ions (supplemental Table 5). In addition, several yeast HADs (DOG1, DOG2, PHM8, SER2, YKL033W-A, YKR070W, and YOR131C) exhibited sigmoidal substrate saturation curves, suggesting positive cooperativity in substrate binding with Hill coefficients nH = 1.4-1.8 (supplemental Table 5). This is in line with the presence of dimers in these protein preparations (supplemental Table 5). Thus, both E. coli and yeast HADs exhibit notable promiscuity toward divalent metal cations and positive cooperativity in substrate binding.
YKR070W, a Broad Substrate Range Mitochondrial Phosphatase-YKR070W is annotated as an unknown protein localized in mitochondria and has no homologs in E. coli. YKR070W shares 29% sequence identity with the uncharacterized human HAD-like protein CECR5, which is also localized in mitochondria and is associated with cat eye syndrome, a developmental disorder (UniProt Q9BXW7). Mitochondria are ubiquitous organelles that carry out many crucial processes in eukaryotic cells, including ATP production, gluconeogenesis, pentose phosphate pathway, and NAD metabolism (80). As shown in Fig. 2 and Table 3, purified YKR070W is a broad substrate range phosphatase with high activity against several phosphorylated carbohydrates and glycerol phosphates, which represent various intermediates of the respiratory, gluconeogenetic, and pentose phosphate pathways as well as NAD metabolism. Analysis of the YKR070W kinetic parameters revealed low Km values mostly below 1 mM, suggesting that this enzyme might contribute to the homeostasis of phosphorylated metabolites (Table 3 and supplemental Table 5). The intracellular levels of phosphorylated metabolites have to be tightly regulated because high levels are toxic and cause DNA damage and growth inhibition (81-84). In our previous work on E. coli HAD phosphatases, we proposed that several enzymes (YniC, YfbT, YbiV, YidA, YjjG, YihX, and YigL) might have a detoxification function in E. coli and experimentally demonstrated this role for one protein (YniC) (40). Recently, the phosphosugar detoxification role has been demonstrated for E. coli YigL (85). In S. cerevisiae, the phosphatase activity of YKR070W against a broad range of phosphorylated metabolites might represent a similar molecular mechanism for fast attenuation of phosphosugar stress by reducing their levels and promoting the efflux of dephosphorylated products.
Another potential in vivo role of YKR070W might be associated with its ability to dephosphorylate Tyr(P)-containing phosphopeptides demonstrated in this work (supplemental Table 4). Using affinity tagging and purification mass spectrometry, we identified a high number of protein-protein interactions of YKR070W with yeast proteins involved in metabolism, protein folding, translation, ribosomal RNA processing, transport, and phosphorylation, including the HAD-like glycerol-3-phosphatase HOR2 (Fig. 3). Presently, 36 protein phosphatase genes are known in S. cerevisiae, including 21 Ser(P)/Thr(P)-specific phosphatases and 14 Tyr(P)-specific phosphatases, none of which is related to the HAD superfamily (86,87). Mitochondrial localization has been demonstrated for the three yeast PP2C-like (Ser(P)/Thr(P)-specific) protein phosphatases PTC5 (YOR090C), PTC6 (AUP1, YCR079W), and PTC7 (YHR076W) (88) (Saccharomyces Genome Database). Human mitochondria have been shown to contain several non-HAD-related Ser(P)/Tyr(P)-specific protein phosphatases, including PP2Cm (PPM1K, PP2C family), PGAM5, and the Tyr(P)-specific protein phosphatase PTPMT1, which play important roles in ATP production, insulin secretion, and cell death regulation (89-91). Thus, YKR070W might also function as a protein phosphatase in S. cerevisiae.
PHM8 and YKL033W-A, Novel Yeast Nucleotidases-Previously, PHM8 (phosphate metabolism protein 8) has been reported to exhibit low nanomolar phosphatase activity against lysophosphatidic acid (71). However, this protein shares significant sequence similarity with the S. cerevisiae nucleotidase SDT1 (41.5% identity), and our screens with purified PHM8 revealed a much higher phosphatase activity against nucleotides, in the order CMP > XMP > GMP = UMP (Fig. 2 and Table 3) (92). With CMP as substrate, the enzyme showed hyperbolic saturation, whereas with XMP, sigmoidal saturation was observed with the Hill coefficient nH = 1.4-1.8, indicating positive cooperativity between the PHM8 subunits in XMP binding. This finding is consistent with the oligomeric state of PHM8 in solution, which is dimeric, as indicated by size exclusion chromatography (observed Mr of 64,000; predicted monomer Mr of 39,000).
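The difference between hyperbolic and sigmoidal saturation can be made concrete with the Hill equation, v = Vmax·[S]^nH / (K^nH + [S]^nH), where nH = 1 gives a hyperbola and nH > 1 indicates positive cooperativity. The sketch below is an illustration only: it estimates nH from the slope of a Hill plot, log(v/(Vmax − v)) against log[S], and it assumes that Vmax is already known. The fitting procedure actually used for the reported Hill coefficients is not described in this section, and the function names and synthetic values are hypothetical.

```python
import math


def hill_rate(s, vmax, k_half, n_h):
    """Rate predicted by the Hill equation; n_h = 1 reduces to a hyperbola."""
    return vmax * s ** n_h / (k_half ** n_h + s ** n_h)


def hill_plot_fit(substrate, rates, vmax):
    """Estimate (nH, K) by linear regression on the Hill plot.

    Uses log10(v / (Vmax - v)) = nH * log10(S) - nH * log10(K),
    so the slope is nH and K follows from the intercept.
    Every rate must be strictly between 0 and Vmax.
    """
    xs = [math.log10(s) for s in substrate]
    ys = [math.log10(v / (vmax - v)) for v in rates]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    k_half = 10 ** (-intercept / slope)
    return slope, k_half


# Synthetic, noise-free example with a Hill coefficient inside the 1.4-1.8 range.
example_conc = [0.05, 0.1, 0.25, 0.5, 1.0, 2.5]
example_rates = [hill_rate(s, 10.0, 0.3, 1.6) for s in example_conc]
n_h_est, k_half_est = hill_plot_fit(example_conc, example_rates, vmax=10.0)
```

With real data, rates close to Vmax make the logarithm unstable, so a direct non-linear fit of the Hill equation is usually preferable to this linearization.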
Phosphatase screens with purified YKL033W-A identified XMP as the best in vitro substrate for this protein (Fig. 2). YKL033W-A is annotated as a protein of unknown function (Saccharomyces Genome Database) or uncharacterized hydrolase (UniProt Q86ZR7) with low sequence similarity to both SDT1 and PHM8 (20 and 19% sequence identity, respectively). Comparison of the catalytic parameters shows that YKL033W-A is a less efficient XMP phosphatase, due to a low kcat value and high apparent Km (Table 3). Lower catalytic efficiencies of YKL033W-A observed with XMP and other substrates suggest that these could be secondary activities for this protein, and its primary substrate was not included in our screen. The other new yeast nucleotidase PHM8 (YER037W) exhibited a higher level of nucleotidase activity with a substrate profile similar to that of SDT1 (Fig. 2). These two phosphatases have different roles in yeast cells, with SDT1 shown to be responsible for the production of nicotinamide riboside and nicotinic acid riboside as well as for removal of toxic 6- or 5-modified pyrimidines, whereas PHM8 is a nucleotidase involved in autophagy and ribose salvage (45,92).
Presently, over 100 types of RNA modifications have been characterized, with pseudouridine (Ψ) being the most abundant modified base found both in non-coding RNAs and mRNAs (93,94). Pseudouridine stabilizes the structure of transfer RNA and ribosomal RNA, enhancing their function (95). In S. cerevisiae and humans, the formation of pseudouridine in mRNA is regulated by environmental signals, suggesting a mechanism for the rapid regulation of protein synthesis and regulated rewiring of the genetic code (93). Because of the abundance of pseudouridine, many organisms have evolved various pseudouridine-metabolizing enzymes (96). Pseudouridine monophosphate (Ψ-UMP or pseudouridine 5′-phosphate) produced in the course of RNA degradation can be phosphorylated by promiscuous kinases to Ψ-UTP, which needs to be removed from the cellular nucleotide pool to prevent its uncontrolled incorporation into new RNAs (97,98). Recently, we have identified the presence of Ψ-UTP pyrophosphatase activity in several Maf proteins, including yeast YOR111W and human ASMTL-Maf, which produced Ψ-UMP and pyrophosphate as reaction products (50). In the present work, we found that the addition of purified YKL033W-A, PHM8, or SDT1 to a reaction mixture with a Maf protein (yeast YOR111W or BSU28050 from B. subtilis) and Ψ-UTP resulted in dephosphorylation of the produced Ψ-UMP and formation of pseudouridine (Ψ) (Fig. 4). These results suggest that the yeast nucleotidases YKL033W-A, PHM8, and SDT1 possess pseudouridine 5′-phosphatase activity and together with YOR111W might constitute a pathway for the detoxification of Ψ-UTP and Ψ-UMP in S. cerevisiae. A similar pathway comprising the human proteins ASMTL-Maf and pseudouridine 5′-phosphatase HDHD1 can also be proposed for humans (50,99). HDHD1 also belongs to the HAD superfamily and shows low sequence similarity to YKL033W-A (35% sequence identity), PHM8 (19% sequence identity), and SDT1 (17% sequence identity).
PHO13, a Phosphoglycolate Phosphatase-PHO13 is annotated as 4-nitrophenyl phosphatase (UniProt P19881) or as an alkaline phosphatase active against pNPP and phosphorylated histone II-a and casein (YDL236W, Saccharomyces Genome Database), with the latter annotation based on experiments with a partially purified protein (100). However, our results demonstrated that the recombinantly expressed and purified PHO13 has a neutral pH optimum for pNPP hydrolysis and exhibits no dephosphorylation activity against phosphopeptides. Furthermore, our screens revealed that purified PHO13 is highly active against P-glycolate, with micromolar activity toward several secondary substrates, including Gly-2-P, imidodiphosphate, Gly-1-P, Gly-3-P, 3-phosphoglycerate, and phosphoenolpyruvate (PEP) (Fig. 2). In all organisms, 2-phosphoglycolate is produced during the repair of DNA apurinic/apyrimidinic (AP) sites as well as during photorespiration in plants. The AP sites are the most frequent DNA lesions that can be formed spontaneously or as a consequence of the removal of damaged bases during DNA repair (101). The AP sites are primarily repaired via the base excision repair and nucleotide excision repair pathways (102). These pathways produce various intermediates with blocked 3′-ends, including those with 3′-phosphoglycolate termini, which are removed through a 3′-phosphodiesterase activity of AP endonucleases, producing 2-phosphoglycolate as one of the products (101,102). In E. coli, there are two major AP endonucleases (exonuclease III and endonuclease IV), whereas the HAD-like P-glycolate phosphatase Gph has been identified as a housekeeping enzyme responsible for the hydrolysis of produced 2-phosphoglycolate (103). In S. cerevisiae, the major endonuclease responsible for the repair of AP sites is APN1 (YKL114C), which has several catalytic activities, including a 3′-phosphodiesterase activity that excises the 3′-phosphoglycolate group (101).
Our results show that PHO13 exhibits high catalytic efficiency toward 2-phosphoglycolate in vitro, suggesting that in S. cerevisiae, this enzyme might be involved in the hydrolysis of 2-phosphoglycolate produced by the major AP endonuclease APN1 in the base excision repair pathway.
It has been reported that deletion or inactivation of PHO13 improved growth and production of ethanol by S. cerevisiae from D-xylose (104-106). Based on a higher phosphatase activity of crude extracts from S. cerevisiae cells overexpressing PHO13 against xylulose 5-phosphate, it has been proposed that PHO13 dephosphorylates xylulose 5-phosphate, creating a futile cycle with xylulokinase (106). We could not test this hypothesis because xylulose 5-phosphate is not commercially available. Another possible explanation for the increased ethanol production by the PHO13 deletion strains is based on the fact that 2-phosphoglycolate is a competitive inhibitor of Gly-3-P dehydrogenase, an enzyme involved in the production of glycerol in S. cerevisiae (GPD1 and GPD2) and other organisms (107). Deletion of PHO13 might increase the intracellular level of 2-phosphoglycolate in S. cerevisiae cells, thereby inhibiting the activity of GPD1 and GPD2 and increasing carbon flow in the ethanol production branch.
YCR015C, a Thiamine Monophosphate Phosphatase-YCR015C is annotated as a protein of unknown function (UPF0655; UniProt: P25616), and our screens with purified YCR015C revealed the presence of high phosphatase activity against thiamine monophosphate (TP) (Fig. 2 and Table 3). Like most prokaryotes and plants, S. cerevisiae can synthesize thiamine (vitamin B1) and its active form TPP de novo (108,109). The two separate branches of the TPP biosynthetic pathway generate the thiazole and pyrimidine moieties, which are then joined to produce TP. In S. cerevisiae and plants, TP is dephosphorylated by an unknown phosphatase producing thiamine, which is then pyrophosphorylated by the thiamine pyrophosphokinase THI80 (YOR143C) to form TPP (108). The dephosphorylation of TP has been proposed to be catalyzed by nonspecific phosphatase(s) located in the cytosol (108). Given that purified YCR015C showed a low Km for TP in vitro (19 μM) (Table 3), it might represent the missing phosphatase involved in the final stage of TPP synthesis. However, our HPLC analyses revealed no significant increase in the TP level in extracts of the YCR015C deletion strain (supplemental Fig. 2), suggesting that there might be another phosphatase(s) in S. cerevisiae that dephosphorylates TP. Alternatively, YCR015C might dephosphorylate "damaged" forms of TP, as has been recently demonstrated for the S. cerevisiae Nudix hydrolase YJR142W, which is up to 60-fold more active against the toxic TPP degradation products oxy- and oxo-TPP compared with TPP (51). Further studies using "damaged" TP forms are required to test this hypothesis.
YOR131C, a Pyridoxal Phosphate Phosphatase-YOR131C is annotated as a putative uncharacterized hydrolase (UniProt Q12486). Our enzymatic screens with purified YOR131C revealed the presence of phosphatase activity against pyridoxal 5′-phosphate (PLP), which is the active form of vitamin B6, and Tyr(P)-containing peptides (Fig. 2). The biochemically characterized human PLP phosphatase PDXP (also known as chronophin, CIN) shows less than 20% sequence identity with YOR131C, in contrast to 30% identity with the yeast phosphoglycolate phosphatase PHO13. This discrepancy once again demonstrates that at low sequence similarity (roughly less than 30% identity with HADs), homology-based prediction of enzyme substrate specificity (as opposed to the general type of the catalyzed reaction) often results in erroneous functional annotations. PLP is an essential cofactor in all organisms and is involved in a broad variety of enzymatic reactions (110). The intracellular level of PLP is mainly controlled by its synthesis, binding to enzymes, and degradation by phosphatases (111,112). Given that purified YOR131C showed high catalytic activity and a low Km for PLP (Table 3), we propose that this protein is the missing PLP phosphatase that is involved in PLP catabolism in S. cerevisiae (EC 3.1.3.74).
The human PLP phosphatase PDXP (CIN) has been shown to also be active as a Ser(P)-specific protein phosphatase that directly dephosphorylates cofilin, a key regulator of actin polymerization that is present in all eukaryotes (38,113,114). The activity of human cofilin is regulated by phosphorylation (inactivation) at the conserved Ser-3 by specific kinases and dephosphorylation (activation) by the unrelated phosphatases CIN and SSH (38,115,116). In our assays, the S. cerevisiae PLP phosphatase YOR131C was active against phosphorylated peptides containing Tyr(P) or Ser(P), suggesting that it might also function as a protein phosphatase (supplemental Table 4). However, systematic mutational analysis of the S. cerevisiae cofilin indicated that in contrast to vertebrates, yeast cofilin appears not to be phosphorylated at the conserved N-terminal Ser residue (Ser-4), whereas the YOR131C deletion strain was found to be viable (117,118). Thus, if YOR131C also functions as a protein phosphatase in S. cerevisiae, it is likely that this enzyme targets different (not cofilin) proteins.
YNL010W, a Novel Yeast Glycerol Phosphate Phosphatase-Several yeast HAD-like hydrolases were found to be capable of dephosphorylating glycerol phosphates, including both the known (RHR2/GPP1 and HOR2/GPP2) and newly characterized (YNL010W, PHO13, and YKR070W) phosphatases (Fig. 2). The two known yeast glycerol phosphate phosphatases (RHR2 and HOR2) dephosphorylated Gly-3-P as the preferred substrate but were also able to hydrolyze Gly-1-P, albeit with lower activities. Gly-1-P and Gly-3-P are enantiomers, of which Gly-1-P is typically found in archaea, whereas Gly-3-P is present in bacteria and eukaryotes (119). However, a recent study has demonstrated the presence of Gly-1-P dehydrogenase in B. subtilis (120), indicating that Gly-1-P is not exclusive to archaea. Our screens have identified a novel glycerol phosphate phosphatase in S. cerevisiae, YNL010W, which showed a preference for Gly-1-P in vitro but was also active against Gly-3-P (Fig. 2). Glycerol is the main compatible solute in S. cerevisiae and is important for osmoregulation, stress response, carbon metabolism, redox balance, and lipid synthesis (121). HOR2 and RHR2 are required for glycerol synthesis and are involved in responses to osmotic, anaerobic, and oxidative stress (122). Our results suggest that YNL010W might contribute to these processes and may also have additional functions in S. cerevisiae. It has been shown that deletion of both gene copies of YNL010W leads to an increase in glycogen accumulation (123). This effect might be due to the ability of YNL010W to dephosphorylate erythrose-4-P (Fig. 2) or Ser(P)/Tyr(P)-containing peptides (supplemental Table 4).
Structural Analysis of Yeast HAD Phosphatases-The high resolution crystal structures of the three yeast HAD phosphatases RHR2 (GPP1, 1.60 Å resolution), SDT1 (1.60 Å), and YKR070W (1.55 Å) were determined by the SAD method (Table 2). In addition, the crystal structure of the yeast UTR4 has been solved by the Joint Center for Structural Genomics (PDB code 2G80, resolution 2.28 Å). The crystal structures of four yeast HADs revealed the general topology of the HAD hydrolase fold, which forms a three-layer αβα sandwich with the α/β core Rossmannoid domain containing a six-stranded parallel β-sheet flanked by five or more α-helices on both sides (Fig. 5). The crystal structures also revealed the presence of a cap domain, which can be classified as type C1 in RHR2, UTR4, and SDT1 (inserted between HAD motifs 1 and 2) or type C2 in YKR070W (inserted between HAD motifs 2 and 3) (Fig. 5). The cap domains of the first three proteins represent a five- or six-helix bundle, whereas YKR070W has an α/β three-layer sandwich cap domain of similar size to that of the core domain (Fig. 5). Previous analysis of the available structures of HAD-like hydrolases indicated that ~60% of them are likely to form dimers or higher oligomers, whereas the other proteins appeared to be monomeric (124). In the human PLP/protein phosphatase chronophin, homodimerization is essential for the proper positioning of a conserved His residue in the substrate specificity loop (124). The structures of four yeast HADs suggested a monomeric state for RHR2, SDT1, and UTR4 and a dimeric state for YKR070W, with the cap domain of one protomer reaching close to the core domain of the second protomer (Fig. 5).
Despite low sequence similarity between the four yeast HADs (up to 21% sequence identity), the superposition of their core Rossmann-like domains revealed high conservation of the overall fold (average r.m.s. deviation of ~2.4 Å). The conservation of the structural elements that form the active site loops, the central β-sheet region, and the "squiggle" (formed by a 6-residue helical turn, which is located after the first strand of the central β-sheet) was even higher, with an average r.m.s. deviation of 0.9 Å, signifying the importance of these elements for catalytic activity of HAD phosphatases (Fig. 6A). In contrast, the cap domains of the four yeast HADs showed high structural diversity, which is consistent with different substrate specificities of these enzymes (Fig. 2).
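The r.m.s. deviations quoted above compare equivalent atoms after two structures have been superposed. As a minimal sketch of that final step only, the Python function below computes the r.m.s. deviation for two lists of already superposed, already paired coordinates. The function name and the toy coordinates are hypothetical and are not taken from any structure described here; the superposition and residue pairing done by the structure comparison software are assumed to have happened beforehand.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z)
    points, assumed to be already superposed and paired atom by atom."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the two paired atoms
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical toy coordinates in Angstroms (illustration only)
core_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
core_b = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (3.0, 0.0, 1.0)]
print(rmsd(core_a, core_b))
```

Because every paired atom in the toy example is displaced by the same distance, the r.m.s. deviation equals that displacement. With real structures, a lower value over the active site loops than over the whole core domain is what indicates the stronger conservation of the catalytic elements.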
In HAD C1 proteins, the cap domains are inserted in the "flap," allowing large conformational changes, including opening and closing of the active site (25). The closed active site is represented by the RHR2 structure showing the cap domain positioned close to the core domain, whereas the UTR4 structure revealed the active site in an open conformation (Fig. 5). In addition, the comparison of the SDT1 structure with the recently determined structure of this protein in complex with substrate (UMP) (125) displayed significant structural changes in its cap domain upon substrate binding (Fig. 6B). The major change is associated with the cap domain helices H1 and H2 with the initiation point located near Ser-69 (labeled) at the border of the squiggle region. Ser-69 is the last residue of the 6-residue squiggle structure, which undergoes conformational change of 3.5 Å during the open-closed state transitions. This triggers a ~3-Å movement of helix H1 (orange) to the new position HI (dark gray) and a 30° swing of the helix H2, moving its end 13 Å away (to the HII position). In contrast, the cap domain of the HAD C2 proteins, including YKR070W, is inserted into a rigid part of the core domain (between HAD motifs 2 and 3). Accordingly, the structures of YKR070W, in complex with Gly-3-P (a substrate) or phosphate (a product), revealed limited conformational changes (Fig. 5).
A Dali search for four yeast HADs identified many similar structures of characterized and uncharacterized HAD-like hydrolases from different organisms that show low sequence similarity to yeast HADs (13-25% sequence identity). For RHR2, the top two characterized structural neighbors include the inorganic pyrophosphatase BT2127 from B. thetaiotaomicron (Z-score 20.7-21.3, r.m.s. deviation 3.0 Å, PDB codes 3QUB and 3QU2) and the 2-deoxyglucose 6-phosphatase YniC from E. coli (Z-score 20.9, r.m.s. deviation 3.0 Å, 18%

Fig. 6 legend (beginning not recovered): (magenta), and YKR070W (orange). The overlay shows close alignment of the structural elements involved in catalysis, including the active site loops (L1-L4) and bound metal ions (the Mg2+ ion from the YKR070W structure is shown as a red sphere). The conserved structural elements squiggle and flap are positioned close to loop L1 (labeled). For clarity, the cap domains of these HADs are not shown, whereas the β-strands are numbered with Roman numerals. B, structural superposition of the apo-form and UMP binary complex of SDT1. We have determined the 1.70 Å structure of the apo-form of SDT1 (PDB 3NUQ; orange ribbon), whereas the structure of the SDT1-substrate complex with UMP is available from PDB (3OPX; shown as a gray ribbon with UMP shown as a space-filled model). Superimposition of these two structures revealed a similar overall structural fold (r.m.s. deviation 1.43 Å) with significant structural changes in the cap domain (indicated by red arrows). C, close-up view of the active site of YKR070W with bound Gly-3-P with 2Fo − Fc electron density map (gray) contoured at 1.0 σ. The bound Gly-3-P and residues are shown as sticks, the Mg2+ ion as a green sphere, and two water molecules as red spheres. The molecule of Gly-3-P was omitted during the map calculation.
The location of the HAD active sites is indicated by the position of a metal ion (Mg2+, Ca2+, or Na+), which is bound at the bottom of the active site close to the side chains of conserved catalytic residues of the signature HAD motif (Fig. 7). The active site volumes calculated using CASTp (126) were found to be 460 Å³ for RHR2, 966-1003 Å³ for YKR070W, 1737 Å³ for UTR4, and 1698-2096 Å³ for SDT1. Thus, it appears that the HAD active site volumes show no correlation with their substrate promiscuity, which is the highest in YKR070W. The active sites of the four yeast HADs include four loops that support the catalytic residues. Loop 1 accommodates the Asp nucleophile, followed 2 residues downstream by the second Asp, which functions as a general acid-base. Loops 2 and 3 support the phosphoryl group binding residues Thr (or Ser) and Lys, whereas the two Mg2+-binding carboxylates are located on loop 4 (Fig. 7). The RHR2 active site also revealed the presence of a bound sulfate ion, whereas a phosphate was found in the YKR070W structure, suggesting that these groups mimic the position of the phosphate product (Fig. 7). In YKR070W, the Mg2+ ion adopts an octahedral coordination geometry, being coordinated by the backbone carbonyl oxygen of Asp-21, the side chain carboxylates of Asp-19 and Asp-298, the phosphate group oxygen atom, and two water molecules (all bond lengths are less than 2.12 Å). The phosphate ion is coordinated by direct hydrogen bond interactions with the side chain atoms of Asp-19 (OD1-PO3, 2.6 Å), Asp-21 (OD2-PO4, 2.6 Å), Thr-52 (O-PO3, 2.7 Å), and Lys-246 (NZ-PO1, 2.9 Å) and the main chain amide groups of Asn-53 (N-PO4, 2.7 Å) and Asp-21 (N-PO3, 2.8 Å) (Fig. 7). The roles of the HAD motifs and other conserved residues of the core and cap domains of YKR070W were analyzed using site-directed mutagenesis (Fig. 8 and supplemental Table 7).
Alanine replacement mutagenesis of the HAD motif residues in YKR070W produced inactive or almost inactive proteins (D19A, D21A, T52A, K246A, D298A, and D303A). In addition, a greatly reduced phosphatase activity was found in the YKR070W mutant proteins with mutations in Asn-53 and Asp-206, which are located in the core and cap domains, respectively, indicating that residues from both domains are important for the YKR070W activity (Fig. 8 and supplemental Table 7). Mutations in the other residues of the YKR070W core and cap domains generated proteins with reduced or wild type activity (Lys-28, Arg-62, Phe-132, Asp-163, Asn-204, and Trp-209). Conservative amino acid replacements (e.g. F132Y, W209Y, or W209F) typically had less significant effects on the YKR070W activity and kinetic parameters compared with nonconservative mutations (e.g. K28E, R62E, or W209A) (Fig. 8 and supplemental Table 7).
The Structure of the YKR070W Dimer in Complex with Gly-3-P Reveals a Composite Substrate Binding Site-To provide insight into the molecular mechanisms of substrate selectivity of yeast HADs, we crystallized the inactive YKR070W D19A protein in the presence of Gly-3-P. This crystal structure revealed the presence of two molecules of Gly-3-P and two Mg2+ ions per dimer bound to the active site, representing a structure of an enzyme-substrate (Michaelis) complex (Fig. 9). The positions of the Gly-3-P phosphate group and Mg2+ are very similar to those in the YKR070W structure in complex with phosphate (r.m.s. deviation 0.6 and 0.3 Å for the phosphate and Mg2+ ions, respectively) (Fig. 7). The phosphate moiety of Gly-3-P is positioned near the Mg2+ ion (2.1 Å) at the bottom of the active site pocket and is also coordinated by the side chains of Thr-52 (2.3 Å) and Lys-246 (2.7 Å) as well as by the main chain amide group of Asn-53 (2.5 Å) (Figs. 6C and 9). The glycerol moiety of Gly-3-P is bound mostly within the cap domain in a large cavity formed by the side chains of Thr-101, Asn-162, Asp-163, His-165, Asn-204, and Trp-209 (Fig. 9). The C2-OH group of Gly-3-P forms hydrogen bonds with the side chain ND2 nitrogen of Asn-204 (3.2 Å) and OD2 oxygen of Asp-206 (2.8 Å), whereas the C1-OH is 4 Å away from the side chains of Asp-163 and Trp-209 and interacts with a water molecule coordinated by Asp-163 (OD2), Asn-204 (ND2), and the main chain amide of Gln-220. Therefore, the substrate-binding pocket of YKR070W can potentially accommodate longer substrates, in line with the high activity of this phosphatase against phosphorylated C4, C5, and C6 carbohydrates (Fig. 2).
The structure of the YKR070W dimer revealed that the side chain of Phe-132 from one protomer contributes to the substrate binding site of another protomer and is positioned near the side chains of Asp-163, His-165, and Trp-209 (3.3-3.7 Å) and just 5.2 Å away from the C1 hydroxyl group of bound Gly-3-P (Fig. 9). This suggests that binding of the substrate to one active site of the YKR070W dimer can be transmitted to the second active site through the Phe-132 side chain, creating an opportunity for allosteric interaction of the two active sites. Therefore, the structure of YKR070W suggests a potential molecular mechanism for positive substrate cooperativity in dimeric and tetrameric HAD-like hydrolases based on the conformational change induced by substrate binding to one subunit and transmitted (through Phe-132 in YKR070W) to another subunit. This is consistent with the observed positive cooperativity of substrate binding by purified YKR070W with 2-deoxyribose-5-phosphate as substrate (kH = 1.5). Substrate saturation experiments with 2-deoxyribose-5-phosphate and purified YKR070W mutant proteins revealed a significant reduction in both cooperativity (kH = 1.1-1.2) and substrate binding (Km = 0.4-0.5 mM) in the F132A, F132Y, and F132H proteins compared with the wild type YKR070W (kH = 1.4 and Km = 0.3 mM). Although a recent analysis of the available structures of HAD dimers has led to the suggestion that the adjacent HAD protomers do not share active site residues (124), all of these structures show the presence of a conserved aromatic residue homologous to the YKR070W Phe-132 positioned close to the substrate binding site of another protomer. These HAD proteins include the mouse chronophin (Phe-152, PDB code 4BX3) and HDHD2 (Tyr-124, PDB code 2HO4), human LHPP (Tyr-132, PDB code 2X4D), PH1952 from Pyrococcus horikoshii (Tyr-132, PDB code 1ZJJ), and AF0374 from Archaeoglobus fulgidus (Phe-128, PDB code 3QGM).
Thus, in dimeric HADs, each of the protomers can contribute an aromatic residue to the substrate binding site of the other protomer, providing a potential molecular mechanism of cooperativity in dimeric HAD-like hydrolases.
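To illustrate what the reported cooperativity coefficients mean for a saturation curve, the sketch below evaluates the Hill equation, v = Vmax·S^h / (K^h + S^h). It is an assumption of this sketch that the reported kH corresponds to the Hill coefficient h and that the reported Km corresponds to the half-saturation constant K; the fitting model itself is not described in this section. The function name, the normalized Vmax of 1.0, and the substrate concentrations are hypothetical.

```python
def hill_rate(s, vmax, k_half, h):
    """Reaction rate from the Hill equation: vmax * s^h / (k_half^h + s^h).

    s and k_half share the same concentration units (mM here); h = 1 reduces
    the expression to the Michaelis-Menten equation.
    """
    if s < 0 or k_half <= 0 or h <= 0:
        raise ValueError("s must be >= 0; k_half and h must be > 0")
    return vmax * s ** h / (k_half ** h + s ** h)


# Parameter pairs taken from the values quoted in the text; vmax is normalized
# to 1.0 as an assumption so that rates read as fractions of saturation.
wild_type = {"k_half": 0.3, "h": 1.4}
f132_mutant = {"k_half": 0.5, "h": 1.1}

for s in (0.03, 0.3, 3.0):  # hypothetical substrate concentrations, mM
    v_wt = hill_rate(s, 1.0, **wild_type)
    v_mut = hill_rate(s, 1.0, **f132_mutant)
    print(f"S = {s} mM: wild type {v_wt:.3f}, F132 mutant {v_mut:.3f}")
```

A coefficient above 1 makes the curve sigmoidal, so the rate rises more steeply around the half-saturation point than a hyperbolic (h = 1) curve with the same K. The drop from 1.4 to 1.1-1.2 in the Phe-132 mutants therefore corresponds to a curve that is closer to simple Michaelis-Menten behavior.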

Discussion
Biochemical characterization of previously unexplored HAD superfamily hydrolases from S. cerevisiae revealed the presence of phosphatase activity against the general phosphatase substrate pNPP in nine proteins and activity against specific phosphorylated substrates for eight proteins (Table 1 and Fig. 2). Thus, the in vitro or in vivo substrates have been experimentally identified or can be reliably predicted for 25 of the 26 soluble S. cerevisiae HADs, with the exception of YMR130W, for which we demonstrated only low phosphatase activity against pNPP. Most of the soluble S. cerevisiae HADs are phosphomonoesterases active against a broad range of C2-C12 metabolites, including phosphorylated carbohydrates, nucleotides, organic acids, glycerols, and cofactors (Fig. 10 and supplemental Table 8). The identified in vitro substrates of previously uncharacterized S. cerevisiae HAD phosphatases suggest that these enzymes have a role in cell metabolism (SER2, YCR015C, and YNL010W), regulation (PHM8, SDT1, and YKL033W-A), or housekeeping (PHO13 and YOR131C).
The S. cerevisiae HAD family includes both highly specific (e.g. SER2 and YCR015C) and promiscuous phosphatases (e.g. YKR070W and DOG2), which show high activity against multiple substrates (Fig. 2). Previously, pronounced substrate promiscuity has been demonstrated for HAD phosphatases from E. coli and other organisms (28,32,40). This broad substrate specificity apparently contributes to the key role of these enzymes in the hydrolysis of a broad range of phosphomonoesters as well as the "house-cleaning" functions. Enzyme substrate promiscuity (also known as substrate ambiguity) is likely to be the starting point for the evolution of new enzymes through gene duplication followed by subfunctionalization (substrate specialization) of the diverging enzymes (28,127). Beneficial activities in the promiscuous repertoire can be selected and improved through mutations, initially without losing the primary enzyme activity (127). In the evolutionary biological context, this mode of evolution represents a special case of subfunctionalization (128,129), whereby the diverging paralogs not only retain but enhance some of the multiple activities of the common ancestor. The S. cerevisiae complement of HAD phosphatases provides at least three examples of gene duplication that are compatible with this general model of enzyme evolution but reflect different stages of subfunctionalization. The glycerol phosphatases HOR2 and RHR2 share 95% sequence identity and have almost identical substrate profiles, suggesting a recent gene duplication without much sequence or functional divergence (Fig. 2). In contrast, despite nearly as high sequence conservation, the 2-deoxyglucose-6-phosphatases DOG1 and DOG2 (92% sequence identity) have already diverged, with DOG2 having evolved a preference toward fructose 1-phosphate but still retaining high activities toward 2-deoxyglucose-6-phosphate and mannose 6-phosphate (Fig. 2 and Table 3).
Finally, the nucleotidases PHM8 and SDT1 appear to represent an older duplication and greater divergence because they show lower sequence similarity to each other (41.5% sequence identity) and prefer different (although similar) substrates, CMP and UMP, respectively (Fig. 2). The third nucleotidase, YKL033W-A, shares only 20% sequence identity with PHM8/SDT1 and has a different substrate profile, suggesting that it evolved from an even older gene duplication event or developed a preference for nucleotide substrates independently (through convergent evolution).
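The percent identities compared above are computed from pairwise alignments. The sketch below shows one common way of doing that, counting identical residues over the alignment columns in which neither sequence has a gap. The choice of denominator is an assumption of this sketch (other conventions divide by the full alignment length or by the shorter sequence), and the method used to obtain the values quoted in the text is not stated in this section. The function name and the toy alignment are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned sequences of equal length.

    Only columns where neither sequence has a gap are counted; the result is
    identical columns divided by those ungapped columns, times 100.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        if res_a == gap or res_b == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if res_a == res_b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Hypothetical toy alignment (not real HAD sequences)
print(round(percent_identity("ACDE-FG", "ACDQKFG"), 1))
```

Applied to real alignments, a value such as 95% (HOR2/RHR2) versus roughly 20% (YKL033W-A versus PHM8 or SDT1) gives the scale of divergence that the text relates to differences in substrate profiles.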
The only other organism with thoroughly characterized HADs is E. coli, which encodes a comparable number of soluble HADs (23 proteins) but only five membrane-bound HADs (compared with 19 in S. cerevisiae) (40). Although the orthologous HADs from yeast and E. coli share low overall sequence similarities (30% sequence identity at most), a comparison of the top preferred substrates for these enzymes identified seven common metabolites (Ser(P), P-glycolate, PLP, 2-deoxyglucose-6-phosphate, fructose 1-phosphate, UMP, and trehalose-6-P). Thus, in the evolution of the HAD superfamily phosphatases, conservation of substrate specificity of orthologous enzymes does not require a particularly high level of sequence conservation. Furthermore, analysis of substrates that support at least moderate activity of HADs from both organisms (>0.2 μmol/min/mg of protein) revealed comparable numbers of substrates (40 in yeast and 44 in E. coli) with 28 common metabolites (70%), which include phosphorylated carbohydrates, nucleotides, organic acids, FMN, and PLP (Fig. 10 and supplemental Table 8). We propose that these metabolites represent the primary pool of potential substrates of the HAD superfamily that are likely to be conserved in most free living organisms with comparable numbers of HADs. This pool of secondary activities is likely to represent the reservoir for evolution of novel phosphatases.
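The overlap figures above can be reproduced from the three counts given in the text (40 yeast substrates, 44 E. coli substrates, 28 shared). The sketch below does this with sets of placeholder integer labels, because the individual substrate names are listed only in Fig. 10 and supplemental Table 8; the labels, the function name, and the additional overlap measures (fraction of the E. coli pool and Jaccard index) are assumptions of this sketch and are not reported in the text.

```python
def pool_overlap(pool_a, pool_b):
    """Summarize the overlap between two substrate pools given as sets."""
    shared = pool_a & pool_b
    union = pool_a | pool_b
    return {
        "shared": len(shared),
        "fraction_of_a": len(shared) / len(pool_a),
        "fraction_of_b": len(shared) / len(pool_b),
        "jaccard": len(shared) / len(union),
    }


# Placeholder labels only: 40 "yeast" and 44 "E. coli" substrates constructed
# so that exactly 28 labels are shared, matching the counts in the text.
yeast_pool = set(range(0, 40))
ecoli_pool = set(range(12, 56))

summary = pool_overlap(yeast_pool, ecoli_pool)
print(summary["shared"], round(summary["fraction_of_a"], 2))
```

The 70% quoted in the text is the shared fraction relative to the yeast pool (28 of 40). Relative to the E. coli pool the shared fraction is 28 of 44, and relative to the combined pool of 56 distinct substrates it is one half.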
In this work, we also identified several substrates that have not been reported for E. coli HADs and might represent recent additions to the yeast HAD substrate pool, including 3-phosphoglycerate, thiamine-P, and phosphopeptides (Fig. 10 and supplemental Table 8). Four yeast HADs (YOR131C, YNL010W, YKR070W, and SDT1) were found to be active against phosphorylated (Tyr(P) and Ser(P)) peptides, adding novel potential protein phosphatases to the previously identified Ser(P)/Thr(P) phosphatases NEM1 and FCP1. HAD-like hydrolases with protein phosphatase activity have previously been identified in several eukaryotes (37-39,73) but not in E. coli, suggesting that this activity might be a recent addition to the substrate repertoire in the evolution of the HAD superfamily. In contrast, E. coli encodes two HADs that are involved in a bacteria-specific pathway of cell wall lipopolysaccharide biosynthesis (YrbI and GmhB) (130,131), which are absent in S. cerevisiae. Overall, E. coli has more HAD superfamily enzymes that are active against phosphorylated carbohydrates (seven compared with three in yeast), whereas S. cerevisiae has more HADs with a preference toward P-glycerols (three compared with none in E. coli).
Using the information from the COG and OrthoMCL databases combined with the results of the previous sequence analysis of the HAD superfamily (25,132,133), we established several co-orthologous groups for yeast, E. coli, and human HADs and compared their substrate specificity (Fig. 1 and Table 4). As shown in Fig. 1, yeast and E. coli HADs belong to six co-orthologous groups. The two pairs of recently evolved paralogous yeast HADs, DOG1/DOG2 and HOR2/RHR2 (92 and 95% sequence identity, respectively), which belong to the same co-orthologous group (Fig. 1), share similar substrate profiles, although, as pointed out above, the preferred substrates differ for DOG1 and DOG2 (Fig. 2). However, the E. coli HAD phosphatase YfbT, to which all of these yeast enzymes appear to be co-orthologous, has a different substrate profile with a preference for fructose 1-phosphate (40). The BPGM group of phosphatases from both organisms are active against 2-deoxyglucose-6-phosphate, fructose 1-phosphate, and Gly-3-P, whereas the Epo group enzymes are mostly nucleotidases (Table 4). Thus, substrate preferences of these HAD-like phosphatases correlate with their sequence similarity at the family level. Similarly, the PSP group HADs from S. cerevisiae (SER2), human (PSPH), and E. coli (SerB) were found to be Ser(P) phosphatases, suggesting that this activity is highly conserved in many organisms. In contrast, the S. cerevisiae P-glycolate phosphatase PHO13 (YDL236W) from the NagD family has several human orthologs active against P-glycolate (PGP) and PLP (PDXP), whereas the E. coli ortholog NagD is a UMP phosphatase, and the more diverged human paralog LHPP is a P-lysine phosphatase (Table 4). This distribution of activities implies that the P-glycolate phosphatase activity probably emerged early in the evolution of eukaryotes, but its paralogs can quickly change their substrate specificity. Moreover, the three E. coli PLP phosphatases YbhA, YigL, and Cof belong to the Cof family, which also includes the yeast phosphomannomutase SEC53 (YFL045C) (not shown).
The variation of enzyme substrate preferences within several families of HADs from yeast, humans, and E. coli indicates that the evolution of substrate specificity does not necessarily follow sequence divergence, although, as pointed out above, in some cases, even highly diverged orthologs retain the ancestral specificity. In addition, our analysis indicates that the HAD phosphatases from different families can convergently evolve to catalyze the dephosphorylation of the same substrate (e.g. P-glycolate). Thus, in general, sequence similarity-based classification of HAD-like phosphatases cannot accurately predict their preferred substrates, emphasizing the importance of biochemical experiments for functional annotation of these enzymes. The presence of large numbers of paralogous HADs with (often) low sequence similarity in many organisms suggests a high rate of evolution of these genes. HAD phosphatases can quickly evolve new biochemical functions and acquire novel biological roles using a broad pool of secondary substrates. The substrate pools of the HAD phosphatases are closely similar in E. coli and yeast even if, in many cases, the specificities of enzymes within the same co-orthologous family are not. The biochemical promiscuity of the HAD phosphatases together with their structural flexibility and catalytic efficiency underlies their dominant role in metabolic dephosphorylation in all kingdoms of life.
Author Contributions-E. K., G. B., R. F., and A. K. designed, performed, and analyzed the experiments shown in Tables 1 and 2 Table 3. K. S. M., Y. I. W., and E. V. K. designed, performed, and analyzed the experiments shown in Table 3

Co-orthologous groups and substrates of HADs from S. cerevisiae, E. coli, and humans
The preferred HAD substrates are indicated in parentheses. The 15 characterized yeast HADs and their COG groups are shown together with the orthologous HADs from E. coli and humans. 2d-glucose-6P, 2-deoxyglucose 6-phosphate phosphatase 1.