Genome-wide Analysis of Substrate Specificities of the Escherichia coli Haloacid Dehalogenase-like Phosphatase Family*

Haloacid dehalogenase (HAD)-like hydrolases are a vast superfamily of largely uncharacterized enzymes, with a few members shown to possess phosphatase, β-phosphoglucomutase, phosphonatase, and dehalogenase activities. Using a representative set of 80 phosphorylated substrates, we characterized the substrate specificities of 23 soluble HADs encoded in the Escherichia coli genome. We identified small molecule phosphatase activity in 21 HADs and β-phosphoglucomutase activity in one protein. The E. coli HAD phosphatases show high catalytic efficiency and affinity to a wide range of phosphorylated metabolites that are intermediates of various metabolic reactions. Rather than following the classical “one enzyme-one substrate” model, most of the E. coli HADs show remarkably broad and overlapping substrate spectra. At least 12 reactions catalyzed by HADs currently have no EC numbers assigned in Enzyme Nomenclature. Surprisingly, most HADs hydrolyzed small phosphodonors (acetyl phosphate, carbamoyl phosphate, and phosphoramidate), which also serve as substrates for autophosphorylation of the receiver domains of the two-component signal transduction systems. The physiological relevance of the phosphatase activity with the preferred substrate was validated in vivo for one of the HADs, YniC. Many of the secondary activities of HADs might have no immediate physiological function but could comprise a reservoir for evolution of novel phosphatases.

Most enzymes form families of paralogs whose members are related by sequence and catalyze similar reactions but have evolved specific biological functions. Comprehensive determination of the substrate specificities and selectivities of all metabolic enzymes in an organism is an essential step toward understanding the relationship between the proteome and the metabolome. By the most recent estimate, Escherichia coli possesses at least 1186 metabolic enzymes and 1005 metabolites (1). The most common functional group in the metabolome is phosphate; 35-40% of the metabolites contain a phosphate group (2). The pool of phosphorylated metabolites is controlled by the activity of diverse kinases and phosphatases, of which there are hundreds in the E. coli genome.
Haloacid dehalogenase (HAD)-like hydrolases (3) represent the largest family of predicted small molecule phosphatases encoded in the genomes of bacteria, archaea, and eukaryotes, with 6,805 proteins in data bases. The great majority of these proteins have no known biochemical or biological function. In any individual genome, the number of HAD genes can range from 10 to 20 in different bacteria to 100 in humans and 115 in Arabidopsis thaliana (InterPro data base). HADs share little overall sequence similarity (15-30% identity), but they can be unequivocally identified by the presence of three short conserved sequence motifs (3) (supplemental Fig. 1). Most of the characterized HADs have phosphatase activity (CO-P bond hydrolysis), and several also catalyze dehalogenase (C-halogen bond hydrolysis), phosphonatase (C-P bond hydrolysis), and β-phosphoglucomutase (CO-P bond hydrolysis and intramolecular phosphoryl transfer) reactions (3, 4). The biochemically and structurally studied HADs include phosphoserine phosphatase SerB from Methanococcus jannaschii (5), phosphoglycolate phosphatase from Thermoplasma acidophilum (6), phosphonacetaldehyde hydrolase from Bacillus cereus (7), β-phosphoglucomutase from Lactococcus lactis (8), haloacid dehalogenases from Pseudomonas sp. YL (9) and Xanthobacter autotrophicus (10), and two E. coli phosphatases, YbiV and NagD (11, 12). However, the vast majority of HADs remains uncharacterized. Since these enzymes generally show little sequence similarity, the catalyzed reaction and, especially, the substrate specificity are hard to predict on the basis of sequence conservation and have to be determined experimentally.
Using a set of 80 representative phosphorylated metabolites, we characterized the substrate specificities of all 23 soluble E. coli HADs and found that they comprise a family of promiscuous phosphatases with overlapping substrate profiles and are capable of hydrolyzing a wide range of phosphorylated metabolites, including carbohydrates, nucleotides, organic acids, coenzymes, and small phosphodonors. Genetic analysis demonstrated that the activity of one of the HADs, YniC, toward its preferred substrate was biologically important. We further show that all E. coli HADs have phosphatase activity against small phosphate donors (acetyl phosphate, carbamoyl phosphate, phosphoramidate), which resembles the autophosphorylation reaction catalyzed by CheY fold receiver domains of the two-component regulatory systems. Together with the previously reported structural similarity, these results indicate that the HAD superfamily and the receiver domain originate from an ancestral low specificity phosphatase. Clustering of E. coli HADs on the basis of their phosphatase activities (kcat/Km) was incongruent with the sequence-based phylogeny. Thus, many of the secondary activities of HADs might be of no immediate functional importance but comprise a reservoir for evolution of phosphatases with novel specificities.

EXPERIMENTAL PROCEDURES
Gene Cloning and Protein Purification-For most HADs analyzed in this work, the genes were amplified by PCR from the E. coli DH5α genomic DNA and cloned into a modified pET15b (Novagen) as previously described (17). Several HADs (YbjI, YaeD, and YrbI) were cloned from the E. coli K12 W3110 genomic DNA into the archive vector pCA24N (Genobase data base; available on the World Wide Web at ecoli.aist-nara.ac.jp). Purification of proteins for screening and biochemical characterization was performed as previously described (18). NagD was expressed in an insoluble form and was partially refolded from the inclusion bodies by a buffer exchange method using the Spin-Column Protein Folding Screen kit (SFC01-10) and the column PFC02 from ProFoldin Protein Folding Services according to the manufacturer's instructions (available on the World Wide Web at www.profoldin.com).
Enzymatic Screens and Assays-General phosphatase screens with p-nitrophenyl phosphate (pNPP) as substrate and natural substrate phosphatase screens with 80 phosphorylated compounds from Sigma (supplemental Table 1) were performed as previously described (18). Acetyl-phosphatase activity was assayed by measuring the acetyl-phosphate concentration using the hydroxylamine protocol of Lipmann and Tuttle (19). The production of fructose in enzymatic reactions was determined using an enzyme-coupled assay with fructose dehydrogenase (F5152; Sigma), essentially as previously described (20). This assay was adapted for 96-well microplates (200-μl reaction mixtures). Haloacid dehalogenase activity was determined spectrophotometrically by measuring the release of halide ions using the mercuric thiocyanate method (21). The assays were adapted to the 96-well format (125 μl) and contained 50 mM CHES buffer (pH 9.0), 10 mM substrate, and 5 μg of protein. Six compounds were used as substrates (R-chloropropionic acid, S-chloropropionic acid, bromoacetic acid, 4-chlorobenzoic acid, 2,2-dichloropropionic acid, and 2-bromopropionic acid), and L-2-haloacid dehalogenase from Pseudomonas sp. YL (22) was used as a positive control. Phosphonatase activity was assayed using phosphonoacetate as a substrate by measuring the release of inorganic phosphate using the Malachite Green reagent as previously described (23). β-Phosphoglucomutase activity was determined using a glucose-6-phosphate dehydrogenase-coupled assay and 1 mM β-glucose 1-phosphate as substrate (24).
For Km and Vmax determination, the phosphatase assays contained substrates at concentrations of 0.005-2.0 mM. Kinetic parameters were determined by nonlinear curve fitting from the Lineweaver-Burk plot using the GraphPad Prism software (version 4.00 for Windows, GraphPad Software, San Diego, CA).
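As an illustration of how Km and Vmax can be estimated from such a substrate series, the following minimal sketch performs an ordinary least-squares fit on the double-reciprocal (Lineweaver-Burk) form, 1/v = (Km/Vmax)(1/[S]) + 1/Vmax. It is not the GraphPad Prism procedure used in this work; the function names and the noise-free data points are hypothetical and serve only to show the arithmetic.

```python
def michaelis_menten(substrate, km, vmax):
    """Michaelis-Menten rate law: v = Vmax * [S] / (Km + [S])."""
    return vmax * substrate / (km + substrate)


def lineweaver_burk_fit(substrate_mM, velocity):
    """Estimate (Km, Vmax) by linear least squares on 1/v versus 1/[S]."""
    x = [1.0 / s for s in substrate_mM]
    y = [1.0 / v for v in velocity]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                     # slope = Km / Vmax
    intercept = mean_y - slope * mean_x   # intercept = 1 / Vmax
    vmax = 1.0 / intercept
    km = slope * vmax
    return km, vmax


# Hypothetical, noise-free data within the 0.005-2.0 mM range used in the assays.
concentrations = [0.005, 0.02, 0.1, 0.5, 1.0, 2.0]
rates = [michaelis_menten(s, km=0.5, vmax=2.0) for s in concentrations]
km_est, vmax_est = lineweaver_burk_fit(concentrations, rates)
```

With real, noisy measurements the double-reciprocal transform gives disproportionate weight to the lowest substrate concentrations, which is one reason a direct nonlinear fit of the rate law is usually preferred.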
Mutagenesis and Growth Experiments-The yniC gene was deleted from the chromosome of the E. coli K-12 W3110 strain using a one-step inactivation method described by Datsenko and Wanner (25). The obtained ΔYniC strain contains an unmarked gene deletion, which was verified using PCR. The YniC-overexpressing strain was prepared by subcloning (PCR) of the wild-type yniC into the KpnI/HindIII sites of the arabinose-inducible vector pBAD-33 (26). The resulting plasmid (pKC1) and the empty (no insert) vector pBAD-33 (a control) were transformed into the wild-type W3110 strain. Cells were grown aerobically at 37 °C on the MOPS-buffered minimal medium containing 0.2% of succinate as a carbon source, and the expression of YniC was induced by the addition of 0.02% arabinose. The culture growth (A600) was determined after 14 h of cultivation.
Bioinformatic Analyses-Hierarchical clustering of HADs (based on their substrate profiles) and substrates (based on their HAD spectra) was calculated using cosine correlations, and groups were clustered using the average method (R Foundation for Statistical Computing; available on the World Wide Web at www.R-project.org). For hierarchical clustering of proteins across HAD substrates, we consider each protein as a point in the m-dimensional space, where m designates the total number of HAD substrates. Each protein can then be represented as a vector $A = (s_1, \ldots, s_m)$ of length $|A|$ and a unit (normalized) vector $A' = (s_1', \ldots, s_m')$, where $s_i' = s_i/|A|$. We use the cosine of the angle between the normalized vectors $A'$ as a similarity measure between proteins across substrates. The protein profiles across substrates are clustered into groups using the average linkage method in R. For hierarchical clustering of substrates across proteins, we consider each substrate as a point in the n-dimensional space, where n designates the total number of proteins. Each substrate can then be represented as a vector $B = (p_1, \ldots, p_n)$ of length $|B|$ and a unit (normalized) vector $B' = (p_1', \ldots, p_n')$, where $p_j' = p_j/|B|$. We use the cosine of the angle between the normalized vectors $B'$ as a similarity measure between substrates across proteins. The substrate profiles across proteins are clustered into groups using the average linkage method in R.
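The clustering procedure described above can be sketched in a few lines. This is a minimal illustration in pure Python, not the R code used in this work: it assumes that the cosine similarity is converted to a distance as 1 - cosine, that every profile has at least one nonzero activity, and that the enzyme names and activity values are hypothetical.

```python
import math


def normalize(profile):
    """Scale a profile to unit length (fails for an all-zero profile)."""
    length = math.sqrt(sum(value * value for value in profile))
    return [value / length for value in profile]


def cosine_similarity(profile_a, profile_b):
    """Cosine of the angle between two activity profiles."""
    unit_a, unit_b = normalize(profile_a), normalize(profile_b)
    return sum(x * y for x, y in zip(unit_a, unit_b))


def average_linkage(profiles):
    """Agglomerative clustering; returns merges as (cluster_a, cluster_b, distance)."""
    names = list(profiles)
    # Assumption: distance = 1 - cosine similarity.
    distance = {
        (p, q): 1.0 - cosine_similarity(profiles[p], profiles[q])
        for p in names for q in names if p != q
    }
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [(p, q) for p in clusters[i] for q in clusters[j]]
                # Average linkage: mean of all pairwise distances between clusters.
                mean_distance = sum(distance[pair] for pair in pairs) / len(pairs)
                if best is None or mean_distance < best[0]:
                    best = (mean_distance, i, j)
        mean_distance, i, j = best
        merges.append((clusters[i], clusters[j], mean_distance))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical activity profiles of three enzymes across three substrates.
profiles = {
    "enzyme_1": [1.0, 0.0, 0.0],
    "enzyme_2": [2.0, 0.1, 0.0],
    "enzyme_3": [0.0, 0.0, 3.0],
}
merges = average_linkage(profiles)
```

Because the profiles are normalized before comparison, two enzymes with the same relative substrate preferences cluster together even when their absolute activities differ, as enzyme_1 and enzyme_2 do here. Clustering substrates across proteins uses the same functions on the transposed table.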
The distance between the catalytic efficiency (kcat/Km) profiles of 18 enzymes acting on 26 different substrates (see Table 2) was computed using the square distance approach (see supplemental materials and "Experimental Procedures"). A neighbor-joining tree was constructed from the distance matrix using the NEIGHBOR program of the PHYLIP package (28). Multiple alignments of amino acid sequences were constructed using the MUSCLE program (29) and optimized manually to ensure the correct alignment of known sequence motifs of the HAD superfamily (3). Sequence-based phylogenetic trees were constructed using the following methods: (i) neighbor-joining method, as implemented in the NEIGHBOR program of the PHYLIP package (28), (ii) least squares method as implemented in the FITCH program of the PHYLIP package (28), (iii) local maximum likelihood optimization of the least squares tree using the ProtML program of the MOLPHY package (30), (iv) quartet puzzling as implemented in the TREE-PUZZLE program (31), and (v) Markov chain Monte Carlo Bayesian estimation using the MrBayes program (32, 33).
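The exact form of the square distance is given only in the supplemental materials, so the following sketch makes an explicit assumption: the distance between two enzymes is taken as the sum of squared differences between their kcat/Km values over all substrates. The enzyme names and efficiency values are hypothetical, and the text layout (a count line followed by rows with names padded to 10 characters) is meant only to resemble the square-matrix input commonly used with PHYLIP distance programs.

```python
def squared_distance(profile_a, profile_b):
    """Assumed metric: sum of squared differences across substrates."""
    return sum((a - b) ** 2 for a, b in zip(profile_a, profile_b))


def distance_matrix(profiles):
    """Return enzyme names and the full symmetric distance matrix."""
    names = list(profiles)
    matrix = [
        [squared_distance(profiles[p], profiles[q]) for q in names]
        for p in names
    ]
    return names, matrix


def to_square_matrix_text(names, matrix):
    """Format the matrix with a count line and 10-character name fields."""
    lines = [f"{len(names):5d}"]
    for name, row in zip(names, matrix):
        cells = " ".join(f"{value:10.4f}" for value in row)
        lines.append(f"{name[:10]:<10}{cells}")
    return "\n".join(lines)


# Hypothetical kcat/Km values (arbitrary units) for three enzymes on two substrates.
efficiencies = {
    "EnzA": [1.0, 2.0],
    "EnzB": [1.0, 4.0],
    "EnzC": [3.0, 2.0],
}
names, matrix = distance_matrix(efficiencies)
matrix_text = to_square_matrix_text(names, matrix)
```

Because kcat/Km values can span several orders of magnitude, a distance on raw values is dominated by the most efficient reactions; whether the values were scaled or log-transformed before the distance was computed is not stated here.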
Phosphatase Activity with Natural Substrates-In an effort to identify the physiological roles for the E. coli HADs, all purified proteins were screened for phosphatase activity (Pi release) against a set of 80 natural phosphatase substrates (supplemental Table 1) representing all main divisions of the E. coli phosphometabolome (nucleotides, phosphorylated carbohydrates, organic acids, and amino acids) (2). These screens did not detect activity for YcjU (HAD11), HisB (HAD21), and YedP (HAD19). As mentioned above, YcjU is a β-phosphoglucomutase, whereas HisB (predicted histidinol phosphatase) and YedP might be phosphatases that are highly specific to their respective substrates. The remaining 20 HADs showed significant phosphatase activity toward a variety of substrates, in many cases, at levels predicted to be of physiological relevance (Fig. 1). To keep the background low, the screens with natural substrates were performed with substrate concentrations of 0.1-0.2 mM, which were subsequently found to be nonsaturating for most HADs (Table 2). Therefore, the observed velocities do not represent the maximum activities of these enzymes. Nevertheless, these screens not only identified the positive substrates and revealed the broad substrate ranges for most HADs but also correctly identified their preferred substrates (as shown by the subsequent saturation experiments presented in Table 2). The broad substrate spectra of HADs could not be ...
[Table 1 footnote] ... 10.95 ± 2.99% of the mean value). Assays were performed in the presence of 2.5 mM Mg2+ and the following concentrations of substrates: 10 mM pNPP, 10 mM acetyl-P, 0.13 mM carbamoyl-P, or 0.13 mM imidodi-P. These substrate concentrations were saturating for most enzymes (except for carbamoyl-P, which showed no saturation with all tested proteins). ND, not detected.

JOURNAL OF BIOLOGICAL CHEMISTRY 36153
Analysis of the kinetic parameters of the E. coli HADs (Table 2) revealed that with saturating substrates, most HADs showed the same substrate preferences (kcat/Km) that were observed in substrate profiles (presented as phosphatase activities) with subsaturating substrate concentrations (Fig. 1). Moreover, for YfbT (HAD2), YjjG (HAD5), YrfG (HAD8), YbiV (HAD12), YbhA (HAD14), and YigL (HAD16), the level of substrate discrimination is quite low, given that kcat/Km values for several of their substrates agree within an order of magnitude. Comparison of the substrate profiles and kinetic parameters of various HADs also revealed that some phosphorylated metabolites can be hydrolyzed by several HADs with significant or even comparable catalytic efficiency. For example, ribose-5-P and glucose-6-P were substrates for YbiV (HAD12), YidA (HAD13), YniC (HAD1), and YfbT (HAD2), whereas fructose-1-P can be dephosphorylated by YfbT (HAD2), YihX (HAD4), YqaB (HAD6), YbiV (HAD12), and YidA (HAD13) (Table 2, Fig. 1). Although specificity is considered a hallmark of enzymatic activity and certain enzymes can be extraordinarily specific, there has been growing appreciation that substrate specificities are perhaps broader than is generally accepted, and many enzymes exhibit considerable catalytic and substrate promiscuity (34-36). Catalytic promiscuity (also called polyreactivity or moonlighting activity) is defined as the ability of enzyme active sites to catalyze distinctly different chemical transformations (different types of bonds cleaved or formed or different catalytic mechanisms of bond making or breaking) (37, 38). Substrate promiscuity (also called substrate ambiguity or cross-reactivity) is defined as the ability of enzymes to catalyze one chemical transformation on several structurally related substrates.
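The criterion for low substrate discrimination used above (kcat/Km values for several substrates agreeing within an order of magnitude) amounts to a simple ratio test. The sketch below is illustrative only: the function names, the 10-fold threshold parameter, and the efficiency values are hypothetical and are not taken from Table 2.

```python
def discrimination_ratio(efficiencies):
    """Ratio of the highest to the lowest kcat/Km among an enzyme's substrates."""
    values = list(efficiencies.values())
    return max(values) / min(values)


def is_low_discrimination(efficiencies, max_fold=10.0):
    """True when all kcat/Km values lie within max_fold of one another."""
    return discrimination_ratio(efficiencies) <= max_fold


# Hypothetical kcat/Km values (M^-1 s^-1) for two enzymes.
promiscuous_enzyme = {"substrate_1": 2.0e4, "substrate_2": 5.0e3, "substrate_3": 9.0e3}
selective_enzyme = {"substrate_1": 1.0e6, "substrate_2": 1.0e3}

promiscuous_ratio = discrimination_ratio(promiscuous_enzyme)
selective_ratio = discrimination_ratio(selective_enzyme)
```

Here the first enzyme discriminates only 4-fold between its best and worst substrates, whereas the second discriminates 1000-fold.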
It is a generally accepted hypothesis that promiscuous activities serve as a starting point in the evolution of new enzymes and create the basis for the emergence of structurally and mechanistically related protein superfamilies (34, 35, 39-41). This theory is supported by the analysis of enzymes from enolase, amidohydrolase, thiyl radical, and crotonase superfamilies (40, 42, 43). Our work with the E. coli HADs demonstrated that these enzymes are metal-dependent small molecule phosphatases with various degrees of substrate promiscuity and overlapping substrate specificities.
Metabolites Hydrolyzed by the E. coli HADs-E. coli HADs hydrolyze a wide range of phosphorylated metabolites, including carbohydrates, nucleotides, organic acids, and coenzymes. Hierarchical clustering of the HAD phosphatase activities (observed in the screens) against 52 substrates showed that phosphorylated carbohydrates represent the major group of the HAD substrates (Fig. 2A). The most common substrates for these enzymes were fructose-1-P, glucose-6-P, mannose-6-P, 2-deoxyglucose-6-P, fructose-6-P, ribose-5-P, and erythrose-4-P (Fig. 2A). PLP and FMN also appear to be common substrates of HADs. In addition, hierarchical clustering recognized a group of three nucleotidases, YrfG (HAD8), YjjG (HAD5), and YieH (HAD3). YrfG preferentially hydrolyzed purine nucleotides (GMP and IMP), and YjjG preferred pyrimidines (UMP, dUMP, and dTMP), whereas YieH hydrolyzed both purines and pyrimidines as secondary substrates. The fourth nucleotidase, NagD (HAD23) (which was only partially refolded and, therefore, was not included in the clustering), had an unusually broad substrate range and hydrolyzed deoxyribo- and ribonucleoside tri-, di-, and monophosphates, as well as polyphosphate and glucose-1-P (Fig. 1). Nucleotidase activity of E. coli NagD has been recently described by others, and a broad substrate range of this enzyme, with the highest catalytic efficiency toward nucleoside monophosphates, has been demonstrated (12).
Hierarchical clustering of 52 metabolites based on their ability to serve as substrates for the HADs separated these compounds into distinct groups of structurally related molecules (Fig. 2B). The dendrogram in Fig. 2B, which was based solely on the similarity of the enzyme profiles, grouped together hexoses (mannose-6-P, glucose-6-P, and glucosamine-6-P), pentoses (ribose-5-P and ribulose-5-P), purine nucleotides (GMP, dGMP, dAMP, and IMP), pyrimidine nucleotides (UMP, dUMP, dTMP, TDP, and UDP), pyrophosphate-containing metabolites (thiamine pyrophosphate, 5-phosphoribosyl-1-pyrophosphate, PPi, and polyphosphate), and amino acids (P-serine and P-threonine). This indicates that structurally related substrates are recognized by similar sets of HADs, demonstrating a direct relationship between the chemical structure of HAD substrates and their biological activity (HAD spectra).
Several HADs catalyzed the dephosphorylation of two important coenzymes, pyridoxal phosphate (PLP) and FMN. PLP was the preferred substrate for E. coli Cof (HAD18) and YbhA (HAD14), and several other HADs (YbiV (HAD12), YihX (HAD4), YbjI (HAD15), and Gph (HAD10)) showed significant phosphatase activity with this molecule (0.2-1.44 μmol of Pi/min/mg of protein) (Table 2, Fig. 1). In mammalian cells, the intracellular level of PLP is regulated, to a large extent, by the PLP phosphatase (49-51), which is also an enzyme of the HAD superfamily and has Km and Vmax for this substrate similar to those determined here for E. coli HADs.
Four E. coli HADs (YbjI (HAD15), YigB (HAD7), Cof (HAD18), and YrfG (HAD8)) were capable of dephosphorylating FMN, an activity without EC number assigned (Table 2, Fig. 1). The KEGG data base indicates that in E. coli, three acid phosphatases are involved in the dephosphorylation of FMN: AppA, SurE, and AphA. However, two of them (AppA and AphA) are periplasmic nonspecific phosphatases and, therefore, not likely to be involved in the control of the intracellular FMN level, whereas no FMN dephosphorylating activity has been found in SurE in our recent work (47). We have found that FMN was a preferred substrate for two HADs, YbjI (HAD15) and YigB (HAD7) (Table 2, Fig. 1), and they represent more likely candidates for the FMN dephosphorylation function in E. coli cells. Generally, we suggest that the activity of several HADs could be important in the regulation of the intracellular levels of PLP and FMN in E. coli.
Three E. coli HADs showed high phosphatase activity with α-D-glucose-1-P (YidA (HAD13) and YihX (HAD4)) or β-D-glucose-1-P (YfbT (HAD2)) (Table 2, Fig. 1). These proteins were strictly specific to one anomer of glucose-1-P (α or β) and did not hydrolyze the other anomer. In E. coli, the α-form is produced from glycogen by glycogen phosphorylase GlgP or in other reactions of pentose, glucuronate, or nucleotide sugar metabolism, whereas β-D-glucose-1-P might be produced by yet unidentified maltose or trehalose phosphorylases (58). Both anomers can be used in various biosynthetic reactions or channeled to glycolysis or the Entner-Doudoroff pathway through conversion into the respective α- and β-glucose 6-phosphates by specific phosphoglucomutases, α-phosphoglucomutase Pgm (EcoCyc data base) or β-phosphoglucomutase YcjU (this work). E. coli also makes the α-glucose 1-phosphatase Agp, a periplasmic enzyme that acts primarily as a glucose scavenger (59). Although, in the KEGG data base, this periplasmic phosphatase is annotated as an enzyme responsible for intracellular dephosphorylation of α-glucose-1-P, our results indicate that both YihX (HAD4) and YidA (HAD13) have at least 2 times lower Km (higher affinity) for this substrate, and therefore, these enzymes are likely to be the principal phosphatases involved in the intracellular metabolism of α-glucose-1-P in E. coli. YfbT (HAD2) is the first phosphatase found to selectively hydrolyze β-D-glucose-1-P while showing no activity with the α-form of this substrate.
HADs Hydrolyze Acetyl Phosphate and Other Small Phosphodonors-The promiscuity of the HAD phosphatase family raises interesting questions regarding the evolution of the activities and the emergence of new activities from ancestral ones. One clue into the evolution of the HAD family comes from structural analysis of phosphoserine phosphatase from M. jannaschii, which showed significant structural similarity to CheY, a receiver domain/response regulator of the two-component signal transduction system CheA-CheY (63). In two-component systems, the receiver domains of response regulator proteins are either phosphorylated by their cognate sensor histidine kinases or autophosphorylate using acetyl phosphate (acetyl-P), carbamoyl phosphate (carbamoyl-P), or phosphoramidate as phosphodonors (64, 65). The structural similarity and the inferred common ancestry of response regulators and the HADs prompted us to test if any of the E. coli HADs were able to hydrolyze small phosphodonors. All purified E. coli HADs, including even YcjU (HAD11), a β-phosphoglucomutase, showed various levels of Mg2+-dependent activity toward these substrates (imidodiphosphate was used as a phosphoramidate substrate) (Table 1). Most proteins showed higher activities (≥1 μmol/min/mg of protein) with acetyl-P and lower activities with carbamoyl-P or imido-di-P, but YidA (HAD13), YbiV (HAD12), and Gph (HAD10) had high activity with all three substrates (Table 1). To investigate whether these small phosphodonors are specific substrates for HAD phosphatases, we purified seven non-HAD phosphatases from E. coli (CCA, YfdR, YcdX, SurE, CysQ, GlpX, and YfbR) and eight HADs from other organisms (Cof, YigL, and YrfG from Salmonella typhimurium; PA0006, PA0065, and PA3172 from Pseudomonas aeruginosa; RPA3639 and RPA4337 from Rhodopseudomonas palustris) and assayed them for phosphatase activity against these compounds.
Although all of these proteins showed high phosphatase activity against pNPP, only HAD phosphatases were active toward small phosphodonors. Thus, the phosphohydrolase activity toward acetyl-P, carbamoyl-P, and imido-di-P is a specific property of the HADs.
In Vivo Validation of an in Vitro Activity-To determine whether any of the in vitro activities observed in our experiments represent the corresponding biologically relevant reactions, we performed genetic and physiological experiments for a representative in vitro phosphatase activity against 2-deoxyglucose-6-P observed in several E. coli HADs. 2-Deoxyglucose-6-P is a toxic analogue of glucose-6-P, and its intracellular concentration can be very high (up to 100 mM) (68). YniC (HAD1) showed higher catalytic efficiency (kcat/Km) and affinity (Km = 0.61 mM) to this substrate than YigL (HAD16) and Cof (HAD18) (Table 2), suggesting that in vivo YniC might be the principal phosphatase responsible for the hydrolysis of this compound. It has been shown previously that 2-deoxyglucose is taken up by E. coli and phosphorylated to yield 2-deoxyglucose-6-P (69). The E. coli yniC deletion mutant was much more sensitive to the presence of 2-deoxyglucose in the growth medium (IC50 = 0.02 mM) than the wild-type strain (IC50 = 0.59 mM) (Fig. 3). By contrast, the YniC-overproducing strain grew well even in the presence of 20 mM 2-deoxyglucose (Fig. 3). These results indicate that the 2-deoxyglucose 6-phosphatase activity of YniC (HAD1) plays an important role in the resistance of E. coli to 2-deoxyglucose and suggest that the in vitro activities of HADs toward their preferred and at least some of the secondary substrates reflect their functionally relevant in vivo activities.
Substrate Specificities, Sequence Similarity, and Evolutionary Relationships among HADs-The availability of substrate profiles for the full complement of soluble E. coli HADs and the kinetic parameters for most of them allowed us to examine the relationship between protein evolution within a large family of enzymes and evolution of their substrate preferences. To this end, we first constructed sequence-based phylogenetic trees of E. coli HADs using five methods, namely neighbor-joining, least squares, maximum likelihood obtained by local optimization of a minimum evolution tree, quartet puzzling, and Bayesian inference (see "Experimental Procedures"). These methods yielded very similar phylogenetic trees; the maximum likelihood tree is shown in Fig. 4A. Most of the HADs fell into four well supported clusters of paralogs, which can be designated the YieH (HAD3), YbhA (HAD14), HisB, and YihX (HAD4) subfamilies after their representative members, whereas the phylogenetic affinities of several HADs (e.g. OtsB or YrbI) were less certain (Fig. 4A). Notably, however, each of the subfamilies includes at least one sugar phosphatase, suggesting that this specificity might be ancestral in the HAD superfamily.
In addition to the traditional, sequence-based phylogenetic tree (Fig. 4A), we constructed a neighbor-joining tree on the basis of the matrix of quadratic distances between the catalytic efficiencies (kcat/Km) of the HADs (Fig. 4B). In general, although the sequence-based trees reconstructed using different methods show remarkable agreement, the topology of the activity-based tree showed no congruence with that of the sequence-based phylogenetic trees (Fig. 4, compare A and B). This lack of compatibility of the tree topologies suggests that many of the secondary substrate specificities could be of lesser functional importance and might evolve (largely) neutrally.
The present work might offer insights into even more ancient history of the HAD hydrolases and their evolutionary progenitors. We showed that small intracellular phosphodonors (acetyl-P, carbamoyl-P, imido-di-P) are universal substrates for all E. coli HADs (as well as for HADs from other organisms) but not for any of the other phosphatases we were able to test (Table  1). This functional similarity meshes well with the monophyly of the HADs and the receiver domain superfamily, which together form a distinct branch in the evolutionary tree of the Rossmann fold domains (70), because the receiver domains can autophosphorylate using the same small phosphodonors. Conceivably, the common ancestor of the HADs and the receiver domains was a broad specificity phosphatase that could use as substrates, at least, small phosphodonors and various phosphorylated sugars (Fig. 5). Subsequently, after duplication, the evolving HAD and receiver domain families apparently retained different subsets of the multiple functions of the ancestor (71, 72) (subfunctionalization), which, in some cases, was followed by neofunctionalization (i.e. emergence of new specificities in some of the paralogs) (73). Thus, the receiver domains retained the ability to hydrolyze small phosphodonors and evolved the activity against another N-P bond that occurs in the phosphohistidine of phosphorylated histidine kinases (Fig. 5). In contrast, the HADs expanded, through a series of duplications, the repertoire of phosphatase activities against a broad variety of metabolites, as demonstrated by their broad substrate profiles (Fig. 5). Conceivably, the more specialized phosphatases like OtsB, SerB, or Gph were locked into their narrow specificities at a later stage of evolution. Dehalogenase, phosphonatase, and protein phosphatase activities observed in several HADs from other organisms (7,9,10,74), but not found in E. coli, represent further substrate specialization of the HADs. 
The present findings on the substrate profiles of the HAD superfamily phosphatases emphasize the enormous and still largely unexplored versatility of enzymes that can evolve within the same structural scaffold through duplication, subfunctionalization, and neofunctionalization (75).

DISCUSSION
Our results demonstrate that HADs represent the largest family of small molecule phosphatases in E. coli. Twelve phosphatase activities catalyzed by the E. coli HADs have no EC number assigned (supplemental Table 3). Two HADs are involved in lipopolysaccharide biosynthesis: YrbI (3-deoxy-D-manno-octulosonate 8-phosphatase) and YaeD (GmhB, DD-heptose 1,7-bisphosphatase) (15, 76). The phosphoglycolate phosphatase Gph has been recently shown to be involved in the degradation of 2-phosphoglycolate produced during DNA repair (14). Data base annotations suggested that SerB (HAD9) is a phosphoserine phosphatase involved in the biosynthesis of serine, whereas OtsB (HAD17) is a trehalose 6-phosphatase that plays an important role in the resistance of E. coli to various shocks (heat, cold, osmotic, oxygen) (1, 77). The present work directly shows that these proteins, indeed, possess the respective phosphatase activities (Fig. 1, Table 2). Additionally, we demonstrate that 2-deoxyglucose 6-phosphatase activity of YniC (HAD1) plays an important role in the resistance of E. coli cells to 2-deoxyglucose (Fig. 3). Similar detoxification functions can be suggested for other promiscuous E. coli HADs (YfbT, YbiV, YidA, YjjG, YihX, and YigL).
[Fig. 4 legend] For each protein, the preferred substrates are shown. A, a sequence-based phylogenetic tree created using the maximum likelihood (local optimization of minimum evolution) method. Branches that are common between the phylogenetic trees reconstructed with different methods (see "Experimental Procedures") are indicated with thick lines. B, a neighbor-joining tree constructed from the matrix of quadratic distances between the catalytic efficiencies (kcat/Km) of HADs (Table 2).
Recently, the human HAD phosphatase MDP-1 has been proposed to be involved in glycation repair to free proteins from the glycation products derived from direct nonenzymatic glycosylation (glycation) of proteins by aldose phosphates (glucose-6-P, ribose-5-P, and erythrose-4-P) (78). This enzyme dephosphorylates lysozyme glycated with glucose-6-P, converting it to a substrate for fructosamine-3-kinase, the next enzyme of the glycation repair cascade (78). Nonenzymatic glycation of proteins and chromosomal DNA has been demonstrated in E. coli (79, 80). Therefore, we can suggest that several E. coli HAD phosphatases (YniC (HAD1), YfbT (HAD2), YbiV (HAD12), YidA (HAD13), and YigL (HAD16)) might also be involved in glycation repair by direct dephosphorylation of phosphoglycated proteins or DNA or by preventing the intracellular concentrations of the phosphorylated aldoses from reaching deleterious levels.
One of the principal findings of this work is that nearly all HADs can hydrolyze a wide range of phosphorylated molecules present in bacterial cells and that some of these reactions are predicted to be of physiological significance, based on known intracellular substrate concentrations and the enzyme properties that we determined. The recent, complementary work by Thornton and co-workers (27) on ligand selectivity and competition between enzymes in silico suggests that cellular metabolites also have significant promiscuity in the recognition of enzymes. Thus, the specificity of the enzyme-substrate recognition in vivo is, in general, achieved by a combination of enzyme and substrate selectivity. The broad occurrence of natural substrate promiscuity in different classes of enzymes (phosphatases, acetyltransferases, kinases, epimerases, esterases, and amidohydrolases) strongly suggests that this feature is important in vivo. Thus, systems biologists must take this property of enzyme/substrate interactions into account when modeling cellular metabolic pathways and networks.
A fundamental question is whether the secondary activities of HADs (and other promiscuous enzymes) are biologically relevant and maintained by selection or represent functionally neutral variations. The lack of congruence between the phylogenetic tree of the HADs and the clustering of their activities (kcat/Km or substrate profiles) (Fig. 4) seems to be most compatible with the latter possibility. However, even if most of the secondary specificities are not directly maintained by selection, the breadth of the substrate spectra of HADs might be crucial in evolutionary terms, since they supply the pool from which new specificities could evolve. The ability of HADs to hydrolyze different types of bonds and their substrate promiscuity described here might have contributed to the evolution of new enzymatic specificities involved in the regulation of bacterial metabolism and environmental response.