The Legionella pneumophila effector Ceg4 is a phosphotyrosine phosphatase that attenuates activation of eukaryotic MAPK pathways

Host colonization by Gram-negative pathogens often involves delivery of bacterial proteins called “effectors” into the host cell. The pneumonia-causing pathogen Legionella pneumophila delivers more than 330 effectors into the host cell via its type IVB Dot/Icm secretion system. Collectively, these proteins establish a replicative niche from which Legionella can recruit cellular materials to grow while evading the lysosomal fusion that would otherwise inhibit its growth. Using a combination of structural, biochemical, and in vivo approaches, we show that one of these translocated effector proteins, Ceg4, is a phosphotyrosine phosphatase harboring a haloacid dehalogenase–hydrolase domain. Ceg4 could dephosphorylate a broad range of phosphotyrosine-containing peptides in vitro and attenuated activation of MAPK-controlled pathways in both yeast and human cells. Our findings indicate that L. pneumophila's infectious program includes manipulation of phosphorylation cascades in key host pathways. The structural and functional features of the Ceg4 effector unraveled here provide the first insight into its function as a phosphotyrosine phosphatase, paving the way to further studies into L. pneumophila pathogenicity.

Disruption of effector translocation (1-3) results in attenuation of intracellular growth, highlighting the essential role of effectors in pathogen-host interactions. Despite significant progress in identification of effector arsenals in bacterial genomes and in defining the molecular functions of individual effectors, many remain uncharacterized, necessitating further studies into their role in pathogenesis. The composition and number of effectors vary dramatically between different pathogens. For example, plague and intestinal disease-causing Yersinia species encode fewer than 10 effectors delivered by the type III secretion system (4), whereas the similar secretion system in Shigella spp. is involved in translocation of close to 30 effectors (5). Notably, sequence-related effectors are found in pathogens with very diverse invasion strategies, suggesting that these effector families are involved in common host manipulation tactics.
The genome of Legionella pneumophila, a causative agent of severe pneumonia in humans, encodes over 330 effector proteins that are translocated via the defect in organelle trafficking/intracellular multiplication (Dot/Icm) type IVB secretion system (e.g. Refs. 6-11 or, for a recent review, see Ref. 12). Legionella's effectors account for more than 10% of its proteome (13) and represent the largest effector set known to the bacterial world. In their natural habitat of fresh water reservoirs, Legionella spp. invade diverse amoebae by preventing formation of the phagolysosome (14) followed by modification of the newly coopted compartment into an organelle ideal for intracellular replication of the bacteria, the Legionella-containing vacuole (15). The ability to apply the same invasion strategy to invade human alveolar macrophages raises the intriguing possibility that Legionella's effectors target host processes that are conserved between distant eukaryotic phyla. The sheer number of Legionella effectors and their apparent functional redundancy make their functional characterization particularly challenging (16).
This work was supported in part by National Institutes of Health Grant GM094585 (to A. S.) through the Midwest Center for Structural Genomics. The authors declare that they have no conflicts of interest with the contents of this article. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. This article contains Figs. S1-S6.
Studies into the function of bacterial effectors have suggested that these pathogenic factors demonstrate unparalleled abilities to manipulate a wide spectrum of the host cell's processes including facilitating alterations in cytoskeletal rearrangement (17,18), vesicular trafficking (19,20), signal transduction (21,22), and transcription regulation (23,24). To achieve these effects, effectors are often involved in post-translational modification (PTM) of specific host proteins. A key regulation mechanism in both eukaryotic and bacterial cells, PTMs typically involve enzymatic covalent modification of targeted proteins at specific residues that affects that protein's activity, localization, or interactions, thus triggering a change in protein function. Effector proteins can possess not only the PTM activities found in the bacterial world but also ostensibly exclusively eukaryotic PTM activities. Of particular prevalence, bacterial effectors have evolved to mimic the activity of ubiquitin protein ligases, which control the final step in the eukaryote-specific PTM involving attachment of the ubiquitin polypeptide to targeted proteins, usually resulting in this protein's degradation by the proteasome (25). Effectors with this PTM activity have been identified in the arsenals of many bacterial pathogens including Escherichia coli (26,27), Shigella flexneri (24,28), Salmonella spp. (26,29), and L. pneumophila (30,31).
The most common PTM in eukaryotic cells is phosphorylation, in which a phosphate group from a donor molecule such as ATP is transferred onto the hydroxyl group of a serine, threonine, or tyrosine residue of the targeted protein (32,33). This PTM is catalyzed by kinases, and the human genome encodes several hundred protein kinases divided into tyrosine- and serine/threonine-specific enzymes (34). Phosphorylation is involved in most if not all known human cell processes.
One of the best studied examples of phosphorylation-controlled signaling is mediated by mitogen-activated protein kinases (MAPKs). MAPKs are highly conserved Ser/Thr protein kinases that have been extensively studied for their central roles in mediating signal transduction of extracellular stimuli to the appropriate biological response (35). MAPKs are activated by dual phosphorylation of threonine and tyrosine residues in a Thr-X-Tyr motif located in their activation loops by upstream kinases (MAPK kinases (MAPKKs)) (36). In turn, MAPKs are responsible for phosphorylating and activating downstream MAPK-activated protein kinases (MAPKAPKs or MKs). These downstream targets elicit activation of processes including regulation of stress response, proliferation, differentiation, and apoptosis. Their control of processes highly relevant to bacterial infections and their careful regulation and conservation among eukaryotes have made MAPKs attractive targets for bacterial effectors.
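The Thr-X-Tyr activation motif described above can be located in a peptide sequence with a simple pattern search. The following is a minimal sketch, not part of the original study: the function name `find_txy_motifs` is hypothetical, and the two example sequences are the yeast Hog1 and Fus3 activation-loop peptides reported in the Results (QMTGpYVSTR and GMTEpYVATR), written here without the phosphate mark.

```python
import re

# Thr-X-Tyr in one-letter code: T, any single residue, Y.
# A lookahead is used so that overlapping matches would also be reported.
TXY_PATTERN = re.compile(r"(?=(T[A-Z]Y))")


def find_txy_motifs(sequence):
    """Return a list of (0-based start index, motif) for each Thr-X-Tyr match."""
    return [(m.start(), m.group(1)) for m in TXY_PATTERN.finditer(sequence.upper())]


# Activation-loop peptides quoted in the article (phosphate mark omitted).
peptides = {"Hog1": "QMTGYVSTR", "Fus3": "GMTEYVATR"}

for name, seq in peptides.items():
    print(name, find_txy_motifs(seq))
# Hog1 [(2, 'TGY')]
# Fus3 [(2, 'TEY')]
```

In both peptides the motif sits at the same position, with the tyrosine that the peptide array presents in phosphorylated form as the third residue of the match.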
Although protein phosphatases have not been previously detected among Legionella's complement of effectors (40), precedent for this mechanism clearly exists among effectors of other species: Yersinia pestis YopH is a phosphotyrosine phosphatase capable of removing the pTyr signal through hydrolysis, thereby muting its activity (41), whereas Shigella flexneri OspF is a phosphothreonine lyase that irreversibly dephosphorylates the activating threonine of MAPKs (42). A recent analysis of the effector repertoire in the causative agent of Q fever, Coxiella burnetii, also identified effectors Cbu1676 and Cbu0885 as phosphatases targeting the MAP kinase pathway in the eukaryotic host surrogate Saccharomyces cerevisiae (43). S. cerevisiae possesses five main MAP kinase pathways regulating filamentation, cell wall integrity, sporulation, mating, and hyperosmosis with the latter two pathways, controlled by Fus3 and Hog1, respectively, being most broadly conserved among diverse eukaryotic organisms (44). The closest human homologues of Fus3 and Hog1 are ERK2 and p38. The first indication of Cbu1676 and Cbu0885 function came from in silico analysis that pointed to the presence of the haloacid dehalogenase (HAD)-like domain in these bacterial proteins (43). Proteins containing HAD-like domains have a broad range of activities including dehalogenase, phosphonatase, and phosphomutase activities, and although the majority characterized thus far are phosphatases and ATPases (45,46), protein phosphatase activity has only been observed in eukaryotes (47-50).
All HAD-like domains share a common overall fold featuring a core Rossmann fold, consisting of at least two pairs of α-helices that sandwich the core five-stranded parallel β-sheet in the order "54123," with a squiggle and flap motif at the end of the β1 strand (45,46). The common molecular architecture among HAD superfamily members includes four (I-IV) highly conserved sequence motifs colocalized to the active site. Motifs I and IV feature conserved aspartate residues that coordinate the Mg2+ ion required for catalysis. In addition to squiggle and flap motifs, HAD-like domains typically contain an insertion to the catalytic core domain called a cap domain, which controls active site access and is involved in substrate binding (51-53). Despite these recognizable sequence motifs, significant variation in substrate specificity and activity of the HAD-like protein family necessitates detailed structural analysis and rigorous substrate specificity studies.
In addition to the Coxiella effectors mentioned above, putative HAD-like domains have been identified in the uncharacterized effectors Lpg0096 (also known as "coregulated with the effector encoding genes 4" (Ceg4)), Lpg1101, and Lpg2555 from L. pneumophila (43). Here, using X-ray crystallography and biochemical activity screening, we show that Ceg4 is an atypical HAD-like phosphotyrosine phosphatase able to attenuate the activation of MAP kinases in both human and yeast cells. These results indicate that L. pneumophila facilitates its infectious program by manipulation of phosphorylation cascades of key pathways in its host cells.

Legionella effector Ceg4 demonstrates phosphotyrosine-specific phosphatase activity in vitro
The Dot/Icm-dependent translocation of L. pneumophila Ceg4 protein encoded by the lpg0096 gene was previously demonstrated using CyaA fusion translocation assays (8). Our sequence analysis using Phobius (54) suggested that, in addition to the N-terminal HAD-like domain mentioned above, this effector contains two C-terminal transmembrane (TM) helices (residues 266-289 and 295-320). As no TM signatures were detected in Lpg1101 or Lpg2555, we hypothesize that Lpg0096/Ceg4 may be a member of a functionally diversified family that relies on localization to appropriately direct its activity toward the host. Using data from two recent large-scale comparative genomics studies of Legionella species (55), we performed phylogenetic analysis of Ceg4 with the 33 other putative HAD domain-containing effectors (Fig. 1). Ceg4, Lpg1101, and an additional 12 homologues formed a distinct clade corresponding to Legionella orthologue group LOG_02908, whereas Lpg2555 and five other effectors were localized in a distinct group. Comparative sequence analysis across the Ceg4-containing clade indicated that sequence conservation among these effectors was highest within the HAD-like domain with significant sequence variation in their C-terminal domains. Lani_0822 from Legionella anisa, possessing an additional 50 residues at both the N and C termini, was the sole exception to this observation. Lpg1101 is also unique within LOG_02908, having homology to the phosphatidylinositol 4-phosphate-binding domain of SidM/DrrA-like (56, 57) at its C terminus. Most notably however, eight of the 14 Ceg4 homologues possessed two TM domains in their C-terminal portion, suggesting that localization to the host membrane is an important and common feature to this subset of HAD-like effectors in Legionella.
To confirm the general activity of the Ceg4 HAD-like domain, we purified an N-terminal fragment spanning residues 1-193 of L. pneumophila Ceg4 (see "Experimental procedures" for details). In line with predictions, the Ceg4(1-193) fragment demonstrated robust phosphatase activity against the generic phosphatase substrate p-nitrophenyl phosphate (pNPP) (58). Highest reaction rates were observed between pH 6.5 and 8 with activity dropping markedly above pH 8. Ceg4 also demonstrated a strict requirement for Mg2+ ions (Fig. S1, B, C, and D). Expanding on these results, we tested Ceg4(1-193) for activity against a library of 94 phosphorylated metabolic substrates, allowing for querying a broad range of possible specificities (58). In these assays, Ceg4(1-193) demonstrated the highest activity toward phosphotyrosine (Fig. 2C; see also Fig. S1E). To determine whether Ceg4(1-193) is active against protein substrates, we screened for its ability to remove the phosphate group from a selection of 53 phosphopeptides chosen for their importance to signaling in S. cerevisiae, which has been successfully used as a model eukaryotic system in characterization of bacterial effector functions including those from Legionella (30,59). This set included pSer-, pThr-, and pTyr-containing sequences. Consistent with our previous results, Ceg4(1-193) demonstrated robust phosphatase activity against nine diverse peptides comprising the full pool of pTyr-containing sequences in this substrate array (Fig. 1E and Fig. S1F) and, moreover, at levels 4-5 times higher than for pSer or pThr peptides. Combined, these data showed that Ceg4 is a phosphotyrosine-specific phosphatase active against peptide substrates.
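The "4-5 times higher" comparison in the peptide screen amounts to a ratio of mean activities grouped by phosphoresidue type. The sketch below illustrates that arithmetic only; the function names and every activity value in `readings` are hypothetical placeholders chosen for illustration and are not data from the study.

```python
from statistics import mean


def mean_activity_by_residue(readings):
    """Average activity per phosphoresidue type from (type, activity) pairs."""
    grouped = {}
    for residue_type, activity in readings:
        grouped.setdefault(residue_type, []).append(activity)
    return {residue_type: mean(values) for residue_type, values in grouped.items()}


def fold_selectivity(means, target="pTyr"):
    """Ratio of the target mean to the average of all other group means."""
    others = [value for residue_type, value in means.items() if residue_type != target]
    if not others:
        raise ValueError("need at least one non-target group for comparison")
    return means[target] / mean(others)


# Hypothetical, arbitrary-unit activities (NOT measurements from the article).
readings = [
    ("pTyr", 0.90), ("pTyr", 1.10),
    ("pSer", 0.20), ("pSer", 0.24),
    ("pThr", 0.18), ("pThr", 0.26),
]

means = mean_activity_by_residue(readings)
print(round(fold_selectivity(means), 2))
```

With these placeholder values the ratio falls between 4 and 5, which is the kind of figure the statement in the text summarizes.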

Crystal structure of HAD-like domain provides molecular insight into phosphatase activity of Ceg4
To gain further insight into the molecular function of Ceg4, we determined the crystal structure of the Ceg4(1-208) fragment to 1.88 Å by the single-wavelength anomalous dispersion method (see Table 1 for X-ray crystallographic statistics). The final structural model spanned Ceg4 residues 1-204 and a portion of the N-terminal fusion tag sequence (GQENLYFQG) corresponding to the TEV protease cleavage site (Fig. 3A). In addition to a core Rossmann-like fold, the HAD-like domain of Ceg4 included a cap subdomain consisting of three α-helices and two long loops inserted between the Asp9-X-Asp11 "squiggle motif" and the α1 helix (Fig. 2, A and B). The location of the cap subdomain insertion in relation to conserved motifs I and II of the core Rossmann fold classified it as a C1 cap (46). Such C1 caps were previously identified in cytosolic 5′-nucleotidase III (cN-III) (60), Eya2 protein-tyrosine phosphatase (61), and MDP-1 sugar phosphatase (62,63). However, according to our analysis, the Ceg4 cap subdomain did not show any significant structural similarity with these or any other C1 caps of structurally characterized HAD-like domains (Fig. S2).
In addition to overall structural similarity to other Rossmann-like folds, the core fold of Ceg4 possessed several conserved features consistent with the canonical motifs of other HAD-like phosphatases, such as Asp9 and Asp11 (motif I), Thr103 (motif II), Lys135 (motif III), and Asp157 and Asp158 (motif IV) (Fig. 3C). Collectively, these residues formed a small pocket ~350 Å³ in volume with Asn20 from the cap subdomain forming a "lid" over the pocket (Fig. 3D). Inspection of the active site appears to confirm the correct positioning of Asp9 and Asp11 for their putative functions as the nucleophile and general acid/base residues, respectively. The active site also contained a Mg2+ ion that was coordinated by Asp9, Asp158, and three ordered water molecules; the position of the magnesium ion is conserved with those of mono- and divalent ions bound to other HAD-like phosphatases (60-62). The active site also contained a Cl− ion associated with the side chain of Lys135, the backbone of Lys104, and two ordered water molecules; its position is conserved with the position of sulfate or phosphate ions trapped in the active site of other crystallized HAD-like phosphatases (60-62). Finally, the active site also contained a highly coordinated water that interacted with Phe8, Asp9, Asp11, Thr103, and the Cl− ion.
To confirm the involvement of these residues with Ceg4 catalytic activity, we performed site-directed mutagenesis and tested the resultant Ceg4(1-193) mutants in vitro for activity toward pNPP and pTyr substrates as described above. In accordance with their predicted contributions to catalysis, the D9A, D11A, D11N, D157A, D158A, and D162A mutations abrogated phosphatase activity, validating their essentiality for the catalytic function of this protein (Fig. S3).
Inspection of the molecular packing in the Ceg4(1-208) crystal lattice did not reveal extensive contacts indicative of oligomerization. However, by size exclusion chromatography (Fig. S4A), we observed a mixture of monomeric and dimeric species. Consistent with this, we observed that the nine resolved residues of the N-terminal fusion tag comprising the TEV protease cleavage sequence (Gly−8-Gln−7-Glu−6-Asn−5-Leu−4-Tyr−3-Phe−2-Gln−1-Gly0) contacted the active site of the adjacent molecule in the crystal lattice (Fig. S4B). This peptide adopted an extended, nearly β-strand-like conformation (Fig. S4C). Notably, the tyrosine residue in the tag sequence was bound deeply in the active site pocket with its hydroxyl group forming a network of interactions including hydrogen bonds with the side chain of Asp11, the bound Cl− ion, and two ordered water molecules (Fig. S4C). Other interactions included hydrogen bonds between the side chain of Gln−1 of the tag and Glu40, between the backbone amide of the Phe−2 residue of the tag and Asn20 of the C1 cap, and between the side chain of Lys104 and the backbone carbonyl of Gln−6; in addition, Arg−7 of the tag formed two interactions (with Glu165 and the backbone carbonyl of Gly133). The C1 cap also interacts with the fusion tag peptide via hydrophobic interactions between Tyr43 and Tyr−3 and between Phe24 and Gln−1 and a hydrogen bond between Glu40 and Gln−1.
[Figure caption, partial] See Fig. S1E for the full list of tested substrates. Error bars show 95% confidence intervals from triplicate data. D, kinetics were obtained for the generic phosphatase substrate pNPP. Error bars denote S.E. values. E, Ceg4 dephosphorylates all tested phosphotyrosine-containing peptides at levels 4-5 times higher than phosphoserine- or phosphothreonine-containing peptides in vitro. Error bars denote 95% confidence intervals from triplicate data. See also Fig. S1E for all tested peptides. aa, amino acid; alc, alcohol; sug, sugar; ald, aldehyde; DHAP, dihydroxyacetone phosphate.
Given Ceg4 specificity toward pTyr peptides, we hypothesized that the observed binding of the tyrosine from the N-terminal fusion tag may be representative of Ceg4 interactions with its substrate. To examine this further, we modified the N-terminal fusion TEV cleavage sequence to match the sequence of one of the pTyr-carrying peptides, Gln−7-Met−6-Thr−5-Gly−4-Tyr−3-Val−2-Ser−1-Thr0, identified as a Ceg4 substrate in our peptide array screening and representing the activation loop sequence of the yeast Hog1 MAP kinase (64). Hog1 is involved in a signaling pathway regulating yeast hyperosmotic adaptation (64) and is a close homologue of human p38 MAP kinase, whose pathway was previously implicated in Legionella pathogenesis (38, 39, 65). As mentioned previously, Coxiella HAD-like effectors efficiently modulated the activity of other yeast MAP kinases, prompting us to hypothesize that our in vitro activity results may also be indicative of Ceg4 activity against MAP kinases. The structure of the tag-modified Ceg4(1-208) HOG1p fragment was solved to 1.9 Å by molecular replacement (Table 1). This crystal structure was almost identical to the original Ceg4(1-208) TEV site structure described above and superimposed with an r.m.s.d. of less than 0.3 Å across the entire protein backbone. In this crystal structure, we resolved the positions of the six Hog1-derived residues Thr−5-Gly−4-Tyr−3-Val−2-Ser−1-Thr0 from the modified fusion tag. As with the Ceg4(1-208) TEV site structure, the crystal packing of Ceg4(1-208) HOG1p showed that the N-terminal peptide interacted with the active site of the adjacent Ceg4(1-208) HOG1p molecule (Fig. 4A) with the Tyr−3 residue bound deeply in the pocket (Fig. 4B). The HOG1p peptide also adopted an extended β-strand-like conformation, and its structure was strikingly similar to that of the TEV cleavage site peptide, especially across residues −5 through 0 (r.m.s.d. of 0.5 Å over the six matching Cα atoms) (Fig. 4C). As we could resolve only a shorter region of this peptide, we identified fewer interactions between this peptide and Ceg4(1-208); those were similar to the interactions observed in the Ceg4(1-208) TEV site structure (i.e. the interactions between Tyr−3 and the active site and between Asn20 and the backbone amide of residue −2 of the tag and hydrophobic interactions between Tyr43 and Tyr−3 and between Phe24 and residue −1).
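The r.m.s.d. values quoted above summarize how far matched atoms lie from each other once two structures have been superimposed. The following is a minimal sketch of that calculation, assuming the two coordinate sets are already superimposed and listed in matching order; it does not perform the superposition itself, and the coordinates in the example are hypothetical placeholders rather than atoms from the deposited structures.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation (same units as input) between paired 3D points.

    Assumes both lists are in the same atom order and already superimposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Hypothetical example: six C-alpha positions spaced 3.8 Angstrom apart,
# and a second copy shifted by 0.5 Angstrom along y.
peptide_one = [(3.8 * i, 0.0, 0.0) for i in range(6)]
peptide_two = [(3.8 * i, 0.5, 0.0) for i in range(6)]

print(rmsd(peptide_one, peptide_two))  # 0.5
```

Structural biology toolkits typically combine this step with an optimal rigid-body fit; the sketch isolates only the final distance statistic.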
Next, we compared the position of the tyrosine from the expression tags and the configuration of the Ceg4(1-208) active site with other structurally characterized HAD phosphatases. This analysis showed that the fusion tag tyrosine hydroxyl group adopts a position that is 3.4 Å from the bound chloride ion, which itself occupied the same general position as phosphate or phosphate analogues (i.e. phosphate bound to E. coli YrbI (Protein Data Bank code 3I6B (66)) or beryllium trifluoride bound to Eya2 (Protein Data Bank code 3HB0 (61))) (Fig. S2B). The position of the Ceg4(1-208)-bound magnesium ion was also conserved with those of other HAD-like phosphatases (Fig. S2B). This analysis indicates that the active site configuration observed in the Ceg4(1-208) crystal structures is a good approximation of the position of a phosphotyrosine substrate bound to this enzyme.
Overall, our structural analysis revealed a compact active site in the HAD-like domain of Ceg4(1-208) restricted by a unique cap motif. This active site is able to accommodate the phosphotyrosine residue from a peptide substrate, the position of which can be gleaned from the conformation of the fusion tag in the Ceg4(1-208) crystal lattice.

Ceg4 shows activity against Hog1 and Fus3 MAP kinases in S. cerevisiae
According to our phosphopeptide library screening, Ceg4(1-193) demonstrated equally robust activity against peptides QMTGpYVSTR and GMTEpYVATR (where pY is phosphotyrosine) representing the activation loops of yeast Hog1 and Fus3 MAP kinases, respectively (Fig. 1C). To test whether in vitro activity of the Ceg4(1-193) fragment against yeast MAP kinase phosphopeptides is representative of in vivo activity of this effector, we tested the ability of full-length Ceg4 to affect the activation of Hog1- and Fus3-controlled pathways in a yeast model system. For this, we used two S. cerevisiae strains engineered to express a fluorescent reporter protein (Stl2-BFP or Fus1-GFP) upon activation of Hog1- or Fus3-controlled pathways, respectively (see "Experimental procedures" for details). We expressed full-length Ceg4(1-397) or the Ceg4(1-208) fragment in these S. cerevisiae strains and measured the overall fluorescence signal from 10,000 cells. S. cerevisiae strains overexpressing Ceg4 showed significant reduction of activation of both Hog1- and Fus3-activated pathways (42 and 56%, respectively) as compared with the control strain carrying an empty vector (Fig. 5A). The strain expressing the Ceg4(1-208) fragment showed a reduced ability to suppress MAPK activation in the case of Fus3-controlled activation compared with the strain expressing the full-length effector. In contrast, the expression of the Ceg4(1-208) fragment resulted in a further decrease in Hog1-controlled activation compared with the strain expressing the full-length effector. Based on these results, Legionella Ceg4 is implicated in regulation of host MAPK-controlled pathways as demonstrated by reduced expression of fluorescent reporters for both Fus3 and Hog1. In addition, our data suggested that Ceg4 activity against the Fus3 pathway is dependent on the C-terminal region of the effector that contains the membrane-spanning elements, pointing to the potential role of this domain for in vivo specificity of Ceg4.
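The reductions in pathway activation reported above (42 and 56%) are relative to the empty-vector control. As a minimal sketch of that normalization, the function below computes the percent reduction of a reporter signal; the function name and the fluorescence values are hypothetical, chosen only so that the arithmetic reproduces the quoted percentages, and are not measurements from the study.

```python
def percent_reduction(control_signal, sample_signal):
    """Percent decrease of a reporter signal relative to a control signal."""
    if control_signal <= 0:
        raise ValueError("control signal must be positive")
    return 100.0 * (control_signal - sample_signal) / control_signal


# Hypothetical mean fluorescence values in arbitrary units (illustration only).
empty_vector = 1000.0
hog1_reporter_with_ceg4 = 580.0
fus3_reporter_with_ceg4 = 440.0

print(percent_reduction(empty_vector, hog1_reporter_with_ceg4))  # 42.0
print(percent_reduction(empty_vector, fus3_reporter_with_ceg4))  # 56.0
```

A negative return value would indicate that the sample signal exceeded the control, that is, increased rather than reduced activation.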
Next, to test the link between Ceg4 phosphatase activity and MAPK regulation, we probed the effect of individual Ceg4 active site residue substitutions on the ability of this effector to dampen the activation of Hog1-controlled pathways in yeast. The Ceg4 residues targeted by this analysis were chosen based on their direct involvement in phosphatase catalytic activity (Asp11, Lys135, Asp158, and Asp162) and their participation in forming the active site by the cap subdomain (Lys17, Ser18, Asn20, Val23, Phe24, Glu26, and Tyr43) and the core HAD-like domain (Lys104, Glu108, Glu131, Thr161, and Asn186) (Fig. 5, B and C).
In line with our in vitro activity results, alanine substitution of the conserved Asp11, Lys135, Asp158, and Asp162 residues directly involved in catalytic activity of the HAD-like domain resulted in abrogation of Ceg4 suppression of the hyperosmotic stress response. This observation confirmed our hypothesis that Ceg4-triggered dampening of MAPK activation is directly linked to the phosphatase activity of its HAD-like domain. Substitution of active site pocket residues Val23, Phe24, Tyr43, and Lys104 also had a negative impact on the ability of Ceg4 to control MAPK activation with the V23D substitution resulting in complete loss of activity. In contrast, substitutions of Ceg4 Lys17, Glu26, Glu108, Glu131, and Thr161, which are located distally from the active site, and Asn20, which partially covers the entrance to the catalytic pocket, did not significantly affect its activity against the Hog1-activated pathway. These observations are consistent with previously observed mechanisms of catalysis of HADs and furthermore indicate the involvement of certain cap domain residues in substrate recognition. Combined, these results clearly linked Ceg4's in vivo activity as a regulator of MAPK pathways with its HAD-like domain phosphatase active site and identified individual residues involved in catalysis and substrate interactions.

Ceg4 localizes to HeLa endoplasmic reticulum and attenuates MAPK p38 activation in vivo
Having determined that Ceg4 is able to act on conserved eukaryotic MAP kinases and also established a possible role of the C-terminal region of the Ceg4 effector in this activity, we were interested in identifying the subcellular localization of Ceg4 in human cells and in determining whether the MAPK-dampening activity extended to human MAP kinases. To that end, human HeLa cells were transfected with constructs expressing wildtype Ceg4, the catalytically inactive variant Ceg4 D9A, and the Ceg4(1-207) and Ceg4(208-397) fragments, each fused to GFP. In keeping with the presence of the predicted transmembrane regions in the C-terminal portion of Ceg4, the full-length protein (both wildtype and D9A mutant) and the Ceg4(208-397) fragment demonstrated specific perinuclear localization, whereas the construct Ceg4(1-207), which lacks the C-terminal region containing the TM domain, showed diffuse localization throughout the cell (Fig. 6A). Given the differential importance of the C-terminal domain to dampening S. cerevisiae pheromone and hyperosmolarity responses, we posit that this portion of Ceg4 is critically important to its function in the host cell. We therefore sought to more specifically determine the subcellular localization of Ceg4. In HeLa cells transfected with N-terminally GFP-tagged Ceg4 D9A and counterstained with ER-Tracker Red dye, Ceg4 colocalized predominantly with endoplasmic reticular structures.
With both structural and biochemical analyses indicating MAPK activation loop sequences as potential targets of Ceg4 phosphatase activity, we also tested its ability to perform this function on the closest human homologue of Hog1, p38 MAPK. HEK293T cells transfected with either Ceg4 or Ceg4 D9A were tested for changes in the phosphorylation state of human p38 MAPK upon stimulation with either TPA or anisomycin. Western blot analysis for both total p38 and phospho-p38 (Fig. 6C) showed that, although the total levels of p38 are the same in cells expressing the wildtype and catalytically inactive mutant, cells harboring wildtype Ceg4 demonstrated a significant reduction of signal corresponding to phosphorylated p38 compared with the same signal in cells carrying the Ceg4 D9A mutant. Combined, our structural and functional analyses have identified Legionella Ceg4 as a bacterial HAD protein-tyrosine phosphatase that is able to attenuate the MAP kinase responses in both human and yeast cells in vivo and does so via removal of the phosphate moiety from phosphotyrosines in their activation loops that are critical to their activation.

Discussion
Translocation of effector proteins inside the host cell is an important and common strategy adopted by many Gram-negative bacteria including important human pathogens. This makes the functional characterization of specific effectors a necessary step in understanding host-pathogen interactions and in developing novel antibacterial therapies. Here, we show that the conserved Legionella effector Ceg4 can modulate the phosphorylation state of eukaryotic MAP kinases through its HAD-like phosphatase domain, and we clarify the molecular structure of this domain, providing key molecular details about this effector.
Ceg4 is one of three predicted HAD-like domain-containing effectors in the L. pneumophila genome that are known to be translocated by the Dot/Icm system. According to our analysis, HAD-like effectors are also found in other Legionella species, suggesting that this functional domain is widely used by these pathogens for manipulation of host signaling pathways. Despite commonality among HAD-like effectors, Ceg4 represents a sequence-distinct group containing eight Legionella effectors that feature the combination of an N-terminal HAD-like domain and a C-terminal region carrying two transmembrane helices. The C-terminal region of the Ceg4 effector plays an important part in interactions of this effector with eukaryotic MAP kinases as deletion of this region had a significant effect on the ability of this effector to dampen the signaling by the yeast Fus3p MAPK. Furthermore, alanine substitution of catalytic residues Asp11 and Asp158 in the Ceg4 HAD-like domain active site abrogated phosphatase activity and the ability to dampen MAPK activation. Combined with the broad substrate specificity of the Ceg4 HAD-like domain toward phosphotyrosine peptides demonstrated by our in vitro assays, this observation prompted us to suggest that the C-terminal region may be responsible for tailoring the general phosphatase activity of Ceg4 toward its specific host target.
The crystal structure of the Ceg4(1-208) fragment showed strong similarity with previously structurally characterized members of the HAD-like protein family including specific features of the active site such as conservation of key catalytic residues and coordination of a Mg2+ ion, known to be essential for catalysis (51). Probing the Ceg4 active site cavity with site-directed mutagenesis not only confirmed the role of residues predicted to be directly involved in catalysis but also revealed the subset of residues important for Ceg4 activity against MAP kinase substrates. Substitution of residues such as Asp11, which acts as a general acid/base, drastically reduced or completely abrogated the activity of this effector against Fus3 MAPK in a yeast model system. Notably, previous prediction of specific residues important for Ceg4 activity based on similarity to Coxiella HAD-like effectors (43) was only partially confirmed by our structure and subsequent mutagenesis. Specifically, key residues of motifs II and III were previously predicted to correspond to Ser81 and Lys111. However, our structural analysis pointed instead to Thr103 and Lys135 fulfilling this role. This observation highlights the limits of primary sequence-based analysis applied to highly diverse HAD-like proteins and effectors in general and reiterates the need for molecular and structural data when addressing functional diversity across large protein families.
The substrate specificity of HAD-like domains is often defined by the "cap" subdomain insertion that controls access to the catalytic center (67). The Ceg4 HAD-like domain structure features an unusual α-helical cap motif never before described for this domain. This novel cap motif is compatible with the broad specificity of this domain against phosphotyrosine peptides, as indicated by our observation that the tyrosine residue from two different N-terminal tag sequences is able to make intimate contacts with the Ceg4 active site of a neighboring molecule in two different crystal lattices. The specific position of the tyrosine residue in the Ceg4 active site is compatible with the position of trapped phosphate/phosphate analogue substrates in other structurally characterized HAD-like phosphatases, suggesting that this crystallization observation may indeed be representative of the interaction between the Ceg4 HAD-like domain and its phosphoprotein substrate.

Legionella phosphotyrosine phosphatase activity toward MAPKs
Effectors have been demonstrated to target all strata of the MAPK-controlled signaling pathways (MAPKKK, MAPKK, MAPK, and MKs) using several differing enzymatic activities (68,69). Characterization of Ceg4 phosphatase activity against yeast and human MAP kinases adds a new member to this growing list of MAPK-modulating factors along with the recently characterized Coxiella HAD-like effectors active against the yeast cell wall integrity MAPK (43). Human HEK293T cells transfected with Ceg4 demonstrated clear reduction of the amount of phosphorylated p38 MAP kinase, and this was compromised by mutation of the key Ceg4 active site residue Asp9. Previous work has shown that MAPK phosphorylation in human cells is increased at very early time points of bacterial challenge by Legionella and that this activation is sustained for some time in an effector-dependent manner (39). Therefore, it is tempting to speculate that Ceg4 would serve a functional role only at significantly later points of infection, and in keeping with this model, RNA sequencing data taken during infection show that Ceg4 transcription levels are low during the postexponential/infectious stage but increase 10-fold during exponential growth (70). An additional possibility, given the C terminus-dependent localization to the endoplasmic reticulum membranes and the loss of the ability of Ceg4 to modulate Fus3 activation without this domain, is that Ceg4 acts to reduce specific MAPK activation in a localized manner while allowing general activation throughout the rest of the cell. Characterization of the structural and functional features of the Ceg4 effector presented here provides the first insight into the function of this and other HAD-like effectors during Legionella infection, paving the way to further studies into this bacterium's pathogenic strategy.

Protein purification and size exclusion chromatography
Based on domain and transmembrane location predictions, genes corresponding to Ceg4 residues 1-208 and 1-193 were cloned into p15Tv-LIC-TEV by ligation-independent cloning and transformed into E. coli BL21 CodonPlus (DE3) RIPL competent cells using standard procedures. Several mutants of Ceg4 in p15Tv-LIC-TEV (D9A, D11N, D11A, E26A, D157A, D158A, and D162A) along with a variant of Ceg4(1-208) with the TEV sequence "GRQNLYFQG" mutated to match the activation loop sequence from S. cerevisiae Hog1 "PQMTGYVST" were prepared by site-directed mutagenesis. Cultures were grown at 37°C in M9 medium with selenomethionine or LB medium supplemented with kanamycin to an A600 of 0.6-0.8, expression of His6-TEV-tagged Ceg4 was induced by the addition of 0.4 mM isopropyl β-D-1-thiogalactopyranoside, and the temperature of the culture was reduced to 16°C overnight. The following day, cultures were harvested by centrifugation, and pellets were lysed by sonication on ice in 50 mM HEPES (pH 7.5), 150 mM NaCl, and 1 mM PMSF. All further purification was conducted at 4°C. Cell lysate was clarified by ultracentrifugation at 17,000 × g for 30 min, and 1-5 ml of nickel-nitrilotriacetic acid resin (Qiagen) was added and incubated with gentle rotation for 30 min. Resin was washed with 50 mM HEPES (pH 7.5), 150 mM NaCl, and 30 mM imidazole, and protein was eluted with 50 mM HEPES (pH 7.5), 150 mM NaCl, and 500 mM imidazole. Proteins were further concentrated using a centrifugal concentrator, flash frozen in liquid N2, and stored at −80°C. Oligomerization of Ceg4(1-208) was tested by size exclusion chromatography using a Superdex S200 column with running buffer containing 50 mM HEPES (pH 7.5) and 150 mM NaCl.

General enzyme activity screening
General screens for enzyme activity were performed as described previously (58) using Ceg4. Briefly, 20 μl of purified protein (at 0.5 μg/μl) was added to wells of a 96-well plate, and then 180 μl of protease, phosphatase, phosphodiesterase, dehydrogenase, oxidase, NADH/NADPH oxidase, and lipase or 170 μl of thioesterase mixtures (containing buffers, metal cations, and substrates) were added as well as 10 μl of 5,5′-dithiobis(2-nitrobenzoic acid) to the thioesterase mixture. Plates were incubated at 37°C for 1 h. Phosphatase, phosphodiesterase, protease, lipase, and thioesterase results were read at 410 nm, and dehydrogenase, oxidase, and NADH/NADPH oxidase results were monitored at 340 nm. All results were obtained using a Spectramax M2 plate reader.

Natural phosphatase substrate screening
Screens for activity toward naturally occurring phosphatase substrates were performed as described previously (58) using Ceg4. Briefly, two 96-well plates were prepared, one as a blank control and a second for the protein. 10 μl of natural phosphatase substrate at 4 mM (see Table S1 for list) was added to each assay well followed by 150 μl of reaction mixture or reaction mixture containing 2 μg of protein to give final concentrations of 50 mM HEPES-K (pH 7.5), 5 mM MgCl2, 1 mM MnCl2, and 0.5 mM NiCl2. Plates were incubated at 37°C for 30 min. After incubation, 40 μl of malachite green development reagent was added to each well prior to reading the absorbance at 630 nm. Enzyme velocity (in mM phosphate·min−1·mg−1) was calculated based on a KH2PO4 standard curve.
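The conversion from malachite green absorbance to enzyme velocity described above can be sketched as follows. This is a minimal illustration in Python, not the authors' analysis: the standard-curve points, the absorbance readings, and the function names are hypothetical, and a linear A630 response to phosphate over the range of the standards is assumed.

```python
from statistics import mean

def fit_standard_curve(phosphate_mM, a630):
    """Least-squares line a630 = slope * [phosphate] + intercept."""
    x_bar, y_bar = mean(phosphate_mM), mean(a630)
    sxx = sum((x - x_bar) ** 2 for x in phosphate_mM)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(phosphate_mM, a630))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def velocity(a630_sample, a630_blank, slope, minutes, protein_mg):
    """Velocity in mM phosphate per min per mg of protein.

    The blank (no-protein) well is subtracted first, so the intercept
    of the standard curve cancels out.
    """
    released_mM = (a630_sample - a630_blank) / slope
    return released_mM / minutes / protein_mg

# Hypothetical KH2PO4 standards (mM) and their A630 readings.
standards_mM = [0.0, 0.01, 0.02, 0.04, 0.08]
standards_a630 = [0.05, 0.15, 0.25, 0.45, 0.85]
slope, intercept = fit_standard_curve(standards_mM, standards_a630)

# Hypothetical assay well: 30 min incubation with 2 ug (0.002 mg) of protein.
v = velocity(a630_sample=0.65, a630_blank=0.05, slope=slope,
             minutes=30, protein_mg=0.002)
print(f"slope={slope:.2f} A630/mM, velocity={v:.2f} mM/min/mg")
```

Subtracting the matching well of the blank plate before applying the slope mirrors the two-plate layout of the screen; readings outside the range of the standards would need dilution rather than extrapolation.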

Ceg4 metal and pH dependence assays and kinetics using p-nitrophenyl phosphate
Assays for metal requirements were conducted using 20 mM pNPP in 50 mM HEPES-K (pH 7.0), 0.1 μg·ml−1 of purified Ceg4, and the concentration of metals indicated in a final volume of 200 μl. For pH optimizations, reactions were conducted using 20 mM pNPP, 15 mM MgCl2, 0.1 μg·ml−1 Ceg4, and one of the following buffers at 50 mM: MES (pH 6.5), HEPES (pH 7), HEPES (pH 7.5), HEPES (pH 8), CHES (pH 9), CHES (pH 9.5), and CHES (pH 10). For kinetics determinations, pNPP was used at the concentrations specified. Reactions were started by the addition of enzyme and monitored using a Spectramax M2 plate reader at 405 nm for 25 min at 25°C. Data analysis and curve fitting were performed using GraphPad Prism 6.
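For readers without access to Prism, the kind of curve fitting used for such kinetics data can be sketched in plain Python. The sketch assumes the data follow the Michaelis-Menten model (the text does not state which model was fitted) and uses a simple golden-section search in place of Prism's fitting routine; the substrate concentrations, rates, and function names are hypothetical.

```python
def mm_rate(s, vmax, km):
    """Michaelis-Menten rate v = Vmax * [S] / (Km + [S])."""
    return vmax * s / (km + s)

def best_vmax(substrate, rates, km):
    """For a fixed Km the model is linear in Vmax, so solve for it directly."""
    f = [s / (km + s) for s in substrate]
    return sum(v * fi for v, fi in zip(rates, f)) / sum(fi * fi for fi in f)

def sse(substrate, rates, km):
    """Sum of squared residuals at this Km, using the best Vmax for it."""
    vmax = best_vmax(substrate, rates, km)
    return sum((v - mm_rate(s, vmax, km)) ** 2 for s, v in zip(substrate, rates))

def fit_michaelis_menten(substrate, rates, km_lo=1e-6, km_hi=1e3, iters=200):
    """Golden-section search over Km (assumes one minimum inside the bracket)."""
    phi = (5 ** 0.5 - 1) / 2
    a, b = km_lo, km_hi
    c, d = b - phi * (b - a), a + phi * (b - a)
    for _ in range(iters):
        if sse(substrate, rates, c) < sse(substrate, rates, d):
            b, d = d, c
            c = b - phi * (b - a)
        else:
            a, c = c, d
            d = a + phi * (b - a)
    km = (a + b) / 2
    return best_vmax(substrate, rates, km), km

# Hypothetical pNPP concentrations (mM) and noise-free synthetic initial rates.
pnpp_mM = [0.5, 1, 2, 5, 10, 20]
rates = [mm_rate(s, vmax=10.0, km=2.0) for s in pnpp_mM]
vmax_fit, km_fit = fit_michaelis_menten(pnpp_mM, rates)
print(f"Vmax={vmax_fit:.3f}, Km={km_fit:.3f} mM")
```

Because Vmax enters the model linearly, only Km needs a numerical search, which keeps the sketch free of external dependencies. It reports no confidence intervals, which a dedicated fitting package would provide.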

Phosphopeptide phosphatase assay
Phosphatase activity toward a library of 53 phosphopeptide sequences was tested as described previously (46). Briefly, 10 μl of peptide solution was mixed with 150 μl of 50 mM HEPES (pH 7.0), 15 mM MgCl2, and 0.01 μg of streptavidin-binding peptide-purified Ceg4. The reaction was incubated for 10 min at 25°C before addition of 40 μl of malachite green development reagent. Absorbance at 630 nm was recorded. Enzyme velocity (in mM phosphate·min−1·mg−1) was calculated based on a KH2PO4 standard curve. Subsequent kinetics determinations with peptide QMTGpYVSTR were performed using the method above with the peptide concentrations specified in the figure. Data analysis and curve fitting were performed using GraphPad Prism 6.
Diffraction data were collected at 100 K at beamline 19-ID at the Structural Biology Center, Advanced Photon Source at a wavelength of 0.979 Å (selenium peak). All diffraction data were reduced with HKL-3000 (71). The Ceg4(1-208) TEV site structure was solved first by single-wavelength anomalous dispersion phasing using PHENIX.solve (72), which identified all five selenomethionine residues in the asymmetric unit, followed by model building using PHENIX.autobuild. The structure of Ceg4(1-208) HOG1p was determined by molecular replacement using the Ceg4(1-208) TEV site as a search model using PHENIX.phaser. All structures were refined using PHENIX.refine and Coot (73). The final Ceg4(1-208) TEV site model includes the sequence GRQNLYFQG from the TEV site followed by residues 1-204 of Ceg4; the final Ceg4(1-208) HOG1p model includes the sequence TGYVST from yeast Hog1 followed by residues 1-204 of Ceg4 with residues 146 and 147 unmodeled due to poor electron density. B-factors were refined as isotropic for all structures. All geometries were verified with PHENIX.refine and the wwPDB validation server. Structure coordinates were deposited to the Protein Data Bank under accession codes 6AOK and 6AOJ for the Ceg4(1-208) TEV site and Ceg4(1-208) HOG1p structures, respectively. Structural orthologs were identified using the DaliLite server (74). Active site volume was calculated using CastP (75).

Yeast transformation and MAPK pathway activation assay
Ceg4(1-397) or the mutants specified were cloned into pYES2 NT/A, transformed into S. cerevisiae (W303 MATa, bar1::NatR, far1Δ, mfa2::pFus1-GFP, ura3::Kan-pSTL1-BFP, his3, trp1, leu2) using the lithium acetate procedure (76) and grown on selective medium for 2 days at 30°C. For analysis of mating and high-osmolarity glycerol (HOG) pathway responses by flow cytometry, transformants were grown either in duplicate or triplicate in selective medium overnight at 30°C. Empty plasmid was grown as a negative control. Overnight cultures were diluted to A600 between 0.1 and 0.2 and grown to early log phase. For analysis of the mating pathway response, yeast cells were treated with 1 mM α-factor and incubated at 30°C for 2 h. For analysis of the HOG pathway response, cells were treated with 2 mM KCl and incubated at 30°C for 1 h. For both conditions, cells were then treated with the protein synthesis inhibitor cycloheximide for 30 min. For each sample, 10,000 cells were measured with a MACSQuant Vyb (Miltenyi Biotech). Data shown are mean fluorescence (GFP for mating response and BFP for HOG response) and standard deviation of duplicates or triplicates.
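The replicate summary reported for the flow cytometry data (mean fluorescence and standard deviation of duplicates or triplicates) can be sketched as below. The fluorescence values and sample labels are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, as the text does not say which estimator was used.

```python
from statistics import mean, stdev

def summarize(replicates):
    """Mean and sample (n - 1) standard deviation of replicate cultures."""
    return mean(replicates), stdev(replicates)

# Hypothetical mean GFP fluorescence per culture (arbitrary units),
# each value standing for the mean of 10,000 measured cells.
samples = {
    "empty plasmid": [1000.0, 1100.0, 1200.0],  # triplicate
    "Ceg4(1-397)": [400.0, 500.0],              # duplicate
}

summary = {name: summarize(values) for name, values in samples.items()}
for name, (avg, sd) in summary.items():
    print(f"{name}: {avg:.1f} +/- {sd:.1f}")
```

With only two or three replicates the standard deviation is a rough guide to spread rather than a precise estimate.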

HEK293T and HeLa cell culture and transfection
For MAPK activation assays, HEK293T cells were maintained in Dulbecco's modified Eagle's medium (DMEM) supplemented with 10% FBS at 37°C and 5% CO2 and grown to a confluence of ~70% at the time of transfection. Cells were transfected using Lipofectamine 3000 (Life Technologies) according to the manufacturer's instructions with endotoxin-free pcDNA3-N-Flag-LIC containing Ceg4(1-397) or Ceg4(1-397) D9A.
For cell localization studies, HeLa cells were grown on poly-L-lysine-treated glass coverslips, maintained in DMEM supplemented with 10% FBS at 37°C and 5% CO2, and grown to a confluence of ~70% at the time of transfection. Cells were transfected with pEGFP-N1 containing Ceg4(1-397), Ceg4(1-397) D9A, Ceg4(1-207), or Ceg4(208-397) using Lipofectamine 3000 according to the manufacturer's instructions followed by counterstaining with DAPI during fixation. For colocalization studies, pEGFP-N1 Ceg4(1-397) D9A-transfected cells were grown in chambered coverslips and incubated with ER-Tracker and LysoTracker Red dyes according to the manufacturer's instructions prior to fixation and counterstaining with DAPI. Microscopy data were collected using a Nikon TiE inverted microscope and Nikon C2 confocal system with a 60× oil immersion lens.

MAPK activation and immunoblotting
24 h post-transfection with Ceg4(1-397) or Ceg4(1-397) D9A, 5 μg/ml anisomycin or 200 nM TPA was added to the culture medium. After 30 min, cells were gently washed once with PBS followed by lysis directly into SDS-PAGE loading buffer. SDS-PAGE was performed with 0.5% 2,2,2-trichloroethanol added to the resolving portion of the gel. After electrophoresis, gels were exposed to UV light for 2 min and imaged with a GelDoc (Bio-Rad) to obtain loading controls prior to Western blotting. Proteins were transferred to nitrocellulose using a Transblot Turbo (Bio-Rad). Membranes were blocked with blocking buffer (5% (w/v) BSA in TBS with 0.1% Tween 20) for 1 h followed by incubation with either anti-p38 (Cell Signaling Technology, catalog number 8690; 1:1000 dilution in blocking buffer) or anti-phospho-p38 (Cell Signaling Technology, catalog number 4511; 1:1000 dilution in blocking buffer) overnight. After washing, membranes were incubated in 5% nonfat skim milk in TBS with 0.1% Tween 20 containing anti-rabbit HRP (Cell Signaling Technology, catalog number 7074; 1:4000 dilution) for 1 h. Blots were developed with Bio-Rad Clarity Western Plus reagent, imaged using a GelDoc, and visualized using ImageLab (Bio-Rad).

Visualization of total cellular tyrosine phosphorylation by immunoblotting in S. cerevisiae and HEK293T
BY4741 S. cerevisiae (MATa his3Δ1 leu2Δ0 lys2Δ0 ura3Δ0 (77)) were transformed with the indicated constructs in pYES2 NT/A using the lithium acetate procedure (76) and grown on synthetic complete medium agar with uracil for 2 days at 30°C. Colonies were picked and cultured overnight in selective medium supplemented with 2% (w/v) raffinose. In the morning, 3 OD units of cells were harvested by centrifugation, washed, resuspended in 5 ml of selective medium supplemented with 2% (w/v) galactose, and incubated at 30°C with shaking for 5 h. Cultures were harvested by centrifugation, and cells were lysed using an alkaline/SDS lysis procedure (78). HEK293T cells were cultured and transfected in 6-well plates according to the procedures described for MAPK activation assays. Cells were washed with PBS and harvested directly using 100 μl/well SDS-PAGE loading buffer followed by brief sonication to reduce viscosity and 5 min of heating at 95°C. 20 μl of each yeast lysate and 15 μl of HEK293T cell lysate were loaded and separated on 12% SDS-polyacrylamide gels supplemented with 0.5% (v/v) 2,2,2-trichloroethanol followed by visualization of protein loading after exposure of gels to UV light for 2-3 min. Protein was transferred to nitrocellulose for immunoblotting using a Transblot Turbo. For total phosphotyrosine detection, yeast and HEK293T blots were blocked with 5% (w/v) BSA, 1× TBS, and 0.1% Tween 20 at 4°C for 1 h with gentle shaking followed by overnight incubation with α-P-Tyr-100 antibody (Cell Signaling Technology, catalog number 9411; 1:2000 dilution in blocking buffer). For Ceg4 detection in yeast, blocking was performed in 5% nonfat skim milk in TBS with 0.1% Tween 20 followed by incubation with α-Xpress (Thermo Fisher Scientific, catalog number R910-25; 1:4000 dilution) overnight.
For detection of Ceg4 in HEK293T cells, blocking was performed in 3% nonfat skim milk in TBS followed by incubation with α-FLAG M2 (Sigma-Aldrich, catalog number F1804; 1:2000 dilution) for 45 min at room temperature.
After three 5-min washes with TBS with 0.1% Tween 20, all blots were incubated in 5% nonfat skim milk in TBS with 0.1% Tween 20 containing anti-mouse HRP (Cell Signaling Technology, catalog number 7076; 1:4000 dilution) for 1 h. Blots were developed with Bio-Rad Clarity Western Plus reagent and imaged using a GelDoc and visualized using ImageLab.
Author contributions-A. Q. performed human in vivo work with the help of D. V., and E. E. and O. E. purified and crystalized Ceg4. O. E. and A. T. Q. performed enzyme activity screening. P. S. K. and S. P. performed yeast MAPK assays. B. N. collected crystal data. P. J. S. solved the structures and wrote the manuscript. A. Q. and A. S. wrote the manuscript with input from A. W. E. and A. S. All work was performed under the supervision of A. W. E., A. F. Y., and A. S.