Legionella pneumophila effector Lem4 is a membrane-associated protein tyrosine phosphatase

Legionella pneumophila is a Gram-negative pathogenic bacterium that causes severe pneumonia in humans. It establishes a replicative niche called the Legionella-containing vacuole (LCV) that allows the bacteria to survive and replicate inside pulmonary macrophages. To hijack host cell defense systems, L. pneumophila injects over 300 effector proteins into the host cell cytosol. The Lem4 effector (lpg1101) consists of two domains: an N-terminal haloacid dehalogenase (HAD) domain of unknown function and a C-terminal phosphatidylinositol 4-phosphate–binding domain that anchors Lem4 to the membrane of early LCVs. Herein, we demonstrate that the HAD domain (Lem4-N) is structurally similar to mouse MDP-1 phosphatase and displays phosphotyrosine phosphatase activity. Substrate specificity of Lem4 was probed using a tyrosine phosphatase substrate set, which contained a selection of 360 phosphopeptides derived from human phosphorylation sites. This assay allowed us to identify a consensus pTyr-containing motif. Based on the localization of Lem4 to lysosomes and, to some extent, to the plasma membrane when expressed in human cells, we hypothesize that this protein is involved in protein–protein interactions with an LCV- or plasma membrane–associated tyrosine-phosphorylated host target.

be regulated by the CpxRA system (22), and its translocation is affected by the IcmS-IcmW chaperone complex (23). Lem4 is a 322-residue protein that consists of two domains: a C-terminal PI(4)P-binding P4M domain that apparently drives ectopically expressed Lem4 to early LCVs in Legionella-infected mammalian cells (20), and an N-terminal haloacid dehalogenase-like (HAD-like) domain of unknown function (Fig. 1A).
Here, we use a structural approach to demonstrate that the N-terminal haloacid dehalogenase domain of Lem4 is an active protein tyrosine phosphatase. Our detailed studies of Lem4 localization upon overexpression in mammalian cells show that it is targeted by its P4M domain to lysosomes, which differs from the cellular localization of the P4M domain of SidM (24). We used co-immunoprecipitation coupled with MS to identify potential cellular targets of Lem4. Consistent with the results of our localization studies, the list of potential interactors was enriched with membrane proteins and proteins involved in vesicular transport. Moreover, within the list we found a significant representation of components of the Wnt signaling pathway, an evolutionarily conserved pathway that is mostly known for regulating embryonic development and tissue homeostasis but also plays a role in inflammation and host immune response to pathogens (25).

Lem4-N is a structural homolog of mouse MDP-1 phosphatase
Full-length Lem4 (aa 2-322) was overexpressed in Escherichia coli strain BL21(DE3) as a fusion protein with a His6 tag and purified by metal affinity and size-exclusion chromatography. During purification, the protein tended to undergo a spontaneous degradation, yielding a fragment that was determined by MS to encompass residues 6-218. This stable fragment containing the predicted HAD-like domain of Lem4 was cloned into the same vector as the full-length protein, and the resulting construct Lem4(6-218), or Lem4-N, was successfully purified and crystallized. Crystals of native Lem4-N diffracted to 1.75 Å.
A search for structural homologs of Lem4-N with the DALI server (27) identified more than 30 proteins with a Z-score above 10. All these homologs belong to the haloacid dehalogenase superfamily. The top hit was the mouse magnesium-dependent phosphatase-1 (MDP-1, PDB code 1U7P, Z-score 12.1, root mean square deviation 3.3 Å for 163 Cα atoms, identity 13%; 1.6 Å for 81 Cα atoms out of 164 in 1U7P, within the core of the protein fold) (Fig. 1C) (28). The overall topology of the core domain of the two enzymes is very similar. Lem4-N has a more prominent loop between β3 and α4 (aa 122-132) with an additional 3₁₀ helical turn. The main difference between these two proteins is the α-helical cap domain of Lem4-N, which corresponds to an insert in MDP-1 consisting of two antiparallel β-strands (Fig. 1C).

Structure and function of Legionella Lem4 effector

Lem4-N is an active tyrosine phosphatase
To experimentally test whether Lem4-N displays phosphatase activity, we used p-nitrophenyl phosphate (pNPP) as a generic substrate. For initial assessment of Lem4-N activity, we used the conditions optimized for murine MDP-1 (assay buffer with Mg²⁺ ions and pH 5.5 (29,30)). The kinetic parameters determined for Lem4-N (Km = 0.74 ± 0.037 mM, kcat = 14.3 s⁻¹) were in the same range as those previously reported for MDP-1 (Km = 1.7 mM, rate = 4.0 s⁻¹) (30). We next tested small molecules that were identified as substrates for MDP-1, namely O-phosphotyrosine (pTyr) and some closed-ring sugars, as well as O-phosphoserine (pSer), O-phosphothreonine (pThr), several nucleotides, and phosphoinositol derivatives. The latter substrates were included because the presence of the C-terminal PI(4)P-binding domain in Lem4 suggested that they could be substrates for Lem4-N. Of these, only pNPP and pTyr were good substrates for Lem4-N (Fig. 2). Neither pSer nor pThr was dephosphorylated by Lem4-N to a significant level, suggesting that this protein is a specialized tyrosine phosphatase. Kinetic parameters of pTyr dephosphorylation by Lem4-N (Km = 2 ± 0.17 mM, kcat = 17.8 s⁻¹) were very similar to those quantified for pNPP (see above). The optimal pH for pTyr dephosphorylation was determined to be 5.5 (Fig. S1) and was used in all the following experiments that assessed Lem4-N activity.
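For readers who want to compare the two substrates on a common scale, the reported Km and kcat values can be combined into a catalytic efficiency (kcat/Km). The following is a minimal sketch, not part of the original analysis; the function and variable names are illustrative, and only the numerical Km and kcat values are taken from the text above.

```python
# Kinetic parameters as reported in the text (Km in mM, kcat in s^-1).
REPORTED = {
    "pNPP": {"km_mM": 0.74, "kcat_per_s": 14.3},
    "pTyr": {"km_mM": 2.0, "kcat_per_s": 17.8},
}


def catalytic_efficiency(kcat_per_s: float, km_mM: float) -> float:
    """Return kcat/Km in M^-1 s^-1 (Km is converted from mM to M)."""
    return kcat_per_s / (km_mM * 1e-3)


def michaelis_menten_rate(s_mM: float, kcat_per_s: float, km_mM: float,
                          enzyme_nM: float) -> float:
    """Return the Michaelis-Menten initial rate in nM of product per second."""
    return kcat_per_s * enzyme_nM * s_mM / (km_mM + s_mM)


# Illustrative summary: efficiency for each substrate in M^-1 s^-1.
EFFICIENCY = {
    name: catalytic_efficiency(p["kcat_per_s"], p["km_mM"])
    for name, p in REPORTED.items()
}

if __name__ == "__main__":
    for name, value in EFFICIENCY.items():
        print(f"{name}: kcat/Km = {value:.0f} M^-1 s^-1")
```

With the reported values this gives roughly 1.9 × 10⁴ M⁻¹ s⁻¹ for pNPP and 8.9 × 10³ M⁻¹ s⁻¹ for pTyr, so the two substrates fall within about a factor of two of each other.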

Peptide substrate profiling of Lem4-N
To test Lem4-N activity and specificity toward pTyr in a peptide context, we used a general peptide substrate for tyrosine phosphatases, a phosphorylated nonapeptide ENDpYINASL (31). Lem4-N dephosphorylated the peptide at a lower rate than pTyr, but still significantly faster than any of the other small molecule substrates tested (data not shown).
To establish the Lem4-N substrate specificity profile, we tested its activity on a tyrosine phosphatase substrate set (JPT, Berlin, Germany), containing a selection of 360 peptides derived from known human phosphorylation sites. All the synthetic 11-mer peptides in the array contained a single phosphorylated tyrosine in the central position. The final concentration of each peptide in the reaction mixture was 10 μM. The conditions and timing of the dephosphorylation reactions were adjusted to distinguish between faster and slower dephosphorylating peptides. Peptides that showed >60% dephosphorylation (10 μM peptide per well, free phosphate concentration higher than 6 μM as assessed by malachite green assay) were used to determine the preferred sequence motif recognized by Lem4-N (Fig. 3A). Positively charged amino acids (Arg and Lys) dominated in the "+" positions (pTyr, position 0; − sites toward the N terminus; + sites toward the C terminus) and in particular in the +1 to +4 positions, with a strong preference for Lys or Arg at the +2 position. Acidic residues (Asp and Glu) were under-represented in the + positions. On the − side there was a noticeable preference for Asp in the −2 position and Asn in the −4 position (Fig. 3A). This pattern of acidic/polar side chains on one side of the Tyr and basic side chains on the other side suggests a strong electrostatic component in the substrate recognition.
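The selection step described above (a cutoff of more than 60% dephosphorylation at 10 μM peptide, followed by inspection of residues relative to the central pTyr) can be sketched as follows. This is an illustration under stated assumptions, not the procedure used to generate Fig. 3A: the peptide sequences and phosphate readings are hypothetical placeholders, the lowercase "y" stands in for the central phosphotyrosine of an 11-mer, and all function names are invented for this example.

```python
from collections import Counter

PEPTIDE_UM = 10.0   # initial peptide concentration stated in the text
THRESHOLD = 0.60    # fraction dephosphorylated used as the cutoff


def fraction_dephosphorylated(free_phosphate_uM, peptide_uM=PEPTIDE_UM):
    """Fraction of peptide that released phosphate during the reaction."""
    return free_phosphate_uM / peptide_uM


def select_substrates(readings):
    """Return sequences whose dephosphorylation exceeds the cutoff.

    `readings` maps a peptide sequence to its free phosphate reading in uM.
    """
    return [
        seq for seq, phosphate in readings.items()
        if fraction_dephosphorylated(phosphate) > THRESHOLD
    ]


def residues_by_position(peptides, center=5):
    """Tally residues at each position relative to the central pTyr.

    For an 11-mer the pTyr sits at index 5, so positions run from -5 to +5.
    """
    counts = {}
    for seq in peptides:
        for index, residue in enumerate(seq):
            counts.setdefault(index - center, Counter())[residue] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical readings, for illustration only (not data from the array).
    example = {"NTEEDyLRKIG": 9.5, "AAAAAyAAAAA": 2.0, "GSDGSyVKRAL": 6.5}
    hits = select_substrates(example)
    print(hits)
    print(residues_by_position(hits)[2])  # residues found at position +2
```

A real motif analysis would additionally compare these foreground counts against the full array as a background set to judge over- and under-representation.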
Among the top 10 phosphopeptides that were almost completely dephosphorylated by Lem4-N (free phosphate concentration at the end of the reaction was close to the initial peptide concentration of 10 μM) (Table 1), only one belonged to a protein that had been previously shown to reside on the LCV during L. pneumophila infection (macrophage colony-stimulating factor 1 receptor (CSF1R)) (32). Indeed, two CSF1R peptides derived from the physiologically important sites of Tyr-708 and Tyr-809 were efficiently dephosphorylated by Lem4-N. Autophosphorylation of CSF1R at Tyr-708 and Tyr-809 is important for the receptor interaction with downstream signaling molecules (33).
To validate the results obtained in the peptide array, we synthesized several peptides and repeated the Lem4-N dephosphorylation reaction in a larger volume. The apparent initial rate of the reaction for the two best substrate peptides identified in the array was significantly higher than for the phosphotyrosine or for the generic phosphatase substrate ENDpYINASL (Fig. 3B). Moreover, the peptide KLNTEEpYLRVIGK that best conforms to the identified consensus motif is a better substrate of Lem4-N than IHLEKKpYVRRDSG, which has lysine residues in the −1 and −2 positions instead of the negatively charged residues of the consensus sequence.

Figure 3. A, graphical representation of the consensus motif was generated with the pLogo web server using 53 peptides with dephosphorylation level >60% as a foreground set. All the 355 peptides from the substrate array served as a background set. Residue heights are scaled relative to their statistical significance (64). Over- and under-represented residues are positioned above and below the x axis, respectively. Phosphorylated Tyr is "fixed" in a zero position and highlighted in gray. The horizontal red lines correspond to statistically significant values (p = 0.05) as described previously (64). B, activity assay of Lem4-N with different substrates: pTyr, general phosphatase substrate ENDpYINASL, and two peptides identified as good substrates in a peptide array. For all substrates at 0.2 mM concentration, Lem4-N was added to a final concentration of 50 nM; final reaction volume was 1 ml, and reactions were performed in triplicate. Reaction products (free tyrosine or dephosphorylated peptides) were detected by measuring A280 of the reaction mixture.

Active site of Lem4-N
The structural similarity to MDP-1 allowed us to propose a putative active site for Lem4-N. The HAD superfamily members that perform phosphoryl transfer have four signature motifs located in the loops, which carry the conserved catalytic residues. Lem4 contains all four of these motifs (Fig. 4A). Side chains of nucleophilic Asp-25 (loop I) and Asp-157 (loop IV), as well as the backbone carbonyl oxygen of Asp-27 (loop I), coordinate the Mg²⁺ ion present in the active center. The Mg²⁺ adopts an octahedral coordination with the three other ligands being water molecules. Residues Lys-134 (loop III) and Ser-96 (loop II), which are known to form H-bonds with the phosphate group, are also conserved (Fig. 4B). All these residues are located at the bottom of a deep tunnel, ~9 Å from the protein surface. The entrance to the tunnel is highly restricted by the surface residues Lys-36 and Ile-39 (beginning of the cap), Arg-97 (end of strand β2), Arg-122 (end of strand β3), and Asp-161 (beginning of helix α6). The latter forms a salt bridge with Arg-97. These putative Lem4 active-site residues and the position of the Mg²⁺ ion superimpose very well on their counterparts in MDP-1 (Fig. 4C). This structural likeness, along with the preference for similar substrates, suggests that both enzymes use the same strategy to desolvate the active site for catalysis. The bulky leaving group of the substrate fits into the restricted opening on the surface of the protein that leads to the active center and shields the active center from water (28).

Co-crystallization of Lem4-N with phosphotyrosine and with a peptide substrate
We have shown that a peptide containing the pTyr-708 sequence from CSF1R is a substrate for Lem4-N. We attempted to capture the WT enzyme-substrate complex using either pTyr or the synthetic phosphorylated peptide IHLEKKpYVRRDSG, where pY corresponds to position 708. Unfortunately, these attempts were unsuccessful. We therefore created several Lem4-N mutants with compromised activity by introducing single point mutations into the main conserved motifs of the active center, namely D25N in loop I, S96A in loop II, and D157N in loop IV. Indeed, two of these mutants showed very weak residual activity (S96A) or no detectable activity (D25N) with pNPP (Fig. S2), pTyr, or the pTyr-708-containing peptide (data not shown) as substrates. The D157N mutant underwent partial degradation during expression and purification, which might indicate that the mutation affected protein stability.
Lem4-N(D25N) and Lem4-N(S96A) produced diffraction-quality crystals in conditions similar to those for the WT Lem4-N. These crystals were soaked with pTyr or with the pTyr-708-containing peptide, and their diffraction data were collected. Although neither tyrosine nor the peptide was visible in the structure, the presence of a phosphate in the active site was clear (Fig. 5A). This suggested that either the mutant Lem4-N displayed a low residual activity or that a spontaneous hydrolysis of the pTyr occurred under crystallization conditions. Interestingly, three of the four phosphate oxygens' positions were occupied by water molecules in the native structure (Fig. 5B). As a control, we collected diffraction data from a crystal of Lem4-N(D25N) alone. We found some electron density in the active site; however, the density was flattened and indicated a planar molecule. We modeled an acetate molecule, which was present at a 200 mM concentration in the crystallization solution, into this density (Fig. 5C).
The phosphate is tightly held in the active site. One of its oxygens coordinates the Mg²⁺ ion, whereas other oxygens form H-bonds with OD2 and NH of Asp-27, OD1 of Asn-25, NH of Arg-97, and NZ of Lys-134. Upon phosphate binding, the active center of Lem4-N undergoes several rearrangements: (i) the Lys-134 side chain shifts toward phosphate and forms an H-bond with its oxygen atom, and (ii) the side chain of the conserved Arg-97, stabilized by a salt bridge with Asp-161 in the absence of a substrate, becomes almost completely disordered in the enzyme-substrate complex, providing the necessary space for the substrate leaving group.

Lem4 co-localizes with lysosomes
To explore the function of Lem4 in mammalian cells, we investigated the subcellular localization of Lem4. Full-length Lem4 was C-terminally tagged with GFP and overexpressed in HEK293 and HeLa cells (Fig. 6A). Cells transfected with the GFP-encoding vector served as a control. The subcellular localization of these constructs was examined using fluorescence microscopy.

GFP control was evenly distributed in the cytosol (Fig. 6A, upper panel), whereas Lem4-GFP predominantly showed a punctate pattern both in HEK293 and HeLa cells (Fig. 6A, lower panel). Lem4 possesses a predicted P4M PI(4)P-binding domain at the C terminus, and the subcellular localization of ectopically expressed Lem4 suggested that it may bind to a specific host organelle. Markers of several organelles (cis-Golgi (GM130), endoplasmic reticulum (ER) (calnexin), and lysosomes (LAMP1, LysoTracker Red)) were used to visualize various compartments in HEK293 cells. In this experiment, Lem4-GFP co-localized with lysosomes but was also present at the plasma membrane. No co-localization with the ER or cis-Golgi was observed (Fig. 6, B and C).

Localization of Lem4 is determined by the C-terminal domain
We next inquired which Lem4 domain is responsible for the observed localization. For this purpose, the individual Lem4 domains were tagged with GFP and expressed in HEK293 cells.
The HAD-like domain (aa 1-200) was evenly distributed in the cytosol and had no specific localization (Fig. 7, A and B). However, the P4M domain (aa 200-322) was localized to lysosomes and the plasma membrane, similarly to the localization of the full-length protein (Fig. 7, A and B). Similar localization was observed in HeLa cells (data not shown) and the macrophage cell line RAW264.7 (Fig. 7C). The C-terminal P4M domain was sufficient to localize Lem4, indicating that the binding to PI(4)P is essential for localization.
We then filtered the list of potential targets based on the presence of physiologically relevant Tyr phosphorylation sites that matched the substrate recognition pattern of Lem4. Almost one-half of the top 25 hits were proteins involved in membrane dynamics and cytoskeleton rearrangement, including catenin δ-1, which contributes to regulation of cell-cell adhesion and plays a role in inflammatory response in infectious models (34), as well as the AP2 complex subunit β (responsible for clathrin-dependent endocytosis) (35) and Ras GTPase-activating-like protein IQGAP1, which regulates the actin cytoskeleton by binding small GTPases Cdc42 and Rac1 (Table S2) (36). Interestingly, these proteins are involved in Wnt signaling, a pathway modulated by many pathogenic bacteria as part of their infectious cycle (37). Reactome pathway analysis of the list of 258 potential Lem4 interaction partners also demonstrated significant enrichment in the following categories: "β-catenin-independent WNT signaling (R-HSA-3858494)" (20 proteins, p value 5.0 × 10⁻¹²); "signaling by Wnt (R-HSA-195721)" (23 proteins, p value 5.0 × 10⁻⁹); and "degradation of β-catenin by the destruction complex (R-HSA-195253)" (eight proteins, p value 1.76 × 10⁻²) (Table S2).
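Filtering candidate phosphosites against the substrate recognition pattern, as described above, could in principle be automated with a simple position-based score. The sketch below is a hypothetical illustration, not the filter used in this study: the scoring weights are invented for the example, and only the qualitative preferences (Lys/Arg at +1 to +4 and especially at +2, few acidic residues on the + side, Asp at −2, Asn at −4) come from the peptide array results. Sequences are written with "pY" marking the phosphotyrosine.

```python
BASIC = set("KR")
ACIDIC = set("DE")


def split_on_ptyr(sequence):
    """Split a peptide written with 'pY' into its N- and C-terminal sides."""
    index = sequence.index("pY")
    return sequence[:index], sequence[index + 2:]


def motif_score(sequence):
    """Score a phosphopeptide against the qualitative Lem4-N preferences.

    The weights are illustrative assumptions, not fitted values.
    """
    n_side, c_side = split_on_ptyr(sequence)
    plus_1_to_4 = c_side[:4]

    score = 0
    score += sum(1 for aa in plus_1_to_4 if aa in BASIC)    # Lys/Arg at +1..+4
    score -= sum(1 for aa in plus_1_to_4 if aa in ACIDIC)   # Asp/Glu at +1..+4
    if len(c_side) >= 2 and c_side[1] in BASIC:             # bonus for +2
        score += 1
    if len(n_side) >= 2 and n_side[-2] == "D":              # Asp at -2
        score += 1
    if len(n_side) >= 4 and n_side[-4] == "N":              # Asn at -4
        score += 1
    return score


if __name__ == "__main__":
    for peptide in ("KLNTEEpYLRVIGK", "IHLEKKpYVRRDSG", "ENDpYINASL"):
        print(peptide, motif_score(peptide))
```

With these illustrative weights, the three peptides tested in the validation assay rank in the same order as their reported reaction rates, but a quantitative filter would need weights derived from the array data.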

Site-specific tyrosine phosphorylation profiling of Lem4-overexpressing mammalian cells
To further investigate the potential cellular dephosphorylation targets of Lem4, we probed the phosphorylation level of cellular proteins when Lem4 was expressed in mammalian cells. For detection of phosphorylation, we used an array featuring 228 site-specific phosphotyrosine antibodies (Full Moon BioSystems).
We generated HEK293 cell lines stably expressing Lem4-GFP or GFP alone and compared tyrosine phosphorylation in these cell lines using the antibody array (Fig. 8). Although we detected several proteins with decreased phosphorylation levels in Lem4-expressing cells, there were significantly more pro-

Discussion
L. pneumophila binding and uptake by the host cell leads to significant changes in the phosphorylation state of multiple host proteins (38, 39). Secreted effectors of L. pneumophila were shown to interfere with mitogen-activated protein kinase (MAPK) (40) and NF-κB (nuclear factor κ-light chain enhancer of activated B cells) pathways to promote infection (41).
Until recently, no Legionella effectors with protein phosphatase activity that could directly alter the phosphorylation state of host cell proteins were known. However, a structural approach toward the determination of bacterial effector function proved to be particularly efficient in identifying protein phosphatases. Structures of effectors WipA and WipB revealed that these proteins harbor protein phosphatase domains related to the eukaryotic phosphoprotein phosphatase family (42,43). WipB displays a Ser/Thr phosphatase activity and is targeted by its nonenzymatic C-terminal domain to lysosomes. There, it was shown to interact with the components of the lysosomal nutrient sensing (LYNUS) apparatus (42). WipA, however, despite being structurally similar to Ser/Thr phosphatases, preferably dephosphorylated phosphotyrosine-containing peptides and proteins in vitro (43).
Recently, the structure of another member of the HAD superfamily present in L. pneumophila, Ceg4, was reported (44). Like Lem4 (reported here), Ceg4 was also identified as a tyrosine phosphatase based on the structural information and was confirmed experimentally. The Ceg4 structure serendipitously revealed a tyrosine side chain from an uncleaved tag sequence of a symmetry-related molecule bound to the active-site pocket. Ceg4 dephosphorylates MAPK p38 and thus attenuates the MAPK-signaling pathway.
HAD phosphatases are known to be rather promiscuous and to show a broad substrate range (45), with their specificity often being determined by the nature of a so-called cap domain (46). Interestingly, the absence of a cap domain leads to higher substrate specificity (45). Being a close structural homolog of mouse tyrosine phosphatase MDP-1 that lacks a full-sized cap domain, Lem4 also demonstrates strong specificity for phenylphosphate-containing substrates like pNPP and pTyr. Using an array of phosphorylated peptides, we detected a consensus motif that was preferably dephosphorylated by Lem4 and identified the CSF1R protein as one of the potential substrates of Lem4.
CSF1R is a receptor tyrosine kinase expressed on the surface of macrophages that is activated by the colony-stimulating factor-1 (CSF-1) and interleukin-34 (IL-34). It exerts pleiotropic effects in development, innate immunity, inflammation, and tissue repair (47). CSF-1 binding promotes rapid dimerization of CSF1R and its autophosphorylation on several tyrosine residues, including Tyr-708 and Tyr-809, which leads to activation of mitogen-activated protein kinase/extracellular signal-regulated kinase kinase and phosphatidylinositol 3-kinase pathways (48,49). Interestingly, CSF1R proceeds to signal from endosomes even after its CSF-1-induced internalization and thus enables prolonged activation of ERK1/2 and Akt pathways (50). CSF1R has been identified as a part of the LCV proteome in human macrophages (32) and might be targeted by L. pneumophila effectors, including Lem4, which could hamper CSF1R intracellular signaling through tyrosine dephosphorylation.
Even though CSF1R appears to be a good substrate for Lem4, the detected selectivity for peptide substrates is rather broad, indicating that Lem4 could dephosphorylate other pTyr-containing proteins that come into its proximity. Therefore, its localization and the temporal regulation of its expression are crucial for proper functioning during infection. Localization studies of ectopically expressed Lem4 in mammalian cells infected with L. pneumophila demonstrated strong co-localization with LCVs 30 min post-infection, suggesting that it acts very early during infection (20). This is consistent with the fact that Lem4 is up-regulated by the CpxRA system (22), which was shown to activate effectors involved in establishing the LCV compartment (51).
Localization of Lem4 to the host cell membranes enriched in PI(4)P is fully determined by the binding of its nonenzymatic C-terminal P4M domain. Previously, it has been reported that Lem4 predominantly localizes to the plasma membrane upon overexpression in HEK293 cells (20). Our localization experiments demonstrated that Lem4 is only partially bound to the plasma membrane and predominantly associates with vesicular structures inside the cell that were also stained with an anti-LAMP1 antibody or with LysoTracker, a dye for acidic organelles; both are used to label and track lysosomes. The P4M domain from another L. pneumophila effector, SidM, has been reported to predominantly bind the Golgi apparatus, but it was also present at the plasma membrane (24). P4M domains from Lem4 and SidM share only 30% sequence identity and 56% similarity, which could rationalize differences in their preferred localization.
The Ser/Thr phosphatase WipB, which has also been shown to be localized to the lysosomes, shares similar domain organization with Lem4. It has an N-terminal phosphatase domain and a C-terminal domain responsible for localization. The C-terminal domain of WipB (residues 399-508) is homologous to a protein-binding domain of another Legionella effector, RavJ (lpg0944, residues 238-374), with 34% sequence identity, as determined by the HHpred homology detection and structure prediction server (52). Both proteins also share a conserved WXRHH motif, which in the case of RavJ was required for interaction with components of the host septin and elongator complexes (53). WipB is targeted by its C-terminal domain to another large host macromolecular assembly, a lysosomal nutrient-sensing (LYNUS) apparatus on the surface of lysosomes. Although WipB and Lem4 are localized to the same cellular compartments, they are driven toward their host targets by different mechanisms: protein-lipid binding in the case of Lem4 and (probably) protein-protein interactions in the case of WipB.
Specific localization of Lem4, as well as tight temporal regulation of its expression during infection, may compensate for the broader specificity of this protein toward phosphotyrosine-containing macromolecular substrates. Our proteomics analysis supports the localization of Lem4 to the plasma membrane and lysosomes. Filtering the proteins that co-purified with Lem4 for those that adhere to the Lem4 substrate pattern (Fig. 3) resulted in a list that was significantly enriched in proteins involved in cell adhesion, tight junction assembly, and cytoskeleton rearrangements. The list included three components of the Wnt/β-catenin-signaling pathway, indicating the potential role of Lem4 in regulating β-catenin levels during L. pneumophila infection.
These findings were supported by phosphotyrosine profiling of Lem4-overexpressing HEK293 cells, which revealed activation of JAK/STAT and focal adhesion signaling pathways, as well as changes in tyrosine phosphorylation levels of several components of the actin cytoskeleton regulation pathway. Although our co-immunoprecipitation experiments provide information about the mammalian proteins potentially involved in protein-protein interactions with Lem4, additional research is required to identify and validate the direct substrates of Lem4 during Legionella infection.

Cloning
The gene encoding full-length Lem4 (lpg1101) (residues 2-322) was PCR-amplified from the genomic DNA of L. pneumophila strain Philadelphia 1 (ATCC 33152). Based on bioinformatics analysis, we designed and cloned a construct containing residues 15-200. Later, we identified by limited proteolysis a slightly longer fragment 6-218 that behaved well, and we cloned this fragment as well. For expression in E. coli, the PCR products were cloned into the ligation-independent cloning vector pMCSG7, according to the standard protocol (54). The pMCSG7 vector encodes an N-terminal His6 tag, separated from the target gene by a cleavage site for TEV protease.
For the expression of Lem4 and its truncated constructs in the mammalian cells, the lpg1101 gene or its fragments were PCR-amplified and cloned into XhoI/BamHI sites of pEGFP-N1 (Clontech) expression vector. Restriction enzymes and T4 DNA ligase were purchased from New England Biolabs (Ipswich, MA).
Site-directed mutagenesis for D25N, S96A, and D157N was performed with Q5 site-directed mutagenesis kit (New England Biolabs), according to the manufacturer's instructions. All obtained constructs were verified by DNA sequencing.

Protein expression and purification
Plasmids carrying designed constructs were transformed into the BL21(DE3)pLysS cell line. All culture media were supplemented with 100 μg/ml ampicillin and 25 μg/ml chloramphenicol. Overnight culture was started in 50 ml of LB media, supplemented with 0.4% glucose and cultivated for 16 h at 37°C. The overnight culture was inoculated into 1 liter of Terrific Broth and grown at 37°C with shaking at 200 rpm. The culture was induced with 0.5 mM isopropyl 1-thio-β-D-galactopyranoside when A600 reached 1.5. The temperature was lowered to 18°C; cells were incubated for another 16 h and then pelleted by centrifugation for 20 min at 5000 × g at 4°C. For selenomethionine labeling, a shorter version of the Lem4 HAD domain, His-Lem4 , was expressed in a methionine auxotrophic E. coli strain B834(DE3) (Novagen).
Cell pellets were resuspended in 50 ml of buffer A (50 mM HEPES, pH 8.0, and 400 mM NaCl), supplemented with a Halt protease inhibitor mixture (ThermoFisher Scientific, 78430) and 2 mM benzamidine. The homogeneous cell mixture was lysed using a cell disrupter (Constant Systems Ltd., TS Series Benchtop), and the lysate was centrifuged at 30,000 × g for 30 min. The clarified lysate was loaded onto 5 ml of Talon metal affinity resin (Clontech, 635504) prewashed with water and buffer A. The binding was performed at 4°C for 1 h. After the binding, 30 ml of buffer A and 10 ml of buffer A supplemented with 10 mM imidazole were applied to wash the resin. Protein was eluted in two steps with 10 ml of buffer A supplemented with 50 and 100 mM imidazole. Target protein was treated with TEV protease at a 1:50 (enzyme/substrate) ratio to remove the tag. The cleavage reaction was set up in a dialysis bag against 1 liter of dialysis buffer (20 mM HEPES, pH 8.0, 150 mM NaCl, 0.5 mM tris(2-carboxyethyl)phosphine) for 16 h. After the cleavage reaction, the protein mixture was loaded onto 1 ml of Nuvia IMAC nickel-charged resin (Bio-Rad, 7800801) to eliminate His-tagged TEV protease. Cleaved protein was collected and concentrated to 20 mg/ml. The concentrated protein was loaded onto an Enrich SEC650 10 × 300 column (Bio-Rad, 7801650) for further separation. After gel filtration, peak fractions were pooled and concentrated to 25 mg/ml for crystallization screening.

Crystallization and data collection
Crystallization screening of the full-length Lem4 (residues 2-322) did not lead to crystals, but the construct containing residues 15-200 with a cleaved N-terminal His tag could be crystallized. This protein was used to obtain SeMet-containing crystals that were used for phase determination. The crystals of SeMet Lem4  were optimized by whisker seeding with a well solution containing 0.1 M MES, pH 6.5, 0.2 M magnesium acetate tetrahydrate, 0.65 M NaCl, and 18% (w/v) PEG 8,000 and were grown at 20°C. These crystals diffracted to 2.0 Å resolution.
Screening the 6-218 fragment for crystallization with the Gryphon robot (Art Robbins Instruments) using the ComPAS Suite (Qiagen) identified a new set of crystallization conditions. The optimized conditions contained 0.1 M MES, pH 6.5, 0.2 M magnesium acetate tetrahydrate, and 18% (w/v) polyethylene glycol (PEG) 8,000; the crystals grew at 20°C. These crystals diffracted to 1.75 Å resolution and were used to obtain the final structure. For data collection, the crystals were soaked in a cryo-protectant (reservoir solution supplemented with 25% glycerol) and flash-cooled in liquid nitrogen.
The X-ray diffraction data collection was performed at 100 K on the 08ID-CMCF beamline at the Canadian Light Source (Saskatoon, Saskatchewan, Canada) (55) and on the LRL-31-ID-D beamline at the Advanced Photon Source at Argonne National Laboratory. The native and SeMet datasets were processed and scaled with XDS (56).

Structure solution and refinement
The positions of selenium atoms for the Lem4(15-200) construct were found with SHELXD (57), and the initial model was built with Phenix AutoBuild (58). The model was then transferred to the higher resolution native data from the Lem4(6-218) crystal by molecular replacement. The structure was refined using the PHENIX software package (58) combined with manual rebuilding using Coot (59). The model contains residues Glu-15-Lys-206, one Mg²⁺, and 175 water molecules. The native structure was a starting model for the Lem4(6-218)-D25N with and without soaked substrate. The Lem4(6-218)-D25N model contains residues Ser-11-Asn-210, one Mg²⁺, and 127 water molecules. The structures were validated with MolProbity (60). The pertinent details of data collection and refinement statistics are shown in Table 2. The coordinates and structure factors for Lem4-N, Lem4-N(D25N)-substrate, and Lem4-N(D25N) were deposited at the Protein Data Bank with codes 6CGJ, 6CGK, and 6CDW, respectively.

Phosphatase activity measurements
The assay was carried out in 96-well assay plates (Costar 9017, Corning, NY). For small molecule substrate screening, phosphatase substrates were diluted with the reaction buffer (50 mM sodium acetate, pH 5.5, 0.1 M NaCl, 5 mM MgCl2) to 0.1 mM concentration. Lem4(6-218) (10 μg/ml final concentration, or ~400 nM) was added to the wells, and the reaction was carried out for 30 min at 30°C. The same buffer with 2 mM EDTA instead of magnesium was used in control reactions.
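The correspondence between the mass and molar enzyme concentrations quoted above can be checked with a short calculation. This is a rough sketch only: the molecular mass of Lem4(6-218) is not given in the text, so it is estimated here from the residue count (213 residues) and an assumed average residue mass of about 110 Da, and the function name is illustrative.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0   # assumption: typical average residue mass
N_RESIDUES = 218 - 6 + 1          # residues 6-218 inclusive


def mass_to_molar_nM(ug_per_ml: float, mw_da: float) -> float:
    """Convert a protein concentration from ug/ml to nM.

    1 ug/ml equals 1e-3 g/l; dividing by the molecular mass (g/mol) gives
    mol/l, and multiplying by 1e9 gives nM.
    """
    return ug_per_ml * 1e-3 / mw_da * 1e9


# Estimated molecular mass of the construct (an approximation, not a
# measured or sequence-derived value).
ESTIMATED_MW_DA = N_RESIDUES * AVERAGE_RESIDUE_MASS_DA

if __name__ == "__main__":
    print(f"Estimated MW: {ESTIMATED_MW_DA:.0f} Da")
    print(f"10 ug/ml is about {mass_to_molar_nM(10.0, ESTIMATED_MW_DA):.0f} nM")
```

Under this assumption, 10 μg/ml comes out slightly above 400 nM, in line with the approximate value given in the text; a molecular mass of exactly 25 kDa would give exactly 400 nM.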

After the reaction ran to completion, plates were equilibrated to room temperature for 5 min, and malachite green reagent BIOMOL Green (Enzo Life Sciences, Farmingdale, NY) was added to measure free phosphate. Plates were incubated for 20 min at room temperature, and the absorbance at 630 nm was measured. The measured values were adjusted by subtracting the values from the control wells (with EDTA). Free phosphate concentrations were calculated using a phosphate standard curve. All reactions were performed in duplicate.
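The conversion from absorbance to free phosphate described above can be sketched as follows. This is an illustration under stated assumptions: a linear standard curve is assumed, all absorbance values and standard concentrations are invented for the example, and the function names are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def phosphate_released(sample_a630, edta_a630, slope):
    """Convert duplicate A630 readings to free phosphate (units of the standards).

    The mean EDTA control reading is subtracted first, so the intercept of
    the standard curve cancels and only its slope is needed.
    """
    corrected = mean(sample_a630) - mean(edta_a630)
    return corrected / slope


# Hypothetical phosphate standards (µM) and their A630 readings.
standards_um = [0.0, 5.0, 10.0, 20.0, 40.0]
standards_a630 = [0.05, 0.15, 0.25, 0.45, 0.85]
slope, intercept = fit_line(standards_um, standards_a630)

# Hypothetical duplicate wells: reaction with Mg2+ and the EDTA control.
released_um = phosphate_released([0.62, 0.58], [0.11, 0.09], slope)
print(f"slope = {slope:.3f} A630/µM, released = {released_um:.1f} µM")
```

Subtracting the control before dividing by the slope mirrors the order of operations in the text: background correction first, then conversion through the standard curve.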
For assessing substrate selectivity, we utilized the peptide library tyrosine phosphatase substrate set (JPT, Berlin, Germany) in a 384-well plate format. The assay was performed following the manufacturer's instructions, using the buffer and Lem4 concentration as described above for small molecule substrate screening.
Kinetic measurements were performed in the same assay buffer in the presence of 30 nM Lem4-N for 2 min at 25°C with various substrate concentrations (0.5–10 mM for pTyr and 0.15–10 mM for pNPP). Free phosphate concentration was measured by malachite green assay as described above.
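Rate data of this kind are usually summarized with Michaelis-Menten parameters. The text does not say how the parameters were fitted, so the following is only a sketch of one option, the Hanes-Woolf linearization S/v = S/Vmax + Km/Vmax, using the Python standard library. Nonlinear regression is generally preferred for real data. The rates below are synthetic, generated from invented Km and Vmax values, and the function name is hypothetical.

```python
from statistics import mean


def hanes_woolf(substrate_mm, rates):
    """Estimate (Km, Vmax) from the Hanes-Woolf line S/v = S/Vmax + Km/Vmax."""
    ys = [s / v for s, v in zip(substrate_mm, rates)]
    x_bar, y_bar = mean(substrate_mm), mean(ys)
    sxx = sum((s - x_bar) ** 2 for s in substrate_mm)
    sxy = sum((s - x_bar) * (y - y_bar) for s, y in zip(substrate_mm, ys))
    slope = sxy / sxx                  # equals 1 / Vmax
    intercept = y_bar - slope * x_bar  # equals Km / Vmax
    vmax = 1.0 / slope
    return intercept * vmax, vmax


# Synthetic, noise-free rates (µM phosphate per min) from invented parameters.
TRUE_KM_MM = 2.0
TRUE_VMAX = 6.0
substrate_mm = [0.5, 1.0, 2.0, 5.0, 10.0]  # spans the 0.5-10 mM range in the text
rates = [TRUE_VMAX * s / (TRUE_KM_MM + s) for s in substrate_mm]

km_mm, vmax = hanes_woolf(substrate_mm, rates)

ENZYME_UM = 0.030  # 30 nM Lem4-N, as stated in the text
kcat_per_min = vmax / ENZYME_UM
print(f"Km = {km_mm:.2f} mM, Vmax = {vmax:.2f} µM/min, kcat = {kcat_per_min:.0f} per min")
```

Because the synthetic rates are noise-free, the fit recovers the invented parameters; with experimental data the linearization weights errors unevenly, which is why it is offered here only as an illustration.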
Activity measurements of Lem4-N in the presence of pTyr or different peptide substrates were performed at 25°C in the same assay buffer as described above, using the substrate concentration of 0.2 mM and Lem4-N concentration of 50 nM. Dephosphorylated substrates (free tyrosine or tyrosine-containing peptides) were detected by measuring absorbance at 280 nm in a 1-cm cuvette; reaction volume was 1 ml. Each reaction was repeated three times.

Mammalian cell culture, transient transfection, and microscopy
HEK293 or HeLa cells were cultured in Dulbecco's modified Eagle's medium (DMEM) (Sigma) supplemented with 10% fetal bovine serum (FBS) (Sigma) at 37°C with 5% CO₂. DNA constructs were transfected into cells using the X-tremeGENE™ HP DNA transfection reagent (Roche Applied Science, catalog no. 06366236001) according to the manufacturer's instructions.
To visualize lysosomes, the cells were stained with LysoTracker Red DND-99 (ThermoFisher Scientific) according to the manufacturer's procedure. Microscopy images were collected on a laser scanning confocal microscope (Zeiss LSM700).
To generate cell lines that stably express GFP or Lem4-GFP, the HEK293 cells were transfected with pEGFP or pEGFP-Lem4 plasmids. The cells were selected and maintained in DMEM containing 200 µg/ml G418 (Geneticin, Sigma).

Mass spectrometry
Lem4(D25N)-GFP was overexpressed in mammalian cells as described above, and cells were harvested and lysed by sonication in the PBS buffer containing PhosSTOP Phosphatase Inhibitor Mixture (MilliporeSigma, Darmstadt, Germany) and 2 mM EDTA. Protein complexes were cross-linked with dithiobis(succinimidyl propionate) and used for affinity purification (AP) followed by mass spectrometry (MS) essentially as described previously (61) except for the following: for AP, Miltenyi Biotec anti-GFP beads were used with binding buffer (25 mM Tris-HCl, 10 mM HEPES-KOH, 500 mM NaCl, 1 mM imidazole, 2.5 mM EDTA, 2.5 mM EGTA, 0.5% Triton X-100, 0.5% Nonidet P-40) containing protease and phosphatase inhibitors. Following binding, cells were washed twice with

Table 2. Summary of data collection and refinement statistics. The information for the highest resolution shell is given in parentheses.
Proteins present only in the Lem4(D25N)-GFP samples were identified by selecting proteins with two or more peptides in each sample and eliminating proteins present in the control samples. In addition, the AUC values for each protein were normalized within each sample. A ratio of experimental over control AUC for each protein ((Lem4(D25N)-GFP AUC for the protein)/(GFP AUC for the protein)) was calculated. This allowed us to identify proteins present in both the control (GFP) and Lem4(D25N)-GFP samples but over-represented in the Lem4 samples. The ratios were then log-transformed and Z-score normalized, and distribution-based significance cutoffs (62) were calculated. A statistical over-representation test was performed using the PANTHER program (63) version 12.0 with default settings.
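The ratio and Z-score steps described above can be illustrated with a minimal sketch. Several details are assumptions made for the example and are not taken from the text: within-sample normalization is done by dividing by the total AUC, the logarithm is base 2, the population standard deviation is used, and the protein names and AUC values are invented. The distribution-based cutoffs of reference 62 are not reproduced.

```python
import math
from statistics import mean, pstdev


def normalize_within_sample(auc):
    """Scale each protein's AUC by the sample total (assumed normalization)."""
    total = sum(auc.values())
    return {protein: value / total for protein, value in auc.items()}


def log_ratio_z_scores(bait_auc, control_auc):
    """Z-scores of log2(bait/control) for proteins detected in both samples."""
    bait = normalize_within_sample(bait_auc)
    control = normalize_within_sample(control_auc)
    shared = sorted(set(bait) & set(control))
    log_ratios = {p: math.log2(bait[p] / control[p]) for p in shared}
    values = list(log_ratios.values())
    mu, sigma = mean(values), pstdev(values)
    return {p: (lr - mu) / sigma for p, lr in log_ratios.items()}


# Hypothetical AUC values for four proteins (names are placeholders).
lem4_gfp_auc = {"prot_A": 100.0, "prot_B": 100.0, "prot_C": 100.0, "prot_D": 700.0}
gfp_auc = {"prot_A": 250.0, "prot_B": 250.0, "prot_C": 250.0, "prot_D": 250.0}

z_scores = log_ratio_z_scores(lem4_gfp_auc, gfp_auc)
for protein, z in sorted(z_scores.items(), key=lambda item: -item[1]):
    print(f"{protein}: z = {z:+.2f}")
```

In this toy input only `prot_D` is enriched in the bait sample, so it receives the highest Z-score; a significance cutoff would then be applied to the Z-score distribution.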

Phospho-specific protein microarray analysis
Tyrosine phosphorylation ProArray was purchased from Full Moon Biosystems (Sunnyvale, CA). HEK293 cell lines stably expressing either GFP (control) or Lem4-GFP were used in the experiment. Following the manufacturer's instructions, 100 µg of cell lysate was labeled with biotin and conjugated with the array antibodies using the Array Assay Kit (Full Moon Biosystems). The conjugated labeled protein was detected using Alexa Fluor-555 streptavidin (Invitrogen). Fluorescence intensity scanning service was provided by Full Moon Biosystems. For data analysis, the intensity of the array was measured using GenePix Pro 6.0 software (Axon Instruments). The normalized data for each array were computed as follows: normalized data = (average signal intensity of replicate spots)/(median of the average signal intensities for all antibodies on the array). The normalized data were then used to determine the fold change between the control (GFP) and Lem4-GFP samples.
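The median normalization and fold-change calculation can be sketched as follows. The antibody names and intensities are invented, the function names are hypothetical, and expressing the fold change as Lem4-GFP over GFP is an assumption about direction that the text does not state.

```python
from statistics import mean, median


def normalize_array(replicate_signals):
    """Average replicate spots per antibody, then divide by the array median."""
    averages = {ab: mean(spots) for ab, spots in replicate_signals.items()}
    array_median = median(averages.values())
    return {ab: avg / array_median for ab, avg in averages.items()}


def fold_change(lem4_array, gfp_array):
    """Ratio of normalized Lem4-GFP signal to normalized GFP signal (assumed direction)."""
    lem4 = normalize_array(lem4_array)
    gfp = normalize_array(gfp_array)
    return {ab: lem4[ab] / gfp[ab] for ab in lem4 if ab in gfp}


# Hypothetical replicate spot intensities for three phospho-specific antibodies.
gfp_signals = {"site_1": [100, 120], "site_2": [200, 200], "site_3": [400, 420]}
lem4_signals = {"site_1": [50, 60], "site_2": [210, 190], "site_3": [400, 420]}

changes = fold_change(lem4_signals, gfp_signals)
for antibody, fc in changes.items():
    print(f"{antibody}: fold change = {fc:.2f}")
```

A fold change below 1 for a phospho-specific antibody, as for `site_1` in this toy input, would correspond to reduced tyrosine phosphorylation at that site in the Lem4-GFP cells relative to the GFP control.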