Toxoplasma gondii Cathepsin L Is the Primary Target of the Invasion-inhibitory Compound Morpholinurea-leucyl-homophenyl-vinyl Sulfone Phenyl*

The protozoan parasite Toxoplasma gondii relies on post-translational modification, including proteolysis, of proteins required for recognition and invasion of host cells. We have characterized the T. gondii cysteine protease cathepsin L (TgCPL), one of five cathepsins found in the T. gondii genome. We show that TgCPL is the primary target of the compound morpholinurea-leucyl-homophenyl-vinyl sulfone phenyl (LHVS), which was previously shown to inhibit parasite invasion by blocking the release of invasion proteins from microneme secretory organelles. As shown by fluorescently labeled LHVS and TgCPL-specific antibodies, TgCPL is associated with a discrete vesicular structure in the apical region of extracellular parasites but is found in multiple puncta throughout the cytoplasm of intracellular replicating parasites. LHVS fails to label cells lacking TgCPL due to targeted disruption of the TgCPL gene in two different parasite strains. We present a structural model for the inhibition of TgCPL by LHVS based on a 2.0 Å resolution crystal structure of TgCPL in complex with its propeptide. We discuss possible roles for TgCPL as a protease involved in the degradation or limited proteolysis of parasite proteins involved in invasion.

The recent completion of many genome-sequencing projects has allowed an unprecedented view of the complete set of proteases in biologically or medically important organisms (1). Of the five mechanistically distinct catalytic types (serine, cysteine, aspartyl, metallo, and threonine), cysteine proteases are the second largest group. In particular, cysteine proteases of the C1 papain family of "lysosomal" cathepsins have garnered intense scrutiny because of their key roles in cancer, embryogenesis, heart disease, osteoporosis, immunity, and infectious diseases. Microbial cathepsins, particularly those expressed by parasites, have also attracted attention recently because of their potential as targets for treatment of helminthic and protozoal infections (2,3).
The protozoan parasite Toxoplasma gondii infects virtually all warm-blooded animals and approximately one-third of the human population worldwide. Although most Toxoplasma infections are benign, severe opportunistic disease is seen in immunodeficient or immunosuppressed individuals or congenitally infected babies. T. gondii is an obligate intracellular organism that uses an actin-myosin-based motility system to actively invade nucleated host cells (4,5). The parasite secretes a variety of proteins during and after cell invasion that contribute to recognition of the host cell, formation of an adhesive "moving" junction, modulation of host signaling pathways and gene expression, and remodeling of the parasitophorous vacuole in preparation for parasite growth (6,7). Although it has been known for some time that many Toxoplasma secretory proteins are post-translationally modified by proteolysis before and/or after secretion, in most cases, the consequences of proteolysis or the specific protease involved are unclear.
Analysis of the T. gondii genome indicates the existence of five genes encoding cathepsin proteases of the papain family, including three cathepsin C proteases (TgCPC1, TgCPC2, and TgCPC3), one cathepsin B (Toxopain-1 or TgCPB), and one cathepsin L (TgCPL). TgCPC1 and TgCPC2 are secreted into the parasitophorous vacuole after parasite invasion and are proposed to function in nutrient acquisition (8). TgCPC3 is not expressed in tachyzoites, a rapidly dividing form of the parasite that is most commonly studied in the laboratory. TgCPB is localized in club-shaped invasion organelles called rhoptries, where it may act as a maturase for rhoptry proteins involved in modulation of the host cell (9). TgCPL is predicted to be a type II membrane protein, and a recent report by Reed and co-workers (10) showed that it has enzymatic activity with a low pH optimum and that it occupies a membrane-bound structure in the apical region of extracellular parasites. This same study revealed that T. gondii expresses two endogenous inhibitors of cysteine proteases (TgICP1 and TgICP2), but their role in regulating parasite or host cysteine proteases remains to be determined. Similar inhibitors are expressed by other parasites, including Trypanosoma cruzi, that act on host proteases, and the crystal structure of an inhibitor (chagasin)-enzyme (human cathepsin L) complex was recently reported (11).
In a recent study, we screened a small library of cathepsin and proteasome inhibitors and identified two compounds that substantially impair Toxoplasma cell invasion (12). The most effective of these compounds, morpholinurea-leucyl-homophenyl-vinyl sulfone phenyl (LHVS), inhibited invasion with a 50% inhibitory concentration (IC50) of ~10 μM. Further analysis revealed that LHVS blocks parasite attachment and gliding motility by impairing the release of proteins from a distinct set of apical secretory organelles called micronemes. Here we definitively show, using a variety of biochemical, genetic, and structural approaches, that TgCPL is the primary target of LHVS in the parasite.

EXPERIMENTAL PROCEDURES
Cloning, Protein Expression, Purification, Refolding, and Autoactivation-A 0.95-kb fragment coding for 94 carboxyl-terminal amino acids of the prodomain and the complete 224-amino acid mature domain of TgCPL (100TgCPL) was amplified by PCR from a T. gondii RH cDNA library with primers TgCatL.313(BamH1)f and TgCatL.1269(HindIII)r (supplemental Table S1) using Expand™ High Fidelity enzyme mix containing Taq DNA polymerase and Tgo DNA polymerase with proofreading activity (Roche Applied Science). The PCR product was gel-purified using a Qiagen gel extraction kit, initially ligated into pGEM-T Easy Vector (pGEM-T/100TgCPL), and transformed into the DH5α Escherichia coli strain. Clones were sequenced in both directions and digested with the restriction enzymes BamHI and HindIII. The complete cDNA encoding TgCPL was also amplified, cloned, sequenced, and deposited in GenBank™ (accession number DQ407191).
For recombinant protein expression, the 100TgCPL cDNA was ligated into BamHI- and HindIII-digested pQE30 vector, which provided an in-frame His6 tag at the N terminus (Qiagen), thereby generating pQE30/100TgCPL. Competent E. coli M15[pRep4] cells (Qiagen) were transformed with pQE30/100TgCPL and grown at 37°C in Terrific Broth containing 100 μg/ml ampicillin and 25 μg/ml kanamycin to an A600 of ~0.6, at which point 1 mM isopropyl-β-D-thiogalactopyranoside was added before additional culturing for 5 h. Inclusion bodies were purified from the harvested cells and resuspended in denaturing buffer (20 mM Tris-HCl, pH 8.0, 300 mM NaCl, 10 mM imidazole, and 8 M urea). Solubilized inclusion bodies were then incubated overnight at room temperature with Ni2+-nitrilotriacetic acid resin (Qiagen) that had been equilibrated with the same buffer. The mixture was loaded into a column and washed with 20-30 column volumes of denaturing buffer, and denatured proform TgCPL was eluted with elution buffer (100 mM NaH2PO4, 10 mM Tris, 8 M urea, pH 4.5). Elution fractions were pooled, concentrated ~5-fold using a spin concentrator (Millipore), and dialyzed overnight back into denaturing buffer. The denatured protein was then reduced with 10 mM dithiothreitol for 45 min at 37°C and diluted to 20 μg/ml with ice-cold refolding buffer (100 mM Tris-HCl, pH 8.0, 1 mM EDTA, 20% glycerol, 250 mM L-arginine, 2 mM reduced glutathione, 1 mM oxidized glutathione). After incubation at 4°C for 48 h with moderate stirring, proform recombinant TgCPL (rTgCPL) was concentrated ~10-fold and dialyzed overnight into proform buffer (50 mM Tris-HCl, pH 6.8, 900 mM NaCl, 2 mM EDTA). To activate the protease, proform rTgCPL was exchanged into activation buffer (100 mM sodium acetate, pH 5.5, 900 mM NaCl, 2 mM EDTA), followed by an increase of the dithiothreitol concentration to 5 mM and incubation at 37°C for 4-5 h. The efficiency of maturation was monitored using SDS-PAGE.
Autoactivated rTgCPL was then concentrated to 4.7 mg/ml, supplemented with 0.825 mM LHVS, and immediately flash frozen in liquid nitrogen prior to storage at −80°C to await crystallization.
Antibody Production-A primary injection of rabbits with 200 μg of proform rTgCPL in Freund's complete adjuvant (Sigma) was followed by four boosts of 200 μg each in Freund's incomplete adjuvant (Sigma) at 2-week intervals. A polyclonal mouse antiserum was also raised against proform rTgCPL by immunizing mice with a single injection of 25 μg of recombinant protein mixed with TiterMax Gold adjuvant (Cytex). Serum was collected 4 weeks postimmunization.
Activity-based Profiling-RH strain tachyzoites, grown in human foreskin fibroblasts and Dulbecco's minimal essential medium containing 10% fetal bovine serum and 2 mM glutamine, were filter-purified, washed with Dulbecco's minimal essential medium-glutamine-HEPES medium (Dulbecco's minimal essential medium, 2 mM glutamine, 10 mM HEPES), and resuspended at 5 × 10⁸ ml⁻¹. Fifty μl of parasite suspension was added to each well of a round bottom microwell plate containing 0.5 μl of DMSO or 20 μM BODIPY-LHVS (BO-LHVS) in DMSO. After labeling for 30 min in a humidified 37°C, 5% CO2 incubator, the plate was cooled on ice and centrifuged for 5 min at 4°C. Forty-five μl of supernatant was removed, and parasites were lysed in 50 μl of SDS-PAGE sample buffer containing 2% 2-mercaptoethanol and placed in a 100°C water bath for 5 min. Samples were vortexed vigorously, microcentrifuged to remove insoluble material, and resolved by 12.5% SDS-PAGE. Gels were imaged on a Typhoon Trio PhosphorImager (GE Healthcare) and quantified using ImageQuant software.
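As a quick arithmetic check on the labeling conditions above, the final probe concentration in each well can be estimated from the stock concentration and the two volumes. The following is a minimal sketch that assumes simple additive volumes; the function name is illustrative and is not part of the published protocol.

```python
def final_concentration_nM(stock_uM: float, stock_ul: float, sample_ul: float) -> float:
    """Final concentration (nM) after adding stock_ul of stock to sample_ul of sample.

    Assumes the volumes are additive and the stock mixes completely.
    """
    total_ul = stock_ul + sample_ul
    return stock_uM * 1000.0 * stock_ul / total_ul


# 0.5 ul of 20 uM BO-LHVS added to 50 ul of parasite suspension
probe_nM = final_concentration_nM(stock_uM=20.0, stock_ul=0.5, sample_ul=50.0)
print(f"{probe_nM:.0f} nM")
```

Under these assumptions the result is close to the nominal 200 nM BO-LHVS used for labeling elsewhere in the text.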
Targeted Disruption of TgCPL-Genomic flanking sequences of the TgCPL gene were obtained from the Toxoplasma gondii data base (ToxoDB (39)). Primers (supplemental Table S1) were used to amplify ~1.5 kb of the 5′- and 3′-sequence flanking the gene. The flanks were amplified to have overlap on one end with the dihydrofolate reductase-thymidylate synthase (DHFR-TS) selectable marker cassette, and similarly, the DHFR-TS marker cassette was amplified to contain TgCPL sequences on the ends. RH genomic DNA was used as template for amplifying the 5′- and 3′-flanks, and pDHFR-TSc plasmid (13) was used as template for amplifying the DHFR-TS selectable marker. A fusion PCR product was then amplified using primers and the three individual 5′, 3′, and DHFR-TS fragments as templates. The TgCPL knock-out construct (13 μg) was electroporated into RH and Ku80 tachyzoites using conditions described previously (14). Parasites were grown in 24-well plates for 2 days in the absence of drug selection before being transferred to new wells containing 1 μM pyrimethamine (Sigma). After 2 weeks of selection, parasite clones were derived by limiting dilution in 96-well plates. Clones were screened for the absence of TgCPL expression by immunoblotting and fluorescence microscopy. Clones were also assessed for correct integration of the construct at the TgCPL locus by PCR using the primers depicted in Fig. 2A (supplemental Table S1).
Immunoblotting, Immunoprecipitation, and Fluorescence Microscopy-Immunoblotting, immunoprecipitation, and indirect immunofluorescence assay were performed essentially as described previously (15). Antibodies were diluted 1:5,000 for immunoblotting and 1:250 for indirect immunofluorescence assay. Tachyzoites were labeled with 200 nM BO-LHVS for 30 min in a humidified 37°C, 5% CO2 incubator before fixation with 4% paraformaldehyde and antibody staining. Images were acquired on a Nikon E800 upright fluorescence microscope with a SpotRT slider camera or a Zeiss Axiovert Observer Z1 inverted fluorescence microscope with an AxioCam MRm camera and processed using Compix software or Zeiss Axiovision 4.3 software, respectively.
Protein Crystallization-Purified protein was screened at the high throughput facility at the Hauptman Woodward Institute to identify initial crystallization conditions (16). Twenty-four crystallization leads were identified at the 4-week time point. One of these leads was further optimized in house using sitting drop vapor diffusion to produce crystals suitable for x-ray diffraction data collection. Upon thawing, 1 μl of autoactivated protein solution (4.7 mg/ml) was mixed with an equal volume of reservoir solution (40% polyethylene glycol 8000, 0.1 M ammonium bromide, and 0.1 M sodium citrate at pH 4.0). The resultant drop was equilibrated over a 100-μl reservoir at 25°C. American football-shaped crystals measuring ~30 μm from point to point appeared within 2 weeks. The crystals were cryoprotected by adding 2.5 μl of reservoir solution followed by 0.5 μl of ethylene glycol directly to the drop. Crystals were then mounted in cryoloops and frozen in liquid nitrogen in preparation for x-ray diffraction experiments.
Data Collection and Structure Determination-Crystals of TgCPL were screened at the Stanford Synchrotron Radiation Lightsource on beamline 11-1 using the Stanford Synchrotron Radiation Lightsource automated mounting system (17). Data from a single crystal maintained at 100 K were collected at a wavelength of 0.98 Å using the Blu-Ice software package (18) and processed using HKL2000 (19) to a resolution of 2.0 Å. The crystals belong to space group P4₃2₁2 with unit cell dimensions of 65.6 × 65.6 × 149.8 Å. One complex of the active protease with its propeptide is present in the asymmetric unit, giving a solvent content of ~43% and a Matthews coefficient of ~2.2 Å³/Da. However, the asymmetric unit was initially believed to contain one or, less plausibly, two copies of only the active protease in complex with LHVS, which would correspond to Matthews coefficients of ~3.3 or 1.65 Å³/Da, respectively.
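The relationship between the unit cell, the Matthews coefficient, and the solvent content quoted above can be sketched as follows. This is an illustrative calculation, not the procedure used by the authors: it assumes eight asymmetric units per cell for this tetragonal space group and the conventional approximation that solvent fraction is 1 - 1.23/V_M. The masses it prints are back-calculated from the quoted Matthews coefficients and are not measured values.

```python
def unit_cell_volume_tetragonal(a: float, c: float) -> float:
    """Volume (cubic angstroms) of a tetragonal cell with a = b and 90 degree angles."""
    return a * a * c


def matthews_coefficient(cell_volume: float, z: int, mass_da: float) -> float:
    """V_M (A^3/Da): cell volume divided by (asymmetric units x mass per asymmetric unit)."""
    return cell_volume / (z * mass_da)


def solvent_fraction(vm: float) -> float:
    """Approximate solvent fraction, using 1.23 A^3/Da as the protein volume estimate."""
    return 1.0 - 1.23 / vm


Z_ASYMMETRIC_UNITS = 8  # assumption: general positions in the tetragonal space group
volume = unit_cell_volume_tetragonal(65.6, 149.8)

# Mass per asymmetric unit implied by each quoted V_M (back-calculated, not measured)
for vm in (2.2, 3.3, 1.65):
    implied_mass_da = volume / (Z_ASYMMETRIC_UNITS * vm)
    print(f"V_M = {vm:4.2f} A^3/Da: implied mass ~{implied_mass_da / 1000:.1f} kDa, "
          f"solvent ~{solvent_fraction(vm) * 100:.0f}%")
```

The comparison illustrates why a larger assumed mass per asymmetric unit (protease plus propeptide) lowers the Matthews coefficient relative to the protease-only assumption.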
The structure of TgCPL was solved by molecular replacement with the program Phaser (20), using the full resolution range of the data and a search model derived from human procathepsin L (Protein Data Bank code 1cs8). The sequence identity between the active protease domains of TgCPL and 1cs8 is about 50%, whereas the identity between the full-length constructs of the two proteins is about 42%. Because it was expected that the crystal would contain the protease domain in complex with the inhibitor LHVS, the search model was modified prior to molecular replacement to be consistent with this expectation (21). First, the propeptide domain of 1cs8 was manually removed, and then, using a pairwise sequence alignment created with ClustalW (22), the non-conserved amino acids were truncated or removed using the program CHAINSAW (23). The resulting Phaser-placed model was refined as a rigid body using Refmac5 (24) and manually edited using Coot (25). Refinement continued by iteration of manual editing in Coot, followed by restrained refinement in Refmac5. For all steps from data preparation through refinement, the CCP4 suite of programs (26,27) was used.
After several building/refinement cycles of the nearly full-length protease, large polypeptide-like blobs of positive difference density remained, and R factors were stalled near 40%. This suggested that the asymmetric unit contained more than the single copy of the protease that was placed by molecular replacement, but there did not appear to be enough room to place a second copy. LHVS was present in the protein solution, and although it is somewhat peptide-like in appearance, the difference density extended well beyond the active site and was much too large to be accounted for solely by inhibitor. It was then realized that the propeptide domain had not been purified away from the active protease. The model of the protease domain was submitted to ARP/wARP (28) to use for initial phase calculations in the automated rebuilding of the full-length proform of the TgCPL construct. This step also served to reduce bias toward the search model. Indeed, ARP/wARP was able to build 64 amino acids belonging to the propeptide in addition to rebuilding most of the existing model, which resulted in a drop of the R factor by ~15%. The iteration of building and refinement continued as before with this more complete model. In the final cycles of refinement, the propeptide and the protease were described by four translation/libration/screw groups each, with group boundaries suggested by the TLSMD server (29,30). Translation/libration/screw parameters were refined for each group prior to restrained refinement in Refmac5. Model quality was monitored and validated using Coot, MolProbity (31), and Rfree. Data collection and model refinement statistics are presented in Tables 1 and 2, respectively.
One hundred sixty-five water molecules, one ethylene glycol molecule from the cryoprotectant, and two halide atoms originating from either the protein storage buffer (chloride) or the mother liquor (bromide) were placed in the final model. The last two corresponded to strong, round difference density peaks that are surrounded primarily by nitrogen atoms but also a few oxygen atoms and do not appear to be water molecules or metal ions because the distances to the closest potential hydrogen bond donors/acceptors or coordinating ligands are too great. The surrounding atoms are contributed by symmetry-related protein molecules, so the modeled halides are involved in crystal contacts.
Modeling of the LHVS Inhibitor into the TgCPL Active Site-A search of the Protein Data Bank (32) for "vinyl sulfone" yielded 14 hits, all of which were inhibitors bound to papain-like proteases or to the ATP-dependent protease, HslV. These vinyl sulfone inhibitor structures were then manually examined for a scaffold that approximated that of LHVS. Through this search, we identified the inhibitor N-[1S-(2-phenylethyl)-3-phenylsulfonylallyl]-4-methyl-2R-piperazinyl carbonylaminovaleramide (APC3328) bound to human cathepsin K (Protein Data Bank code 1mem) (33), which differs from LHVS only in that it contains a piperazine ring in place of the morpholine ring of LHVS, a single-atom difference of nitrogen versus oxygen. Structure factors were deposited in the Protein Data Bank with the 1mem coordinates, so it was possible to inspect the electron density of the APC3328 inhibitor. Difference electron density shows that the rotamer selected for the inhibitor's leucine side chain is not ideal, so the side chain was altered from the deposited 1mem coordinates to a rotamer that more favorably fits the density.
APC3328 was initially placed in the active site of TgCPL by performing a least squares superposition of all the catalytic triad atoms of TgCPL and 1mem using Lsqkab (34), after first removing the propeptide residues from TgCPL. The root mean square deviation (RMSD) between the catalytic triads is 0.28 Å, with differences between the side chain atoms of the histidine/asparagine residues contributing the most to this value. Nitrogen N4 of the piperazine ring in the superimposed inhibitor was substituted with oxygen to change APC3328 into LHVS, and the side chains of the catalytic triad residues of TgCPL were substituted with those of the superimposed 1mem structure. The side chain amide of Gln 69 was flipped 180° so that Nε could hydrogen-bond with O4 of the LHVS morpholine ring; Oε of Gln 69 is 3.2 Å from N of Lys 181p in the propeptide-bound structure. To achieve a more meaningful model of the binding mode, energy minimization of this superimposed, modified ligand was performed. To this end, water molecules were removed, and hydrogen atoms were added to protein and ligand polar groups. The active site was defined as residues within 7.0 Å of the starting position of the superimposed ligand. Energy minimization calculations were carried out with QXP/FLO (35). Protein atoms were fixed during the calculations, with the exception of the side chain of Gln 69.
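For readers unfamiliar with the RMSD statistic reported above, the following minimal sketch shows how it is evaluated once two matched atom sets have already been superimposed. It does not perform the least squares fit itself (that step was done with Lsqkab), and the coordinates are hypothetical values chosen for illustration; they are not taken from the TgCPL or 1mem structures.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Point], coords_b: Sequence[Point]) -> float:
    """RMSD between two equally ordered, already superimposed atom lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical coordinates (angstroms) for illustration only
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
moved = [(0.0, 0.0, 0.3), (1.5, 0.0, 0.3), (1.5, 1.5, 0.3)]
print(round(rmsd(reference, moved), 2))
```

Because every atom contributes its squared displacement, a few poorly matched side chain atoms can dominate the value, which is consistent with the note above that the histidine/asparagine side chains contribute most.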

RESULTS
LHVS Reacts with Recombinant and Native TgCPL-LHVS is a dipeptide vinyl sulfone that was originally designed as a selective inhibitor of human cathepsins S and V. However, extensive studies of its specificity have revealed that, although most potent against cathepsins S and V, it is also a moderate inhibitor of human cathepsins K, L, and O2 and a weak inhibitor of human cathepsin B (36,37). Since LHVS is an order of magnitude more potent toward human cathepsin L than human cathepsin B, we expected that T. gondii cathepsin L should likewise be much more sensitive to inhibition by this compound than the cathepsin B homolog. To test TgCPL for reactivity with LHVS, we used a fluorescently labeled derivative of LHVS termed BO-LHVS (Fig. 1A). rTgCPL was expressed in E. coli, extracted, purified, refolded, and activated, as described previously (38) (supplemental Fig. S1). BO-LHVS incubation with rTgCPL and analysis by fluorescence-scanned SDS-polyacrylamide gels (activity-based protein profiling) showed saturated covalent labeling at molar ratios of >1:1 (Fig. 1B). BO-LHVS failed to react with heat-inactivated rTgCPL, demonstrating that labeling is dependent on enzymatic activity. To determine if LHVS also reacts with native TgCPL, we performed activity-based protein profiling of live purified tachyzoites incubated with solvent (DMSO) or 200 nM BO-LHVS (Fig. 1C). BO-LHVS strongly labeled a 30-kDa protein and weakly labeled proteins of 29 and 24 kDa. Immunoprecipitation with anti-rTgCPL confirmed that the 30-kDa species is TgCPL. The 29-kDa species may have also been immunoprecipitated by anti-rTgCPL but was only very faintly detected, whereas the 24-kDa species was not observed in the immunoprecipitate and therefore is probably unrelated to TgCPL.
Since BO-LHVS differs from LHVS by the presence of the BODIPY fluorophore and the absence of the morpholinurea group (Fig. 1A), either of which could affect specificity, it was important to confirm that LHVS itself can inhibit TgCPL in live parasites; we did so using a competition assay (Fig. 1D). Tachyzoites were preincubated with increasing concentrations of LHVS before exposure to BO-LHVS and gel analysis. LHVS effectively blocked subsequent TgCPL labeling by BO-LHVS, exhibiting an IC50 of ~20 nM for the 30-kDa species. Interestingly, the 29-kDa species appeared to be slightly more refractory to inhibition by LHVS, suggesting that it is either a distinct protein or a modified form of TgCPL with lower affinity for LHVS. To examine the subcellular distribution of active TgCPL within the parasite, we labeled live extracellular and replicating intracellular tachyzoites with BO-LHVS before fixing and staining with anti-rTgCPL (Fig. 1E). Although BO-LHVS labeling was much weaker than that of the antibody, the two staining patterns were nearly identical, with extracellular parasites showing one or two discrete structures usually in the apical region but occasionally posterior to the nucleus. A few of these structures were labeled with anti-rTgCPL but not BO-LHVS, possibly indicating the presence of a pool of proform or otherwise inactive enzyme in some parasites. Intracellular replicating tachyzoites tended to display a greater number of TgCPL-associated structures distributed throughout the parasite. Parasites preincubated with LHVS showed a dose-dependent reduction in labeling of the TgCPL-associated structures (data not shown), confirming that the labeling is principally specific. Properties of the TgCPL-associated organelle will be described in greater detail in a separate study. Together, these findings establish that TgCPL is a major reactive target of BO-LHVS in vitro and in live parasites.
Targeted Disruption of TgCPL-To validate the specificity of LHVS for TgCPL, we generated parasites deficient in TgCPL expression by gene ablation. The TgCPL gene is composed of four exons, is present in a single copy on chromosome 1b, and is transcribed at a moderate level in a variety of Toxoplasma strains as annotated in ToxoDB (39). We generated TgCPL-deficient strains by double homologous gene replacement with a mutant allele of DHFR-TS conferring resistance to pyrimethamine (13) (Fig. 2A). TgCPL was deleted in RH strain parasites and in an RH-derived strain called Ku80, which is more amenable to targeted genetic manipulation (40). Both knock-out strains, RHΔcpl and Ku80Δcpl, showed the expected pattern of PCR products consistent with targeted deletion of TgCPL (Fig. 2B). The absence of TgCPL expression in the knock-out strains was confirmed by immunoblotting (Fig. 2C), showing the loss of the 30-kDa major immunoreactive species.
An immunofluorescence assay (Fig. 2D) also showed the lack of TgCPL staining in newly invaded intracellular RHΔcpl or Ku80Δcpl tachyzoites. Similar results were seen in extracellular and replicating intracellular knock-out tachyzoites (data not shown). Collectively, these results confirm the abrogation of TgCPL expression in the RHΔcpl and Ku80Δcpl strains.

BO-LHVS Labeling of TgCPL Knock-out Strains-To determine if TgCPL is the primary LHVS reactive species in tachyzoites, we incubated wild-type and knock-out parasites with BO-LHVS for activity-based protein profiling. Neither knock-out strain showed reactivity in the 30-kDa region above background, strongly suggesting that TgCPL is both the 30- and 29-kDa reactive species (Fig. 3A). Although several additional minor reactive bands remained visible in RHΔcpl and Ku80Δcpl lysates, including 24 and 16 kDa bands, integration of the fluorescence scans showed that collectively these labeled species constitute only 14.8% of the TgCPL reactivity in the wild-type strains. Moreover, as described above (Fig. 1D), BO-LHVS labeling of these species is not blocked by pretreatment with unlabeled LHVS, indicating their nonspecific reactivity with BO-LHVS. Whereas BO-LHVS labeling of newly invaded intracellular RH parasites showed staining of discrete puncta, these structures were not labeled in RHΔcpl parasites. RHΔcpl showed faint residual staining with BO-LHVS that seems to be associated with the internal periphery of the parasite. The identity of this structure is unknown but could be the parasite's tubular mitochondrion or a subdomain of the endoplasmic reticulum. Similar results were seen with Ku80Δcpl (data not shown). Staining of numerous puncta within the cytoplasm of both infected and non-infected host cells is consistent with BO-LHVS labeling of host cathepsins within lysosomes, which conveniently serve as an internal control for reactivity. Collectively, these findings definitively establish that TgCPL is the primary target of BO-LHVS.
X-ray Crystal Structure of TgCPL and Its Propeptide Reveals the Canonical Catalytic Triad and Active Site Cleft-To determine whether the active site architecture of TgCPL is consistent with susceptibility to LHVS, we determined the x-ray crystal structure of rTgCPL as a complex with much of its propeptide at a resolution of 2.0 Å (Fig. 4). The enzyme was autoproteolytically activated prior to crystallization, as evidenced by SDS-PAGE. An attempt was made to separate the propeptide from the activated protease by chromatography, but it was unsuccessful due to low recovery of material. Thus, by necessity, the crystals were grown from a protein solution containing the active protease and the cleaved propeptide. The final model of TgCPL consists of residues 108p-182p of the propeptide and residues 2-224 of the protease. The propeptide is missing the N-terminal His tag and linker through the first three cloned residues (Ile 105p-Glu 107p) and the last 16 residues (Ser 183p-Leu 198p), presumably due to disorder in these regions of the protein. The protease model is essentially complete, with only the first residue (Asn 1) not visible in the model. The refined structural model for TgCPL is available in the Protein Data Bank with accession code 3f75.

SEPTEMBER 25, 2009 • VOLUME 284 • NUMBER 39 JOURNAL OF BIOLOGICAL CHEMISTRY 26843
As expected, TgCPL adopts the papain-like fold (reviewed in Refs. 41 and 42) and looks very similar to previously determined papain-like cysteine protease structures, particularly other cathepsin Ls (Fig. 4A). The full TgCPL model, including the propeptide, superimposes by secondary structure matching (43) onto human procathepsin L (Protein Data Bank code 1cs8) with an RMSD of 1.04 Å for 282 aligned Cα atoms (of 297 for TgCPL and 316 for human procathepsin L). Considering only the catalytic domain, TgCPL superimposes on mature human cathepsin L (Protein Data Bank code 3bc3) (44) with an RMSD of 0.96 Å over 210 aligned Cα atoms, essentially the entire polypeptide chain.
The protease consists of two domains divided by the deep active site cleft; the left (L) domain is primarily α-helical, whereas the right (R) domain contains a β-barrel-like motif that is decorated by a few short α-helices. The canonical catalytic triad is composed of the catalytic cysteine Cys 31, positioned at the N terminus of the long central helix in the L domain, and His 167 and Asn 189 in the R domain. TgCPL contains the three stabilizing disulfide bonds that are highly conserved among other papain-like cysteine proteases. They link cysteines Cys 28 to Cys 71 and Cys 62 to Cys 104 in the L domain and Cys 161 to Cys 213 in the R domain. Two additional disulfide bonds are also present, namely Cys 90-Cys 104 and Cys 161-Cys 213, with the latter possibly being a mixture of free and disulfide-bonded cysteines. These two additional disulfide bonds are not highly conserved among the papain-like cysteine proteases, so they are not likely to be critical for proper folding or activity of TgCPL.

[Fig. 2 legend] A, schematic illustration of the TgCPL knock-out strategy. A knock-out construct consisting of ~3 kb of 5′- and 3′-flanking sequence from the TgCPL gene appended to either side of a DHFR-TS-selectable marker cassette was transfected into RH and Ku80 parasites for double crossover gene replacement of TgCPL. The arrows indicate PCR primers used in B. B, agarose gel electrophoresis of PCR products derived from parental (RH and Ku80) and knock-out (RHΔcpl and Ku80Δcpl) strains by amplification with the indicated primers. C, immunoblot analysis of parental and knock-out strains probed with RαTgCPL. Note the absence of the TgCPL reactivity. Asterisks denote nonspecific bands. A parallel blot was probed with anti-actin as a loading control. D, indirect immunofluorescence assay of newly invaded intracellular tachyzoites showing MαTgCPL reactivity with RH and Ku80 (arrows) but lack of reactivity with RHΔcpl or Ku80Δcpl.
As observed in other cathepsin L structures containing the propeptide (e.g. Protein Data Bank codes 1cjl and 1cs8) (45), the propeptide of TgCPL is composed of an N-terminal globular domain of three α-helices followed by an extended C-terminal tail that occupies the active site cleft (Fig. 4). The propeptide tail lies in the cleft in the opposite orientation to that of a natural polypeptide substrate and, in a true proenzyme, continues around the R domain and links to the N terminus of the protease. Electron density for the final 16 residues of the propeptide is not observed in the structure. These residues could have been removed by further proteolysis, or they might not form an ordered structure and thus are not visible in the electron density map.
Propeptide residues Lys 176p-Lys 182p occupy the majority of the active site cleft (Fig. 4, B and C). Although the orientation of the propeptide is reversed relative to the peptide backbone of a natural substrate, the environments of the substrate-binding subsites can still be observed in relation to these residues. Lys 181p sits roughly in the S3 subsite. Phe 180p occupies the S2 subsite, the major specificity-determining subsite, and indeed it has been observed that cathepsin L favors substrates with an aromatic residue at P2 (41). Gly 179p is adjacent to S1, and the peptide bond between Leu 178p and Gly 179p lies directly above the catalytic cysteine, approximating the position of the scissile bond of the substrate. Leu 178p occupies the S1′ pocket of the protease active site, with its backbone carbonyl oxygen forming a hydrogen bond with Nε of Gln 25 while being positioned only 3 Å from the Sγ of the catalytic cysteine (Cys 31). Interestingly, Leu 178p exhibits a somewhat strained backbone conformation (φ = −110°, ψ = −116°, ω = 167°). Residual electron density near the carbonyl oxygen (Fig. 4D) suggests that the true conformation of this peptide bond is further distorted from the refined position, which is biased by refinement restraints describing typical peptide geometry. We note that the equivalent propeptide residue in the human procathepsin L (Protein Data Bank code 1cjl) and procathepsin K (Protein Data Bank codes 1by8 and 7pck) structures also displays a strained backbone conformation, although the potential significance is not clear.
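Backbone torsion angles such as those quoted above for Leu 178p are dihedral angles defined by four consecutive backbone atoms. The sketch below shows one standard way to compute a signed dihedral angle from four points; the coordinates are hypothetical and chosen only so that the geometry is easy to verify by hand, and the function names are illustrative. For reference, a planar trans peptide bond has a torsion near 180°, so a value of 167° for that bond would represent a deviation of about 13° from planarity.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p1: Vec, p2: Vec, p3: Vec, p4: Vec) -> float:
    """Signed dihedral angle (degrees) about the p2-p3 bond, in the range -180 to 180."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals to the two planes
    b2_len = math.sqrt(_dot(b2, b2))
    y = _dot(b2, _cross(n1, n2)) / b2_len
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Hypothetical points: the first bond lies along x, the last bond along y
gauche = dihedral_deg((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0))
# Hypothetical points: the first and last bonds point in opposite directions (trans)
trans = dihedral_deg((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (-1.0, 0.0, 1.0))
print(round(gauche, 1), round(abs(trans), 1))
```

In practice the four points would be the relevant backbone atoms read from a coordinate file, which this sketch does not attempt to parse.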
Modeling the Binding Mode of LHVS-Although the TgCPL recombinant protein used for structural studies was treated with LHVS, the enzyme remained in a complex with its natural inhibitor, the propeptide, presumably because crystallization conditions were unfavorable for its displacement by LHVS. Nevertheless, several structures of cathepsin-vinyl sulfone inhibitor complexes have been characterized, enabling us to model the TgCPL·LHVS complex and test whether a favorable binding mode is possible (Fig. 5). The structure of human cathepsin K in complex with APC3328 (Protein Data Bank code 1mem) (33) was used to aid the initial positioning of LHVS in the active site. APC3328 is the closest structural analog of LHVS available in the Protein Data Bank, differing by only a single atom (nitrogen versus oxygen). The conservation of the catalytic triads between the two cathepsins, with an RMSD of only 0.28 Å, allows for an excellent initial placement of the inhibitor. After this initial placement and substitution of the piperazine nitrogen in APC3328 by an oxygen atom to create the morpholine group of LHVS, the binding mode model was improved by energy minimization.
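The RMSD quoted above for the catalytic triads is the root mean square deviation between equivalent atoms of the two enzymes. The following is a minimal sketch of that calculation. It assumes that the two coordinate sets have already been optimally superposed and that atoms are listed in the same order; a real comparison would first perform a least-squares superposition, which is not shown. The function name and the coordinates are hypothetical and are not taken from either structure.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally ordered atom lists.

    Each list holds (x, y, z) tuples in angstroms. No superposition is
    performed here; the inputs are assumed to be already aligned.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = 0.0
    for atom_a, atom_b in zip(coords_a, coords_b):
        squared += sum((a - b) ** 2 for a, b in zip(atom_a, atom_b))
    return math.sqrt(squared / len(coords_a))


# Hypothetical atoms: the second set is the first shifted by 0.28 angstroms
# along x, so every atom pair is exactly 0.28 angstroms apart.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 1.5, 0.0)]
shifted = [(x + 0.28, y, z) for x, y, z in reference]
example_rmsd = rmsd(reference, shifted)
```

Because the measure averages squared distances over all compared atoms, a value as small as 0.28 Å indicates that the catalytic residues of the two enzymes nearly coincide, which is why the inhibitor could be transferred from one active site to the other as a starting pose.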
Overall, the model predicts that LHVS fits quite well in the active site, establishing several hydrogen bonds with residues lining the active site cleft (Fig. 5, B and C). The morpholine group of LHVS mimics position P3 of the natural substrate and occupies the S3 subsite of the protease, with the morpholine oxygen forming a hydrogen bond with Nε of Gln69. This favorable interaction is afforded by allowing the side chain amide of Gln69 to rotate ~180° during energy minimization from its orientation in the propeptide-bound complex, where the side chain carbonyl oxygen is about 3.2 Å from N of propeptide residue Lys181p. The leucyl moiety of LHVS mimics substrate position P2 and occupies the largely hydrophobic S2 pocket of the protease. The amide and carbonyl of the P2 leucine form antiparallel hydrogen bonds with TgCPL residue Gly74, as would be expected for the backbone of the P2 residue in the natural polypeptide substrate. Homophenylalanine mimics the P1 position of the substrate, and its side chain is situated along the fairly shallow S1 protease subsite, whereas its backbone amide forms a hydrogen bond with the carbonyl oxygen of Asp166. The catalytic Cys31 nucleophilically attacks the vinyl group, which mimics the scissile bond, to form a covalent complex with the inhibitor. The phenyl sulfone of LHVS mimics P1′ of the natural substrate and thus occupies the S1′ subsite in the active site cleft. One of the sulfonyl oxygen atoms is oriented toward the catalytic site, where it is stabilized by Nε of Gln25 and Nε of Trp191 and also by Nδ of His167 in the catalytic triad. The binding mode predicted by this model strongly suggests that a covalent complex between TgCPL and LHVS is favorable and that LHVS will inhibit TgCPL activity, as it does that of other cathepsin L proteases.

DISCUSSION
Identification of the in vivo target is often the most difficult step in the characterization of small molecule inhibitors. Two properties of LHVS greatly facilitated identification of its main target, TgCPL. First, the vinyl sulfone warhead of LHVS covalently modifies the active site thiol of reactive cathepsin proteases, thus allowing irreversible labeling of its target. Second, the structure of LHVS and its known binding mechanism permitted the synthesis of a functional fluorescent derivative, BO-LHVS, for activity-based protein profiling and target identification. The versatility of this chemical probe is also an asset, since it can label targets in vitro or in live cells, and it allows an assessment of the specificity, abundance, and subcellular distribution of the active enzyme. It should be noted, however, that specificity depends greatly on the concentration of inhibitor used. Incubation of live parasites with 200 nM BO-LHVS principally results in labeling of TgCPL. Several additional minor products are also labeled, but these are not blocked by pretreatment with LHVS, indicating that they are nonspecific targets. LHVS inhibition of parasite microneme secretion, gliding motility, and attachment occurs with an IC50 of 10 μM (12), whereas LHVS inhibits TgCPL activity with an IC50 of ~20 nM. Therefore, it remains possible that a target in addition to TgCPL contributes to the observed effects on parasite cell entry. Alternatively, the parasite might express a small pool of TgCPL that is only susceptible to high concentrations of LHVS or is maintained in the proform and is thus not reactive with LHVS. Indeed, evidence of this was seen by fluorescence microscopy, where some TgCPL-associated structures failed to label with BO-LHVS (Fig. 1E), although only a small subset of parasites displayed this phenomenon. These antibody-reactive but BO-LHVS-unreactive structures may contain a store of procathepsin L in which the propeptide blocks reaction with the catalytic cysteine.
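The gap between the two potencies discussed above can be made explicit with a short calculation. The following is an illustrative sketch only: it assumes the reported values are 10 μM for the cellular effects and about 20 nM for TgCPL activity, and it uses a simple one-site hyperbolic inhibition model that is not taken from this work. LHVS is a covalent inhibitor, so its apparent IC50 also depends on incubation time, which this model ignores. The function names are hypothetical.

```python
def fold_difference(ic50_high_nm, ic50_low_nm):
    """Ratio of two IC50 values given in the same units (nM here)."""
    if ic50_low_nm <= 0:
        raise ValueError("IC50 must be positive")
    return ic50_high_nm / ic50_low_nm


def fractional_inhibition(conc_nm, ic50_nm):
    """Fraction of activity inhibited under a one-site hyperbolic model.

    This is an illustrative assumption: it treats inhibition as a simple
    equilibrium and ignores the time dependence of covalent inhibitors.
    """
    if conc_nm < 0 or ic50_nm <= 0:
        raise ValueError("concentration must be >= 0 and IC50 must be > 0")
    return conc_nm / (conc_nm + ic50_nm)


# 10 uM (cellular effects) expressed in nM, versus ~20 nM (enzyme activity).
potency_gap = fold_difference(10_000.0, 20.0)

# Model-predicted inhibition of the enzyme at a dose equal to the
# cellular IC50 of 10 uM.
enzyme_inhibition_at_cellular_ic50 = fractional_inhibition(10_000.0, 20.0)
```

Under these assumptions the two values differ by 500-fold, and the model predicts nearly complete enzyme inhibition well below the dose needed to block invasion. That gap is what motivates the two explanations offered in the text: an additional target, or a pool of TgCPL that is poorly accessible or unreactive to LHVS.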
Preliminary phenotypic studies suggest that the RHΔcpl and Ku80Δcpl strains show different invasion competencies. TgCPB expression is up-regulated to different levels in these strains, which is reminiscent of the TgCPC2 up-regulation seen after genetic ablation of TgCPC1 (8). Elevation of TgCPB expression may suppress phenotypes in the TgCPL-deficient strains. Determining the precise relationship between the functions of TgCPL and TgCPB will require more extensive genetic and cell biological studies that are beyond the scope of the current work.
TgCPL is associated with a discrete vesicular structure usually seen in the apical region of extracellular parasites. Intracellular replicating parasites show multiple puncta of TgCPL structures throughout the cytoplasm. Although the exact nature of the TgCPL-associated structure(s) is still being investigated, the available evidence suggests that it is an endocytic organelle possibly related to a lysosome.⁴ In this case, TgCPL might act in the classical role of a cathepsin (i.e. as a degradative protease involved in protein turnover and nutrient acquisition). Additionally, TgCPL may have a specialized role in the selective proteolysis of substrates, akin to, for example, the function of human cathepsin L in the processing of the MHC II invariant chain in thymic epithelial cells (46) and proenkephalin in neuroendocrine cells (47). In these examples, cathepsin L selectively degrades a regulator of antigen presentation (invariant chain) and performs limited proteolysis of a key neurotransmitter (proenkephalin). Approximately half of the proteins targeted to micronemes and all of those destined for rhoptries undergo limited proteolysis (maturation) en route to these invasion organelles. Indeed, recent studies have revealed that rhoptry and microneme proteins traffic through the parasite endocytic system (15,48,49), where they may encounter TgCPL. Many of these proteins are also further processed coincident with their secretion during parasite cell invasion. It is possible that LHVS inhibition of TgCPL (or other reactive targets) interferes with the processing of invasion proteins, thus accounting for the effects of LHVS on parasite entry. TgCPL is expressed both in the tachyzoite stage responsible for acute infection and opportunistic disease and in the bradyzoite cyst stage seen during chronic asymptomatic infection (8).⁴ Thus, TgCPL could contribute to infection during both stages seen in humans.
Identification of TgCPL substrates by comparative analysis of wild type and knock-out parasites or LHVS-treated parasites should provide additional insight into whether it serves as a general degradative protease or has a more selective role in the processing of invasion proteins or both. The x-ray crystal structure of TgCPL in complex with its propeptide provides a detailed look at the active site of the enzyme and facilitates modeling of the LHVS binding mode. A valid and meaningful model of the LHVS binding in TgCPL is possible from the structure presented here for several reasons. First, the in vitro and in vivo work presented here clearly shows that TgCPL is a primary target of LHVS. Second, the inhibitor is covalently linked to the protein at a defined location and with a known directionality in the active site cleft, which severely restricts the binding mode search space that must be sampled. Third, since comparisons of proform protease structures with structures of their mature enzyme show that the catalytically active form already exists within the proenzyme (41,42), the presence of the cleaved propeptide in the structure probably has little effect on the active site architecture. The report by Huang et al. (10) that TgCPL favors Leu over other residues in the P2 position is also consistent with the favorable recognition and binding of LHVS. Thus, the model shown in Fig. 5 confirms that a favorable interaction is possible with the TgCPL active site and presents a plausible structural basis for the observed in vitro and in vivo activity of LHVS.
FIGURE 5. Structural basis of LHVS inhibition. A, surface representation of the TgCPL active site cleft with modeled LHVS shown as balls and sticks. The catalytic triad is colored magenta, and the view is the same as that depicted in Fig. 4, B and C. B, stereoview of the LHVS binding mode. The surface of TgCPL has been removed, and amino acids within 5 Å of modeled LHVS are shown as sticks, with hydrogen bonds between the enzyme and the inhibitor shown as dashed lines. C, two-dimensional representation of the LHVS binding mode. LHVS is shown as black lines, whereas the TgCPL amino acids that surround it are shown as gray lines. Hydrogen bonds are shown as dashed lines. (C was created with ChemDraw 11.)
In summary, we have identified TgCPL as the primary target of the protease inhibitor LHVS and have shown that the structure of the active site cleft is consistent with recognition of LHVS. Future work will focus on identifying TgCPL substrates and determining its role in parasite microneme secretion and cell invasion.