Structure, interdomain dynamics, and pH-dependent autoactivation of pro-rhodesain, the main lysosomal cysteine protease from African trypanosomes

Rhodesain is the lysosomal cathepsin L-like cysteine protease of Trypanosoma brucei rhodesiense, the causative agent of Human African Trypanosomiasis. The enzyme is essential for the proliferation and pathogenicity of the parasite as well as its ability to overcome the blood–brain barrier of the host. Lysosomal cathepsins are expressed as zymogens with an inactivating prodomain that is cleaved under acidic conditions. A structure of the uncleaved maturation intermediate from a trypanosomal cathepsin L-like protease is currently not available. We thus established the heterologous expression of T. brucei rhodesiense pro-rhodesain in Escherichia coli and determined its crystal structure. The trypanosomal prodomain differs from nonparasitic pro-cathepsins by a unique, extended α-helix that blocks the active site and whose side-chain interactions resemble those of the antiprotozoal inhibitor K11777. Interdomain dynamics between pro- and core protease domain as observed by photoinduced electron transfer fluorescence correlation spectroscopy increase at low pH, where pro-rhodesain also undergoes autocleavage. Using the crystal structure, molecular dynamics simulations, and mutagenesis, we identify a conserved interdomain salt bridge that prevents premature intramolecular cleavage at higher pH values and may thus present a control switch for the observed pH sensitivity of proenzyme cleavage in (trypanosomal) CathL-like proteases.

Rhodesain is the lysosomal cathepsin L-like cysteine protease of Trypanosoma brucei rhodesiense, the causative agent of Human African Trypanosomiasis. The enzyme is essential for the proliferation and pathogenicity of the parasite as well as its ability to overcome the blood-brain barrier of the host. Lysosomal cathepsins are expressed as zymogens with an inactivating prodomain that is cleaved under acidic conditions. A structure of the uncleaved maturation intermediate from a trypanosomal cathepsin L-like protease is currently not available. We thus established the heterologous expression of T. brucei rhodesiense pro-rhodesain in Escherichia coli and determined its crystal structure. The trypanosomal prodomain differs from nonparasitic pro-cathepsins by a unique, extended α-helix that blocks the active site and whose side-chain interactions resemble those of the antiprotozoal inhibitor K11777. Interdomain dynamics between pro-and core protease domain as observed by photoinduced electron transfer fluorescence correlation spectroscopy increase at low pH, where prorhodesain also undergoes autocleavage. Using the crystal structure, molecular dynamics simulations, and mutagenesis, we identify a conserved interdomain salt bridge that prevents premature intramolecular cleavage at higher pH values and may thus present a control switch for the observed pH sensitivity of proenzyme cleavage in (trypanosomal) CathL-like proteases.
Human African Trypanosomiasis (HAT) or African Sleeping Sickness is a so-called neglected tropical disease (NTD) as classified by the World Health Organization (www. who.int). HAT is caused by two subspecies of African trypanosomes, Trypanosoma brucei gambiense and T. brucei rhodesiense (1). These unicellular, protozoan parasites are transmitted to the host via the bite of a Tse-Tse fly. The disease progresses in two stages. In the early hemolymphatic stage, the parasites are present in the blood and in the lymphatic systems leading to unspecific symptoms such as fever and headache. In the later meningoencephalitic stage, the parasites cross the blood-brain barrier (BBB) and enter the central nervous system of the host. At this point, patients suffer from severe neurological symptoms, deregulation of sleep-wake cycles, coma, and ultimately death (2). The few available treatment options often have severe side effects (3).
T. brucei spp. parasites express two main lysosomal cysteine proteases that belong to the papain family, the cathepsin B-like T. brucei cathepsin B (TbCathB) and the cathepsin L-like rhodesain (also called TbCathL, brucipain, or trypanopain). Rhodesain plays an important role in the progression into the second disease stage as it is involved in the crossing of parasites into the central nervous system via the BBB of the host (4,5). The expression of TbCathB mRNA varies strongly during the life cycle of the parasite, while rhodesain is constitutively expressed (6). In agreement with an important role in parasite survival, proliferation, and pathogenicity, RNAi-based rhodesain knockdown or inhibition of rhodesain but not TbCathB led to diminished parasitic growth and an increased sensibility to lytic factors in human serum (7,8). Rhodesain is thus generally assumed to be the main lysosomal cysteine protease in trypanosomes and presents a promising antitrypanosomal drug target. In general, proteases are attractive drug targets because the affinities of inhibitors toward active site residues can be tuned and potential inhibitors can easily be identified in high-throughput in vitro assay screenings (9). Nevertheless, structural similarities to host proteins can lead to problems with off-target effects. Rhodesain shares 46% (60%) sequence identity (similarity) with human cathepsin L (hsCathL). The irreversible, vinylsulfone-based rhodesain inhibitor K11777 (N-methylpiperazine-urea-Phe-homophenylalanine-vinylsulfone-benzene), was shown to prevent T. brucei from overcoming a barrier formed by brain microvascular endothelial cells in an in vitro BBB model (4,5). Encouragingly, while K11777 was indeed observed to not be biochemically selective, it was nontoxic to mammals, efficient against related protozoans, and could cure mice from a Trypanosoma cruzi infection (10)(11)(12).
In the cell, cathepsins are expressed as inactive zymogens where the 215 amino acid papain-fold core protease domain is preceded by a 20 amino acid signal peptide required for translation of the protein into the endoplasmatic reticulum, and a 100 amino acid prodomain required for correct folding and lysosomal targeting of the protease (13)(14)(15)(16)(17). The prodomain interacts with the active site of the protease and keeps it in an autoinhibited state until it has been successfully trafficked to the acidic lysosome lumen. Here, the prodomain is removed by proteolytic cleavage, which may occur intra-or intermolecularly (18)(19)(20)(21). The cleavage site is located in the linker between the prodomain and the core protease domain (22,23). While the rhodesain core protease domain has been crystallized in complex with different inhibitors including K11777 (PDB: 2P7U) (23)(24)(25)(26), a structure of the zymogen as an important intermediate of the parasitic protease maturation process is currently not available and the molecular details of the pH-dependent cleavage process remain unclear.
Here, we present the 2.8 Å crystal structure of the active site C150A mutant of the uncleaved T. brucei rhodesiense cysteine protease pro-rhodesain under acidic conditions. The rhodesain prodomain shares many structural features with other CathL prodomains, but is unique in the presence of an extended αhelix in its C-terminal end. Using photoinduced electron transfer-fluorescence correlation spectroscopy (PET-FCS) (27), we investigated the interdomain dynamics of the proenzyme. At low pH values, where the protease undergoes autocleavage, the "blocking peptide" region of the prodomain covering the protease active site shows increased dynamics. We further found evidence that pro-rhodesain can be processed both intra-and intermolecularly in a pH-dependent manner and identified a highly conserved interdomain salt bridge that can prevent premature intramolecular cleavage at high pH and may thus act as a "delay switch" for pro-cathepsin autocleavage.

Heterologous expression and purification of pro-rhodesain
Cathepsin-like cysteine proteases are expressed as an inactive proform. pH-dependent cleavage of the prodomain from the globular core protease domain is an important step in the protease maturation process contingent on the correct localization of the protein to the lysosome. For a better understanding of pro-rhodesain inhibition and activation, we determined the crystal structure of T. brucei rhodesiense prorhodesain. While rhodesain expression in P. pastoris relies on the secretion of the folded protein (22,25), all currently available expression and purification protocols for rhodesain from Escherichia coli are based on its expression in inclusion bodies and subsequent protein refolding (28). To obtain the folded proenzyme for structural studies, we found both production routes to be ineffective. We thus optimized the purification of mature and pro-rhodesain from E. coli in a manner that circumvents secretion and refolding (see Figs. S1-S4 and Extended Materials and Methods for details).
Purified mature rhodesain proteins from P. pastoris and E. coli have similar structural and functional properties as elucidated by size-exclusion chromatography (SEC), circular dichroism (CD) spectroscopy (Fig. 1, A and B), and a fluorescent peptide-cleavage assay using the fluorescent peptide Z-Phe-Arg-AMC (Z-phenylalanine-arginine-7-amido-4methylcoumarin) to probe rhodesain activity (see Extended Materials and Methods) (9). E. coli can thus be used as a reliable source for the production of active rhodesain for optimal flexibility in biophysical studies. To prevent autocleavage and to stabilize the proenzyme, we introduced a mutation into the active site of the protein, C150A. Prorhodesain C150A displays the expected increase in molecular weight during SEC compared with the mature protease as well as an additional small increase in α-helical content in CD spectroscopy measurements (Fig. 1, B and C). Comparison of heterologously expressed rhodesain purified from P. pastoris or E. coli. A, size-exclusion chromatography of mature rhodesain from P. pastoris (dashed-lines, dark gray) or E. coli (orange) and pro-rhodesain C150A from E. coli (light gray). B, circular dichroism spectra of mature rhodesain (P. pastoris, dark gray; E. coli, orange) as well as pro-rhodesain C150A (light gray). C, secondary structure analysis of CD spectra shown in (B) using the BeStSel server (http://bestsel.elte.hu/index.php) (68).

T. b. rhodesiense pro-rhodesain crystal structure
We obtained a 2.8 Å crystal structure for T. b. rhodesiense pro-rhodesain at acidic pH (Fig. 2, Table 1, Fig. S5). The proprotease structure contains two parts, the N-terminal prodomain (residues E38 to T123 according to uniprot numbering, ID.: Q95 PM0) and the C-terminal core protease domain (residues A126 to P342). The first 18 amino acids belonging to the prodomain, a short loop between the first two helices (residues 77/78) and residues G124/R125 at the Cterminus of the prodomain, are not resolved. High disorder, in agreement with the crystallographic B factors in the C-terminal prodomain region (Fig. 2, C and D), coincides with the expected intermolecular cleavage site of pro-rhodesain at position T123 or R125 (22,23). The entire C-terminus of the core protease up to residue P342 as well as the remaining residues of a TEV cleavage site (ENLYFQ), which was used to remove the C-terminal purification tag, could be reliably placed in the electron density. The remainder of the TEV cleavage site mediates crystal contacts to the neighboring protein's prodomain and may have thus aided crystallization of the construct but does not interfere with the interaction of the prodomain with its corresponding catalytic domain.
The catalytic core domain follows the typical two-lobed fold of the papain family of cysteine proteases with an α-helical Ldomain and a β-sheet containing R-domain. The cleft between these domains harbors the active site triad residues C150 (mutated to alanine in our construct), H287 and N307 (Fig. 2B).
In the same position as in papain (PDBs: 9PAP, 3TNX) (29,30), pro-rhodesain contains three intramolecular disulfide bonds between residues C147-C188 and C181-C226 in the L-domain as well as between C280-C328 in the R-domain (Fig. S5). As  Pro-rhodesain structure and dynamics apparent from the electron density, these residues all form disulfide bridges. The backbone RMSD between our structure of the pro-rhodesain core catalytic domain (PDB: 7AVM, residues 125-342) and the previously determined structure of mature rhodesain (PDB: 2P7U) (25) is 0.39 Å, thus the presence of the prodomain does not affect the structure of the catalytic domain, including the relative side-chain orientations of the residues of the active site catalytic triad (Fig. 2B).
The prodomain constitutes about one-third of the protein in immature cathepsin proteases and consists of three helices, H1-H3 (Fig. 2). In pro-rhodesain, the N-terminal helix 1 (H1, residues E38 pro -K47 pro ) is shorter compared with H1 from other crystallized CathL zymogens. The lack of resolution for the first 18 amino acids indicates that this region may be highly flexible and potentially not structured in pro-rhodesain. H1 is connected to the orthogonal helix 2 (H2, residues A55 pro -A76 pro ) via a long loop. H2 then leads into a long third helix (H3, residues E96 pro -K113 pro ). Helix 3 covers the active site and constitutes the main difference between the prodomain of rhodesain and other CathL-like protease prodomains (see below).
CathL and CathB proteases both belong to the CA clan and the C1 family of proteases (31) and thus share a papain-like fold. Nevertheless, their prodomains differ. For instance, the conserved ER(I/V)FNIN motif is only found in CathL-like precursors within helix 2 of the prodomain and is slightly modified in the trypanosomal CathL proteases (32). Here, the ER(A/V)FNAA motif (consensus sequence Ex 3 Rx 2 (A/V)(F/W) x 2 Nx 3 Ax 3 A, with x = any amino acid) (Fig. 3, Fig. S6) is involved in the correct folding of the prodomain and mediates intradomain interactions (33,34). In most zymogens, H2 is longer than in our pro-rhodesain structure, and we do not observe density for the C-terminal end of this region (residues 77 pro -78 pro ). Together with the elevated B-factor values for the C-terminal end of H2 indicate that this region may be flexible in the prodomain of pro-rhodesain as has been stated for human CathL (35).
H2 connects to Helix 3 (H3, residues 96 pro -113 pro ) via a long linker. This linker partially forms a short antiparallel β-sheet (residues 82 pro -84 pro ) with the propeptide binding loop (PBL, residues 271-275, Fig. 3C), a loop extending from the R subdomain of the catalytic core domain. Notably, both the short β-sheet in the H2/H3 linker of the prodomain and the interacting PBL in the protease domain display very low Bvalues indicating high rigidity (Fig. 2, C and D). In cathepsin prodomains, the β-strand in the H2/H3 linker is typically followed by a GNFD motif with the GxNxFxD (x = any amino acid) consensus sequence, which is required for proper protein folding and autoactivation (36). In pro-rhodesain, this sequence corresponds to the GNFD-like motif G 85 VT 87 PF 89 SD 91 . Residues in this motif form multiple contacts within both the prodomain and to the core protease domain (Fig. 3D). Our structure may thus provide some insights why the core protease domain's PBL β-strand harbors a di-glycine motif (G274/G275) that is highly conserved among CathL proteases (Fig. S6) and why this region could be crucial for protease folding (37) as larger amino acid side chains may sterically interfere with the PBL interactions (Fig. 3, C and D).
An interaction network involving mostly hydrophobic and aromatic residues positions the remainder of the prodomain H2/H3 linker whose P88 pro and F89 pro side chains dip into a hydrophobic basket formed by the side chains of F269, M270, W309, and W313 in the R-domain of the core protease (Fig. 3D). In addition, the backbone carbonyl of M270 forms a hydrogen bond with the side chain of T87 pro while its side chain forms a hydrophobic interaction with F97 pro in the first turn of H3, the long α-helix covering the entire length of the cleft between the L-and R-domains of the catalytic domain. In all available structures of other cathepsin proenzymes, this region is mainly unstructured (Fig. 4A). This α-helix therefore presents the most notable structural difference between prorhodesain and closely related nontrypanosomal proteases.
Helix H3 in pro-rhodesain differs from that of other nonparasitic pro-cathepsins The residues in H3 facing the catalytic domain are involved in numerous, mostly hydrophobic interactions with the catalytic domain, thereby efficiently blocking the rhodesain active site (Fig. 4A).
In agreement with structures from multiple CathL proenzymes, the pro-rhodesain H3 helix starts out as a regular αhelix. However, and so far unique to pro-rhodesain, a bulge forms around residues N103 pro and G104 pro , whose backbone amides are not involved in hydrogen bonds. The hydrogen bond acceptor group for the amide group of G104 pro in the context of a regular α-helix would be the oxygen of the backbone carbonyl of R100 Pro . This group, however, is 5.1 Å away from the G104 pro amide group, thus precluding formation of a hydrogen bond and instead locally distorting the αhelix in proximity to the active site of the core protease ( Fig. 4B). Importantly, this bending of the α-helix enables the guanidino group of R100 pro to interact with the side chain of Q144 in the core protease domain. Notably, Q144 is conserved across all CathL proteases while R100 pro is conserved in the majority of homologous trypansomal CathL proproteases (Fig. S6). Below the bulge, H3 continues as a regular α-helix up to residue K113 pro from where the prodomain leads into the Rdomain of the catalytic core domain via a C-terminal 15 amino acid unstructured linker (Figs. 2 and 4B). The electron density in our X-ray structure allows tracing the backbone of the entire H3 α-helix unambiguously for all residues in H3 (E95 pro -K113 pro ) (Fig. S5B). The B-factor values for the two N-terminal helical turns of H3 are among the highest in the prodomain while in the entire remainder of this helix following the bulge, the B-factors are among the lowest (Fig. 2, C and D). The sequence of the N-terminal region of H3 preceding the bulge, which is α-helical across all available CathL prodomain structures, is generally well conserved across CathL enzymes (Fig. 4A, Fig. S6). In contrast to the continuous α-helix observed for H3 in pro-rhodesain, the structures of all other available, nontrypanosomal pro-CathL proproteases display a mostly unstructured peptide in the C-terminal end of H3 (Fig. 4A). Nonetheless, despite the differences in secondary structure within this blocking peptide, its overall position relative to the cleft of the core protease domain is highly Pro-rhodesain structure and dynamics similar in all pro-CathL enzymes. However, this region shows no sequence similarity between T. brucei rhodesain and nonparasitic pro-CathL enzymes, but extremely high sequence identity across trypanosomal pro-cathepsins (Fig. S6).
The observation that the region of the prodomain blocking the protease active site is α-helical in pro-rhodesain, but not in any of the other pro-Cath(L) structures determined to date (Fig. 4A), prompted us to test whether the intrinsic propensity of this region to form an α-helix is higher in pro-rhodesain compared with related proteases. We used the isolated blocking peptides from T. b. rhodesiense rhodesain (residues R102 pro -K113 pro ) and human CathL (residues N93 pro -R101 pro ) to record CD spectra at pH 8 and pH 4 and with increasing amounts of trifluoroethanol (TFE) (Fig. 4, C and D, Fig. S7). TFE induces secondary structure formation by promoting hydrogen bond formation (38,39). Both peptides are mainly unstructured in the absence of TFE. For the peptide from human CathL, even high TFE concentrations (up to 90% v/v) do not induce significant α-helical content (Fig. S7). This is in agreement with the highly disordered protein region observed in the human pro-CathL crystal structure (PDB: 1CJL) (37). In contrast, the isolated rhodesain prodomain peptide becomes α-helical already at low TFE concentrations ( Fig. 4, C and D). The variation in pH had only a small effect on secondary structure propensity of either blocking peptide. Likewise, the CD spectra of the purified pro-rhodesain C150A construct are also virtually identical at high and low pH (Fig. S8).

Propeptide binding site interactions resemble the binding mode of the K11777 inhibitor
Proteases recognize their substrates by engaging with the substrate amino acid side chains in specific pockets. These pockets are numbered outward from the cleavage site toward the substrate's N-(binding sites S1-Sn) and C-terminus (prime sites S1'-Sn') (40). In pro-cathepsins, the N-to Cterminal orientation of the prodomain blocking peptide is inverted with respect to the interaction of a putative peptide substrate. Additionally, the peptide backbone of the blocking peptide hovers "above" the active site cleft typically occupied by a substrate (compare Fig. 5, A, C and E). In the prorhodesain structure, the S2' and S1 sites are partially occupied by R100 pro and S1' by the side chain of Y101 pro while a glycerol molecule sits in the center of the substrate-binding site (Fig. 5, A and B). The side chains of F108 pro and Y107 pro occupy the S2 and S3 pockets, thus overlapping with Pro-papain residues are displayed in parentheses. B, conserved aromatic π-stacking between residues from H1 and H2 of the propeptide. Such interactions are also observed in e.g., pro-papain (PDB: 3TNX) (30) and at least three of the four aromatic residues involved in this interaction are highly conserved across the prodomains of other cathepsin zymogens, including human pro-CathL (37) (Fig. S6). C, the propeptide binding loop (PBL) forms an antiparallel β-sheet with a loop in the R subdomain of the core protease. Additionally, the hydroxyl group of the Y272 sidechain forms an H-bond with the backbone carbonyl group of G275 while its aromatic ring stacks with the peptide bond between N273 and G274, thus providing further stabilizing interactions. D, interactions of the GNFD-like motif. D91 pro interacts via an H-bridge with Y52 pro of the first prodomain loop, while P88 pro and F89 pro are buried in a hydrophobic pocket formed by F269, W309, and W313 in the catalytic domain.
Pro-rhodesain structure and dynamics the binding sites of the phenyl-and N-methylpiperazine groups of the K11777 inhibitor observed in the crystal structure of rhodesain (PDB: 2P7U) (Fig. 5, C and D) (25). We also looked at the position of the peptidic substrate Z-Phe-Arg-AMC used for activity assays by molecular docking and observed a similar occupation of the rhodesain substratebinding pockets (Fig. 5, E and F).

pH-dependent autocleavage of pro-rhodesain can occur intraand intermolecularly
For the physiological activation of rhodesain and other CathL-like proteases, the prodomain has to undergo cleavage at low pH, and it has been suggested that this process can occur either inter-or intramolecularly (or both) (21,41,42). In our hands, folded, inactive pro-rhodesain C150A purified from E. coli is not cleaved, ruling out a role for other E. coli proteases in the cleavage process of heterologously expressed prorhodesain. Lowering the pH for purified wildtype prorhodesain leads to cleavage indicating that intramolecular autocleavage plays at least some role in pro-rhodesain processing (Fig. S4). However, this does not exclude the possibility that cleavage also occurs intermolecularly. Intermolecular protease cleavage can be easily probed by mixing catalytically inactivated enzymes with a wildtype protease (43). Incubating an excess of purified, catalytically inactive pro-rhodesain C150A with catalytically active, mature, wildtype rhodesain led to a continuous decrease of the high molecular band corresponding to pro-rhodesain in a gel-shift assay (Fig. 6, A and  B). Importantly, this process is pH-dependent and lower pH values result in more efficient autocleavage.
More efficient pro-rhodesain C150A cleavage at lower pH values could be due to a higher intrinsic activity of the mature wt rhodesain and/or changes in the structural dynamics of the proenzyme and consequently a higher accessibility of its cleavage site. To probe the first possibility, we tested the residual activity of mature rhodesain at different pH values by incubating it with the fluorogenic peptide-based substrate (Fig. 6C). While the activity of rhodesain slowly decreases at pH 8 over the course of multiple hours, this drop in activity is not sufficient to explain the near-absence of prodomain cleavage within the same timeframe and the differences in prorhodesain C150A cleavage at pH 6.5, pH 5.5, and pH 4 (Fig. 6). We therefore investigated pro-rhodesain dynamics at high and low pH values, in particular the relative motion of the prodomain to the core protease domain.

Interdomain dynamics of pro-rhodesain probed by PET-FCS and MD simulations
To investigate the dynamics of the prodomain in relation to the core protease domain, we used PET-FCS (27). Here, the dynamics of a protein can be probed via contact-induced fluorescence quenching, i.e., a PET reaction of an extrinsic fluorophore attached to a cysteine side chain with the side chain of the natural amino acid tryptophan (44). Fluorophore and indole group form a van der Waals contact at distances below 1 nm, which can be disrupted or formed by conformational changes of a protein. Such distance changes are thus translated into fluorescence fluctuations, and the combination of the PET fluorescence reporters with fluorescence correlation spectroscopy facilitates the single-molecule detection of fast protein dynamics with nanosecond time resolution (27,45,46). Pro-rhodesain contains eight native cysteine residues: the three cysteine pairs forming disulfide bonds in the core protease domain (Fig. S5), the active site residue C150, and a cysteine in the prodomain (C22). To avoid unspecific labeling of the enzyme with a fluorophore, we mutated both the active site cysteine (C150A) to stabilize the proenzyme as well as C22 pro (C22S pro ).
For the pro-rhodesain C22S pro , C150A double mutant, protein modified with an AttoOxa11-dye was hardly detected, showing that the remaining six cysteine residues were protected against fluorophore labeling due to their involvement in disulfide bridges. Subsequently, two new cysteine residues were introduced in the C22S pro , C150A background of pro-rhodesain, either at the C-terminal end of pro-domain helix 1 (V51C pro ) or right above the bulge in helix 3 in the vicinity of the active site (A99C pro ). For PET fluorescence measurements, a tryptophan mutation at position Q146 in the L domain of the core protease domain was introduced in the vicinity of V51C pro and A99C pro (Fig. 7). Neither the cysteine mutations nor the additional tryptophan mutation perturbed the secondary structure of pro-rhodesain and all constructs could be successfully labeled with an AttoOxa11-dye as assessed by steady-state fluorescence spectroscopy (Figs. 7 and  8, A and B). As expected, due to the large distance from the fluorophore as apparent from our structure, the eight native tryptophan residues in pro-rhodesain did not lead to fluorescence quenching of dyes attached to residues V51C pro or A99C pro , while introduction of Q146W did. The AttoOxa11labeled V51C pro /Q146W and A99C pro /Q146W pairs can Pro-rhodesain structure and dynamics thus be used as sensitive fluorescence reporters for interdomain dynamics between the prodomain and the protease domain of pro-rhodesain.
No changes in fluorescence intensity upon reducing the pH are observed in our fluorescence spectra of pro-rhodesain (Fig. 8, A and B). To probe whether there are nonetheless dynamic ramifications for pro-rhodesain in response to pH changes, we additionally carried out PET-FCS measurements with the dye/Trp pair containing proteins at pH 8 and pH 5.5 (Fig. 8, C-F). Here, the drop in pH led to a small increase of the diffusion time constant τ D of all tested mutants indicative of a larger hydrodynamic radius under acidic conditions. Since this behavior is observed for all proteins, it might be interpreted as a global structural rearrangement of pro-rhodesain or "loosening" of the overall structure at low pH.
Interestingly, the pro-rhodesain construct with the A99C pro / Q146W dye/quencher pair also showed a pH-dependent change in sub-ms fluorescence fluctuations (Fig. 8, E and F). Fluorophores attached to residue A99C pro , within the "bulge" of H3 covering the protease active site, yield a monoexponential autocorrelation curve with a 1 μs time constant at pH 8. At pH 5.5, the same construct shows a biexponential trace with time constants of 0.21 and 2.1 μs (Fig. 8, E and F). The amplitudes are the same in both cases, showing that the total fluorescence quenching is similar at high and low pH, but that the local dynamics of the system are more heterogeneous under acidic conditions. In contrast, the pro-rhodesain construct harboring the V51C pro /Q146W dye/quencher pair, with V51C pro at the C-  terminal end of H1 in the prodomain shows only a modest response to the change in pH (Fig. 8, C and D).
To shed additional light on the pH-dependent autocleavage process, we carried out molecular dynamic (MD) simulations with the proenzyme at pH 4 and 8, respectively. While large conformational changes are beyond the timescale of the 20 ns MD simulations, "loosening" of the prodomain may be indicated by an increased flexibility and the reduction of interactions between residues sensitive to pH changes (35). All simulations showed a high stability within the given timeframe (Fig. S9A). The root mean square fluctuation (Fig. S9B) and deviation (RMSD, Fig. S9C) analysis indicated flexibility in the same protein regions as found by B-factor analysis (Fig. 2D). The evaluation of the interaction energies between the prodomain and main domain indicates significantly reduced interactions between the two domains at pH 4 compared with Figure 8. PET-FCS on pro-rhodesain. A and B, steady-state fluorescence spectra of pro-rhodesain constructs, labeled at position A99C pro or V51C pro using AttoOxa11, show efficient quenching of the label by the Trp residue Q146W in the core protease domain while changes in pH have a negligible effect on the spectra. C-F, FCS autocorrelation functions (G(τ)) measured from pro-rhodesain constructs at 37 C under solution conditions specified in panels. Data shown in panels C-E were fit using a model for molecular diffusion containing a single exponential relaxation (orange line). Data shown in panel F were fit using a model for molecular diffusion containing two single-exponential relaxations (orange line). Amplitudes (A n ) and time constants (τ n ) extracted from fits are shown in panels.
Pro-rhodesain structure and dynamics pH 8 while higher interaction energies were found within the main domain at pH 4 (Table S1).
Since pro-rhodesain is cleaved efficiently at low pH and our PET-FCS data indicate that the dynamics of H3 increase under acidic conditions, we wondered whether protonatable groups mediating interactions between H3 and the core protease play a role in the cleavage process. We thus performed a detailed analysis of the H-bond interaction profile for the MD simulations with a focus on titratable groups (Tables S2-S7). This revealed some changes in the interaction patterns of specific amino acids for different protonation states. A salt bridge connecting the C-terminal end of H3 (R116 Pro ) with the core protease (D194 and D242) was deemed especially interesting. For MD simulations at pH 4, D194 was protonated and therefore able to interact with D242 via a hydrogen bond (64-74% of the simulation time). When D194 was deprotonated during simulations at pH 8, this interaction was no longer possible and D194 and D242 were instead able to interact with the side chains of R114 Pro and R116 Pro via ionic interactions ( Table 2). This provides a hint for putative stabilizing interactions between pro-and main domain at higher pH values that can be replaced by intradomain interactions at acidic pH.
Incidentally, the residues involved in this putative interdomain "lock" at pH 8 are highly conserved across trypanosomal pro-CathL proteins (Fig. 9A, Fig. S6). Despite the lack of overall sequence similarity, this interaction is even replicated in nontrypanosomal CathL proteins (Fig. 9A). In C. papaya pro-caricain, the positions of the corresponding basic and acidic residues are interchanged further underscoring the fact that these amino acids presumably play an important role and may coevolve.
To probe the effect of this putative interaction on prorhodesain autocleavage, we mutated the two aspartic acids into nonprotonatable asparagine residues in the C150A background yielding pro-rhodesain C150A, D194N, D242N (C150A DD/NN). Adding mature wt rhodesain to this mutant, we probed its digestion via the same gel-shift assay used previously to establish intermolecular cleavage (Fig. 9, B and C). As observed previously for the construct harboring only the C150A mutation, the inactive pro-rhodesain C150A DD/ NN mutant was efficiently cleaved at low pH values. The interdomain interaction at this site therefore does not significantly affect the intermolecular cleavage process. To probe whether it may instead play a role in the intramolecular cleavage, both wt pro-rhodesain and pro-rhodesain DD/NN (with an intact active site cysteine) were expressed in E. coli and obtained in their proforms by purifying them at pH > 7 and in the presence of the protease inhibitor phenylmethylsulfonyl fluoride (PMSF) to avoid autocleavage (47). After purification, the PMSF inhibition was reversed by the addition of DTT. Upon adjusting the pH to either pH 8 or pH 4, the enzymes' autoactivation was monitored using the fluorescence-based peptide cleavage assay. As expected, at pH 4 the overall activity is higher for both enzymes. While the DD/NN mutant exhibits a slightly reduced maximum activity compared with the wt protein, both enzymes show comparable activation kinetics and reach their activation plateau at the same time (Fig. 9D). At pH 8, the activity for both proteins is significantly lower. However, under these conditions the autoactivation of the DD/NN mutant is accelerated considerably compared with the wild-type enzyme (Fig. 9E).

Discussion
For survival, propagation, and overcoming the BBB in the host, African trypanosomes rely on proteases such as rhodesain. Similar to related cathepsin proteases, rhodesain is expressed as a zymogen with a prodomain that maintains its inactive state until the protein is correctly delivered into the acidic environment of the lysosome. Here, we present the first trypanosomal pro-CathL structure, T. brucei rhodesiense pro-rhodesain, as an important intermediate of the protease maturation process. While the overall architecture of pro-rhodesain agrees with related proenzymes including human cathepsin L or C. papaya papain (37,48), pro-rhodesain displays a unique extended αhelix covering the entire cleft between the L-and R-lobes of the core protease domain (Fig. 2). A pronounced bulge in the region covering the protease active site disrupts the extended α-helix in H3 of the prodomain. In the structures of currently available homologous, nontrypanosomal proproteases, H3 becomes unstructured at this position (Fig. 4A). It thus seems possible that deviations from a regular α-helix are a unifying structural characteristic of pro-CathL proteins. As a word of caution, it should be noted that all available structures of pro-CathL proteases including pro-rhodesain were obtained by mutating the active site cysteine. This removes a thiolate anion from the active site and may thus locally distort the structure of the prodomain. In addition, we observe a glycerol molecule in the active site that could lead to deviations in the side chain orientations of residues within prodomain peptide interacting with the active site.
In the α-helical region following the bulge in H3, trypanosomal CathL proteases display very high sequence identity to each other but not to CathL enzymes from other organisms. The extended H3 helix may thus be a defining feature of trypanosomal CathL proteases and as such present a structural template to further improve the affinity or selectivity of rhodesain inhibitors. We found that the side chains of the rhodesain prodomain blocking peptide partially resemble binding of K11777, a promising drug candidate against protozoan diseases (49), by occupying the S2 and S3 substrate-binding pockets with its aromatic and piperazine groups. In hindsight, this nicely rationalizes the original screening-based  NN (D194N, D242N) was probed by incubating the purified protein with a substoichiometric amount of mature rhodesain (100:1 mol:mol) at pH 4.0, 5.5, 6.5, or 8.0 at 37 C. Samples were taken at indicated time points and run on 15% SDS-PAGE. Pro-rhodesain C150A DD/NN can be digested intermolecularly by mature rhodesain. C, intensities of the pro-rhodesain band at 37 kDa were determined with ImageJ and compared with timepoint t 0 . D and E, purified wt pro-rhodesain and the pro-rhodesain DD/NN variant, both harboring the active site cysteine to enable autoactivation, were incubated at pH 4.0 (D) and pH 8.0 (E) at 37 C. Inhibition with PMSF was removed by adding DTT and waiting for the specified time before monitoring the protease catalytic activity via the cleavage of a fluorogenic peptide as a proxy for protease autoactivation. Data were normalized to the maximum activity of the wt enzyme at pH 4.
Pro-rhodesain structure and dynamics design of the K11777 (50) and may provide an opportunity for further inhibitor optimizations.
In agreement with the literature (51), our data show that pro-rhodesain can be activated by pH-dependent intra-and intermolecular cleavage processes. Our PET-FCS measurements may provide a molecular explanation for the pH dependency of pro-rhodesain autoactivation due to changes in the interaction dynamics between the H3 helix in the prodomain and the core protease domain (Fig. 8). Low pH "loosens" the pro-rhodesain structure as indicated by the increase in the hydrodynamic radius and the observed reduced diffusion times compared with the proenzyme at pH 8. The acid-induced heterogeneity in conformational dynamics may be interpreted as local folding/unfolding events of H3 at low pH values. The high crystallographic B factors also indicate higher flexibility in this region compared with the remainder of the prodomain and the core protease domain.
Our MD simulations also suggest that the interactions between pro-and core protease domain are significantly reduced at lower pH values. Interestingly, a PET-FCS reporter pair informing on the relative motions between H1 from the propeptide and the core protease domain did not exhibit changes in dynamics under all conditions tested, thus suggesting that the degree of pHdependent structural dynamics differs within the prodomain and that H3 may be particularly sensitive to pH changes.
Combining MD simulations and site-directed mutagenesis, we narrowed the pH-sensitive changes in H3 down to a set of protonatable groups, D194 and D242 in the protease domain. These residues can mediate ionic interactions with basic residues in the prodomain, R114 pro and R116 pro , and thus form a pH-dependent "interdomain lock." The participating residues of this salt bridge connecting the C-terminal end of H3 to the core protease domain are highly conserved across pro-cathepsins. In wild-type pro-rhodesain, where the lock is intact, autocleavage occurs efficiently at pH values of 5.5 or lower (Fig. 6B) when the aspartic acids become protonated. The typical pK a of free aspartic acid is 4 although this value can be significantly lower in the context of a salt bridge in the interior of a protein (52). In pro-rhodesain, the salt bridge is solvent exposed, a feature that may be necessary to fine-tune pH-dependent autoactivation of pro-rhodesain in a physiological pH range. An elevated pK a of one of the two aspartic acid residues involved in the intradomain interaction is likely because they form a "handshake" interaction ( Fig. 9) (53). Mutation of the two conserved aspartic residues breaks the lock and thus enables autoactivation of pro-rhodesain even at near neutral pH (Fig. 9E). This indicates that these residues act as a pH-dependent protective lock, presumably to prevent premature protease activation at higher pH values, i.e., before the protein reaches the acidic lumen of the lysosome.
In agreement with models put forward for the related human CathL protease (21) as well as more recently for rhodesain itself (51), activiation of pro-rhodesain could follow a two-step mechanism. An initial, slow intramolecular cleavage step is delayed by the protective salt bridge between the prodomain and the core protease as well as strong interdomain interactions. This reduces interdomain dynamics and prevents premature autoactivation at higher pH values. As the proprotease progresses along the cellular trafficking routes to the lysosome and experiences continuously more acidic environments, the aspartic residues within the core protease domain are protonated, which leads to the opening of the "interdomain lock" by replacement of the interdomain ionic interaction with an intradomain D194-D242 hydrogen bond, followed by an increase in interdomain dynamics and resulting in autoactivation of the protease. Once liberated from the inhibitory prodomain, the mature protease is now available to also activate other pro-rhodesain protomers in an intermolecular fashion, thus speeding up the maturation process of the protease population significantly. The interdomain salt bridge and dynamics centered on the unique helix 3 of the trypanosomal prodomain are thus decisive elements to prevent premature activation and to correctly time pro-rhodesain cleavage under the right (i.e., low pH) physiological conditions. Due to the high structural similarities among pro-CathL enzymes and the additional high sequence similarity of H3 among trypansomatid proteases, our structure in combination with the presented delayed activation model may potentially act as a blueprint for these important enzymes in trypanosomatids and beyond.

Experimental procedures
Rhodesain expression and purification from P. pastoris and E. coli The expression of rhodesain from P. pastoris was performed as described previously (9). For purification from E. coli Rosetta2 DE3 pLysS (Novagen), the T. b. rhodesiense rhodesain sequence (uniprot-ID.: Q95 PM0) followed by a TEV cleavage site and a hexahistidine-tag was cloned between NdeI and BamHI restriction sites of a pET-11a vector by Genscript using codon optimization (Fig. S3). The plasmid was amplified in XL-10 cells. A GFP sequence was additionally introduced between the TEV site and the 6xHis-tag by Gibson assembly following extensive construct and purification optimization (see Extended Materials and Methods for details). The rhodesain plasmid and the GFP gene were amplified in a PCR using the primers 5 0 -caccaccatcaccaccactaagg-3 0 and 5 0 -gctaccttgaaaatacagattctccggg-3 0 for the vector and 5 0 -ctgtattttcaaggtagcgtgagcaagggcgaggagc-3 0 as well as 5 0 -ggtggtgatggtggtgcttgtacagctcgtccatgccg-3 0 for the insert.
Purification of pro-rhodesain C150A from E. coli The active site C150A mutation was introduced by site direct mutagenesis using the primers 5 0 -GCGGTAGCGCGTGG GCGTTCAGCACC-3 0 and 5 0 -CGCCCACGCGCTACCG-CATTGGCCCTG-3 0 . Transformation, protein expression, cell lysis, purification by Ni-NTA, and TEV-cleavage were performed as described in the Extended Materials and Methods. In short, after digestion of IMAC-purified pro-rhodesain with TEV protease, the cleavage products were loaded on 8 ml Ni-NTA gravity flow column preincubated with 50 mM Tris buffer (pH 8.5, 100 mM NaCl, 10 mM imidazole) to remove the TEV protease and GFP. Pro-rhodesain was eluted with 50 mM Tris buffer (pH 8.5, 100 mM NaCl, 10 mM imidazole) and combined with the flow through. After concentration by spin filtration, pro-rhodesain was loaded onto a HiLoad 16/600 Superdex 75 pg column (GE Healthcare) and purified by SEC (0.5 ml/min, 20 mM Tris, pH 8.0, 200 mM NaCl). The protein in the peak fraction was dialyzed against H 2 O at 4 C, lyophilized, and stored at −80 C until further use. To obtain wild-type pro-rhodesain, the purification protocol was adapted (see Supporting information for details).

Circular dichroism spectroscopy
CD spectra were measured using a Jasco-815 CD-spectrometer between 190 and 260 nm wavelength with 5 nm bandwidth, an interval of 1 nm, and a scanning speed of 50 nm/min. Measurements of mature rhodesain from P. pastoris and E. coli (5 μM in 3 mM sodium phosphate buffer, pH 6.8, 0.5 mM TCEP, 5 mM NaCl) and inactive prorhodesain C150A (5 μM in 10 mM Tris, pH 8.0, 3 mM NaCl) were performed in a 1 mm quartz curvet at 25 C. Spectra were derived from the average of three measurements after subtraction of the baseline.
The obtained ellipticities θ were converted to the mean residue ellipticities ½θ mrw;λ by the equation where c equals the concentration of the protein, d the pathlength through the sample, and MRW for the mean residue weight, which is calculated by M corresponds to the molecular weight and N to the number of amino acids in the protein.

Size-exclusion chromatography
SEC runs were performed with a HiLoad 16/600 Superdex 75 pg column (GE Healthcare) at a flow rate of 0.5 ml/min at 4 C. In total, 4.5 ml samples were loaded using a 5 ml loop and 20 mM Tris (pH 8.0, 200 mM NaCl) as the running buffer. Absorption was measured at a wavelength of 280 nm and values were normalized to the absorption maximum.

X-ray crystallography
For crystallization, 1 μl of reservoir buffer (40 mM sodium citrate pH 3.5, 30% PEG-6000 (v/v), Pb(OAc) 2 (saturated) ) was added to 1 μl of a pro-rhodesain C150A solution (0.7 mg/ml in 10 mM sodium citrate, pH 5). Crystals were grown at room temperature by the hanging drop method within 2 weeks. The crystals were preincubated in reservoir buffer with additional 10% (v/v) glycerol for 20 s before flash freezing in a 100 K N 2 cryo stream. Diffraction data were collected with a Bruker AXS Microstar-H generator and a MAR-scanner 345 mm imageplate detector in a distance of 350 mm and a X-ray wavelength of 1.5417 Å in 1 -steps after 10 min exposition. Data processing was performed by the program XDS in space group H1 (54). For structure determination, the previously determined structure of mature rhodesain in complex with a macrolactam inhibitor (PDB: 6EX8) was used as a model for molecular replacement with PhaserMR of the phenix work suites after removal of the ligand and all water molecules (24,(54)(55)(56). The prodomain was then built into the remaining electron density and final refinement was done with WinCoot (57).

MD simulations
MD simulations were performed with the crystal structure of pro-rhodesain described in this article but containing an active site cysteine residue instead of alanine. All amino acids and crystallographic water molecules were kept, while one glycerol molecule was removed from the structure. Missing residues within the structure were added and the structures were built using tleap of AmberTools17 (58,59). For MD simulations mimicking the protein at pH 8, aspartate and glutamate residues were deprotonated and the catalytic diad was defined to form an ion pair of C150 − (thiolate, CYM) and H287 + (imidazolium, HIP). Additionally, residue H231 was protonated and H240 deprotonated (HIE). Protonation states of pH 4 were determined using predicted pK a -values from MOE, resulting in protonation of all histidine residues, as well as protonated residues D143, 185, 194, 271, 289, 296 (ASH) and E38, 58, 66, 95, 144, 175, 232, 256, 283, 322, and 343 (GLH). Again, the catalytic C150 was deprotonated to form the active ion pair. The protein structures were energetically minimized with sander over 200 time steps. Counter ions (7 Cl − for pH 4 and 10 Na + for pH 8) to Pro-rhodesain structure and dynamics neutralize the system and a TIP3P waterbox exceeding the protein by 10 Å in every dimension were added. System equilibration was performed over 500 ns with gradually releasing the constraints and 500 ns without constraint while heating from 100 to 300 K in an NVT ensemble (60). Subsequent production runs were performed in an NPT ensemble with periodic boundary conditions and a van der Waals cutoff of 14.0 Å over 20 ns. All simulations were performed with the AMBER forcefield (ff14SB) on the high-performance computing (HPC) cluster MOGON of the University of Mainz with NAMD-2.12 (61,62). Simulations were performed four times for each pH and results were analyzed using VMD-1.9.3 and related scripts (63). For the analysis of hydrogen bonds a cutoff distance of 3.0 Å and angle of 20 were selected.

Z-Phe-Arg-AMC docking
The predicted binding mode of the assay substrate in complex with rhodesain ( Fig. 5) was generated by molecular docking. The receptor was prepared within LeadIT-2.3.2 (BioSolveIT GmbH, Sankt Augustin, Germany, 2017, www.biosolveit.de/LeadIT) using the crystal structure of the K11777-rhodesain complex, PDB-ID 2P7U (25,64). The covalent ligand was untethered from the enzyme using MOE (Molecular Operating Environment, 2018.01; Chemical Computing Group ULC: Montreal, QC, Canada, 2018). For docking, the receptor was protonated using the Protoss module within LeadIT and the catalytic dyad was manually set to form the C25 thiolate and H162 imidazolium ion pair (corresponding to C150 and H287 in the proenzyme) (65). Water molecule 512 was kept as part of the binding site. Ligands were energetically minimized using omega2 (OMEGA 3.1.0.3, OpenEye Scientific Software, Santa Fe, NM. http://www. eyesopen.com) and the Merck molecular force field (MMFF94) (66,67). The receptor was validated by redocking of K11777 (FlexX score-24.92 kJ/mol, RMSD of 1.7 Å). The substrate ligand Z-Phe-Arg-AMC was subsequently docked into the binding site.

PET fluorescence experiments
Steady-state fluorescence emission spectra were recorded using a Jasco FP-6500 spectrofluorometer. Sample temperature was set to 37 C using a Peltier thermocouple. A sample concentration of 100 nM modified pro-rhodesain in a 0.5 ml quartz glass cuvette (Hellma) was used. Spectra were recorded in either 50 mM phosphate, pH 8.0, with the ionic strength adjusted to 200 mM using sodium chloride, or in 50 mM acetate, pH 5.5, with the ionic strength adjusted to 200 mM using sodium chloride. Both buffers contained 5 mM EDTA; and 0.3 mg/ml bovine serum albumine and 0.05% Tween-20 to prevent sticking of rhodesain to the glass surface.
PET-FCS was performed on a custom-built confocal fluorescence microscope setup that consists of an inverse microscope body (Zeiss Axiovert 100 TV) equipped with a diode laser emitting at 637 nm as the excitation source (Coherent Cube). The laser beam is coupled into an oil-immersion objective lens (Zeiss Plan Apochromat, 63x, NA 1.4) via a dichroic beam splitter (Omega Optics 645DLRP). The average laser power was adjusted to 400 μW before entering the back aperture of the microscope using an optical density filter. The fluorescence signal was collected by the same objective, filtered by a band-pass filter (Omega Optics 675RDF50), and imaged onto the active area of two fibre-coupled avalanche photodiode detectors (APDs; PerkinElmer, SPCM-AQRH-15-FC) using a cubic nonpolarizing beam-splitter (Linos) and multimode optical fibres of 100 μm diameter. The signals of the APDs were recorded in the cross-correlation mode using a digital hardware correlator device (ALV 5000/60X0 multiple tau digital real correlator) to bypass detector dead time and afterpulsing effects. Fluorescently modified pro-rhodesain was diluted to 1 nM in the buffered solutions specified above for steady-state fluorescence spectroscopy. Samples were filtered through a 0.2 μm syringe filter before measurement, transferred onto a microscope slide, and covered by a cover slip. Sample temperature was set to 37 C using a custom-built objective heater. For each sample the accumulated measurement time of the autocorrelation function, G(τ), was 15 min. G(τ) was fitted using an analytical model for translational diffusion of a molecule that exhibits chemical relaxations (27). ACFs fitted well to a model for diffusion of a globule with either one-or two single-exponential decays: where τ is the lag time, N is the average number of molecules in the detection focus, τ D is the diffusion time constant, a n is the amplitude of the nth relaxation, and τ n is the according time constants. The application of a model for diffusion in two dimensions was of sufficient accuracy because the two horizontal dimensions (x, y) of the detection focus were much smaller than the lateral dimension (z) in the applied setup (27). Errors are s.e. from regression analyses.

Data availability
Raw data for experiments including but not restricted to e.g., CD spectroscopy, PET fluorescence experiments, X-Ray crystallography, or MD simulation trajectories are available upon request from the corresponding authors Ute A. Hellmich, Faculty of Chemistry and Earth Sciences, Institute of Organic Chemistry and Macromolecular Chemistry, Friedrich-Schiller-University, Jena, Germany: ute.hellmich@uni-jena.de or Christian Kersten, Institute of Pharmaceutical and Biomedical Sciences, Johannes Gutenberg-University, Mainz, Germany: kerstec@uni-mainz.de. The X-ray structure of T. brucei rhodesiense pro-rhodesain C150A has been deposited in the PDB under the accession number 7AVM.
Supporting information-This article contains supporting information. (22,23,68) expressing rhodesain. We gratefully acknowledge the advisory services offered and the computing time granted on the supercomputer Mogon at Johannes Gutenberg University Mainz (hpc.unimainz.de), which is a member of the AHRP (Alliance for High Performance Computing in Rhineland Palatinate, www.ahrp.info) and the Gauss Alliance e.V. We further thank Openeye Scientific for free academic software licenses.
Author contributions-P. J. cloned the constructs, established E. coli expression and purification protocols, purified the proteins, and carried out CD spectroscopy and SEC; P. J., E. J., and C. K. carried out X-ray crystallography; C. K. carried out MD simulations; P. J.  Conflict of interest-The authors declare that they have no conflicts of interest with the contents of this article.