Structure of the Mycosin-1 Protease from the Mycobacterial ESX-1 Protein Type VII Secretion System

Background: The mycosin-1 protease (MycP1) is essential for export and cleavage of the type VII-secreted virulence-associated proteins involved in pathogenesis of Mycobacterium tuberculosis and related species.
Results: The x-ray structure of MycP1, with its proposed propeptide, is described.
Conclusion: The proposed propeptide wraps around the perimeter of a subtilisin-like fold, leaving the catalytic center unobstructed.
Significance: MycP1 may operate through a novel mode of regulation.

Mycobacteria use specialized type VII (ESX) secretion systems to export proteins across their complex cell walls. Mycobacterium tuberculosis encodes five nonredundant ESX secretion systems, with ESX-1 being particularly important to disease progression. All ESX loci encode extracellular membrane-bound proteases called mycosins (MycP) that are essential to secretion and have been shown to be involved in processing of type VII-exported proteins. Here, we report the first x-ray crystallographic structure of MycP1(24–407) to 1.86 Å, defining a subtilisin-like fold with a unique N-terminal extension previously proposed to function as a propeptide for regulation of enzyme activity. The structure reveals that this N-terminal extension shows no structural similarity to previously characterized protease propeptides and instead wraps intimately around the catalytic domain where, tethered by a disulfide bond, it forms additional interactions with a unique extended loop that protrudes from the catalytic core. We also show MycP1 cleaves the ESX-1 secreted protein EspB from both M. tuberculosis and Mycobacterium smegmatis at a homologous cut site in vitro.

Mycobacterium tuberculosis is an intracellular pathogen that requires specialized type VII secretion systems (T7SS) to transport proteins across its cell wall, a process required for survival in the human host phagosome (1). Mycobacterial genomes encode up to five nonredundant T7SS (also referred to as ESX-1 to ESX-5). The best studied system is ESX-1, which M. tuberculosis and related pathogens use to secrete the T cell antigens ESAT-6/CFP-10 in addition to other ESX-secreted proteins (Esps) that are required for progression of disease (2-6). MycP1 (mycosin-1 protease) is essential for ESX-1 secretion in pathogenic M. tuberculosis and for efficient DNA conjugation in the avirulent saprophyte Mycobacterium smegmatis (7,8). The other T7SS of known function are involved in iron acquisition (ESX-3) (9) or export of the virulence-related PE/PPE (Pro-Glu/Pro-Pro-Glu) family of proteins (ESX-5) (10,11). MycP1 is one of six conserved components found in all five T7SS; the mycosins are thus designated MycP1-5 in accordance with their associated systems (12). Although a recent compositional analysis of the ESX-5 T7SS inner membrane complex did not identify MycP5 as part of the assembly (13), knocking out MycP1 abrogates ESX-1 protein secretion, indicating that its presence is essential (7,14).
Proteins exported by the T7SS have been detected in culture filtrates and in cell wall fractions as lower molecular mass/proteolytically processed species compared with those that are not exported (3,4,7,11,15). For example, LipY, a substrate of the ESX-5 system, is cleaved between its N-terminal PE domain and its C-terminal lipase domain following secretion (15). Members of this enigmatic PE family, as well as those of the PPE family, have been shown to be required for export to the cell wall (15,16). It was thus hypothesized that the MycP family may function in removal of these PE/PPE "leader" domains following (or concomitantly with) secretion (15). Ohol and co-workers provided the first evidence of MycP1 tb proteolytic activity against a T7SS substrate by complementing a ΔmycP1 tb knockout strain with a mutant in which the predicted serine nucleophile of the enzyme was changed to alanine (MycP1 tb -S332A) (7). This catalytically impaired mutant resulted in an altered cleavage pattern of the C-terminal portion of the secreted protein EspB tb, which suggested that native MycP1 tb mediates cleavage at two sites (7). Intriguingly, the mutant strain also showed a 2-fold increase in secretion of known ESX-1 substrates as well as reduced virulence in mice, leading to the hypothesis that MycP1 activity negatively regulates secretion via EspB (7). Although secretion-coupled cleavage of the EspB C terminus has been reported (3,4,7), convincing data showing the exact residue at which proteolysis occurs have not been provided. Proteases are often synthesized with propeptide segments that act as intramolecular chaperones to promote folding as well as regulate activity to prevent untimely and potentially destructive proteolysis in the incorrect cellular compartment (17). These propeptides range from short dipeptides to independent domains and are often proteolytically processed and removed as the enzyme matures (17).
Sequence analysis (18,19) of the MycP family indicates that they have an N-terminal Sec signal sequence followed by a 40-amino acid stretch that, despite showing no sequence homology to propeptides of other subtilisin-like proteases, has previously been proposed to operate as a propeptide (7,19). Following this is a predicted subtilisin domain, which is attached to a predicted C-terminal transmembrane anchor by a ~30-amino acid proline-rich linker.
Despite more than a decade of investigation, there are minimal data describing maturation events leading to MycP activity. In one study, M. tuberculosis MycP1 tb expressed in infected macrophages was assayed by immunoblotting, and the enzyme was found at a lower apparent molecular mass after 6 weeks compared with samples taken from initial cell cultures; the authors speculated that this electrophoretic shift was related to loss of the putative propeptide (19). Recently, Ohol et al. reported that a recombinant maltose-binding protein fusion construct of M. smegmatis MycP1 sm (24-407) is initially inactive (7). However, extended sample aging with Factor Xa resulted in a shortened species as determined by SDS-PAGE relative to the maltose-binding protein fusion as well as an apparent gain in promiscuous proteolytic activity against a range of fluorogenic tetrapeptide substrates (7). The authors attributed the shortened species to loss of the proposed prodomain; however, data showing specific cut sites to validate this hypothesis were not explicitly provided.
Toward clarifying these issues pertaining to MycP1 maturation in relation to subsequent proteolytic activity, we present the first crystallographic analysis of a mycosin protease, including the proposed propeptide region (which we will herein refer to as the "N-terminal extension"). MycP1 sm from M. smegmatis shares 78% sequence identity to its counterpart in M. tuberculosis, making it a suitable model system for studying mycosins of pathogenic mycobacteria. Our structure shows that the N-terminal extension is not a conventional subtilisin foldase, but rather an extended proline-rich, disulfide-tethered appendage that interacts extensively with the subtilisin core. This observed N-terminal extension of MycP1 does not occupy the active site cleft in a manner typical of previously characterized propeptides, suggesting regulation of the catalytic domain through a novel mechanism. Further, we show that MycP1 sm with the N-terminal extension intact is slowly active against the C-terminal region of purified EspB from both M. smegmatis and M. tuberculosis species and determine the conserved cut site using mass spectrometry and N-terminal sequencing.

MATERIALS AND METHODS
Expression and Purification of MycP1-The full MycP1 sm gene was synthesized with codons optimized for expression in Escherichia coli (Biobasic). Residues 24-407 were cloned into pET28a(+) using the restriction-free method (20) and transformed into E. coli BL21 codon-plus RIPL-competent cells (Stratagene) for use in protein expression. BL21 E. coli cells expressing native His-tagged (MGSSHHHHHHSSGLVPRGSH) MycP1(24-407) were grown as described (21), and cells expressing the SeMet derivative of MycP1(24-407) were grown in minimal media as described (22). Induction was initiated using 1 mM isopropyl 1-thio-β-D-galactopyranoside at A600 = 0.6 followed by growth for 20 h. After harvesting by centrifugation, cells were weighed, resuspended in a 1:1 volume of 2× lysis buffer (100 mM HEPES, pH 7.5, 1 M NaCl, 20 mM imidazole, and 1 mM TCEP), and frozen. When needed, the pellet was thawed and diluted three times with IMAC buffer (50 mM HEPES, pH 7.5, 500 mM NaCl, 10 mM imidazole, 0.5 mM TCEP) (16) and lysed using an Avestin C5 homogenizer. Lysate was spun at 40,000 rpm for 1 h, and the supernatant was loaded onto a 1-ml HisPur IMAC column (Thermo, 88225) using an ÄKTA purifier FPLC system, washed with 10 column volumes of 25 mM imidazole, and eluted with 8 column volumes of 500 mM imidazole without subsequent cleavage of the His tag. The eluate was immediately concentrated and injected onto a Superdex 75 XK 26/60 column equilibrated with 20 mM HEPES, pH 7.5, 150 mM NaCl, and 1 mM TCEP, and the fractions containing the center of the main UV peak were immediately concentrated to 20 mg/ml and set up for crystallization. The N-terminal deletion constructs were created using the QuikChange protocol (Stratagene). Small scale protein purification of the MycP1 truncation mutants was carried out as described previously (21).
Crystallization, Structure Determination, and Analysis-MycP1(24-407) crystals grew from 0.2 M NaF, 25% PEG 3350 and reached maximum size after 1 week. Crystals were harvested, briefly washed in mother liquor containing 23% glycerol, and rapidly cooled in liquid nitrogen before screening using a Rigaku RU200 rotating anode and Mar345db detector. SAD data were collected at the Canadian Light Source CMCF2 at a peak wavelength of 0.97895 Å and processed with iMosflm and SCALA (23,24). Phenix AutoSol (25) was used for phasing, with six selenium atoms found in the substructure solution (figure of merit = 0.37). The model was built and refined using Phenix and Coot (26). Data collection and refinement statistics are given in Table 1. The resultant structure served as a molecular replacement search model to solve a second, higher resolution dataset using crystals grown in 0.1 M Bis-Tris propane, pH 6.77, 0.18 M sodium thiocyanate, and 26% PEG 3350, which were cryoprotected as described above. Both structures had one molecule in the asymmetric unit, and no significant model changes were observed despite the differing space groups in each (I222 versus P2₁2₁2₁). The structure was analyzed and figures generated using UCSF Chimera (27). Active site cleft sizes for MycP1 and thermitase (PDB ID code 1THM) were determined with DogSiteScorer (28).
MycP1 and EspB Cleavage Assay-Full-length EspB tb (1-460) (Rv3881c) and EspB sm (1-520) (MSMEG_0076) were cloned into pET28a(+), and N-terminally His-tagged (MGSSHHHHHHSSGLVPRGSH) EspB was expressed in E. coli BL21 codon-plus RIPL cells grown and processed as described above for MycP1. Cells were harvested and lysed in 100 mM HEPES, pH 7.5, 500 mM NaCl, and the protein was purified over HisPur nickel-nitrilotriacetic acid resin (Thermo, 88223) and further purified using a Superdex 75 column with 20 mM HEPES, pH 7.5, 150 mM NaCl as the elution buffer. MycP1 used for activity assays was purified as above except the buffer used in all stages was 20 mM MES, pH 6.0, 50 mM NaCl. MycP1 and EspB were incubated separately (for controls) and mixed together in 20 mM MES, pH 6.0, 50 mM NaCl at a concentration of 0.2 µg/µl in 50-µl reaction volumes and incubated at 37 °C. 10-µl aliquots were withdrawn at the indicated time points. The samples were analyzed using SDS-PAGE.
In-gel Digestion and Mass Spectrometry Determination of EspB Cleavage Products-Protein bands were excised and in-gel digested with Trypsin Gold (Promega) as described (29). Eluted peptides were acidified with 0.5% formic acid and purified using C18 Stage tips (30) before analysis on a capillary liquid chromatography system (LC Packings; Dionex) coupled to a quadrupole time-of-flight mass spectrometer (QSTAR XL, Applied Biosystems, operated by the UBC Centre for Blood Research Mass Spectrometry Suite). Peptides were separated on a column packed with Magic C18 resin (Michrom Bioresources) using a 0-80% gradient of organic phase over 90 min. MS data were acquired automatically using the Analyst QS software, v1.1 (Applied Biosystems). Peptides were identified from a protein database containing the M. tuberculosis proteome database (Uniprot) with appended M. smegmatis MTB48, common laboratory contaminant sequences and a reverse sequence decoy database, using Mascot v.2.3 (Matrix Science). Search parameters included mass tolerances of 200 ppm for the parent ion and 0.4 Da for the fragment ions, trypsin or semitrypsin as cleavage specificity with up to two missed cleavages, and carboxyamidomethylation of cysteine residues (+57.02 Da). To be accepted, spectrum-to-sequence matches required a Mascot ion score >25 for tryptic peptides and an ion score >38 for semitryptic peptides.
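The acceptance criteria above (200 ppm parent-ion tolerance; ion score >25 for tryptic and >38 for semitryptic peptides) can be sketched as a small filter. This is an illustrative sketch only and not the Mascot implementation; the function names and the example masses in the usage note are hypothetical.

```python
def ppm_error(observed_mass: float, theoretical_mass: float) -> float:
    """Signed mass error of an observed mass, in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def accept_match(observed_mass: float, theoretical_mass: float,
                 ion_score: float, semitryptic: bool,
                 parent_tol_ppm: float = 200.0,
                 tryptic_cutoff: float = 25.0,
                 semitryptic_cutoff: float = 38.0) -> bool:
    """Apply the parent-ion tolerance and the ion-score cutoff from the text.

    The fragment-ion tolerance (0.4 Da) is applied by the search engine
    itself and is deliberately not modeled here.
    """
    if abs(ppm_error(observed_mass, theoretical_mass)) > parent_tol_ppm:
        return False
    cutoff = semitryptic_cutoff if semitryptic else tryptic_cutoff
    return ion_score > cutoff
```

For example, a hypothetical match with a 100 ppm error and an ion score of 30 would pass as a tryptic peptide but fail as a semitryptic one, reflecting the stricter threshold applied to the larger semitryptic search space.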
Edman Sequencing-Separated EspB cleavage fragments were transferred from SDS-PAGE gel to a PVDF membrane and stained with 0.1% (w/v) Coomassie Brilliant Blue R-250, 40% methanol, 10% acetic acid for 5 min. The membrane was destained in 40% methanol, 10% acetic acid and then rinsed for 30 s in 90% methanol, 5% acetic acid. The membrane was dried, and EspB cleavage products were cut out. The first six amino acids were determined by Edman sequencing at the Advanced Protein Technology Centre within the Hospital for Sick Children, Toronto.
Peptide Docking-MycP1 was aligned with subtilisin BPN′ (PDB ID code 2SNI) and the CI-2 chain merged into the MycP1 structure. Residues PVGTIVTMEYRIDR were changed to the identified EspB sm cut sequence DPSLGKPASAGGGGG and the remainder of CI-2 deleted. Hydrogens and missing side chains were added to the model, which served as the input for FlexPepDock (31). 100 low resolution and 100 high resolution structures were generated, and the peptide with the lowest energy/RMSD was analyzed.
Differential Scanning Fluorometry and Circular Dichroism (CD)-Differential scanning fluorometry experiments were carried out using 0.2 mg/ml protein in 100 mM MES, pH 6.0, 50 mM NaCl with SYPRO Orange (5× final concentration; Invitrogen S6650). The mixture was monitored in MicroAmp Fast optical reaction plates (Applied Biosystems 4346906) using a 25-µl assay volume and an Applied Biosystems StepOnePlus RT-PCR system set to ROX (excitation, 488 nm; emission, 620 nm). CD spectra were acquired using 5 µM protein in 10 mM Tris, pH 7.5, 30 mM NaCl, 1 mM EDTA over wavelengths spanning 195-275 nm.

RESULTS
MycP1 with Its N-terminal Extension Intact Cleaves Two Purified EspB Homologues in the C-terminal Region in Vitro-We first analyzed MycP1 sm (residues 24-407, with predicted N-terminal signal sequence and C-terminal transmembrane helix deleted) by mass spectrometry (±3-Da resolution) and measured the molecular mass to be that of the expected construct, indicating the proposed propeptide is stably attached (data not shown). To determine whether MycP1 sm (24-407) is active in vitro against M. tuberculosis EspB (EspB tb ) as previously described (7), we purified EspB tb and mixed it with MycP1 sm at a molar ratio of 1:1 and conducted cleavage assays by observing electrophoretic shift on SDS-PAGE. EspB tb (consistently running as an ~55-kDa band) migrated at lower molecular mass (~48 kDa) following overnight incubation with MycP1 at 37 °C; this truncated band was not observed when EspB was incubated with the MycP1 sm -S334A catalytic mutant (Fig. 1A). A doublet of two additional closely spaced lower molecular mass EspB tb bands was also produced (~10 kDa). We further tested and confirmed MycP1 sm activity against purified EspB from M. smegmatis (EspB sm ) (Fig. 1B). In contrast to the ~10-kDa doublet band observed for EspB tb , a single ~12-kDa band was produced when EspB sm was cleaved. At an equimolar ratio under the assay conditions tested, MycP1 required >6 h incubation to completely cleave either EspB homologue (Fig. 1C). To determine where MycP1 cleaves EspB tb , the cleavage products, along with full-length EspB as a control, were excised from the gel and subjected to in-gel digestion by trypsin followed by LC-ESI-MS/MS analysis. Peptides from the large molecular mass band mapped to a discrete N-terminal region (residues 1-356), whereas peptides from the smaller two bands mapped primarily to the C-terminal region (358-460) (Fig. 1D). A similar peptide distribution was also observed in the LC-MS/MS analysis of the EspB sm cleavage products (data not shown). Thus, MycP1 sm , with its N-terminal extension intact, cleaves two distinct EspB homologues in the C-terminal region.
To locate the MycP1 cleavage site precisely, we repeated the spectrum-to-sequence assignment searches to include semitryptic peptides, i.e. peptides that had been cut by trypsin at one terminus only, which would suggest the remaining terminus resulted from MycP1 protease activity. Analysis of the middle (~10-kDa) EspB tb cleavage product revealed the most N-terminal semitryptic peptide matching to EspB tb at residues 359-394, indicating a nontryptic cleavage between Ala 358 and Ser 359 (354-AVKAA-358 ↓ 359-SLGGG-363; Fig. 1E). Edman sequencing independently confirmed this site as the N terminus of the C-terminal EspB tb fragment, strongly suggesting MycP1 cleavage at this site. No semitryptic peptide could be identified for M. smegmatis EspB sm , but Edman sequencing identified the N terminus of the C-terminal cleavage product at 402-SLKPA-406 ↓ 407-SAGGG-411. For both EspB homologues, the predicted molecular mass of fragments resulting from cleavage at these sites correlated well with their observed apparent molecular mass on the gel. Pairwise sequence alignment of EspB tb and EspB sm indicates this cleavage site is at a homologous location, immediately preceding a polyglycine stretch (Fig. 1F). Multiple sequence alignment indicates that these residues are similar across EspB species (Fig. 1F). According to protease nomenclature, substrate residues upstream of the cleaved scissile bond are denoted as P1-P6 in the sequence, and residues downstream of the scissile bond are termed P1′-P6′ (32). The binding of the P1 side chain to the corresponding S1 pocket of the enzyme is often deemed most critical in aiding the optimal positioning of the adjacent substrate peptide carbonyl for nucleophilic attack, and thus greatest conservation is typically observed at this position. The P2′, P2, and P4 positions are also typically key specificity determinants in subtilisin-like protease substrate recognition.
Thus, it is notable that residues comprising these positions in our identified EspB tb and EspB sm cleavage sites are identical or similar in chemical nature to each other. A VRPA motif located downstream in the EspB tb sequence may indicate a second cleavage site, which would explain the doublet band we observe when EspB tb is cleaved. Duplex EspB tb cleavage has also been observed previously in the cellular context (7). Indeed, LC-MS/MS analysis of the lower EspB tb cleavage product identified an N terminus at Gly 387 , and N-terminal sequencing supports cleavage at this second VRPA cut site (Fig. 1E). Both EspB tb sites we identified, at Ala 358 and Ala 386 , are within the vicinity of, but differ from, previously proposed cut sites suggested from experiments using cultures of M. tuberculosis and Mycobacterium marinum cells secreting EspB (3,4,7).
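The semitryptic reasoning used above (a peptide with exactly one trypsin-consistent terminus implies that the other terminus was generated by a different protease) can be sketched as follows. This is a simplified illustration, not the search engine's logic: it assumes trypsin cleaves C-terminal to Lys or Arg and ignores the proline exception, and the toy sequence is hypothetical rather than the real EspB sequence.

```python
# Assumption: trypsin cleaves after K or R; the "not before Pro" rule is ignored.
TRYPSIN_SITES = {"K", "R"}


def classify_peptide(protein: str, start: int, end: int) -> str:
    """Classify the peptide spanning residues start..end (1-based, inclusive).

    A terminus counts as trypsin-consistent if it coincides with a protein
    terminus or directly follows (N-terminus) or ends on (C-terminus) K/R.
    """
    if not (1 <= start <= end <= len(protein)):
        raise ValueError("peptide coordinates fall outside the protein")
    n_term_ok = start == 1 or protein[start - 2] in TRYPSIN_SITES
    c_term_ok = end == len(protein) or protein[end - 1] in TRYPSIN_SITES
    if n_term_ok and c_term_ok:
        return "tryptic"
    if n_term_ok or c_term_ok:
        return "semitryptic"
    return "nonspecific"


# Hypothetical toy sequence, loosely modeled on an AVKAA/SLGGG junction.
TOY_PROTEIN = "MAVKAASLGGGKTTR"
```

With this toy sequence, `classify_peptide(TOY_PROTEIN, 7, 12)` (peptide SLGGGK) is semitryptic because its N terminus follows Ala rather than Lys/Arg, which is the kind of signature that points to a non-tryptic cleavage event.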
Overall Architecture of MycP1-To further understand MycP1 substrate specificity as well as the possible role of the N-terminal extension as a regulatory propeptide, we sought to determine its crystal structure. MycP1(24-407) crystallized in the orthorhombic space group P2₁2₁2₁, and the structure was determined to 1.86 Å resolution (Table 1, PDB ID code 4J94). As predicted from sequence analysis, residues 62-390 of MycP1 adopt a subtilisin-like α/β-fold centered on a central 7-stranded parallel β-sheet (Fig. 2A), with a DALI (33) search yielding subtilisin BPN′ as its closest structural homologue (root mean square deviation of 0.965 Å over 186 common Cα atoms) (Fig. 2C). Thus, the canonical subtilisin nomenclature will be used to describe the structure (32). The N-terminal extension of MycP1(24-61) (proposed earlier to form the putative propeptide (7,19)) is observed to be structurally and functionally distinct from typical subtilisin propeptides (32) in that it does not occupy the active site or have a characteristic foldase domain, which acts as an intramolecular chaperone for subtilisin-like enzymes (35). MycP1 also contains several other unique insertions localized around the active site (Fig. 2B). Part of the C-terminal proline-rich membrane linker (390-403) was resolved and is found hugging the subtilisin core in an extended conformation, passing adjacent to the protein N terminus. Coulombic electrostatic surface maps indicate that MycP1 is negatively charged over the majority of its surface, including the active site cleft. In contrast to other subtilisin-like proteases (32), MycP1 does not appear to bind calcium ions to promote structural integrity.
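A root mean square deviation such as the 0.965 Å quoted above is computed over paired atoms after superposition. The sketch below shows only that final step, assuming the two coordinate sets have already been superimposed and matched atom for atom; it does not reproduce the DALI alignment procedure, and the function name is illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally sized lists of (x, y, z) coordinates.

    Assumes the structures are already superimposed and that atom i in
    coords_a corresponds to atom i in coords_b (e.g. aligned C-alpha atoms).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))
```

Because the value depends on how many atom pairs are included, an RMSD is only interpretable alongside the number of aligned atoms, which is why the text reports both figures together.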
Catalytic Triad-In structures derived from two different space groups, the positioning of the MycP1 catalytic triad is unusual; namely, the side chain conformation of the nucleophile Ser 334 is flipped and facing away from the presumed scissile bond location (Fig. 3A). Moreover, the phi angle (φ) for His 123 in our MycP1 structure is −147°, compared with −60° in subtilisin BPN′. A survey of subtilisin-like enzymes of known structure reveals that both φ/ψ torsion angles for the catalytic histidine are conserved in all other cases (φ = −58° ± 5.8 (S.D.), n = 22), making the MycP1 backbone atypical in this region (Fig. 3B). The increased planarity of this peptide bond alters the backbone scaffold, and in combination with other global structural changes, His 123 Cα and its imidazole side chain are positioned ~1.7 Å further from the active site cleft in comparison to subtilisin BPN′ (Fig. 3, C and D). The other two triad members, Ser 334 and Asp 92 , maintain their hydrogen bonds with His 123 (Ser 334 -Oγ-His 123 -Nε and Asp 92 -Oδ1-His 123 -Nδ, respectively), resulting in all three triad residues occupying positions further from the presumed scissile bond position. Perhaps the most important change is that the Ser 334 side chain adopts a conformation that in our structure appears suboptimal for nucleophilic attack; a conformational change would be required to properly position the catalytic machinery. The oxyanion hole residue Asn 239 appears positioned similarly as in other subtilisin-like enzymes.
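The backbone φ angle discussed above is the dihedral defined by four consecutive backbone atoms, C(i−1), N(i), Cα(i), and C(i). A minimal sketch of the standard dihedral calculation is given below; the helper names are illustrative, and the coordinates in the usage note are toy values rather than atoms from the MycP1 structure.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond.

    For a backbone phi angle, pass C(i-1), N(i), CA(i), C(i) in that order.
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project the outer bond vectors onto the plane perpendicular to b1.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))
```

As a sanity check on the sign convention, the toy points (1,0,0), (0,0,0), (0,0,1), (0,1,1) give +90°, and replacing the last point with (0,−1,1) gives −90°.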
Active Site Cleft-The predicted MycP1 catalytic cleft is unusually large (volume = 1083 Å³) compared with typical subtilisin-like enzymes of known structure (e.g. thermitase = 373 Å³) (28) (Fig. 4A). This is in part due to a unique 18-residue insertion after the eI substrate binding strand (termed here the eI loop; Fig. 4B) that creates a wide, elongated groove via its interaction with the N-terminal extension (Fig. 4C). A second mycosin-specific insertion, the e8/e9 β-strands, juts over the cleft to form a "lid" over the catalytic triad. To locate putative subsites (denoted by S) we used Rosetta FlexPepDock (31) to model the newly identified M. smegmatis recognition motif (SLKPASAGG) into the cleft of MycP1 sm (Fig. 4D). As mentioned above, P1-P4 and P2′ are typically the most important determinants for specificity in subtilisin-like enzymes. The conserved P1 Ala and P2 Pro side chains are complementary to their respective subsites and contact Ala 204 (Cβ) and Thr 156 (Cγ), respectively. MycP1 appears to have a strong preference for Lys/Arg at P3, unusual in that this side chain is typically solvated and not a major contributor to specificity in most subtilisin-like enzymes (32). However, MycP1 has evolved a disulfide-stabilized loop that protrudes over the top of the cleft to scaffold Asp 243 , creating a negatively charged S3 site that is optimally positioned to interact with the positively charged P3 side chain. The S2′ site is very similar to that of subtilisin BPN′, an apolar pocket created by Phe 292 that could accommodate a hydrophobic residue, and is consistent with the Leu/Ala identified in our cleavage site. Although P4 is [LV] and should reasonably bind to a hydrophobic pocket, an obvious S4 subsite was not evident in the docking experiment. Of final note, the eI/eIII substrate-docking strands that typically orient the peptide substrate for catalysis (32) are separated by ~15 Å in MycP1 (Fig. 4E) and appear incapable of forming such a triple-stranded ES complex without a conformational change/induced fit mechanism that would bring these active site elements into sufficiently close proximity. Taken together, the MycP1 sm S2′, S1, S2, and S3 subsites in the structure appear complementary to the P2′, P1, P2, and P3 residues identified in the EspB sm cleavage site; however, it is possible that additional conformational changes may be needed to optimize the β-strand interactions and formation of downstream subsites involved in binding substrate and subtle repositioning of catalytic groups to promote optimal catalysis.
FIGURE 4. … (27), are shown. The catalytic triad is colored green. B, the eI loop and hD helix are unusually long compared with nearly all other subtilisin-like enzymes; the structural alignment of subtilisin-like enzymes highlights how these features are typically conserved in length and position. The outliers are C5a peptidase (red, 3EIF) and MycP1 (green). Note how the eI substrate docking strand is unusually positioned. C, the N-terminal segment (green) and eI loop (gold) interact to form one side of the cleft; the docked EspB peptide (orange) indicates the cleft. D, a docked peptide corresponding to the EspB sm recognition sequence is shown (orange) with P and S sites indicated. E, MycP1 eI/eIII substrate docking strand spacing (left) compared with subtilisin BPN′ (right). Previously published PDB files used in structural alignment were: 3AFG, 3D43, 3EIF, 3BX1, 3F7M, 3QFH, 2ID4, and 3HJR.
N-terminal Extension-The unusual N-terminal extension of MycP1 (residues 24-61) follows a securely anchored extended path, wrapping along the surface of the subtilisin domain (Fig. 5). At the extreme N terminus, a diproline motif (residues 24-29) sits in an aromatic cradle of MycP1 formed by conserved residues Trp 272 , Trp 261 , Trp 298 , and Tyr 273 . This cradle is the result of an insertion (between the e5/e6 strands) composed of two short 3₁₀ helices flanking a β-strand, which effectively creates an outcropping from the catalytic domain (Fig. 5B). The two proline side chains of the N-terminal motif stack favorably onto the tryptophan indole rings (Fig. 5B). This region of the N-terminal extension adopts a polyproline II helix, with characteristic 120° spacing between residues imposed by the conformational rigidity of the proline residues (36). Its interaction with the aromatic cradle is reminiscent of proline recognition motifs of the WW family (37). The N-terminal extension passes underneath the hD/hE active site helices where a second diproline motif forms another polyproline II helix and is anchored by interactions with a patch of hydrophobic amino acids (Fig. 5C). The N-terminal extension is further bound through a series of electrostatic interactions, β-sheet interactions with the eI loop, and tethering by the Cys 51 -Cys 120 disulfide bond (Fig. 5A). Generally, the N-terminal extension is rich in prolines, which appear required to facilitate its extended path around the subtilisin domain of MycP1. Sequence alignment indicates that this is likely a feature common to the entire mycosin family, with the upstream end exhibiting more conservation than the downstream portion of the N-terminal extension (Fig. 5D). We observed that MycP1 was expressed and folded (albeit with reduced solubility) even with up to 33 of 40 N-terminal extension residues removed, suggesting that its primary function is unrelated to folding but may contribute to stability (Fig. 5, D and E). Differential scanning fluorometry comparing MycP1(24-407) and MycP1(49-407) showed a 7 °C difference in melting temperature (T m = 64 °C versus 57 °C), indicating that the N-terminal extension contributes to overall stability (Fig. 5F).
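Melting temperatures from differential scanning fluorometry are commonly estimated as the temperature at which the fluorescence rises most steeply (the maximum of dF/dT). The text does not state which fitting method was used here, so the following is only a sketch of that common approach, run on synthetic two-state curves whose midpoints are set to the reported values for illustration; the function names and the curve width are assumptions.

```python
import math


def estimate_tm(temps, fluorescence):
    """Return the temperature where the central-difference dF/dT is largest."""
    if len(temps) != len(fluorescence) or len(temps) < 3:
        raise ValueError("need at least three paired temperature/signal points")
    best_temp, best_slope = None, -math.inf
    for i in range(1, len(temps) - 1):
        slope = (fluorescence[i + 1] - fluorescence[i - 1]) / (temps[i + 1] - temps[i - 1])
        if slope > best_slope:
            best_temp, best_slope = temps[i], slope
    return best_temp


def synthetic_melt_curve(tm, temps, width=1.5):
    """Idealized sigmoidal unfolding curve (synthetic data, not measurements)."""
    return [1.0 / (1.0 + math.exp((tm - t) / width)) for t in temps]


temps = [25.0 + 0.5 * i for i in range(141)]  # 25 to 95 degrees C in 0.5-degree steps
tm_full = estimate_tm(temps, synthetic_melt_curve(64.0, temps))
tm_truncated = estimate_tm(temps, synthetic_melt_curve(57.0, temps))
delta_tm = tm_full - tm_truncated
```

Real melt curves usually show a post-transition decay in dye signal and benefit from smoothing before differentiation, so this derivative-maximum estimate should be read as a first approximation.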

DISCUSSION
MycP1(24-407) was crystallized with a native catalytic triad that failed to hydrolyze the proposed prodomain, indicating this full-length enzyme is highly stable over the course of the purification procedure and crystallization experiment. In contrast to subtilisin prodomains, which form a typical foldase fold that occupies the active site within reach of the Ser-His-Asp catalytic triad, the structurally distinct MycP1 N-terminal extension is not bound within the cleft and is >13 Å away from the Ser 334 nucleophile. This rules out the possibility of autoproteolytic cleavage of this region in cis, barring conformational rearrangement. Crystallization of wild type subtilisin-like enzymes with their uncleaved prodomains is rare (a catalytically impaired form is typically required); the only other example to our knowledge is the proprotein convertase PCSK9 (38). Further, our mass spectrometry data showing that MycP1, with its N-terminal extension intact, is able to cleave EspB (albeit slowly) at a [L/V]K[P/A]A↓S[L/A]GG site add further support to the view that this appendage is not a propeptide. Thus, it is possible that the N-terminal extension may not be cleaved but may modulate proteolysis in response to an appropriate trigger. This mode of operation may be comparable with the intracellular subtilisin proteases, which have a short N-terminal extension that operates through a combined active site blocking/catalytic triad rearrangement mechanism (39). Another possibility is that cleavage of the N-terminal extension may occur in a cellular context, but the appendage could remain anchored to the subtilisin domain through the disulfide bond (e.g. as with chymotrypsin (17)), which could help explain previous reports of activation resulting from addition of Factor Xa protease (7). We have shown that the majority of the N-terminal extension is not required for folding, but does contribute to overall stability.
Given that the extension interacts with several features surrounding the active site (the eI strand (residues 159-163), aromatic cradle (residues 261-274), the Cys 51 -Cys 120 disulfide bond, and e8/e9 extension (residues 320-328)), we suspect a role in modulating catalysis or conferring specificity. Another possibility is that the N-terminal extension is serving as a placeholder in obstructing binding surfaces, and if removed, these surfaces would have the potential to target substrates or other components of the secretion system, perhaps through the same proline recognition motifs that anchor the uniquely proline-rich N-terminal extension in the structure.
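The [L/V]K[P/A]A↓S[L/A]GG cleavage motif described above can be written as a regular expression for scanning sequences. The sketch below is illustrative only: the function name is hypothetical, and the two test strings are just the ten-residue windows around the reported cut sites, not full-length EspB sequences.

```python
import re

# P4-P1 are consumed by the match; P1'-P4' are checked with a lookahead so
# that match.end() falls exactly on the scissile bond.
CUT_MOTIF = re.compile(r"[LV]K[PA]A(?=S[LA]GG)")


def find_p1_positions(sequence, first_residue_number=1):
    """Return residue numbers of P1 (the residue preceding the cut).

    first_residue_number is the numbering of sequence[0], which allows a
    local window to be scanned while keeping full-length numbering.
    """
    return [first_residue_number + m.end() - 1 for m in CUT_MOTIF.finditer(sequence)]


ESPB_TB_WINDOW = "AVKAASLGGG"   # residues 354-363 as reported in the text
ESPB_SM_WINDOW = "SLKPASAGGG"   # residues 402-411 as reported in the text
```

Scanning the two windows returns Ala 358 and Ala 406 as the P1 residues, matching the reported sites. A strict pattern of this kind would not match the second VRPA site in EspB tb , which carries Arg rather than Lys at P3, so a broader pattern would be needed to capture it.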
As MycP1 is capable of, yet inefficient at, cleaving EspB in vitro, we hypothesize that the MycP1 structure presented here is an isolated T7SS component in its preassembly form and could still represent a zymogen with "leaky" activity. Given that the previous in vitro fluorogenic substrate profiling data indicated that an uncharacterized form of MycP1 is capable of catalysis with broad specificity (7), it is unsurprising that MycP1 is synthesized with the intrinsic capacity to prevent rapid premature proteolysis, which could cause damaging nonspecific cleavage. It is well understood that nucleophilic cleavage of peptide bonds requires a precisely defined geometric arrangement (40). Our analysis of the MycP1 native catalytic triad allows us to propose one possible mechanism by which a subtle conformational change, for example through N-terminal segment cleavage or binding to a cofactor or binding partner, could twist the His 123 torsion angle and Ser 334 side chain into positions more similar to those of catalytically competent proteases. MycP1 may have extrinsically low activity in the absence of unknown cofactors including the lipidic membrane environment, other ESX components such as the predicted EccBCDE ATPase-translocon complex, or as-of-yet uncharacterized components of the extracellular portion of the secretion apparatus. Perhaps an undefined temporal trigger mechanism ensures that proteolysis occurs only as ESX substrates are crossing the membrane.
Due to its essential action in virulence and relatively accessible nature, MycP1 is an attractive extracellular anti-tuberculosis drug target. Additionally exciting is the potential of the mycosin as an engineerable factor in rational design of more effective tuberculosis vaccines, due to its influence on processing and presumably the downstream function of highly antigenic/virulence-associated EspB and the PE/PPE protein families (7). Indeed, manipulation of T7SS has driven major advancements in mycobacterial vaccinology (34,41), and we have provided the first structural template for engineering vaccine strains that perturb the influence of this protease on regulation, secretion, and processing of secreted tuberculosis antigens.