A New Autocatalytic Activation Mechanism for Cysteine Proteases Revealed by Prevotella intermedia Interpain A*

Prevotella intermedia is a major periodontopathogen contributing to human gingivitis and periodontitis. Such pathogens release proteases as virulence factors that cause deterrence of host defenses and tissue destruction. A new cysteine protease from the cysteine-histidine-dyad class, interpain A, was studied in its zymogenic and self-processed mature forms. The latter consists of a bivalved moiety made up by two subdomains. In the structure of a catalytic cysteine-to-alanine zymogen variant, the right subdomain interacts with an unusual prodomain, thus contributing to latency. Unlike the catalytic cysteine residue, already in its competent conformation in the zymogen, the catalytic histidine is swung out from its active conformation and trapped in a cage shaped by a backing helix, a zymogenic hairpin, and a latency flap in the zymogen. Dramatic rearrangement of up to 20 Å of these elements triggered by a tryptophan switch occurs during activation and accounts for a new activation mechanism for proteolytic enzymes. These findings can be extrapolated to related potentially pathogenic cysteine proteases such as Streptococcus pyogenes SpeB and Porphyromonas gingivalis periodontain.

Periodontal disease (PD) affects the tissues that surround and support the teeth and may lead to loosening and eventual loss of teeth if untreated. It is caused by bacteria and affects 90% of the population worldwide mildly and 10% severely (1,2). In addition, symptoms of PD appear in a series of systemic diseases due to its inflammatory and infective character (2,3). Present-day treatment and curettage of severe PD includes the mechanical cleansing of the affected area and is efficient in general. However, it is costly, time consuming, and painful and needs frequent repetition. In addition, it may entail the indiscriminate usage of antibiotics, which contributes to the spread of antibiotic-resistant strains (2,4). Consequently, there is a need for innovative and specific therapeutic approaches against PD.
Prevotella intermedia is a major bacterial periodontal pathogen in humans together with Porphyromonas gingivalis, among others (5,6). Such bacteria colonize the gingival crevice and produce virulence factors that cause disease. Bacterial infection leads to the bacterial secretion or induction of host overproduction of proteolytic enzymes such as bacterial collagenases, matrix metalloproteases, and serine and cysteine proteases (CPs) (2,7,8). These proteases destroy host tissue and compromise host defenses. In addition, proteases may give rise to fibrinolytic activity and inactivate components of the blood coagulation cascade such as the protease inhibitors α1-proteinase inhibitor and α2-macroglobulin. Proteolysis further covers alimentary requirements, because most of bacterial nutrition is obtained from degraded periodontal tissue and tissue fluid (9).
Most studies on the bacterial proteolytic armamentarium in PD have been performed with P. gingivalis (9). In contrast, the factors governing infection by P. intermedia, a black-pigmented Gram-negative obligate anaerobic non-motile rod bacterium, are poorly understood (7). In humans, Prevotella sp. have frequently been recovered from subgingival plaque in patients suffering from acute necrotizing gingivitis, pregnancy gingivitis, and adult periodontitis (10). In addition, Prevotella species easily acquire resistance toward antibiotics, which hampers their elimination (11). A deep molecular knowledge of how infection and resistance occur is crucial for the development of alternative treatments. In P. intermedia, several proteases have been described, among them trypsin-like serine proteases, a dipeptidyl peptidase IV, and CPs (12-14), but no structural studies are available that could help in understanding their particular mode of action or facilitate the design of specific drugs. The structures of some clan-A papain-like CPs (according to the MEROPS data base (15)) from other infective bacteria are known, namely those of staphopain A and B from Staphylococcus aureus (16,17), the avirulence putative peptidase AvrPphB from Pseudomonas syringae (18), and streptopain (alias streptococcal pyrogenic exotoxin B and SpeB) and IdeS endopeptidase, both from Streptococcus pyogenes (19,20). Together with other bacterial enzymes such as bleomycin hydrolase from Lactococcus lactis and a calpain-like enzyme from P. gingivalis, they may be among the ancestral enzymes that gave rise to the 20 families currently identified within this clan of proteases (15,21). They display a relatively broad substrate specificity but are restricted to a small group of related bacterial species or are even limited to a single species, thus constituting attractive targets for the selective design of antibiotics (22).
All these proteases have been identified as or proposed to be secreted virulence factors that elicit nutrient generation, evasion of the adaptive immune system response through inactivation of immunoglobulins, or release of bacterial proteins from the cell surface (23).
For more than 60 years, SpeB, a protein secreted by Streptococcus pyogenes (24), was considered a unique CP, unrelated to plant papains or vertebrate cathepsins, and the founding member of family C10 within clan CA (15). A recent analysis of bacterial genomes identified genes encoding potential SpeB orthologues in several species, predominantly Bacteroidetes (31). Interestingly, two forms of genes are common, either short orthologues encoding an SpeB-like protein with an N-terminal pro-domain and a catalytic CP domain or large orthologues with an additional large C-terminal extension, which shares no similarity with any other proteins sequenced. The latter orthologues are present in bacteria that are involved in pathogenicity of periodontal disease in humans. With this in mind, a genome search within P. intermedia 17 was undertaken, and three open reading frames potentially encoding CPs were identified (22). We studied the first of these potential proteases, interpain A (InpA), encoded by locus PIN0048. This gene encodes a long SpeB-orthologue of 868 residues, including a 44-residue signal peptide, a pro-domain (Ala1-Asn111; see Fig. 1), a catalytic domain (Val112-Pro359), and a further 465 C-terminal residues arranged in distinct domains, with putative regulatory and secretory functions (25). We cloned, overexpressed, purified, and functionally analyzed protein variants comprising the first two domains, the wild-type (wt) form and a variant in which the active-site Cys154 had been mutated to alanine (C154A), hereafter termed pro-cd-InpA and pro-cd-InpA C154A, respectively. We further analyzed the three-dimensional structures of a major fragment of pro-cd-InpA C154A and of the wt catalytic domain, cd-InpA. Unexpectedly, these studies have uncovered a hitherto undescribed activation mechanism for cysteine proteases and helped us to understand a family of virulence factors produced by human pathogens.
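The domain boundaries quoted above can be cross-checked with simple arithmetic. The short Python sketch below is illustrative only: the segment names are labels chosen here, and it assumes that mature numbering starts at Ala1 directly after the 44-residue signal peptide.

```python
# Segment lengths of the InpA precursor as stated in the text.
# The dictionary keys are illustrative labels, not identifiers from the paper.
segments = {
    "signal_peptide": 44,
    "pro_domain": 111 - 1 + 1,           # Ala1-Asn111
    "catalytic_domain": 359 - 112 + 1,   # Val112-Pro359
    "c_terminal_extension": 465,
}

# Sum of all segments, to be compared with the stated precursor length.
total = sum(segments.values())
print(segments["catalytic_domain"], total)
```

Under these assumptions the four segments add up to the 868 residues stated for the full-length gene product.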

EXPERIMENTAL PROCEDURES
Expression, Mutant Construction, and Purification of Pro-interpain A-Genomic DNA of P. intermedia was extracted from strain ATCC 25611. The structural gene region of InpA comprising the pro-domain and the catalytic domain, pro-cd-InpA, was amplified by PCR using forward primer 5′-ATGCCATGGCAAAGCCACGCACAAAGGAACAG-3′ with an NcoI recognition site and reverse primer 5′-ATGCTCGAGTGGTTTTCCGTAAACACCC-3′ with an XhoI recognition site. Because the NcoI site encompasses the ATG start codon, two bases (CA) were introduced into the forward primer immediately after the NcoI site for in-frame translation of the target protein. This genetic manipulation inserted a methionine before the N-terminal alanine residue of InpA. In addition, the reverse primer introduced two additional codons (CTC GAG) for a leucine and a glutamate following the C-terminal proline residue of pro-cd-InpA. The PCR product was purified and cloned into the NcoI/XhoI site of the pET24d(+) expression vector (Novagen), which provides the coding sequence for a C-terminal hexahistidine tag (His6). The recombinant plasmid was transformed into Escherichia coli strain BL21(DE3) pLysS under the control of the T7 promoter. The wt construct was used to produce mutation C154A using overlap extension PCR (26). The correctness of the constructs was verified by double-stranded DNA sequencing.
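The primer design described above can be checked in a few lines. The following Python sketch is an illustration, not part of the original protocol: it takes the two primer sequences from the text (written without line-break hyphens), confirms the presence of the NcoI (CCATGG) and XhoI (CTCGAG) sites, and translates the flanking codons with the standard genetic code. All function and variable names are chosen here for illustration.

```python
# Primer sequences from the text (5'->3'), written without line-break hyphens.
FORWARD = "ATGCCATGGCAAAGCCACGCACAAAGGAACAG"
REVERSE = "ATGCTCGAGTGGTTTTCCGTAAACACCC"
NCOI, XHOI = "CCATGG", "CTCGAG"

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons of seq, ignoring a trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


# The ATG start codon sits inside the NcoI site (CC-ATG-G).
start = FORWARD.index(NCOI) + 2
n_term = translate(FORWARD[start:])

# The reverse primer anneals to the coding strand, so read its reverse complement.
coding = reverse_complement(REVERSE)
site = coding.index(XHOI)
c_term = translate(coding[site - 3:site + 6])  # codon before the site plus CTC GAG

print(n_term, c_term)
```

Read this way, the forward primer encodes Met-Ala at the N terminus and the reverse primer encodes Pro-Leu-Glu at the C terminus, in line with the description in the text.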
Protein production and purification were essentially the same for the wt and the mutant protein. Cells freshly transformed with the expression plasmid were grown at 37°C to an optical density (A600) of 0.7-0.8 in 1 liter of Luria-Bertani medium supplemented with 2% glucose and kanamycin sulfate (50 μg/ml). The culture was induced with isopropyl-1-thio-β-D-galactopyranoside to a final concentration of 0.1 mM and further incubated at 26°C for 2-3 h for protein production. Cells were harvested, washed with phosphate-buffered saline buffer, and resuspended in binding buffer A (20 mM sodium phosphate, 500 mM NaCl, 20 mM imidazole, pH 7.4) supplemented with 1.5 mM 4′,4′-dithiodipyridine (a reversible CP inhibitor), 6 mM 3-[(3-cholamidopropyl)dimethylammonio]-1-propanesulfonate, 10 μM phenylmethylsulfonyl fluoride, 1 mM HgCl2, and 1 mM 1,4-dithio-DL-threitol (DTT). The latter compounds were added to prevent protein aggregation and autolysis. Cells were lysed by ultrasonication on ice for ~5 min, and cell lysates were cleared by centrifugation, filtered through 0.45-μm pore size filters, and mixed with Fast Flow nickel-nitrilotriacetic acid Sepharose resin slurry (2 ml) previously equilibrated with buffer B (buffer A supplemented with 1 mM HgCl2). After 1 h at room temperature (or overnight at 4°C), the slurry was poured into a column and first washed in buffer B until the baseline (A280) was stable and then in 2 ml of buffer B supplemented with 60 mM imidazole. The protein was eluted stepwise with buffer B further containing 100, 200, 300, 400, and 500 mM imidazole, respectively. The fractions collected were analyzed by SDS-PAGE, and those containing a single band attributable to the target protein were pooled, dialyzed at 4°C overnight against 10 mM Tris·HCl, 1 mM HgCl2, pH 7.5, passed through 0.45-μm filters, and concentrated using a Centricon-10 device (Millipore).
The last purification step comprised ion-exchange chromatography (Amersham Biosciences) with a Mono Q column equilibrated with 20 mM Tris·HCl, 1 mM HgCl2, pH 7.5.
Activity Assay-Activity was determined with the fluorigenic substrate di-tertbutyl dicarbonate-Val-Leu-Lys-aminomethylcoumarin. Briefly, recombinant pro-cd-InpA protein was activated at 37°C in 0.1 M Tris·HCl, 5 mM EDTA, pH 7.5, freshly supplemented with 2 mM DTT. The fluorigenic reaction was started by adding substrate (10 mM stock; final concentration in the reaction mixture, 250 μM), and the release of aminomethylcoumarin was recorded by measuring the increase in fluorescence using a micro-titer plate reader.
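As an illustration of how an initial rate could be extracted from such a fluorescence trace, the Python sketch below fits a straight line to the early, linear part of the progress curve by ordinary least squares. The readings are hypothetical placeholder values, not data from this study, and the function name is chosen here; the text does not specify how the rates were actually computed.

```python
def initial_rate(times, signal):
    """Least-squares slope of signal versus time (signal units per time unit)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_s = sum(signal) / n
    sxy = sum((t - mean_t) * (s - mean_s) for t, s in zip(times, signal))
    sxx = sum((t - mean_t) ** 2 for t in times)
    return sxy / sxx


# Hypothetical readings: time in seconds, fluorescence in arbitrary units.
times = [0, 10, 20, 30, 40]
fluorescence = [100.0, 125.0, 150.0, 175.0, 200.0]
rate = initial_rate(times, fluorescence)

# Dilution of the substrate stock (10 mM = 10,000 uM) to the 250 uM final concentration.
dilution_factor = 10_000 / 250

print(rate, dilution_factor)
```

Only the early part of a progress curve should be passed to such a fit, because substrate depletion bends the curve at later times.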
Autocatalytic Assay-A total of 100 μg of pro-cd-InpA protein, alone or with 0.7 μg of active cd-InpA protein, was preincubated at 37°C in buffer C (0.1 M Tris·HCl, 1 mM HgCl2, 2 mM DTT, pH 7.6). The autocatalytic reaction was initiated by diluting the sample with buffer D (buffer C but with 5 mM EDTA instead of 1 mM HgCl2) at 37°C (final pro-cd-InpA and cd-InpA concentrations were 10 and 0.1 μM, respectively). Aliquots were taken at distinct time intervals and mixed with E-64 inhibitor (N-[N-{L-trans-carboxyoxiran-2-carbonyl}-L-leucyl]agmatine) to stop the reaction. At the same time intervals, samples of the incubation mixture were assessed for activity against the above fluorigenic substrate, and the initial rate of substrate turnover was determined. As a negative control, the same experiments were carried out using buffer C. To ascertain whether pro-cd-InpA autoactivation was an intra- or an intermolecular process, the zymogen was incubated as described above at 10, 2, and 0.4 μM, respectively, with samples withdrawn from the above activation reaction mixture at the mentioned time intervals.
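The protein amounts and final concentrations quoted above can be related by a short calculation. The Python sketch below is a rough consistency check under two assumptions that are not stated in this section: the molecular masses are taken as the approximate sizes reported for the zymogen (40 kDa) and the mature catalytic domain (27 kDa), and the final reaction volume is inferred from the 10 μM zymogen concentration.

```python
def micromol(mass_ug, mw_da):
    """Amount in micromoles: micrograms divided by g/mol."""
    return mass_ug / mw_da


# Assumed approximate molecular masses (apparent sizes, not exact values).
MW_ZYMOGEN = 40_000   # pro-cd-InpA, Da
MW_MATURE = 27_000    # cd-InpA, Da

# Volume (in liters) at which 100 ug of zymogen gives 10 uM.
volume_l = micromol(100, MW_ZYMOGEN) / 10
volume_ul = volume_l * 1e6

# Concentration of 0.7 ug of mature enzyme in that same volume, in uM.
mature_um = micromol(0.7, MW_MATURE) / volume_l

print(round(volume_ul, 1), round(mature_um, 3))
```

With these assumed masses, the inferred volume is about 250 μl and the mature enzyme comes out at roughly 0.1 μM, matching the roughly 100:1 zymogen-to-enzyme ratio given in the text.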
Processing of Pro-cd-InpA C154A by wt cd-InpA-Pro-cd-InpA C154A was tested as a substrate for wt cd-InpA in a reaction mixture containing 0.1 μM of the latter and 10 μM of the former protein in buffer D at 37°C. Aliquots of 10 μl were withdrawn from the reaction mixture at distinct time intervals, and the reaction was quenched by addition of E-64. Results were analyzed by 12% SDS-PAGE.
Generation of N-terminally Truncated Pro-cd-InpA C154A-Pro-cd-InpA C154A (25 mg/ml) in 20 mM Tris·HCl, pH 7.6, was incubated with 1.7 μg of DTT-activated wt cd-InpA overnight at 21°C. The reaction was terminated by addition of E-64 to 100 μM final concentration, and the protein was purified by ion-exchange chromatography employing a NaCl gradient. Fractions containing the N-terminally truncated 36-kDa form of pro-cd-InpA C154A were pooled, concentrated, and dialyzed against a buffer suitable for protein crystallization. N-terminal sequencing, mass spectrometry, and Western blot analyses revealed that this protein variant (ΔN1pro-cd-InpA C154A) encompassed residues Ala39-Pro359 plus the C-terminal expression vector-derived leucine-glutamate dipeptide but was lacking the His6 tag.
N-terminal Sequence Analysis-Wild-type pro-cd-InpA, the C154A mutant protein, their truncated variants, as well as their cleavage products were analyzed by 12% SDS-PAGE and transferred to polyvinylidene difluoride membranes. Membranes were stained with 0.2% Amido Black and subjected to Edman degradation using a Procise 494-HT protein sequencer.
Crystallization and Data Collection and Processing-ΔN1pro-cd-InpA C154A was crystallized from sitting drops containing protein (22 mg/ml) and 10% polyethylene glycol 3000, 0.2 M magnesium chloride, 0.1 M sodium cacodylate, pH 6.5. Wt cd-InpA protein comprising residues Val112-Pro359 plus the C-terminal dipeptide was crystallized from drops comprising protein solution (16 mg/ml) and 28% polyethylene glycol 4000, 0.2 M magnesium chloride, 30% xylitol, 0.1 M Tris·HCl, pH 8.5. Diffraction data were collected at 110 K at the European Synchrotron Radiation Facility (Grenoble, France) beam line ID29 using an ADSC Q315 CCD area detector. ΔN1pro-cd-InpA C154A crystals diffracted beyond 1.5-Å resolution and belonged to space group C2 with one molecule per asymmetric unit. Wt cd-InpA crystals diffracted to 3.2 Å, belonged to the tetragonal space group P4₁2₁2, and contained two molecules per asymmetric unit. Diffraction data were indexed and integrated with the program XDS (27) and scaled and reduced with SCALA within the CCP4 suite (28). Statistics on data collection and processing are presented in Table 1. Wt cd-InpA crystals diffracted very weakly. This led to a high value for the merge indicator R r.i.m. but to an acceptable R p.i.m. value due to the almost 8-fold average multiplicity of the data. In any case, these data were accurate enough to yield valid structural information.
In the case of the ΔN1pro-cd-InpA C154A crystals, diffraction data were strong and of excellent quality, leading to low values for both R p.i.m. and R r.i.m. (see Table 1).
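The effect of multiplicity on the two merging indicators can be made explicit with a small calculation. The Python sketch below uses the commonly cited definitions of the redundancy-independent (R r.i.m.) and precision-indicating (R p.i.m.) merging R factors; it is an illustration with hypothetical intensities, and the exact implementation in SCALA may differ in details such as outlier rejection and weighting.

```python
from math import sqrt


def merging_r_values(groups):
    """Return (R_merge, R_rim, R_pim) for groups of repeated intensity measurements.

    groups: one list of observed intensities per unique reflection.
    """
    num_merge = num_rim = num_pim = denom = 0.0
    for obs in groups:
        n = len(obs)
        if n < 2:
            continue  # a single observation carries no information on spread
        mean = sum(obs) / n
        dev = sum(abs(i - mean) for i in obs)
        num_merge += dev
        num_rim += sqrt(n / (n - 1)) * dev   # multiplicity-corrected spread
        num_pim += sqrt(1 / (n - 1)) * dev   # precision of the merged mean
        denom += sum(obs)
    return num_merge / denom, num_rim / denom, num_pim / denom


# Hypothetical weak data: two unique reflections, each measured 8 times.
groups = [
    [10.0, 14.0, 8.0, 12.0, 9.0, 13.0, 11.0, 7.0],
    [20.0, 26.0, 18.0, 22.0, 24.0, 17.0, 21.0, 19.0],
]
r_merge, r_rim, r_pim = merging_r_values(groups)
print(round(r_rim / r_pim, 3))
```

For a uniform multiplicity N, the ratio of the two indicators is the square root of N, so an 8-fold multiplicity lowers R p.i.m. to about one-third of R r.i.m., which is why noisy but highly redundant data can still give an acceptable R p.i.m.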
Structure Solution and Analysis-The structure of ΔN1pro-cd-InpA C154A was solved with program PHASER (29) using all diffraction data, and the coordinates of S. pyogenes pro-SpeB protein (Protein data bank (PDB) access codes 1pvj and 1dki (19)) were used as a search model. A refined final solution was found comprising 170.8, 60.0, and 294.0 for α, β, and γ (in Eulerian angles) and 0.145, −1.007, and 0.734 for x, y, and z (in fractional cell coordinates), with a log-likelihood gain value of 44.2. The appropriately rotated and translated search-model coordinates were subjected to crystallographic refinement, albeit with no positive result. Accordingly, the model was given 200 cycles of refinement with the program SHELXL (30) starting from a resolution of 3 Å and increasing it by 0.01 Å with every cycle until full data resolution (1.5 Å). No data were set aside as a free R factor set. The resulting model phases were subjected to a density modification step with SHELXE (31). Amplitudes and phases for non-measured reflections were thereafter extrapolated from the current map to fill in the missing reflections within the experimental resolution limits and beyond, to a resolution of 1.0 Å. 60 cycles of density modification and 5 iterations of 20 cycles each of main-chain tracing with SHELXE followed. Combination of the resulting partial main-chain model with the original phases was followed by a new density modification run that eventually led to a partial backbone model for 192 of the residues and a set of phases with a figure-of-merit of 0.76. An electron density map was computed and subjected to a final density modification step with the program DM within CCP4. This step improved the figure-of-merit to 0.87 and enabled straightforward model completion and refinement. Manual model completion with TURBO-Frodo alternated with crystallographic refinement using REFMAC5 within the CCP4 suite.
The final ΔN1pro-cd-InpA C154A model comprised all residues from Ala39 to Pro359 (Fig. 1) except Ser295-Gln301.
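The stepwise resolution extension used in the SHELXL refinement can be pictured as a simple schedule. The Python sketch below is one possible reading of the description, not the actual refinement script: it assumes the high-resolution limit is tightened by 0.01 Å per cycle from 3.0 Å and then held at 1.5 Å for the remaining cycles. The function name and its defaults are chosen here for illustration.

```python
def resolution_schedule(start=3.0, final=1.5, step=0.01, cycles=200):
    """High-resolution limit (in Angstrom) applied at each refinement cycle."""
    schedule = []
    for cycle in range(cycles):
        # Tighten the limit each cycle, but never go beyond the data resolution.
        limit = max(final, round(start - step * cycle, 2))
        schedule.append(limit)
    return schedule


schedule = resolution_schedule()
first_full = schedule.index(1.5)   # cycle index at which full resolution is reached
print(schedule[0], schedule[-1], first_full)
```

Under this reading, full resolution is reached after 150 steps, leaving the last 50 of the 200 cycles at 1.5 Å.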
The structure of wt cd-InpA was solved with program AMoRe (32) using the coordinates corresponding to residues Ala121-Pro359 from ΔN1pro-cd-InpA C154A and structure-factor amplitudes in the range 15-3.5 Å. These calculations unambiguously confirmed P4₁2₁2 as the correct space group, and a unique solution was found at 48.5, 88.4, 223.7, 0.1172, 0.5977, and 0.1152 (α, β, γ, x, y, and z; refined values after rigid-body refinement; see Ref. 32) and 41.7, 88.6, 115.6, 0.1940, 0.0880, and 0.7122 for each of the two molecules A and B in the asymmetric unit, respectively, with a combined score CCF/crystallographic R factor, according to Ref. 32, of 54.4%/41.4%. The appropriately rotated and translated coordinates were subjected to rigid-body and positional refinement applying strong non-crystallographic-symmetry restraints with CNS v. 1.2 (33). Despite the weakness and low resolution of the wt cd-InpA diffraction data, the resulting electron density maps clearly disclosed the entire polypeptide chain of the mature protease moiety. It contained an unambiguous trace for the first eight residues (Val112-Tyr120) of the polypeptide chain that had not been included in the search model, a proof of concept for data quality which ruled out model bias. Careful model building alternated with crystallographic refinement under application of strong non-crystallographic-symmetry restraints with programs CNS and REFMAC5 (at the final stages). The final wt cd-InpA model comprised residues Val112-Gly357 for molecule A and Val112-Pro359 plus two residues from the C-terminal tag (termed Leu360 and Glu361) for molecule B. These two molecules were almost equivalent in practice. Accordingly, under "Results and Discussion" we consider molecule A unless otherwise stated. Table 1 provides statistics on the final refinement steps and parameters of the quality of the resulting models.
Miscellaneous-The figures were prepared with programs TURBO-Frodo, SETOR (34), and MOLMOL (35). Structures were superimposed with TURBO-Frodo. Bioinformatic amino acid sequence similarity searches were undertaken within the MEROPS data base and with the PSI-BLAST server (www.ncbi.nlm.nih.gov/blast). Structural similarity searches were performed with program DALI and secondary structure predictions with program JPRED. Close contacts and interaction surfaces (with a probe radius of 1.4 Å) were calculated with CNS, taking half of the total surface buried at the interface. The final coordinates of ΔN1pro-cd-InpA C154A and wt cd-InpA have been deposited with the Protein Data Bank at the Research Collaboratory for Structural Bioinformatics with access codes 3bb7 and 3bba.
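A distance-based contact count of the kind reported under "Results and Discussion" (contacts below 4 Å, hydrogen bonds below 3.4 Å) can be sketched as follows. This Python example is not the CNS procedure used in the study: it applies a plain distance cutoff to hypothetical coordinates and ignores atom types, so it would overcount hydrogen bonds if used with the 3.4-Å cutoff on arbitrary atom pairs.

```python
from math import dist


def count_contacts(atoms_a, atoms_b, cutoff=4.0):
    """Count atom pairs (one atom from each set) closer than cutoff, in Angstrom."""
    return sum(1 for a in atoms_a for b in atoms_b if dist(a, b) < cutoff)


# Hypothetical coordinates (x, y, z) in Angstrom for two small atom sets.
domain_a = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
domain_b = [(3.0, 0.0, 0.0), (0.0, 3.5, 0.0), (20.0, 0.0, 0.0)]

contacts = count_contacts(domain_a, domain_b)            # pairs below 4 Angstrom
close = count_contacts(domain_a, domain_b, cutoff=3.4)   # pairs below 3.4 Angstrom
print(contacts, close)
```

For real structures, the all-against-all loop would normally be replaced by a spatial grid or neighbor list, and donor and acceptor atoms would be selected before applying the hydrogen-bond cutoff.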

RESULTS AND DISCUSSION
Fig. 1 (legend, partial) [...] (19)). Identical residues are displayed over an orange background (28% sequence identity). Amino acid residues not present in the respective three-dimensional structures are depicted in blue. The four autolytic cleavage points of pro-cd-InpA are indicated by blue scissors. The main cleavage points of either protein leading to the stable mature forms are characterized by blue scissors and light green background. The presently studied protein, ΔN1pro-cd-InpA C154A, includes all the residues from Ala39 onward, and a predicted N-terminal helix in the N-terminal missing region is shown in pink.
Protein Purification and Characterization-Pro-cd-InpA and pro-cd-InpA C154A were overexpressed as 40-kDa proteins and purified to homogeneity. The wt zymogen was readily converted into the fully processed mature 27-kDa catalytic domain during purification so that the zymogenic form could only be obtained if reversible CP inhibitors were included during homogenization of bacterial cells and purification (Fig. 2, A-C). Subsequent inhibition release resulted in time-dependent autocatalytic processing of the zymogen with the concurrent release of activity (Fig. 2, C and E). Processing and activity release were accelerated by catalytic amounts of active cd-InpA (Fig. 2, D and E). This, together with the finding that the initial rate of activity generation was dependent on the zymogen concentration (Fig. 2F), suggested that the autocatalytic maturation of pro-cd-InpA occurred in trans (intermolecularly). Pro-cd-InpA C154A was produced to elucidate the sequence of cleavage events during activation and for structural purposes. As in the case of other CPs, pro-cd-InpA C154A was enzymatically inert and did not undergo autoprocessing.
Analysis of concentration- and time-dependent proteolysis of pro-cd-InpA C154A by the active protease revealed that the process occurred stepwise through a main 36-kDa intermediate (ΔN1pro-cd-InpA C154A) generated by hydrolysis of peptide bond Thr38-Ala39. Accordingly, ΔN1pro-cd-InpA C154A lacks the first 38 residues of the full-length zymogen (see Fig. 1) [...] to further degradation. The same activation pathway may operate in vivo, because a similar band pattern representing variably processed InpA species was detected in the P. intermedia culture medium (Fig. 2H). The maturating self processing of InpA resembles pro-SpeB with respect to formation of one major intermediate and several cleavages within the remaining part of the N-terminal pro-domain (36). Such a mechanism provides regulation of proteolytic activity independent of other secreted and host proteases, thus ensuring that the activity is developed when required.
Table 1 (footnotes, partial) [...] c Crystallographic R factor $= \sum_{hkl} \big|\,|F_{obs}| - k|F_{calc}|\,\big| \,/\, \sum_{hkl} |F_{obs}|$, with $F_{obs}$ and $F_{calc}$ as the observed and calculated structure factor amplitudes; free R factor, same for a test set of reflections not used during refinement. d Including atoms in alternate conformation. e The last refinement step included anisotropic B-factor refinement of the protein atoms in the case of ΔN1pro-cd-InpA (C154A). f According to program MOLPROBITY (47).
FEBRUARY 1, 2008 • VOLUME 283 • NUMBER 5

Activation of P. intermedia InpA Cysteine Protease
Structure Solution Employing a Novel Approach-Contrary to the intact mutant protein, the ΔN1pro-cd-InpA C154A variant crystallized. Its structure was solved by Patterson-search methods using maximum-likelihood criteria. This approach improves the definition of the target for the search by removing the contribution of unknown variables. This means that the errors attributable to lack of completeness of a search model are better estimated. In practice, this entails a larger radius of convergence (i.e. it yields a solution for structurally more distant search models) than conventional search methods, which failed in the present case. Unfortunately, the current crystallographic refinement programs have a shorter radius of convergence. This restricted model refinement and led us to develop a novel approach based on a further development of the SHELX suite of programs (37). It consists of the application of the "free-lunch algorithm," whose theoretical basis had been developed by Giacovazzo and coworkers (38), combined with autotracing, model refinement, and density modification. In essence, the initially (poorly) refined model, displaying a weighted mean-phase error of 64° with respect to the final refined model (as determined a posteriori), was used to calculate an electron density map that was subjected to density modification. With this map, missing structure-factor amplitudes and phases were estimated within the resolution range of the experimental data. Further values were extrapolated to a nominal resolution of 1.0 Å. Subsequently, density modification (weighted mean-phase error = 33°), main-chain auto-tracing, and phase combination (weighted mean-phase error = 27°) eventually produced an accurate partial model for ~60% of the residues. In addition, the resulting electron density map was excellent, even in those parts where the original search model showed a different chain trace (Fig. 3).
This permitted straightforward manual tracing of the entire molecule and successful refinement, enabling us to ascertain three differences in comparison to the sequence of the PIN0048 open reading frame in the Institute for Genomic Research data base that were subsequently confirmed by sequencing at the DNA level (see Fig. 1).
Structure of InpA Zymogen-The protein has an elongated shape with an N-terminal pro-domain (Ala39-Asn111) and a C-terminal papain-like CP domain (Val112-Pro359), which bifurcates into a right subdomain (RSD) and a left subdomain (LSD) (see Fig. 4A). RSD and LSD interact through a surface of 1,332 Å² establishing 69 contacts (<4 Å), among them 11 hydrogen bonds (<3.4 Å) and 22 hydrophobic interactions (Table 2). The pro-domain contacts laterally the top of the CP moiety through a surface of 1,177 Å², with 54 contacts (<4 Å), among them 12 hydrogen bonds and 19 hydrophobic interactions (Fig. 4A and Table 2). The pro-domain is stabilized by a central hydrophobic core and evinces an open-faced sandwich with a twisted antiparallel four-stranded β-sheet (sheet I; strands β1-β4) of simple up-and-down connectivity mediated by short loops. After β4, a segment in extended conformation (loop joining strand β4 and helix α1, Lβ4α1) leads to helix α1. The N-terminal part of the helix approaches the active-site cleft, thus contributing to latency, and is hereafter termed "backing helix." The polypeptide reaches the molecular surface after α1 and undergoes a sharp turn, folding back along the surface and entering a connecting segment that links the pro-domain with the CP domain. This segment adopts an extended conformation from Asn107 to Pro117, i.e. optimal for binding to and cleavage by an active-site cleft of a protease (39). This stretch includes the activation cleavage point, Asn111-Val112 (Fig. 4A), which is superficial and accessible for processing.
Fig. 4 (legend, partial) [...] Fig. 1), the N- and the C terminus, the primary activation point (at Asn111-Val112), and the structure regions responsible for latency maintenance are marked and labeled. B, superimposition of the Cα-carbon traces of ΔN1pro-cd-InpA C154A (yellow) and wt mature cd-InpA (red) in standard orientation. Some residues of ΔN1pro-cd-InpA C154A are labeled for reference. C, close-up view of the active site of ΔN1pro-cd-InpA C154A. Orientation as in B after a horizontal rotation of ~45°. D, same as in C but for wt active cd-InpA. E, Cα-trace of the structure of ΔN1pro-cd-InpA C154A (yellow) and wt mature cd-InpA (red) around the active site, including the catalytic cysteine residue (Cys154; mutated to alanine in ΔN1pro-cd-InpA C154A), imbedded in active-site helix α2 (circled 1), the zymogenic hairpin (circled 3) encompassing the catalytic histidine (His305) (undefined from Ser295 to Gln301 in ΔN1pro-cd-InpA C154A) (circled 4), the backing helix α1 (absent in cd-InpA) (circled 1), and the latency-flap, displayed from Tyr332 to Met351 for either structure (circled 2). The gray arrows indicate the displacements of the keynote structural elements upon zymogen activation as explained in the text.

At Val112, the polypeptide chain enters the RSD of the mature enzyme moiety, which is a split subdomain (Val112-Leu127 plus Thr260-Pro359) with an open-faced sandwich topology created by a six-stranded twisted antiparallel β-sheet (sheet II; strands β11-β16). The sheet extends from the bottom of the molecule (outermost strand β11) to the interface with the pro-domain at β15 (Fig. 4A). The twist gives rise to a concave and a convex face, and the latter mediates the main interaction with the LSD. The main contact between the pro-domain and the CP part is formed by the outermost strand of sheet I, β4, and the lateral strand of sheet II, β15. This gives rise to a continuous ten-stranded β-sheet that completely traverses the zymogen from its upper right to the bottom center (Fig. 4A). After the inset of the LSD (see below), the polypeptide chain rejoins the RSD at strand β11 of sheet II, which runs outward approximately perpendicular to the view in Fig. 4A. After this strand, a short loop leads to helix α5, which nestles in the concave side of sheet II, followed by the next four strands of sheet II (β12-β15), inserted with simple up-and-down connectivity. These strands are connected by loops, which contribute to the substrate-binding cleft and the active site. The polypeptide chain is very well defined for the whole protein moiety except for the tip of a β-hairpin structure created by strands β12 and β13 and the enclosed loop, the "zymogenic hairpin" in the following. The hairpin is rigid at its trunk, because it is stabilized by six β-sheet interactions between β12 and β13, but flexible at its tip (between Ser295 and Gln301 (Fig. 4)). After β15, the polypeptide runs below the backing helix α1 and gives rise to what will now be referred to as the "latency-flap," Lβ15β16, which spans the 16 residues from Ile334 to Gln349. This structure displays a unique conformation and is stabilized by a series of internal contacts.
It consists of two sequential dextrohelical elements, Ile334-Asn338 and Ser344-Gln349, connected by two residues in extended conformation (Pro339-Gly340) and a tight 1,4-turn of type I (Asn341 O-Ser344 N, 3.13 Å), which protrudes from the molecular surface. The bottom of the first dextrohelical segment is anchored to Lβ11α5 through a bidentate interaction of its main chain with the completely buried side chain of Arg267 and includes another tight 1,4-turn of type I (Ile334 O-Leu337 N; 2.94 Å). In addition to this arginine anchor, the structure of the latency flap is galvanized by a total of nine internal hydrogen bonds that confer an extraordinary rigidity to this structural element. After this flap, the protein chain enters the second strand of sheet II, β16, and leads to the surface C terminus of the molecule at Pro359, whose position permits additional downstream domains in the full-length InpA protein (Fig. 4A).
The LSD (Leu128-Phe259) is inserted into the RSD and is characterized by a central three-helical bundle made up by helices α2, called the "active-site helix," as well as α3 and α4, which traverse the subdomain from the back to the front (Fig. 4A). In addition, three β-hairpins are found on the front side of the LSD, β5β6, β7β8, and β9β10. After α4, the polypeptide chain rejoins the RSD at Thr260 and leads to β11. As observed for the pro-domain, the LSD is held together by a large central hydrophobic cluster that reaches the subdomain surface at the bottom and at the left of the molecule and accommodates active-site helix α2.
Substrate-binding Crevice and Active Site-The active-site cleft of InpA is in a crevice formed by loops connecting strands of sheet II at its carboxyl end. The walls of the crevice are provided by RSD and LSD (see Fig. 4). Classic CPs like papain, cathepsin B, and staphopain have a short, four-residue segment connecting the two residues that are topologically equivalent to Gln134 and Gly153 of InpA, respectively, as contributors to the left-side rim of the cleft on its primed side. In contrast, InpA displays between the latter two residues an 18-residue insertion that forms a unique upper-left region of the molecule. This entails that the zone ascribable to substrate binding would be reduced in InpA to Gly133-Gln135 and Thr152-Gly153, immediately preceding the catalytic cysteine, Cys154. The former stretch includes Gln134, whose position is absolutely conserved among CPs and which, by analogy, would be involved in the formation of an oxyanion hole together with the amide nitrogen of Cys154, which would bind the scissile carbonyl (21,40). On the non-primed side of the cleft, Ser242-Met246 and Tyr264 would also contribute to the left rim. Again in contrast to classic CPs, InpA possesses a much longer connection between helices, which shapes part of the front surface and gives rise to a unique β-hairpin, novel for CPs (β9β10). This entails that the residues from Pro238 to Gly241 should further assist Ser242-Met246 in shaping the cleft rim. In even greater contrast to classic CPs, the segments shaping the right-hand rim of the cleft on its primed side may be restricted to the side chains of the strongly conserved Trp324 from Lβ14β15, which becomes rearranged upon activation, and the previously mentioned Gln134 (21). Regarding the right rim on the non-primed side of the cleft, binding may be provided by the main chain of the rearranged zymogenic hairpin, in particular His305-Ala306 and Tyr291-Gly293, as well as Asp350.
Structures Related to InpA-As might have been expected, a search for structural relatives of ΔN1pro-cd-InpA identified pro-SpeB as the closest homologue, with an rms deviation of 2.0 Å over 275 topologically equivalent residues (PDB 1dki and 1pvj (19)). This protein is secreted as a zymogen, and no structural information on the mature protein is currently available. InpA and SpeB are the only members of the catalytic-dyad enzymes, i.e. those lacking a catalytic asparagine, structurally studied to date (19). Because P. intermedia has been shown to degrade connective-tissue constituents and to interfere with the tightly regulated defense mechanism of the host (9), like SpeB in S. pyogenes (22), it is tempting to speculate that InpA is a virulence factor in P. intermedia equivalent to SpeB in S. pyogenes. In addition, P. gingivalis was shown to harbor a further CP, periodontain (41), which is closely related to SpeB and InpA. Accordingly, we conclude that P. gingivalis, P. intermedia, and S. pyogenes may have inherited these homologous proteins from a common ancestor and that they may undergo a similar activation mechanism (42).

Inter-domain (pro-domain/protease domain) and inter-subdomain (RSD/LSD) interactions in ΔN1pro-cd-InpA
Overall, the core of the protease and the pro-domain of InpA conform to the pro-SpeB fold (Fig. 5A). However, the difficulties encountered during ΔN1pro-cd-InpA structure solution employing pro-SpeB as a search model for phasing and a sequence identity of just 28%, i.e. in the twilight zone of protein sequence alignments (43), already pointed to significant differences in structure. The pro-SpeB crystal structure displays unconnected electron density for a helix nestling on the concave side of sheet I of the pro-domain. Several secondary structure prediction algorithms consistently predicted an α-helix to similarly run from Lys 6 to Asn 18 in pro-cd-InpA (Fig. 1). However, differences in length in the loop connecting this (putative) first helix with strand β1 (nomenclature of ΔN1pro-cd-InpA, see Fig. 1 for equivalences), as well as in Lβ1β2, lead sheet I to have a bulge on its left-hand side in the streptococcal enzyme, which is compensated in ΔN1pro-cd-InpA by a different chain trace of Lβ4α1 (around residue Val 83, see Fig. 5B). The pro-SpeB pro-domain is undefined for segment Ala 112 P SPE-Gln 1 SPE (residues of pro-SpeB are subscripted SPE; see Fig. 1 for the complete pro-SpeB sequence), which includes the primary activation cleavage site at Lys 118 P SPE-Gln 1 SPE. Accordingly, this region, which is instrumental for understanding a latent structure, is defined in the prevotellaceal zymogen but not in the streptococcal protein.
There are also important differences in the surface structures of pro-cd-InpA and pro-SpeB, which affect activation and substrate binding in the mature enzymes. At the end of the first segment of the RSD, at Leu 128-Thr 129, a three-residue insertion creates a bulge in pro-SpeB leading to structural differences in the loop structure preceding α3, with a maximal difference at Gly 210 (Ser 105 SPE) of 2.8 Å. Further downstream of the polypeptide chain, pro-SpeB has ten extra residues preceding the active-site helix and does not display a hairpin equivalent to β5β6 of ΔN1pro-cd-InpA. This, together with the flipped Lβ7β8 loop (three residues more in pro-SpeB), has implications in shaping the left rim of the substrate-binding crevice and for the interaction with other proteins (Fig. 5). In addition, the tip of hairpin Lβ9β10, possibly involved in the left rim of the crevice in InpA (see above), also diverges due to the three extra residues in the latter proteinase. The greatest differences, however, affect regions surrounding the active site, in particular the zymogenic hairpin and the latency flap. The former, comprising the catalytic histidine in both proteins, is flexible and five residues longer in ΔN1pro-cd-InpA, where it adopts a different orientation. In turn, the segment equivalent to the latency flap is completely disordered between Ser 230 SPE and Gly 239 SPE and has six additional residues in pro-SpeB (Figs. 1 and 5), following a completely different path. Hence, it does not contribute significantly to interactions with the zymogenic hairpin or the backing helix to maintain the zymogenic structure. Furthermore, in pro-SpeB the polypeptide preceding Ser 230 SPE invades the space occupied by the segment connecting the pro-domain with the mature moiety in pro-cd-InpA, thus pointing to differences in the segment flanking the primary activation cleavage point.
In contrast to these differences, there are also similarities in detail. As in ΔN1pro-cd-InpA, latency is achieved in pro-SpeB through a catalytically incompetent conformation of the catalytic histidine, His 195 SPE, while the catalytic cysteine is probably in a functional position. It is conceivable, extrapolating from our structures, that the zymogenic hindrance is exerted in the streptococcal enzyme by a simple ~90° rotation around the χ1 angle of His 195 SPE. This movement swings the imidazole side chain away from its cysteine-binding position and establishes a van der Waals interaction with Val 192 SPE Cγ2 within the hairpin segment equivalent to the zymogenic hairpin in ΔN1pro-cd-InpA. The competent imidazole position is occupied in pro-SpeB by a unique asparagine, Asn 89 P SPE from the pro-domain, which establishes a highly specific key hydrogen bonding network with Trp 214 SPE Nε1, Ala 196 SPE O, and Trp 212 SPE O (19). The position equivalent to Asn 89 P SPE is occupied in ΔN1pro-cd-InpA by Ser 88, which establishes one of these three interactions (with Ala 306 O) in the InpA zymogen as one of the elements likewise leading to an incompetent histidine conformation. Accordingly, the major differences in the structure of the zymogens do not preclude that the novel activation mechanism described below may also be valid with variations for pro-SpeB and, possibly, periodontain activation.
A Novel Mechanism For Latency Maintenance and Activation-The mature enzyme structure confirms that the pro-domain, including the backing helix, is removed upon activation and that it does not sterically block access to the substrate-binding cleft in ΔN1pro-cd-InpA. There are two parallels between this zymogen and other CP zymogens such as human and rat pro-cathepsin B (44), pro-cathepsin K (45), and pro-staphopain B (16). In the latter, the pro-segment packs against a surface loop of the C-terminal domain termed the pro-segment binding loop. This loop is absent in ΔN1pro-cd-InpA, but its pro-domain binds in the same place. A further common feature is that the association between the pro-domain and the protease domain is based on hydrophobic residues. However, in the mentioned CP zymogens the pro-domain segments run the full length of the cleft in the opposite direction to a peptidyl substrate and block the crevice. In the InpA zymogen, in contrast, backing helix α1 and the preceding loop Lβ4α1 are inserted laterally like a wedge (Fig. 4A). Trp 324, from Lβ14β15 within the CP domain, stops the wedge with its side chain (Fig. 4C). The relative antipodal disposition on the molecular surface of the active site and Val 122 (Fig. 4), which are ~26 Å apart, supports kinetic data (see above) suggesting that autolytic activation of InpA is likely to occur in trans. The CP domain is similar in both the mature enzyme and the zymogen (239 out of 248 common Cα atoms show a root mean square deviation of 0.82 Å; see Fig. 4B). Interestingly, activation is not correlated with significant displacement of the newly formed N terminus at Val 122 (Fig. 4B). Despite this similarity, detailed comparison of the two structures reveals that selected structure elements display completely different chain traces (Fig. 3).
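For reference, a root mean square deviation such as the one quoted above is the square root of the mean squared distance between paired atoms. The following is a minimal Python sketch, assuming the two coordinate sets have already been superposed and paired one-to-one; the function name and the toy coordinates are hypothetical and are not taken from the InpA structures.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation (in the input units, e.g. Å) between two
    lists of paired (x, y, z) coordinates that are already superposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty, equally long coordinate lists")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the two members of each atom pair.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical toy coordinates: three "C-alpha" positions shifted by 1 Å in z.
zymogen_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
mature_ca = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (7.6, 0.0, 1.0)]
print(rmsd(zymogen_ca, mature_ca))
```

The sketch deliberately omits the superposition step, so it only illustrates the final averaging over the selected atom pairs, not how the pairs were chosen or aligned.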
In CPs, function requires a correct spatial arrangement of the catalytic cysteine provided by the active-site helix within the LSD and the catalytic histidine of the RSD to render a functional thiolate-imidazolium ion pair (46) (Fig. 4, C-E). Unlike InpA, most other CPs also have an asparagine with a supportive role (46). The position and conformation of the active-site helix and the cysteine, Cys 154, are maintained in both InpA structures. In contrast, the catalytic histidine, His 305, undergoes major rearrangement. In the mature enzyme it is oriented to favor the interaction with Cys 154 Sγ through a hydrogen bond, His 305 Nε2-Trp 322 O, and is further stabilized in this position by a hydrophobic environment created by Trp 324, Phe 307, Phe 345, and Trp 322 (Fig. 4D). In the zymogen, however, the histidine is swung out from its active position. It requires a rotation of ~45° around bond Ala 306 Cα-N and of ~180° around its χ1 angle to adopt an active conformation (Fig. 4, C-E). The position of His 305 in the zymogen is stabilized by three elements that contribute to a compact "histidine cage" structure (Fig. 4C): the backing helix, the zymogenic hairpin, and the latency flap. The first one is completely removed and the other two undergo major rearrangement upon activation (Fig. 4D).
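The χ1 angle referred to above is the side-chain dihedral defined by atoms N, Cα, Cβ, and Cγ. As an illustration only, a dihedral angle can be computed from four atom positions as in the following Python sketch; the helper names and the toy coordinates are hypothetical and do not correspond to His 305 in either InpA structure.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four points, e.g. N, CA, CB, CG."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1n = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    s0, s2 = _dot(b0, b1n), _dot(b2, b1n)
    v = (b0[0] - s0 * b1n[0], b0[1] - s0 * b1n[1], b0[2] - s0 * b1n[2])
    w = (b2[0] - s2 * b1n[0], b2[1] - s2 * b1n[1], b2[2] - s2 * b1n[2])
    x = _dot(v, w)
    y = _dot(_cross(b1n, v), w)
    return math.degrees(math.atan2(y, x))


# Hypothetical toy geometry: the fourth atom is placed cis or trans
# relative to the first one about the central bond.
n, ca, cb = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0)
cg_cis, cg_trans = (1.0, 1.0, 0.0), (-1.0, 1.0, 0.0)
print(dihedral(n, ca, cb, cg_cis), dihedral(n, ca, cb, cg_trans))
```

In this toy case the two placements of the fourth atom differ by a half-turn about the central bond, which is the kind of change a ~180° rotation around χ1 describes.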
As mentioned, the zymogenic hairpin is only defined until residue Gly 294 and from Asp 302 onwards in the ΔN1pro-cd-InpA structure, so that the enclosed region Lβ12β13 is disordered. The position of the hairpin base is kept by interactions with surrounding elements. Strand β12 establishes three inter-main-chain contacts with β16. In turn, β13 interacts with the backing helix through a hydrogen bond (Ala 306 N-Ser 88 Oγ, 3.01 Å) and face-to-face ring stacking between His 305 and Trp 91. Removal of the backing helix leads to a rearrangement of the zymogenic hairpin due to a rotation of ~45°, producing a maximal displacement of ~7 Å (measured at Ala 303 Cα). In addition, the hairpin becomes rigid and fully defined by electron density (Fig. 4, C-E). The hairpin-constituting β-strands are extended to Gly 297 (β12) and from Gln 301 onwards (β13) and give rise to a new intra-main-chain interaction (Ser 295 N-Ala 303 O). These changes carry along the activatory reorientation of His 305 (Fig. 4E).
Another important element shaping the histidine cage in the zymogen is the latency flap, which anchors the catalytic histidine in the non-competent position through a hydrogen bond (Glu 348 Oε1-His 305 Nε2). In addition, the latency flap interacts with the backing helix through three hydrogen bonds, a hydrophobic interaction, and a small hydrophobic cluster made up by