Structure and Mechanism of Cysteine Peptidase Gingipain K (Kgp), a Major Virulence Factor of Porphyromonas gingivalis in Periodontitis*

Background: The cysteine peptidase gingipain K is a major proteolytic virulence factor of Porphyromonas gingivalis.
Results: The structure of the catalytic and immunoglobulin-type domains has been solved in complex with a covalent inhibitor.
Conclusion: A distinct S1 pocket explains its high specificity for lysines.
Significance: The structural details reveal the working mechanism and may lead to the design of drugs to selectively treat periodontitis.

Cysteine peptidases are key proteolytic virulence factors of the periodontopathogen Porphyromonas gingivalis, which causes chronic periodontitis, the most prevalent dysbiosis-driven disease in humans. Two peptidases, gingipain K (Kgp) and R (RgpA and RgpB), which cleave after lysines and arginines, respectively, collectively account for 85% of the extracellular proteolytic activity of P. gingivalis at the site of infection. Therefore, they are promising targets for the design of specific inhibitors. Although the structure of the catalytic domain of RgpB is known, little is known about Kgp, which shares only 27% sequence identity with it. We report the high resolution crystal structure of a competent fragment of Kgp encompassing the catalytic cysteine peptidase domain and a downstream immunoglobulin superfamily-like domain, which is required for folding and secretion of Kgp in vivo. The structure, which strikingly resembles a tooth, was serendipitously trapped with a fragment of a covalent inhibitor targeting the catalytic cysteine. This provided accurate insight into the active site and suggested that catalysis may require a catalytic triad, Cys477-His444-Asp388, rather than the cysteine-histidine dyad normally found in cysteine peptidases. In addition, a 20-Å-long solvent-filled interior channel traverses the molecule and links the bottom of the specificity pocket with the molecular surface opposite the active site cleft.
This channel, absent in RgpB, may enhance the plasticity of the enzyme, which would explain its much lower in vitro activity than that of RgpB toward comparable specific synthetic substrates. Overall, the present results reveal the architecture and molecular determinants of the working mechanism of Kgp, including its interaction with substrates.

Bacteria are normally part of the commensal flora that is generally beneficial for human health (1). However, in a susceptible host, some are pathogenic and invade cells and tissues, causing infection and disease. Moreover, the emergence of resistant strains, which are currently responsible for half of all infections, has exacerbated the danger (2). In the United States alone, resistant pathogens infect at least 2 million people every year, which makes such infections more common than cancer, and cause 23,000 deaths (2). The only way to keep pace with these extremely adaptive pathogens is through a continuous effort to develop new antimicrobials. Inexplicably, however, the pharmaceutical industry has neglected the development of new antibiotics in recent decades: only four new drug applications were approved by the United States Food and Drug Administration in 2005-2012 (3). Academic research must therefore fill the gap.
Among the most prevalent human bacterial commensals turned into pathogens is Porphyromonas gingivalis, a Gram-negative oral anaerobe that causes periodontitis, an inflammatory disease that afflicts half the adult population in the United States, destroys the gums, and leads to tooth loss (4). It was even detected in the 5,300-year-old mummy of the Tyrolean Iceman "Ötzi" in what may well be the earliest report of gingival infection in Homo sapiens (5). P. gingivalis invades periodontal tissues by colonizing the gingival sulcus and proliferating in the subgingival plaque. It evades the host defense mechanisms through a panel of virulence factors that deregulate innate immune and inflammatory responses. In addition, bacteria and their products can enter the circulation, contributing to the development and severity of systemic diseases at distal sites, such as cardiovascular diseases (6), rheumatoid arthritis (7), diabetes (8), and preterm delivery (9). Currently, specific treatment of severe periodontitis consists only of curettage of the affected area, which is time-consuming, painful, and needs frequent repetition (10), and of adjunct doxycycline hyclate (Periostat), which targets matrix metalloproteinases and was approved by the Food and Drug Administration in 1988 (11). Consequently, there is an urgent need for the development of novel therapeutic approaches.
Peptidases are a substantial part of the infective armamentarium of P. gingivalis (12)(13)(14). Most are cysteine peptidases, and the best characterized are the gingipains K (alias Kgp) and R (RgpA and RgpB) (4,15), which are major virulence factors of the pathogen (16). Gingipains are cell surface-anchored or soluble and are responsible for up to 85% of the total extracellular proteolytic activity of P. gingivalis (17). This activity mediates nutrient acquisition, cleavage of host cell surface receptors, signaling via protease-activated receptors, and inactivation of cytokines and components of the complement system. The pathogen thus keeps host bactericidal activity in check and maintains chronic inflammation (4). In particular, Kgp cleaves many constituents of human connective tissue and plasma, including immunoglobulins; fibronectin; plasma kallikrein; fibrinogen; iron-, heme-, and hemoglobin-transporting proteins; and peptidase inhibitors, thus contributing to bleeding and vascular permeability as well as to heme and iron uptake by the bacterium (18,19). Further pathophysiologically relevant substrates of Kgp include cadherins at the cell adherence junction, membrane TNFα, interleukin-8, the interleukin-6 receptor, thrombomodulin, complement regulatory protein CD46, and osteoprotegerin (18). Kgp thus contributes far more to the pathogenicity of P. gingivalis than any other peptidase (20) and is essential for bacterial survival and the pathological outcome of periodontitis (21). This was further confirmed by the reduced bacterial virulence observed in a mouse model of infection upon specific inhibition of Kgp (22). Accordingly, Kgp, like RgpA and RgpB, is a promising target for the development of therapeutic inhibitors to treat periodontitis (18,23).
Functionally, RgpA and RgpB specifically cleave bonds after arginines, whereas Kgp cleaves after lysines (21,24). Structurally, these enzymes are translated as multidomain proteins made up of at least a signal peptide, a prodomain, a catalytic domain (CD), an immunoglobulin-superfamily domain (IgSF), and a C-terminal domain. RgpB shows just this minimal configuration (21). RgpA has four additional hemagglutinin/adhesion domains (termed RgpA-A1 to RgpA-A4) inserted between the IgSF and the C-terminal domains. Kgp in turn has three to five such domains (termed KgpA-A1 to KgpA-A5), depending on the bacterial strain, thus spanning up to 1,723-1,732 residues (21). Both Kgp and RgpA are subjected to extensive post-translational proteolytic processing and are secreted as non-covalent but very tight complexes of the catalytic and hemagglutinin/adhesion domains, which are held together through oligomerization motifs (25).
Detailed structural and functional knowledge of target virulence factors at the molecular level can lead to the development of new drugs following rational drug design strategies (26). Atomic structural data are available for the catalytic and IgSF domains of RgpB, for both a zymogen complex and the active form (27,28), and for the ancillary hemagglutinin/adhesion domains KgpA-A1, KgpA-A2, and KgpA-A3 of Kgp (29,30). The latter, however, do not provide insight into the proteolytic function and mechanism of Kgp. Given the importance of the distinct but complementary cleavage specificities of RgpB and Kgp, which may be related to the differences between their respective CD+IgSF moieties (27% identity; see Fig. 1), we analyzed the three-dimensional structure of a catalytically competent 455-residue fragment of Kgp from P. gingivalis strain W83 (hereafter Kgp(CD+IgSF)) and assessed its molecular determinants of action and specificity.

EXPERIMENTAL PROCEDURES
Protein Production-Kgp(CD+IgSF) of P. gingivalis strain W83 (sequence Asp229-Pro683; see UniProt database accession number Q51817) plus a C-terminal hexahistidine tag was purified by affinity chromatography on nickel-Sepharose beads from the culture medium of P. gingivalis mutant strain ABM1, which expresses recombinant Kgp with one oligomerization motif disrupted by hexahistidine insertion (25,31). In contrast to wild-type strain W83, which secretes only heterooligomeric complexes of Kgp and RgpA, strain ABM1 can release soluble and functional Kgp fragments into the medium (25). This facilitates purification of a protein variant that is compatible with crystallization. To avoid autoproteolysis, the sample was incubated with Nα-tosyl-L-lysinylchloromethane (TLCK; Sigma) prior to elution from the beads.
Crystallization and Diffraction Data Collection-Crystallization assays were performed by the sitting-drop vapor diffusion method. Reservoir solutions were prepared by a Tecan robot, and 100-nl crystallization drops were dispensed on 96 × 2-well MRC plates (Innovadyne) by a Phoenix nanodrop robot (Art Robbins) or a Cartesian Microsys 4000 XL robot (Genomic Solutions) at the Automated Crystallography Platform at Barcelona Science Park. Plates were stored in Bruker steady-temperature crystal farms at 4 and 20°C. Successful conditions were scaled up to the microliter range in 24-well Cryschem crystallization dishes (Hampton Research). The best crystals were obtained at 20°C from 2:1-µl drops of protein solution (at 5.7 mg/ml in 5 mM Tris·HCl, pH 7.4, 0.02% sodium azide) and reservoir solution (22% polyethylene glycol 8000, 0.1 M sodium cacodylate, pH 6.5, 0.2 M calcium acetate). Crystals were cryoprotected by immersion in harvesting solution (18% polyethylene glycol 8000, 0.08 M sodium cacodylate, pH 6.5, 0.16 M calcium acetate, 20% (v/v) glycerol). A complete diffraction data set was collected from a liquid N2 flash-cryocooled crystal at 100 K (cooled by an Oxford Cryosystems 700 series cryostream) on an ADSC Q315R charge-coupled device detector at beam line ID14-4 of the European Synchrotron Radiation Facility (Grenoble, France) within the Block Allocation Group "BAG Barcelona." This crystal was orthorhombic, contained one Kgp(CD+IgSF) moiety per asymmetric unit (V_M = 2.2 Å³/Da; 44% solvent content (32)), and was tightly packed (for comparison, see e.g. Ref. 33). Diffraction data were integrated, scaled, merged, and reduced with programs XDS and XSCALE (34) (see Table 1).
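The 44% solvent content quoted above follows from the Matthews coefficient (32). As a minimal sketch, assuming the usual Matthews relation solvent fraction = 1 - 1.23/V_M (the 1.23 constant reflects a typical protein partial specific volume), V_M and the solvent fraction can be computed as follows. The cell volume and molecular mass in the usage lines are placeholders, not values from this crystal.

```python
def matthews_coefficient(cell_volume_a3: float, z: int, mass_da: float) -> float:
    """V_M in A^3/Da = unit cell volume / (molecules per unit cell * molecular mass)."""
    return cell_volume_a3 / (z * mass_da)


def solvent_fraction(v_m: float) -> float:
    """Matthews estimate of the solvent fraction: 1 - 1.23 / V_M."""
    return 1.0 - 1.23 / v_m


# P2(1)2(1)2(1) has four asymmetric units per cell; with one molecule per
# asymmetric unit, Z = 4. Cell volume and mass below are placeholders.
v_m = matthews_coefficient(cell_volume_a3=440_000.0, z=4, mass_da=50_000.0)
print(f"V_M = {v_m:.2f} A^3/Da, solvent = {solvent_fraction(v_m):.0%}")
# prints: V_M = 2.20 A^3/Da, solvent = 44%
```

With V_M = 2.2 Å³/Da, the relation gives 1 - 1.23/2.2 ≈ 0.44, matching the reported solvent content.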
Structure Solution and Refinement-The structure of Kgp(CD+IgSF) was solved by likelihood-scoring molecular replacement with the program Phaser (35) using the protein part of the structure of RgpB(CD+IgSF) of P. gingivalis strain HG66 (GenBank™ accession number AAB41892; Protein Data Bank code 1CVR (28)) as the search model. Side chains were trimmed from the model with the program CHAINSAW within the CCP4 suite (36) based on a sequence alignment performed with MULTALIN (37). A two-body search was performed with the CD and the IgSF separately to obtain suitable phases. These calculations yielded two unambiguous solutions at final Eulerian angles (α, β, and γ) of 225.7°, 87.3°, and 30.7° and of 226.2°, 92.0°, and 34.2°, and at fractional cell coordinates (x, y, and z) of 0.019, 0.265, and 0.096 and of 0.959, 0.259, and 0.090, respectively. The initial rotation/translation function Z-scores were 7.9/10.1 and 5.4/7.2, respectively, which confirmed P2₁2₁2₁ as the correct space group. A Fourier map calculated with the appropriately rotated and translated model was then subjected to density modification and model extension with ARP/wARP (38). The model obtained was completed through successive rounds of manual model building with programs TURBO-FRODO (39) and Coot (40) and crystallographic refinement with program BUSTER-TNT (41), which included translation/libration/screw refinement (one translation/libration/screw group for each domain), until completion of the model. The final model contained Kgp residues Asp229-Val680 (the last three residues and the hexahistidine tag were not visible in the final map) plus four glycerol, one standalone histidine, two acetate, and three azide molecules, in addition to 533 solvent molecules and six (tentatively assigned) cations: two calcium, three sodium, and one nickel ion (see Table 1).
Of these, the two calcium ions (numbered Ca999 and Ca998) and two of the sodium ions (numbered Na997 and Na996) are intrinsic parts of the structure and are described under "Results and Discussion." The nickel ion, in turn, was observed at the interface between two symmetry-related molecules, tetrahedrally coordinated by the carboxylate oxygens of Glu379, of a symmetry-related Glu355, and of an acetate from the mother liquor, as well as by the Nε2 atom of an isolated histidine, possibly resulting from digestion of the C-terminal hexahistidine tag during purification. The third sodium ion was bound 10 Å away from the nickel by the main-chain carbonyl of Thr415, four solvent molecules, and the carboxylate side chain of a symmetry-related Glu498 residue. We attribute these two metal sites to purification and crystallization artifacts.
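Each molecular replacement solution above is an orientation given as three Euler angles plus a fractional translation. As an illustration only, assuming Phaser reports Euler angles in the ZYZ convention, R = Rz(α)Ry(β)Rz(γ) (check against the Phaser documentation before reusing the numbers), the sketch below turns the two angle triplets into rotation matrices and computes the angle of the rotation that relates them. Which triplet belongs to the CD and which to the IgSF is not stated in the text, so they are labeled solution 1 and solution 2.

```python
import math


def rot_z(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_y(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def euler_zyz(alpha_deg, beta_deg, gamma_deg):
    """Rotation matrix R = Rz(alpha) Ry(beta) Rz(gamma) (assumed ZYZ convention)."""
    a, b, g = (math.radians(x) for x in (alpha_deg, beta_deg, gamma_deg))
    return matmul(matmul(rot_z(a), rot_y(b)), rot_z(g))


def rotation_angle_between(r1, r2):
    """Angle (degrees) of the relative rotation R1^T R2, obtained from its trace."""
    rel = matmul(transpose(r1), r2)
    trace = rel[0][0] + rel[1][1] + rel[2][2]
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp rounding noise
    return math.degrees(math.acos(cos_theta))


sol1 = euler_zyz(225.7, 87.3, 30.7)  # first solution reported above
sol2 = euler_zyz(226.2, 92.0, 34.2)  # second solution reported above
print(f"orientations differ by {rotation_angle_between(sol1, sol2):.1f} degrees")
```

Comparing rotation matrices rather than raw angle triplets avoids the ambiguity that different Euler triplets can describe the same orientation.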
Although the main chain and most of the side chains of the entire molecule were fully defined in the final Fourier map because of the high resolution and quality of the diffraction data, the CD moiety was more rigid and better defined than the downstream IgSF, as indicated by its lower average thermal displacement parameter (13.6 versus 34.5 Å²; see Table 1). This is reminiscent of the structure of RgpB(CD+IgSF) in complex with its prodomain (27). The side chain of the catalytic cysteine residue (Cys477) showed additional density (see "Results and Discussion"). Moreover, the side chain of Met594 showed alternate conformations. All other sulfur-containing side chains (three cysteines and 10 methionines) were apparently unaltered according to the final Fourier map. The only Ramachandran outlier of the structure was Ala443 (Table 1), which, however, was unambiguously defined by the final Fourier map and had similar main-chain angles in the RgpB structures (27,28). Three proline residues were found in cis conformation (Pro241, Pro424, and Pro453).

Table 1 footnotes (table body not reproduced): (a) $I_i(hkl)$ is the $i$th intensity measurement and $n_{hkl}$ is the redundancy of reflection $hkl$, including symmetry-related reflections, and $\langle I(hkl)\rangle$ is its average intensity. $R_{r.i.m.}$ (alias $R_{meas}$) and $R_{p.i.m.}$ are improved multiplicity-weighted indicators of data quality, the redundancy-independent merging R factor and the precision-indicating merging R factor, respectively; the latter is computed after averaging over multiple measurements (for details, see Ref. 87). (b) According to Karplus and Diederichs (88). (c) Crystallographic R factor $= \sum_{hkl} \big||F_{obs}| - k|F_{calc}|\big| / \sum_{hkl} |F_{obs}|$, where $k$ is a scaling factor and $F_{obs}$ and $F_{calc}$ are the observed and calculated structure factor amplitudes, respectively. This factor is calculated for the working-set reflections; the free R factor is the same but for a test set of reflections (>500) not used during refinement. (d) According to MolProbity (47).
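The data-quality indicators defined in the Table 1 footnotes can be illustrated with a short sketch. It assumes the standard multiplicity-weighted definitions, in which the deviation sum for each reflection is scaled by $\sqrt{n/(n-1)}$ for $R_{meas}$ and by $\sqrt{1/(n-1)}$ for $R_{p.i.m.}$, and it skips singly measured reflections, which is one common convention. The function names and the toy intensities are hypothetical and do not come from XDS/XSCALE or from this data set. The free R factor is the same crystallographic R calculation applied to the test-set reflections.

```python
from statistics import mean


def merging_r_factors(observations):
    """Return (R_merge, R_meas, R_pim) for {hkl: [I_1, I_2, ...]}.

    Reflections measured only once are skipped (one common convention),
    since they carry no information on internal agreement.
    """
    num_merge = num_meas = num_pim = denom = 0.0
    for intensities in observations.values():
        n = len(intensities)
        if n < 2:
            continue
        avg = mean(intensities)
        deviation = sum(abs(i - avg) for i in intensities)
        num_merge += deviation
        num_meas += (n / (n - 1)) ** 0.5 * deviation  # multiplicity-corrected
        num_pim += (1 / (n - 1)) ** 0.5 * deviation   # precision of the mean
        denom += sum(intensities)
    if denom == 0:
        raise ValueError("no multiply measured reflections")
    return num_merge / denom, num_meas / denom, num_pim / denom


def crystallographic_r(f_obs, f_calc, k=1.0):
    """R = sum | |F_obs| - k|F_calc| | / sum |F_obs| over the given reflections."""
    num = sum(abs(abs(fo) - k * abs(fc)) for fo, fc in zip(f_obs, f_calc))
    return num / sum(abs(fo) for fo in f_obs)


# Toy data (not from this study): two reflections, one measured twice.
print(merging_r_factors({(1, 0, 0): [10.0, 12.0], (0, 1, 0): [5.0]}))
print(crystallographic_r([100.0, 50.0], [90.0, 55.0]))
```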
Miscellaneous-The structure-based sequence alignment in Fig. 1 was performed with the program EXPRESSO within T-COFFEE version 10.0 (42) and represented with program ESPript 3.0 (43). Ideal coordinates and parameters for crystallographic refinement of non-standard ligands were obtained from the PRODRG server (44). Structural similarity searches were performed with Dali (45), and structure figures were prepared with programs Coot and Chimera (46). The model was validated with MolProbity (47). The final coordinates of P. gingivalis Kgp(CD+IgSF) have been deposited in the Protein Data Bank under code 4RBM.

RESULTS AND DISCUSSION
Overall Structure of Kgp Catalytic Domain-The structure of Kgp(CD+IgSF) is elongated, with approximate maximal dimensions of 75 × 50 × 45 Å (Fig. 2A). Curiously, it resembles a tooth, with the CD forming the crown and the IgSF the root (Fig. 2B). The neck is the interface between the two domains, and the active site is at the cusp, on the grinding surface (see below).
The globular CD (Asp229-Pro600; see Figs. 1 and 2A) is a competent cysteine peptidase domain and conforms to the α/β-hydrolase or PLEES fold (48,49). It contains four cation-binding sites (two sodium and two calcium ions; Fig. 3, A-C), which generally contribute to tertiary structure integrity (50). It is subdivided into a smaller N-terminal (or A) subdomain (NSD; Asp229-Lys375) and a larger C-terminal (or B) subdomain (CSD; Ser376-Pro600). The NSD starts on the left of the molecule (orientation hereafter according to Fig. 2A) with a small helical segment (α1; for regular secondary structure elements, see Figs. 1 and 2, A and D), and the polypeptide chain follows an extended trace downward along the surface. At Pro241, the chain makes a sharp turn upward and enters a four-stranded parallel pleated β-sheet (sheet I) through the second strand from the right (β1). This sheet (from left to right: β6-β3-β1-β2) has connectivity +1x,-2x,-1x according to Richardson (51) and is twisted by ~40° but not arched or curved. The NSD is a three-layer (αβα) sandwich; thus sheet I is flanked by two almost parallel helices on its right (α2 and α5) and two more (α3 and α4) on its left. Although in general the regular secondary structure elements are connected by tight loops, the one connecting β3 with β6 (Lβ3β6) exceptionally spans 30 residues and contains a β-ribbon (β4β5) and a calcium-binding site (Ca998) in addition to helix α3. This cation is oxygen-coordinated in an octahedral manner, as is usual for calcium (52), by Asp330 Oδ1, two solvent molecules, and, bidentately, the carboxylate oxygens of Glu343 in a plane with the ion, whereas Asp337 Oδ1 and the main-chain oxygen of Phe341 occupy the apical positions (Fig. 3B). Therefore, one in-plane position of the octahedron is split into two ligands, and the binding distances range between 2.31 and 2.55 Å, in line with typical values for calcium (2.36-2.39 Å (53)).
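Richardson's connectivity notation used for sheet I above (and for sheet II below) can be checked mechanically. The helper below, `strand_layout`, is a hypothetical illustration, not part of any structure-analysis package. It assumes that positive shifts point to the right in the orientation of Fig. 2A and that a step without "x" is a hairpin that reverses strand direction, whereas an "x" crossover keeps strands parallel. Under these assumptions it reproduces the left-to-right strand orders given in the text.

```python
def strand_layout(strands, connectivity):
    """Place strands across a sheet from Richardson-style connections.

    strands: strand names in sequence order.
    connectivity: one step per consecutive pair, e.g. ["+1x", "-2x", "-1x"].
    Assumes positive shifts point right; a step without 'x' is a hairpin
    that flips the strand direction, a crossover ('x') keeps it.
    Returns [(name, "up"/"down")] ordered from left to right.
    """
    if len(connectivity) != len(strands) - 1:
        raise ValueError("need one connection per pair of consecutive strands")
    pos, up = 0, True
    placed = [(pos, strands[0], up)]
    for name, step in zip(strands[1:], connectivity):
        crossover = step.endswith("x")
        pos += int(step.rstrip("x"))
        if not crossover:
            up = not up
        placed.append((pos, name, up))
    placed.sort()
    return [(name, "up" if up else "down") for _, name, up in placed]


# Sheet I of the NSD, strands listed in sequence order.
print(strand_layout(["b1", "b2", "b3", "b6"], ["+1x", "-2x", "-1x"]))
```

For sheet II of the CSD, `strand_layout(["b7", "b8", "b9", "b13", "b14", "b15"], ["-1x", "+2x", "+1x", "+1x", "+1"])` places the strands as b8-b7-b9-b13-b14-b15, with only b15 pointing down, as described in the text.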
Fig. 1 legend (partial; figure not reproduced): The amino acid numbering and the regular secondary structure elements (strands as black arrows labeled β1-β21 and helices as loops labeled α1-α14) above the alignment correspond to the Kgp(CD+IgSF) structure (this study); those below the alignment correspond to RgpB(CD+IgSF) (taken from Fig. 2c of Ref. 28). Identical residues are in bold white over black background, similar residues are in bold black over gray background, and the overall sequence identity is 27%. (Potential) catalytic residues of Kgp CD are pinpointed by an open rhombus; residues framing the S1 pocket are indicated by an arrow.

Fig. 2 legend (partial; figure not reproduced): B, picture of a tooth with its parts labeled. C, (2mFobs - DFcalc)-type Fourier map of the region around the catalytic cysteine Cys477 obtained with diffraction data to 1.75-Å resolution and contoured at 1σ above threshold. D, structure of Kgp(CD+IgSF) in cross-eye stereo in standard orientation (28,79), which corresponds to a horizontal 90° rotation of the view in A, i.e. viewing the CD cusp region. Regular secondary structure elements of the CD (helices α1-α14 and strands β1-β15) are labeled. The NSD is on top, and the CSD is at the bottom (see also Fig. 5D). E, close-up of D in stereo centered on the non-primed side of the active site. Residues framing the specificity pocket S1 and pocket S2 are labeled. Small green spheres represent solvent molecules.

After α5, at the front neck surface, the polypeptide enters the CSD with helix α6, which in turn leads to a central six-stranded, twisted (~40°) but not arched or curved, pleated β-sheet (sheet II; β8-β7-β9-β13-β14-β15 from left to right). The chain enters the sheet through the second strand from the left, and the sheet has connectivity -1x,+2x,+1x,+1x,+1; i.e. all strands are parallel and run upward except for the rightmost one, β15, which runs downward. The latter creates the junction with the NSD and runs parallel to the leftmost strand of sheet I but is horizontally rotated ~60° away, giving rise to a pseudo-continuous 10-stranded supersheet. Like the NSD, the CSD is a three-layer (αβα) sandwich, which contains the active site cleft (see below). Five helices (α7, α11, α12, α13, and α14) are found on the left side of the sheet, and three more (α8, α9, and α10) are on the right flank. Interestingly, helices α11 and α12 are aligned with respect to their axes and almost in phase. Such interrupted helices are exceedingly rare in protein structures (51), and only segment Gly522-Gly539 (like prolines, flanking glycines are observed in hinges (54)) prevents these helices from being a single continuous unit. This intercalated segment gives rise to an extended loop that protrudes from the molecular surface and folds back to cover helix α7 like a cape, and it contains a sodium-binding site (Na997).
This cation is octahedrally coordinated by six oxygen ligands, the most common coordination number for sodium (55): Thr536 Oγ, Val521 oxygen, Asp542 Oδ2, and a solvent molecule are coplanar with the metal, and Ser537 oxygen and Tyr402 O occupy the apical positions. Coordinating distances span 2.31-2.59 Å (Fig. 3C), which is consistent with the most common distances for sodium (2.38-2.41 Å (53)). Nearby, Lα12α13 contains a second octahedral oxygen-liganded sodium site (Na996) 13.4 Å from the former. This ion is bound at distances of 2.36-2.69 Å by Tyr550 oxygen, Asn551 Oδ1, Ala543 oxygen, and Leu546 oxygen in the plane and apically by Thr544 oxygen and Ser549 oxygen (Fig. 3C). As found for the segment connecting strands β3 and β6 in NSD sheet I, the segment connecting β9 and β13 in the CSD is elaborate and includes a small three-stranded antiparallel β-sheet (sheet III; β10, β11, and β12), which is almost perpendicular to sheet II. The downstream loop, Lβ13α10, contributes together with Lβ3β4 of the NSD to a second calcium site (Ca999), which may thus help maintain the structural integrity of the NSD-CSD interface. This calcium is bound by Phe482 oxygen, two solvent molecules, and, bidentately, the carboxylate oxygens of Glu491 in a plane with the ion. A further solvent molecule and, bidentately, the carboxylate oxygens of Asp313 occupy the respective apical positions (Fig. 3A). Therefore, two positions of the octahedron are each split into two ligands, and the binding distances range from 2.33 to 2.55 Å (Fig. 3A). After strand β15, at segment Asp590-Ser592, the polypeptide abruptly changes direction and runs horizontally outward from the interface with the NSD to the left molecular surface. At Ala598, the chain turns abruptly downward and leads to the interface between CD and IgSF at Pro600-Lys601.
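The coordination spheres described above are defined by metal-oxygen distances. A minimal sketch of how such a shell might be listed from atomic coordinates and compared with the typical ranges quoted in the text (ref. 53) follows. The 3.0-Å cutoff, the function names, and all coordinates are hypothetical illustrations, not values taken from PDB entry 4RBM.

```python
import math

# Typical metal-oxygen distances quoted in the text (ref. 53), in angstroms.
TYPICAL_M_O = {"Ca": (2.36, 2.39), "Na": (2.38, 2.41)}


def coordination_shell(metal_xyz, atoms, cutoff=3.0):
    """Return [(label, distance)] for atoms within `cutoff` of the metal, nearest first."""
    shell = [(label, math.dist(metal_xyz, xyz)) for label, xyz in atoms.items()]
    return sorted((item for item in shell if item[1] <= cutoff), key=lambda item: item[1])


def max_deviation_from_typical(element, distances):
    """Largest amount (angstroms) by which any distance falls outside the typical range."""
    lo, hi = TYPICAL_M_O[element]
    return max(max(lo - d, d - hi, 0.0) for d in distances)


# Hypothetical coordinates (not from PDB 4RBM), metal at the origin.
ligands = {
    "Thr OG1": (2.40, 0.00, 0.00),
    "main-chain O": (0.00, 2.45, 0.00),
    "water": (-2.35, 0.00, 0.00),
    "distant O": (4.10, 0.00, 0.00),
}
shell = coordination_shell((0.0, 0.0, 0.0), ligands)
print(len(shell), [f"{d:.2f}" for _, d in shell])
print(f"{max_deviation_from_typical('Na', [d for _, d in shell]):.2f}")
```

A real analysis would also check ligand angles to distinguish octahedral from other geometries; this sketch covers only distances.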
Finally, an internal channel is found within the CSD, vertically traversing the molecule over 20 Å from the bottom of the specificity pocket in the active site (see below) to the lower outer surface of the subdomain (Fig. 3D), where it emerges through a crater surrounded by Pro595, Asn551, and Arg597. It is filled with 13 solvent molecules, which are well defined in the final Fourier map (average thermal displacement parameter, 17.0 Å²; for comparison, the overall value for the CD is 13.6 Å², and that of all solvent molecules is 29.8 Å²). The outermost solvent molecule of the channel at the domain surface is bound by Ser669 Oγ from the downstream IgSF domain. This channel is embraced by sheet II and helices α7, α11+α12, and α13. Such extended solvent channels traversing the inner core of proteins are rare, and the role of this one, if any, is unknown. The channel is not wide enough to evacuate reaction products, as in, e.g., catalase (56) or the ribosome (57), and is too far from the active site cysteine to supply solvent for the deacylation step of catalysis. In addition, in the structurally related RgpB, this channel is replaced by a compact hydrophobic core (see Protein Data Bank code 1CVR (28)).
Solvent molecules buried in internal cavities, which are integral structural components of proteins, interchange with the external bulk solvent (58), thus conferring a "breathing" motion on a protein. By serving as mobile hydrogen bonding donors or acceptors, internal waters may facilitate transitions and structural rearrangements between different functional states (59), and they cluster at internal cavities of functional importance such as hinge regions or along channels (60). However, the generation of cavities inside a protein at places where a compact hydrophobic core is found in close structural relatives usually reduces stability (61), although water-filled cavities destabilize less than empty cavities because the water molecules may still interact favorably with neighboring protein residues (62). Conversely, the hydrogen bonding potential of a water molecule inside a protein structure is less exploited than in the aqueous phase, and moving a solvent molecule from bulk solvent to the interior of a protein entails entropic costs (63) and energy costs for hydration of the cavity (64), which in turn destabilize protein structures. In Kgp(CD+IgSF), the extended solvent channel could contribute more to the overall plasticity and flexibility of the enzyme than the compact hydrophobic core does in RgpB. Although some flexibility (at least around the active site) is a prerequisite for efficient catalysis (65), destabilization of the overall enzyme moiety contravenes the axiom that proteins must adopt a stable tertiary fold to be fully functional (66). This would be consistent with the much lower in vitro activity of Kgp compared with RgpB against comparable synthetic substrates mimicking their respective specificities (24,67). This, in turn, would apparently contradict the superior role of Kgp as a proteolytic virulence factor (20).
It must be kept in mind, however, that native Kgp occurs as a complex of the catalytic and hemagglutinin/adhesion domains, which work as a homing device to deliver Kgp to its targets and exert essential functions for P. gingivalis such as agglutination of red blood cells, acquisition of heme, and binding to the extracellular matrix (4,18,19).
Overall Structure and Similarity of Kgp Immunoglobulin Superfamily Domain-With Lys601, the polypeptide chain enters the IgSF, which is essential for folding of Kgp: no properly folded CD is detected by specific monoclonal antibodies if the IgSF is ablated, even though the truncated kgp gene is transcribed (68). In addition, only residual Kgp-specific activity is detected in such deletion mutants (68). Structurally, the IgSF consists of a six-stranded antiparallel open β-barrel adopting a Greek key topology for its first four strands (β16-β19-β18-β17), followed by a final β-ribbon (β20β21). The initial segment of the IgSF (Lys601-Pro608) runs in a rather extended conformation and partially closes the open side of the barrel, but because of a bulge at Pro608-Pro612, it forms only a single hydrogen bond with each of the neighboring strands β16 (Thr606 oxygen-Gln621 oxygen; 2.99 Å) and β21 (Lys601 oxygen-Leu672 nitrogen; 2.86 Å); strictly speaking, it therefore cannot be considered a proper β-strand (Fig. 2A). Overall, the IgSF fold corresponds to classic immunoglobulin-like domains, which usually function as cell adhesion molecules (69). In particular, it best fits into the C2 [...] (70). In addition, the macroglobulin-like MG domains are also similar (71).
The IgSF contacts the bottom of the CD through an interface that forms the neck of the tooth and involves Lα1β1, α2, Lα2β2, and the end of α5 of the CD, which fit into the concave outer surface of the IgSF barrel. In turn, Lβ20β21 of the IgSF inserts like a wedge between the CD C-terminal segment Ser592-Pro600 and the Na996-stabilized loop Lα12α13.
Catalytic Site and Active Site Cleft-Catalysis in Kgp occurs at the cusp of the tooth through binding of peptide substrates to the active site cleft (Fig. 2, A, D, and E). As in α/β-hydrolase or PLEES fold enzymes, the active site residues are provided by loops connecting strands at the C-terminal edge of the central β-sheet (here sheet II of the CSD). We serendipitously trapped the structure of Kgp(CD+IgSF) in a covalent reaction intermediate mimic, which was interpreted, based on the excellent quality of the Fourier map (Fig. 2C), as an L-lysinylmethyl (LM) moiety attached to the Sγ atom of the catalytic cysteine Cys477 (provided by Lβ13α10). This residue had been identified as the catalytic cysteine by active site labeling and confirmed by mutagenesis (72). LM introduces an extra methylene group between Cys477 Sγ and the carbonyl that mimics the scissile carbonyl (Fig. 2C), so it does not yield a thioester and cannot be hydrolyzed. This covalent modification resulted from the irreversible cysteine peptidase chloromethyl ketone inhibitor TLCK used during purification (see "Experimental Procedures"). Inhibition of cysteine peptidases by TLCK was first reported by Cohen and co-workers (73), and such chloromethyl ketones are routinely used during protein purification to prevent degradation (74). Notably, although these compounds target the catalytic cysteine in cysteine peptidases (75,76), they also inhibit serine peptidases, but by covalent attachment to the aromatic Nε2 of the respective catalytic histidines instead (77). In a second step, the nearby catalytic serine of serine peptidases may (78) or may not (77) attack the carbonyl of the Nε2-attached carboxymethyl moiety and yield a tetrahedral reaction intermediate-like product covalently bound to both serine Oγ and histidine Nε2.
The LM moiety enabled us to identify the active site cleft (Figs. 2, D and E, and 4). When viewed in the standard orientation of cysteine peptidases (28,79), i.e. with a substrate binding horizontally from left (non-primed side) to right (primed side), the cleft is rather flat and is delimited by His444-Glu447 from β10, Asp388-Val395 from Lβ7α7, and Phe527 at its bottom; Cys476-Ile478 from Lβ13α10 at its back; and Pro509-Trp513 from Lβ14α11 and Ile573-His575 from Lα13α14 at its top. Lβ9β10 contains His444, whose Nδ1 atom forms a strong hydrogen bond with the carbonyl oxygen of the LM moiety (2.61 Å apart; Figs. 2, D and E, and 4). This oxygen is close to where the scissile carbonyl oxygen is expected to be in a true substrate after the acylation step of catalysis. This is consistent with His444 playing a major role in catalysis, potentially as part of a charge relay system together with Cys477, as suggested previously (18).
Although most cysteine peptidases have a cysteine-histidine dyad as catalytic residues (80,81), the position and proximity of one of the carboxylate oxygens of Asp388 from Lβ7α7 to His444 Nε2 in the present structure (2.68 Å apart; Fig. 2, D and E) strongly suggest a role for this aspartate in catalysis in Kgp, as already described for the foot-and-mouth disease virus leader cysteine peptidase (82) and as postulated for RgpB. In the latter, however, Glu381 (RgpB numbering of P. gingivalis strain HG66 according to GenBank accession number AAB41892; see Fig. 1; subtract 229 to obtain the numbering used in Protein Data Bank code 1CVR and Ref. 28) is found instead of an aspartate (27,28). This hypothesis implies that Kgp has a catalytic triad, Cys477-His444-Asp388, and that the cleavage mechanism includes a thiolate-imidazolium ion pair making an oxyanion hole-assisted nucleophilic attack on the scissile peptide carbonyl in the acylation step (81). Alternatively, the imidazole may function as a general base and abstract a proton from the cysteine thiol group (81). In either case, the histidine imidazolium would thereafter transfer a proton to the leaving α-amino group of the downstream cleavage product, and the upstream part of the substrate would remain covalently bound as a thioester to the catalytic cysteine. We hypothesize that the aspartate would also have a role in the protonation, and thus the side-chain orientation, of the histidine imidazole/imidazolium during catalysis, as in serine peptidases. Unfortunately, site-directed mutagenesis failed to demonstrate the catalytic relevance of third residues in other cysteine peptidases such as papain (81), so this hypothesis remains to be verified by other methods.
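Because two RgpB numbering schemes coexist (GenBank AAB41892 for strain HG66 versus the mature-protein numbering of Protein Data Bank entry 1CVR), off-by-offset errors are easy to make when comparing residues. A trivial sketch using only the offset of 229 stated above follows; the function names are hypothetical.

```python
# Offset stated in the text: PDB 1CVR numbering = AAB41892 (strain HG66) numbering - 229.
RGPB_1CVR_OFFSET = 229


def rgpb_genbank_to_1cvr(residue_number: int) -> int:
    """Convert an RgpB residue number from GenBank AAB41892 to PDB 1CVR numbering."""
    return residue_number - RGPB_1CVR_OFFSET


def rgpb_1cvr_to_genbank(residue_number: int) -> int:
    """Convert an RgpB residue number from PDB 1CVR to GenBank AAB41892 numbering."""
    return residue_number + RGPB_1CVR_OFFSET


print(rgpb_genbank_to_1cvr(381))  # Glu381 in GenBank numbering -> 152 in 1CVR
```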
The complex of Kgp(CD+IgSF) with LM also revealed that the S1 specificity pocket can accommodate a lysine side chain whose ε-amino group is tetrahedrally bound by one of the carboxylate oxygens of Asp516 […]. Finally, the latter solvent molecule is further bound to an Asp516 oxygen (2.79 Å) and to Ser520 Oγ (2.73 Å). Ser520 Oγ in turn bridges the three-solvent cluster at the bottom of the pocket with the internal solvent channel (see above and Figs. 3D and 4). The aliphatic part of the lysine side chain of LM is sandwiched among the side chain of Trp513, the main chain at Asn475-Cys477, and Ser511 Oγ. Replacing the P1 lysine with an arginine, which would match the specificity of RgpB, would disrupt the salt bridge with Asp516 and the hydrogen bonds with the Asn475 oxygen and Thr442 Oγ, and would possibly cause clashes with the latter atom. This explains why Kgp is specific for lysines and not arginines. This lysine specificity resembles that of trypsin, and, like several trypsin-like serine peptidases, Kgp has been shown to cleave proteins involved in the blood coagulation/fibrinolysis cascade (24). In contrast to these serine peptidases, however, Kgp requires an anaerobic environment such as that found in the periodontal pockets of infected patients. Upstream of S1, the S2 pocket is shallow and small, framed by Tyr512, His575, Trp513, and Trp391 (Figs. 2, D and E, and 4). This explains why Kgp substrates do not have arginines or lysines at position P2 (18).
Structural Kinship of the CD

Despite low overall sequence identity, the Kgp CD is structurally most similar to the corresponding domain of the two essentially identical RgpB strain variants studied to date (strains HG66 and W83; Protein Data Bank codes 1CVR and 4IEF; Dali Z-scores, 44.6 and 43.2; root mean square deviations, 2.1 and 2.0 Å; alignment lengths, 336 and 328 residues; sequence identities, 24 and 26%; see also Fig. 1). Although the Kgp and RgpB CDs generally superpose well, in particular at their regular secondary structure elements (Fig. 5, A and B), they differ substantially in a number of loops, and the long solvent-filled internal channel of Kgp is absent in RgpB. Notable insertions, deletions, or substantially different chain traces are found at Kgp elements Lβ2α3, Lβ4β5, Lα5α6 and the interface between NSD and CSD, Lβ7α7, Lα7β8, Lβ8α8, Lα10β14, Lα11α12, Lα12α13, and Lα13α14. In particular, Lβ7α7 of RgpB contains a long flap, Gly382-Ser391 (absent in Kgp), that partially replaces the elongated cap Gly522-Gly539 of Kgp (in turn missing in RgpB). Consistently, the two Kgp helices α11 and α12 form a continuous single helix in RgpB. Overall, these differences also entail that, although RgpB shares the two calcium sites of Kgp (but none of its sodium sites), it has an extra calcium site in the region flanked by α7 and α11-Lα11α12-α12 in the CSD of the CD. Notably, although the catalytic activity of RgpB is abolished by the calcium chelator EDTA (83), Kgp remains unaffected (24). This suggests that the chelator targets the extra calcium site of RgpB, whereas the two common calcium sites likely remain occupied in both proteins. These differences also directly affect Lβ7α7, which provides the putative third catalytic residue, the acidic Asp388 (see above). This loop protrudes slightly more in Kgp. Because the catalytic cysteines and histidines occupy nearly identical positions in both structures (Fig. 5C), this explains why an aspartate suffices in Kgp to approach the catalytic histidine, whereas the longer glutamate side chain is required in RgpB (see Fig. 5, B and C).
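The root mean square deviation and sequence identity values quoted above are summary statistics over residues that a structural aligner has paired. The minimal Python sketch below shows what the two numbers measure, assuming the structures have already been superposed and paired residue by residue (no fitting is done here). It is not how Dali computes its Z-score, identity conventions differ between tools, and the toy coordinates and sequences are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD (Å) between two equally long lists of already-superposed atoms.

    ASSUMPTION: the structures have already been superposed and the atoms
    paired residue by residue; no fitting is done here.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("need two non-empty coordinate lists of equal length")
    sq_sum = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(sq_sum / len(coords_a))


def percent_identity(aln_a, aln_b):
    """Sequence identity (%) over aligned columns where neither sequence has a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    identical = sum(1 for x, y in columns if x == y)
    return 100.0 * identical / len(columns)


# HYPOTHETICAL toy data: two paired Cα traces and a short alignment.
trace_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
trace_b = [(0.0, 0.0, 0.0), (3.8, 2.0, 0.0)]
print(f"RMSD: {rmsd(trace_a, trace_b):.3f} Å")  # sqrt((0 + 4) / 2)
print(f"identity: {percent_identity('ACDE-G', 'ACNEKG'):.0f}%")  # 4 of 5 columns
```

Counting identity only over gap-free columns is one common convention; dividing by the length of one full sequence instead would give a lower percentage for the same alignment.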
Substantial differences are also found in the specificity pockets. Interestingly, although in both structures an aspartate salt-bridges the tip of the specific lysine or arginine (Asp516 in Kgp or Asp392 in RgpB, respectively), neither is placed at the bottom of the pocket; instead, they lie on opposite side walls (in the superposition, Asp392 is close to Gly396 and Asp516 is close to Pro516 of the respective other protein; see Fig. 5C), so that the Cγ atoms of these aspartates are ~8 Å apart. As a consequence, whereas the lysine is bound in an extended conformation in Kgp, the arginine side chain in RgpB is rotated clockwise around Cδ-Nε by ~50° to meet Asp392 through a double salt bridge via its Nη1 and Nη2 atoms. This is enabled by the presence of Val471 and Met517 and by a rearrangement of RgpB region Ser507-Gln520 (Ser507-Ser520 in Kgp), which relocates Trp513 and thus avoids a clash with the arginine (Fig. 5C). In summary, although the CD structures of Kgp and RgpB are quite similar, structural peculiarities account for their different specificities and help to explain their distinct catalytic efficiencies against comparable substrates (68).
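A side-chain rotation "around Cδ-Nε" corresponds to a change in the torsion angle about that bond (for arginine, χ4, defined by Cγ-Cδ-Nε-Cζ). The minimal Python sketch below computes a torsion angle and rotates an atom about a bond axis using Rodrigues' rotation formula, then confirms that a 50° rotation changes the torsion by 50° while preserving the bond length. The four coordinates are hypothetical toy values, not taken from Kgp or RgpB, and the sign convention of the computed angle is not meant to reproduce the "clockwise" description in the text.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def _scale(a, s):
    return tuple(x * s for x in a)


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(a):
    return _scale(a, 1.0 / math.sqrt(_dot(a, a)))


def dihedral(p0, p1, p2, p3):
    """Torsion angle p0-p1-p2-p3 about the p1-p2 bond, in degrees (-180 to 180)."""
    b0 = _sub(p0, p1)
    b1 = _unit(_sub(p2, p1))
    b2 = _sub(p3, p2)
    # Project the two outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def rotate_about_bond(point, axis_start, axis_end, angle_deg):
    """Rotate `point` by angle_deg about the axis_start -> axis_end line (Rodrigues)."""
    k = _unit(_sub(axis_end, axis_start))
    v = _sub(point, axis_start)
    t = math.radians(angle_deg)
    rotated = _add(_add(_scale(v, math.cos(t)), _scale(_cross(k, v), math.sin(t))),
                   _scale(k, _dot(k, v) * (1.0 - math.cos(t))))
    return _add(axis_start, rotated)


# HYPOTHETICAL toy coordinates (Å) for Cγ, Cδ, Nε, and Cζ of an arginine.
CG, CD, NE, CZ = (1.0, 0.0, -0.5), (0.0, 0.0, 0.0), (0.0, 0.0, 1.5), (-1.0, 0.0, 2.0)
CZ_rot = rotate_about_bond(CZ, CD, NE, 50.0)

print(f"chi4 before: {dihedral(CG, CD, NE, CZ):.1f}°")
print(f"chi4 after:  {dihedral(CG, CD, NE, CZ_rot):.1f}°")
```

In a real model, the Nη1 and Nη2 atoms attached to Cζ would be rotated about the same axis, so the whole guanidinium group swings toward the relocated aspartate.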
In the MEROPS classification, the CDs of gingipains belong to cysteine peptidase family C25 (80,84), which has been grouped together with families C11 (clostripain), C13 (legumain), C14 (caspases), C50 (separases), C80 (RTX self-cleaving toxin from Vibrio cholerae), and C84 (PrtH peptidase from Tannerella forsythia) into clan CD. All these families share the following properties (80): they are broadly distributed across all kingdoms of life; the catalytic histidine is found in a histidine-glycine motif preceded by a block of four hydrophobic residues (motif II according to Aravind and Koonin (85)); and the catalytic cysteine is found in an alanine-cysteine motif (exceptionally cysteine-cysteine in Kgp) preceded by a second block of four hydrophobic residues (motif III according to Aravind and Koonin (85)). In addition, they are specific for the residue at position P1 of substrates (arginine, lysine, asparagine, or aspartate, depending on the family) and are resistant to the broad-spectrum cysteine peptidase inhibitor E-64 but susceptible to chloromethyl ketone inhibitors (80). Structurally, however, the Kgp and RgpB CDs are unlike the members of any other cysteine peptidase clan and any other protein characterized to date. A structure-based search identified legumain as the closest structural relative of Kgp and RgpB (Fig. 5D), but it matches only the CSD and shows negligible sequence identity (Protein Data Bank codes 4AW9, 4AWA, and 4AWB; Dali Z-score, 13.5; root mean square deviation, 2.8 Å; alignment length, 173 residues; sequence identity, 14%). Some similarity of the entire CD of Kgp and RgpB is also found with the 2+2 heterotetramers of caspases (see Fig. 6 in Ref. 28).
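The two clan CD sequence motifs described above lend themselves to a simple pattern search. The Python sketch below scans a protein sequence with regular expressions for a block of four hydrophobic residues followed by His-Gly (motif II) or by Ala-Cys/Cys-Cys (motif III), and reports the positions of the candidate catalytic His and Cys. The hydrophobic residue set and the toy sequence are assumptions made for illustration only; a fixed regular expression is a simplification and would both miss and over-call motifs in real sequences.

```python
import re

# ASSUMPTION: the set of residues treated as "hydrophobic" is a choice made for
# this sketch, not necessarily the definition used by Aravind and Koonin.
HYDROPHOBIC = "AILMFVWY"

# Motif II: four hydrophobic residues, then the catalytic His followed by Gly.
MOTIF_II = re.compile(f"(?=[{HYDROPHOBIC}]{{4}}(H)G)")
# Motif III: four hydrophobic residues, then Ala-Cys (or Cys-Cys, as in Kgp);
# the second residue of the pair is the catalytic Cys.
MOTIF_III = re.compile(f"(?=[{HYDROPHOBIC}]{{4}}[AC](C))")


def catalytic_candidates(seq):
    """Return 1-based positions of candidate catalytic His and Cys residues."""
    seq = seq.upper()
    his = [m.start(1) + 1 for m in MOTIF_II.finditer(seq)]
    cys = [m.start(1) + 1 for m in MOTIF_III.finditer(seq)]
    return {"His": his, "Cys": cys}


# HYPOTHETICAL toy sequence containing one copy of each motif
# (the Cys-Cys variant for motif III, as found in Kgp).
toy = "MKT" + "VLIAHG" + "SSD" + "FLVICC" + "GK"
print(catalytic_candidates(toy))
```

The lookahead `(?=...)` lets overlapping matches be found, and the capturing group inside it marks which residue of the motif is the putative catalytic one.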
Conclusion

The present detailed structural analysis of an essential virulence factor of a major human periodontopathogen has revealed the molecular determinants of its mode of action and specificity. This may lay the foundations for the rational design of specific inhibitors (distinct from those against RgpB) that may curtail the survival of the pathogen and palliate the effects of periodontal disease and its associated systemic disorders. This approach is complementary to the one aimed at developing inhibitors that simultaneously target both types of gingipains (23).

[Fig. 5 legend, partially recovered: the two calcium and two sodium ions of Kgp are shown as red and blue spheres, respectively, and the three calcium ions of RgpB as green spheres; catalytic active site residues are shown as sticks with carbons in the respective ribbon colors. B, superposition as in A but in standard orientation. C, close-up of B focusing on the active sites; RgpB is covalently modified at its catalytic cysteine Sγ atom by a Phe-Phe-Arg-CH2 …]