Structure- and Substrate-based Inhibitor Design for Clostridium botulinum Neurotoxin Serotype A*

The seven antigenically distinct serotypes of Clostridium botulinum neurotoxins cleave specific soluble N-ethylmaleimide-sensitive factor attachment protein receptor complex proteins and block the release of neurotransmitters, causing flaccid paralysis; they are considered potential bioweapons. Botulinum neurotoxin type A is the most potent among the clostridial neurotoxins, and to date there is no post-exposure therapeutic intervention available. To develop inhibitors leading to drug design, it is imperative that critical interactions between the enzyme and the substrate near the active site are known. Although enzyme-substrate interactions at exosites away from the active site are mapped in detail for botulinum neurotoxin type A, information about the active site interactions is lacking. Here, we present the crystal structures of botulinum neurotoxin type A catalytic domain in complex with four inhibitory substrate analog tetrapeptides, viz. RRGC, RRGL, RRGI, and RRGM, at resolutions of 1.6–1.8 Å. These structures show for the first time the interactions between the substrate and enzyme at the active site and delineate residues important for substrate stabilization and catalytic activity. We show that OH of Tyr366 and NH2 of Arg363 are hydrogen-bonded to carbonyl oxygens of P1 and P1′ of the substrate analog and position it for catalytic activity. Most importantly, the nucleophilic water is replaced by the amino group of the N-terminal residue of the tetrapeptide. Furthermore, the S1′ site is formed by Phe194, Thr215, Thr220, Asp370, and Arg363. The Ki of the best inhibitory tetrapeptide is 157 nM.

Botulinum (BoNT) and tetanus neurotoxins are causative agents of botulism and tetanus, serious neurological disorders. Their LD50 values in humans are in the range of 0.1-1 ng/kg (1, 2), which makes them the most poisonous substances known. They are listed as Category A bioterror agents by the Centers for Disease Control and Prevention. Of the seven serotypes (A to G) produced by Clostridium botulinum, BoNT/A, BoNT/B, and BoNT/E (and possibly BoNT/C and BoNT/F) have been implicated in cases of botulism in humans, with BoNT/A being the most potent of them. Clostridial neurotoxins (CNTs) are synthesized as single inactive polypeptide chains (150 kDa) and released as active dichains (heavy, 100 kDa; light, 50 kDa) held together by an interchain disulfide bond after cleavage by proteinases (3-6). BoNTs act via a four-step process involving (i) cell binding, (ii) internalization, (iii) translocation into the cytosol, and (iv) enzymatic modification of a cytosolic target (7-11). BoNTs block the release of acetylcholine at the neuromuscular junction, causing flaccid paralysis, whereas tetanus neurotoxin blocks the release of neurotransmitters such as glycine and γ-aminobutyric acid in inhibitory interneurons of the spinal cord, resulting in spastic paralysis. Despite different clinical symptoms, these etiological agents intoxicate neuronal cells in the same way and have similar functional and structural organizations (12-15). The light chain is the catalytic domain and acts as a zinc endopeptidase on specific components of the neuroexocytosis apparatus in the cytosol of the target cell. BoNT/A and BoNT/E cleave SNAP-25 (synaptosomal-associated protein-25 kDa), whereas BoNT/B, BoNT/D, BoNT/F, and BoNT/G cleave the vesicle-associated membrane protein at specific peptide bonds. BoNT/C is unique because it is the only CNT capable of cleaving two substrates, SNAP-25 and syntaxin (16).
Because they are Category A bioterror agents, it is imperative that countermeasures are developed to block their toxicity by targeting any one of the four steps in their mechanism of action. Here, we present the design and structural basis of inhibitors to block the catalytic activity of the BoNT/A serotype. Although it is desirable to have a full-length BoNT/A light chain structure for inhibitor studies, the full-length light chain was not amenable to crystallization (17). We therefore cloned a truncated BoNT/A light chain (residues 1-424; hereafter called Balc424), which is as active as the wild type.

MATERIALS AND METHODS
Cloning-The DNA encoding light chain 1-424 was PCR-amplified using the forward primer 5′-ATGACCATGGGCATGCCATTTGTTAATAAAC-3′, which adds an NcoI restriction site to the 5′-end, and the reverse primer 5′-CCGCTCGAGTTCAAACAATCCAG-3′, which adds an XhoI restriction site to the 3′-end. The restriction sites are CCATGG (NcoI) and CTCGAG (XhoI). The PCR product was subcloned into the pET28b(+) vector (Novagen) using the NcoI and XhoI restriction sites. The BoNT/A full-length light chain plasmid, a kind gift from Thomas Binz, was used as the template for PCR.
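As a quick sanity check on the primer design described above, the following minimal sketch locates the enzyme recognition sequences in the two primers. It assumes the standard recognition sequences CCATGG (NcoI) and CTCGAG (XhoI), and it treats the hyphen printed inside the forward primer as a typesetting artifact; the function name `find_site` is hypothetical.

```python
# Recognition sequences (standard values for these two enzymes).
SITES = {"NcoI": "CCATGG", "XhoI": "CTCGAG"}

# Primers as printed; the internal hyphen is assumed to be a typesetting artifact.
FORWARD = "ATGACCATGG-GCATGCCATTTGTTAATAAAC".replace("-", "")
REVERSE = "CCGCTCGAGTTCAAACAATCCAG"


def find_site(primer: str, enzyme: str) -> int:
    """Return the 0-based start of the enzyme's site in the primer, or -1 if absent."""
    return primer.upper().find(SITES[enzyme])


if __name__ == "__main__":
    print("NcoI site in forward primer starts at index", find_site(FORWARD, "NcoI"))
    print("XhoI site in reverse primer starts at index", find_site(REVERSE, "XhoI"))
```

Both sites sit a few bases in from the 5′ end of their primer, leaving a short overhang of extra bases upstream of each site.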
Expression and Purification-The plasmid encoding Balc424 was transformed into Escherichia coli BL21(DE3)-RIL cells (Stratagene). Cells were inoculated into ZYP-5052 medium containing antibiotics and grown at 37 °C to an A600 of about 0.6. The cells were then allowed to auto-induce overnight at 18 °C (18). Cells were harvested, and the C-terminal hexahistidine-tagged protein was purified on nickel-nitrilotriacetic acid resin (Qiagen) by standard techniques. Fractions containing Balc424 protein were concentrated and loaded onto a size exclusion (S-200) column previously equilibrated with 2 mM dithiothreitol, 200 mM NaCl, and 20 mM HEPES buffer at pH 7.4. The purified protein was then stored at −80 °C. Protein concentration was determined from ε280 = 45.4 M⁻¹ cm⁻¹.
Crystallization and Data Collection-Balc424 crystals were grown using the sitting drop vapor diffusion method at room temperature. Equal volumes (2 μl) of protein (20 mg/ml) and reservoir solutions were mixed and equilibrated against 800 μl of reservoir solution containing 15% (w/v) polyethylene glycol 3350, 0.3 M ammonium sulfate, and 100 mM Bis-Tris buffer at pH 6.8. Plate-like crystals obtained in a week were flash-frozen in liquid nitrogen after transfer to mother liquor containing 15% ethylene glycol as a cryoprotectant. X-ray data were collected at beamline X29 of the National Synchrotron Light Source using a wavelength of 0.979 Å. Balc424 crystals diffract to at least 1.7-Å resolution. Data were indexed, processed, and scaled using HKL2000 (19) (shown in Table 1). Assuming one molecule of 50 kDa per asymmetric unit, the Matthews coefficient is 2.2 Å³/Da, corresponding to an estimated solvent content of 43% by volume of the unit cell.
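The Matthews coefficient and the solvent content quoted above are related by a standard approximation. The sketch below assumes the commonly used constant of 1.23 Å³/Da for the volume occupied by protein (equivalent to a partial specific volume of about 0.74 cm³/g). The unit-cell volume and the number of molecules per cell are not given in this section, so the solvent estimate takes the Matthews coefficient directly; function names are illustrative.

```python
def matthews_coefficient(cell_volume_A3: float, n_molecules: int, mass_Da: float) -> float:
    """V_M in A^3/Da: unit-cell volume divided by the total protein mass in the cell."""
    return cell_volume_A3 / (n_molecules * mass_Da)


def solvent_fraction(v_m: float, protein_volume_per_Da: float = 1.23) -> float:
    """Approximate solvent fraction, 1 - 1.23 / V_M (the constant is an assumption)."""
    return 1.0 - protein_volume_per_Da / v_m


if __name__ == "__main__":
    # V_M = 2.2 A^3/Da is the value reported in the text.
    print(f"estimated solvent content: {solvent_fraction(2.2):.1%}")
```

With the rounded value of 2.2 Å³/Da this approximation gives about 44%, close to the 43% reported; a difference of this size is consistent with rounding of the Matthews coefficient.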
Structure Determination and Refinement-The crystal structure was determined by molecular replacement with MOLREP in the CCP4 program suite (20). The catalytic domain (residues 10-420) of the full holotoxin type A (Protein Data Bank entry 3BTA) was used as a search model after removing flexible loops. Rigid-body refinement of the initial solution followed by simulated annealing with CNS gave an excellent electron density map (21). Residues not included in the search model were built using the program O, and the model was further refined (22). The final R and Rfree are 20.6 and 24.1%, respectively. The final model contains residues 1-423, two sulfate ions, and 230 waters. In addition, the model contains an acetate ion that coordinates to zinc in a bidentate fashion. Residue 424 and the C-terminal His tag were not modeled because of weak or missing electron density.
Balc424-Tetrapeptide Complexes-Balc424 was cocrystallized individually with tetrapeptides (RRGC, RRGM, RRGL, and RRGI amides) by the sitting drop vapor diffusion method at room temperature using conditions similar to those for the native protein. Balc424 was mixed with a >10-fold molar excess of the tetrapeptide for cocrystallization trials. Balc424 gave better complex crystals with RRGC, RRGM, RRGL, and RRGI at protein:peptide molar ratios of 1:30, 1:30, 1:40, and 1:40, respectively. Structures of the complexes were determined by rigid-body refinement of the Balc424 model followed by simulated annealing. Both the composite omit map and the difference Fourier map showed interpretable electron density for all the tetrapeptides. In RRGC, the residual density near the Sγ atom of P3′ Cys was modeled as a sodium ion. Initial attempts to model this residual density as a water molecule or sulfenic acid of P3′ Cys were unsuccessful. Residual densities elsewhere were modeled as sulfate ions in all four complexes. The respective models were refined using CNS. The data and refinement statistics are given in Table 1. To confirm the sulfur position in the RRGC and RRGM complexes, redundant data were collected at a wavelength of 1.54 Å. The anomalous difference Fourier maps computed with the model phases (excluding the tetrapeptide) clearly showed the sulfur position of cysteine and methionine, thus confirming the orientation of the peptide.

FIGURE 1 (legend fragment). The zinc ion is shown as a sphere in magenta. B, a ball-and-stick representation (green) of the active site residues with coordination bonds (solid lines) and the hydrogen bond and salt bridge network (dashed lines). Zinc, carbon, oxygen, and nitrogen atoms are shown in magenta, olive green, red, and blue, respectively. Bound acetate ion is shown as a gray ball-and-stick model.

RESULTS AND DISCUSSION
Structure of Wild-type Balc424-The crystal structure of Balc424 was determined by the molecular replacement method using the light chain of holotoxin (Protein Data Bank entry 3BTA) as a search model. The asymmetric unit contains one molecule of Balc424 with an acetate ion at the active site replacing the nucleophilic water (Fig. 1). The structure of Balc424 is essentially similar (root mean square deviation of ~1.3 Å for Cα atoms) to reported structures of BoNT/A light chain except for the structural changes in the 200 and 250 loops (12, 17, 23-27). These loops pack against the 370 loop and take a helical conformation (Fig. 1A), unlike other light chain structures where they are disordered. The electron density for these loops is well defined, showing ordered secondary structures, implying that this might be the most stable arrangement for this region. This may also be due to the allosteric effect of the acetate ion binding in the active site.
The active site of CNTs is characterized by a strictly conserved HEXXH+E motif and a catalytic zinc ion. A water molecule coordinated to the zinc ion and hydrogen-bonded to the conserved glutamate of the HEXXH motif acts as a nucleophile for the proteolytic reaction. In the current structure, zinc is hexa-coordinated by nitrogen atoms of His223 and His227, carboxylate oxygens of Glu262, and an acetate ion bound in a bidentate fashion (Fig. 1B). Oxygen atoms of the carboxylate group of the acetate ion are hydrogen-bonded to Glu224 and the hydroxyl group of Tyr366, with one of them replacing the nucleophilic water. In the BoNT/B structure, the nucleophilic water is replaced by a sulfate ion, and such a replacement has been commonly observed in other enzymes (14, 24). Mutating Glu224 to Gln or Ala abolishes the catalytic activity completely, whereas mutating Tyr366 to Phe or Ala reduces the activity in BoNT/A and BoNT/E (28-31). Accordingly, Glu224 is essential and acts as a general base for the activation, whereas Tyr366 is required for stabilizing the transition state intermediate. Glu351 is hydrogen-bonded to His223 and also forms a salt bridge with Arg363. These interactions are essential to keep the active site geometry intact and maintain the charge distribution that is critical for substrate binding and catalysis. They are also observed in other serotypes, suggesting that all BoNTs follow a similar catalytic mechanism. This is the first crystal structure of BoNT/A catalytic domain with one molecule per asymmetric unit and with the active site exposed to a solvent region unobstructed by crystal packing, making this crystal form ideal for inhibitor complex studies.
Substrate Binding and Catalysis-CNTs hydrolyze specific peptide bonds in the neuronal soluble N-ethylmaleimide-sensitive factor attachment protein receptor proteins (32). For example, BoNT/A cleaves the peptide bond Gln197-Arg198 of SNAP-25. CNTs are known to recognize the substrate at exosites, regions remote from the active site, for efficient hydrolysis (33-37). The crystal structure of the complex between the double mutant (Y366F/E224Q) of Balc and the SNAP-25-(146-204)-peptide has provided information about exosites, substrate recognition, and the substrate-binding mode (23). However, the critical interactions between the enzyme and the substrate at the proteolytic site are missing, probably due to the introduced mutations. To fill this information gap and understand the interactions between the substrate and BoNT/A at the active site, we determined the high-resolution crystal structures of Balc424 in complex with inhibitory tetrapeptides (partly mimicking the substrate), RRGX, where X = C, M, L, or I. These peptides are designed to mimic the substrate scissile bond (P1 and P1′) and for optimal binding (P2′ and P3′) as well as inhibition. Balc424 was cocrystallized with the tetrapeptides RRGC, RRGM, RRGL, and RRGI (all containing a C-terminal amide group); these names are used hereafter to denote the corresponding complexes (Table 1). The native Balc424 structure was used as the starting model with data collected from crystals of the complexes. Models of the tetrapeptides were fitted unambiguously in the difference Fourier maps calculated after rigid-body refinement followed by simulated annealing refinement with CNS (Fig. 2). The orientation of the tetrapeptide was confirmed by the anomalous signal from sulfur (Cys/Met).
Crystal structures of the Balc424-tetrapeptide complexes provide substantial information about the substrate-binding mode and the interactions at the active site that are critical for proteolysis. Overall, the structure and active site of Balc424 in all four tetrapeptide complexes are similar to the acetate ion-bound native structure (root mean square deviation of ~1.0 Å for 400 common Cα atom pairs). Also, the overall binding mode, orientations, and molecular interactions of all four tetrapeptides with the enzyme are similar except for the side chain conformation of the second arginine (Fig. 3A). The tetrapeptides are tightly bound to the protein (see Table 1 for the average B-factors and buried surface areas of the peptides) and mostly interact with the 200, 250, and 370 loops.
In all four structures, the nitrogen atom of the N-terminal amino group of the tetrapeptide replaces the nucleophilic water, coordinates with zinc, and forms hydrogen bonds with the general catalytic base, Glu224 (Fig. 3, B and C). The guanidinium group of the first arginine in the tetrapeptide forms a salt bridge with Glu164 in the S1 subsite, while the side chain of the second arginine of the peptide binds in the S1′ pocket and makes a salt bridge with Asp370 (Figs. 3 and 4). In addition to the salt bridge, the guanidinium group of the second arginine stacks between the guanidinium group of Arg363 and Phe194. Mutation of Glu164 or Asp370 reduces the catalytic activity of BoNT/A, indicating that these residues are important for substrate binding and positioning of the scissile bond for nucleophilic attack and proteolysis (25, 38). Therefore, it is reasonable to say that the first two arginines of the tetrapeptide mimic P1 and P1′ of the substrate, and accordingly they are called P1 and P1′ hereafter. Significant reduction in enzymatic activity was observed when Tyr366 was mutated to Phe (30, 38), and here Tyr366 OH forms a strong hydrogen bond with the P1 carbonyl oxygen, confirming that Tyr366 stabilizes the tetrahedral intermediate of the carbonyl carbon of P1 (28, 39). Moreover, this carbonyl oxygen forms a coordination bond with the zinc ion, resulting in zinc having six coordinations with distorted octahedral geometry. In the Balc-SNAP-25-(146-204) structure, the crucial interactions between the scissile bond of the P1 and P1′ residues (Gln197-Arg198) of SNAP-25 and the active site are missing, probably because of the double mutation. This is the first structural study showing critical interactions between the substrate and BoNT/A at the proteolytic site responsible for catalytic activity.
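The P1/P1′ assignment above follows the usual protease subsite nomenclature, in which residues are numbered outward from the scissile bond. A minimal sketch of that labeling for the tetrapeptides is given here, assuming a single residue on the non-primed side as argued in the text; the function name and the use of an ASCII apostrophe for the prime are illustrative choices.

```python
def label_positions(peptide: str, n_nonprimed: int = 1) -> list[tuple[str, str]]:
    """Label residues relative to a scissile bond placed after `n_nonprimed` residues.

    Residues before the bond are P1, P2, ... counting backward from the bond;
    residues after it are P1', P2', ... counting forward.
    """
    if not 0 <= n_nonprimed <= len(peptide):
        raise ValueError("n_nonprimed must lie within the peptide length")
    labels = []
    for i, residue in enumerate(peptide):
        if i < n_nonprimed:
            labels.append((residue, f"P{n_nonprimed - i}"))
        else:
            labels.append((residue, f"P{i - n_nonprimed + 1}'"))
    return labels


if __name__ == "__main__":
    for name in ("RRGC", "RRGM", "RRGL", "RRGI"):
        print(name, label_positions(name))
```

Under this assumption the four residues of each tetrapeptide map to P1, P1′, P2′, and P3′, which matches the way the Gly and the C-terminal residue are referred to elsewhere in the text.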
Arg363 and Glu351 are conserved across all serotypes, and their role in catalytic activity has been established both by mutational and structural studies in BoNT/E and BoNT/A (28, 30, 40). In addition to interactions similar to those in the native structure, the side chain of Arg363 is hydrogen-bonded to the P1′ carbonyl oxygen and helps in positioning the substrate for catalytic activity. In all four structures, the S1′ site is formed by Phe194, Thr215, Thr220, Arg363, and Asp370. Also, the C-terminal amide oxygen of the tetrapeptide makes a hydrogen bond with the backbone nitrogen of Asp370. These interactions are common to all four Balc-tetrapeptide complex structures (Fig. 4). The C-terminal side chain of the tetrapeptide is packed against the hydrophobic pocket that is composed of Pro206, Leu207, Tyr250, Tyr251, Met253, Leu256, Phe369, and Phe423 (Fig. 5). Ten water molecules in the vicinity of the active site and tetrapeptide are involved in stabilizing the complex.
Enzymatic Mechanism-The present structures of enzyme-tetrapeptide complexes could represent the Michaelis complex, although the P1 amino group replaces the nucleophilic water (Fig. 3C). The P1 carbonyl oxygen coordinates the zinc ion and also hydrogen bonds with OH of Tyr366, whereas the P1′ carbonyl oxygen is hydrogen-bonded to NH1 of Arg363. These two interactions position and stabilize the substrate for catalytic action. Based on our structural and available biochemical information, we propose the following catalytic mechanism. Glu224 acts as a general base by abstracting a proton from the nucleophilic water. The nucleophilic water attacks the carbonyl carbon of the scissile bond, which forms the tetrahedral transition state (Fig. 6). The zinc ion and Tyr366 might stabilize this intermediate tetrahedral transition state. The shuttling of protons with the help of Glu224 assists subsequent formation of a stable leaving amino group. This model is consistent with our model proposed for BoNT/B and BoNT/E and is similar to that of thermolysin (29, 39, 41).
Substrate-based Inhibitor-Developing a substrate-based peptide inhibitor is a common strategy for designing inhibitors for proteases. Recently, crystal structures of BoNT/A light chain in complex with the inhibitors L-arginine hydroxamate, 4-chlorocinnamic hydroxamate, and 2,4-dichlorocinnamic hydroxamate have been reported (17). At best, these inhibitors act at the 300 nM level. In this study we used substrate- and structure-based approaches to search for better inhibitors. The inhibitor was designed based on two main criteria: the charge distribution in the active site cavity and the substrate proteolytic site (42). Fig. 7 shows the electrostatic surface potential diagram for the BoNT/A active site. It is evident that the zinc-binding region is mostly negatively charged, and the region where loops 200, 250, and 370 come together is hydrophobic. Accordingly, the inhibitor should have positively charged residues at one end and hydrophobic ones at the other to enhance the binding affinity. BoNT/A cleaves at the peptide bond between Gln197 (P1) and Arg198 (P1′) of SNAP-25. P1′ was maintained as Arg as in the substrate because it maintains the positive charge and also fits well at the S1′ site. Moreover, P1′ is critical for substrate binding (43). P1 was changed from Gln to Arg to enhance the positive charge. Based on this we designed four tetrapeptides (RRGC, RRGM, RRGL, and RRGI), all of them having hydrophobic residues of varying size at the C terminus. The Ki determined for the inhibitors with a 17-mer SNAP-25 substrate peptide using the full-length Balc was 157 nM, 660 nM, 786 nM, and 845 nM for RRGC, RRGL, RRGI, and RRGM, respectively. When the assay was done in the presence of ZnCl2 and dithiothreitol, the IC50 for RRGC significantly increased, but those of the other three were unaffected (Footnote 3). The latter three or their derivatives, potentially being unaffected by the intracellular redox environment, should thus be suitable for further studies as drug candidates.
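To put the reported Ki values on a common energetic scale, one can convert them to approximate binding free energies with ΔG = RT ln Ki. The sketch below is a rough estimate only: it treats Ki as a dissociation constant relative to a 1 M standard state and assumes a temperature of 25 °C, neither of which is stated in the text.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

# Ki values in nM as reported in the text.
KI_NM = {"RRGC": 157.0, "RRGL": 660.0, "RRGI": 786.0, "RRGM": 845.0}


def binding_free_energy(ki_nM: float, temp_K: float = 298.15) -> float:
    """Approximate dG in kcal/mol from Ki (assumed temperature, 1 M standard state)."""
    return R_KCAL * temp_K * math.log(ki_nM * 1e-9)


if __name__ == "__main__":
    best = KI_NM["RRGC"]
    for peptide, ki in sorted(KI_NM.items(), key=lambda kv: kv[1]):
        dg = binding_free_energy(ki)
        print(f"{peptide}: Ki = {ki:.0f} nM, dG ~ {dg:.2f} kcal/mol, {ki / best:.1f}-fold vs RRGC")
```

On these assumptions, the roughly 4- to 5-fold difference in Ki between RRGC and the other three peptides corresponds to about 1 kcal/mol in binding free energy.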
This is the first report of a high affinity, small peptide inhibitor of BoNT/A protease activity in solution and the structural study in complex with the target.
Analysis of intermolecular forces between the enzyme and peptide molecule indicates that the N-terminal arginines of the tetrapeptides play a vital role in the binding of the inhibitor to the enzyme. As we expected, the two N-terminal arginines of the peptide inhibitor fit in the acidic region, and the C-terminal hydrophobic residues (P3′) of all four (Cys, Met, Leu, and Ile) pack against the hydrophobic region (Figs. 3 and 5). P2′ Gly acts as a flexible linker between the two. The salt bridges formed by the N-terminal P1 Arg with Glu164 and P1′ Arg with Asp370 anchor the inhibitor for binding. The nucleophilic water has been displaced by the amino group of P1 and is not available for catalytic activity. The optimum length for a peptide inhibitor has been suggested to be a heptapeptide (14, 24). However, we find that the tetrapeptide in our case has a lower Ki than the heptapeptide. Nevertheless, it is worth exploring the effects of adding residues to the C terminus. The structural work presented here will form the basis for design of more potent inhibitors for this neurotoxin.

Footnote 3: M. L. Ludivico, S. Swaminathan, and S. A. Ahmed, unpublished data.