Substrate Specificity in the Highly Heterogeneous M4 Peptidase Family Is Determined by a Small Subset of Amino Acids

The members of the M4 peptidase family are involved in processes as diverse as pathogenicity and industrial applications. For the first time a number of M4 family members, also known as thermolysin-like proteases, have been characterized with an identical substrate set and a uniform set of assay conditions. Characterization with peptide substrates as well as high performance liquid chromatography analysis of β-casein digests shows that the M4 family is a homogeneous family in terms of catalysis, even though there is a significant degree of amino acid sequence variation. The results of this study show that differences in substrate specificity within the M4 family do not correlate with overall sequence differences but depend on a small number of identifiable amino acids. Indeed, molecular modeling followed by site-directed mutagenesis of one of the substrate binding pocket residues of the thermolysin-like protease of Bacillus stearothermophilus converted the catalytic characteristics of this variant into those of thermolysin.

Thermolysin-like proteases (TLPs) are members of the peptidase family M4 (1), of which thermolysin (TLN; EC 3.4.24.27) is the prototype. The phylogenetic tree for the M4 family is shown in Fig. 1A. The family contains only secreted eubacterial endopeptidases from both Gram-positive and Gram-negative sources. All members of this comprehensive family are produced as pre-pro-proteins. During export the pre-sequence (signal sequence) is cleaved off, whereas the prosequence has been shown to assist in proper folding by acting as a molecular chaperone (2). In addition, it has been shown that the prosequence can act as a specific inhibitor (3), thus preventing (4) unwanted proteolytic activity in the cytoplasm (2). The mature enzymes are all of moderate size, around 35 kDa (316 amino acids for thermolysin). These proteases contain the typical HEXXH amino acid motif, require Zn²⁺ ions for their activity, and contain multiple Ca²⁺ ions (up to four) for stability. All enzymes are optimally active at neutral pH (1, 5).
For several of these enzymes three-dimensional structures are available (6-9). In fact, thermolysin was among the first proteins for which the structure was solved. Although considerable sequence diversity exists within this family (Fig. 1B), there is a high degree of structural conservation. All members for which the structure has been solved were shown to consist of two major domains. The N-terminal domain contains mainly β-sheets, whereas the C-terminal domain predominantly contains α-helices. The active site is located in the cleft between these two domains. In those enzymes of which the structure has been determined, the catalytically essential Zn²⁺ ion is located at the bottom of this cleft (Fig. 2). In a significant number of published structures in which TLN was co-crystallized with inhibitors (6, 10-14), the residues involved in catalysis could be identified.
The family also includes enzymes from pathogens such as Legionella, Listeria, Clostridium, Staphylococcus, Pseudomonas, and Vibrio. For example, pseudolysin, the TLP from Pseudomonas aeruginosa, has been shown to cause tissue damage by degrading collagens, elastin, and fibronectin (15), whereas the TLPs from Listeria sp. appear to be involved in the maturation of specific virulence factors (16). Furthermore, the active site organization of M4 peptidases exhibits similarity to those of a number of eukaryotic metallopeptidases, in particular to members of the matrix metalloproteases (17). These latter enzymes were shown to be involved in a number of important processes in man, including the processing of precursors that play modulation roles in the formation of tumors. In addition, metalloendopeptidases are involved in many cellular processes such as exocytosis, cell-cell fusion, and neuropeptide hydrolysis (18). Consequently, metalloproteases of the M4 family have attracted increasing attention as model proteins for the development of specific inhibitors that can be applied for disease treatment (19). In addition, several members of this protease family are applied in industry, e.g. in baking, brewing, and leather processing (20). Thermolysin is being used for the synthesis of the artificial sweetener aspartame (20).
In this study we have characterized several TLPs of Bacillus and Staphylococcus species. The availability of an impressive amount of sequence, structural, and kinetic data renders this group of proteases an ideal subject for rational design strategies. Although some of the family members have been characterized individually (5, 21-24), a consistent comparison with an identical substrate set and a uniform set of assay conditions has never been conducted. Previously it was suggested that TLPs exhibit a preference for large hydrophobic P1′ residues (Leu or Phe) (1, 17, 21, 22). In addition, it has been demonstrated that the S1′ pocket is the major determinant of the substrate specificity (21). Here we show that the TLP family is an extremely homogeneous family in terms of catalysis, even though there is a significant degree of sequence variation. Furthermore, we show that existing differences in specificity and activity between two individual members can be canceled by a single amino acid substitution.

MATERIALS AND METHODS
Genetics-The nprM gene encoding TLN of Bacillus thermoproteolyticus (25), the nprT gene encoding the TLP of Bacillus stearothermophilus CU21 (26) (TLP-ste), the nprC gene encoding the TLP of Bacillus cereus (3) (TLP-cer), and the nprB gene encoding the TLP of Bacillus subtilis (27) (TLP-sub) were cloned, subcloned, and expressed as described previously (28). The purified TLP of Staphylococcus aureus (29) (TLP-sau; aureolysin, EC 3.4.24.29) was kindly provided by Dr. J. Potempa. Site-directed mutagenesis was performed by the polymerase chain reaction-based mega-primer method, essentially as described by Sarkar and Sommer (30). Mutagenic primers were designed such that mutant clones could be recognized by the appearance or disappearance of an endonuclease restriction site (28). The nucleotide sequences of mutated fragments of the nprT gene were verified by DNA sequence analysis.
Modeling and Mutant Design-A three-dimensional model of TLP-ste was built on the basis of homology with thermolysin (86% sequence identity) using the molecular modeling program WHAT-IF (31). The modeling procedures have been described in detail elsewhere (32). Because of the high sequence similarity, the model was expected to be sufficiently reliable for prediction and analysis of the effects of most amino acid substitutions (32, 33). This has been confirmed by the fact that the model has been used for the successful design of various stabilizing mutations (34-37). Throughout this paper, residues in all TLPs are numbered according to the numbering of corresponding residues in thermolysin.
Production and Characterization of Enzymes-Production and purification of the enzymes were performed as described earlier (28, 38). Before determining the kinetic parameters, protease preparations were desalted to 20 mM sodium acetate, pH 5.3, 5 mM CaCl2, and 20% isopropanol using pre-packed PD-10 gel filtration columns supplied by Amersham Pharmacia Biotech.
Specific activities of the TLPs toward casein were determined according to a method adapted from Fujii et al. (26). Approximately 0.5 μg of protease was incubated in 1 ml of 50 mM Tris-HCl, pH 7.5, containing 0.8% (w/v) casein and 5 mM CaCl2 at 37°C for 1 h. The reaction was quenched by the addition of 1 ml of a solution containing 100 mM trichloroacetic acid, pH 3.5. One unit of activity is defined as the amount of enzyme activity needed to liberate a quantity of acid-soluble peptide corresponding to an increase in absorbance at 275 nm (A275) of 0.001/min. The kcat/Km and Km values of the enzymes for furylacryloylated di- and tripeptides were determined at 37°C in a thermostated Perkin-Elmer Lambda 11 spectrophotometer. The reaction mixture (1 ml) contained 50 mM Tris, 50 mM MES, pH 7.0, 5 mM CaCl2, 5% Me2SO, 0.5% isopropanol, 0.01% Triton X-100, and 100 μM to 2.5 mM substrate, and the reaction was followed by measuring the decrease in absorption at 345 nm (Δε345 = −317 M⁻¹·cm⁻¹) (21). All substrates were supplied by Bachem. Stock solutions of the furylacryloylated dipeptides 3-(2-furylacryloyl)-L-glycyl-L-leucine amide and 3-(2-furylacryloyl)-L-glycyl-L-phenylalanine amide, and of the furylacryloylated tripeptides 3-(2-furylacryloyl)-L-glycyl-L-leucine-L-alanine and 3-(2-furylacryloyl)-L-glycyl-L-phenylalanine-L-leucine, were prepared by dissolving the peptides in Me2SO. The apparent second order rate constant kcat/Km was determined by varying the enzyme concentrations (over a 50-fold range) under pseudo-first-order conditions and measuring the initial activity, essentially according to the method described by Feder (21).
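As an illustration of the kinetic analysis described above, the following minimal Python sketch estimates kcat/Km from initial absorbance slopes recorded at several enzyme concentrations. It assumes that the substrate concentration is well below Km, so that v0 = (kcat/Km)[E][S], and that the cuvette path length is 1 cm. The function names and the least-squares fit through the origin are illustrative choices and not the published procedure; only Δε345 = −317 M⁻¹·cm⁻¹ is taken from the text.

```python
# Illustrative sketch only: names, path length, and fitting choice are assumptions.
DELTA_EPS_345 = -317.0  # M^-1 cm^-1, absorbance change on hydrolysis (from the text)


def initial_rate(dA_dt, path_cm=1.0):
    """Convert an absorbance slope at 345 nm (per second) to a hydrolysis rate in M/s.

    The absorbance decreases on hydrolysis, so a negative slope gives a positive rate.
    """
    return dA_dt / (DELTA_EPS_345 * path_cm)


def kcat_over_km(enzyme_M, dA_dt_values, substrate_M, path_cm=1.0):
    """Estimate kcat/Km (M^-1 s^-1), assuming [S] << Km so that v0 = (kcat/Km)[E][S].

    v0/[S] is regressed on [E] with a least-squares line forced through the origin;
    the slope of that line is the apparent second order rate constant.
    """
    ys = [initial_rate(d, path_cm) / substrate_M for d in dA_dt_values]
    numerator = sum(e * y for e, y in zip(enzyme_M, ys))
    denominator = sum(e * e for e in enzyme_M)
    return numerator / denominator
```

Fitting across a range of enzyme concentrations, rather than relying on a single measurement, mirrors the 50-fold variation in enzyme concentration mentioned in the text and makes deviations from proportionality easier to spot.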
For the determination of thermal stability, 0.1 μM purified protease solutions (in 20 mM sodium acetate, pH 5.3, 5 mM CaCl2, 0.01% Triton X-100, 0.5% 2-propanol, and 62.5 mM NaCl) were incubated at various temperatures for 30 min, after which the residual proteolytic activity was determined with casein as a substrate (26). Thermal stability was quantified by T50, being the temperature giving 50% residual activity after a 30-min period of incubation (32, 40).
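The T50 definition given above can be made concrete with a short Python sketch. The text does not state how the 50% point was read from the inactivation data, so linear interpolation between the two bracketing temperatures is an assumption made here for illustration, and the function name is hypothetical.

```python
# Illustrative sketch only: the interpolation scheme is an assumption.
def t50(temps_c, residual_pct):
    """Return the temperature (deg C) at which residual activity falls to 50%.

    temps_c: incubation temperatures; residual_pct: residual activity (%) after
    the 30-min incubation at each temperature. The crossing point is found by
    linear interpolation between the two measurements that bracket 50%.
    """
    points = sorted(zip(temps_c, residual_pct))
    for (t_lo, a_lo), (t_hi, a_hi) in zip(points, points[1:]):
        if a_lo >= 50.0 >= a_hi and a_lo != a_hi:
            return t_lo + (a_lo - 50.0) * (t_hi - t_lo) / (a_lo - a_hi)
    raise ValueError("residual activity does not cross 50% in the measured range")
```

With closely spaced incubation temperatures around the transition, the choice of interpolation scheme should matter little; with sparse data a fitted inactivation curve may be preferable.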
The proteolytic properties of the mutant enzymes toward β-casein (Sigma-Aldrich) were determined by means of HPLC. β-Casein (1 mg ml⁻¹) was incubated in 50 mM Tris and 50 mM MES, pH 7.0, 5 mM CaCl2, 0.01% Triton X-100 with each of the TLP variants at a molar ratio of 1,000:1 at 37°C for 24 h. The peptides resulting from hydrolysis were derivatized with dansyl chloride. The proteolytic products were separated by loading a sample corresponding to 50 μg of β-casein on a reversed phase column (RP-304, Bio-Rad). The mobile phase used was 50 mM sodium acetate, pH 5.2. Peptides were eluted with a linear gradient of 0-60% acetonitrile in 30 min at a flow rate of 1 ml min⁻¹. Absorption of the eluting peptides was monitored at 254 nm.

RESULTS
Enzymatic Properties toward Casein-To investigate the activity of the various M4 proteases on large protein substrates, the activity toward casein was determined. Casein was selected as a standard substrate for activity measurements because it behaves as a noncompact and largely flexible structure (41), thus rendering all scissionable motifs accessible to the same extent for the various proteases at all temperatures employed. Indeed, we have shown previously that digestion of β-casein with TLP-ste at different temperatures yielded identical degradation products (42). The results are shown in Table I. Most of the wild-type enzymes show similar specific activities, with a variation of a factor of approximately 3. The major exception is TLP-cer, which is much less active on casein than the other enzymes tested.
To determine the thermal stability and the optimal temperature for catalysis of the various proteases, we determined the T50 values (43) and the temperature dependence of activity toward casein. The T50 values are given in Table I. These values correlate well with the temperature optima of the TLPs as shown in Fig. 3 in the sense that the most thermally labile protease shows the lowest optimum temperature. To facilitate comparison, the maximum activity of the different TLPs has been normalized to 100%.
Between closely related TLPs a correlation exists between the degree of sequence identity and the difference in thermal stability (see Table I for a comparison of the sequence identity and the ΔT50 values). In all cases, the temperature optimum is just below the T50 value determined, which is a direct result of the experimental procedures: the T50 values were determined with a 30-min incubation period followed by determination of the remaining activity at 37°C, whereas the temperature optima were determined during a 1-h incubation period at the indicated temperatures. As a consequence, this longer incubation period can be expected to lead to a higher degree of inactivation at the elevated temperatures.
Inspection of Fig. 3 shows that the shape of the curve of TLN differs from those of the other TLPs. Of the enzymes tested only TLP-sub and TLN show Arrhenius behavior: the activity increases exponentially with the temperature. TLP-sau deviates from the other TLPs by showing an unexpectedly broad temperature optimum, suggesting that thermal (in)activation of this protease might differ from that in the other proteases.
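Whether a temperature-activity curve follows Arrhenius behavior can be checked by testing the linearity of ln(activity) against 1/T, since the Arrhenius equation k = A·exp(−Ea/RT) predicts a straight line with slope −Ea/R. The Python sketch below is a generic illustration of that check and is not the analysis used in this study; the function name is hypothetical, and it assumes activities measured below the temperature at which inactivation sets in.

```python
# Illustrative sketch only: a generic Arrhenius linearization, not the authors' analysis.
import math

R_GAS = 8.314  # J mol^-1 K^-1


def arrhenius_fit(temps_c, activities):
    """Fit ln(activity) against 1/T (T in kelvin) by ordinary least squares.

    Returns (Ea in J/mol, r_squared). An r_squared close to 1 indicates
    Arrhenius-like behavior over the temperature range supplied.
    """
    xs = [1.0 / (t + 273.15) for t in temps_c]
    ys = [math.log(a) for a in activities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return -slope * R_GAS, 1.0 - ss_res / ss_tot
```

A curve that flattens or bends well below the temperature optimum would give a visibly lower r_squared in such a fit, which is one simple way to express the difference in curve shape noted for Fig. 3.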
Catalytic Properties of TLPs on Di- and Tripeptide Substrates-To determine the P1′ substrate specificity, the activities of the various M4 proteases toward di- and tripeptide substrates were determined. Their activities on dipeptide and tripeptide substrates are shown in Tables II and III, respectively. The results show that substrates with either Leu or Phe as the P1′ residue are efficiently hydrolyzed by TLPs. As with casein, TLP-cer shows a relatively low activity toward dipeptide substrates. With dipeptide substrates, most TLPs prefer Leu over Phe at the P1′ position, as shown by the Phe/Leu ratio for dipeptides. In contrast, the M4 family is often described as having an equal P1′ preference for Leu and Phe (1, 17, 21, 22). The diversity or similarity in primary amino acid sequence (Fig. 1B) is not reflected in either different or similar cleavage efficiencies for the peptide substrates tested. In fact, the substrate specificity of TLN is much more similar to that of TLP-sub than to that of TLP-ste, contrary to what might be expected on the basis of sequence similarity (45% and 86% identity, respectively). With the exception of TLP-cer, all activities on peptide substrates are less than one order of magnitude different from those of TLN.
In contrast, the inhibition constant for the inhibitor phosphoramidon, which was specifically designed for TLN, seems to correlate with the sequence difference. The TLPs that are phylogenetically close to TLN are much more sensitive to phosphoramidon as compared with those that are more distant.
HPLC Characterization of β-Casein Digests-To examine whether differences in substrate specificity can be observed on large peptide substrates, HPLC analyses were performed on β-casein hydrolysates produced by the various TLPs. Fig. 4 shows a detail of the reversed phase HPLC analyses of the peptides that were formed upon digestion of β-casein with the TLPs. A number of characteristic and reproducible products could be identified for each of the TLPs. Although the preference toward the small peptides used showed little variation, differences in the digestion patterns of β-casein are clearly detectable. This illustrates that differences can be much more readily detectable on large protein substrates than on small peptide substrates.
Site-directed Mutagenesis of the Active Site-The results presented above suggest that differences in substrate specificity between TLP variants are not correlated with overall sequence dissimilarities. To examine whether such differences might be reflected in the structure of the active site and substrate binding pockets, molecular modeling of the active sites of the different variants was employed. For several members of the family (TLN, TLP-cer, TLP-sau, and elastase) high resolution x-ray structures are available. In addition, models have been constructed for TLP-sub and TLP-ste. The latter models have previously been shown to be extremely useful for identifying structural features involved in thermal stability (32, 42). The fact that the amino acid conservation in and around the active sites is very high suggests that they are structurally similar. We decided to compare the active sites of TLN and TLP-ste in more detail to identify structural features that could explain the observed differences in substrate specificity. The two enzymes are highly similar (86% sequence identity). In particular, in the active site regions the sequence conservation is very high. Therefore, the constructed model for TLP-ste is expected to be highly reliable in this region.
Our recent close inspection of the model of TLP-ste and careful comparison with the TLN structures available revealed that one of the major differences between the two TLPs in the active site region concerns residue 133, which is a Leu in TLN but a Phe in TLP-ste. From these studies it is now concluded that the S1′ subsite is composed of the side chains of Phe-130, Phe-133 or Leu-133, Val-139, and Leu-202. Furthermore, inspection of the S1′ pocket and the conformation of the various [...] It might be anticipated that the large Phe-133 residue in the S1′ pocket will influence the binding of substrates in this specificity-determining pocket to a considerable extent. To test this hypothesis, Phe-133 in TLP-ste was substituted by Leu, and the effects on substrate specificity were determined. As documented in Tables II and III, the TLP-ste mutant shows enzymatic characteristics on di- and tripeptides that are much more TLN-like than TLP-ste-like. In addition, this mutation almost doubled the activity toward casein (Table I). Furthermore, the reversed phase HPLC patterns obtained with the TLP-ste F133L mutant showed some TLN-specific peaks, whereas some TLP-ste-specific peaks continue to be present as well. However, the temperature optimum, the shape of the temperature curve, and the thermal stability of the single mutant remained identical to those of wild-type TLP-ste.

DISCUSSION
The present study shows a correlation between the thermal stability and sequence identity of the various TLPs. This correlation with sequence identity does not exist for the differences in activity and specificity on both peptide substrates and casein. Although differences in specificity on the peptide substrates used are relatively small, HPLC analysis of digestion patterns of β-casein does show specific digestion patterns.
The fact that the differences in activity and specificity can be canceled by mutating one amino acid in a substrate binding pocket indicates that a small set of identifiable amino acid residues, not overall sequence differences, is responsible for the differences in performance of these enzymes.
The comparison of the thermostability of closely related TLPs showed that a correlation exists between the sequence identity and the difference in thermal stability. However, inspection of Fig. 3 shows that TLP-sub and TLN are the only two enzymes of which the thermal activation shows Arrhenius-like behavior. A previously described hyperstable variant of TLP-ste (42) does not show this behavior (44). Thus it seems unlikely that thermal stability underlies this difference between these two and the remainder of the enzymes studied. Rather, a process such as hinge bending (45, 46) could be a more likely cause for the difference between these two sets of enzymes.
The comparison of the enzymatic performance, i.e. activity and specificity, of TLPs from Bacilli and Staphylococcus indicates that overall divergence in primary sequence is not correlated with differences in activity and substrate specificity. In contrast, local sequence differences in the active site and binding pockets seem to be responsible for the majority of the differences in activity and substrate specificity. This hypothesis is supported by two observations: the kcat/Km values for both the di- and tripeptide substrates differ by less than 1 order of magnitude between the various enzymes, with the exception of TLP-cer; and, as shown by the TLP-ste F133L mutant, the activity and substrate specificity of one variant can be changed into that of another by mutating just one binding pocket residue. However, this hypothesis seems to be contradicted by the apparent relation between the Ki for phosphoramidon and the sequence difference. This relation can be explained by the fact that the most important residue for phosphoramidon binding is Phe-114 (6, 14, 47), which is present in both TLN and TLP-ste (low Ki), whereas position 114 in TLP-cer, -sub, and -sau is occupied by an Ala (high Ki).
The similarity in activity and specificity of the various TLPs toward peptide substrates does not exclude the possibility that overall sequence differences can play a role in activity and specificity toward larger proteinaceous substrates. Analysis of the digestion patterns of β-casein indicated that there are clear differences in substrate specificity on protein substrates. However, the mutant TLP-ste F133L, which changes the specificity of TLP-ste to that of TLN on peptide substrates, also changes the digestion pattern on β-casein into a more TLN-like pattern. Although other explanations cannot be excluded, this suggests that the observed differences in specificity are mainly caused by differences in the active site and binding pockets and not by overall sequence differences.
The present study is the first example of an approach in which the enzymatic and catalytic properties of a significant number of members of the M4 peptidase family are compared under identical conditions. The need for such a comparison is obvious in view of the roles of members of this family in processes as diverse as pathogenicity and industrial applications. The notion that overall differences in sequence do not correlate with substrate specificity enabled us to modify the substrate specificity by site-directed mutagenesis of those residues directly involved in substrate binding and catalysis. Indeed, a single amino acid substitution converted the catalytic characteristics of one family member into those of another. Consequently it can be envisaged that specific inhibitors, for example to be used for blocking disease-related members of the metalloprotease family, can be designed on the basis of amino acid residues identified in TLPs. Thus, this study provides additional arguments for the potential of TLPs as a model system in the search for novel metalloprotease inhibitors. For both the development of specific inhibitors and the improvement of biocatalysts, a better understanding of existing relations between sequence, structure, and function is of considerable importance.