Substrate specificity of the hepatitis C virus serine protease NS3.

The substrate specificity of a purified protein encompassing the hepatitis C virus NS3 serine protease domain was investigated by introducing systematic modifications, including non-natural amino acids, into substrate peptides derived from the NS4A/NS4B cleavage site. Kinetic parameters were determined in the absence and presence of a peptide mimicking the protease co-factor NS4A (Pep4A). Based on this study we draw the following conclusions: (i) the NS3 protease domain has an absolute requirement for a small residue in the P1 position of substrates, thereby confirming previous modelling predictions. (ii) Optimization of the P1 binding site occupancy primarily influences transition state binding, whereas the occupancy of distal binding sites is a determinant for both ground state and transition state binding. (iii) Optimized contacts at distal binding sites may contribute synergistically to cleavage efficiency.

The N-terminal third of the hepatitis C virus (HCV) NS3 protein contains a trypsin-like serine protease that accomplishes four out of the five processing events that take place during maturation of the nonstructural portion of the HCV polyprotein, performing cleavages at the NS3-NS4A, NS4A-NS4B, NS4B-NS5A, and NS5A-NS5B junctions (1-6). It has been shown that cleavage between NS3 and NS4A is an intramolecular event, whereas all remaining junctions are processed in trans.
In vivo, NS3 appears to form a heterodimer whereby the protease domain associates with the viral protein NS4A. The latter is a 54-residue protein that has been shown to bind to the N-terminal region of the protease via a central hydrophobic domain spanning residues 21-34 (7-13). NS4A acts as a cofactor of the protease, enhancing cleavage at all sites and being an absolute requirement for processing of the NS4B/NS5A junction ex vivo (7). Several studies have shown that a peptide encompassing the central hydrophobic domain of NS4A is sufficient for eliciting activation of the protease (12, 14-17).
Serine proteases contact the P1 residue of their substrates through characteristic specificity pockets. The residues flanking the specificity pocket are important determinants of substrate recognition. Homology modelling of the S1 specificity pocket of the NS3 protease has predicted the presence of a phenylalanine as a prominent feature, thus rendering the pocket rather small and hydrophobic (18). These characteristics have led to the prediction of a preference for small, hydrophobic residues, ideally cysteine residues, in the P1 position of NS3 substrates. Radiosequencing of the single cleavage products has subsequently confirmed these predictions, yielding the consensus sequence (D/E)XXXXC(A/S) for all trans cleavage sites, with X being any amino acid and the scissile bond being located between Cys and Ala or Ser (2,18). The homology model has been used to successfully redesign the enzyme's specificity, thereby supporting its validity. Very recently the three-dimensional structure of the protease has been solved by two different groups (20,21), confirming the presence of a phenylalanine ring pointing into the pocket. The sequences of the four NS3 cleavage sites are listed in Table I, indicating that the intramolecular cleavage site between NS3 and NS4A differs from the consensus in having a Thr in the P1 position.
Substrate specificity of the NS3 protease has been investigated by several groups in a qualitative way using transient transfection (22,23), in vitro translation (24), or intracellular processing of fusion proteins in Escherichia coli (25). Availability of quantitative data using peptidic substrates has so far been hampered by difficulties in expressing and purifying sufficient amounts of enzymatically active recombinant NS3 protease. We (17,26) and others (15,20,21,27-33) have recently described efficient heterologous expression and purification of the enzyme and were able to define optimized conditions for the determination of protease activity (26).
In the present work we investigate the substrate specificity of the NS3 protease domain by introducing systematic modifications in a peptide substrate derived from the sequence of the NS4A/NS4B junction. Activity on modified substrates has been determined both in the presence and absence of the NS4A co-factor. The results are discussed in the light of our previous homology model of the S1 pocket of the enzyme.

MATERIALS AND METHODS
Enzyme Preparation-The protease domain of the HCV Bk strain NS3 protein encompassing residues 1027-1206 of the viral polyprotein was purified from E. coli as described previously (26). The enzyme was homogeneous as judged from silver-stained SDS-polyacrylamide gel electrophoresis and >95% pure as judged from reversed phase HPLC performed using a 4.6 × 250-mm Vydac C4 column. The enzyme preparations were routinely checked by mass spectrometry done on HPLC-purified samples using a Perkin Elmer API 100 instrument and by N-terminal sequence analysis carried out using Edman degradation on an Applied Biosystems model 470A gas-phase sequencer. Both techniques indicated that in more than 90% of the enzyme molecules the N-terminal methionine and alanine had been removed, yielding an enzyme starting with Pro2. Enzyme stocks were made 50% in glycerol content, quantified by amino acid analysis or by determining their absorbance at 280 nm (17), shock-frozen in liquid nitrogen, and kept in aliquots at -80°C until use. Control experiments had shown that this freezing procedure does not affect specific activity of the enzyme.
Peptides and HPLC Assays-Peptide synthesis was performed on a NovaSyn Gem flow synthesizer by Fmoc/t-Bu chemistry. Protecting groups were as follows: Nα (Fmoc), Asp(Ot-Bu), Glu(Ot-Bu), Tyr(t-Bu),
Concentration of stock solutions of peptides, prepared in Me2SO and kept at -80°C until use, was determined by quantitative amino acid analysis performed on HCl-hydrolyzed samples.
Cleavage assays were performed in 27 µl of 50 mM Tris, pH 7.5, 2% CHAPS, 50% glycerol, 10 mM dithiothreitol, to which 3 µl of substrate peptide in Me2SO, leading to 10% final Me2SO concentration, were added. Enzyme concentrations (50 nM to 6 µM) and incubation times were chosen to obtain <20% substrate conversion. Where indicated, Pep4A having the sequence GSVVIVGRIILSGR(NH2) was added at a final concentration of 3 µM. Reactions were stopped by addition of 70 µl of 0.1% trifluoroacetic acid. Cleavage of peptide substrates was determined by HPLC using a Merck-Hitachi chromatograph equipped with an autosampler. 90-µl samples were injected on a Lichrospher C18 reversed phase cartridge column (4 × 125 mm, 5 µm, Merck) or on a Beckman Ultrasphere ODS column (4.6 × 250 mm) and fragments were separated using a 3-100% acetonitrile gradient at 2%/min. Peak detection was accomplished by monitoring both the absorbance at 220 nm and fluorescence (λex = 260 nm, λem = 305 nm).
Cleavage products were quantitated by integration of chromatograms with respect to appropriate standards. Initial rates of cleavage were determined on samples having <20% substrate conversion. Kinetic parameters were calculated from least-squares fit of initial rates as a function of substrate concentration using Kaleidagraph software, assuming Michaelis-Menten kinetics. Where individual determination of kcat and Km was not possible, kcat/Km values were calculated from the slope of the linear part of the Michaelis-Menten plot at substrate concentrations < Km. All experiments were repeated at least three times.
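The fitting procedure described above can be illustrated with a minimal sketch. The original analysis was done in Kaleidagraph; the Python below is only a stand-in and not that software's algorithm. All function names are hypothetical, and any numbers passed to them are illustrative rather than data from this study. The sketch fits Vmax and Km by least squares (a coarse scan over Km followed by a golden-section refinement) and, separately, estimates Vmax/Km from the slope of the initial linear region, which is the fallback used when kcat and Km cannot be determined individually.

```python
import math

def mm_rate(s, vmax, km):
    """Michaelis-Menten initial rate at substrate concentration s."""
    return vmax * s / (km + s)

def _vmax_for(km, s_vals, v_vals):
    # For a fixed Km the model is linear in Vmax, so the least-squares
    # Vmax has a closed form.
    f = [s / (km + s) for s in s_vals]
    return sum(v * fi for v, fi in zip(v_vals, f)) / sum(fi * fi for fi in f)

def _sse(km, s_vals, v_vals):
    # Sum of squared residuals at this Km, using the best Vmax for it.
    vmax = _vmax_for(km, s_vals, v_vals)
    return sum((v - mm_rate(s, vmax, km)) ** 2 for s, v in zip(s_vals, v_vals))

def fit_mm(s_vals, v_vals, km_lo=1e-3, km_hi=1e5, grid=400, iters=100):
    """Least-squares estimate of (Vmax, Km).

    A coarse scan over log10(Km) locates the best grid point; a
    golden-section search then refines Km between its two neighbours.
    """
    lo, hi = math.log10(km_lo), math.log10(km_hi)
    step = (hi - lo) / (grid - 1)
    logs = [lo + i * step for i in range(grid)]
    best = min(range(grid), key=lambda i: _sse(10 ** logs[i], s_vals, v_vals))
    a = logs[max(best - 1, 0)]
    b = logs[min(best + 1, grid - 1)]
    phi = (math.sqrt(5) - 1) / 2
    for _ in range(iters):
        c = b - phi * (b - a)
        d = a + phi * (b - a)
        if _sse(10 ** c, s_vals, v_vals) < _sse(10 ** d, s_vals, v_vals):
            b = d
        else:
            a = c
    km = 10 ** ((a + b) / 2)
    return _vmax_for(km, s_vals, v_vals), km

def slope_through_origin(s_vals, v_vals):
    """Slope of v versus [S] through the origin.

    For points with [S] well below Km this approximates Vmax/Km;
    dividing by the enzyme concentration gives kcat/Km.
    """
    return sum(s * v for s, v in zip(s_vals, v_vals)) / sum(s * s for s in s_vals)
```

Dividing the fitted Vmax by the active enzyme concentration gives kcat. With real, noisy data a dedicated fitting package would also report parameter uncertainties, which this sketch does not attempt.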
To determine Ki values, initial velocities were determined as a function of substrate concentration in the presence of three fixed competing inhibitor concentrations. Ki values were calculated from least-squares fit using a modified Michaelis-Menten equation (Equation 1), (34) were added to 100 nM enzyme solution added from a 5 µM stock, previously quantitated by amino acid analysis. 50-µl samples were withdrawn at timed intervals and immediately quenched by addition of 50 µl of 1% trifluoroacetic acid. A total of 48 data points were collected within 3 min of reaction. The pre-steady state burst expected to equal the concentration of active sites in the enzyme preparation was determined by extrapolation of product formation to zero time.
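Equation 1 is not reproduced in the text above. Purely as an illustration of the Ki determination, the sketch below assumes the standard competitive-inhibition form of the Michaelis-Menten equation, v = Vmax[S] / (Km(1 + [I]/Ki) + [S]). This is an assumption and may differ from the equation the authors actually used; the function names are hypothetical.

```python
def competitive_rate(s, i, vmax, km, ki):
    """Initial rate with a competitive inhibitor at concentration i.

    Assumed model: the inhibitor raises the apparent Km by the factor
    (1 + i / ki) and leaves Vmax unchanged.
    """
    return vmax * s / (km * (1.0 + i / ki) + s)

def ki_from_apparent_km(km, km_app, i):
    """Solve Km_app = Km * (1 + [I] / Ki) for Ki."""
    if km_app <= km:
        raise ValueError("apparent Km must exceed Km for a competitive inhibitor")
    return i / (km_app / km - 1.0)
```

In practice the rates measured at all inhibitor concentrations would be fitted simultaneously to the assumed model instead of converting a single apparent Km.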

RESULTS
Estimation of Active Site Concentration-We first addressed the question of whether the activity of our protease preparation was attributable to a fully active enzyme or only to a fraction of enzymatically active protease molecules. To this purpose we used the fact that serine proteases follow a two-step mechanism: acylation of the active site serine residue with concomitant release of the C-terminal fragment of the substrate is followed by deacylation and the release of the N-terminal fragment. The former acylation reaction is usually the rate-limiting step for amide bond scission, whereas for ester substrates the acylation reaction is usually fast as compared with deacylation. Thus, for ester substrates the pre-steady-state release of the C-terminal leaving group creates an initial burst that equals the number of active sites encountered by the substrate (35).
To determine the number of active sites in our enzyme preparation, 100 µM fluorogenic ester substrate was added to 100 nM enzyme, the reaction was stopped at timed intervals, and samples were analyzed by HPLC (Fig. 1). Extrapolation of the linear phase of the reaction indicated a burst of 94 ± 10 nM, demonstrating that 94% of the protease molecules in our preparation were enzymatically active.
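The extrapolation step can be sketched as follows, assuming that only time points from the linear (steady-state) phase are passed in. The function names are hypothetical and the code is an illustration of the principle, not the authors' analysis: a straight line is fitted to product versus time, and its zero-time intercept is taken as the burst amplitude.

```python
def linear_fit(x_vals, y_vals):
    """Ordinary least-squares line; returns (intercept, slope)."""
    n = len(x_vals)
    mean_x = sum(x_vals) / n
    mean_y = sum(y_vals) / n
    sxx = sum((x - mean_x) ** 2 for x in x_vals)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_vals, y_vals))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def active_fraction(times, product, enzyme_conc):
    """Burst amplitude (zero-time intercept) divided by total enzyme.

    `product` and `enzyme_conc` must be in the same concentration units.
    """
    burst, _steady_state_rate = linear_fit(times, product)
    return burst / enzyme_conc
```

With an intercept of 94 nM and 100 nM total enzyme, as reported above, the active fraction is 0.94.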
Substrate Specificity-We investigated the substrate specificity of the NS3 protease by introducing several modifications into a decamer peptide whose sequence was based on the NS4A-NS4B junction. The choice of this sequence was determined by the relative ease of synthesis (the other two NS3 trans-cleavage sites have two cysteine residues that are prone to oxidation), by the fact that the corresponding junction is cleaved with a relatively high efficiency in the context of the polyprotein, and by previous mutagenesis studies showing that this junction is intermediate with respect to its sensitivity to mutations (22). All experiments were done in buffer containing 50% glycerol and 2% CHAPS. These conditions have been worked out to yield optimum activity (26). However, glycerol might exert differential effects on the binding of polar and non-polar substrates. Since we found that glycerol activation is a very complex phenomenon, affecting both Km and kcat values of substrates, and in addition having a dramatic effect on the dissociation constant of the NS3-Pep4A complex, no attempts were made to study the influence of this agent on the interaction of the enzyme with different substrates. Our data therefore have to be interpreted with the caveat that they have been obtained in the presence of this not entirely passive solvent.
P1 Substitutions-The consensus sequence of the NS3-dependent junctions within the HCV polyprotein points to conservation of negatively charged residues in the P6 position, a cysteine or a threonine residue in the P1, and an alanine or a serine residue in the P1′ positions (Table I). In an attempt to better define the P1 specificity of the enzyme a series of cysteine substitutes, with a special focus on both natural and non-natural small hydrophobic residues, were introduced in the P1 position using a decamer NS4A-NS4B substrate spanning from P6 to P4′. This length has been shown by earlier work to be the optimum compromise between number of residues, cleavage efficiency, and ease of detectability by HPLC (26). Most P1 substitutions resulted in uncleaved substrates (Table II). We have estimated that our cleavage detection limit was in the order of kcat/Km = 0.05 M⁻¹ s⁻¹. Among the natural amino acids only threonine was accepted in P1 in the absence of Pep4A, while addition of the co-factor also allowed cleavage, albeit at barely detectable levels, of a substrate having a valine residue in P1. The best cysteine substitutes turned out to be homocysteine, having the side chain length increased by one -CH2- unit with respect to cysteine, and allylglycine, in which the sulfhydryl group of cysteine is replaced by an ethenyl group. Still, these P1 residues decreased cleavage efficiency by 5-10-fold, both in the absence and presence of Pep4A. Replacement of the SH group by an amino or a hydroxyl group (in diaminopropionic acid and serine, respectively) abolished cleavage, as did incorporation of the SH group in a thiophene ring or its carboxymethylation. Replacement of the cysteine sulfhydryl by a methyl group in aminobutyric acid was compatible with cleavage of the resulting decamer, although at the expense of a 15-fold (+Pep4A) to 50-fold (-Pep4A) decrease in the respective kcat/Km values.
Increasing the length of the side chain by incorporating norvaline in the P1 position resulted in a peptide that was cleaved only in the presence of Pep4A.
To determine whether the inability of the protease to cleave substrates with certain P1 substituents was due to poor ground state binding or to their inability to proceed through the transition state, we determined the Ki values of some selected peptides using the decamer peptide with cysteine in P1 as substrate. For amide substrates of serine proteases the relationship Ki ≈ Km ≈ Kd usually holds, indicating that Ki values are very good approximations of the true dissociation constants of the enzyme substrate complex (36, 37). We verified this experimentally by determining both parameters for the substrate having Abu as P1 residue and obtained (in the presence of Pep4A): Km = 97 ± 28 µM and Ki = 133 ± 15 µM. Next, we determined the Ki values of decamer peptides that had alanine, proline, phenylalanine, or serine in P1 and were therefore not cleaved (Table III). Interestingly, the Ki values differ only by a factor of 2-8 from the Km value determined for the wild type substrate, indicating that the P1 residue makes relatively minor contributions to ground state binding. This is true even for a bulky substituent such as phenylalanine.
P6 and P1′ Substitutions-To investigate the role of the side chain length of the P6 residue we substituted the aspartic acid residue present in the NS4A-NS4B sequence by a glutamic acid. This substitution, although slightly affecting kcat and Km individually, had no significant effect on the overall cleavage efficiency (Table IV). Neutralization of the negative charge by introduction of an asparagine residue decreased the cleavage efficiency by a factor of 5. This effect was attributable mainly to an increase in Km. When the charge in the P6 position was inverted through introduction of a lysine residue, a pronounced decrease in kcat/Km was observed, which was again attributable to an impairment in ground state binding of the resulting substrate, as judged from the increase of the respective Km values. All these effects were less pronounced in the presence of the Pep4A cofactor.
Substitution of the P1′ alanine residue by a serine residue, which is found in this position in other substrates, moderately (2-5-fold) decreased kcat/Km (Table IV). Conversely, introduction of a phenylalanine residue in P1′ decreased kcat/Km by 2 orders of magnitude in the absence of the co-factor and 25-fold in its presence.
Substrate Alanine Scanning-To investigate the relative contribution to cleavage efficiency of positions other than P1, P6, and P1′ we performed an alanine scanning experiment. The experiment was also repeated in the presence of saturating amounts of Pep4A. Table V summarizes the results. Only the substitution of the P1 cysteine resulted in complete abolishment of cleavage both in the presence and absence of the co-factor. Introduction of alanine residues in other positions had only slight effects on cleavage efficiency. The largest effect (a 7-fold decrease in efficiency) was observed for the P3 position in the absence of Pep4A.
Inverse Alanine Scanning of a "Minimalist" Substrate-The picture that emerges from our data confirms the importance of the consensus P1 and P1′ residues, and to a lesser extent also of the P6 residue, in determining cleavage efficiency of a given substrate. Still, there are remarkable differences in the kinetic behavior of the single trans cleavage sites which nevertheless contain the same consensus P6, P1, and P1′ residues (22,23). We have recently shown that these differences can be reproduced using decamer peptide substrates (26). Thus, there must exist additional determinants. Failure to detect them in the above mentioned alanine scanning experiment might indicate that it is the sum of several minor contributions that modulates the recognition of substrates containing the consensus residues in P6, P1, and P1′. To start to address this issue we synthesized a polyalanine peptide containing the consensus P6, P1, and P1′ residues. Based on the results of the alanine scanning, which pointed to the P3 residue as an important contribution to cleavage efficiency, we decided also to fix a glutamic acid in this position. We extended the peptide to the P6′ residue, which, being a tyrosine, facilitated HPLC detection of cleavage products via monitoring of tyrosine fluorescence. We further introduced a lysine residue in P7′.
The parent peptide Ac-DAAEACAAAAPYK was cleaved, but cleavage was slowed down 70-85-fold with respect to the wild type sequence. Inspection of the sequences of the natural cleavage sites reveals that a negatively charged residue is conserved in the P5 position of two out of four cleavage sites. Re-introduction of the wild type aspartate in our minimalist substrate indeed resulted in a modest (~2-fold) enhancement of cleavage efficiency (Table VI).
In the P4′ position of natural substrates there appears to be a preference for hydrophobic residues (Table I). As a matter of fact, Leu, Tyr, or Trp residues are found in this position. Introduction of Tyr into the P4′ position of the minimalist substrate had no detectable effect, whereas Leu modestly increased cleavage rates. Both P4′ substituted peptides were very insoluble, thus not permitting individual determinations of kcat and Km. When both P3 Glu and P4′ Leu were re-introduced in the sequence a 20-23-fold enhancement of cleavage efficiency was observed, yielding a substrate that was cleaved with 22-34% efficiency with respect to the wild type sequence.

DISCUSSION
The homology model of the specificity pocket of the NS3 protease predicts that both its shape and its physico-chemical environment are primarily determined by the presence of phenylalanine 213 (according to the chymotrypsin numbering). Furthermore, the pocket was predicted to be very hydrophobic and closed by the aromatic ring of the phenylalanine. While this article was in preparation the crystal structure of NS3 protease was solved independently by two groups (20,21). In the published structures the side chain of Phe213 is indeed pointing inside the S1 pocket, thereby confirming the model. As the sulfhydryl group of cysteine has been shown to favorably interact with the aromatic ring of phenylalanine, cysteine has been suggested to be the most reasonable P1 residue. Our kinetic data are in line with these predictions. In fact, P1 substitutions have shown the following order of preferences: Cys > hCys ≈ Alg > Abu > Thr > NVal > Val. Attempts at substituting the thiol group of cysteine with a hydroxyl group in serine or an amino group in diaminopropionic acid resulted in uncleaved peptides. Most likely these side chains are too hydrophilic to favorably interact with the hydrophobic milieu of the S1 pocket.
Increasing the side chain length of the P1 residue in homocysteine and in allylglycine resulted in substrates that were still reasonably well cleaved. In contrast, incorporation of NVal, which has the same side chain length, resulted in a more pronounced impairment of cleavage efficiency. This could be related to a favorable interaction of the SH or the allyl group with the phenylalanine ring in the pocket, which is not expected to occur in the case of the methyl group in NVal. The kinetics of Thr and Val substituted peptides, the only two branched residues for which we could detect cleavage, deserve some further comment. The peptide substrate containing Val in P1 was detectably cleaved only in the presence of Pep4A and with a relative efficiency that was 15-fold lower than that observed for Thr in the P1 position. As a matter of fact, the isopropyl branch seems detrimental to productive transition state binding, as judged by the preference of NVal over Val. Fig. 2 shows a schematic view of the S1 pocket together with a cysteine docked into the pocket and a comparison of the conformation of this cysteine with the conformers of Thr and Val most commonly found in proteins. In this conformation the Val side chain is more likely to encounter steric hindrance in contacting the pocket than would be expected for Thr (Fig. 2). Alternatively, it could be assumed that for both Thr and Val only a methyl group of the side chain will point into the S1 pocket, whereas the other branch (a hydroxyl group in Thr and a methyl group in Val) will point out of the pocket. In this view, the fact that Thr is preferred over Val could indicate that its hydroxyl group will make some contacts outside the pocket that cannot be made by the methyl group of Val.
It is interesting to compare our data obtained with peptidic substrates to previous reports in which point mutations were introduced into polyprotein substrates. Kolykhalov and coworkers (22) have shown that susceptibility to mutations depends on the sequence context, the NS4A/NS4B cleavage site being intermediate between the least sensitive NS3/NS4A junction and the most sensitive NS5A/NS5B cleavage site. Among several substitutions, only Arg and Asp in the P1 position of the NS4A/NS4B junction resulted in complete abolishment of cleavage, while a gradient following the order Asn < Gly < Ser < Thr < Cys < Leu was observed under conditions of short metabolic labeling pulses. Remarkably, in this experiment Leu proved to be even superior to Cys as P1 residue. In similar experiments Bartenschlager et al. (23) found that in the context of a NS4A/NS4B junction cleavage was reduced but still well detectable upon substitution of the P1 Cys with Phe, Ser, Thr, and Ala. Clearly, these findings are at variance with our data. However, both the experimental context and the nature of the substrates that have been used might explain these differences. In fact, in transient transfection experiments, even using short labeling times, the accumulation of considerable amounts of substrate and cleavage products is inevitable, leading to deviation from true initial rates. This fact compresses differences in cleavage efficiencies. Furthermore, it is possible that NS3 is more active on polyprotein substrates than it is on peptide substrates, thereby being less discriminative against suboptimal P1 residues. Differences in specific activity using either polyprotein or peptidic substrates have been reported for other proteases such as CMV protease (38) or tPA (39). Using in vitro translated substrates based on the NS5A/NS5B junction we have estimated the specific activity of added purified protease to be in the order of kcat/Km = 200,000 M⁻¹ s⁻¹ (Footnote 2). Nevertheless, we wanted to rule out that the relatively low activities we were observing using peptidic substrates were due to an only partially active enzyme. We have shown that we were indeed working with an enzyme population that was composed of more than 90% of enzymatically active molecules, indicating that the activities we measured under our experimental conditions were intrinsic features of the enzyme.
Footnote 2: R. De Francesco, unpublished observations.
FIG. 2. Schematic view of the model of the S1 pocket of the NS3 protease. Solid lines indicate the backbone of residues flanking the S1 pocket. Cysteine is docked into the pocket with its side chain pointing in the direction of Phe213. Cys, Thr, and Val are shown in the conformations most commonly found in proteins.
We found that optimized S1 pocket occupancy was a major determinant of kcat, whereas the P1 residue exerted a less pronounced effect on ground state binding of substrate peptides. Incorporation in the P1 position of residues that reduced cleavage of the resulting peptide to undetectable levels (at least 100-fold) resulted in unaltered or only up to 8-fold decreased affinities, as judged from the respective Ki values. Interestingly, this turned out to be true even for residues that were expected to cause significant steric or conformational perturbations such as phenylalanine or proline. This behavior sheds some light on the mechanisms of substrate recognition by the NS3 protease: apparently ground state binding of the substrate is mediated by multiple interactions involving distal residues, whereas the efficiency with which the bound substrate will proceed through the transition state is strongly influenced by the nature of the residue in the P1 position. This dual requirement is probably needed to endow the enzyme with the high degree of specificity necessary to accomplish its physiological role of mediating the generation of the mature HCV replication machinery.
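The reasoning behind this conclusion can be made explicit with a small arithmetic sketch. It assumes, as the text does for amide substrates, that Ki approximates Km for the uncleaved peptides, so that a fold decrease in kcat/Km equals the fold decrease in kcat multiplied by the fold increase in Km. The function name is hypothetical.

```python
def min_kcat_decrease(efficiency_decrease, affinity_decrease):
    """Lower bound on the fold decrease in kcat.

    efficiency_decrease: fold decrease in kcat/Km (a lower bound when
        cleavage is undetectable).
    affinity_decrease: fold increase in Km, approximated here by the
        fold increase in Ki (an assumption).
    """
    return efficiency_decrease / affinity_decrease

# An at least 100-fold loss in kcat/Km with an at most 8-fold loss in
# affinity implies an at least 12.5-fold loss in kcat.
print(min_kcat_decrease(100.0, 8.0))  # prints 12.5
```

When affinity is unaltered, the full 100-fold or greater loss must come from kcat, which is why the P1 residue is assigned mainly to transition state binding.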
In the P6 position a conserved negative charge is present in all cleavage sites. From our data it appears that, at least for the NS4A/NS4B junction there is a preference but no stringent requirement for this negative charge. This finding is in agreement with what has been found by others through introduction of point mutations in polyprotein substrates (22,23). In fact, it has been reported that, in the context of different cleavage sites, extensive mutagenesis of the P6 position has little if any effect on cleavage efficiency. The absolute conservation of this residue especially in the light of the pronounced variability of the HCV genome might therefore indicate that it serves some more subtle function.
We attempted to identify additional crucial determinants of recognition by the protease by both "classical" alanine scanning and "inverse alanine scanning" using a polyalanine substrate with 4 non-alanine positions. Re-introduction of wild type sequence residues in the minimalist polyalanine substrate causes a gradual but apparently synergistic increase in cleavage efficiencies. Thus, re-introduction of the wild type P5 and P4′ residues singly into the minimalist substrate results in 2-2.5-fold cleavage enhancement, whereas pairwise re-introduction of both residues results in a more than 20-fold increase in kcat/Km.
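One way to quantify the synergy described above is to compare the observed pairwise enhancement with the product of the single enhancements, which is what independent contributions to kcat/Km would give. This is a sketch under that independence assumption, with a hypothetical function name; the example values are the approximate fold changes quoted in the paragraph.

```python
import math

def synergy_factor(single_folds, combined_fold):
    """Observed combined fold enhancement divided by the product of the
    individual fold enhancements (the expectation for independent effects)."""
    expected = math.prod(single_folds)
    return combined_fold / expected

# Single re-introductions of about 2-fold and 2.5-fold would predict a
# 5-fold combined effect; the observed enhancement of more than 20-fold
# therefore corresponds to a synergy factor of at least 4.
print(synergy_factor([2.0, 2.5], 20.0))  # prints 4.0
```

A factor near 1 would indicate independent contributions, whereas a value well above 1 is consistent with cooperative contacts at distal sites.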
The picture emerging from this study points to a rather permissive substrate binding site, where the only absolute requirement for cleavage is a small, hydrophobic P1 residue. Several minor contributions arising from contacts with distal residues then cooperate in modulating substrate recognition and cleavage. Our findings can be rationalized in the context of the recently solved structure of the NS3 protease (20,21): the substrate binding channel is relatively solvent exposed, with all the contributing loops being shorter or absent in comparison with the other serine proteases of similar fold, like chymotrypsin or elastase. Moreover, substrate modelling in the active site suggested that major binding contributions should come from the P6-P2 peptide backbone, with an apparent lack of P5-P2 side chain to enzyme interactions (21). Overall, it appears that a deeper understanding of how substrate recognition by NS3 is fine-tuned, possibly by the use of combinatorial peptide libraries spanning several simultaneous mutations, will be necessary to guide the design of substrate-based inhibitors of NS3.