Biosynthesis, Purification, and Substrate Specificity of Severe Acute Respiratory Syndrome Coronavirus 3C-like Proteinase

The 3C-like proteinase of severe acute respiratory syndrome (SARS) coronavirus has been proposed to be a key target for structure-based drug design against SARS. In order to understand the active form and the substrate specificity of the enzyme, we have cloned, expressed, and purified SARS 3C-like proteinase. Analytic gel filtration shows a mixture of monomer and dimer at a protein concentration of 4 mg/ml and mostly monomer at 0.2 mg/ml, which corresponds to the concentration used in the enzyme assays. The linear decrease of the specific enzymatic activity with decreasing enzyme concentration revealed that only the dimeric form is active and that the dimeric interface could be targeted for structure-based drug design against SARS 3C-like proteinase. By using a high pressure liquid chromatography assay, SARS 3C-like proteinase was shown to cut the 11 peptides covering all of the 11 cleavage sites on the viral polyprotein with different efficiencies. The two peptides corresponding to the two self-cleavage sites are the two with the highest cleavage efficiency, whereas peptides with non-canonical residues at the P2 or P1′ positions react more slowly. The P2 position of the substrates seems to favor large hydrophobic residues. Secondary structure studies of the peptide substrates revealed that substrates with more β-sheetlike structure tend to react faster. This study provides a basic understanding of the enzyme catalysis and a full substrate specificity spectrum for SARS 3C-like proteinase, which are helpful for structure-based inhibitor design against SARS and other coronaviruses.

The outbreak of a severe atypical pneumonia in early 2003 caused 8422 cases and 916 related deaths. The World Health Organization has designated the illness severe acute respiratory syndrome (SARS). A novel coronavirus has been identified as the major cause of SARS (1,2). The genome of SARS coronavirus was sequenced within a short period of time after confirmation of the virus (3,4). Currently, 23 genome sequences of different variants of SARS coronavirus have been released at the NCBI web site (www.ncbi.nlm.nih.gov/). Coronaviruses are positive-stranded RNA viruses featuring the largest viral RNA genomes known to date. The SARS coronavirus replicase gene encompasses two overlapping translation products, polyproteins 1a (~450 kDa) and 1ab (~750 kDa), which are conserved in both length and amino acid sequence with other coronavirus replicase proteins. Polyproteins 1a and 1ab are cleaved by the internally encoded 3C-like proteinase to release functional proteins necessary for virus replication. The SARS 3C-like proteinase is fully conserved among all of the released SARS coronavirus genome sequences and is highly homologous with other coronavirus 3C-like proteinases.
Two crystal structures of coronavirus 3C-like proteinase, from transmissible gastroenteritis virus (TGEV) (5) and human coronavirus (hCoV) 229E (6), have been solved. The structure of coronavirus 3C-like proteinase contains three domains. The first two domains form a chymotrypsin fold, which is responsible for the catalytic reaction, and the third domain is α-helical with unclear biological function. Coronavirus 3C-like proteinase shares the chymotrypsin fold with the 3C proteinases of other viruses, such as the picornavirus rhinovirus (7,8). The 3C proteinase of rhinovirus has been used as a target to develop drugs against the common cold (9-15). Because of the functional importance of SARS 3C-like proteinase in the viral life cycle, it has been proposed to be a key target for structure-based drug design against SARS (6). Homology modeling of the SARS 3C-like proteinase has been performed by various groups (6,16,17), and the conformational flexibility of the substrate-binding site has been studied (17). Virtual screening of chemical compound libraries has suggested possible inhibitors (16). An 8-mer peptide has been docked onto the model of SARS 3C-like proteinase to study the possible interactions between the protein and the substrate (18). As for other coronaviruses, sequence analysis revealed 11 cleavage sites of the 3C-like proteinase on the SARS polyprotein. The substrate specificity of coronavirus 3C-like proteinase is determined mainly by the P1, P2, and P1′ positions (19). The P1 position has a well conserved Gln residue, and the P2 position has a hydrophobic one. Unlike other previously identified coronavirus 3C-like proteinases, which have Leu/Ile at position P2, SARS 3C-like proteinase also tolerates Phe, Val, and Met residues at the P2 position.
To study the substrate specificity of SARS 3C-like proteinase, we have cloned, expressed, and purified the protein and studied its activity toward 11 peptides covering the 11 cleavage sites on the virus polyprotein. Our results confirm that purified SARS 3C-like proteinase is active toward substrate peptides mapped from the cleavage sites on the polyprotein and reveal the substrate requirements of the proteinase-binding site. This study helps to understand the mechanism of SARS polyprotein processing and provides clues for drug design.

EXPERIMENTAL PROCEDURES
Cloning of SARS-CoV 3C-like Proteinase-The reverse transcription mixture of SARS-CoV RNA, prepared with random primers from the supernatant fluid of virus-infected Vero cells, was generously supplied by Dr. Y. Lu from Zhejiang Provincial Center for Disease Prevention and Control. For cloning of the cDNA of the 3C-like proteinase of the virus, the first-strand cDNA mixture was subjected to PCR amplification using a pair of specific primers, F99 (5′-AGT GGT TTT AGG AAA ATG GCA TTC CC-3′) and R108 (5′-TTG GAA GGT AAC ACC AGA GC-3′), to amplify a 917-bp fragment containing the full-length 3C-like proteinase coding sequence. The PCR products were purified by agarose gel electrophoresis and then cloned directly into pGEM T Easy vector (Promega, Madison, WI). The resultant sequence confirmed that the amplified fragment was the same as that of SARS-CoV 3C-like proteinase.
Expression of 3C-like Proteinase-pET 3CLP-21x was transformed into Escherichia coli BL21(DE3) cells. Cultures were grown at 37°C in 1 liter of LB medium containing ampicillin (100 μg/ml) until the A600 reached 0.8 and then induced with 0.5 mM isopropyl-1-thio-β-D-galactopyranoside at 30°C for 3 h. The cells were harvested by centrifugation at 5000 × g for 10 min. The pelleted cells were suspended in buffer A (40 mM Tris-HCl, pH 8.0, 100 mM NaCl, 10 mM imidazole, 7.5 mM 2-mercaptoethanol) at 2% of the original culture volume. After cell lysis by sonication, the cell lysate was clarified by centrifugation at 24,000 × g for 20 min. The filtered supernatant was applied to a nickel-nitrilotriacetic acid column (Qiagen) equilibrated with 50 ml of buffer A. After the column was washed with 100 ml of buffer A, the 3C-like proteinase was eluted with a gradient of 1-100% buffer B (40 mM Tris-HCl, pH 8.0, 100 mM NaCl, 250 mM imidazole, 7.5 mM 2-mercaptoethanol). The eluted enzyme was concentrated and loaded on a Sephacryl S-200 HR gel filtration column (Amersham Biosciences) equilibrated with 180 ml of buffer C (40 mM Tris-HCl, pH 8.0, 100 mM NaCl, 7.5 mM 2-mercaptoethanol). After elution with another 180 ml of buffer C, we obtained 3C-like proteinase of over 95% purity.
Analytic Gel Filtration-The aggregation state of the SARS 3C-like proteinase was analyzed using a Superdex 75 HR column (Amersham Biosciences) on an ÄKTA fast protein liquid chromatography system. Freshly purified protein was diluted to 4 and 0.2 mg/ml and equilibrated at room temperature for 2 h. 400 μl of the 4 mg/ml sample and 2 ml of the 0.2 mg/ml sample were injected into the Superdex 75 HR column and eluted with buffer (40 mM Tris-HCl, pH 8.0, 100 mM NaCl, 7.5 mM 2-mercaptoethanol) at a flow rate of 0.5 ml/min. The eluted peaks were monitored at 280 nm.
CD Spectra-All of the CD spectra of the proteinase and the substrate peptides were recorded on a Jobin Yvon CD 6 spectrometer at 20°C. The CD spectra of 3C-like proteinase were recorded in 40 mM Tris-HCl buffer, pH 8.0. For the near-UV CD spectrum, a cell with a path length of 1 mm and a proteinase concentration of 544 μM were used, whereas a cell with a path length of 0.1 mm and a 54.4 μM proteinase solution were used for the far-UV CD spectrum. The substrate peptides were dissolved in 20 mM Tris-HCl buffer, pH 7.3, to a final concentration of 2 mM. A cell with a path length of 0.1 mm was used. Each spectrum was the average of four scans, corrected by subtracting a spectrum of the buffer solution without proteinase or peptide recorded under identical conditions. Each scan, in the range of 184-260 nm for far-UV CD and 250-320 nm for near-UV CD spectra, was obtained by taking data points every 0.5 nm with an integration time of 1 s and a 2-nm bandwidth. The thermal denaturation curve was recorded by CD at 218 nm from 10 to 90°C at intervals of 0.5°C, using the same conditions as for the far-UV CD spectrum. Secondary structure content was calculated using the program VARSLC1 (20).
Synthesis of Substrate Peptides-The substrate peptide S01 was synthesized by solid-phase peptide synthesis using the standard Fmoc (N-(9-fluorenyl)methoxycarbonyl)/tert-butyl strategy (21). Cleavage of the peptide from Rink resin and removal of all of the side-chain protecting groups were achieved in trifluoroacetic acid solution. The crude peptide was purified by reversed-phase high performance liquid chromatography (RP-HPLC, LabPrep System, Gilson) on a Vydac C18 semipreparative column (218TP510, 10 by 250 mm, Vydac) with gradients of water/acetonitrile containing 0.1% trifluoroacetic acid. Peptide homogeneity and identity were analyzed by analytical HPLC and matrix-assisted laser desorption/ionization time-of-flight mass spectroscopy (MALDI-TOF MS), respectively.
The other substrate peptides, S02 to S11, were purchased at HPLC purity from GL Biochemistry Ltd. (Shanghai, China).
Peptide Cleavage-The proteolytic activity of the SARS 3C-like proteinase was determined by a peptide cleavage assay. Peptide S01 (Table I) was used as substrate and was incubated with the enzyme in Tris-HCl buffer, pH 7.3, at room temperature. The cleavage mixture was analyzed by RP-HPLC. To verify the cleavage site on the substrate peptide, the two products were purified by semi-preparative RP-HPLC using a 15-min 0-50% linear gradient of acetonitrile in 0.1% trifluoroacetic acid and lyophilized. The relative molecular weights of the products were identified by MALDI-TOF MS (BIFLEX III time-of-flight mass spectrometer, Bruker).
The relative enzyme activity at different pH values was determined in citric acid/phosphate buffer (pH 5, 6, 7, and 8) or glycine/NaOH buffer (pH 9 or 10) containing 6.8 mM dithiothreitol, 2 mM S01 as substrate, and 2.14 μM SARS 3C-like proteinase in a final volume of 50 μl. The cleavage reaction was stopped after 20 min by the addition of 50 μl of 0.1% trifluoroacetic acid aqueous solution and analyzed by RP-HPLC (LabPrep System, Gilson) on a Zorbax C18 analytic column (4.6 × 250 mm, Agilent). Cleavage products were resolved using a 15-min 0-50% linear gradient of acetonitrile in 0.1% trifluoroacetic acid.
To determine the kcat/Km for each substrate, 0.2 mM substrate peptide was incubated with SARS 3C-like proteinase in 40 mM Tris-HCl buffer, pH 7.3. The concentration of the enzyme varied from 0.90 to 22.5 μM because of the different cleavage activity toward different substrates. Reaction aliquots were removed at different times within 7 h and analyzed by RP-HPLC as described above. kcat/Km was determined by plotting the substrate peak area according to Equation 1, where PA is the peak area of the substrate peptide, cE is the total concentration of 3C-like proteinase, and C is an experimental constant. Km and kcat of the proteinase for selected substrates were determined by incubating the substrate peptide at concentrations varying from 2 to 0.1 mM with SARS 3C-like proteinase in 40 mM Tris-HCl buffer, pH 7.3, for 20 min, followed by RP-HPLC analysis as described above. The concentration of the enzyme varied from 1.07 to 17.1 μM because of the different cleavage activity toward different substrates. Peak areas were calculated by integration and converted to absolute units by using peptide standards. The reaction rate was calculated for all of the cleavage products of two experiments and averaged. Km and kcat were calculated from the Lineweaver-Burk plot.
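One form consistent with the definitions of PA, cE, and C given for Equation 1, assumed here rather than quoted, is the pseudo-first-order expression ln PA = C - (kcat/Km)·cE·t, which holds when the substrate concentration is well below Km. The Python sketch below fits that assumed form by ordinary least squares. The function name, the time points, the enzyme concentration, and the synthetic peak areas are all hypothetical and serve only to illustrate the calculation.

```python
import math


def fit_kcat_over_km(times_min, peak_areas, enzyme_conc_mM):
    """Estimate kcat/Km (mM^-1 min^-1) from substrate peak areas.

    Assumes ln(PA) = C - (kcat/Km) * c_E * t, i.e. pseudo-first-order
    substrate decay with [S] well below Km (an assumption, not a form
    quoted from the text).
    """
    ys = [math.log(pa) for pa in peak_areas]
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_y = sum(ys) / n
    # Least-squares slope of ln(PA) versus time.
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_min, ys))
    slope = sxy / sxx
    # The slope equals -(kcat/Km) * c_E under the assumed model.
    return -slope / enzyme_conc_mM


# Synthetic, noise-free illustration (all values hypothetical).
true_kcat_over_km = 10.0      # mM^-1 min^-1
enzyme_mM = 0.002             # 2 uM total enzyme
times = [0.0, 30.0, 60.0, 120.0, 240.0, 420.0]   # minutes, within 7 h
areas = [1000.0 * math.exp(-true_kcat_over_km * enzyme_mM * t) for t in times]

estimate = fit_kcat_over_km(times, areas, enzyme_mM)
print(f"fitted kcat/Km = {estimate:.3f} mM^-1 min^-1")
```

With real chromatograms the intercept absorbs the constant C, so only the slope and the enzyme concentration enter the estimate.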

Biosynthesis, Purification, and Secondary Structures of Recombinant SARS CoV 3CL Proteinase-The C-terminal His-tagged SARS 3C-like proteinase has been successfully expressed in E. coli and purified. Induction was first done at 37°C; as a result, the majority of the 3C-like proteinase was found in the insoluble fraction of the cell lysate. Induction with isopropyl-1-thio-β-D-galactopyranoside was then done at 30°C for 3 h; as a result, most of the 3C-like proteinase was found in the soluble fraction. The protein was purified on a nickel column followed by gel filtration on a Sephacryl S-200 HR column (Fig. 1). Approximately 10 mg of purified protein can be obtained from 1 liter of culture. The protein can be concentrated to 10 mg/ml in 50 mM Tris-HCl, 0.1 M NaCl, and 1 mM dithiothreitol, pH 7.3.
The CD spectra of SARS 3C-like proteinase are shown in Fig. 2. The far-UV and near-UV CD spectra show that the purified protein has well defined secondary and tertiary structures. The far-UV CD spectrum (Fig. 2a) shows a positive peak at 196 nm and two negative peaks at 209 and 222 nm, which clearly indicates a mixed α and β structure. The calculated secondary structure content is 26% α-helix and 23% β-sheet, similar to the values in the TGEV 3C-like proteinase crystal structure (22 and 26%, respectively). The near-UV CD spectrum (Fig. 2b) shows a broad positive peak at ~280 nm and a small positive shoulder at 291 nm, indicating a well folded tertiary structure. Thermal melting of the protein, monitored by the CD signal at 218 nm (Fig. 2c), gives a sigmoid denaturation curve and a Tm of 61°C, indicating a highly cooperative thermal denaturation.
Aggregation State and Enzyme Activity-Because both crystal structures of the 3C-like proteinase, from TGEV and from human coronavirus, show a dimer and the residues at the dimeric interface are conserved in coronaviruses, it has been proposed that the dimer may be the biologically functional form of the protein (5, 6). Dynamic light-scattering experiments show that both hCoV 229E and TGEV 3C-like proteinases exist as a mixture of monomer (65%) and dimer (35%) at a concentration of 1-2 mg/ml (6). We have performed analytical gel filtration of SARS 3C-like proteinase at different concentrations using a Superdex 75 column (Fig. 3). At a concentration of 4 mg/ml, two peaks corresponding to the monomeric and dimeric forms of the protein appeared, whereas only one peak corresponding to the monomeric form was found at the lower concentration of 0.2 mg/ml. The dissociation constant of the dimer was estimated to be around 100 μM. In vitro peptide or protein cleavage assays of coronavirus 3C-like proteinase have usually been performed with protein concentrations at the micromolar level (22,23). In this study, we also used a peptide cleavage assay with a SARS 3C-like proteinase concentration not exceeding 22.5 μM, which corresponds mainly to the monomeric form of the enzyme. This raises the interesting question of whether the minor amount of dimer plays the major role in catalysis, although the exact form of the protein in the cell under molecular crowding is still not clear.
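The gel filtration observations can be compared with what a simple monomer-dimer equilibrium predicts for the estimated dissociation constant. The Python sketch below is an illustration under stated assumptions, not the analysis used here: it takes Kd = [M]²/[D] with the total concentration counted in monomer units, and it assumes a subunit molecular mass of about 34 kDa, a value that is not given in the text. The function names are hypothetical.

```python
import math

KD_UM = 100.0          # estimated dimer dissociation constant (from the text)
ASSUMED_MW_KDA = 34.0  # assumed subunit mass; not stated in the text


def mg_per_ml_to_uM(conc_mg_ml, mw_kda=ASSUMED_MW_KDA):
    """Convert mg/ml (= g/l) to micromolar for a given subunit mass."""
    return conc_mg_ml / (mw_kda * 1000.0) * 1e6


def dimer_chain_fraction(total_uM, kd_uM=KD_UM):
    """Fraction of polypeptide chains that are part of a dimer.

    Assumes M + M <-> D, Kd = [M]^2 / [D], and total = [M] + 2[D].
    """
    # Positive root of 2[M]^2/Kd + [M] - total = 0.
    monomer = kd_uM * (math.sqrt(1.0 + 8.0 * total_uM / kd_uM) - 1.0) / 4.0
    return 1.0 - monomer / total_uM


for conc in (4.0, 0.2):
    total = mg_per_ml_to_uM(conc)
    frac = dimer_chain_fraction(total)
    print(f"{conc} mg/ml ~ {total:.1f} uM total, dimer fraction ~ {frac:.2f}")
```

Under these assumptions roughly half of the chains are dimeric at 4 mg/ml and about a tenth at 0.2 mg/ml, in line with the two peaks seen at the higher concentration and the single monomer peak at the lower one.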
To answer this question, the proteolytic activity of SARS CoV 3C-like proteinase at different enzyme concentrations was studied. Peptide S01 was used as substrate and was incubated with the proteinase in Tris-HCl buffer, pH 7.3, at room temperature. The cleavage mixture was analyzed by RP-HPLC. The peak area of the substrate S01 decreases as the reaction time increases, whereas the areas of two newly formed peaks increase. The products were collected and identified by MALDI-TOF MS, with experimental (M + H)⁺ values of 593 and 618, respectively, identical to the theoretical (M + H)⁺ values of the C-terminal pentapeptide and the N-terminal hexapeptide. This confirms that the substrate is cleaved by the SARS CoV 3C-like proteinase at the predicted scissile Gln-Ser peptide bond.
The observed kcat/Km was determined as described under "Experimental Procedures" at different proteinase concentrations between 4.5 and 0.9 μM (shown in Fig. 4). The observed kcat/Km increases linearly with increasing enzyme concentration. Using the estimated dissociation constant of ~100 μM, we can deduce from the linear dependence that the monomeric form of SARS CoV 3C-like proteinase has almost no catalytic activity, whereas the kcat/Km of the dimeric form is ~1.4 × 10³ mM⁻¹ min⁻¹. This result supports the previous prediction that the dimer of the 3C-like proteinase is the active form of the enzyme and also explains why the in vitro activity of CoV 3C-like proteinases is low compared with other, monomeric viral 3C proteinases: the concentration of the active dimeric form is low under the assay conditions.
Substrate Specificity-Eleven 11-mer peptides derived from the 11 cleavage sites of 3C-like proteinase on the SARS polyprotein have been synthesized and tested in the SARS 3C-like proteinase cleavage experiment (Table I). The proteolytic activity of expressed SARS CoV 3C-like proteinase at different pH values was tested as described under "Experimental Procedures." With peptide S01 as substrate (Fig. 5), the proteinase had the highest activity at around pH 7, and the activity decreased at other pH values, resulting in a bell-shaped curve. When the pH decreased to 5, the proteolytic activity of the enzyme decreased to 10% because of the protonation of the catalytic His-41.
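The linear dependence of the observed kcat/Km on enzyme concentration (Fig. 4) is what a monomer-dimer equilibrium predicts when only the dimer is active and the total enzyme concentration is far below the dissociation constant. The short derivation below is a sketch under assumptions not spelled out in the text: Kd is defined as [M]²/[D], cE is the total concentration in monomer units, and (kcat/Km)_D is a label introduced here for the specificity constant of the dimer.

```latex
\begin{align}
K_d &= \frac{[M]^2}{[D]}, \qquad c_E = [M] + 2[D] \\
c_E \ll K_d \;&\Rightarrow\; [M] \approx c_E, \qquad [D] \approx \frac{c_E^{2}}{K_d} \\
\left(\frac{k_{cat}}{K_m}\right)_{obs}
  &= \left(\frac{k_{cat}}{K_m}\right)_{D} \frac{[D]}{c_E}
  \approx \left(\frac{k_{cat}}{K_m}\right)_{D} \frac{c_E}{K_d}
\end{align}
```

In this limit a plot of the observed kcat/Km against cE is a straight line through the origin with slope (kcat/Km)_D/Kd, so the dimer value follows from the slope once Kd is known. A different convention for Kd or for counting active sites would change the result by a constant factor.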
To investigate the relationship between the proteolytic activity and the secondary structure of the substrates, far-UV CD spectra were recorded for all of the peptides at 2 mM (for S09 the peptide concentration was 0.2 mM because of its low solubility) (see Fig. 6a). The secondary structure contents were calculated using VARSLC1 (20) from the CD data in the range of 184-260 nm for each peptide, with the exception of S09 (which has very low solubility), and are summarized in Table II.
Although it is not obvious in the CD spectra, each peptide forms sheet structure to some extent. The calculated sheet content reflects the formation of a β-strandlike extended conformation in equilibrium with other turnlike or random conformations. Fig. 6b shows an interesting tendency in which substrate peptides with small kcat/Km have relatively less sheet content but more unordered structure, suggesting that the formation of a β-strandlike extended structure is favored for binding to the enzyme and proteolysis. As shown in the crystal structure of TGEV 3C-like proteinase, residues P5 to P3 of the substrate form an antiparallel β-sheet with segment 164-167 of the long strand eII of the enzyme on one side and with segment 186-191 on the other side (6). The binding modes of substrates to different coronavirus 3C-like proteinases were predicted to be identical by comparing the crystal structures of the substrate-binding regions of the free proteinases of hCoV and SARS CoV and of TGEV proteinase in complex with a hexapeptidyl chloromethyl ketone inhibitor (6). A substrate peptide that tends to form an extended β-strandlike conformation would bind preferentially to the enzyme, resulting in a small Km and a large kcat/Km. This is also supported by a previous study (25) on 3C proteinase in which helix-destabilizing residues were often found in close proximity to cleavage sites.
Of all of the 11 peptides tested, S01 and S02, which were derived from the N-terminal and C-terminal self-cleavage sites, respectively, were the most suitable substrates for SARS CoV 3C-like proteinase cleavage. The Km of S01 was determined to be 1.15 ± 0.28 mM, which is three times larger than that of hCoV 229E 3C-like proteinase for a 15-mer substrate, 0.39 ± 0.07 mM. The relatively large Km was counteracted by a large kcat, 12.2 ± 2.9 min⁻¹, which is an order of magnitude larger than that of the five other substrates whose kcat and Km values were also determined from the Lineweaver-Burk plot (Table I).
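The Lineweaver-Burk treatment used for these values rests on the double-reciprocal form of the Michaelis-Menten equation, 1/v = (Km/Vmax)(1/[S]) + 1/Vmax, with Vmax = kcat·cE. The Python sketch below shows the calculation on synthetic, noise-free rates generated from the Km and kcat reported for S01. The enzyme concentration, the substrate series, and the function name are hypothetical, and the rates are not measured data.

```python
def lineweaver_burk(substrate_mM, rates_mM_per_min, enzyme_mM):
    """Return (Km in mM, kcat in min^-1) from a double-reciprocal fit.

    Fits 1/v = (Km/Vmax) * (1/[S]) + 1/Vmax by ordinary least squares
    and converts Vmax to kcat using the total enzyme concentration.
    """
    xs = [1.0 / s for s in substrate_mM]
    ys = [1.0 / v for v in rates_mM_per_min]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # Km / Vmax
    intercept = mean_y - slope * mean_x    # 1 / Vmax
    vmax = 1.0 / intercept
    return slope * vmax, vmax / enzyme_mM


# Synthetic rates from the reported S01 parameters (illustration only).
KM_S01 = 1.15      # mM, reported value
KCAT_S01 = 12.2    # min^-1, reported value
enzyme = 0.001     # mM, hypothetical enzyme concentration
substrates = [0.1, 0.2, 0.5, 1.0, 2.0]   # mM, hypothetical series in the 0.1-2 mM range
rates = [KCAT_S01 * enzyme * s / (KM_S01 + s) for s in substrates]

km_fit, kcat_fit = lineweaver_burk(substrates, rates, enzyme)
print(f"Km = {km_fit:.2f} mM, kcat = {kcat_fit:.1f} min^-1")
```

Because the reciprocal transform weights the lowest substrate concentrations most heavily, noisy measurements there can dominate the fit; a direct nonlinear fit of the Michaelis-Menten equation is a common alternative.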
Most of the cleavage sites of coronavirus polyproteins have a conserved (Leu/Ile)-Gln↓(Ser, Ala, or Gly) core sequence (the cleavage site is indicated by ↓). However, the SARS coronavirus polyprotein has three noncanonical cleavage sites with Phe, Val, or Met in the P2 position and one noncanonical cleavage site with Asn in the P1′ position. Here, peptides S02, S03, and S07 are derived from the three cleavage sites that are noncanonical at P2, and peptide S05 is derived from the one cleavage site that is noncanonical at P1′. With the dipeptide sequence Asn-Asn at positions P1′ and P2′, peptide S05 has a relatively low catalytic efficiency, the second lowest among all of the 11 peptides. Peptides S02 and S03 have a Phe and a Val, respectively, in position P2 and have cleavage activity similar to that of other substrates with Leu in the P2 position, indicating that Phe or Val also fits well in the large hydrophobic S2 pocket, so that these substitutions are tolerated for 3C-like proteinase cleavage. The substitution by the relatively large hydrophobic amino acid Phe did not affect substrate binding. On the other hand, peptide S07, which has a methionine in the P2 position, is the substrate with the lowest cleavage activity among the 11 substrates. The low cleavage activity of S07 may be caused by the substitution of Met for Leu, although it may also be affected by other factors such as a different secondary structure. As shown in the far-UV CD spectrum, peptide S07 may adopt an unusual helix-like, type II polyproline conformation, which has a strong negative peak at ~200 nm and a positive peak at ~230 nm. The formation of the polyproline helix would disturb the conformational transition of the substrate peptide to a β-strandlike extended structure and hinder the binding of the substrate to the enzyme. This is also supported by the fact that this peptide has the lowest sheet content.
Hegyi and Ziebuhr (19) have studied the conservation of substrate specificities among 3C-like proteinases from three coronaviruses by using peptides corresponding to four cleavage sites on the viral polyprotein. Our findings that the two peptides S01 (P1/P2) and S02 (P2/P3), corresponding to the two self-cleavage sites of the SARS 3C-like proteinase, are the two most reactive ones and that S05 (P5/P6) is less reactive are comparable to their results. This implies that SARS 3C-like proteinase follows the same substrate specificity rules that govern all coronavirus 3C-like proteinases. Because we have studied the substrate specificity of SARS 3C-like proteinase using peptides covering all of the 11 cleavage sites, we have tried to correlate cleavage efficiency not only with the conserved sequences but also with the secondary structures of the substrate peptides. The full spectrum of substrate specificity for SARS 3C-like proteinase studied here can be extended to understand other coronavirus 3C-like proteinases.
FIG. 5. Enzyme activity of SARS CoV 3C-like proteinase at different pH values. The proteolytic activity of expressed SARS CoV 3C-like proteinase at different pH values was determined in citric acid/phosphate buffer (pH 5, 6, 7, and 8) or glycine/NaOH buffer (pH 9 or 10) containing 6.8 mM dithiothreitol, 2 mM S01 as substrate, and 2.14 μM 3C-like proteinase.
FIG. 6. The relationship of secondary structure contents in substrate peptides with cleavage efficiency. a, far-UV CD spectra of the peptide substrates S01, S02, S05, S07, and S11. b, calculated content of secondary structure versus kcat/Km profile. Substrate peptides with small kcat/Km tend to have relatively less sheet content but more unordered structure, suggesting that the formation of a β-strandlike extended structure is favored for binding to the enzyme and proteolysis.
In summary, the recombinant SARS 3C-like proteinase has been successfully cloned and expressed. The purified protein exists as a mixture of monomer and dimer at a concentration of 4 mg/ml but mostly as monomer at 0.2 mg/ml. The specific activity of the enzyme decreases linearly with decreasing enzyme concentration, implying that only the dimeric form is active and that the dimeric interface could be targeted for structure-based drug design against SARS 3C-like proteinase. The enzyme can cut the 11 peptides covering all of the 11 cleavage sites on the viral polyprotein with different efficiencies. Substrates with a more β-sheetlike structure tend to react faster. The P2 position of the substrates seems to favor large hydrophobic residues. This study provides a basic understanding of the enzyme catalysis and substrate specificity of SARS 3C-like proteinase and helpful information for structure-based inhibitor design.
TABLE II. Secondary structure contents (%) of the substrate peptides
Peptide  Helix  Sheet  Turn  Unordered
S01      5      35     27    32
S02      3      42     24    31
S03      2      25     28    46
S04      2      24     30    45
S05      10     19     27    44
S06      2      25     30    43
S07      1      18     40    41
S08      5      21     29    45
S10      4      22     33    41
S11      2      30     32    36