Critical Assessment of Important Regions in the Subunit Association and Catalytic Action of the Severe Acute Respiratory Syndrome Coronavirus Main Protease

The severe acute respiratory syndrome (SARS) coronavirus (CoV) main protease represents an attractive target for the development of novel anti-SARS agents. The tertiary structure of the protease consists of two distinct folds. One is the N-terminal chymotrypsin-like fold, which consists of two structural domains and constitutes the catalytic machinery; the other is the C-terminal helical domain, which has an unclear function and is not found in the main proteases of other RNA viruses. To understand the functional roles of these two structural parts of the SARS-CoV main protease, we generated the full-length enzyme as well as several terminally truncated forms that differ from each other only in the number of amino acid residues at the C- or N-terminal region. The quaternary structure and Kd value of the protease were analyzed by analytical ultracentrifugation. The protease lacking N-terminal residues 1-3 retained 76% of the enzyme activity and, like the wild type, existed mainly as a dimer. In contrast, the protease lacking residues 1-4 existed mainly as a monomer and had little enzyme activity. Thus, the fourth amino acid appears to have a powerful effect on the quaternary structure and activity of this protease. The protease lacking the last C-terminal helix also showed a greater tendency to form monomers and had little activity. We conclude that both the C- and N-terminal regions influence the dimerization and enzyme activity of the SARS-CoV main protease.

…length, and the organization is similar to that of other coronaviruses. The replicase gene encodes two overlapping polyproteins, polyprotein 1a (~450 kDa) and polyprotein 1ab (~750 kDa). The polyproteins are cleaved by the internally encoded main protease (Mpro, 3CL), which is required for the production of new infectious virus. Because of its functional importance in the viral life cycle, the main protease represents an attractive target for the development of novel antiviral agents (9-12).
The reported crystal structures of CoV main proteases, including that of SARS-CoV, are homodimers (13-15). Each protomer of the main protease is composed of three structural domains (Fig. 1). The first two domains of the SARS-CoV main protease have an antiparallel β-barrel structure similar to that of other CoV proteases and form a chymotrypsin-like fold responsible for catalysis (15). The active site, containing a catalytic dyad formed by His-41 and Cys-145, is located between domains I and II. The third domain contains five α-helices and has an unclear biological function. Domain III of one protomer and domain II of the other form a contact region in the dimer. The N terminus (the N-finger, comprising residues 1-7) is located in this region and plays an important role in dimerization (16).
We have demonstrated that the major quaternary structure of the SARS-CoV main protease at neutral pH is a dimer, which is the catalytically competent form (17). It is therefore important to understand the factors that control dimerization, because the dissociated monomer could be enzymatically inactive. The C-terminal helical domain interacts with the active site of the other protomer in the dimer and switches the enzyme from the inactive to the active form (18). Structural and biochemical data also show that N-terminal residues 1-7 play an important role in dimerization and in the formation of the active site of the SARS-CoV main protease (15). N-terminal truncation of the whole N-finger, Δ(1-7), results in an almost complete loss of enzymatic activity (19).
In this report, we critically examined the functional roles of the N and C termini by serial truncation. We report the stability and structure-function relationship of the full-length SARS-CoV main protease in comparison with the various truncated forms. Our results demonstrate that both the N- and C-terminal regions are involved in enzyme activity as well as in dimerization. We have narrowed down the critical residues to the fourth amino acid of the N terminus and the last helix of the C-terminal region, both of which are involved in dimerization and thereby in establishing the correct conformation of the active site.

…were kindly provided by Dr. Shao-Hung Wang (Genome Research Center, National Yang-Ming University, Taipei, Taiwan). The gene of the full-length SARS-CoV main protease was amplified by polymerase chain reaction with appropriate primers. The forward primer for the full-length WT SARS-CoV main protease was 5′-GGTGGTCATATGAGTGGTTTTAGG-3′, and the reverse primer was 5′-AACTCGAGGGTAACACCAGAG-3′. After digestion with BglII and XhoI, the PCR product was cut into two fragments of 168 and 747 bp. The 168-bp fragment was then digested with NdeI. Finally, the 168-bp NdeI-BglII and 747-bp BglII-XhoI fragments were co-ligated into the NdeI and XhoI sites of the vector pET-29a(+) (Novagen, Madison, WI).
The pET-SARS-CoV main protease vector was used as the template. The DNA polymerase Pfu (Promega, Madison, WI) extended and incorporated the mutagenic primers during PCR. After 16-18 temperature cycles, the N-terminally truncated plasmid containing staggered nicks was generated. The PCR products were then treated with DpnI (New England Biolabs, Beverly, MA) to digest the methylated parental template. Finally, the vector containing the protease cDNA with the desired mutation was transformed into Escherichia coli. The genes of the C-terminally truncated proteases were amplified by PCR with the following primers: forward, 5′-TGAAGATCTGCTCATTCGCAA-3′; reverse for Δ(293-306), 5′-AACTCGACTGTAAACTCATCTTC-3′; reverse for Δ(201-306), 5′-AACTCGACTATGGTTGTGTCTG-3′. After digestion with BglII and XhoI, the PCR products were inserted into the BglII and XhoI sites of the pET-SARS-CoV main protease vector. The DNA sequences of the full-length and the N- and C-terminally truncated SARS-CoV main proteases were verified by automated sequencing. The recombinant SARS-CoV main protease carries a His tag at the C terminus. This His tag was not removed, because earlier reports suggested that it has no effect on the dimeric structure or enzyme activity (17).
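The cloning steps above rely on NdeI (CATATG), BglII (AGATCT), and XhoI (CTCGAG) sites carried by the primers. As an illustrative in silico check, not a step described by the authors, the Python sketch below scans the two WT primers and the C-terminal forward primer, as printed above, for these recognition sequences. All three sites are palindromic, so scanning one strand is sufficient; the helper name `find_sites` is hypothetical.

```python
# Recognition sequences of the three enzymes used in the cloning steps.
RESTRICTION_SITES = {
    "NdeI": "CATATG",
    "BglII": "AGATCT",
    "XhoI": "CTCGAG",
}


def find_sites(primer: str) -> dict:
    """Return 0-based start positions of each recognition site in a 5'->3' primer.

    All three sites are palindromic, so scanning the given strand is enough.
    """
    seq = primer.upper()
    hits = {}
    for enzyme, site in RESTRICTION_SITES.items():
        positions = [i for i in range(len(seq) - len(site) + 1)
                     if seq[i:i + len(site)] == site]
        if positions:
            hits[enzyme] = positions
    return hits


primers = {
    "WT forward": "GGTGGTCATATGAGTGGTTTTAGG",
    "WT reverse": "AACTCGAGGGTAACACCAGAG",
    "C-terminal forward": "TGAAGATCTGCTCATTCGCAA",
}

for name, seq in primers.items():
    print(name, find_sites(seq))
```

Running it locates NdeI at position 6 of the WT forward primer, XhoI at position 2 of the WT reverse primer, and BglII at position 3 of the C-terminal forward primer (0-based), each preceded by a short 5′ overhang.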
Expression and Purification of WT and N- and C-terminally Truncated SARS-CoV Main Proteases-The modified plasmids of the recombinant proteases were transformed into E. coli BL21(DE3) competent cells. The cells were grown at 37°C in Luria-Bertani medium containing 50 μg/ml kanamycin until the absorbance at 600 nm reached 0.8, and expression was then induced with 1 mM isopropyl-1-thio-β-D-galactoside at 18°C overnight. The cells were harvested by centrifugation at 5,000 × g and 4°C for 10 min. The supernatant was removed, and the cell pellet was resuspended in binding buffer (20 mM Tris-HCl, 300 mM NaCl, and 2 mM β-mercaptoethanol (BME), pH 7.6). The cells were sonicated for 10 min in 10-s bursts at 300 W with a 10-s cooling period between bursts. Cell debris was removed by centrifugation (10,000 × g at 4°C for 25 min). One ml of nickel-nitrilotriacetic acid slurry (Qiagen, Hilden, Germany) equilibrated in binding buffer was then added to the soluble lysate, and the mixture was gently mixed at 4°C for 50 min to reach binding equilibrium. The lysate-nickel-nitrilotriacetic acid mixture was then loaded into a column and washed with washing buffer (20 mM imidazole, 20 mM Tris-HCl, 300 mM NaCl, and 2 mM BME, pH 7.6). Finally, the protease was eluted with elution buffer (400 mM imidazole, 20 mM Tris-HCl, 300 mM NaCl, and 2 mM BME, pH 7.6). The purified protein was concentrated at 4°C to 5-15 mg/ml using Amicon Ultra-4 centrifugal filter units (Millipore, Bedford, MA) with a 10-kDa molecular mass cutoff and then diluted to 0.5-3 mg/ml with 10 mM PBS (10 mM sodium phosphate, 150 mM NaCl, 2 mM BME, pH 7.6); six such concentration-dilution cycles were used to replace the elution buffer with PBS. The sample from the purification step was separated on a 4-12% gradient sodium dodecyl sulfate-polyacrylamide gel to check its homogeneity.

FIG. 2. CD spectra of the full-length WT and truncated SARS-CoV main proteases.
Far-UV CD spectra of all recombinant SARS-CoV main proteases were monitored at a 0.8 mg/ml concentration in 10 mM PBS buffer (pH 7.6) at 25°C, where deg is the ellipticity in degrees.

FIG. 3. Fluorescence spectra of the full-length WT and truncated SARS-CoV main proteases.
Fluorescence emission spectra of all recombinant proteases were monitored at 8 μg/ml concentration in 10 mM PBS buffer (pH 7.6) at 25°C.

Circular Dichroism (CD) and Fluorescence Analyses-CD experiments were performed on a Jasco J-810 spectropolarimeter (Tokyo, Japan) equipped with a Neslab RTE-111 circulating-water temperature controller. The samples were prepared in 10 mM PBS at pH 7.6 with a protein concentration of 0.5 mg/ml. Far-UV CD spectra from 250 to 190 nm were collected using a 0.01-cm path-length cuvette with a 0.1-nm spectral resolution at 25°C. Ten independent scans were averaged for each sample. All spectra were corrected for buffer contributions and converted to mean residue ellipticity ([θ]). The [θ] at each wavelength was calculated from Equation 1,

$$[\theta] = \frac{\mathrm{MRW} \times \theta_\lambda}{10 \times l \times c} \tag{1}$$

where MRW is the mean residue weight (a value of 111.3 was used for the WT), θ_λ is the measured ellipticity in degrees at wavelength λ, l is the cuvette path length (0.01 cm), and c is the protein concentration in g/ml. Secondary structure analysis was performed with DICHROWEB (20, 21), an interactive web server for the deconvolution of circular dichroism data (public-1.cryst.bbk.ac.uk/cdweb/html/). DICHROWEB offers several analysis programs, including CDSSTR (22-24), CONTINLL (25, 26), SELCON3 (27, 28), and K2D (29). The thermal stability of the WT and the various truncated mutants was analyzed with the spectropolarimeter by monitoring the circular dichroism at 222 nm between 30 and 90°C, and the temperature at which half of the protein molecules were unfolded was recorded (Tm).

TABLE I Structural characteristics of the recombinant SARS-CoV main proteases in crystal and solution
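Equation 1 can be evaluated point by point along a spectrum. The Python sketch below uses the parameters given in the methods (MRW 111.3, 0.01-cm path, 0.5 mg/ml) as defaults and assumes the ellipticity is already in degrees; readings in millidegrees, as many instruments report them, are divided by 1000 first. The −10 mdeg reading in the usage line is a hypothetical value, not a measurement from this study.

```python
def mean_residue_ellipticity(theta_deg: float,
                             mrw: float = 111.3,
                             path_cm: float = 0.01,
                             conc_g_per_ml: float = 0.5e-3) -> float:
    """Equation 1: [theta] = MRW * theta / (10 * l * c).

    theta_deg      observed ellipticity in degrees
    mrw            mean residue weight (111.3 for the WT protease)
    path_cm        cuvette path length in cm (0.01 cm here)
    conc_g_per_ml  protein concentration in g/ml (0.5 mg/ml = 0.5e-3 g/ml)
    Returns [theta] in deg cm^2 dmol^-1.
    """
    return mrw * theta_deg / (10.0 * path_cm * conc_g_per_ml)


# Hypothetical reading of -10 mdeg at 222 nm, converted to degrees first.
theta_obs = -10.0 / 1000.0
print(f"[theta]_222 = {mean_residue_ellipticity(theta_obs):.0f} deg cm^2 dmol^-1")
```

With these inputs the sketch gives about −2.2 × 10^4 deg cm² dmol⁻¹, the order of magnitude typical of far-UV spectra of folded proteins.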
Fluorescence experiments were performed on a PerkinElmer Life Sciences 50B luminescence spectrometer (Beaconsfield, Buckinghamshire, England). The sample was prepared in 10 mM PBS at pH 7.6 with a protein concentration of 8 μg/ml. Fluorescence emission spectra from 300 to 400 nm were collected after excitation at 280 nm, using a 1-cm path-length quartz cuvette at 25°C. The spectral bandwidth was 5 nm for excitation and 10 nm for emission. All spectra were corrected for the buffer contribution. The average emission wavelength (⟨λ⟩) was calculated from Equation 2 (30),

$$\langle\lambda\rangle = \frac{\sum_i F_i\,\lambda_i}{\sum_i F_i} \tag{2}$$

where F_i is the fluorescence intensity at wavelength λ_i.
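Equation 2 is an intensity-weighted mean over the recorded emission window. A minimal Python sketch, assuming the spectrum is supplied as paired lists of buffer-corrected wavelengths and intensities; the three-point spectrum in the usage line is a toy example, not data from Fig. 3.

```python
def average_emission_wavelength(wavelengths_nm, intensities):
    """Equation 2: intensity-weighted mean emission wavelength, in nm."""
    if len(wavelengths_nm) != len(intensities):
        raise ValueError("wavelength and intensity lists must have equal length")
    total = sum(intensities)
    if total <= 0:
        raise ValueError("total fluorescence intensity must be positive")
    return sum(w * f for w, f in zip(wavelengths_nm, intensities)) / total


# Toy three-point spectrum (hypothetical numbers), symmetric about 340 nm.
print(average_emission_wavelength([330, 340, 350], [1.0, 2.0, 1.0]))  # 340.0
```

In practice the full 300-400 nm scan would be passed in, so every recorded point contributes in proportion to its intensity.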
Analytical Ultracentrifugation Analysis-The molar mass and sedimentation coefficient of the proteases were analyzed by sedimentation velocity experiments on a Beckman Optima XL-A analytical ultracentrifuge (Fullerton, CA). Before the experiments, the samples were diluted to various protein concentrations with 10 mM PBS at pH 7.6. Sample (400 μl) and buffer (440 μl) solutions were loaded separately into the two sectors of a double-sector centerpiece. All experiments were carried out at 20°C in an An50 rotor at 42,000 revolutions/min. The sedimentation data were analyzed with a continuous sedimentation coefficient distribution, c(s),

$$a(r,t) \cong \int c(s)\,L(s,D,r,t)\,ds + \varepsilon \tag{3}$$

where a(r,t) denotes the experimentally observed signal, L(s,D,r,t) denotes the solution of the Lamm equation for a single species (35), and ε is the noise component.
To determine the monomer-dimer equilibrium of the SARS-CoV main protease precisely, sedimentation velocity experiments were performed at three different protein concentrations, and all sedimentation data were fitted with a monomer-dimer equilibrium model. The partial specific volume of the protease and the solvent density and viscosity were calculated with the program SEDNTERP (36). The dissociation constant (Kd) was obtained by global modeling in the SEDPHAT program (33).

Enzymatic Activity Assay of the SARS-CoV Main Protease Using a Fluorogenic Substrate-Kinetic measurements of SARS-CoV main protease activity were performed in 10 mM PBS containing 2 mM BME at 30°C. The reaction was initiated by adding 12 μg of WT or 1.5-2.0 mg of mutant protease to 1 ml of reaction mixture. The increase in fluorescence caused by protease cleavage of the internally quenched fluorogenic substrate peptide (ortho-aminobenzoic acid-TSAVLQSGFRK-2,4-dinitrophenylamide) was monitored at 420 nm with excitation at 362 nm using a PerkinElmer Life Sciences 50B luminescence spectrometer.
To quantify the product, mixtures containing the N-terminal product peptide (ortho-aminobenzoic acid-TSAVLQ) and the C-terminal product peptide (SGFRK-2,4-dinitrophenylamide) at different concentrations were prepared, and their fluorescence intensities were measured. All intensities were corrected for the buffer contributions. The intensities at 420 nm were used to construct a standard curve relating fluorescence to product concentration, which allowed the enzyme activity to be determined precisely.
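The standard curve described above amounts to a straight-line fit of fluorescence against known product concentration, which is then inverted for reaction samples. The Python sketch below assumes the response is linear over the calibrated range (no inner-filter correction); the standards and the sample reading are hypothetical numbers, and the function names are illustrative only.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def product_from_intensity(intensity, slope, intercept):
    """Invert the standard curve: fluorescence at 420 nm -> product concentration."""
    return (intensity - intercept) / slope


# Hypothetical standards: product concentration (uM) vs. buffer-corrected F420.
standards_uM = [0.0, 1.0, 2.0, 4.0]
f420 = [5.0, 25.0, 45.0, 85.0]
slope, intercept = linear_fit(standards_uM, f420)
print(slope, intercept)                                # 20.0 5.0
print(product_from_intensity(65.0, slope, intercept))  # 3.0
```

Dividing the slope of a fluorescence-versus-time trace by the calibration slope converts the rate of fluorescence increase into a rate of product formation.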

Expression, Purification, and Characterization of the Recombinant SARS-CoV Main Proteases-
The full-length and truncated SARS-CoV main proteases were successfully expressed in E. coli and purified on a single affinity column. All the recombinant SARS-CoV main proteases were found in the soluble fraction of the cell lysate. The expressed proteases bound to the nickel column, whereas other proteins flowed through. SDS-PAGE analysis indicated that the recombinant proteins were nearly homogeneous. All purified proteins had Mr values in agreement with the theoretical values. After concentration, 5-15 mg/ml recombinant main protease could be obtained from 200 ml of culture. Unfortunately, the other C-terminally truncated mutants were not successfully expressed, probably because of their instability.
To determine the secondary structure of the expressed and purified recombinant main proteases, far-UV CD spectra were recorded. The overall CD spectra are shown in Fig. 2. The spectra of all recombinant SARS-CoV main proteases were similar, except that of Δ(201-306), as anticipated (Fig. 1A). These results indicate that the proteins have a well defined secondary structure, which is reflected in the secondary structure estimates from the DICHROWEB server (20, 21). The CDSSTR analysis is shown in Table I. The normalized root mean square deviation (NRMSD) values of the fits for WT and the various truncations were all <0.2, indicating an excellent goodness of fit (37). The secondary structure of the full-length SARS-CoV main protease estimated in solution is consistent with that derived from the crystal structure 1uk3 (Table I): the helical contents of the recombinant WT and the crystal structure were 0.21 and 0.22, respectively. All truncations gave results similar to those of the full-length main protease, except that Δ(201-306) had a significantly lower α-helix content, in agreement with the deletion of the whole helical domain III. The thermal stability of the recombinant SARS-CoV main proteases was also examined (Table I). Among the truncated proteases, the C-terminally truncated protease Δ(293-306) had a significantly lower Tm than WT.
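DICHROWEB reports goodness of fit as the NRMSD between the experimental spectrum and the spectrum back-calculated from the fitted secondary structure. A minimal Python sketch, assuming the commonly used definition NRMSD = sqrt(Σ(θexp − θcal)² / Σθexp²); the two-point spectra are toy numbers chosen only to illustrate the <0.2 criterion quoted above.

```python
import math


def nrmsd(experimental, calculated):
    """Normalized RMSD between experimental and back-calculated CD values."""
    if len(experimental) != len(calculated):
        raise ValueError("spectra must have the same number of points")
    residual = sum((e - c) ** 2 for e, c in zip(experimental, calculated))
    norm = sum(e ** 2 for e in experimental)
    return math.sqrt(residual / norm)


# Toy values in arbitrary ellipticity units (not data from this study).
exp_spec = [3.0, 4.0]
for label, calc in (("good", [3.0, 4.0]), ("poor", [3.0, 2.0])):
    value = nrmsd(exp_spec, calc)
    print(label, round(value, 3), "acceptable" if value < 0.2 else "poor")
```

Because the residual is normalized by the spectrum's own magnitude, the same threshold can be applied to proteins with different overall ellipticity.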
The fluorescence emission spectra of the recombinant SARS-CoV main proteases are shown in Fig. 3. Only Δ(201-306) had a significantly lower fluorescence intensity. The average emission wavelengths were calculated by the method of Sanchez del Pino and Fersht (30), which accounts for both the wavelength shift and the attenuation of fluorescence intensity. The average emission wavelength of the full-length SARS-CoV main protease is 342 nm. With the exception of Δ(201-306), whose average emission wavelength is 11 nm lower than that of WT, the truncations show only minor differences (Table I).
Analytical Ultracentrifugation Analysis-Analytical ultracentrifugation was performed to investigate the association states of WT and truncated SARS-CoV main proteases. This method was previously used successfully to demonstrate the dimerization of the SARS-CoV main protease under various conditions (17). The data were analyzed by continuous size distribution, which yielded highly reliable fits, as indicated by the homogeneous residual bitmaps (Figs. 4 and 5, insets). In all cases the fitted curves closely matched the raw sedimentation data, with randomly distributed residuals (data not shown). WT protease shows a monomer-dimer equilibrium in solution: the species with sedimentation coefficients of 2.4 S and 4.2 S correspond to the monomer and dimer, respectively, with molar masses of 34 and 68 kDa (17). The N- and C-terminally truncated proteases also displayed mixtures of monomer and dimer (Figs. 4 and 5). Deletion of three residues from the N terminus gave a monomer-dimer distribution similar to that of WT. The monomer became the major species when the fourth residue was also deleted from the N terminus (Fig. 4). Further deletions from the N terminus (Δ(1-5), Δ(1-6), and Δ(1-7)) showed a pattern similar to that of Δ(1-4). We also examined the involvement of domain III in subunit association. Truncation of the last helix, in the Δ(293-306) mutant, caused the SARS-CoV main protease to become mainly monomeric (Fig. 5). These results clearly indicate that residue 4 and residues 293-306 are critically involved in stabilizing the dimer structure.
The contribution of individual residues or domains to the subunit interaction was further quantified by comparing the monomer-dimer dissociation constants. As mentioned above, global analysis was used to determine the Kd values of WT and the various truncated proteases (Fig. 6). The Kd for the monomer-dimer equilibrium of WT was 0.28 μM (Table II). Sequential deletion of residues from the N terminus increased the Kd value, and the C-terminally truncated proteases had much higher Kd values than the others.
Kinetic Properties of the Protease-A deletion mutant of the main protease of the related transmissible gastroenteritis virus that lacks residues 1-5 is almost enzymatically inactive (13). The crystal structure of the SARS-CoV main protease reveals that N-terminal residues 1-7 of subunit A are inserted directly into the active site of subunit B (15). We determined the enzyme activity of the full-length and the sequentially N- and C-terminally truncated proteases by a peptide cleavage assay (17). The internally quenched fluorogenic substrate is cleaved specifically at the Q-S peptide bond (38). The apparent Km of WT was 17 ± 1 μM, and the apparent kcat was 198 ± 22 s⁻¹. The Δ(1-3) mutant still possesses 76% of the WT enzyme activity (Table II), whereas the activity of the other truncated mutants decreased to only 0.2-1.3% of WT activity.
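Apparent Km and kcat values such as those above come from fitting initial rates to the Michaelis-Menten equation, v = kcat[E][S]/(Km + [S]). The fitting procedure is not stated here, so the Python sketch below is illustrative only and swaps in a Hanes-Woolf linearization ([S]/v versus [S]), which is exact for noise-free data but handles noisy data less well than nonlinear regression. The enzyme concentration (0.01 μM) and substrate series are hypothetical; the rates are generated from the reported Km and kcat, so the fit should recover them.

```python
def michaelis_menten(s_uM, vmax, km_uM):
    """Initial rate v = Vmax [S] / (Km + [S])."""
    return vmax * s_uM / (km_uM + s_uM)


def hanes_woolf(s_values, v_values):
    """Least-squares line through ([S], [S]/v); returns (Vmax, Km).

    [S]/v = [S]/Vmax + Km/Vmax, so slope = 1/Vmax and intercept = Km/Vmax.
    """
    x = list(s_values)
    y = [s / v for s, v in zip(s_values, v_values)]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x)
    intercept = my - slope * mx
    vmax = 1.0 / slope
    return vmax, intercept * vmax


enzyme_uM = 0.01                     # hypothetical enzyme concentration
km_true, kcat_true = 17.0, 198.0     # apparent values reported above
substrate_uM = [5.0, 10.0, 20.0, 40.0, 80.0]
rates = [michaelis_menten(s, kcat_true * enzyme_uM, km_true) for s in substrate_uM]

vmax_fit, km_fit = hanes_woolf(substrate_uM, rates)
print(f"Km = {km_fit:.1f} uM, kcat = {vmax_fit / enzyme_uM:.0f} s^-1")
```

Relative activities, such as the 76% of Δ(1-3), are then ratios of rates normalized to the amount of enzyme added.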

DISCUSSION
The biophysical analyses of the SARS-CoV main protease and its truncated mutants performed here allowed a detailed structural comparison of the mutants with the full-length protease. Except for the domain III-truncated protease Δ(201-306), which has lower mean residue ellipticity and fluorescence intensity, the CD and fluorescence emission spectra of the truncated proteases were similar to those of the full-length protease. With the secondary and tertiary structures as anticipated, we then studied the structure-function relationships of the N- and C-terminal regions of the SARS-CoV main protease, focusing on the correlation between dimerization and enzyme activity. We used the CASTp program (39) to analyze the pockets and cavities of the enzyme. Only three cavities with mouths were large enough to be significant (Fig. 1C). One of these pockets is located at the interface between domains II and III, has a surface area of 2.82 nm², and has no contact with protomer B (Fig. 1C, yellow pocket). The two largest pockets are located in the subunit interface region. The N-finger and the C-terminal tip are located in the pocket with a solvent-accessible surface area of 18.41 nm² and a volume of 2.173 nm³ (Fig. 1C, purple pocket). The residue contacts between subunits A and B were analyzed with the Contacts of Structural Units software (40). Phe-3 of subunit A makes only one contact with subunit B, a destabilizing hydrophobic-hydrophilic contact. In contrast, Arg-4 of subunit A extends deeply into subunit B (Figs. 1 and 7), making six destabilizing contacts. Removal of Arg-4 therefore has a great impact on the structure of the enzyme molecule.

Table II, footnote c: Enzyme activity assays were performed with the internally quenched substrate ortho-aminobenzoic acid-TSAVLQSGFRK-2,4-dinitrophenylamide. The enzyme activities of the various truncated mutants were too low to allow precise determination of the kinetic parameters; only the relative enzyme activity is reported.
All coronavirus main proteases contain a large extra C-terminal α-helical domain that is not found in the 3C-like proteases of other RNA viruses (13, 15). The precise biological role of this α-helical domain is still not completely understood. Our experimental data indicate that the C-terminal domain plays an important role in dimerization and enzyme activity. These results are consistent with a previous report that loss of domain III of the SARS-CoV main protease induced monomer formation and loss of enzyme activity. Going a step further, we have narrowed the region down to a single α-helix.
Most truncations showed similar thermal stability curves, except for Δ(293-306), which lacks the last α-helix and had a significantly lower Tm than WT (Table I). Deletion of the whole domain III, on the other hand, restores the stability of the molecule. These results point to a functional role of domain III in protein stability. The deleted last helical segment is located at a large pocket between domains II and III (Fig. 1C, yellow pocket). This helix is essential for the stability of the whole molecule: without it, the remaining part of domain III becomes a burden to domains I and II, rendering the protease unstable.
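The Tm values compared here are the temperatures at which half of the molecules are unfolded in the 222-nm melt. The Python sketch below shows one simple way to read that midpoint from a melting curve; it assumes a two-state transition, takes the first and last points as the folded and unfolded baselines without slope correction, and interpolates linearly, so it is an illustration rather than the analysis used for Table I. The ellipticity values are hypothetical.

```python
def melting_temperature(temps_c, signal):
    """Temperature where the fraction unfolded crosses 0.5 (linear interpolation).

    Assumes a two-state transition; the first and last points serve as the
    folded and unfolded baselines, with no baseline-slope correction.
    """
    folded, unfolded = signal[0], signal[-1]
    if folded == unfolded:
        raise ValueError("no transition: baselines are identical")
    frac = [(y - folded) / (unfolded - folded) for y in signal]
    for i in range(1, len(temps_c)):
        f0, f1 = frac[i - 1], frac[i]
        if f0 != f1 and (f0 - 0.5) * (f1 - 0.5) <= 0:
            t0, t1 = temps_c[i - 1], temps_c[i]
            return t0 + (0.5 - f0) * (t1 - t0) / (f1 - f0)
    raise ValueError("fraction unfolded never crosses 0.5")


# Hypothetical 222-nm melt (mdeg) from 30 to 90 C in 10-degree steps.
temps = [30, 40, 50, 60, 70, 80, 90]
theta_222 = [-20.0, -19.5, -18.0, -10.0, -3.0, -2.0, -2.0]
print(f"Tm ~ {melting_temperature(temps, theta_222):.2f} C")
```

For real data, fitting sloping baselines and a two-state model usually gives a more robust midpoint than interpolating between two points.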
With a complete domain III, the full-length SARS-CoV main protease was in monomer-dimer equilibrium with the dimer as the major form, even at a protein concentration as low as 0.1 mg/ml (17). This result differs from some recent reports (18, 41), in which the monomer was the major form at protein concentrations of <0.2 mg/ml. The discrepancy is attributable to differences in the techniques used to characterize the quaternary structure of the protein. An unequivocal answer to this question is extremely important, because the dissociated monomer might be enzymatically inactive. To study protein self-association, we used state-of-the-art analytical ultracentrifugation. The Kd determined by analytical ultracentrifugation (0.28 μM) is ~360 and ~810 times smaller than the values previously estimated by analytical gel filtration and isothermal titration calorimetry, respectively (19, 42). Our Kd for the monomer-dimer equilibrium of the SARS-CoV main protease was obtained from a global analysis of three different protein concentrations. Given the stringent conditions of analytical ultracentrifugation, we believe that this Kd is more reliable (33).
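To see what a Kd of 0.28 μM implies at the 0.1 mg/ml concentration discussed above, the mass-action balance for a monomer-dimer equilibrium (2M ⇌ D, Kd = [M]²/[D], total protomer concentration C = [M] + 2[D]) can be solved for [M]. The Python sketch below does this under the assumptions of ideal behavior and a protomer molar mass of 34 kDa (the monomer value from the sedimentation analysis); it is a back-of-the-envelope estimate, not a reanalysis of the data. The second Kd it tries, 0.28 × 360 ≈ 100 μM, corresponds to the gel filtration comparison above.

```python
import math


def to_micromolar(mg_per_ml, molar_mass_kda):
    """Convert mg/ml (= g/l) to uM of protomer."""
    return mg_per_ml / (molar_mass_kda * 1000.0) * 1e6


def fraction_in_dimer(total_uM, kd_uM):
    """Fraction of protomers in dimers for 2M <-> D with Kd = [M]^2/[D].

    Mass balance C = [M] + 2[M]^2/Kd gives the positive root
    [M] = Kd * (sqrt(1 + 8C/Kd) - 1) / 4.
    """
    monomer = kd_uM * (math.sqrt(1.0 + 8.0 * total_uM / kd_uM) - 1.0) / 4.0
    return (total_uM - monomer) / total_uM


c_total = to_micromolar(0.1, 34.0)          # ~2.94 uM protomer
for kd in (0.28, 0.28 * 360):               # AUC value vs. ~360-fold larger estimate
    print(f"Kd = {kd:6.2f} uM -> {fraction_in_dimer(c_total, kd):.0%} of protomers in dimer")
```

Under these assumptions, a Kd of 0.28 μM leaves roughly 80% of the protomers dimeric at 0.1 mg/ml, whereas a Kd near 100 μM would leave only about 5%, which is why the two kinds of measurement lead to opposite conclusions about the major species.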
Our data show that the N-finger of the SARS-CoV main protease is indispensable for proteolytic activity, consistent with previous findings (19). The contribution of the N-finger to dimerization was clearly demonstrated by analytical ultracentrifugation (Fig. 4). As shown in Fig. 7, A and B, the N-finger of protomer A is completely buried in protomer B. We have further narrowed this down to a single residue, Arg-4, which plays a pivotal role in the intermolecular interaction (Fig. 4). The correlation between enzymatic activity and dimerization is also demonstrated: the fourth residue plays a critical role in the dimerization of the SARS-CoV main protease, which is essential for its enzymatic activity.
In conclusion, both the N- and C-terminal regions play pivotal roles in controlling the dimerization and activity of the enzyme. Our study provides fundamental information for designing novel inhibitors that block SARS-CoV main protease activity by disrupting its dimerization interface.