Mechanism of the Maturation Process of SARS-CoV 3CL Protease

Severe acute respiratory syndrome (SARS) is an emerging infectious disease caused by a novel human coronavirus. Viral maturation requires a main protease (3CLpro) to cleave the virus-encoded polyproteins. We report here that the 3CLpro containing additional N- and/or C-terminal segments of the polyprotein sequences undergoes autoprocessing and yields the mature protease in vitro. The dimeric three-dimensional structure of the C145A mutant protease shows that the active site of one protomer binds the C-terminal six amino acids of the protomer from another asymmetric unit, mimicking the product-bound form and suggesting a possible mechanism for maturation. The P1 pocket of the active site binds the Gln side chain specifically, and the P2 and P4 sites are clustered together to accommodate large hydrophobic side chains. The tagged C145A mutant protein served as a substrate for the wild-type protease, and the N terminus was digested first (55-fold faster) at the Gln-1-Ser1 site, followed by C-terminal cleavage at the Gln306-Gly307 site. Analytical ultracentrifugation of the tagged and mature proteases reveals remarkably tighter dimer formation for the mature enzyme (Kd = 0.35 nM) than for the mutant (C145A) containing 10 extra N-terminal (Kd = 17.2 nM) or C-terminal amino acids (Kd = 5.6 nM). The data indicate that immature 3CLpro can form a dimer, enabling it to undergo autoprocessing to yield the mature enzyme, which further serves as a seed for facilitated maturation. Taken together, this study provides insights into the maturation process of the SARS 3CLpro from the polyprotein and into the design of new structure-based inhibitors.

Severe acute respiratory syndrome (SARS)¹ is a severe febrile respiratory illness caused by a newly identified coronavirus, SARS-associated coronavirus (SARS-CoV) (1)(2)(3)(4). In the period from February to June, 2003, SARS rapidly spread from its likely origin in southern China to 32 countries in the world. SARS-CoV belongs to the Coronaviridae family, which includes porcine transmissible gastroenteritis virus (TGEV), human coronavirus (HCoV) 229E, mouse hepatitis virus, bovine coronavirus, and infectious bronchitis virus (5)(6)(7). These coronaviruses are large, enveloped, positive single-stranded RNA viruses (27-31 kb) that cause respiratory and enteric diseases in humans and other animals. The SARS-CoV genome comprises about 29,700 nucleotides and encodes two overlapping polyproteins, pp1a (486 kDa) and pp1ab (790 kDa), that mediate all the functions required for viral replication and transcription (5,7). The functional polypeptides are released from the polyproteins by extensive proteolytic processing. This is primarily achieved by the 34.6-kDa main protease (Mpro), which is frequently called 3C-like protease (3CLpro) because its substrate specificity is similar to that of picornavirus 3C proteases (8,9). SARS-CoV 3CLpro cleaves the polyproteins at eleven sites involving a conserved Gln at the P1 position and a small amino acid (Ser, Ala, or Gly) at the P1′ position, a process initiated by the enzyme's own autolytic cleavage (autoprocessing) (10).
Several crystal structures of coronavirus 3CLpro (apo form or with suicide inhibitors) reported from TGEV, HCoV 229E, and SARS-CoV (11-13) revealed a common feature in 3CLpro: two chymotrypsin-like β-domains (residues 1-184) and one α-helical dimerization domain (residues 201-303). The active site of SARS-CoV 3CLpro is located in the center of the cleft between domains I and II and includes a catalytic dyad consisting of His41 and Cys145. Domain III was proposed to mediate dimer formation, because the C-terminal helical domain III (residues 201-306) alone formed a tight dimer (14). However, the electron densities of the C-terminal residues (301-306) of SARS-CoV 3CLpro could not be detected in the previous structure (13).
Here, we determined the crystal structures of the wild-type and the C145A mutant 3CLpro. Unlike the previous three-dimensional structure of the wild-type protease (13), the new crystal structure of C145A shows clearly visible C-terminal residues that are intercalated into the neighboring protomer, creating a product-bound structure that may resemble intermediates during autoprocessing. Autoprocessing has been known to be an essential step for viral maturation, but its detailed molecular mechanism is still hypothetical. To further understand the maturation process, we constructed wild-type SARS-CoV 3CLpro with 10 additional amino acids that are part of the polyprotein sequence at the N and/or C termini (termed 10aa-WT, WT-10aa, and 10aa-WT-10aa, respectively). We also used analytical ultracentrifugation (AUC) to determine their quaternary structures. In addition, thioredoxin (Trx) and glutathione S-transferase (GST) tags were appended to observe their autoprocessing by SDS-PAGE analysis. The inactive mutant C145A with the tags (Trx-10aa-C145A-10aa-GST) served as a substrate for facilitated processing by the wild-type protease.
...) for the core facility for protein X-ray crystallography at Academia Sinica, Taiwan. The synchrotron data collection was conducted at the Biological Crystallography Facilities (BL17B2 at National Synchrotron Radiation Research Center (NSRRC), Taiwan and Taiwan beamline BL12B2 at SPring-8, Japan) supported by the National Science Council. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
¹ The abbreviations used are: SARS, severe acute respiratory syndrome; SARS-CoV, SARS-coronavirus; TGEV, transmissible gastroenteritis virus; 3CLpro, 3C-like protease; Mpro, main protease; Trx, thioredoxin; GST, glutathione S-transferase; Dabcyl, 4-(4-dimethylaminophenylazo)benzoic acid; Edans, 5-[(2-aminoethyl)amino]naphthalene-1-sulfonic acid; FXa, factor Xa; Ni-NTA, nickel nitrilotriacetic acid; AUC, analytical ultracentrifuge; IC50, 50% inhibitory concentration; Bicine, N,N-bis(2-hydroxyethyl)glycine.
Because 3CLpro is responsible for polyprotein maturation, it is a potential target for anti-SARS drug development. We had previously used a fluorescence-based assay to characterize the protease and identified some C2-symmetric peptidomimetic compounds and metal-conjugated compounds as inhibitors of SARS-CoV 3CLpro (15)(16)(17). Other small molecules targeting SARS-CoV 3CLpro were identified from several compound libraries, including bifunctional aryl boronic acids (18), a quinolinecarboxylate derivative (19), a thiophenecarboxylate (20), and phthalhydrazide-substituted ketoglutamine analogues (21). These are all active site inhibitors. The new structures presented here suggest the possibility of drug design targeting protease dimerization during maturation. Overall, our study provides insights into substrate recognition, protein maturation, and drug discovery for SARS 3CLpro.

EXPERIMENTAL PROCEDURES
Materials-Fluorogenic peptide substrate Dabcyl-KTSAVLQSGFRKME-Edans was prepared as previously described (16). The plasmid mini-prep kit, DNA gel extraction kit, and Ni-NTA resin were purchased from Qiagen. FXa and the protein expression kit (including the pET32Xa/LIC vector and competent JM109 and BL21 cells) were obtained from Novagen. The pGEX vector was obtained from Amersham Biosciences. All commercial buffers and reagents were of the highest grade.
Expression and Purification of SARS-CoV 3CLpro-Different constructs (wild-type and C145A mutant, with and without extra amino acids at their N and C termini) of the SARS proteases were cloned in pET32Xa/LIC (with N-terminal Trx, His tag, and FXa site), pET28 (with C-terminal His tag), or pGEX (with N-terminal GST tag). The previously cloned gene encoding the wild-type SARS-CoV 3CLpro was used as a template, and the mutant forward primer (5′-CCTTAATGGATCAGCTGGTAGTGTTGGT-3′) for C145A was used in a polymerase chain reaction (PCR) to create the mutant gene. For constructing 10aa-C145A (C145A mutant containing 10 extra amino acids, QTSITSAVLQ, derived from the natural pp1a polyprotein and attached to the N terminus), the forward primer (5′-GGTATTGAGGGTCGCCAGACATCAATCACTTCTGCTGTTCTGCAGAGTGGTTTTAGGAAAATGGCA-3′) was used; for C145A-10aa (10 extra amino acids, GKFKKIVKGT, derived from the natural pp1a polyprotein and attached to the C terminus), the backward primer (5′-AGAGGAGAGTTAGAGCCTTAAGTGCCCTTAACAATTTTCTTGAACTTACCTTGGAAGGTAACACCAGAGCA-3′) was used; and for 10aa-C145A-10aa, both of the above forward and backward primers were used. The 5′-GGTATTGAGGGTCGCAGTGGTTTTAGG-3′ part of the forward primer and the 5′-AGAGGAGAGTTAGAGCCTTATTGGAAGGTAACACC-3′ part of the reverse primer are for cloning into the pET32Xa/LIC vector. These primers were also used to prepare the wild-type SARS protease containing extra N- and C-terminal amino acids (10aa-WT, WT-10aa, and 10aa-WT-10aa). Thirty cycles of PCR were performed using a thermocycler (Applied Biosystems) with denaturation at 95°C for 2 min, annealing at 42°C for 1 min, and polymerization at 68°C for 1 min. The PCR product was subjected to electrophoresis on a 1.2% agarose gel in TAE buffer (40 mM Tris-acetate, 5 mM EDTA, pH 8.0), and the gel was then stained with ethidium bromide. The band of the correct size was excised, and the DNA was recovered using a DNA elution kit.
The construct was ligated to the pET32Xa/LIC vector by incubation for 1 h at 22°C. For preparation of the N-terminal Trx-tagged and C-terminal GST-tagged protease (Trx-10aa-C145A-10aa-GST), the GST gene in the pGEX 6P-1 vector (Amersham Biosciences) amplified with the primers (forward primer 5′-GGTAAGTTCAAGAAAATTGTTAAGGGCACTATGTCCCCTATACTAGGTTAT-3′ and reverse primer 5′-AGAGGAGAGTTAGAGCCTCAATCCGATTTTGGAGGATGGT-3′) was used as a template. This template, containing the GST portion, and the second template of 10aa-C145A-10aa were used with the forward primer 5′-GGTATTGAGGGTCGCCAGACATCAATCACTTCTGCTGTTCTGCAGAGTGGTTTTAGGAAAATGGCA-3′ and the backward primer 5′-AGAGGAGAGTTAGAGCCTCAATCCGATTTTGGAGGATGGT-3′ to clone into the pET32Xa/LIC vector.
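As a quick consistency check on the primer design described above, the N-terminal extension forward primer can be translated in frame. The sketch below is illustrative only: it assumes the primer is read from its first base with the standard genetic code, and the helper name `translate` is our own. Under those assumptions the first five codons give GIEGR (which contains the IEGR factor Xa recognition motif), followed by the 10-residue extension QTSITSAVLQ and the start of the mature protease.

```python
# Standard genetic code, built compactly from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; trailing partial codons are ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Forward primer for the N-terminal 10-residue extension (5'->3'),
# with the line-break hyphens of the printed sequence removed.
FORWARD_PRIMER = (
    "GGTATTGAGGGTCGCCAGACATCAATCACTTCTGCTGTTCTGCAG"
    "AGTGGTTTTAGGAAAATGGCA"
)

protein = translate(FORWARD_PRIMER)
print(protein)          # GIEGRQTSITSAVLQSGFRKMA
print(protein[5:15])    # the 10 extra residues: QTSITSAVLQ
```

The same helper can be applied to the reverse complement of the backward primers to inspect the C-terminal extension; that step is left out here to keep the sketch short.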
The recombinant protease plasmid was then used to transform Escherichia coli JM109 competent cells that were streaked on a Luria-Bertani (LB) agar plate containing 100 µg/ml ampicillin. Ampicillin-resistant colonies were selected from the agar plate and grown in 5 ml of LB culture containing 100 µg/ml ampicillin overnight at 37°C. The entire SARS protease gene of the plasmid obtained from the overnight culture was sequenced. The correct construct was subsequently transformed into E. coli BL21 for protein expression. The protein purification followed our reported procedure using Ni-NTA column chromatography (16). To prepare the protease without tags (10aa-C145A and C145A-10aa) for AUC studies, the N-terminal Trx and His tags were removed by FXa protease digestion, and the mixture was loaded onto another Ni-NTA column to recover the highly purified untagged protein. To purify the Trx-10aa-C145A-10aa-GST to be used as the substrate for facilitated processing, both Ni-NTA and GST columns were used. For protein crystallization, SARS-CoV 3CLpro wild-type and mutant C145A clones were incorporated in pGEX-6P-1 plasmid DNA (Amersham Biosciences) with a Factor Xa cutting site immediately before the N-terminal Ser1 of the target gene. GST-tagged protein was purified using a GST column, and after the tag cleavage by FXa, the mixture was loaded onto a HiTrap™ 16/10 QFF column (Amersham Biosciences) and eluted with the buffer (20 mM Tris-HCl, pH 8.0, 1 mM EDTA, and 1 M NaCl). The flow-through fractions containing 3CLpro were pooled and concentrated for growing crystals.
Protein Crystallization-Wild-type 3CLpro was stored in a buffer containing 10 mM Tris-HCl (pH 7.5), 1 mM dithiothreitol, and 1 mM EDTA. The protein was crystallized using the sitting drop diffusion method by mixing 2 µl of the wild-type 3CLpro protein solution (10 mg/ml) with 2 µl of the reservoir solution (1.0 M sodium malonate, pH 7.0, and 4% isopropanol) on a sitting drop post, equilibrated with 500 µl of the reservoir solution. The crystallization proceeded at 25°C in the dark for 7 days.
Purified C145A was concentrated by Amicon ultrafiltration and desalted using a HiPrep 26/10 column and a buffer containing 10 mM Tris-HCl (pH 7.5) and 1 mM EDTA. The protein was crystallized using the sitting drop diffusion method by mixing 2 µl of the C145A solution (18 mg/ml) with 2 µl of the reservoir solution (0.1 M Bicine, pH 9.0, and 20% polyethylene glycol 6000) and 1 µl of additive (10 mM cobalt sulfate) on a sitting drop post, equilibrated with 500 µl of the reservoir solution. Crystallization was carried out at 25°C in the dark for 3 days.
Data Collection and Processing-Crystals for data collection were rinsed with the reservoir buffer and cryo-cooled in liquid nitrogen. The preliminary x-ray analysis was performed with an in-house Micro-Max002 x-ray generator with a Rigaku R-Axis IV++ image plate system. High resolution data for wild-type and C145A crystals were collected at the BL17B2 beamline of the National Synchrotron Radiation Research Center (NSRRC) (Taiwan) and at the Taiwan beamline BL12B2 of SPring-8 (Japan), respectively. Data were processed and integrated using the program HKL2000 (22).
Structure Determination, Refinement, and Model Building-The wild-type structure was determined by molecular replacement using one monomer of HCoV 229E Mpro (PDB ID code 1P9S) (12) as the starting model. Cross-rotation function and translation function searches were performed with the program CCP4 (23). The geometric adjustments were made with XtalView (24) under the guidance of (2Fo − Fc) sum difference maps. The Crystallography and NMR System (CNS) program (25) was used for structure refinement, including simulated annealing, positional, and B-factor refinements.
Autoprocessing Experiments-Six constructs were individually expressed in E. coli. The 5-ml overnight culture of a single transformant was used to inoculate 500 ml of fresh LB medium containing 100 µg/ml ampicillin. The cells were grown to A600 = 0.6 and induced with 1 mM isopropyl-β-thiogalactopyranoside. After 4-5 h, the cells were harvested by centrifugation at 7,000 × g for 15 min. The enzyme purification was conducted at 4°C. The 2-liter cell culture was collected to yield ~20 g of cell paste, which was suspended in 80 ml of lysis buffer containing 12 mM Tris-HCl, pH 7.5, 120 mM NaCl, and 0.1 mM EDTA in the presence of 1 mM dithiothreitol plus 7.5 mM β-mercaptoethanol. A French press (AIM-AMINCO®, Spectronic Instruments) was used to disrupt the cells at 12,000 p.s.i. The lysis solution was centrifuged, and the debris was discarded. The cell-free extract was loaded onto a Ni-NTA column, which was pre-equilibrated with the lysis buffer. After washing the column exhaustively with the lysis buffer plus 5 mM imidazole, the column was eluted with the lysis buffer plus 300 mM imidazole. The eluant was concentrated to ~1 mg/ml for SDS-PAGE analysis to check for the presence of the protease (protease whose tag has been removed by autoprocessing is not retained by the Ni-NTA column and therefore cannot be recovered in the eluant).

Kinetics of Maturation Assayed by SDS-PAGE-The processing reactions at the N and C termini of the tagged C145A mutant were followed by SDS-PAGE analysis to monitor the degradation of the substrate and the formation of the products with time. The reaction mixture containing Trx-10aa-C145A-10aa-GST (5 µM) and the active protease (0.5 µM) was incubated at 25°C for 10, 100, 500, 1000, or 1500 min in 20 mM Bis-Tris (pH 7.0), and the reaction was terminated by 100 µM ZnCl2 (a known SARS 3CLpro inhibitor) and subjected to SDS-PAGE analysis, which allowed the resolution of substrate and products.
FIG. 1. A and B, the overall three-dimensional structures of wild-type and C145A mutant 3CLpro are shown as ribbons (protomer A is green and protomer B is blue), and the "product" in the active site cleft between domains I and II is shown in yellow. In C, the dimer structure is composed of protomer A (shown with a solid tube in green) and protomer B (shown with charge potentials). The C terminus of protomer B′ (shown in cyan) in another asymmetric unit is intercalated into the active site of protomer B. D, an enlarged view of C near the active site, showing the C-terminal amino acids of protomer B′ as well as the N-terminal amino acids of protomer A in the neighborhood of the active site of protomer B.
Analytical Ultracentrifuge Experiments-Wild-type and C145A mutant SARS-CoV 3CLpro with extra N- or C-terminal amino acids at a concentration of 1 mg/ml (~14.3 µM of dimer) were used for AUC analysis in a buffer containing 12 mM Tris-HCl, pH 7.5, 120 mM NaCl, 0.1 mM EDTA, 1 mM dithiothreitol, and 7.5 mM β-mercaptoethanol. Sedimentation coefficients (s values) were estimated with a Beckman-Coulter XL-A analytical ultracentrifuge with an An60 Ti rotor as described before (26). Sedimentation velocity analysis was performed at 40,000 rpm at 25°C with standard double-sector aluminum centerpieces. The UV absorption of the cells was scanned every 5 min for 4 h. Data were analyzed with the SedFit program (version 8.7). The Sednterp program (version 1.07) was used to obtain solvent density, viscosity, and Stokes radius (Rs). The Sedphat program (version 1.5b) was used to obtain the dimer-monomer equilibrium dissociation constant of all proteins tested.
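The molar concentration quoted for the AUC samples follows from a simple unit conversion, sketched below. This is an illustrative calculation rather than part of the original analysis: it assumes the 34.6-kDa mass of the mature protease given in the introduction, so it comes out slightly above the ~14.3 µM quoted for constructs that carry 10 extra residues and are therefore somewhat heavier. The function name is our own.

```python
def mg_per_ml_to_micromolar(mg_per_ml: float, mass_kda: float) -> float:
    """Convert a mass concentration to micromolar.

    mg/ml is numerically equal to g/L, and 1 kDa corresponds to 1000 g/mol.
    """
    molar = mg_per_ml / (mass_kda * 1000.0)  # mol/L
    return molar * 1e6                        # micromolar


# Assumption: mature monomer mass of 34.6 kDa, as stated in the introduction.
MONOMER_KDA = 34.6

dimer_uM = mg_per_ml_to_micromolar(1.0, 2 * MONOMER_KDA)
monomer_uM = mg_per_ml_to_micromolar(1.0, MONOMER_KDA)
print(f"1 mg/ml is about {dimer_uM:.1f} uM dimer ({monomer_uM:.1f} uM monomer)")
```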

RESULTS
Overall Structure of the SARS-CoV Wild-Type 3CLpro-Data collection and refinement statistics for the wild-type and C145A 3CLpro are summarized in Table I. The wild-type crystal belongs to the P2₁2₁2 space group with one monomer in an asymmetric unit so that the two monomers in the dimer are symmetric (Fig. 1A). Like the previous 3CLpro structures from HCoV 229E, TGEV, and SARS-CoV (11-13), our SARS-CoV 3CLpro structure is composed of three domains, including a chymotrypsin-like fold for domain I (residues 1-101) and an antiparallel β-barrel for domain II (residues 102-184). The active site is located between domains I and II. The six C-terminal residues in the wild-type protease are flexible (shown below) in contrast to those of the C145A structure. The sequences surrounding the N- and C-terminal cutting sites of 3CLpro in different coronaviruses are shown in Supplemental Fig. 1S.
Overall Structure of SARS-CoV C145A 3CLpro-The 3CLpro (C145A) crystal belongs to the C2 space group with one dimer in the asymmetric unit, so the two monomers in a dimer are not identical (Fig. 1B). The two protomers of the dimeric C145A, denoted "A" and "B," are oriented perpendicular to each other, and each protomer contains three domains as in the wild type. A novel finding here is that the active site of protomer B is intercalated with the C-terminal residues 301-306 of protomer B′ (shown as a cyan ribbon in Fig. 1C and as a cyan stick in Fig. 1D) from the dimer in another asymmetric unit. The N-terminal residues of protomer A (shown as a green ribbon in Fig. 1C and as a green stick in Fig. 1D) are located near the active site of protomer B. This structure reveals the way in which the product is bound in the active site during the maturation process, and the six amino acids at the C terminus of protomer B′ represent the P6 to P1 sites of the autoprocessed product. The inter- and intramolecular interactions of the SARS 3CLpro dimer are shown in Table II.
Autoprocessing of Tagged SARS-CoV 3CLpro during Lysate Preparation-To further understand autoprocessing, we generated the six 3CLpro constructs listed in Fig. 2A. During the cell lysate preparation of Trx-10aa-WT (construct 1; here Trx refers to Trx-6×His-FXa site), the protein underwent autoprocessing to yield mature SARS protease by self-cleaving the 10 amino acids preceding the N terminus of mature 3CLpro. The processed protease failed to bind to the Ni-NTA column, because the His tag was removed by autoprocessing. The SDS-PAGE (Fig. 2B) shows no band at ~50 kDa (Trx plus 3CLpro) after Ni-NTA column chromatography for this construct (lane 1). The same result was observed for construct 3, which contained cleavage sites at both the N and C termini (lane 3, Fig. 2B), because it was also autoprocessed. In contrast, construct 2, without the cleavage site at the N terminus, was retained in the Ni-NTA column (lane 2, Fig. 2B). Constructs 4, 5, and 6 of mutant C145A all yielded a purified band in lanes 4-6 on SDS-PAGE (Fig. 2B), because the C145A mutation prevented autoprocessing.
Facilitated Processing-We prepared the inactive protein Trx-10aa-C145A-10aa-GST (5 µM) with tags to examine its processing by 0.5 µM active 3CLpro. The processed products can be easily resolved from the substrate on SDS-PAGE. As shown in Fig. 3, the N-terminal Trx tag was cleaved first, followed by cleavage of the C-terminal GST. The bands were assigned to the fragments shown in the right panel. The band (~34 kDa) corresponding to the added 3CLpro appeared in every lane except in the substrate-only lane, which contained no enzyme. The time course data in Fig. 3B show that the initial rates for the formation of the Trx tag (N-terminal cleavage) and the GST tag (C-terminal cleavage) are 0.22 and 0.004 µM/min, respectively. Thus, the N terminus is processed much faster (55-fold) than the C-terminal end.
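The 55-fold figure is simply the ratio of the two initial rates. The short sketch below reproduces it and, as a rough illustration only, estimates how long each cleavage would take if the initial rate were sustained for the full 5 µM of substrate. That constant-rate extrapolation is our simplifying assumption, not a kinetic model from the study, but it shows why the sampled time points span 10 to 1500 min.

```python
# Initial rates reported for the facilitated processing experiment (uM/min).
N_TERMINAL_RATE = 0.22   # Trx tag release (N-terminal cleavage)
C_TERMINAL_RATE = 0.004  # GST tag release (C-terminal cleavage)
SUBSTRATE_UM = 5.0       # Trx-10aa-C145A-10aa-GST concentration

fold_faster = N_TERMINAL_RATE / C_TERMINAL_RATE

# Crude time scales, assuming (hypothetically) that the initial rate
# stays constant until all substrate is consumed.
n_term_minutes = SUBSTRATE_UM / N_TERMINAL_RATE
c_term_minutes = SUBSTRATE_UM / C_TERMINAL_RATE

print(f"N-terminal cleavage is about {fold_faster:.0f}-fold faster")
print(f"approximate time scale, N terminus: {n_term_minutes:.0f} min")
print(f"approximate time scale, C terminus: {c_term_minutes:.0f} min")
```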
FIG. 2. SDS-PAGE analysis of the maturation of SARS-CoV recombinant proteases. In A, the six constructs of the recombinant protease are listed, including the wild-type or C145A mutant enzyme containing 10 additional N- and/or C-terminal amino acids and the N-terminal Trx tag. In B, MW represents the molecular weight markers. Lanes 1 and 3 represent the 300 mM imidazole eluant from the Ni-NTA column of constructs 1 and 3, respectively, where the protease disappeared because it underwent autoprocessing and lost the tags. Lanes 2, 4, 5, and 6 represent the protease of constructs 2, 4, 5, and 6, respectively, eluted by 300 mM imidazole from the Ni-NTA column. They either lack the N-terminal cleavage sequence or contain the C145A mutation, so they were retained in the Ni-NTA column before elution.
AUC Analysis of Wild-type and Mutant Proteases-We then utilized AUC to examine the quaternary structures of 10aa-C145A, C145A-10aa, and the wild-type 3CLpro and compare their dimer Kd values. The AUC data for the wild-type SARS protease shown in Fig. 4 indicate that the determined molecular weight is that of a dimer and that the dimeric wild-type protein has a Kd of 0.35 nM. In contrast, 10aa-C145A and C145A-10aa, which contain 10 extra N- or C-terminal amino acids, show 49- and 16-fold larger Kd values (17.2 nM and 5.6 nM), respectively. Even with only 10 extra amino acids at the N or C terminus, dimer formation is inhibited, with the N-terminal extension having the larger impact. The sub-micromolar Kd value of the tagged protease indicates that immature 3CLpro can form a small amount of dimer, enabling it to undergo autoprocessing to yield the mature enzyme, which further serves as a seed for facilitated maturation.
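To make the reported Kd values more concrete, the following sketch recomputes the fold differences and estimates the fraction of protein present as dimer at a given total concentration. It is a minimal illustration under stated assumptions: an ideal monomer-dimer equilibrium with Kd defined as [M]²/[D], and a total concentration (100 nM of monomer units) chosen by us purely for illustration; neither the function names nor that concentration come from the study.

```python
import math


def monomer_concentration(total_nM: float, kd_nM: float) -> float:
    """Free monomer concentration for an ideal monomer-dimer equilibrium.

    Assumes Kd = [M]^2 / [D] and total = [M] + 2[D] (in monomer units),
    which gives 2[M]^2/Kd + [M] - total = 0; the positive root is returned.
    """
    return kd_nM * (math.sqrt(1.0 + 8.0 * total_nM / kd_nM) - 1.0) / 4.0


def dimer_fraction(total_nM: float, kd_nM: float) -> float:
    """Fraction of all protein chains that are part of a dimer."""
    return 1.0 - monomer_concentration(total_nM, kd_nM) / total_nM


# Kd values reported from the AUC analysis (nM).
KD = {"wild type": 0.35, "10aa-C145A": 17.2, "C145A-10aa": 5.6}

for name, kd in KD.items():
    fold = kd / KD["wild type"]
    # 100 nM total protein is a hypothetical concentration for illustration.
    frac = dimer_fraction(100.0, kd)
    print(f"{name}: Kd = {kd} nM, {fold:.0f}-fold vs wild type, "
          f"dimer fraction at 100 nM = {frac:.2f}")
```

Under these assumptions a weaker Kd lowers the dimer fraction at any fixed concentration, but a dimer population persists well below micromolar protein concentrations, which is the point made in the text about immature protease retaining enough dimer to autoprocess.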
Processing Intermediate-like, Product-bound C145A Structure-The P1-P6 peptide from protomer B′ shown in the omit map occupies the active site of C145A protomer B (Fig. 5A). The detailed molecular contacts of the peptide with the active site amino acids are shown in Fig. 5B. In the S1 site, the side-chain Oε1 of Gln306 (P1) forms a hydrogen bond with the side-chain Nε2 of His163. The side-chain Nε2 of P1-Gln donates an H-bond to the side-chain carbonyl of Glu166. Moreover, the oxygen anion at the free carboxylate end of P1-Gln forms H-bonds with the backbone N atoms of Gly143. Carbon atoms of P1-Gln interact with His41 and Ala145 by hydrophobic interactions. If Ala145 is replaced by Cys using computer modeling (shown in blue) to generate the active form, the Sγ atom of Cys145 will interact with the P1 carboxyl group (Fig. 5B).
Residues 140-145 and 163-166 form the "outer wall" of the S1 site. The S2 site of C145A is formed by the main-chain atoms of Val186, Asp187, Arg188, and Gln189 as well as the side-chain atoms of His41, Met49, and Met165, suggesting that the P2 site prefers a bulky side chain such as Val, Leu, or Phe.
Comparison of 3CLpro Structures-The conformations of the apo form (protomer A, blue) and the product-bound form (protomer B, cyan) of the C145A active sites are compared in Fig. 6A. We also superimposed the previously solved crystal structure of chain A of SARS 3CLpro complexed with a substrate-like chloromethyl ketone irreversible inhibitor (coded 1UK4), in which the bound inhibitor and water molecules were removed (shown in green), and chain B without inhibitor bound (shown in gold). The crystal structures of 3CLpro of HCoV 229E (1P9S, crimson) and TGEV (1P9U, with inhibitor; pink) were also superimposed. For SARS 3CLpro, the positions of the active-site residues of C145A protomer B (product-bound form) and those of 1UK4 chain A (inhibitor-bound form) are similar. However, C145A protomer A (free form) has a slightly different conformation, in which Thr190 is displaced from the active site so that the pocket is larger than in the product-bound form. In comparison with the active site pockets of 3CLpro of other coronaviruses, the size order is TGEV > SARS-CoV > HCoV-229E.
So far, five 3CLpro inhibitor/product-bound crystal structures have been reported. As shown in Fig. 6B, the structure of the six C-terminal residues QFTVGS of SARS 3CLpro (shown in cyan) superimposes well on that of the inhibitor QLTSNV (shown in pink) of TGEV 3CLpro reported previously (12), except for P5 and P6, and on that of the inhibitor QLTSNV (shown in gold) in the active site of 1UK4 protomer B beyond the P1 site. In addition, AG7088, an effective inhibitor of rhinovirus 3C protease (1CQQ) (27), has been suggested to be a potential inhibitor for SARS-CoV 3CLpro (12). Therefore we superimposed AG7088 (shown in green), which spans the P1-P2-P3 sites, with the six C-terminal residues of protomer B′. However, the alignment is poor (Fig. 6B). In our enzymatic assay, AG7088 at the maximal tested concentration (100 µM) did not inhibit SARS-CoV 3CLpro (data not shown).
FIG. 4. AUC experiments of wild-type SARS 3CLpro. Shown here is an example of using AUC to measure the Kd of the wild-type protease dimer-monomer equilibrium. The A280 absorbance of the protein as a function of radius and time was recorded to calculate the sedimentation coefficient and the Kd value. A, circles represent the experimental data, and the lines are the computer-generated results from fitting the data to the Lamm equation with the SedFit program. B, fitting residuals plotted as a function of radial position. C, grayscale of the residual bitmap. The randomly distributed residuals and bitmap show the quality of the data fitting. D, continuous sedimentation coefficient distribution of wild-type 3CLpro derived from the data shown in A.

DISCUSSION
Herein, we report clear electron densities for the C-terminal six amino acids of SARS 3CLpro protomer B′, which insert into the active site of the nearby symmetry-related protomer B. The product-bound structure provides the first evidence of an intermediate during SARS viral protease maturation. SARS 3CLpro has its N and C termini both close to the active site, providing clear guidance for designing a novel inhibitor to block viral maturation before the mature dimer is formed. In a dimer, the N-terminal amino acids (also called the N-finger) have many specific interactions with domains II and III of the parent monomer and domain III of the other monomer. From the crystal structure of TGEV Mpro, it had been suggested that, after in trans autocleavage, the N terminus of one monomer slides over the active site of the other monomer and adopts a position at the edge of the active site (12). That report also hypothesized that the replication complex could be anchored to the membrane in an uncleaved form, and later, when the precursor proteins accumulate to high concentrations in a particular region, the 3CL protease could release itself by intermolecular cleavage, thereby triggering the trans-processing reaction. Our structural data enable the maturation process to be better understood. It is likely that the main protease forms a dimer after autocleavage that immediately enables the catalytic site to act on other cleavage sites in the polyprotein.
Our AUC data show that wild-type SARS 3CLpro displays a remarkably small dimer Kd and that the extra amino acids at the N or C terminus cause a significant increase in the Kd value. This is consistent with a previous AUC study in which the full-length SARS 3CLpro containing a non-natural C-terminal hexa-His tag yielded a Kd = 89 nM at pH 7.6 (28). Therefore, the wild-type enzyme containing the N- or C-terminal tags probably still forms a small amount of the active dimer, which undergoes autoactivation during cell lysate preparation (see Fig. 7 as explained below). This does not happen for the inactive mutant C145A protease with extra amino acids at the N and C termini. The kinetics of processing determined from SDS-PAGE show that the N-terminal cleavage occurs before the C-terminal cleavage. This is consistent with observations that a short peptide with the N-terminal cleavage sequence is a better substrate than one with the C-terminal cleavage sequence, as determined using a high-performance liquid chromatography assay (10). After autoprocessing, the newly formed active dimeric SARS main protease can serve as a seed for a further chain reaction to activate the premature protease and the other proteins. In vitro autoactivation had also been observed for a number of proteases such as caspase and cathepsin K (29,30).
A neighboring protomer can catalyze the autoprocessing, as indicated by the crystal structure of C145A in a processing intermediate-like product-bound form (Fig. 1), and the processing rate at the N terminus is faster than that at the C terminus (Fig. 3). Moreover, in the mature 3CLpro dimer, the N terminus of one protomer is close to the active site near domain II of the other protomer (Fig. 1D). These findings allow us to develop a hypothesis of SARS 3CLpro maturation. As illustrated in Fig. 7, with three domains in the protease represented as I, II, and III, and with domain I at the N terminus and domain III at the C terminus, the maturation processing can be divided into four steps. Step 1: The N-terminal tail of one polyprotein (cyan) approaches the active site of the other polyprotein (pink) to be cleaved. Because the monomeric protease is inactive (10), the intra-chain processing for maturation is impossible. Step 2: The N-terminal tail of the polyprotein (pink) is cleaved by the partially cleaved polyprotein (cyan). Step 3: After N-terminal cleavage, the polyprotein (cyan) can flip over to its position in the immature dimer. A shift of ~10.3 Å of the N terminus is estimated from the distance between the bound P1 carboxylate in the C145A structure and the N terminus of the mature wild type, as shown in Fig. 1D. Step 4: The C-terminal tail cleavage then follows by inserting the C-terminal tail into the neighboring immature dimer (resembling the product-bound form of C145A), and then the mature dimer is formed after cleavage.
FIG. 6. Superposition of 3CLpro active sites and inhibitors. A, superimposition of the active sites of five 3CLpro protease structures: cyan and blue, protomer A and B of C145A; light green and gold, protomer A and B of the wild type, respectively; crimson, HCoV 229E 3CLpro (1P9S); and pink, TGEV 3CLpro (1P9U). B, superimposition of the six C-terminal residues of SARS 3CLpro (SGVTFQ) (cyan), the inhibitor of TGEV 3CLpro (1P9U, pink), the inhibitor of TGEV 3CLpro (VNSTLQ) at the active site of SARS 3CLpro (1UK4, gold), and the inhibitor of rhinovirus 3C protease, AG7088, at the active site (1CQQ, green).
The crystal structure of C145A in a product-bound form presented here not only suggests the mechanism for maturation but also guides structure-based protease inhibitor design. AG7088 did not inhibit SARS protease because the structural features of the active site pocket of SARS 3CLpro differ from those of rhinovirus 3C protease. Thr142 of rhinovirus 3C protease is absent from SARS 3CLpro, so that there cannot be effective hydrogen bonding with the lactam moiety at the P1 site. Because both the N and C termini are close to the active site of the SARS-CoV 3CLpro according to the new structure of C145A in a product-bound form, inhibitors targeting the dimer interface may block the protease maturation. In fact, several drugs targeting the viral protease interface were in different phases of clinical trials (31). Our new structure of the C145A mutant thus provides a template for the design of inhibitors to block the maturation process, which is ongoing in our laboratories.
FIG. 7. Proposed scheme of SARS 3CLpro maturation. Two polyproteins are shown in pink and cyan, each with three domains (I, II, and III). The maturation processing is composed of Step 1: polyprotein (cyan) approaches a second polyprotein (pink) and inserts its N terminus into the active site to be cleaved; Step 2: the uncleaved polyprotein (pink) then inserts its N terminus into the active site for processing; Step 3: after N-terminal processing, the N terminus of the polyprotein flips over from the active site to its new position to form a premature dimer; Step 4: the C terminus of the partially digested polyprotein in the premature dimer is inserted into the active site of another immature dimer to be cleaved, and finally the mature dimer is formed.