Structure of the Autocatalytic Cysteine Protease Domain of Potyvirus Helper-component Proteinase*

The helper-component proteinase (HC-Pro) of potyvirus is involved in polyprotein processing, aphid transmission, and suppression of antiviral RNA silencing. There is no high resolution structure reported for any part of HC-Pro, hindering mechanistic understanding of its multiple functions. We have determined the crystal structure of the cysteine protease domain of HC-Pro from turnip mosaic virus at 2.0 Å resolution. As a protease, HC-Pro only cleaves a Gly-Gly dipeptide at its own C terminus. The structure represents a postcleavage state in which the cleaved C terminus remains tightly bound at the active site cleft to prevent trans activity. The structure adopts a compact α/β-fold, which differs from papain-like cysteine proteases and shows weak similarity to nsP2 protease from Venezuelan equine encephalitis alphavirus. Nevertheless, the catalytic cysteine and histidine residues constitute an active site that is highly similar to those in papain-like and nsP2 proteases. HC-Pro recognizes a consensus sequence YXVGG around the cleavage site between the two glycine residues. The structure delineates the sequence specificity at sites P1–P4. Structural modeling and covariation analysis across the Potyviridae family suggest that a tryptophan residue accounts for the glycine specificity at site P1′. Moreover, a surface of the protease domain is conserved in potyvirus but not in other genera of the Potyviridae family, likely due to additional functional constraints. The structure provides insight into the catalytic mechanism, cis-acting mode, cleavage site specificity, and other functions of the HC-Pro protease domain.

The genomes of many viruses are translated into polyprotein precursors that are further processed by proteolytic cleavage into mature products (1). Potyviruses are positive-sense single-stranded RNA viruses and form the largest genus of plant viruses (2). The genome of potyvirus is translated into a single large 340-370-kDa polyprotein that contains, in N to C order, 10 proteins: P1, HC-Pro, P3, 6K1, CI, 6K2, VPg, NIa-Pro, NIb, and coat protein. The polyprotein is processed into mature products by three virus-encoded proteases: P1 (3,4), helper-component proteinase (HC-Pro) (5,6), and NIa-Pro (7). P1 and HC-Pro cleave only at their respective C termini in cis, whereas NIa-Pro is a trypsin-like cysteine proteinase responsible for the remaining seven cleavage events in the C-terminal two-thirds of the polyprotein.
HC-Pro is a multifunctional protein associated with polyprotein processing, aphid transmission, and various defense-related functions (2,8). HC-Pro is generally divided into three functional domains: an N-terminal domain, a central region, and a cysteine protease domain (CPD) in the C-terminal region. HC-Pro was first identified as a helper component for aphid-mediated plant-to-plant transmission of potyvirus (9), and it likely functions as a bridge between the aphid stylet and virions. Mutagenesis studies showed that a KITC motif in the N-terminal domain is critical for binding the aphid stylets (10), and a PTK motif close to the CPD interacts with viral capsid protein (11). The N-terminal domain is essential for aphid transmission but is dispensable for virus viability, infectivity, and synergism (12,13).
HC-Pro is involved in genome amplification, long distance movement, pathogenicity, and viral synergism (13-17). These effects are now generally ascribed to the role of HC-Pro as an RNA silencing suppressor (RSS) (18-21). RNA silencing serves as an antiviral defense system in plants, in which viral double-stranded RNAs are processed by Dicer into small interfering RNAs (siRNAs) that guide Argonaute proteins to cleave viral RNAs (22). The RSS activity of HC-Pro was mainly associated with its central region (13,14,21) and was also affected by mutations in the CPD (23,24). The molecular mechanism of HC-Pro RSS activity remains elusive. Recent studies suggested that HC-Pro suppresses RNA silencing by sequestering siRNA duplexes, as do many other viral RSSs (25,26). Consistently, both the siRNA binding and RSS activity of plum pox virus HC-Pro were impaired by mutations in a conserved FRNK motif in the central region (27).
HC-Pro cleaves in cis a Gly-Gly dipeptide at its own C terminus (5,6). The autoproteolytic activity is exclusively intramolecular, as a CPD fragment could not cleave a dead enzyme substrate provided in trans (6). This proteolytic activity is essential for virus viability (28). HC-Pro was classified as a cysteine protease because a cysteine residue and a histidine residue are essential for autoproteolysis (29). However, the HC-Pro CPD is not homologous with other cellular and viral cysteine proteases beyond the Potyviridae family.
Despite being a subject of considerable interest for >3 decades, only low resolution structures for HC-Pro have been obtained by means of two-dimensional crystallography and electron microscopy (30,31). In this study, we determined the crystal structure of HC-Pro CPD from turnip mosaic virus (TuMV) at 2.0 Å resolution. The structure provides important insight into the catalysis mechanism, cis-acting mode, cleavage site specificity, and other functions of the HC-Pro CPD.

EXPERIMENTAL PROCEDURES
Protein Expression, Purification, and Crystallization-The DNA sequence encoding residues 301-458 of TuMV HC-Pro was PCR-amplified and cloned into an engineered pET28a vector, in which the inserted protein was fused to the C terminus of a His6-SMT3 tag. SMT3 is a yeast small ubiquitin-like modifier protein used to increase protein expression level and solubility (32). The protein was expressed in the Escherichia coli Rosetta(DE3) strain, and protein expression was induced with 0.3 mM isopropyl β-D-1-thiogalactopyranoside, followed by growth at 16°C. Cells were resuspended in phosphate-buffered saline (PBS) containing 137 mM NaCl, 2.7 mM KCl, 10 mM Na2HPO4, and 2 mM KH2PO4 (pH 7.4) and lysed by sonication. After clarification, the lysate supernatant was loaded onto a HisTrap column. The bound protein was washed with PBS buffer and then with 50 mM imidazole in PBS buffer and eluted with 500 mM imidazole in PBS buffer. The His6-SMT3 tag was cleaved by ULP1 at 4°C overnight. The sample was diluted 10-fold with PBS buffer and passed through a HisTrap column to remove the His6-SMT3 tag. The flowthrough was concentrated and further purified on a Superdex-75 column equilibrated in 20 mM HEPES-K (pH 7.6) and 100 mM KCl. Because the protein is prone to reversible aggregation at room temperature, all purification and crystallization steps were performed at 4°C. Seleno-methionine-labeled protein was prepared by inhibition of the methionine biosynthesis pathway (33) and purified as described above.
Structure Determination-Diffraction data were collected at the Shanghai Synchrotron Radiation Facility beamline BL17U and processed with HKL2000 (34). The crystals belong to space group I422 with one molecule in the asymmetric unit. The structure was solved by selenium single-wavelength anomalous diffraction phasing in SHARP (35). The model was built in COOT (36) and refined in Refmac (37). The final model contains residues 336-458 and 17 water molecules. The RAMPAGE analysis showed that 95.9% of the residues are in a favorable region, 3.3% in an allowed region, and 0.8% in an outlier region (38). The single outlier residue Ser424 is located in a loop (residues 419-426) that was roughly modeled due to weak electron density. Structural figures were created in PyMOL (39).
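As a consistency check on the Ramachandran statistics above, the reported percentages can be reproduced from residue counts. The sketch below assumes that the analysis covers the 121 non-terminal residues of the 123-residue model (336-458), because the two terminal residues lack a complete φ/ψ pair. That total and the 116/4/1 split are inferred from the percentages and are not stated in the text.

```python
# Hypothetical consistency check: the counts below are inferred, not reported.
first_res, last_res = 336, 458
modeled = last_res - first_res + 1   # 123 residues in the final model
evaluated = modeled - 2              # assumption: termini have no full phi/psi pair

# Assumed split of the evaluated residues between the three regions.
counts = {"favored": 116, "allowed": 4, "outlier": 1}
assert sum(counts.values()) == evaluated

# Convert counts to percentages rounded to one decimal place.
percentages = {region: round(100 * n / evaluated, 1) for region, n in counts.items()}
print(percentages)
```

Under these assumptions the single outlier (Ser424) corresponds to the reported 0.8%.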

RESULTS
Structural Description-The proteolytic activity of the tobacco etch virus (TEV) HC-Pro was previously shown to reside in 155 residues within its C terminus (5). We expressed, purified, and crystallized a fragment (residues 301-458) that covers a similar C-terminal region of TuMV HC-Pro and corresponds to the mature product of autocleavage (Fig. 1A). The structure was solved by the single-wavelength anomalous diffraction method using a seleno-methionine-labeled crystal and was refined to 2.0 Å with an Rwork/Rfree of 0.246/0.274 (Table 1 and supplemental Fig. S1). The refinement statistics appear worse than would be expected for a structure at 2.0 Å resolution. This may be partially caused by residues that were either not modeled (residues 301-335) or only roughly modeled (residues 419-426) because of missing or weak electron density.
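For readers less familiar with the refinement statistics quoted above, the following sketch shows the conventional R-factor, Σ||Fobs| − |Fcalc|| / Σ|Fobs|, which is computed over the working reflections for Rwork and over the held-out test reflections for Rfree. The amplitudes are made-up toy values, the function name is hypothetical, and the data are assumed to be already on a common scale; this is not the Refmac implementation.

```python
def r_factor(f_obs, f_calc):
    """Conventional crystallographic R-factor for paired structure-factor amplitudes."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need two non-empty lists of equal length")
    # Sum of absolute amplitude differences, normalized by the observed amplitudes.
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator

# Toy amplitudes (hypothetical, for illustration only).
work_obs = [100.0, 80.0, 60.0, 40.0]
work_calc = [90.0, 85.0, 50.0, 45.0]
r_work = r_factor(work_obs, work_calc)
print(f"R = {r_work:.3f}")
```

For the values reported here, the gap between Rfree and Rwork is 0.274 − 0.246 = 0.028.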
Residues 301-335 were not resolved in the structure, likely due to conformational flexibility and partial degradation. SDS-PAGE analysis of dissolved crystals revealed a mixture of intact and degraded proteins. This region harbors a PTK motif (residues 310-312 in TuMV) that probably binds the viral coat protein during aphid-mediated transmission (11).
The catalytic residue Cys344 is located at the N terminus of helix α1, and the other catalytic residue His417 is located on strand β2. The substrate binding cleft is lined by the loop connecting helices α2 and α3 and the N-terminal region of helix α1 on one side and by strand β2 on the other side. The C-terminal tail remains tightly bound at the cleft. The bound product would block the enzyme from accepting new substrate, which explains why HC-Pro possesses only cis-proteolysis activity (6).
Comparison with Other Cysteine Protease Structures-We searched the DALI server and did not find a close structural homolog of the CPD (40). Many papain-like cysteine proteases, such as papain, staphopain, ervatamin, and cathepsin family members, were retrieved with a low z-score of 2.0-3.7 and a root mean square deviation of 3.4-4.0 Å over ~80 Cα pairs. The overall structure of HC-Pro CPD differs significantly from papain-like folds (Fig. 1C). Papain-like folds are generally divided into two domains: an L-domain that is mainly helical and harbors the catalytic cysteine and an R-domain with a predominant β-barrel that provides the catalytic histidine (41). The active site cleft is situated at the interface of the two domains. The HC-Pro CPD is much smaller than papain (122 versus 212 residues). Helices α1-α3 of HC-Pro roughly correspond to the L-domain of papain, whereas the equivalents of HC-Pro strands β1 and β2 are part of a much larger and more complex β-barrel in the papain R-domain (Fig. 1C).
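The root mean square deviation quoted above summarizes how far apart matched Cα atoms lie after two structures have been superposed. The sketch below computes it for paired coordinates that are assumed to be already superposed and matched one-to-one. The function name and toy coordinates are hypothetical, and the alignment and superposition steps that DALI performs are not reproduced here.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD in the input units for equal-length lists of (x, y, z) tuples."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    # Sum of squared distances between matched atoms.
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

# Two toy C-alpha pairs (hypothetical coordinates in angstroms).
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
b = [(0.0, 0.0, 3.0), (1.0, 4.0, 0.0)]
print(f"RMSD = {rmsd(a, b):.2f} A")
```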
By visual comparison of known cysteine protease structures, we found another remote structural homolog, the CPD of nsP2 from Venezuelan equine encephalitis alphavirus (42), which was not identified by the DALI search (Fig. 1C). The CPDs of HC-Pro and nsP2 are both compact and share similar topology at the α1-α2-α3-β1-β2 region of HC-Pro, although these structural elements differ considerably between HC-Pro and nsP2 in orientation and size. In the HC-Pro CPD structure, the region downstream of strand β2 orients the C terminus into the active site for autoproteolysis. No such elements exist in the nsP2 CPD, which is a trans-acting protease, and its additional C-terminal region wraps around the structure opposite to the active cleft (Fig. 1C).
Despite different overall structures, HC-Pro, nsP2, and papain-like proteases share a similar active site configuration with the catalytic cysteine residue located at the N terminus of a helix and the catalytic histidine residue in a β-strand (Fig. 1, D and E). In addition, the substrate of HC-Pro and a covalently linked inhibitor of papain occupy a similar position at the active site cleft (Fig. 1D). The active site of most cysteine proteases contains a catalytic triad in which the third residue is an asparagine (Asn175 in papain), aspartate, or glutamate (43,44). The third residue orients the imidazole ring of the catalytic histidine and perhaps stabilizes its protonated form. The structure shows that HC-Pro apparently lacks the third catalytic residue, which is similar to nsP2 (42).
In papain, the backbone amide of Cys25 and the side chain amide of Gln19 make up an oxyanion hole that stabilizes the oxyanion of the tetrahedral transition intermediate during catalysis (Fig. 1D). No structural equivalent of Gln19 is present in HC-Pro; nevertheless, the amides of Cys344 and Tyr343 might form the oxyanion hole (Fig. 2D).
The side chains of the catalytic dyad residues in the HC-Pro CPD structure apparently adopt an inactive conformation; His417 ND1 and Cys344 SG are too distant (7.2 Å) to interact with each other (Fig. 1D). This may be a feature of the postcleavage state, or it may result from a crystal packing interaction involving the His417 imidazole group and from the lack of a third catalytic residue to stabilize the His417 conformation.
To study the conformation of the enzyme before cleavage, we made constructs that contained the mutation C344S or H417K at the catalytic dyad to inactivate the enzyme and in which the C terminus was extended to residue 461. Unfortunately, all of these mutant proteins failed to crystallize. Hence, we resorted to modeling the precleavage state by adding the P1′ residue (Gly459) and orienting the side chains of the catalytic dyad residues (Fig. 2D). According to the classic chemical mechanism of cysteine proteases (43), the catalytic residues Cys344 and His417 likely form a thiolate-imidazolium ion pair. The SG atom of Cys344 would make a nucleophilic attack on the carbonyl carbon of Gly458, resulting in the formation of a thioester-bonded acyl-enzyme intermediate and release of the amine product. The carboxylate product is released upon hydrolysis of the thioester bond. Our structure corresponds to the postcleavage state with the carboxylate product bound at the active site.
Recognition of the Cleavage Site-Previous mutagenesis analysis demonstrated that HC-Pro recognizes the consensus sequence YXVGG around the cleavage site, in which the cleavage occurs between the two glycine residues (45). According to the standard convention (46), the five residues are named in N to C order as P4, P3, P2, P1, and P1′, and their respective enzyme binding sites are called S4, S3, S2, S1, and S1′. Our structure delineates the specificity determinants for P4 to P1 (Fig. 2A). The P4-P1 segment adopts an extended structure and is sandwiched between strand β2 and the loop connecting helices α1 and α2, forming four hydrogen bonds along the backbone.
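The naming convention above can be made concrete with a short sketch that labels the TuMV cleavage-site residues relative to the scissile bond and locates the YXVGG consensus in a sequence. The helper names are hypothetical, the regular expression is only one possible encoding of the consensus, and the test sequence in the usage note is invented.

```python
import re

def label_sites(residues, p1_number):
    """Label (name, number) residues as P4..P1 and P1'.. around the scissile bond."""
    numbers = [num for _, num in residues]
    if p1_number not in numbers:
        raise ValueError("P1 residue not found")
    p1_index = numbers.index(p1_number)
    labels = {}
    for i, (name, num) in enumerate(residues):
        if i <= p1_index:
            # Non-primed sites count upward going N-terminal from the bond.
            labels[f"P{p1_index - i + 1}"] = f"{name}{num}"
        else:
            # Primed sites count upward going C-terminal from the bond.
            labels[f"P{i - p1_index}'"] = f"{name}{num}"
    return labels

# YXVG followed by G; the lookahead keeps the P1' glycine out of the match,
# so match.end() is the offset of the scissile bond.
CONSENSUS = re.compile(r"Y.VG(?=G)")

def cleavage_offsets(sequence):
    """Return 0-based offsets after which a YXVG|G site would be cleaved."""
    return [m.end() for m in CONSENSUS.finditer(sequence)]

tumv_site = [("Tyr", 455), ("Arg", 456), ("Val", 457), ("Gly", 458), ("Gly", 459)]
labels = label_sites(tumv_site, p1_number=458)
print(labels)
```

Calling `cleavage_offsets("AAYRVGGSS")` on this invented sequence returns `[6]`, meaning the bond lies between the sixth and seventh residues.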
The P1 residue Gly458 is fully encircled by a hole composed of the backbone atoms of Gly342, Tyr343, Cys344, Ile416, His417, and Val418 and the side chain of Trp379 (Fig. 2, B and C). Any side chain at the P1 position would lead to a steric clash with the hole. The P2 residue Val457 is entirely buried, and its side chain is bound in a hydrophobic pocket lined by residues Tyr345, Ile348, Leu382, Val385, Met407, Ile416, and Val418. In addition, the backbone atom of Val457 is shielded from solvent by the side chains of Trp379 and Ile415, which form a bridge across the substrate binding cleft. The side chain of the P3 residue Arg456 is exposed to solvent and not recognized, which is consistent with the sequence variability at this position.
The P4 residue Tyr455 is largely buried at the S4 subsite composed of Val409, His411, Lys414, Ile415, Ile416, Ser451, and Leu452 (Fig. 2A). The phenol ring of Tyr455 is sandwiched between the backbone atoms of Lys414 and Ile415 on one face and the side chain of Leu452 on the other face. One edge of the Tyr455 phenol ring is docked against Val409 and Ile416, whereas the other edge is largely exposed. The hydroxyl group of Tyr455 is specified by a hydrogen bond with the imidazole ring of His411, which further stacks over the side chain of Tyr429. Alanine substitution of the equivalent residues of Asp410 and His411 in TEV HC-Pro (mutant AS 23) was shown to impair autoproteolytic activity (14). The H411A mutation likely disturbs the binding of the P4 tyrosine residue and thereby affects catalytic activity. Interestingly, the HC-Pro of Bymovirus does not display specificity for the P4 tyrosine (Fig. 3, A and B). The S4 subsite residues Val409 and His411 are invariant in HC-Pro homologs that have a P4 specificity; however, in Bymovirus, His411 is replaced by a proline or alanine, and Val409 is replaced by a lysine or phenylalanine. These changes within the S4 subsite may account for the loss of P4 specificity in Bymovirus.
The structure contains no P1′ residue, and the mechanism of glycine specificity at P1′ remains undetermined. The structure model at the precleavage state suggests that Trp379 is a key determinant for P1′ specificity (Fig. 2D). In support of this proposal, HC-Pro homologs from Bymovirus have no specificity at P1′ and show a parallel replacement of Trp379 by phenylalanine or tyrosine. A smaller six-member ring at the position of Trp379 may relieve the size restriction on P1′ (Fig. 2D).
HC-Pro in the Potyviridae Family-The … (47,48). Homologs of potyviral HC-Pro are found in the Rymovirus, Tritimovirus, Bymovirus, and Brambyvirus genera and in one species of Ipomovirus (Fig. 3B). Whether Macluravirus has HC-Pro is unknown because its complete genome is not available. In the genus Ipomovirus, HC-Pro is present in sweet potato mild mottle virus, but it is not present in three other species: cucumber vein yellowing virus (49), squash vein yellowing virus (50), and cassava brown streak virus (51). The taxonomic status of sweet potato mild mottle virus was thought to be out of date (50). The equivalent genome position of P1/HC-Pro is occupied by two serine proteases P1a and P1b in cucumber vein yellowing virus and squash vein yellowing virus and a single P1 in cassava brown streak virus.
Only HC-Pro proteins from Rymovirus show sequence homology to the full-length Potyvirus HC-Pro, and those from other genera have homology only to the CPD. This suggests that not every function of potyviral HC-Pro is necessarily conserved in other genera. For example, HC-Pro of wheat streak mosaic virus, which is a tritimovirus, has been shown to be dispensable for systemic infection (52). To provide insight into the unique functions of the HC-Pro CPD in potyvirus, we mapped the residues that are conserved within the Potyvirus genus or across the Potyviridae family onto the CPD structure.
We found that the residues universally conserved in the Potyviridae family primarily constitute the hydrophobic core of the CPD structure or surround the active site, which is consistent with a role in structure maintenance and catalysis. However, within the Potyvirus genus, the CPD shows a much higher degree of conservation with an average pairwise identity of 64% and an average pairwise similarity of 81% (Fig. 3A). Notably, the surface that spans helices α1 and α2, strands β1 and β2, and the loop between β2 and α3 is highly conserved in the Potyvirus genus (Fig. 3C). This is not simply due to a lack of evolutionary divergence because the opposite surface is not conserved (Fig. 3C). The conserved surface is likely involved in functions other than proteolysis that are specific to Potyvirus (and perhaps Rymovirus). Interestingly, asparagine replacement of the equivalent residue of TuMV Asp420, which is located on this surface and appears to play little role in structural stabilization (Fig. 3C), abolished the RSS activity of TEV HC-Pro (23).
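The average pairwise identity cited above can be illustrated with a minimal sketch that operates on a pre-aligned set of sequences. How the 64% figure was actually computed is not stated, so the convention used here (identity over columns where neither sequence has a gap) is an assumption, and the three short sequences are invented toy data rather than real HC-Pro sequences.

```python
from itertools import combinations

def pairwise_identity(seq_a, seq_b):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    return 100.0 * sum(x == y for x, y in columns) / len(columns)

def average_pairwise_identity(alignment):
    """Mean percent identity over all unordered sequence pairs."""
    pairs = list(combinations(alignment, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)

# Toy alignment (hypothetical sequences; "-" marks a gap).
toy = ["YRVGG", "YKVGG", "YRVG-"]
print(f"{average_pairwise_identity(toy):.1f}%")
```

Other conventions, such as dividing by the full alignment length, would give different numbers for the same alignment.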

DISCUSSION
The HC-Pro CPD performs a single essential proteolytic cleavage to release its C terminus from the rest of the polyprotein. We have determined the first crystal structure of HC-Pro CPD, providing important insight into its unique fold, catalysis mechanism, substrate specificity, and cis-acting mode. The boundary of the CPD (residues 336-458) defined by the structure is slightly shorter at its N terminus than previously defined (6) and excludes the PTK motif that is involved in aphid-mediated virus transmission (11). We show that the HC-Pro CPD structure is divergent from known cysteine protease structures while maintaining the conserved catalytic cleft of papain-like proteases. The structure also verifies that the cysteine and histidine residues shown previously to be essential for catalysis indeed form the catalytic dyad at the active site (29).
The structure of HC-Pro CPD in the postcleavage state suggests that trans-activity of HC-Pro would be inhibited by binding of the cleaved product to the active cleft. Similar autoinhibitory structures were observed for the Sindbis virus nucleocapsid protease and the hepatitis C virus NS2 protease (53,54). Inhibition by the unreleased product appears to be a common mechanism underlying the action of cis-only proteases.
In an attempt to convert the enzyme into a trans-acting protease, we made a fragment (residues 332-447) with the inhibitory C-terminal tail deleted. The truncated protein could not be purified, suggesting that the C-terminal tail is important for the integrity of the structure. To our knowledge, there is no previous example of an exclusively cis-acting protease being converted into a trans-protease by removal of the C-terminal tail.
In the current MEROPS protease database (44), HC-Pro proteins constitute a C6 family with unassigned clan status. The nsP2 protease of alphavirus belongs to the C9 family, which is the only member of the newly created clan CN. The structural similarity between HC-Pro and nsP2 CPDs suggests that the HC-Pro C6 family could be assigned to clan CN. Interestingly, both potyvirus and alphavirus are positive-sense RNA viruses and are transmitted by insect vectors (aphid for potyvirus and mosquito for alphavirus). The structural similarity between HC-Pro and nsP2 CPD may be a manifestation of common evolutionary origin or a result of convergent evolution. Clan CN may also include the p29 and p48 cysteine proteases from Cryphonectria hypoviruses, which share short stretches of sequence similarity with HC-Pro around the catalytic dyad residues and cleavage site (55,56).
In addition to proteolysis, the CPD of potyvirus has been associated with RSS activity (23,24). Although the exact role of the CPD in RSS activity remains to be defined, an intact CPD structure seems to be important. Many mutations that abolish RSS activity appear to disrupt the CPD structure (23,24). We identified a surface on the CPD that is conserved in Potyvirus but not in other genera of Potyviridae that contain a homologous CPD. The surface might contribute to RSS activity by contacting the central region or by playing a direct role.