Structure of Arterivirus nsp4

Arteriviruses are enveloped, positive-stranded RNA viruses and include pathogens of major economic concern to the swine- and horse-breeding industries. The arterivirus replicase gene encodes two large precursor polyproteins that are processed by the viral main proteinase nonstructural protein 4 (nsp4). The three-dimensional structure of the 21-kDa nsp4 from the arterivirus prototype equine arteritis virus has been determined to 2.0 Å resolution. Nsp4 adopts the smallest known chymotrypsin-like fold with a canonical catalytic triad of Ser-120, His-39, and Asp-65, as well as a novel α/β C-terminal extension domain that may play a role in mediating protein-protein interactions. In different copies of nsp4 in the asymmetric unit, the oxyanion hole adopts either a collapsed inactive conformation or the standard active conformation, which may be a novel way of regulating proteolytic activity.

The arteriviruses porcine reproductive and respiratory syndrome virus (PRRSV) (1) and equine arteritis virus (EAV) (2) are major economic concerns for the swine- and horse-breeding industries worldwide. Both viruses are widespread in the population, can establish persistent infections, and are easily transmitted via both respiratory and venereal routes. PRRSV is currently considered to be a major swine pathogen, causing reproductive failure and severe pneumonia in neonates. Horses infected with EAV are often asymptomatic, but persistently infected stallions ("shedding stallions") can infect mares through semen, regularly leading to spontaneous abortion of the fetus (3). Apart from vaccinations that are costly and (given the use of live attenuated vaccines) not without risk (4), there are currently no effective methods to combat arteriviral diseases.
The family Arteriviridae (order Nidovirales) comprises enveloped, mammalian RNA viruses with a 12-16-kilobase positive-stranded genome (5-7). In addition to the prototype EAV and PRRSV, the family includes the lactate dehydrogenase-elevating virus of mice and simian hemorrhagic fever virus. Nidovirus genomes are polycistronic, containing a large 5′-terminal replicase gene, which is expressed from the viral genome RNA, and a downstream set of (largely) structural protein genes. The latter are expressed via the transcription of a nested set of subgenomic mRNAs from the 3′-terminal region of the genome (8). The replicase gene is translated into two multidomain precursor proteins, which are cleaved into mature nonstructural proteins by multiple viral proteinases, a key regulatory mechanism in the nidovirus life cycle (7). In the arterivirus prototype EAV, the replicase gene is translated into open reading frame 1a (ORF1a) and ORF1ab polyproteins of 1,727 and 3,175 amino acids, respectively; the latter results from a ribosomal frameshift that can occur just prior to termination of ORF1a translation. As illustrated in Fig. 1, the EAV ORF1a and ORF1ab proteins are cleaved by three different ORF1a-encoded proteinases: papain-like cysteine proteinases located in nonstructural protein 1 (nsp1) and nsp2, and the chymotrypsin-like serine proteinase nsp4 (9). Following the rapid autocatalytic release of nsp1 and nsp2, the remainder of the polyproteins (nsp3-8 and nsp3-12) is processed by nsp4, the main viral proteinase (10). Internal hydrophobic domains probably target these proteins to intracellular membranes, which are modified to accommodate viral RNA synthesis (11).
The nsp4 main proteinase processes the remaining eight cleavage sites, five in the ORF1a protein (12,13) and three in the ORF1b-encoded part of the ORF1ab protein (14), which encodes (among other functions) the viral RNA-dependent RNA polymerase and helicase (9). Alternative processing pathways are used during cleavage of the nsp3-8 intermediate (13). In the major pathway, nsp2 associates with nsp3-8 as a cofactor and triggers nsp4 to autocatalytically cleave its C terminus to yield the nsp3-4 and nsp5-8 intermediates. The relatively slow processing of the nsp3↓4 and nsp7↓8 sites ensues (10), but the nsp5↓6 and nsp6↓7 bonds are not cleaved, and cleaved nsp6 and nsp7 are not produced. In the minor pathway, nsp2 does not associate with nsp3-8, and the nsp4↓5 junction is not cleaved (13). However, nsp4 does cleave the nsp3↓4, nsp5↓6, nsp6↓7, and nsp7↓8 sites of the nsp3-8 intermediate, indicating that the nsp4 proteinase is fully functional.
To provide a structural framework for understanding the complexities of proteolysis in arteriviruses, we have determined the crystallographic structure of EAV nsp4 at 2.0 Å resolution. The structure reveals the smallest chymotrypsin-like fold known as well as a novel C-terminal α/β extension domain. This first structure of an arteriviral proteinase opens new avenues for understanding replicase maturation and establishes a basis for rationally developing antiviral agents.

EXPERIMENTAL PROCEDURES
Expression, Purification, and Crystallization-Nsp4 (nt 3417-4025 of the EAV genome; GenBank™ accession number X53459) was cloned into pMAL-c2 (New England Biolabs, Inc.) and expressed as an MBP-nsp4 fusion in Escherichia coli BL21(DE3) (Promega). Expression was induced with 1 mM isopropyl-1-thio-β-D-galactopyranoside for 4 h at 37 °C, and cells were lysed using a French press. MBP-nsp4 was purified by amylose-affinity chromatography and cleaved with 1 unit of thrombin/mg of protein in 20 mM Tris, pH 8.1, and 150 mM NaCl. Following cleavage, nsp4 contained additional residues at its N and C termini, Ser-Met and Leu-Ala-Ser, respectively. Further purification was achieved using Sepharose G-75 and Mono Q columns (Amersham Biosciences). MBP and uncleaved fusion protein were removed using amylose resin. The integrity of nsp4 was confirmed by mass spectrometry. Crystals were grown at room temperature by the hanging drop vapor diffusion method by mixing 2 µl of protein (25 mg/ml) and 2 µl of 23% (w/v) polyethylene glycol MME 2000, 100 mM Tris, pH 8.5, 10% (w/v) ethylene glycol, and 0.2 M ammonium acetate. Crystals grew within 1-2 weeks and had typical dimensions of 0.5 × 0.2 × 0.2 mm. The crystals belonged to space group P1 with cell dimensions a = 56.6, b = 60.7, c = 62.8 Å, α = 74.0°, β = 63.2°, and γ = 66.2°.
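As a rough consistency check on the cell dimensions reported above, the following sketch computes the volume of the triclinic cell and the corresponding Matthews coefficient. It assumes four copies of a 21-kDa protein per P1 cell (both values are stated elsewhere in this paper) and uses the conventional 1.23 Å³/Da approximation for the protein volume when estimating solvent content. The function names are illustrative only and do not belong to any crystallographic package.

```python
import math


def triclinic_volume(a, b, c, alpha, beta, gamma):
    """Unit cell volume in cubic angstroms; edges in angstroms, angles in degrees."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)


def matthews_coefficient(volume, copies, mass_da):
    """Crystal volume per unit of protein mass (cubic angstroms per dalton)."""
    return volume / (copies * mass_da)


def solvent_fraction(vm):
    """Approximate solvent fraction; 1.23 is the usual protein-density constant."""
    return 1.0 - 1.23 / vm


# Cell dimensions from the text; 4 copies and 21 kDa are taken from the paper,
# and treating the whole P1 cell as one asymmetric unit is the assumption here.
volume = triclinic_volume(56.6, 60.7, 62.8, 74.0, 63.2, 66.2)
vm = matthews_coefficient(volume, copies=4, mass_da=21_000)
solvent = solvent_fraction(vm)

print(f"cell volume       : {volume:,.0f} A^3")
print(f"Matthews coeff.   : {vm:.2f} A^3/Da")
print(f"solvent fraction  : {solvent:.2f}")
```

With these inputs the Matthews coefficient comes out close to 2.1 Å³/Da, which is within the range commonly seen for protein crystals and is consistent with four copies in the cell.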
Data Collection, Phasing, and Refinement-All data were collected on flash-frozen crystals at 100 K by transferring crystals from the mother liquor or heavy atom soaking solutions into a nitrogen gas stream. Data from the crystal soaked in mersalyl were measured at beamline BM-14-C (BioCARS-CAT, Advanced Photon Source) using an ADSC Quantum-4 detector. All other data were measured using Cu Kα radiation (Rigaku RU-300 generator, Osmic optics) and an R-AXIS IV++ image plate detector. Data were processed using the HKL suite (15). Experimental phases were calculated using SOLVE (16). RESOLVE (17) was used for density modification, improving the figure of merit from 0.57 to 0.69, and to generate a partial main-chain trace. The experimental electron density map was further improved by ARP/wARP (18). From this modified electron density map, a nearly complete trace of a single copy in the asymmetric unit was constructed using XFIT (19). A model for the complete asymmetric unit was generated by transposing the coordinates of the first copy onto partial models of the other copies and making manual adjustments. Refinement against experimental phase information was performed using CNS (20). The model was further refined against the Native 1 data set using REFMAC version 5.0.36 (21) with individual TLS parameters refined separately for the N-terminal barrel, C-terminal barrel, and the C-terminal domain. The Native 1 data set was used for refinement instead of the higher resolution mersalyl data set because of systematic measurement errors in the latter. Non-crystallographic symmetry restraints were not used at any point during refinement because of significant differences in conformation between different copies. Residues 1-6 are missing from all copies. In addition, the following residues are missing: 204 in copy A, 199-204 in copy B, 198-204 in copy C, and 8 and 204 in copy D.
88.6% of residues lie in the most favored regions of the Ramachandran plot and 10.7% lie in the additional allowed regions (regions defined by PROCHECK) (22). Additional checks on geometry were performed using WHATCHECK (23). Data quality, phasing, and refinement statistics are given in Table I. Figures were prepared with MOLSCRIPT (24), BOBSCRIPT (25), and RASTER3D (26).

RESULTS AND DISCUSSION
Structure of EAV nsp4-The structure of EAV nsp4 was determined using the multiple isomorphous replacement technique (Fig. 2A; refer to Table I for crystallographic parameters). There are four copies of nsp4 (designated A, B, C, and D) in the triclinic unit cell, each containing two β-barrels as well as a unique C-terminal domain. The N-terminal barrel consists of six β-strands (A1 to F1), while the C-terminal barrel is composed of seven (A2 to G2) (Fig. 2B). The G2 β-strand is also found in Sindbis virus core protein (SCP) (27) and Semliki forest virus core protein (SFCP) (28) but is not present in most other chymotrypsin-like proteinases (CLPs). The core of both β-barrels is composed of conserved hydrophobic residues (Fig. 2B). In addition, Trp-114 is a conserved, fully solvent-exposed hydrophobic residue that lies in a groove lined with other conserved residues near the substrate-binding pockets. This groove and Trp-114 may mediate protein-protein interactions.
The most striking feature of the nsp4 structure is the presence of an additional C-terminal domain not found in most other CLPs. This third domain comprises residues 156-204 and consists of two short pairs of β-strands and two α-helices. The C-terminal domain interacts with the C-terminal barrel through an interface (buried surface area of 1638 Å²) consisting of conserved hydrophobic residues: Leu-105 and Leu-112 from the C-terminal β-barrel and Val-158, Leu-163, Phe-167, Ile-182, Leu-196, and Ile-197 from the C-terminal domain. There is also a patch of conserved solvent-exposed hydrophobic residues (Fig. 2C) that may form part of the interface with nsp5 in the nsp4-5 intermediate. This hydrophobic patch may also mediate interactions with nsp2, which associates with nsp3-8 to induce cleavage of the nsp4↓5 site by nsp4 (13).
The overall conformation of the two β-barrels in all four copies of nsp4 is highly conserved, except for the loops between C1 and D1 and E2 and F2, both of which are flexible and have higher temperature factors (Fig. 2D). The largest variation in conformation is seen in the position of the C-terminal domain relative to the β-barrels. The root mean square deviations when superimposing the entire molecule range from 0.40 to 0.77 Å and are 11-91% higher than when superimposing only the β-barrels. DYNDOM (29) reveals a rotation (residue 157 or 161 as the hinge) of the C-terminal domain relative to the proteinase domain of 11° when comparing copies A and C or C and D. The C-terminal domain can clearly adopt different orientations relative to the two β-barrels, which may facilitate substrate binding or autoproteolysis as discussed below.
FIG. 1. EAV proteolytic processing pathways. A, overview of the proteolytic processing of the EAV replicase ORF1a and ORF1ab polyproteins. The three EAV proteinases (located in nsp1, nsp2, and nsp4), their cleavage sites, and the EAV nsp nomenclature are depicted. Abbreviations: PCP, papain-like cysteine proteinase; SP, serine proteinase; RdRP, RNA-dependent RNA polymerase; Z, zinc finger; Hel, helicase; N, Nidovirus-specific conserved domain. B, overview of the two alternative processing pathways that apply to the C-terminal half of the EAV ORF1a protein. The association of cleaved nsp2 with nsp3-8 (and probably also nsp3-12) was shown to direct the cleavage of the nsp4↓5 site by the nsp4 proteinase (major pathway). Alternatively, in the absence of nsp2, the nsp5↓6 and nsp6↓7 sites are processed, and the nsp4↓5 junction remains uncleaved. The status of the small nsp6 subunit (fully cleaved or partially associated with nsp5 or nsp7) remains to be elucidated.
Comparison to Other Chymotrypsin-related Proteinases-Despite having little sequence identity to other CLPs, nsp4 was proposed to have a CLP fold (7,12). The crystal structure confirms the existence of a CLP fold, which at 149 residues is the smallest serine proteinase domain known. Human rhinovirus 2 (HRV-2) 3C is a viral CLP (30) with a P1 substrate specificity similar to that of nsp4. A least squares superposition with nsp4 gives an r.m.s.d. of 0.96 Å (40 Cα pairs, cut-off of 1.5 Å) (Fig. 3A). Apart from the additional C-terminal domain in nsp4, a major difference between the two structures is the length of the loop separating strands B2 and C2. In HRV-2 3C and in most other CLPs, this β-hairpin is 10-20 residues longer. The shorter β-hairpin found in nsp4 provides enough space for the seventh β-strand of the C-terminal barrel, G2. A short β-hairpin connecting B2 and C2 is also present in SCP and SFCP, both of which also have G2. The presence of this shorter β-hairpin in SCP may facilitate cis cleavage to produce the viral capsid protein (27). The structures of nsp4 and SCP align well, despite remote sequence similarity, with an r.m.s.d. of 0.95 Å (39 Cα pairs, cut-off of 1.5 Å) (Fig. 3B). Like SCP, nsp4 is more compact than most other CLPs, a consequence of its shorter β-hairpins.
FIG. 2. ... Highly conserved residues are highlighted: hydrophobic residues at the interface between the N- and C-terminal β-barrels (mauve); hydrophobic core of the two β-barrels and the C-terminal extension domain (cyan); catalytic triad (red); hydrophobic residues at the interface between the C-terminal β-barrel and the C-terminal extension domain (dark blue); S1 pocket (yellow); Asp-119, a buried and charged residue important for the conformation of the active site and the oxyanion hole (green). LDVC and LDVP, LDV neurovirulent type C and strain Plagemann, respectively; PRRSVLV and PRRSVVR, PRRSV strain Lelystad and strain ATCC VR-2332, respectively. C, surface representation of nsp4. The highly conserved solvent-exposed hydrophobic residues in the C-terminal extension domain are shown in yellow whereas the catalytic triad is shown in red. The figure was prepared using PYMOL (38). D, ribbon diagram of the superposition of the four copies of nsp4 in the asymmetric unit.
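The r.m.s.d. values quoted above are computed over matched Cα pairs that fall within a distance cut-off after superposition. The sketch below shows one plausible reading of that bookkeeping: given coordinates that have already been superposed, it keeps the pairs within the cut-off and reports their r.m.s.d. and count. The superposition step itself is not shown, the function name is hypothetical, and the coordinates in the example are invented purely for illustration; the actual program and protocol used for the comparisons are not described in the text.

```python
import math


def rmsd_within_cutoff(pairs, cutoff=1.5):
    """Return (rmsd, n_pairs) over atom pairs closer than `cutoff` angstroms.

    `pairs` is a list of ((x, y, z), (x, y, z)) tuples for equivalent C-alpha
    atoms from two structures that are assumed to be superposed already.
    """
    distances = [math.dist(a, b) for a, b in pairs]
    kept = [d for d in distances if d <= cutoff]
    if not kept:
        raise ValueError("no atom pairs within the cut-off")
    rmsd = math.sqrt(sum(d * d for d in kept) / len(kept))
    return rmsd, len(kept)


# Invented coordinates: two pairs 1.0 angstrom apart, one pair 3.0 angstroms apart.
example_pairs = [
    ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0)),
    ((5.0, 0.0, 0.0), (5.0, 1.0, 0.0)),
    ((9.0, 0.0, 0.0), (12.0, 0.0, 0.0)),  # exceeds the cut-off, so it is dropped
]
rmsd, n_pairs = rmsd_within_cutoff(example_pairs, cutoff=1.5)
print(f"r.m.s.d. = {rmsd:.2f} A over {n_pairs} pairs")
```

Because pairs beyond the cut-off are excluded, the reported r.m.s.d. should always be read together with the number of pairs retained, as the text does.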
The recently reported structure of the chymotrypsin-like cysteine main proteinase from the coronavirus transmissible gastroenteritis virus of pigs is the only other structure of a CLP with an additional C-terminal domain (32). Coronaviruses comprise the distantly related second family in the order Nidovirales (7). Unlike that of nsp4, the C-terminal domain of the coronavirus proteinase is nearly twice as large, at 110 residues, and is composed only of α-helices. In addition, the linker between the C-terminal domain and the C-terminal β-barrel differs markedly from that found in nsp4. Despite the lack of sequence and structural similarity, the C-terminal domains in both arteriviruses and coronaviruses may share a common functionality in facilitating substrate recognition (32).
Nsp4 Active Site-The crystal structure of nsp4 reveals that His-39, Asp-65, and Ser-120 form a catalytic triad similar to that of other CLPs (Fig. 2A), confirming the predictions from sequence analysis (9) and site-directed mutagenesis (12). The active sites in all four copies of nsp4 adopt a very similar conformation (Fig. 4A), except for the peptide bond between residues 117 and 118. The distances between Ser-120 Oγ and His-39 Nε2 range from 2.8 to 3.5 Å, and the distances between His-39 Nδ1 and the two carboxylate oxygen atoms of Asp-65 range from 2.8 to 3.0 Å. These distances are similar to values seen in other CLPs. The rotamer adopted by Ser-120 is the same as in other CLPs (χ1 ~ −80°).
In CLPs, the oxyanion hole (in nsp4, main chain amides between residues 118 and 120) stabilizes the negative charge on the P1 carbonyl oxygen atom in the tetrahedral intermediate. Remarkably, in three of the four copies, the amide nitrogen of Gly-118 points away from Ser-120, causing the collapse of the oxyanion hole (Fig. 4, A and B). In contrast, the oxyanion hole is properly formed in copy B (Fig. 4C), where the only major difference from the other copies is the conformation of the peptide bond between residues 117 and 118. This represents an interconversion of a type I (copies A, C, and D) to a type II (copy B) β-turn. The psi angle of Ser-117 in copy B is 136°, and for copies A, C, and D it ranges between −41 and −51°. The phi angle of Gly-118 in copy B is 116°, whereas that for copies A, C, and D ranges between −59 and −64°. This is a common type of peptide flip observed in pairs of homologous protein structures, especially with a glycine residue in the i + 1 position of the flipped peptide bond (33). Molecular orbital calculations indicate an energy barrier of ~3 kcal/mol for this flip (34). Nsp4 is the first example of a wild-type serine proteinase in which alternate conformations of the oxyanion hole are observed. Previously, a collapsed oxyanion hole was observed in an active site cysteine to alanine mutant of HAV 3C proteinase (35). The collapsed oxyanion hole in nsp4 may be part of a novel mechanism regulating proteolytic activity. An important structural feature stabilizing the collapsed conformation of the oxyanion hole is a water molecule that forms hydrogen bonds with Ser-120 Oγ and His-39 Nε2 (Fig. 4B). A second water molecule hydrogen bonds with His-134 Nε2 and the main-chain carbonyl oxygen atom of Thr-116. These hydrogen bonds can only form if the oxyanion hole adopts a collapsed conformation, and hence these water molecules are absent from copy B.
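The type I versus type II assignment above follows from the two backbone torsion angles that flank the flipped peptide bond. The sketch below illustrates the idea under stated assumptions: Ser-117 and Gly-118 are taken as the two central residues of the turn, the commonly cited ideal angles are used for the two turn types (psi ≈ −30° and phi ≈ −90° for type I; psi ≈ 120° and phi ≈ 80° for type II), and a turn is assigned to whichever ideal pair is closer. The helper names are hypothetical, and a full turn assignment would also consider the other two central torsion angles.

```python
import math


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees defined by four points, e.g. N-CA-C-N for psi."""
    def sub(u, v):
        return tuple(a - b for a, b in zip(u, v))

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    def cross(u, v):
        return (u[1] * v[2] - u[2] * v[1],
                u[2] * v[0] - u[0] * v[2],
                u[0] * v[1] - u[1] * v[0])

    b0, b1, b2 = sub(p0, p1), sub(p2, p1), sub(p3, p2)
    length = math.sqrt(dot(b1, b1))
    b1 = tuple(a / length for a in b1)
    # Components of the outer bonds perpendicular to the central bond.
    v = sub(b0, tuple(dot(b0, b1) * a for a in b1))
    w = sub(b2, tuple(dot(b2, b1) * a for a in b1))
    return math.degrees(math.atan2(dot(cross(b1, v), w), dot(v, w)))


def angle_difference(a, b):
    """Smallest absolute difference between two angles in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


# Ideal (psi of first central residue, phi of second central residue); assumed values.
IDEAL_TURNS = {"I": (-30.0, -90.0), "II": (120.0, 80.0)}


def classify_turn(psi_first, phi_second):
    """Return 'I' or 'II', whichever ideal angle pair is closer."""
    def score(ideal):
        return (angle_difference(psi_first, ideal[0]) ** 2
                + angle_difference(phi_second, ideal[1]) ** 2)
    return min(IDEAL_TURNS, key=lambda name: score(IDEAL_TURNS[name]))


# Angles reported in the text: (psi of Ser-117, phi of Gly-118).
print("copy B       :", classify_turn(136.0, 116.0))
print("copies A/C/D :", classify_turn(-41.0, -59.0), classify_turn(-51.0, -64.0))
```

Applied to the angles reported in the text, this nearest-ideal rule places copy B with type II and copies A, C, and D with type I, in line with the assignment given above.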
Substrate Binding-Previous studies have shown that nsp4 has a specificity for Glu (and in one case Gln) at the P1 position (7,12,14). As in the picornaviral 3C proteinases and Streptomyces griseus proteinase E (SGPE), which have specificity for either Gln or Glu in the P1 position, nsp4 also has a conserved histidine residue (His-134 in nsp4, His-161 in HRV-2 3C, His-213 in SGPE) at the base of the S1 specificity pocket, and a conserved Ser/Thr residue lining one "wall" of the S1 pocket (Thr-115 in nsp4, Thr-142 in HRV-2 3C, Ser-192 in SGPE) (Fig. 5A). The histidine in the S1 pocket hydrogen bonds to the P1 side-chain carbonyl oxygen atom of an inhibitor bound to HRV-2 3C (30) and to one of the glutamate carboxylate oxygens of the P1 side chain of a peptide bound to SGPE (31). Thr-142 in HRV-2 3C and Ser-192 in SGPE also hydrogen bond to the carbonyl or carboxylate oxygen atoms of the P1 side chain. Because the structures of the active site and S1 pocket are conserved, His-134 and Thr-115 in nsp4 could also donate hydrogen bonds to the carboxylate oxygen atom of the P1 side chain in substrates. Site-directed mutagenesis studies confirm that His-134 is essential for the efficient processing of the EAV polyprotein cleavage sites (12). Thr-115 can be replaced by serine, glycine, and asparagine with variable effects on cleavage specificity and efficiency (12).
Both nsp4 and SGPE prefer glutamic acid over glutamine in the P1 position of substrates. In SGPE, Ser-216 hydrogen bonds with the carboxylate oxygen atom (not interacting with His-213 or Ser-192) of the P1 side chain. Ser-216 cannot hydrogen bond with the amide nitrogen of a P1 glutamine side chain, because the lone pair already accepts a hydrogen bond from the main-chain amide of Gly-219; this proposal is supported by site-directed mutagenesis studies (36). In nsp4, Ser-137 occupies the same position as Ser-216 in SGPE and should also recognize a negatively charged P1 residue (Fig. 5B). In picornaviral 3C proteinases, which prefer glutamine at the P1 position, a glycine is found in lieu of serine.
Nsp4 Self-processing: a cis or trans Event?-Nsp4 is known to cleave its N and C termini in the nsp3-8 intermediate (7,12-14). Although nsp4 can cleave several sites in trans (14), self-processing may also be a cis event. To act as both an enzyme and a substrate in cis reactions and to act as an enzyme in trans reactions, nsp4 must adopt alternate conformations. The N or C termini must be either in or away from the substrate-binding pocket for cis and trans cleavages, respectively. In the nsp4 crystal structure, the chain termini are remarkably close together and do not lie in the substrate-binding pocket. The positions adopted by the chain termini likely differ from those adopted during cis cleavage to prevent self-inhibition of proteolytic activity, as in picornaviral proteinases (35,37). Each terminus can be brought into the appropriate position for cis cleavage by a conformational rearrangement of the ten N-terminal residues or part of the C-terminal domain. In support of this possibility, the N-terminal six residues are disordered and residues 198-203 have higher than average temperature factors (in copies A and D) or are completely disordered (in copies B and C).
One of the few hydrogen-bonding interactions fixing the position of the C terminus is the Asp-119:Arg-203 ion pair. Although Asp-119 is conserved in arteriviral proteinases, Arg-203 is not. Interestingly, Arg-4, which is disordered in all four copies of nsp4, is conserved among arteriviral proteinases. Even though the N-terminal six residues are disordered, the position adopted by Lys-7 indicates that Arg-4 may be able to interact with Asp-119 in an alternate conformation. When the C terminus adopts the conformation required for cis cleavage, the Asp-119:Arg-203 ion pair must be broken, thereby allowing Arg-4 to interact with Asp-119.

CONCLUSIONS
Our crystallographic analysis of EAV nsp4 elucidates the structural basis of proteolytic processing in arteriviruses. Nsp4 adopts alternate conformations of the oxyanion hole, which may be a novel means of regulating proteolytic activity. The structure reveals how the C-terminal extension interacts with the chymotrypsin-like domains and suggests how it may mediate the formation of multiprotein complexes to control proteolytic processing pathways critical to the viral life cycle. Understanding the structural details of proteolytic processing in arteriviruses and other nidoviruses will allow for the design of novel antiviral agents.