Unusual zwitterionic catalytic site of SARS–CoV-2 main protease revealed by neutron crystallography

The main protease (3CL Mpro) from SARS–CoV-2, the etiological agent of COVID-19, is an essential enzyme for viral replication. 3CL Mpro possesses an unusual catalytic dyad composed of Cys145 and His41 residues. A critical question in the field has been what the protonation states of the ionizable residues in the substrate-binding active-site cavity are; resolving this point would help understand the catalytic details of the enzyme and inform rational drug development against this pernicious virus. Here, we present the room-temperature neutron structure of 3CL Mpro, which allowed direct determination of hydrogen atom positions and, hence, protonation states in the protease. We observe that the catalytic site natively adopts a zwitterionic reactive form in which Cys145 is in the negatively charged thiolate state and His41 is doubly protonated and positively charged, instead of the neutral unreactive state usually envisaged. The neutron structure also identified the protonation states, and thus electrical charges, of all other amino acid residues and revealed intricate hydrogen-bonding networks in the active-site cavity and at the dimer interface. The fine atomic details present in this structure were made possible by the unique scattering properties of the neutron, which is an ideal probe for locating hydrogen positions and experimentally determining protonation states at near-physiological temperature. Our observations provide critical information for structure-assisted and computational drug design, allowing precise tailoring of inhibitors to the enzyme's electrostatic environment.

COVID-19, a deadly disease caused by the novel coronavirus SARS-CoV-2 (severe acute respiratory syndrome-coronavirus 2), is a pandemic of extraordinary proportions, disrupting social life, travel, and the global economy. The development of vaccines and therapeutic intervention measures promises to mitigate the spread of the virus and to alleviate the burdens COVID-19 has caused in many communities in recent months (1–6). SARS-CoV-2 is a single-stranded positive-sense RNA virus with a genome of ~30,000 nucleotides. The viral replicase gene encodes two overlapping polyproteins, pp1a and pp1ab, consisting of individual viral proteins essential for replication (7, 8). Each polyprotein must be processed into individual functional proteins, a vital step in the virus life cycle. This is accomplished by a chymotrypsin-like protease, 3CL Mpro or main protease, a hydrolase enzyme that cleaves peptide bonds. The proper functioning of 3CL Mpro is indispensable for SARS-CoV-2 replication, whereas its inhibition leads to the inability to produce mature infectious virions. Thus, the enzyme is considered a promising target for the design and development of SARS-CoV-2-specific protease inhibitors and for repurposing existing clinical drugs (9–15).
SARS-CoV-2 3CL Mpro is a cysteine protease (16, 17) and is catalytically active as a homodimer (Fig. 1). Its amino acid sequence is 96% homologous to the earlier SARS-CoV 3CL Mpro, and the catalytic efficiencies of the two enzymes are similar (10, 11, 18–20). The ~34-kDa enzyme has three distinct domains: catalytic domains I (residues 8–101) and II (residues 102–184) and the α-helical domain III (residues 201–303), which is required for protein dimerization (10, 11). Importantly, the monomeric enzyme shows no catalytic activity, as was demonstrated for SARS-CoV 3CL Mpro (21–24). The catalytic site of 3CL Mpro employs a noncanonical Cys145-His41 dyad thought to be assisted by a water molecule hydrogen-bonded to the catalytic histidine (18, 25). The cysteine thiol group functions as the nucleophile during the first step of the hydrolysis reaction by attacking the carbon atom of the scissile peptide bond. Substrate hydrolysis requires the catalytic dyad to be in the zwitterionic state with deprotonated Cys145 and protonated His41, which either can be generated through a proton transfer from the Cys145 thiol to the His41 imidazole by a general acid-base mechanism (19) or may already be present before substrate binding (20, 26). However, the protonation states of the 3CL Mpro catalytic site have not been experimentally determined. The enzyme recognizes a general amino acid sequence of Leu-Gln↓Ser-Ala-Gly, where ↓ marks the cleavage site, but displays some substrate sequence promiscuity. The active-site cavity is located on the surface of the protease and can bind substrate residues in positions P1′ through P5 in the substrate-binding subsites S1′–S5, respectively (Fig. 2). Subsites S1, S2, and S4 are shaped into well-formed binding pockets, whereas S1′, S3, and S5 are located on the protein surface with no defined shape. The peptide bond cleavage occurs between the substrate residues at the C-terminal position P1′ and N-terminal position P1.
Current structure-assisted drug design efforts are mainly directed toward reversible and irreversible covalent inhibitors that mimic the protease substrate binding to subsites S1′–S5 in the active-site cavity (9–12, 14), whereas the dimer interface can also be explored for the design of dimerization inhibitors (27, 28). Knowledge of the SARS-CoV-2 3CL Mpro active-site cavity structure at an atomic level of detail, including the actual locations of hydrogen atoms, can provide critical information to improve rational drug design. The presence or absence of hydrogen atoms at specific sites on amino acid residues determines their protonation states and, thus, their electrical charges, defining the electrostatics and hydrogen-bonding interactions. Of note, half of all atoms in protein and small-molecule drugs are hydrogen. X-ray crystallography is typically the standard experimental method for structure-assisted drug design but cannot reliably locate hydrogen atoms in biological macromolecules, leaving significant gaps in our understanding of biological function and drug binding (29). Electron clouds scatter X-rays; thus, scattering power is determined by the number of electrons present in an atom, i.e. by its atomic number. Hydrogen, with just a single electron that often participates in highly polarized chemical bonds, is the weakest possible X-ray scatterer and consequently is invisible in X-ray structures with a few exceptions beyond subatomic resolution (30–32). In contrast, atomic nuclei scatter neutrons, and the scattering power of a nucleus for neutrons is independent of the atomic number. Deuterium, a heavy isotope of hydrogen, scatters neutrons just as well as carbon, nitrogen, and oxygen. Neutron crystallography is capable of accurately determining positions of hydrogen and deuterium atoms and visualizing hydrogen-bonding interactions at moderate resolutions (33–36), where X-rays cannot locate functional hydrogen atoms (30).
Moreover, unlike X-rays (37), neutrons cause no direct or indirect radiation damage to protein crystals, permitting diffraction data collection at near-physiological (room) temperature and avoiding possible artifacts induced by the use of cryoprotectant chemicals required for X-ray cryo-crystallographic measurements.
Figure 1. Structure of SARS-CoV-2 3CL Mpro. A, the catalytically active dimer is shown in cartoon representation, with the catalytic dyad residues Cys145 and His41 in ball-and-stick representation. B, the enzyme protomer shown in stick representation demonstrates that X-ray diffraction only provides the positions of heavy atoms, i.e. carbon, nitrogen, oxygen, and sulfur (2380 atoms in one protomer). Water molecules are shown as red spheres. C, the same protomer shown with deuterium atoms colored yellow and hydrogen atoms colored gray illustrates that neutron diffraction provides positions of all atoms in the enzyme (4681 atoms in one protomer). Notice how much busier the protein neutron structure is compared with its X-ray structure because there are twice as many atoms visible when using neutron diffraction data. Only oxygen atoms (red spheres) are visible for water molecules in the X-ray structure, whereas all three atoms are fully visible in the neutron structure.

In neutron crystallographic experiments, protein crystals are usually hydrogen/deuterium-exchanged with heavy water (D2O) to increase the signal-to-noise ratio of the diffraction pattern because hydrogen has a large incoherent scattering cross-section that increases background. Also, the coherent neutron-scattering length of hydrogen is negative (−3.739 fm), so hydrogen is observed in the neutron-scattering length (or nuclear) density maps as troughs. At moderate resolutions, the negative neutron-scattering length of hydrogen leads to the density cancelation phenomenon observed for CH, CH2, and CH3 groups because hydrogen atoms attached to carbon atoms cannot exchange with deuterium. Conversely, deuterium has a coherent neutron-scattering length of +6.671 fm and, thus, is observed as peaks in nuclear density maps. Because deuterium atoms scatter neutrons just as well as other protein atoms, they can be directly detected in neutron structures at moderate resolutions as low as 2.5–2.6 Å (38, 39). Notably, sulfur has a coherent neutron-scattering length of +2.847 fm, less than half the magnitude of that for carbon, oxygen, nitrogen, and deuterium. Consequently, deprotonated thiol groups (S−) in Cys and side-chain sulfur atoms in Met residues are often not easily visible in nuclear density maps.
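The density cancelation argument can be illustrated with simple arithmetic. The sketch below sums coherent scattering lengths over the atoms of a group as a crude proxy for what a moderate-resolution nuclear density map sees when the atoms are not resolved from one another. The values for H, D, and S are those quoted in the text; the value for carbon (6.646 fm) is an assumption taken from standard tabulations, and the function name is hypothetical.

```python
# Coherent neutron-scattering lengths in femtometers.
# H, D, and S are quoted in the text; C is an assumed tabulated value.
B_COH_FM = {"H": -3.739, "D": 6.671, "S": 2.847, "C": 6.646}


def group_scattering_length(composition):
    """Sum coherent scattering lengths for a group such as {"C": 1, "H": 2}.

    This is only a rough proxy for unresolved density; real maps also
    depend on resolution, atomic displacement, and phase quality.
    """
    return sum(B_COH_FM[element] * count for element, count in composition.items())


groups = {
    "CH": {"C": 1, "H": 1},
    "CH2": {"C": 1, "H": 2},
    "CH3": {"C": 1, "H": 3},
    "CD2": {"C": 1, "D": 2},  # hypothetical fully deuterated methylene
}

for name, composition in groups.items():
    total = group_scattering_length(composition)
    print(f"{name}: net scattering length = {total:.3f} fm")
```

Under these assumptions the net value for CH2 is close to zero, which is consistent with the weak or absent density described for nonexchangeable CH groups, whereas a deuterated group would scatter strongly.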
We grew neutron-quality crystals of the ligand-free SARS-CoV-2 3CL Mpro at pH 6.6, allowing us to obtain a room-temperature neutron structure of the enzyme refined jointly with a room-temperature X-ray data set collected from the same crystal (Fig. 1) (40). We accurately determined the locations of exchangeable hydrogen atoms that were observed as deuterium attached to electronegative atoms such as oxygen, nitrogen, or sulfur atoms. We discovered that the catalytic dyad comprising residues Cys145 and His41 is in the reactive zwitterionic state having deprotonated negatively charged Cys145 and doubly protonated positively charged His41. Our experimental observations identified the protonation states of all other ionizable amino acid residues, allowing us to accurately map the electric charges and hydrogen-bonding networks in the SARS-CoV-2 3CL Mpro active-site cavity and throughout the enzyme structure. Neutron diffraction data were collected from a hydrogen/deuterium-exchanged SARS-CoV-2 3CL Mpro crystal at pD 7.0 (pD = pH + 0.4) to 2.5 Å resolution (Fig. 3).

Protonation states of the catalytic site and nearby residues
The electron density for the catalytic site and the nearby residues of SARS-CoV-2 3CL Mpro is shown in Fig. 3A. Although hydrogen-bonding interactions can be inferred from the distances between the heavy atoms, the locations of hydrogen atoms and the protonation states of the amino acid residues can only be assumed. In contrast, the nuclear density map shown in Fig. 3B displays the actual positions of exchanged deuterium atoms, accurately visualizing hydrogen-bond donors and acceptors. In the neutron structure, we observed that the catalytic Cys145 thiol is in the deprotonated, negatively charged thiolate state.
In contrast, located 3.9 Å away from Cys145, the catalytic residue His41 is protonated on both Nδ1 and Nε2 nitrogen atoms of the imidazole side chain and is therefore positively charged (Fig. S1). As a result, the catalytic site natively adopts the zwitterionic reactive state required for catalysis (19, 20, 41). His41 is strongly hydrogen-bonded to a water molecule (D2Ocat) that presumably plays the role of the third catalytic residue (18, 25) from a canonical catalytic triad, stabilizing the charge and position of the His41 imidazolium ring. The Nδ1–D···O(D2O) distance is 1.7 Å. The position of the D2Ocat molecule is stabilized by several more, but possibly weaker, hydrogen bonds with the His41 main chain, His164, and Asp187. His164 is doubly protonated and positively charged; it donates a deuterium in a hydrogen bond with the Thr175 hydroxyl within the interior of the protein. As expected, Asp187 is not protonated, is negatively charged, and participates in a strong salt bridge with Arg40. His163, positioned near the catalytic Cys145, is singly protonated and uncharged, making a hydrogen bond with the protonated phenolic side chain of Tyr161, whose OD group is rotated away from the His163 imidazole (Fig. S1). The main-chain deuterium atoms of Gly143, Ser144, and Cys145 comprising the oxyanion hole are also readily visible in the nuclear density (Fig. 3B).

Substrate-binding subsite S1
The P1 group of a substrate, usually Gln, binds in a rather peculiar substrate-binding subsite S1 (Fig. 4A). From one side, it is flanked by residues 140–144, making a turn that creates the oxyanion hole and, on the opposite side, by Met165, Glu166, and His172. The back wall of subsite S1 is created by the side chains of Phe140 and His163. Interestingly, the N terminus of Ser1′ of the second protomer within the active 3CL Mpro homodimer reaches in to cap the subsite S1 from the top. In our neutron structure, the N-terminal amine is the protonated, positively charged –ND3+ ammonium cation. It forms three hydrogen bonds, one each with the main-chain carbonyl of Phe140, the side-chain carboxylate of Glu166, and a D2O molecule. Both histidine residues in this subsite, His163 and His172, are singly protonated and neutral (Fig. S1). The deprotonated Nδ1 of His172 is hydrogen-bonded with the main-chain amide nitrogen of Gly138 with a nitrogen-deuterium distance of 2.2 Å, whereas the protonated Nε2 makes a long, possibly weak, hydrogen-bond interaction with the Glu166 carboxylate with a deuterium-oxygen distance of 2.5 Å.

Substrate-binding subsite S2
Subsite S2 is more hydrophobic than subsite S1, because it binds the hydrophobic residues Leu or Phe in substrate position P2 (Fig. 4B). Subsite S2 is flanked by the π-systems of His41 and the main chains connecting Asp187, Arg188, and Gln189. It is capped by Met165, whereas Met49, situated on the short P2 helix spanning residues Ser46–Leu50, adopts a conformation impeding the entrance to this site. Met49 appears to be conformationally flexible, vacating its position in the ligand-free enzyme to allow various P2 groups to occupy this subsite when inhibitors bind (9–11). Tyr54 serves as the base of subsite S2, and its phenolic hydroxyl group donates its deuterium in a hydrogen bond with the main-chain carbonyl of Asp187.

Substrate-binding subsites S3-S5
Among the substrate-binding subsites S3, S4, and S5, only subsite S4 has a well-defined binding pocket architecture. Subsites S3 and S5 are on the protein surface, fully exposed to the bulk solvent, and have ill-defined borders. Subsite S3 is located between residues Glu166 and Gln189 that are >9 Å apart. Subsite S5 is between Pro168 of the β-hairpin flap, spanning residues Met165–His172, and the P5 loop consisting of residues Thr190–Ala194 (Fig. 2). Subsite S4 is formed between a long loop spanning residues Phe185–Ala194 that acts as its base and the β-hairpin flap on the top. The loop turns 180° at Gln189, with its secondary structure being stabilized by hydrogen bonds between the Gln192 side-chain amide and the main-chain atoms of Val186. The side chains of Leu167 and Phe185 form the back wall of this site in the protein interior, creating a deep, mainly hydrophobic pocket (Fig. 4C).

Dimer interface
The two protomers in the SARS-CoV-2 3CL Mpro homodimer interact through an extensive dimer interface. Protomer 1 (unprimed residue numbers) forms elaborate networks of hydrogen-bonding interactions with N-terminal residues 1′–16′, a β-strand with residues 118′–125′, and a loop containing residues 137′–142′ of protomer 2 (primed residue numbers; Fig. 5). There are also many hydrophobic interactions within the dimer interface. The N termini of the two protomers meet at the start of a short α-helix spanning residues Gly11–Cys16 and Gly11′–Cys16′ to form several hydrogen bonds involving the main-chain and side-chain atoms of Ser10, Gly11, Ser10′, Gly11′, and Glu14′ (Fig. 5). The N-terminal loop then extends across the face of the opposite protomer to subsite S1. Here, Ala7′, Phe8′, Arg4′, and Ser1′ are observed making hydrogen bonds with Val125, Glu290, Lys137, Phe140, and Glu166. Ser123′ and Asn119′ of the protomer 2 β-strand containing residues 118′–125′ have direct hydrogen bonds and water-mediated interactions with the C-terminal residues of protomer 1, in addition to the reciprocal hydrogen bonds between Val125′ and Ala7 formed because of the 2-fold symmetry of the enzyme dimer (Fig. 5). Similar reciprocity of hydrogen bonding is found for the loop consisting of residues 137′–144′ in protomer 2 that interacts with the N-terminal residues Ser1 and Arg4 of protomer 1. Also, the side-chain hydroxyl of Ser139′ makes a hydrogen bond with the side-chain amide of Gln299 (Fig. 5).

Protonation states of Cys residues
The SARS-CoV-2 3CL Mpro structure contains 12 Cys residues spread throughout the sequence. The reasons for such a large number of cysteines within the enzyme sequence and their functional roles are unknown, except for the catalytic Cys145. By examining the nuclear density, we established the protonation states of all Cys residues in the structure (Figs. S2 and S3). The side chains of residues Cys16, Cys85, Cys117, Cys156, Cys160, Cys265, and Cys300 are protonated thiol groups. Conversely, the remaining Cys residues, Cys22, Cys38, Cys44, Cys128, and Cys145, contain deprotonated thiolates. Sulfur is a soft chemical base, with a large van der Waals radius of 1.84 Å; thus, a deprotonated thiol group is stable in solution at the physiological pH (pKa of free Cys is 8.35). Hence, it is not unexpected to observe deprotonated Cys residues in a protein structure at close-to-physiological pH. In addition, Cys22 and Cys128 are located close to the protein surface, with their negative charges further stabilized by the positive charges of nearby Lys61 and Arg4′ from the second protomer, respectively (Fig. S4). Cys38 and Cys44 are positioned within hydrophobic pockets, with their negative charges shielded from the bulk solvent. Deprotonation of Cys145 is also consistent with it being reactive and oxidation prone, as reported previously (41). Accordingly, because we can unambiguously differentiate between seven protonated and five deprotonated Cys side chains, this provides strong evidence that the catalytic Cys145 is in the reactive negatively charged thiolate form.
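For reference, the Cys assignments listed above can be tabulated, and the quoted pKa of free Cys can be put on a quantitative footing with the Henderson-Hasselbalch relation. The sketch below is illustrative only: the function name is hypothetical, and the calculated fraction applies to free cysteine in solution, not to any residue in the enzyme, where nearby charges and burial (as described above) can shift the effective pKa. Residue-specific pKa values are not given in the text and are not estimated here.

```python
# Cys protonation states as listed in the text (residue numbers).
CYS_STATES = {
    "thiol": [16, 85, 117, 156, 160, 265, 300],
    "thiolate": [22, 38, 44, 128, 145],
}


def thiolate_fraction(pH, pKa=8.35):
    """Fraction of a thiol that is deprotonated at a given pH.

    Henderson-Hasselbalch for a single ionizable group; the default pKa
    is the free-Cys value quoted in the text.
    """
    return 1.0 / (1.0 + 10.0 ** (pKa - pH))


total_cys = sum(len(residues) for residues in CYS_STATES.values())
print(f"Cys residues tabulated: {total_cys}")

# pH 6.6 is the crystallization pH; 7.4 is a typical physiological value.
for pH in (6.6, 7.4, 8.35):
    print(f"pH {pH}: free-Cys thiolate fraction = {thiolate_fraction(pH):.3f}")
```

The comparison between the free-Cys estimate and the five thiolates observed in the structure is one way to see why the local environment of each residue, rather than the bulk pKa alone, has to be considered.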

Discussion
The COVID-19 pandemic has claimed over a million lives globally to date because of the lack of therapeutic intervention options and vaccines that specifically target the SARS-CoV-2 virus. Rational or structure-assisted and computational drug design efforts could greatly benefit from a detailed structural knowledge of the drug targets, such as the SARS-CoV-2 3CL Mpro enzyme. Considering that half of the protein atoms are hydrogens that play critical roles in various chemistries occurring in enzyme active sites and in inhibitor-binding events, it is of paramount importance to locate hydrogen atoms in a protein to understand enzyme function and to guide drug design. Neutron crystallography is the only technique capable of accurately determining the positions of hydrogen (and deuterium) atoms in biological macromolecules without causing radiation damage to the samples at near-physiological temperature (33). We have succeeded in determining a room-temperature neutron structure of the homodimeric SARS-CoV-2 3CL Mpro enzyme, mapping the protonation states of amino acid residues in the active-site cavity and throughout the protein structure. Thus, for the first time, we determined protonation states of all ionizable residues in a cysteine protease and any coronavirus protein. The neutron structure also reveals the elaborate hydrogen-bonding networks formed in the catalytic site, including those in the six substrate-binding subsites, and throughout the dimer interface. Protonation states are determined by the locations of hydrogen atoms, observed in the neutron structure as deuterium after the hydrogen/deuterium exchange was performed. Also, protonation states establish the electrical charges and therefore determine the electrostatic environment in the protein.
The implications of knowing the actual protonation states in and around the active site are far-reaching because this information is essential for the structure-assisted and computational drug design. Structure-assisted drug design requires precise tailoring of inhibitor chemical groups to the electrostatic environment of an enzyme's binding site and has to take advantage of hydrogen-bonding opportunities because direct hydrogen bonds between a drug molecule and its protein target are the strongest noncovalent interactions that can dramatically improve the inhibitor-binding affinity. Moreover, if protonation states are not experimentally known and are assumed based on general chemical knowledge, computational methods will produce misleading results.
In the SARS-CoV-2 3CL Mpro neutron structure, we observed the catalytic dyad in the reactive zwitterionic state in the crystal at pD 7.0, having a deprotonated negatively charged Cys145 and doubly protonated positively charged His41 (Fig. 3), instead of the catalytically resting (nonreactive) state (26) in which both catalytic residues are neutral (i.e. protonated Cys145 and singly protonated His41). The negative charge on the Cys145 thiolate is stabilized by the sulfur atom being a soft chemical base with a large van der Waals radius and by the positive charge on the His41 side chain and the oxyanion hole main-chain amides positioned in close proximity to the Cys145 side chain. His41 is hydrogen-bonded to the catalytic water molecule held in place by interactions with the doubly protonated His164 and negatively charged Asp187. The substrate-binding subsite S1 contains the protonated positively charged N-terminal amine from protomer 2 and neutral His163. The nonprotonated Nε2 of His163 is exposed and can be utilized as a hydrogen-bond acceptor by an inhibitor P1 substituent (Fig. 4). In the hydrophobic subsite S2, the Tyr54 hydroxyl can act as a hydrogen-bond acceptor for a P2 substituent with a hydrogen-bond donor functionality. Consequently, specific SARS-CoV-2 3CL Mpro protease inhibitor design needs to consider the observed charge distribution in the enzyme active-site cavity, taking special note that the catalytic site contains a zwitterionic catalytic dyad but appears to be neutral because the opposite charges on Cys145 and His41, located close to each other, cancel out.
We also accurately mapped the hydrogen-bonding networks within the dimer interface of SARS-CoV-2 3CL Mpro. Significantly, the N-terminal residues between Ser1 and Cys16 (and Ser1′ and Cys16′) account for the most hydrogen bonds between the two protomers (Fig. 5). There are 13 direct hydrogen bonds formed by each N terminus interacting with the opposite protomer. Ser1 is an essential residue for building the correct architecture of substrate-binding subsite S1. Truncation of the first four N-terminal residues in SARS-CoV 3CL Mpro has previously been shown to disrupt dimerization and dramatically diminish enzyme activity (22). We therefore suggest that the enzyme area that interacts with residues 1–16 from the other protomer may be the most promising part of the dimer interface for designing SARS-CoV-2 3CL Mpro-specific dimerization inhibitors, as was also proposed for the SARS-CoV enzyme (42, 43).

Conclusion
We have successfully determined the neutron structure of the ligand-free SARS-CoV-2 3CL Mpro at near-physiological (room) temperature and pH and mapped the locations of hydrogen atoms (observed as deuterium) in the enzyme. Thus, protonation states and electrical charges of the ionizable residues have been accurately resolved. Establishing the electrostatics in the enzyme active-site cavity and at the dimer interface provides critical information to guide the structure-assisted and computational drug design of protease inhibitors specifically targeting the enzyme from SARS-CoV-2.

Materials and methods

General information
Protein purification columns were purchased from Cytiva (Piscataway, NJ, USA). Crystallization reagents were purchased from Hampton Research (Aliso Viejo, CA, USA). Crystallographic supplies were purchased from MiTeGen (Ithaca, NY, USA) and Vitrocom (Mountain Lakes, NJ, USA).

Cloning, expression, and purification of SARS-CoV-2 3CL Mpro
The 3CL Mpro (Nsp5 Mpro) from SARS-CoV-2 was cloned into pD451-SR plasmid harboring kanamycin resistance (ATUM, Newark, CA, USA), expressed, and purified according to the procedure published elsewhere (25). To make the authentic N terminus of 3CL Mpro, the enzyme sequence is flanked by the maltose-binding protein followed by the protease autoprocessing site SAVLQ↓SGFRK (down arrow indicates the autocleavage site), which corresponds to the cleavage position between NSP4 and NSP5 in the viral polyprotein. To form the authentic C terminus, the enzyme construct codes for the human rhinovirus 3C protease cleavage site (SGVTFQ↓GP) connected to a His6 tag. The N-terminal flanking sequence is autocleaved during the expression in Escherichia coli (BL21 DE3), whereas the C-terminal flanking sequence is removed by the treatment with human rhinovirus 3C protease (Millipore-Sigma). A detailed protocol for the enzyme expression, purification, and crystallization has been published elsewhere (44).

Crystallization
For crystallization, the authentic 3CL Mpro was concentrated to ~4 mg/ml in 20 mM Tris, 150 mM NaCl, 1 mM tris(2-carboxyethyl)phosphine, pH 8.0. The methodology for growing large crystals of 3CL Mpro suitable for neutron diffraction is described as follows. Initial protein crystallization conditions were discovered by screening conducted at the Hauptman-Woodward Medical Research Institute (45). Crystal aggregates were reproduced by the sitting-drop vapor-diffusion method using 25% PEG3350, 0.1 M Bis-Tris, pH 6.5, in 20-μl drops with a 1:1 ratio of the protein:well solution. Microseeding using Hampton Research seed beads was performed to grow neutron-quality crystals in Hampton 9-well plates and sandwich box set-ups with 200-μl drops of protein mixed with 18% PEG3350, 0.1 M Bis-Tris, pH 6.0, 3% DMSO at a 1:1 ratio seeded with 1 μl of microseeds at a 1:500 dilution. This condition produced a final pH in the crystallization drop of 6.6 as measured by microelectrode. The crystal tray used to harvest a crystal for neutron diffraction was incubated initially at 18°C, and the temperature was then gradually lowered to 10°C over several weeks. The crystal grew to final dimensions of ~2 × 0.8 × 0.2 mm (~0.3 mm³) in a triangular plate-like morphology. The crystal was mounted in a quartz capillary together with 18% PEG3350 prepared with 100% D2O and allowed to hydrogen/deuterium exchange for several days before starting the neutron data collection.

Neutron diffraction data collection
The large crystal was screened for diffraction quality using a broad-bandpass Laue configuration using neutrons from 2.8 to 10 Å at the IMAGINE instrument at the High Flux Isotope Reactor at Oak Ridge National Laboratory (46–48). Neutron diffraction data were collected using the macromolecular neutron diffractometer instrument MaNDi at the Spallation Neutron Source (47, 49–51). The crystal was held still at room temperature, and diffraction data were collected for 24 h using all neutrons between 2 and 4.16 Å. Following this, the crystal was rotated by Δφ = 10°, and a subsequent data frame was collected again for 24 h. A total of 23 data frames were collected in the final neutron data set. Diffraction data were reduced using the Mantid package, with integration carried out using three-dimensional TOF profile fitting (52). Wavelength normalization of the Laue data was performed using the Lauenorm program from the Lauegen suite (53, 54). The neutron data collection statistics are shown in Table S1.
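The scale of this data collection can be checked with a short calculation. The sketch below assumes that every one of the 23 frames received the stated 24-h exposure and that one 10° rotation separated each pair of consecutive frames; the starting angle of 0° is arbitrary, and the variable names are hypothetical. Setup, alignment, and any beam downtime are not included.

```python
from datetime import timedelta

# Values stated in the text.
N_FRAMES = 23
EXPOSURE_PER_FRAME = timedelta(hours=24)
ROTATION_STEP_DEG = 10.0

# Total beam time, assuming every frame used the full 24-h exposure.
total_exposure = N_FRAMES * EXPOSURE_PER_FRAME

# Crystal orientations, assuming one rotation between consecutive frames
# and an arbitrary starting angle of 0 degrees.
orientations_deg = [i * ROTATION_STEP_DEG for i in range(N_FRAMES)]
angular_span_deg = orientations_deg[-1] - orientations_deg[0]

print(f"Total exposure: {total_exposure.days} days")
print(f"Angular span between first and last frame: {angular_span_deg:.0f} degrees")
```

Under these assumptions the exposure alone amounts to more than three weeks of beam time for a single crystal, which is feasible only because neutrons cause no radiation damage at room temperature.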

X-ray diffraction data collection
The room-temperature X-ray diffraction data set was collected from the same crystal used for the neutron data collection using a Rigaku HighFlux HomeLab instrument equipped with a MicroMax-007 HF X-ray generator and Osmic VariMax optics. The diffraction images were obtained using an Eiger R 4M hybrid photon counting detector. Diffraction data were integrated using the CrysAlis Pro software suite (Rigaku Inc., The Woodlands, TX). Diffraction data were then reduced and scaled using the Aimless (54) program from the CCP4 suite (55); molecular replacement using PDB code 6WQF (25) was then performed with Molrep (54) from the CCP4 program suite. The protein structure was first refined against the X-ray data using Phenix.refine from the Phenix (56) suite of programs and the Coot (57) molecular graphics program to obtain an accurate model for the subsequent X-ray/neutron joint refinement. The geometry of the final model was then carefully checked with MolProbity (58). The X-ray data collection statistics are shown in Table S1.

Joint X-ray/neutron refinement
The joint X-ray/neutron refinement of ligand-free 3CL Mpro was performed using nCNS (40, 59), and the structure was manipulated in Coot (57). After initial rigid-body refinement, several cycles of positional, atomic displacement parameter, and occupancy refinement were performed. The structure was checked for the correctness of side-chain conformations, hydrogen bonding, and orientations of D2O water molecules built based on the mFo − DFc difference neutron-scattering length density maps. The 2mFo − DFc and mFo − DFc neutron-scattering length density maps were then examined to determine the correct orientations of hydroxyl (Ser, Thr, and Tyr), thiol (Cys), and ammonium (Lys) groups and protonation states of the enzyme residues. The protonation states of some disordered side chains could not be obtained directly and remained ambiguous. All water molecules were refined as D2O. Initially, water oxygen atoms were positioned according to their electron density peaks and then were shifted slightly in accordance with the neutron-scattering length density maps. All labile hydrogen positions in 3CL Mpro were modeled as deuterium atoms, and then the occupancies of those atoms were refined individually within the range of −0.56 (pure hydrogen) to 1.00 (pure deuterium). Before depositing the neutron structure to the PDB, a script was run that converts a record for the coordinates of a deuterium atom into two records corresponding to a hydrogen and a deuterium partially occupying the same site, both with positive partial occupancies that add up to unity. The percentage of deuterium at a specific site is calculated according to the following formula: % deuterium = {Occupancy(deuterium) + 0.56}/1.56.
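The occupancy conversion described above can be sketched in a few lines. This is not the script used for deposition, which is not reproduced in the text; it is a minimal illustration of the stated formula, with hypothetical function names. The lower bound of −0.56 corresponds to the ratio of the hydrogen and deuterium coherent scattering lengths quoted earlier (−3.739 fm and +6.671 fm).

```python
# Coherent scattering lengths (fm) as quoted in the text.
B_H = -3.739
B_D = 6.671

# A site fully occupied by H scatters like a D site with this occupancy.
H_AS_D_OCCUPANCY = round(B_H / B_D, 2)  # -0.56, the lower refinement bound


def deuterium_fraction(refined_occupancy):
    """Fraction of D at a site refined as deuterium, per the stated formula."""
    if not -0.56 <= refined_occupancy <= 1.0:
        raise ValueError("refined occupancy must lie between -0.56 and 1.00")
    return (refined_occupancy + 0.56) / 1.56


def split_site(refined_occupancy):
    """Return positive partial occupancies for H and D that sum to unity."""
    d_fraction = deuterium_fraction(refined_occupancy)
    return {"H": 1.0 - d_fraction, "D": d_fraction}


# Example refined occupancies (illustrative values, not from the structure).
for occupancy in (-0.56, 0.22, 1.0):
    parts = split_site(occupancy)
    print(f"refined {occupancy:+.2f} -> H {parts['H']:.2f}, D {parts['D']:.2f}")
```

A refined occupancy of 0.22, for example, corresponds to a site that is half hydrogen and half deuterium, because the positive and negative contributions are weighted by their scattering lengths.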