The dynamic nature of the conserved tegument protein UL37 of herpesviruses

In all herpesviruses, the space between the capsid shell and the lipid envelope is occupied by the unique tegument layer, composed of proteins that, in addition to their structural roles, perform many other functions in viral replication. UL37 is a highly conserved tegument protein that binds multiple partners and has activities ranging from virion morphogenesis to directional capsid trafficking to manipulation of the host innate immune response. The N-terminal half of UL37 (UL37N) has a compact bean-shaped α-helical structure that contains a surface region essential for neuroinvasion. However, no biochemical or structural information is currently available for the C-terminal half of UL37 (UL37C), which mediates most of its interactions with multiple binding partners. Here, we show that UL37C from pseudorabies virus is a conformationally flexible monomer composed of an elongated folded core and an unstructured C-terminal tail. This elongated structure, along with that of its binding partner UL36, explains the nature of filamentous tegument structures bridging the capsid and the envelope. We propose that the dynamic nature of UL37 underlies its ability to perform diverse roles during viral replication.

Herpesviridae are a large family of viruses that infect a broad range of hosts from mollusks to birds to humans. All herpesviruses are composed of a capsid containing the dsDNA genome surrounded by a layer of proteins called the tegument, which is enclosed by the glycoprotein-studded lipid envelope. The tegument layer is stabilized by extensive protein/protein interactions and is largely structurally maintained even once the viral membrane is removed (1). The tegument includes 23 distinct virally encoded proteins, in addition to several host proteins (2). The tegument is further subdivided into the inner and outer tegument. The inner tegument is directly associated with the capsid, and protein copy numbers are tightly controlled. The outer tegument is less ordered and associates with the viral envelope instead of the capsid (2).
In addition to linking the capsid to the lipid envelope (3), tegument proteins play additional roles during viral infection (2). The inner tegument protein UL36, at ~330 kDa the largest tegument protein (4), is involved in capsid transport (5,6), release of viral DNA (7), and virion maturation (6,8,9) and contains a deubiquitinating domain (10). Other tegument proteins are involved in regulation of viral and host gene expression. For example, the outer tegument protein UL48, also known as α-transactivator, activates transcription of immediate-early viral genes (11). Another outer tegument protein, UL41, is an endonuclease that suppresses host protein expression by degrading mRNA and is commonly termed vhs (viral host shutoff) (12).
Likewise, the inner tegument protein UL37 plays multiple roles in the viral replication cycle. Along with UL36 (13–15), which is directly bound to the viral capsid (16), UL37 initiates tegument accumulation on the capsid during virion assembly and forms filamentous structures that may serve as a scaffold for the outer tegument proteins (17,18). In HSV-1, UL37 promotes formation of the enveloped viral particles, termed secondary envelopment, by binding viral envelope proteins gK and UL20, thereby linking capsid and envelope (3). Beyond structure and assembly, UL37 homologs from HSV-1 and PRV have been shown to facilitate efficient capsid trafficking during both viral entry and egress. During entry into epithelial cells, HSV-1 and PRV viruses lacking UL37 are delayed in trafficking, with about 50% of capsids failing to reach the nucleus (13). During egress, in the absence of UL37, HSV-1 and PRV capsids do not efficiently traffic from the nucleus to the sites of secondary envelopment (vesicles derived from Golgi (19) or endosomes (20)), and "naked" capsids accumulate within the cytoplasm (21,22). In HSV-1, UL37 has been shown to promote capsid transport on microtubules during egress in epithelial cells through its interaction with host trafficking protein dystonin/BPAG1a (23). UL37 plays an even more prominent role in capsid trafficking in neurons during infection with HSV-1 and PRV. UL37-null viruses are defective in retrograde movement from axon termini to the nucleus and avirulent in the mouse model of infection, demonstrating the importance of efficient capsid trafficking in viral replication (24). Finally, HSV-1 UL37 manipulates the innate immune response by binding immune modulators TNF receptor-associated factor 6 (TRAF6) (25) and retinoic acid-inducible gene I (RIG-I) (26).
Binding of UL37 to TRAF6 leads to activation of transcription factor NF-κB, which subsequently localizes to the nucleus and turns on expression of immune response genes, such as interleukin-8 (IL-8), presumably to increase transcription of early viral genes (25). UL37 also deamidates RIG-I, thus preventing its activation by viral dsRNA, which decreases the overall interferon production and stunts the antiviral response (26).
The structural characteristics underlying multiple distinct activities of UL37 are unknown. We previously reported the crystal structures of the N-terminal half of UL37 (UL37N) from PRV and HSV-1 (27,28) that revealed the bean-shaped structure of UL37N and helped identify a conserved surface region essential for directional capsid trafficking during neuroinvasion (24). However, no structural information is yet available for the C-terminal domain of UL37 (UL37C) that mediates the majority of known interactions of UL37. Here, we show that UL37C forms an elongated, conformationally flexible species with a folded core and an unstructured C-tail. We hypothesize that, in addition to providing a large surface with many potential binding sites for its multiple binding partners, UL37C can adopt several distinct conformations, each of which mediates a specific subset of UL37 activities. We propose that this conformational flexibility contributes to the multifunctionality of UL37.

Biochemical and structural characterization of PRV UL37
However, SEC-coupled multi-angle light scattering (SEC-MALS) estimated the molecular masses of PRV UL37C(478–919)-StII and UL37C(478–884) to be 50 and 46 kDa, respectively, close to their theoretical molecular masses of 48 and 43 kDa. Thus, PRV UL37C is a monomer in solution (Fig. 2, B and C). To rule out concentration-dependent oligomerization, SEC-MALS of UL37C(478–884) was performed at a range of concentrations: 1, 3, and 5 mg/ml. At all three concentrations, UL37C(478–884) was a monomer with an estimated molecular mass of 46 kDa (data not shown). To confirm the lack of dimerization, the PRV UL37C constructs were incubated with bis(sulfosuccinimidyl)suberate (BS3), a homobifunctional cross-linking reagent that reacts with primary amines, including the N-terminal amine and lysine side chains. Monomeric PRV UL37N (27) was used as a negative control. No high-molecular-mass species were observed for any construct in the presence of the cross-linker, consistent with a lack of oligomerization (Fig. 2D). The largest UL37C construct, UL37C(478–919)-StII, migrated at ~55 kDa on SDS-PAGE (Fig. 2D), higher than its theoretical molecular mass of 48 kDa, yet migrated at ~48 kDa after incubation with BS3. By contrast, UL37C(478–884) migrated at ~43 kDa with or without cross-linker (Fig. 2D). The increased mobility of UL37C(478–919)-StII is indicative of intramolecular cross-linking, which prevents complete unfolding under SDS-PAGE conditions. An apparent lack of intramolecular cross-linking in UL37C(478–884), which lacks the second half of the unstructured C-terminal tail, suggests that BS3 reacts with a lysine side chain near the C terminus.
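The monomer assignment above amounts to comparing each measured mass with the theoretical monomer mass. A minimal sketch of that comparison follows, using only the masses quoted in the text; the helper name `oligomeric_state` is hypothetical and is not part of any SEC-MALS software.

```python
# Masses (kDa) as quoted in the text for the two UL37C constructs.
constructs = {
    "UL37C(478-919)-StII": {"mals": 50.0, "theory": 48.0},
    "UL37C(478-884)": {"mals": 46.0, "theory": 43.0},
}


def oligomeric_state(measured_kda, monomer_kda):
    """Return (measured/theoretical ratio, nearest whole number of subunits)."""
    ratio = measured_kda / monomer_kda
    return ratio, max(1, round(ratio))


for name, masses in constructs.items():
    ratio, n_subunits = oligomeric_state(masses["mals"], masses["theory"])
    print(f"{name}: ratio = {ratio:.2f}, nearest oligomeric state = {n_subunits}")
```

A ratio near 1 corresponds to a monomer, whereas a dimer or trimer would give a ratio near 2 or 3.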

UL37C has a proteolytically resistant α-helical core
UL37C has high α-helical content, according to its CD spectra (Fig. 2E), but is predicted to have an unstructured C-terminal tail (Fig. S1). We used limited proteolysis to map the boundaries of the folded core of UL37C. Digests of UL37C(478–919)-StII with either chymotrypsin or trypsin resulted in a single major proteolytic product migrating at ~40 kDa, with the trypsin-generated product being slightly larger (Fig. 3, A and B). N-terminal sequencing of both 40-kDa proteolytic products showed that both start with the GPGS sequence, generated by cleavage of the N-terminal His6-SUMO tag by PreScission protease during protein purification (Fig. 1B), implying that trypsin and chymotrypsin cleavage leave the N terminus intact. Neither degradation product retained the C-terminal StII tag, as determined by streptavidin-horseradish peroxidase Western blotting (data not shown). Thus, both proteases cleave UL37C near the C terminus. Mass spectrometry analysis of the chymotrypsin digest showed a predominant peak at 41.8 kDa, consistent with cleavage after residue Leu-866 (Fig. 3C and Fig. S2) (theoretical mass of 41.6 kDa). Trypsin generated a product ~2 kDa larger, which is consistent with cleavage after residue Arg-884 (theoretical mass 43.5 kDa) (Fig. 3C). According to the secondary structure prediction, both proteolytic sites are located within the unstructured C-terminal tail (Fig. 3C).

Figure 2 (caption, partial). Experimental molecular masses are plotted as black dots against the right y axis, as calculated across the protein elution peak. Theoretical molecular masses corresponding to those of a monomer are indicated as horizontal dashed gray lines. D, BS3 chemical cross-linking of PRV UL37N (54 kDa), PRV UL37C(478–919)-StII (48 kDa), and PRV UL37C(478–884) (43 kDa). Gels were split to hide unrelated lanes, but contrast settings and molecular weight ladder remain consistent between gel sections. E, CD of PRV UL37C(478–919)-StII. Helical content from UL37C was estimated by the K2D2 program (http://k2d2.ogic.ca/) using the spectrum over the range from 190 to 240 nm. Helical content from UL37N was reported previously (27).

UL37C has a low melting temperature
To assess the overall stability of UL37C, we used the Thermofluor, or differential scanning fluorimetry, assay (29), which estimates the melting temperature (Tm) by measuring binding of a hydrophobic dye to hydrophobic regions exposed during unfolding. Thermal melting was done in 24 different buffer and salt conditions to identify those that could potentially stabilize the protein. The highest Tm for UL37C(478–919)-StII and UL37C(478–884) was 49 and 45°C, respectively (Fig. 1C, Fig. S3, and Table S1). By contrast, the highest Tm for PRV UL37N is 61°C (Fig. 1C, Fig. S3, and Table S1). The highest Tm for any construct fell within a pH range of 7.0–7.5 and increased with the addition of 150 mM NaCl and 5–10% glycerol (Table S1). Regardless of buffer conditions, the lowest Tm for PRV UL37N was still higher than any Tm for PRV UL37C. The relatively low melting temperature of UL37C could be due to a weak hydrophobic core (30).

UL37C is an elongated, conformationally flexible molecule
We have been unable to crystallize any PRV UL37C constructs despite extensive efforts. To obtain structural information on UL37C, we employed small-angle X-ray scattering coupled with SEC (SEC-SAXS), which is used to characterize conformational state and flexibility of macromolecules in solution. SAXS profiles were obtained for three constructs of PRV UL37C and for HSV-1 UL37N, a compact, conformationally rigid protein that served as a control (Fig. 4 and Fig. 5A). PRV UL37C(478–919)-StII and HSV-1 UL37N had only one intensity peak, whereas UL37C(478–884) and UL37C(618–919)-StII each had two peaks. The later peaks correspond to a smaller radius of gyration (Rg) around 5–10 Å, potentially due to degradation products (Fig. 4, B and C). The Rg profile across the primary peak for all samples slopes downward, but frames chosen for analyses had Rg values within ±1 Å of each other. Parameters calculated from SAXS data are summarized in Table 1.
For PRV UL37C(478–919)-StII, UL37C(478–884), and HSV-1 UL37N, SAXS-derived molecular masses (31) were consistent with the theoretical values (Fig. 1C), and Guinier plots were linear, which indicated good data quality and no significant protein aggregation (Fig. 5B). By contrast, the SAXS-derived molecular mass of PRV UL37C(618–919)-StII was 85 kDa, which is more than twice its theoretical molecular mass of 34 kDa (Fig. 1C). In the Guinier plot, the slight upturn at the lowest q values and steeper slope indicated aggregation for this construct (Fig. 5B). In addition, the Rg increased before flattening out within the intensity peak instead of decreasing, as seen with the other three traces (Fig. 4). Such highly variable Rg indicates poor data quality and an unreliable Rg calculation, and therefore the UL37C(618–919)-StII data were not analyzed further. SAXS data show that whereas both PRV UL37C(478–919)-StII and UL37C(478–884) constructs are at least 10 kDa smaller than HSV-1 UL37N, they have larger Rg and maximum dimensions (Dmax) (Table 1). Globular proteins have symmetrical bell-shaped distance distribution curves. The distance distribution curve of HSV-1 UL37N is slightly asymmetric, which is consistent with a bean-shaped structure observed in the crystals (Fig. 5C). In contrast, the obviously asymmetrical distance distribution curves that trail off at large distances for PRV UL37C(478–919)-StII and UL37C(478–884) (Fig. 5C) are characteristic of elongated and/or flexible structures.
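The Guinier analysis referred to above rests on the standard approximation I(q) ≈ I(0)·exp(−q²Rg²/3) at low q, so a plot of ln I(q) against q² is linear with slope −Rg²/3 for a well-behaved sample. The sketch below illustrates such a fit on synthetic data. The function name, the qRg ≤ 1.3 cutoff (a common rule of thumb for globular particles), and all numerical values are assumptions made for illustration; they are not taken from the study or from its analysis software.

```python
import numpy as np


def guinier_fit(q, intensity, qrg_max=1.3, n_iter=10):
    """Estimate Rg and I(0) from a straight-line fit of ln I(q) versus q^2.

    The fitted range is narrowed iteratively until q * Rg <= qrg_max holds.
    """
    q = np.asarray(q, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    mask = np.ones(q.shape, dtype=bool)
    for _ in range(n_iter):
        slope, intercept = np.polyfit(q[mask] ** 2, np.log(intensity[mask]), 1)
        rg = np.sqrt(-3.0 * slope)  # slope = -Rg^2 / 3
        new_mask = q * rg <= qrg_max
        if new_mask.sum() < 3 or np.array_equal(new_mask, mask):
            break
        mask = new_mask
    return rg, np.exp(intercept)


# Synthetic, noise-free Guinier curve; Rg = 35 A and I(0) = 100 are invented values.
q = np.linspace(0.005, 0.06, 50)  # scattering vector in 1/A
intensity = 100.0 * np.exp(-(q * 35.0) ** 2 / 3.0)

rg_fit, i0_fit = guinier_fit(q, intensity)
print(f"Rg = {rg_fit:.2f} A, I(0) = {i0_fit:.2f}")
```

For real data, an upturn at the lowest q values bends the ln I(q) versus q² plot away from a straight line, which is the aggregation signature described above for UL37C(618–919)-StII.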
The default settings in the program DATGNOM (used to evaluate the distance distribution) force the distance distribution function to 0 at the determined Dmax. Using these default settings, the Dmax for PRV UL37C(478–884) and UL37C(478–919)-StII was estimated as ~125 and ~140 Å, respectively (Fig. 5C, inset). However, when not forced to 0 at Dmax, the distance distribution function for UL37C has a pronounced "tail" that runs asymptotically to 0, well beyond 200 Å (Fig. 5C). Such behavior of the distribution function can be attributed to either aggregation or flexibility. Given that all other quality checks indicated that UL37C samples were not aggregated, based on the shape of the distribution function, we conclude that PRV UL37C is flexible. Whereas Dmax is difficult to define for a flexible protein, ~125 and ~140 Å represent the major transitions toward zero (i.e. transitions from average electron density of the protein to average electron density of the solvent), suggesting that they represent the Dmax values of the most common conformations adopted by each of the two UL37C constructs, respectively. In contrast, the long tail is representative of more extended conformations present in lower frequencies. Structural parameters for the distance distribution analysis are reported both for the scenario when the distance distribution curve is forced to zero at Dmax and when it is not (Table 1).
To estimate conformational flexibility, we used the dimensionless Kratky plots. HSV-1 UL37N has a symmetrical bell-shaped curve that reaches its peak at qRg = √3 and then returns to zero, which is characteristic of globular, ordered proteins (Fig. 5D and Fig. S4). In contrast, intrinsically disordered proteins have Kratky plots that continue to increase after qRg = √3. Both PRV UL37C(478–919)-StII and UL37C(478–884) had mixed profiles, with a small decrease after qRg = √3, indicating a balance between order and disorder, which is characteristic of a conformationally flexible yet mostly folded protein (Fig. 5D and Fig. S4). Taken together, the SAXS data suggest that UL37C is an elongated, conformationally flexible protein, in contrast to the slightly elongated, rigid UL37N.
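The two limiting behaviors of the dimensionless Kratky plot described above can be reproduced with idealized scattering functions. The sketch below plots neither the UL37 data nor any fitted model: it uses the Guinier law as a stand-in for a compact globular particle and the Debye function for a Gaussian chain as a stand-in for a fully disordered polymer. The function names and the Rg value are assumptions chosen for illustration.

```python
import numpy as np


def dimensionless_kratky(q, intensity, rg, i0):
    """Return x = q*Rg and y = (q*Rg)^2 * I(q)/I(0)."""
    x = np.asarray(q, dtype=float) * rg
    y = x ** 2 * np.asarray(intensity, dtype=float) / i0
    return x, y


def debye_gaussian_chain(q, rg):
    """Debye form factor of an ideal Gaussian chain, normalized so that I(0) = 1."""
    u = (np.asarray(q, dtype=float) * rg) ** 2
    return 2.0 * (np.exp(-u) + u - 1.0) / u ** 2


rg = 30.0  # invented radius of gyration in A
q = np.linspace(0.001, 0.2, 2000)  # scattering vector in 1/A

# Idealized globular particle: Guinier law, I(q)/I(0) = exp(-(q*Rg)^2 / 3).
x, y_globular = dimensionless_kratky(q, np.exp(-(q * rg) ** 2 / 3.0), rg, 1.0)
peak = np.argmax(y_globular)
print(f"globular peak at qRg = {x[peak]:.3f} (sqrt(3) = {np.sqrt(3):.3f})")

# Idealized disordered chain: the curve keeps rising toward a plateau instead of peaking.
_, y_chain = dimensionless_kratky(q, debye_gaussian_chain(q, rg), rg, 1.0)
print(f"chain value at the largest qRg = {y_chain[-1]:.3f}")
```

For the Guinier-law curve the maximum lies at qRg = √3 with a height of 3/e, which is why the position and shape of this peak serve as a reference when judging how far a measured profile departs from compact, globular behavior.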
The flexibility of the UL37C constructs was further analyzed using ensemble optimization modeling (EOM), which describes the size distribution of species in a heterogeneous mixture (32,33). EOM version 2.0 uses the Rflex metric to quantify the differences between flexible and rigid systems, where Rflex = 0% for a fully rigid system and Rflex = 100% for a fully flexible system (33). The Rflex of a random pool, which represents 10,000 possible conformations, is typically between 85 and 90%. For a rigid system, Rflex would be expected to decrease significantly after ensemble optimization modeling. However, the Rflex for either UL37C(478–919)-StII or UL37C(478–884) decreased by only 2–3%, which is characteristic of a highly flexible system (Fig. 6, A and B).
EOM generates probability density functions, representing the likelihood that the protein is found in a specific conformation and the relative ratios of those conformations. Multiple ensembles for both constructs show that the majority of the protein is found in a compact conformation, centered around a Dmax of ~150 and ~130 Å for the longer and the shorter construct, respectively (Fig. 6, A and B). The ensembles also indicate the presence of extended conformations for both constructs, with Dmax of 200–250 Å. No discrete extended conformation was found across all ensembles (Fig. 6, A and B), indicating that the extended conformation is highly flexible. Representative models of the compact and extended conformations for each construct (Fig. 6, C and D) show possible conformations but do not represent the only solution.
To complement EOM analysis, we turned to ab initio bead modeling, which can generate an envelope shape to match the experimental data (Fig. 5A). Ten ab initio models were generated independently and averaged for PRV UL37C(478–919)-StII, UL37C(478–884), and HSV-1 UL37N. The ab initio bead model of HSV-1 UL37N closely resembles the bean-shaped crystal structure (Fig. 6G), whereas the ab initio bead models of UL37C(478–919)-StII and UL37C(478–884) show extended, S-shaped envelopes (Fig. 6, E and F). In a flexible system, however, the ab initio model represents an average of all possible conformations. Therefore, the ab initio bead models of PRV UL37C(478–919)-StII and UL37C(478–884) should not be construed as accurate structures. Nevertheless, the Dmax values of the models fall at the larger end of the compact populations identified in EOM (Fig. 6, A and B), supporting the observation that PRV UL37C is predominantly found in this conformation.

UL37FL is composed of two independently folded halves that do not interact
The low yield of purified PRV UL37FL ruled out SAXS measurements yet permitted thermostability analysis using the Thermofluor method. UL37FL showed two distinct melting transitions, at 46 and 60°C (Fig. 1C, Fig. S3D, and Table S1). These temperatures are close to the individual melting temperatures of UL37C and UL37N, 49 and 61°C, respectively (Fig. 1C, Fig. S3, and Table S1). The melting profile of PRV UL37FL suggests that UL37N and UL37C have individual hydrophobic cores with distinct thermal stabilities. Individually expressed PRV UL37N and PRV UL37C(478–919)-StII do not form a stable complex under the conditions of the SEC experiment (Fig. 7) and are not cross-linked by BS3 (Fig. 2D). We conclude that UL37N and UL37C do not interact and, within UL37FL, exist as independent domains connected by a hinge or linker.

Discussion
UL37 is a conserved inner tegument protein that plays multiple diverse roles in viral replication. In HSV-1 alone, UL37 interacts with viral proteins UL36 (34) and gK (3) and host proteins dystonin (23), TRAF6 (25), and RIG-I (26). Previously, we determined the crystal structures of UL37N from HSV-1 and PRV. Given that most of these interactions map to the C-terminal half of UL37, there is considerable interest in determining the structure of UL37C. Here, we characterized biochemical properties of PRV UL37C and determined its molecular envelope by SAXS. We found that whereas UL37N has a compact bean-shaped structure, PRV UL37C is an elongated molecule. Due to this nonglobular shape, PRV UL37C constructs migrate as apparent dimers or trimers by SEC, which depends not only on the molecular mass but also on the shape of the protein. Nonetheless, SEC-SAXS, MALS, and cross-linking clearly show that PRV UL37C constructs are monomeric in solution. Thus, PRV UL37C is an elongated monomer. Surprisingly, PRV UL37C has a relatively low melting temperature, 45–49°C, in contrast to the melting temperature of 61°C for UL37N. A low melting temperature could reflect a weak hydrophobic core (30), potentially due to its elongated shape and the unstructured C terminus. Proteins with intrinsically disordered regions typically have fewer hydrophobic residues and, thus, weaker hydrophobic cores (35). The differences in order between UL37N and UL37C can also be seen in a plot indicating which residues in a protein sequence are folded versus unfolded (Fig. 8A and Fig. S1) (36). The residues predicted to be unfolded in PRV and HSV-1 UL37 are found almost exclusively within the C-terminal half of the protein (Fig. S1) (36,37). The specific residues important for UL37 interactions with gK/UL20, UL36, RIG-I, and BPAG map to ordered regions (Fig. 8A and Fig. S1), suggesting that although the binding sites may be rigid, the surrounding disordered regions could provide the flexibility needed to modulate the accessibility of the binding site. In this way, UL37 could control its choice of binding partners in different cellular locations or during different stages of the viral replication cycle.
UL37C has a high α-helical content, consistent with secondary structure predictions, and a large proteolytically stable core connected to an unstructured C-terminal tail (C-tail). BS3 treatment of UL37C(478–919)-StII yielded faster-migrating species indicative of intramolecular cross-linking. The BS3 cross-linker is 11.4 Å long and can generate intramolecular cross-links due to flexibility in both the side chains and carbon backbone (38). The UL37C(478–919)-StII construct has four lysine residues: three within the UL37C itself (Lys-574, Lys-765, and Lys-919) and one within the StII tag. Lack of intramolecular cross-linking in UL37C(478–884) implies that either Lys-919 within the unstructured C-tail or the lysine within StII is cross-linked with either one of the other two lysines or the N-terminal amine, resulting in the observed faster-migrating species in UL37C(478–919)-StII. This suggests that the unstructured C-tail contacts the folded core of UL37C.
In the absence of a crystal structure, SAXS was used to determine the shape of UL37C and to assess its conformational flexibility. The distance distribution function for both UL37C(478–919)-StII and UL37C(478–884) indicates that UL37C is an elongated protein, in agreement with the SEC results, whereas the Kratky plots suggest that it is conformationally flexible, but not intrinsically disordered. The downward sloping Rg profiles within SEC intensity peaks are probably representative of the inherent flexibility of UL37C, where the more extended conformations are on one side of the peak and more compact conformations are on the other (Fig. 4). Further, the extensive tailing in the distance distribution functions for PRV UL37C constructs, when not forced to 0, supports our observations that UL37C is flexible. Whereas defining an accurate Dmax for a flexible protein is difficult, ~125 and ~140 Å represent the Dmax of the conformations most frequently adopted by each of the two UL37C constructs, respectively. These values are consistent with ensemble modeling of both UL37C constructs, which demonstrates a preference for the most compact conformations, between 130 and 150 Å in length. EOM also shows that the remaining population is found in various extended conformations, but at lower frequencies, which contribute to the asymptotic tail in the distance distribution function. The unstructured C-tail of UL37C likely contributes significantly to this conformational flexibility by adopting various orientations. However, the construct UL37C(478–884), which lacks about half of the C-tail, remains conformationally flexible, suggesting that the core is likely not rigid either. Spectrin repeats provide a useful model for the conformational flexibility of the UL37C core. Spectrin repeats are composed of α-helical bundles that assemble into filaments. SAXS experiments have shown that whereas individual spectrin repeats are rigid, constructs containing several repeats yielded Kratky plots similar to UL37C, suggesting that rigid units can be flexible in relationship to each other without being intrinsically disordered (39). We hypothesize that the inherent flexibility of UL37C precluded its crystallization.
We did not observe any interaction between the individually expressed UL37N and UL37C from PRV by either SEC or chemical cross-linking. We hypothesize that within full-length UL37, these domains either have limited interactions or do not interact at all. This hypothesis is supported by the biphasic melting profile of full-length PRV UL37, where transitions occur at the temperatures corresponding to melting temperatures of individually expressed PRV UL37N and UL37C. UL37N and UL37C are thus independently folded domains connected by a linker or a hinge within UL37. Fig. 8B shows a model of PRV UL37, composed of a stable, rigid UL37N and a conformationally flexible UL37C that resembles an arm or hook. We propose that the dynamic nature of UL37C may contribute to the multifunctionality of UL37, whereby distinct conformations could engage different binding partners and perform different roles during viral replication. Flexibility of disordered regions has been proposed as a significant factor in the ability of intrinsically disordered proteins to bind to two or more partners (40). Such a concept could help to explain the ability of UL37 to perform diverse roles during replication.
Viral proteomes have been shown to contain a greater percentage of intrinsic disorder than their human host proteome (41). Moreover, the intrinsically disordered regions of proteins have been shown to significantly contribute to the overall pathogenicity or oncogenicity of viruses (41). One well-characterized example is the E6 protein from human papillomavirus. Previous analysis showed that E6 from human papillomavirus genotypes that have higher oncogenic potential (high risk) contain a greater percentage of predicted intrinsically disordered residues when compared with low-risk genotypes (42). Whereas intrinsically disordered viral proteins implicated in pathogenesis have been identified in several viral families, there are no current examples from herpesviruses. Structural information on tegument proteins is scarce. By demonstrating the inherent flexibility within a highly conserved tegument protein, our work expands our knowledge of tegument proteins and will inform future studies of this complex virion layer.
Previously, it has been proposed that UL36 and UL37 form long filamentous structures extending away from the capsid vertices within the virion (17,18). A recombinant construct corresponding to the central third of UL36 forms long fibers in vitro (43). Here, we showed that UL37C forms an extended, conformationally flexible structure. Given that UL37N has a compact, slightly elongated structure, we hypothesize that the extended, flexible UL37C contributes to the filamentous structures observed in the virion. Future characterization of UL37 structure and interactions will lead to a more complete understanding of the complex tegument protein UL37.

Cloning
The PRV UL37 gene with a C-terminal StII tag, codon-optimized for E. coli expression, was synthesized by GeneArt. The PRV UL37(1–919)-StII fragment was excised from the GeneArt plasmid using BamHI and HindIII and subcloned into either pJP4 or pGEX-6P1. The pJP4 plasmid contains a His6-SUMO-PreScission tag in frame with the BamHI restriction site of the multiple-cloning site in a pET24b vector. The pGEX-6P1 plasmid contains a GSH S-transferase (GST)-PreScission tag in frame with the BamHI restriction site of the multiple-cloning site. The resulting plasmids encode PRV UL37(1–919)-StII with either a cleavable N-terminal His6-SUMO tag (pJP14) or an N-terminal GST tag (pAK26). DNA encoding UL37C(478–919)-StII was amplified by PCR from the codon-optimized PRV UL37 gene using the primers JP46 (CTAGGGATCCGGTCTGCGTGCAGATGGTGCC, BamHI site underlined) and AK7 (CGCAAGCTTCATTTTTCAAACTG, HindIII site underlined) and subcloned into pJP4 vector using the BamHI and HindIII restriction sites to yield plasmid pAK1. DNA encoding UL37C(478–884) was amplified by PCR from the codon-optimized PRV UL37 gene using the primers JP46 and AK23 (AAAAAACTAGAAGCTTCTAACGCAGAACTTCCAGATTAAC, HindIII site underlined) and subcloned into pJP4 using the BamHI and HindIII restriction sites to yield plasmid pAK13. DNA encoding UL37C(499–884) was amplified by PCR from the codon-optimized PRV UL37 gene using the primers JP79 (CTAGGGATCCGATCTGGCAGCCGCAGCCGAT, BamHI site underlined) and AK23 and subcloned into pJP4 using the BamHI and HindIII restriction sites to yield plasmid pAK3. DNA encoding UL37C(618–919)-StII was amplified by PCR from the codon-optimized PRV UL37 gene using the primers AK118 (CTAAAGGATCCGCTCCGGCACCTCCGA, BamHI site underlined) and AK7 and subcloned into pJP4 using the BamHI and HindIII restriction sites to yield plasmid pAK19. Construction of plasmid pJP23 encoding PRV UL37N with an N-terminal His6-SUMO tag was described previously (27). Construction of plasmid pAK11 encoding HSV-1 UL37N with an N-terminal His6-SUMO tag was described previously (28).
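Because the underlining that marked the restriction sites is not preserved in plain text, the sites can be located programmatically. The sketch below searches each primer listed above for the standard BamHI (GGATCC) and HindIII (AAGCTT) recognition sequences. The primer sequences are copied from the text with line-break hyphens removed, and the helper name `site_position` is hypothetical.

```python
# Recognition sequences of the two restriction enzymes used for subcloning.
SITES = {"BamHI": "GGATCC", "HindIII": "AAGCTT"}

# Primer name -> (sequence 5'->3' as printed in the text, enzyme named in the text).
PRIMERS = {
    "JP46": ("CTAGGGATCCGGTCTGCGTGCAGATGGTGCC", "BamHI"),
    "AK7": ("CGCAAGCTTCATTTTTCAAACTG", "HindIII"),
    "AK23": ("AAAAAACTAGAAGCTTCTAACGCAGAACTTCCAGATTAAC", "HindIII"),
    "JP79": ("CTAGGGATCCGATCTGGCAGCCGCAGCCGAT", "BamHI"),
    "AK118": ("CTAAAGGATCCGCTCCGGCACCTCCGA", "BamHI"),
}


def site_position(sequence, enzyme):
    """Return the 0-based index of the enzyme's recognition site, or -1 if absent."""
    return sequence.upper().find(SITES[enzyme])


for name, (sequence, enzyme) in PRIMERS.items():
    pos = site_position(sequence, enzyme)
    status = f"found at index {pos}" if pos >= 0 else "NOT FOUND"
    print(f"{name}: {enzyme} site {SITES[enzyme]} {status}")
```

Each primer carries its stated site a few bases in from the 5' end, leaving a short overhang upstream of the site.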

Recombinant protein expression and purification
All constructs except pAK26 were expressed in low background strain (LoBSTr) E. coli (a gift from Thomas Schwartz, Massachusetts Institute of Technology). Freshly transformed cells were incubated at 37°C overnight in 5 ml of LB starter culture supplemented with 50 µg/ml kanamycin and 34 µg/ml chloramphenicol. The starter culture was diluted into 1 liter of LB supplemented with 50 µg/ml kanamycin and 34 µg/ml chloramphenicol and grown at 37°C until the A600 reached 0.8–1.0. At this point, the temperature was shifted to 16°C, and the cells were induced with 0.5 mM isopropyl-β-D-thiogalactopyranoside for 16–20 h. Cells were harvested by centrifugation at 5,000 × g for 40 min, resuspended in 40 ml of buffer A (100 mM HEPES, pH 7.5, 150 mM NaCl, 5% glycerol, 0.1 mM tris(2-carboxyethyl)phosphine (TCEP)) for pAK1, pAK3, pAK13 or buffer B (20 mM HEPES, pH 7.5, 50 mM NaCl, 5% glycerol, 0.1 mM TCEP) for pJP14 or buffer C (100 mM PIPES, pH 7.5, 150 mM NaCl, 5% glycerol, 0.1 mM TCEP) for pAK19 and lysed using a Microfluidizer. All lysis buffers were supplemented with 15 mM imidazole to reduce nonspecific binding during the subsequent metal-affinity capture step. The insoluble fraction was removed by centrifugation at 13,000 × g for 30 min at 4°C. Soluble lysate was loaded onto a 5-ml nickel-Sepharose 6B FF column (GE Healthcare). The column was subsequently washed with buffer A, B, or C, depending on the construct, containing increasing amounts of imidazole from 15 to 100 mM. Protein was eluted in buffer A, B, or C containing 300 mM imidazole. The protein concentration was determined from the absorbance at 280 nm using a calculated extinction coefficient. Purified recombinant GST-tagged HRV3C (PreScission) protease was added to the protein solution at a 1:30 protease/protein ratio, and the protein was cleaved overnight at 4°C to remove the His6-SUMO tag. Cleaved constructs were further purified by SEC using a Superdex 200 column (GE Healthcare) in buffer A, B, or C.
pAK26 was purified as described above, with some modifications. pAK26 was expressed in LB supplemented with 100 µg/ml ampicillin and 34 µg/ml chloramphenicol. Cells were harvested and lysed in buffer C. The soluble fraction was captured on GSH-Sepharose 4B (GE Healthcare), eluted with 10 mM reduced GSH in buffer C, and further purified by SEC. pJP23 and pAK11 were expressed and purified as described previously (27,28). Protein purity was assessed by SDS-PAGE and GelCode Blue staining (Thermo Fisher Scientific).

Thermofluor assay
Protein was diluted into 1× TBS to a final concentration of 0.15 mg/ml. The fluorescent dye SYPRO Orange (Invitrogen) was added at a 1:1,000 dilution. Ten µl of the protein-dye solution was pipetted into each well of a 96-well PCR microplate. Next, 10 µl of buffer (from a custom-made screen containing buffers at pH 4.5-10.5 and sodium chloride concentrations ranging from 0 to 500 mM, 24 conditions in total) was added to wells containing the protein-dye solution. The plate was sealed and centrifuged for 1 min at 500 × g and 25°C. Samples were analyzed on a Roche Applied Science LightCycler 480 quantitative PCR machine using an excitation wavelength of 465 nm and detection of emission at 610 nm. The emission signal was analyzed from 25 to 95°C at a continuous acquisition rate of 3 measurements/°C. Data were analyzed using the ThermoQ software program.
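As an illustration of how a melting temperature can be read from a Thermofluor curve, the sketch below takes the apparent Tm as the temperature at which the fluorescence rises most steeply (the maximum of dF/dT). This is a generic approach and is not claimed to be the algorithm used by ThermoQ; the function names and the synthetic two-state curve are hypothetical, and only the 25-95°C range and the 3 measurements/°C sampling are taken from the text.

```python
import math


def synthetic_melt_curve(tm=55.0, width=2.0):
    """Hypothetical two-state unfolding curve sampled at 3 points/°C from 25 to 95°C."""
    temps = [25.0 + i / 3.0 for i in range(211)]  # 25.0 ... 95.0 °C
    signal = [1.0 / (1.0 + math.exp((tm - t) / width)) for t in temps]
    return temps, signal


def estimate_tm(temps, fluorescence):
    """Return the midpoint of the interval with the steepest rise in fluorescence."""
    if len(temps) != len(fluorescence) or len(temps) < 2:
        raise ValueError("need two equal-length series with at least two points")
    best_slope = -math.inf
    best_temp = None
    for i in range(len(temps) - 1):
        # Forward finite difference approximates dF/dT on this interval.
        slope = (fluorescence[i + 1] - fluorescence[i]) / (temps[i + 1] - temps[i])
        if slope > best_slope:
            best_slope = slope
            best_temp = 0.5 * (temps[i] + temps[i + 1])
    return best_temp


if __name__ == "__main__":
    temps, signal = synthetic_melt_curve()
    print(f"apparent Tm: {estimate_tm(temps, signal):.1f} °C")
```

Real curves are noisy and often show a post-transition decay, so smoothing or a Boltzmann fit would normally precede this step.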

Crystal screening
PRV UL37C constructs were screened at various concentrations in nine screens (in-house Harrison Lab Grid Screen, Classics Suite (Qiagen), Protein Complex Suite (Qiagen), Index (Hampton Research), Peg/Ion (Hampton Research), SaltRx (Hampton Research), Top 96 (Anatrace), and Wizard 1-4 (Rigaku)) using a 96-well sitting-drop vapor diffusion format with drops containing 0.2 µl of crystallization solution and 0.2 µl of protein dispensed by a Crystal Phoenix liquid handling robot (Art Robbins). Plates were stored at room temperature and were evaluated using a stereo microscope daily for a week and throughout subsequent months.

Circular dichroism
Secondary structure content of UL37(478-919)StII was estimated using CD. Purified UL37(478-919)StII was exchanged into CD buffer (20 mM Tris, pH 8.0, 150 mM sodium chloride, 5% glycerol, 0.5 mM TCEP) and diluted to 0.2 mg/ml. The far-UV CD spectrum from 190 to 250 nm was measured at room temperature on a JASCO model 810 spectropolarimeter using a 50-nm/s scan speed, 1-nm band pass value, and 1-s response time. For each sample, three scans were collected and data were averaged, a buffer blank spectrum was subtracted, and the sample was processed for noise elimination. Measurements are presented in units of Δε. The spectra were analyzed using Dichroweb (http://dichroweb.cryst.bbk.ac.uk/html/home.shtml) (50).
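For readers unfamiliar with Δε units, the sketch below shows the standard conversion from raw ellipticity (in millidegrees) to mean residue ellipticity and then to Δε per residue. The cuvette path length and mean residue weight used in the example call are assumptions for illustration only (neither is stated in this section); only the 0.2 mg/ml concentration comes from the text. The function name is hypothetical.

```python
def mdeg_to_delta_epsilon(theta_mdeg, conc_mg_per_ml, path_cm, mean_residue_weight):
    """Convert observed ellipticity (mdeg) to delta-epsilon (M^-1 cm^-1 per residue).

    Mean residue ellipticity: [theta] = theta_mdeg * MRW / (10 * l * c),
    with l in cm and c in mg/ml, giving deg cm^2 dmol^-1.
    Delta-epsilon follows from [theta] = 3298 * delta_epsilon.
    """
    if conc_mg_per_ml <= 0 or path_cm <= 0:
        raise ValueError("concentration and path length must be positive")
    mean_residue_ellipticity = (
        theta_mdeg * mean_residue_weight / (10.0 * path_cm * conc_mg_per_ml)
    )
    return mean_residue_ellipticity / 3298.0


if __name__ == "__main__":
    # Assumed values: 0.1-cm cuvette and a mean residue weight of 110 Da.
    value = mdeg_to_delta_epsilon(
        theta_mdeg=-10.0, conc_mg_per_ml=0.2, path_cm=0.1, mean_residue_weight=110.0
    )
    print(f"delta epsilon: {value:.3f} M^-1 cm^-1")
```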

Multi-angle light scattering
The oligomeric state of UL37C(478-884) was assessed by SEC-MALS using a Wyatt Dawn Heleos II multi-angle light scattering detector and Optilab T-rEX refractive index monitor with an Agilent isocratic HPLC system. The construct was purified as described above and prepared at three different concentrations (1, 3, and 5 mg/ml) in SEC-MALS buffer (100 mM HEPES, pH 7.5, 150 mM sodium chloride, 5% glycerol, 0.1 mM TCEP). Separation steps were performed in SEC-MALS buffer with a Tosoh G4SWxl column at a flow rate of 0.5 ml/min. The oligomeric state of UL37C(478-919)-StII was assessed by SEC-MALS as described above, except using an Akta isocratic HPLC system with a GE Superose 6 10/300 column at a flow rate of 0.3 ml/min. Data analysis for both constructs was performed with the Astra software package version 6.0.5.3 (Wyatt).

Cross-linking
BS3 (Thermo Fisher Scientific) was added to 100 pmol of purified UL37(478-884)StII at 0:1, 25:1, 50:1, 250:1, and 500:1 cross-linker/protein molar ratios. Samples were incubated in amber tubes in the dark at room temperature for 30 min, at which point 1 M Tris, pH 8.0, was added to a final concentration of 50 mM, and incubation was continued for 15 min to stop the reaction. Purified HSV-1 UL37N was used as a negative control. Half of the reaction was analyzed by SDS-PAGE and Coomassie staining.
For MS, 200 µg of purified UL37C(478-919)-StII was digested with chymotrypsin at a 1:200 protease/protein mass ratio as described above. After 1 h, the reaction was stopped by adding phenylmethanesulfonyl fluoride to a final concentration of 10 mM. The mixture was analyzed by MALDI-TOF on a Voyager DE-PRO instrument using a dihydroxybenzoic acid matrix at the Tufts University Core Facility.
For N-terminal sequencing, 75 µg of purified UL37C(478-919)-StII was digested with either chymotrypsin at a 1:200 or trypsin at a 1:100 protease/protein mass ratio, separated on a 4-15% SDS-PAGE gel, and transferred to polyvinylidene fluoride membrane in 50 mM CAPS, pH 10.5, supplemented with 10% methanol using a semi-dry transfer apparatus. The membrane was stained in 40% methanol with 0.05% Coomassie Blue R-250, followed by destaining, first in 50% methanol and then in water. Pieces cut from the dried membrane were subjected to N-terminal sequencing by Edman degradation at the Tufts University Core Facility.

SEC-SAXS
SEC-SAXS experiments were performed at the BioSAXS facility at beamline G1 at the Cornell High Energy Synchrotron Source (Ithaca, NY). UL37C(478-919)-StII at ~6 mg/ml, UL37C(478-884) at 3 mg/ml, and HSV-1 UL37N at 6 mg/ml were precleared of aggregates by centrifugation at 16,000 × g for 5 min at 4°C. 100-µl aliquots were injected onto a Superdex 200 Increase 5/150 column (GE Healthcare) in buffer A at 0.3 ml/min at 4°C and fed directly into a proprietary SEC-SAXS flow cell with a glass window. Samples were irradiated with a 9.9496-keV (1.246122-Å) beam with a flux of 5.7 × 10¹¹ photons/s and a size of 250 µm × 250 µm, and 2-s images were recorded through the duration of the run on a Pilatus 100K-S detector (Dectris) at a sample-to-detector distance of 1531.09 mm, resulting in a q range of 0.006 Å⁻¹ < q < 0.8 Å⁻¹, where the scattering vector q = 4π sin(θ)/λ.
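The relationship between the beam energy, the wavelength, and the scattering vector can be checked with a short calculation. The sketch below assumes the usual conventions, namely that λ (Å) = hc/E with hc ≈ 12.39842 keV·Å and that q = 4π sin(θ)/λ with 2θ the full scattering angle; the function names are hypothetical and are not part of any software mentioned in the text.

```python
import math

HC_KEV_ANGSTROM = 12.398419843  # Planck constant times speed of light, in keV·Å


def wavelength_angstrom(energy_kev):
    """Photon wavelength in Å for a given beam energy in keV."""
    return HC_KEV_ANGSTROM / energy_kev


def q_from_theta(theta_rad, wavelength):
    """Scattering vector magnitude (Å^-1); theta is half the scattering angle."""
    return 4.0 * math.pi * math.sin(theta_rad) / wavelength


def theta_from_q(q, wavelength):
    """Half scattering angle (radians) corresponding to a given q (Å^-1)."""
    return math.asin(q * wavelength / (4.0 * math.pi))


if __name__ == "__main__":
    lam = wavelength_angstrom(9.9496)
    print(f"wavelength: {lam:.6f} Å")
    for q in (0.006, 0.8):
        two_theta_deg = math.degrees(2.0 * theta_from_q(q, lam))
        print(f"q = {q} Å^-1 corresponds to 2θ = {two_theta_deg:.3f}°")
```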
Data were processed using the RAW software package (https://sourceforge.net/projects/bioxtasraw/) (44) by averaging frames with a consistent Rg value (±1 Å) and subtracting averaged buffer frames. Specific frames were chosen based on which region gave the highest quality DATGNOM solution, which sometimes required exclusion of frames even with consistent Rg values. Subtracted curves were further analyzed in RAW and using programs in the ATSAS software package (https://www.embl-hamburg.de/biosaxs/software.html) (51), including DATRG, DATGNOM, DAMMIN (45), DAMAVER (46), and EOM version 2.0 (33). DATRG was used to evaluate uncertainty in Rg estimates. DATGNOM was used to generate and automatically evaluate the pair distance distribution function (P(r)) from the subtracted experimental data before manual adjustment of the automatically assigned Dmax value to improve P(r) shape. DAMMIN was used for ab initio shape determination by simulated annealing using a single-phase dummy-atom model. DAMAVER was used to align 10 models generated from DAMMIN (using P(r) functions not forced to 0, with a Dmax of 250 Å), select the most typical one, and build an averaged model. In EOM, a pool of 10,000 random native-like structures representing most of the structural space available to the sequence of UL37C was generated. Theoretical SAXS curves were generated for random combinations of these models, and the ensemble was optimized with a genetic algorithm to best fit the experimental data. No symmetry information or structured domains were provided. The initial pool and the final ensemble were given flexibility metrics relating directly to the amount of information entropy in the system. Rg and Dmax values calculated from the species in each group were represented in histograms. Some low-angle data points in the Guinier region were excluded from the other three approximations; these points are indicated by the gray dots in the Guinier plot (Fig. 4B).
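To make the Rg estimate concrete, the following minimal sketch applies the Guinier approximation, ln I(q) = ln I(0) − (Rg²/3)q², as an ordinary least-squares line in q² over the low-angle region. It is not the procedure implemented in RAW or DATRG, which also choose the fitting range and estimate uncertainties; the function name and the noise-free synthetic data (Rg = 40 Å) are hypothetical.

```python
import math


def guinier_fit(q_values, intensities):
    """Fit ln I versus q^2 by least squares and return (Rg in Å, I0)."""
    if len(q_values) != len(intensities) or len(q_values) < 2:
        raise ValueError("need two equal-length series with at least two points")
    x = [q * q for q in q_values]
    y = [math.log(i) for i in intensities]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if slope >= 0:
        raise ValueError("non-negative slope: data are not Guinier-like")
    # slope = -Rg^2 / 3, intercept = ln I(0)
    return math.sqrt(-3.0 * slope), math.exp(intercept)


if __name__ == "__main__":
    true_rg = 40.0  # hypothetical value, in Å
    # Low-angle points only, keeping q * Rg below about 1.3.
    q = [0.006 + 0.001 * k for k in range(27)]
    intensity = [math.exp(-(true_rg ** 2) * qi * qi / 3.0) for qi in q]
    rg, i0 = guinier_fit(q, intensity)
    print(f"Rg = {rg:.2f} Å, I(0) = {i0:.3f}")
```

Restricting the fit to q·Rg below roughly 1.3 is the common rule of thumb for globular particles; flexible or elongated proteins often need a tighter limit.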
Data for UL37C(478 -919)-StII, UL37C(478 -884), and HSV-1 UL37N have been deposited into the SASBDB under codes SASDDQ9, SASDEL4, and SASDDP9, respectively.