Crystal structure of a novel domain of the motor subunit of the Type I restriction enzyme EcoR124 involved in complex assembly and DNA binding

Although EcoR124 is one of the better-studied Type I restriction-modification enzymes, it still presents many challenges to detailed analyses because of its structural and functional complexity and missing structural information. In all available structures of its motor subunit HsdR, responsible for DNA translocation and cleavage, a large part of the HsdR C terminus remains unresolved. The crystal structure of the C terminus of HsdR, obtained with a crystallization chaperone in the form of a pHluorin fusion and refined to 2.45 Å, revealed that this part of the protein forms an independent domain with its own hydrophobic core and displays a unique α-helical fold. The full-length HsdR model, based on the WT structure and the C-terminal domain determined here, revealed a putative DNA-binding groove lined by positively charged residues. In vivo and in vitro assays with a C-terminal deletion mutant of HsdR supported the idea that this domain is involved in complex assembly and DNA binding. Conserved residues identified through sequence analysis of the C-terminal domain may play a key role in protein–protein and protein–DNA interactions. We conclude that the motor subunit of EcoR124 comprises five structural and functional domains, with the fifth, the C-terminal domain, revealing a unique fold characterized by four conserved motifs in the IC subfamily of Type I restriction-modification systems. In summary, the structural and biochemical results reported here support a model in which the C-terminal domain of the motor subunit HsdR of the endonuclease EcoR124 is involved in complex assembly and DNA binding.

Restriction (R) enzymes, or restriction endonucleases, are a diverse group of site-specific DNA cutters that act in tandem with methyltransferases (MTases) capable of DNA modification (M) through nucleotide methylation. Tens of thousands of putative restriction endonucleases and MTases have been identified throughout the bacterial and archaeal kingdoms, and thousands of restriction-modification (R-M) proteins have been biochemically or genetically characterized (1). The staggering diversity of R-M enzymes is attributed to their role as a prokaryotic "immune system" (2): these proteins recognize specific DNA sequences, read out their methylation state, and act accordingly, either by cutting unmethylated DNA ("foreign DNA," e.g. from bacteriophages) or by methylating the daughter DNA strand generated during the preceding replication event, thereby protecting the cell's own DNA from degradation. R-M enzymes have also been implicated in horizontal gene transfer (3), differential gene expression through epigenetic modification (4,5), and divergent evolution (6,7). The practical impact of R-M enzymes on genetic engineering and DNA fingerprinting can hardly be overstated (8).
R-M enzymes are historically divided into four types (I-IV) based on their subunit composition and their biochemical, genetic, and other properties (1,9). EcoR124 is one of the best-described Type I R-M enzymes (10), along with EcoAI (11), EcoBI (12), and EcoKI (13). Two key features of the Type I enzymes are their multimeric composition, in which one specificity subunit (HsdS, S) and two modification subunits (HsdM, M) form the trimeric MTase (M₂S) that can be supplemented by two restriction subunits (HsdR, R) to complete the pentameric holoenzyme (R₂M₂S), and their ability to translocate DNA before cleaving it at a distant, random site (14). Both characteristics imply structural and functional complexity, which has indeed become apparent over decades of research. Type I enzymes are split into five families, A-E, based on complementation tests, antibody cross-reactivity, and sequence similarity (8). EcoR124 belongs to the Type IC family, along with EcoprrI, EcoDXXI, and NgoAV (8).
To study DNA recognition, methylation, translocation, and cleavage, attempts have been made to obtain structures of the individual Type I R-M subunits and their assemblies using X-ray crystallography and EM techniques. These attempts have met with varying success. Structures have been solved for MTase.EcoKI at low resolution (~18 Å; PDB codes 2y7c and 2y7h (15)), for the DNA-bound MTase from Thermoanaerobacter tengcongensis at 3.2 Å (PDB code 5YBB (16)), and for unbound HsdM (PDB codes 2AR0, 3KHK, 2LKD, and 3UFB (17)), HsdS (PDB codes 1YF2 (18), 1YDX (19), and 3OKG (20)), and HsdR subunits (PDB codes 2W00 (21) and 3H1T (22)) from several microbial species. However, the structural information available to date is incomplete and fragmented: structures of the subunits often originate from different R-M systems or even different species, and some are only partially resolved.
The first structure of HsdR was obtained from EcoR124 in 2009 (PDB code 2W00 (21)), followed in later years by structures of four single-point mutants (PDB codes 4BE7, 4B4B, and 4BEC (23) and 4XJX). HsdR was described as having four domains (an endonuclease domain, two RecA-like helicase domains, and a helical domain); however, a large stretch of the C terminus (residues 893-1038) was not resolved and was described as disordered (21). It was initially suggested that these residues form part of the fourth, helical domain. For the related Type I R-M enzyme EcoKI, partial proteolysis experiments showed that cleaving off the 155 C-terminal residues of HsdR produced a fragment that was stable in the presence of ATP but unable to bind to MTase (24). De novo modeling of HsdR from EcoR124 supported the notion that the C terminus actively participates in complex formation but revealed little about its structural organization beyond a relatively short coiled-coil fragment and an overall disordered nature (25). Another known crystal structure of HsdR, from Vibrio vulnificus, also lacks ~220 C-terminal residues and folds into three domains (an endonuclease domain and two RecA-like helicase domains), providing no further insight into the 3D structure of the C terminus. A structure of the full EcoR124I complex was derived using EM and small-angle scattering (neutron and X-ray) (26). However, the ring-like shape together with a resolution of only 21 Å does not allow HsdR to be oriented unambiguously in the EM map. Additionally, the HsdM subunit used for fitting into the EM map was a low-resolution model based on a distant homolog, with missing parts modeled de novo. Although this model appropriately describes the open and closed conformations of the full pentameric complex, it is not suitable for exploring the HsdM-HsdR interface.
In the absence of a crystal structure, it has become apparent that atomic-level structural information could shed first light on the way HsdR subunits attach to the MTase to form a holoenzyme capable of DNA translocation and cleavage.
Bioinformatic analysis of HsdR is hindered by the low sequence identity (~23-27%) between the classified HsdR subunits of different Type I R-M families (8). Within the Type IC family, full sequences are available only for EcoR124 and NgoAV from Neisseria gonorrhoeae (27), which display ~75% sequence identity. An alignment between members of different families was nevertheless achieved for the endonuclease and helicase I and II domains because of the presence of clearly defined motifs (21,25). The helical and C-terminal domains exhibit an even lower sequence identity than the rest of the protein, precluding direct identification of conserved regions by multiple alignment. So far, no motifs or folds have been identified for these two domains, although they have been assigned to the α-helical class (21).
The crystal structure of the C terminus of HsdR from EcoR124, obtained here in the form of a pHluorin fusion protein, reveals that the C terminus forms a separate domain featuring a helical bundle with its own hydrophobic core and a unique fold. The full-length HsdR model, based on a re-refinement of the original WT crystal structure, indicates that the C-terminal domain, attached by a short linker region, points outward from the four-domain arrangement and thereby breaks the planar shape of the HsdR subunit of EcoR124. Sequence analysis of putative HsdR sequences allowed a tentative assignment of conserved amino acid motifs in the C-terminal domain. Combined with the results of biochemical and physiological experiments, the model supports the hypothesis that the C-terminal domain is involved in complex assembly and DNA binding.

Structure determination and overall architecture of the C-terminal domain
The vector design, purification, crystallization, and X-ray data analysis for the pHluorin-HsdR887 fusion protein were reported previously (28). Initial molecular replacement revealed two pHluorin molecules in the asymmetric unit. Using the Buccaneer pipeline in CCP4, 74% of all amino acids were built in eight fragments. After combining the fragments and manual model building, the C-terminal domain was resolved for one of the two molecules in the asymmetric unit. A pore of ~42-Å diameter with almost no electron density delineates the missing residues of the second molecule in the asymmetric unit (Fig. 1). In total, 382 residues were built in one molecule and 228 in the other, along with 198 solvent molecules; the structure was refined to a final resolution of 2.45 Å (see Table 1 for refinement statistics).
The following description applies to molecule A of the asymmetric unit, as it contains the resolved HsdR domain. Residues 8-244 correspond to the pHluorin sequence and residues 247-398 to the C terminus of HsdR (residues 887-1038 in HsdR numbering). Electron density was missing for residues 263-267, which were therefore modeled as a loop in YASARA. Residues with missing or poor electron density for the side chains were modeled to Cβ. Two histidines of the His₆ tag (residues 6 and 7) are also present in molecule A, along with a short linker, Glu²⁴⁵-Phe²⁴⁶, between the pHluorin and HsdR sequences. Because linker design is largely a trial-and-error process (29), a rigid two-residue linker was chosen as a first option to reduce conformational heterogeneity and promote interactions between the C-terminal domain and pHluorin. This first trial proved successful.

Novel domain of the motor subunit of Type I R-M system
Despite the prevailing assumption that the C terminus is part of the helical domain, it actually forms a separate domain characterized by a unique fold and its own hydrophobic core. Molecular dynamics simulations indicate that the domain is stable on its own in solution (data not shown). The C-terminal domain consists of six α-helices arranged in two bundles around the hydrophobic core (Fig. 1B), forming a tube-like shape with dimensions of 50 × 26 Å. According to secondary structure analysis in STRIDE (30), of the 152 residues in the C-terminal domain, 60.5% form α-helices, 6.0% 3₁₀-helices, 11.2% turns, and 19% coils, and 3.3% are unresolved. The longest α-helix is the fourth (27 residues), whereas the shortest is the third (nine residues), which is preceded by three residues forming a 3₁₀-helix. All α-helices are arranged in an antiparallel manner, except α5 and α6. These two helices are linked by a 27-residue-long coil-turn-3₁₀-helix-turn-coil-turn region, which brings helix α6 to an ~45° angle relative to α5 and almost perfectly antiparallel to α4. The N and C termini of the domain are ~27 Å apart. Six contacts between the C-terminal domain and the crystallization chaperone pHluorin are present (Asn²⁸⁴-Phe²⁴⁶ (linker), Arg²⁸⁵-Tyr⁴⁵, Asn³³³-Lys⁴⁷, Arg³⁴⁹-Thr⁴⁴, Thr³⁵¹-Asp⁸², and Lys³⁷⁶-Thr²³⁶). A fold analysis with the Dali server (31), which compares a newly solved structure against structures in the Protein Data Bank and expresses their structural similarity as a Z-score, was unable to assign the C terminus to a specific fold family. Of the 30 structures with a Z-score of 4.0 or higher reported by Dali, the highest percentage of aligned residues is 71% (PDB code 5W1H, chain A) and the lowest root mean square deviation is 2.4 Å (PDB code 2DOB, chain A). Most of the structures with a Z-score ≥ 4.0 align fewer than 50% of the residues, corresponding at best to a three-helix bundle. Hence, the C-terminal domain represents a novel six-helix bundle.
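The STRIDE percentages above are per-residue fractions of the domain. The sketch below shows that tally. It assumes a per-residue assignment string with STRIDE-style one-letter codes (H, α-helix; G, 3₁₀-helix; T, turn; C, coil) and uses a hypothetical "-" marker for residues without coordinates. The toy string is illustrative only and is not the EcoR124 assignment.

```python
from collections import Counter

# STRIDE-style one-letter codes; "-" is a hypothetical marker for unresolved residues
LABELS = {"H": "alpha-helix", "G": "3-10 helix", "T": "turn", "C": "coil", "-": "unresolved"}


def ss_composition(assignment: str) -> dict[str, float]:
    """Percentage of residues in each secondary-structure class of a per-residue string."""
    counts = Counter(assignment)
    total = len(assignment)
    # Codes not in LABELS (e.g. STRIDE "E" for strand) are reported under their own letter
    return {LABELS.get(code, code): 100.0 * n / total for code, n in counts.items()}


# Toy 20-residue assignment (illustrative only, not the EcoR124 domain)
toy = "CCHHHHHHHHTTGGGHHHH-"
for label, pct in ss_composition(toy).items():
    print(f"{label}: {pct:.1f}%")
```

For the 152-residue domain, 3.3% corresponds to five residues. This is consistent with the five residues (263-267) that lacked electron density.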

Modeling of full-length HsdR and electrostatic surface potential analysis
The extended HsdR crystal structure was obtained by combining the previous WT HsdR coordinates and structure factors (PDB code 2W00 (21)) with the structure of the C-terminal domain determined here. Using Phaser from the CCP4 software suite, the main secondary structure elements of the C-terminal domain were fitted into the electron density map, even though many side chains and some regions were not fully resolved after refinement (Fig. 1, C and D, and Table 1). The high mobility of the C-terminal domain, which formerly prevented structure determination, correlates well with the increased B-factors of the amino acids most distal from the core of the enzyme. The resulting structure represents the first experimentally determined full-length structure of HsdR deposited in the Protein Data Bank (PDB code 6H2J). Nevertheless, to calculate the electrostatic surface potential and determine hydrogen-bond interactions, a full-length HsdR model was generated that includes the missing loops, side chains, and hydrogens. To obtain more side chain information, the C-terminal domain from the pHluorin-HsdR887 fusion structure was aligned in YASARA using the Mustang software (32), and the atom coordinates were replaced with the higher-resolution information from the pHluorin-HsdR887 crystal structure. The alignment gave a root mean square deviation of 0.884 Å over 119 aligned residues of the C-terminal domain. The connecting region between the helical domain and the C-terminal domain (residues 887-902) was modified manually to fit the electron density map of the WT structure. The resulting model represents the first full-length HsdR subunit of a Type I restriction enzyme (Fig. 2) and was used for all further analyses (see Model S1).
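The RMSD quoted for the Mustang alignment is the root mean square of the distances between equivalent atoms after superposition. The sketch below covers only that final step. It assumes the two coordinate sets are already superposed and paired one-to-one, for example one Cα atom per aligned residue. The superposition itself, performed here with Mustang, is not reproduced, and the coordinates are toy values.

```python
import math

Coord = tuple[float, float, float]


def rmsd(a: list[Coord], b: list[Coord]) -> float:
    """RMSD (Å) between two equally long lists of already superposed, paired coordinates."""
    if len(a) != len(b) or not a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    sq = sum((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
             for (x1, y1, z1), (x2, y2, z2) in zip(a, b))
    return math.sqrt(sq / len(a))


# Toy example: every atom displaced by 1 Å along x, so the RMSD is 1 Å
print(rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]))
```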
The four previously identified domains of HsdR are the endonuclease domain (residues 13-260), RecA-like helicase domains 1 and 2 (residues 261-461 and 470-731), and the helical domain (residues 732-892) (21). The C-terminal fifth domain connects to the helical domain via a short linker (residues 886-888). Consequently, the helical domain should be reassigned to include residues 732-886 only, and residues 887-1038 should be assigned to the C-terminal domain. The most striking feature of the C-terminal domain is the way it is arranged with respect to the rest of the protein; although the four previously identified domains are arranged to form a square-planar body, the fifth domain protrudes outward (Fig. 2).
Calculations of the common molecular surface between the C-terminal domain and the other domains of HsdR yield a common surface of 39.67 Å² with the helicase 2 domain and 142.86 Å² with the helical domain. The common surface with the helicase 2 domain is 57.5% hydrophobic, and that with the helical domain is 57.9% hydrophobic. The endonuclease and helicase 1 domains do not share a common molecular surface with the C-terminal domain.
Potential interactions with the helicase 2 domain are as follows: Glu⁸⁹⁹ and Lys⁵⁷², with a 3.3-Å distance between Cδ of the glutamate and Cε of the lysine; Gln⁹⁰³ and Thr⁵⁷⁰, with a 3.6-Å distance between the nitrogen of the amino group of the glutamine and the backbone oxygen of the threonine; and Tyr⁸⁹² and Pro⁵³³, with a 3.5-Å distance between the oxygen of the tyrosine and the Cβ atom of the proline. The helical domain has the following potential interactions with the C-terminal domain: a hydrophobic interaction between Leu⁹²² and Leu⁸⁸³ (3.5 Å between the Cδ1 atoms); an interaction between Ser⁹²⁰ and Gln⁸⁴⁴, with a distance of 3.4 Å between the backbone oxygen of the serine and the amino group nitrogen of the glutamine; and a hydrogen bond of 2.1 Å between the backbone oxygen of Leu⁹²² and Hδ2 of Asn⁸⁵¹. Another potential interaction zone for the helical domain is the 858-862 loop, which comes as close as 3.5 Å (measured between Cδ of Tyr⁹⁹¹ and the backbone oxygen of Arg⁸⁵⁹) to the potentially flexible loop-coil region connecting helices 5 and 6 in the C-terminal domain. The potential interactions and common molecular surfaces of the C-terminal domain are shown in Fig. 3.
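Candidate contacts like those listed above are typically identified by listing inter-domain atom pairs that lie closer than a distance cutoff. The sketch below assumes atoms are supplied as (label, x, y, z) records in Å. The 3.6-Å default matches the largest heavy-atom distance quoted above. The coordinates are hypothetical and are not taken from PDB entry 6H2J.

```python
import math
from itertools import product

Atom = tuple[str, float, float, float]  # (label, x, y, z) in Å


def close_pairs(dom1: list[Atom], dom2: list[Atom], cutoff: float = 3.6):
    """Return (label1, label2, distance) for inter-domain atom pairs within cutoff, closest first."""
    hits = []
    for (l1, *p1), (l2, *p2) in product(dom1, dom2):
        d = math.dist(p1, p2)  # Euclidean distance (Python 3.8+)
        if d <= cutoff:
            hits.append((l1, l2, round(d, 2)))
    return sorted(hits, key=lambda h: h[2])


# Hypothetical coordinates for illustration only
c_term = [("Glu899 CD", 0.0, 0.0, 0.0), ("Tyr892 OH", 10.0, 0.0, 0.0)]
helicase2 = [("Lys572 CE", 3.3, 0.0, 0.0), ("Pro533 CB", 20.0, 0.0, 0.0)]
print(close_pairs(c_term, helicase2))
```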
Electrostatic surface potential calculations were carried out for the full structure (Fig. 2). A prominent groove between the endonuclease and C-terminal domains (and, partially, the helical domain) presents a predominantly positively charged protein surface that extends the DNA-binding cleft between the two helicase domains (Fig. 2, A and C). This groove could potentially bind DNA, bringing the dsDNA close to the catalytic residues Asp¹⁵¹, Glu¹⁶⁵, and Lys¹⁶⁷ in the endonuclease domain.

Bacteriophage restriction assay with the C terminus deletion mutant of HsdR
A standard bacteriophage restriction test was used to compare the C terminus deletion mutant with WT HsdR in terms of restriction activity and MTase binding (33). The pTrcR124 plasmid (34) carrying a gene for either WT or mutated HsdR was introduced into a restriction-deficient (JM109(DE3)/pACMS r⁻m⁺) or restriction-proficient host (JM109(DE3)/pKF650 r⁺m⁺) as described under "Experimental procedures." Positive and negative complementation tests described previously (35) were employed to distinguish between DNA cutting deficiency and inability to assemble with MTase, respectively. The results in Table 2 represent relative efficiency of plating (EOP) values, calculated as the number of plaques relative to that on a nonrestricting strain (r⁻m⁺), JM109(DE3).
The C terminus deletion mutant of HsdR shows a restriction-deficient phenotype in the r⁻m⁺ strain, with an EOP of ~0.7 (values from 0.1 to 1 correspond to a nonrestricting strain), whereas the WT protein complements the r⁻m⁺ strain, showing an ~100-fold decrease in phage growth (EOP, 0.0084). This difference in DNA cutting ability can be explained in at least two ways: inability of the mutant to function properly as a DNA cutter or defective interaction with MTase. A negative complementation experiment using the r⁺m⁺ strain, in which the mutant and WT HsdRs compete for MTase binding, supports the latter explanation. Similar EOP values for the C-terminal deletion mutant and full-length HsdR in this test (0.0041 and 0.0075, respectively) show no significant competition between the subunits, leaving restriction proficiency almost intact for the strain expressing both mutant and WT HsdR.
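EOP is the ratio of plaque counts on a test strain to those on the nonrestricting reference strain, and the fold decrease in phage growth is its reciprocal. The sketch below illustrates this arithmetic. The plaque counts are hypothetical. The 0.1 threshold comes from the text's statement that EOP values of 0.1-1 correspond to a nonrestricting strain.

```python
def eop(plaques_test: int, plaques_reference: int) -> float:
    """Efficiency of plating: plaques on the test strain / plaques on the nonrestricting strain."""
    if plaques_reference <= 0:
        raise ValueError("the reference strain must yield plaques")
    return plaques_test / plaques_reference


def phenotype(value: float) -> str:
    """Classify using the text's criterion that EOP values of 0.1-1 indicate no restriction."""
    return "nonrestricting" if value >= 0.1 else "restricting"


# Hypothetical plaque counts: 42 on the test strain vs. 5000 on JM109(DE3)
v = eop(42, 5000)
print(v, phenotype(v), f"{1 / v:.0f}-fold decrease in phage growth")
```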

Electrophoretic mobility shift assay with C terminus deletion HsdR mutant
Further indication of the involvement of the C-terminal domain in binding to MTase comes from direct observation of complex formation by purified proteins in an electrophoretic mobility shift assay (EMSA). In this experiment, ³²P-labeled duplex oligonucleotides were mixed with MTase, and WT or C terminus deletion mutant HsdR was added in increasing concentrations. The position of the labeled DNA duplex, visualized on a native polyacrylamide gel, shifts according to the physical properties of the protein complexes bound to it, which makes it possible to track the interactions between MTase and the HsdR subunit. Fig. 4 shows the formation of distinct S₁M₂R₁ and S₁M₂R₂ complexes for WT HsdR as the ratio of HsdR to MTase increases. By contrast, only the S₁M₂ complex forms with the mutant HsdR, showing that no interaction occurs up to a 3× molar excess of HsdR over MTase.

Identification of conserved regions in the C-terminal domain
Because direct multiple alignment between the characterized HsdRs of the Type I families is impossible owing to the low sequence identity of the helical and C-terminal domains, a different approach using putative enzymes was implemented here. All available putative HsdR sequences were downloaded from REBASE, and redundant sequences were removed as described under "Experimental procedures." The final trimmed dataset contained 8851 sequences, of which ~92% were 20-40% identical to R.EcoR124, whereas other identity ranges are much less populated (Fig. 5). This dataset was used to identify conserved motifs in the C terminus, with Clustal Omega generating the multiple alignments (Fig. 6). The identified motifs were then cross-checked in all available sequence data based on pairwise alignment to R.EcoR124 to evaluate their conservation (Fig. 5). The main motifs identified in sequences with as little as 40-45% identity to EcoR124 were as follows: ⁸⁸⁷EXNXDYIL⁸⁹⁴, ⁹²⁵RXKXXLXXXFI⁹³⁵, ⁹⁹⁶G-¹⁰⁰⁴PXXS¹⁰⁰⁷, and ¹⁰¹⁶KKXXXXXK¹⁰²³ close to the C terminus. These motifs appear to have specific roles in the C-terminal domain. Motif ⁸⁸⁷EXNXDYIL⁸⁹⁴ is part of the N-terminal "hinge" region connecting the helical and C-terminal domains (Fig. 7A), with Tyr⁸⁹² contacting Pro⁵³³ of the helicase 2 domain (with a distance of 2.7 Å between C of Tyr⁸⁹² and Hβ of Pro⁵³³). Motif ⁹²⁵RXKXXLXXXFI⁹³⁵ flanks the α3 helix (Fig. 7B). Arg⁹²⁵ and Lys⁹²⁷ form a nearly helical structure just before the first real helix turn, and Phe⁹³⁴ and Ile⁹³⁵ lie in the C-terminal turn of the helix, contacting helices 4 and 2, respectively. The conserved Leu⁹³⁰ faces and contacts helix 4. Motif ⁹⁹⁶G-¹⁰⁰⁴PXXS¹⁰⁰⁷ (Fig. 7C) is part of the 24-amino-acid-long loop connecting helices 5 and 6, with Pro¹⁰⁰⁴ and Ser¹⁰⁰⁷ forming a local structure in a part of the loop that passes through a cleft formed by helices 4 and 6.
In the fourth motif, ¹⁰¹⁶KKXXXXXK¹⁰²³ (Fig. 7D), Lys¹⁰¹⁶ contacts Glu⁹⁷² (H-bond donor-acceptor distance, 2.76 Å) in the last turn of helix 4, with Glu⁹⁷² being conserved throughout the analyzed sequences. Lys¹⁰²³ has a very similar interaction, contacting Glu⁹⁷¹ (H-bond donor-acceptor distance, 2.76 Å) in the last turn of helix 4, next to the conserved Glu⁹⁷². However, Glu⁹⁷¹ is less conserved: it is either aspartic acid or glutamic acid in sequences with around 40-50% identity, but it does not provide a negative charge in sequences with higher identity (70-75%). Lys¹⁰¹⁷ is oriented away from the protein, pointing into bulk water, and is thus a potential candidate for contacting another subunit or DNA.
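Binning sequences by their identity to R.EcoR124, as in Fig. 5, requires one identity value per pairwise alignment. The sketch below makes two assumptions. First, percent identity is computed over aligned columns in which neither sequence has a gap; this is one of several common definitions, and the exact definition used for Fig. 5 is not stated here. Second, bins are 5 percentage points wide. The reference fragment and aligned hits are hypothetical.

```python
from collections import Counter


def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity over gap-free columns of two equal-length aligned sequences."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    cols = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    if not cols:
        return 0.0
    return 100.0 * sum(x == y for x, y in cols) / len(cols)


def identity_bin(pct: float, width: int = 5) -> str:
    """Bin label such as '40-45'; a value of 100% falls into the top bin."""
    lo = min(int(pct // width) * width, 100 - width)
    return f"{lo}-{lo + width}"


# Hypothetical pairwise alignments against a hypothetical reference fragment
reference = "EINPDYILKR"
hits = ["EVNADYIL-R", "QLNKEFAVKR", "EINPDYILKK"]
print(Counter(identity_bin(percent_identity(reference, h)) for h in hits))
```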

The multiple alignment in Fig. 6 shows a sample of HsdR sequences selected on the basis of alignment scores and phylogenetic representativeness. So far, 37 Type I enzymes have been characterized biochemically, and the HsdR sequence is available for only ten of them (EcoKI, EcoAI, StySBLI, NgoAV, KpnBI, KpnAI, CjeFII, CjeFIV, and the N-terminal fragments of EcoprrI and StySKI). All four motifs were found only in NgoAV (IC family). The other IC member, EcoprrI, could not be analyzed because only a 313-amino-acid N-terminal fragment has been sequenced. The other sequences shown in the alignment are putative HsdRs: one has a high sequence identity similar to that of NgoAV, whereas the others represent lower identities in the range of 44-55%. Among all putative HsdRs, 3.5% of the sequences were more than 65% identical, probably representing subfamily IC (Fig. 5). This distribution has its maximum at 70-80%, with 232 of 308 sequences in that range. Nevertheless, even in sequences with a sequence identity as low as 40%, all four motifs are conserved in ~7% of all sequences. Here, motif ⁸⁸⁷EXNXDYIL⁸⁹⁴ shows the highest conservation, being present in 78% of all sequences with a sequence identity of 40-45%. Motifs ⁹²⁵RXKXXLXXXFI⁹³⁵ and ⁹⁹⁶G-¹⁰⁰⁴PXXS¹⁰⁰⁷ are still present in ~50% of all sequences with a sequence identity of 40-45%. The least conservation is observed for motif ¹⁰¹⁶KKXXXXXK¹⁰²³, which appears in sequences with a sequence identity above 50%. The general threshold for assignment to the same family is a sequence identity of at least 30%. As approximately one thousand sequences in the dataset show a sequence identity of 30-40% despite not having any of the identified motifs, the novel six-helix bundle observed in the C-terminal domain is probably common to subfamily IC and several other closely related subfamilies of Type I R-M systems.
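Per-bin conservation figures such as "present in 78% of all sequences with a sequence identity of 40-45%" amount to counting, within each identity bin, the sequences that contain a given motif. The sketch below assumes motifs are converted to regular expressions in which X matches any residue, and that these are searched anywhere in ungapped sequences. Unlike the analysis above, this ignores the alignment positions that anchor each motif. The gapped ⁹⁹⁶G-¹⁰⁰⁴PXXS¹⁰⁰⁷ motif is omitted because its notation does not fix the spacing between G and PXXS. The records are hypothetical.

```python
import re
from collections import defaultdict

# Motifs in EcoR124 numbering; X stands for any residue
MOTIFS = {
    "887EXNXDYIL894": "EXNXDYIL",
    "925RXKXXLXXXFI935": "RXKXXLXXXFI",
    "1016KKXXXXXK1023": "KKXXXXXK",
}


def to_regex(motif: str) -> re.Pattern:
    """Translate a motif with X wildcards into a compiled regular expression."""
    return re.compile(motif.replace("X", "."))


def motif_presence(records: list[tuple[str, str]]) -> dict[str, dict[str, float]]:
    """records: (identity_bin, ungapped_sequence) pairs.

    Returns, for each bin, the percentage of its sequences containing each motif.
    """
    patterns = {name: to_regex(m) for name, m in MOTIFS.items()}
    seqs_by_bin: dict[str, list[str]] = defaultdict(list)
    for bin_label, seq in records:
        seqs_by_bin[bin_label].append(seq)
    return {
        b: {name: 100.0 * sum(bool(p.search(s)) for s in seqs) / len(seqs)
            for name, p in patterns.items()}
        for b, seqs in seqs_by_bin.items()
    }


# Hypothetical toy records, not REBASE data
records = [
    ("40-45", "AAEINPDYILAA"),
    ("40-45", "AAQINPDYILAA"),
    ("70-75", "RAKLALMNPFIGGKKAGSTEK"),
]
print(motif_presence(records))
```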

Conclusions
Although several structures of HsdR from the EcoR124 R-M system have been solved, the absence or poor quality of the electron density around the C-terminal domain barred previous efforts to obtain a robust atomic-level crystallographic structure of this region. We propose that the consistently poor electron density in this region is due to its high flexibility and, hence, irregularities in crystal packing. The fusion with ratiometric pHluorin (a pH-sensitive variant of GFP) was intended to introduce additional contacts that would make the C-terminal domain less flexible within the crystal packing. The fusion partner also helped to overcome the lack of expression observed when the C-terminal domain was expressed on its own (data not shown). By contrast, the gene of the fusion protein was easy to express, and the overproduced protein was easy to purify and crystallize (28).
Additionally, the fusion approach allowed us to obtain phases for the C-terminal domain without experimental phasing. Combining these data with the crystallographic data of the previously obtained WT structure allowed us to create the first full model of the HsdR subunit of a Type I R-M enzyme. The whole procedure further validates the crystallization chaperone approach (29).
Until now, the motor subunit has been described as comprising four structural and functional domains (61) that form a planar array, and the C-terminal region of HsdR was thought to extend the helical domain, making the square-planar arrangement more symmetrical. The crystal structure reported here, refined to 2.45-Å resolution, reveals that this part of the protein has its own hydrophobic core and forms an independent C-terminal domain with a unique α-helical fold. It further shows that this domain breaks the symmetry and planarity of the motor subunit and reshapes it by protruding outward. The C-terminal domain is probably common to subfamily IC and several other closely related subfamilies of Type I restriction-modification systems.
The results of both in vivo and in vitro assays demonstrate that the C-terminal domain is essential for the ability of HsdR to bind to MTase. Because one or two HsdRs are recruited by DNA-bound MTase after recognition of the methylation state of adenines within the recognition sequence, both protein-protein and protein-DNA interactions are expected to occur. In agreement with this, electrostatic surface potential calculations with full-length HsdR revealed a positively charged patch of residues in the C-terminal domain that extends the DNA-binding cleft between the two helicase domains. We suggest that these residues may facilitate correct positioning of the DNA at the active site of the endonuclease domain. The conserved motif ¹⁰¹⁶KKXXXXXK¹⁰²³ would be a good candidate for participating in binding to the methyltransferase. Hence, the motor subunit comprises five structural and functional domains. The fifth, the C-terminal domain, has a novel fold characterized by four conserved motifs in subfamily IC of Type I restriction-modification systems; this domain is essential for proper complex assembly and probably involved in DNA binding.

Fusion protein production and crystallization
A gene coding for the pHluorin-HsdR887 fusion protein, consisting of a His₆-tagged GFP variant, ratiometric pHluorin (36), and amino acids 887-1038 of the HsdR subunit corresponding to the C-terminal domain, was cloned and expressed, and the overproduced protein was purified and crystallized as described previously (28).
Briefly, initial crystallization trials were done using a Gryphon crystallization robot (Art Robbins Instruments) employing the crystallization screens Morpheus (Molecular Dimensions) and PEG/Ion (Hampton Research) in MRC 2-well crystallization plates (Hampton Research). For further optimization, protein and precipitant concentrations were varied, and additives from Additive Screen (Hampton Research) were added in 24-well CombiClover crystallization plates (Jena Bioscience). The sitting drop vapor diffusion technique was used for all trials (37).

Figure 6. Multiple sequence alignment of the C-terminal domain of putative and described HsdR subunits. The alignment was performed for two biochemically described Type IC HsdR subunits (EcoR124 and NgoAV) and five putative proteins with a sequence identity of ~44-71% compared with R.EcoR124. The sequences were retrieved from REBASE (1). The cladogram was produced using http://www.phylogeny.fr/ (please note that the JBC is not responsible for the long-term archiving and maintenance of this site or any other third party-hosted site) (59) and PhyML (60); the alignment was performed in Clustal Omega (58). Only the C-terminal domain (amino acids 887-1038) is shown, although the alignments were performed with full-length subunits. The numbers on top of the sequence denote the EcoR124 numbering. Conserved residues are highlighted (⁸⁸⁷EXNXDYIL⁸⁹⁴ motif, blue; ⁹²⁵RXKXXLXXXFI⁹³⁵, orange; ⁹⁹⁶G-¹⁰⁰⁴PXXS¹⁰⁰⁷, green; ¹⁰¹⁶KKXXXXXK¹⁰²³, yellow; others, gray). The residues involved in inter- and intradomain contacts are marked with triangles colored according to the residue's contact partner (the domain coloring scheme is the same as in Fig. 2). The secondary structure elements are given according to the EcoR124 sequence (gray bars, α-helix numbered as in Fig. 1B; blue bars, 3₁₀-helix; green bars, turn).

Data processing and structure determination
Data were collected at the PETRA III synchrotron radiation source (Deutsches Elektronen-Synchrotron DESY, Hamburg, Germany) on beamline P13 (38), operated by EMBL Hamburg and equipped with a PILATUS 6M-F detector (DECTRIS Ltd.). Before flash-cooling in liquid nitrogen, 1-3 µl of 50% (w/v) PEG 3350 was added to the drop containing the crystals for cryoprotection. Diffraction images were integrated with XDS (39). The space group was determined with POINTLESS from the CCP4 software package (40), and the Matthews coefficient (41) was calculated using MATTHEWS_COEF from CCP4. Initial phases were obtained by molecular replacement with Molrep (42) using a GFP structure (PDB code 1W7S (43)) as a template. Missing parts corresponding to the C-terminal domain of HsdR were traced with Buccaneer (44). The model was refined by applying cycles of restrained and rigid body refinement in Refmac5 (45) and manual model building in Coot (46). The final structure was checked with the PDB validation server and deposited in the PDB after minor corrections (PDB code 5J3N).
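The Matthews coefficient relates the unit-cell volume to the protein mass it contains: V_M = V_cell/(Z·n·M). Here Z is the number of asymmetric units in the cell, n the number of molecules per asymmetric unit (two in this crystal form), and M the molecular mass in Da. The solvent fraction is commonly estimated as 1 - 1.23/V_M. The sketch below shows this arithmetic, assuming the cell volume is already known. The numerical values are hypothetical and are not those of the pHluorin-HsdR887 crystals.

```python
def matthews_coefficient(cell_volume: float, z_asu: int, n_mol: int, mass_da: float) -> float:
    """V_M in Å^3/Da: unit-cell volume divided by the total protein mass in the cell."""
    return cell_volume / (z_asu * n_mol * mass_da)


def solvent_fraction(vm: float) -> float:
    """Approximate solvent fraction, assuming a protein partial specific volume of ~0.74 cm^3/g."""
    return 1.0 - 1.23 / vm


# Hypothetical values: 1.0e6 Å^3 cell, 4 asymmetric units, 2 molecules each, 50 kDa per molecule
vm = matthews_coefficient(1.0e6, z_asu=4, n_mol=2, mass_da=50_000)
print(f"V_M = {vm:.2f} Å^3/Da, solvent ≈ {solvent_fraction(vm):.1%}")
```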

Refinement, modeling, and surface analysis of full-length HsdR
WT HsdR coordinates and structure factors (PDB code 2W00 (21)) were used to position the C-terminal domain relative to the rest of the protein. The C-terminal domain is not resolved in PDB code 2W00. However, gel electrophoresis of dissolved crystals confirmed that the missing ~150 C-terminal residues of each monomer were present, and the electron density reveals several short disordered regions that allow the C-terminal domain to be positioned relative to the rest of the structure. To this end, the C-terminal domain and the WT HsdR structure of PDB code 2W00 were used as two independent ensembles in Phaser (47). The resulting model was trimmed in Coot. A few cycles of restrained and TLS (Translation/Libration/Screw) refinement and model building were then performed with Refmac5 (45) and phenix.refine (48). The extended WT structure including the C-terminal domain was deposited in the PDB (PDB code 6H2J).
Electrostatic surface potentials were calculated in YASARA with the particle mesh Ewald method, using a 1-nm cutoff, a maximum electrostatic surface potential of 60 kcal mol⁻¹, and a probe radius of 1.4 Å. The molecular surface shared between the C-terminal domain and each other domain was calculated as follows: the molecular surfaces of the two domains, each computed separately, were summed, and the molecular surface of the domain pair computed as a single object was subtracted from this sum. All hydrophobicity and surface area calculations were performed using the standard approach described in the YASARA tutorial (49).
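The shared-surface definition above is simple bookkeeping. For the C-terminal domain C and another domain X, S_common = S_C + S_X - S_(C+X), where S_C and S_X are the surfaces of the isolated domains and S_(C+X) is the surface of the pair computed together. The Python sketch below only performs this arithmetic on areas that have already been computed, for example in YASARA. The function names, domain labels, and numbers in the usage example are hypothetical. Some tools report half of S_common as the buried area per partner, so the convention should be checked before comparing values.

```python
def common_surface(area_c, area_x, area_pair):
    """Molecular surface shared by two domains (same units as inputs, e.g. Å^2).

    area_c, area_x: surfaces of each domain computed in isolation.
    area_pair: surface of the two domains computed together as one object.
    """
    shared = area_c + area_x - area_pair
    if shared < 0:
        raise ValueError("pair surface exceeds the sum of the isolated surfaces")
    return shared


def contacts_with_ctd(ctd_area, domain_areas, pair_areas):
    """Shared surface between the C-terminal domain and each other domain.

    domain_areas: {domain_name: isolated surface of that domain}
    pair_areas:   {domain_name: surface of that domain + CTD computed together}
    """
    return {
        name: common_surface(ctd_area, area, pair_areas[name])
        for name, area in domain_areas.items()
    }


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only
    print(contacts_with_ctd(
        9000.0,
        {"domain_A": 12000.0, "domain_B": 7000.0},
        {"domain_A": 19500.0, "domain_B": 16000.0},
    ))
```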

HsdR and MTase production
A deletion of the HsdR gene region coding for C-terminal amino acids 887-1038 was introduced by one-step PCR-based mutagenesis (51) into pTrcR124, the plasmid commonly used for HsdR overexpression (34). The forward and reverse primers were 5′-GCTGAAGTCTCAGTAGCCCAATTCGTGTTTTTC-3′ and 5′-ACGAATTGGGCTACTGAGACTTCAGCAAATCG-3′, respectively. PCR products were digested with DpnI to degrade the methylated template and were then used to transform Escherichia coli DH5α cells. Plasmid DNA was extracted from bacterial cultures using the Zyppy plasmid miniprep kit (Zymo Research). DNA sequencing verified the presence of the deletion and the absence of undesired additional mutations. Both WT and mutant HsdR were purified as described previously (34). MTase was produced using a two-step purification procedure similar to that described previously (52).
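A quick consistency check on the two deletion primers can be scripted. This sketch assumes a common one-step PCR design, in which the primers are complementary over part of their 5′ ends and carry non-overlapping 3′ extensions. It computes the length of that complementary region. The function names are hypothetical. For the primers listed above, the check reports a 26-nt overlap, which leaves 3′ extensions of 7 nt on the forward primer and 6 nt on the reverse primer.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def five_prime_overlap(forward, reverse):
    """Length of the region where the 5' ends of two primers are complementary.

    After reverse-complementing `reverse`, its 5' end becomes the 3' end
    (suffix) of the result; the function finds the longest such suffix
    that equals a prefix (the 5' end) of `forward`.
    """
    fwd = forward.upper()
    rc = reverse_complement(reverse)
    for k in range(min(len(fwd), len(rc)), 0, -1):
        if rc[-k:] == fwd[:k]:
            return k
    return 0


if __name__ == "__main__":
    fwd = "GCTGAAGTCTCAGTAGCCCAATTCGTGTTTTTC"
    rev = "ACGAATTGGGCTACTGAGACTTCAGCAAATCG"
    k = five_prime_overlap(fwd, rev)
    # overlap length, 3' tail of forward primer, 3' tail of reverse primer
    print(k, len(fwd) - k, len(rev) - k)  # 26 7 6
```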

Bacteriophage restriction assay
The in vivo restriction activity of the WT and C terminus deletion mutant HsdR was assessed by analyzing the relative EOP in a bacteriophage λ0 plaque assay. The E. coli JM109(DE3) strain (53), which lacks RecA and restriction enzyme genes, was transformed with one of two plasmids (54): pACMS, carrying the HsdS and HsdM genes coding for MTase, or pKF650, carrying the HsdS, HsdM, and HsdR genes. For a positive complementation test, the JM109(DE3)/pACMS strain was further transformed with pTrcR124 (34) carrying either the WT or the C terminus deletion mutant HsdR. Similarly, the JM109(DE3)/pKF650 strain was transformed with pTrcR124 for a negative complementation test. The virulent bacteriophage (55) was propagated, and its titer was determined as described previously (33). Fresh overnight culture (0.5 ml) was mixed with 3 ml of soft agar medium (1% Tryptone, 0.5% yeast extract, 0.5% NaCl, 0.6% agar) and spread onto plates containing the corresponding antibiotics. Phage dilutions (10² to 10⁶ pfu/ml) were then spotted onto the plates, followed by overnight incubation at 37°C. The average EOP was calculated relative to the nonrestricting JM109(DE3) strain from at least three independent experiments.
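The EOP arithmetic can be made explicit. In this minimal sketch, each titer equals the plaque count in a spot, multiplied by the dilution factor and divided by the spotted volume. The relative EOP is the titer on the test strain divided by the titer on the nonrestricting JM109(DE3) control, and the reported value is the arithmetic mean over independent experiments. The spot volume, plaque counts, and function names are hypothetical. EOP values can span several orders of magnitude, so a geometric mean is a common alternative to the arithmetic mean used here.

```python
from statistics import mean


def titer_pfu_per_ml(plaques, dilution_factor, spot_volume_ml):
    """Phage titer (pfu/ml) from a countable spot.

    dilution_factor: e.g. 1e4 for a 10^-4 dilution of the stock.
    """
    return plaques * dilution_factor / spot_volume_ml


def relative_eop(test_titer, control_titer):
    """EOP of a strain relative to the nonrestricting control (control = 1)."""
    return test_titer / control_titer


def average_eop(experiments, min_replicates=3):
    """Mean relative EOP over independent experiments.

    experiments: list of (test_titer, control_titer) pairs, one per experiment.
    The replicate check mirrors "at least three independent experiments".
    """
    if len(experiments) < min_replicates:
        raise ValueError(f"need at least {min_replicates} independent experiments")
    return mean(relative_eop(t, c) for t, c in experiments)


if __name__ == "__main__":
    # Hypothetical counts: 10-µl spots, control counted at 10^-6, test at 10^-3
    control = titer_pfu_per_ml(40, 1e6, 0.01)
    test = titer_pfu_per_ml(12, 1e3, 0.01)
    print(f"relative EOP = {relative_eop(test, control):.1e}")
```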

EMSA
Binding of the EcoR124 complex to DNA was assessed in vitro by following the retardation in electrophoretic mobility of a radioactively labeled 30-mer duplex oligonucleotide (5′-CGTGCAGAATTCGAGGTCGACGGATCCGGG-3′). This oligonucleotide contains the EcoR124 recognition site GAATTCGAGGTCG, which matches the GAA(N6)RTCG pattern. Equimolar amounts of the complementary oligonucleotides were annealed, labeled with [γ-³²P]ATP by T4 polynucleotide kinase (New England Biolabs), and purified with the QIAquick Nucleotide Removal Kit (Qiagen). Binding reactions were performed for 10 min at room temperature in a total volume of 10 μl, containing 50 mM Tris-HCl (pH 8.0), 50 mM NaCl, 10 mM MgCl₂, 1 mM DTT, 10% (v/v) glycerol, 10 nM DNA substrate, and varying concentrations of MTase and WT or mutant HsdR. DNA-protein complexes were separated on a 6% nondenaturing polyacrylamide gel run in TAE buffer (40 mM Tris, 20 mM acetic acid, 1 mM EDTA, pH 8.3) at 4°C and a field strength of 7 V/cm. The gel was vacuum-dried for 30 min at 80°C (583 Gel Dryer, Bio-Rad) and visualized on a Molecular Imager FX (Bio-Rad).
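The degenerate EcoR124 target can be written as a regular expression, which makes it easy to locate the recognition site in a substrate such as the 30-mer above. This sketch assumes the bipartite site 5′-GAA(N6)RTCG-3′, where N is any base and R is A or G, and it scans both strands. The `spacer` parameter and the function names are illustrative and do not come from any published tool.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def site_pattern(spacer=6):
    """Regex for GAA(N_spacer)RTCG; the lookahead also reports overlapping hits."""
    return re.compile(rf"(?=(GAA[ACGT]{{{spacer}}}[AG]TCG))")


def find_sites(seq, spacer=6):
    """Return sorted (start, strand, site) tuples with 0-based top-strand starts."""
    seq = seq.upper()
    pattern = site_pattern(spacer)
    n = len(seq)
    hits = [(m.start(), "+", m.group(1)) for m in pattern.finditer(seq)]
    for m in pattern.finditer(reverse_complement(seq)):
        site = m.group(1)
        # a site starting at i on the bottom strand covers top-strand
        # positions n - i - len(site) .. n - i - 1
        hits.append((n - m.start() - len(site), "-", site))
    return sorted(hits)


if __name__ == "__main__":
    oligo = "CGTGCAGAATTCGAGGTCGACGGATCCGGG"
    print(find_sites(oligo))  # [(6, '+', 'GAATTCGAGGTCG')]
```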

Sequence analysis
The putative Type I HsdR sequences for analysis were downloaded from REBASE (1); 18,073 sequences were retrieved on February 24, 2018 (see supporting information). Because many sequences represent the results of whole-genome sequencing of closely related strains of microorganisms, the redundancy of the dataset was reduced by removing nearly identical sequences from the same species (more than 98% sequence identity) and fragments (less than 700 amino acids). The dataset was trimmed using R (56) with the Biostrings package (57). Multiple sequence alignment was performed in Clustal Omega v1.2.4 (58).
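The dataset was trimmed in R with Biostrings. The Python sketch below only illustrates the same two filtering rules. First, fragments shorter than 700 amino acids are dropped. Second, a greedy pass that keeps longer sequences first removes any sequence more than 98% identical to an already-kept sequence from the same species. The greedy ordering is an assumption, not the authors' documented procedure. The ungapped identity function is a deliberately crude, hypothetical stand-in; a real pipeline would compute identity from pairwise alignments. The `identity` argument lets such a function be swapped in.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Record:
    seq_id: str
    species: str
    residues: str


def ungapped_identity(a: str, b: str) -> float:
    """Crude stand-in for alignment-based identity: fraction of identical
    residues at the same positions over the shorter sequence (no gaps)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n


def reduce_redundancy(
    records: List[Record],
    min_length: int = 700,
    max_identity: float = 0.98,
    identity: Callable[[str, str], float] = ungapped_identity,
) -> List[Record]:
    """Greedy filter: longest sequences first (stable for ties); skip fragments
    and near-duplicates of an already-kept sequence from the same species."""
    kept: List[Record] = []
    for rec in sorted(records, key=lambda r: len(r.residues), reverse=True):
        if len(rec.residues) < min_length:
            continue  # fragment
        if any(
            k.species == rec.species
            and identity(k.residues, rec.residues) > max_identity
            for k in kept
        ):
            continue  # near-identical to a kept sequence from the same species
        kept.append(rec)
    return kept
```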