The structure of the Thermococcus gammatolerans McrB N-terminal domain reveals a new mode of substrate recognition and specificity among McrB homologs

McrBC is a two-component, modification-dependent restriction system that cleaves foreign DNA containing methylated cytosines. Previous crystallographic studies have shown that Escherichia coli McrB uses a base-flipping mechanism to recognize these modified substrates with high affinity. The side chains stabilizing both the flipped base and the distorted duplex are poorly conserved among McrB homologs, suggesting that other mechanisms may exist for binding modified DNA. Here we present the structures of the Thermococcus gammatolerans McrB DNA-binding domain (TgΔ185) both alone and in complex with a methylated DNA substrate at 1.68 and 2.27 Å resolution, respectively. The structures reveal that TgΔ185 consists of a YT521-B homology (YTH) domain, which is commonly found in eukaryotic proteins that bind methylated RNA and is structurally unrelated to the E. coli McrB DNA-binding domain. Structural superposition and co-crystallization further show that TgΔ185 shares a conserved aromatic cage with other YTH domains, which forms the binding pocket for a flipped-out base. Mutational analysis of this aromatic cage supports its role in conferring specificity for methylated adenines, whereas an extended basic surface present in TgΔ185 facilitates its preferential binding to duplex DNA rather than RNA. Together, these findings establish a new binding mode and specificity among McrB homologs and expand the biological roles of YTH domains.

Modification-dependent restriction systems recognize and cleave modified DNA (1). Some enzymes like Mrr, McrA, MspJI, and McrBC are directed against methylated cytosines (2), whereas others like GmrSD and members of the PvuRts1I family show specificity toward glucosylated nucleic acids (3,4). Collectively these proteins play a role in establishing the epigenetic landscape of bacterial genomes (5) and are especially important in protecting against predatory bacteriophages, many of which incorporate modified bases into their DNA to evade detection by other defense systems (6).
McrBC is a two-component motor protein complex that was initially identified in Escherichia coli (Ec) genetic screens by its ability to restrict glucosylation-deficient mutants of T4 phage (7). EcMcrB is a 53-kDa protein with an N-terminal domain (pfam: DUF3578) that binds fully or hemi-methylated RmC recognition elements (where R is a purine base and mC is a 4-methyl-, 5-methyl-, or 5-hydroxymethyl-cytosine) (8-13) and a C-terminal extended ATPases associated with various cellular activities (AAA+) domain that binds and hydrolyzes GTP and mediates nucleotide-dependent oligomerization (14). EcMcrB exhibits a low basal GTPase activity (~0.5-1 min⁻¹) that can be stimulated ~30-40-fold via interaction with its partner EcMcrC (15), a 40-kDa protein that contains a C-terminal PD-(D/E)XK family endonuclease domain and lacks the ability to bind DNA on its own (16). Biochemical studies suggest a model for cleavage in which EcMcrB and EcMcrC assemble at two RmC sites separated by up to 3 kilobases and translocate DNA in a manner that depends on stimulated GTP hydrolysis (17). Collision of these assemblies cleaves both DNA strands near one of the RmC sites (12,18), suggesting that the complexes remain bound and translocate via DNA looping or twisting (19). These mechanochemical properties are reminiscent of type I and III restriction-modification systems, which bind DNA at nonmodified sites separated by up to thousands of base pairs and use ATP hydrolysis to power similar long-range translocation events that trigger cleavage either by collision or stalling (20).
EcMcrB achieves specificity through a base-flipping mechanism (13,21). Modified bases are rotated out of the DNA duplex and positioned into a pocket in the N-terminal domain, where they form numerous hydrogen bonds and hydrophobic interactions (Fig. S1). The concomitant insertion of a tyrosine residue (Tyr41) into the resulting gap stabilizes the duplex via base stacking. This strategy, although elegant, cannot simply be extrapolated to other McrB homologs because their N-terminal domains vary significantly in sequence, size, and predicted structural fold across different bacterial and archaeal species (see Fig. 1). In the handful of sequences that show identifiable homology to EcMcrB in this region (e.g. Rhizobium sp. CF097), the tyrosine plug is not conserved, and its mutation to the corresponding residue at that position (either alanine or glutamine) results in loss of DNA binding in vitro (21). These findings imply that McrB homologs have evolved different mechanisms for substrate binding and/or may preferentially target other sequences and modifications. In support of this, we previously showed that the N-terminal domain of Helicobacter pylori LlaJI.R1, a distant relative of the McrB family, uses a B3 domain to recognize DNA site-specifically (22).
This work was supported by National Institutes of Health Grant GM120242 (to J. S. C.) and based upon research conducted on Beamlines 24-ID-C and 24-ID-E of the Northeastern Collaborative Access Team, which is supported by National Institutes of Health Grant P41 GM103403. The Pilatus 6M detector on Beamline 24-ID-C is supported by National Institutes of Health-ORIP HEI Grant S10 RR029205. The authors declare that they have no conflicts of interest with the contents of this article. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. This article contains Table S1 and Figs. S1-S3. The atomic coordinates and structure factors (
Here we present the crystal structure of the N-terminal DNA-binding domain of Thermococcus gammatolerans McrB (TgΔ185) both alone and in complex with methylated DNA at 1.68 and 2.27 Å, respectively. TgΔ185 is structurally distinct from the EcMcrB DNA-binding domain, adopting a YTH domain fold commonly found in eukaryotic proteins that bind methylated RNA. Filter-binding experiments show that TgΔ185 does not bind RNA and instead preferentially associates with 6-methyladenosine-modified DNA. Structural characterization of the TgΔ185-DNA complex coupled with mutagenesis reveals that TgMcrB uses base flipping and an aromatic cage to recognize the modified base and an extended basic surface to associate with DNA preferentially. Together, these findings highlight a new biological function for YTH domains and underscore the notion that McrBC is a modular nuclease that can be adapted to a broad array of targets.
(Fig. 1). Of these, we chose the McrB homolog from T. gammatolerans (Tg) and purified its full-length protein (TgMcrB) and isolated N-terminal domain (TgΔ185; Fig. 2A). We reasoned that this homolog would provide new insights into McrB specificity because structural modeling algorithms failed to assign any known fold with high confidence and would be amenable to crystallographic and biochemical studies because Tg is a hyperthermophilic, radiation-tolerant archaeon with enhanced thermostability (23).

TgMcrB does not preferentially bind m5C DNA
Specificity for DNA containing methylated cytosines is a defining feature of EcMcrB (8-13). Because TgΔ185 shares little sequence homology with the EcMcrB DNA-binding domain (EcΔ155; Fig. 2B), we first asked whether it could bind 5-methylcytosine (m5C)-modified DNA substrates (Table S1). Initial characterization by analytical size-exclusion chromatography (SEC) showed that TgΔ185 forms stable complexes with m5C DNA similar to EcΔ155 (Fig. 2, C and D). To assess these interactions quantitatively, we examined the retention of radiolabeled m5C and nonmethylated (nm) DNA in the presence of full-length TgMcrB or EcMcrB on alkaline-treated nitrocellulose filter paper (24). Filter binding shows that EcMcrB has a strong preference for m5C DNA with a calculated binding constant of ~160 nM (Fig. 2E and Table 1). TgMcrB, in contrast, binds both m5C and nm DNA almost equally but with weaker affinity than EcMcrB (calculated binding constants of ~700 nM) (Fig. 2E and Table 1). These data indicate that TgMcrB is distinct from EcMcrB and displays a different sensitivity to modified DNA.

TgΔ185 adopts a YTH domain fold and preferentially binds m6A DNA
To understand the molecular basis for the observed specificity differences, we determined the crystal structure of TgΔ185 at 1.68 Å by selenium single-wavelength anomalous diffraction (SAD) phasing (25) (Fig. 3A and Table 2). TgΔ185 comprises a six-stranded β-sheet (ordered β6-β1-β3-β4-β5-β2) that is flanked by clusters of α-helices (Fig. 3B). The strands adopt a mainly antiparallel arrangement with only β1 and β3 oriented in a parallel fashion. The extended β4 strand subdivides the sheet and induces a sharp curvature that nearly folds the two opposing segments onto one another. Helical segments insert in loops that flank the β-sheet: α1 and α2 in the β1-β2 loop; α3 and α4 in the β4-β5 loop; and α5 and α6 in the β5-β6 loop. Importantly, the overall topology of the TgΔ185 fold differs from that of EcΔ155 (Fig. 3, C and D).
The DALI alignment algorithm (26) indicates that TgΔ185 shares structural homology with YT521-B homology (YTH) domains (Z score, 7.5-8.5; root-mean-square deviation, 3.0-3.5) (Fig. 3, E and F). YTH domains are conserved RNA-binding modules that specifically recognize 6-methyladenosine (m6A) modifications (27-29). In eukaryotes, m6A modifications are linked to the regulation of alternative splicing, RNA processing, mRNA degradation, and the circadian clock (30-33). Given the structural similarity to YTH domains and lack of specificity toward m5C DNA, we tested whether TgΔ185 can associate with m6A-modified RNA. Filter binding shows that although the human (Hs) YTHDC1 YTH domain specifically associates with m6A RNA (calculated binding constant of ~400 nM), TgΔ185 shows little affinity for either the methylated or nonmethylated RNA substrates (Fig. 4A and Table 1). We next asked whether TgΔ185 could bind m6A-modified DNA. Surprisingly, TgΔ185 associates more tightly with m6A dsDNA, exhibiting a ~5.5-fold increase in affinity compared with m5C or nonmethylated dsDNA substrates (Fig. 4B and Table 1). This enhancement appears to be driven solely by the modification, because single-stranded DNA oligonucleotides show the same binding profile (Fig. 4C and Table 1). These data indicate that TgΔ185 is a DNA-specific YTH domain that preferentially targets substrates containing m6A modifications.

An aromatic cage in TgΔ185 confers specificity for m6A DNA
Crystallographic studies have shown that YTH domains recognize m6A via a conserved "aromatic cage," wherein two to three aromatic residues provide stabilizing π-stacking and hydrophobic interactions (34-40). Structural superposition with the m6A-bound YTH domain from HsYTHDF2 (PDB code 4rdn; Z score, 8.5; root-mean-square deviation, 3.1) identifies Trp53, Trp115, and Phe121 as putative cage residues in TgΔ185, poised to serve as a binding site for modified bases (Fig. S2).
To confirm this hypothesis, we determined the crystal structure of TgΔ185 in complex with DNA (Fig. 5A and Table 2). Although TgΔ185 crystallized with a variety of different modified substrates, suitable diffraction could only be obtained with a 19-mer dsDNA substrate that had single-base pair overhangs and contained two mismatches flanking internal m5C modifications in each strand (meDNA; Fig. 5B and Table S1). Incorporation of mismatches did not significantly alter the binding profile of TgMcrB (Fig. 4D and Table 1). Initial maps at 2.64 Å revealed partial, discontinuous DNA density associated with each TgΔ185 monomer and strong peaks for backbone phosphates. Numerous bases throughout the duplex, however, remained poorly resolved. An incomplete model for the TgΔ185-meDNA complex was built and used for molecular replacement into a 2.27 Å resolution isomorphous data set (Table 2). The higher resolution data set yielded vastly improved phases and interpretable electron density for both a flipped-out base and the base pairs within the surrounding DNA duplex (Fig. 5, C-E).
Despite a 19-mer substrate being used for crystallization, the asymmetric unit contains a single TgΔ185 monomer bound to six base pairs of DNA (Fig. 5C, yellow). These DNA segments align end to end, forming a pseudocontinuous duplex throughout the crystal lattice that is highly distorted (Fig. 5F). TgΔ185 decorates the extended duplex along a single strand (Fig. 5C), flipping every seventh base into a pocket on the surface of the protein (Fig. 5D). The substrate length (19 nucleotides) causes a register shift across adjacent unit cells and suggests that TgΔ185 monomers throughout the lattice interact with different DNA sequences. This implies that the resulting electron density attributed to the DNA represents the average distribution of the bases over the length of the duplex rather than a single, defined sequence. A similar scenario has been observed with Streptomyces coelicolor IHF, wherein crystallization with a 19-mer DNA substrate yielded an asymmetric unit with eight nucleotides (41). During refinement, we modeled all possible sequence registers of the substrate and chose the one that yielded the lowest Rfree value and the strongest base density. The preferred sequence based on these parameters positions an adenine as the flipped-out base (Fig. 5D). The apo- and DNA-bound TgΔ185 monomers superimpose with an average root-mean-square deviation of 0.549 Å, indicating that no significant structural changes occur in the protein upon substrate binding. We do note, however, a significant widening of the major groove (Fig. 5F) that likely arises from both TgΔ185-induced base flipping (Fig. 5D) and the presence of mismatches in the DNA substrate that enhanced crystallization (Fig. 5B).
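The register argument can be illustrated with a toy calculation. The sketch below is not part of the refinement procedure; it assumes an idealized pseudocontinuous strand built from gap-free, end-to-end copies of the 19-nucleotide substrate, with successive protein monomers flipping every seventh base (both numbers are taken from the text). The function name `flipped_positions` is hypothetical and is used only for illustration.

```python
from math import gcd

SUBSTRATE_LEN = 19  # nucleotides per strand of the crystallized substrate
SPACING = 7         # every seventh base along the strand is flipped out


def flipped_positions(substrate_len, spacing, n_monomers):
    """Return the 0-based substrate position flipped by each successive monomer.

    Assumes (for illustration only) that substrate copies stack end to end
    with no gaps, so position k * spacing on the continuous strand maps back
    to (k * spacing) mod substrate_len within a single substrate copy.
    """
    return [(k * spacing) % substrate_len for k in range(n_monomers)]


positions = flipped_positions(SUBSTRATE_LEN, SPACING, SUBSTRATE_LEN)
distinct = sorted(set(positions))

print("first five registers:", positions[:5])
print("distinct registers sampled:", len(distinct))
print("gcd(19, 7) =", gcd(SUBSTRATE_LEN, SPACING))
```

Because 19 and 7 share no common factor, this simplified model visits every position of the substrate before repeating, which is consistent with the interpretation that the observed DNA density is an average over many sequence registers.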
As predicted, Trp53, Trp115, and Phe121 form an aromatic cage that stabilizes each flipped-out adenine base (Fig. 6A). The organization of this pocket mirrors the stabilization of m6A in the HsYTHDC1 YTH domain-m6A ssRNA complex (PDB code 4r3i; Z score, 7.7; root-mean-square deviation, 3.3) (Fig. 6B). In HsYTHDC1, mutation of either cage tryptophan (W377A or W428A) completely abolishes m6A RNA binding (38). To assess how the Tg aromatic cage contributes to DNA binding and modified base recognition, we engineered a triple alanine mutant (W53A/W115A/F121A) in full-length TgMcrB and measured how this construct interacts with m6A-modified dsDNA by filter binding (Fig. 6C). W53A/W115A/F121A shows a ~5.3-fold reduction in binding relative to WT (Table 1). This finding was corroborated using electrophoretic mobility shift assays (EMSAs) to measure the association of TgMcrB with digested m6A-methylated (dam+) and nonmethylated (dam−) phage DNA (Fig. 7). We observe a significant gel shift with WT TgMcrB on m6A DNA (Fig. 7A) with higher affinity compared with nonmethylated DNA (Fig. 7B). The W53A/W115A/F121A triple mutant, however, significantly impairs binding to m6A DNA (Fig. 7, C versus A) but not to nonmethylated DNA (Fig. 7, D versus B). Importantly, these changes reduce binding to a level that is comparable with WT TgMcrB's affinity for m5C or nm DNA (Table 1). This suggests that the cage residues primarily confer specificity for methylated adenines and that other structural features mediate the preferred association with DNA. Although Glu16 and Asn19 also form hydrogen bonds to the flipped-out base (Fig. S3A), disruption of these interactions by mutagenesis (E16A/N19A) has no significant effect on m6A-DNA binding (Fig. S3B and Table 1).
Figure legend (partial): See Table S1 for substrate sequences. The data points represent the averages of at least three independent experiments (means ± S.D.). Binding constants were determined by nonlinear curve fitting using Kaleidagraph (Synergy Software) and defined as the concentration of the protein at which 50% of the labeled DNA substrate is retained. Calculated Kd values are listed in Table 1.

An expanded basic patch facilitates TgMcrB DNA binding
HsYTHDC1 and TgΔ185 both contain a basic patch surrounding the aromatic cage that interacts with the negatively charged backbone of nucleic acids (Fig. 6, D and E). The area of this interaction surface is dramatically increased in TgΔ185 (Fig. 6E, dashed magenta circle), which facilitates the binding of a duplex rather than a single strand of nucleic acid. Several arginines within these patches contact the bound substrate in each structure (Fig. S3, C-E). In HsYTHDC1, Arg475 stabilizes the gap caused by base flipping and π-stacks with the G−1 base, whereas Arg404 engages the phosphate backbone (Fig. S3C). Arg475 appears to be more critical, because mutation of this side chain to alanine decreases binding affinity over 100-fold (38). In TgΔ185, Arg78 and Arg81 engage the major groove near the flipped-out base, whereas Arg55 and Arg162 contact the phosphate backbone on opposite strands (Fig. S3, D and E). Mutation of both Arg78 and Arg81 to alanine, surprisingly, has no effect on DNA binding as measured by filter binding (Fig. S3B and Table 1) despite a spatial orientation analogous to that of Arg475 in HsYTHDC1 (Fig. S3, C and D). The R78A/R81A double mutant, however, shows reduced affinity for both methylated (Fig. 7E) and nonmethylated (Fig. 7F) phage DNA via EMSA, suggesting that these side chains may play a role in mediating alternative structural contacts with DNA.
Tyr61 and Asn82 also contribute to the TgΔ185-binding surface, forming a wedge that packs into the major groove of the bound DNA substrate (Fig. S3F). A double mutant removing this wedge (Y61A/N82A) surprisingly increases DNA binding by 7-fold (Fig. S3B and Table 1). Together, these data highlight the numerous structural differences that distinguish the TgMcrB N terminus from other YTH domains and contribute to the specific recognition of DNA.

Discussion
Chemical modifications in nucleic acids serve as important markers that critically control a wide array of cellular processes (42). DNA modifications are central to the epigenetic regulation of gene expression and transcriptional events (43,44), activation of DNA repair pathways (45,46), and defense machineries that underlie the ongoing arms race between bacterial hosts and predatory bacteriophages (1,6). m6A modification of RNA affects stem cell pluripotency, cancer, splicing, circadian rhythm, immunity, sex determination, and viral replication (27-29). Recent structural and biochemical studies have established that eukaryotic YTH domains act as "readers" of this RNA methylation and orchestrate the recruitment of different effector complexes to these sites (47). Here we showed that the N-terminal domain of the archaeal McrB homolog from T. gammatolerans (TgΔ185) adopts a YTH domain fold and shows a preference for m6A-modified DNA in vitro. This specificity sets it apart from every other YTH domain and expands the potential capabilities for how this fold can be utilized in nature. It remains to be seen whether TgΔ185 is an outlier among the family that simply co-opted this binding module or whether other noneukaryotic YTH domains exist and share a similar propensity for targeting DNA. A more exhaustive bioinformatic analysis coupled with structural and biochemical validation will be necessary to clarify this in the future.
Canonical YTH domains recognize m6A modifications using base flipping and an aromatic cage that when mutated completely abolishes RNA binding (34-40). Our structural data indicate that TgΔ185 employs the same general strategy. The contribution of the aromatic cage to the overall substrate binding, however, is less significant than in other YTH domains: cage mutations only impair binding to m6A DNA by ~5-fold, reducing it to a level that approaches TgMcrB's intrinsic affinity for nonmethylated DNA (Figs. 6C and 7C and Table 1). We also note that TgΔ185 contains two arginine residues (Arg78 and Arg81) that are spatially conserved near the flipped-out base. In other YTH domains, one or more of these residues are important in sequence-specific recognition of the −1 base immediately upstream of the modified base (35,36,38,39). Interestingly, mutation of these residues has little influence on DNA binding as measured by filter binding but causes a drastic decrease in binding to both m6A and nonmethylated phage DNA as measured by EMSA. We interpret the disparity between the two assays as being a consequence of additional sequence specificity exhibited by TgMcrB that manifests when exposed to the greater sequence diversity present in the pool of lambda fragments. Moreover, unlike the aromatic cage triple mutant that only displays decreased affinity for m6A DNA, the R78A/R81A double mutant impacts both m6A and nonmethylated DNA binding. This suggests that these side chains form alternative contacts to DNA that are independent from m6A recognition. These findings argue that the aromatic cage primarily dictates TgMcrB's preferred specificity for the m6A modification and that overall DNA binding is mediated by other structural features. To this end, we observe that TgΔ185 contains an extended basic surface that associates with the second strand of the DNA duplex (Fig. 6, D and E). These subtle structural differences further distinguish TgΔ185 from other YTH domains. Although aromatic cage mutations reduce DNA binding, the Y61A/N82A double mutant increases TgMcrB's affinity for m6A DNA by nearly 7-fold (Fig. S3B and Table 1). Together, these side chains shape the contours of the binding surface and form a wedge into the major groove of the bound DNA (Fig. S3F). We hypothesize that removing these features may relax the structural constraints needed for binding and may increase tolerance for different substrates, similar to how distortions in the substrate caused by mismatched base pairs helped facilitate stable interactions in the crystal lattice.

Table 1. Dissociation constants from filter-binding experiments
The binding constants (Kd) were determined by nonlinear curve fitting using Kaleidagraph (Synergy Software) and defined as the concentration of the protein at which 50% of the labeled DNA substrate is retained. The curves were fit to data points that were the averages of three independent experiments (means ± S.D.). The error values were calculated automatically in Kaleidagraph (Synergy Software) and represent the overall percentage deviations of the data from the final curve fit.
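For readers who want to reproduce this type of analysis, the sketch below shows one way a binding constant can be extracted from filter-binding data. It is not the authors' Kaleidagraph procedure: the single-site hyperbolic model, the fitted plateau, the golden-section search, and all function names are assumptions, and the data points are synthetic values generated from the model itself. Note that the fitted Kd is the half-maximal concentration; it equals the concentration at which 50% of the labeled DNA is retained only when the plateau is 1.

```python
import math


def fraction_retained(protein_nM, kd_nM, plateau=1.0):
    """Single-site hyperbolic isotherm (an assumed model, not stated in the text)."""
    return plateau * protein_nM / (kd_nM + protein_nM)


def best_plateau(conc, retained, kd_nM):
    """Closed-form least-squares plateau for a fixed Kd."""
    shape = [c / (kd_nM + c) for c in conc]
    return sum(s * y for s, y in zip(shape, retained)) / sum(s * s for s in shape)


def squared_error(conc, retained, kd_nM):
    """Sum of squared residuals at a given Kd, using the best plateau for that Kd."""
    plateau = best_plateau(conc, retained, kd_nM)
    return sum((y - fraction_retained(c, kd_nM, plateau)) ** 2
               for c, y in zip(conc, retained))


def fit_kd(conc, retained, lo_nM=1.0, hi_nM=1e5, iterations=100):
    """Golden-section search over log10(Kd) for the minimum squared error."""
    ratio = (math.sqrt(5) - 1) / 2
    a, b = math.log10(lo_nM), math.log10(hi_nM)
    for _ in range(iterations):
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if squared_error(conc, retained, 10 ** c) < squared_error(conc, retained, 10 ** d):
            b = d
        else:
            a = c
    kd = 10 ** ((a + b) / 2)
    return kd, best_plateau(conc, retained, kd)


# Synthetic, noise-free titration (hypothetical values, not data from this study).
conc = [10, 30, 100, 300, 1000, 3000, 10000]              # protein, nM
retained = [fraction_retained(c, 700.0, 0.9) for c in conc]

kd, plateau = fit_kd(conc, retained)
print(f"fitted Kd = {kd:.0f} nM, plateau = {plateau:.2f}")
```

With real measurements, the same functions would be applied to the averaged retention values at each protein concentration; a dedicated fitting package would additionally report parameter uncertainties, which this sketch does not attempt.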
Bacterial McrBC homologs function as defense systems that restrict foreign bacteriophage DNA (2,7). Despite sharing conserved AAA+ motor and nuclease machineries, each complex characterized to date exhibits a unique specificity that is determined by the nonconserved N-terminal domain of its associated McrB protein (Fig. 1). Thus, although E. coli McrB targets DNA containing methylated cytosines via a DUF3578 fold (8-13), more distantly related family members like LlaJI, LlaI, and BsuMI recognize DNA site-specifically (48-50), using modules like a B3 domain in some instances (22). Our TgΔ185 structures and biochemical data define a new modality of binding (using a YTH domain to bind m6A-modified DNA) not previously observed or predicted for any McrB protein. Numerous archaeal viruses have been found that exhibit m6A genomic methylation and/or carry genes encoding adenine methyltransferases (51-53). This suggests that archaea like T. gammatolerans have modified the modular McrBC scaffold in response to evolutionary pressures imposed by their viral pathogens much in the same manner as their bacterial counterparts.
Because of its ability to recognize and cleave m5C DNA, EcMcrBC is commonly used as a diagnostic tool to monitor epigenetic changes underlying mammalian gene expression (54), tissue-specific development (55), and perturbation to normal methylation patterning associated with human diseases like Prader-Willi and Angelman syndromes (56) and fragile X syndrome (57). Members of the PvuRts1I family have been similarly employed to map 5-hydroxymethylcytosine modifications (58,59). Recent studies have implicated N6-methyladenine modification as an important epigenetic marker in mammalian cells (60,61). Our structural and biochemical results suggest TgMcrBC could be utilized in a similar capacity to track and map dynamic changes in patterns of m6A methylation. Further biochemical characterization of the full restriction complex will provide a platform for this application.

Identification and phylogenetic analysis of McrB homologs
Putative McrB homologs were initially identified by BLAST using the sequence of the E. coli McrB AAA+ domain to search against the Department of Energy Integrated Microbial Genomes database (62). These candidates were only considered if they contained the conserved McrB consensus motif MNXXDRS and had an adjacent McrC gene that could be confirmed by neighbor analysis. Homologs were then subdivided into groups according to their divergent N-terminal domains. A phylogenetic tree incorporating a representative from each group was generated using the Department of Energy Integrated Microbial Genomes analysis tools. Structural fold prediction for each unique N-terminal domain was carried out using the Phyre2 protein fold recognition server (63).
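The motif-based filtering step can be sketched in a few lines. This is an illustration only, not the authors' pipeline: it reads the consensus motif as MNXXDRS with X standing for any residue, the helper name `has_mcrb_motif` is hypothetical, and the two short sequences are invented for the example. The adjacent-McrC check described above is not modeled.

```python
import re

# McrB consensus motif from the text, read as MNXXDRS (X = any residue).
MCRB_MOTIF = re.compile(r"MN[A-Z]{2}DRS")


def has_mcrb_motif(protein_seq):
    """Return True if the protein sequence contains the MNXXDRS motif."""
    return MCRB_MOTIF.search(protein_seq.upper()) is not None


# Invented sequences for illustration; they are not real McrB homologs.
candidates = {
    "hypothetical_hit_1": "GSAKLMNAVDRSLLE",  # contains MNAVDRS
    "hypothetical_hit_2": "GSAKLMNAVERSLLE",  # D replaced by E, motif absent
}

kept = [name for name, seq in candidates.items() if has_mcrb_motif(seq)]
print(kept)
```

In practice the candidate sequences would come from the BLAST hits, and each retained hit would still need to be checked for a neighboring McrC gene.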

Cloning, expression, and purification of TgMcrB constructs
DNA encoding the T. gammatolerans EJ3 McrB protein (Department of Energy Integrated Microbial Genomes database code 644807740) was codon-optimized for E. coli expression and synthesized commercially by GENEART. DNA encoding full-length TgMcrB was amplified by PCR and cloned into pET21b, introducing a His6 tag at the C terminus. DNA encoding the N-terminal domain (TgΔ185, residues 1-185) was amplified by PCR and cloned into pET15bP, a modified pET15b (Novagen) plasmid in which an Hrv3C protease site (LEVLFQGP) replaces the thrombin site after the N-terminal His6 tag. Native TgMcrB and TgΔ185 constructs were transformed into BL21(DE3) cells, grown at 37°C in Terrific Broth to an A600 of 1.0, and then induced with 0.3 mM isopropyl 1-thio-β-D-galactopyranoside (IPTG) overnight at 19°C. All cells were harvested, washed with nickel load buffer (20 mM HEPES, pH 7.5, 500 mM NaCl, 30 mM imidazole, 5% glycerol (v/v), and 5 mM β-mercaptoethanol), and pelleted a second time. Pellets were flash frozen in liquid nitrogen and stored at −80°C. Selenomethionine-labeled (SeMet) TgΔ185 was expressed in minimal medium in the absence of auxotrophs as described previously (64).
Thawed pellets from 500-ml cultures were resuspended in 30 ml of nickel load buffer supplemented with 10 mM phenylmethylsulfonyl fluoride (PMSF), 5 mg of DNase (Roche), 5 mM MgCl2, and a complete protease inhibitor mixture tablet (Roche). Lysozyme was added to 1 mg/ml, and the mixture was incubated for 15 min with rocking at 4°C. The cells were disrupted by sonication, and the lysate was cleared of debris by centrifugation at 13,000 rpm (19,685 × g) for 30 min at 4°C.
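Because rotor speeds only translate between instruments through the relative centrifugal force, a small conversion helper can be useful when adapting this protocol. The sketch below uses the standard relation RCF = 1.118 × 10⁻⁵ × r × rpm², with r in centimeters; the rotor radius is not given in the text, so the code back-calculates the radius implied by the reported speed and force. The function names are hypothetical.

```python
RCF_CONSTANT = 1.118e-5  # standard factor for radius in cm and speed in rpm


def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (x g) for a given speed and rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def radius_from_rcf(rpm, rcf):
    """Rotor radius (cm) implied by a reported speed and force."""
    return rcf / (RCF_CONSTANT * rpm ** 2)


# Values reported for the lysate-clearing spin: 13,000 rpm and 19,685 x g.
implied_radius_cm = radius_from_rcf(13_000, 19_685)
print(f"implied rotor radius: {implied_radius_cm:.1f} cm")

# Round trip: the implied radius reproduces the reported force.
print(f"check: {rcf_from_rpm(13_000, implied_radius_cm):.0f} x g")
```

When a different rotor is used, the speed should be chosen to match the reported force rather than the reported rpm.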

For native and SeMet TgΔ185, the supernatant was filtered, loaded onto a 5-ml HiTrap chelating column charged with NiSO4, and then washed with nickel load buffer. TgΔ185 was eluted with an imidazole gradient from 30 mM to 1 M. Pooled fractions were dialyzed overnight at 4°C into nickel load buffer with reduced salt (50 mM NaCl) in the presence of Hrv3C protease to remove the N-terminal His tag. The sample was reapplied to a 5-ml HiTrap chelating column charged with NiSO4. The flow through was fractionated to collect cleaved TgΔ185, concentrated, and further purified by SEC using a Superdex 75 16/600 pg column.
For full-length TgMcrB, the supernatant from sonication was filtered, heated to 65°C for 20 min, centrifuged at 4,000 rpm (6,057 × g) for 10 min at 4°C, and filtered again prior to purification on a 5-ml HiTrap chelating column as described above. Pooled peak fractions were concentrated and purified further by SEC.
All proteins were exchanged into a final buffer of 20 mM HEPES, pH 7.5, 150 mM KCl, 5 mM MgCl2, and 1 mM DTT during SEC and concentrated to 5-40 mg/ml. SeMet TgΔ185 was purified similarly but was supplemented with 5 mM DTT in the SEC buffer. TgMcrB mutants were generated by QuikChange mutagenesis (Agilent Technologies) and confirmed by sequencing.

Cloning, expression, and purification of EcMcrB constructs
DNA encoding the full-length E. coli McrB protein (Uniprot P15005; Department of Energy Integrated Microbial Genomes database code 646316336) was codon-optimized for E. coli expression and synthesized commercially by GENEART. DNA encoding the full-length EcMcrB (residues 1-459) and the N-terminal domain (EcΔ155, residues 1-155) were cloned into pMAL-c2Xp, a modified pMAL-c2X (New England Biolabs) plasmid in which an Hrv3C protease site replaces the Factor Xa site after the N-terminal MBP tag. Both constructs were transformed into BL21(DE3) cells, grown at 37°C in Terrific Broth to an A600 of 1.0, and then induced with 0.3 mM IPTG overnight at 19°C. All cells were harvested, washed with TGED500 (20 mM Tris-HCl, pH 8.0, 500 mM NaCl, 1 mM EDTA, 5% glycerol (v/v), and 1 mM DTT), and pelleted a second time. The pellets were flash frozen in liquid nitrogen and stored at −80°C.
Thawed pellets from 500-ml cultures were resuspended in 30 ml of TGED500 supplemented with 10 mM PMSF, 5 mg of DNase (Roche), 5 mM MgCl2, and a complete protease inhibitor mixture tablet (Roche). Lysozyme was added to 1 mg/ml, and the mixture was incubated for 15 min with rocking at 4°C. The cells were disrupted by sonication, and the lysate was cleared of debris by centrifugation at 13,000 rpm (19,685 × g) for 30 min at 4°C. Each supernatant was filtered, loaded onto 30-40 ml of amylose resin, washed with TGED500, and eluted with TGED500 supplemented with 10 mM D-maltose. Pooled fractions were dialyzed overnight at 4°C into TGED with reduced salt (TGED50, 50 mM NaCl) in the presence of Hrv3C protease to remove the N-terminal MBP tag. Samples were then applied to a 5-ml HiTrap Q HP ion-exchange column in TGED50 and eluted with a NaCl gradient from 50 to 500 mM. Pooled fractions were concentrated and further purified by SEC using a Superdex 75 10/300 GL column. Both full-length and EcΔ155 McrB were exchanged into a final buffer of 20 mM HEPES, pH 7.5, 150 mM KCl, 5 mM MgCl2, and 1 mM DTT during SEC and concentrated to 5-40 mg/ml.
Figure 4 legend (partial): [...] Table S1 and Table 1, respectively. m5C and m6A denote 5-methylcytosine and 6-methyladenine modifications, respectively. nmC and nmA denote nonmethylated versions of the same substrates. A, filter-binding analysis of TgMcrB and HsYTHDC1 YTH domain interactions with RNA substrates. B, filter-binding analysis of TgMcrB interactions with dsDNA substrates. Binding curves from Fig. 2E are included for comparison. C, filter-binding analysis of TgMcrB interaction with different single-stranded DNA (ssDNA) substrates. D, filter-binding analysis of TgMcrB with different mismatched dsDNA substrates.

Cloning, expression, and purification of HsYTHDC1
DNA encoding the human YTHDC1 YTH domain (residues 344-509) was codon-optimized for E. coli expression, synthesized commercially by Integrated DNA Technologies, and cloned into pET15bP. The HsYTHDC1 344-509 construct was transformed into BL21(DE3) cells, grown at 37°C in Terrific Broth to an A600 of 1.0, and then induced with 0.3 mM IPTG overnight at 19°C. All cells were harvested, washed with nickel load buffer, and pelleted a second time. The pellets were flash frozen in liquid nitrogen and stored at −80°C. Thawed pellets from 500-ml cultures were resuspended in 30 ml of nickel load buffer supplemented with 10 mM PMSF, 5 mg of DNase (Roche), 5 mM MgCl2, and a complete protease inhibitor mixture tablet (Roche). Lysozyme was added to 1 mg/ml, and the mixture was incubated for 15 min with rocking at 4°C. The cells were disrupted by sonication, and the lysate was cleared of debris by centrifugation at 13,000 rpm (19,685 × g) for 30 min at 4°C. The supernatant was filtered, loaded onto a 5-ml HiTrap chelating column charged with NiSO4, washed with nickel load buffer, and eluted with an imidazole gradient from 30 mM to 1 M. Pooled fractions were concentrated and further purified by SEC using a Superdex 75 10/300 GL column. HsYTHDC1 344-509 was exchanged into a final buffer of 20 mM HEPES, pH 7.5, 150 mM KCl, 5 mM MgCl2, and 1 mM DTT during SEC and concentrated to 5-40 mg/ml.
Figure 5 legend (partial): A, [...]; Table S1) shown in two orientations. TgΔ185 is colored yellow, and bound DNA is colored wheat. B, schematic of the meDNA substrate used for crystallization with TgΔ185. Mismatched bases are colored red and indicated by arrows. C, crystal packing of TgΔ185 with meDNA. One asymmetric unit is colored yellow with the bound DNA illustrated as sticks and colored wheat. The electron density map associated with the DNA is colored light gray and illustrated as mesh. D, zoomed-in view of the electron density surrounding the flipped-out adenine base. E, zoomed-in view of the electron density surrounding base pairs within the bound DNA duplex. F, structural comparison of meDNA with B-form DNA (PDB code 1bna) illustrates deformation in the bound DNA.

Preparation of oligonucleotide substrates
All DNA and RNA substrates for analytical SEC, filter binding, and crystallization were purchased from Integrated DNA Technologies. Lyophilized nonmethylated and HPLC-purified modified single-stranded oligonucleotides were resuspended to 1 mM in 10 mM Tris-HCl and 1 mM EDTA and stored at −20°C until needed. Single-stranded oligonucleotides were 5′ end-labeled with [γ-32P]ATP using polynucleotide kinase (New England Biolabs) and then purified on a P-30 spin column (Bio-Rad) to remove unincorporated label. Duplex substrates were prepared by heating equimolar concentrations of complementary strands (denoted with the suffixes "us" and "ls" for upper and lower strands) to 95°C for 15 min, followed by cooling to room temperature overnight and purification on an S-300 spin column (GE Healthcare) to remove single-stranded DNA. Table S1 shows the sequence of each oligonucleotide used in this work.

Analytical size-exclusion chromatography
Samples (50 μl) of 100 μM EcΔ155 or TgΔ185 were mixed with m5C dsDNA in a 2:1.2 molar ratio in 20 mM HEPES, pH 7.5, 150 mM KCl, 5 mM MgCl2, and 1 mM DTT and incubated at room temperature for 10-15 min. Each reaction was fractionated via gel filtration on a Superdex 75 3.2/300 analytical SEC column equilibrated with 20 mM HEPES, pH 7.5, 150 mM KCl, 5 mM MgCl2, and 1 mM DTT. Fractions containing samples were subjected to 4-20% gradient SDS-PAGE, silver-stained for DNA, and Coomassie-stained for protein.

A, zoomed-in view of the TgΔ185 aromatic cage residues (yellow) and modeled adenine base from co-crystallized DNA substrate (wheat). B, zoomed-in view of the HsYTHDC1 aromatic cage residues (green) with bound m6A base from co-crystallized RNA substrate (wheat; PDB code 4r3i). C, filter-binding analysis of TgMcrB WT and aromatic cage mutants with m6A dsDNA (see Table S1 for sequence). The data points represent averages of three independent experiments (means ± S.D.). Binding constants were determined by nonlinear curve fitting using Kaleidagraph (Synergy Software) and defined as the concentration of the protein at which 50% of the labeled DNA substrate is retained. Calculated Kd values are listed in Table 1. D and E, electrostatic surfaces of HsYTHDC1 with bound m6A-modified ssRNA substrate (PDB code 4r3i; D) and TgΔ185 with bound mismatched dsDNA substrate (E). A yellow box is drawn around the position of the aromatic cage in both structures and indicated by arrows. The scale bar indicates electrostatic surface coloring from −3 k_BT/e_c to +3 k_BT/e_c.

Filter-binding assays
The standard buffer for the DNA-binding assays contained 25 mM MES, pH 6.5, 2.0 mM MgCl2, 0.1 mM DTT, 0.01 mM EDTA, and 40 μg/ml BSA. Binding was performed with purified full-length TgMcrB (WT or mutants) or HsYTHDC1 YTH domain at 30°C for 10 min in a 30-μl reaction mixture containing 14.5 nM unlabeled DNA and 0.5 nM labeled DNA. The samples were filtered through KOH-treated nitrocellulose filters (Whatman Protran BA 85, 0.45 μm) using a Hoefer FH225V filtration device for ~1 min. The filters were subsequently analyzed by scintillation counting on a 2910TR digital liquid scintillation counter (PerkinElmer Life Sciences). All measured values represent the average of at least three independent experiments (means ± S.D.) and were compared with a negative control to determine fraction bound. Binding constants were determined by nonlinear curve fitting using Kaleidagraph (Synergy Software) and defined as the concentration of the protein at which 50% of the labeled DNA substrate is retained. Calculated Kd values are listed in Table 1. Error values were calculated automatically in Kaleidagraph (Synergy Software) and represent the overall percentage deviations of the data from the final curve fit.
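The definition of the binding constant as the protein concentration giving 50% retention can be illustrated with a small curve-fitting sketch. This is not the Kaleidagraph procedure used in the study: the text does not state the fitted equation, so the example assumes a single-site hyperbolic isotherm with full saturation, uses a plain golden-section search in place of a general nonlinear fitter, and runs on synthetic, noise-free data generated inside the script. All names and values in it are hypothetical.

```python
import math


def fraction_bound(protein_nM: float, kd_nM: float) -> float:
    """Assumed single-site isotherm: half of the DNA is retained when [protein] = Kd."""
    return protein_nM / (kd_nM + protein_nM)


def sum_squared_error(kd_nM, conc, observed):
    """Misfit between the assumed isotherm and the measured fractions."""
    return sum((fraction_bound(c, kd_nM) - y) ** 2 for c, y in zip(conc, observed))


def fit_kd(conc, observed, lo_nM=1e-2, hi_nM=1e5, iterations=200):
    """One-parameter least-squares fit by golden-section search over log10(Kd)."""
    phi = (math.sqrt(5) - 1) / 2
    a, b = math.log10(lo_nM), math.log10(hi_nM)
    for _ in range(iterations):
        c = b - phi * (b - a)
        d = a + phi * (b - a)
        if sum_squared_error(10 ** c, conc, observed) < sum_squared_error(10 ** d, conc, observed):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return 10 ** ((a + b) / 2)


# Synthetic titration (illustrative only, not data from the paper).
TRUE_KD_NM = 50.0
conc_nM = [1, 3, 10, 30, 100, 300, 1000]
observed = [fraction_bound(c, TRUE_KD_NM) for c in conc_nM]

fitted_kd = fit_kd(conc_nM, observed)
print(f"fitted Kd: {fitted_kd:.2f} nM")
```

With real filter-binding data the plateau rarely reaches exactly 100% retention, so a second parameter for maximal binding would normally be fitted as well; it is omitted here to keep the half-saturation definition visible.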

Electrophoretic mobility shift assays
The standard buffer for the EMSAs contained 10 mM Tris-HCl, pH 8.0, 250 mM NaCl, 1 mM MgCl2, and 1 mM DTT. Binding was performed with purified full-length TgMcrB (WT or mutants) at 25°C for 30 min in a 16-μl reaction mixture containing 5 ng/μl of λ-phage DNA (purified from dam+ E. coli) or N6-methyladenine-free λ-phage DNA (purified from dam− E. coli) (New England Biolabs). All λ-phage DNA was digested with BamHI and NdeI (New England Biolabs) at 37°C for 90 min and purified via a NucleoSpin gel and PCR clean-up kit (Macherey-Nagel) prior to incubation with TgMcrB. Following incubation, the samples were analyzed on a 0.7% agarose gel in 1× TAE at 4°C and 80 V for 90 min. All gels were stained with SYBR Green (Thermo Fisher Scientific) in 1× TAE overnight at 25°C and visualized using a Bio-Rad Gel Doc EZ imager system.

Crystallization, X-ray data collection, and structure determination
SeMet TgΔ185 was crystallized by sitting-drop vapor diffusion in 0.1 M MES, pH 6.5, 3.2 M (NH4)2SO4 with a drop size of 2 μl and a reservoir volume of 650 μl. Crystals appeared within 6-8 days at 20°C and were of the space group C2 with unit cell dimensions a = 67.84 Å, b = 43.99 Å, c = 61.96 Å, α = 90.00°, β = 120.28°, and γ = 90.00°. The samples were cryoprotected with Parabar 10312 from Hampton Research and frozen in liquid nitrogen. The crystals were screened and optimized at the MacCHESS F1 beamline at Cornell University, and SAD data were collected remotely on the tuneable Northeastern Collaborative Access Team 24-ID-C Beamline at the Advanced Photon Source at the selenium edge energy of 12.663 keV (0.9791 Å) (Table 2). The data were integrated and scaled using the Northeastern Collaborative Access Team RAPD pipeline. Heavy atom sites were located using SHELX (65), and phasing, density modification, and initial model building were carried out using the Autobuild routines of the PHENIX package (66). Further manual model building and refinement were carried out in COOT (67) and PHENIX (66), respectively. The final model contained one molecule in the asymmetric unit containing residues 1-175 and was refined to 1.68 Å resolution with Rwork/Rfree values of 0.1770/0.1907 (Table 2).
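The quoted energy and wavelength for the selenium edge are two expressions of the same quantity, related by λ = hc/E. A minimal sketch of the conversion follows; the constant is the standard value of hc in keV·Å, and the function name is illustrative.

```python
# Photon energy <-> wavelength conversion for X-ray data collection.
HC_KEV_ANGSTROM = 12.398419843  # Planck constant x speed of light, in keV * angstrom


def wavelength_angstrom(energy_kev: float) -> float:
    """Wavelength (angstrom) of a photon with the given energy (keV)."""
    return HC_KEV_ANGSTROM / energy_kev


# Energy reported for the selenium-edge SAD data collection.
se_edge_wavelength = wavelength_angstrom(12.663)
print(f"{se_edge_wavelength:.4f} angstrom")  # prints 0.9791 angstrom
```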
SeMet TgΔ185 was crystallized in complex with a 19-mer methylated DNA substrate (meDNA; Table S1) by sitting-drop vapor diffusion in 0.1 M HEPES, pH 7.5, 20% PEG 3350, and 0.20 M (NH4)2SO4 with a drop size of 2 μl and a reservoir volume of 650 μl. The meDNA contained a single m5C modification in each strand (meDNA upper strand and lower strand oligonucleotides; Table S1) and flanking sequences that produced base-pair mismatches in the annealed duplex, which were necessary to obtain diffraction-quality crystals. TgΔ185 and meC15 mismatched DNA were mixed at a molar ratio of 2:1.2 and incubated at room temperature for 10-15 min prior to crystallization experiments. Crystals appeared within 10-14 days at 20°C and were of the space group P2₁2₁2₁ with unit cell dimensions a = 41.87 Å, b = 56.50 Å, c = 109.28 Å, α = 90.00°, β = 90.00°, and γ = 90.00°. The samples were cryoprotected with Parabar 10312 and frozen in liquid nitrogen. An initial 2.64 Å data set (TgMcrB Δ185 + meDNA 1) was collected at the Northeastern Collaborative Access Team 24-ID-E Beamline at the selenium edge energy of 12.663 keV (0.9791 Å) and solved by molecular replacement in PHASER (68) using the unbound TgΔ185 monomer structure determined from selenium SAD phasing as the search model (Table 2). Discontinuous portions of the DNA could be visualized and built; however, the overall model did not improve significantly beyond the initial rounds of refinement. A more complete model was obtained using the diffraction data from a second crystal, TgΔ185 + meDNA 2 (Table 2). This structure was solved by molecular replacement with PHASER (68) using the MR-derived structure from TgΔ185 + meDNA 1 as the search model. The statistics and resulting maps continued to improve through subsequent rounds of manual model building and refinement, ultimately revealing density for the DNA backbone and individual bases. This strategy proved critical, because the density for the DNA remained poorly resolved if the unbound TgΔ185 monomer was instead used as the search model for molecular replacement. The final model of crystal 2 contained one molecule in the asymmetric unit containing residues 3-175 with 6 bp of the DNA substrate and was refined to 2.27 Å resolution with Rwork/Rfree = 0.2378/0.2893 (Table 2).
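As a quick consistency check on the reported lattices, the unit cell volume can be computed from the cell dimensions with the general triclinic formula, which reduces to abc·sin β for the monoclinic C2 cell and to abc for the orthorhombic P2₁2₁2₁ cell. The sketch below uses only the cell parameters quoted above; the function and variable names are illustrative, and no solvent content is estimated because the text does not give the molecular masses that would require.

```python
import math


def unit_cell_volume(a, b, c, alpha_deg, beta_deg, gamma_deg):
    """Unit cell volume (cubic angstrom) from cell edges (angstrom) and angles (degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha_deg, beta_deg, gamma_deg))
    return a * b * c * math.sqrt(1 - ca ** 2 - cb ** 2 - cg ** 2 + 2 * ca * cb * cg)


# Unbound crystal form (space group C2).
apo_volume = unit_cell_volume(67.84, 43.99, 61.96, 90.00, 120.28, 90.00)
# DNA-bound crystal form (space group P2(1)2(1)2(1)).
complex_volume = unit_cell_volume(41.87, 56.50, 109.28, 90.00, 90.00, 90.00)

print(f"C2 cell volume:       {apo_volume:,.0f} cubic angstrom")
print(f"P212121 cell volume:  {complex_volume:,.0f} cubic angstrom")
```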