Structural basis for the DNA-binding activity of human ARID4B Tudor domain

Human ARID4A and ARID4B are homologous proteins that are important in controlling gene expression and epigenetic regulation but have distinct functions. Previous studies have shown that the N-terminal domain of ARID4A is an unusual interdigitated double Tudor domain with DNA-binding activity. However, how the Tudor domain of ARID4B differs from that of ARID4A remains unknown. Here, we found that the ARID4B Tudor domain has significantly weaker DNA affinity than the ARID4A Tudor domain despite sharing more than 80% sequence identity. Structure determination and DNA titration analysis indicated that the ARID4B Tudor domain is also an interdigitated double Tudor domain with a DNA-binding surface similar to ARID4A. We identified a residue close to the DNA-binding site of the Tudor domain that differs between ARID4A and ARID4B. The Leu50 in ARID4A is Glu50 in ARID4B, and the latter forms salt bridges with two lysine residues at the DNA-binding surface. This causes a decrease in the strength of positive charge, thus reducing DNA-binding affinity while significantly increasing protein stability. We also found that a C-terminal extension region enhances the DNA-binding affinity of the ARID4B Tudor domain. This C-terminal extension is disordered and contains a positively charged RGR motif, providing an additional DNA-binding site. Finally, sequence and phylogenetic analyses indicated that the residue differences and the presence of the RGR extension region are conserved. These results provide new insight into the functional differences between ARID4A and ARID4B proteins, as well as elucidating the function of the disordered regions in these proteins.

Human ARID4A and ARID4B, also known as retinoblastoma-binding protein 1 (RBBP1) and RBBP1-like protein 1 (RBBP1L1), are both components of the mSin3A complex, which suppresses gene expression and regulates epigenetic marks (1-5). Both ARID4A and ARID4B contribute to suppression of cancers such as leukemia and regulate the male reproductive process as well as the epigenetics of several genetic diseases including Prader-Willi syndrome (2, 6-8). ARID4A and ARID4B have also been identified as transcriptional coactivators for the androgen receptor (AR) and RB, which are involved in the regulation of Sertoli cell function and male fertility (9,10). In addition to these shared functions, ARID4A and ARID4B also differ in important ways. ARID4A contains the RB-binding motif (LXCXE) and specifically interacts with retinoblastoma protein (RB) (11), while ARID4B has no such motif. Functional studies indicate that mice heterozygous (+/−) or homozygous (−/−) for Arid4a deficiency were viable and fertile, while Arid4b −/− mice were not born alive (8). Further, Sertoli cell-specific Arid4b knockout (Arid4bSCKO) mice display several unique and more severe phenotypes than Arid4a −/− Arid4b +/− male mice (12). A search of the BioGRID database indicates that ARID4A has 40 physical interactors and one genetic interactor, while ARID4B has 65 physical interactors and one genetic interactor, of which 18 interactors are common to both proteins (13). Apart from the fact that only ARID4A contains the RB-binding motif, the molecular basis of the other functional differences between the two protein homologues is still unclear.
Both ARID4A and ARID4B contain five domains (Fig. 1A), three of which are Royal domains, including Tudor, PWWP, and chromobarrel domains. The other two domains are the ARID and R2 domains, which have HDAC-independent and -dependent gene repression activity, respectively. These five domains are highly conserved between the two proteins with sequence identity of 40 to 80%, while disordered regions share much lower sequence identity (<29%) (14). We have previously reported that the chromobarrel domain of ARID4A can bind to methylated histone tails and thus is responsible for the epigenetic regulation function of ARID4A (14). The Tudor domain of ARID4A was demonstrated to be an interdigitated double Tudor domain, like the Tudor domains in the three Jumonji C domain-containing histone demethylases (JMJD2A/2B/2C), in which two β-strands at the N terminus and two β-strands at the C terminus form a hybrid Tudor domain (HTD-1), and the middle four β-strands form another Tudor domain (HTD-2) (15). Tudor domains usually recognize methylated lysine or arginine (16,17). The interdigitated double Tudor domains of JMJD2 proteins specifically bind to H3K4me3 by a conserved aromatic box in HTD-2 (18). However, the ARID4A Tudor domain does not bind to methylated histone tails because it lacks the conserved aromatic box. Instead, it binds to the DNA duplex with an affinity (KD) of 10 to 20 μM, mainly via HTD-1 (15).
We noticed that the region (residues 122-146) following the construct of the ARID4A Tudor domain is highly conserved between ARID4A and ARID4B with just one residue difference at position 137, and this region was predicted to possibly contain some ordered structure between residues 122 and 138 (14). This region contains five positively charged residues in the sequence 138-KKTNRGRRS-146, namely the RGR motif, which may also be a DNA-binding site. To study the function of this C-terminal disordered region, we constructed two ARID4B Tudor domain proteins containing residues 1 to 121 (TD121) and residues 1 to 151 (TD151). Surprisingly, ARID4B TD121 showed much weaker DNA-binding affinity than ARID4A TD121, although they have about 80% sequence identity. ARID4B TD151 showed a DNA-binding affinity stronger than both ARID4A TD121 and ARID4B TD121. We determined the solution structure and the DNA-binding sites of ARID4B TD151 and revealed the structural basis of the DNA-binding affinity difference. These results provide molecular insight into the functional differences between the Tudor domains of ARID4A and ARID4B and shed light on the roles of the intrinsically disordered regions in the two proteins.

DNA-binding affinity of ARID4B Tudor domain and the role of the C-terminal extension region
We previously solved the solution structure of the ARID4A Tudor domain including residues 4 to 121, which forms an interdigitated double Tudor domain with DNA-binding activity (15). Notably, the sequence containing residues 122 to 146, which follows on from the Tudor domain, is highly conserved between the two ARID4 homologues (Fig. 1B), implying an important function of this region. Therefore, we constructed an extended ARID4B Tudor domain containing residues 1 to 151 (TD151). We also constructed the ARID4B Tudor domain containing residues 1 to 121 (TD121) for comparison.
The structure of the ARID4B Tudor domain is highly similar to the ARID4A Tudor domain, with RMSD 0.66, 0.77, and 2.0 Å between secondary structure backbone atoms of the two HTD-1, the two HTD-2, and the two overall structures, respectively (Fig. 2D and Fig. S2D). However, compared with the positively charged DNA-binding interface of the ARID4A Tudor domain, the positive charge of the corresponding surface regions of the ARID4B Tudor domain is significantly reduced (Fig. 2, E and F). Detailed analysis indicates that the ARID4B Tudor domain contains a negatively charged residue, Glu50, at the interface, corresponding to Leu50 in the ARID4A Tudor domain, while the other charged residues in the interface are conserved between the Tudor domains of ARID4A and ARID4B (Fig. 2G). The carboxyl group of the Glu50 side chain forms two salt bridges with the side chain amino groups of Lys37 and Lys39. Distances between the Oε atom of Glu50 and the Nζ atoms of the two lysine residues are 3.4 ± 1.0 and 3.5 ± 1.8 Å, respectively, forming two strong salt bridges (Fig. 2G).

DNA-binding activity of ARID4B Tudor domain
Both the interdigitated double Tudor domain and the C-terminal RGR motif of ARID4B TD151 interact with DNA
We then performed NMR titration of ARID4B TD151 by gradually adding dsDNA1, which caused significant chemical shift perturbations (Fig. 3, A and B). The affinity was obtained by fitting the NMR titration data, and the KD obtained was 22 μM (Fig. 3C), in agreement with the ITC result. Besides chemical shift perturbations, addition of dsDNA1 also enhanced NH signal intensities of C-terminal disordered residues of ARID4B TD151, mainly within the sequence 137-GKKTNRGRRS-146 (RGR motif, intensity ratio >3) and Gly110-Ile136 (intensity ratio between 1.5 and 3) (Fig. 3, D and E). This suggests that the RGR motif of ARID4B TD151 undergoes intermediate exchange between multiple conformations in the free protein, leading to line broadening, but adopts a more rigid conformation with stronger and sharper signals when bound to dsDNA.

Tudor domain, similar to the ARID4A Tudor domain (15), while the second is the RGR motif containing a heavily positively charged region, which may contribute to DNA binding by electrostatic interactions (Fig. 3F). ARID4B TD121 has an affinity (KD) of 110 μM for dsDNA1 estimated by fitting the NMR titration data (Fig. 4, A and B), about five times weaker than ARID4B TD151, which is consistent with the EMSA result. Similarly, the affinity of ARID4B TD121 for dsDNA2 obtained from titration was 237 μM, about three times weaker than ARID4B TD151 (KD 79 μM) (Fig. 4, C-F). Consistent with this, the affinity of ARID4A TD151 for dsDNA1 was measured as 9.7 μM (KD) (Fig. 4, G and H), about 2 to 3 times stronger than that of ARID4A TD121 (KD 27 μM) (15) and ARID4B TD151 (KD 22 μM). These results confirm that the C-terminal disordered region can enhance the DNA-binding affinity of the Tudor domain.
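The fold differences quoted above follow directly from ratios of the reported KD values. The short sketch below reproduces that arithmetic; the dictionary layout and the function name `fold_enhancement` are illustrative choices, not part of the original analysis, and the KD values are simply those quoted in the text.

```python
# KD values in micromolar as quoted in the text; a lower KD means tighter binding.
# The ARID4A TD121 value is the one cited from reference (15).
KD_UM = {
    ("ARID4B", "TD121", "dsDNA1"): 110.0,
    ("ARID4B", "TD151", "dsDNA1"): 22.0,
    ("ARID4B", "TD121", "dsDNA2"): 237.0,
    ("ARID4B", "TD151", "dsDNA2"): 79.0,
    ("ARID4A", "TD121", "dsDNA1"): 27.0,
    ("ARID4A", "TD151", "dsDNA1"): 9.7,
}


def fold_enhancement(protein, dna):
    """Return KD(TD121) / KD(TD151), i.e. how many times tighter TD151 binds."""
    return KD_UM[(protein, "TD121", dna)] / KD_UM[(protein, "TD151", dna)]


for protein, dna in [("ARID4B", "dsDNA1"), ("ARID4B", "dsDNA2"), ("ARID4A", "dsDNA1")]:
    print(f"{protein} {dna}: {fold_enhancement(protein, dna):.1f}-fold")
```

The three ratios come out at roughly 5, 3, and 2.8, matching the "about five times", "about three times", and "2 to 3 times" statements.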

RGR motif of ARID4B TD151 weakly prefers AT-rich DNA
The C-terminal RGR motif of ARID4B TD151 is similar to the AT-hook motif, which contains a Pro-Arg-Gly-Arg-Pro sequence (20). The RGR motif of ARID4B TD151 lacks the two proline residues. As the AT-hook motif is well known to specifically interact with AT-rich DNA, especially with A-tract DNA (21), we measured affinities of ARID4B TD151 for AT-rich, A-tract, and GC-rich DNAs, finding that AT-rich and A-tract DNAs have similar affinities, while both are about twofold stronger than GC-rich DNA (Fig. 5, A-C), suggesting a very weak sequence preference for AT-rich DNA. To elucidate the role of each residue of the RGR motif in DNA binding, we constructed a series of point mutants in the C-terminal RGR motif of ARID4B TD151, including K139A, R142A, G143A, R144A, and R145A, and measured their binding affinity with dsAT18 and dsGC18 (Fig. 5D). The results showed that these mutations decreased the affinity by up to fivefold. Affinity ratios of dsAT18 versus dsGC18 binding to mutants R142A, G143A, and R144A, which are core residues in the RGR motif, are 1.1 to 1.3, while the ratios for the mutants K139A and R145A are 0.4 to 0.5, which is similar to wild-type ARID4B TD151 (ratio 0.5) (Fig. 5D). These results indicated that the weak preference of ARID4B TD151 for AT-rich DNA is achieved by the core residues Arg142, Gly143, and Arg144, while surrounding residues such as Lys139 and Arg145 do not contribute to the preference but contribute to the overall DNA-binding affinity.

Glu50 decreases DNA affinity but increases thermostability of ARID4B TD121
Previous studies have obtained KD values for binding of ARID4A TD121 to dsDNA1 and dsDNA2 of 27 μM and 16 μM, respectively (15), which are 4 to 15 times stronger than the affinities of ARID4B TD121 for these dsDNAs (Fig. 4, A, B, E and F). As stated above, the structure determination revealed that most of the charged residues of the DNA-binding site of the ARID4A Tudor domain are conserved in the ARID4B Tudor domain, except that Leu50 is replaced by the negatively charged Glu50 in the ARID4B Tudor domain (Fig. 2G). Glu50 in the ARID4B Tudor domain forms two salt bridges with Lys37 and Lys39, while the corresponding residues in the ARID4A Tudor domain are indicated to contact DNA (15). We therefore suspected that Glu50 is the key reason for the lower DNA-binding affinity of the ARID4B Tudor domain. To confirm the effect of Glu50/Leu50 on DNA binding, we mutated Glu50 of ARID4B TD121 to Leu, and Leu50 of ARID4A TD121 to Glu. EMSA results showed that the E50L mutation of ARID4B TD121 increased the DNA-binding affinity, while the L50E mutation of ARID4A TD121 decreased the DNA-binding affinity (Fig. 6A), confirming the attenuation effect of Glu50 on DNA binding. Interestingly, the L50E mutation significantly increased the stability of the ARID4A Tudor domain, as the mutant did not precipitate after 2 h incubation at 30 °C, similar to ARID4B TD121, while the wild type underwent significant precipitation under the same conditions, with only 30% protein remaining in the supernatant (Fig. 6B). The effect of Glu50/Leu50 on the stability was further investigated by DSC, which indicated that the L50E mutation of ARID4A TD121 increased the Tm value from 40.6 °C to 48.2 °C, while the E50L mutation of ARID4B TD121 decreased the Tm value from 50.8 °C to 44.5 °C (Fig. 6C). Therefore, the salt bridges between Glu50 and Lys37/Lys39 in the ARID4B Tudor domain significantly increase thermostability of the protein and simultaneously decrease its DNA-binding affinity.
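To put the affinity difference between the two TD121 constructs on an energy scale, a KD ratio can be converted into a binding free-energy difference through the standard relation ΔΔG = RT ln(KD,weak/KD,tight). The sketch below applies it to the KD values quoted above. The temperature of 302 K is an assumption borrowed from the NMR conditions, and the comparison further assumes that the ARID4A values from reference (15) were measured under comparable conditions; the function name is illustrative.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def ddg_kj_per_mol(kd_weak_um, kd_tight_um, temperature_k=302.0):
    """Free-energy difference RT*ln(KD_weak / KD_tight) in kJ/mol.

    temperature_k defaults to 302 K, which is an assumption rather than a
    value stated for the titrations.
    """
    return R_KJ_PER_MOL_K * temperature_k * math.log(kd_weak_um / kd_tight_um)


# ARID4B TD121 (weaker binder) versus ARID4A TD121 (tighter binder)
ddg_dsdna1 = ddg_kj_per_mol(110.0, 27.0)  # dsDNA1
ddg_dsdna2 = ddg_kj_per_mol(237.0, 16.0)  # dsDNA2
print(f"dsDNA1: {ddg_dsdna1:.1f} kJ/mol, dsDNA2: {ddg_dsdna2:.1f} kJ/mol")
```

Under these assumptions the 4- to 15-fold KD differences correspond to only a few kJ/mol, which is the order of magnitude one would expect from losing a small number of electrostatic contacts.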

HADDOCK structure model of ARID4B TD151 complex with DNA
Based on the NMR titration results for ARID4B TD151 using dsDNA1, we attempted to construct a structural model for the complex of ARID4B TD151 and dsDNA1 using HADDOCK docking (22). The chemical shift perturbations obtained from the NMR titration experiments were used to define residues involved in the interaction, which include residues from the folded region and the C-terminal segment, 138-KKTNRGRRS-146 and Ile149. However, we found that the long C-terminal loop between Gly110 and Gly137 disturbed model construction if we used the full-length ARID4B TD151 in the docking. We therefore used the structural region (residues 9-110) and the C-terminal peptide (138-KKTNRGRRS-146) separately for HADDOCK docking.
Analysis of the final 200 HADDOCK models for the ARID4B Tudor domain and dsDNA1 complex resulted in ten clusters, and the statistics of the top seven clusters are displayed in Table S1. The top cluster has the largest number of structures and the lowest HADDOCK score and Z-score, with an RMSD value of 2.0 ± 1.2 Å and buried surface area of 1422 ± 161 Å². The best model in the top cluster is shown in Figure 7A.
In this model of the complex, the DNA duplex mainly binds to the ARID4B HTD-1 at sites containing β1, β4', loop L12, the C-terminal loop of HTD-1, and the linker between HTD-1 and HTD-2 (Fig. 7A). The axis along HTD-1 and HTD-2 of TD151 is almost parallel to the DNA duplex, with L12 and β1 making contact with the DNA major groove, while β4' and the C-terminal loop of HTD-1 make contact with the DNA minor groove and backbone atoms. The structure model of the complex is quite similar to that of the ARID4A Tudor domain with dsDNA2 (Fig. 7, B and C). Detailed analysis of the model indicates that residues Lys19-Arg21 of L12 extend into the DNA major groove, whereas the side chains of Trp88 and Arg102 of the C terminus extend into the minor groove (Fig. 7D). Side chains of Lys37 and Gln52 in HTD-2, and Ser104 and Ser105 in HTD-1 also contact the DNA backbone atoms, while Glu50 has no contact with DNA but still forms a salt bridge with Lys37 (Fig. 7D). These residues form a positively charged DNA-binding surface, which has electrostatic interactions and physical complementarity with DNA (Fig. 7E). These contact residues and the positively charged interacting surface of the ARID4B Tudor domain are basically similar to the structure model of the complex between the ARID4A Tudor domain and DNA, although the positive charge of the ARID4B Tudor domain is partially neutralized by Glu50 (Fig. 7, F and G).
We then built the HADDOCK model of the DNA complex with the C-terminal RGR motif (138-KKTNRGRRS-146) of ARID4B TD151. As the short segment of peptide is not suitable for cluster analysis using HADDOCK, we chose ten structures with the lowest HADDOCK scores from the final 200 structures to represent the complex in the model (Fig. 7H). In all ten structures, the peptide binds to the DNA minor groove. In the representative structural model, side chains of Asn141, Arg144, and Arg145 penetrate into the groove, and the positively charged side chains of lysine and arginine residues also have electrostatic interactions with DNA (Fig. 7I). Combining the HADDOCK-derived structures of the complexes of the ARID4B Tudor domain and the C-terminal RGR peptide with DNA, Figure 7J shows a model in which the ARID4B Tudor domain and the C-terminal peptide cooperatively bind to the DNA duplex. The Tudor domain binds to both major and minor grooves of the DNA duplex, while the RGR motif only binds to the minor groove. By sliding and rotation along the grooves of dsDNA1, binding of about three ARID4B TD151 molecules could be accommodated (Fig. S3), which agrees with the stoichiometry obtained in NMR and ITC experiments. Interestingly, the model also indicates that the structured Tudor domain and the RGR motif can bind to opposite sides of a short DNA duplex without spatial hindrance, because the linker between the Tudor domain and the RGR motif is around 30 residues, which is long enough to allow binding to distal sites.

Discussion
This study reveals that the Tudor domains of ARID4A and ARID4B have different DNA-binding affinities and stability, although the two Tudor domains share 80% sequence identity. Our results indicate that both domains bind to DNA using similar structural regions, but the one-residue difference at position 50 is the major reason for the differences in DNA-binding affinity and protein stability. Interestingly, detailed structure-based alignment of the two Tudor domains indicates that most residues that differ between the two Tudor domains (green residues in Fig. 1C) are located either within HTD-2 (Fig. S4A) or in the N- and C-terminal disordered regions. These residues are generally far away from DNA-binding sites, except that Glu50 of ARID4B forms two salt bridges with DNA-binding lysine residues and decreases the DNA affinity. Besides Glu50, we noticed that a hydrophobic residue Val68 is located in the highly hydrophobic core of HTD-2 of the ARID4B Tudor domain, surrounded by hydrophobic residues Phe42, Val49, and Val51 (Fig. S4B), while in the ARID4A Tudor domain structure, the corresponding residue is a less hydrophobic Thr68 surrounded by Leu42, Gln49, and Val51 (Fig. S4C). Because both Val68 of ARID4B and Thr68 of ARID4A are in the hydrophobic core, the difference in their hydrophobicity may also lead to the observed difference in stability. Consistent with this, we found that wild-type ARID4B TD121 (Tm 50.8 °C) has greater thermostability than the L50E mutant of ARID4A TD121 (Tm 48.2 °C), and the E50L mutant of ARID4B TD121 (Tm 44.5 °C) has greater thermostability than wild-type ARID4A TD121 (Tm 40.6 °C) (Fig. 6C), suggesting an important role of hydrophobic core formation involving Val68 and surrounding residues for stability of the interdigitated Tudor domain.
Besides the core interdigitated double Tudor domain, we also investigated the C-terminal positively charged disordered RGR motif of ARID4B TD151 and found that the RGR motif can bind to the DNA duplex and enhance the DNA-binding affinity of ARID4B TD151 by about fivefold. The docking results indicate that the C-terminal RGR motif prefers to bind to the DNA minor groove. The RGR motif is similar to the AT-hook motif containing a conserved Arg-Gly-Arg-Pro sequence, which can penetrate into the minor groove through the side chains of the two Arg residues during DNA binding (20). However, the RGR motif of ARID4B lacks the proline residue, which is conserved in the AT-hook motif and proposed to be critical for conformational adaptation of the AT-hook motif to the DNA minor groove (20). Our results indicate that the lack of the two prolines leads to a very weak preference of the ARID4B RGR motif for AT-rich DNA, which we confirmed by mutational analysis. Therefore, the RGR motif can be considered as an AT-hook-like motif belonging to a positively charged extension of the DNA-binding domain, which has been discovered in many DNA-binding proteins (23).
Sequence alignment and phylogenetic analysis of the Tudor domains and the C-terminal extensions of the ARID4A/ARID4B family proteins revealed different conservation of the Glu50/Leu50 residues and the RGR motif (Fig. 8). Higher animals from Danio rerio to Homo sapiens contain both ARID4A and ARID4B homologues, while lower animals contain only one homologue. In higher animals, all ARID4B Tudor domains contain the Glu residue at the position corresponding to Glu50, while all ARID4A Tudor domains contain a Leu/Val residue at the same position. The corresponding residues in the homologues of lower animals show more variation without charge, which is more similar to ARID4A than ARID4B (Fig. 8A). Therefore, Glu50 of ARID4B has likely evolved for the specific function of ARID4B after it diverged from an ARID4A-like ancestor (Fig. 8B), implying its importance for the function of ARID4B. Our observation of different DNA-binding affinity and stability caused by the difference between Glu50 and Leu50 could be related to the specific functions of the two homologous proteins. The RGR motif is largely conserved in both ARID4A and ARID4B in higher animals, but less conserved in the ARID4A-like homologues of lower animals. Therefore, the function of the RGR motif is likely to be important for both ARID4A and ARID4B in higher animals, but less important in lower animals. The results presented in this study not only provide novel molecular insight into the functional differences between the homologous ARID4A and ARID4B proteins, but also shed light on the role and importance of the RGR intrinsically disordered region in the ARID4A/ARID4B protein family.

NMR spectroscopy
All NMR experiments on ARID4B TD151 to obtain NMR assignments and distance restraints were performed at 302 K on Bruker AVANCE 600 MHz or 800 MHz spectrometers, each of which was equipped with a triple-resonance cryoprobe. NMR samples of ARID4B TD151 contained 0.6 mM protein in buffer A (20 mM Na2HPO4-NaH2PO4, 100 mM NaCl, pH 7.0), with addition of 5 mM DTT, 0.02% (w/v) sodium 2,2-dimethylsilapentane-5-sulfonate (DSS), and 10% (v/v) D2O. The two-dimensional 1H-15N and 1H-13C heteronuclear single quantum coherence (HSQC) and three-dimensional CBCA(CO)NH, HNCACB, HNCO, HN(CA)CO, HBHA(CO)NH, HCCH-TOCSY, and CCH-TOCSY experiments were performed for backbone and side chain assignments of ARID4B TD151. The three-dimensional 1H-15N and 1H-13C NOESY-HSQC spectra with mixing times of 120 ms were collected to generate distance restraints. All data were processed using NMRPipe (25) and analyzed using NMRViewJ (26). Proton chemical shifts were referenced to the internal DSS, and 15N and 13C chemical shifts were referenced indirectly.

Structure calculations
The ARID4B TD151 structure was initially calculated using the program CYANA (27), and then refined using CNS (28) with semiautomated NOE assignments by SANE (29). Backbone dihedral angle restraints obtained using CSI 3.0 (19) and TALOS-N (30), as well as hydrogen-bond restraints according to the regular secondary structure patterns, were also used in the structural calculation. From 200 CNS-calculated conformers, 50 lowest-energy conformers were selected for further water refinement using CNS and RECOORDScript (31). The resulting 20 energy-minimized conformers were used to represent the solution structure of ARID4B TD151. The quality of the determined structure (Table 1) was analyzed using PROCHECK-NMR (32) and MolMol (33). Structural figures were created with MolMol (33) and PyMOL (34).

DNA titration
DNA duplexes used in the titration experiments were 12-mer dsDNA2 and 18-mer dsDNA1 (15) [...] cooled slowly to room temperature. DNAs were further purified by gel filtration and then concentrated. The stock solution contained 5 mM DNA duplexes in buffer A. ARID4B TD121 and TD151 protein samples were extensively dialyzed against buffer A before the titration.
Interaction of ARID4B TD121 and TD151 with DNA was monitored by recording a series of two-dimensional 1H-15N HSQC spectra of proteins at each DNA titration point. The observed chemical shift perturbations (CSPs) of the protein resonances were calculated using the equation: [...] where δHN and δN are the changes of 1HN and 15N chemical shifts, respectively. The equilibrium dissociation constants (KD) of protein with DNA were estimated by fitting the CSPs to the equation: [...] where CSPmax is the CSP value at the theoretical saturated condition obtained from the titration curve fitting; r is the molar ratio of DNA to protein; Cpro is the concentration of the initial protein solution; Clig is the stock concentration of DNA; and n is the number of equivalent and independent binding sites on the DNA. The physical meaning of the value of n is complicated, as it could also account for any uncertainty in the DNA and protein concentrations that were fixed in fitting.
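As an illustration of how such a titration can be analyzed, the sketch below computes a weighted CSP and fits KD with a commonly used model of n equivalent, independent sites per DNA duplex, including dilution of the protein by the added DNA stock. It is a minimal sketch under stated assumptions and not necessarily the equations used in this study: the 15N weighting factor (0.2), the exact form of the binding model, the grid-search fit, and all function names are assumptions introduced here. The concentrations in the usage example (0.6 mM protein, 5 mM DNA stock, n = 3) are taken from the text, while the titration points are synthetic.

```python
import math


def weighted_csp(d_hn, d_n, n_weight=0.2):
    """Combined 1H/15N chemical shift perturbation in ppm.

    n_weight scales the 15N change; 0.2 is a placeholder assumption.
    """
    return math.sqrt(d_hn ** 2 + (n_weight * d_n) ** 2)


def predicted_csp(r, kd, csp_max, n, c_pro, c_lig):
    """CSP expected at DNA:protein molar ratio r (assumed model).

    kd, c_pro and c_lig must share one concentration unit (e.g. mM).
    """
    # Adding DNA stock dilutes the protein: V_added / V_0 = r * c_pro / c_lig.
    p_tot = c_pro * c_lig / (c_lig + r * c_pro)
    sites = n * r * p_tot  # total concentration of binding sites on the DNA
    b = p_tot + sites + kd
    bound = (b - math.sqrt(b * b - 4.0 * p_tot * sites)) / 2.0
    return csp_max * bound / p_tot


def fit_kd(ratios, csps, n, c_pro, c_lig, kd_grid):
    """Grid search over kd; csp_max is solved by linear least squares per kd."""
    best = None
    for kd in kd_grid:
        shape = [predicted_csp(r, kd, 1.0, n, c_pro, c_lig) for r in ratios]
        csp_max = sum(s * y for s, y in zip(shape, csps)) / sum(s * s for s in shape)
        sse = sum((csp_max * s - y) ** 2 for s, y in zip(shape, csps))
        if best is None or sse < best[0]:
            best = (sse, kd, csp_max)
    return best[1], best[2]


# Synthetic, noise-free titration generated with KD = 0.022 mM (22 uM).
ratios = [0.1, 0.25, 0.5, 1.0, 1.5, 2.0, 3.0]
csps = [predicted_csp(r, 0.022, 0.30, 3, 0.6, 5.0) for r in ratios]
kd_grid = [0.001 * k for k in range(1, 201)]  # 1 to 200 uM, expressed in mM
kd_fit, csp_max_fit = fit_kd(ratios, csps, 3, 0.6, 5.0, kd_grid)
```

Because n is fixed rather than fitted, only KD and CSPmax are free parameters in this sketch, which mirrors the fitting strategy described in the text. With real data, a nonlinear least-squares routine would normally replace the grid search.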
To be consistent with fitting for the ARID4A Tudor domain (15), n was fixed as 5 and 3 in the fitting curves of ARID4B TD121 titration with 18-bp dsDNA1 and 12-bp dsDNA2, respectively, and fixed as 3 and 2 for ARID4B TD151 titrated with 18-bp dsDNA1 and 12-bp dsDNA2, respectively.

Isothermal titration calorimetry (ITC)
ITC measurements were performed at 25 °C on an iTC-200 calorimeter (MicroCal, Inc). The titrations were carried out in buffer A. The reactant (0.1 mM protein) was placed in the 200-μl sample cell. Then dsDNA solutions in an injection syringe (0.6 mM) were injected into protein solutions in the cell. The volume of each injection was 2 μl except for the first injection, which was 0.4 μl. A titration experiment consisted of 20 consecutive injections of 4 s duration, with a 120 s interval between injections. Control experiments were performed under identical conditions to determine the heat signals that arise from addition of DNA into the buffer. The resulting data were fitted to a single-site binding model using the Origin software package (MicroCal, Inc).
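The injection schedule fixes the range of DNA:protein molar ratios sampled in each ITC run. The sketch below works through that arithmetic from the stated volumes and concentrations. It assumes that the 0.4 μl first injection counts as one of the 20 injections and ignores the displacement of cell contents during injection, so the numbers are approximate; the function and variable names are illustrative.

```python
CELL_VOLUME_UL = 200.0   # sample cell volume
PROTEIN_MM = 0.1         # protein concentration in the cell
DNA_SYRINGE_MM = 0.6     # dsDNA concentration in the syringe
# Assumption: the 0.4 ul first injection is one of the 20 injections.
INJECTIONS_UL = [0.4] + [2.0] * 19


def molar_ratios(injections_ul, syringe_mm, cell_ul, cell_mm):
    """Cumulative DNA:protein molar ratio after each injection.

    No correction is made for cell contents displaced by each injection.
    """
    protein_nmol = cell_ul * cell_mm  # ul * mM = nmol
    added_nmol = 0.0
    ratios = []
    for volume in injections_ul:
        added_nmol += volume * syringe_mm
        ratios.append(added_nmol / protein_nmol)
    return ratios


ratios = molar_ratios(INJECTIONS_UL, DNA_SYRINGE_MM, CELL_VOLUME_UL, PROTEIN_MM)
print(f"final DNA:protein ratio = {ratios[-1]:.2f}")
```

Under these assumptions the titration ends at a DNA:protein ratio slightly above 1.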

Thermostability test
Protein samples of ARID4B TD121, ARID4A TD121, and ARID4A L50E mutant, each containing 0.25 mM protein in 50 μl volume, were incubated for 2 h at 30 °C in the buffer containing 50 mM Tris-HCl (pH 7.5) and 50 mM NaCl. Samples were then centrifuged at 13,000 × g for 30 min. The absorbance of the supernatant at 280 nm was then measured to determine the concentration.

Differential scanning calorimetry (DSC) experiments
DSC measurements were performed using a Nano DSC system (TA). Prior to scanning, samples were degassed under vacuum for 15 min using a degassing station (TA). DSC thermograms were determined by monitoring the difference in heat capacity in solution upon increasing temperature at a scan rate of 1 °C min−1 by heating the sample from 15 °C to 75 °C under increased pressure (3 atm). All proteins used in this study were extensively dialyzed against a buffer containing 50 mM NaCl, 50 mM Tris, pH 7.6, and the dialysis buffer was used for instrumental baseline scans and as reference samples. Protein concentrations used were 1.0 mg/ml, corresponding to 75.0 μM for ARID4A/ARID4B TD121 proteins. Data were fitted to a two-state scaled model using NanoAnalyze software.
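The stated equivalence between 1.0 mg/ml and 75.0 μM implies a molar mass for the TD121 constructs, which can be checked with a one-line conversion. The sketch below does this; the function name is illustrative, and the comparison with an average residue mass of about 110 Da is a rule of thumb rather than a value taken from this study.

```python
def implied_molar_mass(mg_per_ml, micromolar):
    """Molar mass in g/mol implied by a mass and a molar concentration."""
    grams_per_litre = mg_per_ml        # 1 mg/ml equals 1 g/l
    mol_per_litre = micromolar * 1e-6  # uM to M
    return grams_per_litre / mol_per_litre


mass = implied_molar_mass(1.0, 75.0)
print(f"implied molar mass = {mass / 1000:.1f} kDa")

# Rule-of-thumb estimate for a 121-residue construct (about 110 Da per residue).
rough_estimate = 121 * 110
```

The implied value of about 13.3 kDa is close to the rule-of-thumb estimate for a 121-residue protein, so the two stated concentrations are mutually consistent.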

HADDOCK modeling
Structure modeling of the ARID4B TD151 and DNA complex was performed using HADDOCK (22). The starting structural coordinate files for HADDOCK were generated from the 20 conformers of the ARID4B TD151 solution structure and the B-form dsDNA1 duplex built using the Web 3DNA server (35). For HADDOCK calculations, active residues for ARID4B TD151 were defined as those having weighted CSPs larger than the mean plus standard deviation in the dsDNA1 titration. As residues within the long loop between Gly110 and Lys138 show minor CSP values and significantly affect the docking process, we performed HADDOCK docking with the dsDNA1 duplex separately for residues 1 to 110 and 138 to 151 of ARID4B TD151. Residues 1 to 8 and 147 to 151 were also deleted after initial docking, as they are not important for DNA binding and their flexibility could lead to steric hindrance during the docking process. Docking of residues 9 to 110 with DNA was performed using the HADDOCK2.2 webserver (22). Passive residues were automatically defined around the active residues by HADDOCK. The active residues were optimized according to the initial docking result, and the final active residues included Lys19-Gly22, Lys33, Lys37, Gln52, Trp88, and Lys99-Ser105. All the bases of the dsDNA1 sequence were considered active in the initial docking. Bases 4 to 8, 10 to 14, and 24 to 28 of dsDNA1 were defined as active residues at final docking, and passive residues of dsDNA1 were automatically defined. A total of 1000 initial structures of the complex were generated for rigid-body docking, and the 200 lowest-energy structures were further refined in explicit water after semiflexible simulated annealing. A cluster analysis was performed on the final 200 water-refined structures based on a 0.6 Å RMSD cutoff criterion. The clusters were ranked based on the averaged HADDOCK score of their top ten structures.
The structure in the cluster with the lowest HADDOCK score was selected to represent the model of the ARID4B Tudor domain and dsDNA1 complex.
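The criterion used to pick the initial active residues (weighted CSP larger than the mean plus one standard deviation) can be sketched as follows. This is only an illustration: the use of the population standard deviation, the function name, and the CSP values in the example are assumptions and not data from this study.

```python
from statistics import mean, pstdev


def select_active_residues(csp_by_residue):
    """Return residue numbers whose CSP exceeds mean + one standard deviation.

    The population standard deviation is used here; whether the original
    analysis used the population or the sample form is not stated.
    """
    values = list(csp_by_residue.values())
    cutoff = mean(values) + pstdev(values)
    return sorted(res for res, csp in csp_by_residue.items() if csp > cutoff)


# Hypothetical weighted CSPs (ppm) keyed by residue number, for illustration only.
example_csps = {19: 0.21, 20: 0.25, 37: 0.18, 45: 0.02,
                60: 0.01, 70: 0.03, 80: 0.02, 90: 0.02}
active = select_active_residues(example_csps)
print(active)
```

A threshold of this kind selects only the most strongly perturbed residues, which is why the text describes a further optimization of the active residues after the initial docking round.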
Docking of the RGR motif, 138-KKTNRGRRS-146, with DNA was performed using HADDOCK2.2 on a local machine. All peptide residues were considered as active during docking. All DNA bases were considered active in the initial docking, and the active residues were then optimized according to the initial docking result. The final active residues included bases 8 to 12 and 25 to 28. As the short peptide sequence is not suitable for clustering, we chose ten structures with the lowest HADDOCK scores from 200 final water-refined structures to represent the model of the complex between DNA and the peptide.
A structure model of the ARID4B TD151 and dsDNA1 complex was constructed by manually combining the models of the Tudor domain-dsDNA1 complex and the RGR motif-dsDNA1 complex. The distance and orientation between the Tudor domain and RGR motif in this model are arbitrarily chosen without further optimization.

Data availability
All atom assignments of ARID4B TD151 have been deposited in BMRB under accession number 50612. The structure and the restraints have been deposited in the Protein Data Bank under accession number 7DM4 for ARID4B TD151. All remaining data are contained within the article.
Supporting information: This article contains supporting information.