Structural basis of substrate specificity in human cytidine deaminase family APOBEC3s

The human cytidine deaminase family of APOBEC3s (A3s) plays critical roles in both innate immunity and the development of cancers. A3s comprise seven functionally overlapping but distinct members that can be exploited as nucleotide base editors for treating genetic diseases. Although overall structurally similar, A3s have vastly varying deamination activity and substrate preferences. Recent crystal structures of ssDNA-bound A3s together with experimental studies have provided some insights into distinct substrate specificities among the family members. However, the molecular interactions responsible for their distinct biological functions and how structure regulates substrate specificity are not clear. In this study, we identified the structural basis of substrate specificities in three catalytically active A3 domains whose crystal structures have been previously characterized: A3A, A3B- CTD, and A3G-CTD. Through molecular modeling and dynamic simulations, we found an interdependency between ssDNA substrate binding conformation and nucleotide sequence specificity. In addition to the U-shaped conformation seen in the crystal structure with the CTC0 motif, A3A can accommodate the CCC0 motif when ssDNA is in a more linear (L) conformation. A3B can also bind both U- and L-shaped ssDNA, unlike A3G, which can stably recognize only linear ssDNA. These varied conformations are stabilized by sequence-specific interactions with active site loops 1 and 7, which are highly variable among A3s. Our results explain the molecular basis of previously observed substrate specificities in A3s and have implications for designing A3-specific inhibitors for cancer therapy as well as engineering base-editing systems for gene therapy.

APOBEC3s (A3s) are a family of cytidine deaminases that have seven members in human with functions in innate immunity and roles in cancer (1)(2)(3)(4)(5). All A3 domains share a conserved structural fold with an active site zinc tetrahedrally coordinated with catalytic His and Cys residues and an additional water. The human A3s have either one (A3A, A3C, and A3H) or two zinc-binding domains (A3B, A3D, A3F, and A3G). The two-domain A3s consist of a catalytically active C-terminal domain (CTD) and an N-terminal domain (NTD) that binds to substrate but has no deamination activity. A3s deaminate cytosine to uracil on single-strand DNA (ssDNA) and certain RNAs (6,7), thus creating mutations.
Through deamination, A3s play crucial roles in innate immunity by mutating foreign pathogenic genomes and thus protecting host cells against retroviruses and retrotransposons (8)(9)(10)(11)(12)(13)(14). Specifically, A3s deaminate cytosines to uracils on ssDNA during reverse transcription to create G to A hypermutations on the complementary strand. However, misregulated A3 deamination activity may promote cancer and the development of therapeutic resistance. Overexpressed A3s, especially A3A, A3B, and A3H, have been shown to cause heterogeneities in multiple cancers, including breast, bladder, head and neck, cervical, and lung cancer (15)(16)(17)(18). The A3 mutational signature, which is C to T transition in TC context, has been observed in multiple cancer genomes (15,16,19). Moreover, study of human cancer cell lines has suggested that A3s may be involved in the origination of cancer (20). Recently, coupled with CRIPSR/Cas9, A3s are explored as novel base editors to treat genetic diseases (21,22).
Although A3s share highly similar overall structural folds, they have varying levels of deamination activity and substrate specificity. For instance, the activity of A3A, which is the highest in A3 family, can be up to 5000-fold higher compared with the least active A3D (50). All A3 proteins deaminate deoxy-cytidines in ssDNA, but vary in their preferred hotspot sequences or deamination motifs: 5'-(T/C)TC(A/G) for A3A, 5 0 -ATC(A/G) for A3B and 5 0 -CCC(A/C/T) for A3G (50)(51)(52)(53)(54). However, based on the amino acid sequence or even the available structures, the molecular mechanisms that are responsible for varied ssDNA binding affinity and deamination activity as well as substrate specificities among A3 domains are not apparent. The loops (loop 1, 3, 5, and 7) surrounding the active site pocket are the least conserved region of A3s ( Fig. 1A; Fig. S1). In addition, these active site loops undergo substantial conformational changes relative to apo structures upon ssDNA binding. Therefore, changes surrounding the active site including variations in the arrangements of these loops are likely largely responsible for distinctions in substrate specificity, binding affinity, and deamination activity for ssDNA, as well as the physiological functions of A3s. Structural analysis using molecular modeling and dynamics simulations have been used for investigating the molecular mechanisms underlying biological functions of A3s (55)(56)(57)(58)(59)(60). These studies have revealed the importance of active site loops in substrate binding/activity and specificity. However, the structural mechanism for substrate specificities in A3s is largely unknown.
In this study, we investigate the structural mechanism of substrate specificity and ssDNA-binding conformation in A3s using a combination of molecular modeling, structural analysis, and parallel molecular dynamics (pMD) simulations (7,(61)(62)(63)(64)(65). Three domains of the human A3 family that have high enzyme sequence similarity yet varied substrate specificity are the focus of this study: A3A, A3B-CTD, and A3G-CTD. These enzymes are the best experimentally characterized A3s in terms of available structures and substrate specificity (Figs. 1 and 2) (47-49, 61). In our analyses, A3 domains were . ssDNA (orange) and the residues coordinating the zinc (H70, E72A, C101, and C106) are shown as sticks. The catalytic zinc is shown as gray sphere. Amino acid sequence alignments of active site loops in A3A, A3B-CTD, and A3G-CTD are given on the right. B, crystal structure of chimeric A3B-CTD (PDB ID: 5TD5) and A3G-CTD (PDB ID: 6BUX) in cartoon representation. C, schematic representation of the ssDNA binding conformations: U-shape as seen in A3A or chimeric A3B-CTD, or a more extended linear (L) shape as seen in A3G-CTD. The target C is in the zero (0) position with the active site zinc indicated by the dark orange circle, and the other nucleotide positions are represented by orange diamonds along the ssDNA from 5 0 to 3 0 direction (blue arrow). characterized bound to ssDNA of varying nucleotide sequences in either U or L-shape conformation, and the results show an interdependence between substrate specificity and ssDNA conformation. Although the ssDNA was crystallized in a U-shape with A3A and (chimeric) A3B-CTD (47,49), we find that the A3A-ssDNA enzyme complex adopts varied conformations in a substrate dependent manner, binding ssDNA with a CTC 0 motif in a U-shape and CCC 0 in an L-shape, A3B can accommodate either conformation in the recognition of its preferred ATC 0 substrate motif, while A3G only stably recognizes CCC 0 with the ssDNA in an L-shape. These sequencespecific conformations are stabilized by specific interactions with the highly variable loops, loop-1 and loop-7, of the A3 enzymes ( Fig. 1). Thus, A3 family of enzymes accommodate varied substrate specificity through conformational rearrangements of both the active site loops and the ssDNA substrate.

Results
The three A3s investigated, A3A, A3B-CTD, and A3G-CTD, were modeled and energy minimized with the DNA conformation either in a wrapped (U) or linear (L) shape based on that in the cocrystal structures with A3A and A3G-CTD (47,48) respectively (Table S1). The preferred trinucleotide deamination motif is (C/T)TC 0 for A3A, ATC 0 for A3B-CTD, and CCC 0 for A3G-CTD (50)(51)(52)(53)(54). For each A3 enzyme the preferred motif of substrate DNA in both the U and L conformation were characterized. The complexes investigated together with full ssDNA nucleotide sequences are tabulated in Table S1. To analyze the relative stability of these A3-ssDNA complexes, fully solvated pMD simulations were performed in triplicate at 25 C (300 K) to critically compare and evaluate the molecular interactions with DNA between the A3s.
Substrate specificity and conformation correlate with overall dynamics in the simulations The wild-type A3A-CTC 0 (U) complex, which has DNA conformation in the cocrystal structure of A3A-DNA (47) with the preferred substrate sequence, was stable during the MD simulations as expected. The bound DNA had relatively small root-mean-square fluctuations (RMSF), less than 1.8 Å for the central five nucleotides ( Fig. 2A; Fig. S2A). In addition, the key molecular interactions between the target cytidine (C 0 ) and A3A observed in crystal structures were all well maintained throughout the course of the simulation. These included hydrogen bond interactions with His29 (gatekeeper), Thr31, Asn57, Ala71, Glu72, Ser99, Tyr130 and stacking interactions with Tyr130 and His70 ( Fig. 3A; Fig. S3). To probe whether A3A can also stably bind ssDNA in a conformation similar to that observed in the A3G-CTD DNA-bound crystal structure, the identical substrate sequence was simulated in a  linear conformation in the A3A-CTC 0 (L) complex. The ssDNA in this complex had significantly larger fluctuations compared with A3A-CTC 0 (U) complex ( Fig. 2A; Fig. S2A), with the notable exception of the C 0 position (RMSF A3A-CTC 0 (U)/A3A-CTC 0 (L): 2.9/1.6 Å at −3 0 , 1.6/1.1 Å at −2 0 , 1.1/0.7 Å at −1 0 , 0.5/0.6 Å at 0 0 , 1.8/0.9 Å at +1 0 and 2.8/1.8 Å at +2 0 position). In addition, the gatekeeper residue, His29, which is critical for stabilizing ssDNA binding in the U-shape seen in the crystal structure (47), flipped to the apo conformation ( Fig. 3A; Fig. S3). As a result, the stacking interactions with downstream nucleotides (+1, +2, +3) and hydrogen bonds with the DNA backbone were lost. Together these results suggested that A3A prefers binding ssDNA bearing a CTC 0 motif in U-shape rather than a linear conformation.
Next, A3A bound to DNA with a CCC 0 motif, which is still deaminated but reportedly with an 3-fold lower efficiency compared with CTC 0 (6), was investigated. In contrast to A3A-CTC 0 (L), the A3A-CCC 0 (L) complex was highly stable, comparable to the A3A-CTC 0 (U) complex as indicated by both the RMSF of the ssDNA and stability of critical interactions with A3A (Figs. 2A and 3A; Figs. S2A and S4). The A3A-CCC 0 (L) complex was also significantly more stable than A3A-CCC 0 (U) as shown by the relative RMSFs ( Fig. 2A). In fact, in the destabilized A3A-CCC 0 (U) simulation, His70, which coordinates the catalytic zinc, lost stacking interactions with the target cytidine (70% occupancy in CCC 0 (L); 50% occupancy in CTC 0 (U)), appearing to destabilize the active site. Thus, A3A was able to stably bind ssDNA with both CTC 0 and CCC 0 motifs (6), albeit using varied interactions and with the DNA in distinct conformations: CTC 0 motif was stable in a U-shape while the CCC 0 motif was stable in an L-shape.
Similar to the case with A3A, the A3B-ATC 0 (U) complex (61), which corresponds to our previously presented wild-type complex (derived from the crystal structure of chimeric A3B/ A3A-DNA complex (49)), was stable during the MD simulations. However, unlike A3A, which preferred binding ssDNA depending on the target dinucleotide motif in either U-or Lshaped, A3B showed strong preference for ATC 0 over ACC 0 motif in MD simulations regardless of the DNA conformation. The RMSF of bound DNA in both A3B-ATC 0 complexes, especially for −1 0 and 0 0 nucleotides (less than 1 Å and 0.6 Å, respectively), was relatively small compared with those in A3B-ACC 0 complexes ( Fig. 2B; Fig. S2). The molecular interactions between the deamination target nucleotide C 0 and A3B were maintained only in ATC 0 but not ACC 0 complexes ( Fig. 3B;  Fig. S4). In the A3B-ACC 0 (U) complex, the side chain of active site residue Asn240 changed conformation, and hydrogen bond with the ssDNA backbone was lost. This conformer change is similar to what was previously observed for A3G nonsubstrate (rC) simulations (7), which suggests that cytidine was unstable and not poised for deamination. In conclusion, A3B can accommodate both DNA conformations but strongly prefers ATC 0 substrate.
The A3G-DNA complexes were considerably more dynamic compared with A3A and A3B, correlating with lower binding affinity (Table S2). ssDNA-binding affinity has been shown to be a good indicator of A3 activity and substrate specificity (6). The wild-type A3G-CCC 0 (L) complex, which represents the crystal structure, was the most stable A3G complex during the MD simulations as indicated by relatively low fluctuations of bound DNA (especially the central five nucleotides) and the most stable interactions between 0 0 and −1 0 cytidines and protein ( Fig. 2C; Fig. S5). In contrast, the A3G-CCC 0 (U) complex had the highest RMSF of substrate DNA with 4.3 Å. The hydrogen bond interactions as well as the stacking interactions between His216 and ssDNA were lost in both the A3G-CTC 0 (U) and A3G-CCC 0 (U) complexes ( Fig. 3C; Fig. S5). Thus, for A3G only DNA with the CCC 0 motif in a linear conformation formed a stable complex in the simulations.
APOBEC3 preferred ssDNA-binding conformation is defined through interactions with "loop 1" Among all the active site loops, loop 1, whose sequence is highly variable, had the most extensive molecular contacts with ssDNA (Fig. 1). ssDNA either wrapped around (as observed in the U-shaped complexes) or extended along (as observed in the L-shaped complexes) loop 1 (Fig. 4A). As a result, loop 1 contributed considerable van der Waals (vdW) interactions with ssDNA to stabilize binding ( Fig. 4B; Fig. S6). In A3A, the shorter loop 1, with a three-residue deletion relative to A3B and A3G, may allow ssDNA to bind in both conformations. In the A3A-CTC 0 (U) complex, ssDNA wrapped around the gatekeeper residue His29 in loop 1. The neighboring Arg28 stacked with upstream bases (−2; −3) and thus stabilized the U conformation. These two residues in loop 1 had the most vdW contacts with ssDNA. In A3A-CCC 0 (L) complex, the three-residue deletion in loop 1 allowed ssDNA to extend and make interactions with His182 in alpha-helix 6.
In this binding conformation, Ile26 of loop 1 packed with upstream bases and thus had the second highest vdW interactions with ssDNA. Thus loop 1 of A3A both enabled accommodating two different DNA conformations, which permits differential specificity, and established critical interactions with the substrate in both cases. In A3B and A3G, loop 1 is longer compared with A3A (Fig. 1B). In the A3G-DNA cocrystal structure, Trp211 in the 210-PWV-212 insertion stacks with the −3 0 base and thus stretches the bound ssDNA into the more extended L-shape. This interaction was maintained in the complex during simulations and Trp211 was the residue with the strongest vdW interactions with DNA, indicating that Trp211 is critical for the L binding conformation of ssDNA. Similarly, in the A3B-ATC 0 (L) complex, Pro206 of the 206-PLV-208 insertion in loop 1 packed with the −3 0 nucleotide, thus stabilizing the overall extended L-shape. In A3B, binding DNA in a U-shape may be enabled by Arg210 in loop 1. Arg210 in A3B has a unique conformation compared with other A3s (40,61): Arg210 replaces the position of the conserved Arg313 position in loop 7 and stabilizes a conserved network of hydrogen bonds initially revealed in apo A3 crystal structures as providing structural stability. As a result, the side chain of Arg210 is oriented toward the core of the protein and thus results in a cavity next to the gatekeeper residue Arg211 (61). This cavity appears to allow ssDNA to wrap around Arg211 and bind in a U conformation in the A3B-ATC 0 (U) complex (Fig. 4B). Loops surrounding the active site, including loop 1, also contribute to the electrostatics of the enzyme surface to accommodate the negatively charged DNA backbone. Although still positively charged, the active site of A3G-CTD is weaker and not as positively charged compared with A3A and A3B-CTD (Fig. 4C), correlating with the lower binding affinity (Table S2).
Substrate conformation and nucleotide specificity at −1 0 position of the substrate DNA We previously reported that A3A can bind either thymidine or cytidine at −1 0 position with slight preference of T over C (experimental K d 85 nM versus 250 nM) (6). Our MD simulations indicate that the TC 0 and CC 0 motifs are accommodated in distinct DNA-binding conformations and provide insights into preference for the TC 0 motif: thymidine   Table S3). The same hydrogen bonds are also observed in A3A-DNA cocrystal structure (47,49). However, the C −1 in the unstable A3A-CCC 0 (U) complex (see above) lost the hydrogen bond interaction with the backbone of Tyr132 and was more dynamic with RMSF about twofold higher than that of −1 0 thymidine in A3A-CTC 0 (U) ( Fig. 2A; Fig. S2). In contrast, C −1 in A3A-CCC 0 (L) complex revealed stable interactions with A3A protein ( Fig. 2A; Fig. S2). The C −1 formed stable hydrogen bonds with the backbone of Tyr132 and side chain of Asp131 through a different side chain conformation (Fig. 5A right). The RMSF of C −1 in CCC 0 (L) was about the same as T −1 in CTC 0 (U) complex ( Fig. 2A). Together these molecular interactions reveal how A3A may accommodate and deaminate TC and CC substrate sequence motifs with varied DNA-binding conformations.
To further examine the interdependency between substrate-binding conformation and sequence specificity at the −1 0 position, additional MD simulations with the four ssDNA-bound A3A complexes were carried out at a higher temperature of 37 C (310.2 K). These simulations accentuated the instability of the unfavored DNA conformations and facilitated the potential for a conformational switch to a more favorable conformation. ssDNA in both A3A-CTC 0 (U) and A3A-CCC 0 (L) complexes was again stable during the higher temperature simulations (Fig. 6, A and C). In contrast, ssDNA initiated in A3A-CTC 0 (L) and A3A-CCC 0 (U) complexes underwent conformational changes to the preferred mode of binding (Fig. 6, B and D). ssDNA in the A3A-CTC 0 (L) complex transitioned from an L-shape to U-shape, while ssDNA initiated in a A3A-CCC 0 (U) conformation transitioned from U-shape to L-shape during the simulations. These additional simulations not only validated our results at 25 C but also supported the interdependency between ssDNA-binding conformation and substrate sequence specificity for A3A.
A3B exhibits stronger preference for T rather than C at the −1 0 position, unlike A3A (53). However, for A3A and A3G, the complexes analyzed above had a C at the upstream −2 0 position. To investigate any influence of −2 0 nucleotide on substrate preference, additional A3B complexes with CTC 0 motif were characterized and compared with ATC 0 motif. When the DNA was in a U-shape, i.e., in both A3B-ATC 0 (U) and A3B-CTC 0 (U) complexes, −1 0 T formed stable hydrogen bonds with the backbone of Tyr315 and the side chain of Asp314, which was similar to what was observed in A3A and consistent with our previous results (61) (Fig. 5B left; Table S3). In the A3B-ATC 0 (L) complex, unlike what was observed in A3A-CTC 0 (L), Asp314 maintained the same side hydrogen bond as in U-shape conformation and thus promoted T over C (Fig. 5B right; Table S2). Thus, these results indicate that within an ATC 0 motif A3B can accommodate T at −1 0 position in either DNA conformation.
A3G binds ssDNA in an L shape and thus prefers CC 0 rather than TC 0 at −1 0 position. A3G is the only A3 that prefers cytidine over thymidine at −1 0 position. In our MD simulations of the A3G-CCC 0 (L) complex, the side chain of Asp316 and the backbone of Asp317 formed stable hydrogen bonds with −1 0 cytidine (Fig. 5C left; Table S3). These direct hydrogen bonds were lost in the unstable A3G-CCC 0 (U) complex simulation (Fig. 5C right; Table S3), further confirming the preference of A3G-CCC 0 for the more stable Lshaped conformation.
Nucleotide specificity at −2 0 position of substrate DNA Intra-DNA interactions in A3A, as we previously revealed, may underlie the substrate specificity of T/C at −2 0 position (6). However, atomic interactions defining the −2 0 specificity for A3B or A3G have remained elusive. To address the molecular mechanism of specificity at −2 0 position in A3B, additional models of A3B with a C at position −2 0 (A3B-CTC 0 ) and (A3B-CCC 0 ) were compared with A3B's preferred ATC sequence (53) which was used in the complexes described above. In A3B-ATC 0 (L) complex simulation, A −2 formed stable hydrogen bonds with the side chain of Arg311 (occupancy 38% during the simulations), backbone of Ile312 (occupancy 90%), Asp314 (occupancy 98%), and stacking interactions with Trp281 (occupancy 32%) (Fig. 7A). However, all these interactions were significantly weaker in the A3B-CTC 0 (L) complex simulations (Fig. 7B) especially for Trp281. Moreover, the side chain of the gatekeeper residue for DNA binding, Arg211, lost interactions with DNA backbone, which likely impairs the binding of target cytidine C 0 in the active site (Fig. S4). The destabilization of these interactions is consistent with preference of A3B for the ATC over CTC sequence.
Even though there is enough space to accommodate an A, the larger base of A in the ACC 0 (L) complex simulation did not occupy the space where water-mediated hydrogen bonds had been; instead, the A −2 was significantly destabilized and formed only one hydrogen bond with Asp316 (occupancy 42%) (Fig. 7D). Thus, A3G preferred to accommodate the smaller C at the −2 0 position through an extensive watermediated hydrogen bonding network.

Discussion
In this study, we investigated the structural molecular mechanism for substrate specificity in A3 enzymes and found an interdependence between substrate conformation and specificity. Specifically, our results indicate that A3A and A3B can accommodate DNA in a linear conformation, in addition to the wrapped U-shaped binding conformation of substrate DNA observed in crystal structures. For A3A, the linear conformation permits recognition of CC 0 dinucleotide motif while the U-shape prefers TC 0 as observed in the crystal structure. The active site loops are key in defining the overall binding surface and conformation adopted by the DNA to bind A3s. For A3A, A3B, and A3G, loop 1 is critical with extensive interactions with DNA including the gatekeeper residues (His29 in A3A, Arg211 in A3B, and His216 in A3G), which locks DNA in the active site. We described that the threeresidue insertion in loop 1 of A3B and A3G compared with A3A underlies the preference for the more extended L-shaped conformation of the DNA through specific stacking interactions. These results indicate that DNA conformation should be considered together with nucleotide preference in defining substrate specificity of A3 enzymes.
Another key loop in substrate recognition is loop 7; previous studies have shown that swapping loop 7 from A3G into A3B altered the substrate specificity in A3B from TC 0 to CC 0 (40). Additionally, changing Asp317 of A3G into the corresponding residue of A3A (Tyr132) caused A3G to adopt a more A3Alike 5'-TC 0 preference (69). While these results suggested that Tyr132 (Tyr315 in A3B) in loop 7 might be important for substrate specificity at −1' position, our analysis demonstrates there is no specific hydrogen bond between this residue and −1 0 base during the MD simulations. Instead we propose this tyrosine is key for stabilizing the U-shaped binding conformation of DNA, and thus the TC 0 preference in A3A/ A3B and not in A3G. This is supported by the significantly higher vdW contacts of Tyr132/315 with DNA compared with those of Asp317 in A3G (Fig. S6). Thus tyrosine at this critical position in loop 7 stabilizes the U-shape of bound DNA, allowing accommodation of a T at this position.
Unlike human CDAs that bind single nucleosides (70), A3s require at least five consecutive nucleotides for stable substrate recognition (6,30,71). The interdependence between binding interactions around the active site suggests that the sequence of nucleotides flanking the target cytidine is also critical for stable substrate binding. We observed interdependence between preference and interactions of various nucleotides positions in our pMD analysis. The nucleotide at −1 0 position affects the binding stability of the target nucleotide in the catalytic site (0 position). Having a disfavored nucleotide at the −1 0 position destabilized the target nucleotide. Additionally, the nucleotide at −2 0 position can influence the specificity at −1 0 position. In A3B, the interactions between −2 0 A and Asp314 locked the side chain in a conformation that promotes thymidine over cytidine at −1 0 position. In both A3A and A3B, preference for T at the −1 position was possible only when the substrate DNA was in a U-shape, which cannot be accommodated by A3G.
Considering the roles of A3s in viral infections and cancer, a better understanding of the molecular mechanism by which A3s recognize different oligonucleotides will be critical for developing therapeutics. Currently, combined with catalytically inactive Cas9 (dCas9), A3s are investigated as novel base editors for direct modification of genomic DNA at single-base resolution (21,22); as cytosine base editors (CBE), A3s can create mutations to potentially correct genetic diseases. Nevertheless, these editors still require optimization, considering that CBEs can have significant off-target effects, low purity, and wide editing windows (72). To overcome these problems, several versions of CBEs have been engineered; UGI added to increase product purity (73) and generate high fidelity Cas9 with reduced off-target effects (74) or different Cas nucleases (75)(76)(77)(78)(79) were used to narrow activity window. However, modifications to improve the efficiency and fidelity of base editing have not focused on the deaminase. Another major problem in applying CBEs to treat genetic diseases is that the target site must naturally exist in the preferred DNA sequence context for the cytidine deaminase, which may or may not be the case for the desired modification. Therefore, having a library of A3s with different substrate specificities as context-dependent base editors would expand the toolkit available for targeted base editing. Our results revealing the molecular mechanisms underlying A3 specificities could help guide engineering of A3s, especially with modifications to the active site loops, to rationally design A3s with the desired sequence specificity.

Experimental procedures Protein sequence alignment
Protein sequence alignment was generated by program Geneious 9.0.5 using default multiple alignment.

Molecular modeling
All structure models in this study were first generated with MODELLER9.23 and then optimized using Protein Preparation Wizard in Maestro from Schrodinger suite. The optimization was performed at pH 7.0 and minimized using restrained minimization panel. The ssDNA-bound crystal structures of A3A (PDB: 5KEG for protein; 5SWW for ssDNA) and A3G-CTD2 (PDB: 6BUX) were used as templates for molecular modeling of wild-type A3-ssDNA complexes. ssDNA sequences were mutated through program Coot. ssDNA-bound A3B-CTD structures were modeled using the crystal structures of both apo A3B-CTD (PDB: 5CQH) and A3A-ssDNA complex.

Molecular dynamics simulations
All molecular dynamics simulations were performed for 100 ns using program Desmond from the Schrodinger suite. The simulation systems were built using SPC solvation model and cubic boundary conditions of 12 Å buffer box size with OPLS3 force field through system builder in Maestro. The final systems were neutral and had 0.15 M sodium chloride. A multistage MD simulation protocol was used, which was previously described (61), at 298 K (25 C). The four A3A-ssDNA models were also simulated at 310.2 K (37 C) using the same simulation protocol.

Analysis of molecular dynamics simulations
The RMSD and molecular interactions (hydrogen bond, stacking interaction occupancies) over the trajectories were calculated using Simulation Interaction Diagram in Maestro from Schrodinger. The per base RMSFs of ssDNA were calculated using Schrodinger python API. The residue vdW potential between A3s and ssDNA during the MD simulations was extracted from the simulation energies using Desmond.
The frame closest to the average RMSD was used as a representative structure for each MD simulation. The electrostatic distributions were calculated using PDB2PQR server and Pymol with the APBS plugin and visualized with contour levels positive (+3) and negative (−3). The time series representations of ssDNA were generated with program VMD using 2000 frames as time step (total 20,000 frames for each MD trajectory). All other structural graphics were made using the program PyMol.
Supporting information-This article contains supporting information.
Funding and additional information-W. M. is supported in part by the NIH office of intramural training and education's Intramural AIDS Research Fellowship. For W. M. and H.M., this project has been funded in whole or in part with federal funds from the National Cancer Institute, National Institutes of Health, under contract HHSN26120080001E. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does it mention of trade names, commercial products, or organizations imply endorsement by the