Crystal Structure of the DNA Deaminase APOBEC3B Catalytic Domain

Background: APOBEC3B-catalyzed DNA cytosine deamination causes mutations in cancer.
Results: We present the first APOBEC3B catalytic domain crystal structures, including a dCMP-bound form.
Conclusion: A closed active site conformation distinguishes APOBEC3B from related enzymes and suggests that conformational changes are central to the overall single-stranded DNA binding mechanism.
Significance: These high resolution structures provide a foundation for inhibitor development.

Functional and deep sequencing studies have combined to demonstrate the involvement of APOBEC3B in cancer mutagenesis. APOBEC3B is a single-stranded DNA cytosine deaminase that functions normally as a nuclear-localized restriction factor of DNA-based pathogens. However, it is overexpressed in cancer cells and exhibits an intrinsic preference for 5′-TC motifs in single-stranded DNA, the most frequently mutated dinucleotide in breast, head/neck, lung, bladder, cervical, and several other tumor types. In many cases, APOBEC3B mutagenesis accounts for the majority of both dispersed and clustered (kataegis) cytosine mutations. Here, we report the first structures of the APOBEC3B catalytic domain in multiple crystal forms. These structures reveal a tightly closed active site conformation and suggest that substrate accessibility is regulated by adjacent flexible loops. Residues important for catalysis are identified by mutation analyses, and the results provide insights into the mechanism of target site selection. We also report a nucleotide (dCMP)-bound crystal structure that informs a multistep model for binding single-stranded DNA. Overall, these high resolution crystal structures provide a framework for further mechanistic studies and the development of novel anti-cancer drugs to inhibit this enzyme, dampen tumor evolution, and minimize adverse outcomes such as drug resistance and metastasis.

APOBEC3B (A3B) is a member of a larger family of zinc-dependent DNA deaminases that convert cytosine bases into uracils in single-stranded DNA and RNA (reviewed in Refs. 1 and 2). Seven APOBEC3 (A3) enzymes, A3A/B/C/D/F/G/H, encoded by a tandem gene array on human chromosome 22, provide overlapping protection against a variety of DNA-based parasites including viruses and transposable elements (reviewed in Refs. 3 and 4). A3B is the only family member that is predominantly nuclear throughout the cell cycle, except during M phase, when it is excluded from metaphase chromosomes; after cell division and formation of daughter cell nuclei, it is actively reimported into the nuclear compartment (5,6). Two related enzymes shuttle transiently into the nuclear compartment to reach biologically relevant nucleic acid substrates. Activation-induced deaminase (AID) is transported into the nucleus to deaminate immunoglobulin gene DNA and drive the antigen-driven processes of class switch recombination and somatic hypermutation (reviewed in Refs. 7 and 8). APOBEC1 shuttles into the nucleus to deaminate a variety of RNA substrates including cytosines in the coding region of mRNAs (such as the APOB mRNA), as well as a wide variety of noncoding regions (reviewed in Refs. 9 and 10). A3B is thought to have retained the nuclear import mechanism of an ancestral AID/APOBEC1 enzyme but lost the capacity to be exported actively into the cytoplasm from the nuclear compartment (11-13).
In contrast to its beneficial function in innate immunity, a variety of studies have combined to demonstrate a role for A3B in cancer mutagenesis (reviewed in Refs. 14-16). First, A3B is up-regulated in breast, head/neck, lung, bladder, cervical, and several other cancers, and up-regulation correlates with higher frequencies of cytosine mutation (17-21). Second, the patterns of both the dispersed and clustered (kataegis) cytosine mutations that dominate the mutational landscapes in these cancers in vivo closely resemble the intrinsic biochemical preferences of A3B-catalyzed DNA cytosine deamination in vitro (17-19, 21-23). Specifically, the most frequently mutated cytosines in these cancers are in 5′-TCA, 5′-TCG, and 5′-TCT trinucleotide motifs, which resemble the actual biochemical preferences of recombinant A3B (5′-TCA = 5′-TCG > 5′-TCT) (17,21). Third, high A3B mRNA expression levels correlate with poor outcomes for estrogen receptor-positive breast cancer (24,25), renal cancer (26), and multiple myeloma (27). Fourth, the spectrum of activating mutations in PIK3CA is biased toward a potential A3B deamination hot spot in A3B-high, HPV-infected head/neck tumors, in comparison to A3B-low, virus-negative tumors (20). Taken together, these and other studies demonstrate that A3B is a significant endogenous source of genomic mutation in cancer, which promotes tumor evolution, heterogeneity, and poor clinical outcomes and is induced by some types of viral infection.
Given this role in cancer mutagenesis, A3B has become a promising target for anti-cancer drug development. The attractiveness of A3B is further enhanced by the fact that it is a nonessential enzyme (28). However, progress toward developing methods to inhibit A3B mutagenesis will require high resolution crystal structures, and to date, only homology models are possible because this enzyme has resisted structural determination (i.e. only structures of related human DNA deaminases have been reported (29-40)). Here we present multiple crystal structures of the A3B catalytic domain, including one bound by dCMP, which represents the first nucleotide-bound structure of any polynucleotide cytosine deaminase. These structures reveal an active site pocket occluded by loop regions, which highlights the importance of conformational flexibility in the ssDNA deamination mechanism.

Experimental Procedures
Expression Constructs and Sequence Alignments-The bacterial expression plasmids used in this study were generated by inserting a codon-optimized gene for the full-length human A3B (residues 1-382) or A3B C-terminal domain (A3Bctd, residues 187-378) between the NdeI/XhoI sites of the pET24a vector (Novagen) for C-terminally His6-tagged proteins or between the BsaI/XbaI sites of the pE-SUMO vector (Lifesensors) for the N-terminally SUMO-tagged proteins. The A3B protein sequence used in this study matches UniProt ID Q9UH17. Derivatives encoding amino acid substitution mutants such as the quadruple mutant (QM) of A3Bctd were constructed by standard site-directed mutagenesis procedures and verified by DNA sequencing (University of Minnesota Genomics Center).
Protein Purification-A3Bctd-QM (F200S/W228S/L230K/F308K), A3Bctd-QMΔloop3, and its variant A3Bctd-QMΔloop3-A3Aloop1 were expressed with a noncleavable C-terminal His6 tag (LEHHHHHH) using the pET24a-based expression vector in Escherichia coli strain C43(DE3)pLysS. A3Bctd-DM (L230K/F308K) was expressed as a SUMO fusion. Protein expression was induced by adding isopropyl β-D-1-thiogalactopyranoside at a final concentration of 0.5 mM to the E. coli culture in LB medium during the mid-log growth phase, supplemented with 100 μM zinc chloride at the time of induction. E. coli cells were collected after ~16 h of incubation at 18°C and lysed by sonication, and soluble protein was purified by nickel affinity and Superdex 200 size exclusion chromatography. The purified proteins suspended in 20 mM Tris-HCl (pH 7.4), 0.5 M NaCl, and 5 mM β-mercaptoethanol produced a single peak in size exclusion chromatography corresponding to a monomer (see "Results"). Additional zinc was not included in buffers during purification, but the purified protein contained a stoichiometric amount of zinc as confirmed by a colorimetric zinc quantification assay (Abcam). For A3Bctd-DM, the SUMO tag was removed by Ulp1 treatment prior to the size exclusion chromatography step. C-terminally myc-tagged APOBEC3A was expressed in 293T cells and purified as described (39,41,42).
Crystallization and Structure Determination-A3Bctd-QMΔloop3 was concentrated to ~20 mg/ml by ultrafiltration and crystallized by the hanging drop vapor diffusion method, using a reservoir solution including 20% polyethylene glycol 3,350 and 0.1 M MES-NaOH buffer (pH 6.5). Crystals grew as clusters of fine needles. Both the orthorhombic and monoclinic space group crystals were obtained in essentially the same condition, and they had similar crystal morphologies. Isolated crystals were cryoprotected by transfer into a reservoir solution supplemented with 25% glycerol or ethylene glycol and flash-cooled in liquid nitrogen. X-ray diffraction data were collected at the Advanced Photon Source NE-CAT Beamlines 24-ID-C and 24-ID-E and processed using HKL2000 (43) or XDS (44). The structures were solved by molecular replacement phasing using a previously reported A3Gctd crystal structure (39) (PDB code 3V4K) as the search model, with the program PHASER (45). Model building and refinement were done using the programs COOT (46) and PHENIX (47), respectively. The nucleotide (dCMP)-bound structure of A3Bctd-QMΔloop3 (PDB code 5CQH) was obtained by soaking the crystal in the orthorhombic crystal form in 200 mM dCMP during cryoprotection with ethylene glycol. The glycerol-bound structure (PDB code 5CQI) was obtained by soaking the crystal with a small fragment of ssDNA (5-mer) during cryoprotection with glycerol, although electron density was not observed for the ssDNA. The glycerol/PEG-bound structure (PDB code 5CQK) was obtained by cryoprotection with glycerol, without any nucleotide ligands. The dCMP and PEG bound to overlapping sites on the protein surface. Summaries of x-ray diffraction data and model refinement statistics are listed in Table 1. Active site volumes were estimated using Sitemap (48).
Rifampicin Resistance Mutation Assay-E. coli strain C43(DE3) was transformed with each plasmid and plated on LB plates containing either ampicillin (100 μg/ml) or kanamycin (50 μg/ml). The rpoB forward mutation assay was performed as described (49,50). Briefly, single colonies were picked and inoculated into 1 ml of LB with the appropriate antibiotic. Cultures were grown to an A600 of 0.4-0.6 followed by plating 0.1-0.5 ml of neat culture onto LB rifampicin (100 μg/ml) or appropriate dilutions onto LB plates. After an overnight incubation, colonies were counted, and mutation frequency was calculated by dividing the number of rifampicin-resistant colonies per ml by the total number of cells per ml. The sequence preference of rpoB mutagenesis was determined by colony PCR using 5′-TTGGCGAAATGGCGGAAAACC and 5′-CACCGACGGATACCACCTGCTG, treatment of the PCR products with shrimp alkaline phosphatase and Exonuclease I (New England Biolabs, Ipswich, MA), and sequencing using the forward rpoB primer. Mutations were analyzed using Sequencher version 5.0.1 (GeneCodes, Ann Arbor, MI). Most constructs tested in the rpoB assay were SUMO-fused (pE-SUMO vector) because they elicited slightly higher mutation frequencies than the corresponding His6-tagged constructs (pET24a vector), except for A3AL1, for which the SUMO fusion construct could not be generated because of toxicity in E. coli.
In Vitro DNA Deamination Assay-800 nM oligonucleotide 5′-ATTATTATTATTCAAATGGATTTATTTATTTATTTATTTATTT-fluorescein was treated with purified proteins for 1 h at 37°C, followed by treatment with 0.06 units of uracil-DNA glycosylase (New England Biolabs) for 10 min at 37°C, and treatment with 100 mM NaOH for 10 min at 95°C (39,42). Reaction products were separated by 15% denaturing PAGE and scanned on a Typhoon FLA 7000 imager (GE Healthcare Life Sciences).
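The mutation frequency calculation described for the rpoB assay can be illustrated with a minimal Python sketch. The function name, its parameters, and all colony counts below are hypothetical and serve only to show the arithmetic (rifampicin-resistant colonies per ml divided by total viable cells per ml, summarized by the median across cultures); they are not data from this study.

```python
from statistics import median


def mutation_frequency(rif_colonies, rif_volume_ml,
                       viable_colonies, viable_volume_ml, dilution_factor):
    """Rifampicin-resistant colonies per ml divided by viable cells per ml.

    dilution_factor is the fold dilution applied before plating on
    nonselective LB (for example 1e6 for a 10^-6 dilution).
    """
    rif_per_ml = rif_colonies / rif_volume_ml
    viable_per_ml = viable_colonies * dilution_factor / viable_volume_ml
    return rif_per_ml / viable_per_ml


# Hypothetical counts for three independent cultures of one construct:
# (rif colonies, ml plated neat, viable colonies, ml plated, fold dilution)
cultures = [
    (50, 0.1, 200, 0.1, 1e6),
    (80, 0.1, 250, 0.1, 1e6),
    (30, 0.1, 150, 0.1, 1e6),
]
frequencies = [mutation_frequency(*c) for c in cultures]
median_frequency = median(frequencies)
print(frequencies, median_frequency)
```

Reporting the median rather than the mean limits the influence of occasional "jackpot" cultures in which a resistance mutation arose early during growth.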
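As a hedged illustration of how this substrate reports on deamination, the Python sketch below locates the cytosine in the oligonucleotide, reads its trinucleotide context, and counts the nucleotides on its 3′ side. It assumes the hyphen inside the printed sequence is a line-break artifact, and it uses a simplified view in which uracil excision followed by alkaline cleavage leaves the 3′ fluorescein-labeled fragment downstream of the target base. The variable names are illustrative.

```python
# Substrate from the in vitro assay (fluorescein is on the 3' end).
# Assumption: the internal hyphen in the printed sequence is a line-break
# artifact, so the 3' tail is six ATTT repeats.
oligo = "ATTATTATTATT" + "CAAATGG" + "ATTT" * 6

# 0-based positions of every cytosine in the substrate.
cytosine_positions = [i for i, base in enumerate(oligo) if base == "C"]

# Trinucleotide context (5'-NCN-3') of the first cytosine.
target = cytosine_positions[0]
context = oligo[target - 1:target + 2]

# Simplified product model: the labeled fragment is everything 3' of the
# deaminated base once the abasic site has been cleaved.
labeled_fragment_length = len(oligo) - (target + 1)

print(len(oligo), cytosine_positions, context, labeled_fragment_length)
```

Under these assumptions the substrate contains a single cytosine in a 5′-TCA context, so product formation reflects deamination at one defined site.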

Results
Solubility and Activity of the A3B Catalytic Domain-The full-length human A3B enzyme consists of two tandem zinc-coordinating domains: the N-terminal domain with a pseudoactive site and the C-terminal domain with catalytic activity (51). Initial purification attempts with full-length A3B did not succeed because it is expressed at low levels and is poorly soluble. We therefore created a deletion construct that retained residues 187-378 to focus on characterizing the catalytic domain (A3Bctd) (5, 11, 12, 52). These deletion boundaries were informed by comparisons with the related enzymes A3G and A3F, where analogous approaches led to successful catalytic domain structures (29-31, 33, 37-39).
The E. coli-based rifampicin resistance mutation assay was used to assess DNA cytosine deaminase activity (49). The median number of rifampicin-resistant mutants for each condition provides a quantitative comparison of the mutation frequency relative to control plasmid expressing cells. Single colonies were grown overnight, and then aliquots of saturated cultures were plated on rifampicin-containing plates to select for rpoB mutants and additional aliquots were diluted and plated on rich growth medium to assess viability. The A3Bctd construct elicited a slightly higher mutation frequency than full-length A3B (residues 1-382), and E. coli cells expressing either of these constructs showed mutation frequencies over 100-fold higher than cells with the vector control or expressing a catalytic mutant (E255A) (Fig. 1A). These data demonstrated that the C-terminal domain alone is sufficient for DNA editing.
We next focused on improving A3Bctd solubility in E. coli by introducing four amino acid substitutions: F200S, W228S, L230K, and F308K (QM). Three of these substitutions (F200S, W228S, and L230K) changed A3Bctd residues to the corresponding sequence of the homologous A3A protein, which recently yielded to structural studies (34,40). F308K mimics the F302K and F310K substitutions used in structural studies of A3Fctd and A3Gctd, respectively (29-31, 33). In addition, to reduce conformational entropy and facilitate crystallization, we replaced the region spanning Ala-242 to Tyr-250, which corresponds to the flexible loop 3 of A3A and A3G (53), with a single serine residue (designated Ser-250 in the structural coordinates and from here on). The resulting construct, A3Bctd-QMΔloop3, was confirmed to be active by eliciting an rpoB mutation frequency nearly 100-fold higher than background and only modestly reduced compared with the parental construct (Fig. 1B).
Next, individual rifampicin-resistant mutants were subjected to high fidelity colony PCR and DNA sequencing. Importantly, the distribution of cytosine mutations within the rpoB gene was similar for full-length A3B (wild type), A3Bctd, and A3Bctd-QMΔloop3, demonstrating again that the C-terminal domain governs catalytic activity and intrinsic local deamination preferences and that the amino acid changes required for solubility did not compromise these important properties (Fig. 1C). The major transition mutation hot spot for A3B activity in rpoB is a 5′-GTCG (C1585), consistent with the previously reported in vitro preferences (17,21). In addition, purified A3Bctd-QMΔloop3 showed a monodisperse distribution by size exclusion chromatography, a single band by SDS-PAGE, and ssDNA deaminase activity in vitro (Fig. 1, D and E).
A3B Catalytic Domain Crystal Structures-A3Bctd-QMΔloop3 crystallized in two different crystal forms, and molecular replacement was used to determine the structures (Fig. 2 and Table 1). The three structures determined in the orthorhombic space group (P2₁2₁2₁) were different only in terms of bound ligands. The structure refined at 1.73 Å resolution was bound by a single dCMP nucleotide and four ethylene glycol molecules. The 1.68 Å resolution structure had ordered glycerol molecules bound inside and near the active site pocket. The 1.88 Å resolution structure had glycerol in the active site, as well as an ordered partial PEG chain on the protein surface. The structure refined in the monoclinic space group (P2₁) also showed an ordered glycerol molecule occupying the active site pocket of both of the protein molecules in the asymmetric unit.
The refined atomic models of A3Bctd-QMΔloop3 are very similar between the two crystal forms and between the two crystallographically independent molecules in the monoclinic space group (Fig. 2A). The backbone root mean square deviations for all pairwise comparisons are smaller than 0.5 Å. The overall architecture of A3Bctd-QMΔloop3 closely resembles those of other A3 members and consists of a single-layered, five-stranded β-sheet surrounded by six α-helices and connecting loops (Fig. 2A). As observed for the closely related Z1-type deaminase domains A3A and A3Gctd, β2 on the exposed edge of the β-sheet is interrupted by a short 3₁₀ helical segment (helix 1 in Fig. 2A; A3A (34,40); A3Gctd (30,31,33,37,39)). Correspondingly, the C-terminal end of α2, which interacts with this helical segment, is bent toward the discontinuous β2 (Fig. 2A). Although functional dimerization has been proposed for closely related A3 proteins A3A (40) and A3Gctd (33), the A3Bctd crystal structures showed no evidence for oligomerization through homologous motifs. For instance, the interfaces appear mostly distinct between the two crystal forms, with the only recurring crystallographic dimer interface observed in both crystal forms facilitated by Ser-250 hydrogen bonding (the residue that had been introduced to replace loop 3). Moreover, at high protein concentrations, A3Bctd-QMΔloop3 behaved as a monomer in size exclusion chromatography (Fig. 1D). Our data combine to indicate that the A3Bctd is active as a monomer, although we have not addressed the possibility that it might oligomerize upon substrate engagement, and it is notable that full-length protein is capable of forming higher order oligomers in a cellular context, most likely through its insoluble N-terminal domain (54).

NOVEMBER 20, 2015 • VOLUME 290 • NUMBER 47 JOURNAL OF BIOLOGICAL CHEMISTRY 28123

Closed Conformation of the A3B Active Site-In the structure of A3Bctd-QMΔloop3, a single zinc ion is bound between the N-terminal ends of α2 and α3, coordinated by two cysteines (Cys-284 and Cys-289) and a histidine (His-253; Fig. 2, A-C). The mode of zinc coordination resembles those of other A3 members and zinc-dependent nucleoside/mononucleotide deaminase enzymes. The single zinc ion in A3Bctd-QMΔloop3 is partially accessible to solvent through a deep active site pocket (Fig. 3, A-E). The exposed face of the zinc ion interacts with a water molecule serving as the fourth ligand of zinc coordination in the glycerol-bound active site (described below) or, alternatively, is bound by an ethylene glycol molecule when it was used as the cryoprotectant (Figs. 2, B and C, and 3, A-D).
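The backbone root mean square deviation used to compare the refined models can be illustrated with a short Python sketch. It assumes the two coordinate sets are already superposed and listed atom for atom in the same order; it does not perform the superposition itself. The function name and the coordinates are hypothetical toy values, not atoms from the deposited structures.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two pre-superposed, equally
    ordered lists of (x, y, z) coordinates, in the input units."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical backbone atom positions (in angstroms) from two models.
model_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
model_b = [(0.3, 0.0, 0.0), (1.5, 0.3, 0.0), (3.0, 0.0, 0.3)]

value = rmsd(model_a, model_b)
print(round(value, 2), value < 0.5)
```

In practice a least-squares superposition would be applied before this calculation, which is why the sketch states its pre-alignment assumption explicitly.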
This water or ethylene glycol molecule is also hydrogen-bonded to Glu-255 in the bottom of the pocket; the observed geometry supports a catalytic mechanism in which Glu-255 activates the water molecule for nucleophilic attack of the C4 position of cytosine, and the zinc ion directly coordinates and stabilizes the transition state of the deamination intermediate. In the active site pocket, several additional well ordered water molecules were also observed (Fig. 3, A and B) or, depending on the cryoprotectant used, a well ordered glycerol molecule (Figs. 2C and 3, C and D). The glycerol molecule bound in the pocket makes either direct or water-mediated hydrogen bonds with the polar side chains (Thr-214 and Asn-240) or main chain groups (Arg-211 carbonyl and Cys-284 amide) lining the active site pocket, which may mimic interactions made by the cytosine substrate.
Previous structural and functional studies of A3 proteins have demonstrated that loop 1 (between α1 and β1) and loop 7 (between β4 and α4) surround the active site and suggested that these flexible regions play important roles in binding single-stranded DNA substrates (30, 53, 55-58). The structures of A3Bctd-QMΔloop3 in both crystal forms show a tightly closed conformation of the active site compared with those observed for other A3 proteins (Fig. 3, A and C). In particular, Arg-211 from loop 1 and Tyr-315 from loop 7 stack on each other to stabilize a collapsed conformation of these loops. The Arg-211 and Tyr-315 side chains in turn stack on Tyr-313, which is positioned over the active site pocket. In addition, the short bulge connecting the two zinc-coordinating cysteines (Cys-284 and Cys-289) that precedes α3 is positioned proximal to the active site pocket, where Ser-286 and Trp-287 form a hydrogen bond and van der Waals contacts, respectively, with Asp-316 and Arg-211 to further stabilize the closed conformation of loops 1 and 7. The tightly packed side chains around the active site leave a limited opening for single-stranded DNA cytosines to enter the active site pocket, and this does not appear large enough for a nucleotide substrate to pass through (Fig. 3, A and C). This closed conformation results in an active site pocket of ~104 Å³ in comparison with the larger active site volumes of 272 Å³ for A3A and 184 Å³ for A3Gctd (Fig. 3E). Thus, substrate binding into the A3B active site is likely to require considerable rearrangement of the surrounding residues from the closed conformation observed in our A3Bctd-QMΔloop3 crystal structures into a more open and catalytically active conformation.
Roles of A3B Active Site Residues-To probe roles of the amino acid residues around the active site mentioned above, we tested the deaminase activities of mutant A3B proteins using the E. coli-based mutation assay. The amino acid substitutions were introduced into the A3Bctd-L230K/F308K background, because it had already proven to be one of the most active constructs (Fig. 1B). Of all amino acid substitutions tested, Y313A showed the strongest effect, with a complete loss of activity (Fig. 4A). On the other hand, Y313F caused no change of mutation frequency, suggesting that the aromatic moiety of Tyr-313, but not the hydroxyl group of the tyrosine side chain, is critical for function. Tyr-313 from loop 7 is conserved across most A3 proteins, and aromatic residues at the equivalent position in various mononucleotide/nucleoside cytosine deaminases, provided in trans (intermolecularly) in some cases, have been observed to make direct contact with the pyrimidine moiety occupying the active site pocket (59,60). Substitution of Arg-211 in loop 1, another well conserved residue among A3 members, with alanine caused a strong but incomplete reduction of mutation frequency (Fig. 4A). W287A did not affect the mutation frequency, suggesting that Trp-287 is not an essential residue for the catalytic activity of A3B. This result, however, is not too surprising because Trp-287 and the adjacent Gly-288 are particular to A3B and the related subfamily member A3A, suggesting that these amino acid residues may have a function beyond substrate binding and catalysis.
We also examined the mutagenic activity of chimeric A3Bctd proteins (Fig. 4B). Replacing loop 7 residues of A3Bctd with the corresponding sequence from A3G (GL7) only slightly reduced mutation frequency, indicating that the precise identities of loop 7 residues are not essential for catalytic activity. Although A3B and A3G residues around the active site are generally well conserved, their loop 1 and 7 sequences differ significantly, likely reflecting the different ssDNA substrate sequence preferences of these enzymes (elaborated further below). A3A and A3B have essentially identical sets of active site residues, except that their loop 1 sequences differ. Replacing A3B loop 1 with the corresponding residues from A3A led to enhancement of mutation frequency in the rpoB assay and to significantly higher in vitro deaminase activity of the purified proteins (Fig. 4, B and C). This improved activity relative to the A3Bctd-QMΔloop3 parental construct is consistent with the more open active site conformation reported recently for A3A (34,40) (Fig. 3E).
A3B Sequence Preferences-The deamination target sequence profiles of various A3B constructs were examined to gain insights into the determinants of sequence selectivity. Wild-type A3B and A3Bctd preferentially deaminated a single 5′-GTCG-3′ hot spot motif in the rpoB gene (Figs. 1C and 4D), consistent with previous reports for recombinant A3Bctd in vitro (17,21). Thus, the N-terminal domain of A3B is not important for activity or intrinsic target selectivity. The crystallized protein A3Bctd-QMΔloop3 lacking the flexible loop 3 showed the same target preference as the wild-type protein, suggesting that loop 3 is not essential for either catalytic activity or local target sequence selectivity of A3B (Figs. 1C and 4D). The more active protein, A3Bctd-QMΔloop3-A3AL1, which has loop 1 from A3A, also showed a similar 5′-TC-3′ sequence preference (Fig. 4D).
Previous studies have demonstrated the importance of loop 7 residues in sequence selectivity of related A3 family members, AID, A3A, A3F, and A3G (53, 55-58). Consistent with these reports, replacing loop 7 of A3B with that from A3G (GL7) converted the target sequence preference to 5′-CC-3′ preferred by A3G (cytosine 1691 in rpoB; Fig. 4D). Interestingly, a single amino acid substitution, D314R, at a residue conserved in loop 7 among A3A (Asp-131), A3B (Asp-314), and A3G (Asp-316), was sufficient to change the preference at the −1 position from T to C (Fig. 4D). Although the precise mechanism of this change in selectivity is unknown, the observation underscores the importance of A3B loop 7 in target sequence selectivity. Another amino acid substitution within loop 7, Y315A, did not cause a clear conversion of target sequence preference but appeared to relax it somewhat (Fig. 4D).
Nucleotide Binding Site and an A3B-ssDNA Model-Soaking the A3Bctd-QMΔloop3 crystal in the orthorhombic space group with dCMP in an attempt to capture the nucleotide in the active site yielded no additional electron density in the active site pocket. Unexpectedly, however, we observed clear electron density at 1.7 Å resolution for the nucleotide bound at a positively charged surface close to but outside of the active site (Fig. 5, A-C). The dCMP molecule lies flat on the protein surface; it is bound at this position through hydrogen bonding of the 5′-phosphate group with Tyr-319 and Lys-320 side chains from α4 and cation stacking between the pyrimidine ring and the Arg-372 side chain from α6. The guanidinium group of Arg-372 forms a salt bridge with Asp-314. In addition to these interactions, the 3′-OH group of dCMP is hydrogen-bonded to a side chain from a symmetry-related protein molecule as part of the crystallographic lattice. The observed interactions do not appear sequence (cytosine)-specific.
We reasoned that the bound dCMP molecule may represent a nucleotide 5′ to the target cytosine within a ssDNA. Based on the orientation of the bound dCMP and its distance from the likely position of the deamination target cytosine and the likelihood that loop 7 residues located between the dCMP and the active site dictate the identity of the −1 base, we further reasoned that this dCMP represents the −3 base in a typical single-stranded DNA substrate (i.e. −3 relative to a 0 position target cytosine). Thus, we built a hypothetical model of an A3B-ssDNA complex by connecting the crystallographically observed nucleotide, which occupies the −3 position, and a nucleotide in the active site (0 position). The nucleotide in the active site pocket was modeled based on superposition of a 2′-deoxycytidylate deaminase crystal structure (PDB code 1VQ2 (59)) with ligand bound in the active site onto the A3Bctd-QMΔloop3 crystal structure. As described above, the A3Bctd-QMΔloop3 crystal structures show a closed conformation not compatible with substrate binding into the active site pocket. Thus, to accommodate the ssDNA substrate including the intervening nucleotides (at the −1 and −2 positions), the conformation of loop 7 was modified to mimic the more open state observed in the A3Gctd crystal structure (Fig. 3E) based on superpositioning of the two structures, followed by energy minimization to regularize geometry.
In the resulting ssDNA-A3B model (Fig. 5D), the cytosine base in the active site pocket stacks on the zinc-coordinating residue His-253, whereas the C2 carbonyl group of the pyrimidine base potentially interacts with Asn-240. Tyr-313 stacks on the thymine base at the −1 position, which fits tightly within a limited space surrounded by aromatic side chains that would only accommodate a pyrimidine base. This thymine (−1) base is modeled so that its C5 methyl group makes a van der Waals contact with the cytosine base in the active site, whereas its C4 carbonyl group is positioned close to Tyr-315. When A3G with the 5′-CC-3′ target preference binds its cognate sequence, the C4 amino group of the cytosine base at this (−1) position could hydrogen bond with the aspartate side chain at the corresponding position (Asp-317), which has been proposed to serve as a helix-capping residue to help define the cytosine-binding pocket (53). Arg-211 interacts with a backbone phosphate of the DNA. The ssDNA shown in Fig. 5D has thymine at −2 and adenine at +1 positions, but the model would accommodate other sequences at these positions as well. The possible significance of this hypothetical ssDNA binding mechanism is discussed below.
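The size difference between the active site pockets can be made concrete with a short Python sketch that expresses each reported volume relative to the A3Bctd pocket. The volumes are those quoted in the text (the A3Bctd value is approximate); the dictionary and variable names are illustrative.

```python
# Active site pocket volumes quoted in the text, in cubic angstroms.
volumes = {"A3Bctd": 104.0, "A3A": 272.0, "A3Gctd": 184.0}

# Fold difference of each pocket relative to the closed A3Bctd pocket.
relative_to_a3b = {
    name: round(volume / volumes["A3Bctd"], 2)
    for name, volume in volumes.items()
}
print(relative_to_a3b)
```

By this measure the A3A pocket is roughly 2.6-fold larger and the A3Gctd pocket roughly 1.8-fold larger than the closed A3Bctd pocket.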
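The dinucleotide preferences discussed above (5′-TC versus 5′-CC) can be tallied from sequenced mutations with a sketch like the following. It is a minimal illustration in Python: the sequence and mutated positions are hypothetical toy values, not the rpoB gene, and the function names are assumptions. A mutated G on the sequenced strand is treated as a deaminated cytosine on the opposite strand.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def dinucleotide_context(seq, pos):
    """Return the 5'-NC-3' context of a deaminated cytosine.

    pos is a 0-based index on the given strand. If the base at pos is G,
    the deaminated cytosine lies on the opposite strand and the context
    is read from the reverse complement.
    """
    base = seq[pos]
    if base == "G":
        seq = reverse_complement(seq)
        pos = len(seq) - 1 - pos
    elif base != "C":
        raise ValueError("position is not a C:G base pair")
    if pos == 0:
        raise ValueError("no -1 base is available at the 5' end")
    return seq[pos - 1:pos + 1]


# Hypothetical sequence and mutated positions, for illustration only.
toy_gene = "AGTCGAACCGT"
mutated_positions = [3, 8, 4]

tally = Counter(dinucleotide_context(toy_gene, p) for p in mutated_positions)
print(tally)
```

A tally dominated by TC would be consistent with an A3B-like preference, whereas enrichment of CC would resemble the A3G-like preference seen with the GL7 and D314R constructs.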
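The position numbering used in this model (target cytosine at 0, 5′ neighbors negative, 3′ neighbors positive) can be sketched in Python as follows. The five-nucleotide example places thymine at −2 and −1 and adenine at +1, as in the modeled ssDNA; the base at −3 is written as C only because the soaked nucleotide was dCMP, and the text notes that this contact does not appear base-specific. The function name is hypothetical.

```python
def label_positions(ssdna, target_index):
    """Map each base of a 5'-to-3' ssDNA string to its position relative
    to the target cytosine, which is defined as position 0."""
    if ssdna[target_index] != "C":
        raise ValueError("the target position must be a cytosine")
    return {i - target_index: base for i, base in enumerate(ssdna)}


# Illustrative substrate: -3 C, -2 T, -1 T, target C (0), +1 A.
positions = label_positions("CTTCA", target_index=3)
print(positions)
```

With this convention the −1 base is the one read by loop 7 in the model, and the −3 base corresponds to the crystallographically observed nucleotide.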

Discussion
We have generated a highly soluble A3B catalytic domain construct (A3Bctd-QM⌬loop3) and determined its high resolution crystal structures in two crystal forms with distinct molecular packing environments. Importantly, A3Bctd-QM⌬loop3 is catalytically active and has the same target DNA sequence preference as the wild-type enzyme, ensuring functional relevance of the observed structural features. Our crystal structures reveal a closed active site conformation that is likely to prevent substrate entry into the active site pocket. The high similarity between the structures of three crystallographically independent molecules, across two crystal forms, suggests that the observed closed conformation is the most thermodynamically stable state of the enzyme (Fig. 2). The closed conformation is stabilized by a stacking interaction between Arg-211 and Tyr-315 from loops 1 and 7, respectively. Previous studies together with the mutation analyses presented here demonstrate that these loop regions are important for catalytic activity and dinucleotide target sequence selectivity, respectively, strongly implicating direct interactions with ssDNA substrates (30,53,(55)(56)(57)(58). Based on these observations, we hypothesize that the binding of a ssDNA substrate carrying the preferred 5Ј-TCG or 5Ј-TCA target sequence triggers conformational changes of structural elements including loops 1 and 7, which in turn stabilize the open conformation of the active site and allow entry of the target cytosine base into the active site pocket. Together with the fact that related enzymes, A3A and A3Gctd, have been crystallized in more open states (Fig. 3E) (33,39,40), the closed conformation of our A3Bctd structures suggests that the transition between the two states may be regulated in a manner that maximizes innate antiviral activity and minimizes damage to genomic DNA.
Currently, no nucleotide-or polynucleotide-bound structures exist for any member of the APOBEC family from humans or any other species (i.e. AID, APOBEC1, APOBEC2, APOBEC3A-H, and APOBEC4). Thus, the high resolution, dCMP-bound structure of A3BctdQM⌬3 is of interest as a starting point and model to guide future studies (Fig. 5). The single dCMP molecule binds the A3Bctd surface and makes several contacts with the enzyme, but it is also assisted by crystal lattice contacts involving symmetry-related molecules (see PDB entry 5CQH), and thus the significance of this result should be interpreted conservatively. Nonetheless, the general location and orientation of the bound nucleotide appear consistent with previous models of A3 proteins binding ssDNA substrates (30,34,37,38,61,62). A3Gctd and A3A NMR chemical shift perturbation experiments, as well as mutational analyses of A3Fctd and A3Gctd, although differing in details, all suggest ssDNA binding paths spanning the catalytic domain surface and extending from the active site toward the ␣-helix 6, which overlaps with the crystallographically observed dCMP interacting with Arg-372 from the ␣-helix 6 and Tyr-319/Lys-320 from the N terminus of ␣-helix 4 (A3Bctd data in Fig. 5; prior studies with other family members (30,34,37,38,61,62)). Curiously, this nucleotide-binding site on the A3Bctd surface is accessible even when the active site is in the closed conformation. Therefore, this site could serve as an initial contact point to recruit ssDNA substrates, which in turn may allow downstream nucleotides to form more sequence-specific interactions with ligands in loops 1 and 7 around the active site. The nucleotide bound at this position may remain engaged during catalysis (as in Fig. 5D) or, alternatively, may transition to a more stable binding mode concomitant with conformational changes of the active site as discussed above. 
In either scenario, we propose a model in which ssDNA binding by A3Bctd entails at least two distinct steps: initial nonspecific binding that involves the exposed binding site, followed by more sequence-specific, stable interactions that require conformational changes of the loops adjacent to the active site pocket.
Comprehensive future experiments will be required to test this model rigorously and rule out potential alternatives, such as the nucleotide-binding site being a co-factor binding site, a protein-protein interaction motif, or a region of low functional relevance. Many scenarios are possible given precedents from functionally relevant ligand-bound structures of related enzymes including those of the human ADAR2/IP6 complex (PDB code 1ZY7 (63)), where the ligand serves as an essential co-factor to maintain the enzyme fold, the Staphylococcus aureus TadA/tRNA complex (PDB code 2B3 (64)), where an RNA polynucleotide substrate was captured in complex with the enzyme, and the E. coli cytidine deaminase/nucleoside complex (PDB code 1CTU (60)), the Bacillus subtilis cytidine deaminase/nucleoside complex (PDB code 1JTK (65)), and the bacteriophage T4 2′-deoxycytidylate deaminase/nucleotide complex (PDB code 1VQ2 (59)), where transition state analogs were captured in the active sites. By contrast, our observed nucleotide-binding site could be part of a larger ssDNA binding path, and accordingly, analogous future experiments may be even more challenging. In any event, high resolution crystal structures of the enzyme in complex with ssDNA substrates will be needed to unambiguously resolve the substrate binding mechanism.
A3B is a significant source of somatic mutations in multiple human cancers (reviewed in Refs. 14-16). Taken together with recently published clinical outcome data (24-27), this makes it likely that A3B causes a chronic mutator phenotype that drives tumor evolution and facilitates mutation-dependent hallmarks of cancer such as drug resistance and metastasis. Our high resolution crystallographic studies provide a framework for further mechanistic studies on this enzyme. Our structures are also likely to facilitate the development of inhibitors of this enzyme, which may be useful as mutation-suppressing adjuvants in combination with targeted cancer treatments. An A3B inhibitor could target the active site pocket to compete with substrate binding or allosterically block substrate entry by stabilizing the closed conformation of the active site.
Author Contributions-H. A. and R. S. H. designed the studies and wrote the manuscript. K. S. purified and crystallized A3B and determined its x-ray structures. M. A. C. performed E. coli mutation and in vitro editing experiments. K. K. assisted with mutant construction, protein purification, and crystallization. All authors analyzed the results and approved the final version of the manuscript.