Molecular basis for the faithful replication of 5-methylcytosine and its oxidized forms by DNA polymerase β

DNA methylation is an epigenetic mark that regulates gene expression in mammals. One method of methylation removal is through ten–eleven translocation–catalyzed oxidation and the base excision repair pathway. The iterative oxidation of 5-methylcytosine catalyzed by ten–eleven translocation enzymes produces three oxidized forms of cytosine: 5-hydroxymethylcytosine, 5-formylcytosine, and 5-carboxycytosine. The effect these modifications have on the efficiency and fidelity of the base excision repair pathway during the repair of opposing base damage, and in particular DNA polymerization, remains to be elucidated. Using kinetic assays, we show that the catalytic efficiency for the incorporation of dGTP catalyzed by human DNA polymerase β is not affected when 5-methylcytosine, 5-hydroxymethylcytosine, and 5-formylcytosine are in the DNA template. In contrast, the catalytic efficiency of dGTP insertion decreases ∼20-fold when 5-carboxycytosine is in the templating position, as compared with unmodified cytosine. However, DNA polymerase fidelity is unaltered when these modifications are in the templating position. Structural analysis reveals that the methyl, hydroxymethyl, and formyl modifications are easily accommodated within the polymerase active site. However, to accommodate the carboxyl modification, the phosphate backbone on the templating nucleotide shifts ∼2.5 Å to avoid a potential steric/repulsive clash. This altered conformation is stabilized by lysine 280, which makes a direct interaction with the carboxyl modification and the phosphate backbone of the templating strand. This work provides the molecular basis for the accommodation of epigenetic base modifications in a polymerase active site and suggests that these modifications are not mutagenically copied during base excision repair.

Epigenetic modifications have a significant role in maintaining cellular identity by regulating transcript levels of the human genome (1). Methylated cytosine (5mC) is one such epigenetic modification that consists of a methyl group addition to the fifth carbon of the cytosine base. DNA methyltransferases (DNMT1-3) catalyze this methylation of cytosine, usually within the context of CpG dinucleotide clusters or islands (2). In general, 5mC in CpG islands down-regulates gene expression by weakening transcription factor binding or by promoting repressor binding (3,4).
Cytosine methylation can be erased to allow for differential gene expression. This necessitates demethylation pathways that can either remove the methyl group from the cytosine base or remove the base and replace it with normal cytosine (active demethylation pathway). At this point, an enzyme capable of catalyzing the direct removal of a C5 methyl group from cytosine has not been identified. However, in vivo isotopic labeling studies have suggested the existence of such an enzyme (5). Nonetheless, the best-described demethylation pathway involves the removal and replacement of 5mC by action of ten-eleven translocation (TET) enzymes and base excision repair (BER) (6). Humans encode three TET enzymes that are nonheme Fe(II) α-ketoglutarate-dependent dioxygenases that have multiple domains (7,8). These enzymes oxidize the methyl group on 5mC iteratively, forming three intermediates: 5-hydroxymethyl cytosine (5hmC), 5-formyl cytosine (5fC), and 5-carboxyl cytosine (5caC) (9). The base excision repair enzyme thymine DNA glycosylase catalyzes removal of 5fC and 5caC from DNA, generating an abasic site product (10,11). Apurinic/apyrimidinic endonuclease 1 hydrolyzes the phosphodiester backbone of the abasic site, generating a 5′ deoxyribose phosphate flap in a one-nucleotide gap (12). DNA polymerase (pol) β removes the 5′ deoxyribose phosphate group using its lyase activity and catalyzes gap-filling DNA synthesis (13,14). Finally, the DNA ligase III-XRCC1 complex or DNA ligase I catalyzes ligation of the DNA backbone (15).
Although a role of 5mC and 5hmC in epigenetic gene regulation has been established, such a role for 5fC and 5caC has yet to be uncovered (16). However, 5fC and 5caC were detected in genomes from neuronal tissue and stem cells, and proteins specific for binding to 5fC and 5caC in DNA have been reported (17-20). Information on how the cytosine modifications might influence DNA repair pathways, such as BER, is relatively limited (21, 22). Furthermore, the potential for these modified cytosine bases to alter the activity and fidelity of DNA polymerases during repair and replication has not been quantitatively assessed, and conflicting reports have been published.
In general, these modifications in the template base position can have two effects on DNA polymerization. They may reduce the catalytic efficiency of dGTP insertion, and they may alter dNTP selectivity, leading to a decrease in fidelity. It has been hypothesized that the formyl and carboxyl modifications promote the imino tautomeric form of cytosine (23). The imino form of cytosine is potentially mutagenic because it can form a Watson-Crick base pair with adenine (24). However, analytical experiments have revealed that these modifications do not promote the imino tautomeric form of cytosine and call into question the potential of these bases to encode for adenine insertion into DNA (25,26).
Previous replication studies in Escherichia coli using a shuttle vector assay suggested that 5mC and its oxidized forms could induce C-to-T transition mutations at a low frequency (∼0.2-1.1%) (27). Another in vivo study, using COS-7 cells, found 5fC to induce C-to-T mutations at a frequency of 0.03-0.28% (28). Qualitative in vitro kinetic studies using DNA polymerases , , and Klenow exo and the DNA template base 5fC revealed a frequency of 1% for the insertion of dATP versus dGTP (24). Another qualitative kinetic study revealed that 5mC and its oxidized forms had no effect on insertion of dGTP or dATP with Klenow exo and human pol , as compared with unmodified templating cytosine (29).
To provide clarity to the disparate results observed with DNA polymerases regarding the templating properties of oxidized cytosine, we measured single-turnover and steady-state kinetics of nucleotide insertion by human DNA pol β. The results indicate that the oxidized forms of 5mC do not significantly alter the fidelity of pol β. A series of crystal structures reveals the molecular basis for the accommodation of these epigenetic modifications in the active site of pol β.

Results
We first measured the single-turnover rate constants of pol β-catalyzed nucleotide insertion with 1-nt gap DNA substrates containing C, 5mC, 5hmC, 5fC, or 5caC in the templating base position (Fig. 1). In these reactions, pol β and dGTP are at saturating concentrations, allowing for the determination of the observed rate constant, kpol. The kpol values are comparable among these varying templating base DNA substrates (Fig. 2 and Table 1), suggesting that the modifications do not alter the rate-limiting step of insertion. Although the kpol values are similar, the reaction amplitudes vary between the substrates (Fig. 2). Longer reaction time points revealed that a fraction of the 5fC (∼25%) and 5caC (∼40%) substrates was not extendable, suggesting that a population of these templates was altered.
To determine whether the cytosine modifications alter the fidelity of pol β, catalytic efficiencies were measured for correct (dGTP) and incorrect (dATP, dTTP, and dCTP) nucleotide insertions. The catalytic efficiency (kcat/Km) for nucleotide insertion is a measure of the productive associations between the binary complex (pol β-DNA) and dNTP.

Fig. 2 legend (fragment): … for templating C, 5mC, 5hmC, 5fC, and 5caC, respectively. The reported error is from the fit.

Table 1. Single-turnover kinetic summary for the insertion of dGTP opposite the indicated templating base. The data were collected as described under "Materials and methods" and in the legend to Fig. 2.

DNA replication of epigenetically modified cytosine
Templating 5mC, 5hmC, or 5fC did not significantly alter the catalytic efficiencies for insertion of dGTP (Fig. 3 and Table 2). In contrast, templating 5caC decreased the catalytic efficiency of dGTP insertion by ∼20-fold as compared with a 1-nt gap templating C substrate (Fig. 3 and Table 2). DNA polymerase fidelity can be determined by comparing the catalytic efficiencies of the correct insertion to those of incorrect insertion (correct/incorrect). The catalytic efficiencies for misinsertion were measured for each template (Table 2), and the results are shown on a discrimination plot (Fig. 3). The fidelity among the various templates can be visually appreciated by comparing the distance between the correct and incorrect nucleotide insertion on the plot (30). Fig. 3 shows that the fidelity of pol β is not significantly altered when replicating these templating cytosine modifications.
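The fidelity comparison described above can be sketched in a few lines. This is a minimal illustration, not the authors' analysis: the function names are hypothetical, the numbers are placeholders rather than values from Table 2, and the misinsertion frequency uses the common convention of two nucleotides competing at equal concentration, which is an assumption here.

```python
def fidelity(eff_correct, eff_incorrect):
    """Fidelity as the ratio of catalytic efficiencies (correct/incorrect)."""
    if eff_correct <= 0 or eff_incorrect <= 0:
        raise ValueError("catalytic efficiencies must be positive")
    return eff_correct / eff_incorrect


def misinsertion_frequency(eff_correct, eff_incorrect):
    """Expected fraction of incorrect insertions when the correct and
    incorrect dNTP compete at equal concentration (assumed convention)."""
    if eff_correct <= 0 or eff_incorrect <= 0:
        raise ValueError("catalytic efficiencies must be positive")
    return eff_incorrect / (eff_correct + eff_incorrect)


# Placeholder efficiencies in arbitrary but identical units (not measured data).
example_correct = 1.0
example_incorrect = 1.0e-4
print(fidelity(example_correct, example_incorrect))
print(misinsertion_frequency(example_correct, example_incorrect))
```

On a discrimination plot with a logarithmic efficiency axis, the vertical distance between the correct and incorrect points equals the logarithm of the first quantity, which is why similar distances indicate similar fidelity.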
To provide structural insight into the positioning of 5-modified cytosine within the active site of pol β, we solved the structure of pol β in complex with 5hmC and 5caC in the templating position and compared the results with cytosine. The crystallographic and refinement data statistics are shown in Table S1.
First, a binary complex structure of pol β with 5caC in the template position was solved. The structure revealed clear density for the carboxyl group and positioning of the 5caC base in the canonical anti-conformation (Fig. S1). To form a ternary complex, the binary complex crystals were soaked with a nonhydrolyzable dGTP analog, dGP(CH2)PP. Binding of dGP(CH2)PP resulted in pol β undergoing the well-recognized open-to-closed conformational change; Watson-Crick base pairing between the incoming dGP(CH2)PP base and templating 5caC base is observed (Fig. 4). Overlay of ternary complex structures of templating C and 5caC reveals that the 5′ phosphate backbone of the 5caC templating base is repositioned (Fig. 4). This repositioning of the phosphate backbone is a consequence of the repulsion between the 5-carboxylic group and nonbridging oxygen of the templating phosphate. Lysine 280 is positioned to make electrostatic and hydrogen bonding interactions with the 5-carboxyl group and the nonbridging oxygen of the templating phosphate backbone (Fig. 4, inset).
Based on these results, we suspected the formyl and hydroxymethyl modifications containing only one oxygen could be positioned in either of two conformations or somewhere in between. To provide insight into the positioning of the oxygen atom on 5hmC and 5fC in the active site of pol β, we solved a ternary complex structure of 5hmC with the nonhydrolyzable dGTP. As in the 5caC structure, Watson-Crick base pairing was observed between the base of the incoming dGTP analog and 5hmC, and clear density for the 5hmC modification was present. The 5hmC modification is positioned to form a stabilizing hydrogen bond with N4 of cytosine and avoid a steric clash with the templating phosphate backbone (Fig. 5).

Discussion
The epigenetic modification of cytosine is found throughout vertebrates and is used to regulate transcription. An estimated 70-80% of CpG sites are methylated in human somatic cells, making it likely that DNA repair pathways would encounter such modifications. Here we show that 5mC and its three oxidized forms in the templating base position do not alter the single-turnover rate constant, kpol, or the fidelity of DNA synthesis. However, the efficiency of insertion was decreased ∼20-fold with templating 5caC compared with the other base modifications. Structural analysis revealed backbone repositioning of the DNA template caused by a steric clash between the carboxyl group and the nonbridging phosphate oxygen of the template, and this correlates with the reduced catalytic efficiency observed with the 5caC template base. Lysine 280 stabilizes the altered backbone conformation by directly interacting with the carboxyl modification and the backbone phosphate nonbridging oxygens.
Contrary to previous reports of 5fC and 5caC being promutagenic, we observe no evidence for their mutagenicity. Using catalytic efficiencies, we can measure mutation frequencies of ∼0.0001% (one mistake for every million insertions). A 1% mutation frequency would correspond to a catalytic effi… (25,26). It is tempting to extrapolate these findings to other eukaryotic polymerases; however, because the fidelity of dATP insertion opposite templating cytosine is low with pol β, it may not be the best exemplar. Polymerases that have high fidelity for the insertion of dATP opposite templating C may show differential effects with these modified templating cytosines. Accordingly, further studies with other eukaryotic polymerases would need to be performed before such broad conclusions are made.

Table 2 footnote. The data were collected as described under "Materials and methods" and in the legend to Fig. S2. The reported error for dGTP insertions is the standard deviation of the indicated number of replicates. The standard error from fitting is reported for the misinsertion data. ND, not determined.

The methyl, hydroxymethyl, and formyl modifications of cytosine are accommodated within the active site of pol β without the need for side chain or DNA repositioning, consistent with the results of single-turnover kinetics displaying no effect with these modifications. However, the accommodation of the carboxyl modification in the active site requires template repositioning to avoid a steric/repulsive clash. This required repositioning likely leads to a reduction of productive ternary complexes, resulting in a decrease in catalytic efficiency for dGTP insertion. Because the methyl, hydroxymethyl, and formyl modifications are positioned toward the N4 exocyclic amine of cytosine, a steric/repulsive clash with the backbone is avoided. However, the carboxyl modification has two oxygens, so if an intramolecular hydrogen bond is maintained between N4 and one oxygen, a steric/repulsive clash will occur between the other carbonyl oxygen and the phosphate backbone. This is clearly illustrated in Fig. 4.
It is worthwhile to consider the alternative conformations that the carboxyl modification or cytosine base may occupy. For instance, the carboxyl modification could rotate relative to the cytosine base (not planar with the cytosine base), breaking the intramolecular H-bond with N4 but avoiding a steric clash with the phosphate backbone. However, it must be energetically more favorable to maintain the intramolecular hydrogen bond and shift the template backbone. Another potential way to avoid the steric clash is rotation about the N-glycosidic bond, moving the cytosine base from the canonical anti-conformation into the syn-conformation. Presumably, this would happen in the binary state before binding nucleotide, because this is what is observed in the templating 8-oxoguanine binary structure (31). However, in the 5caC binary structure, the carboxyl modification is positioned far enough away from the backbone phosphate to avoid the clash. Only upon base pairing with the incoming dGTP does the templating 5caC shift into the steric clash position.
In previous structures, Lys-280 can be observed in different positions depending on the templating base. However, the methylene portion of the side chain is generally positioned to make van der Waals interactions with the templating base, and the ε-amino group typically interacts with a nonbridging oxygen of the templating phosphate backbone (31,32). The 5caC structure reported here is the first observation of Lys-280 making a specific interaction with a templating base. This highlights the versatility of the polymerase active site.
The repositioned conformation observed in the ternary 5caC structure reveals the dynamic nature of the steric block that is created by a kink in the templating DNA strand when bound to the polymerase. This steric block lies on the major groove side of the templating base and discriminates against modifications by shaping the allowable space (Fig. 6). Previous studies revealed that this steric block can be repositioned ("opened") when 8-oxoguanine is in the templating position, allowing the base to sample both anti- and syn-conformations in the binary complex. As observed here in the 5caC ternary complex structure, opening of the steric block mitigates a potential steric clash with the phosphate backbone of the templating strand. However, in contrast to the binary complex structure of pol β with templating 8-oxoguanine, the carboxyl group of 5caC is accommodated within the binary complex structure because it can occupy available downstream space, highlighting a difference between major groove-facing modifications of templating purines and pyrimidines.
DNA polymerases will encounter C5-modified cytosines in the templating strand during repair and replication. The consequences these modifications have on the efficiency and the fidelity of different polymerases are important for understanding overall genome integrity. Here we demonstrate that the fidelity of DNA pol β is maintained when encountering these modified cytosines in the template strand. This suggests that the fidelity of BER remains unaltered in the presence of epigenetically modified cytosine.

Materials and methods
The oligonucleotides used are from Integrated DNA Technologies (Coralville, IA), and the modified cytosine oligonucleotides are from the Midland Reagent Company Inc. (Midland, TX). The fluorescein-labeled oligonucleotides were purified using HPLC, and the unlabeled oligonucleotides were purified using PAGE. Concentrations of the oligonucleotides were determined by UV absorbance at 260 nm using the extinction coefficients given by the manufacturer. dsDNA substrates with a 1-nt gap were prepared by annealing the primer, downstream, and template oligonucleotides at a 1:1.1:1.1 molar ratio, respectively. The sequences of the various oligonucleotides can be found in the supporting data (Table S1). Substrate annealing was performed using a thermocycler, with heating at 95°C for 5 min, followed by a temperature drop of 1°C per min until 10°C was reached.
Pol β was expressed and purified as described for DNA polymerase , with some alterations to the purification strategy (33). Full-length GST-TEV-pol β was expressed from a pGEX4T3 TEV-modified vector in Rosetta2 (DE3) cells (33). After sonication and centrifugation, the lysate was incubated with GSH 4B resin, washed, and then treated with TEV protease overnight at 4°C. Eluted pol β was further purified by cation exchange (SP-Sepharose) and size-exclusion chromatography. TEV-cleaved pol β contains seven noncognate amino acids on the N terminus (GSNSRVD). The concentration was determined by absorbance at 280 nm using an extinction coefficient of 23,380 M⁻¹ cm⁻¹. The pol β concentrations reported throughout reflect the value determined by this method.
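The concentration determination above follows the Beer-Lambert relation, c = A/(ε·l). A minimal sketch, assuming a 1-cm path length by default; the function name, the path length, and the optional dilution factor are illustrative assumptions, and only the extinction coefficient comes from the text.

```python
# Extinction coefficient of pol beta at 280 nm, as stated in the text (M^-1 cm^-1).
EXTINCTION_COEFF_M_CM = 23380.0


def protein_concentration_uM(a280, path_length_cm=1.0, dilution_factor=1.0):
    """Convert an A280 reading to a micromolar concentration (Beer-Lambert).

    path_length_cm and dilution_factor are assumptions for illustration;
    the text does not state the cuvette geometry or any dilution.
    """
    if a280 < 0 or path_length_cm <= 0 or dilution_factor <= 0:
        raise ValueError("absorbance must be >= 0; path length and dilution > 0")
    molar = a280 / (EXTINCTION_COEFF_M_CM * path_length_cm)
    return molar * dilution_factor * 1e6


# Hypothetical reading, not a measured value.
print(protein_concentration_uM(0.2338))
```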
Kinetic parameters for gap-filling catalyzed by pol β were determined by two different methods depending on whether a correct insertion or misinsertion of a nucleotide was being measured. The correct insertion reactions consisted of a single time point measurement that was collected to determine the initial velocity of pol β, as previously described (34). The single time point measurement contained 1 nM pol β, 200 nM DNA substrate, and varying concentrations of dGTP (0.2-100 μM). The reactions were performed at 37°C and contained 50 mM Tris-HCl, pH 7.4, 100 mM KCl, 1 mM DTT, 0.1 mg/ml BSA, 10% glycerol, and 5 mM MgCl2. The reactions (45 μl) were initiated by adding 5 μl of pol β and quenched at the appropriate time with 50 μl of 200 mM EDTA, 80% formamide, ∼0.1% bromphenol blue, and ∼0.1% xylene cyanol. The reactions were then heated to 95°C for 5 min, placed on ice, and loaded into a 22% denaturing polyacrylamide gel. The resulting gels were scanned using a Typhoon scanner and quantified using ImageQuant TL using the rolling ball method. The initial velocity normalized by the enzyme concentration (V0/[E]) was calculated by multiplying the fraction product by the concentration of substrate (200 nM) and dividing by the time point and concentration of pol β (1 nM). The resulting V0/[E] value was plotted against the respective dGTP concentration, and the data were fit to the Michaelis-Menten equation (Figs. S2 and S3). Because the templates containing 5fC and 5caC were not fully extendable (∼75 and ∼60%), the substrate concentration was corrected (i.e. 150 and 120 nM for 5fC and 5caC, respectively).
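The V0/[E] arithmetic and the substrate correction described above can be written out as follows. This is a sketch under stated assumptions: the function names are hypothetical, time is taken to be in seconds (the text does not give the unit), and the fraction-product value in the example is a placeholder, not measured data.

```python
def corrected_substrate_nM(nominal_nM, extendable_fraction):
    """Scale the nominal DNA concentration by the extendable fraction."""
    if not 0 < extendable_fraction <= 1:
        raise ValueError("extendable_fraction must be in (0, 1]")
    return nominal_nM * extendable_fraction


def initial_velocity_per_enzyme(fraction_product, substrate_nM, time_s, enzyme_nM):
    """V0/[E] = (fraction product x [substrate]) / (time x [enzyme]).

    With both concentrations in nM the result has units of 1/time
    (s^-1 if time is in seconds, which is an assumption here).
    """
    if time_s <= 0 or enzyme_nM <= 0:
        raise ValueError("time and enzyme concentration must be positive")
    return fraction_product * substrate_nM / (time_s * enzyme_nM)


def michaelis_menten(dntp_uM, kcat, km_uM):
    """Michaelis-Menten model to which the V0/[E] values are fit."""
    return kcat * dntp_uM / (km_uM + dntp_uM)


# Substrate corrections stated in the text: ~75% (5fC) and ~60% (5caC) of 200 nM.
print(corrected_substrate_nM(200.0, 0.75), corrected_substrate_nM(200.0, 0.60))
# Placeholder single time point: 10% product after 20 s with 1 nM enzyme.
print(initial_velocity_per_enzyme(0.10, 200.0, 20.0, 1.0))
```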
The misinsertion kinetic assays were performed under the same standard conditions described above. The reactions were performed with subsaturating concentrations of dNTP, 2 or 20 nM pol β, and 200 nM DNA substrate. An aliquot was removed at the appropriate time to be quenched in an equal volume of 300 mM EDTA. The reactions were then heated to 95°C for 5 min and then loaded into an 18% or 22% denaturing polyacrylamide gel. The gels were visualized and analyzed as described above. Initial velocities were determined by fitting each time course to a straight line (Fig. S2). The resulting slopes were then divided by the pol β concentration and plotted versus their respective dNTP concentration and fit to a straight line. The resulting slope is the catalytic efficiency (kcat/Km). In some cases, the catalytic efficiency was measured using a single subsaturating concentration of nucleotide (Fig. S4). Under these circumstances the catalytic efficiencies were determined by dividing the slope by the enzyme concentration and by the nucleotide concentration.
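Well below Km the Michaelis-Menten equation reduces to V0/[E] = (kcat/Km)[dNTP], so the slope of normalized velocity versus dNTP concentration is the catalytic efficiency. The sketch below illustrates that analysis with an ordinary least-squares line. The function names are hypothetical, fitting with a free intercept is an assumption (the text says only that the data were fit to a straight line), and the example rates are placeholders.

```python
def linear_fit(x, y):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def catalytic_efficiency(dntp_uM, rates_nM_per_s, enzyme_nM):
    """kcat/Km (uM^-1 s^-1) from time-course slopes at several dNTP levels."""
    normalized = [rate / enzyme_nM for rate in rates_nM_per_s]
    slope, _intercept = linear_fit(dntp_uM, normalized)
    return slope


def catalytic_efficiency_single_point(rate_nM_per_s, enzyme_nM, dntp_uM):
    """kcat/Km from one subsaturating dNTP concentration."""
    return rate_nM_per_s / (enzyme_nM * dntp_uM)


# Placeholder time-course slopes (nM/s) at 50, 100, and 200 uM dNTP with 20 nM enzyme.
print(catalytic_efficiency([50.0, 100.0, 200.0], [2.0, 4.0, 8.0], 20.0))
print(catalytic_efficiency_single_point(4.0, 20.0, 100.0))
```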
Single-turnover reactions were performed with a KinTek rapid quench-flow apparatus, as previously described (35,36). The single-turnover assays were performed under the following conditions: 50 mM Tris, pH 7.4, 100 mM KCl, 5 mM MgCl2, 1 mM DTT, 10% glycerol, and 0.1 mg/ml BSA at 37°C. The reactions were initiated by mixing a solution containing 100 nM DNA and 100 μM dGTP with a 2 μM concentration of pol β equilibrated in reaction buffer. The reactions were quenched by addition of 100 mM EDTA. The samples were resolved by 22% denaturing PAGE and quantified using ImageQuant TL with the rolling ball method. Plots of fraction product versus time were fit to a single-exponential equation. Data points from two experiments are shown for the 5mC substrate, and the data are fitted together. These experiments are independent with regard to time (performed on separate days), and the enzyme and substrate mixtures were prepared separately but from the same stocks. The kobs values reported here are also comparable with our lab's previously published work: 2.8 s⁻¹ (37), 2.9 s⁻¹ (38), and 3.3 s⁻¹ (39). The kobs values for templating C, 5mC, 5hmC, 5fC, and 5caC are considered to be within experimental error of each other. Kinetic data were analyzed using GraphPad Prism using a nonlinear regression fitting procedure.
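For readers who want to reproduce the single-exponential analysis without Prism, the sketch below fits fraction product = A(1 − exp(−kobs·t)). It is not the authors' procedure: the model form with an amplitude and no offset is an assumption, the function names are hypothetical, and the fitting strategy (solving the amplitude linearly for each trial rate and running a golden-section search over log10 of the rate) is a simple stand-in for general nonlinear regression. The data in the example are synthetic.

```python
import math


def single_exponential(t, amplitude, k_obs):
    """Fraction product at time t for a single-exponential rise."""
    return amplitude * (1.0 - math.exp(-k_obs * t))


def _amplitude_and_sse(times, fractions, k_obs):
    """For a fixed rate, the best amplitude is a linear least-squares solution."""
    basis = [1.0 - math.exp(-k_obs * t) for t in times]
    denom = sum(b * b for b in basis)
    amplitude = sum(b * y for b, y in zip(basis, fractions)) / denom
    sse = sum((y - amplitude * b) ** 2 for b, y in zip(basis, fractions))
    return amplitude, sse


def fit_single_exponential(times, fractions, k_min=1e-3, k_max=1e3, iterations=100):
    """Return (amplitude, k_obs); assumes one minimum within [k_min, k_max]."""
    if len(times) != len(fractions) or len(times) < 3:
        raise ValueError("need at least three paired points")
    if not any(t > 0 for t in times):
        raise ValueError("at least one time point must be positive")
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = math.log10(k_min), math.log10(k_max)

    def cost(log_k):
        return _amplitude_and_sse(times, fractions, 10.0 ** log_k)[1]

    for _ in range(iterations):
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
        if cost(c) < cost(d):
            hi = d  # minimum lies in [lo, d]
        else:
            lo = c  # minimum lies in [c, hi]
    k_obs = 10.0 ** ((lo + hi) / 2.0)
    amplitude, _sse = _amplitude_and_sse(times, fractions, k_obs)
    return amplitude, k_obs


# Synthetic, noise-free time course (not experimental data).
example_times = [0.05, 0.1, 0.2, 0.4, 0.8, 1.6, 3.2]
example_fractions = [single_exponential(t, 0.8, 3.0) for t in example_times]
print(fit_single_exponential(example_times, example_fractions))
```

A fitted amplitude below 1 corresponds to the non-extendable fraction of substrate discussed in the Results.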
Binary complex crystals of human DNA polymerase β with the modified dC as the templating base in a 1-nucleotide gapped DNA were grown as described. The sequence of the template strand (16-mer) was 5′-CCGACC*GCGCATCAGC-3′ (C* = 5caC or 5hmC). The primer strand (10-mer) sequence was 5′-GCTGATGCGC-3′. The downstream oligonucleotide (5-mer) was phosphorylated, and the sequence was 5′-GTCGG-3′. These oligonucleotides were annealed in a ratio of 1:1:1 by heating at 90°C for 10 min and cooling to 4°C (1°C/min) using a PCR thermocycler, resulting in a 1 mM mixture of 1-nucleotide gapped duplex DNA with the modified dC base in the templating position. This annealed mixture was further incubated with an equal volume of pol β (14 mg/ml). The pol β-DNA complex was crystallized by sitting-drop vapor diffusion at 18°C by mixing 2 μl of the complex with 2 μl of the crystallization buffer. The crystallization buffer consisted of 14-16% PEG2000 monomethyl ether, 350 mM sodium acetate, and 50 mM imidazole, pH 7.5. Crystals grew in ∼2-4 days after seeding. The ternary complexes were obtained by soaking crystals of binary complex in artificial mother liquor with 50 mM MgCl2, 20% PEG2000 monomethyl ether, 15% ethylene glycol, and 2 mM (α,β)-CH2-dGTP for 1-2 h. The crystals were then flash-frozen to 100 K in a nitrogen stream. Diffraction quality data were then collected for the binary and ternary complex crystals as described below.
The data were collected at 100 K on a CCD detector system mounted on a MicroMax-007HF (Rigaku Corporation) rotating anode generator. The data were integrated and reduced with HKL2000 software (40). All crystals belong to the space group P21. Binary complex and ternary complex structures were solved by molecular replacement using 3ISB and 2FMS as reference models, respectively. The structures were refined using PHENIX (41) and manual model building using Coot (42). The crystallographic statistics are reported in Table 2. The structure factors and the coordinates for the human DNA polymerase β complexes with 5caC (6N2R binary and 6N2S ternary) and 5hmC (6N2T ternary) have been deposited in the Protein Data Bank.
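As a quick consistency check on a 1-nucleotide gapped substrate, the primer and downstream strands expected from Watson-Crick complementarity can be derived from the template. The sketch below is illustrative only: the function names are hypothetical, the modified base C* is written as a plain C, and the 0-based index of the templating base is an input supplied by the user.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def gapped_duplex_strands(template, templating_index):
    """Return (primer, downstream), both 5' to 3', that anneal to the
    template on either side of the unpaired templating base.

    templating_index is the 0-based position of the templating base
    in the template read 5' to 3'.
    """
    if not 0 < templating_index < len(template) - 1:
        raise ValueError("templating base must have flanking sequence on both sides")
    # The primer pairs with the template 3' of the templating base.
    primer = reverse_complement(template[templating_index + 1:])
    # The downstream strand pairs with the template 5' of the templating base.
    downstream = reverse_complement(template[:templating_index])
    return primer, downstream


# Template from the text with C* (index 5) written as C.
print(gapped_duplex_strands("CCGACCGCGCATCAGC", 5))
```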