Low Fidelity DNA Synthesis by a Y Family DNA Polymerase Due to Misalignment in the Active Site

Sulfolobus solfataricus DNA polymerase IV (Dpo4) is a member of the Y family of DNA polymerases whose crystal structure has recently been solved. As a model for other evolutionarily conserved Y family members that perform translesion DNA synthesis and have low fidelity, we describe here the base substitution and frameshift fidelity of DNA synthesis by Dpo4. Dpo4 generates all 12 base-base mismatches at high rates, 11 of which are similar to those of its human homolog, DNA polymerase κ. This result is consistent with the Dpo4 structure, implying lower geometric selection for correct base pairs. Surprisingly, Dpo4 generates C·dCMP mismatches at an unusually high average rate and preferentially at cytosine flanked by a 5′-template guanine. Dpo4 also has very low frameshift fidelity and frequently generates deletions of even noniterated nucleotides, especially cytosine flanked by a 5′-template guanine. Both unusual features of error specificity suggest that Dpo4 can incorporate dNTP precursors when two template nucleotides are present in the active site binding pocket. These results have implications for mutagenesis resulting from DNA synthesis by Y family polymerases.

Our understanding of spontaneous mutagenesis and lesion bypass during DNA replication has been expanded in the last several years by the discovery of a number of novel DNA polymerases in the Y family (1). These polymerases all lack intrinsic 3′ to 5′ exonuclease activity, and several of them can copy DNA templates containing lesions that slow down or arrest synthesis by polymerases in other families. Among several subdivisions of the Y family, the DinB subfamily consists of enzymes sharing significant sequence homology with the product of the Escherichia coli dinB gene, DNA polymerase IV (2). E. coli Pol IV and its human homolog, DNA polymerase κ (pol κ), can bypass certain lesions in template DNA, and they replicate undamaged DNA templates with low fidelity (3-6). These properties are shared by Rad30A members of the Y family, including eukaryotic DNA polymerase η (7-12), whose inactivation leads to an increased susceptibility to sunlight-induced skin cancer in humans (13,14).
Decades of study indicate that the stability of large genomes depends on the high fidelity of DNA replication. It is therefore of great interest to understand the molecular basis of low fidelity synthesis by Y family polymerases, especially since they are conserved in all three kingdoms of life. Indeed, the genome of the archaeal aerobic thermophile Sulfolobus solfataricus P2 also contains a DinB homolog designated DNA polymerase IV (Dpo4) (15). Recently, high resolution crystal structures have been described of a ternary complex consisting of Dpo4, primer-template DNA, and a correct nucleotide bound at the active site (16). In this structure, the nascent base pair binding pocket is relatively open and has increased solvent accessibility in comparison with polymerases in other families. This predicts that Dpo4 will synthesize DNA with low base substitution fidelity. In support of this possibility, kinetic analysis revealed that Dpo4 inserts incorrect nucleotides at rates ranging from 3 × 10⁻⁴ to 8 × 10⁻³ (15).
Also of interest is the frameshift fidelity of DinB polymerases. The mesophilic DinB polymerases, E. coli Pol IV and mammalian pol κ, both contribute to single base deletion mutagenesis in vivo (3, 17-19), and these polymerases generate high levels of frameshift errors in vitro (2,5). Early studies (20) of the error specificity of exonuclease-deficient eukaryotic DNA polymerases α (B family) and β (X family) considered three possible mechanisms for how single base deletions might be generated during DNA synthesis. Deletions in homopolymeric runs could be explained by the classical strand slippage hypothesis (21). However, deletions of noniterated nucleotides were also generated, and these were inconsistent with misaligned substrate stability imparted by the correct base pairing possible in repetitive sequences. Thus, two additional hypotheses were put forth. One suggested that some frameshift errors might be initiated by nucleotide misinsertion, followed immediately by primer relocation to create a misaligned deletion intermediate stabilized by one or more correct terminal base pairs (22,23). The other idea (20) was that a noniterated nucleotide might assume a stable conformation in the active site such that it would not template an incorporation event itself but would not interfere with its neighbor's ability to do so. Support for this hypothesis was provided by kinetic analysis of insertion by human pol β with substrates containing abasic sites (24) or propanodeoxyguanosine adducts (25). Those modified template nucleotides were suggested to be misaligned at the active site, with the misalignment stabilized by the incoming dNTP paired with the next template base. Direct support for misalignment in the active site binding pocket is the structure of Dpo4 in which a template guanine at the active site is unpaired, with incoming dCTP paired with the next template guanine (Fig. 6b in Ref. 16).
The fact that the active site of Dpo4 can accommodate an unpaired template nucleotide suggests that this polymerase could have low frameshift fidelity, even for loss of noniterated nucleotides.
To test these ideas and to establish a comprehensive view of Dpo4 fidelity that can be used in future structure-function studies of this enzyme and other Y family polymerases, here we report the error rates for all 12 base substitutions and for single base deletions in a variety of sequence contexts. The results reveal the generally inaccurate DNA synthesis capacity of Dpo4, and they strongly suggest that many single base deletions and a subset of base substitutions may result from the ability of Dpo4 to polymerize despite the presence of an extra, unpaired template base in the active site.

EXPERIMENTAL PROCEDURES
Materials-Materials for the forward mutation assay were from previously described sources (26). S. solfataricus P2 Dpo4 polymerase was prepared as described previously (15).
Forward Mutation Assay-Reaction mixtures (30 μl) contained 1 nM gel-purified M13mp2 gapped DNA substrate, 40 mM Tris-HCl (pH 9.0 at 22°C), 5 mM MgCl2, 10 mM dithiothreitol, 7.5 μg of bovine serum albumin, 2.5% glycerol, and 1 mM dATP, dGTP, dCTP, and dTTP. Polymerization reactions were initiated by adding 3.3-10 nM Dpo4, incubated at 70°C for 1 h, and terminated by adding EDTA to 15 mM. DNA products were analyzed by agarose gel electrophoresis as described (26). Reaction products were assayed for the frequency of lacZ mutants as described (26). DNA from independent lacZ mutant phage was sequenced to identify the types of sequence changes generated during gap-filling synthesis. Since an average of 2.6 errors was identified per lacZ mutant, many of the mutants contained both phenotypically detectable and silent changes. Here the error rates are described as the number of observed mutations divided by the number of nucleotides sequenced. The resulting values are similar to those obtained using the error rate calculation for higher fidelity DNA polymerases described earlier (26), which uses only the phenotypically detectable sites, the lacZ mutant frequency, and the value for expression of the nascent strand in E. coli.
Misinsertion Kinetics-The 5′-GC substrate was constructed by annealing the following oligonucleotides: 5′-GGT GCG GGC CTC TTC GCT ATT ACG CCA-3′ (primer) and 5′-CCT TTC GCC AGC TGG CGT AAT AGC GAA GAG GCC CGC ACC-3′ (template). A 5′-AC substrate used the same primer and a template in which the G immediately 5′ of the first template cytosine (shown in boldface type in the original sequence) was replaced by A. Oligonucleotides were annealed by mixing 5′-32P-labeled primer with template in a 1:2 ratio. Mixtures were incubated at 85°C and slowly cooled to 25°C. Reaction mixtures (30 μl) contained the same components used for the gap-filling assay except for 170 nM template, 1.1-6.7 nM Dpo4, and seven different concentrations of each individual dNTP. Reaction mixtures were incubated at 70°C, aliquots were removed at 2-, 4-, 6-, and 8-min time intervals, and the reactions were terminated by adding an equal volume of 95% formamide in 20 mM EDTA. Reaction products were fractionated on a 16% (w/v) denaturing polyacrylamide gel and quantified using a PhosphorImager. Kinetic constants were derived as previously described (27).
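The geometry of the annealed substrate can be checked directly from the two oligonucleotide sequences. The short Python sketch below is only an illustration: the helper `reverse_complement` and all variable names are hypothetical, and it assumes that the primer pairs with the 3′ portion of the template exactly as written, so that the template base just 5′ of the duplex is the first one to be copied.

```python
# Illustrative check of the 5'-GC primer-template (names are hypothetical).
PRIMER = "GGTGCGGGCCTCTTCGCTATTACGCCA"                 # 5'->3', 27 nt
TEMPLATE = "CCTTTCGCCAGCTGGCGTAATAGCGAAGAGGCCCGCACC"   # 5'->3', 39 nt

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

# Position in the template where the primer-binding region starts.
site = TEMPLATE.find(reverse_complement(PRIMER))

# Single-stranded 5' overhang that remains to be copied.
overhang = TEMPLATE[:site]

# The polymerase reads the template 3'->5', so the first base copied is the
# one just 5' of the duplex, and its 5' neighbor is the flanking base.
first_template_base = TEMPLATE[site - 1]
flanking_base = TEMPLATE[site - 2]

# Assumed construction of the 5'-AC template: replace the flanking G with A.
template_ac = TEMPLATE[:site - 2] + "A" + TEMPLATE[site - 1:]

print(site, overhang, flanking_base + first_template_base)
print(template_ac[site - 2:site])
```

Under these assumptions the first template nucleotide is a cytosine with a 5′-flanking guanine, which is the context named by the 5′-GC substrate.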

RESULTS
The fidelity of DNA synthesis by Dpo4 was examined using a forward mutation assay that scores a variety of substitution and frameshift errors generated during copying of a lacZ template in a single-stranded gap in M13mp2 DNA. Correct polymerization produces DNA that yields blue M13 plaques in an E. coli α-complementation strain. Errors are scored as light blue or colorless plaques, and error specificity is defined by sequencing DNA isolated from independent lacZ mutants. Dpo4 copied the 275-nucleotide mutational target sequence and continued to fill the 407-nucleotide gap to apparent completion as determined by agarose gel electrophoresis (data not shown, but see typical analysis in Ref. 26). The DNA products yielded a lacZ mutant frequency of 16%, a value that is similar to results with human pol κ (5) and substantially higher than for polymerases in the A, B, or X families. Sequence analysis of the 275-nucleotide lacZ target in 182 independent lacZ mutants (50,050 total nucleotides) revealed 476 sequence changes (i.e. 2.6 changes per mutant). Among these, 326 (70%) were single base substitutions (Fig. 1), and 116 (25%) were single base deletions (open triangles in Fig. 1). The remaining changes included nine single base additions (Fig. 1, slashes), nine tandem double base substitutions, and 16 other examples of one or a few two-base deletions, two-base additions, combined substitution-additions, combined substitution-deletions, large deletions, and more complex changes (not shown). Fig. 2 compares the average error rates of Dpo4 with those of other exonuclease-deficient DNA polymerases obtained using the M13mp2 forward mutation assay. The substitution error rate for Dpo4 is 6.5 × 10⁻³ (326/50,050). This is comparable with the rate for the homologous human pol κ (5) and 4-5-fold lower than for mouse and human pol η.
All four Y family enzymes listed have substantially higher substitution rates than do polymerases in the A, B, and X families (DNA polymerases γ, α, and β, respectively) (Fig. 2A). Thus, Dpo4 serves as a good model for the generally low base substitution fidelity of Y family polymerases in the DinB and Rad30A subfamilies.
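The average error rates quoted above follow from the definition given under "Experimental Procedures" (observed mutations divided by nucleotides sequenced). The Python sketch below simply repeats that arithmetic with the counts reported in the text; the function and variable names are illustrative.

```python
# Counts reported in the text for the Dpo4 forward mutation assay.
MUTANTS_SEQUENCED = 182   # independent lacZ mutants
TARGET_LENGTH = 275       # nucleotides in the lacZ mutational target
TOTAL_CHANGES = 476       # all sequence changes observed
SUBSTITUTIONS = 326       # single base substitutions
DELETIONS = 116           # single base deletions

def error_rate(events, nucleotides_sequenced):
    """Observed mutations divided by the number of nucleotides sequenced."""
    return events / nucleotides_sequenced

total_nucleotides = MUTANTS_SEQUENCED * TARGET_LENGTH
changes_per_mutant = TOTAL_CHANGES / MUTANTS_SEQUENCED
substitution_rate = error_rate(SUBSTITUTIONS, total_nucleotides)
deletion_rate = error_rate(DELETIONS, total_nucleotides)

print(f"nucleotides sequenced: {total_nucleotides}")
print(f"changes per mutant:    {changes_per_mutant:.1f}")
print(f"substitution rate:     {substitution_rate:.1e}")
print(f"deletion rate:         {deletion_rate:.1e}")
```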
Single Base Substitution Error Specificity-Dpo4 generated all 12 single base-base mismatches (Table I), and these were widely distributed in many different sequence contexts throughout the target sequence (Fig. 1). As is the case with other DNA polymerases, the highest Dpo4 error rate was observed for the T·dGMP mismatch, 4.7 × 10⁻³. The lowest error rate was 0.6 × 10⁻³ for the G·dAMP mismatch, yielding a range of error rates of only 8-fold. The error rates for Dpo4 are similar to those for the human DinB homolog, pol κ (5), except for the C·dCMP rate, which was unusually high with Dpo4 (Fig. 2B). Interestingly, C·dCMP is the most common misinsertion event generated by the homologous Dbh pol from Sso strain P1 (28).
Single Base Deletion Rate and Error Specificity-The single base deletion error rate for Dpo4 is 2.3 × 10⁻³. This is comparable with the deletion error rates of other Y family enzymes and is much higher than for polymerases in other families (Fig. 2C). To consider whether the deletions generated by Dpo4 might be initiated by strand slippage, we used the information in Fig. 1 and Table I to calculate subclasses of deletions in different sequence contexts. The average rate for deleting any of the 97 noniterated template nucleotides in the target was calculated to be 1.7 × 10⁻³. This is similar to the rates at which nucleotides were deleted from repetitive sequence tracts of 2-5 nucleotides (Fig. 3A). A similar, minimal dependence of error rate on repeat tract length has previously been observed for human pol κ (5) and pol η (29). In contrast, deletion error rates for polymerases in other families are lower, and those rates typically increase with increasing repetitive tract length. As one example, the rate of single base deletions by exonuclease-deficient human pol γ (A family) within iterated tracts of 4 or 5 nucleotides is 26 times greater than the rate in noniterated DNA (Fig. 3B). The smaller dependence of single-base deletion error rate on repetitive sequence length by Dpo4 implies that most deletions are initiated by a mechanism other than strand slippage.
Active Site Misalignment to Explain Unusual Error Specificity-In consideration of what the deletion mechanism might be, we analyzed the two unusual features of Dpo4 error specificity in greater detail. Surprisingly, 54% of the noniterated nucleotides deleted by Dpo4 were cytosine, a preference not seen in previous studies of any other polymerase. Cytosine deletions were not randomly distributed in the target; 19 of the 26 noniterated cytosines that were deleted were flanked by a 5′-template guanine (Fig. 1). Equally surprisingly, the third most common base substitution error indicated formation of a C·dCMP mismatch (Table I, column 4).
This mismatch is rarely produced by polymerases in other families (e.g. see Ref. 30 and Fig. 2B). Furthermore, the C·dCMP intermediate is generated by Dpo4 at a rate that is 12-fold higher than for pol κ and even higher than for pol η (Fig. 2B), despite the fact that pol η has lower base substitution fidelity overall (Fig. 2A). As observed for the noniterated cytosine deletions, 30 of 34 C·dCMP errors generated by Dpo4 were flanked by a 5′-template guanine (Fig. 1, Table I). This strong preference for a flanking template guanine for both errors and the misaligned ternary complex structure previously observed (16) suggest a common mechanism in which Dpo4 skips the template cytosine and incorporates dCTP when paired with the flanking guanine (Fig. 4). If dCMP incorporation is immediately followed by correct synthesis, a deletion will result from the extended misalignment (pathway F, for frameshift). However, realignment prior to further incorporation (pathway B, for base substitution) will generate a C·C mismatch at the primer terminus whose extension will result in a C to G substitution.
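The two outcomes of the branched model can be made concrete with a toy simulation. The Python sketch below is an illustration only, not the authors' analysis: the function `synthesize`, its parameters, and the four-base example template are hypothetical, and the polymerase is reduced to a loop that copies one template base at a time, with one template base optionally left unpaired while the incoming nucleotide pairs with its 5′ neighbor.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def synthesize(template_read_order, unpaired_index=None, pathway=None):
    """Copy a template given in the order the polymerase reads it (3'->5').

    unpaired_index: template position left unpaired in the active site.
    pathway: "F" extends the misalignment (deletion); "B" realigns the
             inserted nucleotide opposite the skipped base (substitution).
    Returns the new strand in the order its nucleotides were added (5'->3').
    """
    if unpaired_index is not None and pathway not in ("F", "B"):
        raise ValueError("pathway must be 'F' or 'B' when a base is unpaired")
    product = []
    i = 0
    while i < len(template_read_order):
        if i == unpaired_index and i + 1 < len(template_read_order):
            # Incoming dNTP pairs with the 5' neighbor of the unpaired base.
            product.append(COMPLEMENT[template_read_order[i + 1]])
            # F: both template bases are passed; B: the neighbor is still copied.
            i += 2 if pathway == "F" else 1
            continue
        product.append(COMPLEMENT[template_read_order[i]])
        i += 1
    return "".join(product)

# Template 5'-AGCT-3' is read 3'->5' as T, C, G, A; the C has a 5'-flanking G.
read_order = "TCGA"
correct = synthesize(read_order)
frameshift = synthesize(read_order, unpaired_index=1, pathway="F")
substitution = synthesize(read_order, unpaired_index=1, pathway="B")
print(correct, frameshift, substitution)
```

In this toy example, pathway F yields a product one nucleotide shorter (loss of the base opposite the template cytosine), whereas pathway B yields a full-length product carrying C where G should be, which corresponds to a C to G change when expressed in the template strand sense.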
Kinetic Analysis of Misinsertion-To test this transient misalignment model for formation of C·dCMP mismatches, we determined steady state parameters for insertion of each of the four dNTPs with a cytosine (we used lacZ template position 146; Fig. 1) present as the first template nucleotide in an oligonucleotide substrate. The Km and Vmax values (Table II) were used to calculate the ability of Dpo4 to discriminate between correct incorporation of dGMP and "apparently incorrect" incorporation of dCMP when the template contained either a 5′-flanking guanine or adenine. With the 5′-GC template, discrimination (f_ins) between insertion of dGMP and dCMP was 0.042 (Table II, line 2). This is similar to the value of 0.027 for stable misincorporation (i.e. insertion plus mismatch extension) of dCMP opposite this same template cytosine during gap-filling synthesis, where five C to G substitutions were observed at position 146 among 182 lacZ mutants (Fig. 1). The active site misalignment model predicts that changing the flanking template base to adenine should decrease dCMP insertion because dCTP opposite template A would be incorrect. This prediction is fulfilled with the 5′-AC template, where discrimination against dCMP insertion was increased by 18-fold (f_ins = 0.0023, Table II). As an internal control, note that discrimination against insertion of dAMP opposite C was similar using the 5′-GC and 5′-AC templates (0.0084 and 0.012, respectively), as predicted by the model.
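The comparisons in this paragraph reduce to a few ratios of the reported values. The Python sketch below repeats that arithmetic; it uses only the f_ins values and mutant counts quoted in the text, and the variable names are illustrative.

```python
# Misinsertion frequencies (f_ins) quoted in the text from Table II.
F_INS_DCMP_GC = 0.042    # dCMP opposite C, 5'-GC template
F_INS_DCMP_AC = 0.0023   # dCMP opposite C, 5'-AC template
F_INS_DAMP_GC = 0.0084   # dAMP opposite C, 5'-GC template (internal control)
F_INS_DAMP_AC = 0.012    # dAMP opposite C, 5'-AC template (internal control)

# Gap-filling data: C to G substitutions at position 146 among lacZ mutants.
C_TO_G_AT_146 = 5
MUTANTS_SEQUENCED = 182

# Effect of replacing the flanking G with A on dCMP misinsertion.
dcmp_fold_change = F_INS_DCMP_GC / F_INS_DCMP_AC

# The same comparison for the dAMP control, which should change little.
damp_fold_change = F_INS_DAMP_AC / F_INS_DAMP_GC

# Stable misincorporation frequency at position 146 during gap filling.
stable_misincorporation = C_TO_G_AT_146 / MUTANTS_SEQUENCED

print(f"dCMP: {dcmp_fold_change:.0f}-fold change with flanking G -> A")
print(f"dAMP: {damp_fold_change:.1f}-fold change with flanking G -> A")
print(f"stable misincorporation at 146: {stable_misincorporation:.3f}")
```

The dCMP ratio reproduces the 18-fold effect of the flanking base, while the dAMP control changes by less than 2-fold.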
Sequence Context for Other Base Substitutions-To see if this misalignment mechanism could theoretically explain other substitutions, we analyzed the sequence contexts for all 12 mismatches produced by Dpo4 (Table I). As mentioned above, 88% (30/34) of all C to G substitutions occurred at a 5′-GC-3′ template sequence. This is a greater proportion than would be expected by chance, since only 22 of 79 cytosines in the target (28%) are flanked by a 5′-G (Table I, line 12). As a consequence, the average rate of C to G substitutions flanked by G is 75 × 10⁻⁴, which is 19-fold higher than the rate of 3.3 × 10⁻⁴ for all other template cytosines. The 19-fold difference in rate at 5′-GC sequences compared with 5′-(A/T/C)C sequences (Table I, last line, last column) is similar to the 18-fold decrease in dCMP misinsertion resulting from replacement of the flanking G with A (Table II).
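The context dependence can be recomputed from the counts given in this paragraph. The Python sketch below is a rough check that rests on an assumption not stated in the text: that each context-specific rate is the number of substitutions divided by the number of template sites in that context multiplied by the 182 mutants sequenced. The function and variable names are illustrative.

```python
MUTANTS_SEQUENCED = 182

# C to G substitution counts and cytosine sites in the lacZ target.
C_TO_G_TOTAL = 34
C_TO_G_FLANKED_BY_G = 30
CYTOSINES_TOTAL = 79
CYTOSINES_FLANKED_BY_G = 22

def context_rate(events, sites, mutants=MUTANTS_SEQUENCED):
    """Assumed definition: events per template site per mutant sequenced."""
    return events / (sites * mutants)

observed_fraction = C_TO_G_FLANKED_BY_G / C_TO_G_TOTAL
expected_fraction = CYTOSINES_FLANKED_BY_G / CYTOSINES_TOTAL

rate_gc = context_rate(C_TO_G_FLANKED_BY_G, CYTOSINES_FLANKED_BY_G)
rate_other = context_rate(C_TO_G_TOTAL - C_TO_G_FLANKED_BY_G,
                          CYTOSINES_TOTAL - CYTOSINES_FLANKED_BY_G)
fold_difference = rate_gc / rate_other

print(f"observed at 5'-GC: {observed_fraction:.0%}, expected: {expected_fraction:.0%}")
print(f"rate at 5'-GC:     {rate_gc:.1e}")
print(f"rate elsewhere:    {rate_other:.1e}")
print(f"fold difference:   {fold_difference:.0f}")
```

With this denominator the rate at 5′-GC sites and the roughly 19-fold difference are reproduced. The recomputed rate for the remaining cytosines comes out near 3.9 × 10⁻⁴, close to but not identical to the quoted value, so the exact denominator used in Table I may differ from the one assumed here.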
A similar analysis of T to G transversions (Table I, line 6) indicates that T·dCMP mismatches are also preferentially generated in a manner consistent with active site misalignment. Thus, 79% of these substitutions were at the 19% of template thymines flanked by a 5′-G, yielding an error rate that is 15-fold higher than for the same substitution flanked by the other three bases. This substitution also involves a template pyrimidine and incoming dCTP. It is notable that dCMP incorporated opposite a template G is the most catalytically favored event for Dpo4 among the four possible correct base pairs (15). Analysis of the reciprocal error involving a template C and incoming dTMP revealed a 3-fold sequence-dependent bias in error rate. A 3-fold sequence-dependent bias was also seen with the A·dCMP mismatch, the third possible error involving incoming dCTP, as well as with the A·dAMP mismatch.

FIG. 4. Model for deletions and C to G base substitutions.
Step a, the lower line represents part of the sequence of the template strand. The upper line shows the 3′-end of the primer strand. An incoming dCTP residue is noncomplementary to a template cytosine residue (shown in boldface type) but can form a Watson-Crick base pair with the 5′ adjacent guanosine, leaving the template cytosine unpaired (b). b, the view is a comparable simplified representation of the DNA substrate and incoming nucleotide in the Dpo4 type II ternary complex presented in Ref. 16, in which an incoming dGTP mispairs with a template G but can base-pair with a 5′ adjacent C residue. As described under "Results," if the incoming C residue can form a phosphodiester bond at the 3′-end of the primer strand (c), the ultimate consequences can be either a deletion mutation (pathway F) or a substitution mutation (pathway B).

DISCUSSION
This study provides a comprehensive view of the fidelity of DNA synthesis by Sso Dpo4, a DinB DNA polymerase in the Y family. When copying undamaged DNA, Dpo4 is highly inaccurate for essentially all types of single base substitutions and deletions in a large number of different sequence contexts. This generally low fidelity was anticipated by homology to other Y family members that also have low fidelity (5,29). In fact, Dpo4 and its human homolog DNA polymerase κ generate most base substitutions at remarkably similar rates, and these rates are much higher than those generated by polymerases in the A, B, and X families. Like human pol κ and pol η, Dpo4 also generates an unusually high rate of single base deletions, including deletions of noniterated bases whose loss is not predicted by the classical strand slippage hypothesis.
These common fidelity characteristics are interesting in light of recently published structures of Y family polymerases. These include the apoenzyme structures of S. cerevisiae pol η (31) and the full-length (28) and N-terminal fragment of S. solfataricus P1 Dbh polymerase (32). Most relevant to this study are the structures of two different ternary complexes of Sso Dpo4 with primer-template DNA and a nucleotide bound at the active site (16). These structures reveal that Y family members have the same general right hand shape and thumb, fingers, and palm subdomains found in polymerases in other families. The structures of the palms suggest that Y family members have a catalytic mechanism in common with other polymerases. However, several structural features of Y family polymerases are distinctive. Dpo4 has a "little finger" subdomain that is not present in other polymerase families but is present in Sso Dbh (designated "wrist" in Ref. 28) and in yeast pol η (designated "PAD" in Ref. 31). The Dpo4 little finger fits into the major groove upstream of the active site and contacts the backbone of both strands. It is tethered to the thumb via a positively charged loop that contacts the template strand backbone. The Dpo4 thumb is smaller than that observed in polymerases in other families, and it contacts the backbone of both strands on the opposite (minor groove) side of the duplex. The primer-template is B-form, with a minor groove that is deeper and narrower than in most other polymerase-DNA structures. Dpo4 does not contact any bases in the minor groove upstream of the active site. This differs from other polymerases (e.g. see Fig. 2 in Ref. 33), which may use such contacts to probe for correct base pairing geometry. The fingers domain of Dpo4 that contacts the incoming nucleotide and the single-stranded template is unusually small and lacks the α-helix in other polymerases (34-36) that has a critical role in fidelity (reviewed in Ref. 33).
The side chains surrounding the nascent base pair in Dpo4 are small and hydrophobic, in contrast to the larger side chains present in other polymerases. Thus, the nascent base pair binding pocket in Dpo4 is relatively open and has increased solvent accessibility in comparison with polymerases in other families. These structural features are consistent with the ability of Dpo4 to incorporate various mismatches in different sequence contexts at relatively high rates (Fig. 1 and Table I), regardless of differences in mismatch shape, size, orientation, or base hydrogen bonding, stacking, or hydration potential.
Pyrimidine·pyrimidine mispairs have been suggested to be sterically excluded from polymerase active site binding pockets by the increased bulk resulting from bound water molecules that are not efficiently displaced during noncomplementary base pairing (37). Consistent with this is the observation that C·dCMP is usually the least frequent mismatch generated by DNA polymerases, including pol α and avian myeloblastosis virus reverse transcriptase (30,38) as well as pol η (29) and pol κ (5). Thus, we were surprised to see that the average C·dCMP error rate for the 79 cytosines monitored in this study was unusually high (Table I, last line, and Fig. 2B). Five observations suggest that most of these errors involve correct dCMP incorporation opposite the template guanine flanking a preceding, unpaired cytosine, followed by realignment and mismatch extension (Fig. 4, pathway B). The first is the structure of a ternary complex of Dpo4 revealing an unpaired template strand base in the active site binding pocket and the next template base correctly paired with an incoming dNTP (Fig. 6b in Ref. 16). The second is the fact that the error rate for C to G transversions is 19-fold higher when C is flanked by G rather than the other bases (Table I). The third is the observation that discrimination against dCMP insertion is 18-fold lower when cytosine is flanked by G than when flanked by A (Table II). Fourth is the fact that template cytosine flanked by guanine is the most frequently deleted nucleotide. This is consistent with pathway F of the model (Fig. 4), wherein dCMP incorporation is simply followed by continued correct synthesis without realignment. Finally, the limited effect of increasing repeat sequence length on the deletion error rate of Dpo4 (Fig. 3) suggests that an unpaired nucleotide can exist in the substrate (e.g. as seen in the misaligned Dpo4 structure) without being stabilized by the correct base pairing that is possible in repetitive DNA.
How general is the misalignment mechanism for different errors and different polymerases? The branched pathway in Fig. 4 can account for a substantial proportion of errors generated by Dpo4. Among base substitutions (Table I), pathway B can account for almost all errors involving incoming dCTP opposite C or T, a smaller portion of the third mismatch involving dCTP opposite A, and possibly some mismatches involving incoming dATP opposite A. Dpo4 crystal structures with the inferred misaligned intermediates might provide insights into why template pyrimidines and incoming dCTP seem to be particularly preferred. Pathway F (Fig. 4) can readily account for deletions of noniterated nucleotides, and it may also contribute to deleting iterated nucleotides. This mechanism may be common to other Y family polymerases, since the limited effect of repeat sequence length on Dpo4 deletion rate (Fig. 3A) is shared by human pol κ (5) and pol η (29). Since these human polymerases and Dpo4 were assayed for fidelity at 37 and 70°C, respectively, this further suggests that active site misalignment readily occurs at both temperatures during replication. This does not exclude the possibility that reaction temperature may affect the rate of misalignment or the balance between extension of mispaired versus mismatched termini (see branch point of Fig. 4). In addition, since mutations consistent with active site misalignment are observed at much lower frequencies during replication at 70°C by the A family Taq polymerase (39), it is likely that unique structural elements associated with Dpo4 and other Y family polymerases stabilize the misalignment. Interestingly, the error spectrum in Fig. 1 indicates that certain 5′-GC-3′ sequences yield C to G substitutions (e.g. nucleotide 146), whereas others yield deletions of C (e.g. nucleotide 9).
Thus, additional sequence contexts, such as the identity of the primer terminal base pair, may determine whether realignment (Fig. 4, pathway B) or direct primer extension (pathway F) occurs. This balance may differ among polymerases, even those in the same family. Thus, Dpo4, pol κ, and pol η share common frameshift error rates (Fig. 2C) and limited response to increasing run length, yet the latter two enzymes do not share with Dpo4 the preferential formation of C·dCMP or T·dCMP mismatches in specific sequence contexts. However, C·dCMP and T·dCMP have the highest relative misincorporation efficiencies among all mispairs for the Dbh polymerase of Sso P1. It is speculated that these mispairs may arise from misalignments in either the 5′ or 3′ direction and over more than one base along the template strand (28). Therefore, enhanced formation of these specific mispairs by strand misalignment may be a common property of archaeal DinB polymerases. Finally, the ability of Dpo4 and perhaps other Y family members to incorporate a dNTP while an unpaired template strand nucleotide is located in the active site binding pocket may be relevant to their ability to bypass certain lesions and to sometimes generate deletions while doing so (10,40).