DNA polymerases are today used throughout scientific research, biotechnology, and medicine, in part for their ability to interact with unnatural forms of DNA created by synthetic biologists. Here especially, natural DNA polymerases often do not have the “performance specifications” needed for transformative technologies. This creates a need for science-guided rational (or semi-rational) engineering to identify variants that replicate unnatural base pairs (UBPs), unnatural backbones, tags, or other evolutionarily novel features of unnatural DNA. In this review, we provide a brief overview of the chemistry and properties of replicative DNA polymerases and their evolved variants, focusing on the Klenow fragment of Taq DNA polymerase (Klentaq). We describe comparative structural, enzymatic, and molecular dynamics studies of WT and Klentaq variants, complexed with natural or noncanonical substrates. Combining these methods provides insight into how specific amino acid substitutions distant from the active site in a Klentaq DNA polymerase variant (ZP Klentaq) contribute to its ability to replicate UBPs with improved efficiency compared with Klentaq. This approach can therefore serve to guide any future rational engineering of replicative DNA polymerases.
), where they often must manage DNA molecules having unusual structural features. For example, some WT bacterial polymerases can replicate DNA-containing UBPs that interact by steric complementarity without any interbase hydrogen bonding (
Expanded genetic alphabets and engineered DNA polymerases
These prior studies, however, only “scratch the surface” of the possible modified nucleotides that have scientific, biotechnological, and medical value. For example, DNA can have many more independently replicable groups than A, T, G, or C (Fig. 1). Scientifically, these groups can be varied to learn whether WC DNA represents the only molecular solution for the long-term storage and replication of biological information (
). Alexander Rich was likely the first to recognize the importance of this question, suggesting in 1962 that the isocytidine:isoguanine (S:B) pair (Fig. 1) might be used as an additional information storing unit in DNA (
Further, many have asked whether the sugar and phosphate components of the DNA backbone might be modified. Synthetic biologists have therefore synthesized numerous such modified DNA molecules to delineate molecular changes that can (and cannot) be tolerated by duplex DNA (
) synthesized UBPs that exploit orthogonal patterns of hydrogen bonding. This led to multiple discoveries about the molecular changes that can (and cannot) be tolerated in replicating units, especially how the fidelity of replication can be influenced by protonation, deprotonation, and multiple tautomeric forms of the nucleobases (
), an artificially expanded genetic system (AEGIS) in which A, G, T, and C are augmented by two unnatural pyrimidine analogs, pseudocytidine (or isocytidine) (S) and 6-amino-3-(2′-deoxyribo-furanosyl)-5-nitro-1H-pyridin-2-one (Z), and their size and hydrogen bond complementary partners, the purine analogs, isoguanine (B), and 2-amino-8-(1-β-d-2′-deoxyribofuranosyl)imidazo[1,2-a]-1,3,5-triazin-[8H]-4-one- (P) (
) (Fig. 1). The additional P:Z and B:S base pairs form three interbase hydrogen bonds, as does the natural G:C. Work in this area has also been driven by commercial factors; the use of AEGIS UBPs as components of diagnostic tests has generated lifetime sales in excess of $1 billion (
Future disruptive technologies exploiting these and other expanded genetic alphabets will almost certainly require engineered DNA polymerase variants having expanded ranges of biophysical and catalytic properties. Unfortunately, and despite extensive research into the structure and mechanism of DNA polymerases from a wide range of organisms (
), rationally predicting how specific amino acid replacements in these enzymes will impact the incorporation of any particular kind of UBP into duplex DNA remains almost impossible. This reflects a lack of fundamental understanding about how multiple factors act synergistically to control UBP incorporation efficiency and fidelity (
). These factors include nucleobase pair complementarity, nucleobase tautomerism, variations in hydrogen bond free energies, and the conformational dynamics of the polymerase itself as incorporation proceeds through the catalytic cycle. Directed evolution methods (
These facts make it timely to review how X-ray crystallography, computational modeling, and directed evolution strategies can be combined to obtain DNA polymerases capable of replicating artificial DNA containing UBPs. Given the recent publication of a comprehensive discussion about the replication of UBP-containing DNA by unengineered WT polymerases (
), we draw on work from our laboratories, the only studies (to our knowledge) that employ a combined experimental and computational approach.
Generally speaking, enzymes that replicate DNA must manage three contradictory demands as they catalyze their critical reactions. First, this class of polymerases must be highly specific, making not much more than one mistake every billion turnovers (
). Mistakes mean somatic mutations potentially leading to cancer, germ line mutations that create inherited genetic disease, and, ultimately, an error catastrophe that leads to the death of the organism. Typical replicative polymerases achieve specificity at both the elongation step and a proofreading step, the latter in the form of 3′–5′ exonuclease domains (see below). Other replicative DNA polymerases, such as reverse transcriptases (
), lack proofreading and are therefore more error-prone. DNA polymerases that function primarily in DNA repair, including those important for genome stability and trans-lesion synthesis, also often have lower fidelity (
Second, DNA polymerases must accept four different substrates: (i) template dG and dCTP, (ii) template dC and dGTP, (iii) template dA and dTTP, and (iv) template dT and dATP. Most other enzymes accept only one.
Finally, DNA polymerases must work rapidly. An E. coli cell replicates its entire genome by copying 4000 nucleotides each second. Even though many replication forks are used, this rate is as fast as that of many enzymes using only one substrate in pathways able to tolerate mistakes.
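The rate quoted above can be checked with a short calculation. This is a minimal sketch: the genome size of roughly 4.6 million nucleotides is an assumption that is not stated in the text, and the 4000 nucleotides per second is treated as the aggregate rate over all replication forks.

```python
# Back-of-the-envelope check of the replication rate quoted in the text.
# ASSUMPTION: an E. coli genome of ~4.6 million nucleotides (not given in the text).
GENOME_SIZE_NT = 4_600_000
# Aggregate copying rate from the text, summed over all replication forks.
RATE_NT_PER_S = 4000


def replication_time_seconds(genome_nt: int, rate_nt_per_s: float) -> float:
    """Time needed to copy genome_nt nucleotides at a constant aggregate rate."""
    if rate_nt_per_s <= 0:
        raise ValueError("rate must be positive")
    return genome_nt / rate_nt_per_s


seconds = replication_time_seconds(GENOME_SIZE_NT, RATE_NT_PER_S)
print(f"{seconds:.0f} s (about {seconds / 60:.1f} min)")
```

Under these assumptions, one round of genome replication takes on the order of 20 minutes.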
Managing these demands is especially difficult because the four substrates have few molecular features in common to present to the polymerase (Fig. 1). For example, the four nucleobases present different functionalities to the major groove, a methyl group for T, a hydrogen for C, two hydrogen bond acceptors for G, and one donor and one acceptor for A.
One feature that all four substrates have in common is their relative size, leading to the rule that large purines must pair with small pyrimidines. Further, the functional groups located on the sides of all four bases in the minor groove of the DNA duplex share a common feature: electron density presented by the exocyclic C=O groups of T and C and the N-3 nitrogen in the purine rings of A and G. Thus, the “minor groove scanning hypothesis” proposes that polymerases donate hydrogen bonds to this electron density in the template, primer, and incoming triphosphate as a way of enforcing Watson–Crick geometry, and hence fidelity, on base pair recognition (
). The electronic constraint in the minor groove also appears to be important for nonstandard base pairing, even pairing that, as in the Romesberg pair (Fig. 1), involves no interbase hydrogen bonding (
). Moreover, the easiest AEGIS UBP to incorporate into DNA by polymerase-catalyzed reactions has been Z:P (Fig. 1), which is the only pair of those with shuffled hydrogen-bonding patterns for which both components present electron density to the minor groove (
). DNA synthesis proceeds in the 5′ to 3′ direction in a primer-dependent reaction requiring Mg2+ as a cofactor. In the first step, the enzyme forms an initial binary complex with a DNA duplex composed of a template strand and a shorter, complementary primer strand possessing a 3′-hydroxyl group. dNTPs then bind to the binary complex to give a ternary (or “pre-incorporation”) complex (Fig. 2). Correctly matched dNTPs reside longer in the active site, allowing the enzyme to adopt a “closed” conformation in which the 3′-hydroxyl group in the primer can attack the α-phosphate of the bound dNTP (Fig. 2A). This reaction requires the presence of two (
), Mg2+ ions within the active site of human polymerase β (Fig. 2C) or Klentaq (Fig. 2D). Subsequent release of PPi generates a binary (or “post-incorporation”) complex in which the primer has been elongated by one nucleobase, which then serves as the substrate for the next templated addition of dNTP. These steps are repeated until no unpaired nucleobases remain in the template strand.
Replicative DNA polymerases, especially those for which DNA binding is the rate-limiting step, exhibit a property referred to as processivity (
). Processivity, of course, permits these DNA polymerases to achieve high rates of nucleotide incorporation. This process requires translocation of the enzyme along the template-primer or the template-primer along the enzyme (depending on your point of view) during polymerization. We discuss the conformational changes that occur during dNTP incorporation below.
When mismatched dNTPs are present in the ternary complex, kinetic evidence shows that the active site remains in an “open” conformation so that dissociation of the incoming nucleotide can take place prior to formation of the new P–O bond (
) (Fig. 2B). Alternatively, should a mismatched dNTP be incorporated, most replicative polymerases include a 3′–5′ exonuclease domain that can remove the mismatch. In this case, the 3′-hydroxyl end of the primer shifts from the polymerase to the exonuclease active site for removal of the mismatched nucleotide (
). This proofreading step is essential for ensuring fidelity during replication by DNA polymerases, which typically make one error every 10^4 to 10^5 nucleobases. The 3′–5′ exonucleolytic activity improves this error rate by 2–3 orders of magnitude (
) when DNA polymerases and their variants are challenged with AEGIS UBPs (Fig. 3). For example, incorporation efficiency can be measured by nested PCR using primers tagged at the 5′-end with multiple consecutive UBPs. In this assay, amplicons are formed in the reaction mixture only if the polymerase is capable of replicating the UBPs (Fig. 3A).
Assaying fidelity requires the DNA polymerase to replicate a sequence in which UBPs are present (
) (Fig. 3B). In the event that the UBP is lost during PCR cycles, two possible unique restriction sites are created in the amplicons, depending on whether the UBP is replaced by T:A or C:G. Incubating the PCR products with two restriction endonucleases therefore gives cleaved products for the amplicons from which the UBP has been lost in both possible transition mutations (Fig. 3C). This assay thereby reports on the extent to which polymerase fidelity deviates from 100%. In addition, performing the PCR with different concentrations of the UBP components permits a quantitative assessment of fidelity.
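The readout of the restriction digest can be converted into a per-doubling retention value with a simple model. The sketch below assumes that the probability of losing the UBP is the same in every theoretical doubling, so the uncleaved fraction after n doublings is f^n. This simplification is ours, not a protocol taken from the text, and all numerical inputs are hypothetical.

```python
def retention_per_doubling(fraction_retained: float, doublings: float) -> float:
    """Per-doubling UBP retention f, assuming fraction_retained = f ** doublings."""
    if not 0.0 < fraction_retained <= 1.0:
        raise ValueError("fraction_retained must be in (0, 1]")
    if doublings <= 0:
        raise ValueError("doublings must be positive")
    return fraction_retained ** (1.0 / doublings)


def fraction_retained_after(per_doubling: float, doublings: float) -> float:
    """Fraction of amplicons still carrying the UBP after a number of doublings."""
    return per_doubling ** doublings


# Hypothetical digest: 10% of the amplicons are cleaved by the two endonucleases
# after 20 theoretical doublings of the template.
cleaved_fraction = 0.10
f = retention_per_doubling(1.0 - cleaved_fraction, doublings=20)
print(f"retention per doubling: {f:.4f}")

# Forward direction: how much UBP survives 30 doublings at 99.99% per doubling?
print(f"retained after 30 doublings: {fraction_retained_after(0.9999, 30):.4f}")
```

The number of theoretical doublings is generally not equal to the number of thermal cycles, so it should be estimated from the actual amplification achieved.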
Obtaining DNA polymerases engineered for altered substrate specificity
Some, albeit limited, success has been reported in modifying the substrate preferences of DNA polymerases by structure-guided mutagenesis studies (
). The complexity of the conformational changes undergone by the polymerase during each catalytic cycle seems to require that variants contain multiple residue replacements. Most successful efforts to obtain DNA polymerases with altered catalytic and/or biophysical properties have therefore selected suitable variants from large libraries (
). Examples include several Taq polymerase variants evolved for specific functions (Table 1). The fact that several of the amino acid substitutions are found in more than one of these Taq variants may ultimately inform the rational design of variants with new functional properties. To date, however, only one of these published variants (ZP Klentaq) has been subjected to structural and dynamic characterizations to determine the impact of each of the various substitutions on the properties of the variant polymerase (see below).
Table 1. Examples of evolved Taq DNA polymerase variants
) is often used to obtain engineered DNA polymerases. In CSR, water droplets containing appropriate primers and dNTPs are formed in a water/oil emulsion. Droplets that also contain a single E. coli cell carrying a plasmid encoding a single variant of a thermophilic DNA polymerase, such as Klentaq, can then be used to select enzymes with desired properties. In our work (
), we employ the nested PCR strategy described above (Fig. 3A) to identify DNA polymerase variants that can replicate UBPs orthogonal to WC nucleobases, such as Z and P, because only these will replicate copies of their gene in the plasmid (Fig. 4). Those variants that cannot incorporate UBPs efficiently will not replicate their genes. Thus, genes encoding DNA polymerases with the desired substrate specificity become represented more in the mixture of DNA amplicons obtained when the emulsion is broken at the end of several rounds of PCR (Fig. 4). The resulting DNA amplicons are inserted into a new set of plasmids that are used to transform E. coli for use in a subsequent selection round. After a defined number of rounds, genes encoding DNA polymerase variants with the ability to incorporate UBPs become highly enriched in the library. The set of enriched genes can be sequenced to identify the molecular changes that permit the polymerase variant to replicate the UBP. Standard methods can then be used to express and purify individual variants prior to any detailed evaluation of their fidelity and incorporation efficiencies (
The original CSR strategy, however, requires the replication of the entire polymerase gene, which contains more than 2000 nucleobases. Thus, only the most active variants are recovered. Many technical aspects, including the appropriate choice of oil or surfactant containing the water droplets (
), including compartmentalized self-tagging (CST) (Fig. 4B). In CST, the positive feedback loop depends on the polymerase “tagging” a plasmid containing its encoding gene by extension of a biotinylated oligonucleotide. As a result, plasmids containing genes that encode DNA polymerases with altered substrate selectivity can be enriched by selective capture onto an affinity column. Clearly, the UBP(s) must be located within a segment of plasmid DNA that is replicated with high efficiency and fidelity by the DNA polymerase variant. In addition, the sensitivity of the assay is higher because CST does not require multiple self-replications of the complete gene encoding the polymerase variant.
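The positive feedback that drives enrichment in CSR- or CST-style selections can be illustrated with a toy model. In this sketch each variant's share of the library is multiplied by a relative recovery factor in every round and the library is then renormalized. The variant names, starting frequencies, and recovery factors are all hypothetical, and real selections involve bottlenecks and stochastic effects that this model ignores.

```python
def selection_round(freqs: dict[str, float], recovery: dict[str, float]) -> dict[str, float]:
    """One selection round: weight each variant by its recovery factor, then renormalize."""
    weighted = {name: freq * recovery[name] for name, freq in freqs.items()}
    total = sum(weighted.values())
    if total == 0:
        raise ValueError("no variant was recovered")
    return {name: w / total for name, w in weighted.items()}


# Hypothetical starting library and relative recovery per round.
library = {"inactive": 0.989, "weak": 0.010, "strong": 0.001}
recovery = {"inactive": 1.0, "weak": 5.0, "strong": 50.0}

for round_number in range(1, 6):
    library = selection_round(library, recovery)
    print(round_number, {name: round(freq, 4) for name, freq in library.items()})
```

Even a variant that starts at 0.1% of the library dominates after a few rounds when its gene is recovered far more efficiently than the others, which is why a defined number of rounds is sufficient to enrich the desired genes.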
Insights from structural studies of WT and evolved Klentaq DNA polymerase complexes
In contrast to some other critical cellular enzymes and machines, DNA polymerases have been reinvented throughout evolution (
). As a consequence, several distinct structural classes of polymerases provide multiple opportunities to identify an enzyme that will efficiently and faithfully replicate an unnatural base pair. Replicative DNA polymerases have been classified as families A–D, with D being the most recently identified DNA polymerase family from Archaea (
To date, naturally occurring family A and B DNA polymerases have been used to successfully replicate single UBPs. Family A polymerases are found primarily in bacteria; their peptide fold is described as including fingers, palm, and thumb domains associated with polymerase activity (Fig. 5A). They also have 3′–5′ exonuclease and 5′–3′ exonuclease domains, the second removing the RNA primers required for lagging strand synthesis (
), the N-terminal 5′–3′ exonuclease domain is absent, and the 3′–5′ exonuclease domain lacks activity. In contrast, family B DNA polymerases, found in all archaea, include a 3′–5′ exonuclease domain and a polymerase domain, which is related to that in family A polymerases (
), respectively. It has proven much more difficult, however, to replicate templates containing consecutive hydrophobic nucleobases. The problem appears to be caused, in part, by disruptions to the DNA double helix resulting from the presence of hydrophobic UBPs (
); as yet there is no report of DNA containing consecutive NaM:TPT3 (or structurally related hydrophobic UBPs) being replicated. Of course, the necessity of replicating more than one UBP, either consecutive or separated by standard WC nucleobases, depends on the biotechnological application (
A truly artificial genome would presumably include consecutive UBPs just as natural genomes include runs of sequential A:T or G:C pairs; cells possessing such a genome would require a DNA polymerase that can replicate through these regions of sequence. We have shown that up to 4 consecutive Z:P pairs can be incorporated by WT Klentaq polymerase (
), meaning that the UBP is easily lost during PCR-based applications. An engineered Klentaq variant (ZP Klentaq) exhibiting improved fidelity and Z:P incorporation efficiency was, however, obtained using CSR (Fig. 4A) (
). Remarkably, ZP Klentaq contains only four amino acid substitutions, M444V, P527A, D551E, and E832V, all of which are located distal from the active site (Fig. 5A).
In family A DNA polymerases, a large conformational change in the fingers domain occurs upon binding of the complementary dNTP, resulting in the formation of a closed complex that facilitates incorporation (Fig. 5). Of the structurally characterized DNA polymerases related to Klentaq at the sequence level (Table 2), only three have been captured in both pre- and post-incorporation complexes: ZP Klentaq (
). Incorporation of nucleotides by the Geobacillus enzyme exhibits a fingers domain closure angle of 37°. In contrast to the Geobacillus enzyme, incorporation of natural dNTPs by WT Klentaq involves a much larger conformational change in which the fingers domain rotates by ∼59° (Fig. 5, B and C). In the closed conformation of the polymerase, correctly paired nucleotides are selected by hydrogen bonding and size complementarity. Despite retaining these features, Z:P pairs are more efficiently and faithfully incorporated by ZP Klentaq, in which the fingers close down by ∼64° (Table 3). Protein-nucleic acid interactions in both pre- and post-incorporation complexes for ZP Klentaq are similar to those observed in the analogous WT complexes, suggesting that the evolved polymerase closely mimics the WT enzyme in this regard (Fig. 5, D and E).
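One common way to express a domain closure as a single angle is to superpose the open and closed states on a fixed reference domain, fit a rotation matrix R to the moving domain, and recover the rotation angle from the trace: θ = arccos((tr R − 1)/2). The text does not state how the closure angles above were measured, so the sketch below only illustrates this convention. The rotation matrices are synthetic, built about an arbitrary axis from the angles quoted in the text.

```python
import math


def rotation_angle_degrees(r: list[list[float]]) -> float:
    """Rotation angle of a 3x3 rotation matrix, from trace(R) = 1 + 2*cos(theta)."""
    trace = r[0][0] + r[1][1] + r[2][2]
    # Clamp to [-1, 1] to guard against rounding error before acos.
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_theta))


def rotation_about_z(angle_degrees: float) -> list[list[float]]:
    """Synthetic test input: rotation by angle_degrees about the z axis."""
    t = math.radians(angle_degrees)
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


# Closure angles quoted in the text, used here only to build synthetic matrices.
for label, angle in [("Geobacillus enzyme", 37.0), ("WT Klentaq", 59.0), ("ZP Klentaq", 64.0)]:
    recovered = rotation_angle_degrees(rotation_about_z(angle))
    print(f"{label}: {recovered:.1f} degrees")
```

With real structures, R would come from a least-squares superposition of the fingers domain coordinates in the two states.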
Table 2. Structure-based alignments of Taq-related DNA polymerases
) and retain standard features (groove widths, base stacking parameters, etc.) consistent with these helical forms, modeling of a post-incorporation complex of P:Z bound to the WT Klentaq suggested that some structural alterations within the active site would be required to accommodate the UBP (
). The two base pairs closest to the site of incorporation of the next dNTP exhibit A-form, which is significantly wider than B-form. This feature of the active site may allow WT Klentaq to accommodate a single hydrophobic UBP, which is wider than hydrogen-bonded pairs (
Of the amino acid substitutions in ZP Klentaq, M444V was considered to be the best candidate for providing access to increased relative domain motion as compared with the WT enzyme. Residue 444 resides within the hydrophobic core of the palm domain, which serves as the command center for the enzyme. Both the fingers and thumb domains are connected to the palm domain. In addition, the essential catalytic residues Asp-610 and Asp-785, which coordinate Mg2+ and the incoming dNTP in the active site of the enzyme, reside in the palm domain (Fig. 5F). Substitution of a Val for Met at this position creates open space within the core that could potentially translate into increased motion of the fingers and thumb domains.
Augmenting crystallography with molecular dynamics simulations of natural and evolved DNA polymerases
Molecular dynamics (MD) simulations are a well-validated method of studying protein dynamics (
). For example, microsecond trajectories have shown how the dynamic motions of ZP Klentaq differ from those of WT Klentaq in the binary complex containing template-primer DNA duplexes (Fig. 6A). They can also highlight correlated motions in networks of residues that are critical to altered domain motions in the variant DNA polymerase (Fig. 6B) (
), which permit the engineered enzyme to bind WC and Z:P-containing template-primer duplexes in an equivalent fashion, thereby increasing the efficiency of UBP incorporation. This seems to be a consequence of replacing Met-444 and Asp-551 at the base of the thumb domain by valine and glutamate, respectively, which allows the fingers, palm, and thumb domains in ZP Klentaq to move into position about the AEGIS template-primer duplex. Thus, replacing Met-444 by valine changes the populated interactions with adjacent residues (Phe-564, Met-779, and Leu-780) in the hydrophobic core (Fig. 6C) with the consequence that the G helix becomes more flexible in the variant compared with the WT polymerase (
As well as modeling how amino acid substitutions might impact long time-scale motions, such as domain reorientations, MD simulations can illuminate specific molecular changes that have the greatest impact on the altered biophysical and catalytic properties of the DNA polymerase variant. In the case of ZP Klentaq, the role of each of the substitutions (M444V, P527A, D551E, or E832V) was determined using MD simulations of the four single-point Klentaq variants (
). These calculations showed that replacing Met-444 in the hydrophobic core of the palm domain with valine makes a significant contribution to altering dynamic motions in ZP Klentaq. In addition, changing Asp-551 to glutamate alters the interactions of the H and I helices at the base of the thumb domain, which affects G helix flexibility (Fig. 6B) (
Care must be taken to perform MD simulations on the correct form of the enzyme. For example, introducing valine in place of Glu-832, which breaks a salt bridge with Arg-596, has little impact on the dynamics of the ZP Klentaq/DNA binary complex. However, it does alter protein/DNA interactions. This substitution may therefore exert its effects in the pre-incorporation (ternary) complex during dNTP binding and incorporation.
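Correlated motions between residues in a trajectory are often summarized with a normalized cross-correlation of atomic displacements, C_ij = ⟨Δr_i·Δr_j⟩ / sqrt(⟨Δr_i²⟩⟨Δr_j²⟩), which ranges from −1 (anticorrelated) to +1 (correlated). The text does not say which analysis was used in the studies discussed here, so the following is a generic sketch on hypothetical three-frame toy coordinates, not a reproduction of that work.

```python
import math

Vec = tuple[float, float, float]


def displacements(positions: list[Vec]) -> list[list[float]]:
    """Displacement of each frame from the time-averaged position."""
    n = len(positions)
    mean = [sum(p[k] for p in positions) / n for k in range(3)]
    return [[p[k] - mean[k] for k in range(3)] for p in positions]


def cross_correlation(traj_i: list[Vec], traj_j: list[Vec]) -> float:
    """Normalized displacement cross-correlation between two atoms over a trajectory."""
    if not traj_i or len(traj_i) != len(traj_j):
        raise ValueError("trajectories must be non-empty and of equal length")
    di, dj = displacements(traj_i), displacements(traj_j)
    n = len(di)

    def dot(a: list[float], b: list[float]) -> float:
        return sum(x * y for x, y in zip(a, b))

    cov = sum(dot(a, b) for a, b in zip(di, dj)) / n
    var_i = sum(dot(a, a) for a in di) / n
    var_j = sum(dot(b, b) for b in dj) / n
    if var_i == 0 or var_j == 0:
        raise ValueError("an atom that never moves has no defined correlation")
    return cov / math.sqrt(var_i * var_j)


# Hypothetical toy trajectories (three frames each).
atom_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
atom_b = [(5.0, 5.0, 5.0), (6.0, 5.0, 5.0), (7.0, 5.0, 5.0)]  # moves with atom_a
atom_c = [(2.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 0.0)]  # moves against atom_a

print(round(cross_correlation(atom_a, atom_b), 3))
print(round(cross_correlation(atom_a, atom_c), 3))
```

Applied to every residue pair of an aligned trajectory, this yields a matrix in which networks of residues that move together appear as blocks of high correlation.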
MD simulations have also provided interesting insights into the “closing mechanism” of DNA polymerase I prior to the incorporation of WC nucleobases (
). This I707L Klentaq variant exhibits low activity at 37 °C but efficiently replicates DNA at 68 °C, permitting improved PCR amplification of difficult sequences. MD simulations show that the I707L Klentaq variant is almost immobile at 37 °C but exhibits increased flexibility at 68 °C. In contrast, the WT enzyme exhibits less pronounced dynamic changes in simulations performed at 37 and 68 °C. Thus, it is likely that the reduced activity of the I707L Klentaq variant is associated with the enzyme remaining in its closed conformation at lower temperatures (i.e. dNTP binding is blocked) and that the pre-incorporation complex cannot form. As observed for ZP Klentaq (
), the molecular origins of this behavior could be traced to changes in the hydrophobic core; replacing isoleucine by leucine results in an alternate conformation of an adjacent phenylalanine (Phe-749), thereby repositioning the O and O1 helices. As a consequence, the active site becomes filled with nucleobases in the template overhang, thereby blocking entry of the incoming dNTP.
Realizing the full potential of UBPs in the creation of new research, diagnostic, and therapeutic tools will ultimately depend on our ability to identify DNA polymerase variants that can faithfully and efficiently replicate those UBPs. Many of these applications will necessarily require enzymes that are active at the high temperatures used in PCR, limiting the pool of candidate DNA polymerases primarily to family A and family B members. For other applications, such as using dNaM-dTPT3 (Fig. 1) in semi-synthetic organisms that introduce noncanonical amino acids into specific proteins or the isothermal amplification of AEGIS UBPs (
), ZP Klentaq takes advantage of increased domain flexibility to improve the replication efficiency of Z:P pairs, even though both WT Klentaq and ZP Klentaq grip the template-primer duplex in a similar manner prior to and following incorporation of the AEGIS UBP.
Combining X-ray crystallography with MD simulations yields two important conclusions. First, not all substituted residues impact the properties of the evolved enzyme equally. Second, substitution of a key residue, such as Met-444, can result in increased relative domain motion, even in an enzyme that exhibits a very large conformational change during catalysis. It is this altered motion that allows ZP Klentaq to incorporate UBPs more readily than the WT enzyme.
CSR, CST, and related strategies for the directed evolution of DNA polymerases, which typically produce variants containing several amino acid substitutions, might therefore be improved by targeting fewer residues outside the active site for variation. Coupling structure determinations, MD simulations, and enzymatic characterization will provide guiding principles for CSR library design and facilitate rational approaches to optimize family A DNA polymerases to replicate multiple UBPs. Such work can be guided by understanding amino acid substitutions already identified in Klentaq variants (Table 1); for example, Asn-583, Ile-614, and Met-747 are substituted in three of eight variant enzymes, whereas Glu-520, Glu-602, Ala-609, Glu-615, and Glu-742 are substituted in two of eight variants. None of these residues are involved in direct hydrogen-bonding interactions with the substrate template-primer or the dNTP in the active site (Fig. 7). The past evolutionary history of DNA polymerases may also prove useful for selecting specific sites for variation (
We also note that whereas most engineering efforts to date have focused exclusively on improving the ability of DNA polymerases to incorporate UBPs with little consideration for exonuclease activity, a Thermococcus gorgonarius DNA polymerase variant selected to replicate xeno-nucleic acids contains two amino acid substitutions that inactivate the 3′–5′ exonuclease site (
). In theory, incorporation of UBPs that slow down polymerization would allow the substrate to be positioned within the exonuclease active site for proofreading. Whether the exonuclease would efficiently process the newly incorporated unnatural nucleobase or not has yet to be fully investigated. It is possible that engineered DNA polymerases may have to possess optimized exonuclease function as well as improved efficiencies for UBP incorporation to maintain artificial genomes.
In summary, the next generation of useful DNA polymerases must (i) replicate more than one UBP with rates and fidelities similar to those observed for WT DNA polymerases when replicating WC dNTPs and (ii) exhibit no significant pausing following UBP incorporation to avoid invoking proofreading mechanisms. These properties will permit the fidelity of these engineered enzymes to approach 99.99%, allowing them to generate full-length replication products in applications involving multiple rounds of PCR. We suggest a strategy for obtaining these DNA polymerase variants that builds on studies of ZP Klentaq, in which a combination of X-ray crystallography and computer simulations showed the importance of a few key residues for conferring increased dynamic motion in the thumb and fingers domains, thereby improving UBP incorporation efficiency. In this approach, altering the properties of an evolved DNA polymerase variant by introducing additional amino acid substitutions will take advantage of a fundamental, structure-based understanding of how dynamic changes in residue “networks” impact kinetic properties (
Funding and additional information—This work was supported by Biotechnology and Biological Sciences Research Council, UK, Grant P/018017/1 (to N. G. J. R.). This review is also based on work supported by the National Science Foundation under Grant MCB-1939086 (to M. M. G. and S. A. B.). Research reported in this publication was also supported by the National Institutes of Health under Director's Award R01GM128186 (to S. A. B.). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Conflict of interest—S. A. B. owns intellectual property associated with AEGIS nucleobases. Some of the compounds mentioned in this article are sold by Firebird Biomolecular Sciences, LLC, which is owned by S. A. B.