Structural evidence for an in trans base selection mechanism involving Loop1 in polymerase μ at an NHEJ double-strand break junction

Eukaryotic DNA polymerase (Pol) X family members such as Pol μ and terminal deoxynucleotidyl transferase (TdT) are important components of the nonhomologous DNA end-joining (NHEJ) pathway. TdT participates in a specialized version of NHEJ, V(D)J recombination. It has primarily nontemplated polymerase activity but can take instructions across strands from the downstream dsDNA, and both activities are highly dependent on a structural element called Loop1. However, it is unclear whether Pol μ follows the same mechanism, because its Loop1 is disordered in available structures. Here, we used a chimeric TdT harboring Loop1 of Pol μ that recapitulated the functional properties of Pol μ in ligation experiments. We solved three crystal structures of this TdT chimera bound to several DNA substrates at 1.96–2.55 Å resolutions, including a full DNA double-strand break (DSB) synapsis. We then modeled the full Pol μ sequence in the context of one of these complexes. The atomic structure of an NHEJ junction with a Pol X construct that mimics Pol μ in a reconstituted system explained the distinctive properties of Pol μ compared with TdT. The structure suggested a mechanism of base selection relying on Loop1 and taking instructions via the in trans templating base independently of the primer strand. We conclude that our atomic-level structural observations represent a paradigm shift for the mechanism of base selection in the Pol X family of DNA polymerases.

Two major DNA repair systems can resolve DNA double-strand breaks (DSB): homologous recombination (HR) and nonhomologous end joining (NHEJ) (1). HR is an accurate process that uses a DNA duplex present in the cell that is homologous to the DSB site to faithfully restore DNA integrity (2). In contrast, the NHEJ pathway relies only on the two DNA ends at the DSB site and does not require the presence of homologous DNA. Importantly, NHEJ is usually error-prone, leading to the loss or the addition of a few nucleotides (3). HR can occur only in dividing cells during late S and G2 stages, after synthesis of a homologous DNA molecule, whereas the NHEJ pathway can potentially be activated in all phases of the cell cycle and is thought to be the major DSB repair system in higher eukaryotic cells (4).
The NHEJ pathway involves sequential interactions of proteins allowing stabilization, end processing, and ligation of the DSB. The eukaryotic NHEJ machinery is composed of the Ku heterodimer (Ku 70/80), DNA-PKcs, Artemis nuclease, Pol μ and/or Pol λ, and the ligase IV-XRCC4-XLF complex, with accessory roles played by Paralog of XRCC4 and XLF and by aprataxin and PNK-like factor (5,6). The same machinery participates in V(D)J recombination, a programmed genetic recombination that occurs in developing lymphocytes (7)(8)(9). During this process, TdT incorporates random nucleotides at the coding end to increase immune repertoire diversity (10,11). The expression of TdT is limited to primary lymphoid organs, where B- and T-cell maturation occurs, and consequently TdT does not participate in the NHEJ pathway in other cell types (12).
For the last 50 years, TdT has been described as a template-independent polymerase (13,14). Indeed, it behaves like a nucleotidyltransferase even in the presence of an in cis template strand (Fig. 1A). However, recent results demonstrate the ability of this enzyme to carry out templated activity across strands in the presence of a downstream (in trans) DNA duplex with a 3′-protruding end at high DNA/TdT ratios (15). This activity was also described earlier for Pol μ at an equimolar DNA/Pol X ratio (16), as well as an intrinsic nucleotidyltransferase activity in the presence of transition metal divalent ions (Fig. 1A). TdT and Pol μ are two members of the polymerase X family that share high sequence and structure similarity (17,18) (see Fig. S1 for their alignment). From a structural point of view, the main difference between these two polymerases is the sequence of a long loop (Loop1) composed of 20 amino acids (382-401 in TdT) between the β3 and β4 strands (Fig. 1B). In all TdT structures, Loop1 adopts a lariat-like conformation (Fig. 1C) that prevents the binding of an uninterrupted template DNA strand (19). In contrast, Loop1 is disordered in Pol μ structures obtained in a gap-filling complex and does not interact with the continuous template DNA molecule used for crystallization (20). Thus far, Pol μ could not be crystallized in complex with a true DNA-DSB substrate, despite substantial efforts from several laboratories, whereas TdT could not be crystallized in the presence of a 1-nt-gapped DNA substrate (Fig. 1C).
Extensive biochemical experiments were performed on TdT and Pol μ to better understand the difference between their activities. For instance, Pol μ can acquire a template-independent activity by a single point mutation or by exchanging the catalytic metal ions from Mg2+ to Mn2+ (21). Conversely, it is possible to transform TdT into a template-dependent in cis polymerase by a single point mutation that destabilizes the Loop1 conformation (22), as probed by regular primer extension tests with a primer-template duplex containing a 5′-end overhang on the template strand. Interestingly, the deletion of Loop1 in Pol μ improves both DNA binding and catalytic efficiency in DNA-templated reactions (in cis) but inhibits its weak intrinsic template-independent activity (23). Similar experiments with TdT led to the same conclusion: deletion of Loop1 leads to a drastic decrease of the untemplated activity correlated with an increase of the in cis template-dependent activity (22). Furthermore, grafting Loop1 of Pol μ to a chimeric TdT (Fig. 1B) confers to TdT an in cis templated activity (22) (Fig. 1A) and vice versa (23). These results highlight the importance of Loop1 for the specific activity of both Pol μ and TdT. However, the role of Loop1 of Pol μ specifically in a DNA-bridging context is far from clear.
Interestingly, crystal structures of TdT in the presence of a full DNA synapsis could be obtained (15) and showed that Loop1 is crucial to maintain a tight binding of TdT across the DNA synapsis (Fig. 1C). This raised the question of whether Loop1 has the same role in Pol μ and possibly uses a similar mechanism of base selection.
Here, we present a complete functional and structural characterization of the TdT-Loop1-Pol μ chimera (Fig. 1B), hereafter referred to as the TdT-chimera. We previously showed that it is a templated enzyme across a discontinuous template strand, taking its instructions in trans at a 1:1 DNA/TdT ratio (15). We now demonstrate the biological relevance of this TdT-chimera protein using an in vitro NHEJ ligation assay, which shows functional properties similar to those of Pol μ. We then solve its crystal structure in the apo form and in the gap-filling mode and compare each with the corresponding structure of Pol μ (20). We also report the structure of a ternary complex with the downstream dsDNA (down-dsDNA), where Loop1 is fully ordered and prevents the binding of the upstream dsDNA (up-dsDNA) but actively participates in the selection of the incoming dNTP in front of a template base located in trans. Related studies on LigD in prokaryotes show striking similarities with this mechanism (24). Finally, we present the structure of a ternary complex of TdT-chimera with a full DNA-DSB synapsis and an incoming nucleotide, which could be used as a model for the Pol μ-DNA DSB complex.

The TdT-Loop1 chimera
The first Pol X chimera was described in 2006 (23). In this paper, Juarez et al. created both a Pol μ ΔLoop1 mutant and a chimeric construct (Pol μ-TdT chimera) in which Loop1 was replaced by the one from TdT. Electrophoretic mobility shift assay experiments with Pol μ ΔLoop1 and different DNA substrates suggested that Loop1 negatively affects the DNA-binding capacity of Pol μ. Furthermore, deletion of Loop1 in Pol μ produced a 10-fold improvement of the catalytic efficiency of in cis templated polymerase activity and abolished the intrinsic template-independent activity of Pol μ (23). Functional assays showed a similar polymerase activity between the Pol μ-TdT chimera and TdT, in the presence of ssDNA and template-primer substrates.
We performed similar experiments using a TdT-chimera (described in Fig. 1B), obtained by grafting Loop1 of Pol μ in a TdT context (22). Note that in the TdT-Pol μ chimera, 29 amino acids were changed, including Loop1 (20 residues). In addition to Loop1, 4 residues upstream and 5 residues downstream were changed, thereby also including the SD1 region, which stands out as "maximally different" between TdT sequences and Pol μ sequences (25). Comparable template-dependent polymerase activity was observed in the TdT-chimera and Pol μ using an in cis DNA substrate (Fig. 1A), whereas WT TdT displays an essentially untemplated polymerase activity in the same conditions (22). More recently, we demonstrated a similar template-dependent activity both in TdT-chimera and Pol μ using an in trans DNA substrate (Fig. 1A) (15,26). Here we describe further functional studies of TdT-chimera using an in vitro NHEJ ligation assay.

Ligation of compatible or incompatible 3′ overhangs in the presence of full-length WT TdT, TdT-chimera, and Pol μ
In Fig. 2, we show that XRCC4-ligase IV alone is sufficient for ligation of compatible 4-nt 3′ overhangs (lane 3, 53% efficiency), whereas the addition of Ku 70/80 ensures an even more efficient (>88%) ligation (lanes 4-7) (Fig. 2A). This reflects in vivo data showing that Ku 70/80 is not essential for ligation of overhangs containing at least 2 bp of microhomology, which can be generated upon hairpin nicking during V(D)J recombination (40). Sequencing data show that ligation proceeds without nucleotide addition by a polymerase, likely because base pairing of these overhangs occurs faster than template-independent or template-dependent polymerase activity at the DNA ends (Fig. 2B).

Structural model of the base selection mechanism of Pol μ
Whereas XRCC4-ligase IV is sufficient for ligation of compatible overhangs, we find that substantial ligation of incompatible 3′ overhangs does not occur in the absence of TdT-WT, Pol μ, or TdT-chimera (lanes 12-14) (Fig. 2A). Sequencing data show that at least 1 bp of microhomology must become available through either template-dependent or template-independent nucleotide addition before ligation of the ends can occur (Fig. 2C). In particular, these data show that full-length TdT adds nucleotides randomly until at least 1 nucleotide is available for base pairing with the downstream strand. As expected, this reflects the known template-independent activity of TdT-WT. Conversely, Pol μ adds nucleotides mainly template-dependently, although there are four instances where a template-independent addition of 1 nucleotide occurs prior to templated addition, illustrating a small degree of template-independent activity (Fig. 2C). Importantly, the activity of TdT-chimera is much more like that of Pol μ than that of TdT in terms of template dependence because most of the nucleotides added are A and thus complementary to the T overhang of the right-hand DNA end. This clearly indicates that Pol μ Loop1 does indeed confer template-dependent activity to the TdT-chimera across strands, in the context of a DNA synapsis. In addition to the ligated product sequence, the ligation efficiency also emphasizes that the chimera is more like Pol μ than TdT. Specifically, the ligation efficiencies in reactions with Pol μ and TdT-chimera are very similar, in contrast to the efficiency for the TdT reactions (Fig. 2A, compare lanes 12, 13, and 14). Therefore, both the joining efficiency and junctional sequencing support the conclusion that the chimeric protein behaves like Pol μ rather than TdT.

Figure 1. A, ... (15,18,22), illustrating its template-dependent (T-D) or template-independent (T-I) activities, both in cis and in trans. B, sequence alignments of the Loop1 sequence in Mus musculus TdT-WT, TdT-chimera, and Pol μ. Residues belonging to Loop1 are represented in boldface type. The region SD1 is indicated in green. The special numbering choice for the insertion Gln393 is highlighted in black. C, previously known structures of complexes of TdT (pink) or Pol μ (blue) with a primer strand and a downstream DNA duplex (D/S DNA), in a gap-filling mode or with a full DSB-DNA junction. The colored arrows at the bottom indicate the 5′ to 3′ orientation of strands used in the DNA substrates. X means 'not possible' and a slash 'not done.'

Overall structure of TdT-chimera apoenzyme or in complex with the incoming nucleotide
Structures of TdT-chimera apoenzyme or as a complex with an incoming dideoxynucleotide were solved at 2.20 and 1.96 Å resolution, respectively. The overall architecture of the TdT-chimera protein is almost identical to the TdT apoenzyme structure (RMSD of 0.517 Å over 336 Cα atoms using PDB entry 1JMS). The only notable difference is observed in Loop1, localized between strands β3 and β4. In TdT apoenzyme, this loop adopts a lariat-like conformation, with a clear electron density (19), whereas in TdT-chimera apoenzyme, 16 amino acids (positions 384-399) of 20 are missing in the electron density map (Fig. 3). Therefore, Pol μ's Loop1 (in the context of TdT) appears to be as flexible as reported in the context of the Pol μ apo-structure or engaged in a gap-filling complex (20,27).
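The RMSD values quoted throughout this section follow the usual definition: the square root of the mean squared distance between equivalent atoms after superposition. As a minimal sketch, assuming the two Cα coordinate sets have already been superposed and paired one-to-one (the function name `rmsd` and the toy coordinates are illustrative, not taken from the deposited structures), the calculation can be written as:

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) points.

    Assumption: both sets are already superposed and listed in the same
    atom order; no fitting (e.g. Kabsch) is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Hypothetical C-alpha positions in angstroms (toy data, not from a PDB entry).
ref = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
mob = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (7.6, 0.0, 1.0)]
print(rmsd(ref, mob))
```

In practice the superposition step matters as much as the formula, and the number of atoms included (here 336 Cα atoms) should always be reported alongside the value, as the text does.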
Interestingly, in the structure of the TdT-chimera dNTP-Mg2+ complex, Loop1 becomes mainly visible, with the exception of residues 394-396, and contains a short α-helix. Some crystal contacts were observed that might stabilize this short helix (especially Arg393), but the rest of the Loop1 structure is free from such contacts and is sufficient to prevent the binding of an uninterrupted template strand (Fig. 3), as described in various TdT-dNTP structures (28). Here Asp399 and Arg403 make specific hydrogen bonds with the nucleobase (Fig. 3), which is stacked between the conserved positions Trp450 and Arg454. We note that in a bacterial Pol X from Thermus thermophilus, the equivalent of Arg454, namely Lys263, has also been seen to be essential for strong binding of dNTP-Mg2+ (29). The triphosphate moiety of the dNTP binds at the same place in all other TdT or Pol μ structures.
Asp399 and Arg403 belong to a specific sequence motif located at the end of Loop1, called SD1 and first identified by Romain et al. [...] mutation of Arg403 in Pol μ results in an increased nucleotidyltransferase activity (21). Here, we mutated the same position in the context of the TdT-chimera and tested the R403A mutant for its in trans templated activity (Fig. S2). We found a decreased activity with all four substrates, thereby confirming the important role of this side chain suggested by the X-ray structure.
We also observed a rearrangement in the catalytic site involving the side chain of Asp434. Specifically, in the apo form and, in fact, in all known structures of Pol μ and TdT, this aspartate makes a salt bridge with Arg432 (both Asp434 and Arg432 have been shown to be essential in TdT by site-directed mutagenesis). Here it changes partners from Arg432 to Arg403, from the SD1 motif, preventing the correct coordination of metal A, which is absent in the structure of the dNTP-Mg2+ complex (Fig. S3). We note that the equivalent of Asp434 in Pol λ is seen in both conformations in PDB structure 1XSN and that metal A is known to be the last partner to bind to complete the assembly of the catalytic site in Pol β (31). A similar (but not identical) mechanism involving a change of partners in salt bridges occurs in the catalytic site of Pol β when switching from the open form to the closed form (32); specifically, Asp192 switches from interacting with Arg258 to metal B, whereas Arg258 changes rotamer to interact with Glu295 and Tyr296 in motif SD2.
In summary, the presence of the incoming nucleotide participates in the organization of Pol μ's Loop1, whereas in TdT, Loop1 is intrinsically ordered and adopts a similar conformation (RMSD = 1.95 Å) in the absence or in the presence of an incoming nucleotide in known structures.

Exchanging Loop1 allows TdT-chimera to bind a DNA substrate in a gap-filling mode
To check whether TdT-chimera reproduces the known behavior of Pol μ in those cases where structural data are available, we co-crystallized the TdT-chimera in complex with a 1-nt-gapped DNA duplex substrate and a nonhydrolyzable nucleotide (Fig. 4A). In this nonhydrolyzable dNTP, the oxygen atom between the α- and β-phosphates has been substituted by a carbon atom, to prevent DNA synthesis and to block the enzyme in a precatalytic state. The structure was solved at 2.35 Å resolution with two copies of the complex in the asymmetric unit. The electron density of each one of the DNA bases is well-defined and readily allows the building of the DNA molecules as well as the incoming nucleotide, which makes Watson-Crick interactions with the templating base (Fig. S4A). Binding of the uninterrupted template strand is possible because residues 384-401, corresponding to Loop1, are disordered (Fig. 4C). Such a complex could not be obtained under the same conditions using WT TdT, probably because of its intrinsically ordered Loop1, whereas several similar structures, also showing a disordered Loop1, have been solved with Pol μ (20,27,33).
The two copies of the same complex in the asymmetric unit show no major difference (RMSD of 0.28 Å over 329 Cα atoms). Their comparison with the corresponding complex with Pol μ shows that the protein structures are very close (RMSD of 1.95 Å over 328 Cα atoms of the protein using PDB entry 2IHM), as are the DNA molecules; the RMSD is 1.255 Å over 11 nucleotides for the template strand, 1.30 Å over 6 nucleotides in the upstream primer strand, and 1.36 Å over 4 nucleotides for the downstream primer strand (Fig. 4D). In both structures, Loop1 is disordered and gives way to the DNA template strand. One difference involves the β2-α12 loop, which appears more flexible in the TdT-chimera gap-filling complex because no electron density is present to build residues 452 and 453 (Fig. 4C), whereas the N-terminal part of the α12 helix is slightly distorted and shifted by 3.3 Å. Concerning the nucleobase of the incoming dNTP, its orientation is slightly modified compared with the one seen in the ddCTP complex to make a Watson-Crick bp with the templating base, whereas the Arg454 side chain swings to allow this rearrangement (Fig. 4B).
In the TdT-chimera gap-filling structure, we used a 5′-phosphorylated downstream primer because the presence of a phosphate group in this position was described to be important for the binding of the downstream DNA strands in Pol μ (33,34). Importantly, we also tested the gap-filling activity in vitro for the TdT-chimera and found that it essentially reproduces the activity of Pol μ and not that of TdT (Fig. S5).

Loop1 checks the in trans nucleotide selection in the absence of a DNA primer strand
By mixing TdT-chimera with a 2-fold excess of the dsDNA and an incoming ddCTP, we obtained crystals that led to a detailed picture of a possible role for Loop1 at 2.09 Å resolution. Unexpectedly, only the downstream dsDNA was visible in the electron density (Fig. S4B), and Loop1 was ordered and actively involved, through its main-chain atoms, in stabilizing the Watson-Crick interactions of the nascent bp with an in trans instructing base. All residues of Loop1, including the side chains, could be manually built (Fig. 5, A and B), and several secondary structure elements were identified, including two sequential 3₁₀ helices and an α-helix of 7 residues (Fig. 6A). Notably, no crystal contact is involved in the stabilization of this conformation. The full upstream dsDNA is excluded by Loop1, whereas in TdT's comparable structure, Loop1 just prevents the binding of a continuous template strand but not of the primer strand (Fig. 7D). The incoming ddCTP, which can access the nucleotide binding site through a dedicated channel formed by the 8-kDa and fingers domains, makes Watson-Crick interactions with the first 3′ protruding base of the downstream template strand and nicely fits a cavity created by Loop1 (Fig. 7B), whereas the rest of the protruding bases make their way out of the active site through a separate exit channel, located between Loop1 and the thumb domain (Fig. 7C).
The incoming ddCTP is positioned by Loop1 of Pol μ opposite the most downstream template nucleotide possible (the last ssDNA/template nucleotide before dsDNA), even when another upstream complementary nucleotide is present (Figs. 5A and 7A). This is consistent with previously described studies of template selection by Pol μ (33,34).
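The selection rule described here can be restated as a small sketch: the templating base is the single-stranded nucleotide immediately adjacent to the duplex on the 3′ protruding end of the downstream template strand, and the expected incoming nucleotide is its Watson-Crick partner. The function names and the example overhang below are hypothetical and serve only to illustrate the rule; they do not reproduce the crystallized sequences.

```python
# Watson-Crick partners for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def in_trans_templating_base(overhang_5to3):
    """Return the templating base of a 3' protruding end.

    `overhang_5to3` is the single-stranded part of the downstream template
    strand written 5'->3', so its first letter is the nucleotide that
    directly follows the duplex (the last ssDNA base before dsDNA).
    """
    if not overhang_5to3:
        raise ValueError("a 3' overhang of at least 1 nt is required")
    base = overhang_5to3[0].upper()
    if base not in COMPLEMENT:
        raise ValueError(f"unexpected base: {base!r}")
    return base

def expected_incoming_nucleotide(overhang_5to3):
    """Base expected opposite the in trans templating base."""
    return COMPLEMENT[in_trans_templating_base(overhang_5to3)]

# Hypothetical 3-nt overhang: a second G further out does not change the
# choice, because only the base next to the duplex is read.
print(expected_incoming_nucleotide("GTG"))
```

The sketch encodes position only; as discussed later in the text, the structure suggests that the check itself is made on the shape of the nascent base pair rather than on base identity.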
We observe a slight distortion in the catalytic site, where the χ2 angle of the catalytic residue Asp434 (Asp418 in Pol μ) is rotated by 82° compared with the apo-form (Fig. S3). This is due to an interaction with the side chain of His381 (His363 in Pol μ), which is also stabilized by stacking interactions with Arg403. This rotation prevents the correct coordination of metal A in the active site.
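A side-chain rotation such as the one reported for Asp434 is measured as a change in a dihedral angle defined by four atoms (for the second side-chain dihedral of an aspartate, conventionally CA, CB, CG, and OD1). The following is a minimal sketch of that calculation using the standard atan2 formulation; the helper names and the toy coordinates are assumptions for illustration and are not taken from the refined models.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four consecutive atoms."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def angle_change_deg(after, before):
    """Smallest signed rotation from `before` to `after`, in [-180, 180)."""
    return (after - before + 180.0) % 360.0 - 180.0

# Toy geometry: four points arranged so that the dihedral is a right angle.
print(dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Comparing two structures then amounts to computing the dihedral in each and passing both values to `angle_change_deg`, which handles the wrap-around at ±180°.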
Loop1 drastically changes its conformation and is completely remodeled when compared with the ddCTP binary complex (Fig. 7D). It interacts with residues in the fingers domain (Leu260), the palm domain (Gln379, Arg403, and Asp434), and the thumb domain (Arg454, Arg461, Asp473, Asn474, and His475) of the protein (Fig. 6, B-D). The interactions with the incoming base involve only main-chain atoms of Loop1 (Fig. 6C), and the templating base interacts with the conserved residue Arg461, whose mutation into an alanine has a strong deleterious effect in TdT (25). Importantly, all of the interactions of Loop1 in the chimera construct with the rest of the TdT-like structure involve residues that are conserved in Pol μ or subject to a conservative substitution (Fig. 6C). To investigate further how this structure would be modified in the context of the full Pol μ sequence, we modeled this complex using homology modeling techniques. This is justified, considering the high level of sequence identity (42%) between them.
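A sequence identity figure such as the 42% quoted above is derived from a pairwise alignment. As a rough sketch, and assuming the common convention of dividing identical positions by the number of aligned (non-gap) columns, the calculation looks like the following; the source does not state which convention was used, and the function name and toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap.

    Assumption: both strings come from the same pairwise alignment and
    therefore have equal length. Other conventions (e.g. dividing by the
    shorter sequence length) give slightly different numbers.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    aligned = matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip insertion/deletion columns
        aligned += 1
        if a == b:
            matches += 1
    if aligned == 0:
        raise ValueError("no aligned columns to compare")
    return 100.0 * matches / aligned

# Toy alignment (not TdT or Pol mu sequence): 5 matches over 6 aligned columns.
print(round(percent_identity("ACDE-FG", "ACQEHFG"), 1))
```

Identity in this range is generally regarded as sufficient for template-based modeling of the fold, which is the justification the text gives for building the model on the chimera structure.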

Modeling of the full sequence of Pol μ in the context of the downstream dsDNA complex
Both in the TdT-chimera X-ray structure and the Pol μ homology model, the following features were observed. 1) The main-chain atoms of Asn391, Leu392, and Arg393 stabilize the nascent bp formed by the incoming ddCTP and the template base across strands, but their side-chain atoms play no apparent role. This is shown in Fig. 7B, where residues from Loop1 are represented in surface mode and colored in dark blue, playing the role of the absent upstream dsDNA. Clearly, the check is made at the level of the nascent bp volume, and there is no base specificity: all isosteric bp would be accommodated in the same way in this cavity. 2) There is a direct interaction of the base immediately downstream of the templating base with the side chain of Arg393, which also interacts with the side chains of both Ser388 and Asn391 in Loop1 (Fig. 6C). 3) The side chains of Gln393B and Thr397 make hydrogen bonds with the DNH motif (Asp473, Asn474, and His475), also called the SD2 region (22) or SD2 motif (25) or the thumb mini-loop motif (16), localized in the β8-β9 loop (Fig. 6D). Mutations of this motif in human Pol μ (16) resulted in loss of function. 4) There is a van der Waals contact between Phe401 in region SD1 and both Trp450 and the SD2 region. The F401A mutation in human Pol μ (16) resulted in total loss of activity. 5) Residues Phe401 and Phe405 (SD1) sandwich the side chain of His381 (N terminus of Loop1), thereby clipping both ends of Loop1. The mutation of Phe405 in human Pol μ (F387A), as well as in mouse Pol μ (F391A), resulted in a total loss of function (16,25). 6) The Asp399 side chain stabilizes the short N-terminal helix of Loop1. Its mutation in mouse Pol μ (D385E) resulted in a total loss of function (25).
Interestingly, two arginine residues (Arg454 and Arg458 in TdT, corresponding to Lys438 and Arg442 in Pol μ) are close to position Ser372 (Ser388 in TdT), localized in the middle of Loop1, which is the main cyclin-dependent kinase phosphorylation site in Pol μ during S and G2 phases (35). A reduced activity of Pol μ was observed when this position was mutated into a glutamate residue, mimicking a phosphorylated serine, suggesting a regulatory mechanism to avoid NHEJ activity in dividing cells. The structure therefore suggests how these two arginines would interact with a phosphorylated serine at position Ser372 and increase the stability of Loop1, thereby preventing binding of the primer DNA and inhibiting the polymerase activity.
It should be noted that a short extra DNA strand forming a triple helix with each dsDNA is present in the electron density map, forming 1A:2T triple bases. This third strand does not interact with the protein, except with Glu67, far away from the active site, but stabilizes packing interactions that occur between neighboring DNA duplexes in the crystal. To check whether the presence of this extra strand could induce an artifactual conformation of the dsDNA in the crystal, we compared its structure with an earlier structure of TdT (PDB code 5D46) with a DSB-DNA synapsis and found that the RMSD is 1.16 Å over the backbone atoms of 12 nucleotides (3 atoms per nucleotide: C4′, C1′, and P) for the downstream dsDNA.
Furthermore, we removed the third strand and performed energy minimization in water; the RMSD of DNA atoms of the TdT-chimera was only 0.8 Å, whereas the RMSD on Cα atoms was 1.1 Å.
We also subjected the Pol μ homology model to energy minimization in water in the presence of both ddCTP and the down-dsDNA. After equilibration, the RMSD was 0.8 Å on DNA atoms and 1.0 Å on the protein Cα atoms. Notably, the Loop1 conformation was remarkably stable.

Flexibility of Loop1 allows Pol μ to interact with a full DNA synapsis
We also solved the structure of TdT-chimera bound to a full DSB-DNA substrate (a DNA synapsis) and with an incoming ddCTP, at 2.55 Å resolution, by increasing the dsDNA/protein ratio to 4:1 instead of 2:1. This time, both upstream and downstream dsDNA can be fully built in the electron density map (Fig. S4C). The TdT-chimera DSB-DNA structure is highly similar to the TdT-WT DSB-DNA structure (RMSD of 0.504 Å over 336 Cα atoms with PDB entry 5D46). Moreover, ddCTP is present in the nucleotide-binding pocket and makes Watson-Crick interactions with the first single base at the 3′ protruding end of the downstream dsDNA molecule (across strands). Loop1 appears to be disordered in this structure, so that it does not sterically hinder the binding of the template strand (Fig. 8, A and B), as observed in the TdT-chimera structure in a gap-filling mode (Fig. 4).
A third DNA strand is present on each DNA duplex, forming a triple helix with a 1A:2T stoichiometry (Fig. 8A). As described above, these additional strands help to stabilize interactions in the crystal-packing arrangement, but they do not interact directly with the protein. They also do not directly participate in the stabilization of the DNA synapsis itself, as observed in the TdT-WT DSB-DNA complex. To check whether the presence of this extra strand could induce an artifactual conformation of the DNA in the crystal, we compared its structure with the known structure of TdT with a DSB-DNA synapsis; the RMSD is 0.5 Å over 6 nucleotides for the upstream primer, 5 nucleotides for the downstream primer strand, and 6 nucleotides for the downstream template strand (with 3 backbone atoms per nucleotide). We note that the third DNA strand is not in the same direction in the downstream and upstream parts of the synapsis.
In summary, it is possible to crystallize the TdT-chimera construct in the context of a full DNA-DSB junction, but in this case, Loop1 is lifted up and moved out of the way of the upstream DNA duplex, as if it is no longer needed once it has played its role to select the base in front of the in trans templating base.

Discussion
The studies presented here provide the first atomic structure for two DNA ends brought close together by a protein that recapitulates the properties of a Pol X DNA polymerase involved in the NHEJ machinery, in this case Pol μ, in ligation experiments.

Comparison of Pol X activities during NHEJ in the presence of different 3′ ends
The biochemical ligation tests using Ku 70/80, XRCC4-ligase IV, and either the full-length TdT, Pol μ, or TdT-chimera provide useful insights into the role of Pol X polymerases in NHEJ (Fig. 2). First, TdT robustly adds nucleotides in a template-independent manner prior to ligation in the case of incompatible DNA ends. However, TdT does not add nucleotides when compatible DNA ends are being joined. This illustrates that the collision and annealing of the DNA ends is rapid relative to the encounter of those ends with the Pol X polymerase. This observation confirms and extends previous work showing that when DNA end structures are compatible, new synthesis is suppressed. This was indeed apparent in very early work before specific proteins were identified for NHEJ (36,37). More recently, the degree of Pol X engagement was shown to be directly proportional to the extent to which there was a barrier to direct ligation (due to sequence overhang incompatibility), both in vitro and in vivo, for Pol (38).
Second, for incompatible DNA ends, Pol μ usually adds nucleotides that generate terminal microhomology. But in ~25% of instances in the experiments here for this configuration, it appears that Pol μ adds at least 1 nucleotide in a template-independent manner. This raises the possibility that the microhomology nucleotide is also template-independent, and we are only observing the subset of events where Pol μ added, by chance, a nucleotide that provided 1 bp of terminal microhomology. The remaining nucleotides could reflect fill-in synthesis by Pol μ in a template-dependent manner. The clearest tests of template-dependent versus template-independent addition by Pol μ are with dideoxynucleotides or immobilized DNA ends (33,39,40), and in these tests, Pol μ shows both template-independent and template-dependent activity. Our in trans structural studies show synthesis across a discontinuous template by both Pol μ and TdT-chimera (15). All of the aforementioned biochemical and structural data are consistent with the original conception of Pol μ's ability to cross a discontinuous template (39).
Most importantly for this study, in the NHEJ biochemical assays using TdT-chimera, the nucleotide additions are much more like those of Pol μ than of TdT. This illustrates the importance of Loop1 in the distinction between TdT and Pol μ, directly in the context of NHEJ, and validates that TdT-chimera can be used to characterize the role of Loop1 in Pol μ and the SD1 region at the structural level.

Loop1 in the context of the Pol X family: Positioning SD1 and SD2 regions
The length of Loop1 is one of the main differences observed among members of the polymerase X family. This loop is composed of only 4 and 9 amino acids in Pol β and Pol λ, respectively, whereas Loop1 is made up of 20 amino acids in TdT and 17-21 residues in Pol μ (Fig. 5C). All structures of individual members of this family were solved by X-ray crystallography. Loop1 can be observed in Pol β, Pol λ, and TdT structures, but not in any of the currently available Pol μ structures (this loop is too small in Pol β and Pol λ to interfere with the template DNA path). At the sequence level, Loop1 is more conserved in TdT sequences than in Pol μ sequences, and there seems to be an inverse correlation between sequence conservation and flexibility of Loop1. Indeed, Loop1 always adopts the same fixed conformation in TdT, where the sequence conservation is high. On the other hand, the Loop1 sequence is more divergent in Pol μ, resulting in an increased flexibility of this loop in Pol μ.
Just downstream of Loop1, there is an important region called SD1 that is differentially conserved in Pol and TdT (Figs. 1B and 5C). Our structures indicate that Loop1 ordering in the complex with the down-dsDNA is responsible for the new positioning of the SD1 region (located at the C terminus of Loop1) with respect to the SD2 region and the catalytic site (Fig.  6), and this probably explains its importance for functional aspects of Pol .
For Pol , Loop1 is too short to play the role described here. However, Loop3, coming from the down-dsDNA side, might be able to play a similar role, as suggested by the superimposition of the different structures in a recent review (26). Answering this question will require additional structural studies of Pol in the context of a true DNA synapsis, as we have done here.

Sequence of conformational changes during the Pol μ catalytic cycle in the presence of 3′ overhanging DNA ends
Recent studies have provided a wealth of structural and biochemical information about the DNA-bridging binding properties of eukaryotic Pol X polymerases (15,26). Nevertheless, the order of substrate binding and the role of Loop1 in Pol μ activity in the NHEJ pathway remain central unknowns. The new structural information obtained here with the TdT chimera may be organized as follows to explain the function of Loop1 during NHEJ by Pol μ in the presence of 3′ protruding ends (Fig. 9). First, our data on the binary complex with dNTP are compatible with the idea that Pol μ is always "loaded" with a dNTP (see below). When a DSB is detected and stabilized by the Ku heterodimer, the Pol μ-dNTP complex would bind preferentially to the downstream DNA duplex, owing to the presence of a 5′-phosphate binding pocket. In this process, Loop1 is rearranged to stabilize Watson-Crick interactions in the microhomology bp across strands and excludes the binding of the up-dsDNA. Subsequently, Loop1 would be displaced by the up-dsDNA positioning, driven by base-stacking interactions. Pol μ would then catalyze nucleotide incorporation onto the primer DNA, allowing bridging between upstream and downstream dsDNA, followed by dissociation of the complex. Therefore, the catalytic cycle contains a separate step that checks Watson-Crick interactions at the nascent bp, independently of the upstream DNA molecule. This suggests for the first time a structural basis for the role of the specific Loop1 of Pol μ that includes the selection of the incoming nucleotide before binding the primer strand.
This step is actually the major difference between TdT and Pol μ, because such an intermediate state was never detected during extensive crystallization trials at various WT TdT/DNA ratios. Indeed, Loop1 is always ordered in TdT and structured in such a way that it excludes the upstream template strand, but not the upstream primer strand (26). This further highlights the importance of stacking interactions between the incoming nt and the 3′ base of the primer in TdT, which indeed are known to play a major role in the nature of sequences added (13,30).
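The proposed order of events can be summarized as a small state walk. The following is only an illustrative sketch: the step names and the function `proposed_cycle` are hypothetical labels introduced here, and the model is deliberately simplified in that a mismatched dNTP is always assumed to lead to disassembly, whereas the structural model also allows occasional incorporation of a mismatched nucleotide.

```python
from enum import Enum, auto


class Step(Enum):
    """Hypothetical labels for the intermediates of the proposed cycle."""
    APOENZYME = auto()           # free polymerase
    BINARY_DNTP = auto()         # polymerase "loaded" with a dNTP
    DOWNSTREAM_TERNARY = auto()  # + downstream dsDNA; Loop1 ordered, up-dsDNA excluded
    FULL_SYNAPSIS = auto()       # Loop1 displaced, upstream dsDNA recruited
    POST_CATALYTIC = auto()      # nucleotide added to the upstream primer
    DISSOCIATED = auto()         # enzyme released, bridged DNA left for ligation
    DISASSEMBLED = auto()        # simplified outcome for a mismatched dNTP


WATSON_CRICK = {"A": "T", "T": "A", "G": "C", "C": "G"}


def proposed_cycle(incoming_dntp: str, templating_base: str) -> list:
    """Return the ordered steps visited for one incoming dNTP.

    The Watson-Crick check happens on the downstream duplex alone,
    before the upstream primer strand is bound.
    """
    steps = [Step.APOENZYME, Step.BINARY_DNTP, Step.DOWNSTREAM_TERNARY]
    if WATSON_CRICK[incoming_dntp] != templating_base:
        # Simplifying assumption: a mismatch always aborts the cycle.
        steps.append(Step.DISASSEMBLED)
        return steps
    steps += [Step.FULL_SYNAPSIS, Step.POST_CATALYTIC, Step.DISSOCIATED]
    return steps


if __name__ == "__main__":
    for step in proposed_cycle("C", "G"):
        print(step.name)
```

The point of the sketch is the position of the base-pairing check: it precedes the `FULL_SYNAPSIS` step, so nucleotide selection does not depend on the primer strand.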

Similarity with the bacterial NHEJ system
Loop1 is mostly ordered in the binary structure with dNTP-Mg2+, in the absence of any primer strand (Fig. 9). Strikingly, the same type of intermediate structure is also present in bacterial Pol X from T. thermophilus (43), where it was suggested that the bacterial Pol X is always present in solution as a complex with one of the four dNTPs. This is important because phylogenetic studies of the Pol X family (43) suggest that eukaryotic Pol X members involved in NHEJ have a bacterial origin. We note that if the polymerase is already loaded with a dNTP prior to its binding to a synapsis of two DNA ends, it may incorporate a nucleotide that does not match the downstream DNA end, resulting in a template-independent mode (39). This is a critical point highlighted by our study.
In bacteria, NHEJ is promoted by PolDom, a member of the archaeo-eukaryotic primase (AEP) superfamily, whose folding is different from that of the Pol X family. In the bacterial Mycobacterium tuberculosis PolDom structure, there is also an intermediate state that contains the incoming nucleotide and only the downstream dsDNA (with no upstream dsDNA) and also a mobile loop, called Loop2, that can adopt two conformations and regulate the binding of a catalytic metal ion in the polymerase active site (24). The rotation of the side chain of one of the catalytic aspartates that interacts with an arginine belonging to Loop2 leads to an inactive catalytic site (Fig. S6). The relevance of such a complex in solution was demonstrated using FRET experiments and electrophoretic mobility shift assays (24). Although Loop1 of Pol μ is neither structurally nor topologically related in any way to Loop2 of PolDom, both loops intervene in stabilizing the catalytic site conformation. Also, both loops are able to promote the complete exclusion of the upstream dsDNA. It was postulated for PolDom that this preliminary step is responsible for nucleotide selection, prior to DNA bridging at the DSB site. The fact that similar observations can be made in the prokaryotic and eukaryotic NHEJ pathways suggests that this mechanism, which dissociates DNA bridging and fidelity, may have been selected twice in evolution. Because the folding topologies of the polymerases involved in this reaction are different, we may speak of convergent evolution for the mechanism of base selection by the NHEJ polymerase in bacterial (AEP family) and eukaryotic (Pol X family) systems.
Figure legend: Initially, a complex composed of free Pol μ (apoenzyme) and dNTP is formed. The binding of downstream dsDNA, strengthened by the 5′-phosphate-binding pocket, modifies Loop1 conformation to favor Watson-Crick interactions in the nascent bp but also prevents the binding of upstream dsDNA, including the primer. Subsequently, Loop1 is moved away, and the upstream dsDNA is recruited (DSB full synaptic complex), allowing nucleotide incorporation on the upstream primer (DSB post-catalytic complex). Finally, the enzyme and the bridged DNA dissociate to allow for the action of ligase IV. If the incoming dNTP does not form a Watson-Crick bp with the downstream template DNA end, then it is possible that the ternary complex (downstream duplex + dNTP-Pol μ) will disassemble. If the complex does not fall apart and the mismatched nucleotide is incorporated, then this would account for the low level of template-independent addition that is seen in Fig. 2C (bottom two boxes).
In conclusion, the set of proposed structures of intermediates in the catalytic cycle of Pol μ described here represents a paradigm shift in the base selection mechanism in the Pol X family of DNA polymerases. Because of the high quality of the atomic details in this set of structures and the high sequence identity between Pol μ and the TdT chimera, we can reliably model Pol μ in the context of the proposed complexes along the catalytic cycle. This might ultimately help in the rational design of inhibitors specific to this step of NHEJ and DNA repair, during which the DNA ends are made compatible before ligation.

Cloning and protein purification
The catalytic domains of mouse TdT and of the mouse TdT chimera were expressed and purified using the protocol described previously (25). The catalytic domain of human Pol μ was cloned, expressed, and purified using the protocol described previously (27). The full-length sequences of mouse TdT and mouse Pol μ were cloned into the RSFDuet-1 expression vector (Novagen) fused to an N-terminal 14-histidine tag followed by a cleavage site for tobacco etch virus protease. The full-length sequence of the TdT chimera contains the BRCT (breast cancer susceptibility C terminus) domain of mouse Pol μ (residues 1-140), the catalytic domain of mouse TdT (residues 141-510), and Loop1 of mouse Pol μ (residues 378-406).
To keep the original TdT residue numbering everywhere, including after Loop1, the Gln 394 residue that is an insertion between the two Loop1 sequences is labeled differently (393B, Fig. 1B).
Proteins were expressed in Escherichia coli BL21-CodonPlus (DE3)-RIPL strain in Luria broth at 20°C for 16 h after induction by 0.5 mM isopropyl β-D-thiogalactoside. The purification was done using nickel-nitrilotriacetic acid (Ni-NTA) chromatography, followed by overnight tobacco etch virus cleavage and heparin chromatography (GE Healthcare). All proteins were stored at −20°C in 25 mM Tris-HCl, pH 7, 300 mM NaCl, and 15% glycerol.

Oligonucleotides and DNA substrates
Oligonucleotides used for the NHEJ assay were synthesized by Integrated DNA Technologies, Inc. (San Diego, CA). Oligonucleotides were purified using 12% denaturing PAGE, and their concentration was determined by UV spectroscopy. 5′ end radiolabeling of oligonucleotides was performed using [γ-32P]ATP (3000 Ci/mmol) (PerkinElmer Life Sciences) and T4 polynucleotide kinase (New England Biolabs). Unincorporated radioisotope was removed using Sephadex G-25 spin columns (Epoch Life Science). Duplex DNA substrates were created by adding a 20% excess of unlabeled oligonucleotide to the radiolabeled complementary strand. DNA substrates were heated at 95°C for 5 min and cooled at room temperature for 3 h and then at 4°C overnight. Sequences of oligonucleotides used in this study are as follows: TAG TGG GTT CAG CAG GCA TTG TGC TAT GAT CAA CCG AAT CTG TAC ATA TAT CAG TGT CTG CAT CGT CGA CCT TGG AGG CAT CGG GG-3′; HC119, 5′-biotin-CGA TAG TGG GTT CAG CAG GCA TTG TGC TAT GAT CAA CCG AAT CTG TAC ATA TAT CAG TGT CTG CAT CGT CGA CCT TGG AGG CAT CTT TT-3′; JG163, 5′-GTT AAG TAT CTG CAT CTT ACT TGA CGG ATG CAA TCG TCA CGT GCT AGA CTA CTG GTC AAG CGG ATC GGG CTC GAC C-3′; JG166, 5′-CGA GCC CGA TCC GCT TGA CCA GTA GTC TAG CAC GTG ACG ATT GCA TCC GTC AAG TAA GAT GCA GAT ACT TAA CAG G-3′. Asterisks indicate phosphorothioate linkages.
Oligonucleotides used for the polymerase activity test were purchased from Eurogentec and dissolved in 50 mM Tris-HCl (pH 8) and 1 mM EDTA. Concentrations were measured by UV absorbance using the absorption coefficient ε at 260 nm provided by Eurogentec. The primer strand was 5′-labeled with [γ-32P]ATP (PerkinElmer Life Sciences; 3000 Ci/mmol) using T4 polynucleotide kinase (New England Biolabs) for 1 h at 37°C. The labeling reaction was stopped by heating the kinase at 75°C for 10 min. Upstream (5′-TAC GCA TTA GCC TG) and downstream (5′-P-GGC TAA TGC GTA) primers were mixed with the template strand (5′-TAC GCA TTA GCC CCA GGC TAA TGC GTA), heated for 5 min up to 90°C, and slowly cooled to room temperature overnight.
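As a quick consistency check on the gapped substrate described above, the short sketch below aligns the upstream and downstream primers on the template and reports the size of the gap left between them. It assumes perfect Watson-Crick annealing of each primer at its end of the template; the function names are illustrative only and are not part of any published protocol.

```python
# Complement table for unmodified DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def gap_on_template(template: str, upstream: str, downstream: str):
    """Return (gap_length, missing_nucleotides, templating_bases).

    All sequences are written 5'->3'. The upstream primer is expected to
    anneal to the 3' end of the template and the downstream primer to its
    5' end, leaving a gap in between.
    """
    # Strand that would pair with the whole template, read 5'->3'.
    full_partner = reverse_complement(template)
    if not full_partner.startswith(upstream):
        raise ValueError("upstream primer does not anneal to the template 3' end")
    if not full_partner.endswith(downstream):
        raise ValueError("downstream primer does not anneal to the template 5' end")
    gap_length = len(template) - len(upstream) - len(downstream)
    if gap_length < 0:
        raise ValueError("primers overlap on the template")
    missing = full_partner[len(upstream):len(upstream) + gap_length]
    templating = reverse_complement(missing)
    return gap_length, missing, templating


if __name__ == "__main__":
    template = "TACGCATTAGCCCCAGGCTAATGCGTA"
    upstream = "TACGCATTAGCCTG"
    downstream = "GGCTAATGCGTA"
    print(gap_on_template(template, upstream, downstream))  # (1, 'G', 'C')
```

With the sequences listed above, the primers leave a single-nucleotide gap opposite a templating C, so correct gap filling would require one dGMP.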

NHEJ assay
In vitro NHEJ assays were performed as described previously (57). Briefly, NHEJ components were incubated with DNA substrates as indicated at 37°C for 1 h. Markers were generated under the same conditions. Reactions were terminated by heating at 95°C for 10 min, and samples were subsequently deproteinized using phenol-chloroform extraction. Extracted DNA was resolved using 8% denaturing PAGE and detected by autoradiography. Ligation efficiency was quantitated using Quantity One 1-D analysis software (Bio-Rad).

Junction sequence analysis
Sequence analysis of ligated DNA junctions was performed as described previously (57). Briefly, DNA was visualized by exposing dried radioactive gels to an X-ray film overnight. Ligated DNA products were eluted from the gel, and junction sequences were amplified from these products using PCR primers HC105 and HC114. Amplified junction sequences were TA-cloned into pGEM-T Easy vectors (Promega) and transformed into electrocompetent DH10B cells. Transformed cells were plated on Luria broth-agar/ampicillin/X-gal, and white colonies were selected for sequencing.
Crystals of the TdT chimera alone (apoenzyme) or mixed with ddCTP grew in 1 day at 18°C by mixing 1 μl of concentrated protein at 10 mg ml−1 and 1 μl of mother liquor solution containing 20-24% PEG 6000, 400-800 mM lithium chloride, and 100 mM MES, pH 6. Crystals of the TdT chimera in the presence of nucleotide and dsDNA or gap-filling DNA (1 μl of complex + 1 μl of mother liquor) grew in 1 day at 18°C in a solution containing 19-25% PEG 4000, 100-400 mM lithium sulfate, and 100 mM Tris, pH 8.5. Crystals were cryo-protected using one soaking step with 25% glycerol and then flash-frozen in liquid nitrogen. X-ray diffraction data were collected at the Soleil Synchrotron (Saint-Aubin, France) on Beamline Proxima-1 and at ESRF (Grenoble, France) on Beamlines ID23-1, ID23-2, and ID29.

Data processing, crystallographic refinement, and model validation
Diffraction data sets were processed using XDS (41) and CCP4 (46,47). Crystals of the TdT chimera apoenzyme alone or bound to ddCTP belong to space group P2₁2₁2₁ and diffract to 2.20 and 1.96 Å resolution, respectively. Crystals of the TdT chimera in the presence of a 2-fold excess of dsDNA A5/T5GG and ddCTP diffract to 2.09 Å resolution and belong to space group P2₁2₁2. Crystals of the TdT chimera in the presence of a 4-fold excess of dsDNA A5/T5G and ddCTP diffract to 2.55 Å and belong to space group P2₁. Finally, crystals of the TdT chimera in the gap-filling mode with a continuous template strand and dCpcpp diffract to 2.35 Å and belong to space group C2 with two molecules of the TdT chimera per asymmetric unit. Molecular replacement was performed with the program Phaser using the PDB file 1JMS as a search model (48). Iterative cycles of manual model building and refinement were carried out with COOT (49) and BUSTER (50), using TLS parameters (51) in the last stages of refinement. The number of TLS groups was chosen by default by BUSTER. The quality of the models was assessed using MolProbity (42). Data collection and refinement statistics are reported in Table S1. Structure superimpositions were performed and figures were generated with Chimera (52).

Modeling the Pol μ complex with the incoming dNTP and the in trans templating strand (down-dsDNA)
We modeled the complete Pol μ sequence on the template of the TdT chimera in a frozen backbone conformation, keeping intact the side chains of conserved residues (46%). We optimized the rotamers of the nonconserved residues globally using our mean field optimization algorithm implemented on our web server, http://lorentz.dynstr.pasteur.fr/pdb_hydro.php (53). No major clash was observed between the modeled side chains or with the DNA in the resulting model. The model with both the incoming dCTP (and Mg2+) and the down-dsDNA (without the third DNA strand) was inserted in a cubic box of dimensions such that the distance between the protein and the edges was at least 12 Å. The TIP3P water model was used, and Na+ ions were added to neutralize the total charge of the system. Force field parameters for dCTP were obtained with CGENFF, and the CHARMM36 force field was used for the rest of the system (54). All simulation runs were performed using NAMD (55). The package PSFGEN was used within VMD (56) to build missing atoms and create input files for NAMD. 50,000 cycles of conjugate gradient minimization were performed, and 1000 frames were collected; convergence occurred after about 15,000 cycles (Fig. S7).
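The box-size and neutralization criteria stated above reduce to simple arithmetic, sketched below for illustration. This is not the procedure used with VMD/PSFGEN; the function names, the example coordinates, and the example net charge are all hypothetical and serve only to make the two rules explicit.

```python
def cubic_box_edge(coords, padding=12.0):
    """Edge length (in Å) of a cubic box that leaves at least `padding` Å
    between the solute and every face of the box.

    `coords` is an iterable of (x, y, z) atom positions in Å.
    """
    xs, ys, zs = zip(*coords)
    extents = (max(xs) - min(xs), max(ys) - min(ys), max(zs) - min(zs))
    # The longest solute dimension sets the edge; padding is added on both sides.
    return max(extents) + 2.0 * padding


def sodium_ions_to_neutralize(net_charge: int) -> int:
    """Number of Na+ ions needed to bring an integer net charge to zero."""
    if net_charge > 0:
        raise ValueError("a positively charged system cannot be neutralized with Na+")
    return -net_charge


if __name__ == "__main__":
    # Hypothetical solute spanning 40 x 55 x 62.5 Å.
    example_coords = [(0.0, 0.0, 0.0), (40.0, 55.0, 62.5)]
    print(cubic_box_edge(example_coords))    # 86.5
    print(sodium_ions_to_neutralize(-17))    # 17
```

Using the longest solute dimension for all three edges keeps the 12 Å minimum distance along every axis, at the cost of extra solvent along the shorter ones.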