Single-stranded DNA Binding by the Helix-Hairpin-Helix Domain of XPF Protein Contributes to the Substrate Specificity of the ERCC1-XPF Protein Complex

The nucleotide excision repair protein complex ERCC1-XPF is required for incision of DNA upstream of DNA damage. Functional studies have provided insights into the binding of ERCC1-XPF to various DNA substrates. However, because no structure of an ERCC1-XPF-DNA complex has been determined, the mechanism of substrate recognition remains elusive. Here we biochemically characterize the substrate preferences of the helix-hairpin-helix (HhH) domains of XPF and ERCC1-XPF and show that binding to single-stranded DNA (ssDNA)/dsDNA junctions depends on joint binding by the DNA-binding domains of ERCC1 and XPF. We reveal that homodimeric XPF is able to bind various ssDNA sequences but with a clear preference for guanine-containing substrates. NMR titration experiments and in vitro DNA binding assays also show that, within the heterodimeric ERCC1-XPF complex, XPF specifically recognizes ssDNA. In contrast, the HhH domain of ERCC1 preferentially binds dsDNA through its hairpin region. The two separate, non-overlapping DNA-binding domains in the ERCC1-XPF heterodimer jointly bind an ssDNA/dsDNA substrate and thereby at least partially dictate the incision position during damage removal. Based on structural models, NMR titrations, DNA-binding studies, site-directed mutagenesis, charge distribution, and sequence conservation, we propose that the HhH domain of ERCC1 binds dsDNA upstream of the damage and that XPF binds the non-damaged strand within a repair bubble.

To survive, cells must be able to repair a plethora of DNA lesions. Cells therefore contain several DNA repair mechanisms, including the versatile nucleotide excision repair (NER) pathway, a conserved DNA repair machinery that can remove a wide variety of DNA lesions (1,2). Within a mammalian cell, 25–30 proteins are known to participate in two NER pathways: global genome repair and transcription-coupled repair (3–5). Mutations in NER genes lead to impaired DNA repair. Presently, a dozen mutations in distinct NER genes have been identified in patients with eight overlapping phenotypes (6,7). Most patients carrying a mutation in an NER gene present with one of two distinct clinical outcomes: sunlight-induced skin cancer or segmental progeria without cancer (8,9).
ERCC1 and XPF form a stable heterodimeric complex that is essential for NER and functions as a structure-specific DNA endonuclease that is able to perform an incision 5′ to the DNA damage (10–13). Mutations in the ERCC1 and XPF genes can be linked to sunlight-induced skin abnormalities, late onset of skin cancers, neurodegeneration, and premature aging in both human patients and mice (7–9,14). In the absence of ERCC1, only a marginal amount of XPF is present in fibroblasts and CHO cells (11,13,15–18). This suggests that the in vivo stability of full-length ERCC1-XPF depends on tight association between the two proteins. Consistent with this finding, XPF and ERCC1 knockout mice exhibit similar phenotypes (19–21). Furthermore, postnatal phenotypes of XPF and ERCC1 knockout mice suggest additional functions for ERCC1-XPF in double strand break repair (22), single strand annealing (23), interstrand cross-link repair (24,25), telomere maintenance (26,27), and gene-targeting events (28). All of these genome regulatory processes require binding of ERCC1-XPF at distinct DNA sequences, involving various protein complexes (29–31).
Biochemical and structural studies revealed that the helix-hairpin-helix (HhH) domain present in the C-terminal part of both proteins is essential for both ERCC1-XPF complex formation (11) and DNA binding (32–34). Structural studies by us and others showed that the HhH domain of the XPF protein serves as a scaffold for the correct folding of ERCC1, permitting formation of a stable heterodimer (34–36). This is further emphasized by the reduced stability of the ERCC1(F231L)-XPF complex (37); the F231L mutation leads to severe DNA repair defects and death in early infancy (38–40).
A model for the binding of ERCC1-XPF to a repair bubble was proposed previously, in which both the HhH domains of ERCC1 and XPF bind the ssDNA sequence (36). However, using NMR spectroscopy, we found that ERCC1 specifically recognizes dsDNA, probably through the hairpin sequences of its HhH domain (34). Furthermore, the C-terminal ERCC1-XPF complex binds more tightly to ssDNA-dsDNA junctions, such as bubble and splayed arm substrates, than to either dsDNA or ssDNA alone (34). This previously led us to suggest that XPF might also contain an independent DNA-binding domain. We took advantage of an earlier observation that the isolated HhH domain of XPF is able to form a highly stable homodimer (41). Although XPF lacks one residue in the second hairpin motif, this domain adopts a canonical HhH domain structure (41). Using NMR, we showed that the homodimeric XPF HhH domain indeed binds ssDNA, and we subsequently determined the solution structure of XPF bound to ssDNA (42). We showed that, besides nonspecific phosphate backbone contacts involving the second helix of the first HhH motif, a cavity is formed between the two motifs of the HhH domain in which a guanine base can be bound. These observations led us to propose that, in contrast to the model proposed by Tsodikov et al. (36), the ERCC1-XPF heterodimer recognizes DNA substrates using the two individual DNA-binding surfaces present in ERCC1 and XPF, which preferentially bind dsDNA and ssDNA, respectively (34).
Here we confirm and extend this model using in vitro DNA binding assays and NMR titration experiments, demonstrating the substrate preference of XPF and the ERCC1-XPF heterodimer for various DNA sequences. Based on these findings, we propose a model for the binding of the HhH domains of ERCC1-XPF heterodimers to DNA. In this model, the concerted binding of the HhH domains of ERCC1 and XPF to dsDNA and ssDNA, respectively, is essential for the correct positioning on the ssDNA/dsDNA junction.

Results
De Laat et al. (11) have shown that the C-terminal HhH domains of XPF and ERCC1 are indispensable for heterodimer formation and function. Similar to full-length ERCC1-XPF heterodimers, these HhH domains can together form stable complexes with various ss/ds junction-containing DNAs, such as bubble, hairpin, and splayed arm substrates (32–34). These findings suggest that structure-specific DNA binding by the ERCC1-XPF heterodimer depends on the HhH domain regions of both proteins. The ability of XPF to bind ssDNA further supports this model (42). To elucidate the contribution of the HhH domain of XPF to ERCC1-XPF substrate preference, we first determined the binding preference of the homodimeric XPF HhH domain for a variety of DNA substrates, as shown in supplemental Fig. S1.
We tested the binding of the XPF HhH domain homodimer to bubble10 (B10) because the ERCC1-XPF heterodimer can form a stable complex with this DNA sequence, as shown previously (34). Surprisingly, the XPF homodimer binds this substrate with even higher affinity (Fig. 1, A and B) than the ERCC1-XPF heterodimer (34). Quantification revealed an apparent KD of 0.5 ± 0.1 µM (Fig. 1B), corresponding to more than an order of magnitude tighter binding than found earlier for the ERCC1-XPF heterodimer (34). It should be noted, however, that the ERCC1-XPF complex dissociates during electrophoresis, as is clear from the smear observed at the highest protein concentration, suggesting faster on and off rates of the ERCC1-XPF complex for this substrate. If binding affinity were determined based on the disappearance of free DNA, both complexes would bind with micromolar affinities. Performing binding experiments in lower-salt buffer (data not shown) or using agarose gels instead of polyacrylamide gels reduces dissociation significantly (Fig. 6). Under these conditions as well, the XPF HhH domain homodimer binds B10 DNA more tightly than ERCC1-XPF.
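How an apparent KD is obtained from EMSA band quantification can be outlined with a simple 1:1 isotherm, θ = [P]/(KD + [P]), fitted by least squares. The sketch below is a minimal, dependency-free illustration and not the authors' analysis procedure. It assumes that the labeled probe is far below KD, so free protein approximates total protein. It also uses a plain log-spaced grid search in place of a nonlinear regression package. The titration values are hypothetical, noise-free points generated from the model itself, and `fraction_bound` and `fit_apparent_kd` are illustrative names.

```python
import math

def fraction_bound(protein_uM: float, kd_uM: float) -> float:
    """1:1 isotherm; assumes free protein ~ total protein (probe at trace level)."""
    return protein_uM / (kd_uM + protein_uM)

def fit_apparent_kd(protein_uM, theta, lo=1e-3, hi=1e3, steps=6001):
    """Least-squares estimate of KD by scanning a log-spaced grid of candidate values."""
    best_kd, best_sse = None, math.inf
    log_lo, log_hi = math.log10(lo), math.log10(hi)
    for i in range(steps):
        kd = 10 ** (log_lo + (log_hi - log_lo) * i / (steps - 1))
        sse = sum((fraction_bound(p, kd) - t) ** 2 for p, t in zip(protein_uM, theta))
        if sse < best_sse:
            best_kd, best_sse = kd, sse
    return best_kd

# Hypothetical titration: noise-free fractions generated with KD = 0.5 uM
conc = [0.05, 0.1, 0.25, 0.5, 1.0, 2.0, 4.0]
theta = [fraction_bound(c, 0.5) for c in conc]
print(f"apparent KD ~ {fit_apparent_kd(conc, theta):.2f} uM")
```

Real EMSA data are noisy, and, as noted above, ERCC1-XPF partly dissociates in the gel. A value fitted this way is therefore best read as an apparent constant for the given electrophoresis conditions.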
The Homodimeric HhH Domain of XPF Binds Synergistically to ssDNA
The binding preferences of XPF for various probes were evaluated using competition experiments in which an excess of non-radiolabeled competitor DNA is added to the reaction mixture. As shown in Fig. 1C, binding of XPF to radiolabeled B10 is competed by non-labeled B10, yielding the expected exponential dissociation curve. The affinity of XPF for other DNA substrates can then be determined by comparing the ability of various probes to compete for the binding of XPF to the B10 substrate.
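The sketch below shows one way to turn such competition curves into relative competition efficiencies of the kind reported in Fig. 1D. Following the description of an exponential dissociation curve, it assumes, as a simplification, that the fraction of labeled B10 still bound falls as exp(−k·c) with increasing non-labeled B10 concentration c. The constant k is fitted by log-linear least squares through the origin. A heterologous probe is then scored by how much more of it than B10 is needed to give the same inhibition. The concentrations, fractions bound, and function names are hypothetical.

```python
import math

def fit_decay_constant(b10_uM, fraction_bound):
    """Least-squares fit of ln(f) = -k * c through the origin (single-exponential model)."""
    num = sum(c * math.log(f) for c, f in zip(b10_uM, fraction_bound))
    den = sum(c * c for c in b10_uM)
    return -num / den

def relative_competition_efficiency(k, probe_uM, probe_fraction_bound):
    """Fold excess of probe over B10 needed for the same inhibition of binding."""
    if probe_fraction_bound >= 1.0:
        return math.inf  # no detectable competition at this probe concentration
    b10_equivalent_uM = -math.log(probe_fraction_bound) / k
    return probe_uM / b10_equivalent_uM

# Hypothetical homologous (non-labeled B10) competition, noise-free with k = 0.5 per uM
b10 = [0.5, 1.0, 2.0, 4.0]
f_b10 = [math.exp(-0.5 * c) for c in b10]
k = fit_decay_constant(b10, f_b10)

# Hypothetical heterologous probe: 10 uM leaves 61% of the XPF-B10 complex intact
print(round(relative_competition_efficiency(k, 10.0, 0.61), 1))
```

Returning infinity when no competition is seen mirrors the approach described for Fig. 1. In that approach, such probes are only bounded, for example as at least 500-fold less efficient, rather than assigned a value.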
Using these competition assays, we find that XPF is unable to bind 10-bp dsDNA or short ssDNA probes (10 or 20 nucleotides). In contrast, probes containing single/double strand junctions or longer ssDNA (39 nt) or dsDNA (30 bp) fragments bind the XPF homodimer (Fig. 1D). Interestingly, for the bubble and hairpin substrates, the length of the ssDNA stretch influences DNA binding: Fig. 1D shows that both hairpin 20 (H20) and B10 are better substrates than probes containing either longer or shorter ssDNA stretches. A splayed arm with two ssDNA sequences is also a better XPF substrate than any DNA sequence containing one ssDNA strand (data not shown). Taken together, these data show that the XPF HhH domain homodimer binds ssDNA. We argue that stable complex formation involves both DNA-binding surfaces of the symmetric XPF dimer, which can bind simultaneously to either one long ssDNA fragment or a conformationally restricted DNA containing two ssDNA stretches. In contrast, weaker binding is observed for short ssDNA sequences and for hairpin or bubble substrates with shorter ssDNA stretches, which may occupy only one binding site of the XPF homodimer. These data support the idea that synergistic binding of an ssDNA fragment to the two DNA-binding surfaces is required for high-affinity DNA binding by XPF homodimers.
Preference of the Homodimeric HhH Domain of XPF for G-rich ssDNA
To determine whether XPF has a sequence preference in ssDNA binding, competition experiments were performed using 20-nt homopolymeric ssDNA competitors and a 39-nt ssDNA probe. XPF binds poly(dG) most strongly, which competes as effectively as the larger 39-nt ssDNA substrate (Fig. 2, A and B), whereas poly(dT) and poly(dC) compete 10-fold less effectively. Remarkably, whereas the purine poly(dG) binds XPF well, poly(dA) binds at least 2 orders of magnitude less efficiently. Because XPF prefers binding to poly(dG), we hypothesized that XPF might recognize the telomeric hexanucleotide repeat sequence TTAGGG. However, competition experiments indicate that XPF does not bind specifically to telomere sequences (data not shown).
Two Independent DNA Binding Surfaces Contribute to Substrate Recognition by the Heterodimeric ERCC1-XPF Complex
We previously found that the ERCC1-XPF heterodimer has a preference for ss/dsDNA junction-containing substrates (34). Combined with the finding that homodimeric XPF binds preferentially to single-stranded DNA (Fig. 2), this suggests that the binding preference of XPF might also contribute to substrate recognition by the heterodimeric complex. We therefore performed SPR experiments with various DNA substrates (Fig. 3A). Addition of DNA prevented the binding of His-tagged ERCC1-XPF to the Ni2+-loaded NTA surface of the SPR chip. By fitting the response values corresponding to bound ERCC1-XPF against the concentration of these DNA substrates, we find that ERCC1-XPF binds ssDNA and dsDNA with KD values of 0.8 and 2 µM, respectively. In agreement with earlier observations, the ss/dsDNA substrate has a 10-fold higher affinity for ERCC1-XPF than dsDNA of equal length. The higher affinity for splayed arm substrates underscores the importance of ss/ds junction-containing substrates for high-affinity binding by the ERCC1-XPF complex. Using SPR, we further found that the ERCC1-XPF heterodimer prefers guanine-rich DNA fragments, similar to homodimeric XPF. The poly-dG10 fragment binds the heterodimer with a KD of 2.5 ± 0.4 µM, whereas poly-dT10 has a KD of 63 ± 14 µM (Fig. 3B and supplemental Fig. S4). The binding affinities of dC10 and dA10 for the ERCC1-XPF heterodimer were about 2- and 4-fold lower, respectively, than that of dT10. These findings demonstrate that the XPF ssDNA-binding surface is relevant for high-affinity DNA binding by the ERCC1-XPF complex.
Fig. 1 legend (fragment): … is bound to 0.02 µM B10 substrate. The relative competition efficiency is determined by quantification of the fraction bound in the presence of competitor. The ability to compete for B10 binding is compared with the competition obtained with non-labeled B10 substrate, as shown in C. F, splayed arm. Using non-linear regression methods, this curve was fitted. The competition obtained with a given amount of heterologous probe was then compared with the amount of B10 probe required to obtain the same inhibition of binding. For instance, a 10-fold relative competition efficiency means that 10 times more probe is required to obtain the same inhibition of binding as a given concentration of B10 competitor. If no competition was obtained at the highest amount of competitor, the relative competition efficiency was estimated and is shown as at least 500-fold less efficient. Mean ± S.D. of four independent experiments is presented.
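For readers who think in free energies, the roughly 25-fold difference between the poly-dT10 and poly-dG10 dissociation constants for ERCC1-XPF can be expressed as a difference in binding free energy. This back-of-the-envelope conversion rests on assumptions. It treats both as simple 1:1 interactions, assumes 25 °C (298 K), which is not stated for these SPR measurements, and ignores the quoted uncertainties (±0.4 and ±14 µM):

$$
\Delta\Delta G = RT\ln\frac{K_{D}(\mathrm{dT}_{10})}{K_{D}(\mathrm{dG}_{10})}
= (8.314\ \mathrm{J\,mol^{-1}\,K^{-1}})(298\ \mathrm{K})\,\ln\frac{63\ \mu\mathrm{M}}{2.5\ \mu\mathrm{M}}
\approx 2.48\ \mathrm{kJ\,mol^{-1}} \times 3.23 \approx 8.0\ \mathrm{kJ\,mol^{-1}}\ (\approx 1.9\ \mathrm{kcal\,mol^{-1}})
$$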
Determination of the ssDNA Binding Surface in XPF
Next, we performed NMR titration experiments to determine the DNA-binding surfaces in XPF using a 10-nt ssDNA sequence. Significant chemical shift changes for the 10-nt ssDNA sequence were only obtained under low-salt conditions (<50 mM NaCl), in agreement with the higher affinity observed in in vitro DNA binding studies under these conditions (data not shown). Note that, consistent with the weak binding determined especially for the ssDNA and dsDNA substrates, saturation could not be reached. This is clear from the significantly smaller chemical shift perturbations (CSPs) for these probes compared with the splayed arm substrate.
We first compared the previously determined amide chemical shift changes upon addition of ssDNA to homodimeric XPF (supplemental Fig. S2) with the CSPs for the ERCC1-XPF complex. In the heterodimeric complex, a similar surface of XPF, involving helix β and the following loop, is affected by addition of the 10-nt ssDNA sequence. Importantly, the previously determined DNA-binding surface of ERCC1 (supplemental Fig. S2) was not affected by addition of ssDNA (Fig. 4 and supplemental Fig. S3). In addition to amide chemical shift changes in the ¹⁵N-¹H HSQC spectra of ERCC1-XPF, the ³¹P NMR spectrum of the ssDNA also shows significant chemical shift changes upon addition of ERCC1-XPF, demonstrating complex formation (data not shown). The importance of the identified ssDNA-binding surface of XPF for DNA binding by the ERCC1-XPF complex was further demonstrated by the 3- and 2-fold decreases in binding affinity upon mutation of His857 and Lys860, respectively, to alanine (see below). These experiments clearly demonstrate that XPF binds ssDNA within the heterodimeric ERCC1-XPF complex. They also show the importance of this ssDNA-binding surface of XPF for substrate recognition.
The HhH Domain of ERCC1 Specifically Recognizes dsDNA
The above results (Figs. 1–4) suggest that ERCC1 and XPF have complementary roles in dsDNA and ssDNA recognition, respectively, which could underlie the high selectivity of ERCC1-XPF for ss/dsDNA junction substrates. To test this hypothesis, we performed NMR titration experiments with dsDNA (10 or 20 bp) and with various splayed arm probes containing the 10-nt ssDNA sequence that was used to determine the XPF-ssDNA structure. The binding surfaces for these probes were determined by following the chemical shift changes upon addition of DNA under various salt conditions (supplemental Table S1). The results are summarized in Fig. 4, which shows a representative set of ¹⁵N-¹H HSQC spectra for a few of the most affected residues (Fig. 4A). By averaging three to five independent titration experiments using various DNA sequences (supplemental Fig. S3 and Table S1), the most affected residues were identified and plotted on the surface of the ERCC1-XPF structure (Fig. 4B).
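The residue selection described above, which averages CSPs over independent titrations and keeps residues above a threshold, can be sketched as follows. The legend of Fig. 4 gives a composite average chemical shift threshold of 0.2 ppm and 0.8 ppm for the most affected residues. The text does not state how ¹H and ¹⁵N changes are combined, so the weighting used here, Δδ = √(ΔδH² + (0.2·ΔδN)²), is an assumed, commonly used convention. The residue labels and shift values are hypothetical.

```python
import math
from statistics import mean, stdev

N_WEIGHT = 0.2  # assumed 15N scaling factor; not specified in the text

def composite_csp(d_h_ppm, d_n_ppm, n_weight=N_WEIGHT):
    """Weighted combination of amide 1H and 15N shift changes (assumed form)."""
    return math.sqrt(d_h_ppm ** 2 + (n_weight * d_n_ppm) ** 2)

def classify_residues(titrations, threshold=0.2, top=0.8):
    """titrations maps residue -> [(dH, dN), ...], one pair per independent titration.
    Returns residue -> (mean CSP, SD, label) for residues above the threshold."""
    out = {}
    for res, shifts in titrations.items():
        csps = [composite_csp(dh, dn) for dh, dn in shifts]
        m = mean(csps)
        sd = stdev(csps) if len(csps) > 1 else 0.0
        if m > threshold:
            out[res] = (m, sd, "most affected" if m > top else "affected")
    return out

# Hypothetical shift changes (ppm) from three titrations per residue
hypothetical_titrations = {
    "res1": [(0.90, 2.0), (0.85, 2.5), (0.95, 1.5)],
    "res2": [(0.25, 1.0), (0.30, 0.5), (0.20, 1.5)],
    "res3": [(0.02, 0.1), (0.05, 0.2), (0.01, 0.1)],
}
results = classify_residues(hypothetical_titrations)
for res, (m, sd, label) in results.items():
    print(f"{res}: {m:.2f} +/- {sd:.2f} ppm ({label})")
```

Changing `N_WEIGHT` alters which residues cross the thresholds, so the numbers only become comparable to Fig. 4 once the actual weighting is known.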
Importantly, upon addition of dsDNA, chemical shift changes were observed on the ERCC1 surface, whereas the XPF surface remained mostly unaffected. Pronounced shifts occur mainly for residues in the second hairpin of ERCC1, whereas a few residues from the first hairpin and surrounding helices are somewhat affected (Fig. 4 and supplemental Fig. S3). The dsDNA-binding surface of ERCC1 established here is similar to that found previously using hairpin DNA (34) (supplemental Fig. S2). NMR studies using the splayed arm showed that, in addition to the dsDNA-binding surface of ERCC1, residues in XPF are also affected by the addition of this ssDNA-containing substrate. The most pronounced shifts in ERCC1 were found in the second hairpin region, including Gly276 and Gly278, whereas the first hairpin region encompassing Lys243–Thr248 was affected to a lesser extent. For XPF, regions 832–833 and 852–859 were most affected by the addition of splayed arm DNA. These experiments clearly establish that the hairpin regions of ERCC1 are involved in dsDNA binding. They also show that the previously determined ssDNA-binding surface of XPF is involved in ssDNA binding in the heterodimeric complex.
To independently show that isolated ERCC1 can bind dsDNA, we took advantage of a recent observation revealing that the ERCC1-XPF complex dissociates during SPR experiments […] (37). Subsequent addition of 30-nt ssDNA (6.25 µM) to immobilized ERCC1 did not lead to an appreciable change in mass, whereas addition of the same concentration of 30-bp dsDNA led to a significant increase in signal (Fig. 5). The on and off rates for ds30 binding to immobilized ERCC1 were determined to be (6 ± 3) × 10³ M⁻¹ s⁻¹ and (5.5 ± 0.1) × 10⁻² s⁻¹, respectively, giving a KD of 9 ± 3 µM. This relatively low affinity agrees well with the binding affinities of the ERCC1-XPF heterodimer for dsDNA observed in EMSA (34), NMR (Fig. 4), and SPR experiments (Fig. 3B). Taken together, these results indicate that the two independent DNA-binding surfaces present in ERCC1 and XPF together contribute to both the substrate specificity and the overall binding affinity of the complex.
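The equilibrium constant quoted above follows from the kinetic constants as KD = koff/kon. The sketch below checks that arithmetic and simulates the association and dissociation phases of a sensorgram under the Langmuir 1:1 model named for the fit. Only kon, koff, and the 6.25 µM analyte concentration come from the text. The Rmax, contact times, and function names are hypothetical.

```python
import math

K_ON = 6e3      # M^-1 s^-1, association rate constant reported for ds30
K_OFF = 5.5e-2  # s^-1, dissociation rate constant reported for ds30

def kd_from_rates(k_on, k_off):
    """Equilibrium dissociation constant (M) for a 1:1 interaction."""
    return k_off / k_on

def langmuir_association(t, conc_m, r_max, k_on=K_ON, k_off=K_OFF):
    """Response during analyte injection, starting from R = 0 (Langmuir 1:1 model)."""
    k_obs = k_on * conc_m + k_off
    r_eq = r_max * conc_m / (conc_m + k_off / k_on)
    return r_eq * (1.0 - math.exp(-k_obs * t))

def langmuir_dissociation(t, r0, k_off=K_OFF):
    """Response after the injection ends (buffer flow), starting from r0."""
    return r0 * math.exp(-k_off * t)

kd = kd_from_rates(K_ON, K_OFF)
print(f"KD = {kd * 1e6:.1f} uM")  # prints KD = 9.2 uM

# Hypothetical sensorgram: 6.25 uM ds30, Rmax = 100 RU, 60 s injection, 30 s buffer
r_end = langmuir_association(60.0, 6.25e-6, r_max=100.0)
r_after = langmuir_dissociation(30.0, r_end)
print(f"R(end of injection) ~ {r_end:.1f} RU; R(30 s later) ~ {r_after:.1f} RU")
```

The calculated 9.2 µM matches the reported 9 ± 3 µM. The simulated curve also shows that at 6.25 µM, below KD, the surface reaches well under half of Rmax at equilibrium.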
XPF Has a Preference for the Non-damaged Strand
Although the presence of two independent DNA-binding surfaces, which are probably occupied concurrently, permits positioning of the ERCC1-XPF heterodimer on ss/dsDNA junctions, it does not explain the polarity of the cleavage. The binding preference of XPF could define this polarity by preferentially recognizing either the non-damaged (5′ overhang) or the damaged (3′ overhang) single strand.
We performed NMR titrations using 10- or 20-bp stem substrates with the ssDNA sequence connected to either the 3′ or the 5′ end of the stem. Both probes caused chemical shift perturbations in both ERCC1 and XPF irrespective of salt concentration or stem length. These changes mainly involve the two above-mentioned ssDNA- and dsDNA-binding surfaces (Fig. 4). In agreement with the binding preference found in SPR experiments (Fig. 3B), the 5′ overhang splayed arm gives slightly more pronounced chemical shift changes than the 3′ overhang substrate. Although the overall CSPs were similar, the significantly affected residues are not identical, arguing that the two probes bind differently. Compared with the splayed arm sequence, the DNA fragment containing the 5′ ssDNA sequence binds to the ssDNA- and dsDNA-binding surfaces in a highly similar way. Residues outside these main binding surfaces also show similar CSPs (Fig. 4). This indicates that most CSPs for the splayed arm substrate arise from binding to the 5′ ssDNA extension, suggesting that the non-damaged strand is the preferred substrate for XPF.
Mutation of the ERCC1 and XPF DNA Binding Interfaces Decreases the Binding Affinity
The full-length ERCC1-XPF complex processes ss/dsDNA junctions with high selectivity (32,36,43). Above, we described DNA-binding surfaces in the XPF and ERCC1 proteins and the distinct roles of the XPF and ERCC1 HhH domains in ss/dsDNA junction recognition. To validate this model, we mutated residues that could contact DNA, either because they show large chemical shift perturbations in DNA titrations (Fig. 4) or because they are located between the two DNA-binding domains and could therefore affect binding of the DNA substrate. Proteins were expressed and purified, and concentrations were normalized by SDS-PAGE. Despite the seemingly higher abundance of XPF, attributed to differences in staining efficiency, no XPF HhH domain homodimer-DNA complex was detectable in the binding studies. The binding experiments were performed using an agarose gel instead of a polyacrylamide gel. In contrast to the conditions used before (Fig. 1 and Ref. 34), this yields a more stable complex and enabled us to determine an apparent dissociation constant of ~1 µM for wild-type ERCC1-XPF. For the XPF mutants K860A and H857A, which affect residues in direct contact with ssDNA (42), binding to B10 was significantly reduced compared with wild-type ERCC1-XPF (Fig. 6). In contrast, mutation of the hydrophobic residue Ile876 (I876A) did not affect binding significantly. Mutations outside the ssDNA-binding surface (N834A, K860A, and D871A) cause only a small decrease in affinity, whereas a 2-fold decrease in binding was noted for the Q838A/D839A double mutant.
Mutation of the positively charged ERCC1 residues surrounding the hairpin residues that show the largest CSPs (K247E, R283E, and R284E) led to highly reduced affinities (Fig. 6). Mutation of ERCC1 residues Glu261 and Gln262, both located outside the dsDNA-binding surface, led to a much smaller, 2-fold decrease in substrate binding.
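The mutant comparison in Fig. 6 reports affinities relative to wild type together with binding curves simulated from apparent dissociation constants. A minimal way to generate such curves is shown below, assuming a simple 1:1 isotherm with the probe at trace concentration. The wild-type value (~1 µM under agarose gel conditions) and the roughly 3- and 2-fold weaker binding of H857A and K860A are taken from the text. The dictionary and function names are illustrative.

```python
# Apparent KD values (uM): WT ~1 uM from the agarose-gel EMSA; ~3-fold (H857A) and
# ~2-fold (K860A) weaker binding as stated in the text. Other mutants are omitted.
APPARENT_KD_UM = {"WT": 1.0, "XPF H857A": 3.0, "XPF K860A": 2.0}

def fraction_bound(protein_uM, kd_uM):
    """Simple 1:1 isotherm; assumes the labeled probe does not deplete free protein."""
    return protein_uM / (kd_uM + protein_uM)

def simulated_curves(kds, concentrations):
    """Fraction of probe bound at each protein concentration, per construct."""
    return {name: [fraction_bound(c, kd) for c in concentrations] for name, kd in kds.items()}

def relative_affinity(kds, reference="WT"):
    """Affinity relative to the reference (1.0 = wild type; < 1 = weaker binding)."""
    return {name: kds[reference] / kd for name, kd in kds.items()}

conc = [0.1, 0.3, 1.0, 3.0, 10.0]
for name, curve in simulated_curves(APPARENT_KD_UM, conc).items():
    print(name, [round(f, 2) for f in curve])
print(relative_affinity(APPARENT_KD_UM))
```

Expressing mutants as KD(WT)/KD(mutant) keeps wild type at 1.0 and places weaker binders below it, which is one natural reading of "relative to the binding found for wild-type ERCC1-XPF".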
Together, these experiments indicate the presence of two independent, functionally distinct DNA-binding surfaces in ERCC1 and XPF that both contribute to specificity and binding affinity. Mutation of residues that, in our model, are in contact with DNA significantly affected the ability of the ERCC1-XPF complex to bind ss/dsDNA substrates, underscoring the importance of these residues in substrate recognition.
Fig. 4 legend (fragment): … Table S1). Free ERCC1-XPF spectra are shown in black, and spectra in the presence of a 4-fold excess of DNA are shown in red. For this experiment, the indicated DNA fragments (320 µM) were added to 80 µM ERCC1-XPF in a buffer containing 5 mM phosphate buffer and 100 mM NaCl. B, the mean CSP ± S.D. of three to five independent titration experiments (supplemental Fig. S3) is plotted on the surface of the ERCC1-XPF structure in two views rotated by 200°. All significantly affected residues (~25; composite average chemical shift >0.2 ppm) are colored. The most affected residues are shown in red (>0.8 ppm), and the other residues are shaded in red relative to this maximum chemical shift. Missing or ambiguous residues are depicted in gray. The positions of the most affected residues are labeled on the surface.
Fig. 5 legend (fragment): Shown is the experimental SPR curve (response units, RU) for sequential loading (black arrow) of 0.1 µM ERCC1-XPF complex onto the Ni2+-loaded NTA SPR chip. Loading is followed by dissociation of XPF, then loading (blue arrow) and dissociation of the ds30 probe, and finally loading (red arrow) of the ss30 probe. Nonspecific binding of ds30 to the chip surface was subtracted from the ds30 curve, which was then fitted with the Langmuir 1:1 binding model (inset).

Discussion
DNA damage removal requires correct positioning of the ERCC1-XPF complex with respect to the lesion. We determined the substrate preferences of the HhH domains of ERCC1 and XPF. We show that XPF binds ssDNA without strict sequence specificity but with a preference for guanine-rich sequences (Figs. 2 and 3). NMR titrations revealed that XPF, irrespective of whether it is present in a homo- or heterodimeric complex, binds ssDNA using the same surface (Fig. 4). Importantly, ssDNA probes do not bind to the dsDNA-binding surface of the HhH domains. Using various splayed arm probes, we show that the two nucleic acid-binding surfaces of ERCC1 and XPF within the heterodimeric protein can be bound concurrently. These surfaces are approximately the same interaction surfaces as those used for their preferred substrates (Fig. 4). Kinetic experiments and site-directed mutagenesis support the view that the two separate binding surfaces are required for both specificity and binding affinity.
ss/dsDNA Junction Recognition by the HhH Domains of ERCC1-XPF
Tsodikov et al. (36) suggested that both HhH domains of ERCC1 and XPF contain ssDNA-binding surfaces and that each specifically binds one of the two arms of the DNA substrate. Our NMR titration studies using hairpin 22 (34), ssDNA (42), dsDNA, and splayed arm substrates (Fig. 4 and supplemental Fig. S3) argue against such a model. We show that ssDNA is preferentially bound by XPF and not by ERCC1, whereas dsDNA substrates are specifically recognized by ERCC1 (Fig. 4). Probes containing ss/dsDNA contact both the XPF ssDNA-binding surface and the ERCC1 dsDNA-binding surface (Fig. 4). The importance of these separate DNA-binding domains for recognition of ss/dsDNA sequences was confirmed by mutagenesis (Fig. 6).
The crystal structures of Aeropyrum pernix XPF bound to DNA provide detailed insights into incision by XPF in archaea (44). Extrapolating this structural information to eukaryotic repair factors is complicated by the distinct subunit composition and the different substrate specificities of the two complexes (32,44). How the eukaryotic ERCC1-XPF complex recognizes ss/dsDNA junctions therefore remains an open question. Our previous DNA-binding studies revealed that the HhH domains of ERCC1-XPF show a substrate specificity similar to that of the native complex, suggesting that the HhH domain region is necessary and sufficient for substrate recognition (34). Using site-directed mutagenesis (Fig. 6) and NMR spectroscopy (Fig. 4), we demonstrated that a dsDNA-binding site is located near the tips of the two hairpin structures in ERCC1. This conserved DNA-binding surface is similar to that of other HhH domain proteins (45,46), including archaeal XPF (44). For the archaeal XPF homodimer, it was proposed that the HhH domains of the two protomers bind the two dsDNA sequences of a flap substrate (44). This structural homology suggests functional similarity, supporting the notion that ERCC1 recognizes dsDNA. Using the model proposed for A. pernix XPF (44), and knowing the polarity of the ERCC1-XPF heterodimer with respect to the damage, ERCC1 can be expected to bind the upstream dsDNA sequence. This would place the catalytic domain of XPF in close proximity to cleave the damaged DNA strand (44,47). This does not exclude the possibility that other regions of ERCC1-XPF or other repair proteins further contribute to substrate specificity. Indeed, we and others noted that the central domain of ERCC1 also contains an ssDNA-binding surface (47,48). In addition, XPA and replication protein A (RPA), which bind ssDNA and interact with ERCC1 (48) and XPF (49), respectively, can contribute to the correct positioning of the ERCC1-XPF complex near the damaged DNA. The presence of multiple weak DNA-binding surfaces within this DNA repair complex facilitates the correct positioning of the nuclease domain with respect to the damage and prevents inappropriate DNA binding and incision. Additional support for this model comes from a recent study by Su et al. (50) showing that mutations of the individual DNA-binding domains in full-length ERCC1 and XPF decrease cleavage efficiency both in vitro and in vivo.
Fig. 6 legend (fragment): The ERCC1-XPF-DNA complex appears as a doublet, which might be due to the presence of two ss/ds junction binding sites in the B10 probe. F, free DNA; C, protein-DNA complex. C, quantification of the binding affinity of the indicated ERCC1 (red) and XPF (blue) mutants based on at least three independent experiments, calculated as mean apparent binding affinity ± S.D. relative to the binding found for wild-type ERCC1-XPF. D, binding curves for selected mutants, obtained by plotting simulated binding curves based on the apparent dissociation constants calculated from three independent binding experiments (indicated symbols).
The XPF HhH Domain Preferentially Binds to the Non-damaged Strand
To determine which ssDNA strand within the repair bubble is bound by XPF, we modeled the ssDNA sequence into the ERCC1-XPF heterodimer structure based on the previously determined solution structure of homodimeric XPF bound to ssDNA (42). The dsDNA was positioned based on homology with the archaeal XPF-DNA structure (44,47). Assuming that the proposed models for dsDNA and ssDNA binding to ERCC1-XPF are correct, the gap between the dsDNA and the ssDNA can be bridged by connecting the dsDNA fragment to either the 5′ or the 3′ end of the ssDNA. In this model, XPF would then bind the damaged or the non-damaged strand, respectively.
If we assume that XPF binds the non-damaged strand (5′ extension), the 3′ end of the ssDNA would be connected to the dsDNA. Chemical shift changes for several residues located between these DNA-binding interfaces (e.g. Lys243 and Met856) were observed upon addition of a splayed arm substrate. In addition, the significant decrease in DNA binding by the Q838A/D839A mutant argues that this part of the protein contributes to binding. Furthermore, the overall positive charge and higher sequence conservation of this region, combined with the substrate preference (Fig. 3), argue that XPF preferentially binds the non-damaged strand (5′ overhang) (Fig. 7, top panels).
On the other hand, if the dsDNA connects to the 5′ end of the ssDNA (damaged strand), the distance to the dsDNA would be substantially larger. Furthermore, the region between these two DNA-binding surfaces is poorly conserved. Addition of splayed arm substrate causes only small chemical shift perturbations in this region, and mutagenesis here had only limited effects on binding affinity. Combined with the overall negative charge of this side of the ERCC1-XPF molecule (Fig. 7, bottom panels), this makes unlikely the model in which XPF binds a substrate with a 3′ ssDNA extension (the damaged strand in our model). Several lines of evidence together lead us to propose that XPF recognizes the non-damaged strand. These are the subtle CSP differences between the splayed arm substrate and the substrates containing one ssDNA strand (Fig. 4 and supplemental Fig. S3), the effect of mutagenesis on the ability of ERCC1-XPF to bind DNA (Fig. 6), and the charge and sequence conservation (Fig. 7). This agrees well with the previously reported binding and incision preference of full-length ERCC1-XPF in the presence of RPA (51).
Substrate Preference of XPF
In vitro DNA binding experiments demonstrated a clear preference of XPF for ssDNA substrates (Figs. 1–3). Most ssDNA sequences tested were suitable substrates for XPF, irrespective of whether XPF was present as a homodimer or as a heterodimer with ERCC1. However, XPF recognized guanine stretches more effectively than thymine or cytosine stretches, whereas adenine stretches were poor substrates (Figs. 2 and 3). Thus, DNA binding affinity, and thereby the ability to repair damaged DNA, may not be entirely sequence-independent. A few in vitro studies also suggest that cleavage is not completely sequence-independent. For example, placing an acetylaminofluorene adduct at three distinct guanines within an otherwise identical sequence led to significant variation in incision efficiency (52). Similar differences were found for benzo[a]pyrenyl-guanine lesions placed at various positions (53,54). In addition, de Laat et al. (32) presented evidence for sequence specificity by showing that splayed arm substrates with distinct sequence composition around the junction were cleaved at distinct positions in the stem. This argues that the incision position is partly dictated by the DNA sequence. Svoboda et al. (55) also found similar flanking sequence-dependent variations in cleavage position. A more recent study by Bowles et al. (43) provides further support for differences in cleavage rate depending on the stem-loop sequence, although these studies show that the DEAH helicase-like domain is critically required for these effects. Interestingly, using excision repair sequencing, Hu et al. (56) determined the sequences of excised DNA fragments on a genome-wide scale.
The position-dependent variation of the nucleotide sequences flanking the putative pyrimidine dimers provides indirect support for context-dependent differences in cleavage efficiency; the relevance of this observation for DNA repair efficiency remains to be determined. Together, these findings argue that the incision position is weakly dictated by the DNA sequence. We propose that this is related to the observed preference of XPF for G-rich sequences, involving recognition of a guanine base by XPF, as was found in the structure of the homodimeric XPF-ssDNA complex (42).
Model Describing a Role for XPF in Sequence-dependent Incision-Following damage recognition, ATP-dependent unwinding of DNA by transcription factor II H (TFIIH) (57) creates a DNA topology suitable for binding of RPA to the non-damaged strand and of XPA to the damaged DNA (58). Binding of these proteins further opens the damaged DNA and serves, through multiple interactions, as a platform for XPG and ERCC1-XPF, which subsequently perform the 3′ and 5′ incisions, respectively (12,59,60). This well orchestrated cleavage process (61-63) results in the removal of 24-32 nucleotides both in vitro and in vivo. The substantial variation in both the cleavage position relative to the damage and the length of the removed sequence (5,64) suggests some heterogeneity in the cleavage mechanism, which is underscored by flanking sequence-dependent differences in cleavage efficiency in vitro (43,52-55) and in vivo (56).
The noted preference of XPF for G-rich sequences (Figs. 2 and 3), which is consistent with the structure of the XPF-ssDNA complex (42), may dictate where the ERCC1-XPF complex binds. The presence of one or a few specific nucleotides within the accessible ssDNA sequence of the non-damaged strand can determine where cleavage will occur by positioning ERCC1-XPF on the DNA (32). We propose that both the DNA sequence-dependent differences in cleavage efficiency and the heterogeneity in the cleavage position of the ERCC1-XPF complex result from the deoxyguanosine preference of the ssDNA-binding domain of XPF.
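The proposed positioning model can be illustrated with a deliberately simplified toy calculation. This is not an analysis performed in this study: the per-base weights are hypothetical numbers chosen only to encode the qualitative order seen in the binding assays (G strongest, A weakest; treating T and C as equal is an additional assumption), and the footprint length is an arbitrary parameter. The sketch scores every window of an accessible ssDNA stretch and reports the offset that a guanine-preferring binder would favor.

```python
# Toy model only: hypothetical per-base weights encoding the qualitative
# preference order G > T ~ C > A. These are NOT measured affinities.
BASE_WEIGHTS = {"G": 3.0, "T": 2.0, "C": 2.0, "A": 1.0}


def footprint_scores(ssdna: str, footprint: int) -> list:
    """Sum the hypothetical base weights over every window of length `footprint`."""
    seq = ssdna.upper()
    if not 1 <= footprint <= len(seq):
        raise ValueError("footprint must be between 1 and the sequence length")
    return [
        sum(BASE_WEIGHTS[base] for base in seq[i:i + footprint])
        for i in range(len(seq) - footprint + 1)
    ]


def preferred_offset(ssdna: str, footprint: int) -> int:
    """Return the window start with the highest score (earliest one on ties)."""
    scores = footprint_scores(ssdna, footprint)
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # A hypothetical accessible stretch of the non-damaged strand.
    print(preferred_offset("AAAAGGGAAAA", footprint=3))  # prints 4
```

Moving the G run along the hypothetical sequence moves the preferred offset with it, which is the qualitative behavior the model invokes; real binding would also depend on the protein-DNA geometry discussed above.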

Experimental Procedures
Protein Expression and Purification-The HhH domains of the ERCC1-XPF heterodimer were expressed and purified as described before (34). Expression and purification of the homodimeric XPF HhH domain have also been described before (41). The ERCC1-XPF mutants were prepared using the QuikChange protocol (Stratagene) and expressed in the same way as the wild-type ERCC1-XPF complex (34). Because the XPF homodimer lacks tryptophan and tyrosine residues, its protein concentration was estimated from SDS-PAGE, which leads to relatively large errors, in part because of differences in Coomassie staining efficiency. Therefore, whenever applicable, we normalized protein amounts on SDS-PAGE against the heterodimeric complex, whose concentration was determined by UV absorbance.
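The quantification problem noted above follows from the Beer-Lambert law: absorbance at 280 nm reports protein concentration only through aromatic residues (and, weakly, cystines). A minimal sketch, assuming the widely used residue-based extinction coefficients below (they are not values reported in this study) and hypothetical input sequences, shows why a construct without Trp or Tyr cannot be quantified this way.

```python
# Residue-based molar extinction coefficients at 280 nm (M^-1 cm^-1).
# These widely used literature values are an assumption for this sketch.
EPS_TRP = 5500.0
EPS_TYR = 1490.0
EPS_CYSTINE = 125.0  # per disulfide-bonded cystine, not per free cysteine


def extinction_coefficient(sequence: str, n_cystines: int = 0) -> float:
    """Estimate epsilon_280 from a one-letter amino acid sequence."""
    seq = sequence.upper()
    return seq.count("W") * EPS_TRP + seq.count("Y") * EPS_TYR + n_cystines * EPS_CYSTINE


def concentration_molar(a280: float, epsilon: float, path_cm: float = 1.0) -> float:
    """Beer-Lambert law: c = A / (epsilon * l)."""
    if epsilon <= 0:
        # No Trp, Tyr, or cystine: A280 carries essentially no concentration information.
        raise ValueError("epsilon_280 is zero; use another quantification method")
    return a280 / (epsilon * path_cm)


if __name__ == "__main__":
    print(extinction_coefficient("MWY"))    # 6990.0
    print(extinction_coefficient("MAGSK"))  # 0.0, so A280 cannot be used
```

For a sequence lacking W and Y, the estimated coefficient is zero and `concentration_molar` refuses to divide by it, which mirrors why a gel-based estimate referenced to a UV-quantified protein was used instead.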
Electrophoretic Mobility Shift Assay-EMSA experiments were performed as described before (34,65) using the radiolabeled bubble 10 probe, ss39, or a Holliday junction as substrate in a buffer containing 10 mM Tris (pH 7.5), 100 mM NaCl, 10% glycerol, 1 mM DTT, and BSA (final concentration, 20 μg/ml). All oligonucleotides were purchased from Operon or Eurogentec and annealed by incubating the two mixed strands (final concentration, 50 μM) for 5 min at 95°C, followed by cooling for 1 h in a solution containing 10 mM Tris (pH 8.0) and 100 mM NaCl. For the competition experiments, the indicated amount of competitor (supplemental Fig. S1) and the radiolabeled gel-purified probe were mixed in a tube, and the protein-containing solution (~1 μM) was subsequently added to this mixture. After incubation for 30 min on ice, samples were loaded on a 0.5× Tris borate-EDTA-buffered 5% acrylamide gel, and electrophoresis was carried out for 2.5 h at 160 V at room temperature. Analysis and quantification were performed as described before (65). Alternatively, complexes were separated on a 0.5× Tris borate-EDTA-buffered 3% agarose gel for 2 h at 80 V at 4°C.
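The quantification procedure itself is given in reference 65 and is not detailed here. As a hedged illustration only, one common approach is to measure the shifted (complex) and free probe band intensities in each lane and express competition as binding relative to the no-competitor lane; the helper names and lane intensities below are hypothetical.

```python
def fraction_bound(shifted, free):
    """Fraction of labeled probe in the protein-DNA complex band(s) of one lane."""
    total = shifted + free
    if total <= 0:
        raise ValueError("no probe signal in this lane")
    return shifted / total


def residual_binding(lanes):
    """Binding per lane relative to the first lane, assumed to contain no competitor.

    lanes: list of (shifted_intensity, free_intensity) tuples in competitor order.
    """
    reference = fraction_bound(*lanes[0])
    return [fraction_bound(shifted, free) / reference for shifted, free in lanes]


if __name__ == "__main__":
    # Hypothetical band intensities (arbitrary units) for increasing competitor.
    lanes = [(600.0, 400.0), (300.0, 700.0), (60.0, 940.0)]
    print(residual_binding(lanes))
```

Normalizing to the no-competitor lane makes lanes comparable even if the absolute fraction bound varies between gels; a competitor that binds XPF well lowers the residual binding at smaller amounts.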
Surface Plasmon Resonance Measurements-SPR experiments were performed in 10 mM Hepes (pH 7.5), 50 mM NaCl, and 0.005% (w/v) Tween 20 (SPR buffer) at a flow rate of 10 μl/min at 12°C using a Biacore X system (Biacore AB) (66). The ERCC1-XPF HhH domain was transferred into SPR buffer using Zeba Desalt spin columns (Thermo Scientific). Low-binding tubes and tips were used to prevent loss of sample during incubations and dilutions. Before each experiment, 5 μl of 0.3 M Ni²⁺ was loaded on flow cell 2 of the NTA sensor chip (Biacore AB), and flow cell 1 was used as a reference surface. 50 nM ERCC1-XPF HhH domain was incubated in SPR buffer on ice for 20 min in the absence or presence of different oligonucleotides (concentrations ranging from 0.01 to 100 μM). The mixture was then injected over the NTA sensor chip, with a 60-s association phase followed by a 120-s dissociation phase. Baseline curves from flow cell 1 (without Ni²⁺) were subtracted from the flow cell 2 experimental curves using BIAevaluation 3.2 software. Between consecutive injections, the chip was regenerated with 10 μl of 0.25 M EDTA in 3.5 M guanidinium (pH 8). All experiments were performed at least in duplicate.
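The reference subtraction and the readout at the end of the 60-s loading phase can be sketched in a few lines. This is a minimal illustration, not the BIAevaluation procedure: it assumes both flow cells were sampled at the same time points and that the response at 60 s is obtained by linear interpolation between samples; the function names and example values are hypothetical.

```python
from bisect import bisect_left


def reference_subtract(flow_cell_2, flow_cell_1):
    """Subtract the reference trace (flow cell 1, no Ni2+) point by point."""
    if len(flow_cell_2) != len(flow_cell_1):
        raise ValueError("sensorgrams must be sampled at the same time points")
    return [fc2 - fc1 for fc2, fc1 in zip(flow_cell_2, flow_cell_1)]


def response_at(times, response, t_read=60.0):
    """Linearly interpolate the response at t_read seconds (times must increase)."""
    if not times[0] <= t_read <= times[-1]:
        raise ValueError("t_read lies outside the recorded sensorgram")
    i = bisect_left(times, t_read)
    if times[i] == t_read:
        return response[i]
    t0, t1 = times[i - 1], times[i]
    r0, r1 = response[i - 1], response[i]
    return r0 + (r1 - r0) * (t_read - t0) / (t1 - t0)


if __name__ == "__main__":
    t = [0.0, 30.0, 60.0, 90.0]         # s, hypothetical sampling
    fc2 = [0.0, 100.0, 180.0, 200.0]    # RU, hypothetical flow cell 2
    fc1 = [0.0, 10.0, 20.0, 25.0]       # RU, hypothetical flow cell 1
    print(response_at(t, reference_subtract(fc2, fc1)))  # 160.0
```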
Because the addition of DNA prevented binding of the His-tagged ERCC1-XPF HhH domain to the Ni-NTA surface, the relative amount of DNA-free protein (F) was determined as the response value at the end of loading at 60 s ($R_{60}$) divided by the $R_{60}$ value of the ERCC1-XPF HhH domain in the absence of DNA. To calculate the apparent dissociation constant ($K_D^{\mathrm{app}}$) for binding of each oligonucleotide to the ERCC1-XPF HhH domain, F was fitted as a function of the total oligonucleotide concentration to a simple 1:1 interaction model using GraphPad Prism: $1 - F = [\mathrm{DNA}]/(K_D^{\mathrm{app}} + [\mathrm{DNA}])$, where F represents the relative amount of unbound ERCC1-XPF and $K_D^{\mathrm{app}}$ the apparent equilibrium dissociation constant. For ERCC1 DNA binding studies, 6.25 μM ssDNA or dsDNA was loaded on the immobilized ERCC1 after XPF had been dissociated from the ERCC1-XPF complex by extensive washing with binding buffer.
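Because the model has a single free parameter, the fit can be reproduced without specialized software. The sketch below is a stand-in for the GraphPad Prism fit, not a description of how Prism performs it: it minimizes the sum of squared residuals of $1 - F = [\mathrm{DNA}]/(K_D^{\mathrm{app}} + [\mathrm{DNA}])$ by golden-section search over $\log_{10} K_D^{\mathrm{app}}$ and, like the model in the text, uses the total oligonucleotide concentration in place of free DNA. The search bounds (in μM) and the synthetic data are assumptions.

```python
import math


def fraction_free(r60_with_dna, r60_without_dna):
    """F: R60 in the presence of DNA divided by R60 of the protein alone."""
    return r60_with_dna / r60_without_dna


def fraction_bound_model(dna, kd_app):
    """Simple 1:1 model: 1 - F = [DNA] / (KD_app + [DNA])."""
    return dna / (kd_app + dna)


def fit_kd_app(dna_conc, f_free, lo=1e-3, hi=1e3, iterations=200):
    """Least-squares KD_app (units of dna_conc) by golden-section search on log10(KD)."""
    if len(dna_conc) != len(f_free):
        raise ValueError("need one F value per DNA concentration")

    def sse(log_kd):
        kd = 10.0 ** log_kd
        return sum((1.0 - f - fraction_bound_model(c, kd)) ** 2
                   for c, f in zip(dna_conc, f_free))

    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # ~0.618
    a, b = math.log10(lo), math.log10(hi)
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    for _ in range(iterations):
        if sse(c) < sse(d):
            b, d = d, c               # minimum lies in [a, d]
            c = b - inv_phi * (b - a)
        else:
            a, c = c, d               # minimum lies in [c, b]
            d = a + inv_phi * (b - a)
    return 10.0 ** ((a + b) / 2.0)


if __name__ == "__main__":
    conc = [0.01, 0.1, 0.3, 1.0, 3.0, 10.0, 30.0, 100.0]  # uM, hypothetical
    f = [1.0 - c / (2.0 + c) for c in conc]               # synthetic data, KD_app = 2 uM
    print(round(fit_kd_app(conc, f), 3))                  # ~2.0
```

Searching on a logarithmic scale suits the four-decade oligonucleotide range used here; with noise-free synthetic input the generating value is recovered to numerical precision.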
NMR Experiments-NMR titrations were followed by recording ¹⁵N-¹H HSQC spectra of ERCC1-XPF upon addition of small volumes of a concentrated solution of commercial DNA oligonucleotides (Eurogentec or Operon). The ¹⁵N-labeled ERCC1-XPF protein and unlabeled DNA were dissolved in the same buffer, containing 5-50 mM sodium phosphate and ~10-100 mM NaCl (pH 7.0). All NMR data were collected at 22°C on a Bruker DRX600 spectrometer equipped with a z-gradient triple resonance cryoprobe or on a Bruker Avance 900 spectrometer equipped with a 5-mm z-gradient triple resonance probe. A series of ¹⁵N-¹H HSQC spectra was acquired with successive additions of ssDNA, dsDNA, and splayed arm DNA substrates to 40-100 μM ¹⁵N-labeled ERCC1-XPF. The NMR data were processed and analyzed as described before (34). To compare chemical shift changes in the DNA backbone of a 10-nt ssDNA fragment (42) upon addition of protein, ¹H-decoupled 1D ³¹P spectra of the free and bound ssDNA were acquired on a Bruker DRX500 spectrometer equipped with a QXI probe.
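Chemical shift perturbations (CSPs) from HSQC titrations like these are commonly summarized per residue by combining the ¹H and ¹⁵N shift changes into a single value. The sketch below assumes the widely used weighted Euclidean form $\Delta\delta = \sqrt{\Delta\delta_H^2 + (\alpha\,\Delta\delta_N)^2}$ with a hypothetical weighting factor α = 0.2 and a mean + 1 SD significance cutoff; the analysis actually used here is the one described in reference 34, and its parameters may differ. Residue labels and shifts in the example are hypothetical.

```python
import math
import statistics


def combined_csp(d_h, d_n, alpha=0.2):
    """Weighted combined shift change in ppm; alpha = 0.2 is an assumed 15N weight."""
    return math.sqrt(d_h ** 2 + (alpha * d_n) ** 2)


def csp_per_residue(free, bound, alpha=0.2):
    """free/bound map residue label -> (1H ppm, 15N ppm); unpaired peaks are skipped."""
    return {
        res: combined_csp(bound[res][0] - free[res][0], bound[res][1] - free[res][1], alpha)
        for res in free
        if res in bound
    }


def significant_residues(csps, n_sd=1.0):
    """Residues whose CSP exceeds mean + n_sd * SD (an assumed cutoff)."""
    values = list(csps.values())
    cutoff = statistics.fmean(values) + n_sd * statistics.pstdev(values)
    return sorted(res for res, value in csps.items() if value > cutoff)


if __name__ == "__main__":
    free = {"K10": (8.0, 120.0), "G11": (8.5, 110.0), "L12": (7.9, 121.0)}
    bound = {"K10": (8.0, 120.0), "G11": (8.6, 110.5), "L12": (7.9, 121.0)}
    print(significant_residues(csp_per_residue(free, bound)))  # ['G11']
```

Down-weighting the ¹⁵N change accounts for its larger ppm range; mapping the residues that pass the cutoff onto the structure is how DNA-contacting surfaces are typically inferred from such titrations.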
Author Contributions-D. D., M. F., and G. E. F. conducted most of the NMR experiments. L. K. performed and analyzed the SPR experiments, and G. E. F. performed most DNA binding studies. R. B., R. K., and G. E. F. conceived the idea for the studies, and D. D. and M. F. wrote the paper with G. E. F.