Super hotspots and super coldspots in the repair of UV-induced DNA damage in the human genome

The formation of UV-induced DNA damage and its repair are influenced by many factors that modulate lesion formation and the accessibility of repair machinery. However, it remains unknown which genomic sites are prioritized for immediate repair after UV damage induction, and whether these prioritized sites overlap with hotspots of UV damage. We identified the super hotspots subject to the earliest repair for (6-4) pyrimidine–pyrimidone photoproduct by using the eXcision Repair-sequencing (XR-seq) method. We further identified super coldspots for (6-4) pyrimidine–pyrimidone photoproduct repair and super hotspots for cyclobutane pyrimidine dimer repair by analyzing available XR-seq time-course data. By integrating datasets of XR-seq, Damage-seq, adductSeq, and cyclobutane pyrimidine dimer-seq, we show that neither repair super hotspots nor repair super coldspots overlap hotspots of UV damage. Furthermore, we demonstrate that repair super hotspots are significantly enriched in frequently interacting regions and superenhancers. Finally, we report our discovery of an enrichment of cytosine in repair super hotspots and super coldspots. These findings suggest that local DNA features together with large-scale chromatin features contribute to the orders of magnitude variability in the rates of UV damage repair.

Nucleotide excision repair is a versatile repair pathway that removes a variety of bulky and helix-distorting lesions caused by DNA-damaging agents, such as UV, cisplatin, and benzo(a)pyrene (1,2). It has two subpathways: global repair, which repairs DNA lesions throughout the whole genome, and transcription-coupled repair (TCR), which preferentially removes DNA lesions from the transcribed strand (TS) of transcriptionally active genes (3,4). The two subpathways differ only in the damage recognition step and share the steps of dual incision bracketing the lesions, release of the excision products, repair synthesis, and ligation (5,6).
UV-induced DNA damage, if not removed efficiently, will lead to mutations and possibly carcinogenesis in humans. UV in sunlight is a known mutagen and causative agent of skin cancer (7,8), inducing DNA lesions, such as cyclobutane pyrimidine dimers (CPDs) and (6-4) pyrimidine-pyrimidone photoproducts [(6-4)PPs]. To better understand the molecular mechanisms of UV-induced mutagenesis and carcinogenesis, it is critical to identify the exact locations of DNA lesions and their repair efficiencies with single-nucleotide resolution on a genome-wide scale. With the advent of next-generation sequencing (NGS) technology, a number of NGS-based methods have been devised over the last 5 years to detect UV-induced DNA damage formation and repair across the whole genome (9), including Excision-seq (10), eXcision Repair-sequencing (XR-seq) (11), CPD-seq (12), translesion XR-seq (13), high-sensitivity Damage-seq (14), and adductSeq (15). Specifically, Damage-seq uses damage-specific immunoprecipitation and a high-fidelity DNA polymerase (which stops before the DNA damage during primer extension) to determine the exact positions of DNA damage (16); XR-seq directly measures the ongoing repair at a specific time point by isolating the excision products released during the repair for NGS (11,17), and it has been successfully applied to generate genome-wide repair maps of UV damage with single-nucleotide resolution in humans (11), Escherichia coli (18), Saccharomyces cerevisiae (19), Arabidopsis thaliana (20), mice (21), Drosophila melanogaster (22), Mycobacteria (23), and Microcebus murinus (24).
Formation and repair of UV damage are influenced by multiple factors, including transcription (11), transcription factor binding (14,25-28), post-translational modification of histones (29), nucleosome positioning (12), chromatin structure (29,30), and 3D genome architecture (31). From the perspective of 3D genome organization, UV susceptibility generally is inversely correlated with chromatin accessibility (31). At the nucleosome level, however, CPDs favor the outward-facing rotation setting in a nucleosome, and (6-4)PPs tend to form in nucleosome linker regions (12,30). This is because the outward-facing rotation setting in a nucleosome has conformational flexibility to accommodate a CPD, and such flexibility does not alter the DNA structure dramatically. In contrast, (6-4)PP formation requires greater DNA structure distortion; the nucleosome structure has no conformational flexibility for a (6-4)PP, except in linker regions. Depending on the nature of the individual transcription factor and the DNA-damaging agent, binding of a transcription factor to DNA may stimulate, inhibit, or have no effect on DNA damage formation (14,25).
For repair of UV damage, the accessibility of repair machinery plays an important role. Repair occurs earlier in open chromatin regions than in repressed regions (29), and late repair regions, such as heterochromatic regions and some transcription factor binding sites, are associated with higher mutation rates (27,29,32,33). We compared UV damage maps with repair maps and found that UV-induced DNA damage, measured with low depth of coverage, is uniformly distributed at a large-scale level and that the overall repair in the human genome is heterogeneous (14,29). A recent study reported CPD hyper hotspots located near genes in human melanocytes and fibroblasts and suggested that these hyper hotspots may drive direct physiological changes rather than cause rare mutations (15).
Despite recent progress in DNA damage formation and repair research, it is still unknown which genomic sites are prioritized for repair immediately after UV irradiation and whether those prioritized sites overlap hotspots of DNA damage. Furthermore, determining which genomic sites are subject to nucleotide excision repair at very late stages of damage removal will offer additional insight into the question.
In this study, we sought to identify these genomic sites. We performed (6-4)PP XR-seq at 1 min and 2 min after UV treatment and integrated previously published data, which include (6-4)PP XR-seq ranging from 5 min to 4 h (11,29) and CPD XR-seq as early as 12 min (22) following UV irradiation. Using these methods, we identified repair super hotspots and super coldspots for (6-4)PPs and repair super hotspots for CPDs. By comparing these repair super hotspots and super coldspots with other high-throughput sequencing datasets that measure UV damage formation, we showed that neither repair super hotspots nor super coldspots overlap hotspots of UV damage. Moreover, we demonstrated that repair super hotspots are significantly enriched in both frequently interacting regions (FIREs) and superenhancers. We also found an enrichment of cytosine in both repair super hotspots and super coldspots. Our findings suggest that both local chromatin structures (e.g., transcription factor binding and previously assembled repair machinery members in the proximity of super hotspots) and large-scale chromatin features make it feasible for DNA damage to be rapidly removed in repair super hotspots. This effective integrity maintenance at repair super hotspots may confer a selective advantage.

Profiling excision repair kinetics and UV damage formation
To identify which genomic sites are prioritized for nucleotide excision repair immediately after UV irradiation and which sites are subject to repair only at the latest stage of DNA damage removal, we designed an experimental and analytical framework to systematically investigate excision repair kinetics and UV damage formation over a time course. Removal of (6-4)PP occurs mainly through global repair and is completed within 4 h after UV irradiation (29,34,35). However, the removal of CPD requires both global repair and TCR, and the entire process takes days to complete (11,29,35). We have shown that global repair dominates CPD removal in the first 12 min after UV irradiation in normal human skin fibroblast 1 (NHF1) cells, and then at later time points, TCR also facilitates CPD removal (22). To avoid the confounding effects of transcription levels and TCR, we chose to focus on global repair of CPD and thus identified prioritized genomic sites for CPD repair in the first 12 min after UV irradiation. Figure 1A shows an outline of the experimental design we used to measure excision repair kinetics and UV damage formation. Specifically, we performed (6-4)PP XR-seq at 1 and 2 min after 20 J/m² UV treatment in NHF1 and adopted previous NHF1 XR-seq data for (6-4)PP repair at 5 min, 20 min, 1 h, 2 h, and 4 h (11,29) and CPD repair at 12 min (22). Refer to Table S1 for detailed XR-seq sample information. Damage-seq for both (6-4)PPs and CPDs at 0 min in NHF1 cells (14) was also included to determine the distribution of initial UV damage formation. Because release and degradation of excision products occur simultaneously and XR-seq does not measure the absolute number of excision products over time intervals (11,34), it is necessary to perform XR-seq as early as possible to identify genomic sites that are subject to excision repair immediately after UV treatment.
To determine the earliest time point and the optimal number of cells suitable for (6-4)PP XR-seq, we first performed an in vivo excision assay at 0 and 2 min in NHF1 cells (Fig. 1B). As shown in Figure 1B, the primary excision products, ranging from 23 to 30 nt, can be seen at 2 min, but there are no degradation products at this time point and no signal at 0 min after UV treatment.
Based on this excision assay, we performed the (6-4)PP XR-seq at 1 and 2 min to identify genomic sites subject to immediate repair after UV treatment. Analyses of the two biological replicates for (6-4)PP XR-seq show high reproducibility (Fig. S1). As expected, length distribution and nucleotide frequency for reads from (6-4)PP XR-seq (1 and 2 min) and CPD XR-seq (12 min) are in agreement with previously reported data (Fig. S2) (11). Moreover, the TS/(TS + nontranscribed strand [NTS]) repair ratios in (6-4)PP XR-seq (1 min and 4 h) are on par with that in CPD XR-seq (12 min), indicating that the vast majority of DNA damage is removed by global repair by these time benchmarks (Fig. S3) (14,22).
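The strand ratio mentioned above, TS/(TS + NTS), can be illustrated with a minimal sketch. It assumes that reads have already been assigned to the transcribed or nontranscribed strand of each gene; the function name and the counts are hypothetical and are not taken from the study.

```python
def ts_fraction(ts_reads, nts_reads):
    """Return TS / (TS + NTS); None when a gene has no reads at all."""
    total = ts_reads + nts_reads
    if total == 0:
        return None
    return ts_reads / total

# Made-up counts: a value near 0.5 means no strand preference,
# which is the pattern expected when global repair dominates.
print(ts_fraction(480, 520))  # 0.48
```

A ratio well above 0.5 would instead point to preferential repair of the transcribed strand, that is, a contribution from TCR.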
Using genome-wide repair data from XR-seq, we performed principal component (PC) analysis (36) on the top 2000 highly variable genes to generate a low-dimensional representation of the data (Fig. 1C). PC analysis is a dimension reduction technique that extracts underlying structure of the data. It finds a sequence of linear combinations of the features/genes, as PCs, which have maximal variance. The first and second PCs (shown as PC1 and PC2 in Fig. 1C) are uncorrelated so that they can be uniquely estimated. Since TCR does not significantly contribute to the repair of the majority of (6-4)PPs in NHF1 cells, the first and second PCs do not differ between the TS and NTS repair. Importantly, a reconstructed repair trajectory lines up well with the time points, suggesting that repair pattern differs over the time course (Fig. 1C).

Identification of repair super hotspots and super coldspots
We developed a computational framework to identify the early repair and late repair genomic sites by using time-course XR-seq data. Briefly, we first segmented the genome into consecutive 50-bp bins, then identified bins containing a significantly higher number of reads at early and late time points using a thresholding approach on the downsampled reads (Fig. S4). Figure 1D shows the distributions of read counts per genomic bin across all samples; we note enrichment of both early repair at 1 min and late repair at 4 h. In total, we identified 331 early repair genomic sites for (6-4)PP repair and 192 early repair genomic sites for CPD repair; we identified 105 late repair genomic sites for (6-4)PP repair (Tables S2-S4). These identified genomic sites are clusters of excision products, and we define the earliest-repair sites as repair super hotspots and the latest-repair sites as super coldspots. While this method was effective in identifying the top few hundred repair hotspots and coldspots, we also normalized and tested repair enrichment with a more rigorous Poisson log linear model (37,38) on the read count data. We found that the identified repair super hotspots and super coldspots show enriched repair levels compared with those that would be expected under the null (Fig. S5) and are scattered across the entire human genome (Fig. S6).
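The binning and thresholding steps described above can be sketched as follows. This is only an illustration under stated assumptions: reads are reduced to a (chromosome, position) pair, the downsampling depth and the `min_reads` threshold are hypothetical values, and the Poisson log linear model used for the formal test is not reproduced here.

```python
import random
from collections import Counter

BIN_SIZE = 50  # bp, the bin width stated in the text

def downsample(reads, n, seed=0):
    """Randomly keep n reads so that samples are compared at equal depth."""
    rng = random.Random(seed)
    return rng.sample(reads, n) if len(reads) > n else list(reads)

def bin_counts(reads, bin_size=BIN_SIZE):
    """Count reads per (chrom, bin index); each read is (chrom, position)."""
    return Counter((chrom, pos // bin_size) for chrom, pos in reads)

def candidate_bins(counts, min_reads):
    """Return bins whose read count reaches a fixed threshold (hypothetical)."""
    return sorted(b for b, c in counts.items() if c >= min_reads)

# Toy data: a cluster of 30 reads in one bin plus 20 scattered background reads.
reads = [("chr1", 1000 + i % 50) for i in range(30)]
reads += [("chr1", 5000 * i) for i in range(1, 21)]
counts = bin_counts(downsample(reads, 50))
print(candidate_bins(counts, min_reads=10))  # [('chr1', 20)]
```

In this toy example only the bin starting at position 1000 (bin index 20) passes the threshold, because every background bin holds a single read.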
To gain further insight into the distribution of DNA damage, repair, and epigenomic markers around the identified repair super hotspots and super coldspots, we illustrate an example of each using screenshots. As shown in Figure 2, XR-seq signals from examples of repair super hotspots and super coldspots are separated by strand and plotted across all time points. We also include epigenomic signals from DNase-seq; ChIP-seq from ENCODE (39); and Damage-seq signals at 0 min after UV treatment (14). Specifically, the XR-seq signals from an example of a super hotspot for (6-4)PP repair decrease dramatically from 1 to 20 min, and they can be barely seen at 1 h (Fig. 2A). In contrast, the XR-seq signals at a super coldspot for (6-4)PP repair, shown in Figure 2B, increase over the time course and peak at 4 h. Another representative super hotspot for CPD repair is shown at 12 min (Fig. 2C). As can be seen in Figure 2, the size of the three representative spots is in the range of 50 bp.

Neither repair super hotspots nor repair super coldspots overlap UV damage hotspots
As previously reported (12,14,29), the accessibility of repair machinery to the damage sites is a key factor affecting the repair rates of UV damage. In addition, it is reasonable to assume that genomic sites with high levels of damage are more likely to be subject to repair machinery than nearby sites with low levels of damage. A recent study assayed UV damage formation by adductSeq and freqSeq and reported a total of 157 hyper hotspots that acquired CPDs much more frequently than the genomic average in primary human fibroblasts (15). Of these CPD hyper hotspots, 83 are from the plus strand and 74 are from the minus strand, each with at least five recurrent sequence reads (15). To determine whether the identified repair super hotspots result simply from increased levels of UV damage, we first intersected the reported CPD hyper hotspots from Premi et al. (15) with the 192 CPD repair super hotspots that we identified; we found that none of our super hotspots overlap the reported CPD hyper hotspots.
To further confirm and replicate this seemingly striking result, we analyzed genome-wide CPD damage data generated by our previously developed Damage-seq protocol (14) in order to quantify damage levels at 0 time point after UV irradiation with single-nucleotide resolution. After stringent quality control procedures (refer to the Methods section for details), we identified 91 damage hotspots from the plus strand and 78 CPD damage hotspots from the minus strand, each with at least 10 mapped reads (Table S5). Notably, these CPD hotspots are shown to be enriched for heterochromatin and repressed regions (Fig. S7), which is concordant with previous reports (31,40,41). Again, none of the CPD hotspots identified from this parallel Damage-seq platform overlap the CPD repair super hotspots.
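The intersection of repair super hotspots with damage hotspots amounts to an interval-overlap check. The sketch below is a simplified illustration: it uses half-open (chrom, start, end) intervals, ignores strand, and the coordinates are invented for the example rather than taken from Tables S2-S5.

```python
def overlaps(a, b):
    """True if two half-open intervals (chrom, start, end) share >= 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def intersect(repair_sites, damage_sites):
    """Return every (repair site, damage site) pair that overlaps."""
    return [(r, d) for r in repair_sites for d in damage_sites if overlaps(r, d)]

# Toy coordinates (hypothetical): 50-bp repair bins and 1-bp damage hotspots.
repair_hotspots = [("chr1", 1000, 1050), ("chr2", 300, 350)]
damage_hotspots = [("chr1", 1050, 1051), ("chr2", 5000, 5001)]
print(intersect(repair_hotspots, damage_hotspots))  # prints []
```

An empty result, as here, corresponds to the situation reported in the text, where no repair super hotspot coincides with a damage hotspot. For genome-scale inputs a sorted sweep or a dedicated interval tool would be more efficient than this all-pairs comparison.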
We also compared the DNA damage levels for (6-4)PP and CPD from three independent sequencing technologies, namely Damage-seq (14), adductSeq (15), and CPD-seq (25), at our identified repair super hotspots and super coldspots against those from randomly sampled regions over the genome. To account for the sparse sampling when measuring DNA damage by NGS, we extended the regions corresponding to the repair super hotspots, super coldspots, and random spots at both ends by either 20 or 500 bp. Our results, shown in Figure 3, suggest that there is no significant difference in the damage levels between the three repair categories (hotspot, coldspot, and random spot). The zoom-in and zoom-out views of three examples of repair super hotspots and super coldspots in Figure 2 also indicate that the distribution of Damage-seq reads is relatively uniform in the flanking regions. Previous results have demonstrated that UV-induced DNA damage is indeed virtually uniform across the entire human genome, whereas repair is affected by a variety of factors (such as chromatin states and transcription factor binding), depending on the type of DNA damage (14). While we note that the shallow depth of coverage of Damage-seq can be a limiting factor, our results support our conclusion that the identified repair super hotspots and super coldspots are not damage formation hotspots.
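The flank extension and the comparison with random regions can be sketched as below. This is a minimal illustration, assuming damage reads are reduced to (chrom, position) pairs; the chromosome length, coordinates, and helper names are hypothetical, and the statistical comparison shown in Figure 3 is not reproduced.

```python
import random

def extend(region, flank, chrom_len):
    """Extend a (chrom, start, end) region by `flank` bp on both sides."""
    chrom, start, end = region
    return (chrom, max(0, start - flank), min(chrom_len, end + flank))

def damage_count(region, damage_positions):
    """Count damage positions (chrom, pos) that fall inside the region."""
    chrom, start, end = region
    return sum(1 for c, p in damage_positions if c == chrom and start <= p < end)

def random_regions(n, width, chrom, chrom_len, seed=0):
    """Draw n regions of equal width uniformly along one chromosome."""
    rng = random.Random(seed)
    starts = [rng.randrange(0, chrom_len - width) for _ in range(n)]
    return [(chrom, s, s + width) for s in starts]

# Toy data (hypothetical): one 50-bp site and four damage positions.
damage = [("chr1", 90), ("chr1", 120), ("chr1", 165), ("chr1", 400)]
site = ("chr1", 100, 150)
for flank in (20, 500):
    region = extend(site, flank, chrom_len=1000)
    print(flank, region, damage_count(region, damage))
```

The same counts computed over `random_regions(...)` extended by the same flank would provide the background against which hotspots and coldspots are compared.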

Repair super hotspots are enriched in FIREs and superenhancers
Early repair preferentially occurs in active and open chromatin regions because of the accessibility of repair machinery to damage sites (29,42). Moreover, replication time is correlated with chromatin accessibility (43), and higher levels of excision repair have been observed in early replicating regions (44). It is not surprising that these hundreds of repair super hotspots are enriched in open chromatin regions and early replication domains. Indeed, we found that chromatin accessibility is higher for repair super hotspots and lower for super coldspots (Fig. S8); we also found an enrichment of repair super hotspots at promoters and enhancers (Figs. S9 and S10). When we intersected the identified repair super hotspots with the segmented replication domains from human fibroblast cell line IMR90 (43,45), we found that these super hotspots are also significantly enriched in early replication domains (Fig. S11).
In the nucleus, the entire genomic DNA is hierarchically packaged to form a complex 3D genome architecture, which consists of multiscale structural units, including chromosome territories, A/B chromosomal compartments (46), topologically associating domains (TADs) (47), chromatin loops (48), long-range chromatin interactions (49), and FIREs (50). The 3D genome organization regulates a variety of cellular processes, such as transcription, DNA replication, and DNA damage formation and repair (51). It has been shown that DNA repair proteins bind at the boundary sites of chromosomally interacting domains in yeast cells, suggesting that this arrangement may promote the rapid repair of DNA damage in these regions (52). Despite recent progress in understanding UV susceptibility and repair efficiency in the context of genome architecture (31,52), it is still unknown how 3D genome organization affects the excision repair of UV damage in humans. We therefore sought to determine how this architectural feature of 3D genome organization contributes to the identified repair super hotspots and super coldspots by using the publicly available high-throughput chromosome conformation capture (Hi-C) data from the human fibroblast cell line IMR90 (53,54). Specifically, after quality control procedures and data normalization, we profiled FIREs using FIREcaller (50). After overlapping the repair super hotspots and super coldspots with the called FIREs (Table S6A), we found that a significantly higher proportion of repair super hotspots overlap FIREs (23.16% and 11.76% for (6-4)PP and CPD, respectively) compared with a genome average of 6.93% based on the profiled FIREs (Fig. 4A). Conversely, the overlapping proportion of (6-4)PP repair super coldspots is only 3.23%, significantly lower than the genome average (Fig. 4A).
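One simple way to ask whether an observed overlap proportion exceeds a genome-wide background is a one-sided binomial test. The sketch below is an assumption for illustration only: the test actually used in the study is described in its Methods section and may differ, and the counts in the example are hypothetical rather than the real numbers of sites. Only the 6.93% background is taken from the text.

```python
from math import comb

def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical example: 23 of 100 sites overlap a feature that covers
# 6.93% of genomic bins (the genome average quoted in the text).
n_sites, n_overlap, background = 100, 23, 0.0693
print(f"observed proportion: {n_overlap / n_sites:.2%}")
print(f"one-sided p-value:   {binom_upper_tail(n_overlap, n_sites, background):.3g}")
```

A small p-value indicates that the overlap is larger than expected if sites were placed independently of the feature. Depletion, as for the super coldspots, would be assessed with the lower tail instead.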
FIREs have been previously reported to be enriched for superenhancers (55). We have demonstrated that the repair super hotspots are enriched in both FIREs and enhancers. We also observed that, across many cases, multiple enhancers that overlapped the repair super hotspots are from the same genomic regions (Fig. S12). As such, we expect that the repair super hotspots are also enriched in superenhancers, and we therefore adopted a list of previously annotated superenhancers in the human fibroblasts (56) (Table S6B). We found that, compared with a genome-wide average of 2.05%, the repair super hotspots are indeed enriched in superenhancers (5.14% and 4.69% for (6-4)PP and CPD repair hotspots, respectively), whereas none of the repair super coldspots overlap superenhancers (0% for (6-4)PP repair super coldspot) (Fig. 4B).
In addition, we detected significant interactions based on the Hi-C contact matrix using the Fit-Hi-C method (57) (Table S6C) and showed that repair super hotspots also overlap with a significantly higher number of significant interactions (Fig. 4C). Refer to the Methods section for details on data analysis. The overlapping information of the called repair super hotspots and super coldspots with the profiled FIREs, superenhancers, and significant chromatin interactions is included in Table S7. Figure 4D illustrates the loop interactions of two identified repair super hotspots. Notably, these two hotspots also overlap with both FIREs and superenhancers. Collectively, these results provide a global picture of genetic regulation of repair kinetics via 3D genome organization.

Enrichment of cytosine in repair super hotspots and super coldspots
As mentioned, large-scale chromatin features such as replication timing and FIREs affect UV damage formation and repair. Local chromatin structure (e.g., nucleosome and transcription factor binding) can also influence the distribution of UV damage formation and repair efficiency (30). Since the general size of our identified repair super hotspots and super coldspots is around 50 bp, we investigated the role of both local chromatin structure and large-scale chromatin features in these repair super hotspots and super coldspots.
To gain insight into how local chromatin structure contributes to the repair super hotspots and super coldspots, we performed sequence context analysis by using all reads mapped to the repair super hotspots and super coldspots, respectively. We trimmed the reads to 15 bp long, centering at the damage sites, and calculated strand-specific nucleotide frequencies in repair super hotspots, super coldspots, and randomly chosen spots. Interestingly, we identified an enrichment of cytosine in the flanking regions of the damage sites for both repair super hotspots and super coldspots (Fig. 5). We compared the cytosine frequency for repair super hotspots and super coldspots with that for the genomic bins used in this study. As shown in Fig. S13, the percentage of cytosine in both repair super hotspots and super coldspots is largely higher than that in the whole-genome regions.
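The sequence context analysis above can be sketched as trimming each read to a 15-nt window and tabulating per-position nucleotide frequencies. In this illustration the index of the damage site within each read is supplied by the caller, which is an assumption; how that position is determined for real XR-seq reads is not shown, and the sequences are made up.

```python
from collections import Counter

def trim_centered(read, center, width=15):
    """Return the `width`-nt window centered on index `center`, or None
    if the window would run past either end of the read."""
    half = width // 2
    start, end = center - half, center + half + 1
    if start < 0 or end > len(read):
        return None
    return read[start:end]

def position_frequencies(windows):
    """Per-position frequency of A, C, G, T across equal-length windows."""
    n = len(windows)
    freqs = []
    for i in range(len(windows[0])):
        counts = Counter(w[i] for w in windows)
        freqs.append({base: counts.get(base, 0) / n for base in "ACGT"})
    return freqs

# Toy reads (hypothetical) with an assumed damage index of 10 in each.
reads = ["ACGTACGTACGTACGTACGT", "CCCCCCCCCCTCCCCCCCCC"]
windows = [w for w in (trim_centered(r, 10) for r in reads) if w is not None]
cytosine_by_position = [f["C"] for f in position_frequencies(windows)]
print(cytosine_by_position)
```

Comparing such profiles between hotspots, coldspots, and randomly chosen spots is what would reveal an enrichment of cytosine in the positions flanking the damage site.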
Motif analysis by the MEME suite (58) confirmed the enrichment of cytosine adjacent to the damage sites, which are themselves enriched with canonical sequences of CTCA for (6-4)PP and TT for CPD (Table 1) (11). The predicted biological functions of these motifs include transcription-associated activities, such as regulation of transcription, sequence-specific DNA binding, transcription activator activity, and transcription factor activity. It has been previously shown that transcription factor binding can stimulate, inhibit, or produce no change on DNA damage formation depending on the nature of transcription factor and DNA-damaging agent (14,59). Likewise, transcription factor binding can decrease or increase local repair activity and consequently affect the mutation rate in these binding regions (25,27,28,33). Transcription factors (e.g., Ets-1) have also been shown to interact with DNA repair machinery in vivo (60,61). Thus, the identified repair super hotspots may be attributed to both local chromatin structures (e.g., transcription factor binding or partially assembled repair machinery in the proximity of hotspots) and large-scale chromatin features (such as TADs and FIREs). The rapid removal of DNA damage in repair super hotspots in critical regions of the genome may aid cellular survival.

Discussion
Chromatin features affect the distribution of DNA damage formation, repair efficiency, and subsequent mutational landscape (32,33,62). Knowing the exact genomic sites where the earliest and latest repair occur is critical for our understanding of the heterogeneity of DNA repair and mutation rate. We identified hundreds of repair super hotspots and super coldspots of UV damage that do not overlap with previously reported hotspots of UV damage and found that the repair super hotspots are enriched in FIREs (one of the features of 3D genome organization) and superenhancers. Furthermore, we discovered an enrichment of cytosine in areas flanking the damage sites in both the repair super hotspots and super coldspots. This unique sequence context might be the target DNA sequence for binding of transcription factors that increase or decrease the damage formation and repair activity. The aforementioned local chromatin structures, as well as large-scale chromatin features, may therefore explain the formation of repair super hotspots and super coldspots.
Deciphering the interplay between DNA damage formation and repair efficiency is also crucial for the study of mutation distribution. Although a variety of NGS-based methods have been developed to detect DNA damage formation and repair over the last 5 years (9), XR-seq, to the best of our knowledge, is the only method that can be used to determine the genome-wide repair super hotspots and super coldspots. In comparison with other approaches that measure repair indirectly by subtracting two large percentages of damage, XR-seq directly detects excision repair events with virtually no background noise. In this study, we managed to perform XR-seq as early as 1 min after UV irradiation, making it possible to detect the earliest repaired genomic sites; these sites could not be identified even at a 5-min time point in our previous study (29). With respect to the distribution of DNA damage formation, our previous results showed a uniform damage distribution pattern across the whole genome (14). However, because of the low coverage depth of Damage-seq, the uniform pattern can be observed only at a large scale (e.g., megabase), not at a small scale (e.g., kilobase). Sequencing depth and scale are two important factors that we must consider when we interpret the distribution pattern of DNA damage. The damage formation hotspots used in this study may be underrepresented because of the low coverage depth, despite the computational approaches applied to identify hotspots of DNA damage formation using multiomics datasets. How and why cells prioritize these super hotspots for rapid repair but leave damage in super coldspots until the repair is almost complete is not completely understood at present. Here, our findings suggest that both large-scale chromatin features and local chromatin structures may determine the order of repair.
This triage-like mechanism would allow cells to prioritize DNA damage removal based on their location in the genome. This type of triage takes place in other contexts. For example, it has been shown that the lamina-associated heterochromatin is more vulnerable than active euchromatin (31), and DNA repair factors (e.g., BRCA1) were discovered to bind to highly interacting regions within chromosomes (63). In response to ionizing radiation, cells exhibit an increased segregation of TADs, which may play a protective role against DNA damage (64). We found that repair super hotspots are enriched in FIREs (23.26% and 11.76% for (6-4)PP and CPD, respectively), whereas only 3.23% of (6-4)PP repair super coldspots overlap FIREs. The rapid removal of DNA damage in these active genomic regions such as FIREs may aid in cellular survival.
There may be several ways in which local chromatin structure contributes to the origin of repair super hotspots and super coldspots, including the binding of transcription factor, DNA sequence context, and the presence of partially preassembled DNA repair machinery. Some transcription factors (e.g., tryptophan cluster factors) have a stimulatory effect on UV damage formation (25,28), whereas other factors (e.g., SP1) have an inhibitory effect (14). This may be explained by the different levels of DNA conformational changes caused by the binding of different transcription factors; these changes may make the local DNA sequence more or less vulnerable to DNA-damaging agents (65).
In addition to the effect of transcription factor binding, the local sequence context itself, which shows cytosine enrichment in the areas flanking the damage sites, may be vulnerable to UV damage. Although we found that the observed repair super hotspots and super coldspots are not DNA damage hotspots, they can register high damage levels upon UV irradiation. The presence of preassembled DNA repair machinery at specific genomic regions likely also promotes rapid repair of DNA damage in these regions. Indeed, global repair complex was found to bind at chromosomally interacting domain boundaries in the absence of DNA damage; this preassembled DNA repair complex will initiate efficient repair at these regions (52). Moreover, most excision repair proteins are known to be involved in other genomic transactions: Transcription factor II H (TFIIH), an important protein complex, is both a general transcription factor for RNA polymerase II and an essential component of nucleotide excision repair complex (66); XPC, in complex with RAD23B, also functions in transcription (67); and XPG and XPF, two nucleases for dual incision in nucleotide excision repair, are also required to form chromatin looping through recruitment of the CCCTC-binding factor (68). In addition, it has been reported that ETS1 interacts with DNA-dependent protein kinase and poly(ADP-ribose) polymerase-1 (60,69). Given this evidence, it is reasonable to propose the following model for a mechanism underlying the origin of repair super hotspots identified in this study: In active chromatin regions, stimulatory transcription factors bind to the repair super hotspots with unique sequence contexts that are vulnerable to UV radiation. Meanwhile, their interacting partners, the excision repair machinery, are positioned in close proximity to the super hotspots. Upon UV irradiation, higher levels of DNA damage are produced in repair super hotspots than in their adjacent regions. Immediately, the preassembled excision repair machinery will recognize and remove the damage through nucleotide excision repair. In this way, cells protect gene expression and survive the external stress of DNA damage. Conversely, in the case of repair super coldspots, the inaccessibility of the chromatin region and the sequence context's vulnerability to UV may explain why damage at these sites is not removed until repair of the entire genome is almost complete.
Table 1. Motif analysis of identified repair super hotspots and super coldspots
Collectively, our results identify repair super hotspots and super coldspots of UV damage in the human genome, which may be attributed to large-scale chromatin features and local chromatin structures. We believe that the methodology and data presented in this article will aid in future research on DNA damage, repair, mutagenesis, and carcinogenesis.

Experimental procedures
Cell culture and UV irradiation
Human NHF1 cells were obtained from W. K. Kaufmann (University of North Carolina, Chapel Hill) (70) and cultured in Dulbecco's modified Eagle's medium with 10% fetal bovine serum at 37 °C in a 5% CO2 humidified chamber. For (6-4)PP XR-seq at the 1 and 2 min time points, UV irradiation was performed as previously described (11,13). Briefly, 80% confluent NHF1 cells in one Petri dish were irradiated for 20 s under a 250 nm UV lamp (1 J/m²/s) after removal of the culture medium. Dulbecco's modified Eagle's medium with 10% fetal bovine serum at 37 °C was immediately added to the Petri dish; the medium was then poured off, and the Petri dish was promptly put on ice at 1 min or 2 min after UV irradiation. Time was counted from the end of the 20 s UV irradiation to the moment the Petri dish was put on ice. The cells were washed once with ice-cold PBS before being harvested with a cell scraper in 10 ml of ice-cold PBS. In each replicate of the (6-4)PP XR-seq experiment, 50 and 30 Petri dishes (150 × 15 mm) containing NHF1 cells were treated one by one at the 1 and 2 min time points, respectively. Cell culture, UV treatment, and library preparation for (6-4)PP XR-seq at 5 min, 20 min, 1 h, 2 h, and 4 h and for CPD XR-seq at 12 min were performed in previous studies (11,22,29). For the in vivo excision assay, UV irradiation was performed as described above, and 10 and 5 Petri dishes (150 × 15 mm) containing NHF1 cells were used at the 0 and 2 min time points, respectively.

Excision assay
The in vivo excision assay was performed as described (17,34). Following UV irradiation, the excision products were isolated by gentle cell lysis and nonchromatin fraction separation and purified by TFIIH immunoprecipitation. The purified excision products were then 3' radiolabeled by terminal deoxynucleotidyl transferase and [α-32P]-3'-dATP and resolved in a 10% denaturing acrylamide gel. Ten and five Petri dishes (150 × 15 mm) of NHF1 cells were used at 0 and 2 min, respectively.

XR-seq library preparation and sequencing
XR-seq libraries were prepared as described in the previous protocol (17). Briefly, the excision products were isolated by TFIIH immunoprecipitation following gentle cell lysis and nonchromatin fraction separation and ligated with adaptors. The ligated excision products were then further purified by immunoprecipitation with anti-(6-4)PP antibody and repaired by (6-4)PP photolyase before library amplification by PCR. Libraries were sequenced on an Illumina HiSeq 4000 platform.

Data collection
(6-4)PP XR-seq data at 5 min, 20 min, 1 h, 2 h, and 4 h were downloaded from the Gene Expression Omnibus (GEO) with accession numbers GSE67941 (11) and GSE76391 (29). CPD XR-seq data at 12 min were downloaded from GEO with accession number GSE138846 (22). CPD and (6-4)PP damage data of NHF1 by Damage-seq were downloaded from GEO with accession number GSE98025 (14); CPD damage data of NHF1 by CPD-seq were downloaded from GEO with accession numbers GSM2772322 and GSM2772323 (25); CPD damage data of human primary fibroblasts by adductSeq were downloaded from GEO with accession numbers GSM4073616 and GSM4073634 (15). Hyper hotspots for UV-induced CPD damage in primary human fibroblasts were downloaded from the study by Premi et al. (15).

XR-seq bioinformatics preprocessing
For XR-seq, Cutadapt (72) was used to trim reads with adaptor sequence TGGAATTCTCGGGTGCCAAGGAACTCCAGTNNNNNNACGATCTCGTATGCCGTCTTCTGCTTG at the 3' end and to discard untrimmed reads. Burrows-Wheeler Aligner (73) was used for alignment of single-end short reads. Unmapped reads and reads that map to multiple locations with the same alignment quality were removed using Samtools (bioinformatics tools written in C for manipulating NGS data) (74). Postalignment filtering steps were carried out using Rsamtools (http://bioconductor.org/packages/Rsamtools/). Specifically, if multiple reads shared the same 5' and 3' end coordinates, we kept only one of them as a deduplication step. We also kept only reads with mapping quality greater than 20 and lengths of 21 to 31 bp.
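The postalignment filters described above can be sketched in a few lines of Python. This is an illustration only, not the authors' Rsamtools code: the `Read` record and `filter_reads` function are hypothetical names, read length is taken as `end - start` (which assumes ungapped alignments), strand is included in the deduplication key as an assumption, and the quality and length filters are applied before deduplication, an ordering the text does not specify.

```python
from typing import Iterable, List, NamedTuple


class Read(NamedTuple):
    """Hypothetical minimal alignment record (not an Rsamtools type)."""
    chrom: str
    start: int   # leftmost aligned position, 0-based
    end: int     # one past the rightmost aligned position
    strand: str  # "+" or "-"
    mapq: int    # mapping quality


def filter_reads(reads: Iterable[Read], mapq_cutoff: int = 20,
                 min_len: int = 21, max_len: int = 31) -> List[Read]:
    """Keep reads with MAPQ > cutoff and length in [min_len, max_len],
    retaining one read per (chrom, start, end, strand)."""
    seen = set()
    kept = []
    for read in reads:
        if read.mapq <= mapq_cutoff:          # "greater than 20" is strict
            continue
        if not min_len <= read.end - read.start <= max_len:
            continue
        key = (read.chrom, read.start, read.end, read.strand)
        if key in seen:                       # same 5' and 3' ends: duplicate
            continue
        seen.add(key)
        kept.append(read)
    return kept
```

In practice the same logic would run on records parsed from a BAM file; the in-memory tuple form is used here only to keep the example self-contained.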

Gene-level quantification of excision repair
Reads from the TS and NTS strands were separated using known gene annotations for the human genome assembly hg19 by ENSEMBL. We used reads per kilobase per million mapped reads for within-sample normalization of the XR-seq data. To perform gene-level quantification and downstream analysis including segmented regression, we adopted a stringent quality control procedure and retained only genes that (i) had at least ten TT or TC dinucleotides from either TS or NTS; (ii) were less than 300 kb; and (iii) had at least ten reads in total across all XR-seq samples. In addition, we took the ratio of TS reads to total reads (TS/[TS + NTS]) to remove biases and artifacts that are shared between the two DNA strands, such as library size, gene length, sequencing bias, and antibody pull-down efficiency. The ratio is bounded between 0 and 1 and sheds light on how TCR and global repair interplay (Fig. S3).
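As a small worked example of the two quantities above, the following Python sketch computes reads per kilobase per million mapped reads and the TS/(TS + NTS) ratio. The function names `rpkm` and `ts_fraction` are hypothetical, and returning `None` for a gene with no reads is an assumption made for the example. The point it illustrates is that gene length and library size appear in both strands' normalized values and therefore cancel in the ratio.

```python
import math


def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of gene per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def ts_fraction(ts_reads: float, nts_reads: float):
    """TS / (TS + NTS); None when the gene has no reads on either strand."""
    total = ts_reads + nts_reads
    if total == 0:
        return None
    return ts_reads / total


# Gene length and library size cancel, so the ratio is the same whether it
# is computed from raw counts or from RPKM values.
gene_length, library_size = 25_000, 7_700_000
raw = ts_fraction(30, 10)
normalized = ts_fraction(rpkm(30, gene_length, library_size),
                         rpkm(10, gene_length, library_size))
assert math.isclose(raw, normalized)
```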

Identification of repair super hotspots and super coldspots
We started by segmenting the human reference genome into consecutive 50-bp bins. We then calculated the observed depth of coverage per bin by XR-seq, separating the plus-strand reads (+) and minus-strand reads (−). To mitigate the effect of library size/sequencing depth, we downsampled the reads in each sample to 7.7 million without replacement. To identify repair super hotspots and super coldspots, we set a threshold on the number of read counts per genomic bin in the 1 min and 4 h samples. Specifically, to identify (6-4)PP repair super hotspots, we required at least 15 reads mapped in both replicates at 1 min and at most five reads mapped in both replicates at 4 h. The read count threshold was relaxed for the identification of super coldspots, which are fewer in number than the super hotspots. For CPD repair, to avoid complications because of TCR at later time points, we focused on CPD repair super hotspots only.
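The thresholding step for (6-4)PP repair super hotspots can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each read is reduced to a single representative position (the text does not say which position assigns a read to a bin), and the names `downsample`, `bin_counts`, and `super_hotspots` are hypothetical. The relaxed thresholds for super coldspots are not given in the text, so they are not encoded here.

```python
import random
from collections import Counter

BIN_SIZE = 50  # bp, as in the text


def downsample(reads, n, seed=0):
    """Draw n reads without replacement (all reads if fewer than n)."""
    reads = list(reads)
    if len(reads) <= n:
        return reads
    return random.Random(seed).sample(reads, n)


def bin_counts(reads):
    """Count reads per (chrom, bin index, strand).

    Each read is (chrom, position, strand); using one representative
    position per read is an assumption of this sketch.
    """
    return Counter((chrom, pos // BIN_SIZE, strand) for chrom, pos, strand in reads)


def super_hotspots(early_reps, late_reps, min_early=15, max_late=5):
    """Bins with >= min_early reads in every early replicate (1 min) and
    <= max_late reads in every late replicate (4 h)."""
    candidates = set.intersection(*(
        {b for b, c in rep.items() if c >= min_early} for rep in early_reps
    ))
    return sorted(
        b for b in candidates
        if all(rep.get(b, 0) <= max_late for rep in late_reps)
    )
```

With the 7.7 million read depth from the text, each sample would be passed through `downsample(reads, 7_700_000)` before `bin_counts`, so that thresholds are compared at equal depth.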
In addition to the thresholding approach, we adopted a more rigorous cross-sample Poisson log-linear model (37,38) for data normalization. Specifically, we denote Y as the observed repair matrix, with row i corresponding to the ith genomic bin and column j corresponding to the jth sample.
The "null" model, which reflects the expected coverage when there is no biologically relevant repair enrichment, is

$$Y_{ij} \sim \mathrm{Poisson}\left(N_j \, \beta_i \, f_j(\mathrm{TC}_i)\right),$$

where $N_j$ is the total number of mapped reads for sample j (fixed for downsampled data), $\beta_i$ reflects the bin-specific bias because of library preparation and sequencing bias, and $f_j(\mathrm{TC}_i)$ is the sample-specific bias because of TC (thymine and cytosine) content for damage/repair. The goal of fitting the null model to the data is to estimate the various sources of biases, which can then be used for normalization. We adopted a robust iterative maximum-likelihood algorithm (38) for estimating the parameters of the null model. Plus and minus strands were analyzed separately. In a first pass of the calling algorithms, we identified strong repair super hotspots in pericentromeric regions, which were collapsed repeats annotated as unique sequences in the reference genome (e.g., ribosomal DNA (21)). It is important to exclude artifacts as stringently as possible, and thus we undertook an additional quality control step. "Blacklist" bins, including segmental duplication regions (http://humanparalogy.gs.washington.edu/build37/data/GRCh37GenomicSuperDup.tab), gaps in the reference assembly from telomere, centromere, and/or heterochromatin regions (https://gist.github.com/leipzig/6123703), and repetitive elements annotated by RepeatMasker (https://genome.ucsc.edu/cgi-bin/hgTrackUi?g=rmsk), were masked in downstream analysis.
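The blacklist masking step can be illustrated with a short Python sketch. It is a simplified example for a single chromosome and strand, not the authors' code: intervals are assumed to be half-open `(start, end)` pairs, a bin is masked if it overlaps any blacklisted interval by at least one base (the overlap rule is an assumption), and the names `merge_intervals` and `mask_bins` are hypothetical.

```python
from bisect import bisect_right

BIN_SIZE = 50  # bp, matching the genomic bins used for hotspot calling


def merge_intervals(intervals):
    """Merge overlapping or touching half-open intervals [start, end)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def mask_bins(bin_indices, blacklist):
    """Return the bins that do not overlap any blacklisted interval."""
    merged = merge_intervals(blacklist)
    starts = [start for start, _ in merged]
    kept = []
    for index in bin_indices:
        bin_start = index * BIN_SIZE
        bin_end = bin_start + BIN_SIZE
        # Last merged interval that starts before the bin ends; because the
        # merged intervals are disjoint, it is the only possible overlap.
        i = bisect_right(starts, bin_end - 1) - 1
        if i >= 0 and merged[i][1] > bin_start:
            continue  # bin overlaps a blacklisted region
        kept.append(index)
    return kept
```

Merging the segmental duplication, assembly gap, and RepeatMasker intervals first keeps each lookup to a single binary search, which matters when every 50-bp bin in the genome is checked.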

Hi-C data analysis
We adopted the Hi-C data of human fibroblast cell line IMR90 (53,54) to investigate the relationship between identified repair super hotspots and the 3D genome organization. We took the raw contact matrix with 40 kb resolution as input and detected FIREs, which play important roles in transcriptional regulation, across the entire genome using FIREcaller (50). To further investigate whether these repair super hotspots are involved in functional chromatin looping between regulatory elements and their target genes, we adopted the Fit-Hi-C approach (57) to identify long-range chromatin interactions on all 40 kb bin pairs within a maximal 3 Mb region. Interactions with p value <2.31e-11 were considered statistically significant (75).
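The selection of significant interactions and their lookup for a given hotspot can be sketched as below. This is an illustration under assumptions, not a Fit-Hi-C interface: the input is taken to be plain tuples of `(chrom, bin_start_1, bin_start_2, p_value)`, the "maximal 3 Mb region" is interpreted as a bin-start distance of at most 3 Mb, and the function names are hypothetical.

```python
BIN_SIZE = 40_000      # Hi-C resolution from the text
MAX_SPAN = 3_000_000   # maximal distance between paired bins (assumed reading)
P_CUTOFF = 2.31e-11    # significance threshold from the text


def significant_pairs(records):
    """Keep (chrom, bin_start_1, bin_start_2) for bin pairs within MAX_SPAN
    whose p value is below P_CUTOFF."""
    return [
        (chrom, a, b)
        for chrom, a, b, p in records
        if abs(b - a) <= MAX_SPAN and p < P_CUTOFF
    ]


def partners(chrom, position, pairs):
    """Bins that significantly interact with the 40 kb bin containing position."""
    bin_start = (position // BIN_SIZE) * BIN_SIZE
    found = set()
    for c, a, b in pairs:
        if c != chrom:
            continue
        if a == bin_start:
            found.add(b)
        elif b == bin_start:
            found.add(a)
    return sorted(found)
```

A repair super hotspot at a given coordinate could then be passed to `partners` to list the 40 kb bins it is linked to by significant long-range contacts.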

Data and code availability
The data reported in this article have been deposited in GEO with accession number GSE148303, and all remaining data are contained within the article. Scripts used in this article are available at https://github.com/yuchaojiang/damage_repair.

Acknowledgments-We thank the Sancar Lab members and Drs Sheera Adar, Ming Hu, and Jeremy Simon for useful comments and feedback.

Conflict of interest-The authors declare that they have no conflicts of interest with the contents of this article.