Mycobacteria excise DNA damage in 12- or 13-nucleotide-long oligomers by prokaryotic-type dual incisions and perform transcription-coupled repair

In nucleotide excision repair, bulky DNA lesions such as UV-induced cyclobutane pyrimidine dimers are removed from the genome by concerted dual incisions bracketing the lesion, followed by gap filling and ligation. So far, two dual-incision patterns have been discovered: the prokaryotic type, which removes the damage in 11- to 13-nucleotide-long oligomers, and the eukaryotic type, which removes the damage in 24- to 32-nucleotide-long oligomers. However, a recent study reported that the UvrC protein of Mycobacterium tuberculosis removes damage in a manner analogous to yeast and humans, in a 25-mer oligonucleotide arising from incisions 15 nt 3´ and 9 nt 5´ to the damage. To test this model, we used the in vivo excision assay and the excision repair sequencing (XR-seq) genome-wide repair mapping method developed in our laboratory to determine the repair pattern and genome-wide repair map of Mycobacterium smegmatis. We find that M. smegmatis, which possesses homologs of the Escherichia coli uvrA, uvrB, and uvrC genes, removes cyclobutane pyrimidine dimers from the genome in a manner identical to the prokaryotic pattern, by incising 7 nt 5´ and 3 or 4 nt 3´ to the photoproduct, and performs transcription-coupled repair in a manner similar to E. coli.

Nucleotide excision repair (excision repair) is a nearly universal DNA repair mechanism (1-5) that removes bulky lesions from DNA, including UV-induced cyclobutane pyrimidine dimers (CPDs) and (6-4) photoproducts, by concerted dual incisions bracketing the lesion (6). Although the concerted dual-incision mechanism is conserved from bacteria to humans (7), the proteins that make the dual incisions in prokaryotes and eukaryotes are not evolutionarily related, and the dual-incision patterns also differ. In prokaryotes, the products of three genes, uvrA, uvrB, and uvrC, acting in concert, incise the damaged DNA 7 nt 5´ and 3 or 4 nt 3´ to the damage (such as a CPD), generating an excised fragment of 12 or 13 nt (1,4,6). In eukaryotes, 6 repair factors comprising 16 proteins incise 20 nt 5´ and 5 nt 3´ to the damage, removing it in the form of a 27-nt-long oligomer (7). Whereas prokaryotic excision products fall in a relatively narrow range of 11-13 nt, eukaryotic incision sites show some variability: in humans the excision products range in size from 24 to 32 nt (median, 26-27 nt), and in yeast the median is 24 nt.
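The product lengths quoted above follow from simple addition: the nucleotides between the 5´ incision and the lesion, plus the nucleotides spanned by the lesion, plus those between the lesion and the 3´ incision. The minimal Python sketch below makes this explicit. It assumes that "N nt 5´" counts the intervening nucleotides and that a CPD spans 2 nt; the function name `excision_lengths` is illustrative only.

```python
from typing import Iterable, List


def excision_lengths(n5: Iterable[int], n3: Iterable[int], lesion_nt: int = 2) -> List[int]:
    """Possible excision-product lengths for a set of incision offsets.

    n5: nucleotides between the 5' incision and the lesion
    n3: nucleotides between the lesion and the 3' incision
    lesion_nt: nucleotides spanned by the lesion (assumed 2 for a CPD)
    """
    n3 = list(n3)  # materialize so it can be iterated once per 5' offset
    return sorted({a + lesion_nt + b for a in n5 for b in n3})


# Prokaryotic pattern: 7 nt 5' and 3 or 4 nt 3' to a CPD
print(excision_lengths([7], [3, 4]))  # [12, 13]
# Eukaryotic pattern quoted in the text: 20 nt 5' and 5 nt 3'
print(excision_lengths([20], [5]))    # [27]
```

When the offsets vary on both sides, the same sum gives the spread of product sizes; for example, 5-6 nt 5´ and 3-4 nt 3´ yield 10-12 nt.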
Although the dual-incision/excision pattern has been shown in numerous eukaryotes, from humans to Drosophila, Arabidopsis, and yeast (5), the study of excision repair in prokaryotes has been largely confined to Escherichia coli and some thermophilic bacteria. Of the third domain of life, Archaea, only excision in Methanobacterium thermoautotrophicum has been analyzed, and this species was found to perform prokaryotic-type dual incisions, removing CPDs in the form of an 11-mer (8). Phylogenetic analysis of Archaea has shown that UvrA, -B, and -C homologs are found in all Euryarchaeota, such as M. thermoautotrophicum and Halobacterium sp. NRC-1, but not in members of the Crenarchaeota (9), which thus remain a major branch of life in which the presence (or the mechanism) of nucleotide excision repair is unknown.
The mechanistic aspects of prokaryotic excision repair have been investigated in considerable detail in E. coli (3-6). The damage is recognized by (UvrA)₂ or (UvrA)₂UvrB; the ATPase activities of UvrA and UvrB provide the energy for kinetic proofreading (5) during damage recognition and for formation of a stable UvrB-DNA complex in which the DNA is kinked and wrapped around UvrB (10). This complex is recognized by UvrC, which contains two endonuclease active sites: the 5´ endonuclease in the C-terminal half (11), which exhibits RNaseH homology (9), and the 3´ endonuclease in the N-terminal domain (12), which contains GIY-YIG homing endonuclease motifs (9,13). UvrC makes the dual incisions in a concerted manner, although uncoupled incisions do occur under certain experimental conditions and in 5´ endonuclease or 3´ endonuclease active-site mutants (1). Nevertheless, the major products generated by WT repair proteins in E. coli are 12- and 13-mers (1,4-6). Following the concerted dual incisions, UvrB and UvrC are displaced by UvrD (helicase II) and DNA polymerase I (14,15), and the gap is filled and ligated (16).
The same mechanism of damaged DNA incision and removal occurs in transcription-coupled excision repair (TCR). In TCR, template strand damage, which blocks RNA polymerase, is targeted for repair by the Mfd translocase. Mfd, widely distributed among prokaryotes, removes RNA polymerase and delivers UvrA and UvrB to the transcription-blocking damage to initiate repair as described above. However, characterization of TCR has largely been limited to E. coli, and to date, TCR has been assumed to occur similarly in all prokaryotes possessing full-length Mfd (17,18).
Against this background, it was rather unexpected when it was recently reported that in Mycobacterium tuberculosis (which possesses UvrA, UvrB, and UvrC homologs), the UvrC protein alone was capable of making dual incisions 15 nt 3´ and 9 nt 5´ to a bulky adduct to generate a 25-mer, leading to the conclusion that the dual incisions made by M. tuberculosis UvrC are analogous to those made by its orthologs in yeast and humans (19). In light of this report, we investigated dual incisions in mycobacteria using Mycobacterium smegmatis as a representative of the genus. We find that M. smegmatis makes dual incisions analogous to those made in E. coli and other prokaryotes. By extension, the entire genus Mycobacterium, including M. tuberculosis, performs dual incisions in the prokaryotic and not the eukaryotic mode. We also find direct evidence for transcription-coupled repair in M. smegmatis, consistent with the presence of the mfd gene in this organism.

Results and discussion
Phylogenetic analysis of UvrC of E. coli, M. smegmatis, and M. tuberculosis
Of the three proteins UvrA, -B, and -C, UvrC exhibits the most sequence variability among microbial species. Whereas UvrA and UvrB show ~50% sequence identity across the phylogenetic tree, including bacterial and archaeal species, UvrC is the least conserved, with an overall sequence identity of ~30% across species (9); the middle of the protein is particularly variable. With this consideration, we compared the sequence of E. coli UvrC (which contains the 5´ and 3´ incision active sites in the C- and N-terminal halves of the protein, respectively) with the UvrC proteins of M. smegmatis and M. tuberculosis (Fig. 1). E. coli UvrC exhibits ~35% sequence identity with both mycobacterial proteins, which in turn share ~75% sequence identity with each other. Importantly, in E. coli UvrC the 3´ endonuclease active site is embedded in the GIY-YIG intron endonuclease catalytic domain in the N-terminal half, and the 5´ endonuclease active site is embedded in the RNaseH homology region in the C-terminal half. Single amino acid substitutions in these active sites have been reported to abrogate nuclease activity. As seen in the sequence alignment, the GVY-10aa-YVG motif is conserved in all three proteins, and, most importantly, Arg42, which is essential for the 3´ incision (12), and Asp399 and Asp466, which are essential for the 5´ incision (11) (E. coli numbering), are conserved in all three UvrC proteins, suggesting that these residues are also responsible for the 3´ and 5´ incision activities of mycobacterial UvrC.
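The comparison in Fig. 1 rests on two simple operations: computing pairwise percent identity from an alignment, and checking whether a residue identified by its E. coli number (e.g., Arg42, Asp399, Asp466) is identical at the corresponding alignment column in the other proteins. The sketch below illustrates both on a hypothetical toy alignment, not the real UvrC sequences. Identity is computed here over gap-free columns, which is only one of several conventions, and the function names are illustrative.

```python
from typing import List


def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    identical = sum(x == y for x, y in pairs)
    return 100.0 * identical / len(pairs)


def column_of_residue(aligned: str, residue_number: int) -> int:
    """0-based alignment column holding the Nth (1-based) residue of a gapped sequence."""
    count = 0
    for col, ch in enumerate(aligned):
        if ch != "-":
            count += 1
            if count == residue_number:
                return col
    raise IndexError("residue number exceeds sequence length")


def is_conserved(reference: str, others: List[str], residue_number: int) -> bool:
    """True if every other sequence carries the reference residue at the same column."""
    col = column_of_residue(reference, residue_number)
    return all(seq[col] == reference[col] for seq in others)


# Toy alignment (hypothetical sequences, not real UvrC)
ref = "MRDG-IYLG"
hom = "MKDGAVYLG"
print(percent_identity(ref, hom))   # 75.0
print(is_conserved(ref, [hom], 3))  # True  (D aligned with D)
print(is_conserved(ref, [hom], 2))  # False (R aligned with K)
```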

M. smegmatis removes cyclobutane pyrimidine dimers from the genome by the prokaryotic dual-incision pattern
Because of biosafety restrictions, we conducted our mycobacterial experiments with the WT (mc²155) and uvrD1 mutant (MGM224) strains of M. smegmatis (20-23). The uvrD mutant was used because UvrD contributes to excision product degradation, as discussed below.
To determine how M. smegmatis repairs CPD-damaged DNA, we first performed the in vivo excision assay recently developed in our laboratory. In this assay, cells are UV-irradiated and allowed time for repair, then lysed, and the excision products containing CPDs are isolated, radiolabeled, and separated on a sequencing gel. The gel in Fig. 2 shows that CPDs are removed from the M. smegmatis genome principally in the form of 12- to 13-mers, with a bias toward the shorter product in WT cells, as discussed further below. In contrast, human (HeLa) cells release CPDs in the form of 26- to 29-mers.
We then performed XR-seq analysis on excision products isolated from UV-irradiated M. smegmatis. In this procedure, DNA fragments isolated from treated cells are ligated to adapters, photoreactivated to repair the CPDs, amplified by PCR, and sequenced by next-generation sequencing, and the reads are mapped to the genome. In Fig. 3 (A and B), read frequency is plotted as a function of read length. The prevalent reads are 12- to 13-mers in uvrD− cells and 10- to 13-mers in WT cells. The same pattern of additional small fragments in uvrD+ versus uvrD− cells was seen in E. coli (24). The smaller excision products in WT cells arise when UvrD releases excision products annealed to the genome, thereby exposing them to nucleases (24,25). This UvrD-dependent degradation also makes the level of excision products detected in WT cells (Fig. 2) appear lower, but the apparent reduction in repair is an artifact. In uvrD− cells, UvrB and UvrC remain trapped in the postincision complex with the excised oligonucleotide, which stays annealed to the chromosomal DNA and is protected from nucleases, so these proteins cannot act catalytically (14,15). In WT cells, the Uvr proteins do turn over and repair more damage than in uvrD− cells, but the excised oligonucleotides are degraded to levels below those protected in uvrD− cells.
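Fig. 3 (A and B) summarizes the XR-seq reads as frequency versus read length. A minimal Python sketch of that tabulation follows, assuming the mapped reads are available as plain sequence strings; the 5-35-nt window matches the range reported under Data analysis, and the toy reads are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def length_distribution(reads: Iterable[str], min_len: int = 5, max_len: int = 35) -> Dict[int, float]:
    """Fraction of reads at each length, restricted to [min_len, max_len] nt."""
    counts = Counter(len(r) for r in reads if min_len <= len(r) <= max_len)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {n: counts[n] / total for n in sorted(counts)}


# Toy reads (hypothetical): two 13-mers, one 12-mer, one 10-mer, one 40-mer outside the window
toy = ["A" * 13, "C" * 13, "G" * 12, "T" * 10, "A" * 40]
print(length_distribution(toy))  # {10: 0.25, 12: 0.25, 13: 0.5}
```

Comparing such distributions between WT and uvrD− samples is what reveals the extra 10- to 12-nt products attributed above to UvrD-dependent degradation.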
Excision products were then analyzed by plotting the nucleotide distribution along the length of the excision fragments (Fig. 3, C and D). Positions 8-9 from the 5´ end and positions 5-6 (or 4-5) from the 3´ end contain almost exclusively T-T (more precisely, dipyrimidine) dinucleotides, consistent with the excision assay in Fig. 2 and with the prokaryotic pattern of dual incisions 7 nt 5´ and 3-4 nt 3´ to the dipyrimidine. A different set of incision sites was described for M. tuberculosis repair, likely because of the use of indirect evidence for the 3´ incision site and the failure to use appropriate amounts of UvrA and UvrB in the repair reactions (19).
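The positional analysis in Fig. 3 (C and D) amounts to counting nucleotides at each position of equal-length reads, and the expected dipyrimidine positions follow directly from the incision offsets. The sketch below assumes a 5´ incision 7 nt from a 2-nt lesion, as stated above, and uses hypothetical toy reads; the function names are illustrative.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

PYRIMIDINES = {"C", "T"}


def positional_frequencies(reads: Iterable[str], length: int) -> List[Dict[str, float]]:
    """Nucleotide frequencies at each position (index 0 = 5' end) for reads of one length."""
    selected = [r.upper() for r in reads if len(r) == length]
    total = len(selected)  # every selected read contributes one base per position
    table = []
    for i in range(length):
        counts = Counter(r[i] for r in selected)
        table.append({nt: (counts[nt] / total if total else 0.0) for nt in "ACGT"})
    return table


def dipyrimidine_starts(read: str) -> List[int]:
    """1-based positions (from the 5' end) at which a dipyrimidine begins."""
    s = read.upper()
    return [i + 1 for i in range(len(s) - 1) if s[i] in PYRIMIDINES and s[i + 1] in PYRIMIDINES]


def expected_lesion_positions(length: int, n5: int = 7) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Lesion positions counted from the 5' end and from the 3' end, for a 5' incision n5 nt away."""
    from_5 = (n5 + 1, n5 + 2)
    from_3 = (length - from_5[1] + 1, length - from_5[0] + 1)
    return from_5, from_3


print(expected_lesion_positions(13))          # ((8, 9), (5, 6))
print(expected_lesion_positions(12))          # ((8, 9), (4, 5))
print(dipyrimidine_starts("GGGGGGGTTGGGG"))   # [8]
```

The 13-mer and 12-mer cases reproduce the 5-6 and 4-5 positions from the 3´ end described above, which is why a fixed 5´ offset with a variable 3´ end fits the data.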

Transcription-coupled repair in mycobacteria
Although TCR analysis has been performed in yeast, Arabidopsis, humans, and a few other eukaryotic species (26-29), it has not been extensively performed in prokaryotes other than E. coli (17,24,25), even though the TCR factor Mfd has been found in all nonendosymbiotic bacteria, including mycobacteria (9). Here we use our XR-seq data to analyze TCR in M. smegmatis as a sentinel for other Mfd-containing bacteria. Fig. 4A shows a screenshot illustrating levels of repair in each strand of a representative region of the M. smegmatis genome. Genes in this region exhibit divergent transcription, such that the template strand (TS) for the mfd and ABK72606 genes is the + strand (repair in blue) and the TS for glmU and ABK72893 is the − strand (repair in red). Visual inspection reveals TCR in the strongly transcribed mfd, ABK72606, and ABK72893 genes. Quantification of repair in each gene (Fig. 4B) confirms TCR in these three genes and also shows that TCR is absent in the nontranscribed glmU gene. These results are consistent with prior findings in E. coli and other organisms showing an association between transcription and TCR levels (24,25,27-29). We extended our analysis to the 25% most highly transcribed M. smegmatis genes longer than 100 bp. Relative repair of the two strands of these genes is illustrated in Fig. 4C as a frequency distribution of log2-transformed TS/nontemplate strand (NTS) ratios. The shift of the histograms toward positive values reflects the contribution of TCR to the cellular repair response to DNA damage. The influence of TCR appears essentially the same in WT and uvrD mutant cells, as was also observed in E. coli (24,25). Overall, these results show that transcription-coupled repair is common among transcribed genes, a characteristic of TCR in E. coli, and likely occurs in all prokaryotes that possess full-length Mfd.
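The TS/NTS comparison in Fig. 4 reduces to a log2 ratio of strand-specific repair normalized as RPKM. In the sketch below, the gene and its read counts are hypothetical, and the optional `pseudocount` (useful only for genes with no NTS reads) is an assumption not described in the text.

```python
import math


def rpkm(reads_in_feature: int, feature_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of feature per million mapped reads."""
    return reads_in_feature / (feature_length_bp / 1e3) / (total_mapped_reads / 1e6)


def log2_ts_nts(ts_rpkm: float, nts_rpkm: float, pseudocount: float = 0.0) -> float:
    """log2(TS/NTS); a positive value means the template strand was repaired more."""
    return math.log2((ts_rpkm + pseudocount) / (nts_rpkm + pseudocount))


# Hypothetical gene: 1,000 bp, 40 TS reads and 10 NTS reads, 2 million mapped reads
ts = rpkm(40, 1000, 2_000_000)   # 20.0
nts = rpkm(10, 1000, 2_000_000)  # 5.0
print(log2_ts_nts(ts, nts))      # 2.0
```

Because both strands of a gene share the same length and library size, the RPKM normalization cancels within a gene; it matters when repair levels are compared across genes, as in Fig. 4B.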

Conclusion
In conclusion, both phylogenetic and biochemical analyses indicate that M. smegmatis performs excision repair by the prokaryotic pattern. Although for biosafety reasons we used M. smegmatis in our experiments, the phylogenetic data and the fact that M. tuberculosis uvrB and uvrD mutants are as UV-sensitive as the corresponding E. coli mutants (21) indicate that M. tuberculosis performs excision repair by the same mechanism: damage recognition by UvrA, kinetic proofreading by UvrB, and dual incisions by UvrC in the UvrB-UvrC-DNA complex. A comparison of the eukaryotic and bacterial (including M. smegmatis) dual-incision patterns is illustrated in Fig. 5.
Finally, we wish to comment on the so-called noncanonical functions of Mfd (18), in particular its role in the development of resistance to antibacterial drugs by promoting, through an unknown mechanism, mutations in various genes involved in cell wall biosynthesis, translation, and transcription. This has led to Mfd being referred to as a "proevolutionary factor" (18) or "evolvability factor" (30). Although it is clear that Mfd plays a role in recombination (31) and in repair of double-strand breaks, perhaps by aiding in the resolution of R-loops and the consequent replication fork collapse (18,30), the requirement of Mfd for development of resistance to antimicrobial drugs remains poorly explained. It should be noted that the antimicrobial drugs to which resistance develops in an Mfd-dependent manner are not DNA-damaging drugs, and the role of Mfd in this phenomenon and in the related stationary-phase mutagenesis remains to be mechanistically defined. What is clear is that mfd is an antimutator gene. Although mutation frequency decline (MFD) was discovered as a phenomenon of mutation avoidance in tRNA suppressor genes, the Mfd protein performs a genome-wide antimutagenic function in protein-encoding genes, as exemplified by the fact that Mfd reduces UV-induced mutation frequency in the E. coli lacI gene 3- to 5-fold, and at one particular site by more than 300-fold (32).

Experimental procedures
Excision and XR-seq assays
M. smegmatis cultures (21) were grown with shaking at 37°C in Luria broth containing 0.5% (w/v) Tween 80. Exponentially growing cells at an A600 of ~0.6 were transferred to R150 tissue culture dishes in 15-ml volumes and irradiated at room temperature with 100 J/m² UVC. For the excision assay, the dishes were then incubated for 10 min at room temperature, and the cells were chilled, harvested, and kept on ice. Excision products were isolated using a modified Hirt procedure that employs shearing to assist cell lysis. The cells were pelleted at 4°C, resuspended in ice-cold TE (10 mM Tris, 1 mM EDTA, pH 8.0), transferred to ice-cold Eppendorf tubes, and pelleted again at 4°C, and the supernatants were removed. The pellets were resuspended in 360 µl of ice-cold TE. Then 50 µl of room-temperature 10% (w/v) SDS was added, and the cells were lysed by passage through a 26-gauge needle 40 times, followed by incubation at room temperature for 20 min. A 120-µl volume of room-temperature 5 M NaCl was then added, the tubes were gently mixed, and the suspensions were incubated at 4°C overnight. After centrifugation at high speed for 1 h at 4°C, the supernatants (~480 µl each) were collected and incubated with 12 µl of RNase A (R4642; Sigma) for 1 h at 37°C and then with 12 µl of proteinase K (P8107S; NEB) for 1.5 h at 60°C. Samples were then extracted twice with phenol/chloroform/isoamyl alcohol and precipitated with ethanol. The samples were immunoprecipitated with an anti-CPD antibody and washed as described (24,26). Extraction, precipitation, and labeling of the 3´ ends with cordycepin were as described (24). One fmol of a 50-nt oligonucleotide was added to each sample before labeling to serve as a labeling and loading control. Samples were then extracted with phenol/chloroform/isoamyl alcohol, precipitated with ethanol, and resolved on a 16% (w/v) polyacrylamide sequencing gel.
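The volumes in the lysis steps imply the working detergent and salt concentrations. The short calculation below assumes additive volumes and ignores the volume of the cell pellet; by this estimate the NaCl step brings the lysate to roughly 1.1 M, close to the ~1 M NaCl typical of Hirt-type fractionation. The helper name is illustrative.

```python
def final_concentration(stock_conc: float, stock_vol_ul: float, total_vol_ul: float) -> float:
    """C1 * V1 / V_total, assuming volumes are additive."""
    return stock_conc * stock_vol_ul / total_vol_ul


te_ul, sds_ul, nacl_ul = 360.0, 50.0, 120.0  # volumes from the protocol (µl); pellet volume ignored

sds_lysis = final_concentration(10.0, sds_ul, te_ul + sds_ul)             # % (w/v) SDS during lysis
nacl_final = final_concentration(5.0, nacl_ul, te_ul + sds_ul + nacl_ul)  # M NaCl after salt addition
sds_final = final_concentration(10.0, sds_ul, te_ul + sds_ul + nacl_ul)   # % (w/v) SDS after salt addition

print(f"SDS during lysis: {sds_lysis:.2f}%")       # SDS during lysis: 1.22%
print(f"NaCl after addition: {nacl_final:.2f} M")  # NaCl after addition: 1.13 M
print(f"SDS after addition: {sds_final:.2f}%")     # SDS after addition: 0.94%
```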
We note that the Hirt procedure removes the UvrB and UvrC that remain associated with DNA after dual incision and thus releases the excised oligonucleotide from genomic DNA. The excision assay of HeLa cells was performed as described previously (26).

Figure 2 (partial caption). The excised oligonucleotides were isolated, mixed with a 50-mer internal control, 3´-end labeled, and separated on a DNA sequencing gel. Note the higher level of 12-mer detected in the uvrD mutant, because the "excised" oligomer remains bound to the duplex and is protected from nucleolytic degradation in the stable postincision complex containing the UvrB and UvrC proteins, as seen in E. coli (24). Rep1 and Rep2 refer to duplicate samples. A separate gel was run to resolve excision products from HeLa cells; this gel (lane 8) was aligned beside the M. smegmatis gel (lanes 1-7). An accidental tear in the M. smegmatis gel between the 25- and 32-nt markers occurred after running the gel, which slightly alters the orientation of the lanes above and below this point.

Figure 3 (partial caption). Panels A and B show that the predominant excision product length in uvrD− cells is 13 nt. In WT cells, the excision product is released from the genome and is vulnerable to nuclease degradation (24), producing elevated levels of 10- to 12-nt products at the expense of the 13-mer. The histograms in C and D show the nucleotide frequency (y axis) at each position (x axis) of the 11- to 13-nt excision products. Reads were plotted with the 5´ residue at position 1. Notably, the position of the dipyrimidine peak (presumptive CPD damage site) in the 12- and 13-nt excision products is consistent with a 5´ incision 7 nt from the CPD and a 3´ incision 3 or 4 nt from the CPD. Interestingly, the progressive loss of nucleotides from the 3´ end (going from 13-nt to 12-nt excision products) is consistent with degradation of the excision products by an intracellular 3´ exonuclease activity.
XR-seq was performed by procedures previously described (24), using excision products isolated as described above. Approximately 20 plates of WT cells (5-min repair) and ~8 plates of uvrD mutant cells (10-min repair) were used for XR-seq. XR-seq libraries were pooled and sequenced on the HiSeq 2500 Rapid Run platform at the University of North Carolina-Chapel Hill High-Throughput Sequencing Facility.

Data analysis
At least 2 million uniquely mapped reads were obtained for each sample. Analyses of sequencing reads and data visualization were as described previously (24). Reads were trimmed to remove flanking adapter sequences with cutadapt (33), and duplicate reads were then removed with fastx_toolkit/0.0.14 (http://hannonlab.cshl.edu/fastx_toolkit/index.html). Trimmed reads were aligned to the genome of the M. smegmatis WT strain mc²155 (GCA_000015005.1, downloaded from EnsemblBacteria) using bowtie2 with the arguments -f --very-sensitive (34,35). The output SAM files were converted into BAM files using samtools (36) and then into BED files using bedtools (37). For Fig. 3, the sequences of the excised oligonucleotides were retrieved using bedtools with the arguments getfasta -fi -s. Aligned reads in the 5-35-nt range totaled 2,309,284 (WT) and 2,060,476 (uvrD). Oligonucleotide lengths and nucleotide distributions were plotted with R. The analyses in Fig. 4 used only 13-nt reads with the TT dinucleotide at positions 8 and 9. With this filtering, there were 278,237 (WT) and 606,989 (uvrD−) reads. Bedtools was used to obtain the reads for each gene's TS and NTS.

Figure 4 (partial caption). In A, repair is plotted as reads per million total reads (RPM), with the scale from 0 to 20. This region includes actively transcribed genes (mfd, ABK72606, and ABK72893) and a nontranscribed gene (glmU). For mfd and ABK72606, the + strand (repair in blue) is the TS, and for glmU and ABK72893 the − strand (repair in red) is the TS. TCR is evident in the actively transcribed genes by visual inspection. B illustrates quantitative values for each gene in A as RPKM; the absence of TCR in the nontranscribed glmU gene is evident. C shows the frequency distribution of log2-transformed TS/NTS repair (as RPKM) in the 25% most highly transcribed M. smegmatis genes over 100 bp. The dotted line marks equivalent repair in the two strands. The shift toward positive values reflects TCR (24).
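The read selection used for Fig. 4 (13-nt reads with TT at positions 8 and 9) and the assignment of reads to a gene's TS or NTS can be sketched as follows. This assumes that each read's reported strand is the damaged strand (as when sequences are extracted strand-aware, e.g., with bedtools getfasta -s) and that a gene's annotated strand is its coding (nontemplate) strand; the gene-overlap step performed with bedtools is taken as already done, and the reads are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def keep_for_tcr(seq: str) -> bool:
    """13-nt read with TT at positions 8 and 9 (1-based, counted from the 5' end)."""
    s = seq.upper()
    return len(s) == 13 and s[7:9] == "TT"


def strand_label(read_strand: str, gene_strand: str) -> str:
    """'TS' if the read lies on the strand opposite the gene's annotated (coding) strand.

    Assumes the read strand is the damaged strand, so a read on the coding strand is NTS repair.
    """
    if read_strand not in ("+", "-") or gene_strand not in ("+", "-"):
        raise ValueError("strands must be '+' or '-'")
    return "NTS" if read_strand == gene_strand else "TS"


def count_ts_nts(reads: Iterable[Tuple[str, str]], gene_strand: str) -> Tuple[int, int]:
    """Count filtered (sequence, strand) reads already known to overlap one gene."""
    labels = Counter(strand_label(strand, gene_strand) for seq, strand in reads if keep_for_tcr(seq))
    return labels["TS"], labels["NTS"]


toy = [("GGGGGGGTTGGGG", "-"), ("GGGGGGGTTGGGG", "-"),
       ("GGGGGGGTTGGGG", "+"), ("GGGGGGGTCGGGG", "-")]  # last read fails the TT filter
print(count_ts_nts(toy, "+"))  # (2, 1)
```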
To identify the upper quartile of transcribed genes, we used a publicly available RNA-seq data set (data for cells in midexponential phase, 16 h) (38). Salmon (39) was used to obtain transcripts per million (TPM). The 25% most highly transcribed genes over 100 bp in length were selected for our analysis.
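Selecting the most highly transcribed quarter of genes from a TPM table can be sketched as below. The text does not specify whether the length filter precedes the quartile cut or how the cutoff is rounded; this sketch filters by length first, rounds the count up, and breaks ties by input order. The gene names and values are hypothetical.

```python
import math
from typing import Dict, List


def top_quartile_genes(tpm: Dict[str, float], length_bp: Dict[str, int], min_len: int = 100) -> List[str]:
    """Genes longer than min_len bp, ranked by TPM; return the top 25% (count rounded up)."""
    eligible = [g for g in tpm if length_bp.get(g, 0) > min_len]
    ranked = sorted(eligible, key=lambda g: tpm[g], reverse=True)  # stable sort keeps input order for ties
    n_keep = math.ceil(len(ranked) / 4)
    return ranked[:n_keep]


# Hypothetical genes and values
tpm = {"geneA": 100.0, "geneB": 50.0, "geneC": 10.0, "geneD": 5.0, "geneE": 1000.0}
lengths = {"geneA": 900, "geneB": 1200, "geneC": 600, "geneD": 300, "geneE": 90}
print(top_quartile_genes(tpm, lengths))  # ['geneA']  (geneE is excluded as <= 100 bp)
```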

Data availability
The raw data and alignment data have been deposited in the Gene Expression Omnibus under accession number GSE159048. All code used in this paper is available at https://github.com/yanyanyangunc/DNA-Damage-Repair-Circadian-Clock/tree/master/Mycobacterium_smegmatis_str_mc2_155.

Figure 5. Nucleotide excision repair patterns for prokaryotes, archaea, and eukaryotes. In the two prokaryotes, E. coli and M. smegmatis, dual incisions occur 7 nt 5´ and 3-4 nt 3´ to the UV photoproduct, resulting in excision products predominantly 12-13 nt in length. In the archaeon M. thermoautotrophicum, dual incisions occur 5-6 nt 5´ and 3-4 nt 3´ to the UV photoproduct, resulting in excision products predominantly 11 nt in length (8). In yeast cells, the photoproducts are excised by dual incisions 13-18 nt 5´ and 6-7 nt 3´ to the damage (28). In humans, plants, and insects, the dual-incision sites are at 19-22 nt 5´ and 5-6 nt 3´ to the UV damage (27,29,42). In eukaryotes there is considerable variability in incision sites, principally at the 5´ incision site, giving rise to excision products with a median length of 24 nt in S. cerevisiae and 26-27 nt in D. melanogaster, H. sapiens, and A. thaliana.