Preferential Protection of Genetic Fidelity within Open Chromatin by the Mismatch Repair Machinery

Epigenetic systems are well known for the roles they play in regulating the differential expression of the same genome in different cell types. However, epigenetic systems can also directly impact genomic integrity by protecting genetic sequences. Using an experimental evolutionary approach, we studied rates of mutation in strains of the fission yeast Schizosaccharomyces pombe that lacked genes encoding several epigenetic regulators or mismatch repair components. We report that loss of a functional mismatch repair pathway in S. pombe resulted in the preferential enrichment of mutations in euchromatin, indicating that the mismatch repair machinery preferentially protects genetic fidelity in euchromatin. This preference is probably determined by differences in chromatin accessibility between chromatin regions, which is supported by our observation that chromatin accessibility correlated positively with mutation rates in S. pombe strains and human cancer samples with deficiencies in mismatch repair. Importantly, no such positive correlation was observed in S. pombe strains or human cancer samples with functional mismatch repair machinery.

Epigenetic systems are often considered to be biological systems that function beside and beyond the genome (1). Most studies have focused on the mechanisms by which epigenetic regulators achieve differential gene expression without altering the genetic information contained in the DNA sequence. However, an interesting but much less clear question is whether epigenetic systems can directly impact the fidelity of the genetic system and affect the accumulation of DNA mutations.
Mutations continually arise during cell proliferation, development, and evolution and under pathogenic conditions. The mutation rate is controlled by the rate at which errors occur in the DNA sequence and the rate of DNA repair. Therefore, questions related to the impact of epigenetic systems on genetic sequence fidelity can be broken down into two parts: do epigenetic systems affect the rate at which errors occur in the DNA sequence, and/or do they affect the rate of DNA repair?
When a DNA synthesis error creates a mismatched base pair, genetic fidelity is first safeguarded by the 3′ to 5′ exonuclease (proofreading) activity of the DNA polymerase (2). For mismatches that escape polymerase proofreading, the mismatch repair system is the primary protection against the fixation of such replication errors as mutations during DNA replication (3-5). The mismatch repair pathway plays a crucial role in protecting genetic fidelity because it is tightly associated with DNA replication and ensures that most DNA sequence errors that occur during DNA replication are repaired (3-9).
Analysis of spontaneous mutation rates in various organisms revealed that rates of spontaneous mutation are highly similar across different families within the same order, but they differ substantially among organisms from different orders (10,11), suggesting that a given organism displays a relatively constant rate of spontaneous mutation. Nevertheless, regional variation in the accumulation of mutations has been reported in several species (12-17). More recently, it has been reported that nucleosome occupancy may impact the accumulation of mutations (18) and that heterochromatic regions tend to accumulate more mutations (15,17,19,20), suggesting that the epigenetic status of different chromatin regions may impact whether a mutation occurs or whether a mutation is repaired.
To directly investigate the impact of epigenetic regulators on the accumulation of mutations and to understand whether either the occurrence of a mutation or the repair of a mutation is predominantly affected by chromatin status, we took advantage of an experimental evolutionary approach by using several Schizosaccharomyces pombe strains that contain deletions of genes encoding key epigenetic regulators or mismatch repair components. We found that loss of the epigenetic players that we tested led to minor increases in the mutation rate. Interestingly, we discovered that the high genetic fidelity that was observed in euchromatic regions was largely caused by preferential protection by the mismatch repair pathway. Defects in the mismatch repair machinery led to profound changes in the mutation landscape in the genome in both S. pombe and human cancers.

Experimental Evolution in Fission Yeast Strains Containing Deletions of Various Epigenetic Regulators-To investigate whether epigenetic regulators affect the fidelity of genetic information, we chose the fission yeast S. pombe as our starting model system for three reasons: 1) S. pombe grows quickly enough to allow an experimental evolutionary approach and to accumulate enough mutations for analysis within a reasonable period of time; 2) S. pombe has a small haploid genome that can be easily sequenced at sufficient depth to call mutations; 3) S. pombe has a more complicated epigenetic system than the budding yeast Saccharomyces cerevisiae (21-30), including heterochromatin regions marked by histone H3K9 dimethylation (H3K9me2) (23,27).
We chose the WT haploid S. pombe strain LD331: h+ as our starting strain and generated KO strains for genes encoding six epigenetic regulators (Clr4, an H3K9-specific histone methyltransferase; Set2, an H3K36-specific histone methyltransferase; Pcf1, a subunit of the chromatin assembly complex CAF1; Slm9, a subunit of the chromatin assembly complex HIRA; Dcr1, an enzyme that generates siRNA and part of the RNAi machinery; Ago1, an siRNA-binding protein and part of the RNAi machinery); and Msh6, a protein that is a key component of the mismatch repair machinery (31) (Fig. 1A). All knock-out strains were verified by PCR (Fig. 1B) and high-throughput sequencing data.
Using an experimental evolutionary approach, all of the above strains were subjected to single-cell bottleneck passages with five independent mutation accumulation lines per strain. During each passage, a single colony per line was picked and streaked onto a new plate of solid YES medium containing full supplements (Fig. 1C). Each mutation accumulation line was numbered according to its colony number at the first passage, and this number was retained for its progeny only, to avoid mixing between individual mutation accumulation lines. In each passage, cells were estimated to experience between 19 and 21 mitotic divisions (Table 1). Therefore, at the time point corresponding to the 100th passage (P100), the cells were ~2,000 generations from their starting point (P0).
Genomic DNA was extracted from all of the lines at P100 and P0 and subjected to paired-end high-throughput sequencing (PE 100 or PE 125) (Fig. 1C; see "Experimental Procedures" for details). The average coverage of all 48 samples (5 at P100 and 1 at P0 for each strain) ranged from 80- to 180-fold across the genome.
We developed a pipeline (Fig. 2A) for defining the mutations that accumulated during our experimental evolution. Mutations at uniquely aligned reads were relatively easy to call using GATK (32). In contrast, mutations at multiple aligned reads were more challenging to call and have typically been discarded during analysis in previous studies. However, discarding them would compromise the analysis of mutations in heterochromatic regions, which contain more abundant repeats. To improve our analysis in repetitive regions, we first employed PE 100 or PE 125 sequencing to obtain longer reads. Next, we randomly assigned one of the repetitive sites as the target site for multiple aligned reads, and then we excluded those identified mutations that resided exactly at the bases that vary among the same group of repeats (Fig. 2A). This approach allowed us to confidently call single-nucleotide substitutions and 1-bp indels in the repetitive regions. However, accurately identifying indels larger than 2 bp remained a challenging task. Nevertheless, based on an analysis of the uniquely aligned reads, we found that 1-bp indels accounted for the majority (>70%) of total indels (Fig. 2B). Therefore, we concentrated our subsequent analysis on single-nucleotide substitutions and 1-bp indels that were called at unique and multiple aligned reads.
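The exclusion step for multiple aligned reads can be illustrated with a minimal sketch. It is not the pipeline used in this study: it assumes that the copies of a repeat family have already been aligned to equal length, and the function names `variable_columns` and `filter_calls` and the toy sequences are hypothetical.

```python
def variable_columns(copies):
    """Return the set of 0-based columns at which aligned repeat copies differ."""
    length = len(copies[0])
    if any(len(c) != length for c in copies):
        raise ValueError("repeat copies must be aligned to equal length")
    return {i for i in range(length) if len({c[i] for c in copies}) > 1}


def filter_calls(calls, copies):
    """Drop candidate mutations located at columns that vary between copies.

    `calls` is a list of (column, ref_base, alt_base) tuples expressed in the
    coordinates of the aligned repeat family (an assumption of this sketch).
    """
    variable = variable_columns(copies)
    return [call for call in calls if call[0] not in variable]


# Toy example: three copies of a repeat that differ only at column 3.
copies = ["ACGTACGT", "ACGAACGT", "ACGTACGT"]
calls = [(3, "T", "A"), (6, "G", "C")]
kept = filter_calls(calls, copies)
```

In this toy case the call at column 3 is discarded because it cannot be distinguished from pre-existing variation between copies, whereas the call at column 6 is retained.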
Using the above-described pipeline, we identified a total of 3,321 mutations in all of the P100 samples compared with their own P0 samples (Fig. 3A). To validate our analysis pipeline, we chose one of the mutation accumulation lines (ago1Δ P100-1) and performed Sanger sequencing of the regions surrounding all 23 of the mutations that were called by our analysis pipeline. Among the 23 mutations identified using high-throughput sequencing, only one single-nucleotide substitution at a highly repetitive region could not be confirmed by Sanger sequencing (Table 2). This single-nucleotide substitution occurred in a region with 10 different copies in the genome, and it was therefore difficult to validate using Sanger sequencing. Therefore, we concluded that the vast majority (and potentially all) of the mutations identified by our analysis pipeline were true mutations.
Modest Protection of Genetic Fidelity by Epigenetic Regulators-Of all eight strains tested, the wild-type strain accumulated the lowest number of mutations, and as expected, the mismatch repair-deficient msh6Δ strain accumulated 30-fold more mutations (Fig. 3A). Interestingly, all of the epigenetic regulator-deficient strains that were tested displayed a modest but statistically significant increase in the total number of mutations, with the exception of the set2Δ strain (Fig. 3B). This suggests that these epigenetic regulators may contribute to the protection of genetic fidelity to some degree. However, the total number of mutations accumulated in these lines was too small to allow deeper and more confident analysis, and we reasoned that the use of an experimental evolutionary system at a different scale would be required for such a purpose.
Preferential Protection against Indels by the Mismatch Repair Complex MutSα-To investigate what kind of mutations were preferentially enriched in the strains containing either a defective mismatch repair pathway or compromised epigenetic regulatory pathways, we first calculated the total number of each type of single-nucleotide substitution. Generally, the total numbers of transitions (C:G > T:A and A:T > G:C) and transversions (G:C > T:A, A:T > C:G, A:T > T:A, and G:C > C:G) were similar in most strains, with the exception of the msh6Δ strain (Fig. 3, C and D). This does not necessarily indicate that the mismatch repair machinery preferentially repairs transition type errors; instead, it is more probable that these results are a consequence of transition type errors occurring more frequently during DNA replication than transversion type errors (33).
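The six substitution classes listed above arise from collapsing each change with its reverse complement. A minimal sketch of that bookkeeping follows; the function names and the example calls are hypothetical and are not taken from the analysis pipeline of this study.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
PURINES = {"A", "G"}


def substitution_class(ref, alt):
    """Collapse a substitution onto its pyrimidine-reference form, e.g. 'C>T'."""
    if ref == alt or ref not in COMPLEMENT or alt not in COMPLEMENT:
        raise ValueError("expected two different bases from A, C, G, T")
    if ref in PURINES:
        # Report the change as seen on the complementary strand.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def is_transition(ref, alt):
    """True for purine-to-purine or pyrimidine-to-pyrimidine changes."""
    return (ref in PURINES) == (alt in PURINES)


# Hypothetical calls given as (reference base, alternate base).
example_calls = [("G", "A"), ("C", "T"), ("A", "C"), ("T", "A")]
class_counts = Counter(substitution_class(r, a) for r, a in example_calls)
n_transitions = sum(is_transition(r, a) for r, a in example_calls)
```

Here G>A and C>T both fall into the C:G > T:A class, so the two transitions are counted together regardless of which strand was reported.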
Similar to what was observed in the wild-type strain, the strains lacking epigenetic regulators displayed a ratio of indel/single-nucleotide substitution of ~0.9 (Fig. 4A), suggesting that they had a minimal impact on this process. In contrast, the msh6Δ strain had a ratio of indel/single-nucleotide substitution of 5 (Fig. 4A). This observation is consistent with findings in previous studies that used budding yeast (34,35). This may be explained by the fact that the MutSα complex (Msh6 is a subunit of the MutSα complex) preferentially repairs mismatches with looped-out bases.
Interestingly, when we further analyzed the type of indels (insertions or deletions) that accumulated in these strains, we found that most strains, including the wild-type strain, accumulated 2-3-fold fewer deletions than insertions (Fig. 4B). This differs from what was observed in budding yeast S. cerevisiae (36), but it is consistent with a recent fission yeast S. pombe study (37). This is an interesting observation because it suggests that although both insertions and deletions create looped-out structures in double-stranded DNA, they are likely to be differentially sensed by the mismatch machinery, because insertions are associated with looped-out structures at the newly synthesized daughter strand, whereas deletions are associated with looped-out structures at the mother strand. It appears that the fission yeast S. pombe mismatch repair machinery preferentially recognizes the latter. Indeed, the msh6Δ strain accumulated ~4-fold more deletions than insertions, which reflects a >8-fold change compared with all of the other strains (Fig. 4B), further supporting our notion.
FIGURE 1 (partial caption). The arrows indicate positive knock-out clones. C, experimental scheme. Five independent mutation accumulation lines were created for each strain to undergo 100 single-cell bottleneck passages (P0 to P100). Mutations were identified using high-throughput sequencing.
The above observations were obtained from analysis of an experimental evolutionary process in which most of the mutations were classified as spontaneous mutations. We next examined the role of the mismatch repair machinery in protecting the genome against mutations induced by chemical mutagens. We performed chemically induced mutation experiments using the mutagens methylnitronitrosoguanidine (MNNG) or ethyl methanesulfonate (EMS) in the msh6Δ, ago1Δ, clr4Δ, pcf1Δ, and wild-type strains. MNNG is a mutagen that alkylates the O4 position of thymine or the O6 position of guanine, whereas EMS is an alkylating agent that converts guanine to O-6-ethylguanine (38-40). MNNG did not induce more single-nucleotide substitutions in the msh6Δ strain than in the other strains (Fig. 5A), most likely because MNNG-induced lesions are not ideal substrates for the mismatch repair machinery, which primarily functions in proximity to the replication fork to repair DNA replication errors (8,9,41-43). Nevertheless, MNNG treatment resulted in a robust increase (10-20-fold) in indel accumulation in the msh6Δ strain compared with the other strains (Fig. 5, A and B), which supports our observation that the MutSα complex preferentially targets looped-out structures in double-stranded DNA that may lead to indels if unrepaired. EMS treatment generated fewer mutations (Fig. 5A) in this experimental setup, but it also induced a higher number of indels in the msh6Δ strain than in the other strains (Fig. 5, A and C). Taken together, the above results indicate that the MutSα complex preferentially protects the genome against indels, which is consistent with its known biochemical features (44).
Preferential Protection of Genome Fidelity in Euchromatin by the Mismatch Repair Machinery-As we described previously, during our analysis of the experimental evolutionary data set, we separated the reads into two groups, the uniquely aligned reads and the multiple aligned reads (Fig. 2A), for technical reasons. Surprisingly, we observed a striking difference in the number of mutations called for each of these two groups of reads. The msh6Δ strain accumulated a much higher number of mutations within uniquely aligned reads than all of the other strains, but it showed only a much smaller increase in the number of mutations within multiple aligned reads (Fig. 6A).
Because the initial aim of this study was to examine the role of epigenetic regulation in protecting genome fidelity, this unexpected observation immediately attracted our attention: the multiple aligned reads corresponded to repetitive sequences, which are often enriched in heterochromatin (45-51). Heterochromatic regions in S. pombe are marked by H3K9me2 (23,27). Therefore, we defined the boundaries of S. pombe heterochromatin and euchromatin using previously published H3K9me2 ChIP-seq data (52) (Fig. 6B) and analyzed the accumulated mutations in heterochromatin and euchromatin (Fig. 6C). Clearly, the msh6Δ strain accumulated a disproportionately high number of mutations in euchromatin in comparison with all other strains (Fig. 6C).
Then we calculated the mutation rate by normalizing the number of mutations to the length of the corresponding genomic regions. The mutation rate was clearly 5-14-fold higher in heterochromatin than in euchromatin in all of the tested strains except for the mismatch repair-deficient msh6Δ strain, in which the difference was reduced to ~1.5-fold (Fig. 6D). The increased mutation rate in heterochromatin has been reported previously in other organisms (15,17,19,20). Our finding that loss of the mismatch repair machinery effectively abrogates this trend suggests that the mismatch repair machinery preferentially protects genetic fidelity in euchromatin.
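The length normalization described above can be written out as a short sketch. The function names are hypothetical, and the counts and region lengths in the example are invented for illustration only; they are not values from Fig. 6.

```python
def mutations_per_mb(n_mutations, region_length_bp):
    """Number of mutations normalized to region length, per megabase."""
    if region_length_bp <= 0:
        raise ValueError("region length must be positive")
    return n_mutations / region_length_bp * 1e6


def hetero_to_eu_fold(n_het, len_het_bp, n_eu, len_eu_bp):
    """Fold difference of the heterochromatin rate over the euchromatin rate."""
    return mutations_per_mb(n_het, len_het_bp) / mutations_per_mb(n_eu, len_eu_bp)


# Invented numbers: 10 mutations in 0.2 Mb of heterochromatin versus
# 120 mutations in 12 Mb of euchromatin.
fold = hetero_to_eu_fold(10, 200_000, 120, 12_000_000)
```

With these invented inputs the heterochromatin rate is about 5 times the euchromatin rate even though euchromatin carries far more mutations in absolute terms, which is why the comparison is made on normalized rates.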
We also noted that although loss of Clr4 is known to disrupt heterochromatic silencing (21), it did not decrease the number of mutations within heterochromatin in comparison with the wild-type strain (Fig. 6C), which we believe can be explained by two potential reasons. First, the number of mutations in heterochromatin identified in the wild-type strain (a total of 11) was very small, and a further decrease upon loss of Clr4 would be hard to detect, because at this level, any stochastic changes during the experiments could affect the readout. Second, loss of Clr4 relieves heterochromatic silencing, but it may not fully alter the chromatin structure of heterochromatin. Of note, in mammalian cells lacking both homologues of Clr4 (Suv39h1 and Suv39h2), DAPI-dense regions remained, despite disruption of heterochromatin silencing (53). This is an indication that the loss of H3K9 methylation does not fully abolish the structure of heterochromatin, and DNA-dense regions could still be formed in the cells.

Loss of Msh6 in S. pombe Altered the Chromatin Distribution of Mutations and Led to a Positive Correlation between Chromatin Accessibility and Mutation Rate-The explanation for the elevated mutation rate in heterochromatin is unclear. Due to the late replicating nature of heterochromatin, it has been proposed that replication occurring at late S phase might be more error-prone during DNA synthesis (54-56). However, this does not explain our observation that the low mutation rate in euchromatin became disproportionately increased by disrupting the mismatch repair machinery (Fig. 6D). We reasoned that preferential protection of euchromatin was probably due to the restricted access of the mismatch repair machinery at the heterochromatic regions. To directly interrogate the relationship between chromatin accessibility and mutation accumulation, we performed DNase I sequencing experiments and cross-examined chromatin accessibility data with our mutation accumulation data.
Currently, due to the large number of repetitive sequences, the reference genome of S. pombe is incomplete, especially at the heterochromatic region, which affects our DNase I sequencing analysis at heterochromatic regions. Therefore, we focused on euchromatin regions and analyzed the relationship between chromatin accessibility and the mutation rate within euchromatin.
The number of mutations accumulated in individual strains except msh6Δ was insufficient for statistical analysis. Nonetheless, these strains share a similar global DNase I sequencing profile, so we combined them as the msh6 wild-type group and pooled their mutations for further analysis. We first divided the yeast genome into 1-kb windows and scored their DNase I sequencing read density. Next we ranked all of these windows according to their read density and merged them into 100 units. We then calculated the mutation rate for each unit and plotted it against the DNase I sequencing read density. The mutation rate displayed a strong positive correlation with read density in msh6Δ strains (r² = 0.61, slope = 1.02) (Fig. 7A), and no such correlation was observed in the msh6 wild-type group (r² = 0.09, slope = 0.11) (Fig. 7B).
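The window ranking, merging, and regression steps can be sketched as follows. This is an illustration under stated assumptions rather than the code used in this study: windows are merged in equal-sized consecutive chunks after ranking, each unit is summarized by its mean read density, and an ordinary least-squares fit is used. Any scaling or transformation applied to the read density before plotting is not reproduced here. The function names and the toy data are hypothetical.

```python
def merge_into_units(windows, n_units=100):
    """Rank windows by read density and merge them into consecutive units.

    `windows` is a list of (read_density, n_mutations, length_bp) tuples.
    Returns one (mean_read_density, mutations_per_bp) pair per unit.
    Chunks are ceiling-sized, so the last unit may be smaller and, for
    some window counts, fewer than n_units units are produced.
    """
    ranked = sorted(windows, key=lambda w: w[0])
    size = -(-len(ranked) // n_units)  # ceiling division
    units = []
    for start in range(0, len(ranked), size):
        chunk = ranked[start:start + size]
        mean_density = sum(w[0] for w in chunk) / len(chunk)
        rate = sum(w[1] for w in chunk) / sum(w[2] for w in chunk)
        units.append((mean_density, rate))
    return units


def linear_fit(points):
    """Ordinary least squares on (x, y) pairs; returns (slope, r_squared)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx, sxy ** 2 / (sxx * syy)


# Toy data: 200 windows of 1 kb whose mutation count equals their read density.
toy_windows = [(d, d, 1000) for d in range(1, 201)]
toy_units = merge_into_units(toy_windows, n_units=100)
slope, r_squared = linear_fit(toy_units)
```

Merging windows into units before fitting averages out the sparsity of mutation counts in individual 1-kb windows, at the cost of resolution along the accessibility axis.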
As a control analysis, the single-nucleotide substitutions induced by MNNG that could bypass mismatch repair (Fig. 5A) did not show a positive correlation between DNase I sequencing read density and mutation rate in msh6Δ strains and other tested strains (Fig. 7C).
These data collectively suggest that the function of mismatch repair machinery is regulated by chromatin accessibility, and such regulation is one major player in setting the spontaneous mutation landscape across the genome.

Differential Contributions of Chromatin Accessibility in Human Cancers with or without Mismatch Repair Deficiency-The above data suggest that the mismatch repair machinery preferentially protects open chromatin regions that are highly accessible. Notably, a large number of human cancers are associated with microsatellite instability resulting from deficiency in the mismatch repair pathway (5,6,40,57-60). Recently, it has been revealed that microsatellite-instable cancer samples display an altered mutation landscape, and mutations arising after the inactivation of mismatch repair are no longer enriched in heterochromatin relative to euchromatin (61). The same study proposed that such a phenomenon might be caused by replication timing or chromatin accessibility (61). These observations, together with our fission yeast data (Fig. 7, A and B), prompted us to investigate whether human cancers with or without microsatellite instability may display distinct patterns of accumulated mutations at genomic regions that display different levels of accessibility within euchromatin.
We analyzed a public exome mutation data set of colorectal cancer samples obtained from 246 patients (the International Cancer Genome Consortium database, release 18). Colorectal cancers often display increased mutation rates due to the inactivation of the mismatch repair or other DNA repair pathways (5,40,57-60,62). These colorectal cancer samples were classified into two groups, a microsatellite-stable (MSS) group and a microsatellite highly instable (MSI-H) group, which represent cancer samples with functional or deficient mismatch repair pathways, respectively. Cancer samples in the MSI-H group displayed 5-fold more mutations than the MSS group, on average (Fig. 8, A and B), which is consistent with their deficiencies in the mismatch repair pathway.
To validate our hypothesis that the mutation landscape in cancer samples may be influenced by the accessibility of the local chromatin environment, we cross-analyzed the ENCODE DNase I accessibility sequencing data set of the colorectal origin cell line HCT-116 (63) and the aforementioned colorectal cancer exome mutation data set. We modified our fission yeast analysis pipeline slightly, by dividing the human genome into 10-kb windows, and we then selected windows containing at least 50 bp of exonic DNA (63,606 windows in total) for further analysis. We sorted all of these windows according to their DNase I sequencing reads and merged them into 100 units (DNase I sequencing reads from low to high, with units 1-99 containing 637 10-kb windows each and unit 100 containing 543 10-kb windows). We then calculated the number of mutations per megabase of exon length in each unit and plotted this number against the DNase I sequencing read density. For MSS cancers with a normal mismatch repair pathway, we observed that more mutations occurred within genomic regions with a lower DNase I sequencing read density (r_total = -0.86; Fig. 8C). We also noticed that this negative correlation appeared in two distinct linear phases. In genomic regions with DNase I sequencing read densities <2.2 (phase I), a negative correlation with a relatively sharp slope was observed (slope = -2.12, r = -0.65), whereas in genomic regions with DNase I sequencing read densities >2.2 (phase II), a negative correlation with a much flatter slope was observed (slope = -0.15, r = -0.86) (Fig. 8C). We reasoned that in relatively more closed chromatin regions (phase I), chromatin accessibility is probably the major rate-limiting factor that determines the access of DNA repair machineries, and it therefore has a greater impact on the accumulation of mutations. However, the impact of chromatin accessibility becomes less profound in more open chromatin regions (phase II), which are more freely accessible in general.
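The reported unit sizes are consistent with splitting the 63,606 ranked windows into ceiling-sized consecutive chunks. The sketch below checks this arithmetic; the chunking rule is an assumption that happens to reproduce the reported numbers, and the function name `unit_sizes` is hypothetical.

```python
def unit_sizes(n_windows, n_units=100):
    """Sizes of consecutive chunks when each holds ceil(n_windows / n_units)."""
    size = -(-n_windows // n_units)  # ceiling division
    sizes = []
    remaining = n_windows
    while remaining > 0:
        take = min(size, remaining)
        sizes.append(take)
        remaining -= take
    return sizes


sizes = unit_sizes(63_606)
```

The result is 99 units of 637 windows followed by one unit of 543 windows, matching the description of units 1-99 and unit 100.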
Importantly, we observed a dramatic change in the mutation landscape in the MSI-H cancer samples. The overall negative correlation between the mutation rate and chromatin accessibility was diminished (r_total = 0.16; Fig. 8D). This was primarily caused by changes toward the opposite direction within the closed chromatin regions. Instead of the negative correlation that was observed in the MSS cancer samples, the MSI-H cancer samples displayed a strong positive correlation between mutation rates and chromatin accessibility at closed chromatin regions (phase I) (slope = 6.39, r = 0.65). In contrast, the MSI-H cancer samples displayed much milder changes in the mutation landscape at open chromatin regions (phase II) (slope = -0.34, r = -0.58).
The above results indicate that in fission yeast and human cancer samples, some physical or biological aspect of chromatin structure contributes to the landscape of the mutation accumulation rate, probably involving differential accessibility of the DNA repair machinery to distinct chromatin regions.

Discussion
The observation that the chromatin environment can affect the rate at which mutations accumulate and the landscape representing the distribution of mutations across the genome (19,64-66) has prompted many studies that aim to reveal the mechanisms underlying these relationships. The accumulation of mutations is apparently the consequence of two potential causes, the rate at which DNA sequence errors occur and the DNA repair rate. Because regions with elevated mutation rates appear to be correlated with late replication of DNA sequences (16,61), it has been proposed that replication timing may be a major contributor to the uneven distribution in mutations observed across the genome, which may imply that late replicating forks are prone to DNA synthesis errors (16). On the other hand, nucleosome occupancy has been reported to reduce the accumulation of mutations, in particular the C to T transitions caused by cytosine deamination that probably occur less frequently in nucleosomal DNA that is protected by histones (18).
FIGURE 6. The mismatch repair-deficient strain displayed a preferentially elevated rate of mutations in euchromatin. A, number of mutations identified at uniquely (blue) and multiple (orange) aligned reads in the listed strains. The green line represents the ratio of mutations at multiple/uniquely aligned reads. B, genomic distribution of heterochromatin regions defined using H3K9me2 ChIP-seq data. *, heterochromatin at mating type regions is excluded for analysis in this study, because the reference genome of S. pombe is that of an h+ strain that lacks the region between mat2 and mat3. C, summary of the number of mutations in heterochromatin and euchromatin of listed strains. D, mutation rate in euchromatin (blue) and heterochromatin (orange) regions in the listed strains.
In this study, we attempted to directly investigate the contribution of epigenetic regulators to the control of the mutation rate using an experimental evolutionary system that allowed us to compare different S. pombe strains that contained deletions of various epigenetic regulators. In human cells, H3K36 methylation facilitates mismatch repair by recruiting human MSH6 (67). In our study, the set2Δ strain displayed no significant change in mutation rate. This could be explained by the fact that fission yeast S. pombe MSH6 lacks a PWWP domain that preferentially associates with methylated H3K36. We did observe a modest increase in mutation rates in strains lacking several other epigenetic regulators (Fig. 3, A and B), suggesting that loss of these epigenetic regulators, including components of the RNAi machinery, a histone methyltransferase, and histone chaperones, may affect the protection of genome fidelity. However, we also would like to point out that the observed level of change was relatively subtle, and the number of accumulated mutations was too small for further analysis. For these reasons, our observation may even be affected by stochastic events that occurred during the experimental procedure. We think that future investigations on a much greater scale may help to pinpoint the roles of these epigenetic regulators in the regulation of genome fidelity.
On the other hand, we observed a much more profound role of chromatin in genome protection. An elevated mutation rate at heterochromatin has been observed previously in various organisms (15,16,19,35). A number of reasons have been proposed as potential explanations, including error-prone late replication, chromatin structure-mediated DNA sequence protection, and chromatin accessibility for DNA repair. The dramatically elevated mutation rate and its preferential distribution in euchromatin observed in the fission yeast S. pombe msh6Δ strain (Fig. 6, C and D) provide direct evidence that the elevated mutation rate in heterochromatin is primarily due to preferential protection in euchromatin under wild-type conditions, whereas error-prone DNA synthesis during late replication may only be a minor contributor. Furthermore, we observed that the spontaneous mutation rate displayed a strong positive correlation with chromatin accessibility only when mismatch repair machinery was impaired, both in fission yeast (Fig. 7A) and in human cancer samples (Fig. 8D). These observations argue that chromatin accessibility-regulated mismatch repair is one major mechanism for an elevated mutation rate in heterochromatin regions. Although we cannot rule out contributions from potential error-prone DNA synthesis in late replicating heterochromatic regions, we identify restricted access of the mismatch repair machinery as one important contributor to the elevated mutation rate in these regions.
One interesting question is why chromatin can play a role in regulating mismatch repair, a process that largely accompanies DNA replication, during which chromatin structure must be unpacked. Of note, unpacking of chromatin structure during DNA replication is a transient event coupled with the passage of the replication fork. DNA exists in the context of chromatin immediately before and immediately after the passage of the replication fork (68-71). Thus, we reason that before the passage of the replication fork, chromatin accessibility can regulate the local concentration of mismatch machinery and therefore impact subsequent repair efficiency. Indeed, in mammalian cells, H3K36 methylation, a chromatin mark, facilitates the recruitment of mismatch repair machinery and promotes repair efficiency (67), which is a good example of the role of chromatin in facilitating mismatch repair during replication.
We hypothesize that the preferential protection of genetic fidelity in euchromatin has two beneficial roles. First, mutations in actively transcribed open chromatin regions may cause deleterious effects to cells and should therefore be preferentially suppressed. Second, the increased mutation rate observed in heterochromatin regions may allow the origin of new regulatory elements or even new genes during evolution. Indeed, it has been reported that many newborn enhancers in humans are derived from Alu repeat sequences (72), which are often embedded within less accessible genomic regions (73-78). Finally, this study also provides direct evidence indicating a regulatory role for epigenetic information in the control of genetic fidelity in addition to its well understood role in the regulation of transcription.

Experimental Procedures
S. pombe Knock-out Strains-In this study, all knock-out strains (Fig. 1A) were derived from the haploid wild-type strain LD331: h+. Knock-out strains were generated according to the methods described previously (79). DNA fragments containing the KanMX6 cassette for recombination were amplified from the pFA6a-KanMX6 plasmid using long primers that covered the upstream or downstream flanking sequences of the target genes. All mutant strains were verified by positive and negative PCR tests, as shown in Fig. 1B. For positive tests, the upstream primers were designed within the KanMX cassette, and the downstream primers were designed according to sequences downstream of the stop codon of the target genes. For negative tests, both primers were designed within the corresponding target genes. These knock-out strains were also confirmed by subsequent high-throughput sequencing data.
Mutation Accumulation Lines and Estimation of the Number of Cell Generations-All 40 mutation accumulation lines were started from single P0 colonies of eight strains. They were then serially passaged using a single-colony bottleneck culture approach on solid YES medium containing full supplements (80). To minimize variation in the number of cell generations across all of the strains, we consistently chose colonies that were ~1 mm in diameter for passage.
To estimate the number of cell generations in each passage, we randomly picked two colonies of each size, counted the cell numbers, and then estimated the number of cell divisions or generations. Cells went through ~20 generations in each passage under our culture conditions (Table 1).
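The conversion from a colony cell count to a number of generations can be illustrated with a minimal sketch. It assumes that each colony is founded by a single cell and grows by binary division, so the number of generations is the base-2 logarithm of the cell count. The function names and the example cell counts are hypothetical and are not taken from Table 1.

```python
import math
from typing import Sequence


def generations_per_passage(cells_in_colony: int) -> float:
    """Generations needed for one founder cell to reach the counted colony size.

    Assumption (not stated in the text): binary division from a single cell,
    so N cells correspond to log2(N) generations.
    """
    if cells_in_colony < 1:
        raise ValueError("a colony must contain at least one cell")
    return math.log2(cells_in_colony)


def mean_generations(cell_counts: Sequence[int]) -> float:
    """Average the per-colony estimates over the colonies that were counted."""
    return sum(generations_per_passage(n) for n in cell_counts) / len(cell_counts)


# Hypothetical counts for two ~1-mm colonies (illustrative values only).
example_counts = [1_000_000, 1_100_000]
print(round(mean_generations(example_counts), 1))
```

Under these assumptions, a colony of about one million cells corresponds to roughly 20 generations, which is the order of magnitude reported for each passage.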
DNA Library Preparation and Illumina Sequencing-Genomic DNA samples were sonicated into fragments with an average length of 500 bp using a Covaris M220 sonicator, and sequencing libraries were then prepared using KAPA Hyper Prep kits (KK8504), according to the manufacturer's instructions. We also designed 24 paired-end sequencing adapters with barcodes to multiplex the sequencing experiments.
Forty experimentally evolved lines (five lines per strain for the ago1Δ, clr4Δ, dcr1Δ, msh6Δ, pcf1Δ, set2Δ, slm9Δ, and WT strains), eight corresponding P0 strains, 15 MNNG-treated lines (five lines per strain for the clr4Δ, msh6Δ, and WT strains), and three corresponding untreated strains were sequenced using an Illumina HiSeq 2000 sequencer for PE 100 sequencing with an average insert size of 500 bp. Two additional WT P0 strains were sequenced using an Illumina HiSeq 2000 sequencer for PE 50 sequencing with an average insert size of 200 bp. Ten lines (five lines per strain for the ago1Δ and pcf1Δ strains) were treated with MNNG, and 25 lines (five lines per strain for the ago1Δ, clr4Δ, msh6Δ, pcf1Δ, and WT strains) were treated with EMS; in both cases, the corresponding untreated strains were used as controls. These lines and their controls were sequenced using an Illumina HiSeq 2500 sequencer for PE 125 sequencing with an average insert size of 500 bp.
Mutation Identification-Contaminating adapter sequences were removed using Cutadapt (81). We then used SolexaQA (82) (version 3.1.3) to filter the sequencing data. For each read, the longest contiguous segment with a quality score of >20 was retained. Reads were aligned to the fission yeast reference genome (NCBI Schizosaccharomyces_pombe_uid127) using BWA (83). Potential PCR duplicates were removed using samtools (version 0.1.18) (84). Local realignment around indels was then performed using IndelRealigner in the Genome Analysis Toolkit (GATK version 3.2.0) (32). UnifiedGenotyper in GATK was used with the parameters "-mbq 35 -ploidy 1 -stand_call_conf 30 -stand_emit_conf 10 -minIndelFrac 0.7" to call mutations. To filter mutations, VariantFiltration in GATK was used with the parameters "DP < 5.0 QD < 2 FS > 60 MQ < 40 MQRankSum < -12.5 ReadPosRankSum < -8.0" for single-nucleotide substitutions and with the parameters "DP < 5.0 QD < 2 FS > 200 ReadPosRankSum < -20.0" for indels. For single-nucleotide substitutions, the frequency of the mutant allele was required to be >75%. In addition, all mutations resulting in variations within the adjacent 10 bp were removed. Evolved strains were then compared with P0 to identify mutations. For reads that aligned to multiple sites, we chose those with a mismatch = 1 and randomly assigned one site for each mismatch. We then set the MAPQ to 40 and subjected them to mutation calling and filtering with GATK. The same settings were applied for mutation calling in uniquely aligned reads. When the same mutations were found in different copies of one given repeat group, they were merged. For each mutation that was identified in the multiply aligned reads, the base compositions of the P0 and P100 (or untreated and chemically treated) samples at the corresponding mutation site were computed, and we retained those mutations with a p value of <0.05 (Fisher's exact test). If opposite single-nucleotide substitutions were identified, both of them were discarded.
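The final comparison of base compositions between the ancestral and evolved samples can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in this study: it reduces the base composition at a site to a 2 × 2 table of reference-base and mutant-base read counts in the two samples and computes a two-sided Fisher's exact test p value from the hypergeometric distribution. The function name and the read counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(ref_p0: int, alt_p0: int, ref_p100: int, alt_p100: int) -> float:
    """Two-sided Fisher's exact test for a 2x2 table of read counts.

    Rows are samples (P0, P100); columns are reference and mutant base counts.
    Assumption: the base composition is collapsed to these two classes.
    """
    row1 = ref_p0 + alt_p0
    row2 = ref_p100 + alt_p100
    col1 = ref_p0 + ref_p100
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of seeing x reference reads in the P0 row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(ref_p0)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    total = sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)
    )
    return min(1.0, total)


# Hypothetical site: no mutant reads in P0, mostly mutant reads in P100.
p_value = fisher_exact_two_sided(ref_p0=12, alt_p0=0, ref_p100=2, alt_p100=10)
keep_mutation = p_value < 0.05  # retention threshold stated in the text
```

With the threshold given above, a site is retained only when the read counts differ significantly between the two samples, which guards against calling a repeat-copy difference that was already present in the ancestor.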
Mutation Validation-We chose ago1Δ P100-1 as the validation sample. From it, 23 mutations were identified using our data analysis pipeline. PCR primers were designed using Primer3 (libprimer3 release 2.3.6) (85,86), and DNA fragments containing mutation sites were amplified from genomic DNA. The amplified products were cloned for Sanger sequencing.
Categorization of Euchromatin and Heterochromatin Regions-The heterochromatin regions in fission yeast S. pombe were defined using previously published H3K9me2 ChIP-Seq data (52). The reads were aligned using Bowtie (87), and the peaks were called using MACS2 (88) with the parameter "-q 1e-6". Then peaks within 15 kb were merged to define the heterochromatin regions.
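The peak-merging step that defines heterochromatin regions can be sketched as a simple interval merge. The sketch assumes that "within 15 kb" means a gap of at most 15,000 bp between the end of one peak and the start of the next on the same chromosome; the function name and the example coordinates are hypothetical.

```python
from typing import List, Tuple

Peak = Tuple[str, int, int]  # (chromosome, start, end)


def merge_peaks(peaks: List[Peak], max_gap: int = 15_000) -> List[Peak]:
    """Merge peaks on the same chromosome separated by at most max_gap bp."""
    merged: List[Peak] = []
    for chrom, start, end in sorted(peaks):
        if merged and merged[-1][0] == chrom and start - merged[-1][2] <= max_gap:
            prev_chrom, prev_start, prev_end = merged[-1]
            # Extend the previous region; max() handles nested peaks.
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


# Hypothetical H3K9me2 peaks (coordinates are illustrative only).
example_peaks = [
    ("I", 10_000, 12_000),
    ("I", 1_000, 2_000),
    ("I", 40_000, 41_000),
    ("II", 500, 900),
]
regions = merge_peaks(example_peaks)
```

In this example the first two peaks on chromosome I lie 8 kb apart and are merged into one region, whereas the third peak lies 28 kb farther away and remains a separate region.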
DNase I Sequencing-Nuclei from S. pombe were prepared as described (89), with some modifications. Yeast cells were collected, washed, and then suspended in 5 ml of lysis solution (5 mM 2-mercaptoethanol, 1 M sorbitol) per g of cells (wet weight). Cells were digested for 40 min at 28°C with 8 mg/ml Zymolyase 20T (MP Biomedicals). The spheroplasts were washed with 1 M sorbitol and then resuspended in 4 ml of Ficoll buffer (18% (w/v) Ficoll, 20 mM KH2PO4, pH 6.8, 1 mM MgCl2, 0.25 mM EGTA, and 0.25 mM EDTA). After centrifugation at 13,000 × g, the nuclear pellet was washed with DNase I buffer (15 mM Tris-HCl, pH 7.5, 75 mM NaCl, 3 mM MgCl2, 0.05 mM CaCl2, and 1 mM 2-mercaptoethanol). The nuclear pellet was suspended in 2.4 ml of DNase I buffer/g of starting cells (wet weight). Aliquots of 200 μl each were digested with 10 units/ml DNase I (Beyotime Biotechnology) at 37°C for 5 min. Digestion was stopped with 1% SDS and 10 mM EDTA, pH 8.0. Then 5% (v/v) proteinase K (20 mg/ml) was added, and the samples were incubated overnight. NaClO4 (1 M) was added to maximize the isolation of DNA from chromatin, and DNA samples were treated with RNase A (10 mg/ml). A DNase I sequencing library was prepared similarly to the genomic sequencing library described above, with one modified step of size selection for DNA fragments between 160 and 400 bp.
The data set supporting the results of this article is publicly available at the Sequence Read Archive of NCBI under the accession number SRP065655. All barcodes have been removed from our deposited sequencing data.
Correlation Analysis between Mutation Rates and DNA Accessibility in Colorectal Cancers-The somatic mutations called from colorectal cancer exomes were downloaded from the International Cancer Genome Consortium database (release 18). Exomes with 50 or more mutations were retained, including 179 samples in the COAD-US project and 67 samples in the READ-US project. The microsatellite instability data for these cancer samples were downloaded from the NCI, National Institutes of Health, TCGA Data Portal. The DNA accessibility data for the HCT-116 cell line were downloaded from two DNase I hypersensitivity data files in ENCODE.
We divided the human genome into 10-kb windows. Only windows with at least 50 bp of exonic DNA were retained. For each window, the mutation rate and the DNase I sequencing read density were calculated. The windows were then sorted according to their DNase I sequencing read densities and divided into 100 units. The total mutation rate and the mean DNase I sequencing read density were calculated for each unit.
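The binning procedure can be sketched as follows. The sketch makes two assumptions that are not specified in the text: the 100 units contain approximately equal numbers of windows, and the total mutation rate of a unit is the number of mutations in its windows divided by their combined exonic length. The class, function, and field names are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Window(NamedTuple):
    dnase_density: float  # DNase I sequencing read density of the 10-kb window
    mutations: int        # somatic mutations falling in the window's exons
    exonic_bp: int        # exonic bases contained in the window


def bin_windows(
    windows: List[Window], n_units: int = 100, min_exonic_bp: int = 50
) -> List[Tuple[float, float]]:
    """Return (mean DNase I density, total mutation rate) for each unit."""
    kept = sorted(
        (w for w in windows if w.exonic_bp >= min_exonic_bp),
        key=lambda w: w.dnase_density,
    )
    n = len(kept)
    units: List[Tuple[float, float]] = []
    for i in range(n_units):
        # Integer arithmetic yields near-equal chunk sizes (an assumption).
        chunk = kept[i * n // n_units:(i + 1) * n // n_units]
        if not chunk:
            continue
        mean_density = sum(w.dnase_density for w in chunk) / len(chunk)
        total_rate = sum(w.mutations for w in chunk) / sum(w.exonic_bp for w in chunk)
        units.append((mean_density, total_rate))
    return units


# Synthetic windows for illustration; the last one fails the 50-bp exon filter.
example_windows = [Window(float(i), 1, 100) for i in range(200)] + [Window(999.0, 5, 10)]
example_units = bin_windows(example_windows)
```

Pooling windows into units before computing the rate reduces the noise that would result from per-window rates in exomes carrying only tens of mutations, so the correlation with accessibility is assessed on 100 aggregated points.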