The Myotonic Dystrophy Type 1 Triplet Repeat Sequence Induces Gross Deletions and Inversions*

The capacity of (CTG·CAG)n and (GAA·TTC)n repeat tracts in plasmids to induce mutations in DNA flanking regions was evaluated in Escherichia coli. Long repeats of these sequences are involved in the etiology of myotonic dystrophy type 1 and Friedreich's ataxia, respectively. Long (CTG·CAG)n (where n = 98 and 175) caused the deletion of most, or all, of the repeats and the flanking GFP gene. Deletions of 0.6–1.8 kbp were found as well as inversions. Shorter repeat tracts (where n = 0 or 17) were essentially inert, as observed for the (GAA·TTC)176-containing plasmid. The orientation of the triplet repeat sequence (TRS) relative to the unidirectional origin of replication had a pronounced effect, signaling the participation of replication and/or repair systems. Also, when the TRS was transcribed, the level of deletions was greatly elevated. Under certain conditions, 30–50% of the products contained gross deletions. DNA sequence analyses of the breakpoint junctions in 47 deletions revealed the presence of 1–8-bp direct or inverted homologies in all cases. Also, the presence of non-B folded conformations (i.e. slipped structures, cruciforms, or triplexes) at or near the breakpoints was predicted in all cases. This genetic behavior, which was previously unrecognized for a TRS, may provide the basis for a new type of instability of the myotonic dystrophy protein kinase (DMPK) gene in patients with a full mutation.

Myotonic dystrophy type 1 (DM1) is an autosomal dominant neuromuscular disease that exhibits a high incidence (~1:8000) and shows frequent mortality in affected infants (1). An unstable region on chromosome 19q13.3 was discovered as the genetic basis of DM1. A polymorphic locus was found to be larger in DM1 patients (1-3), because of substantial expansions of a CTG·CAG repeat tract in the 3′-untranslated region of the myotonic dystrophy protein kinase (DMPK) gene (1). As many as 3000 repeats (9000 bp) have been found in some patients, expanded from the normal range of 5-37 repeats. DM1 displays a non-Mendelian inheritance pattern.
The molecular mechanisms responsible for this genetic instability have been extensively investigated in recent years in bacteria, yeast, cell culture, and mouse systems (reviewed in Refs. 1-4). DNA replication (5-9), repair (10-13), and recombination (14-16) are involved, probably acting in concert with other factors/processes, such as single-strand DNA-binding proteins (17) and transcription (9, 18). Also, the long CTG·CAG repeat tract can adopt an unusual flexible and writhed conformation (19), which may promote the formation of slipped structures (9, 20, 21) with a transiently formed, quasistable, long CTG sequence along with an unpaired and unstacked long CAG complementary strand. These types of preferential single-strand stabilities and DNA conformational behaviors are integral to the interpretation of the genetic instability effects of TRS orientation relative to the direction of DNA replication (1-4, 9). A 2.5-kbp poly(purine·pyrimidine) tract from the human polycystic kidney disease 1 gene (PKD1), which is known to form triplexes, slipped structures, and other non-B DNA conformations (22), induced long deletions and other instabilities in plasmids that were manifested by mismatch repair and, in some cases, transcription. The breakpoints occurred at or near the predicted non-B DNA conformations. Distance measurements also indicated a significant proximity of alternating purine·pyrimidine and oligo(purine·pyrimidine) tracts to breakpoint junctions in 222 gross deletions and translocations, respectively, involved in human diseases. In 11 of these deletions, which were analyzed in detail, the breakpoints were explicable by non-B DNA structure formation. Hence, Bacolla et al. (21) concluded that alternative DNA conformations trigger genomic rearrangements through recombination-repair activities.
Also, a substantial literature is growing on the role of non-B DNA conformations involving low copy repeats in genomic rearrangements (deletions, inversions, duplications, translocations, etc.) associated with human diseases (reviewed in Ref. 23).
Here we have substantially extended these studies by exploring the capacity of CTG·CAG and GAA·TTC repeat tracts of various lengths, extents of interruptions (polymorphisms), and orientations to serve as mutagens, as a function of transcription. Most surprisingly, long CTG·CAG repeat tracts promoted the formation of inversions and long deletions (0.6-1.8 kbp) that removed part or all of the repeats as well as the flanking GFP reporter gene. This behavior, if found in humans, implies that the DMPK gene flanking the long CTG·CAG repeats in patients may be subject to deletions and rearrangements, thus labilizing the DMPK protein.

EXPERIMENTAL PROCEDURES
Escherichia coli Strains-Strains JTT1 and AB1157 were obtained from the E. coli Genetic Stock Center, Yale University, New Haven, CT.
The E. coli strains KMBL1001 and CS5428 were obtained from Dr. Nora Goosen (Leiden Institute of Chemistry, The Netherlands). The E. coli strains JJC510 and KA796 were kind gifts from Dr. Benedicte Michel (Institut National de la Recherche Agronomique, France) and Dr. R. Schaaper (NIEHS, National Institutes of Health, Research Triangle Park, NC), respectively. The E. coli strain RW118 was obtained from Dr. Roger Woodgate (NICHD, National Institutes of Health, Bethesda, MD). Table I shows the relevant nomenclature and genetic features of these strains.
Parental Plasmids-The (CTG·CAG)n repeats were obtained from pRW3244, pRW3246, and pRW3248 that contain (CTG·CAG)17 (5), (CTG·CAG)98 (5), and (CTG·CAG)175 (5), respectively. The inserts were cloned into the HincII site of the polylinker of the pUC19NotI vector (5, 24). The (CTG·CAG)17 and (CTG·CAG)98 tracts are pure (uninterrupted), and the (CTG·CAG)175 sequence contains two G to A interruptions at repeats 28 and 69 (5). All of the repeat tracts have 19 and 41 bp of nonrepetitive human flanking sequences 5′ and 3′ of the (CTG·CAG)n repeats, respectively (5, 25).
The (GAA·TTC)n repeats were obtained from pMP141 and pRW3808 that contain (GAA·TTC)9 and (GAA·TTC)176, respectively (26). Both repeating tracts are pure (uninterrupted) and have ~254 and ~354 bp of human sequences, respectively, flanking the repeats (26). pRW3808 is a derivative of pUC18NotI containing the (GAA·TTC)176 fragment inserted into the BamHI site of the vector. pMP141 was a derivative of pUC18NotI containing the (GAA·TTC)9 fragment inserted into the EcoRI and PstI sites of the vector (26).
Cloning of the (CTG·CAG)n and (GAA·TTC)n Sequences into pGFPT-The strategy was to clone the TRS downstream from and in close proximity to the GFP gene. The fragments containing the CTG·CAG and the GAA·TTC sequences were recloned into the pGFPT vector plasmid (Fig. 1). The fragments were prepared as follows: the parental plasmids containing (CTG·CAG)17, (CTG·CAG)98, and (CTG·CAG)175 were digested with EcoRI and NotI (New England Biolabs, Inc.), and these inserts were used to obtain clones with the repeats in orientation I (defined under "Experimental Procedures" and see Refs. 5 and 25). In order to obtain clones in orientation II, the (CTG·CAG)98 insert was excised using HindIII and NotI digestion. The recessed 5′ terminus of this fragment was filled in with 0.1 unit of the Klenow fragment of E. coli DNA polymerase I (U. S. Biochemical Corp.) and the four dNTPs (0.1 mM each). To prepare fragments of CTG·CAG repeat (n = 175 and 17) for cloning in orientation II, the parental plasmid pRW3248 was digested with NotI, whereas cleavage of pRW3244 with EcoRI/HindIII was followed by filling in the termini with 0.1 unit of the E. coli DNA polymerase I Klenow fragment for blunt-end ligation. The parental plasmid containing (GAA·TTC)9 was digested with EcoRI and EcoRV, whereas the plasmid harboring (GAA·TTC)176 was treated with EcoRI and XbaI.
The digested DNA fragments were separated by electrophoresis on 6% polyacrylamide gels in TAE buffer (40 mM Tris acetate, 1 mM EDTA, pH 8.0). The gels then were stained with ethidium bromide, and the bands containing the triplet repeat fragments were excised. The DNA was then eluted and further purified by repeated phenol/chloroform extractions and precipitation with ethanol (27).
For clone preparation with inserts in orientation I, the fragments containing (CTG·CAG)n (where n = 17, 98, and 175) were ligated to pGFPT digested with EcoRI and EagI (EagI and NotI have compatible sticky ends). To prepare the vector for cloning (CTG·CAG)98 in orientation II, the EcoRI/EagI digestion of pGFPT was followed by filling in the recessed 5′ termini with 0.1 unit of the E. coli DNA polymerase I Klenow fragment for the blunt-end ligation. For the cloning of (CTG·CAG)n fragments (where n = 17 and 175) in orientation II, the vector was linearized with StuI and EagI, respectively. The pGFPT vector for cloning the fragments containing the (GAA·TTC)n in orientation II was prepared as follows. To ligate the (GAA·TTC)9-containing fragment, the vector was cleaved with EcoRI and StuI, whereas for the ligation of the (GAA·TTC)176 repeat sequence, the pGFPT was cleaved by EcoRI and SpeI (SpeI and XbaI have compatible sticky ends).
The digested DNA was electrophoresed, and the appropriate bands were excised and eluted, as described above. The vector and the insert were mixed in a molar ratio of ~1:10 and ligated for 16 h at 16°C by the addition of 20 units of T4 DNA ligase (U. S. Biochemical Corp.). The ligation mixture was ethanol-precipitated and transformed by electroporation (2.5 kV, cuvette size 0.2 mm) into E. coli HB101 (27) and plated on LB agar plates containing 100 µg/ml ampicillin (Ap). The transformants were screened by using a long wavelength UV lamp and were used to prepare a liquid culture. The plasmids were isolated, purified by CsCl density gradient ultracentrifugation (27), and characterized by restriction mapping to verify the lengths of the cloned TRS. Also, the plasmids were sequenced using a ThermoSequenase Radiolabeled Terminator Cycle Sequencing Kit (U. S. Biochemical Corp.). The sequencing reactions were carried out according to the manufacturer's recommendations using the following primers: AP1 (CGAATTCGAGCTCGGTACCCGGG) and AP2 (GCAGGTCGGCCTCAGCCTGGCCG) (Genosys). The products of the sequencing reactions were analyzed on 6% Long Ranger Gels (FMC BioProducts), containing 7.5 M urea, in the glycerol-tolerant gel buffer (U. S. Biochemical Corp.). The gels were dried and exposed to x-ray film. The superhelical forms of the DNA containing the undeleted CTG·CAG and GAA·TTC tracts in orientations I and II (5, 25) were used for all subsequent experiments. The pGFPT vector was used as a control (Fig. 1).
Transformation of TRS-containing Plasmids into E. coli-pGFPT or its derivatives containing the TRS were transformed into the appropriate E. coli strains by electroporation (27). The transformation mixture was used to inoculate 10 ml of LB media containing ampicillin (100 µg/ml) and isopropyl β-D-thiogalactoside (IPTG) (2 mM). The cultures were incubated at 37°C and grown until they reached an A600 of ~0.8-1.0 (18). An aliquot (10 µl) was inoculated into 10 ml of fresh LB media (with Ap and IPTG, as before). Recultivations of the cultures were repeated through five growth cycles for these population studies. After the 1st, 3rd, and 5th growth cycles, the cell populations were harvested, and DNA was isolated by alkaline lysis using the Wizard Plus Miniprep DNA Purification System (Promega). The plasmids were then digested with EcoRI and EagI to release the CTG·CAG- and GAA·TTC-containing fragments, which were radiolabeled with [α-32P]dATP and electrophoresed on a 7% native polyacrylamide gel. The gels were then analyzed using a PhosphorImager (Storm 820, Amersham Biosciences).
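As a rough aid for interpreting the re-cultivation scheme, the following sketch estimates how many cell doublings one passage may represent. It rests on an assumption that is not stated in the text, namely that each 10-µl inoculum regrows in 10 ml of fresh medium to about the density at which it was sampled, so that a passage spans roughly log2 of the dilution factor. The function names are illustrative only.

```python
import math


def dilution_factor(inoculum_ul: float, fresh_medium_ml: float) -> float:
    """Fold dilution when an inoculum is added to fresh medium."""
    total_ul = inoculum_ul + fresh_medium_ml * 1000.0
    return total_ul / inoculum_ul


def generations_per_passage(fold_dilution: float) -> float:
    """Doublings needed to regrow to the pre-dilution density (assumption)."""
    return math.log2(fold_dilution)


fold = dilution_factor(10.0, 10.0)  # 10 µl of culture into 10 ml of LB
per_passage = generations_per_passage(fold)

# Growth cycles 2-5 start from a diluted culture. Cycle 1 starts from the
# transformation mixture, so its generation count is not estimated here.
passages_from_dilution = 4
total_generations = passages_from_dilution * per_passage

print(f"fold dilution per passage: {fold:.0f}")
print(f"estimated generations per passage: {per_passage:.1f}")
print(f"estimated generations over cycles 2-5: {total_generations:.0f}")
```

Under this assumption each passage corresponds to about ten doublings, which gives a sense of how many rounds of replication a plasmid population has undergone by the 3rd and 5th growth cycles.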
Conditions of Bacterial Growth for the Screening of White Colonies upon Transcription Activation or Repression-Six parental E. coli strains were used to evaluate the capacity of the TRS to induce mutations in the upstream reporter GFP gene. To identify single green and white colonies ("white CFUs"), the liquid cultures from growth cycles 1, 3, and 5 were spread on LB agar plates containing Ap (150 µg/ml) and IPTG (2 mM), and the plates were incubated at 37°C. By using a long wavelength UV lamp, green and white colonies were counted. In each experiment, ~100 white colonies from the 1st, 3rd, and 5th growth cycles were re-streaked three times on LB agar plates (Ap and IPTG as before) to be certain that the loss of fluorescence was a permanent phenotype (22). Approximately 30 white colonies were transferred to LB liquid media and grown at 37°C, and the DNA was isolated. This DNA was used for restriction analyses, DNA sequencing, and re-transformation into new competent E. coli cells to confirm the white phenotype. Parallel experiments were also conducted with the pIQ-kan repressor (28) (gift from Dr. Richard P. Bowater, University of East Anglia, UK) to turn off transcription (21). For the purpose of these studies, the E. coli strains (Table I) were first transformed with pIQ-kan expressing the repressor and then subsequently with the appropriate plasmids harboring the CTG·CAG and GAA·TTC repeats (Fig. 1). The transformation mixture was inoculated into 10 ml of LB media containing ampicillin (100 µg/ml) and kanamycin (50 µg/ml), and the recultivation assay was conducted as described earlier. After the 1st, 3rd, and 5th growth cycles, the cultures were plated on LB agar plates containing Ap (150 µg/ml), kanamycin (50 µg/ml), and IPTG (2 mM) and incubated at 37°C. After the green and white colonies were counted and the white ones re-streaked (three times), the liquid cultures were prepared, and DNA was isolated as described previously.
The utilization of different parental strains enabled a preliminary survey of the potential role of genetic backgrounds on the mutagenic process.
Sequencing Primers-The primers for sequencing the regions upstream of the GFP gene and downstream of the CTG·CAG as well as the GAA·TTC repeat sequences were obtained from MWG Biotec (High Point, NC) or from Sigma Genosys (The Woodlands, TX). The forward primer at position 52 of pRW5301 was GCAGCTGGCACGACAGGTTTCC. The reverse primers at position 1509 or 1846 of the same plasmid were CAAGCTGTGACCGTCTCCG and CAGGGTTATTGTCTCATG, respectively. In some cases, it was necessary to use a primer at the origin of replication. This forward primer at position 3390 was GCTTCCAGGGGGAAACGCCTG. To detect breakpoints of the mutants of pRW5304, the reverse primer GGCGTATCACGAGGCCCTTAAG at position 2459 and the forward primer GCTTCCAGGGGGAAACGCCTG at position 4147 of the plasmid were employed. The primers and the plasmids were used at a concentration of 10 pmol/µl and 200 ng/µl, respectively, for the sequencing reactions.
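For readers who wish to check the primers against a plasmid map, the sketch below reports the length, GC fraction, and reverse complement of each oligonucleotide listed above, written as continuous sequences. The helper functions and dictionary labels are illustrative and were not part of the original protocol.

```python
# Translation table for Watson-Crick complements of unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C residues in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


# Labels are descriptive only; positions are those quoted in the text.
PRIMERS = {
    "pRW5301 forward (52)": "GCAGCTGGCACGACAGGTTTCC",
    "pRW5301 reverse (1509)": "CAAGCTGTGACCGTCTCCG",
    "pRW5301 reverse (1846)": "CAGGGTTATTGTCTCATG",
    "origin forward (3390 / 4147)": "GCTTCCAGGGGGAAACGCCTG",
    "pRW5304 reverse (2459)": "GGCGTATCACGAGGCCCTTAAG",
}

for label, seq in PRIMERS.items():
    print(
        f"{label}: {len(seq)} nt, GC {gc_fraction(seq):.2f}, "
        f"reverse complement {reverse_complement(seq)}"
    )
```

The reverse complement of a reverse primer is the form that would be searched for on the top strand of a plasmid sequence file.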
DNA Sequencing-The DNA isolated from the 47 mutant white colonies was sequenced from both strands by using the primers described above. The DNA was sequenced in the Molecular and Human Genetics Sequencing Core at the Baylor College of Medicine, Houston. The DNA of the mutants was analyzed by cycle sequencing using a GeneAmp PCR System 9700 and the ABI 3700 Sequencer. Cycle sequencing conditions were as follows: initial denaturation at 96°C for 10 min, and 25 cycles of heating (96°C, 10 s), annealing (50°C, 30 s), and elongation (60°C, 4 min).

RESULTS
Strategy of Study-The GFP gene served as a reporter to study the influence of the triplet repeat tracts (CTG·CAG)n and (GAA·TTC)n on mutations in flanking sequences. All the TRS were cloned into the region of the vector that was adjacent and downstream to the GFP gene (Fig. 1), so that repair of the non-B DNA structures, which may form at the repeat sequences (21), could be analyzed.
The (CTG·CAG)n-containing fragments (where n = 17, 98, and 175) were cloned either in orientation I or II, relative to the origin of replication; for the plasmids containing (CTG·CAG)n tracts in orientation I, the CTG repeat is in the leading strand template, whereas for the plasmids in orientation II, the CTG repeat is in the lagging strand template (5, 24, 25, 29, 30). The (GAA·TTC)n repeat sequences (where n = 9 and 176) were cloned into the pGFPT vector only in orientation II, where the GAA repeats are in the lagging strand template; orientation II is less genetically stable than orientation I (31, 32). As a control, the pGFPT vector with no repeat sequences was used (Fig. 1). We performed the experiments either with transcription activation or repression and conducted five successive re-cultivation steps. To activate transcription, experiments were performed in the presence of 2 mM IPTG, whereas co-transformation with pIQ-kan ensured that the lacIQ repressor inhibited GFP transcription from the lacZ promoter (22, 28).
Fraction of White CFUs Depends on the Type and Length of the TRS Sequence-The genetic instability of the TRS depends on the length of the repeat tracts (12, 25). To determine whether repeat tracts of different lengths influenced mutations in sequences flanking the repeats, we transformed the E. coli strains (Table I) with the nine plasmids listed in Fig. 1, harboring (CTG·CAG)n or (GAA·TTC)n repeat sequences. To identify single colonies, liquid cultures at the end of the 1st, 3rd, and 5th re-cultivations were spread on IPTG-containing agar plates, and the number of green and white colonies was counted. To verify the fluorescent status of the cells used to start each re-cultivation, the transformation mixture was immediately plated on LB plates, and the green fluorescence of the cells was determined. The cells used for each re-cultivation assay contained plasmids with a functional GFP reporter gene, because all colony-forming units ("CFUs") were fluorescent. Therefore, all white colonies arose during the re-cultivation growths of the cells (Fig. 1). The fraction of white CFUs was calculated as the ratio of the number of white colonies to the total number of viable cells (green and white).
Several "parental" E. coli strains that are genotypically different were studied. This diversity may influence the cellular behavior. To investigate a possible role of DNA repair in TRS-induced mutagenesis, while taking into account the genotypic variability, we examined whether the presence of the TRS increased the mutations in six different parental E. coli strains, all proficient in the four main repair pathways (methyl-directed mismatch repair, nucleotide excision repair, transcription-coupled repair, and base excision repair). The results of screening for white colonies, when plasmids contained (CTG·CAG)n tracts of different lengths (n = 0, 17, 98, and 175) and (GAA·TTC)176, are shown in Table II. The data revealed that the loss of fluorescence of the GFP reporter gene depends on the presence of the CTG·CAG tract, because the fraction of white CFUs increased with the length of the repeat tract. The total number of CFUs analyzed ranged from 18,700 to 68,237, with an average of 34,718 CFUs in each experiment. Table II is a summation of all data for plasmids in both orientations and with and without transcription, in order to present the composite global results.
For all six strains, no white CFUs were found (Table II) with plasmids that lacked the TRS. When plasmids contained (CTG·CAG)17, mutants were found in KMBL1001 and KA796 strains at a frequency of 0.0004 and 0.001, respectively. A distinct increase in the fraction of mutants was found when longer CTG·CAG sequences, either pure or interrupted, were present. For plasmids harboring the uninterrupted (CTG·CAG)98, the highest fraction of white CFUs (0.31) was found when the DNA was propagated in KMBL1001. A slightly lower fraction of mutants was formed in the JJC510 and JTT1 strains (Table II), whereas in KA796, the fraction of white CFUs reached a level of 0.001. There was no effect on the loss of fluorescence when the re-cultivation assays were conducted after transformation of AB1157 and RW118 strains with plasmids containing the uninterrupted (CTG·CAG)98 sequence (Table II). In all experiments performed with plasmids harboring the interrupted (CTG·CAG)175 insert, a substantial fraction of white CFUs was found, up to 0.231 (Table II), in all strains tested. The lowest fraction (0.003) was observed in experiments performed in KA796.
FIG. 1. Plasmids used in this study. All plasmids were derivatives of pGFPT (named pRW3619 in Ref. 21), which contains a transcription terminator cassette cloned into the SapI site of pGFPuv (Clontech). The CTG·CAG-containing fragments of different lengths (solid gray segment) were cloned into the EcoRI and EagI recognition sites of the pGFPT (for details see "Experimental Procedures"). Orientations I and II were defined by the presence of CTG or CAG repeats, respectively, on the leading strand template for DNA replication. With the exception of the (CTG·CAG)175 repeat sequence, which is not a pure tract because it contains two G to A interruptions at repeats 28 and 69 (4), the CTG·CAG tracts of length 17 and 98 are pure (uninterrupted). The GAA·TTC-containing fragments were also cloned downstream of the GFP gene. The tracts are uninterrupted triplet repeats. Cross-hatched arrow, pUC19 origin of replication; box with large X, transcription terminator sequence; short black arrow, lacZ promoter-operator (Pr); solid black segment, lacZ-GFP fusion gene; long open arrow, ampicillin resistance gene; E, EcoRI; A, EagI recognition sites.
In contrast, after transformation of E. coli with plasmids containing GAA·TTC sequences in parallel experiments, no effect of the repeat tracts on the loss of fluorescence was detected (Table II). For the longer (GAA·TTC)176, white CFUs were formed only when the re-cultivation was conducted in E. coli ΔUvrA (data not shown), where 76 mutants were found out of a total of 47,585 CFUs (0.0016). Also, no effect was found for the shorter (GAA·TTC)9 repeat in any of the six strains listed in Table II (data not shown).
We conclude, first, that the myotonic dystrophy type 1 repeat tract is much more prone to induce loss of fluorescence from the adjacent GFP gene than the Friedreich's ataxia GAA·TTC repeat sequence. Second, the fraction of white mutants formed in the presence of the CTG·CAG tract increased with the length of the repeat. We speculate that quasi-stable slipped non-B DNA structures formed by the long CTG·CAG repeat tracts elevated the frequency of deletions in a pathway that depended on DNA repair proteins (see "Discussion").
CTG·CAG Repeats Are More Mutagenic in Orientation II than in Orientation I-It is well established that the orientation of the repeat sequences relative to the replication origin plays an important role in their stability (12). This behavior was attributed to a higher propensity of the CTG repeats than the CAG repeats to form stable hairpin structures on the lagging strand template (1, 12). To analyze whether the orientation of the repeat sequences influenced the loss of fluorescence (the fraction of white CFUs), we conducted growth studies in the six E. coli strains (Table I). Table III shows that the mutagenic effect was more pronounced when the (CTG·CAG)n sequence was in orientation II. We found the long and pure (CTG·CAG)98 to be mutagenic in both orientations when re-cultivations were conducted in the KMBL1001, JJC510, and JTT1 strains (Table III). In KMBL1001 cells, there were 11,427 white CFUs out of 24,888 CFUs (0.459) for orientation II and 9,961 out of 43,349 for orientation I (0.229) (Table III). We also found a higher fraction of white CFUs for orientation II

TABLE II Mutagenesis induced by CTG·CAG and GAA·TTC repeat sequences in E. coli DNA repair-proficient cells
The strategy to determine the fraction of white CFUs was described under "Experimental Procedures." Briefly, after transformation, the E. coli cells were plated onto LB plates containing Ap and IPTG and incubated at 37°C. The fluorescence of the colonies harboring a functional GFP gene was determined by exposure to a long wavelength UV lamp. The fraction of white CFUs was calculated by dividing the number of white mutants by the total number of viable colonies (green and white). The fractions shown as zero (Tables II-IV) represent data where no white CFUs at all were ever found, even in multiple repeat experiments. The data represent the combined results of three or more independent experiments consisting of the five-step re-cultivation protocol. Data for each length of the CTG·CAG repeats represent the composite results obtained for orientations I and II, as well as when transcription was both turned on and off. Data for GAA·TTC (orientation II) were obtained under the same experimental conditions. The fraction of white CFUs (bold font) is shown as mutant/total CFUs.
(Table III). This anomalous result is probably linked to the inviability of the cells harboring pRW5305, because a considerably smaller number of CFUs was observed on LB plates. Hence, this low frequency of white CFUs may have been due to plasmid loss (22) or deletion events affecting the ampicillin gene and/or the replication origin. The orientation dependence of the fraction of white CFUs was extreme for the longer interrupted (CTG·CAG)175 repeats (Table III). When the re-cultivations of the cells harboring plasmids containing this longer tract cloned in orientation I (pRW5302) were conducted, no white CFUs were detected in any of the six strains. In contrast, the same tract in orientation II showed a large number of white mutants. The highest frequency was found in JJC510 (14,433 white CFUs out of 25,707 total CFUs) (0.561). A somewhat lower fraction was found in KMBL1001 cells (8,152 white CFUs out of 22,340 total viable cells) (0.365) (Table III). A similar fraction of deleted mutants was detected in strains RW118 and AB1157 (2,961 white CFUs out of a total of 15,065 CFUs (0.196) and 4,165 out of 19,014 (0.219), respectively). The lowest fraction (94 out of 11,727) (0.008) was counted in KA796 (Table III). For the shorter (CTG·CAG)17 in KMBL1001 and KA796, white CFUs were found only for orientation II in a ratio of 0.0016 and 0.0017, respectively.
In summary, these data show that the orientation, as well as the length, of the CTG·CAG sequence is an important factor that influences the fraction of white CFUs.
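The fractions quoted in this section follow directly from the colony counts. As a minimal sketch, the snippet below recomputes several of them from the white and total CFU counts given in the text and derives the orientation II to orientation I ratio for (CTG·CAG)98 in KMBL1001. The function and variable names are illustrative; only the counts come from the text.

```python
def white_fraction(white: int, total: int) -> float:
    """Fraction of white CFUs: white colonies / all viable colonies."""
    if total <= 0:
        raise ValueError("total CFU count must be positive")
    return white / total


# (white CFUs, total CFUs) for (CTG·CAG)175 in orientation II, by strain.
ORIENTATION_II_175 = {
    "JJC510": (14433, 25707),
    "KMBL1001": (8152, 22340),
    "RW118": (2961, 15065),
    "AB1157": (4165, 19014),
    "KA796": (94, 11727),
}

fractions_175 = {
    strain: white_fraction(white, total)
    for strain, (white, total) in ORIENTATION_II_175.items()
}
for strain, fraction in fractions_175.items():
    print(f"{strain}: {fraction:.4f}")

# (CTG·CAG)98 in KMBL1001: orientation II versus orientation I.
fraction_ii = white_fraction(11427, 24888)
fraction_i = white_fraction(9961, 43349)
orientation_ratio = fraction_ii / fraction_i
print(f"KMBL1001, n = 98: II/I ratio = {orientation_ratio:.2f}")
```

On these counts the white fraction for (CTG·CAG)98 in KMBL1001 is roughly twice as high in orientation II as in orientation I, which is the comparison the text draws qualitatively.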
TABLE III ... sequences (n = 17, 98, and 175)
The orientation dependence of the repeat sequences and the loss of fluorescence from the GFP reporter gene are presented as the fraction of white CFUs. Growth cycles of E. coli cells harboring plasmids with the repeat sequences in both orientations were conducted in the presence or absence of transcription (data are combined). The fractions were calculated as described in the legend to Table II.
a For orientation II of (CTG·CAG)98 in JTT1, a small fraction of mutants, as compared with orientation I, may be linked to a reduced viability of the cells harboring pRW5305 due to plasmid loss and deletion events affecting the origin of replication and/or the ampicillin gene.

TABLE IV Transcription through the long CTG·CAG sequences increases the fraction of white CFUs
For experiments with inactive transcription, E. coli cells harboring pIQ-kan were transformed with the designated plasmids and grown in LB medium without IPTG. Parallel experiments with active transcription were conducted in the presence of IPTG, after the bacterial cells were transformed with the TRS-containing plasmids (Table I). In order to visualize green colonies, IPTG was included in the agar plates in all cases (see "Experimental Procedures"). The data shown are the summation of the results for each length of the TRS in orientations I and II.

Transcription through the CTG·CAG Sequence Increases the Fraction of White CFUs-Transcription has been shown to induce mutations that invoke DNA repair and recombination (33-38). Induction of transcription in long CTG·CAG repeats contained on plasmids in E. coli revealed an increase in the frequency of deletions within the repeat tract (18). Therefore, we tested whether active transcription through the DM1 TRS influenced the fraction of mutations in sequences flanking the repeats. Table IV shows the composite data on the distribution of white CFUs found for plasmids containing CTG·CAG sequences in orientations I and II, when experiments were conducted in the presence (IPTG) or the absence (no IPTG) of transcription through the repeats. Propagation of plasmids harboring (CTG·CAG)98 in the absence of transcription (no IPTG) gave rise to no, or very few, white CFUs (Table IV). A very small fraction of white CFUs was found in KMBL1001 and JTT1 strains (0.00059 and 0.00029, respectively); 16 white CFUs out of 27,141 total CFUs and 4 out of a total of 13,816, respectively, were found. For the longer interrupted (CTG·CAG)175 in the absence of transcription, a small fraction of mutants was observed only in KMBL1001, JJC510, and AB1157 (Table IV). Five mutants out of 25,870 total CFUs were found in KMBL1001; in JJC510 and AB1157, the white CFUs comprised 3 out of a total of 25,438 and 4 out of 11,422 under transcription repression by pIQ-kan (28). No white CFUs were observed in any of the six strains with (CTG·CAG)17 in the absence of transcription.
FIG. 2. Sequences of deletion mutants (white CFUs) derived from restriction maps of pRW5301, pRW5305, pRW5309, and pRW5304. Transcription was activated by the introduction of 2 mM IPTG into the LB media, whereas its repression was obtained by co-transformation with pIQ-kan. The 1st column for each part (A-D) shows the E. coli strains, which harbored the plasmids. The 2nd column lists the names of the individual mutant clones. The 3rd column shows the sizes of the deletions, and the 4th column is a schematic representation of the deletions. The open spaces between the segments indicate the location of the mutations. The last column shows the number of triplet repeats that are retained in the deleted plasmids. All clones were obtained in the presence of transcription, except those in the last two rows of part B (asterisks). All sequences of the "white mutants" in B and C had inversions that include the CTG·CAG repeat tract and short sequences downstream and upstream of the TRS. The open boxes mark the regions of the inversions. AmpR, ampicillin resistance gene; Ori, ColE1 unidirectional origin of replication; Ter, transcription terminator cassette; GFP, green fluorescent protein gene.
In the presence of transcription, the CTG·CAG repeat sequences caused a significant elevation in the fraction of the white CFUs in a length-dependent manner (Table IV). For (CTG·CAG)17, a small fraction of white CFUs appeared in the KMBL1001 and KA796 strains, and as the length increased, more strains showed a response (Table IV). The highest fraction of deleted mutants was observed for both the pure and interrupted CTG·CAG tracts in KMBL1001 and JJC510. Plasmids harboring (CTG·CAG)98 showed a substantial fraction of mutants when replicated in JTT1 (Table IV). The lowest fraction of white CFUs for both the 98 and 175 CTG·CAG repeat sequences was found when plasmids were cultivated in KA796. Alternatively, studies on the influence of transcription on the deletion behaviors of (GAA·TTC)176 in orientation II in all six parental E. coli strains revealed no mutagenic response. Indeed, in a total of 232,097 CFUs, no mutants were found. The one exception was in the ΔUvrA strain, where white CFUs were found only in the presence of transcription (data not shown).
Hence, the long CTG·CAG sequences exerted their mutagenic character through a process associated with transcription. The fraction of mutants formed upon transcription activation was considerably greater than in the presence of replication alone. Thus, we conclude that the deletions detected herein were the result of repair-dependent reactions, which were enhanced by transcription.
Types and Locations of Mutations-Restriction mapping and DNA sequence analyses of the white CFUs were performed to evaluate the alterations within the repeat tracts and flanking sequences. Analyses of 47 clones revealed that all mutants contained a nonfunctional GFP gene. These clones were characterized in detail. Twenty-one were from pRW5301, 7 from pRW5305, 12 from pRW5309, and 7 from pRW5304 (Fig. 2). All mutations were large deletions; in addition, the derivatives of pRW5305 and pRW5309 contained inversions (Fig. 2, B and C, boxed regions). More than one clone was found with identical mutations from individual transformations of the plasmids harboring the repeat tracts.
For pRW5301 containing (CTG·CAG)98 in orientation I (Fig. 2A), all white CFUs contained deletions ranging from 0.6 to 1.8 kbp. We found 15 clones with a single deletion (clones 1-4, 20-23, and 25-31), with one break always mapping near the terminator cassette (Fig. 1), and the second either inside the CTG·CAG repeat tract (clones 1-4) or downstream of the tract (clones 20-23 and 25-31). Therefore, these 15 mutant clones had lost the entire GFP reporter gene. Moreover, we found six clones that had two deletions (clones 5-7, 9, 19, and 24): one within the repeat tract and the second affecting the reporter gene. Five of these clones (all but clone 24) had a small segment of the reporter gene remaining. In addition, different numbers of residual CTG·CAG repeats were found as follows: 10 clones (mutants 1-7, 9, 19, and 24) had 7-94 CTG·CAG repeats remaining, whereas 11 (clones 20-23 and 25-31) lacked all repeats (Fig. 2A).

All mutant derivatives of pRW5305 and pRW5309 underwent both deletion and inversion reactions (Fig. 2, B and C, boxed regions). All clones had two deletion events, one affecting the GFP gene and the other the repeat sequences. The retained part of the GFP gene was 40 and 115 bp for plasmids containing (CTG·CAG)98 and (CTG·CAG)175, respectively. Also, all clones had only a few CTG·CAG repeats remaining, which were in the inverted orientation. In fact, all mutants had two additional breaks outside the repeat tract within the EcoRI and EagI recognition sites; the repair of these breaks led to the inversion events giving rise to the sequences at positions 1010 - (Fig. 2, B and C). The reason why derivatives of pRW5305 and pRW5309 contain inversions, in addition to deletions, is unclear but may be due to the strategy of their cloning. Even though all clones had the same breakpoint junctions after the terminator cassette and within the GFP gene, they represent independent mutation events because they were found in separate transformations. Hence, these regions must be hot spots for recombination in orientation II.
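The clone tallies reported in this section can be cross-checked mechanically. The following is a minimal Python sketch, not part of the original study; the helper name `expand_clones` and the variable names are illustrative assumptions. It expands the clone ranges quoted in the text for pRW5301 and checks that the groups are consistent with the 21 pRW5301 mutants and the 47 clones analyzed overall.

```python
def expand_clones(spec):
    """Expand a clone list such as '1-4, 20-23, 25-31' into a sorted list of ints."""
    clones = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-"))
            clones.extend(range(lo, hi + 1))
        else:
            clones.append(int(part))
    return sorted(clones)


# Clone groups for pRW5301 as quoted in the text.
single_deletion = expand_clones("1-4, 20-23, 25-31")
double_deletion = expand_clones("5-7, 9, 19, 24")
repeats_retained = expand_clones("1-7, 9, 19, 24")
repeats_lost = expand_clones("20-23, 25-31")

# Clones characterized per plasmid, as quoted in the text.
per_plasmid = {"pRW5301": 21, "pRW5305": 7, "pRW5309": 12, "pRW5304": 7}

print(len(single_deletion), len(double_deletion))   # single- vs double-deletion clones
print(len(repeats_retained), len(repeats_lost))     # clones with vs without residual repeats
print(sum(per_plasmid.values()))                    # total clones analyzed
```

Both groupings (by number of deletions and by residual repeats) partition the same set of 21 clone numbers, which is a quick way to catch a transcription error in a range.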
For the long GAA·TTC repeat sequence, the repaired products of mutations revealed the occurrence of one large deletion of 2.0 kbp (Fig. 2D). Restriction mapping of the DNA from seven white mutants (clones 10-16) showed that one break always occurred within 2 bp from the replication origin region, and the second inside the repeat tract. Therefore, these seven clones had lost the entire GFP reporter gene and a considerable part of the repeat sequence. The number of repeats remaining varied from 15 to 23 (Fig. 2D). Clones 10-13 and 16 had 17 repeats remaining, whereas clones 14 and 15 revealed 15 and 23 repeats, respectively. Furthermore, all repaired products of pRW5304 had an additional small deletion localized downstream of the repeat tract, which removed one copy of a 4-bp GATC tandem repeat. Because these three clones were found from a single transformation, it is conceivable that they were derived from a common event.
Sequence Features at the Breaks-The ability of CTG·CAG repeats to adopt quasi-stable folded secondary structures is well established (1, 6, 19, 39, 40). Non-B DNA structures are susceptible to strand breaks, either single or double, within the repeat tract (41, 42). The breaks appear to be repaired by the RecA-dependent homologous recombination pathway (43). Repair of the breaks can lead to instability of the repeat tract and, moreover, cause deletions of the sequences flanking the repeats (41, 44). The propensity of long CTG·CAG sequences to induce breaks in an adjacent gene, which were subsequently repaired, was determined.
We analyzed the positions of the breaks and the sequences at the junctions for the 47 mutant white CFUs. We also inspected the sequences flanking the breakpoints for any direct, inverted, or mirror repeats capable of forming slipped structures, cruciforms, or triplexes, respectively (1, 9, 39).
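The three repeat classes inspected here can be distinguished by a simple string comparison: a direct repeat recurs unchanged on the same strand, an inverted repeat recurs as its reverse complement, and a mirror repeat recurs as its plain reverse. The sketch below is a hypothetical illustration of such a scan, not the procedure used in the study; the function names and the short test sequences are assumptions built around motifs mentioned in the text (GGCCTTTT, CTTTC-GAAAG, CATT).

```python
def revcomp(s):
    """Reverse complement of a DNA string (uppercase A, C, G, T only)."""
    return s.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_repeats(seq, k):
    """Return (kind, first_pos, second_pos, kmer) for every pair of
    non-overlapping k-mers that form a direct, inverted, or mirror repeat.
    Positions are 0-based offsets on the top strand."""
    hits = []
    n = len(seq)
    for i in range(n - k + 1):
        a = seq[i:i + k]
        for j in range(i + k, n - k + 1):
            b = seq[j:j + k]
            if b == a:
                hits.append(("direct", i, j, a))      # may form slipped structures
            if b == revcomp(a):
                hits.append(("inverted", i, j, a))    # may form cruciforms
            if b == a[::-1]:
                hits.append(("mirror", i, j, a))      # may form triplexes
    return hits


# Toy sequences (illustrative only, not taken from the plasmids).
direct_hits = find_repeats("GGCCTTTTACGGCCTTTT", 8)
inverted_hits = find_repeats("CTTTCAAGAAAG", 5)
mirror_hits = find_repeats("CATTGGTTAC", 4)

for hit in direct_hits + inverted_hits + mirror_hits:
    print(hit)
```

A k-mer that is its own reverse or its own reverse complement is reported under more than one class, which mirrors the ambiguity such motifs carry structurally. The quadratic scan is adequate for plasmid-sized inputs with small k.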
Analyses of the sequences at the breaks revealed the existence of short homologies, from one to eight nucleotides (Table V, A-D). Breakpoints did not occur at random positions but were within specific repeat sequences able to adopt unorthodox DNA conformations (21). Moreover, sequences flanking the breaks revealed the existence of repeat motifs. For example, in the vicinity of the break localized at position 4285 in the repaired products of pRW5304, three copies of an eight-nucleotide motif of direct repeats GGCCTTTT were detected (Fig. 3A). This panel shows an example of the non-B DNA structures at breakpoints of deleted pRW5304 (clones 10-16 in Table V, D). A few mutant clones of pRW5301 (clones 5-7) (Table V, A) had breaks mapped at positions 138 and 920 of the vector part of the plasmid. These sites were also near three copies of direct (CAA, TTACC, GGC, TTA), two copies of inverted (CTTTC-GAAAG), and mirror (CATT) repeat motifs (data not shown). Fig. 3, B-D, shows examples of the presumptive structures at the breakpoints for clones 4-7 and 19. The presence of these sequences near the breakpoints may have played a role in the deletion events (21). Characterization of a few additional deletions confirmed the association of deletion breakpoints with sites of non-B DNA structures. The non-B DNA conformations may have served as substrates for the repair machinery that generates long deletions (21).

TABLE V. ... derived from pRW5301, pRW5305, pRW5309, and pRW5304. The 1st column lists the names of the individual mutant clones. The 2nd and 3rd columns show the first and second breakpoints of each rearrangement site; uppercase letters indicate the nucleotides (nts) that were retained, and lowercase letters indicate nts that were deleted. The nts that were homologous between the first and second breakpoints are underlined. In the 2nd and 3rd columns, the sequences read from the 5′ to the 3′ ends of the top strand. For the inversions, the direction of the reading is the same, but the bottom strand (boldface type) is shown, whereby the reading proceeds from the high to low numbers of the plasmids. The 4th column shows the sequence of the rearranged products. Sequences that were joined in an inverted orientation are shown in boldface (parts B and C). The last column lists the breakpoint positions; the numbers indicate the map number of the last 3′ and first 5′ retained nts.

FIG. 3. ... (Table V). The arrows show the sequences that were deleted, and the boldface numbers designate the positions of the breaks (given the homologies, these numbers are arbitrary). The bold dashed lines between non-B DNA structures designate the deleted intervening sequences in A, C, and D. B, the boldface arrow designates a possible folding of the DNA with strand exchanges.
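The short junction homologies explain why the breakpoint coordinates are to some extent arbitrary: when the same few nucleotides occur at both breaks, the deletion can be drawn at several positions and still give the identical product. The following Python sketch illustrates this under stated assumptions; the function name `junction_homology` and the toy sequence are hypothetical and do not come from the plasmids studied.

```python
def junction_homology(seq, start, end):
    """Length of the homology shared by the two breakpoints of a deletion
    that removes seq[start:end] (0-based, end exclusive).

    The deletion window can slide right while the base entering the window
    equals the base leaving it, and left under the mirror-image condition;
    the total slide distance is the junction homology."""
    right = 0
    while end + right < len(seq) and seq[start + right] == seq[end + right]:
        right += 1
    left = 0
    while start - left - 1 >= 0 and seq[start - left - 1] == seq[end - left - 1]:
        left += 1
    return left + right


# Toy sequence: TTTAC | GGC | ATATAT | GGC | TGGA  (GGC is the shared motif).
toy = "TTTACGGCATATATGGCTGGA"

# Two equally valid descriptions of the same deletion event.
product_a = toy[:8] + toy[17:]   # remove ATATAT + second GGC
product_b = toy[:5] + toy[14:]   # remove first GGC + ATATAT

print(product_a == product_b)
print(junction_homology(toy, 8, 17), junction_homology(toy, 5, 14))
```

Both descriptions yield the same product and the same homology length, so a reported breakpoint number fixes only one of several equivalent positions within the homologous nucleotides.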
The mutants (products of DNA repair) derived from pRW5305 and pRW5309, which harbor long CTG·CAG sequences in orientation II, contained deletions and also revealed inversion events (Table V, B and C). Clones 32-50 had two deletions affecting ~1.2-1.4 kbp of the plasmids. For deleted derivatives of pRW5309, one mutation event occurred between positions 7 (in the vector DNA) and 902 (in the GFP gene) (Table V, C), which were joined together. The next break mapped at position 1012 within the EcoRI recognition site and was followed by bp 1721 located inside the second EcoRI restriction site of the opposite strand, hence leading to the inversion. Homologous GAATTC sequences were present at both breakpoints (Table V).

For the mutants of pRW5305, a similar repair behavior was found. A 1.1-kbp deletion occurred between positions 3602 (in the vector) and 977 (in the GFP gene) (Table V, B) that were joined together. A third break was found 33 nucleotides downstream of this site at position 1010, which was followed by bp 1417 of the opposite strand causing the inversion. A homologous GAATT tract was present at the breaks. The end point of the inversion (bp 1010) continued with bp 1425 of the opposite strand, at sites where the inversion revealed a homologous GC pair (Table V, B). Furthermore, the CTG·CAG sequence, which was found in the inverted orientation, had lost a considerable number of the repeats.
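One reason an EcoRI site can provide homology for an inversion is that GAATTC is its own reverse complement, so the same six nucleotides are read on either strand. The sketch below illustrates this with a toy sequence; it is a hypothetical example (the function names and the sequence are assumptions, not plasmid data) showing that inverting the segment bounded by two such sites regenerates the sites and converts a CTG run on the top strand into a CAG run.

```python
def revcomp(s):
    """Reverse complement of a DNA string (uppercase A, C, G, T only)."""
    return s.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def invert_between(seq, site):
    """Invert the segment spanning the first and last occurrence of `site`
    (both copies included) and return the rearranged top strand."""
    i = seq.find(site)
    j = seq.rfind(site)
    if i == -1 or i == j:
        raise ValueError("need two copies of the site")
    end = j + len(site)
    return seq[:i] + revcomp(seq[i:end]) + seq[end:]


ECORI = "GAATTC"

# Toy top strand: AA | GAATTC | CTGCTGCTGAC | GAATTC | GG
toy = "AAGAATTCCTGCTGCTGACGAATTCGG"
inverted = invert_between(toy, ECORI)

print(revcomp(ECORI) == ECORI)   # the site is palindromic
print(inverted)                  # the CTG run now reads as CAG on the top strand
```

Applying `invert_between` a second time restores the original sequence, which is the expected behavior for an inversion between two copies of a palindromic site.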
In summary, the presence of the long CTG·CAG repeats promotes the formation of multiple breaks in sequences flanking the repeats. Their repair occurred between motifs that shared homology of a few nucleotides. In all cases, the positions of the breakpoints were near or within specific repeat sequences capable of forming non-B DNA structures. Thus, weakened and/or distorted base pairs in the unorthodox DNA conformations (1, 9, 45) probably served as substrates for the generation of large deletions and rearrangements (inversions).

FIG. 4. ... which leads to deletions of the intervening sequence between D and A. Alternatively, in the other Single Deletion pathway (right side), the DSB (filled rectangle, A) occurring within the CTG·CAG repeat sequence (filled gray rectangle) is followed by a DSB at a second non-B DNA structure (filled circle, B) which, because it lacks homology with A, cannot be repaired with the DSB at the A site. Thus, a third DSB at a novel non-B DNA structure (open circle, C) contains homologous nucleotides with site B and hence the DSBs between sites B and C can be repaired, deleting the DSBs at the A site (B-C repair). In the Multiple Deletion Pathway, a third DSB (open circle, C) is located upstream from the A site rather than the downstream location described above. As a result, B-C repair will not remove the DSB at the A site. A fourth cleavage at the non-B DNA structure at site D (open rectangle) contains nucleotides that are homologous to the A site (filled rectangle). In this case, the A-D sites are repaired leading to two deletion events (between B and C and between A and D).

DISCUSSION

Long tracts of the myotonic dystrophy (CTG·CAG)n repeats promote inversions and deletions of 0.6-1.8 kbp of the repeats along with a portion, or all, of the flanking GFP gene. A large number of prior genetic instability studies with the TRS (1-26, 29, 30, 40-57) revealed expansions or deletions within the TRS, but no alterations were observed in the flanking and nonrepeating sequences. Thus, this remarkable mutagenic behavior was not recognized previously. Also, the effect of the orientation of the TRS insert relative to the unidirectional origin of plasmid replication was dramatic, and active transcription across the CTG·CAG tracts greatly stimulated the formation of deletions. It was not uncommon to observe 30-50% of all colonies with gross deletions. The DNA sequences of the breakpoint junctions in 47 deletions revealed the presence of short (1-8 bp) direct or inverted repeat homologies, and the presence of slipped structures, cruciforms, or triplexes at or near the breakpoints was predicted in all cases. Hence, we propose that the slipped strand (1-16, 18, 21, 22, 24-26, 29, 30, 40-54, 58) and/or the flexible and writhed (19) conformations of long CTG·CAG repeat tracts promote the formation of rearrangements.
The length of the CTG·CAG tract has a pronounced effect on the capacity of plasmids to promote gross deletions. If n = 0 or 17, essentially no mutants were observed. Alternatively, for the longer tracts (n = 98 and 175), substantial deletions were found. The role of TRS length on the capacity to adopt non-B DNA structures has been established for CTG·CAG repeats (1-6, 8-16, 18, 22, 24, 25, 29, 30, 39-49, 51-53, 58) and for GAA·TTC repeats (31, 32, 55, 59-61). Whereas the exact role of the non-B conformations adopted by the CTG·CAG repeats in the deletion process remains to be clarified, the distinct effect of repeat length in triggering these mutagenic reactions strongly suggests a role for the overall DNA topology rather than the sequence alone.
Most interestingly, long tracts of GAA·TTC were inert in promoting the formation of deletions in the work described herein. This TRS was shown to adopt triplex as well as sticky DNA conformations (31, 32, 55, 59-61); triplexes were demonstrated to cause the site-specific introduction of DNA damage in eukaryotic cells (62, 63). Thus, the molecular basis of the mutational impotency of long GAA·TTC repeats found in our studies remains to be clarified.² It is possible that the relatively long sequences flanking the GAA·TTC repeats (see "Experimental Procedures") could contain deletions that were not detected by our assay conditions. However, the large range of deletion lengths promoted by long CTG·CAG tracts makes this possibility unlikely.
Long CTG·CAG repeats in orientation II were much more prone to promote gross deletions and inversions than in orientation I. In fact, for the longest CTG·CAG repeat (n = 175), deletions and inversions were only observed in orientation II but not in orientation I; as found for the effect of length, the host cell strain had an influence. This effect of insert orientation is diagnostic for an involvement of replication repair in the genetic instability behavior. Although this effect was first seen in plasmids harboring CTG·CAG repeats in E. coli (5, 6), it has been repeatedly observed in a wide range of studies in yeast, cell cultures, and mice (1-4, 9, 30, 41, 42, 44, 45, 47, 58, 64). This behavior is due to the preferential capacity of the CTG repeat-containing strand on the lagging strand template to adopt hairpin loop structures (compared with the less stable CAG repeat strand), which serve as an impediment for replication fork progression at the repeats and thereby enable the induction of double-strand breaks at the stalled fork. Also, the (CTG·CAG)175 insert with two G to A interruptions was less mutagenic than the shorter but uninterrupted 98 repeat tract (Table II). Numerous other examples have been found of the highly disruptive effect of interruptions on genetic instabilities (10, 14, 15, 56, 57, 65-67).
Active transcription of the TRS caused an increase in the formation of gross deletions by several orders of magnitude. This dramatic effect reveals the important consequences of transcription as a biological process in mutagenesis, which has been reviewed extensively (9, 33-37, 51, 68-72). Virtually every process that exposes the single strands of DNA also destabilizes triplet repeats, including transcription (18, 51), replication (5, 52), recombination (14-16, 41, 53), and DNA repair (10-12, 54). When transcription occurs on a DNA segment that is simultaneously being replicated or contains lesions, which need to be repaired, transient changes occur in the DNA topology (9, 69, 73). As the negatively supercoiled DNA facilitates strand separation, it is vulnerable to metabolic attacks on the single-stranded regions leading to both mutagenic and recombinogenic lesions (14, 39, 69). Because transcription generates a high level of negatively supercoiled DNA and thereby promotes the formation of underwound non-B conformations, it is possible that the TRS-induced mutations were caused by these conformations at the repeat tracts. We demonstrated that the sequences at the breakpoints of the deletions for all 47 mutants could adopt supercoil-dependent non-B conformations, in agreement with prior studies (21, 23).
Prior investigations revealed (21) that the highly unusual 2.5-kbp poly(purine·pyrimidine) sequence from intron 21 of the human PKD1 gene induced long deletions and other instabilities in plasmids that were mediated by mismatch repair and transcription. Other prior studies showed that this 2.5-kbp R·Y tract forms non-B DNA structures (22). For 11 deletions, which were analyzed in detail, the breakpoints could be explained by the formation of non-B DNA conformations. This work proposed that alternative DNA conformations (but not the sequences per se) promote genomic rearrangements through recombination-repair activities. The work described herein, demonstrating that long CTG·CAG repeat tracts also trigger the formation of large deletions and inversions and are greatly stimulated by transcription, provides a substantial extension of the original observations (21, 22) and establishes a clear role of transcription. Although transcription through the long CTG·CAG tracts (12, 18) is known to enhance its instability (via deletions), the different conditions of bacterial growth and the strains used did not previously allow detection of the gross deletions and inversions.

Fig. 4 presents a model for the mechanisms of formation of the products described in Fig. 2. The DSB, close to or within specific sequences capable of adopting non-B DNA conformations, may induce repair by the single or multiple deletion pathways. Repair of DSBs occurring between sequences with direct or inverted homologies at the breakpoints caused inversions and deletions of part or all of the repeat tracts along with some flanking DNA. For example, four mutants (Fig. 2, clones 1-4), and 11 other DNAs (Fig. 2, clones 20-23 and 25-31) are products of repair events that could have been formed by the left and right side mechanisms, respectively, of the single deletion pathway (see Fig. 4). However, clone 7 is a typical example of a product formed by the multiple deletions pathway. Also, all derivatives of pRW5305 and pRW5309 as well as a few clones derived from pRW5301 (clones 5, 6, 9, 19, and 24), which had retained flanking DNA downstream of the repeats, were derived by the multiple deletions mechanism.
Because long repeat tracts of CTG·CAG induce gross deletions and inversions in flanking genes, the consequences of this expanded sequence in DM1 patients with a full mutation may be profound. If the same type of behavior is found in humans as observed herein, substantial deletions or other rearrangements may occur near the 3′-untranslated region causing a deletion at the carboxyl terminus of the DMPK protein. Alternatively, this process may cause a proteolytic labilization of DMPK. Although this kinase has been studied extensively from biochemical, immunological, and regulatory standpoints (reviewed in Ref. 74), little or no data are available on its integrity in full mutation patients.³ If DMPK is labilized in patients, this novel genetic process may be responsible, at least in part, for the disease pathology.