Small Slipped Register Genetic Instabilities in Escherichia coli in Triplet Repeat Sequences Associated with Hereditary Neurological Diseases*

Genetic instability investigations on three triplet repeat sequences (TRS) involved in human hereditary neurological diseases (CTG·CAG, CGG·CCG, and GAA·TTC) revealed a high frequency of small expansions or deletions in 3-base pair registers in Escherichia coli. The presence of G to A polymorphisms in the CTG·CAG sequences served as reporters for the size and location of these instabilities. For the other two repeat sequences, length determinations confirmed the conclusions found for CTG·CAG. These studies were conducted in strains deficient in methyl-directed mismatch repair or nucleotide excision repair in order to investigate the involvement of these postreplicative processes in the genetic instabilities of these TRS. The observation that small and large instabilities for (CTG·CAG)175 fall into distinct size classes (1–8 repeats and approximate multiples of 41 repeats, respectively) leads to the conclusion that more than one DNA instability process is involved. The slippage of the complementary strands of the TRS is probably responsible for the small deletions and expansions in methyl-directed mismatch repair-deficient and nucleotide excision repair-deficient cells. A model is proposed to explain the observed instabilities via strand misalignment, incision, or excision, followed by DNA synthesis and ligation. This slippage-repair mechanism may be responsible for the small expansions in type 1 hereditary neurological diseases involving polyglutamine expansions. Furthermore, these observations may relate to the high frequency of small deletions versus a lower frequency of large instabilities observed in lymphoblastoid cells from myotonic dystrophy patients.

Neurogenetic diseases including myotonic dystrophy, fragile X syndrome, Huntington's disease, spinobulbar muscular atrophy, spinocerebellar ataxia type 1, and Friedreich's ataxia result from expanded triplet repeat sequences (TRS)¹ CTG·CAG, CGG·CCG, or GAA·TTC within their genes (1-3). Also, instabilities (expansions and deletions) of TRS have been associated with limb developmental diseases, including human synpolydactyly and hypodactyly in mice, and with hereditary nonpolyposis colon cancer (3-6). The earlier age of onset and the increased severity of the neurological diseases through family pedigrees (clinically referred to as anticipation) are influenced by the lengths of the TRS. Long tracts of TRS are unstable and show repeat size polymorphisms in successive generations and in different tissues. In addition to observations in humans, TRS instabilities have been demonstrated in Escherichia coli (7-15), yeast (16,17), transgenic mice (18), and cultured cells from patients (19,20) and are influenced by factors including host strain genotypes (11,12), DNA replication (7-9, 11-13), methyl-directed mismatch repair (10), growth conditions (8,9), transcription (8), and nucleotide excision repair.²
Simple repetitive sequences are known to cause misalignment-mediated DNA synthesis errors that give rise to instabilities (22-25). These deletions and expansions are thought to be due to the formation of unusual DNA secondary structures that can cause frame shift (small slipped register) mutations during DNA synthesis (26). Mono- and dinucleotide repeats including Z-DNA-forming sequences readily show sequence length polymorphisms, presumably due to multiple slippages in templates (22,24,25,27,28). Numerous prior investigations demonstrated the occurrence of DNA instabilities caused by inactivation of the methyl-directed mismatch repair (MMR) or nucleotide excision repair (NER) system (29-41).
Much less is known about the involvement of these repair systems in the instability of TRS (10,42,43). The mechanism of TRS expansion that is responsible for the hereditary neurological diseases has not been fully elucidated. However, the mechanism responsible for large expansions (scores or hundreds of TRS), such as for CTG·CAG in myotonic dystrophy and CGG·CCG for the fragile X syndrome, is DNA replication errors caused by the slipped register of the DNA complementary strands along with transient formation of hairpin loops and DNA polymerase pausing with primer relocation and strand elongation (11, 12, 16, 17, 44-46). Also, genetic recombination is a robust and efficient way to achieve large expansions of CTG·CAG.³
Small expansions are also extremely important for certain types of inherited neurological diseases that involve polyglutamine expansions that are encoded by CTG·CAG repeats. Normal gene products seem to tolerate a rather wide variation in the size of a polyglutamine tract (between 10 and 35 glutamines) without any detectable adverse effects. However, when the tracts are beyond a threshold (35-40 glutamines in five of the eight known diseases), pathological properties are observed. For all of the eight diseases known to share this mechanism, a strong inverse correlation exists between the length of the polyglutamine tract and the age of onset of clinical symptoms; for each added glutamine residue beyond the threshold, on average, a 1.5-2-year earlier onset age is found (48). Furthermore, in lymphoblastoid cell lines from myotonic dystrophy patients, two types of mutations of the expanded CTG·CAG repeat alleles were detected: frequent mutations that showed small changes in the repeat size and relatively rare mutations with large changes in the CTG·CAG repeat. We believe that the large changes in repeat size observed in this human cell line are due to replication or recombination as elucidated in simpler cells (11,12,16,17,44,46).
However, the molecular mechanism responsible for the small instabilities has not been identified and may be rather different from that involved with the large instabilities.
As part of our ongoing program to elucidate the molecular events involved in genetic instabilities as related to human neurological diseases, we have investigated small slipped register genetic instabilities. These small expansions and deletions found in E. coli were studied optimally in the absence of certain repair functions (methyl-directed mismatch repair or nucleotide excision repair), since these activities would be expected to recognize the looped structures formed in the slipped TRS conformation.

EXPERIMENTAL PROCEDURES
Plasmids-All plasmids used in these experiments containing repeating (CTG·CAG)n inserts are shown in Fig. 1. The inserts may also be designated (TGC·GCA)n or (GCT·AGC)n. pRW3248 is a derivative of pUC19 NotI and was described previously (10). The plasmid contains (CTG)175 as the leading strand template for replication, termed orientation I. This sequence is not homogeneous but contains two G to A interruptions at repeats 28 and 69. pRW3297, pRW3296, and pRW3294 are derivatives of pUC19 containing different lengths of (CTG·CAG) in orientation I and were constructed and characterized in this laboratory.⁴ pRW3297 contains (CTG·CAG)73 and G to A interruptions at repeats 28 and 59; pRW3296 contains (CTG·CAG)57 and G to A interruptions at repeats 28 and 43; pRW3294 contains (CTG·CAG)42 and a G to A interruption at repeat 28. pRW3024 is a pUC19 NotI-based plasmid containing the (CGG·CCG)24 insert cloned into the BamHI site, where the CGG is in the lagging strand template (14). pRW3808 is a derivative of pUC18 NotI (49) containing the (GAA·TTC)176 fragment inserted into the BamHI site of the vector (50). The plasmid was obtained by the in vivo expansion technique (11), and the GAA strand is in the leading strand template (50). pRW3832 and pRW3821 are derivatives of pSPL3 vector (51) and contain (GAA·TTC)15 and (GAA·TTC)59, respectively. These plasmids were constructed and characterized in this laboratory.⁴
Bacterial Strains-The following E. coli MMR mutator phenotype strains were used. KA796 (ara, thi, Δpro-lac) is a parent (wild type) of the MMR-deficient strains; NR8039 is isogenic with KA796 but is also mutH101; NR8040 is isogenic with KA796 but is also mutL101; and NR8041 is isogenic with KA796 but is also mutS101. These strains were the kind gift of Dr. R. Schaaper (NIEHS, National Institutes of Health, Research Triangle Park, NC) and were used previously (10).
To study the influence of the NER on the frequency of mutations of (CTG·CAG)n sequences, the following E. coli strains were used. AB1157 (thr-1, ara-14, leuB6, Δ(gpt-proA)62
Conditions of Bacterial Growth-The plasmids containing various lengths of repeating (CTG·CAG) sequences were transformed into the appropriate E. coli strain and grown for a number of generations, as described (9). Briefly, E. coli cells were transformed with plasmids, and an aliquot of this mixture was inoculated into 10 ml of LB containing ampicillin at 100 µg/ml. Incubations of the liquid cultures were continued overnight at 37°C at a shaking rate of 250 rpm. The bacteria then were subcultured into fresh liquid media with a dilution factor of 10⁷. The cells from each culture were harvested, and plasmids were isolated.
Analysis of the Repeat Composition of Deletions of (CTG·CAG)175, (CTG·CAG)73, (CTG·CAG)57, and (CTG·CAG)42 as Grown in MMR-proficient and -deficient Strains-Plasmids with deletions in the (CTG·CAG)175 insert were obtained as follows. The E. coli parental MMR-proficient strain and the mutS, mutL, and mutH mutants harboring pRW3248 were grown for approximately 100 generations. Plasmids were isolated and cleaved with SacI and HindIII to release the triplet repeat-containing inserts, which were analyzed by PAGE at room temperature (data not shown). Bands corresponding to fragments with fewer than 175 triplet repeat units were purified from the gels and recloned into the SacI/HindIII-linearized pUC19 NotI. The ligated DNA was electroporated into E. coli HB101. This E. coli strain was chosen because it provided greater stability for the TRS-containing plasmids. Cells were then spread on ampicillin plates. In order to minimize further deletions in the inserts, single colonies were grown to early logarithmic phase (A600 ≤ 0.7) (9) in LB medium containing 100 µg/ml of ampicillin, and plasmids were isolated. These TRS instabilities were experienced in the primary MMR-proficient and -deficient strains rather than in the secondary cloning in E. coli HB101; only one of the 25 clones analyzed from the wild type cells (Tables I-III) showed any small instabilities. The number of triplet repeats and the positions of the interruptions in these clones were determined by DNA sequencing using the dideoxy chain termination method and by restriction analyses. The same technique was used to analyze the repeat composition of (CTG·CAG)73 from pRW3297. pRW3294 and pRW3296 were very stable and contained only nondeleted (CTG·CAG) inserts.

⁴ K. Ohshima, unpublished results.

FIG. 1. Inserts containing (CTG·CAG) repeats in plasmids used in this study. G to A interruptions exist at repeats 28 and 69 for pRW3248, repeats 28 and 59 for pRW3297, repeats 28 and 43 for pRW3296, and repeat 28 for pRW3294. The total number of triplet repeats is shown on the right end of each insert. Other details and plasmids are described under "Experimental Procedures."
Analysis of the Repeat Composition of (CTG·CAG)175 from pRW3248 as Grown in NER-proficient and -deficient Strains-The plasmids with full-length (CTG·CAG)175 insert as well as those with deletions in the (CTG·CAG)175 insert were obtained as follows. The E. coli parental NER-proficient strain and the uvrA, uvrB, and uvrA uvrB mutants harboring pRW3248 were grown in LB media containing ampicillin (100 µg/ml) for approximately 60 generations. These conditions were shown to cause TRS instabilities.² The cells were harvested, and the DNA containing both full-length and deleted products was isolated and used to transform E. coli HB101. Cells were then spread onto ampicillin plates, and DNA from single colonies was isolated. Only those preparations that contained a homogeneous length of the CTG·CAG insert, as determined by cleavage with EcoRI and HindIII, were selected for sequence analyses.
Calculation of the Length of Deletion Products of (GAA·TTC)176 from pRW3808-E. coli AB1157 harboring pRW3808 was propagated overnight in LB medium with ampicillin (100 µg/ml). The plasmid was isolated, purified, and cleaved with EcoRI and PstI to release the TRS-containing fragment. The restriction products were analyzed by 7% PAGE in TAE buffer at 3 V/cm at room temperature along with 1-kilobase pair DNA size markers (Life Technologies) as well as the EcoRI/PstI restriction fragments derived from pRW3832 and pRW3821 containing (GAA·TTC)15 and (GAA·TTC)59, respectively. The negatives of the ethidium bromide-stained gels were quantitated by densitometry (300S, Molecular Dynamics, Inc.). 506 and 517 bp size markers as well as EcoRI/PstI fragments containing (GAA·TTC)176 (full-length product from pRW3808) (1173 bp), (GAA·TTC)57 (845 bp), and (GAA·TTC)15 (710 bp) were used to compose the standard curve by plotting the log of number of bp versus the distance migrated from the origin of the gel. The 1-kilobase pair size marker alone could not be used to construct a standard curve because of the anomalous migration of the TRS tracts (50). In the range from 1173 to 506 bp, the curve was fit by a quadratic function. The parameters of the standard curve were then used to derive the number of bp (x) of the unknown (GAA·TTC)n-containing fragments by solving for x: $x = \frac{-b \pm \sqrt{4yc - 4ac + b^{2}}}{2c}$, where a is the intercept, y is the relative migration, and b and c are the constants of x and x², respectively.
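The back-calculation from a quadratic standard curve can be sketched in a few lines of Python. This is only an illustration, assuming the curve has the form y = a + b·x + c·x² with a, b, c, and y as defined in the text; the function name and the coefficient values are hypothetical and are not taken from the measurements reported here.

```python
import math


def solve_standard_curve(y, a, b, c):
    """Return the real solutions x of y = a + b*x + c*x**2, in ascending order.

    a is the intercept; b and c are the coefficients of x and x**2.
    """
    if c == 0:
        raise ValueError("c must be non-zero for a quadratic standard curve")
    # Same quantity as under the square root in the text: 4yc - 4ac + b^2
    discriminant = 4 * y * c - 4 * a * c + b ** 2
    if discriminant < 0:
        raise ValueError("no real solution for this migration value")
    root = math.sqrt(discriminant)
    return sorted(((-b - root) / (2 * c), (-b + root) / (2 * c)))


# Hypothetical coefficients, chosen only so the arithmetic is easy to follow.
a, b, c = 1.0, 2.0, 0.5
y = a + b * 3.0 + c * 3.0 ** 2   # migration value generated from x = 3
roots = solve_standard_curve(y, a, b, c)
```

A quadratic generally has two roots, so in practice one would keep the root that falls inside the calibrated range of the standard curve and discard the other.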
Analysis of Expansion and Deletion Products of (CGG·CCG)24 from pRW3024-The methodologies and pRW3024 are described elsewhere in this section and in the legend to Fig. 5.
General Techniques-DNA preparations and agarose and polyacrylamide gel electrophoresis were performed according to standard laboratory protocols (52). Plasmids were prepared by alkaline lysis of 1 liter of culture and purified by CsCl/ethidium bromide centrifugation overnight. Also, purification of 10-ml cultures was performed using the standard alkaline lysis miniprep procedure. Restriction digests were performed following the manufacturer's instructions. The length analyses were performed by electrophoresis through 5 or 7% polyacrylamide gels in TAE (40 mM Tris acetate, 1 mM EDTA, pH 8.0) or TBE (90 mM Tris borate, 1 mM EDTA, pH 8.3) buffers. The gels were stained with ethidium bromide and photographed. For recloning, the SacI/HindIII bands corresponding to the inserts containing the original or the deleted TRS were eluted from the gel and ligated into the SacI- and HindIII-digested vector DNA. The ligation mixture was transformed into E. coli HB101 by electroporation. The transformants were selected on LB agar plates containing ampicillin (100 µg/ml). The positions of the interruptions in the clones were analyzed by dideoxy sequencing of one or both strands with Sequenase (version 2.0; U.S. Biochemical Corp.) using M13 primers (New England Biolabs).

RESULTS
Expansions and Deletions in (CTG·CAG)n in Wild Type and MMR Mutants-In an effort to further explore the molecular processes involved in instability of (CTG·CAG)n, we have analyzed the sequences of a family of deletion products generated from pRW3248 (Fig. 1). The plasmid contains the myotonic dystrophy (CTG·CAG)175 sequence with two G to A polymorphisms at the 28th and the 69th repeats. These interruptions serve as valuable markers. Since DNA sequence analyses were required for these determinations, we did not have a quantitative genetic assay that would have facilitated the study of a large number of clones. The deletion products were generated in E. coli lacking MutL, MutS, or MutH and in the isogenic wild type strain. The studies described herein were conducted in the three MMR-deficient strains, since this repair system can recognize and correct small heterologous loops containing up to four bases (53). Such loops might form during replication of CTG·CAG and, if not repaired, lead to small expansions and deletions. Prior investigations of TRS instabilities in MMR-proficient and -deficient cells also enabled the study of deletions (10), but we were unable to identify the nature of the mispairs that served as MMR substrates.
The MMR-deficient as well as the wild type strains were transformed with pRW3248. The cells were grown for approximately 100 generations. Plasmids were isolated and cleaved with restriction enzymes to release the triplet repeat-containing inserts. Bands corresponding to fragments with fewer than 175 triplet repeats were recloned into pUC19 NotI. The number of triplet repeats and the positions of the G to A interruptions in 43 clones were determined by PAGE and DNA sequencing. These data are presented in Fig. 2, and the detailed compositions of the clones analyzed are shown in Table I. In general, the results are presented in both the graphical and tabular forms, since the figures more clearly show the locations of the deletions, whereas the tables show the sizes and locations of the small instabilities in repeat numbers. The G to A interruptions in pRW3248 originally located at the 28th and at the 69th repeats (Fig. 1) were found in the MMR-deficient, but not the wild type, E. coli strains at these or nearby positions. These observations show that, besides the large deletions, small deletions (clones 17, 25, 27, 31, 32, and 35; Table I) and small expansions (clones 17-19, 26, 32, and 39) occurred within the TRS-containing inserts. For two clones, another G to A polymorphism close to the G to A interruption at position 28 was observed (clones 17 and 32; Table I). This can be explained by slippage events that took place at or very close to repeat number 28. The size of the small deletions was 1 triplet repeat unit, and the sizes of small expansions varied from 1 to 8 triplet repeats.
For wild type cells, the majority of these large deletion products lost either one or both of the G to A interruptions (clones 4-16, Table I and Fig. 2; clones 58 and 59, Table III and Fig. 4). Hence, the more unstable region of the insert is the 5′-end, which contains the G to A interruptions.
Similar experiments were performed using pRW3297, pRW3296, and pRW3294 containing (CTG·CAG)73, (CTG·CAG)57, and (CTG·CAG)42, respectively. Plasmids were transformed into MMR-proficient strains, as well as mutS and mutL strains, and the cells were grown for approximately 100 generations. The isolated plasmids were characterized by PAGE and DNA sequencing. The repeat compositions of eight clones of (CTG·CAG)73 are shown in Table II and Fig. 3. Again, the positions of the two G to A interruptions were found not only at their original positions of 28 and 59 but also at new positions, suggesting that small deletions (clones 46, 47, and 50) and the expansion (clone 46) occurred within the triplet repeat region. In the case of clone number 46, the position of the G to A interruption was shifted from repeat 59 to repeat 58. This indicates the deletion of one triplet repeat between the two interruptions (Table II, deletion region b), but since the overall number of triplet repeats in this clone remained the same (73 repeats), the expansion of one triplet repeat must have taken place also. Hence, both an expansion and a deletion of one triplet repeat took place at different locations in the insert.
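The reasoning applied to clone 46 can be written out as simple bookkeeping on the interruption positions. The sketch below is illustrative only: the function name is hypothetical, it assumes both interruptions are retained, and it assumes that the first interruption of clone 46 stayed at repeat 28 (the text states only that the second moved from 59 to 58). It reports net changes per region, so a deletion and an expansion within the same region would cancel and go undetected.

```python
def regional_changes(total, interruptions, ref_total=73, ref_interruptions=(28, 59)):
    """Net change in repeat number for regions a, b, and c of a clone.

    Region a runs from repeat 1 to the first interruption, region b from there
    to the second interruption, and region c from there to the end of the tract.
    Negative values are net deletions; positive values are net expansions.
    """
    first, second = interruptions
    ref_first, ref_second = ref_interruptions
    return {
        "a": first - ref_first,
        "b": (second - first) - (ref_second - ref_first),
        "c": (total - second) - (ref_total - ref_second),
    }


# Clone 46 from pRW3297: 73 repeats in total, second interruption at repeat 58.
# The position of the first interruption (28) is an assumption, see above.
clone_46 = regional_changes(total=73, interruptions=(28, 58))
```

Under these assumptions the result is a net loss of one repeat in region b and a net gain of one repeat in region c, which is consistent with an unchanged total of 73 repeats.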
pRW3296 and pRW3294 containing (CTG·CAG)57 and (CTG·CAG)42 (Fig. 1) were substantially more stable when grown in wild type, mutL, and mutS strains than pRW3297, and no changes in their lengths or in the positions of the G to A interruptions were observed (data not shown).
In both sets of experiments, none of the clones obtained from the wild type strain underwent the small expansions or deletions. However, the frequency of the instability events in MMR-deficient strains (the number of clones with small expansions and deletions divided by the total of 34 clones analyzed for the MMR mutants) was very high (38%).
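The 38% figure can be checked against the clone numbers cited in the text. The sketch below assumes that the clones named in the text for Tables I and II are the complete set of MMR-mutant clones with small instabilities; the variable names are illustrative.

```python
# Clone numbers as cited in the text for small instabilities.
table_i_small_deletions = {17, 25, 27, 31, 32, 35}
table_i_small_expansions = {17, 18, 19, 26, 32, 39}
table_ii_small_deletions = {46, 47, 50}
table_ii_small_expansions = {46}

# A clone carrying both a deletion and an expansion is counted once.
unstable_clones = (
    table_i_small_deletions
    | table_i_small_expansions
    | table_ii_small_deletions
    | table_ii_small_expansions
)

mmr_mutant_clones_analyzed = 34  # total stated in the text
frequency = len(unstable_clones) / mmr_mutant_clones_analyzed
percent = round(100 * frequency)
```

With these inputs, 13 distinct clones out of 34 show small instabilities, which rounds to the 38% reported.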
Genetic Instabilities of (CTG·CAG)n in Wild Type and NER Mutants-In another set of experiments, the repeat composition of (CTG·CAG)175 clones from pRW3248 (Fig. 1) as grown in NER-proficient and -deficient strains was analyzed. The nucleotide excision repair enzyme complex can recognize and preferentially bind to bubble and loop regions in duplex DNA (54). The involvement of this repair system in the instability of long CTG·CAG sequences is under current investigation.² pRW3248 was transformed into NER-proficient strains as well as the uvrA or uvrB mutants and the uvrA uvrB double mutant, and the cells were grown for approximately 60 generations. The isolated plasmids were characterized by PAGE and DNA sequencing. These data are graphically presented in Fig. 4, and the detailed composition of 30 clones is shown in Table III. As seen for plasmids propagated in the MMR-deficient strains, the positions of the G to A polymorphisms in some of these clones were changed and clustered around positions 28 and 69. However, besides the large deletions (clones 57-59, 73-75, 80, and 81) and the expansion (clone 57), only small deletions were observed (clones 56, 70-72, and 79; Table III), whereas no small expansions were found. Hence, the mechanisms of recognition and repair of the slipped structures formed by the TRS may be different for methyl-directed mismatch repair and nucleotide excision repair (34,39).
The frequency of small deletions of CTG·CAG from the wild type strain (one out of eight clones analyzed) was not substantially different from that of the uvrA uvrB double mutant (one out of six clones analyzed). However, the absence of UvrA or UvrB affects the frequency of small deletions in different ways. The uvrB mutant yielded a higher proportion of small deletions (three out of 10 clones analyzed) than the uvrA mutant (none of the clones analyzed). These data show a different mode of action for UvrA and UvrB on the CTG·CAG triplet repeat sequence.
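The proportions quoted above can be tabulated directly. In the sketch below, the counts for the wild type, uvrA uvrB, and uvrB strains are those stated in the text. The number of uvrA clones is not stated; the value computed here is an inference that assumes the 30 clones of Table III are divided among these four strains only.

```python
from fractions import Fraction

# (clones with small deletions, clones analyzed), as stated in the text.
small_deletions = {
    "wild type": (1, 8),
    "uvrA uvrB": (1, 6),
    "uvrB": (3, 10),
}

proportions = {
    strain: Fraction(hits, analyzed)
    for strain, (hits, analyzed) in small_deletions.items()
}

total_small_deletions = sum(hits for hits, _ in small_deletions.values())
clones_accounted_for = sum(analyzed for _, analyzed in small_deletions.values())

# Inference, not a stated value: remaining clones attributed to the uvrA mutant.
implied_uvrA_clones = 30 - clones_accounted_for
```

The five small deletions counted this way match the five clones listed for Table III (clones 56, 70-72, and 79), and the uvrA mutant contributes none of them.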
In conclusion, we propose that the small expansions and deletions are probably due to the slippage of the complementary strands in register by one or a few triplet repeat units (Fig. 8).

TABLE I
Repeat composition of clones of (CTG·CAG)175 in wild type and MMR-deficient strains
Forty-three clones harboring deleted inserts were isolated from the MMR-proficient strain (wild-type), as well as the mutL, mutS, and mutH strains (see "Experimental Procedures"). The number of repeats found by DNA sequencing for each of the clones is tabulated in the second column. The quality of the sequencing data enabled the precise determination of the number of repeats except for the cases otherwise designated (clones 1, 4, 5, 25, 33, and 34). However, in all cases, the locations of the interruptions, presented in the third column, were unambiguous. The fourth column lists the number of CTG units between two or more interruptions. The size and location of the expansions and deletions are shown in the fifth column, where a represents the region from the 1st to the 28th triplet repeat, b from the 29th to 69th, and c from the 70th to 175th. Expansions and deletions within these deletion products occurred within the region specified (numbers under a, b, or c) as well as in region b plus c (numbers between letters b and c) and in a to c (numbers with asterisks).

Clone     No. of repeats   Interruptions   CTG units between interruptions   Expansions and deletions
35        133              28, 68          39                                1, 41
36, 37    68               28              -                                 107
38        65               28              -                                 110
39        65               29              -                                 1, 111
40        61               28              -                                 114
41        60               28              -                                 115
42        58               28              -                                 117
43        49               -               -                                 126*

TABLE II Repeat composition of clones of (CTG·CAG)73 in wild type and MMR-deficient strains
Eight clones harboring full-size or deleted inserts were isolated from the MMR-proficient strain (wild type), as well as the mutS and mutL mutants. The size and location of the deletions and the expansion are shown in the sixth column, where a represents the region from the 1st to the 28th triplet repeat, b from the 29th to 59th, and c from the 60th to 73rd. Deletions and the expansion occurred within the regions specified (numbers under a, b, or c).

Expansions and Deletions in (CGG·CCG)n-Expansion and deletion analyses were also conducted on CGG·CCG fragile X sequences (6-32 units in length) (14) that lacked interruptions.
Investigations with (CGG·CCG)24 showed that expansions and deletions could be resolved as distinct bands on polyacrylamide gel electrophoresis. Fig. 5 shows a densitometric scan of the bands from PAGE analyses on (CGG·CCG)24. Each peak represents a band differing from its neighbor by one triplet repeat unit, corresponding to the range of (CGG·CCG)24 to (CGG·CCG)7. A nearly linear relationship in the migration exists for these species (the relationship is actually quadratic but appears linear over this small range of lengths). Also, expansion products were visually observed on this gel but could not be resolved by densitometric scans; these expansion products maintained the triplet repeat periodicity for up to approximately 45 TRS. In summary, this family of expansion and deletion products must have been derived by slippage of the register of the CGG·CCG complementary strands by multiples of 3 bp. Although these sequences did not bear reporter interruptions that were important for the work described above with CTG·CAG for unraveling the molecular mechanism, the high resolution of the analytical system for these CGG·CCG lengths warrants the conclusion of DNA slippage. Eichler et al. (55) have hypothesized microsatellite slippage of (CGG·CCG)n sequences from fragile X patients as intermediates in the expansion to full mutations.
Expansions and Deletions from (GAA·TTC)n-Similar investigations were conducted on the Friedreich's ataxia TRS (GAA·TTC)176 (50) in E. coli AB1157. Expansions and deletions derived from pRW3808 were analyzed by polyacrylamide gel electrophoresis. Fig. 6 shows a densitometric scan of the bands from PAGE. It was possible to visualize 25 distinct molecular species ranging from 176 to 14 triplet repeats. Similar data were obtained in each of the nine isolates investigated (data not shown). For fragments containing 142-14 triplet repeats, the peaks representing bands differ from each other by multiples of 3 bp ranging between 2 and 10 triplet repeats. A smear above the full-length (GAA·TTC)176 fragment (to the left of peak 176) most likely represents expansion products that could not be resolved by the densitometric analysis. Thus, the results of these instability investigations are in complete agreement with the results described above with CTG·CAG and CGG·CCG.

[FIG. 3 legend, partial; clones of Table II] The original locations of the G to A interruptions at repeats 28 and 59 are indicated by the vertical dotted lines, whereas their positions in the deleted inserts are shown by vertical ticks.

[FIG. 4 legend, partial; clones of Table III] The original locations of the G to A interruptions at repeats 28 and 69 are indicated by the vertical dotted lines. The locations of the inserts with no interruptions cannot be precisely determined and are shown to the right (designated by asterisks).

DISCUSSION
The slippage of the complementary strands of simple repeating DNA sequences is known to be important for genetic instabilities. The capacity of these sequences to adopt non-B DNA conformations that promote misalignment-mediated DNA synthesis errors can cause frame shift mutations during DNA synthesis (22-26). The expansion events involved in the etiology of hereditary neurological diseases fall into two categories (type 1 and type 2) (1). For relatively small TRS expansions for the type 1 diseases involving expanded glutamine tracts in the target proteins, the mechanisms proposed herein for small slipped register expansions and deletions (SSED) may be responsible. However, for the massive expansions involved in the type 2 diseases such as myotonic dystrophy and fragile X, a different mechanism involving replication (7-12, 16, 56) or recombination³ is likely. Other workers (57) have also suggested the involvement of two different DNA repair mechanisms for correction of "small" and "large" lengths of repeats. The slippage of DNA complementary strands was hypothesized to be the mechanism responsible for the in vitro synthesis of long DNA polymers of repeating nucleotide sequences (25,58,59), slipped strand mispairing mutagenesis (60-62), the genetic hypermutability of dinucleotide repeat sequences in mismatch repair-deficient cells related to hereditary non-polyposis colon cancer (4,63,64), and the intrinsic genetic instabilities in TRS associated with several human neurological diseases (3,11,12,46).
FIG. 5. Expansion and deletion products found for (CGG·CCG)24. The DNA was prepared by digestion of pRW3024 in the unstable orientation (CCG in the leading strand) with SacI/HindIII, and the resulting fragments were electrophoresed by PAGE at 4°C (14). Inset, a densitometric scan of the PAGE analysis (shown above) of the fragments derived from (CGG·CCG)24. Numbers above the peaks represent the number of triplet repeats in the most abundant species, and each peak is a fragment differing from its neighbor by one repeating unit. The full figure shows the number of triplet repeats as a function of distance (in cm) migrated from the top of the gel. This relationship is generally graphed on a semilogarithmic scale, but since we are concentrating on only a small portion of the gel, we find that the measurements are essentially linear in the range from 24 to 7 repeats with a correlation coefficient of r² = 0.993. Individual expansion products (to the left of peak 24) are visualized on the photograph of the gel, but could not be resolved by the densitometric analysis.

TABLE III
Repeat composition of (CTG·CAG)175 clones in wild type and NER-deficient strains
Thirty clones harboring full-size and deleted inserts were isolated from the NER-proficient strain (wild type) as well as the uvrA, uvrB, and uvrA uvrB strains (see "Experimental Procedures"). The quality of the sequencing data enabled the precise determination of the number of repeats except for the cases designated otherwise (clones 52-55, 60-69, 73, 74, and 76-78). However, in all cases, the location of the G to A interruptions was unambiguous. The fourth column lists the number of CTG units between two or more interruptions. The size and location of the expansions and deletions are shown in the fifth column, where a represents the region from the 1st to the 28th triplet repeat, b from the 29th to 69th, and c from the 70th to 175th. An expansion and the deletions occurred within the specified regions (numbers under a, b, or c) as well as in the region b plus c (numbers between letters b and c) and a to c (numbers with asterisks).

Herein, we demonstrated that CTG·CAG, CGG·CCG, and GAA·TTC sequences undergo small expansions and deletions caused by slippage events in vivo. For wild type cells, we found large deletions at the 5′-ends of the inserts that contain the G to A interruptions. We believe this is due to the formation of G·T and C·A mispairs (Fig. 7) as a consequence of complementary strand slippage. These mispairs trigger the mismatch repair system to cause the removal of the progeny strand. The resulting long single-stranded CTG tract forms a hairpin structure (Fig. 7), which is bypassed during DNA synthesis to give large deletions of this region. Alternatively, in the repair-deficient cells, a different model is proposed for small slipped register expansions and deletions in vivo (Fig. 8). The top molecule represents a (CTG·CAG)14 tract that contains the naturally occurring CTA·TAG interruption at position 7. This tract is similar to, but shorter than, the (CTG·CAG)n fragments on which SSED was observed. Following strand dissociation, slippage produces two hairpins, one composed of CTGCTG, the other of CAGCAG. Misalignment also splits the polymorphic CTA·TAG, so that now CTA pairs with CAG and TAG opposes CTG, which gives rise to two mismatched base pairs, A·C and G·T, respectively. Thus, resolution of this structure into a regular B-helix requires the repair of hairpin loops as well as of mismatches.
The left pathway proposes that an incision is made opposite to each loop (arrows), which, followed by DNA repair synthesis and ligation to fill in the gaps, leads to expansion. Repair of the mismatched base pairs may be accomplished on either strand, but only the use of the bottom strand as a template results in a shift to the right of the original interruption (position 9 versus 7). The pathway on the right envisions cleavage of the hairpin loops at their base, followed by ligation; this generates a two-triplet repeat deletion. Again, only repair of the mismatched bases by using the bottom strand as a template results in a concomitant shift to the left of the CTA·TAG polymorphism (position 5 versus 7). Similar results may also be obtained by a combination of these two pathways (e.g. an incision opposite to the CAG loop plus an excision of the CTG loop) if a second round of replication is included in the overall reaction.
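The net outcome of the two pathways can be followed with simple bookkeeping on the top strand. The sketch below is an illustration only, not part of the experimental work: it does not model hairpins, incisions, or repair enzymes, and the function names (`make_tract`, `interruption_position`, `slipped_register_outcome`) are hypothetical. The flag `shift_interruption` stands for mismatch repair that uses the bottom strand as a template. Treating the other case as leaving the interruption at its original position is our reading of the text, not a result stated in it.

```python
# Bookkeeping sketch of the outcomes described for the slippage model.
# Assumption: only the top (CTG) strand is tracked, as a list of triplets.

CTG = "CTG"  # regular repeat unit
CTA = "CTA"  # G to A interruption


def make_tract(n_repeats=14, interruption_pos=7):
    """Build a top-strand tract with one CTA interruption (1-based position)."""
    if not 1 <= interruption_pos <= n_repeats:
        raise ValueError("interruption must lie inside the tract")
    tract = [CTG] * n_repeats
    tract[interruption_pos - 1] = CTA
    return tract


def interruption_position(tract):
    """Return the 1-based position of the CTA interruption."""
    return tract.index(CTA) + 1


def slipped_register_outcome(tract, delta, shift_interruption=True):
    """Return the tract after a net change of `delta` repeats.

    delta > 0 is an expansion, delta < 0 is a deletion.
    shift_interruption=True mimics mismatch repair templated by the
    bottom strand, so the interruption moves by `delta` positions.
    shift_interruption=False keeps the interruption where it was
    (an assumption of this sketch).
    """
    old_pos = interruption_position(tract)
    new_len = len(tract) + delta
    new_pos = old_pos + delta if shift_interruption else old_pos
    if new_len < 1 or not 1 <= new_pos <= new_len:
        raise ValueError("change is too large for this tract")
    return make_tract(new_len, new_pos)


if __name__ == "__main__":
    parent = make_tract()  # 14 repeats, interruption at position 7
    expanded = slipped_register_outcome(parent, +2)  # left pathway
    deleted = slipped_register_outcome(parent, -2)   # right pathway
    for name, t in (("parent", parent), ("expanded", expanded), ("deleted", deleted)):
        print(name, len(t), interruption_position(t))
```

With the two-repeat hairpins of the model, the sketch gives 16 repeats with the interruption at position 9 for the expansion and 12 repeats with the interruption at position 5 for the deletion, matching the positions quoted above.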
The small instabilities were found essentially only in the repair-deficient strains. In this study, we attribute the higher frequency of large deletions in the wild type cells, in the regions containing the interruptions, to the formation of long hairpins in the CTG strand. Such hairpins may form after the "long patch" removal of the CAG strand by the MMR system acting on the slipped DNA structures. It is possible that the increased SSED observed herein in the MMR-deficient strains results from the processing of similar slipped DNA structures by activities that do not involve such long patch intermediates. Indeed, it has been shown that heteroduplex DNA is a substrate for the nicking activity of endonucleases such as deoxyinosine 3′-endonuclease (65)(66)(67)(68). Also, hairpin structures were reported to be cleaved by topoisomerases (reviewed in Ref. 69). Finally, the UvrB-UvrC complex was shown to efficiently repair an 11-base noncomplementary region in a duplex DNA if a 3′-nick was introduced (70). Our data in E. coli with MMR⁻ cells are in agreement with studies reported in pms1 and msh2 yeast mutants (43).
FIG. 6. Deletion products found for (GAA·TTC)176 from pRW3808. The experiment was conducted and analyzed as described under "Experimental Procedures." Inset, densitometric scan of the restriction fragments derived from pRW3808 analyzed on PAGE showing the intensity (in pixels) of the fragments containing the full-length (GAA·TTC)176 and the deleted products. The length of the restriction fragments (in bp) was calculated as reported under "Experimental Procedures." The estimated number of triplet repeats for some of the peaks is indicated. This calculation takes into account the fact that long (GAA·TTC)n triplet repeat-containing fragments migrate approximately 15% faster than expected (50). The acrylamide gel analysis is shown above the tracing; the gel analysis and the tracing do not precisely correspond, since the tracing is a derivative of the original scan obtained by the formula shown under "Experimental Procedures" for conversion of the lengths to repeat units. In the full figure, each peak in the inset, which corresponded to a distinct band on the gel, was plotted as a function of its distance from the gel origin. The data were fit by a quadratic function (r² = 0.9975).

The slippage of complementary strands containing long TRS leading to the formation of loops is promoted by any process involving strand separation (step 1 in Fig. 8) (10,71). High levels of negative supercoiling will promote DNA slippage. Additionally, CTG·CAG and CGG·CCG are known to be more flexible and writhed than random DNA and, as a consequence, accumulate abnormally high levels of supercoil density (7,56). The interaction of DNAs with helix tracking enzymes such as DNA or RNA polymerases introduces high levels of superhelical domains, which may lead to localized denaturation and thereby promote the slippage of TRS.
Prior theoretical investigations (47) showed that slippage is favored by longer rather than shorter TRS, and hence, the sequence lengths associated with the disease states are more prone to further genetic instabilities than the lengths found in normal individuals. DNA replication may or may not be involved in the strand dissociation step (step 1 in Fig. 8).
Small instabilities are not restricted to products that were originally isolated as deletions from the parent plasmids. Indeed, in a previous study (12) where the large replication-dependent expansion products of (CTG·CAG)175 were examined, the positions of the G to A interruptions were also shifted.
The expansion events in human hereditary neurological diseases fall into two categories, termed type 1 and type 2 (1). For the massive expansions in the type 2 diseases such as myotonic dystrophy and fragile X syndrome, a mechanism involving replication or other systems may be involved. However, in the case of relatively small TRS expansions for the type 1 diseases that generate expanded oligoglutamine tracts in the target proteins, the mechanisms for SSED may be responsible. Both large and small instabilities have recently been observed in lymphoblastoid cell lines derived from myotonic dystrophy patients (19). However, small length changes occurred much more frequently than large changes. Small expansions and contractions (1–7 repeats) of CTG·CAG repeats were also found in several tissues from transgenic mice (21). These results indicate that the concomitant occurrence of small and large instabilities is a widespread phenomenon and suggest that the primary cause is the DNA sequence itself. It is possible that both large and small hairpins/loops may form transiently, depending on the extent of strand separation, and that the large and small size changes reflect the interaction and processing of these DNA structures by various enzymatic systems.
In summary, slippage of the TRS complementary strands in vivo, followed by DNA repair steps, may be responsible for the small expansions and deletions observed in type 1 human hereditary neurological diseases. The availability of genetically and biochemically tractable systems in E. coli and yeast provides optimism for elucidating the molecular details of these processes.

... vertical lines indicate base pairing between 14 repetitive DNA sequences (which may be 1, 2, 3, or n nucleotides in length). All elements contain an identical repeating DNA sequence except element 7. In our specific case, the top line is (CTG)n, the bottom line is (CAG)n, the vertical line is CTG·CAG, a full circle is CTA, a full square is TAG, and the vertical line at position 7 is CTA·TAG. The opposing carets show non-Watson-Crick oppositions within a repetitive unit whose top and bottom sequence occupy a different position within the series. Note that identical shifts in the positions of the interruptions would be observed if SSED occurred to the left of the interruptions.