CTG triplet repeats from human hereditary diseases are dominant genetic expansion products in Escherichia coli.

The relative ability of the 10 triplet repeat sequences to be expanded in Escherichia coli was determined. Surprisingly, CTG tracts are expanded at least 8 times more frequently than any of the other nine triplets. Low levels of expansion were also found for CGG, GTG, and GTC. Thus, the structure of the CTG repeats and/or their utilization by the DNA synthetic systems in vivo must be quite different from those of the other triplets. These data further validate this genetically defined system for elucidating molecular mechanisms of expansion and may explain why most triplet repeat hereditary neuromuscular and neurodegenerative disease genes contain CTG repeats.

The molecular basis of several human genetic disorders involving CTG triplet repeat expansions, including myotonic dystrophy (1, 2), Kennedy's disease (3), spinocerebellar ataxia type I (4), Huntington's disease (5), dentatorubral-pallidoluysian atrophy/Haw River syndrome (6-8), and Machado-Joseph disease (9), has been partially established. The CTG triplet repeats may lie within the coding sequences of these genes or in their 3′ untranslated regions. CTG expansion is associated with anticipation, whereby the penetrance of the disease increases in successive generations. Thus, this expansion process, possibly due to slippage of complementary DNA strands, is a novel type of mutation with non-Mendelian genetic properties (10, 11). Furthermore, the genetic instability of repetitive microsatellite sequences (12) has been implicated in certain cancers (13-16).
Whereas the lengths of CTG repeats have been correlated with several diseases, expansion of only one other triplet repeat (CGG) has been linked to disease, namely the fragile-X and fra-E mental retardation syndromes, which are inherited as X-linked dominant traits (17, 18). None of the other eight possible triplet repeats has been associated with hereditary disease genes. Hence, CTG is the triplet repeat most frequently found expanded in disease genes to date.
Since an understanding of CTG repeat expansion is critical for elucidating the etiology of these syndromes, we have established an Escherichia coli system for studying the molecular mechanisms of this process (19). The frequency of expansions versus deletions is strongly influenced by the direction of replication in vivo across these sequences; to explain this behavior, we proposed a model based on strand slippage coupled with non-classical DNA structures (19).
This laboratory has been evaluating the physical, biochemical, and genetic properties of all 10 repeating triplet sequences. More than 175 plasmids containing the 10 triplet repeat sequences, in lengths ranging from 4 to 300 repeats, have been prepared and sequenced.¹ In this study, we compare the relative capacity of the 10 repeating triplet sequences to be expanded in vivo in our E. coli expansion system (19). We demonstrate that the CTG triplet repeat is by far the dominant expansion product of all 10 triplet repeats. Moreover, direct comparisons of CTG repeats with repeats of its sequence isomer GTC reveal that CTG is preferentially expanded.
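The figure of 10 repeating triplet sequences follows from counting: of the 64 trinucleotides, the 4 mononucleotide runs (AAA, CCC, GGG, TTT) are excluded, the remaining 60 collapse into 20 groups because a repeat read in any of its three frames is the same tract, and each group pairs with its complementary strand. The sketch below enumerates the classes. The naming rule (alphabetically smallest frame over both strands) and the function names are assumptions made for illustration, not the nomenclature used in this paper, which names repeats such as CTG (CAG) and CGG (CCG).

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def rotations(triplet: str) -> list[str]:
    """All cyclic rotations: the same repeat tract read in each of its 3 frames."""
    return [triplet[i:] + triplet[:i] for i in range(len(triplet))]


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def canonical(triplet: str) -> str:
    """Hypothetical naming rule: smallest name over all frames on both strands."""
    return min(rotations(triplet) + rotations(reverse_complement(triplet)))


def triplet_repeat_classes() -> list[str]:
    """Distinct triplet repeat classes, excluding mononucleotide runs such as AAA."""
    classes = set()
    for bases in product("ACGT", repeat=3):
        triplet = "".join(bases)
        if len(set(triplet)) == 1:  # AAA, CCC, GGG, TTT are mononucleotide repeats
            continue
        classes.add(canonical(triplet))
    return sorted(classes)


classes = triplet_repeat_classes()
print(len(classes), classes)
# 10 ['AAC', 'AAG', 'AAT', 'ACC', 'ACG', 'ACT', 'AGC', 'AGG', 'ATC', 'CCG']
print({name: canonical(name) for name in ["CTG", "CGG", "GTG", "GTC"]})
# {'CTG': 'AGC', 'CGG': 'CCG', 'GTG': 'ACC', 'GTC': 'ACG'}
```

Under this rule CTG and CAG fall in the same class, whereas CTG and its sequence isomer GTC fall in different classes, which is why the two can be compared as distinct repeats.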

MATERIALS AND METHODS
Plasmids-The inserts with triplet repeats were cloned into the HincII site of pUC19, except for pRW3292 and pRW3293, which had their inserts in the BamHI site of pUC19. pRW3292 and pRW3293 were deletion products of pRW3262, which was constructed as follows: pRW1981 (19) was digested with Sau3AI, and the fragment containing (CTG)130 was recloned into the BamHI site of pUC19 (31). pRW3106 was a deletion product of pRW3311, which contains 81 repeats of CGG from the FMR-1 gene.² pRW3144 and pRW3148 were produced by deletion from pRW3143, which contains 90 repeats of TTA derived from Trypanosoma brucei.³ pRW3464 and pRW3465 were constructed by expansion of pRW3413, which contains (GTC)27. Other triplet repeats were prepared by cloning synthetic oligonucleotides.³ All plasmid DNAs were isolated from E. coli HB101 by alkaline lysis (20).
Competition Analysis for Expansion-Purified plasmids (0.5 µg each) were mixed and digested with SacI and HindIII. After the vector was separated from the band containing the mixture of 10 fragments on a 1.8% agarose gel, the regions of the gel from ~50 bp above the normal insert size⁴ up to about 2.5 times larger than the normal size were eluted, as was the region at the normal insert size, using the phenol method; the eluted DNA was then ethanol-precipitated with 500 ng of an oligomer as a carrier. The eluted DNA fragments were ligated into SacI/HindIII-digested pUC19 and transformed into E. coli DH10B by electroporation (transformation efficiency: >1 × 10⁹ transformants/µg) (21). The transformants were grown in LB medium containing ampicillin (150 µg/ml), and plasmids were isolated by the alkaline lysis method (20). To amplify the expanded components, the above steps were repeated. After transformation, the cells were spread on ampicillin plates. Colonies were picked from the plates and grown in LB medium containing 75 µg/ml ampicillin. The plasmids were isolated, and the population of triplet repeats was determined by restriction enzyme digestion and/or DNA sequencing.
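The gel region excised for putative expansion products can be computed from the unexpanded SacI-HindIII fragment length. This is a minimal sketch under two assumptions: the ~50 bp offset is applied to the fragment length, and "2.5 times larger than the normal size" is read as 2.5 × the fragment length. The function name `elution_window` is hypothetical; the example sizes are the 179-208 bp range given for the shorter triplet repeats in the Table I legend.

```python
def elution_window(fragment_bp: float, offset_bp: float = 50, fold: float = 2.5) -> tuple[float, float]:
    """Gel region (bp) eluted to recover putative expansion products.

    Lower edge: about `offset_bp` above the unexpanded fragment.
    Upper edge: `fold` x the unexpanded fragment length (assumed reading of
    "2.5 times larger than the normal size").
    """
    return (fragment_bp + offset_bp, fragment_bp * fold)


# Shortest and longest SacI-HindIII fragments of the shorter repeat set (Table I legend).
for size in (179, 208):
    low, high = elution_window(size)
    print(f"{size} bp fragment: elute ~{low:.0f} to ~{high:.0f} bp")
```

Because the mixture contains fragments of different lengths, the window actually cut from the gel has to cover the whole band, so the per-fragment values are only a guide to where expanded species of each insert would migrate.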

RESULTS AND DISCUSSION
For this study, a mixture of plasmids containing all 10 triplet repeats (Table I) was digested with SacI and HindIII, and the products were analyzed on an agarose gel; all inserts had been cloned into the polylinker of pUC19. For the shorter triplet repeats (Table I), only the fragments containing 45-62 repeats were observed, along with the vector. Nevertheless, regions of the gel approximately 50 bp larger than this fragment, which contained no visually detectable DNA, were eluted, and the putative fragments were recloned into pUC19. The sizes of the recloned inserts were determined by dideoxy sequencing and by restriction enzyme analysis.
When all 10 plasmids containing the shorter triplet repeats (Table I) were mixed in equimolar amounts and analyzed by this method, we surprisingly found (Fig. 1A) that the great majority of the colonies contained CTG repeats (38 of the 43 colonies investigated, 88.4%). The remaining colonies contained CGG (3 colonies, 7.0%) or GTC (2 colonies, 4.7%) repeats (Fig. 1A); expansion of the other repeats was not observed. As a control, we eluted the DNA from the major observable band on the gel at the expected insert size (containing all 10 triplet sequences) and determined the relative ability of each of the 10 triplet repeat inserts to be recloned. The inset in Fig. 1A shows that all triplet repeats were recloned in approximately equal proportions, indicating that the frequencies of ligation and/or transformation were not influenced by the triplet repeat sequences.
Further experiments were conducted in the absence of the CTG triplet repeat (Fig. 1B) to analyze more rigorously the extent of expansion of the other nine triplet repeat sequences. Fig. 1B shows that CGG (32.3%), GTG (29.0%), and GTC (35.5%) were also expanded, whereas of the remaining repeats only AAG was expanded, in a single colony (3.2%). The reason GTG expansion was observed in this experiment but not in the experiment shown in Fig. 1A is uncertain. Again, a control experiment in which the principal observable band at the expected insert length was recloned (Fig. 1B, inset) showed that all nine triplet repeat sequences could be recloned, ruling out artifacts arising from ligation, transformation, or other steps in the procedure.
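The percentages quoted for both panels of Fig. 1 follow directly from the expanded-colony counts listed in the Fig. 1 legend. The sketch below recomputes them; the helper name `percentages` is hypothetical, and the counts are exactly those given in the legend.

```python
def percentages(counts: dict[str, int]) -> dict[str, float]:
    """Share of expanded colonies carrying each triplet, in %, to one decimal place."""
    total = sum(counts.values())
    return {triplet: round(100 * n / total, 1) for triplet, n in counts.items()}


# Expanded-colony counts from the Fig. 1 legend.
panel_a = {"CTG": 38, "CGG": 3, "GTC": 2}             # all 10 triplets mixed (43 colonies)
panel_b = {"CGG": 10, "GTG": 9, "GTC": 11, "AAG": 1}  # CTG omitted (31 colonies)

print(percentages(panel_a))  # {'CTG': 88.4, 'CGG': 7.0, 'GTC': 4.7}
print(percentages(panel_b))  # {'CGG': 32.3, 'GTG': 29.0, 'GTC': 35.5, 'AAG': 3.2}

# Excess of CTG over the next most frequent expansion product in panel A.
runner_up = max(n for triplet, n in panel_a.items() if triplet != "CTG")
print(round(panel_a["CTG"] / runner_up, 1))  # 12.7
```

The 12.7-fold excess of CTG over CGG in panel A is consistent with the statement that CTG tracts are expanded at least 8 times more frequently than any other triplet.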
Additionally, studies were conducted with the triplet repeats containing 68-81 repeats (Table II). The results were similar to those for the shorter triplet repeats (Fig. 1A): although the expansion frequency increases with repeat number (19), GAT, GTA, and TTA repeat expansions were still not observed.

TABLE I Plasmids used in this study
The number of triplet repeats and the length of the SacI-HindIII fragments are listed. After digestion with SacI and HindIII, the fragments containing the triplet repeats were 179-208 bp long for the shorter triplet repeats and 245-265 bp long for the longer triplet repeats.

FIG. 1. Distribution of expansion products from recloning of the plasmid inserts for the 10 shorter triplet repeats (Table I) (A) and for the nine shorter triplet repeats without the CTG repeat (B).
The recloning of the gel regions containing expansion products and of the region at the expected insert size is described under "Materials and Methods." An expansion product is defined as a clone whose insert is at least 48 bp longer than the average length of the starting triplet repeat inserts; the expanded products were 48-150 bp longer. The numbers of expanded colonies were: CTG, 38; CGG, 3; and GTC, 2 (panel A); and CGG, 10; GTG, 9; GTC, 11; and AAG, 1 (panel B). The filled bars indicate the frequency of the expanded products, whereas the hatched bars (insets) indicate the frequency of the products with the expected lengths (controls).
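The expansion criterion in the Fig. 1 legend (an insert at least 48 bp longer than the average starting insert) can be written as a small scoring function. This is a minimal sketch; the name `classify_clone`, the 190 bp starting mean, and the clone lengths in the example are hypothetical values chosen for illustration.

```python
EXPANSION_THRESHOLD_BP = 48  # minimum gain scored as an expansion (Fig. 1 legend)
REPEAT_UNIT_BP = 3           # one triplet repeat


def classify_clone(insert_bp: float, mean_start_bp: float) -> tuple[bool, float]:
    """Return (is_expansion, gained_repeats) for one recloned insert.

    A clone counts as an expansion product when its insert is at least
    48 bp longer than the mean length of the starting inserts.
    """
    gain_bp = insert_bp - mean_start_bp
    return gain_bp >= EXPANSION_THRESHOLD_BP, gain_bp / REPEAT_UNIT_BP


# Hypothetical clones scored against a hypothetical 190 bp starting mean.
for clone in (200, 238, 340):
    expanded, repeats = classify_clone(clone, 190)
    print(clone, expanded, round(repeats, 1))
# 200 False 3.3
# 238 True 16.0
# 340 True 50.0
```

On this scale, the 48-150 bp range of the observed expansion products corresponds to gains of 16-50 triplets.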
To evaluate further the striking finding that CTG is the dominant expansion product, we directly compared this triplet repeat with its sequence isomer, GTC. Table III shows that both the 57-mer and the 71-mer of CTG were expanded much more readily than GTC tracts of similar length. The two triplet repeats have the same base composition but differ in sequence. Hence, the type or stability of the structure formed by the CTG repeats, or the manner in which this sequence is utilized by the DNA synthetic enzymes in vivo, must be very different from that of the GTC repeat.
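The sense in which CTG and GTC are sequence isomers can be checked directly: equal-length tracts have identical base counts, yet GTC is neither another reading frame of CTG nor its complementary strand (CAG). The sketch below uses 57 repeats to mirror the 57-mer in Table III; the helper name `reverse_complement` is an illustrative choice.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


ctg = "CTG" * 57
gtc = "GTC" * 57

# Same base composition...
print(Counter(ctg) == Counter(gtc))  # True
# ...but GTC does not occur in a CTG tract on either strand, in any frame.
print("GTC" in ctg or "GTC" in reverse_complement(ctg))  # False
print(reverse_complement("CTG" * 2))  # CAGCAG
print(reverse_complement("GTC" * 2))  # GACGAC
```

Because composition is held constant, any difference in expansion between the two tracts has to come from the order of the bases, that is, from sequence-dependent structure or enzyme behavior.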
GTC repeats have different stacking free energies from CTG repeats, which result in less stable single-stranded hairpin structures (22, 23). We proposed that expansion occurs in vivo during DNA replication owing to the formation of a region of single-stranded CTG repeats (19). This hypothesis was based on the relative rates of expansion versus deletion as a function of the direction of DNA replication (19), as well as on oligonucleotide stability studies by NMR spectroscopy (24, 25). CTG oligomers form antiparallel helical duplexes containing T·T base pairs, whereas CAG oligomers form only metastable duplexes (25). Also, computer stability analyses (22, 24) suggest that single-stranded CTG repeats, as well as CGG repeats, form antiparallel hairpin structures that are more stable than those of any of the other repeats. Our model (19) proposes that, during lagging-strand replication, expansion requires a stable hairpin to form in the newly synthesized DNA strand, whereas a stable hairpin in the lagging-strand template is likely to produce deletions.
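Why stacking, rather than base pairing, distinguishes CTG from GTC hairpins can be illustrated with a toy model of a folded-back repeat. Assuming base i of one arm pairs with base (frame - i) of the same periodic strand, the stem repeats every three base pairs, so each triplet has only three possible pairing registers. This is a deliberately simplified sketch (loops, ends, and nearest-neighbour stacking energies are ignored), and `stem_pairs` and `best_stem` are hypothetical helpers, not the stability calculations of Refs. 22-24.

```python
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def stem_pairs(unit: str, frame: int) -> list[tuple[str, str]]:
    """Base pairs repeated along a hairpin stem of (unit)n in one register.

    Toy model: base i of the 5' arm pairs with base (frame - i) of the same
    periodic strand, so the pattern depends only on frame mod len(unit).
    """
    k = len(unit)
    return [(unit[i], unit[(frame - i) % k]) for i in range(k)]


def best_stem(unit: str) -> list[tuple[str, str]]:
    """Register with the most Watson-Crick pairs per repeat (first one on ties)."""
    return max(
        (stem_pairs(unit, f) for f in range(len(unit))),
        key=lambda pairs: sum(p in WATSON_CRICK for p in pairs),
    )


for unit in ("CTG", "GTC", "CGG"):
    shown = [f"{a}-{b}" if (a, b) in WATSON_CRICK else f"{a}·{b}" for a, b in best_stem(unit)]
    print(unit, " ".join(shown))
# CTG C-G T·T G-C
# GTC G-C T·T C-G
# CGG C-G G-C G·G
```

In this model CTG and GTC hairpins have the same pair inventory per repeat (two Watson-Crick pairs and one T·T mismatch) and differ only in the order of their neighbouring pairs, which is consistent with attributing their different hairpin stabilities to stacking free energies.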
Interruptions (polymorphisms) in the repeats must be considered, since they have been statistically correlated with the stability of triplets; the loss of interruptions increases genetic instability (26, 27). These interruptions consist of AGG triplets within the CGG repeats of the FMR-1 gene and CAT triplets within the CAG repeats of the SCA1 gene. The majority of the triplet repeats studied herein (Table I) did contain interruptions. We have evaluated the effect of interruptions on the frequency of expansion⁵: comparison of an uninterrupted (CTG)130 with a sequence of identical length containing one CTA interruption (at the 28th repeat), derived from the myotonic dystrophy gene, showed that the uninterrupted sequence was expanded 5 times more frequently than the interrupted sequence. Thus, interruptions decrease the frequency of expansion in these investigations. In this study, uninterrupted CTG repeats were used for the experiments shown in both Fig. 1A and Table III, whereas the GTC repeats contained one interruption in the center of the tract in Fig. 1A but no interruptions in Table III. Both data sets show the CTG repeats expanding far more frequently than the GTC repeats, indicating that the GTC interruption had no detectable influence on the expansion frequencies and that the differences observed between the expansion of the CTG and GTC repeats (Table III) are caused by the sequences and/or structures rather than by the interruptions per se.
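Locating interruptions such as the CTA at the 28th repeat amounts to scanning the tract in triplet steps and reporting every triplet that differs from the repeat unit. The sketch below assumes the tract starts in frame with the unit; `find_interruptions` is a hypothetical helper, and the sequences are constructed to mirror the (CTG)130 comparison described above rather than taken from the actual plasmids.

```python
def find_interruptions(tract: str, unit: str) -> list[tuple[int, str]]:
    """Return (1-based repeat number, triplet) for each triplet that differs from `unit`.

    Assumes the tract begins in frame with `unit`; a trailing partial triplet is ignored.
    """
    k = len(unit)
    return [
        (i // k + 1, tract[i:i + k])
        for i in range(0, len(tract) - k + 1, k)
        if tract[i:i + k] != unit
    ]


# Hypothetical 130-repeat tracts: one with a single CTA at repeat 28, one pure.
interrupted = "CTG" * 27 + "CTA" + "CTG" * 102
uninterrupted = "CTG" * 130

print(find_interruptions(interrupted, "CTG"))    # [(28, 'CTA')]
print(find_interruptions(uninterrupted, "CTG"))  # []
```

Scanning in fixed triplet steps keeps the repeat numbering aligned with the way interruptions are reported in the text; a tract that starts out of frame would need to be trimmed first.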
At present, the genetic functions responsible for the expansions and deletions are uncertain; the establishment of a genetic system⁶ will be important for these studies.
To date, only two triplet repeat sequences, CTG (CAG) and CGG (CCG), have been associated with human genetic diseases through their massive expansions (1-11, 17, 18). Here, we have shown that GTG and GTC repeats also have the capacity to be expanded, at least in E. coli. Although long triplet repeats other than CTG and CGG have been found in the human genome (28, 29), their expansion has not been correlated with human hereditary diseases. Since GTG repeats have base composition characteristics similar to those of CGG and CTG repeats (30), our results make GTG a likely candidate for disease involvement. The close correspondence between our studies in E. coli (expansion-deletion data (19), DNA polymerase pausing (31), and mismatch repair (32)) and eucaryotic observations (see Refs. 19, 31, and 32) suggests that the dominant expansion of CTG described here may provide significant molecular insights into human systems.
The surprising discovery that CTG triplet repeats are the dominant expansion products in E. coli, as they are in clinical samples from human hereditary diseases (1-11, 17, 18), suggests the importance of DNA structural properties. Other investigations have revealed that duplex CTG and CGG repeats have unorthodox properties, including their effects on nucleosome assembly (33), their capacity to cause DNA polymerases to pause within the repeat sequences (31), and conformational features revealed by helical repeat and polyacrylamide gel migration analyses (34).⁷ The further elucidation of the involvement of DNA conformational features in the etiology of human