Cloning, characterization, and properties of seven triplet repeat DNA sequences.

Several neuromuscular and neurodegenerative diseases are caused by genetically unstable triplet repeat sequences (CTG·CAG, CGG·CCG, or AAG·CTT) in or near the responsible genes. We implemented novel cloning strategies with chemically synthesized oligonucleotides to clone seven of the triplet repeat sequences (GTA·TAC, GAT·ATC, GTT·AAC, CAC·GTG, AGG·CCT, TCG·CGA, and AAG·CTT), and the adjoining paper (Ohshima, K., Kang, S., Larson, J. E., and Wells, R. D. (1996) J. Biol. Chem. 271, 16784-16791) describes studies on TTA·TAA. This approach, in conjunction with in vivo expansion studies in Escherichia coli, enabled the preparation of at least 81 plasmids containing the repeat sequences, with lengths from ∼16 to 158 triplets in both orientations and with varying extents of polymorphisms. The inserts were characterized by DNA sequencing as well as by DNA polymerase pausing, two-dimensional agarose gel electrophoresis, and chemical probe analyses to evaluate their capacity to adopt negative supercoil-induced non-B DNA conformations. AAG·CTT and AGG·CCT form intramolecular triplexes, and the other five repeat sequences do not form any previously characterized non-B structures. However, long tracts of TCG·CGA showed strong inhibition of DNA synthesis at specific loci in the repeats, as seen in the cases of CTG·CAG and CGG·CCG (Kang, S., Ohshima, K., Shimizu, M., Amirhaeri, S., and Wells, R. D. (1995) J. Biol. Chem. 270, 27014-27021). This work, along with other studies (Wells, R. D. (1996) J. Biol. Chem. 271, 2875-2878) on CTG·CAG, CGG·CCG, and TTA·TAA, makes available long inserts of all 10 triplet repeat sequences for a variety of physical, molecular biological, genetic, and medical investigations. A model based on intermolecular triplex formation is proposed to explain the reduction in mRNA abundance in Friedreich's ataxia.

The reason why only 3 of the possible 10 TRS have been found associated with human hereditary neuromuscular and neurodegenerative diseases is unclear. Ohshima et al. (8) recently performed competition experiments with all 10 TRS in an Escherichia coli expansion system (9) and, surprisingly, found that CTG·CAG tracts were expanded at least eight times more frequently than any of the other nine TRS. Thus, the structure of the CTG·CAG repeats and/or their utilization by the DNA synthetic systems in vivo must be quite different from those of the other TRS. A survey of TRS in the human genome (10) revealed the presence of all 10 sequences, but in lengths generally shorter (<15 repeats) than those found in these disease genes. Several long tracts of TRS were identified by repeat expansion detection (11) and fluorescence in situ hybridization (12) analyses. The genomes of other species also contain TRS (13-20).
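The figure of 10 possible TRS can be checked by counting: of the 64 trinucleotides, the four mononucleotide runs (AAA, CCC, GGG, TTT) are excluded, the remaining 60 fall into 20 groups that are equivalent up to reading frame, and each group pairs with the group of its complementary strand, leaving 10 duplex repeat types. The short Python sketch below assumes exactly these two equivalences (frame shift and strand choice); the function names are illustrative and not taken from any published tool.

```python
from itertools import product

# Maps each base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, read 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def rotations(triplet: str) -> set[str]:
    """All reading frames of a repeating triplet, e.g. CTG -> {CTG, TGC, GCT}."""
    return {triplet[i:] + triplet[:i] for i in range(len(triplet))}


def repeat_class(triplet: str) -> frozenset[str]:
    """Triplets describing the same duplex repeat tract on either strand, in any frame."""
    return frozenset(rotations(triplet) | rotations(reverse_complement(triplet)))


def enumerate_trs_classes() -> set[frozenset[str]]:
    """Group the 64 trinucleotides into duplex triplet repeat classes."""
    classes = set()
    for bases in product("ACGT", repeat=3):
        triplet = "".join(bases)
        if len(set(triplet)) == 1:  # skip mononucleotide runs such as AAA
            continue
        classes.add(repeat_class(triplet))
    return classes


if __name__ == "__main__":
    print(len(enumerate_trs_classes()))  # 10
    # CTG.CAG and its sequence isomer TCG.CGA fall into different classes.
    print(repeat_class("CTG") == repeat_class("CAG"), repeat_class("CTG") == repeat_class("TCG"))  # True False
```

Each of the ten pairs named in this paper (for example GTA·TAC or TTA·TAA) corresponds to one of these classes of six triplets.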
Simple repeat sequences in plasmids adopt non-B conformations in vitro under appropriate conditions, such as negative supercoil density and ionic strength (reviewed in Refs. 5, 21, 22). For example, mirror repeat pur·pyr sequences form triplexes (H-DNA) and, in certain cases, nodule DNA; alternating pur-pyr sequences adopt left-handed Z-DNA; inverted repeats form cruciforms; and repeating A tracts exist in bent (curved) conformations. Some unusual structures have been proven to exist in vivo in plasmids (5, 21, 22) and in chromosomes (23). However, no DNA structural studies have been reported on long tracts of TRS in plasmids except for nucleosome positioning by EM (24, 25). A number of biophysical studies have appeared recently (26-32) on short (generally <50 bp) synthetic oligonucleotides with CTG, CAG, CGG, or CCG sequences; these studies are the basis for the concepts of quasi-stable hairpin loops as well as tetraplexes and other ordered conformations. A wide range of investigations on DNA polymers with TRS (reviewed in Ref. 33), which first revealed the influence of DNA sequence on properties and structures, was described ~25 years ago.
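The sequence rules listed in the preceding paragraph can be expressed as simple predicates on one strand. The Python sketch below is illustrative only: it assumes perfect symmetry with no spacer and no mismatches, whereas real H-DNA- and cruciform-forming sequences tolerate spacers and imperfections, and the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def is_inverted_repeat(seq: str) -> bool:
    """Perfect palindrome: the strand equals its complementary strand (cruciform candidate)."""
    return seq == reverse_complement(seq)


def is_mirror_repeat(seq: str) -> bool:
    """The strand reads the same in both directions (mirror symmetry)."""
    return seq == seq[::-1]


def is_pur_pyr(seq: str) -> bool:
    """All purines on this strand (or all pyrimidines), so the other strand is the opposite."""
    bases = set(seq)
    return bool(bases) and (bases <= PURINES or bases <= PYRIMIDINES)


def is_alternating_pur_pyr(seq: str) -> bool:
    """Purines and pyrimidines strictly alternate (Z-DNA candidate)."""
    flags = [b in PURINES for b in seq]
    return len(flags) > 1 and all(a != b for a, b in zip(flags, flags[1:]))


def is_h_dna_candidate(seq: str) -> bool:
    """Mirror-repeat pur.pyr tract, the sequence class associated with triplexes (H-DNA)."""
    return is_mirror_repeat(seq) and is_pur_pyr(seq)


if __name__ == "__main__":
    print(is_inverted_repeat("GAATTC"))            # True
    print(is_alternating_pur_pyr("CGCGCG"))        # True
    print(is_h_dna_candidate("AAG" * 10 + "AA"))   # True
    print(is_h_dna_candidate("CTG" * 10))          # False
```

Note that an (AAG)n tract, read as (AAG)nAA, is a perfect mirror repeat on a single purine strand, whereas (CTG)n is neither pur·pyr nor mirror-symmetric.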
Herein, we report novel cloning strategies for long tracts of TRS and the properties of seven cloned TRS. Similar data were described for CTG·CAG³ (5, 8, 9, 24, 34, 35) and for CGG·CCG³ (5, 34, 36), and the adjoining paper (37) reports parallel work on TTA·TAA. These investigations were undertaken to understand the properties and conformations of each of the 10 TRS and to serve as a basis for interpreting results with the disease sequences CTG·CAG, CGG·CCG, and AAG·CTT (reviewed in Ref. 5). Although a number of diseases have recently been characterized as TRS syndromes (1-7), AAG·CTT is the only new TRS to be implicated in a disease gene. However, other diseases also show anticipation (reviewed in Refs. 1-6); if a correlation exists between anticipation and triplet repeats, many more diseases showing this behavior may be identified, since at least 40 genes containing TRS have been found (reviewed in Ref. 5). Perhaps some of these disease genes contain one or more of the other seven sequences.

MATERIALS AND METHODS
Synthetic Oligonucleotides and Plasmid Construction-Oligonucleotides containing a triplet repeat sequence and a restriction site absent from the vector (Fig. 1) were purified from a 20% polyacrylamide gel. All inserts were cloned into pUC19 unless indicated otherwise. pUC19 was digested with HincII and dephosphorylated with calf intestine alkaline phosphatase (1 unit) (Boehringer Mannheim) for 15 min at 37°C and for an additional 45 min at 55°C after the addition of another unit of calf intestine alkaline phosphatase. The ligation between the oligoduplex and the dephosphorylated vector was performed with T4 DNA ligase (1 unit) (U. S. Biochemical Corp.) and 66 µM ATP at 16°C for 12 h. The mixture was transformed into E. coli DH5α by electroporation (38), and the transformants were spread on LB agar plates containing ampicillin (75 µg/ml) (Sigma), 5-bromo-4-chloro-3-indolyl-β-D-galactoside (60 µg/ml) (U. S. Biochemical Corp.), and isopropyl-β-D-thiogalactopyranoside (20 µg/ml) (U. S. Biochemical Corp.). White colonies were grown in LB medium containing ampicillin (75 µg/ml), and plasmids were isolated by the alkaline lysis method (39).
After the initial cloning of the oligonucleotides containing the TRS described above, the plasmid was digested with the restriction enzyme specific for each TRS (Fig. 1) and then dephosphorylated with calf intestine alkaline phosphatase as described above. The linearized plasmid was purified from a 1.0% agarose gel (International Biotechnologies or FMC). The excised band was crushed by forcing it through a 1-cc syringe and frozen in phenol at −80°C for 15 min, and the recovered DNA was ligated with the same duplex used for the first cloning of the insert at 16°C for 12 h. The ligation mixture was transformed into E. coli DH5α or HB101 by electroporation (38). DNAs were isolated from the transformants as described above and characterized by restriction mapping and by dideoxy chain termination sequencing of both complementary strands. These steps were repeated to increase the number of triplet repeats.
pUC18NotI was described previously (9).
Expansion of Triplet Repeats in Vivo-After elongation of the triplet repeats as described above, in vivo expansions were performed as described previously (8, 9). Plasmids were digested with SacI and HindIII and run on a 1.6% agarose gel. Regions of the gel above the original insert size were eluted by the phenol method described above, and the DNA was ethanol-precipitated with 500 ng of an oligomer as a carrier. The eluted DNA fragments were ligated into SacI- and HindIII-digested pUC19 or pUC19NotI (9) and transformed into E. coli HB101 by electroporation (38). The transformants were grown in LB medium containing ampicillin (150 µg/ml), plasmids were isolated by the alkaline lysis method (39), and the above steps were repeated. After transformation, the cells were spread on ampicillin (150 µg/ml) plates. The plasmids were isolated, and the inserts were characterized by restriction mapping and dideoxy chain termination sequencing. The lengths of all TRS shown in Table I (up to 91 repeats) are exact. However, for the uninterrupted sequences of ~80-158 repeats shown in Table II, the lengths were estimated from agarose gels and sequencing and are accurate to within ±5 repeats.
Primer Extension Analyses-Primer extension was performed as described previously (34). 2 µg of DNA was dissolved in 20 µl of a solution containing 0.2 M NaOH and 2 ng of 5′-32P-end-labeled primer. The primers (New England BioLabs) were the M13/pUC forward sequencing primer 1211 (17-mer) and the M13/pUC reverse sequencing primer 1201 (16-mer) for the bottom and top strands, respectively. The mixture was heated for 90 s at 90°C, cooled to room temperature for 4 min, and neutralized with 2 µl of 3 M sodium acetate (pH 5.2). After ethanol precipitation, the DNA was resuspended in a buffer containing 40 mM Tris-Cl (pH 7.5), 50 mM NaCl, 20 mM MgCl2, 10 mM dithiothreitol, and 0.5 mM 2′-deoxynucleoside triphosphates. Before the polymerase was added, the mixtures were preincubated for 10 min at 37°C. Then 5 units of the Klenow fragment of E. coli DNA polymerase I were added, and the mixture was incubated for 10 min at 37°C. After the reaction was terminated by the addition of 95% formamide and 20 mM EDTA, the DNA was fractionated on a 12% denaturing polyacrylamide gel, and the bands were visualized by autoradiography.
Two-dimensional Agarose Gel Electrophoresis-To generate topoisomers, 6 µg of plasmid DNA was incubated for 60 min at 37°C in 100 µl of a solution containing Tris-Cl (pH 7.6), 50 mM KCl, 10 mM 2-mercaptoethanol, 1 mM EDTA, 0-4 µM ethidium bromide, and chicken erythrocyte topoisomerase (40). The ethidium bromide and topoisomerase were removed by two phenol extractions and one ether extraction, and the DNAs were ethanol-precipitated. Mixtures of topoisomer populations were subjected to first-dimension gel electrophoresis in 1…
Chemical Modifications-The modifications of the plasmids by OsO4 (Aldrich), diethyl pyrocarbonate (DEPC) (Aldrich), and chloroacetaldehyde (CAA) (Fluka) were performed as described previously (37, 41). CAA was distilled before use. For the OsO4 modifications, 3 µg of DNA in 100 µl of 0.5× TBE buffer (pH 8.3) or TAE buffer (pH 4.5) was incubated for 30 min at 25°C with 2 mM OsO4. The DEPC and CAA reactions were performed for 30 min at 25°C with 10% DEPC and for 60 min at 37°C with 2% CAA, respectively. The reactions were terminated by chilling on ice, and the samples were washed twice with cold ether.
After recovery by ethanol precipitation, the DNAs were divided into two samples and digested with either SacI and HindIII or EcoRI and SphI. The overhangs were labeled with [α-32P]dATP (10 µCi, 1.7 pmol) (Amersham Corp.) and the Klenow fragment of E. coli DNA polymerase I (10 units) (U. S. Biochemical Corp.). The DNA fragments were isolated by polyacrylamide gel electrophoresis. The purified fragments were dissolved in 100 µl of 10% piperidine (Aldrich) and heated at 90°C for 30 min, and the piperidine was then removed by lyophilization. The DNAs were then fractionated on a 12% denaturing polyacrylamide gel, and the bands were visualized by autoradiography.

RESULTS
Cloning of Triplet Repeat Sequences-We have previously cloned three TRS from natural sources, CTG·CAG¹ (9, 24), CGG·CCG (34, 36, 42), and TTA·TAA (37), with various lengths up to 250 repeats. Long tracts of other triplet repeats have been identified in the human genome (11, 12), but their cloning has not yet been reported. For studies on the other seven triplet repeats, it is desirable to clone long tracts (>200 bp). Synthetic oligonucleotides have been useful for cloning the desired sequences, but only up to a length of ~100 bp. Thus, in this study, new cloning strategies with synthetic oligonucleotides were implemented to clone much longer tracts of TRS.
The goal of these strategies is to use chemically synthesized deoxyribo-oligonucleotides, of lengths that can be prepared conveniently, containing at one end a unique restriction enzyme site that does not exist in the pUC19 vector (Fig. 1). The duplex oligonucleotide was cloned in both orientations into the HincII site of pUC19 by blunt-ended ligation. Digestion of the resulting plasmid with the appropriate restriction enzyme (Fig. 1; BsaAI in the case of GTA·TAC) provided a linear "vector" for the subsequent cloning of a second unit of the synthetic duplex (Fig. 2A). In general, this process was repeated four times to give inserts of ~60 triplet repeats. In principle, the inserts could be cloned in either orientation. However, we predominantly found direct repeat inserts, not inserts with inverted repeats, which might have been deleted owing to cruciform formation in vivo (22). Owing to the judicious choice of the unique cleavage enzymes, this strategy (Figs. 1 and 2A) should give inserts of pure TRS with no interruptions (also called polymorphisms or mutations). However, DNA sequencing of some of the plasmids (Table I) revealed interruptions that were evidently introduced during replication. In several cases, longer inserts (Table II) were generated in plasmids by the expansion procedures in E. coli reported previously (8, 9).
For the cloning of GTA·TAC repeats (Fig. 2A), the complementary synthetic oligonucleotides were annealed to produce a duplex containing a BsaAI site along with TAC repeat units. The duplex was cloned into the HincII site of pUC19 to produce pRW3151, which has a G to A mutation, or the oppositely oriented pRW3152 (Table I). Both plasmids were digested with BsaAI, and the same duplex oligomer used for the first cloning was cloned into the BsaAI sites to produce pRW3153 and pRW3154 (Table I), respectively. For pRW3153, small deletions of the repeats occurred during the cloning.
In general, deletions occurred in multiples of 3 bp. The above procedure, based on digestion with BsaAI and insertion of a duplex oligomer, was repeated to produce long tracts of GTA·TAC repeats (Table I). pRW3155 and pRW3156 were obtained from pRW3153 and pRW3154, respectively. pRW3157 and pRW3159 were produced from pRW3155. pRW3451, containing 77 repeats of GTA·TAC, was produced from pRW3157 by the insertion of an additional 16 repeats. pRW3851 was obtained by recloning pRW3451 into pUC18NotI. pRW3159 and pRW3156 carried deletions of 1 and 7 repeats, respectively, relative to the desired products.
The same procedure was used for the cloning of GAT·ATC, GTT·AAC, CAC·GTG, AGG·CCT, and TCG·CGA triplet repeats, using the appropriate restriction enzymes and synthetic oligonucleotides (Fig. 1). According to the cloning strategy, 16 pure repeats should be introduced into the plasmid in the first cloning cycle, and each subsequent cycle should add another 16 repeats. However, for the plasmids shown in Table I, polymorphisms including base mutations, insertions, and deletions were observed, as seen in long tracts of TRS in the human genome (43-50). For example, pRW3161 has one A deletion in (GAT)16. pRW3162 was produced from pRW3161 by the insertion of (GAT)16. pRW3163 has (GAT)10AT(GAT)3 added to the (GAT)16 of pRW3161. pRW3164, pRW3165, pRW3166, pRW3167, and pRW3169 are expanded products with small expansions in the longest GAT tracts. pRW3441 was formed by elongation of pRW3169 with the (GAT)16 oligomer, but the resulting plasmid has one T to A mutation, and pRW3442 carries an extra (GAT)9 insertion in addition to (GAT)16.
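The bookkeeping implied by this strategy is simple: with 16 triplets added per cycle, n cycles should give 16n repeats (64 after four cycles, consistent with the ~60 repeats noted above), and any deviation can be classified by whether it preserves the triplet reading frame (deletions in multiples of 3 bp) or not (for example, the single A deletion in the (GAT)16 insert of pRW3161). A minimal Python sketch, with hypothetical helper names and assuming no slippage during annealing:

```python
from typing import Optional, Tuple

UNIT_REPEATS = 16  # triplets added per cloning cycle in the strategy described above


def expected_repeats(cycles: int, unit: int = UNIT_REPEATS) -> int:
    """Repeats expected after a given number of cloning cycles, assuming no slippage."""
    if cycles < 1:
        raise ValueError("at least one cloning cycle is required")
    return unit * cycles


def length_change(expected: int, observed_bp: int) -> Tuple[int, bool, Optional[int]]:
    """Compare an observed insert length (bp) with the expected number of triplets.

    Returns (change in bp, whether the triplet frame is preserved,
    change in whole repeats, or None if the frame is broken).
    """
    delta_bp = observed_bp - 3 * expected
    in_frame = delta_bp % 3 == 0
    return delta_bp, in_frame, (delta_bp // 3 if in_frame else None)


if __name__ == "__main__":
    print(expected_repeats(4))      # 64
    print(length_change(16, 47))    # (-1, False, None): one-base deletion, frame broken
    print(length_change(32, 93))    # (-3, True, -1): loss of one whole triplet
```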
For GTT·AAC, pRW3171 had an insertion of extra repeats: 26 triplet repeats containing one G to A mutation were found instead of the 16 repeats expected from the first cloning. pRW3172 and pRW3173 were the predicted products, but one T to G and one G to A mutation occurred in the cloning of pRW3174. pRW3871 was derived from pRW3173 by recloning into pUC18NotI. For both GAT·ATC and GTT·AAC, inserts were found in only one of the two possible orientations (Table I); the reason for this behavior is uncertain. Furthermore, repeated attempts to clone SacI-HindIII fragments containing the GAT·ATC inserts into pUC18NotI were unsuccessful. Interestingly, this sequence was proposed (51) to have an unusual structure since it does not bind actinomycin D.
For CAC·GTG, the first cloning cycle produced pRW3181 and pRW3182. pRW3182 was the starting material for pRW3183 and pRW3184, with 16 and 14 repeats of GTG inserted, respectively. For pRW3184, a new flanking unit (GTACG) was observed, but this was not seen for pRW3183; this phenomenon differs from that seen for pRW3163 described above. For pRW3185 and pRW3186, derived from pRW3183, (GTG)2 and GTGGT, respectively, were deleted from the predicted repeats. pRW3186 was the starting material for pRW3187, pRW3188, pRW3189, and pRW3421. pRW3424 was obtained from pRW3421 by the insertion of (GTG)14. pRW3425 is an expanded product of pRW3421 with 12 more repeats. pRW3427, pRW3428, and pRW3429 were elongation products of pRW3425. The plasmids derived from the cycle following the production of pRW3185 and pRW3186 from pRW3183 do not have additional GTACG flanking units, indicating that in vivo expansion of the triplets, along with mutations of a few base pairs, has occurred, as seen in the case of pRW3183.
For AGG·CCT repeats, the first cloning cycle produced pRW3191 and pRW3192. pRW3193, pRW3194, and pRW3195 were elongation products from successive cloning steps. These new inserts contained an additional 14, 12, and 9 repeats, respectively, rather than the expected 16, suggesting that perfect AGG·CCT repeats are unstable and prone to deletion. For pRW3196, two duplex units were inserted, giving CCAGG polymorphisms in the CCT repeats. pRW3197, pRW3198, and pRW3891 were produced by recloning pRW3194, pRW3195, and pRW3196, respectively, into pUC18NotI.
For TCG·CGA repeats, pRW3411 and pRW3412 were obtained in the first cloning cycle. pRW3413 had a deletion of one C and one TCG triplet in the insert. pRW3414 and pRW3415 were elongation products from successive cloning steps.
In general, as the length of triplet repeats increases, polymorphisms are more likely to occur, probably owing to instability (5, 9). Thus, polymorphisms seem to stabilize the triplet repeats, as seen in the case of human disease genes (43-45, 49, 50).
Cloning of AAG·CTT Triplet Repeats-Since there are no appropriate restriction enzymes for cloning the AAG·CTT repeat sequence by the method described above, we modified the strategy used for the other six TRS (Fig. 2B). In this case, two duplexes were formed by annealing two pairs of complementary oligonucleotides. The duplexes, each with an overhanging end complementary to that of the other, were mixed together in the cloning procedure to give a longer duplex containing a StuI site (pRW3436 and pRW3437). Deletions of Gs were found in both plasmids, giving polymorphisms. In addition, in some cases only one of the component duplexes was cloned (pRW3433 and pRW3434). Either pRW3436 or pRW3437 was digested with StuI, and the duplex for AAG repeats (Fig. 1, bottom line) was cloned into the StuI site. pRW3438 was obtained from pRW3436 with a 4-repeat deletion from the expected number, and pRW3439 and pRW3440 were derived from pRW3437. The former was the desired product, whereas the latter had a deletion of 10 repeats.
Fig. 2. Cloning strategies for triplet repeat sequences. A, cloning scheme for the GTA·TAC repeat sequence: the oligoduplex …16 GTACG was cloned into the HincII site of pUC19. The plasmid was digested with BsaAI, which cuts the unique recognition site within the oligoduplex (a site not present in pUC19), and then ligated with the same oligoduplex as used for the first cloning. The procedure was repeated to elongate the triplet repeat sequences. This strategy was also used for cloning the GAT·ATC, GTT·AAC, CAC·GTG, AGG·CCT, and TCG·CGA repeat sequences. B, cloning scheme for the AAG·CTT repeat sequence. The two oligoduplexes were cloned into the HincII site of pUC19, and the cloned plasmid was digested with StuI, which cuts the unique recognition site within the oligoduplex (a site not present in pUC19), and then ligated with the oligoduplex (AAG)19AGGCCTGG. The value of n corresponds to the cycle number. Other details are as described under "Materials and Methods."
Although AGG interruptions are necessarily inserted into the AAG repeat sequences by this method, it may be possible to clone pure tracts of AAG by the in vivo expansion method (see "Discussion").
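The StuI strategy of Fig. 2B shows why AGG interruptions are unavoidable: StuI recognizes AGGCCT and cuts bluntly between AGG and CCT, so each newly inserted unit is preceded by the AGG left over from the previous site, and the incoming unit regenerates a single StuI site for the next cycle. The Python sketch below uses the oligoduplex given in the Fig. 2 legend, (AAG)19AGGCCTGG, but ignores the pUC19 flanks and assumes every insert ligates in the same orientation; both simplifications and the helper names are assumptions for illustration.

```python
STUI_SITE = "AGGCCT"  # StuI recognition sequence (palindromic)
STUI_CUT_OFFSET = 3   # blunt cut: AGG^CCT

UNIT = "AAG" * 19 + "AGGCCTGG"  # oligoduplex top strand from the Fig. 2 legend


def find_sites(seq: str, site: str = STUI_SITE) -> list[int]:
    """0-based start positions of a recognition site on this strand.

    The StuI site is its own reverse complement, so scanning one strand suffices.
    """
    return [i for i in range(len(seq) - len(site) + 1) if seq.startswith(site, i)]


def insert_at_unique_site(seq: str, unit: str = UNIT) -> str:
    """Cut at the single StuI site and ligate one more unit into the blunt gap."""
    sites = find_sites(seq)
    if len(sites) != 1:
        raise ValueError(f"expected one StuI site, found {len(sites)}")
    cut = sites[0] + STUI_CUT_OFFSET
    return seq[:cut] + unit + seq[cut:]


if __name__ == "__main__":
    print(find_sites("AAG" * 50))  # []: pure AAG repeats contain no StuI site
    two_units = insert_at_unique_site(UNIT)
    print(find_sites(two_units))   # [117]: one site regenerated for the next cycle
    print(two_units[54:66])        # AAGAGGAAGAAG: the AGG interruption between blocks
```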

Expansion of Triplet Repeats in Vivo-We have previously reported the cloning of CTG triplet repeats by the E. coli expansion method (9). The lengths of the seven triplet repeats constructed above were extended using this method. In the cloning study above, small in vivo expansions were observed for GAT·ATC (pRW3164, pRW3165, pRW3166, pRW3167, and pRW3169) and for CAC·GTG (pRW3425, pRW3424, pRW3428, pRW3427, pRW3429, pRW3183, pRW3187, pRW3189, pRW3188, and pRW3421). In our previous study on the expansion frequencies of all 10 triplet repeat sequences (8), we found that CTG·CAG repeats were expanded at least eight times more frequently than any of the other nine repeats in E. coli; besides CTG·CAG, expansion products were obtained only for CAC·GTG, CGG·CCG, TCG·CGA, and AAG·CTT. Herein, we found that pRW3480 and pRW3481 were expanded products of pRW3428 with about 40 and 60 more repeats, respectively, and that pRW3416 and pRW3471 were expanded from pRW3415 and pRW3439, respectively (Table II). pRW3482 and pRW3483 were produced with pure CAC·GTG repeats, and all the expanded products for TCG·CGA except pRW3416 had pure TCG·CGA repeats. Although expansion of GTA·TAC repeats was not found in our previous study (8), the expanded product pRW3453 was obtained from pRW3451 in a separate experiment. Attempts to obtain expansion products of the other four triplet repeats, GTT·AAC, GAT·ATC, AGG·CCT, and TTA·TAA, have been unsuccessful to date. Since shorter triplet repeats expand at a lower frequency (8, 9), a certain minimum length of the triplet repeats is needed for in vivo expansion to be effective. Hence, the combination of our cloning strategy, based on uniquely designed synthetic oligonucleotides, with expansion in vivo is useful for constructing long tracts of triplet repeat sequences.
The lengths obtained exceed the threshold numbers of triplet repeats known to cause neuromuscular diseases (1-7).
Pausing of DNA Polymerase in Triplet Repeat Sequences-Primer extension by DNA polymerases has been used to study unusual DNA structures that inhibit DNA polymerization (52-57). Recently, we found that appropriate lengths of CTG·CAG and CGG·CCG triplet repeats from human hereditary disease genes caused pausing of DNA polymerases, namely Sequenase, the Klenow fragment of E. coli DNA polymerase I, and human DNA polymerase β (34). These in vitro data indicated that longer tracts of CTG·CAG and CGG·CCG triplet repeats adopt a non-B conformation(s) that blocks DNA polymerase progression. The resulting idling polymerase may catalyze slippage to give expansions of triplet repeats. In contrast, TTA·TAA triplet repeats did not cause any pausing of the DNA polymerases (37). In vivo, longer tracts of CGG·CCG also cause pausing of DNA synthesis.⁴ Herein, similar studies were performed with the other seven triplet repeats.
Primer extension studies were conducted with the Klenow fragment of E. coli DNA polymerase I and the seven new plasmids containing various lengths of repeats summarized in Table III. Previous studies showed that the pausing phenomenon for CTG·CAG and CGG·CCG depended on the preincubation temperature; pausing was observed when the DNA was preincubated at 37°C for 10 min (34). Thus, the DNA samples were preincubated at 37°C for 10 min before the polymerase was added (Table III). For primer extension of the bottom strand of (TCG·CGA)98, with the TCG strand as the template, the polymerase paused at 29-33 TCG triplets from the beginning of the insert as well as near its beginning (5-10 repeats); the pausing was much stronger in the distal region than in the beginning region (Fig. 3). In contrast, for the top strand, with the GAC strand as the template, no pausing was observed. Unlike that for CTG·CAG, the pausing phenomenon for TCG·CGA was strand-specific, as previously observed for CGG·CCG (34). The pausing phenomenon was length-dependent, since (TCG·CGA)46 did not show any pausing (Table III). This agrees with previous data for CTG·CAG and CGG·CCG, in which a continuous repeat length of >60 was required for pausing to occur (34). In the case of (CTG·CAG)130, pausing was seen at 35-39 triplets, whereas for (TCG·CGA)98 the pause sites were at 29-33 triplets (Fig. 3). This difference is due to the primer locations. The location of the primer binding site was found to influence the pausing sites for CTG·CAG repeats (34): as the distance between the start of the CTG repeat and the 5′-end of the primer increased, the pausing site moved further from the start of the repeat. The distance between the pausing site and the first CTG is about 20 bp longer than the distance between the first CTG and the 5′-end of the primer. In this study, the 5′-end of the primer is located 64 bp from the first TCG triplet for (TCG·CGA)98 and 89 bp from the first CTG triplet for (CTG·CAG)130.
Table III notes: Primer extension analyses were performed as described under "Materials and Methods." n indicates the total number of triplets. pRW3431, which contains (AAG)16, was obtained by cloning the synthetic oligonucleotide GCTCT(AAG)16 … b Strong pausing was observed at 29-33 triplets from the beginning of the repeats. c Pausing was observed in the 3′-half of the synthesized strand. d Pausing was observed throughout the repeats.
⁴ S. M. Mirkin, unpublished data.
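The empirical primer-position relationship described above can be turned into a rough estimate: the pause lies about 20 bp further into the repeat than the primer's 5′-end lies before it. The Python sketch below takes the 20-bp offset from the text and treats it as exact, which it is not; it predicts pausing near triplet 28 for (TCG·CGA)98 (primer 64 bp away) and near triplet 36 for (CTG·CAG)130 (primer 89 bp away), close to, though not exactly at, the observed windows of 29-33 and 35-39 triplets. The function names are hypothetical.

```python
PAUSE_OFFSET_BP = 20  # "about 20 bp", the empirical offset reported in the text


def predicted_pause_bp(primer_to_repeat_bp: int, offset_bp: int = PAUSE_OFFSET_BP) -> int:
    """Estimated distance (bp) from the first triplet of the repeat to the pause site."""
    return primer_to_repeat_bp + offset_bp


def predicted_pause_triplet(primer_to_repeat_bp: int, offset_bp: int = PAUSE_OFFSET_BP) -> float:
    """The same estimate expressed in triplets counted from the start of the repeat."""
    return predicted_pause_bp(primer_to_repeat_bp, offset_bp) / 3


if __name__ == "__main__":
    cases = [("(TCG.CGA)98", 64, "29-33"), ("(CTG.CAG)130", 89, "35-39")]
    for label, primer_bp, observed in cases:
        estimate = predicted_pause_triplet(primer_bp)
        print(f"{label}: predicted ~{estimate:.0f} triplets, observed {observed}")
```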
In our previous study (34), we found that the pausing phenomenon for the CTG·CAG and CGG·CCG repeats was influenced by temperature; preincubation at 70°C for 10 min abolished the pausing for CTG·CAG but not for CGG·CCG, indicating that the heat treatment destroyed a structure (probably H-bonded) that blocked polymerase movement. The structure in the CGG·CCG repeats must therefore be more thermally stable than that in the CTG·CAG repeats. For (TCG·CGA)98, the pausing was also abolished by preincubation at 70°C for 10 min, as found for CTG·CAG (data not shown). Likewise, treatment of (TCG·CGA)98 at 60 or 50°C also abolished the pausing. These results indicate that the DNA structures formed by CTG·CAG and TCG·CGA repeats have similar thermal stabilities, as expected.
In contrast, no pausing of this type was seen for the other six triplet repeats (Table III). GTA·TAC, GAT·ATC, GTT·AAC, and CAC·GTG showed no pausing at all, whereas AGG·CCT and AAG·CTT showed different types of pausing (data not shown). Extension on the purine strands (AGG and AAG) as templates was terminated throughout the repeats. In contrast, extension on the pyrimidine tracts (CCT and CTT) showed pausing preferentially in the 3′-half of the extended strands. These terminations of DNA polymerization are likely due to inhibition of DNA synthesis by intramolecular triplex formation with G·G·C and A·A·T base triads (54, 58-60), since the primer extension analyses were performed at pH 7.5 in the presence of Mg2+ (60), and these patterns are diagnostic for triplexes. Hence, these data indicate that TCG·CGA, the sequence isomer of CTG·CAG, shares the polymerase-pausing property of CTG·CAG and CGG·CCG. Thus, of all 10 triplet repeats, these three may form an unusual DNA structure(s) that inhibits DNA polymerization, in agreement with other results.³ However, these structure(s) differ from the triplexes formed by AGG·CCT and AAG·CTT repeats.
DNA Structural Analyses of Triplet Repeat Sequences-Supercoil stress-induced transitions to non-B DNA structures (such as left-handed Z-DNA, triplexes, cruciforms, and unpaired AT-rich regions) can be monitored by two-dimensional agarose gel electrophoresis (22, 61-68). In addition, chemical probing can identify B-Z junctions and single-stranded or perturbed regions in other unusual DNA structures (22, 41, 61-67, 69, 70). The combination of the two analyses has provided powerful diagnostic tools for analyzing DNA structures at the bp level. We previously performed these analyses on CTG·CAG (up to 130 repeats) and CGG·CCG (up to 240 repeats) triplet repeats and observed neither agarose gel transitions nor chemical modifications (36).³ In contrast, TTA·TAA triplet repeat sequences formed unpaired regions throughout the repeats (37), as detected by these methods. Herein, these DNA structural analyses are extended to the other seven TRS.
For AAG·CTT repeats, the 5′-half of the (AAG)16 strand of negatively supercoiled pRW3431 (n = 16; −σ = ~0.060) was modified by CAA at pH 4.5 but not at pH 8.3, and a supercoil-induced transition was observed only at pH 4.5, at −σ = 0.027, with a relaxation of 3.5 supercoil turns (data not shown).
Thus, for both the AGG·CCT and AAG·CTT repeats, the modifications observed at pH 4.5 show that the pur·pyr tracts formed intramolecular triplexes consisting of C·G·C+ and T·A·T base triads. A number of studies with these methods (58, 59, 61, 62, 69, 70) on other intramolecular triplexes validate this conclusion. These triplexes likely differ from those inferred from the primer extension study at pH 7.5 described above.
Longer AAG·CTT repeats showed relaxations even at pH 8.3 (Table IV). For pRW3437 (n = 38), the transition occurred at topoisomer −16 (−σ = 0.060) with 8.5 supercoil turns relaxed, corresponding to unpairing, or re-pairing into a triplex, of 89 bp. At topoisomer −18 (−σ = 0.067), an additional two supercoils were relaxed, corresponding to 111 bp, i.e., involvement of the entire repeat region (data not shown). For pRW3439 (n = 58), the transition began at topoisomer −14 (−σ = 0.051) with six supercoil turns relaxed (corresponding to 63 bp), and with increasing supercoil density the relaxation continued until topoisomer −21 (−σ = 0.077), a total of 13 supercoil turns corresponding to 137 bp (Fig. 4). For pRW3193, pRW3437, and pRW3439, multiple spots at each linking number were observed when samples were run at pH 4.5. As shown in Fig. 4 for pRW3439, topoisomers between −4 (−σ = 0.015) (the beginning of the transition) and −18 (−σ = 0.066) (the end of the transition) showed more than one spot for DNA molecules having the same linking number, indicating that different conformers of the triplexes form owing to different nucleation sites, as seen previously for the (GA·TC)37 tract (62) and for (GAA)9TTC(GAA)8 and (GGA)9TCC(GGA)8 (61). Previous studies on shorter AGG·CCT and AAG·CTT repeat sequences (8 repeats, or 17 repeats interrupted by 3 bp) found no modifications in negatively supercoiled plasmids at pH >7.6 (41, 61), whereas in this study longer AAG·CTT triplet repeats (38 and 58 repeats) showed relaxations to form triplexes even at pH 8.3. This reflects the reduced dependence on low pH as the length of the pur·pyr tract increases, as found previously (62). Relaxation was not found for longer AGG·CCT repeats (30 and 53 repeats) at pH 8.3.
These differences between the AGG·CCT and AAG·CTT repeats may be due to their different G·C content, since the base composition of pur·pyr sequences affects the thermostability and the amount of supercoiling needed for intramolecular triplex formation (70).
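The conversions between relaxed supercoils and base pairs in the two-dimensional gel analyses above are consistent with a helical repeat of about 10.5 bp per turn for B-DNA (8.5 × 10.5 ≈ 89 bp, 6 × 10.5 = 63 bp, 13 × 10.5 ≈ 137 bp), and superhelical densities follow from σ = ΔLk/Lk0 with Lk0 = N/10.5 for a plasmid of N bp. The 10.5-bp value is our assumption, not stated in the text. A minimal Python sketch with hypothetical function names:

```python
HELICAL_REPEAT_BP = 10.5  # assumed B-DNA helical repeat (bp per turn)


def relaxed_bp(supercoil_turns: float, helical_repeat: float = HELICAL_REPEAT_BP) -> float:
    """Base pairs that must unpair (or re-pair into a triplex) to relax this many supercoils."""
    return supercoil_turns * helical_repeat


def superhelical_density(delta_lk: float, plasmid_bp: float,
                         helical_repeat: float = HELICAL_REPEAT_BP) -> float:
    """sigma = delta Lk / Lk0 with Lk0 = N / helical repeat; negative for negative supercoiling."""
    lk0 = plasmid_bp / helical_repeat
    return delta_lk / lk0


def implied_plasmid_bp(delta_lk: float, sigma: float,
                       helical_repeat: float = HELICAL_REPEAT_BP) -> float:
    """Plasmid length implied by a topoisomer number and its reported sigma."""
    return delta_lk * helical_repeat / sigma


if __name__ == "__main__":
    for turns in (8.5, 6, 13):
        print(turns, "turns ->", relaxed_bp(turns), "bp")
    # Topoisomer -16 reported at -sigma = 0.060 implies a plasmid of about 2,800 bp.
    print(round(implied_plasmid_bp(-16, -0.060)))
```

An implied size of about 2,800 bp is roughly what one would expect for pUC19 (2,686 bp) carrying an insert of ~100 bp, which supports the 10.5-bp assumption as a working approximation.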
In summary, of the seven TRS studied herein, AGG·CCT and AAG·CTT form intramolecular triplexes, and the other five TRS do not form any previously characterized non-B DNA structures.

DISCUSSION
The role of DNA structure and properties in gene expression has been the principal emphasis of this laboratory (5,33,69). On the basis of in vitro and in vivo investigations with simple repeating nucleotide sequences (mono- through tetra-, and dodeca-) in DNA oligomers and polymers, restriction fragments, plasmids, and chromosomes, we anticipated that long tracts of certain TRS (CTG·CAG, CGG·CCG, and AAG·CTT) that elicit human hereditary neuromuscular and neurodegenerative diseases might adopt non-B conformations. A series of physical, biochemical, and genetic investigations (5) has revealed several significant features that provide insight into the genetic instabilities underlying the clinical characteristic of anticipation (1-4,6). We consider studies on all 10 TRS important for two reasons. First, one or more of the other seven TRS (GTA·TAC, GAT·ATC, GTT·AAC, CAC·GTG, AGG·CCT, TCG·CGA, and TTA·TAA) may be linked to a genetic disease gene in the future (e.g. AAG·CTT was very recently identified (7) as the cause of Friedreich's ataxia). Second, they serve as controls for interpreting data obtained with the other TRS. This contribution and the accompanying paper on TTA·TAA (37), together with prior work on CTG·CAG (9,24) and CGG·CCG (34,36,42), describe the cloning and partial characterization of all 10 TRS.
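Why exactly 10 TRS? Excluding the four mononucleotide triplets leaves 60 triplets; each is equivalent to two others by reading frame and to three more on the complementary strand, so they fall into 60/6 = 10 classes. The short Python sketch below enumerates these classes using this standard frame-and-strand convention; it is an illustration, not a procedure from this paper, and the function names are our own. The test checks that the 10 TRS named in this paper map one-to-one onto the 10 classes.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def rotations(unit: str) -> list[str]:
    """All reading frames of a repeat unit, e.g. CTG -> CTG, TGC, GCT."""
    return [unit[i:] + unit[:i] for i in range(len(unit))]


def trs_class(unit: str) -> str:
    """Canonical label for a triplet repeat: the lexicographically smallest
    frame of either strand, so (CTG)n, (TGC)n, (CAG)n, ... share one label."""
    return min(rotations(unit) + rotations(revcomp(unit)))


def all_trs_classes() -> set[str]:
    """Distinct triplet repeat classes over all 64 triplets."""
    triplets = ("".join(p) for p in product("ACGT", repeat=3))
    # Mononucleotide repeats (AAA, CCC, GGG, TTT) are not triplet repeats.
    return {trs_class(t) for t in triplets if len(set(t)) > 1}


if __name__ == "__main__":
    classes = sorted(all_trs_classes())
    print(len(classes), classes)
```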
Our cloning strategy was designed to use synthetic oligonucleotides that would not slip (owing to the presence of a restriction site; Fig. 1), enabling the cloning of long inserts of pure TRS (without polymorphisms) by repetitive steps (Fig. 2). Because long tracts (hundreds of repeats) have been found only for the three TRS implicated in the hereditary diseases (Introduction), synthetic oligomers were required for the other TRS. Our cloning strategies (Fig. 2) successfully yielded all seven TRS with ∼60 to ∼90 repeats. However, some of the plasmids contained polymorphisms, especially those with longer TRS. This behavior in E. coli resembles observations in eucaryotes, including humans (44,45), that a few imperfect repeats are apparently necessary for genetic stability. Also, for the fragile X and SCA1 genes, polymorphisms are thought to stabilize the TRS, and their loss to cause instability (expansions) (36,43-45). Interestingly, in the Friedreich's ataxia case (AAG·CTT), the polymorphisms always preserve a homopurine strand, with the pyrimidines located exclusively on the complementary strand (7); we observed the same behavior in E. coli (Table I).
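The two properties discussed above (interruptions in an otherwise pure tract, and whether those interruptions keep all purines on one strand) can be checked mechanically. The Python sketch below is illustrative only: the helper names and the example tract, a hypothetical (AAG)₅AGG(AAG)₄, are our own and are not taken from Table I.

```python
PURINES = {"A", "G"}


def triplets(seq: str) -> list[str]:
    """Split a tract into consecutive triplets starting at position 0
    (any trailing partial triplet is ignored)."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def interruptions(seq: str, unit: str) -> list[tuple[int, str]]:
    """(triplet index, triplet) for every triplet that differs from `unit`."""
    return [(i, t) for i, t in enumerate(triplets(seq)) if t != unit.upper()]


def is_homopurine(seq: str) -> bool:
    """True if this strand is all purines, i.e. the partner strand is all pyrimidines."""
    return set(seq.upper()) <= PURINES


if __name__ == "__main__":
    tract = "AAG" * 5 + "AGG" + "AAG" * 4   # hypothetical interrupted tract
    print(interruptions(tract, "AAG"))      # one AGG interruption at triplet 5
    print(is_homopurine(tract))             # AGG keeps the strand homopurine
```

An AGG interruption leaves the strand homopurine, whereas an interruption such as AAC would not.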
Another cloning strategy, using synthetic oligonucleotides containing BbsI and BsaI sites along with the TRS, was reported (71) for the preparation of CTG·CAG tracts. The method multimerizes TRS-containing fragments by ligation to give longer tracts through repetitive cloning steps and was proposed to be useful for cloning any repeat sequence. We⁵ attempted a similar strategy with BbsI and SapI (instead of BsaI) to clone long tracts of CGG·CCG but were not successful. However, other methodologies worked well (36).
For AAG·CTT repeats, the strategy was modified from that used for the other six TRS because no appropriate enzyme site exists for AAG·CTT (Fig. 2B). As a result, AGG·CCT interruptions were necessarily present in the AAG·CTT tracts (Table II). Since AAG·CTT repeats showed a propensity, albeit small, to expand in E. coli (8), (AAG·CTT)₁₀₃ (pRW3471) was obtained from (AAG·CTT)₅₈ (pRW3439) by in vivo expansion. Investigations of the mechanism(s) of genetic expansion (8,9,72) of this sequence are likely to be interesting because (a) it forms a triplex (61,70) and (b) Friedreich's ataxia (FRDA) patients have 200-900 repeats in the first intron of the frataxin (210-amino acid) gene, whereas normal individuals have 7-20 repeats (7).
We propose that the reduced abundance of mature X25 mRNA in individuals with FRDA (7) is caused by formation of an intermolecular triplex between the AAG·CTT repeat in the first X25 DNA intron and the RNA segment containing the GAA tract (Fig. 5) that is removed by splicing. Prior work (73,74) showed that the presence of a triplex inhibits transcription. For the long r(AAG) tracts (600-800 repeats) from FRDA cases, the triplex may be thermodynamically stable enough to reduce the abundance of the FRDA mature mRNA, whereas for the shorter r(AAG) stretches from normal individuals (6-20 repeats), the triplex may be unstable and cause no inhibition. Possible pairing schemes (reverse Hoogsteen) are shown for the T·A·A and C·G·G triads.
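The pairing logic of the proposed triplex can be sketched in a few lines. In the purine-motif (reverse-Hoogsteen) geometry, a purine third strand runs antiparallel to the purine strand of the pur·pyr duplex and matches it base for base (A over A·T, G over G·C), giving the T·A·A and C·G·G triads named above. The Python sketch below encodes only this textbook rule; it is not the authors' model, and the function names and the four-repeat example are hypothetical. Reading (GAA)n backwards gives (AAG)n, the same repeat in another frame, so a GAA-containing RNA can pair in register along the tract.

```python
# Purine-motif (reverse-Hoogsteen) triplex: a purine third strand pairs with the
# purine strand of a pur·pyr duplex. Triads are written
# pyrimidine·purine·third-strand purine, as in the text.
TRIAD_FOR_DUPLEX_PURINE = {"A": "T·A·A", "G": "C·G·G"}


def _check_purine(strand: str) -> str:
    """Upper-case the strand and reject any pyrimidine."""
    strand = strand.upper()
    if not set(strand) <= {"A", "G"}:
        raise ValueError("purine-motif triplex requires an all-purine strand")
    return strand


def third_strand(duplex_purine_5to3: str) -> str:
    """Sequence (5'->3') of a purine third strand that runs antiparallel to the
    duplex purine strand and matches it base for base (A over A, G over G)."""
    return _check_purine(duplex_purine_5to3)[::-1]


def triads(duplex_purine_5to3: str) -> list[str]:
    """Triad formed at each position of the duplex purine strand, 5'->3'."""
    return [TRIAD_FOR_DUPLEX_PURINE[b] for b in _check_purine(duplex_purine_5to3)]


if __name__ == "__main__":
    purine_strand = "GAA" * 4            # short, illustrative d(GAA)n strand
    print(third_strand(purine_strand))   # AAGAAGAAGAAG: the same repeat, AAG frame
    print(triads("GAA"))
```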
Our E. coli in vivo expansion system (8,9,72) has been valuable for the preparation of long TRS. Prior work showed that CTG·CAG repeats were expanded about nine times more frequently than any of the other nine TRS; TCG·CGA, CGG·CCG, and CAC·GTG were also prone to expansion, but less frequently; of the remaining TRS, only AAG·CTT expansion was detected, and at a very low level (8). In the present study, in addition to those TRS, GTA·TAC expansion was observed in one experiment. For GAT·ATC, small expansions (∼6 repeats) occurred. Since any TRS may be sufficiently unstable to slip, depending on the sequence (75), we propose that these expansions occurred during DNA replication. In the present and other studies (8,72) on in vivo expansion, we focused on expansion products larger than 20 repeats rather than on small expansions such as those seen here for GAT·ATC. Because the expansion of CTG·CAG repeats in E. coli occurs as a single large event distal to the replication origin (72), and because expansion occurs more frequently with increasing repeat number (9), this strategy is highly desirable for cloning longer expansion products. Once suitable repeat lengths have been cloned from synthetic oligomers, in vivo expansion can be applied. An advantage of the in vivo expansion approach is that fewer polymorphisms are found than with synthetic oligonucleotides, and the expansion products consist of whole triplets except where mono- or dinucleotide interruptions occur in the TRS. Thus, combining cloning with synthetic oligonucleotides and in vivo expansion was useful for obtaining long tracts of TRS.
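Two of the criteria above lend themselves to simple bookkeeping: whether an expansion added whole triplets, and whether it exceeds the 20-repeat size of interest. The Python sketch below assumes that lengths are tract lengths in bp and reads "larger than 20 repeats" as the number of repeats gained; both are our interpretation, and the function name is hypothetical. The first usage case is the pRW3439 to pRW3471 expansion, (AAG·CTT)₅₈ to (AAG·CTT)₁₀₃; the others are made-up values for a small expansion and for a gain that includes a one-nucleotide interruption.

```python
LARGE_EXPANSION_REPEATS = 20  # our reading of the size cut-off used in the text


def classify_expansion(parent_bp: int, product_bp: int, unit_len: int = 3) -> dict:
    """Summarize an expansion from parent to product repeat-tract length (bp)."""
    if product_bp < parent_bp:
        raise ValueError("product is shorter than parent: a deletion, not an expansion")
    gained_bp = product_bp - parent_bp
    whole_units, remainder_bp = divmod(gained_bp, unit_len)
    return {
        "gained_bp": gained_bp,
        "gained_repeats": whole_units,
        # A nonzero remainder signals a mono- or dinucleotide change in the tract.
        "pure_triplet_gain": remainder_bp == 0,
        "large": whole_units > LARGE_EXPANSION_REPEATS,
    }


if __name__ == "__main__":
    print(classify_expansion(58 * 3, 103 * 3))  # pRW3439 -> pRW3471
    print(classify_expansion(180, 198))         # hypothetical small (6-repeat) gain
    print(classify_expansion(174, 310))         # hypothetical gain with a 1-nt change
```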
CTG·CAG and CGG·CCG repeats caused strong, length-dependent pausing of DNA polymerases within the repeats (34).
Herein, we showed that TCG·CGA, the sequence isomer of CTG·CAG, also caused similar pausing (Fig. 3). The pausing was abolished by heat treatment (70 °C). AAG·CTT and AGG·CCT gave pausing patterns different from those seen for CTG·CAG, CGG·CCG, and TCG·CGA, indicating that these pauses derived from triplex structures. No pausing was observed for the other TRS. We previously proposed that the pausing was due to a non-B DNA structure(s) that blocks DNA polymerase progression; the resulting idling polymerase may catalyze slippage to give expanded sequences. Although CAC·GTG, AAG·CTT, and GTA·TAC repeats were expanded in E. coli, pausing of the kind seen for CTG·CAG, CGG·CCG, and TCG·CGA was not found for these inserts, except for AAG·CTT. The locations of the pause sites were related to the location of the primer (34). Recent studies⁶ on the sequences of the paused, newly synthesized DNA products revealed that template switching had occurred. We envision that the snap-back structure was caused by the polymerase encountering a non-B DNA structure in the TRS. Simple repeating DNA sequences adopt non-B DNA conformations (reviewed in Refs. 5,21,22,33). Among TRS, long tracts of CTG·CAG, CGG·CCG, and TTA·TAA have been shown to form non-B DNA structures. Our present chemical probe and two-dimensional gel electrophoretic analyses revealed fundamental structural features of the TRS in vitro. Neither specific reactivities nor relaxations were observed except for AGG·CCT and AAG·CTT, which formed triplex structures. At acidic pH, pur·pyr tracts, including short tracts of AAG·CTT and AGG·CCT (41,61,70), have been shown to adopt triplexes. In this study, the longer AAG·CTT tracts (n = 38 and 58) showed relaxations even at pH 8.3, presumably due to triplex formation. Prior studies (62) showed that low pH is not required for intramolecular triplex formation by certain sequences of appropriate length. Thus, the long AAG·CTT repeats (∼800 repeats) in Friedreich's ataxia patients (7) might be expected to form a non-B DNA conformation(s) in vivo.

⁵ R. Gellibolian and R. D. Wells, unpublished data.