Flexible DNA: Genetically Unstable CTG·CAG and CGG·CCG from Human Hereditary Neuromuscular Disease Genes*

The properties of duplex CTG·CAG and CGG·CCG, which are involved in the etiology of several hereditary neurodegenerative diseases, were investigated by a variety of methods, including circularization kinetics, apparent helical repeat determination, and polyacrylamide gel electrophoresis. The bending moduli were 1.13 × 10⁻¹⁹ erg·cm for CTG and 1.27 × 10⁻¹⁹ erg·cm for CGG, ∼40% less than for random B-DNA. Also, the persistence lengths of the triplet repeat sequences were ∼60% the value for random B-DNA. However, the torsional moduli and the helical repeats were 2.3 × 10⁻¹⁹ erg·cm and 10.4 base pairs (bp)/turn for CTG and 2.4 × 10⁻¹⁹ erg·cm and 10.3 bp/turn for CGG, respectively, all within the range for random B-DNA. Determination of the apparent helical repeat by the band shift assay indicated that the writhe of the repeats was different from that of random B-DNA. In addition, molecules of 224-245 bp in length (64-71 triplet repeats) were able to form topological isomers upon cyclization. The low bending moduli are consistent with predictions from crystallographic variations in slide, roll, and tilt. No unpaired bases or non-B-DNA structures could be detected by chemical and enzymatic probe analyses, two-dimensional agarose gel electrophoresis, and immunological studies. Hence, CTG and CGG are more flexible and highly writhed than random B-DNA and thus would be expected to act as sinks for the accumulation of superhelical density.

Eleven human genetic disorders (including fragile X syndrome, myotonic dystrophy, Kennedy's disease, Huntington's disease, spinocerebellar ataxia type 1, dentatorubral-pallidoluysian atrophy, and Friedreich's ataxia) are characterized at the molecular level by the expansion of DNA triplet repeats (CTG, CGG, or AAG) from <15 copies in normal individuals to scores of copies in affected cases (1-6). In some cases, the CTG and CGG tracts are transcribed into mature mRNAs, whereas the AAG tracts in Friedreich's ataxia are in the first intron of the frataxin gene. The mechanism for expansion is not known, but it may involve slippage of the complementary strands during DNA synthesis (7-10). Expanded alleles undergo further expansions upon passage to offspring and, in some diseases, are associated with the clinical observation called anticipation, whereby the symptoms become more severe in each successive generation and with an earlier age of onset (1-5). This is a novel type of mutation and shows non-mendelian genetic transmission (11,12).
Prior investigations suggested that triplet repeat sequences (TRS) do not have the properties of random B-DNA. First, CTG tracts greatly facilitate nucleosome assembly (13-15), which, in turn, may repress transcription. Second, DNA synthesis in vitro pauses at specific loci in fragments containing CTG and CGG (16). Third, long tracts of AAG and AGG form intramolecular triplexes that arrest DNA synthesis (17). Fourth, CTG and CGG migrate up to 30% more rapidly than expected on polyacrylamide gel electrophoresis, whereas their migration is normal on agarose gels (18). Fifth, CTG is preferentially expanded in Escherichia coli compared with the other nine TRS (8). Sixth, the frequency of expansions and deletions in E. coli (7,9,10) is influenced by the direction of replication, suggesting the formation of stable hairpin loops in the lagging strand template or the newly synthesized nascent strand.
Conformational investigations were conducted on plasmids and restriction fragments containing CTG and CGG to evaluate their role in the biological behaviors described above. Several methods were applied, including circularization kinetics, apparent helical repeat determinations, the rate of migration through acrylamide and agarose gel electrophoresis, chemical and enzymatic probe analyses, two-dimensional gel electrophoresis, and the induction of an immune response. The analyses indicate that both CTG and CGG exist as fully paired, right-handed B-helices. However, their flexibilities are substantially greater than that of random B-DNA, and this causes the TRS to be more writhed. As a result, the average superhelical density of a DNA domain containing a TRS region will be unevenly distributed, a higher density being concentrated within the TRS tracts. This finding is unprecedented and leads to the hypothesis that part of the biological response elicited by CTG and CGG is mediated by topological features associated with their increased flexibility.

Cloning of Recombinant Plasmids
Recombinant plasmids with (CTG)n and (CGG)n inserts used for the cyclization experiments were obtained by cloning a synthetic duplex that had XbaI and BamHI ends flanking (CTG)36 or (CGG)24 into pUC19-NotI cleaved with XbaI and BamHI. The top-strand sequence of the 5′-XbaI → 3′-BamHI insert was TCTAGAGGATCGCTCTTCG(TRS)nCGAAGAGCGGATCGCTAGCGGATCC. An NheI site (GCTAGC) 3′ to the TRS allowed the TRS-containing fragments to self-anneal via the complementary CTAG overhangs resulting from XbaI-NheI cleavage. Plasmids with longer lengths of TRS were obtained as described (9,19). Inserts were sequenced on both strands.
Cloning of the 32 plasmids containing (CTG)n and (CGG)n inserts used for the apparent helical repeat determination has been reported (7-10, 19). In addition, five plasmids harboring random sequence DNA inserts were obtained by cloning HaeIII restriction fragments of pUC18 into the HincII site of pUC19-NotI.

Kinetics of Circularization: Theory of Ring Closure
In a random-coil chain, the distribution (W) of the end-to-end distance (v) is given by the normalized Gaussian function (20,21),

$$W(v)\,dv = \left(\frac{\delta}{\pi^{1/2}}\right)^{3} \exp\left(-\delta^{2}v^{2}\right) 4\pi v^{2}\,dv$$

where δ is related to the mean square end-to-end distance (⟨v²⟩₀) by δ² = 3/(2⟨v²⟩₀) (⟨v²⟩₀ = n_i λ⁻²), n_i is the number of statistical segments (Kuhn segments), and λ⁻¹ is the length of a statistical segment in Å. The probability (W(0)) that the ends are confined to a volume dV at a distance v (v → 0) is as follows.

$$W(0)\,dV = \left(\frac{3}{2\pi n_i}\right)^{3/2} \lambda^{3}\,dV$$
For a linear DNA duplex with cohesive ends, free in solution, the term (3/(2πn_i))^(3/2) λ³, or J-factor, specifies the concentration of intramolecular ends in dV and is directly correlated to the equilibrium constant K_c for the reaction M ⇌ C (K_c = [C]/[M]), where C and M are circular and linear monomers, respectively. The association and dissociation rate constants between any two ends are determined by their homogeneous distribution throughout the volume of the system, i.e. by the equilibrium constant for the intermolecular association 2M ⇌ D (K_a = [D]/[M]²), where [D] is the concentration of linear dimers. It is assumed that the noncovalent interactions formed during the intramolecular and intermolecular reactions are identical and that the entropy change (ΔS) at the reactive site is the only factor determining the rate constants (no influence from length and composition of the molecule). Under these conditions, K_c = K_a J, whereby the concentration of circular monomers J is given by the ratio of the intramolecular to the intermolecular equilibrium constants (22). In this formulation, K_a is related to K*, the observed equilibrium constant, by a factor that depends on the permutation number by which the monomers can associate to give the same dimer. Under steady-state kinetic conditions, K_c ≅ k₁ and K_a ≅ k₂ for Reactions A and B,

Reaction A: M ⇌ C (forward rate constant k₁)
Reaction B: 2M ⇌ D (forward rate constant k₂)

so that the J-factor is defined as the ratio between the two forward rate constants, J = k₁/k₂ (23, 24).
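The Gaussian-chain expression for the J-factor can be evaluated numerically. The following is a minimal sketch, not the authors' procedure: the function name and the example lengths are illustrative assumptions, and the Gaussian limit only holds for chains many Kuhn segments long, so it does not apply to the short fragments studied here, which require the worm-like chain treatment described under "Interpolation Formulas."

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole

def gaussian_j_factor_molar(n_bp, kuhn_length_A, rise_A=3.4):
    """J-factor (mol/L) of an ideal Gaussian chain.

    Implements (3 / (2*pi*n_i))**1.5 * lambda**3, where n_i is the number of
    Kuhn segments and 1/lambda is the Kuhn segment length in Angstrom.
    """
    contour_A = n_bp * rise_A                 # contour length in Angstrom
    n_i = contour_A / kuhn_length_A           # number of statistical segments
    lam = 1.0 / kuhn_length_A                 # lambda in 1/Angstrom
    ends_per_A3 = (3.0 / (2.0 * math.pi * n_i)) ** 1.5 * lam ** 3
    # 1 liter = 1e27 cubic Angstrom; divide by Avogadro to obtain mol/L
    return ends_per_A3 * 1e27 / AVOGADRO

# Hypothetical long fragments, chosen only so that the Gaussian limit is reasonable
for n_bp in (5000, 10000):
    print(n_bp, gaussian_j_factor_molar(n_bp, kuhn_length_A=950.0))
```

In this limit J falls off as the −3/2 power of chain length, so doubling the length lowers J by a factor of 2^1.5.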
Determination of k₁-k₁ is defined as follows, where ... Purification and quantitation were as described before. Half-molecules of (CGG)40 were obtained similarly, except that plasmid DNA was cleaved with NheI first and then filled in and cleaved with XbaI. Equimolar amounts (1.5-6.0 × 10⁻⁹ M) of half-molecules of (CGG)24 and (CGG)40 were mixed in 100 μl of buffer A and processed as described for k₁. Concentrations of T4 DNA ligase were 1 and 2 × 10⁻⁶ M. Identical procedures were applied to half-molecules of (CTG)36 and (CTG)56. k₂ was calculated for the association of (CGG)24 (A) with (CGG)40 (B) and of (CTG)36 (A) with (CTG)56 (B) from the slope of the integrated form of Equation 4 times the initial DNA concentrations. The values obtained were then divided by 2 due to the asymmetry of the molecules. The molar J-factors (J(M)) were obtained from the ratio k₁/k₂. Note that k₂ is a constant, whereas k₁ is unique to each length of DNA.

Interpolation Formulas
The log J(M) values were plotted against log n_bp. Log J(M) is a complex oscillatory function of log n_bp. Interpolation of log J(M) with the equations derived by Shimada and Yamakawa (25) for the ring-closure probabilities of a twisted worm-like chain yields the bending modulus α, the torsional modulus β, the length of the Kuhn segment λ⁻¹ (λ⁻¹ = 2P, P being the persistence length), and the helical repeat h₀ of the DNA. Three successive computational steps were performed.
Step A: Evaluation of λ⁻¹. Equations 5-9 were used as the starting functions to construct the theoretical curve for J₁(L). J₁(L) evaluates the behavior of the contour length of the DNA and is not complicated by the twist dependence of cyclization. The dependence on twist arises from the fact that linear DNA molecules with a non-integral number of helical turns need to untwist (or overtwist) to cyclize. G(0,u₀|u₀;L) expresses the length-dependent probability of ring closure for a polymer with the end tangents specified. L denotes the reduced contour length, defined as the ratio of the contour length of the DNA chain (n_bp × 3.4 Å) to the length of the Kuhn segment, and f₀ₖ are numerical constants. J₁(L) was then transformed into J₁ by the conversion factor 10²⁷λ³/N_A (28), where N_A is Avogadro's number. Log J(M) oscillates around log J₁. Thus, by varying λ⁻¹ in the conversion factor, a log J₁ curve may be found that runs midway through the log J(M) values. During the transformation, L (x axis) was also converted to log n_bp according to the value of λ⁻¹.
Step B: Evaluation of σ. σ is Poisson's ratio and is related to the bending and torsional moduli by α/β = 1 + σ. σ establishes the upper and lower boundaries for the oscillating log J(M). Here, r is a periodic function of L (0 ≤ r ≤ 0.5) that reproduces the varying fractional helical turn (and therefore twist) of the DNA chain. For the evaluation of the upper and lower boundaries, r was set equal to 0 and 0.5 to follow the log J(M) values of DNA fragments with 0 and 0.5 fractional helical turns, respectively. Equations 10-14 were used to construct a theoretical J(L)* function, and the previous conversion factor was then used to transform J(L)* into log J(M)*.
Lk, the linking number, expresses the number of times the two strands of the helix revolve about one another in circular DNA. This number is always an integer. Lk₀ is the linking number of a linear DNA in its unconstrained state and need not be an integer. ΔLk = Lk − Lk₀. ΔTw is the difference in twist between linear and circular DNA. ΔTw = ΔLk − Wr, Wr (writhe) being the measure of the deviation of the helix axis from planarity. For small circles of random DNA, Wr is approximated to 0, so that ΔTw = ΔLk. J_Lk(L), the linking number-dependent J-factor, is defined as follows, and G(0,Ω₀|Ω₀;Lk,L), the linking number-dependent ring-closure probability, is as follows, where a₀ⱼ, a₁ⱼ⁽⁰⁾, and a₁ⱼ⁽¹⁾ are numerical constants.
Step C: Evaluation of τ₀. τ₀ is the constant torsion and determines the period of oscillation of log J(M). τ₀ is related to the helical repeat (h₀) of the DNA by τ₀ = 2π/(h₀ l_bp), where l_bp is the distance between base pairs, 3.4 Å. The theoretical J(L) value was evaluated with Equations 10-14, where r was taken in small increments according to the following, where k is an integer ≥0.
The previous conversion factor was again used to convert J(L) into log J(M). τ₀ was varied to find a good fit to the experimental log J(M) values. For illustration purposes, log n_bp was converted to n_bp in the figures. For all of the DNAs, the r + 1 term in Equation 10 was omitted because |ΔLk|/(1 + σ) ≥ √3. This has been shown to cause large errors in the extrapolations (25).
Satisfactory values for λ⁻¹, σ, and τ₀ were estimated by visual inspection of the fits, and no statistical tests were performed. It should be noted that the torsional (β) moduli obtained by these fits were identical to those estimated manually based on pairs of molecules having r (ΔTw) values of 0 and 0.5 (23).
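The relation between the constant torsion and the helical repeat in Step C is a one-line conversion. The sketch below is an illustration rather than part of the fitting procedure: `tau0` is an assumed name for the constant torsion, and the helical repeats are the fitted values quoted in this paper (10.41, 10.35, and 10.46 bp/turn for CTG, CGG, and random DNA).

```python
import math

RISE_PER_BP_A = 3.4  # distance between base pairs in Angstrom, as stated in the text

def constant_torsion(h0_bp_per_turn, rise_A=RISE_PER_BP_A):
    """Constant torsion tau0 in rad/Angstrom: 2*pi / (h0 * l_bp)."""
    return 2.0 * math.pi / (h0_bp_per_turn * rise_A)

def twist_per_bp_deg(h0_bp_per_turn):
    """Average twist angle between adjacent base pairs, in degrees."""
    return 360.0 / h0_bp_per_turn

for name, h0 in [("CTG", 10.41), ("CGG", 10.35), ("random", 10.46)]:
    tau0 = constant_torsion(h0)
    print(f"{name}: tau0 = {tau0:.4f} rad/A, twist = {twist_per_bp_deg(h0):.2f} deg/bp")
```

A smaller helical repeat gives a larger torsion, which shortens the period of the oscillation of log J(M) in base pairs.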

Variance of Writhe (⟨Wr²⟩) and Free Energy of Supercoiling (n_bp K/RT)
These computations are reported in detail in the accompanying paper (26).

Apparent Helical Repeat Determination
30 μg of plasmid DNA was treated with chicken erythrocyte DNA topoisomerase I at 0 °C overnight and purified by phenol/chloroform extraction and ethanol precipitation. The resulting set of topoisomers was resolved by agarose gel electrophoresis. This was performed with 0.5 μg of DNA at 23 °C (60 V for 18 h) on 1% agarose containing 40 mM Tris-HCl, 25 mM sodium acetate, and 1 mM EDTA at pH 8.3. The gel dimensions were 20 cm × 22 cm × 3 mm. Positively supercoiled molecules were obtained by relaxing the plasmids at 0 °C and performing the electrophoresis at 23 °C, whereas negatively supercoiled plasmids were generated by relaxation at 37 °C and electrophoresis at 23 °C. Average unwinding was 0.012°/°C/bp. Apparent helical repeat (h₀) was calculated (27) from an average of three determinations (with an S.D. of ±0.1 bp/turn). Methylation of the CGG tracts was performed by treating plasmid DNA with SssI methylase in the presence of S-adenosylmethionine at 37 °C overnight. To check for complete reaction, cleavage studies with AciI (CCGC) and PAGE analysis were conducted.
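The temperature shift between relaxation and electrophoresis sets the sign and approximate size of the supercoiling. The following sketch applies the stated unwinding of 0.012°/°C/bp; the plasmid length of 2686 bp (unmodified pUC19) is an illustrative assumption, since the sizes of the actual constructs are not given in this section.

```python
UNWINDING_DEG_PER_C_PER_BP = 0.012  # average thermal unwinding stated in the text

def linking_difference(n_bp, relax_temp_c, gel_temp_c,
                       unwinding=UNWINDING_DEG_PER_C_PER_BP):
    """Approximate linking difference (in turns) observed at the gel temperature
    for a plasmid relaxed at relax_temp_c.

    The helix unwinds on warming, so relaxing cold and running warm leaves the
    plasmid with an excess of links (positive supercoils), and vice versa.
    """
    delta_t = gel_temp_c - relax_temp_c
    return unwinding * delta_t * n_bp / 360.0

ASSUMED_PLASMID_BP = 2686  # assumption: length of unmodified pUC19

print(linking_difference(ASSUMED_PLASMID_BP, relax_temp_c=0.0, gel_temp_c=23.0))
print(linking_difference(ASSUMED_PLASMID_BP, relax_temp_c=37.0, gel_temp_c=23.0))
```

For this assumed length, the two protocols yield roughly +2 and −1.3 turns, consistent with the positively and negatively supercoiled populations described above.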

Other Methods
Chemical probe analyses, polyacrylamide gel electrophoresis, and induction of mouse antibodies were conducted as referred to below.
... repeats (140-272 bp long) and 19 plasmids containing 24-73 consecutive CGG repeats (104-251 bp long) were prepared (Table I). The TRS-containing restriction fragments had a protruding CTAG at each end that allowed the intramolecular or intermolecular association (Reactions A and B under "Experimental Procedures") to take place. k₁ and k₂ were measured under conditions in which the intermediates C and D were converted into covalently closed products by T4 DNA ligase (23). Fig. 1 shows representative time course reactions and plots. For k₂, two molecules of different lengths were chosen and blunted at one end. This scheme enables the molecules to dimerize via the single CTAG end, but disallows the intramolecular circularization. Fig. 1A shows a PAGE separation of the products between (CGG)24 and (CGG)40. Three ligated species were observed, which correspond to dimers of (CGG)24, dimers of (CGG)24 and (CGG)40, and dimers of (CGG)40 in the ratio of 1:2:1 (20,25). In general, higher order aggregates were also detected at concentrations within a few percent of the linear dimers. These aggregates may have originated from blunt-end ligations.
The rate of disappearance of the combined linear monomers was plotted as a function of time. Fig. 1B shows the results for (CGG)24 plus (CGG)40. The average k₂ obtained from four determinations, two with (CTG)n and two with (CGG)n fragments, was (0.89 ± 0.48) × 10² M⁻¹ s⁻¹ when normalized for an enzyme concentration of 1 × 10⁻⁹ M. The error associated with k₂ was quite large and may be a reflection of both experimental variation and differences in T4 DNA ligase-DNA interactions between the (CTG·CAG)n and (CGG·CCG)n sequences. Nevertheless, the average value was comparable (∼20% lower) to the value of 1.12 × 10² M⁻¹ s⁻¹ measured previously (28) for DNAs with different sequences. k₁ was measured on each of the restriction fragments in Table I under conditions that inhibited the intermolecular association, i.e. low concentrations of DNA and ligase. Fig. 1C shows the electrophoretic separation of a kinetic reaction for (CTG)59. One major ligated product was seen, corresponding to circularized (CTG)59. A fainter band was also evident, which was the result of dimerization events. For some other fragments, linear trimers and multimers also were seen, depending on the amount of ligase that was needed to detect the cyclized monomer. The rate of accumulation of the circular product was quantitated and plotted as a function of time. Fig. 1D shows the time-dependent circle formation for (CGG)59 at three ligase concentrations. In this experiment, k₁ obtained from the three plots was (2.39 ± 0.32) × 10⁻⁵ s⁻¹ when normalized for 1.0 × 10⁻⁹ M T4 DNA ligase. In general, the errors associated with k₁ were much smaller than for k₂.
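As a worked example of the ratio J = k₁/k₂, the two rate constants quoted above (both normalized to 1 × 10⁻⁹ M T4 DNA ligase) can be combined directly. This is only an arithmetic sketch with illustrative names; it ignores the uncertainties on both constants.

```python
import math

K1_PER_S = 2.39e-5        # s^-1, cyclization rate constant quoted in the text
K2_PER_M_PER_S = 0.89e2   # M^-1 s^-1, average bimolecular rate constant quoted in the text

def molar_j_factor(k1, k2):
    """Molar J-factor as the ratio of the two forward rate constants."""
    return k1 / k2

j_m = molar_j_factor(K1_PER_S, K2_PER_M_PER_S)
print(f"J(M) = {j_m:.3e} M, log J(M) = {math.log10(j_m):.2f}")
```

The result is close to 2.7 × 10⁻⁷ M, which is the kind of value plotted as log J(M) against fragment length.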
The log J(M) (k₁/k₂) values are graphed in Fig. 2 as a function of the number of base pairs. Also shown (bottom panel) are the molar J-factors for random sequence DNA of comparable lengths as determined previously (23). This panel was included for comparison.

To estimate the torsional and bending moduli, we fit the experimental data to the equations for the ring-closure probabilities of a twisted worm-like chain (25). The oscillating solid line in Fig. 2 (top two panels) represents the fit to the experimental data as specified by the parameters τ₀, λ⁻¹, and σ. τ₀, or constant torsion, describes the twist angle between adjacent base pairs and is related to the number of bp/turn in the DNA helix (h₀). λ⁻¹ is related to the persistence length (P) and, therefore, to the bending flexibility of the DNA. Finally, σ, or Poisson's ratio, is related to the torsional stiffness of the chain.

FIG. 1. ... M₁, linear monomer of (CGG)24; M₂, linear monomer of (CGG)40; D₁, linear dimer of (CGG)24; D₂, linear dimer of (CGG)24 and (CGG)40; D₃, linear dimer of (CGG)40. B, integrated rate of disappearance of the combined monomers of (CGG)24 and (CGG)40 ...
The upper and lower boundaries in log J(M) (dotted lines in the top two panels) correspond to the values for linear molecules with integral numbers of helical turns (upper) and those with a fractional 0.5 helical turn (lower). The fits describe the experimental data very well.
The values of λ⁻¹, σ, and τ₀ (converted to h₀) along with the calculated bending (α) and torsional (β) moduli are reported in Table II. As anticipated, the values of the Kuhn segment (λ⁻¹) for CTG and CGG were much lower than that for random DNA (556 and 630 Å versus 950 Å, respectively), indicating that both TRS have a high degree of flexibility (low bending modulus α). By contrast, the torsional modulus β (2.3 and 2.4 × 10⁻¹⁹ erg·cm versus 2.4 × 10⁻¹⁹ erg·cm) and the helical repeat h₀ (10.41 and 10.35 bp/turn versus 10.46 bp/turn) were close to those of random DNA, showing that fragments containing CTG and CGG form right-handed B-type helices under these conditions. Considering the S.D. associated with k₂ and the three interpolation steps, the accuracy of these values (Table II and Fig. 2) is within ±10%. This error also applies to the results from the analyses that follow (see Figs. 3 and 4) on the variance of writhe and the free energy of supercoiling.
The choice of λ⁻¹ (950 Å), α (1.9 × 10⁻¹⁹ erg·cm), and β (2.4 × 10⁻¹⁹ erg·cm) for B-DNA was based on the following. Measurements of the persistence length (P) performed by cryoelectron microscopy (29), cyclization kinetics (25,30), Monte Carlo simulation (31), and electro-optic techniques (32,33) yielded a consensus value close to 500 Å. Measurements of β performed by ring closure, Monte Carlo simulation, topoisomer distribution, fluorescence depolarization, and electron paramagnetic resonance experiments produced results that ranged from 1.5 to 3.6 × 10⁻¹⁹ erg·cm (reviewed in Ref. 34). The value of 2.4 × 10⁻¹⁹ erg·cm (23) was selected because it was within the range of 2.0-2.4 × 10⁻¹⁹ erg·cm measured by cyclization kinetics (23,30,35). This choice of α and β yields σ = −0.20, which is equal to the value measured experimentally on plasmid DNA at low superhelical densities (36). It should be noted that in the analyses that follow, a variation in β from 1.5 to 3.6 × 10⁻¹⁹ erg·cm for random DNA would not change the outcome of the conclusions (26).
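The quantities reported for the three DNA species are tied together by two simple relations: α/β = 1 + σ, and the worm-like chain identity P = α/(k_B T) with Kuhn length 2P. The sketch below checks that the quoted moduli and Kuhn lengths are mutually consistent. The temperature of 293.15 K is an assumption (the text does not state the value used), and the dictionary simply collects numbers quoted in this paper.

```python
K_B_ERG_PER_K = 1.380649e-16  # Boltzmann constant in erg/K
ASSUMED_T_K = 293.15          # assumption: 20 C; not stated in the text

# Bending (alpha) and torsional (beta) moduli in erg*cm, Kuhn length in Angstrom
REPORTED = {
    "CTG":    {"alpha": 1.13e-19, "beta": 2.3e-19, "kuhn_A": 556.0},
    "CGG":    {"alpha": 1.27e-19, "beta": 2.4e-19, "kuhn_A": 630.0},
    "random": {"alpha": 1.9e-19,  "beta": 2.4e-19, "kuhn_A": 950.0},
}

def poisson_ratio(alpha, beta):
    """Poisson's ratio from alpha / beta = 1 + sigma."""
    return alpha / beta - 1.0

def kuhn_length_A(alpha, temperature_K=ASSUMED_T_K):
    """Kuhn length (Angstrom) as twice the persistence length alpha / (k_B * T)."""
    persistence_cm = alpha / (K_B_ERG_PER_K * temperature_K)
    return 2.0 * persistence_cm * 1e8  # convert cm to Angstrom

for name, v in REPORTED.items():
    sigma = poisson_ratio(v["alpha"], v["beta"])
    kuhn = kuhn_length_A(v["alpha"])
    print(f"{name}: sigma = {sigma:.2f}, Kuhn length = {kuhn:.0f} A (reported {v['kuhn_A']:.0f} A)")
```

Under this temperature assumption the computed Kuhn lengths agree with the reported ones to within about 1%, and the random DNA moduli give σ close to −0.2.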
Writhe and Supercoiling-The data reported in Table II enable the prediction of the topological behaviors of the three DNA species. Fig. 3 shows the results of an analysis based on the ring-closure probability J₁ (34), which enables the determination of the optimal length for circularization (J₁^max). The probability J₁ takes into account the proximity of the two ends within the volume dV, as well as the correct orientation of their tangents, but does not consider the twist-dependent alignment (25).
The J₁/J₁^max ratio reflects the efficiency of circularization as a function of length. The end points of the curves indicate the n_bp at which J₁ is maximum. An interpolation for phased A-tract DNA is also shown. These molecules circularize ∼1000 times more efficiently than random sequence DNA due to phased, in-plane, static curvatures (28,37). For random DNA, the calculated optimal length is 552 bp, whereas for fragments containing (CTG)n and (CGG)n, the calculated optimal lengths are 326 and 366 bp, respectively, or ∼2.4 times greater than for A-tract DNA (∼140 bp). Hence, the decrease in persistence length in (CTG)n and (CGG)n causes an ∼40% drop in the optimal length of circularization as compared with random DNA.
The second prediction concerns the writhe and the free energy of supercoiling. Writhe quantitates the out-of-plane trajectory of the helix axis in circular DNA (38). Its variance, ⟨Wr²⟩, is directly related to λ⁻¹ (34). Fig. 4 (inset) shows that (CTG)n and (CGG)n undergo greater variations than random DNA, in the order CTG > CGG > random DNA, with differences increasing rapidly with length. Thus, we conclude that the range of conformations adopted by (CTG)n and (CGG)n is greater than for random DNA.
In sufficiently long molecules, twisting of the two ends before ring closure produces a set of topological isomers that differ in the number of helical turns (Lk). The strain generated by twisting is then partitioned with writhe, which gives rise to tertiary turns (τ), or supercoils. The distribution of the topoisomer population is described by ⟨(ΔLk)²⟩ = ⟨Wr²⟩ + ⟨(ΔTw)²⟩ (see definitions under "Experimental Procedures"). The free energy associated with supercoiling (ΔG_τ) is related to τ by ΔG_τ = Kτ², where K is the apparent twisting coefficient. K and ⟨(ΔLk)²⟩ are related by n_bp K/RT = n_bp/(2⟨(ΔLk)²⟩), where R is the gas constant and T the absolute temperature. A computation of n_bp K/RT is shown in Fig. 4. The calculations indicate a rapid decrease in this range. The values for (CTG)n and (CGG)n are always lower than for random DNA, implying that, for a defined length, it will require less energy to supercoil (CTG)n and (CGG)n than random DNA. In other words, these TRS will act as a "sink" for localizing writhe when embedded in DNA of random sequence.
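The link between the topoisomer variance and the free energy of supercoiling can be sketched as follows. The variances used in the example are hypothetical placeholders, not values from this paper (the actual computation is described in Ref. 26); the code only illustrates how a larger variance of writhe lowers n_bp K/RT.

```python
def linking_variance(var_writhe, var_twist):
    """<(dLk)^2> = <Wr^2> + <(dTw)^2>."""
    return var_writhe + var_twist

def nbp_K_over_RT(n_bp, var_writhe, var_twist):
    """n_bp * K / RT = n_bp / (2 * <(dLk)^2>)."""
    return n_bp / (2.0 * linking_variance(var_writhe, var_twist))

def supercoiling_energy_over_RT(var_writhe, var_twist, tau):
    """Free energy of tau tertiary turns in units of RT, from dG = K * tau**2."""
    k_over_rt = 1.0 / (2.0 * linking_variance(var_writhe, var_twist))
    return k_over_rt * tau ** 2

# Hypothetical variances for two 250-bp circles that differ only in writhe variance
stiff = nbp_K_over_RT(250, var_writhe=0.01, var_twist=0.05)
flexible = nbp_K_over_RT(250, var_writhe=0.03, var_twist=0.05)
print(stiff, flexible)
```

With the twist variance held fixed, the chain with the larger writhe variance has the smaller n_bp K/RT and therefore costs less energy to supercoil, which is the sense in which a flexible tract acts as a sink for writhe.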
Topoisomers of Small Circles-Additional evidence that (CTG)n and (CGG)n writhe more easily than random DNA comes from the analysis of cyclized products in the ring-closure experiments. Fig. 5 (left panel) shows the pattern of (CGG)63 (221 bp) before the addition of T4 DNA ligase (lane 1) and 4 min after the enzyme was added (lane 2). The linear monomer (m₁) was converted to the linear dimer (d₁) plus one circular monomer (c). Lanes 1 and 2 for (CGG)70 (242 bp) report identical time courses. However, in this case, the linear monomer (m₂) was converted to two circular species (ct₁ and ct₂) in almost equal amounts. A similar pattern was also seen for (CGG)71 (data not shown). Analogously, the reaction performed on (CTG)54 (194 bp) (right panel) ligated the monomer (M₁) into a linear dimer (D₁) and a single circular species (C₁). On the other hand, the reaction performed on (CTG)64 (224 bp) converted the linear monomer (M₂) into four circular species (CT₁ to CT₄). Thus, (CTG)64 circularized to give four isomers, whereas for (CGG)70/71, only two species were observed, despite their longer length. (CGG)70, (CGG)71, and (CTG)64 had fractional helical turns (ΔLk₀; ΔLk₀ = Lk₀ − Int(Lk₀)) of 0.38, 0.67, and 0.52, respectively. The formation of topoisomers is facilitated in linear molecules that have ΔLk₀ close to 0.5. In fact, during ring closure, such molecules need to untwist, or overtwist, to align their ends. If both of these movements occur, topoisomers will be observed. The energetic barrier to be overcome by this process is inversely proportional to chain length (Fig. 2); thus, for very short chains, only untwisting, or overtwisting, is practically observed. The finding that fragments of CGG of 242 and 245 bp and of CTG of 224 bp form topoisomers in almost equimolar amounts indicates that this barrier is low at these lengths.
This contrasts with random DNA, for which fragments of up to ∼245 bp, with ΔLk₀ ∼ 0.5, were shown to circularize in mostly one species (39,40). Thus, experimental results and theoretical predictions fully agree that the onset and distribution of topoisomeric species follow the order CTG > CGG > random DNA.

FIG. 3. Relative efficiencies of cyclization. The oscillation-independent ring-closure probability (J₁) was calculated for (CTG)n, (CGG)n, and random DNA according to Equations 5-9 and the values in Table II. The length at which J₁ reaches a maximum is defined as J₁^max. The curves represent J₁/J₁^max ratios. For phased A-tract DNA, the analysis was performed on experimental ratios (28).

FIG. 4. Variance of writhe and free energy of supercoiling. n_bp K/RT was calculated as reported (26). Inset, the variance of writhe (⟨Wr²⟩) versus n_bp was calculated for (CTG)n, (CGG)n, and random DNA as described (26) using the values reported in Table II.
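The fractional helical turns quoted for the fragments that form topoisomers follow from ΔLk₀ = Lk₀ − Int(Lk₀) with Lk₀ = n_bp/h₀. The sketch below reproduces them, assuming that the fitted helical repeats reported in this paper (10.35 bp/turn for CGG, 10.41 bp/turn for CTG) were the ones used; the function names are illustrative.

```python
import math

def unconstrained_linking_number(n_bp, h0_bp_per_turn):
    """Lk0 of a linear fragment: number of helical turns, generally non-integral."""
    return n_bp / h0_bp_per_turn

def fractional_helical_turn(n_bp, h0_bp_per_turn):
    """dLk0 = Lk0 - Int(Lk0)."""
    lk0 = unconstrained_linking_number(n_bp, h0_bp_per_turn)
    return lk0 - math.floor(lk0)

FRAGMENTS = [              # (name, length in bp, assumed helical repeat in bp/turn)
    ("(CGG)70", 242, 10.35),
    ("(CGG)71", 245, 10.35),
    ("(CTG)64", 224, 10.41),
]

for name, n_bp, h0 in FRAGMENTS:
    print(name, round(fractional_helical_turn(n_bp, h0), 2))
```

The three results round to 0.38, 0.67, and 0.52, matching the values in the text. A fragment near 0.5 is about equally far from the two neighboring integral linking numbers, which is why both the untwisted and the overtwisted circle can form.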
Writhe in Plasmids Containing (CTG)n and (CGG)n-To further test the above predictions, we performed band shift assays, a method based on the relative electrophoretic velocities of a family of circular plasmids (27,41). Enzymatic nicking and closing of circular DNA result in a Gaussian distribution of topoisomers that peaks at the planar species (ΔLk closest to 0). Flanking isomers differ by ±1 in their value of ΔLk due to pivoting of the ends about the nick. If a segment of x_bp containing an integral number of helical turns is inserted into a plasmid of n_bp, then the topoisomers of A (A = n_bp) and B (B = n_bp + x_bp) have identical ΔLk values and, therefore, identical writhe. Since writhe affects the electrophoretic mobility, the migration pattern of A and B will be identical when corrected for the size difference. On the other hand, if x_bp contains a non-integral number of helical turns, topoisomers A and B will have different ΔLk as well as writhe. This will cause a shift in migration, which may be used to calculate the h₀ of x_bp.
It is obvious that the calculation of h₀ by this method requires that the writhe of the inserted fragment be the same as that of the vector. If the bending force of x_bp is different from that of n_bp, the magnitude and the partition of writhe between x_bp and n_bp will be modified, and the velocity of migration altered. Fig. 6A shows a representative agarose gel comparing plasmids containing n repeats of CGG with those containing n+a repeats. The latter is then compared with a plasmid containing n+a+b repeats and so on. Studies were also performed on a family of plasmids containing random DNA ranging in length from 90 to 270 bp (Fig. 6B). Each of the triangles corresponds to the h₀ calculated from inserting x_bp of random DNA into a vector. The average value of h₀ (10.26 ± 0.1 bp/turn) is in excellent agreement with that reported previously (27,41). The straight horizontal line passing through these points is a good indication that the writhe of the random sequence inserts is the same as that of the vector. Fig. 6 (B (circles), C, and D) shows the effect of inserting various lengths of (CTG)n or (CGG)n into plasmids. The widely oscillating behavior of the calculated h₀ reflects complex changes in the topology of the molecules, as expected if the writhe of the inserted TRS differs from that of the vector. The results include inserts with methylated cytosine residues or naturally occurring polymorphisms (panel D) as well as positively and negatively writhed molecules (panel C). In all experiments, the calculated h₀ was not a constant value, in contrast to that observed for the random DNA inserts. In addition, the results of panel D substantiate the observed lack of influence (42) on the J-factor by methylated cytosines. Thus, the writhe of (CTG)n and (CGG)n is different from that of random sequence DNA, and this change causes drastic alterations in the topology of the supercoiled molecules. Specifically, due to their low bending modulus α, the TRS regions must be the most flexible or contortable domains of the plasmids and, hence, the preferential sites for the partitioning of supercoil density.
Other Structural Analyses-Chemical and enzymatic probes used for characterizing non-B-DNA conformations such as cruciforms, B-Z junctions, triplexes, nodule DNA, and unpaired AT-rich regions (38, 43-45) were employed on plasmids containing various lengths of CTG and CGG. Exhaustive analyses were performed with bromoacetaldehyde, osmium tetroxide, dimethyl sulfate, potassium permanganate, chloroacetaldehyde, copper-phenanthroline, diethyl pyrocarbonate, S1 nuclease, and DNase I under a wide variety of experimental conditions. No reactivities were detected that would indicate accessible bases or unpaired regions as found for the conformations identified above. However, positive internal control studies with a cruciform or B-Z junctions confirmed the validity of the interpretations.
Furthermore, a 420-bp BamHI fragment containing (CTG)130 (16) behaved immunologically as expected for a fully base-paired B-DNA of high G + C content. It competed as strongly as calf thymus DNA for binding to a monoclonal antibody that favors G/C sequences in B-DNA conformation and less strongly for a monoclonal antibody that favors A/T sequences. It did not react with a monoclonal antibody specific for single-stranded DNA. When (CTG)130-methylated bovine serum albumin complexes with adjuvant were injected into three normal mice, there was no antibody response above that stimulated by adjuvant alone, and no antibodies specific for the flexible (CTG)130 structure were formed. Conformational variants (such as Z-DNA, cruciforms, triplex, A-helix, or single-stranded DNA) do induce structure-specific antibodies (46). In summary, all of the above structural analyses revealed that the flexible (CTG)n and (CGG)n TRS behave as fully paired, right-handed B-DNA and that there are no structural transitions induced by supercoil density that are detected by these methods.

FIG. 5. Topoisomers of circular monomers. Fragments were prepared and processed as described for the determination of k₁. Lanes 1 show the migration of linear monomers before the addition of T4 DNA ligase. Lanes 2 show the products formed after 4 min of reaction with the enzyme. m₁, m₂, M₁, and M₂, linear monomers of (CGG)63, (CGG)70, (CTG)54, and (CTG)64, respectively; d₁, d₂, D₁, and D₂, linear dimers of (CGG)63, (CGG)70, (CTG)54, and (CTG)64, respectively; c, circular monomer of (CGG)63; ct₁ and ct₂, circular monomers of (CGG)70; C₁, circular monomer of (CTG)54; CT₁, CT₂, CT₃, and CT₄, circular monomers of (CTG)64. Lane L, reference size ladder.
Alternatively, nondenaturing PAGE analyses were performed on restriction fragments containing various lengths of CTG (18) and CGG (9-240 repeats, both methylated and nonmethylated), which revealed their expected rapid migration (by up to 30%). However, these sequences migrated normally on agarose gel electrophoresis. The increased velocity could be abolished by treatment with chloroquine or reduced by the ...

The low persistence lengths of (CTG)n and (CGG)n reflect an enhanced flexibility along the helix axis. Two mechanisms contribute to the deflection of the helix axis from linearity: dynamic thermal motion and static bends (48,49). A static bend is represented by a wedge between 2 adjacent bp, which may be caused by the geometry of the base pair step itself or by an intercalated ligand or amino acid residue(s). Thermal motion produces constant fluctuations of the helix axis, and consequently, the observed (apparent) persistence length (P_a) is composed of static (P_s) and dynamic (P_d) components (1/P_a = (1/P_s) + (1/P_d)) (48,49). For a sequence that repeats regularly along the helix, P_s → ∞, and therefore, P_a = P_d.
Both (CTG)n and (CGG)n are a monotonous succession of trinucleotide units, each one occupying the same position along the helix every other turn (3 nucleotides × 7 = 21 nucleotides per 2$h_0$, $h_0$ ≈ 10.5 bp/turn). Hence, the macroscopic, idealized shape of a TRS should be that of a straight rod, in the absence of thermal motion. The estimated $P_d$ for straight random B-DNA is ~800 Å (29). The values of $P_a$ ($= P_d$) of 278 and 315 Å measured for (CTG)n and (CGG)n, respectively, are ~60% of $P_a$ (500 Å) and ~40% of $P_d$ for random DNA. This suggests that one or more dinucleotide steps within each trinucleotide repeat unit are more flexible than average (50) and/or that there are flexible hinges along the sequence.
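The arithmetic behind these percentages can be checked with a short Python sketch. It uses only the relation between apparent, static, and dynamic persistence lengths and the values quoted above; the function names are illustrative, and the static component it derives for random B-DNA is a consequence of that relation, not a value reported here.

```python
# Persistence-length bookkeeping: 1/P_a = 1/P_s + 1/P_d (lengths in angstroms).

def apparent_persistence(p_static, p_dynamic):
    """Combine static and dynamic components into the apparent value P_a."""
    return 1.0 / (1.0 / p_static + 1.0 / p_dynamic)

def static_component(p_apparent, p_dynamic):
    """Solve the same relation for the static component P_s."""
    return 1.0 / (1.0 / p_apparent - 1.0 / p_dynamic)

# Values quoted in the text.
P_A_RANDOM = 500.0                       # apparent value, random B-DNA
P_D_RANDOM = 800.0                       # dynamic value, straight random B-DNA
P_A_TRS = {"CTG": 278.0, "CGG": 315.0}   # measured; P_a = P_d for a regular repeat

for name, p in P_A_TRS.items():
    print(f"{name}: {p / P_A_RANDOM:.0%} of P_a(random), "
          f"{p / P_D_RANDOM:.0%} of P_d(random)")

# Derived from the relation above (an inference, not a reported measurement).
p_s = static_component(P_A_RANDOM, P_D_RANDOM)
print(f"implied P_s for random B-DNA: {p_s:.0f} angstroms")
```

With an infinite static component, `apparent_persistence` returns the dynamic value, which is the limiting case invoked for a regularly repeating sequence.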
Of the 10 possible dinucleotide steps, (CTG)n contains one-third each of AG/CT, CA/TG, and GC/GC, whereas (CGG)n contains one-third each of GC/GC, GG/CC, and CG/CG. Thus, both (CTG)n and (CGG)n share GC/GC. The geometry (flexibility at the dinucleotide step) of duplex DNA is best analyzed by x-ray crystallography.
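The step composition can be reproduced by a minimal Python sketch that walks around one repeat unit and names each step by both strands. The helper names are illustrative, and labels are ordered alphabetically, so the step written GG/CC in the text appears as CC/GG.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def step_label(dinuc):
    """Name a step by both strands, e.g. 'CT' -> 'AG/CT' (alphabetical order)."""
    first, second = sorted((dinuc, revcomp(dinuc)))
    return f"{first}/{second}"

def repeat_steps(unit):
    """Dinucleotide steps met in a tandem repeat of `unit` (wraps around)."""
    n = len(unit)
    return [step_label(unit[i] + unit[(i + 1) % n]) for i in range(n)]

# All 16 dinucleotides collapse to 10 distinct duplex steps.
ALL_STEPS = sorted({step_label(a + b) for a, b in product("ACGT", repeat=2)})

print(len(ALL_STEPS), ALL_STEPS)
print("CTG:", repeat_steps("CTG"))
print("CGG:", repeat_steps("CGG"))
print("shared:", set(repeat_steps("CTG")) & set(repeat_steps("CGG")))
```

Each of the three steps occurs once per repeat unit, which is the one-third share stated above.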
Studies indicate that CA/TG is highly polymorphic (51)(52)(53)(54), especially in the degree of slide, roll, and twist, and is very dynamic (55). GC/GC also is rated high on a "flexibility" scale (third in one analysis and fifth in the other) (52,54). This dinucleotide step is associated with high roll wedge angles, which may be stabilized by Mg2+ ions (56). AG/CT is not well represented in the family of crystal structures and, therefore, was excluded from one of the studies (54). For GG/CC and CG/CG, the most recent study indicates that they are flexible (54).
For each of the 10 dinucleotide steps, the mean roll and tilt angles are estimates of the equilibrium values, while the statistical scatter in these angles reflects the intrinsic flexibility for roll and tilt. It has been shown that such an analysis, averaged over all sequences, gives a reasonable value for the persistence length of B-DNA (52). We note that in the dinucleotide steps discussed above, CA/TG, CG/CG, GC/GC, and GG/CC are ranked second through fifth in terms of flexibility, out of 10 (AG/CT is ranked eighth). The dinucleotide flexibilities can then be used to predict the relative flexibilities of the different TRS (Table III). CGG and CTG are predicted to be the third and fourth most flexible, out of a total of 12 sequences. Interestingly, the TRS (ACC and GTC) that ranked first and fifth also showed anomalously rapid PAGE mobilities.
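The procedure given in the legend to Table III (adding roll and tilt variances for each step, summing over the three steps of the repeat unit, and reporting the square root) can be sketched in Python as follows. The function names are illustrative, and the S.D. values are uniform placeholders chosen only to make the example run; they are not the crystallographic values. The sketch also enumerates the repeat units that remain distinct once cyclic permutations and the complementary strand are taken into account.

```python
import math
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def step_key(dinuc):
    """One key per duplex step: the smaller of a dinucleotide and its revcomp."""
    return min(dinuc, revcomp(dinuc))

def trs_class(unit):
    """Canonical name: smallest rotation of the unit or of its revcomp."""
    forms = []
    for s in (unit, revcomp(unit)):
        forms += [s[i:] + s[:i] for i in range(len(s))]
    return min(forms)

def unique_trs():
    """Distinct triplet repeat sequences among the 4**3 = 64 triplets."""
    return sorted({trs_class("".join(t)) for t in product("ACGT", repeat=3)})

def trs_flexibility(unit, sd_roll, sd_tilt):
    """S.D. in degrees: sqrt of summed roll and tilt variances over all steps."""
    n = len(unit)
    variance = 0.0
    for i in range(n):
        key = step_key(unit[i] + unit[(i + 1) % n])
        variance += sd_roll[key] ** 2 + sd_tilt[key] ** 2
    return math.sqrt(variance)

# PLACEHOLDER S.D. values (degrees), identical for every step; NOT real data.
ALL_KEYS = {step_key(a + b) for a, b in product("ACGT", repeat=2)}
sd_roll = dict.fromkeys(ALL_KEYS, 4.0)
sd_tilt = dict.fromkeys(ALL_KEYS, 3.0)

print(len(unique_trs()), unique_trs())
print(trs_flexibility("CTG", sd_roll, sd_tilt))
```

Ranking the repeat units would require replacing the placeholder dictionaries with the measured roll and tilt S.D. for each of the 10 steps and sorting the resulting values in descending order.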
Flexibility may also be caused by the unpairing of the double helix (35). Due to their repetitive nature, the complementary strands of (CTG)n as well as (CGG)n may slide relative to each other following transient melting and, therefore, form slipped structures (38,43) of varying size, which exist in low proportions, at multiple sites. Because of their random location, these structures might escape detection by chemical probe analyses. However, they would also be expected to cause the loss of dependence of the J-factors on the fractional twist (35), an occurrence not observed experimentally.
Thus, despite the imprecision of the model, the limitations of the dinucleotide approximation, and uncertainties associated with crystal packing forces, the low persistence lengths of (CTG)n and (CGG)n are consistent with the variations in crystallographic values of slide, roll, and tilt.
Flexible (highly writhed) DNA is the first intrinsic, unusual DNA conformational feature associated with human hereditary neuromuscular diseases. It is tempting to speculate that this property promotes the slippage of complementary DNA strands and is responsible for the expansion and the non-mendelian transmission of these diseases (7-10, 38, 43). However, the role of flexibility (and writhe) in expansion, toroidal nucleosome structure (13)(14)(15), DNA polymerase pausing (16), recognition of methyl-directed mismatch repair enzymes (57), binding of certain specific proteins (58,59), and preferential methylation of long CGG tracts (60) remains to be elucidated. The establishment of methodologies to investigate conformational problems in living cells (43)(44)(45) offers hope for the evaluation of these flexible and highly writhed structures in molecular mechanisms responsible for human genetic diseases.

TABLE III
Relative TRS flexibilities based on analysis of random DNA crystal structures

The S.D. of roll and tilt angle for each of the 10 dinucleotide steps was determined as described previously (53) using all of the crystal structures available as of summer 1996. If those two modes of flexibility are assumed to be independent, then the flexibility of each dinucleotide step can be estimated by adding the variances in roll and tilt. Furthermore, the flexibility of each TRS is estimated by adding the variances of the appropriate three dinucleotide steps. Thus, for CTG, contributions are added for the CT, TG, and GC steps. The flexibility of each TRS is reported as a S.D., i.e. the square root of the variance, measured in degrees. This corresponds to the predicted root mean square deviation in the direction of the helix axis as a result of thermal fluctuations. The relative flexibilities are ranked, with 1 indicating the most flexible TRS and 12 the least flexible. There are $4^3 = 64$ possible TRS, but only 12 of these sequences are unique.

Rank TRS Flexibility