Triplet Repeat Instability and DNA Topology: An Expansion Model Based on Statistical Mechanics*

The variance of writhe, the contribution of writhe to supercoiling, and the free energies of supercoiling were calculated for (CTG·CAG) n and (CGG·CCG) n triplet repeat sequences (TRS) by statistical mechanics from the bending and torsional moduli previously determined. Expansions of these sequences are inherited by non-mendelian transmission and are linked with several hereditary neuromuscular diseases. The variance of writhe was greater for the TRS than for random B-DNA. For random B-DNA, (CGG) n , and (CTG) n , the contribution of writhe to supercoiling was 70, 78, and 79%, whereas the free energy of supercoiling at a length of 10 kilobase pairs was 1040·RT, 760·RT, and 685·RT, respectively. These data indicate that the TRS are preferential sites for the partitioning of supercoiling. Calculations of the differences in free energy of supercoiling between the TRS and random B-DNA revealed a local minimum at ∼520 base pairs. Human medical genetic studies have shown that individuals carrying up to 180–200 copies of TRS (540–600 base pairs, premutations) in the fragile X or myotonic dystrophy gene loci are usually asymptomatic, whereas large expansions (>200 repeats, full mutations), which lead to disease, are observed in their offspring. Therefore, the length corresponding to the local minimum in free energy of supercoiling correlates with the genetic breakpoint between premutation and full mutation. We propose that (a) TRS instability is mediated by DNA mispairing caused by the accumulation of supercoiling within the repeats, and (b) the expansions that take place at the premutation to full mutation threshold are associated with increased mispairing caused by the optimal partitioning of writhe within the TRS at this length.

Several human loci associated with neurodegenerative disorders have been shown to carry a new form of mutation, i.e. the expansion of a DNA triplet repeat sequence (TRS) with composition (CTG·CAG) n , (CGG·CCG) n , or (GAA·TTC) n , referred to as (CTG) n , (CGG) n , and (GAA) n , respectively (reviewed in Refs. 1–4). These diseases fall into two categories (1).
The first, which includes spinal and bulbar muscular atrophy, Huntington's disease, spinocerebellar ataxia type 1, dentatorubral-pallidoluysian atrophy, and Machado-Joseph disease, is characterized by small expansions of a (CTG) n repeat, from ∼10–40 units in the normal population to ∼40–120 units in diseased individuals, that encode a polyglutamine tract in the corresponding gene products. This mutation may impart a gain of function to the mature polypeptides that is deleterious to neuronal activity (1–3).
The second, which includes the myotonic dystrophy (dystrophia myotonica (DM)) and the fragile X and E (FRAXA and FRAXE) genes, is characterized by much larger expansions of either a (CTG) n or (CGG) n repeat, respectively, in the untranslated region of the genes. The mechanisms by which these expansions lead to disease are not fully understood (5,6). The number of repeats is polymorphic in the normal population and ranges from 6 to 52 in FRAXA, from 6 to 25 in FRAXE, and from 5 to 37 in DM. Most asymptomatic carriers have expanded CTG or CGG tracts between 50 and 180 repeats in DM and between 60 and 200 repeats in FRAXA, respectively. The stability of the repeats decreases as their number increases, and offspring of carriers are prone to inherit a higher number of repeats than the donor parent. However, inherited expanded alleles with >200 TRS copies in the FRAXA, FRAXE, and DM loci lead to disease with a severity that is proportional to the number of repeats. Furthermore, the "quantum jump" of an inherited expansion may be of thousands of repeats if the carrier approaches the threshold of 200 (1–3). These behaviors are unique to this second category of disease genes. The mechanisms of expansion, and the reason why the "180–200" threshold is associated with large expansions coupled with the onset of disease, are unknown.
We have previously determined by circularization kinetics, helical repeat determination, and polyacrylamide gel electrophoresis that both (CTG) n and (CGG) n are highly flexible (7), being characterized by a persistence length ∼40% shorter than that of random DNA (B-DNA) (278 Å for (CTG) n , 315 Å for (CGG) n , and 475 Å for B-DNA). This property enables these TRS to sustain higher levels of writhe (supercoiling) than random sequence DNA.
Herein, we quantitated the variance of writhe, its contribution to supercoiling, and the free energy of supercoiling for (CTG) n and (CGG) n of up to 10 kbp and compared them with the same parameters calculated for B-DNA. Interestingly, we found that the differences were greatest for DNA lengths around 200 repeats, whereas the free energies of supercoiling were always lower for the TRS than for B-DNA. These results suggest that supercoiling (writhe) plays a crucial role in the mechanism of expansion of (CTG) n and (CGG) n .

EXPERIMENTAL PROCEDURES
The calculations of the variance of writhe, the variance of twist, and the free energy of supercoiling were performed with the equations of Shimada and Yamakawa (8) developed by statistical mechanics for the probabilities of ring closure of a twisted worm-like chain. The following equations were used: Equation 1 for the variance of writhe, ⟨Wr²⟩, where L is the ratio between the contour length of the DNA (i.e. n_bp × 3.4 Å) and the length of the Kuhn segment, λ⁻¹; Equation 2 for the variance of twist, ⟨(ΔTw)²⟩; and Equation 3 for the free energy of supercoiling, n_bp K/RT, where Γ₀ = 2π/l_bp (1 + σ), l_bp is the distance between adjacent base pairs (3.4 Å), R is the gas constant, and T is the temperature in kelvin.
λ⁻¹ denotes the length of the Kuhn segment, whereas σ represents Poisson's ratio. The values of λ⁻¹ and σ employed in the calculations were 556 Å and −0.51 for (CTG) n , 630 Å and −0.46 for (CGG) n , and 950 Å and −0.20 for random B-DNA. These values were taken from circularization kinetic experiments performed on restriction fragments containing the DNA sequences of (CTG) n , (CGG) n , or random composition (7,9,10). For the TRS, all calculations pertain to perfect, non-interrupted sequences.
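The reduced length L and the twist term can be illustrated numerically. The sketch below assumes the standard twisted worm-like chain relation ⟨(ΔTw)²⟩ = (1 + σ)L/(2π²); this relation is not printed in the text above and is used here only because, combined with the limiting reduced writhe of 0.095 quoted under "Results," it gives ratios close to the reported plateau values. All function and variable names are illustrative.

```python
import math

BP_RISE_ANGSTROM = 3.4  # distance between adjacent base pairs (l_bp)

# Kuhn segment length (angstroms) and Poisson's ratio, as listed in the text
DNA_PARAMS = {
    "B-DNA": (950.0, -0.20),
    "(CGG)n": (630.0, -0.46),
    "(CTG)n": (556.0, -0.51),
}


def reduced_length(n_bp, kuhn_angstrom):
    """Contour length divided by the Kuhn segment length (dimensionless L)."""
    return n_bp * BP_RISE_ANGSTROM / kuhn_angstrom


def twist_variance(n_bp, kuhn_angstrom, sigma):
    """Assumed relation: <(dTw)^2> = (1 + sigma) * L / (2 * pi^2)."""
    return (1.0 + sigma) * reduced_length(n_bp, kuhn_angstrom) / (2.0 * math.pi ** 2)


def limiting_writhe_to_twist_ratio(sigma, reduced_writhe=0.095):
    """Ratio <Wr^2>/<(dTw)^2> once <Wr^2>/L has reached its long-chain limit."""
    return reduced_writhe * 2.0 * math.pi ** 2 / (1.0 + sigma)


if __name__ == "__main__":
    for name, (kuhn, sigma) in DNA_PARAMS.items():
        print(
            name,
            round(reduced_length(10_000, kuhn), 2),
            round(twist_variance(10_000, kuhn, sigma), 3),
            round(limiting_writhe_to_twist_ratio(sigma), 2),
        )
```

Under these assumptions the limiting ratios come out near 2.34, 3.47, and 3.83 for B-DNA, (CGG) n , and (CTG) n , slightly above the values reported at 10 kbp, as expected if the reduced writhe has not quite reached its limit at that length.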
λ⁻¹ and σ enable the calculation of the bending α and the twisting β moduli from the following relations (Equations 4 and 5), where k_B is the Boltzmann constant.
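As a numerical illustration of how the moduli follow from λ⁻¹ and σ, the sketch below assumes the usual worm-like chain forms α = k_B T λ⁻¹/2 and β = α/(1 + σ), with T = 293.15 K. These forms are an assumption made here, chosen because they reproduce the modulus values quoted elsewhere in the text for random B-DNA (α = 1.92 × 10⁻¹⁹ erg·cm and β = 2.40 × 10⁻¹⁹ erg·cm); the names are illustrative.

```python
K_B_ERG_PER_K = 1.380649e-16  # Boltzmann constant in erg/K
T_KELVIN = 293.15             # 20 °C (assumed temperature)
ANGSTROM_TO_CM = 1.0e-8

# Kuhn segment length (angstroms) and Poisson's ratio, as listed in the text
DNA_PARAMS = {
    "B-DNA": (950.0, -0.20),
    "(CGG)n": (630.0, -0.46),
    "(CTG)n": (556.0, -0.51),
}


def bending_modulus(kuhn_angstrom, temp_k=T_KELVIN):
    """Assumed relation: alpha = k_B * T * (Kuhn length) / 2, in erg*cm."""
    return K_B_ERG_PER_K * temp_k * kuhn_angstrom * ANGSTROM_TO_CM / 2.0


def torsional_modulus(kuhn_angstrom, sigma, temp_k=T_KELVIN):
    """Assumed relation: beta = alpha / (1 + sigma), in erg*cm."""
    return bending_modulus(kuhn_angstrom, temp_k) / (1.0 + sigma)


if __name__ == "__main__":
    for name, (kuhn, sigma) in DNA_PARAMS.items():
        alpha = bending_modulus(kuhn)
        beta = torsional_modulus(kuhn, sigma)
        print(f"{name}: alpha = {alpha:.3e} erg*cm, beta = {beta:.3e} erg*cm")
```

With these inputs the torsional moduli of both TRS fall between 2.0 and 2.4 × 10⁻¹⁹ erg·cm, while their bending moduli are markedly lower than that of B-DNA.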

RESULTS
Variance of Writhe-Closed circular plasmid DNA contains different numbers of helical turns when the linear form with cohesive ends is circularized and ligated (11,12). This behavior is due to the fact that DNA is flexible and the helices bend and twist under the influence of thermal energy. Since the molecules vary in their degree of bend and twist at the time of closure, topological isomers (topoisomers) are formed. Within this population, which is described by a gaussian envelope, each isomer differs from its neighbor by one in the total number of helical turns. For a closed circular DNA, the linking number (Lk) is the number of times the two strands cross each other. The variance in Lk (⟨(ΔLk)²⟩), which describes the width of a distribution of topological isomers, results from the sum of the variance of writhe and twist: ⟨(ΔLk)²⟩ = ⟨Wr²⟩ + ⟨(ΔTw)²⟩ (13). Both the variance of writhe and the variance of twist may be calculated by statistical mechanics (8) from the length of the Kuhn segment, λ⁻¹ (which is twice the persistence length, P), and Poisson's ratio, σ. λ⁻¹ and σ have been determined experimentally by the kinetics of circularization of restriction fragments containing (CTG) n , (CGG) n , or B-DNA (7,9,10). Fig. 1 shows the variance of writhe normalized for chain length, ⟨Wr²⟩/L, or reduced writhe, for B-DNA, (CGG) n , and (CTG) n repeats. The calculations (see "Experimental Procedures") show that in all cases the normalized writhe converges to 0.095 at infinite length. However, (CTG) n and (CGG) n approach the limit at much shorter lengths than B-DNA, implying greater fluctuations of the helix axis. The differences in ⟨Wr²⟩/L between the TRS and B-DNA, Δ(⟨Wr²⟩/L), vary with length (Fig. 1, inset). Δ(⟨Wr²⟩/L) rises sharply, reaches a maximum of 0.0260 at 730 bp for (CTG) n and 0.0202 at 780 bp for (CGG) n , and then declines to ∼9 × 10⁻⁴ at 10 kbp.
Hence, this analysis predicts that molecules of (CTG) n or (CGG) n that are 700–800 bp in length (230–270 repeats) have the greatest fluctuations in the helix axis compared with B-DNA.

Contribution of Writhe and Twist to the Linking Number (Lk)-ΔTw expresses the difference in twist between an unconstrained linear DNA molecule and the constrained closed species. The calculation of ⟨(ΔTw)²⟩ was performed according to Equation 2 under "Experimental Procedures." This formula expresses the linear relation between ⟨(ΔTw)²⟩ and DNA length. Since ⟨Wr²⟩ and ⟨(ΔTw)²⟩ both add to the distribution of topological isomers, their relative contribution to supercoiling may be obtained from the ratio ⟨Wr²⟩/⟨(ΔTw)²⟩. Fig. 2 shows this calculation for B-DNA, (CGG) n , and (CTG) n . The values increase rapidly at short DNA lengths and reach a plateau of 2.307 for B-DNA, 3.448 for (CGG) n , and 3.806 for (CTG) n at 10 kbp. The data show that writhe and twist contribute as follows to the supercoiling in long molecules: for B-DNA, 70 and 30%; for (CGG) n , 78 and 22%; and for (CTG) n , 79 and 21%, respectively. Thus, the ratios for (CTG) n and (CGG) n are greater than for B-DNA. Since the estimated torsional modulus β for (CTG) n and (CGG) n is within the range of 2.0–2.4 × 10⁻¹⁹ erg·cm found for B-DNA by circularization kinetics (7,10,14,15), we conclude that segments of (CTG) n or (CGG) n supercoil more efficiently than B-DNA due to an increased contribution from writhe.
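The percentage contributions quoted above follow directly from the plateau ratios: if r = ⟨Wr²⟩/⟨(ΔTw)²⟩, writhe accounts for r/(1 + r) of the variance in linking number and twist for 1/(1 + r). The short sketch below performs this arithmetic for the three reported ratios; the function name is illustrative.

```python
def writhe_twist_shares(ratio):
    """Split the linking-number variance into writhe and twist percentages.

    ratio is <Wr^2>/<(dTw)^2>; the two shares sum to 100.
    """
    writhe_share = 100.0 * ratio / (1.0 + ratio)
    return writhe_share, 100.0 - writhe_share


# Plateau ratios at 10 kbp, as reported in the text
PLATEAU_RATIOS = {"B-DNA": 2.307, "(CGG)n": 3.448, "(CTG)n": 3.806}

if __name__ == "__main__":
    for name, ratio in PLATEAU_RATIOS.items():
        writhe, twist = writhe_twist_shares(ratio)
        print(f"{name}: writhe {writhe:.1f}%, twist {twist:.1f}%")
```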
Region of Hyperflexibility-The free energy of supercoiling, ΔG_ΔLk, is related to ΔLk by ΔG_ΔLk = K(ΔLk)² (13), where K, which is expressed in kcal·bp/mol and is also called the apparent twisting coefficient, decreases with increasing DNA length (16,17). Fig. 3 shows the calculation of K as n_bp K/RT for random DNA, (CGG) n , and (CTG) n . At 10 kbp, n_bp K reaches a constant value of 1040·RT (606 kcal/mol, RT = 0.5825 kcal/mol at 20 °C) for B-DNA, 760·RT (443 kcal/mol) for (CGG) n , and 685·RT (399 kcal/mol) for (CTG) n . The values for the TRS are 27 and 34% lower, respectively, than for random sequence DNA for chains longer than 3 kbp. Below 2 kbp, n_bp K/RT rises sharply and approaches 3400·RT (1980 kcal/mol) at 0 length for all DNAs. The differences in n_bp K/RT between the TRS and B-DNA, Δ(n_bp K/RT), are computed in Fig. 3 (inset A). Δ(n_bp K/RT) becomes progressively more negative with increasing DNA length, reaching a minimum of −1241·RT at 500 bp for (CTG) n (167 repeats, −723 kcal/mol) and −958·RT at 540 bp for (CGG) n (180 repeats, −558 kcal/mol). Thus, this analysis shows that (CTG) n and (CGG) n require less energy to supercoil (writhe) than random DNA. In addition, the difference in free energy of supercoiling between the TRS and random DNA is not uniform with length, but is largest in magnitude at ∼520 bp. At this n_bp, the variance in linking number, obtained from the relation ⟨(ΔLk)²⟩ = RT/2K, equals 0.100 for B-DNA, whereas it is 0.157 for (CGG) n and 0.190 for (CTG) n , an increase of 57 and 90%, respectively.
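The unit conversions and the relation ⟨(ΔLk)²⟩ = RT/2K used in this paragraph can be checked with a few lines of arithmetic. The sketch below takes RT = 0.5825 kcal/mol as quoted in the text and rewrites the variance relation as ⟨(ΔLk)²⟩ = n_bp/(2 · n_bp K/RT); the helper names are illustrative.

```python
RT_KCAL_PER_MOL = 0.5825  # RT at 20 °C, as quoted in the text


def to_kcal_per_mol(nk_over_rt):
    """Convert n_bp*K expressed in units of RT into kcal/mol."""
    return nk_over_rt * RT_KCAL_PER_MOL


def linking_number_variance(n_bp, nk_over_rt):
    """<(dLk)^2> = RT / (2K), rewritten as n_bp / (2 * n_bp*K/RT)."""
    return n_bp / (2.0 * nk_over_rt)


def nk_over_rt_from_variance(n_bp, variance):
    """Invert the relation above to recover n_bp*K/RT from <(dLk)^2>."""
    return n_bp / (2.0 * variance)


def percent_lower(value, reference):
    """How far value lies below reference, as a percentage of reference."""
    return 100.0 * (reference - value) / reference


if __name__ == "__main__":
    for name, plateau in {"B-DNA": 1040, "(CGG)n": 760, "(CTG)n": 685}.items():
        print(name, round(to_kcal_per_mol(plateau)), "kcal/mol",
              round(percent_lower(plateau, 1040)), "% below B-DNA")
    for name, var in {"B-DNA": 0.100, "(CGG)n": 0.157, "(CTG)n": 0.190}.items():
        print(name, round(nk_over_rt_from_variance(520, var)), "RT at 520 bp")
```

Back-calculating from the variances at 520 bp gives n_bp K/RT of about 2600, 1656, and 1368 for B-DNA, (CGG) n , and (CTG) n , i.e. differences from B-DNA of roughly −944 and −1232, close to the minima reported at 540 and 500 bp.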
K was also calculated for B-DNA, (CTG) n , and (CGG) n from n_bp K/RT. Contrary to n_bp K/RT, for which the values spanned a 3–5-fold range, K covered a range of several thousandfold from 0 to 10 kbp. For B-DNA, for example, the results obtained at 10 kbp, 1 kbp, 100 bp, and 10 bp were 0.06, 1.0, 19, and 200 kcal·bp/mol, respectively. As a consequence, when the differences in K were taken, the values were greatly amplified as the lengths approached 0, with their magnitude being highly sensitive to the initial choice of λ⁻¹ and σ. For these reasons, these analyses involving ΔK were less rigorous than those with n_bp K/RT. The differences in K between the TRS and random DNA are reported in Fig. 3 (inset B). As for n_bp K/RT, ΔK did not decrease uniformly with length. A local minimum was located at 355 bp (118 repeats) for (CTG) n and at 400 bp (133 repeats) for (CGG) n , at which positions ΔK was −1.7 and −1.2 kcal·bp/mol, respectively, ∼1 kcal·bp/mol lower than expected if the differences were steadily decreasing.
Therefore, this analysis confirms that there is an optimal length at which (CTG) n and (CGG) n writhe more favorably than at shorter or longer lengths and indicates that the free energy of supercoiling further decreases by ∼1 kcal·bp/mol at this optimal length.
In summary, all of the calculations that involved the evaluation of ⟨Wr²⟩, namely Δ(⟨Wr²⟩/L), Δ(n_bp K/RT), and ΔK, indicate that, whereas the TRS have a greater ability to supercoil than random sequence DNA, tracts of (CTG) n or (CGG) n 500–550 bp long (167–183 repeats as estimated from n_bp K/RT) have the greatest tendency to writhe when compared with the same lengths of random B-DNA. This length is referred to as a region of hyperflexibility.
The presence of this optimal length for the partitioning of writhe in the TRS is significant, considering its surprisingly close correspondence with the repeat size of 180–200 that demarcates both the premutation range from the full mutation range and the occurrence of small expansions versus large expansions in the FRAXA and DM loci (1–3).
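The contrast between the narrow range of n_bp K/RT and the wide range of K can be made concrete by dividing out the chain length. The sketch below assumes only K = (n_bp K/RT) · RT / n_bp with RT = 0.5825 kcal/mol, and uses the B-DNA values of K quoted above; the helper names are illustrative.

```python
RT_KCAL_PER_MOL = 0.5825  # RT at 20 °C, as quoted in the text


def k_from_nk_over_rt(nk_over_rt, n_bp):
    """Recover K from n_bp*K/RT by multiplying by RT and dividing by n_bp."""
    return nk_over_rt * RT_KCAL_PER_MOL / n_bp


def nk_over_rt_from_k(k, n_bp):
    """Inverse conversion: n_bp*K/RT from K and the chain length."""
    return k * n_bp / RT_KCAL_PER_MOL


# K for B-DNA at four chain lengths (bp), as reported in the text
B_DNA_K = {10_000: 0.06, 1_000: 1.0, 100: 19.0, 10: 200.0}

if __name__ == "__main__":
    for n_bp, k in B_DNA_K.items():
        print(n_bp, "bp:", round(nk_over_rt_from_k(k, n_bp)), "RT")
    print("fold range of K:", round(B_DNA_K[10] / B_DNA_K[10_000]))
```

Read this way, the reported K values correspond to n_bp K/RT between roughly 1000 and 3400, about a 3-fold span, whereas K itself spans more than three orders of magnitude.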
Dominant Role of the Persistence Length-λ⁻¹ and σ are related to the bending α and torsional β moduli by Equations 4 and 5. We wished to compare a hypothetical DNA that had α equal to that of random sequence DNA, but a greater or smaller β, or, alternatively, a hypothetical DNA that had β equal to that of random DNA, but a greater or smaller α, to determine whether the region of hyperflexibility depends on differences in α, β, or both. For these comparisons, two sets of analyses were performed. In the first, λ⁻¹ was held constant at 950 Å, and σ was varied between 0.3 and −0.5 in intervals of 0.2 so as to simulate five hypothetical DNAs with a torsional modulus varying from 1.48 to 3.84 × 10⁻¹⁹ erg·cm. n_bp K/RT was calculated for the five DNAs, and the value of n_bp K/RT for B-DNA was subtracted from each of them. The differences in n_bp K/RT are plotted in Fig. 4. This calculation shows that variations in the torsional modulus alone do not produce a substantial local maximum or minimum in Δ(n_bp K/RT). In the second comparison (Fig. 5), five hypothetical DNAs were simulated in which the torsional modulus β was held constant at 2.4 × 10⁻¹⁹ erg·cm, and λ⁻¹ was varied from 1300 to 500 Å in intervals of 200 Å (corresponding to a bending modulus α from 2.63 to 1.01 × 10⁻¹⁹ erg·cm). n_bp K/RT was calculated for each DNA, and n_bp K/RT for B-DNA was subtracted. In this case, it is evident that a local maximum or minimum occurs according to the magnitude of Δλ⁻¹. Thus, a persistence length smaller than B-DNA gives rise to a local minimum, whereas a persistence length greater than B-DNA results in a local maximum. These comparisons indicate that whenever two DNA sequences differ in their values of bending moduli, an optimum length window (between ∼400 and 800 bp when one of them is random B-DNA) will result, in which the differences in free energy of supercoiling will be largest.
In addition, different torsional moduli have little influence on the magnitude of the local maximum (or minimum).
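The parameter sets for the two comparisons can be tabulated with a short sketch. It assumes α = k_B T λ⁻¹/2 and β = α/(1 + σ) at T = 293.15 K, forms adopted here because they reproduce the modulus ranges quoted in the text; the value of σ needed to hold β fixed in the second comparison is derived under that assumption and is not stated in the text. All names are illustrative.

```python
K_B_ERG_PER_K = 1.380649e-16  # Boltzmann constant in erg/K
T_KELVIN = 293.15             # 20 °C (assumed temperature)
ANGSTROM_TO_CM = 1.0e-8


def bending_modulus(kuhn_angstrom):
    """Assumed relation: alpha = k_B * T * (Kuhn length) / 2, in erg*cm."""
    return K_B_ERG_PER_K * T_KELVIN * kuhn_angstrom * ANGSTROM_TO_CM / 2.0


def beta_for_sigma(kuhn_angstrom, sigma):
    """Assumed relation: beta = alpha / (1 + sigma), in erg*cm."""
    return bending_modulus(kuhn_angstrom) / (1.0 + sigma)


def sigma_for_beta(kuhn_angstrom, beta):
    """Poisson's ratio that yields the requested beta at this Kuhn length."""
    return bending_modulus(kuhn_angstrom) / beta - 1.0


SIGMA_SWEEP = [0.3, 0.1, -0.1, -0.3, -0.5]                 # first comparison
KUHN_SWEEP = [1300.0, 1100.0, 900.0, 700.0, 500.0]         # second comparison
FIXED_BETA = 2.4e-19                                       # erg*cm

# First comparison: Kuhn length fixed at 950 angstroms, sigma varied
first_set = {s: beta_for_sigma(950.0, s) for s in SIGMA_SWEEP}
# Second comparison: beta fixed, Kuhn length varied; store (alpha, sigma)
second_set = {k: (bending_modulus(k), sigma_for_beta(k, FIXED_BETA))
              for k in KUHN_SWEEP}

if __name__ == "__main__":
    for sigma, beta in first_set.items():
        print(f"sigma = {sigma:+.1f}: beta = {beta:.2e} erg*cm")
    for kuhn, (alpha, sigma) in second_set.items():
        print(f"Kuhn = {kuhn:.0f} A: alpha = {alpha:.2e} erg*cm, sigma = {sigma:+.2f}")
```

Under these assumptions, holding β at 2.4 × 10⁻¹⁹ erg·cm while shortening λ⁻¹ from 1300 to 500 Å requires σ to move from about +0.10 to about −0.58, so the second comparison varies α and σ together while keeping the torsional stiffness unchanged.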
In summary, these comparisons enabled us to conclude that the regions of hyperflexibility for (CTG) n and (CGG) n are caused by the lower persistence lengths of the TRS as compared with random B-DNA.
FIG. 3, insets. Inset A, difference in the free energy of supercoiling, Δ(n_bp K/RT), between the TRS and random DNA. n_bp K/RT for B-DNA was subtracted from n_bp K/RT for (CGG) n or (CTG) n and plotted as a function of length. Curve 1, Δ(n_bp K/RT) for (CGG) n ; curve 2, Δ(n_bp K/RT) for (CTG) n . Inset B, difference in the apparent twisting coefficient (K), ΔK, between the TRS and random DNA. K was calculated for random sequence DNA, (CTG) n , and (CGG) n from n_bp K/RT, with T = 293.15 K (20 °C). K for B-DNA was then subtracted from K for (CGG) n (curve 1) and from K for (CTG) n (curve 2) and plotted as a function of n_bp.

DISCUSSION
We show by statistical mechanical calculations that the free energy of supercoiling for the triplet repeats (CTG) n and (CGG) n is lower than for random DNA and that TRS lengths of 500–550 bp can accommodate the highest degrees of writhe when compared with the same lengths of random B-DNA. The bending of DNA is required during chromatin organization, recognition among protein complexes distally located along the DNA, and high affinity interactions involving the binding of regulatory factors (13). In addition, torsional constraints are introduced during replication and transcription, due to underwinding and overwinding of the helices (18).
The free energies of supercoiling have also been determined experimentally for random B-DNA from the topoisomer distribution (16,17) and have been computed by statistical mechanics from the values of the bending and torsional moduli (8). These analyses indicated that the free energy of supercoiling is ∼1150·RT for B-DNA chains longer than 2 kbp.
The calculations performed in this study on (CTG) n , (CGG) n , and random B-DNA show that TRS lengths of 400–600 bp have the highest differences in free energies of supercoiling when compared with analogous lengths of random sequence DNA. We also demonstrate that this behavior depends on the differences in persistence lengths and is independent of the torsional moduli. This hyperflexibility of the TRS coincides with the length of repeats (180–200 units) that demarcates the premutation from the full mutation range in fragile X and myotonic dystrophy and also coincides with the repeat size that leads to far greater expansions (hundreds of repeats) in offspring (1–3). This correspondence of the region of hyperflexibility with the premutation to full mutation threshold makes it tempting to speculate that triplet repeat expansion is associated with the supercoiling of DNA. No doubt, the poorly understood molecular and cellular events involved with DNA slippage, genetic instabilities, anticipation, and alterations in gene expression that elicit changes in development that are recognized as disease syndromes are complex. We do not propose that DNA structure alone is responsible. However, the dynamic as well as static conformational features of the TRS may play a role.
Cellular processes such as transcription and replication dramatically alter the local superhelical densities of DNA due to the unwinding of the helices (18,28,29). Also, the extent of instability for a (CTG) n or (CGG) n tract depends on its length and orientation relative to the origin of replication in Escherichia coli (30,31). We propose that high levels of supercoiling (writhe) accumulate within the TRS during processes such as transcription and replication, which lead to instability. Some of these events are outlined in Fig. 6. Translocation of the DNA polymerase complex generates positive supercoils ahead of the replication fork, and possibly negative supercoils behind it, on the leading strand (step A) (13,18). Positive and negative supercoils partition preferentially within a tract of the TRS (rather than within random B-DNA sequences) due to its higher flexibility (step B). This partitioning is influenced by the length of the TRS, with segments of 400–600 bp accommodating the greatest levels of writhe as opposed to the same lengths of random DNA. The TRS-localized increases in positive superhelicity hinder their efficient removal by topoisomerases, thus decreasing the processivity of the polymerase complex. Pausing (32) of the enzymatic complex would then allow reiterative DNA synthesis (33) to take place (step C), thus leading to expanded daughter strands. Alternatively, the polymerase complex may dissociate from the template (step D) and allow the diffusion of positive and negative supercoil domains. This diffusion is accompanied by the release of a newly synthesized strand(s) from its parent strand(s), which causes hairpins (34) to form in the daughter strand(s). This leads to expansion; synthesis then resumes (step E), and the process is repeated. Hence, TRS writhing may be involved in genetic instabilities.
FIG. 4. Dependence of the difference in free energy of supercoiling on the torsional modulus β. n_bp K/RT was calculated for random sequence DNA from Equation 3 using λ⁻¹ = 950 Å and σ = −0.2, which correspond to a bending modulus α of 1.92 × 10⁻¹⁹ erg·cm and a torsional modulus β of 2.40 × 10⁻¹⁹ erg·cm. n_bp K/RT was then calculated for five hypothetical DNAs by holding λ⁻¹ at 950 Å and setting σ at −0.5 (β = 3.84 × 10⁻¹⁹ erg·cm), −0.3 (β = 2.74 × 10⁻¹⁹ erg·cm), −0.1 (β = 2.13 × 10⁻¹⁹ erg·cm), 0.1 (β = 1.74 × 10⁻¹⁹ erg·cm), and 0.3 (β = 1.48 × 10⁻¹⁹ erg·cm). n_bp K/RT for random DNA was then subtracted from n_bp K/RT for each of the five hypothetical DNAs, and the difference (Δ(n_bp K/RT)) was plotted as a function of length.