Mutation Spectra in Fragile X Syndrome Induced by Deletions of CGG·CCG Repeats

The fragile X syndrome results from expansions as well as deletions of the repeating CGG·CCG DNA sequence in the 5′-untranslated region of the FMR1 gene on the X chromosome. The relative frequency of disease cases promoted by these two types of mutations cannot be ascertained at present because the routine clinical assay monitors only expansions. At least 30 articles have been reviewed that document the involvement of deletions of part or all of the CGG·CCG repeats along with varying extents of DNA flanking regions as well as very small mutations including single base pair changes. Studies of deletion mutants of CGG·CCG tracts in Escherichia coli plasmids revealed a similar spectrum of mutagenic products. The triplet repeat tract in a non-B conformation is the mutagen, not the sequence per se in the right-handed B helix. Hence, molecular investigations in a simple model organism may generate useful initial information toward therapeutic strategies for this disease.

General Overview: Genetic Instability and Hereditary Neurological Diseases
Substantial progress has been made in the past 20 years in our understanding of the pathophysiology, genetics, and biochemistry of approximately 20 neurological diseases associated with simple sequence amplification (1,2). These data serve as the overarching subject of this minireview series. Dynamic mutations, in which repeating tri-, tetra-, or pentanucleotide tracts form DNA hairpin loops or slipped strand conformations of differing relative stabilities, underlie these expansions and deletions (reviewed in Refs. 1 and 3–6). The diseases, including fragile X syndrome, myotonic dystrophy, Huntington disease, and Friedreich ataxia, are reviewed elsewhere (1), along with their inheritance patterns, chromosomal localizations, protein products, and loci of the repeat sequences. In type 2 diseases, for example, the repeat expansions are massive (thousands of repeats), whereas in type 1 diseases, the triplet repeat sequences (TRS) are in coding regions and undergo modest expansions that lengthen a polyamino acid tract (usually glutamine, but alanine in some diseases) (1,2). Anticipation, that is, a decrease in the age of onset and an increase in severity in successive generations of a family pedigree, is observed with most, but not all, of these diseases. Usually, patients with longer repeat tracts show a more severe neurological syndrome. Substantial work over the past decade has demonstrated that the expansions and deletions are mediated by DNA replication, repair, and recombination, probably acting in concert (reviewed in Refs. 1 and 3–7). Slippage of the complementary strands of the repeating DNA to form non-B DNA structures, such as hairpin loops or slipped strand conformations, is an important component of the mechanism (1,3–7).
In general, the genetic instabilities in the simple repeating sequences are found within the TRS, not in the flanking regions, for the majority of these neurological disorders (1,2). However, a large number of articles (>30) have described a variety of classical mutations, such as deletions, in the DNA of fragile X syndrome patients in the vicinity of the CGG·CCG repeats. This behavior seems to be more frequent for the fragile X syndrome than for other hereditary neurological diseases (8).
To focus on the molecular mechanisms of the mutagenic spectra found in deletions related to fragile X syndrome, I shall not consider other folate-sensitive fragile sites (1,2,8) that also have CGG·CCG expansions.

Complex Family of Types of Mutations, Both Deletions and Expansions, Causes the Fragile X Syndrome
Deletions as well as expansions are important mutagenic processes in the fragile X syndrome (Fig. 1). The principal mutation responsible for the fragile X syndrome is generally considered to be the expansion of an untranslated CGG·CCG repeat in the 5′-untranslated region of the FMR1 gene on the X chromosome (1,8). This mutation is associated with hypermethylation of the proximal CpG island and the triplet repeat region, which down-regulates the FMR1 gene. An absence or a reduction in the amount of the corresponding protein (FMRP) is responsible for the etiology of the disease. Hence, any type of mutation in the FMR1 gene, including the CGG·CCG expansion, might lead to the disease. However, the notion that expansions are the predominant mechanism remains to be proven. At least 30 articles on patient DNAs describe point mutations, 2-bp changes, deletions of varying sizes (including the entire gene), and genomic rearrangements that affect part of or the entire gene. Therefore, mutations ranging in size from a single base pair to the entire gene may disrupt the function of FMRP or prevent the protein from being formed, thus giving rise to the fragile X syndrome. In general, the prevailing view within the fragile X molecular biology community is that CGG·CCG repeat expansions account for at least 95% of the mutations (1,8). However, no quantitative data are currently available, because routine clinical testing for suspected fragile X syndrome examines only the CGG·CCG repeat expansion. Accordingly, a large number of other types of mutations may be responsible for the disease phenotype but remain unrecognized because of the screening methods typically employed.
Interestingly, a Google search revealed 290,000 web sites for expansions along with 158,000 sites for fragile X deletions; a PubMed search showed about the same ratio (160 articles on expansions and 101 articles on deletions). I was surprised to learn that the bias was only modestly skewed toward expansions. Thus, on a relative basis, we know a large amount about deletions as the causative mutation.
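As a worked check of the search counts quoted above (using only the numbers given in the text), the expansion-to-deletion ratios and the share of hits concerning expansions are:

```latex
\[
\frac{290{,}000}{158{,}000} \approx 1.84, \qquad \frac{160}{101} \approx 1.58
\]
\[
\frac{290{,}000}{290{,}000 + 158{,}000} \approx 64.7\%, \qquad \frac{160}{160 + 101} \approx 61.3\%
\]
```

By either measure, expansions account for roughly three-fifths to two-thirds of the hits, so deletions make up more than a third.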
Expansions and Deletions of CGG·CCG Repeats. As stated above, studies on patient materials show that both expansions and deletions of the CGG·CCG repeats are mutations responsible for the fragile X syndrome (Fig. 1B) (1). However, the ascertainment bias inherent in the clinical assays may skew our understanding toward the expansion mechanisms. Extensive mosaicism is also observed, and most of our knowledge is based on Southern blot analyses only. Regardless, massive expansion of this TRS is an important mutagenic mechanism in the disease etiology (reviewed in Refs. 1, 4, and 6–8). However, at least one case is known of a patient with a deletion confined to the CGG·CCG repeat sequence, leaving 15 pure repeats in the DNA (Fig. 1C) (9).
Estivill and co-workers (9) state that the methylation status in this patient is normal but that the level of FMRP may not be enough to prevent the clinical abnormalities of the fragile X syndrome. Deletions of CGG·CCG repeats are common, particularly relative to expansions, in molecular biological investigations in model organisms such as E. coli and Saccharomyces cerevisiae (10–15). In these studies, the triplet repeat contractions were confined to the repeat sequence and did not extend into flanking sequences.
Deletions of a Portion of CGG·CCG Repeats and Some Flanking Sequences. Several authors have described deletions that remove a portion of the CGG·CCG repeats together with flanking sequences ranging from 30 bp to 1.6 kb (Fig. 1D) (16–23). In addition, some authors believe that CGG·CCG repeat alleles shorter than a full mutation may result from a deletion event following an expansion of the TRS. The longest deletion (1.6 kb) was reported in the initial description of this behavior (17,18). At least 35 other patients' phenotypes have been characterized within this general category (19). Thus, mutations extending from the TRS into flanking sequences are common, which suggests that the TRS plays some role in initiating the deletion process; this role is clearly demonstrated in model systems, as described below. The concept that CGG·CCG repeats undergo substantial expansions followed by deletion of a portion of the repeat sequence along with a flanking tract (17,20) is intriguing but difficult to document in human systems, where only the end product of the genetic event can be studied. Further work in model systems will be needed to evaluate this concept.
Deletion of All CGG·CCG Repeats and Some Flanking Sequences. At least 16 articles (24–39) describe large deletions of the entire CGG·CCG repeat tract along with varying extents of flanking DNA (Fig. 1E). These deletions range in size from several base pairs to as much as ~13 megabases, the largest encompassing all of the FMR1 gene and some flanking DNA sequences (36). Several other articles describe deletion of the entire FMR1 gene with smaller extents of flanking DNA. The DNA of at least 24 patients was studied. Hence, gross deletion of the FMR1 gene as a cause of the fragile X syndrome is not an isolated event.
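The deletion classes described in this section and the two preceding ones (Fig. 1, C to E) amount to an interval comparison between a deletion and the repeat tract. The following is a minimal Python sketch, assuming 0-based, half-open coordinates on a reference sequence; the function name `classify_deletion` and the category labels are hypothetical, chosen for illustration rather than taken from any published tool.

```python
"""Classify a deletion by how it overlaps the CGG·CCG repeat tract.

Sketch only: coordinates are 0-based and half-open ([start, end)),
and the category labels are illustrative, not a published standard.
"""


def classify_deletion(del_start: int, del_end: int,
                      rep_start: int, rep_end: int) -> str:
    """Return the deletion class relative to the repeat interval."""
    if del_start >= del_end or rep_start >= rep_end:
        raise ValueError("intervals must be non-empty")
    overlap = min(del_end, rep_end) - max(del_start, rep_start)
    if overlap <= 0:
        return "flanking only (repeat intact)"
    covers_repeat = del_start <= rep_start and del_end >= rep_end
    enters_flank = del_start < rep_start or del_end > rep_end
    if covers_repeat and enters_flank:
        return "entire repeat plus flanking"  # as in Fig. 1E
    if covers_repeat:
        return "entire repeat only"
    if enters_flank:
        return "part of repeat plus flanking"  # as in Fig. 1D
    return "within repeat only"  # as in Fig. 1C


if __name__ == "__main__":
    # Hypothetical repeat of 30 CGG units occupying [100, 190).
    for deletion in [(130, 175), (150, 250), (50, 300)]:
        print(deletion, classify_deletion(*deletion, 100, 190))
```

For example, a deletion confined to the repeat that leaves 15 pure repeats, like the case reported by Estivill and co-workers (9), falls in the "within repeat only" class, whereas the ~13-megabase deletion (36) falls in "entire repeat plus flanking".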
Point Mutations. The types of mutations described above are believed to be promoted by the presence of long CGG·CCG repeats. This subsection on very small mutations is included for completeness, to cover all of the types of mutations involved in fragile X syndrome; no evidence exists to support a role for long CGG·CCG repeats in the formation of the mutations described here.
Several cases of point mutations or 2-bp changes have been observed (Fig. 1F). Willems and co-workers (40) found a patient with the fragile X phenotype and without cytogenetic expression of FMRP who carried a single point mutation, a CGG·CCG repeat of normal length, and an unmethylated CpG island. Also, two different patients were found with intragenic loss-of-function mutations, a single de novo nucleotide deletion in one and a 2-bp change in the other; both patients displayed the classical features of fragile X syndrome (41). Furthermore, three unrelated fragile X patients were found with a C-to-T point mutation at the 14th nucleotide of intron 10 and normal-length CGG·CCG repeats (Fig. 1F) (42). Clearly, these point mutations are sufficient for the development of the fragile X syndrome. In contrast, three unrelated patients were identified with silent mutations in exon 1 (16,17); it is not unexpected that silent mutations would be found in FMR1 exons.

Studies in Model Systems
Conformation(s) of Non-B DNA Fragile X Syndrome Triplet Repeats as Mutagenic Agents. A recent investigation revealed that fragile X repeats (CGG·CCG) are potent inducers of complex multiple-site rearrangements and/or gross deletions in flanking DNA sequences in E. coli plasmids (43). DNA sequence analyses of mutant clones showed that these events were influenced by the length of the repeat region (24, 44, or 73 repeats), its orientation relative to the unidirectional origin of replication, and its transcription status.
Complex rearrangements occurred in the mutant clones: some products contained deletions, inversions, and insertions, and others had only gross deletions. Fig. 2 (upper panel) shows the types of multiple-site deletions and rearrangements found, ranging in size from 0.5 to 1.6 kb. Furthermore, the CGG·CCG repeats induced the formation of mutagenic products identical to the base pair as many as 22 times, indicating the powerful nature of the complex processes involved (43). The mutations also extended in both directions from the TRS. The healed junctions had CG-rich microhomologies of 1–6 bp, CG-rich regions, and putative cruciforms and slipped structures (Fig. 2, lower panel). Thus, essentially the entire mutagenic spectrum observed in patients with fragile X syndrome has been found in this E. coli model system. Accordingly, I submit that this bacterial system has numerous advantages over human systems for investigating the molecular aspects of these processes. Indeed, all of the biochemical processes (replication, repair, and recombination) responsible for the genetic instability of repeating tri-, tetra-, and pentanucleotides were first demonstrated in E. coli or yeast (1,3,4,7) and only later studied and co-opted in eucaryotic systems. Obviously, neurological or developmental issues must be addressed in eucaryotic systems, but molecular questions about the instability processes may be broached in these simpler, genetically tractable systems. Earlier investigations also found deletions in DNA sequences flanking the CGG·CCG repeats in another E. coli system and in a COS-1 cell system (44,45).
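The 1–6-bp microhomologies at the healed junctions can be identified computationally by asking how far a deletion can be slid along the reference without changing the product: the bases it can slide across are the sequence shared by both ends. The sketch below assumes a simple deletion of `ref[start:end]` (0-based, half-open) with no accompanying insertion or inversion, so it covers only single-deletion products, not the complex multiple-site rearrangements; `junction_microhomology` and `gc_fraction` are hypothetical helper names.

```python
"""Microhomology at a simple deletion junction (illustrative sketch).

Assumption: the deletion removes ref[start:end] (0-based, half-open)
from an otherwise intact reference, with no insertion or inversion.
Function names are hypothetical, not from a published tool.
"""


def junction_microhomology(ref: str, start: int, end: int) -> str:
    """Return the sequence shared by both ends of the deleted segment.

    Shared bases make the breakpoint ambiguous: sliding the deleted
    window across them yields the same product. We measure how far it
    can slide right and left, capping the total at the deletion length.
    """
    if not 0 <= start < end <= len(ref):
        raise ValueError("deletion must be a non-empty span inside ref")
    ref = ref.upper()
    length = end - start
    # Slide right: the first deleted base must equal the first retained base after the gap.
    right = 0
    while (right < length and end + right < len(ref)
           and ref[start + right] == ref[end + right]):
        right += 1
    # Slide left: the retained base just before the gap must equal the last deleted base.
    left = 0
    while (left < length - right and start - 1 - left >= 0
           and ref[start - 1 - left] == ref[end - 1 - left]):
        left += 1
    return ref[start - left:start + right]


def gc_fraction(seq: str) -> float:
    """Fraction of G or C bases; 0.0 for an empty sequence."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq.upper()) / len(seq)


if __name__ == "__main__":
    ref = "TTTTCGGAAAACGGTTTT"
    mh = junction_microhomology(ref, 4, 11)  # deletes "CGGAAAA"
    print(mh, gc_fraction(mh))  # CGG 1.0
```

Capping the total slide at the deletion length is one convention among several; inside a long tandem repeat the exact breakpoint is ambiguous across the repeat.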
Models were constructed to explain the mechanisms involved in the formation of the complex multiple-site DNA rearrangements induced by the CGG·CCG repeat tract (43). The four critical sequences and/or DNA conformational features (see Fig. 4 in Ref. 43) apparently operate in a defined set of sequential steps to generate these complex rearrangements. Non-B DNA structures (cruciforms and slipped structures) flanking the healed regions, which may also influence the induction of these mutagenic events, were described (43). However, the breaks occurred inside the CGG·CCG tracts in all clones with single and double deletions.
Thus, this sequence and/or conformation is a significant trigger for the complex DNA rearrangements. This model requires the presence of homologous sequences that serve as a substrate for double-strand break repair followed by recombination repair, which leads to deletion of the intervening sequences; healed junctions in the mutant progeny were observed at non-B DNA structures (43). The types of enzymatic systems involved in these rearrangements have been reviewed (1,4).
Relatively few studies have been reported on the molecular aspects of the genetic instabilities of CGG·CCG repeats, compared with the myotonic dystrophy type 1 and type 2 and Friedreich ataxia repeats, because these highly unstable sequences are extremely difficult to work with (11). Furthermore, DNA sequence analyses are difficult because the repetitive GC-rich arrays cause extensive slippage during both template preparation and sequencing reactions. Thus, the genetic instabilities of CTG·CAG, CCTG·CAGG, and GAA·TTC repeats have been more extensively investigated than those of CGG·CCG repeats.
Non-B DNA Conformation(s) of Other Repeat Sequences as Mutagenic Agents. Not unexpectedly, long tracts of other repeats, CTG·CAG, CCTG·CAGG, and GAA·TTC, also induce gross deletions and inversions in model systems (13,46,47). These repeat sequences are integral to the etiology of myotonic dystrophy types 1 and 2 and Friedreich ataxia, respectively. Behaviors similar to those described above for CGG·CCG repeats were observed, except that the fragile X sequence was, on a per-base-pair basis, the most potent of the repeats in causing mutagenesis (43). Furthermore, the fragile X TRS induced a broader mutation spectrum (Fig. 2) than the other repeat sequences. The long CTG·CAG, CCTG·CAGG, and GAA·TTC repeats caused deletions of most or all of the repeats and the flanking DNA sequences. Deletions of 0.6–1.8 kb as well as inversions were found in E. coli and in two types of mammalian fibroblast-like cells. Under certain conditions, 30–50% of the products of episome replication/transcription in COS-7 cells contained gross deletions. The breakpoint junctions revealed direct or inverted repeat homologies in all cases. Non-B folded conformations (i.e. slipped structures, cruciforms, or triplexes) were also predicted at or near the breakpoints in all cases. Increased negative superhelical density of the plasmids in vivo enhanced the genetic instability of the TRS (13), as expected, because it stabilizes the formation of these non-B conformations (1,3,7).
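Finding the direct and inverted repeat homologies reported at these breakpoint junctions is, at its simplest, a string search. The sketch below assumes an uppercase ACGT sequence and reports only perfect repeats of a fixed length separated by at most `max_gap` bases; it does not predict whether a slipped structure, cruciform, or triplex actually forms, which would require thermodynamic modeling. The names `find_repeats` and `revcomp` are hypothetical.

```python
"""Scan for short direct and inverted repeats (illustrative sketch).

Direct repeats can support slipped structures and inverted repeats can
support hairpins or cruciforms, but this only finds perfect sequence
matches; it does not predict whether any structure forms.
"""
from typing import Iterator, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_repeats(seq: str, unit: int,
                 max_gap: int) -> Iterator[Tuple[str, int, int]]:
    """Yield (kind, i, j) for non-overlapping copies of length `unit`.

    kind is "direct" if seq[j:j+unit] equals seq[i:i+unit], and
    "inverted" if it equals the reverse complement; at most `max_gap`
    bases (spacer or loop) separate the two copies.
    """
    seq = seq.upper()
    n = len(seq)
    for i in range(n - unit + 1):
        left = seq[i:i + unit]
        left_rc = revcomp(left)
        # j starts right after the first copy and stops at the gap limit.
        for j in range(i + unit, min(i + unit + max_gap, n - unit) + 1):
            right = seq[j:j + unit]
            if right == left:
                yield ("direct", i, j)
            if right == left_rc:
                yield ("inverted", i, j)


if __name__ == "__main__":
    # AAGG ... CCTT forms a 4-bp inverted repeat with a 4-base loop.
    for hit in find_repeats("TTAAGGTTTTCCTTGG", unit=4, max_gap=8):
        print(hit)
```

In practice one would run such a scan on a window around each breakpoint; the window size, `unit`, and `max_gap` are free choices here, not values taken from the studies cited.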
Gross deletions and other genomic rearrangements have been documented in patient materials for other hereditary neurological diseases, especially Friedreich ataxia (48). However, many more complex mutations have been reported for fragile X syndrome than for the other diseases (1,2,8).
A long-standing and critical question has been the extent to which the DNA sequences themselves trigger these mutagenic reactions versus the non-B DNA conformations that they may adopt in vivo. Wojciechowska et al. (47) demonstrated definitively, by three experimental strategies, that the non-B DNA conformations, not the sequences per se, are critical for these mutagenic mechanisms. Hence, future work should aim to evaluate the role of the non-B DNA conformational features, and of their thermodynamically most stable states, in the mutagenic processes. In addition, a prior investigation revealed that long CTG·CAG tracts induced deletions and rearrangements of flanking sequences during recombination in Chinese hamster ovary cells (49).

Future Challenges
The genetic basis for the fragile X syndrome is a variety of types of mutations, including point mutations, double mutations, and deletions of several types and of lengths up to ~13 megabases that may or may not include the CGG·CCG repeats. Although the disease is generally attributed to massive expansion of the TRS, the relative contributions of the different types of mutations are uncertain at present because the clinical assays monitor only expansions, not deletions. Mechanistic studies in E. coli and mammalian cells provide suitable models for evaluating the mechanisms and enzymatic systems responsible for these instabilities. Further investigations in these systems may be useful as initial steps toward therapeutic intervention for this disease.
An important goal of many investigations related to fragile X syndrome is to understand the broad aspects of the pathophysiology of the disease as modulated by TRS lengths and the consequences of these expansion processes. Understanding the enzymatic systems that carry out the DNA transactions causing the expansions (4) is critical. The routine clinical testing for this disease focuses on the length of the TRS and thus does not specifically monitor other types of mutations such as those described in this review. Broader molecular biological studies on patient samples will be required to establish the extent to which the various types of mutations within the spectra described herein are involved. An understanding of the molecular mechanisms requires appropriate studies in a wide range of systems, including E. coli, yeast, human cells, and mice; ultimately, our goal is to understand the mechanisms in humans. Dramatic advances have been made in our understanding of the mechanisms of human hereditary neurological diseases (1–8), underscoring the importance of a wide range of strategies for molecular investigations.

Fig. 2 legend (partial). The arrows show the directions of the sequences that were deleted, and the numbers at these arrows designate their healed junction positions. Nucleotides in shaded boxes indicate homology at breaks identified by the DNA sequencing data. The dashed lines between the nucleotides represent the continuous intervening sequences. However, the slipped structures and the deletions may be anywhere inside the CGG·CCG repeat tracts. The numbers beside the lines indicate the base pairs with direct repeat homology shown for both DNA strands, and the numbers above the boxes indicate base pairs with inverted repeat homology. Clone 54 shows a cruciform and a slipped strand structure, whereas clone 18, derived from pRW5501, presents a cruciform and two slipped structures. These conformations are representative of the types of non-B structures found. Other types of non-B structures are found at other breakpoint junctions (46,47,50). For further details, see supplemental Fig. 1 in Ref. 43.
The recognition of the roles of non-B DNA structures in human disease offers new strategies for controlling the mutagenic processes. Because non-B DNA structures, not the DNA sequence per se in the orthodox right-handed B conformation, are the mutagenic agents, methodologies that reduce the propensity of these tracts to adopt the unorthodox structures may be beneficial. Establishing this relationship between non-B DNA structures and genomic disorders has been a goal of the DNA structural field for several decades (1,3,7,47,50). Many questions concerning the role of non-B DNA structures in mutagenesis remain outstanding (see box 3 in Ref. 50). Hence, this topic is fertile ground for future investigations.