CAG Repeats Containing CAA Interruptions Form Branched Hairpin Structures in Spinocerebellar Ataxia Type 2 Transcripts*

Spinocerebellar ataxia type 2 (SCA2), one of the hereditary human neurodegenerative disorders, is caused by the expansion of the CAG tandem repeats in the translated sequence of the SCA2 gene. In a normal population the CAG repeat is polymorphic not only in length but also in the number and localization of its CAA interruptions. The aim of this study was to determine the structure of the repeat region in the normal and mutant SCA2 transcripts and to reveal the structural basis of its normal function and dysfunction. We show here that the properties of the CAA interruptions are major determinants of CAG repeat folding in the normal SCA2 transcripts. We also show that the uninterrupted repeats in mutant transcripts form slippery hairpins, whose length is further reduced by base pairing of the repeat portion with a specific flanking sequence. The structural organization of the repeat interruption systems present in other human transcripts, such as SCA1, TBP, FOXP2, and MAML2, is also discussed.

Numerous spinocerebellar ataxias (SCAs) constitute the largest group of human hereditary neurological diseases that are caused by the expansion of short tandem repeats in single genes (1)(2)(3). More than 20 different autosomal dominant SCAs are clinically recognized. For 12 of them the respective genes and implicated mutations are known. In six SCA subtypes, SCA1, SCA2, SCA3, SCA6, SCA7, and SCA17, the underlying mutations are expansions of trinucleotide CAG repeats present in the translated sequences of functionally unrelated genes. The mutated CAG repeats give rise to expanded polyglutamine tracts in the encoded proteins, and the corresponding SCAs, together with Huntington's disease, dentatorubral-pallidoluysian atrophy, and spinal and bulbar muscular atrophy, are known as polyglutamine expansion disorders (4,5). There are, however, other SCAs, SCA8 (6), SCA10 (7), and SCA12 (8), in which the expanded repeats are located outside the protein-coding regions of the involved genes. Because their pathogenesis cannot be explained by an altered structure and altered properties of the encoded proteins, a causative role of RNA in the etiology of these SCAs has repeatedly been postulated (9-11). Questions regarding a possible contribution of RNA to the pathogenesis of the so-called polyglutamine diseases have also been raised recently (10-16). The mechanism of transcript involvement considered in these diseases was similar to that demonstrated for the two types of myotonic dystrophy, DM1 and DM2 (11,17,18), and for fragile X-associated tremor ataxia syndrome (14,15,19). In brief, the expanded repeats in transcripts form stable hairpin structures that impair normal RNA metabolism, sequester important cellular factors, and alter the functions of other transcripts (11, 13, 16-18, 20).
The SCA2 gene (21)(22)(23)(24) encodes the protein ataxin-2, whose function is unknown (Online Mendelian Inheritance in Man number 601517). Its normal alleles range in length from 14 to 32 CAG repeats, and the expanded mutant alleles range from 35 to 50 repeats (22)(23)(24). Alleles of intermediate length belong to the so-called gray zone because they do not give rise to the disease in all carriers. The CAG repeat polymorphism in the SCA2 gene is not confined to its length. Normal alleles usually contain one to three CAA interruptions (22)(23)(24)(25)(26). In this respect the SCA2 gene resembles SCA1, in which the repeats contain CAU interruptions (25,27,28), and the FMR1 gene, involved in the fragile X syndrome and fragile X-associated tremor ataxia syndrome, which contains AGG interruptions within its CGG repeats (29). SCA2 alleles that harbor a single interruption are very common in various populations (25,26). The CAA triplet is located asymmetrically within the repeated sequence: typically it is positioned closer to the 3′-end of the CAG repeat and is followed by eight CAG repeats on its 3′ side. Because these SCA2 alleles range in length from 14 to 30 repeats, the pure CAG repeat tracts are most variable in their 5′-part (6-20 repeats). Alleles containing two interruptions occur most frequently in the 8-1-4-1-8 configuration (in this notation each 1 denotes a CAA interruption). Their variants differ by one CAG triplet in one of the three pure repeat tracts (25). Only two SCA2 variants are known to contain three interruptions. They are composed of either 27 or 29 repeats in the 8-1-4-1-4-1-8 and 8-1-4-1-4-1-10 configurations, respectively. In all of the populations studied so far, the two most prevalent SCA2 alleles contain 22 repeats in the 8-1-4-1-8 and 13-1-8 configurations; together they account for about 90% of chromosomes. Uninterrupted normal SCA2 alleles are very rare, whereas the expanded mutant alleles usually lack interruptions (22)(23)(24)(26).
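The configuration notation used above can be made concrete with a short sketch. The Python below is an illustrative helper, not something used in the original study, and its function names are hypothetical. It converts a repeat tract written as DNA or RNA triplets into the x-1-y notation and back, assuming that the tract contains only CAG and CAA triplets, that interruptions are isolated single CAA triplets flanked by CAG runs, and that the total repeat count includes the CAA triplets (as in the allele sizes quoted above).

```python
def to_configuration(tract: str) -> tuple[str, int]:
    """Convert a CAG/CAA tract (DNA or RNA letters) to (configuration, total triplets)."""
    tract = tract.upper().replace("U", "T")
    if len(tract) % 3 != 0:
        raise ValueError("tract length must be a multiple of 3")
    triplets = [tract[i:i + 3] for i in range(0, len(tract), 3)]
    blocks: list[str] = []
    run = 0  # length of the current uninterrupted CAG run
    for t in triplets:
        if t == "CAG":
            run += 1
        elif t == "CAA":
            if run:
                blocks.append(str(run))
            blocks.append("1")  # a single CAA interruption
            run = 0
        else:
            raise ValueError(f"unexpected triplet {t!r}")
    if run:
        blocks.append(str(run))
    return "-".join(blocks), len(triplets)


def from_configuration(config: str) -> str:
    """Expand e.g. '8-1-4-1-8' to DNA; even-indexed fields are CAG runs, odd-indexed are CAA."""
    parts = [int(p) for p in config.split("-")]
    return "".join(("CAG" if i % 2 == 0 else "CAA") * n for i, n in enumerate(parts))


for cfg in ["13-1-8", "8-1-4-1-8", "8-1-4-1-4-1-8", "8-1-4-1-4-1-10"]:
    print(cfg, to_configuration(from_configuration(cfg)))
```

The round trip reproduces the repeat totals quoted in the text (22 for 13-1-8 and 8-1-4-1-8, 27 and 29 for the two three-interruption variants).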
In this study we have determined the structures formed by the repeat regions of various normal and mutant SCA2 alleles in transcripts. We addressed questions related to the possible physiological functions of the SCA2 repeats in RNA and the possible involvement of mutant transcript structures in SCA2 pathogenesis. More specifically, we asked: What is the precise structural role of the CAA interruptions within the CAG repeats? What are the similarities and differences between the repeat interruption systems operating in the SCA1 (16) and SCA2 transcripts? To gain wider insight into the role of repeat interruptions in RNA, we extended these comparisons to transcripts from several other human genes containing interrupted CAG repeats. We also compared the structures of the repeat regions in various normal and mutant SCA2 transcripts and discussed the structural features of the mutant transcripts that could implicate them in RNA-mediated pathogenesis.

EXPERIMENTAL PROCEDURES
Preparation of DNA Templates and in Vitro Transcription-DNA templates for in vitro transcription were synthesized by PCR from gel-purified SCA2 amplification products (primer sequences SCA2-F, 5′-GCGTTCCGGCGTCTCCTTGG, and SCA2-R, 5′-GGCTTGCGGACATTGGCAGC; GenBank™ NM_002973 positions 564 to 755) containing different sequence variants of the CAG repeat tract, as described earlier (25). In the amplification reactions, the forward primer SCA2-FT7, 5′-TAATACGACTCACTATAGGGCCCCTCACCATGT, which carries a T7 RNA polymerase promoter at its 5′ end, and the reverse primer SCA2-R, 5′-GGCTTGCGGACATTGGCAGC, were used. The PCR was performed in a 50-μl reaction containing a DNA template (gel-purified PCR product), 1 μM each of primers SCA2-FT7 and SCA2-R, 10 mM Tris-HCl (pH 9.0), 1.5 mM MgCl2, 50 mM KCl, 0.1% Triton X-100, 200 μM each dNTP, and 1.5 units of Taq DNA polymerase (Promega), under the following conditions: 94°C for 1 min, followed by 35 cycles of 94°C for 1 s, 60°C for 1 s, and 72°C for 30 s.
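The template design can be sketched as follows, assuming the standard T7 promoter core 5′-TAATACGACTCACTATA (positions -17 to -1), after which T7 RNA polymerase normally initiates at the following G (+1). Under that assumption the transcripts begin with the part of SCA2-FT7 that lies downstream of the promoter core. The second helper is plain inclusive-coordinate arithmetic for the reference amplicon defined by SCA2-F and SCA2-R; actual product sizes differ between alleles because the repeat length varies. The helper names are hypothetical.

```python
T7_PROMOTER_CORE = "TAATACGACTCACTATA"  # consensus T7 promoter, positions -17 to -1


def split_t7_primer(primer: str) -> tuple[str, str]:
    """Split a T7 forward primer into (promoter core, sequence transcribed from +1)."""
    primer = primer.upper()
    idx = primer.find(T7_PROMOTER_CORE)
    if idx < 0:
        raise ValueError("T7 promoter core not found in primer")
    end = idx + len(T7_PROMOTER_CORE)
    return primer[:end], primer[end:]


def inclusive_length(start: int, end: int) -> int:
    """Length of a region given inclusive 1-based coordinates."""
    return end - start + 1


promoter, transcribed = split_t7_primer("TAATACGACTCACTATAGGGCCCCTCACCATGT")
print(promoter, transcribed.replace("T", "U"))  # RNA 5' end under the stated assumption
print(inclusive_length(564, 755))  # reference SCA2-F/SCA2-R span on NM_002973
```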
A transcription reaction carried out in a 50-μl volume contained about 10 pmol of DNA template (10-20 μl of the PCR product described above), 400 units of T7 RNA polymerase (Epicenter), 3 mM guanosine, 0.5 mM each ribo-NTP, 10 mM dithiothreitol, 40 mM Tris-HCl (pH 7.9), 6 mM MgCl2, 2 mM spermidine, 10 mM NaCl, and 50 units of RNasin. Incubation was at 37°C for 1 h. The transcription products were purified in denaturing 6% polyacrylamide gels, visualized by Stains-all (Serva) staining, extracted from the gel slices by diffusion for 2 h at room temperature in 0.3 M NaOAc, 0.1% SDS, and 0.5 mM EDTA, and recovered by ethanol precipitation. 5′-End labeling was performed with 10 units of T4 polynucleotide kinase (Epicenter) and 50 μCi of [γ-32P]ATP (4500 Ci/mmol; ICN) at 37°C for 5 min, and the RNA was repurified by electrophoresis on a 10% polyacrylamide gel containing 7.5 M urea and extracted as described above.
Nuclease Digestions and Lead Cleavages-Prior to the structure probing reactions, the 32P-labeled RNAs were subjected to a denaturation and renaturation procedure in a solution containing 20 mM Tris-HCl (pH 7.2), 80 mM NaCl, and 2 mM MgCl2, by heating the sample at 80°C for 1 min and then slowly cooling it to 37°C. Limited RNA digestion was initiated by mixing 5 μl of the RNA sample (50,000 cpm) with 5 μl of a probe solution containing lead ions, nuclease S1, or ribonuclease T1 or T2 at three different concentrations: Pb2+, 0.25, 0.5, and 1 mM; S1 nuclease, 0.5, 1, and 2 units/μl (1 mM ZnCl2 was present in each reaction); RNase T1, 0.5, 1, and 1.5 units/μl; RNase T2, 0.1, 0.2, and 0.4 unit/μl. RNase H digestion was performed by adding to the transcript solution, after the denaturation and renaturation procedure (as above), one of three types of oligodeoxynucleotides (a library of random 6-mers (100 pmol), a library of random 7-mers (100 pmol), or an oligonucleotide complementary to the CAG repeats (ctg-ODN) with the sequence 5′-TGCTGCT (5 pmol)), together with RNase H at a final concentration of 0.10 or 0.25 unit/μl. All of the reactions were performed at 37°C for 20 min and stopped by adding an equal volume of stop solution (7.5 M urea and 20 mM EDTA with dyes) and freezing the samples.
Analysis of the Reaction Products-To determine the cleavage sites, the products of lead-induced hydrolysis and nuclease digestion were separated in 10-15% polyacrylamide gels (acrylamide/bisacrylamide, 19/1) containing 7.5 M urea, 90 mM Tris borate buffer, and 2 mM EDTA, along with the products of alkaline hydrolysis and limited T1 ribonuclease digestion of the same RNA. The alkaline hydrolysis ladder was generated by incubating the labeled RNA in formamide containing 0.5 mM MgCl2 at 100°C for 10 min. The partial T1 ribonuclease digestion of the RNAs was performed under semi-denaturing conditions (10 mM sodium citrate, pH 5.0, 3.5 M urea) with 0.2 unit/μl of the enzyme at 55°C for 10 min. The cleavage sites characteristic of nucleases S1 and H were determined by comparison with the homologous S1 ladder (not shown). The RNA fragments generated by nucleases S1 and H terminate with 3′-hydroxyls and therefore migrate more slowly than the corresponding formamide and T1 fragments, which end in 3′-phosphates or 2′,3′-cyclic phosphates. Electrophoresis was performed at 1500 V and was followed by autoradiography at −80°C with an intensifying screen. The products of the structure-probing reactions were also visualized and analyzed by phosphorimaging (Typhoon; Molecular Dynamics).
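The sizing ladders can be predicted from sequence alone. For a 5′-end-labeled RNA, RNase T1 cuts 3′ of G, so each labeled T1 fragment ends at a G, while alkaline hydrolysis cuts essentially every phosphodiester bond. A minimal sketch with hypothetical helper names; it uses absolute positions counted from the 5′ end and does not model the mobility offset between 3′-hydroxyl and 3′-phosphate fragments described above.

```python
def t1_ladder(rna: str) -> list[int]:
    """Lengths of 5'-labeled fragments from partial RNase T1 digestion (cut 3' of each G).

    A G at the very 3' end gives no shorter fragment and is skipped.
    """
    rna = rna.upper()
    return [i + 1 for i, base in enumerate(rna) if base == "G" and i + 1 < len(rna)]


def alkaline_ladder(rna: str) -> list[int]:
    """Alkaline hydrolysis cuts (nearly) every bond: labeled lengths 1..n-1."""
    return list(range(1, len(rna)))


def label_band(rna: str, length: int) -> str:
    """Name the 3'-terminal residue of a labeled fragment, e.g. 'G3'."""
    return f"{rna[length - 1].upper()}{length}"


demo = "CAGCAGCAACAGCAG"  # short interrupted repeat, for illustration only
print([label_band(demo, n) for n in t1_ladder(demo)])
```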
Electrophoresis in Nondenaturing Conditions-The structural homogeneity of each transcript was analyzed by electrophoresis of radiolabeled samples in a 12% nondenaturing polyacrylamide gel (dimensions, 150 × 140 × 1 mm; acrylamide/bisacrylamide, 29/1) buffered with 45 mM Tris borate at a fixed temperature of 37°C. Prior to gel electrophoresis, the 32P-labeled transcripts were subjected to a denaturation and renaturation procedure in a solution containing 10 mM Tris-HCl (pH 7.2), 40 mM NaCl, and 1 mM MgCl2 (heating at 80°C for 1 min and slow cooling to 37°C) and mixed with an equal volume of 7% sucrose with dyes. Electrophoresis was conducted at a constant power of 15 W.
RNA Secondary Structure Modeling-RNA secondary structure prediction was performed using the mfold program, version 3.1 (www.bioinfo.rpi.edu/applications/mfold) (30). This program predicts the optimal and suboptimal secondary structures of an RNA by free energy minimization, using nearest-neighbor energy parameters determined for 1 M NaCl (pH 7.0) at 37°C.

RESULTS
Analyzed Transcripts and Structure Probes-Fourteen different SCA2 transcripts were analyzed in this study (Table I). They included 12 of the 24 normal variants of the SCA2 triplet repeat known to occur in the general population (25). These 12 variants represented all of the different types of repeat configuration found in the SCA2 gene. In all transcripts the repeat regions were surrounded by the same specific flanking sequences: 27 nt at the 5′-end and 32 nt at the 3′-end of the repeats. Seven transcripts contained a single CAA interruption at various locations within the repeat tract, and the repeats of three other transcripts harbored two interruptions separated by four pure CAG repeats. The remaining two normal transcripts contained three centrally located CAA triplets, which were also separated by (CAG)4 (Table I). Two mutant SCA2 transcripts contained 36 and 37 uninterrupted CAG repeats, typical of SCA2 mutations. Structure analysis of all transcripts was performed using five well characterized probes. Lead ions preferentially cleave flexible single-stranded RNA regions (31-33) but also some relaxed portions of double-stranded structures (34). Ribonuclease T1 cleaves single-stranded RNA specifically after G nucleotides. Ribonuclease T2 is less specific and cleaves all phosphodiester bonds within single-stranded regions, but preferentially ApN bonds (35,36). Nuclease S1 cleaves accessible single-stranded regions in RNA with no reported base specificity (35,36).
TABLE I
SCA2 transcripts analyzed (rows recovered from the extracted text; total repeats include the CAA triplets)
Total repeats | Repeat sequence | Configuration
One CAA interruption
19 | (CAG)10 CAA (CAG)8 | 10-1-8
21 | (CAG)12 CAA (CAG)8 | 12-1-8
22 | (CAG)13 CAA (CAG)8 | 13-1-8
23 | (CAG)14 CAA (CAG)8 | 14-1-8
23 | (CAG)13 CAA (CAG)9 | 13-1-9
24 | (CAG)15 CAA (CAG)8 | 15-1-8
Two CAA interruptions
22 | (CAG)8 CAA (CAG)4 CAA (CAG)8 | 8-1-4-1-8
23 | (CAG)8 CAA (CAG)4 CAA (CAG)9 | 8-1-4-1-9
23 | (CAG)9 CAA (CAG)4 CAA (CAG)8 | 9-1-4-1-8
(The remaining rows of Table I were not recovered.)
FIG. 1 (legend; beginning not recovered). … are similar in all of the analyzed transcripts and discussed in this paper. The S1 cuts in the repeat region, which differ between transcripts, are indicated by vertical green lines. Fragments of autoradiograms corresponding to sequences forming the structure modules M1, M2, and M3 are also indicated. B, cleavages induced by lead ions, S1 nuclease, and T1 ribonuclease in the 3′-part of the 14-1-8 transcript, which forms the M3 module (other designations are as in A). The observed pattern of cleavages in this region is very similar in all of the analyzed transcripts. C, secondary structure model of the repeat region of SCA2 transcripts, with the invariable structures of the repeat flanking sequences shown in detail. Cleavage sites that are similar in all of the analyzed transcripts are indicated for each probe used (see figure inset for probe designations and cleavage intensity classification). The RNA structure modules (M1, M2, and M3) are marked. The CAG repeat sequence is shown in red and spanned by arrows. The variable structure of the M2 module is shown schematically.
We also used two types of short oligonucleotide probes, a random library of hexamers (37) and 5′-TGCTGCT-3′ (ctg-ODN), together with Escherichia coli RNase H, which cleaves the RNA strand at the sites of RNA/DNA hybrids. The RNA cuts occur at the 3′-side of the hybrid, leaving 5′-phosphates and 3′-hydroxyls at the cleavage sites (37)(38)(39). All of the RNA structure probing reactions were carried out under very similar solution conditions, and the main conclusions were based on comparisons of the results obtained with different transcripts and structure probes.
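The ctg-ODN probe works by base pairing: the RNA site it targets, read 5′ to 3′, is the reverse complement of the DNA probe. The minimal sketch below, with hypothetical function names, lists every perfect sequence match for a DNA oligonucleotide in an RNA. It only finds complementary sequence; whether a site is actually hybridized and cut by RNase H depends on its accessibility in the folded RNA, which is exactly what the probing experiments measure.

```python
DNA_TO_RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def rna_target_of_odn(odn_5to3: str) -> str:
    """RNA sequence (5'->3') that pairs with a DNA oligonucleotide given 5'->3'."""
    return "".join(DNA_TO_RNA_COMPLEMENT[b] for b in reversed(odn_5to3.upper()))


def hybridization_sites(rna: str, odn_5to3: str) -> list[int]:
    """1-based start positions in the RNA of perfect matches to the ODN target."""
    rna = rna.upper().replace("T", "U")
    target = rna_target_of_odn(odn_5to3)
    k = len(target)
    return [i + 1 for i in range(len(rna) - k + 1) if rna[i:i + k] == target]


print(rna_target_of_odn("TGCTGCT"))                          # AGCAGCA
print(hybridization_sites("CAG" * 4, "TGCTGCT"))             # [2, 5]
print(hybridization_sites("CAGCAGCAACAGCAG", "TGCTGCT"))     # [2]
```

In the last example a single CAA interruption removes one of the candidate sites, a purely sequence-level effect that is separate from the structural inaccessibility discussed in the Results.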
Overall Structures of All Analyzed Transcripts Show Several Similarities-Examples of structure probing results obtained for representative transcripts are shown in Fig. 1. Inspection of the complete data set shows that the same probes generate the same cleavage patterns in the repeat flanking sequences of all transcripts. For example, in transcripts with one, two, and three interruptions (Fig. 1A), two regions of the repeat flanking sequences are highly susceptible to S1, T1, T2, and lead cleavage. These regions are marked with blue arrowheads in Fig. 1A. The first susceptible region is located upstream of the triplet repeat between nucleotides C12 and U16, and the second is located downstream of the repeat and includes the pentanucleotide CpGpCpCpCp, e.g. C101 to C105 in the 14-1-8 transcript (Fig. 1B). Both highly reactive regions are flanked on each side by RNA segments that are highly resistant to cleavage, which suggests their involvement in the formation of two autonomous hairpin structures. The two highly reactive pentanucleotide regions mentioned above form terminal loops in the hairpins present in structure modules M1 and M3 (Fig. 1C).
FIG. 3 (legend; title not recovered). A, patterns of nuclease S1 cleavages in three SCA2 transcripts, 10-1-8, 12-1-8, and 15-1-8, containing one CAA interruption (marked by a red arrowhead) variably located in the repeat region. Vertical green lines show differences in cleavage patterns between transcripts. Transcripts are ordered by increasing total number of repeats. Other designations are as described in the Fig. 1A legend. B, nuclease S1 cleavage patterns generated in three similar SCA2 transcripts with the following repeat configurations: 13-1-8, 13-1-9, and 14-1-8 (other designations as in Fig. 1A). C, proposed secondary structure of the SCA2 transcripts containing one CAA interruption and either an odd (14-1-8) or an even (13-1-8) total number of repeats. Positions of selected G residues of the CAG repeats are marked, and the A residues of the CAA interruptions are shown by red ovals. For both transcripts two alternatively folded structural modules M2 are presented (conformer I and conformer II). Each is composed of two hairpins, either M2a and M2b or M2a′ and M2b′. The designations of internal loops (a and a′) and hairpin terminal loops (b and c or b′ and c′) are also shown. Invariant structure modules M1 and M3 are not presented except for the uppermost structure. Other designations are as in the legend to Fig. 1C. D, left panel, patterns of ribonuclease T1 cleavages in two transcripts (CAG)17cl and (CAG)16cl …
Another common feature of all analyzed transcripts is the enhanced susceptibility to cleavage of both the 5′ and 3′ termini of the repeated sequence (also marked by blue arrowheads in Fig. 1A). The two 5′ CAG repeats show similar reactivity with all three nucleases used. Ribonuclease T2 cleaves the C1pA1 and A1pG1 bonds (i.e. phosphodiester bonds within the first CAG repeat), nuclease S1 cuts A1pG1, and ribonuclease T1 cleaves G1pC2. The next several repeats are resistant to cleavage by these nucleases (Fig. 1A). At the 3′ end of the repeated sequence, the last five CAG repeats show very similar cleavage patterns in all transcripts. This similarity is illustrated in more detail for the 14-1-8 transcript in Fig. 1B. The last four internucleotide bonds of this sequence, located between the G residues of CAG repeats 22 and 23, are strongly resistant to nuclease and lead-induced cleavage. This effect may be explained by the involvement of these nucleotides in the double-helical structure of the M3 module (Fig. 1C). On the other hand, the strongly enhanced susceptibility to lead cleavage of the neighboring 11 nucleotides shown in Fig. 1A can be explained by their location between the M2 and M3 modules and in the irregular part of module M3 (Fig. 1C). In summary, our results suggest that as many as 10 nucleotides from the four 3′-terminal CAG repeats base pair with the 3′-flanking sequence (Fig. 1C), and the remaining repeats form the M2 module, whose structure varies with the number and location of the repeat interruptions.
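Residues are referred to above both by absolute transcript position (e.g. C12, C101) and by repeat number (e.g. G1, G22). A minimal sketch for converting between the two, assuming that absolute numbering starts at the first nucleotide of the 27-nt 5′ flank and that every repeat unit is a triplet; the helper names are hypothetical.

```python
FLANK_5_NT = 27  # 5'-flank length given in the text (assumed to start at position 1)


def repeat_to_absolute(repeat_index: int, pos_in_triplet: int) -> int:
    """Absolute 1-based position of nucleotide pos_in_triplet (1-3) of a given repeat."""
    if not 1 <= pos_in_triplet <= 3:
        raise ValueError("pos_in_triplet must be 1, 2 or 3")
    return FLANK_5_NT + 3 * (repeat_index - 1) + pos_in_triplet


def absolute_to_region(position: int, total_repeats: int) -> str:
    """Describe an absolute position as a 5'-flank, repeat, or 3'-flank coordinate."""
    repeat_end = FLANK_5_NT + 3 * total_repeats
    if position <= FLANK_5_NT:
        return f"5' flank nt {position}"
    if position <= repeat_end:
        offset = position - FLANK_5_NT - 1
        return f"repeat {offset // 3 + 1}, nt {offset % 3 + 1}"
    return f"3' flank nt {position - repeat_end}"


# 14-1-8 has 23 repeats in total
for pos in (12, 30, 101):
    print(pos, absolute_to_region(pos, 23))
```

Under this convention, position 101 of the 23-repeat 14-1-8 transcript falls five nucleotides into the 3′ flank.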
To make sure that the molecular architecture of the M3 module was properly assigned and that the specific repeat flanking sequences indeed do not base pair with each other, we deconstructed this module as shown in Fig. 2C, so that most of the 3′-flanking sequence was excluded from the interaction postulated for M3 in Fig. 1C. This was achieved by hybridizing the 20nt-ODN, an oligodeoxynucleotide complementary to the terminal 20 nt of the 3′-flanking sequence (Fig. 2C). Highly efficient ODN binding, as well as the structural homogeneity of the analyzed transcripts and of their hybrids with the ODN, was confirmed by polyacrylamide gel electrophoresis under nondenaturing conditions (Fig. 2A). The results of structure probing experiments performed before and after 20nt-ODN hybridization are shown for the 12-1-8 and 15-1-8 transcripts in Fig. 2B. In the transcript structure rearranged by the 20nt-ODN, cleavages occurred, as expected, at different sites and pointed to the structure shown in Fig. 2C. In this structure all CAG repeats are involved in the formation of a rearranged M2 module composed of the repeated sequence and the downstream pseudorepeats 5′-CCGCCGCCCG-3′. Importantly, there are no structural differences in the 5′-flanking region (M1 module) between the 20nt-ODN-hybridized and nonhybridized transcripts (the same reactive regions are shown by blue arrowheads in Fig. 2B). This indicates that the 5′- and 3′-flanking sequences do not interact with each other in any of the transcripts analyzed in this study (Table I).
Interrupted CAG Repeats Form Branched RNA Structures-Unlike the specific flanking sequences described above, the repeats themselves form diverse branched structures (M2 module) in the analyzed transcripts. All seven transcripts harboring a single interruption are cleaved in two regions: one in the central portion of the 5′-pure CAG repeat tract and the other at the site of the interruption (Fig. 3, A and B). This cleavage pattern points to similar secondary structures of the repeat regions, which contain a three-way junction (loop a) from which two hairpins protrude (Fig. 3C). These are the M2b hairpin, which always contains a small 4-nt terminal loop c harboring the interruption, and the M2a hairpin, whose structure is more variable. Both the length of the M2a stem and the size of its loop b differ between transcripts. The size of terminal loop b depends on the total number of repeats in a transcript: it is 7 nt in transcripts with an odd number of repeats and 4 nt in those with an even number (upper and lower structures shown in Fig. 3C). This dependence is clearly demonstrated by the S1 and T1 nuclease cleavage patterns presented in Fig. 3 (A, B, and D). As the length of the pure 5′-CAG repeat tract increases from 10 repeats in 10-1-8 to 15 repeats in 15-1-8, the regions of enhanced S1 reactivity include repeats 4 and 5 and repeats 7 and 8, respectively (Fig. 3A). However, the structure of the M2 module also depends on the length of the 3′-pure CAG repeat tract. For example, transcripts 13-1-8 (22 repeats) and 13-1-9 (23 repeats), which have 5′-pure repeat tracts of the same length, show different nuclease S1 cleavage patterns; the regions of difference are marked with a vertical green line in Fig. 3B. On the other hand, transcripts 13-1-9 and 14-1-8, which have the same total number of repeats, have similar nuclease S1 cleavage patterns.
Furthermore, the number of cuts generated by nucleases in the central portion of the 5′-pure CAG repeat tract is higher than the terminal loop sizes alone would explain. It is therefore likely that in each of the transcripts, whether containing an odd or an even number of repeats, two slightly different structures coexist (conformers I and II in Fig. 3C). The two conformers of each pair differ in the location of the junction (loop a or a′) and terminal loop (b or b′) motifs. For example, in conformer I of the 14-1-8 transcript terminal loop b is formed by repeats 6 and 7, whereas loop b′ is formed by repeats 7 and 8 in conformer II (Fig. 3C). Based on the intensities of the cleavages generated by nucleases S1 (Fig. 3B) and T1 (Fig. 3D), it may be concluded that conformer I, which contains hairpins M2a and M2b, predominates under the conditions of structure probing.
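The odd/even rule for loop b can be stated as a one-line function. The sketch below (hypothetical helper names) simply encodes the empirical observation reported above for single-interruption transcripts; it does not derive the loop size from thermodynamics and is not meant to apply to transcripts with two or three interruptions.

```python
def total_repeats(config: str) -> int:
    """Total repeats in an x-1-y style configuration, counting CAA triplets."""
    return sum(int(part) for part in config.split("-"))


def predicted_loop_b_nt(config: str) -> int:
    """Loop b size per the observed rule: 7 nt for an odd total, 4 nt for an even total."""
    parts = config.split("-")
    if len(parts) != 3 or parts[1] != "1":
        raise ValueError("rule observed only for single-interruption (x-1-y) transcripts")
    return 7 if total_repeats(config) % 2 == 1 else 4


for cfg in ("12-1-8", "13-1-8", "13-1-9", "14-1-8"):
    print(cfg, total_repeats(cfg), predicted_loop_b_nt(cfg))
```

Transcripts 13-1-9 and 14-1-8 share a total of 23 repeats and therefore receive the same prediction, in line with their similar S1 cleavage patterns.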
In transcripts containing three interruptions in their repeat tracts, all three interruption sites show enhanced susceptibility to cleavage (indicated by the vertical green lines in Fig. 1A). As many as four reactive regions exist within the repeated sequence of the transcript containing two interruptions in the 8-1-4-1-8 configuration. Two of these regions include the interruptions, and the other two are in the pure CAG repeat tracts (Fig. 1A). The observed cleavage patterns are consistent with more relaxed structures of the M2 module in transcripts containing two and three interruptions than in transcripts harboring one interruption (Fig. 4). The M2 modules in the former two classes of transcripts contain two junctions (loops a and b) and three terminal loops (loops c-e), each composed of a 4-nt sequence (either 5′-AGCA or 5′-AACA) (Fig. 4).
FIG. 3 (legend, continued). … (described in Ref. 13) containing 17 and 16 pure CAG repeats flanked on both sides by additional complementary sequences, which are marked as G-C clamp in the structure models presented. These transcripts form fixed secondary structures containing either a 7-nt or a 4-nt terminal loop (13). The ranges of cleavages generated by nucleases in these terminal loops are marked with a blue bracket in the electropherograms. The cleavage sites and intensities are shown for three enzymatic probes (as in C) in the terminal loops of the established hairpin structures. Right panel, RNase T1 cleavages generated in the loop b (b′) regions of two SCA2 transcripts, 12-1-8 and 13-1-8, representing transcripts with an odd and an even total number of repeats, respectively. The ribonuclease T1 cuts specific for the alternative terminal loops b and b′ are shown by green and blue lines, respectively. In the 12-1-8 transcript the 7-nt loop b comprises both G5 and G6 residues, and loop b′ harbors G6 and G7 residues. In the 13-1-8 transcript the smaller 4-nt terminal loops b and b′ contain only one G residue, G6 or G7, in conformer I and conformer II, respectively.
The structural feature that all normal SCA2 transcripts have in common is the canonical Watson-Crick base pairing between the G1pC2 sequence of the first two repeats and the complementary GpC sequence in the fourth and fifth repeats counting from the 3′-end, shown on an orange background in the structures presented in Fig. 4. Our experimental data suggest that this region is involved in coaxial stacking between the G-C base pairs located at the bases of the M2 and M3 modules. The main evidence for this stacking interaction is the strong resistance of the sequence located between the M2 and M3 modules to the single-strand-specific nucleases and the inaccessibility of this sequence to hybridization with short complementary ODNs. The occurrence of the energetically less favorable 7-nt terminal loop b in the M2a hairpin of transcripts containing an odd number of repeats and a single interruption (Fig. 3C) also supports the postulated coaxial stacking.

Structure of CAG Repeat Region in Mutant Transcripts Is Different in Several Respects-The structures of the two mutant transcripts, pure(CAG)36 and pure(CAG)37, differ from those of all investigated normal transcripts only in their M2 module. In approximately the central portion of the CAG repeat tract of pure(CAG)36, between residues G13 and A17 (the G residue of repeat 13 and the A residue of repeat 17), a number of cleavages are generated by all enzymatic probes used (Fig. 5, A-C). The strongest cuts occur between positions A16 and A17 and correspond exactly to the 4-nt terminal loop a of the hairpin structure formed by 32 CAG repeats (Fig. 5D). This result also confirms the involvement of the 3′-terminal 10 nucleotides of the CAG repeat in the formation of the autonomous module M3. The remaining, weaker cuts (G13-A16) can be assigned to the terminal loops of alternatively aligned, "slipped" hairpin structures formed by fewer repeats (29, 27, and 25), whose terminal loops a′, a″, and a‴ harbor residues G15, G14, and G13, respectively (Fig. 5, C and D). Quantitative analysis of the decreasing intensities of the S1 cuts (Fig. 5B) and of the T1 and T2 cuts (Fig. 5C) allowed us to estimate the shares of the alternative M2, M2′, and M2″ structures at about 50, 30, and 20%, respectively. In Fig. 5D only the structures of the M2 and M2′ modules are shown. Further evidence for the slippery nature of the M2 module comes from the analysis of cuts generated in the region separating the M2 and M3 modules (Fig. 5, B and C). In the pure(CAG)36 variant containing the M2′ module, this region is formed by as many as four CAG repeats (from A30 to A33 in Fig. 5D). In contrast, no nuclease cuts were observed in this region in the normal transcripts that contain interrupted repeats. This difference is also clearly demonstrated by the RNase H cleavage patterns defined by the short ODN hybridization sites.
Both probes of this type, the random hexamers and the ctg-ODN, hybridize efficiently to the spacer sequence between the M2 and M3 modules of pure(CAG)36 and induce efficient RNase H cuts between A30 and A33 (Fig. 5A). By contrast, in all analyzed normal transcripts that contain CAA interruptions, the region between modules M2 and M3 is completely inaccessible to the short ODN probes, as shown for the 15-1-8 transcript in Fig. 5A (green vertical line). Other sites, however, are similarly accessible to these ODN probes in normal and mutant SCA2 transcripts.
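The loop positions quoted above follow from simple geometry if each hairpin is assumed to fold symmetrically starting from repeat 1: a hairpin built from N repeats then has its terminal loop centred on repeat (N + 1)/2. The sketch below encodes that assumption, together with the approximation that about three 3′-terminal repeats (roughly the 10 nt noted above) are sequestered in M3, and lists the spacer repeats left between M2 and M3. It is a hypothetical illustration with made-up helper names, not the authors' analysis.

```python
def loop_center_repeat(n_hairpin_repeats: int) -> float:
    """Repeat index (1-based) at the centre of a hairpin folded symmetrically from repeats 1..N.

    A half-integer value means the loop spans the junction of two repeats.
    """
    return (n_hairpin_repeats + 1) / 2


def spacer_repeats(n_hairpin_repeats: int, n_total: int, n_in_m3: int = 3) -> list[int]:
    """Repeats left between the hairpin and M3, assuming n_in_m3 3' repeats pair in M3."""
    return list(range(n_hairpin_repeats + 1, n_total - n_in_m3 + 1))


for n in (32, 29, 27, 25):
    print(n, loop_center_repeat(n), spacer_repeats(n, 36))
```

For pure(CAG)36 this gives loop centres at 16.5 (between repeats 16 and 17), 15, 14, and 13, and a four-repeat spacer (repeats 30 to 33) for the 29-repeat hairpin. The single residual repeat listed for the 32-repeat hairpin reflects the crude three-repeat approximation of the 10-nt M3 contribution.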

DISCUSSION
The RNA-mediated pathogenesis postulated for a growing number of human diseases caused by expansions of short tandem repeats in their responsible genes is one of the major recent developments in the field of dynamic mutations (40). The molecular basis of RNA toxicity is, however, poorly understood, partly because the related questions about the normal biological roles of various simple sequence repeats in transcripts remain unresolved (10,12). In this study we performed an in-depth structural analysis of the CAG repeat region of the SCA2 transcript. This transcript was interesting not only because of its putative involvement in SCA2 pathogenesis. It was also important as the first representative of many other human transcripts that contain the most prevalent CAG triplet repeat with the most common CAA interruptions.
FIG. 5 (legend; beginning not recovered). … Fig. 1A. B, cleavages induced by S1 nuclease in the mutant transcript pure(CAG)36 at two different probe concentrations (1 and 2 units/μl). Prior to nuclease S1 digestion this transcript was subjected to a denaturation/renaturation procedure in the structure probing buffer in the presence (+20nt-ODN) or absence (−20nt-ODN) of 20nt-ODN, as described in the legend to Fig. 2A. The results of the experiment with the 20nt-ODN-hybridized mutant transcript confirm the structural conclusions drawn from similar experiments conducted with normal transcripts (see Fig. 2). C, phosphorimaging peaks representing cleavage patterns in the repeat region (from G1 to G36) of the pure(CAG)36 transcript. The cleavage sites and intensities are shown for two probes: RNase T1 (dark blue) and RNase T2 (blue). The black line represents the guanosine-specific T1 ladder. The peaks corresponding to the cuts in the terminal loops a, a′, and a″ of alternative hairpin structures are indicated.
The approach we used for the structure analysis of the repeat region of the SCA2 mRNA consisted of several steps. First, we applied the mfold program to predict the secondary structure of the entire mRNA and the structure of its repeat region, composed of the repeats themselves and specific flanking sequences of different lengths. By comparing these structures we selected an RNA module whose structure was expected to be preserved in the sequence context of the entire mRNA. This SCA2 mRNA fragment was subjected to detailed experimental structure analysis using chemical and enzymatic probes. We also assumed that the structure of the analyzed fragment corresponds well to that present within the endogenous SCA2 mRNA in its natural cellular environment. Both of these assumptions were strongly supported by our recent experimental results obtained for similar RNA structures containing CAG repeats (16,41).
The experimentally determined structures of the SCA2 transcripts were then reanalyzed with mfold to calculate their thermodynamic stabilities. This combination of RNA structure prediction and experimental structure probing has also proved highly informative in studies of other RNAs performed in our laboratory (12,13,16,20,34,41).
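The module-selection step described above amounts to asking whether the base pairs predicted for a short fragment reappear in the prediction for the whole mRNA. The sketch below is a hypothetical illustration of that comparison, not the procedure used in this study: it assumes both predicted structures have already been converted to dot-bracket notation and that `offset` is the fragment's start position within the full-length sequence. The function names and the example structures are invented for illustration.

```python
def pair_table(db: str) -> set[tuple[int, int]]:
    """Return the set of (i, j) base pairs encoded by a dot-bracket string."""
    stack: list[int] = []
    pairs: set[tuple[int, int]] = set()
    for j, ch in enumerate(db):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {j}")
            pairs.add((stack.pop(), j))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {j}")
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return pairs


def fraction_preserved(fragment_db: str, full_db: str, offset: int) -> float:
    """Fraction of the fragment's base pairs that reappear, shifted by
    `offset`, in the structure predicted for the full-length RNA."""
    frag_pairs = pair_table(fragment_db)
    if not frag_pairs:
        return 1.0  # an unpaired fragment is trivially "preserved"
    full_pairs = pair_table(full_db)
    kept = sum((i + offset, j + offset) in full_pairs for i, j in frag_pairs)
    return kept / len(frag_pairs)


# Hypothetical example: a 4-bp hairpin predicted for a fragment that starts
# at position 2 of a longer RNA.
print(fraction_preserved("((((....))))", "..((((....))))..((...))", 2))
print(fraction_preserved("((((....))))", "..(((......)))........", 2))
```

A fragment whose pairs are largely retained in the full-length prediction would be the kind of module one could reasonably probe in isolation.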
FIG. 6. Structural organization of triplet repeat regions in the SCA1 and SCA2 transcripts. A, simplified secondary structure models based on the experimental structure probing. Transcripts containing either pure CAG repeats or repeats having asymmetrically located single interruptions are shown. Black and blue lines schematically represent the repeated sequence and specific flanking sequences, respectively. The positions of G→U and G→A substitutions are shown by red ovals. B, structure-destabilizing effects predicted for the CAU and CAA interruptions. Thermodynamic parameters were calculated for 10 pure CAG repeats (the leftmost structure) and for 10 repeats containing either a G→A or a G→U substitution in the third CAG triplet. Several alternative secondary structures are shown, and their predicted stabilities are expressed as ΔG values. The separate ΔG values (green) assigned to the selected subfragments of these structures (loop regions shown in green frames) are also presented. The rules of CAG repeat folding found in the experimentally determined structures of the SCA1 and SCA2 transcripts are shown with an orange background.

The analysis of numerous normal and mutant variants of the SCA2 transcript allowed us to establish the features of these structures, define the rules of their folding, and compare their folding patterns with those of other transcripts related to triplet repeat expansion diseases. The interplay between the repeated and specific flanking sequences is an important element of this folding behavior. It may also significantly influence the correlation between the repeat length, the stability of the hairpin structure it forms, and the ability of this structure to trigger RNA pathogenesis in cells. The repeat flanks may form base pairs with each other and stabilize the repeat structure considerably, as was shown for the CAG repeats in the SCA1 (16) and SCA6 (41) transcripts (Fig. 6A). Alternatively, the flanking sequences may not interact with each other, allowing the repeats to form several slipped hairpins, as demonstrated for the DMPK (20) and dentatorubral-pallidoluysian atrophy (41) transcripts. In the SCA2 transcript the nature of this interaction is different. The flanking sequences do not interact with each other, but one of them, the 3′-flanking sequence, interacts with the 3′-terminal repeats. This kind of interaction does not force the pure CAG repeats in the SCA2 mutant transcript into a single alignment; instead, several slippery hairpin structures are observed (Fig. 6A). The way in which the CAG repeat structure is fixed in the SCA2 transcripts containing interrupted repeats also differs from that described for the SCA1 and SCA6 transcripts (16,41). Most likely, coaxial stacking between the M2 and M3 modules is responsible for this effect. According to thermodynamic data, the two G-C/G-C pairs involved in such stacking would make a significant contribution of −3.8 kcal/mol to the overall ΔG of the system. An interesting picture also emerges from the analysis of the structural effects caused by the CAA interruptions within the CAG repeats. These G→A substitutions show a clear tendency to localize in small 4-nt terminal loops of hairpins.
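The slippage of pure CAG hairpins can be illustrated with simple pairing rules alone. The sketch below is a hypothetical toy model, not mfold: it enumerates hairpin registers of a pure (CAG)n sequence with a fixed terminal-loop size, treating C-G and G-C as Watson-Crick pairs and A·A as a mismatch tolerated within the stem. It ignores flanking sequences and free energies entirely, so it cannot capture the interaction between the 3′ flank and the 3′-terminal repeats described above for SCA2.

```python
def cag_pair_ok(a: str, b: str) -> bool:
    # In CAG-repeat stems C-G and G-C are Watson-Crick pairs, while A faces A
    # as a non-canonical A·A mismatch that is tolerated within the stem.
    return (a, b) in {("C", "G"), ("G", "C"), ("A", "A")}


def hairpin_registers(n_repeats: int, loop: int = 4) -> list[tuple[int, int, int]]:
    """Enumerate hairpins of a pure (CAG)n RNA with a fixed terminal-loop size.

    Each hairpin is identified by the 0-based index of its first loop
    nucleotide; the stem is extended outward from the loop for as long as
    every opposing pair obeys cag_pair_ok. Returns tuples of
    (loop_start, stem_pairs, watson_crick_pairs).
    """
    seq = "CAG" * n_repeats
    results = []
    for start in range(1, len(seq) - loop):
        i, j = start - 1, start + loop  # closing pair flanking the loop
        pairs = wc = 0
        while i >= 0 and j < len(seq) and cag_pair_ok(seq[i], seq[j]):
            pairs += 1
            if seq[i] != "A":
                wc += 1
            i -= 1
            j += 1
        # Keep only hairpins whose loop is closed by a Watson-Crick pair.
        if pairs and seq[start - 1] != "A":
            results.append((start, pairs, wc))
    return results


regs = hairpin_registers(36)
best = max(regs, key=lambda r: r[1])
print(best)
```

For (CAG)36 with a 4-nt loop, the longest stem belongs to the centrally placed register, and shifting the loop by one repeat in either direction costs only three pairs. This is a crude picture of why several slipped hairpins, such as those with the terminal loops a, a′, and a″, can coexist.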
The CAG repeat shows enormous structural plasticity in the presence of even a single substitution of this type. Notably, none of the 12 analyzed SCA2 transcripts gives rise to a mixture of stable conformers with entirely different secondary structures, as was the case for the repeat region of the SCA1 transcript containing two interruptions (16). The internal G→A substitutions force the CAG repeats to assume a single stable conformation. This raises the question of which element of the interruption system plays the predominant role in determining the structural properties of the repeat region in the SCA2 and SCA1 transcripts. Is it the type of base substitution itself, or the number and location of the interruptions within the repeat tract? Some answers come from a detailed comparison of the 13 SCA1 (16) and 12 SCA2 (this study) variants of the interruption system investigated, which differ both in the type of base substitution and in its location along the repeated sequence. Using the available thermodynamic parameters of various structural motifs in RNA, we calculated the theoretical stabilities of the secondary structures formed by CAG repeats containing either CAA or CAU interruptions at various locations (Fig. 6B). The calculated ΔG values are the same for repeats containing either type of interruption. The G→A or G→U substitution may localize in various types of single-stranded regions (symmetric or asymmetric internal loops, terminal loops, or a three-way junction), each giving a different contribution to the total ΔG of the structure (Fig. 6B). According to the mfold thermodynamic parameters, the smallest destabilizing effect is expected for an interruption localized in a symmetric 4-nt/4-nt internal loop or within the 4-nt apical loops of the branched structures.
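The stability comparison described above rests on the additive nearest-neighbor energy model used by mfold, in which the free energy of a secondary structure is the sum of contributions from its helical stacks and its single-stranded elements. In the schematic form below (a simplification that omits dangling-end and terminal-mismatch terms), the destabilization caused by an interruption is the difference between the totals for the interrupted and the pure repeat; the optional coaxial-stacking term stands for contributions such as the −3.8 kcal/mol estimated for the M2/M3 modules.

```latex
\begin{aligned}
\Delta G_{\mathrm{total}} &= \sum_{\mathrm{stacks}} \Delta G_{\mathrm{stack}}
  + \sum_{\mathrm{loops}} \Delta G_{\mathrm{loop}}
  + \sum_{\mathrm{junctions}} \Delta G_{\mathrm{junction}}
  \;\bigl(+\,\Delta G_{\mathrm{coax}}\bigr) \\[4pt]
\Delta\Delta G_{\mathrm{int}} &= \Delta G_{\mathrm{total}}(\text{interrupted})
  - \Delta G_{\mathrm{total}}(\text{pure})
\end{aligned}
```

Because stable structures have negative ΔG, a positive ΔΔG_int indicates destabilization. The location dependence in Fig. 6B then reflects which loop or junction term absorbs the substituted nucleotide.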
The predicted single-hairpin and branched structures are ordered by decreasing stability in Fig. 6B. Our experiments show, however, that G→A and G→U substitutions exert different structural effects within the CAG repeats, and that these effects also differ from those predicted to be the most favorable. In the SCA1 transcripts a single, asymmetrically located G→U substitution localizes in the 4/1 asymmetric internal loop of a nonbranched hairpin structure (16) (upper structure on an orange background in Fig. 6B). In contrast, SCA2 repeats containing a single asymmetrically located G→A substitution preferentially form branched structures that always contain three unpaired nucleotides in the junction, a 4-nt terminal loop harboring the interruption, and either a 4- or a 7-nt terminal loop composed of pure CAG repeats (lower structure on an orange background in Fig. 6B). When two interruptions break the regularity of the CAG repeat tract, they are always separated by one CAG repeat in the SCA1 transcripts and by four CAG repeats in the SCA2 transcripts (16,25). Nevertheless, the folding rules and the preferred sites of the interruptions in the secondary structure remain the same. The effect of three interruptions in the SCA2 transcript (m-1-4-1-4-1-n) also follows these rules, but it is entirely different in the SCA1 transcript (m-1-1-1-1-1-n). In the latter case this interruption system forces the CAG repeats to form two hairpin structures composed of the pure repeats (CAG)m and (CAG)n (16).
The results of a recent survey show that CAG repeats are the most frequent triplet repeats in human mRNAs (12), and that CAA interruptions (which also code for glutamine) are the most common disruption of these repeat tracts. Moreover, CAA interruptions occur in different numbers and at various positions within the CAG repeats of human transcripts. It was therefore tempting to take a first look at the structural features of several other repeat interruption systems that have been described recently. One of them is present in the transcript of the well-known FOXP2 (forkhead box P2) gene, which is involved in speech and language. Exons 5 and 6 of this gene encode a long poly(Q) tract. Interestingly, despite its length (40Q), this tract is monomorphic in repeat number both in the general population and in patients affected by the severe speech and language disorder (42). This may be because as many as 14 CAA interruptions, which would prevent repeat slippage and expansion in DNA, interrupt the CAG repeat in the 4-1-4-2-2-2-3-5-2-2-5-1-5-1-1 configuration (numbers at even positions denote CAA interruptions). Another example is the transcript of the MAML2 (mastermind-like 2) gene, whose protein product functions as a co-activator of the Notch signaling pathway (43). Exon 2 of this gene encodes a polyglutamine stretch that is polymorphic in length in the general population (31Q-34Q). It contains an interrupted CAG repeat in the 1-1-1-2-1-13-1-5-1-1-1-1-1-1-1-2 configuration (polymorphic tract in bold type). Last but not least is the transcript of the TBP gene, encoding the TATA box-binding protein involved in SCA17 (44). Its interrupted repeat region also shows length polymorphism in the normal population (29Q-42Q), and the interruptions are retained in all SCA17 patients (44). Examples of one normal and two pathogenic TBP alleles are 3-3-8-1-1-1-19-1-1 (38Q), 3-3-6-1-1-1-31-1-1 (48Q), and 3-3-9-1-1-1-16-1-1-1-16-1-1 (55Q).
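The dash-separated configurations above are run-length encodings of the repeat tract. The sketch below decodes them, assuming that the numbers alternate between CAG and CAA runs, starting with a CAG run; this convention is consistent with the 40Q and 14 CAA interruptions given for FOXP2 and with the TBP allele sizes, but it is an assumption and may not hold for every configuration. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RepeatTract:
    total_q: int      # total glutamine codons (CAG + CAA)
    n_caa: int        # number of CAA interruptions
    longest_cag: int  # longest uninterrupted CAG run


def parse_configuration(config: str) -> RepeatTract:
    """Decode a run-length configuration such as '3-3-8-1-1-1-19-1-1'.

    Assumption: numbers alternate CAG, CAA, CAG, ..., starting with a CAG
    run, so odd positions count CAG codons and even positions CAA codons.
    """
    runs = [int(x) for x in config.split("-")]
    cag_runs = runs[0::2]  # positions 1, 3, 5, ... (CAG)
    caa_runs = runs[1::2]  # positions 2, 4, 6, ... (CAA)
    return RepeatTract(
        total_q=sum(runs),
        n_caa=sum(caa_runs),
        longest_cag=max(cag_runs),
    )


foxp2 = parse_configuration("4-1-4-2-2-2-3-5-2-2-5-1-5-1-1")
print(foxp2)
```

Under this reading the longest pure CAG run in FOXP2 is only five codons, whereas the TBP alleles listed above contain pure runs of 16 to 31 CAG codons.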
The diversity of their RNA structures promises to be intriguing, as mfold predicts further variations on the theme of structural outcomes of CAG repeat interruptions (Fig. 7). In the predicted secondary structures of the repeat regions of the FOXP2 and MAML2 transcripts, the G→A substitutions localize in numerous junctions and terminal loops, and their large number prevents the formation of long hairpin structures. In contrast, the longer runs of pure CAG repeats in the TBP transcript variants allow such hairpins to form. The stems of the hairpins formed in the two TBP mutant transcripts shown in Fig. 7 are composed of 28 and 30 CAG repeats.
Finally, with regard to the involvement of the SCA2 mutant transcript in the putative RNA-mediated pathogenesis of SCA2, we have gathered examples of normal and mutant transcript structures in Fig. 8. The RNA structures formed by the longest known normal SCA2 alleles, containing zero, one, two, or three CAA interruptions, do not reach the pathogenic threshold, which is set for SCA2 at 35 pure CAG repeats. Of these, 32 repeats are involved in forming a single hairpin structure, as shown in this study (Fig. 5). The interrupted SCA2 alleles whose total repeat length exceeds that threshold (45)(46)(47) do not fulfill the RNA structure criterion, as they form shorter branched hairpins (the 27-1-8 and 8-1-9-1-4-1-4-1-10 transcripts in Fig. 8). These branched hairpins may have a reduced ability to interact with double-stranded RNA-binding proteins, e.g. the RNA-dependent protein kinase PKR (48,49) or a specific 63-kDa CAG repeat-binding protein (50). In SCA2 patients carrying uninterrupted repeats encoding 39Q, the disease typically begins between 38 and 55 years of age (51). However, in one SCA2 family in which the 39Q tract was encoded by the genetically stable interrupted repeat 8-1-9-1-4-1-4-1-10, all five carriers developed inherited parkinsonism with motor complications, but no cerebellar abnormalities, at 64 to 88 years of age (47). A number of other allelic variants of the SCA2 gene have also been described. They contain single CAA interruptions in the 3′ parts of their CAG repeat tracts, e.g. variant 27-1-8 in Fig. 8. These variants, however, were not associated with SCA2 in most carriers (48,49).
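The contrast between total repeat length and the length of uninterrupted CAG runs can be made explicit for the alleles discussed above. The sketch below is a hypothetical simplification: it uses the longest pure CAG run as a rough stand-in for the RNA structure criterion (which in this study depends on the hairpin actually formed, not just on run length), treats the threshold of 35 pure CAG repeats as an inclusive cutoff (an assumption), and reads configurations with an assumed alternating CAG/CAA convention, starting with CAG.

```python
PURE_CAG_THRESHOLD = 35  # SCA2 threshold from the text; assumed inclusive


def parse_runs(config: str) -> list[int]:
    """Split a configuration such as '27-1-8' into run lengths."""
    return [int(x) for x in config.split("-")]


def classify(config: str) -> dict[str, object]:
    """Compare two hypothetical criteria for one allele configuration.

    Assumption: runs alternate CAG, CAA, CAG, ..., starting with CAG.
    """
    runs = parse_runs(config)
    total = sum(runs)
    longest_cag = max(runs[0::2])
    return {
        "total_q": total,
        "longest_pure_cag": longest_cag,
        "over_threshold_by_total": total >= PURE_CAG_THRESHOLD,
        "over_threshold_by_pure_run": longest_cag >= PURE_CAG_THRESHOLD,
    }


for allele in ("39", "27-1-8", "8-1-9-1-4-1-4-1-10"):
    print(allele, classify(allele))
```

All three alleles exceed the threshold by total length, but only the uninterrupted 39-repeat tract does so by pure-run length, which is the distinction this section relates to the clinical observations.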
In conclusion, the good correspondence between the RNA structures formed by the CAG repeats in SCA2 transcripts and the clinical data, together with the lack of such a correlation with the lengths of the poly(Q) tracts, may be considered to support an RNA contribution to SCA2 pathogenesis.