Structural Diversity of Triplet Repeat RNAs

Tandem repeats of various trinucleotide motifs are present in the human transcriptome, but the functions of these regular sequences, which likely depend on the structures they form, are still poorly understood. To gain new insight into the structural and functional properties of triplet repeats in RNA, we have performed a biochemical structural analysis of the complete set of triplet repeat transcripts, each composed of a single sequence repeated 17 times. We show that these transcripts fall into four structural classes. The repeated CAA, UUG, AAG, CUU, CCU, CCA, and UAA motifs did not form any higher order structure under any analyzed conditions. The CAU, CUA, UUA, AUG, and UAG repeats are ordered according to their increasing tendency to form semistable hairpins. The repeated CGA, CGU, and all CNG motifs form fairly stable hairpins, whereas AGG and UGG repeats fold into stable G-quadruplexes. The triplet repeats that formed the most stable structures were characterized further by biophysical methods. UV-monitored structure melting revealed that CGG and CCG repeats form, respectively, the most and least stable hairpins of all CNG repeats. Circular dichroism spectra showed that the AGG and UGG repeat quadruplexes are formed by parallel RNA strands. Furthermore, we demonstrated that the different susceptibility of various triplet repeat transcripts to serum nucleases can be explained by the sequence and structural features of the tested RNAs. The results of this study provide a comprehensive structural foundation for the functional analysis of triplet repeats in transcripts.

Trinucleotide repeats (TNRs) belong to a family of microsatellite sequences also known as short tandem repeats or simple sequence repeats. TNRs are present in both prokaryotic and eukaryotic genomes and, similar to other microsatellites, show high mutation rates resulting in frequent length variability. Polymorphic TNRs are better tolerated than dinucleotide and tetranucleotide repeats in translated sequences because their length variation does not change the open reading frame.
A recent survey of the human genome reference sequence showed that it harbors more than 32,000 tracts of uninterrupted TNR sequences composed of six or more repeated units. In annotated human exons, which account for less than 3% of the genomic sequence, there are as many as 1,030 TNR tracts (61). Some AT-rich TNR types, such as CTT, AAC, and AAT, are particularly underrepresented in exons, whereas GC-rich repeats (CGG, CAG, and CCG) are highly overrepresented, implying that these sequences have a functional significance. The AT-rich and GC-rich TNRs tend to localize preferentially in the 3′-UTR and 5′-UTR, respectively (1). About 60% of exonic TNRs are localized in the open reading frame. These coding repeats are primarily translated to poly-Gln, poly-Ala, poly-Glu, and poly-Leu tracts. However, the amino acid coding property of TNRs is not the only feature for which these sequences are selected in exons. Other properties of TNR sequences that manifest themselves at the level of DNA, RNA, or both may also contribute to the functional importance of these sequences and their prevalence in exons. These properties may include the ability of TNRs to form higher order structures in single-stranded DNA and transcripts. TNR structures and their unstructured sequences may play important regulatory roles in numerous cellular processes, such as DNA replication and repair, and at various steps of gene expression, ranging from transcription to mRNA decay (1-4).
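The tract definition used in that survey (six or more uninterrupted copies of one trinucleotide unit) can be illustrated with a short search routine. This is only a minimal sketch, not the method used in the cited survey; the function name `find_tnr_tracts`, its return format, and the example sequence are hypothetical.

```python
import re

def find_tnr_tracts(seq, min_repeats=6):
    """Return (start, unit, copies) for each uninterrupted trinucleotide tract.

    min_repeats mirrors the six-unit threshold quoted in the text; the
    function name and return format are illustrative assumptions.
    """
    # One captured triplet followed by at least (min_repeats - 1) exact copies.
    pattern = re.compile(r"([ACGU]{3})\1{%d,}" % (min_repeats - 1))
    rna = seq.upper().replace("T", "U")   # accept DNA or RNA input
    tracts = []
    for match in pattern.finditer(rna):
        unit = match.group(1)
        if len(set(unit)) == 1:           # AAA..., CCC... are mononucleotide runs
            continue
        tracts.append((match.start(), unit, len(match.group(0)) // 3))
    return tracts

example = "AUGC" + "CAG" * 7 + "UUAC"     # hypothetical sequence
print(find_tnr_tracts(example))
```

The reported unit depends on where the tract begins in the sequence, so a CAG tract preceded by a G may be reported as GCA; grouping cyclic permutations of a unit would remove that ambiguity.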
Little is known about the structures and physiological functions of normal TNR tracts in genes and transcripts. Existing knowledge has been gathered mainly on a subset of TNRs known to undergo pathogenic expansions that lead to human neurological diseases (1-4). This fragmentary knowledge does not allow for either a deeper insight into the possible roles played by TNRs in transcripts or a more comprehensive characterization of structural, functional, and evolutionary relationships within the TNR family.
In this study, we performed a survey of RNA structures formed by all types of TNRs, using biochemical and biophysical methods. To obtain more information about the detailed features of these sequences, we carried out a structural analysis in various solutions and under different temperature conditions. We classified TNRs into four groups according to their ability to form higher order structures. TNRs showing the highest structure-forming potential were further subjected to more detailed structure characterization using biophysical methods such as UV-monitored structure melting and circular dichroism (CD). We also investigated the biochemical properties of various TNR RNA sequences by analyzing their susceptibility to endogenous nucleases present in serum and HeLa cell extracts. Taken together, our results provide new information regarding the structures and properties of triplet repeat RNAs, which may shed new light on the functions of these abundant sequences in cellular transcripts.

EXPERIMENTAL PROCEDURES
RNA Preparation-The RNAs used in this study were prepared by in vitro transcription of synthetic DNA templates with T7 RNA polymerase as described earlier (5). For transcripts forming tetraplex structure as well as transcripts synthesized from tetraplex-forming DNA templates, the in vitro transcription was significantly less efficient. Therefore, some RNAs, especially those which were used for biophysical analysis, were chemically synthesized (Metabion). All transcripts were purified by electrophoresis in a denaturing 10% polyacrylamide gel.
Electrophoresis under Non-denaturing Conditions-RNA structures formed by each transcript were analyzed by electrophoresis of radiolabeled RNA samples in 10% non-denaturing polyacrylamide gels (acrylamide/bisacrylamide, 29/1) buffered with 45 mM Tris borate (0.5× Tris borate) at a fixed temperature of 4 or 25°C. In some experiments, either 1 mM or 10 mM MgCl2, as well as 40 mM NaCl, was present in the electrophoretic buffer. Prior to gel electrophoresis, ~1 pmol of the 32P-labeled transcripts was subjected to a denaturation and renaturation procedure in a buffer containing either 1 mM or 10 mM MgCl2, 40 mM NaCl, and 10 mM Tris-HCl (pH 7.2). The sample was mixed with an equal volume of 7% sucrose with dyes before gel loading. Electrophoresis was performed at 15 watts.
Thermodynamic Measurements-Oligoribonucleotides were melted in standard buffers containing either 100 mM NaCl, 20 mM sodium cacodylate, 0.5 mM Na2EDTA (pH 7.0) or 100 mM KCl, 20 mM sodium cacodylate, 0.5 mM Na2EDTA (pH 7.0). Absorbance versus temperature melting curves were measured simultaneously at 260 and 295 nm with a heating rate of 1°C/min from 0 to 90°C on a Beckman DU 640 spectrometer with a water-cooled Peltier thermoprogrammer. Melting curves were analyzed, and thermodynamic parameters were calculated using the MeltWin 3.5 program (6). To calculate thermodynamic parameters for melting recorded at 295 nm for (AGG)17 and (UGG)17, the spectra were inverted.
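The reason for inverting the 295 nm curves can be seen in a simple simulation. The sketch below is not the MeltWin procedure; it assumes a two-state unimolecular transition and estimates Tm as the temperature of steepest absorbance change. The enthalpy, Tm, and absorbance values are hypothetical, chosen only to produce a hyperchromic curve at 260 nm and a hypochromic curve at 295 nm.

```python
import math

R = 1.987e-3  # gas constant, kcal/(mol*K)

def fraction_unfolded(temp_c, dh, tm_c):
    """Two-state unimolecular unfolding; dh is the unfolding enthalpy (kcal/mol)."""
    t = temp_c + 273.15
    ds = dh / (tm_c + 273.15)       # unfolding entropy, since dG = 0 at Tm
    dg = dh - t * ds                # free energy of unfolding at temperature t
    return 1.0 / (1.0 + math.exp(dg / (R * t)))

def estimate_tm(temps, absorbance, inverted=False):
    """Tm estimate: temperature at which the (optionally inverted) curve is steepest."""
    sign = -1.0 if inverted else 1.0
    best_t, best_slope = None, float("-inf")
    for i in range(1, len(temps) - 1):
        slope = sign * (absorbance[i + 1] - absorbance[i - 1]) / (temps[i + 1] - temps[i - 1])
        if slope > best_slope:
            best_t, best_slope = temps[i], slope
    return best_t

# Hypothetical curves: 0 to 90 C in 0.5 C steps, dH = 60 kcal/mol, Tm = 60 C.
temps = [0.5 * i for i in range(181)]
a260 = [0.50 + 0.06 * fraction_unfolded(t, 60.0, 60.0) for t in temps]  # rises on melting
a295 = [0.10 - 0.03 * fraction_unfolded(t, 60.0, 60.0) for t in temps]  # falls on melting

print(estimate_tm(temps, a260))
print(estimate_tm(temps, a295, inverted=True))
```

Without the inversion, the steepest positive slope of a falling curve carries no information about the transition, which is why the same analysis can be reused only after the sign is flipped.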
UV and CD Spectra-Experiments were performed on Beckman DU 640 and Jasco J-810 spectrophotometers for UV and CD spectra, respectively, in the same buffers as for UV melting measurements. For UV experiments, the RNA samples were either heated at 90°C or cooled at 4°C for 3 min, and duplicate spectra were recorded. The UV spectra were corrected for buffer and adjusted at 330 nm. The CD spectra were recorded in quadruplicate in both buffers at room temperature.
Stability of RNA in Serum and Cell Extract-Approximately 20 pmol of non-labeled and 45,000 cpm of 32P-labeled transcript were added to RPMI 1640 medium containing 5% thermally inactivated fetal bovine serum or HeLa whole-cell extract (1 μg/μl protein) (7). Samples were incubated at 37°C for different time intervals (from 1 to 135 min) and frozen on a dry ice bath after the stop solution was added. Each sample was analyzed using 8% denaturing polyacrylamide gel electrophoresis. Quantitative analysis was performed using FLA-5100 (Fujifilm) and Multi Gauge 3.0 software.

RESULTS
A total of 20 transcripts, each composed of one of the 20 different trinucleotide motifs reiterated 17 times, were assayed to determine what structures they form in solution at different temperatures and salt conditions. We utilized a structure-probing method involving limited digestion of transcripts with nucleases and lead ions at low (1 mM) and high (10 mM) magnesium ion concentrations in the presence of sodium and potassium ions at temperatures of 4, 25, and 37°C. The single strand-specific probes included ribonucleases T1 (G-specific), Cl3 (C-specific), and T2 (A-preferring) and the S1 and mung bean nucleases (known to be sequence-nonspecific). We also used ribonuclease V1, which shows a preference for double-stranded RNA (8), and lead ions, which cleave single-stranded RNA and relaxed double-stranded RNA structures (9). The structural features of the transcripts were also analyzed by (i) electrophoresis in non-denaturing polyacrylamide gels, (ii) UV absorbance-based melting measurements, and (iii) UV and CD spectra.
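The count of 20 motif types follows from simple combinatorics: of the 64 possible triplets, the four mononucleotide triplets are excluded, and the remaining 60 fall into groups of three cyclic permutations that describe the same repeat tract. The sketch below checks this and confirms that the 20 motif names used in this paper each represent a different group. The helper names are illustrative.

```python
from itertools import product

def rotations(motif):
    """All cyclic permutations of a motif, e.g. CAG -> {CAG, AGC, GCA}."""
    return frozenset(motif[i:] + motif[:i] for i in range(len(motif)))

def repeat_classes(alphabet="ACGU"):
    """Distinct triplet repeat tracts, treating cyclic permutations as one repeat."""
    classes = set()
    for triplet in map("".join, product(alphabet, repeat=3)):
        if len(set(triplet)) == 1:      # AAA, CCC, GGG, UUU are not triplet repeats
            continue
        classes.add(rotations(triplet))
    return classes

# Motif names as used in this paper, grouped by the structural class reported.
PAPER_MOTIFS = [
    "CAA", "UUG", "AAG", "CUU", "CCU", "CCA", "UAA",   # class I
    "CAU", "CUA", "UUA", "AUG", "UAG",                 # class II
    "CGA", "CGU", "CAG", "CUG", "CCG", "CGG",          # class III
    "AGG", "UGG",                                      # class IV
]

classes = repeat_classes()
print(len(classes))                                      # 20
print({rotations(m) for m in PAPER_MOTIFS} == classes)   # True
```

Reverse complements are deliberately not merged here, because a single-stranded transcript of CAG repeats is a different molecule from one of CUG repeats.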

RNAs Composed of Triplet Repeats Fall into Four Structural Classes According to Structure Probing and Migration in Native Gel-Four different types of cleavage patterns could be distinguished for the 20 transcripts (Table 1). Examples of these patterns are shown in Fig. 1A for (CAA)17, (AUG)17, (CGA)17, and (UGG)17. In (CAA)17, all probes generated cuts of similar intensity along the entire sequence. This class of unstructured transcripts (class I) also includes RNAs composed of UUG, AAG, CUU, CCU, CCA, and UAA motifs. Five transcripts composed of AUG, UAG, UUA, CAU, and CUA motifs fall into the second class. At 25°C, cleavages at central repeats were 2-10 times stronger than at the side repeats. V1 cleavages occurred throughout the entire transcript, but cuts at the 5′ end were stronger. When structure probing was performed at 37°C, cleavage intensities were similar along the entire RNA sequence. These five transcripts appeared to form a semistable hairpin structure (class II). The cleavage pattern characteristic of the third class of transcripts is represented by (CGA)17 in Fig. 1A; the (CGU)17 and all (CNG)17 transcripts also belong to this class. Single strand-specific nucleases showed 10-100 times stronger activity at central repeats than at repeats located at the 5′ and 3′ ends, indicative of hairpin structures. Ribonuclease V1 cleaved only the 5′ end of the hairpin stem formed by these six types of repeats. A similar cleavage pattern was also obtained at 37°C. The fourth class includes transcripts composed of UGG and AGG repeats. The cleavage patterns obtained at 25°C resemble those observed for transcripts forming semistable hairpins. However, the stability of their structures is as high as that of transcripts forming stable hairpins (class III) despite their inability to form W-C base pairs. At 60°C, enhanced nuclease cuts were still clearly visible in the central repeats of these transcripts (supplemental Fig. 1). The intensities of the nuclease cuts differed between the two ends of the highly reactive region, being relatively stronger at its 3′ end (Fig. 1A).
All 20 transcripts were also analyzed in non-denaturing gels at different temperatures and salt conditions. The electrophoretic mobility of transcripts representing the first three structural classes in 0.5× Tris borate buffer at 25°C is shown in Fig. 1B. Transcripts forming stable hairpin structures migrated the fastest, whereas all transcripts belonging to classes I and II showed slower electrophoretic mobility.

FIGURE 1. A, ... and (UGG)17 were denatured and renatured in buffer B and treated with two concentrations of nuclease S1, V1, mung bean (MB), RNase T1, T2, Cl3, and lead ions (Pb). Lane Ci, incubation control (no probe); lane F, formamide ladder; lane T, guanine-specific ladder; lane Cl, cytosine-specific ladder. Electrophoresis was performed on 15% polyacrylamide gels under denaturing conditions. Positions of selected G-residues (or C-residues) of repeated sequences are indicated. B, non-denaturing 10% polyacrylamide gel electrophoresis of eight transcripts representing three structural classes in structure-probing buffer containing 1 mM magnesium ion at 25°C. An arrowhead indicates the position of the wells. C, electrophoretic mobilities of different triplet repeat transcripts relative to the migration rate of the (CUG)17 transcript in non-denaturing gels in two different electrophoresis buffers (black bars and gray bars represent the results obtained in the absence or presence of magnesium ions, respectively) and temperature conditions (at 25 or 4°C). D, non-denaturing gel electrophoresis of (AGG)17 transcripts forming a tetraplex structure and three control transcripts; (CAG)17 represents hairpin-forming transcripts, (AAG)17 represents single-stranded transcripts, and (AUG)17 represents semistable hairpin-forming transcripts. Experiments were performed at 25°C.

TABLE 1
Class I: (CAA)17, (UUG)17, (AAG)17, (CCA)17, (CUU)17, (CCU)17, (UAA)17
Class II: (UAG)17, (AUG)17, (UUA)17, (CUA)17, (CAU)17
Class III: (CAG)17, (CUG)17, (CCG)17, (CGG)17, (CGA)17, (CGU)17
Class IV: (AGG)17, (UGG)17

When analyzed at 4°C in the presence of 10 mM MgCl2 and 40 mM NaCl, the migration rates of all transcripts belonging to class II, including (UAG)17, (AUG)17, (CAU)17, (CUA)17, and (UUA)17, were similar to those of stable hairpin-forming transcripts (Fig. 1C). This means that at a low temperature, class II transcripts form hairpin structures. (UGG)17 and (AGG)17 exhibited electrophoretic migration different from all other transcripts as shown for (UGG)17 in Fig. 1D. They migrated faster than single strands and more slowly than hairpins and gave rise to diffuse bands. These two G-rich transcripts appear to form a tetraplex structure (more data addressing this issue are described below).
UAN Repeats Form More Stable Structures than AUN Repeats-To determine the influence of base arrangement within the stem on the stability of hairpins formed by AU-rich sequences, we compared two pairs of transcripts, (UAG)17 and (AUG)17, as well as (CUA)17 and (CAU)17 (mismatched bases are underlined). Quantitative analysis of cleavages generated by nucleases revealed that (UAN)17 transcripts formed more stable structures than (AUN)17 transcripts (supplemental Figs. 2 and 3). For example, T1 cuts at 25°C were only two times stronger in the loop of (AUG)17 when compared with the stem. The same difference, but in favor of stem cleavages, was observed for V1 nuclease (supplemental Fig. 2). Conversely, the relative cleavage intensities in (UAG)17 were nearly identical at 4 and 25°C. In the case of (UAG)17, cleavages at central repeats (loop region) were enhanced at 37°C, whereas this effect was not observed for (AUG)17 (Table 2).
Two transcripts, (UUA)17 and (UAA)17, could potentially form hairpins containing two alternative base-pairing systems in their stems, 5′-UA/5′-UA and 5′-AU/5′-AU. Our results indicated that only (UUA)17 folded into hairpin structures (Fig. 2), whereas (UAA)17 did not form any higher order structure, even at 4°C and in 10 mM MgCl2. Thus, putative A/A mismatches in the context of AU base pairs were prohibited. As revealed by structure-probing results, other mismatches were ordered (G/G > U/U > C/C) according to their structure-stabilizing effect (Table 2). The number of cuts occurring at central repeats that correspond to a hairpin terminal loop of five transcripts belonging to this structural group was higher than predicted for the 4-nucleotide loops (Fig. 2 and supplemental Figs. 2 and 3). This result was interpreted to mean that several slipped conformers of the AU-rich repeat hairpin co-exist (Fig. 2B), as shown previously for all CNG repeat hairpins (5).
Magnesium Ion Concentration Affects the Distribution of Slipped Conformers-The structures formed by four transcripts, (CAG)17, (CUG)17, (CGA)17, and (CGU)17, were probed at 37°C at low and high MgCl2 concentrations. Only quantitative differences in relative cleavage intensities within transcripts were evident (Fig. 3). The number of cleavages generated by nucleases occurring at central repeats was higher than predicted for the single 4-nucleotide loop. This means that several slipped conformers of the repeat hairpin co-exist for these transcripts. Two dominant conformers of (CGA)17 and (CGU)17 (conformers I and II) are shown in Fig. 3. Conformer I predominated in (CGA)17, whereas conformer II predominated in (CGU)17. Differences in the distribution of slipped conformers similar to those observed for (CGA)17 and (CGU)17 at low and high MgCl2 concentrations (Fig. 3) were also observed for (CAG)17 and (CUG)17, as well as for all AU-rich transcripts forming semistable hairpins.
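The base-pairing alternatives discussed above can be enumerated mechanically. In a repeat hairpin the stem pairs the tract antiparallel with itself, so within one repeat unit position i pairs with position (c - i) mod 3 for one of three possible registers c. The following sketch, with hypothetical helper names, lists the W-C pairs available in each register. It counts only canonical pairs and says nothing about stability, which is exactly why (UAA)17 scores like (UUA)17 here even though the experiments show it does not fold.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

def stem_registers(motif):
    """Pairs formed within one repeat unit for each antiparallel register c."""
    n = len(motif)
    return [[(motif[i], motif[(c - i) % n]) for i in range(n)] for c in range(n)]

def wc_pairs_per_register(motif):
    """Number of Watson-Crick pairs per repeat unit in each register."""
    return [sum(pair in WATSON_CRICK for pair in reg) for reg in stem_registers(motif)]

def max_wc_pairs(motif):
    """Best achievable number of Watson-Crick pairs per repeat unit."""
    return max(wc_pairs_per_register(motif))

for motif in ("CAG", "CGA", "UUA", "UAA", "CAA", "AGG", "UGG"):
    print(motif, wc_pairs_per_register(motif))
```

Under this count, CAG and CGA each have a single register with two W-C pairs per unit, UUA and UAA each have two such registers (the two alternative pairing systems), and CAA, AGG, and UGG have none.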
The experimentally determined thermodynamic stabilities of CNG hairpins are much lower than those predicted by RNA folding algorithms (Mfold and RNAstructure), which use the nearest neighbor thermodynamic parameters. For example, the experimentally determined free energy values are 3-9 times lower. Moreover, the stability order for the CNG repeat hairpins is different for the predicted and experimentally analyzed structures. The structure prediction shows that the CAG repeat hairpin has the lowest stability, whereas the CCG repeat hairpin is the least stable according to structure melting experiments. This issue was widely discussed by Broda et al. (16).

TABLE 2 Stability of hairpin structure of AU-rich triplet repeat-containing transcripts
Symbols +, +/-, and - indicate that cleavages generated in the terminal loop were more efficient than those generated in the stem region at different temperatures and buffer conditions. + indicates double efficiency, +/- indicates 1.5-2× efficiency, and - denotes no change.

Conditions Transcripts
(UAG)17

...17, showed a clear difference in thermodynamic stability (ΔG°37) as calculated from UV-melting analysis in 100 mM NaCl and 100 mM KCl. UV melting at 260 nm in sodium and potassium salts exhibited a typical melting transition with hyperchromism of 10-15%. However, (UGG)17 and (AGG)17 melting recorded at 295 nm also showed hypochromism (25-30%). This feature was reported earlier to be characteristic of a G-quadruplex structure (17,18). On the contrary, (CCG)20 and (CUG)20 did not show any transition at 295 nm. In the case of (AGG)17 and (UGG)17, the free energies (ΔG°37) of the melting transitions recorded at both 260 and 295 nm in 100 mM NaCl were similar, about −4 kcal/mol (Table 3). Measurements performed in 100 mM KCl showed strongly enhanced thermodynamic stability of both transcripts. The free energies calculated from melting transitions recorded at either 260 nm or 295 nm were about −10 kcal/mol for both (AGG)17 and (UGG)17. In the presence of potassium ions, the Tm also increased significantly, by about 25°C (Table 3). Higher thermodynamic stabilities in the presence of potassium as well as descending absorbance during UV melting recorded at 295 nm are typical for G-quadruplex structures (17).
The UV spectra of (UGG)17, (AGG)17, (CCG)20, (CUG)20, and (CGG)20 in the presence of sodium or potassium salts were recorded at 5 and 90°C. Differential spectra were similar for (UGG)17 and (AGG)17 under both conditions (supplemental Fig. 4). Negative peaks were observed at 297 and 265 nm, whereas positive peaks were seen at 274 and 235 nm. This characteristic of UV differential spectra is also typical of G-quadruplexes (19). On the contrary, differential UV spectra for (CCG)20, (CUG)20, and (CGG)20 in the presence of sodium and potassium did not show a negative peak near 295 nm.
Finally, CD spectra were recorded at room temperature for (UGG)17 and (AGG)17, as well as for (CCG)20, (CUG)20, and (CGG)20 RNAs, in a buffer containing either 100 mM NaCl or 100 mM KCl. The CD spectra were measured in the same buffers used earlier (18,19) to demonstrate G-quartet structure formation. A difference depending on ionic conditions was observed for (UGG)17 and (AGG)17 (Fig. 4). Both RNAs showed large positive peaks in the 260-265 nm range and small peaks at 300-305 nm. Significant differences in the shape and ellipticity of their CD spectra were also observed. On the other hand, differences in CD spectra recorded in the presence of NaCl and KCl were negligible for (CCG)20 and (CUG)20 hairpins. Positive peaks at 265-270 nm typical for A-RNA structures were observed for these transcripts. In the case of (CGG)20, the CD spectra were more complex, and some differences in ellipticity and peak wavelengths were observed for buffers containing KCl and NaCl. Besides the intense positive peak at 265 nm, an intense negative peak occurred at around 290 nm under both ionic conditions. Because G-quadruplexes may adopt a wide variety of structural variants, including tetra-, bi-, and unimolecular structures, we analyzed these RNAs at several different concentrations ranging from 50 nM to 100 μM. No changes in melting temperature and free energy (ΔG°37) were observed (data not shown). This suggests that transcripts composed of AGG and UGG repeats form unimolecular tetraplex structures.
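To put the two free energy values in perspective, ΔG°37 can be converted into a folding equilibrium constant through ΔG° = −RT ln K. The sketch below uses the approximate values quoted above (−4 kcal/mol in NaCl and −10 kcal/mol in KCl) and assumes a simple two-state folding equilibrium; the function name is illustrative.

```python
import math

R = 1.987e-3      # gas constant, kcal/(mol*K)
T37 = 310.15      # 37 C in kelvin

def folding_constant(dg37):
    """Folding equilibrium constant K from dG(37 C) in kcal/mol, using dG = -RT ln K."""
    return math.exp(-dg37 / (R * T37))

k_na = folding_constant(-4.0)     # approximate value reported in 100 mM NaCl
k_k = folding_constant(-10.0)     # approximate value reported in 100 mM KCl

fraction_folded_na = k_na / (1.0 + k_na)
print(f"K (NaCl) ~ {k_na:.3g}, fraction folded ~ {fraction_folded_na:.4f}")
print(f"K (KCl) / K (NaCl) ~ {k_k / k_na:.3g}")
```

Under the two-state assumption, a 6 kcal/mol gain in stability corresponds to a folding constant that is larger by roughly four orders of magnitude, even though the structure is already predominantly folded at 37°C in sodium.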
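The concentration argument rests on a standard thermodynamic relation: for a unimolecular transition Tm = ΔH°/ΔS° and does not depend on strand concentration, whereas for an association of strands Tm shifts with concentration. The sketch below contrasts the unimolecular case with the simplest multistrand case, a non-self-complementary two-strand complex, for which Tm = ΔH°/(ΔS° + R ln(Ct/4)). The ΔH° and ΔS° values are hypothetical, and a four-stranded complex would follow a different, steeper dependence.

```python
import math

R = 1.987e-3  # gas constant, kcal/(mol*K)

def tm_unimolecular(dh, ds):
    """Tm (C) of a unimolecular transition: independent of strand concentration."""
    return dh / ds - 273.15

def tm_bimolecular(dh, ds, ct):
    """Tm (C) of a non-self-complementary two-strand complex at total strand conc. ct (M)."""
    return dh / (ds + R * math.log(ct / 4.0)) - 273.15

# Hypothetical folding parameters (kcal/mol and kcal/(mol*K)).
DH, DS = -60.0, -0.170

for ct in (50e-9, 1e-6, 100e-6):   # the concentration range examined in the text
    print(f"{ct:.0e} M  unimolecular {tm_unimolecular(DH, DS):.1f} C  "
          f"bimolecular {tm_bimolecular(DH, DS, ct):.1f} C")
```

With these illustrative numbers the two-strand Tm moves by more than 20°C across the range, so an unchanged Tm between 50 nM and 100 μM is what would be expected for an intramolecular structure.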

Sequences and Structures of Triplet Repeat RNAs Explain Their Different Susceptibility to Serum Nucleases-To determine how triplet repeat RNAs differ in their resistance to serum nucleases, we incubated 5′-labeled transcripts representing all of the RNA structural classes in fetal bovine serum and monitored substrate disappearance and formation of products over time (Fig. 5). We found that transcripts that did not form secondary structures (class I) and those transcripts that formed semistable hairpins (class II) had half-lives (t1/2) shorter than 1 min. Similarly, rapid degradation was observed for the CAG repeat transcript. (CAG)20 was capable of forming a stable hairpin, but it contains reiterations of fragile CpA phosphodiester bonds, making this transcript highly unstable in serum. Two other transcripts that also formed stable hairpins, (CUG)20 and (CGG)20 (class III), had half-lives of 7.3 (±2.1) and 12.5 (±1.9) min, respectively. The serum stability of G-tetraplex-forming (UGG)17 and (AGG)17 transcripts (class IV) was much higher. The t1/2 for (UGG)17 was 52.4 (±3.9) min, but (AGG)17 transcripts remained intact even after 135 min in serum. A similar relationship between structure and stability was observed for the same transcripts incubated in HeLa whole-cell extract (data not shown). These results are in line with earlier observations suggesting that the formation of higher order structure strongly protects RNAs from attack by nucleases and that pyrimidine-specific pancreatic-type ribonuclease activities are most prevalent in animal cells and sera (20-23).
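A half-life can be extracted from such a time course in several ways, and the fitting procedure is not described here. One common approach, shown below purely as a sketch, assumes first-order decay of the intact transcript and fits ln(fraction intact) = −kt by least squares through the origin, giving t1/2 = ln 2 / k. The time points and the synthetic data are hypothetical.

```python
import math

def half_life(times_min, fraction_intact):
    """Half-life (min) from a first-order fit of ln(fraction intact) = -k * t.

    The line is forced through the origin because the fraction intact is 1 at t = 0.
    """
    logs = [math.log(f) for f in fraction_intact]
    k = -sum(t * y for t, y in zip(times_min, logs)) / sum(t * t for t in times_min)
    return math.log(2) / k

# Synthetic, noise-free time course generated with a 52.4 min half-life.
times = [1, 5, 15, 45, 135]
fractions = [math.exp(-math.log(2) / 52.4 * t) for t in times]

print(round(half_life(times, fractions), 1))   # 52.4
```

For a transcript that remains essentially intact throughout the incubation, the fitted rate constant approaches zero and only a lower bound on the half-life can be stated.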

DISCUSSION
Ribonucleic acids employ a diverse repertoire of structural motifs to compose their native folded structures. The questions we asked in this study included the following. How does the variety of triplet repeat sequences contribute to RNA structural diversity? What are the thermodynamic properties of triplet repeats forming stable structures? What could be the putative structure-dependent functions of triplet repeats in transcripts? Finally, how do the various triplet repeats differ in their susceptibility to cellular ribonucleases? A survey of triplet repeats, which are present in about 4% of human mRNAs, revealed that some of these repeats are highly prevalent, whereas others are either rare or absent (61). CNG repeats, associated with human neurological diseases, are among the most frequent repeat types and have attracted the highest interest to date. Only AGG repeats occur at a similarly high frequency in human transcripts. These five repeated motifs constitute 77% of the 1,030 total repeat tracts composed of six or more uninterrupted repeats found in 878 human mRNAs. At the other end of the frequency scale are the CGU, CUA, UAG, CAU, CUU, CGA, and UGG repeats (61).
To gain insight into the structural features of all types of triplet repeat RNAs, we used native gel migration tests, biochemical structure probing, and biophysical characterization. The data obtained from this structure survey indicate that triplet repeat RNAs exist in four basic structural forms: single-stranded, semistable hairpins, stable hairpins, and very stable G-quadruplexes. CUU, CAA, AAG, UAA, UUG, CCA, and CCU repeats, which cannot form any higher order structure under any conditions tested, occur in 2, 12, 18, 18, 34, 45, and 50 human mRNAs, respectively (61). Unless these sequences provide suitable sites for some single-stranded RNA-binding proteins, purine-rich repeats may form rigid spacers, and pyrimidine-rich repeats may form flexible adapters between neighboring RNA structure domains in mRNAs. For example, (GAA)6 was shown by SELEX experiments to be a binding site for the SC35 and SF2/ASF (alternative splicing factor) splicing proteins (24). Pyrimidine-rich TNR sequences that are present close to the 3′ splice site may form binding sites for the polypyrimidine tract-binding protein and influence splicing efficiency. Moreover, in some genes, such TNRs are polymorphic in length in the population and could play the role of modulating factors in gene expression.
All CNG repeats of sufficient length form imperfect stem and loop structures in transcripts in which the stems are composed of reiterated C-G and G-C base pairs and N/N mismatches (5,7,16,25-30). However, the CGG repeats that form the most stable hairpins (Tm = 75-76°C) (Table 3) differ considerably from CUG, CCG, and CAG repeats in their tendency to form slipped-hairpin conformers (5). In a more recent NMR study, G/G mismatches were shown to exist in a dynamic equilibrium between the anti- and syn-conformations required to form two symmetric hydrogen bonds between the N1 and carbonyl oxygen atoms of two guanines (31). The CGG repeats were also reported to form G-quadruplexes in both RNA (11-13) and DNA (14,15). In the current study, we analyzed the structural features of CGG and other types of CNG repeats under identical conditions using biochemical and biophysical methods. Data from structure-probing and gel migration tests indicated that the properties of CGG repeats do not deviate from those of CUG, CCG, and CAG repeats. The Tm of the CGG repeat transcript is not sensitive to potassium ions, which also argues against G-quartet structure formation. Only the CD spectra of CGG repeats do not closely resemble those of CUG and CCG repeats, suggesting a slightly different hairpin architecture. No evidence for tetraplex structure formation by CGG repeats was found by the authors of the NMR study in which RNA was analyzed at millimolar concentrations (31).
By comparison, in the crystal structure of the CUG repeats (32), the U/U mismatches form stretched wobble interactions with only one hydrogen bond between the carbonyl O4 atom of one U-base and the N3 imino group of the second uracil residue. CUG repeats are predicted to assume one of two different conformations depending on whether the uracil residue is inclined toward the minor groove of the CUG repeat RNA-A helix (32). The authors of the first reported crystal structure of CUG repeats (33) suggested that CCG repeats may form a similar structure, as pyrimidine-pyrimidine mismatches could be well accommodated within the RNA-A helix. They anticipated stronger structure distortions by purine-purine mismatches, which have higher hydrogen bond-forming potential. Whatever the structural differences at the mismatched bases are, our study shows that they do not result in large stability differences between the CUG, CCG, and CAG repeat hairpins.
Two repeated motifs, UGG and AGG, form G-quartet structures known also as G-quadruplexes or G-tetraplexes, which are characteristic of both DNA and RNA sequences rich in guanine. These four-stranded structures are formed by Hoogsteen hydrogen bonding between a tetrad of guanine bases and are strongly stabilized by centrally located potassium ions. The DNA (34-36) and RNA (37,38) structures formed by AGG repeats have been investigated by NMR and CD. The d(GGA)4 intramolecular parallel quadruplex is composed of G:G:G:G tetrads and G(:A):G(:A):G(:A):G heptads (34,36). The same structure was formed by d(GGA)8 (35). Similarly, but not identically, the r(GGA)4 intramolecular parallel quadruplex is formed by one G:G:G:G tetrad plane and one G(:A):G:G(:A):G hexad plane (38). The structures of d(GGA)4, d(GGA)8, and r(GGA)4 correlated well with CD spectra characteristics typical for parallel quadruplexes. The (AGG)17 and (UGG)17 transcripts investigated in the current study are much longer, but they seem to form similar types of quadruplex structures as their shorter counterparts based on the similarity of their CD spectra.
The results of the biophysical and biochemical structural studies on the CNG and AGG repeat transcripts obtained by us and others are gathered in supplemental Table 1 and summarized in supplemental Fig. 5. The fact that different structures were often reported for the same repeat types by authors that used different methods reflects the atypical character of these sequences where the structures are highly sensitive to the length of the repeated sequence, RNA concentration, temperature, and ionic conditions. Nevertheless, the x-ray structures of short CUG repeat duplexes (32,33) may well represent the stem portions of hairpin structures formed by longer CUG repeats (5,16,26,29,39). Similarly, the NMR structure of short CGG repeats, which were forced to form hairpins (31), may well correspond to the stem structure of hairpins formed spontaneously by long CGG repeat tracts (5,25).
Expansions of some triplet repeats in genes, mostly CNG repeats, cause a number of inherited neurological and neuromuscular diseases. Several lines of evidence argue that the pathogenesis of some of these diseases is RNA-mediated. Expanded CUG repeats play a crucial role in the pathogenesis of myotonic dystrophy DM1, and long CGG repeats are associated with the fragile X-associated tremor ataxia syndrome (40,41). RNA gain-of-function mutations have also been implicated in a subgroup of these diseases (so-called poly(Q)-induced diseases), which are caused by CAG repeat expansions (42). The CUG, CAG, and CGG repeats form similar hairpin structures. Each of these repeat hairpins is known to be a site of interaction with some double-stranded RNA-binding proteins. For example, the muscleblind-like protein (MBNL1) was found in nuclear inclusions formed by expanded CUG and CAG (43) as well as CGG repeats (44). The affinity of MBNL1 is almost the same for the CUG and CAG repeats (39). It is also known that MBNL1 interacts mainly with the GC element of the repeated sequence (45). The structural similarity of all CNG repeat hairpins was also observed in the in vitro assay with ribonuclease Dicer (46). On the other hand, these hairpin structures are different targets for the binding of small molecule ligands (47,48). For example, Hoechst 33258 binds to CUG repeats with nanomolar affinity, and KD values are 3-4 times higher for CCG and CAG repeats. This difference could be explained by the different recognition of unique repeating electrostatic patches in the minor groove of the RNA helix.
Sequences capable of forming G-quartet structures are highly prevalent in the human genome (49). The G-rich triplet repeats are, however, rather atypical sequences of this kind. They contain only two consecutive guanine residues instead of the three or four usually present in sequences forming tetraplexes; however, the trinucleotide motif is reiterated many times, which increases the G-quartet structure stability. Analysis of the RNA aptamer confirmed that four GGA repeats are sufficient to form a tetraplex conformation. Such a structure is a binding site for bovine prion protein PrP, and adenosine residues are important for this interaction (38). Of the tetraplex-forming repeats analyzed here, AGG is highly prevalent in human mRNAs (122 occurrences), whereas UGG repeats are present only in six transcripts (61). The physiological importance of RNA G-tetraplexes, which are more stable and are more easily formed due to the single-stranded character of RNA than their corresponding DNA structures, was postulated in a number of reports (49-52). Such structures have been implicated in specific protein binding and modulation of gene expression by regulating splicing, polyadenylation, translation, and transcript turnover (18,53-58). For example, short tetraplex-forming sequences from the 5′-UTR of the NRAS and ZIC-1 genes decreased the translation of reporter genes several times when inserted into their 5′-UTRs (18,57).
The CNG and NGG repeats that are present in numerous genes are often polymorphic in length in a normal population (59). They may form binding sites for specific proteins. Different lengths and localizations of these sequences may differently regulate gene expression. For example, long CGG repeats located in the 5′-UTR of the FMR1 gene down-regulate fragile X mental retardation protein (FMRP) synthesis (60). Both CNG and AGG repeats predominantly occur in the 5′-UTR regions of hundreds of human transcripts.
In conclusion, the present study combines the structural surveys of triplet repeat RNAs by biochemical methods with detailed thermodynamic characterization by biophysical methods. Novel structural information gathered for repeats that are not known to be involved in the pathogenesis of human diseases, as well as the deeper structural characterization of those previously implicated in human diseases, may help to understand the biological roles of these sequences. Because numerous triplet repeat motifs of various types are present in the transcriptomes of many organisms, our results may also be considered the first step toward RNA structural genomic characterization of the "microsatellite repeatome."