The Mouse Ms6-hm Hypervariable Microsatellite Forms a Hairpin and Two Unusual Tetraplexes

The mouse Ms6-hm microsatellite consists of a tandem array of the pentamer d(CAGGG)n. This microsatellite is extremely hypervariable, with a germ line mutation rate of 2.5%/gamete. The mechanism responsible for this instability is not known. The ability to form intrastrand structures is a conserved feature of many hypervariable sequences, and it has been suggested that such structures might account for instability by affecting DNA replication, repair, or recombination. Here we show that this microsatellite can also form intrastrand structures. Under physiological conditions, the Ms6-hm microsatellite forms a hairpin as well as two different unusual intrastrand tetraplexes. The hairpin forms in the absence of monovalent cation and contains G·A, G·C, and G·G base pairs in a 1:1:1 ratio. In the presence of K+, a tetraplex is formed in which the adenines are unpaired and extrahelical and the cytosines are involved in C·C pairs. In Na+, a tetraplex forms that contains C·C+ pairs, with the adenines intrahelical and hydrogen-bonded to guanines. Tetraplex formation in the presence of Na+ requires both cytosines and adenines and might reflect altered internal dimensions of this tetraplex, perhaps resulting from the ability of the C·C+ pairs to become intercalated in this sequence context. Our demonstration that hydrogen bonding between adenines and guanines can stabilize tetraplexes expands the hydrogen-bonding possibilities for these structures and suggests that the category of sequences with tetraplex-forming potential may be larger than previously appreciated.

Long arrays of tandemly repeated sequences are common in eukaryotes and, in humans, make up ~10% of the genome (1). These sequences show varying degrees of polymorphism: some arrays are stable, whereas others show frequent intergenerational changes involving the gain or loss of a large number of repeat units. The mouse microsatellite locus Ms6-hm on chromosome 4 is an example of a highly unstable tandem array (2). This microsatellite, which contains repeats of the pentamer d(CAGGG), shows a high rate of both germ line and somatic mutation (3), with a germ line mutation rate of 2.5%/gamete and repeat numbers ranging from 200 to >1000, making it one of the most unstable mouse loci described thus far (3). The mechanism responsible for this instability is not known, but it is of wider interest because, in humans, unstable tandem arrays are responsible for a group of genetic diseases known as the triplet expansion disorders (4-6).
We have previously shown that, like a number of other hypervariable tandem arrays, the Ms6-hm microsatellite forms a set of strong blocks to DNA synthesis in vitro in the presence of K+ (7). It has been suggested that instability might result from repeated strand slippage within the repeat tract during replication (8); if so, these blocks might help promote this process. The blocks do not form in the presence of cations such as Rb+, Cs+, Li+, or NH4+. They occur at the 3′-end of the G-rich strand and are template concentration-independent and strand-specific. They are abolished when the guanines in the template are replaced with 7-deazaguanine, and the guanines in this sequence are completely protected from modification by dimethyl sulfate (DMS) in the presence of K+ (7). Thus, the properties of the Ms6-hm microsatellite are consistent with intrastrand tetraplex formation.
However, unlike other tetraplex-forming sequences, which block DNA synthesis only in the presence of K+ (7, 9-14), the Ms6-hm microsatellite also blocks DNA synthesis in the presence of Na+. This suggested to us that the structure responsible for this block might differ in some fundamental way from previously described tetraplexes. We show here that the Ms6-hm microsatellite in fact forms two different tetraplexes, one in the presence of K+ and a different one that is seen only in the presence of Na+. In addition, this microsatellite forms an unusual hairpin in the absence of monovalent cations. Our observations expand the category of sequences expected to form tetraplexes and might be particularly relevant to a series of previously identified adenine-containing sequences with tetraplex-forming potential. These include the telomeres of a number of eukaryotes (15), the hypervariable minisatellite in the human insulin promoter (16), and the immunoglobulin switch regions (17).

MATERIALS AND METHODS

Template Preparation. Oligodeoxyribonucleotides containing an EcoRI site at the 5′-end, followed by the sequence of interest and then 15 bases of the 5′-end of the Escherichia coli supF gene, were synthesized using standard phosphoramidite chemistry. These oligodeoxyribonucleotides were then used together with a primer that contained an EcoRI site and 15 bases complementary to the 3′-end of the supF gene to generate a PCR fragment as described previously (9, 14). For PCR fragments containing inosine or 7-deazaadenine, dITP or 7-deaza-dATP was substituted for the appropriate dNTP in the PCR. For the construction of plasmid templates, the PCR product was digested with EcoRI, purified by gel electrophoresis, and cloned into the plasmid pMS189Δ as described previously (9, 14). Plasmids were propagated in E. coli MBM7070, isolated by alkaline lysis, and purified by CsCl gradient centrifugation according to standard procedures. PCR fragments that were used directly as templates in tetraplex assays were purified on 5% polyacrylamide gels.
Chemical Modification of the (CAGGG)8-containing Oligodeoxyribonucleotide. DMS was purchased from Sigma, and diethyl pyrocarbonate (DEPC) and chloroacetaldehyde (CAA) were purchased from Fluka (Buchs, Switzerland). These reagents were used for chemical modification of the oligodeoxyribonucleotide without further purification. End-labeled oligodeoxyribonucleotides (1-5 ng) were heated in Tris/EDTA buffer for 1 min at 90°C and then diluted into 40 µl of reaction buffer. Reactions carried out at pH 5.5 used a buffer consisting of 50 mM acetate salt of the indicated cation (pH 5.5) and 2 mM MgCl2. Reactions at other pH values used a buffer consisting of 50 mM Tris adjusted to the appropriate pH with HCl and either no added cations, 2 mM MgCl2 alone, or 2 mM MgCl2 plus 50 mM of the specified monovalent cation as either the glutamate or the chloride salt (the anion has no effect on tetraplex formation (14)). Reactions were heated for 30 s at 95°C and then incubated for 5-60 min at 55°C. One microliter of DMS, 4 µl of DEPC, or 4 µl of CAA was added to each tube, and the reactions were allowed to proceed at room temperature for 1 min (DMS) or 20 min (DEPC and CAA). Reactions were terminated by repeated precipitation with butanol. CAA-treated templates were treated with DMS, precipitated with butanol, and then heated for 30 min at 90°C in 1 M piperidine. DMS-treated samples were either cleaved directly with piperidine or, before piperidine cleavage, treated with sodium acetate (pH 5.2) at 90°C for 4 min or with 60% hydrazine for 30 s at room temperature. DEPC-treated samples were treated with piperidine without further modification.
The reactions were precipitated twice with 1 ml of butanol; dried under vacuum; dissolved in 20 µl of 42.5% (v/v) formamide, 5 mM EDTA (pH 9.5), 5 mM NaOH, 0.05% xylene cyanol, and 0.05% bromphenol blue; denatured for 2 min at 90°C; and subjected to electrophoresis on a 20% sequencing gel at 3500 V until the bromphenol blue was 5 cm from the bottom of the gel. Gels were covered with plastic wrap and exposed to x-ray film overnight at −70°C. Gels were scanned and analyzed with NIH Image (version 1.58).

RESULTS
Intrastrand tetraplexes form when four G-rich regions on a single strand interact to form a series of tetrads. Successive tetrads form a hollow stem that is bounded by loops formed by the bases between the G-rich regions (Fig. 1, L1, L2, and L3). K+ ions are generally the most effective monovalent cations at stabilizing these structures. This is thought to be due either to an optimal fit of K+ inside the tetraplex stem, such that the cation can coordinate the O6 atoms of guanines in adjacent tetrads (18), or to its relative ease of dehydration (19). We have recently demonstrated that the ability to block DNA synthesis in the presence of K+ is a diagnostic feature of intrastrand tetraplexes (11), presumably because K+-stabilized tetraplexes form a barrier to polymerase progression. We have not found ions other than K+ to produce a block to DNA synthesis in any of the other tetraplex-forming sequences that we have tested (7, 9-13), so our observation that Na+ causes a block to DNA synthesis on templates containing the mouse Ms6-hm repeats (7) was somewhat surprising. However, since the properties of the arrest sites are all consistent with tetraplex formation, our working hypothesis was that the blocks to DNA synthesis in this region result from the formation of one or more intrastrand tetraplexes with unusual properties.
Since the major sites of polymerase arrest occurred at the 3′-end of the repeat array (summarized graphically in Fig. 2; also see lanes marked WT in Fig. 8) (7), we conclude that the major structure formed in the presence of either cation involves all eight repeats. Lesser polymerase pausing is seen immediately 3′ of the fifth, sixth, and seventh repeats from the 5′-end in the presence of K+, but not Na+ (7). These products represent DNA synthesis arrest at tetraplexes involving fewer repeats; consistent with this idea, a template containing only five CAGGG repeats produces only one weak K+-dependent chain termination product and no Na+-dependent ones (data not shown). In addition to producing more polymerase stops, K+ also causes more pausing at the 3′-end of the repeat tract than Na+ does (see lanes marked WT in Fig. 8) (7). Two explanations are possible: either the CAGGG repeats form a single type of tetraplex that is more stable in K+, or two types of tetraplex form, one in the presence of K+ and a different, somewhat less stable one in the presence of Na+.
We used DMS, CAA, and DEPC to analyze the structures formed by eight d(CAGGG) repeats as well as a variety of base-substituted variants of this sequence. DMS methylates guanine N7, which is not involved in Watson-Crick base pairing (20). Protection of guanines from this reagent is thus diagnostic of non-Watson-Crick base interactions. DMS also modifies adenines and cytosines, albeit to a lesser extent, with the rate of these reactions varying with the particular configuration of the base. Chloroacetaldehyde reacts with N1 and N6 of adenine and N3 and N4 of cytosine (21), positions involved in hydrogen bonding in Watson-Crick A·T and G·C pairs, respectively. The first step in the reaction of CAA with DNA is believed to be alkylation of the endocyclic nitrogen (21), and N6-substituted adenosines and N4-substituted cytidines are reactive with CAA (22). It is thus plausible that cytosines hydrogen-bonded at N4 but not N3, and adenines in which the N6 but not the N1 position is involved in a hydrogen bond, will also be reactive with CAA. DEPC carbethoxylates adenines and, to a lesser extent, guanines (23). This modification occurs at adenine N6 and N7 and guanine N7 and is seen most readily when these bases are unpaired or in an unusual conformation. In parallel with the chemical probing experiments, we examined the ability of these oligodeoxyribonucleotides to cause Na+- or K+-dependent DNA synthesis arrest when used as templates for DNA polymerase under similar reaction conditions.
The Mouse Ms6-hm Repeat Forms a Mixture of Different Structures. An oligodeoxyribonucleotide containing eight copies of the mouse Ms6-hm repeat showed some degree of DMS protection of certain guanines both in the absence of monovalent cation (or in the presence of Li+) and in the presence of K+ or Na+ (Fig. 3). Since unpaired guanines and guanines in Watson-Crick G·C pairs are not protected from DMS modification, this indicates that secondary structures involving non-Watson-Crick base interactions form under all of these conditions. The patterns of DMS protection differ in Li+, K+, and Na+ (Fig. 3, lanes marked L, K, and N), indicating that the mouse Ms6-hm repeat forms at least three different secondary structures.
The Monovalent Cation-independent Structure. In the absence of monovalent cation or in the presence of Li+, the third guanine in the first repeat and the first and third guanines in the remaining repeats are partially protected (Fig. 3). The third guanine in each repeat shows ~50% of the DMS reactivity of the second guanine, whereas the first guanine is also protected, albeit to a lesser extent. Replacing the C at position 41 with a T greatly reduces the amount of DMS protection (data not shown). Substituting A7, A17, A27, and A37 with N6-methyladenine, an adenine analog that carries a methyl group at N6, or with nebularine, a purine base that lacks the N6 amino group, abolishes the partial DMS protection of the guanines (Fig. 4), indicating that hydrogen bonding by some or all of these adenines is involved in the structure responsible for this protection. These data are consistent with the formation of a hairpin in which the first guanine in each repeat forms a G·A pair, the second guanine forms a Watson-Crick G·C pair, and the third guanine is hydrogen-bonded to the third guanine of a repeat on the opposite side of the hairpin (Fig. 5). In a G·G pair, each of these guanines acts as an N7 donor 50% of the time, which accounts for the ~50% reduction in DMS reactivity of these bases. A base pairing scheme in which the guanines in the G·A pairs are obligatory N7 donors should protect them completely from DMS modification. Therefore, the partial protection of the first guanine in each repeat could result either from an oscillation between a G·A pair that involves guanine N7 and one that does not or from a mixture of the two types of base pairs.
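The ~50% argument above can be made explicit with a simple occupancy model. The sketch below is our illustration, not an analysis performed in this study: it assumes that a guanine's DMS reactivity scales linearly with the time-averaged fraction of its N7 that is left free, and the function names are hypothetical. Reactivity is expressed relative to an unprotected guanine in the same lane, such as the Watson-Crick-paired second guanine of each repeat.

```python
def expected_relative_reactivity(fraction_n7_engaged: float) -> float:
    """Predicted DMS reactivity relative to a fully exposed guanine.

    Simplifying assumption (not from the paper): the methylation rate is
    proportional to the time-averaged fraction of N7 that is not engaged.
    """
    if not 0.0 <= fraction_n7_engaged <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    return 1.0 - fraction_n7_engaged


def relative_reactivity(band_intensity: float, reference_intensity: float) -> float:
    """Ratio of a band's scanned intensity to that of a reference guanine in the same lane."""
    if reference_intensity <= 0:
        raise ValueError("reference intensity must be positive")
    return band_intensity / reference_intensity


if __name__ == "__main__":
    # G.G pair: each guanine engages its N7 half of the time
    print(expected_relative_reactivity(0.5))  # 0.5
    # G4 tetrad: guanines are obligatory N7 donors, so fully protected
    print(expected_relative_reactivity(1.0))  # 0.0
    # Watson-Crick G.C pair: N7 is free, so unprotected
    print(expected_relative_reactivity(0.0))  # 1.0
```

Under this model, the weaker, partial protection of the first guanine in each repeat would correspond to an intermediate engaged fraction, as expected for a mixture of G·A pairing schemes that do and do not use guanine N7.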
A G-syn·AH+-anti base pair is the only G·A pair involving the normal tautomeric forms of these bases in which guanine N7 participates in a hydrogen bond. However, this scheme requires the adenine to be protonated, which is somewhat surprising given that we observed a similar pattern of DMS protection over the pH range 5.5-10. Possible G·A pairings that do not involve guanine N7 include a G-anti·A-syn pair, a G-anti·A-anti pair, or a sheared G·A pair.
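Positions such as A7, G10, C21, and A37 follow the numbering convention of Fig. 2: bases are numbered from the 5′-end of the repeat tract, and repeat units are numbered 1-8. The minimal Python sketch below, with hypothetical helper names, reconstructs only the 40-nt (CAGGG)8 tract; the flanking sequence, including the C at position 41, is not given in this section and is therefore not modeled.

```python
REPEAT = "CAGGG"
N_REPEATS = 8
TRACT = REPEAT * N_REPEATS  # 40 nt; flanking bases (e.g. C41) are not modeled


def base_at(position: int) -> str:
    """Return the base at a 1-based position within the repeat tract."""
    if not 1 <= position <= len(TRACT):
        raise ValueError(f"position {position} lies outside the repeat tract")
    return TRACT[position - 1]


def repeat_unit(position: int) -> int:
    """Return the repeat unit (1-8, counted from the 5'-end) containing a position."""
    base_at(position)  # validates the range
    return (position - 1) // len(REPEAT) + 1


if __name__ == "__main__":
    # The substituted adenines fall in the even-numbered repeats
    for pos in (7, 17, 27, 37):
        print(pos, base_at(pos), repeat_unit(pos))
```

Running it prints `7 A 2`, `17 A 4`, `27 A 6`, and `37 A 8`, confirming that the analog-substituted adenines are the adenines of repeats 2, 4, 6, and 8.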
The K+-dependent Structure. In the presence of K+, all guanines within the microsatellite were protected from DMS modification (Fig. 3), consistent with tetraplex formation in which the guanines are obligatory N7 donors. The fact that no guanines in the repeat are reactive with DMS suggests that all the guanines are located within the stem of the tetraplex rather than in the loops. Although no guanines within the repeat are DMS-reactive, there was a single DMS-hyperreactive guanine in the 3′-flanking region (Fig. 3, arrow) that was not seen in the presence of either Li+ or Na+. Hyperreactivity of this base would be consistent with its N7 atom being particularly exposed, as it might be at the junction between the tetraplex and a single-stranded region; such a hyperreactive flanking guanine appears to be a common feature of K+-dependent tetraplexes.² This pattern of DMS modification was independent of pH over the range 5.5-10.

FIG. 2. The repeat is shown in boldface, and the bases within the repeat are numbered starting from the 5′-end of the top strand. Each repeat unit is indicated by a bracket and numbered 1-8 starting from the 5′-end. The same numbering convention is used in Figs. 3, 4, and 6-8. The black arrows indicate the K+-dependent blocks to DNA synthesis, and the gray arrows indicate the Na+-dependent ones. The length of the arrows indicates the relative amount of premature chain termination in each case. This figure summarizes our previously published data (7).

FIG. 3. DMS modification of an oligodeoxyribonucleotide containing the (CAGGG)8 tract. An oligodeoxyribonucleotide containing the Ms6-hm repeat was modified with DMS as described under "Materials and Methods." The lanes marked L, K, and N indicate the products of reactions carried out in the presence of Li+, K+, and Na+, respectively. Each repeat unit is indicated by a bracket and numbered 1-8 starting from the 5′-end. The sequence of a representative repeat unit is shown alongside unit 2. The DMS-reactive bases are shown in boldface, with the identity of the base and its position in the repeat tract indicated alongside. The black arrow indicates the DMS-hyperreactive guanine that is seen in the presence of K+ and is diagnostic of formation of the K+ tetraplex.
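Because no guanine in the repeat is DMS-reactive in K+, every guanine is presumably part of the tetraplex stem. Simple bookkeeping (our arithmetic, assuming that all eight repeats participate and that the stem guanines are arranged only in G4 tetrads) then fixes the size of the stem:

```latex
N_G = 8\ \text{repeats} \times 3\ \frac{\text{G}}{\text{repeat}} = 24,
\qquad
N_{\text{tetrads}} = \frac{N_G}{4} = 6,
\qquad
\frac{N_G}{4\ \text{stem strands}} = 6\ \text{G per strand}.
```

This count alone does not specify strand polarity or which bases form the loops.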
The same pattern of DMS protection, including the hyperreactive guanine in the flanking region, was seen when A7, A17, A27, and A37 were substituted with N6-methyladenine or nebularine (Fig. 4). Since nebularine cannot participate in hydrogen bonds, this observation would be consistent with the adenines being unpaired in the K+ tetraplex. The fact that N6-methyladenine substitution does not abolish the tetraplex supports this hypothesis. Moreover, since this analog contains a relatively bulky methyl group that would be difficult to accommodate within a tetraplex (24), this result suggests that the adenines might in fact point away from the helix axis in an extrahelical position. All the adenines in the repeat were more extensively modified by DEPC than adenines outside the repeat or the same adenines in the presence of Li+ or in the absence of monovalent cation (Fig. 6). This hyperreactivity suggests that the adenines in the repeat are not simply unpaired but may be particularly exposed to the solvent, consistent with the hypothesis that they are extrahelical. No DEPC-reactive guanines were seen within the repeat, consistent with the DMS data. Replacement of A7 and A27 with the isosteric adenine analog 7-deazaadenine has little effect on either the K+-dependent arrest of DNA synthesis or the pattern of K+-dependent DMS protection (data not shown). However, replacement of all adenines with 7-deazaadenine eliminates the K+-dependent arrests. This observation is not necessarily inconsistent with the interpretation that the adenines are unpaired, since the presence of a carbon rather than a nitrogen at position 7 of 7-deazaadenine not only abolishes the ability of this analog to act as an N7 donor but also produces an altered π-electron shell that can affect other interactions that may stabilize the tetraplex (25).

FIG. 4. DMS modification of N6-methyladenine- or nebularine-substituted oligonucleotides. Oligodeoxyribonucleotides in which A7, A17, A27, and A37 of the Ms6-hm repeat were replaced with N6-methyladenine (m6dA) or nebularine were modified with DMS as described under "Materials and Methods." The lanes marked L, K, and N indicate the products of reactions carried out in the presence of Li+, K+, and Na+, respectively. Each repeat unit is indicated by a bracket and numbered 1-8 starting from the 5′-end. The sequence of a representative repeat unit is shown alongside unit 2. The DMS-reactive bases in the unsubstituted oligonucleotide in the presence of Na+ are shown in boldface, with the identity of the base and its position in the repeat tract indicated alongside. The positions of the N6-methyladenine and nebularine residues are indicated with asterisks and filled circles, respectively. The black arrows indicate the DMS-hyperreactive guanine that is seen in the presence of K+ and is diagnostic of formation of the K+ tetraplex.
The adenines in the repeat were uniformly reactive with CAA in the presence of K+ at both high (Fig. 7) and low pH. Given the DEPC hyperreactivity of the adenines, the CAA data are consistent with these bases being unpaired. At high pH, the cytosines are CAA-reactive (Fig. 7), but at low pH they are not (data not shown), suggesting that the cytosines form pH-sensitive hydrogen bonds at low pH. Since the adenines are unpaired at all pH values and since protonation does not favor G·C pairs or G·C·G·C tetrads, this suggests that the K+ tetraplex contains C·C+ pairs at low pH. The reason for the CAA reactivity of the cytosines at high pH is less clear-cut. One interpretation is that the cytosines are unpaired under these conditions. However, replacing C6 and C26 with thymines reduces the extent of polymerase arrest in the presence of K+ (Fig. 8, center panel), consistent with a role for hydrogen bonding involving these cytosines even at higher pH. Moreover, replacing C6, C16, C26, and C36 with thymines allows the tract to block DNA polymerase as effectively as the unsubstituted repeat (Fig. 8, right panel). These results suggest that hydrogen bonds form between C6 and C16 and between C26 and C36. The C6-to-T6 and C26-to-T26 substitutions would disrupt both C·C pairs without providing significant compensating hydrogen bonding, whereas substitution of all four stem cytosines would allow hydrogen bonds to form between thymines and thus restore the stability of the structure. In neutral C·C pairs, the N4 amino group of one cytosine is hydrogen-bonded to N3 of the other, but since this base pair involves only a single hydrogen bond, each cytosine probably acts as the N3 acceptor 50% of the time, with the lifetime of each conformation being shorter than the time required for reaction with CAA (26).
The CAA reactivity of cytosines at pH values above neutral could thus be reconciled with the effect of the thymine substitutions if the cytosines are assumed to be involved in neutral C·C pairs in which each cytosine oscillates between a conformation in which it acts as an N3 donor and one in which it acts as an N3 acceptor. The cytosines would be vulnerable to CAA modification either when acting as N3 acceptors or during the transition from N3 acceptor to N3 donor. All the available data therefore support the model for the K+ tetraplex shown in Fig. 9, in which all the guanines are involved in G4 tetrads; the adenines are extrahelical; and C6, C16, C26, and C36 form neutral C·C pairs at non-acidic pH values and hemiprotonated C·C+ pairs at low pH.

FIG. 6. DEPC modification of the (CAGGG)8 tract. An oligodeoxyribonucleotide containing the Ms6-hm repeat was modified with DEPC as described under "Materials and Methods." The lanes marked L, K, and N indicate the products of reactions carried out in the presence of Li+, K+, and Na+, respectively. Each repeat unit is indicated by a bracket and numbered 1-8 starting from the 5′-end. The sequence of a representative repeat unit is shown alongside unit 2. The reactive adenines seen in K+ and the reactive guanines seen in Na+ are shown in boldface, with the identity of the base and its position in the repeat tract indicated alongside.

FIG. 7. CAA modification of the (CAGGG)8 tract. An oligodeoxyribonucleotide containing the Ms6-hm repeat was modified with CAA as described under "Materials and Methods." The lanes marked L, K, and N indicate the products of reactions carried out in the presence of Li+, K+, and Na+, respectively. Each repeat unit is indicated by a bracket and numbered 1-8 starting from the 5′-end. The sequence of a representative repeat unit is shown alongside unit 2. The CAA-reactive bases seen in the presence of K+ are shown in boldface, with the identity of the base and its position in the repeat tract indicated alongside.
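The C-to-T variants compared in Fig. 8 can be written out explicitly. This sketch (hypothetical names; 40-nt repeat tract only, flanks omitted) builds the substituted tracts and checks that each targeted position originally carried a cytosine; the comments restate the outcomes reported in the text rather than predicting them.

```python
REPEAT = "CAGGG"
TRACT = REPEAT * 8  # bases 1-40, numbered from the 5'-end


def substitute(tract: str, positions, new_base: str = "T", expected_base: str = "C") -> str:
    """Return a copy of `tract` with the 1-based `positions` changed to `new_base`."""
    bases = list(tract)
    for pos in positions:
        if not 1 <= pos <= len(bases):
            raise ValueError(f"position {pos} lies outside the tract")
        if bases[pos - 1] != expected_base:
            raise ValueError(f"position {pos} is {bases[pos - 1]}, not {expected_base}")
        bases[pos - 1] = new_base
    return "".join(bases)


# Breaks both proposed C.C pairs (C6.C16 and C26.C36) without compensation;
# reported to reduce K+-dependent polymerase arrest.
C6_C26_TO_T = substitute(TRACT, (6, 26))

# Permits T.T hydrogen bonds in place of both C.C pairs; reported to arrest
# polymerase as well as wild type in K+ but to eliminate the Na+-dependent arrest.
STEM_C_TO_T = substitute(TRACT, (6, 16, 26, 36))

if __name__ == "__main__":
    for name, seq in [("WT", TRACT), ("C6,C26->T", C6_C26_TO_T), ("C6,C16,C26,C36->T", STEM_C_TO_T)]:
        print(f"{name:>18}  {seq}")
```

The `expected_base` check guards against off-by-one errors in the numbering, which matter here because the text's argument depends on exactly which cytosines are paired.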
The Na+-dependent Structure. In the presence of Na+, most of the guanines are strongly protected from DMS modification, which is diagnostic of tetraplexes (Fig. 3). However, in contrast to what is seen in K+, two guanines (G10 and G20) and a cytosine (C21) were reactive with DMS at neutral pH and above, suggesting that these bases are located in the loops of the Na+ tetraplex. Consistent with this, substituting G20 with thymine has no effect on the amount of Na+-dependent DNA polymerase arrest (data not shown). The location of these bases in the loops rules out a tetraplex with a conformation similar to that of the K+ tetraplex. It similarly rules out a tetraplex in which the adenines are aligned with one another so as to contain a mixture of G4 tetrads and G·C pairs or G·C·G·C tetrads. Rather, these data are consistent with a structure with the overall conformation shown in Fig. 9 (right). The reactivity of G10, G20, and C21 was reduced at low pH. This reduction, without changes elsewhere in the tetraplex, suggests that although the overall conformation of the structure is relatively pH-independent, there may be some pH-dependent conformational changes in the loops.
Substitution of A7, A17, A27, and A37 with either N6-methyladenine or nebularine completely abolished the DMS protection of stem guanines (Fig. 4). The effect of N6-methyladenine would be consistent either with a role for the N6 amino group in hydrogen bonding or with steric hindrance by the bulky methyl group; the nebularine data support the first hypothesis. All the adenines in the repeat were DEPC-reactive, with A32 being hyperreactive (Fig. 6), and the same pattern of reactivity was seen at all pH values. This reactivity indicates that adenine N7 is not involved in hydrogen bonding. Replacement of A7 and A27 with 7-deazaadenine abolishes the Na+-dependent arrests and the Na+-dependent DMS protection pattern (data not shown). However, since all the adenines are reactive with DEPC, this is probably due to a negative effect of the altered π-electron shell of 7-deazaadenine on stacking rather than to its inability to act as an N7 donor (25, 27). The same two guanines that were DMS-reactive (G10 and G20) were also DEPC-reactive at high pH (Fig. 6), supporting the assignment of these bases to the loops of the tetraplex. The DEPC reactivity of these guanines is reduced at low pH, consistent with pH-sensitive conformational changes in these loops.
In the presence of Na+ at non-acidic pH values, C21, A22, C31, and A32 are CAA-reactive (Fig. 7), consistent with the location of these bases in the loops as shown in Fig. 9. In contrast, A2, C6, A7, C16, A17, C26, A27, C36, and A37 are protected from CAA modification. The protection of these adenines is consistent with our N6-methyladenine and nebularine data and supports the contention that the stem adenines are involved in hydrogen bonds. Moreover, it suggests that the hydrogen bonding involves adenine N1 as well as the N6 proton, since the endocyclic nitrogen is thought to be the first site of CAA attack. The formation of G-syn·AH+-anti pairs could account for both the DMS protection of the guanines and the chemical reactivity of the adenines. Since the DMS protection of most of the guanines remains constant over the pH range 5.5-10, our interpretation, if correct, would suggest that the adenines in this sequence can be protonated even at relatively high pH values. This would be consistent with our model for the hairpin formed by this sequence in the presence of Li+. The formation of A-anti·G-anti·A-anti·G-anti tetrads, in which the N7 of each guanine acts as an acceptor for the amino proton of the second adenine in each tetrad, would also be consistent with our data. However, this tetrad is thought to be improbable on stereochemical grounds (28). The protection of stem cytosines from modification by CAA (Fig. 7) demonstrates that cytosines are also hydrogen-bonded in the Na+ tetraplex, probably in C·C+ pairs. Replacement of C6, C16, C26, and C36 with thymines eliminates the Na+-dependent DNA synthesis arrest site completely (Fig. 8), as does replacement of A7, A17, A27, and A37 (data not shown). The fact that replacing stem adenines or cytosines with thymines abolishes the Na+ tetraplex, but not the K+ tetraplex, indicates that both cytosines and adenines are required for the Na+ structure. Model sequences with the potential to form A·G pairs or A·G·A·G tetrads show no evidence of tetraplex formation in the presence of Na+ at either neutral or acidic pH (data not shown), suggesting that hydrogen bonds between guanines and adenines alone do not account for the Na+-dependent tetraplex. Likewise, model sequences with the potential to form C·C or C·C+ pairs do not form tetraplexes in the presence of Na+ at either high or low pH,² suggesting that such pairs are also not sufficient for tetraplex formation in Na+. One explanation for the Na+ effect could be that the hydrogen-bonding arrangement between adenines and guanines favors intercalation of the hemiprotonated C·C+ pairs. The closer proximity of metal ion-binding sites that results from this intercalation (30) may allow the smaller Na+ cation to stabilize this type of tetraplex. If this hypothesis is correct, the Na+ structure would be unusual not only because it is stabilized by hydrogen bonds involving adenines, but also because it contains elements of both a typical G4 tetrad-based tetraplex and an i-motif structure, in which the constituent C·C+ pairs are intercalated (31).

FIG. 8. Effect of different C-to-T substitutions on the stability of the two tetraplexes. Plasmid clones that contained a C-to-T substitution at C6 and C26 or at C6, C16, C26, and C36 were constructed as described under "Materials and Methods." These templates were used together with the unsubstituted template in the tetraplex assay. The assay was conducted in the absence of added monovalent cation (0), in the presence of 50 mM K+ (KCl), or in the presence of 50 mM Na+ (NaCl). The lane markers T, C, G, and A indicate the bases on the template strand. The bracket on the left demarcates the repeat array. WT, wild type.

DISCUSSION

We have described a set of folded structures that form on the purine-rich strand of the tandem repeat found at the mouse Ms6-hm locus. These structures include a hairpin containing a mixture of G·G, G·C, and G·A pairs and two tetraplexes: one formed in the presence of K+, containing G4 tetrads, extrahelical adenines, and neutral C·C pairs, and one formed in the presence of Na+, containing G4 tetrads, hemiprotonated C·C+ pairs, and either G-syn·AH+-anti pairs or G-anti·A-anti·G-anti·A-anti tetrads.
These tetraplexes are seen at physiologically reasonable ionic strengths and form very rapidly in solution. The Na+/K+ effect resembles a switch in which the identity of the major cation determines the ultimate configuration of the tetraplex. Whether this "switch" has any biological consequences remains to be determined. It differs from the switch reported by Sen and Gilbert (32), which involved an oscillation between intermolecular tetraplexes formed in the presence of Na+ and intramolecular tetraplexes formed preferentially in the presence of K+, but it likewise underscores how fluctuations in the cationic environment of DNA might change the behavior of a particular region of DNA. The observation that a different hydrogen-bonding interaction is seen on lowering the pH raises the possibility of an additional, pH-sensitive switch. The observation that adenines can participate in tetrads is relevant not only to the structure formed by the Ms6-hm locus but also to other regions of eukaryotic genomes, since the ability of adenines to contribute to tetraplex stability increases the number of sequences with the potential to form stable tetraplexes. In particular, the potential contribution of adenines might be germane to the structures of other adenine-containing tetraplex-forming regions, e.g. the (T2AG3)n repeats that make up the telomeres of all vertebrates (15), the Dictyostelium telomere ((AG6GAGAG6AG6)n) (33), the polymorphic repeats in the immunoglobulin switch regions ((G4AGCTG4)n) (17), the hypervariable sequence at the D4S43 locus ((G4AG5AAGA)n) (34), and the hypervariable sequence in the human insulin promoter ((ACAGGGGTGTGGGG)n) (16). Since there is a growing body of evidence, both direct (35-37) and indirect (29, 38-41), for the formation of tetraplexes in vivo, the ability of additional sequences to form tetraplexes might have important biological consequences.
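For comparison, candidate tetraplex-forming sequences are often screened computationally with a conventional G-run pattern. The sketch below uses one common form of that pattern (four runs of three or more guanines separated by loops of 1-7 nucleotides); the pattern and the name `has_canonical_g4` are assumptions on our part, not a criterion used in this study. By construction such a scan considers only guanine runs, so it cannot credit adenine-containing tetrads or C·C pairs of the kind described here and may therefore undercount sequences with tetraplex-forming potential.

```python
import re

# One common putative-G4 pattern (an assumption, not taken from this study):
# four runs of >=3 G separated by loops of 1-7 nucleotides (loops may contain G).
G4_PATTERN = re.compile(r"(?:G{3,}[ACGT]{1,7}){3}G{3,}")


def has_canonical_g4(seq: str) -> bool:
    """True if the strand contains at least one match to the conventional G4 pattern."""
    return G4_PATTERN.search(seq.upper()) is not None


if __name__ == "__main__":
    examples = {
        "Ms6-hm (CAGGG)8": "CAGGG" * 8,
        "vertebrate telomere (T2AG3)4": "TTAGGG" * 4,
        "insulin promoter (ACAGGGGTGTGGGG)2": "ACAGGGGTGTGGGG" * 2,
        "two CAGGG repeats": "CAGGG" * 2,
    }
    for name, seq in examples.items():
        print(f"{name}: {has_canonical_g4(seq)}")
```

Only the last example fails the scan, because two CAGGG repeats supply just two guanine runs.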