The Consensus Sequence for Self-catalyzed Site-specific G Residue Depurination in DNA*

Background: Certain stem-loop-forming sequences self-catalyze site-specific DNA depurination of G residues.
Results: The catalytic intermediate is highly sequence-specific.
Conclusion: Like other catalytic mechanisms inherent in macromolecules, self-catalyzed DNA depurination involves critical sequence and structural elements.
Significance: Because the resultant apurinic sites are subject to highly error-prone repair, knowledge of the sequence requirements enables location of potential spontaneous mutagenic sites within genes and other genomic elements.

The sequence variation tolerated within the stem-loop-forming genomic consensus sequence for self-catalyzed site-specific depurination of G residues is explored. The variation in self-depurination kinetics with sequence changes in the loop residues and stem base pairs, as well as with pH, provides insights into the self-catalytic mechanism. The observations suggest that self-catalyzed depurination of the 5′ G residue of the loop consensus sequence 5′-G(T/A)GG-3′ probably involves formation of some intraloop hydrogen-bonded base pair with the 3′-terminal G residue; although the electronic structure of both these G residues is retained, their 2-amino substituents are not critical for that interaction. The strong dependence of the self-depurination kinetics on stem stability suggests that the lifetime of some strained form of the loop is controlled by the integrity of the stem. In addition to the effects of length and base pair sequence on stem stability, there is a base pair requirement at the base of the loop: self-depurination is suppressed by 5′-C·G-3′, 5′-A·T-3′, or a mismatch but is most favored by 5′-T·A-3′ and less so by 5′-G·C-3′. The occurrence in T and G of a similarly located carbonyl capable of hydrogen-bonding to the water molecule required for glycosyl bond hydrolysis may explain this sequence requirement. In toto, the more complete definition of the consensus sequence provided by this investigation enables a more accurate estimation of the number of potentially self-depurinating sites in the human genome and of their distribution among different genes.

Previously (1), we described a 14-residue DNA consensus sequence that, upon forming a stem-loop structure, self-catalyzes depurination of a specific G residue under physiological conditions in vitro some 4-5 orders of magnitude faster than the background "spontaneous" depurination rate of DNA in vivo (2). The self-depurinating single-strand element was originally identified within the coding strand of the human β-globin gene, with the depurinating G residue immediately upstream of the sickle cell anemia mutation site in codon 6. The catalytic intermediate was shown to be a stem-loop with a 5′-G(T/A)GG-3′ loop and T·A as the first base pair of the stem, followed by any sequence of several more base pairs sufficient to stabilize the structure. The 5′-G loop residue was shown to convert to an apurinic site susceptible to cleavage either by apurinic endonuclease or by spontaneous β-elimination at mildly alkaline pH. Such sites had previously been found to lead to substitution mutations or short deletions as a consequence of error-prone repair (3). In a preliminary search of the human genome, based upon the 14-residue self-depurinating sequence identified in the β-globin gene, at least 50,000 sites capable of forming such self-depurinating stem-loops were found, and when several variants corresponding to the consensus sequence were tested, they all showed the self-depurinating activity (1).
Given the possible biomedical and evolutionary consequences of this surprising self-catalytic reaction inherent in genomic DNA, we have sought to extend these initial observations to provide a more complete definition of the consensus sequence by extensively analyzing the consequences of varying the sequence of the stem-loop catalytic intermediate on the self-depurinating activity. This knowledge can provide a basis for elucidating the biological roles of the mechanism through an analysis of the frequency and distribution of potential self-depurinating sites in various genes and genomes, and it may also shed some light on the nature of the catalytic mechanism. Toward that end, we have examined the tolerance for replacement of each of the three G residues of the loop with canonical base or base analog residues, the tolerance for variation of the base pair at the base of the loop of the stem-loop catalytic intermediate, and the consequence of varying some features of the stem. As a result of this more precise definition of the consensus sequence, the number of potentially self-depurinating stem-loop-forming sequences in the human genome has been determined to be very much larger than originally estimated.

EXPERIMENTAL PROCEDURES
Deoxyoligonucleotides-These were purchased with and without base analogs from IDT (Coralville, IA), purified to homogeneity by denaturing PAGE (16% slab gels, 25 × 45 cm, in 8 M urea with Tris-borate buffer (90 mM Tris, 90 mM boric acid, 10 mM EDTA) at 1.5 kV for ~2.3 h), and eluted from gel slices by soaking overnight in 0.1 M sodium phosphate, pH 7.0. Final purification was accomplished by 100% acetonitrile elution from a C18 SEP-PAK reverse phase column (Waters), followed by spin evaporation. Oligomer concentration was then adjusted spectrophotometrically in Aldridge ACS reagent grade water.
3′-End Labeling-This was performed using terminal deoxytransferase (United States Biochemical Corp.) (5 units) on ~5 pmol of purified oligomer with ~5 pmol of [α-³²P]ddATP (4). The labeled oligomers were purified by denaturing PAGE prior to use.
Kinetics of Depurination-Oligomers were incubated at 22°C in 10 mM sodium cacodylate, pH 5.0 (unless otherwise noted), and aliquots were frozen on dry ice at appropriate intervals. To reveal apurinic sites, aliquots were treated with 0.1 M piperidine for 30 min at 75°C to induce base-catalyzed backbone cleavage (5,6). Either a trace amount of end-labeled oligomer was mixed with an unlabeled stock of appropriate concentration prior to incubation, or the aliquots were labeled following the piperidine treatment (both methods yield the same depurination rates). The radioactivity of the intact oligomer, of its 3′-end cleavage product, and of the products of nonspecific cleavage at other G residues was quantitated using a Molecular Dynamics Storm phosphorimaging device (Amersham Biosciences), and these data were used for the kinetic analysis. Each experiment was performed at least three times, and the kinetic data were averaged.
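The quantitation described above can be illustrated with a minimal sketch, under stated assumptions. The fraction depurinated is taken here to be the counts in the specific cleavage product divided by the total counts in the lane, and the rate is taken as the least-squares slope of that fraction against incubation time over the early, approximately linear region. The function names, the 20% cutoff used to define that region, and the numbers in the usage lines are hypothetical and are not taken from the data reported here.

```python
def fraction_cleaved(specific_product, intact, nonspecific=0.0):
    """Fraction of oligomer cleaved at the specific site (counts are arbitrary units)."""
    total = specific_product + intact + nonspecific
    if total <= 0:
        raise ValueError("total counts must be positive")
    return specific_product / total


def initial_rate(times_min, fractions, max_fraction=0.2):
    """Least-squares slope (min^-1) of fraction cleaved versus time.

    Only points with fraction <= max_fraction are used; this cutoff is an
    assumption standing in for 'the initial slope' of the kinetic plot.
    """
    pts = [(t, f) for t, f in zip(times_min, fractions) if f <= max_fraction]
    if len(pts) < 2:
        raise ValueError("need at least two points in the initial region")
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_f = sum(f for _, f in pts) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (f - mean_f) for t, f in pts)
    return sxy / sxx


# Hypothetical time course (minutes) and fractions, for illustration only.
example_times = [0, 600, 1200, 1800, 3000]
example_fractions = [0.0, 0.12, 0.24, 0.36, 0.60]
example_rate = initial_rate(example_times, example_fractions)
```

Averaging the slopes from at least three such time courses would then correspond to the replicate averaging described above.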

RESULTS
Varying the Loop Residues-The stem-loop catalytic intermediate for self-catalyzed depurination of the 5′-G residue of the loop in the wild-type β-globin gene is shown schematically in Fig. 1. Fig. 2 illustrates how the depurination kinetics were revealed by denaturing PAGE and how the experimental data were quantitated to determine the depurination rates for different stem-loop sequences. Fig. 2A shows a typical gel fractionation after piperidine-induced cleavage of apurinic sites of self-depurinating oligomers (or the absence of such cleavage for nondepurinating ones), and Fig. 2B presents typical kinetic plots obtained from analysis of such gels. Data from those plots provide the basis for Table 1, which indicates how various substitutions of each of the loop residues and the residues of the first base pair with other canonical bases and base analogs of G affect the kinetics of self-depurination.
FIGURE 2. A, radioautograph of a screening gel electrophoresis analysis of four stem-loop-forming deoxyoligonucleotides with a common loop sequence (that of sickle cell β-globin), but with different first stem base pairs. The oligomers were incubated under standard pH 5 conditions for up to 50 h and then treated with 0.1 M piperidine at 75°C for 30 min to reveal fragments caused by β-elimination at accumulated apurinic sites. The oligonucleotides (usually 18 nucleotides) were first gel-purified and labeled at their 3′-end with terminal deoxytransferase and [³²P]ddATP; in some cases, both 18- and 17-nucleotide bands can be seen in the unincubated sample, because these are difficult to resolve during the preparative gel purification, and this results in a trace of shorter depurination product as well. Such screening gels were used initially to determine the range of the depurination rate for a particular oligonucleotide sequence and were followed by several experiments with more time points and/or longer incubation times. The gels were quantitated as described under "Experimental Procedures," and both specific (the 9-nucleotide product) and random (usually at the 12-13-nucleotide level) self-depurination rates were determined. Note that random cleavage is virtually undetectable. B, in all four cases shown: ◆, -T-GTCG-A, the first base pair, that at the base of the loop of the stem-loop-forming deoxyoligonucleotide, is the most favorable one, T·A. ○, the nonspecific control reflecting the random background self-depurination rate, which was below 10⁻⁷ min⁻¹. The plots show the increase in the proportion of the piperidine-induced β-elimination fragmentation product with incubation time. The rates given in Table 1 were determined from the initial slopes of such plots.
At loop position 1, the only other residue to undergo self-depurination at a rate close to that of dG is deoxyinosine (Fig. 2B). The base in this case is hypoxanthine, which like G has a carbonyl substituent on C6, but unlike it, lacks the 2-amino substituent. Notably, as can be seen in Table 1, the other canonical purine residue, that with the base A, shows a definite trace tendency to self-depurinate. However, not unexpectedly, residues with 7-deaza-G (in which N7 is replaced by a carbon atom) and the two canonical pyrimidines C and T show no detectable tendency toward self-glycosyl bond hydrolysis.
In the initial report (1), data were presented showing that depurination of the G residue at loop position 1 occurs only when the base of the second loop residue is a T, as in the sickle cell mutant, or an A, as in the wild-type β-globin gene. Substitution of a C residue at that site results in complete absence of self-depurination at position 1, as does substitution of a G residue.
At loop position 3, replacement of the normal G residue with an I residue causes a small diminution in the self-depurination kinetics at position 1 (Fig. 2B), whereas replacement with residues of 7-deaza-G, C, T, or A results in complete absence of self-depurination of loop residue G1.
At loop position 4, replacement of the G residue with the analog I residue also diminishes the kinetics of self-depurination of residue G1, but it does not eliminate it. This diminution is perhaps due to the well known poor stacking of the base hypoxanthine, in contrast to the strong stacking of G; by not stacking strongly with the first base pair of the stem, hypoxanthine might lend flexibility to the loop. Only with 7-deaza-G and T residues at this position (dA and dC substitutions were not tested) is self-depurination at position 1 totally suppressed. In sum, sequence variation within the loop residues is quite restricted, with some sequence flexibility tolerated only at loop position 2.
Varying the Base Pair at the Base of the Loop-In the β-globin gene, the base pair at the base of the loop in the stem-loop catalytic intermediate for self-depurination is 5′-T·A-3′, and as can be seen in Table 1, with this base pair, the rate of self-depurination at position 1 is highest. Fig. 2 illustrates the effect of changing this first base pair of the stem. The consequences are quite striking: although replacement of that pair by 5′-G·C-3′ results in a mere 3-fold diminution in the rate, replacing it with either 5′-A·T-3′ or 5′-C·G-3′ (or any base pair mismatch; data not shown) totally suppresses self-depurination. A mismatch at the base of the loop likely permits a level of loop flexibility that removes any strain energy required for the depurination to occur. The difference between 5′-G·C-3′ and 5′-T·A-3′ versus 5′-C·G-3′ and 5′-A·T-3′ may well reside in the capacity of the similarly located carbonyls of T and G to hydrogen-bond to a water molecule so as to orient it toward the glycosyl bond of dG1 to be hydrolyzed, which lies just above it. This would account for the differential base pair effects at the base of the loop.
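The loop and first-base-pair requirements described above can be expressed as a simple sequence test. The following is a minimal sketch, not the search procedure used for the genome analysis: it accepts only the loop 5′-G(T/A)GG-3′, only T·A or G·C as the base pair at the base of the loop, and only perfectly complementary stems of at least a chosen length. The function name and the default minimum stem length of 5 base pairs are assumptions made for illustration; stems containing a mismatch or a bulge are deliberately not handled.

```python
import re

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
LOOP = re.compile(r"G[TA]GG")                    # loop consensus 5'-G(T/A)GG-3'
ALLOWED_CLOSING_PAIRS = {("T", "A"), ("G", "C")}  # (5' residue, 3' residue)


def find_candidate_sites(seq, min_stem=5):
    """Return (index_of_loop_G1, stem_length) for each candidate stem-loop.

    Only perfectly paired stems are recognized; min_stem is an assumed
    threshold, not a value established in the text.
    """
    seq = seq.upper()
    hits = []
    for i in range(min_stem, len(seq) - 4 - min_stem + 1):
        if not LOOP.fullmatch(seq[i:i + 4]):
            continue
        if (seq[i - 1], seq[i + 4]) not in ALLOWED_CLOSING_PAIRS:
            continue
        # Extend the stem outward from the loop while residues are complementary.
        stem = 0
        while (i - 1 - stem >= 0 and i + 4 + stem < len(seq)
               and COMPLEMENT.get(seq[i - 1 - stem]) == seq[i + 4 + stem]):
            stem += 1
        if stem >= min_stem:
            hits.append((i, stem))
    return hits


# The 14-residue oligomer with a perfect 5-base pair stem listed in Table 2.
candidate_hits = find_candidate_sites("CTCCTGTGGAGGAG")
```

Because mismatches are not tolerated in this sketch, an oligomer such as D18, whose stem contains an A-C mismatch, is not reported unless `min_stem` is lowered.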
Varying Stem Stability-To this point, all indications are that the lifetime of some fixed configuration of the loop impacts significantly on the probability of self-depurination. The lifetime of such a strained loop configuration is likely to depend on the stability of the helically twisted stem, which in turn will depend on several well recognized variables, including helix length, i.e. the number of base pairs (7-10), base pair composition (11), and sequence (12), i.e. A·T, T·A, G·C, and C·G, noncomplementary base pair mismatches (13,14), and the presence of extrahelical residues (13,15). In fact, the data in Table 2 and the plot in Fig. 3 reveal a positive exponential correlation between the stability of these stem-loops, as indicated by their Tm values, and the rate of their self-depurination. Such a dependence is consistent with the two-state model of DNA melting, in which the proportion of time in stem-loop versus single-stranded conformation at any particular temperature should vary with the log of the difference between that temperature and Tm for the stem-loop (16). Consistent with this model, we have found that a stem-loop with a short but G·C-rich stem without mismatches (D14p) displays almost the same depurination rate as an A·T-rich one, but with a 16-base pair-long (A·T)8 stem (D(AT)16). Moreover, stem-loops with differences in stem sequence that have only a small effect on their Tm values (D18 and D18ch) display similar self-depurination rates. On the other hand, as seen in the sequences in Table 2 and the PAGE analysis and plots in Fig. 4, replacement of the A-C mismatch in the stem of D18 (Tm of 43°C, self-depurination rate of 2 × 10⁻⁴ min⁻¹) by a G·C base pair, as in D20p, leads not only to a very large increase of 26°C in the Tm value, but also to an increase in the self-depurination rate so large (~40-fold) that it could only be estimated. Indeed, in this stem-loop, with its 7-base pair fully complementary stem with a G·C content of 57%, self-depurination was so rapid during the hour it took to purify the oligomer at room temperature that the rate could only be estimated to be higher than 80 × 10⁻⁴ min⁻¹. When a stem with a slightly different sequence of 8 base pairs with a G·C content of 62.5% (D22p) was prepared, its Tm rose another 4°C, and the rate rose another 2-3-fold.
Effect of pH on the Rate of Self-catalyzed Depurination-The data presented above were all obtained at the standard pH 5 conditions to enable direct comparison with the previous observations (1). In these earlier experiments, a fragment of the human β-globin gene was used that contained an A-C mismatch in the stem of the catalytic intermediate. For this sequence and its various derivatives, it was shown that raising the pH to the physiological level of 7 significantly reduces the depurination rate. Such reduction is consistent with the well known fact that spontaneous DNA depurination is acid-catalyzed (19,20) via protonation of both G and A residues, but it is also consistent with the stabilization of the A-C mismatched base pair at more acidic pH (21,22). To determine whether the faster self-depurination rate at lower pH should be taken to mean that self-catalyzed DNA depurination is limited to in vitro conditions, the pH dependence of the self-depurination rate was determined for deoxyoligonucleotides with and without an A-C mismatch in the stem. As can be seen in the plots in Fig. 5, although the rate of self-depurination for the stem-loop with the A-C mismatch is reduced ~50-fold when the pH is raised from 5 to 7, it is only reduced ~4-fold when the mismatch is replaced by a G·C base pair, which results in a stem-loop with a fully paired and more stable stem. This difference is consistent with stabilization of the A-C mismatch at lower pH either by protonation of its A residue to form a wobble-type A⁺·C base pair (21) or of its C residue to form an A·C⁺ imino,enol complementary-type base pair (22). The comparison in Fig. 5 shows that only a modest fraction of the rate reduction at pH 7 is, in fact, due to the reduction in self-depurination per se. Apparently, the major effect at pH 7 (where the self-depurination rate greatly exceeds the background rate of spontaneous depurination) is due instead to the resulting destabilization of the A-C mismatch and the consequent drastic decrease in the stability of the stem, which, as was demonstrated by the data in Table 2 and the plot in Fig. 3, is a major determining factor in the self-catalyzed depurination rate. In vivo, however, it is very likely that intranuclear macromolecular crowding (23), which is known to elevate acidic pK values (because of the reduction in the proximity of the charged surfaces of the macromolecules and the consequent favorable base pairing that ensues) (24), both mitigates this destabilization and enables the required protonation of the purine residue that will undergo glycosyl bond hydrolysis. We have tested this rationale by mimicking the macromolecular crowding with a well known crowding agent, polyethylene glycol 1000 at a concentration of 20% (conditions that we have found not to otherwise alter DNA helix stability). This resulted in a ~10-fold increase in the in vitro self-depurination rate at pH 7 of the β-globin stem-loop consensus sequence (containing the A-C mismatch in the stem) to 8 × 10⁻⁶ min⁻¹ (from 0.6 × 10⁻⁶ min⁻¹, as shown in Fig. 5). This self-depurination rate is some 3 orders of magnitude faster than the background spontaneous depurination rate, which is high enough to justify including consensus sequences with a mismatch in the stem-forming sequence among the potentially self-depurinating sites in the genome.

TABLE 1
Effect of changing loop and first base pair residues on the rate of site-specific self-catalyzed depurination of G residues
Unless directly indicated in the table, the loop and first base pair base residues are as in the first row. Hence, only base substitutions in that sequence are indicated. A, G, T, and C indicate canonical base residues. H indicates the hypoxanthine base residue. 7-deaza-G indicates the guanine base residue with N7 replaced by a carbon.
Columns: First 5′ bp residue; Loop residues; First 3′ bp residue; Rate at pH 5.0

TABLE 2
Correlation of G residue self-depurination rate with stem stability
The bold G residue is the one self-depurinated. Residues that form stem base pairs are underlined. The melting temperatures (Tm values) were estimated using the DinaMelt Web Server (17,18) and were confirmed experimentally for the D(AT)16 and D18 oligomers (in 10 mM sodium cacodylate, 0.1 M NaCl, 0.002 M MgCl2, pH 5.0).

Oligomer   Sequence                                        kobs (min⁻¹)   Tm (°C)
D14p       5′-CTCCTGTGGAGGAG-3′                            4 × 10⁻⁴       55
D(AT)16    5′-ATATATATATATATATGTGGATATATATATATATAT-3′      4 × 10⁻⁴       58
D18        5′-GACTCCTGTGGAGAAGTC-3′                        2 × 10⁻⁴       43
D18ch      5′-GGACCCTGTGGAGAGTCG-3′                        2 × 10⁻⁴       48
a kobs is an estimate because the self-depurination rate of these oligonucleotides exceeds the detection limits of the method used.

FIGURE 4. Replacement of the A-C mismatch in the stem by a G·C base pair results in a sharp increase in the self-depurination rate, so that it becomes virtually impossible to determine, because the initial, linear region of the depurination product accumulation cannot be resolved (1 h of incubation results in 50-60% self-depurination).

Consensus Sequence for G Residue Self-depurination in DNA
OCTOBER 21, 2011 • VOLUME 286 • NUMBER 42

DISCUSSION
The primary aims of this investigation were to define more completely the consensus sequence for site-specific self-catalyzed depurination of G residues within DNA sequences and then to evaluate the pH dependence of the mechanism. Aside from enabling much more accurate estimates of their presence in genomes, this information can be of value in identifying potential self-depurinating sites in genes of particular functional annotation and thereby provide a means of uncovering possible biological roles of the self-depurinating mechanism. The results presented show that the sequence variations examined in the loop and base pair at the base of the loop form a meaningful pattern in terms of the way they impact the self-depurination rate. Similarly, there is consistency between the stem sequence of the catalytic intermediate, the effect of pH variation, and its physical consequences on the stability of the stem, on the likely lifetime of the intermediate, and therefore on the rate of self-depurination. The consensus sequence would therefore appear to be suitably defined.
By varying stem parameters, it was found that self-depurination rates of deoxyoligonucleotides capable of forming stem-loops with the appropriate loop sequence and perfectly paired stems can be ≥10⁻⁴ min⁻¹ at physiological pH (e.g. Fig. 5) or even higher when the stem-loops are more stable. Hence, self-depurination rates 4-5 orders of magnitude faster than the background spontaneous depurination rate (2) are quite possible.
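The dependence of rate on stem stability can be illustrated with a short sketch that fits ln(rate) as a linear function of Tm, which is one simple way to express an exponential correlation. The Tm and rate values are those of the four oligomers in Table 2 for which a rate was measured (the 14-residue oligomer is taken here to be D14p); the choice of an ordinary least-squares fit, the function names, and the exclusion of the oligomers whose rates could only be estimated are assumptions made for illustration, not the analysis behind Fig. 3.

```python
import math

# (Tm in degrees C, rate in min^-1) for the Table 2 oligomers with measured rates.
TABLE2 = {
    "D14p": (55.0, 4e-4),     # row name assumed from the 14-residue sequence
    "D(AT)16": (58.0, 4e-4),
    "D18": (43.0, 2e-4),
    "D18ch": (48.0, 2e-4),
}


def fit_log_linear(points):
    """Least-squares fit of ln(rate) = intercept + slope * Tm; returns (intercept, slope)."""
    xs = [tm for tm, _ in points]
    ys = [math.log(rate) for _, rate in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predicted_rate(tm_celsius, intercept, slope):
    """Rate (min^-1) implied by the fitted line at a given Tm."""
    return math.exp(intercept + slope * tm_celsius)


intercept, slope = fit_log_linear(list(TABLE2.values()))
```

With only four points spanning a narrow Tm range, such a fit indicates the direction of the trend and little more; it should not be used to extrapolate to the much more stable stems whose rates exceeded the detection limit.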
The stem-loop structure essential for this mechanism can readily extrude from the DNA duplex (as part of a cruciform) under several circumstances: under the topological stress of DNA superhelicity (25); when the DNA strands separate during transcription or replication (26); and during ATP-dependent chromatin remodeling (27). Such extrusion has even been directly observed in genome sequences undergoing recurrent translocations in human chromosomes (28). All of these indications that cruciform extrusion can or does occur move the observed phenomenon of self-catalyzed site-specific DNA depurination from a class of peculiar properties of some DNA sequences under unnatural conditions to one of potential biological relevance.
In fact, with the consensus sequence in hand, it has become possible to find strong indications of such relevance in the frequency and patterns of occurrence of sites in a genome with a potential to self-depurinate G residues. Such a mechanism may well have been expected to disappear over the course of evolution, given its capacity to damage the integrity of the genome as a consequence of error-prone repair of the resultant apurinic sites. Contrary to such expectation, a complete search of the human genome based upon the defined consensus sequence, which allows for a single base mispair or a single extrahelical residue (bulge) after the second base pair of the stem, both of which are thermodynamically plausible, reveals ~1,000,000 potential self-depurination sites for G residues. This number is some 5-fold higher than would be expected on a random basis, and this overrepresentation is even more prominent in many disease-associated genes, as well as some genes of common functional annotation that require sequence diversity in their contemporary function or else appear to have used the mechanism in the course of their evolution (29). Each such site has a potential for a substitution or short deletion mutation from error-prone repair of resulting apurinic sites. Although the extent to which potentially self-depurinating sequences in the human genome have been employed to create sequence diversity is not presently apparent, the fact that significant examples have been uncovered emphasizes the value of knowing more fully the extent of sequence variation that retains the capacity to self-depurinate and so create potential apurinic mutagenic sites. In this connection, we have found many examples of coincidence of reported single-nucleotide polymorphisms with the consensus sequence described here for self-depurination of G residues. Prominent among such sites is the only one in the human β-globin gene; that site coincides with substitutions responsible for six different anemias and short deletions associated with two β-thalassemias (29).
FIGURE 5. pH dependence of initial cleavage rates for D18 (▲, 18 nucleotides, 5 base pairs and one A-C mismatch in stem) and D14p (■, 14 nucleotides, perfect stem of 5 base pairs). Note the differential decrease in the rate of self-depurination for the two examples when the pH is raised from 5 to 7. Whereas the reduction is drastic for the stem-loop containing the A-C mismatch, it is modest when the mismatch is replaced by a G·C base pair. However, when protonation of the A-C mismatch is favored at pH 7 by macromolecular crowding, as in the cell nucleus or as we have done with the crowding agent, polyethylene glycol (see text), the self-depurination rate is significantly enhanced.
The demonstration in this study that self-depurination can occur at a finite albeit low rate under essentially physiological conditions is worthy of note, particularly because of the implication that this mutation-related mechanism has likely played an evolutionary role. Although the low rate might be seen to indicate a mechanism of low significance, our view is that it is in fact appropriate for a mechanism likely to have been selected for creating sequence change in a biological context, in a manner analogous to the evolutionary sequence changes that arise from spontaneous mutations associated with the low error rates of highly accurate DNA synthesizing polymerases. Thus, both these mechanisms are geared to create genetic changes without creating cataclysmic genomic instability. Finally, we note that the tolerance for sequence variation in the consensus sequence for self-depurination of G residues that has been revealed and its affirmed acid catalysis provide insights whose further exploration should lead to a chemical mechanism for this unanticipated self-catalyzed reaction.