Refined solution structure of a DNA heteroduplex containing an aldehydic abasic site.

The solution structure of the DNA duplex d(C1G2C3G4A5D6A7C8G9C10C11)-d(G22C21G20C19T18A17T16G15C14G13G12), with D indicating a deoxyribose aldehyde abasic site and numbering from 5' to 3', has been determined by the combined use of NMR and restrained molecular dynamics. The 31P and 31P-1H correlation data indicate that the backbones of these duplex DNAs are regular. One- and two-dimensional 1H NMR data indicate that the duplexes are right-handed and B-form. Conformational changes due to the presence of the abasic site extend to the base pairs adjacent to the lesion site, with the local conformation of the DNA being dependent on whether the abasic site is in the alpha or beta configuration. When the sugar of the abasic site is in the beta configuration the deoxyribose is within the helix, whereas when the sugar is in the alpha configuration the deoxyribose is out of the helix. The base of residue A17 in the position opposite the abasic site is predominantly stacked in the helix in both cases. A water molecule can apparently form a hydrogen bond bridge between the beta abasic site and A17.

Damage to DNA bases can arise from a number of naturally occurring routes including oxidative stress as well as by the action of various chemical agents and by radiative processes. Base damage such as the spontaneous deamination of cytosine to form uracil, the oxidation of thymine to thymine glycol, or the oxidation of guanine to 8-oxoguanine can be repaired via abasic sites. The first step in repair in vivo is often the hydrolytic cleavage of the modified base, at the C-N bond between the sugar and the damaged or unusual base, to generate an abasic site. The cleavage of the glycosidic bonds is catalyzed by DNA glycosylases, which were first identified in 1974 (1), and there are nine known distinct classes of glycosylases (2-5). Uracil glycosylase is the most familiar of the glycosylases and catalyzes the reaction shown below. Structures of two uracil glycosylases have been recently determined (6, 7) (Structure 1).
The abasic site is not a chemically unique species but is an equilibrium mixture of four forms (8-10): the α-(I) and β-(II) hemiacetals (2-deoxy-D-erythro-pentofuranoses), the aldehyde (III), and the hydrated aldehyde (IV), as depicted below. The hemiacetal forms predominate with about 1% aldehyde being present (8, 11). The strand cleavage at the 3′ side of the abasic site catalyzed by UV endonuclease V of bacteriophage T4 or endonuclease III of Escherichia coli occurs via a syn β-elimination reaction (9, 10, 12). The hydroxide-catalyzed reaction proceeds via a trans β-elimination reaction (10) (Structure 2).
A primary source of abasic sites is the spontaneous deamination of cytosine to uracil (2, 13-17). For a typical E. coli, it has been estimated that there are about 40-400 such events per cell division, and in a typical mammalian cell 4,000-40,000 uracils are formed per cell division (2, 3, 16, 17). A genetic test has shown that the rate of C deamination in double-stranded M13 in vitro is about 10⁻¹²/s (18). The number of abasic sites in a "typical" human cell is not known, as the rate of formation and the rate of repair are dependent on many factors. Ames and co-workers (19) have estimated that there are more than 10,000 damaged sites/typical human cell at any given time. The number and types of damage to DNA which are required for transformation to occur are just now being determined (20-23). The damage of mitochondrial DNA can also lead to cancer, and the repair processes of mitochondria are not well understood (24).
During the past few years there has been a growing appreciation of the diversity of DNA repair responses depending on the state of the cell. DNA repair can be highly coupled with transcription (25). Thus, in mature cells damaged DNA sites can accumulate, and the structural and dynamical effects of damaged DNA and the intermediates in DNA repair may be more important in mature than in dividing cells. It is now known that DNA repair can be strand specific with only the transcribed strand being repaired (2, 3, 24-27). In mature nerve cells there is apparently little DNA repair occurring and damaged sites accumulate (28-30). When a mature nerve cell is infected by herpes, pseudorabies, or other virus, DNA repair is activated by repair enzymes, including uracil glycosylase, coded by the virus (28-30). It is now recognized that damaged DNA can also have pronounced effects on transcription and chromosome integrity (21, 31) as well as cellular aging (32, 33).
There have been a number of investigations of the effects of unrepaired abasic sites on replication (2-5, 21, 22, 34-38). When DNA polymerase copies DNA containing an abasic site there is a strong preference, about 90%, for dA to be put in the daughter strand in the position opposite the abasic site (2, 3, 16, 17, 36, 39). There may be special features associated with dA interacting with the abasic site, with these physical properties giving rise to the dA preference, which occurs during the rate-determining step in the polymerase reaction. On the other hand, if the various bases have similar interactions with the abasic site in the context of duplex DNA, then the preference for dA is most likely that of the replication complex (22). There is no consensus at the present time for the origin of this preference, and there is some evidence that it is due to a kinetic effect (40), suggesting that the conformation of the abasic site may be important.
Abasic sites also affect transcription. The limited results to date suggest that the presence of an abasic site slows down but does not block transcription (21,22,26,31,41,42). The base most commonly placed in the RNA at the position complementary to the abasic site is rA. This is the same preference as found for DNA polymerases and suggests that the dA/rA preference might be at least partially due to the preferential interaction of A with the abasic site.
There have been a number of studies of DNA duplexes containing analogues of the naturally occurring abasic site (43-46). These tetrahydrofuran or other analogues differ from the natural abasic site in hydrogen bonding potential, chemical reactivity, and perhaps other properties as well. The hydrogen bonding potential may be important since it is likely that the abasic site interacts with water molecules. A recent investigation indicated that the conformational preferences of at least some of these abasic site analogues are different from those of aldehydic abasic sites (47).
Studies on abasic sites are also of interest as they relate to studies on DNA degradation by drugs such as bleomycin and neocarzinostatin (15, 48-50). These drugs lead to products such as deoxyribolactones and other types of non-aldehydic abasic sites. The refined solution state structures of duplex DNAs containing abasic sites may be useful for comparison with those generated by anti-tumor and other drugs.

EXPERIMENTAL PROCEDURES
Sample Preparation-The DNA single strands d(C1G2C3G4A5U6A7C8G9C10C11) and d(G22C21G20C19T18A17T16G15C14G13G12) were obtained from DNAgency, and the purity of the single strands was checked by reverse phase HPLC¹ on a preparative Hamilton PRP-1 column via elution with a gradient of 1.5-8.5% acetonitrile in 25 mM phosphate buffer, pH 7.0, in 22.5 min. The DNA was then dialyzed, and the single strands were lyophilized and reconstituted in pH 7.0 buffer containing 10 mM sodium phosphate, 100 mM sodium chloride, and 0.05 mM EDTA in 99.96% 2H2O.
The abasic site single strand d(C1G2C3G4A5D6A7C8G9C10C11) was prepared by treating the single strand d(C1G2C3G4A5U6A7C8G9C10C11), which contains a single U residue, with N-uracil glycosylase as described previously (8-12). The extent of reaction was monitored by the reverse phase HPLC method described above, which separates free uracil, the DNA single strand containing U, and the DNA single strand containing the abasic site. The single strand containing the abasic site was purified by gel filtration chromatography on a preparative TSK-GEL G2000SW column and eluted with 25 mM sodium phosphate buffer and 100 mM sodium chloride at pH 7.0 to remove N-uracil glycosylase and free uracil. The purified single strand was subsequently dialyzed, lyophilized to dryness, and dissolved in pH 7.0 buffer containing 10 mM sodium phosphate, 100 mM sodium chloride, and 0.05 mM EDTA in 99.96% 2H2O. Abasic site containing single-stranded DNA prepared by this approach was found to be pure both by proton and 31P NMR and to be free of phosphodiester cleavage products. Overall yield for the conversion of the single-stranded material to abasic site containing DNA was about 85%. Due to the degradation of DNA containing abasic sites at elevated temperatures with subsequent irreversibility of duplex formation, a precise melting temperature for the duplex was not determined.

¹ The abbreviations used are: HPLC, high performance liquid chromatography; NOESY, nuclear Overhauser effect spectroscopy; NOE, nuclear Overhauser effect; ECOSY, easy correlation spectroscopy; PE-COSY, phased easy correlation spectroscopy; heterotocsy, heteronuclear total correlation spectroscopy.
The heteroduplexes were formed by mixing equimolar quantities based on the extinction coefficients of the two strands and by monitoring the titration of the single strand containing the residue dU or D with the adjacent strand.
NMR Procedures-The 400 MHz NMR spectra were obtained using a Varian Unityplus spectrometer, and the 500 and 600 MHz NMR spectra were obtained using Bruker AMX spectrometers at the University of Wisconsin. The proton spectrum of this DNA was previously assigned (51). The Varian NMR results were processed using VNMR software, and FELIX 2.1 software was used for the Bruker data. The two-dimensional NOESY, ROESY, double quantum filter correlation spectroscopy, total correlation spectroscopy, PECOSY, and other NMR experiments were carried out in the usual fashion.
Heteronuclear 31P-1H correlation data were obtained using the Unityplus 400 via a two-dimensional PHH-heterotocsy experiment (52). The heteronuclear spinlock time was 100 ms. The spectral width in the 1H dimension was 2232 Hz collected into 1664 points and in the 31P dimension was 500 Hz collected into 128 complex points. 128 scans for each of the 128 t1 increments were obtained with an acquisition time of 0.4 s. The data were zero-filled to 2K data points in the F2 dimension and 1K data points in the F1 dimension. The data were weighted with a Gaussian apodization in each dimension.
NOESY experiments in 2H2O were carried out at mixing times of 100 and 200 ms with a 1.6 s equilibration delay with presaturation of the water resonance, using the Bruker 500 MHz spectrometer with a spectral width of 5000 Hz in each dimension. At each mixing time, 512 t1 increments were acquired with 64 scans for each increment. The F1 dimension was zero-filled to 2K, and the data were weighted with a Gaussian apodization in each dimension prior to 2K × 2K Fourier transformation. These data were used for quantification of the NOESY cross-peaks.
A NOESY spectrum was obtained with a 300-ms mixing time using the Varian Unityplus 400 spectrometer at 20 °C in 2H2O with 31P decoupling during the evolution time. The data were collected into 2K complex points in t2 and 1K complex points in t1 with a spectral width of 4000 Hz in each dimension, and 64 transients were acquired for each of 512 increments of t1. A Gaussian weighting was used in both dimensions, and the spectra were zero-filled to 4096 by 4096 real points. The heteronuclear JPH couplings were determined by comparison of the proton linewidth along F1 and F2.
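The effect of the zero-filling step on the point spacing of the transformed spectrum can be illustrated with a short calculation. The sketch below is not part of the original processing scripts; the function name is hypothetical, and it assumes that the 512 t1 increments and the 2K zero-filled size both count points spanning the full 5000 Hz spectral width. Zero-filling refines the point spacing only and does not add resolving power beyond what the acquired data contain.

```python
def digital_resolution_hz(spectral_width_hz: float, n_points: int) -> float:
    """Frequency spacing between adjacent points of the transformed spectrum."""
    if n_points <= 0:
        raise ValueError("n_points must be positive")
    return spectral_width_hz / n_points


# Values quoted in the text: 5000 Hz spectral width, 512 t1 increments,
# F1 zero-filled to 2K (2048) points. Treating both as point counts over
# the full spectral width is an assumption of this sketch.
SPECTRAL_WIDTH_HZ = 5000.0
spacing_acquired = digital_resolution_hz(SPECTRAL_WIDTH_HZ, 512)
spacing_zero_filled = digital_resolution_hz(SPECTRAL_WIDTH_HZ, 2048)

print(f"F1 spacing from 512 increments:      {spacing_acquired:.2f} Hz/point")
print(f"F1 spacing after zero-filling to 2K: {spacing_zero_filled:.2f} Hz/point")
```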
NOESY experiments in 90% H2O, 10% 2H2O were carried out on the Bruker 600 MHz spectrometer with jump and return pulses replacing the last 90° pulse in the standard NOESY sequence. The delay between the jump and return pulses was 55 ms. 650 increments of t1 with 96 scans/increment were used, and 4K data points in t2 were acquired. The spectral width in each dimension was 12,000 Hz. The data were processed to minimize the intensity of the water signal of each FID prior to the first Fourier transformation in the F2 dimension. Gaussian apodization was used in both dimensions, the data were zero-filled to 4K data points in t1, and a second order polynomial base-line correction was used for the F2 dimension.

FIG. 2. On the left is shown the two-dimensional 600 MHz NOESY spectrum of the DNA obtained with a 100 ms mixing time, with the sequential assignments of the d(C1G2C3G4A5D6A7C8G9C10C11) strand indicated in the top spectrum and the assignments of the d(G22C21G20C19T18A17T16G15C14G13G12) strand indicated in the bottom spectrum. This spectral region contains the aromatic to H5 and H1′ cross-peaks. On the right is shown the two-dimensional 600 MHz NOESY spectrum of the DNA obtained with a 100 ms mixing time, with the sequential assignments of the d(C1G2C3G4A5D6A7C8G9C10C11) strand indicated in the top spectrum and the assignments of the d(G22C21G20C19T18A17T16G15C14G13G12) strand indicated in the bottom spectrum. This spectral region contains the aromatic to H2′, H2″, and methyl cross-peaks.
PECOSY spectra were collected at 400 MHz, and the data were collected into 2K complex points in t2 and 512 complex points in t1. The spectral width in F2 was 3200 Hz, and in F1 2600 Hz. 256 transients were acquired for each increment in t1. The data were linear predicted to 360 points before zero filling, Gaussian weighting was used in both dimensions, and the spectra were zero-filled to 4K × 4K real points.
One-dimensional 31P NMR spectra were obtained at 161.9 MHz with proton decoupling. The spectral width was 2687 Hz with 4288 complex points and 128 scans. A Lorentzian apodization of 3 Hz was applied prior to Fourier transformation. One-dimensional spectra of the imino protons were obtained at 400 MHz with a spectral width of 10,000 Hz using a jump and return pulse for water suppression and 8K complex points. The standard solvent subtraction was used prior to Fourier transformation.
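The 31P observation frequency quoted above follows from the proton frequency of the spectrometer and the ratio of the gyromagnetic ratios of the two nuclei. The following sketch shows that arithmetic; the gyromagnetic ratios are standard tabulated values rounded to three decimals, the function name is hypothetical, and the nominal 400 MHz proton frequency is an assumption (the exact field of the instrument is not given in the text).

```python
# Gyromagnetic ratios divided by 2*pi, in MHz per tesla (rounded tabulated values).
GAMMA_1H_MHZ_PER_T = 42.577
GAMMA_31P_MHZ_PER_T = 17.235


def x_nucleus_frequency_mhz(proton_mhz: float,
                            gamma_x: float = GAMMA_31P_MHZ_PER_T) -> float:
    """Larmor frequency of nucleus X at the field where 1H resonates at proton_mhz."""
    return proton_mhz * gamma_x / GAMMA_1H_MHZ_PER_T


# Nominal proton frequency of the Varian Unityplus 400 (assumed to be exactly 400 MHz).
phosphorus_mhz = x_nucleus_frequency_mhz(400.0)
print(f"31P frequency at a nominal 400 MHz 1H field: {phosphorus_mhz:.1f} MHz")
```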
Quantitation of NOE Cross-peak Volumes-The volumes in the NOE cross-peaks of the data obtained with 100 and 200 ms mixing times were quantified using FELIX 2.1 software. For each assigned and resolved cross-peak, the volume in a standard area was determined.
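Measuring the volume in a standard area amounts to summing the spectral intensities inside a box of fixed size centred on each assigned peak. The sketch below illustrates that idea on a toy intensity grid; it is not the FELIX 2.1 procedure, and the function name, box size, and data are all hypothetical.

```python
def box_volume(spectrum, row, col, half_height=1, half_width=1):
    """Sum intensities in a fixed-size box centred on (row, col).

    The box is clipped at the edges of the spectrum. Using the same
    half_height and half_width for every peak gives a 'standard area'.
    """
    n_rows = len(spectrum)
    n_cols = len(spectrum[0]) if n_rows else 0
    total = 0.0
    for r in range(max(0, row - half_height), min(n_rows, row + half_height + 1)):
        for c in range(max(0, col - half_width), min(n_cols, col + half_width + 1)):
            total += spectrum[r][c]
    return total


# Hypothetical 4 x 4 intensity grid with one cross-peak centred at (1, 1).
toy_spectrum = [
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 1.0, 0.2, 0.0],
    [0.0, 0.2, 0.1, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
peak_volume = box_volume(toy_spectrum, 1, 1)
print(f"Volume in the 3 x 3 standard area: {peak_volume:.2f}")
```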
Structure Determination Procedures-Structure refinement was performed by restrained molecular dynamics using X-PLOR 3.1 as described elsewhere (53), with the following modifications. The structure of the AD duplex was refined using a complete relaxation matrix restrained molecular dynamics trajectory. The experimental NOE cross-peak volumes were used as constraints using a well function. The experimental homonuclear and heteronuclear couplings were used to generate dihedral constraints defining sugar pucker and the conformation of the backbone using a harmonic potential. In addition, dihedral constraints were used to keep the purine rings planar. The non-bonded interaction cut off was set to 11.5 Å. The switching function for non-bonded interactions was switched from on to off over 9.5-10.5 Å. The distance cut off for the 28 Watson-Crick hydrogen bonding interactions was set to 7.5 Å, and the switching function was applied from 4.0 to 6.5 Å. 219 NOE volume constraints were used for each of the two mixing times. 16 H1′-H2′ dihedral constraints were used in addition to 13 P-O-H3′ dihedral constraints. NOE constraints for the imino protons of base pairs 4, 5, 7, and 8 were used. The NOE and other constraints are available at the ftp site.
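A well function of the kind mentioned above applies no penalty while a calculated quantity lies within its experimental bounds and a rising penalty outside them. The sketch below uses a generic flat-bottomed quadratic well to illustrate the behavior; the exact functional form applied to NOE volumes in X-PLOR 3.1 is not given in the text, so the function, its name, and the bounds are assumptions made for illustration only.

```python
def square_well_energy(value: float, lower: float, upper: float,
                       force_constant: float) -> float:
    """Flat-bottomed well: zero inside [lower, upper], quadratic outside.

    Generic illustrative form; not taken from the X-PLOR source or manual.
    """
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    if value < lower:
        return force_constant * (lower - value) ** 2
    if value > upper:
        return force_constant * (value - upper) ** 2
    return 0.0


# Hypothetical normalized volume bounds and an arbitrary force constant.
LOWER, UPPER, FORCE_CONSTANT = 0.8, 1.2, 40.0
for calculated in (0.5, 1.0, 1.5):
    energy = square_well_energy(calculated, LOWER, UPPER, FORCE_CONSTANT)
    print(f"calculated = {calculated:.1f}  penalty = {energy:.2f}")
```

The flat region lets the restraint absorb experimental uncertainty in the measured volumes without pulling the structure toward a single target value.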
The optimum relative weighting of the NOE constraint, 40 kcal/mol, was found to be larger for the abasic site DNA than for undamaged DNAs, 20 kcal/mol. In addition, the optimum weighting of the biharmonic potential for the heavy atoms involved in experimentally observed base pair hydrogen bonds was found to be 30 rather than the 20 kcal/mol found for undamaged DNA. The other force constants were the same as used for undamaged DNA (53). The X-PLOR force field has been optimized for normal duplex DNA. The presence of the abasic site eliminates both stacking and steric interactions between the bases that are found in undamaged DNA and that are built into the X-PLOR force field. The increase in the relative weighting of the experimental constraints compensates for the differences of the abasic site DNA.
The starting structures were generated from a canonical B-DNA by replacing the base with a hydroxyl group at the C1′ α- or β-position of the abasic site. This hydroxyl group has a partial negative charge of −0.325. The α and β structures were refined separately.
The energy of each of the starting structures was minimized in 100 steps of Powell's conjugate gradient minimization using X-PLOR. The relaxation matrix refinements were carried out in vacuum at 300 K. The structures were further minimized using the force field with all restraints for 100 steps of minimization and then subjected to a 100 ps relaxation matrix simulation followed by 200 steps of conjugate gradient energy minimization. Each trajectory was run for a total of 100 ps, and the structures appeared to reach equilibrium after about 20 ps. There were no significant differences between the structures after 20 ps and those after 100 ps. The structures at 20 ps were used for generating the back calculated spectra. Coordinates of both structures are available at the ftp site.
The NOE cross-peak volumes for each of these structures were back calculated separately using an overall correlation time of 5 ns, a leakage rate of 0.33 s⁻¹, and a distance cutoff of 5.5 Å. The spectra calculated for each of the structures were added together for comparison with the experimental results.
An anonymous ftp site has been set up that contains the assignments, NOE and dihedral constraints, and the structures of the α and β forms of the DNA. This site can be accessed either by http://prophet.chem.wesleyan.edu/Chemistry.html using a web browser and following the links papers-bolton-ABASICSITEJBC1995. Alternatively the ftp site can be accessed at ftp://prophet.chem.wesleyan.edu with the material in the /pub/papers/bolton/ABASICSITEJBC1995 directory.
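The dependence of a back calculated cross-peak on distance, correlation time, and leakage can be illustrated with an isolated pair of protons, for which the relaxation matrix exponential has a closed form. This is a deliberate simplification and not the complete relaxation matrix calculation used for the refinement, since a two-spin model neglects spin diffusion through neighboring protons. The function names and the two distances are hypothetical; the correlation time, leakage rate, and cutoff are those quoted above, and a 500 MHz proton frequency is assumed.

```python
import math

# 0.1 * (mu0/4pi)^2 * gamma_H^4 * hbar^2 for a proton pair, in Angstrom^6 s^-2.
DIPOLAR_CONSTANT = 5.696e10


def spectral_density(omega: float, tau_c: float) -> float:
    """Lorentzian spectral density for isotropic tumbling."""
    return tau_c / (1.0 + (omega * tau_c) ** 2)


def two_spin_cross_peak(r_angstrom: float, mixing_time_s: float,
                        tau_c_s: float = 5e-9, leakage_per_s: float = 0.33,
                        cutoff_angstrom: float = 5.5,
                        proton_mhz: float = 500.0) -> float:
    """Normalized NOESY cross-peak intensity for an isolated proton pair."""
    if r_angstrom > cutoff_angstrom:
        return 0.0  # pairs beyond the distance cutoff are ignored
    omega = 2.0 * math.pi * proton_mhz * 1e6
    j0 = spectral_density(0.0, tau_c_s)
    j1 = spectral_density(omega, tau_c_s)
    j2 = spectral_density(2.0 * omega, tau_c_s)
    scale = DIPOLAR_CONSTANT / r_angstrom ** 6
    sigma = scale * (6.0 * j2 - j0)                        # cross-relaxation rate
    rho = scale * (j0 + 3.0 * j1 + 6.0 * j2) + leakage_per_s  # auto-relaxation rate
    # Off-diagonal element of exp(-R * tm) for R = [[rho, sigma], [sigma, rho]].
    return -math.exp(-rho * mixing_time_s) * math.sinh(sigma * mixing_time_s)


# Hypothetical distances for the same proton pair in the two refined forms.
peak_alpha = two_spin_cross_peak(2.5, 0.100)
peak_beta = two_spin_cross_peak(4.0, 0.100)
peak_sum = peak_alpha + peak_beta  # spectra of the two forms added together
print(f"alpha: {peak_alpha:.4f}  beta: {peak_beta:.4f}  sum: {peak_sum:.4f}")
```

Because the intensity falls off steeply with distance, a cross-peak that is strong in one form and weak in the other appears in the summed spectrum at close to the intensity of the stronger contribution.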

RESULTS AND DISCUSSION
The imino proton spectrum of the duplex is shown in Fig. 1. The spectrum indicates that the resonances in the AT region are broadened due to exchange. The integration of the AT region corresponds to less than two base pairs. The GC iminos are similar to those of the parent duplex with the broad resonance at 12.5 parts/million being from a terminal base pair.
The proton-decoupled 31P spectrum of the duplex is also shown in Fig. 1. The resonances all appear in the region associated with B-form DNA, and there is relatively low resolution in the spectrum. Two resonances appear slightly downfield from the others. The most downfield resonance is from the phosphate between residues 16 and 17. Analogous downfield resonances have been previously observed from DNA duplexes containing abasic sites.
The one-dimensional proton spectrum of the non-exchangeable protons of the DNA is shown in Fig. 1. This spectrum was used to demonstrate that a one to one complex was formed by comparison with the spectra of the two individual strands that were combined to form the duplex.
The assignments of the protons of the DNA were made using the standard connectivities of B-form DNA. Fig. 2 contains the NOE connectivities of the aromatic and H1′ protons as well as the connectivities of the aromatic and H2′, H2″, and methyl protons. Most of the interresidue connections are indicated in Fig. 2. In the β configuration of the abasic site, the H1′ and H2″ protons are spatially close, and in the α configuration the H1′ and H2′ protons are spatially close. These NOEs were observed and allowed the assignment of the H1′, H2′, and H2″ resonances of the α and β forms of the abasic site. The assignments of the proton resonances and the assignments of the analogous duplex which does not contain an abasic site are available at the ftp site. NOEs involving the abasic site include those between A7H8 and D6H1′ (α), A5H2 and D6H1′ (α), and A7H8 and D6H2′, D6H2″ (β), as indicated in Fig. 2. The NMR results are consistent with our prior observations (8, 11) of essentially equal amounts of the α and β forms.

FIG. 5. The back calculated NOESY spectra for the NMR refined structures of the α and β forms of the DNA are shown as well as the sum of these two back calculated spectra. The experimental spectrum is also shown for comparison. This spectral region contains the aromatic to H2′, H2″, and methyl cross-peaks.
The NOEs involving the imino protons are shown in Fig. 3. A strong NOE is observed between A5H2 and the imino proton of T18. This indicates that A5-T18 is a good base pair and that it is A7-T16 which is disrupted by the presence of the abasic site. The NOE connectivities of the imino protons of the G residues involved in base pairs are those of B-form DNA.
The qualitative analysis of the NOE and other results indicates that the presence of the abasic site at position 6 disrupts the A7-T16 base pair but not the A5-T18 base pair. The NOEs indicate that the structure of the DNA duplex in the region near the abasic site depends on whether the abasic site is in the α or β configuration, since distinctly different NOEs are observed for the two forms. The results also indicate that the structure of the duplex DNA is quite similar to that of the analogous duplex at the base pairs more than one removed from the abasic site.
The NMR results were used to obtain refined structures for duplex DNA containing an abasic site by separately refining the structures and comparing the results predicted by each structure with the experimental results. The experimental results are actually the combination of the results from the α and β forms, so the results predicted by each form were combined and compared with the experimental data. Fig. 4 contains the NOE cross-peaks in the aromatic to H1′ region for the refined α structure, the refined β structure, and the combination of the α and β structures, and the experimental spectrum is shown for comparison. The differences between the predicted spectra of the α and β forms are highlighted in this figure. In this spectral region two of the main differences between the α and β forms are the A7H2-D6H1′ and the A5H8-G4H1′ cross-peaks, which are only found in the predicted spectrum of the α form. Fig. 5 contains the cross-peaks of the aromatic to H2′/H2″/methyl region for the refined α structure, the refined β structure, and the combination of the α and β structures, and the experimental spectrum is shown for comparison. The differences between the predicted spectra of the α and β forms are highlighted in this figure. In this spectral region, the A7H8-H2′ and A7H8-H2″ cross-peaks are only predicted by the β form structure, and the A17H8-T18 methyl cross-peak is only predicted for the α form structure. The two forms also differ in their predictions of some of the NOEs of the terminal residues.
These results show that the NOE spectrum back calculated from neither the α nor the β form alone offers good agreement with the NOEs observed from the protons of the abasic site, while the sum of the two back calculated spectra accounts for the NOEs of the abasic site. This is in accord with both forms being present in solution.
The two refined structures are shown in Fig. 6 with the minor groove prominent and Fig. 7 with the major groove prominent. The structure of the β form is much more "B-like" in the region near the abasic site than is the α form. In both cases A5-T18 is a good base pair while A7-T16 is not. In the α form the abasic site sugar is almost extrahelical and residues 5 and 7 are relatively close together. In the β form the spacing between the aromatic rings of residues 5 and 7 is comparable to that of B-form DNA. Thus, it appears that when the abasic site is in the β form the structural distortion induced by the abasic site is minimal, whereas in the α form there is a considerable distortion induced by the presence of the abasic site. Fig. 8 shows the overlay of structures consistent with the NMR data for the α and β forms.
A rationale for this difference can be ascribed to the hydrogen bonding potential of the abasic site. In the β form D6 and A17 can be bridged by an intervening water molecule which hydrogen bonds to both, as depicted in Fig. 9. The position of the water molecule was found by minimizing the energy of the water molecule while keeping the DNA fixed. The water molecule can form a bifurcated hydrogen bond to the OH and ribose ring oxygen of D6 as well as a regular hydrogen bond to the N1 of A17. The analogous hydrogen bonding is less likely to occur for the α form since positioning the 1′-OH of D6 would be accompanied by unfavorable steric interactions of the abasic site sugar with the ring of A7. The abasic site will be able to be the donor in hydrogen bonds to the ring nitrogen of dA and dC residues and the acceptor in hydrogen bonds to the imino protons of dG and dT residues. The hydrogen bonding interactions of the abasic site are likely to be important in determining the differences in the properties of DNA duplexes with different bases opposite the abasic site. The interaction of the α and β abasic sites with DNA polymerase and other enzymes may be different due to their different hydrogen bonding capabilities.
Additional studies to determine the structures of DNA duplexes with residues other than dA opposite the abasic site are underway.