Circular Structures in Retroviral and Cellular Genomes*

A computer program for predicting DNA bending from nucleotide sequence was used to identify circular structures in retroviral and cellular genomes. An 830-base pair circular structure was located in a control region near the center of the genome of the human immunodeficiency virus type 1 (HIV-1). This unusual structure displayed relatively smooth planar bending throughout its length. The structure is conserved in diverse isolates of HIV-1, HIV-2, and simian immunodeficiency viruses, which implies that it is under selective constraints. A search of all sequences in the GenBank data base was carried out in order to identify similar circular structures in cellular DNA. The results revealed that the structures are associated with a wide range of sequences that undergo recombination, including most known examples of DNA inversion and subtelomeric translocation systems. Circular structures were also associated with replication and transposition systems where DNA looping has been implicated in the generation of large protein-DNA complexes. Experimental evidence for the structures was provided by studies which demonstrated that two sequences detected as circular by computer preferentially formed covalently closed circles during ligation reactions in vitro when compared to nonbent fragments, bent fragments with noncircular shapes, and total genomic DNA. In addition, a single T → C substitution in one of these sequences rendered it less planar as seen by computer analysis and significantly reduced its rate of ligase-catalyzed cyclization. These results permit us to speculate that intrinsically circular structures facilitate DNA looping during formation of the large protein-DNA complexes that are involved in site- and region-specific recombination and in other genomic processes.

Sequence-directed bending of DNA is one factor that causes local variations in the structure of genomes (reviewed by Crothers et al. (1990), Trifonov (1991), Hagerman (1992), and Harrington (1992)). The major determinant of bent DNA arises from oligo(A) tracts, which deflect the axis of the helix from a straight line. In bent sites, the tracts are spaced near the helical repeat of DNA, which causes their deflections to sum, producing curvature of the DNA axis. Studies of bent DNA have most often focused on simple curved segments that are less than 100 bp in length. However, A tract bending has the potential to give rise to complex higher order structures, which may be important in regulating genome packaging and function. For example, the spatial organization of bending elements and straight DNA segments in replication origins and promoters is thought to be an important feature in the initiation of replication (Eckdahl and Anderson, 1987, 1990) and transcription (McAllister and Achberger, 1989; Pérez-Martin, 1994). Similarly, a multi-domain structure of nucleosome positioning DNA has been proposed to be a determinant for the deposition of nucleosomes at specific sites in chromatin DNA (Drew and Travers, 1985; Fitzgerald et al., 1994). It is generally assumed that intrinsic curvature facilitates the wrapping of DNA around regulatory and structural protein complexes, and it seems likely that the intrinsic shape of the curved DNA plays some role in the assembly mechanism. Thus, a description of higher order DNA structures beyond the level of simple curved loci may be needed for a complete understanding of the biological relevance of bent DNA.
Small planar DNA circles produced by ligation of synthetic curved oligonucleotide precursors have served as model systems for the study of one complex higher order structure in DNA (Ulanovsky et al., 1986; Koo et al., 1990; Crothers et al., 1992). To our knowledge, the only comparable structures that have been found in natural DNA are in kinetoplast minicircles of the trypanosome Crithidia fasciculata. These sequences have been shown to exist as circular structures by electron microscopy (Griffith et al., 1986). In this report, computer programs for calculating bending from nucleotide sequence were used to identify circular structures of various sizes in cellular and retroviral genomes. Ligation studies were then carried out in order to provide experimental evidence for the structures. The results show that large circular structures are associated with a wide range of DNA recombination systems, and we suggest that this unusual structure plays a general role in DNA looping.

EXPERIMENTAL PROCEDURES
Computer Analysis-The standard program for calculating DNA bending from nucleotide sequence has been previously described (Eckdahl and Anderson, 1987). The program incorporates the dinucleotide twist angles reported by Kabsch et al. (1982) and uses the parameters of the wedge model for bent DNA (Ulanovsky and Trifonov, 1987). The program assumes that bending occurs only at AA/TT and uses wedge values estimated by Ulanovsky and Trifonov (1987). A derivative program, which incorporates wedge values for all 16 dinucleotides (Bolshoy et al., 1991), has also been described previously (Wang et al., 1994) and was used for the analysis in Fig. 1E. The programs calculate three-dimensional coordinates of the helical axis along a sequence and then compute the ENDS ratio from these coordinates. The ENDS ratio is used to quantify predicted bending and is defined as the ratio of the contour length of a DNA segment to the shortest distance between the ends of the segment.
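The ENDS ratio definition above can be illustrated with a minimal sketch. It assumes the helical-axis coordinates are already available as a list of (x, y, z) points; the function name `ends_ratio` and the two toy paths are hypothetical and are not taken from the original program, which derives its coordinates from wedge and twist angles.

```python
import math

def ends_ratio(axis_points):
    """Contour length of the path divided by the straight-line distance
    between its first and last points (hypothetical helper)."""
    contour = sum(math.dist(p, q) for p, q in zip(axis_points, axis_points[1:]))
    end_to_end = math.dist(axis_points[0], axis_points[-1])
    if end_to_end == 0:
        return math.inf  # the ends coincide, as in a perfectly closed path
    return contour / end_to_end

# Toy axis 1: a straight segment of 10 steps, 3.4 length units each.
straight = [(0.0, 0.0, 3.4 * i) for i in range(11)]

# Toy axis 2: a planar half circle of unit radius sampled at 101 points.
half_circle = [(math.cos(math.pi * i / 100), math.sin(math.pi * i / 100), 0.0)
               for i in range(101)]

print(ends_ratio(straight))     # close to 1 for an unbent path
print(ends_ratio(half_circle))  # close to pi/2 for a half circle
```

A straight path gives a ratio near 1, and the ratio grows without bound as the two ends of a bent path approach each other, which is why near-circular segments appear as very high peaks.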
ENDS ratios were computed at the indicated window widths for sequences from release 73 of the GenBank data base (Benson et al., 1994). Details of the strategies used for analysis of both the data base and random sequences have been described previously (VanWye et al., 1991) and are outlined under "Results." The subdivisions of the GenBank library that were used in the studies shown in Fig. 3 were phage, bacterial, invertebrates, plants, organelles, nonmammalian vertebrates, rodents, primates, other mammals, and viruses. Sequences from the structural RNA, synthetic, and unannotated categories were omitted. ENDS ratio analyses were performed on all sequences with lengths greater than 1000 bp. A total of 10,823 sequences (37% of the library) were analyzed. Upon request, the sequence locus names, nucleotide positions, and lengths of the highest 100 ENDS ratio peaks in each subdivision of the library will be provided. The accession numbers of the retroviral sequences used in Fig. 2 are given in Table I of Bronson and Anderson (1994). The accession numbers of sequences A-W from Fig
General DNA Techniques and Electrophoresis-Standard molecular biological techniques were used for DNA manipulations (Sambrook et al., 1989). Double-stranded DNA sequencing was performed by the dideoxy chain termination method (Sanger et al., 1977) using a Sequenase kit from U. S. Biochemical Corp. Oligonucleotides were synthesized by the Purdue University Oligonucleotide Synthesis Facility. Unless otherwise indicated, DNA fragments were analyzed on 6% native polyacrylamide gels (29:1 acrylamide/bisacrylamide) run at 4°C in TAE (0.04 M Tris acetate (pH 8.3), 3 mM EDTA) or on agarose gels run at room temperature in TAE in the presence of 1 μg/ml ethidium bromide. The apparent size of fragments on polyacrylamide was estimated by comparing mobilities of fragments to a 123-bp ladder (Life Technologies, Inc.) and to HaeIII digests of pUC18.
Preparation of DNA Molecules and Ligation Reactions-An EcoRI fragment corresponding to the circular structure detected by computer in intron D of the RNA polymerase II gene from Caenorhabditis elegans (accession no. M29235) was prepared by PCR. Plasmid DR36 (Bird and Riddle, 1989) containing this segment was kindly provided by Dr. David Bird (University of California, Riverside). An XhoI/BamHI fragment from the plasmid was used as a template to amplify the 192-bp segment (bp 3809-4000), and the primers were designed such that the PCR product could be inserted into the EcoRI site of pUC18. EcoRI-digested recombinant plasmids isolated from transformed Escherichia coli (DH5α) were analyzed by electrophoresis on polyacrylamide gels and then by DNA sequencing in order to identify wild-type and mutant sequences. Of the 32 characterized plasmids, one carried a point mutation; we presume that it was introduced during PCR. The mutant differs from the wild-type sequence by a single T → C substitution at position 3856. Ligation products of wild-type and mutated fragments were compared to the ligation products of EcoRI fragments from plasmid pSV2 cat (Gorman, 1985) (bp 4752 to 5006) and pJA71-2. pJA71-2 is a 213-bp fragment from an exon of the RNA polymerase II gene (bp 6332 to 6530). Ligations were also carried out with bent satellite monomers from Tenebrio molitor (Plohl et al., 1990), Panagrellus redivivus (de Chastonay et al., 1990), and Columba risoria. Satellite monomers were isolated from EcoRI-digested genomic DNA and cloned in pUC18. Genomic DNA was isolated as described previously (VanWye et al., 1991).
Gel-purified EcoRI fragments were dephosphorylated with calf alkaline phosphatase and then end-labeled with [γ-32P]ATP using T4 polynucleotide kinase as described by Drak and Crothers (1991). Unless indicated otherwise, labeled DNAs (1.0 μg/ml) mixed with nonradioactive EcoRI-digested recombinant plasmids (350 μg/ml) were equilibrated at 5°C in 30 mM Tris (pH 7.8), 10 mM MgCl2, 10 mM dithiothreitol, and 1 mM ATP. These high DNA concentrations favor multimerization of noncircular fragments (Zahn and Blattner, 1987). Unless otherwise indicated, ligation was initiated by the addition of T4 DNA ligase (Promega) to a final concentration of 5 units/ml. Samples were removed at the indicated times and added to stop buffer (66 mM EDTA, 5% glycerol, 0.05% bromphenol blue) in order to quench the reaction. One ligated sample set was treated with exonuclease III (ExoIII; 0.5 units/μl) for 1 h at 37°C prior to its addition to stop buffer. Ligated products were resolved on 2% agarose gels in TAE containing ethidium bromide (Shore et al., 1981). Gels were subsequently dried and autoradiographed.
The circular structure that was detected by computer in the phage genome (bp 870-1750) was amplified and cloned as described above using BamHI-digested DNA as the template in the PCR. The ligation products of this 880-bp segment were compared to the ligation products of control (noncircular) EcoRI fragments from the troponin I (Koppe et al., 1989) and α-prothymosin (Schmidt and Werner, 1991) genes, which are 700 and 1192 bp in length, respectively. EcoRI-digested recombinant plasmids (20 μg/ml) were equilibrated at 0-1°C in 30 mM Tris (pH 7.8), 10 mM MgCl2, 10 mM dithiothreitol, 1 mM ATP, and 1.8 M sucrose. Ligation was initiated by the addition of T4 DNA ligase. The final concentration of ligase was 10 units/ml unless otherwise indicated. In some experiments, 10-μl samples of the ligation reaction were removed at 3-120 min after the addition of ligase, mixed with stop buffer, and electrophoresed on 1.3% agarose gels containing ethidium bromide. DNA in the gels was transferred to nitrocellulose, and the blots were probed with nick-translated insert DNAs. In other experiments, including the one shown in Fig. 11, ligation reactions contained all three digested plasmids (20 μg/ml each), and 0.5-ml samples were removed at the indicated times and added to 1 ml of ethanol. Following precipitation, DNA was dissolved in electrophoresis sample buffer and resolved on 1.3% agarose gels containing ethidium bromide. Photographic negatives of the gels and blots were scanned by computer, and DNA forms were quantified using the NIH Image 1.52 program. Open circular, closed circular, and linear forms were identified by comparing the gel mobilities of DNA samples treated and not treated with ExoIII, by the restriction nuclease test described in Fig. 2 of Shore et al. (1981), and by polyacrylamide gel electrophoresis (Ulanovsky et al., 1986) (data not shown). Photographic negatives of ethidium bromide-stained gels are presented in Figs. 5, 8, and 11.
(gag, pol, and env) and the locations of regulatory elements in the vicinity of the integrase gene at the end of pol. These elements include a transcriptional enhancer with two subdomains containing AP-1 and NF-κB binding sites (Verdin et al., 1990; Van Lint et al., 1991) and the central polypurine tract, which serves as an initiation site for synthesis of the plus strand HIV DNA (Hungnes et al., 1991; Charneau et al., 1992; Hungnes et al., 1993). A single-stranded gap that flanks this tract is seen in unintegrated copies of the linear double-stranded viral DNA (Charneau and Clavel, 1991). A major DNase I-hypersensitive site is found between the enhancer subdomains when the provirus is packaged into cellular chromatin (Verdin, 1991).

Computer Analysis of Retroviruses-Computer-generated helical projections of the HIV-1 proviral DNA are shown in Fig. 1A. The standard program that produced the projections is based on a model of intrinsic curvature, which assumes that DNA bending arises solely from AA/TT stacks. Deflections from linearity are quantified by the ENDS ratio, which is defined as ℓ/d, where ℓ is the contour length of a DNA segment and d is the shortest distance between its ends (Eckdahl and Anderson, 1987). In panel B, ENDS ratios were computed at window widths of 20-200 bp (top), 20-400 bp (middle), and 20-1000 bp (bottom) so that the corresponding short, intermediate, and long bent segments could be characterized in a single analysis. At the shorter window widths, the major site of bending occurred in the first 100 bp of reverse transcriptase. Helical projections of this segment are shown in Fig. 1C. The structure is characterized by an ENDS ratio value of 1.30 at a window width of 120 bp. This value is greater than 4 standard deviations above the mean of random and GenBank data base sequences. The site consists of two 40-bp regions of bending centered at nucleotides 2689 and 2757. The bending elements in the two regions are out of phase with each other by about 5 bp, which produces an S-shaped molecule. This structure is similar to the S-shaped configuration that is characteristic of nucleosome positioning DNA. Minor DNase I-hypersensitive sites, indicative of phased nucleosomes, have been mapped immediately adjacent to this structure on the 3′ side (Verdin, 1991). At the intermediate window widths, regions of bending are noted at the gag-pol overlap, and at the 3′ ends of pol and env. These deflections, as well as minor deflections at the 5′ ends of gag and env, can readily be seen in the helical projections shown in panel A. A single unusually high ENDS ratio peak with a value of 73 is seen at the 830-bp window. The computer-generated structure corresponds to a circle of 830 bp (Fig. 1, A and D).
The circular structure encompasses most of the integrase gene at the end of pol, and its central position at nucleotide 4800 is 22 bp away from the internal polypurine tract. The circular structure was also seen when the sequence was analyzed by a computer program that incorporates wedge angles for each of the 16 dinucleotides (Fig. 1E).
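As an idealized illustration of why a near-circle produces such a high peak, consider a planar arc of constant curvature. This is a simplifying assumption introduced here, not the wedge-model calculation used by the programs; the radius r and the total subtended angle θ are symbols introduced only for this sketch, while ℓ and d are the contour length and end-to-end distance defined above.

```latex
\[
\ell = r\theta, \qquad d = 2r\sin\!\left(\frac{\theta}{2}\right)
\quad\Longrightarrow\quad
\frac{\ell}{d} = \frac{\theta}{2\sin(\theta/2)} .
\]
```

Under this assumption a half circle (θ = π) gives ℓ/d = π/2 ≈ 1.57, and the ratio diverges as θ approaches 2π. Read directly from the definition, an ENDS ratio of 73 for an 830-bp segment means that the predicted ends lie about 830/73 ≈ 11 bp of contour length apart.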
FIG. 1. The expanded region at the bottom of the map (bp 4000-5500) gives the positions of AP-1 (●) and NF-κB binding sites (○), which are located within the two subdomains of the intragenic transcriptional enhancer (Verdin et al., 1990; Van Lint et al., 1991). Also shown are the positions for the internal DNase I-hypersensitive site and the central polypurine tract. The position of the circular structure (see below) is indicated by the heavy line above I. A, two-dimensional projections of the helical axis of the HIV-1 proviral DNA. Each successive projection is rotated by 60 degrees. The positions of the 5′ end of the proviral DNA, the start (S) and end (E) of Gag (G), Pol (P), and Env (E), and the circular structure at the end of Pol (C) are indicated along the projections. B, ENDS ratios were computed at window widths of 20-200 bp (top), 20-400 bp (middle), and 20-1000 bp (bottom) at a window step of 10 bp along the sequence. C, two-dimensional projections of the S-shaped bend at the beginning of RT (nucleotides 2639-2807). D and E, two-dimensional projections of the circular structure in I. Each projection in C-E is rotated by 36 degrees. The projections and ENDS ratios in A-D were calculated by the standard program, while the projections in E were calculated by a program that incorporates all 16 dinucleotide wedge angles.
In order to assess the conservation of the bent structures in retroviruses, the 60 full-length retroviral sequences from release 73 of the GenBank data base were examined as in Fig. 1. In this collection, each of the known retroviral lineages is represented by multiple sequences. Fig. 2A shows the positions and types of bending elements along the genomes of the primate lentiviruses HIV-1, HIV-2, and SIV. The ENDS ratio peaks that characterize each bending element are more than 2 S.D. above the mean of randomized sequences (Fig. 2, see legend).
Bending occurs most often at the beginnings and ends of the major open reading frames in essentially all of the sequences, and this preference is especially prominent at the intermediate window widths. Likewise, the bent sites in the genomes of other retroviral lineages not shown in the figure tended to occur at the beginning and/or at the ends of the major open reading frames, although the positions of the sites varied in the different viral groups (data not shown). The most striking result in Fig. 2 is the conservation of the circular structure in the integrase gene. The structure was seen in all isolates of HIV-1, HIV-2, and SIV, although the magnitudes of the ENDS ratio peaks in some SIV isolates were not as high as those seen in the human immunodeficiency viruses (Fig. 2A). Helical projections of the integrase sequence from a virus representing each group are presented in Fig. 2 (B-D), which shows that each structure displays relatively smooth planar bending throughout its length. Near planar bending was also noted in the homologous regions of integrase in some nonprimate lentiviruses including the prototype virus visna (data not shown). No comparable structures were observed in the 17 sequences from oncoviruses or spumaviruses, which are the other two subfamilies of retroviruses. The circular structures in the primate lentiviruses had an average ENDS ratio of 50 with circumferences that ranged from 780 to 1060 bp. The center of the structure in every genome was within 60 bp of the internal polypurine tract. The conservation of the predicted structure was of interest because there is as much as 40% divergence in the nucleotide sequence of the circular regions shown in Fig. 2 (B-D; see legend), and the divergence is even greater among the integrase sequences from the viruses depicted in Fig. 2A.
This is significant because insertions or deletions of as few as a single bp in the central 200-bp regions, or a small number of random substitutions (less than 5%), can abolish the computer-generated circular projections of the sequences shown in Fig. 2 (B-D) (data not shown). In addition, the structures are heterogeneous in size and shape as seen in Fig. 2 (B-D). Consequently, the exchange of homologous segments of the integrase gene from HIV-1, HIV-2, and SIV failed to generate circular structures of the types shown in Figs. 1 and 2 (data not shown). These results imply that selective pressures have maintained the integrity of the circular structure despite extensive sequence divergence.
FIG. 3. ENDS ratio values of all sequences in the data base at each window step (×) and the mean of the highest ENDS ratio peaks per data base sequence (+) are shown for the indicated window widths. At a window width of 1500 bp, the corresponding ENDS ratios were 1.7 and 2.9. Each of the 135 sequences that displayed an ENDS ratio peak >15 is indicated by a symbol in the top portion of the graph, and the window width at which the peak maximum was observed is given on the x axis. Symbols represent peak centers found in untranslated DNA (○), protein coding DNA (●), coding DNA within 200 bp of the end of a gene (□), and the integrase regions of HIV-1, HIV-2, or SIV (*). The two structures indicated by the arrows are described under "Results."
Computer Analysis of Cellular Sequences-A search of all sequences in the GenBank data base containing >1000 nucleotides was carried out as in Fig. 1B in order to estimate the frequency of high ENDS ratio peaks in cellular DNA. The results in Fig. 3 show ENDS ratio values of the highest peaks plotted against the window widths that gave the maximal peak values. Data base sequences were scanned at window widths from 20 to 1100 bp, and an ENDS ratio value >15 was used as the criterion for a peak. This value is greater than 20 standard deviations above the mean of the data base sequences and greater than 10 standard deviations above the mean of the highest ENDS ratio peak in each data base sequence (Fig. 3). No ENDS ratio peaks >10 were seen when 1500 kbp of randomized sequences were subjected to the analysis, which also illustrates the unusual nature of the peak structures in natural DNA. A total of 135 peaks were observed in the 10,823 data base entries, but this frequency is an overestimate of circular structures of the type shown in Fig. 2 since visual inspection of the helical projections revealed that only about half of the peak sequences displayed planar bending throughout their lengths (data not shown).
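The window scan described for the data base search can be sketched as follows. This is an illustrative outline under stated assumptions, not the original program: the helical-axis coordinates are taken as given (one point per base pair), the functions `ends_ratio`, `scan_peaks`, and `toy_axis` are hypothetical names, and the toy axis is a synthetic planar path rather than a coordinate set derived from a real sequence. The threshold of 15 and the 10-bp window step follow the values quoted in the text.

```python
import math

def ends_ratio(points):
    """Contour length divided by end-to-end distance (hypothetical helper)."""
    contour = sum(math.dist(p, q) for p, q in zip(points, points[1:]))
    gap = math.dist(points[0], points[-1])
    return math.inf if gap == 0 else contour / gap

def scan_peaks(axis, widths, step=10, threshold=15.0):
    """For each window width, find the window with the highest ENDS ratio
    and keep it as a peak if the ratio exceeds the threshold.
    Returns a list of (ratio, start_index, width) tuples."""
    peaks = []
    for w in widths:
        best = None
        for start in range(0, len(axis) - w, step):
            r = ends_ratio(axis[start:start + w + 1])  # w steps, w + 1 points
            if best is None or r > best[0]:
                best = (r, start, w)
        if best is not None and best[0] > threshold:
            peaks.append(best)
    return peaks

def toy_axis(n_straight=200, n_curved=800, turn=0.98 * 2 * math.pi):
    """Synthetic planar path: straight, then an almost closed loop, then straight.
    Every step has unit length; this is an assumption made for illustration only."""
    x = y = heading = 0.0
    pts = [(x, y, 0.0)]
    for i in range(2 * n_straight + n_curved):
        if n_straight <= i < n_straight + n_curved:
            heading += turn / n_curved
        x += math.cos(heading)
        y += math.sin(heading)
        pts.append((x, y, 0.0))
    return pts

axis = toy_axis()
for ratio, start, width in scan_peaks(axis, widths=range(100, 1001, 100)):
    print(f"peak: ratio={ratio:.1f} start={start} width={width}")
```

In this toy case only the window width that matches the length of the looped segment exceeds the threshold, which mirrors the way each data base peak is reported together with the window width that gave its maximal value.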
All 18 peak sequences with lengths between 160 and 580 bp were found in untranslated regions of the genome. We presume these elements are excluded from coding DNA because the closely spaced oligo(A/T) tracts needed to generate such peaks would impose severe coding constraints. Peak sequences with lengths from 620 to 1100 bp also displayed a preference for untranslated DNA, although significant fractions resided in sites that overlap coding and noncoding regions or in more central areas of coding DNA.
Three groups of sequences are overrepresented in Fig. 3 when compared to their frequencies in the GenBank data base. First, 20 of the 135 sequences are found in the integrase region of primate immunodeficiency viruses, and the highest data base peak (ENDS ratio = 392) was from this region of HIV-1 (MN isolate). Second, 11 of the peak sequences were found in untranslated DNA from C. elegans, and these sequences were particularly prevalent among the shorter peak sites uncovered by the search. This overrepresentation is consistent with the observation that the genome of this nematode is highly bent (VanWye et al., 1991). Third, a significant fraction (17/50) of the remaining peak sequences longer than 800 bp were associated with a class of molecules which includes known mobile elements and related sequences. Helical projections of sequences that constitute this group are shown in Fig. 4.
Alterations in gene expression caused by inversion of DNA segments have been extensively studied in bacteria and in yeast (reviewed by Cox (1988), Glasgow et al. (1988), Stark et al. (1992), and Van de Putte and Goosen (1992)). In these systems, a recombinase promotes strand cleavage and rejoining within inverted repeat sequences that flank the invertible element. Efficient inversion in bacteria also requires a recombinational enhancer. The enhancer and the gene for the recombinase are typically located within, or adjacent to, the invertible element. Fig. 4 (A-D) shows helical projections of four invertible systems that were detected as ENDS ratio peaks in the data base search. In each system, the circular structure terminates within 200 bp of at least one inverted repeat sequence, and the structures in the two bacterial sequences terminate within 200 bp of a recombinational enhancer. The 2.1-kbp invertible DNA segment that controls pilin phase variation in Moraxella bovis is particularly striking since it consists of two circular structures containing the two pilin genes (Q and I), which are separated by a straight stretch of DNA (Fig. 4D). In order to determine if DNA circularity is a characteristic of invertible systems, we examined the inversion regions responsible for the well characterized phase variation of flagellin genes in Salmonella typhimurium and fimbrial protein genes in E. coli. The invertible DNA segments in both systems appear as C-shaped structures as shown in Fig. 4 (E and F). The cin invertase gene in phage P1 and P7 and the pin invertase gene from E. coli also appeared as C-shaped structures when viewed by the computer program for bending (data not shown).
DNA looping is thought to facilitate the assembly of complex nucleoprotein structures during synapse formation in inversion systems (reviewed by Schleif (1988), Wang and Giaever (1988), Echols (1990), and Matthews (1992)). An analogous looping mechanism has been implicated in the transposition of the phage Mu transposon (Surette et al., 1989; Heichman and Johnson, 1990; Mizuuchi, 1992; Surette and Chaconas, 1992). As shown in Fig. 4G, the A gene for the Mu transposase and the B accessory transposition gene are contained within a circular structure, which is adjacent to the transpositional enhancer. Initiation of replication in plasmid R6K utilizes a DNA looping mechanism, which is mediated by the π initiator protein (Mukherjee et al., 1988a, 1988b). The protein binds to multiple enhancer sites in the γ-ori, and the resulting DNA-protein complex then loops to and activates the β-ori located ~1.2 kbp downstream. A circular structure is seen at the site of loop formation in the intervening DNA between the γ- and β-oris, and its 5′ end terminates in the γ-enhancer element (Fig. 4H). Class switching of immunoglobulin heavy chain constant regions (CH) occurs through a recombinational event in switch regions located 5′ to each CH (reviewed by Coffman (1993)). The most common form of switching involves the looping out and excision of chromosomal DNA between the most 5′ switch region (S) and one of the switch regions located further downstream along the chromosome. Switch recombinational sites are clustered within a region at the 5′ edge of the S segment, and this region is immediately downstream of a circular structure in the mouse genome (Fig. 4I).
Recombination events near the ends of linear chromosomes are responsible for antigen variation in the African trypanosome (Van der Ploeg et al., 1992; Pays et al., 1994), in the spirochete Borrelia hermsii (Barbour, 1993), and in the African swine fever virus (de la Vega et al., 1990; González et al., 1990). These organisms presumably evade the immune system of their mammalian hosts by periodically switching the expression of members of multigene families that code for surface antigens. In the trypanosome and spirochete, a transcriptionally silent copy of a surface protein gene is typically duplicated and transferred unidirectionally from a distal site in the genome to a transcriptionally active telomeric expression locus. Similarly, duplication and translocation of sequences located near the left end of the 170-kbp swine viral genome are thought to have played a role in generating the antigen variation seen among different isolates of the virus (González et al., 1990). As shown in Fig. 4 (J and K), circular structures were noted in the expression locus several kilobase pairs upstream from the surface antigen genes in the spirochete and the trypanosome. Comparable O-shaped structures (Fig. 4L), as well as C-shaped structures (data not shown), were also noted within 1 kbp upstream of the antigen genes in both systems. Both locations have been implicated in recombinational events (Pays et al., 1989, 1994; Barbour et al., 1991; Barbour, 1993). Multiple unusual structures also characterized the left end of the swine virus genome. O-shaped (Fig. 4, M and N) and C-shaped (data not shown) structures were seen in this region at locations corresponding to sequences that are the most variable among viral isolates (see Fig. 7 in De La Vega et al.).
Type II introns encoding reverse transcriptase (RT introns) are common in the organelles of lower eukaryotes. These sequences are thought to be mobile elements, although the mechanism of their transposition is unclear (Lambowitz and Belfort, 1993). The data base search in Fig. 3 detected four RT introns from organelles, which are shown in Fig. 4 (O-R). The RT sequence from the carnation etched virus also displayed a circular structure as seen in Fig. 4S. This virus is thought to be more closely related to transposons than to plant viruses (Doolittle et al., 1989). Helical projections of the 34 RT introns and related sequences analyzed by Mohr et al. (1993) were assessed in order to determine whether DNA circularity is a conserved feature of this group. The results revealed that about 80% of the sequences displayed either O-shaped or C-shaped structures (data not shown). Organelles from plants and lower eukaryotes also contain mobile elements in the form of small plasmids whose origins are poorly understood (Ray et al., 1987; Kempken et al., 1992). The data base search identified four of these sequences with circular structures in the organelle subdivision of GenBank (Fig. 4, T-W).
FIG. 4. Two-dimensional projections of the helical axis of high ENDS ratio peaks. Each peak sequence (A-W) is represented by two helix projections that are rotated 90 degrees. All sequences were detected as high ENDS ratio peaks (>15) in the search from Fig. 3 except the C-shaped structures in E and F and the structures in G and U, which are longer than 1100 bp. Top panels, sequences from inversion and looping systems: A, min system from plasmid 15 B of E. coli (Sandmeier et al., 1991); B, FLP system for inversion of the 2-μm plasmid of Saccharomyces cerevisiae (Cox, 1989); C, FLP system for inversion of a plasmid from Zygosaccharomyces bailii (Utatsu et al., 1987); D, pilin inversion system from M. bovis (Fulks et al., 1990); E, hin system for flagellar variation in S. typhimurium (Szekely and Simon, 1983); F, fim system for fimbrial variation in E. coli (Abraham et al., 1985); G, genes A and B coding for the phage Mu transposase and accessory gene (Harshey et al., 1985); H, plasmid R6K replication-looping system (Kelly and Bastia, 1991); I, the CH switch region (S) from mouse (Sakano et al., 1980). The maps adjacent to each set of projections show the positions of the circular elements (--), recombinase cleavage sites, enhancers (●), and genes (□). Heavily lined boxes indicate genes for site-specific recombinases. The recombinational enhancers bind either Fis (A, D, and E) or Mu transposase A (G), and the replication enhancer in H binds the π initiator protein. Note that a circular structure terminates within 200 bp of each enhancer. Multiple recombination sites in I are located in the region between the two arrows (Matsuoka et al., 1990). Scale bars are given in the right lower corner of each panel. Middle panels, sites within telomeric recombination systems located: J, in a VMP expression site of Borrelia hermsii (Barbour et al., 1991); K and L, in a VSG expression site of Trypanosoma brucei (Pays et al., 1990; Boothroyd and Cross, 1982); M and N, in the left-terminally located variable region of the swine fever virus (González et al., 1990). Bottom panels, mitochondrial plasmids and other reverse transcriptase coding entities from: O-R, Type II (RT) introns (Wahleithner et al., 1990; Wissinger et al., 1991; Lazowska et al., 1980; Siemeister et al., 1990); S, ORF5 of the carnation etched virus (Hull et al., 1986); T and U, S-1 and S-2 plasmids from CMS-S Zea mays mitochondria (Paillard et al., 1985; Levings and Sederoff, 1983); V and W, kinetoplast plasmid minicircles of Leishmania tarentolae and Crithidia fasciculata (Kidane et al., 1984; Ray et al., 1986). The coordinates of the ENDS ratio peaks in sequences A-W are given in "Experimental Procedures."
It is unlikely that the association of circular structures with mobile elements and related systems is coincidental since the known sequences that make up this group are expected to represent only a small fraction of the total data base entries. The preferential detection of these elements in the computer search can be clearly seen by considering the identity of the sequences with the highest ENDS ratio peaks from Figs. 3 and 4. Nineteen of the circular structures shown in Fig. 4 have ENDS ratio values that are greater than 40. This number is significant since there were only 48 sequences with ENDS ratio peaks greater than 40 in the entire GenBank data base and 8 of these were found in the integrase region of the primate immunodeficiency viruses (Fig. 3). The specificity of the computer search for a discrete subset of the data base can be further illustrated by a consideration of all peak sequences that were detected among the 3643 sequence entries that constitute the viral, organelle, and bacterial subdivisions of GenBank. There were 11 viral sequences and 5 sequences from organelles with ENDS ratio peaks >40. All of these structures are shown in Fig. 2 (top panel) and Fig. 4 (O, P, T, V, and W).
Five of the 11 peak structures from the bacterial data base with ENDS ratio peaks >40 are also shown in Fig. 4. Of the remaining six structures from bacteria that are not shown in the figure, two were associated with ori sequences, one with a toxin gene, and three with genes that encoded pilin-like proteins.² Three of the four genes were from pathogens, as were most of the bacterial recombinational systems shown in Fig. 4. To our knowledge, these genes and ori sequences have not been tested for recombinational or looping activities. Many of the bacterial sequences with ENDS ratio peaks between 15 and 40 were also from pathogens, and the circular structures were most often found in or near genes that encode toxins, membrane proteins, or proteins that form filamentous appendages. Pathogenic bacteria frequently display phenotypic variations of their surface components and toxins, and genetic recombination is a prevalent mechanism that mediates these phenotypic changes (reviewed by Robertson and Meyer (1992)). Thus, it is conceivable that these systems also exhibit phenotypic variation mediated by recombination, and the circular DNA structures could play a role in the process.
Experimental Analysis of Selected Structures-Two high ENDS ratio peaks with lengths of less than 200 bp were identified in the search of the GenBank data base (see Fig. 3). One sequence was from kinetoplast minicircles of C. fasciculata, and this K-DNA has been visualized as a circular structure in electron micrographs (Griffith et al., 1986). This sequence appeared as a nearly perfect circle when analyzed by the computer program (Fig. 4W). The other sequence was found within intron D of the RNA polymerase II gene of C. elegans. Experiments were performed using this intron segment in order to provide additional evidence for the existence of small circles in natural DNA. Fig. 5 gives some of the characteristics of the EcoRI fragments used in the work. Two of the fragments (pJA71-2 and pSV2CAT) have electrophoretic properties and ENDS ratio values that are within the ranges of those seen for the bulk of natural DNA (Anderson, 1986; Eckdahl and Anderson, 1987; VanWye et al., 1991; Fitzgerald et al., 1994). The three satellite sequences display significant bending, and the sequence from dove has nearly as many phased oligo(A) tracts as are found in K-DNA and in the C. elegans intron segment. However, the dove satellite monomer is composed of two bent loci, which are out of phase with each other by approximately one-half helical turn. This gives rise to the computer-generated S-shaped structure shown in the figure. The 198-bp intron segment has an R_L value (apparent chain length/real chain length) which is similar to that of the K-DNA sequence. The magnitude of the electrophoretic anomaly displayed by these segments is greater than those reported for all other natural sequences of this length. The temperature dependence of the gel anomaly was also similar to K-DNA. The R_L values for both sequences declined by about 50% upon increasing the temperature from 5°C to 35°C (Diekmann and Wang, 1985; Fig. 5).
Mutations in the intron sequence were recovered following PCR amplification and cloning. The computer-generated structure of one sequence carrying a point mutation that converts a T6 tract to TCT4 is shown in the figure. The helical projections of this sequence are less planar than those displayed by the wild-type sequence, and this difference is reflected in a reduction in the ENDS ratio and a small but reproducible reduction in electrophoretic anomaly.
A comparison of the cyclization of the DNA sequences shown in Fig. 5 should provide a stringent test for the predictive power of the computer program used in this study. In Fig. 6, DNA fragments were end-labeled and then incubated with T4 DNA ligase at 4°C for the indicated times in the presence of an excess of nonradioactive digested plasmid DNA. Following incubation, samples were electrophoresed on agarose gels in the presence of ethidium bromide in order to resolve circular and linear species (Shore et al., 1981). Inspection of the resulting autoradiograms revealed that the control and satellite DNA molecules preferentially formed linear multimers during the ligation, and nearly all of the radioactivity was sensitive to ExoIII digestion upon termination of the experiment. In contrast, both the wild-type and mutant circular structures from intron D preferentially formed closed monomer circles that were resistant to ExoIII digestion. In addition, the rate of cyclization of the wild-type sequence was 3.2 ± 0.4-fold faster than that of the mutant sequence, as revealed by four replicate gel experiments of the type shown in Fig. 6, and 2.7-fold faster, as seen from the ExoIII experiments shown in Fig. 7.
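The 2.7-fold figure from the ExoIII experiments can be checked against the two cyclization rate constants quoted in the legend to Fig. 7. The following is a minimal sketch, assuming simple first-order loss of linear monomer; the function names are illustrative and are not taken from the original work.

```python
import math

# Cyclization rate constants k1 quoted in the legend to Fig. 7 (units: s^-1).
K_WILD_TYPE = 5.9e-4
K_MUTANT = 2.2e-4


def fraction_unreacted(k, t_seconds):
    """Fraction of linear monomer left at time t, assuming first-order decay."""
    return math.exp(-k * t_seconds)


def half_time(k):
    """Time (s) at which half of the monomer has cyclized under the same assumption."""
    return math.log(2) / k


# Ratio of the two rate constants, rounded to one decimal place.
rate_ratio = K_WILD_TYPE / K_MUTANT
print(round(rate_ratio, 1))  # 2.7
```

Under this assumption the ratio of the rate constants reproduces the 2.7-fold difference reported for the ExoIII experiments.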
Additional evidence for the preferential cyclization of the intron segment was obtained by carrying out ligations in the presence of EcoRI-digested C. elegans genomic DNA (Fig. 8). In panel A, the 32P-labeled intron segment was mixed with a 1000-fold mass excess of C. elegans DNA prior to ligation. Gel analysis of the ligation products revealed that cyclization of the intron segment persisted even in the presence of excess genomic DNA. A similar analysis in panel B showed that the cyclization of the intron segment was not noticeably affected by 100-fold excess levels of genomic DNA fragments that ranged in size from 150 to 500 bp. Panel C shows the reverse approach where trace amounts of 32P-labeled genomic DNA were ligated in the presence of an excess of an EcoRI-digested plasmid containing the intron sequence. The preferential cyclization of the intron sequence when compared to the genomic DNA is also apparent from this analysis.
² These are GenBank accession numbers M29691, M34386, M59751, M62809, M29725, and M24197.
FIG. 5. Characteristics of the DNA molecules. The preparation of the DNA is described in "Experimental Procedures." The intron D segment in pJA46-13 carries a point mutation that converts a T6 tract to TCT4. Relative length (R_L) is defined as the ratio of the apparent length to the actual length. Apparent lengths were determined by electrophoresis on 6% polyacrylamide gels. ENDS ratios were calculated at a window width that corresponded to the actual length of the fragment. Each successive projection of the helix is rotated by 36 degrees. The 5% gels at the bottom of the figure were run at the indicated temperatures. Lanes A-E are, respectively: pSV2cat, pJA670, pJATM5, pJA46-2, and pJA46-13. Lane M shows markers.
The torsional alignment of the two ends of a DNA molecule can influence the rate of cyclization (Shore and Baldwin, 1983a, 1983b). However, it is unlikely that this effect is responsible for the marked differences in cyclization efficiencies reported in Figs. 6-8. For example, the wild-type and mutant intron segments displayed clear differences in cyclization rates as shown in Figs. 6 and 7, but both sequences have the same number of nucleotides and essentially the same number of helical turns as calculated from dinucleotide twist values (18.87 versus 18.85 turns). In addition, the failure of the control and satellite fragments to readily cyclize in the studies shown in Fig. 6 was expected since noncircular molecules in this size range resist cyclization, especially under the conditions of high DNA concentrations which favor bimolecular associations (Shore and Baldwin, 1981). To further rule out this possibility, ligation reactions were carried out in the presence of varying concentrations of ethidium bromide in order to alter the twist of control and circular molecules (Shore and Baldwin, 1983b). As shown in Fig. 9A, ethidium bromide had no noticeable effect on the preferential cyclization of the intron sequence when high DNA concentrations were used in the reaction mixtures. Likewise, cyclization of the intron sequence but not the control exon fragment was seen at all levels of the drug when the DNA concentrations were reduced by about 350-fold (Fig. 9B). We also note that ethidium bromide did not reduce the R_L value of the intron segment when the fragment was electrophoresed in the presence of 0.3 and 0.7 μg of this drug/ml (data not shown). Cons and Fox (1990) have similarly noted that ethidium bromide failed to alter the anomalously slow gel mobility of K-DNA. Preferential cyclization of the intron sequence was also observed at ligation temperatures ranging from 5°C to 37°C in the absence of ethidium bromide (Fig. 9C).
Temperature variations within this range, like ethidium, alter the helical twist of DNA molecules of this size (Shore and Baldwin, 1983b). Taken together, these results and the observations mentioned above strongly suggest that differences in the orientation of the ends of DNA molecules in Fig. 5 are not responsible for the preferential ligation of the intron sequence.
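The torsional argument can be made concrete with a small calculation. The sketch below converts the reported helical turn counts (18.87 and 18.85) into the angular offset of the fragment ends from perfect alignment. It also shows that a uniform helical repeat of 10.5 bp per turn, which is an assumption made here for comparison and not the dinucleotide twist table used in the study, gives a similar turn count for a 198-bp fragment. The function names are hypothetical.

```python
def helical_turns(n_bp, bp_per_turn=10.5):
    """Helical turns in n_bp base pairs, assuming a uniform helical repeat."""
    return n_bp / bp_per_turn


def end_offset_degrees(turns):
    """Angular offset of the two ends from the nearest integral number of turns."""
    return (turns - round(turns)) * 360.0


# Uniform-repeat estimate for the 198-bp intron segment (assumed 10.5 bp/turn).
uniform = helical_turns(198)

# Offsets computed from the turn counts reported in the text.
wild_type = end_offset_degrees(18.87)
mutant = end_offset_degrees(18.85)
difference = wild_type - mutant

print(round(uniform, 2), round(difference, 1))
```

The two sequences differ by 0.02 turn, or about 7 degrees of end rotation, which is small compared with the misalignment that either fragment already carries relative to an integral number of turns.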
The studies in Figs. 10 and 11 focused on a large circular structure from phage λ. The 880-bp circular structure is found in the A gene which codes for the major subunit of the DNA terminase. The enzyme cleaves the terminal cos site and also plays an active role in the packaging of DNA into proheads (Cue and Feiss, 1993). The sequence was characterized by an ENDS ratio peak of 18 which was the highest value observed for sequences in the phage subdivision of GenBank. Non-overlapping fragments from this region displayed a pronounced electrophoretic anomaly which was attributed to phased oligo(A) tracts that were found throughout this segment of the genome (Anderson, 1986). The sequence appears to belong to a set of relatively high ENDS ratio peaks which lie within the G+C-rich left arm of the genome. As noted in Fig. 10, major structural variations are seen near the beginning and end of the left arm which contains the head and tail structural genes, while minor variations are observed in the vicinity of the head-tail junction.
FIG. 7. [...] Fig. 5 were incubated with 5 units/ml ligase for the indicated times. Exo-resistant radioactivity was then measured following acid precipitation. Results are expressed as the means ± S.E. of the percentages of the values for DNA not treated with Exo. Right panel, data replotted as in Crothers et al. (1992) where (D)t is the concentration of Exo-sensitive radioactivity (unreacted monomer) at time t. Exo-sensitive radioactivity is equated with unreacted monomer because autoradiograms of these samples following electrophoresis revealed >98% of the radioactivity was either unreacted or circular monomer (not shown). Cyclization rate constants k1 were 5.9 × 10⁻⁴ and 2.2 × 10⁻⁴ s⁻¹ for the wild-type and mutant sequences, respectively.
FIG. 8. Ligation analysis of C. elegans intron segment (pJA46-2) in the presence of C. elegans genomic DNA. A, EcoRI-digested C. elegans genomic DNA (1 mg/ml) and 32P-labeled intron segment (1 μg/ml) were incubated together in the absence (lanes 1 and 2) or in the presence (lanes 3 and 4) of ligase (30 units/ml) for 30 min at 4°C and then 30 min at 22°C. Samples in lanes 2 and 4 were treated with Exo prior to electrophoresis on a 2% agarose gel. A negative of the ethidium-stained gel and its autoradiogram are shown. B, the 32P-labeled intron segment (1 μg/ml) was incubated alone (lanes 1 and 2) or with size fractionated C. elegans genomic DNA (100 μg/ml, lanes 3-8) in the absence (odd-numbered lanes) or in the presence (even-numbered lanes) of ligase as in A. The sizes of the genomic DNA fragments were 150-200 bp (lanes 3 and 4), 200-300 bp (lanes 5 and 6), and 300-500 bp (lanes 7 and 8). C, procedures are the same as in A, except 32P-labeled genomic DNA (1 μg/ml) was mixed with an EcoRI-digested plasmid containing the intron (500 μg/ml) prior to ligation. Arrows indicate the positions of the 198-bp nonligated intron segment.
Non-bent molecules that are shorter than the DNA persistence length of about 150-200 bp resist cyclization in a ligation reaction (Shore et al., 1981). Consequently, the rate of cyclization of small DNA molecules with intrinsic curvature can be several orders of magnitude greater than the cyclization rate of a nonbent molecule of the same length (Koo et al., 1990; Crothers et al., 1992). The effects of sequence-dependent curvature on the cyclization of larger molecules should be more difficult to demonstrate since nonbent large fragments will readily cyclize during a ligation reaction (Shore et al., 1981). In order to investigate the cyclization of the predicted circular structure from λ shown in Fig. 10, the rate of cyclization of an EcoRI fragment that corresponds to the structure was compared to the rates of cyclization of two noncircular molecules of similar lengths. In Fig. 11, plasmids containing each of the sequences were digested with EcoRI and mixed together prior to ligation. The reaction was carried out using low DNA concentrations in order to favor cyclization of all DNA molecules. In an attempt to minimize random motions of the molecules, ligations were carried out at 0-1°C in the presence of 1.8 M sucrose. The results revealed that the cyclization of the segment (designated B) occurred faster than the cyclization of the control molecules A and C. This can be seen by the preferential loss of linear monomer in B and the selective increase in the corresponding circular monomer during the course of the reaction. Results similar to those shown in the figure were seen in all five experiments where this procedure was used, and in all seven experiments where the three plasmids were ligated in separate reactions and the products detected either by ethidium bromide staining or by blot hybridization analysis (data not shown).
The preferential cyclization of the sequence also occurred using varying amounts of ligase (1-20 units/ml) but was not observed in the absence of sucrose in the ligation mixture (data not shown).

DISCUSSION
The standard computer program used in this work for predicting DNA structure from nucleotide sequence assumes that AA/TT steps are the major determinants of bending as has been proposed from numerous experimental findings (Diekmann et al., 1992; Hagerman, 1992; Haran et al., 1994; Wang et al., 1994; Sprous et al., 1995). A precondition for predicting structure by this program is that bending possibly caused by non AA/TT stacks should not significantly alter the interpretations of the analysis. This is apparently the case since the circular structure in the HIV-1 integrase gene (Fig. 1E), as well as other selected circular structures (data not shown), were also observed when the sequences were analyzed by a program that incorporated predicted wedge values for all dinucleotides. In addition, we have recently shown that programs which assume that AA/TT is the only source of bending were better predictors of electrophoretic data than a program based on all wedge values (Wang et al., 1994). Recent electrophoretic studies by Haran et al. (1994) have also stressed the relative unimportance of non AA/TT stacks in bending, and critical discussions of non-A tract bending models for curvature are provided by Haran et al. (1994), Wang et al. (1994), and Sprous et al. (1995). The standard program has previously been shown to reproduce the shapes of short DNA molecules seen in electron micrographs and to accurately predict the planarity of curvature as determined from electrophoretic studies (Eckdahl and Anderson, 1987; Fitzgerald et al., 1994; Wang et al., 1994) and from ligation analysis of small synthetic circular DNAs (data not shown). The results of this study provide additional support for the predictive power of the program and consequently for the model of A tract bending since the circular intron segment from C. elegans preferentially cyclized in ligation reactions when compared to nonbent fragments, bent fragments with noncircular shapes, and total genomic DNA.
FIG. 9. Effects of ethidium bromide and temperature on ligation reaction products. Labeled exon (pJA71-2) and intron (pJA46-2) segments were ligated in the presence of ethidium bromide (A and B) or at various temperatures (C) for 15 min. A, the labeled DNAs (1 μg/ml) were ligated in the presence of nonradioactive EcoRI-cut recombinant plasmids (350 μg/ml). The concentrations of ethidium were: 0, 0, 10, 20, 40, and 200 ng/ml. Ligase (5 units/ml) was added to samples in lanes 2-6. B, the procedure described in A was used, except the nonradioactive DNA was omitted from the ligation mixture. C, samples were ligated in the presence of nonradioactive plasmids but in the absence of ethidium at 5, 10, 20, 25, and 37°C for lanes 1-6, respectively. Higher concentrations of ligase were used (50 units/ml) to ensure complete ligation at all temperatures. Positions of the linear and circular monomers (LM, CM) are indicated.
In addition, the wild-type intron sequence cyclized significantly faster than a mutant sequence with a single base substitution and the difference was reflected in differences in electrophoretic anomaly, ENDS ratio values, and helical projections (Figs. 5 and 6). Ring closure probabilities³ could not be accurately determined for any of the molecules shown in Fig. 5 because the intron segments displayed very low rates of bimolecular association, which could not be quantified reliably, while the noncircular fragments did not cyclize or cyclized very slowly under all reaction conditions tested (Figs. 6, 7, and 9; data not shown). However, relative cyclization efficiencies could be estimated by making assumptions that have been made by others who have encountered similar problems with short synthetic DNA molecules (Crothers et al., 1992). We estimate that the relative cyclization efficiency of the wild-type intron segment was about 3-fold greater than the mutant intron segment and more than 100-fold greater than all of the noncircular molecules shown in Fig. 5.
These estimates were not affected by variation in DNA concentration, ligase concentration, ethidium bromide, or temperature (Figs. 6-9; data not shown). The sequence encoding the phage λ terminase also cyclized significantly faster than noncircular molecules of similar lengths and this sequence displays predicted planar bending throughout its 880-bp length (Figs. 10 and 11). Additional studies along these lines are needed in order to further characterize the shapes and stabilities of such large structures in solution. The identification of an ensemble of sequences with circular helical projections as in Figs. 1-4 provides a rich source of natural molecules that could be used for such an analysis.
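The A tract bending model discussed above can be illustrated with a short calculation of a helix axis path and an ENDS-type ratio. This is only a sketch under stated assumptions and is not the program used in this work: the ratio is taken here to be contour length divided by end-to-end distance, bending is modeled as a single roll wedge at AA/TT steps, and the rise, twist, and wedge values (3.4 Å, 10.5 bp per turn, 5 degrees) are illustrative choices. All function names are hypothetical.

```python
import math

RISE = 3.4                          # assumed rise per base-pair step (angstroms)
TWIST = math.radians(360 / 10.5)    # assumed uniform twist per step
AA_ROLL = math.radians(5.0)         # hypothetical wedge (roll) at AA/TT steps


def matmul(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rot_z(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_x(t):
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def axis_path(seq):
    """Positions of the helix axis after each base-pair step."""
    frame = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    pos = (0.0, 0.0, 0.0)
    path = [pos]
    for a, b in zip(seq, seq[1:]):
        roll = AA_ROLL if a + b in ("AA", "TT") else 0.0
        # Twist about the local helix axis, then tilt the axis by the wedge.
        frame = matmul(matmul(frame, rot_z(TWIST)), rot_x(roll))
        # Advance one rise along the current local z axis.
        pos = tuple(pos[i] + RISE * frame[i][2] for i in range(3))
        path.append(pos)
    return path


def ends_ratio(seq):
    """Contour length divided by end-to-end distance (assumed definition)."""
    path = axis_path(seq)
    contour = RISE * (len(path) - 1)
    end_to_end = math.sqrt(sum((p - q) ** 2 for p, q in zip(path[0], path[-1])))
    return contour / end_to_end


straight = "CGCG" * 50
phased = ("AAAAAACGCG" + "AAAAAACGCGC") * 10   # A tracts spaced at ~10.5 bp

print(ends_ratio(straight), ends_ratio(phased))
```

In this toy model a sequence with no AA/TT steps stays straight and gives a ratio of 1, whereas A tracts repeated in phase with the helical repeat curl the axis back toward its origin and raise the ratio.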
DNA inversion systems in prokaryotes have served as models for the study of site-specific recombination (Glasgow et al., 1988; Heichman and Johnson, 1990; Stark et al., 1992; Van de Putte and Goosen, 1992; Merker et al., 1993). Typically, a recombinase binds as a dimer to each of two inverted repeat sequences and dimers of Fis interact with two sites within a transpositional enhancer located nearby. Protein-induced DNA bending and Fis-recombinase interactions then facilitate the looping of intervening DNA to form a large multi-protein complex called the invertasome. Accessory DNA-bending proteins such as the histone-like proteins HU and IHF also aid in particle formation by facilitating the folding of the DNA (Surette et al., 1989; Harrington, 1992). Thus, the ability of DNA to loop in the invertasome should not only depend on the position and orientation of specific protein binding sites but also on the physical properties of the intervening DNA including its flexibility and intrinsic curvature. Support for this idea has been derived from the observation that intrinsically bent DNA can substitute for protein-induced bending in many systems including inversion reactions (Bracco et al., 1989; Goodman and Nash, 1989). Consequently, the short segments of bent DNA found in recombinational enhancers have been viewed as factors that facilitate protein-induced bending and thus the formation of the invertasome (Johnson et al., 1987; Glasgow et al., 1988; Hübner et al., 1989). The preferential association of high ENDS ratio peaks with a variety of inversion and long-range looping systems (Fig. 4, A-H) makes it likely that the large circular structures play an analogous but perhaps more global role in particle formation. This model could also be applied to the sequences in Fig. 4 (I-N), which are thought to display regional recombinational specificity.
In these systems, the large circular structures may direct recombinational complexes to the regions where strand cleavage and rejoining take place.
The circular structures described in this report may play some role in recombination in addition to a general packaging function. A consideration of such a role is particularly relevant since the requirement for a highly ordered DNA topology in the invertasome has led to the view that the DNA molecules are active participants in recombination rather than merely passive substrates for protein action. An important characteristic of inversion systems is the requirement of a negatively supercoiled DNA substrate that facilitates protein-protein interaction and the formation of the invertasome. Restriction nuclease studies and electron microscopy have revealed that bent DNA is positioned within end loops of supercoiled molecules (Silver et al., 1986; Laundon and Griffith, 1988). Thus, bent DNA can order the global structure of superhelical DNA (Silver et al., 1986; Laundon and Griffith, 1988; Kremer et al., 1993). It seems likely that this organizing effect would depend on the configuration of the bent segment since O- and C-shaped structures should preferentially reside at the ends of interwound molecules while bent structures such as S-shaped regions should be preferentially excluded from these sites. These considerations imply that the circular DNA shapes have the potential to align DNA sites both within and adjacent to protein complexes on domains of supercoiled DNA, and this could be important in controlling DNA topology during recombination and other long-range looping events.
The circular structures in the primate immunodeficiency viruses were detected as the highest ENDS ratio sequence set in the entire GenBank data base (Figs. 1-3). This structure is not found in all retroviruses, but is apparently more conserved than the nucleotide sequence (see Figs. 1 and 2 and "Results"), which implies that selective pressures have maintained the circular form. To our knowledge, this region is not a hot spot for recombination, although a role for the structure in this process cannot be excluded. Perhaps more likely, the structure functions in a DNA looping mechanism which serves to bring together the protein binding sites located in the two distinct subdomains of the intragenic transcriptional enhancer (see Fig. 1). According to this view, the structure may have a packaging function analogous to the circular elements in Fig. 4 (A, D, E, G, and H). The presence of multiple functional AP-1 recognition sequences in the upstream enhancer subdomain of HIV is of interest since these sequences are frequently components of complex regulatory elements, which contain binding sites for multiple transcription factors (Sonnenberg et al., 1989; Kerppola and Curran, 1991). Several potential binding sites for transcription factors other than AP-1 are indeed found in this region of the HIV genome (Van Lint et al., 1991). Thus, the DNase I-hypersensitive site that resides at this location in HIV-1 chromatin (Verdin, 1991) may be a reflection of a complex regulatory particle assembled onto the circular DNA segment. The DNA structure is expected to be regulated since unintegrated linear viral DNA has a single-stranded gap at this site, which would prevent circle formation prior to integration. In addition, the enhancer and the nuclease-hypersensitive site are regulated by cellular factors following the integration of the HIV genome into host cell chromatin (Verdin et al., 1990; Van Lint et al., 1991; Verdin, 1991).
Although the function of this highly unusual conformation in regulating the HIV life cycle is not yet known, the structure may provide a model system for the study of a regulatable looping mechanism that is dependent on the intrinsic circularity of the DNA.