Long CTG·CAG Repeat Sequences Markedly Stimulate Intramolecular Recombination*

Previous studies have shown that homologous recombination is a powerful mechanism for generation of massive instabilities of the myotonic dystrophy CTG·CAG sequences. However, the frequency of recombination between the CTG·CAG tracts has not been studied. Here we performed a systematic study on the frequency of recombination between these sequences using a genetic assay based on an intramolecular plasmid system in Escherichia coli. The rate of intramolecular recombination between long CTG·CAG tracts oriented as direct repeats was extraordinarily high; recombinants were found with a frequency exceeding 12%. Recombination occurred in both RecA+ and RecA− cells but was ~2-11 times higher in the recombination proficient strain. Long CTG·CAG tracts recombined ~10 times more efficiently than non-repeating control sequences of similar length. The recombination frequency was 60-fold higher for a pair of (CTG·CAG)165 tracts compared with a pair of (CTG·CAG)17 sequences. The CTG·CAG sequences in orientation II (CTG repeats present on a lagging strand template) recombine ~2-4 times more efficiently than tracts of identical length in the opposite orientation relative to the origin of replication. This orientation effect implies the involvement of DNA replication in the intramolecular recombination between CTG·CAG sequences. Thus, long CTG·CAG tracts are hot spots for genetic recombination.
Genetic instabilities (expansions and deletions) of simple repeating sequences are important in the life cycles of both prokaryotic (1) and eukaryotic (2) cells. This fundamental mechanism of mutagenesis has been found in mycoplasma, bacteria, yeast, mammalian cell cultures, and in humans. In mycoplasma and bacteria, these genetic polymorphisms are the basis for phase variations, which control the expression of genes (3-7). In humans, the expansions and deletions of simple repeating sequences are closely tied to the etiologies of cancers (8-12) as well as hereditary neurological diseases (reviewed in Ref. 2).
The general mechanism accepted for all of these instabilities is slipped strand mispairing, which allows mismatching of neighboring repeats and, depending on the strand orientation, enables the insertion or deletion of repeats during DNA polymerase-mediated duplication (reviewed in Refs. 2 and 13). The enzymatic machineries involved include DNA replication and repair (nucleotide excision repair, methyl-directed mismatch repair, and DNA polymerase III proofreading) (2,13). Biochemical and genetic studies showed also that expansions and deletions of the TRS 1 sequences occur in vivo by homologous recombination (14,15). These investigations, carried out in a two-plasmid system, demonstrated that the expansion mechanism is principally gene conversion rather than unequal crossing-over (15).
In the present work, we cloned two triplet repeat tracts in the same plasmid and used an intramolecular assay to study the recombinational properties of the CTG·CAG sequences (Fig. 1). Intramolecular recombination systems have been widely used to investigate the mechanism of the recombination processes (16-24) as well as to establish the recombinational properties of different DNA sequences, including microsatellites (25).
It has been shown that recombination between TRS tracts can lead to repeat expansion (14,15,26). Evaluation of the frequency of recombination between CTG·CAG tracts provides important information about the cellular mechanisms of instability, relative to replication and repair.
Here we have developed the first genetic assay for monitoring the frequencies of intramolecular recombination between CTG·CAG tracts in Escherichia coli. Interestingly, long CTG·CAG repeat sequences from myotonic dystrophy are preferred sites for intramolecular recombination. In our companion paper (27), we have established a genetic assay for monitoring the recombination frequency of the CTG·CAG repeat tracts in an intermolecular system.

EXPERIMENTAL PROCEDURES
Parent Plasmids-pRW3244, pRW4026, pRW3246, and pRW3248 were the parent plasmids containing (CTG·CAG)n tracts used for these experiments; these pUC19NotI derivatives contain the (CTG·CAG)n tracts cloned into the HincII site of the polylinker (28-30). For nomenclature of the TRS, CTG·CAG designates a duplex sequence of repeating CTG, which may also be written TGC or GCT; CAG, the complementary strand, may also be written as AGC or GCA. The orientation is 5′ to 3′ for both designations of the antiparallel strands. pRW3244 contains (CTG·CAG)17, pRW4026 contains (CTG·CAG)67, pRW3246 contains (CTG·CAG)98, and pRW3248 contains the (CTG·CAG)175 sequence. The (CTG·CAG)175 sequence is not a pure CTG·CAG tract but contains two G to A interruptions at repeats 28 and 69 (30); all other TRS are pure (not interrupted). All of these sequences have non-repeating human flanking sequences (19 and 41 bp) outside the repeated tract. pRW3815² is a pUC18NotI derivative and contains the (GTC·GAC)79 tract. These plasmids were maintained in E. coli HB101 (Invitrogen) (mcrB, mmr, hsdS20 (rB−, mB−), recA1, supE44, ara14, galK2, lacY1, proA2, rplS20 (SmR), xyl5, λ−, leuB6, mtl-1). The (CTG·CAG)n and (GTC·GAC)n sequences were subcloned into pBR322.
Cloning of (CTG·CAG)n and (GTC·GAC)n Sequences into pBR322-The general strategy of this investigation involved recloning of the (CTG·CAG)n and (GTC·GAC)n sequences from pUC19NotI and pUC18NotI derivatives (28), respectively, into pBR322. Fragments containing the CTG·CAG and the GTC·GAC TRS were prepared from these plasmids by digesting the pUC19NotI or pUC18NotI derivatives with EcoRI and HindIII (New England Biolabs, Inc.) followed by filling in the recessed 3′ termini with 0.1 unit of the Klenow fragment of E. coli DNA polymerase I (U. S. Biochemical Corp.) and the four dNTPs (0.1 mM each). In the case of pRW4806 (a pUC19NotI derivative harboring a tract of 165 uninterrupted CTG·CAG repeats), the insert was prepared by AluI digestion. The blunt-ended DNA fragments were used for cloning to obtain plasmids containing the TRS tracts in both orientations relative to the unidirectional ColE1 origin of replication. The digested DNA was electrophoresed in a 7% polyacrylamide gel and stained with ethidium bromide, and the bands containing the triplet repeat fragment were excised. The DNA was eluted from the excised bands, purified by phenol-chloroform extraction, and precipitated with ethanol (31). The vector was prepared by digesting pBR322 with EcoRI and HindIII followed by filling in the recessed 3′ termini as described earlier. The vector and the insert were mixed at a molar ratio of ~1:10 and ligated for 14 h at 16°C by the addition of 20 units of T4 DNA ligase (U. S. Biochemical Corp.). The ligation mixture was ethanol-precipitated and transformed into E. coli HB101 by electroporation (2.5 kV, cuvette size 0.2 mm) and plated on LB agar plates containing 100 μg/ml ampicillin. Plasmid DNA was isolated from individual transformants by the Wizard Plus Miniprep DNA Purification System (Promega). Clones containing the CTG·CAG repeats in orientations I and II (defined in Refs. 28 and 30) were obtained and characterized by restriction mapping.
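The ~1:10 vector-to-insert molar ratio used for ligation can be converted into DNA masses from the fragment lengths. The following is a minimal sketch, assuming that DNA mass scales linearly with length in base pairs; the function name `insert_mass_ng` and the example quantities (100 ng of vector, a 600-bp insert) are hypothetical illustrations and are not taken from the protocol above.

```python
def insert_mass_ng(vector_ng, vector_bp, insert_bp, insert_per_vector=10.0):
    """Return the insert mass (ng) that gives the requested insert:vector molar ratio.

    Assumes equal average mass per base pair for vector and insert, so that
    moles are proportional to mass / length.
    """
    if vector_bp <= 0 or insert_bp <= 0:
        raise ValueError("fragment lengths must be positive")
    if vector_ng < 0 or insert_per_vector < 0:
        raise ValueError("mass and ratio must be non-negative")
    return vector_ng * (insert_bp / vector_bp) * insert_per_vector


if __name__ == "__main__":
    # Hypothetical example: 100 ng of a 4,361-bp vector and a 600-bp insert.
    print(round(insert_mass_ng(100.0, 4361, 600), 1), "ng of insert")
```

Because the inserts are far shorter than the vector, a 10-fold molar excess of insert corresponds to a comparable or only modestly larger mass of insert DNA.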
The inserts cloned into the EcoRI/HindIII site of pBR322 are referred to as "X inserts" (Fig. 2). The pBR322 derivatives containing a single CTG·CAG sequence (the "X insert") were subsequently used to clone the second TRS tract (CTG·CAG or GTC·GAC) into the PvuII site at position 2064 of the pBR322 backbone (Fig. 2, Y insert). The same experimental approach was used to clone the second TRS insert, except that after ligation the reaction mixture was subjected to PvuII digestion to eliminate plasmids lacking the insert. This strategy enabled the construction of a family of plasmids harboring two homologous TRS tracts oriented as direct repeats or inverted repeats as well as plasmids containing non-homologous repeats (Fig. 2).
All plasmids were characterized by restriction mapping (to determine the orientation and length of the cloned TRS) and dideoxy sequencing of both strands with the ThermoSequenase Radiolabeled Terminator Cycle Sequencing Kit (U. S. Biochemical Corp.). The sequencing reactions were carried out according to the manufacturer's recommendations using the following pBR322-specific primers: pBR322EcoRI, GTATCACGAGGCCCT, which 3′-terminates at the pBR322 map position 4347 (New England Biolabs, Inc.); pBRHR, GCGTTAGCAATTTAACTGTGAT, which 3′-terminates at the pBR322 map position 49 (Genosys Inc.); pBRPF, GCTTCACGACCACGCTGAT, which 3′-terminates at the pBR322 map position 2052 (Genosys Inc.); pBRPR, GTCAGAGGTTTTCACCGTCAT, which 3′-terminates at the pBR322 map position 2087 (Genosys Inc.). The products of the sequencing reactions were analyzed on 6% Long Ranger gels (FMC BioProducts) containing 7.5 M urea in the glycerol tolerant gel buffer (U. S. Biochemical Corp.). The gels were dried and exposed to x-ray film.
Cloning of Non-repeating DNA Sequences into pBR322-Two different non-repeating sequences were used as controls in this study: the 564-bp fragment of phage λ DNA (HindIII fragment from nucleotide position 36895 to 37459) and the 354-bp fragment of the human DMPK gene (part of the exon 7 and intron 7) (32-35). pRW4804 and pRW4805 were constructed by digestion of phage λ DNA with HindIII and cloning one of the released restriction fragments (564 bp) into the HindIII and PvuII sites of pBR322. Thus, the two plasmids, pRW4804 and pRW4805, harbor direct and inverted repeats, respectively (Fig. 2).
The exon 7/intron 7 fragment of the human DMPK gene used for construction of pRW4871 and pRW4873 was obtained by PCR amplification of the sequence from human genomic DNA. The PCR was carried out in a volume of 20 μl containing 50 ng of genomic DNA, 1.5 mM MgCl2, 50 mM KCl, 10 mM Tris/HCl, pH 8.3, 200 μM of each dNTP, and 0.2 units of Pfu Turbo DNA polymerase (Stratagene). The PCR primers DM7F, GGCTCGAGACTTCATTCAGC, and DM7R, TAGATGGGCACAGAGCAGGT, were used at a concentration of 1 μM. Amplification on a PCR System 9700 (Applied Biosystems) involved 35 cycles: 20 s/95°C, 20 s/58°C, and 40 s/72°C. PAGE-purified PCR product was phosphorylated using 2 mM ATP and 5 units of T4 polynucleotide kinase (New England Biolabs, Inc.) and cloned into the HindIII and PvuII sites of pBR322. pRW4871 as well as pRW4873 contain homologous sequences oriented as direct repeats; however, the orientations of the pairs of inserts are opposite in these two plasmids.
Cloning of the Green Fluorescence Protein Gene (GFP) into pBR322 Derivatives-pBR322 and pBR322 derivatives containing direct, inverted, non-homologous repeats and non-repeating DNA sequences were digested with EcoRV and EagI (positions 185 and 939 on the pBR322 map, respectively) to remove the 754-bp DNA fragment of the vector backbone. The digested plasmids were purified by 5% acrylamide gel electrophoresis as described earlier and ligated to the GFPuv gene (36). The GFPuv gene was obtained by digestion of pGFPuv (CLONTECH Laboratories, Inc.) with PvuII and EagI (positions 56 and 1078 on the pGFPuv map, respectively). After ligation and transformation into E. coli HB101, transformants were screened using a long-wavelength UV lamp. The cells carrying plasmids with the GFPuv gene emitted a strong green fluorescence.
The GFP cassette from pGFPuv contains the GFPuv variant of the green fluorescent protein gene inserted in-frame with the lacZ initiation codon from pUC19 so that a β-galactosidase-GFPuv fusion protein is expressed from the lac promoter in E. coli.
Conditions of Bacterial Growth for Recombination Studies-For determinations of recombination properties, plasmids containing TRS tracts were electrophoresed in 1% agarose gels, and bands corresponding to the supercoiled plasmids were excised from the gels, transferred into dialysis tubes, and electroeluted (31). To avoid DNA damage, plasmid purifications were performed without ethidium bromide staining and UV irradiation of DNA. In all experiments, only gel-purified, supercoiled plasmid DNA was used for transformation of the appropriate E. coli strains. To ensure identical conditions for experiments with all plasmids studied, a large batch of competent cells was prepared for each set of experiments, and the transformations were always done in parallel. The transformants were cultured, harvested, and analyzed under the same conditions. The following E. coli strains were used: AB1157 (37) as a parent of the recombination deficient strain JC10289 (thr-1, ara-14, leuB6, Δ(gpt-proA)62, lacY1, tsx-33, glnV44(AS), galK2, λ−, rac−, hisG4(Oc), rfbD1, mgl-51, Δ(recA-srl)306, srlR301::Tn10, rpsL31(strR), kdgK51, xylA5, mtl-1, argE3(Oc), thi-1). Strains were obtained from the E. coli Genetic Stock Center, Yale University, New Haven, CT. In the population experiments, the transformation mixture was inoculated into 10-ml LB tubes containing 100 μg/ml ampicillin at a cell density of 10^2 cells/ml. The cultures were grown at 37°C with shaking at 250 rpm. At late log phase (A600 ~1.0 units), the cells were harvested, and the plasmid DNA was isolated as described above and analyzed by restriction digestion.
To determine the frequency of recombination, plasmids harboring the GFPuv gene were transformed into the appropriate E. coli strain, plated onto LB plates containing 100 μg/ml ampicillin, and incubated for 16 h at 37°C. The frequency of recombination was measured as the ratio of the number of white colonies to the total number of viable cells. The white as well as a representative number of fluorescent colonies were inoculated into 10 ml of LB medium (containing ampicillin at 100 μg/ml). After overnight growth, the plasmids were isolated and subjected to restriction and DNA sequencing analyses. The statistical analyses were performed using SigmaStat version 2.03.
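The frequency defined above (white colonies divided by total viable colonies) can be sketched in a few lines. The binomial standard error returned alongside it is an added assumption for illustration; the statistical analyses in this study were performed with SigmaStat, and the function name and example counts are hypothetical.

```python
import math


def recombination_frequency(white_colonies, total_colonies):
    """Return (frequency, standard_error) for a white/total colony count.

    frequency = white / total, as in the assay definition.
    standard_error uses the simple binomial formula sqrt(p * (1 - p) / n),
    which is an illustrative assumption, not the published analysis.
    """
    if total_colonies <= 0:
        raise ValueError("total_colonies must be positive")
    if not 0 <= white_colonies <= total_colonies:
        raise ValueError("white_colonies must lie between 0 and total_colonies")
    p = white_colonies / total_colonies
    se = math.sqrt(p * (1.0 - p) / total_colonies)
    return p, se


if __name__ == "__main__":
    # Hypothetical plate: 12 white colonies among 100 scored.
    freq, se = recombination_frequency(12, 100)
    print(f"frequency = {freq:.3f}, binomial SE = {se:.3f}")
```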
This genetic assay enabled the detection and quantitation of the recombination events that occurred directly after transformation of the parental plasmids into the host cells. In order to detect those recombination events that took place at a later stage of colony formation, the recombination product would have to outgrow the parental plasmid molecules (that are present in a large excess at the moment of the recombination event). Consequently, the recombinant plasmid should have a tremendous replication advantage over the parental plasmids. However, this can be easily ruled out by the results of copy number analyses (see "Results").
In addition, the white and the fluorescent colonies are stable. Randomly selected fluorescent colonies (350 total) were inoculated into one bulk culture, mixed, and then plated on plates containing ampicillin. After overnight growth, no white colonies were observed among ~2 × 10^5 colonies screened. The same experiment was repeated for the white colonies, which revealed no fluorescent colony formation in ~10^5 white colonies analyzed. These results indicate that the "color of the colony" (i.e. recombination status of the plasmid) is established at the earliest stage of the colony formation, and masking or overgrowing of the cells to alter the apparent color (e.g. fluorescent cells by the white ones or vice versa) is highly unlikely.

² K. Ohshima and R. D. Wells, unpublished data.
Determination of E. coli Growth Rates and Plasmid Copy Numbers-To ensure that results of the population experiments are not biased by the growth advantage of cells containing recombination products over the cells containing parental plasmids, the doubling time of E. coli cells harboring either recombination substrates or the recombination products with CTG·CAG tracts of different lengths and orientations was established. The determination of the doubling time and plasmid copy numbers as well as the recombination studies were carried out under identical conditions of bacterial growth (10-ml LB tubes containing 100 μg/ml ampicillin, 37°C with shaking at 250 rpm). In each case, ~10^2-10^3 cells/ml were used to start the cultures. Aliquots of 10 μl were withdrawn every 30-60 min for ~8 h, diluted in LB, and subsequently plated on agar plates without ampicillin. The growth curves were prepared using SigmaPlot 2000 version 6.10, and the doubling time was calculated as described previously (38).
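The doubling-time calculation itself follows Ref. 38 and is not spelled out here. As a hedged illustration, a common approach is to fit a straight line to ln(viable count) against time during exponential growth and take t2 = ln 2 / slope. The sketch below assumes that approach; the function name and the synthetic data are hypothetical.

```python
import math


def doubling_time(times_min, viable_counts):
    """Estimate doubling time (min) by least-squares fit of ln(count) vs. time.

    Assumes all points lie in the exponential phase; this is an illustrative
    method and not necessarily the procedure of the cited reference.
    """
    if len(times_min) != len(viable_counts) or len(times_min) < 2:
        raise ValueError("need at least two matched (time, count) points")
    if any(c <= 0 for c in viable_counts):
        raise ValueError("counts must be positive")
    ys = [math.log(c) for c in viable_counts]
    n = len(times_min)
    mean_x = sum(times_min) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in times_min)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(times_min, ys))
    slope = sxy / sxx  # specific growth rate per minute
    if slope <= 0:
        raise ValueError("no net growth detected")
    return math.log(2) / slope


if __name__ == "__main__":
    # Synthetic data generated with a 22-min doubling time (hypothetical).
    times = [0, 30, 60, 90, 120]
    counts = [500 * 2 ** (t / 22) for t in times]
    print(f"t2 = {doubling_time(times, counts):.1f} min")
```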
To exclude the possibility of the replicative advantage of recombination products over the parental plasmids, the copy numbers of these plasmids were determined as described earlier (39,40); the size of the E. coli genome of 4,639 Kbp (41) was used for these calculations. The quantitative analyses of plasmid and genomic DNAs separated by agarose gels were performed using FluorChem version 3.04 (Alpha Innotech Corp.).
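The copy-number determination follows Refs. 39 and 40, whose details are not reproduced in this section. As a rough illustration only, the sketch below assumes the usual densitometric reasoning: band intensity is proportional to DNA mass, so dividing each signal by the molecule length gives relative molar amounts, and their ratio gives plasmid copies per chromosome. The function name and example signals are hypothetical; only the genome size of 4,639 kbp comes from the text.

```python
E_COLI_GENOME_BP = 4_639_000  # 4,639 kbp, the genome size used in the text


def plasmid_copy_number(plasmid_signal, genomic_signal, plasmid_bp,
                        genome_bp=E_COLI_GENOME_BP):
    """Return estimated plasmid copies per chromosome equivalent.

    Assumes signal is proportional to DNA mass for both bands (an
    illustrative assumption, not the cited protocol).
    """
    if plasmid_bp <= 0 or genome_bp <= 0:
        raise ValueError("sizes must be positive")
    if genomic_signal <= 0 or plasmid_signal < 0:
        raise ValueError("signals must be positive (plasmid may be zero)")
    plasmid_moles = plasmid_signal / plasmid_bp
    genome_moles = genomic_signal / genome_bp
    return plasmid_moles / genome_moles


if __name__ == "__main__":
    # Hypothetical signals: plasmid band carries 5% of the genomic band mass.
    print(round(plasmid_copy_number(0.05, 1.0, 5000), 1), "copies per chromosome")
```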
Agarose and Polyacrylamide Gel Analyses of Recombination Products-In order to analyze the products of intramolecular recombination between repeating sequences, the isolated DNAs were linearized with AflIII and labeled by end-filling with the Klenow fragment of E. coli DNA polymerase I and [α-32P]dATP. The labeled DNAs were separated on 1% agarose gels in TAE (40 mM Tris acetate, 1 mM EDTA, pH 8) buffer, and the gels were dried and exposed to x-ray film. The instabilities of the TRS tracts of the recombination products were determined using SphI/BamHI digestion followed by end labeling as described above. The products were resolved in 5-7% polyacrylamide gels in TAE buffer. The lengths of the CTG·CAG inserts were calculated as described earlier (29). The primary structures of more than 35 individual recombination products were determined by direct DNA sequencing of one or both DNA strands.

RESULTS
Intramolecular System to Study Recombination between (CTG·CAG)n Sequences-We used an intramolecular plasmid system to study recombination between TRS tracts, where two homologous repetitive sequences are located on the same plasmid molecule and are separated by non-homologous intervening sequences.
Two homologous TRS tracts present on the same replicon can be oriented relative to each other as direct or inverted repeats (Fig. 1). The term "orientation" is used in this study to define the relative directionality between two recombining homologous sequences (direct and inverted repeats). The terms "orientation I" and "orientation II" refer to the orientation of the TRS sequences relative to the origin of replication; for example, for the plasmids containing (CTG·CAG)n tracts in orientation I, the CTG repeat is in the leading strand template, whereas for the plasmids harboring (CAG·CTG)n tracts, in orientation II, the CTG repeat is in the lagging strand template (28-30, 42, 43).

FIG. 1. The products of intramolecular recombination between CTG·CAG tracts oriented as direct repeats or as inverted repeats. The homologous recombination between direct repeats (left panel) leads to the formation of a smaller plasmid containing only one CTG·CAG tract; the DNA fragment which originally separated the two TRS tracts (shown as a dotted area) is deleted. This deleted fragment is inviable due to the absence of an origin of replication and the ampicillin resistance gene and will therefore be lost. In the case of the inverted repeats (right panel), the recombination event between two homologous sequences leads to the inversion of the sequence separating the repeats, shown here as an inversion of the direction of the replication origin and the ampicillin resistance gene. The ampicillin resistance gene (Amp) is designated as a white arrow. The gray arrow shows the orientation of the unidirectional origin of replication (ori). A portion of the CTG·CAG tracts is black and a second portion has a white background to illustrate the location and the consequences of the recombination events.
The recombination frequency as well as the types of final products of the intramolecular recombination event strongly depend on the relative orientation of the recombining sequences (Fig. 1) (18,21,24,44). The recombination event between direct repeats may lead to the deletion of one of the homologous tracts and any intervening sequences between the repeats (20, 21, 44-47). In the case of homologous TRS tracts, the intervening sequence separating the repeats will be also deleted. However, due to their repetitive nature, two homologous CTG·CAG tracts can align and hybridize with each other in several different frames (the number of frames equals the number of repeats divided by 3). As a result of possible different alignments of the CTG·CAG sequences, the length of TRS tracts in the recombination products may vary from the minimum length required for recombination to occur to the maximum length determined by the size of both recombining homologous sequences. The intervening sequence separating the two TRS tracts is inviable because it lacks an origin of replication as well as the ampicillin resistance gene and therefore will be lost during cell division.
The recombination event between inverted repeats (Fig. 1, right panel) can lead to the inversion of the intervening sequence between the homologous repeats (44,48). This will result in an inversion of the direction of the ampicillin resistance gene and an inversion of the origin of replication. However, other types of products of intramolecular recombination between inverted repeats such as head-to-head inverted dimers have also been described previously (18,24).
Plasmids Containing Direct and Inverted Repeats-Intramolecular plasmid systems have been used widely for investigating the mechanisms of recombination and the influence of different factors on this process (16-19, 21-24). We used this system to investigate recombination between CTG·CAG repeats in E. coli. For this study, we constructed and characterized a family of pBR322 derivatives (Fig. 2). Various lengths of CTG·CAG repeats (17, 67, 98, 165, and 175) were cloned into the EcoRI/HindIII and PvuII sites of pBR322. Two homologous TRS tracts inserted in both orientations (I and II) as direct and inverted repeats (Fig. 2, left and center columns) were separated by ~2,000 bp of the intervening sequence (Fig. 2, dotted region) and 2,300 bp of the intervening sequence harboring the unidirectional replication origin and the ampicillin resistance gene. Introduction of the X TRS insert into the EcoRI/HindIII site of pBR322 inactivated the tetracycline resistance gene (49). The cloning of the Y TRS insert into the PvuII site destroyed the rop gene of pBR322, which mediates the activity of RNA I. The latter resulted in an elevated copy number of the plasmids (50).

FIG. 2. Plasmids used in this study. All plasmids are derivatives of pBR322 and contain two inserts (X and Y) oriented as direct repeats (left column) or inverted repeats (central column). Control plasmids harboring two non-homologous TRS are shown in the right column. TRS inserts as well as non-repeating DNA sequences were cloned into the HindIII/EcoRI (X TRS insert) or PvuII (Y TRS insert) sites of pBR322 (for details, see "Experimental Procedures"). The inserts containing 17, 67, 98, and 165 CTG·CAG repeats as well as the (GTC·GAC)79 insert are homogeneous (i.e. are perfect repeating sequences and contain no interruptions). The (CTG·CAG)175 sequence present in pSF3 and pSF4 is not a pure CTG·CAG tract but contains two G to A interruptions at repeats 28 and 69. The actual sequences of the leading strand templates of the TRS inserts are shown for all plasmids. Thus, CTG·CAG and CAG·CTG inserts correspond to orientation I and orientation II, respectively (28-30). The ampicillin resistance gene (Amp) is designated as a white arrow. The gray arrow shows the approximate position and direction of the origin of replication (ori).
As controls, plasmids with non-repeating homologous sequences instead of the CTG·CAG repeats were constructed (Fig. 2). Two different non-repeating DNA fragments were used: the 564-bp fragment of bacteriophage λ DNA and the 354-bp fragment of exon 7/intron 7 of the human DMPK gene (see "Experimental Procedures" for details). In addition, pBR322 derivatives containing one CTG·CAG tract (the X insert) and its isomeric GTC·GAC sequence (51) (the Y insert) were constructed (Fig. 2, right column) as controls for non-homologous TRS tracts in one plasmid.
All plasmids were maintained in E. coli HB101, which is RecA−. Previous studies (18-20, 23) showed that intramolecular plasmid recombination is not dependent on the function of the recA gene product. Thus, even the propagation of the plasmids in E. coli HB101 to obtain working stocks of plasmids can cause DNA rearrangements due to RecA-independent recombination. In addition to the recombination events, cultivation of E. coli harboring plasmids with TRS tracts leads to the genetic instability of repeating sequences manifested predominantly as deletion products (28-30, 51, 52). This applies mainly to the long uninterrupted CTG·CAG sequences such as (CTG·CAG)67 or longer (28,30,52). To eliminate the possibility of transformation by plasmids containing either large rearrangements caused by recombination (e.g. dimers, substantial deletions, duplications) or smaller deletions within the TRS tracts due to replication errors, all plasmids were subjected to extensive agarose gel purification. The purity and sequence integrity of DNA was determined before transformation using restriction analyses and DNA sequencing. Only plasmids that met the above-mentioned criteria were subsequently used for transformation experiments.
Recombination between Direct Repeats-The recombination behavior of the plasmids shown in Fig. 2 was studied in two E. coli strains that differed in their recombination capacity: AB1157 (parent) and JC10289 (RecA−). For all plasmids, both single colony analyses and population experiments were performed (see "Experimental Procedures"). The plasmids isolated from E. coli AB1157 and JC10289 (Fig. 3, + and − lanes, respectively) were analyzed by AflIII digestion. The unique AflIII recognition site is located between the origin of replication and the Y TRS insert, about 60 bp from the origin of replication. Thus, large rearrangements such as dimerization or deletion of the DNA segment between homologous sequences, which may result from recombination, can be detected. Restriction analyses of plasmids containing direct repeats (pRW4815, pRW4817, pRW4819, pRW4821, pRW4823, pRW4825, and pRW4804) isolated from RecA+ and RecA− E. coli showed bands of 4,500-5,500 bp in size, corresponding to the starting DNA (co-migrating on agarose gel with the plasmids used for transformation; Fig. 3A, lanes C) and shorter DNA fragments at ~2,500-3,000 bp. As revealed by restriction analyses of plasmids isolated from single colonies and DNA sequencing of several clones, the shorter fragments (at 2,500-3,000 bp) correspond to the recombination products between direct repeats, which harbor only one stretch of CTG·CAG repeats and lack the intervening sequence separating the two homologous TRS tracts. The same type of recombination products was observed in the case of pRW4804 containing the homologous non-repeating sequences (Fig. 3A).
Thus, we conclude that the predominant products of recombination between directly repeated CTG·CAG tracts are intramolecular deletions. Moreover, quantitative analyses of the data presented in Fig. 3A, obtained using PhosphorImager scanning of radioactively labeled restriction fragments, showed that the amount of the recombination products was strongly dependent on the length of the CTG·CAG sequence. In the case of pRW4815 and pRW4817, both containing (CTG·CAG)17, the recombination product constituted ~1% of the total DNA isolated. For plasmids harboring (CTG·CAG)67 (pRW4819 and pRW4821) and (CTG·CAG)98 (pRW4823 and pRW4825), the recombination products accounted for ~15 and ~30% of the total DNA, respectively. The plasmids containing 98 CTG·CAG repeats (410 bp of homologous sequence including human myotonic dystrophy flanking sequences and a fragment of the pUC19 polylinker) showed ~20-30 times higher propensity of recombination product formation when compared with the non-repeating 564-bp phage λ DNA (Fig. 3A, compare pRW4823 and pRW4825 with pRW4804).
Quantitative analyses of the data presented in Fig. 3A also revealed that the amount of the recombination products depends on the orientation of the CTG·CAG sequence relative to the origin of replication, with orientation II being more recombination-prone than orientation I (Fig. 3A). This was confirmed later by using a genetic assay to study the frequency of recombination between the direct repeats (see below). The influence of the CTG·CAG orientation on the recombination frequency suggests an important involvement of replication mechanisms such as polymerase pausing and induction of DNA nicks in intramolecular recombination between the CTG·CAG sequences.
The products of intramolecular deletion between the direct repeats were detected in both RecA+ and RecA− strains; however, E. coli JC10289 (RecA−) exhibited a lower recombination propensity than the isogenic RecA+ cells (Fig. 3A, compare lanes + with −). These results are in agreement with previous studies (19,45) showing that intramolecular plasmid recombination does occur efficiently independent of the recA gene function, although the presence of RecA increases the frequency of this process.
Furthermore, the spectrum of recombination products is different for plasmids containing short stretches of CTG·CAG as compared with the long tracts. Digestion of the recombination products from pRW4815 and pRW4817 (containing (CTG·CAG)17) with AflIII showed a single band (within the resolution of the agarose gel), but recombination between the homologous sequences harboring 67 and 98 CTG·CAG repeats gave a set of products spanning a distance of at least 500 bp (Fig. 3A). This effect might be due to the higher instability of the longer TRS tracts present in the recombination products; however, it is more likely that the size variability of the CTG·CAG tracts in the recombination products increases with the length of the recombining homologous sequences.

FIG. 3. Restriction analyses of products of the intramolecular recombination events between direct or inverted repeats or non-homologous sequences. Plasmids were isolated from E. coli AB1157 (RecA+) and JC10289 (RecA−) cultures that were grown until the late log phase. The DNA was linearized with AflIII, end-labeled, and electrophoretically separated through 1% agarose gels in TAE buffer. The starting material (the plasmids used for transformation of the E. coli strains) is indicated as C. Recombination proficient and deficient strains are shown as + or −, respectively. The 1-kbp DNA ladder (Invitrogen) was used as a size marker, and the sizes of these bands are indicated (left sides of gels). A, the results of AflIII digestion of plasmids containing direct repeats. Brackets designate the full-length plasmids (~5 kbp) as well as the products of intramolecular deletion due to recombination between the repeated tracts (at ~3 kbp). B, AflIII digestion products of plasmids containing the inverted repeats. C, AflIII digestion products of control plasmids containing non-homologous TRS.

plasmids with one CTG·CAG tract may have a growth advantage over cells harboring the larger parental plasmids with two TRS tracts. Second, the differences in size and the number of TRS present in the replicon could influence the replicative advantage (copy number) of one type of plasmid over the other. Thus, studies were conducted to evaluate the magnitude of these potential influences.
The growth curves of E. coli AB1157 and JC10289 host strains harboring plasmids with two of the longest, uninterrupted TRS tracts studied (pRW4863 and pRW4865, with two (CTG·CAG)165 tracts in orientations I and II, respectively) were compared with the growth curves of bacteria harboring recombination products with single TRS tracts of 21, 58, and ~92 CTG·CAG repeats. The doubling time (t2) calculated during the exponential phase of growth was almost identical for E. coli AB1157 harboring pRW4863 (t2AB = 21.7 ± 1.3 min), pRW4865 (t2AB = 22.0 ± 1.4 min), and for bacteria that harbored recombination products (22.7 ± 1.4, 21.8 ± 1.8, and 23.7 ± 2.1 min, for plasmids containing the single tracts of 21, 58, and 92 repeats, respectively). The doubling time of E. coli JC10289 was lengthened by 15-25% for all plasmids studied. An approximately 20% difference in t2 between AB1157 and JC10289 was observed regardless of the presence of the plasmid. The doubling time of bacteria harboring non-repeating DNA sequences (pRW4804) was ~5-10% shorter than for E. coli harboring plasmids with CTG·CAG repeats. There was no statistical difference in t2 between pRW4804 (19.7 ± 1.8 min) and the recombination product of this plasmid (20.9 ± 0.8 min). Thus, under our experimental conditions, no growth advantage of cells harboring the recombination products over cells harboring the recombination substrates was observed. These results are in agreement with the previous findings (30) that cells harboring plasmids with a shorter TRS tract ((CTG·CAG)17) do not have a growth advantage over cells containing plasmids with a (CTG·CAG)175 tract, so long as the cultures are maintained in the exponential phase of growth (even for several generations). In contrast, the growth advantage was pronounced after E. coli passed through the stationary phase (30), a condition never employed in our studies.
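The text does not state how the doubling times were computed. As a minimal sketch, assuming t2 is obtained from a log-linear least-squares fit of optical density readings taken during exponential growth, the calculation could look like the following. The function name and the readings are hypothetical and are not data from this study.

```python
import math

def doubling_time(times_min, od_values):
    """Estimate the doubling time (min) from exponential-phase OD readings.

    Fits ln(OD) = ln(OD0) + k * t by ordinary least squares and returns
    ln(2) / k. This fitting procedure is an assumption, not the paper's
    stated method.
    """
    if len(times_min) != len(od_values) or len(times_min) < 2:
        raise ValueError("need at least two paired time/OD readings")
    logs = [math.log(od) for od in od_values]
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_min, logs))
    k = sxy / sxx  # specific growth rate, per minute
    if k <= 0:
        raise ValueError("culture is not growing")
    return math.log(2) / k

# Hypothetical readings for a culture constructed to double every 22 min.
times = [0, 11, 22, 33, 44]
ods = [0.05 * 2 ** (t / 22) for t in times]
print(round(doubling_time(times, ods), 1))
```

Comparing values obtained this way for cells carrying substrate and product plasmids corresponds to the comparison of t2 described above.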
In order to determine the difference in the replication propensities of plasmids with one or two TRS tracts, the copy numbers of pRW4865 (with two (CTG·CAG)165 tracts), pRW4815 (with two (CTG·CAG)17 tracts), and the recombination product (with one CTG·CAG sequence of 21 repeats) were analyzed. By using the detergent lysis method (39,40), we observed that the plasmid copy number is ~10% higher for the recombinant plasmid (148 ± 9 copies per genome) than for pRW4865 (134 ± 7 copies per genome) or pRW4815 (129 ± 7 copies per genome) in E. coli AB1157. Thus, the copy number of the plasmid harboring a pair of short (CTG·CAG)17 tracts is very similar to the copy number of the plasmid carrying two long 165-repeat tracts. We can also conclude that the difference in copy numbers between recombination substrates and products is negligible.
It should be noted that recombination products (~3 kbp plasmids) do not form multimeric forms (dimers, trimers, etc.) with a high efficiency while maintained in E. coli AB1157 (RecA+). However, recombination substrates are capable of forming large amounts of multimeric forms in the recombination proficient cells. Considering the oligomeric states of the plasmids, 134 copies of pRW4865 and 129 copies of pRW4815 account for 283 and 277 monomer equivalents, respectively. Thus, the copy number as well as the number of monomer equivalents does not depend on the length of CTG·CAG tracts (in the range of 17-165 repeats). Determination of the copy number of the same plasmids in JC10289 (RecA−) revealed that both recombination substrates and products are maintained in E. coli at almost the same copy number.
These experiments showed clearly that parental plasmids with long repeat tracts do not have a replication disadvantage compared with the recombination products. Moreover, due to the approximately two times higher amount of monomer equivalents present in the recombination proficient cells carrying plasmids with two TRS tracts, the frequency of recombination events leading to the formation of the smaller plasmids (calculated from the data presented in Fig. 3A) may be underestimated rather than overestimated.
In summary, these data show only negligible effects of both the replicative advantage of recombinant plasmids over parental plasmids and the growth advantage of cells containing smaller plasmids with one TRS tract on the outcome of the population experiments shown in Fig. 3. On the other hand, the data obtained from the biochemical approach must be interpreted cautiously due to their lower precision, sensitivity, and statistical significance compared with the genetic assay. Therefore, we established a genetic assay for determining the frequency of intramolecular recombination between two TRS tracts (see below).
Recombination between Inverted Repeats-Experiments with plasmids containing the inverted CTG·CAG repeats were carried out under conditions identical to those described above for plasmids harboring direct repeats. In contrast to the plasmids containing direct repeats, when the two TRS tracts were oriented as inverted repeats, no products of intramolecular deletions were ever observed (Fig. 3B). This result was expected (18,24,44).
The predicted product of recombination between inverted repeats is a simple intramolecular inversion as shown in Fig. 1 (right panel) (44,48). More than two hundred colonies from recombination studies with pRW4816, pRW4818, pRW4820, pRW4822, pRW4824, pRW4826, and pRW4805 along with DNAs isolated from the population experiments were analyzed in both RecA+ and RecA− strains. NheI/AatII digestion was used to identify the inversion products. These two sites flank the X TRS, and the analysis of the starting DNA should give rise to two fragments that are ~550-800 and 4100-4350 bp long (the size depends on the number of CTG·CAG repeats present). If the products of intramolecular inversion due to recombination are formed, two new bands should be detected on agarose gels (~2450-2700 and 2200-2450 bp in length). Unexpectedly (44,48), intramolecular inversions were never detected (data not shown).
Intramolecular recombination between inverted repeats can also lead to the formation of a head-to-tail dimer with complex DNA rearrangements (18,24). This kind of recombination product would be easily detected by AflIII digestion as well as by electrophoresis of supercoiled plasmid DNA on agarose gels. Neither of these two approaches showed formation of such recombination products.
Several factors can explain the failure to detect recombination between inverted repeats. The lower frequency of recombination between inverted repeats in comparison to direct repeats was reported previously (44); therefore, this process may not be detectable by our radiolabeling methods. Even in the case of non-repeating sequences (pRW4804 and pRW4805), the products of recombination between direct repeats were detected in contrast to those of inverted repeats. By using biochemical methods (restriction digestion and radioactive labeling), we were able to detect products of the recombination events occurring with a frequency of ≥10⁻⁴. Furthermore, it is possible that the intrinsic properties of the TRS sequences (e.g. to form stable DNA structures (2), pause DNA polymerases (51,53,54), or cause double-strand breaks (55-58)) favor a specific recombination/repair pathway, which in our experimental conditions strongly promotes recombination between direct repeats and/or inhibits inverted repeat recombination.

Frequency of Intramolecular Recombination between Direct Repeats Depends on the Length of the CTG·CAG Tracts-The GFP gene was cloned into the region of the plasmids that underwent deletion during recombination (for details see "Experimental Procedures"). The GFP cassette contains the GFPuv variant of the green fluorescent protein, which is expressed in E. coli under the control of the lac promoter (36). The GFP emits strong green fluorescence when irradiated with long wavelength UV light. The detection of the fluorescence does not require exogenous substrates or cofactors and is completely independent of the genetic background of bacterial host cells (59,60). Plasmids containing the GFP gene separating two TRS tracts were transformed into E. coli AB1157 and JC10289. In all experiments the transformations were performed with a large excess of cells to DNA molecules so that transformation should have occurred by a single plasmid molecule (61). The white colonies are formed only when the incoming plasmid undergoes recombination immediately after the transformation, leading to the loss of the GFP gene located between the direct repeats. When the incoming plasmid is established in the host cell and replicates several times, the expression of the GFP gene leads to fluorescent colony formation. The frequency of recombination was measured as the ratio of the number of white colonies to the total number of viable cells (Fig. 4A).
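The frequency defined above is a simple ratio of colony counts. A minimal sketch of that calculation, together with the relative frequency R (frequency divided by that of a reference plasmid), is given below. The function names and the colony counts are hypothetical and serve only to illustrate the arithmetic.

```python
def recombination_frequency(white, fluorescent):
    """Return white colonies divided by all viable colonies (white + fluorescent)."""
    total = white + fluorescent
    if total == 0:
        raise ValueError("no colonies counted")
    return white / total

def relative_frequency(frequency, reference_frequency):
    """Return a frequency expressed relative to that of a reference plasmid."""
    if reference_frequency <= 0:
        raise ValueError("reference frequency must be positive")
    return frequency / reference_frequency

# Hypothetical counts, not data from this study.
reference = recombination_frequency(white=15, fluorescent=7485)
sample = recombination_frequency(white=240, fluorescent=1760)
print(f"reference: {reference:.4f}")
print(f"sample: {sample:.4f}")
print(f"R = {relative_frequency(sample, reference):.0f}")
```

Because the denominator includes both colony types, counting more colonies narrows the uncertainty of the estimate but does not change how it is defined.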
Sixteen plasmids containing the GFP cassette were constructed and used in our experiments (Fig. 4, B and C). The two major factors found to influence the recombination frequency between direct repeats were the length of the TRS tract and the orientation of the CTG·CAG sequence relative to the origin of replication. Fig. 4B shows that plasmids containing short (CTG·CAG)17 tracts in orientation I recombined in E. coli AB1157 with a frequency ~5 times lower than plasmids harboring non-repeating sequences. However, simple inversion of the orientation (pRW4817gfp) caused a statistically significant (p < 0.001) 4-fold increase in recombination propensity of the (CTG·CAG)17 sequences. Thus, the plasmids containing the (CTG·CAG)17 tracts in orientation II recombined with a frequency similar to non-repeating, homologous DNA fragments that were 200-400 bp longer.
The effect of length of the homologous TRS regions on the recombination frequency was dramatic. For CTG·CAG tracts in orientation I, lengthening the recombining sequences to 67, 98, and 165 repeats increased the rate of recombination 6.5, 8, and 60 times, respectively, in comparison to (CTG·CAG)17. A similar effect of the length of the recombining sequences was observed also for plasmids harboring TRS tracts in orientation II (Figs. 4B and 5). Hence, the long CTG·CAG tracts have a much higher propensity for recombination than shorter tracts; moreover, the frequency of recombination between the longest (CTG·CAG)165 sequences studied (pRW4863gfp and pRW4865gfp) was ~7-10 times higher (p < 0.001) than the frequency observed for non-repeating sequences of comparable length (pRW4804gfp). In addition, the level of recombination between the 564-bp long phage DNA fragments was only slightly higher (11.9 × 10⁻³) than for the 354-bp DMPK gene fragments (9.2-10.4 × 10⁻³). This statistically insignificant difference (p = 0.07) suggests that the frequency of recombination does not depend on the length of the recombining fragments in the case of non-repeating DNA sequences. These results are in agreement with previous studies (21) showing that the frequency of recombination between non-repeating DNA sequences (fragments of the tetracycline resistance gene) oriented as direct repeats increases as the length of the homologous sequences increases from 14 to 100 bp. Further lengthening of the repeats (up to 854 bp) had little or no effect on the recombination frequency (21).
It should be pointed out that the rate of intramolecular recombination between long CTG·CAG tracts was extraordinarily high; recombinants were found with a frequency of 1.5-12.6% (for plasmids containing 98 and 165 CTG·CAG repeats). Therefore, the recombination products could be easily detected and visualized in plasmids isolated from population experiments, as shown in Fig. 3A.
FIG. 4. After transformation, the E. coli cells were plated onto LB plates containing ampicillin (100 μg/ml) and incubated for 16 h at 37°C. The fluorescence of the colonies harboring the nonrecombined plasmids was detected by exposing the colonies to a long-wave UV lamp. B, the recombination frequency was measured as the ratio of the number of white colonies to the total number of viable cells (both fluorescent green and white colonies). For each plasmid, three or more independent experiments were performed, and at least 7,000 colonies were counted, except for pRW4863gfp and pRW4865gfp where ~2,000 colonies were counted. The frequency was calculated as the mean of the data collected from all experiments. R represents relative frequency of recombination and is calculated relative to the frequency of recombination observed for pRW4815gfp harboring a pair of (CTG·CAG)17 inserts. C, the frequency of white colony formation in experiments conducted with control plasmids in E. coli AB1157.
CTG·CAG Tracts in Orientation II Are More Susceptible to Recombination-Although CTG·CAG tracts stimulate recombination in both orientations, a pronounced orientation dependence of the frequency was observed. The frequencies of recombination were 4, 2, and 3.5 times higher for plasmids containing (CTG·CAG)17, (CTG·CAG)67, and (CTG·CAG)98 in orientation II, respectively, than for plasmids harboring repeats of the same length but in orientation I (p < 0.001). Surprisingly, in the case of the longest tracts studied ((CTG·CAG)165), the orientation dependence was found to be the opposite. However, the frequency of recombination for both pRW4863gfp and pRW4865gfp was much higher than for plasmids harboring shorter tracts. The plasmid containing (CTG·CAG)165 in orientation I showed a higher recombination propensity than the plasmid with (CTG·CAG)165 in orientation II (126 × 10⁻³ versus 81 × 10⁻³). The reason for this finding is uncertain, but we believe that the instability of the CTG·CAG tracts contributes to this behavior. Long, uninterrupted CTG·CAG sequences (even containing 67 or 98 repeats) are extremely unstable in plasmids cultivated in E. coli. In addition, the CTG·CAG repeats in orientation II undergo deletions at a much higher rate than those in orientation I. Both pRW4863gfp and pRW4865gfp harbor the very long (CTG·CAG)165 sequences; thus, it is essentially impossible to stably maintain them in E. coli. Although the preparation of pRW4863gfp (orientation I) used in these experiments contained only 5-10% deletions, the pRW4865gfp preparation (orientation II) contained ~30-35% deletions (estimated by restriction digestion followed by DNA labeling and calculated as the total amount of deletions from both TRS inserts). This difference in the TRS stability is likely responsible for the apparent lower frequency of recombination observed between the two (CTG·CAG)165 inserts in orientation II. Thus, if this extreme level of instability had not been encountered for the DNAs with orientation II, we anticipate that frequencies of ~180 × 10⁻³ would have been observed, amounting to a 36-fold enhancement compared with the shortest CTG·CAG tracts.
As expected, the formation of white colonies due to recombination was not detected for plasmids harboring inverted repeats (pRW4820gfp and pRW4822gfp) as well as for plasmids containing a CTG·CAG tract in the same plasmid as the isomeric GTC·GAC repeats (pRW4830gfp and pRW4831gfp, Fig. 4C). In the case of pBR322gfp, which contains no homologous repeating sequences, a single white colony was found (~30,000 colonies were screened), and the restriction analysis of DNA isolated from that colony showed the existence of a point mutation (1 or 2 nucleotide deletion) within the GFP gene, which was obviously a sporadic event.
Intramolecular Recombination Is Independent of RecA-Intramolecular recombination experiments with plasmids containing CTG·CAG repeats were done in both recombination proficient and deficient E. coli strains. In contrast to the significant reduction of the intermolecular recombination frequency by recA gene knockout (14,15), intramolecular plasmid recombination is known to proceed efficiently in recA-deficient strains (19,45).
Similar to the results obtained with E. coli AB1157, the recombination frequency in JC10289 (recA) was strongly dependent on the length and orientation of the recombining CTG·CAG tracts (Figs. 4B and 5). The recA gene inactivation reduced the overall rates of intramolecular recombination by 2-11-fold in comparison to the isogenic recombination proficient E. coli cells. In the case of plasmids containing shorter CTG·CAG tracts (17 and 67 repeats), the effect of recA deletion was modest (2-4-fold decrease in recombination frequency). However, a stronger, 3-11-fold reduction in the recombination rate was detected for plasmids harboring (CTG·CAG)98 and (CTG·CAG)165. These results are in agreement with the previous studies, where the effect of recA mutation on intramolecular recombination varied from a 0- to a 40-fold decrease in frequency and was predominantly dependent on the length of the recombining sequences (reviewed in Ref. 17).
Instability of the CTG·CAG Sequences in the Recombination Products-To study the instability of the CTG·CAG tract resulting from intramolecular recombination, plasmids were isolated from white colonies and analyzed by agarose gel electrophoresis. The electrophoretic migration of recombinants showed that all plasmids (~400 DNA samples) isolated from white colonies lost a significant portion (~2 kbp) of the vector backbone (Fig. 6A). Other types of rearrangements were not observed. In order to characterize further the structure of the recombinants, a total of 37 individual recombination products, representing plasmids harboring TRS of different lengths, were subjected to DNA sequencing. The sequence analyses revealed that all recombination products studied harbored only one TRS tract flanked by the human myotonic dystrophy sequences and fragments of the polylinker (Fig. 6B). The smallest recombination product analyzed carried only 13 triplet repeats (Fig. 6B), whereas the largest entirely sequenced TRS tract contained 165 CTG·CAG repeats. This expansion product was created by a recombination event between two (CTG·CAG)98 inserts. In the case of recombinants containing very short CTG·CAG inserts, we were able to analyze, using a single sequencing reaction, the entire TRS tract along with the human flanking sequences and segments of the polylinker (Fig. 6B). In addition, pBR322 sequences that were originally separated in the parental plasmids by a distance exceeding 2 kbp (Fig. 6B, E and P sites) could also be detected. Thus, these results proved that recombination occurred between two homologous TRS inserts resulting in the deletion of the intervening sequence.
FIG. 5. The effect of length and orientation on frequency of intramolecular recombination between direct repeat tracts. Circles represent the plasmids containing the CTG·CAG cloned in orientation I (pRW4815gfp, pRW4819gfp, pRW4823gfp, and pRW4863gfp); diamonds represent the plasmids containing the CTG·CAG cloned in orientation II (pRW4817gfp, pRW4821gfp, pRW4825gfp, and pRW4865gfp); the triangles represent the plasmids containing the nonrepeating DNA sequences (pRW4804gfp, pRW4871gfp, and pRW4873gfp). Data from experiments performed in E. coli AB1157 are shown on the main graph; the inset shows the data from E. coli JC10289 (RecA−). The standard deviations are shown by the error bars. The homologous sequences are composed of the CTG·CAG repeats as well as the human flanking sequences and segments of the polylinker.
In order to analyze the size of the TRS inserts in a large number of recombinants, plasmids were subjected to SphI/BamHI restriction digestion followed by end-labeling and polyacrylamide gel electrophoresis (Fig. 7). Although the parental plasmids contain seven recognition sites for SphI/BamHI, the recombinants harbor unique SphI and BamHI restriction sites (the remaining five are lost during recombination along with the 2 kbp of intervening sequence (Fig. 6B)). Therefore, digestion by these restriction enzymes splits the recombinant plasmids into an ~2.3-kbp pBR322 fragment (identical for all plasmids studied, containing the origin of replication and the ampicillin resistance gene; Fig. 6B) and the CTG·CAG-containing inserts (Fig. 7, A-D).
At least 40 individual isolates from each recombination experiment were analyzed by SphI/BamHI restriction digestion. Fig. 7 shows the polyacrylamide gel analyses of typical data obtained for pRW4815gfp, pRW4819gfp, pRW4823gfp, and pSF3gfp (all plasmids contain the CTG·CAG tracts in orientation I). There were no significant differences in the overall TRS size distributions in the recombination products from plasmids harboring CTG·CAG sequences in orientations II and I. In the case of pRW4815gfp and pRW4817gfp, the progenitor (starting length (CTG·CAG)17) was retained in more than 80% of the recombination products analyzed (Figs. 7A and 8). Only 5 isolates harbored longer tracts, with (CTG·CAG)23 being the longest (Fig. 7A, 12th lane), and 7 clones contained deleted products (12 repeats was the shortest). These data imply that the homologous, non-repeating flanking sequences may be an important factor for recombination between short repeats (particularly (CTG·CAG)17).
The most interesting recombinational behavior was found for the plasmids harboring (CTG·CAG)67 sequences (Figs. 7B and 8). Analyses of the pRW4819gfp and pRW4821gfp recombination products (Fig. 7B) revealed that more than 30% contained expanded CTG·CAG tracts. Deletions and retentions of the progenitor insert length were detected in ~30 and 40% of analyzed clones, respectively (Fig. 8). The smallest CTG·CAG tract found among the recombination products had 15 repeats, and the longest, expanded CTG·CAG tract contained 130 repeats.
For plasmids containing 98 or 165 CTG·CAG repeats (pRW4823gfp, pRW4825gfp, pRW4863gfp, and pRW4865gfp), only ~5% of the isolates maintained the size of the progenitor sequence; the majority of the clones analyzed (70-80%) harbored deleted CTG·CAG sequences (Figs. 7C and 8). Thus, these sequences were very prone to deletions, and expansions were observed infrequently. The longest TRS tracts found in the recombination products had 172 and 289 CTG·CAG repeats for pRW4823gfp and pRW4863gfp, respectively.
Hence, we conclude that the intramolecular recombination between CTG·CAG repeats results in the genetic instabilities of the TRS tracts.
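The size distributions summarized above sort each recombinant into one of three classes relative to the progenitor tract length. A minimal sketch of that bookkeeping is shown below; the function name, the `tolerance` parameter, and the repeat counts are hypothetical and are not taken from the gels in Fig. 7.

```python
from collections import Counter

def classify_tracts(repeat_counts, progenitor, tolerance=0):
    """Return the percentage of clones with deleted, retained, or expanded tracts.

    tolerance is a hypothetical allowance (in repeats) for sizing error when
    deciding whether a tract has kept the progenitor length.
    """
    if not repeat_counts:
        raise ValueError("no clones to classify")
    counts = Counter()
    for n in repeat_counts:
        if n < progenitor - tolerance:
            counts["deletion"] += 1
        elif n > progenitor + tolerance:
            counts["expansion"] += 1
        else:
            counts["retention"] += 1
    total = len(repeat_counts)
    return {
        label: 100 * counts[label] / total
        for label in ("deletion", "retention", "expansion")
    }

# Hypothetical repeat counts for ten clones derived from a 67-repeat progenitor.
clones = [15, 40, 50, 67, 67, 67, 67, 90, 110, 130]
print(classify_tracts(clones, progenitor=67))
```

A nonzero `tolerance` may be appropriate when tract lengths are read from gel mobility rather than from sequence, since neighboring repeat numbers are then hard to resolve.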
Effect of Interruptions-It was demonstrated previously (24) that as little as 2.8% of heterology might reduce the frequency of recombination more than 1000-fold between non-repeating sequences in an E. coli plasmid system. The GFP gene-containing derivative of pSF3 (Fig. 2) was used to analyze the influence of sequence interruptions on the frequency of intramolecular recombination between TRS tracts. pSF3gfp harbors two CTG·CAG inserts in orientation I, 98 and 175 repeats in length. The longer tract contains two G to A interruptions at repeats 28 and 69 (see "Experimental Procedures"), but the other tract has no interruptions. Note that all TRS tracts studied herein (Fig. 2) containing 165 or fewer repeats were uninterrupted, whereas (CTG·CAG)175 was interrupted (see "Experimental Procedures"). The frequency of recombination for pSF3gfp was ~2-fold higher than for pRW4823gfp containing two (CTG·CAG)98 tracts in orientation I (both with no interruptions) and ~4-fold lower than the recombination frequency observed in the case of pRW4863gfp ((CTG·CAG)165). In fact, the insert harboring 175 repeats with two interruptions contains a tract of 106 pure CTG·CAG repeats, which is long enough to efficiently recombine with the (CTG·CAG)98 sequence. This result showed that the presence of these two interruptions has no influence on the frequency of intramolecular recombination between the CTG·CAG repeats.
Although the presence of these two interruptions had no influence on the frequency of recombination, their effect on the length of the CTG·CAG tracts in the recombination products was pronounced (Figs. 7D and 8). Restriction analyses of the recombination products revealed that ~60% of clones contained CTG·CAG inserts of 175 repeats or longer, whereas for pRW4823gfp (containing two uninterrupted (CTG·CAG)98 inserts), less than 30% of the recombinants had 98 or more CTG·CAG repeats (Fig. 7, C and D).
FIG. 6. Analysis of products of intramolecular recombination between CTG·CAG repeats. A, agarose gel analysis of the plasmid DNAs isolated from the single colonies from E. coli AB1157 after transformation with pRW4823gfp. Lanes 1-6, DNA isolated from white colonies; lanes 7-12, plasmids isolated from fluorescent colonies. The sizes of bands of the supercoiled DNA ladder (Sc) are shown on the left. B, sequence analysis of an intramolecular recombination product harboring 13 CTG·CAG repeats. A schematic diagram of the recombinant plasmid is shown on the right. The recognition sites for BamHI and SphI used to determine the sizes of TRS tracts in the recombination products (Fig. 7) are indicated. E and P indicate the cloning sites for the X TRS insert (EcoRI) and the Y TRS insert (PvuII), respectively. The numbers in parentheses represent the original positions of these restriction sites on the pBR322 map.
FIG. 7. The influence of CTG·CAG length and sequence interruptions on recombination-mediated instability. The plasmids were isolated from white colonies and were digested with BamHI/SphI to release the TRS-containing inserts. Labeled DNA fragments were separated by 5-7% PAGE to determine the lengths of the CTG·CAG sequences. Each panel shows the analysis of ~30 individual colonies. A 1-kbp ladder and TRS size standard (M) were used to determine the lengths of the CTG·CAG tracts. The TRS size standard (bands identified by arrows) contains four BamHI/SphI fragments containing 17, 67, 98, and 175 CTG·CAG repeats. A, products of recombination between two (CTG·CAG)17 tracts (pRW4815gfp). B, products of recombination between two (CTG·CAG)67 tracts (pRW4819gfp). C, products of recombination between two (CTG·CAG)98 tracts (pRW4823gfp). D, products of recombination between two TRS tracts containing (CTG·CAG)175 and (CTG·CAG)98 (pSF3gfp).
Similar findings regarding the influence of interruptions on intermolecular recombination between CTG·CAG sequences were described recently (14,15). The results obtained using a two-plasmid system demonstrated that both multiple fold expansions and an increase in the frequency of recombination were observed when one of the recombining sequences (usually cloned into pACYC) contained an interrupted CTG·CAG tract and the other of the recombining plasmids harbored an uninterrupted CTG·CAG tract. The presence of two interruptions in each of the recombining CTG·CAG sequences reduced the frequency of recombination as well as inhibited the formation of recombination-mediated TRS expansions (14,15).
DISCUSSION
Previous studies showed that repetitive sequences, such as interspersed Alu repeats (62) as well as tandemly repeated minisatellites (63,64) and microsatellites (25,65-68), exhibit a higher recombination capacity than non-repeating DNA tracts. Also, it has been postulated that TRSs, such as CTG·CAG repeats, may be recombination hot spots (69,70).
The principal conclusions from our studies are the following. First, long CTG·CAG microsatellites are preferred sites of intramolecular recombination in E. coli. The frequency of recombination between two directly repeated CTG·CAG tracts is up to ~10 times higher than between two non-repeating sequences (phage DNA and DMPK DNA) of similar length in recombination proficient cells. Second, when the TRS tracts are oriented as inverted repeats, no products of homologous recombination were observed. Third, the effect of length of the homologous CTG·CAG tracts on the recombination frequency is dramatic. We found that increasing the length of the homologous sequences from 17 to 165 CTG·CAG repeats produced a 60-fold increase in the recombination frequency between direct repeats. This effect was similar for TRS tracts in orientation I and in orientation II. Fourth, a pronounced orientation dependence in the frequency was observed, although directly repeated CTG·CAG tracts recombined efficiently in both orientations relative to the origin of replication. TRS tracts present in orientation II (CTG repeats on the lagging strand template) are more susceptible to recombination. Fifth, intramolecular recombination between CTG·CAG tracts was observed in both parental and recA E. coli strains, but the frequency of this process was elevated by 2-11-fold in the recombination proficient cells. This effect is dependent on the length of the recombining sequences. Sixth, intramolecular recombination between CTG·CAG tracts led to high genetic instability (deletions and expansions) of the repeating sequences.
Several features of the TRS tracts may contribute to their recombinogenic behavior. During the recombination process, two CTG·CAG repeat tracts can hybridize with each other in many registers; in contrast, the two control, non-repeating sequences (564-bp phage DNA and 354-bp DMPK DNA) can align only in one frame. The number of possible alignments between two homologous TRS tracts increases with the number of repeats present in the recombining sequences, which can have an influence on the kinetics of the synapsis step of homologous recombination.
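The growth in the number of registers can be illustrated with a simple count. The sketch below assumes that one tract slides along the other in whole repeat units and that a register counts only if at least `min_overlap` repeats are paired; both the function and that threshold are hypothetical illustrations, not a model proposed in this study.

```python
def alignment_registers(n_repeats_a, n_repeats_b, min_overlap=1):
    """Count whole-repeat registers in which two tandem-repeat tracts can pair.

    min_overlap is a hypothetical parameter: the fewest repeat units that must
    be paired for a register to be counted.
    """
    shortest = min(n_repeats_a, n_repeats_b)
    if min_overlap < 1 or min_overlap > shortest:
        raise ValueError("min_overlap must lie between 1 and the shorter tract length")
    # Slide tract B along tract A one repeat unit at a time and count the
    # offsets that keep at least min_overlap repeat units paired.
    return n_repeats_a + n_repeats_b - 2 * min_overlap + 1

# Registers for pairs of equal-length tracts of the sizes studied here.
for n in (17, 67, 98, 165):
    print(n, alignment_registers(n, n))
```

Under these assumptions the count grows linearly with tract length, whereas a pair of non-repeating homologous fragments has a single register regardless of length.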
In contrast to the extremely frequent intramolecular events between direct repeats, we were unable to detect any products of recombination between the inverted repeats. Thus, recombination between head-to-head-oriented TRS was reduced by at least 100-fold compared with the head-to-tail-oriented CTG·CAG inserts. These results are consistent with previous data from studies on plasmid (44) and chromosome (71,72) recombination in prokaryota, yeast (73), mammalian chromosomes (74), and mammalian extrachromosomal elements (75). A dramatic difference in the recombination frequency between direct and inverted repeats was also observed for site-specific recombination systems such as the Tn3 resolvase (76-78). For site-specific recombination systems, the orientation dependence was attributed to the geometry of the DNA (78). Studies on recombination between inverted repeats in the Salmonella typhimurium chromosome showed that the inversion process depended predominantly on the chromosomal localization of the head-to-head-oriented repeats (71). Some DNA sequences separating the recombining repeats were shown to be permissive for recombination while others did not stimulate recombination (nonpermissive) (71,79). The most comprehensive studies in E. coli plasmid systems to resolve the orientation dependence revealed that homologous recombination between inverted repeats occurs predominantly via nonconservative pathways (44,80). Thus, homologous recombination leads to the formation of linear, inviable recombinants from circular plasmid substrates harboring inverted repeats (44). We favor the idea, suggested earlier for recombination between nonrepeating sequences in yeast (73), that there is more than one efficient recombination pathway leading to the high frequency of intramolecular deletions observed for directly repeated CTG·CAG sequences. In contrast, only a completely conservative event (reciprocal exchange) can cause the formation of the predicted inversion between two homologous TRS tracts in the head-to-head orientation.
Intramolecular recombination depends on the relative orientation of recombining TRS inserts to each other (direct and inverted tracts) as well as on the orientation of the pair of the direct repeats relative to the origin of replication. A significantly higher frequency of recombination was observed for the pair of CTG·CAG repeats in orientation II (when the CTG repeats are on the lagging strand template). These results confirm a tight connection between formation of the stable secondary structures by CTG·CAG repeats, replication arrest, and recombination. A higher propensity for recombination between CTG·CAG tracts in orientation II is in agreement with the formation of more stable hairpin structures by CTG repeats. Furthermore, the arrest of the replication fork progression occurred in vivo, predominantly when the CTG sequence was present on the lagging strand template (in orientation II) (54).
FIG. 8. [...] Fig. 7) were measured, and the numbers of CTG·CAG repeats were calculated as described earlier (29). Black bars, deletions; gray bars, retention of size of progenitor sequence; white bars, expansions. The orientation of the CTG·CAG tracts relative to the origin of replication in the parental plasmids had no influence on TRS size distribution of the recombination products. Therefore, data obtained for plasmids harboring the same number of triplet repeats but present in different orientations were combined and plotted together in a single bar. Each bar represents the data collected from the analysis of ~90 clones.
Intramolecular recombination between directly repeated CTG·CAG tracts also occurs efficiently in RecA⁻ E. coli. However, the presence of the recA gene product increases, in a length-dependent manner, the rate of intramolecular deletion by 2- to 11-fold in comparison to the isogenic RecA⁻ cells. Previous studies (17) revealed the capacity of RecA⁻ cells to carry out intramolecular recombination. Hence, we conclude that at least two types of recombination pathways, RecA-dependent as well as RecA-independent, are responsible for recombination between CTG·CAG tracts in E. coli. Intramolecular deletions between direct repeats can occur by a RecA-independent single-strand annealing pathway (21,47,81,82). Previous studies (22,47,82) also showed that these types of recombination products could result from slippage during DNA replication. Therefore, we cannot exclude that replication misalignment may be partially responsible for the intramolecular deletions between direct repeats observed in our study. In addition, RecA-dependent mechanisms such as crossing-over and half-crossing-over (44,80) may contribute significantly to the high frequency of recombination observed in the case of long CTG·CAG sequences.
Recent data (14,15,27) showed that recombination between TRS tracts leads to large scale deletions and expansions within tandem repeat sequences at high frequency. We also found that intramolecular recombination between long, uninterrupted CTG·CAG tracts is a source of great instability of the TRS tracts. As expected (14,15), recombination had no significant influence on the stability of short inserts containing 17 CTG·CAG repeats. In addition, the presence of G to A interruptions in one of the recombining sequences reduced the length variability of the CTG·CAG tracts observed in recombination products. We suggest that the interruptions disturb the homogeneity of the CTG·CAG repeat units and thus decrease the number of possible alignments between the two recombining TRS tracts, thereby leading to the genetic stabilization of the repeating sequence.
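The alignment argument can be illustrated with a toy count. The sketch below is not part of the assay described here; it assumes, purely for illustration, that two tracts can pair only in register (shifted by whole repeat units), that the overlapping units must match exactly, and that a minimum overlap is needed. The function name, the 20-repeat tract length, the position of the interrupted unit (written as CTA), and the 5-unit minimum overlap are all hypothetical choices.

```python
def count_perfect_alignments(tract_a, tract_b, min_overlap):
    """Count in-register offsets at which the overlapping repeat units
    of two tracts are identical (toy model, exact matching assumed)."""
    count = 0
    # offset = index in tract_a that is paired with unit 0 of tract_b
    for offset in range(-(len(tract_b) - 1), len(tract_a)):
        start_a = max(offset, 0)
        start_b = max(-offset, 0)
        overlap = min(len(tract_a) - start_a, len(tract_b) - start_b)
        if overlap < min_overlap:
            continue  # too little homology to count as an alignment
        seg_a = tract_a[start_a:start_a + overlap]
        seg_b = tract_b[start_b:start_b + overlap]
        if seg_a == seg_b:
            count += 1
    return count


# Hypothetical tracts: 20 pure CTG units versus 20 units with one
# G-to-A interruption (CTG -> CTA) placed arbitrarily at unit 10.
pure = ["CTG"] * 20
interrupted = ["CTG"] * 20
interrupted[9] = "CTA"

MIN_OVERLAP = 5  # assumed minimum number of paired units

n_pure = count_perfect_alignments(pure, pure, MIN_OVERLAP)
n_interrupted = count_perfect_alignments(pure, interrupted, MIN_OVERLAP)
print("pure x pure:", n_pure)
print("pure x interrupted:", n_interrupted)
```

In this simplified model a single interrupted unit removes every alignment whose overlap spans it, so the interrupted pair has fewer permitted registers than the pure pair. Real recombination tolerates some mismatches, so the count is only a qualitative illustration of why interruptions would narrow the range of product lengths.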
This work, as well as the accompanying article (27) on the frequency of intermolecular recombination between long CTG·CAG sequences, shows their very high recombination potential in E. coli. Their recombination hot spot characteristics and their capacity to expand by recombination (14,15,26,27) may be responsible for the genome instabilities observed in humans. Several cases of the involvement of recombination processes in TRS expansions in humans have been described (reviewed in Refs. 14 and 83). Also, given the statistical overrepresentation of trinucleotide microsatellites in eukaryota (84), the frequent recombination events between TRS tracts may be a source of mutations (deletions and inversions) leading to genetic diseases. In addition, recombination between TRS may have an important evolutionary role (70) by promoting rearrangements of genetic information within different loci, leading to the formation of novel genes.