Involvement of the Nucleotide Excision Repair Protein UvrA in Instability of CAG·CTG Repeat Sequences in Escherichia coli

Several human genetic diseases have been associated with the genetic instability, specifically expansion, of trinucleotide repeat sequences such as (CTG)n·(CAG)n. Molecular models of repeat instability imply replication slippage and the formation of loops and imperfect hairpins in single strands. Subsequently, these loops or hairpins may be recognized and processed by DNA repair systems. To evaluate the potential role of nucleotide excision repair in repeat instability, we used a genetic assay for deletions to measure the rates of repeat deletion in wild type and excision repair-deficient Escherichia coli strains. The rate of triplet repeat deletion decreased in an E. coli strain deficient in the damage recognition protein UvrA. Moreover, loops containing 23 CTG repeats were less efficiently excised from heteroduplex plasmids after their transformation into the uvrA⁻ strain. As a result, an increased proportion of plasmids containing the full-length repeat was recovered after the replication of heteroduplex plasmids containing unrepaired loops. In biochemical experiments, UvrA bound to heteroduplex substrates containing repeat loops of 1, 2, or 17 CAG repeats with a Kd of about 10-20 nM, which is an affinity about 2 orders of magnitude higher than that of UvrA bound to the control substrates.

Repetitive DNA sequences in the human genome are clearly involved in disease-associated mutations or genetic rearrangements (1-4). Several human genetic diseases, including Huntington's disease, fragile X syndrome, myotonic dystrophy, and Friedreich's ataxia, have been associated with the genetic instability (length expansion) of trinucleotide repeat sequences such as (CTG)n·(CAG)n, (CGG)n·(CCG)n, and (GAA)n·(TTC)n (see Refs. 5 and 6 for review). Some types of cancer, such as prostate cancer and nonpolyposis colorectal cancer, are also associated with the instability of repeat sequences (7-9). Certain molecular models of repeat instability have implied a role for alternative DNA structures in aberrant DNA replication, repair, and recombination (10). Repeat instability may also involve replication slippage, which would be associated with the formation of loops and imperfect hairpins in single strands (11-13). The data available to date indicate that both short (1-3 repeats) and long (likely in multiples of 40 repeats) repeat tracts may loop out (14-18). The extent of hairpin-stabilizing internal hydrogen bonding in looped-out strands depends on their length and sequence (13, 19, 20). For example, (CTG)n and (CGG)n form more stable hairpins than their corresponding complements, (CAG)n and (CCG)n. Subsequent loop/hairpin recognition and processing by some postreplicative mechanism are likely involved (21). The identities of the systems responsible for loop repair remain unclear.
Whereas methyl-directed mismatch repair (22-24) is clearly involved in very short repeat instability, one would not expect it to have a significant effect on large loop processing, because the methyl-directed mismatch repair system is only able to recognize mismatches/heteroduplexes as large as 3 nucleotides (25, 26). Indeed, mutations in various methyl-directed mismatch repair functions in bacteria, yeast, and humans did not change their ability to remove large nonrepeating loops (27-29). The SbcCD nuclease, which has been previously shown to be responsible for cleaving hairpins in vitro (30) and generating double-stranded breaks at inverted repeats in vivo (31), may play a role in large loop processing. Furthermore, the nucleotide excision repair (NER) system (32), which repairs bulky DNA adducts at damaged nucleobases, is likely involved in loop repair because the recognition specificity of NER proteins extends beyond the damaged bases. In particular, strand breaks, flaps, and unpaired strands in heteroduplex DNA may serve as binding substrates for UvrA, which is a primary recognition protein in the NER system of Escherichia coli (33-36). We were interested in evaluating the contributions of DNA repair systems that may process relatively large loops arising from DNA strand slippage. Using a genetic assay for deletions, we found that the rates of triplet repeat deletions decreased in the UvrA-deficient E. coli strain. Heteroduplex plasmids containing trinucleotide repeat loops in one strand were less efficiently repaired in the UvrA-deficient E. coli strain after transformation. Purified UvrA binds to heteroduplex substrates containing repeat loop-outs with an affinity about 2 orders of magnitude higher than that to linear DNA. Altogether, this indicates that UvrA is involved in triplet repeat instability related to the formation of large single-stranded loops.
Plasmids-Plasmid pUC8NcoI is identical to pUC8, except that it contains a NcoI site as a result of a 1-base pair A-T substitution for the G-C base pair at position 225 of the pUC8 sequence, introduced by the polymerase chain reaction point mutation technique (37). To assemble (CTG)n·(CAG)n repeats from oligonucleotides and clone them in pBR325, short oligonucleotides (CTG)3 and (AGC)3 were annealed and ligated to create a series of polymers [(CTG)3·(AGC)3]m to which the EcoRI adaptor oligonucleotides were ligated. After EcoRI digestion and gel purification, (CTG)n·(CAG)n-containing inserts were ligated into the EcoRI site of pBR325. In all cases, cloning the repeat tract into pBR325 resulted in the CAG strand comprising the leading template strand (plasmids pVH(CAG)23, pVH(CAG)25, pVH(CAG)43, and pVH(CAG)79). Plasmid digestion with Csp45, which cuts on either side of the CAT gene, and religation of the fragments allowed insertion of the repeat in the opposite orientation, with the CTG strand comprising the leading template strand (plasmids pVH(CTG)25, pVH(CTG)43, and pVH(CTG)79). Plasmids p(CAG)6E, p(CAG)7E, and p(CAG)8E were obtained by cloning synthetically made fragments containing 6, 7, and 8 repeats in the EcoRI site so that the CAG strand comprised the lagging template strand. Plasmids pEO(CAG)23H and pEO(CTG)23H contain (CTG)23·(CAG)23 trinucleotide repeats in the HindIII site of pUC8, with (CAG)23 and (CTG)23, respectively, as the lagging strand templates for DNA replication. To construct pEO(CAG)23H and pEO(CTG)23H, the (CTG)23·(CAG)23 insert from pVH(CAG)23 was recloned in pUC8. For this, pUC8 was digested with HindIII, followed by filling in the overhangs using the Klenow fragment of DNA polymerase I and dephosphorylation. This vector was used for ligation, in two orientations, of the (CTG)23·(CAG)23 insert-containing EcoRI-EcoRI fragment of pVH(CAG)23 made blunt-ended with mung bean nuclease. Plasmids were isolated from E. coli HB101 by alkaline lysis and then purified by CsCl/ethidium bromide gradient procedures (37).
Materials-All restriction enzymes, mung bean nuclease, and Klenow fragment of E. coli DNA polymerase I were from New England Biolabs (Beverly, MA). DNA polymerases KlenTaq1 and Pfu were from Ab Peptides (St. Louis, MO) and Stratagene (La Jolla, CA), respectively. T4 DNA ligase and calf intestinal phosphatase were from Roche Molecular Biochemicals. Molecular size markers were from New England Biolabs (100-bp ladder) and Life Technologies, Inc. (123-bp ladder).
Determination of Mutation Rates-Plasmid pBR325 contains genes conferring antibiotic resistance to ampicillin (amp) and chloramphenicol. Triplet repeats were cloned in the unique EcoRI site in the CAT gene (38). Determinations of rates of triplet repeat deletion were based on reversion to a chloramphenicol-resistant (Cm r) phenotype. The CAT gene is inactivated by a (CTG)n·(CAG)n insert of 25 repeats or more, making cells chloramphenicol-sensitive (Cm s). Deletion of the repeat tract to n ≤ 24 makes cells chloramphenicol-resistant. Because of the triplet repeat instability in E. coli, the transformants were grown on LB+amp plates for the minimal amount of time needed to detect colony formation. Colonies were then stored as frozen glycerol stocks for future experiments. For determination of mutation rates, frozen cells were distributed onto LB+amp plates and allowed to grow to small colonies. A culture was then grown from a single colony in LB to mid-log phase, after which cell dilutions were plated onto LB+amp plates to determine the viable cell count and onto LB+amp+Cm plates to determine Cm r revertants. A minimum of six independent reversion assays from different colonies were performed for each plasmid, and mutation rates were determined according to Drake (39). p values were calculated using a Mann-Whitney test. The nature of reversion events was analyzed by polymerase chain reaction and sequence analysis on DNA isolated by the Magic mini-prep procedure (Promega).
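The arithmetic behind the reversion assay can be sketched in a few lines. This is an illustration only: the plate counts, dilutions, and culture size are hypothetical, and the function names are ours. The text states only that rates were determined according to Drake (39); the implicit relation mu = f / ln(N * mu), with f the median revertant frequency and N the final number of cells, is assumed here as one commonly quoted form of that estimator and may differ from the exact procedure used.

```python
import math
from statistics import median


def revertant_frequency(cm_colonies, cm_dilution, viable_colonies, viable_dilution):
    """Cm-resistant cells per viable cell, correcting each count for its plated dilution."""
    revertants = cm_colonies / cm_dilution      # colonies on LB+amp+Cm
    viable = viable_colonies / viable_dilution  # colonies on LB+amp
    return revertants / viable


def drake_rate(freq, n_cells, rel_tol=1e-9, max_iter=500):
    """Solve mu = freq / ln(n_cells * mu) by fixed-point iteration (assumed estimator)."""
    if freq <= 0:
        return 0.0
    mu = freq  # starting guess
    for _ in range(max_iter):
        product = n_cells * mu
        if product <= 1.0:
            raise ValueError("n_cells * mu must exceed 1 for the logarithm to be positive")
        updated = freq / math.log(product)
        if abs(updated - mu) <= rel_tol * updated:
            return updated
        mu = updated
    raise RuntimeError("iteration did not converge")


# Hypothetical counts for six independent cultures:
# (Cm-resistant colonies, dilution plated, viable colonies, dilution plated)
ASSAYS = [
    (42, 1e-2, 150, 1e-6),
    (35, 1e-2, 160, 1e-6),
    (58, 1e-2, 140, 1e-6),
    (27, 1e-2, 155, 1e-6),
    (49, 1e-2, 145, 1e-6),
    (61, 1e-2, 170, 1e-6),
]
N_CELLS = 2e8  # hypothetical final number of cells per culture

frequencies = [revertant_frequency(*assay) for assay in ASSAYS]
median_frequency = median(frequencies)
rate = drake_rate(median_frequency, N_CELLS)
print(f"median revertant frequency: {median_frequency:.2e}")
print(f"estimated deletion rate: {rate:.2e}")
```

Taking the median across independent cultures, rather than the mean, limits the influence of cultures in which an early deletion event produced an unusually large number of revertants.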
Analysis of the Stability of (CTG)23 and (CAG)23 Hairpin-containing Heteroduplex DNA after Transformation-To form heteroduplexes, plasmids were digested with AflIII and NdeI, producing 2063-bp and 602-bp AflIII-NdeI fragments for pUC8NcoI and similar fragments for pEO(CAG)23H or pEO(CTG)23H. About 5 µg of pUC8NcoI and either pEO(CAG)23H or pEO(CTG)23H cut with AflIII-NdeI were mixed in 40 µl of hybridization buffer and hybridized as described previously (40).

TABLE I
Effects of mutations in the NER and sbcCD systems on deletion of large repeat fragments
Triplet repeats were cloned in the EcoRI site in the CAT gene that is inactivated by an insert of 25 repeats or more, making cells Cm s. Determination of rates of triplet repeat deletion to n ≤ 24 was based on reversion to a Cm r phenotype. The transformants were grown on LB+amp plates to determine the viable cell count and on LB+amp+Cm plates to determine Cm r revertants. A minimum of six independent reversion assays from different colonies were performed for each plasmid, and mutation rates were determined according to Drake (39). Median values and confidence intervals are shown. The higher mutation rates for the CAG repeats in the leading strand template (top three lines) were statistically different from those for the CTG repeats in the leading strand (bottom three lines).

Involvement of UvrA in Instability of Repeated Sequences
Heteroduplex isomers formed by the short fragments of pUC8NcoI and pEO(CAG)23H were isolated from a 5% polyacrylamide gel and purified using an UltraClean 15 plasmid purification kit from MoBio (Solana Beach, CA). Heteroduplex species were identified by sequencing using a 15:1 mixture of KlenTaq1 and Pfu polymerases (LA-16) as described previously (40). Each heteroduplex isomer was ligated with the AflIII-NdeI long fragments of pUC8 to produce two types of triplet repeat hairpin-containing constructs: those with the (CTG)23 hairpin in the leading strand, and those with the (CAG)23 hairpin in the lagging strand. The constructs were transformed into the appropriate E. coli strains, which were grown on LB+amp plates at 30 °C. Randomly selected clones were placed in 5 ml of LB broth containing ampicillin and grown overnight at 30 °C. Plasmids from each culture were isolated by the Magic mini-prep procedure (Promega) and analyzed for the presence of the NcoI site and the (CTG)23·(CAG)23 insert. For this, plasmids were digested separately with NcoI and PvuII and analyzed on a 1% agarose gel.
Binding of UvrA Protein to the (CAG)n Loop-containing Heteroduplex DNA-For experiments on UvrA binding to heteroduplexes, each of the plasmids p(CAG)6E, p(CAG)7E, p(CAG)8E, and pEO(CAG)23E was digested with PvuII, producing short (301-bp) triplet repeat-containing PvuII-PvuII fragments and longer (2364-bp) fragments of vector DNA. After dephosphorylation, restriction fragments were 5′-end-labeled with [γ-32P]ATP. Upon hybridization of the short PvuII-PvuII fragment of plasmid p(CAG)6E with each of the similar fragments of plasmids p(CAG)7E, p(CAG)8E, and pEO(CAG)23E, the excess repeats formed loop-outs of 3, 6, and 51 nt, respectively. These heteroduplex fragments were separated on a 5% native polyacrylamide gel from the correctly annealed linear species, and, in the case of 51-nt loop-outs, heteroduplex isomers were separated from each other. The 301-bp heteroduplex fragments isolated from the gel were then digested with HindIII to produce mixtures of 205-bp end-labeled heteroduplex fragments with the CAG loop-outs, unlabeled heteroduplex fragments with the CTG loop-outs, and 96-bp labeled linear fragments. After phenol-chloroform extraction and ethanol precipitation, mixtures containing one specific labeled heteroduplex and a labeled linear fragment were used for protein binding experiments. Binding of the UvrA protein to the DNA substrates was determined by gel mobility shift assays. Typically, the substrate (2 nM) was incubated with UvrA at the indicated concentrations at 37 °C for 15 min in 20 µl of UvrABC buffer (50 mM Tris-HCl, pH 7.5, 50 mM KCl, 10 mM MgCl2, and 5 mM dithiothreitol) in the presence of 1 mM ATP. After incubation, 2 µl of 80% (v/v) glycerol was added, and the mixture was loaded immediately onto a 3.5% native polyacrylamide gel in TBE (90 mM Tris borate (pH 8.3), 2.5 mM EDTA) running buffer and electrophoresed at room temperature.
After quantification of the radioactivity of the corresponding bands in the gel, the dissociation constants were estimated from the protein concentration at the point where half of the DNA was bound.
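The half-saturation estimate described above can be sketched as follows. The titration values and band intensities are hypothetical, the function names are ours, and linear interpolation between neighboring titration points is an assumption made for illustration; the text does not state how the half-bound point was located. Reading the dissociation constant from the half-bound point is itself an approximation that works best when the DNA concentration is low relative to the constant being estimated.

```python
def fraction_bound(shifted_counts, free_counts):
    """Fraction of labeled DNA found in the shifted (protein-bound) band."""
    total = shifted_counts + free_counts
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    return shifted_counts / total


def half_saturation_concentration(concentrations, fractions):
    """Interpolate the protein concentration at which half of the DNA is bound.

    Points must be ordered by increasing protein concentration.
    """
    points = list(zip(concentrations, fractions))
    for (c_low, f_low), (c_high, f_high) in zip(points, points[1:]):
        if f_low <= 0.5 <= f_high and f_high > f_low:
            return c_low + (0.5 - f_low) * (c_high - c_low) / (f_high - f_low)
    raise ValueError("titration does not cross 50% binding")


# Hypothetical titration: UvrA concentration (nM) and band intensities (shifted, free)
UVRA_NM = [0, 5, 10, 20, 40]
BANDS = [(0, 1000), (200, 800), (400, 600), (600, 400), (800, 200)]

bound = [fraction_bound(shifted, free) for shifted, free in BANDS]
kd_estimate = half_saturation_concentration(UVRA_NM, bound)
print(f"apparent Kd: about {kd_estimate:.0f} nM")
```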

Differential Deletion of Large Repeat Fragments in the Wild Type and UvrAB Mutants-Table I shows the effects of mutations in the uvrA, uvrB, and sbcC genes on the deletion rates of (CAG)n·(CTG)n in E. coli. Cloning the triplet repeat tracts in the CAT gene results in the insertion of a polyglutamine-coding sequence that inactivates the CAT gene, rendering cells Cm s when an insert is 25 repeats or longer. Partial deletions of the triplet repeats to <25 repeats result in a Cm r phenotype. Therefore, the high mutation rates for plasmids containing 25 repeats correspond to deletions that may be as short as 1-2 repeats, whereas deletions of at least 19 and 55 repeats are required for Cm r reversion for plasmids containing 43 or 79 repeats. For the two possible insert orientations, the mutation rates were 10-50-fold higher when the CAG repeat comprised the leading strand template and the CTG repeat comprised the lagging strand template. The differences were statistically significant (p < 0.025). This may be explained by a better propensity of the lagging strand CTG repeats to form hairpins that may be bypassed by DNA polymerase, leading to deletion (41). In either repeat orientation, the mutation rates are slightly lower in the uvrA and uvrB cells compared with the wild type strain for repeat lengths of 43 and 79, where deletions of at least 19 and 55 repeats must occur. For (CTG)79 in the leading strand, the differences between the wild type and uvrA and uvrB cells were statistically significant (p < 0.025), whereas for the CAG repeats in the leading strand, the difference for (CAG)43 was statistically significant (p < 0.05). Thus, these genetic experiments show that inactivation of either UvrA or UvrB makes large deletions of triplet repeats in bacteria less likely. Mutation in the sbcC gene that inactivates the SbcCD nuclease previously shown to be responsible for cleaving hairpins in vitro (21, 30) reduces the deletion rates compared with the wild type strain.
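The minimum deletion sizes quoted above follow directly from the 25-repeat inactivation threshold. A minimal sketch of that arithmetic, with a constant and function name of our own choosing:

```python
MAX_REPEATS_FOR_RESISTANCE = 24  # tracts of 25 or more repeats inactivate the CAT gene


def minimum_deletion(repeat_length, threshold=MAX_REPEATS_FOR_RESISTANCE):
    """Smallest number of repeats that must be lost to give a Cm-resistant revertant."""
    return max(0, repeat_length - threshold)


for n in (25, 43, 79):
    print(f"{n} repeats: at least {minimum_deletion(n)} repeat(s) must be deleted")
```

The assay therefore reports only deletions at or above this size, which is why the 43- and 79-repeat plasmids probe large deletion events specifically.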
Processing of Repeat Hairpin-containing Plasmids in Bacteria-To get further insight into the role that UvrA and UvrB may play in triplet repeat instability, we performed experiments on triplet repeat loop processing in wild type and mutant E. coli strains. To identify which strand of the heteroduplex plasmid was used as the template for replication during repair, heteroduplexes were made from a modified pUC8 plasmid (pUC8NcoI), in which a NcoI site was introduced into the original pUC8 by a single-base mutation, and pUC8-based plasmids carrying triplet repeat inserts but no NcoI site. As a result, one strand of the heteroduplex constructs contained a NcoI site as a marker, whereas the other strand contained a triplet repeat hairpin. Hybridization of AflIII- and NdeI-digested plasmids pUC8NcoI and pEO(CAG)23H produced heteroduplex DNA fragments that were separated on a 5% polyacrylamide gel (Fig. 1). As shown in Fig. 2A, the slower-migrating heteroduplex isomer contained the top strand of pUC8NcoI and the bottom (CTG) strand of pEO(CAG)23H, whereas the faster-migrating heteroduplex isomer contained the top (CAG) strand of pEO(CAG)23H and the bottom strand of pUC8NcoI. Thus, each heteroduplex isomer contained a 1-bp mismatch 40 nt from a 75-nt loop-out (presumably, an imperfect hairpin). The (CTG)- and (CAG)-containing heteroduplex fragments were isolated from the gel and ligated with the 2063-bp AflIII-NdeI fragment of pUC8 so that the (CTG)23 loop-out was in the leading strand template for DNA replication, and the (CAG)23 loop-out was in the lagging strand (Fig. 2B). Similarly, heteroduplex plasmids with the (CAG)23 loop-out in the leading strand template and the (CTG)23 loop-out in the lagging strand were made using heteroduplex isomers prepared from pUC8NcoI and pEO(CTG)23H. Because we used plasmids isolated from the E. coli strain HB101, in which the dam methylation mechanism is active, both heteroduplex strands are methylated.
Therefore, the repair of the mismatch at the NcoI site should proceed randomly with either one or another strand being used as the template for repair (42). Consistent with this assumption, in the progeny plasmids from 20 clones after wild type E. coli transformation with heteroduplex plasmids containing one strand with the sequence of NcoI site and the complementary strand of pUC8 (therefore without the NcoI sequence), 9 plasmids had a pUC8 sequence, 8 had a modified sequence with the NcoI site, and 3 contained a mixture of both.
Plasmids recovered from cells after heteroduplex transformation were analyzed by separate digestions with PvuII and NcoI (Fig. 3). The size of the PvuII-PvuII fragment indicated whether the repeat insert in individual clones had been lost (Fig. 3A, lanes 1 and 2) or retained (lane 4). The latter would occur if the repeat-containing strand were chosen as a template during repair replication. If a heteroduplex plasmid escaped repair, subsequent replication would result in a mixture of the NcoI site- and repeat-containing plasmids in the same cell (lane 3 in Fig. 3, A and B). For individual clones, the NcoI digestion distinguished the plasmids containing no inserts as either pUC8NcoI (Fig. 3B, lane 1) or pUC8 (lane 2). For a pure plasmid population, susceptibility to NcoI likely means that the NcoI site-containing strand was chosen as a template during repair replication. Table II shows the effects of mutations in uvrA, uvrB, and sbcC on loop-out processing by E. coli cells when either (CAG)23 or (CTG)23 loop-outs are in the lagging strand for DNA replication. Upon transformation of the wild type strain AB1157 with heteroduplex plasmids, no clones with pure insert-containing plasmids were found for either loop-out. Most of the clones contained pUC8NcoI, and minor fractions of clones contained pUC8. This is consistent with loop recognition as a target for excision and with the NcoI-containing strand being chosen as a template for replication during heteroduplex repair. The repair patch seems to be long enough to include both the heteroduplex loop-out and a single-base pair mismatch separated by 40 bp. About one-fourth of the clones formed after cell transformation with the (CAG)23 loop-out heteroduplex contained a mixture of plasmids.
The plasmid mixture contained both pUC8NcoI and (CAG)23-containing plasmids, consistent with replication of unrepaired heteroduplex construct, so that both strands were replicated, and the two plasmids coexisted in the same transformant colony. No significant differences were detected for processing of heteroduplex constructs in wild type strain AB1157 and strains AB1885 (mutant for UvrB protein, which is required to form the UvrBC-DNA incision complex during NER (43)) and PF2070 (mutant for the SbcCD protein complex, an exonuclease known to prevent replication of long palindromes (21, 30)). Upon heteroduplex transformation into strain AB1886, which is mutant for the damage recognition protein UvrA, 1 of 40 clones contained a pure insert-containing plasmid in the case of the (CAG)23 loop-out. Table III shows the effects of mutations in uvrA, uvrB, and sbcC on loop-out processing by E. coli cells when either (CAG)23 or (CTG)23 loop-outs are in the leading strand for DNA replication. Similar to the wild type strain AB1157, no clones with pure insert-containing plasmids were found for strains AB1885 and PF2070 with mutations in the uvrB and sbcC genes, respectively. In all three strains, recovered plasmids predominantly contained the NcoI site. However, in the case of (CTG)23 loop-outs, a significant fraction of clones contained a mixture of plasmids. The percentage of cells containing the mixture was somewhat higher for the SbcC-deficient strain compared with the wild type and UvrB-deficient strains. Compared with other strains, in the UvrA-deficient strain, the (CTG)23 inserts were retained at a higher rate. Three of 40 clones contained pure repeat-containing plasmids, and 17 clones contained a mixture of plasmids. The number of clones containing plasmids with undeleted repeats increased at the expense of clones containing pure pUC8NcoI plasmids.
Thus, it appeared that protein UvrA, which recognizes damaged nucleotides and loads UvrB at the damage site, was also able to recognize loop-outs of trinucleotide repeats, and in the absence of such recognition, fewer loop-outs were processed by repair systems, resulting in an increased amount of the insert-containing plasmids.
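The way clones were scored from the two diagnostic digests can be summarized as a small decision rule. The sketch below is a simplification under our own assumptions: the boolean gel readings, class labels, and example clones are hypothetical, and clones carrying a mixture of pUC8 and pUC8NcoI (partial NcoI digestion) are not modeled.

```python
from collections import Counter


def classify_clone(insert_band, vector_band, ncoi_cut):
    """Assign a clone to a plasmid class from its PvuII and NcoI digests.

    insert_band: PvuII fragment of the size expected with the repeat insert is present
    vector_band: PvuII fragment of the size expected without the insert is present
    ncoi_cut:    the plasmid preparation is cut by NcoI
    """
    if insert_band and vector_band:
        return "mixture"  # unrepaired heteroduplex replicated from both strands
    if insert_band:
        return "repeat-containing"
    if vector_band:
        return "pUC8NcoI" if ncoi_cut else "pUC8"
    raise ValueError("no diagnostic PvuII fragment observed")


# Hypothetical gel readings for five clones: (insert_band, vector_band, ncoi_cut)
READINGS = [
    (False, True, True),
    (False, True, False),
    (True, True, True),
    (True, False, False),
    (False, True, True),
]

tally = Counter(classify_clone(*reading) for reading in READINGS)
for plasmid_class, count in sorted(tally.items()):
    print(f"{plasmid_class}: {count}")
```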
UvrA Binding to Heteroduplex Substrates-In E. coli NER, the functional form of UvrA is a UvrA2 dimer that interacts with UvrB to form a UvrA2B complex and load UvrB onto the damage site. In addition, UvrA2 itself recognizes DNA damage (44). To determine whether UvrA2 interacts directly with triplet repeat loop-outs and may thereby influence the deletion of triplet repeats, the binding of purified protein to the heteroduplex loops was tested. One strand of the heteroduplex substrates contained 6 CTG repeats, whereas the complementary strands contained either 7, 8, or 23 CAG repeats (Fig. 4A). Thus, the loop sizes were 1, 2, and 17 triplet repeats or 3, 6, and 51 nucleotides. Double-stranded DNA fragments carrying inserts were used as control substrates. The nonspecific UvrA2 binding to control substrates was characterized by dissociation constants in the low micromolar range, consistent with the previously published data (45, 46). In all three cases of heteroduplex substrates, strong protein binding was detected (Fig. 4, B-D).
The dissociation constants were about 2 orders of magnitude lower than those for control double-stranded DNA substrates, which indicated specific UvrA2 binding to the heteroduplex DNA (Fig. 4E). However, no UvrB·DNA complex was detected in the gel mobility shift assay upon incubation of the heteroduplex substrates with the protein complex UvrA2B. Two options are possible: (i) the UvrB·DNA complex was not detected due to technical difficulties, as reported for some N-2-acetylaminofluorene adducts (47), and (ii) although UvrA2 recognizes both CTG loops and damaged nucleotides, it does not load UvrB onto the loops and influences the repeat loop excision in a different way than the excision of damaged nucleotides. The latter option is consistent with the data on processing of heteroduplex loop-outs in E. coli cells, where uvrB mutation had no effect.

DISCUSSION
Nucleotide excision repair is a major cellular repair pathway for a variety of DNA damages in both prokaryotes and eukaryotes (32). To investigate the potential contribution of this system to triplet repeat instability, we analyzed repeat stability in E. coli strains containing mutations in different genes involved in nucleotide excision repair. A significant role of the UvrA protein has emerged as a consensus from genetic and biochemical experiments. When UvrA was inactive, the deletion rate of (CAG)79·(CTG)79 from plasmids propagated in bacteria was significantly lower than that in the wild type cells. UvrA may promote removal of the replication misalignment loops, thus promoting shortening of the repeat length in progeny plasmids. Consistent with this idea, in the absence of active UvrA, fewer preformed repeat loop-outs were excised after transformation of heteroduplex molecules in E. coli. A direct involvement of UvrA-DNA loop interaction in repeat instability was supported by the demonstration of an interaction between the heteroduplex substrates containing single-stranded loops/hairpins and UvrA.

FIG. 4. UvrA binding to heteroduplex substrates. A, schematics of heteroduplexes containing loops of 1, 2, and 17 triplet repeats or 3, 6, and 51 nt. The 319-bp PvuII-PvuII heteroduplex fragments were radiolabeled at both ends and then cut with HindIII, producing a 93-bp short linear fragment and longer 226-bp heteroduplex isomers. For heteroduplexes with 3- and 6-nt loops that were impossible to separate in a gel, UvrA binding to the mixtures of isomers was studied. However, only the band shift of the heteroduplex isomer with the (CAG)n loop can be detected. The short linear fragment is not retarded upon protein addition and serves as an internal control. B, UvrA binding to the 3-nucleotide loop heteroduplex. C, UvrA binding to the 6-nucleotide loop heteroduplex. D, UvrA binding to the 51-nucleotide loop heteroduplex. In this case, individual isomers were isolated before UvrA binding, and the band shift assay for the isomer with the (CAG)17

Several issues related to the UvrA role in repeat instability are worth discussing. According to our data, UvrA inactivation decreases repeat instability. This effect is less pronounced at short repeat lengths (n = 25), where our genetic assay detects deletions of 1 repeat and longer. When longer deletions must occur to be detected (for (CTG)43, a 19-repeat deletion is necessary to restore chloramphenicol resistance), reductions in the deletion rates in the UvrA-deficient strain compared with the wild type E. coli become more pronounced and, for (CAG)43, statistically significant. At longer lengths (n = 79), where a 55-repeat deletion is necessary to detect the deletion event, the differences in repeat instability in the wild type and UvrA-deficient strains were more pronounced and, for the CTG leading strand, significant. In biochemical experiments that involved multiple cycles of growth through exponential and stationary phases, no significant influence of mutations in the NER system on stability of repeats when n = 50 was found (48). Higher instability of long (CTG)n·(CAG)n repeats (n = 175) was observed in the UvrA-deficient strain compared with the wild type (48). However, the latter result is difficult to compare with our data for considerably shorter repeats.
Models of instability imply that DNA repeats allow strand misalignment during replication, leading to the formation of partially unpaired loops/hairpins. In bacteria, repeat instability generally manifests itself as contraction of the repeat length. This may be due to either a hairpin/loop removal by an appropriate repair system or, if unrepaired, polymerase bypass in the next round of replication. A major assumption of our transformation experiments was that after transformation heteroduplex plasmids undergo repair (27, 49-53), and then repaired and unrepaired plasmids undergo replication. Our preliminary experiments showed that methyl-directed mismatch repair has no significant role in removal of the large heteroduplex loop-outs. Both (CTG)23 and (CAG)23 loop-outs were efficiently removed in E. coli deficient in the mismatch repair (MutS) (data not shown). Inactivation of the SbcCD protein complex, which is known to cleave hairpins in vitro (30) and generate double-strand breaks at inverted repeats in vivo (31), reduced triplet repeat deletions in genetic experiments but did not result in a dramatic increase in retention of the repeats in the progeny of heteroduplex plasmids. Although a repair system that plays a major role in excising the repeat loops has not yet been identified, the evidence presented in this study supports the involvement of UvrA in the process.
The results of the band shift assay show that purified UvrA2 binds specifically to heteroduplex substrates containing loop-outs of 3, 6, and 51 nt. Because the binding affinities are very similar for all three substrates, they likely have a common recognition determinant. Damage recognition by UvrA2 consists of probing for an enhanced capacity of DNA regions around the damage to undergo unwinding and bending deformations (32). The formation of very different types of DNA lesions, such as a thymine dimer or benzo[a]pyrene diol epoxide adducts at N2 of guanines or N6 of adenines, results in a common structural feature, bent or kinked DNA at the damaged sites (54, 55), which is one of the elements searched for by UvrA2. Consistent with this, it is reasonable to suggest that UvrA2 may bind to a kink of the double helix at the three-way junction. However, the subsequent tight binding of UvrB to DNA necessary for the formation of the UvrBC-DNA incision complex has not been detected. Therefore, it is possible that UvrA2 triggers a sequence of events different from the usual NER.
Several potential models may explain the involvement of UvrA in the instability of triplet repeat heteroduplex loops. First, UvrA may act as a recognition protein for this error of replication (or recombination). UvrA2 may recruit other repair proteins that may accomplish loop excision. Another possible explanation is that UvrA2 blocks replication of the loop-containing strand, making it highly inefficient, whereas the other strand is replicated normally. This is similar to a suggested replication interference of UvrA2 bound to the N-2-aminofluorene adduct (56). The replication blockage may result in overrepresentation of progeny plasmids without repeats. In some clones, heteroduplex plasmids did not seem to undergo loop repair and then went into replication, producing mixtures of both repeat-containing and vector plasmids. The ratio of vectors to repeat-containing plasmids is severalfold, so it is possible that a partial block to replication by a UvrA2-(CAG) loop complex results in this vector bias. Finally, a block to polymerization may induce a gap that would stall the replication fork and initiate fork restart, thereby inducing recombination events (57-60) that lead to increased rates of deletions.