A Structural Model for the Damage-sensing Complex in Bacterial Nucleotide Excision Repair

Nucleotide excision repair is distinguished from other DNA repair pathways by its ability to process a wide range of structurally unrelated DNA lesions. In bacteria, damage recognition is achieved by the UvrA·UvrB ensemble. Here, we report the structure of the complex between the interaction domains of UvrA and UvrB. These domains are necessary and sufficient for full-length UvrA and UvrB to associate and thereby form the DNA damage-sensing complex of bacterial nucleotide excision repair. The crystal structure and accompanying biochemical analyses suggest a model for the complete damage-sensing complex.


Nucleotide excision repair is distinguished from other DNA repair pathways by its ability to process a diverse set of lesions. In bacteria, the initial steps are carried out by three proteins: UvrA, UvrB, and UvrC. The UvrA·UvrB complex conducts surveillance of DNA and recognizes damage. Having located a lesion, UvrA "loads" UvrB onto the DNA at the damaged site and then dissociates. Damage searching, formation of the UvrB·DNA "preincision" complex, and dissociation of UvrA are regulated by ATP (1). UvrB subsequently recruits the endonuclease UvrC, which catalyzes incisions on either side of the lesion (2,3). Following incision, UvrC and the damage-containing oligonucleotide are removed by UvrD (helicase II), whereas UvrB remains bound to the gapped DNA and recruits DNA polymerase I for repair synthesis. Sealing of the single-stranded nick completes the repair process and restores the original DNA sequence (4).
Since its discovery more than 40 years ago, bacterial nucleotide excision repair has been extensively studied, resulting in a large body of work that describes the protein components and the details of how they operate. Notwithstanding the trove of genetic and biochemical data, several key questions remain unanswered. For example, how does the same set of proteins handle a diverse set of lesions while maintaining specificity? How do UvrA and UvrB cooperate during damage recognition, and what is the precise role of ATP? Ongoing studies in the field, including those described below, aim to address these issues.
Recently, we reported the structure of Geobacillus stearothermophilus UvrA and the identification of binding sites for DNA and UvrB (5). We also established that the identified UvrB-binding domain is necessary and sufficient to mediate the UvrA-UvrB interaction and that the isolated interaction domains of UvrA (5) and UvrB (6) bind to each other in solution.
To understand the interaction between UvrA and UvrB, we have determined the crystal structure of the complex between the two isolated interaction domains. The structure revealed that the UvrA-UvrB interaction interface is largely polar and is mediated by several highly conserved charged residues. Site-directed mutagenesis and biochemical characterization of the mutant proteins confirmed the importance of the observed interactions. Based on the interaction domain complex structure, we have constructed a structural model for the full-length UvrA·UvrB ensemble and propose two models for lesion recognition that will serve as a basis for future experiments.

EXPERIMENTAL PROCEDURES
Expression and Purification of G. stearothermophilus UvrA and UvrB Interaction Domain Complex-The DNA sequences encoding the interaction domains (Fig. 1) were amplified from the plasmids containing the genes for full-length UvrA and UvrB (5), cloned into pET-28a(+) (Novagen; see Table 3), and confirmed by sequencing. The UvrA and UvrB domain expression constructs contained residues 131-245 of UvrA and residues 149-250 of UvrB, respectively, with an N-terminal His₆ tag and a thrombin cleavage site. The proteins were expressed in Escherichia coli BL21(DE3) pLysS. The cells were grown in LB broth at 37°C until A₆₀₀ reached 0.5-0.6, at which point expression was induced by the addition of 1 mM isopropyl-β-D-thiogalactopyranoside. The cells were allowed to grow at 30°C for 4 h and harvested by centrifugation. The cell pellet was resuspended in lysis buffer (50 mM NaPO₄, pH 8.0, 500 mM NaCl, 10 mM imidazole, 5 mM β-ME). UvrA 131-245 and UvrB 149-250 were separately purified using nickel-nitrilotriacetic acid-agarose (Qiagen), after which the His₆ tag was removed by thrombin cleavage. The resulting proteins were further purified by size exclusion chromatography (Superdex 75; GE Healthcare; 25 mM Tris-HCl, pH 7.4, 150 mM NaCl, 5 mM β-ME). The interaction domain complex was made by mixing UvrA 131-245 with a molar excess of UvrB 149-250. The complex was then separated from excess UvrB 149-250 by size exclusion chromatography (Superdex 75; GE Healthcare; 25 mM Tris-HCl, pH 7.4, 150 mM NaCl, 5 mM β-ME; Fig. 2).
Structure Determination-X-ray diffraction data were collected on the NE-CAT beamline 24ID-C (λ = 0.97949 Å) at the Advanced Photon Source, Argonne National Laboratory. The processed data (HKL2000 (7)) revealed that the crystal belonged to the tetragonal space group P4₁2₁2 with cell parameters a = b = 84.49 Å, c = 60.87 Å, α = β = γ = 90.0°, and contained one complex in the asymmetric unit. The structure was solved at 1.8 Å by molecular replacement (PHASER (8)) using the structures of the corresponding domains in the full-length proteins (residues 131-153 and 200-245 of UvrA from 2R6F (5) and residues 157-250 of UvrB from 1T5L (6)) as search models. Automated model building (9) followed by manual building (10) and crystallographic refinement (11-13) resulted in a model with residues 131-245 of UvrA, residues 157-250 of UvrB, and 53 water molecules. The model displays good geometry (91.1 and 8.9% of the residues in the most favored and additional allowed regions of Ramachandran space, respectively; PROCHECK (14)) with a crystallographic R factor of 22.99% and R free of 24.75%. The accuracy of the model was confirmed by the positions of selenium atoms determined using anomalous diffraction data collected on a selenomethionine-substituted crystal (λ = 0.97926 Å). Data collection and refinement statistics are in Table 1. The coordinates and structure factors have been deposited in the Protein Data Bank with the accession code 3FPN.
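The statement that the asymmetric unit holds one complex can be sanity-checked with a Matthews coefficient estimate. The sketch below is not part of the original analysis. It assumes an average residue mass of 110 Da (ignoring any residues left after tag cleavage), eight asymmetric units per unit cell for P4₁2₁2, and the conventional 1.23/V_M approximation for solvent content; all names are illustrative.

```python
# Sketch: Matthews coefficient (V_M) for the reported tetragonal cell.
# Assumptions (not from the paper): mean residue mass of 110 Da and the
# standard approximation solvent fraction = 1 - 1.23 / V_M.

A_AXIS = 84.49        # Å, a = b (from the text)
C_AXIS = 60.87        # Å (from the text)
ASU_PER_CELL = 8      # asymmetric units per cell in P4(1)2(1)2
MEAN_RESIDUE_MASS = 110.0  # Da per residue, assumed average

# Residues in the expressed constructs: UvrA 131-245 and UvrB 149-250.
N_RESIDUES = (245 - 131 + 1) + (250 - 149 + 1)


def matthews_coefficient(cell_volume, n_asu, mass_per_asu):
    """Return V_M in Å^3 per Da."""
    return cell_volume / (n_asu * mass_per_asu)


def solvent_fraction(v_m):
    """Approximate solvent fraction for a protein crystal."""
    return 1.0 - 1.23 / v_m


# All cell angles are 90°, so the volume is simply a * b * c.
cell_volume = A_AXIS * A_AXIS * C_AXIS
complex_mass = N_RESIDUES * MEAN_RESIDUE_MASS

vm_one = matthews_coefficient(cell_volume, ASU_PER_CELL, complex_mass)
vm_two = matthews_coefficient(cell_volume, ASU_PER_CELL, 2 * complex_mass)

print(f"one complex per ASU:  V_M = {vm_one:.2f} Å^3/Da, "
      f"solvent = {solvent_fraction(vm_one):.0%}")
print(f"two complexes per ASU: V_M = {vm_two:.2f} Å^3/Da")
```

Under these assumptions V_M comes out near 2.3 Å³/Da (roughly 46% solvent) for one complex, whereas two complexes per asymmetric unit would give about 1.1 Å³/Da, which is too tightly packed for a typical protein crystal.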

Construction of UvrA and UvrB Mutants-Point mutants of full-length UvrA and UvrB were constructed using the QuikChange II XL site-directed mutagenesis kit (Stratagene), and the mutations were confirmed by sequencing (see Table 3). The mutant proteins were purified using the wild-type protocol (5).

TABLE 1 Data collection and refinement statistics
The data were collected from a single crystal. The values in parentheses are for the highest resolution shell. R sym = Σ|I − ⟨I⟩|/ΣI, where I is the integrated intensity of a given reflection. R work = Σ|F(obs) − F(calc)|/ΣF(obs), where F(obs) and F(calc) are the observed and calculated structure factor amplitudes, respectively. R free = Σ|F(obs) − F(calc)|/ΣF(obs), calculated using 5% of the data omitted from the refinement.

Multi-angle Laser Light Scattering-UvrA·UvrB complexes were formed at the following UvrA:UvrB ratios (2A:1B, 1A:1B, and 1A:2B) in 20 mM Tris-HCl, pH 7.5, 150 mM NaCl, 5% (v/v) glycerol, 5 mM β-ME, 5 mM MgCl₂, and 2 mM ATP. The samples were then applied to a Shodex KW-804 column equilibrated with the same buffer. Light scattering and refractive index signals were measured using a Wyatt Optilab and Dawn EOS system. Scattering curves were processed using the provided Astra software package (15,16).

RESULTS
Structure of the UvrA·UvrB Interaction Domain Complex-The G. stearothermophilus UvrA·UvrB interaction domain complex crystallized in the tetragonal space group P4₁2₁2 with one molecule of each protein in the asymmetric unit. The structure was solved by molecular replacement at 1.8 Å using the relevant domains in the full-length protein structures as search models. The final model consists of residues 131-245 of UvrA, residues 157-250 of UvrB, and 53 water molecules, with a crystallographic R factor of 22.99% and R free of 24.75%. The accuracy of the model was confirmed by the positions of selenium atoms determined using anomalous diffraction recorded on a selenomethionine-substituted crystal (Table 1).
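The R factor and R free quoted above follow the definitions given with Table 1. As a minimal sketch of how such a statistic is computed, the Python below applies the same formula to a toy reflection list. The amplitudes and free-set flags are invented for illustration, the function names are hypothetical, and the observed and calculated amplitudes are assumed to be on a common scale already.

```python
# Sketch: R = sum(|F(obs) - F(calc)|) / sum(F(obs)), evaluated separately
# for the working set and for the held-out "free" reflections.
# The numbers below are toy values, not data from the paper.


def r_factor(f_obs, f_calc):
    """Residual between observed and calculated amplitudes."""
    if len(f_obs) != len(f_calc):
        raise ValueError("amplitude lists must have equal length")
    numerator = sum(abs(o - c) for o, c in zip(f_obs, f_calc))
    return numerator / sum(f_obs)


def r_work_and_free(f_obs, f_calc, free_flags):
    """Split reflections by free flag and return (R work, R free)."""
    work = [(o, c) for o, c, free in zip(f_obs, f_calc, free_flags) if not free]
    held = [(o, c) for o, c, free in zip(f_obs, f_calc, free_flags) if free]
    r_work = r_factor([o for o, _ in work], [c for _, c in work])
    r_free = r_factor([o for o, _ in held], [c for _, c in held])
    return r_work, r_free


# Hypothetical amplitudes; the last reflection is excluded from refinement.
F_OBS = [100.0, 80.0, 60.0, 40.0]
F_CALC = [90.0, 85.0, 50.0, 45.0]
FREE = [False, False, False, True]

r_work, r_free = r_work_and_free(F_OBS, F_CALC, FREE)
print(f"R work = {r_work:.3f}, R free = {r_free:.3f}")
```

Because the free reflections never enter refinement, R free serves as a cross-validation statistic for the model.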
The overall structure of the complex is shown in Fig. 3A. Residues 154-199 of UvrA that were disordered in our previous structure (Protein Data Bank code 2R6F) (5) are now [...] (6). Interestingly, residues 157-250 of UvrB are disordered in every structure in the database except for this one, in which a point mutation Y96A led to a favorable crystal packing interaction. These observations suggest that the substructures involved in the UvrA-UvrB interaction are largely flexible, both internally and with respect to the rest of the protein, and adopt a single conformation only upon binding to their interaction partner. The interface between UvrA and UvrB is largely polar. Contacts consist of both direct and water-mediated hydrogen bonds, as well as electrostatic interactions between charged residues (Fig. 3B and Tables 2 and 3). For example, Asp219 of UvrA makes a direct contact with Arg183 and water-mediated contacts with Arg169 and Gly197 of UvrB. The amino acids that participate in the UvrA-UvrB interaction are highly conserved (17). The composition of the interface is consistent with experiments suggesting that the A-B interaction is considerably weakened in high ionic strength buffers (6,18).

FIGURE 3B. The interface is largely polar, consisting of a large number of direct and water-mediated hydrogen bonds, as well as electrostatic interactions between conserved residues. UvrA and UvrB interaction domains are shown as Cα traces. Residues that are involved in direct contacts across the interface are shown as sticks. The interactions are drawn as dashed lines. This view was generated by separating the two proteins by 10 Å and rotating them by 65° away from each other. This orientation was chosen to most clearly depict the interactions (see Table 2 for a complete list of interactions).
Comparison of the complete UvrB-binding domain of UvrA to other proteins in the Protein Data Bank (19,20) revealed that the closest structural neighbors are a region of the small subunit of ribulose-1,5-bisphosphate carboxylase (Protein Data Bank code 1BXN; Z score of 6.2), and domain I of the ribosomal protein L1 (Protein Data Bank code 2OV7; Z score of 5.4). A detailed examination confirmed the structural similarity. In both cases, the domain in question makes protein-protein contacts (intermolecular in the case of ribulose-1,5-bisphosphate carboxylase, and intramolecular in the case of ribosomal protein L1). The interactions, however, are not identical to those observed at the UvrA-UvrB interface. Therefore, the functional significance of this similarity is not established.
The Observed Interface Is Authentic-To determine whether the structure observed in the crystal represents the physiological UvrA-UvrB interface, we studied the behavior of a series of point mutant proteins. For this purpose, conserved residues in UvrA and UvrB whose side chains participate in direct contacts at the interaction interface were targeted. Negatively charged residues were replaced with arginine, whereas positively charged residues were substituted by glutamate. Nine mutant proteins were prepared (four UvrA mutants: R176E, E185R, R206E, and D219R; and five UvrB mutants: R183E, D198R, E215R, E222R, and R223E) in the context of the complete proteins. The mutants were purified using protocols identical to those used with the wild-type proteins. The ability of the mutants to bind the wild-type interaction partner was measured using size exclusion chromatography (Fig. 4), which resolves the UvrA·UvrB complex from free UvrA and UvrB. Chromatography was carried out in the presence of MgCl₂ and ATP because these are crucial for complex formation (21).⁵ The presence of UvrA and/or UvrB in each fraction was analyzed by SDS-PAGE.
Three of the four UvrA mutations (R176E, R206E, and D219R) completely abolished interaction with UvrB (Fig. 4, A and B), confirming the importance of these positions in complex formation. The E185R mutant, on the other hand, retained wild-type activity (Fig. 4, A and B), indicating that perturbation of Glu185 of UvrA is not sufficient to abolish interaction. Two of the five UvrB mutations studied (D198R and R183E) completely abrogated UvrA-UvrB interaction, and two others (E215R and R223E) showed milder effects (Fig. 4, C and D). The E222R mutant of UvrB displayed a wild-type level of UvrA binding activity (Fig. 4, C and D). Taken together, these data suggest that the structure seen in the crystal represents the authentic interface between UvrA and UvrB.
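The mutant design and outcomes described above can be tabulated in a few lines of Python. This is only an illustrative sketch: the outcome labels ("abolished", "reduced", "wild-type") are shorthand for the qualitative descriptions in the text, and the helper names are hypothetical.

```python
import re

# Qualitative binding outcomes as described in the text (Fig. 4).
# "reduced" is shorthand here for the mutants reported to show milder effects.
MUTANTS = {
    "UvrA": {"R176E": "abolished", "E185R": "wild-type",
             "R206E": "abolished", "D219R": "abolished"},
    "UvrB": {"R183E": "abolished", "D198R": "abolished",
             "E215R": "reduced", "R223E": "reduced", "E222R": "wild-type"},
}

POSITIVE = {"R", "K"}   # positively charged side chains
NEGATIVE = {"D", "E"}   # negatively charged side chains


def parse_mutation(name):
    """Split a name such as 'R176E' into (wild type, position, replacement)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", name)
    if match is None:
        raise ValueError(f"unrecognized mutation name: {name}")
    return match.group(1), int(match.group(2)), match.group(3)


def is_charge_reversal(name):
    """True if the substitution follows the rule stated in the text:
    negative residues to arginine, positive residues to glutamate."""
    wild_type, _, replacement = parse_mutation(name)
    return ((wild_type in NEGATIVE and replacement == "R")
            or (wild_type in POSITIVE and replacement == "E"))


def summarize(mutants):
    """Count outcomes per protein."""
    counts = {}
    for protein, table in mutants.items():
        for outcome in table.values():
            key = (protein, outcome)
            counts[key] = counts.get(key, 0) + 1
    return counts


counts = summarize(MUTANTS)
for (protein, outcome), n in sorted(counts.items()):
    print(f"{protein}: {n} mutant(s) with {outcome} binding")
```

Laid out this way, the UvrA mutants fall into only two classes, whereas the UvrB mutants span three.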

DISCUSSION
We have determined the crystal structure of the complex formed between the interaction domains of G. stearothermophilus UvrA and UvrB. These domains are necessary and sufficient to mediate UvrA-UvrB interaction. The structure showed that the interface between UvrA and UvrB is largely polar, consisting of a large number of direct and water-mediated hydrogen bonds, as well as electrostatic interactions between conserved residues.

⁵ D. Pakotiprapha, G. L. Verdine, and D. Jeruzalmi, unpublished observation.

TABLE 3 Sequences of primers used in the amplification of G. stearothermophilus UvrA and UvrB interaction domains and the construction of mutants
The recognition sequences of the restriction enzymes used for cloning of the PCR products are underlined (NdeI and HindIII). The positions of Arg → Glu, Asp → Arg, and Glu → Arg mutations are in bold type. fwd, forward; rev, reverse.

Primer              Sequence (5' → 3')
UvrA 131-245 fwd    CGCGGCAGCCATATGCCCATTTGCCCGACGCAC
UvrA 131-245 rev    GGCCGCAAGCTTTTACGAAAAGCCGCAGTACGGAC
UvrB 149-250 fwd    CGCGGCAGCCATATGGGGTCGCCGGAAGAATATCGG
UvrB 149-250 rev    GGCCGCAAGCTTTTACACGAAGTGCGACGCCGG

Interaction across the UvrA-UvrB Interface-We have performed site-directed mutagenesis and biochemical experiments to establish that the direct contacts seen in our structure are important for the formation of the UvrA·UvrB ensemble in solution. The observation that UvrA mutants either retained full activity or became completely defective in complex formation, whereas the UvrB mutants displayed a range of intermediate phenotypes (Fig. 4), could be attributed to the fact that one residue from UvrA often interacts with multiple residues from UvrB (Fig. 3B and Table 2). For example, Arg176 of UvrA forms direct contacts with Glu215, Glu222, and Asp198 of UvrB, and Arg206 of UvrA directly interacts with Asp198, Glu222, and Phe216 of UvrB. The only exception to this is the interaction between Glu185 of UvrA and Arg223 of UvrB, which does not appear to contribute significantly to UvrA·UvrB complex formation.
Besides direct interactions between side chains, the UvrA-UvrB interaction interface also involves several side chain-main chain interactions and water-mediated contacts (Table 2). Additionally, several of the highly conserved residues form intramolecular interactions with residues that participate in the contacts at the interface. For example, Arg213 of UvrB, which is one of the most highly conserved residues in the interaction domain, does not participate in the UvrA-UvrB interaction. Instead, it makes a water-mediated contact with Glu215, which in turn makes both direct and water-mediated contacts with UvrA. Such buttressing interactions, observed at several positions around the A-B interface, could help stabilize the substructures of the interaction domains that are important for UvrA-UvrB binding.
We note that Truglio et al. (6) reported a biochemical study of B. caldotenax UvrB mutants (R183E, R194A/R196A, R194E/R196E, and R213A/E215A) with substitutions in the UvrA-binding region (domain 2). Our study confirms the accuracy of this earlier analysis and adds residues to the interface, providing a more complete inventory of residues that are important for UvrA-UvrB interaction.
Based on biochemical studies of various E. coli UvrB constructs, Hsu et al. (18) proposed additional contacts between UvrA and UvrB involving residues 547-673, which are located at the C terminus of UvrB. If these contacts do form, we do not believe that they are energetically significant because several UvrB point mutants in the interface captured by our structure completely abolish interaction between the full-length proteins (Fig. 4). Additionally, UvrB*, a known proteolytic fragment lacking the C-terminal 43 amino acids, has been documented to form complexes with UvrA and get "loaded" at the site of damage (22). Lastly, examination of the structure of B. caldotenax UvrB revealed that only a small portion of the region in question was modeled (residues 547-595), and the resulting structure, we believe, would not form a stable entity that could productively interact with UvrA. We are thus confident that we have identified the energetically significant contacts between UvrA and UvrB.
Structural Model for the Complete UvrA·UvrB Damage Sensor-Despite having been extensively studied, the structure of the UvrA·UvrB ensemble has not been elucidated; even its stoichiometry remains controversial. Hydrodynamic studies suggest a stoichiometry of A₂B₁ (21), whereas atomic force microscopy (23) and fluorescence resonance energy transfer (24) measurements imply A₂B₂. We have used multi-angle laser light scattering measurements to establish that the molecular mass of the G. stearothermophilus complex is ~290 kDa. The only combination of UvrA (molecular mass = 107 kDa) and UvrB (molecular mass = 78 kDa) that gives this molecular mass is A₂B₁ (Fig. 5). Although dimeric UvrA contains two sites that could bind UvrB, and nothing in our structure (5) prohibits a larger ensemble, light scattering analysis of samples containing a large excess of UvrB failed to show any sign of an A₂B₂ species. We thus combined our analysis of the A-B interaction domains and our molecular mass measurements to construct a model of the intact UvrA₂·UvrB₁ ensemble. The model was built by superimposing the corresponding domains of full-length UvrA (Protein Data Bank code 2R6F) and UvrB (Protein Data Bank code 1T5L) onto our structure (Fig. 6A).
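The stoichiometry argument is simple arithmetic, and the following Python sketch makes it explicit by enumerating small UvrA:UvrB combinations and ranking them by distance from the measured mass. The subunit masses and the ~290 kDa value come from the text; the upper limit of three copies per subunit and the function names are arbitrary choices made for illustration.

```python
from itertools import product

UVRA_KDA = 107.0      # UvrA monomer mass (from the text)
UVRB_KDA = 78.0       # UvrB monomer mass (from the text)
OBSERVED_KDA = 290.0  # approximate mass from light scattering (from the text)


def candidate_masses(max_copies=3):
    """Map (copies of UvrA, copies of UvrB) to the expected mass in kDa."""
    masses = {}
    for n_a, n_b in product(range(max_copies + 1), repeat=2):
        if n_a + n_b == 0:
            continue  # skip the empty assembly
        masses[(n_a, n_b)] = n_a * UVRA_KDA + n_b * UVRB_KDA
    return masses


def rank_by_fit(observed, masses):
    """Sort candidate assemblies by absolute deviation from the observed mass."""
    return sorted(masses.items(), key=lambda item: abs(item[1] - observed))


masses = candidate_masses()
ranked = rank_by_fit(OBSERVED_KDA, masses)

for (n_a, n_b), mass in ranked[:4]:
    print(f"A{n_a}B{n_b}: {mass:.0f} kDa "
          f"(deviation {abs(mass - OBSERVED_KDA):.0f} kDa)")
```

The closest match is A₂B₁ at 292 kDa, whereas A₂B₂ would be 370 kDa, well above the measured value.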
In our model, the A-B sensor adopts a flat and open structure with overall dimensions of 160 × 80 × 60 Å. Strikingly, the expected path of DNA over UvrA, as determined by site-directed mutagenesis (5), neatly aligns with the crystallographically established position of DNA on UvrB (25). Moreover, the approximate length of DNA that would be associated with the A₂B₁ complex is ~43 bp, a value in remarkable agreement with estimates from DNase I footprinting (26). We note that the approximate length of DNA occupied by an A₂B₂ ensemble would be considerably larger, ~58 bp, and would be in poor agreement with the experimentally determined value.

FIGURE 5. Multi-angle laser light scattering suggests a 2:1 stoichiometry for the full-length UvrA·UvrB complex. UvrA·UvrB complex was formed using different UvrA:UvrB ratios and subjected to size exclusion chromatography (20 mM Tris-HCl, pH 7.5, 150 mM KCl, 5% (v/v) glycerol, 5 mM β-ME, 5 mM MgCl₂, 2 mM ATP) and multi-angle laser light scattering. The complex appeared monodisperse with an apparent molecular mass of ~290 kDa, approximating that of UvrA₂·UvrB₁.
Models for Lesion Recognition by the UvrA·UvrB Complex-We envision two limiting models for lesion recognition and UvrB loading (Fig. 6B). In the first model, termed the recruitment model, damaged DNA binds along the sensor in the conformation suggested by our structure. This model predicts that contacts to the damaged moiety are made exclusively by UvrA, consistent with photocross-linking experiments (27). On geometric grounds, UvrB would have to bind at an adjacent site without directly contacting the lesion. Upon ATP hydrolysis, UvrA exits the complex, leaving UvrB stably bound. UvrB could then move closer to the lesion, and there is some evidence suggesting that this type of subtle conformational rearrangement is possible (28,29). Departure of UvrA would leave the damaged site accessible to UvrC. In this model, the UvrA·UvrB complex does not undergo large structural changes upon damage detection and loading of UvrB.
A second model, termed the handoff model, envisages the lesion being recognized most likely by UvrA within the complex in the conformation suggested by our structure. Successful lesion detection would lead to handoff of the damaged site to UvrB. Such a handoff would clearly require major structural changes in the damage sensor that alter the relative orientation between UvrA and UvrB. The presence of flexible linkers between the interaction domains and the remainder of the full-length proteins could enable dramatic changes of the scale suggested in this model. For example, these changes could reorient UvrB from the conformation in our structural model to a location on the double helix opposite the lesion. Available data do not, at present, permit these two models to be distinguished. These models make specific predictions that can serve as a basis for future experiments. Understanding the mechanism of damage recognition must await future studies to more precisely delineate the structure of the damage sensor and its interaction with damaged DNA.
Interactions between UvrA and Transcription-Repair Coupling Factor-In addition to global genome repair, which involves damage recognition by the UvrA·UvrB ensemble, nucleotide excision repair also has another subpathway, termed transcription-coupled repair. Transcription-coupled repair preferentially removes lesions from the DNA strand that is being transcribed (30), thus ensuring that the RNA transcript contains the correct information. This process is initiated by the protein transcription-repair coupling factor (TRCF), also known as Mfd (31,32). TRCF recognizes RNA polymerase stalled at a lesion, terminates transcription, releases the transcript, and recruits UvrA to the site of damage so that nucleotide excision repair can take place (32). Amino acid sequence analysis of TRCF revealed that the N-terminal region of the protein is similar to a portion of UvrB (32); this region of TRCF was later shown to be important for the TRCF-UvrA interaction (33). The crystal structure of TRCF revealed a surface corresponding to the UvrA-binding region of UvrB, supporting the hypothesis that TRCF and UvrB would interact with UvrA in a similar manner (34). Superposition of the portion of UvrB in our structure on TRCF (Protein Data Bank code 2EYQ) reveals a likely correspondence between residues that participate at the A-B interface and the A-TRCF interface. The more complete inventory of contacts between UvrA and UvrB provided by our structure enables a more precise definition of the TRCF-UvrA interface.