New insights on DNA recognition by ets proteins from the crystal structure of the PU.1 ETS domain-DNA complex.

Transcription factors belonging to the ets family regulate gene expression and share a conserved ETS DNA-binding domain that binds to the core sequence 5'-(C/A)GGA(A/T)-3'. The domain is similar to alpha+beta ("winged") helix-turn-helix DNA-binding proteins. The crystal structure of the PU.1 ETS domain complexed to a 16-base pair oligonucleotide revealed a pattern for DNA recognition from a novel loop-helix-loop architecture (Kodandapani, R., Pio, F., Ni, C.-Z., Piccialli, G., Klemsz, M., McKercher, S., Maki, R. A., and Ely, K. R. (1996) Nature 380, 456-460). Correlation of this model with mutational analyses and chemical shift data on other ets proteins confirms this complex as a paradigm for ets DNA recognition. The second helix in the helix-turn-helix motif lies deep in the major groove, and conserved residues in alpha3 make specific contacts with bases in both strands of the core sequence. On either side of this helix, two loops contact the phosphate backbone. The DNA is bent (8 degrees) but uniformly curved without distinct kinks. ETS domains bind DNA as monomers yet make extensive DNA contacts over 30 Å. DNA bending likely results from neutralization of the phosphate backbone in the minor groove by both loops in the loop-helix-loop motif. Contacts from these loops stabilize DNA bending and may mediate specific base interactions by inducing a bend toward the protein.

Transcription factors bind to target DNA sequences to regulate metabolic functions such as growth and differentiation. Typically, the molecular scaffold for DNA recognition is conserved within a given family of DNA-binding proteins. In some cases the similarity of these scaffolds suggests an evolutionary relationship between different families or comparison of scaffolds reveals a structural similarity that was obscured by sequence comparisons alone.
A recently discovered family of regulatory proteins, the ets gene family, includes more than 45 members in a variety of organisms from Drosophila to humans (1,2). These molecules play a role in normal development and have been implicated in malignant processes such as erythroid leukemia and Ewing's sarcoma. The DNA-binding domain of ets proteins is a conserved region (ETS domain) that is about 85 residues in length. Although ets proteins share a homologous sequence in the ETS domain, they differ in length and in the relative position of this domain. In some molecules, the ETS domain is found at the carboxyl terminus (e.g. PU.1 (3); ets-1 (4); ets-2 (5)), while in others the domain is located in the middle of the sequence (erg (6)), or in the amino-terminal region (elk-1 (7)). Flanking regions are thought to form other functional domains that influence protein-protein recognition or inhibitory domains that mask the DNA-binding site (8,9). In ets-1, an α-helix that is located in an inhibitory domain immediately NH2-terminal to the ETS domain unfolds on DNA binding (10). Regardless of the position of the ETS domain within the intact ets proteins, there is strong sequence homology in this conserved region.
We have determined the crystal structure of the ETS domain of the PU.1 transcription factor complexed to DNA (11). The domain is similar to α+β helix-turn-helix (HTH) DNA-binding proteins and contacts a 10-base pair region of duplex DNA that is bent (8°) but uniformly curved without distinct kinks. The PU.1 domain assumes a tight globular structure with three α-helices and a four-stranded antiparallel β-sheet enclosing a hydrophobic core. The topology of the domain is similar to the structures of other ets family proteins fli-1 (12), murine ets-1 (13), and human ets-1 (14) determined in solution by NMR. The common molecular scaffold is similar to DNA-binding proteins such as CAP (15) and resembles "winged"-HTH proteins including HNF-3γ (16). ETS domains bind as a monomer to the core sequence 5'-(C/A)GGA(A/T)-3'.
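The core consensus 5'-(C/A)GGA(A/T)-3' can be written as a simple pattern and searched on both strands of a sequence. The following Python sketch is only an illustration: the function names and the test sequence are hypothetical, and the sequence is not the oligonucleotide used in the crystal.

```python
import re

# Core ETS site 5'-(C/A)GGA(A/T)-3' as a regular expression.
# The lookahead lets overlapping sites be reported.
CORE = re.compile(r"(?=([CA]GGA[AT]))")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_core_sites(seq):
    """List (strand, start, site) for every core site on either strand.

    Start positions are 0-based. For the "-" strand they are counted
    along the reverse-complemented sequence, read 5' to 3'.
    """
    seq = seq.upper()
    hits = [("+", m.start(), m.group(1)) for m in CORE.finditer(seq)]
    rc = reverse_complement(seq)
    hits += [("-", m.start(), m.group(1)) for m in CORE.finditer(rc)]
    return hits


# Hypothetical test sequence (an assumption for illustration only).
print(find_core_sites("TTGAGGAAGTCC"))
```

Searching both strands matters because a site on the complementary strand reads as TTCC(G/T) on the strand being scanned.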
The PU.1 domain contacts DNA from three sites: the recognition helix (α3) interacts with the GGAA core sequence in the major groove, while contacts with the phosphate backbone on either side of this site are made in the minor groove by two loops. Therefore, the PU.1 ETS domain binds DNA by a loop-helix-loop motif. One loop is formed between β-strands 3 and 4 (a "wing") and the other is a loop in the position of the turn in the HTH motif (α2-turn-α3). The protein-DNA contacts stabilize a uniform bending of the duplex DNA that likely is due to phosphate neutralization by the PU.1 domain. Surprisingly, the protein-DNA interactions reported in the NMR structure of a human ets-1-DNA complex (14) differed dramatically from this pattern, involving different contacts and significant DNA deformation. Because of this discrepancy, we chose to test the validity of the PU.1-DNA complex as a model for other ets proteins. As reported here, when the results of mutational analyses on a number of ets proteins are correlated with the structure of the PU.1-DNA complex and with chemical shift data measured with the fli-1 (12) and murine ets-1 (13) molecules, the loop-helix-loop scaffold is confirmed as a general model for DNA recognition by ets proteins. This pattern defines a new class of HTH DNA-binding proteins. The molecular pattern of DNA recognition by ets proteins is compared to other HTH proteins for which crystal structures of the protein-DNA complexes are available.

EXPERIMENTAL PROCEDURES
PU.1 DNA Complex-A recombinant fragment encompassing residues 160-272 from the murine ets protein PU.1 was crystallized in complex with a 16-base pair oligonucleotide representing a consensus PU.1 DNA-binding site (3) as described previously (17). The complex crystallized in space group C2 with a = 89.1, b = 101.9, c = 55.6 Å, and β = 111.2°. There are two complexes in the asymmetric unit. The length of the oligonucleotide was critical for crystallization and the oligonucleotide used to form the complex permitted end-to-end stacking of the DNA in the crystal lattice with the formation of pseudo-base pairing by the overhanging A and T bases.
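For orientation, the unit cell volume implied by these parameters can be computed from the standard monoclinic formula V = abc sin β. The Python sketch below is an illustration, not part of the original analysis; it assumes the four general positions of space group C2, so that two complexes per asymmetric unit correspond to eight complexes per cell. The function and variable names are hypothetical.

```python
import math


def monoclinic_cell_volume(a, b, c, beta_deg):
    """Volume of a monoclinic unit cell, V = a * b * c * sin(beta)."""
    return a * b * c * math.sin(math.radians(beta_deg))


# Cell parameters reported for the PU.1 ETS domain-DNA crystals (angstroms, degrees).
volume = monoclinic_cell_volume(89.1, 101.9, 55.6, 111.2)

# Assumption: space group C2 has 4 asymmetric units per cell;
# the text reports 2 complexes per asymmetric unit.
asymmetric_units_per_cell = 4
complexes_per_cell = asymmetric_units_per_cell * 2
volume_per_complex = volume / complexes_per_cell

print(f"cell volume: {volume:.0f} cubic angstroms")
print(f"volume per complex: {volume_per_complex:.0f} cubic angstroms")
```

The cell volume comes out at roughly 4.7 x 10^5 Å³.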
Crystallographic Analyses-The initial structure analysis of the complex solved by the MIRAS method was reported previously (11). For this first phase of the study, a native data set and four heavy atom data sets were collected using a Rigaku RU200 rotating anode x-ray source and two San Diego Multiwire Systems area detectors. The initial data sets were collected from flash frozen crystals at 2.3-Å resolution. To refine the structure further, another native data set extending to 2.1 Å was collected at the LURE synchrotron source in Orsay, France. Diffraction data were collected at station D41 interfaced with the Mark III multiwire proportional area detector. Data sets were processed using MOSFLM (18) and ROTAVATA, AGROVATA, and TRUNCATE in the CCP4 package (19). In the present study, this native data set was scaled to the data collected in the home laboratory by Wilson scaling and the synchrotron data were incorporated into the refinement. The programs PHASES (20), FRODO (21), and X-PLOR (22) were used for structure solution, model building, and refinement. The current R-factor is 22.5% for 6 to 2.1 Å data (22,022 reflections). The average overall B-factor for 2929 non-hydrogen atoms (1486 protein atoms + 1300 DNA atoms + 143 solvent oxygens) is 31.6 Å². The refinement statistics are presented in Table I. There were 11 disordered residues at the amino terminus of the domain and 14 disordered residues at the carboxyl terminus of the recombinant fragment that were excluded from the model. These residues were not ordered even when the resolution was extended to 2.1 Å. For all residues representing the complete ETS domain (residues 171-258), the electron density was clear and permitted unambiguous fitting of both backbone and side chain atoms. More solvent atoms have been added to the model. Only minimal changes in the configuration of some side chains were evident in the high resolution map.
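The crystallographic R-factor quoted above is conventionally defined as the sum of the absolute differences between observed and calculated structure factor amplitudes divided by the sum of the observed amplitudes. The Python sketch below illustrates this standard definition only; it is not the X-PLOR implementation, the amplitudes are made-up values, and the calculated amplitudes are assumed to be already on the same scale as the observed ones.

```python
def r_factor(f_obs, f_calc):
    """Conventional R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|).

    Assumes f_calc has already been scaled to f_obs.
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


# Made-up amplitudes for illustration; not data from this study.
example_r = r_factor([100.0, 200.0], [90.0, 210.0])
print(f"R = {100 * example_r:.1f}%")

# Consistency check on the atom counts quoted in the text.
total_atoms = 1486 + 1300 + 143
print(total_atoms)
```

The atom counts quoted in the text (protein, DNA, and solvent) sum to the stated total of 2929.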
All main chain torsion angles in the domain fall within energetically favorable limits (Fig. 1), indicating that no segment of the domain is denatured or randomly configured. The DNA was clearly defined even in the first MIRAS map.
Analyses of DNA Helical Parameters-To analyze the stereochemical basis for the uniform bending observed in the oligonucleotide bound in complex to PU.1, the DNA superstructure was measured (23,24) and four parameters were calculated that describe the conformation of the DNA bases and the phosphate backbone. The values were calculated (excluding the 5' A overhang) to analyze helical parameters along the length of the oligonucleotide and to compare these with standard B-DNA parameters. The geometry of dinucleotide steps was analyzed for three rotational angles defining twist, tilt, or roll and for one translational distance, i.e. rise. The values for these parameters are presented in Table II.
Sequence Alignments and Structural Comparisons-Sequence alignments for ets proteins were made using GENEWORKS. The individual sequences were collected from the SWISSPROT data base and regions corresponding to the ETS domains were excised from the full-length protein before the alignment process began (25). The results of this comparison are presented in Fig. 2. Sequence comparisons between members of different families of HTH proteins were made using the program QUANTA (Molecular Simulations, Inc.), especially when structure-based alignments were utilized. To search structure data bases to identify proteins with similar overall scaffolds to the PU.1 domain, the algorithm DALI developed by Chris Sander (26) was used. For structural comparisons of HTH proteins, coordinates were obtained from the Brookhaven Protein Data Bank (27): 434 cro repressor (code 3CRO), λ repressor (code 1LMB), CAP (code 1CGP), and heat shock factor (code 2HTS). The coordinates for HNF-3γ were kindly provided by Dr. S. Burley. The actual structural comparisons/graphical analyses were performed using QUANTA (Molecular Simulations, Inc.) and the Alberta/Caltech program TOM based on FRODO (21).

RESULTS AND DISCUSSION
The similarity of the structural organization of the ETS domains of PU.1 (11), fli-1 (12), and ets-1 (13,14) and the presence of a conserved hydrophobic core suggest that this overall scaffold will be highly conserved in all members of the family. To facilitate comparisons, the sequences of the ETS domains of 33 members of the ets family are aligned (Fig. 2). The sequences of this domain in a number of ets proteins are identical for two or more species, representing a significant level of homology within the family. The results of mutational substitutions in a number of ets proteins are tabulated in Table III.
Hydrophobic Core-The importance of the hydrophobic core was verified by site-directed mutagenesis of the PU.1 domain (11). Of the 14 strictly conserved residues in the domain, seven are found in the hydrophobic core. Single substitution of glycine for five of these residues in PU.1 (Fig. 3) resulted in loss of DNA binding. Two of these core residues also contact the DNA phosphate backbone. The peptide amide nitrogen of Leu 174 interacts with O2P of C-22 and the side chain NE-1 of Trp 215 forms a hydrogen bond with O1P from T-23. Mutation of tryptophan 215 to arginine results in loss of DNA binding in ets-1 (28, 29; see Table III). Substitutions in the hydrophobic core affect DNA binding probably because the changes disrupt the tight globular structure of the domain. Residues 174 and 215 are doubly critical for DNA binding since they represent both important structural residues in the domain core and actual DNA contact residues. In summary, residues in the hydrophobic core are critical for the formation of the overall scaffold for ets recognition.
Molecular Scaffold of ETS Domains-To evaluate the conservation of this scaffold within the ets family, the α-carbon backbones of PU.1 (11) and fli-1 (12) domains were superimposed utilizing both sequence homology and secondary structure similarities. For this purpose, a single model from the ensemble of structures deposited in the data bank was used for the NMR-derived fli-1 structure. This scaffold provides the framework for the three structural features arranged in a loop-helix-loop pattern that mediate precise DNA binding by the PU.1 domain. In order to delineate the loop-helix-loop motif in other ets domains and to predict whether this motif is the paradigm for ets recognition, we also superimposed the α-carbon skeleton of the fli-1 domain onto the PU.1 backbone bound to the DNA (Fig. 4). Since this is one of an ensemble of structures from the NMR study, detailed comparisons are not possible. However, general comparisons are useful to establish overall structural similarities between the two related molecules.
Fig. 2. The amino acid sequence of PU.1 is listed at the top of the figure and residues that are strictly conserved in the family are enclosed in boxes. The sequences were obtained from the SWISSPROT data base and original citations for the sequences are given in the data base. Secondary structural features of the PU.1 ETS domain are indicated above the alignment. Directly under the PU.1 sequence, the residues that contact DNA are indicated: B, base interaction; P, phosphate backbone interaction; W, water-mediated interaction. Residues found in the hydrophobic core in PU.1 and expected to be located in the hydrophobic interior of all ets proteins are shaded. In some cases, the sequences for ets proteins for two or several species are identical, and therefore only one sequence has been listed to avoid duplication.
Although the structure of the fli-1-DNA complex was not determined, it should be noted that the published structure of the fli-1 domain (12) reflects a bound conformation since the NMR experiments were conducted on a 98-residue protein fragment complexed to a 16-base pair oligonucleotide.
Table III (partial). (11) 174H, D L→G PU.1 (11) 178H L→G ets-1 (29) Multiple 174H, D, 175H, 177H, 178H ets-1 (28) 185 K→P ets-1 (28) 191H I→T PU.1 (11) 193H W→G ets-1 (28) 194 T→I ets-1 (28) 196
Table III footnotes: Residue numbers of the PU.1 sequence are given to facilitate direct comparison with the sequence alignment in Fig. 2; H indicates a residue in the hydrophobic core of the PU.1 domain and D indicates residues which contact DNA in the PU.1-DNA complex, either directly or by water-mediated interactions. X, substitution by any amino acid.
Fig. 3. The α-carbon backbone for residues 171-258 is shown bound to DNA with the bases in the GGAA core in bold lines. The ETS module is composed of three α-helices and a four-stranded antiparallel β-sheet enclosing a hydrophobic core. There are seven strictly conserved residues in this core (Fig. 2). Substitution of glycine for each of the five core residues in PU.1, shown on the model, abolishes DNA binding.
As shown in Fig. 4, there is close similarity in the overall scaffold of the ETS domains but several other features of the superposition are worth noting. First, the positions of the four conserved residues that contact DNA are very similar in PU.1 and fli-1. In PU.1, two conserved arginines, 232 and 235, make hydrogen bonds with the bases GGA of the PU core sequence. Arg 235 (NH-2) forms a hydrogen bond with G-8(O-6) while Arg 232 (NH-1) makes hydrogen bonds with two bases G-9(O-6) and A-10(N-6) on one strand and a water-mediated contact with T-23(O-4) on the opposite strand. These arginines are strictly conserved in all members of the ets family and the GGA sequence is the consensus DNA sequence recognized by the ets proteins. Therefore, these interactions are expected to be reproduced in all ets protein-DNA complexes. When the fli-1 domain is superimposed on PU.1, the side chains of conserved arginines 232 and 235 in the recognition helix are within hydrogen-bonding distance of the same bases in the GGAA core sequence in the major groove. Substitution of these residues by any other amino acid, even closely related hydrophilic amino acids results in loss of DNA recognition in PU.1, fli-1, and other ets proteins (see Table III). Conserved lysines, residues 219 in the loop (HTH) and 245 in the wing contact the phosphate backbone in PU.1 and are in a position to make the same contacts in fli-1. Mutational substitutions for Lys 219 in PU.1 (11) and the equivalents of Lys 219 and Lys 222 (see Table III) in fli-1 (12) or ets-1 (28) disrupt DNA binding, presumably due to the loss of the phosphate backbone interactions. In fli-1, the equivalents of Lys 222 and Met 225 in PU.1 (from the HTH loop) and residues 248/249 (from the wing loop) were identified within 4 Å of DNA by intermolecular NOEs (12). Chemical mapping experiments with the murine ets-1 molecule suggested a similar pattern with a major groove contact zone and interactions with both adjacent minor grooves (30).
DNA Conformation in the PU.1 ETS Domain-DNA Complex-The PU.1 ETS domain contacts DNA over a 10-base pair area. The DNA is bent by 8° in the complex but does not deviate significantly from B-form DNA (see Table II). As can be seen in Fig. 4, the DNA is uniformly curved over the length of the 16-base pair fragment. There is an average helical twist of 33°, with 10.8 base pairs per turn and an average rise per base pair of 3.2 Å. The minor groove is slightly enlarged (~2-3 Å from the mean) in the GGAA region at the midpoint of the oligonucleotide. A "spine" of water molecules, similar to that observed in the crystal structure of a B-DNA dodecamer (31), is located in the minor groove from bases 8 to 12. Binding of the ETS domain induces a DNase I-hypersensitive site 3' to the C-26 base in the core sequence (30). This site is probably exposed on the face of the DNA opposite to where the protein binds as a result of the expansion of the minor groove (Fig. 3).
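The helical repeat and pitch follow directly from the average twist and rise: base pairs per turn = 360°/twist, and pitch = (base pairs per turn) x rise. The short Python sketch below, with hypothetical function names, applies these relations to the rounded averages quoted above. With a twist of 33° it gives about 10.9 base pairs per turn, close to the reported 10.8; the small difference presumably reflects rounding of the average twist.

```python
def base_pairs_per_turn(twist_deg):
    """Helical repeat from the average twist per base pair step (degrees)."""
    return 360.0 / twist_deg


def helical_pitch(twist_deg, rise):
    """Axial length of one full helical turn, in the units of rise."""
    return base_pairs_per_turn(twist_deg) * rise


# Rounded averages quoted in the text for the PU.1-bound oligonucleotide.
TWIST_DEG = 33.0
RISE_ANGSTROM = 3.2

repeat = base_pairs_per_turn(TWIST_DEG)
pitch = helical_pitch(TWIST_DEG, RISE_ANGSTROM)

# Twist that would correspond exactly to the reported 10.8 bp per turn.
twist_for_reported_repeat = 360.0 / 10.8

print(f"{repeat:.1f} bp per turn, pitch {pitch:.1f} angstroms")
print(f"twist for 10.8 bp per turn: {twist_for_reported_repeat:.1f} degrees")
```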
The DNA bending that is stabilized by the PU.1 domain may serve as an illustration of the hypothesis of DNA bending by phosphate neutralization. It has been demonstrated, by the introduction of neutral methylphosphonate analogues in DNA fragments bearing polyadenylate tracts (32), that bending of the DNA occurs when the phosphate charges are neutralized on one face of the DNA helix, due to repulsion of the remaining anionic phosphates. It was proposed (32) that binding of proteins with cationic surfaces to DNA could also cause the DNA double helix to "spontaneously relax" toward the surface where cationic amino acids neutralized phosphate anions through formation of salt bridges. The PU.1 ETS domain makes neutralizing contacts with phosphate groups on one face of the DNA helix, involving consecutive phosphates on either side of the major groove. The sites of phosphate neutralization are shown on the DNA sequence in Fig. 5. On the GGAA strand, neutralizing contacts with the phosphate backbone 5' to the core sequence are made by Lys 208 and Lys 245 from the wing. On the complementary strand, the phosphate contacts are 5' to the core sequence as well as with the phosphate backbone within the core: Arg 173, Lys 219, and Lys 223 from the HTH loop and Lys 229 from helix α3. As predicted by the neutralization experiments (32), the cationic surface of the PU.1 domain binds to the DNA causing a bend of the duplex oligonucleotide toward the ETS module that is within the range (~10°) of curvature estimated experimentally. The bend is toward the "neutral surface," i.e. toward the protein. Two of these phosphate interactions in the minor groove involve conserved residues, Lys 219 from the HTH loop and Lys 245 from the wing. Thus the loop-helix-loop pattern may influence both DNA recognition and DNA bending.
This type of charge neutralization is not seen in all protein-induced DNA bends. For example, the TATA-binding protein binds with extensive phosphate backbone interactions to the TATA element (33). Yet in this case the DNA is sharply kinked away from the protein contacts. In CAP (15), salt bridges and other hydrogen bonds to phosphate groups stabilize a severely kinked DNA conformation with DNA bent at 90°.
Interactions with the phosphate backbone are seen in numerous DNA-binding proteins, but these contacts are often hydrogen bonds and not salt bridges. The hypothesis (32) states that neutralization of charge by lysines and arginines results in excess repulsive electrostatic forces that can maintain bending of the DNA double helix (34). The moderate DNA bending seen in the complexes of oligonucleotides with paired homeodomains (35,36) or HNF-3γ (16) may also result from phosphate neutralization, since these proteins form phosphate-side chain salt bridges with 4 or 3 arginines, respectively. However, the neutralizing contacts are not as extensive as those seen in the PU.1-DNA complex.
The complementarity of the loop-helix-loop motif of fli-1 with the DNA from the PU.1 complex also suggests that, like PU.1, other ETS domains may not significantly deform DNA from B-DNA conformation, but to date there is not much biochemical data in the literature on DNA bending by ETS domains. In one study of the ETS domain from the Elk-93 protein, circular permutation analyses indicated that DNA binding by the Elk-93 fragment did not induce significant bending of DNA (37). In contrast, in the human ets-1-DNA complex (14), the DNA was kinked at a 60° angle due to intercalation of a tryptophan side chain. The equivalent of this tryptophan, tyrosine 175 in PU.1, is found in the hydrophobic core and is not in position to intercalate. Substitution of glycine for this tyrosine in PU.1 does not affect DNA binding (11). In fli-1 (12), the equivalent tryptophan is buried in the hydrophobic core and was not listed among residues in close proximity (≤4 Å) to DNA. Thus, the molecular basis for kinked DNA cannot be understood in the context of contacts seen in the PU.1-DNA complex (11) or inferred in the fli-1 complex (12). DNA bending by phosphate neutralization is not apparent in the ets-1-DNA complex, since only one lysine and one arginine form phosphate-side chain salt bridges. The arginine is the equivalent of Arg 235 in PU.1 that forms a hydrogen bond with base G-8 in the GGA core.
Target Specificity-The superimposed models in Fig. 4 suggest that a loop-helix-loop scaffold that brings together conserved amino acids and conserved DNA bases is a general mode of DNA recognition by ets proteins. Yet, ets transcription factors bind to the GGA(A/T) core motif in the context of specific promoters. To begin to identify residues that influence target specificity, it is necessary to look for mutations of non-conserved residues that affect DNA binding. Of the 14 absolutely conserved residues in the domain, seven contact DNA in the PU.1 complex. These contacts would be expected to be maintained for all ets-DNA complexes. In studies of a number of members of the ets family, mutations have been reported that affect DNA binding. These mutations, summarized in Table III, can now be correlated with the atomic model of the PU.1-DNA complex. Some of these residues are conserved residues, but others are unique to a particular molecule.
Fig. 6. Residues that are shaded represent positions in classic HTH that are generally hydrophobic or small (Gly or Ala) in these proteins. The glycine that is conserved in the bacterial HTH proteins is marked with an asterisk. Note that helix α2 in PU.1 is one turn longer than the counterpart in the bacterial proteins, yet when the HTH motifs of the repressors are superimposed on the PU.1 HTH, the glycine in the last turn of the PU.1 α2 helix is equivalent to the conserved glycine in the turn of the bacterial proteins (not shown). Panel B, the HTH motifs of PU.1 (thick line), CAP (medium line; Ref. 15), and heat shock factor (thin line; Ref. 40) are superimposed for comparison. The α3 recognition helix is on the right in the photograph. Note that the relative orientation of the two helices is closely similar in the three molecules, but the configuration of the residues in the turn between the helices is different.
The turn in the PU.1 domain is seven residues in length which is intermediate between the extremes reported for the family of HTH proteins (43,44).
It should be emphasized that PU.1 contacts both strands at the GGAA core. Interactions are made by conserved residues as well as residues where sequence variability exists in the ets family. Therefore, ets recognition requires specific base contacts with the GGAA sequence and the bases on the complementary strand. For example, it has been shown that a single residue converts DNA recognition of ets proteins from GGAA to GGAT. When a lysine in chicken ets-1 (equivalent to residue 229 in PU.1) is altered to threonine found in this position in Elf-1 and E74, the resultant protein exhibits a restricted selectivity for GGAA like the Elf-1/E74 proteins and the reverse mutation causes the converse change in DNA recognition (38). In the PU.1 complex, Lys 229 is located in the recognition helix and makes a water-mediated contact to base C-25 on the antisense strand at the GGAA core. There is a water network located in the major groove at the GGAA site. Twelve well defined water molecules are hydrogen-bonded to the bases and also form a hydrogen-bonded network between the two strands. This water network may contribute to the stability of the duplex and consequently influence specific DNA recognition.
Since the side chain of lysine is long, it is possible that a shorter residue such as threonine would not contact this water network and could instead contact a different base, i.e. T-23. The water network itself could also change. Alternatively, the lysine↔threonine interchange could permit a DNA contact that reflects the stereochemical difference in size between adenine and thymine bases.
HTH Motif-All of the direct contacts with specific bases in the PU.1-DNA complex are made by residues in the α3 recognition helix. Two non-conserved residues, Thr 226 and Gln 228, at the amino-terminal end of this helix, make water-mediated contacts with bases C-25 and C-26, respectively, that are base paired to guanines 8 and 9 in the core GGAA sequence. Both of these residues are unique to PU.1/SpiB in the ets family, so these may represent PU.1-specific contacts.
Tyr 227, which is strictly conserved in the ets family, is located in the hydrophobic interior of the protein. While the phenyl ring of this tyrosine is buried, the hydroxyl group is exposed and lies within 3.6 Å of G-6(O1P). This residue was not included in our list of DNA contacts using a conservative cut-off of 3.2 Å for hydrogen bonds/ionic interactions. Although this interaction may not occur in PU.1, with a simple side chain rotation, a hydrogen bond is possible with the phosphate backbone. This may be an example of a contact made by a conserved residue that influences DNA recognition by selected family members. Substitution of cysteine for this tyrosine abolishes DNA binding in ets-1 (28).
Fig. 7. In each of these complexes, the recognition helix makes contact in the major groove. The contacts of the PU.1 domain with DNA are more extensive and include interactions from two loops in the minor grooves on either side of the major groove where recognition helix α3 binds.
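The effect of the 3.2 Å cut-off can be illustrated with a simple distance filter over candidate donor-acceptor atom pairs. The Python sketch below is an assumption-laden illustration and not the procedure used in this study: the coordinates are invented so that one pair sits at the 3.6 Å separation quoted for Tyr 227 and G-6(O1P) and a second, entirely hypothetical pair sits at 2.9 Å.

```python
import math

# Conservative cut-off (angstroms) quoted in the text for hydrogen bonds/ionic interactions.
CONTACT_CUTOFF = 3.2


def contacts_within(pairs, cutoff=CONTACT_CUTOFF):
    """Keep (label, distance) for atom pairs separated by no more than cutoff.

    pairs is an iterable of (label, xyz_a, xyz_b) with coordinates in angstroms.
    """
    kept = []
    for label, xyz_a, xyz_b in pairs:
        distance = math.dist(xyz_a, xyz_b)
        if distance <= cutoff:
            kept.append((label, round(distance, 2)))
    return kept


# Invented coordinates (not taken from the refined model).
candidate_pairs = [
    ("Tyr 227 OH / G-6 O1P", (0.0, 0.0, 0.0), (3.6, 0.0, 0.0)),
    ("hypothetical donor-acceptor pair", (0.0, 0.0, 0.0), (0.0, 2.9, 0.0)),
]

accepted = contacts_within(candidate_pairs)
print(accepted)
```

With these inputs only the 2.9 Å pair passes the filter, which mirrors why the 3.6 Å Tyr 227 interaction was left out of the contact list.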
In Fig. 6A, the sequence of the HTH motif of PU.1 is compared with the sequence of "classic" bacterial HTH proteins and other winged-HTH proteins. The glycine required in the turn between helices in HTH proteins (39) is also conserved in this position in ETS domains, although the α2 helix is one turn longer than the helix in HTH proteins. In PU.1, the glycine lies in the last turn of this helix. This glycine and other hydrophobic residues in α2 and α3 stabilize the arrangement of these two helices in HTH proteins. Even this pattern of conserved hydrophobic residues is seen in ets proteins. In other winged-HTH proteins, HNF-3γ (16) or heat shock factor (40), the sequence similarities are not as apparent. These two proteins have prolines in the equivalent position of the conserved glycine and the presence of this proline may influence the configuration in the "turn." On the other hand, ets proteins may exhibit a helical arrangement that is structurally closer to that in "classic" HTH proteins. When HTH elements of PU.1 and HTH molecules such as λ (41) or 434 cro (42) repressors are superimposed, the glycine is in a structurally equivalent position (not shown). Moreover, the overall pattern of docking of the recognition helix in the major groove is quite similar when 434 cro repressor (42), CAP (15), and PU.1 are compared bound to DNA (Fig. 7). The major difference is that the recognition helix in PU.1 docks deep in the major groove with contacts to the bases involving residues along the entire length of the helix, while DNA contacts in CAP and other classic HTH proteins are made from residues at the amino-terminal portion of the helix.
None of the related proteins in the HTH superfamily actually contact DNA by residues in the HTH turn (43,44). This novel DNA contact may be possible in PU.1, as well as other ets proteins, because the connecting segment between helices is more of a loop than a turn. The corresponding HTH motifs of heat shock factor (40) and CAP (15) are compared to PU.1 in Fig. 6B. But it is not simply the length of the "turn" or "loop" in the HTH motif that accounts for this DNA contact in PU.1, since other eukaryotic HTH proteins contain even longer connecting segments (43,44) and yet do not contact DNA by this structural feature, for example HNF-3γ (16). Thus the contacts made by this loop in PU.1 illustrate a new DNA contact that, to date, is unique to the ets proteins as the newest members of the HTH superfamily.
Loops and Minor Groove Contacts-Since the sequences in the HTH loop as well as the loop (wing) between strands β3 and β4 are not strictly conserved among members of the ets family, these residues may be important sites for specific recognition by individual members of the family. In the PU.1-DNA complex, these two loops contact the minor groove through interactions with the phosphate backbone closest to the major groove. It is also interesting to note that the length of both of the contact loops differs among members of the family, with the PU.1 loop containing an "extra" glycine at residue 220 and lacking a glycine after residue 247. Other residues in these loops may also provide specific contacts to bases in other ets proteins. For example, the change of arginine→aspartic acid (equivalent to 244 in PU.1) affects DNA binding in Elk-1 (45).
Since ets proteins bind DNA as monomers, it could be expected that there would be extensive contacts to stabilize the interaction. HNF-3γ also binds DNA as a monomer (16). In the HNF-3γ complex, three regions were involved in DNA recognition: the recognition helix and two wings. The location of the first wing between the last two strands in the β-sheet corresponds topologically to the wing in PU.1, but contacts from the second wing emanate from a loop at the COOH terminus of the domain. The structural equivalent of this second loop is absent in PU.1. In CAP, the major DNA contacts are made from the recognition helix. This protein binds DNA as a dimer. The surface area on CAP that is buried on DNA binding is 1187 Å². Similarly, the surface area buried when 434 cro repressor binds DNA is 1306 Å². But with the formation of the DNA complex with the PU.1 ETS domain, 1701 Å² of surface area is buried. The significantly greater surface area of the PU.1 domain covered reflects the extensive protein-DNA contact region extending for more than 30 Å (11).
The PU.1-DNA model suggests that residues from the two loops contribute the critical interactions for recognition of bases other than the conserved GGAA core when the core is embedded in specific promoter sequences. The loops approach segments of the DNA that are adjacent to the conserved core sequence and therefore these interfaces are stereochemically suitable to permit sequence-specific interactions by a given family member while maintaining the consensus interactions at GGA(A/T). Moreover, the contacts from these loops may mediate specific base interactions by stabilizing a bend toward the protein. Future extensive mutational studies of amino acids that contact DNA are needed to identify these residues. Ultimately, crystal structures of other ets proteins complexed to DNA can be compared to distinguish unique DNA contacts.

The DNA junction-resolving enzyme endonuclease VII of bacteriophage T4 contains a zinc-binding region toward the N-terminal end of the primary sequence. In the center of this 39-amino acid section (between residues 38 and 44) lies the sequence HLDHDHE, termed the His-acid cluster. Closely related sequences are found in three other proteins that have similar zinc-binding motifs. We have analyzed the function of these residues by a site-directed mutagenesis approach, modifying single amino acids and studying the properties of the resulting N-terminal protein A fusions. No sequence changes within the His-acid cluster led to a change in zinc content of the protein, indicating that these residues are not involved in the coordination of zinc. We found that the N-terminal aspartate residue (Asp-40) and the two histidine residues (His-41 and His-43) within the cluster are essential for junction-cleavage activity of the proteins.
However, all sequence variations within this region generate proteins that retain their ability to bind to four-way DNA junctions (with minor changes in binding affinity in some cases) and to distort their global structure in the same manner as active enzymes. We conclude that the process of cleavage can be uncoupled from those of binding to and distortion of the junction. It is probable that some amino acid side chains of the Hisacid cluster participate in the phosphodiester cleavage mechanism of endonuclease VII. The essential aspartate residue might be required for coordination of catalytic metal ions.
The DNA junction-resolving enzymes are a class of nucleases that recognize the structure of branched DNA molecules. Such enzymes are important in DNA recombination and repair for the processing of four-way DNA junctions created as intermediates (1-10). Junction-selective nucleases have been isolated from bacteriophage-infected eubacteria (11,12), Escherichia coli (13)(14)(15)(16), yeast (17,18), and mammalian cells (19,20) and their viruses (21) and are very probably ubiquitous cellular enzymes.
These proteins are fundamentally structure-selective. For example, the complexes formed between T7 endonuclease I, T4 endonuclease VII, or yeast CCE1, and four-way DNA junction are not displaced by a 1000-fold excess of duplex DNA of the same sequence (22)(23)(24). Although RuvC of E. coli and yeast CCE1 exhibit significant sequence specificity, this is manifested at the level of the cleavage reaction, and these enzymes bind to four-way junctions of any sequence (24,25). The existence of mutants of T7 endonuclease I and T4 endonuclease VII that bind normally to DNA junctions but are defective in cleavage suggests that binding and catalysis are separable events. While the binding of resolving enzymes is selective for the structure of DNA junctions, the act of binding in general also distorts the global configuration of helical arms (22,26,27).
Endonuclease VII of T4 is required during late infection in order to resolve DNA branch points prior to packaging of the DNA into phage heads (11,28). Examination of the primary sequence of endonuclease VII (29) suggests the existence of three sections that might form modules within the overall protein structure. There is a region at the C terminus that is 48% identical to a sequence within the pyrimidine dimer glycosylase and nuclease T4 endonuclease V. The structure of this repair enzyme is known (30), and the region of similarity comprises a helix and an extended section. Replacement of the region of endonuclease VII with the corresponding sequence of endonuclease V resulted in a chimeric protein that retained its specificity for the precise cleavage of four-way DNA junctions (23). In the center of the endonuclease VII sequence is a section with some similarity to a region of the functionally related resolving enzyme T7 endonuclease I. We have previously found that in selection of non-functional mutants of T7 endonuclease I, all such mutants map within this region of the protein (22), suggesting that it may comprise a significant part of the active site for DNA cleavage. We have shown that a mutation within the corresponding part of the primary sequence of T4 endonuclease VII results in a catalytically inactive protein (27).
The N-terminal section of endonuclease VII contains a 40-amino acid region bounded by two Cys-X-X-Cys motifs that binds an atom of zinc (23). This region is 42% identical to a section found in a protein (gp59) of unknown function encoded by mycobacteriophage L5 (31). In addition, an open reading frame identified downstream of the secF gene of E. coli and Salmonella typhimurium (32) encodes a 109-amino acid protein of unknown function that includes a region that is 43% identical with this section, which is also bounded by Cys-X-X-Cys motifs. The four sequences are collected together in Fig. 1. The comparison reveals a number of conserved features, including arginine (position 28 in endonuclease VII), asparagine (position 31), and glycine (position 51). In addition, there is a conserved four-residue sequence Asp-His-Asp-His beginning with aspartate 40 in endonuclease VII. Indeed, this cluster of histidine and acidic residues can be extended in endonuclease VII, beginning with histidine 38, to read HLDHDHE. Acidic residues are frequently involved in the catalytic sites of nucleases (33)(34)(35)(36), where they coordinate metal ions that participate in the chemistry of phosphodiester bond cleavage, and we were curious to learn whether any or all of these residues might be involved in catalysis. We have therefore made point mutants of endonuclease VII in which amino acids within the cluster of histidine and acidic side chains (referred to subsequently as the His-acid cluster) have been changed to different residues, and we have examined the ability of the resulting proteins to bind to or cleave four-way DNA junctions.

MATERIALS AND METHODS
Synthesis of Oligonucleotides-Oligonucleotides were synthesized using β-cyanoethyl phosphoramidite chemistry (44,45) implemented on a 394 DNA/RNA synthesizer (Applied Biosystems). Fully deprotected oligonucleotides were purified by gel electrophoresis in 10% polyacrylamide containing 7 M urea, the bands were excised, and DNA was eluted and recovered by ethanol precipitation.
Construction of Four-way DNA Junctions-A four-way DNA junction with four arms of 25 bp was generated by the hybridization of four oligonucleotides of 50 nucleotides each, one of which was radioactively 5′-³²P-labeled. The oligonucleotides were based upon the sequence of junction 3 studied by Duckett et al. (39); thus, the b strand had the sequence 5′ CCTCGAGGGATCCGTCCTAGCAAGGGGCTGCTACCGGAAGCTTCTCGAGG 3′.
For the comparative gel electrophoretic analysis of the global structure of the complex between junction and endonuclease mutants, six junctions comprising two long arms of 40 bp and two short arms of 15 bp were each generated by hybridization of appropriate oligonucleotides based on the sequence of junction 3 as described above. As an example, the BH junction (where the B and H arms are long) was obtained using the following four oligonucleotides: b strand, 5′ CGCAAGCGACAGGAACCTCGAGGGATCCGTCCTAGCAAGGGGCTGCTACCGGAAGCTTCTCGAGGTTCCTGTCGCTTGCG 3′; h strand, 5′ CGCAAGCGACAGGAACCTCGAGAAGCTTCCGGTAGCAGCCTGAGCGGTGGTTGAA 3′; r strand, 5′ TTCAACCACCGCTCAACTCAACTGCAGTCT 3′; and x strand, 5′ AGACTGCAGTTGAGTCCTTGCTAGGACGGATCCCTCGAGGTTCCTGTCGCTTGCG 3′.
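As a consistency check on the BH junction oligonucleotides listed above, the following minimal sketch computes how many base pairs the 3′ end of each strand can form with the 5′ end of the next strand. The cyclic pairing order b, h, r, x and the helper names (`reverse_complement`, `paired_length`, `arm_lengths`) are assumptions made for illustration; only the four sequences come from the text.

```python
# Strand sequences of the BH junction as listed in the text (5' to 3').
STRANDS = {
    "b": "CGCAAGCGACAGGAACCTCGAGGGATCCGTCCTAGCAAGGGGCTGCTACCGGAAGCTTCTCGAGGTTCCTGTCGCTTGCG",
    "h": "CGCAAGCGACAGGAACCTCGAGAAGCTTCCGGTAGCAGCCTGAGCGGTGGTTGAA",
    "r": "TTCAACCACCGCTCAACTCAACTGCAGTCT",
    "x": "AGACTGCAGTTGAGTCCTTGCTAGGACGGATCCCTCGAGGTTCCTGTCGCTTGCG",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def paired_length(first, second):
    """Number of contiguous base pairs formed between the 3' end of
    `first` and the 5' end of `second` (antiparallel pairing)."""
    target = reverse_complement(first)
    count = 0
    for a, b in zip(target, second):
        if a != b:
            break
        count += 1
    return count


def arm_lengths(strands=STRANDS, order="bhrx"):
    """Arm length for each consecutive strand pair, taken cyclically.
    The cyclic order is an assumption, not stated in the text."""
    result = []
    for i, name in enumerate(order):
        nxt = order[(i + 1) % len(order)]
        result.append((name, nxt, paired_length(strands[name], strands[nxt])))
    return result


if __name__ == "__main__":
    for name, seq in STRANDS.items():
        print(name, len(seq), "nt")
    for first, second, length in arm_lengths():
        print(f"{first}/{second} arm: {length} bp")
```

Under this pairing assumption the two arms that involve the b strand come out long and the other two short, which is the two-long, two-short arrangement the text describes for this species.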
Cloning of Endonuclease VII to Encode an Oligohistidine Fusion Protein-A fragment containing part of the synthetic gene encoding endonuclease VII was obtained by digestion of pAT153-SEVII (23) by HindIII and BamHI. The sequence lost from the 5′ end of the gene was restored by hybridization of two oligonucleotides, giving a DNA fragment containing an NcoI site at the 5′ end and a HindIII site at the 3′ end; this also added the coding sequence for 10 histidine residues and a site for proteolytic cleavage by enterokinase. This was ligated into pET-19b (previously digested by NcoI and BamHI) to generate the plasmid pETSEVII, which was used to transform the E. coli strain HMS174(DE3) pLysS.
Mutagenesis by Polymerase Chain Reaction-Mutagenesis of the synthetic endonuclease VII gene was performed by polymerase chain reaction (PCR) using one primer that differed in sequence from the synthetic gene sequence by one or two nucleotides (therefore changing the appropriate codon) and one non-mutated primer. In order to maximize the stability of the hybrid, at least seven nucleotides were placed between the mutagenic mismatch(es) and the ends of the oligonucleotide. In all cases, the mutated primer extended to the nearer restriction site to facilitate the cloning. For example, the sequence of the primer used for changing the histidine 38 to a glutamine was 5′ CTCAATCCAGACGTCCAAGCTAATCAACTTGACCAC 3′, where the AatII cloning site and mutagenic adenosine base are underlined. The other primer corresponded to a sequence downstream to the targeted area, 5′ CCACGTTGCAGCATCTCTGC 3′.
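The primer design rules above can be checked on the H38Q primer with a short sketch. The underlining in the original is not recoverable here, so two things are assumptions: that the primer is read in its first reading frame, and that the mutagenic adenosine is the third base of the CAA codon immediately preceding Leu-Asp-His. The function names are hypothetical; the AatII recognition sequence GACGTC and the standard genetic code are the only external facts used.

```python
PRIMER = "CTCAATCCAGACGTCCAAGCTAATCAACTTGACCAC"  # H38Q primer from the text
AATII_SITE = "GACGTC"

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq, frame=0):
    """Translate a DNA sequence in the given frame, ignoring a trailing partial codon."""
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


def mismatch_flanks(primer, motif="QLDH"):
    """Return (bases 5' of the mismatch, bases 3' of the mismatch), assuming
    the mismatch is the third base of the first codon of `motif`."""
    codon_index = translate(primer).find(motif)
    if codon_index < 0:
        raise ValueError("motif not found in frame 0")
    base_index = 3 * codon_index + 2
    return base_index, len(primer) - base_index - 1


if __name__ == "__main__":
    print("translation:", translate(PRIMER))
    print("AatII site starts at index:", PRIMER.find(AATII_SITE))
    print("flanking bases (5', 3'):", mismatch_flanks(PRIMER))
```

Read this way, the primer encodes Gln-Leu-Asp-His where the wild-type cluster begins His-Leu-Asp-His, and the assumed mismatch sits more than seven nucleotides from either end, in line with the stated design rule.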
PCR reactions were performed as described by Landt et al. (47) using 1 unit of Taq DNA polymerase, 1 ng of plasmid DNA, 100 pmol of each primer, 50 μM dNTPs in 50 mM Tris-HCl (pH 9), 1 mM MgCl2, 0.1% Triton X-100 for 30 cycles (1 min at 93°C, 1 min at 45°C, and 45 s at 72°C). The PCR products were digested with the restriction enzymes AatII and AflII, and the fragment was purified by gel electrophoresis and used to replace the wild-type sequence in pK19SEVII (23).
DNA Sequencing-The base sequences of wild-type and mutated genes were obtained by primer extension-dideoxy sequencing (48).
Preparation of Oligohistidine-Endonuclease VII Fusion Protein from E. coli-Endonuclease VII as an oligohistidine fusion protein was prepared from 1 liter of E. coli strain HMS174(DE3) pLysS transformed with pETSEVII. The cells were grown to an A660 of 0.6 and then induced with IPTG to a final concentration of 1 mM for 2 h. Cells were harvested, resuspended in 20 ml of 20 mM Tris-HCl (pH 7), 0.5 M NaCl, 1 mM imidazole, and lysed by sonication. Unbroken cells and cell debris were removed by centrifugation (40,000 × g for 15 min). The protein was purified by affinity chromatography using a Fractogel EMD chelate column previously charged with nickel chloride. The protein was eluted using a gradient of imidazole from 0 to 500 mM in 20 mM Tris-HCl (pH 7), 0.5 M NaCl. The protein-containing fractions were pooled and dialyzed for 2 h at 4°C against 50 mM Tris-HCl (pH 7.4), 1 mM DTT, 50% glycerol and were stored at −20°C.

Preparation of Protein A-Endonuclease VII Fusion Proteins from E. coli-E. coli strain JM101 transformed with the appropriate protein A-fusion plasmids based on the pK19 system (37) was grown to an A660 of 0.6 and induced with 0.5 mM IPTG for 2 h. Cells were harvested by centrifugation and resuspended in 20 ml of 20 mM MES (pH 6). Cells were lysed by sonication as described above. Two ammonium sulfate precipitation steps were performed. Ammonium sulfate was added to 40% saturation and the precipitate was discarded. Further ammonium sulfate was added to the supernatant to a final concentration of 65% saturation. The pellet was redissolved in 5 ml of a solution of 20 mM MES (pH 6), 1 mM DTT and was applied to an S-Sepharose ion-exchange column. A gradient of NaCl in the same buffer was applied to the column. The peak fractions containing the protein were dialyzed against 20 mM Tris-HCl (pH 7.4), 1 mM dithiothreitol, 50% glycerol for 2 h at 4°C. The concentration of the protein was determined by the Bradford method, calibrated against a previous amino acid analysis for endonuclease VII-H41T (23). The purity of the proteins obtained was verified by polyacrylamide gel electrophoresis in the presence of SDS.
Preparation of Endonuclease VII H38T Without Fusion Polypeptide-Protein A-endonuclease VII H38T was prepared as described above, but after elution from the S-Sepharose column the pooled fractions were not dialyzed in the glycerol-containing solution. The buffer was changed by centrifugation in a Centriplus concentrator (Amicon) to 20 mM sodium phosphate (pH 7.4), and the sample was concentrated to 5 ml. The protein was digested overnight with a ratio of 1:500 (w/w) of protease factor Xa. The digested protein was reapplied to the S-Sepharose column. The released protein A was not retained by the resin, and nonfusion endonuclease VII H38T was eluted using a gradient of NaCl in 20 mM MES (pH 6), 1 mM DTT. The protein-containing fractions were dialyzed in 20 mM Tris-HCl (pH 7.4), 1 mM DTT, 50% glycerol for 2 h at 4°C.
FIG. 1. … (23) that is coordinated by four cysteine residues, and the sequence of this region is shown. The His-acid cluster at the center of the section is highlighted in larger type. The C-terminal section is similar to a region of T4 endonuclease V. The central section has some similarity to a region of T7 endonuclease I, and the catalytically inactivating E86A mutation (27) lies in this region. B, an alignment of sequences of probable zinc-binding domains in T4 endonuclease VII, gp59 of mycobacteriophage L5 (31), and open reading frames identified in the secD locus of E. coli and S. typhimurium (32). The sequences are aligned by their CXXC sequences. Note the presence of the His-acid cluster within each of these sequences.
Cleavage of Four-way DNA Junctions-Reactions were performed on ice in 10 μl of 112 nM four-way DNA junction 3 individually 5′-³²P-labeled on either the b, h, r, or x strand and endonuclease VII or mutant protein in 50 mM Tris-HCl (pH 7.4), 50 mM NaCl, 1 mM DTT, 100 μg/ml bovine serum albumin, 10 mM MgCl2 (cleavage reaction buffer) for 20 min. The reactions were terminated by addition of 10 μl of formamide, 50 mM EDTA. The samples were loaded on a 10% polyacrylamide denaturing gel (acrylamide/bisacrylamide, 29:1). After electrophoresis the gels were dried onto Whatman 3MM paper and subjected to autoradiography at −70°C using Fuji RX x-ray film with Ilford fast tungstate intensifier screens.
Gel Electrophoretic Retardation Analysis of Protein A-Endonuclease VII Mutants and Measurement of the Apparent Binding Constants-… The electrophoresis was performed at 110 V for 4-6 h. Magnesium salt-containing buffers were continuously recirculated at 1 liter/h. Dried gels were subjected to autoradiography, and the radioactivity present in different bands was quantified as described above. The fraction of DNA bound to protein (f_b) was calculated for each protein concentration, and the association constant (K_a) was calculated by fitting the data by regression analysis to the equation,

where P_T is the total protein concentration (calculated as a dimer) and D_T is the total DNA concentration. The dissociation constant (K_D) is the reciprocal of K_a.
Comparative Gel Electrophoretic Analysis of the Global Structure of Protein-bound Four-way Junctions-Protein A fusions of wild-type and mutant endonuclease VII were incubated separately with each of the six forms of 5′-³²P-labeled junction containing two arms of 40 bp and two of 15 bp (see above) for 10 min at room temperature in 10 μl of binding buffer (50 mM Tris-HCl (pH 7.4), 100 mM NaCl, 1 mM dithiothreitol, and either 1 mM EDTA or 200 μM MgCl2). After addition of 2 μl of 35% (w/v) Ficoll solution, the different species were separated by electrophoresis in a 6% polyacrylamide gel (acrylamide/bisacrylamide, 20:1) in the presence of either TBE or TBM. Electrophoresis was performed at 110 V for 16 h.
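For the binding-constant measurement described under the retardation analysis, the sketch below shows how f_b can be related to K_a, P_T, and D_T and how K_a can be recovered by least squares. It assumes a simple 1:1 equilibrium between dimeric protein and junction, for which the bound fraction has a standard closed form; this is not necessarily the exact expression the authors fitted. The function names, the golden-section search, and all numerical values are hypothetical illustrations.

```python
import math


def fraction_bound(k_a, p_total, d_total):
    """Fraction of DNA bound for 1:1 binding (assumed model).

    Solves K_a = [PD] / ((P_T - [PD]) (D_T - [PD])) for [PD] and
    returns [PD] / D_T. Concentrations in nM, K_a in nM^-1.
    """
    b = 1.0 + k_a * p_total + k_a * d_total
    disc = b * b - 4.0 * k_a * k_a * p_total * d_total
    return (b - math.sqrt(disc)) / (2.0 * k_a * d_total)


def fit_k_a(p_totals, fractions, d_total, lo=-4.0, hi=1.0, iterations=80):
    """Least-squares estimate of K_a by golden-section search on log10(K_a)."""

    def sse(log_k):
        k_a = 10.0 ** log_k
        return sum(
            (fraction_bound(k_a, p, d_total) - f) ** 2
            for p, f in zip(p_totals, fractions)
        )

    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    for _ in range(iterations):
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if sse(c) < sse(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return 10.0 ** ((a + b) / 2.0)


if __name__ == "__main__":
    # Synthetic titration: hypothetical K_D of 30 nM and 0.1 nM junction.
    d_total = 0.1
    proteins = [1, 3, 10, 30, 100]
    data = [fraction_bound(1.0 / 30.0, p, d_total) for p in proteins]
    k_a = fit_k_a(proteins, data, d_total)
    print("fitted K_D (nM):", 1.0 / k_a)
```

Searching on a logarithmic scale keeps the fit well behaved across several orders of magnitude of K_a; a library optimizer could be substituted without changing the model.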
Determination of Zinc Bound to Protein-Colorimetric assays were performed as described by Giedroc et al. (38) as applied to T4 endonuclease VII (23). The proteins were dialyzed overnight in 20 mM Tris-HCl (pH 8), 600 mM NaCl, and 5% glycerol. 4-(2-Pyridylazo)resorcinol (PAR) was added to 800 μl of a 5 μM protein solution to a final concentration of 0.1 mM. The protein-resorcinol solution was titrated with p-hydroxymercuriphenylsulfonic acid (PMPS). The zinc released by PMPS was chelated by the resorcinol, and the titration was followed by measuring the absorbance of Zn(II)PAR2 at 500 nm (49). The concentration of protein-bound zinc was calculated from a titration of standard ZnCl2 solutions with resorcinol obtained under the same conditions. Absorption measurements were performed on a Cary 1E UV-visible spectrophotometer using 1-ml polystyrene cuvettes. All buffers were treated with the chelating resin Chelex 100 (Sigma). Protein concentrations required for zinc determination were measured by Bradford assay using a control sample whose concentration was obtained by amino acid analysis on an Applied Biosystems 420H analyzer.
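The conversion from a plateau absorbance to a zinc stoichiometry can be sketched as a linear standard curve followed by division by the protein concentration. Every number below is invented for illustration, and the assumption that the ZnCl2 standards respond linearly over the range used is mine; only the 5 μM protein concentration is taken from the text.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def zinc_per_protein(a500_plateau, slope, intercept, protein_um):
    """Moles of zinc per mole of protein from the plateau absorbance at 500 nm."""
    zinc_um = (a500_plateau - intercept) / slope
    return zinc_um / protein_um


if __name__ == "__main__":
    # Hypothetical ZnCl2 standards (micromolar) and their A500 readings.
    standards_um = [0.0, 2.0, 4.0, 6.0, 8.0]
    standards_a500 = [0.010, 0.142, 0.274, 0.406, 0.538]
    slope, intercept = fit_line(standards_um, standards_a500)
    # Hypothetical plateau absorbance for a 5 micromolar protein sample.
    print(zinc_per_protein(0.340, slope, intercept, protein_um=5.0))
```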

RESULTS
Construction and Expression of Endonuclease VII Mutants-We have constructed a series of point mutants within the His-acid cluster of T4 endonuclease VII by means of site-directed mutagenesis of the synthetic gene described previously (23). A section of the gene was replicated by means of the PCR that included one mutagenic primer. The amplified fragment was cleaved with restriction enzymes and ligated into the corresponding location in the synthetic gene. The sequence of the mutated gene was confirmed by DNA sequencing.
We have previously shown that N-terminal fusions of protein A, maltose binding protein, or oligohistidine affect neither the cleavage of nor the binding to four-way DNA junctions. Moreover, the increased molecular mass of such mutants actually confers an experimental advantage for electrophoretic retardation experiments. The mutant genes were therefore cloned as translational fusions with protein A in the plasmid pK19PRA (37) and transformed in E. coli JM101. Expression was under the control of the lac promoter and was induced by the addition of IPTG. Following ammonium sulfate precipitation, the protein A fusion polypeptides were purified by ion exchange chromatography. The endonuclease VII variants could be released from the protein A fusion by digestion with Factor Xa protease.

TABLE I Mutations introduced into the His-acid region of endonuclease VII and their properties
The wild-type sequence protein was studied as an N-terminal oligohistidine fusion, while all the mutant proteins were analyzed as N-terminal protein A fusions. Dissociation constants were calculated by measuring the extent of binding to DNA junctions of the proteins as a function of their concentration, fitting the data as described under "Materials and Methods." These data were all measured in the absence of added magnesium ions. For protein A-endonuclease VII H38T the error is the standard error obtained from three independent experiments; for the other proteins the errors are derived from the fit of individual data points. Zinc stoichiometries were measured using a colorimetric assay. The errors are the random error on the data points in the absorption plateau region, and the full experimental error is probably larger than this.
FIG. 2. Gel electrophoresis of endonuclease VII and derived mutant proteins. The purified proteins were separated by SDS-polyacrylamide gel electrophoresis. Track 1, a mixture of proteins to serve as size standards (molecular masses indicated at left, in kDa); track 2, wild-type sequence endonuclease VII as N-terminal oligohistidine fusion (calculated mass, 21 kDa); track 3, protein A-endonuclease VII H38T (calculated mass, 36 kDa); track 4, endonuclease VII H38T released from fusion with protein A by Factor Xa cleavage and purification by chromatography (calculated mass, 18 kDa); track 5, protein A-endonuclease VII D40A (calculated mass, 36 kDa); track 6, protein A-endonuclease VII H41T (calculated mass, 36 kDa); track 7, protein A-endonuclease VII D42A (calculated mass, 36 kDa); track 8, protein A-endonuclease VII H43T (calculated mass, 36 kDa).
Wild-type sequence endonuclease VII was also expressed as a fusion with an N-terminal oligohistidine sequence by transferring the gene into the plasmid pET-19b. The protein was purified by affinity chromatography on a column to which nickel ions were chelated.
The purity of the proteins was analyzed by polyacrylamide gel electrophoresis in a buffer containing SDS, and the preparations were generally found to contain a single polypeptide migrating at the position expected for the calculated mass (Fig. 2).
The single-amino acid changes introduced into endonuclease VII are summarized in Table I. In general we have altered histidine residues to threonine, glutamine, or serine, and aspartate residues to asparagine or alanine.
Mutants with Altered Sequences in the His-Acid Cluster Have Normal Zinc Content-The His-acid cluster is centrally located within the zinc-binding region of endonuclease VII. While previous studies have strongly implicated the four cysteine residues in the coordination of the zinc ion (23), we could not exclude some role for other amino acids, particularly the histidine residues. We therefore measured the zinc content of mutants representative of each position (as N-terminal protein A fusions) using a colorimetric assay (38). The results are summarized in Table I, where it can be seen that each of the mutant proteins analyzed contains 1 mol of zinc/mol of protein within the probable experimental error. Thus, the zinc content of the protein has not been altered by mutation of any of these residues from the wild-type sequence, strongly indicating the lack of a role in zinc coordination for these amino acids.
… Activity-… standard conditions, using a four-way DNA junction with the central sequence of junction 3 of Duckett et al. (39) (Fig. 3), and the results are summarized in Table I. It is clear that aspartate 40 and histidine 41 are essential to activity, because all mutations of these residues result in total loss of detectable activity (note that we have used a 17-fold higher concentration of the inactive mutant proteins (1 μM), compared with the active ones (60 nM)). Histidine 43 is also important, because alteration to threonine leads to an almost total loss of activity. By contrast, aspartate 42 can be replaced by asparagine or alanine without detectable loss of activity. Histidine 38 can be changed to threonine with retention of activity, but H38T has an interesting thermal sensitivity that we shall discuss in a further publication. We have previously reported that endonuclease VII H41T was active (23), but we now believe that this was an error. All subsequent preparations have been totally inactive, and comparison of properties suggests that the mutant proteins H41T and H38T might have been temporarily exchanged at that time. All mutants have now been reconfirmed by sequencing the genes used for their expression.
FIG. 3. … Four-way DNA junction 3 was prepared radioactively 5′-³²P-labeled individually in the b, h, r, or x arm. These species (112 nM) were each incubated with endonuclease VII or mutant protein on ice, and the products of the reaction analyzed by electrophoresis on a sequencing gel and by autoradiography. Endonuclease VII cleaves junction 3 primarily at single phosphodiester bonds on the b and r strands, shown by the arrows drawn on the schematic of the junction. The wild-type sequence enzyme was generated as an N-terminal oligohistidine fusion, while all the mutant proteins have been studied as N-terminal protein A fusions. A, histidine mutants. Tracks 1-4, incubation with 180 nM oligohistidine endonuclease VII; tracks 5-8, incubation with 60 nM protein A-endonuclease VII H38T; tracks 9-12, incubation with 60 nM protein A-endonuclease VII H38Q; tracks 13-16, incubation with 60 nM protein A-endonuclease VII H38S; tracks 17-20, incubation with 1 μM protein A-endonuclease VII H41T; tracks 21-24, incubation with 1 μM protein A-endonuclease VII H43T. Tracks 1, 5, 9, 13, 17, and 21, junction 3 5′-³²P-labeled on the b strand; tracks 2, 6, 10, 14, 18, and 22, junction 3 5′-³²P-labeled on the h strand; tracks 3, 7, 11, 15, 19, and 23, junction 3 5′-³²P-labeled on the r strand; tracks 4, 8, 12, 16, 20, and 24, junction 3 5′-³²P-labeled on the x strand. B, aspartate mutants. Tracks 1-4, incubation with 180 nM oligohistidine endonuclease VII; tracks 5-8, incubation with 1 μM protein A-endonuclease VII D40N; tracks 9-12, incubation with 1 μM protein A-endonuclease VII D40A; tracks 13-16, incubation with 60 nM protein A-endonuclease VII D42N; tracks 17-20, incubation with 60 nM protein A-endonuclease VII D42A. Tracks 1, 5, 9, 13, and 17, junction 3 5′-³²P-labeled on the b strand; tracks 2, 6, 10, 14, and 18, junction 3 5′-³²P-labeled on the h strand; tracks 3, 7, 11, 15, and 19, junction 3 5′-³²P-labeled on the r strand; tracks 4, 8, 12, 16, and 20, junction 3 5′-³²P-labeled on the x strand.
Binding to Four-way DNA Junctions-Endonuclease VII of wild-type sequence binds selectively to four-way DNA junctions. This is also true of a non-catalytic mutant protein endonuclease VII E86A; this protein (as either N-terminal fusions or non-fusion) binds to four-way junctions in the presence or absence of magnesium ions and is not displaced by a 1000-fold excess of duplex competitor of the same sequence (27). We examined the binding of the proteins that were mutated in the His-acid cluster to radioactively labeled four-way DNA junction. All were found to bind DNA junctions. Binding titrations were carried out using gel electrophoretic retardation in the presence of 1 mM EDTA (Fig. 4). The titrations are well behaved for all the proteins, giving increasing fractions of a single retarded species as the protein concentration is raised. At protein concentrations higher than 100 nM some super-retarded species could be found in some cases, and thus such data were not used in the calculation of binding affinities.
The ratios of bound and free junction were quantified by phosphorimaging, from which apparent dissociation constants (K_D) were calculated assuming binding of a dimeric species (Fig. 5). Most of the proteins bound with affinities that were close to that of the wild-type sequence (K_D in the range 20-40 nM). Protein A-endonuclease VII H38S and H43T had higher affinity (K_D ≈ 5 nM), while H41T had lower affinity (K_D = 96 nM). These results indicate that the loss in activity of the proteins with sequence alterations at Asp-40, His-41, and His-43 is not due to impairment in substrate binding, since the mutant proteins D40N, D40A, H38T, and H38Q bind normally, and H43T has a 3-fold higher affinity than the enzyme of wild-type sequence.
FIG. 5. Binding isotherms for endonuclease VII and derived mutant proteins binding to a four-way DNA junction. Extent of protein binding to four-way junctions as a function of total protein concentration was estimated by gel electrophoresis (see the legend to Fig. 4). The fraction of DNA junction bound to protein was calculated for each protein concentration and plotted against the protein molarity (calculated for a dimeric species) on a logarithmic scale. The data were fitted to a model for the binding process (see under "Materials and Methods") from which the binding affinities were calculated. The points plotted are experimental data, and the lines are simulations derived using the association constants derived from the fits. A, binding of protein A-endonuclease VII H38T. Three independently measured sets of data are plotted, differentiated by the use of three different plotting symbols. The line was calculated for a K_a = 3.…
The binding affinities of inactive mutant proteins could also be measured in the presence of magnesium ions. We found that protein A-endonuclease VII H41T bound around 2-fold more tightly in the presence of 200 μM magnesium ions. The binding affinity of protein A-endonuclease VII D40N was increased by a factor of 1.3 under the same conditions.
Distortion of the Global Structure of Junctions on Binding Endonuclease VII Variants-On binding to junctions, endonuclease VII induces a change in the global configuration of arms, demonstrated by comparative gel electrophoresis studies (27). In this method a four-way junction with arms of 40 bp each in length is subjected to shortening of two arms by restriction cleavage in the six possible combinations (39-41). The electrophoretic mobility in polyacrylamide of these six two-long, two-short arm species are compared and analyzed on the basis of the expected relationship (42) between electrophoretic mobility and the angle included between the two long arms. We used this method originally to analyze the structure of the free junction under different conditions (39), but it has more recently been applied to junction-protein complexes (22,27,43).
Both endonuclease VII H38T and E86A induce a change in the global folding of DNA four-way junctions (27). The same structure is generated by the inactive mutant endonuclease VII E86A in either the presence or absence of added magnesium ions and is different from that of the free DNA junction under either set of conditions. It is clear that the four-way junction is extensively manipulated by endonuclease VII, and we asked whether the mutant proteins retained the ability to induce the same structural alteration.
FIG. 6. Comparative gel electrophoretic analysis of the global conformation of four-way DNA junction bound to endonuclease VII variants. Radioactively 5'-32P-labeled junction 3 was assembled from four oligonucleotides of appropriate lengths to generate the six species with two long and two short arms that were purified by gel electrophoresis. The four arms of the junction are labeled B, H, R, and X, and the species with shortened arms are labeled by the names of the two long arms. These six species were each incubated with protein A-endonuclease VII variants and analyzed by electrophoresis in a 6% polyacrylamide gel and by autoradiography.

A, complexes of junction 3 with protein A-endonuclease VII H38T, H41T, and H43T in the absence of added metal ions. The six two-long, two-short arm species generated from junction 3 were electrophoresed in the presence of 1 mM EDTA following incubation with protein. Free junction is not shown in this autoradiograph; the bands arise from the protein-bound junction only. Under these conditions, free DNA junction would generate a slow-fast-slow-slow-fast-slow pattern indicative of the extended square structure with no coaxial stacking of arms (39). Each of the proteins generates the pattern demonstrated previously for protein A-endonuclease VII E86A and H38T (27). This pattern is different from the free DNA and indicates a protein-induced folding of the four-way junction. The resulting global structure is clearly the same for all three histidine mutants. The following double restriction digests were electrophoresed: tracks 1, 7, and 13, species BH, with long B and H arms; tracks 2, 8, and 14, species BR, with long B and R arms; tracks 3, 9, and 15, species BX, with long B and X arms; tracks 4, 10, and 16, species HR, with long H and R arms; tracks 5, 11, and 17, species HX, with long H and X arms; tracks 6, 12, and 18, species RX, with long R and X arms.

B, complex of junction 3 with protein A-endonuclease VII H41T in the presence of added magnesium ions. Since this mutant is catalytically inactive it can be studied in the presence of magnesium ions. The six two-long, two-short arm species were incubated with protein A-endonuclease VII H41T in the presence of 200 µM magnesium ions. The products were analyzed by electrophoresis in 6% polyacrylamide containing 100 µM magnesium ions and by autoradiography. For each of the long-short arm species, two radioactive bands are evident, corresponding to free DNA and complex. The free DNA exhibits the slow-intermediate-fast-fast-intermediate-slow pattern indicative of the stacked X structure with B on X coaxial stacking indicated at the right (39). The structure of this stacking isomer gives rise to the three pairs of long-short arm species in which the included angles between the long arms are acute (BH and RX are slow species), obtuse (BR and HX are intermediate species), or linear (BX and HR are fast species). This interpretation is summarized below the schematic of the stacked X structure on the right. The pattern of mobilities for the long-short species of the junction-protein complex is clearly different from that of the free DNA and is the same as that found in the presence of EDTA (compare with A). The electrophoretic pattern (and thus the global structure of the DNA junction in the presence of this mutant) is unchanged by the presence or absence of magnesium ions, just as was found previously for the complex with protein A-endonuclease VII E86A. Track 1, BH species; track 2, BR species; track 3, BX species; track 4, HR species; track 5, HX species; track 6, RX species.

C, complexes of junction 3 with protein A-endonuclease VII D40N and D42N in the absence of added metal ions. Free junction is not shown in this autoradiograph. Once again, both proteins generate the same pattern of electrophoretic mobilities found for the other proteins. Thus, all the variants of endonuclease VII appear to impose the same global structure on the four-way DNA junction. Tracks 1 and 7, BH species; tracks 2 and 8, BR species; tracks 3 and 9, BX species; tracks 4 and 10, HR species; tracks 5 and 11, HX species; tracks 6 and 12, RX species.

The binding of protein A-endonuclease VII to junction 3 generates a pattern of electrophoretic mobilities that has been described previously (27). The relative mobility of the BR species (second from left, as our gels are conventionally loaded) is slower when the protein binds as a protein A fusion. The pattern is interpreted in terms of a principal binding across arms H and X (in which the cleavages are introduced by the active enzyme in the presence of magnesium ions) and a rotation of arms B and R toward arms H and X, respectively, together with a movement out of the plane (27). Fig. 6A compares the electrophoretic patterns of the complexes of junction 3 with protein A fusions of the three histidine-to-threonine mutants of endonuclease VII, in the presence of 1 mM EDTA (TBE buffer) to prevent cleavage by active enzyme. The patterns of mobilities of the six long-short species are identical for all three proteins, indicating that all three induce the same global conformation of arms on the four-way DNA junction. The experiment was repeated for protein A-endonuclease VII H41T in the presence of 200 µM magnesium ions (TBM buffer), conditions where the free DNA junction folds into the stacked X structure. Since this mutant is completely inactive as a nuclease, the experiment can be carried out in the presence of magnesium ions without inducing cleavage of the DNA. Despite the change of conditions, the complex clearly has the same global structure, resulting in an unchanged pattern of electrophoretic mobilities (Fig. 6B).
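The logic of the long-short arm analysis for the free junction can be illustrated with a small lookup. The following is a minimal sketch, not part of the original analysis: it assumes the qualitative rule that a species migrates more slowly the smaller the angle between its two long arms, and it assumes that in the extended square the arms are arranged in the cyclic order B, H, R, X. The labels `perpendicular`, `EXTENDED_SQUARE`, `STACKED_X_B_ON_X`, and the function names are hypothetical and introduced only for illustration. The pattern of the protein-bound junction is deliberately not encoded, because it is described in (27) rather than here.

```python
# Order in which the six long-short species are loaded on the gel.
SPECIES = ["BH", "BR", "BX", "HR", "HX", "RX"]

# Assumed qualitative rule: the smaller the angle between the two long
# arms, the more slowly the species migrates.
MOBILITY = {
    "acute": "slow",
    "perpendicular": "slow",
    "obtuse": "intermediate",
    "linear": "fast",
}

# Extended square (no coaxial stacking), assuming cyclic arm order B, H, R, X:
# neighbouring arms are at right angles, opposite arms are collinear.
EXTENDED_SQUARE = {
    "BH": "perpendicular", "BR": "linear", "BX": "perpendicular",
    "HR": "perpendicular", "HX": "linear", "RX": "perpendicular",
}

# Stacked X structure with B on X coaxial stacking, as given in the
# legend to Fig. 6B.
STACKED_X_B_ON_X = {
    "BH": "acute", "BR": "obtuse", "BX": "linear",
    "HR": "linear", "HX": "obtuse", "RX": "acute",
}

KNOWN_STRUCTURES = {
    "extended square": EXTENDED_SQUARE,
    "stacked X (B on X)": STACKED_X_B_ON_X,
}


def mobility_pattern(geometry):
    """Return the expected mobility of each species, in gel loading order."""
    return [MOBILITY[geometry[species]] for species in SPECIES]


def match_structure(observed):
    """Return the name of the known structure matching an observed pattern,
    or None if the pattern matches neither free-junction structure."""
    for name, geometry in KNOWN_STRUCTURES.items():
        if mobility_pattern(geometry) == list(observed):
            return name
    return None


if __name__ == "__main__":
    for name, geometry in KNOWN_STRUCTURES.items():
        print(name, "-".join(mobility_pattern(geometry)))
```

A pattern that matches neither entry, as is the case for the protein-bound junction, returns `None`, which corresponds to the conclusion that the protein imposes a global structure different from either form of the free junction.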
The aspartate mutants also generated the same structure in the four-way junction. Fig. 6C shows the comparative gel electrophoretic analysis of the complexes of the six long-short variants of junction 3 with endonuclease VII D40N and D42N in the presence of 1 mM EDTA (TBE buffer). Once again the pattern is unchanged from that of all the other endonuclease VII variants.
These results suggest that the binding process is very similar for all of the mutants of the His-acid cluster studied. This is further evidence that the lack of catalytic activity in some mutant proteins was not due to impairment of binding.

DISCUSSION

The mutation analysis confirms that amino acids contained within the conserved His-acid cluster are important in the function of T4 endonuclease VII. In particular, one aspartate (Asp-40) and two histidine residues (His-41 and His-43) are required for cleavage of DNA junctions. Since mutation of these residues leads to only small changes in binding affinity for DNA junctions, it is likely that they are involved (directly or indirectly) in the catalysis of phosphodiester bond hydrolysis. The zinc content of the mutants was in no case significantly altered from 1 mol/mol of protein, and thus a role in the coordination of zinc is not likely. We have previously shown (27) that another acidic residue, glutamate 86, is essential for catalytic activity. This is located in the second region of the primary sequence of the protein, which exhibits some similarity to endonuclease I of T7. It is therefore possible that the active site of the enzyme comprises amino acid side chains from both of these regions of the polypeptide. We suspect that acidic side chains may be involved in the coordination of a magnesium ion required for the cleavage reaction. This is a common feature of nucleases (33-35), including the E. coli junction-resolving enzyme RuvC (36), where the catalytic center contains three aspartate residues and one glutamate residue.
All the endonuclease VII sequence variants examined retained selective binding to DNA junctions. This is a further indication of the divisibility of binding and catalysis in this enzyme. Small changes in binding affinities were measured over the set of mutant proteins, but the difference was only about 20-fold between the tightest-binding (endonuclease VII H38S) and weakest-binding (endonuclease VII H41T) proteins, corresponding to an overall difference in binding free energy of 1.7 kcal mol⁻¹. In most cases the differences are much smaller than this. This suggests that sequence changes in the His-acid cluster tend to affect interactions primarily with the transition state rather than the ground state DNA structure. It is interesting to note that the mycobacteriophage L5 gp59 protein has a serine at the position corresponding to histidine 38 in endonuclease VII. This protein therefore has the same sequence at this position as the tightest binding endonuclease VII variant.
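The correspondence between a 20-fold difference in affinity and a free energy difference of about 1.7 kcal/mol follows from the standard relation ΔΔG = RT ln(ratio). The sketch below is illustrative only; the temperature is an assumption, since the binding temperature is not stated in this section, and the function name `ddg_from_affinity_ratio` is hypothetical.

```python
import math

# Gas constant in kcal mol^-1 K^-1.
R_KCAL = 1.987e-3


def ddg_from_affinity_ratio(ratio, temperature_k=298.15):
    """Difference in binding free energy (kcal/mol) implied by a ratio of
    binding affinities, using ddG = R * T * ln(ratio).

    temperature_k is an assumed value; the source does not state it here.
    """
    return R_KCAL * temperature_k * math.log(ratio)


if __name__ == "__main__":
    # A 20-fold difference evaluated at two plausible temperatures.
    for temperature_k in (277.15, 298.15):
        ddg = ddg_from_affinity_ratio(20.0, temperature_k)
        print(f"T = {temperature_k:.2f} K: ddG = {ddg:.2f} kcal/mol")
```

For temperatures between 4 and 25 °C the result stays in the range of roughly 1.6 to 1.8 kcal/mol, consistent with the value quoted in the text.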
An interesting feature of the binding of endonuclease VII to four-way junctions is the change in the global structure of the DNA (27). Distortion of DNA structure upon binding of junction-selective proteins appears to be rather general, having also been observed for T7 endonuclease I (22), E. coli RuvA (43) and RuvC (26), and yeast CCE1.² The distortion imposed on the global configuration of helical arms in the junction by endonuclease VII is quite significant, and we have proposed a model of the structure of the bound junction involving an unstacking of the arms at the point of strand exchange (27). All of the sequence variants studied here appear to induce the same change in junction structure, whether active or inactive, despite small differences in binding affinity. This further supports the contention that the basic binding processes are unaltered by any of the sequence changes in the His-acid cluster. This is also indicated by the fact that mutations within the region result in proteins that either cleave at exactly the same positions as the wild-type enzyme or fail to cleave at all. None causes an alteration in the cleavage pattern that might be expected if the manner of substrate binding had been changed.
In summary, we have found that a number of sequence changes in the His-acid cluster of endonuclease VII lead to reduced activity of the enzyme. In particular, aspartate 40 and histidines 41 and 43 appear to be required for cleavage of DNA junctions. However, none of these mutations appears to affect binding to DNA junctions (beyond relatively small changes in affinity) or the distortion of DNA structure. It is therefore quite likely that the His-acid cluster will prove to be important in generating the active site of T4 endonuclease VII.