Recognition of Flanking DNA Sequences by EcoRV Endonuclease Involves Alternative Patterns of Water-mediated Contacts

The 2.1-Å cocrystal structure of EcoRV endonuclease bound to 5′-CGGGATATCCC, in a crystal lattice isomorphous with that of the previously determined cocrystal with the undecamer 5′-AAAGATATCTT, shows novel base recognition in the major groove of the DNA flanking the GATATC target site. Lys104 of the enzyme interacts through water molecules with the exocyclic N-4 amino groups of flanking cytosines. Steric exclusion of water molecule-binding sites by the 5-methyl group of thymine drives the adoption of alternative water-mediated contacts with AT versus GC flanks. This structure provides a rare example of structural adaptability in the recognition of different DNA sequences by a protein and suggests preferred strategies for the expansion of target site specificity by EcoRV.

Restriction endonucleases function in all prokaryotes as components of defensive restriction-modification systems and are superb models for the study of protein-DNA interactions owing to their exceptionally high sequence specificities. The type II restriction-modification systems are the best studied from a structure-function perspective and are composed of a homodimeric endonuclease and a monomeric methylase (1, 2). The crystal structures of the following six restriction enzymes have been determined: EcoRI (3), EcoRV (4-6), BamHI (7, 8), PvuII (9), Cfr10I (10), and FokI (11). With the exception of Cfr10I and EcoRI, the enzymes have been solved both in the absence and in the presence of DNA. The structures reveal extensive complementarity at the protein-DNA interfaces, which appears to explain the high specificities of up to 10⁶-fold in cleavage rate constants for the specific sites (Ref. 12 and references therein). An additional, less appreciated contribution to specificity may also arise from DNA-induced conformational changes in the enzymes.
EcoRV has recently emerged as the best studied of the restriction endonucleases from both structural and mechanistic standpoints. This enzyme cleaves the sequence 5′-GATATC-3′ at the center TA step in a blunt-ended fashion, generating 5′-phosphate groups (13). The cocrystal structure of the EcoRV-DNA complex reveals a tight network of hydrogen bonding, electrostatic, and van der Waals contacts at the protein-DNA interface over the entire hexameric DNA site (4). Specificity at the outer two base pairs of each half-site is determined by hydrogen bonding with discriminating base functional groups in the major groove. At the center step the DNA is bent sharply by 50° into the major groove, so that the protein cannot penetrate to contact the hydrogen bonding moieties. Indirect readout is thus implicated in specificity at this position. This may originate in part from differences in the energetic cost of partially unstacking the center TA step relative to CG or GC steps (14).
Whereas the specificity of EcoRV in vivo is limited to the hexamer target site GATATC, the crystal structures show that the enzyme also contacts 2-3 base pairs of DNA on either side of this site (4). In five cocrystal structures with 5′-AAAGATATCTT at 2.0-2.1-Å resolution (5, 6, 14), and in the structure with the decamer 5′-GGGATATCCC at 3.0 Å, these contacts with the flanking DNA are primarily with the sugar-phosphate backbone (4, 5). Several studies have shown that the enzyme is sensitive to perturbation of these contacts. Replacement of phosphate groups with phosphorothioates showed that Sp and Rp substitutions directly 3′ to GATATC reduce Vmax/Km toward a dodecamer substrate by 4- and 50-fold, respectively (15). Furthermore, mutation of four amino acids contacting flanking phosphate groups reduces kcat/Km toward a 20-mer substrate containing GATATC (16). The most striking effect occurs with the mutant R226A within the C-terminal subdomain, which lowers activity by nearly 10³-fold relative to the wild-type enzyme.
These studies show that the distal enzyme-DNA contacts outside the target site significantly increase the catalytic rate, perhaps by helping to orchestrate the mutual conformational changes in the enzyme and DNA that occur en route to the transition state. The biological role of EcoRV (and other type II enzymes) requires that all flanking sequences permit target site cleavage with high catalytic efficiency. However, this does not preclude some sequence preference. Indeed, 10- and 500-fold variations in binding constants with different flanks have been found in in vitro studies of EcoRV and EcoRI, respectively (17, 18). Additionally, in the cleavage of the "star" site GTTATC, EcoRV prefers a 5′-G and a 3′-C on either side of the target (19, 20). These flanking sequence selectivities are modest compared with the 10⁶-fold and greater specificities against base substitutions within the GATATC site, but they nonetheless suggest that target site expansion may be feasible. This prospect is of considerable practical importance for the development of new restriction enzymes specific for 8-10-base recognition sites.
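To put the fold effects quoted above on a common energetic scale, each can be converted to a free-energy difference with ΔΔG = RT ln(fold). The sketch below does this arithmetic; the 25 °C temperature is an assumption (assay temperatures are not given here), and the 10⁶ figure refers to cleavage rate constants, so it corresponds to a transition-state rather than a binding free energy.

```python
import math

# Gas constant in kcal mol^-1 K^-1.
R_KCAL = 1.987204e-3


def fold_to_ddg(fold: float, temp_k: float = 298.15) -> float:
    """Free-energy difference (kcal/mol) equivalent to a fold change in an
    equilibrium or rate constant: ddG = RT ln(fold). temp_k defaults to an
    assumed 25 degrees C."""
    if fold <= 0:
        raise ValueError("fold change must be positive")
    return R_KCAL * temp_k * math.log(fold)


if __name__ == "__main__":
    # Fold values taken from the text above.
    for label, fold in [
        ("EcoRV, flanking effect on binding", 10),
        ("EcoRI, flanking effect on binding", 500),
        ("specificity in cleavage rate constants", 1e6),
    ]:
        print(f"{label}: {fold:g}-fold = {fold_to_ddg(fold):.1f} kcal/mol")
```

On this scale the flanking preferences amount to roughly 1.4 and 3.7 kcal/mol, against about 8.2 kcal/mol for target-site specificity, which is one way of seeing why the flanking effects are described as modest.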
Attempts to alter the substrate specificity of the EcoRI and EcoRV restriction endonucleases have not been successful thus far (21-24), indicating a need for further basic studies. Although EcoRV is the best studied enzyme of the class, its structure bound to DNA with GC flanks has been determined at only 3.0-Å resolution (4). This is insufficient for describing detailed protein-DNA interactions, particularly those involving solvent molecules. Moreover, large differences in DNA and protein conformation, most probably arising from altered crystal packing contacts, have been noted between structures of the enzyme bound to specific DNA sites with AT versus GC flanks (4-6, 14). Therefore, to better characterize the structural aspects of the flanking interactions, we have determined a new cocrystal structure of EcoRV bound to 5′-CGGGATATCCC at 2.1-Å resolution. We find that Lys104 makes novel water-mediated interactions with hydrogen bonding functional groups of the flanking GC bases. By contrast, Lys104 does not interact with bases of the flanking DNA in any of the five 2.0-2.1-Å resolution complexes with 5′-AAAGATATCTT (5, 6, 14). These data provide insight into new modes of flanking sequence recognition by EcoRV and suggest preferred routes for engineering sequence specificity for larger DNA sites into the enzyme.

Purification and Cocrystallization of EcoRV and DNA-Wild-type EcoRV was prepared as described and stored as an ammonium sulfate pellet at 4°C (6). The self-complementary oligonucleotide 5′-CGGGATATCCC was synthesized for cocrystallization by standard methods and purified on a Rainin PureDNA high pressure liquid chromatography column developed in a gradient of triethylammonium acetate/acetonitrile. Detritylation was performed on the column (25). The DNA was lyophilized and stored at −20°C until use, when it was brought to a concentration of 10 mg/ml in 50 mM Tris (pH 7.5), 1 mM EDTA. Cocrystals of EcoRV complexed with 5′-CGGGATATCCC were grown by vapor diffusion from solutions containing 0.17 mM EcoRV and 0.34 mM DNA in the presence of 15% PEG 4K, 100 mM imidazole (pH 6.5), and 150 mM NaCl (final conditions). The protein was prepared by resuspending the ammonium sulfate slurry at 30 mg/ml in a buffer containing 10 mM HEPES (pH 7.5), 250 mM NaCl, 1 mM EDTA, and 0.1 mM dithiothreitol and then dialyzing exhaustively against this buffer. Crystals were mounted for data collection directly from the drop.
X-ray Structure Determination-X-ray diffraction amplitudes were measured on an R-AXIS IIC area detector mounted on a Rigaku RU-200 rotating anode generator. Data were collected at ambient temperature from two crystals. Determination of the orientation matrix, integration, scaling, and merging of data were performed with DENZO and the HKL suite of programs (26). Local scaling of the structure-factor amplitudes against those of the isomorphous cocrystal with 5′-AAAGATATCTT (6) was performed using the program MAXSCALE. Refinement against the locally scaled data yielded a structure with an Rfree (Table I) 2% lower than that obtained by refinement against the unscaled data.
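The idea behind scaling one data set against an isomorphous reference can be illustrated with a simplified sketch. This is not the MAXSCALE algorithm: true local scaling uses neighborhoods of reflections in reciprocal space, whereas here, as an assumption for brevity, resolution shells of roughly equal size stand in for those neighborhoods, and a single least-squares factor is fit per shell.

```python
def shell_scale(target, reference, n_shells=4):
    """Least-squares scale of target amplitudes onto a reference data set,
    computed separately in resolution shells of roughly equal size.

    target, reference: dict mapping (h, k, l) -> (d_spacing_in_A, amplitude).
    Returns {hkl: (d_spacing, scaled_amplitude)} for reflections common to both.
    """
    common = sorted(set(target) & set(reference), key=lambda hkl: target[hkl][0])
    if not common:
        return {}
    shell_size = -(-len(common) // n_shells)  # ceiling division
    scaled = {}
    for start in range(0, len(common), shell_size):
        shell = common[start:start + shell_size]
        # k minimises sum (F_ref - k * F)^2, giving k = sum(F_ref * F) / sum(F^2)
        num = sum(reference[h][1] * target[h][1] for h in shell)
        den = sum(target[h][1] ** 2 for h in shell)
        k = num / den if den > 0 else 1.0
        for h in shell:
            d, f = target[h]
            scaled[h] = (d, k * f)
    return scaled
```

Fitting a separate factor in each shell absorbs smooth, resolution-dependent systematic differences between the two data sets while leaving the genuine structure-factor differences (here, those caused by the changed flanking bases) largely intact.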
The crystal structure of EcoRV complexed to 5′-CGGGATATCCC was phased directly using the isomorphous cocrystal with 5′-AAAGATATCTT (5) as the starting model. The flanking DNA sequences and 57 water molecules at the enzyme-DNA interface were removed from the model. The starting crystallographic R-factor was 31.1%, and this was reduced to 19.6% by several rounds of model building iterated with positional, B-factor, and simulated annealing refinement in XPLOR (27). Very tight stereochemical constraints were maintained throughout the refinement. Criteria for the inclusion of water molecules in both structures were the appearance of peaks at 1.0σ in (2Fo − Fc) maps and at 3.0σ in (Fo − Fc) maps, and at least one hydrogen bonding interaction with protein or DNA. All water molecules and side chains with B-factors above 50.0 Å² were carefully examined prior to inclusion in the final model.
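The water-inclusion criteria above amount to a simple decision rule. The sketch below encodes them for a single candidate peak. The 3.5-Å hydrogen-bond distance cutoff is an assumption (the text does not state one), and the `WaterPeak` record and its field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

HBOND_MAX_A = 3.5  # assumed N/O...O(water) hydrogen-bond cutoff; not given in the text


@dataclass
class WaterPeak:
    """Hypothetical record for one candidate water site."""
    name: str
    sigma_2fo_fc: float   # peak height in the (2Fo - Fc) map, in sigma units
    sigma_fo_fc: float    # peak height in the (Fo - Fc) map, in sigma units
    polar_contacts: List[float] = field(default_factory=list)  # distances (A) to protein/DNA N or O
    b_factor: float = 0.0  # refined isotropic B (A^2)


def classify_water(peak: WaterPeak) -> str:
    """Apply the stated inclusion criteria to one candidate water."""
    n_hbonds = sum(1 for d in peak.polar_contacts if d <= HBOND_MAX_A)
    if peak.sigma_2fo_fc < 1.0 or peak.sigma_fo_fc < 3.0 or n_hbonds < 1:
        return "reject"
    # Waters with high B-factors are kept only after manual examination.
    return "inspect" if peak.b_factor > 50.0 else "accept"
```

For example, a peak at 1.4σ and 4.2σ with one 2.8-Å polar contact and B = 30 Å² is accepted, while the same peak with B = 55 Å² is flagged for inspection.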
Model building used the programs CHAIN (28) and LORE (29). Parameters for the DNA used in XPLOR refinement were those described recently (30). Assignment of Cα atoms to the dimerization or DNA-binding domains was by difference-distance calculations, as described (6). Least squares superpositions were performed with Insight II (31) and GEM (32).
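A difference-distance calculation compares the internal Cα-Cα distance matrices of two structures, so no prior superposition is needed. Residues whose mutual distances are preserved move as a rigid unit. The sketch below shows the calculation with a deliberately simple one-seed grouping rule; the 1.0-Å tolerance and the `rigid_group` helper are illustrative assumptions, not the procedure of Ref. 6.

```python
import math


def distance_matrix(coords):
    """All pairwise distances (A) between C-alpha positions."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def difference_distance(coords_a, coords_b):
    """|d_ij(A) - d_ij(B)| for equivalent C-alpha atoms of two structures.
    Internal distances are invariant to rigid-body motion, so the two
    structures need not be superimposed first."""
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must have the same number of C-alpha atoms")
    da, db = distance_matrix(coords_a), distance_matrix(coords_b)
    n = len(coords_a)
    return [[abs(da[i][j] - db[i][j]) for j in range(n)] for i in range(n)]


def rigid_group(coords_a, coords_b, seed, tol=1.0):
    """Hypothetical helper: indices of residues whose distance to the seed
    residue changes by less than tol A between the two structures."""
    dd = difference_distance(coords_a, coords_b)
    return [j for j in range(len(coords_a)) if dd[seed][j] < tol]
```

In a toy case where a two-residue "domain" is translated by 5 Å relative to a three-residue one, seeding from either domain recovers exactly that domain's residues.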

RESULTS
EcoRV was cocrystallized with the undecamer duplex 5′-CGGGATATCCC possessing a 5′-C overhang (dGC) in the absence of divalent metals. The crystals are in the well described P1 lattice (5, 6), with cell dimensions isomorphous to those of the previous complexes (Table I). Nearly all of the molecular packing interactions in this crystal are identical with those of the cocrystal with 5′-AAAGATATCTT (dAT). Only the base stacking of the unpaired 5′ nucleotide (C versus A) on the dimerization domain of an adjoining molecule is very slightly altered, because of the differing sizes of the pyrimidine and purine rings. Because the asymmetric unit contains an enzyme dimer and a DNA duplex, this packing interaction is not made at the opposite end of the molecule. Conserved features observed on both flanks occur in the context of different nearby lattice contacts and are consequently likely to reflect true aspects of the flanking interactions in solution. Thus, comparison of the cocrystal structures of EcoRV complexed with dAT versus dGC provides an excellent opportunity to elucidate detailed differences in the interactions of EcoRV with AT versus GC flanks.
The overall structure of this EcoRV-DNA complex is very similar to that of EcoRV cocrystallized with dAT (Fig. 1; Ref. 5), with no significant structural differences detectable within the GATATC target site. The quaternary structure is also identical with that of the dAT cocrystal lacking divalent metal ions. The relative orientations of the DNA-binding/catalytic domains in the two structures differ by only a 0.4-0.6° rotation and a 0.22-Å difference in the center-of-mass separation of the two subunits. These differences are within the level of coordinate error, estimated from Luzzati plots at roughly 0.2-0.25 Å for each structure (data not shown). Small but perhaps significant intersubunit rotations of 1.0-1.5° are present when either cocrystal lacking divalent metals is compared with isomorphous crystals of metal-containing ternary complexes (data not shown; Refs. 5 and 6).
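The two quantities used above to compare quaternary structures can be computed as follows. This is a sketch under stated assumptions: a 3 × 3 rotation matrix is assumed to be available already (for example, from superimposing subunit II after the two dimers have been aligned on subunit I), and unweighted centroids stand in for true centers of mass.

```python
import math


def rotation_angle_deg(rot):
    """Rotation angle (degrees) of a proper 3x3 rotation matrix,
    from trace(R) = 1 + 2 cos(theta)."""
    trace = rot[0][0] + rot[1][1] + rot[2][2]
    c = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp rounding noise
    return math.degrees(math.acos(c))


def centroid(coords):
    """Unweighted mean position of a list of (x, y, z) points."""
    n = len(coords)
    return tuple(sum(p[i] for p in coords) / n for i in range(3))


def separation_change(sub1_a, sub2_a, sub1_b, sub2_b):
    """Change (A) in the distance between subunit centroids, structure A vs B."""
    d_a = math.dist(centroid(sub1_a), centroid(sub2_a))
    d_b = math.dist(centroid(sub1_b), centroid(sub2_b))
    return abs(d_a - d_b)


def rot_z(deg):
    """Rotation about z, used here only to check rotation_angle_deg."""
    t = math.radians(deg)
    return [[math.cos(t), -math.sin(t), 0.0],
            [math.sin(t), math.cos(t), 0.0],
            [0.0, 0.0, 1.0]]
```

The trace formula gives the angle of the single axis-angle rotation that relates the two orientations, which is the natural scalar to compare with the 0.4-0.6° and 1.0-1.5° figures quoted above.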
Nearly all of the previously observed major and minor groove enzyme-DNA contacts, with both base functional groups and sugar-phosphate moieties, are also present in this complex. The only significant exception is the conformation of the Arg221 side chain in both subunits. In the dAT structure the guanidinium group makes a direct electrostatic interaction with a DNA phosphate at GpATATC in subunit I and a water-mediated interaction with the identical phosphate on the opposite half-site in subunit II (5). In the dGC structure the direct contact is absent, and the side chain instead adopts a novel, well ordered conformation in which it bridges through two ordered water molecules to the guanidinium group of Arg115 (Fig. 2). The water-mediated phosphate interaction in subunit II is preserved in dGC, but the conformation of Arg221 differs somewhat to permit an additional water-mediated interaction with Arg115. The conformations observed in dGC further demonstrate the alternative binding modes accessible to Arg221 and are unique in that, in one subunit, no contacts with the DNA are made. Apparently Arg221 has very little energetic preference between water-mediated intramolecular protein contacts and interactions with the DNA. Conformations of the surrounding 220s loop, the DNA backbone, and Arg115 are identical in the two structures.

Table I footnotes: R = Σ||Fobs| − |Fcalc|| / Σ|Fobs|, where Fobs and Fcalc are the observed and calculated structure factor magnitudes. Rfree is calculated with removal of 10% of the data as the test set, followed by simulated annealing refinement of the final model. r.m.s., root mean square. (a) Includes all data in the intensity range I/σ(I) > −3.0. (b) Overall B factor determined from a Wilson plot of the structure factor data using a low resolution cut-off of 4.7 Å. (c) Refinement was carried out using a low resolution cut-off of 6.0 Å.
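The crystallographic R-factor and the 10% Rfree test set mentioned above can be sketched as follows. This assumes Fcalc is already on the same scale as Fobs and uses a hypothetical fixed random seed for the test-set split; it only partitions and scores reflections and does not perform the simulated annealing step.

```python
import random


def r_factor(f_obs, f_calc):
    """R = sum | |Fobs| - |Fcalc| | / sum |Fobs| (Fcalc assumed on the Fobs scale)."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need equal-length, non-empty amplitude lists")
    num = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    return num / sum(abs(o) for o in f_obs)


def split_free_set(n_reflections, fraction=0.10, seed=0):
    """Randomly flag a fraction of reflections as the Rfree test set.
    Returns (working_indices, test_indices)."""
    idx = list(range(n_reflections))
    random.Random(seed).shuffle(idx)
    n_test = round(fraction * n_reflections)
    return sorted(idx[n_test:]), sorted(idx[:n_test])


def r_work_and_free(f_obs, f_calc, fraction=0.10, seed=0):
    """R over the working set and Rfree over the held-out test set."""
    work, test = split_free_set(len(f_obs), fraction, seed)
    r_work = r_factor([f_obs[i] for i in work], [f_calc[i] for i in work])
    r_free = r_factor([f_obs[i] for i in test], [f_calc[i] for i in test])
    return r_work, r_free
```

Because the test reflections never enter refinement, a drop in Rfree, such as the 2% improvement reported for the locally scaled data, reflects a genuinely better model rather than overfitting.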
The mutation R221A has no significant effect on the activity of the enzyme (16), and this can now be rationalized by the observation that the side chain adopts conformations that do not involve DNA binding. The newly introduced exocyclic amino groups of the flanking guanine bases lie outside the region contacted by the minor groove-binding Q loops (Fig. 1) and the surface loops at residues Lys119-Asn120, and consequently have no effect on the structure. However, differences between the dAT and dGC cocrystal structures appear in the major groove interactions of the flanking base pairs with the enzyme. On both sides of the GATATC target, the exocyclic N-4 amino groups of flanking cytosines interact through water molecules with Lys104 on an enzyme surface loop (Fig. 3, A and B). Water molecules bridging Lys104 to the flanking bases are not present in any of the previously determined dAT structures (5, 6) and could not be visualized in the cocrystal with the decamer 5′-GGGATATCCC (4) owing to the lower resolution (3 Å) of that structure.
In subunit I, Lys104 bridges through a network of water molecules binding in the major groove at the two flanking CG pairs (Fig. 3A and Table II). The water molecules hydrogen-bond with the protein at three positions: the Lys104 side chain amine, the backbone at Ala181-Gly182, and the backbone at Phe105. Water molecules 3 and 5 also bridge through other water molecules to the DNA phosphates at Cyt1 and Cyt9 (Table II). In this subunit Lys104 interacts through one water molecule with the N-4 of Cyt11 and through two water molecules with the O-6 of its base-pairing partner Gua2, whereas its connection to the inner flanking pair Gua3-Cyt10 is through a chain of three water molecules: Wat1, Wat3, and Wat4. Recognition of this inner pair occurs via water molecules more closely connected to the Ala181-Gly182 main chain. The Gly182 main chain also serves as a hydrogen bond acceptor from the N-4 of Cyt9, providing part of the discrimination for the outer GC base pairs of GATATC (Fig. 3, A and B). Similar water-mediated interactions of Ala181 and Gly182 with the inner Ade3-Thy10 pair are observed in the dAT structures (4-6, 14). However, in these structures no direct or water-mediated recognition of base functional groups on the outer flanking Ade2-Thy11 pair occurs.
In subunit II the Lys104 ε-amino group interacts directly with two water molecules, one of which (Wat2′) bridges directly to the N-4 amino group of the inner flanking cytosine (Fig. 3B). Wat2′ also interacts with another water (Wat3′), which donates a hydrogen bond to the main chain amide of Gly182. Although both subunits share the common feature of water-mediated flanking sequence recognition through Lys104, the interactions made in subunits I and II thus clearly differ. This appears attributable to the proximity of a crystal packing contact made by the 5′-overhanging cytosine (Cyt1) in subunit I. The base of Cyt1 packs onto the peptide main chain in the dimerization domain of an adjoining molecule, which provides stabilization apparently similar to that of continued base stacking in a longer DNA duplex. In subunit II, which lacks this lattice contact, the 5′-overhanging cytosine is disordered. Moreover, the outer flanking Gua2-Cyt11 pair is also destabilized: the atomic B-factors of Cyt11 are above 60 Å², and the electron density for this nucleotide is weak in the final 2Fo − Fc maps (Fig. 3D). Superposition of the flanking interactions from subunits I and II of dGC shows that Cyt11 of subunit II is shifted away from the protein by approximately 1.0-1.5 Å (Fig. 4), which can account for the absence of water-mediated interactions with the Gua2-Cyt11 pair in subunit II. It thus appears that the water-mediated flanking interactions in subunit I are more likely to be representative of the contacts present in solution.
The recognition of the flanking CG base pairs by these water-mediated interactions is clearly nonspecific. In no case does a water molecule make more than one hydrogen bond with the protein, which would be required for it to present obligate donor or acceptor functions to the DNA (33). Therefore, the waters could in principle reorient their hydrogen bond donor and acceptor groups to provide equivalent interactions with flanking TA pairs. It is also very unlikely that discrimination could be achieved against flanking GC or AT base pairs in which the purine and pyrimidine rings are exchanged, because the positions of the major groove hydrogen-bonding sites in these pairs are altered by only 1 Å (34), a shift that should be readily accommodated by small rearrangements of the waters.
To address why the apparently nonspecific water-mediated interactions between Lys104 and the outer flanking base pairs are not present in dAT, we superimposed the dAT and dGC structures based on polypeptide backbone atoms within the DNA-binding domains (root mean square deviation = 0.24 Å for the superposition of 244 amino acids (6)). This shows that in both subunits one of the water molecules bound to Lys104 would be blocked from binding at flanking AT pairs by a thymine C-5 methyl group (Fig. 5, A and B). In subunit II the steric hindrance by Thy10, which occludes the binding of Wat2′, is sufficient to account for the absence of water-mediated contacts with Lys104 in dAT, because Wat2′ itself interacts directly with Cyt10 (Fig. 5B). In subunit I, however, it is more difficult to explain why the steric exclusion of Wat2 should also disrupt the Wat1 and Wat3 interactions (Fig. 5A). Although all the waters and groups of the flanking DNA and nearby protein are clearly visible in OMIT electron density maps, these residues are clearly more mobile than those located within the GATATC site. This is evidenced by thermal factors ranging from 25 to 45 Å² for the waters, Lys104, and the flanking DNA base pairs (with the exception of Cyt11 of subunit II, as noted above). By contrast, considerably lower B-factors, suggestive of reduced mobility, are associated with the target site DNA and many of its interacting protein segments. Thus, removal of any single interaction in the flanks might be sufficient to disrupt a number of adjoining contacts as well.

Figure caption (fragment): ... Table II, and their hydrogen-bonding contacts are shown in Fig. 3, A and B. Protein atoms of Lys104 and Ala181-Gly184 are shown in green for subunit I and in yellow for subunit II.
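Once the dAT and dGC models share a common frame, the steric argument above reduces to a distance test between dGC water sites and dAT thymine methyl carbons. The sketch below assumes a 3.0-Å minimum acceptable separation (an assumption; the text reports only the observed 2.0- and 2.1-Å contacts), and the coordinates in the usage note are hypothetical.

```python
import math


def steric_conflicts(waters, methyls, min_sep=3.0):
    """List (water, methyl, distance) pairs closer than min_sep A.

    waters, methyls: dict name -> (x, y, z), taken from two structures that
    have already been superimposed on a common frame. min_sep is an assumed
    threshold for an unacceptable water-methyl contact.
    """
    hits = []
    for w_name, w_xyz in waters.items():
        for m_name, m_xyz in methyls.items():
            d = math.dist(w_xyz, m_xyz)
            if d < min_sep:
                hits.append((w_name, m_name, round(d, 2)))
    return hits
```

With a hypothetical water at the origin and a thymine methyl carbon 2.1 Å away, the pair is reported, mirroring the kind of short contact that excludes Wat2′ from AT flanks; a second water 3.9 Å from the methyl is not.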

DISCUSSION
Influence of Crystal Packing Interactions on the Structure of the GATATC Target Site-Comparison of the high resolution, isomorphous cocrystal structures of EcoRV complexed to dAT and dGC shows that no significant differences in DNA conformation are detectable within the GATATC target site. This finding differs from that of Winkler and colleagues (4, 5), who reported significant differences in DNA conformation between cocrystals with AT versus GC flanks. These differences appear particularly significant in the sugar-phosphate backbone at the scissile center base step, where the 3.0-Å cocrystal structure with GC flanks shows a nearly A-like DNA conformation (4). However, that structure was determined in an orthorhombic crystal lattice with distinctly altered intermolecular packing contacts involving both the enzyme and the ends of the DNA. It was therefore not possible to deduce whether the different conformations observed inside the GATATC site were a consequence of the changed lattice contacts or arose as a propagated effect of the different flanks. Because the present dGC cocrystal structure is in a lattice isomorphous to that of the dAT structure, it is now possible to state definitively that the flanking DNA sequence has no effect on DNA structure inside the target site.
Strategies for the Recognition of Alternative DNA Sequences by Proteins-The dGC structure reported here illustrates three new structural features of flanking sequence interactions by EcoRV. First, it reveals a role in the recognition process for a new segment of the enzyme, at Lys104-Phe105. Second, it shows that water-mediated recognition of base functional groups in the flanks is not limited to the inner flanking pair, as suggested by the dAT structures, but also includes the adjoining base pair farther from the target site. Third, it provides a rare detailed example of how a protein can use alternative intermolecular contacts to recognize different DNA sequences. In this case, an important driving force for the new interactions, and the consequent recruitment of Lys104 into recognition, is the steric occlusion of water-binding sites by thymine C-5 methyl groups.
Other mechanisms may also determine the positions of the waters in different flanking sequence contexts. For example, in the dAT structures a water molecule bridges the Ala181 main chain carbonyl group to the N-6 of Ade3 of the inner flanking pair (5, 6). Comparative analysis of the dAT and dGC structures shows that these two groups are 6.0 Å apart in dAT and 7.0 Å apart in dGC, a difference outside the estimated errors, given that the coordinates of each individual structure are determined within 0.2-0.25 Å. The greater separation of the hydrogen-bonding groups accounts for the absence of this water in the dGC structure (there is no steric overlap with other water molecule-binding sites). The altered relative positions of protein and DNA groups may originate in intrinsic sequence-dependent variations in B-DNA structure (33), which can play a key role in favoring particular conformations adopted by DNA when bound to proteins. High resolution analyses of DNase I bound to different DNA sequences have provided a detailed description of how this mechanism operates (37, 38).
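One way to check the claim that the 6.0-Å versus 7.0-Å difference exceeds coordinate error is to express it in units of the combined uncertainty. The sketch assumes the errors of the two structures are independent and add in quadrature, and it treats the 0.2-0.25 Å figures as the uncertainty of each distance.

```python
import math


def difference_in_sigma(d1, d2, err1, err2):
    """|d1 - d2| in units of the combined uncertainty, assuming the two
    structures' errors are independent and add in quadrature."""
    return abs(d1 - d2) / math.hypot(err1, err2)


# 6.0 A (dAT) versus 7.0 A (dGC), with 0.25 or 0.2 A per structure
print(round(difference_in_sigma(6.0, 7.0, 0.25, 0.25), 2))  # 2.83
print(round(difference_in_sigma(6.0, 7.0, 0.20, 0.20), 2))  # 3.54
```

If the 0.2-0.25 Å values are instead read as per-atom errors, each distance inherits the errors of both of its endpoints, and the ratio falls by roughly a further factor of √2 (to about 2.0-2.5σ).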
Another example of the recognition of alternative sequences by a protein is that of the estrogen receptor, which is able to adapt to different nonconsensus target sites via the rearrangement of a lysine side chain at the protein-DNA interface (35). In this case, the lysine adopts a new orientation in response to an unfavorable juxtaposition with the N-6 of an adenine substituted for a guanine in the consensus site. In its new orientation the lysine also makes new intramolecular contacts within the protein. In contrast, the conformations of Lys104 in the dAT and dGC structures of EcoRV differ less dramatically (Fig. 5, A and B). In subunit II of dGC the most important difference is in the dihedral angle Cγ-Cδ-Cε-Nε, so that Nε points toward the DNA (Fig. 5B). In subunit I there are instead significant differences in many of the Lys104 side chain dihedrals in dGC (Fig. 5A), the effect of which is to improve the linearity of the Nε-H···O hydrogen bond relative to that which would obtain in dAT.

FIG. 5. A, superposition of the crystal structures of EcoRV bound to dAT (green) and dGC (red) using polypeptide backbone atoms within the DNA-binding domain of subunit I (6). Hydrogen bonds are shown as dotted white lines, and the distances between the two electronegative atoms are given in Å. In the dAT structure, steric exclusion of the water molecule hydrogen-bonded to Lys104 is indicated by the short 2.0-Å contact between this water and the C-5 methyl of the outer flanking T (T11). C-9 is the outer cytosine of the GATATC target site. B, superposition of the crystal structures of EcoRV bound to dAT (green) and dGC (red) using polypeptide backbone atoms within the DNA-binding domain of subunit II. The steric exclusion by the C-5 methyl of T10 in the dAT structure is shown by the short 2.1-Å contact.
The structure of EcoRV endonuclease bound to dGC thus reveals additional strategies for the recognition of alternative DNA sequences, beyond the rearrangement of amino acids at the interface seen in the estrogen receptor-DNA complexes. These strategies exploit the potential for water-mediated recognition, showing how alternative arrangements of water molecules can form in regions of protein-binding sites where the DNA sequence to be recognized is not unique. These new patterns can be driven by steric occlusion or by intrinsic sequence-dependent variations in DNA structure. Further examples of alternative water-mediated contacts, as a structural underpinning for broad specificity, are expected to be found in other protein-DNA complexes. High resolution studies of altered complexes in the engrailed homeodomain (39) and glucocorticoid receptor (40) systems, in which the mutated protein has an altered binding specificity, also provide insight into this question.

FIG. 6. Models of possible interactions of EcoRV mutants with flanking DNA. These models were constructed using subunit II of the dGC cocrystal structure as the starting point. Mutations in the protein were introduced and torsion angles varied systematically using Insight II (31). Optimal least squares superpositions of alternative base pairs were carried out using atoms in the glycosidic bonds, and the glycosidic torsion angles were adjusted to match those in the dGC structure. A, the K104R/A181E mutant interacting with a flanking CG base pair. Here a 3′-G (GUA10) replaces the 3′-C visualized in the dGC structure. CYT9 is the 3′-base of the target site GATATC. Dotted black lines indicate modeled hydrogen bonds or nearest approach distances (see text), with the distance between the two electronegative atoms indicated in Å. The specific recognition of the outer base pair of the target site by Gly182 and Gly184 is also shown. B, the K104R/A181K mutant interacting with a flanking TA base pair, where a 3′-T (THY10) replaces the 3′-C in the dGC structure. C, the K104R/A181K mutant interacting with a flanking GC base pair. Here the flanking pair is as visualized in the dGC structure, with a 3′-C nucleotide adjacent to the target site.
Implications for Specificity-Flanking interactions are important in wild-type EcoRV because they contribute to the overall binding energy as well as to kcat/Km (16). The effect of differing flanking sequences on the association constant KA has been measured to be ~10-fold for EcoRV (17) and 500-fold for EcoRI (18). Whereas no systematic study is available on the effect of flanking sequences on the catalytic rate of EcoRV, the cleavage rates for the EcoRI substrates were found to be unaffected by the sequence of the flanking regions (18). Therefore, at least in the case of EcoRI, the energetic effect of flanking sequences on the transition state is equivalent to that on the ground state (18), and effects that stabilize the ground state are predicted to stabilize the transition state by the same amount, thus increasing kcat/Km. Such a linear free energy relationship has been observed using base analogues in the cognate sequence of EcoRV. This also indicates that the structures of the ground and transition states are very similar (18), validating structural studies of the ground state.
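The argument above can be made concrete under a rapid-equilibrium assumption (Km ≈ Kd = 1/KA, so kcat/Km ≈ kcat·KA). This is a simplifying assumption that the text does not state. In that limit a flank that changes KA while leaving kcat unchanged shifts kcat/Km by the same factor. The rate and binding constants in the usage note are hypothetical.

```python
def kcat_over_km(kcat, k_assoc):
    """Specificity constant under an assumed rapid-equilibrium model,
    where Km ~ Kd = 1 / K_A, so kcat/Km ~ kcat * K_A."""
    return kcat * k_assoc


def flank_effect_on_specificity(kcat, ka_ref, ka_new, kcat_new=None):
    """Fold change in kcat/Km when a flank changes K_A (and optionally kcat)."""
    kcat_new = kcat if kcat_new is None else kcat_new
    return kcat_over_km(kcat_new, ka_new) / kcat_over_km(kcat, ka_ref)
```

With hypothetical values kcat = 1 s⁻¹ and KA rising from 10⁹ to 10¹⁰ M⁻¹, kcat/Km rises 10-fold; if the same flank also lowered kcat 10-fold, the two effects would cancel and kcat/Km would be unchanged.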
Strategies for Target Site Expansion of EcoRV-Even though the interaction of EcoRV with flanking DNA bases is nonspecific, the existence of these contacts with potentially discriminating groups in the major groove suggests that expanding target site selectivity by protein engineering may be feasible. In the wild-type complexes, the lack of specificity for flanking DNA is most evidently due to the absence of direct, discriminatory protein-DNA hydrogen bonds. A further reason, however, is that the water-mediated contacts with the bases, and the interactions with the sugar-phosphate backbone, are less well ordered than those within GATATC. This is reflected in the relatively high crystallographic B-factors (25 to 45 Å²) of the two flanking DNA base pairs, the enzyme surface loops in which Lys104 and amino acids 221-226 reside, and the intervening waters. Therefore, an important starting point toward introducing expanded specificity is to decrease the mobility of these groups. One likely way to accomplish this is to build in direct protein interactions with the flanks.
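To give the quoted B-factors a physical scale, an isotropic B can be converted to a root-mean-square displacement with the standard relation B = 8π²⟨u²⟩. The sketch below applies it to the values mentioned in the text. It assumes purely isotropic motion and ignores the contribution of coordinate error to B.

```python
import math


def b_to_rms_displacement(b_factor):
    """Root-mean-square displacement (A) from an isotropic B-factor (A^2),
    using B = 8 * pi^2 * <u^2>."""
    if b_factor < 0:
        raise ValueError("B-factor must be non-negative")
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))


for b in (25.0, 45.0, 60.0):
    print(f"B = {b:.0f} A^2 -> rms displacement ~ {b_to_rms_displacement(b):.2f} A")
```

B-factors of 25-45 Å² thus correspond to rms displacements of roughly 0.56-0.75 Å, and the >60 Å² of Cyt11 in subunit II to about 0.87 Å or more. These values are comparable to the sub-ångström geometric changes that would make or break a water-mediated hydrogen bond.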
The dGC crystal structure draws attention to the possibility that direct interactions with the flanking DNA bases might be introduced via mutation of Lys104. Such direct interactions could both improve the stability of the flanking regions and eliminate the ambiguities sometimes associated with obtaining specificity through water-mediated hydrogen bonding. Modeling from the dGC structure shows that substitution with Arg104 permits, without steric clashes, direct contact in the major groove with the O-4 of a thymine base directly 3′ to the target site (Fig. 6B). The closest approach of the Arg guanidinium group to the O-6 of a guanine base modeled at the position of the 3′-cytosine is 3.7 Å, so that direct contact can be envisioned with only small structural rearrangements (Fig. 6A).
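Judging whether a modeled contact, such as the 3.7-Å Arg104 approach above, counts as a hydrogen bond usually combines a donor-acceptor distance test with a D-H···A angle test, the same angle that governs the linearity of the Lys104 Nε-H···O bond. The sketch below uses 3.5 Å and 120° as cutoffs; these are common conventions assumed here rather than values taken from the text, and the test coordinates are hypothetical.

```python
import math


def angle_deg(a, b, c):
    """Angle a-b-c in degrees, measured at b."""
    v1 = [ai - bi for ai, bi in zip(a, b)]
    v2 = [ci - bi for ci, bi in zip(c, b)]
    dot = sum(x * y for x, y in zip(v1, v2))
    cos_t = dot / (math.hypot(*v1) * math.hypot(*v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def is_hbond(donor, hydrogen, acceptor, max_da=3.5, min_dha=120.0):
    """Geometric hydrogen-bond test on donor...acceptor distance and the
    D-H...A angle; both cutoffs are assumed conventions."""
    return (math.dist(donor, acceptor) <= max_da
            and angle_deg(donor, hydrogen, acceptor) >= min_dha)
```

Under these cutoffs a linear contact at 2.9 Å qualifies, whereas the modeled 3.7-Å Arg-guanine approach does not, consistent with the text's statement that a small rearrangement would be needed to form a direct contact.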
The design of new distal contacts that provide discrimination for flanking bases at the catalytic step may be aided by the fact that Gly182 makes interactions both within and outside GATATC (Fig. 3). New flanking interactions made by a segment of protein that already directly contacts the hexameric target might more readily participate in the cooperative conformational changes that occur en route to the transition state. Indeed, modeling shows that several substitutions at the adjacent Ala181 can position hydrogen-bonding amino acids in the major groove at the flanking base pair directly adjacent to the target site (37). For example, the A181E substitution allows direct contact with the N-4 of a 5′-cytosine (Fig. 6A). Alternatively, Lys181 can be modeled to donate hydrogen bonds to both the N-7 and O-6 atoms of a 5′-guanine base. In the context of an adjacent TA base pair (Fig. 6B), the interaction of Lys181 with the N-7 of the 5′-adenine could provide some additional stability, with specificity for the 3′-thymine arising from Arg104. Thus either the A181E or the A181K mutation, perhaps together with K104R, should provide a reasonable starting point for the rational extension of site specificity in EcoRV. The designs might also be aided by consideration of the properties of enzyme mutants that interact with the target site and that introduce deficiencies at the catalytic step. Introduction of new specific flanking interactions in the context of a mutant background might provide additional binding energy useful in reconstituting the required conformational changes upon DNA binding. Unlike the present circumstance with wild-type EcoRV, however, the sequence-specific nature of these contacts would create the potential for specific rather than nonspecific rate enhancement.