Amino Acid Residues in Both the Protein Splicing and Endonuclease Domains of the PI-SceI Intein Mediate DNA Binding

A structure-based model describing the interaction of the two-domain PI-SceI endonuclease with its 31-base pair DNA substrate suggests that the endonuclease domain (domain II) contacts the cleavage site region of the substrate, while the protein splicing domain (domain I) interacts with a distal region that is sufficient for high affinity binding. To support this model, alanine-scanning mutagenesis was used to assemble a set of 49 PI-SceI mutant proteins that were purified and assayed for their DNA binding and cleavage properties. Fourteen mutant proteins were 4- to >500-fold less active than wild-type PI-SceI in cleavage assays, and one mutant (T225A) was 3-fold more active. Alanine substitution at two positions in domain I reduces overall binding >60-fold by perturbing the interaction of PI-SceI with the minimal binding region. Conversely, mutations in domain II have little effect on binding, reduce binding to the cleavage site region only, or affect binding to both regions. Interestingly, substitutions at Lys301, which is part of the endonucleolytic active site, eliminate binding to the cleavage site region but permit contact with the minimal binding region. This experimental evidence demonstrates that the protein splicing domain as well as the endonuclease domain is involved in binding of a DNA substrate with the requisite

The yeast PI-SceI endonuclease catalyzes the hydrolysis of two specific phosphodiester bonds within an asymmetrical recognition site (1). This enzyme is a homing endonuclease (for a review, see Ref. 2) that occurs as an intein situated within an H+-ATPase protein subunit. Like other homing endonucleases, PI-SceI recognizes an extremely long sequence (31 bp) and cuts DNA to yield 5'-phosphate and 3'-hydroxyl ends (3,4). Mutagenesis and biochemical studies indicate that the PI-SceI recognition sequence can be divided into two regions (4,5).
Region I contains the cleavage site that is cut by the enzyme to generate a 4-base pair overhang, and region II includes an adjacent 17-bp sequence (the minimal binding sequence) that is sufficient for high affinity binding. Mutagenesis of the substrate reveals that PI-SceI tolerates substitutions at numerous positions, since substitutions at only nine positions in the substrate lead to severely reduced activity (4). Like the other homing endonucleases that have been studied, PI-SceI requires Mg2+ as a cofactor. The metal ion is likely to be required for the hydrolytic reaction, since it is required for catalysis but not for specific binding. Mn2+ can substitute for Mg2+, and it stimulates more efficient cleavage by the enzyme at cognate and noncognate sites (1,5).
The three-dimensional structure of PI-SceI has been recently determined by x-ray crystallography and reveals a bipartite domain structure (6). Domain I contains the protein splicing active site, which is composed of the N- and C-terminal amino acids and two other His residues that have been shown to be required for activity or have been implicated in the reaction (7,8). The residues that compose the putative endonucleolytic active site, a lysine (Lys301) and two aspartic acid residues (Asp218 and Asp326), are present in domain II and form a catalytic triad that displays structural similarity to charged clusters found in restriction enzymes (6,9). By using the PI-SceI structural information and the knowledge that the enzyme contacts two discrete regions of the recognition sequence, a model for the docking of PI-SceI with its substrate was constructed in which domains I and II of the protein contact regions II and I, respectively, of the substrate (Fig. 1). In this model, both domains are proposed to contact the substrate, since the binding surface on the endonuclease domain alone is insufficient to contact the entire 31-bp recognition sequence. A bend of ~55° was introduced into the middle of the substrate to accommodate the angular orientation of the two domains with respect to each other, and experimental evidence confirms the existence of this distortion (4,5). Furthermore, the scissile bonds of the DNA were placed in close proximity to Asp218 and Asp326, which are thought to bind the Mg2+ cofactor. Two symmetry-related β-sheets (sheets 7 and 9) in domain II that flank the active site aspartic acid residues may serve as platforms that contact the cleavage site region. Furthermore, we speculate that a pair of β-hairpin loops between β15 and β16 and between β21 and β22 that lie above the sheets contain amino acids whose side chains mediate substrate binding.
The interaction of domain I with region II of the substrate may involve a cluster of positively charged amino acids situated along the same face of PI-SceI as the endonucleolytic active site. The structure of a second homing endonuclease, I-CreI, was recently reported, and a model for its binding to DNA bears similarity to that proposed for PI-SceI (10). I-CreI is a homodimeric protein that resembles domain II of PI-SceI, but it lacks the protein splicing domain. Like PI-SceI, I-CreI contains a set of β-sheet structures with interconnecting extended loops that are proposed to form the protein interface that binds the DNA substrate.
To determine the identity of amino acid residues involved in contacting the PI-SceI recognition sequence, we used alanine-scanning mutagenesis to create a set of mutant proteins with single amino acid changes at numerous positions on the proposed DNA binding interface. These mutant proteins were purified and assayed for their substrate cleavage and DNA binding activities. The major finding of the work is that residues in both domains mediate DNA binding. Moreover, the binding behaviors of wild-type PI-SceI and several mutants provide compelling evidence for a high affinity interaction between domain I and the minimal binding region and for a substantially weaker association between domain II and the cleavage site region.

EXPERIMENTAL PROCEDURES
Materials-TALON metal affinity resin and TALONspin columns were obtained from CLONTECH. All oligonucleotides were synthesized by Genosys Biotechnologies, Inc.
Mutagenesis of PI-SceI Gene-Wild-type and mutant PI-SceI proteins were expressed from plasmid pET PI-SceI C-His, which encodes a 479-amino acid PI-SceI derivative containing a polyhistidine C-terminal extension that facilitates rapid protein purification by metal affinity chromatography. To construct pET PI-SceI C-His, PCR mutagenesis (11) was used to insert six silent restriction sites (SpeI, ApaI, BssHII, BstEII, BsiWI, and MluI sites at positions 243, 406, 484, 717, 813, and 943, respectively, relative to the first codon) into plasmid pET23PI-Sce ESARC (9) to generate plasmid pET23PI-Sce-9. Plasmid pET23PI-Sce-9 was used as a template in a PCR with two oligonucleotides (5'-TTCGGATCCGCGACCCATTTTGCATGGACGACAACCT-3' and 5'-CGGTACGCGTGAAACATTTCTG-3') to generate a 449-bp fragment. This product was digested with MluI and BamHI and ligated into MluI/BamHI-digested pET23PI-Sce-9 DNA to create pET PI-SceI C-His. The entire PI-SceI coding region of pET PI-SceI C-His was confirmed by DNA sequence analysis. Omitting the N-terminal methionine residue, pET PI-SceI encodes a 479-amino acid PI-SceI derivative with a C-terminal tail having the sequence KWVADPNSSSVDKLAAALEHHHHHH-COOH. Protein splicing-mediated cleavage of the C-terminal affinity tag was prevented by substituting Asn454 with alanine. To introduce mutations into the PI-SceI coding sequence, oligonucleotide primers were used in either cassette mutagenesis or two-step overlapping PCR amplification protocols (11). All introduced mutations and inserted sequences were confirmed by dideoxy sequencing.
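The cloning step above relies on the PCR product carrying a BamHI site and an MluI site. As a minimal sketch (not part of the published protocol), the two primer sequences quoted in the text can be scanned for the standard recognition sequences of these enzymes (GGATCC for BamHI, ACGCGT for MluI). The names `find_sites`, `PRIMERS`, `primer_1`, and `primer_2` are hypothetical labels introduced here for illustration.

```python
# Recognition sequences of the two enzymes used to digest the PCR product.
# Both sites are palindromic, so scanning one strand is sufficient.
RECOGNITION_SITES = {"BamHI": "GGATCC", "MluI": "ACGCGT"}

# Primer sequences as quoted in the text (5' to 3'); the labels are hypothetical.
PRIMERS = {
    "primer_1": "TTCGGATCCGCGACCCATTTTGCATGGACGACAACCT",
    "primer_2": "CGGTACGCGTGAAACATTTCTG",
}


def find_sites(sequence, sites=RECOGNITION_SITES):
    """Return {enzyme: [0-based start positions]} for every site present."""
    seq = sequence.upper()
    hits = {}
    for enzyme, site in sites.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start)
            start = seq.find(site, start + 1)
        if positions:
            hits[enzyme] = positions
    return hits
```

Calling `find_sites(PRIMERS["primer_1"])` and `find_sites(PRIMERS["primer_2"])` reports which enzyme site each primer contributes to the amplified fragment.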
Expression and Purification of PI-SceI Proteins-Plasmid pET PI-SceI C-His encoding wild-type or mutant PI-SceI proteins was transformed into Escherichia coli strain BL21 (DE3). For most of the mutant proteins characterized, a 200-ml culture was grown in LB medium (1% Bacto-tryptone, 0.5% Bacto-yeast extract, 0.5% NaCl, 1 mM NaOH) containing ampicillin (100 μg/ml) at 37°C to an A600 of 0.6-0.8. Expression of PI-SceI protein was induced with 0.5-1.0 mM isopropyl-1-thio-β-D-galactopyranoside, and growth was continued overnight at 15°C. The cells were harvested by centrifugation, resuspended in 2 ml of sonication buffer (20 mM Tris-Cl (pH 8.0), 300 mM KCl, 10 mM MgCl2, 5% glycerol, 1 mM phenylmethylsulfonyl fluoride) containing 1 mM imidazole, and lysed by sonication (3 × 1 min) at 4°C. All further manipulations were performed at 4°C. Cell debris was pelleted by centrifugation at 10,000 × g for 15 min. The clarified lysate was applied to a TALON spin column (0.5 ml of TALON metal affinity resin) preequilibrated with sonication buffer, and the metal affinity columns were inverted for 5 min and centrifuged at 700 × g for 2 min. The resin was washed twice with 1 ml of sonication buffer containing 1 mM imidazole, and PI-SceI was eluted from the columns with sonication buffer containing 300 mM imidazole. Elution fractions containing PI-SceI, as judged by SDS-polyacrylamide gel electrophoresis, were pooled and dialyzed overnight in buffer D (10 mM potassium phosphate (pH 7.6), 5% glycerol, 0.1 mM EDTA, and 1.4 mM 2-mercaptoethanol) containing 40 mM KCl (buffer D40). The dialyzed protein was applied to a 1-ml SP-Sepharose column equilibrated with buffer D40, the resin was washed with 2.5 ml of buffer D40, and protein was eluted with 10 × 1 ml of buffer D containing 450 mM KCl (buffer D450). Elution fractions containing purified PI-SceI were pooled and stored in storage buffer (10 mM potassium phosphate (pH 7.6), 50 mM KCl, 2.5 mM 2-mercaptoethanol, and 50% glycerol) at −20°C.
For some PI-SceI proteins, similar protocols were used to purify the enzyme from 1-liter cultures. The PI-SceI proteins were purified to greater than 95% as judged by SDS-polyacrylamide gels. The affinity-tagged PI-SceI (Mr = 53,800) concentration was determined using the extinction coefficient of 5.03 × 10^4 /M/cm as determined by published methods (12). The wild-type protein had the same specific activity as native PI-SceI.
Native Gel Mobility Shift Assay of DNA Binding-To detect protein-DNA complexes in DNA mobility shift analyses, a 219-bp duplex DNA fragment containing a single PI-SceI recognition site was synthesized by PCR and labeled with [32P]ATP as described previously (4). Nonspecific binding was measured using a 189-bp duplex DNA fragment that was identical in all respects except that it lacked the PI-SceI recognition site. Each reaction mixture (20 μl) contained 25 mM Tris-HCl (pH 8.5), 100 mM KCl, 10% glycerol, 50 μg/ml bovine serum albumin, 2.5 mM 2-mercaptoethanol, 5 fmol of 219-bp substrate (5'-32P-labeled at both ends), and PI-SceI as specified and was incubated at 25°C for 10 min. The samples were subjected to electrophoresis through a 7% native polyacrylamide gel in 0.5 × TBE at 210 V for 5 min and then at 120 V for 2-4 h at 4°C. The amounts of bound and unbound substrate were determined using a PhosphorImager and FragmeNT Analysis software (Molecular Dynamics, Inc.). Autoradiographic exposure of the dried gel to film was used to visualize the unbound DNA and the PI-SceI-DNA complexes.
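The protein concentration determination mentioned above follows from the Beer-Lambert relation, A = ε × c × l. The sketch below is illustrative only: it uses the extinction coefficient and Mr quoted in the text, while the measurement wavelength and the default 1-cm path length are assumptions, since the text does not state them. The function names are hypothetical.

```python
# Values quoted in the text for affinity-tagged PI-SceI.
EXTINCTION_COEFF_PER_M_PER_CM = 5.03e4
MOLECULAR_MASS_G_PER_MOL = 53_800


def molar_concentration(absorbance, path_length_cm=1.0):
    """Beer-Lambert: c = A / (epsilon * l), returned in mol/liter.

    The 1-cm default path length is an assumption, not stated in the text.
    """
    if path_length_cm <= 0:
        raise ValueError("path length must be positive")
    return absorbance / (EXTINCTION_COEFF_PER_M_PER_CM * path_length_cm)


def mass_concentration_mg_per_ml(absorbance, path_length_cm=1.0):
    """Convert the molar concentration to mg/ml (numerically equal to g/liter)."""
    return molar_concentration(absorbance, path_length_cm) * MOLECULAR_MASS_G_PER_MOL
```

For a hypothetical absorbance reading of 0.503 in a 1-cm cuvette, the functions give about 10 μM protein, or roughly 0.54 mg/ml.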
The PI-SceI reaction pathway can be described by Scheme I (Fig. 2), where the free protein (P_f) and DNA (D_f) interact to form the lower protein-DNA complex (PD_LC) that involves PI-SceI contacts to region II of the substrate (4,5,9). PD_LC is in equilibrium with the upper complex, PD_UC, where PI-SceI contacts both regions I and II of the substrate. A second pathway for PD_UC formation is possible involving a complex where PI-SceI contacts region I only (PD_x), but this complex has not been observed. The PD_UC complex binds Mg2+ and forms the putative pentavalent phosphate transition state that undergoes double-stranded scission. PI-SceI is proposed to remain tightly bound to the region II cleavage product following the reaction.
In the PI-SceI target sequence, the cleavage site is located in region I (base pairs −10 to +4), and the minimal binding region is situated in region II (base pairs +5 to +21). These regions are defined by biochemical and mutagenic experiments (4).
The thermodynamic parameters K1 and K2 in Scheme I (Fig. 2) can be expressed as follows.
Similarly, substitution into the expression for K2 with LC and UC yields the following.
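One way to write these relations is sketched below. It is a reconstruction under stated assumptions rather than the typeset equations of the report: K1 and K2 are taken to be dissociation-type constants for the two steps of Scheme I, the symbols f_UB, f_LC, and f_UC (notation assumed here) denote the fractions of total DNA that are unbound, in the lower complex, and in the upper complex, and the free protein concentration is approximated by [P_T] because protein is in large excess over DNA. The original equation numbering is not reproduced.

```latex
\begin{align}
K_1 &= \frac{[P_f][D_f]}{[PD_{LC}]}, &
K_2 &= \frac{[PD_{LC}]}{[PD_{UC}]} \\
K_1 &\approx [P_T]\,\frac{f_{UB}}{f_{LC}}, &
K_2 &= \frac{f_{LC}}{f_{UC}} \\
K_1 K_2 &\approx [P_T]\,\frac{f_{UB}}{f_{UC}}
\end{align}
```

With these definitions, K1 × K2 equals [P_T] at the protein concentration where the unbound and upper complex fractions are equal, and K2 = 1 corresponds to equal amounts of the lower and upper complexes.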
Values for K1 and K2 were determined by nonlinear regression of the gel mobility shift data to Equations 4 and 5 using KaleidaGraph software (Abelbeck Software). Under the conditions of the assay, these equations are valid, because the total protein concentration [P_T] is much greater than the total DNA concentration [D_T]. In addition to wild-type PI-SceI, the R90A, R94A, T225A, R231A, D232A, Y328A, T338A, and H343A mutant proteins also formed PD_LC and PD_UC complexes, and binding could be represented by Scheme I (Fig. 2). Complete binding curves for the R90A and R94A proteins could not be generated due to the low level of binding. However, an estimate of the K1 × K2 value could be made, since K1 × K2 = [P_T] when UB = UC, where UB and UC are the fractions of total DNA that are unbound and in the upper complex, respectively. The K301A, K301E, and H377A mutants formed only the PD_LC complex, even at high protein concentration. No equilibrium dissociation constants were measured for the D229A, K301R, K340A, and K369A proteins, since the PD_UC complexes migrated faster than that of wild-type PI-SceI and could not be adequately resolved from the PD_LC complex.
PI-SceI Cleavage Analysis-In an initial characterization of the PI-SceI proteins, purified enzyme (50-150 nM) was incubated with XmnI-linearized pBS-PISce36 (7 nM) (4) in 15 μl of cleavage buffer (100 mM KCl, 25 mM Tris-HCl (pH 8.5), 2.5 mM 2-mercaptoethanol, 2.5 mM MgCl2) for 30 min and 1 h at 37°C. On the basis of these assays, mutant proteins that were determined to be partially or fully defective in cleavage activity were assayed again at a protein concentration of 100 nM under the same conditions for various lengths of time. Reactions were terminated by the addition of 5 μl of stop buffer (5 mM Tris-HCl (pH 7.5), 10 mM EDTA, 0.05% (w/v) SDS, 2.5% (w/v) Ficoll). Samples were subjected to electrophoresis in 1 × TBE on a 0.9% agarose gel, which was stained with ethidium bromide and photographed.
The amounts of undigested plasmid DNA and the two cleavage products were determined using a scanning densitometer (Molecular Dynamics). Cleavage rates were calculated from curve fitting of the linear portions of the reaction using KaleidaGraph (Synergy Software).
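The rate calculation described above amounts to fitting a straight line to the early, linear part of a time course. The following is a minimal sketch of that idea using an ordinary least-squares slope; it is not the KaleidaGraph procedure itself, the function names are hypothetical, and choosing which time points count as linear is left to the user.

```python
def initial_rate(times_min, fraction_cleaved):
    """Least-squares slope of fraction cleaved versus time (per minute).

    Pass only the time points judged to lie in the linear portion
    of the reaction.
    """
    n = len(times_min)
    if n < 2 or n != len(fraction_cleaved):
        raise ValueError("need at least two paired (time, fraction) points")
    mean_t = sum(times_min) / n
    mean_f = sum(fraction_cleaved) / n
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (f - mean_f)
              for t, f in zip(times_min, fraction_cleaved))
    return sxy / sxx


def fold_reduction(wild_type_rate, mutant_rate):
    """Ratio of wild-type to mutant rate (values above 1 mean a slower mutant)."""
    if mutant_rate <= 0:
        raise ValueError("mutant rate must be positive to form a ratio")
    return wild_type_rate / mutant_rate
```

As an illustration with made-up numbers, a time course that reaches 25% cleavage in 5 min gives a slope of 0.05 per min, and a mutant with a slope of 0.002 per min would be scored as 25-fold less active.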

RESULTS
Mutagenesis of PI-SceI-To identify the amino acid residues that participate in substrate binding, we introduced amino acid substitutions into the domain II platform and loop regions and into the positively charged region of domain I that is predicted from the model to contact the DNA. In domain II, substitutions were made in β14, β15, and β16 in one of the two symmetry-related platforms and in β19, β20, β21, and β22 in the other (Table I). Substitutions were also made at the active site at Lys301 and Pro304, two highly conserved residues situated in block D, a conserved motif found in homing endonucleases and maturases (7,8). In domain I, substitutions were introduced at amino acids Arg90, Arg91, Arg94, and Lys97, which comprise the cluster of positive charges thought to bind the DNA. In general, amino acid residues with side chains containing putative hydrogen bond donors or acceptors were targeted, since hydrogen bonds are frequently important components of protein-DNA interactions. Alanine substitutions were introduced, since this residue lacks hydrogen bond partners and would be expected to exert minimal steric or electrostatic effects on structure. To investigate the effect of charge changes at Lys301, substitution was made at this position with arginine, which maintains the positive charge, and with glutamic acid, which introduces a negative charge.
A total of 49 mutant PI-SceI derivatives were generated containing substitutions at four positions in domain I and 43 positions in domain II (Table I). Cultures of E. coli strains harboring plasmids that expressed the mutant proteins were grown and induced to overexpress PI-SceI. The levels of PI-SceI protein varied for each of the mutants following induction with isopropyl-1-thio-β-D-galactopyranoside and ranged from approximately 2% to greater than 10% of total cell protein. The only exception was the D371A derivative, which could not be purified in sufficient quantities to study accurately. However, a low level of activity was observed with this derivative (footnote 3). The PI-SceI proteins were purified by chromatography on a Co2+ metal affinity resin. All enzymes adhered to this column matrix and could be eluted using 300 mM imidazole, just as for the wild-type protein. After this column, the proteins were approximately 90% pure as judged by SDS-polyacrylamide gel electrophoresis (footnote 3). Following the affinity purification step, the proteins were further purified using chromatography on SP-Sepharose. All of the mutant proteins and wild-type PI-SceI bound to the SP-Sepharose column and could be eluted using buffer containing 450 mM KCl. Samples were analyzed by SDS-polyacrylamide gel electrophoresis and were judged to be >95% pure (footnote 3).
Characterization of DNA Cleavage Activity of Mutant Proteins-The 48 mutant proteins that were successfully purified were tested for their ability to cleave a PI-SceI recognition site on linearized plasmid pBS-PISce36. In initial experiments designed to quickly identify mutant derivatives that were partially or completely defective in cleavage activity, an approximately 50-150 nM concentration of PI-SceI protein was incubated with a 7 nM concentration of linearized substrate under standard reaction conditions in buffer containing MgCl2.
Table I shows that 34 of 48 mutant proteins tested had at least 25% of the activity of wild-type PI-SceI. These results reveal that mutations can be made at numerous positions in the protein proximal to the active site with little or no effect on activity. Of the remaining mutants examined, 12 (R90A, R94A, D229A, R231A, D232A, K301R, Y328A, T338A, K340A, H343A, K369A, and H377A) were partially active (activity levels less than 25% of wild-type activity) and two displayed no activity (K301A and K301E). Surprisingly, one mutant protein (T225A) was at least 3 times more active than wild-type PI-SceI.
Footnote 3: Z. He and F. S. Gimble, unpublished results.
More detailed rate experiments were performed for the partially or fully defective proteins and for the enhanced activity protein in reaction buffers containing either MgCl2 or MnCl2. These experiments were carried out under single turnover conditions (excess enzyme relative to substrate). Steady state conditions could not be achieved, since PI-SceI remains tightly bound to one of the two cleavage products (5,9), yielding a low turnover number. The amount of linearized substrate that was cleaved to form the two products was measured as described under "Experimental Procedures," and the cleavage activities are shown in Table II. In the buffer containing Mg2+, 25% of the substrate was cleaved by wild-type PI-SceI in approximately 5 min. By contrast, for two of the mutant proteins (K301A and K301E), no cleavage activity was apparent after 4 h of incubation, and for two others (D229A and K340A), only trace amounts of cleavage products were detected. The reaction rates for these mutant proteins were too slow to measure accurately, and we estimate that their activities are at least 500 times lower than that of wild-type PI-SceI. Of the remaining 10 defective mutants, four were >20-fold less active than wild-type PI-SceI (R90A, R94A, Y328A, and H377A), and six were 4-20-fold less active (R231A, D232A, K301R, T338A, H343A, and K369A). The PI-SceI protein with enhanced activity, T225A, cleaved the DNA substrate over 3 times faster than the wild-type protein in the presence of Mg2+.
Substitution of Mn2+ for Mg2+ in the PI-SceI cleavage buffer is known to relax the specificity of the enzyme, allowing it to cleave at sites that are resistant to cleavage in MgCl2 (1,4), and to increase the overall activity of the enzyme when it cuts at the normal recognition site (1,5). The data in Table II show that MnCl2 increases the cleavage rate of wild-type PI-SceI nearly 10-fold, which is similar to rate enhancement levels reported elsewhere (5). Interestingly, for all mutants tested, even for the K301A, K301E, D229A, and K340A mutants that are completely inactive or severely reduced in activity in the presence of MgCl2, MnCl2 increases the cleavage rates, often to near wild-type levels. The levels of activity for the R90A, R94A, R231A, Y328A, T338A, H343A, and K369A proteins in MnCl2 differ less than 4-fold from that of wild-type PI-SceI (Table II). The level of rate enhancement by Mn2+ varies for the different proteins; for example, the activity of the T338A protein is 12-fold higher in Mn2+, while that of the R94A protein is at least 400-fold higher.
Table I footnotes:
a Indicates the positions where mutations were introduced into the PI-SceI sequence. Denotes the wild-type residue, the location of the mutation in the primary sequence, and the substituted residue. Amino acids are represented by the single-letter code. Positions in boldface type were chosen for further study.
b Position of the amino acids relative to the PI-SceI secondary structure (see Ref. 6). Double arrows indicate that an amino acid is located between two secondary structure elements.
c DNA cleavage assays were carried out as described under "Experimental Procedures." +++, ++, +, and − indicate >300% of wild-type activity, >25% of wild-type activity, trace levels, and undetectable activity, respectively. The asterisk indicates that the mutant protein could not be sufficiently purified.
d Reported as the percentage of the total surface area of the amino acid that is accessible to a water molecule.
Table III footnote:
b No PD_UC complex is observed for these mutant proteins, indicating that the K2 value is significantly higher than that of wild-type PI-SceI. K1 was calculated as described under "Experimental Procedures."
Footnote 4: M. Crist and F. S. Gimble, unpublished results.
DNA Binding Properties of the Mutant Proteins-To test whether defects in substrate binding by the mutant proteins account for the reduction in cleavage rates, gel shift analyses were performed as described under "Experimental Procedures" using a 219-bp linear fragment containing a single PI-SceI site. Wild-type PI-SceI forms two complexes with this substrate in the absence of metal ion cofactor: a lower complex (PD_LC) in which the protein binds solely to a 17-bp minimal binding region distal to the cleavage site (region II) and an upper complex (PD_UC) in which PI-SceI binds to both the minimal binding region and the cleavage site region (region I). Fig. 3 shows that wild-type PI-SceI forms both complexes in the binding experiment. In this report, we used the data to measure two equilibrium dissociation constants, K1 and K2, that describe PI-SceI binding to its substrate (see "Experimental Procedures"). Overall binding can be expressed as the product of these parameters and is approximately 0.7 nM (Table III). As suggested previously (4,5,9), it appears that the major contributing factor to this tight affinity stems from the interaction of PI-SceI with region II of the substrate, since K1 is only about 10-fold higher than K1 × K2. The high value of K2, which reflects the partitioning between the lower and the upper complexes, suggests that the binding energy released by the interaction of domain II with the DNA is used to stabilize the energetically unfavorable distorted DNA conformation that is present in the upper complex. Furthermore, as predicted from the model, the ratio of the upper and lower complexes, which reflects K2, is independent of protein concentration (footnote 3). Binding of PI-SceI to a nonspecific DNA fragment of similar size is approximately 300-fold weaker than binding to the specific probe (equilibrium dissociation constants of ~200 nM compared with 0.67 nM) (footnote 4). The PI-SceI mutant proteins display a variety of different binding behaviors that strongly indicate that both domains of the protein contact the substrate. Of all of the mutants analyzed, the level of binding is lowest for the R90A and R94A mutant proteins, which both contain substitutions in domain I. Under conditions where wild-type PI-SceI generates high levels of the PD_UC complex, the mutants yield barely detectable amounts, and complete binding curves could not be generated. However, the K1 × K2 values could be estimated as ~100 and ~40 nM for the R90A and R94A proteins, which are 150- and 60-fold higher than for wild-type PI-SceI. The H377A, K301A, and K301E proteins contain substitutions in domain II and are similar in that they generate no species that co-migrates with the wild-type upper complex. Interestingly, these three proteins appear to bind more tightly to region I than wild-type PI-SceI, since their K1 values are lower (Table III). For the D229A, K301R, K340A, and K369A mutants, it is evident from the gel shift assay that the upper complex forms but that it migrates faster than that of wild-type PI-SceI (Fig. 3).
The inability to resolve the two complexes prevents accurate determination of K1 and K2, but it is clear from the gel shift analysis that the total amount of bound complexes produced by the D229A and K301R proteins is roughly similar to that of wild-type PI-SceI, while overall binding of the K340A protein is markedly lower (Fig. 3). Both complexes are apparent for the R231A, D232A, Y328A, T338A, and H343A proteins, but their thermodynamic parameters differ. The K1 values are within 2-fold of wild-type for all mutants except for Y328A, which has a value that is over 3-fold higher. By contrast, the D232A and T338A proteins exhibit K2 values that are approximately 2.5- and 7.5-fold higher than that of the wild-type protein. Interestingly, K2 ≈ 1 for the T338A mutant, indicating that there is equal partitioning of the lower and upper complexes. As a consequence of these differences in K1 and K2, the K1 × K2 values for the D232A, Y328A, and T338A proteins, which reflect overall formation of the upper complex, are >4.5-fold higher than that of wild-type PI-SceI. Finally, both protein-DNA complexes are generated by the T225A mutant, which cleaves the substrate 3-fold faster than wild-type PI-SceI. No significant reproducible differences in binding were detected for this mutant protein.

DISCUSSION
In this report, we employed alanine-scanning mutagenesis to generate mutations in regions of PI-SceI endonuclease that are believed to contact the DNA substrate. This type of strategy has been successfully used to probe protein-DNA recognition for several other DNA-binding proteins, including the Arc repressor (13) and the E. coli Tyr B protein (14). Alanine-scanning mutagenesis has the advantage of only substituting a single methyl group for the wild-type side chain, which effectively removes any important functional group that is normally present.
Random mutagenesis followed by genetic selection is more likely to cause a loss-of-function phenotype by introducing a deleterious moiety that alters the protein conformation. A drawback of alanine-scanning mutagenesis is that unless a complete mutagenesis profile is performed for a given protein, there is the possibility that functionally important residues may not be tested. It is also possible that main chain functional groups contribute to binding free energy, which would not be probed by our strategy.
In the absence of a crystal structure that includes the DNA substrate, it remains unclear whether the functionally important residues identified here by mutation act directly by removing a critical contact or indirectly by modifying the protein conformation. However, these mutations probably do not cause any gross structural perturbations, since all of the mutant proteins could be purified in soluble form using the same procedures as for wild-type PI-SceI, suggesting they are correctly folded. Furthermore, and most importantly, in the presence of Mn2+, all of the mutant proteins are active to some degree, with some being nearly as active as wild-type PI-SceI.
According to Scheme I (Fig. 2), mutations that modify cleavage activity can exert their effects by altering the catalytic machinery of the protein (i.e. they can affect k1) and/or by affecting the substrate binding determinants (they can affect K1-K4). We show here that there is a good correlation between the decrease in the level of cleavage activity and the decrease in substrate binding, suggesting that binding interactions have been disrupted. The mutants that display the lowest levels of cleavage activity, i.e. R90A, R94A, D229A, K301A, K301E, Y328A, K340A, and H377A, yield either little or no apparent PD_UC complex or produce complexes that migrate faster than that of the wild-type enzyme. The absence of the PD_UC complex suggests that important contacts near the cleavage site have been disrupted and that no interaction occurs between PI-SceI and region I, while the appearance of faster migrating complexes indicates possible conformational differences in the complex that may affect the cleavage activity. The T225A mutant, which is approximately 3-fold more active than wild-type PI-SceI, has a K1 × K2 value similar to wild type, but we cannot rule out a small binding enhancement.
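The binding parameters invoked in this argument can be related to gel shift band fractions with a short calculation. The sketch below is illustrative and rests on assumptions inferred from the text rather than stated explicitly in it: K1 = [P][D]/[PD_LC] and K2 = [PD_LC]/[PD_UC] are treated as dissociation-type constants, and free protein is approximated by total protein because protein is in large excess over DNA. The function names are hypothetical.

```python
def complex_fractions(protein_nM, k1_nM, k2):
    """Return (unbound, lower complex, upper complex) fractions of total DNA.

    Assumes K1 = [P][D]/[PD_LC], K2 = [PD_LC]/[PD_UC], and protein in
    large excess over DNA so that free protein ~ total protein.
    """
    if protein_nM < 0 or k1_nM <= 0 or k2 <= 0:
        raise ValueError("concentration must be >= 0 and constants > 0")
    lower = protein_nM / k1_nM   # [PD_LC] relative to free DNA
    upper = lower / k2           # [PD_UC] relative to free DNA
    total = 1.0 + lower + upper
    return 1.0 / total, lower / total, upper / total


def constants_from_fractions(protein_nM, unbound, lower, upper):
    """Invert the relations above to recover (K1 in nM, K2) from band fractions."""
    if min(unbound, lower, upper) <= 0:
        raise ValueError("all three fractions must be positive")
    k1_nM = protein_nM * unbound / lower
    k2 = lower / upper
    return k1_nM, k2
```

For example, with hypothetical values K1 = 7 nM and K2 = 0.1 (chosen only so that K1 × K2 = 0.7 nM), `complex_fractions(0.7, 7.0, 0.1)` returns equal unbound and upper complex fractions, which is the condition used in the text to estimate K1 × K2 from incomplete binding curves. Setting K2 = 1 gives equal lower and upper complex fractions at any protein concentration.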
The main finding of this report, that both PI-SceI domains contact the recognition sequence, is supported by consideration of the thermodynamic binding parameters of the various mutant enzymes together with the positions of the substituted residues in the crystal structure. Fig. 4 shows an overview of the entire protein that indicates the positions of the amino acids where mutations lead to a loss or gain of activity. Two domain I residues, Arg90 and Arg94, are strong candidates for amino acids that contact region II, since proteins with mutations at these positions have K1 × K2 values that are significantly higher than that of wild-type PI-SceI. Residue Arg90 is exposed to solvent and lies on the same face of the protein as the active site in domain II, which might be expected if both regions contact the DNA. Little can be concluded from the positioning of residue Arg94 since it is part of a disordered loop in the crystal structure, but it is in the same vicinity as Arg90. The residues in domain II that alter activity cluster in groups that neighbor the active site. For example, the Tyr328, Lys340, and Thr338 side chains are in close proximity in the crystal structure. The Tyr328 phenolic group and the ε-amino group of Lys340 are situated within 4 Å of one another and extend upward into the solvent-exposed region above the platform formed by β-sheet 9 that is thought to contain the DNA (Fig. 5A). The Y328A protein exhibits a ~25-fold reduction in cleavage activity that probably results in part from its reduced DNA binding affinity. However, binding defects alone cannot account for the large reduction in cleavage activity of the Y328A mutant, and there may be effects on catalysis as well. Even more striking is the nearly total absence of activity of the Lys340 mutant, which can be easily accounted for by its binding defect. According to our model (Fig. 1), it might be predicted that PI-SceI domain I and domain II mutations affect K1 and K2, respectively. Within domain I, this prediction is borne out by the R90A and R94A proteins. However, Y328A is an example of a domain II substitution that alters K1, which suggests that rather than being independent, the domains communicate with each other. Alanine substitution at the third residue in this group, Thr338, increases the K2 value to unity, resulting in equal partitioning between the complexes. The Thr338 side chain is not solvent-exposed and would not be expected to contact the substrate. In the other half of the binding platform, which originates from β-sheet 7, the Thr225 side chain also extends above the platform surface (Fig. 5B). Removal of most of the threonine side chain by alanine substitution does not have a major effect on binding. Residues His343 and His377 are located above one another in two loops that are part of an extended structure that rises above one side of the active site. Both ε2 nitrogens are pointed toward the opening above the active site where the DNA is thought to be located (Fig. 5B). The behavior of the H377A mutant protein nicely fits our model, since it yields no PD_UC complex (high K2 value) and is >50-fold reduced in activity compared with wild-type PI-SceI. Somewhat surprisingly, the K1 value is nearly 10-fold higher compared with wild-type PI-SceI, which again suggests synergy between the two domains. The Lys369 residue is situated in the same loop as His377, but its orientation is uncertain due to disorder in the structure. However, stereochemical refinement of the structure indicates hydrogen bonding between the Lys369 and Lys340 ε-amino groups, and the K369A substitution may affect the structure of the binding platform.
Diametrically opposite to His 343 and His 377 on the other side of the active site are residues Asp 229, Arg 231, and Asp 232, which form a tight cluster where the side chains are oriented toward the putative substrate binding cavity. A hydrogen bond exists between the Arg 231 δ-guanidino group and the Asp 229 carboxyl group. The binding constants for the D229A mutant could not be accurately determined, but it is clear that the large decrease in cleavage activity cannot be accounted for solely by reductions in overall binding (Fig. 3). What is certain is that the mutation alters the mobility of the PD UC complex, which may indicate conformational differences in the DNA. Alternatively, as with any of the mutants described here, there may be conformational changes in the catalytic center that affect activity.
The PI-SceI mutants containing substitutions at Lys 301 fall into a separate category, since this amino acid, unlike the other residues characterized here, is highly conserved among homing endonucleases (7,8) and, together with Asp 218 and Asp 326, forms a "catalytic triad" that comprises the PI-SceI active site (6). Similar clusters of two acidic residues and a lysine residue are found at the active sites of several restriction endonucleases (15). Lys 301 is situated at the C-terminal end of β18 in the PI-SceI crystal structure, and the side chain extends into the putative substrate binding cavity that is also occupied by the two aspartic acid side chains (Figs. 4 and 5). Substitution of Lys 301 with alanine or glutamic acid dramatically increases K 2 and consequently eliminates all activity. Similar substitutions at Lys 92 of EcoRV, which may be an analogous residue to Lys 301, reduce substrate binding and cleavage activities (16). The basic character of the Lys 301 side chain is critical for the PI-SceI binding interaction, since a K301R mutant is partially active in binding and cleavage assays. By contrast, arginine substitution at Lys 92 of EcoRV abolished DNA cleavage activity with either Mg2+ or Mn2+ (17). We also found that cleavage activity of the PI-SceI K301A and K301E mutants could be partially rescued by Mn2+ (Table II). A similar effect was observed for the EcoRV K92E mutant protein but not for the K92A protein, which led to speculation that the binding of a second Mn2+ ion to the Glu residue restored the positive charge normally contributed by the Lys 92 side chain. This is unlikely to be the case for PI-SceI, since we observe rescue of activity for the K301A mutant as well. In fact, the activity of all of the mutant proteins is partially rescued by substitution of Mn2+ for Mg2+. It is also worth noting that a set of substrate mutants that are catalytically inactive in Mg2+ also have activity restored by the presence of Mn2+ (4).
Similar instances of activity "rescue" by Mn2+ have been observed with EcoRV mutants that have low levels of activity in Mg2+ but have nearly wild-type activity levels in Mn2+ (16,18). However, unlike the restriction enzymes, PI-SceI normally displays greater activity in the presence of Mn2+ than with Mg2+. One EcoRV mutant has been identified for which this is also the case (19). Taken together, our data are consistent with the Lys 301 side chain establishing an important binding contact within region I, perhaps to a phosphate oxygen near the scissile phosphodiester bond.
The substrate binding properties of the protein mutants characterized here complement those of a set of loss-of-function DNA substrate mutants that contain substitutions in regions I and II. Point mutations at positions A+16, G+18, and A+19 in region II dramatically reduce all binding to wild-type PI-SceI (4). According to our model, these base pairs are located in the same general vicinity as the residues substituted in the R90A and R94A mutant proteins, which display similar binding defects. By contrast, substitutions in the PI-SceI substrate near the cleavage site at positions A−9, T−1, G+1, G+3, and G+4 only eliminate PD UC complex formation or produce a complex that migrates faster than that of wild-type PI-SceI (4). These binding properties are similar to those of some domain II mutants described here. Thus, there is a good correlation between the DNA binding properties of both the substrate and protein mutants that strongly supports the conclusions of the PI-SceI docking model. However, a convincing demonstration that these proposed interactions occur must await the determination of the PI-SceI structure complexed to its recognition site.
… Fig. 4 and is from the vantage point of the lower right side of that figure. One of the two β-sheets (β9) proposed to act as a binding platform for the DNA lies in the foreground. The two Asp residues that compose part of the PI-SceI active site are positioned in the middle and are situated at the C-terminal ends of two parallel α-helices. On the other side of Asp 218 and Asp 326 is located the pseudosymmetrically related β-sheet (β7), which comprises the binding platform together with β9. The side chains of Tyr 328 and Lys 340 are seen to extend upwards into the proposed binding cleft. On the two sides of the binding cleft are structures that rise above the platform and include extended loops. Each of the amino acids where substitutions affect activity is numbered, and their side chains are shown. B, the opposite view of the binding platform, looking from the vantage point of the upper left side of Fig. 4. One of the extended loops situated above the active site includes amino acids Lys 369 and His 377.

The results presented here are the first to show that the PI-SceI protein splicing domain is involved in site-specific substrate binding. We hypothesized that the PI-SceI intein gene originally arose by the fusion of two pre-existing genes, one that encoded an endonuclease and the other that encoded a splicing protein (6). Surprisingly, the recently determined structure of the autoprocessing domain of the Drosophila Hedgehog protein is very similar to domain I of PI-SceI, but it lacks the PI-SceI DNA recognition region. Instead, it contains an unrelated region that binds cholesterol (20). This suggests that the protein splicing domain existed previously as a core protein that acquired new functions in different instances by associating with new sequences (20). In the case of PI-SceI, it raises the possibility that the DNA binding region was acquired after the intein was assembled. Presumably, the acquired abil-