Probing the Structure of the PI-SceI-DNA Complex by Affinity Cleavage and Affinity Photocross-linking*

The PI-SceI protein is an intein-encoded homing endonuclease that initiates the mobility of its gene by making a double strand break at a single site in the yeast genome. The PI-SceI protein splicing and endonucleolytic active sites are separately located in each of two domains in the PI-SceI structure. To determine the spatial relationship between bases in the PI-SceI recognition sequence and selected PI-SceI amino acids, the PI-SceI-DNA complex was probed by photocross-linking and affinity cleavage methods. Unique solvent-accessible cysteine residues were introduced into the two PI-SceI domains at positions 91, 97, 170, 230, 376, and 378, and the mutant proteins were modified with either 4-azidophenacyl bromide or iron (S)-1-(p-bromoacetamidobenzyl)-ethylenediaminetetraacetate (FeBABE). The phenyl azide-coupled proteins cross-linked to the PI-SceI target sequence, and the FeBABE-modified proteins cleaved the DNA proximal to the derivatized amino acid. The results suggest that an extended β-hairpin loop in the endonuclease domain that contains residues 376 and 378 contacts the major groove near the PI-SceI cleavage site. Conversely, residues 91, 97, and 170 in the protein splicing domain are in close proximity to a distant region of the substrate. To interpret our results, we used a new PI-SceI structure that is ordered in regions of the protein that bind DNA. The data strongly support a model of the PI-SceI-DNA complex derived from this structure.

Homing endonucleases are a class of enzymes encoded by introns and inteins that initiate the mobility of their genetic elements to sites in recipient alleles where the elements are absent (reviewed in Ref. 1). These extremely specific enzymes cleave the recipient loci to stimulate a gene conversion process that copies the homing endonuclease coding sequence into the broken chromosome as it is repaired. The end result is that the intron or intein that encodes the endonuclease is propagated throughout the population. Enzymes in the LAGLIDADG family of homing endonucleases are characterized by the presence of one or two conserved dodecapeptide sequences. One member of this family is the PI-SceI homing endonuclease from Saccharomyces cerevisiae (2,3), which occurs as an intein within a vacuolar H+-ATPase subunit and is generated by an autocatalytic protein splicing reaction (4,5).
Like most LAGLIDADG homing endonucleases, PI-SceI recognizes an extremely long sequence (≥31 bp) and cleaves the DNA to generate a 4-bp, 3′ overhang (3,6,7). Although the structure of the PI-SceI-DNA complex has not been determined, evidence from biochemical studies (6-8), from the structure of the PI-SceI apoenzyme (9), and from the structure of a related endonuclease, I-CreI, complexed to its DNA substrate (10) provides clues to general features of the interaction. The PI-SceI structure is composed of two domains that contain the protein splicing (domain I) and endonucleolytic active sites (domain II), respectively (9). The two LAGLIDADG motifs constitute two tightly packed α-helices that are each contributed by one of two subdomains related by pseudo-2-fold symmetry and form the hydrophobic core of the endonuclease domain. Two conserved acidic residues (Asp-218 and Asp-326) at the carboxyl termini of these helices may function to coordinate the Mg2+ co-factor(s) that is involved in the DNA cleavage mechanism (11). The group I intron-encoded I-CreI endonuclease is structurally similar to the PI-SceI endonuclease domain (12,13). However, I-CreI differs from PI-SceI in two significant respects: it lacks a protein splicing domain, and it is a homodimeric protein composed of two molecules that are related by 2-fold symmetry (14), each of which is topologically similar to the N or C subdomains of the PI-SceI endonuclease domain. Within the I-CreI-DNA complex, the DNA rests in a saddle formed by β sheets contributed by both monomers and contacts the protein over an extended surface area (10).
We proposed a hypothetical model for the PI-SceI-DNA complex in which the endonuclease domain contacts the PI-SceI cleavage site region of the DNA, and an extended subdomain of the protein splicing domain termed the DNA recognition region (DRR) binds to a distal section of the target sequence (9). When the I-CreI-DNA complex structure was determined, we modified the model by docking the I-CreI DNA to PI-SceI (15). This and a related model (16) are consistent with mutational studies of the protein and DNA substrate (6,17), and with interference and footprinting analyses (6,11). Furthermore, they are consistent with cross-linking and affinity cleavage mapping studies that position His-333 of the endonuclease domain near T+9 on the bottom strand (16) and position residues 271-279 of a disordered loop near A/T−9 (15). However, detailed mapping information is not available for any other part of the PI-SceI-DNA interaction. Here, we have tethered the affinity cleavage reagent FeBABE and the photocross-linking reagent phenyl azide to residues in both PI-SceI domains that are expected to be near the protein-DNA contact surface and have employed these derivatized proteins to map these positions to specific regions of the PI-SceI target sequence. The data have been interpreted using a model of the PI-SceI-DNA complex based on a new PI-SceI crystal structure.

EXPERIMENTAL PROCEDURES
Materials-Oligonucleotides used for mutagenesis were synthesized by Sigma. Restriction and DNA-modifying enzymes were obtained from New England Biolabs, Inc. TALON metal affinity resin was purchased from CLONTECH, SP-Sepharose was obtained from Amersham Pharmacia Biotech, and Affi-Gel Blue Gel was obtained from Bio-Rad. Ascorbic acid (vitamin C, microselect grade) was purchased from Fluka, and hydrogen peroxide (ultrex grade) was obtained from J. T. Baker. 7-(Diethylamino)-3-(4′-maleimidylphenyl)-4-methylcoumarin was purchased from Molecular Probes, Inc. (Eugene, OR). 4-Azidophenacyl bromide and piperidine were obtained from Sigma. FeBABE was kindly provided by Dr. Claude Meares.
Crystallization of PI-SceI-The procedures for overexpression and purification of selenomethionyl PI-SceI have been described previously (9). The C2 space group crystal was grown during attempts to produce crystals of the PI-SceI-DNA complex. Selenomethionyl PI-SceI protein was mixed in a 1:1.5 molar ratio with an 18-base pair duplex containing the PI-SceI minimal recognition site (6) and incubated for 30 min prior to setting up the crystallization. An equal volume of the complex was mixed with mother liquor containing 20% polyethylene glycol 3350, 100 mM Tris-Cl (pH 8.3), 100 mM KCl, and 200 mM MgCl2, and the crystals were grown using the hanging drop vapor diffusion method. The crystals belong to space group C2 and have unit cell parameters a = 93.6 Å, b = 76.0 Å, c = 71.4 Å, and β = 111.3°. There is one molecule per asymmetric unit. The crystal was stored overnight in mother liquor containing 30% polyethylene glycol 3350 and frozen by flash-cooling.
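As a quick consistency check on the cell parameters quoted above, the volume of a monoclinic cell follows from V = a·b·c·sin β. The short Python sketch below is illustrative only; the function name is hypothetical, and the division by four assumes the four general positions of space group C2 combined with the one molecule per asymmetric unit stated in the text.

```python
import math


def monoclinic_cell_volume(a, b, c, beta_deg):
    """Volume (in cubic angstroms) of a monoclinic cell: V = a*b*c*sin(beta)."""
    return a * b * c * math.sin(math.radians(beta_deg))


# Cell parameters reported in the text for the C2 crystal form.
volume = monoclinic_cell_volume(93.6, 76.0, 71.4, 111.3)

# Assumption: C2 has four general positions, so one molecule per asymmetric
# unit corresponds to four molecules per unit cell.
volume_per_molecule = volume / 4

print(f"cell volume: {volume:.0f} A^3")
print(f"volume per molecule: {volume_per_molecule:.0f} A^3")
```

The volume per molecule is the quantity one would normally divide by the molecular mass to estimate solvent content; that step is omitted here because the mass is not given in this section.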
Data Collection, Structure Determination, and Refinement-Data were collected from one frozen crystal at the HHMI X4A beam line of the National Synchrotron Light Source at wavelength 0.9791 Å with a 1° oscillation angle using a Raxis-4 system from a distance of 250 mm. The data were indexed and integrated using DENZO and scaled using SCALEPACK (18).
Molecular replacement was effected using AMoRe from the CCP4 package (19). The 2.4-Å PI-SceI structure of the P2₁ space group crystal was used without water molecules as the search model using the data from 15 to 4 Å. The search box was 80 × 80 × 80 Å with a Patterson radius of 25 Å. One outstanding peak (Eulerian angles: 1 = 95.81°, 2 = 65.59°, 3 = 340.87°) that was found in a cross-rotation search remained as the best peak after a translation function search with fractional coordinates x = 0.3246 and z = 0.3570 and an initial R-factor of 41.2%.
The model from molecular replacement was refined with XPLOR (20) using data from 10 to 2 Å. The simulated annealing procedure was used after rigid body refinement. After several cycles of alternating positional and B-factor refinements and manual model building, water molecules were added using the graphics program O (21). The assignment of the secondary structure elements was performed using PROCHECK (22).
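For readers unfamiliar with the R-factor quoted above, the conventional definition is R = Σ| |Fobs| − |Fcalc| | / Σ|Fobs|, summed over reflections. The following minimal Python sketch shows the arithmetic on made-up amplitudes; it is not the procedure used by AMoRe or XPLOR, it assumes the calculated amplitudes are already on the same scale as the observed ones, and the function name is hypothetical.

```python
def r_factor(f_obs, f_calc):
    """Conventional crystallographic R-factor over matched reflections.

    Assumes f_calc has already been scaled to f_obs (no scale factor is
    fitted here).
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must list the same reflections")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


# Toy structure-factor amplitudes (illustrative values, not data from this work).
toy_obs = [100.0, 50.0, 25.0, 25.0]
toy_calc = [90.0, 60.0, 20.0, 30.0]
toy_r = r_factor(toy_obs, toy_calc)
print(f"R = {toy_r:.1%}")
```

A free R-factor is the same quantity evaluated only on reflections that were excluded from refinement.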
Comparison of the PI-SceI Structures-The two PI-SceI structures derived from the two different space group crystals were compared using the Lsq option in the graphics program O (21). The program uses an algorithm that searches for the longest matching fragments between the two structures that can be aligned with a given cut-off limit. Atoms were considered to be structurally equivalent if they were within 3.8 Å and within a continuous stretch of at least three equivalent atoms.
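The equivalence criterion described above (paired atoms within 3.8 Å that fall in a continuous stretch of at least three) can be sketched as follows. This is a simplified illustration, not the Lsq algorithm in O: it assumes the two coordinate sets are already superimposed and paired by residue, and all names are hypothetical.

```python
import math


def equivalent_positions(coords_a, coords_b, cutoff=3.8, min_run=3):
    """Indices of paired atoms within `cutoff` angstroms that also belong to a
    run of at least `min_run` consecutive close pairs."""
    close = [math.dist(p, q) <= cutoff for p, q in zip(coords_a, coords_b)]
    kept, run = [], []
    for i, is_close in enumerate(close + [False]):  # sentinel flushes last run
        if is_close:
            run.append(i)
        else:
            if len(run) >= min_run:
                kept.extend(run)
            run = []
    return kept


def rmsd(coords_a, coords_b, indices):
    """Root mean square deviation over the selected atom pairs."""
    squared = [math.dist(coords_a[i], coords_b[i]) ** 2 for i in indices]
    return math.sqrt(sum(squared) / len(squared))


# Toy C-alpha traces: the second is offset by 0.5 A except at index 3.
trace_a = [(3.8 * i, 0.0, 0.0) for i in range(6)]
offsets = [0.5, 0.5, 0.5, 10.0, 0.5, 0.5]
trace_b = [(x, y + d, z) for (x, y, z), d in zip(trace_a, offsets)]

matched = equivalent_positions(trace_a, trace_b)
print(matched, rmsd(trace_a, trace_b, matched))
```

In the toy data, positions 4 and 5 are close but form a run of only two, so they are excluded along with the outlier at position 3.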
Site-directed Mutagenesis and Purification of Mutant Proteins-The construction of a gene that encodes a PI-SceI derivative, PI-SceI (−5Cys), in which five of the six naturally occurring cysteines (Cys-1, Cys-17, Cys-249, Cys-398, and Cys-416) have been changed to serine has been described previously (15). Cys-75 is not solvent-accessible and is the only remaining native cysteine in this protein. Cysteines were individually introduced into PI-SceI (−5Cys) at positions Arg-91, Lys-97, His-170, Ser-230, Lys-376, and Lys-378 by an overlapping amplification protocol (23). The same method was used to construct the H170A and K53A mutant proteins. All mutations were verified by dideoxy sequencing. The wild-type and mutant PI-SceI derivatives were overexpressed and purified by Co2+-metal affinity, SP-Sepharose ion exchange, and Affi-Gel Blue gel chromatography essentially as described (17) except that reducing agent was omitted during purification. Metals were removed by dialysis against a buffer containing 10 mM Tris-Cl (pH 8.0), 10 mM EDTA, 100 mM KCl, 5% glycerol followed by dialysis against conjugation buffer (10 mM HEPES (pH 8.0), 100 mM KCl, 5% glycerol, and 1 mM EDTA). Protein concentrations were determined by a micro-Bradford assay.
Assay of DNA Cleavage Activity-DNA cleavage assays were performed as described previously under conditions of protein excess relative to substrate (17). Wild-type PI-SceI or PI-SceI derivative proteins (100 nM) were combined with a linearized plasmid substrate (7 nM) containing a single PI-SceI recognition site, and incubated at 37°C for 10 min. Cleavage rates for the H170A and H170C proteins were determined as described (17).
Assay of DNA Binding Activity-Native gel mobility shift assays were performed using a 219-bp DNA duplex that contains a single PI-SceI recognition site as described earlier (17).
Modification of the Purified Proteins-Modification of PI-SceI with 4-azidophenacyl bromide was performed using a modified version of a published procedure (24) by combining the reagent with the protein at a 20:1 molar ratio for 3 h at room temperature in the dark. To remove unreacted 4-azidophenacyl bromide, the derivatized proteins were dialyzed overnight against 25 mM HEPES (pH 8.0), 100 mM KCl, 0.1 mM EDTA, and 5% glycerol. To conjugate PI-SceI to FeBABE, protein (1-5 mg/ml) was mixed with 1 mM FeBABE and allowed to incubate for 1 h at room temperature as described (25). Excess reagent was removed by dialysis. The extent of modification by FeBABE or by 4-azidophenacyl bromide was determined using the fluorescent reagent 7-(diethylamino)-3-(4′-maleimidylphenyl)-4-methylcoumarin according to a published procedure (26).
Phenyl Azide-mediated Photocross-linking-DNA substrates that were 5′-end-labeled on either the top or bottom strands were prepared by digesting a labeled 219-bp polymerase chain reaction product containing a single PI-SceI recognition site with either EcoO109I or SacI, respectively, and by purifying the 158- and 187-bp fragments on a polyacrylamide gel (15). PI-SceI-DNA complexes were formed by incubating the modified protein derivatives (200 nM) with these labeled DNA duplexes (~0.5 nM) for 20 min in the dark in a buffer containing 25 mM HEPES (pH 8.0), 100 mM KCl, and 0.1 mM EDTA. To effect cross-linking, the mixture was UV-irradiated for 2 min using a Fotodyne transilluminator (312 nm) from a distance of ~13 cm. The reactions were heated at 70°C for 10 min and extracted with phenol/chloroform (4:1, v/v), and the phenolic phase was washed with 1 M Tris-Cl (pH 8.0), 1% SDS. The DNA products were ethanol-precipitated, cleaved with piperidine (1 M) for 30 min at 90°C, and resolved by high resolution gel electrophoresis on an 8 M urea, 6% polyacrylamide denaturing gel adjacent to a G + A sequencing ladder (27).
FeBABE-mediated Affinity Cleavage-Complexes of FeBABE-modified PI-SceI derivatives and the end-labeled 158- and 187-bp DNA fragments were generated as described previously (15) except that the buffer contained 10 mM HEPES (pH 8.0), 100 mM KCl, 1 mM EDTA, and 12% glycerol. DNA cleavage by FeBABE was initiated by adding sodium ascorbate (pH 7.0) and hydrogen peroxide to a final concentration of 5 mM and by allowing the reaction to proceed for 2 min at room temperature. The cleavage products were ethanol-precipitated and resolved by electrophoresis on an 8 M urea, 6% polyacrylamide denaturing gel. The dried gels were scanned using a PhosphorImager (Molecular Dynamics, Inc., Sunnyvale, CA) and analyzed using Fragment Analysis software (Molecular Dynamics).
Modeling of the PI-SceI-DNA Interaction-The DNA is derived from the I-CreI DNA structure (1bp7; Ref. 10) by adopting base pairs −10 through +12. This structure was extended from base pair +13 to +22 with a model created using the software program NAMOT (28). The DNA roll parameters of the extension were varied in NAMOT to create a DNA structure that satisfied the experimental constraints. The DNA was docked to the C2 space group crystal structure of PI-SceI using methods described previously (15). The DNA curvature was determined using CURVES (29).

RESULTS
Structure of a C2 Space Group Crystal of PI-SceI-A new crystal form of the PI-SceI protein was obtained during attempts to crystallize the PI-SceI-DNA complex (30). Surprisingly, although the crystallization mixture included an oligonucleotide duplex, no DNA was present in the dissolved crystals (30). The new crystals belong to the C2 space group, while those grown previously belong to the P2₁ space group. There is one instead of two molecules per asymmetric unit. However, the one molecule dimerizes with one of its crystallographic symmetry-related molecules, and the resulting dimer overlaps the dimer observed in the asymmetric unit of the P2₁ crystal (30). The structure of the C2 space group crystal was determined by a molecular replacement method and refined at 2.0-Å resolution to an Rcryst and an Rfree of 21.2% and 28%, respectively (Table I).
The overall PI-SceI structure determined from the C2 space group crystal is very similar to the originally reported structure (Fig. 1), and the root mean square deviation between the C-α atoms of the two structures is 0.9 Å. The same bipartite domain architecture is evident in both structures, and they adopt the same secondary structure folds. The Hedgehog-intein (HINT) module, which is a shared protein fold in the PI-SceI protein splicing domain and in the autoprocessing domain of the Drosophila hedgehog protein (31), overlaps extensively in the C2 and P2₁ structures with the exception of a loop that connects β6 to β7 (Fig. 1). Residues 135-151, which are part of the DNA recognition region (DRR), are disordered in the C2 structure, but fold in the P2₁ structure into the four-turn helix α1 and a loop that connects α1 to β12. Conversely, residues 93-102, which spatially neighbor residues 135-151 in the DRR, are disordered in the P2₁ structure but show density in the C2 structure and form a loop that connects β9 and β10. A second disordered region in the P2₁ structure occurs in the N subdomain of the endonuclease domain that includes residues 271-279, which connect β16 to α6. The disordered segment is further extended in the C2 structure to include β16 and the loop that precedes β16 (residues 254-280). Finally, the extended β-hairpin loop (residues 369-374) between β21 and β22 that is located above the active site is disordered in the P2₁ but not the C2 structure.
Location of Introduced Cysteine Residues in the C2 Structure-To map interactions between PI-SceI and its recognition sequence, we conjugated FeBABE and/or phenyl azide moieties to surface residues that are in close proximity to amino acids that are critical for PI-SceI binding to DNA. Residues were chosen in the context of the C2 structure because it is ordered in many of the DNA binding regions. Fig. 2 displays the C2 structure of PI-SceI and identifies residues Arg-91, Lys-97, His-170, Ser-230, Lys-376, and Lys-378 that were individually substituted with cysteine and then chemically modified. Substitution of residues Arg-91, Lys-97, Ser-230, Lys-376, and Lys-378 with alanine does not affect the PI-SceI-mediated DNA cleavage activity (17). Arg-91 and Lys-97 neighbor residues Arg-90 and Arg-94 in the protein splicing domain, which contribute to the binding interaction between the DRR and DNA. Lys-376 and Lys-378 are part of the same extended β21-β22 loop as Lys-369 and His-377, which participate in the interaction between the endonuclease domain and the PI-SceI cleavage site region (17). On the opposite side of the DNA-binding cleft from Lys-369 and His-377, Ser-230 is bordered by residues Asp-229 and Arg-231 in a loop between β14 and α5 (17). The exact role of Asp-229 and Arg-231 is unclear, but they are both critical for the PI-SceI reaction pathway (17). His-170 occurs in the interdomain region that is part of a ridge of positively charged residues, including Lys-53, His-170, and Lys-173, that extends along the putative DNA contact surface from the endonuclease domain toward the DRR. Unlike the other modified amino acids, His-170 may contribute to DNA binding because alanine substitution at this position reduces the cleavage rate 10-fold relative to wild-type PI-SceI (Fig. 3). However, the magnitude of this effect is lower than that caused by other PI-SceI DNA-binding mutants (17). His-170 spatially neighbors Lys-53, where substitution with alanine reduces the cleavage rate >15-fold relative to wild-type PI-SceI.
The affinity cleavage and photocross-linking methods described in this report require that cysteines be modified in a PI-SceI protein that lacks other solvent-accessible cysteine residues. For this purpose, we utilized a recombinant derivative (PI-SceI (−5Cys)) in which five of the six naturally occurring cysteines have been changed to serine and the sole remaining cysteine is buried in the protein interior distant from the putative DNA binding surface (15). We showed previously that the DNA binding and cleavage activities of this variant are similar to wild-type PI-SceI and that the protein is not modified by FeBABE (15).
DNA Cleavage and Binding Activities of Cysteine-substituted PI-SceI Variants-Each of the cysteine-substituted PI-SceI proteins was purified using our standard expression and purification protocols (17). The proteins were tested in vitro for their DNA cleavage activities using a linearized plasmid substrate that contains a single PI-SceI recognition sequence (15). Similar levels of PI-SceI-mediated DNA cleavage activity are apparent for all of the mutant proteins except for the K378C derivative, which is reduced 2-fold in activity relative to PI-SceI (−5Cys), and the H170C variant, which is reduced 5-fold (Fig. 3). Binding activity was measured by gel mobility shift assays of a 219-bp end-labeled DNA fragment containing a single PI-SceI site. Wild-type PI-SceI forms two complexes with this substrate: a lower complex (PDLC) that involves PI-SceI binding predominantly to a region of the substrate that is distant from the cleavage site and an upper complex (PDUC) in which PI-SceI binds to this region and the cleavage site region (6, 7). The different mobilities of the complexes reflect their differing extents of DNA distortion; the lower complex is bent 40-45°, while the upper complex is bent 60-75° (6, 7). The DNA binding activities of the R91C, K97C, S230C, and K376C proteins are comparable with the PI-SceI (−5Cys) protein, but those of the K378C and H170C variants differ (Fig. 4). Total DNA binding by the K378C protein is similar to that of PI-SceI (−5Cys), but the ratio of the upper to lower complex species is inverted. This observation is interpreted in light of the PI-SceI reaction pathway model (17) to mean that the K378C mutation partially reduces DNA binding of the endonuclease domain only. By contrast, the H170A and H170C mutations decrease the amounts of both complexes, which is consistent with the removal of a binding contact in the splicing domain.
Affinity Cleavage in Vitro Using FeBABE-conjugated PI-SceI Derivatives-To effect iron·EDTA affinity cleavage, FeBABE is first covalently tethered to a single solvent-accessible cysteine that is in close proximity to, but not part of, the DNA binding surface, and the resulting FeBABE-conjugated derivatives are used to form protein-DNA complexes. In the presence of hydrogen peroxide and reducing agent, FeBABE generates reactive hydroxyl radicals that abstract the deoxyribose C-1′ or C-4′ hydrogen atoms of nucleotides in close proximity to the FeBABE moiety, ultimately leading to strand scission (25,32,33). The loci of maximal cleavage are shifted toward the 3′-end when the iron group is in the minor groove due to the minor groove accessibility of the reactive deoxyribose C-1′ or C-4′ bonds (34). Conversely, a 5′-shifted pattern results when FeBABE is in the major groove, because the hydroxyl radicals must diffuse to the two adjacent minor grooves. The distance between the C-α atoms of the introduced Cys residues and the FeBABE-mediated DNA cleavage site is expected to be <22 Å, which reflects in part the 12-Å length of FeBABE (25). For reasons that are unclear, FeBABE could not be successfully conjugated in high yield to the H170C protein. Figs. 3 and 4 show that modification with FeBABE affects neither the DNA cleavage activity nor the DNA binding activities of most of the PI-SceI derivatives. Conjugation of FeBABE to the K378C protein causes no further reduction in its activity relative to the unmodified enzyme.
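The two interpretive rules described above, a <22 Å bound on the distance from the C-α atom to a cleaved nucleotide and the direction of the strand offset, can be expressed as a small Python sketch. It is illustrative only: the function names are hypothetical, and the offset rule assumes that positions are numbered so that they increase 5′ to 3′ along the top strand, which means a 3′ shift on both strands places the top-strand cuts at higher numbers than the bottom-strand cuts.

```python
import math

FEBABE_MAX_DISTANCE = 22.0  # angstroms, the upper bound quoted in the text


def within_febabe_reach(ca_xyz, target_xyz, max_dist=FEBABE_MAX_DISTANCE):
    """True if a model places the C-alpha atom within reach of the target."""
    return math.dist(ca_xyz, target_xyz) < max_dist


def groove_from_offset(top_center, bottom_center):
    """Infer the groove occupied by the iron group from the strand offset.

    Assumes numbering increases 5' to 3' on the top strand (hypothetical
    convention for this sketch).
    """
    offset = top_center - bottom_center
    if offset > 0:
        return "minor"  # cleavage shifted toward the 3'-end of each strand
    if offset < 0:
        return "major"  # cleavage shifted toward the 5'-end of each strand
    return "undetermined"


# Midpoints of the strongest FeBABE-PI-SceI(91) cleavage windows reported in
# the text: +14 to +17 on the top strand and +9 to +12 on the bottom strand.
call_91 = groove_from_offset((14 + 17) / 2, (9 + 12) / 2)
print(call_91)
```

The distance test needs model coordinates, which are not tabulated in this section, so only the offset rule is applied to reported data here.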
No significant affinity cleavage is observed on either DNA strand in the control reaction that includes PI-SceI (−5Cys) protein that is either treated with FeBABE or untreated (Fig. 5). By contrast, four of the five FeBABE-PI-SceI conjugates, FeBABE-PI-SceI⁹¹, FeBABE-PI-SceI²³⁰, FeBABE-PI-SceI³⁷⁶, and FeBABE-PI-SceI³⁷⁸, yield different levels of DNA cleavage at discrete sites within the substrate (Figs. 5 and 7). FeBABE-PI-SceI⁹¹ generates a moderate signal, with the strongest cleavage occurring between +14 and +17 on the top strand and between +9 and +12 on the bottom strand. This result is consistent with genetic evidence that indicates that the neighboring residues Arg-90 and Arg-94 are involved in PI-SceI binding to this region of the substrate (17). Somewhat surprisingly, no FeBABE-mediated cleavage occurs when residue 97 is modified with the chemical nuclease.
FeBABE-PI-SceI²³⁰, FeBABE-PI-SceI³⁷⁶, and FeBABE-PI-SceI³⁷⁸ each cleave the PI-SceI substrate proximal to the PI-SceI cleavage site, but important distinguishing features are evident. FeBABE-PI-SceI²³⁰ generates the most intense cleavage pattern on both strands. The top strand is cut with high yield between positions +2 and +6 and to a lesser extent between −9 and −6. The most intense cleavage of the bottom strand occurs between nucleotide positions −5 and +1, and lower levels of cleavage are evident between −14 and −11.
Photocross-linking Using Phenylazide-coupled PI-SceI Derivatives-Phenyl azide-mediated photocross-linking was used as a complementary means of identifying interacting regions between PI-SceI and its recognition sequence. The major difference between the photocross-linking and affinity cleavage methods is that a phenyl azide photoactivable cross-linking moiety rather than an FeBABE group is coupled to the solvent-accessible cysteines on PI-SceI. Protein-DNA complexes are formed using the derivatized PI-SceI variants, the protein is cross-linked to the DNA by UV irradiation, and the complexes are treated with piperidine, which effects strand scission at the cross-linked nucleotides and permits their identification by gel electrophoresis. Unlike FeBABE-mediated affinity cleavage, in which the reactive agent is diffusible, cross-linking by phenyl azide groups requires direct contact between the reactive species and a DNA nucleophile. The distance between the C-α atom and the target base has been estimated as 9-12 Å (36,37).
PI-SceI variants with single solvent-accessible cysteine residues at positions 91, 97, 170, 230, 376, and 378 and the PI-SceI (−5Cys) protein were modified with 4-azidophenacyl bromide as described in previous reports (24,38). Interestingly, the PI-SceI (−5Cys) protein, which contains no introduced cysteine residues, cross-links to the 219-bp substrate on both strands at nucleotide position +3 (Figs. 6 and 7). This cross-linking requires both modification of the protein with 4-azidophenacyl bromide and irradiation with UV light. It is unlikely that the cross-linked species results from reaction of DNA with the sole remaining cysteine in the molecule (Cys-75), because this residue is buried in the interior of the protein splicing domain. Rather, we presume that another reactive residue in PI-SceI in the vicinity of the endonucleolytic active site may be modified and cross-link to DNA.
Of the phenyl azide-conjugated protein variants, three cross-link to DNA at sites that are distinct from that of PI-SceI (−5Cys). Cross-linking occurs between azidophenacyl-PI-SceI⁹⁷ and A+20 and A+22 on the top and bottom strands of the substrate, respectively (Figs. 6 and 7). Cross-links also occur to the adjacent base on each strand at A+19 and T+21. Azidophenacyl-PI-SceI¹⁷⁰ cross-links to A+15 on the top strand and to T+16 on the bottom strand. Of the azidophenacyl-PI-SceI conjugates modified in the endonuclease domain, the only observed cross-link occurs between azidophenacyl-PI-SceI³⁷⁶ and C+1 on the bottom strand.

DISCUSSION
The x-ray crystal structure of the PI-SceI protein has been determined, but no structure is yet available for the protein bound to its DNA substrate. Previously, we presented a model for the interaction of PI-SceI with DNA (15) using information from the recently reported structure of the related LAGLIDADG endonuclease I-CreI complexed to its recognition sequence (10). I-CreI is structurally analogous to the PI-SceI endonuclease domain only, and as a consequence, the PI-SceI model describes how this domain contacts the DNA region surrounding the PI-SceI cleavage site. To elucidate additional features of the PI-SceI-DNA complex, we employed affinity cleavage and UV photocross-linking methods to map particular residues to regions of the PI-SceI substrate. We have interpreted these results by constructing a DNA docking model that takes advantage of the new C2 PI-SceI structure in which the modified residues are ordered. The model also depicts how the DRR might interact with DNA.
Docking of I-CreI DNA to PI-SceI was accomplished by least-squares superposition of the C2 PI-SceI structure onto the I-CreI-DNA complex structure (Fig. 8). The resulting model fulfills the constraints imposed by extensive mutational (6,11,17), biochemical (6-8, 11, 16, 17), and structural studies (9). The DNA is positioned within a large cleft of the endonuclease domain on two symmetrically related β-sheets. Residues Lys-340 and Tyr-328, which have been identified by alanine-scanning analysis as potential DNA binding contacts (17), are positioned close to two base pairs, G/C+3 and G/C+4, that are required for PI-SceI binding (6). The two conserved acidic residues, Asp-218 and Asp-326, that are thought to ligate the essential metal ion co-factor(s) (11) are located near the two scissile phosphodiester bonds. Furthermore, the model indicates that His-333, which is located at one edge of the endonuclease domain at the interdomain boundary, is near T+9 on the bottom strand, which is consistent with the observed cross-linking between these groups (16). Finally, the model includes two regions of DNA distortion (6,7), one near the cleavage site region that is intrinsic to the I-CreI DNA and one near the DRR. The overall amount of DNA distortion is ~55° as measured by the program CURVES (29). This amount may be lower than the experimentally determined value (60-75°; Refs. 6 and 7), because the 10° DNA bend induced by I-CreI may underrepresent the degree of bending induced by the PI-SceI endonuclease domain.
An important prediction of the model is that the extended β21-β22 hairpin loop is located in the major groove adjacent to the PI-SceI cleavage site. This loop is not perfectly positioned in the major groove in the model but instead sterically clashes with one of the DNA strands. The loop is disordered in the P2₁ structure, suggesting that it is extremely flexible in the absence of DNA, and we propose that its actual conformation in the complex may be different from that in the C2 crystal form. That this loop is located in the major groove is supported by the observed cross-linking of azidophenacyl-PI-SceI³⁷⁶ to C+1 on the bottom strand. The affinity cleavage data indicate that the reactive iron group on FeBABE, whether tethered to Cys-376 or to Cys-378, is situated over the DNA minor groove (Fig. 8). If the β21-β22 loop is within the major groove, the FeBABE moiety may extend across the minor groove where cleavage occurs. The 2-bp offset between the sites of maximum DNA cleavage by FeBABE-PI-SceI³⁷⁶ and FeBABE-PI-SceI³⁷⁸ reflects the slightly different positions of the two residues relative to the DNA (Fig. 8). Additional support for the location of the β21-β22 loop in the major groove comes from a methylation interference analysis that demonstrates major groove interactions at G+1 and G+3 on the top strand (6).
What role might the extended loop play in the overall PI-SceI reaction pathway? A possible scenario involves initial localization of PI-SceI to the recognition sequence through base-specific interactions made by residues within the DRR, followed by clamping of the flexible β21-β22 loop onto the DNA to position the endonucleolytic active site near the scissile phosphodiester bonds. Mutagenesis studies suggest that some of these contacts involve residues Lys-369 and His-377 and base pair G/C+1, and we speculate that the contacts made by the extended loop contribute a significant portion of the DNA binding energy. The β21-β22 loop in PI-SceI is significantly longer than the structurally analogous β-hairpin loops in two other LAGLIDADG homing endonucleases, I-CreI and I-DmoI (39). In a comparison of these enzymes, it has been commented that the size of the loops at the periphery of the endonuclease domains defines the extent of the recognition site, whereas the size of the inner loops, such as the PI-SceI β21-β22 loop or its analogues, determines the substrate specificity (39). Thus, the extended loop would be a reasonable target in efforts to re-engineer the PI-SceI specificity. On the opposite side of the DNA binding cleft from the β21-β22 loop, the model shows that Ser-230 is in close proximity to the nucleotides that are cut by FeBABE-PI-SceI²³⁰ (Fig. 8). Ser-230 falls within a short loop between β14 and α5 with three residues, Asp-229, Arg-231, and Asp-232, that participate in the reaction pathway (17).
No high resolution structure exists for any DNA that binds to a protein splicing domain, so we modeled a hypothetical DNA structure to represent the splicing domain-DNA interaction (Fig. 8). In extending the DNA from the end of the I-CreI DNA structure, three criteria were satisfied: 1) three base pairs (A/T+16, G/C+18, and A/T+19) and two amino acids (Arg-90 and Arg-94), defined by mutagenesis as essential for binding (6,17), were positioned in close proximity to each other; 2) the DNA was distorted in this region to conform to circular permutation studies (6, 7); and 3) protein-DNA contacts were located in the major groove near two guanines on the top strand, G+13 and G+18, where methylation at the N-7 position interferes with PI-SceI binding (6). This model resembles one that has been recently reported (16), with the important difference that it predicts a major groove interaction for the β9-β10 loop, which is invisible in the P2₁ crystal structure. Moreover, the model is consistent with ethylation interference and hydroxyl radical footprinting studies that indicate that residues in the splicing domain, such as Lys-53 and His-170, mediate DNA binding by making contacts to the phosphate backbone (6).
The affinity cleavage and photocross-linking data reported here are the first to map interactions between the DRR and the PI-SceI substrate. The close proximity of His-170 to positions A+15 and T+16 and of Lys-97 to positions A+20 and A+22 in the model can account for the observed cross-linking (Fig. 8). The affinity cleavage pattern of FeBABE-PI-SceI 91 is shifted toward the 3′-end of each strand, which indicates that the reactive moiety on this protein is situated over the minor groove. If residue 91 were located in the major groove, the FeBABE group might be expected to extend across the adjacent minor groove where cleavage occurs. Like the β21-β22 loop of the endonuclease domain, the DRR is likely to be flexible in the absence of DNA, so we cannot rule out that there are conformational differences between PI-SceI in the complex and the protein depicted in the model. Regardless, it is satisfying that there is good agreement between the model and the mapping data.
FIG. 8. Hypothetical model of PI-SceI bound to its DNA recognition sequence. A, the PI-SceI structure determined from the C2 space group crystal is shown docked to a model of the 31-bp PI-SceI recognition sequence. The two conserved LAGLIDADG motif α-helices within the PI-SceI endonuclease domain are colored magenta. The PI-SceI side chains of Arg-91 (gray), Lys-97 (green), His-170 (dark blue), Ser-230 (purple), Lys-376 (pink), and Lys-378 (red) within the protein splicing and endonuclease domains are depicted. The phosphates of bases that undergo affinity cleavage and/or affinity photocross-linking are colored the same as the side chains of residues that effect these reactions at these positions. Spokes that represent the DNA bases are colored red (adenine), yellow (thymine), green (guanine), and blue (cytosine). B, same as A but rotated 180° about a vertical axis. C, same as A but rotated 90° about a horizontal axis. The molecule is viewed down the 2-fold symmetry of the endonuclease domain. The figure was prepared using the program SPOCK (42).
In conclusion, PI-SceI employs a singular strategy to effect base-specific recognition of a target sequence that is significantly longer than those of many other homing endonucleases. In common with other LAGLIDADG homing endonucleases, PI-SceI makes base-specific contacts using residues within loops located in the endonuclease domain (39). However, it also establishes specific contacts nearly two helical turns distant from the PI-SceI cleavage site by using residues in the DRR of the protein splicing domain, which is absent from most homing endonucleases (40). The high affinity interaction permits the splicing domain to bind independently to the distal region of the PI-SceI substrate (16). It has been suggested previously that PI-SceI and related enzymes formed when an ancestral homing endonuclease gene, which may have encoded an enzyme that resembled I-CreI, became associated with a gene that encoded a protein splicing element (9, 41). The further addition of the DRR to the splicing domain and its associated or evolved DNA binding activity may have extended the specificity of PI-SceI, thereby permitting it to locate and cleave its recognition sequence within the context of the complex yeast genome.
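As a rough check on the statement that the DRR contacts lie nearly two helical turns from the cleavage site, the following minimal sketch converts base pair positions into helical turns and axial distance. It rests on assumptions that are not part of the reported data: idealized, straight B-form DNA (about 10.5 bp per turn and a 3.4 Å rise per base pair), and position numbers treated as simple offsets from the cleavage site region. The DNA in the model is distorted in this region, so the numbers are only indicative, and the function names are illustrative.

```python
# Idealized B-DNA parameters (textbook values, assumed here; the modeled
# DNA is distorted, so these are approximations only).
BP_PER_TURN = 10.5      # base pairs per helical turn
RISE_ANGSTROM = 3.4     # axial rise per base pair, in angstroms


def helical_turns(bp_offset, bp_per_turn=BP_PER_TURN):
    """Number of helical turns spanned by an offset of bp_offset base pairs."""
    return abs(bp_offset) / bp_per_turn


def axial_distance(bp_offset, rise=RISE_ANGSTROM):
    """Distance along the helix axis (angstroms) for an offset of bp_offset base pairs."""
    return abs(bp_offset) * rise


if __name__ == "__main__":
    # Base pairs defined by mutagenesis as essential for binding by the
    # splicing domain, treated as offsets from the cleavage site region
    # (an assumption made for this illustration).
    for position in (16, 18, 19):
        print(
            f"+{position}: {helical_turns(position):.2f} turns, "
            f"{axial_distance(position):.1f} A along the axis"
        )
```

Under these assumptions, positions +16 to +19 fall between roughly one and a half and two turns from the cleavage site region, consistent with the description in the text.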