Structures of the Arm-type Binding Domains of HPI and HAI7 Integrases*

The structures of the N-terminal domains of two integrases of closely related but not identical asn tDNA-associated genomic islands, Yersinia HPI (high pathogenicity island; encoding siderophore yersiniabactin biosynthesis and transport) and an Erwinia carotovora genomic island with yet unknown function, HAI7, have been resolved. Both integrases utilize a novel four-stranded β-sheet DNA-binding motif, in contrast to the known proteins that bind their DNA targets by means of three-stranded β-sheets. Moreover, the β-sheets in IntHPI and IntHAI7 are longer than those in other integrases, and the structured helical N terminus is positioned perpendicularly to the large C-terminal helix. These differences strongly support the proposal that the integrases of the genomic islands make up a distinct evolutionary branch of the site-specific recombinases that utilize a unique DNA-binding mechanism.

Genomic islands, together with temperate phages, integrative plasmids, transposons, and integrative conjugative elements, make up the group of mobile genetic elements that play an important role in bacterial quantum leap evolution and adaptation (1-3). Genomic islands that carry clustered genes encoding vital functions supply bacteria with additional capabilities to withstand and overcome host defenses and to improve fitness. Genomic islands are integrative elements that are not able to self-transfer and replicate. Typically, they are composed of functional and recombination modules. The recombination module consists of a tyrosine family integrase and two attachment sites involved in recombination. The integrase promotes attP × attB site-specific DNA recombination of the genomic islands into highly conserved tRNA-encoding genes (attB recombination targets) of the host genome and subsequent excision (4,5).
It has been demonstrated that the N-terminal domain of phage integrases is responsible for specific recognition of the arm-type site sequence of the attachment sites, a step that is essential for activity of the catalytic C-terminal domain responsible for the strand exchange. Three-dimensional structures of the N-terminal domains of two prokaryotic integrases, namely bacteriophage λ integrase and Tn916 transposon integrase, have been determined. Although they do not share significant sequence homology, both adopt similar structures and recognize the arm-type DNA site by inserting their N-terminal domain into the major groove of DNA (6-9). The N-terminal domain, consisting of a three-stranded antiparallel β-sheet, is proposed to be a new DNA-binding motif whose residue composition and position within the major DNA groove vary to alter specificity (6). Nevertheless, the genomic islands are evolutionarily divergent from phages and other mobile elements and represent a distinct mobile genetic element class. Moreover, island-encoded integrases are not closely related to phage integrases, as was expected previously (3,10).
Four closely related genomic islands, Yersinia HPI (high pathogenicity island; encoding siderophore yersiniabactin biosynthesis and transport) (11), Ecoc54N or the pks island (encoding the cytotoxic polyketide colibactin in uropathogenic Escherichia coli CFT073) (12), and two genomic islands with yet unknown functions in Erwinia carotovora, HAI7 and HAI13, contain highly similar but not identical integrases that recognize asn tDNA genes as their attB integration sites (13). In contrast to the highly conserved bacterial attB attachment site, the element-encoded attP sites and integrases are subject to sequence fluctuations. Parallel evolution of an integrase with its cognate attP sites leaves the evolved integrase unable to support recombination of a heterologous attP site or to substitute for a heterologous integrase that has the same bacterial target. In this work, we present new insights into the evolution of two site-specific genomic island integrases through the three-dimensional structures of their N-terminal domains, which utilize a novel four-stranded β-sheet DNA-binding motif.

EXPERIMENTAL PROCEDURES
Protein Expression and Purification-The construct of the arm-type binding domain of HPI integrase comprised residues 1-80, whereas the corresponding domain of the HAI7 integrase covered residues 1-100. Both constructs were cloned into the pET21b vector (Novagen). Native proteins were expressed in the BL21(DE3) strain grown on standard LB medium, whereas for the selenomethionine labeling, the B834(DE3) strain was used in a defined modified M9 medium with methionine replaced by selenomethionine. For all preparations, the medium was inoculated with freshly transformed overnight cultures, grown at 37 °C until A600 = 0.6-0.7 was reached, and induced with 1 mM isopropyl β-D-thiogalactopyranoside. Afterward, the temperature was lowered to 27 °C, and the cells were grown for ~16 h. After harvesting by centrifugation, cells were resuspended in lysis buffer (300 mM NaCl, 50 mM Na2HPO4, and 10 mM imidazole), broken by sonication, and centrifuged to remove cell debris. The clear supernatant was applied to nickel-nitrilotriacetic acid resin pre-equilibrated with lysis buffer. The column was then washed with lysis buffer supplemented with 20 mM imidazole, and the proteins were eluted with lysis buffer containing 250 mM imidazole. Eluted proteins were applied to a Superdex S-75 preparative gel filtration column in crystallization buffer (50 mM NaCl and 10 mM Tris, pH 7.6). Purity of the proteins was confirmed by SDS-PAGE and mass spectrometry.
Crystallization-Crystallization was performed with a sitting drop vapor diffusion method by mixing equal volumes of the proteins and reservoir solutions, using varied starting concentrations of the proteins. Crystals of the HPI integrase domain appeared typically after 24 h. The crystals chosen for the native data set originated from the condition containing 0.1 M trisodium citrate, pH 5.6, and 35% (v/v) tert-butyl alcohol. The selenomethionine crystals were harvested from a drop containing 0.2 M trisodium citrate, 0.1 M sodium cacodylate, pH 6.5, and 30% (v/v) isopropyl alcohol with cryoprotection with 2-methyl-2,4-pentanediol. Crystals of the HAI7 domain initially grew in 0.1 M MES, pH 6.0, and 3.2 M ammonium sulfate, and crystals optimized for data collection were obtained from a drop containing 0.1 M MES, pH 6.3, and 2.8 M ammonium sulfate.
Data Collection and Structure Determination-Crystals were plunge-frozen in the cryoprotectant solution containing 30% 2-methyl-2,4-pentanediol in the mother liquor. The diffraction data were measured on the PXII beamline of the Swiss Light Source (Villigen, Switzerland). Data were indexed, integrated, and scaled with the XDS package (14). The native data set from the IntHPI 80 crystals was collected at 0.9873 Å, and the native crystals diffracted to 1.3 Å. The selenomethionine derivative was measured at 0.9796 Å for the peak, 0.9796 Å for the inflection point, 0.972 Å for the high remote, and 0.9875 Å for the low remote data sets; all data sets were collected to 2.0 Å. Two selenomethionine crystals were measured, and the corresponding data sets were merged to obtain higher redundancy. Data sets were of high quality and showed strong anomalous signals. Anomalous scatterers were found using SHELXD software (15). Interestingly, two sites, instead of the expected one site, were found. Map analysis showed that the selenium atom is very close to a special position and that the selenomethionine side chain assumes different conformations in each asymmetric unit, thus locally breaking the crystallographic symmetry. (This is also the case in the native data for the sulfur atom.) Two initial atom positions were refined using the autoSHARP software package (16). The resulting phases were improved by the DM program (17) and used for automated model building with ARP/wARP software (18). The resulting model of ~80% completeness was inspected and finished manually with the Xfit program (19). Restrained refinement with Refmac5 software was then performed using the native data, followed by the addition of water molecules by ARP/wARP (20). Data collection, phasing, and refinement statistics are presented in Table 1. Most of the model has a clear and well interpretable electron density with the exception of a few solvent-exposed side chains. These parts were omitted in the final model. The R-factor of the presented structure of the HPI integrase domain is 20.8%, and Rfree is 23.5%.
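As a reading aid for the multiwavelength data collection described above, the reported wavelengths can be converted to photon energies with E = hc/λ. The following is a minimal sketch, not part of the original analysis; the function and dictionary names are illustrative assumptions, and the wavelengths are copied as printed in the text (including the identical values given for the peak and inflection point). The resulting peak energy lies close to the selenium K absorption edge (about 12.66 keV), which is consistent with a selenomethionine experiment.

```python
# Illustrative sketch: names below are assumptions, not from the original work.

# Planck constant times speed of light, expressed in keV * Angstrom.
HC_KEV_ANGSTROM = 12.398419843

# Wavelengths in Angstrom, exactly as reported in the text.
DATA_SETS = {
    "native": 0.9873,
    "peak": 0.9796,
    "inflection": 0.9796,
    "high remote": 0.972,
    "low remote": 0.9875,
}


def wavelength_to_energy_kev(wavelength_angstrom: float) -> float:
    """Convert an X-ray wavelength in Angstrom to a photon energy in keV."""
    if wavelength_angstrom <= 0:
        raise ValueError("wavelength must be positive")
    return HC_KEV_ANGSTROM / wavelength_angstrom


def energies_kev() -> dict:
    """Return the photon energy (keV) for every reported data set."""
    return {name: wavelength_to_energy_kev(w) for name, w in DATA_SETS.items()}


if __name__ == "__main__":
    for name, energy in energies_kev().items():
        print(f"{name:12s} {energy:.4f} keV")
```

A shorter wavelength corresponds to a higher energy, so the high remote data set sits above the edge and the low remote data set below it.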
Data for the HAI7 integrase domain crystal, collected to 1.6 Å, were integrated, scaled, and merged by XDS and XSCALE programs. The structure was determined by molecular replacement using the Molrep program from the CCP4 suite with the structure of the previously solved IntHPI 80 used as a probe. The model was then refined by Refmac5 and rebuilt by XtalView/Xfit and by a subsequent Refmac5 refinement. Water molecules were added by the ARP/wARP program. Side chains of Lys7, Lys16, Lys46, Lys47, Arg49, Lys90, Gln93, Lys95, and Arg96 had no interpretable electron densities and therefore were omitted in the model. The final R-factor of the structure of the HAI7 integrase domain is 14.7%, and Rfree is 21.2%.
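For readers less familiar with the refinement statistics quoted above, the conventional crystallographic R-factor can be written as follows. This is the standard textbook definition (with the overall scale factor omitted for brevity), not a formula taken from the original text; the symbols are introduced here for illustration only.

```latex
R = \frac{\sum_{hkl} \bigl|\, |F_{\mathrm{obs}}(hkl)| - |F_{\mathrm{calc}}(hkl)| \,\bigr|}
         {\sum_{hkl} |F_{\mathrm{obs}}(hkl)|}
```

Rfree is computed with the same expression but only over a small test set of reflections excluded from refinement, which is why it is typically somewhat higher than the working R-factor, as in the values reported for both domains.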

RESULTS
N-terminal Domains of IntHPI and IntHAI7 Bind Short Direct Repeats in Heterologous attP Sites-Previously, we reconstituted the attP site of HPI and defined several copies of imperfect direct repeats located in the regions that can be designated "arms" by analogy with integrases of temperate phages (5). To determine whether the N-terminal part of IntHPI indeed specifically recognizes attP HPI DNA, IntHPI 80 was tested in the electrophoretic mobility shift assay for its ability to bind the attP HPI fragment containing direct repeats. This IntHPI 80 protein (spanning Met1-Asn80) was able to bind only its cognate attP HPI fragment but not the attP HAI7 fragment (Fig. 1). Similarly, the IntHAI7 100 protein (Met1-Arg100), representing the N-terminal domain of IntHAI7, was able to form retarded bands with only attP HAI7 but not attP HPI DNA (Fig. 1).
Overall Structures-The structure of the arm-type binding domain of the HPI integrase (residues 1-80) comprises a four-stranded antiparallel β-sheet preceded and followed by α-helices (Fig. 2A). The N-terminal α-helix H1 covers Asp5-Thr10 and is positioned perpendicularly to the larger C-terminal helix H2. Helix H1 is followed by a 7-residue-long loop connecting it to the first β-strand B1 (Phe18-Ser23). β-Strands B1, B2 (Leu26-Lys31), B3 (Gly34-Ile44), and B4 (Lys47-Ala55) are connected by short turns comprising 2 amino acids. β-Strands are parallel in the following pairs: B1 with B2 and B3 with B4. The first two strands (B1 and B2) form an angle of ~45° with respect to strands B3 and B4. The β-sheet is connected to α-helix H2 (Leu61-Ala76) by a 5-amino acid loop. This helix is positioned almost parallel to strands B3 and B4. The β-sheet, together with helix H2, forms the L-shaped hydrophobic core of the protein.
The structure of the HPI integrase domain shows a high resemblance to that of the arm-type binding domain of the HAI7 integrase (Fig. 2B). The positioning of helices and β-strands is identical, and the root mean square deviation of the superimposed structures (aligned on Cα atoms) is 0.81 Å. The largest difference in the main chain tracing is localized in the loop between strands B3 and B4, with the IntHAI7 100 main chain bending toward helix H1. This segment also shows a difference in its primary structure, with the large residues (Glu45 and Lys46) in IntHAI7 100 exchanged for small ones in IntHPI 80 (serine and glycine, respectively) (Fig. 3). The crystallized IntHAI7 construct (spanning residues 1-100) is longer than that of IntHPI. The residues located C-terminally to helix H2 are visible in the crystal structure, but this region is largely unstructured, with only residues 81-83, 86-88, and 92-95 forming very short helices. The electron density is, however, well defined, and the polypeptide chain is arranged antiparallel to helix H2. A comparison of the amino acid sequences of the arm-type binding domains of IntHPI and IntHAI7 is shown in Fig. 6. Conserved residues can be divided into several groups. The amino acids of the N-terminal helix H1 and the following loop form a large cluster of conserved residues (8 being identical and 4 being similar). Several of these residues stabilize the "vertical position" of helix H1 through formation of hydrogen bonds. For example, Arg65 binds the carbonyl group of Leu3, and Leu3 also seems to form a hydrogen bond with the conserved Asp22. Additionally, in IntHPI, the hydroxyl group of Tyr56 binds the carboxyl group of the conserved Asp5 from the top of helix H1.
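The root mean square deviation quoted above is computed over paired Cα positions once the two structures have been superimposed. The snippet below is a minimal sketch of that final step only, assuming the optimal superposition (for example, by a least-squares fit) has already been applied and that the residue pairing is known; the function name and the toy coordinates in the usage note are hypothetical and are not taken from the deposited structures.

```python
import math
from typing import Sequence, Tuple

# A Cartesian coordinate in Angstrom (x, y, z).
Coord = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Coord], coords_b: Sequence[Coord]) -> float:
    """Root mean square deviation between two equally long, paired
    coordinate lists that are assumed to be superimposed already."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    if not coords_a:
        raise ValueError("coordinate lists must not be empty")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the two atoms of this pair.
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


if __name__ == "__main__":
    # Hypothetical toy coordinates: every atom is displaced by 1 Angstrom in z.
    model_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    model_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
    print(rmsd(model_a, model_b))
```

Because the value depends on which atoms are paired and how the fit was performed, an RMSD is only comparable between analyses that use the same atom selection.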

The interactions that connect helix H2 to strand B1 seem to be of special importance because all residues responsible for this interaction are conserved. Asp22, Arg65, and Arg68 (exclusively in IntHPI) create a network of hydrogen bonds between one another. Arg68 in IntHAI7, although conserved, does not contribute to the interactions. Another group of well conserved residues are those present in or in the proximity of the turns: Gly25 in the turn between strands B1 and B2; Gly34 and Ser35 in the proximity of the turn between strands B2 and B3; and Gly54, Pro57, and Ala58 in the turn between strand B4 and helix H2.
The IntHAI7 100 structure gives additional insight into the positioning of the C-terminal part of the protein. A dense network of hydrogen bonds formed by Lys73-Ser82 caps the C terminus of helix H1 and thus fixes the region with respect to this helix. The following part of the C terminus does not form hydrogen bonds with the rest of the protein, with the exceptions of the carbonyl oxygen of Asn89 bonding Arg67 and the side chain of Asp63 interacting with the NH of Ile92 (Fig. 4). It should be mentioned here that, for proteins of such relatively small size (10-12 kDa), the arm-type binding domains of integrases, and especially IntHPI, possess a rich H-bonding network.
Comparison with Structures of Three-stranded β-Sheet DNA-binding Proteins-Several structures of arm-type binding domains of integrases have been solved (6, 21, 22). The alignment of the arm-type binding domain of HPI with the λ integrase and the GCC box-binding protein is shown in Fig. 5; the root mean square deviations are 1.51 and 1.48 Å for the backbone atoms, respectively. These proteins bind their DNA recognition sites by means of the three-stranded β-sheet. Our structures differ from these published structures by a prominent extra β-strand, which is an integral part of the β-sheet. Furthermore, all β-strands in the structure of the arm-type binding domains of HPI and HAI7 integrases are significantly longer than in any other arm-type DNA-binding domains, with as many as 11 residues projecting from the β-sheet in the direction of the potential DNA-binding cavity (with the Tn916 transposon integrase having 9 of them and the λ integrase only 7) (Fig. 6).
NOVEMBER 13, 2009 • VOLUME 284 • NUMBER 46

The other distinguishing feature of our integrase domains is their structured helical N terminus positioned perpendicularly to the large C-terminal helix. Among all known integrases, a small N-terminal helix is present only in a newly described structure of the λ integrase in complex with DNA (9). The positioning of this helix is, however, different from that in the HPI and HAI7 integrases, with the N terminus of the λ integrase pointing away from the main body of the protein (Fig. 5). Because the structure of the N terminus of the λ integrase proved to be significantly different in the free versus DNA-bound state, it cannot be excluded that the N-terminal helix of HPI and HAI7 integrases could change its conformation upon binding to the attachment site.
It is still unconfirmed whether the mode of DNA binding is conserved between the integrases possessing three-stranded β-sheets and the HPI and HAI7 integrases. The space created by a concave structure beneath the β-sheet is, however, large enough to fit the major groove of DNA. Furthermore, the charge distribution in IntHPI 80 and IntHAI7 100 is similar to that in other arm-type binding proteins, with positive charges gathered on the concave surface of the β-sheet (Fig. 7).

Comparison of DNA-binding Cavities of N-terminal Domains of HAI7 and HPI Integrases-The largest observed differences between our solved structures appear, as expected, on the putative DNA-binding interface. The majority of the residues conserved between IntHPI 80 and IntHAI7 100 present in the β-sheet have their side chains directed toward the core of the interface between the β-sheet and helix H2 (Leu28, Val30, Trp38, Leu40, and Tyr42). Three conserved residues are facing the putative DNA-binding cavity, namely Lys19, Leu29, and Arg43. Other residues pointing in the same direction differ substantially between the HPI and HAI7 integrases, as well as (although to a lesser extent) between the HAI13 and HAI7 integrases.
Comparison of the putative DNA-binding cavities of IntHPI 80 and IntHAI7 100 shows several differences. First, the proximal border of the cavity (composed of loop L1, parts of β-strands B1 and B2, and the turn between them) in IntHPI 80 bears a strong positive charge, introduced by Lys16, Lys19, and Lys31 (Fig. 7). The situation differs in the case of IntHAI7 100, which not only lacks Lys31 (substituted with histidine, which remains buried) but also presents the negative charge from Glu17 protruding from the surface of the cavity. An exact position of the side chain of Lys16, which is present in both proteins, could not be detected in the IntHAI7 100 structure, probably due to its high mobility.
The central part of the cavity (formed by the middle parts of strands B3 and B4) in IntHPI 80 is dominated by the presence of the positive charge of Lys41, next to the negative charge supplied by Glu48. In IntHAI7 100, neither of these residues is present (Lys41 being substituted with Ser, and Glu48 with Gln). However, Leu50 protrudes into the cavity approximately in the same place as the charged residues in IntHPI 80. The distal rim of the cavity in IntHAI7 100 (formed by the C-terminal end of strand B3, the N-terminal part of strand B4, and the loop between them) seems larger and bulkier than that in IntHPI 80.
Putative Residues Interacting with DNA-The IntHPI and IntHAI7 DNA-binding interfaces bear too many differences to unambiguously link specific residues with the DNA base recognition. Therefore, we used a related integrase, IntHAI13, which shows a strong sequence resemblance to IntHAI7, to obtain more insight into the residues' functions. A model of the IntHAI13 DNA-binding cavity was constructed based on the structure of HAI7 with surface-exposed residues mutated in the Swiss-PdbViewer (Fig. 8). Comparison of the crystal and modeled structures shows that differences group in a cluster in the central part of the cavity (residues mutated: H27T, R39Q, S41G, and L50V). This results in a slight change of the surface of the cavity, mostly induced by the substitution of Leu50 with a smaller Val side chain. Furthermore, the hydrogen bond pattern between the protein and the target DNA fragment can be changed because a histidine, which can serve both as a donor and an acceptor, is substituted with a donor residue, tyrosine. The reverse is the case for the R39Q substitution, where a double donor is exchanged for the donor + acceptor residue. Another significant change in the DNA-binding interface is the substitution of the negatively charged Glu17 with Val.
The sequences of DNA recognized by the inspected integrases, supplemented by the sequences bound by HAI13, are presented in Fig. 9. Each of the integrases binds to two attachment sites: the right (attR) and the left (attL). Furthermore, each of them contains two binding sites (P1 and P2). Alignment of all 12 binding sites shows that base 4 (indicated in blue in Fig. 9) differs among all integrases. Base 9 (indicated in green) remains the same in the attachment site of both HAI integrases, whereas it is changed in IntHPI. Therefore, differences in the structure of the binding cavity between IntHAI7 and IntHAI13 can be attributed to the recognition of base 4.
Given these two major spatial localizations of differences, it is impossible to safely predict which area is responsible for the differentiation between the IntHAI7 and IntHAI13 binding sites. However, the residues interacting with base 9 are most probably localized in the Glu48/Lys41 region.
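The reasoning about bases 4 and 9 amounts to a column-by-column comparison of aligned binding sites. The sketch below illustrates that comparison under explicit assumptions: the three sequences are hypothetical placeholders constructed only to reproduce the pattern described in the text (position 4 differing in all three, position 9 shared by the two HAI integrases), and they are not the sequences shown in Fig. 9. The function and variable names are likewise illustrative.

```python
from typing import Dict, List

# HYPOTHETICAL aligned binding sites, invented for illustration only.
# They are NOT the sequences from Fig. 9.
TOY_SITES: Dict[str, str] = {
    "HPI":   "TTGAACCTGA",
    "HAI7":  "TTGCACCTCA",
    "HAI13": "TTGTACCTCA",
}


def differing_positions(sites: Dict[str, str], first: str, second: str) -> List[int]:
    """Return the 1-based positions at which two aligned sites differ."""
    seq_a, seq_b = sites[first], sites[second]
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sites must have the same length")
    return [i + 1 for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


def fully_variable_positions(sites: Dict[str, str]) -> List[int]:
    """Return the 1-based positions at which every site carries a different base."""
    sequences = list(sites.values())
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("aligned sites must have the same length")
    return [
        i + 1
        for i in range(length)
        if len({s[i] for s in sequences}) == len(sequences)
    ]


if __name__ == "__main__":
    print(differing_positions(TOY_SITES, "HAI7", "HAI13"))
    print(differing_positions(TOY_SITES, "HPI", "HAI7"))
    print(fully_variable_positions(TOY_SITES))
```

With real data, a position that differs only between two integrases is the natural candidate for explaining the structural differences between those same two proteins, which is the logic applied to base 4 above.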

DISCUSSION
Two closely related but still non-identical asn tDNA-associated integrases of the four genomic islands give us a unique opportunity to superimpose differences in their amino acid structure and the cognate co-evolved attP recombination sites to address specific protein-DNA recognition. The structures of the arm-type binding domains of both IntHPI and IntHAI7 integrases resemble each other and comprise a four-stranded antiparallel β-sheet preceded and followed by α-helices. The main difference can be localized in the loop between strands B3 and B4, with the HAI7 main chain bending toward helix H1.
Comparison with the structures of the arm-type binding domains of other integrases demonstrates that the arm-type binding domains of IntHPI and IntHAI7 differ by the presence of an extra β-strand. In contrast, all known proteins bind their DNA recognition sites by means of three-stranded β-sheets (6,21,22). Additionally, all β-strands in the structures of the arm-type binding domains of HPI and HAI7 integrases are significantly longer than those in three-stranded arm-type binding domains, which allows them to mimic the shape of the major groove of a longer DNA fragment. HPI and HAI7 attachment sites comprise eight nucleotides, the same number found to interact with the λ integrase (9). In the λ integrase, however, only six nucleotides are recognized through the interaction of the β-sheets with the major groove, whereas the remaining two are bound by the N-terminal helix inserting into the minor groove (9). Therefore, it can be speculated that the long β-sheets in IntHPI and IntHAI7 are sufficient for recognition without a contribution of the N-terminal helix. The other distinguishing feature of the IntHPI and IntHAI7 N-terminal domains is their structured helical N terminus positioned perpendicularly to the large C-terminal helix. These differences support the proposal that integrases of the genomic islands form a distinct evolutionary branch of the site-specific recombinases different from that of phage integrases (3). On the other hand, integrases of the genomic islands are bidirectional recombinases that efficiently support both integrative and excisive recombination and do not require an additional recombination directionality factor (RDF), in contrast to the phage integrases. For example, the excisive activity of HPI is stimulated <10-fold by its RDF (23), in contrast to the highly efficient RDF of bacteriophage λ. Moreover, not all asn tDNA-associated islands possess an RDF.
Evolutionarily, integrases of the temperate phages might represent a further step in functional specialization to increase the efficiency of prophage rescue under unfavorable conditions. In contrast, genomic islands, being unable to replicate and to transfer, are long term passengers of the bacterial chromosome with the dominating integrative function. In addition, the inefficient RDF is not clustered with the cognate integrase, as in the case of temperate phages. This supports the proposal of their independent and consecutive acquisition by the integrative modules of genomic islands. Likewise, differences in the structures of IntHPI and IntHAI7 might be more suited for the recognition of both attP and recombinant attL/attR sites to support efficiently both types of site-specific recombination. Until now, it also had not been demonstrated whether the mode of DNA binding is conserved between the integrases possessing three-stranded β-sheets and IntHPI and IntHAI7.
Our resolved structures of the arm-type binding domains of the two asn tDNA-associated integrases show that the largest differences between the two structures lie on the putative DNA-binding interfaces. Although 3 conserved residues are facing the putative DNA-binding cavity (Lys19, Leu29, and Arg43), all other residues pointing in the same direction differ substantially between the HPI and HAI7 integrases, as well as (although to a lesser extent) between IntHAI13 and IntHAI7. Such β-sheet sequence-specific DNA binding is very rare and has been demonstrated only for three proteins that form complexes with DNA, namely the bacteriophage λ integrase, the Tn916 transposon integrase, and the ethylene-responsive factor AtERF1 from Arabidopsis thaliana.
At the moment, it is impossible to demonstrate which DNA nucleotides actually interact with the amino acid residues in the DNA-binding interface. However, co-crystallization of the arm-type binding domains with the cognate DNA arms might be the next step toward solving this problem.