Functional Mapping of Cre Recombinase by Pentapeptide Insertional Mutagenesis

Cre is a site-specific recombinase from bacteriophage P1. It is a member of the tyrosine integrase family and catalyzes reciprocal recombination between specific 34-bp sites called loxP. To analyze the structure-function relationships of this enzyme, we performed large scale pentapeptide insertional mutagenesis to generate insertions of five amino acids at random positions in the protein. The high density of insertion mutations into Cre allowed us to identify an unexpected degree of functional tolerance to insertions into the 4-5 beta-hairpin and into the loop between helices J and K (both of which contact the DNA in the minor groove) and also into helix A. The phenotypes of the majority of inserts allowed us to confirm a variety of predictions made on the basis of sequence conservation, known three-dimensional structure, and proposed catalytic mechanism. In particular, most insertions into conserved regions or secondary structure elements inactivated Cre, and most insertions located in nonconserved, unstructured regions preserved Cre activity. Less expectedly, the non-conserved and poorly structured loops and linkers between helices A-B, E-F, and M-N did not tolerate insertions, thus identifying these as critical regions for recombinase activity. We purified and characterized in vitro several representatives of these "unexpected" Cre insertion mutants. The role of those regions in the recombination process is discussed.


INTRODUCTION
The phage P1 Cre recombinase is a member of the tyrosine integrase family of site-specific recombinases whose members use a conserved tyrosine residue as a catalytic nucleophile. Molecular analysis of Cre has provided a wealth of biochemical and structural information on recombination mechanisms, and this enzyme is also extensively used for engineering chromosomes of higher eukaryotes.
The Cre protein catalyzes precise recombination between two copies of its target sequence, the 34-bp loxP site. Two Cre molecules bind specifically to a single loxP site, and a pair of such Cre-occupied loxP sites come together to form a synaptic complex. A Holliday junction intermediate is then produced by cleavage and subsequent religation of DNA strands, followed by isomerization, cleavage and religation of crossing DNA strands to form the recombination products (1,2).
Despite decades of study on Cre-mediated recombination, and the availability of high-resolution three-dimensional structures of wildtype (wt) and mutant Cre proteins (3)(4)(5)(6)(7)(8)(9), many mechanistic details of the recombination process remain unclear. Details regarding the functional determinants of DNA recognition and binding, DNA bending, synapse formation and partner subunit activation all remain poorly understood. Nevertheless, characterization of Cre mutants with single amino acid substitutions has identified a number of important amino acid residues involved in catalysis, DNA binding and bending, and synaptic complex formation (10)(11)(12)(13)(14).
transposase (20) or possible toxicity of the cre mutants to the host, the pattern of insertions appeared nearly random.

Cre mutant selection
Insertion mutants of Cre that retain recombinase activity were selected by their ability to excise a loxP-flanked transcription terminator (12) that prevents expression of a neo gene.
Briefly, the expression library of Cre mutants was electroporated into DH5 [pBS848], where pBS848 is a pACYC-based Cm R plasmid carrying the loxP 2 rrnB T1T2 terminator cassette inserted between the neo gene and a lac promoter. Electroporation and selection for kanamycin resistance were as described (12), except that cre was induced for only 1 hr with 0.20% L-arabinose.
Resulting Ap R Cm R Kn R colonies were pooled, plasmid DNA was purified and, to minimize contamination by carryover, the selection procedure was repeated two more times. Digestion of DNA with NcoI before retransformation eliminated the loxP plasmid while retaining the mutant Cre-expressing plasmids, which have no NcoI sites. In the absence of cre the frequency of Kn R colonies was less than 1 × 10⁻⁵.
A modification of the above procedure, namely the omission of the arabinose induction step to lower the level of Cre expression, shortened the time window available for recombination and therefore allowed us to generate a second sublibrary enriched for highly active mutants.
A sublibrary of inactive cre mutants was selected using E. coli strain NS2300 (10). This strain contains a neo gene flanked by loxP sites in the chromosome. Thus, only transformants with an inactive cre gene remain Kn R. Electroporation and selection were as above, with 1 hr of induction with 0.20% L-arabinose. The background frequency of Kn R was 0.04%, as determined by transformation with a wt cre pBAD18 plasmid, indicating high efficiency of neo gene excision. Plasmid DNA was purified and a second round of selection was imposed to eliminate carryover contamination from any active Cre mutants that might have survived the first round of selection.
Downloaded from http://www.jbc.org/ by guest on March 25, 2020

Protein purification
Cre protein and its mutants were expressed to high levels in E. coli BL21(DE3) LysS using a T7 expression system, and then purified to homogeneity and stored as described previously (12). The concentrations of wt and mutant Cre proteins were determined by spectrophotometry at 280 nm using an ε280 for wt Cre of 1.17 × 10⁻⁵ M⁻¹ cm⁻¹ (21). Cre was diluted to a working concentration of 1 µM in 20 mM Tris-HCl pH 8.0, 1 M NaCl, 1 mM EDTA, 25% glycerol and 100 ng/µl BSA prior to use in vitro.

Recombination in vitro
For recombination in vitro, the 6.8 kb loxP 2 plasmid pBS835 was cleaved with BglI and NotI to generate two DNA fragments (4.2 and 2.6 kb) with one loxP site per fragment.
Intermolecular recombination between loxP sites yields two DNA fragments (5.5 and 1.3 kb) readily distinguishable from the substrate fragments by size. All recombination reactions were in a 12 µl reaction volume containing Cre reaction buffer (50 mM Tris-HCl pH 7.5, 140 mM NaCl, 10 mM MgCl2), 2 nM (100 ng) DNA substrate and 83 nM of Cre. Reactions were incubated at 37 °C for 1 hour, terminated by phenol/chloroform extraction and ethanol precipitation, and analyzed by electrophoresis in 1% agarose gels. Gels were quantified using a PhosphorImager (Molecular Dynamics, Sunnyvale, CA) scanner.
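The reaction conditions above can be checked with a short calculation. The following is a minimal sketch, not part of the original protocol: it assumes an average mass of 650 g/mol per base pair of double-stranded DNA (a common approximation) and assumes that 1 µl of the 1 µM Cre working stock is added to each 12 µl reaction. The function names are hypothetical.

```python
# Sanity check of the in vitro recombination set-up described above.
# Assumptions (not stated in the text): 650 g/mol per bp of dsDNA,
# and 1 ul of the 1 uM Cre working stock added per 12 ul reaction.

AVG_BP_MASS = 650.0  # g/mol per base pair, approximation


def dna_nanomolar(mass_ng, length_bp, volume_ul):
    """Molar concentration (nM) of a dsDNA of given mass, length and volume."""
    moles = mass_ng * 1e-9 / (length_bp * AVG_BP_MASS)
    litres = volume_ul * 1e-6
    return moles / litres * 1e9


def diluted_nanomolar(stock_um, added_ul, final_ul):
    """Final concentration (nM) after diluting a stock given in uM."""
    return stock_um * 1000.0 * added_ul / final_ul


# Fragment sizes in bp: substrate (BglI/NotI digest) and recombination products.
substrate_bp = (4200, 2600)
product_bp = (5500, 1300)
plasmid_bp = 6800

dna_nm = dna_nanomolar(mass_ng=100, length_bp=plasmid_bp, volume_ul=12)
cre_nm = diluted_nanomolar(stock_um=1.0, added_ul=1.0, final_ul=12.0)

# Recombination rearranges DNA but conserves its total length.
length_conserved = sum(substrate_bp) == sum(product_bp) == plasmid_bp
```

Under these assumptions the DNA concentration comes out at about 1.9 nM and the Cre concentration at about 83 nM, in line with the stated 2 nM and 83 nM, and the substrate and product fragments both sum to the 6.8 kb plasmid length.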

Electrophoretic mobility shift assay

Protein Sequence Alignment
We performed PSI-BLAST searches (22)  Mesorhizobium loti, GI:17233219 from Nostoc sp. PCC 7120. Sequence alignment was performed using the T-COFFEE program (23), followed by manual editing using MACAW (24). The sequence of the XerD recombinase (GI:16130796) from E. coli was aligned to the resulting multiple alignment profile of the other sequences.

Automatic determination of insert locations
We wrote a program, IMLANAL (Insertion Mutation Library Analyzer), to precisely locate an insert in a mutated sequence. It requires the following inputs: a reference (wildtype) sequence; the sequence of the invariant part of the insertion (TGCGGCCGCA in the case of the Finnzymes MGS kit) and the size of the site targeted for transposition (5 bp); and the set of DNA sequence reads from presumed insert clones. The following outputs are produced: a GenBank-formatted flat file containing one additional feature for each detected mutation; a corresponding GFF-formatted file (describing only the mutation features) viewable in any GFF-compatible sequence viewer (we used Vector NTI from Informax, Inc.); and a record of anomalous situations. IMLANAL is implemented as a Perl script, makes use of selected BioPerl modules, and is available at http://research.stowers-institute.org/mec/software/imlanal or upon request.
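The core idea of locating an insert can be illustrated with a short sketch. This is not the IMLANAL code (which is a Perl script); it is a simplified Python illustration under stated assumptions: the read is on the same strand as the reference, the 15 bp insertion has the layout "target copy, invariant sequence, second target copy", and the restored read occurs only once in the reference. The function names and the toy reference sequence are hypothetical.

```python
# Simplified illustration of insert location; NOT the IMLANAL implementation.
# Assumed layout of a mutated read (an assumption, not taken from the text):
#   ...flank + target(5 bp) + invariant(10 bp) + target(5 bp) + flank...

INVARIANT = "TGCGGCCGCA"  # invariant part of the insertion (from the text)
TARGET_LEN = 5            # size of the duplicated target site (from the text)


def locate_insert(reference, read, invariant=INVARIANT, target_len=TARGET_LEN):
    """Return the 0-based reference index of the duplicated target site,
    or None if the read is anomalous (no insert, no duplication, no match)."""
    hit = read.find(invariant)
    if hit < target_len:
        return None  # invariant absent (find gives -1) or too close to read start
    left = read[hit - target_len:hit]
    right_start = hit + len(invariant)
    right = read[right_start:right_start + target_len]
    if len(right) < target_len or left != right:
        return None  # expected target duplication not found
    # Remove the invariant and one copy of the target to restore wildtype.
    restored = read[:hit] + read[right_start + target_len:]
    start = reference.find(restored)
    if start < 0:
        return None  # restored read does not match the reference
    return start + hit - target_len


def codon_number(nt_index, cds_start=0):
    """1-based codon number containing a 0-based nucleotide index."""
    return (nt_index - cds_start) // 3 + 1


# Toy example with a made-up reference (not the cre sequence).
toy_reference = "ATGGCTAAGCTTGACCGTTACGGATCCAAGTGA"
site = 12
target = toy_reference[site:site + TARGET_LEN]
toy_read = toy_reference[:site + TARGET_LEN] + INVARIANT + target + toy_reference[site + TARGET_LEN:]

found = locate_insert(toy_reference, toy_read)
```

In this toy case the insert is recovered at nucleotide index 12, which falls in codon 5 of the made-up reading frame. A real analysis would also need to handle reverse-complement reads, sequencing errors and repeated sequences, which this sketch ignores.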

Selection of active and inactive cre mutants
We constructed a library of random pentapeptide insertions into an arabinose-inducible bp from duplication of the target DNA sequence at the insertion site. Thus, a gene gains a 5 amino acid insertion into the protein product. We designed the library to have about 75-fold coverage (3 × 10⁵ independent insertions per ~4000 non-essential bp of plasmid) and then subcloned the 1067 bp HindIII-XbaI fragment carrying the 1029 bp cre gene to ensure that most insertions were in the cre gene. Three sublibraries were selected from the insertion library: a) cre mutants that retain recombinase activity, b) highly active cre mutants that retain recombinase activity even with low amounts of protein in vivo (selected without arabinose induction), and c) inactive cre mutants that have lost recombinase activity at loxP sites.
Recombination-proficient Cre mutants excise a loxP-flanked transcription terminator cassette (12) inserted between the neo coding sequence and the lac promoter, thus activating neo and giving a Kn R phenotype. By this measure, 39% of our pentapeptide insertion library was recombinationally active. This is the "active cre" sublibrary. Because of the mild selection, some mutants in this sublibrary could have somewhat diminished activity even though they are recombinationally proficient. With increased stringency of selection (no arabinose induction step), wt cre could still efficiently activate the neo gene to give a Kn R phenotype (48% of transformants); however, only 6% of insertion mutants were recombinationally proficient. At the third round of enrichment (see Experimental Procedures) 43% of transformants from this sublibrary yielded a Kn R phenotype. These mutants make up the "highly active cre" sublibrary and are presumed to have virtually undiminished Cre activity.
We selected inactive cre mutants using E. coli strain NS2300 (10), which carries a loxP-flanked neo gene integrated into the bacterial chromosome. Only library transformants with a recombinationally inactive cre gene, unable to excise neo from the genome, maintain a Kn R phenotype. After transformation of NS2300 with the insertion library, 53% of transformed cells stayed kanamycin-resistant, reflecting the percentage of inactive Cre mutants. This is the "inactive cre" sublibrary. These results, taken together with the observed frequency of active cre mutants and assuming random DNA insertion, suggest that slightly greater than one third of the Cre protein can tolerate a pentapeptide insertion without significant loss of recombinase activity.
To gauge how well selection worked, we sequenced about 90 mutants from each of the three sublibraries. Because the Cre coding sequence comprises 97% of the targeted HindIII-XbaI restriction fragment, we expected insertions in non-coding regions to occur in both the active cre and highly active cre sublibraries, but not in the inactive cre sublibrary. Moreover, we suspected there would be a higher percentage of insertions into the non-coding regions in the pool obtained from selection under stringent versus more relaxed conditions for retention of function. Sequence analysis confirmed our expectations. Of 92 active cre mutants 4 insertions were in non-coding sequences, of 87 highly active cre mutants 13 were in non-coding sequences and of 93 inactive cre mutants none were in non-coding sequences.
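The sequencing counts above can be turned into percentages and compared with the fraction expected from random insertion. The sketch below is illustrative only; it takes the stated 97% coding fraction at face value, and the variable names are hypothetical.

```python
# Observed fraction of non-coding insertions per sublibrary, compared with
# the ~3% expected if insertions were random and unselected (the text states
# that the Cre coding sequence is 97% of the HindIII-XbaI fragment).

CODING_FRACTION = 0.97
expected_noncoding_pct = 100.0 * (1.0 - CODING_FRACTION)

# (non-coding insertions, mutants sequenced), as reported in the text
counts = {
    "active": (4, 92),
    "highly active": (13, 87),
    "inactive": (0, 93),
}

observed_pct = {
    name: 100.0 * noncoding / total
    for name, (noncoding, total) in counts.items()
}

# Enrichment of non-coding insertions under stringent vs. relaxed selection.
fold_enrichment = observed_pct["highly active"] / observed_pct["active"]
```

With these numbers, about 4.3% of the active and 14.9% of the highly active mutants carry non-coding insertions, roughly a 3.4-fold enrichment under stringent selection, while the inactive sublibrary contains none.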

An insertion-based functional map of Cre
Depending on the reading frame of insertion, the 15 bp inserted can be translated into 3 different types of pentapeptides (16). To distinguish these one from another we refer to these types of inserts as CGR, RP and AAA, with the designation representing the invariant amino acids in each reading frame insertion type. Sequencing of a large number of insertion mutants from each of the three sublibraries indicated that there was no strong correlation between the insertion sequence type and the mutant's activity (Table I). We therefore focused our attention on the effect of insert location on Cre protein function.
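The three insert types follow directly from reading the invariant 10-nt sequence of the insertion (TGCGGCCGCA) in each of the three reading frames. The sketch below illustrates this with a hypothetical helper and a codon table restricted to the codons that occur in the invariant sequence; the remaining residues of each pentapeptide depend on the duplicated target sequence and are not modeled.

```python
# Translate the invariant part of the 15 bp insertion in each reading frame.
# Only the codons that occur within the invariant sequence are listed
# (standard genetic code).

INVARIANT = "TGCGGCCGCA"

CODONS = {
    "TGC": "C", "GGC": "G", "CGC": "R",  # frame offset 0
    "GCG": "A", "GCC": "A", "GCA": "A",  # frame offset 1
    "CGG": "R", "CCG": "P",              # frame offset 2
}


def invariant_residues(offset):
    """Residues encoded by complete codons of the invariant sequence when
    the first complete codon starts at the given offset (0, 1 or 2)."""
    seq = INVARIANT[offset:]
    return "".join(CODONS[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


insert_types = [invariant_residues(offset) for offset in range(3)]
```

The three offsets give CGR, AAA and RP, matching the designations used here and the mutant names that appear later (for example 32::CGRIR and 215::GAAAL).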
We mapped the locations of inserts from all three sublibraries with respect to the 343 amino acid Cre sequence and relative to Cre secondary structure elements (Fig. 1). Inspection reveals that for all three sublibraries the distribution of insertion mutations is patchy, that is, for each sublibrary there are clusters of insertions in some regions and there are other regions devoid of insertion. The pattern of patchiness is characteristic for each sublibrary. What is also immediately apparent is that there is a clear segregation between locations of inserts in active and inactive mutants. Not only do the inactivating inserts cluster, but there are also no active inserts within those clusters. A similar property holds for the clusters of active inserts. The lone exception is the region spanning β-strand 2 and helix I: most insertions in this region are inactivating, but there is also one position that permits active insertions. In all, there are eight major clusters of active insertions and six clusters of insertions which abolish Cre activity.
The overall density of insertions was high enough to determine functional importance for most regions of Cre, with the exceptions of helices B, E and J where either no insertion or only one insertion was detected. Insertions that retain Cre activity tend to be outside of secondary structure elements. This was expected, as α-helices and β-strands are responsible for most of the essential structure and contain catalytic amino acid residues. Indeed, many of the inactivating insertions are in defined structural elements. Much less expected was that insertions in several unstructured regions also led to loss of Cre activity.
Comparison of the active and highly active cre sublibraries indicates that more stringent selection concentrated insertions primarily in three regions: the N-terminus with helix A, the extreme C-terminus, and the loop between the J and K helices. This result indicates that insertions into these three regions probably have little if any negative effect on Cre activity.
Purification and assay in vitro of three mutants from the highly active sublibrary, two from the J-K loop and one from the A-helix, support this notion: all three showed approximately the same recombination activity on a loxP substrate as wt Cre (data not shown). Conversely, clusters underrepresented in the highly active sublibrary compared to the active cre sublibrary suggest that insertions into these other regions (D-E loop, 1-2 loop and I-J region) may slightly reduce Cre activity.

Correlation with Secondary and Tertiary Structure
We counted as inserts into secondary structure only those inserts that change the sequence of a secondary structure element. Because the very N-terminal 19 amino acids are not resolved in the published crystal structures, we considered that region here as unstructured. Fig. 1 shows that there was a general tendency for non-inactivating insertions to be

Characterization in vitro of inactive mutants
Based on their occurrence in exposed loop/linker regions and in non-conserved regions, inactivating insertion mutations occurred in six unexpected places: the A-B loop, the C-D loop, the E-F linker, the β-strand 3 -helix I loop and the M-N linker. We therefore selected the following representative mutants (Fig. 3)  Biosciences) was similar to wt Cre protein, suggesting that they folded correctly. We tested the ability of purified mutant proteins to mediate recombination of the loxP sites in vitro (Table II).
None of the purified mutant proteins was able to recombine these substrates, thus confirming the results of our genetic screen. Sensitive, quantitative recombination assays in vivo indicated that two mutants showed residual weak activity (0.21% of wt Cre for 215::GAAAL and 6.7% for 32::CGRIR), but no recombination was detected for the other three mutants (Table II).
We used a gel mobility shift assay to test the ability of all five mutants to perform the first step of recombination, namely, binding to the loxP site. A 159 bp 5'-³²P-labeled DNA fragment was incubated with each of the Cre mutants for 30 minutes at 37 °C and then analyzed by 6% native PAGE (Fig. 4). All of the mutants shifted the loxP fragment to the same complex 2 (c2) position (2 molecules of Cre per loxP site) observed with wt Cre. There were slight differences in c2 mobility, probably reflecting subtle differences in Cre-mediated DNA bending.
In particular, the increased mobility of the 333::CGRTG mutant suggests that its bending of the loxP substrate is less than that observed with wt Cre (Fig. 4A). More detailed gel shift analysis with varying concentrations of mutant and wt proteins indicated that the cooperativity of binding was similar to wt Cre except for two of the mutants: 215::GAAAL and 333::CGRTG. As estimated from quantitation of DNA binding (Fig. 4,
We next tested the cleavage competence of purified Cre mutants, i.e. their ability to form a covalent intermediate with target DNA, using either an intact loxP site or a suicide substrate having a nick one nucleotide away from the cleavage position. The wt enzyme cleaves the intact loxP substrate with low efficiency, presumably because of rapid religation following cleavage. The "suicide" substrate offsets this by releasing one nucleotide after cleavage, making the reaction irreversible and thereby trapping the covalently attached intermediate. Fig. 5 shows that none of the mutants was able to cleave the DNA substrate efficiently. Only one mutant, 32::CGRIR, was able to cleave the nicked substrate at moderate levels, but its ability to cleave an intact loxP site was severely diminished. Indeed, with 32::CGRIR the relative efficiency of cleavage of the intact vs. the nicked substrate was decreased 16-fold compared to wt Cre. Thus, insertions into five different non-conserved exposed linker/loop regions of Cre render the enzyme defective in DNA recombination. Moreover, although these mutants retain the ability to bind to the loxP substrate, they are unable to cleave loxP DNA.

DISCUSSION
Structure-function studies of proteins commonly focus on the role of single residues, typically those most evolutionarily conserved. However, the individual residues making up functionally important structural elements may not themselves be conserved. In this study, we

N-terminus and helix A
The functional role of the extreme N-terminus of Cre is unclear. The first 19 amino acids are unresolved in all published crystal structures (3)(4)(5)(6)(7)(8)(9). Neither the N-terminus nor helix A are well-

I-J region
The region between the I and J helices is extensive (226-255 aa) and contains a well conserved 4-5 β-hairpin (Fig. 2) that interacts with the minor groove of DNA (3). However, numerous insertions retaining recombinase activity were identified throughout this region.
Moreover, no inactivating insertions were observed in this region. The skewing of insertions towards the amino terminal portion of the region in the highly active mutant sublibrary does, however, suggest that insertions in the 4-5 β-hairpin and proximal to helix J may have reduced activity.

J-K loop
Insertions into the loop between the J and K helices (273-286 aa) do not inactivate Cre, and are especially common in the highly active mutant library, suggesting that these insertions cause no significant impairment. The J-K loop makes several contacts with DNA, including a minor groove contact of R282 with adenine-7 N3 (3). Curiously, the J-K loop is shorter in topoisomerases than in site-specific recombinases (31). The relative positioning of the J-K loop and DNA in crystal structures of Cre, λ Int, hTopo I and Flp is shown in Fig. 6B. There are no minor groove protein contacts with DNA bases in these other protein structures, but all of them have contacts with phosphates. Our data suggest that interactions of the J-K loop with loxP are not critical for recombinase activity, although they could play some role in site selectivity. The quite different conformations of this loop suggest that its role may vary in different members of this protein family.

A-B loop
Inactivation of Cre by pentapeptide insertions into the A-B loop shows the functional importance of this region. The A-B loop makes contacts with helix E of a neighboring Cre molecule in the synaptic complex. Two previously described recombinationally deficient point mutations in this region, A36V and T41F, bind target DNA in vitro but do not cleave suicide substrates (11,32), although they are able to cleave HJ intermediates. It was hypothesized that these mutants are deficient in bending and/or synapse formation. Our results with the 32::CGRIR mutant support this hypothesis for the role of the A-B loop. This mutant was unable to recombine in vitro, but bound DNA with approximately the same efficiency as wt Cre.
Interestingly, it was more than 160-fold reduced in cleavage activity on an intact substrate compared to wt Cre but only 10-fold reduced for cleavage activity on a nicked suicide substrate.
This, along with the slight change in c2 complex gel mobility, could be explained by a bending defect. The stimulation of cleavage we see by introduction of a nick into the spacer region may act by giving some flexibility to the spacer, which may in turn compensate for a bending defect in the 32::CGRIR mutant.
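The two fold-reductions quoted for 32::CGRIR can be combined into a single relative figure. The following minimal sketch uses only the numbers given in the text; the variable names are hypothetical, and the intact-substrate value is a lower bound ("more than 160-fold").

```python
# Relative cleavage defect of 32::CGRIR on intact vs. nicked substrate,
# using the fold-reductions (relative to wt Cre) quoted in the text.

intact_fold_reduction = 160.0  # lower bound: "more than 160-fold"
nicked_fold_reduction = 10.0

# How much more the mutant is impaired on the intact substrate than on the
# nicked one, after normalizing both to wt Cre.
relative_defect = intact_fold_reduction / nicked_fold_reduction
```

The ratio is 16, which agrees with the 16-fold decrease in relative cleavage efficiency (intact vs. nicked) reported for this mutant in the cleavage assays.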

E-F linker
The cluster (127-144 aa) of inactivating insertions we obtained covering the E-F linker is somewhat unexpected. Except for K235 the sequence of this region is not conserved, nor is its length. From structural considerations it was proposed that this region may be involved in formation of type I and type II interfaces between Cre subunits in the synaptic complex, although there are no direct protein-protein contacts. From analysis of the point mutants E129Q, E129R and Q133H it was suggested that these residues are involved in both synapses and catalysis, although the mechanism is not clear (13). In accord with this data 128::CGRTG and cleave either an intact loxP substrate or a nicked suicide substrate, indicating that insertions into this non-conserved unstructured linker prevent catalysis.

3-I loop
The cluster of inactivating insertions (195-223 aa) into the 2-3 loop and helix I supports a role for the 2-3 loop in catalysis but also provides increased insight into the function of the 3-I loop. Earlier work showed that two point mutations in the 3-I loop, L213P and S214D, diminished Cre recombinase activity but did not impair binding to loxP DNA (32). Similarly, the inactive 215::GAAAL insertion mutant still binds DNA. The recombination defect is complex: 215::GAAAL showed a catalytic defect, as evidenced by its inability to cleave a nicked suicide substrate, but also exhibited decreased cooperativity in binding to loxP. Because L215 makes direct contact with E340 of helix N of the neighboring Cre subunit, as part of an acceptor pocket into which helix N nestles, we suspect that the pentapeptide insertion may perturb protein-protein contacts to produce an incorrect protein-protein interface, resulting in loss of catalytic activity and reduced DNA binding cooperativity.

M-N linker
Not surprisingly, insertions into the catalytic core comprising the K, L and M helices inactivate the recombinase, but so also do insertions into the M-N linker and helix N. The M-N linker at the protein-protein interaction interface (type I or II) influences the position of the catalytic tyrosine Y324 (2). Even though helix N shows low sequence conservation, and is even missing in some putative recombinases, for Cre we would expect insertion into the M-N linker to affect both cooperativity of binding and cleavage activity. The inactivating 333::CGRTG mutant fulfills that prediction. Unexpectedly though, 333::CGRTG did not decrease DNA binding cooperativity but instead increased it 50-100-fold.
How might insertion into the M-N linker increase cooperativity? It is reasonable to think that the energy acquired from the interaction of helix N with the "acceptor pocket" of the partner subunit goes partly for bending of DNA (3,14) and the remainder appears as a positive effect on DNA binding cooperativity (33,34). If the length of the M-N linker limits the distance between helix M of one subunit and the "acceptor pocket" of the other, then from geometrical considerations a longer linker could lead to a smaller bend angle that would require less energy for bending. This would leave more energy to stabilize partner Cre subunit binding to loxP, thus