The helix-turn-helix motif of the coliphage 186 immunity repressor binds to two distinct recognition sequences.

The CI protein of coliphage 186 is responsible for maintaining the stable lysogenic state. To do this, CI must recognize two distinct DNA sequences, termed A type sites and B type sites. Here we investigate whether CI contains two separate DNA binding motifs or a single motif that recognizes two different operator sequences. Sequence alignment with 186-like repressors predicts an N-terminal helix-turn-helix (HTH) motif, albeit with poor homology to a large master set of such motifs. The domain structure of CI was investigated by linker insertion mutagenesis and limited proteolysis. CI consists of an N-terminal domain, which weakly dimerizes and binds both A and B type sequences, and a C-terminal domain, which associates to octamers but is unable to bind DNA. A fusion protein consisting of the 186 CI N-terminal domain and the phage lambda oligomerization domain binds A and B type sequences more efficiently than the isolated 186 CI N-terminal domain; hence the 186 C-terminal domain likely mediates oligomerization and cooperativity. Site-directed mutation of the putative 186 HTH motif eliminates binding to both A and B type sites, supporting the idea that binding to the two distinct DNA sequences is mediated by a variant HTH motif.

DNA binding proteins are often modular in structure, with separate domains responsible for DNA binding and oligomerization. Such an arrangement, even in a system as simple as the lysis-lysogeny switch of coliphage λ, permits remarkable control of gene expression through a series of thermodynamically linked protein-protein and protein-DNA interactions. The λ genetic switch, one of the most intensively studied systems in biology, has in many ways provided the basis for the study of switch biology in higher organisms (1,2). Coliphage 186, a member of the P2 family of phages, shows essentially no similarity to λ at the protein or DNA level, so the two have presumably evolved independently. Nevertheless, their lysis-lysogeny switches are superficially similar, and a comparative analysis of the differences in detail between them is expected to improve our overall understanding of switch operation. This motivates the present structure-function study of the 186 CI repressor.
The immunity repressor of coliphage 186, CI, maintains the stable lysogenic state by binding directly over, and repressing, two promoters: pR, the promoter of the early lytic operon, and pB, the promoter of the late-promoter activator gene B (Fig. 1). 186 contains a total of four binding sites for CI, including two sites (FL and FR) that flank the lytic promoter (6). These flanking sites play a role in fine-tuning CI regulation of transcription from pR and from the lysogenic promoter pL. The flanking sites FL and FR each consist of an inverted repeat, while the CI binding site at the pB promoter contains a pair of inverted repeats. These four inverted repeats share sequence similarity, and in each case the two halves of the repeat are separated by a five-base pair A/T-rich spacer. These CI binding sites have been designated A type sites (Fig. 1). The CI binding site at pR, responsible for repressing the early lytic genes, consists of three inverted repeats. The central one is an A type site with a four- rather than five-base pair spacing between conserved bases and is designated an A′ site. On either side of the A′ site are inverted repeats, again with an A/T-rich spacer, but unrelated in sequence to the A type sites (see Ref. 6, Fig. 1). These alternative recognition elements have been termed B type sites. Hence, the CI binding site at pR has the arrangement B-A′-B. The recognition elements at pR all lie on the same face of the DNA helix, and their assignment is strongly supported by DNase I footprint data and by a library of 19 virulent (vir) mutations (6,7). Thus, CI is able to recognize two distinct DNA sequences.
In the present work, we set out to determine whether there are two distinct DNA binding regions within the CI protein or whether there is just one motif that binds with relaxed specificity to the two different types of binding sites. To this end we have investigated the domain structure of CI by sequence analysis, linker insertion mutagenesis, and limited proteolysis. We have examined the self-association and DNA binding properties of the isolated domains and from the information so obtained carried out mutagenesis on residues predicted to be critical for DNA binding.
which had been digested with the same enzymes. pETCI(1-82)His6 and pETCI(83-192)His6 were constructed similarly to pETCIHis6, using primers designed to introduce NdeI and SacII sites at the appropriate positions. pETCI(hybrid)His6 was constructed by PCR amplification of the region encoding amino acids 1-82 of 186 cI, using a 3′-primer designed to introduce an SphI site. The region of the λ cI gene encoding amino acids 93-236 of λ repressor was amplified by PCR using a 5′-primer designed to introduce an SphI site. The 186 cI PCR product was digested with NdeI and SphI, the λ PCR product with SphI and SacII, and these fragments were inserted into NdeI/SacII-digested pET3aHis6. pETCI(HTH−)His6 was generated by QuikChange (Stratagene) site-directed mutagenesis on a pETCIHis6 template using primers 219 and 220.
pMRR9 (11) is a derivative of the lacZ promoter assay plasmid pRS415 (9) containing translation stop codons from pKO2 and the pUC polycloning site. pMRR9 pR(short), used to generate NK7049 (RS45 pR short lacZ), contains the 22,980-to-23,190 fragment of 186 (−81 to +129 of pR) inserted into the XbaI site of pMRR9, such that transcription from pR reads into lacZ. pMRR9 pR(HincII/SnaBI), used to generate NK7049 (RS45 pR HincII/SnaBI lacZ), contains the 22,583-to-23,552 fragment of 186 (−477 to +492 of pR) inserted into the SmaI site of pMRR9, such that transcription from pR reads into lacZ. pMRR9 pB, used to generate NK7049 (RS45 pB lacZ), contains the 20,408-to-20,647 fragment of 186 (−176 to +64 of pB) inserted into the EcoRI and KpnI sites of pMRR9, such that transcription from pB reads into lacZ. Any regions amplified by PCR were checked by sequencing.
RS45 is a phage vector used to transfer transcriptional reporter fusions made in pMRR9 into single copy on the chromosome. RS45 and pMRR9 share sequences encoding the N-terminal portions of both the β-lactamase and lacZ genes, allowing any promoter insert in pMRR9 to be recombined into the phage (9). Lysogenization with the recombinant phage gives a single-copy chromosomal fusion.
Oligonucleotides-Sequences of oligonucleotides (shown 5′ to 3′) are as follows.

Protein Purification-Escherichia coli strain BL21 pLysS containing the various pETCIHis6 constructs was grown at 37°C in LB broth (500 ml) containing 100 μg/ml carbenicillin and 30 μg/ml chloramphenicol to an A600 of 0.55-0.7, induced with isopropyl-1-thio-β-D-galactopyranoside to a final concentration of 0.5 mM, and grown for a further 2-3 h. Cells were harvested by centrifugation, washed once with 50 mM Tris-HCl, 0.1 mM EDTA, 150 mM NaCl, and 10% glycerol, pH 7.5 (TEG150), and the cell pellet was stored at −20°C until use. For protein purification, the cell pellet was resuspended in 20 ml of buffer A (20 mM sodium phosphate, pH 7.2, 500 mM NaCl) and sonicated on ice, and cell debris was removed by centrifugation (12,000 × g, 30 min, 4°C). PMSF (50 μM) was added to inhibit proteases. The supernatant was loaded onto a freshly charged 5-ml Hi-Trap chelating column (Amersham Biosciences, Inc.) using a disposable syringe. The column was washed with 12 column volumes of buffer A, followed by 12 column volumes of buffer A containing 150 mM imidazole, and the protein was eluted with 2 column volumes of buffer A containing 500 mM imidazole. Fractions of 200 μl were collected and assayed for protein by absorbance at 280 nm. Protein-containing fractions were pooled and dialyzed extensively against TEG150. Purity of the dialyzed protein was examined by SDS-PAGE on 10% Tris-Tricine gels and judged to be better than 95% in all cases. Protein concentrations were determined spectrophotometrically, using molar extinction coefficients (calculated using the SEDNTERP program) of 23,950 M⁻¹ cm⁻¹ for CIHis6 and CI(HTH−)His6, 15,470 M⁻¹ cm⁻¹ for CI(1-82)His6, 8,480 M⁻¹ cm⁻¹ for CI(83-192)His6, and 36,440 M⁻¹ cm⁻¹ for CI(hybrid)His6. Yields of the various proteins were between 10 and 30 mg per 500 ml of culture.
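The extinction coefficients above were calculated with SEDNTERP. As a sketch of how such values arise, the reported numbers happen to be reproduced by the common tryptophan/tyrosine sum, ε280 ≈ 5500·nTrp + 1490·nTyr M⁻¹ cm⁻¹ (cystine contribution ignored), and protein concentration then follows from Beer-Lambert, c = A280/(ε·l). The residue counts in the code are hypothetical, back-calculated from the reported coefficients rather than read from the CI sequence, and whether SEDNTERP used exactly these per-residue values is an assumption.

```python
# Minimal sketch: epsilon_280 from Trp/Tyr counts, then concentration via Beer-Lambert.
# ASSUMPTIONS: per-residue coefficients of 5500 (Trp) and 1490 (Tyr) M^-1 cm^-1,
# cystines ignored; the residue counts below are hypothetical, back-calculated
# so that they reproduce the coefficients quoted in the text.

TRP_EPS = 5500  # M^-1 cm^-1 per tryptophan
TYR_EPS = 1490  # M^-1 cm^-1 per tyrosine


def epsilon_280(n_trp: int, n_tyr: int) -> int:
    """Molar extinction coefficient at 280 nm from aromatic residue counts."""
    return n_trp * TRP_EPS + n_tyr * TYR_EPS


def concentration_um(a280: float, epsilon: float, path_cm: float = 1.0) -> float:
    """Beer-Lambert: c = A / (epsilon * l), converted from M to micromolar."""
    return a280 / (epsilon * path_cm) * 1e6


# Hypothetical (Trp, Tyr) counts that reproduce the reported values.
CONSTRUCTS = {
    "CIHis6": (3, 5),          # 23,950
    "CI(1-82)His6": (2, 3),    # 15,470
    "CI(83-192)His6": (1, 2),  # 8,480
    "CI(hybrid)His6": (5, 6),  # 36,440
}

if __name__ == "__main__":
    for name, (n_trp, n_tyr) in CONSTRUCTS.items():
        print(f"{name}: {epsilon_280(n_trp, n_tyr)} M^-1 cm^-1")
    # An A280 of 0.479 for CIHis6 corresponds to ~20 uM monomer.
    print(round(concentration_um(0.479, epsilon_280(3, 5)), 1))
```

The hypothetical counts are at least self-consistent: the full-length count (3 Trp, 5 Tyr) equals the sum of the two fragment counts, as expected since the His6 tag contributes no aromatic residues.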
His6-tagged CI was shown to be equivalent to the wild type repressor by several criteria: (i) when expressed from a plasmid, both forms of repressor conferred immunity to infection by 186; (ii) in gel shift assays, purified CIHis6 bound with 5-fold higher affinity than purified wild type protein, presumably because the more rapid Ni-NTA affinity purification procedure gives a higher active fraction of protein; (iii) both CI and CIHis6 associated in solution in an equilibrium between monomers, dimers, tetramers, and octamers; and (iv) in vivo repression of pR and pB lacZ reporters by CI and CIHis6 was very similar (see Table I).
Linker Insertion Mutagenesis-Linker insertion mutagenesis was performed according to the vendor's instructions (New England Biolabs), with minor modifications. In brief, pRAS1 was used in an in vitro reaction (5 μl) containing target DNA (pRAS1, 20 ng), donor DNA (4.2-kb Transprimer pGPS5 carrying the kanamycin resistance gene, 5 ng), and transposase. After a 1-h strand transfer reaction at 37°C, the enzyme was inactivated by heating to 75°C for 10 min. Aliquots of the mixture were transformed by electroporation into NK7049 (pR short lacZ) and plated on LB plates containing kanamycin (20 μg/ml), ampicillin (100 μg/ml), and X-gal (20 μg/ml) to select for transformants carrying the transposon within pRAS1. Blue colonies indicated the potential presence of a transposon within the cI gene, since such an insertion would leave CI unable to repress the pR lacZ reporter. These potential insertion mutants were further screened by a PCR-based assay. Plasmid DNA was isolated from those strains in which the transposon was confirmed to lie within the cI gene. The plasmid DNA was digested with PmeI to remove the bulk of the transposon, and the plasmids were religated and retransformed into strain NK7049 (pR short lacZ). The position and sequence of the resulting 15-bp insert were determined by DNA sequencing. The effects of the mutations were assayed in the same strain by measuring β-galactosidase activity.

FIG. 1. [...] (3,4,5). Sequence numbering starts at the left cos end of the 186 genome. Genes are shown as gray boxes: B, activator of late transcription; 69, unknown function; int, integrase; cI, immunity repressor; apl, excisionase and transcriptional control; cII, establishment of lysogeny. Promoters are shown as arrowheads, their transcripts as arrows, terminators as stem loops, and the phage attachment site attP as a solid box. The CI binding sites at pB, FL, pR, and FR are indicated as solid circles.
The sequences of the inverted repeats from each of these sites are shown in the lower part of the figure. The diamonds indicate the center of symmetry of each inverted repeat. FL and FR each consist of one A type site, pB consists of two A type sites, and pR has the arrangement B-A′-B. The consensus sequences for A type and B type sites are shown, where w = A or T, y = C or T, and r = A or G. The central w in the A type consensus is optional, reflecting the alternative spacing of A and A′ type sites.
Limited Proteolysis-CIHis6 (1.1 mg/ml) in TEG150 was digested at 37°C at CI:protease molar ratios of 280 (subtilisin), 370 (papain), or 1100 (proteinase K). At appropriate time points, 5-μl samples were diluted into an equal volume of 2× SDS loading buffer containing 20 mM PMSF, immediately heated to 95°C for 1 min, and analyzed by SDS-PAGE. For samples to be analyzed by mass spectrometry, reactions (20 μl) were stopped by the addition of PMSF and heating. Electrospray mass spectrometry was kindly performed by Dr. C. Bagley, Institute of Medical and Veterinary Science, Adelaide, Australia. Cleavage points were deduced from the mass spectrometry results using the PAWS program.
Chromosomal Single Copy lacZ Fusions-Strain NK7049 transformed with the appropriate pMRR9 derivative was used as the host for growth of the RS45 phage vector. Phage stocks obtained were plated on NK7049, and single recombinant plaques were selected on the basis of color in the presence of X-gal and purified once by streaking across a lawn of NK7049. Independent blue lysogens from at least two recombinant plaques were purified by restreaking. The single-copy status of these lysogens was confirmed by PCR (12). For assay of pR or pB β-galactosidase activity, the appropriate CI expression plasmid (pETCIHis6 or the parental pET3a plasmid) was transformed into these lysogens, and liquid cultures were started from single colonies.
Kinetic LacZ assays were done in 96-well microtitre plates by an extensively modified Miller method (13). Fresh colonies on selective LB plates were used to inoculate 200 μl of LB plus antibiotic. Plates were sealed and incubated at 37°C for ~16 h without shaking. These cultures were subcultured by diluting 2 μl into 98 μl of fresh medium and incubated with rotation to an A600 of 0.2-1.2 (log phase). A600 was measured using a Labsystems Multiskan Ascent plate reader with a 620-nm filter; the A620 values were converted to A600 (1-cm path length) values using an empirically derived relationship and adjusted for light-scattering non-linearity according to Ref. 14. Cells were chilled and then permeabilized with polymyxin B (15) by adding 20 μl of culture plus 30 μl of LB to 150 μl of lysis buffer in a microtitre plate. Lysis buffer was TZ8 (100 mM Tris-HCl, pH 8.0, 1 mM MgSO4, 10 mM KCl) plus 2.7 μl/ml 2-mercaptoethanol and 50 μg/ml polymyxin B. Adding detergents or chelating agents did not improve the assay. A higher pH than that used by Miller (13) increased the absorbance of o-nitrophenol in the absence of the Na2CO3 normally added to stop the reaction. Assays were performed at 28°C and were initiated by addition of 40 μl of o-nitrophenyl-β-D-galactoside (4 mg/ml in TZ8). The A414 of the reaction was read every 2 min for 1 h, and enzyme activity was determined as the slope of the line of best fit of A414 versus minutes (readings with A414 > 2.5 were ignored). Enzyme activity was found to be directly proportional to the A600 of the culture and to the volume of culture added to the assay (V, in μl). LacZ units were calculated as 200,000 × (A414/min)/(A600 × V) and were roughly equivalent to standard Miller units.
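The unit calculation above combines a least-squares slope with a simple normalization. A minimal sketch, assuming that "line of best fit" means an ordinary least-squares line, that each well's readings are supplied as (minute, A414) pairs, and that A600 has already been converted to a 1-cm path length value; the synthetic well in the example is invented for illustration.

```python
# Minimal sketch of the kinetic LacZ-unit calculation described above.
# ASSUMPTIONS: `readings` is a list of (minute, A414) pairs for one well,
# "line of best fit" means ordinary least squares, and A600 has already been
# converted to a 1-cm path length value.

def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def lacz_units(readings, a600, volume_ul, cutoff=2.5):
    """LacZ units = 200,000 * (A414/min) / (A600 * V), V in microlitres."""
    # Readings above the cutoff are ignored, as in the assay description.
    kept = [(t, a) for t, a in readings if a <= cutoff]
    if len(kept) < 2:
        raise ValueError("need at least two readings at or below the cutoff")
    minutes, a414 = zip(*kept)
    slope = least_squares_slope(minutes, a414)  # A414 per minute
    return 200_000 * slope / (a600 * volume_ul)


if __name__ == "__main__":
    # Synthetic well: reads every 2 min for 1 h, 20 ul of culture at A600 = 0.5.
    well = [(t, 0.05 + 0.01 * t) for t in range(0, 61, 2)]
    print(round(lacz_units(well, a600=0.5, volume_ul=20), 1))  # 200.0
```

Dropping readings above 2.5 before fitting keeps saturated late time points from flattening the slope of a highly active well.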
Analytical Ultracentrifugation-Sedimentation experiments were performed in a Beckman XL-I analytical ultracentrifuge using absorbance optics and a four-hole An60Ti rotor. Approximately 100 μl of sample and 105 μl of reference solution were loaded into the sectors of Epon centerpieces. After 24 h of centrifugation, scans were compared at 3-h intervals to ensure that equilibrium had been reached. Data were collected at 280 nm at a radial spacing of 0.003 cm. The buffer for all experiments was TEG150. Protein was prepared for centrifugation by exhaustive dialysis against TEG150, and the dialysate was used as the reference solution. Buffer density (ρ) was measured in an Anton-Paar precision density meter as 1.03953 g/ml at 5°C and 1.03644 g/ml at 20°C. The partial specific volumes (v̄) were calculated (using the SEDNTERP program) as 0.727 ml/g for CIHis6, 0.712 ml/g for CI(1-82)His6, and 0.736 ml/g for CI(83-192)His6.
Sedimentation data were analyzed using SigmaPlot 4.0 for Windows (SPSS Inc., Chicago, IL), initially by fitting each data set (absorbance versus radial distance) individually to Equation 1, the basic sedimentation equilibrium equation, to estimate whole-cell molecular weights.

$$A_r = A_{r,0}\,\exp\!\left[\frac{M(1-\bar{v}\rho)\,\omega^2}{2RT}\left(r^2 - r_0^2\right)\right] + e \qquad (1)$$
where $A_r$ and $A_{r,0}$ are the absorbances at radial distances $r$ and $r_0$, $M$ is the apparent molecular weight, $\bar{v}$ is the partial specific volume, $\rho$ is the solution density, $\omega$ is the rotor speed in radians per second, $T$ is the absolute temperature in kelvin, $R$ is the gas constant, and $e$ is a baseline error term. $M$, $A_{r,0}$, and $e$ were fitting parameters. All data sets were then analyzed globally by fitting to an extended version of Equation 1, modified to take into account association between species.

Surface Plasmon Resonance (SPR)-Surface plasmon resonance experiments were conducted on a Biacore 2000 using a streptavidin-coated chip (SA chip, BIAcore AB, Sweden). Biotinylated DNA (PAGE-purified, SigmaGenosys, Sydney) was prepared by adding a slight molar excess of the non-biotinylated strand over the biotinylated strand. The strands were annealed by heating to 90°C for 3 min, followed by slow cooling to room temperature. Between 68 and 135 response units (RU) of DNA were immobilized at a flow rate of 5 μl/min. Flow cell 1, containing no DNA, was used as a reference channel, while flow cell 2, containing two tandem OL operator sites (90 RU), was used as a negative control. A32A DNA (68 RU) was immobilized in flow cell 3, and B32B DNA (135 RU) in flow cell 4.
The binding buffer for Biacore experiments was TEG150. Proteins, diluted in TEG150, were pumped across all four flow cells at 20 μl/min and 25°C, and responses were recorded at 1 Hz.
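The immobilization levels (68 RU of A32A, 135 RU of B32B) set the maximum response to expect when the DNA is saturated. A minimal sketch of the standard estimate, Rmax = (M_analyte/M_ligand) × R_ligand × n, assuming that the SPR signal is proportional to bound mass, that protein and DNA give the same response per unit mass (only approximately true), and that all immobilized DNA is active. The molecular weights and stoichiometry in the example are hypothetical placeholders, not values from this study.

```python
# Minimal sketch: theoretical maximum SPR response for analyte binding to an
# immobilized ligand. ASSUMPTIONS: response proportional to bound mass, equal
# response per unit mass for protein and DNA, 100% active ligand.

def theoretical_rmax(analyte_mw: float, ligand_mw: float,
                     ligand_ru: float, stoichiometry: float = 1.0) -> float:
    """Rmax (RU) = (analyte MW / ligand MW) * immobilized ligand RU * n."""
    if analyte_mw <= 0 or ligand_mw <= 0:
        raise ValueError("molecular weights must be positive")
    return analyte_mw / ligand_mw * ligand_ru * stoichiometry


if __name__ == "__main__":
    # Hypothetical numbers: a ~20 kDa duplex and a 40 kDa protein dimer,
    # one dimer bound per duplex.
    for label, ru in (("A32A", 68), ("B32B", 135)):
        print(label, theoretical_rmax(40_000, 20_000, ru))
```

Because Rmax scales with the immobilized RU, responses from flow cells 3 and 4 are easier to compare after normalizing each to its own Rmax.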

RESULTS
Homologs of 186 CI-The first step in investigating the structure-function relationship of 186 CI was to search for homologs. A number of proteins related to the 186 CI repressor were identified by BLAST (16) (Fig. 2). The 186 CI amino acid sequence was first used to search the protein database, and four prophage proteins with homology to the CI repressor were identified (repressors from phage R67, Haemophilus influenzae phages HP1 and S2, and Vibrio cholerae phage K139). The unfinished microbial genomes database was also searched with 186 CI as the input sequence, and two additional proteins were found. The first was from Klebsiella pneumoniae (WUSTL Genome Sequencing Center) and, judging from other sequence similarities, appears to be the CI homolog of a phage closely related to 186, present as a prophage. The second was a putative prophage repressor from Salmonella typhi (CT18 phage) (Sanger Center, Cambridge, UK). Several partial sequences related to 186 CI were also evident in the unfinished genomes of other Salmonella subspecies (typhimurium, paratyphi, enteritidis) but have not been included here. A block alignment of these 186-like proteins was then used to search the BLOCKS database (24). This more sensitive search method detected, in addition to the proteins already identified, the CI protein of φ80, a lambdoid phage, as related to the 186-like repressors. Alignment of the 186 CI amino acid sequence with those of the seven related repressor proteins reveals two blocks of homology, one of ~70 amino acids at the N terminus and a second of ~60 amino acids at the C terminus (Fig. 2). The two blocks are separated by a low-homology region of 40-50 amino acids, which may represent an unstructured linker joining two more highly structured domains. In the case of the φ80 repressor, homology was less evident at the C-terminal end.
The C termini of the lambdoid repressors form part of the RecA recognition site, with cleavage of the repressor occurring within the central linker (25,26). 186 CI is not RecA-sensitive (7).
Each CI-like protein in Fig. 2 was examined for the presence of protein motifs using a number of search methods. Potential helix-turn-helix motifs were identified in some of the proteins by both the Dodd and Egan (22) weight matrix method and the GYM2 pattern recognition method (23). The S.D. scores obtained for each protein by the Dodd and Egan method are shown in parentheses following each sequence in Fig. 2. An S.D. score of 2.5 or greater is considered good evidence for an HTH motif. Likely HTH motifs were identified in some of the proteins, and the two search methods always identified the same location as the potential HTH region (solid line in Fig. 2). Thus, although a number of the proteins, including CI itself, have poor S.D. scores, the predicted HTH motifs all fall at the same aligned position within the N-terminal block of homology. We take this as evidence that 186 CI very likely contains an N-terminal helix-turn-helix motif.
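As we understand it, a weight-matrix method of the Dodd and Egan type scores every window of a sequence against a position-specific matrix and reports the best window's score in standard-deviation units relative to a reference score distribution, so that 2.5 or more flags a probable HTH. The sketch below shows only that scoring logic. The three-position matrix and the reference mean and S.D. are toy values invented for illustration, not the published Dodd and Egan matrix or its calibration constants.

```python
# Toy sketch of weight-matrix scoring in S.D. units. ASSUMPTIONS: the matrix,
# reference mean and reference S.D. are hypothetical illustration values; the
# real Dodd and Egan matrix is longer and has its own calibration.

from typing import Dict, List, Tuple

WeightMatrix = List[Dict[str, float]]  # one {residue: weight} dict per position


def window_scores(seq: str, matrix: WeightMatrix) -> List[float]:
    """Sum of position weights for every full-length window; unlisted residues score 0."""
    width = len(matrix)
    return [
        sum(matrix[i].get(seq[start + i], 0.0) for i in range(width))
        for start in range(len(seq) - width + 1)
    ]


def best_sd_score(seq: str, matrix: WeightMatrix,
                  ref_mean: float, ref_sd: float) -> Tuple[int, float]:
    """Return (0-based start of best window, its score in S.D. units)."""
    scores = window_scores(seq, matrix)
    if not scores:
        raise ValueError("sequence shorter than the matrix")
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, (scores[best] - ref_mean) / ref_sd


if __name__ == "__main__":
    toy_matrix = [{"A": 2.0}, {"G": 3.0}, {"V": 1.0}]
    start, sd = best_sd_score("KKAGVKK", toy_matrix, ref_mean=1.0, ref_sd=2.0)
    print(start, sd, "likely HTH" if sd >= 2.5 else "weak")
```

Reporting the best window's position alongside its score is what allows the predicted HTH regions of different proteins to be compared against the multiple sequence alignment.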
Domain Structure of CI-We have used two techniques, linker insertion mutagenesis and limited proteolysis, to investigate the domain structure of the 186 CI repressor protein, with the aim of determining whether one or both of the putative domains have the potential to bind DNA.
In linker insertion mutagenesis (27,28), a short stretch of amino acids is inserted into the protein of interest. The effect of the inserted amino acids on the activity of the protein depends on the location of the insertion. For example, an insertion located in a surface loop or in a relatively unstructured region would be expected to have a minor effect on protein function. In contrast, an insertion within a tightly folded or buried region is more likely to interfere with protein function, whether by disrupting protein structure or folding or by reducing protein stability. To define the regions of CI that are either tolerant or intolerant to insertions, we used the Genome Priming System-Linker Scanning system (New England Biolabs, Beverly, MA). In this system, a modified Tn7-based transposon (1.7 kb) is used in an in vitro reaction to make random insertions into the gene of interest. The majority of the transposon sequence is then removed by restriction digestion and religation, leaving a 15-base pair insertion. In four of the six possible reading frames, this insertion results in a five-amino acid linker, while the other two reading frames generate stop codons. Thus, two sets of CI mutants were generated: (i) a set of truncated proteins and (ii) a set of proteins containing randomly located five-amino acid insertions. The activities of these CI mutants were measured by their ability to repress a single-copy, chromosomally inserted pR lacZ reporter, NK7049 (RS45 pR short lacZ). In this system, unrepressed pR lacZ gave 839 (± 77) units, while wild type CI from pRAS1 repressed pR lacZ to 0.7 (± 0.8) units.
Transposon insertions that resulted in truncated protein products occurred at amino acids 36, 61, 74, 103, 106, 107, 110, 112, 123, 173, and 185. These truncated CI proteins invariably lost the ability to repress the pR lacZ reporter. Even a protein truncated at amino acid 185, which lacks just seven C-terminal amino acids, was inactive, indicating that these amino acids are required for CI function or stability.
FIG. 3. Linker insertion mutagenesis. A series of repressor mutants was generated in which 15 bp of DNA were randomly inserted into the plasmid (pRAS1)-encoded cI gene. Depending on the reading frame, this 15-bp insertion gave rise to either a truncated protein or a protein with a five-amino acid insertion. The activities of the various CI proteins were assayed in NK7049 (RS45 pR short lacZ). Unrepressed pR lacZ (parental pBluescriptKS only) gave 839 (± 77) β-galactosidase units, while wild type CI (supplied from pRAS1) repressed pR to 0.7 (± 0.8) units. None of the 11 truncated proteins (see "Results") was able to repress the single-copy pR lacZ reporter gene. The locations of the five-amino acid insertions within CI are shown. Insertions shown to the right of the figure reduced or eliminated the ability of CI to repress the pR lacZ reporter gene; the activity of each of these mutants is shown as the mean of at least three determinations (± 95% confidence limits). CI mutants containing the insertions shown to the left of the figure retained the ability to repress the reporter, giving less than two lacZ units. The two shaded areas represent the blocks of homology described in Fig. 2.

FIG. 2. [...] (20); φ80, CI repressor protein of coliphage φ80 (21). Where at least five of the eight amino acids are identical or conserved, they are shown on a black or gray background, respectively. Each protein sequence was also examined for motifs. The Dodd and Egan weight matrix method (22) and the GYM2 pattern recognition algorithm (23) both detected potential HTH motifs. The S.D. scores obtained for each protein by the Dodd and Egan method are given in parentheses after the sequences. A score of 2.5 or greater indicates a likely HTH motif. Both methods always identified the same sequence within each protein as the most likely to contain an HTH motif, and these sequences, indicated by the black line, are all aligned in the multiple sequence alignment.

Among the second set of mutants (Fig. 3), insertions of five amino acids had quite different effects, depending on the position of the insertion. β-Galactosidase units for the inactive or partially active mutants are given to the right of Fig. 3; active mutants were defined as those that gave less than two units of β-galactosidase activity and are shown to the left of Fig. 3. Insertions within the N-terminal region, with one exception, abolished the ability of CI to repress pR. The exception was the insertion at amino acid 5, which lies just outside the conserved N-terminal region; this mutant remained fully active. Western blotting of cell extracts prepared from the inactive mutants indicated that the inactivity of two of the mutants (insertions at amino acids 15 and 66) reflected a lack of CI in the soluble fraction (data not shown). Together these data are consistent with the idea that insertions within the conserved HTH-containing N-terminal region disrupt folding and/or protein stability and are thus detrimental to DNA binding. In contrast, mutants with insertions within the putative linker region remained able to fully repress pR, suggesting that the central region of CI is relatively unstructured or forms part of a surface loop and is thus tolerant to insertions. Only three insertions were obtained within the C-terminal region. The insertion at amino acid 167 had no effect on CI activity, suggesting that this residue may also be located on the surface of the protein.
Insertions at amino acids 139 and 156 reduced but did not eliminate the ability of CI to repress p R . One possibility is that insertions in this region disrupt protein-protein association or cooperativity, although we have not tested this explicitly.
Limited proteolysis was used to further probe the domain structure of CI, the principle being that structured regions of a protein are more resistant to low levels of protease than an unstructured linker region. Purified CIHis6, which we have shown to be equivalent to wild type CI ("Experimental Procedures"), was digested with low levels of protease (papain, proteinase K, or subtilisin), aliquots were removed over the course of the digestion, and the reactions were quenched with PMSF. These samples were analyzed by SDS-PAGE using 10% Tris-Tricine gels (29) (Fig. 4). At early time points, a stable fragment of ~14 kDa was observed with each of the proteases. A fragment representing the remainder of the protein (expected size ~8.5 kDa) was not observed. In the subtilisin and proteinase K digests, the 14-kDa CI fragment was further digested at later time points to a stable 8- to 9-kDa fragment.
The boundaries of these stable fragments were determined by analyzing samples from each of the digests by electrospray mass spectrometry. The results are summarized in the lower part of Fig. 4. All three proteases cleaved within the presumably unstructured C-terminal six-histidine affinity tag of CI. The 14-kDa fragment from the subtilisin digest represents the C-terminal region of CI, cut primarily at amino acid 79, along with some minor products cut within a few amino acids on either side of residue 79, consistent with the nonspecific nature of this protease. The smaller subtilisin fragment(s) obtained at later time points result from further digestion at both ends of the larger fragment, giving a minimal polypeptide consisting of residues 116-198. This result suggests that the C-terminal region is a stable, folded domain. The absence of a stable N-terminal fragment suggests that the N-terminal region is at least partly unstructured in the absence of DNA and so is susceptible to proteolysis. However, repeating the proteolysis experiment in the presence of an oligonucleotide containing the FR CI binding site gave an identical pattern of cleavage, suggesting either that the N-terminal region remains susceptible to proteolysis when bound to DNA or that, upon cleavage elsewhere within the bound protein, the N-terminal fragment dissociates from the DNA and then becomes susceptible to the protease.
Limited proteolysis with papain gave fragments cleaved primarily around residues 79 and 80, again with some other minor products cleaved at nearby residues. Digestion of CIHis6 with proteinase K also gave cleavage at residues 77-79 at early time points, followed by cleavage primarily at residue 110 at later time points (Fig. 4). This protease-sensitive region of CI (approximately amino acids 77-116) is consistent with the central non-conserved region shown in Fig. 2.
Properties of CI Domains-To further examine their biochemical properties in comparison with the full-length repressor, residues 1-82 (N-terminal region) and residues 83-192 (linker plus C-terminal region) of CI were cloned, expressed, and purified with a C-terminal six-histidine affinity tag. Both fragments were soluble and were obtained in milligram amounts.
FIG. 4. Limited proteolysis of CIHis6. CIHis6 (1.1 mg/ml) was digested with subtilisin (4 μg/ml), papain (3 μg/ml), or proteinase K (1 μg/ml) in a 50-μl reaction. Samples (5 μl) were taken at the time points indicated and quenched with PMSF. Samples were heated to 90 °C for 2 min and analyzed on a 10% Tris-Tricine SDS gel. The full-length protein is indicated by the arrows. The sizes of the molecular weight markers (M) are shown to the right of the gels. To determine the location of the cleavage points, samples from selected time points were subjected to electrospray mass spectrometry, and the masses of the peptides were used to infer the point of cleavage. The results are summarized in the lower part of the figure. The N-terminal and C-terminal regions of homology, described in Fig. 2, are shaded, while the His6 affinity tag is in black. Lines indicate the major fragments obtained for each of the proteases.

Full-length wild type CI repressor associates in solution in an equilibrium between monomers, dimers, tetramers, and octamers (10). The His6 affinity-tagged CI also associates to octamers in solution (data not shown), with dimers the predominant species over the concentration range in which DNA binding first occurs. The oligomeric states of the N-terminal and C-terminal fragments of CI were assessed by sedimentation equilibrium (Fig. 5). For CI(1-82)His6, data were obtained at three loading concentrations (cell 1, 5 μM; cell 2, 16 μM; cell 3, 32 μM) at a rotor speed of 24,000 rpm (Fig. 5a). The individual scans were first analyzed in terms of Equation 1 to obtain whole cell molecular weights. These ranged from 15,870 (± 420) for cell 1 to 18,660 (± 100) for cell 3, values approaching twice the monomer molecular weight (10,549). The data for all three cells were then fitted globally to a number of association schemes, with the monomer molecular weight fixed at 10,549. The best fit (as judged by the sum of squares of the residuals) was to a monomer-dimer equilibrium with an association constant of 2.5 × 10⁵ M⁻¹ (ΔG = −6.9 kcal/mol). When the monomer molecular weight (M1) was included as an additional fitting parameter, the association constant was unchanged, and a value of 10,770 (± 170) was obtained for M1. There was no evidence for species larger than dimer. Thus, the N-terminal fragment is able to form stable dimers in solution, albeit with an association constant at least 10⁴-fold weaker than that of the full-length protein (10). It therefore seems reasonable to refer to the CI(1-82)His6 fragment as a domain, even though it is not highly resistant to proteolysis.
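As a rough consistency check on the monomer-dimer fit, the sketch below predicts the weight-average molecular weight of CI(1-82)His6 from the fitted association constant at each loading concentration. It assumes an ideal, homogeneous solution at the loading concentration; measured whole cell averages integrate over the radial concentration gradient described by Equation 1 (not reproduced here), so only approximate agreement should be expected. The function names are illustrative.

```python
import math

M1 = 10_549   # monomer molecular weight of CI(1-82)His6 (from the text)
K_A = 2.5e5   # fitted monomer-dimer association constant, M^-1 (from the text)


def free_monomer(c_total: float, k: float = K_A) -> float:
    """Free monomer concentration (M) for 2 M <-> D with K = [D]/[M]^2.

    Mass balance in monomer units, c_total = [M] + 2K[M]^2, is a quadratic
    in [M]; the positive root is returned.
    """
    return (-1.0 + math.sqrt(1.0 + 8.0 * k * c_total)) / (4.0 * k)


def weight_average_mw(c_total: float, k: float = K_A, m1: float = M1) -> float:
    """Weight-average molecular weight of an ideal monomer-dimer mixture."""
    c1 = free_monomer(c_total, k)
    c2 = k * c1 ** 2
    w1, w2 = m1 * c1, 2 * m1 * c2           # mass concentrations of each species
    return (w1 * m1 + w2 * 2 * m1) / (w1 + w2)


for c_um in (5, 16, 32):                     # loading concentrations, micromolar
    print(c_um, round(weight_average_mw(c_um * 1e-6)))
```

At 5 and 32 μM this simple model gives roughly 16,200 and 18,800, close to the measured whole cell values of 15,870 and 18,660.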
Data for CI(83-192)His6 were obtained at three loading concentrations (cell 1, 19 μM; cell 2, 50 μM; cell 3, 77 μM) and two rotor speeds (12,000 rpm (Fig. 5b) and 18,000 rpm (Fig. 5c)). Analysis in terms of Equation 1 gave whole cell molecular weights ranging from 57,900 (± 1,200) to 85,760 (± 1,700), indicating association to a species at least 6.3-fold larger than the monomer (M1 = 13,490). All six data sets were then fitted globally to a number of models. The best fit (Fig. 5, b and c) was to a dimer-octamer association with an association constant of 1.4 × 10¹⁵ M⁻³ (ΔG = −19.2 kcal/mol). It was not possible to obtain data on the monomer-dimer association, which was essentially complete over the concentration range accessible in the sedimentation experiments. Nor was it possible to obtain information about tetramer formation, since the tetramer-to-octamer transition is concerted (energetically favored) and tetramer is not a significantly populated species. However, the free energy of association per dimer for the dimer-to-octamer transition is −4.8 kcal/mol for CI(83-192)His6, compared with −5.3 kcal/mol for wild type CI (10). This comparison suggests that most of the free energy of CI association derives from interactions between C-terminal domains.
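The free energies quoted in this section follow from ΔG = −RT ln K (1 M standard state). The sketch below reproduces them; the experimental temperature is not given in this section, so T = 277.15 K (4 °C) is an assumption, chosen because it reproduces both reported values to within about 0.1 kcal/mol. It also shows where the per-dimer figure appears to come from: the dimer-to-octamer free energy divided by the four dimers in an octamer.

```python
import math

R = 1.987204e-3   # gas constant, kcal mol^-1 K^-1
T = 277.15        # ASSUMED temperature (4 degrees C); not stated in this section


def delta_g(k_assoc: float, temp: float = T) -> float:
    """Standard free energy change (kcal/mol), dG = -RT ln K, 1 M standard state."""
    return -R * temp * math.log(k_assoc)


dg_monomer_dimer = delta_g(2.5e5)    # CI(1-82)His6, K in M^-1
dg_dimer_octamer = delta_g(1.4e15)   # CI(83-192)His6, 4 D <-> O, K in M^-3
dg_per_dimer = dg_dimer_octamer / 4  # four dimers make one octamer

print(f"{dg_monomer_dimer:.2f} {dg_dimer_octamer:.2f} {dg_per_dimer:.2f}")
```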
The functions of the domains were tested in two ways: (i) the ability to repress a reporter gene under control of the early lytic promoters pR or pB, and (ii) binding in vitro to A type and B type sites. The ability of these protein fragments to repress single copy, chromosomally inserted pR lacZ and pB lacZ reporters in vivo was tested (Table I). CI and its variants were expressed from the T7 promoter of pET3a-based plasmids in a strain lacking T7 polymerase. There was sufficient "leakage" of expression in this system to give approximately the same level of CI expression as that found in a 186 lysogen (data not shown).
The strong pR promoter was repressed ~230-fold, from 534 units in the absence of CI to 2.3 units in the presence of full-length CIHis6. The pB promoter, which at 139 units is about 4-fold weaker than pR, is also less strongly repressed by CI, retaining 4.6 units of activity in the presence of full-length CIHis6, a 30-fold repression. This weaker repression of pB probably reflects the number and arrangement of the CI operators at the respective promoters (Fig. 1). Neither CI(1-82)His6 nor CI(83-192)His6 was able to repress pR in this system (Table I). Similarly, neither domain repressed the pB lacZ reporter. Thus, at least at the protein concentrations generated in this assay system, the CI fragments had lost the capacity to repress. This is consistent with the inability of the CI truncation mutants (Fig. 3a) to repress pR, even when expressed from the high copy number plasmid pRAS1. It is possible, however, that the isolated CI domains can bind the CI operators but are unable to bring about repression of the promoter.
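Fold repression here is the reporter activity without repressor divided by the activity with repressor. The sketch below recomputes the figures from the activities quoted above; the dictionary layout and function name are illustrative.

```python
# Reporter activities (units) quoted in the text for the chromosomal lacZ fusions.
activity = {
    "pR": {"no CI": 534.0, "CIHis6": 2.3},
    "pB": {"no CI": 139.0, "CIHis6": 4.6},
}


def fold_repression(unrepressed: float, repressed: float) -> float:
    """Activity without repressor divided by activity with repressor."""
    return unrepressed / repressed


for promoter, a in activity.items():
    print(promoter, round(fold_repression(a["no CI"], a["CIHis6"])))

# Relative promoter strength in the absence of CI.
print("pR/pB", round(activity["pR"]["no CI"] / activity["pB"]["no CI"], 1))
```

This gives about 232-fold repression for pR and 30-fold for pB, and shows pR to be about 3.8-fold stronger than pB without CI, matching the rounded "~230-fold" and "4-fold" figures in the text.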
The ability of the N- and C-terminal fragments of CI to bind DNA was measured in vitro by surface plasmon resonance (SPR). This technique measures binding between macromolecules by detecting changes in refractive index at the surface of a sensor chip, the response being proportional to the mass of macromolecule bound. Whereas a pair of A type CI recognition sites is found at pB, B type sites occur only in combination with an A′ site at pR. To distinguish between CI binding to A type and B type sequences, we used synthetic oligonucleotides to generate (i) a tandem pair of A sites separated by 32 base pairs, the natural spacing found at pB (A32A), (ii) a tandem pair of B sites separated by the same distance (B32B), and (iii) a tandem pair of λ OL1 operators as a control for nonspecific binding. The sequences of the oligonucleotides used are given under "Experimental Procedures." The lower strand of each oligonucleotide was biotinylated to allow attachment to a streptavidin-coated biosensor chip. The results, corrected for bulk refractive index changes by subtracting the response of a control (no DNA) flow cell, are shown in Fig. 6. Comparison of full-length CI binding to A type (Fig. 6a) and B type (Fig. 6b) sites showed that CI bound more strongly to A type sites, as indicated by a greater response at a given protein concentration. A titration performed over a 100-fold range of CI concentrations showed that CI bound to A32A at least 10-fold more strongly than to B32B (data not shown). During this titration, nonspecific binding of CI became apparent at concentrations above 1 μM, but only if the DNA contained CI operators. It appears that nonspecific binding may be seeded from specifically bound repressor, similar to the phasing seen with HK022 repressor (30). This phenomenon precluded analysis of the SPR data in terms of the equilibrium binding response, while the likely multivalency of the CI-DNA interaction prevented meaningful analysis of the binding kinetics (31).
Binding of the purified CI domains was therefore examined only qualitatively by SPR. Purified CI(1-82)His6 bound to both A and B type sequences, albeit weakly, with binding to B type sequences evident only at a concentration of 10 μM. CI(83-192)His6 gave no response above background with either type of site, even at 10 μM (data not shown).
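The bulk refractive index correction used for these SPR data amounts to subtracting the control (no DNA) flow cell sensorgram from the sample sensorgram point by point. The sketch below assumes both sensorgrams are sampled at identical time points; the response values are invented for illustration and are not data from this study, and the function names are hypothetical.

```python
from collections.abc import Sequence


def reference_subtract(sample: Sequence[float], control: Sequence[float]) -> list[float]:
    """Subtract the control (no DNA) flow cell response point by point.

    What remains approximates the response due to protein bound to the
    immobilized DNA, with bulk refractive index changes removed.
    """
    if len(sample) != len(control):
        raise ValueError("sensorgrams must have the same number of points")
    return [s - c for s, c in zip(sample, control)]


def bound_mass_ratio(response_a: float, response_b: float) -> float:
    """SPR response scales with bound mass, so this ratio compares the amount
    of protein bound to two surfaces at the same protein concentration."""
    return response_a / response_b


# Hypothetical responses (RU) at three time points; not data from this study.
control = [100.0, 110.0, 112.0]
a_site = reference_subtract([120.0, 260.0, 310.0], control)
b_site = reference_subtract([105.0, 130.0, 140.0], control)
print(a_site, b_site, bound_mass_ratio(a_site[-1], b_site[-1]))
```

A larger corrected response at the same protein concentration indicates more protein bound, which is the qualitative comparison made above; as noted, nonspecific spreading and multivalency prevent converting such ratios into affinities.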
Hybrid Repressor-The isolated N-terminal domain dimerizes only weakly and has therefore probably also lost the capacity for cooperative interactions between adjacently bound dimers, leading to a lower overall affinity for its sites. We sought to restore this capacity for stronger association and dimer-dimer cooperativity by creating a hybrid protein consisting of the 186 CI N-terminal domain and the well characterized λ CI repressor C-terminal domain (32) (Fig. 7a). We reasoned that if this fusion protein could bind to both A and B type sites, then the residues necessary for binding to both types of site must be present in the N-terminal region. A chimeric repressor consisting of the 186 CI N-terminal domain (residues 1-82 of 186 CI) and the λ CI C-terminal oligomerization domain (residues 92-236) was cloned, expressed, and purified, and its ability to bind A and B type sites was tested in vivo and in vitro. The chimeric repressor was able to repress both pR and pB lacZ reporters in vivo (Table I), although not to the same extent as full-length CIHis6.
In SPR experiments (Fig. 7, b and c) the hybrid repressor bound to both A and B type sites, although with somewhat lower affinity than the wild type 186 repressor. Control Biacore experiments showed that full-length λ repressor had no affinity for either A or B type 186 sequences. Taken together, these results indicate that at least some of the binding determinants for both A and B type sequences are located in the 186 CI N-terminal region (amino acids 1-82). The reduced affinity of the hybrid compared with the wild type repressor is presumably due to less than optimal cooperativity between adjacently bound dimers, since operator-to-operator spacings differ between λ and 186.
Mutagenesis of Helix-Turn-Helix-To test whether the determinants for binding to A and B type sites are located in the same DNA binding motif of 186 CI, critical residues in the predicted helix-turn-helix motif were mutated. Residues were chosen so that the mutations would move the sequence away from the 186-like repressor consensus without disrupting the structure of the protein (Fig. 8). Residues 12 and 13 of the HTH motif are commonly involved in sequence-specific interaction with the DNA (34). The serines at these positions in 186 CI (amino acids 37 and 38 of CI) were mutated to arginine and glutamic acid, respectively, amino acids that occur frequently at these positions in other HTH motifs. These changes actually improve the match to the Dodd and Egan (22) HTH master set consensus (S.D. score = 0.5 for wild type, 1.4 for mutant). We therefore expected that these changes would not disrupt the HTH motif but would alter its DNA binding specificity. This protein, CI(HTH−)His6, was purified in milligram amounts and was shown by sedimentation equilibrium to self-associate to octamers, as does the wild type protein. This is good evidence that the mutations do not greatly affect protein folding and specifically affect DNA recognition.
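Because HTH positions 12 and 13 correspond to CI residues 37 and 38, motif numbering converts to CI numbering with a fixed offset of 25. The sketch below assumes the 20-residue motif is numbered contiguously with no gaps; the helper name is hypothetical.

```python
# HTH positions 12 and 13 are CI residues 37 and 38 (from the text), so
# CI residue = HTH position + 25, assuming contiguous numbering of the motif.
OFFSET = 37 - 12


def hth_to_ci(position: int) -> int:
    """Convert a position in the 20-residue HTH motif to a 186 CI residue number."""
    if not 1 <= position <= 20:
        raise ValueError("HTH positions run from 1 to 20")
    return position + OFFSET


# HTH positions commonly contacting DNA, per the general rules cited in the text.
contacts = [1, 11, 12, 13, 17, 20]
print({p: hth_to_ci(p) for p in contacts})

# The CI(HTH-) double mutant written in CI residue numbering.
mutations = {12: ("S", "R"), 13: ("S", "E")}
print([f"{wt}{hth_to_ci(p)}{mut}" for p, (wt, mut) in mutations.items()])
```

Under this assumption the DNA-contacting positions fall at CI residues 26, 36, 37, 38, 42, and 45, and the double mutant is S37R/S38E.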
The mutated protein was unable to repress either pR or pB lacZ reporters in vivo (Table I) and gave no response in Biacore experiments with either A or B type sites (not shown), even at a concentration of 10 μM. In addition, when wild type 186 phage was plated on a strain carrying pETCI(HTH−)His6, the resulting plaques were less turbid than those obtained on a control (pET only) strain, suggesting that CI(HTH−)His6 acts in vivo as a dominant negative mutant against wild type phage-derived repressor. This further supports the idea that the mutant is correctly folded, is able to heteroassociate with wild type repressor subunits, and is unable to bind CI operators. We conclude that serines 37 and 38 within the putative HTH motif of 186 CI are necessary for binding to both A and B type sites.

DISCUSSION

Coliphage 186, like the intensively studied bacteriophage λ, has evolved to follow two distinct but interchangeable developmental pathways. As 186 and λ are almost unrelated at the DNA and protein level, the focus of this laboratory has been to study the genetic switch of 186, since it represents an independently evolved solution to a common problem. One aspect of this study has been to investigate the properties of the 186 lysogenic repressor (3,4,6,10). We have shown previously (6) that the 186 CI repressor binds to four sites within the early control region of the phage and that, among these four sites, are two distinct types of inverted repeat operator sequence, termed A type sites and B type sites. The operators are arranged in the order AA-A-BA′B-A, where the A′ operator has a four rather than five base pair spacing between half-sites. Since 186 CI needs to recognize two different types of sequence, it was possible that CI does so using two distinct DNA binding motifs. Such an arrangement is found, for example, in members of the integrase family of proteins, which recognize core type sites and arm type sites (35,36).
In another example, the recent crystal structure of the bacterial repressor MarA, an araC family member, shows that it contains two HTH motifs, which together bind an asymmetric, degenerate sequence (37). Here we have investigated the structure-function relationship of CI and shown that a single DNA binding motif recognizes both types of site, rather than two distinct DNA binding motifs. We have also shown that CI consists of two domains: an N-terminal domain (nominally amino acids 1-82), which contains a putative helix-turn-helix motif, forms weak dimers in solution, and is responsible for sequence-specific DNA binding; and a C-terminal domain, which together with the linker region forms octamers in solution and has no capacity for DNA binding.
In terms of domain structure, 186 CI is similar to the lambdoid repressors, which also consist of an N-terminal DNA binding domain and a C-terminal domain that mediates dimerization as well as cooperative interactions between adjacently bound dimers (32,38). Indeed, this arrangement of domains is common among many prokaryotic and eukaryotic transcriptional regulators in which protein association is linked to DNA binding (39). Several lines of evidence indicate that 186 CI also uses cooperative interactions. The C-terminal domain alone associates strongly to octamers. Like λ repressor (40), full-length 186 CI exists in solution in an equilibrium between monomers, dimers, tetramers, and octamers, and both proteins have similar free energies of association (10). CI binding sites are arranged so that they lie on the same face of the helix, spaced two or three helical turns apart (6). Gel mobility shift experiments show only one retarded species, whether one (A), two (AA), or three (BA′B) sites are present on the DNA (6). Mutations (vir mutants) in one or two of the inverted repeats at pR diminish overall binding affinity, yet the same retarded complex is observed (6). We also have preliminary evidence that CI bound at pR can interact with CI bound at the flanking sites,¹ similar to the looping observed between λ CI bound at the OR and OL operators (41). Taken together, these points suggest that cooperativity between DNA-bound dimers of CI may be important for the existence of a stable lysogenic state. For λ, the recent crystal structure of the repressor C-terminal domain has provided a model for cooperative binding between dimers bound to adjacent sites, as well as suggesting a mechanism for tetramer-tetramer interactions (42). The availability of λ mutants unable to associate was important in confirming the validity of the proposed models. Sequence alignment of the 186 CI-like repressors shown in Fig. 1 with a set of lambdoid phage repressors was attempted; however, no obvious homologies across families were found, with the exception of the lambdoid φ80 phage repressor (Fig. 1), where homology was primarily in the N-terminal region. One approach to isolating C-terminal association mutants of 186 CI (whether monomer-monomer, dimer-dimer, or tetramer-tetramer) would be to select mutants of CI(HTH−) that no longer display a dominant negative phenotype.
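The logic of such a screen can be illustrated with a simple mixing model. It rests on two assumptions that are not established here for 186 CI: chains assemble at random, and any complex containing a mutant chain is inactive. Under those assumptions heteroassociation is what makes CI(HTH−) dominant, and a second mutation that blocks association removes the effect. The function name is hypothetical.

```python
def wt_in_all_wt_complexes(f_wt: float, n_subunits: int, mixes: bool = True) -> float:
    """Fraction of wild-type chains that end up in complexes made only of wild type.

    Hypothetical model: chains assemble at random into complexes of n_subunits,
    and one mutant chain is assumed to inactivate a complex. With random mixing,
    a given wild-type chain has n_subunits - 1 partners, each wild type with
    probability f_wt. If the mutant cannot associate with wild type
    (mixes=False), wild-type complexes are unaffected.
    """
    if not 0.0 <= f_wt <= 1.0:
        raise ValueError("f_wt must lie between 0 and 1")
    if n_subunits < 1:
        raise ValueError("n_subunits must be at least 1")
    return f_wt ** (n_subunits - 1) if mixes else 1.0


# Equal amounts of wild-type and mutant chains, for dimers and octamers.
for n in (2, 8):
    print(n, wt_in_all_wt_complexes(0.5, n), wt_in_all_wt_complexes(0.5, n, mixes=False))
```

With equal amounts of each chain, half of the wild-type chains remain in all-wild-type dimers, but under 1% (0.5⁷) remain in all-wild-type octamers, whereas a mutant that cannot associate leaves all of them functional.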
186 CI must recognize alternate spacings between the A type half-sites: five base pairs at A sites and four base pairs at the central A′ site of pR. Members of the araC family are able to recognize different half-site spacings by using a flexible linker between the DNA binding domain and the dimerization domain (43). Three lines of evidence suggest that a flexible linker joins the two domains of 186 CI: (i) in the alignment of 186-like phages (Fig. 2), the two blocks of homology are separated by a region (amino acids 74-135) with little sequence homology; (ii) insertions of five amino acids within this region (between amino acids 87 and 124) did not affect the ability of CI to repress pR (Fig. 3); and (iii) limited proteolysis of 186 CI (Fig. 4) resulted in cleavage around amino acid 80 and, at later time points, around amino acid 116, while a stable C-terminal fragment was retained. Thus, a linker of 40 to 60 amino acids may allow CI to recognize the variable (4 or 5 bp) half-site spacing found in A type sites and may also be important in higher order association.
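Why a flexible linker would help can be seen from B-DNA geometry: changing the half-site spacing by one base pair both rotates and shifts one half-site relative to the other. The sketch below assumes textbook B-DNA values (10.5 bp per turn, 0.34 nm rise per base pair); real helical parameters vary with sequence, and the function is illustrative.

```python
BP_PER_TURN = 10.5   # ASSUMED typical B-DNA helical repeat; varies with sequence
RISE_NM = 0.34       # ASSUMED rise per base pair in B-DNA, nm


def helical_geometry(spacing_bp: float) -> tuple[float, float, float]:
    """Return (helical turns, deviation from the nearest whole turn in degrees,
    axial distance in nm) for two points spacing_bp base pairs apart."""
    turns = spacing_bp / BP_PER_TURN
    deviation = (turns - round(turns)) * 360.0
    return turns, deviation, spacing_bp * RISE_NM


# A sites (5 bp between half-sites) versus the A' site (4 bp): one base pair more.
_, twist_1bp, rise_1bp = helical_geometry(1)
print(f"one extra bp: {twist_1bp:.1f} deg rotation, {rise_1bp:.2f} nm")

# Sites 32 bp apart, read here as a center-to-center distance (an assumption).
turns, dev, dist = helical_geometry(32)
print(f"32 bp: {turns:.2f} turns, {dev:+.0f} deg from in phase, {dist:.1f} nm")
```

One base pair corresponds to roughly 34° of rotation and 0.34 nm along the axis, a change that a rigidly coupled pair of DNA binding domains could not easily absorb. The same calculation places sites 32 bp apart about three turns apart and within about 17° of the same face, consistent with the phasing described earlier.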
FIG. 8. Mutation of the helix-turn-helix motif. The proposed structure of the 186 CI HTH motif is shown (33,34). Shaded residues are those proposed to be involved in sequence-specific DNA recognition. The mutations made at serine 12 and serine 13 of the motif (amino acids 37 and 38 of CI) are indicated. The first and last residues of the helices are numbered, as is residue 9, located at the turn and most often a glycine or alanine residue. The bulkier aspartate residue present in 186 CI will most likely restrict the stereochemistry. A common, but not essential, helix-helix interaction between residues 5 and 15, which helps to orient the helices, is indicated by a dashed line.

It is apparent that A and B type sequences are not recognized equally well by 186 CI. Both full-length CI and the N-terminal domain bind to a pair of A sites at least 10-fold more strongly than to a pair of B sites separated by the same distance. Strong repression of the pR promoter by CI is essential for maintaining the lysogenic state, and indeed, in a lysogen, pR is repressed at least 300-fold (4). Strong cooperativity between adjacently bound CI dimers is likely responsible for the overall tight binding at pR, as the individual sites have relatively poor affinity for repressor (6). This strong cooperativity is also manifested, at least in vitro, as spreading of CI binding along the DNA from specifically bound sites (Ref. 6; this work). On the other hand, for the switch from lysogeny to lytic development to be effective, derepression of the lytic promoters by removal of repressor must be rapid and efficient; interfering with cooperativity would be one means of achieving this. In the lambdoid phages this is achieved through RecA*-mediated self-cleavage of the repressor at conserved cleavage points in the linker region (25).
With the loss of dimer-dimer cooperativity, the N-terminal domains then have insufficient affinity for DNA to repress the lytic promoters. In the case of 186, however, the role of RecA is indirect. Induction of a 186 prophage requires a phage-encoded gene, tum, under LexA control, whose product has antirepressor activity (7,44,45). Although 186 CI N-terminal domains also bind DNA less strongly than does the full-length protein (Fig. 5), Tum does not cleave the repressor but acts at some other level (45). One way to pursue the mechanism of Tum action would be to test whether the 186-λ repressor hybrid constructed in this study is susceptible to Tum.
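The argument that losing dimer-dimer cooperativity leaves the N-terminal domains with too little affinity can be illustrated with the standard two-site cooperative binding model, in which a cooperativity factor ω multiplies the statistical weight of the doubly occupied state. The constants below are hypothetical and chosen only to show the effect; they are not estimates for 186 CI.

```python
def site_occupancy(k: float, p: float, omega: float = 1.0) -> float:
    """Fractional occupancy of one of two identical, adjacent operators.

    k     -- intrinsic association constant of a repressor dimer for one site, M^-1
    p     -- free repressor dimer concentration, M
    omega -- cooperativity factor for two dimers bound side by side (1 = none)
    """
    x = k * p
    z = 1.0 + 2.0 * x + omega * x ** 2       # empty, either site alone, both sites
    return (x + omega * x ** 2) / z          # states in which this site is occupied


K_SITE, P_FREE = 1e7, 1e-8                   # hypothetical values, not measured ones
for omega in (1.0, 1000.0):
    print(omega, round(site_occupancy(K_SITE, P_FREE, omega), 3))
```

With ω = 1 the two sites behave independently and each is about 9% occupied; with ω = 1000 the same intrinsic affinity gives about 90% occupancy.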
What are the characteristics of CI that allow it to bind two distinct operator sequences? No structural data are available for 186 CI. There are, however, general rules for recognition of DNA by HTH proteins (33,34). (i) Residues 1, 11, 12, 13, 17, and 20 of the HTH motif contact the DNA; for 186 CI, all of the corresponding residues except Ala-11 could form hydrogen bonds with the DNA via hydroxyl or amino groups. (ii) There is usually a small residue (Gly or Ala) at position 9 of the turn; 186 CI has a bulkier aspartate residue, a substitution likely to restrict the relative orientation of the helices. (iii) Positions 4 and 15 of the HTH should be hydrophobic; 186 CI satisfies this requirement with leucine at both positions. (iv) Position 5 should not be branched; in CI, position 5 is an alanine. Thus, although 186 CI follows the general rules for recognition of DNA by HTH proteins, with the exception of the residue at the turn, its HTH motif matches the Dodd and Egan (22) HTH master set poorly (Fig. 2). Perhaps these differences allow dual site recognition. Our definition of A type and B type sites is strongly supported by mutation data (6), but the two types of site may share sequence characteristics that we have not recognized. Only a limited number of sites are available from which to construct a consensus sequence, compared with some bacterial regulators that have numerous recognition sites within the genome. In vitro site-selection methods could be used to determine which bases are important for recognition by CI. Alternatively, structural data on the CI N-terminal domain bound to A sites versus B sites would answer these questions directly.
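Rules (ii) to (iv) can be checked mechanically using only the residues that the text assigns to HTH positions of 186 CI; positions not given here (for example 1, 17, and 20) are left out rather than guessed. The residue classes, and reading "branched" as β-branched, are simplifying assumptions, and the names are illustrative.

```python
# Residues at HTH positions that the text identifies for 186 CI.
CI_HTH = {4: "L", 5: "A", 9: "D", 11: "A", 12: "S", 13: "S", 15: "L"}
CI_HTH_MUTANT = {**CI_HTH, 12: "R", 13: "E"}   # CI(HTH-): S37R, S38E

HYDROPHOBIC = set("AVLIMFWC")   # assumed, deliberately simple hydrophobicity class
SMALL = set("GA")
BETA_BRANCHED = set("VIT")      # "branched" read as beta-branched (an assumption)


def check_hth_rules(hth: dict[int, str]) -> dict[str, bool]:
    """Apply rules (ii)-(iv) from the text to a {position: residue} mapping."""
    return {
        "(ii) Gly or Ala at position 9": hth[9] in SMALL,
        "(iii) positions 4 and 15 hydrophobic": hth[4] in HYDROPHOBIC and hth[15] in HYDROPHOBIC,
        "(iv) position 5 not branched": hth[5] not in BETA_BRANCHED,
    }


for name, seq in (("wild type", CI_HTH), ("HTH- mutant", CI_HTH_MUTANT)):
    print(name, check_hth_rules(seq))
```

Only rule (ii) fails, reflecting the aspartate at the turn, and the S37R/S38E substitutions leave all three structural checks unchanged, in line with the expectation that the mutant motif still folds.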