DNA binding by the coliphage 186 repressor protein CI.

The cI gene of coliphage 186 maintains lysogeny and confers immunity to 186 infection by repressing the major early promoter, p(R), and the promoter for the late transcription activator gene, p(B). Gel mobility shift and DNase I footprinting show that CI protein binds to the DNA at p(R) and p(B) and also to sites approximately 300 base pairs upstream and downstream of p(R), called FL and FR. Mutations which cause virulence reduce CI binding to p(R). The biochemical and genetic data identify three CI operators at p(R), two at p(B), and single operators at FL and FR. The operators at the p(B), FL, FR, and central p(R) sites are inverted repeat sequences, separated by 5 base pairs (Type A) or, in the case of p(R), by 4 base pairs (Type A'). A different inverted repeat operator sequence (Type B) is proposed for the binding sites on each side of the central site at p(R). Thus, CI appears to recognize two distinct DNA sequences. CI binds cooperatively to adjacent operators, and binding at p(R) is strongly dependent on these cooperative interactions. A high order CI multimer appears to be the active DNA binding species, even at single operators.

Coliphage 186 shares with the well characterized phage λ the ability to achieve a lysogenic state that is extremely stable yet can efficiently switch to the lytic state in response to activation of the host SOS system (1). Elucidation of the gene control mechanisms and strategies used in λ's genetic switch (2,3) has been of profound value in molecular biology and has informed thinking about the ways in which higher organisms utilize alternative stable developmental states. 186 is a member of the P2 phage family (4) and, since it shows very little similarity at the DNA or protein sequence level to λ, appears to represent a relatively independent solution to the requirements of such a genetic switch.
The 186 cI gene is the central player in the maintenance of the stable lysogenic state (5). The cI gene product represses two promoters: p(R), the promoter for the early lytic operon, including the apl, cII, and replicase genes, and p(B), the promoter for the late promoter activator gene, B (Refs. 6 and 7; see Fig. 1). CI thus, directly or indirectly, represses all the lytic genes of the prophage and also blocks lytic development of 186 phage that infect the lysogen (8). Expression of cI is maintained during lysogeny by transcription from the p(L) promoter. The leftward p(L) promoter and the rightward p(R) promoter are arranged face-to-face, with their transcripts overlapping by 62 bases (6). This is quite different from the analogous P(R) and P(RM) promoters of λ, which are arranged back-to-back.
The processes of establishment, stable maintenance, and efficient breakdown of the lysogenic transcriptional state in 186 display some unusual features, and characterization of the activity of the CI protein is needed to understand these processes. During lysogeny, the face-to-face promoter arrangement would seem to create difficulties for CI in maintaining repression of p(R). If the CI protein binds at p(R), then RNA polymerase from p(L) must pass through the CI·p(R) complex in order to transcribe the cI gene. This passage of RNA polymerase seems likely to remove CI from the DNA and thus make p(R) accessible for RNA polymerase binding. Whether this situation is a problem and, if so, what special mechanisms are used to cope with it, are questions that are relevant to the important topic of how mobile protein·DNA complexes, such as polymerases, interact with each other (9) and with static protein·DNA complexes, such as repressors and nucleosomes (10). The strategy for establishment of lysogeny in 186 appears very similar to that of λ, with lysogenic transcription requiring CI and the initial production of CI being dependent on another phage protein, CII (5). However, the activation of lysogenic transcription by 186 CI seems to be indirect, with CI repression of p(R) removing an inhibition of p(L) caused by converging transcription from p(R) (6). Efficient breakdown of CI repression during SOS induction of the prophage is initiated not by RecA, as in λ, but by a phage protein, Tum, that antagonizes CI repression of p(R) and p(B) (1). Efficient derepression may also require repression of cI transcription from p(L) by the Apl protein, which binds between p(R) and p(L) (11). Interactions between Apl and CI at p(R) are likely to be critical in the operation of the lysis-lysogeny switch.
To investigate CI repression, Lamont et al. (8) isolated and examined a number of virulent (vir) mutants of 186. These are phage mutants that are insensitive to lysogenic immunity and are able to develop lytically in a 186 lysogen. It was expected that these mutants would carry mutations at p(R) which interfered with CI repression. Indeed, in all 19 vir mutants, mutations were found within the −49 to −3 region of p(R). These mutations are clustered into three sites. All of the vir mutants carry at least one mutation in the central site (Site I), and most carry additional changes in the leftmost site (Site II). Two mutants have changes in the rightmost site (Site III), with one of these also altered at Site II. The three mutants with changes at Site I only are poorly virulent, forming plaques with low efficiency (10-15%) on lysogens, and not forming plaques on a strain carrying the cI gene on a multicopy plasmid. Mutants carrying additional changes at Sites II or III plate with higher efficiency on lysogens (22-52%), and most are able to form plaques on the strain with the cI plasmid. It was expected that these mutations disrupt CI operators and thus act by reducing CI binding to p(R). However, no DNA sequence element that was conserved between the three sites and which could serve as a likely CI operator sequence could be found (8).
The experiments reported here were designed to: (i) confirm that 186 CI is a sequence-specific DNA-binding protein, (ii) identify the CI binding regions in the early control portion of the 186 genome, (iii) confirm that the vir mutations reduce CI binding at p(R), and (iv) identify a likely CI recognition sequence. In the course of this work, a number of other aspects of DNA binding by CI became apparent.

General DNA Manipulations
Plasmid preparations were by the alkaline lysis method, with a single CsCl gradient purification (15). Restriction digestions were performed as specified by the suppliers (New England Biolabs, Pharmacia Biotech Inc., Boehringer Mannheim). The Klenow fragment of Escherichia coli DNA polymerase I (Bresatec, Australia) was used for endfilling. DNA fragments for ligation were isolated from agarose using the Geneclean procedure (Bio 101, La Jolla, CA). T4 DNA ligase, T4 polynucleotide kinase, and radionucleotides were from Bresatec. Bovine pancreatic DNase I was from Boehringer Mannheim.

Radiolabeling of Oligonucleotides
Oligonucleotides were end-labeled in 10-μl reactions containing 50 ng of oligonucleotide, 1.25 μM [γ-32P]ATP (5 μCi/μl), 10 mM MgCl2, 70 mM Tris-HCl, pH 7.5, 5 mM dithiothreitol, and 10 units of T4 polynucleotide kinase with incubation at 37°C for 30-60 min, followed by 70°C for 20 min to inactivate the enzyme. These reaction mixes were used directly in PCRs and DNA sequencing reactions.
The following oligodeoxynucleotide primers (Bresatec) were used.

CI Preparations
The preparation and purification of CI is described in full in the accompanying paper (16). Briefly, sonicated lysates of CI overexpressing cells were polyethyleneimine-precipitated, resuspended, reprecipitated with ammonium sulfate, fractionated on an Affi-Gel Blue column, fractionated on a heparin column, and dialyzed against 50 mM Tris-HCl, 0.1 mM EDTA, 10% glycerol, 150 mM NaCl, pH 7.5 (TEG150). The fraction from the Affi-Gel Blue column (in 50 mM Tris-HCl, pH 8, 250 mM NaCl, 1 mM EDTA, 10% glycerol) was judged to be ≥95% pure CI and was used for the DNase I footprinting experiments of Fig. 3. The heparin column purified CI, judged to be ≥98% pure, was used in all the other experiments.

Gel Mobility Shift Assays
Radiolabeled DNA fragments for gel retardations were prepared by PCRs. CI binding reactions (10 μl) were in TEG150 with 0.1-0.3 ng/μl radiolabeled fragment (0.33-1 nM) and 5 ng/μl competitor DNA (sheared salmon sperm DNA). The reactions were incubated on ice for 15 min and were loaded onto 6% nondenaturing polyacrylamide gels (19:1 acrylamide:bisacrylamide) containing 20% glycerol, with electrophoresis at 4°C in TBE. Gels were vacuum-dried and analyzed by phosphor autoradiography (Molecular Dynamics). Computer images of gels were contrast/brightness-adjusted using Adobe Photoshop.
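The equivalence quoted above between mass concentration and molarity of the labeled fragment can be checked with a short calculation. The sketch below is illustrative only: the average mass of 650 g/mol per base pair is a common approximation that is not stated in the text, and the function names are hypothetical. Under that assumption, the quoted figures imply a fragment of roughly 460-470 bp.

```python
# Assumption (not from the text): average mass of one base pair of dsDNA.
AVG_BP_MASS = 650.0  # g/mol per bp


def ng_per_ul_to_nM(ng_per_ul, length_bp, bp_mass=AVG_BP_MASS):
    """Convert a dsDNA mass concentration (ng/ul) to molarity (nM)."""
    grams_per_litre = ng_per_ul * 1e-3          # 1 ng/ul equals 1e-3 g/L
    molar = grams_per_litre / (length_bp * bp_mass)
    return molar * 1e9                           # mol/L -> nmol/L


def implied_length_bp(ng_per_ul, nanomolar, bp_mass=AVG_BP_MASS):
    """Fragment length (bp) implied by a stated ng/ul and nM pair."""
    grams_per_litre = ng_per_ul * 1e-3
    molar = nanomolar * 1e-9
    return grams_per_litre / molar / bp_mass


if __name__ == "__main__":
    print(round(implied_length_bp(0.1, 0.33)))   # length implied by 0.1 ng/ul = 0.33 nM
    print(round(ng_per_ul_to_nM(0.3, 466), 2))   # upper end of the stated range
```

The two functions are inverses of each other, so either the stated molarity or the fragment length can be treated as the unknown.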

RESULTS
CI Binding to Four DNA Regions-The cI gene was cloned into a protein expression plasmid, and crude cell extracts containing high levels of CI protein were obtained. Gel mobility shift assays were carried out using this extract and a control extract made from the same strain carrying the parent expression vector (data not shown) (29). Using various DNA fragments from the PstI.1-BglII.4244 early control region of the 186 chromosome (which contains p(B), the lysogenic operon, p(R), and the first four genes of the early lytic operon), four DNA regions that showed CI-specific binding were located (Fig. 1): a region spanning p(R) (termed the pR site), a region spanning p(B) (termed pB), a region upstream of p(R) (termed FL, for far left region), and a region downstream of p(R) (termed FR, for far right region). Fig. 2 shows gel shift assays with CI purified to at least 98% homogeneity (16) and with DNA fragments containing each of the four binding regions. In each case, there was a single major retarded species at each concentration tested. Thus, only one major complex appeared to be formed at each concentration. The mobility of the retarded species was similar for the different fragments, implying binding of a similar number of protein subunits in each case. A minor, less retarded species was seen in some experiments (see the FR fragment in Fig. 2).
FIG. 1. ... (14,25), showing the location of the four CI binding regions, pB, FL, pR, and FR identified by gel mobility shift studies (Fig. 2). Genes are indicated by gray boxes: B, late promoter activator (26); 69, unknown function; int, integrase (14); cI, immunity repressor; apl, excisionase and transcriptional control (6, 11); cII, establishment of lysogeny (5); dhr, inhibitor of host replication; fil, inhibitor of cell division (27). The promoters p(R), p(L), and p(B) are denoted by solid arrowheads, and their transcripts by arrows. Terminators are shown as stem loops. The phage attachment site attP is shown.
The lack of binding to the control DNA fragment in Fig. 2 showed that CI binding to the pB, FL, pR, and FR sites was sequence-specific.
The gel retardations showed an unusual effect that we are not able to explain: the mobility of the retarded species decreased in small steps with increasing CI concentration (Fig. 2). This effect was seen with all CI binding DNA fragments and occurred over increments of CI concentration even smaller than shown in Fig. 2. The phenomenon was not affected by the order in which samples were prepared or loaded. It is unlikely that the effect is caused by a nonspecific DNA-binding contaminant in the CI preparation because the binding is both sequence-specific, as shown with the control DNA fragment (Fig. 2), and CI-specific, as crude cell extracts not containing CI show no such binding to these fragments (data not shown) (29). Presumably, the decreasing mobility reflects increasing numbers of CI subunits per complex. However, if this were the case, one would expect both a greater magnitude of shift and the appearance of multiple bands in at least some tracks instead of the single band seen.
DNase I Footprinting of CI Binding Sites-To locate the CI binding sites more precisely, DNase I protection studies were carried out. CI purified to at least 95% homogeneity (16) was used for these footprint experiments; similar results were originally obtained using crude cell lysates containing CI (29). The CI footprint results for one strand of the pR binding region are shown in Fig. 3. Fig. 4 summarizes the footprinting data for both strands at all four CI binding sites, showing the assignments of DNase I cleavages that were protected, exposed, or enhanced in the presence of CI. The lengths of the arrows in Fig. 4 indicate the strength of the CI effect, as judged by the lowest CI concentration at which the effect was observable.
At each site there was a particularly dense region of relatively strong protections (indicated by the brackets in Figs. 3 and 4). Within this region there were a few positions that were still sensitive or showed an enhanced sensitivity to DNase attack in the presence of CI.
Extending beyond these dense footprint regions, usually on one side only, was a less dense region of enhancements and weaker protections that tended to be interspersed with positions where DNase I cleavage was not affected by CI. Most of the effects in this region were apparent only at higher CI concentrations. This extended portion of the footprint was often quite large; in the case of the pR site, this portion of the footprint was almost 60 bp long and reached the −14 position of p(L). The less dense region effects were not intensified in footprints with crude CI extracts (29). The accessible and protected positions in the extended regions tended to show a 9-11-bp periodicity, reminiscent of the "phasing" effect seen with the HK022 CI protein, in which protein bound at specific sites nucleates, by cooperative interactions, binding to nonspecific sites (17,18). The extended footprint region may be involved in some way in the "stepping" effect seen with the gel shifts.
FIG. 2. Binding of CI to pB, FL, pR, and FR sites by gel mobility shift assay. CI gel mobility shift assays (see "Materials and Methods") with gel-purified, radiolabeled DNA fragments prepared by PCR from plasmid clones using primers USP and RSP, at least one of which was 5'-labeled with polynucleotide kinase and [γ-32P]ATP (see "Materials and Methods"). The pB, FL, pR, and FR DNA fragments were derived from pEC627, pEC629, pEC625, and pJC251, respectively. The nonbinding control fragment was a similarly labeled fragment amplified from the 186 tum gene (primers 9 and 87). The CI concentrations (nM) are given above each track. The gel origin is at the top of the gel image.
FIG. 3. DNase I footprinting with CI at pR. The CI DNase I footprint on the top strand (the 186 l-strand) of the pR binding site is shown. The interpretation of footprint data for both strands of all the CI binding sites is given in Fig. 4. The rightmost four tracks show the DNA fragment treated with DNase I in the presence of various indicated concentrations of CI (nM). DNA fragments were prepared by PCRs in which one of the primers was radiolabeled at its 5' end (see Fig. 2 legend). The sequencing marker tracks (C, G, T) were prepared by dideoxy chain termination sequencing, using the same radiolabeled primer and dsDNA template with Sequenase version 2.0 (United States Biochemical Corp.). The numbering of the sequence positions is explained in Fig. 4. The solid bracket to the right of each gel image indicates the dense CI-affected region, with the dashed brackets showing the extended portions of the footprint (see text). The asterisk indicates the running position of a small amount of undenatured DNA probe (determined by a gel lane with DNA untreated with DNase I). The CI binding reactions were made by mixing 5 μl of dilutions of CI (Affi-Gel Blue fraction) in 50 mM Tris-HCl, pH 8, 250 mM NaCl, 1 mM EDTA, 10% glycerol with 45 μl of the radiolabeled DNA probe (final concentration in the binding reaction 0.5-1 nM) in 10 mM Tris-HCl, pH 7.5, 1 mM EDTA, 0.1 mM dithiothreitol, 5% glycerol, 50 μg/μl bovine serum albumin. The reactions were incubated at room temperature for 15 min before the addition of 5 μl of 0.2 μg/ml DNase I in 100 mM KCl, 100 mM MgCl2, 20 mM CaCl2, 10 mM Tris-HCl, pH 8.0, 2 mM dithiothreitol. After 2 min at room temperature, the reaction was stopped by the addition of 50 μl of 50 mM EDTA, 1% SDS, 20 μg/ml glycogen, 600 mM sodium acetate, and 200 μl of ethanol. The DNA was ethanol-precipitated, phenol/chloroform-extracted, and reprecipitated. The pellet was dissolved in 5 μl of formamide loading buffer (60% formamide, 12 mM EDTA, 0.03% bromphenol blue, 0.03% xylene cyanol), heated to 95°C, chilled, and loaded onto a 6% sequencing gel. Following electrophoresis, the gel was dried onto Whatman 3MM paper under vacuum. The bands were visualized and manipulated as described for the gel mobility shift gel (see "Materials and Methods"). In some of the gel images, the sequence markers have been contrast-adjusted differently from the footprint tracks.
The footprint at the pB site contained an interesting feature. The dense region of the footprint was divided on each strand by a group of four consecutive strong enhancements, with the enhancements on one strand being offset in the 3' direction from those on the other. Since DNase I cuts in the minor groove and since the nearest phosphates across the minor groove are offset 3 bp in the 3' direction (19), these enhancements indicate that the minor groove in one region of the DNA helix was made particularly DNase-susceptible by CI. Such sensitivity is often found when the minor groove lies on the outside of a DNA bend (20).
The CI Recognition Sequence at the pB, FL, and FR Sites-In an attempt to identify the DNA sequences that determine CI binding, we examined the pR, pB, FL, and FR binding regions for common sequence elements. Disregarding pR for the moment, we were able to find an inverted repeat sequence element that is strongly conserved between the binding regions at pB, FL, and FR and can account well for the binding seen at these sites.
There are two copies of this inverted repeat element at the pB site and single copies at FL and FR. The alignment of the eight half-sites is shown in Fig. 5 and yields the conservations a7.T8.T8.C8.a7.C8 (each number giving the frequency of that base among the eight half-sites) with a 5-bp A/T-rich spacer. The formal consensus recognition sequence of 17 bases is MTTCWCWWWWWGWGAAK (W = A or T, M = A or C, K = G or T).
A sequence-specific DNA-binding protein must use the information in its DNA recognition sequence to specify binding to its functional binding sites and to minimize binding to nonfunctional sites. Therefore, an important test for a proposed recognition sequence is to show that it can provide for discrimination between known binding sites and known nonbinding sites. We quantitated the discriminatory power of this recognition sequence by determining (i) how similar each of the four sequences at pB, FL, and FR are to the consensus, and (ii) how frequently sequences with this degree of similarity to the consensus arise in random DNA sequences (described in the legend to Fig. 5). We found that such sequences occurred less than once per 1000 kb of random DNA sequence, showing that the consensus is highly discriminative and emphasizing its legitimacy as a recognition sequence of the CI protein. By comparison, the λ CI operators are less well conserved. Using a consensus sequence for the λ CI operators, sequences similar to the best scoring operators occurred less than once per 1000 kb. However, some of the operators scored quite poorly, with similar matches occurring every 10 kb of random sequence.
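The scoring scheme described in the Fig. 5 legend (1 point for a match at a nonredundant consensus position, 0.5 for a match at a redundant one) can be sketched as follows. This is a minimal illustration and not the program used in the study: the function names, the uniform base composition of the random sequence, the single-strand scan, and the fixed seed are all assumptions made here.

```python
import random

# IUPAC codes used by the 17-base consensus in the text, plus the plain bases.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "W": "AT", "M": "AC", "K": "GT"}

CONSENSUS = "MTTCWCWWWWWGWGAAK"  # consensus quoted in the text


def score(site, consensus=CONSENSUS):
    """Score a site: 1 per match at a unique base, 0.5 per match at a redundant code."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    total = 0.0
    for base, code in zip(site.upper(), consensus):
        allowed = IUPAC[code]
        if base in allowed:
            total += 1.0 if len(allowed) == 1 else 0.5
    return total


def max_score(consensus=CONSENSUS):
    """Highest score any site can reach against the consensus."""
    return sum(1.0 if len(IUPAC[c]) == 1 else 0.5 for c in consensus)


def count_random_hits(n_bases, threshold, consensus=CONSENSUS, seed=0):
    """Count windows in uniform random DNA (one strand) scoring >= threshold."""
    rng = random.Random(seed)
    seq = "".join(rng.choice("ACGT") for _ in range(n_bases))
    width = len(consensus)
    return sum(score(seq[i:i + width], consensus) >= threshold
               for i in range(n_bases - width + 1))


if __name__ == "__main__":
    print(max_score())                         # 12.5, the maximum given in the legend
    print(count_random_hits(100_000, 12.0))    # chance hits at the legend's cutoff
```

Scanning a full 1000 kb in pure Python is slow but feasible; the shorter run above is only meant to show the shape of the calculation.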
The location of the eight half-sites relative to the footprints is indicated in Fig. 4. The position of the sequences at pB, FL, and FR is consistent with the DNase I footprints in three ways, further strengthening their assignment as recognition sequences for CI. (i) The sequences lie totally within the dense portion of the footprints. (ii) The two operators at pB are positioned symmetrically on either side of the strong enhancements in the center of the pB footprint and are on the same face of the DNA helix as each other (having a center-to-center distance of 31 bp). The DNase-sensitive minor groove between these sites lies on the opposite face of the helix. Thus, the hypersensitivity of the DNA in the center of the pB footprint may be explained by DNA distortion, perhaps bending, induced by interactions between CI molecules bound to adjacent sites. (iii) There is a tendency for certain bonds in approximately equivalent positions in the half-sites of the sequences to remain exposed to DNase or to show enhanced cleavage in the presence of CI (Fig. 4; indicated by triangles in Fig. 5), suggesting that these sequences interact similarly with CI. The relationship between these cleavages on the two strands indicates sensitivity of the minor groove at a single region in each operator arm.
The Binding Determinants at pR Lie within the Dense Footprint Region-With such strong evidence supporting the proposed operator sequence at pB, FL, and FR, we were surprised to find no significant matches to this sequence at the pR binding site.
In order to further examine CI binding at pR, we first narrowed down the location of the binding determinants, since the extended CI footprint at the pR site was very large, extending from positions −78 to +76. The dense portion of the footprint was considerably smaller, from −53 to +14, and contains the loci of the vir mutants. To test the idea that all of the CI binding determinants at p(R) are contained within the dense footprint region, we examined CI binding to a p(R) DNA fragment from which most of the 186 sequences around the dense footprint region were removed.
Using the gel shift assay, we compared CI binding to a DNA fragment containing the −58 to +14 sequence of p(R), which carries little more than the dense footprint region (the minimal pR fragment), with binding to a DNA fragment containing the −81 to +126 sequence, which covers the entire CI footprint at the pR site (large pR fragment). One result is shown in Fig. 6, top. The amounts of bound and unbound DNA in the gel were quantitated and graphed in Fig. 6, bottom. This shows that the affinity of CI for the pR site was not affected by the replacement of sequences from the extended footprint region with non-186 DNA. The apparent dissociation constant for CI to pR, Kobs, was obtained as the CI concentration at which half the DNA was bound. This was 30 nM for the large pR fragment and 31 nM for the minimal pR fragment. Ratios close to one for the relative binding strengths of the two fragments were found in three experiments (average Kobs minimal/Kobs large = 0.97 ± 0.05).
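The quantitation described here (fraction bound calculated as 1 − unbound/total with a zero-CI correction, and the apparent dissociation constant read off as the CI concentration at half-maximal binding) can be sketched as below. The text does not say how the half-bound point was located, so the log-linear interpolation between neighboring lanes is an assumption, as are the function names and the numbers used in the example.

```python
import math


def fraction_bound(unbound, total, unbound0, total0):
    """1 - unbound/total, corrected so that the zero-CI lane reads 0."""
    baseline = 1.0 - unbound0 / total0
    return (1.0 - unbound / total) - baseline


def k_obs(concs, fractions):
    """CI concentration at which half the DNA is bound.

    Interpolates linearly in log(concentration) between the two lanes that
    bracket a bound fraction of 0.5 (an assumed method, not from the text).
    """
    points = list(zip(concs, fractions))
    for (c1, f1), (c2, f2) in zip(points, points[1:]):
        if f1 < 0.5 <= f2 and c1 > 0:
            t = (0.5 - f1) / (f2 - f1)
            return math.exp(math.log(c1) + t * (math.log(c2) - math.log(c1)))
    raise ValueError("fraction bound never crosses 0.5")


if __name__ == "__main__":
    # Hypothetical titration, for illustration only (not data from Fig. 6).
    concs = [0, 10, 90]
    fracs = [0.0, 0.25, 0.75]
    print(round(k_obs(concs, fracs), 1))
```

Fitting a binding isotherm to all lanes would use more of the data than interpolation does; the simpler approach is shown because it follows the definition given in the text most directly.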
FIG. 5. Proposed CI operators at the pB, FL, and FR sites. The DNA half-sites of the inverted-repeat sequences proposed as CI operators at the pB, FL, and FR sites (see Fig. 4) are aligned. The diamond shows the center of symmetry of the sequences. The small triangles indicate bonds which remain DNase-accessible (unfilled) or whose DNase cleavage is enhanced (filled) in the presence of CI (Fig. 4); upward pointing triangles indicate the bond on the complementary strand. Conserved bases and their frequencies are indicated below the sequences. To quantitate the discriminative power of the putative CI binding sequence (see text), a full-length consensus sequence MTTCWCWWWWWGWGAAK (M = A or C, K = T or G, W = A or T) was derived from the alignment. We scored the degree of match of each of the four putative binding sequences with the consensus sequence by assigning a match at a nonredundant consensus position (A, C, G, or T) a score of 1 and a match at a redundant position (W, M, or K) a score of 0.5. With this scheme, each binding sequence scores at least 12 points out of a maximum of 12.5. A computer program was written to scan random DNA sequence for matches to consensus sequences; this showed that a score of 12 or more on this consensus occurred by chance less than once per 1000 kb.
Putative CI Recognition Sequences at pR-We therefore re-examined the dense footprint region at pR for sequence elements that could confer CI recognition. As noted above, we were unable to find matches to the pB-FL-FR binding sequence at the pR site. However, since some DNA-binding proteins are able to tolerate different spacings between their DNA half-sites (for example, AraC (21)), we performed computer-assisted searches of pR for matches to the pB-FL-FR consensus with longer or shorter spacings between the inverted half-sites.
A partial match was found to the consensus with a 1-bp shorter spacing between the half-sites, located in the center of the dense footprint region, at vir Site I. Alignment of these and the pB-FL-FR half-site sequences (Fig. 7) showed that 3 base pairs in each of the 10 half-sites are completely conserved. In all five sequences, there is an A/T-rich spacer between these conserved bases; this spacer is 5 bp long in the pB, FL, and FR sequences and 4 bp long in the Site I sequence. Of the total of 24 bp that comprise these spacers in the five sequences, 23 are A/T base pairs.
The sequence conservations between Site I and the pB-FL-FR sequences provide sufficient discriminatory power to provide for the binding seen to these sequences and the lack of binding elsewhere on the 186 genome. A consensus TCWCWW(W)WWGWGA (W = A or T) was derived from the five sequences. The central W is optional, reflecting the alternative spacings. Sequences matching either the 5-spacer consensus as well as the pB-FL-FR sequences or the 4-spacer consensus as well as the Site I sequence occurred only once every 29 kb of random DNA sequence. No other strong matches to this consensus occur in the 186 early control region. We term the pB, FL, and FR recognition sequence Type A and the Site I recognition sequence Type A' to indicate its different half-site spacing.
All of the 19 vir mutants carry a change at Site I, and there are in total 29 base changes at this site (8). This large body of mutational data provides a stringent test of the proposed recognition sequence. The very location of the A'-type sequence at Site I supports its role in CI recognition. However, the fit between this sequence and the vir mutations is much more extensive than co-location and not only provides very strong evidence that it is a CI recognition sequence but also indicates those bases that are critical for binding. Firstly, the mutations lie in both arms of the sequence (Fig. 7), showing the importance of both half-sites, a feature expected for a symmetrical binding sequence. Secondly, in all of the 19 vir mutants, the match to the consensus sequence is worsened. Of the 29 mutations occurring at Site I, 27 involve one of the three fully conserved positions in the half-sites. Two mutations lie at less conserved positions, one slightly improving the match to the consensus. However, both these mutations occur in combination with changes at the fully conserved positions. There is therefore a remarkable agreement between the vir mutations and the consensus sequence. The mutations indicate that the two fully conserved C residues within the half-sites are critical for CI binding, since every one of the 19 vir mutants carries a change at one of these positions.
FIG. 6. CI binding determinants at the pR site lie within the dense footprint region. Gel mobility shift assays (see "Materials and Methods") were carried out to compare the affinity of CI for a DNA fragment carrying p(R) sequence from −81 to +126, containing the entire CI footprint region (large pR fragment) with a fragment carrying p(R) sequence from −58 to +14, containing only the dense footprint region (minimal pR fragment). Both fragments were prepared by PCR (see "Materials and Methods"): the large fragment was obtained from pEC631 with primers 35 and USP, the minimal fragment was from pEC627 with primers RSP and USP (USP labeled in both cases). The CI concentrations were 0, 2, 6, 18, 54, 128, 384, and 1152 nM. The graph shows the fraction of DNA that was bound by CI, obtained from quantitation of the phosphorimage and calculated as 1 − (unbound DNA/total DNA) with a correction factor subtracted to give a fraction DNA bound value of 0 in the absence of CI.
FIG. 7. A pB-FL-FR-like sequence with altered half-site spacing at Site I. The sequence conservations in the operators proposed for CI binding at pB, FL, and FR (Fig. 5) sites are aligned with the half-sites of an inverted-repeat sequence at Site I in pR. The diamond shows the center of symmetry of the sequences; note that the half-site spacings for the pB-FL-FR sequences are different from the Site I sequence. The changes found in the left and right half-site sequences in the vir mutants (8) are indicated, with the subscripts denoting the frequency of each change. The asterisk indicates a mutation that does not reduce the match to the consensus. We derived from these five sequences the Type A sequence consensus: TCWCWW(W)WWGWGA (W = A or T; the W in parentheses is optional, reflecting the alternative spacings). In a search of 1000 kb of random DNA sequence, matches as good or better than the poorest match among the pB, FL, or FR sequences to the longer consensus (Type A) were found 11 times (see Fig. 5 legend) and matches as good or better than the match of the Site I sequence to the shorter consensus (Type A') occurred 23 times. Thus, pB-FL-FR-like or Site I-like sequences arose by chance once per 29 kb.
The Type A recognition sequence cannot explain CI binding in the left (Site II) and right (Site III) regions of pR as we were unable to find such sequences in these regions, even when other alternative half-site spacings were tested. We had begun with the assumption that CI is able to recognize only one type of sequence. However, some DNA-binding proteins, for example, the integrase proteins of λ and other phages, have two independent DNA binding domains and are able to recognize distinct DNA sequences (22). We therefore examined the sequences at Sites II and III for alternative recognition elements. An element that occurs at Site II and at Site III and which is a likely candidate for a second type of CI recognition sequence is shown in Fig. 8. The element is again an inverted repeat sequence with an A/T-rich spacer; however, the half-site sequences are quite different from the A-type sequences. The consensus sequence TNGRYWWWRYCNA (W = A or T, R = A or G, Y = C or T, N = any base) was derived from the alignment. The discriminatory power provided by this sequence is strong; the matches to the consensus as good or better than those obtained by the Site II and III sequences arose only once per 67 kb of random DNA sequence. We term this second proposed CI recognition sequence Type B. No significant matches to this consensus were found elsewhere in the 186 early control region.
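The consensus notation used for the Type A, Type A', and Type B sequences can be made concrete with a small sketch. It expands the optional central W of TCWCWW(W)WWGWGA into its 5-spacer and 4-spacer forms, turns an IUPAC consensus into a regular expression, and reproduces the once-per-29-kb arithmetic from the Fig. 7 legend. Exact pattern matching is stricter than the "as good or better than" scoring used in the study, so this is an illustration of the notation only; the function names and example sites are hypothetical.

```python
import re

# IUPAC codes appearing in the consensus sequences quoted in the text.
IUPAC_RE = {"A": "A", "C": "C", "G": "G", "T": "T",
            "W": "[AT]", "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}


def expand_optional(consensus):
    """Return (long, short) forms of a consensus with one '(X)' optional position."""
    long_form = consensus.replace("(", "").replace(")", "")
    short_form = re.sub(r"\(.\)", "", consensus)
    return long_form, short_form


def to_regex(consensus):
    """Compile a plain IUPAC consensus (no parentheses) into a regex."""
    return re.compile("".join(IUPAC_RE[c] for c in consensus))


def kb_per_chance_match(hits, kb_searched):
    """Average spacing (kb) between chance matches."""
    return kb_searched / hits


if __name__ == "__main__":
    type_a, type_a_prime = expand_optional("TCWCWW(W)WWGWGA")
    print(type_a, type_a_prime)
    # 11 Type A-like plus 23 Type A'-like matches in 1000 kb (Fig. 7 legend).
    print(round(kb_per_chance_match(11 + 23, 1000)))
```

Keeping the two spacings as separate patterns, rather than one pattern with an optional base, makes it easy to report which spacing a given match used.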
Again, strong evidence for these Type B sequences being CI binding determinants is provided by the vir mutations (Fig. 8). There are 16 vir mutants with changes at Site II or Site III, involving a total of 21 mutations. Of these, 18 reduce the match to the consensus. (Three mutations do not alter the match to the consensus; however, these mutations occur in combination with other changes that do disrupt the consensus.) Nineteen of the twenty-one mutations occur in Site II and are distributed over both arms of sequence symmetry, with 11 mutations involving the same fully conserved position. Only two mutations have been isolated at Site III. One of these (in vir121) reduces the match to the consensus. The other (in vir100) does not reduce the match to the consensus. However, although it is clear that the Site III mutation in vir121 increases virulence (8) and reduces CI binding to pR (see below), it is not clear that the Site III mutation in vir100 does so. The vir100 mutant, unlike vir121, also carries a change at Site II and its virulence is no different from that of other Site II mutants (8), so the vir100 change in Site III may have no effect on CI binding.
Further evidence for these recognition sequences at Sites II and III is apparent in the DNase footprint. There is a pattern of CI-dependent DNase I enhancements that is very well conserved between all four half-sites (Fig. 4; marked by filled triangles in Fig. 8). These enhancements indicate minor groove sensitivity at the same position in each half-site and argue that CI interacts in a similar fashion with the four sequences.
Two further observations support our three-site model (B-A′-B) at pR. Firstly, the location of the three sequences correlates very closely to the dense footprint region (see Fig. 4). Secondly, the three sequences all lie on the same face of the DNA helix. The center-to-center spacing is 21.5 bp between the Site II and Site I sequences (B-A′) and 20.5 bp between the Site I and Site III sequences (A′-B). Binding to the same face of the DNA helix is a feature of binding at the pB site and is also consistent with the interactions seen between CI bound at pR, as described below.
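The same-face argument can be checked with simple arithmetic. The sketch below assumes a helical repeat of 10.5 bp per turn for B-DNA (a value not stated in the text) and converts each center-to-center spacing into an angular offset around the helix axis; offsets near 0° correspond to sites on the same face. The function name is illustrative only.

```python
BP_PER_TURN = 10.5  # assumed helical repeat of B-DNA; not given in the text


def helical_offset_degrees(spacing_bp, bp_per_turn=BP_PER_TURN):
    """Angular offset between two site centers, in degrees (-180 to 180)."""
    turns = spacing_bp / bp_per_turn
    # Distance from the nearest whole number of turns, expressed as an angle.
    return (turns - round(turns)) * 360.0


for label, spacing in [("Site II to Site I", 21.5), ("Site I to Site III", 20.5)]:
    print(label, round(helical_offset_degrees(spacing), 1))
```

Under this assumption both spacings fall within about 20° of two full helical turns, in line with the three sequences lying on one face of the helix.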
Binding Studies with pR from vir Mutants-We examined CI binding to pR fragments carrying mutations at the three vir sites to, firstly, confirm our assumption that the vir mutations disrupt CI binding at pR and, secondly, to investigate independent binding of CI to its individual operators at pR. The three vir pR fragments used in these experiments represent three classes of vir mutants. All fragments contain the same double mutation at Site I (see Fig. 9C). For the vir122 fragment (referred to as Site II+ I− III+), this is the only change from wild-type. The vir97 fragment carries an additional change at Site II (Site II− I− III+), and the vir121 fragment carries an additional change at Site III (Site II+ I− III−).
Gel shift studies with these fragments showed that all three mutant fragments bound CI more weakly than wild-type (Fig. 9A). Furthermore, each of the three mutations weakened CI binding. From the K obs values (Fig. 9A, legend), mutation at Site I reduced binding 2.8-fold and further mutation at Site II or Site III reduced binding an additional 2.9- and 2.3-fold, respectively. These effects on CI binding correlate with the degree of virulence shown by the mutants, weak for vir122 and strong for vir97 and vir121 (8), and correlate with the number of intact CI operators in these fragments. Although we have tested only three vir mutants, it is now reasonable to assume that CI binding is weakened in all of the vir mutants.
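The fold reductions quoted here follow from ratios of the average K obs values listed in the Fig. 9A legend (36, 102, 295, and 228 nM). The short sketch below recomputes them; the dictionary and function names are illustrative only. Because the published averages are rounded, a recomputed ratio can differ slightly from the quoted value (228/102 gives about 2.2 rather than 2.3).

```python
# Average K obs values (nM) as listed in the Fig. 9A legend.
K_OBS_NM = {"wild-type": 36, "vir122": 102, "vir97": 295, "vir121": 228}


def fold_reduction(mutant, reference, k_obs=K_OBS_NM):
    """Ratio of K obs values; a larger K obs means weaker binding."""
    return k_obs[mutant] / k_obs[reference]


print(round(fold_reduction("vir122", "wild-type"), 1))  # Site I mutation alone
print(round(fold_reduction("vir97", "vir122"), 1))      # added Site II mutation
print(round(fold_reduction("vir121", "vir122"), 1))     # added Site III mutation
```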
FIG. 8. A second putative CI recognition sequence at Sites II and III. The half-site sequences of the operators proposed for CI binding at vir Sites II and III (Fig. 4) are aligned. The diamond shows the center of symmetry of the sequences. The small triangles indicate bonds which remain DNase I-accessible (unfilled) or whose DNase cleavage is enhanced (filled) in the presence of CI (Fig. 4); upward pointing triangles indicate the bond on the complementary strand. Conserved bases and their frequencies are indicated below the sequences. The changes found in the left and right half-site sequences in the vir mutants (8) are indicated, with the subscripts denoting the frequency of each change. The asterisks indicate mutations that do not reduce the match to the consensus. One mutation, lying just left of the Site II sequence, is not shown. The consensus Type B sequence: TNGRYWWWRYCNA (W = A or T, R = A or G, Y = C or T, N = any base) was derived from the alignment. The match of the Site II or III sequences to this consensus was found to arise by chance once per 67 kb.

We noted previously that CI binding to pR, pB, FL, and FR DNA fragments produced, at each CI concentration, a single retarded species of similar mobility (Fig. 2), despite the fact that each fragment contains a different number of CI operators. The identical gel shift patterns seen with pR fragments carrying one, two, or three intact operators confirm this result. Thus, at any one CI concentration, it appears that a similar number of CI subunits is binding to the DNA whether this DNA carries one, two, or three operators.

Fig. 9B shows the DNase I protection results with the vir mutant fragments at CI concentrations at which there was very little CI binding in the extended footprint region. We found no evidence for independent binding to individual operators at the pR site. Instead, each mutation affected binding not only to its own site but to the whole of the dense footprint region. With the wild-type fragment, cleavage at almost every position was strongly protected or enhanced by CI at 210 nM (indicated by dots to the right of the gel lanes). Most effects were also visible at 70 nM. The Site I mutation (vir122) strongly reduced binding to the central Site I region but also to the whole dense footprint area, with a subset of the wild-type effects remaining at 210 nM only. Further mutation at Site II or Site III (vir97 and vir121) eliminated all CI binding at these concentrations, despite the fact that there was an intact Type B site on these fragments. Again, the mutation at one site affected binding at the others: mutation at Site II removed the weak remaining binding at Sites I and III; mutation at Site III removed the weak remaining binding at Sites I and II.
These results indicate a high degree of cooperativity in CI binding at pR, that is, the favorable interactions between bound CI subunits are strong compared with the interactions between the subunits and the DNA. Thus, the strong CI binding to pR appears to be a result of strong cooperativity between CI protomers at relatively weak DNA binding sites. Strong cooperativity in CI binding is supported by the gel shift results with pB and pR fragments. Noncooperative or weakly cooperative binding of a protein to fragments containing multiple operators should give rise, at any one protein concentration, to multiple retarded species, with the species differing in the number of protein subunits bound. The lack of such species (Figs. 2 and 9A), therefore, argues for strong cooperativity in CI binding to adjacent operators.

DISCUSSION

Gel shift and DNase I footprinting studies have confirmed that the 186 CI repressor is a sequence-specific DNA-binding protein and have identified four binding regions in the early control region of the phage genome. CI binds to the lytic promoters p R and p B that it represses and also binds to sites at the −330 and +350 positions of p R (FL and FR). The biological role, if any, of these flanking sites is not yet known. The CI binding region at p R is distinct from that of the Cro-like Apl protein, although there is some overlap between the two regions (11). We showed that the CI binding determinants at p R are located between the −58 and +14 positions of the promoter and confirmed for three vir mutants that the mutations at p R carried by these mutants reduced CI binding.
DNA recognition by CI is unusual. There appear to be two distinct recognition sequences, Type A and B. Furthermore, CI appears to be able to recognize a Type A half-site spacing variant, A′. This gives the structure AA-A-BA′B-A for CI binding to the pB-FL-pR-FR region. Three types of evidence provide strong support for our operator assignments. (i) There is good sequence conservation between the A/A′-type sequences and between the B-type sequences, with very strong conservation between the A-type sequences at pB, FL, and FR. (ii) The operators are consistent with the DNase I protection data, including many of the fine details of the footprints. (iii) The proposed operator sequences at the pR site can explain the large body of genetic data provided by the vir mutations. The A′ sequence (Site I) is disrupted in all 19 of the vir mutants, invariably involving one of the fully conserved bases. Fifteen of these mutants also carry a disruption of the left B-type sequence (Site II), and one mutant carries a disruption of the right B-type sequence (Site III), with the most conserved bases often involved. The number of intact operators correlates with the strength of binding of CI to pR in the three mutants tested and with the sensitivity to immunity of all the vir mutants (8).
FIG. 9. Binding of CI to pR DNA from virulent mutants. A, gel mobility shift assay (see "Materials and Methods") comparison of affinity of CI for DNA fragments carrying vir mutations. DNA fragments were prepared by PCR (see "Materials and Methods") from 186+, 186del1vir122, 186cI+vir97, and 186del1vir121 phage, using primers 71 and 68. From a number of experiments, the average K obs values (determined as in Fig. 6) for the wild-type, vir122, vir97, and vir121 fragments were 36, 102, 295, and 228 nM, respectively. B, DNase I protection study of CI binding to DNA fragments carrying vir mutations. DNA fragments were prepared as in A, except that primers 34 (labeled) and 89 were used. The data are for the top strand. The procedure was as described in the Fig. 3 legend, except that the reactions were stopped by extraction with phenol saturated with 10 mM Tris-HCl, 50 mM EDTA, and the DNA was ethanol-precipitated only once. To the left of the figure are indicated the p R coordinate of each band, the location of the predicted operators at pR (vertical lines), and the positions of the mutations used (arrowheads). To the right of each set of tracks, CI-affected cleavages are indicated by dots. C, sequence of the densely CI-protected region of the pR site, showing the mutations carried by vir122, vir97, and vir121. The underlined bases are the positions at which vir mutations occur. The predicted CI operator sequences are shown by converging arrows; the −10 and −35 hexamers of p R are in boldface.

A number of proteins are known to recognize two different sequences or different half-site spacings. For example, the λ integrase protein is able to recognize distinct "core" and "arm" type DNA sequences, utilizing different regions of the protein (22), a property that seems to be general to the large family of "complex" integrases (see, for example, Ref. 23). An example of recognition of alternative half-site spacings is provided by the AraC protein, which has a flexible linker between the dimerization domain and the DNA binding domain of the protein (21). Further work is planned to define the spacing requirements within and between CI recognition sequences and to identify the DNA-binding regions of the protein.
Certain features of CI binding can be deduced from the data. In some of the A-type sites and in both the B-type sites, the minor groove in each arm of the operator remains sensitive or becomes hypersensitive to DNase I in the presence of CI (see Figs. 5 and 8). Thus, the contacts made by CI with the bases in these operator arms must occur via the major groove. The inverted repeat nature of the operators and the spacing of the exposed bonds (9-11 bp at Type A sites and 8 bp at Type B sites) indicate that a rotationally symmetrical CI dimer is recognizing successive major grooves that are close to being on one face of the DNA helix.
Strong cooperative binding of CI to adjacent operator sequences was indicated by (i) the presence of only a single retarded species in gel shift experiments with fragments carrying multiple operators, and (ii) the finding that vir mutations at pR weakened binding to nonmutated operators. Cooperative binding is not surprising, as CI has been shown to exist in a monomer-dimer-tetramer-octamer equilibrium in solution (16) with interaction energies very similar to those of λ CI (24).
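The reasoning behind point (i) can be illustrated with a textbook two-site binding model. The sketch below is not the analysis used in this work: it treats the protein as a simple ligand binding two identical sites, with x the protein concentration divided by the single-site dissociation constant and omega a cooperativity factor, all of which are assumptions made for illustration. It returns the equilibrium fractions of DNA with neither, one, or both sites occupied.

```python
def species_fractions(x, omega):
    """Fractions of DNA with 0, 1, or 2 sites occupied.

    x     : ligand concentration / single-site dissociation constant
    omega : cooperativity factor (1 = independent sites, >1 = cooperative)
    """
    # Statistical weights of the free, singly bound, and doubly bound states.
    weights = (1.0, 2.0 * x, omega * x * x)
    total = sum(weights)
    return tuple(w / total for w in weights)


# Independent sites at half saturation: the singly bound intermediate is abundant.
print(species_fractions(1.0, 1.0))
# Strongly cooperative sites at half saturation: the intermediate is nearly absent.
print(species_fractions(0.01, 10000.0))
```

In this simplified model, strong cooperativity suppresses the partially occupied species, so a gel would show essentially one retarded band, as argued in the text.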
Although the data of Shearwin and Egan (16) show that CI in solution is predominantly dimeric at the concentrations used in our studies, the gel shift experiments suggest that a higher-order CI multimer is the active binding species, since the mobility of the major retarded species was very similar for DNA fragments with differing numbers of operators. This multimer seems able to occupy DNA containing up to two operators of any one type (two A-type operators at pB or two B-type operators at pR). Assuming that each operator is contacted by a CI dimer, then this species must be at least a tetramer. Studies of CI-DNA stoichiometries at the different operators are planned.
Once we have suitably characterized the relationship of Type A, A′, and B sequences to CI binding, we will then be in a position to investigate the possible interaction between CI bound at FL, FR, and pR and its biological significance.