Genetic Analysis of Phage Mu Mor Protein Amino Acids Involved in DNA Minor Groove Binding and Conformational Changes*

Background: The phage Mu Mor activator protein structure led to predictions of key amino acids needed for DNA binding. Results: Mutational analysis identified specific Mor amino acids needed for DNA binding and transcription activation. Conclusion: The properties of the mutants support the predictions from the structure. Significance: The Mor protein family is unique among regulatory proteins. Gene expression during lytic development of bacteriophage Mu occurs in three phases: early, middle, and late. Transcription from the middle promoter, Pm, requires the phage-encoded activator protein Mor and the bacterial RNA polymerase. The middle promoter has a −10 hexamer, but no −35 hexamer. Instead Pm has a hyphenated inverted repeat that serves as the Mor binding site overlapping the position of the missing −35 element. Mor binds to this site as a dimer and activates transcription by recruiting RNA polymerase. The crystal structure of the His-Mor dimer revealed three structural elements: an N-terminal dimerization domain, a C-terminal helix-turn-helix DNA-binding domain, and a β-strand linker between the two domains. We predicted that the highly conserved residues in and flanking the β-strand would be essential for the conformational flexibility and DNA minor groove binding by Mor. To test this hypothesis, we carried out single codon-specific mutagenesis with degenerate oligonucleotides. The amino acid substitutions were identified by DNA sequencing. The mutant proteins were characterized for their overexpression, solubility, DNA binding, and transcription activation. This analysis revealed that the Gly-Gly motif formed by Gly-65 and Gly-66 and the β-strand side chain of Tyr-70 are crucial for DNA binding by His-tagged Mor. Mutant proteins with substitutions at Gly-74 retained partial activity. 
Treatment with the minor groove- and GC-specific chemical chromomycin A3 demonstrated that chromomycin prevented His-Mor binding but could not disrupt a pre-formed His-Mor·DNA complex, consistent with the prediction that Mor interacts with the minor groove of the GC-rich spacer in the Mor binding site.

Bacteriophage Mu, the prototype for a large family of transposable phages, has two alternate life styles, lytic and lysogenic (1). Gene expression during the lytic cycle occurs in three phases: early, middle, and late (2,3). The early promoter Pe has the characteristics of a typical bacterial promoter with both recognizable −10 and −35 hexamers, the promoter recognition elements for the Escherichia coli RNA polymerase (RNAP) with the "housekeeping" sigma factor, σ70 (4,5). The middle promoter Pm possesses a −10 hexamer but lacks similarity to the consensus −35 element TTGACA (Fig. 1A). As a result, gene expression from Pm requires a phage-encoded transcription activator protein, the middle operon regulator, Mor (6). Likewise, transcription from the Mu late promoters requires activation by the Mu C protein (7), which shares significant sequence similarity with Mor (6-9) (Fig. 1B).
Mor and C are the best studied members of this Mor/C family of transcription factors. They are characterized by an acidic N terminus and a basic C terminus that contains a helix-turn-helix (HTH) DNA-binding motif (6). Mor is a homodimeric, sequence-specific DNA-binding protein with a monomer length of 129 amino acids (Fig. 1B) (6,10). Consistent with its role as a transcription activator, Mor binds to the inverted repeats of a pseudo-palindromic DNA element located between −51 and −36 base pairs upstream of the transcription start site +1 and overlapping the region normally occupied by the missing −35 recognition element (Fig. 1A) (11). In addition to Mor binding, activation of the middle promoter requires the C-terminal domains of the α and σ subunits of the E. coli RNA polymerase (12-14). Thus, the working model for middle promoter activation involves binding of Mor to the promoter and its recruitment of RNAP to the otherwise nonfunctional middle promoter through Mor-RNAP interactions (13,14).
The high-resolution crystal structure of a histidine-tagged Mor (His-Mor) dimer showed the locations of Mor amino acids 27 to 120; the His tag and Mor amino acids 1-26 and 121-129 were not visible (15). The dimer structure has three domains: an N-terminal dimerization domain and two C-terminal DNA-binding domains, each with a classical HTH motif (Fig. 2A) (15). The N-terminal α-helices, α1 and α2, of each His-Mor subunit intertwine with each other to form a four-helix bundle, creating a single central dimerization domain (Fig. 2A). The C-terminal three-helix bundle in each monomer forms the canonical HTH DNA-binding motif in which helix α3 forms the "scaffolding helix," and helices α4 and α5 together with the intervening loop form the HTH motif (Fig. 2A) (16,17). The DNA-binding domains of the His-Mor dimer flank the dimerization domain, putting them at opposite ends of the structure (Fig. 2A). Comparisons of the HTH domain structure with other proteins identified the trp repressor TrpR and region 4.0 of the σ subunit of Thermus aquaticus RNAP as close structural neighbors (15,18,19). These proteins use a perpendicular "ends-on" mode of DNA binding, inserting their recognition helices into the major groove using only the first two turns of the recognition helix for DNA contacts. This led us to propose that His-Mor might bind similarly (15); however, the recognition helices of the His-Mor dimer are located too far apart (63 Å) to interact with two adjacent major grooves of B-DNA, which reach a maximum distance at the outside edges of 54 Å, indicating that conformational changes in both His-Mor and Pm are probably required to achieve DNA binding by His-Mor (15). Consistent with this prediction, circular permutation gel-shift assays with His-Mor bound to Pm sequences revealed DNA curvature with a ~45° bending angle (see footnote 4). Earlier we proposed that conformational changes in His-Mor would involve movement of the DNA-binding domains up and away from the dimerization domain to contact the DNA (Fig. 2, C and D).
FIGURE 1. Promoter and protein sequences and plasmids. A, the sequence of the Mu middle promoter Pm was annotated with the locations of the −10 hexamer (box), the transcription initiation site at +1 (bent arrow), the region protected from DNase I digestion by Mor binding (bar), and the imperfect dyad-symmetry element recognized by Mor (inverted arrows). Dots mark 10-bp intervals numbered by their positions in the promoter relative to +1. B, the Mor and C amino acid sequences are aligned with each other and with the secondary structure elements from the His-Mor crystal structure (15). The amino acids are identified by the single letter code. Gray shading indicates amino acids that are chemically similar at that position among the 15 total Mor/C family members previously identified (15). Black boxes indicate identical amino acids in those homologs (15). Dots indicate 10-amino acid intervals in Mor. C, primary sequence alignment of the inter-domain β-strand linker region of Mor and its previously identified homologs (15). The invariant residues are boxed in black and the chemically conserved amino acids are shaded gray. The scoring of invariant amino acids allowed two exceptions that were present in prophage remnants (marked with an asterisk) no longer subject to selection for function. The open arrow indicates the amino acids corresponding to the β-strand in the crystal structure. The numbers above the alignment indicate the positions of those amino acids in Mor. To the left of the alignment are the names of the prophage and prophage remnants (*) and the protein name, Mor or C, which was determined by the Mu protein to which it is the most similar (15). The names of prophages and prophage remnants are given in detail in previous genome papers (62-68) and reviews (69). D, the two plasmids that comprise the in vivo His-Mor-Pm activation assay are shown. Addition of IPTG leads to expression of His-Mor, which activates Pm, whose activity is assayed by measuring β-galactosidase produced by the Pm-lacZ fusion. Panel C was excerpted from
The structure of the His-Mor dimer revealed an additional novel secondary structure element that might play a role in DNA binding (15). The anti-parallel β-strands at the top of the molecule (Fig. 2, A and B) form a β-ribbon in a 12-amino acid linker that connects the dimerization and DNA-binding domains. The residues in the β-strand and flanking loops of this linker are highly conserved, with 7 of 12 amino acids identical between Mor and C (Fig. 1, B and C). The β-strand amino acid side chains of Gln-68 and Tyr-70 (numbered as in native Mor) extend away from the protein (Fig. 2B). When the two HTH motifs make contacts in the major groove of the inverted repeats in Pm (Fig. 1A), the surface-exposed side chains of Gln-68 and Tyr-70 are ideally positioned to interact with the minor groove of the GC-rich spacer between the inverted repeats (Fig. 1A).
Footnote 4: Y. Mo and M. M. Howe, unpublished results.
FIGURE 2. The crystal structure of the His-Mor dimer (Mor amino acids 27-120). A, the structure is viewed from the side with one subunit shown in yellow and the other in red. The secondary structure elements, helices α1-5 and β-strand 1, are labeled for the red subunit. The N and C termini of the red subunit are marked N and C, respectively. The central dimerization domain and the flanking DNA-binding domains are bracketed and labeled. The side chains of key amino acids in the β-strands and their flanking loops are shown in a "ball and stick" representation, whereas the glycines are depicted as spheres. The boxed portion is magnified below in panel B. B, a close-up look at the β-strand linker region with the key residues color-coded for each subunit as in A. The side chains of Gln-68 and Tyr-70 of each subunit are shown in a ball and stick representation, whereas the glycine residues are shown again as spheres. Each of the key amino acids is identified by its single letter code and position, with the amino acids in the yellow subunit denoted with a "prime." C, the His-Mor dimerization domain is shown as cylinders for α-helices and arrows for β-strands; the HTH domains are represented as ribbon diagrams. A 20-bp segment of B-DNA is shown above His-Mor with the arrows originating at the position in the HTH domain predicted to contact the DNA and the arrowheads pointing to the adjacent major grooves to be contacted. D, the diagram is shown as in C except that the DNA is 18-bp long and is bent ~40°. The HTH domains are shown docked into the DNA major grooves. C and D were originally published as Fig. 4, B and C, in Ref. 15.
In the crystal structure the side chains of residues Val-69, Ile-71, and Pro-72 are buried and involved in hydrophobic interactions in the dimer interface; they form a "cap-like" structure that seals the hydrophobic interior from the solvent (15).
This linker region contains three highly conserved glycines (Fig. 1C) that are predicted to form hinges that contribute to the conformational flexibility of Mor (15) (Fig. 2, B-D). The N-terminal amino acids of this hinge region form a "Gly-Gly-Gly" motif in which two of the glycines, Gly-65 and Gly-66, are highly conserved among Mor/C family members (Fig. 1C); whereas the C-terminal amino acids include only one conserved glycine, Gly-74 (Fig. 1C). Given their location between the two domains and the tolerance of glycine to extreme bond angles, we proposed that glycine hinges might act as pivot points for the conformational changes in Mor required for it to interact with the DNA (15).
In this study, the roles of the potentially key linker amino acids in His-Mor function were investigated by codon-specific site-directed mutagenesis of a his-tagged mor gene (his-mor). A two-plasmid transcription-activation phenotypic screening assay (Fig. 1D) for Pm-lacZ function was employed to identify candidate mutants with different levels of Pm activity, and their his-mor genes were sequenced. The mutant His-Mor proteins were then characterized for their overexpression, solubility, DNA binding, and transcription activation. The results presented here demonstrate that multiple conserved amino acids in the β-strand and flanking loops of this linker are crucial for DNA binding by His-Mor.

EXPERIMENTAL PROCEDURES
Media, Chemicals, and Enzymes-Protein overexpression and routine cell growth were done in LB medium (20), whereas cultures for β-galactosidase assays were grown in minimal medium with casamino acids (M9CA) (7). MacConkey lactose plates containing 25 g/liter of MacConkey agar and 25 g/liter of MacConkey agar base (Difco), and thus only half the normal amount of lactose, were used for plate phenotyping. When necessary, chloramphenicol and ampicillin were used at 25 and 40 µg/ml, respectively. Oligonucleotides for mutagenesis, sequencing, and probe preparation were purchased from Integrated DNA Technologies, Inc.; their sequences are given in Table 1. Chromomycin A3 was obtained from AG Scientific Inc. Sources for the remaining chemicals and enzymes are given in previous publications (10,12,15).
Bacterial Strains and Plasmids-The host strain background for most of the plasmid constructions and in vivo assays was E. coli K-12 strain MH13312 (mcrA Δpro-lac thi gyrA96 endA1 hsdR17 relA1 supE44 recA/F′ pro+ lacIQ1 ΔlacZY), a derivative of JM109 (21) carrying an F′ plasmid deleted for both lacZ and lacY and expressing higher than normal levels of Lac repressor (11). Strain MH13435 was made by transforming MH13312 with the Pm-lacZ fusion plasmid pIA14; it was used as a host for phenotypic assays of the ability of mutant Mor proteins to activate transcription from Pm. Strain MH13422 is a derivative of MH13312 containing both pIA14 and pIA69 and was used as the positive control in the MacConkey plate and β-galactosidase assays. Strain MH18211 was derived from MH13435 by transformation with the Mor deletion plasmid pMUT10; it was used as the negative control in the MacConkey plate and β-galactosidase assays. Strain MH13355 was derived from JM109(DE3) and contains the same F′ plasmid as MH13312. The DE3 in MH13355 is a derivative of λ D69 containing the T7 RNA polymerase gene under control of the IPTG-inducible PlacUV5; it also carries imm21 and nin5 mutations (21). This strain was used for protein overexpression and detection of wild-type and mutant proteins.
Plasmid pIA14 contains wild-type Pm sequences from −62 to +10 (the base at −62 is provided by the vector) cloned between the EcoRI and BamHI sites in the lacZ reporter plasmid (Fig. 1D) such that β-galactosidase levels provide an indicator of Pm activity (11). Plasmid pIA69 contains sequences encoding His-Mor with introduced silent restriction sites; relevant sites are shown in Fig. 1D. His-Mor expression in pIA69 is under the control of both a T7 promoter and PlacSYN (Fig. 1D), a slightly altered form of PlacUV5 (15). Amino acid changes were made in this his-mor gene; then the ability of the mutant protein to activate Pm was assayed by using the Pm-lacZ fusion in pIA14. The His-Mor deletion plasmid, pMUT10, is a derivative of pIA69 deleted from PstI to SphI, missing Mor amino acids 8-116, and in the wrong reading frame after 116.
Site-directed Mutagenesis-Plasmid pIA69 was used as the template for the PCR-based mutagenesis of his-mor and as the vector for cloning the PCR products. Degenerate mutagenic primers (Table 1) were designed to introduce base substitutions at all three positions of a single codon by using an equimolar mixture of bases (NNN or NNG+C) at the targeted codon (22). In most cases the degenerate primers were used for PCR with wild-type primers MUT13 or ZAO3 flanking the his-mor gene to create a library of mutant DNA fragments containing part of the his-mor gene with single codon mutations. The libraries were combined with overlapping wild-type PCR products containing the remaining part of the his-mor gene and were used as templates for overlapping PCR. In two cases both strands were mutagenized in separate PCR reactions and then combined and used as templates for overlapping PCR. After gel purification, the resulting mutagenized cassettes were cloned (using standard procedures; 23) into pIA69 between the PstI and either HindIII or BamHI sites, in and following his-mor, respectively.
The ligation mixtures were transformed as described previously (15) into strain MH13435 containing the Pm-lacZ fusion reporter plasmid, pIA14 (Fig. 1D) (11). In early constructions, the transformants were spread on LB plates with ampicillin (40 µg/ml) and chloramphenicol (25 µg/ml), and the resulting colonies were screened for Pm activity by picking and stabbing 48 to 96 colonies onto MacConkey lactose indicator plates with ampicillin, chloramphenicol, and different concentrations of IPTG (10, 50, and 100 µM) to induce His-Mor expression. In later experiments transformants were plated directly onto the MacConkey lactose plates containing 50 µM IPTG. The direct plating gave much better resolution of distinct single colony phenotypes. Representative mutant plasmids were chosen based on their plate phenotypes, and the mutations were identified by automated DNA sequencing of their his-mor genes by the UTHSC Molecular Resource Center. Mutant plasmids were then transformed into MH13355 for His-Mor protein overproduction and purification.
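The difference between the two degenerate codon designs mentioned above can be illustrated with a short enumeration. This is a minimal sketch, not part of the published protocol: it assumes the standard genetic code, equal incorporation of every base in the mixture, and it reads "NNG+C" as N, N, then G or C at the third position (an interpretation, not stated explicitly in the text). The function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons run TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codon_library(third_position_bases):
    """All codons N-N-X, where X is drawn from third_position_bases."""
    return ["".join(c) for c in product(BASES, BASES, third_position_bases)]


def residue_counts(codons):
    """Number of codons in the library that encode each residue (or stop)."""
    counts = {}
    for codon in codons:
        residue = CODON_TABLE[codon]
        counts[residue] = counts.get(residue, 0) + 1
    return counts


nnn = residue_counts(codon_library("TCAG"))  # fully degenerate codon
nns = residue_counts(codon_library("CG"))    # assumed reading of NNG+C

for name, counts in (("NNN", nnn), ("NN(G+C)", nns)):
    total = sum(counts.values())
    amino_acids = len(counts) - ("*" in counts)
    stops = counts.get("*", 0)
    print(f"{name}: {total} codons, {amino_acids} amino acids, {stops} stop codon(s)")
```

Under these assumptions both designs can encode all 20 amino acids, while restricting the third position halves the number of codons and reduces the number of stop codons in the library.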
Summary of Assays Used to Characterize Mutant Proteins-Overexpression of His-Mor proteins was achieved by using the T7 promoter in pIA69 and the T7 RNA polymerase provided by the MH13355 host. The expression levels and solubility of the mutant proteins were assessed by SDS-PAGE of supernatants from crude cell extracts that were made from those cells. Small scale His-Mor protein purification was accomplished using His tag affinity chromatography of the above supernatants. The ability of the mutant proteins to activate transcription was based on in vivo assays of β-galactosidase produced from a Pm-lacZ fusion plasmid. Detailed descriptions of these assays can be found in Ref. 15.
Probe Preparation and Gel Retardation Assay-The DNA-binding ability of the mutant His-Mor proteins was analyzed by using a gel retardation assay. The DNA probe was a PCR product made with 32P-labeled oligonucleotide MLK7 and unlabeled oligonucleotide IRI21 using pIA14 (containing Pm sequence −62 to +10) as template. After purification with a QiaQuick PCR purification kit (Qiagen), the probe concentration was estimated on a 2% agarose gel by comparison with a low mass DNA ladder (Invitrogen).
A 20-µl reaction volume containing 400 pg of probe, 50 ng of calf thymus DNA, and 200, 400, or 800 ng of His-Mor protein in binding buffer (20 mM Tris-HCl, pH 7.9, 50 mM NaCl, 5% glycerol, and 1 mM DTT) was incubated at room temperature for 20 min, then resolved on a 10% nondenaturing acrylamide gel containing 0.5× TBE and run in 0.5× TBE buffer at 260 V for 3 h at 4°C. Initial exposure of the gels to X-Omat Bio-Max MR film was done without drying.
Chromomycin Assays-Chromomycin interference and disruption assays were performed by using the probe preparation described above and chromomycin A3 from A.G. Scientific Inc. For the interference assay, different concentrations of chromomycin (final concentrations 0.5-50 µM) were incubated with the probe (~400 pg) in a 20-µl reaction volume at room temperature for 15 min followed by addition of 800 ng of purified His-Mor (1.18 µM dimer) and incubation at room temperature for an additional 20 min. For the disruption assay, the effect of chromomycin on preformed Mor-DNA complexes was assayed similarly except that the order of Mor and chromomycin addition was reversed. The reaction mixtures were resolved on a 10% native polyacrylamide gel as described above, and the gels were exposed to X-Omat-MR film without drying but with an intensifier screen for 16 h.
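The protein amounts used in the binding reactions can be put on a molar scale with a short conversion. The sketch below is illustrative only: the dimer molar mass is not stated in the text and is back-calculated here from the quoted equivalence of 800 ng in a 20-µl reaction with 1.18 µM dimer, so it should be treated as an assumption rather than a measured value. The function name is hypothetical.

```python
def micromolar(mass_ng, volume_ul, molar_mass_g_per_mol):
    """Convert a protein mass in a reaction volume to a micromolar concentration."""
    grams_per_liter = (mass_ng * 1e-9) / (volume_ul * 1e-6)
    return grams_per_liter / molar_mass_g_per_mol * 1e6


REACTION_VOLUME_UL = 20.0

# Assumption: dimer molar mass implied by "800 ng = 1.18 µM dimer" in 20 µl.
IMPLIED_DIMER_MASS = (800e-9 / 20e-6) / 1.18e-6  # g/mol

print(f"implied His-Mor dimer mass: {IMPLIED_DIMER_MASS:.0f} g/mol")
for mass_ng in (200, 400, 800):
    conc = micromolar(mass_ng, REACTION_VOLUME_UL, IMPLIED_DIMER_MASS)
    print(f"{mass_ng} ng in {REACTION_VOLUME_UL:.0f} µl = {conc:.2f} µM dimer")
```

With that implied mass, the 200- and 400-ng reactions correspond to one-quarter and one-half of the 1.18 µM dimer concentration used in the chromomycin assays.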

RESULTS
Rationale-To demonstrate that a particular amino acid is essential for protein function, one could construct a set of mutants, one for each of the other 19 amino acid substitutions, assay them, and show that none retain protein function. The mutants could be made by brute force, one mutant at a time, or more efficiently by making a library of substitutions at a single codon by using a primer with degeneracy at all three positions of the codon, and then sequencing clones until the 19 mutants have been found and tested for function. An even more efficient approach is to use an indicator plate assay for screening the library, where functional and nonfunctional phenotypes are easily discriminated. Sequencing of the target gene from phenotypically functional colonies will identify substitutions that retain protein function. If the library is diverse and all the phenotypically functional colonies have the wild-type protein sequence, then one can conclude that the amino acid at that position is essential. To demonstrate that the library is sufficiently diverse to contain all 19 substitutions, the genes from seven to 10 nonfunctional colonies are sequenced; they should all contain different single amino acid substitutions.
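The efficiency argument above can be made more concrete with a coupon-collector estimate of how many randomly picked clones would have to be sequenced, on average, before all 19 substitutions at one codon had been seen. This is a rough sketch under stated assumptions that are not taken from the text: a fully degenerate NNN codon with every base equally likely, independent clones, the standard genetic code, and wild-type or stop codons counted as uninformative draws. The function name is hypothetical, and the calculation uses inclusion-exclusion over groups of amino acids with the same number of codons.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import comb

# Standard genetic code; codons run TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def expected_clones(wild_type):
    """Expected number of NNN clones sequenced until all 19 substitutions appear."""
    # Codon multiplicity of each target amino acid (wild type and stop excluded).
    targets = [n for aa, n in Counter(AMINO_ACIDS).items() if aa not in (wild_type, "*")]
    groups = Counter(targets)  # codons per amino acid -> number of such amino acids
    sizes = sorted(groups)
    total = Fraction(0)
    # Inclusion-exclusion: E[T] = sum over nonempty subsets S of (-1)^(|S|+1) / P(S).
    for picks in product(*(range(groups[s] + 1) for s in sizes)):
        k = sum(picks)
        if k == 0:
            continue
        ways = 1
        for s, a in zip(sizes, picks):
            ways *= comb(groups[s], a)
        p_subset = Fraction(sum(s * a for s, a in zip(sizes, picks)), 64)
        total += (-1) ** (k + 1) * ways / p_subset
    return total


print(f"Gly codon: about {float(expected_clones('G')):.0f} clones expected")
```

Because Met and Trp are each encoded by a single codon, the estimate is dominated by the wait for the rarest substitutions, which is the practical reason a plate-based phenotypic screen followed by targeted sequencing is attractive.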
Mutant Isolation-To determine the importance and roles of the inter-domain linker amino acids in His-Mor function, we carried out site-directed mutagenesis by PCR with oligonucleotide primers containing degeneracy at all three positions of a single codon, i.e. oligonucleotides designed to introduce all possible amino acid substitutions at a given position (numbered as in native Mor). The mutated DNA fragments were cloned into the His-Mor expression plasmid pIA69 (Fig. 1D) and transformed into strain MH13435 containing the reporter Pm-lacZ fusion plasmid pIA14 (Fig. 1D). The resulting libraries of transformants containing mutant His-Mor proteins were screened on MacConkey lactose plates containing small amounts (50 µM) of IPTG to induce low levels of His-Mor expression. The colony color then reflected the ability of its mutant His-Mor protein to activate transcription from Pm. Transformants were initially scored in a "pick and stab" assay as defective (white or very light pink) or functional (red) by comparison to the phenotypes of strain MH18211 containing a deletion in His-Mor (Δhis-mor) and strain MH13422 with wild-type His-Mor, respectively. Subsequent re-testing by streaking for single colonies and by direct plating of the libraries on the MacConkey plates allowed us to identify two additional intermediate phenotypes, diffuse pale pink colonies and white colonies with red centers. The phenotypes being reported here (Table 2) were derived from this isolated colony assay. Libraries with substitutions in the most highly conserved positions Gly-65, Gly-66, and Tyr-70 produced 91, 94, and 71% white colonies, respectively. DNA sequencing showed that the plasmids from these colonies contained his-mor genes encoding single amino acid substitutions at the targeted positions. When the his-mor genes in the red colonies were sequenced, they had no mutations.
In contrast, libraries with substitutions in the less highly conserved positions Gln-68 and Gly-74 had only 63 and 56% mutant phenotypes and exhibited a range of mutant colony colors including white, pale pink, and white with red centers. For these positions we chose representatives from each phenotype for sequencing. The Gly-74 mutants were unusual in that one red colony and one white colony with a red center each contained single amino acid substitutions, demonstrating that Gly-74 is not essential. In addition, multiple mutants from the Gly-74 library made white colonies, but those colonies acquired red centers upon continued incubation; these colonies made His-Mor mutant proteins that retained partial function. The amino acid substitutions and colony phenotypes for the above mutants are given in Table 2. Finally, to validate this genetic approach we also made libraries for the nonconserved position Gly-67. For these libraries only 8% of the colonies had a mutant phenotype. Beyond sequencing to demonstrate the presence of mutations, analysis of these mutants was not pursued.
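One way to read the library percentages reported above is to compare them with the fraction of clones expected to carry a non-wild-type codon. The sketch below is a back-of-the-envelope comparison, not an analysis from the study: it assumes a uniformly degenerate NNN codon and the standard genetic code, and it ignores carry-over of wild-type template, cloning bias, and stop codons, all of which affect real libraries. The observed percentages are copied from the text; the function and variable names are hypothetical.

```python
from collections import Counter

# Standard genetic code; codons run TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS_PER_RESIDUE = Counter(AMINO_ACIDS)


def expected_non_wild_type(wild_type):
    """Fraction of NNN codons that do not encode the wild-type amino acid."""
    return 1 - CODONS_PER_RESIDUE[wild_type] / 64


# Position -> (wild-type residue, reported % of colonies with a mutant phenotype).
OBSERVED = {
    "Gly-65": ("G", 91),
    "Gly-66": ("G", 94),
    "Tyr-70": ("Y", 71),
    "Gln-68": ("Q", 63),
    "Gly-74": ("G", 56),
    "Gly-67": ("G", 8),
}

for position, (residue, observed) in OBSERVED.items():
    expected = 100 * expected_non_wild_type(residue)
    print(f"{position}: observed {observed}% mutant phenotype, "
          f"{expected:.1f}% of codons non-wild-type")
```

Under these assumptions a position where nearly every substitution is deleterious should approach the non-wild-type codon fraction, as Gly-65 and Gly-66 do, whereas a tolerant position such as Gly-67 falls far below it. The same fractions are obtained if the third codon position is restricted to G or C.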
Protein Overexpression-Amino acid substitutions in a protein can cause misfolding and lead to aggregation, precipitation, and increased sensitivity to proteolytic degradation. These consequences should be detectable by a reduction in levels of His-Mor protein in supernatants from crude cell extracts, as observed previously for several mutant proteins with substitutions in the dimerization domain (15). To ensure that the severe defects of these inter-domain linker mutants were not caused by precipitation or degradation of the mutant proteins, we assayed for the presence of overexpressed mutant protein in the supernatants from sonicated cell extracts by SDS-polyacrylamide gel electrophoresis. The gels in Fig. 3 show that the mutant proteins were overexpressed and present at levels comparable with that of wild-type His-Mor. Given the chemical diversity of the substitutions (Table 2) and their negligible effects on protein solubility and degradation (Fig. 3), we favor the interpretation that the defective phenotypes caused by the majority of these mutations are due to their interference with protein function.
DNA Binding-Representative mutant His-Mor proteins were purified by His tag affinity chromatography and tested for their ability to bind to a wild-type middle promoter DNA fragment by using a gel retardation assay (Fig. 4). Substitutions at the most highly conserved positions Gly-65, Gly-66, and Tyr-70 abolished DNA binding (Fig. 4), making it most likely that their defects in transcription activation are due to their inability to bind to promoter DNA. Substitutions in Gln-68 also prevented DNA binding with the exception of Q68R, which retained some ability to bind DNA but failed to activate transcription. Mutations in the less conserved Gly-74 position conferred less defective phenotypes, and almost half of their mutant His-Mor proteins retained at least partial function for DNA binding and for transcription activation as observed in the colony color assay.

TABLE 2 Mutant colony phenotypes
Transformants containing the pIA69-derived plasmids with codon-specific amino acid (aa) substitutions and the Pm-lacZ reporter plasmid were plated directly or streaked for single colonies on MacConkey lactose plates, incubated overnight, and then scored for their colony colors. The pIA69-derived plasmids were sequenced to determine their amino acid substitutions, which are indicated below.

TABLE 3 Summary of β-galactosidase and DNA binding by mutant proteins
The wild-type His-Mor amino acids predicted to be involved in conformational change and DNA minor groove binding were subjected to codon-specific mutagenesis, and the mutants were assayed for the effects of the mutations on DNA binding and transcription activation at Pm as described earlier. This summary presents the location and identity of the key wild-type amino acids in His-Mor in the top row. The amino acid changes that were recovered and characterized for each wild-type position are listed under column A. The β-galactosidase activities produced by transcription activation of the Pm-lacZ fusion by the mutant protein are given under column B, and are expressed relative to the wild-type level assayed in parallel and set to 1000 Miller units (53); they represent the average of results from either two (Gln-68 and Tyr-70) or three (all three glycines) independent experiments. The DNA-binding activity for each mutant is given in column D.
Column groups: WT Mor Gln-68; WT Mor Tyr-70; WT Mor Gly-65; WT Mor Gly-66; WT Mor Gly-74.
a A minus sign (−) indicates that no DNA binding was detected in gel retardation experiments such as those shown in Fig. 4.
b ND indicates that DNA binding was not tested for that particular mutant.
c For mutants that gave detectable DNA binding, the degree of binding was indicated by one or more "+" signs, with "++++" set as the wild-type level, "+++" indicating strong but definitely reduced binding at the lowest protein concentration, "++" with DNA binding only at the two highest protein concentrations, and "+" significantly reduced binding even at the two highest protein concentrations. Mutant Y70Q is not included in this summary because it was recently found to contain a second mutation.

Intriguingly, two substitutions at Gly-74 that exhibited significant DNA-binding activity were aromatic substitutions, G74W and G74F, whereas those with other substitutions, G74Y and G74S, failed to bind DNA.
Transcription Activation-To quantify the transcription activation ability of the mutant His-Mor proteins in vivo, we performed liquid β-galactosidase assays with the Pm-lacZ fusion as the reporter. The results are shown in Table 3. Consistent with their DNA-binding defects, mutants with substitutions at the most conserved positions Gly-65, Gly-66, and Tyr-70 were extremely defective, with β-galactosidase levels comparable with those of the Mor deletion (<10 units). The majority of Gln-68 mutants were similarly defective. Interestingly, for position Gly-74, the same two mutant proteins with aromatic substitutions, G74W and G74F, that retained some DNA-binding activity also exhibited more than 50% of wild-type levels of transcription activation. A summary of the β-galactosidase and DNA binding properties for the mutant proteins is given in Table 3.
Natural Amino Acid Substitutions-A second complementary approach to identifying amino acids that are likely to be important for protein function is to examine the degree to which they are conserved in members of a protein family. In 2003 a BLAST similarity search (24) of the GenBank protein data base (25) identified 15 Mor/C family members (Fig. 1C) (15). In late 2010 a similar single-round BLAST search identified 90 family members that were then directly aligned on the NCBI server by using COBALT (26). The majority of the family members were located in complete prophages or prophage remnants in bacterial genome sequences. Although it is not feasible for us to determine which of these proteins function, it is possible to tally the amino acids at each position and identify those that occur frequently and others that are relatively rare.
The rationale for this analysis is that amino acids that occur frequently are reasonably likely to retain function; whereas amino acids present rarely would more likely occur in mutant proteins that do not function.
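The tallying step described above can be sketched in a few lines. This is an illustrative example only: the aligned sequences are hypothetical toy data rather than Mor/C family members, and the rarity cutoff is a free parameter chosen for the example, not a value prescribed by the alignment software. The function names are hypothetical.

```python
from collections import Counter


def tally_column(aligned_sequences, column):
    """Count how often each residue occurs at one alignment column (0-based)."""
    return Counter(seq[column] for seq in aligned_sequences)


def rare_residues(tally, max_count):
    """Residues that occur no more than max_count times at the position."""
    return sorted(residue for residue, n in tally.items() if n <= max_count)


# Hypothetical aligned linker fragments (toy data, not real family members).
toy_alignment = [
    "GGQVY",
    "GGNVY",
    "GGNVY",
    "GAQVY",
    "GGSVY",
]

column_tally = tally_column(toy_alignment, 2)
print(column_tally.most_common())
print("rare residues:", rare_residues(column_tally, max_count=1))
```

Applied to a real alignment, the same tally distinguishes positions dominated by a single residue from positions where several residues occur at comparable frequencies.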
Examination of the tally results shown in Table 4 reveals that the amino acids that were highly conserved in the earlier 15-member family (Fig. 1C) are extremely highly conserved in the current 90-member family (Table 4). They are Gly-65, Gly-66, Tyr-70, and Pro-72; at these positions there are only 3, 1, 0, and 1 nonconserved amino acids, respectively. For Gly-65 two of the three exceptions were found in the initial mutant isolation and produced white colonies and <10 units of β-galactosidase, supporting our hypothesis that rare amino acid changes result in nonfunctional mutant proteins. These exceptions also lead us to suspect that amino acids present three or fewer times are likely to be in defective proteins. The BLAST results for Gly-74 show that it is highly conserved, with Gly in 75 of 90 members, and the other 15 distributed among 7 different amino acids, with a maximum of four times for a single amino acid.
These data alone would have led us to predict that these other 15 proteins would be entirely nonfunctional, but the colony colors in Table 2 and β-galactosidase values in Table 3 indicate that most of them retain partial Mor function. For Gln-68, Asn is the most frequent amino acid, being present 27 times, with Gln found 24 times and Ser found 14 times; the remainder of the substitutions were present from 1 to 6 times and contain Leu, Gly, and Ile substitutions, which were also found among the highly defective mutants (Table 2).
Effects of Chromomycin-Modeling studies demonstrated that the side chains of β-strand amino acids Gln-68 and Tyr-70 are favorably placed to interact with the minor groove of the GC-rich spacer between the inverted repeats in the Mor binding site. To investigate the significance of the proposed minor groove contacts to Mor-Pm interactions, we used gel mobility shift assays to test whether His-Mor could bind to the middle promoter in the presence of the GC-specific minor groove binding ligand, chromomycin A3, at concentrations known to inhibit binding of other proteins (27,28). In the first experiment chromomycin was incubated with the promoter DNA, followed by addition of His-Mor. Chromomycin reduced His-Mor binding at 0.5 and 1 µM and completely inhibited binding to the promoter DNA at concentrations as low as 2 µM; only the chromomycin-DNA complex was present at, and above, 2 µM (Fig. 5A).
To ask whether chromomycin would be able to access the minor groove and displace His-Mor from a His-Mor·DNA complex, as observed previously for an EGR1-DNA complex (28), we conducted a second experiment in which the order of addition of His-Mor and chromomycin was reversed. When chromomycin was added to preformed His-Mor·DNA complexes, it was unable to completely disrupt the complexes even

Tally of amino acid substitutions
The Mor amino acid (WT Mor) sequence was subjected to a BLAST (24) similarity search, identifying a total of 90 proteins (including Mor), which were then directly aligned on the NCBI server using COBALT (26). For Mor amino acids Gly-65 through Ala-76, which comprise the interdomain linker, the frequencies of each amino acid at each position were tallied. The identity of each amino acid substitution and the number of times that amino acid was present at that position in the 90 proteins are listed below each Mor amino acid. The amino acids shown in italics were found among the highly defective mutants in Table 2.
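The per-position tally described in this legend can be sketched in a few lines of Python. This is a minimal illustration, not the procedure actually used to build Table 4: the function names (`column_for_residue`, `tally_positions`, `rare_residues`) are hypothetical, and the four-sequence alignment is invented toy data, not the Mor family alignment. The sketch assumes the sequences are already aligned to equal length with `-` as the gap character, and it maps reference residue numbers to alignment columns by skipping gaps in the reference.

```python
from collections import Counter

GAP = "-"

def column_for_residue(ref_aligned, residue_number, first_residue=1):
    """Return the alignment column that holds the given reference residue number."""
    count = first_residue - 1
    for col, symbol in enumerate(ref_aligned):
        if symbol != GAP:  # gaps in the reference do not consume a residue number
            count += 1
            if count == residue_number:
                return col
    raise ValueError(f"residue {residue_number} not found in reference")

def tally_positions(aligned_seqs, ref_aligned, residue_numbers, first_residue=1):
    """Map each reference residue number to a Counter of symbols in its column."""
    tallies = {}
    for number in residue_numbers:
        col = column_for_residue(ref_aligned, number, first_residue)
        tallies[number] = Counter(seq[col] for seq in aligned_seqs)
    return tallies

def rare_residues(counter, max_count):
    """List amino acids seen at most max_count times at one position (gaps ignored)."""
    return sorted(aa for aa, n in counter.items() if aa != GAP and n <= max_count)

# Invented toy alignment (NOT real Mor homologs); the first entry plays the reference.
toy_alignment = ["GG-QVY", "GGAQVY", "GS-NVY", "GG-NIY"]
tallies = tally_positions(toy_alignment, toy_alignment[0], range(1, 6))

for number, counts in tallies.items():
    print(number, counts.most_common())
```

With a real alignment, calling `rare_residues(tallies[n], 3)` would flag residues present three or fewer times at position `n`, the threshold suggested in the text for substitutions likely to occur in defective proteins.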

DISCUSSION
The crystal structure of His-Mor revealed a novel structural element, a 12-amino acid inter-domain linker containing a β-strand, which we proposed serves as a hinge needed for DNA binding of His-Mor (15). The results from this genetic analysis demonstrate that the highly conserved Tyr-70 residue in the β-strand and the Gly-65, Gly-66, and Gly-74 residues in the flanking loops play a significant role in DNA binding and, thus, transcription activation by His-Mor.
The Roles of Gln-68 and Tyr-70. Docking of the crystal structure of His-Mor onto a 16-bp long B-DNA, permitting the HTH motifs to contact the DNA major grooves, revealed that the side chains of β-strand residues Gln-68 and Tyr-70 would be brought into close proximity to the minor groove and, therefore, might participate in DNA binding by His-Mor. The severe DNA-binding and transcription-activation defects of the mutant proteins with most substitutions at Gln-68 and Tyr-70 are consistent with this hypothesis. One possibility is that the Tyr-70 side chain intercalates into the DNA, causing the observed DNA bend. Tyrosine intercalation is found in multiple DNA-binding proteins, including bovine pancreatic DNase I (29), human 3-methyladenine DNA glycosylase (30), and MutY of Bacillus stearothermophilus (31), leading to DNA bends of 20, 22, and 55 degrees, respectively. For Gln-68, it seems more likely that its amide side chain, and that of its chemically conserved substitution Asn-68, plays a stabilizing role in interactions with the negatively charged sugar-phosphate backbone of the DNA.
The Role of Flexible Glycines in DNA Binding by His-Mor. In the DNA-free crystal structure of His-Mor, the tips of the predicted recognition helices of the HTH motif are unfavorably positioned to interact with two adjacent major grooves of DNA (15) (Fig. 2C). In particular, they are 63 Å apart, which is ~29 Å farther than the distance between the centers, and ~9 Å farther than the outermost edges, of two adjacent major grooves of B-form DNA. Structural changes that would allow His-Mor to assume an altered DNA-bound conformation were predicted to stem from the flexibility provided by the three highly conserved glycines flanking the β-strands (15). The mutational analysis presented here demonstrates that these glycines, Gly-65, Gly-66, and Gly-74, are important for His-Mor function. It also revealed the unexpected finding that some substitutions at Gly-74 retain partial His-Mor function. Given the lack of side chains in glycines and their known roles in the flexibility of other proteins (32-35), it is more likely that they play a role in the conformational change in His-Mor to allow it to bind to DNA than in making direct DNA contacts.
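The distance comparison in the preceding paragraph can be checked with simple arithmetic. The short Python sketch below back-calculates the B-DNA distances implied by the quoted differences and compares the center-to-center value with one helical turn. The three input values come from the text; the rise per base pair (3.4 Å) and 10 bp per turn are standard textbook values for idealized B-form DNA and are assumptions here, not measurements from this study.

```python
# Values quoted in the text (angstroms).
helix_tip_separation = 63.0        # distance between recognition-helix tips
excess_over_groove_centers = 29.0  # approximate, as quoted
excess_over_outer_edges = 9.0      # approximate, as quoted

# Distances on B-DNA implied by those differences.
implied_center_to_center = helix_tip_separation - excess_over_groove_centers
implied_outer_edge_span = helix_tip_separation - excess_over_outer_edges

# Assumed idealized B-DNA geometry (textbook values, not from this study).
RISE_PER_BP = 3.4   # angstroms per base pair
BP_PER_TURN = 10    # base pairs per helical turn
helical_pitch = RISE_PER_BP * BP_PER_TURN

print(f"implied groove center-to-center distance: {implied_center_to_center:.0f} Å")
print(f"implied outer-edge span: {implied_outer_edge_span:.0f} Å")
print(f"idealized B-DNA pitch: {helical_pitch:.0f} Å")
```

Under these assumptions the implied center-to-center distance matches one helical turn, as expected for adjacent major grooves on the same face of the DNA, so the recognition helices would need to move roughly 29 Å closer together to sit in those grooves.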
Minor Groove DNA Binding by His-Mor. Sequence-specific DNA-binding proteins generally make base-specific contacts in the major groove due to its better accessibility and greater information content (36-38). Other proteins such as TATA-binding proteins and proteins with high mobility group domains, e.g. sex-determining region Y and Lef-1, exclusively contact bases in the minor groove (39-41). Finally, some transcription regulators make simultaneous contacts with both the major and minor grooves (17, 42-44). For example, proteins containing winged HTH motifs use their HTH motifs for major groove binding, and their "wings" to probe the flanking minor grooves (45-48). Similarly, transcription regulators in the LacI family utilize an HTH motif for major groove interactions and a hinge region connecting the domains to contact the intervening minor groove (43,49). Despite the variation in the secondary structure elements making the minor groove contacts, these proteins induce DNA unwinding and bending toward the major groove by inserting amino acid side chains into the minor groove. In the case of DNase I (29) and human 3-methyladenine DNA glycosylase (30), it is tyrosine intercalation into the minor groove that results in widening of the minor groove, narrowing of the major groove and bending of the DNA toward the major groove.
Based on the crystal structure of the DNA-free His-Mor dimer, we proposed that His-Mor also uses two elements for DNA binding: the HTH motifs for major groove interactions and the inter-domain linker for binding to the intervening minor groove on the same face of the DNA. To address this possibility we assayed the effect of the GC-specific minor groove binding drug chromomycin A3 on His-Mor·DNA interactions. Consistent with our prediction, chromomycin was able to prevent His-Mor from binding to the DNA, but it is not clear whether this occurred because of minor groove binding by His-Mor or because of narrowing of the adjacent major grooves as a consequence of chromomycin binding. Chromomycin did not inhibit DNA binding by the HSV ICP4 (infected-cell polypeptide 4) (earlier called IE175) major transcriptional regulator (50,51), which from bioinformatic analyses is predicted to bind in the major groove via a helix-turn-helix DNA binding motif (52). However, chromomycin did prevent binding of proteins EGR1 and WT1 (Wilms tumor 1) known to bind in the major groove (28). It is not clear whether this is due to groove width changes from the widened minor groove causing narrowing of the major groove or because of structural changes it causes in the DNA immediately flanking its binding site (27,53).
When we tested whether preformed His-Mor·DNA complexes could be disrupted by chromomycin, they were not, i.e. chromomycin was unable to access the GC-rich minor groove in preformed His-Mor·DNA complexes even at very high concentrations (50 μM). Just the opposite was observed for the zinc finger, major groove binding, transcription factor EGR1 (28). When chromomycin was added to preformed EGR1·DNA complexes, it was able to access the minor groove and disrupt the preformed complexes, even at the very low chromomycin concentrations (~1 μM) that blocked binding of EGR1 to its DNA-binding site (28). We suggest that the inability of chromomycin to disrupt the preformed His-Mor·DNA complexes indicates that chromomycin cannot access the minor groove when His-Mor is bound and thus supports the hypothesis that Mor interacts with the minor groove as well as the two flanking major grooves. Providing additional support, binding of the Mu C protein to the late promoter, Pmom, blocked the minor groove accessibility to the minor groove-specific chemical nuclease, 1,10-phenanthroline (54). Taken together, these results support our hypothesis that His-Mor interactions with the minor groove of the GC-rich spacer in the Mor-binding site are essential for His-Mor·DNA binding and that the minor groove becomes inaccessible upon His-Mor binding.
A Role for Pro-72 in His-Mor Function. In the crystal structure of the His-Mor dimer the hydrophobic residues Val-69, Ile-71, and Pro-72 were found in a "cap-like" structure at the top of the dimerization interface with their side chains extending down into the hydrophobic core (15). We thought that these residues would serve the same role in DNA-bound His-Mor, so we did not include them in the mutational analysis. The natural hydrophobic substitutions observed in the BLAST results for Val-69 and Ile-71 are consistent with this hypothesis. The essential nature of Pro-72, however, forces us to consider whether Pro-72 plays a different role in the DNA-bound form of His-Mor. Proline is unique among naturally occurring amino acids in the cyclization of its side chain to the backbone amide and its ability to adopt two conformations, cis and trans. It is known to function as a molecular hinge (55), mediating conformational transitions in a number of proteins (55-58). In His-Mor, Pro-72 is located at the C-terminal end of the β-strand (15), and its isomerization could act as a molecular switch inducing structural changes in His-Mor to confer, in combination with the conserved glycines, the DNA-bound form of His-Mor. Alternatively, the above structural changes in His-Mor could be independent of Pro-72 but its side chain might participate directly in His-Mor·DNA interactions, either by intercalation between bases or by interactions with the sugar-phosphate DNA backbone. Proline intercalation and the resulting ~90° DNA bend have been well documented in the bacterial histone-like proteins, integration host factor and HU, and also in eukaryotic high mobility group proteins (36,59,60). The smaller, ~45°, bend observed for His-Mor might occur if there is only partial intercalation. On the other hand, interactions between the proline side chain and the sugar-phosphate backbone do occur, for example, in DNA binding by the E. coli transcription regulator PutA (61).
Thus, similar interactions between the side chain of Pro-72 and the sugar-phosphate DNA backbone might participate in His-Mor·DNA interactions and play a role in stabilizing the bending angle generated by His-Mor binding. Finally, it is also possible that Tyr-70 and Pro-72 both interact within the minor groove. There is precedent for such interactions; for example, in 3-methyladenine DNA glycosylase, Tyr-162 intercalates between the bases, resulting in widening of the minor groove, which is then filled by Met-164 and Tyr-165 (30).
Conclusion. Results from this mutational and homolog identification analysis favor the model for middle promoter binding by His-Mor in which the conformational changes in His-Mor stemming from the inter-domain linker region and the structural changes in DNA due to His-Mor-minor groove interactions are crucial for DNA binding and transcription activation by His-Mor. Given the high degree of sequence conservation in the β-strand linker region among the Mor/C family members, this mode of DNA binding is likely to be common in this family.