Structural Basis for Interaction between Mycobacterium smegmatis Ms6564, a TetR Family Master Regulator, and Its Target DNA

Background: The structural basis for interaction between a master regulator and DNA remains unclear. Results: We solved the crystal structures of the broad regulator Ms6564 and its protein-operator complex. Conclusion: Ms6564 binds DNA with strong affinity but makes flexible contacts with DNA. Significance: Ms6564 might slide more easily along genomic DNA and extensively regulate the expression of diverse genes.
Master regulators, which broadly affect the expression of diverse genes, play critical roles in bacterial growth and environmental adaptation. However, the mechanism by which such regulators interact with their cognate DNA remains to be elucidated. In this study, we solved the crystal structures of the broad regulator Ms6564 from Mycobacterium smegmatis and of its protein-operator complex at resolutions of 1.9 and 2.5 Å, respectively. As with other typical TetR family regulators, two Ms6564 dimers were found to bind to opposite sides of the target DNA. However, the recognition helix of Ms6564 inserted only slightly into the DNA major groove. Unexpectedly, 11 disordered water molecules bridged the regulator-DNA interface. Although the DNA was deformed upon Ms6564 binding, it retained a B-form conformation. Within the DNA-binding domain of Ms6564, only two amino acid residues directly interacted with the bases of the cognate DNA, and Lys-47 was essential for the specific DNA binding ability of Ms6564. These data indicate that Ms6564 binds DNA with strong affinity but makes flexible contacts with it. Our study suggests that Ms6564 might slide more easily along genomic DNA and extensively regulate the expression of diverse genes in M. smegmatis.

Protein-DNA interactions play critical roles in many biological processes (1,2). This is particularly true for transcriptional regulation, because a regulator can function only when it recognizes its target DNA. In recent years, the functions of several master regulators, which control the expression of large numbers of genes, have been characterized. The structural basis for such broad regulation, however, remains largely unclear.
A protein's specificity and affinity for DNA are usually determined by a base readout mechanism (recognition of DNA bases) or a shape readout mechanism (recognition of DNA shape) (1,3). α-Helices and β-strands are the two common secondary structure elements used for base readout (1,2,4). Helix-turn-helix (1,2,4-7) and helix-loop-helix (8) motifs, both of which belong to the mainly-α structural category, are frequently used to recognize the DNA major groove. Interestingly, some regulators use both base and shape recognition to interact with their target DNA. One example is the two-component response regulator NarL, which controls the expression of many respiration-related operons (9-11). The structure of the NarL signal output domain (NarLC) in complex with DNA shows that NarLC binds as a dimer: its recognition helices contact the floor of the major groove, and the DNA is bent and converted to an A-form conformation (9). In contrast, transcription activator-like effectors can bind almost any DNA or DNA-RNA hybrid sequence, primarily through base-specific recognition by a central domain of tandem repeats (12-14).
The TetR family of transcriptional regulators (TFRs) is a large group of transcriptional regulators. Its prototype is Escherichia coli TetR, which regulates the expression of a tetracycline efflux pump in Gram-negative bacteria (15). TFRs often act as repressors and regulate a variety of bacterial physiological processes (15). They usually function as homodimers in which each monomer consists of an N-terminal DNA-binding domain (DBD) and a C-terminal ligand-binding domain (15-17). For example, Staphylococcus aureus QacR, which regulates the expression of a multidrug transporter (18), binds a 28-bp operator as a pair of dimers; each half-site of the operator is recognized by the DBD of a QacR dimer, with the two dimers bound on opposite sides of the DNA (19). Similarly, a pair of Corynebacterium glutamicum CgmR dimers docks on opposite sides of its operator (20). Some master TFRs regulate the expression of large numbers of genes. For example, SmcR controls at least 121 genes (21), and KstR directly regulates 83 and 74 genes in Mycobacterium smegmatis and Mycobacterium tuberculosis, respectively (15,22). More recently, Ms6564 was characterized as a master regulator with 339 potential target genes in M. smegmatis (15). However, the mechanisms by which such master regulators recognize specific DNA motifs are poorly understood.
In the present study, we determined the crystal structures of the TetR family master regulator Ms6564 and of the Ms6564-operator complex at resolutions of 1.9 and 2.5 Å, respectively. Two Ms6564 dimers bind to opposite sides of the operator, as in other TFRs such as QacR and CgmR (19,20). However, Ms6564 makes flexible contacts with the DNA base pairs, and, strikingly, 11 water molecules are incorporated into the protein-DNA interface. In addition, only two residues in the Ms6564 DBDs, Lys-47 and Gln-48, directly interact with the cognate DNA bases. Thus, Ms6564 binds DNA with good affinity but makes flexible contacts with it, which may allow Ms6564 to extensively regulate the expression of diverse genes in M. smegmatis.

EXPERIMENTAL PROCEDURES
Strains, Enzymes, Plasmids, and Chemicals-E. coli BL21(DE3) cells and the pET28a expression vector were purchased from Novagen. All enzymes (DNA polymerase, restriction enzymes, and DNA ligase), deoxynucleoside triphosphates (dNTPs), and antibiotics were purchased from TaKaRa Biotech. Isopropyl β-D-1-thiogalactopyranoside, DTT, and all other chemicals were purchased from Sigma. PCR primers were synthesized by Invitrogen.
Protein Expression and Purification-The gene fragment encoding truncated Ms6564 (residues 9-189) was amplified from the genomic DNA of M. smegmatis mc²155 and cloned into the pET28a vector. E. coli BL21(DE3) cells transformed with the recombinant plasmid were grown in LB medium at 37°C to an A600 of 0.8, and protein expression was induced with 0.1 mM isopropyl β-D-1-thiogalactopyranoside at 16°C. Selenomethionine-labeled (SeMet) Ms6564 was expressed in M9 medium. Both native and SeMet Ms6564 were purified on Ni2+ affinity columns as described previously (15). The eluted proteins were further purified on heparin affinity columns (GE Healthcare) and eluted with buffer containing 20 mM Tris (pH 8.0) and 600 mM NaCl. The proteins were then loaded onto a Superdex 200 column (GE Healthcare) in 20 mM Tris (pH 8.0) and 500 mM NaCl. The purified proteins were concentrated to 10 mg/ml in 20 mM Tris (pH 8.0), 300 mM NaCl, 50 mM imidazole, and 1 mM DTT.
Crystallization and Data Collection-All crystals suitable for x-ray diffraction were obtained by sitting-drop vapor diffusion at 4°C. N-terminally truncated Ms6564 (residues 9-189; 1 µl) was mixed with 1 µl of reservoir solution containing 0.2 M sodium citrate tribasic dihydrate, 0.1 M HEPES sodium (pH 7.5), and 30% (±)-2-methyl-2,4-pentanediol, and the drop was equilibrated against 120 µl of reservoir solution for 7 days. Crystals were soaked in reservoir solution supplemented with stepwise increasing glycerol concentrations (10%, then 15%, and finally 20%) and flash-cooled in liquid nitrogen. SeMet-Ms6564 (residues 9-189) crystals were obtained by the same procedure and cryoprotected by raising the glycerol concentration stepwise to 20% in 5% increments.
Brominated DNA was used to assign the DNA bases. SeMet-Ms6564 was mixed with a 31-bp brominated DNA (5′-TCATAAACGAGACGGTACGTCTCGTCTTGTG-3′). To minimize any effect of bromine atoms on the DNA binding ability of Ms6564, the three underlined bases were chosen for bromination on the basis of the electron density of the Ms6564-DNA complex. The SeMet-Ms6564/brominated DNA complex was crystallized using the procedures described above for the native Ms6564-DNA complex.
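The operator used for crystallization is described as palindromic. As a hedged illustration (not part of the original analysis), the Python sketch below scans outward from the central base of the 31-bp sequence given above and reports the longest run of symmetric, Watson-Crick-paired positions. The function names are hypothetical. On this sequence it finds 8-bp inverted-repeat arms (ACGAGACG and CGTCTCGT) flanking a 3-nt central GTA core that does not itself pair symmetrically.

```python
# Hypothetical helpers for locating the inverted-repeat arms of a DNA sequence.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return "".join(PAIR[base] for base in reversed(seq))


def paired_offsets(seq: str, center: int) -> list[int]:
    """Offsets d at which seq[center - d] and seq[center + d] are Watson-Crick partners."""
    max_d = min(center, len(seq) - 1 - center)
    return [d for d in range(1, max_d + 1) if PAIR[seq[center - d]] == seq[center + d]]


def longest_arms(seq: str, center: int) -> tuple[str, str]:
    """Return (left_arm, right_arm) for the longest contiguous run of paired offsets."""
    best: list[int] = []
    run: list[int] = []
    for d in paired_offsets(seq, center):
        # Extend the current run if this offset is consecutive, otherwise start a new run.
        run = run + [d] if run and d == run[-1] + 1 else [d]
        if len(run) > len(best):
            best = run
    if not best:
        return "", ""
    lo, hi = best[0], best[-1]
    return seq[center - hi:center - lo + 1], seq[center + lo:center + hi + 1]


operator = "TCATAAACGAGACGGTACGTCTCGTCTTGTG"  # 31-bp DNA used for crystallization
center = len(operator) // 2                     # index 15, the central T
left, right = longest_arms(operator, center)
print(left, right)                               # ACGAGACG CGTCTCGT
print(reverse_complement(left) == right)         # True
```

The same functions can be pointed at other candidate operators; a different `center` would be needed for even-length inverted repeats, which this sketch does not handle.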
Structure Determination and Refinement-X-ray diffraction data were collected on beamline 3W1A of the Beijing Synchrotron Radiation Facility, equipped with a MAR-165 CCD detector. All data were processed and scaled with HKL2000. The structure of SeMet-Ms6564 was determined by single-wavelength anomalous dispersion (SAD). Phenix.autosol located three selenium sites, used them to calculate the initial experimental phases, and built an initial model in which ~90% of the residues were traced. The remaining residues were built manually in COOT, and the complete model was refined with Phenix.refine. The structure of native Ms6564 (residues 9-189) was determined by molecular replacement in Phaser, using the SeMet-Ms6564 structure (residues 11-189) as the search model. Iterative rounds of refinement with Phenix.refine and manual rebuilding in COOT yielded a final native model with excellent geometry (Table 1). The structure of the SeMet-Ms6564-DNA complex was also determined by SAD, as described above.
To determine the precise positions of the base pairs, DNA was labeled with bromine, and diffraction data were collected at the bromine absorption edge. Two bromine sites were identified in the anomalous Patterson map. The complex was also refined with Phenix.refine. To avoid any effect of bromine on protein binding, the double-stranded DNA in the final SeMet-protein-DNA structure does not contain bromine atoms.
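Refinement quality is summarized by Rwork and Rfree (Table 1), which measure agreement between observed structure-factor amplitudes and those calculated from the model; Rfree is computed on a small set of reflections (commonly about 5%) that are excluded from refinement. The NumPy sketch below illustrates the standard definition R = Σ||Fobs| - k|Fcalc|| / Σ|Fobs|. It is a simplification under stated assumptions (a single least-squares overall scale k fitted on working reflections, no bulk-solvent or anisotropic correction), not the Phenix.refine implementation, and the function names are hypothetical.

```python
import numpy as np


def r_factor(f_obs, f_calc, scale=1.0):
    """R = sum(| |Fobs| - k|Fcalc| |) / sum(|Fobs|) over the given reflections."""
    f_obs = np.abs(np.asarray(f_obs, dtype=float))
    f_calc = np.abs(np.asarray(f_calc, dtype=float))
    return float(np.sum(np.abs(f_obs - scale * f_calc)) / np.sum(f_obs))


def r_work_r_free(f_obs, f_calc, free_flags):
    """Return (Rwork, Rfree) using one scale factor fitted on working reflections only.

    Assumption: k is the least-squares linear scale; real refinement programs
    also model bulk solvent and anisotropy, which this sketch ignores.
    """
    f_obs = np.abs(np.asarray(f_obs, dtype=float))
    f_calc = np.abs(np.asarray(f_calc, dtype=float))
    free = np.asarray(free_flags, dtype=bool)
    work = ~free
    # The free set is never used to fit k, so Rfree stays an unbiased check.
    k = np.sum(f_obs[work] * f_calc[work]) / np.sum(f_calc[work] ** 2)
    return r_factor(f_obs[work], f_calc[work], k), r_factor(f_obs[free], f_calc[free], k)


# Synthetic usage example (random numbers, not Ms6564 data).
rng = np.random.default_rng(0)
f_true = rng.uniform(10, 100, size=2000)
f_obs = f_true + rng.normal(0, 5, size=2000)
f_calc = 0.5 * f_true            # model amplitudes on an arbitrary scale
free = rng.random(2000) < 0.05   # ~5% test set
print(r_work_r_free(f_obs, f_calc, free))
```

Because the free reflections play no part in fitting, a large gap between Rfree and Rwork is the usual warning sign of overfitting.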
DNA Substrate Preparation and EMSA-DNA fragments for the DNA binding assays were either synthesized by Invitrogen or amplified by PCR from the genomic DNA of M. smegmatis mc²155. The DNA substrates were labeled and prepared as described previously (15) and stored at −20°C until use. Genes encoding the Ms6564 mutants (K47A, Q48A, and K47A/Q48A) were generated by site-directed mutagenesis using the wild-type gene as a template and cloned into pET28a. Protein expression in E. coli BL21(DE3) was induced with 0.1 mM isopropyl β-D-1-thiogalactopyranoside at 37°C for 4 h, and the proteins were purified on Ni2+ affinity columns as described previously (15). EMSAs with labeled DNA fragments were also performed as described previously (15). Images were acquired with a Typhoon scanner (GE Healthcare).
Overall Structure of the Ms6564-Operator Complex-We further solved the crystal structure of the Ms6564-operator complex at 2.5 Å and refined it to an Rwork of 22.5% and an Rfree of 28.1% (Table 1). As shown in Fig. 1B, the crystallographic complex comprises four Ms6564 monomers and a 31-bp palindromic DNA derived from an M. smegmatis promoter. The four similar monomers form two dimers: one consists of distal monomer A and proximal monomer D, and the other of distal monomer B and proximal monomer C (Fig. 1B). The root mean square deviation (RMSD) between the two dimers is 1.18 Å, indicating that they are structurally similar. The operator DNA is recognized by the DBDs of the two Ms6564 dimers, which bind on opposite sides of the DNA (Fig. 1, B and C). When Ms6564 binds DNA, its N-terminal domain is bent toward the DNA, and the positively charged N-terminal arm inserts into the minor groove (Fig. 1D). Consistent with this, the RMSD between the DNA-free and DNA-bound structures is 1.8 Å, indicating that DNA binding substantially changes the relative orientation of the N- and C-terminal domains. Thus, Ms6564 undergoes a conformational change upon binding DNA. Unlike in QacR, the symmetry axes of the two Ms6564 dimers lie in the same plane and are antiparallel to each other (Fig. 2, A and B). Ms6564 binds DNA loosely, and its recognition helix inserts only slightly into the DNA major groove (Fig. 2C, upper panel). This contrasts strikingly with the QacR-DNA complex, in which the recognition helices sink deep into the floor of the major groove (Fig. 2C, lower panel).
DNA Deformation Occurs in the Ms6564-DNA Complex-We observed clear evidence of DNA deformation upon Ms6564 binding, although the DNA retains a typical B-form conformation, with average global roll and twist angles of 2.9° and 34.7°, respectively. In the Ms6564-DNA complex, the mean major groove width decreases to 10.5 Å, compared with 11.4 Å for canonical B-DNA. Interestingly, the recognition helices contact all four regions in which the major groove narrows. In contrast, the average minor groove width is 7.4 Å, a marked increase over the 5.9 Å of canonical B-form DNA (Fig. 3). Thus, the DNA conformation changes in the regulator-DNA complex, but the DNA retains an overall B-form conformation.
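The RMSD values quoted for the complex (1.18 Å between the two dimers and 1.8 Å between the DNA-free and DNA-bound structures) are conventionally computed after optimal rigid-body superposition of matched atoms; the atom selection used (for example, Cα atoms only) is not stated here. A minimal sketch of the standard Kabsch superposition followed by RMSD is shown below. The function name is hypothetical, and the example uses synthetic coordinates rather than the Ms6564 model.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """RMSD between matched (N, 3) coordinate sets after optimally superimposing p onto q."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p_c = p - p.mean(axis=0)                      # remove translation
    q_c = q - q.mean(axis=0)
    u, _, vt = np.linalg.svd(p_c.T @ q_c)         # SVD of the 3x3 covariance matrix
    d = np.sign(np.linalg.det(vt.T @ u.T))        # d = -1 would signal an improper rotation
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T     # proper rotation taking p_c onto q_c
    diff = p_c @ rot.T - q_c
    return float(np.sqrt(np.sum(diff ** 2) / len(p)))


# Synthetic check: a rotated and translated copy superimposes with RMSD ~ 0.
rng = np.random.default_rng(1)
coords = rng.normal(scale=10.0, size=(50, 3))
theta = np.deg2rad(30.0)
rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
moved = coords @ rz.T + np.array([5.0, -3.0, 2.0])
print(round(kabsch_rmsd(coords, moved), 6))  # 0.0
```

Because superposition removes rigid-body motion, a nonzero RMSD such as 1.8 Å reflects genuine internal rearrangement, here the change in interdomain orientation upon DNA binding.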
Eleven Water Molecules Are Involved in Bridging the Protein-DNA Interaction-Previous reports have not described water molecules participating in protein-DNA interactions in TFRs (19-21). Unexpectedly, we found that seven water molecules bridge contacts between Ms6564 and DNA base pairs, and four water molecules mediate hydrogen bonds between the protein and the DNA backbone (Fig. 4A). Residues Glu-37, Lys-47, Gln-48, Thr-49, and Tyr-51 participate in water-mediated interactions with base pairs. In the Ms6564-DNA complex, monomer D forms two water-mediated hydrogen bonds with the base pairs (Fig. 4D). Only one water molecule contributes to base binding in each of monomers A and B (Fig. 4, B and E), whereas three water molecules participate in indirect base-pair binding in monomer C (Fig. 4C).
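As a quick consistency check on these counts, the short sketch below tallies base-bridging waters per monomer. It assumes that monomers A and B each contribute one water and that the two water-mediated hydrogen bonds of monomer D involve two distinct waters; the text does not state the latter explicitly. The backbone-bridging waters are not split by monomer, so they are added as a single total.

```python
# Waters bridging Ms6564 and DNA bases, per monomer (values read from the text;
# one water each for A and B and two distinct waters for D are assumptions).
base_bridging = {"A": 1, "B": 1, "C": 3, "D": 2}
backbone_bridging = 4  # waters mediating protein-DNA backbone hydrogen bonds (total only)

total_base = sum(base_bridging.values())
total = total_base + backbone_bridging
print(total_base, total)  # 7 11
```

Under these assumptions the per-monomer counts add up to the seven base-bridging and eleven total waters reported.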
Only Two DBD Residues Directly Interact with Bases of the Cognate DNA-A total of 10 bases and 39 phosphates of the operator DNA make direct contact with the two Ms6564 dimers (Fig. 5A). Six residues of the short recognition helix α3 (residues 47-53), which resembles that of QacR (19), participate in DNA recognition. Within the Ms6564 DBD, two residues, Lys-47 and Gln-48, directly interact with bases of the cognate DNA (Fig. 5B, left panel). Whereas Lys-47 interacts only with guanine, Gln-48 recognizes bases with lower specificity and can interact with cytosine (Fig. 5B, middle panel) or adenine (Fig. 5B, right panel). The Nζ atom of Lys-47 forms hydrogen bonds with both the O6 and N7 atoms of guanine 13. In contrast to these two hydrogen bonds, Gln-48 forms only one hydrogen bond with its target base: between OE1 of Gln-48 and N4 of cytosine 11 in monomers A, B, and C, or between NE2 of Gln-48 and N7 of adenine 16 in monomer D (Fig. 5B, middle and right panels). Interestingly, the positively charged Lys-47 and the negatively charged Glu-37 interact with each other when Ms6564 binds DNA (Fig. 5B, left panel).
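Contacts such as the bifurcated Lys-47 Nζ (PDB atom name NZ) interaction with guanine O6 and N7 are typically identified by a donor-acceptor distance criterion. The sketch below lists close polar contacts using a distance-only cutoff of 3.5 Å (a common but here assumed threshold; hydrogen-bond angles and donor/acceptor roles are ignored) and flags protein atoms with two or more partners. The function names are hypothetical, and the coordinates are placeholders chosen for illustration, not values from the Ms6564-DNA model.

```python
import math


def close_polar_contacts(protein_atoms, dna_atoms, cutoff=3.5):
    """List (protein_atom, dna_atom, distance) pairs within `cutoff` Å (distance only)."""
    contacts = []
    for p_name, p_xyz in protein_atoms.items():
        for d_name, d_xyz in dna_atoms.items():
            dist = math.dist(p_xyz, d_xyz)
            if dist <= cutoff:
                contacts.append((p_name, d_name, round(dist, 2)))
    return contacts


def bifurcated(contacts):
    """Protein atoms contacting two or more DNA atoms (candidate bifurcated hydrogen bonds)."""
    partners = {}
    for p_name, d_name, _ in contacts:
        partners.setdefault(p_name, []).append(d_name)
    return {p: ds for p, ds in partners.items() if len(ds) >= 2}


# Hypothetical coordinates in Å; NOT taken from the deposited structure.
protein = {"Lys47 NZ": (0.0, 0.0, 0.0), "Gln48 OE1": (8.0, 0.0, 0.0)}
dna = {"G13 O6": (2.9, 0.0, 0.0), "G13 N7": (0.0, 3.0, 0.0), "C11 N4": (8.0, 3.0, 0.0)}

hits = close_polar_contacts(protein, dna)
print(hits)              # [('Lys47 NZ', 'G13 O6', 2.9), ('Lys47 NZ', 'G13 N7', 3.0), ('Gln48 OE1', 'C11 N4', 3.0)]
print(bifurcated(hits))  # {'Lys47 NZ': ['G13 O6', 'G13 N7']}
```

In this toy layout the lysine atom makes two contacts and the glutamine atom one, mirroring the two-versus-one hydrogen-bond pattern described above.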
Structural information suggests that the base pairs contacted by these two residues may be important for regulator-DNA interaction. To test this idea, we designed several DNA substrates in which the base pairs contacting Lys-47, Gln-48, or both were mutated (Fig. 6A). When all nucleotides contacting both residues were mutated from guanine or cytosine to adenine, Ms6564 lost specific DNA binding activity, whereas it could still bind the other mutant substrates. Notably, Ms6564 still bound substrate S2 (Fig. 6, A and B, lanes 5-8), in which a C that had been omitted from the previously identified Ms6564 DNA-binding motif (15) was mutated. This result indicates that this C is not essential, consistent with the previous motif analysis.
Lys-47 Is a Critical Residue for Specific DNA Binding Activity of Ms6564-The structural data imply that two DBD residues, Lys-47 and Gln-48, play important roles in regulator-DNA interactions. In particular, Lys-47 specifically interacts with guanine through two hydrogen bonds, suggesting that Lys-47 is the primary determinant of DNA binding specificity. Mutagenesis and EMSA experiments supported this hypothesis. Lys-47 and Gln-48 are located on a positively charged electrostatic surface of Ms6564 (Fig. 7A). When increasing amounts (1.4-7 µM) of protein were co-incubated with DNA, no obvious shifted bands were observed for the K47A or K47A/Q48A mutant proteins (Fig. 7B, lanes 1-6), indicating that Lys-47 is essential for the specific DNA binding activity of Ms6564. In contrast, clear shifted bands were still observed for the Q48A mutant protein (Fig. 7B, lanes 3-9). Thus, Lys-47, but not Gln-48, is essential for the specific interaction between Ms6564 and its cognate DNA. Interestingly, sequence alignment shows that Lys-47 is highly conserved among Ms6564, QacR, and CgmR (Fig. 7C). Taken together, these results identify Lys-47 as a critical residue for specific binding of Ms6564 to its cognate operator DNA.

DISCUSSION
In recent years, several master regulators that control the expression of many genes have been characterized. However, the structural basis for such broad transcriptional regulation remains unclear. In this study, we determined the crystal structures of the TFR master regulator Ms6564 and of the Ms6564-operator complex. Although Ms6564 is generally similar to typical TFR proteins, it contacts DNA more loosely, and many disordered water molecules participate in the Ms6564-DNA interface. To our knowledge, this is the first case in which water molecules have been found to participate in the interaction between a TFR regulator and DNA. In addition, only two residues in the Ms6564 DBD, Lys-47 and Gln-48, directly interact with DNA bases. These findings enhance our understanding of the mechanisms of protein-DNA interaction and transcriptional regulation.
The overall crystal structure of the Ms6564-DNA complex is generally similar to those of typical TFRs. For example, it contains four similar monomers that form two dimers, and the operator DNA is recognized by two dimers bound on opposite sides of the DNA. However, there are distinct differences from other TFRs. First, in the Ms6564-DNA complex, the symmetry axes of the two dimers lie in the same plane and are antiparallel to each other. This differs strikingly from QacR, whose two dimers do not lie in the same plane but form a triangular cavity between them, with an angle of nearly 130° between the two dimers (Fig. 2B) (19). This DNA binding mode might increase the flexibility and selectivity with which Ms6564 interacts with its target operator DNA and thereby make it easier for the master regulator to exert its extensive regulatory function. Second, the proximal recognition helices (from monomers C and D) of Ms6564 are separated by only 4 bp, compared with 10 bp in QacR. The center-to-center distance between the recognition helices of the two monomers in an Ms6564 dimer is 35.1 Å (measured between the amide nitrogens of Gln-48), whereas the corresponding distance in QacR is 37 Å. This wider spacing implies that QacR induces a larger DNA deformation, which is consistent with our observation that the DNA retains a B-form conformation despite being deformed in the Ms6564-DNA complex. Third, a key distinction between Ms6564 and many TFRs (such as TetR, QacR, and CgmR), and indeed many DNA-binding proteins, is the position of the recognition helix. Instead of sinking deeply into the DNA major groove (17,19,20), the recognition helix of Ms6564 inserts only slightly into it; in the QacR-DNA complex, for example, the major groove is widened significantly throughout the entire binding site (19).
Fourth, the hydrophobic residues Ile-50 and Trp-53 are far from the interface, and no hydrophobic interactions are observed between the recognition helix and the DNA major groove in the Ms6564-DNA complex. In contrast, strong hydrophobic interactions exist between the recognition helix and DNA in the TetR- and CgmR-DNA complexes (17,20), which might exclude water molecules from the protein-DNA interface. Taken together, these findings suggest that Ms6564 makes flexible contacts with DNA and may therefore slide along DNA more easily than other TFRs.
Water molecules have been reported to play important roles in protein-DNA interactions (1,27,28). For example, in the trp repressor-operator complex, three highly ordered water molecules mediate interactions between the base pairs of the half-operators and the half-repressors (28). However, water molecules have not been reported to mediate interactions between TFR proteins and DNA (17,19,20,29). In the current study, a substantial number of water molecules unexpectedly participated in the Ms6564-DNA interface. Moreover, in contrast to the well ordered water molecules at the trp repressor-DNA interface, these 11 water molecules are disordered in the crystal structure. These water molecules likely contribute to the DNA binding affinity of Ms6564. This finding is consistent with the observation that Ms6564 inserts only slightly into the DNA and that its recognition helix makes flexible contacts with the major groove (Fig. 2C, upper panel), leaving space for water molecules to be incorporated into the Ms6564-DNA interface. In contrast, in other typical TFR-DNA complexes such as the QacR-DNA complex, the regulator binds tightly to the DNA and the recognition helices sink deeply into the major groove floor (Fig. 2C, lower panel), leaving no space for water molecules at the protein-DNA interface. Our study therefore reveals a new structural model in which disordered water molecules participate in the interaction between TFRs and DNA. This finding enhances our understanding of regulator-DNA recognition by the TetR family of transcription factors.
Another notable observation is that only two residues in the Ms6564 DBD interact directly with DNA bases. In contrast, both QacR and CgmR have four residues that engage in direct DNA binding (19,20), three residues are responsible for DNA binding in the E. coli TetR-DNA complex (17), and the transcriptional repressor MogR has seven residues that interact directly with DNA bases (30). In Ms6564, Lys-47 and Gln-48 directly interact with bases of the cognate DNA. Lys-47 specifically recognizes guanine through bifurcated hydrogen bonds in each monomer, whereas Gln-48 recognizes bases with lower specificity through only a single hydrogen bond. Previous studies indicate that a single hydrogen bond usually does not confer base specificity (1,31). Consistent with this interpretation, our mutagenesis experiments showed that Lys-47, but not Gln-48, is essential for the DNA binding specificity of Ms6564. M. smegmatis is a fast-growing, nonpathogenic mycobacterium whose genome has a high GC content of nearly 65%. The GC-rich genome thus provides large numbers of potential guanine bases that can be recognized by Lys-47. This could be one mechanism by which Ms6564 regulates the expression of many target genes and functions as a master regulator in M. smegmatis.
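The GC-content argument can be made concrete with back-of-the-envelope arithmetic. The sketch below is an illustration, not an analysis performed in this study: it assumes independent positions and equal G and C (and A and T) frequencies within a strand, both approximations for real genomes, and the helper names are hypothetical. Under these assumptions, a 65% GC content raises the per-position probability of guanine on a given strand from 0.25 to 0.325 (1.3-fold), and the probability of a specific two-base G/C motif from 0.0625 to about 0.106 (about 1.7-fold).

```python
def base_probability(base: str, gc_content: float) -> float:
    """Per-position probability of `base` on one strand.

    Assumes G = C and A = T within a strand and independent positions.
    """
    if base in "GC":
        return gc_content / 2
    if base in "AT":
        return (1 - gc_content) / 2
    raise ValueError(f"unexpected base: {base!r}")


def motif_probability(motif: str, gc_content: float) -> float:
    """Probability that a random site on one strand matches `motif` exactly."""
    p = 1.0
    for base in motif:
        p *= base_probability(base, gc_content)
    return p


# Compare a 50% GC genome with one of ~65% GC (the value quoted for M. smegmatis).
for gc in (0.50, 0.65):
    print(gc, motif_probability("G", gc), round(motif_probability("GC", gc), 4))
```

Longer G/C-rich motifs amplify the effect multiplicatively, which is the intuition behind the suggestion that a guanine-reading residue finds many potential sites in a GC-rich genome.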
In summary, we report the crystal structure of the Ms6564-DNA complex and show that the conformations of both the regulator and the DNA change upon binding. Compared with other TFR proteins, the recognition helix of Ms6564 inserts only slightly into the DNA major groove, and numerous disordered water molecules unexpectedly bridge the protein-DNA interface. Furthermore, the symmetry axes of the two Ms6564 dimers lie in the same plane, and the DNA retains a B-form conformation in the complex. These data imply that Ms6564 binds DNA with strong affinity but makes flexible contacts with it. This binding mode may allow the regulator to slide more easily along genomic DNA and extensively regulate the expression of diverse genes in M. smegmatis.