Plasticity in Repressor-DNA Interactions Neutralizes Loss of Symmetry in Bipartite Operators*

Transcription factor-DNA interactions are central to gene regulation. Many transcription factors regulate multiple target genes and can bind sequences that do not conform strictly to the consensus. To understand the structural mechanism used by transcription regulators to bind diverse target sequences, we have employed the repressor AraR from Bacillus subtilis as a model system. AraR is known to bind eight different operator sites in the bacterial genome. Although four of these operators, ORE1, ORX1, ORA1, and ORR3, differ in sequence, both the AraR DNA-binding domain (AraR-DBD) and full-length AraR unexpectedly bind each of these sequences with similar affinities, as measured by fluorescence anisotropy. We have determined crystal structures of AraR-DBD in complex with two different natural operators, ORE1 and ORX1, at 2.07 and 1.97 Å resolution, respectively. These structures were compared with the previously reported structures of AraR-DBD bound to two other natural operators (ORA1 and ORR3). The two AraR-DBD molecules interact identically with the symmetric operator ORE1, but their interaction with the non-symmetric operator ORX1 breaks the symmetry of the protein-DNA interactions. The novel interactions observed are accompanied by a local conformational change in the DNA. ChIP-sequencing (ChIP-Seq) data on other transcription factors have shown that they can bind diverse targets; hence, the plasticity exhibited by AraR may be a general phenomenon. The ability of transcription factors to form alternate interactions may be important for their recruitment to new functions and for the evolution of novel regulatory circuits.

The binding of proteins to cognate DNA sequences with high specificity is critical for a number of cellular processes. For transcription factors in particular, the ability to detect and bind target sequences at different locations across the genome is crucial, and it enables transcriptional regulatory pathways to function with remarkable precision and sensitivity. The specificity of a transcription factor for its cognate DNA sequence is largely achieved through non-covalent interactions between amino acid side chains and the functional groups of the DNA bases in the major and minor grooves (1). In addition, the shape of the DNA plays a significant role in specific protein-DNA recognition (2)(3)(4)(5). The availability of genome-wide transcription factor binding data has led to the identification of new targets for transcription factors. A single transcription factor has been observed to bind several different sites throughout the genome, and the target sites recognized by the same transcription factor deviate in sequence. Examples include the glp repressor of Escherichia coli and NdgR from Streptomyces coelicolor, which recognize 13 and 19 different operator sequences, respectively (6,7). These sequence differences suggest that a single transcription factor may vary its mode of recognition across its target sequences. A systematic structural study of how a transcription regulator binds its naturally occurring target sequences can clarify the diverse structural strategies it uses for precise recognition.
AraR is the key regulatory protein of L-arabinose metabolism in Bacillus subtilis. AraR binds to eight different operator sequences (8-10) that govern five different promoters. AraR is composed of two independent domains that have different functions and belong to different protein families (11,12). The smaller N-terminal domain (AraR-DBD) can bind DNA even in the absence of the C-terminal domain. AraR-DBD comprises a winged helix-turn-helix motif and belongs to the GntR family of regulators (13)(14)(15). The larger C-terminal domain binds L-arabinose and belongs to the LacI/GalR family (16,17). AraR operators are palindromic DNA sequences, and the consensus operator is 16 bp long (9). To understand how AraR recognizes diverse operators, we selected the ORA1, ORR3, ORE1, and ORX1 operators for structural and biochemical analysis.
DNA footprinting experiments have indicated that AraR may bind these different operators with varying affinities (9). However, fluorescence anisotropy experiments unexpectedly showed that full-length AraR and AraR-DBD each bind the various operators with similar affinities. We previously determined the x-ray crystal structures of the N-terminal domain of AraR bound to two natural operators, ORA1 and ORR3 (13). These structures revealed that two AraR-DBD monomers bind each bipartite operator and that each monomer recognizes the DNA through the core recognition motif TXG (where X is any base) present in one half-site of the operator. In the present study, we report the crystal structures of AraR-DBD in complex with two other natural operators, ORE1 and ORX1. Although the overall position of the protein on the DNA is similar in the two cases, the hydrogen-bonding schemes of the two complexes differ significantly. A detailed analysis of the AraR-DNA interactions shows that loss of symmetry in the operator sequence is compensated by novel interactions between repressor and DNA. The structural analysis reveals that sequence differences lead to distinct local conformations in the DNA that the transcription factor can exploit to achieve novel stabilizing interactions. Overall, the observed plasticity in interactions may be a general phenomenon that allows transcription factors to bind disparate DNA sequences without compromising specificity.
* This work was supported by an Innovative Young Biotechnologist Award (IYBA) grant from the Department of Biotechnology (to D. J.) and intramural funds available from the Regional Centre for Biotechnology (to D. J.) and the National Centre for Biological Sciences-Tata Institute of Fundamental Research (to D. T. N.). The authors declare that they have no conflicts of interest with the contents of this article.

Materials and Methods
Fluorescence Anisotropy-Fluorescence anisotropy was measured for both full-length AraR and AraR-DBD using four different operators: ORA1, ORE1, ORX1, and ORR3 (see Fig. 1). The 3′ end of one of the oligonucleotides in each duplex was labeled with 6-fluorescein amidite, procured from the Keck Biotechnology Resource Laboratory (Yale University). The protein at varying concentrations (0-500 nM for AraR-DBD and 0-100 nM for full-length AraR) was incubated with 1 nM of the respective DNA in a buffer containing 25 mM Tris, pH 8.0, 5% glycerol, 1 mM DTT, and 150 mM NaCl for 45 min at room temperature. All reactions were carried out in triplicate in 96-well black-bottom Costar plates. Fluorescence anisotropy was measured using a SpectraMax M5 microplate reader (Molecular Devices) with excitation and emission wavelengths of 492 nm and 517 nm, respectively. SpectraMax software calculates anisotropy values from the parallel (I∥) and perpendicular (I⊥) emission intensities.
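The anisotropy calculation itself is done internally by the SpectraMax software. For reference, the conventional steady-state definition is r = (I∥ - G·I⊥)/(I∥ + 2G·I⊥), where G is an instrument correction factor. The minimal sketch below implements that definition; the function name is hypothetical, and the default G = 1 is an assumption, since the text does not state how (or whether) a G factor was applied.

```python
def anisotropy(i_par: float, i_perp: float, g_factor: float = 1.0) -> float:
    """Steady-state anisotropy r = (I_par - G*I_perp) / (I_par + 2*G*I_perp).

    g_factor is an instrument-specific correction; the default of 1.0 is an
    assumption for illustration, not a value reported for the SpectraMax M5.
    """
    total = i_par + 2.0 * g_factor * i_perp  # total emission intensity
    if total == 0:
        raise ValueError("total emission intensity is zero")
    return (i_par - g_factor * i_perp) / total


# Hypothetical intensities (arbitrary units), not data from this study.
print(anisotropy(1200.0, 1000.0))  # prints 0.0625
```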
Three independent measurements were recorded for each protein concentration, and average anisotropy values were calculated. Reduced anisotropy values were obtained by subtracting the average anisotropy of the reaction without protein from the average anisotropy at each protein concentration. The reduced anisotropy values were then divided by the maximum reduced anisotropy to obtain the fraction bound, which was plotted (y axis) against protein concentration (x axis). The fraction-bound data were fitted with the Hill equation to calculate Kd values. This experiment was repeated with all four operators.
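The normalization and fitting steps described above can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions: the Hill form is taken as fraction bound = [P]^n/(Kd^n + [P]^n), and a coarse-to-fine grid search stands in for the nonlinear least-squares routine actually used, which is not specified. All function names and search ranges are hypothetical.

```python
import math
from statistics import mean


def fraction_bound(readings):
    """Convert replicate anisotropy readings into fraction bound.

    `readings` maps protein concentration (nM) to a list of replicate
    anisotropy values; the 0 nM (DNA-only) entry is the baseline.
    """
    avg = {conc: mean(vals) for conc, vals in readings.items()}
    baseline = avg[0.0]
    # Reduced anisotropy: subtract the protein-free average.
    reduced = {conc: a - baseline for conc, a in avg.items()}
    # Normalize by the largest reduced anisotropy (treated as fully bound).
    top = max(reduced.values())
    return {conc: reduced[conc] / top for conc in sorted(reduced)}


def hill(conc, kd, n):
    """Hill equation: fraction bound = conc^n / (Kd^n + conc^n)."""
    return conc ** n / (kd ** n + conc ** n)


def fit_hill(fb, kd_range=(1.0, 1000.0), n_range=(0.5, 3.0), rounds=3):
    """Least-squares estimate of (Kd, n) by coarse-to-fine grid search."""

    def sse(kd, n):
        return sum((f - hill(c, kd, n)) ** 2 for c, f in fb.items())

    log_lo, log_hi = math.log(kd_range[0]), math.log(kd_range[1])
    n_lo, n_hi = n_range
    best = None  # (sum of squared errors, kd, n)
    for _ in range(rounds):
        for i in range(201):  # Kd sampled evenly on a log scale
            kd = math.exp(log_lo + (log_hi - log_lo) * i / 200)
            for j in range(41):  # Hill coefficient sampled linearly
                n = n_lo + (n_hi - n_lo) * j / 40
                err = sse(kd, n)
                if best is None or err < best[0]:
                    best = (err, kd, n)
        # Shrink both search windows tenfold around the current best fit.
        _, kd_best, n_best = best
        log_half = (log_hi - log_lo) / 20
        n_half = (n_hi - n_lo) / 20
        log_lo, log_hi = math.log(kd_best) - log_half, math.log(kd_best) + log_half
        n_lo, n_hi = max(0.05, n_best - n_half), n_best + n_half
    return best[1], best[2]


# Synthetic, noise-free check (not data from this study): Kd = 75 nM, n = 1.
synthetic = {c: hill(c, 75.0, 1.0) for c in (0, 5, 10, 25, 50, 75, 100, 200, 400)}
kd, n = fit_hill(synthetic)
print(f"Kd = {kd:.1f} nM, n = {n:.2f}")
```

The grid search is used only to keep the sketch free of third-party dependencies; in practice, a general-purpose fitter such as SciPy's `scipy.optimize.curve_fit` would typically be applied to the same `hill` model.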
Nucleic Acid Preparation-Four oligonucleotides corresponding to the complementary strands of ORE1 and ORX1 (with a T/A overhang) were purchased from Sigma and dissolved in appropriate volumes of autoclaved, filtered (0.22-µm filter) water to a final concentration of 4 mM. Equimolar amounts of complementary oligonucleotides were annealed by heating to 90°C for 5 min followed by cooling to 25°C; the final concentration of the duplex DNA was 2 mM.
Crystallization-Crystallization conditions and the cryoprotection strategy for the AraR-DBD ORE1 complex were identical to those for AraR-DBD ORX1. Co-crystals of the complexes were obtained by vapor diffusion after mixing the duplex DNA and AraR-DBD-(1-68) in the ratio 1.2:1, with a final DNA concentration of 0.67 mM. The mixture was incubated on ice for 20 min. The complexes crystallized in 20% PEG 8000 buffered with 0.1 M sodium acetate (pH 4.5) containing 200 mM KCl. Crystals were cryoprotected by serial transfer through increasing glycerol concentrations, from 5 to 25% in 5% steps, and were flash-frozen in liquid nitrogen.
Structure Determination-The native data sets for AraR-DBD ORE1 and AraR-DBD ORX1 were collected at beamline BM14 of the European Synchrotron Radiation Facility (see Table 1). The data were processed using HKL2000, followed by scaling and merging in SCALEPACK (18). Both complex structures were solved by molecular replacement in PHASER (19), using one molecule of AraR-DBD and the double-stranded DNA from the previously solved AraR-DBD ORR3 structure (PDB 4H0E) as the search model. The second AraR-DBD molecule was docked into the electron density, and the map was improved through iterative cycles of rigid-body refinement in CNS. The ORR3 DNA sequence was modified in Coot (20) to match the ORE1 and ORX1 sequences. This was followed by cycles of model building (Coot), refinement (CNS and PHENIX) (21,22), and water picking (PHENIX), with constant monitoring of geometric parameters. Final rounds of refinement were performed in PHENIX incorporating TLS restraints (23). Rfree/Rwork converged to final values of 24.7%/19.7% for AraR-DBD ORE1 and 24.1%/19.5% for AraR-DBD ORX1 (see Table 1). For the final refined models of both complexes, MOLPROBITY (24) showed that 98% of the DBD residues lie in the allowed regions of the Ramachandran plot. The structures were analyzed using CONTACT (CCP4 suite (25)), and figures were generated using PyMOL (Schrödinger Inc.).
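For reference, the Rwork and Rfree values quoted above use the standard crystallographic R factor, R = Σ||Fobs| - |Fcalc|| / Σ|Fobs|, evaluated over the working reflections and over a small held-out test set (commonly 5-10% of reflections), respectively. The sketch below assumes Fcalc has already been scaled to Fobs (refinement programs do this internally); the function names and amplitudes are hypothetical toy values, not data from this study.

```python
def r_factor(f_obs, f_calc):
    """Crystallographic R = sum(||Fobs| - |Fcalc||) / sum(|Fobs|).

    Assumes f_calc is already on the same scale as f_obs.
    """
    num = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    den = sum(abs(o) for o in f_obs)
    return num / den


def r_work_free(f_obs, f_calc, is_free):
    """Split reflections by their test-set flag and return (R_work, R_free)."""
    work = [(o, c) for o, c, free in zip(f_obs, f_calc, is_free) if not free]
    test = [(o, c) for o, c, free in zip(f_obs, f_calc, is_free) if free]
    r_work = r_factor(*zip(*work))  # unzip pairs into (f_obs, f_calc) tuples
    r_free = r_factor(*zip(*test))
    return r_work, r_free


# Toy amplitudes; the last reflection is flagged as part of the test set.
r_work, r_free = r_work_free([10.0, 20.0, 30.0, 40.0], [9.0, 22.0, 30.0, 36.0],
                             [False, False, False, True])
print(f"R_work = {r_work:.2f}, R_free = {r_free:.2f}")  # R_work = 0.05, R_free = 0.10
```

Because the test-set reflections never enter refinement, Rfree typically stays a few percentage points above Rwork, as in both complexes reported here.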

Results
AraR Binds Diverse Sequence Operators with Similar Affinities-The sequences of the operators used in this study are shown in Fig. 1A. The oligonucleotides are numbered according to the 16-bp consensus sequence (13). The operator sequences have dyad symmetry: each half-site has the consensus sequence ATTTGTAC and contains one recognition motif, TXG. ORX1 is asymmetric because it lacks this motif in the first half-site. The two recognition motifs are separated by 6 bp in ORA1, ORE1, and ORX1 and by 8 bp in ORR3. The operators are conserved at positions 1, 6, 8-11, 15, and 16 (Fig. 1A).
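The motif bookkeeping described above (TXG on either strand, reported in the 16-bp numbering, and the number of base pairs separating the two motifs) can be sketched as follows. The example sequence is assembled here from the stated half-site consensus ATTTGTAC under the assumption of a perfect palindrome; it is not one of the natural operators in Fig. 1A. The variant carrying ACA at positions 3-5 is a hypothetical illustration of an ORX1-like first half-site, not the real ORX1 sequence, and all helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_txg(seq):
    """Return TXG hits as 1-based (start, end, strand) in top-strand numbering."""
    n = len(seq)
    hits = []
    for i in range(n - 2):  # top strand, read 5'->3'
        if seq[i] == "T" and seq[i + 2] == "G":
            hits.append((i + 1, i + 3, "+"))
    rc = reverse_complement(seq)
    for k in range(n - 2):  # bottom strand, read 5'->3'
        if rc[k] == "T" and rc[k + 2] == "G":
            # Bottom-strand index k pairs with top-strand index n - 1 - k,
            # so rc[k..k+2] covers 1-based top positions n-k-2 .. n-k.
            hits.append((n - k - 2, n - k, "-"))
    return sorted(hits)


def motif_spacing(hits):
    """Base pairs lying strictly between the first two motifs."""
    (_, end1, _), (start2, _, _) = hits[0], hits[1]
    return start2 - end1 - 1


# Assumed consensus: half-site ATTTGTAC followed by its reverse complement.
consensus = "ATTTGTAC" + reverse_complement("ATTTGTAC")
print(consensus, find_txg(consensus), motif_spacing(find_txg(consensus)))

# Hypothetical ORX1-like variant: ACA replaces TTG at positions 3-5.
variant = consensus[:2] + "ACA" + consensus[5:]
print(variant, find_txg(variant))
```

For the assumed consensus this places Thy3 and Gua5 in the first motif, a second motif on the opposite strand at positions 12-14, and 6 bp between them, matching the numbering used in the text; the variant retains only the second-half-site motif.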
We analyzed the increase in fluorescence anisotropy on titrating the labeled oligonucleotides with either full-length AraR or AraR-DBD (26). The Kd of full-length AraR for the various operators is in the range 69-80 nM (Fig. 1B), whereas AraR-DBD binds the operators with lower affinity (Kd = 247-277 nM). Based on the composition of these sequences and on data available in the literature, the affinities for these sequences were expected to differ (9): earlier DNA footprinting experiments yielded apparent affinity constants (Kapp) for AraR ranging from 40 nM to >250 nM (9). Given that the sequence of ORX1 differs from the other operators in one half-site, it was surprising that AraR-DBD bound it with an affinity similar to that for the other operators. This observation suggests that a structural strategy may exist to neutralize the effect of sequence divergence on the thermodynamics of AraR-DBD association.
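One way to put "similar affinities" on a quantitative footing is to convert Kd ratios into binding free-energy differences, ΔΔG = RT ln(Kd,weaker/Kd,stronger). The sketch below uses only the Kd ranges reported above and assumes 298.15 K for "room temperature", since the exact temperature is not stated; the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ddg_kcal(kd_weaker_nm, kd_stronger_nm, temp_k=298.15):
    """Binding free-energy difference (kcal/mol) implied by a ratio of Kd values."""
    return R_KCAL * temp_k * math.log(kd_weaker_nm / kd_stronger_nm)


# Spread across operators, using the ends of each reported Kd range.
print(f"full-length, 69 vs 80 nM: {ddg_kcal(80, 69):.2f} kcal/mol")
print(f"AraR-DBD, 247 vs 277 nM: {ddg_kcal(277, 247):.2f} kcal/mol")
# Gap between AraR-DBD and full-length AraR, pairing the range extremes.
print(f"smallest gap: {ddg_kcal(247, 80):.2f} kcal/mol")
print(f"largest gap: {ddg_kcal(277, 69):.2f} kcal/mol")
```

Under these assumptions, the operator-to-operator spread is below 0.1 kcal/mol for both proteins (about 0.09 and 0.07 kcal/mol), whereas the difference between AraR-DBD and full-length AraR lies between about 0.67 and 0.82 kcal/mol, depending on which ends of the ranges are paired.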
Overall Structure of AraR-DBD ORE1 and AraR-DBD ORX1-AraR-DBD-(1-68) was co-crystallized in complex with 21-bp oligonucleotides containing the natural operators ORE1 (2.07 Å) and ORX1 (1.97 Å), and the structures were determined by molecular replacement (Table 1). The DNA duplexes used for crystallization have a T/A overhang and pack head-to-tail in the crystal. The asymmetric unit contains two molecules of AraR-DBD (monomers A and B) bound to one DNA duplex. Each AraR-DBD monomer recognizes one half-site of the operator without forming a dimer (Fig. 2, A and B). The DNA-binding domain of AraR adopts a winged helix-turn-helix architecture with two distinct DNA-binding elements. The first is the helix-turn-helix motif formed by helices 2 and 3, which interacts with DNA bases in the major groove through residues at the tip of the recognition helix (helix 3). The second is the wing, formed by residues in the loop connecting the two β-strands, which make base-specific interactions in the adjacent minor groove (Fig. 2A). In addition, there are several sequence-independent contacts between protein backbone atoms and/or side chains and the sugar-phosphate backbone of the DNA (Fig. 3).
The AraR-DBD ORE1 and AraR-DBD ORX1 structures can be superimposed on the AraR-DBD ORA1 complex (13).
AraR-DBD ORE1 Complex-ORE1 is a symmetric operator, and the recognition motif (TXG) is present in both half-sites (Fig. 1A). As a result, AraR-DBD monomers A and B form similar interactions with ORE1. The complete set of intermolecular contacts in the AraR-DBD ORE1 complex is shown in Fig. 3A; only the interactions of monomer A with the DNA are described below. The side chains of Arg41:A and Gln61:A form base-specific interactions with Gua5 in the major groove and with Ade3′ in the minor groove, respectively; the guanidinium group of Arg41:A hydrogen-bonds with the O6 and N7 atoms of Gua5 (Figs. 3A and 4A). Arg41:A also forms a water-mediated hydrogen bond with Ade6′, and His42:A forms water-mediated hydrogen bonds with Gua8′ and Ade10′. Interestingly, in the AraR-DBD ORE1 complex, His42:B (monomer B) adopts two different conformations. In one of them, it approaches the DNA and forms a base-specific interaction with Gua9 and a water-mediated interaction with Thy10 (Fig. 3A). Gly62:A forms water-mediated hydrogen bonds with Thy2 and Thy1′. Similar water-mediated interactions are seen for the monomer B residues Gly62:B (Thy16, Thy15′) and Arg41:B (Ade11). In addition, the protein makes extensive contacts with the sugar-phosphate backbone: Glu30, Gly62, Arg45, Tyr5, Lys4, Ser40, and Thr43 interact with the DNA backbone in both monomers (Fig. 4A).
AraR-DBD ORX1 Complex-In the natural operator ORX1, the first half-site carries ACA rather than the TXG recognition motif, so the operator sequence is asymmetric (Fig. 1). In the AraR-DBD ORX1 structure, several novel protein-DNA interactions allow specific recognition of the ACA sequence. Because Ade replaces Gua at position 5, the side chain of Arg41:A (monomer A) no longer interacts with the DNA; instead, it forms a hydrogen bond with Asn31:A (Fig. 4B). With the arginine side chain displaced, a water molecule occupies the position previously held by its nitrogen atom. This water molecule mediates a hydrogen bond between Glu30:A and the N7 atom of Ade5 (Figs. 3B and 4B). The importance of Glu30 for DNA binding was recently demonstrated in the context of the ORX1 promoter using reporter assays, in which its mutation to alanine completely abolished regulation of the wild-type abf2′-lacZ promoter fusion (27). By contrast, Glu30:B (monomer B) adopts the same conformation as in the AraR-DBD ORE1 complex, where it hydrogen-bonds with Arg41:B and Arg45:B and stabilizes their interaction with the DNA. In addition, it forms a water-mediated hydrogen bond with the DNA backbone (Fig. 3B).
The side chain of His42:A also undergoes a significant conformational change and hydrogen-bonds with the N7 and O6 atoms of Gua8′ (Fig. 4B). This novel interaction is made possible by a local conformational change in the DNA (Fig. 5B); it does not occur in the AraR-DBD ORE1 complex, where His42:A would clash sterically with Gua8′ (Fig. 5A). Mutation of Gua8′ to Thy was previously shown to abolish repression activity in vivo (28), and Gua8′ is conserved in all the operators except ORB1. Biochemical assays have shown that mutation of either Arg41 or His42 does not completely abolish the regulatory activity of AraR in vivo, indicating that these two residues may compensate for each other (28). The minor-groove interactions of monomer A are also altered: Gln61:A forms base-specific interactions with both Thy3′ and Gua4′ (Fig. 4B). Among the other DNA-contacting residues, Gly62:A forms water-mediated interactions with Thy1′ and Thy2, identical to those in the AraR-DBD ORE1 structure (Fig. 3B). Remarkably, monomer B retains all of its interactions with the second half-site of ORX1, which conforms to the consensus TXG motif (Fig. 3, A and B). Thus, in the AraR-DBD ORX1 structure, the two AraR-DBD monomers interact asymmetrically with the operator.

Comparison of AraR-DBD ORE1 and AraR-DBD ORX1 Structures with Other AraR-DBD-Operator Complexes-Insertion of the recognition helix of the helix-turn-helix motif into the DNA major groove allows base-specific interactions with protein side chains. AraR-DBD forms very similar interactions with the DNA in the AraR-DBD ORE1, AraR-DBD ORR3, and AraR-DBD ORA1 complexes (Fig. 4, A-D). In all three structures, the side chain of Arg41 forms specific interactions with Gua5 of the TXG motif, and Gln61 forms specific interactions with Ade3′, which is base-paired with the Thy of the TXG motif. Also, in all three complexes, monomers A and B interact symmetrically with the DNA. ORX1, however, carries the altered recognition motif ACA in its first half-site. As a result, in this complex AraR-DBD monomer A makes novel contacts with the DNA (Fig. 4B) that are facilitated by a local conformational change in the DNA. Consequently, AraR-DBD interacts asymmetrically with the two half-sites of ORX1.
Analysis of DNA Structure-The shape of DNA has been shown to contribute extensively to protein-DNA recognition (2)(3)(4)(5). Analysis of the DNA structure using Curves+ (29) reveals that the ORE1 operator is bent by 10° when bound to AraR-DBD, whereas the ORX1 operator is bent by 8.5°. The electrostatic potential surfaces of the two complexes are shown in Fig. 6, A and B. Although there is no significant change in charge on the interacting surface, the shape of the protein differs where it contacts the altered recognition motif in the ORX1 complex. The structure of the protein-bound DNA was also analyzed for all four repressor-operator complexes. The major and minor groove widths and the twist and roll parameters are plotted in Fig. 7, A-D. As the plots show, the minor groove is significantly narrower in the region of the two recognition motifs, where the protein contacts the DNA. In the AraR-DBD ORR3 complex, the two recognition motifs are separated by 8 bp rather than 6 bp as in the other three structures, and the constriction of the minor groove is correspondingly shifted by two bases. In addition, in all complexes except AraR-DBD ORR3, the DNA is slightly kinked at the central CG base pair, as reflected by a local decrease in minor groove width. Relative to standard B-DNA values, the major groove is widened by about 2 Å in the region of protein binding in all structures except AraR-DBD ORX1. These local changes in DNA conformation in the ORX1 structure facilitate the altered interactions observed in this complex.

Discussion
Previous data allowed us to define the specific recognition motif (TXG) present in the two half-sites of the operator sequences recognized by the DNA-binding domain of AraR (13). We have now shown that the two AraR molecules bound to the same operator can recognize substantially different DNA sequences in the two half-sites. This is achieved by one of the two monomers presenting a novel interface to the asymmetric operator ORX1, while the position of the protein on the DNA remains unchanged. Sequence comparison of the four operators (Fig. 1) shows that position 3 is occupied by either Thy or Ade; AraR-DBD uses the same residue, Gln61, to recognize both bases. Another deviation from the consensus in ORX1 is the presence of Ade instead of Gua at position 5. The structures show that this change breaks the symmetry of the interactions and that, in such cases, protein-DNA recognition is modulated by the local DNA structure and by bases outside the recognition motif.
Transcription factors often bind DNA sequences that are altered or relaxed relative to the consensus sequence, and such sites have been found at unexpected locations in the genome (30). Thus, transcription factors can bind a larger repertoire of sequences than consensus motif searches suggest. Indeed, ChIP-Seq and ChIP-on-chip data have identified previously unknown degenerate target sites for several transcription factors, such as CtrA, LexA, and FNR (31)(32)(33). Using modified chromatin immunoprecipitation, several intergenic sites bound in vivo by the Caulobacter transcription factor CtrA were identified (31). These binding sites did not correspond to the consensus sequence identified by programs such as MEME and BIOPROSPECTOR in searches of the upstream sequences of cell cycle regulons. DNA footprinting showed in vitro binding of CtrA to the fliX promoter, which matches only 4 bp of the minimal 8-bp consensus. Similarly, of the 63 target sites identified by ChIP-on-chip for FNR binding, 10 do not match the canonical binding sites (33). For AraC, a transcription factor that regulates arabinose metabolism in E. coli and Salmonella enterica, ChIP-on-chip combined with transcription profiling identified targets with no connection to arabinose metabolism (34).
It has been suggested that cooperative binding of multiple transcription factors to adjacent sites reduces the requirement for a consensus sequence (32). The present study shows that, in addition to cooperativity, transcription factors can tolerate deviations from the consensus site through energetically favorable alternate interactions. The altered specificity observed here can promote the rewiring of transcriptional regulatory circuits and facilitate the evolution of new ones.
Systematic studies of the structural basis of DNA recognition by a single transcription factor at multiple sites are available for only a few cases. Structures of catabolite control protein A (CcpA) and of the mammalian transcription factor NF-κB p65 (RelA), each bound to three different operator sequences, have been determined (35)(36)(37). Both proteins bind their cognate operators with similar affinities but use different structural strategies for the interactions. The CcpA-DNA structures show that the bound operators adopt different bend angles, facilitated by a flexible linker between the two domains of the protein. Structures of NF-κB p65 (RelA) bound to asymmetric and pseudosymmetric operators show significant differences in the hydrogen-bonding patterns at the protein-DNA interface (37). Similarly, structures of the DNA-binding domain of the phage 434 repressor bound to the OR1, OR2, and OR3 sites show extensive structural differences between the consensus and non-consensus half-sites; in the case of OR3, several amino acid side chains rearrange for DNA binding (38-40). In the case of the human estrogen receptor, recognition of non-consensus DNA was shown to occur through rearrangement of a lysine side chain and formation of an alternate base contact (41).
Taken together, these studies and the present study suggest at least three ways in which plasticity favors the binding of a transcription factor. (i) The protein binds the non-consensus half-site only through backbone interactions; its side chains make no base-specific interactions with the altered half-site. (ii) The protein binds the altered site using exactly the same residues that recognize the consensus half of the operator. (iii) The protein uses different sets of side chains to interact with the non-consensus and consensus half-sites. The appearance of new specificities is evidently facilitated by the flexibility of both the DNA backbone and the amino acid side chains. Because a transcription regulator can recognize different sequences using any of these three strategies, there is scope for the evolution of new regulatory circuits, as mutations in the protein can give rise to new specificities.