Mechanism of cognate sequence discrimination by the ETS-family transcription factor ETS-1

Functional evidence increasingly implicates low-affinity DNA recognition by transcription factors as a general mechanism for the spatiotemporal control of developmental genes. Although the DNA sequence requirements for affinity are well-defined, the dynamic mechanisms that execute cognate recognition are much less resolved. To address this gap, here we examined ETS1, a paradigm developmental transcription factor, as a model for which cognate discrimination remains enigmatic. Using molecular dynamics simulations, we interrogated the DNA-binding domain of murine ETS1 alone and when bound to high- and low-affinity cognate sites or to nonspecific DNA. The results of our analyses revealed collective backbone and side-chain motions that distinguished cognate versus nonspecific as well as high- versus low-affinity cognate DNA binding. Combined with binding experiments with site-directed ETS1 mutants, the molecular dynamics data disclosed a triad of residues that respond specifically to low-affinity cognate DNA. We found that a DNA-contacting residue (Gln-336) specifically recognizes low-affinity DNA and triggers the loss of a distal salt bridge (Glu-343/Arg-378) via a large side-chain motion that compromises the hydrophobic packing of two core helices. As an intact Glu-343/Arg-378 bridge is the default state in unbound ETS1 and maintained in high-affinity and nonspecific complexes, the low-affinity complex represents a unique conformational adaptation to the suboptimization of developmental enhancers.

Developing organisms depend on the expression of essential genes in a spatially and temporally coordinated manner. Functional studies in model animals, including sea squirts (1,2), Drosophila spp. (3-5), Caenorhabditis elegans (6), sea stars and sea urchins (7), and the annelid Platynereis dumerilii (8), show that low-affinity binding by transcription factors is essential for developmental gene regulation. Low-affinity ("suboptimal") binding enables spatiotemporal control of genes by tuning the sensitivity of their enhancers to transcription factor levels and facilitating combinatorial control by multiple factors (9,10). Substituting the WT low-affinity sites with high-affinity sequences disrupts the level, location, timing, and specificity of developmentally sensitive genes with major dysmorphic outcomes. As the literature increasingly implicates low-affinity transcription factor-DNA complexes as key mediators of gene expression, mechanisms of sequence discrimination are re-emerging as deserving areas of detailed investigation.
To this end, the ETS family of transcription factors represents an excellent model system for tackling this line of inquiry. ETS proteins arose early in metazoan evolution (11,12) and are ubiquitous in animals (13,14). Despite significant divergence in both function and primary sequence, ETS proteins harbor an eponymous DNA-binding domain that is highly conserved. The ETS domain is a winged helix-loop-helix motif that recognizes a helical turn of DNA sequences harboring a core 5′-GGA(A/T)-3′ consensus. Beyond this specific requirement, ETS domains recognize a wide palette of flanking-base variants (and their epigenetic modifications) with a dispersion of affinities (15-17). ETS complexes formed with consensus-bearing (cognate) sequences are experimentally distinguishable from nonspecific complexes, quantitatively in terms of affinity and qualitatively by distinctive DNase I hypersensitivity in ETS-bound DNA at the core consensus (18,19). Cognate discrimination is a biologically relevant function because the genomic distribution of ETS-bound DNA sequences in vivo corresponds closely to the relative affinities for their respective ETS domains (20). ETS members that are more stringent site discriminators in binding experiments are, in turn, more selective in their genomic motifs (21,22). Finally, low-affinity ETS binding was specifically required for proper activation of enhancers governing the patterning of the anterior neural plate in Ciona embryos (1).
Structurally, ETS domains are characterized by a segmented pattern of contacts with cognate DNA. Although contacts with the core motif are made with nucleobases, flanking contacts occur exclusively by interactions with backbone phosphate or deoxyribose oxygen atoms. Low-affinity binding therefore appears to arise from variation in the core-flanking contacts. Since this "indirect readout" mechanism of cognate discrimination was proposed (16), there has been limited progress in elucidating its molecular basis. In particular, little is known about how ETS domains dynamically parse cognate targets. NMR studies have established that the ETS domains of ETS1 (23) and ETV6 (24) utilize the same interface in both cognate and nonspecific DNA complexes. Nonetheless, the precise interactions that distinguish "low-affinity" and "nonspecific" complexes remain obscure. This is a critical distinction because nonspecific DNA dominates cognate sites in abundance but does not constitute bona fide binding sites for transcription factor function.
To address these questions, we selected ETS1, a paradigm ETS-family regulator of key developmental processes such as hematopoiesis (25) and cardiac neural crest development (26,27), as a model system. Heteronuclear NMR studies of ETS1 have shown that high-affinity and nonspecific DNA binding invoke large differences in ¹⁵N spin-lattice and spin-spin relaxation in the bond vectors as well as amide ¹⁵N NOE of the bound protein (23). As these parameters reflect conformational dynamics on the picosecond to nanosecond timescale, we hypothesize that molecular dynamics (MD) simulations should provide useful high-resolution information regarding the target-dependent differences in binding affinity. The current state of highly optimized biomolecular force fields, particularly for DNA (28), further supports unbiased MD as a viable approach. We therefore compared explicit solvent simulations of the ETS domain of murine ETS1 (residues 331-440) in the unbound state and in complex with high-affinity, low-affinity cognate (consensus-bearing), or nonspecific DNA. These simulations represent the longest-duration and broadest-scope all-atom studies of any ETS protein to date (29-33). The results are consistent with literature data and reveal novel mechanisms of sequence discrimination. Additional simulations and experiments with mutants prompted by observations on WT ETS1 identified specific electrostatic contacts distal from the protein-DNA interface as important determinants of high- versus low-affinity cognate binding. The results have implications for the evolution of low-affinity binding in developmental enhancers as well as autoinhibition, a general mechanism of allosteric regulation and protein-protein partnerships by transcription factors (34).

Results
To maximize the relevance of the simulations to experiments, a binary co-crystal structure (PDB code 1K79) that most closely corresponded to the C-terminal ETS domain (ΔN331) used in experimental studies (15,17,21,35-37) was used to template the simulational models. For the complexes, the crystallographic DNA sequence was replaced with DNA duplexes utilized in experimental determinations of the binding affinity under the same conditions (0.15 M NaCl at 25°C, Fig. 1). The high- and low-affinity cognate sequences (termed in the literature SC1 and SC12) were established substrates originally identified in SELEX (systematic evolution of ligands by exponential enrichment) screening against this same ETS1 construct (15). Binding experiments indicated that the high- and low-affinity cognate complexes of ΔN331 differed in stability by 13-fold (ΔΔG° = +6.40 kJ/mol) and that the nonspecific complex bound to a randomized sequence was an additional 28-fold weaker (ΔΔG° = +8.27 kJ/mol) than the low-affinity complex.
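The fold differences and free energy differences quoted above are related by ΔΔG° = RT ln(ratio of equilibrium constants). The following is a minimal Python sketch of that conversion, assuming T = 298.15 K (25°C); the function names are illustrative and are not taken from the study.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # molar gas constant, kJ/(mol*K)

def ddg_from_fold(fold_weaker, temperature_k=298.15):
    """Free energy penalty (kJ/mol) of a complex that is `fold_weaker`-fold less stable."""
    return R_KJ_PER_MOL_K * temperature_k * math.log(fold_weaker)

def fold_from_ddg(ddg_kj_per_mol, temperature_k=298.15):
    """Inverse conversion: fold difference in stability implied by a penalty in kJ/mol."""
    return math.exp(ddg_kj_per_mol / (R_KJ_PER_MOL_K * temperature_k))

# Values quoted in the text: +6.40 and +8.27 kJ/mol should map to
# roughly 13-fold and 28-fold differences, respectively.
print(fold_from_ddg(6.40))
print(fold_from_ddg(8.27))
```

Because free energies are additive, the nonspecific complex is weaker than the high-affinity complex by the sum of the two penalties, which corresponds to the product of the two fold differences.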
The established secondary structure assignment for ETS domains, a winged helix-loop-helix motif, is H1-S1-S2-H2-loop-H3-S3-wing-S4 (H, α-helix; S, β-strand). H3 is the essential recognition helix that inserts into the major groove of consensus DNA and is the most conserved element in the ETS domain. As in native ETS1, ΔN331 contained two short C-terminal helices, H4 and H5, that are not part of the canonical ETS topology (38). However, ΔN331 did not include additional N-terminal helices (termed HI-1 and HI-2) that are extrinsic to the winged helix-loop-helix motif and negatively regulate DNA binding. Each structure was run in triplicate independent simulations seeded with random initial velocities. Equilibration was judged by the root mean square deviation of the protein and typically occurred within 700 ns of production (Fig. S1). The final 300 ns of each run was taken for analysis, for a total analyzed time of 3.6 μs (four states × three replicates × 300 ns). We reserve the label "low-affinity" for suboptimal consensus-bearing sequences but not nonspecific DNA.

Validation of simulations against experiments
Our first objective was to establish whether the simulations generated reasonable models of ETS1 compared with experimental data. Relative to the co-crystallographic template, the bound proteins relaxed without major changes in secondary structure (high-affinity (HA), 1.9%; low-affinity (LA), 2.8%; nonspecific (NS), 0.9%; Fig. S2A). The unbound state showed larger differences (8.6%) relative to the closest solution NMR apo structure (ΔN301, PDB code 1R36) that harbors the additional N-terminal helices HI-1 and HI-2 (39). Differences were localized at H1, the unnamed single-turn helix between H1 and S1, S3, and H5, which were all close to the HI helices. Notably, ΔN331 from the co-crystal structure also showed a ~9% difference in α/β content relative to the NMR apo structure (Fig. S2B). In the absence of an experimental structure of unbound ΔN331, the secondary structure difference between the unbound models appeared to reflect conformational contributions from the additional residues in ΔN301.

Figure 1. The crystal structure of the minimal ETS domain of ETS1 in complex with DNA (PDB code 1K79) was used as the template for independent simulations in the free (apo) and various DNA-bound states. The simulational and experimental DNA consisted of the shown sequences inserted into a (5′-CGGCCAA . . . ATGGCG-3′) cassette to generate a 23-bp construct. The 5′-GGA(A/T)-3′ consensus is underlined. The standard free energies of complex formation at 25°C were determined from DNA binding experiments (mean ± S.E. of three or more replicates), as shown below the representative bound structures.

Suboptimization of ETS1 binding

¹⁵N NOE experiments showed that the backbone amides of ETS1 near and at the DNA interface became less dynamic on the picosecond to nanosecond timescale upon binding DNA (23). Comparison of root mean square fluctuations (RMSF) between the bound and unbound states showed reduced backbone dynamics in all simulational bound structures as well (Fig. 2A). The decreased dynamics mapped principally to the loop (between H2 and the recognition helix H3), H3, and the wing (between S3 and S4) (Fig. 2B). The nonspecific complex retained significantly greater fluctuations relative to specific complexes, also in agreement with the NOE data (23). With respect to the side chains, the difference in RMSF of nonhydrogen side-chain atoms between unbound and bound ETS1 indicated that most DNA-contacting residues were also stabilized when bound (Fig. 2C). Also in agreement with a reported comparison of bound and free NMR structures (37), several residues displayed increased dynamics relative to unbound ETS1. Thus, the general agreement between experiments and simulations by multiple structural and dynamic metrics supports the simulations as viable models.
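The RMSF comparison described above can be sketched as follows. This is a minimal NumPy illustration, assuming the trajectory frames have already been superposed on a common reference; it works per atom rather than per residue, and the synthetic "bound" and "unbound" arrays are placeholders, not simulation data from the study.

```python
import numpy as np

def rmsf_per_atom(coords):
    """RMSF of each atom from an array of shape (n_frames, n_atoms, 3).

    Assumes the frames were superposed beforehand, so that fluctuations
    reflect internal motion rather than overall tumbling.
    """
    mean_pos = coords.mean(axis=0)                    # average structure
    sq_dev = ((coords - mean_pos) ** 2).sum(axis=2)   # squared displacement per frame
    return np.sqrt(sq_dev.mean(axis=0))

def delta_rmsf(bound, unbound):
    """Difference (bound - unbound); negative values indicate stabilization on binding."""
    return rmsf_per_atom(bound) - rmsf_per_atom(unbound)

# Synthetic demonstration: the "bound" ensemble fluctuates half as much.
rng = np.random.default_rng(0)
reference = rng.normal(size=(50, 3))
unbound = reference + rng.normal(scale=1.0, size=(2000, 50, 3))
bound = reference + rng.normal(scale=0.5, size=(2000, 50, 3))
diff = delta_rmsf(bound, unbound)
print(diff.mean())  # negative: atoms are less mobile in the "bound" ensemble
```

Per-residue values would follow by grouping the backbone or side-chain atoms of each residue before averaging the squared displacements.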
As an additional check on the models, we tested whether the unbound and bound states were reversible on the simulation timescale (10⁻⁷ s). To do so, we exchanged the unbound and bound protein states at the final frame of their respective original simulations. New unbound systems were generated by deleting the DNA from bound systems, and new bound systems were constructed by mapping the first converged unbound protein onto DNA. Positional restraints were applied to the DNA backbone for the first 700 ns to preserve the bound conformation. Re-equilibration of the newly unbound and bound systems, as judged by root mean square deviation, occurred from 1.3 to 1.5 μs, and the final 300 ns was used for analysis. We compared the equilibrated trajectories from the original and exchange simulations by principal component analysis (PCA) of all nonhydrogen atoms from the combined trajectories (Fig. S3). In each case, corresponding pairs of original and exchanged trajectories overlapped along the first two principal components, indicating highly similar conformations. Thus, even though the simulations were unbiased and did not sample longer-timescale dynamics, the distinct dynamics of the various states were reversible on the nanosecond to microsecond timescale, supporting the physical relevance of the simulations in interpreting experimental binding data.

Figure 2. From top to bottom, the rows show unbound ETS1 and the high-affinity, low-affinity, and nonspecific complexes. A, per-residue RMSF of backbone and nonhydrogen side-chain atoms of ETS1. Bound ETS1 RMSF was calculated by taking the difference of bound from the unbound backbone/side-chain RMSF. Residues with notable increases in side-chain dynamics in each set are marked with asterisks. The three first and last terminal residues were disregarded. B and C, differences in backbone and side-chain RMSF (bound − unbound) are mapped to representative structures. Regions that were most stabilized are colored blue, and the most dynamic are colored red. Residues with the strongest side-chain dynamics are indicated by arrows.

Comparison of the sequence-dependent protein-DNA interfaces formed by ETS1
As a parsimonious initial approach to comparing the interactions in the ETS1 complexes, we enumerated hydrogen bonds (direct and water-mediated) and charge-charge interactions between the protein and DNA along standard criteria as described under "Experimental procedures." Overall, nonspecifically bound ETS1 made fewer DNA contacts than any cognate sequence, particularly in direct hydrogen bonds and charge-charge interactions (Table S1). We then scrutinized three residues (Leu-337, Arg-391, and Arg-394) that have been experimentally implicated as critical for cognate binding (23,40). Leu-337 at the N terminus of H1 contacts DNA via a backbone-backbone contact that is nevertheless sensitive to side-chain mutations (40). Arg-391 and Arg-394 are signature residues in the recognition helix H3, which are absolutely conserved in all ETS domains and responsible for interacting with the core 5′-GGA(A/T)-3′ consensus.
In the simulational complexes, Leu-337 contacted DNA at the same flanking base position with comparable frequencies in both specific complexes (HA, 67%; LA, 72%) but only made an infrequent water-mediated contact in nonspecific complexes (18%) (Fig. S4A). Arg-391 was persistently hydrogen-bonded to DNA (HA, 93%; LA, 91%; NS, 93%) at the first guanine in the consensus as well as the corresponding position in nonspecific DNA. DNA contacts by Arg-394 were more variable. In addition to a reduction in direct hydrogen bonds (HA, 84%; LA, 79%; NS, 71%), water-mediated interactions (HA, 26%; LA, 36%) were lost in nonspecific binding. Just as the ¹H-¹⁵N cross-peaks for the side-chain guanidiniums show different patterns between the experimental nonspecific and high-affinity complexes (23), the configurations of DNA contacts by Arg-391 and Arg-394 also differed between the simulational complexes (Fig. S4B). The conserved H3 arginines therefore made compensatory interactions with nonspecific DNA even though, overall, nonspecific DNA afforded substantially fewer interactions with the ETS domains relative to the cognate sequences.
Comparison of the two cognate complexes showed that ETS1 made fewer hydrogen bonds and charge-charge interactions with DNA in high-affinity compared with low-affinity systems. Little difference was found in water-mediated hydrogen bonds (Table S1). Closer examination of direct hydrogen bond frequencies between the high- and low-affinity complexes revealed two significant residues: Gln-336 and Arg-378 (Fig. S5). In the high-affinity complex, Gln-336 (like its neighbor Leu-337) contacted the DNA frequently (80%), whereas in the low-affinity complex it made only infrequent water-mediated contacts with DNA (20%) (Table S2). However, the low-affinity complex displayed a contact made to DNA by Arg-378 (90%) that was absent in the high-affinity complex. The dichotomy between the overall DNA contact density of high- versus low-affinity complexes on one hand and contacts made by specific residues on the other prompted us to determine the collective contributions to DNA binding in the complexes at residue resolution. As an unbiased approach, we performed PCA of the combined trajectories of all four states to probe the per-residue contributions in terms of linearly correlated collective motion.
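Contact frequencies such as those quoted above are typically reported as the fraction of trajectory frames in which a contact criterion is satisfied. The sketch below illustrates the idea in Python with a distance-only criterion; the 3.5 Å cutoff and the short distance series are assumptions for illustration and do not reproduce the criteria given under "Experimental procedures."

```python
import numpy as np

def contact_frequency(distances, cutoff=3.5):
    """Fraction of frames in which a donor-acceptor distance is within `cutoff` (Å).

    `cutoff` is an assumed, distance-only criterion; a full hydrogen bond
    definition would usually add an angular term as well.
    """
    distances = np.asarray(distances, dtype=float)
    return float((distances <= cutoff).mean())

# Hypothetical per-frame distances (Å) between a side-chain atom and a DNA atom.
series = np.array([2.9, 3.1, 3.4, 4.2, 3.0, 6.0, 3.3, 3.2, 2.8, 5.1])
print(contact_frequency(series))  # 7 of 10 frames satisfy the cutoff -> 0.7
```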

Backbone and side-chain motions differentiate ETS1-DNA complexes
In view of the contrasting backbone and side-chain RMSF data (Fig. 2), we separately analyzed backbone (C, Cα, N, and O atoms) and side-chain nonhydrogen atoms (see below), excluding the first and last three residues because of their proximity to the termini. Approaches such as PCA reduce the dimensionality of the data by identifying subspaces that capture the essential collective variation in the data. For the backbone, the first two principal components (PCs) were sufficient to describe the majority (~60%) of the motions present in the protein (Fig. 3A). The primary residues contributing to PC1 consisted of H2/loop (residues 378-383) and the wing (residues 404-408) adjoining S3 and S4 (Fig. 3B). Both the loop and wing interact with bases flanking the core consensus. Projected along PC1, this motion corresponded to the loop and wing moving toward (negative scores) and away from (positive scores) the DNA (Fig. 3C). The distributions of motion along PC1 qualitatively differentiated (in sign) cognate and nonspecific systems, with the latter nearly identical to unbound ETS1. These results immediately indicate that backbone collective dynamics could distinguish cognate and nonspecific binding. In contrast with PC1, the collective motion corresponding to PC2 was diffusely distributed between the loop and wing plus several residues in helix H4 (Fig. S6, A and B). Projection along PC2, which captured collective motions that were orthogonal to PC1, did not meaningfully differentiate the bound and unbound states (Fig. 3C). This residual collective backbone motion therefore described intrinsic dynamics and stochastic noise that were independent of DNA.
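As an illustration of the kind of analysis described above, the following Python sketch performs PCA on Cartesian coordinates pooled from several trajectories and reports the fraction of variance per component (the quantity plotted in a scree plot), the projection of each frame, and the per-atom contribution to a component. It is a generic NumPy implementation on synthetic data, not the authors' workflow, and it assumes that all frames were superposed on a common reference beforehand.

```python
import numpy as np

def coordinate_pca(coords):
    """PCA of coordinates with shape (n_frames, n_atoms, 3), frames pre-superposed.

    Returns the explained-variance ratios, the components (rows of Vt),
    and the per-frame scores along each component.
    """
    n_frames = coords.shape[0]
    flat = coords.reshape(n_frames, -1)
    centered = flat - flat.mean(axis=0)
    _, sing, vt = np.linalg.svd(centered, full_matrices=False)
    variance = sing ** 2 / (n_frames - 1)
    return variance / variance.sum(), vt, centered @ vt.T

def per_atom_contribution(component, n_atoms):
    """Squared loading of each atom (summed over x, y, z) for one component."""
    return (component.reshape(n_atoms, 3) ** 2).sum(axis=1)

# Synthetic demonstration: two "states" that differ in the position of atom 5.
rng = np.random.default_rng(1)
n_atoms = 20
base = rng.normal(size=(n_atoms, 3))
shift = np.zeros((n_atoms, 3))
shift[5] = [2.0, 0.0, 0.0]
state_a = base + shift + rng.normal(scale=0.1, size=(500, n_atoms, 3))
state_b = base - shift + rng.normal(scale=0.1, size=(500, n_atoms, 3))

ratio, comps, scores = coordinate_pca(np.concatenate([state_a, state_b]))
contrib = per_atom_contribution(comps[0], n_atoms)
print(ratio[:2])             # PC1 carries most of the variance
print(int(contrib.argmax())) # atom 5 dominates PC1
```

In this toy case the two states fall on opposite sides of zero along PC1, which mirrors how the sign of the projection can separate ensembles in a pooled analysis.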
Because the dominant collective backbone motion in PC1 corresponded to the loop and wing moving in and out of proximity to DNA, we examined the density of protein-DNA contacts along these elements in the three complexes. Progression from positive to negative along PC1 (i.e. nonspecific > high-affinity > low-affinity) corresponded to an increase in hydrogen bonds between protein and DNA along the loop, particularly direct protein-to-DNA contacts (Fig. S7). This trend followed the global enumeration of contacts (Table S1), which showed a higher density of contacts in the low-affinity complex over the high-affinity one. Both approaches therefore indicated that backbone motion did not account for high- and low-affinity discrimination, further motivating examination of side-chain dynamics.
The scree plot for the side-chain PCA showed that the collective side-chain dynamics were more directed than for the backbone, wherein the first PC alone accounted for >50% of the total variance (Fig. 4A). All but one of the dominant residues contributing to side-chain PC1 were located in the protein-DNA interface, spanning in sequence from Arg-378 in helix H2 to the end of the wing (Fig. 4B). The lone residue outside the DNA contact surface is Glu-343 in H1, which is proximal to Arg-378 in H2. Globally, the motion defined by side-chain PC1 described a concerted contraction or expansion of the protein in correspondence to an ordered change in solvent-accessible surface area (SASA) of the bound protein by ~10% (Fig. S8A). The distributions of fluctuations showed that collective side-chain motion along PC1 completely distinguished the high- and low-affinity complexes (Fig. 4C) whereas backbone PC1 did not (cf. Fig. 3C). Nonspecific complexes remained overlapped with the unbound state.
Although dominant loop and wing residues in side-chain PC1 were also among those in backbone PC1, their quantitative contributions did not correspond significantly, reflective of uncorrelated components in backbone and side-chain dynamics hinted at by the RMSF data (cf. Fig. 2). In addition, several other residues contributed significantly to side-chain PC1 but not to backbone PC1. Of particular interest were Arg-378, the most dominant contributor to PC1, and Glu-343, located in H1 and well-separated from the DNA interface. Their side chains showed a concerted range of motion along PC1 from being in close proximity to far apart (Fig. S9). Residual fluctuations along side-chain PC2, which were distributed over a similar range of residues (Fig. S6, C and D), did not segregate the four states (Fig. 4C) and showed no trend in SASA of the bound protein (Fig. S8B).
Summarizing the principal component analyses, collective motions of all four states for ETS1 were primarily captured by the respective first PC for the backbone and side chains. Projection of the backbone dynamics along this PC separated cognate from nonspecific binding. Cognate sites, but not nonspecific DNA, induced backbone motions at the structural elements flanking the recognition helix toward the DNA. This accounted for the higher density of DNA contacts in cognate complexes over their nonspecific counterparts. Projection of side-chain dynamics along its first PC separated high- and low-affinity cognate binding. Collective side-chain motion was dominated by a charged residue, Arg-378 in helix H2, that underwent a large coupled motion with a complementary partner, Glu-343 in H1. As the concerted dynamics of Arg-378 and Glu-343 presented themselves as prominent contrasting features in the high- versus low-affinity complex, we examined their contributions to DNA binding in greater detail.

Cognate sequence selection is coupled allosterically to a distal salt bridge
Because side-chain PC1 completely distinguished high- and low-affinity binding, we scrutinized the conformations of Arg-378, the most dominant contributor to PC1, and its partner Glu-343 in the two cognate complexes (Fig. 5A). In the high-affinity complex, the side chains of Arg-378 and Glu-343 formed a persistent salt bridge with a time-averaged center-of-mass separation of 4.0 ± 0.1 Å (Fig. 5B), noncovalently stapling H1 and H2 together. In the low-affinity complex, Arg-378 instead contacted a DNA backbone phosphate downstream of the 5′-GGAA-3′ consensus. This was a dramatic change in local conformation, as the side chains were separated by 15.1 ± 0.1 Å in the low-affinity complex.
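The salt-bridge separation described above, taken as the distance between the centers of mass of the terminal side-chain nitrogens of arginine and the terminal side-chain oxygens of glutamate, can be computed per frame as in the sketch below. The array layout and the single-frame coordinates are assumptions for illustration only; because each group contains atoms of a single element, the center of mass reduces to the plain average of the positions.

```python
import numpy as np

def salt_bridge_distance(arg_n_xyz, glu_o_xyz):
    """Per-frame separation (Å) between two atom-group centers.

    arg_n_xyz: (n_frames, n_nitrogens, 3) terminal Arg side-chain nitrogens.
    glu_o_xyz: (n_frames, n_oxygens, 3) terminal Glu side-chain oxygens.
    """
    n_center = arg_n_xyz.mean(axis=1)
    o_center = glu_o_xyz.mean(axis=1)
    return np.linalg.norm(n_center - o_center, axis=1)

# Hypothetical single frame: N center at (1, 0, 0), O center at (1, 4, 0).
arg_n = np.array([[[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]])
glu_o = np.array([[[1.0, 4.0, -1.0], [1.0, 4.0, 1.0]]])
print(salt_bridge_distance(arg_n, glu_o))  # [4.]
```

A histogram of these per-frame distances gives a distance profile of the kind used to judge whether the bridge is intact or broken in a given state.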
In addition to the Glu-343/Arg-378 interaction, the low-affinity complex was characterized by a large reduction in DNA backbone contact by Gln-336, a residue proximal to H1 (Fig. 5A). Gln-336 and Arg-378 had been identified as two residues with significant differences in DNA contacts between the high- and low-affinity complexes (Fig. S5). In the high-affinity complex, the side-chain amide nitrogen of Gln-336 maintained a time-averaged distance of 3.7 ± 0.3 Å to the closest phosphate oxygen. In the low-affinity complex, the Gln-336 side chain was mostly solvent-exposed at 6.2 ± 0.1 Å from the DNA. To interrogate the triad of Gln-336, Glu-343, and Arg-378 more definitively, we carried out a side-chain PCA consisting of these three residues only. The resulting correlation uniquely separated the low-affinity complex from all other states (Fig. S10). The low-affinity complex was uniquely characterized by concerted side-chain dynamics in which contact by Gln-336 with the DNA backbone adjacent to the core consensus was allosterically coupled to the distal Glu-343/Arg-378 salt bridge. Because Gln-336 resided at the DNA contact surface, the results suggested that Gln-336 was directly sensitive to DNA identity, directing the fate of the Glu-343/Arg-378 salt bridge whose disruption specifically flagged a low-affinity complex.
To further clarify these relationships, we simulated a panel of complexes harboring various point mutations at Glu-343 or Gln-336. Of several mutations tested at one or the other position, only high-affinity complexes of E343L and Q336L perturbed the dynamics and significantly sampled PC space occupied by the WT low-affinity complex (Fig. S11). The Glu-343/Arg-378 distance profiles of WT and mutant complexes showed that the salt bridge was predominantly intact in both high- and low-affinity mutant Q336L complexes, unlike their WT counterparts (Fig. 5B). Thus, discrimination between high- and low-affinity sequences, insofar as the Glu-343/Arg-378 salt bridge was concerned, was abolished in the Q336L mutant. For E343L, elimination of the salt bridge rendered high-affinity binding similar to low-affinity binding by WT ETS1. Low-affinity binding by E343L was even more perturbed, with no well-defined Gln-336/DNA contact (Fig. 5C). DNA contact by Arg-378 in both E343L complexes was also sharply reduced. In summary, the simulations of the mutants corroborated the WT results. These simulations reinforced the Glu-343/Arg-378 salt bridge as an important contributor to high-affinity binding in WT ETS1 and Gln-336 as a relay of sequence information to the salt bridge.
To test this hypothesis experimentally, we cloned E343L and Q336L mutants of ΔN331 and measured their affinities for high- and low-affinity DNA (Fig. 5D). E343L bound both high- and low-affinity DNA more poorly than WT ETS1 but retained target selectivity (Fig. 5E). The reduction in affinity in both mutants was consistent with the Glu-343/Arg-378 salt bridge as a stabilizing interaction for cognate ETS1-DNA complexes. In the case of Q336L, impaired binding was anticipated by the elimination of side chain-DNA contacts. However, we were specifically interested in whether selectivity would be compromised, reflective of Gln-336 as a sensor of high- versus low-affinity DNA. Q336L bound both cognate sequences similarly within experimental precision (Fig. 5D). Notably, Q336L showed significantly higher affinities relative to the low-affinity E343L complex. The binding experiments thus supported Gln-336 as a selectivity sensor that triggered the loss of a distal stabilizing salt bridge (Glu-343/Arg-378) upon binding to low-affinity DNA. This sensor operated only in the low-affinity complex, as an intact Glu-343/Arg-378 salt bridge was the default state in unbound ETS1 and maintained in both high-affinity and nonspecific complexes.
To interrogate the triadic residues in a more functional context, we examined the Mnx enhancer, which regulates development of the notochord in Ciona embryos (1,2). A key 5′ regulatory region in the Mnx enhancer harbors three cognate ETS sites, one of which is high-affinity and the other two low-affinity (Fig. 6A). The low-affinity (suboptimal) site proximal to the 5′ terminus is absolutely required for correctly patterned Mnx expression in vivo (2). Alteration of this site to a high-affinity sequence ("optimized ETS," or OE, in keeping with the reported nomenclature) results in strong but ectopic Mnx expression outside of the notochord (2). A mechanistic link between cognate discrimination and functional embryogenesis would therefore predict differences in Mnx occupancy by WT ETS1 and the selection-competent mutant E343L but not the incompetent mutant Q336L. We titrated native ("Mnx −2bp") and OE enhancers with WT and mutant ETS1 and resolved the various complexes (unbound and up to triply bound states) by native gel electrophoresis (Fig. 6B). Across identical ranges in protein concentrations, both E343L and Q336L bound the enhancers more weakly than WT (Fig. 6B, arrows). Crucially, at protein concentrations that just saturated the enhancer sites, both WT and E343L were qualitatively different in their relative occupancies of the available cognate sites. WT and E343L exhibited differential preferences for the singly and doubly bound states. In contrast, Q336L showed the same relative occupancy and therefore failed to discriminate the two enhancers. The data therefore strongly supported cognate discrimination by ETS1 and its basis in the triadic residues as functionally important in chordate embryogenesis.
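One way to reason about the relative populations of unbound, singly, doubly, and triply bound enhancer states is a binding polynomial for independent sites. The Python sketch below is a simplified model and not the analysis used in the study: it assumes non-cooperative sites, treats the free protein concentration as known, and uses made-up dissociation constants purely for illustration.

```python
import math
from itertools import combinations

def bound_state_populations(free_protein, kds):
    """Fractions of DNA with 0, 1, ..., n sites occupied.

    Assumes independent (non-cooperative) sites. `free_protein` and `kds`
    must share the same concentration units.
    """
    weights = [free_protein / kd for kd in kds]          # statistical weight per site
    partition = math.prod(1.0 + w for w in weights)      # sum over all occupancy states
    populations = []
    for n_bound in range(len(kds) + 1):
        total = sum(math.prod(subset) for subset in combinations(weights, n_bound))
        populations.append(total / partition)
    return populations

# Hypothetical constants (arbitrary units): a "native" enhancer with one strong
# and two weak sites, and an "optimized" enhancer with one weak site strengthened.
native = bound_state_populations(1.0, [1.0, 13.0, 13.0])
optimized = bound_state_populations(1.0, [1.0, 1.0, 13.0])
for label, pops in (("native", native), ("optimized", optimized)):
    print(label, [round(p, 3) for p in pops])
```

In this toy model, a protein that discriminates the two sequences yields different distributions over the bound states for the two enhancers, whereas setting the affected site to the same constant in both lists (a non-discriminating protein) makes the distributions identical.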

Breach of the Glu-343/Arg-378 salt bridge compromises the structural integrity of H2 and exposes the hydrophobic core of H1
In their salt-bridged configuration, the side chains of Glu-343 and Arg-378 spanned over significant portions of the H1 and H2 helices. The hydrophobic composition of H1 (LWQFLLELLT, Glu-343 underlined) suggested that loss of the Glu-343/Arg-378 salt bridge, which is intact in unbound ETS1, would be detrimental to protein stability. To test this hypothesis, we analyzed the conformations of H1 and H2 in the simulational models. First, differences in solvent exposure were determined in terms of accessible water at each residue in H1 (residues 337-345) and H2 (368-378) relative to the unbound protein (Fig. 7A). In the high-affinity complex, in which the salt bridge was intact, Leu-337 and Trp-375 were less water-exposed, on average, relative to unbound ETS1. Loss of the salt bridge in the low-affinity complex resulted in exposure of several hydrophobic residues to water in H1 (Phe-340, Glu-343) and H2 (Gly-376), whereas Arg-378 became less water-exposed upon burial near the DNA (cf. Fig. 6A). H2 was well-preserved in both high- and low-affinity complexes up to Gly-376 (Fig. 7B), beyond which backbone H-bond contacts became less favorable in a DNA-dependent manner. As a result, helicity in the low-affinity complex was significantly compromised (Fig. 7C). As Arg-378 resided at the C terminus of H2, these results suggest that a salt bridge with Glu-343 significantly stabilized the local tertiary structure. The presence of a strong helix breaker like glycine in H2 may serve as a leverage point to facilitate the contortion of Arg-378 in H2.

Figure 5. Cognate DNA discrimination is coupled allosterically to a distal salt bridge. A, representative snapshots of the high- and low-affinity complexes from the centroids of their respective side-chain PC1/PC2 clusters in the side-chain PCA. Gln-336 is at the N terminus of H1, whereas Glu-343 and Arg-378 are near the C termini of H1 and H2, respectively. The DNA bases contacted by either Gln-336 (high-affinity) or Arg-378 (low-affinity) are marked with asterisks. B, distance profile of the Glu-343/Arg-378 salt bridge, taken as the separation between the centers of mass of the terminal nitrogen and oxygen atoms in their side chains. Each histogram is normalized to the total number of frames counted, white-filled for WT and yellow-filled for mutant ETS1. C, separation distance of the side-chain nitrogen (Nε2) of Gln-336 to the phosphate oxygen (OP2) of its DNA contact (position marked with an asterisk in A, high-affinity). D, representative binding profiles of the Q336L and E343L mutants to high- and low-affinity DNA at 25°C. E, free energy changes of WT and mutant ETS1 binding to high- or low-affinity DNA, expressed as difference from the WT high-affinity value as mean ± S.D. of three or more replicates. Student's t test, *, p < 0.05.

A conserved triad of residues mediates low-affinity binding by ETS1
The side-chain dynamics exhibited by the triad of Gln-336/Glu-343/Arg-378 in WT complexes qualitatively distinguished low-affinity binding from interactions with optimal and nonspecific DNA. These results were verified experimentally by testing post hoc hypotheses on the simulational mutations E343L and Q336L. To our knowledge, this triad represents the first specific elucidation of how interactions with backbone contacts at core-flanking positions are translated to differential affinity in a eukaryotic transcription factor. This study therefore extends our interpretation of the cache of literature data on DNA site selection with novel insights into an area of transcription factor biophysics that has seen limited progress in recent years.
More broadly, identification of this triad contributes to our understanding of how suboptimization of developmental enhancers arises evolutionarily. Although the cis-regulatory components of suboptimized enhancers are increasingly defined (9,10,41), the molecular adaptations in trans-regulatory proteins remain unknown. An intact Glu-343/Arg-378 salt bridge is the default state in unbound ETS1 and staunchly maintained in both high-affinity and nonspecific binding. This observation logically argues that low-affinity binding, which uniquely disrupts this salt bridge to incur an affinity penalty, represents an evolved molecular trait. If "de-tuning" of DNA sites constituted an evolutionary mechanism for achieving spatiotemporal control of developmental genes, the present data argue for a parallel adaptation in their trans-regulatory partners to suboptimized DNA. In ETS1, frustration introduced by low-affinity DNA to destabilize packing of the H2 helix represents such a mechanism. These distortions are facilitated by a glycine (Gly-376) in H2, a highly disfavored residue in α-helices. The conservation of Gly-376 at corresponding positions in H2 across ETS family proteins therefore hints at a possible shared component of cognate discrimination (42). Moreover, the triad of Glu-343/Arg-378/Gln-336 that mediates cognate discrimination is strongly conserved in ETS1 orthologs across vast evolutionary distances, from simple water-borne organisms to higher mammals (Fig. S12). Finally, harkening to a role of ETS proteins as developmental gene regulators, this triad is present in several putative proto-ETS proteins in Amphimedon queenslandica, a model species of sponge, one of the earliest phyla of animals with a defined body plan (43).

Role of sequence discrimination on self-regulation and protein-protein partnerships
In addition to DNA binding, the ETS domain is known to mediate regulatory interactions with other modules in ETS1 (23,44,45). Two extrinsic helices (HI-1 and HI-2) flanking the N terminus of the ETS domain interact with its C-terminal helices H4 and H5 (46). These interactions generate an autoinhibited state (Fig. 8A) in which unfolding of the HI helices is coupled to DNA binding (47). Work by several groups has shown that the autoinhibitory module is allosterically coupled to DNA binding by H1, specifically Gln-336 and Leu-337 at its N terminus (31,40). The present data extend this model by accounting for the DNA dependence of autoinhibition, which has heretofore been unexplained. Paired data on binding by the uninhibited and autoinhibited ETS domain to variants of the high-affinity cognate SC1 (17,40,48) demonstrate that binding to weaker cognates is more strongly autoinhibited than binding to higher-affinity sequences in a graded manner. Quantitatively, the free energy relationship between binding by autoinhibited versus minimal ETS1 yields a slope significantly above unity (Fig. 8B). The proportionate autoinhibition in binding methylated SC1 is consistent with the established functional inhibition of ETS1 transactivation by DNA methylation in vivo (49). Remarkably, the correlation applies whether the perturbation at a consensus-flanking position involves a change in base sequence, CpG methylation, or even a single-stranded nick and seems not to depend on the perturbed position itself. The literature data therefore indicate that autoinhibition depends on intrinsic affinity, which, in the context of the model, must be communicated from the DNA contact surface to the autoinhibitory helices via H1.

Figure 7 (caption, partial). Values below an assigned cutoff (±1 water) are colored gray and those above black. Residues with values above/below the cutoff in H1 and H2 are marked with an asterisk. High-affinity binding resulted in loss of water exposure at Leu-337 (H1) and Trp-375 (H2). In contrast, low-affinity binding increased water exposure at Phe-340, Glu-343 (H1), and Gly-376 (H2). Arg-378 became less hydrated because of DNA contact. B, local structure of helix H2, showing the backbone conformation from Arg-374 to Arg-378. C, values indicating the fraction of frames in which H2 residues were helical as scored by DSSP.

How may this affinity inhibition scaling come about? Previous work has shown that the dynamics of Gln-336 and Leu-337 act together as a "conformational switch" to direct correlated motions between H1 and H4 in autoinhibited ETS1 (31). The simulations show that Leu-337 interacts with DNA comparably in both high- and low-affinity complexes (HA, 67%; LA, 72%). Thus, Leu-337 is an important residue in this scheme, but it is Gln-336 that communicates affinity information to the Glu-343/Arg-378 salt bridge in H1 and H2. The requirement for H1 in the conformational switch model therefore suggests that affinity inhibition scaling may also be integrated by H1.
NMR analyses of autoinhibited ETS1 bound to high-affinity DNA show that the autoinhibitory helices HI-1 and HI-2 are not completely disordered, nor are they equally unstructured on the microsecond to millisecond timescale (23,46). Although both helices become less structured in the high-affinity complex than the free protein, the more proximal HI-2 is more structured than the distal HI-1, as evidenced by chemical shift perturbations (23), spin relaxation dynamics (23), and amide proton exchange rates (46). In comparable co-crystallographic high-affinity cognate complexes (50), HI-2 is helical, whereas HI-1 adopts a coil conformation. HI-2 maintains a backbone hydrogen bond from Ala-323 to the side chain of Glu-343 on H1, whereas the latter is salt-bridged with Arg-378 (50). We propose that, in low-affinity binding, loss of the Glu-343/Arg-378 salt bridge (triggered by the DNA sensor Gln-336) also destabilizes the Glu-343/Ala-323 contact and packing of HI-2, causing it to further unfold in addition to HI-1 (Fig. 8C). More complete unraveling of both HIs would, in turn, increase the activation penalty to low-affinity DNA binding by autoinhibited ETS1. The triadic residues may therefore present a mechanism by which the severity of autoinhibition can be tuned in response to the nature of the DNA substrate.
This extended model of ETS1 autoinhibition has important implications for the protein-protein partnerships known to exert combinatorial control of Ets1-dependent genes. Binding partners such as Pax5 (51,52) and Runx1 (48,53) or homodimerization with another ETS1 (50,54) enhance ETS1 activity by relieving autoinhibition. In the ternary Pax5-ETS1-DNA co-crystal structure (PDB code 1MDM), for example, HI-2 is in a folded state as the protein is bound to an intrinsically nonconsensus site for ETS1 (55). We may apply our extended model and interpret the Pax5 partnership in terms of mitigating the penalty of unfolding HI-2. This interpretation is further supported by a dedicated side chain-side chain interaction between Gln-22 of Pax5 and Gln-336 of ETS1 in the ternary complex, which may bypass the trigger that would otherwise disrupt the Glu-343/Arg-378 salt bridge.
Additional long timescale (microsecond to millisecond) motions have been inferred from model-free analysis of heteronuclear spin relaxation data and directly by relaxation dispersion in unbound ETS1 (45,46,56). These slow dynamics link the ETS domain to the autoinhibitory modules and indicate at least two active/inactive conformations in equilibrium without DNA. The labile active form can sample affinity-dependent interactions, which are less accessible to the rigid inactive state. Oscillations between these states may also aid in diffusion on nonspecific sequences (56). However, this behavior is inverted upon binding DNA. High-affinity sites stabilize short and long timescale motions at the DNA interface and HIs, respectively, to a greater extent than lower-affinity sequences (23). Similar to the ETS domain, the dynamic properties of longer ETS1 constructs likely diverge upon binding DNA. It is therefore likely that fast dynamics, such as those of the Glu-343/Arg-378 salt bridge, modulate the slower motion that ultimately determines the DNA-dependent variation in inhibitory penalties.
In conclusion, we have determined the molecular mechanism of cognate discrimination by ETS1, a model developmental gene regulator. Conventionally, the study of sequence-specific DNA recognition has focused on interactions at the protein-DNA interface, where concepts such as DNA shape have garnered strong traction in the literature (57-59). Site selectivity is regarded as a continuum from nonspecific DNA on one hand to optimal sequences on the other, along which low-affinity sequences represent imperfect intermediates. The present combination of computational and experimental data suggests a modified perspective, however, as the low-affinity complex of ETS1 exhibits dynamics, mediated by a Gln-336/Glu-343/Arg-378 triad, that are qualitatively distinct from the nonspecifically bound and high-affinity bound states, with distinct impact on allosteric autoinhibition. Given the ancient history of ETS factors and their structural homology to the myriad variations of the helix-turn-helix motif, the parsimonious interpretation is that a noncontinuum mechanism of cognate discrimination is not unique to ETS1. Pervading evidence of functionally essential low-affinity sites in the enhancer elements of developmental genes provides independent support for this view and a renewed need for improved understanding of how eukaryotic transcription factors parse cognate variants. As ETS1 shows, molecular dynamics simulations of cognate complexes offer an attractive approach to addressing the mechanics of enhancer site suboptimization, particularly in the absence of appropriate experimental structures.

Figure 8. Proposed mechanism of variable autoinhibition in ETS1. A, DNA binding by ETS1 is autoinhibited by an allosteric mechanism mediated by helix H1 (blue), which couples the DNA binding interface with an N-terminal inhibitory module consisting of the short helices HI-1 and HI-2 (red). Structural evidence shows that both HI helices are folded in the unbound state (PDB code 1R36, left). Upon binding to high-affinity cognate DNA (right, PDB code 3MFK; the DNA sequence is 5′-GCAGGAAGTG-3′, consensus in orange), HI-1 is unfolded to a greater extent than HI-2. B, free energy relationship between ETS1 binding to the same DNA substrates as uninhibited (i.e. ΔN331) and autoinhibited (i.e. ΔN280) constructs. Data are compiled from the literature (see text for references) in decreasing uninhibited affinity as follows: 1, SC1; 2 and 3, nicked SC1 (green); 4, 6-8, and 10, flanking base variants of SC1; 5, a long construct harboring SC1; 9 and 11, hemi- and fully methylated SC1 (magenta). Reported error bars are shown. The line represents an error-weighted linear fit by York's method with the 95% confidence interval. The estimated slope is 1.5 ± 0.1. Note the base-10 logarithmic axes. The dashed line is unity. C, observed Ala-323/Glu-343 and Glu-343/Arg-378 contacts in the high-affinity co-crystal in A. The present results suggest that the absence of the Glu-343/Arg-378 salt bridge in low-affinity complexes would destabilize HI-2, leading to increased autoinhibition (arrows).
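To make the meaning of a slope above unity in the free energy relationship of Fig. 8B concrete, the short sketch below converts a fold change in uninhibited binding into the corresponding fold change for the autoinhibited construct. It assumes that the fitted line holds exactly on base-10 logarithmic axes and uses the reported slope of 1.5 as a default. The function names are hypothetical, and the numbers are illustrative arithmetic, not data from the study.

```python
def autoinhibited_fold_change(uninhibited_fold_change, slope=1.5):
    """Fold change in autoinhibited binding implied by a log-log line of the given slope.

    If log10(K_auto) = slope * log10(K_uninhibited) + b, then a fold change f in
    K_uninhibited corresponds to a fold change f**slope in K_auto.
    """
    return uninhibited_fold_change ** slope

def autoinhibition_ratio_change(uninhibited_fold_change, slope=1.5):
    """Factor by which the autoinhibition ratio (K_auto / K_uninhibited) changes."""
    return uninhibited_fold_change ** (slope - 1.0)

# Illustrative case: a site that binds the uninhibited domain 10-fold more weakly.
weakening = 10.0
auto_weakening = autoinhibited_fold_change(weakening)
extra_penalty = autoinhibition_ratio_change(weakening)
```

Under these assumptions, a 10-fold loss of intrinsic affinity corresponds to a roughly 32-fold loss for the autoinhibited construct, so the autoinhibition penalty itself grows about 3-fold. A slope of exactly one would leave the penalty unchanged.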

Molecular dynamics simulation
Initial coordinates were constructed by modeling the WT or mutant ETS domain of murine ETS1 (residues 331 to 440, termed ΔN331) to the protein in a cognate ETS1-DNA co-crystal structure (PDB code 1K79) (55). The protonation states of all ionizable residues were estimated using PROPKA3 (60) and set accordingly for pH 7.4. Unbound (apo) ETS1 was generated by deleting the DNA from the co-crystal structure. DNA-bound complexes were produced by replacing the crystallographic sequence with a 10-bp DNA sequence inserted into a 5′-CGGCCAA...ATGGCG-3′ cassette to generate a 23-bp construct. Unbound DNA models were generated as B-form DNA by NAB. All systems were constructed in tLeap by placing the solute in a truncated octahedral TIP3P water box with a minimum distance of 10 Å to the box edge. Excess ions (Na⁺, Cl⁻) were added to achieve a salt concentration of 0.15 M. Topologies were generated using the Amber14SB and OL15 force fields for protein and DNA, respectively (28, 61). All energy minimization and molecular dynamics simulations were performed with AMBER16. The particle mesh Ewald method was used for treating long-range electrostatic interactions with a 10 Å cutoff for nonbonded interactions. All bonds involving hydrogen atoms were constrained by the SHAKE algorithm. All systems were energy-minimized using steepest descent (2,000 steps) followed by conjugate gradient (1,000 steps). Positional restraints on all nonsolvent atoms were used (500 kcal/mol Å²) before being released stepwise for unrestrained minimization. The systems were then heated to 300 K over 100 ps in the canonical ensemble using the Langevin thermostat with restraints on the solute (100 kcal/mol Å²). Afterward, the systems were equilibrated in the isobaric-isothermal ensemble at 1 atm for 8 ns total in 1-ns steps. Initial relaxation of solute restraints (75, 50, 25, and 10 kcal/mol Å²) was followed by an iterative release of restraints of different solute groups in the following order: nucleobases, nucleotide backbone, protein backbone, and protein side chain, followed by 1 ns of equilibration without restraints. A time step of 1 fs was used for both heating and equilibration. Subsequent production runs were run for up to 1.5 μs with a 2-fs time step, saving frames every 4 ps.

MD trajectory analysis
Analysis of the trajectories was performed using cpptraj from AmberTools16. Visualization was performed with PyMOL. The final 300 ns (75,000 frames) was used for analysis. The energy-minimized structure was used as a common reference for least square fitting. Root mean square deviations were calculated for all heavy atoms in the protein and complex. Root mean square fluctuations were calculated on a per-residue basis separately for the backbone atoms (Cα, C, N, and O) and nonhydrogen side-chain atoms of the protein. Hydrogen bond analysis was performed using a 3.0 Å distance cutoff and an angle cutoff of 135° for direct and bridging (one water) contacts. Charge-charge interactions were enumerated for charge-bearing atoms of charged residues of the protein (Lys, NZ; Arg, NH1 and NH2) within a predefined cutoff (3.5, 5.5, and 7.5 Å) from the phosphate oxygen atoms (OP1 and OP2) of the DNA. SASA was calculated for all atoms of the solute with a probe radius of 1.4 Å. Collective motions in the protein in bound and unbound states were determined by PCA of backbone or nonhydrogen side-chain atoms for the systems as a single concatenated trajectory. Because of the high flexibility of the N and C termini, the first and last three residues (333-335 and 436-438) were omitted in PCA. Block averages were calculated by splitting the entire dataset into five equally sized blocks and computing the mean for each block before determining the mean and standard deviation across the blocks.
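The frame bookkeeping and block-averaging procedure described above can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the cpptraj workflow used in the study: the function names are hypothetical, the sample standard deviation is assumed (the text does not specify sample versus population), and trailing frames that do not fill a block are dropped.

```python
import statistics

def frame_count(window_ns, save_interval_ps):
    """Number of saved frames in an analysis window."""
    return int(round(window_ns * 1000 / save_interval_ps))

def block_average(values, n_blocks=5):
    """Mean and standard deviation across the means of equally sized blocks."""
    block_size = len(values) // n_blocks  # leftover trailing values are ignored
    block_means = [
        statistics.fmean(values[i * block_size:(i + 1) * block_size])
        for i in range(n_blocks)
    ]
    # Sample standard deviation across blocks (an assumption of this sketch).
    return statistics.fmean(block_means), statistics.stdev(block_means)

# 300 ns of trajectory saved every 4 ps.
n_frames = frame_count(300, 4)

# Toy per-frame observable (hypothetical values, not simulation data).
toy_series = [float(i) for i in range(10)]
toy_mean, toy_sd = block_average(toy_series, n_blocks=5)
```

With a 4-ps save interval, the 300-ns window yields 75,000 frames, so each of the five blocks covers 15,000 frames (60 ns).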

Molecular cloning and recombinant protein preparation
Site-specific mutants of the murine ETS1 ETS domain (residues 331 to 440, termed ΔN331) were generated by PCR cloning as described previously (37). Briefly, His6-tagged recombinant constructs of WT and mutant ΔN331 were overexpressed in Escherichia coli and purified by Co-NTA affinity chromatography as described previously (37) to apparent homogeneity as judged by SDS-PAGE. Following cleavage of the affinity tag with thrombin, purified protein was extensively dialyzed against 10 mM Tris-HCl (pH 7.4) containing 0.15 M NaCl and 3 mM DTT.

ETS1-DNA binding experiments
The free energy of DNA binding by WT and mutant ETS1 was determined via the equilibrium constant by fluorescence polarization, as reported previously (22). In brief, a subsaturating concentration of WT or mutant ΔN331 in complex with a Cy3-labeled DNA probe was displaced at 25 °C with unlabeled DNA duplexes harboring the same cognate sequences as the simulational DNA. The solution conditions were 10 mM Tris-HCl (pH 7.4) containing 0.1% BSA, 0.15 M NaCl, and 3 mM freshly prepared DTT. Steady-state fluorescence anisotropy of the probe was measured at 530/590 nm with a Molecular Devices Paradigm plate reader. The binding data were fitted by nonlinear least square analysis with a competitive binding model (17), as a function of total unlabeled DNA concentration, to estimate the equilibrium dissociation constant K_D. The free energy of binding was computed via the standard thermodynamic relationship ΔG° = RT ln K_D.
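As a worked illustration of the relationship ΔG° = RT ln K_D, the sketch below converts dissociation constants into binding free energies and into differences relative to a reference complex, in the manner of the values expressed relative to the WT high-affinity complex. The units (kcal/mol), the 1 M standard state, the function names, and the K_D values are assumptions chosen for the example, not measurements from this work.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)

def binding_free_energy(kd_molar, temp_kelvin=298.15):
    """Standard binding free energy, dG = RT ln(K_D), with K_D in M (1 M standard state)."""
    return R_KCAL * temp_kelvin * math.log(kd_molar)

def ddg_relative(kd_molar, kd_reference_molar, temp_kelvin=298.15):
    """Free energy difference of a complex relative to a reference complex."""
    return (binding_free_energy(kd_molar, temp_kelvin)
            - binding_free_energy(kd_reference_molar, temp_kelvin))

# Hypothetical dissociation constants at 25 degrees C (not values from this study).
kd_reference = 1e-9   # 1 nM reference complex
kd_weaker = 1e-8      # 10 nM, a 10-fold weaker complex

dg_reference = binding_free_energy(kd_reference)
ddg_weaker = ddg_relative(kd_weaker, kd_reference)
```

At 25 °C, each 10-fold increase in K_D raises ΔG° by RT ln 10, about 1.36 kcal/mol, which is a convenient yardstick for reading affinity differences between complexes.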
Titration samples involving Mnx enhancers were prepared in a direct binding format using DNA PCR-amplified from plasmids harboring the fragment sequence. These plasmids were prepared by cloning synthetic oligos encoding the NCBI reference sequence NW_004190369.2 into pCR2.1-TOPO vectors (Invitrogen). Amplicons were purified by agarose electrophoresis and extracted with spin columns (Thermo). Each sample consisted of 10 nM DNA and graded concentrations of purified WT or mutant (E343L or Q336L) protein in the same buffer matrix as samples for fluorescence polarization. The mixtures were resolved by native electrophoresis in 10% polyacrylamide gel running at 10 V/cm in 0.5× Tris/glycine buffer. Gels were stained with SYBR Gold (Thermo) and imaged with UV transillumination. Each lane was quantified by peak fitting with Gaussian distributions.