Sequence Discrimination by DNA-binding Domain of ETS Family Transcription Factor PU.1 Is Linked to Specific Hydration of Protein-DNA Interface*

Background: PU.1 recognizes a large number of binding sites with differing affinities. Results: The role of hydration in sequence-specific binding by the ETS domain of PU.1 was determined. Conclusion: Sequence discrimination is linked to the uptake of specifically bound waters at the protein-DNA interface. Significance: The sequence specificity of PU.1 ETS is directly related to the transactivational activity and frequency of in vivo binding sites for the full-length protein. PU.1 is an essential transcription factor in normal hematopoietic lineage development. It recognizes a large number of promoter sites differing only in bases flanking a core consensus of 5′-GGAA-3′. DNA binding is mediated by its ETS domain, whose sequence selectivity directly corresponds to the transactivational activity and frequency of binding sites for full-length PU.1 in vivo. To better understand the basis of sequence discrimination, we characterized its binding properties to a high affinity and low affinity site. Despite sharing a homologous structural framework as confirmed by DNase I and hydroxyl radical footprinting, the two complexes exhibit striking heterogeneity in terms of hydration properties. High affinity binding is destabilized by osmotic stress, whereas low affinity binding is insensitive. Dimethyl sulfate footprinting showed that the major groove at the core consensus is protected in the high affinity complex but accessible in the low affinity one. Finally, destabilization of low affinity binding by salt is in quantitative agreement with the number of phosphate contacts but is substantially attenuated in high affinity binding. These observations support a mechanism of sequence discrimination wherein specifically bound water molecules couple flanking backbone contacts with base-specific interactions in a sequestered cavity at the core consensus. The implications of this model with respect to other ETS paralogs are discussed.

PU.1 (or Spi-1) is a lineage-restricted member of the ETS family of transcription factors that shares a structurally conserved DNA-binding domain. It binds a large number of cognate sequences (mostly of myeloid and lymphoid origin) harboring a central 5′-GGAA-3′ consensus (1,2). These sites exhibit significant variation in the flanking bases and range over 400-fold in affinity (3). Although the combinatorial nature of transcriptional control is well established, the intrinsic selectivity of the ETS domain per se is also biologically relevant. The latter point is demonstrated by its exact relationship with transactivational activity by full-length PU.1 (4) as well as the close correspondence between high affinity sequences identified in vitro and their prevalence among ETS binding sites in vivo (1,2). Given current interest in the transcriptional specificity of ETS proteins at the genomic level (5-7), knowledge of the physical determinants for the intrinsic selectivity of ETS proteins is essential to understanding higher order protein-protein and protein-DNA interactions found in the transcription regulatory complex.
In the co-crystal structure of the ETS domain of PU.1 bound to a high affinity binding site (8), the recognition helix makes base-specific contacts (several of which are water-mediated) in the major groove of the 5′-GGAA-3′ core consensus. In the adjacent minor grooves, the protein makes contacts exclusively with DNA phosphate and deoxyribose atoms. Although this organization is consistent with the existence of flanking sequence variants, it does not explain their non-degenerate occurrences that represent a ubiquitous feature of all ETS domains (1). The complexity of flanking sequence discrimination is demonstrated by two complementary lines of experiments. In screening experiments, the occurrence of certain bases at a flanking position is strongly associated with a particular base neighbor (2). In direct measurements of sequence-specific binding, each flanking base position contributes nonadditively to the overall affinity (3) as well as the underlying thermodynamics (9). To account for these observations, an "indirect readout" mechanism of DNA structure as encoded by the flanking sequences was proposed (9). However, the physical nature of this "code" remains unresolved. A major missing piece of this puzzle is biophysical data on low affinity flanking sequence variants. Characterization of these variants is essential, even if they are not highly represented among in vivo binding sites, because this knowledge is ultimately required to complete our understanding of the intrinsic preference of ETS proteins for one binding site over another. Unfortunately, comparative studies of core consensus variants 5′-GGAN-3′ (10-13) do not directly answer this question because all four core bases receive base-specific inputs.
To address the physical contribution of the flanking sequences to sequence discrimination by the PU.1 ETS domain, we investigated the structure and binding properties of two PU.1 ETS-bound sites differing strictly in the flanking sequences. Osmotic stress experiments revealed the unusual finding that high affinity specific binding is associated with an apparent net uptake of water molecules. In contrast, low affinity specific binding is manifestly hydration-neutral. Dimethyl sulfate footprinting detected a putative solute-excluding cavity at the major groove of the core consensus in the high affinity complex but not in the low affinity one. DNase I and hydroxyl radical footprinting identified no other major conformational differences along the protein-DNA interface. The two sequence-specific complexes also differ significantly in terms of destabilization by NaCl, which was interpreted in terms of attenuation of phosphate neutralization by specifically bound waters. We propose a model of sequence-specific discrimination wherein flanking backbone contacts are coupled to base-specific interactions between the recognition helix and the core consensus via a network of specifically bound water molecules.

EXPERIMENTAL PROCEDURES
Cloning and Preparation of DNA Fragments-A ~130-bp fragment harboring a single 23-bp ETS binding site (Table 1) was cloned into the EcoRI/HindIII sites of pUC19. For nonradioactive experiments, DNA fragments (206 bp) were amplified by PCR using standard M13 forward and reverse primers. For footprinting experiments, internal primers were used to generate shorter, singly 5′-radiolabeled fragments by PCR. All amplicons were purified from an agarose gel.
Protein Preparation-A synthetic DNA fragment encoding residues 165-272 of murine PU.1 was cloned between the NcoI and BglII sites of pQE-60 (Qiagen). This construct contains a C-terminal His6 tag separated by a thrombin cleavage site. Protein was overexpressed in Escherichia coli M15[pREP4] and purified by standard affinity chromatography. The His6 tag was cleaved by incubation with 10 units/ml thrombin overnight at room temperature and removed on a Superdex 30/200 column. Detagged, purified PU.1 ETS domain was dialyzed into binding buffer (described below) and quantified by UV spectrometry (ε_280 = 22,460 M⁻¹ cm⁻¹).
Buffers and Reaction Conditions-The binding buffer in all experiments was 10 mM sodium phosphate, pH 7.4, prepared by diluting stocks of Na2HPO4 and NaH2PO4. NaCl was added to achieve the required [Na+], taking into account contributions from the buffer salts. Minor salt-induced changes in pH were not expected to influence DNA binding, which is pH-independent between pH 6.7 and 9.0 (14). Samples were extensively incubated under the stated conditions to achieve equilibrium; high viscosity samples were routinely equilibrated for up to 24 h.
Filter Binding Experiments-Experimental procedures were as described (3,9) with the following modifications. Double-stranded radiolabeled oligonucleotides were prepared by labeling one strand (1 nmol) with [γ-32P]ATP, quantified by UV spectrometry, and annealed with the complementary strand at a 1.2-fold excess. The specific activity (SA) of the labeled DNA was calculated by scintillation counting of an aliquot (of known total concentration) on the day of the experiment. Observed binding to the filters as a function of total DNA oligonucleotide concentration [D]_t was fitted to Equation 1, where B_0 is the concentration of PU.1 ETS-bound radiolabeled DNA, and ε is the filter efficiency. Background DNA binding to filters was measured in ETS-free samples and estimated simultaneously with ETS-containing samples to define the linear coefficient C. B_0 is computed numerically as the real, physical root of a quadratic binding equation. SA is expressed as a fraction of [D]_t. When the concentration of PU.1 ETS (p_t) and SA enter as constants, no significant correlation was observed among the remaining parameters.
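The fitted equation itself is not reproduced in this text, so the following is only a minimal sketch of how a "real, physical root" of a binding quadratic can be computed. It assumes standard 1:1 mass-action binding, B² − (p_t + [D]_t + K)B + p_t[D]_t = 0, which may differ in form from the authors' equation. The function name `bound_complex` and its arguments are hypothetical.

```python
import math


def bound_complex(p_total, d_total, k_d):
    """Equilibrium concentration of a 1:1 protein-DNA complex.

    Assumed model (not taken from the paper):
        B**2 - (p_total + d_total + k_d) * B + p_total * d_total = 0
    The smaller root is the physical one because it is the only root
    satisfying 0 <= B <= min(p_total, d_total).
    All concentrations share the same units (e.g. molar).
    """
    s = p_total + d_total + k_d
    # The discriminant is never negative: s**2 - 4*p*d >= (p - d)**2.
    disc = s * s - 4.0 * p_total * d_total
    # Rearranged form of (s - sqrt(disc)) / 2 that avoids subtracting
    # two nearly equal numbers when binding is weak.
    return 2.0 * p_total * d_total / (s + math.sqrt(disc))
```

In a fit, this root would be evaluated at each [D]_t and scaled by the filter efficiency and specific activity before comparison with the measured counts; that scaling is left out here because its exact form is not given in the text.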
DNA Footprinting-ETS-DNA binding was performed in 50 µl of binding buffer containing ~1 nM singly labeled DNA fragment, PU.1 ETS domain, acetylated BSA (100 ng/µl), and poly(dAT)·poly(dAT) (10⁻⁵ M bp). In some experiments requiring higher sequestration of nonspecific binding, purified salmon testes DNA (0.75 mM bp) was used instead of the synthetic copolymer. At equilibrium, enzymatic or chemical footprinting was performed under conditions that introduced <30% total cleavage in order to maximize the fraction of singly cleaved DNA. Selected chemical sequencing reactions were also performed for base identification. DNase I footprinting was performed following standard procedures. Hydroxyl radical footprinting was performed as described by Tullius and Dombroski (15). For methylation protection footprinting, dimethyl sulfate (DMS) was diluted 1:5 in water just prior to use. One µl was thoroughly mixed into the ETS-DNA mixture. After 20 s, the reaction was quenched by ethanol precipitation. The recovered DNA was dissolved in 100 µl of T0.1E, phenol-extracted, and washed with diethyl ether to remove protein. Piperidine was added to 10%, and the sample was heated at 90 °C for 5 min and reprecipitated with ethanol. Purified DNA was separated by standard denaturing electrophoresis.
Osmotic Stress Experiments-The effect of osmolytes on the binding of PU.1 ETS to various binding sites was measured as a competition for 10 nM PU.1 ETS by 1 nM high affinity DNA fragment and graded concentrations of various DNA oligonucleotides. Acetylated BSA was also present at 100 ng/µl in a 20-µl volume. Osmolytes were dissolved or diluted in binding buffer; samples containing triethylene glycol were adjusted for total [Na+]. Stoichiometric molality was converted to osmolality by interpolation of reported values. Data for triethylene glycol, glycine betaine, and sucrose come from the Laboratory of Physical and Structural Biology at the NICHD, National Institutes of Health. Data for nicotinamide and glycerol are taken from Refs. 16 and 17, respectively. At equilibrium, samples were loaded onto a running 6% polyacrylamide gel. After electrophoresis, the gel was stained with SYBR Gold (Invitrogen). No significant changes in quantum yield due to protein binding have been reported for the SYBR series of dyes (18). Fractional bound fragment (f_b) as a function of total DNA oligonucleotide concentration ([D_1]_t) was fitted to the positive, real root of a cubic equation (Equation 2) whose zeroth-order coefficient is −K_1 p_t. This model yields absolute equilibrium dissociation constants for the DNA fragment (K_1) and oligonucleotide (K_2) from a competition experiment. When total concentrations of DNA fragment ([D_2]_t = 1 nM) and PU.1 ETS domain (p_t = 10 nM) enter as constants, no significant correlation was observed among the fitted values of the dissociation constants.
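The competition model can be illustrated numerically without the closed-form cubic. The sketch below is not the authors' Equation 2: it solves the equivalent mass-balance condition for free protein by bisection, assuming both DNAs bind the protein 1:1 and independently. The function names and argument names are hypothetical; `k1` and `k2` follow the text's assignment of K_1 to the DNA fragment and K_2 to the competing oligonucleotide.

```python
def free_protein(p_total, d_frag, d_oligo, k1, k2, tol=1e-12, max_iter=200):
    """Free protein concentration when two DNAs compete for one protein.

    Mass balance (assumed 1:1 binding to each DNA):
        p_total = p + d_frag * p / (k1 + p) + d_oligo * p / (k2 + p)
    The right-hand side rises monotonically with p, so the single
    physical root lies in [0, p_total] and bisection always converges.
    """
    def excess(p):
        return p + d_frag * p / (k1 + p) + d_oligo * p / (k2 + p) - p_total

    lo, hi = 0.0, p_total
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            hi = mid
        else:
            lo = mid
        if hi - lo <= tol * p_total:
            break
    return 0.5 * (lo + hi)


def fraction_fragment_bound(p_total, d_frag, d_oligo, k1, k2):
    """Fraction of the DNA fragment bound by protein (f_b in the text)."""
    p = free_protein(p_total, d_frag, d_oligo, k1, k2)
    return p / (k1 + p)
```

With the totals stated above held fixed (p_total = 10e-9, d_frag = 1e-9, in molar units), evaluating `fraction_fragment_bound` over a series of `d_oligo` values gives a titration curve of the kind that would be fitted for `k1` and `k2`.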
Computations-All parameters ± S.E. were estimated by nonlinear regression. Electrophoretic data were digitized using a Storm 860 instrument (GE Healthcare). For phosphorimaging data, lane traces were fitted to a superposition of Lorentzian functions without a baseline (19). Stained gels (excitation/emission = 450/520 nm) were fitted with Gaussian peaks on a heuristic baseline to account for background staining. Structural models were rendered using PyMOL; water-accessible cavities were identified using Caver (20).

RESULTS
We have carried out a detailed comparison of the structural and binding properties of high affinity, low affinity, and nonspecific PU.1 ETS-DNA complexes (Table 1). They were selected from studies in which the affinities of specific binding sites have been directly determined (3,9,14). These sequences are variations of the λB motif of the Igλ2-4 enhancer, a native high affinity PU.1 binding site (21). Based on the PU.1 ETS-DNA co-crystal structure (8), the altered flanking positions contact the ETS domain via backbone phosphate or deoxyribose atoms only. The nonspecific sequence is derived from [−]GC, in which the 5′-GGAA-3′ consensus has been changed to 5′-GAGA-3′. The affinities of these sequences for PU.1 ETS were determined by filter binding titrations at 150 mM Na+ and pH 7.4 (supplemental Fig. S1). The equilibrium dissociation constants of [−]GC and [+]TG yield a difference of ~160-fold or 13 kJ/mol in free energy (Table 1), in agreement with reported values (3). The dissociation constant of the nonspecific (NS) sequence, which had not been characterized previously, is 1.5 ± 0.1 µM, about 10-fold higher than that of [+]TG. Relative to the parent [−]GC site, this mutation of the core consensus results in a loss of over 2,000-fold in affinity, equivalent to 19 kJ/mol in binding free energy.
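The conversion between fold differences in affinity and free energy quoted above follows from ΔΔG = RT ln(ratio of dissociation constants). The short sketch below reproduces the arithmetic. The temperature of 298.15 K is an assumption, since the text does not state the temperature of the titrations, and the function name is hypothetical.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def ddg_from_fold_change(fold_change, temperature_k=298.15):
    """Free energy difference (kJ/mol) for a given ratio of dissociation constants.

    temperature_k defaults to 25 degrees C, which is an assumption here.
    """
    return R_KJ * temperature_k * math.log(fold_change)


print(round(ddg_from_fold_change(160)))   # 13, the [-]GC vs [+]TG difference
print(round(ddg_from_fold_change(2000)))  # 19, the [-]GC vs NS difference
```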
The similarity in affinity between the [+]TG and NS sites calls into question the extent to which low affinity "sequence-specific" binding is differentiated from "nonspecific" binding. Such differences should be manifest in the structure of the protein-DNA interface. DNase I and hydroxyl radical footprinting were therefore performed to characterize the contact interface for the three PU.1 ETS-DNA complexes. Both the [−]GC and [+]TG complexes yielded footprints characteristic of sequence-specific ETS-DNA complexes (Fig. 1 and supplemental Fig. S2). Specifically, the TTCC strand shows a strongly hypersensitive band at N1 (5′-TpTpCpCpN1↓pN2pT-3′) just 3′ to the core consensus. This hypersensitivity is a universal hallmark of sequence-specific ETS-DNA complexes (22-26). It is attributed to a widening of the minor groove by the recognition helix (h3) at the major groove of the core consensus. When probed by hydroxyl radicals, the footprints for both [−]GC and [+]TG sequences are highly similar (supplemental Fig. S2, C-H) and correspond to reported examples of sequence-specific PU.1 ETS-DNA complexes (22,27). Hydroxyl radicals attack deoxyribose hydrogens in the minor groove (28). Accordingly, positions flanking the core consensus are protected, whereas those along the core consensus itself are fully accessible. The nonspecific complex, in contrast, produced a diffuse footprint without DNase I hypersensitivity or protection from hydroxyl radicals at the corresponding flanking positions. We conclude, therefore, that the [+]TG sequence is a low affinity but sequence-specific binding site.
High Affinity PU.1 ETS Binding Is Associated with Net Water Uptake-The presence of water clusters along the interface of the PU.1 ETS-DNA co-crystal structure suggests that structural waters play a dominant role in high affinity sequence-specific binding. To test this possibility, we used the osmotic stress technique (18) to uncover differences in water uptake or release upon binding. As shown in Fig. 2, competitive binding to the high affinity [−]GC oligonucleotide was destabilized in the presence of osmolyte. In contrast, competitive binding to the low affinity but sequence-specific [+]TG oligonucleotide appeared to be indifferent. The latter observation was not an artifact arising from native electrophoresis of a low affinity complex because this technique was able to resolve the nonspecific complex. In the absence of osmolyte, the fitted dissociation constants agree with those measured under the same solution conditions by filter binding (Table 1). Inclusion of osmolyte (triethylene glycol) in the gel matrix (31) made no discernible difference in the data.
In total, we measured the dependence of the binding constant K_B (= 1/K_2 from Equation 2) on the osmolality of four osmolytes: triethylene glycol, glycine betaine, nicotinamide, and sucrose. Osmolality (Osm) corresponds to osmotic pressure Π and water activity a_w via Osm = Π/RT = −55.5 ln a_w, so that

$$\frac{\partial \ln K_B}{\partial \mathrm{Osm}} = -\frac{\Delta\Gamma_{PW}}{55.5} \qquad (3)$$

where ΔΓ_PW represents the change in preferential hydration of the PU.1 ETS-DNA complex relative to the unbound species (29). Negative values of ΔΓ_PW arise from stabilization of the complex by osmolytes and correspond to a net release of water molecules upon binding. The effects of the four osmolytes on complex stability for [−]GC, [+]TG, and the NS sequences are shown in Fig. 3; estimates of the slopes and ΔΓ_PW are given in Table 2.
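As an illustration of how a slope of ln K_B against osmolality translates into a hydration change, the sketch below performs an error-weighted straight-line fit and converts the slope using the factor of 55.5 quoted above. It assumes the sign convention described in the text, in which destabilization by osmolyte (a negative slope) corresponds to positive ΔΓ_PW, i.e. net water uptake. The function names are hypothetical, and the data in the usage note are synthetic, not measurements from this study.

```python
WATER_MOLALITY = 55.5  # mol water per kg, the factor used in the text


def weighted_line_fit(x, y, sigma):
    """Error-weighted least-squares fit of y = intercept + slope * x.

    Each point is weighted by 1 / sigma**2. Returns (slope, intercept).
    """
    w = [1.0 / s ** 2 for s in sigma]
    sw = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = sw * sxx - sx * sx
    slope = (sw * sxy - sx * sy) / det
    intercept = (sxx * sy - sx * sxy) / det
    return slope, intercept


def hydration_change(slope):
    """Delta Gamma_PW from d(ln K_B)/d(Osm), assuming slope = -Delta Gamma_PW / 55.5."""
    return -WATER_MOLALITY * slope
```

For example, fitting the synthetic points `x = [0.0, 1.0, 2.0]`, `y = [20.0, 19.1, 18.2]` with `sigma = [0.1, 0.2, 0.1]` returns a slope of about −0.9, which `hydration_change` converts to a net uptake of roughly 50 water molecules.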
The sequence-dependent responses to osmolytes as shown in Fig. 2 are reproduced for all four osmolytes. The destabilization of the high affinity [−]GC site (positive ΔΓ_PW) indicates a net uptake of water upon binding to PU.1 ETS. Curvature is observed with increasing osmolality, but the dependence of ln K_B on osmolality remained linear up to at least 2.5 osm. In contrast, the low affinity, sequence-specific site [+]TG is essentially insensitive to all osmolytes up to 6 osm, suggesting little, if any, net change in hydration between the bound and unbound states. This apparent coupling of water uptake and high affinity binding suggests that water molecules play a specific, stabilizing role in the sequence-specific complex. In contrast, the nonspecific complex is associated with net water release (negative ΔΓ_PW). It is also more strongly stabilized by nicotinamide than the other osmolytes, which behave similarly to each other with respect to the specific complexes. Clearly, differences in hydration are strongly related to sequence specificity and affinity for PU.1 ETS binding. Preferential hydration and preferential exclusion of osmolyte in the vicinity of biopolymers are coupled properties. Consequently, the magnitude of ΔΓ_PW is expected to differ when the same macromolecular system is probed with osmolytes differing in physicochemical properties and therefore their extent of preferential exclusion. The simplest explanation for the apparent lack of dependence on osmolyte identity is that ΔΓ_PW is dominated by the movement of water into or out of a cavity that excludes all of the osmolytes used (29). To check that the osmolytes are indeed acting "osmotically," the integrity of the protein-DNA interface was confirmed by DNase I footprinting at 1.5 osm (supplemental Fig. S3).
Differential Accessibility of PU.1 ETS/DNA Interface to Osmolytes-Based on our interpretation (which we elaborate at length under "Discussion") of the osmotic stress data, the low affinity [+]TG complex, which is insensitive to osmolyte, should lack the asymmetrically accessible (to water but not osmolytes) compartment found with the [−]GC site. The co-crystal structure suggests that the most likely candidate for such a cavity is the major groove of the core consensus, where a number of crystal waters are observed. Therefore, we probed the protein-DNA interface by titration footprinting using DMS. DMS methylates the N7 positions of purines, which are accessible only from the major groove. It is also comparable in size with the osmolytes used in the osmotic stress experiments and is therefore well suited to probe the interface on the GGAA strand.
The DMS footprints of the GGAA strand for the [−]GC and [+]TG complexes (Fig. 4) show clearly that PU.1 ETS strongly protects the guanines (G_−1 and G_0) in the core consensus of [−]GC. In contrast, the corresponding guanines in the [+]TG complex remain accessible to DMS.
Fig. 3. … was subject to osmotic stress at equilibrium and measured by electrophoretic mobility shift, as shown in Fig. 2. The osmolytes used were triethylene glycol, glycine betaine, nicotinamide, and sucrose. Values of K_2 acquired from best fits of the titration data to Equation 2 were averaged from replicate experiments (up to three for each condition) and converted to ln K_B ± S.E. (error bars). The lines represent error-weighted fits of the aggregate data for each binding site to a line or parabola ([−]GC, dashed line). For [−]GC, there is a significant increase in the weighted sum of squares if the full range of data is fitted to a line (F test, p = 0.00268) but not for the subset up to 2.5 osm (p = 0.71). The dependence on nicotinamide for the NS binding site was fitted separately from data for the other osmolytes. The fitted estimates of the slopes are given in Table 2.
Fig. 4. … (Fig. 1) and hydroxyl radicals (supplemental Fig. S2) quantitatively score the fractional binding under identical conditions. For both sites, the major groove at the flanking contacts is solvent-exposed and therefore unprotected from DMS methylation.
… has been determined previously (9). In all cases, Z is significantly smaller (≤4) than the seven phosphate contacts observed in the co-crystal structure. We measured the dependence of PU.1 …

DISCUSSION
Sequence-specific PU.1 ETS-DNA complexes exhibit structural and thermodynamic heterogeneity that implicates the formation of specifically coordinated water molecules along the protein-DNA interface in high affinity binding. The application of osmotic stress destabilizes the high affinity complex with the [−]GC site but exerts little effect on the low affinity complex with [+]TG. The major groove of the core consensus excludes dimethyl sulfate in the [−]GC complex but remains accessible in the low affinity [+]TG complex. Finally, high affinity complexes are significantly less sensitive to salt than expected for the number of known phosphate contacts in the co-crystal structure, whereas good agreement was found for the low affinity site [+]TG. The ETS domain of PU.1, as with ETS domains in general, exhibits a dispersion of affinities toward a large number of binding sites differing only in the bases flanking the core consensus. If flanking sequence variants share a common structural framework, the present data argue for a physical linkage of flanking and core contacts by specifically bound water molecules.
Protein-Solvent Interactions as Window into PU.1 ETS-DNA Binding-Protein-DNA binding is typically accompanied by dehydration (i.e. water release) of the contact interface. Because macromolecular surfaces in general preferentially exclude solutes (or, equivalently, are preferentially hydrated) (34-36), osmolytes drive complex formation to minimize these solute-exposed surfaces. Moreover, high affinity interactions that appose highly complementary surfaces are expected to displace more water than low affinity and nonspecific complexes (18). Our observation that PU.1 ETS binding to the high affinity [−]GC site is destabilized by osmotic stress is therefore highly unusual. Perturbations due to direct binding of the osmolytes are improbable based on the highly similar effects of all four osmolytes, which vary markedly in physicochemical properties (supplemental Table S1). If the observed reduction in affinity were due to osmolyte binding, the binding constants and capacities would have to be implausibly similar for all four osmolytes. Electrostatic perturbations are also unlikely because these osmolytes differ qualitatively in their effects on the solution dielectric. For both the [−]GC and [+]TG sites, all four solutes behave identically as a function of osmolality (Fig. 3).
In the paradigm of preferential hydration, the apparent uptake of water by [−]GC implies an increase of solute-exposed surface area upon protein-DNA binding. However, any major folding coupled to DNA binding is incompatible with spectroscopic data indicating that the unbound ETS domain is substantially folded as observed in the co-crystal structure (14,33). Calculations based on a rigid body association give a loss of >2,400 Å² of solvent-accessible surface area (14), a relatively small but still significant amount of burial.
To reconcile these observations, we propose that sequence-specific binding by PU.1 ETS is dependent on specific hydration. Specifically, optimal interactions within the contact interface require the formation of a network of water-mediated contacts in a cavity that sterically excludes other solutes. The resultant osmotic pressure penalty would account for the observed linear dependence on osmolality up to about 2.5 osm and the lack of dependence of this effect on osmolyte identity. At higher osmolality, osmotic pressures become sufficient to strip these specifically bound water molecules, leading to the observed curvature in Fig. 3. The putative identification of this cavity by DMS protection in the [−]GC complex, but not the low affinity [+]TG one, is consistent with this explanation. An analysis of the co-crystal structure (8) also reveals such a solute-excluding cavity at the core consensus (Fig. 6A). An osmolyte-accessible interface of the [+]TG complex is manifest as insensitivity to osmolytes. In the model described in the following, the loss of osmotic responsiveness reflects a suboptimal core interface at the [+]TG site that fails to build specific hydration networks.
If the ∂ln K_B/∂Osm slope is interpreted strictly in terms of osmotic stress (Equation 3), it translates to a net uptake of 96 ± 11 water molecules for the high affinity [−]GC site relative to the low affinity [+]TG site (Table 2). This quantity corresponds to associated water molecules that are detectable thermodynamically upon binding. If at least some of the sequestered water molecules participate in ordered interactions within the binding interface, this slope would include high energy contributions (37) in addition to the osmotic work of filling the cavity. Eleven such water molecules are observed at the core consensus in the co-crystal structure (8). Given a nominal molecular volume of 30 Å³ for water (assuming normal density), these waters are just accommodated within the volume of the solute-excluding cavity (337 Å³) computed in Fig. 6A.
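The packing argument in the preceding paragraph is simple arithmetic, sketched below using only the three numbers quoted in the text. The variable names are hypothetical, and treating each water as occupying exactly its nominal volume is the same simplification the text makes.

```python
WATER_VOLUME_A3 = 30.0    # nominal volume per water molecule, cubic angstroms (from the text)
CAVITY_VOLUME_A3 = 337.0  # solute-excluding cavity volume, cubic angstroms (from the text)
CRYSTAL_WATERS = 11       # waters observed at the core consensus (from the text)

# Volume taken up by the observed waters at nominal density.
occupied_volume = CRYSTAL_WATERS * WATER_VOLUME_A3

# Largest whole number of waters the cavity could hold at nominal density.
max_waters = int(CAVITY_VOLUME_A3 // WATER_VOLUME_A3)

# True when the observed waters fit inside the computed cavity.
waters_fit = occupied_volume <= CAVITY_VOLUME_A3

print(occupied_volume, max_waters, waters_fit)  # 330.0 11 True
```

The 11 crystallographic waters therefore fill the cavity to its nominal capacity, which is what "just accommodated" refers to.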
Model for Site Selection by PU.1 ETS Domain-All of the present data can be accounted for by a model for site selection in which core interactions are coupled to flanking contacts by bridging water molecules. Although the co-crystal structure does not permit an unambiguous assignment of hydrogen atoms, a network of hydrogen bonds can be drawn from backbone to nucleobase contacts via specifically bound water molecules (Fig. 6B). In the [−] segment, a water molecule that mediates a phosphate oxygen contact from Tyr252 is also coordinated by Arg235, one of the key arginines involved in base-specific contacts in the core consensus. This train of water-mediated contacts extends through Glu228 and Thr226 in the core consensus to Asn221, Lys223, and Lys229 in the [+] flanking segment.
In the context of specific hydration, indirect readout of DNA conformation (2) relates sequence-dependent effects on helical twist, roll, and rise in the flanking segments to the energetics of establishing ordered water networks. Considering the tight packing within the core cavity (Fig. 6A), such networks must be highly constrained for a given flanking sequence variant. The required cooperativity is consistent with destabilization of the [−]GC complex by osmolytes being stronger than expected for a simple osmotic penalty alone. It also explains the strong enthalpy-entropy compensation observed among specific PU.1 ETS-DNA complexes (9). Because flanking sequence variants do exhibit a reasonable dispersion of affinities (2,3), alternative formations are probably tolerated at the expense of binding free energy. The present data demonstrate that the low affinity site [+]TG, which appears to be water-neutral, does adopt the conformation characteristic of sequence-specific complexes. Assuming that [−]GC and [+]TG are the most "hydrated" and "unhydrated" binding sites, respectively, the effective energetic contribution of the structural waters would be reflected by the difference in their affinities, or 13 kJ/mol of free energy under physiological conditions.
In addition to their structural role, we propose that specifically bound waters play a related role in controlling DNA curvature. Hypersensitivity to DNase I, a minor groove probe, is an invariant feature of sequence-specific ETS-DNA complexes (22-26). Because minor groove width is defined by residues i and i + 2 in the complementary strand (38), the [−] flanking bases in the TTCC strand (5′-GCT-3′ in [−]GC, 5′-TTT-3′ in [+]TG) are directly involved in defining the minor groove width of the core consensus. Because effects on groove width are in turn coupled to curvature of the helical axis (8° in the co-crystal structure), the flanking contacts represent key positioning elements for optimizing contact by the recognition helix at the core consensus.
The physical driving force for curvature is provided by the asymmetric neutralization of backbone flanking phosphates (three in the [−] segment, four in the [+] segment of the co-crystal structure). Complete neutralization of these phosphates by chemical derivatization to methylphosphonates induces a 28° bend (39). We believe that water molecules involved in these contacts attenuate the extent of neutralization. They may substitute salt bridges with hydrogen bonds or reduce the charge density of a nearby salt bridge via an inductive effect. In the co-crystal structure, specific water-phosphate contacts are found in four of the seven phosphate contacts. In particular, backbone contacts from Lys219 are strictly water-mediated. A structural analysis reveals that DNA curvature is focused at the core consensus as judged by local changes in minor groove widths and helical twist (supplemental Fig. S4). For low affinity sites, we propose that more extensive neutralization of backbone phosphates in the flanking segments exaggerates the bend at the core consensus and enhances its accessibility to (larger) osmolytes. Because the core cavity in a high affinity complex is tightly packed (Fig. 6A), a larger, more distorted cavity would probably result in a suboptimal core interface that debilitates the formation of specific hydration networks bridging the recognition helix and core bases.
Experimentally, our salt dependence data are consistent with this proposal. All measured high affinity PU.1 ETS binding sites depend on [Na+] to an extent that substantially underestimates the number of phosphate contacts if they are assumed to be fully neutralized (9). The present data on the low affinity site [+]TG show a pronounced dependence on [Na+], which closely tracks the number of phosphate contacts. Moreover, the polyelectrolyte theory predicts a purely entropic effect of counterion release upon DNA binding (40). An interactive effect with specific hydration would explain thermodynamic observations (9) that salt exerts an enthalpic effect on binding as well.
Based on the foregoing, our model predicts that a specific site of intermediate affinity for the PU.1 ETS domain should exhibit corresponding responses to osmolytes, DMS protection, and salt. To test our model, we used the λB site (AAAGGAAGTG), whose affinity for PU.1 ETS is intermediate between those of the [−]GC and [+]TG sites (3). Application of osmotic stress destabilized binding with a slope of −0.90 ± 0.11 (Fig. 7), about half the value for the [−]GC site (cf. Table 2). PU.1 ETS binding protects the core consensus from DMS attack but less than observed for the [−]GC site. Finally, the salt dependence of the λB site is Z = 4.2 ± 0.9 (9), between that for [−]GC and [+]TG (Table 3). Thus, the linkage proposed in our model, which connects specific hydration, formation of an osmolyte-excluding cavity, and electrostatics in sequence-specific PU.1 ETS interactions, is supported by data from three different specific sites, at least in rank order.
Divergence of Sequence Discrimination of ETS Domains-ETS domains share relatively weak sequence homology but strong structural conservation and similar target site preference (41). PU.1 is the most sequence-divergent member of the ETS family, a feature that may be correlated with its extensively hydrated protein-DNA interface. "Canonical" ETS-DNA complexes such as Ets-1 (11,12) and GABPα (42) feature a higher density of direct protein-to-nucleobase contacts than PU.1. Specific hydration might therefore evolve as a compensatory mechanism to replace some of the "missing" anhydrous contacts. The involvement of specifically bound water at the backbone contacts by PU.1 is also consistent with the substantially smaller DNA curvature (8°) compared with Ets-1 (27°) and GABPα (18°) despite sharing identical phosphate contacts (25). Finally, high affinity binding for the minimal ETS domain of Ets-1 exhibits dissociation constants at 10⁻¹² M under physiological conditions (43). This 10²-fold difference in affinity (cf. Table 1) reflects, in part, the intrinsic free energy cost of building structural waters in high affinity PU.1 complexes.
Nature of Nonspecific PU.1 ETS Complex-Nonspecific binding by the PU.1 ETS domain is clearly distinguished from sequence-specific binding by DNase I and hydroxyl radical footprinting. It is also associated with a net release of water molecules, in contrast with specific binding, and in a manner that depends on the osmolyte's identity. Therefore, hydration is a major distinguishing feature in sequence discrimination by the PU.1 ETS domain. Our data suggest a nonspecific complex that is structurally defined but very different from that of sequence-specific ETS-DNA complexes.