Structures of the Ets Protein DNA-binding Domains of Transcription Factors Etv1, Etv4, Etv5, and Fev

Background: Transcription factors of the Ets family regulate development and cancer. Results: We show the details of DNA sequence recognition and identify structural changes that drastically affect DNA binding. Conclusion: The activity of Ets transcription factors is regulated by their oxidation state. Significance: Ets transcription factors may control the response of cells to oxygen levels.

The Ets transcription factor family consists of 28 genes in humans (1,2) containing the evolutionarily conserved 85 amino acid Ets DNA-binding domain (3), originally identified as a viral oncogene (E26 transformation-specific) (4). Ets proteins exhibit ubiquitous or tissue-specific expression (5) and are particularly involved in differentiation processes and the response to signaling pathways (6,7). Ets proteins may be classified based on the presence of additional structured domains (5). The small PEA3 subfamily is characterized by an N-terminal transactivation domain and consists of three members, Etv1 (Ets translocation variant 1, ER81), Etv4 (PEA3), and Etv5 (ERM) (8).
PEA3 transcription factors have roles in morphogenesis (9) and neuronal differentiation (10,11). PEA3 members are also oncoproteins (8), whose overexpression correlates with up-regulation of HER2/Neu and with progression in breast tumors (12). ETV1 is amplified in >40% of melanomas (13), and overexpression following chromosomal translocation of either ERG or Etv1 to the androgen-inducible TMPRSS2 promoter is present in most prostate tumors (14). Etv1 directly interacts with the androgen receptor (15) and drives the androgen receptor transcriptional response associated with aggressive prostate cancer (16). Etv1 target genes include hTERT (17) and matrix metalloproteinases (18) that may mediate cancer development. Furthermore, Etv1 may be subverted by other oncoproteins such as mutated (activated) KIT, which cooperates with elevated Etv1 levels to promote tumorigenesis (19). Etv4 and Etv5 also play roles in morphogenesis, fertility, and oncogenesis (8). Ewing's sarcomas result from chromosomal translocations that generate dominant transforming fusion proteins of the transactivation domain of the EWS protein with the ETS domain of one of the five Ets proteins Etv1, Etv4, Erg, Fli1, and Fev (Fifth Ewing Variant, PET-1) (20,21). Fev lacks additional domains and has a restricted tissue expression (21,22). Fev regulates serotonergic neuronal differentiation, being critical for normal anxiety/aggression development (23), although overexpression is associated with serotonin production in small intestine neuroendocrine tumors, stimulating tumor growth (24). Ets domains are thus a central nexus in tumor development and disease progression. Targeting transcriptional regulation in cancer with drugs is expected to be challenging (25), but Ets inhibitors have been developed. YK-4-279 targets EWS-FLI1, inhibits growth in Ewing sarcoma (26), and also inhibits Erg/Etv1-driven prostate cancer invasion (27). Although this compound may be useful for generic treatment of Ets-driven cancers, the lack of specificity could cause off-target effects with other Ets proteins, highlighting the need for further high resolution structural and biochemical studies of Ets proteins.
The Ets domain is a variant helix-turn-helix (winged helix) structure (28-31), comprising three α-helices and a four-stranded antiparallel β-sheet. Ets domains bind to the EBS (Ets-binding site) in dsDNA with the α3 helix inserted into the major groove, and invariant arginine and tyrosine residues hydrogen bonded to bases of the invariant 5′-GGA(A/T)-3′ sequence in the EBS (2). Although binding the GGA(A/T) core is a common property of Ets transcription factors, a genome-wide analysis of all Ets family members has established that Ets domains recognize up to nine bases (32) (three upstream and two downstream of the GGA(A/T) core), and they are classified into four distinct classes based on the sequence preferences for these flanking regions. Etv1, Etv4, Etv5, and Fev all belong to the large and diverse class I (containing 14 of 26 of the mammalian Ets transcription factors), which recognize a consensus sequence ACCGGAAGT(G/A). The structural basis for this sequence discrimination is unclear, as there are no direct base contacts outside of the core. Indirect readout mechanisms have been suggested, based on sequence-dependent DNA conformational preferences (33). However, no mechanism has been proposed for how sequence preferences would influence shape readout, nor for how the different classes of Ets domains achieve their observed sequence specificities outside the GGA core within this paradigm.
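As an illustration of the class I consensus quoted above, the following minimal sketch scans a DNA sequence and its reverse complement for ACCGGAAGT(G/A). The function names and the choice of a regular expression are assumptions made for this example and are not part of the methods used in this work; real binding sites tolerate deviations from the consensus that an exact pattern match will miss.

```python
import re

# Exact-match pattern for the class I consensus ACCGGAAGT(G/A).
CLASS_I = re.compile(r"ACCGGAAGT[GA]")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_class_i_sites(seq: str):
    """Yield (strand, start, match) for consensus hits on either strand.

    `start` is the 0-based offset of the hit within the strand on which
    it was found ("+" is the input, "-" its reverse complement).
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for m in CLASS_I.finditer(s):
            yield strand, m.start(), m.group()
```

Applied to the 20-mer EMSA probe described under "Experimental Procedures," this reports a single site on the forward strand.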
A number of other factors influence Ets DNA binding, including interaction with other transcription factors to recognize combined operators in a cooperative manner (34), cooperative binding of palindromic sequences by dimerization (35), and the presence of auto-inhibitory regions surrounding the Ets domain (36-38). DNA binding is also subject to post-translational regulation (39) with protein kinase A phosphorylation of Etv1 at Ser-334 (40) and the equivalent Ser-367 in Etv5 (41) repressing DNA binding. Several Ets proteins have been identified as being subject to redox control, with redox-sensitive cysteines present in GA-binding protein α (GABPα) affecting dimerization and DNA binding (42). However, as is the case for phosphorylation, the structural mechanism for this regulation is unclear.
Additional regulation of Ets DNA binding may occur at the DNA level, with several Ets motifs being over-represented in methylated genomic regions (43). Some Ets proteins are known to bind to their cognate sequence in vitro and in vivo only in a demethylated state, including GABPα (44,45) and ETS1 (46). Direct methylation of the class I EBS sequence is possible in two positions (CmGGAA/TTCCmG, where Cm denotes 5-methylcytosine); however, it is not known whether and to what extent this direct methylation affects Ets binding.
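The two methylatable positions can be located mechanically by finding the CpG step on each strand of the consensus site. The sketch below is illustrative only; the helper names are hypothetical, and it assumes that methylation is restricted to cytosines in a CpG context.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def cpg_cytosines(seq: str):
    """Return 0-based indices of cytosines immediately followed by G."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


top = "ACCGGAAGT"                  # class I consensus, 5' to 3'
bottom = reverse_complement(top)   # complementary strand, 5' to 3'

top_sites = cpg_cytosines(top)        # cytosine preceding the GGAA core
bottom_sites = cpg_cytosines(bottom)  # cytosine at the end of TTCC
```

Each strand carries exactly one CpG cytosine, which corresponds to the two positions written as CmGGAA and TTCCmG.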
Here, we present the crystal structures of the Ets domains of the entire PEA3 subfamily (Etv1, Etv4, and Etv5) together with the structure of the Erg family member (Fev). All proteins share a highly conserved core Ets structure and make extensive interactions with DNA. Comparisons of apo- and DNA-bound forms of Etv1, Etv4, and Fev allow us to identify residue movements and disorder-to-order transitions that occur upon DNA binding. In the DNA complex structures of Etv5 and Fev (which are determined to 1.9 and 2.0 Å resolution, respectively), we observe a network of coordinated water molecules contributing to sequence recognition upstream of the GGA core. Structural and biochemical analysis of Etv1 clarifies mechanisms of post-translational regulation in Ets proteins, including PKA-mediated phosphorylation at Ser-334 and the abrogation of Ets binding to sites that include a methylated CpG. Furthermore, a new Ets dimerization interface linked by a redox-sensitive disulfide bond is identified, and this dimerization is shown to result in a significant inhibition of DNA binding, potentially linking Ets transcription factors with cellular redox regulatory mechanisms.

Experimental Procedures
Cloning and Site-directed Mutagenesis-Plasmid DNA templates for full-length Etv1, Etv4, Etv5, and Fev were obtained from the Mammalian Gene Collection (I.M.A.G.E. Consortium Clone IDs 30345383, 3854349, 3050350, and 4130242, respectively) (47). Regions corresponding to the core Ets domains were amplified by PCR using Pfx DNA polymerase (Invitrogen, Paisley, UK). PCR products were inserted by ligation-independent cloning into the pNIC28-Bsa4 expression vector (GenBank™ accession number EF198106, encoding a tobacco etch virus (TEV)-cleavable (cleavage site shown by *) N-terminal His6 tag, MHHHHHHSSGVDLGTENLYFQ*SM), as described elsewhere (48). The initial Etv1 construct contained two primer-incorporated mutations (Y329S and P427S) and was thus designated Etv1 Y329S-P427S. A subsequent construct containing the wild-type sequence is referred to as Etv1. Additional mutants were generated in a full-length Etv1 construct using the megaprimer method (49). The Gly-326 to Asn-429 fragments of the Etv1 mutant derivatives were then subcloned for bacterial expression into pNIC28-Bsa4 as described and confirmed by sequencing. Expression plasmids were transformed into BL21 (DE3) Rosetta-R3 (48) or, when indicated, into Rosetta-gami™ 2 (Novagen).
Recombinant Protein Expression-Recombinant protein expression was induced by the addition of 0.1 mM isopropyl 1-thio-β-D-galactopyranoside to bacterial cultures grown at 37°C in TB (Terrific Broth) containing 50 µg/ml kanamycin to an OD600 of 3.0 in UltraYield baffled flasks (Thomson Instrument Co, Oceanside, CA). Cultures were further incubated at 18°C overnight. Selenomethionine-derivatized Etv1 Y329S-P427S expression was performed at 37°C in M9 minimal medium, supplemented with 0.4% glucose, 2 mM MgSO4, 0.1 mM CaCl2, and 50 µg/ml kanamycin. Cells were cultured to an OD600 of 0.8, and 25 µg/ml selenomethionine was added, along with leucine, isoleucine, and valine to 50 µg/ml, and lysine, threonine, and phenylalanine to 100 µg/ml. Cultures were further incubated until an OD600 of 1.2, and protein expression was induced by addition of isopropyl 1-thio-β-D-galactopyranoside and selenomethionine to final concentrations of 0.1 mM and 75 µg/ml, respectively. Cells were harvested after further overnight incubation at 18°C and stored at −80°C.
Protein Purification-For purification of Etv1, Etv4, Etv5, and Fev constructs, ~50 g of cell pellets were thawed and resuspended in buffer A (50 mM HEPES, pH 7.5, 500 mM NaCl, 5% glycerol, 10 mM imidazole, 0.5 mM tris(2-carboxyethyl)phosphine (TCEP)), with the addition of 1× protease inhibitor set VII (Merck, Darmstadt, Germany) and 15 units/ml Benzonase (Merck). Cells were lysed using sonication. Cell debris and nucleic acids were removed by addition of 0.15% polyethyleneimine, pH 7.5, and centrifugation at 40,000 × g for 1 h at 4°C. Clarified lysates were applied to a 3-ml Ni2+-iminodiacetic acid (Ni-IDA) immobilized metal ion affinity chromatography gravity flow column (Generon, Maidenhead, UK), washed with 20 column volumes (CV) of buffer A, followed by 20 CV of wash buffer (buffer A with 30 mM imidazole). Fractions were eluted with 5 × 2-CV aliquots of buffer A containing 300 mM imidazole and analyzed by SDS-PAGE, and relevant fractions were pooled and cleaved with His6-tagged TEV protease (1:20 mass ratio) overnight at 8°C. Imidazole was removed by concurrent dialysis during cleavage, using a 3.5-kDa MWCO snakeskin membrane (Thermo Fisher Scientific, Rockford, IL) in buffer B (20 mM HEPES, pH 7.5, 500 mM NaCl, 5% glycerol, 0.5 mM TCEP). TEV protease was removed from dialyzed proteins using Ni-IDA immobilized metal ion affinity chromatography (2-ml CV); the column was washed with an imidazole gradient in 20 mM steps to 100 mM in buffer B, and cleaved protein was pooled and concentrated with a 3-kDa MWCO centrifugal concentrator (Vivaproducts, Littleton, MA). Final separation was by size exclusion chromatography, using a HiLoad 16/60 Superdex S75 or 200 column equilibrated in buffer B, and run at 1.2 ml/min in buffer B. Protein identity was confirmed by LC/ESI-TOF mass spectrometry. Protein concentrations were calculated from A280 (Nanodrop) using the calculated molecular mass and extinction coefficients.
The scheme was identical for purification of disulfide-linked proteins but with the omission of TCEP from all buffers.
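The conversion from A280 to protein concentration follows the Beer-Lambert law, c = A / (ε · l). The sketch below shows the arithmetic; the extinction coefficient and molecular mass in the usage note are hypothetical placeholders rather than values for any construct in this work, and a 1-cm path length (to which Nanodrop readings are commonly normalized) is assumed.

```python
def molar_concentration(a280: float, epsilon_m_cm: float,
                        path_cm: float = 1.0) -> float:
    """Beer-Lambert law: concentration (M) = A / (epsilon * path length)."""
    if epsilon_m_cm <= 0 or path_cm <= 0:
        raise ValueError("extinction coefficient and path must be positive")
    return a280 / (epsilon_m_cm * path_cm)


def mg_per_ml(a280: float, epsilon_m_cm: float, mass_da: float,
              path_cm: float = 1.0) -> float:
    """Convert A280 to mg/ml (numerically equal to g/liter)."""
    return molar_concentration(a280, epsilon_m_cm, path_cm) * mass_da
```

For example, with an assumed ε of 30,000 M⁻¹ cm⁻¹ and an assumed mass of 12,500 Da, an A280 of 1.2 corresponds to 40 µM, or 0.5 mg/ml.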
Crystallization and Structural Determination-The purified Ets domains of Fev and Etv1 were crystallized with double-stranded DNA as follows. Oligonucleotides for co-crystallization were synthesized and used without further purification (Eurofins MWG Operon, Ebersberg, Germany). The oligonucleotides (5′-ACCGGAAGTG-3′) and (5′-CACTTCCGGT-3′), which contain the Ets core recognition sequence GGAA, were annealed by mixing 450 µM of each oligonucleotide in 10 mM Tris, pH 7.5, 50 mM NaCl, heating to 95°C for 5 min, and cooling to 21°C slowly in a heating block. Frozen Etv1 Y329S-P427S or Fev was thawed rapidly, and aggregates were removed by microcentrifugation at 14,000 × g for 10 min at 4°C. To form DNA complexes, proteins were mixed with annealed oligonucleotides in a 1:1.1 molar ratio in a buffer consisting of 10 mM HEPES, 166 mM NaCl, 5% glycerol, and 0.5 mM TCEP at protein concentrations of 5.3 and 6 mg/ml, respectively, and incubated on ice for ~30 min. The protein/DNA mixtures were then concentrated by ultrafiltration using a 3000-Da MWCO centrifugal concentrator to an estimated protein concentration of 12.5 and 16 mg/ml for Etv1-DNA and Fev-DNA, respectively. For crystallization of the Etv4 and Etv5 DNA complexes, the same procedure was repeated but using the oligonucleotides (5′-ACCGGAAGTG-3′) and (5′-ACTTCCGGTC-3′). In all cases, sitting drop vapor diffusion crystallization trials were set up with a Mosquito (TTP Labtech) crystallization robot. Crystals were obtained under the conditions summarized in Tables 1 and 2. Crystals were cryo-protected by transfer to a solution of mother liquor supplemented with 25% ethylene glycol and flash-cooled in liquid nitrogen. Datasets were collected for all crystals at Diamond Light Source beamlines I04 (Etv1, Etv1-DNA, and Fev-DNA), I02 (Fev), I24 (Etv4-DNA), and I03 (Etv4, Etv5-DNA, and selenomethionine Etv1-DNA).
Diffraction data were processed with the programs autoPROC (Etv1 and Etv1-DNA) (50), MOSFLM (Fev) (51), and XDS (Etv4, Etv4-DNA, Etv5-DNA, and Fev-DNA) (52). The structures of Etv1, Etv4, Etv4-DNA, Etv5-DNA, Fev, and Fev-DNA were solved by molecular replacement using the program PHASER (53). Initial attempts to solve the Etv1-DNA complex by molecular replacement were unsuccessful, and for this reason, the selenomethionine-derivatized protein was produced, and SAD data were collected close to the selenium peak wavelength (0.976 Å). Selenium atom positions were located with the program SHELX (54) and refined using the program SHARP (55). Model building and manipulation were performed in COOT (56), and the structures were refined using autoBUSTER (Global Phasing) (Etv1, Etv1-DNA, and Fev), REFMAC (Etv4) (57), and PHENIX REFINE (Etv4-DNA, Etv5-DNA, and Fev-DNA) (58). A summary of the crystallization conditions, data processing, phasing, and refinement statistics for all datasets is shown in Tables 1 and 2.
LC/ESI-TOF Mass Spectrometry of Intact Proteins-30-µl protein samples at 0.02 mg/ml in 0.1% formic acid were injected onto a 4.6 × 50 mm Zorbax 5-µm 300SB-C3 column and resolved by reversed-phase chromatography at 40°C. The solvent system was 0.1% formic acid in double distilled H2O (buffer A) and 0.1% formic acid in methanol (buffer B), with 1 min at 5% buffer B and then a linear gradient of 5-95% buffer B over 6 min at 0.5 ml/min. Protein intact mass was determined using an MSD-TOF electrospray ionization orthogonal time-of-flight mass spectrometer (Agilent Technologies, Palo Alto, CA) operated in positive ion mode.

In Vitro Protein Phosphorylation-Phosphorylation of Etv1 Ser-334 by protein kinase A was performed in vitro using 2.5 units of bovine heart PKA catalytic subunit (P2645, Sigma) with 40 µg of Etv1, in modified buffer B (20 mM HEPES, pH 7.5, 500 mM NaCl, 5% glycerol, 1 mM ATP, 10 mM MgCl2, and 5 mM dithiothreitol). Reactions were performed for 4 h at room temperature and quenched by the addition of 50 mM EDTA, followed by buffer exchange into buffer B with Micro Bio-Spin P-6 columns (Bio-Rad) and confirmation of phosphorylation by ESI-TOF mass spectrometry of the intact protein.
Electrophoretic Mobility Shift Assays-Equilibrium binding constants (Kd) of Ets constructs were estimated by EMSA. Unless otherwise specified, the dsDNA probe contained the single consensus Ets-binding site used in crystallization (ACCGGAAGTG), with oligonucleotides Etv1ALF (5′-ATCTCACCGGAAGTGTAGCA-3′) and Etv1ALR (5′-TGCTACACTTCCGGTGAGAT-3′). Substrates were prepared by 32P-end-labeling one oligonucleotide and annealing to the complementary strand in 10 mM Tris-Cl, pH 7.5, 50 mM NaCl, prior to purification with Micro Bio-Spin P-6 columns. Proteins were generally titrated from 10^−14 to 10^−6 M. The DNA probe was used at 2 nM for the initial estimations of Kd (termed Kd app1 in Table 3). As the dissociation constants of wild-type Etv1 were estimated to be subnanomolar, we repeated some of the measurements with a DNA probe concentration of 0.1 nM to maintain an excess of protein (results listed as Kd app2 in Table 3). As seen in Table 3, the effects of mutations are qualitatively similar at both DNA concentrations. EMSA buffer is composed of 50 mM Tris-HCl, pH 7.5, 50 mM NaCl, 2 mM MgCl2, 0.01% Tween 20, and 5% glycerol. Proteins were reduced with 10 mM TCEP prior to assay unless otherwise specified. Reactions were performed for 1 h at room temperature, prior to mixing with loading dye to 0.25%, and resolved by 10% native PAGE at 150 V for 1 h at room temperature. Gels were analyzed using phosphorimaging, and the apparent Kd value was estimated from plots by nonlinear regression with a least squares fit, using a specific one-site binding model with Hill slope (Prism, GraphPad, San Diego). Other EMSA substrates are described in the respective figure legends.
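The one-site specific binding model with Hill slope has the form B = Bmax · P^h / (Kd^h + P^h), where P is the protein concentration and h is the Hill slope. The sketch below evaluates this model and estimates Kd by a coarse least-squares grid search. It is a stand-in for illustration, not the regression performed in Prism: the function names are hypothetical, Bmax is fixed at 1 on the assumption that the data are expressed as fraction of probe shifted, and free protein is approximated by total protein, which holds only when the probe concentration is well below Kd.

```python
import math


def fraction_bound(protein_m, kd_m, hill=1.0, bmax=1.0):
    """One-site specific binding with Hill slope.

    B = Bmax * P^h / (Kd^h + P^h), evaluated through the ratio (P/Kd)^h
    so that very small concentrations do not underflow.
    """
    if protein_m <= 0:
        return 0.0
    ratio = (protein_m / kd_m) ** hill
    return bmax * ratio / (1.0 + ratio)


def fit_kd(protein_m, bound, log_kd_range=(-14.0, -6.0), step=0.01):
    """Coarse least-squares grid search over log10(Kd) and Hill slope.

    Returns (kd_m, hill, sum_of_squares). Bmax is held at 1.
    """
    best = None
    n_kd = int(round((log_kd_range[1] - log_kd_range[0]) / step)) + 1
    for i in range(n_kd):
        kd = 10.0 ** (log_kd_range[0] + i * step)
        for j in range(51):  # Hill slope from 0.5 to 3.0 in steps of 0.05
            hill = 0.5 + 0.05 * j
            sse = sum((fraction_bound(p, kd, hill) - b) ** 2
                      for p, b in zip(protein_m, bound))
            if best is None or sse < best[2]:
                best = (kd, hill, sse)
    return best
```

A typical call passes the titration concentrations (in M) and the quantified fraction of shifted probe from the phosphorimage; the grid resolution limits the precision of the returned Kd to about 2%.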

DNA-free Structures of the Ets Domains of Etv1, Etv4, and Fev-The full-length sequences of Etv1, Etv4, Etv5, and Fev are predicted to include disordered regions, which often hinder crystallization. We generated several expression constructs, including the full-length and truncated versions of each protein. Crystals were obtained from constructs that contained the Ets domain: Etv1 (encompassing amino acids 326-429), Etv4 (encompassing amino acids 338-470), and Fev (encompassing amino acids 42-141). The construct of Etv1 included two inadvertent mutations that resulted from primer impurities as follows: Y329S at the N terminus, and P427S at the C terminus; this construct is denoted Etv1 Y329S-P427S. Our attempts to crystallize the corresponding construct with the wild-type sequence failed, although the mutated residues are in disordered regions or at the end of the ordered region of the crystal structures of Etv1, so they are unlikely to have any bearing on the structural and functional analyses. We also attempted to crystallize Etv5 using a similar construct strategy to that defined above (Etv5 construct encompassing amino acids 365-462), but despite obtaining protein of similar purity and yield as the other Etv constructs, we were unable to grow crystals of Etv5 in the absence of DNA.
Crystals of Etv1 Y329S-P427S diffracted to 1.82 Å and contained four copies of Etv1 in the asymmetric unit (4AVP, Tables 1 and 2). Electron density is observed for residues Ser-334 to Phe-426 (thus excluding the Y329S and P427S mutations) with most side chains visible. Etv4 crystals diffracted to 1.05 Å and contained a single copy of Etv4 in the asymmetric unit (4CO8, Tables 1 and 2). Fev crystals diffracted to 2.64 Å, with two copies of Fev per asymmetric unit (2YPR, Tables 1 and 2). Ets domains are highly similar in sequence (Fig. 1A).
In the structural descriptions that follow, the protein residues are numbered according to their position in Etv1; a list of the corresponding residues in Etv4 and Etv5 and Fev is provided in Table 4.
Etv1, Etv4, and Fev display the typical fold of the Ets domain (Fig. 1, B and C), which contains three α-helices (α1-3) flanking a four-stranded β-sheet, with an additional C-terminal helix (α4). The structures are highly similar (typical root mean square deviation of 1.0 Å over ~90 residues), with the main difference being a two-amino acid deletion in Fev (Fig. 1A), linked to a shift of 30° in the orientation of the C-terminal α4 helix in the Fev structure compared with the other structures (Fig. 1C). Superposition of the main chains of other Ets domain structures reveals considerable heterogeneity in the region corresponding to helix α4 (Fig. 1D). This region was not expected to be helical from proline-scanning mutagenesis (59) and was not structured in the NMR ensemble of the closest structural homologue of Fev, Fli1 (31).
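The root mean square deviation quoted above is the square root of the mean squared distance between equivalent atoms after superposition. The following minimal sketch computes that quantity for two coordinate lists that are assumed to be already superposed and matched atom-for-atom (for example, equivalent Cα atoms); it does not perform the superposition itself, and the function name is hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD (same units as the input) between paired (x, y, z) tuples.

    Both lists must have the same non-zero length and must already be
    superposed in a common frame.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and paired")
    total = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        total += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(total / len(coords_a))
```

Because the deviations are squared before averaging, a few strongly displaced atoms, such as those of a reoriented helix, can dominate the value.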
Crystal Structure of the Etv1, Etv4, Etv5, and Fev DNA Complexes-Both Etv1 and Fev DNA complexes were crystallized using a single 10-base pair DNA duplex containing the class I consensus EBS sequence. Etv4 and Etv5 DNA complexes were crystallized using oligonucleotide duplexes that had nine complementary base pairs (containing the same consensus sequence) with a single overhanging nucleotide on each end, which makes favorable interactions with neighboring DNA molecules (Fig. 2A). Etv1-DNA crystals diffracted to 2.9 Å and contained a single copy of Etv1 and a single DNA duplex in the asymmetric unit (4BNC, Table 2). Etv4-DNA crystals diffracted to 2.8 Å resolution and contained eight Etv4 molecules and eight DNA duplexes in the asymmetric unit (4UUV, Table 2), with the only significant differences between the various chains in the asymmetric unit being the chemical environment of the DNA chains. Etv5-DNA crystals diffracted to 1.95 Å and contained a single copy of Etv5, a single calcium ion (bound close to the N terminus of α4), and a single DNA duplex in the asymmetric unit (4UN0, Tables 1 and 2). Fev-DNA crystals diffracted to 2.64 Å, with two copies of Fev per asymmetric unit (2YPR, Tables 1 and 2). In all the DNA complex crystals, base pair stacking interactions between neighboring molecules allow the DNA to form a pseudo-contiguous helix that, in some cases, runs the entire length of the crystal (Fig. 2A).
In all four DNA complexes, the recognition helix (α3) inserts deep into the major groove and provides multiple contacts with the base pairs of the core GGAA sequence, as well as with the phosphate backbone (Fig. 2, B and C). Additional contacts to the DNA backbone are provided by the C terminus of α2, the α2-α3 loop, β3, and the β3-β4 loop. The proteins contact the DNA over a span of 9 bp, and the interface accounts for the burying of ~9% (500 Å2) of the total solvent-accessible surface area. To accommodate these extensive interfaces, the DNA becomes slightly distorted from the canonical B form, exhibiting smooth bending of ~20° toward the protein with significant widening of the major groove (~20.5 Å at widest point) in the vicinity of the recognition helix and narrowing of the minor groove downstream (~9.3 Å at narrowest point) (Fig. 2D). The bending of the DNA can be explained by the large number of contacts made to the phosphodiester backbone, many of which are charged in nature, possibly introducing bending through asymmetric phosphate neutralization (60). Specifically in Etv1, hydrogen bonds to phosphate oxygens are formed with the side chains of residues Gln-336, Trp-375, Tyr-386, Ser-392, and Tyr-396, whereas salt bridges are formed with Lys-379, Lys-388, Lys-399, and Lys-404 (Fig. 2C). Two additional charged residues, Arg-381 and Arg-409, appear to be in position with the potential to interact with the phosphodiester backbone; the former occupies a position between the two DNA strands in the region of the minor groove narrowing.
Recognition of Specific DNA Substrates by Etv1 and Fev-The Ets family of transcription factors displays sequence-specific DNA recognition for nine residues, which we have numbered −3 to +6 around the invariant 5′-GGA-3′ core.
This core GGA sequence is directly recognized in all Ets family members by two arginines and a tyrosine (Arg-391, Arg-394, and Tyr-395 in Etv1; Arg-103, Arg-106, and Tyr-107 in Fev), found invariably on the recognition helix (α3). The two arginine residues are both positioned with their guanidinium groups directly above the O6 and N7 of the two guanine bases and, due to the requirement for two hydrogen bond acceptors, are capable of direct recognition of guanine at positions 1 and 2 (Fig. 3A). Similarly, the tyrosine is in a position to accept a hydrogen bond from the N6 of the adenine residue at position 3 (Fig. 3A); positioning of the complementary thymine methyl group in a nonpolar environment created by the aliphatic portion of the side chains of Arg-391 (Fev-Arg-103) and Lys-388 (Fev-Lys-100) may also contribute to recognition at this position. The characteristic positioning of the two arginine side chains is likely to be further stabilized by cation-π stacking interactions, which, in the case of Ets domains, form between the guanidinium group of the arginine and the base at the −1 position, on the same strand as the hydrogen bonded partner. These types of interactions have already been identified in other Ets domain DNA complexes and may contribute to sequence specificity at this site because arginine-guanine interactions are energetically the most favorable (61,62).
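The position numbering used in this section can be made explicit with a short sketch. It assumes, consistent with the description of guanine at positions 1 and 2 and adenine at position 3, that the first guanine of the GGA core is position +1 and that there is no position 0, so that nine bases span −3 to +6. The function name is hypothetical.

```python
def label_positions(site: str, core: str = "GGA") -> dict:
    """Map EBS positions (-3..-1, +1..+6) to bases for a single strand.

    The first base of `core` is assigned +1; bases upstream count down
    from -1, skipping zero.
    """
    site = site.upper()
    start = site.find(core)
    if start < 0:
        raise ValueError("core sequence not found in site")
    labels = {}
    for i, base in enumerate(site):
        offset = i - start
        position = offset + 1 if offset >= 0 else offset
        labels[position] = base
    return labels


class_i = label_positions("ACCGGAAGT")
```

For the class I consensus this places cytosine at −2 and −1, the GGA core at +1 to +3, and A, G, and T at +4, +5, and +6.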
The DNA sequences recognized by different subclasses of the Ets family outside the central GGA core are more diverse (32,33). Etv1, Etv4, Etv5, and Fev all belong to the class I subfamily of Ets domains with a preferred consensus sequence 5′-ACCGGAAGT-3′. Both Etv1 and Fev display an absolute requirement for positions +1 to +3, strong preferences at positions −1, −2, and +4, and somewhat more relaxed sequence selectivity at positions −3, +5, and +6 (32). The means by which this recognition is achieved is not currently well understood, as relatively few direct base contacts outside the core GGA motif can be seen in the current structures in the PDB (30,63-65). From analysis of the high resolution structures of the Etv5 and Fev DNA complexes (which are of sufficiently high resolution (<2.0 Å) to reliably locate ordered waters), a number of additional contacts can be seen, which appear to enable direct readout of additional base pairs both upstream and downstream of the GGA core. The two C-G base pairs at positions −2 and −1 lie close to a highly coordinated network of four water molecules conserved in both structures and are coordinated by polar contacts to the side chains of Asp-387 (Fev Asp-99), Ser-390 (Fev Ser-102), Arg-394 (Fev Arg-106), Tyr-412 (Fev Tyr-124), and the phosphate oxygens of the two cytosine nucleotides at positions −2 and −1. These waters are also found in conserved positions in the crystal structures of the DNA complexes of SAP-1 (64) and ELK-1 (65) and form a hydrogen bond to the N4 of the cytosine at position −1 (Fig. 3B and Fig. 4, A-C). A detailed examination of the chemical environment of these water molecules using known hydrogen bonding donors and acceptors as fixed entities (assuming normal protonation states at physiological pH) reveals a single unique arrangement that satisfies all of the interactions in the network (Fig. 4D).
The requirement in this network for the water closest to the base at position −1 to be a hydrogen bond acceptor almost perfectly explains the observed sequence preferences at this position (32), with a strong selection of cytosine (~75% occurrence) and the only other allowed base being adenine (~15% occurrence), which is also able to donate a hydrogen bond, although in this case the hydrogen bonding distance would be significantly longer (≈4.0 Å). The close packing of these water molecules also appears sufficient to explain the selectivity at the −2 position (in which cytosine is the most favored base and thymine is the least frequent), due to the potential to form favorable van der Waals-type interactions with the nonpolar face of the cytosine and potentially unfavorable interactions or steric clashes with the thymine methyl C7. This mode of recognition also likely occurs at the −1 position; consequently, there would be a similar discrimination against 5-methylcytosine, which could occur in position −1 (which is part of a CpG base step). Interestingly, one high resolution crystal structure in which the positions of these waters are not conserved, the PU.1 DNA complex (66), is a member of the class III subfamily of Ets domains that are specific for EBS sequences containing adenine and guanine in the −1 and −2 positions, respectively (32). The base pairs at positions +4 and +5 downstream from the GGA motif lie close to the side chain of Tyr-395, which has the potential to accept a hydrogen bond from the N6 of the adenine at position +4, and also to form a favorable van der Waals interaction with the corresponding thymine C7 methyl.
In the Etv5 and Fev DNA complex structures, the corresponding residue (Etv5-Tyr-428 and Fev-Tyr-107, respectively) can be seen to occupy two conformations, one in which the two interactions detailed above are conserved, and an additional conformation where the side chain hydroxyl lies close to Etv5-Lys-432/Fev-Lys-111 and provides a large contact surface, with the potential for favorable van der Waals interactions to bases at positions +5 and +6 on the complementary strand (Fig. 3C). As discussed below, mutation of the corresponding tyrosine in Etv1 (Tyr-395) to phenylalanine had little effect on DNA binding affinity, suggesting that this van der Waals interaction is the most significant contribution of this residue to binding energy. Although these interactions may not be sufficient to determine the absolute sequence preferences at these positions, the experimentally derived preferences are more relaxed, with a tolerance for thymine at +4, adenine at +5, and cytosine at +6 (32). Both the importance to DNA recognition and the dynamic nature of this residue have already been demonstrated in a comparative structural study of the DNA binding properties of the Elk-1 and Sap-1 Ets domains (65) in which a salt bridge, established between the nearby Lys (equivalent to Fev-Lys-111) and a neighboring Asp residue (equivalent to Fev-Asp-110), was thought to stabilize the tyrosine in its alternative position. The finding that both conformations of this residue appear to occur together indicates that the dynamic nature of the side chains of the recognition helix may be even more important for DNA recognition than previously thought.
Order-Disorder Transitions upon DNA Binding-A comparison of the structures of Etv1, Etv4, and Fev in the presence and absence of DNA allows us to identify conformational changes that may occur upon DNA binding. In both cases, the overall structure is very well conserved (root mean square deviation of ~1 Å), with only minor movements of both the N and C termini and the β3-β4 loop, which appear to move away from the DNA in the DNA complexes of both Etv1 and Fev, avoiding potential steric overlaps that can be seen to occur following structural superposition. On an individual residue level, the most striking transitions can be seen to occur in the recognition helix α3, where in Etv1 Asp-387, Arg-391, and Arg-394 can be seen to become ordered upon DNA binding (Fig. 5, A and B), and Tyr-395 can be seen to switch between the two alternative rotamers found to be important for recognition of bases downstream of the GGA motif in the Fev-DNA complex. Similar transitions occur in the Fev structure where Asp-99, Arg-103, and Tyr-107 adopt different rotamers, and the disordered Arg-106 and Lys-111 become ordered upon DNA binding. In Etv4, in addition to three residues Arg-397, Arg-400, and Tyr-401 adopting different rotamers, the C-terminal end of the recognition helix switches from a 3₁₀ hydrogen bonding pattern (which is the form adopted in all the other ETS structures) to a more canonical α-helical hydrogen bonding pattern, although in this case the difference may be due to the fact that an ethylene glycol molecule is bound in the vicinity (Fig. 5A). The dynamic nature of the residues within the recognition helix is likely to be an important factor for Ets domain recognition, and it indicates possible means whereby Ets domains may scan along the DNA duplex in search of high affinity sites following binding to nonspecific sites, in a similar manner to that observed for the lac repressor DNA complex (67).
MAY 29, 2015 • VOLUME 290 • NUMBER 22

[Figure legend, beginning missing] …and Fev (shown on the right) reveals significant disorder-order transitions and rotamer movements within the recognition helix that accompany DNA binding. B, comparison of temperature factors of apo- and DNA-bound forms. The average B-factor (normalized so that the mean value is equal to zero) is plotted as a function of residue number for Fev (left panel) and Etv1 (right panel), with secondary structural elements shown for reference. Residues belonging to the recognition helix α3 are highlighted by a gray background and can be seen in both cases to transition from relatively high B-factors to among the lowest B-factors in the entire structure. These transitions, however, were not seen in a comparison of Etv4 apo- and DNA-bound crystals, probably due to the extensive crystal contacts formed by residues in this helix in the apo-crystals, which consequently diffract to 1.05 Å resolution.
Biochemical Analysis of Etv1 DNA Binding-To assess the contribution of individual residues to DNA binding in solution, we used Etv1 as a model system and mutated several residues individually to alanine or (in the case of tyrosine residues) to phenylalanine. Isolated Ets domains containing each of the mutations were compared in EMSAs to wild-type Etv1 (Fig. 6A and Table 3). We also compared the Etv1 construct used for crystallization, which contains two primer-induced mutations, against wild-type Etv1 and found no significant difference in DNA binding (Fig. 6B). Somewhat surprisingly, mutants in residues involved in salt bridges to the DNA backbone (Lys-379 and Lys-404) exhibited no detectable DNA binding. Similarly, mutation of the tyrosine residues forming hydrogen bonds to the DNA backbone (Tyr-396 and Tyr-397) to either phenylalanine or alanine also resulted in a significant (>1000-fold) reduction in binding affinity to DNA. Mutations in individual residues involved in nucleobase interactions affected DNA binding to a lesser extent.
Mutants of the invariant Arg-391 and Arg-394 bound DNA 50- and 1300-fold less than the wild type, respectively. Interestingly, mutation of Tyr-395 to phenylalanine did not affect the Kd values, but maximal DNA binding at saturation was only 60% of the total probe; this may indicate a rapid dissociation (high koff) of the complexes. Uniquely among the residues tested, a D387A mutation had no effect on DNA binding; it is possible that this residue contributes little to the free energy of binding to the cognate DNA sequence but may be crucial for selectivity (see below). Overall, the mutational analysis demonstrates that the residues interacting with the DNA backbone in the crystal structure are the major contributors to the binding affinity in solution, although the interactions with the bases add a significant but lower energetic contribution.
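To put such fold changes on an energetic scale, a fold change in dissociation constant can be converted to a free-energy difference using ΔΔG = RT ln(fold). The Python sketch below is illustrative only and is not part of the original analysis. It assumes that the reported 50- and 1300-fold reductions correspond to fold increases in Kd and that binding was measured near 25 °C (298.15 K); the function name `ddg_from_fold_change` is hypothetical.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 1.987204e-3


def ddg_from_fold_change(fold, temperature_k=298.15):
    """Return the binding free-energy penalty (kcal/mol) for a Kd that is
    `fold` times weaker than the reference, using ddG = R * T * ln(fold).

    The default temperature is an assumption, not a value from the study.
    """
    if fold <= 0:
        raise ValueError("fold change must be positive")
    return R_KCAL * temperature_k * math.log(fold)


# Fold changes quoted in the text for the two arginine mutants
# (interpreted here, as an assumption, as fold increases in Kd).
for label, fold in [("Arg-391 mutant", 50), ("Arg-394 mutant", 1300)]:
    print(f"{label}: {ddg_from_fold_change(fold):.2f} kcal/mol")
```

Under these assumptions the two substitutions would correspond to penalties of roughly 2.3 and 4.2 kcal/mol, respectively, which gives a sense of the scale of the base contacts relative to the backbone contacts whose loss abolished detectable binding.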
Biochemical Analysis of Etv1 Binding to Methylated DNA-To test whether CpG methylation directly affects Etv1 binding, we modified the 20-bp oligonucleotides used in the EMSA experiments by substituting 5meC at position −1 of the consensus (CGGAA) along with the cytosine complementary to G+1, generating a fully methylated CpG motif (CᵐGGAA/TTCCᵐG). EMSA experiments showed that the Etv1 protein failed to bind the methylated DNA (Fig. 7A). Modeling of 5meC at these positions in the Etv1-DNA and Fev-DNA structures (Fig. 7B) showed that Arg-394/106 (Etv1/Fev) was close to the 5meC at −1 of the CGGAA consensus strand, and Asp-387/99 was close to 5meC at both −1 of the CGGAA strand and +1 on the complementary strand. Methylation of cytosines at these positions could create a steric clash with Arg-394/106, causing loss of hydrogen bonds involved in the conserved water network. EMSA analyses of Etv1 D387A and Etv1 R394A mutants showed that replacement of either of these charged side chains with the smaller and hydrophobic alanine recovered some binding to methylated DNA (3 and 2% of total DNA bound, respectively, at the highest protein concentration) compared with the wild-type protein (Fig. 7A). It may be significant that Asp-387 (or glutamate) is almost invariant in the ETS recognition helix, with only class III proteins displaying a substitution to glutamine (32). As class I Ets proteins predominantly select cytosine at the −1 position of the consensus, whereas class III members typically have adenine (32), it is possible that Asp-387 (or Glu) plays a universal role not only in the preference for cytosine at −1 but also in the discrimination against CpG-methylated sites.
Direct Regulation of DNA Binding by PKA Phosphorylation of the Etv1 Ets Domain-Ets proteins are regulated by phosphorylation (68); for example, the protein kinases RSK1 and PKA phosphorylate Etv1 at multiple positions in vivo (40). PKA phosphorylation of Etv1 Ser-334 (at the N-terminal edge of the Ets domain) inhibits in vitro DNA binding of the full-length protein but increases transcriptional transactivation potential, as does phosphorylation of the equivalent Ser-367 in Etv5 (41). In contrast, Etv4 lacks a serine at the equivalent position (asterisk, Fig. 1A). To assess whether Ser-334 phosphorylation of the Ets domain affects DNA binding directly, we phosphorylated Etv1 with PKA on Ser-334 in vitro (Fig. 7, C and D). In parallel, we produced a phosphomimetic mutant of Ser-334 to glutamate. EMSA experiments showed that phosphorylation of Ser-334 reduced the binding affinity by at least 200-fold (Fig. 7C). This confirms that Ser-334 phosphorylation can directly interfere with DNA binding of the Etv1 Ets domain, independent of other regions of Etv1 or of other proteins (40). Interestingly, the S334E mutation caused only a modest reduction in the binding constant, but overall probe binding levels were 35% at apparently saturating DNA concentration. This may indicate a kinetic instability of DNA binding by the mutant protein. A phosphoserine residue is bulkier and carries more negative charge than glutamate, suggesting that the strong effect of the phosphate is due to charge repulsion. Modeling of a phosphorylated Ser-334 in the Etv1-dsDNA structure (PDB code 4BNC) superimposed onto a longer dsDNA (Fig. 7D) shows a possible repulsive interaction between the phosphate on the protein and the DNA backbone, which may underlie the decrease in DNA binding.
Etv1/4/5 and Fev Are Disulfide-linked Homodimers-Analysis of the contacts between adjacent molecules in the Etv1, Etv4, and Etv5 crystals reveals a conserved potential dimer interface that is distant from the DNA binding surface and is present in all the DNA-free and DNA-bound crystal forms (Fig. 8A). This interface is centered around contributions from the N- and C-terminal helices, α1 and α4, with additional contacts involving residues from the β1-β2 loop (Fig. 8B). The contact area for this interface is ~800 Å², includes contributions from 20 residues, and accounts for the burying of 15% of the monomer accessible surface area upon complex formation. The majority of the interface is nonpolar in nature with extensive hydrophobic interactions. Similarly, an analysis of the crystal contacts in both the Fev and Fev-DNA crystals reveals an interface that includes the same components (α1 and α4 and the β1-β2 loop) as that of the other three Etv proteins. However, the dimer interfaces in Etv1/4/5 and in Fev are not equivalent; rather, they differ by a relative rotation of ~90°, reflecting the different orientations of the α4 helix (Fig. 1C). The contact area for the Fev dimerization interface is ~700 Å², spans 21 residues, and accounts for 12.5% of the monomer accessible surface area. Similar to Etv1, the interface is dominated by contacts from hydrophobic residues, a single hydrogen bond (Asp-59–His-72), and an intermolecular disulfide bond between Cys-135 and its symmetry mate (Fig. 8, B and C). All the proteins were purified and crystallized in the presence of reducing agent (1 mM TCEP), although the reducing agent may have become oxidized during crystallization in an oxygen-containing environment.
Intermolecular dimerization has been documented previously in Ets factors (69), with ETS1 forming head-to-head homodimers required for cooperative binding on palindromic and other sites (70). ELK1 dimerization appears to regulate ELK1 cellular stability (71), with a homotypic interface utilizing the α1 helix, similar to that of Etv1/Fev, but with a different positioning and orientation (65). The presence of an intermolecular disulfide bond formed between similar C-terminal regions of Etv1, Etv4, Etv5, and Fev is particularly significant, as they are homologous to ETS1, which can form disulfide-linked dimers in vivo (72). Although present in the crystal structures, disulfide-linked Ets domain dimers were not initially observed in solution (Fig. 9A). When Etv1, Etv4, or Etv5 was expressed in Rosetta-gami™ 2 Escherichia coli (73) and purified under nonreducing conditions, a peak consistent with the dimer molecular weight was observed in size exclusion chromatography (Fig. 9A). SDS-PAGE in the presence or absence of DTT suggests that the higher molecular weight peak contained a reducing agent-sensitive dimer, which was further confirmed by mass spectrometry. We have been unable to test whether Fev also forms disulfide-linked homodimers, as our Fev constructs did not produce soluble protein in the Rosetta-gami™ 2 E. coli cells.
Disulfide Bond Formation and Dimerization Inhibit DNA Binding-To discover the functional consequences of disulfide-dependent dimerization, we compared the DNA binding affinities of purified monomeric and dimeric Etv1 using EMSA. The reduced monomeric Etv1 protein bound the DNA probe with an affinity up to 200-fold higher than the oxidized dimeric form (Table 3; Fig. 9, A and C); a C416S mutant of Etv1 bound DNA with an affinity similar to the reduced wild-type Etv1 (0.26 nM). Similarly, dimeric Etv4 and Etv5 also bind DNA with affinities that are 2 orders of magnitude lower than the reduced monomeric forms (Fig. 9, D, F, and G and Table 3; note that the Kd values of the dimeric forms are rough estimates, as saturation could not be reached). A graded response to physiological redox potentials could be seen when Etv1 dimers were treated with different ratios of reduced and oxidized glutathione; the extent of DNA binding roughly correlates with increasing the reducing agent potential and dimer dissociation (Fig. 9, B and C). We conclude that Etv1, Etv4, and Etv5 binding to DNA is reversibly inhibited by the formation of a disulfide-linked dimer.
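The relationship between a glutathione mixture and its redox potential can be sketched with the Nernst equation for the couple GSSG + 2H⁺ + 2e⁻ → 2GSH, for which the potential depends on [GSH]²/[GSSG] and therefore on absolute concentrations as well as on the ratio. The Python example below is a minimal illustration and does not reproduce the conditions used in this study. It assumes a standard potential of −240 mV at pH 7.0 and 25 °C (a commonly quoted literature value, not taken from this work), and the function name and the concentrations are hypothetical.

```python
import math

R = 8.314462618   # gas constant, J/(mol*K)
F = 96485.33212   # Faraday constant, C/mol


def glutathione_potential_mv(gsh_molar, gssg_molar,
                             e0_mv=-240.0, temperature_k=298.15):
    """Half-cell potential (mV) of the GSSG/2GSH couple from the Nernst
    equation: E = E0 - (RT / nF) * ln([GSH]^2 / [GSSG]) with n = 2.

    e0_mv is an assumed literature value for pH 7.0 and 25 degrees C.
    """
    if gsh_molar <= 0 or gssg_molar <= 0:
        raise ValueError("concentrations must be positive")
    n_electrons = 2
    slope_mv = 1000.0 * R * temperature_k / (n_electrons * F)
    return e0_mv - slope_mv * math.log(gsh_molar ** 2 / gssg_molar)


# Hypothetical mixtures (molar): raising GSH relative to GSSG makes the
# buffer more reducing, i.e. the potential becomes more negative.
for gsh, gssg in [(1e-3, 1e-3), (1e-2, 1e-3), (1e-2, 1e-4)]:
    e_mv = glutathione_potential_mv(gsh, gssg)
    print(f"GSH {gsh:.0e} M, GSSG {gssg:.0e} M: {e_mv:.1f} mV")
```

A more negative potential corresponds to a more reducing buffer, the direction in which the text reports dimer dissociation and increased DNA binding.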
DNA complexes of dimeric Etv4 and Etv5 migrate at a lower mobility than complexes of the monomeric proteins (Fig. 9, F and G), indicating the presence of a dimer of Etv4/5 with one or two bound probe molecules. EMSA using the dimeric form of Etv1 results in a band with mobility of a monomer (Fig. 9E); we assume this reflects the small amount of monomeric protein that is present in the high molecular weight SEC fractions (Fig. 9B, left lane); this monomeric subpopulation could not be removed by repeated chromatography. However, at the highest concentrations of dimeric Etv1 and DNA, a weak slower migrating band could be observed with a mobility similar to that of the Etv4 and Etv5 dimeric complexes. The paradoxical observation of the dimer in complex with DNA in the various crystal structures may be accounted for by high concentrations of both protein and DNA (~1 mM) during crystallization, which may be well in excess of the dissociation constant. We also note that both DNA-bound and -unbound forms of the proteins crystallized in the disulfide-bonded form, perhaps indicating that the monomeric form is considerably more flexible and difficult to crystallize.
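The argument that crystallization concentrations can overcome a weakened affinity can be illustrated with the exact solution for simple 1:1 binding. The Python sketch below is an approximation under stated assumptions rather than a model fitted to the data: it treats each dimer as a single binding unit with one DNA site, takes 0.26 nM as the monomer Kd, and uses an illustrative dimer Kd of 52 nM (200-fold weaker, from the "up to 200-fold" difference quoted above). The function name and the dilute assay concentrations are hypothetical.

```python
import math


def fraction_dna_bound(protein_total, dna_total, kd):
    """Fraction of DNA in complex for 1:1 binding, P + D <-> PD.

    Solves [PD]^2 - (Pt + Dt + Kd)[PD] + Pt*Dt = 0 for the physical root,
    written in a numerically stable form. All quantities are molar.
    """
    if protein_total < 0 or dna_total <= 0 or kd <= 0:
        raise ValueError("dna_total and kd must be positive; protein_total >= 0")
    s = protein_total + dna_total + kd
    disc = s * s - 4.0 * protein_total * dna_total
    complex_conc = 2.0 * protein_total * dna_total / (s + math.sqrt(disc))
    return complex_conc / dna_total


KD_MONOMER = 0.26e-9   # M, value quoted in the text
KD_DIMER = 52e-9       # M, illustrative: 200-fold weaker (assumption)

# Crystallization-like conditions from the text: about 1 mM of each partner.
print(f"{fraction_dna_bound(1e-3, 1e-3, KD_DIMER):.3f}")

# Hypothetical dilute assay: 1 nM protein, 10 pM probe.
print(f"{fraction_dna_bound(1e-9, 1e-11, KD_DIMER):.3f}")
print(f"{fraction_dna_bound(1e-9, 1e-11, KD_MONOMER):.3f}")
```

With these illustrative numbers, more than 99% of the DNA would be bound at 1 mM even with the weaker dimer Kd, whereas at nanomolar protein concentrations only a few percent would be bound, consistent with the explanation offered in the text.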

Discussion
Although the interaction between Ets domains and DNA has been the subject of numerous previous structural and biochemical studies (32,74), the question of how Ets domains are able to achieve DNA sequence specificity beyond the GGA consensus motif has remained open. Furthermore, although a variety of post-translational modifications regulate Ets transcription factors (39, 40, 75-77), the mechanisms involved have yet to be unambiguously clarified (8). In particular, it is not known whether modification exerts a direct effect on Ets protein-DNA binding, protein-protein interactions, or stability.
Our structural analysis of the Etv1, Etv4, Etv5, and Fev Ets domains both in the presence and absence of DNA has allowed us to identify additional features of the protein-DNA interface. Mutation of individual residues supports predictions from the crystal structure for binding the GGA core for most residues tested. Remarkably, mutating any of the residues of Etv1 that interact directly with the DNA backbone (Lys-379, Tyr-396, Tyr-397, and Lys-404) led to complete loss of binding. The extent to which these substitutions affect the binding affinity indicates that these residues might play a crucial role in opening the DNA to allow the α3 recognition helix to access the widened major groove or in anchoring the Ets domain once recognition has occurred. Further interface features include the dynamic nature of a conserved tyrosine (Tyr-395 in Etv1), which may allow for recognition of up to three bases downstream from the GGA core. In addition, a conserved cluster of coordinated water molecules supports a structural basis for sequence recognition of two bases upstream of the GGA core, including selectivity against thymine as well as 5-methylcytosine bases at position −1 of the consensus motif. Methylated bases may create an energetically unfavorable environment for conserved aspartate and arginine residues, preventing Ets proteins from binding when transcription is to be repressed by CpG methylation (43), as observed previously for Ets proteins (44-46). Although it is difficult to rule out a role for indirect recognition through sequence dependence of the DNA bending observed in our structures, the additional direct and water-mediated interactions seen in the Etv1 and Fev protein-DNA crystal structures appear to be sufficient to explain almost entirely the observed sequence specificities.
Etv1 is post-translationally regulated by Rsk1 and PKA phosphorylation in vivo (40). We were able to reproduce the inhibition of DNA binding by specific phosphorylation of Ser-334, a known PKA target site, in the isolated Ets domain in vitro. This demonstrates a direct effect of post-translational regulation on an Ets domain, and our Etv1-DNA structure suggests that an increase in negative charge around Ser-334 following phosphorylation may directly abrogate DNA binding following electrostatic clashes.
We found that disulfide-mediated dimerization strongly inhibited Etv1, Etv4, and Etv5 DNA binding in vitro, which was directly reversible with increasing redox potential. Our crystal structures do not provide an explanation for this inhibition; the DNA in the crystal is bound to the dimeric protein, and the dimerization interface is sufficiently distant from the DNA binding interface to eliminate the possibility of steric effects with the substrates used (20-mer with the consensus sequence centrally located). Examination of the protein-DNA interfaces in these crystals and comparison with other active Etv-DNA complexes reveal very similar interfaces, with similar contacts made to the phosphodiester backbone and the major sequence-specific DNA interactions provided by the recognition helix being totally conserved (Fig. 10). Thus, we believe the interfaces in our dimeric crystal structures to be representative of the higher affinity monomer-DNA interface. This leads us to suggest that the most likely explanation for the inhibition caused by dimerization is an allosteric mechanism of inhibition, which acts through changes in protein flexibility or dynamics. The mechanism may be similar to the autoinhibitory mechanism of ETS1, where packing of a 4-helical inhibitory module also distal to the DNA-binding face allosterically modulates subtle structural changes that inhibit DNA binding (78). It is also possible that the constraints of crystallization favor a protein conformation that is less stable in solution. However, structural studies of allosteric regulation do not always reveal a clear-cut mechanism, and there are cases where allosteric regulation acts through changes in protein flexibility or dynamics (79-81).
Redox control of transcription factors is a recognized regulatory mechanism (80,82-84). Examples include intermolecular disulfide bridges in bZIP proteins, such as AP-1 (85), and cysteines outside the DNA-binding motif mediating redox-sensitive dimerization in plant homeodomains (86). Ets proteins have been implicated in redox signaling in vivo, such as the oxidative inactivation of GABPα by the Hippo pathway (87). The mechanism of redox-dependent DNA binding inhibition in Ets factors has remained unclear, as many contain multiple cysteine residues. Here, we show that a single redox-sensitive cysteine is sufficient to confer inhibition of DNA binding by dimerization.
Notably, among all human Ets domain proteins, cysteines at positions equivalent to Etv1-Cys-416 are otherwise found only in GABPα, ETS1, and ETS2. Although dimerization and disulfide formation seem to have regulatory influence on proteins, including GABPα (88) and ETS1, the equivalent cysteines in crystal structures of ETS1, GABPα, and ETS2 (89,90) do not seem to be involved in intermolecular disulfide bridges. This indicates that the precise mode of redox regulation seen for Etv1/4/5 may be restricted to a small subset of the Ets protein family.
Etv1, -4, and -5 are strongly implicated in cancer (8). Our data suggest the possibility that redox-mediated dimerization could link Etv1/4/5 factors to the response of cancer cells to their microenvironment. If proven, chemoprevention by targeting such redox-sensitive transcription factors could potentially deliver novel therapeutic strategies (91).