An Extended Winged Helix Domain in General Transcription Factor E/IIEα

Initiation of eukaryotic mRNA transcription requires melting of promoter DNA with the help of the general transcription factors TFIIE and TFIIH. Here we define a conserved and functionally essential N-terminal domain in TFE, the archaeal homolog of the large TFIIE subunit α. X-ray crystallography shows that this TFE domain adopts a winged helix-turn-helix (winged helix) fold, extended by specific α-helices at the N and C termini. Although the winged helix fold is often found in DNA-binding proteins, we show that TFE is not a typical DNA-binding winged helix protein, because its putative DNA-binding face shows a negatively charged groove and an unusually long wing, and because the domain lacks DNA-binding activity in vitro. The groove and a conserved hydrophobic surface patch on the additional N-terminal α-helix may, however, allow for interactions with other general transcription factors and RNA polymerase. Homology modeling shows that the TFE domain is conserved in TFIIEα, including the potential functional surfaces.

In eukaryotes, RNA polymerase (pol) II requires five general transcription factors, called TFIIB, -D, -E, -F, and -H, to initiate mRNA transcription. The general transcription factors assemble with pol II into a large initiation complex on promoter DNA (1-3). According to in vitro reconstitution experiments, initiation complex assembly starts with recognition of the promoter DNA by TFIID (4,5). The TFIID subunit TATA box-binding protein (TBP) binds to the TATA box, a frequently occurring promoter element located about 30 bp upstream of the transcription start site. The TBP·DNA complex is stabilized by specific binding of TFIIB to TBP and to promoter DNA sequences adjacent to the TATA box (6). The TBP·TFIIB·DNA complex can be further stabilized by binding of the optional factor TFIIA. A preformed pol II·TFIIF complex is then recruited to the transcription start site. Detailed structures are available for most parts of the resulting complex, including TBP (7,8), the TBP·DNA complex (9,10), the TBP·DNA·TFIIB (11,12) and TBP·DNA·TFIIA (13-15) complexes, domains of TFIIB (16) and TFIIF (17-22), and pol II (23-27).
Before RNA synthesis can begin, DNA must be melted around the transcription start site with the help of TFIIE and TFIIH (28). Binding of TFIIE to the pol II complex apparently recruits TFIIH (29). TFIIH contains helicase activities that unwind DNA and form a transcription bubble (30,31). TFIIH also comprises a kinase that phosphorylates the pol II C-terminal repeat domain during the transition from initiation to elongation. Both enzymatic activities of TFIIH are stimulated by TFIIE (32). In keeping with a role of TFIIE as a bridging factor between pol II and TFIIH, TFIIE has been shown to bind to both pol II and TFIIH (29). Several lines of evidence suggest a role of TFIIE in promoter DNA binding and melting. First, TFIIE can bind single-stranded DNA (33). Second, TFIIE function is influenced by the helical stability of promoter DNA, and the requirement for TFIIE during initiation can be bypassed by premelted promoter sequences (34). Finally, site-specific cross-linking placed TFIIE near DNA around the active center of pol II. One study observed cross-linking of TFIIE to DNA within the transcription bubble and just downstream of it (35). Another cross-linking study located TFIIE adjacent to promoter DNA just upstream of the transcription start site (36).
TFIIE consists of a large and a small subunit, referred to as α and β, respectively, in higher eukaryotes. From the N to the C terminus, the TFIIEβ sequence shows a serine-rich region with little predicted secondary structure, a region with sequence similarity to a DNA-binding domain in the small TFIIF subunit, a region with similarity to the bacterial σ factor, and a presumably helical region with clusters of basic residues. NMR has shown that the TFIIEβ region with homology to TFIIF forms a winged helix domain (37). Structural information on TFIIEα is, however, lacking. The C-terminal half of TFIIEα is likely unstructured and is dispensable for yeast viability (33). In contrast, the N-terminal half of TFIIEα is essential for cell growth (33) and contains a leucine-rich region with weak sequence similarity to the bacterial σ factor, followed by a zinc-binding domain (38). In agreement with in vivo data, the N-terminal half of TFIIEα is required for basal and activated transcription in vitro (39) and interacts with TBP and pol II (29).
Archaea contain a TFIIE homolog, called TFE, that corresponds to the essential N-terminal half of TFIIEα (40,41). Archaea also contain homologs of TBP and TFIIB, but not of TFIIF and TFIIH (42). Whereas the archaeal TBP and TFIIB homologs are essential for transcription initiation, TFE is not absolutely required but stimulates transcription in vitro (40,41). Because mutagenesis of the TATA box or low concentrations of TBP sensitize a promoter to TFE in vitro, TFE apparently stabilizes the archaeal TBP·DNA complex (41). TFE also interacts directly with archaeal TBP and RNA polymerase (41).
Here we have defined and crystallized the N-terminal domain of TFE from the archaeon Sulfolobus solfataricus. X-ray analysis reveals that this domain adopts an extended winged helix fold with unusual features. Analysis of surface properties, comparisons with other winged helix domains, and biochemical data suggest that the function of the TFE winged helix domain is not primarily DNA binding. The available data are, however, consistent with a role of this domain as an adapter between RNA polymerase and general transcription factors.

EXPERIMENTAL PROCEDURES
Cloning and Site-directed Mutagenesis-The TFE DNA fragments TFEΔC and TFEΔΔC, which encode TFE residues 1-110 and 1-97, respectively, were subcloned using as template plasmid pET-ssTFE, which encodes full-length S. solfataricus TFE (41). Plasmid pET-ssTFE was a generous gift of Stephen Bell. For cloning of fragment TFEΔΔC, oligonucleotide primers TFEf1 (5′-TATATCATATGATGGTTAACGCAGAGGATCTGTTTA-3′) and TFEr1 (5′-TATCTGCGGCCGCTAATCTTTTCCTATTTAACAG-3′) were used in PCR amplification. Primer TFEf1 adds an NdeI restriction site upstream of the start codon. For amplification of fragment TFEΔC, primer TFEr1 was replaced by TFEr2 (5′-TATCTGCGGCCGCTTTCTCATATTCCAATCTTGTC-3′). Primers TFEr1 and TFEr2 encoded an overhang containing a NotI restriction site downstream of the gene but did not introduce a stop codon. Ligation of the restriction-digested PCR products into pET21b (Novagen) appended codons for a C-terminal hexahistidine tag followed by a stop codon. Cloning was performed according to standard procedures. To introduce additional methionine residues into the polypeptide chain encoded by gene TFEΔC, codons for hydrophobic residues were changed by site-directed mutagenesis using the two-step PCR overlap extension method, as described in a previous study (43). Several mutant forms of TFEΔC were constructed, including the single mutant F76M, the double mutant I39M/F76M, and the triple mutant I39M/F62M/F76M.
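The restriction sites that the primers are said to carry can be checked directly against the printed sequences. The following Python sketch is an illustrative aid, not part of the original procedure; the helper names `find_sites` and `site_map` are hypothetical. It uses the standard recognition sequences CATATG (NdeI) and GCGGCCGC (NotI); both are palindromic, so scanning one strand is sufficient.

```python
from typing import Dict, List

# Standard recognition sequences (5'->3'). Both are palindromic, so scanning
# the primer strand also covers the complementary strand.
ENZYMES: Dict[str, str] = {"NdeI": "CATATG", "NotI": "GCGGCCGC"}

# Primer sequences as given in the text (5'->3').
PRIMERS: Dict[str, str] = {
    "TFEf1": "TATATCATATGATGGTTAACGCAGAGGATCTGTTTA",
    "TFEr1": "TATCTGCGGCCGCTAATCTTTTCCTATTTAACAG",
    "TFEr2": "TATCTGCGGCCGCTTTCTCATATTCCAATCTTGTC",
}


def find_sites(seq: str, site: str) -> List[int]:
    """0-based start positions of every (possibly overlapping) occurrence of `site`."""
    seq, site = seq.upper(), site.upper()
    return [i for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site]


def site_map(primers: Dict[str, str] = PRIMERS,
             enzymes: Dict[str, str] = ENZYMES) -> Dict[str, Dict[str, List[int]]]:
    """For each primer, the positions of each enzyme's recognition site."""
    return {name: {enzyme: find_sites(seq, site) for enzyme, site in enzymes.items()}
            for name, seq in primers.items()}


if __name__ == "__main__":
    for primer, hits in site_map().items():
        print(primer, hits)
```

Run as a script, it should list an NdeI site starting at position 5 of TFEf1 and a NotI site starting at position 5 of TFEr1 and TFEr2, with no other hits. In TFEf1 the ATG within the NdeI site is immediately followed by a second ATG, consistent with the site being placed directly upstream of the start codon.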
Protein Expression, Selenomethionine Labeling, and Purification-Plasmid DNA encoding wild-type TFEΔC or TFEΔΔC protein was transformed into Escherichia coli strain BL21(DE3)RIL (Stratagene). Cells were grown at 37°C in LB medium supplemented with ampicillin and chloramphenicol at concentrations of 100 mg/liter each. Expression was induced during logarithmic growth by addition of 0.5 mM isopropyl-β-D-thiogalactopyranoside, and bacteria were grown for an additional 3.5 h. Cells were harvested by centrifugation and resuspended in buffer A (50 mM Tris-Cl, pH 8.0, 300 mM NaCl, and 10 mM β-mercaptoethanol). For selenomethionine labeling of TFEΔC mutants F76M, I39M/F76M, and I39M/F62M/F76M, plasmid DNA was transformed into the methionine-auxotrophic E. coli strain B834(DE3) (Novagen). Bacteria were grown in LB medium with 100 mg/liter ampicillin at 37°C to an A600 of 0.6. Cells were harvested and resuspended in the same volume of New Minimal Medium (44), supplemented with selenomethionine and ampicillin at concentrations of 100 mg/liter each. Cells were then grown until the optical density at 600 nm had increased by 0.2 units, to deplete them of any residual methionine from the LB medium. Protein expression was induced by addition of 0.5 mM isopropyl-β-D-thiogalactopyranoside, and cells were grown for an additional 4 h at 37°C. Cells were harvested and resuspended in buffer A.
Native and selenomethionine-labeled proteins were purified by the same procedure. Cells were broken with a French press, and the suspension was clarified by centrifugation. The cleared supernatant was incubated for 20 min at 72°C to denature the majority of E. coli proteins. After centrifugation, the cleared lysate was loaded onto a 1-ml nickel-nitrilotriacetic acid column (Qiagen) equilibrated with buffer A. The column was washed with 20 ml of buffer A, followed by 4 ml of buffer A containing 40 mM imidazole. Bound proteins were eluted with buffer A containing 250 mM imidazole and were judged to be 99% pure by SDS-PAGE. The proteins were then applied to a Superose-12 HR gel filtration column (Amersham Biosciences) equilibrated with buffer B (50 mM Hepes, pH 7.5, 200 mM NaCl, 1 mM EDTA, 10 μM ZnCl2, 5 mM dithiothreitol). Peak fractions were pooled and concentrated to 15 mg/ml. Protein concentrations were measured using Bradford reagent (Bio-Rad). Whereas TFEΔC and its variants remained soluble during purification, TFEΔΔC was found in the insoluble pellet after centrifugation.
Crystallization and Data Collection-Purified protein samples were subjected to commercial crystal screens (Hampton Research) using the hanging drop vapor diffusion method. In initial screens, TFEΔC formed large clusters of thin needles within a week under three different conditions. The reservoir solutions were (i) 100 mM Tris-Cl, pH 8.5, 200 mM Li2SO4, 30% (w/v) polyethylene glycol (PEG) 4000, (ii) 100 mM Tris-Cl, pH 8.5, 200 mM NaOAc, 30% (w/v) PEG4000, and (iii) 20% (v/v) ethanol, 10% (v/v) glycerol. Optimization around the second condition yielded thicker needles (smallest dimension, 30 μm). Removal of the hexahistidine tag did not improve crystal quality. Replacing PEG4000 in the reservoir solution with PEG1000 resulted in very large, cigar-shaped single crystals (smallest dimension, ≤200 μm). Crystals of selenomethionine-labeled proteins were grown under the same conditions. Crystals were harvested in mother liquor and transferred to a solution additionally containing 15% (v/v) PEG400. Crystals were incubated for 10 min, flash-frozen in liquid nitrogen, and stored for data collection at the synchrotron. Multiwavelength anomalous diffraction (MAD) experiments were performed on crystals of selenomethionine-labeled proteins. X-ray diffraction data from crystals of native TFEΔC and selenomethionine-labeled TFEΔC variant I39M/F76M were collected at the Swiss Light Source (Table I). Further MAD experiments on crystals of TFEΔC variants F76M and I39M/F62M/F76M were performed at beamline ID14-4 at the European Synchrotron Radiation Facility (ESRF) in Grenoble (Table I).
Crystal Structure Determination-X-ray diffraction data from native (wild-type) protein crystals and from crystals of selenomethionine-labeled TFEΔC variants F76M and I39M/F76M were integrated and scaled using the programs DENZO and SCALEPACK (45). MAD data from crystals of variant I39M/F62M/F76M were reduced using the program XDS (46). Selenium sites were found using the program SOLVE (47,48), and the solution was confirmed with the program SnB (49). The selenium sites obtained were used for phasing with SHARP (50), and the phases were improved with RESOLVE (47,48) with automated model building turned off.
Additional methionines were introduced at positions of conserved hydrophobic residues, some of which are methionines in TFE homologs. After introduction of one methionine in addition to the natural M1 and M34 (TFEΔC variant F76M), we could consistently identify two selenium peaks with SnB, but no interpretable electron density was obtained after refinement with SHARP. Structure determination was attempted in both enantiomorphic space groups (P6₁22 and P6₅22). Introduction of a second additional methionine (double mutant I39M/F76M) and further MAD experiments led to the detection of three selenium peaks by SnB and SOLVE (Z-score 13.3), of which two matched the peaks in variant F76M. After phasing with SHARP, a noisy electron density map was obtained that showed segments of α-helices. However, no continuous Cα trace could be built into this density. Finally, a third additional methionine was introduced. MAD data from crystals of the triple mutant I39M/F62M/F76M gave rise to an additional selenium peak but still resulted in a poor electron density map. The three known selenium sites could be identified, the Z-score increased to 23.5 (mean figure of merit = 0.55, CC Nat. Fourier = 0.17), and the electron density improved dramatically compared with that of the double mutant. However, a fourth selenium site was still missing, and no continuous Cα trace could be built into the density. A breakthrough was achieved when a fourth selenium site was identified with SHARP and the resulting phases were further improved with RESOLVE.
The resulting electron density, in combination with the selenium atom positions used as sequence markers, allowed building of an atomic model with program O (51) that included TFE residues 3-88 except residues 70-72 in a flexible loop region (Fig. 1). After one round of refinement against the remote data set, the initial model was repositioned and reoriented in the slightly different unit cell of the native crystal by rigid body refinement with CNS (52). After positional and B factor refinement, model-phased maps allowed building of the N-terminal residues 1-2, but there was still no electron density for the flexible residues 70-72 or for the C-terminal region. Several cycles of model rebuilding and refinement with CNS were performed. Refinement by simulated annealing was used initially, but later only positional refinement was carried out. Before each refinement cycle, all atomic temperature factors were set to 50 Å² and were refined again before calculation of a new electron density map. During refinement, the automated solvent correction in CNS failed, most likely due to the high Rsym values at low resolution. We therefore used only data in the resolution range 6.5-2.9 Å during refinement. The model refinement converged with a free R factor of 32.1% (33.9% if the problematic low resolution data are included). The relatively high model R factors may be explained by the moderate resolution of our data, a high overall B factor, and the high percentage of disordered residues (29%). Except for residues Val-2 and Trp-75, which have high B factors, all residues fall within the allowed regions of the Ramachandran plot. The coordinates have been deposited in the Protein Data Bank (PDB accession code 1Q1H).
To ensure that the relatively high R factors are not due to low quality of the native data set, we also refined the structure against the remote-wavelength data of the MAD experiment. To this end, the free R flags were transferred to this data set, and three residues were mutated to methionines in the model. Several cycles of slight manual adjustment of the structure and refinement of atomic positions and B factors with CNS quickly converged on a refined model with very similar R factors (Table I). A superposition of the two refined models revealed that they are essentially identical within experimental error, except for minor differences around the sites of mutation.
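The crystallographic R factor compares observed and calculated structure-factor amplitudes, R = Σ ||F_obs| − k|F_calc|| / Σ |F_obs|, and the free R factor is the same quantity evaluated only over a test set of reflections that is flagged before refinement and never used to fit the model; this is why the free R flags must be carried over when refining against a second data set. The Python sketch below is a minimal, hypothetical illustration rather than what CNS computes: it applies the 6.5-2.9 Å resolution window used during refinement, fits a single overall scale factor k by least squares on the working set only (real programs use more elaborate scaling, including bulk-solvent terms), and reports R_work and R_free. The reflection tuple layout is an assumption made for illustration.

```python
from typing import Iterable, List, Tuple

# Hypothetical reflection record: (d_spacing_in_A, f_obs, f_calc, is_free)
Reflection = Tuple[float, float, float, bool]


def r_work_free(reflections: Iterable[Reflection],
                d_min: float = 2.9, d_max: float = 6.5) -> Tuple[float, float]:
    """Return (R_work, R_free) over reflections with d_min <= d <= d_max."""
    kept = [r for r in reflections if d_min <= r[0] <= d_max]
    work: List[Tuple[float, float]] = [(fo, fc) for _, fo, fc, free in kept if not free]
    test: List[Tuple[float, float]] = [(fo, fc) for _, fo, fc, free in kept if free]
    if not work or not test:
        raise ValueError("need working and test reflections inside the resolution window")

    # Least-squares overall scale k minimising sum (Fo - k*Fc)^2, fitted on the
    # working set only so that no fitted parameter uses the test reflections.
    k = sum(fo * fc for fo, fc in work) / sum(fc * fc for _, fc in work)

    def r_value(pairs: List[Tuple[float, float]]) -> float:
        return sum(abs(fo - k * fc) for fo, fc in pairs) / sum(fo for fo, _ in pairs)

    return r_value(work), r_value(test)


if __name__ == "__main__":
    demo = [
        (3.1, 120.0, 100.0, False),
        (4.0, 80.0, 90.0, False),
        (5.5, 60.0, 50.0, True),
        (8.0, 300.0, 150.0, False),  # outside the 6.5-2.9 A window, ignored
    ]
    r_work, r_free = r_work_free(demo)
    print(f"R_work = {r_work:.3f}, R_free = {r_free:.3f}")
```

Because the test set never influences k or the model, a large gap between R_work and R_free signals overfitting, whereas similar values on two independent data sets, as reported here, argue against a data-quality artifact.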
DNA-binding Assay-Band shift assays were carried out essentially as described previously (53). Briefly, purified TFEΔC was added to 100 pmol of DNA at a 100- or 200-fold molar excess in buffer TGEM (50 mM Tris-Cl, pH 7.5, 380 mM glycine, 2 mM EDTA, 4% (w/v) PEG6000, 20% (v/v) glycerol, and 16 mM magnesium acetate). The reaction mixture was incubated at 20°C for 30 min before 10 μl of sample loading buffer (TGEM with 30% glycerol and bromphenol blue) was added. The mixtures were then separated on a 5% native polyacrylamide gel (95 V for 90 min at 20°C). The gels were stained with the highly sensitive DNA stain SYBR Gold (Molecular Probes, Leiden, Netherlands).
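The protein amounts in the band-shift assay follow from simple molar arithmetic. The sketch below assumes a hypothetical molecular mass of about 13 kDa for His-tagged TFEΔC (the text does not state one) and uses the 15 mg/ml stock concentration reached after gel filtration; the helper names are illustrative only.

```python
def protein_nmol(dna_pmol: float, molar_excess: float) -> float:
    """Amount of protein (nmol) needed for a given molar excess over DNA (pmol)."""
    return dna_pmol * molar_excess / 1000.0


def stock_volume_ul(nmol: float, mw_da: float, stock_mg_per_ml: float) -> float:
    """Volume (ul) of protein stock that contains `nmol` of a protein of mass `mw_da`."""
    micrograms = nmol * mw_da / 1000.0   # nmol x g/mol = ng; divide by 1000 for ug
    return micrograms / stock_mg_per_ml  # 1 mg/ml == 1 ug/ul


if __name__ == "__main__":
    ASSUMED_MW_DA = 13_000.0  # hypothetical mass of His-tagged TFE-deltaC, not from the paper
    for excess in (100, 200):
        n = protein_nmol(100.0, excess)
        volume = stock_volume_ul(n, ASSUMED_MW_DA, 15.0)
        print(f"{excess}-fold excess: {n:.1f} nmol protein, {volume:.1f} ul of 15 mg/ml stock")
```

With the assumed mass, the 100- and 200-fold excesses correspond to 10 and 20 nmol of protein, or roughly 9 and 17 μl of the concentrated stock; the volumes scale linearly with whatever molecular mass is actually used.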

TFE Contains a Stable N-terminal Domain
The modular and flexible nature of TFIIE has so far hampered its crystallization. To obtain structural information for the essential N-terminal region of TFIIEα, we turned to the homolog TFE from the archaeon S. solfataricus. Full-length S. solfataricus TFE was expressed in E. coli and purified as described (41), with minor modifications (see "Experimental Procedures"). Because initial crystallization screens with the full-length protein were unsuccessful, we conducted partial proteolysis of purified TFE with four proteases of different specificity (trypsin, chymotrypsin, elastase, and subtilisin). These experiments always resulted in a stable N-terminal fragment (verified by Edman sequencing, not shown), which lacks a C-terminal region that includes a potential zinc-binding domain that is apparently not well structured in S. solfataricus TFE (Fig. 2A). Although binding of zinc ions by this domain is essential for the function of TFE (40) and TFIIE (54), several TFE sequences, including that of S. solfataricus, lack one of the four cysteine residues involved in zinc coordination (55).
Crystal Structure Determination-Based on the proteolysis results, we constructed the TFE variant TFEΔC, which comprises residues 1-110 (see "Experimental Procedures"). TFEΔC was expressed in E. coli as a highly soluble protein. The purified TFEΔC variant crystallized in two morphologies. Individual cryo-cooled crystals of both morphologies diffracted synchrotron radiation to 2.8-Å resolution at best and turned out to have the same symmetry and comparable unit cell dimensions.
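The solvent content of 34% quoted for these crystals later in this section is conventionally estimated from the Matthews coefficient, V_M = V_cell / (Z · n · M), where Z is the number of asymmetric units per cell (12 for P6₁22 or P6₅22), n the number of molecules per asymmetric unit, and M the molecular mass in Da; the solvent fraction is then approximately 1 − 1.23/V_M, which assumes a protein partial specific volume of about 0.74 cm³/g. The Python sketch below is illustrative only: the unit cell dimensions are not given in this section, so any a and c passed in are placeholders, and one molecule per asymmetric unit is an assumption.

```python
import math


def hexagonal_cell_volume(a: float, c: float) -> float:
    """Unit cell volume (A^3) of a hexagonal lattice (a = b, gamma = 120 degrees)."""
    return math.sqrt(3.0) / 2.0 * a * a * c


def matthews_coefficient(cell_volume: float, mw_da: float,
                         z_asu: int = 12, mols_per_asu: int = 1) -> float:
    """V_M in A^3/Da; z_asu = 12 for space groups P6(1)22 and P6(5)22."""
    return cell_volume / (z_asu * mols_per_asu * mw_da)


def solvent_fraction(vm: float) -> float:
    """Matthews estimate of the solvent fraction (partial specific volume ~0.74 cm^3/g)."""
    return 1.0 - 1.23 / vm


def vm_for_solvent(fraction: float) -> float:
    """Inverse of solvent_fraction: the V_M that corresponds to a given solvent content."""
    return 1.23 / (1.0 - fraction)


if __name__ == "__main__":
    # Working backwards from the reported 34% solvent content.
    print(f"V_M for 34% solvent: {vm_for_solvent(0.34):.2f} A^3/Da")
```

Working backwards, 34% solvent corresponds to a V_M of about 1.86 Å³/Da, at the low end of the range usually observed for protein crystals, which is in line with the text's description of the solvent content as low.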
Trials to derivatize TFEΔC crystals with heavy metals were unsuccessful, most likely due to a lack of cysteine residues. Structure determination by selenomethionine incorporation and multiwavelength anomalous diffraction (MAD) was also hampered, because TFEΔC comprises only two methionine residues, M1 and M34, which did not provide sufficient signal for phasing. To increase the signal for MAD phasing, we introduced additional methionines in a sequential, non-disruptive manner (see "Experimental Procedures"). Sufficient signal for MAD phasing was obtained only after three additional methionines had been introduced. MAD phasing with the selenomethionine-labeled triple mutant I39M/F62M/F76M led to an electron density map that allowed building of an initial model (Fig. 1). This initial model was used to phase native data from a wild-type TFE crystal, leading to an improved electron density map. A model comprising TFE residues 1-88 (except a disordered loop containing residues 70-72) and a total of 701 non-hydrogen atoms could be built and refined at 2.9-Å resolution (Table I).
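Why two methionines were not enough can be estimated roughly with Hendrickson's approximation for the expected Bijvoet ratio, <|ΔF±|>/<|F|> ≈ (N_A / 2N_T)^(1/2) × (2f''/Z_eff), where N_A is the number of anomalous scatterers, N_T the total number of non-hydrogen atoms, f'' the imaginary scattering component of selenium at the chosen wavelength, and Z_eff ≈ 6.7 electrons the effective scattering of an average protein atom. The Python sketch below is a hypothetical back-of-the-envelope estimate, not the authors' analysis: it assumes f'' = 5 electrons (the real value depends on wavelength and edge), uses the 701 non-hydrogen atoms of the refined model as a stand-in for N_T, and treats every methionine as a fully ordered selenium site.

```python
import math

Z_EFF = 6.7           # effective electrons per average protein atom
NATIVE_MET = (1, 34)  # M1 and M34 of TFE-deltaC


def met_positions(mutations: str) -> set:
    """Methionine positions for a construct such as 'I39M/F62M/F76M' ('' = wild type)."""
    positions = set(NATIVE_MET)
    for mutation in filter(None, mutations.split("/")):
        if mutation.endswith("M"):            # only X<pos>M substitutions add a Met
            positions.add(int(mutation[1:-1]))
    return positions


def bijvoet_ratio(n_anom: int, n_total: int, f_double_prime: float) -> float:
    """Hendrickson's estimate of <|dF+-|>/<|F|> for n_anom sites among n_total atoms."""
    return math.sqrt(n_anom / (2.0 * n_total)) * (2.0 * f_double_prime / Z_EFF)


if __name__ == "__main__":
    N_TOTAL = 701   # atoms in the refined model, a rough stand-in for N_T
    F2_SE = 5.0     # assumed f'' of selenium in electrons; wavelength dependent
    for construct in ("", "F76M", "I39M/F76M", "I39M/F62M/F76M"):
        n_se = len(met_positions(construct))
        ratio = bijvoet_ratio(n_se, N_TOTAL, F2_SE)
        print(f"{construct or 'wild type':>15}: {n_se} Se sites, ~{100 * ratio:.1f}% Bijvoet ratio")
```

Because the expected signal grows only with the square root of N_A, going from two to five selenium sites raises it by a factor of about 1.6; the absolute percentages are order-of-magnitude guides, and the measured signal also depends on data quality such as the elevated low-resolution Rsym discussed next.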
The observed difficulties during phase determination may have been caused by relatively high Rsym values of about 6% in the lowest resolution shell. The elevated Rsym values were independent of the X-ray source and detector type and were not due to incorrect space group assignment, because they remained high after processing the data in space groups of lower symmetry. The intensity distribution of the data exactly followed theoretical Wilson statistics, suggesting that the elevated Rsym values are also not due to partial crystal twinning. The phenomenon may, however, be explained by the high percentage of disordered residues (29%, see below) and the low solvent content of the crystals (34%).
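The two data-quality diagnostics mentioned here can be written down compactly. Rsym (also called Rmerge) is Σ_hkl Σ_i |I_i(hkl) − <I(hkl)>| / Σ_hkl Σ_i I_i(hkl), summed over reflections measured more than once. One common twinning check is the second moment <I²>/<I>² of acentric intensities within a resolution shell, which is about 2.0 for untwinned data following Wilson statistics and about 1.5 for a perfect merohedral twin. The Python sketch below is a simplified illustration, not the processing done by DENZO/SCALEPACK or XDS, and not necessarily the test the authors applied: it assumes the observations have already been mapped to unique (asymmetric-unit) indices and applies the moment test to the raw intensities of one shell rather than to normalized intensities.

```python
from collections import defaultdict
from typing import Hashable, Iterable, List, Tuple


def r_sym(observations: Iterable[Tuple[Hashable, float]]) -> float:
    """Rsym over unique reflections measured at least twice.

    `observations` yields (unique_hkl, intensity); symmetry mates are assumed to
    have been reduced to the same unique_hkl beforehand.
    """
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)
    numerator = denominator = 0.0
    for intensities in groups.values():
        if len(intensities) < 2:  # singly measured reflections carry no Rsym information
            continue
        mean = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean) for i in intensities)
        denominator += sum(intensities)
    return numerator / denominator


def second_moment(intensities: List[float]) -> float:
    """<I^2>/<I>^2 for acentric reflections of one shell (~2.0 untwinned, ~1.5 perfect twin)."""
    n = len(intensities)
    mean = sum(intensities) / n
    return (sum(i * i for i in intensities) / n) / (mean * mean)


if __name__ == "__main__":
    obs = [((1, 0, 0), 100.0), ((1, 0, 0), 110.0),
           ((0, 1, 0), 50.0), ((0, 1, 0), 50.0),
           ((2, 0, 0), 7.0)]  # measured once, excluded from Rsym
    print(f"Rsym = {r_sym(obs):.3f}")
```

In practice both quantities are reported per resolution shell, which is how an elevated Rsym confined to the lowest-resolution shell, as described here, becomes visible.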

TFE Forms an Extended Winged Helix Domain
The crystal structure shows that the N-terminal TFE domain adopts a winged helix fold (Fig. 2C), which is frequently found in transcription factors and other nucleic acid-binding proteins (56-58). To detect similarities of the TFE structure with known molecular structures, we submitted the atomic coordinates to the DALI server (59). This analysis revealed that most proteins with significant structural similarity to TFE are DNA-binding winged helix proteins, including transcriptional regulators and enzymes involved in nucleic acid metabolism. Also among the 40 proteins with the highest DALI scores were histone H5, the catabolite activator protein, domains 2 and 4 of the bacterial σ factor, and the winged helix domains of the TFIIF Rap30 subunit and TFIIEβ. The winged helix fold of TFE comprises three α-helices and three β-strands in the canonical order α1-β1-α2-α3-β2-β3 (Fig. 2, B and C). Conserved residues within helices α1-α3 form the tightly packed hydrophobic core of the winged helix domain. Among these residues are several leucines that are repeated in the TFE sequence with a spacing of seven residues, which previously suggested a possible leucine zipper structure that could confer dimerization (38).
FIG. 1. Experimental electron density maps. Depicted are three regions of the initial experimental electron density map obtained by MAD phasing (blue, contoured at 1σ) with the final model superimposed (yellow). The location of methionine side chains coincides with peaks in a selenium anomalous difference Fourier map (red, contoured at 4σ). The selenium anomalous difference Fourier was calculated with anomalous differences measured at the selenium peak wavelength and with phases from the final model.
FIG. 2. Structure of the TFE winged helix domain. A, domain architecture of S. solfataricus TFE and human TFIIEα. A potential zinc-binding region (Zn), a helix-turn-helix motif (HTH), an alanine-rich region (Ala), a region rich in serines, threonines, aspartates, and glutamates (STDE), and an acidic region (Acidic) are indicated. Regions in TFIIEα that are required for basal transcription, interaction with TFIIEβ, and stimulation of C-terminal repeat domain kinase activity (39) are indicated with gray lines. B, alignment of the TFE winged helix domain amino acid sequence with corresponding regions in other archaeal TFE sequences and in the human and yeast TFIIEα subunits. Secondary structure elements are indicated above the sequences (α-helices, magenta cylinders; β-strands, cyan arrows). Marked are residues in the hydrophobic core, residues forming the hydrophobic patch on helix α0, arginine 51, and three conserved acidic residues; residues that were mutated to methionines for structure determination are marked below the alignment. The amino acid sequence alignment of archaeal TFE homologs was produced with ClustalW (65). Sequences of human and yeast TFIIEα were adjusted manually, aided by secondary structure prediction (PHD (64)). The graphic was prepared with ALSCRIPT (71). C, ribbon representation of the structure of the TFE winged helix domain. Secondary structure elements are colored and labeled according to B. The graphics were prepared with programs including BOBSCRIPT (72).
Table I footnotes: a …(46); native data were processed with DENZO and SCALEPACK (45). b Calculated with the CCP4 program TRUNCATE. c Calculated with the CCP4 program CHOOCH for peak and inflection data sets; estimated for the remote data set. d MAD phasing statistics were calculated with program SOLVE (47,48). e Refinement statistics were calculated with program CNS (52).
A specific feature of the structure is the extension of the canonical winged helix fold at the N and C termini by the additional helices α0 and α4, respectively (Fig. 2C). Hydrophobic residues from the additional helix α0 extend the hydrophobic core of the winged helix domain, and helix α0 is tightly packed against the canonical winged helix fold. Helix α4 comprises only one turn and is ordered only until residue 88, although residues 89-110 are present in the crystals and secondary structure prediction suggests that the helix extends until residue 110. Because the crystal packing would allow for extension of helix α4 and its protrusion into the solvent region, it is likely that residues 89-110 extend helix α4 but are not observed due to disorder in the crystal. Truncation of the TFE C terminus at residue 97 results in an insoluble protein variant, possibly because helix α4 is disrupted. A protruding helix α4 could form a semirigid linker to the C-terminal zinc-binding domain of TFE, or it could be involved in TFE dimerization, although there is no evidence for dimerization in the crystals.
Unusual Surface Features and Potential Interaction Sites-The extended strands β2 and β3 form a protruding β-hairpin. The β2-β3 connecting loop is partially disordered and corresponds to wing 1 of the canonical winged helix fold (Fig. 2C). The wing 1 protrusion in TFE is unusually long and has a basic and an aromatic face, formed by residues in strands β2 and β3, respectively. The aromatic residue Trp-75 in strand β3 is fully solvent-exposed and may play a functional role. Trp-75 is involved in crystal packing and influences the position of the wing 1 protrusion. The aromatic face of the wing 1 protrusion and helix α3 line a negatively charged groove (Fig. 2, D and E). The molecular surfaces of the groove, the wing 1 protrusion, and the central part of helix α3 are conserved among archaeal TFE proteins (Fig. 2, B and D), suggesting that this side of the domain is involved in functionally relevant interactions. Several salt bridges are formed on the domain surface, which could contribute to protein stability at the high temperature and low pH typical of the growth environment of S. solfataricus.
There are two hydrophobic surface patches that may serve as protein interaction sites. The larger of these patches is formed by three conserved residues on the exposed side of the additional helix α0, located about 30 Å away from the groove (Fig. 2, D and E). A smaller hydrophobic patch is found on the side of helix α3 opposite the groove. This patch is formed by Leu-43 and Ile-45, which are partially buried and also contribute to the hydrophobic core (Fig. 2D).

TFE Is Not a Typical DNA-binding Winged Helix Protein
The structural similarity to nucleic acid-interacting proteins suggested that the TFE winged helix domain could bind nucleic acids. To explore this possibility, we compared the TFE structure to known structures of winged helix protein·DNA complexes (one is shown in Fig. 3). Most DNA-binding winged helix domains studied to date interact with DNA via the "recognition" helix α3 and via wing 1, and generally show a continuous positively charged surface on the DNA-binding side (57). Examples of such typical DNA-binding winged helix proteins are HNF-3γ (60), an E2F-4 fragment (61), and BmrR (62). The winged helix domain of hRFX1 also uses helix α3 and wing 1 for DNA contacts, but the DNA is positioned differently (63). The side of the TFE domain that corresponds to the DNA-binding face of typical winged helix proteins shows considerable surface conservation, in agreement with a possible role in DNA binding (Fig. 2D). Helix α3 is positively charged, as required for DNA binding (Fig. 2E). Within this helix, Arg-51 is conserved in all TFE and TFIIEα sequences except the yeast sequence (Fig. 2B), and a DNA-binding arginine is found at a corresponding location in the winged helix domain of the transcriptional activator BmrR (62). However, the groove between helix α3 and the wing 1 protrusion, which typically accommodates the DNA backbone, is negatively charged in the TFE domain. In particular, the entrance to the groove is formed by a conserved acidic patch in loop β1-α2 (Fig. 2D). Furthermore, wing 1 in TFE is unusually long in comparison with typical DNA-binding winged helix domains, and it is even longer in TFIIEα (Fig. 2B). These comparisons predict that the extended wing 1 and the acidic groove interfere with DNA binding and make it unlikely that TFE binds DNA like a typical DNA-binding winged helix protein, despite some features that suggest this.
To test directly whether the TFE winged helix domain binds nucleic acids in vitro, we carried out electrophoretic mobility shift assays with purified recombinant TFE (see "Experimental Procedures"). We tested various types of synthetic nucleic acid constructs for TFE binding, including single-stranded DNA, a double-stranded blunt-end DNA, a double-stranded DNA with a single-strand overhang, and a DNA mismatch bubble. In these experiments we could not detect significant DNA binding by the TFE winged helix domain, even when a 200-fold molar excess of TFE was added to 100 pM nucleic acids (not shown). It is, however, possible that cooperative interactions within the transcription initiation complex allow the TFE domain to contribute to promoter DNA binding.
Conservation of the Winged Helix Domain in Eukaryotic TFIIEα-Archaeal TFE sequences are highly homologous and can be aligned easily, but aligning them with the corresponding eukaryotic TFIIEα sequences is difficult, because the sequence conservation is weak. However, the TFE structure and secondary structure prediction for eukaryotic TFIIEα (PHD (64)) suggested how to modify a ClustalW-generated (65) alignment of TFE with human and yeast TFIIEα sequences (Fig. 2B). The modified alignment was then used to produce homology models of the human and yeast TFIIEα winged helix domains. Inspection of the modeled structures revealed reasonably packed hydrophobic cores and no severe side chain clashes, supporting the correctness of the alignments. Residues of the TFE winged helix domain that are identical in human TFIIEα include Leu-24, Ile-26, Leu-27, Leu-43, Leu-54, and Phe-62, which participate in the extended hydrophobic core.
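Because the TFE-TFIIEα similarity is weak, it helps to be explicit about how identity over an alignment is counted and how identical positions are mapped back to residue numbers, as in the list of conserved core residues above. One common convention, sketched below in Python, scores identical residues over alignment columns in which neither sequence has a gap; other conventions divide by the length of the shorter sequence or of the whole alignment and give different numbers for the same alignment. The function names and the demonstration sequences are hypothetical and are not the TFE or TFIIEα sequences.

```python
from typing import List


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Identity (%) over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if x != gap and y != gap]
    if not columns:
        return 0.0
    identical = sum(1 for x, y in columns if x == y)
    return 100.0 * identical / len(columns)


def identical_positions(aligned_a: str, aligned_b: str,
                        start_a: int = 1, gap: str = "-") -> List[int]:
    """Residue numbers (in sequence A's numbering) that are identical in sequence B."""
    hits: List[int] = []
    pos = start_a - 1
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap:
            continue              # gap in A: no residue number is consumed
        pos += 1
        if y != gap and x == y:
            hits.append(pos)
    return hits


if __name__ == "__main__":
    a, b = "MKLV-LE", "MRLVALE"   # invented toy alignment
    print(percent_identity(a, b), identical_positions(a, b))
```

A low identity score on its own says little about whether an alignment is correct at this level of similarity, which is why the structural checks described above (core packing, absence of clashes) carry most of the weight.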
The TFIIEα homology models also reveal that key surface features of archaeal TFE are apparently conserved in eukaryotic TFIIEα. First, the surface charge distribution of S. solfataricus TFE from helix α3 to strand β3 is conserved in yeast and human TFIIEα (not shown). Second, residues Glu-37, Arg-51, and Asn-55, located in the groove between wing 1 and helix α3, are identical between TFE and human TFIIEα. Third, the two hydrophobic surface patches of TFE are conserved in eukaryotic TFIIEα, including the three exposed hydrophobic residues in helix α0 and the invariant Gly-17 in the subsequent loop. Because conservation of the hydrophobic patch on helix α0 cannot result from a structural requirement, this patch is likely a protein docking site in all TFE and TFIIEα winged helix domains. Taken together, this analysis shows that the TFE structure is a valid model for the corresponding domain in eukaryotic TFIIEα and suggests that the described key surface features are functionally relevant.
Comparison with Other Winged Helix Domains in General Transcription Factors-Our finding extends the number of winged helix domains in general transcription factors to a total of four, one in each subunit of the TFIIE and TFIIF heterodimers. An 80-residue domain in TFIIEβ adopts the winged helix fold (37) and shows sequence similarity to the C-terminal DNA-binding domain of the small TFIIF subunit Rap30, which also adopts the winged helix fold (22). Another winged helix domain is found at the C terminus of the large TFIIF subunit Rap74 (20). This domain forms a complex with a C-terminal helical portion of the pol II-specific phosphatase Fcp1 (18).
FIG. 3 (caption fragment). …(20)), TFIIF Rap30 (PDB code 1BBY (22)), and E2F-4 (PDB code 1D8K (61)). The view is as in Fig. 2C, top.
Superposition of the TFE/TFIIEα winged helix domain with the three winged helix domains described above shows that helices α0 and α4 are unique features of the TFE/TFIIEα domain (Fig. 3). The length of wing 1 decreases in the order TFE/TFIIEα > TFIIF Rap30 > TFIIF Rap74 > TFIIEβ. The interaction sites of the four winged helix domains in general transcription factors also differ. The DNA-binding faces of TFIIF Rap30 and TFIIEβ are located on opposite sides of the fold (22,37). The Fcp1 peptide binds the TFIIF Rap74 winged helix domain in the groove between helix α3 and wing 1 (Fig. 3). The same groove in TFE may be used for protein interactions, but additional putative interaction sites exist (see above). Taken together, this comparison demonstrates that the winged helix fold is very versatile, even among the four domains in general transcription factors.
Comparison with the Bacterial σ Factor-It was noted that TFIIEα shows weak sequence similarity to regions 2 and 4 of the bacterial σ factor (38). Structures of σ domains have been solved (66,67), and structures of σ bound to the bacterial polymerase core and to the polymerase core and promoter DNA have also been determined recently (68-70). Superposition of the TFE domain onto σ domains 2 and 4 shows that the conservation is generally limited to important residues in the hydrophobic core of the domains (not shown). The σ domains contain neither a wing nor an equivalent of helix α0. In the context of the bacterial polymerase·σ·DNA complex, σ domain 2 binds to DNA near the point of melting, and σ domain 4 binds to the promoter element at position −35. Superposition of the TFE domain on the two σ domains in the context of the polymerase and DNA shows clear differences in residues in the vicinity of DNA. It is thus unlikely that the TFE/TFIIEα domain contacts promoter DNA in the archaeal and eukaryotic systems in the manner of σ domains 2 and 4 in bacteria.
This comparison suggests that the function of the TFE winged helix domain does not directly correspond to that of a σ domain. Instead, the sequence homology between the TFE domain and the two σ regions apparently reflects the conservation of structural residues in a related helical fold. We recently reported that a similarly weak sequence homology between σ and the pol II subunit Rpb4 does not reflect a functional equivalence either (26). Despite differences in structure, key functional aspects of protein factors involved in transcription could nevertheless be similar between eukaryotes and bacteria.
Overall Structure of TFIIE in the Initiation Complex-Our results are consistent with a model of the TFE/TFIIEα winged helix domain as a bridging factor or adapter between TBP, the polymerase, and possibly promoter DNA. TFE interacts physically with both TBP and archaeal RNA polymerase (41). These protein interactions may be mediated by the conserved surface features of the TFE winged helix domain, including the two hydrophobic surface patches and the negatively charged groove. TFE can promote transcription, apparently by stabilizing the TBP·DNA interaction (41), suggesting that TFE could contact DNA. Although our results apparently exclude a canonical DNA interaction, the TFE domain shows some features of DNA-binding winged helix domains that may contribute to context-dependent promoter DNA binding.
The adapter function of the winged helix domain may be the basic conserved function of TFE/TFIIE, because the winged helix domain is common to archaea and eukaryotes. In contrast, the C-terminal region of TFIIEα and the TFIIEβ subunit occur only in eukaryotes and appear to play eukaryote-specific roles, as reflected by their cooperation with TFIIF and TFIIH, factors that do not exist in archaea. Whereas the TFE/TFIIEα winged helix domain could bind between TBP and pol II, the TFIIEβ subunit may bind to promoter DNA in and around the early transcription bubble (35-37). Open questions on the TFIIE structural mechanism include the basis of TFIIE subunit dimerization and the exact location of the TFIIE domains with respect to other components of the transcription machinery.