The C-terminal RpoN Domain of σ54 Forms an Unpredicted Helix-Turn-Helix Motif Similar to Domains of σ70*

The “σ” subunit of prokaryotic RNA polymerase allows gene-specific transcription initiation. Two σ families have been identified, σ70 and σ54, which use distinct mechanisms to initiate transcription and share no detectable sequence homology. Although the σ70-type factors have been well characterized structurally by x-ray crystallography, no high resolution structural information is available for the σ54-type factors. Here we present the NMR-derived structure of the C-terminal domain of σ54 from Aquifex aeolicus. This domain (Thr-323 to Gly-389), which contains the highly conserved RpoN box sequence, consists of a poorly structured N-terminal tail followed by a three-helix bundle, which is surprisingly similar to domains of the σ70-type proteins. Residues of the RpoN box, which have previously been shown to be critical for DNA binding, form the second helix of an unpredicted helix-turn-helix motif. The homology of this structure with other DNA-binding proteins, combined with previous biochemical data, suggests how the C-terminal domain of σ54 binds to DNA.

Transcription, the synthesis of RNA from double-stranded DNA, is a fundamental process in all forms of life. The primary protein complex catalyzing transcription is composed of five subunits, α2β′βω, called "core RNA polymerase" (RNAP). Core RNAP is fully competent to synthesize RNA from DNA. However, the initiation of transcription at specific DNA sequences requires additional protein(s). In bacteria, these additional proteins are called the "σ factors" (1,2). The σ factor facilitates transcription at specific DNA sequences by 1) binding to the core RNAP to form the σ-RNAP holoenzyme, 2) recognizing and binding to a specific DNA sequence next to the transcription start site, called the promoter element, and 3) opening the double-stranded DNA to initiate transcription.
Based on protein sequence homology, σ factors can be grouped into two classes, σ70 and σ54, which were named for the molecular weights of the first members identified (reviewed in Ref. 3). The σ70-type factors are the most abundant in bacteria (reviewed in Refs. 4,5). They include the primary σ factors, such as σ70 from Escherichia coli and σA from Gram-positive bacteria, which regulate transcription for most genes required for normal exponential growth. Other members of the σ70 family (σ28, σ32, etc.) regulate the transcription of more specialized genes that are required to respond to environmental changes. For example, σ28 from Salmonella typhimurium controls expression of genes required for flagellar assembly (6).
The σ54-type factor, which is also called σN and is encoded by the rpoN gene, has no detectable sequence homology to σ70-type factors. Its occurrence is widespread among bacteria, but there is usually only one σ54 gene present in a particular organism (7). σ54 regulates gene transcription for proteins required for many important cellular functions, such as nitrogen metabolism, development, phage shock responses, and pathogenicity. For example, σ54 has been shown to be required for mammalian infection by Borrelia burgdorferi, the agent of Lyme disease (8).
The most striking difference between σ54 and σ70 factors is how they regulate transcription initiation. Once σ70-RNAP holoenzyme binds to the promoter of a gene (at conserved sequences −10 and −35 base pairs upstream of the transcription start site), it can open DNA and transcription can begin spontaneously. In contrast, when σ54-RNAP holoenzyme binds to a promoter (at consensus sequences −12 and −24 base pairs upstream of the transcription start site), it remains in the closed, inactive state. To open the double-stranded DNA and initiate transcription, σ54-RNAP holoenzyme requires interaction with an activator protein (reviewed in Ref. 9), which binds ~150 base pairs upstream of the promoter (10). Once stimulated by a physiological signal, such as phosphorylation or ligand binding, the activator uses the energy of ATP hydrolysis (via a conserved AAA+ ATPase domain) to remodel the σ54-RNAP promoter complex into an active conformation that is capable of initiating transcription (7).
The σ70-type factors have been well characterized structurally (11-14). X-ray crystal structures of individual domains of σ70 and of σ70-RNAP holoenzyme show that σ70-type factors are composed of four helical domains (σ1, σ2, σ3, σ4) connected by flexible linkers (15,16). These four domains are also functionally distinct: σ1 is involved in regulating the kinetics of transcription initiation; σ2 binds to the −10 promoter element and is essential for melting the DNA; σ3 stabilizes open complex formation by binding the extended −10 element; and σ4 binds specifically to the −35 promoter element (reviewed in Ref. 17).
Although information about the overall shape of σ54 has come from scattering and electron microscopy data (22,23), no high resolution structural information is available for σ54. The lack of sequence homology between σ54 and σ70 prevents homology modeling of σ54 domains based on the structures of σ70 domains. Therefore, to understand the mechanism of σ54-dependent transcription initiation, high resolution structures of σ54 domains are required.
Here we present the NMR-derived structure of a C-terminal fragment of σ54 (residues Thr-323 to Gly-389) from the hyperthermophilic bacterium Aquifex aeolicus. This domain, which includes the signature RpoN box motif, consists of a poorly structured N-terminal tail followed by an unpredicted helix-turn-helix motif. Surprisingly, this domain is structurally similar to the σ3 and σ4 domains of σ70-type proteins, despite their low sequence homology. Structural homology with other DNA-binding proteins suggests how this protein may bind the −24 promoter element. Because of the high sequence conservation of this region among σ54 proteins (Fig. 1), it is likely that this domain has a similar fold in all species, including the most highly studied σ54 proteins from E. coli and Klebsiella pneumoniae.

MATERIALS AND METHODS
Sequence Analyses-σ54 protein sequences from a variety of species were downloaded from the National Center for Biotechnology Information web site (www.ncbi.nlm.nih.gov/) and then aligned using the ClustalW (24) web server at the European Bioinformatics Institute (www.ebi.ac.uk/clustalw/). Secondary structure prediction for the A. aeolicus σ54 protein sequence was performed using the neural network Jnet (25) via the Jpred web server (www.compbio.dundee.ac.uk/~www-jpred/) (26).
Plasmid Construction-Six unique constructs of the C-terminal region of σ54 from A. aeolicus were cloned by PCR with primers (Operon) containing NdeI and BamHI restriction sites at the 5′- and 3′-ends, respectively. The fragments were amplified from a plasmid containing the full-length σ54 gene (courteously provided by Gongyi Zhang from the National Jewish Medical Center, Denver, CO). The amplified fragments were digested with NdeI and BamHI, purified by gel electrophoresis, and ligated into the pET21a expression vector.
Resonance Assignments-NMR data were collected at 298 K on a Bruker DRX 600 MHz spectrometer. All data were processed with NMRPipe (28) and analyzed with NMRView (29). Small amounts of proteolysis were detected as new peaks in the 15N-HSQC spectrum after 5 days at 298 K. Therefore, a fresh protein sample was used for each three-dimensional spectrum.
Backbone assignments were obtained using the three-dimensional triple resonance experiments HNCA, CBCA(CO)NH, and 15N-NOESY-HSQC. Side chain assignments were obtained from the three-dimensional experiments 15N-TOCSY-HSQC, C(CO)NH, and HCCH-TOCSY, as well as the two-dimensional proton DQF-COSY experiment for the aromatic proton assignments.
Distance Restraints-Backbone dihedral restraints were obtained from the backbone chemical shifts (HN, HA, CA, and CB) using the program TALOS (30). 3J(HNHA) coupling constants were obtained from an HNHA spectrum (31). Nuclear Overhauser effect (NOE) distance constraints were derived by manual assignment of cross-peaks in three-dimensional 15N-NOESY-HSQC and 13C-NOESY-HSQC spectra, both with 100-ms mixing times. Hydrogen bond restraints were defined from slowly exchanging amide protons. The protein was exchanged into 95% D2O using an Amicon Ultra centrifugal concentrator (5-kDa cutoff; Millipore). Amide protons with strong cross-peak intensities in the 15N-HSQC after 3 h at 298 K were identified as significantly protected. Stereo-specific assignments of the methyl groups of valine and leucine residues were obtained from a 13C-HMQC spectrum of a 10% 13C-labeled protein sample as described in Ref. 32. Only residues with totally unambiguous methyl assignments were used for stereo-specific assignments.
Structure Calculations-Structure calculations were performed with the program DYANA (33). Initial structures were calculated using only unambiguous NOESY peak assignments. During the later stages of assignment (backbone r.m.s.d. <1.5 for residues 21-67), the structure with the lowest energy was used to filter possible assignments based on distances. The program MOLMOL (34) was used to analyze molecules throughout the assignment process. Structure statistics, including PROCHECK analysis (35), are summarized in TABLE ONE.

RESULTS
Sequence Analysis and Protein Construct Design-To help identify a subdomain of the σ54 protein suitable for structure determination by NMR, the A. aeolicus σ54 protein sequence was aligned with 38 other σ54 proteins using the program ClustalW (24). This alignment, which includes σ54 proteins from 11 of the most studied species, is shown in Fig. 1. Visual examination of the sequence alignment indicated that the C-terminal end (~288-398 in A. aeolicus and 360-477 in E. coli) was the most conserved region. This fragment of σ54 has been shown to be sufficient for binding to double-stranded DNA (19).

FIGURE 2. [...] σ54. Only the well structured region, Ala-335 to Gly-389 (E. coli 407-466), is shown. Helix 3, which is formed by the RpoN box, is in front in dark blue. B, protein sequence alignment of the A. aeolicus and E. coli σ54C fragment. Conserved residues are highlighted in blue; the highly conserved RpoN box is underlined in red. α-Helix secondary structure is indicated above in red. Below the sequence alignment (from top to bottom) are the chemical shift index analysis of the CA atoms; chemical shift index analysis of the HA atoms; schematic representation of the sequential i, i+1 NOE connectivities involving NH atoms; schematic representation of the medium-range i, i+3 NOE connectivities involving HA, HN atoms, respectively; and relative peak intensities in the 1H-15N HSQC spectrum. For the NOE plots, the bar is on the residue containing atom i and the height of the bar is inversely proportional to the maximum distance between the atoms.
Using the secondary structure prediction of the A. aeolicus σ54 protein as a guide, six different constructs of this region were cloned and expressed in E. coli. Constructs were created using PCR with three different N-terminal primers and two different C-terminal primers, shown in Fig. 1. Only one of the six fragments expressed in E. coli with high levels of soluble protein. This fragment, shown with ** above the primers in Figs. 1 and 2B, starts at Thr-323, just upstream of a pair of conserved phenylalanines (329, 330), and ends at Gly-389, just downstream of the highly conserved RpoN box motif (377-386; boxed in Fig. 1). For this part of σ54, the A. aeolicus protein sequence is 36% identical to the E. coli protein sequence. Here, we present the solution structure of this 67-residue (7.9 kDa) fragment, which we refer to as "σ54C." As shown in Fig. 1, this fragment does not include the region predicted to form a helix-turn-helix motif, residues 292-313 in A. aeolicus and 366-386 in E. coli (gray box above the sequence alignment in Fig. 1).
Structure Determination-The σ54C fragment gave an excellent 1H-15N HSQC spectrum with highly dispersed cross-peaks (supplemental Fig. S1), indicating that the majority of the protein fragment was stably folded. The 1H-15N HSQC spectrum contained cross-peaks for all residues except for the first three N-terminal residues (Thr-323, Tyr-324, Ser-325) and the single proline (Pro-359). 1H, 15N, and 13C resonance assignments for the σ54C domain were made using standard three-dimensional NMR techniques (37). In total, the chemical shifts for 97% of the backbone atoms (N, NH, CA, HA) and 94% of side chain atoms (aliphatic/aromatic C and H) were assigned.
Even before we started calculating structures, three properties of the NMR data indicated that the N-terminal portion of σ54C from Leu-326 to Thr-339 was poorly structured and dynamic on a time scale of microseconds to seconds. First, in all two- and three-dimensional spectra, diagonal and cross-peak intensities for atoms in this N-terminal region were significantly weaker than for atoms in other parts of the protein. For example, in the 1H-15N HSQC the average relative peak height for residues Leu-326 to Thr-339 is 24% less than the average intensity for residues Gln-340 to Gly-389 (Fig. 2B, last row). Second, the chemical shift index (38) for this region indicated little secondary structure beyond Leu-338 (Fig. 2B, first two rows). Third, only a small number of interresidue NOE cross-peaks were present in the three-dimensional 13C-NOESY-HSQC for this region of the protein.
Nine hundred fifty-six unique distance restraints (TABLE ONE) were obtained from the three-dimensional 15N-NOESY-HSQC and 13C-NOESY-HSQC spectra. An additional 108 φ/ψ torsion angle restraints were calculated from HN, HA, CA, and CB chemical shift values (30) and from 3J(HNHA) coupling constants (31) (TABLE ONE). These 1064 restraints (16 restraints/residue) were used to calculate 150 structures. The 20 structures with the lowest energy were chosen to represent the structure of the C-terminal domain of the σ54 protein from A. aeolicus (Fig. 2A).
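The restraint bookkeeping in the preceding paragraph can be checked with a few lines of Python. This is only an illustrative sketch: the variable names are ours, and dividing by all 67 residues of the construct (Thr-323 to Gly-389) is an assumption about how the per-residue figure was obtained.

```python
# Restraint counts quoted in the text (TABLE ONE).
distance_restraints = 956
torsion_restraints = 108

# Assumption: the per-residue figure uses the full construct, Thr-323 to Gly-389.
first_residue, last_residue = 323, 389
n_residues = last_residue - first_residue + 1  # inclusive residue count

total_restraints = distance_restraints + torsion_restraints
restraints_per_residue = total_restraints / n_residues

print(total_restraints, n_residues, round(restraints_per_residue, 1))
```

Under that assumption the ratio is just under 16 restraints per residue, which matches the rounded value given in the text.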
Structure of the C-terminal Domain of σ54-For the well structured region (Gln-340 to Leu-388) of σ54C, the superposition of the final 20 structures has an r.m.s. deviation from the mean structure of 0.56 ± 0.07 Å for the backbone atoms and 1.24 ± 0.09 Å for all heavy atoms. Overall these structures have good statistics and geometry, as summarized in TABLE ONE. This C-terminal portion of σ54C folds into a compact domain consisting of three α-helices (Fig. 2). Helix 1 (Gln-340 to Asn-353) is followed by a long loop, which starts at the conserved Glu-354 and ends at the conserved Ser-361. Helix 2 (Asp-362 to Leu-369) and helix 3 (Arg-379 to Leu-388), which includes the highly conserved RpoN box sequence, form a helix-turn-helix motif that is commonly found in many DNA-binding proteins (reviewed in Ref. 39).
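For readers unfamiliar with how an ensemble r.m.s. deviation "from the mean structure" is typically computed, the following is a minimal sketch in plain Python. It assumes the models have already been superposed on a common frame and reports the mean and sample standard deviation of the per-model values; the function names and the toy coordinates are hypothetical, and this is not a description of the software used to produce the values reported here.

```python
import math

def mean_structure(ensemble):
    """Average each atom's coordinates over all models."""
    n_models = len(ensemble)
    n_atoms = len(ensemble[0])
    return [
        tuple(sum(model[a][k] for model in ensemble) / n_models for k in range(3))
        for a in range(n_atoms)
    ]

def rmsd(model, reference):
    """Root-mean-square deviation between two equally sized coordinate lists."""
    squared = sum(
        (p[k] - q[k]) ** 2
        for p, q in zip(model, reference)
        for k in range(3)
    )
    return math.sqrt(squared / len(model))

def ensemble_rmsd_to_mean(ensemble):
    """Return (mean, sample standard deviation) of per-model RMSDs to the mean structure."""
    reference = mean_structure(ensemble)
    values = [rmsd(model, reference) for model in ensemble]
    average = sum(values) / len(values)
    variance = sum((v - average) ** 2 for v in values) / (len(values) - 1)
    return average, math.sqrt(variance)

# Toy ensemble: two models of two atoms, assumed to be superposed already.
toy = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)],
]
avg, sd = ensemble_rmsd_to_mean(toy)
print(avg, sd)
```

In practice the superposition step (for example, a least-squares fit over the backbone atoms of the well structured region) matters as much as the averaging, so the sketch covers only the final arithmetic.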
A color-coded image representing the electrostatic surface of σ54C (residues Gln-340 to Leu-388) shows that the molecule has a striking polarity. It has a positively charged face primarily composed of the loop in the helix-turn-helix motif (loop 2) and the top of helix 3 (Fig. 3C, right). The other side of the molecule is negatively charged, due to negative residues in loop 1 and on helix 1 (supplemental Fig. S2).
A search for proteins structurally homologous to σ54C (residues Gln-340 to Leu-388) using the DALI server identified many DNA-binding proteins. [...] Fig. 3B, left and right, respectively.
As described above, the N-terminal segment from Ser-325 to Thr-339 is poorly structured and probably dynamic on a microsecond-to-second time scale. This type of dynamics causes line broadening, thus making less structural information available for this region. Because of the poor chemical shift dispersion for backbone and side chain atoms (Fig. 2B, top two rows) and the low number of medium- and long-range NOE cross-peaks available, only ten long-range NOE cross-peaks could be assigned for this region. These cross-peaks indicate that Ala-335 and Leu-338 interact with the N-terminal end of helix 1 primarily via Leu-343 and with the loop between helices 2 and 3 primarily via Phe-374. For example, the aliphatic hydrogens on Leu-338 interact with the amide and aliphatic hydrogens on Leu-343 and with the aromatic hydrogens on Phe-374.

DECEMBER 16, 2005 • VOLUME 280 • NUMBER 50

Structure of the RpoN Domain of σ54
Structural Role for Conserved Residues-The hydrophobic core connecting the three helices is composed of highly conserved residues. For example, Ile-347 in helix 1 makes van der Waals interactions with Ile-365 in helix 2 and Val-381 in helix 3. All three of these residues are isoleucine, leucine, or valine in all 38 σ54 protein sequences examined (11 of these sequences are shown in Fig. 1). Other residues making crucial van der Waals interactions between the helices are Leu-343, Met-344, and Ile-350 in helix 1, Ala-366, Ile-368, and Leu-369 in helix 2, and Tyr-384 and Leu-388 in helix 3.
Many of the residues creating the charge on the protein surface are conserved among σ54 proteins. The highly conserved RpoN box (Ala-377 to Glu-386) forms the majority of helix 3, the second helix in the helix-turn-helix motif. This helix is amphipathic, with the hydrophobic face packing against the hydrophobic core described above. The hydrophilic side is surface exposed. This helix is primarily positively charged due to the side chains of Arg-378, Arg-379, Lys-383, and Arg-385, but the C-terminal part of the helix has negative charge due to the side chain of Glu-386 (Fig. 3C, right).

FIGURE 3. A, ribbon diagram of the σ54C domain (residues Ala-335 to Gly-389). Side chains for residues that were previously shown to be near the −24 promoter element (50) and/or to significantly decrease DNA binding when mutated (48) are represented as sticks and labeled (except for Tyr-384, which is hidden behind Lys-383). All of these residues are on helix 3 (dark blue), which is formed by the highly conserved RpoN box, except for Met-343 (Arg-421 in E. coli), which is on helix 1. B, superposition of the CA backbones of the σ54C fragment (blue) with the σ4 domain of the σ70-type protein σ28 [...]

DISCUSSION
When fragments of σ70 were crystallized, they were found to be small helical domains (15). The subsequent structure of σ70 (regions 2-4) in complex with core RNA polymerase showed that these domains are modular, connected by flexible linkers, which interact with other parts of polymerase and DNA (16,42). Sequence analysis and biochemical experiments suggest that σ54 may also be composed of such domains (43,44). Here we present the first structure of a σ54 domain, the C-terminal segment of the protein, which is critical for DNA binding and contains the signature "RpoN box" sequence (Fig. 1).
Residues Gln-340 to Leu-388 of A. aeolicus σ54 form a compact, three-helix domain that includes a helix-turn-helix motif. A search of the Protein Data Bank using the DALI server found that this region of σ54 is structurally similar to many DNA-binding domains (supplemental Table S2). Of the ten structures with the highest Z scores, seven are known to interact with DNA. For two of these domains, the C-terminal part of the PAX6 homeodomain (PDB accession number 6PAX; Ref. 40) and the N-terminal domain of the methicillin repressor protein (PDB accession number 1SAX; Ref. 45), the structures of the protein-DNA complexes are known (40,46). These structures show that the proteins interact primarily with the major groove of DNA via the second helix of the helix-turn-helix motif, termed the "recognition helix." The equivalent helix in the σ54C domain is created by residues of the highly conserved RpoN box motif (helix 3; Figs. 2B and 3A). Thus, residues in the RpoN box are structurally poised to interact with the major groove of the σ54 promoter element.
Consistent with the interpretation of the σ54C domain as a DNA-binding domain, previous footprinting and gel shift assays have shown that the region near the RpoN box is critical for σ54 promoter recognition and DNA binding (19,47,48). Mutations in the RpoN box sequence are far more damaging to DNA binding than mutations in other parts of E. coli σ54 (21,48). All five mutations made in the RpoN box sequence reduced binding to both duplex and fork junction promoter DNA, with four mutant proteins (K. pneumoniae R455A, R456A, Y461A, and R462A) showing ≤20% of the DNA binding of wild-type σ54 protein. In addition, these four mutant proteins had dramatically reduced transcription activity in vitro, with ≤30% of the activity of the wild-type protein. Thus, these residues in the RpoN box are important for DNA binding and σ54-dependent transcription. The corresponding residues in A. aeolicus σ54 are shown as sticks in Fig. 3A. Four of these residues (Arg-378, Arg-379, Lys-383, and Arg-385) lie on the hydrophilic face of the recognition helix, suggesting that this face of the helix interacts directly with DNA, as discussed in more detail later in this section.
Additional support for the role of this domain in DNA binding comes from the work of Cannon et al. (47). They found that removing the last 53 residues of K. pneumoniae σ54 (Lys-425 to Val-477; A. aeolicus Lys-346 to Ile-398) eliminated DNase I and 1,10-phenanthroline-copper footprints of σ54 and reduced σ54 holoenzyme binding to 5-bromouracil-substituted DNA by 85%. This truncation cuts the protein in the middle of helix 1 of the σ54C domain, removing loop 2 and the helix-turn-helix motif.
Overall, the C-terminal domain of σ54 is more structurally similar to domain σ3 than to domain σ4 of σ70. σ3 and σ54C are about the same size (55 and 59 residues, respectively), and all three helices are about the same length (Fig. 3B, right). On the other hand, σ4 has an extra helix at the N-terminal end, and its recognition helix is twice as long as the corresponding helix in the σ54C domain (Fig. 3B, left). In terms of surface charge distribution, however, σ54C is more similar to σ4 than to σ3 (Fig. 3C). Both σ4 and σ54C have a positively charged surface created by side chains on the loop and the recognition helix of the helix-turn-helix motif (Fig. 3C, left and middle). For σ4, this surface interacts specifically with the major groove of the −35 promoter element (13,15). For σ54C, this surface is created by the residues of the conserved RpoN box, which probably interact with the −24 promoter element.
Previous biochemical studies suggest that the C-terminal domain of σ54 is more functionally analogous to the σ4 domain than to the σ3 domain of σ70-type factors. σ3 of σ70 proteins interacts with the extended −10 promoter element near the site of melting (42,49), whereas the major role of σ4 of σ70 proteins is to bind to the −35 promoter element (13,15). By tethering a DNA cleavage reagent, FeBABE (p-bromoacetamidobenzyl-EDTA-Fe), to charged residues in the RpoN box of K. pneumoniae σ54, Burrows et al. (50) showed that residues in the RpoN box are near the −24 promoter element, not the −12 promoter element where melting occurs. For example, with the FeBABE tethered to residue E463C (A. aeolicus Glu-386), strong cleavage was seen between the −30 and −25 nucleotides of the template DNA strand.
The N-terminal end of the σ54C domain, residues Ser-325 to Thr-339, is not well ordered and is probably dynamic on the microsecond-to-second time scale. These residues could be part of a linker region that connects two independently folded domains of σ54, as seen with σ70-type factors (15,16). Previous limited proteolysis experiments on the full-length σ54 did not identify this segment as proteolytically sensitive (51). Thus, additional proteolysis studies and NMR analyses on larger fragments of the A. aeolicus σ54 are required to determine whether this region (Ser-325 to Thr-339) is still poorly structured in the presence of the other domains and RNAP.
The C-terminal domain of σ54 presented in this report provides an interesting structural connection between σ54 and σ70. This structural similarity of σ54C to domains of σ70-type sigma factors is surprising (Fig. 3B) because these regions have poor sequence homology. Residues Thr-323 to Gly-389 of σ54C have 14% identity, 25% similarity with residues 186-326 of region 4 of σA from T. aquaticus and 15% identity, 26% similarity with residues 279-326 of region 3 of σ28 from A. aeolicus. A low resolution structure of the full-length σ54 protein suggests that other domains of σ54 may also be more similar to domains of σ70 than previously predicted by only sequence homology (22). High resolution structures of these other domains are needed to confirm this hypothesis.
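As an aside on the percent identity values quoted above, the sketch below shows one common way to compute identity from a pairwise alignment in Python. It is illustrative only: the choice of denominator (columns in which neither sequence has a gap) is an assumption, since conventions differ between programs, the function name is ours, and the two short sequences are hypothetical rather than real σ factor sequences. Percent similarity would additionally require a substitution matrix and is not computed here.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared

# Hypothetical toy alignment, not real sigma factor sequence.
seq_a = "ACDEF-GHIK"
seq_b = "ACDQFWG-IR"
print(percent_identity(seq_a, seq_b))  # 6 identical out of 8 ungapped columns -> 75.0
```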