X-ray Crystal Structure of Escherichia coli RNA Polymerase σ70 Holoenzyme*

Background: A crystal structure of Escherichia coli RNA polymerase (RNAP) has not been determined.
Results: The σ1.1 and α subunit C-terminal domain structures have been determined in the context of an intact RNAP.
Conclusion: σ1.1 localizes within the RNAP DNA-binding channel and must disengage from this site to form an open complex.
Significance: This work enables future structure determination of bacterial RNAP mutants.
Escherichia coli RNA polymerase (RNAP) is the most studied bacterial RNAP and has been used as the model RNAP for screening and evaluating potential RNAP-targeting antibiotics. However, x-ray crystal structures of E. coli RNAP have been limited to individual domains. Here, I report the x-ray structure of the E. coli RNAP σ70 holoenzyme, which shows σ region 1.1 (σ1.1) and the α subunit C-terminal domain for the first time in the context of an intact RNAP. σ1.1 is positioned at the RNAP DNA-binding channel and completely blocks DNA entry to the RNAP active site. The structure reveals that σ1.1 contains a basic patch on its surface, which may play an important role in DNA interaction to facilitate open promoter complex formation. The α subunit C-terminal domain is positioned next to σ domain 4, with a fully stretched linker between the N- and C-terminal domains. E. coli RNAP crystals can be prepared from a convenient overexpression system, allowing further structural studies of bacterial RNAP mutants, including functionally deficient and antibiotic-resistant RNAPs.

RNA polymerase (RNAP) is the central enzyme of gene expression, and all cellular life forms have RNAPs that function as multisubunit protein complexes (multisubunit cellular RNAPs). The common core of the multisubunit RNAPs is composed of five subunits that are conserved in bacteria, archaea, and eukaryotes. Bacterial RNAP is the simplest form of this family (the core enzyme is composed of the αI, αII, β, β′, and ω subunits), whereas in eukaryotes and archaea, RNAP possesses additional polypeptides that form 11- to 15-subunit complexes (1).
In bacteria, one of several different σ factors binds to the core enzyme to form the holoenzyme, which is responsible for recognizing promoter DNA. σ70 in Escherichia coli and SigA in other bacteria belong to the group 1 (primary or housekeeping) σ factor family (2). These σ factors contain distinct regions of highly conserved amino acid sequence and are composed of four domains: σ1.1 (region 1.1), σ2 (regions 1.2-2.4), σ3 (regions 3.0-3.2), and σ4 (regions 4.1-4.2) (3). Group 1 σ factors bind to promoter DNA as part of the holoenzyme; once σ binds to the core enzyme, the σ2, σ3, and σ4 domains are ideally positioned to recognize the −10, extended −10, and −35 promoter DNA sequences, respectively (4,5).
In addition to the σ2, σ3, and σ4 domains, the group 1 family contains an ~100-amino acid N-terminal extension, σ1.1, which is a negatively charged α-helical domain (6). The σ1.1 domain has been shown to accelerate formation of the open complex at some promoters and has been suggested to reside inside the RNAP main channel (7). This channel is positively charged to accommodate nucleic acids in the open complex and the transcription elongation complex. It has been proposed that during open complex formation, signals from DNA may induce opening and closing of the RNAP clamp, causing σ1.1 to be ejected from the RNAP main channel (4,8). Given its flexible nature, σ1.1 has not been resolved in any of the Thermus RNAP holoenzyme crystal structures reported to date (5, 9-12). Only an NMR structure of σ1.1 from Thermotoga maritima has been reported; it consists of three α helices with a compact hydrophobic core formed by highly conserved hydrophobic residues (6).
Since the discovery of RNAP in the early 1960s (13), E. coli RNAP has been the primary model system for understanding the functions of cellular RNAPs, for many reasons. For example, active E. coli RNAP can be conveniently reconstituted in vitro from its individual subunits using either wild-type or mutant proteins (14,15), and its mechanism can be easily probed in vitro in the presence of purified template DNA, σ factors, and transcription factors. A simple and robust E. coli transcription system also makes it an excellent model for single-molecule studies of RNAPs (16).
X-ray crystal structures of bacterial RNAPs have been determined only from the Thermus genus. Because RNAP sequences are highly conserved among all bacterial species, most of the insight derived from Thermus RNAP has been generalized to represent the transcription apparatus in all bacteria (4, 5, 9-12, 17-19). Nevertheless, without a structure of E. coli RNAP, it is difficult to fully interpret the enormous amount of data that have been collected on E. coli RNAP. The structure of E. coli RNAP will also provide new insight into structural domains and motifs, as well as into interactions with ligands (e.g. ppGpp) and antibiotics (e.g. lipiarmycin) that specifically affect E. coli but not Thermus RNAP (20,21). These structural insights are important for identifying their binding sites and understanding their mechanisms of action.

EXPERIMENTAL PROCEDURES
Preparation and Crystallization of the E. coli RNAP Holoenzyme-The polycistronic plasmid pGEMABC was created for overexpressing the rpoA (encoding the α subunit), rpoB (encoding the β subunit), and rpoC (encoding the β′ subunit) genes as follows. The plasmid pGEMA185, which expresses rpoA under the control of an IPTG-inducible T7 RNAP promoter (22), was digested at a BamHI site located downstream of rpoA. A DNA fragment containing the rpoB-rpoC genes was isolated from plasmid pPNE2017³ by BamHI digestion and inserted at the BamHI site of pGEMA185. pGEMABC expresses a single mRNA containing the rpoA-rpoB-rpoC genes.
The RNAP holoenzyme was prepared by adding a 3-fold excess of σ70 to core RNAP, incubating at 30°C for 30 min, and purifying the complex by Superdex 200 column chromatography. Crystals were obtained by hanging-drop vapor diffusion: equal volumes of RNAP holoenzyme solution (~20 mg/ml) and crystallization solution (0.1 M HEPES-HCl (pH 7.0), 0.2 M calcium acetate, and ~15% PEG 400) were mixed and incubated at 22°C over the same crystallization solution. For cryocrystallography, crystals were soaked in crystallization solution containing 25% PEG 400. Selenomethionyl-substituted proteins, including core RNAP and σ70, were prepared by suppression of methionine biosynthesis (23). The crystals belong to a primitive orthorhombic space group (Table 1) and contain two 440-kDa RNAP holoenzymes per asymmetric unit. These two RNAPs have almost identical structures (0.643-Å root mean square deviation after structural alignment on the β′ subunit), with minor deviations in the positions of the σ nonconserved and σ4 domains.
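The 0.643-Å figure above is a root mean square deviation computed after superimposing the two molecules in the asymmetric unit on their β′ subunits. As a minimal sketch of that kind of calculation, assuming matched Cα coordinates have already been extracted into NumPy arrays (the function name `kabsch_rmsd` is hypothetical, and this is not necessarily the algorithm implemented by the alignment program used in this study), a standard Kabsch superposition followed by an RMSD calculation looks like this:

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD (same units as input, e.g. angstroms) between two (N, 3) coordinate
    sets after optimal rigid-body superposition of P onto Q (Kabsch algorithm).

    Row i of P and row i of Q must describe the same atom, e.g. matched
    beta-prime C-alpha atoms from the two copies in the asymmetric unit.
    """
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")
    # Remove translation by centering each set on its centroid.
    P_c = P - P.mean(axis=0)
    Q_c = Q - Q.mean(axis=0)
    # SVD of the 3x3 covariance matrix gives the optimal rotation.
    H = P_c.T @ Q_c
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Rotate P onto Q and average the squared per-atom deviations.
    diff = P_c @ R.T - Q_c
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


# Usage: a rotated and translated copy superimposes perfectly.
rng = np.random.default_rng(0)
coords = rng.normal(size=(100, 3)) * 10.0
theta = 0.7
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta), np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
moved = coords @ rot_z.T + np.array([5.0, -3.0, 12.0])
print(kabsch_rmsd(coords, moved))  # ≈ 0 (floating-point noise only)
```

Superposition is done on one subunit (here, β′) so that differences elsewhere, such as the σ domain positions noted above, show up as deviations rather than being averaged into the fit.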
X-ray Data Collection and Structure Determination-The native data set was collected at Macromolecular Diffraction at the Cornell High Energy Synchrotron Source (MacCHESS) beamline A1 (Cornell University, Ithaca, NY). The data sets of SeMet-labeled crystals were collected at Berkeley Center for Structural Biology (BCSB) beamline 8.2.1 (Lawrence Berkeley National Laboratory, Berkeley, CA). The data were processed with HKL2000 (26). The E. coli core RNAP model (24) was used as the search model for molecular replacement (25). Selenium anomalous scatterer sites were located using phases obtained from molecular replacement. Rigid body refinement was performed, and further adjustments to the model were made manually. The resulting model phases allowed me to position the E. coli σ70 structures (27,28) in the electron density map. Positional refinement with non-crystallographic symmetry and secondary structure restraints was performed using the program PHENIX (29), and deformable elastic network (DEN) refinement was performed using Crystallography & NMR System (CNS) version 1.3 (30). The resulting map allowed segments that were not present in the search model to be built manually in Coot (31). The final coordinates and structure factors have been deposited in the Protein Data Bank under accession code 4IGC.

RESULTS AND DISCUSSION
E. coli σ70 RNAP Holoenzyme Preparation and Crystallization-Endogenous E. coli RNAP can be purified from cells by a combination of RNAP-DNA co-precipitation using Polymin P and column chromatography (32). However, the yield and purity of endogenous E. coli RNAP are inadequate for obtaining high-quality crystals for x-ray crystallography (<1 mg of RNAP per liter of cell culture). Therefore, I developed a co-overexpression plasmid (pGEMABC) that expresses the rpoA (encoding the α subunit), rpoB (encoding the β subunit), and rpoC (encoding the β′ subunit) genes under a single T7 RNAP promoter. This overexpression system drastically improves the yield and purity of RNAP (10 mg of RNAP per liter of cell culture). The σ70 holoenzyme can be prepared by adding recombinant σ70 to core RNAP. Both the core enzyme and the holoenzyme formed crystals, but neither diffracted beyond 10 Å resolution. pGEMABC overexpresses the α, β, and β′ subunits but not the ω subunit; thus, purified RNAP contains a substoichiometric amount of ω. The importance of the ω subunit for RNAP assembly and formation was suggested by a biochemical experiment (33) and by the Thermus RNAP crystal structure, which shows that ω binds the C-terminal tail of the β′ subunit (see Fig. 3b and supplemental Movie S4) (17).
³ N. Fujita and R. E. Glass, personal communication.
To prepare RNAP containing a stoichiometric amount of the ω subunit, all RNAP subunits were co-overexpressed from pGEMABC and pACYCDuet-1_Ec_rpoZ, which overexpresses the ω subunit. The E. coli RNAP holoenzyme was then prepared in vitro by adding σ70; this holoenzyme produced better-quality crystals, which allowed the structure to be determined by x-ray crystallography.
Structure Determination of the E. coli σ70 RNAP Holoenzyme-The crystals contain two 440-kDa RNAP holoenzyme molecules, designated RNAP A and RNAP B, per asymmetric unit. The structure was solved by molecular replacement with an E. coli RNAP core enzyme model (24). After density modification, the resulting electron density map deviated from the molecular replacement solution in several regions: 1) β insert 4 (βi4, residues 225-343, previously named β dispensable region 1/βDR1/SI1), 2) β insert 9 (βi9, residues 938-1042, previously named β dispensable region 2/βDR2/SI2), 3) β insert 11 (βi11, residues 1122-1180, located between β conserved regions H and I), 4) β′ insert 6 (β′i6, residues 942-1129, located in the middle of the highly conserved β′ trigger loop/helix), 5) β′ residues 515-597 (located between β′ conserved regions B and C), and 6) the C-terminal tails of the β′ and ω subunits (Fig. 1; see Fig. 3a). The overall structures of βi4 and βi9 are similar to those in the previously reported E. coli RNAP core enzyme model (24), but their orientations relative to the main body of RNAP differ. The crystal structures of the E. coli σ70 domains (27,28) were manually placed in the Fo − Fc map, resulting in good fits for σ2, σ3, and σ4. Anomalous signals from SeMet sites in both the core enzyme and σ70 were used as guides for model building and refinement.
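For readers less familiar with difference maps, the textbook form of the Fo − Fc difference Fourier synthesis is shown below, where V is the unit-cell volume, h runs over the Miller indices of the measured reflections, |Fo| and |Fc| are the observed and model-calculated structure factor amplitudes, and φc are the phases calculated from the current model. This is a simplified, unweighted sketch; refinement programs such as PHENIX and CNS typically use weighted coefficients (for example, σA-weighted mFo − DFc) instead. Positive peaks mark density not accounted for by the model, which is how segments absent from the search model, such as the σ70 domains, can be located.

$$
\Delta\rho(\mathbf{x}) = \frac{1}{V}\sum_{\mathbf{h}} \bigl(|F_o(\mathbf{h})| - |F_c(\mathbf{h})|\bigr)\, e^{i\varphi_c(\mathbf{h})}\, e^{-2\pi i\,\mathbf{h}\cdot\mathbf{x}}
$$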
Structure of the E. coli σ70 RNAP Holoenzyme-The overall structure of the E. coli RNAP holoenzyme is similar to that of Thermus RNAP, resembling a crab claw with two pincers that form the DNA-binding cleft and the active site (Fig. 1 and supplemental Movie S1) (17). The β′ subunit forms one pincer, called the "clamp," and the β subunit forms the other. The clamp changes its position by swinging between open and closed states (34). Comparison of the E. coli RNAP structure with the Thermus RNAP structures, including the core enzyme (17,35), holoenzyme (4,5), and transcription elongation complex (18), revealed that the E. coli RNAP clamp is in a more closed conformation than in any other RNAP crystal structure solved to date. The gap between the Cα atoms of σ2 and the tip of the β subunit pincer (residues 371-380) is narrow (~7.5 Å) (Fig. 1; see Fig. 4a and supplemental Movie S1). Because the sequences and structures of σ2 and the β subunit pincer are highly conserved between E. coli and Thermus RNAPs, the narrow gap observed in the E. coli RNAP crystal structure is due to closing of the entire clamp. The σ3.2 region forms a well ordered loop in the Thermus thermophilus holoenzyme (5) but is disordered in the Thermus aquaticus holoenzyme (4). The E. coli holoenzyme shows a well ordered σ3.2 structure (residues 509-519) (see Fig. 4a and supplemental Movie S1).
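The ~7.5-Å clamp gap is a Cα-to-Cα distance between two sets of residues. As a minimal sketch, assuming coordinates in the legacy fixed-column PDB format (the function names are hypothetical, and chain IDs and residue ranges for σ2 and the β pincer tip would have to be looked up in the actual entry), the shortest such distance can be measured like this:

```python
import math
from typing import Dict, Iterable, Tuple

Coord = Tuple[float, float, float]


def ca_coords(pdb_lines: Iterable[str], chain: str, first: int, last: int) -> Dict[int, Coord]:
    """Collect C-alpha coordinates for residues first..last of one chain.

    Parses fixed-column PDB ATOM/HETATM records. The atom name must be exactly
    " CA " (columns 13-16); a calcium ion is written "CA  ", so Ca2+ from the
    crystallization buffer is not mistaken for a C-alpha atom.
    """
    coords: Dict[int, Coord] = {}
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        if line[12:16] != " CA " or line[21] != chain:
            continue
        res = int(line[22:26])
        if first <= res <= last:
            coords[res] = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    return coords


def min_ca_distance(a: Dict[int, Coord], b: Dict[int, Coord]) -> float:
    """Shortest C-alpha to C-alpha distance (angstroms) between two residue sets."""
    return min(math.dist(p, q) for p in a.values() for q in b.values())


# Hypothetical usage; "X" and "Y" are placeholder chain IDs, and the sigma-2
# residue range is left for the reader to choose:
# with open("model.pdb") as fh:
#     lines = fh.readlines()
# pincer_tip = ca_coords(lines, "X", 371, 380)
# gap = min_ca_distance(pincer_tip, ca_coords(lines, "Y", first_sigma2, last_sigma2))
```

A minimum over all residue pairs is used because the text reports the narrowest point of the gap, not a distance between two specific atoms.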
Structural Comparison of E. coli and Thermus RNAPs-The structures of the E. coli α subunit dimer and of σ70 domains σ2 and σ4 have been determined previously (27,28,36) and have already been compared with their counterparts in Thermus RNAP (17,37). However, the E. coli RNAP structure from this study enables a direct comparison of the β, β′, and ω subunits between E. coli and Thermus (Fig. 2).
The entire architecture of the E. coli β subunit (Ecoβ) can be superimposed on the T. aquaticus β subunit (Taqβ), with deviations around Ecoβi4 (residues 225-343), Ecoβi9 (residues 938-1042), and Ecoβi11 (residues 1122-1180) (Fig. 2, a and b). Ecoβi11 comprises three α helices, with a long loop connecting the second and third helices, and is located near the β subunit N terminus. The structural homolog of Ecoβi11 in Thermus RNAP is βi12 (Taqβi12, residues 919-969), but Taqβi12 lies ~20 Å away from the position corresponding to Ecoβi11 and does not associate with the N-terminal tail of Taqβ (Fig. 2b). The structures of Ecoβi4 and Ecoβi9 have been determined and described previously (24).
The β′ subunits of E. coli and Thermus are structurally conserved throughout the entire subunit (Fig. 2c). However, Ecoβ′ has several insertions that are not present in Taqβ′, and vice versa. These include a 13-amino acid insertion between Thr-553 and Thr-567 of Ecoβ′, a 13-amino acid insertion between Glu-704 and Ser-718 of Ecoβ′, and a domain inserted between Arg-796 and Gly-837 of Taqβ′ (Fig. 2d). Taqβ′ also has a large insert (Taqβ′i2) between conserved regions A and B (Figs. 1c and 2c) (24,38).
The β′ subunit trigger loop/helix (TLH) plays a critical role in the nucleotide addition cycle (39,40). The front edge of the TLH (residues 930-941 and 1130-1137) is highly flexible, but it becomes a rigid "trigger helix" when an incoming nucleotide is present at the active site. The middle of the E. coli TLH contains a large insert (β′i6, residues 942-1129) that separates the TLH into two regions (TLH1, residues 915-941; and TLH2, residues 1130-1148) (Fig. 2e). The edges of TLH1 and TLH2 of E. coli RNAP are in loop conformations (residues 930-933 in TLH1 and residues 1133-1138 in TLH2; residues 934-941 and 1130-1132 are disordered). Ecoβ′i6 plays an important role in all stages of transcription, including open complex formation, transcription pausing, and termination, and its location was proposed to be near the β′ subunit jaw (41). However, β′i6 is completely disordered in the E. coli holoenzyme structure, with no trace of electron density, indicating that β′i6 is highly mobile in this crystal structure and possibly in the apo holoenzyme.
The β′ subunit bridge helix separates the deep groove of RNAP into a DNA-binding main channel and an NTP entry secondary channel (Figs. 1 and 2e) (17). The eukaryotic RNA polymerase II structure shows a straight-form bridge helix (39,42), whereas the Thermus RNAP structures show a bent-form bridge helix (5,17). Further crystallographic studies of the Thermus RNAP in complex with the antibiotic streptolydigin (11), as well as of a transcription elongation complex (18), have shown that an alternative straight-form bridge helix can exist in Thermus RNAP. Based on these structures, it was proposed that alternation between the straight-form and bent-form bridge helix conformations is important for the nucleotide addition cycle, including NTP binding and DNA/RNA hybrid translocation (43,44). The E. coli RNAP holoenzyme structure presented here has a straight bridge helix (Fig. 2e).
Structure and Function of the ω Subunit of E. coli RNAP-The ω subunit of E. coli RNAP is composed of five α helices (α1-α5) (Fig. 3a), and the first three (α1-α3) can be overlaid with the first three α helices of the Thermus ω subunit (Fig. 3b). The fold of the ω subunit N-terminal tail in E. coli RNAP differs from that in the Thermus RNAP structure. The E. coli ω subunit C-terminal tail, including α4 and α5, is fully extended, and the E. coli ω subunit makes no interaction with the C-terminal tail of the β′ subunit, in contrast to Thermus RNAP, in which ω interacts extensively with the β′ C-terminal tail (supplemental Movie S4).
Functionally, ω is the least understood RNAP subunit, but there is a clear link between the ω subunit and ppGpp-dependent transcription (45,46). The finding that the ω subunit structure differs so markedly between E. coli and Thermus RNAPs may be related to the observation that E. coli RNAP can respond to ppGpp only in the presence of ω (46,47). Thus, the E. coli holoenzyme structure provides an ideal system for understanding the relationship between the ω subunit and ppGpp-dependent transcription regulation and may finally reconcile four decades of experimental data, especially regarding the cause of the stringent response and growth control by ppGpp in E. coli cells (48-51).
Structure and Function of σ1.1-Strong, traceable electron density for σ70 was obtained from σ1.2 to the C terminus. In one of the two E. coli RNAP molecules in the asymmetric unit (RNAP A), the Fo − Fc electron density map calculated using CNS version 1.3 (30) showed rod-like densities for σ1.1 adjacent to σ1.2. A homology model of E. coli σ1.1, constructed with SWISS-MODEL (52) based on the T. maritima σ1.1 NMR structure (6), was placed in the σ1.1 electron density, and the positions of its three α helices were adjusted manually. An additional α helix (H4) was then built into a rod-like density next to the third α helix (H3). The σ1.1 structure was then refined in the context of the holoenzyme. The final σ1.1 structure contains four α helices (residues 6-64), and residues 65 through the start of σ1.2 (residue 95) are completely disordered. The high B-factors and weak electron density of σ1.1 in the E. coli holoenzyme structure indicate that σ1.1 is highly mobile in the holoenzyme. The structure shows that σ1.1 is surrounded by σ2, the β lobe, the β′ clamp, and the β′ cleft (Fig. 4a and supplemental Movie S3). The σ1.1 location in the E. coli RNAP crystal structure is consistent with the E. coli RNAP model derived from systematic FRET measurements and distance-constrained docking (8).
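The argument above uses higher B-factors as evidence that σ1.1 is mobile. For an isotropic B-factor, the standard relation B = 8π²⟨u²⟩ converts B into a mean-square atomic displacement ⟨u²⟩ along a given direction. The helper below is a hypothetical sketch of that conversion; the B values in the usage lines are illustrative only and are not taken from this structure.

```python
import math


def rms_displacement(b_factor: float) -> float:
    """Convert an isotropic B-factor (in Å^2) to an RMS displacement (in Å).

    Uses B = 8 * pi^2 * <u^2>, where <u^2> is the mean-square displacement
    along one direction. Refined B-factors also absorb static disorder and
    model error, so this is an upper estimate of true thermal motion.
    """
    if b_factor < 0:
        raise ValueError("an isotropic B-factor cannot be negative")
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))


# Illustrative comparison of a well-ordered atom with a mobile one (hypothetical values).
for b in (30.0, 150.0):
    print(f"B = {b:5.1f} Å^2 -> RMS displacement {rms_displacement(b):.2f} Å")
```

Because the displacement scales with the square root of B, a 5-fold higher B-factor corresponds to only a little more than 2-fold larger positional spread.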
The acidic residues of σ1.1 mask the basic residues of the β lobe and β′ clamp, and σ1.1 fits snugly in the DNA-binding main channel of RNAP, thereby preventing access of either double- or single-stranded DNA to the RNAP active site. Therefore, for an open complex to form, σ1.1 must disengage from this binding site, or the RNAP clamp must open further (34).
The structure shows that three basic residues (Lys-10, Arg-15, and Lys-17) at the σ1.1 N terminus are surface-exposed and face the outside of the RNAP main channel (Fig. 4b and supplemental Movie S3). These σ1.1 basic residues, together with other positively charged regions, including σ2, the β lobe, the β′ clamp, and the β′ jaw, form a continuous path of positive electrostatic potential for binding promoter DNA and downstream DNA. This region may also play an important role in bending DNA to form the early stage intermediates between the closed and open promoter complexes (7). Although basic residues at the σ1.1 N terminus are common in the group 1 family, the function of this basic region in transcription has not been tested. This basic region in σ1.1 could contribute to open complex formation.
The α Subunit C-terminal Domain within E. coli RNAP-The C-terminal domain of the α subunit (α-CTD, residues 250-329) is a DNA-binding element and a major target of transcription factors for regulation (22,53). The two α-CTDs of the RNAP holoenzyme, which are connected to their N-terminal domains (α-NTDs) by linkers (54), can interact independently with transcription factors that bind to DNA 40-100 bp upstream of the transcription start site (55,56). The structure of an α-CTD in the context of an intact RNAP has not been solved because the domain is dynamic. In the E. coli RNAP structure presented in this study, electron density was visible for only one of the four α-CTDs in the asymmetric unit (αI of RNAP A). The map enabled a model of the α-CTD to be fitted (57). Furthermore, the map included density for the linker region, which allowed the linker to be modeled (Fig. 5a). Arg-265 in the α-CTD is ~60 Å away from the α-NTD (residue 233), with the linker fully stretched and lacking any secondary structure (Fig. 5a and supplemental Movie S2), indicating that ~60 Å is close to the maximum distance from the α-NTD at which Arg-265 in the α-CTD can reach DNA.
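The "fully stretched" interpretation can be checked against a simple geometric ceiling: no stretch of polypeptide can span more than its number of intervening residues times the ~3.8-Å Cα-Cα virtual bond length of a trans peptide. The sketch below (function name hypothetical) applies that bound; it ignores backbone angle limits and side chains, so real extended chains fall somewhat short of it.

```python
# Distance between consecutive C-alpha atoms across a trans peptide bond, in Å.
CA_CA_VIRTUAL_BOND_A = 3.8


def max_linker_span(res_i: int, res_j: int) -> float:
    """Upper bound (Å) on the C-alpha to C-alpha distance between two residues
    of the same chain, reached only if every intervening virtual bond is collinear."""
    return abs(res_j - res_i) * CA_CA_VIRTUAL_BOND_A


# Residue 233 (the alpha-NTD reference point in the text) to residue 250
# (the start of the alpha-CTD as defined in the text).
print(f"{max_linker_span(233, 250):.1f} Å")  # prints 64.6 Å
```

The bound covers only the linker itself; residues beyond 250, including Arg-265, belong to the folded α-CTD, so their position also depends on the domain's orientation.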
Previous biochemical studies suggested that surface-exposed residues in σ4.2 interact directly with the α-CTD (Fig. 5b) (58,59). Although these residues are partially involved in forming the α-CTD·σ4 complex in the E. coli RNAP crystal structure, the orientation of the α-CTD relative to σ4 differs from that in the cryo-EM model of the RNAP·catabolite activator protein·DNA complex (60) and in the models of the α-CTD·σ4·DNA complex predicted from biochemical studies (Fig. 5b and supplemental Movie S2) (58,59). The α-CTD conformation in this holoenzyme structure may be one of several possible conformations in the free holoenzyme, and the α-CTD would have to rearrange for promoter DNA binding.
Concluding Remarks-The crystal structure of the E. coli RNAP holoenzyme presented here provides an ideal model for analyzing the functional data generated over more than 50 years and for designing future experiments to uncover transcription mechanisms. My E. coli RNAP structure reveals the molecular features of the α-CTD and σ1.1 for the first time in the context of an intact bacterial RNAP. Furthermore, I have shown that E. coli RNAP prepared from a co-overexpression vector can be obtained in sufficient quantities of active enzyme for crystallization and high-quality diffraction. This methodology will facilitate structure determination of the large collection of mutant RNAPs that have been generated for E. coli transcription and regulation studies. Finally, because the sequence and antibiotic sensitivity of E. coli RNAP are similar to those of RNAPs from pathogens, including Mycobacterium tuberculosis and Staphylococcus aureus, E. coli RNAP can now be used to readily study RNAP-antibiotic interactions by x-ray crystallography.