Structural Basis of Transcription Initiation by Bacterial RNA Polymerase Holoenzyme

Background: Cellular RNA polymerases start transcription by de novo RNA priming.
Results: Structures and biochemical studies of initially transcribing complexes elucidate de novo transcription initiation and the early stage of RNA transcription.
Conclusion: The 5′-end of RNA in the transcribing complex initiates σ ejection from the core enzyme.
Significance: Insights from this study may be applicable to all cellular RNA polymerases.
The bacterial RNA polymerase (RNAP) holoenzyme containing σ factor initiates transcription at specific promoter sites by de novo RNA priming, the first step of RNA synthesis, in which RNAP accepts two initiating ribonucleoside triphosphates (iNTPs) and forms the first phosphodiester bond. We present the structure of the de novo transcription initiation complex, which reveals unique contacts of the iNTPs bound at the transcription start site with the template DNA and with RNAP, and we demonstrate the importance of these contacts for transcription initiation. To gain further insight into the mechanism of RNA priming, we determined the structure of an initially transcribing complex of RNAP holoenzyme with a 6-mer RNA, obtained by an in crystallo transcription approach. The structure highlights RNAP-RNA contacts that stabilize the short RNA transcript in the active site and demonstrates that the RNA 5′-end displaces σ region 3.2 from its position near the active site, which likely plays a key role in σ ejection during the initiation-to-elongation transition. Given the structural conservation of the RNAP active site, the mechanism of de novo RNA priming appears to be conserved in all cellular RNAPs.

During successive steps of transcription initiation, RNAP must employ different mechanisms of NTP loading and accommodate the growing RNA product within the RNA channel until it enters the stable transcription elongation phase. All cellular and most bacteriophage RNAPs initiate transcription de novo by loading two iNTPs opposite the first and second template DNA bases at the transcription start site and synthesizing the first phosphodiester bond of RNA. Loading of the first iNTP, which will become the 5′-end of RNA, at the i site is unique to de novo transcription, because in the transcription elongation complex this site usually accommodates the RNA 3′-end, whose binding is stabilized by the preceding ~8 base pairs of the DNA/RNA hybrid (1). The binding affinity of the first iNTP during de novo transcription is substantially lower than that of the second iNTP, which binds at the i+1 site (2,3). This allows RNAP to sense NTP concentrations directly, which provides a basis for regulating transcription initiation at ribosomal RNA promoters (4) as well as at pyrimidine biosynthesis genes (5) in bacteria. High-resolution crystal structures of the de novo transcription initiation complex containing RNAP, DNA, and the first and second iNTPs bound at the active site have been determined for bacteriophage T7 (6) and N4 (7) RNAPs, and these structures revealed unique interactions between the first iNTP and RNAP/DNA. However, because of structural differences between classes of RNAPs, insights from the bacteriophage RNAP structures cannot be directly transferred to cellular RNAPs. A previously published crystal structure of the Thermus thermophilus RNAP transcription initiation complex contains a GpA dinucleotide primer complementary to template DNA positions −1 and +1 but lacking the 5′-triphosphate group (8). Recently, a structure of the T. thermophilus RNAP initiating complex containing two iNTPs was reported (9). However, the roles of the observed RNAP-iNTP contacts in transcription initiation were not tested experimentally; in addition, that structure contained a suboptimal template strand sequence around the transcription start site (see below), suggesting that it might miss some important contacts with the iNTPs.
Following the de novo incorporation step, RNAP goes through several cycles of NTP addition before entering transcription elongation, during which a highly stable and processive RNAP complex synthesizes RNA thousands of nucleotides long. In contrast, initially transcribing complexes of RNAP holoenzyme containing short RNAs, usually 2-12 nucleotides in length, are unstable, resulting in abortive initiation (10,11). Multiple events during initial transcription, including σ release from the core enzyme, DNA scrunching, and promoter escape, have been proposed (12-14); however, the molecular basis of initial transcription by bacterial RNAP has remained unclear because of the limited stability of these complexes and the difficulty of capturing a homogeneous initiating complex containing a short nascent RNA.
Here, we report a crystal structure of the de novo transcription initiation complex of T. thermophilus RNAP containing two iNTPs loaded in the active site. The structure, together with structure-based biochemical assays, reveals key interactions between the RNAP holoenzyme, DNA, and iNTPs that are critical for de novo transcription initiation. Furthermore, we prepared a homogeneous initially transcribing complex with a 6-mer RNA using an in crystallo transcription approach and solved its crystal structure, which provides insights into the binding of a short RNA-DNA hybrid in the active site and into the process of σ release triggered by the nascent RNA as the initially transcribing complex begins the transition into the elongation phase of transcription.

EXPERIMENTAL PROCEDURES
Preparation and Crystallization of the T. thermophilus RNAP-Promoter DNA Complex, the de Novo Transcription Initiation Complex, and the Transcription Complex Containing 6-Mer RNA-T. thermophilus HB8 cells were cultured in a 300-liter BioService fermentor at the Penn State fermentation facility. Endogenous T. thermophilus RNAP core enzyme was purified as follows. About 100 g of cell paste was suspended in 300 ml of lysis buffer (40 mM Tris-HCl, pH 8, at 4°C, 100 mM NaCl, 10 mM EDTA, 10 mM 2-mercaptoethanol, 1 mM benzamidine, 1 mM PMSF, 0.5 μg/ml leupeptin, and 0.1 μg/ml pepstatin), and cells were lysed with an Emulsiflex C3 homogenizer (Avestin, Inc.) at 20,000 psi. After 30 min, benzamidine and PMSF (1 mM) were added to the lysate. The lysate was then clarified by centrifugation, and glycerol was added to the supernatant to a concentration of 5%. RNAP in the soluble fraction was precipitated by adding 10% polyethyleneimine (Polymin-P) solution to a final concentration of 0.5%, and the pellet was recovered by centrifugation. The pellet was then resuspended and washed with 200 ml of wash buffer (40 mM Tris-HCl, pH 8, at 4°C, 200 mM NaCl, 1 mM EDTA, 10 mM 2-mercaptoethanol, 5% glycerol, 1 mM benzamidine, and 1 mM PMSF) and again recovered by centrifugation. This wash step was repeated with an additional 200 ml of wash buffer. Finally, RNAP was recovered by resuspending the pellet in 100 ml of extraction buffer (40 mM Tris-HCl, pH 8, at 4°C, 1 M NaCl, 1 mM EDTA, 10 mM 2-mercaptoethanol, 5% glycerol, 1 mM benzamidine, and 1 mM PMSF) and centrifuging for 15 min at 4°C and 17,000 rpm. This extraction step was repeated with another 100 ml of extraction buffer, and the supernatants from both extractions were combined (total, 200 ml). Ammonium sulfate powder was then gradually added to 45% to precipitate RNAP, followed by centrifugation. The pellet was suspended in TGED buffer (10 mM Tris-HCl, pH 8 at 4°C, 0.1 mM EDTA, 5% glycerol, and 1 mM DTT), and RNAP core enzyme was purified by heparin, ResourceQ, and SP Sepharose column chromatography (GE Healthcare).
T. thermophilus σA was expressed in BL21(DE3) cells transformed with a pET21a plasmid containing the T. thermophilus sigA gene. Cells were grown in 1.5 liters of LB containing 100 μg/ml ampicillin for 6 h, induced with 1 mM isopropyl 1-thio-β-D-galactopyranoside, and grown for an additional 3 h at 37°C. After the cells were lysed by sonication, σA was purified from the lysate by heat treatment in a 65°C water bath for 30 min. The suspension was centrifuged at 16,000 rpm to remove the white precipitate. σA was then purified from the supernatant by ResourceQ column (GE Healthcare) chromatography and stored in 50 mM Tris-HCl, pH 7.7, 400 mM NaCl, and 15% glycerol at −20°C.
T. thermophilus RNAP holoenzyme was prepared by adding a 3-fold molar excess of σA to core RNAP and then purified by Superdex200 column chromatography. The RNAP-promoter DNA complex was prepared by mixing 18 μM (24 μl) T. thermophilus holoenzyme (in 20 mM Tris-HCl, pH 7.7, 100 mM NaCl, and 1% glycerol) with 1 mM (0.65 μl) of the DNA scaffold (see Fig. 1A) and incubating for 30 min at 22°C. Crystals were obtained by hanging drop vapor diffusion, mixing equal volumes of the RNAP-DNA complex solution and crystallization solution (100 mM Tris-HCl, pH 8.7, 200 mM KCl, 50 mM MgCl2, 10 mM spermine tetra-HCl, and 10% PEG 4000) and incubating at 22°C over the same crystallization solution. The crystals were cryoprotected by soaking in a solution of the same constituents as the crystallization solution, with stepwise increments of PEG 4000 and (2R,3R)-(−)-2,3-butanediol (Sigma-Aldrich) to final concentrations of 25 and 15%, respectively. The final cryoprotectant solution was also used for soaking NTPs at room temperature to prepare the de novo transcription initiation complex (5 mM ATP and 5 mM CMPCPP from Jena Bioscience; soaking time, 30 min) and the holoenzyme transcription complex with 6-mer RNA (5 mM each of ATP, CTP, and UTP; soaking time, 5 h).
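As a quick check on the complex-assembly step, the stock concentrations and volumes stated above imply a molar ratio of DNA scaffold to holoenzyme and final concentrations in the mixture. The sketch below is plain arithmetic on those stated values; it assumes nothing else was added to the mix, and the helper name `amount_pmol` is ours, not part of any protocol.

```python
# Assembly arithmetic for the RNAP-promoter DNA complex.
# Values are those stated in the text; 1 uM x 1 ul = 1 pmol.

def amount_pmol(conc_uM: float, vol_ul: float) -> float:
    """Amount of material in picomoles from a concentration (uM) and a volume (ul)."""
    return conc_uM * vol_ul

rnap_pmol = amount_pmol(18.0, 24.0)    # holoenzyme: 18 uM, 24 ul
dna_pmol = amount_pmol(1000.0, 0.65)   # DNA scaffold: 1 mM = 1000 uM, 0.65 ul
total_ul = 24.0 + 0.65                 # assumes no other components are added

dna_to_rnap = dna_pmol / rnap_pmol
print(f"RNAP: {rnap_pmol:.0f} pmol, DNA: {dna_pmol:.0f} pmol")
print(f"DNA:RNAP molar ratio = {dna_to_rnap:.2f}")
print(f"final RNAP = {rnap_pmol / total_ul:.1f} uM, final DNA = {dna_pmol / total_ul:.1f} uM")
```

On these numbers the mixture holds 432 pmol of holoenzyme and 650 pmol of scaffold, roughly a 1.5-fold molar excess of DNA, at about 17.5 μM RNAP and 26.4 μM DNA before the drop is set up.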
X-ray Data Collection and Structure Determination-The crystals belong to a C-centered monoclinic space group (Table 1) and contain one T. thermophilus RNAP transcription complex per asymmetric unit. The data were collected at the Macromolecular Diffraction at Cornell High Energy Synchrotron Source (MacCHESS) F1 beamline (Cornell University, Ithaca, NY) and processed with HKL2000 (15). The T. thermophilus RNAP-promoter DNA complex (8) was used as the search model for molecular replacement (16). Rigid body refinement was performed, and the model was further adjusted manually. The resulting model phases allowed positioning of the iNTPs in the de novo transcription initiation complex and of the RNA in the initially transcribing complex in their electron density maps. Positional refinement with reference model restraints was performed using the program Phenix (17).
In Vitro Transcription-The wild-type Escherichia coli RNAP core enzyme and holoenzyme containing σ70 were prepared as described (18). Mutations in the E. coli rpoB gene encoding the RNAP β subunit were obtained by site-directed mutagenesis of plasmid pIA545 and recloned into plasmid pIA679, which encodes all four core RNAP subunits with a His6 tag at the N terminus of the β subunit (both plasmids kindly provided by I. Artsimovitch). Mutant RNAPs were expressed in E. coli BL21(DE3) and purified by Polymin-P precipitation, heparin and nickel-nitrilotriacetic acid affinity chromatography, and MonoQ ion exchange chromatography (19). The E. coli RNAP holoenzyme containing the σ70 subunit with a region 3.2 deletion was prepared as described (20).
In vitro transcription by E. coli RNAPs containing wild-type and mutant β subunits was carried out as described (20). Apparent Km values for iNTPs were measured as described (20). RNA products were separated by 15% (full-length RNA) or 23% or 30% (abortive RNAs) PAGE, followed by phosphorimaging.
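The Km measurements follow the protocol of ref. 20, which is not reproduced here. As a generic illustration only, an apparent Km and Vmax can be estimated from initial rates measured at several substrate concentrations. The sketch below uses a Hanes-Woolf linearization (S/v plotted against S) on hypothetical, noise-free data generated with a known Km of 50 μM; this choice of method is ours and is not necessarily the one used in ref. 20, and nonlinear fitting is generally preferable for real, noisy data.

```python
def michaelis_menten(s: float, vmax: float, km: float) -> float:
    """Initial rate for a single-substrate Michaelis-Menten enzyme."""
    return vmax * s / (km + s)

def hanes_woolf_fit(substrate, rates):
    """Estimate (Vmax, Km) by least squares on S/v = S/Vmax + Km/Vmax."""
    xs = list(substrate)
    ys = [s / v for s, v in zip(xs, rates)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / slope           # slope = 1/Vmax
    km = intercept * vmax        # intercept = Km/Vmax
    return vmax, km

# Hypothetical, noise-free data: Km = 50 uM, Vmax = 1 (arbitrary units).
conc_uM = [5, 10, 25, 50, 100, 250, 500]
rates = [michaelis_menten(s, 1.0, 50.0) for s in conc_uM]
vmax_est, km_est = hanes_woolf_fit(conc_uM, rates)
print(f"Vmax ~ {vmax_est:.3f}, Km ~ {km_est:.1f} uM")
```

Because the synthetic rates lie exactly on the Michaelis-Menten curve, the fit recovers the input Km and Vmax; with experimental data the estimates would carry error that this sketch does not model.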
Analysis of DNA Sequences around the Transcription Start Site of Human Pol II Promoters-Human RNAP II (Pol II) promoter sequences (10,000 of the 25,976 in the database) were obtained from The Eukaryotic Promoter Database (21) and analyzed for sequence conservation. The graphical representation was prepared using WebLogo (22).
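A sequence logo scales each position by its information content. The sketch below shows that per-position calculation, assuming equal-length windows already aligned on the transcription start site; it ignores the small-sample correction and background-composition options that logo software such as WebLogo may apply, and the toy windows are invented, not taken from the Eukaryotic Promoter Database.

```python
import math
from collections import Counter

def column_frequencies(aligned):
    """Per-column base frequencies for equal-length, position-aligned sequences."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must all have the same length")
    n = len(aligned)
    columns = []
    for i in range(length):
        counts = Counter(s[i] for s in aligned)
        columns.append({base: counts.get(base, 0) / n for base in "ACGT"})
    return columns

def information_bits(freqs):
    """Information content of one column: 2 bits minus its Shannon entropy."""
    entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
    return 2.0 - entropy

# Hypothetical non-template windows covering positions -2, -1, +1, +2.
toy = ["GCAG", "TCAT", "GTGC", "ACAA"]
positions = [-2, -1, +1, +2]  # promoter numbering skips 0

freqs = column_frequencies(toy)
for pos, col in zip(positions, freqs):
    best = max(col, key=col.get)
    print(f"{pos:+d}: most common {best}, {information_bits(col):.2f} bits")
```

A fully conserved position scores 2 bits and a position with all four bases equally frequent scores 0, which is why the conserved −1 pyrimidine and +1 purine stand out in a logo.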

RESULTS
Design of the X-ray Crystallographic Experiment to Determine the Structure of the de Novo Transcription Initiation Complex-The T. thermophilus RNAP holoenzyme-promoter DNA complex crystals were obtained by modifying a previously described method (8), using a DNA scaffold containing the −10 and consensus discriminator elements (23) in the nontemplate strand (Fig. 1A). To obtain crystals of the de novo transcription initiation complex, the holoenzyme-promoter DNA crystals were soaked with the first two iNTPs, which base pair with the +1 and +2 bases of the template DNA. To maintain the transcription bubble during crystallization of the holoenzyme-promoter DNA complex, previous studies used a noncomplementary template strand sequence upstream of the transcription start site (8,9). Our previous structural analysis of the N4 phage RNAP de novo transcription complex (7) showed that the template DNA nucleotide at the −1 position participates in binding the first iNTP through a base stacking interaction. Therefore, in this study, we changed the template DNA base at the −1 position to a guanine (template-strand sequence 3′-GTGA-5′, in which the T is the transcription start site, +1), thus mimicking the natural sequence of most bacterial promoters recognized by the primary σ factor (24). By incubating the crystals with ATP and a nonhydrolyzable CTP analog, CMPCPP (cytidine-5′-[(α,β)-methyleno]triphosphate), we obtained the structure of the de novo transcription initiation complex with the two iNTPs bound at the active site but without phosphodiester bond formation. The structure was solved by molecular replacement at 2.9 Å resolution and showed strong unbiased Fo − Fc electron densities corresponding to the first and second iNTPs (Fig. 2A, supplemental Movie S1). The overall structure of RNAP was identical to that of the search model (Protein Data Bank code 4G7H) (8). However, important differences included conformational changes in the trigger loop of the β′ subunit (residues 1233-1255) and in the amino acid side chains of both the β and β′ subunits around the active site (Fig. 2, B and C).
The First iNTP Binding Site in the de Novo Transcription Initiation Complex-Binding of the first and second iNTPs involves interactions with residues around the active site, some of which are unique to de novo transcription and others of which are common to RNA elongation. The first iNTP occupies the i site in the de novo transcription initiation complex, which overlaps with the RNA 3′-end binding site in the elongation complex (Fig. 2C) (1). The de novo transcription-specific interactions between RNAP and the iNTPs concentrate on the triphosphate of the first iNTP, which is ATP in our crystal structure, and involve multiple salt bridges and hydrogen bonds with β subunit residues, including 1) Gln-β567 and His-β999, which interact with a nonbridging oxygen of the γ-phosphate (corresponding to E. coli RNAP residues Gln-β688 and His-β1237), and 2) Lys-β838, which interacts with a nonbridging oxygen of the α-phosphate (corresponding to E. coli residue Lys-β1065) (Fig. 2, B and E). The electron density for the Lys-β846 side chain (E. coli residue Lys-β1073) is not well defined (data not shown), but this residue may be positioned near a nonbridging oxygen of the α-phosphate to form an additional salt bridge. Arg-704 of the β′ subunit (Arg-β′704; E. coli residue Arg-β′425) interacts with the 2′-OH and 3′-OH of the ATP and may therefore participate in discrimination between NTPs and dNTPs. All residues interacting with the ATP triphosphate are absolutely conserved in all cellular RNAPs (Fig. 2E), suggesting a universal mechanism for accommodating the first iNTP at the enzyme active site to establish de novo transcription.
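Because the structural work uses T. thermophilus numbering while the mutagenesis uses E. coli numbering, a small lookup table helps keep the equivalences straight. The sketch below encodes only the correspondences and contacts stated in this section; the `Residue` type and the `label` helper are our own illustrative names.

```python
from typing import NamedTuple

class Residue(NamedTuple):
    subunit: str  # "beta" or "beta_prime"
    aa: str       # three-letter amino acid code
    number: int   # residue number in that organism's sequence

# (T. thermophilus residue, E. coli equivalent, group of the first iNTP contacted)
CORRESPONDENCE = [
    (Residue("beta", "Gln", 567), Residue("beta", "Gln", 688), "gamma-phosphate"),
    (Residue("beta", "His", 999), Residue("beta", "His", 1237), "gamma-phosphate"),
    (Residue("beta", "Lys", 838), Residue("beta", "Lys", 1065), "alpha-phosphate"),
    (Residue("beta", "Lys", 846), Residue("beta", "Lys", 1073),
     "alpha-phosphate (tentative; side chain poorly defined)"),
    (Residue("beta_prime", "Arg", 704), Residue("beta_prime", "Arg", 425),
     "ribose 2'-OH and 3'-OH"),
]

TTH_TO_ECO = {tth: eco for tth, eco, _ in CORRESPONDENCE}
ECO_TO_TTH = {eco: tth for tth, eco, _ in CORRESPONDENCE}

def label(res: Residue) -> str:
    """Format a residue in the text's style, e.g. 'His-β999' or 'Arg-β′704'."""
    prefix = "β" if res.subunit == "beta" else "β′"
    return f"{res.aa}-{prefix}{res.number}"

for tth, eco, contact in CORRESPONDENCE:
    print(f"{label(tth)} (T. thermophilus) = {label(eco)} (E. coli): {contact}")
```

Keying both dictionaries on the same tuple type makes the mapping usable in either direction, for example to find the T. thermophilus counterpart of an E. coli mutant.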
The structure is consistent with previous biochemical studies showing that residues Lys-β1065, Lys-β1073, and His-β1237 of E. coli RNAP (counterparts of T. thermophilus residues Lys-β838, Lys-β846, and His-β999, respectively) are close to the binding site of the first iNTP and that their substitutions impair transcription (25,26). To further analyze the functions of the amino acid residues interacting with the first iNTP, we made four E. coli RNAP mutants, Lys-β1065A and Lys-β1065A/Lys-β1073A for residues interacting with the α-phosphate and His-β1237A and His-β1237A/Gln-β688A for residues interacting with the γ-phosphate, and tested their transcription activities on the T7A1 promoter (Fig. 3). The mutations caused major defects in transcription initiation at low NTP concentrations, decreasing both abortive and full-length RNA synthesis (Fig. 3B, lanes 1-6). The activities of the Lys-β1065A and His-β1237A RNAPs were partially recovered by adding an initiating dinucleotide primer (lanes 7-12) and were further rescued at high NTP concentrations (lanes 13-18 and 19-24), suggesting that the mutations primarily affect the initiation step of transcription. Importantly, the mutations also affected the pattern of abortive products synthesized during initiation (Fig. 3B; see below). The double mutant His-β1237A/Gln-β688A did not differ significantly from the His-β1237A RNAP; however, the double mutant Lys-β1065A/Lys-β1073A was essentially inactive under all conditions tested, suggesting that Lys-β1073 is also critical for initiation. These results highlight the relevance of these residues for de novo transcription initiation and are consistent with their proposed roles in binding the first iNTP.
FIGURE 1. Structure of the transcription initiation complex. A, schematic of the de novo transcription initiation complex (top) and the initially transcribing complex (bottom) obtained from the RNAP-promoter complex. The sequence of the nucleic acid scaffold used for crystallization of the holoenzyme-promoter DNA complex is shown in the middle (pink, template DNA; yellow, non-template DNA; gray, −10 element). NTPs and the 6-mer RNA in the transcription complexes are depicted by green circles. The positions of the RNAP active site (red sphere) and the template DNA transcription start site (TSS, +1, red circle) are indicated. DNA bases indicated by black dashed circles are disordered in the crystal structures. In the standard format for describing RNAP promoter sequences, the transcription start site, which base pairs with the first iNTP and encodes the 5′-end of the RNA chain, is designated +1. DNA positions downstream of the transcription start site are numbered with increasing positive numbers from +1, and DNA positions upstream of the start site are numbered with negative numbers counting from −1 (there is no position 0). B, overall structure of the de novo transcription initiation complex. T. thermophilus RNAP is depicted as a molecular surface model, with each subunit in a unique color (white, α; cyan, β; pink, β′; orange, σ). DNA is depicted as a sphere model with the same color scheme as in A. The right panel shows a magnified view of the boxed region in the left panel. For clarity, the β subunit was made transparent in the left panel and removed in the right panel. Several key motifs discussed in the text are highlighted (see also supplemental Movie S1).
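The promoter numbering convention described in the Figure 1 legend has no position 0, so subtracting two position labels overcounts by one whenever the interval spans the start site. A minimal sketch of converting between promoter positions and ordinary 0-based offsets follows; the function names are ours.

```python
def offset_to_position(offset: int) -> int:
    """0-based offset from the start site (0 = start site, negative = upstream)
    to promoter numbering, which has no position 0: ..., -2, -1, +1, +2, ..."""
    return offset + 1 if offset >= 0 else offset

def position_to_offset(position: int) -> int:
    """Inverse of offset_to_position; rejects the nonexistent position 0."""
    if position == 0:
        raise ValueError("promoter numbering has no position 0")
    return position - 1 if position > 0 else position

def steps_between(a: int, b: int) -> int:
    """Number of nucleotide steps from promoter position a to position b."""
    return position_to_offset(b) - position_to_offset(a)

print([offset_to_position(o) for o in range(-3, 3)])  # [-3, -2, -1, 1, 2, 3]
print(steps_between(-1, +1))  # 1, although the labels differ by 2
```

For example, the −1 template base that stacks on the first iNTP is a single step upstream of +1, not two.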
Based on previous observations of the role of σ region 3.2 in iNTP binding (3,27), we next inspected possible interactions between this region and the iNTPs loaded in the active site. The closest distance between region 3.2 (the Glu-324 side chain) and the γ-phosphate of the i site-bound ATP is ~10.5 Å; region 3.2 is instead positioned near the upstream template DNA bases −3/−4 (Fig. 2A). This rules out any direct interaction between this region and the first iNTP during de novo initiation and suggests that its stimulatory effect on iNTP binding is indirect.
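A "closest distance" between two groups of atoms, such as a side chain and a phosphate, is the minimum over all atom pairs of the Euclidean distance. The sketch below shows that calculation; the coordinates are invented for illustration and are not taken from the deposited model, which in practice would be read from the coordinate file.

```python
import math
from itertools import product

def min_distance(atoms_a, atoms_b):
    """Closest approach between two atom groups: the minimum Euclidean
    distance over all atom pairs (same units as the input, e.g. angstroms)."""
    return min(math.dist(a, b) for a, b in product(atoms_a, atoms_b))

# Hypothetical coordinates in angstroms, NOT from the crystal structure.
glu_side_chain = [(10.0, 0.0, 0.0), (11.0, 1.0, 0.0)]
gamma_phosphate = [(0.0, 0.0, 0.0), (-1.0, 0.5, 0.0)]

print(f"closest approach: {min_distance(glu_side_chain, gamma_phosphate):.1f} A")
```

Taking the minimum over all pairs, rather than measuring between two chosen atoms, avoids underestimating how close two groups can come.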
In the structure, we noticed a base stacking interaction between the guanine at template position −1 and the i site-bound ATP (supplemental Movie S1); in addition, the N-1 of the guanine base and the β-phosphate of ATP are bridged by a water molecule (Fig. 2B). The base stacking interaction would be minimized with template DNA containing a pyrimidine base at the −1 position (Fig. 4A and supplemental Movie S1). These additional interactions may contribute to stable binding of the ATP during de novo transcription. To test this hypothesis, we measured the effects of a G-to-C substitution at the −1 template position on the Km values of iNTPs on the T7A1 promoter (wild-type template strand sequence 3′-GTAG-5′, in which the T is the transcription start site; Fig. 3A). This substitution greatly increased the apparent Km values for both +1ATP and +2UTP. In fact, the effect for +2UTP was even stronger than for +1ATP (78.8- and 8.9-fold increases relative to the wild-type T7A1 promoter, respectively) (Fig. 4B). Thus, the interaction between the −1 template base and +1ATP may not only contribute to +1ATP binding but also stabilize the template DNA strand and thereby stimulate +2UTP binding. To further test this hypothesis, we also measured Km values for E. coli RNAP with a σ region 3.2 deletion (Δ513-519), which decreases iNTP binding by destabilizing the template DNA strand (20). Deletion of region 3.2 increased the Km values for both +1ATP and +2UTP on the wild-type T7A1 promoter. However, the promoter substitution only slightly increased the Km values for the mutant RNAP (3- and 2.5-fold for +1ATP and +2UTP, respectively), suggesting that both the region 3.2 deletion and the −1 nucleotide substitution disturb the template strand conformation.
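To see what a Km increase of this size means for the initiation rate, a simple single-substrate Michaelis-Menten model is enough. The sketch below assumes Vmax is unchanged and treats each iNTP independently, which is a simplification of two-substrate de novo initiation; it applies the 78.8- and 8.9-fold changes reported above at substrate concentrations expressed relative to the wild-type Km.

```python
def relative_rate(s_over_km: float, km_fold: float) -> float:
    """Rate with Km raised km_fold-times divided by the unaltered rate, at
    S = s_over_km * Km(unaltered). From v = Vmax*S/(Km+S) with Vmax unchanged,
    the ratio simplifies to (1 + s)/(km_fold + s), where s = S/Km."""
    s = s_over_km
    return (1.0 + s) / (km_fold + s)

# Fold changes reported in the text for the -1 G-to-C template substitution.
for name, fold in [("+2UTP", 78.8), ("+1ATP", 8.9)]:
    for s in (0.1, 1.0, 10.0, 100.0):
        print(f"{name}: S = {s:>5} x Km -> {relative_rate(s, fold):.3f} of wild-type rate")
```

Well below Km the rate falls almost in proportion to the Km increase (about 1.4% of wild-type for the 78.8-fold change at S = 0.1 Km), whereas at S = 100 Km it recovers to about 56%; in this model, Km defects are therefore most visible at low NTP concentrations.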
Second iNTP Binding Site in the de Novo Transcription Initiation Complex-The second iNTP, CMPCPP in our crystal structure, is positioned through base pairing with the template DNA and interacts with the β′ subunit trigger loop and with basic residues (Arg-β557, Arg-β879, and Arg-β′1029) on the rim of the secondary channel (Fig. 2B). In the holoenzyme-promoter DNA complex, the trigger loop is in an open conformation and the tip of the loop (residues 1238-1251) is disordered (8), whereas in the elongation complex with an NTP substrate at the i+1 site, the trigger loop is in the closed conformation and forms two trigger helices without disordered regions (Fig. 2C) (1). In the de novo transcription initiation complex, although the trigger loop is in a more closed conformation than in the holoenzyme-promoter DNA complex, the middle portion of the trigger loop (residues 1246-1251) is still disordered. Met-β′1238 and Gln-β′1235 from the N-terminal α-helix of the trigger loop reach out to interact with the base and the 3′-OH of CMPCPP, respectively. However, because the trigger helix is not fully folded, the triphosphate group of the second iNTP makes no direct contact with Arg-β′1239 and His-β′1242 and adopts a preinsertion conformation (Fig. 2D). In particular, the α-phosphate of CMPCPP is ~5.4 Å away from the 3′-OH of ATP, corresponding to a nonreactive state. The catalytic metal MgA is stably bound at the active site by the aspartate triad (residues 739, 741, and 743) of the DFDGD motif of the β′ subunit. However, MgB is too far away to be coordinated by the aspartate triad (Fig. 2D) and is primarily coordinated by the β- and γ-phosphates of CMPCPP. This nucleotide position is similar to the preinsertion state observed in crystal structures of the T. thermophilus elongation complex with the inhibitor streptolydigin (1), the eukaryotic Pol II elongation complex containing GMPCPP (guanosine-5′-[(α,β)-methyleno]triphosphate) (28), and the Pol II initially transcribing complexes containing short RNAs (2-7 nt in length) and AMPCPP (adenosine-5′-[(α,β)-methyleno]triphosphate) (29). As in these structures, the use of the nonhydrolyzable CMPCPP to prepare the bacterial de novo transcription initiation complex in this study might affect the coordination of the iNTP and metals at the active site, thus positioning the i+1 nucleotide in a precatalytic conformation, which likely corresponds to a natural intermediate in the formation of the catalytically competent transcription initiation complex. During de novo transcription initiation with natural iNTP substrates, the trigger loop would then adopt the completely closed conformation, forming the trigger helices that push the iNTP at the i+1 site closer to the 3′-OH at the i site and resulting in formation of the first phosphodiester bond.
Structure of the Initially Transcribing Complex of RNAP Holoenzyme Containing 6-Mer RNA-Crystal structures of transcription complexes containing short RNAs of 2 to 7 nt have been determined for the Pol II system. However, these complexes were prepared by incubating Pol II or Pol II/TFIIB with a DNA template and synthetic RNA oligonucleotides (29-31). Therefore, they lack the triphosphate group at the RNA 5′-end, which may play important roles in the early stages of transcription, such as DNA/RNA hybrid stabilization, separation of RNA from template DNA after the hybrid reaches its full length, and removal of the transcription initiation factor from RNAP. To obtain the structure of an early-stage transcription complex containing the σ factor and the natural form of RNA with the 5′-triphosphate group, we prepared the initially transcribing complex by an in crystallo transcription approach. For this purpose, we soaked the holoenzyme-promoter DNA complex crystals with ATP, CTP, and UTP, resulting in the synthesis of a 6-mer RNA (template sequence 3′-GTGAGTGC-5′, in which the first T is the transcription start site) (Fig. 1A). The structure was solved at 3 Å resolution and showed continuous electron density for the 6-mer RNA synthesized along the template DNA (Fig. 5, A and B, and supplemental Movie S1), indicating that T. thermophilus RNAP was active in the crystalline state and capable of transcribing RNA to the expected length.
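The expected length follows from the template and the NTPs supplied: with ATP, CTP, and UTP but no GTP, RNAP can extend only until the template calls for G. The sketch below applies that rule, taking the first G of 3′-GTGAGTGC-5′ as position −1 and the following T as +1 (as for the de novo scaffold); the function name is ours, and abortive release and misincorporation are ignored.

```python
# Watson-Crick pairing: template DNA base -> RNA base placed opposite it.
TEMPLATE_TO_RNA = {"A": "U", "C": "G", "G": "C", "T": "A"}

def product_with_limited_ntps(template_3to5, start_index, ntps):
    """Predict the RNA made when only some NTPs are supplied.

    template_3to5: template strand written 3'->5' (the direction RNAP reads it).
    start_index: 0-based index of the +1 (start site) base in that string.
    ntps: the ribonucleotides available, e.g. {"A", "C", "U"}.
    Synthesis stops at the first position whose required NTP is missing.
    """
    rna = []
    for base in template_3to5[start_index:]:
        nt = TEMPLATE_TO_RNA[base]
        if nt not in ntps:
            break  # the required NTP was not supplied, so RNAP stalls here
        rna.append(nt)
    return "".join(rna)  # written 5'->3'

rna = product_with_limited_ntps("GTGAGTGC", 1, {"A", "C", "U"})
print(rna, len(rna))  # ACUCAC 6
```

The same rule gives the dinucleotide AC for the de novo scaffold 3′-GTGA-5′ with ATP and a C substrate, although there the nonhydrolyzable CMPCPP prevented bond formation.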
The RNA transcript remains in the pretranslocated state, with its 3′-end bound in the i+1 site; however, the pyrophosphate by-product is not visible, and the middle region of the trigger loop (residues 1239-1253) is disordered (Fig. 5B). Further movement of the DNA/RNA hybrid to a post-translocated state may be constrained by steric hindrance between the 5′-end of RNA and σ region 3.2 (see below), and the ejection of region 3.2 may be restricted by crystal packing. A large lobe of electron density appears next to the RNA 3′-end (Fig. 5A and supplemental Movie S1). This density is close to the rim of the secondary channel and overlaps with the temporary NTP binding E site defined earlier in Pol II (32), which is distinct from the preinsertion site (1).
FIGURE 4. Role of the base stacking interaction between the first iNTP and template DNA in transcription initiation. A, schematic representation of the base stacking interaction between the −1 template DNA purine base and the first iNTP purine base (see also supplemental Movie S1). B, apparent Km values for the initiating substrates on the wild-type T7A1 promoter (−1C non-template sequence) and its variant (−1G non-template sequence) for RNAPs containing wild-type or Δ513-519 σ70 factors. The numbers in blue boldface type show changes in Km values for the T7A1 promoter variant relative to the wild-type sequence. C, non-template DNA sequence around the transcription start sites of human Pol I, adapted from Ref. 35. The transcription start site is +1. D, sequence conservation of the non-template DNA around the transcription start site of human Pol II. The sequence logo was prepared using a non-redundant collection of human Pol II promoters whose transcription start sites (+1) have been determined experimentally. The majority of human Pol II promoters contain a −1 pyrimidine and a +1 purine in the non-template DNA.
The downstream duplex DNA in the initially transcribing complex is shifted 6 base pairs upstream as a result of RNA synthesis, whereas the contact between the upstream −10 DNA element and σ region 2 is maintained. Thus, initial RNA transcription renders the nontemplate DNA bases from −3 to +6 disordered because of scrunching of the nontemplate strand (Fig. 1A and supplemental Movie S1).
Compared with the de novo transcription initiation complex, the initially transcribing complex with 6-mer RNA also shows changes in σ region 3.2 and in the template DNA near region 3.2 (Fig. 5C). In the de novo complex, the template DNA bases at positions −4/−3 are flipped out to contact the acidic residues of region 3.2, whereas in the initially transcribing complex, the DNA bases at the corresponding positions upstream of the active site base pair with the nascent 6-mer RNA. The 5′-end of the RNA transcript reaches region 3.2, causing the tip of this region (residues 321-327) to become disordered (Fig. 5B). This is likely due to charge repulsion between the acidic cluster of region 3.2 and the 5′-triphosphate of RNA. As RNAP continues transcription, region 3.2 must be pushed further to accommodate the longer RNA transcript extending toward region 4.1, ultimately resulting in the ejection of region 3.2 from the RNA exit channel (Fig. 5D).
Overall, the DNA/RNA hybrid in this initially transcribing complex is a typical A-form duplex without the tilt observed earlier in the Pol II initially transcribing complexes (Fig. 5D) (29,31). As the DNA/RNA hybrid extends, the initial contacts between RNAP and the triphosphate group of the first iNTP are lost. The positively charged and polar residues that interact with the triphosphate group of the first iNTP in the de novo transcription initiation complex (Gln-β567, Lys-β838, Lys-β846, and His-β999) now interact with internal phosphate and ribose groups at positions −3/−4 of the nascent RNA, suggesting a role in stabilizing the short RNA-DNA hybrid (Fig. 5B). To test the role of these residues in the extension of short RNAs, we analyzed abortive transcription by the wild-type and mutant E. coli RNAP variants on the consensus T7A1cons promoter (Fig. 3A), which is characterized by a very high efficiency of abortive synthesis owing to strong RNAP-promoter interactions (20). Transcription was performed at high NTP concentrations and in the presence of the dinucleotide primer to ensure efficient de novo initiation by the mutant RNAPs. Surprisingly, the Lys-β1065A substitution decreased abortive RNA synthesis (Fig. 5E), suggesting that in wild-type RNAP this residue may directly or indirectly hinder short RNA extension, while playing a role in first bond formation (see above). In contrast, the His-β1237A and His-β1237A/Gln-β688A substitutions greatly stimulated the synthesis of ≤10-nt abortive RNAs but did not affect longer abortive products. Similarly, these mutations increased the amounts of short abortive RNAs synthesized on the wild-type T7A1 promoter (Fig. 3B, lanes 11 and 12 and lanes 23 and 24). Therefore, His-β1237 plays an important role in stabilizing short nascent RNAs in the RNAP active site until the full DNA/RNA hybrid is formed and the RNA reaches the RNA exit channel.

DISCUSSION
In this study, we report the structures of bacterial transcription initiation complexes with iNTPs or a short RNA product bound in the active site of RNAP. In comparison with existing structures, they reveal essential interactions of the iNTPs with template DNA and, for the first time, highlight the initial steps of RNA extension by bacterial RNAP. Furthermore, we provide biochemical support for our structural findings and demonstrate the importance of the observed contacts of iNTPs and RNA with RNAP and the DNA template for transcription initiation.
The binding of the first iNTP is a special feature of all cellular RNAPs capable of primer-independent de novo transcription initiation. A network of salt bridges and hydrogen bonds between the triphosphate of the first iNTP and RNAP side chains is critical for its binding. At the same time, the positions of the ribose and base of the first iNTP are identical to those of the nucleotide at the i site in the elongation complex. Furthermore, the position of the second iNTP in the promoter complex corresponds to that of the incoming NTP bound in the preinsertion state in the elongation complex, indicating that the mechanism of phosphodiester bond formation is the same in transcription initiation and elongation. Similar positions of the iNTPs were observed in the T. thermophilus RNAP de novo transcription initiation complex structure reported by Ebright and co-workers (9), as part of an analysis of the antibiotic GE23077 targeting bacterial RNAP, while our manuscript was in preparation. We now provide strong biochemical evidence supporting the importance of the observed iNTP contacts with RNAP and DNA for de novo transcription initiation.
The structures we obtained using template DNA containing a purine nucleotide at the −1 position revealed a stacking interaction between the −1 purine base in the template DNA and the first purine iNTP. This stacking was not observed in previously reported structures because they used a suboptimal template sequence with a pyrimidine at template position −1. We showed that the base stacking interaction plays a major role in the binding of not only the first iNTP but also the second iNTP, likely as a result of template stabilization (Fig. 4B). This purine-purine base stacking during de novo transcription initiation explains the preference for a purine nucleotide at the transcription start site and a pyrimidine nucleotide at the preceding position of the non-template strand in bacterial promoters (Fig. 4A) (24,33,34). For example, in the Mycobacterium tuberculosis genome, pyrimidines at the −1 and purines at the +1 non-template DNA positions occur at frequencies of >70 and 80%, respectively (33). It remains to be established whether the suboptimal −1 nucleotides present in a minority of promoters have any specific role in transcription regulation in bacteria. The preferred pyrimidine (−1)/purine (+1) combination is also found in the majority of eukaryotic Pol I (35) and Pol II promoters (Fig. 4, C and D) (36). A similar base stacking interaction was noted in the structure of the N4 phage RNAP de novo transcription initiation complex (7), indicating that this mechanism may be universal.
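The start-site statistic discussed above (pyrimidine at −1, purine at +1 on the non-template strand) amounts to a simple positional tally over aligned promoters. The sketch below is illustrative only and is not the analysis pipeline of refs. 24, 33, or 34: the function name `start_site_composition`, the input format (non-template sequence 5′→3′ plus the 0-based index of the +1 base), and the toy sequences are all hypothetical.

```python
# Hypothetical sketch: tally -1/+1 base composition around annotated start sites.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def start_site_composition(promoters):
    """Return (frac. pyrimidine at -1, frac. purine at +1, frac. -1Y/+1R pairs).

    promoters: iterable of (sequence, tss_index), where sequence is the
    non-template strand written 5'->3' and tss_index is the 0-based index
    of the +1 (transcription start) base. Entries lacking a -1 base are skipped.
    """
    total = pyr_minus1 = pur_plus1 = yr_pairs = 0
    for seq, tss in promoters:
        if tss < 1 or tss >= len(seq):
            continue  # need both the -1 and the +1 positions
        seq = seq.upper()
        is_pyr = seq[tss - 1] in PYRIMIDINES  # base at -1
        is_pur = seq[tss] in PURINES          # base at +1
        total += 1
        pyr_minus1 += int(is_pyr)
        pur_plus1 += int(is_pur)
        yr_pairs += int(is_pyr and is_pur)
    if total == 0:
        raise ValueError("no promoter entries with both -1 and +1 positions")
    return pyr_minus1 / total, pur_plus1 / total, yr_pairs / total


if __name__ == "__main__":
    # Toy, made-up sequences; index 4 marks the hypothetical +1 base.
    toy = [("AACTAGG", 4), ("GGCCGA", 4), ("TTAAAT", 4), ("CCCTCC", 4)]
    print(start_site_composition(toy))  # (0.75, 0.75, 0.5)
```

Counting the combined −1Y/+1R pair separately from the two single-position frequencies distinguishes promoters that carry the full stacking-compatible combination from those that carry only one of the two preferred bases.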
Although the role of the σ factor in promoter recognition is well established, the structural basis for its role in iNTP binding remained unknown. Our structure reveals no direct contacts between σ region 3.2 and the first iNTP. Rather, this region seems to guide the path of the template DNA, thereby positioning the DNA bases for optimal binding of the iNTPs. Therefore, deletion of region 3.2 likely disturbs template DNA binding, especially around the transcription start site, allowing the template to adopt a floppy conformation that hampers stable binding of the iNTPs (3,20,37). The eukaryotic Pol II counterpart of σ region 3.2 is the B-reader of TFIIB, which analogously inserts its α-helix deep into the DNA-binding cleft close to the Pol II active site to ensure proper template DNA positioning and participates in transcription start site selection (30,38,39).
When RNAP transcribes a 6-mer RNA, the RNA 5′-end reaches the tip of σ region 3.2, causing it to become disordered, most likely owing to repulsion between the RNA 5′-triphosphate and the acidic cluster at the tip of region 3.2. The 5′-triphosphate group is also partially disordered, indicating that there are no stable interactions between region 3.2 and the RNA. Our initially transcribing complex structure indicates that synthesis of a short RNA transcript is the first step of the σ ejection process, which is likely followed by ejection of σ region 4 from the RNA exit channel and allows the initiation-to-elongation transition. Accordingly, deletion of σ from region 3.2 to its C terminus eliminates abortive RNAs (40), whereas deletion of region 3.2 alone reduces only the 5- to 9-mer abortive RNAs (20). Similarly, the B-reader of TFIIB in the eukaryotic Pol II system is expected to be extruded from the RNA channel once the RNA reaches 6 nucleotides in length, revealing striking similarities to the bacterial system (23).
The initially transcribing complexes of RNAP holoenzyme containing short RNAs, usually 2-12 nucleotides in length, are unstable in solution, resulting in abortive initiation (10,11). In this study, we demonstrated that the T. thermophilus RNAP is active in the crystalline state and is capable of synthesizing RNAs up to 6 nucleotides long. In the crystal, RNAP motions are restricted by crystal packing, which likely stabilizes the initially transcribing complex and allows highly homogeneous complexes to be prepared. The in crystallo transcription system can be used to study RNA transcription by Raman crystallography and time-resolved trigger-freeze crystallography. Analogous to studies performed on single-subunit RNAPs (41,42), these new experimental approaches may allow events to be traced not only in initial transcription but also during the NTP addition cycle in cellular RNAPs.