Solution structure of domain 1.1 of the σA factor from Bacillus subtilis is preformed for binding to the RNA polymerase core

Bacterial RNA polymerase (RNAP) requires σ factors to recognize promoter sequences. Domain 1.1 of primary σ factors (σ1.1) prevents their binding to promoter DNA in the absence of RNAP, and when in complex with RNAP, it occupies the DNA-binding channel of RNAP. Currently, two 3D structures of σ1.1 are available: from Escherichia coli in complex with RNAP and from T. maritima solved free in solution. However, these two structures significantly differ, and it is unclear whether this difference is due to an altered conformation upon RNAP binding or to differences in intrinsic properties between the proteins from these two distantly related species. Here, we report the solution structure of σ1.1 from the Gram-positive bacterium Bacillus subtilis. We found that B. subtilis σ1.1 is highly compact because of additional stabilization not present in σ1.1 from the other two species and that it is more similar to E. coli σ1.1. Moreover, modeling studies suggested that B. subtilis σ1.1 requires minimal conformational changes for accommodating RNAP in the DNA channel, whereas T. maritima σ1.1 must be rearranged to fit therein. Thus, the mesophilic species B. subtilis and E. coli share the same σ1.1 fold, whereas the fold of σ1.1 from the thermophile T. maritima is distinctly different. Finally, we describe an intriguing similarity between σ1.1 and δ, an RNAP-associated protein in B. subtilis, bearing implications for the so-far unknown binding site of δ on RNAP. In conclusion, our results shed light on the conformational changes of σ1.1 required for its accommodation within bacterial RNAP.

Transcription of DNA into RNA is an essential cellular process. It is mediated by DNA-dependent RNA polymerase (RNAP). RNAP is a multisubunit enzyme, and in bacteria the RNAP core is composed of five subunits (2α, β, β′, ω). Gram-positive Firmicutes contain two additional subunits, δ and ε (1). A number of functions have been ascribed to δ, including affecting the affinity of RNAP for DNA (2) and for initiating nucleoside triphosphates (3) or functioning as a transcription factor (4). The absence of δ decreases competitive fitness and virulence (3,5). The effects of δ are augmented by HelD, a helicase-like protein associating with RNAP (6). The function of ε is less clear (6,7). The RNAP core is capable of transcription elongation, but it is not able to bind to promoter DNA, where transcription begins. For RNAP to bind to promoter DNA, the presence of a σ factor is necessary. Upon binding of a σ factor to the RNAP core, the resulting RNAP holoenzyme is capable of recognizing promoter sequences and subsequently initiating transcription (8). Preventing the binding of σ factors to the RNAP core is also used as a strategy for development of novel antibacterial compounds (9). Different bacterial species contain different numbers of σ factors, ranging from one per species to more than 100 (10). According to their structure, σ factors are divided into two principally distinct groups: the σ70 and σ54 families. Factors from the σ54 family have no sequence similarity with σ70 factors, and they require the binding of ATP-dependent activators (11). Factors from the σ70 family are present in all bacterial species, and no ATP-dependent activators are needed. The σ70 family is further subdivided into four groups (groups 1-4), based on domain composition. Group 1 contains vegetative σ factors (σ70 in Escherichia coli, σA in Bacillus subtilis) essential for transcription of housekeeping genes.
Groups 2-4 contain structurally related alternative σ factors responsible for transcription of genes whose expression is important during various environmental stresses (10,11).
The vegetative σ factors (group 1) contain four domains: domain 1.1, domain 2 (regions 1.2-2.4), domain 3 (regions 3.0-3.2), and domain 4 (regions 4.1-4.2). Regions 2.4 (domain 2) and 4.2 (domain 4) recognize the −10 and −35 promoter consensus hexamer sequences, respectively, that are critical for the initial RNAP-DNA binding (closed complex) and subsequent formation of the transcription bubble, the so-called open complex. Region 1.2 (domain 2) interacts with the DNA region between the transcription start site (+1) and the −10 hexamer and affects the stability of the open complex. Domain 3 binds to the extended −10 motif (TGx). This motif precedes the −10 hexamer, and it is not present in all promoters. When it is present, however, it increases the promoter affinity for RNAP and boosts transcription (10).
When σA binds to RNAP that is not in complex with DNA, σ1.1 occupies the DNA-binding channel (12,13). Further, σ1.1 plays a specific role in autoregulation of the σ factor: it prevents the free σ factor from binding to promoter DNA on its own. Trans-binding experiments with the E. coli σ1.1 region and truncated σ70 variants suggested that σ1.1 binds to domain 4 (region 4.2) in the free state (14). Furthermore, an interdomain cross-linking study of possible interdomain interactions in the Thermotoga maritima σ factor suggested that σ1.1 is in close proximity to domains 2 and 4 (15). Molecular details of these interactions, however, are still elusive.
Currently, two structures of σ1.1 are available: one from T. maritima solved by NMR (15) and the other from E. coli solved by crystallography in complex with RNAP (12). Despite sequence similarities in σ1.1 in these two organisms, their 3D structures differ. In both organisms, this domain consists of three helices (HI-HIII) connected by two loops. However, although in T. maritima HII and HIII are roughly anti-parallel to one another and pack perpendicularly against HI, in E. coli the three helices show anti-parallel packing, leading to a distinctly different morphology. Here, to provide a basis for better understanding of the structural diversity among domains 1.1 from different species that could have implications for their binding to RNAP, we solved the solution structure of σA domain 1.1 from the model soil-dwelling Gram-positive bacterium B. subtilis.

Structure determination
We decided to solve the structure of σ1.1 by NMR because of its small size (9.4 kDa) and the benefit of obtaining additional information on the flexibility of the protein. Despite the high occurrence of Glu and Gln residues in the σ1.1 sequence, an almost complete backbone and side-chain assignment was obtained. As expected, overlapped peaks from the His tag and missing peaks from the N-terminal residue were not assigned; otherwise only two backbone and four side-chain chemical shifts remained unassigned. Spectra measured on a diluted sample confirmed the monomeric state of the sample. NOE assignment yielded 1886 unambiguous ¹H-¹H distances, including 486 long-range NOEs. Additional restraints, 21 ³J(HNHA) couplings and 86 RDC values, were used for the residues that were predicted to form α-helices. The calculated structure is in agreement with the secondary structure prediction from chemical shifts. Statistics of the structure calculation are presented in Table 1.

Structure of σ1.1
The first 71 amino acids of B. subtilis σ1.1 form three helices (helix I, Phe12-Arg26; helix II, Tyr31-Phe41; and helix III, Ser45-Glu57) that are connected by two short loops (Fig. 1). The HI-HII loop is formed by amino acids Gly27-Thr30, and the HII-HIII loop is formed by amino acids Glu42-Glu44. The N terminus (amino acids Ala1-Thr11) and the C terminus (amino acids Gln58-Asp71) are mainly unstructured. An important exception is a small portion of the C terminus (amino acids ⁶¹ELI⁶³), which together with the HI-HII loop (residues ²⁸VLT³⁰) forms a β-sheet motif that is stabilized via hydrogen bonding between the respective parts of the σ1.1 main chain. The total charge of the 71-amino acid B. subtilis σ1.1 is −15, i.e. −21 × (Glu or Asp) + 6 × (Lys or Arg). The frequency of negatively charged amino acids increases from the N terminus to the C terminus. Although helix I is still slightly positively charged (+1), the remaining structural motifs display progressively increasing negative charge: the N terminus (−1), helix II (−2), HII-HIII (−2), helix III (−5), and the C terminus (−6). It is therefore clear that the B. subtilis σ1.1 can mimic a portion of downstream duplex DNA that is also strongly negatively charged because of the presence of phosphate groups in the phosphodiester bonds between nucleotides.
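The charge bookkeeping in the preceding paragraph can be checked with a few lines of Python. This is only an illustrative sketch: the per-element charges and the residue counts are taken from the text, the HI-HII loop is assumed to be neutral because no charge is listed for it, and the helper `net_charge` and the toy peptide passed to it are hypothetical.

```python
# Net charge per structural element, as listed in the text.
# The HI-HII loop is not listed there and is assumed to be neutral.
SEGMENT_CHARGE = {
    "N terminus": -1,
    "helix I": +1,
    "helix II": -2,
    "HII-HIII loop": -2,
    "helix III": -5,
    "C terminus": -6,
}

N_ACIDIC = 21  # Glu or Asp residues reported for the 71-residue domain
N_BASIC = 6    # Lys or Arg residues reported for the 71-residue domain


def net_charge(sequence: str) -> int:
    """Formal net charge of a one-letter sequence (hypothetical helper).

    Counts Asp/Glu as -1 and Lys/Arg as +1; His and the termini are ignored.
    """
    acidic = sum(sequence.count(res) for res in "DE")
    basic = sum(sequence.count(res) for res in "KR")
    return basic - acidic


total_from_counts = N_BASIC - N_ACIDIC
total_from_segments = sum(SEGMENT_CHARGE.values())

print(total_from_counts, total_from_segments)
```

Both totals agree with the overall charge of −15 stated in the text, which is consistent with the assumption that the HI-HII loop carries no net charge.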

¹⁵N relaxation
NMR relaxation was used to probe the dynamics of the σ1.1 domain. A set of ¹⁵N relaxation rates, including R1 and R2 autorelaxation rates, steady-state [¹H]-¹⁵N heteronuclear Overhauser enhancement (ssNOE), and longitudinal and transverse cross-correlated relaxation rates, was measured (16). The software relax was used to analyze the relaxation data in the model-free manner (17-20). The analysis showed that the molecule tumbles as a prolate spheroid, with the global correlation time (1/(6D_iso)) of 6.13 ns at 25°C and with D∥/D⊥ of 2.23. Internal dynamics of individual residues was characterized by the order parameter S² and the effective correlation time τ_e (Fig. 2). A decrease of the order parameter S² values, indicating more flexible regions, was observed in the loop between α-helices II and III, as well as in the β-sheets. In many cases, introduction of a second mode of internal motion, described by an additional order parameter and correlation time, resulted in a statistically significant improvement of the fit. The correlation time of the second mode was well defined in the terminal regions (τ_s ≈ 1 ns) but determined with a large uncertainty in the well ordered regions (Fig. 2B). Also, the analysis provided a significant exchange contribution (R_ex) for many residues (Fig. 2C), indicating the presence of a slow conformational exchange in helix I (especially in its terminal regions) and in the proximal regions (C-terminal portion of helix II, β-sheet II, and its vicinity). In summary, the model-free analysis revealed that the internal motions of σ1.1 are more complex than typical for well structured proteins. Therefore, molecular motions on a slow time scale (ms-s) were studied in more detail, using the ¹⁵N relaxation dispersion Carr-Purcell-Meiboom-Gill sequence (CPMG) experiment analyzed in the software relax (20).
The analysis confirmed the presence of a conformational exchange, sufficiently well described by a two-state model with a relatively uniform exchange rate of ~3,500 Hz and with the minor state being populated by 1.5% (Fig. 3). The ¹⁵N chemical shift differences between both states varied from 2 to 6 ppm, with the most significant changes at the end of helix I, in helix II, and in β-sheet II.
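To make the reported numbers more tangible, the following sketch converts them into related quantities using standard textbook relations. It is not the analysis performed in relax. It assumes an axially symmetric diffusion tensor with D_iso = (D∥ + 2D⊥)/3, a two-state exchange with k_ex = k_AB + k_BA and minor-state population p_B = k_AB/k_ex, and it treats the reported ~3,500 Hz as k_ex in s⁻¹. All function names are hypothetical.

```python
# Values reported in the text
TAU_C_NS = 6.13     # global correlation time 1/(6*D_iso), in ns
ANISOTROPY = 2.23   # D_parallel / D_perpendicular
K_EX = 3500.0       # exchange rate, taken here as s^-1 (assumption)
P_MINOR = 0.015     # population of the minor state


def diffusion_tensor(tau_c_ns, anisotropy):
    """Return (D_iso, D_par, D_perp) in s^-1 for an axially symmetric tensor."""
    d_iso = 1.0 / (6.0 * tau_c_ns * 1e-9)
    # D_iso = (D_par + 2*D_perp)/3 with D_par = anisotropy * D_perp
    d_perp = 3.0 * d_iso / (anisotropy + 2.0)
    d_par = anisotropy * d_perp
    return d_iso, d_par, d_perp


def two_state_rates(k_ex, p_minor):
    """Split k_ex into forward (major -> minor) and reverse rate constants."""
    k_ab = p_minor * k_ex
    k_ba = (1.0 - p_minor) * k_ex
    return k_ab, k_ba


d_iso, d_par, d_perp = diffusion_tensor(TAU_C_NS, ANISOTROPY)
k_ab, k_ba = two_state_rates(K_EX, P_MINOR)

print(f"D_iso  = {d_iso:.3e} s^-1")
print(f"D_par  = {d_par:.3e} s^-1, D_perp = {d_perp:.3e} s^-1")
print(f"k_AB   = {k_ab:.1f} s^-1, k_BA = {k_ba:.1f} s^-1")
print(f"minor-state lifetime = {1e3 / k_ba:.2f} ms")
```

Under these assumptions, the minor state forms at roughly 50 s⁻¹ and returns to the major state within a fraction of a millisecond, which is the regime that CPMG relaxation dispersion is designed to probe.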

Position of σ1.1 within the DNA-binding channel of RNAP
In RNAP that is not bound to DNA, σ1.1 is positioned inside the DNA-binding channel (12,13). To gain insights into the position of B. subtilis σ1.1 within B. subtilis RNAP, we carried out structural alignments using the software package Molsoft (www.molsoft.com). As a template we used the crystal structure of E. coli RNAP (PDB code 4LK1; Ref. 12). Our NMR structure of B. subtilis σ1.1 and a previously published homology model of B. subtilis RNAP core (21) were structurally aligned with the template.
In this model, B. subtilis σ1.1 occupies the downstream duplex DNA-binding channel with its center of gravity at approximately +8 (+1 is the transcription start site), where it must be displaced by the DNA upon formation of the open promoter complex. It is wedged in the RNAP channel, interacting with the β subunit (i.e. amino acids Asp151, Arg183, Arg188, Arg241, and Arg498) and with structural elements of the β′ subunit, namely the β′ clamp (Ile110 and Arg123), the rudder around residue 301, and two amino acid residues of the β′-pincer (Lys1125 and Arg1144). Many salt bridges stabilizing B. subtilis σ1.1 in the downstream DNA-binding channel of the RNAP core can be predicted based on our model, involving helices I and II and especially helix III (core RNAP/σ1.1 …; Fig. 4A). (The spaces between some residues indicate that the residues have more than one possible salt bridge partner.) Although the position of the structured part of B. subtilis σ1.1 within the DNA channel can be well predicted, the position of the unstructured C terminus of B. subtilis σ1.1 (the linker to σ1.2) can be only roughly approximated. Nevertheless, it is apparent that the C terminus of σ1.1 would interact with mobile and functionally important parts of the RNAP core, namely with the bridge helix (responsible for the DNA-RNA translocation), and with the trigger loop (opening or closing access of NTP into the active site of RNAP through the secondary channel). The C terminus of B. subtilis σ1.1 contains many negatively charged amino acids (for example Glu67, Glu68, Glu70, Asp71, and Glu73) that could create additional salt bridges with a number of positively charged amino acid residues in the bridge helix (Arg784, Lys785, Lys793, Arg802, and Arg803) and in the trigger loop (Arg937, Arg955, and Arg963; Fig. 4A).

Discussion
We have determined the solution structure of σ1.1, the N-terminal domain of the primary σ factor, σA, from B. subtilis. For a long time, the structure of this domain had not been available because of its flexibility until it was solved by NMR for T. maritima (15), and, several years later, also by crystallography for E. coli (12). In the crystal structure, σ1.1 is a part of σ70, in the context of a complex with RNAP. The two known structures (E. coli and T. maritima) differ significantly. In the following text, we provide detailed comparisons of B. subtilis σ1.1 with these two structures. The comparisons shed light on interactions and conformational changes of σ1.1 required for its accommodation within RNAP.

sequence comparisons
The σ1.1 domains from B. subtilis, E. coli, and T. maritima displayed a modest degree of sequence similarity except for the non-conserved N terminus (residues ~1-30) of T. maritima (B. subtilis and E. coli lack this fragment). Pairwise alignments of respective sequences from B. subtilis, E. coli, and T. maritima revealed a sequence identity of ~25%. Multiple alignment of these sequences then yielded a sequence identity of only ~10%. Nevertheless, it should be noted that in the case of Glu/Asp amino acid residues, which we believe are critical for the binding of σ1.1 to the RNAP core, there are numerous point substitutions that do not change charge (either Glu to Asp or Asp to Glu, seven occasions for B. subtilis versus E. coli; Fig. 1). Moreover, these amino acids are often shifted just by one position in the σ1.1 sequences (five occasions for T. maritima versus B. subtilis and E. coli; Fig. 1). These evolutionary differences mean that the sequence similarity is significantly greater than

structure comparisons
The 3D solution structure of B. subtilis σ1.1 resembles the crystal structure of E. coli σ1.1, which was obtained in the context of RNAP (Fig. 5). Despite the similarities, helix I in B. subtilis σ1.1 is slightly longer than helix I in E. coli σ1.1 (12). Furthermore, several interactions that contribute to anchoring helix I to the rest of the structure of B. subtilis σ1.1 are missing in E. coli σ1.1. These interactions are mediated by Phe12, whose bulky side chain is nestled between side chains of Met38, Phe41, and Ile43 of the HII-HIII loop (Fig. 1).
The B. subtilis and E. coli structures differ markedly from that of T. maritima, in which HI packs perpendicularly to HII and HIII. This contrasts with the all-anti-parallel packing of helices in B. subtilis and E. coli σ1.1 (Fig. 5). It should be noted that HI from T. maritima σ1.1 is by far the longest one, and it is also preceded by the non-conserved and unstructured N terminus.
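The distinction between anti-parallel and perpendicular helix packing can be expressed as the angle between helix axis vectors. The sketch below illustrates this idea only; it is not the procedure used for the comparison in Fig. 5. The axis endpoints are hypothetical toy coordinates (in Å), the 30° tolerance is an arbitrary assumption, and all function names are invented for this example.

```python
import math


def axis_vector(start, end):
    """Direction of a helix axis from its N-terminal to its C-terminal end point."""
    return tuple(e - s for s, e in zip(start, end))


def interhelix_angle(u, v):
    """Angle in degrees (0-180) between two axis vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cos_theta = max(-1.0, min(1.0, dot / norm))  # guard against rounding
    return math.degrees(math.acos(cos_theta))


def classify(angle_deg, tol=30.0):
    """Coarse packing class; the tolerance is an assumption of this sketch."""
    if angle_deg >= 180.0 - tol:
        return "anti-parallel"
    if angle_deg <= tol:
        return "parallel"
    if abs(angle_deg - 90.0) <= tol:
        return "perpendicular"
    return "oblique"


# Hypothetical axis end points for three helices
helix_a = axis_vector((0.0, 0.0, 0.0), (0.0, 0.0, 20.0))
helix_b = axis_vector((8.0, 0.0, 20.0), (8.0, 0.0, 2.0))   # runs back along -z
helix_c = axis_vector((0.0, 8.0, 10.0), (18.0, 8.0, 10.0))  # runs along +x

print(classify(interhelix_angle(helix_a, helix_b)))
print(classify(interhelix_angle(helix_a, helix_c)))
```

In practice the axis vectors would be derived from the backbone coordinates of each helix, for example by a least-squares line fit through the Cα atoms.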

interactions with RNAP
Predicted salt bridges between B. subtilis σ1.1 and the DNA channel have analogous interactions in the structure of E. coli RNAP (PDB code 4LK1; Ref. 12; compare Fig. 4, A and B). The compact structure of B. subtilis σ1.1 is undoubtedly important for its optimal interactions with the RNAP core in the downstream DNA-binding channel (especially salt bridges formed by Glu57, Glu65, Arg26, and Glu42; Fig. 4A). The compact structure likely compensates for the absence of some additional parts of RNAP (including those of σ1.1) that are found uniquely in either E. coli or T. maritima and that participate in positioning of σ1.1 in the DNA channel by reducing the breathing movements of RNAP cleft arms. The "extra" part of E. coli RNAP consists of a large insertion (i.e. amino acids Gly938-Thr1131; see PDB code 4YLN; Ref. 22) in the trigger loop that regulates access of NTPs into the RNAP active site via the secondary channel. The extra part of T. maritima RNAP is the relatively long N terminus of its σ1.1, which was unstructured in the solution structure of isolated σ1.1 (15).
Remarkably, HI of T. maritima σ1.1 apparently does not fit into the downstream DNA channel of RNAP (supplemental Fig. S1), whereas HI of B. subtilis σ1.1 fits this space smoothly (Fig. 4C and supplemental Fig. S1). This indicates that HI of T. maritima σ1.1 likely undergoes a conformational change to be accommodated into the RNAP core.
Interestingly, the structure of T. maritima σ1.1 is very similar to the structured N-terminal part of the δ subunit of B. subtilis RNAP, consisting of four α-helices (helices Ia, Ib, II, and III, formed by residues Gln8-Lys12, Leu16-His27, Phe33-Leu44, and Gly52-Asn63, respectively). In fact, helices Ia + Ib, II, and III of δ correspond to HI, HII, and HIII of T. maritima σ1.1, respectively (Fig. 5 and supplemental Fig. S2). In addition, δ contains a short anti-parallel β-sheet composed of three short β-strands (residues Val31-Pro32, Phe68-Ala70, and Thr75-Leu78) at the top of a "twisted tripod" formed by helices Ib, II, and III (Refs. 23-25; Fig. 5). This motif structurally overlaps with the short two-strand β-sheet found in our NMR structure of B. subtilis σ1.1. Future experiments will have to reveal whether the apparent structural similarity between σ1.1 and δ may provide clues for identification of the so-far unknown binding site of δ on RNAP.
In conclusion, the determined solution structure of B. subtilis σ1.1 showed for the first time a preformed 3D conformation that requires minimal, if any, conformational changes to be accommodated within the DNA-binding channel. Moreover, the NMR relaxation data revealed that the determined structure of σ1.1 is in a slow exchange with a minor state, differing mostly in helix I and its proximity. One can speculate that the minor state may resemble the solved structure of σ1.1 from T. maritima. However, further experiments are needed to test this hypothesis.

Sample preparation
σ1.1 was prepared using a standard protocol including cloning, expression, and purification methods to produce a uniformly ¹³C,¹⁵N-labeled sample in a sufficient concentration. The truncated σA gene, coding only for the σ1.1 part (amino acids 1-71), was cloned into a pET28b vector between NcoI and XhoI sites. The additional six histidine residues at the C terminus served as the His tag facilitating the protein purification process, and the two residues preceding the His tag (LE) were inserted because of the restriction enzyme (XhoI) used for cloning.

NMR measurements
All NMR experiments were performed at 25°C using a 0.8 mM ¹³C,¹⁵N-labeled sample or a 0.6 mM ¹⁵N-labeled sample. Temperature was calibrated according to the chemical shift differences of pure methanol peaks.
Resonance assignment was done using experiments acquired on a 700 MHz Bruker Avance III spectrometer equipped with a TXO cryogenic probehead with z axis gradients and an 850 MHz Bruker Avance III spectrometer equipped with a TCI cryogenic probehead with z axis gradients. A standard set of 3D triple-resonance experiments, HNCA, HN(CO)CA, HNCO, HNCACB, and CBCA(CO)NH (26), was used for backbone assignment. Multiple experiments for side-chain assignment, HCCH-TOCSY, TOCSY-HSQC, ¹³C …
Protein structure calculation was based on ¹H-¹H distance restraints obtained from ¹⁵N-edited NOESY-HSQC and ¹³C-edited NOESY-HSQC (for both aromatic and aliphatic spectral regions) experiments (26). In addition, three-bond scalar couplings obtained from the HNHA experiment (28) were used to determine torsion angles of the protein backbone. ¹D(HN-N), ¹D(Cα-C′), ¹D(N-C′), and ²D(HN-C′) RDCs were obtained from ¹H,¹⁵N-IPAP, HN[C]-S3E, and ¹³C-detected (H)CACO-IPAP experiments run on the isotropic sample and on a sample partially oriented in 5% polyacrylamide gel. The program S3EPY (29) was used to evaluate RDCs obtained from the measured spectra. The (non-uniform) RDC errors were estimated from the 1D lineshape using Cramer-Rao lower bound theory (30). The spectra were processed using the program NMRPipe 8.1 (31). Secondary structure prediction utilizing ¹³Cα, ¹³Cβ, ¹³C′, ¹H, and ¹⁵N chemical shifts was done using the program ssp 1.0 (32). Automated assignment of NOESY spectra was performed using CANDID (33), an algorithm included in the program CYANA (34). CNS 1.2 (35) was used for refinement of structures in water using RECOORD scripts modified for our needs (36). The module TENSO (37) was used to include RDCs into the structure calculation. Both ³J(HNHA) couplings and RDCs were used only for residues in α-helices. 300 initial structures were calculated, and 150 structures with the lowest energy were further refined using an explicit water model.
The final ensemble of 20 structures with the lowest energy was validated using program CING (38,39) and deposited in the Protein Data Bank (www.rcsb.org; Ref. 40) under code 5MWW.
NMR relaxation experiments were performed on a 600 MHz Bruker Avance III spectrometer equipped with a QCI cryogenic probehead with z axis gradients, on an 850 MHz Bruker Avance III spectrometer equipped with a TCI cryogenic probehead with z axis gradients, and on a 950 MHz Bruker Avance III spectrometer equipped with a TCI cryogenic probehead with z axis gradients. A total of 2048 complex points was acquired in the acquisition dimension and 320 complex points were acquired in the indirect dimension for autorelaxation rates R1, R2, and steady-state ¹⁵N … (41), separated by 22.22-ms delays, and with a 30-s interscan relaxation delay. Reference spectra and the spectra measured under steady-state conditions were measured in an interleaved manner.
Consistency of the relaxation data was tested as proposed by Morin and Gagné (49). The field-independent J(0) values obtained from the autocorrelated and cross-correlated relaxation rates by spectral density mapping show that the data were not biased by imperfect temperature calibration during the measurements at different magnetic fields. The model-free analysis (17-19) of autorelaxation rates from 600 MHz, 850 MHz, and 950 MHz was performed using the d'Auvergne protocol in the software relax. The prolate diffusion model was selected based on the lowest value of Akaike's information criterion (50). The selected model was then used to estimate errors of calculated parameters by performing 500 Monte Carlo simulations. Relaxation rates and model-free order parameters were deposited in the BioMagResBank under accession number 27011. Relaxation dispersion CPMG experiments with CPMG frequency ranging from 111 to 2000 Hz (51) were acquired and analyzed using the relaxation dispersion autoanalysis in the software relax (20) to study the slow exchange.
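Model selection by Akaike's information criterion can be illustrated with a short sketch. It assumes the common form AIC = χ² + 2k for least-squares fits with known Gaussian errors, where k is the number of fitted parameters. The χ² values and parameter counts below are hypothetical placeholders chosen only to show the mechanics; they are not results of this study, and the function names are invented for this example.

```python
def aic(chi2, n_params):
    """Akaike's information criterion for a chi-squared fit (assumed form)."""
    return chi2 + 2.0 * n_params


def select_model(candidates):
    """Return the name of the candidate with the lowest AIC and all AIC values."""
    scores = {name: aic(chi2, k) for name, (chi2, k) in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical (chi2, number of fitted parameters) for four diffusion models
candidates = {
    "sphere": (412.0, 1),
    "prolate spheroid": (371.5, 4),
    "oblate spheroid": (398.2, 4),
    "ellipsoid": (370.9, 6),
}

best, scores = select_model(candidates)
for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name:18s} AIC = {score:.1f}")
print("selected:", best)
```

The example shows why a model with a slightly lower χ² (here the hypothetical ellipsoid) is not necessarily selected: the penalty of 2 per additional parameter can outweigh a marginal improvement of the fit.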
Author contributions-M. Z., L. K., and L. Z. conceived and designed the research; A. R., H. S., and L. K. prepared the samples; M. Z., P. P., and L. Z. acquired and analyzed the NMR data and solved the structure; I. B. did the in silico modeling; and M. Z., I. B., L. K., and L. Z. wrote the manuscript. All authors reviewed and contributed to the manuscript.