The pneumococcal σX activator, ComW, is a DNA-binding protein critical for natural transformation

Natural genetic transformation via horizontal gene transfer enables rapid adaptation to dynamic environments and contributes to both antibiotic resistance and vaccine evasion among bacterial populations. In Streptococcus pneumoniae (pneumococcus), transformation occurs when cells enter competence, a transient state in which cells express the competence master regulator, SigX (σΧ), an alternative σ factor (σ), and a competence co-regulator, ComW. Together, ComW and σX facilitate expression of the genes required for DNA uptake and genetic recombination. SigX activity depends on ComW, as ΔcomW cells transcribe late genes and transform at levels 10- and 10,000-fold below that of WT cells, respectively. Previous findings suggest that ComW functions during assembly of the RNA polymerase-σX holoenzyme to help promote transcription from σX-targeted promoters. However, it remains unknown how ComW facilitates holoenzyme assembly. As ComW seems to be unique to Gram-positive cocci and has no sequence similarity with known transcriptional activators, here we used Rosetta to generate an ab initio model of pneumococcal ComW's 3D-structure. Using this model as a basis for further biochemical, biophysical, and genetic investigations into the molecular features important for its function, we report that ComW is a predicted globular protein and that it interacts with DNA, independently of DNA sequence. We also identified conserved motifs in ComW and show that key residues in these motifs contribute to DNA binding. Lastly, we provide evidence that ComW's DNA-binding activity is important for transformation in pneumococcus. Our findings begin to fill the gaps in understanding how ComW regulates σΧ activity during bacterial natural transformation.

Bacterial natural genetic transformation is the uptake and incorporation of exogenous DNA into a cell's genome. Transformation was discovered in Streptococcus pneumoniae (pneumococcus) (1,2), and subsequently the genes required for transformation have been found in the genomes of streptococci from all species groups (3). As a result, streptococci have highly malleable genomes, making them well equipped for adaptation, as seen with frequent capsular switching and the rapid spread of genes that mediate antibiotic resistance (4). Natural transformation has increased the diversity of the pneumococcal genome, directly contributing to the evolution of multidrug-resistant pneumococcal strains (4). As over 82 bacterial species are naturally transformable (5), and many of these are important human pathogens, a deep understanding of this horizontal gene transfer mechanism is important for continued progress in the fight against drug-resistant pathogens.
Streptococci primed for transformation are described as competent. Competence is a transient state marked by a shift in both transcriptomic and proteomic profiles (6-9). Competence entry is controlled by production of the alternative σ factor (σ), SigX (σX), a member of the σ70 family of proteins (10-12). σ factors transiently associate with core RNA polymerase (E) to direct the holoenzyme (Eσ) to specific promoters to initiate transcription (13,14). Like all bacteria, streptococci contain a principal σ factor, SigA (σA), responsible for most gene transcription. Many bacteria have multiple alternative σ factors, mediating responses to diverse challenges. However, streptococci have a single alternative σ factor, σX. Interestingly, some streptococcal species contain multiple copies of sigX (also known as comX) (3), and sigX expression is strictly linked to competence development. Streptococci utilize two tightly regulated quorum-sensing systems to coordinate σX-mediated competence and other group behaviors (15-18).
Activation of the ComRS or the ComCDE system coordinates uniform sigX expression. Species of the bovis, pyogenic, salivarius, and suis groups utilize the ComRS system to drive σX production (15,16). In this system, the pro-peptide ComS is cleaved to its mature form, XIP (SigX Inducing Peptide). XIP (ComS) is actively exported and reimported for binding to ComR, a member of the Rgg, Rap, NprR, PlcR, and PrgX (RRNPP) family of transcriptional regulators (19,20). The ComR-XIP complex (or ComRS) interacts with DNA as a dimer (21,22), and binds to a conserved inverted repeat upstream of sigX, termed the ComR-box (7,23), to activate transcription of the competence regulon. In contrast, species of the anginosus and mitis groups use the ComCDE pathway (17), an auto-regulated Two-component Signal Transduction system (TCST) that responds to the Competence Stimulating Peptide (CSP) (24). Interestingly, Streptococcus mutans contains both pathways (15,25), adding further layers of complexity to its competence regulation. Therefore, although the competence master regulator, σX, is conserved, streptococci have evolved multiple molecular mechanisms to regulate its expression and, consequently, competence.
Much of our understanding of streptococcal transformation comes from initial work done with the ComCDE system in pneumococcus. During exponential growth, EσA basally transcribes comC, leading to production of the pro-peptide, ComC (24). ComC is simultaneously cleaved and exported as CSP by the ABC-transporter, ComAB (comAB) (26-28). CSP is sensed by the histidine-kinase receptor, ComD, resulting in auto-phosphorylation and activation of its cognate response regulator, ComE, via a phospho-relay event (29-31). Activated ComE promotes transcription of comCDE and other early competence genes (32). ComE-mediated transcription triggers robust competence among the cell population in an auto-positive-feedback regulatory loop. This response culminates in ComE-dependent production of σX (18,32). During competence, EσX transcribes from the combox promoter, directly linking ComCDE quorum sensing to the expression of the transformation regulon (18). ComE also activates expression of a competence co-regulator, comW, encoding ComW, a 9.5-kDa protein of unknown function (33).
ComW homologs are only found in eleven species of the anginosus and mitis groups. Like σX, ComW is transient and tightly controlled by ComCDE, suggesting that ComW is unique to ComCDE competence induction. Although σX is the master regulator and sigX is expressed independently of ComW (33,34), σX activity seems to be dependent on the presence of ComW (35). SigX's requirement for ComW is supported by observations that mutants lacking ComW (ΔcomW) transcribe late genes and transform at levels 10- and 10,000-fold below that of WT cells, respectively (36). Furthermore, ΔcomW cells have decreased σX levels (35), and ComW and σX weakly interact (37), suggesting that they physically function together in the cell.
As multiple σ factor subunits are present in a cell simultaneously, bacteria have evolved many precise and rapid mechanisms to regulate σ activity. The most common regulatory mechanism of alternative σ activity is sequestration by an anti-σ, and subsequent release of σ upon sensing of the appropriate cellular or environmental cues. Release from sequestration allows alternative σ factors to interact with E, resulting in specific Eσ formation (38,39). Less common methods of σ factor activation include small protein activators, like the Gram-negative-specific Crl (40) (an EσS assembly factor), and two-part σ factors, as seen in some bacteriophage systems (41-45) and in Bacillus subtilis (46). In an attempt to pinpoint which of these mechanisms may be applicable to ComW-mediated σX activation, a suppressor screen was conducted in a ΔcomW pneumococcal background. Remarkably, only mutations in σA partially restored late gene expression and transformation efficiency (36,37). These σA suppressor mutations were restricted to σ factor regions 2 and 4 (σ2, σ4) (36,37), domains involved in E binding and promoter recognition (47). We interpret these results as indicators that ComW functions during the swap from reliance on EσA-mediated transcription to reliance on EσX-mediated transcription during competence, and not as a component of a sequestration system.
SigX co-purifies with E from competent pneumococci, but ComW was not isolated with the EσX complex (48). Thus, it is unknown if binding to E is required for ComW function, or whether ComW specifically acts at combox promoters. Purified EσX can direct transcription from combox promoters in vitro without ComW (12). However, these assays were not done in competition with σA, leaving gaps in our understanding of how ComW promotes EσX formation over that of EσA at the onset of competence. As these previous attempts to define how ComW works have not involved direct biochemical or biophysical examination of the protein, our current understanding of ComW's role during competence is based exclusively on indirect, genetic evidence.
To directly explore the biochemical, biophysical, and genetic properties of ComW to help decipher its function as an EσX assembly factor, we created a structural model of ComW using Rosetta (49). Using this model, we identified ComW as structurally homologous to DNA-binding proteins and show that ComW directly binds to DNA, independent of DNA sequence. Additionally, we show that specific residues on ComW's conserved molecular surface are involved in DNA-binding activity and link this activity to efficient transformation in pneumococcus.

ComW is a protein unique to Gram-positive cocci
ComW was initially identified as a CSP-responsive competence regulator in multiple pneumococcal strains (18,33,50). An alignment of 19 ComW proteins from different pneumococcal strains showed that most alleles share a high percentage of residues that are identical to the R6 sequence. However, pneumococcal ComW alleles can vary by as much as 26% (Fig. 1A). This suggests that ComW is conserved within a species and likely plays an important role in the pneumococcal competence response. Furthermore, comparisons of sequences within and across species suggest that there is intra- and interspecies conservation of ComW.
BLAST searches with the ComW sequence from the R6 pneumococcal strain revealed ComW orthologs conserved in at least ten other streptococcal species. These species belong to the anginosus (S. anginosus, S. cristatus, S. oligofermentans, and S. sinensis) and mitis groups (S. dentisani, S. infantis, S. mitis, S. oralis, S. pseudopneumoniae, and S. tigurinus). Initial examination of such ComW orthologs showed that many retain a high percentage of residues that are identical to the pneumococcal protein, like ComW from S. mitis (ComW Sm , 74%) and S. pseudopneumoniae (ComW Sps , 86%), whereas others have as little as 41% identity, like ComW from S. dentisani (Fig. 1B). Like pneumococcus, S. dentisani, S. mitis, and S. pseudopneumoniae are members of the mitis group, suggesting that ComW alleles vary across closely related species. Intra-species variation of ComW is common within S. mitis (Fig. S1). S. mitis ComW proteins that are most similar to pneumococcal ComW can share 88% residue identity. However, some S. mitis ComW proteins share only 40% sequence identity with the ComW of the R6 pneumococcal strain. These relationships suggest that ComW is conserved across many members of the anginosus and mitis groups, but also highlight the divergence of ComW sequences at both the inter- and intra-species level.
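The identity percentages quoted above (for example 74%, 86%, and 41%) are pairwise comparisons against the R6 sequence. A minimal sketch of such a calculation follows, assuming the two sequences are already aligned (with `-` marking gaps) and that identity is counted over the non-gap columns of the reference. The text does not state which denominator was used, so that choice is an assumption, and the function name is hypothetical.

```python
def percent_identity(aligned_ref: str, aligned_other: str) -> float:
    """Percent of reference (non-gap) columns whose residue is identical.

    Both inputs must come from the same alignment, so they have equal length.
    Counting over reference columns is an assumption made for illustration.
    """
    if len(aligned_ref) != len(aligned_other):
        raise ValueError("aligned sequences must have equal length")
    # Keep only alignment columns where the reference has a residue.
    columns = [(r, o) for r, o in zip(aligned_ref, aligned_other) if r != "-"]
    if not columns:
        raise ValueError("reference sequence has no residues")
    matches = sum(1 for r, o in columns if r == o)
    return 100.0 * matches / len(columns)
```

For instance, `percent_identity("EEEY", "EAEY")` compares two toy four-residue strings (not real ComW alleles) that differ at one position.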
Figure 1 (legend, partial). Note: Few pneumococcal sequences contain an alternative start codon (TTG) six bases upstream of the ATG start (not shown). As most comW sequences in pneumococci and other streptococci lack this TTG start codon, and it is in close proximity to the ribosome-binding site, the alternative start codon is not considered as part of the ComW sequence. B, alignment of ComW orthologs from S. pneumoniae (Sp, NP_357614.1), S. pseudopneumoniae (Sps, AEL09536.1), S. mitis (Sm, KEQ48646.1), Peptoniphilus lacrimalis (DNF00528), S. sinensis (Sn, WP_037617413.1), S. cristatus (Sc, EGU68430.1), S. infantis (Si, EFO53736.1), S. anginosus (Sa, EJP26452.1), S. oralis (So, WP_000939510.1), S. dentisani (Sd, WP_038804352.1). For A and B, black text, residues identical to R6; red text, residues that differ from R6; asterisks, identical residues; colons, conserved residues; periods, semi-conserved residues. ComW orthologs from S. anginosus, S. mitis, S. oralis, and S. pseudopneumoniae were moved to pneumococcus and tested for transformation efficiency. C, Espript3 alignment of ComW orthologs from streptococci showing helical placement and solvent-exposed residues based on a ComW model. Red shading, 100% conserved residues; red letters, similar within a group; residues framed in blue, similar across groups. Relative surface accessibility (acc) of each residue is indicated: dark blue box for accessible, cyan box for intermediate, and white box for buried.

Analysis of ComW sequences revealed four highly conserved regions. In the pneumococcus strain R6, these regions are 17 EEEY 20 , 29 DWE 31 , 38 LIYYLVR 44 , and 56 YHYRVAYRLY 65 . Although ComW sequences exhibit variation, specific aromatic and charged residues within these regions are absolutely conserved in all available orthologs. These four motifs are: 17 ExEY 20 , 29 xWE 31 , 38 LxYYLxR 44 , and 56 YH(Y/F)RxxYRxY 65 (Fig. 1B). In addition, many of the residues in these motifs are predicted to be solvent exposed, based on a ComW model (Fig. 1C, Fig. 2). ComW sequences also contain shorter variable regions. Of particular interest, the R6 C-terminal residues, 73 RGFISC 78 , are varied or missing from some orthologs (Fig. 1B). These patterns suggest that residues in the conserved motifs are key for ComW function, whereas the C terminus may be dispensable in some species.
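The four consensus motifs can be written as regular expressions, which makes it easy to check where they occur in a candidate sequence. The sketch below is illustrative only: it treats `x` in the consensus as "any residue", and the dictionary and function names are hypothetical rather than part of the original analysis.

```python
import re

# Consensus motifs from the text; "x" (any residue) becomes "." in the regex.
MOTIFS = {
    "ExEY": r"E.EY",
    "xWE": r".WE",
    "LxYYLxR": r"L.YYL.R",
    "YH(Y/F)RxxYRxY": r"YH[YF]R..YR.Y",
}


def find_motifs(sequence: str) -> dict:
    """Return {motif name: [1-based start positions]} for each consensus motif."""
    hits = {}
    for name, pattern in MOTIFS.items():
        # A zero-width lookahead reports overlapping matches as well.
        hits[name] = [m.start() + 1 for m in re.finditer(f"(?={pattern})", sequence)]
    return hits
```

Applied to a full-length ortholog, the reported start positions can be compared with the R6 numbering (17, 29, 38, and 56) to see whether a motif is retained and whether it is shifted by insertions or deletions.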

Pneumococcal ComW binds to DNA
A BLAST search also identified an ortholog of ComW in the Gram-positive, anaerobic coccus (GPAC), Peptoniphilus lacrimalis (51). Only 47% of P. lacrimalis' ComW residues are identical to those of the pneumococcal R6 strain, but the conserved motifs are retained. The full-length ComW sequence was not found in any Gram-negative organisms. Thus, ComW seems to be unique to Gram-positive bacteria. ComW is critical for pneumococcal competence, but its function is unknown, and it lacks close relatives outside of Gram-positive cocci. This rarity impeded production of a homology model, for example by Phyre2 (52), and suggests that ComW may adopt a novel fold that is important to its function.

A structural model of ComW
We pursued a structural model to probe ComW's biochemical and biophysical properties. Attempts to crystallize ComW were unsuccessful, and NMR studies proved cumbersome due to the buffer conditions required for long-term stability of the protein above 4°C (50 mM Tris, pH 8.0, 500 mM NaCl, 10% glycerol, 1 mM EDTA, 1 mM β-ME) (Fig. S2). As ComW has no close relatives with known structures for the creation of a homology model (Phyre2), we used Robetta (53) and Rosetta (49) to calculate an ab initio ComW model. Initial models of pneumococcal ComW were built using the Robetta server (53). Robetta produced five models (models not shown). Each model was of a globular protein, composed of 3-4 α-helices connected by loops, with a disordered C-terminal tail (residues 73 RGFISC 78 ). Interestingly, Robetta predicted similar models for ComW from S. pseudopneumoniae (ComW Sps , 86% residue identity to ComW from the pneumococcus R6 strain) and ComW from S. anginosus (ComW Sa , 43% residue identity to ComW from the pneumococcus R6 strain) (models not shown). These initial predictions suggested that, despite primary sequence variability, all ComW proteins share a similar fold.
Figure 2. An ab initio Rosetta model of pneumococcal ComW. A, a ribbon view of the ComW model, colored according to secondary structure (α-helices are coral, loops and disordered C-terminal residues ( 73 RGFISC 78 ) are light gray). B, residue conservation was determined using Consurf and displayed on a molecular surface model using Chimera (Petersen, 2004). Residues of the 17 EEEY 20 , 38 LIYYLVR 44 , and 56 YHYRVAYRLY 65 motifs, and conserved, solvent-exposed tyrosines (Y) are circled. Residues targeted for in vivo mutagenesis in this study are written in red. The nonconserved C-terminal ( 73 RGFISC 78 ) residues are shaded by gray circles. C, predicted surface electrostatics as determined using APBS (pH 7.0, parse) and drawn in Chimera; red, negatively charged; blue, positively charged (−10 to 10 kT/e). Conserved motifs, tyrosines, and C-terminal residues are labeled as in B.

To directly control model building parameters, as described in the methods, we also used a local installation of Rosetta (49) to model pneumococcal ComW. The Rosetta calculation produced 80,000 ComW models that were similar to those created by the Robetta server. To find a suitable model for ComW, we narrowed the model pool to the 8,000 lowest energy structures, and clustered the models based on similarity. A summary of five Rosetta low energy clusters is shown in Fig. S3, A-E. Overall the models are globally similar 4-helix bundles with some variation in helical packing. The lowest energy models from each cluster are superimposed in Fig. S3F, and are similar in helical content but differ in their helical packing. This overall structural similarity suggests that the Rosetta calculation produced many models that converge on a similar fold.
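The selection logic described here (keep the lowest-energy subset of models, then take the lowest-energy member of each cluster as its representative) can be sketched as follows. This is not the Rosetta clustering application itself: cluster labels are assumed to have been assigned already, and the record fields (`id`, `energy`, `cluster`) and function names are hypothetical.

```python
def lowest_energy(models, keep):
    """Return the `keep` models with the lowest energy scores."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    # Lower Rosetta-style scores are treated as better.
    return sorted(models, key=lambda m: m["energy"])[:keep]


def cluster_representatives(models):
    """Map each cluster label to its lowest-energy member."""
    best = {}
    for model in models:
        label = model["cluster"]
        if label not in best or model["energy"] < best[label]["energy"]:
            best[label] = model
    return best
```

With the numbers in the text, `keep` would be 8,000 out of 80,000 models, and the representatives of the resulting clusters would be the structures that are superimposed and compared.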
Models in Clusters 0 and 1 were nearly identical in energy score (Fig. S3), making it difficult to distinguish which model is most representative of ComW's native state. We used the DALI server (54) to determine which cluster likely contained models most similar to folds of known function. As detailed later, the lowest energy model from Cluster 1 yielded high quality alignments with proteins or domains of similar size, as determined by DALI Z-scores, indicating it was closer to known protein folds. This low energy structure was chosen as the representative ComW model, and is shown in Fig. 2A.
The Rosetta model of S. pneumoniae's ComW is a globular protein, composed of four tightly packed α-helices that are connected by one short linker and two loops, with a disordered C-terminal tail (residues 73-78), placing the N and C termini at opposite sides of the molecule (Fig. 2A). Surface representations (Fig. 2, B and C) show that the tightly packed helices form a solid core with multiple exterior grooves and/or pockets, in suitable positions to act as binding sites for other biomolecules.
To visualize surface amino acid conservation, the Consurf server (55) was used. From the conservation map, the model predicts that one face of ComW has a large area of highly conserved residues (Fig. 2B, right). This face is composed of elements from two conserved motifs: residues L38, Y40, Y41, L42, V43, and R44 ( 38 LIYYLVR 44 ) in helix α3, and Y56, H57, R59, Y62, and R63 ( 56 YHYRVAYRLY 65 ) in helix α4 (Fig. 1B and 2B). Residues Y40, Y41, and R44 are conserved across species and are specifically predicted to be solvent exposed. Residues Y56, H57, R59, Y62, and R63 are also conserved across species and are also predicted to be solvent exposed on this face or the bottom of ComW. The rest of the residues from the 38 LIYYLVR 44 and 56 YHYRVAYRLY 65 motifs are mostly buried in ComW's hydrophobic core.
On the opposite face of pneumococcal ComW, there is no contiguous stretch of conserved residues (Fig. 2B, left). However, the strictly conserved E17 residue of the 17 EEEY 20 motif in helix α1 is predicted to be partially solvent exposed and to lie at the center of a deep pocket. Residues E18 and E19 are fully exposed, Y20 is partially solvent exposed, and all three of these amino acids extend to ComW's bottom.
To calculate the electrostatic potential, the Adaptive Poisson-Boltzmann Solver (APBS) (56) was used (Fig. 2C). Interestingly, ComW's opposing faces have opposite electrostatic potential. The nonconserved face is highly electronegative, especially the pocket centered on E17 (Fig. 2C, left), whereas the conserved face is electropositive (Fig. 2C, right). Based on these structural features, we hypothesized that these molecular surfaces are binding sites for a protein or other biomolecule and that specific residues in the conserved motifs are integral to such interactions.

ComWΔ6 oligomerizes in solution
Although we have yet to obtain an X-ray crystal or NMR structure of ComW, a truncated version lacking C-terminal residues 73 RGFISC 78 (ComWΔ6, 12.4 kDa with V5H6 tag) was purified via affinity chromatography (Fig. S4) for biochemical and biophysical characterization. A circular dichroism (CD) spectrum indicated a predominately α-helical structure, in agreement with the Rosetta model (Fig. 3).
Size exclusion chromatography (SEC) experiments suggested that ComWΔ6 oligomerizes in solution, as the pure protein eluted from the gel filtration column at a molecular weight of 25.5 kDa, indicative of a dimer (Fig. 4A). Analytical ultracentrifugation (AUC) sedimentation velocity experiments were performed to verify the oligomeric state of ComWΔ6 (Fig. 4B). The ability of ComWΔ6 to oligomerize at increasing protein concentrations (OD 280 of 0.5 (0.31 mg/ml), 1.0 (0.61 mg/ml), and 1.5 (0.92 mg/ml)) was examined. Analysis of sedimentation data revealed one primary peak for each concentration analyzed. At concentrations of 0.31 mg/ml and 0.61 mg/ml, over 60% of ComWΔ6 sedimented at 2.04S (MW ~16.6 kDa) and 2.12S (MW ~17.0 kDa), respectively. Although some higher molecular weight aggregates were observed, the above sedimentation values are largely indicative of a monomeric state. At a concentration of 0.92 mg/ml, 95% of ComWΔ6 sedimented at 2.693S, with a calculated molecular weight of 25.6 kDa, in agreement with SEC experiments. This shows that ComWΔ6 has a propensity to form dimers in solution, and that dimerization is concentration dependent.
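The monomer and dimer assignments above follow from comparing each apparent molecular weight with the 12.4-kDa mass of tagged ComWΔ6. A short worked sketch of that arithmetic is given below; rounding to the nearest whole number of subunits is an illustrative simplification rather than the authors' stated procedure, and the function names are hypothetical.

```python
MONOMER_KDA = 12.4  # mass of V5H6-tagged ComWΔ6 given in the text


def apparent_subunits(observed_kda, monomer_kda=MONOMER_KDA):
    """Ratio of observed mass to monomer mass (not rounded)."""
    if monomer_kda <= 0:
        raise ValueError("monomer mass must be positive")
    return observed_kda / monomer_kda


def nearest_state(observed_kda, monomer_kda=MONOMER_KDA):
    """Nearest whole number of subunits, never less than 1 (illustrative rule)."""
    return max(1, round(apparent_subunits(observed_kda, monomer_kda)))


# Apparent masses (kDa) reported in the text for SEC and the three AUC runs.
for label, mass in [("SEC", 25.5), ("AUC 0.31 mg/ml", 16.6),
                    ("AUC 0.61 mg/ml", 17.0), ("AUC 0.92 mg/ml", 25.6)]:
    print(label, round(apparent_subunits(mass), 2), nearest_state(mass))
```

The ratios at the two lower concentrations fall between one and two subunits, which is consistent with the text's cautious wording that those values are "largely indicative" of a monomer.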

The ComW model is similar to σ factor structures
The ab initio ComW models produced by Rosetta were used to search the Protein Data Bank (PDB) (www.rcsb.org) (57) for similar protein folds, and to help validate our Rosetta calculation. We submitted one model from each Rosetta-generated cluster (Fig. S3) to the Distance Alignment Matrix (DALI) (58) server. Searches with each model returned 600 hits as structurally similar proteins with known function. The combined search yielded 3,000 total hits. Some structural hits were repeated within and across model searches. To focus on proteins that may offer clues to ComW's biological function, DALI hits from all five searches were compared. Proteins that were less than 200 amino acids in length, and/or had DALI Z-scores above 4.0 were considered. Protein structures from the search displayed functions in protein binding, protein degradation, nucleic acid binding, and transcriptional regulation.
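The hit triage described above can be expressed as a simple filter. Because the text says "and/or", the sketch below exposes both readings through a flag; the hit records, field names (`name`, `length`, `z`), and function name are hypothetical, and real DALI output would first need to be parsed into this shape.

```python
def filter_hits(hits, max_length=200, min_z=4.0, require_both=False):
    """Keep structural hits that are short and/or high scoring.

    require_both=False keeps hits meeting either criterion;
    require_both=True keeps only hits meeting both.
    """
    selected = []
    for hit in hits:
        is_short = hit["length"] < max_length   # fewer than 200 residues
        is_strong = hit["z"] > min_z            # Z-score above 4.0
        keep = (is_short and is_strong) if require_both else (is_short or is_strong)
        if keep:
            selected.append(hit)
    # Rank the survivors from highest to lowest Z-score.
    return sorted(selected, key=lambda h: h["z"], reverse=True)
```

Running the filter once per model and intersecting the surviving names is one way to find hits that recur across all five searches.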
Notably, searches with each model returned the Escherichia coli (E. coli) primary σ, σ70 (Z-score = 5.5) (59), as structurally similar. In addition, a number of extracytoplasmic function (ECF) σ factors, including σW of Bacillus subtilis (B. subtilis) (Z-score = 5.7) (60), σE of E. coli (Z-score = 5.8) (61) (Fig. 5), and σK of Mycobacterium tuberculosis (M. tuberculosis) (Z-score = 5.6) (62), were structurally similar to the ComW model. Interestingly, all superimpositions of ComW with these σ factors identified by DALI showed that ComW is structurally homologous to the σ2 domain. This domain is known to directly interact with RNA polymerase, and to function during −10 promoter element recognition and dsDNA melting (63,64). Thus, although ComW has no sequence homologs outside of Gram-positive cocci, it likely adopts a fold similar to some σ factors that exist in many bacteria. Given that both σX-mediated transcription and pneumococcal transformation are dependent on ComW (37), these data strongly suggest that ComW acts as a DNA-binding protein at the onset of competence.

ComWΔ6 binds to DNA, nonspecifically
To test for DNA-binding activity, ComWΔ6 was used in electrophoretic mobility shift assays (EMSA). Increasing amounts of ComWΔ6 were incubated with a fluorescently labeled probe containing the σX competence-specific promoter, known as the combox (65) (Fig. 6A). An increase in the amount of labeled DNA bound by protein was observed as increasing amounts of ComWΔ6 were added to EMSA reactions (Fig. 6C, left). Additionally, a control with cytochrome C showed that the results were unlikely to reflect an artifactual protein-DNA interaction arising simply from the use of concentrated purified protein (Fig. S5). Thus, ComWΔ6 can bind to DNA in the absence of any other protein partners.
Figure legend fragment: For both panels, the PDB entry and aligned residues, polypeptide chain, Z-score (calculated by the DALI server, and considers matched residues and domain size), root-mean-square-deviation (rmsd), and functional region are given in each table, for each structure.

To determine if the ComWΔ6-DNA interaction was specific for the combox promoter, a labeled probe containing the region upstream of sigX from Streptococcus mutans (Fig. 6B) was used in DNA binding assays. In S. mutans, ComRS, a member of the Rgg-like family of transcriptional regulators, controls expression of sigX from the ComR-box promoter, a 20-bp imperfect, inverted repeat (7,15,23). This promoter region differs greatly from the combox promoter that is targeted for transcription by σX (Fig. 6A). As ComW has only been identified in streptococcal species that utilize the ComCDE competence activation pathway, we predicted that ComW would not bind to ComR-box-containing DNA. However, an increase in the amount of shifted S. mutans probe was observed when increasing amounts of ComWΔ6 were added to the reaction (Fig. 6C, right). A comparison of binding curves of these interactions (Fig. 6B) suggested that ComWΔ6 binds to each probe with similar kinetics, and therefore binding is not dependent on DNA sequence. ComWΔ6 was also able to shift a plasmid-derived DNA probe in EMSA (not shown), further supporting that ComWΔ6 can interact with DNA independently of sequence. In addition, ComWΔ6 interacted with a single-stranded DNA (ssDNA) probe only at >64 μM of protein (not shown), suggesting that ComWΔ6 may prefer dsDNA targets.
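Binding curves of the kind compared above are typically built from the fraction of probe shifted at each protein concentration. The sketch below shows one common way to compute that fraction from band intensities; it is an assumption for illustration, since this section does not describe how the gels were quantified, and the function names and intensity values in the usage note are hypothetical.

```python
def fraction_shifted(shifted, free):
    """Fraction of labeled probe in the shifted (protein-bound) band."""
    total = shifted + free
    if shifted < 0 or free < 0 or total == 0:
        raise ValueError("intensities must be non-negative and not both zero")
    return shifted / total


def binding_curve(lanes):
    """Convert [(protein_uM, shifted, free), ...] into [(protein_uM, fraction), ...]."""
    # Sort by protein concentration so the curve reads left to right.
    return [(conc, fraction_shifted(s, f)) for conc, s, f in sorted(lanes)]
```

For example, `binding_curve([(4, 30, 70), (1, 5, 95), (32, 80, 20)])` returns the three toy lanes ordered by concentration, and curves computed this way for the combox and ComR-box probes could then be overlaid to compare sequence dependence.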

Mutations to the 38 LIYYLVR 44 motif of ComW disrupt DNA binding
Structural alignments of the ComW model with alternative σ factors showed that ComW likely adopts a similar fold to these transcriptional regulators (Fig. 5). Close examination of structural alignments between the ComW model and the crystal structure of σ2 of E. coli's σE (61) shows that many of the residues that constitute ComW's conserved surface align with residues that are important for σE-DNA contact during promoter recognition and melting. To investigate if residues in this region of ComW played a role in DNA binding, we purified mutants in the 38 LIYYLVR 44 motif, and one mutant in the 17 EEEY 20 motif located on the opposite electronegative face. Only ComWΔ6 E18A , ComWΔ6 Y40A , ComWΔ6 L42A , and ComWΔ6 R44A proved soluble and stable in vitro, as demonstrated by CD (Fig. 3). Their spectra indicated predominately α-helical structures, again in agreement with the Rosetta model. Furthermore, these data indicate that the point mutations do not disrupt the structure relative to ComWΔ6. Thus, these variants were used in our biochemical assay.
Each soluble ComWΔ6 variant was tested for DNA binding by EMSA with combox and ComR-box DNA. As expected, the nonconserved E18A mutant was able to interact with both DNA probes, indicating that this residue was dispensable for DNA binding (Fig. 7B, left and middle). Comparisons of DNA binding curves also suggested that the ComWΔ6 E18A mutant binds to different probes with similar kinetics; therefore binding was independent of sequence (Fig. 7B, right). ComWΔ6 Y40A also interacted with both DNA probes, indicating that this residue was dispensable for DNA binding (Fig. 7C, left and middle). Again, DNA binding curves showed similar binding kinetics with pneumococcal and S. mutans probes (Fig. 7C, right), further supporting that DNA binding is not dependent on DNA sequence. At 4 μM protein, ComWΔ6 E18A and ComWΔ6 Y40A mutants did not differ in their ability to bind to the DNA probes when compared with ComWΔ6 (Fig. 7A, left). Interestingly, unlike ComWΔ6, ComWΔ6 E18A and ComWΔ6 Y40A point-mutants showed increased total DNA shift at 32 μM, irrespective of DNA sequence, and appeared to shift the DNA to a higher oligomeric form (Fig. 7, A, right and B and C). Mutation of residues E18 or Y40 to alanine may increase total binding of ComWΔ6 to DNA.
In contrast, ComWΔ6 L42A and ComWΔ6 R44A showed a decrease in ability to interact with combox or ComR-box DNA, when compared with ComWΔ6. ComWΔ6 L42A binds to DNA at ≥4.0 μM, compared with binding at 1.0 μM as seen with ComWΔ6 (Fig. 7, A and D, left and middle, and Fig. 6C). However, at 32 μM, ComWΔ6 L42A appeared to bind the DNA probes at levels that were not significantly different than ComWΔ6 (Fig. 7A, right). In addition, a comparison of DNA binding curves with different probes suggested that ComWΔ6 L42A retains the ability to bind to DNA independently of sequence (Fig. 7D, right). ComWΔ6 R44A showed the most dramatic decrease in DNA binding. At 4 μM, ComWΔ6 R44A did not bind to either DNA probe (Fig. 7E, left and middle, Fig. 7A, left). More than 8 μM of protein was required to shift small amounts of the probe in EMSA gels (Fig. 7E, left and middle). At 32 μM of protein, ComWΔ6 R44A shifted only 11-12% of either DNA probe (Fig. 7A, right), significantly less DNA than that shifted by ComWΔ6. This dramatic decrease in binding activity is evident in DNA binding curves. However, the DNA binding curves did show that there was no significant difference in binding to pneumococcal and S. mutans probes (Fig. 7E, right) by ComWΔ6 R44A . These results demonstrate that residues L42 and R44 of ComWΔ6 participate in the ComWΔ6-DNA interaction. Thus, the 38 LIYYLVR 44 motif, present on the positively charged conserved face of ComW, is important for efficient DNA binding.

Residues that participate in ComWΔ6 DNA binding are not required for oligomerization
As ComWΔ6 oligomerizes (Fig. 4), which may contribute to DNA binding, we used SEC to determine the integrity of quaternary structure in the ComWΔ6 L42A and ComWΔ6 R44A mutants. Like ComWΔ6 (25.5 kDa), the ComWΔ6 L42A and ComWΔ6 R44A mutants eluted as dimers, with molecular weights of 19.4 and 24.4 kDa, respectively (Fig. 4). This demonstrated that mutations in the 38 LIYYLVR 44 motif of pneumococcal ComW did not disrupt oligomerization. Together, the SEC and EMSA results show that ComW's 38 LIYYLVR 44 motif on the conserved, electropositive face is unlikely to participate in protein oligomerization, but instead is part of a direct binding site for DNA.

ComW mutations alter protein levels and pneumococcal transformation efficiencies
To determine levels of ComW production and transformation efficiency in comW mutant strains, we used Western blots and a pneumococcal natural transformation assay. A pneumococcal-specific antibody detected WT ComW as a 9.5-kDa band on blots, and WT cells transformed with an efficiency of 66%. In contrast, ΔcomW mutants produced no ComW and transformed at levels 10,000-fold below that of WT cells (Fig. 8, A and C, and Fig. S6B). We attribute the loss of transformability to complete loss of ComW protein in this strain. Following biochemical characterization of the ComWΔ6 variant, we also determined how removal of the disordered C-terminal domain altered ComW production and function in pneumococcus.
ComWΔ6 was produced at levels that were 1/3 that of WT cells, and comWΔ6 mutants transformed at 31% efficiency (Fig. 8, A and C, and Fig. S6B). Thus, comWΔ6-expressing pneumococci exhibited a decrease in transformability that appears to be dependent on the level of ComWΔ6 production. The C-terminal residues, 73 RGFISC 78 , which are not conserved across streptococci (Fig. 1D), are therefore important for ComW stability and full function in vivo, but are dispensable for DNA binding in vitro. The specific functional importance of the disordered tail has yet to be determined.
We determined if changes to the 17 EEE 19 motif altered ComW levels and transformability of pneumococci. This motif is present on the electro-negative, nonconserved surface of ComW and does not appear to be required for DNA binding (Fig. 7, A and B). At 17 min post CSP induction, mutants ComW E17A , ComW E18A , and ComW E19A were produced at significantly lower levels than ComW (Fig. 8C and Fig. S6B). Interestingly, the comW E17A - and comW E19A -expressing mutants transformed at only 24 and 33%, respectively, but the comW E18A -expressing mutant transformed at 73% efficiency (Fig. 8A). Pneumococci did not produce detectable levels of ComW 17AAA19 , and comW 17AAA19 -expressing mutants transformed at levels similar to that of ΔcomW cells (Fig. 8, A and C, and Fig. S6B). Although decreases in ComW E17A , ComW E19A , and ComW 17AAA19 result in decreases in transformation efficiency, a decrease in ComW E18A does not, suggesting that decreased protein production does not always result in a decrease in transformation efficiency. In addition, as these ComW variants are less stable than WT near peak competence, but mutation of this motif still allows DNA binding, we hypothesize that the electro-negative surface of ComW serves an important functional role that may be separate from ComW DNA binding activity.

The ComW ortholog from S. anginosus does not complement pneumococcal ComW
Although all streptococci share competence-specific genes, including the master regulator, σX, they differ in competence activation pathways (3). ComW is only produced in streptococci that activate competence via the ComCDE quorum-sensing pathway (10,33,66). Therefore it is possible that ComW is unique to the ComCDE system, and that streptococci have evolved multiple mechanisms to regulate competence via direct regulation of σX activity. We explored how natural variation in ComW orthologues affected their ability to function in competent pneumococci. Pneumococcal comW was replaced with that of S. anginosus (ComW Sa ), S. mitis (ComW Sm ), S. oralis (ComW So ), or S. pseudopneumoniae (ComW Sps ), and we examined the production of ComW orthologues and the corresponding transformation efficiency in the resulting chimeric strains.
ComW Sm and ComW Sps are 73 and 78% identical to pneumococcal ComW, respectively, with identical 17 EEEY 20 , 38 LIYYLVR 44 , and 56 YHYRVAYRLY 65 motifs and varied sequences of C-terminal residues (Fig. 1B). Both ComW Sm and ComW Sps were detected in Western blots from competent chimeras at 17 min post CSP induction, but levels of ComW Sps were significantly decreased compared with pneumococcal ComW, whereas ComW Sm levels were not (Fig. 8C and Fig. S6B). The S. mitis and S. pseudopneumoniae chimeras transformed at 56% and 55%, respectively, and only the decrease in transformation with comW Sps was statistically significant (Fig. 8B). Although it is possible that differences in amino acid sequence contribute to the observed differences in protein levels, these data demonstrate that the S. mitis and S. pseudopneumoniae orthologues are produced and functional in pneumococcus. Thus ComW functionality is retained across closely related species.

ComW Sa protein was not readily detected in Western blots and, in stark contrast to other orthologues, chimeras expressing comW Sa transformed at levels 10,000-fold below WT cells, a phenotype like that of ΔcomW cells (Fig. 8, B and C). This suggests that ComW Sa is not stably produced in pneumococcal cells.
In contrast, ComW So was detectable, albeit at significantly decreased levels compared with WT, yet chimeras expressing comW So transformed at 41% efficiency (Fig. 8, B and C). This result is consistent with the observation that decreases in the levels of some ComW variants do not always predict equal decreases in transformability. It is possible that the specificity of the pneumococcal ComW antibody hinders detection of the more divergent ComW Sa and ComW So alleles. However, combined with their decreased transformation efficiencies, these data suggest that ComW Sa and ComW So are not stable and/or fully functional in the pneumococcal competence system. Therefore, although ComW proteins are similar across species, the transformation and Western blotting data suggest that there are species-specific determinants that are important for ComW stability and function.

Discussion
Since ComW's identification, there have been limited advances in our understanding of its function. Although a genetic link to the shift from σA- to σX-dependent transcription has been established (36, 37), comW had not been probed for mutations that disrupt pneumococcal transformation. Furthermore, direct biochemical characterization of ComW was stalled by a lack of soluble protein. Bioinformatic tools, like Robetta, Rosetta, and the DALI server (49,53,58), have laid a foundation for deeper analysis of ComW function.
The ab initio model of ComW described here depicts a globular protein, α-helical in structure, with a DNA-binding-like fold. Combined predictive analysis of ComW's surface characteristics using the APBS (56) and Consurf (55) servers identified two opposing faces of ComW: one conserved, electro-positively charged face, and one nonconserved, electro-negatively charged face. In vitro characterization of ComWΔ6 (12.4 kDa, Δ 73 RGFISC 78 ), a soluble variant with α-helical structure as predicted by the model (Fig. 3), shows that the protein dimerizes in solution (Fig. 4). Importantly, we report for the first time that ComWΔ6 binds to DNA independently of DNA sequence (Fig. 6), an interaction that depends on residues that constitute the conserved, electro-positively charged face (Fig. 7). Additionally, conserved residues on both faces of ComW are important for pneumococcal transformation (Fig. 8). Interestingly, point mutations to ComW's nonconserved, electro-negative face appear to destabilize the protein more than point mutations to ComW's conserved, electro-positive face. Because the poor DNA-binding mutants ComW L42A and ComW R44A are stably produced yet transform poorly, ComW-DNA interactions appear to be important for transformation. Thus, we have begun to fill the gaps in understanding ComW's function as a regulator of σX activity during natural transformation.
Pneumococcal ComW and σX are protein partners that promote transformation-specific gene transcription (33, 35-37). SigX is a member of the σ70 family of proteins (10), most similar to Group 4 σ factors, the Extra-cytoplasmic Function (ECF) σ factor family. ECF σ factors are small, as they are composed of only the σ2 and σ4 domains essential for E binding, promoter recognition, and promoter melting (39). Our ComW model resembles the structure of the σ2 domain of E. coli's primary σ factor, σ70 (59), and the structures of some alternative σ factors, like B. subtilis' σW (67), and E. coli's σE (61,68). In addition, ComWΔ6 interacts with DNA in EMSAs. This is an activity that has been observed with both Gram-negative and Gram-positive σ factors, including the structurally homologous σJ of M. tuberculosis (69), PG0162 of Porphyromonas gingivalis (70), σB of Staphylococcus aureus (71), and σE of Vibrio alginolyticus (72). In concert with the requirement of ComW for robust σX-dependent transcription, we predict that ComW is an active member of the EσX complex during pneumococcal late-competence gene transcription.
Amino acids that are important for σ factor-DNA contact have been identified in structures of multiple σ factors or σ factor domains in complex with DNA. Conserved aromatic and basic residues in the σ2 domain act in promoter melting and nonsequence-specific DNA binding (63, 73-78). The ComW model has one conserved surface that is composed of motifs 38 LIYYLVR 44 and 56 YHYRVAYRLY 65 . Many of these aromatic and charged residues align with amino acids that are important for E. coli's σE-DNA contact during promoter recognition and melting (61). Furthermore, we determined that two pneumococcal ComW residues on the conserved face, L42 and R44, are important for nonspecific DNA binding, and that ComW L42A and ComW R44A are stably produced in pneumococcus but disrupt transformation. These results are in agreement with current understanding of the types of amino acids that σ factors utilize for DNA interaction, and support a direct role for ComW in transcription activation at σX promoters. We expect that additional residues within these motifs aid in promoter recognition and DNA binding.
Although E alone can bind to DNA and promote transcription (64,79,80), σ factors interact with E to specify transcriptional targets (13). SigX does not require ComW for interaction with E (unpublished data), and EσX can transcribe from combox promoters in vitro, in the absence of ComW (12). However, these assays did not determine EσX promoter specificity in the presence of σA or σA-dependent promoters. Hence it will be important to determine if ComW alters the affinity of σX for E, and if/how ComW affects EσX transcription in these more competitive contexts. ComW's resemblance to σ2 domains is peculiar because this domain also functions in DNA sequence-specific promoter recognition (63,64). However, ComW interacts with DNA nonspecifically. Thus, it is possible that EσX promoter specificity is determined by σX and that ComW mediates EσX-DNA interactions via binding to the DNA phosphate backbone, or in another manner that does not depend on DNA sequence.
Sigma factor activity is often controlled via direct interaction with a regulatory protein. Canonically, anti-σ factors, many of which are membrane proteins, sequester ECF σ factors, inhibiting Eσ formation. Specific cues activate regulated intramembrane proteolysis (RIP) cascades, leading to the release of the ECF σ (81-84). An anti-σX protein has not been identified (37), making it unlikely that ComW-mediated σX regulation occurs via alleviation of a sequestration mechanism.
Noncanonical σ factor control via interaction with a small protein has been observed in both Gram-negative and Gram-positive organisms. In Gram-negative organisms, like E. coli, Crl (~15.6 kDa) binds to the stress response-specific σS and to E, to promote formation of the EσS complex (40,85). In the absence of Crl, EσS fails to form stable complexes, and transcription from σS promoters is decreased (85). Because we have observed EσX formation in competent pneumococcal ΔcomW cells (48), the structure of Crl (composed of an α+β fold (86)) vastly differs from the ComW model, and no Crl-DNA interactions have been reported, it is unlikely that ComW shares an equivalent σ factor-activating mechanism with Crl.
In the Gram-positive bacterium, B. subtilis, YvrI and YvrHa are two small proteins that promote transcription of genes required in acidic conditions (87). Activation of gene transcription by the σ factor-like protein, YvrI, requires binding of its N terminus to the small co-regulator, YvrHa. In this system, YvrI, which is most similar to a σ4 domain, interacts with the β-flap domain of E, an interaction that is not dependent on YvrHa, and determines the promoter specificity of E YvrI . YvrHa interacts with the β′ subunit of RNAP, a conserved interaction for the σ2 domain, and aids in open complex stabilization. As these independent proteins co-purify with RNAP from B. subtilis, YvrI and YvrHa likely function in vivo as a unified σ factor (46).
ComW and σX weakly interact in yeast-two-hybrid assays (37), but a direct interaction between these protein partners has not been demonstrated in competent pneumococcal cells or in vitro. Yet σX is destabilized in ΔcomW cells, or cells with N- or C-terminally tagged ComW (35), suggesting that the σX-ComW interaction is important for stability of both proteins. ComWΔ6 levels are decreased compared with WT, resulting in decreased transformation. In addition, point mutations to ComW's electro-negative surface result in decreased transformation and decreased protein levels near peak competence. Thus it is possible that the C-terminal domain and/or the electro-negative surface of ComW mediate interaction with σX. Interestingly, point mutations to ComW's DNA binding surface, specifically residues L42 and R44, disrupt transformation but produce stable protein, so it is likely that interaction with σX is not mediated by this surface. It will therefore be valuable to determine σX protein levels in all ComW mutants in order to identify residues important for σX stabilization.
The σX and ComW pair shares some similarity with B. subtilis' YvrI-YvrHa. At the protein level, σX and YvrI are ~23 kDa, and ComW and YvrHa are ~10 kDa. SigX certainly functions as a σ factor (12), and ComW's predicted similarity to σ2 domains suggests that it too can interact with E. Investigation into endogenous ComW protein-protein interactions is required to determine how it functions in the context of the EσX complex at competence onset.
Orthologs of comW have been identified in anginosus and mitis group streptococci, and in the Gram-positive coccus, P. lacrimalis. All known ComWs are similar in primary structure, and Robetta models of orthologs from S. anginosus and S. pseudopneumoniae are similar to that of pneumococcus. However, our findings that these orthologs differ in their ability to complement pneumococcal ComW during transformation and the observed differences in protein stability hint that species-specific determinants exist that dictate ComW function in vivo. As subunits of E are highly conserved among Gram-positive and Gram-negative organisms (47,64), it is likely that molecular interactions with species-specific σX are more important for ComW function. Like ComW, the σX primary sequence is highly similar across species, but does vary. Furthermore, streptococci of different species groups utilize different quorum sensing systems to activate production of σX, and ComW seems to be unique to only one of these systems (ComCDE). If streptococci have evolved multiple mechanisms to regulate competence, the σX-ComW system provides a unique opportunity for deep study into the co-evolution of transcriptional regulators.
Lastly, although specific affinities of pneumococcal σA and σX for E have not yet been determined, σA mutations that are predicted to disrupt EσA formation (mutations in the σ2 and σ4 domains) bypass the ComW requirement (36,37). Thus, competition for E between σA and σX at the onset of competence is likely a key determinant of transformation efficiency. It is conceivable that ComW helps to tip the competition in favor of EσX formation over that of EσA. Previous work demonstrated that replacement of pneumococcal σA with a chimeric σA, which included S. mutans' regions 2 and 4, partially restored ΔcomW phenotypes (unpublished, Tovpeko Thesis, 2016). Recall that S. mutans are naturally transformable streptococci that do not produce ComW during competence. This is additional evidence that some species-specific determinants exist and are important factors in streptococcal transformation. More broadly, species of the bovis, pyogenic, salivarius, and suis streptococcal groups likely evolved ComW-independent mechanisms to regulate σX activity, and consequently, natural transformation.
The pneumococcal competence-specific ComW and σX regulators provide a unique framework to study multiple molecular phenomena. ComW, predicted as structurally similar to σ-factors, and σX, the only known σ-factor in streptococci, work together to promote a robust change in pneumococcal transcription during competence. They have been shown to directly interact with DNA and/or RNA polymerase, and both are required for efficient pneumococcal transformation. Taken together, these observations suggest that ComW and σX function together, perhaps, for example, as a competence-specific two-part σ factor. Therefore ComW and σX provide an opportunity to study σ factor function, regulation, and holoenzyme assembly. ComW and σX also offer an unambiguous context to study how independent proteins function together to promote gene expression. Lastly, further investigation into the ComW-σX system will add insight into a mechanism used to control natural genetic transformation in an important human pathogen. This knowledge will add to the growing number of mechanisms that bacteria use to promote horizontal gene transfer. A deeper understanding of the mechanisms bacteria use to scavenge evolutionarily favorable genes will increase our ability to combat their rapid adaptation to antibiotics and bacteria-specific vaccines.

Bacterial strains and culture media
The bacterial strains used in this study are listed in Table S1. CP2137, a Δcps ΔcomA ΔssbB::pEVP3::ssbB derivative of strain Rx1 (10,36,88), was used as the wildtype (WT) standard for transformation assays. CP2137 does not secrete endogenous competence stimulating peptide (CSP) due to deletion of the CSP exporter, ComA. All comW mutations were placed in the CP2137 background (see below). CP2463, a ΔcomW::kan derivative of CP2137, was used as the ΔcomW standard for transformation assays (36). All pneumococcal strains were cultured in CAT medium and plated on CAT or THY supplemented with 1.5% agar and selective antibiotic, as needed. CAT medium was prepared from 5 g of tryptone (Difco Laboratories), 10 g of enzymatic casein hydrolysate (Sigma), 1 g of yeast extract (Difco), and 5 g of NaCl (Fisher Scientific) in 1 liter of H2O, sterilized for 40 min at 121°C, and then supplemented to 0.2% glucose and 0.016 M K2HPO4 before use. THY was prepared from 10 g of yeast extract (Difco) and 30 g of Todd Hewitt Broth (Difco) in 1 liter of H2O and sterilized for 20 min at 121°C. Novobiocin was used in pneumococcal transformation assays at 2.5 µg/ml.
E. coli strains DH5α and BL21(DE3) were hosts for plasmid isolation and protein expression, respectively. For plasmid introduction, E. coli strains were chemically transformed according to (89). E. coli strains were cultured in lysogeny broth (LB) (90). LB was prepared from 5 g of bacto tryptone (Difco), 5 g of NaCl (Fisher Scientific), and 2.5 g of yeast extract (Difco) in 1 liter of H2O, sterilized for 20 min at 121°C, and supplemented with appropriate antibiotics and 1.5% agar, as needed. Ampicillin was used at 100 µg/ml for growth of E. coli strains. Antibiotics were purchased from Sigma.

Computational modeling
The ab initio structural models of pneumococcal ComW were calculated using the Robetta server (53), and with a local installation of Rosetta (49). The Robetta server generated 3-residue and 9-residue fragment files that were used as input for the Rosetta calculation. For the Rosetta calculation, radius of gyration, contact-order, and sheet filters were used. Helix and loop structures were weighted equally and a fast relax protocol was performed.
Clustering of the 8,000 most energetically favorable models was done using Rosetta's cluster application with automatic radius detection. The Global Distance Test (Rosetta specific gdtmm) was used to cluster the models. Five clusters, each with nine models, were generated. One model from the five clusters was selected as representative of pneumococcal ComW's structure, based on Rosetta energy score, and cluster agreement as determined by rmsd between superimposed structures.
Site-directed mutagenesis of pNLI37 was used to generate point mutations in pneumococcal comW. pNLI37 was used as a template for PCR amplification by Platinum Pfx Polymerase (Thermo Fisher Scientific). Mutation-specific primers (Table S3) were used in 50-µl reactions, with PCR conditions optimized for each primer set (individual conditions not given). PCR products were digested with DpnI (New England Biolabs) for 2 h at 37°C. To circularize plasmids, a modified Seamless Ligation Cloning Extract (SLiCE) protocol (91) was used. Briefly, 8 µl of DpnI-digested PCR product was mixed with 1.5 µl of SLiCE reaction buffer and 1.5 µl of SLiCE cell extract, and brought to 15 µl volume with H2O. Reaction mixtures were incubated at 37°C for 15 min in a thermocycler. SLiCE reaction products (3 µl) were transformed into 50 µl of DH5α cells (Invitrogen), and then plated on LB agar with 100 µg/ml ampicillin. Single colonies were picked and cultured in LB medium overnight at 37°C. Plasmids were isolated using the ZymoPURE Plasmid Miniprep Kit (Zymo Research) and concentration was measured using a ND-1000 Spectrophotometer (Nanodrop). Mutations were confirmed by Sanger DNA sequencing (University of Illinois at Chicago Sequencing Core (UICSQC)).
Gibson assembly (92) of pET22b+ containing comW point mutants was achieved by use of the HiFi Assembly Mix (New England Biolabs). Briefly, comW point mutants from plasmids pNLI80, pNLI87, pNLI88, pNLI89, pNLI90, pNLI91, and pNLI100 were PCR amplified using primer pair NL226 and NL233, which deleted residues 73 RGFISC 78 of comW. The PCR products contained 12-base overlaps at their 5′ and 3′ ends with the NLI60 vector backbone, which contained a C-terminal V5 epitope-6His tag (V5H6) followed by a stop codon. The vector backbone was PCR amplified using primer pair NL190 and NL232. PCR products were purified using the DNA Clean & Concentrator-5 kit (Zymo Research) and DNA concentrations were measured using a Nanodrop spectrophotometer. DNA size and concentration were used to estimate the picomolar concentration of each clean PCR product. Vectors and inserts were mixed at a 1:2 molar ratio, with up to 0.2 pmol of DNA in 10-µl reactions, and incubated for 15 min at 50°C in a thermocycler. Assembly products (3 µl) were transformed into DH5α cells and plated on LB-agar plates with ampicillin overnight at 37°C. Single colonies were picked and cultured in LB medium overnight at 37°C. Plasmids were isolated and sequences confirmed with Sanger sequencing (UICSQC). The plasmids were transformed into BL21(DE3) (Invitrogen) cells and plated on LB-agar supplemented with 100 µg/ml ampicillin, overnight at 30°C.
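The conversion from DNA size and concentration to picomoles described above can be sketched as follows. This is a minimal illustration rather than the calculation actually used in this study: it assumes the common approximation of 650 g/mol per base pair of double-stranded DNA, and the function names and the fragment sizes in the usage example are hypothetical.

```python
# Average molar mass of one base pair of double-stranded DNA (g/mol).
# This is a widely used approximation, assumed here for illustration.
AVG_BP_MASS = 650.0


def dsdna_pmol(mass_ng: float, length_bp: int) -> float:
    """Convert a dsDNA mass (ng) and fragment length (bp) to picomoles."""
    if length_bp <= 0:
        raise ValueError("length_bp must be positive")
    # ng -> g is 1e-9, mol -> pmol is 1e12, so the net factor is 1e3.
    return mass_ng * 1000.0 / (length_bp * AVG_BP_MASS)


def insert_mass_ng(vector_ng: float, vector_bp: int, insert_bp: int,
                   insert_per_vector: float = 2.0) -> float:
    """Mass of insert (ng) that gives the requested insert:vector molar ratio."""
    vector_pmol = dsdna_pmol(vector_ng, vector_bp)
    insert_pmol = vector_pmol * insert_per_vector
    return insert_pmol * insert_bp * AVG_BP_MASS / 1000.0


if __name__ == "__main__":
    # Hypothetical fragment sizes: a 5,000-bp backbone and a 250-bp insert.
    vector_pmol = dsdna_pmol(50.0, 5000)
    needed = insert_mass_ng(50.0, 5000, 250, insert_per_vector=2.0)
    print(f"vector: {vector_pmol:.4f} pmol, insert needed: {needed:.2f} ng")
```

For a 1:2 vector-to-insert molar ratio, the required insert mass scales with the ratio of fragment lengths, so short inserts need far less mass than the backbone. The total picomoles in a reaction can be checked against the 0.2-pmol ceiling by summing `dsdna_pmol` over all fragments.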

Design and construction of DNA donors for pneumococcal comW gene replacement, and of the novobiocin resistance gene cassettes
Strains, plasmids, and primers are listed in supporting information (Tables S1-S3). Strains CP2800, CP2803, and CP2805 were generated using restriction digestion and ligation. The flanking upstream (primers NL106 and NL107) and downstream (NL102 and NL103) regions of pneumococcal comW were PCR amplified with primers containing 5′ BtsI restriction sites using 5 ng of CP2137 genomic DNA, 1 µl of Phire Hot Start II DNA Polymerase (Thermo Fisher Scientific), and 0.2 mM dNTP mix (Thermo Fisher Scientific) in a 50-µl reaction in a thermocycler. The comWΔ6 gene variant was PCR amplified using primer pair NL104 and NL105, Phire Hot Start II DNA Polymerase, 5 ng of pNLI60, and 0.2 mM dNTP mix in a 50-µl reaction, generating a fragment with 5′ and 3′ BtsI cut sites. The genomic upstream flank that was ligated to comWΔ6 was generated using primer pair NL101 and NL106, 50 ng of CP2137 genomic DNA, and the same PCR conditions as primer pair NL106 and NL107. The E19A and Y41A mutations in comW were PCR amplified from plasmids pNLI88 and pNLI80, respectively, using primer pair NL105 and NL108, in the same PCR conditions as the comWΔ6 variant.
To link cassette pieces, 10 µl of the flanking arms and comW variant PCR reactions were digested with BtsI (New England Biolabs) overnight at 37°C in 30-µl reactions, to generate 2-bp overhangs on the ends of each molecule. Digested DNA products were purified using Zymo Research kits and the DNA concentration was measured using a Nanodrop. A 1:1 flank to gene fragment ratio was used in 30-µl ligation reactions overnight at 16°C in a thermocycler. Ligation products were transformed into pneumococcus for homologous recombination (see below).
Strains CP2801-CP2802, CP2804, CP2806-CP2807, and CP2808-CP2813 were obtained using Gibson assembly (92) with NEBuilder HiFi Assembly Mix (New England Biolabs). For assembly of pneumococcal comW point mutant cassettes, 5 ng of CP2137 genomic DNA was used in PCR with 0.1 units of Phusion High-Fidelity DNA polymerase (Thermo Fisher Scientific), 1 mM MgCl2, 0.4 mM dNTP mix, and primers NL103 and NL177 to amplify a 3,327-bp comW downstream flank. Pneumococcal comW mutants were PCR amplified using 5 ng of specific plasmids with primer pair NL175 and NL176, containing 12-bp overlapping sequence with the upstream and downstream comW flanks, in the PCR conditions mentioned above. 50 ng of CP2137 genomic DNA with primers NL179 and NL180 was used to amplify a 2,301-bp comW upstream flank, in the PCR conditions mentioned above without the use of MgCl2.
For assembly of comW orthologous gene cassettes, the flanking upstream (primers NL153 and NL154) and downstream (primers NL155 and NL156) regions of pneumococcal comW were PCR amplified using 50 ng of CP2137 genomic DNA, 0.02 units of Phusion High-Fidelity DNA Polymerase, 0.4 mM dNTP mix, and 10 mM MgCl2 in a 50-µl reaction in a thermocycler. Orthologous comW genes were amplified from plasmids (Table S2) with gene-specific primers (Table S3) using 5 ng of plasmid DNA, 0.02 units of Phusion High-Fidelity DNA polymerase, and 0.4 mM dNTP mix in 50-µl reactions, in a thermocycler. All PCR products were purified using Zymo kits. A Nanodrop was used to determine the concentration. Concentrations and DNA fragment sizes were used to calculate the number of picomole ends. Flanking arms and comW variant fragments were mixed in a 6:1 molar ratio and incubated in 10-µl reactions with HiFi assembly mix at 50°C for 20 min in a thermocycler. 1 µl of each assembly product was used in subsequent PCR with primers DAM497 and DAM500 for amplification to generate gene cassette products of 2,200 bp. PCR products were purified and sequenced at the UICSQC.
The 7.4-kb gyrB novobiocin resistance marker was prepared by amplification with primers YT76 and YT77, and 10 ng of either CM6 or CP1500 genomic DNA as the template. PCR amplification was performed in 50-µl reactions with 1 µl of Phire Hot Start II polymerase, Phire reaction buffer, and 0.2 mM dNTP. The concentration of the purified product was measured in a ND-1000 spectrophotometer.

Construction of comW mutant strains
CP2137 cells were cultured in 10 ml of CAT medium to an OD 550 of 0.05. For transformation, 100 µl of cells were mixed with 100 ng of CSP, 0.04% BSA, 0.001 M CaCl2, and 100 ng of gene replacement cassettes made by Gibson assembly, or 300 ng of gene replacement cassettes made by restriction digestion and ligation. Transformation reactions were brought to a total volume of 1 ml with CAT, incubated with CSP and DNA for 2 h at 37°C, and then chilled. Serial dilutions of each reaction were plated on CAT agar and grown overnight at 37°C. Single colonies were picked and cultured overnight in CAT supplemented with 0.016 M K2HPO4 at 37°C. Overnight cultures were diluted into 10 ml of CAT supplemented with 0.016 M K2HPO4 and 0.2% glucose, and grown at 37°C to an OD 550 of 0.2. Cells were diluted 1:5 in H2O and used in PCR reactions to amplify the comW region with primers DAM145 and DAM146. PCR products were cleaned and sequenced at the UICSQC.

Pneumococcal transformation assays
Aliquots of frozen pneumococcal strain stocks were diluted 1:100 in 10 ml of CAT medium and cultured at 37°C for 5 h until their OD 550 measured between 0.2 and 0.4. Each culture was diluted in CAT to an OD 500 of 0.05, and 10 µl of cells were incubated at 37°C for 80 min with 100 ng of CSP, 0.04% BSA, 0.001 M CaCl2, and 100 ng of novobiocin resistance cassette in 1 ml of CAT medium. Transformation reactions were then diluted 1:100 in THY medium and incubated at 37°C for an additional 80 min, and then held on ice prior to serial dilution. Diluted cells were plated on THY agar with 2.5 µg/ml novobiocin and incubated overnight at 37°C. Colonies were counted and the transformation efficiency was determined as the ratio of transformants to total colony-forming units (CFU).
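The efficiency calculation above (transformants divided by total CFU) can be sketched as follows. This is a minimal sketch under stated assumptions: the function names, dilution factors, plated volumes, and colony counts are hypothetical illustrations and are not values from this study.

```python
def cfu_per_ml(colonies: int, dilution_factor: float, plated_ml: float) -> float:
    """Back-calculate CFU/ml of the undiluted reaction from one plate count.

    dilution_factor is the total fold-dilution before plating (e.g. 1e4).
    """
    if plated_ml <= 0:
        raise ValueError("plated_ml must be positive")
    return colonies * dilution_factor / plated_ml


def transformation_efficiency(transformant_cfu: float, total_cfu: float) -> float:
    """Fraction of viable cells that acquired the selectable marker."""
    if total_cfu <= 0:
        raise ValueError("total_cfu must be positive")
    return transformant_cfu / total_cfu


if __name__ == "__main__":
    # Hypothetical counts: selective (novobiocin) and nonselective plates.
    transformants = cfu_per_ml(colonies=132, dilution_factor=1e4, plated_ml=0.1)
    total = cfu_per_ml(colonies=200, dilution_factor=1e4, plated_ml=0.1)
    efficiency = transformation_efficiency(transformants, total)
    print(f"transformation efficiency: {efficiency:.1%}")
```

Normalizing to total CFU rather than to input cell number keeps the comparison between strains independent of differences in viability or plating density.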

ComW⌬6 purification
E. coli transformed with a truncated variant of comW (ComWΔ6) in the pET22b+ vector were cultured in 3 L of LB medium at 37°C, 200 rpm to an OD 600 of 0.5; then protein expression was induced by addition of 1 mM isopropyl β-D-1-thiogalactopyranoside (IPTG) (Gold Biotechnology). Induced cultures were incubated overnight at 20°C, 200 rpm. The next day, the OD of each culture was taken using a spectrophotometer, and cells were collected at 5,000 rpm, 4°C, for 30 min. Cells were resuspended in 50 ml of E. coli resuspension buffer (50 mM Tris-HCl, pH 8.0, 500 mM NaCl) supplemented with 25 mM imidazole, 50 mM MgCl2, 100 µg/ml DNase, and a protease inhibitor tablet (Roche Applied Science). The cell resuspension was lysed using an EmulsiFlex. The soluble fraction was separated by centrifugation at 30,000 × g for 30 min at 4°C.

The soluble supernatant was passed through a 5-ml nickel-nitrilotriacetic acid resin (Thermo Fisher Scientific) on a column equilibrated with Column Buffer (50 mM Tris-HCl, pH 8.0, 500 mM NaCl, 10% glycerol) via gravity. The column was washed with 500 ml of Column Buffer supplemented with 5 mM β-mercaptoethanol (βME) and 40 mM imidazole. ComWΔ6 was eluted from the column with 10 ml of Column Buffer supplemented with 5 mM βME and 275 mM imidazole. The ComWΔ6 eluate was dialyzed into 50 mM Tris-HCl, pH 8.0, 500 mM NaCl, 1 mM EDTA, 10% glycerol, and 1 mM βME. Protein concentration was measured using a Nanodrop.

Nano differential scanning fluorimetry
After purification, ComWΔ6 was diluted to 0.5 mg/ml in ten different biological buffers (Fig. S2). Diluted ComWΔ6 was loaded into 10-µl capillaries in duplicate. Each capillary was loaded into a Prometheus NT.48 instrument (NanoTemper Technologies Inc). Samples were heated from 20°C to 95°C with a temperature ramp of 1.0°C/min. Data were analyzed with the PR.ThermControl software (NanoTemper Technologies Inc).

Analytical ultra-centrifugation
Purified ComWΔ6 was diluted to an absorbance (280 nm) of 1.5 (0.92 mg/ml), 1.0 (0.61 mg/ml), and 0.5 (0.31 mg/ml) in cold 50 mM Tris-HCl, pH 8.0, 500 mM NaCl, 1 mM EDTA, and 1 mM βME. Adjacent channels of analytical ultracentrifugation cells were separately loaded with 400 µl of ComWΔ6 sample or 420 µl of buffer and sealed under pressure. The samples were centrifuged at 40,000 rpm in a Beckman Coulter XL-I at 10°C for 24 h. Scans were taken every minute. The raw sedimentation data were analyzed using a continuous distribution model with SEDFIT and then analyzed with a Bayesian model.

Circular dichroism (CD)
Purified ComWΔ6 variants were diluted to 0.02 mg/ml in 30 mM sodium fluoride (NaF) in a 0.2-cm path length quartz cell. Their structure was analyzed at 25°C from 260 nm to 190 nm using a JASCO J-710 instrument. Deconvolution of the data was performed with Dichroweb using the Contin-LL method (93-95).

Electrophoretic mobility shift assay
EMSA probes were generated using PCR amplification. For amplification of S. pneumoniae's combox promoter region, upstream of ssbB, 50 ng of CP2137 genomic DNA was used as a template for amplification with primers NL223 (6′FAM) and NL224 for labeled probe, and NL228 and NL224 for unlabeled probe. For amplification of the S. mutans ComR-box, 1 µl of S. mutans cells diluted 1:5 in H2O was used as a template for amplification with primers NE25 (6′FAM) and NE27 for labeled probe, and NE26 and NE27 for unlabeled probe. Probes were PCR purified using Zymo kits and concentration was measured using a Nanodrop.
Purified protein at varied concentrations and 20 nM of DNA probe were mixed together in 5× EMSA reaction buffer (250 mM Tris-HCl, pH 8.0, 1.5 M NaCl, 5 mM MgCl2, 5 mM EDTA), supplemented with 50 mM βME, and the volume of each reaction was brought to 20 µl with ComWΔ6 dialysis buffer plus 30% glycerol. After incubation at 25°C for 30 min in a thermocycler, wells of a 5% polyacrylamide, nondenaturing gel were loaded with 10 µl of each EMSA reaction and run at 90 V for 4 h at 4°C in 1× TBE. Gels were removed from casting plates and imaged on a Typhoon 3000 (GE Healthcare). Band intensities were measured using Image Studio Lite software (LI-COR Biosciences), and the amount of probe shifted by protein was determined as (bound probe/total probe) for each lane. The gels shown are representative of three biological replicates, and statistical significance was determined using an unpaired Student's t test.
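The per-lane quantification and the unpaired comparison described above can be sketched as follows. This is a minimal sketch, not the analysis pipeline used in this study: it assumes that total probe equals the sum of the bound and free band intensities, uses the pooled-variance (equal variance) form of Student's t statistic, and all function names and intensity values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def fraction_bound(bound_intensity: float, free_intensity: float) -> float:
    """Fraction of probe shifted in one lane: bound / (bound + free)."""
    total = bound_intensity + free_intensity
    if total <= 0:
        raise ValueError("lane has no measurable probe signal")
    return bound_intensity / total


def student_t(sample_a: list, sample_b: list) -> tuple:
    """Unpaired Student's t statistic (pooled variance) and degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two replicates")
    dof = n_a + n_b - 2
    pooled = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / dof
    t_stat = (mean(sample_a) - mean(sample_b)) / sqrt(pooled * (1 / n_a + 1 / n_b))
    return t_stat, dof


if __name__ == "__main__":
    # Hypothetical (bound, free) band intensities from three replicate gels.
    wild_type = [fraction_bound(b, f) for b, f in [(80, 20), (90, 10), (85, 15)]]
    mutant = [fraction_bound(b, f) for b, f in [(20, 80), (30, 70), (25, 75)]]
    t_stat, dof = student_t(wild_type, mutant)
    print(f"t = {t_stat:.2f} with {dof} degrees of freedom")
```

The sketch stops at the t statistic and degrees of freedom; converting these to a p value requires the t distribution, which is available in statistical packages rather than in the Python standard library.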

Western blotting
Pneumococcal cells were cultured in 10 ml of CAT medium for 4 h, diluted to an OD550 = 0.2, and induced to competence by addition of 100 ng of CSP, 0.04% BSA, and 0.001 M CaCl2 for 17 min, and then chilled. A 10-µl sample of each culture was removed for plating on CAT agar and incubated overnight at 37°C to determine CFU. The remaining cells were collected by centrifugation at 4,000 rpm for 30 min at 4°C. After removal of the supernatant, cell pellets were resuspended in pneumococcal wash saline (0.01 M Tris-HCl, pH 8.0, 0.15 M NaCl, and 0.01 M EDTA), and 0.1-mm beads were added to each tube for lysis by bead beating in the cold for one 5-min interval, followed by one 4-min interval. The lysates were centrifuged for 10 s, and 180 µl of clarified cell lysate was moved to a fresh 1.5-ml microcentrifuge tube with 60 µl of 4× Laemmli buffer (Bio-Rad). Cell samples were incubated at 95°C for 3 min and stored at −20°C.
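The plated CFU count and the 180 µl : 60 µl lysate-to-Laemmli mix together determine how many cell equivalents each microliter of sample carries. The sketch below shows one way that bookkeeping could be done. It assumes complete recovery of cells through centrifugation and lysis. The CFU titer, harvested culture volume, resuspension volume, and target cell number in the example are hypothetical inputs, because the text does not state how loading volumes were calculated.

```python
LYSATE_UL = 180.0   # clarified lysate per sample (from the protocol)
LAEMMLI_UL = 60.0   # 4x Laemmli buffer per sample (from the protocol)


def laemmli_final_strength(stock_x: float = 4.0) -> float:
    """Final Laemmli strength after mixing lysate and buffer."""
    return stock_x * LAEMMLI_UL / (LYSATE_UL + LAEMMLI_UL)


def load_volume_ul(cfu_per_ml: float, harvested_ml: float,
                   resuspension_ul: float, target_cells: float) -> float:
    """Sample volume (µl) containing `target_cells` cell equivalents.

    Assumption: every harvested cell is recovered and lysed.
    """
    cells_per_ul_lysate = cfu_per_ml * harvested_ml / resuspension_ul
    # Mixing with Laemmli buffer dilutes the lysate to 180/240 of its density.
    cells_per_ul_sample = cells_per_ul_lysate * LYSATE_UL / (LYSATE_UL + LAEMMLI_UL)
    return target_cells / cells_per_ul_sample


# Hypothetical example: 1e8 CFU/ml, 10 ml harvested, pellet resuspended in 200 µl.
volume = load_volume_ul(cfu_per_ml=1e8, harvested_ml=10.0,
                        resuspension_ul=200.0, target_cells=1e8)
print(f"Laemmli final strength: {laemmli_final_strength():.1f}x")
print(f"Load volume: {volume:.1f} µl")
```

The 3:1 mixing ratio brings the 4× buffer to 1×, which does not depend on any of the hypothetical inputs.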
For protein detection, the wells of a precast 4–20% SDS-PAGE gel (Bio-Rad; 8.6 × 6.7 cm, 0.1 cm thickness) were loaded with 1 × 10^8 cells of each sample. Gels were run at 200 V for 35 min. Proteins were then transferred for 2 h at 36 V to a PVDF membrane activated with cold methanol at 30 V at 4°C in 1× transfer buffer (25 mM Tris-HCl, pH 8.0, 192 mM glycine, 10% MeOH). Membranes were then quickly rinsed with 1× TBST (20 mM Tris-HCl, pH 7.6, 137 mM NaCl, 0.1% Tween 20). Membranes were blocked with 5% BSA in 0.1% TBST for 1 h at room temperature with rocking. Blots were then incubated with a primary rabbit antibody raised against pneumococcal ComW (48) (1:1000 in 1.0% BSA in 0.1% TBST), plus a primary mouse antibody raised against E. coli RNA polymerase β-subunit (Thermo Fisher Scientific) (1:3000 in 0.5% BSA in 0.1% TBST), overnight at 4°C with rocking. After three washes with 0.1% TBST for 5 min at room temperature, blots were simultaneously incubated with goat anti-rabbit and goat anti-mouse secondary antibodies (Sigma) diluted 1:10,000 in 1% BSA in 0.1% TBST for 1 h and 20 min at room temperature with rocking.

After one 10-min wash and two 5-min washes in 0.1% TBST at room temperature, blots were incubated with 10 ml of ECL substrate (Bio-Rad) for 5 min at room temperature with rocking. The wet blots were imaged using the LI-COR system. Image Studio Lite was used to quantify the signal intensity from endogenous ComW bands (9.5 kDa) and RNA polymerase (137 kDa). ComW signal intensities were normalized to the RNA polymerase signal from the same lane. The blots shown are representative of three biological replicates. Statistical significance was determined using an unpaired Student's t test.
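The loading-control normalization can be illustrated with a short sketch. The band intensities are hypothetical, and reporting a variant's mean level relative to the wild-type mean is an assumption made here for illustration. The text states only that ComW signal was normalized to the RNA polymerase signal.

```python
from statistics import mean


def normalized_comw(comw_signal: float, rnap_signal: float) -> float:
    """ComW band intensity divided by the RNA polymerase band in the same lane."""
    if rnap_signal <= 0:
        raise ValueError("RNA polymerase loading-control signal must be positive")
    return comw_signal / rnap_signal


# Hypothetical (ComW, RNA polymerase) intensities from three replicate blots.
wt_blots = [(1200.0, 4000.0), (1500.0, 5000.0), (900.0, 3000.0)]
variant_blots = [(600.0, 4000.0), (800.0, 5000.0), (420.0, 3000.0)]

wt = [normalized_comw(c, r) for c, r in wt_blots]
variant = [normalized_comw(c, r) for c, r in variant_blots]

# Assumption: express the variant's mean normalized level relative to wild type.
relative_level = mean(variant) / mean(wt)
print(f"Variant ComW level relative to WT: {relative_level:.2f}")
```

Dividing by the loading control within each lane corrects for differences in the amount of lysate loaded before replicates are compared.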