
The major subunit of widespread competence pili exhibits a novel and conserved type IV pilin fold

Open Access. Published: April 09, 2020. DOI: https://doi.org/10.1074/jbc.RA120.013316
      Type IV filaments (T4F), which are helical assemblies of type IV pilins, constitute a superfamily of filamentous nanomachines virtually ubiquitous in prokaryotes that mediate a wide variety of functions. The competence (Com) pilus is a widespread T4F, mediating DNA uptake (the first step in natural transformation) in bacteria with one membrane (monoderms), an important mechanism of horizontal gene transfer. Here, we report the results of genomic, phylogenetic, and structural analyses of ComGC, the major pilin subunit of Com pili. By performing a global comparative analysis, we show that Com pili genes are virtually ubiquitous in Bacilli, a major monoderm class of Firmicutes. This also revealed that ComGC displays extensive sequence conservation, defining a monophyletic group among type IV pilins. We further report ComGC solution structures from two naturally competent human pathogens, Streptococcus sanguinis (ComGCSS) and Streptococcus pneumoniae (ComGCSP), revealing that this pilin displays extensive structural conservation. Strikingly, ComGCSS and ComGCSP exhibit a novel type IV pilin fold that is purely helical. Results from homology modeling analyses suggest that the unusual structure of ComGC is compatible with helical filament assembly. Because ComGC displays such a widespread distribution, these results have implications for hundreds of monoderm species.

      Introduction

      Filamentous nanomachines composed of type IV pilins are virtually ubiquitous in Bacteria and Archaea (
      • Berry J.L.
      • Pelicic V.
      Exceptionally widespread nano-machines composed of type IV pilins: the prokaryotic Swiss Army knives.
      ), to which they confer a variety of unrelated functions including adhesion, motility, protein secretion, and DNA uptake. These type IV filaments (T4F) are assembled by conserved multiprotein machineries, which further underlines their phylogenetic relationship (
      • Denise R.
      • Abby S.S.
      • Rocha E.P.C.
      Diversification of the type IV filament superfamily into machines for adhesion, protein secretion, DNA uptake, and motility.
      ).
      The abbreviations used are: T4F, type IV filaments; T4P, type IV pili; Com, competence; UFBoot, ultrafast bootstrap; RDC, residual dipolar couplings; RMSD, root mean square deviation; PDB, Protein Data Bank; LB, Lysogeny Broth; IPTG, isopropyl β-d-1-thiogalactopyranoside; CM, cytoplasmic membrane; MSH, mannose-sensitive hemagglutinin pili; T2SS, type II secretion system; Ni-NTA, nickel-nitrilotriacetic acid; HSQC, heteronuclear single quantum coherence.
      Much of our current understanding of this superfamily of nanomachines comes from the study of type IV pili (T4P), the best characterized T4F (
      • Berry J.L.
      • Pelicic V.
      Exceptionally widespread nano-machines composed of type IV pilins: the prokaryotic Swiss Army knives.
      ). In brief, T4P are micrometer-long and thin surface-exposed filaments, which are polymers of type IV pilins. Type IV pilins (simply named pilins hereafter) are defined by an N-terminal sequence motif known as class III signal peptide (
      • Giltner C.L.
      • Nguyen Y.
      • Burrows L.L.
      Type IV pilin proteins: versatile molecular modules.
      ). This motif, IPR012902 entry in the InterPro database (
      • Jones P.
      • Binns D.
      • Chang H.Y.
      • Fraser M.
      • Li W.
      • McAnulla C.
      • McWilliam H.
      • Maslen J.
      • Mitchell A.
      • Nuka G.
      • Pesseat S.
      • Quinn A.F.
      • Sangrador-Vegas A.
      • Scheremetjew M.
      • Yong S.Y.
      • Lopez R.
      • Hunter S.
      InterProScan 5: genome-scale protein function classification.
      ), consists of a hydrophilic leader peptide ending with a tiny residue (Gly or Ala), followed by a tract of 21 mostly hydrophobic residues, except for a negatively charged Glu5. This hydrophobic tract represents the N-terminal portion (α1N) of an extended α-helix of ∼50 residues (α1), which is the universally conserved structural feature of type IV pilins (
      • Giltner C.L.
      • Nguyen Y.
      • Burrows L.L.
      Type IV pilin proteins: versatile molecular modules.
      ). Although some small pilins consist solely of this extended α-helix (
      • Reardon P.N.
      • Mueller K.T.
      Structure of the type IVa major pilin from the electrically conductive bacterial nanowires of Geobacter sulfurreducens.
      ), most pilins have a globular head consisting of the C-terminal half of α1 (α1C) packed against a β-sheet composed of several antiparallel β-strands, which gives them their typical “lollipop” 3D architecture (
      • Giltner C.L.
      • Nguyen Y.
      • Burrows L.L.
      Type IV pilin proteins: versatile molecular modules.
      ). Upon translocation of prepilins across the cytoplasmic membrane (CM), where they remain embedded via their protruding hydrophobic α1N, the leader peptide is processed by an integral membrane aspartic acid protease named prepilin peptidase (IPR000045) (
      • LaPointe C.F.
      • Taylor R.K.
      The type 4 prepilin peptidases comprise a novel family of aspartic acid proteases.
      ). Processing primes pilins for polymerization into filaments. Filament assembly, which remains incompletely understood, is mediated by a multiprotein machinery in the CM, centered on an integral membrane platform protein (IPR003004) and a cytoplasmic extension ATPase (IPR007831) (
      • Berry J.L.
      • Pelicic V.
      Exceptionally widespread nano-machines composed of type IV pilins: the prokaryotic Swiss Army knives.
      ). As revealed by recent cryo-EM structures of several T4P (
      • Kolappan S.
      • Coureuil M.
      • Yu X.
      • Nassif X.
      • Egelman E.H.
      • Craig L.
      Structure of the Neisseria meningitidis type IV pilus.
      ,
      • Wang F.
      • Coureuil M.
      • Osinski T.
      • Orlova A.
      • Altindal T.
      • Gesbert G.
      • Nassif X.
      • Egelman E.H.
      • Craig L.
      Cryoelectron microscopy reconstructions of the Pseudomonas aeruginosa and Neisseria gonorrhoeae type IV pili at sub-nanometer resolution.
      ), filaments are right-handed helical polymers where pilins are held together by extensive interactions between their α1 helices, which are partially melted and run approximately parallel to each other within the filament core.
      One of the key functional roles of T4F is their involvement in natural transformation in prokaryotes, the ability of species defined as “competent” to take up exogenous DNA across their membrane(s) and incorporate it stably into their genomes (
      • Dubnau D.
      • Blokesch M.
      Mechanisms of DNA uptake by naturally competent bacteria.
      ). This widespread property in bacteria (
      • Johnston C.
      • Martin B.
      • Fichant G.
      • Polard P.
      • Claverys J.P.
      Bacterial transformation: distribution, shared mechanisms and divergent control.
      ) is key for horizontal gene transfer, an important factor in bacterial evolution and the spread of antibiotic resistance. T4F are involved in the very first step of natural transformation, i.e. binding of free extracellular DNA and its translocation close to the CM (
      • Dubnau D.
      • Blokesch M.
      Mechanisms of DNA uptake by naturally competent bacteria.
      ). DNA is subsequently bound by the DNA receptor ComEA and further translocated across the CM through the ComEC channel (
      • Dubnau D.
      • Blokesch M.
      Mechanisms of DNA uptake by naturally competent bacteria.
      ). In competent diderm species, the T4F involved in DNA uptake is a subtype of T4P known as T4aP (
      • Berry J.L.
      • Gurung I.
      • Anonsen J.H.
      • Spielman I.
      • Harper E.
      • Hall A.M.J.
      • Goosens V.J.
      • Raynaud C.
      • Koomey M.
      • Biais N.
      • Matthews S.
      • Pelicic V.
      Global biochemical and structural analysis of the type IV pilus from the Gram-positive bacterium Streptococcus sanguinis.
      ), whose rapid depolymerization is powered by the retraction ATPase PilT (IPR006321), generating exceptionally large tensile forces (
      • Merz A.J.
      • So M.
      • Sheetz M.P.
      Pilus retraction powers bacterial twitching motility.
      ). In brief, T4aP bind DNA directly, via one of their major or minor (low abundance) pilin subunits (
      • Cehovin A.
      • Simpson P.J.
      • McDowell M.A.
      • Brown D.R.
      • Noschese R.
      • Pallett M.
      • Brady J.
      • Baldwin G.S.
      • Lea S.M.
      • Matthews S.J.
      • Pelicic V.
      Specific DNA recognition mediated by a type IV pilin.
      ), and then are retracted by PilT, bringing DNA to the ComEA receptor (
      • Ellison C.K.
      • Dalia T.N.
      • Vidal Ceballos A.
      • Wang J.C.
      • Biais N.
      • Brun Y.V.
      • Dalia A.B.
      Retraction of DNA-bound type IV competence pili initiates DNA uptake during natural transformation in Vibrio cholerae.
      ). In competent monoderm species, DNA uptake is mediated by a distinct T4F named the competence (Com) pilus (
      • Dubnau D.
      • Blokesch M.
      Mechanisms of DNA uptake by naturally competent bacteria.
      ), which is much less well characterized than T4P. Com pili are composed mainly of the major pilin ComGC (
      • Laurenceau R.
      • Pehau-Arnaudet G.
      • Baconnais S.
      • Gault J.
      • Malosse C.
      • Dujeancourt A.
      • Campo N.
      • Chamot-Rooke J.
      • Le Cam E.
      • Claverys J.P.
      • Fronzes R.
      A type IV pilus mediates DNA binding during natural transformation in Streptococcus pneumoniae.
      ,
      • Chen I.
      • Provvedi R.
      • Dubnau D.
      A macromolecular complex formed by a pilin-like protein in competent Bacillus subtilis.
      ), and are assembled by a simple machinery composed of four minor pilins (ComGD, ComGE, ComGF, and ComGG), a prepilin peptidase (ComC), an extension ATPase (ComGA) and a platform protein (ComGB) (
      • Chung Y.S.
      • Dubnau D.
      ComC is required for the processing and translocation of ComGC, a pilin-like competence protein of Bacillus subtilis.
      ,
      • Chung Y.S.
      • Dubnau D.
      All seven comG open reading frames are required for DNA binding during transformation of competent Bacillus subtilis.
      ). Filaments morphologically similar to T4aP, several micrometers in length and 60 Å in width, have been observed in Streptococcus pneumoniae (
      • Laurenceau R.
      • Pehau-Arnaudet G.
      • Baconnais S.
      • Gault J.
      • Malosse C.
      • Dujeancourt A.
      • Campo N.
      • Chamot-Rooke J.
      • Le Cam E.
      • Claverys J.P.
      • Fronzes R.
      A type IV pilus mediates DNA binding during natural transformation in Streptococcus pneumoniae.
      ,
      • Muschiol S.
      • Erlendsson S.
      • Aschtgen M.S.
      • Oliveira V.
      • Schmieder P.
      • de Lichtenberg C.
      • Teilum K.
      • Boesen T.
      • Akbey U.
      • Henriques-Normark B.
      Structure of the competence pilus major pilin ComGC in Streptococcus pneumoniae.
      ).
      How Com pili are assembled, bind DNA, and presumably retract in the absence of a PilT retraction motor is not understood. One important limitation is the absence of high-resolution structural information. Therefore, in the present study, we have focused on ComGC, the major subunit of the Com pilus. We report (i) a global comparative and phylogenetic analysis of ComGC, and (ii) 3D structures for two orthologs, ComGCSP from the model competent species S. pneumoniae and ComGCSS from Streptococcus sanguinis, a common cause of infective endocarditis in humans that has recently emerged as a monoderm model for the study of T4F. Finally, we discuss the general implications of these findings.

      Results

      Com pili genes are almost ubiquitous in monoderm Bacilli, including the T4F model S. sanguinis

      So far, Com pili have been studied mainly in two model competent species: Bacillus subtilis and S. pneumoniae. S. sanguinis is a naturally competent species that has recently emerged as a monoderm T4F model because it expresses retractable T4aP (
      • Pelicic V.
      Monoderm bacteria: the new frontier for type IV pilus biology.
      ). Functional analysis of S. sanguinis T4aP showed that they are dispensable for DNA uptake, which is instead mediated by Com pili, since competence was abolished in a ΔcomGB mutant (
      • Gurung I.
      • Spielman I.
      • Davies M.R.
      • Lala R.
      • Gaustad P.
      • Biais N.
      • Pelicic V.
      Functional analysis of an unusual type IV pilus in the Gram-positive Streptococcus sanguinis.
      ). A closer inspection of the S. sanguinis genome revealed that all the genes encoding the Com pilus are present. These genes are organized in two loci (Fig. 1A), comC and the comGABCDEFG operon, showing perfect synteny with the corresponding loci in model competent species (
      • Albano M.
      • Breitling R.
      • Dubnau D.A.
      Nucleotide sequence and genetic organization of the Bacillus subtilis comG operon.
      ,
      • Mohan S.
      • Aghion J.
      • Guillen N.
      • Dubnau D.
      Molecular cloning and characterization of comC, a late competence gene of Bacillus subtilis.
      ). Multiple sequence alignments of the corresponding proteins with their orthologs in B. subtilis and S. pneumoniae showed extensive conservation (Table S1). Detailed analysis of the N termini of the five ComG pilins identified clear class III signal peptides (Fig. 1B), i.e. short (8–15 residues), hydrophilic leader peptides ending with an Ala, followed by a tract of 21 mostly hydrophobic residues. ComGG is the only pilin that lacks the negatively charged Glu5; its noncanonical class III signal peptide (Fig. 1B) is recognized neither by InterPro nor by PilFind, a program dedicated to the prediction of type IV pilins (
      • Imam S.
      • Chen Z.
      • Roos D.S.
      • Pohlschröder M.
      Identification of surprisingly diverse type IV pili, across a broad range of Gram-positive bacteria.
      ). This is a conserved property of ComGG orthologs.
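The class III signal peptide features described above (a short hydrophilic leader ending in a tiny residue, followed by 21 mostly hydrophobic residues with Glu5) can be expressed as a simple sequence scan. The Python sketch below is a hypothetical heuristic for illustration only: it is not the InterPro (IPR012902) or PilFind method, and the hydrophobic residue set, the 8–15-residue leader window (taken from the S. sanguinis ComG pilins) and the 15-of-21 hydrophobic threshold are assumptions. Consistent with the text, a pilin lacking Glu5, such as ComGG, would be missed by this rule.

```python
# Hypothetical heuristic for class III signal peptides; NOT the InterPro
# (IPR012902) or PilFind algorithm.

# Assumed set of residues counted as hydrophobic (illustrative choice).
HYDROPHOBIC = set("AVILMFWYCG")


def find_class3_signal_peptide(seq, min_leader=8, max_leader=15, min_hydrophobic=15):
    """Return the leader-peptide length if a class III-like motif is found, else None.

    For each candidate leader length L, checks that:
      * residue -1 (last leader residue) is tiny (Gly or Ala);
      * position 5 of the mature protein is Glu;
      * the first 21 mature residues contain at least `min_hydrophobic`
        hydrophobic residues (assumed threshold).
    """
    seq = seq.upper()
    for leader_len in range(min_leader, max_leader + 1):
        if leader_len + 21 > len(seq):
            break  # not enough residues left for a 21-residue tract
        if seq[leader_len - 1] not in "GA":
            continue
        tract = seq[leader_len:leader_len + 21]
        if tract[4] != "E":  # Glu5 of the mature protein (1-based numbering)
            continue
        if sum(aa in HYDROPHOBIC for aa in tract) >= min_hydrophobic:
            return leader_len
    return None


# Usage with a made-up sequence (not a real ComG pilin):
leader = "MKNKKSQKA"
mature = "FTLIEVLVVIAIIAILAAVAL" + "PNQSGKTDYQ"
print(find_class3_signal_peptide(leader + mature))  # 9
```

Real predictors weigh these features probabilistically; a fixed rule like this one is only meant to make the motif definition concrete.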
      Figure 1. Com pilus machinery in S. sanguinis. A, genomic organization of the genes involved in the biogenesis of the Com pilus in S. sanguinis 2908. All the genes are drawn to scale, with the scale bar representing 500 bp. The functions of the corresponding proteins are listed at the bottom. B, sequence alignment of the putative N-terminal class III signal peptides of the five ComG pilins in S. sanguinis 2908. The 8–15-residue leader peptides, which contain a majority of hydrophilic (shaded in gray) or neutral (no shading) residues, end with a conserved Ala−1. Leader peptides are processed (indicated by the vertical arrow) by the prepilin peptidase ComC. The mature proteins start with a tract of 21 predominantly hydrophobic residues (shaded in black), which invariably form the protruding N-terminal portion of an extended α-helix that is the main assembly interface within filaments.
      We next determined the global distribution of the Com system in publicly available complete bacterial genomes using MacSyFinder (
      • Abby S.S.
      • Néron B.
      • Ménager H.
      • Touchon M.
      • Rocha E.P.
      MacSyFinder: a program to mine genomes for molecular systems with an application to CRISPR-Cas systems.
      ). Specifically, we used the MacSyFinder model built for the identification of Com systems (
      • Denise R.
      • Abby S.S.
      • Rocha E.P.C.
      Diversification of the type IV filament superfamily into machines for adhesion, protein secretion, DNA uptake, and motility.
      ), which takes into account the genetic composition and organization of its components. This showed that the Com system is restricted to Firmicutes, a phylum comprising the vast majority of monoderms, in which it is exceptionally widespread: it was detected in 2,333 genomes (Spreadsheet S1). An overwhelming majority of the corresponding species (99.7%) belong to the taxonomic class Bacilli (equally distributed between the Bacillales and Lactobacillales orders). As many as 88.7% of the sequenced Bacilli have a Com system. We also detected Com systems in one of 336 Clostridia and in six of 14 Erysipelotrichia. In total, 349 different species have the potential to express a Com pilus (Spreadsheet S2). Taken together, these findings suggest that the Com pilus is almost ubiquitous in Bacilli and can be advantageously studied in S. sanguinis.
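The prevalence figures above are per-class tallies of genomes in which a Com system was detected. A minimal Python sketch of that bookkeeping follows; the input format, one hypothetical (taxonomic class, has Com system) pair per genome, is an assumption, since parsing the actual MacSyFinder results is not shown here. The toy input simply reproduces the Clostridia and Erysipelotrichia counts quoted in the text.

```python
from collections import defaultdict


def com_prevalence(records):
    """Tally, per taxonomic class, how many genomes carry a Com system.

    `records` is an iterable of (taxonomic_class, has_com_system) pairs, one
    per genome (hypothetical input format). Returns
    {class: (genomes_with_com, total_genomes, percent)}.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for tax_class, has_com in records:
        totals[tax_class] += 1
        if has_com:
            hits[tax_class] += 1
    return {c: (hits[c], totals[c], 100.0 * hits[c] / totals[c]) for c in totals}


# Toy input reproducing the counts quoted in the text.
records = [("Clostridia", i < 1) for i in range(336)]
records += [("Erysipelotrichia", i < 6) for i in range(14)]
for tax_class, (n_com, n_total, pct) in com_prevalence(records).items():
    print(f"{tax_class}: {n_com}/{n_total} ({pct:.1f}%)")
# Clostridia: 1/336 (0.3%)
# Erysipelotrichia: 6/14 (42.9%)
```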

      ComGC, the major subunit of Com pili, is highly conserved and defines a monophyletic group among type IV pilins

      We next focused specifically on the major subunit of Com pili, the pilin ComGC (
      • Laurenceau R.
      • Pehau-Arnaudet G.
      • Baconnais S.
      • Gault J.
      • Malosse C.
      • Dujeancourt A.
      • Campo N.
      • Chamot-Rooke J.
      • Le Cam E.
      • Claverys J.P.
      • Fronzes R.
      A type IV pilus mediates DNA binding during natural transformation in Streptococcus pneumoniae.
      ,
      • Chen I.
      • Provvedi R.
      • Dubnau D.
      A macromolecular complex formed by a pilin-like protein in competent Bacillus subtilis.
      ). Compared with major pilins from T4aP, ComGC is ∼40% shorter, with 94 or 93 amino acids for the processed ComGCSS and ComGCSP, respectively (10.2 and 10.4 kDa). Moreover, unlike most other pilins, in which the only detectable sequence homology is usually in the α1N portion of the class III signal peptide (
      • Giltner C.L.
      • Nguyen Y.
      • Burrows L.L.
      Type IV pilin proteins: versatile molecular modules.
      ), ComGC orthologs show extensive sequence identity. For example, processed ComGCSS and ComGCSP display 65.6% overall sequence identity (Fig. 2). Similarly, processed ComGCSS and ComGCBS (from B. subtilis) show 33.3% sequence identity overall (Fig. S1). This is consistent with the existence of a ComGC signature in the InterPro database (IPR016940) (
      • Jones P.
      • Binns D.
      • Chang H.Y.
      • Fraser M.
      • Li W.
      • McAnulla C.
      • McWilliam H.
      • Maslen J.
      • Mitchell A.
      • Nuka G.
      • Pesseat S.
      • Quinn A.F.
      • Sangrador-Vegas A.
      • Scheremetjew M.
      • Yong S.Y.
      • Lopez R.
      • Hunter S.
      InterProScan 5: genome-scale protein function classification.
      ), which lists 2,809 ComGC entries. Global multiple alignment of these ComGC proteins shows that most of the sequence is conserved in ∼90% of the entries (Fig. 2). In Fig. 2, the consensus sequences have been aligned to ComGCSS and ComGCSP. Strikingly, some residues show sequence identity in virtually all the entries, including residues outside of the α1N portion (such as Ala38, Gln46, Tyr50, and Leu64 in ComGCSS).
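Percent identity values such as the 65.6% and 33.3% quoted above depend on the denominator chosen, and this section does not state which convention was used. The Python sketch below computes identity from a pre-aligned pair of sequences under two common conventions (identities over the shorter ungapped sequence, or over gap-free aligned columns); both options are stated assumptions, and the example strings are toy sequences, not ComGC.

```python
def percent_identity(aln_a, aln_b, denominator="shorter"):
    """Percent identity between two pre-aligned sequences ('-' marks gaps).

    denominator="shorter": identities / length of the shorter ungapped sequence.
    denominator="aligned": identities / columns where neither sequence has a gap.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    identical = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    if denominator == "shorter":
        n = min(len(aln_a.replace("-", "")), len(aln_b.replace("-", "")))
    elif denominator == "aligned":
        n = sum(1 for x, y in zip(aln_a, aln_b) if x != "-" and y != "-")
    else:
        raise ValueError(f"unknown denominator: {denominator}")
    return 100.0 * identical / n


print(percent_identity("AC-DEF", "ACGD-F"))                        # 80.0
print(percent_identity("AC-DEF", "ACGD-F", denominator="aligned"))  # 100.0
```

The same alignment can thus yield noticeably different identity values, which is worth keeping in mind when comparing figures across studies.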
      Figure 2. Global sequence analysis of ComGC pilins. The sequence alignment of ComGC from S. sanguinis and S. pneumoniae is shown in the top two rows. Residues are shaded in black (identical), gray (conserved), or unshaded (different). The leader peptide is highlighted. In the recombinant proteins produced for structure determination, the N-terminal 22 residues, which invariably form a protruding hydrophobic α-helix, were truncated (depicted by an arrow) to promote solubility. The 2D structural motifs predicted using JPred are depicted in the third row. The fourth and fifth rows show the 80 and 90% ComGC consensus sequences, computed from 2,809 ComGC entries in InterPro and aligned to ComGCSS and ComGCSP. Multiple alignments were generated using Clustal Omega and formatted with MView. Polar: C, D, E, H, K, N, Q, R, S, or T. Tiny: A or G. Hydrophobic: A, C, F, G, H, I, K, L, M, R, T, V, W, or Y. Aliphatic: I, L, or V. Turn-like: A, C, D, E, G, H, K, N, Q, R, S, or T. Small: A, C, D, G, N, P, S, T, or V. Single-letter amino acid abbreviations are used.
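The 80 and 90% consensus rows in Fig. 2 summarize each alignment column either by a residue or by a residue class reaching the threshold. The Python sketch below illustrates that idea using the residue classes listed in the legend. It is an assumption-laden approximation, not MView's implementation: the one-letter class symbols, the most-to-least-specific class order, and counting gaps in the denominator are all illustrative choices.

```python
from collections import Counter

# Residue classes copied from the Fig. 2 legend, ordered from most to least
# specific (smallest set first). The symbols are hypothetical and not
# necessarily those printed by MView.
RESIDUE_CLASSES = {
    "u": set("AG"),              # tiny
    "l": set("ILV"),             # aliphatic
    "s": set("ACDGNPSTV"),       # small
    "p": set("CDEHKNQRST"),      # polar
    "t": set("ACDEGHKNQRST"),    # turn-like
    "h": set("ACFGHIKLMRTVWY"),  # hydrophobic
}


def consensus(alignment, threshold=0.9):
    """Threshold consensus of equal-length aligned sequences ('-' = gap).

    For each column, report the most frequent residue if its frequency
    (gaps included in the denominator) reaches `threshold`; otherwise the
    first class whose members reach it; otherwise '.'.
    """
    n = len(alignment)
    out = []
    for column in zip(*alignment):
        residues = [aa for aa in column if aa != "-"]
        top = Counter(residues).most_common(1)
        if top and top[0][1] / n >= threshold:
            out.append(top[0][0])
            continue
        for symbol, members in RESIDUE_CLASSES.items():
            if sum(aa in members for aa in residues) / n >= threshold:
                out.append(symbol)
                break
        else:
            out.append(".")
    return "".join(out)


# Toy alignment of ten 4-residue sequences (not ComGC data).
aln = ["Y" + ("L" if k < 9 else "I") + ("I" if k < 5 else "V") + ("W" if k < 5 else "D")
       for k in range(10)]
print(consensus(aln, 0.9))  # YLl.
```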
      The above observations suggest that Com pili form a highly homogeneous T4F subfamily. We tested this by performing a phylogenetic analysis based on the protein sequences of major pilins from different T4F found in a wide variety of bacteria, including T4aP, T4bP, T4cP (also known as Tad pili), mannose-sensitive hemagglutinin pili (MSH), type II secretion systems (T2SS), and Com pili. The phylogenetic tree (Fig. 3), generated using IQ-TREE (
      • Nguyen L.T.
      • Schmidt H.A.
      • von Haeseler A.
      • Minh B.Q.
      IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies.
      ), reveals that several T4F form clear monophyletic groups with good branch support, >96% ultrafast bootstrap (UFBoot) (
      • Hoang D.T.
      • Chernomor O.
      • von Haeseler A.
      • Minh B.Q.
      • Vinh L.S.
      UFBoot2: improving the ultrafast bootstrap approximation.
      ). Of particular interest, Com pili define a highly supported clade (99% UFBoot), clearly distinct from all other T4F systems. Taken together, these findings show that ComGC is a small pilin with a highly conserved sequence, which defines a monophyletic group.
      Figure 3. Rooted phylogeny of the major pilins from various bacterial T4F. The tree was built using IQ-TREE, with 1,000 UFBoot replicates and the LG+F+R4 model. Numeric values (in %) indicate UFBoot support for the corresponding branches. The color of the bullet points indicates the taxonomic group of the corresponding species. The colors of the strips and highlights indicate the classification of the different T4F systems. T4aP, type IVa pilus. T4bP, type IVb pilus. T4cP, type IVc pilus (also known as Tad).
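The legend specifies the model (LG+F+R4) and 1,000 UFBoot replicates. As a sketch only, the Python helper below assembles a corresponding IQ-TREE command line: `-s` (alignment), `-m` (model) and `-B` (ultrafast bootstrap in IQ-TREE 2; IQ-TREE 1.x uses `-bb`). The executable name and the alignment file name are assumptions, and because the rooting procedure is not described here, no outgroup option is included.

```python
import shutil
import subprocess


def iqtree_command(alignment_path, model="LG+F+R4", ufboot_replicates=1000,
                   executable="iqtree2", ufboot_flag="-B"):
    """Build an IQ-TREE command line for a maximum-likelihood tree with UFBoot.

    `-s` gives the alignment, `-m` the substitution model, and `ufboot_flag`
    (`-B` in IQ-TREE 2, `-bb` in IQ-TREE 1.x) the number of UFBoot replicates.
    """
    return [executable, "-s", alignment_path, "-m", model,
            ufboot_flag, str(ufboot_replicates)]


def run_iqtree(alignment_path, **kwargs):
    """Run IQ-TREE if it is installed; raise if the executable is missing."""
    cmd = iqtree_command(alignment_path, **kwargs)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    return subprocess.run(cmd, check=True)


# Hypothetical alignment file of major pilin sequences:
print(" ".join(iqtree_command("major_pilins_aligned.fasta")))
# iqtree2 -s major_pilins_aligned.fasta -m LG+F+R4 -B 1000
```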

      Solution structures of two ComGC orthologs reveal a conserved and new type IV pilin fold

      Because high-resolution structural information is needed to improve our understanding of Com pili, we decided to solve the 3D structure of ComGCSS. To facilitate protein purification, we used a synthetic comGCSS gene codon-optimized for expression in Escherichia coli, and produced a soluble protein in which the first 22 residues of ComGCSS that form a hydrophobic α-helix (α1N) were replaced by a noncleavable N-terminal hexahistidine tag (His6). This commonly used truncation approach is predicted to have minimal structural impact on the rest of the protein, as previously shown for the Pseudomonas aeruginosa PAK pilin (
      • Craig L.
      • Taylor R.K.
      • Pique M.E.
      • Adair B.D.
      • Arvai A.S.
      • Singh M.
      • Lloyd S.J.
      • Shin D.S.
      • Getzoff E.D.
      • Yeager M.
      • Forest K.T.
      • Tainer J.A.
      Type IV pilin structure and assembly. X-ray and EM analyses of Vibrio cholerae toxin-coregulated pilus and Pseudomonas aeruginosa PAK pilin.
      ). The resulting 8.8-kDa His6-ComGCSS protein could be readily purified using a combination of affinity and gel-filtration chromatography. Using protein isotopically labeled with 13C and 15N for backbone and side-chain NMR resonance assignments, we could assign 99.5% of the backbone and 92% of assignable protons overall. Structural ensembles were determined with 962 NOE-based restraints, 50 hydrogen bonds, 110 dihedral angle restraints, and 39 residual dipolar couplings (RDC) (Table 1). As can be seen in Fig. 4, the 3D structure of ComGCSS is unlike that of any type IV pilin present in the PDB, as it is purely helical, with three distinct helices connected by loops. The helices present are consistent with JPred secondary structure prediction (Fig. 2) (
      • Drozdetskiy A.
      • Cole C.
      • Procter J.
      • Barton G.J.
      JPred4: a protein secondary structure prediction server.
      ). The N-terminal α1-helix, which spans residues 37–53 of the processed protein, corresponds to α1C because the hydrophobic α1N has been truncated in His6-ComGCSS. Tightly packed against this α1-helix, in a parallel plane, are the α2-helix (residues 61–67) and the α3-helix (residues 72–85), which stack against each other in antiparallel fashion (Fig. 4A) and orthogonally to α1. Except for the unstructured N-terminal residues, the ComGCSS structures within the NMR ensemble superpose well onto each other (Fig. 4B), with a root mean square deviation (RMSD) of 1.2 Å for Cα atoms, which suggests that there is no significant flexibility in this portion of the structure (
      • Krissinel E.
      • Henrick K.
      Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions.
      ). The unstructured N terminus, which lacks long and medium NOEs present in the ordered regions of the protein, was predicted to be highly dynamic based on TALOS+ (
      • Shen Y.
      • Delaglio F.
      • Cornilescu G.
      • Bax A.
      TALOS+: a hybrid method for predicting protein backbone torsion angles from NMR chemical shifts.
      ), with an average S2 order parameter of 0.49 ± 0.10.
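The Cα RMSD values quoted for the NMR ensembles compare structures after optimal superposition. A minimal Python sketch of that calculation is the Kabsch algorithm below, assuming matched Cα coordinate arrays have already been extracted from the models. The program used to obtain the published values is not stated in this passage, so this is a generic illustration rather than the authors' pipeline.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between matched (N, 3) coordinate sets after optimal superposition.

    Kabsch algorithm: center both sets, find the proper rotation that best maps
    P onto Q via SVD of the 3x3 covariance matrix, then compute the RMSD of the
    superposed coordinates.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must be matched (N, 3) arrays")
    P_c = P - P.mean(axis=0)
    Q_c = Q - Q.mean(axis=0)
    H = P_c.T @ Q_c                        # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P_c @ R.T - Q_c                 # rotate each row of P_c by R
    return float(np.sqrt((diff ** 2).sum() / len(P)))


# Usage: a rigidly rotated and translated copy gives an RMSD close to zero.
rng = np.random.default_rng(0)
P = rng.normal(size=(20, 3))
theta = 0.7
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
print(kabsch_rmsd(P, P @ Rz.T + np.array([1.0, 2.0, 3.0])))
```

For an ensemble, the same function can be applied to each model against a reference (for example, the mean or a representative structure), restricted to the residues of interest.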
      Table 1. NMR structural statistics
                                                        ComGCSS          ComGCSP
      NOE-derived distance constraints
        Long [|i - j| > 5]                              128              95
        Medium [5 ≥ |i - j| > 1]                        414              404
        Intraresidue (i = j)                            420              381
        Total                                           962              880
        Hydrogen bonds                                  50               54
        Dihedral constraints (Φ and Ψ)                  110              102
        Residual dipolar couplings (RDC)                39               38
      Ramachandran statistics (from PROCHECK)
        Most favored (%)                                93.4             83.0
        Additionally allowed (%)                        6.4              15.9
        Generously allowed (%)                          0.2              1.1
        Disallowed (%)                                  0.0              0.0
      Structure statistics
        RMSD backbone, all residues (Å)                 3.3              4.0
        RMSD backbone, ordered residues (a) (Å)         0.6              0.8
        RMS bond angles (°)                             1.8              1.9
        RMS bond lengths (Å)                            0.012            0.017
      Restraint statistics (RMSD of violations)
        NOE restraints                                  0.060 ± 0.003    0.179 ± 0.008
        Hydrogen bonds                                  0.075 ± 0.015    0.100 ± 0.017
        Dihedral restraints                             1.805 ± 0.075    1.827 ± 0.318
        RDC                                             0.748 ± 0.138    0.716 ± 0.256
        Q value                                         0.146 ± 0.028    0.150 ± 0.054
      a PROCHECK ordered residues are 37–53, 61–66, and 72–85 for ComGCSS, and 36–54, 60–65, and 71–82 for ComGCSP.
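Two quantities in Table 1 follow simple definitions. NOE restraints are binned by the sequence separation of the two residues (the three bins sum to the totals: 128 + 414 + 420 = 962 and 95 + 404 + 381 = 880), and the RDC Q value measures agreement between observed and back-calculated couplings. The Python sketch below uses one widely used definition, Q = rms(D_obs − D_calc) / rms(D_obs); whether the structure-calculation software normalized Q exactly this way is an assumption. Because Table 1 defines "medium" as 5 ≥ |i − j| > 1, sequential restraints (|i − j| = 1) are reported separately here.

```python
import math


def noe_category(i, j):
    """Classify an NOE restraint between residues i and j by sequence separation.

    Long (|i - j| > 5), medium (5 >= |i - j| > 1) and intraresidue (i == j)
    follow Table 1; 'sequential' (|i - j| == 1) is not listed in Table 1 and
    is reported separately here.
    """
    sep = abs(i - j)
    if sep == 0:
        return "intraresidue"
    if sep == 1:
        return "sequential"
    if sep <= 5:
        return "medium"
    return "long"


def rdc_q_factor(d_obs, d_calc):
    """RDC quality factor Q = rms(D_obs - D_calc) / rms(D_obs) (assumed definition)."""
    if len(d_obs) != len(d_calc) or not d_obs:
        raise ValueError("need two non-empty lists of equal length")
    n = len(d_obs)
    rms_diff = math.sqrt(sum((o - c) ** 2 for o, c in zip(d_obs, d_calc)) / n)
    rms_obs = math.sqrt(sum(o ** 2 for o in d_obs) / n)
    return rms_diff / rms_obs


print(noe_category(40, 47))                      # long
print(rdc_q_factor([3.0, -4.0], [3.0, -3.0]))    # ~0.2 (toy couplings)
```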
      Figure 4. 3D solution structure of ComGCSS. A, cartoon representation of the ComGCSS structure: face and side views are shown. A dimmed surface representation of the protein is superimposed. The three consecutive α-helices have been named α1, α2, and α3, and highlighted in blue (α1) or cyan (α2 and α3). B, cartoon representation of the superposition of the ensemble of 10 ComGCSS structures determined by NMR, which highlights that there is no significant flexibility in the structure except for the unstructured N terminus.
      Our ComGCSS structure differs markedly from the recently reported solution structure of ComGCSP (PDB 5NCA) (
      • Muschiol S.
      • Erlendsson S.
      • Aschtgen M.S.
      • Oliveira V.
      • Schmieder P.
      • de Lichtenberg C.
      • Teilum K.
      • Boesen T.
      • Akbey U.
      • Henriques-Normark B.
      Structure of the competence pilus major pilin ComGC in Streptococcus pneumoniae.
      ), which is surprising considering the high sequence identity between these two proteins (Fig. 2). Therefore, to define the structural relationship between ComGC orthologs, we decided to solve the structure of ComGCSP. As for ComGCSS, we used a synthetic comGCSP gene codon-optimized for expression in E. coli, fused the 71-residue soluble portion of ComGCSP to a noncleavable N-terminal His6 tag, and purified doubly labeled His6-ComGCSP (9 kDa). Again, assignment was excellent, with 98.1% of the backbone and 90% of assignable protons assigned. Structural ensembles were determined with 880 NOE-based restraints, 54 hydrogen bonds, 102 dihedral angle restraints, and 38 RDC (Table 1). As can be seen in Fig. 5, our ComGCSP 3D structure is highly similar to the structure of ComGCSS. It is, however, very different (RMSD of 3.6 Å) from the ComGCSP solution structure that was recently determined from a small number of restraints (Fig. S2) (
      • Muschiol S.
      • Erlendsson S.
      • Aschtgen M.S.
      • Oliveira V.
      • Schmieder P.
      • de Lichtenberg C.
      • Teilum K.
      • Boesen T.
      • Akbey U.
      • Henriques-Normark B.
      Structure of the competence pilus major pilin ComGC in Streptococcus pneumoniae.
      ). In brief, our structure shows that ComGCSP displays three distinct helices, with the α2-helix (residues 60–66) and the α3-helix (residues 71–82) stacking against each other and packing orthogonally to the N-terminal α1-helix (Fig. 5A). As for ComGCSS, except for the unstructured N terminus, there is no significant flexibility in ComGCSP: the structures within the NMR ensemble superpose well onto each other, with an RMSD of 1.6 Å for Cα atoms (Fig. 5B). Our ComGCSS and ComGCSP averaged structures are highly similar (Fig. 5C), with 1.8 Å RMSD between their ordered regions and 1.5 Å RMSD for the helical regions, which is consistent with the high sequence identity between these two proteins.
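Statements such as "α2 and α3 stack in antiparallel fashion, orthogonal to α1" can be quantified as angles between helix axes. The Python sketch below approximates each axis as the first principal component of the helix Cα coordinates, oriented N to C, so that antiparallel helices give about 180° and orthogonal ones about 90°. This PCA approximation is an assumption, not the method used in the paper, and it becomes crude for very short helices such as α2 (6–7 residues).

```python
import math

import numpy as np


def helix_axis(ca_coords):
    """Approximate a helix axis as the first principal component of its Cα atoms.

    The axis is oriented from the first to the last residue (N to C).
    """
    X = np.asarray(ca_coords, dtype=float)
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    axis = Vt[0]                      # unit vector of largest variance
    if np.dot(X[-1] - X[0], axis) < 0:
        axis = -axis
    return axis


def interhelix_angle(ca_a, ca_b):
    """Angle in degrees (0-180) between two N-to-C oriented helix axes."""
    cos_angle = float(np.clip(np.dot(helix_axis(ca_a), helix_axis(ca_b)), -1.0, 1.0))
    return math.degrees(math.acos(cos_angle))


# Usage with idealized Cα traces (radius 2.3 Å, rise 1.5 Å, 100° per residue):
step = math.radians(100.0)
along_z = [(2.3 * math.cos(k * step), 2.3 * math.sin(k * step), 1.5 * k) for k in range(36)]
along_x = [(1.5 * k, 2.3 * math.cos(k * step), 2.3 * math.sin(k * step)) for k in range(36)]
print(round(interhelix_angle(along_z, along_x)))  # close to 90
```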
      Figure 5. 3D solution structure of ComGCSP. A, cartoon representation of the ComGCSP structure: face and side views are shown. A dimmed surface representation of the protein is superimposed. Nomenclature and color scheme are the same as described in the legend to Fig. 4. B, cartoon representation of the superposition of the ensemble of 10 ComGCSP structures determined by NMR. C, cartoon representation of the overlay of ComGCSP and ComGCSS representative structures. This highlights the high structural similarity between the two proteins, with 1.5 Å RMSD for the helical regions.
      As determined by GETAREA (
      • Fraczkiewicz R.
      • Braun W.
      Exact and efficient analytical calculation of the accessible surface areas and their gradients for macromolecules.
      ) with a probe radius of 1.4 Å, the average solvent exposure ratio for the ordered portion of ComGCSS is 48.3%, compared with 6.7% for the residues determined to be in the interior. In our ComGCSS structure, the conserved residues Val43, Gln46, Tyr50, Leu64, and Ile70 are deeply buried, with an average of only 6% solvent exposure, forming a critical portion of a hydrophobic core that contributes to the globular fold of ComGC (Fig. 6). In contrast, the conserved Gly68 is solvent exposed, which is important for the formation of the α2-helix-turn-α3-helix motif, where a tiny residue at the beginning of the turn provides the flexibility and lack of steric restriction required for turning. These observations also apply to our ComGCSP structure and are, surprisingly, reflected in the conservation of multiple chemical shifts between the conserved residues in our two structures (Fig. S3). In addition, modeling of the globular head of ComGCBS (Fig. S4), which predicts a globular fold similar to that of ComGCSS and ComGCSP, shows that Cys36 and Cys76 are close enough to form a disulfide bond. Such a disulfide bond, which is absent from ComGCSS and ComGCSP because they have no Cys residues, is expected to stabilize the globular fold and was reported to stabilize ComGC in B. subtilis (
      • Chen I.
      • Provvedi R.
      • Dubnau D.
      A macromolecular complex formed by a pilin-like protein in competent Bacillus subtilis.
      ,
      • Meima R.
      • Eschevins C.
      • Fillinger S.
      • Bolhuis A.
      • Hamoen L.W.
      • Dorenbos R.
      • Quax W.J.
      • van Dijl J.M.
      • Provvedi R.
      • Chen I.
      • Dubnau D.
      • Bron S.
      The bdbDC operon of Bacillus subtilis encodes thiol-disulfide oxidoreductases required for competence development.
      ).
      Figure 6. Conserved residues contributing to the globular fold of ComGC. Cartoon representation of the ordered portion of ComGCSS, in which residues determined by GETAREA to be on the interior (surface accessibility ratio below 20%) are highlighted in orange. The consensus residues Val43, Gln46, Tyr50, Leu64, and Ile70 are shown in space-filling representation.
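      The interior/exposed classification used above can be sketched as follows. This is a minimal illustration, not GETAREA itself: GETAREA computes the surface areas analytically and reports each residue's exposure as a ratio to a random-coil reference, whereas here the per-residue areas and reference values are hypothetical placeholders. Only the 20% interior cutoff comes from the Fig. 6 legend.

```python
from statistics import mean

# Cutoff from the Fig. 6 legend: residues with an accessibility ratio
# below 20% are treated as interior.
INTERIOR_CUTOFF_PERCENT = 20.0


def exposure_ratio(sasa: float, reference_sasa: float) -> float:
    """Solvent exposure as a percentage of a fully exposed reference area (Å^2)."""
    if reference_sasa <= 0:
        raise ValueError("reference area must be positive")
    return 100.0 * sasa / reference_sasa


def classify_residues(residues, reference):
    """Split residues into interior and exposed sets.

    residues:  list of (label, residue_type, sasa_A2) tuples
    reference: dict mapping residue_type -> reference area in Å^2 (assumed values)
    Returns (interior_labels, mean_ratio_all, mean_ratio_interior).
    """
    ratios = {label: exposure_ratio(sasa, reference[rtype])
              for label, rtype, sasa in residues}
    interior = [label for label, r in ratios.items() if r < INTERIOR_CUTOFF_PERCENT]
    mean_all = mean(ratios.values())
    mean_interior = mean(ratios[label] for label in interior) if interior else float("nan")
    return interior, mean_all, mean_interior


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only (not measured values).
    reference = {"VAL": 150.0, "GLY": 85.0, "LYS": 200.0}
    residues = [("res1", "VAL", 6.0), ("res2", "GLY", 60.0), ("res3", "LYS", 150.0)]
    interior, mean_all, mean_interior = classify_residues(residues, reference)
    print(interior, round(mean_all, 1), round(mean_interior, 1))
```

      Averages such as the 48.3% and 6.7% quoted in the text correspond to `mean_all` and `mean_interior` computed over real per-residue values.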
      Because the hydrophobic α1N segment that has been truncated in His6-ComGCSS is highly similar to the corresponding portion of several other bacterial T4F major pilins, including the PilE major pilin from Neisseria gonorrhoeae (Fig. S5), for which a full-length crystal structure is available (
      • Parge H.E.
      • Forest K.T.
      • Hickey M.J.
      • Christensen D.A.
      • Getzoff E.D.
      • Tainer J.A.
      Structure of the fibre-forming protein pilin at 2.6 Å resolution.
      ), we could model the portion of α1 truncated in our construct and produce a full-length model of ComGCSS (Fig. 7). Comparison with the two pilin folds identified so far, represented here by pilins from N. gonorrhoeae and Geobacter sulfurreducens, clearly shows that ComGC adopts a radically different type IV pilin fold (Fig. 7). All three pilins share an extended N-terminal α1-helix, the universal defining structural feature of type IV pilins (
      • Giltner C.L.
      • Nguyen Y.
      • Burrows L.L.
      Type IV pilin proteins: versatile molecular modules.
      ). In addition, whereas the very short G. sulfurreducens pilin consists almost exclusively of α1, both ComGC and PilE display a typical lollipop shape, with a globular head mounted onto a “stick” (the α1-helix). However, unlike canonical pilins, in which the globular head consists of a 4- to 7-stranded antiparallel β-sheet lying in a plane parallel to α1 and oriented 45° or more relative to the long axis of α1 (
      • Giltner C.L.
      • Nguyen Y.
      • Burrows L.L.
      Type IV pilin proteins: versatile molecular modules.
      ), in ComGC the structural backbone of the globular head is a helix-turn-helix roughly orthogonal to α1 (Fig. 7). This fold, which falls within the “mainly α” class and the “orthogonal bundle” architecture, represents a novel pilin fold. Taken together, these structural findings show that ComGC orthologs display conserved 3D structures with a previously unreported type IV pilin fold.
      Figure 7. ComGC displays a novel type IV pilin fold. 3D structures of the three structural types of type IV pilins identified so far. The canonical type IV pilin fold is represented by the major pilin of T4aP in N. gonorrhoeae (PDB 2PIL). The G. sulfurreducens T4P pilin (PDB 2M7G) is the chosen representative of the very short pilins consisting almost exclusively of α1. The full-length 3D structure of ComGCSS has been modeled. The conserved α1 is highlighted in blue. Distinctive structural features in the globular heads of PilE (antiparallel β-sheet) and ComGC (antiparallel α2-α3 orthogonal to α1) are highlighted in cyan.

      ComGC novel pilin fold is compatible with helical T4F assembly

      Because ComGC represents a novel type IV pilin structural fold, it was important to determine whether it could be modeled into recent cryo-EM structures obtained for a variety of bacterial T4F (
      • Kolappan S.
      • Coureuil M.
      • Yu X.
      • Nassif X.
      • Egelman E.H.
      • Craig L.
      Structure of the Neisseria meningitidis type IV pilus.
      ,
      • Wang F.
      • Coureuil M.
      • Osinski T.
      • Orlova A.
      • Altindal T.
      • Gesbert G.
      • Nassif X.
      • Egelman E.H.
      • Craig L.
      Cryoelectron microscopy reconstructions of the Pseudomonas aeruginosa and Neisseria gonorrhoeae type IV pili at sub-nanometer resolution.
      ,
      • López-Castilla A.
      • Thomassin J.L.
      • Bardiaux B.
      • Zheng W.
      • Nivaskumar M.
      • Yu X.
      • Nilges M.
      • Egelman E.H.
      • Izadi-Pruneyre N.
      • Francetic O.
      Structure of the calcium-dependent type 2 secretion pseudopilus.
      ). These structures, which all show right-handed helical polymers in which pilins are held together by interactions between their α1-helices in the filament core, have revealed that a segment of α1N melts during filament assembly, centered on the helix-breaking residue Pro22. This portion of α1 is highly conserved in ComGC, including the helix-breaking Pro22 (Fig. S5). Using SWISS-MODEL (
      • Waterhouse A.
      • Bertoni M.
      • Bienert S.
      • Studer G.
      • Tauriello G.
      • Gumienny R.
      • Heer F.T.
      • de Beer T.A.P.
      • Rempfer C.
      • Bordoli L.
      • Lepore R.
      • Schwede T.
      SWISS-MODEL: homology modelling of protein structures and complexes.
      ) and the cryo-EM structure of N. gonorrhoeae T4P (
      • Wang F.
      • Coureuil M.
      • Osinski T.
      • Orlova A.
      • Altindal T.
      • Gesbert G.
      • Nassif X.
      • Egelman E.H.
      • Craig L.
      Cryoelectron microscopy reconstructions of the Pseudomonas aeruginosa and Neisseria gonorrhoeae type IV pili at sub-nanometer resolution.
      ) as a template, we produced a full-length 3D structural model of ComGCSS with a melted α1N segment (Fig. 8A). Because ComGC defines a monophyletic group and is highly conserved, all ComGC orthologs are very likely to display a similar 3D structure. This notion was strengthened by structural models produced for ComGC from a range of species included in the phylogenetic tree in Fig. 3, whose ComGC sequences are more or less distant (21.3–65.6% sequence identity). As seen in Fig. S6, all the models display the same lollipop shape, with a globular head mounted onto an α1 stick. As for ComGCSS and ComGCSP, the structural backbone of the globular head is always a helix-turn-helix roughly orthogonal to α1.
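      Sequence-identity ranges such as the 21.3–65.6% quoted above depend on how identity is defined. The following is a minimal sketch of one common convention, assuming the sequences are already aligned and counting only columns in which neither sequence has a gap; the exact definition used for the values above is not stated, and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two sequences taken from the same alignment.

    Assumption: identity = identical columns / columns where neither sequence
    has a gap. Other conventions (e.g. dividing by the length of the shorter
    ungapped sequence) give different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


def identity_range(reference: str, others):
    """Minimum and maximum percent identity of aligned sequences to a reference."""
    values = [percent_identity(reference, seq) for seq in others]
    return min(values), max(values)


if __name__ == "__main__":
    # Hypothetical aligned fragments, for illustration only.
    print(percent_identity("ACDE-F", "ACDQKF"))
    print(identity_range("ACDEF", ["ACDEF", "ACQQF"]))
```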
      Figure 8. 3D model of ComGC filaments. The cryo-EM structure of the N. gonorrhoeae T4P (PDB 5VXX) was used as a template to generate a model of ComGCSS pili. A, full-length ComGCSS in filaments, with a melted segment in α1N. B, ComGCSS pili showing a right-handed helical packing of the conserved α1-helices, which run approximately parallel to each other in the filament core. C, top and bottom views of ComGCSS pili highlighting the globular heads forming the outer surface of the filaments and the extensive interactions between α1-helices in the filament core.
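      The right-handed helical packing shown in Fig. 8 can be pictured as one subunit copied repeatedly with a fixed rise along, and twist about, the filament axis. The model here was built by aligning ComGC onto subunits of a template filament (see Modeling), not by applying symmetry directly, so the following is only a minimal sketch of helical symmetry itself. The rise and twist values in the usage example are placeholders, not the parameters of PDB 5VXX.

```python
import math


def helical_copies(coords, rise, twist_deg, n_subunits):
    """Build n_subunits copies of one subunit related by helical symmetry.

    coords:    list of (x, y, z) atom positions in Å, filament axis along z
    rise:      axial translation between consecutive subunits (Å)
    twist_deg: rotation about z between consecutive subunits (degrees)
    Subunit k is rotated by k * twist_deg about z and shifted by k * rise along z.
    With rise > 0 and twist_deg > 0 the subunits follow a right-handed helix.
    """
    copies = []
    for k in range(n_subunits):
        phi = math.radians(k * twist_deg)
        c, s = math.cos(phi), math.sin(phi)
        copies.append([(c * x - s * y, s * x + c * y, z + k * rise) for x, y, z in coords])
    return copies


if __name__ == "__main__":
    # Placeholder parameters and a single pseudo-atom, for illustration only.
    filament = helical_copies([(15.0, 0.0, 0.0)], rise=10.0, twist_deg=100.0, n_subunits=4)
    for k, subunit in enumerate(filament):
        x, y, z = subunit[0]
        print(f"subunit {k}: x={x:6.2f} y={y:6.2f} z={z:6.2f}")
```

      Because each copy is a pure rotation about the axis plus a translation along it, every atom keeps its distance from the filament axis, which is why the α1-helices stay packed in the core while the globular heads stack on the outside.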
      We next assessed whether full-length ComGC would be compatible with helical T4F assembly and found this to be the case. Despite its novel pilin fold, we were able to model the packing of ComGC within the cryo-EM structure of N. gonorrhoeae T4P (
      • Wang F.
      • Coureuil M.
      • Osinski T.
      • Orlova A.
      • Altindal T.
      • Gesbert G.
      • Nassif X.
      • Egelman E.H.
      • Craig L.
      Cryoelectron microscopy reconstructions of the Pseudomonas aeruginosa and Neisseria gonorrhoeae type IV pili at sub-nanometer resolution.
      ). This produced a homology model with good Ramachandran plot statistics based on PROCHECK (
      • Laskowski R.A.
      • MacArthur M.W.
      • Moss D.S.
      • Thornton J.M.
      PROCHECK: a program to check the stereochemical quality of protein structures.
      ), i.e. most favored (89.5%), additional allowed (8.2%), generously allowed (2.3%), and disallowed (0%) regions. As can be seen in Fig. 8B, the model revealed a right-handed helical packing of the conserved N-terminal α1-helices of ComGCSS in the filament core, which run approximately parallel to each other and establish extensive hydrophobic interactions (Fig. 8C). In addition, the Glu5 side chain of subunit S establishes a salt bridge and a hydrogen bond with Phe1 and Thr2, respectively, of subunit S+1. Importantly, the globular heads are stacked on top of each other along the long axis of the filament, and their helix-turn-helix structural backbone forms the outer surface of the filament (Fig. 8C). A very similar model was obtained using P. aeruginosa T4P as a template (
      • Wang F.
      • Coureuil M.
      • Osinski T.
      • Orlova A.
      • Altindal T.
      • Gesbert G.
      • Nassif X.
      • Egelman E.H.
      • Craig L.
      Cryoelectron microscopy reconstructions of the Pseudomonas aeruginosa and Neisseria gonorrhoeae type IV pili at sub-nanometer resolution.
      ).
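      Inter-subunit contacts such as the Glu5(S)–Phe1(S+1) salt bridge are typically flagged in a model by heavy-atom distance cutoffs. A minimal sketch, assuming, as a salt bridge with Phe1 implies, that the partner group is the N-terminal amino nitrogen of Phe1. The cutoffs are common heuristics rather than the criteria used for this model, and the coordinates are hypothetical.

```python
import math

# Heuristic heavy-atom distance cutoffs (assumptions; not necessarily the
# criteria used for the ComGC filament model).
SALT_BRIDGE_CUTOFF_A = 4.0
HBOND_CUTOFF_A = 3.5


def min_distance(atoms_a, atoms_b):
    """Smallest Euclidean distance (Å) between two sets of (x, y, z) coordinates."""
    return min(math.dist(a, b) for a in atoms_a for b in atoms_b)


def in_contact(atoms_a, atoms_b, cutoff):
    """Return (distance, bool) indicating whether two atom groups lie within cutoff."""
    d = min_distance(atoms_a, atoms_b)
    return d, d <= cutoff


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    glu5_carboxylate_oxygens = [(0.0, 0.0, 0.0), (1.1, 0.9, 0.0)]  # subunit S
    phe1_amino_nitrogen = [(3.6, 0.4, 0.5)]                          # subunit S+1
    d, ok = in_contact(glu5_carboxylate_oxygens, phe1_amino_nitrogen, SALT_BRIDGE_CUTOFF_A)
    print(f"Glu5(S)-Phe1(S+1): {d:.2f} Å, salt bridge: {ok}")
```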

      Discussion

      Their virtual ubiquity in prokaryotes and role in a variety of key biological processes make T4F an important research topic (
      • Berry J.L.
      • Pelicic V.
      Exceptionally widespread nano-machines composed of type IV pilins: the prokaryotic Swiss Army knives.
      ,
      • Denise R.
      • Abby S.S.
      • Rocha E.P.C.
      Diversification of the type IV filament superfamily into machines for adhesion, protein secretion, DNA uptake, and motility.
      ). Com pili are involved in DNA uptake in naturally competent monoderm bacteria (
      • Dubnau D.
      • Blokesch M.
      Mechanisms of DNA uptake by naturally competent bacteria.
      ). Imported DNA, which usually leads to genome diversification via transformation, can also be used as a source of food or as a template for repair of damaged genomic DNA (
      • Johnston C.
      • Martin B.
      • Fichant G.
      • Polard P.
      • Claverys J.P.
      Bacterial transformation: distribution, shared mechanisms and divergent control.
      ). Compared with T4F in diderms, most notably T4P and T2SS, which have been extensively studied, Com pili have been understudied, including from a structural point of view. In this report, we focused on the major subunit of Com pili, the ComGC pilin, which we analyzed genomically, phylogenetically, and structurally. This led to the notable findings discussed below.
      Although Com pili have been primarily studied in two model competent species (B. subtilis and S. pneumoniae), the present study suggests that they are widespread because complete sets of Com pilus-encoding genes are readily detected in more than 2,300 genomes corresponding to almost 350 different species. However, unlike promiscuous T4F such as T4aP and T4cP that are found in virtually all phyla of Bacteria (
      • Denise R.
      • Abby S.S.
      • Rocha E.P.C.
      Diversification of the type IV filament superfamily into machines for adhesion, protein secretion, DNA uptake, and motility.
      ), Com pili are restricted to a single phylum (Firmicutes) and almost exclusively to a single class of monoderms (Bacilli), in which they are virtually ubiquitous. Indeed, an overwhelming majority of Bacilli genomes (88%) have Com-encoding genes. Interestingly, the major subunit of Com pili (ComGC) shows extensive sequence conservation in the corresponding genomes and defines a clear monophyletic group within type IV pilins. Taken together, these observations suggest that the Com pilus is a T4F that emerged only once, very early during the diversification of Firmicutes, and has remained largely confined to this phylum ever since. Because the Com-encoding genes have not become pseudogenes, it is likely that most Bacilli have the ability to assemble a Com pilus and take up DNA. However, because only a handful of these species have been experimentally shown to be competent (
      • Johnston C.
      • Martin B.
      • Fichant G.
      • Polard P.
      • Claverys J.P.
      Bacterial transformation: distribution, shared mechanisms and divergent control.
      ), this implies that either the imported DNA is primarily used as food or for genome repair instead of genome diversification, or that the inducing cues leading to transformation are yet to be established for most species of Firmicutes. Alternatively, Com pili might have evolved in some of these species to take up other macromolecules, which is, however, at odds with the conservation of the five pilins.
      Perhaps the most important finding in this study is that ComGC, the major subunit of the Com pilus, displays an entirely novel major pilin fold where the extended N-terminal α1-helix, the universal defining structural feature of type IV pilins (
      • Giltner C.L.
      • Nguyen Y.
      • Burrows L.L.
      Type IV pilin proteins: versatile molecular modules.
      ), is topped by a purely helical globular head. ComGC thus appears to be a “middle ground” between longer canonical pilins (e.g. N. gonorrhoeae), in which the globular head consists of an antiparallel β-sheet, and the very short pilins in which a globular head is missing altogether (e.g. G. sulfurreducens). These structures point to a hypothetical evolutionary scenario in which truncation of the antiparallel β-sheet in a canonical type IV pilin might have led to a purely helical ComGC proto-structure. Intriguingly, this scenario “works” particularly well with PilE1, a major subunit of S. sanguinis T4P, which has two short α-helices in the loop connecting α1 and the antiparallel β-sheet (
      • Berry J.L.
      • Gurung I.
      • Anonsen J.H.
      • Spielman I.
      • Harper E.
      • Hall A.M.J.
      • Goosens V.J.
      • Raynaud C.
      • Koomey M.
      • Biais N.
      • Matthews S.
      • Pelicic V.
      Global biochemical and structural analysis of the type IV pilus from the Gram-positive bacterium Streptococcus sanguinis.
      ). Importantly, this putative “truncation” would not interfere with the expected ability of ComGC to be assembled into helical filaments, because this pilin could be readily modeled into recent T4F structures (
      • Kolappan S.
      • Coureuil M.
      • Yu X.
      • Nassif X.
      • Egelman E.H.
      • Craig L.
      Structure of the Neisseria meningitidis type IV pilus.
      ,
      • Wang F.
      • Coureuil M.
      • Osinski T.
      • Orlova A.
      • Altindal T.
      • Gesbert G.
      • Nassif X.
      • Egelman E.H.
      • Craig L.
      Cryoelectron microscopy reconstructions of the Pseudomonas aeruginosa and Neisseria gonorrhoeae type IV pili at sub-nanometer resolution.
      ,
      • López-Castilla A.
      • Thomassin J.L.
      • Bardiaux B.
      • Zheng W.
      • Nivaskumar M.
      • Yu X.
      • Nilges M.
      • Egelman E.H.
      • Izadi-Pruneyre N.
      • Francetic O.
      Structure of the calcium-dependent type 2 secretion pseudopilus.
      ). Com pili are thus likely to result from the right-handed helical packing of ComGC α1-helices within the filament core, running parallel to each other and establishing extensive hydrophobic interactions, with a melted central portion. Such packing will stack the globular heads on top of each other, forming the surface of the filaments. Extensive sequence conservation, including for residues beyond the classically conserved α1N, and the fact that the two structures that we have solved are virtually identical, strongly suggest that these structural features apply to the whole ComGC clade, including species such as B. subtilis where extended filaments have not been observed (
      • Chen I.
      • Provvedi R.
      • Dubnau D.
      A macromolecular complex formed by a pilin-like protein in competent Bacillus subtilis.
      ). It is therefore surprising that a recently published NMR structure of ComGCSP (PDB 5NCA) (
      • Muschiol S.
      • Erlendsson S.
      • Aschtgen M.S.
      • Oliveira V.
      • Schmieder P.
      • de Lichtenberg C.
      • Teilum K.
      • Boesen T.
      • Akbey U.
      • Henriques-Normark B.
      Structure of the competence pilus major pilin ComGC in Streptococcus pneumoniae.
      ) differs dramatically from ours. Although the previous structure is also purely helical, the orientation of its α2 and α3 helices is entirely different, so that the conserved hydrophobic core is not packed. Therefore, PDB 5NCA, which resembles a one-sided “pick-axe” with no globular head, cannot be readily modeled into recent T4F structures (Fig. S7). Interestingly, our assignments differ only slightly from those previously produced for PDB 5NCA (Fig. S8). However, whereas we assigned 90% of assignable protons overall, only 65% were assigned previously (
      • Muschiol S.
      • Erlendsson S.
      • Aschtgen M.S.
      • Oliveira V.
      • Schmieder P.
      • de Lichtenberg C.
      • Teilum K.
      • Boesen T.
      • Akbey U.
      • Henriques-Normark B.
      Structure of the competence pilus major pilin ComGC in Streptococcus pneumoniae.
      ), which probably accounts for the apparently “unfolded” state of PDB 5NCA. Indeed, without a high degree of proton assignment, the assignment of NOESY peaks and the production of distance restraints fail. Local hydrogen-bond and dihedral restraints often cannot compensate for the lack of long-range NOEs within the protein interior or between elements of secondary structure.
      Together with these conserved structural features, the conservation en bloc of the genes encoding the Com pilus strongly suggests that the molecular mechanisms of filament assembly and DNA uptake are widely conserved in Firmicutes. These mechanisms, which remain poorly understood, can be advantageously studied in S. sanguinis, which has recently emerged as a monoderm T4P model (
      • Pelicic V.
      Monoderm bacteria: the new frontier for type IV pilus biology.
      ). Indeed, S. sanguinis is so far the only monoderm known to express two distinct T4F, Com pili and retractable T4aP, which further establishes it as a prime model species. Comparison with other T4F systems shows that the machinery involved in the biogenesis of Com pili is by far one of the simplest. Because ComGD, ComGE, ComGF, and ComGG pilins are likely to be minor pilus components important for filament stability and function (a conserved role for minor pilins in various T4F) (
      • Berry J.L.
      • Pelicic V.
      Exceptionally widespread nano-machines composed of type IV pilins: the prokaryotic Swiss Army knives.
      ), and ComC is the prepilin peptidase processing pilins (
      • Chung Y.S.
      • Breidt F.
      • Dubnau D.
      Cell surface localization and processing of the ComG proteins, required for DNA binding during transformation of Bacillus subtilis.
      ), it appears that assembly of ComGC into filaments is mediated by only two proteins: an extension ATPase (ComGA) and a platform protein (ComGB), which together would assemble processed ComGC into a right-handed helical filament. Upon DNA binding, which has been visualized for S. pneumoniae Com pili although the receptor remains to be identified (
      • Laurenceau R.
      • Pehau-Arnaudet G.
      • Baconnais S.
      • Gault J.
      • Malosse C.
      • Dujeancourt A.
      • Campo N.
      • Chamot-Rooke J.
      • Le Cam E.
      • Claverys J.P.
      • Fronzes R.
      A type IV pilus mediates DNA binding during natural transformation in Streptococcus pneumoniae.
      ), uptake will be initiated by filament retraction (
      • Ellison C.K.
      • Dalia T.N.
      • Vidal Ceballos A.
      • Wang J.C.
      • Biais N.
      • Brun Y.V.
      • Dalia A.B.
      Retraction of DNA-bound type IV competence pili initiates DNA uptake during natural transformation in Vibrio cholerae.
      ). Because there is no dedicated retraction ATPase, one possibility is that ComGA might be a bifunctional motor powering both extension and retraction, as recently suggested for the T4cP motor (
      • Ellison C.K.
      • Kan J.
      • Chlebek J.L.
      • Hummels K.R.
      • Panis G.
      • Viollier P.H.
      • Biais N.
      • Dalia A.B.
      • Brun Y.V.
      A bifunctional ATPase drives tad pilus extension and retraction.
      ). It would be interesting to image Com filament dynamics and DNA binding in live cells, using a labeling strategy that has recently enabled visualization of these steps for the T4aP involved in competence in naturally competent diderm species (
      • Ellison C.K.
      • Dalia T.N.
      • Vidal Ceballos A.
      • Wang J.C.
      • Biais N.
      • Brun Y.V.
      • Dalia A.B.
      Retraction of DNA-bound type IV competence pili initiates DNA uptake during natural transformation in Vibrio cholerae.
      ).
      In conclusion, by providing high-resolution structural information for the ComGC pilins, this study has shed light on an understudied T4F involved in DNA uptake found in hundreds of monoderm bacterial species and has led to the surprising discovery of a novel type IV pilin fold. This paves the way for further investigations of this minimalist T4F, which are expected to improve our understanding of a fascinating superfamily of filamentous nanomachines ubiquitous in prokaryotes.

      Experimental procedures

      Bioinformatic analyses

      Protein sequences were routinely analyzed using the DNA Strider program. Protein sequence alignments were done using the Clustal Omega server at EMBL-EBI. Pretty-printing of alignment files was done using the BoxShade server at ExPASy. Reformatting of large multiple alignment files was done using the MView server at EMBL-EBI. Prediction of functional domains was done using the InterProScan server at EMBL-EBI, which was also used to download all the ComGC protein entries with an IPR016940 domain. Protein secondary structure prediction was done using the JPred server at the University of Dundee. Protein 3D structures were downloaded from the RCSB PDB server. Molecular visualization of protein 3D structures was done using PyMOL (Schrödinger). The GETAREA server, at UTMB, was used for calculating the solvent accessible surface area of ComGC proteins.
      Detection of the Com systems in genomes available in the NCBI RefSeq database (last accessed in April 2019, 13,512 genomes of Bacteria and Archaea) was done as described previously (
      • Denise R.
      • Abby S.S.
      • Rocha E.P.C.
      Diversification of the type IV filament superfamily into machines for adhesion, protein secretion, DNA uptake, and motility.
      ), using MacSyFinder (
      • Abby S.S.
      • Néron B.
      • Ménager H.
      • Touchon M.
      • Rocha E.P.
      MacSyFinder: a program to mine genomes for molecular systems with an application to CRISPR-Cas systems.
      ) and the relevant HMM Com model (
      • Denise R.
      • Abby S.S.
      • Rocha E.P.C.
      Diversification of the type IV filament superfamily into machines for adhesion, protein secretion, DNA uptake, and motility.
      ). Phylogenetic analysis based on protein sequences of major pilins of different T4F involved an initial alignment of the sequences using MAFFT version 7.273 (
      • Katoh K.
      • Standley D.M.
      MAFFT multiple sequence alignment software version 7: improvements in performance and usability.
      ), using the L-INS-i (linsi) algorithm. Multiple alignments were analyzed using Noisy version 1.5.12 (
      • Dress A.W.
      • Flamm C.
      • Fritzsch G.
      • Grunewald S.
      • Kruspe M.
      • Prohaska S.J.
      • Stadler P.F.
      Noisy: identification of problematic columns in multiple sequence alignments.
      ) with default parameters, to select the informative sites. Next, we inferred maximum likelihood trees from the curated alignments using IQ-TREE version 1.6.7.2 (
      • Nguyen L.T.
      • Schmidt H.A.
      • von Haeseler A.
      • Minh B.Q.
      IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies.
      ), with the option -allnni. We evaluated node support using the options -bb 1000 for ultrafast bootstrap and -alrt 1000 for SH-aLRT (
      • Hoang D.T.
      • Chernomor O.
      • von Haeseler A.
      • Minh B.Q.
      • Vinh L.S.
      UFBoot2: improving the ultrafast bootstrap approximation.
      ). The best evolutionary model was selected with ModelFinder (
      • Kalyaanamoorthy S.
      • Minh B.Q.
      • Wong T.K.F.
      • von Haeseler A.
      • Jermiin L.S.
      ModelFinder: fast model selection for accurate phylogenetic estimates.
      ), using the option -MF and the BIC criterion. We used the option -wbtl to save all bootstrap trees with their branch lengths.
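      The alignment and tree-inference steps above can be scripted roughly as follows. This is a minimal sketch, not the authors' pipeline: file names are hypothetical, the Noisy filtering step is left out because its command-line options are not given in the text, and `-m MFP -merit BIC` is an assumption standing in for the ModelFinder setting described as "-MF" with the BIC criterion. Option names follow IQ-TREE 1.6, the version used here; IQ-TREE 2 renamed several of them.

```python
import shutil
import subprocess


def mafft_linsi_cmd(fasta_in: str) -> list:
    """MAFFT L-INS-i command (equivalent to the mafft-linsi wrapper)."""
    return ["mafft", "--localpair", "--maxiterate", "1000", fasta_in]


def iqtree_cmd(alignment: str) -> list:
    """IQ-TREE 1.6-style command mirroring the options given in the text.

    Assumption: "-m MFP -merit BIC" stands in for the "-MF ... BIC criterion"
    setting (ModelFinder model selection by BIC, followed by tree inference).
    """
    return [
        "iqtree", "-s", alignment,
        "-m", "MFP", "-merit", "BIC",
        "-allnni",          # more thorough NNI search
        "-bb", "1000",      # ultrafast bootstrap replicates
        "-alrt", "1000",    # SH-aLRT replicates
        "-wbtl",            # write bootstrap trees with branch lengths
    ]


def run_mafft(fasta_in: str, fasta_out: str) -> None:
    """Run MAFFT; it writes the alignment to stdout, so redirect it to a file."""
    if shutil.which("mafft") is None:
        raise FileNotFoundError("mafft not found on PATH")
    with open(fasta_out, "w") as out:
        subprocess.run(mafft_linsi_cmd(fasta_in), stdout=out, check=True)


def run_iqtree(curated_alignment: str) -> None:
    """Run IQ-TREE on an alignment already filtered with Noisy (not scripted here)."""
    if shutil.which("iqtree") is None:
        raise FileNotFoundError("iqtree not found on PATH")
    subprocess.run(iqtree_cmd(curated_alignment), check=True)
```

      Building the argument lists separately from running them makes the exact options easy to inspect or log before launching long jobs.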

      Protein expression and purification

      A synthetic gene, codon-optimized for E. coli expression, encoding ComGCSS from S. sanguinis 2908 (
      • Gurung I.
      • Spielman I.
      • Davies M.R.
      • Lala R.
      • Gaustad P.
      • Biais N.
      • Pelicic V.
      Functional analysis of an unusual type IV pilus in the Gram-positive Streptococcus sanguinis.
      ) was synthesized and cloned by GeneArt, yielding pMA-T-comGCSS (Table S2). The portion of the gene encoding residues 23–94 of the mature protein was PCR-amplified using the comGCSS-F and comGCSS-R primers (Table S3), cut with NcoI and BamHI, and cloned into the pET28b vector (Novagen) cut with the same enzymes. The forward primer was designed to fuse a noncleavable N-terminal His6 tag to ComGCSS. The resulting plasmid was verified by sequencing and transformed into chemically competent E. coli BL21(DE3) cells. A single colony was transferred to 10 ml of LB supplemented with 50 μg ml−1 kanamycin and grown at 37 °C overnight. This preculture was back-diluted 100-fold into 1 liter of M9 minimal medium supplemented with antibiotic, a mixture of vitamins, trace elements, and D-[13C]glucose and [15N]NH4Cl for isotopic labeling. Cells were grown in an orbital shaker at 37 °C until the OD600 reached 0.7, before 0.4 mM IPTG (Merck Chemicals) was added to induce protein expression for 16 h at 18 °C. Cells were then harvested by centrifugation at 8,000 × g for 20 min and subjected to one freeze/thaw cycle in lysis buffer (PBS, pH 7.4, with EDTA-free protease inhibitors). The lysate was further disrupted by repeated cycles of sonication (pulses of 5 s on and 5 s off for 5 min) until the cell suspension was visibly less viscous. The cell lysate was then centrifuged for 20 min at 18,000 × g to remove cell debris. The clarified lysate was loaded, using an ÄKTA Purifier FPLC, onto a 1-ml HisTrap HP column (GE Healthcare) pre-equilibrated in lysis buffer. The column was washed extensively with lysis buffer to remove unbound material before His6-ComGCSS was eluted with elution buffer (PBS, pH 7.4, 200 mM NaCl, 300 mM imidazole). Affinity-purified ComGCSS was further purified by gel-filtration chromatography on a HiLoad 16/600 Superdex 75 column (GE Healthcare), using 25 mM Na2HPO4/NaH2PO4, pH 6, 200 mM NaCl as elution buffer.
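      The volumes implied by these steps follow from the dilution relation C1V1 = C2V2; a 100-fold back-dilution into 1 liter, for instance, requires 10 ml of preculture, which matches the 10-ml overnight culture. A minimal sketch; the IPTG and kanamycin stock concentrations in the usage example are hypothetical, as the text does not state them.

```python
def stock_volume(c_stock: float, c_final: float, v_final: float) -> float:
    """Volume of stock to add so that c_stock * v_stock = c_final * v_final.

    Concentrations must share a unit; the result has the unit of v_final.
    """
    if c_stock <= 0 or c_final < 0 or v_final <= 0:
        raise ValueError("stock concentration and volume must be positive")
    if c_final > c_stock:
        raise ValueError("final concentration cannot exceed stock concentration")
    return c_final * v_final / c_stock


def inoculum_volume(v_final: float, fold_dilution: float) -> float:
    """Volume of preculture needed for a given fold back-dilution into v_final."""
    return v_final / fold_dilution


if __name__ == "__main__":
    print(inoculum_volume(1000.0, 100))        # ml of preculture for 1 liter
    print(stock_volume(1000.0, 0.4, 1000.0))   # ml of a hypothetical 1 M (1000 mM) IPTG stock
    print(stock_volume(50.0, 0.05, 1000.0))    # ml of a hypothetical 50 mg/ml kanamycin stock
```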
For RDC measurements, we produced 15N-labeled protein as follows. Bacteria grown overnight in 5 ml of LB with antibiotic were subcultured at 37 °C in 0.8 liters of LB to an OD600 of 0.6 and then transferred to 0.4 liters of M9 containing [15N]NH4Cl, unlabeled D-glucose, and 10 μg liter−1 thiamine. Cultures were induced with 0.3 mM IPTG at 16 °C for 18 h. After production of a clarified lysate, the protein was purified as above, except that hand-packed Ni-NTA-agarose (Qiagen) was used in 50 mM Tris, pH 8, 300 mM NaCl, with elution in 50 mM Tris, pH 8, 200 mM NaCl, 300 mM imidazole, followed by a Superdex 75 10/300 GL column (GE Healthcare) in 25 mM Tris, pH 8, 200 mM NaCl; the protein was then dialyzed into 25 mM Na2HPO4/NaH2PO4, pH 6, 50 mM NaCl.
      For ComGCSP, a codon-optimized synthetic gene based on the gene from S. pneumoniae R6 was synthesized and cloned by GeneArt, yielding pMA-T-comGCSP (Table S2). The portion of the gene encoding residues 23–93 of the mature protein was PCR-amplified using the comGCSP-F and comGCSP-R primers (Table S3), cut with NcoI and BamHI, and cloned into the pET28b vector (Novagen) cut with the same enzymes. The forward primer was designed to fuse a noncleavable N-terminal His6 tag to ComGCSP. The resulting plasmid was verified by sequencing and transformed into chemically competent E. coli BL21(DE3) cells. A single colony was transferred to 5 ml of LB supplemented with 50 μg ml−1 kanamycin and grown overnight at 37 °C. Bacteria were subcultured at 37 °C in 0.8 liters of LB with antibiotic to an OD600 of 0.7 and then transferred into 0.4 liters of M9 containing 10 μg liter−1 thiamine and either [15N]NH4Cl and unlabeled D-glucose, or [15N]NH4Cl and D-[13C]glucose. Cultures were induced with 0.3 mM IPTG at 16 °C for 18 h. After production of a clarified lysate, ComGCSP was purified as above using hand-packed Ni-NTA-agarose (Qiagen) and Superdex 75 10/300 GL (GE Healthcare) columns.

      NMR spectroscopy and structure determination

      All data were collected at 25 °C on Bruker Avance III HD 800 MHz and 600 MHz triple-resonance spectrometers equipped with cryoprobes. For ComGCSS, a sample containing 13C/15N-labeled protein at 1 mM in NMR buffer (25 mM Na2HPO4/NaH2PO4, pH 6, 50 mM NaCl, 5% D2O) was used for assignment experiments and structure determination. For ComGCSP, a sample containing 13C/15N-labeled protein at 1.8 mM in NMR buffer was used for the same purposes. Resonance assignments for ComGCSS were performed using 15N HSQC, 13C aliphatic HSQC, HNCACB, CBCACONH, HBHA, HNCO, HNCACO, HCCCONH, CCCONH, and CCH. For ComGCSP, assignments were performed using 15N HSQC, 13C aliphatic HSQC, HNCA, CBCANH, CBCACONH, HBHA, HNCO, HNCACO, HCCCONH, CCCONH, and CCH. All data were processed using MddNMR (
      • Orekhov V.Y.
      • Jaravine V.A.
      Analysis of non-uniformly sampled spectra with multi-dimensional decomposition.
      ) for reconstruction of non-uniformly sampled data and NMRPipe (
      • Delaglio F.
      • Grzesiek S.
      • Vuister G.W.
      • Zhu G.
      • Pfeifer J.
      • Bax A.
      NMRPipe: a multidimensional spectral processing system based on UNIX pipes.
      ). Peak picking and assignments were performed in SPARKY (
      • Lee W.
      • Tonelli M.
      • Markley J.L.
      NMRFAM-SPARKY: enhanced software for biomolecular NMR spectroscopy.
      ).
      NOE peak lists were obtained from NOESY spectra recorded with a mixing time of 140 ms: 3D 13C-HSQC-NOESY and 3D 15N-HSQC-NOESY for ComGCSP, and a NOESY with simultaneous 13C/15N chemical shift evolution for ComGCSS. For both proteins, RDCs were derived from 15N-HSQC-IPAP experiments on 15N-labeled isotropic samples and samples aligned in 3% PEG/hexanol liquid crystal, with a D2O splitting of ∼7 Hz. RDCs were included in the structure calculations only for baseline-resolved peaks from residues for which TALOS+ predicted an order parameter >0.8. Angular constraints from TALOS+ were also used in the structure calculations. Both ComGCSS and ComGCSP structures were determined using Ponderosa-C/S (
      • Lee W.
      • Tonelli M.
      • Markley J.L.
      NMRFAM-SPARKY: enhanced software for biomolecular NMR spectroscopy.
      ), refined using Xplor-NIH 2.52 (
      • Schwieters C.D.
      • Kuszewski J.J.
      • Clore G.M.
      Using Xplor-NIH for NMR molecular structure determination.
      ), aligned using Theseus (
      • Theobald D.L.
      • Wuttke D.S.
      Accurate structural correlations from maximum likelihood superpositions.
      ), and the secondary structure checked using Stride (
      • Frishman D.
      • Argos P.
      Knowledge-based protein secondary structure assignment.
      ). Structure validation was performed using PSVS (
      • Bhattacharya A.
      • Tejero R.
      • Montelione G.T.
      Evaluating protein structures determined by structural genomics consortia.
      ), PROCHECK (
      • Laskowski R.A.
      • MacArthur M.W.
      • Moss D.S.
      • Thornton J.M.
      PROCHECK: a program to check the stereochemical quality of protein structures.
      ), and in-house scripts.
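      The RDC selection described above can be sketched as follows. In 15N-HSQC-IPAP experiments the 1H-15N splitting equals 1J(NH) in the isotropic sample and 1J(NH) + D in the aligned sample, so the RDC D is their difference. A minimal sketch, assuming the splittings have already been measured as signed couplings; residue labels and values are hypothetical, and `s2` stands for the TALOS+-predicted order parameter.

```python
def rdc_hz(j_isotropic: float, j_plus_d_aligned: float) -> float:
    """Residual dipolar coupling: aligned-sample splitting minus isotropic splitting.

    Both inputs must be signed couplings in Hz using the same sign convention
    (1J(NH) is about -92 Hz); mixing signed and absolute values flips the sign of D.
    """
    return j_plus_d_aligned - j_isotropic


def select_rdcs(measurements, s2_cutoff: float = 0.8):
    """Keep RDCs for baseline-resolved peaks whose predicted order parameter exceeds s2_cutoff.

    measurements: dict residue_label -> (j_iso_hz, j_aligned_hz, baseline_resolved, s2)
    """
    return {
        residue: rdc_hz(j_iso, j_aligned)
        for residue, (j_iso, j_aligned, resolved, s2) in measurements.items()
        if resolved and s2 > s2_cutoff
    }


if __name__ == "__main__":
    # Hypothetical measurements, for illustration only.
    data = {
        "res10": (-92.0, -85.0, True, 0.90),   # kept
        "res11": (-93.0, -99.0, False, 0.90),  # dropped: peak not baseline-resolved
        "res12": (-91.5, -80.5, True, 0.60),   # dropped: order parameter too low
    }
    print(select_rdcs(data))
```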

      Modeling

      The SWISS-MODEL server at ExPASy was used to model protein 3D structures. In brief, full-length ComGCSS was modeled using the N. gonorrhoeae major pilin PilE (PDB 2PIL) as a template (
      • Forest K.T.
      • Dunham S.A.
      • Koomey M.
      • Tainer J.A.
      Crystallographic structure reveals phosphorylated pilin from Neisseria: phosphoserine sites modify type IV pilus surface chemistry and fibre morphology.
      ). We first modeled the α1 residues missing from our structure; the resulting model was aligned in PyMOL to our Xplor-NIH–produced average NMR structure (from which the first, unstructured α1 residues had been removed), and the two were then merged using Coot (
      • Emsley P.
      • Lohkamp B.
      • Scott W.G.
      • Cowtan K.
      Features and development of Coot.
      ).
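The align-then-merge step can be pictured as a rigid-body least-squares superposition on the residues shared by the homology model and the NMR structure, followed by grafting the residues that only the model contains. The sketch below assumes one point per residue (e.g. Cα) and uses Horn's quaternion method, with the required eigenvector found by power iteration; it is not how PyMOL or Coot work internally, and `superpose`, `apply`, `rmsd`, and `splice_alpha1` are hypothetical names.

```python
import math

Point = tuple[float, float, float]  # one atom (e.g. CA) per residue


def _centroid(pts: list[Point]) -> Point:
    n = len(pts)
    return tuple(sum(p[k] for p in pts) / n for k in range(3))


def superpose(mobile: list[Point], target: list[Point]):
    """Least-squares rigid fit of `mobile` onto `target` (Horn's quaternion method).

    Returns (R, cm, ct) such that apply(fit, p) = R (p - cm) + ct.
    """
    cm, ct = _centroid(mobile), _centroid(target)
    P = [[p[k] - cm[k] for k in range(3)] for p in mobile]
    Q = [[q[k] - ct[k] for k in range(3)] for q in target]
    # Cross-covariance S[a][b] = sum_i P_i[a] * Q_i[b]
    S = [[sum(p[a] * q[b] for p, q in zip(P, Q)) for b in range(3)] for a in range(3)]
    (xx, xy, xz), (yx, yy, yz), (zx, zy, zz) = S
    N = [[xx + yy + zz, yz - zy, zx - xz, xy - yx],
         [yz - zy, xx - yy - zz, xy + yx, zx + xz],
         [zx - xz, xy + yx, -xx + yy - zz, yz + zy],
         [xy - yx, zx + xz, yz + zy, -xx - yy + zz]]
    # Shifting the diagonal makes every eigenvalue positive, so power iteration
    # converges to the eigenvector of the largest one: the optimal quaternion.
    shift = sum(c * c for v in P + Q for c in v) / 2 + 1.0
    for i in range(4):
        N[i][i] += shift
    q = [1.0, 0.3, 0.2, 0.1]  # arbitrary start vector
    for _ in range(10000):
        w = [sum(N[i][j] * q[j] for j in range(4)) for i in range(4)]
        norm = math.sqrt(sum(c * c for c in w))
        w = [c / norm for c in w]
        converged = sum((a - b) ** 2 for a, b in zip(w, q)) < 1e-24
        q = w
        if converged:
            break
    s, x, y, z = q  # unit quaternion -> rotation matrix
    R = [[s*s + x*x - y*y - z*z, 2 * (x*y - s*z), 2 * (x*z + s*y)],
         [2 * (x*y + s*z), s*s - x*x + y*y - z*z, 2 * (y*z - s*x)],
         [2 * (x*z - s*y), 2 * (y*z + s*x), s*s - x*x - y*y + z*z]]
    return R, cm, ct


def apply(fit, p: Point) -> Point:
    R, cm, ct = fit
    d = [p[k] - cm[k] for k in range(3)]
    return tuple(sum(R[i][j] * d[j] for j in range(3)) + ct[i] for i in range(3))


def rmsd(a: list[Point], b: list[Point]) -> float:
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))


def splice_alpha1(model: dict[int, Point], nmr: dict[int, Point]) -> dict[int, Point]:
    """Hypothetical helper: graft residues absent from the NMR structure onto it.

    Both dicts map residue number -> coordinate; pass `nmr` with its unstructured
    N-terminal residues already removed. The model is superposed on the NMR
    structure using shared residues; model-only residues are kept (after the
    fit) and every other residue comes from the NMR structure.
    """
    shared = sorted(set(model) & set(nmr))
    fit = superpose([model[r] for r in shared], [nmr[r] for r in shared])
    merged = {r: apply(fit, p) for r, p in model.items() if r not in nmr}
    merged.update(nmr)
    return dict(sorted(merged.items()))
```

The start vector and iteration cap are arbitrary choices for this sketch; a production tool would typically use a library eigensolver or an SVD-based (Kabsch) fit, and would handle full side chains.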
      Similarly, the full-length ComGCSS structure within filaments was modeled using one of the PilE subunits from the cryo-EM model of the N. gonorrhoeae T4P (PDB 5VXX) (
      • Wang F.
      • Coureuil M.
      • Osinski T.
      • Orlova A.
      • Altindal T.
      • Gesbert G.
      • Nassif X.
      • Egelman E.H.
      • Craig L.
      Cryoelectron microscopy reconstructions of the Pseudomonas aeruginosa and Neisseria gonorrhoeae type IV pili at sub-nanometer resolution.
      ) as a template for the α1 residues missing from our structure. The Com pilus model was then produced by aligning the α1-helices of the averaged ComGCSS NMR structure to the α1-helices of the PilE-based SWISS-MODEL homology model subunits in the N. gonorrhoeae T4P. The same procedure was applied to the recently published ComGCSP structure (PDB 5NCA). The structural elements were fused using Coot (
      • Emsley P.
      • Lohkamp B.
      • Scott W.G.
      • Cowtan K.
      Features and development of Coot.
      ). In addition, we modeled the packing of full-length ComGCSS in the P. aeruginosa PAK pilus (PDB 5VXY) (
      • Wang F.
      • Coureuil M.
      • Osinski T.
      • Orlova A.
      • Altindal T.
      • Gesbert G.
      • Nassif X.
      • Egelman E.H.
      • Craig L.
      Cryoelectron microscopy reconstructions of the Pseudomonas aeruginosa and Neisseria gonorrhoeae type IV pili at sub-nanometer resolution.
      ).
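In a helical filament such as the gonococcal T4P, neighboring subunits are related by a constant rotation (twist) about the filament axis and a constant translation (rise) along it, so superposing ComGC onto each template subunit places the copies as if this symmetry operator were applied repeatedly, to the extent that the template subunits obey it. The sketch below assumes the z axis as the filament axis and uses placeholder rise and twist values, not the refined 5VXX or 5VXY parameters; `helical_copies` is a hypothetical helper, not part of SWISS-MODEL, PyMOL, or Coot.

```python
import math

Point = tuple[float, float, float]


def helical_copies(subunit: list[Point], n_copies: int, rise: float,
                   twist_deg: float) -> list[list[Point]]:
    """Place n_copies of `subunit` on a helix whose axis is the z axis.

    Copy k is the reference subunit rotated by k * twist about z and then
    shifted by k * rise along z.
    """
    copies = []
    for k in range(n_copies):
        phi = math.radians(k * twist_deg)
        c, s = math.cos(phi), math.sin(phi)
        copies.append([(c * x - s * y, s * x + c * y, z + k * rise)
                       for x, y, z in subunit])
    return copies


if __name__ == "__main__":
    # Placeholder symmetry parameters and a toy two-atom "subunit".
    RISE, TWIST = 10.0, 100.0
    toy_subunit = [(6.0, 0.0, 0.0), (6.5, 1.0, 1.5)]
    filament = helical_copies(toy_subunit, 4, RISE, TWIST)
    print(len(filament), [round(v, 3) for v in filament[1][0]])
```

Because every copy is generated by the same operator, the distance of each atom from the filament axis is preserved, which is a quick sanity check on any modeled assembly built this way.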

      Data availability

      The NMR solution structures of ComGCSS and ComGCSP have been deposited in the Protein Data Bank under accession numbers 6TXT and 6Y1H, respectively. Chemical shift assignments and NOE-based restraints used in the structure calculations are available from the Biological Magnetic Resonance Data Bank under accession numbers 34477 and 34490, respectively. All other data described in the manuscript are either contained within the manuscript or available from the corresponding author upon request.

      Author contributions

      D. S. and V. P. data curation; D. S., E. P. C. R., and V. P. formal analysis; D. S., E. P. C. R., S. M., and V. P. validation; D. S., J.-L. B., and R. D. investigation; D. S., J.-L. B., R. D., S. M., and V. P. methodology; D. S. and V. P. writing-original draft; R. D., E. P. C. R., and S. M. writing-review and editing; E. P. C. R. and V. P. supervision; E. P. C. R. and V. P. funding acquisition; S. M. and V. P. resources; V. P. conceptualization; V. P. project administration.

      Acknowledgments

      This work relied heavily on the use of the Cross-Faculty NMR Centre at Imperial College London. We are grateful to Nicolas Biais (City University of New York) and Romé Voulhoux (CNRS Marseille) for critical reading of the manuscript.

      Supplementary Material

      References

        • Berry J.L.
        • Pelicic V.
        Exceptionally widespread nano-machines composed of type IV pilins: the prokaryotic Swiss Army knives.
        FEMS Microbiol. Rev. 2015; 39 (25793961): 134-154
        • Denise R.
        • Abby S.S.
        • Rocha E.P.C.
        Diversification of the type IV filament superfamily into machines for adhesion, protein secretion, DNA uptake, and motility.
        PLoS Biol. 2019; 17 (31323028): e3000390
        • Giltner C.L.
        • Nguyen Y.
        • Burrows L.L.
        Type IV pilin proteins: versatile molecular modules.
        Microbiol. Mol. Biol. Rev. 2012; 76 (23204365): 740-772
        • Jones P.
        • Binns D.
        • Chang H.Y.
        • Fraser M.
        • Li W.
        • McAnulla C.
        • McWilliam H.
        • Maslen J.
        • Mitchell A.
        • Nuka G.
        • Pesseat S.
        • Quinn A.F.
        • Sangrador-Vegas A.
        • Scheremetjew M.
        • Yong S.Y.
        • Lopez R.
        • Hunter S.
        InterProScan 5: genome-scale protein function classification.
        Bioinformatics. 2014; 30 (24451626): 1236-1240
        • Reardon P.N.
        • Mueller K.T.
        Structure of the type IVa major pilin from the electrically conductive bacterial nanowires of Geobacter sulfurreducens.
        J. Biol. Chem. 2013; 288: 29260-29266
        • LaPointe C.F.
        • Taylor R.K.
        The type 4 prepilin peptidases comprise a novel family of aspartic acid proteases.
        J. Biol. Chem. 2000; 275 (10625704): 1502-1510
        • Kolappan S.
        • Coureuil M.
        • Yu X.
        • Nassif X.
        • Egelman E.H.
        • Craig L.
        Structure of the Neisseria meningitidis type IV pilus.
        Nat. Commun. 2016; 7 (27698424): 13015
        • Wang F.
        • Coureuil M.
        • Osinski T.
        • Orlova A.
        • Altindal T.
        • Gesbert G.
        • Nassif X.
        • Egelman E.H.
        • Craig L.
        Cryoelectron microscopy reconstructions of the Pseudomonas aeruginosa and Neisseria gonorrhoeae type IV pili at sub-nanometer resolution.
        Structure. 2017; 25 (28877506): 1423-1435.e4
        • Dubnau D.
        • Blokesch M.
        Mechanisms of DNA uptake by naturally competent bacteria.
        Annu. Rev. Genet. 2019; 53 (31433955): 217-237
        • Johnston C.
        • Martin B.
        • Fichant G.
        • Polard P.
        • Claverys J.P.
        Bacterial transformation: distribution, shared mechanisms and divergent control.
        Nat. Rev. Microbiol. 2014; 12 (24509783): 181-196
        • Berry J.L.
        • Gurung I.
        • Anonsen J.H.
        • Spielman I.
        • Harper E.
        • Hall A.M.J.
        • Goosens V.J.
        • Raynaud C.
        • Koomey M.
        • Biais N.
        • Matthews S.
        • Pelicic V.
        Global biochemical and structural analysis of the type IV pilus from the Gram-positive bacterium Streptococcus sanguinis.
        J. Biol. Chem. 2019; 294 (30837269): 6796-6808
        • Merz A.J.
        • So M.
        • Sheetz M.P.
        Pilus retraction powers bacterial twitching motility.
        Nature. 2000; 407 (10993081): 98-102
        • Cehovin A.
        • Simpson P.J.
        • McDowell M.A.
        • Brown D.R.
        • Noschese R.
        • Pallett M.
        • Brady J.
        • Baldwin G.S.
        • Lea S.M.
        • Matthews S.J.
        • Pelicic V.
        Specific DNA recognition mediated by a type IV pilin.
        Proc. Natl. Acad. Sci. U.S.A. 2013; 110 (23386723): 3065-3070
        • Ellison C.K.
        • Dalia T.N.
        • Vidal Ceballos A.
        • Wang J.C.
        • Biais N.
        • Brun Y.V.
        • Dalia A.B.
        Retraction of DNA-bound type IV competence pili initiates DNA uptake during natural transformation in Vibrio cholerae.
        Nat. Microbiol. 2018; 3: 773-780
        • Laurenceau R.
        • Pehau-Arnaudet G.
        • Baconnais S.
        • Gault J.
        • Malosse C.
        • Dujeancourt A.
        • Campo N.
        • Chamot-Rooke J.
        • Le Cam E.
        • Claverys J.P.
        • Fronzes R.
        A type IV pilus mediates DNA binding during natural transformation in Streptococcus pneumoniae.
        PLoS Pathog. 2013; 9: e1003473
        • Chen I.
        • Provvedi R.
        • Dubnau D.
        A macromolecular complex formed by a pilin-like protein in competent Bacillus subtilis.
        J. Biol. Chem. 2006; 281 (16751195): 21720-21727
        • Chung Y.S.
        • Dubnau D.
        ComC is required for the processing and translocation of ComGC, a pilin-like competence protein of Bacillus subtilis.
        Mol. Microbiol. 1995; 15: 543-551
        • Chung Y.S.
        • Dubnau D.
        All seven comG open reading frames are required for DNA binding during transformation of competent Bacillus subtilis.
        J. Bacteriol. 1998; 180 (9422590): 41-45
        • Muschiol S.
        • Erlendsson S.
        • Aschtgen M.S.
        • Oliveira V.
        • Schmieder P.
        • de Lichtenberg C.
        • Teilum K.
        • Boesen T.
        • Akbey U.
        • Henriques-Normark B.
        Structure of the competence pilus major pilin ComGC in Streptococcus pneumoniae.
        J. Biol. Chem. 2017; 292 (28659339): 14134-14146
        • Pelicic V.
        Monoderm bacteria: the new frontier for type IV pilus biology.
        Mol. Microbiol. 2019; 112 (31556183): 1674-1683
        • Gurung I.
        • Spielman I.
        • Davies M.R.
        • Lala R.
        • Gaustad P.
        • Biais N.
        • Pelicic V.
        Functional analysis of an unusual type IV pilus in the Gram-positive Streptococcus sanguinis.
        Mol. Microbiol. 2016; 99 (26435398): 380-392
        • Albano M.
        • Breitling R.
        • Dubnau D.A.
        Nucleotide sequence and genetic organization of the Bacillus subtilis comG operon.
        J. Bacteriol. 1989; 171 (2507524): 5386-5404
        • Mohan S.
        • Aghion J.
        • Guillen N.
        • Dubnau D.
        Molecular cloning and characterization of comC, a late competence gene of Bacillus subtilis.
        J. Bacteriol. 1989; 171 (2553669): 6043-6051
        • Imam S.
        • Chen Z.
        • Roos D.S.
        • Pohlschröder M.
        Identification of surprisingly diverse type IV pili, across a broad range of Gram-positive bacteria.
        PLoS ONE. 2011; 6 (22216142): e28919
        • Abby S.S.
        • Néron B.
        • Ménager H.
        • Touchon M.
        • Rocha E.P.
        MacSyFinder: a program to mine genomes for molecular systems with an application to CRISPR-Cas systems.
        PLoS ONE. 2014; 9 (25330359): e110726
        • Nguyen L.T.
        • Schmidt H.A.
        • von Haeseler A.
        • Minh B.Q.
        IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies.
        Mol. Biol. Evol. 2015; 32 (25371430): 268-274
        • Hoang D.T.
        • Chernomor O.
        • von Haeseler A.
        • Minh B.Q.
        • Vinh L.S.
        UFBoot2: improving the ultrafast bootstrap approximation.
        Mol. Biol. Evol. 2018; 35 (29077904): 518-522
        • Craig L.
        • Taylor R.K.
        • Pique M.E.
        • Adair B.D.
        • Arvai A.S.
        • Singh M.
        • Lloyd S.J.
        • Shin D.S.
        • Getzoff E.D.
        • Yeager M.
        • Forest K.T.
        • Tainer J.A.
        Type IV pilin structure and assembly. X-ray and EM analyses of Vibrio cholerae toxin-coregulated pilus and Pseudomonas aeruginosa PAK pilin.
        Mol. Cell. 2003; 11 (12769840): 1139-1150
        • Drozdetskiy A.
        • Cole C.
        • Procter J.
        • Barton G.J.
        JPred4: a protein secondary structure prediction server.
        Nucleic Acids Res. 2015; 43 (25883141): W389-W394
        • Krissinel E.
        • Henrick K.
        Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions.
        Acta Crystallogr. Sect. D Biol. Crystallogr. 2004; 60: 2256-2268
        • Shen Y.
        • Delaglio F.
        • Cornilescu G.
        • Bax A.
        TALOS+: a hybrid method for predicting protein backbone torsion angles from NMR chemical shifts.
        J. Biomol. NMR. 2009; 44 (19548092): 213-223
        • Fraczkiewicz R.
        • Braun W.
        Exact and efficient analytical calculation of the accessible surface areas and their gradients for macromolecules.
        J. Comput. Chem. 1998; 19: 319-333
        • Meima R.
        • Eschevins C.
        • Fillinger S.
        • Bolhuis A.
        • Hamoen L.W.
        • Dorenbos R.
        • Quax W.J.
        • van Dijl J.M.
        • Provvedi R.
        • Chen I.
        • Dubnau D.
        • Bron S.
        The bdbDC operon of Bacillus subtilis encodes thiol-disulfide oxidoreductases required for competence development.
        J. Biol. Chem. 2002; 277 (11744713): 6994-7001
        • Parge H.E.
        • Forest K.T.
        • Hickey M.J.
        • Christensen D.A.
        • Getzoff E.D.
        • Tainer J.A.
        Structure of the fibre-forming protein pilin at 2.6 Å resolution.
        Nature. 1995; 378 (7477282): 32-38
        • López-Castilla A.
        • Thomassin J.L.
        • Bardiaux B.
        • Zheng W.
        • Nivaskumar M.
        • Yu X.
        • Nilges M.
        • Egelman E.H.
        • Izadi-Pruneyre N.
        • Francetic O.
        Structure of the calcium-dependent type 2 secretion pseudopilus.
        Nat. Microbiol. 2017; 2 (28993624): 1686-1695
        • Waterhouse A.
        • Bertoni M.
        • Bienert S.
        • Studer G.
        • Tauriello G.
        • Gumienny R.
        • Heer F.T.
        • de Beer T.A.P.
        • Rempfer C.
        • Bordoli L.
        • Lepore R.
        • Schwede T.
        SWISS-MODEL: homology modelling of protein structures and complexes.
        Nucleic Acids Res. 2018; 46 (29788355): W296-W303
        • Laskowski R.A.
        • MacArthur M.W.
        • Moss D.S.
        • Thornton J.M.
        PROCHECK: a program to check the stereochemical quality of protein structures.
        J. Appl. Crystallogr. 1993; 26: 283-291
        • Chung Y.S.
        • Breidt F.
        • Dubnau D.
        Cell surface localization and processing of the ComG proteins, required for DNA binding during transformation of Bacillus subtilis.
        Mol. Microbiol. 1998; 29: 905-913
        • Ellison C.K.
        • Kan J.
        • Chlebek J.L.
        • Hummels K.R.
        • Panis G.
        • Viollier P.H.
        • Biais N.
        • Dalia A.B.
        • Brun Y.V.
        A bifunctional ATPase drives tad pilus extension and retraction.
        Sci. Adv. 2019; 5 (31897429): eaay2591
        • Katoh K.
        • Standley D.M.
        MAFFT multiple sequence alignment software version 7: improvements in performance and usability.
        Mol. Biol. Evol. 2013; 30 (23329690): 772-780
        • Dress A.W.
        • Flamm C.
        • Fritzsch G.
        • Grunewald S.
        • Kruspe M.
        • Prohaska S.J.
        • Stadler P.F.
        Noisy: identification of problematic columns in multiple sequence alignments.
        Algorithms Mol. Biol. 2008; 3: 7
        • Kalyaanamoorthy S.
        • Minh B.Q.
        • Wong T.K.F.
        • von Haeseler A.
        • Jermiin L.S.
        ModelFinder: fast model selection for accurate phylogenetic estimates.
        Nat. Methods. 2017; 14 (28481363): 587-589
        • Orekhov V.Y.
        • Jaravine V.A.
        Analysis of non-uniformly sampled spectra with multi-dimensional decomposition.
        Prog. Nucl. Mag. Reson. Spect. 2011; 59: 271-292
        • Delaglio F.
        • Grzesiek S.
        • Vuister G.W.
        • Zhu G.
        • Pfeifer J.
        • Bax A.
        NMRPipe: a multidimensional spectral processing system based on UNIX pipes.
        J. Biomol. NMR. 1995; 6 (8520220): 277-293
        • Lee W.
        • Tonelli M.
        • Markley J.L.
        NMRFAM-SPARKY: enhanced software for biomolecular NMR spectroscopy.
        Bioinformatics. 2015; 31 (25505092): 1325-1327
        • Schwieters C.D.
        • Kuszewski J.J.
        • Clore G.M.
        Using Xplor-NIH for NMR molecular structure determination.
        Prog. Nucl. Mag. Reson. Spect. 2006; 48: 47-62
        • Theobald D.L.
        • Wuttke D.S.
        Accurate structural correlations from maximum likelihood superpositions.
        PLoS Comput. Biol. 2008; 4 (18282091): e43
        • Frishman D.
        • Argos P.
        Knowledge-based protein secondary structure assignment.
        Proteins. 1995; 23 (8749853): 566-579
        • Bhattacharya A.
        • Tejero R.
        • Montelione G.T.
        Evaluating protein structures determined by structural genomics consortia.
        Proteins. 2007; 66 (17186527): 778-795
        • Forest K.T.
        • Dunham S.A.
        • Koomey M.
        • Tainer J.A.
        Crystallographic structure reveals phosphorylated pilin from Neisseria: phosphoserine sites modify type IV pilus surface chemistry and fibre morphology.
        Mol. Microbiol. 1999; 31 (10048019): 743-752
        • Emsley P.
        • Lohkamp B.
        • Scott W.G.
        • Cowtan K.
        Features and development of Coot.
        Acta Crystallogr. Sect. D Biol. Crystallogr. 2010; 66: 486-501