The streptococcal multidomain fibrillar adhesin CshA has an elongated polymeric architecture

The cell surfaces of many bacteria carry filamentous polypeptides termed adhesins that enable binding to both biotic and abiotic surfaces. Surface adherence is facilitated by the exquisite selectivity of the adhesins for their cognate ligands or receptors and is a key step in niche or host colonization and pathogenicity. Streptococcus gordonii is a primary colonizer of the human oral cavity and an opportunistic pathogen, as well as a leading cause of infective endocarditis in humans. The fibrillar adhesin CshA is an important determinant of S. gordonii adherence, forming peritrichous fibrils on its surface that bind host cells and other microorganisms. CshA possesses a distinctive multidomain architecture comprising an N-terminal target-binding region fused to 17 repeat domains (RDs) that are each ∼100 amino acids long. Here, using structural and biophysical methods, we demonstrate that the intact CshA repeat region (CshA_RD1–17, domains 1–17) forms an extended polymeric monomer in solution. We recombinantly produced a subset of CshA RDs and found that they differ in stability and unfolding behavior. The NMR structure of CshA_RD13 revealed a hitherto unreported all β-fold, flanked by disordered interdomain linkers. These findings, in tandem with complementary hydrodynamic studies of CshA_RD1–17, indicate that this polypeptide possesses a highly unusual dynamic transitory structure characterized by alternating regions of order and disorder. This architecture provides flexibility for the adhesive tip of the CshA fibril to maintain bacterial attachment that withstands shear forces within the human host. It may also help mitigate deleterious folding events between neighboring RDs that share significant structural identity without compromising mechanical stability.

Bacteria occupy almost every ecological niche on Earth (1,2). Their capacity to colonize diverse environments is in part enabled by their ability to adhere to the surfaces of materials and other cells. Adherence allows anchorage and persistence within a defined environment, confers significant evolutionary advantage, and promotes bacterial infection in animals and humans (3,4). Identifying and characterizing the cellular machineries employed by bacteria to adhere and colonize is of broad fundamental interest and may inform the development of anti-infective agents, medical devices, or vaccines (5,6).
Frequently, bacteria utilize proteinaceous surface decorations termed adhesins to facilitate attachment to extracellular target molecules. Different adhesins recognize and bind different (a)biotic targets, and there is considerable diversity in the molecular architectures of these important polypeptides. Larger filamentous adhesins may be grouped into one of two categories based on their distinguishing structural features: pili and fibrils. Pili have been implicated in numerous physiological processes and are found in both Gram-positive and Gram-negative bacteria (7-9). Fibrillar adhesins are produced by a wide variety of bacteria. They exhibit considerable sequence diversity, and much still remains to be learned about their structures and functions. Fibrils are usually composed of a single polypeptide, which is covalently anchored to the cell wall via a C-terminal LPXTG motif (10-12).
Streptococcus species, including both commensal strains and pathogens, are prodigious producers of fibrillar adhesins (13-16). Streptococcus gordonii, a pioneer oral bacterium and opportunistic pathogen, employs the fibrillar adhesin CshA (cell surface hydrophobicity protein A) to enable binding to host cell surfaces and other microorganisms (17). This ∼259-kDa polypeptide shares <10% sequence identity to any protein of known structure (17-19). CshA possesses a distinctive multidomain architecture, comprising an N-terminal signal peptide (41 aa residues), a nonrepetitive target binding region (778 aa), a repetitive region composed of 17 sequentially arrayed repeat domains (RDs; ∼100 aa each), and an LPXTG anchor (see Fig. 1). CshA forms peritrichous fibrils of ∼60 nm on the surface of S. gordonii (17), and heterologous expression of this protein on the surface of Enterococcus faecalis results in the formation of a dense furry layer comprised of multiple closely associated CshA polypeptides, which confers adhesive properties (17). Similarly, ΔcshA strains of S. gordonii show reduced binding to other oral microorganisms and host molecules, including fibronectin (Fn) (18-20). Recently, the molecular details of host Fn binding by CshA were established, with this polypeptide shown to bind Fn via a distinctive "catch-clamp" mechanism, mediated by discrete domains within the nonrepeat region of the protein (21). This mode of binding involves the action of the intrinsically disordered N-terminal domain of the protein and its neighboring ligand-binding domain, which function in concert to form a robust protein-protein interaction via a readily dissociable precomplex intermediate.
In this study, using a combination of structural and biophysical methods, we show that the >175-kDa multidomain repeat region of CshA (CshA_RD1-17) adopts an elongated polymeric structure in solution, with a distinctive conformation dictated by the interplay of fully and partially ordered domains and intrinsically disordered regions. Equilibrium folding studies of individual CshA repeat domains reveal diversity in the stabilities and unfolding profiles of these proteins, despite their often considerable (>90%) sequence identities. The NMR structure of CshA_RD13 has been determined and identifies a previously unreported all β-fold flanked on either terminus by unstructured linker regions. Complementary analytical ultracentrifugation (AUC) and small-angle X-ray scattering (SAXS) studies of CshA_RD1-17 provide support for the CshA repeat region adopting a transitory structure characterized by alternating regions of order and disorder. Together, our data suggest a molecular architecture within which individual repeat domains contribute additive strength to the intact polypeptide but also minimize the likelihood of domain misfolding that may arise as a consequence of high sequence and structural identity to adjacent RDs. This is enabled via the acquisition of destabilizing mutations that preclude the adoption of a fully folded state. Our work identifies a distinctive polymeric protein architecture and resolves the molecular intricacies of its structure and organization. In turn, this provides greater insight regarding the capacity for bacterial adhesins to promote colonization of sites within the host that are continuously exposed to the flow of blood, saliva, or tissue fluids.

The intact CshA repeat region adopts an extended polymeric structure in solution
Consistent with previous domain assignments, the repeat region of CshA was considered to comprise residues 820-2500 of the 2507-amino acid full-length CshA polypeptide (21) (Fig. 1). The intact CshA repeat region, from here on referred to as CshA_RD1-17, was amplified from S. gordonii DL1 (22) chromosomal DNA and cloned into the pOPINF expression vector (23) (Table S1). The resulting construct was used to facilitate overexpression of an N-terminally hexahistidine-tagged variant of CshA_RD1-17 in Escherichia coli, and the resulting recombinant material was purified to homogeneity using a two-step process. CshA_RD1-17 was found to be a homogeneous, monodisperse species in solution, of >95% purity. Analysis of CshA_RD1-17 using CD spectroscopy, followed by deconvolution of the resulting spectrum into secondary structural elements, revealed the protein to be predominantly β-sheet (∼45%), with a significant disorder content (∼39%; Fig. 2A). Sedimentation velocity analytical ultracentrifugation confirmed that the polypeptide is monomeric and adopts an extended configuration in solution with an f/f0 value of 2.84 (Fig. 2B and Table S2). Complementary SAXS analysis (Fig. 2C and Table S3) provided further evidence that CshA adopts an elongated structure, with a radius of gyration of 120 Å and maximum diameter of 408 Å, as derived from the pair distance distribution (P(r)) function (Fig. 2D). Structural disorder is apparent from the Kratky plot, which diverges from the baseline at high q, and the Porod exponent, which is lower than observed for a well-folded globular protein (Fig. 2E). The structural disorder evident from CD and SAXS analysis suggests a flexible dynamic structure, in keeping with the biological role of CshA. The measured scattering data are well-described by the flexible cylinder model (Fig. 2F and Table S4), in which CshA is characterized by a higher Kuhn length and lower contour length than that expected for a random coil.
The large deviation from random coil behavior is consistent with a significant proportion of folded regions in the solution structure. These data imply that the polypeptide adopts an elongated, flexible ultrastructure in solution that occupies an ensemble of configurations.
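As an illustration of how a radius of gyration follows from a pair distance distribution, the sketch below applies the standard relation Rg² = ∫r²P(r)dr / (2∫P(r)dr). It is a minimal example, not the GNOM or BAYESAPP procedure used in this work; the function names are hypothetical, and the input is an analytical P(r) for a uniform sphere (an assumption chosen only because its Rg is known exactly), not the CshA data.

```python
import math

def trapezoid(x, y):
    """Trapezoidal integration of y over the grid x."""
    return sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i])
               for i in range(len(x) - 1))

def rg_from_pr(r, p):
    """Radius of gyration from P(r): Rg^2 = int r^2 P dr / (2 int P dr)."""
    second_moment = trapezoid(r, [ri * ri * pi for ri, pi in zip(r, p)])
    area = trapezoid(r, p)
    return math.sqrt(second_moment / (2.0 * area))

def sphere_pr(r, d_max):
    """Unnormalized P(r) of a uniform sphere with diameter d_max."""
    x = r / d_max
    return r * r * (1.0 - 1.5 * x + 0.5 * x ** 3)

# Hypothetical test object: a sphere of radius 50 A (not a model of CshA).
radius = 50.0
d_max = 2.0 * radius
n_points = 2000
r_grid = [d_max * i / n_points for i in range(n_points + 1)]
p_grid = [sphere_pr(ri, d_max) for ri in r_grid]

rg_sphere = rg_from_pr(r_grid, p_grid)
# Analytical value for a uniform sphere is sqrt(3/5) * radius.
print(round(rg_sphere, 2), round(math.sqrt(0.6) * radius, 2))
```

The same function could in principle be applied to an experimental P(r) table exported from an indirect Fourier transform program, with the largest r at which P(r) is nonzero taken as the maximum dimension.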

Structure of S. gordonii CshA

Individual CshA repeat domains exhibit varying stabilities and unfolding behaviors
Having established the solution ultrastructure of CshA_RD1-17, we next sought to investigate the molecular origins of the polypeptide's physical properties. Comparative sequence analysis of assigned CshA repeat domains reveals considerable variation in the amino acid sequences of these regions (Fig. 3A and Fig. S1). The repeat region comprises a central core of domains with very high sequence identity (domains 3-14) punctuated by the deviant repeat domain 7. The sequence of this domain diverges significantly from those of the other 16 repeat domains that comprise the intact repeat region. Surprisingly, a significant number of adjacent domains located within the central 3-14 core exhibit high sequence identity. Domains 3 and 4, domains 5 and 6, domains 10 and 11, domains 11 and 12, and domains 12 and 13 share >90% sequence identity (Fig. 3A), an arrangement that contravenes current dogma regarding the organization of tandemly arrayed domains within multidomain proteins (24). The sequence identities of the terminal domains of CshA_RD1-17, namely 1, 15, 16, and 17, are significantly lower than those identified in the central core region. Interestingly, in addition to repeat domain 17, domains 6, 11, 12, and 13 all possess a C-terminal LPXTG cell-wall anchor motif, suggesting that evolutionary pressure to present the adhesive nonrepeat region of CshA at a maximal distance from the cell surface may have driven extension of the repeat region via gene duplication.
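The pairwise comparison described above can be sketched in a few lines. The example below assumes two sequences that have already been aligned to equal length and counts identical residues over columns in which neither sequence has a gap; this denominator is one of several conventions and is not necessarily the one used to generate Fig. 3A. The toy sequences and function names are hypothetical placeholders, not CshA sequences. A simple pattern search for an LPXTG motif (X being any residue) is included for completeness.

```python
import re
from typing import List

def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over ungapped columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == "-" or res_b == "-":
            continue  # skip columns containing a gap
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared

def find_lpxtg(sequence: str) -> List[int]:
    """Zero-based start positions of LPXTG motifs in an ungapped sequence."""
    return [match.start() for match in re.finditer(r"LP.TG", sequence)]

# Hypothetical toy alignment, for illustration only.
toy_a = "TVKDGS-YTLPETGE"
toy_b = "TVKDGSAYTLPKTGD"

identity = percent_identity(toy_a, toy_b)
motif_positions = find_lpxtg(toy_b.replace("-", ""))
print(f"{identity:.1f}% identity; LPXTG at {motif_positions}")
```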
In an effort to explore the structural significance of sequence variation between individual CshA repeat domains, a subset of these proteins were cloned, recombinantly overexpressed in E. coli, and purified to homogeneity using the same general strategy (Table S1). Representative domains were selected covering a breadth of amino acid sequences. These were repeat domains 1, 3, 5, 7, and 13. Each could be readily produced in high quantities and to high purities (>95%, as judged by SDS-PAGE analysis). The stabilities and unfolding behaviors of each of these proteins were assessed in vitro by monitoring their unfolding in the presence of increasing concentrations of the chemical denaturant urea (Fig. 3B and Table 1). Unfolding behavior was monitored by intrinsic tyrosine fluorescence, exploiting the presence of at least one such residue in each of the repeats 1, 3, 5, 7, and 13. Of the isolated domains examined, CshA_RD13 exhibited the highest overall stability (−3.42 kcal mol⁻¹), whereas remarkably, CshA_RD5 showed no fluorescence intensity change when titrated with urea, despite the 91% sequence identity with repeat 13, including the two tyrosine residues at precisely the same positions: 52 (residue Tyr2084) and 92 (residue Tyr2123) (Fig. S1). CD spectroscopy of repeat domain 5 also indicated that this domain was largely unstructured, even in the absence of urea (data not shown). Although CshA_RD3 and CshA_RD7 are less stable than CshA_RD13, they do exhibit a mildly cooperative unfolding transition, whereas CshA_RD1 is barely stable even in the absence of urea but also exhibits a weakly cooperative unfolding transition. Complementary size-exclusion chromatography (SEC) analyses of individual CshA repeat domains provide further support for variability in the degree of foldedness of these proteins (Fig. S2). The largely unfolded CshA_RD5 elutes earlier from a SEC column than its better folded counterparts and significantly earlier than the well-folded CshA_RD13.

Solution structure of CshA_RD13
In an effort to provide a structural framework for the observed biophysical properties of CshA_RD1-17, CshA_RD13, which possesses the highest cross-domain sequence identity to all other CshA repeat domains (Fig. 3A), was selected for structure elucidation. Of the five single repeat domains produced recombinantly, CshA_RD13 has the greatest tolerance to urea unfolding, suggestive of high stability (Fig. 3B). The structure of this protein was determined using solution NMR (Fig. 4 and Figs. S3 and S4). Assignment proved challenging because of repetitive sequence motifs and a high degree of mobility leading to both the absence of some signals and the doubling (or more) of others (Fig. 4A). Nonetheless, a high degree of assignment was achieved for the core region of the protein covering residues 2053-2130 (Table 2). The N-terminal region (residues 2032-2052 plus a 19-residue tag) was found to be largely unstructured with few inter-residue NOEs and no unambiguously assignable long-range NOEs. For this reason, no structural restraints were included for this part of the sequence, and the structure was only calculated and validated for residues 2053-2130. In addition to the high degree of disorder in the N-terminal part of CshA_RD13, several other regions of slow exchange (ms) were detected. Two sets of NMR signals were observed for the initial N-terminal loop comprised of residues 2053-2062, of which only the major set was used for structure calculations. A hydrogen-deuterium exchange experiment showed that the Val2059 NH group is involved in a hydrogen bond that persists for over an hour, suggesting that interconversion between these conformations is either very slow or, more likely, that they are very similar and both involve a hydrogen bond between Val2059 H and Asp2056 O (as determined from initial structure calculations conducted without hydrogen bond restraints).
Multiple conformations were also observed for residues Asp2113 and Asn2115, which lie in the β5-β6 loop. The β5-β6 loop lies adjacent to the N-terminal loop, suggesting that slow exchange between these two regions may be coupled. The β4-β5 loop (Pro2096-Pro2106) and C-terminal tail (Ser2124-Val2130) are both ill-defined in the structural ensemble (Fig. 4B), which is in part due to several broad, missing, or unassigned signals and thus a low density of structural restraints that might reflect the underlying dynamics of these regions. Several residues along the outside edge of the β3 and β5 strands have NOEs that could not be assigned to residues within the globular domain. Most likely these arise from interactions with the N-terminal tail of the protein, although no unambiguous assignment to particular residues was possible.
CshA_RD13 adopts a β-sandwich fold comprising two three-stranded anti-parallel β-sheets arranged at an angle of ∼35° relative to one another (Fig. 4C). The two sheets are connected by an 11-amino acid linker that fuses β4 to β5 and forms an extended loop that wraps around the C-terminal apex of the protein. The interface between the two β-sheets is predominantly hydrophobic and forms the compact core of the protein (Fig. 4D). There is a high degree of amino acid sequence conservation in this region in other CshA repeat domains, suggesting that each domain retains this unique core fold. In addition to the highlighted hydrophobic residues, the two tyrosine residues of CshA_RD13 (Tyr2084 and Tyr2123) reside at the β-sandwich interface (Fig. 4D), adding credence to the validity of our folding studies. Assessment of the solvation state of this pair of residues is likely to provide an accurate measure of protein unfolding. The overall shape of CshA_RD13 can be likened to a cylinder, which is tapered at both termini. The terminal regions of the protein present sizeable patches of charge, suggestive that neighboring repeat domains may be able to engage in complementary charge-charge interactions with one another. Four of the five residues universally conserved in all 17 repeat domains (Gly2082, Gly2090, Gly2102, and Asp2113 in CshA_RD13) contribute to the constrainment of tight turns between individual β-strands (β2-β3, β3-β4, β4-β5, and β5-β6; SI). The fifth residue is located in the disordered N-terminal interdomain linker.

CshA_RD1-17 adopts a transitory dynamic structure comprising alternating regions of order and disorder
To reconcile our structural and biophysical data, we attempted to construct a pseudoatomic model describing the molecular architecture of CshA_RD1-17 in its entirety. The partial foldedness of the polypeptide implied from our SAXS and CD data, in addition to the variations in unfolding behavior of selected repeat domains and the observation of disordered linker regions at either terminus of the CshA_RD13 NMR structure, suggests that CshA may adopt a structure comprised of alternating regions of order and disorder. To test this model, we applied an ensemble optimization method (EOM) to our SAXS data (Fig. 5 and Table 3). Homology models of each repeat domain were generated and used to formulate pseudoatomic models describing the molecular architecture of CshA_RD1-17, in which well-folded domains alternate with disordered regions approximated by a random coil. The data were well-described by a model containing all 17 RD homology structures, although the calculated ensemble Rg (99 Å) was lower than that determined experimentally (120 Å) (Fig. 5 and Table 3). The contour length of this structure is ∼660 Å, more than half of that determined from the data using the flexible polymer model. However, this model underestimates the proportion of disordered structure as measured using CD. To compensate, only repeat domains predicted to be largely ordered (1, 3-4, 7-8, 14-16) were included in the model, yielding an ensemble with an average Rg in agreement with our experimentally measured value (Fig. 5 and Table 3). These findings indicate that the ultrastructure of CshA_RD1-17 does not adhere to a standard "beads-on-a-string" configuration, wherein individual well-folded RDs are arranged in a defined sequence within the polypeptide chain. Instead, a subset of RDs fail to adopt a fully folded state, leading to a highly dynamic transitory structure dominated by the interplay of ordered, disordered, and partially ordered regions.

Discussion
Fibrillar adhesins are an important family of bacterial surface proteins that make significant contributions to environmental and host colonization, biofilm formation, host tissue invasion, and pathogenicity. As virulence factors, they represent attractive targets for the development of therapeutic strategies and interventions. Although many fibrillar adhesins have been identified in commensal and pathogenic bacteria, only a small number of these proteins have been subjected to detailed molecular level characterization. Examples include SasG, M protein, and the AgI/II family polypeptides (10, 12, 26-33). Each of these adhesins exploits a startlingly disparate molecular mechanism to facilitate the formation of fibrillar structures on the bacterial cell surface.
The S. gordonii fibrillar adhesin CshA plays an important role in host colonization. CshA possesses a distinctive modular architecture that comprises 17 β-sandwich domains fused in series by flexible linkers. Although there is diversity in the sequences of individual repeat domains, amino acid sequence analysis suggests that each retains a conserved hydrophobic core that forms the basis of a compact protein fold. The structure of the representative repeat domain CshA_RD13 has been elucidated and provides a valuable test subject for understanding CshA repeat domain structure and function. The high degree of mobility in CshA_RD13 made assignment and structure calculation for this protein challenging; nonetheless, the core globular part of the protein is well-defined (Fig. 4). DALI analysis of CshA_RD13 failed to identify any closely related structural homologues of the protein, and technically the domain exhibits a new fold. However, the flattened β-sandwich is reminiscent of Ig domains found in many other repeat domain-containing proteins such as titin and cadherin (25).
Folding studies of individual CshA RDs reveal remarkably variable stabilities considering their high sequence identities (Fig. 3). Five of the repeat domains (domains 1, 3, 5, 7, and 13) were expressed individually and subjected to equilibrium unfolding to assess their relative stabilities. Repeat domains 3, 7, and 13 all displayed a cooperative unfolding transition with a relatively small free energy of folding, although this is not unusual for small domains (for example, see the study by Gruszka et al. (27)). Equilibrium unfolding of CshA_RD1 revealed a weakly cooperative transition (m_D-N = 0.66 kcal mol⁻¹ M⁻¹) and only very marginal stability (0.64 kcal mol⁻¹), indicating that a significant proportion of the molecules are unfolded even in native conditions. Because CshA_RD1 is markedly divergent from all of the other repeat domains, it is difficult to relate differences in sequence to changes in stability. Interestingly, the sequences of the terminal repeat domains CshA_RD1 and CshA_RD17 differ considerably from those located centrally within the polypeptide. This may reflect the fact that they have coevolved to be adjacent to a nonrepeat domain and the cell wall, respectively. Because CshA_RD1 follows the nonrepetitive region in the overall CshA structure, it may require the presence of that region to interact with and stabilize it. No transition could be observed at all with CshA_RD5, which is surprising because it has 91% identity with CshA_RD13. An examination of the differences between the primary sequences of CshA_RD5 and CshA_RD13 with respect to the NMR structure of the latter suggests some differences that may be responsible for destabilizing CshA_RD5 relative to CshA_RD13. Thr2077 on β1b and both Pro2118 and Thr2122 on β5 in CshA_RD13 are all solvent-exposed to some degree and have been substituted with valine, leucine, and isoleucine, respectively, in CshA_RD5, leading to unfavorable exposure of hydrophobic residues to the aqueous solvent. Pro2079, which forms part of a type II β-turn between strands β1b and β2, is substituted with a serine, which statistically has a greater preference for type I β-turns.
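To make the consequence of the marginal stability reported for CshA_RD1 concrete, the following sketch evaluates a two-state (native/denatured) model with a linear dependence of the unfolding free energy on urea concentration. Several things here are assumptions rather than details taken from this work: that a two-state linear extrapolation model applies, that the quoted 0.64 kcal mol⁻¹ is the magnitude of the unfolding free energy in water, and that the temperature is 25 °C. The function names are hypothetical.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # kcal mol^-1 K^-1

def fraction_unfolded(dg_water, m_value, urea_molar, temp_kelvin=298.15):
    """Two-state model: dG_unfold([urea]) = dG_water - m * [urea].

    dg_water is the unfolding free energy in water (kcal/mol, positive
    for a stable fold); m_value is in kcal/mol/M.
    """
    dg = dg_water - m_value * urea_molar
    return 1.0 / (1.0 + math.exp(dg / (GAS_CONSTANT_KCAL * temp_kelvin)))

def midpoint_urea(dg_water, m_value):
    """Urea concentration at which half of the molecules are unfolded."""
    return dg_water / m_value

# Values quoted in the text for CshA_RD1; temperature is an assumption.
DG_RD1 = 0.64  # kcal mol^-1
M_RD1 = 0.66   # kcal mol^-1 M^-1

unfolded_in_water = fraction_unfolded(DG_RD1, M_RD1, urea_molar=0.0)
cm_rd1 = midpoint_urea(DG_RD1, M_RD1)
print(f"fraction unfolded without urea: {unfolded_in_water:.2f}")
print(f"transition midpoint: {cm_rd1:.2f} M urea")
```

Under these assumptions, roughly a quarter of CshA_RD1 molecules would be unfolded in the absence of denaturant, with a transition midpoint close to 1 M urea, which is in line with the qualitative description of this domain as barely stable.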
Mapping amino acid conservation across all 17 repeat domains onto the structure of CshA_RD13 indicates partial conservation of hydrophobic residues that reside within the hydrophobic cores of each RD. The central section of CshA_RD1-17 comprises 12 of 13 serially arrayed repeat domains that possess a high degree of sequence identity and appear closely structurally related (Fig. 3A). The sequential arrangement of high similarity domains is at odds with the known sequence to folding relationships in tandemly arrayed protein domains, in which sequence disparity between neighboring domains is postulated to minimize protein misfolding (24). It is tempting to speculate that interdomain linker length and disorder play an important role in this process, ensuring that the spatial distance between neighboring domains is sufficient to allow each individual domain to adopt its fully folded conformation prior to translation of its superseding neighbor. However, what is clear from our folding studies and corroborated by hydrodynamic analysis of CshA_RD1-17 is that a subset of CshA RDs do not adopt a well-folded conformation either alone or in the context of the intact CshA polypeptide. This generalized loss in foldedness appears to arise because of the acquisition of destabilizing mutations within the hydrophobic core of some repeat domains. Significantly, these mutations appear to arise in instances where there is significant amino acid sequence identity to neighboring domains (Fig. 3A and Fig. S1). This may represent a strategy to minimize the likelihood of interdomain misfolding events, thus mitigating adhesin aggregation on the bacterial cell surface. Alternatively, the solvent-exposed hydrophobic residues may help to mediate interaction between CshA polypeptides during assembly of the cell-surface adhesive layer.
The functional significance of the dynamic transitory structure of CshA_RD1-17 is yet to be unambiguously established; however, the combination of folded and partially folded regions is expected to confer a high degree of flexibility to the polypeptide. This may enable the optimal projection of CshA's adhesive tip from the S. gordonii cell surface and in doing so maximize the capture radius of the adhesin. In addition, the partially folded structure may provide a mechanism of force damping following fibronectin binding. This could offer a mechanical advantage by mitigating the effects of shear forces following target engagement. This would be of particular significance in the bloodstream, where it is necessary for S. gordonii to maintain an intimate association with the surface of host cells while resisting the force of blood flow. The transitory structure of CshA_RD1-17 would provide a deformable tether with the capacity to dissipate the kinetic energy of binding under flow.
In summary, here we report the identification and characterization of an entirely new architecture for multidomain bacterial surface proteins as typified by the S. gordonii adhesin CshA. This ultrastructure is characterized by the presence of fully and partially folded repeat domains, along with regions of intrinsic disorder, which affords a dynamic yet mechanically robust polymeric structure. Our study extends the diversity of natural protein architectures that are employed to enable microbial adherence to biotic and abiotic substrata and provides new insight into the capacity for bacteria to adhere and persist at sites exposed to shear forces. Moreover, this information establishes a foundation for the development of interventions that target CshA and related polypeptides that can be applied to disease prevention and anti-biofouling strategies.

Gene cloning
DNA sequences encoding CshA_RD1-17, CshA_RD1, CshA_RD3, CshA_RD5, CshA_RD7, and CshA_RD13 were amplified from S. gordonii DL1 (22) chromosomal DNA using appropriate primers (Table S1), incorporating appropriate consensus sequences for subsequent cloning into the expression vector pOPINF (23), precut with HindIII and KpnI. Ligations were performed using the In-Fusion™ (Clontech) cloning system as per the manufacturer's instructions. The resulting constructs encode N-terminally hexahistidine-tagged variants of each of the proteins under investigation. The sequences of all constructs were verified by DNA sequencing before being transformed into E. coli BL21 (DE3) cells for protein expression.

Protein expression
For the expression of unlabeled CshA_RD1-17, CshA_RD1, CshA_RD3, CshA_RD5, CshA_RD7, and CshA_RD13, cultures of E. coli BL21 (DE3) cells harboring the respective expression plasmid were grown with shaking (200 rpm) in 1 liter of LB (Luria-Bertani) broth supplemented with carbenicillin (50 μg ml⁻¹) at 37°C, to A600 = 0.4-0.6. Protein expression was induced by the addition of isopropyl β-galactopyranoside to a final concentration of 1 mM, and the cell cultures were transferred to 20°C with shaking at 200 rpm and grown for a further 16 h. For expression of ¹⁵N[…] ¹⁵NH₄Cl. The cells were grown with shaking at 37°C to A600 = 0.4-0.6 and were then grown with shaking (200 rpm) at 20°C for a further 16 h. For expression of ¹⁵N,¹³C-labeled CshA_RD13, a culture (100 ml) of E. coli BL21 (DE3) cells harboring CshA_RD13::pOPINF was grown overnight with shaking at 37°C. The cells were harvested by centrifugation, washed in resuspension buffer, and used to inoculate 2 liters of M9 minimal medium (50 mM KH₂PO₄, 25 mM Na₂HPO₄, pH 6.8, 10 mM NaCl, 1 mM MgSO₄, 0.3 mM CaCl₂, 1 mg ml⁻¹ biotin, 1 mg ml⁻¹ thiamin), supplemented with carbenicillin (50 μg ml⁻¹), trace elements (5 ml/liter, 100×), 0.5 g/liter ¹⁵NH₄Cl, and 2 g/liter ¹³C-glucose. The cells were grown to A600 = 0.8-0.9. Protein expression was induced by the addition of isopropyl β-galactopyranoside (1 mM), and the cell cultures were transferred to 25°C, with shaking at 200 rpm and grown for a further 16 h.

Protein purification
All recombinant proteins were purified using the same general strategy. The cells were harvested by centrifugation and lysed. Cell debris was removed by centrifugation, and the remaining supernatant liquids were applied to a HiTrap Ni²⁺ affinity column (GE Healthcare). The proteins were eluted with an imidazole gradient of 10-500 mM over 15 column volumes. The fractions (2 ml) found to contain the target protein of interest (as identified by SDS-PAGE analysis) were pooled and concentrated. Protein samples were subjected to further purification using SEC by passage through either a Superdex 16/60 S75 column (CshA_RD1, CshA_RD3, CshA_RD5, CshA_RD7, and CshA_RD13) or a Superdex 16/60 S200 column (CshA_RD1-17), both from GE Healthcare. For unlabeled proteins, SEC was performed in 50 mM Tris-HCl, 150 mM NaCl, pH 7.5. For labeled proteins, SEC purification was performed in 20 mM phosphate, 50 mM NaCl, pH 7.5. Protein-containing fractions were pooled, concentrated to 20 mg ml⁻¹, and stored at 4°C.

Analytical ultracentrifugation
Sedimentation velocity analytical ultracentrifugation experiments were performed using a Beckman Optima XL-I. Sedimentation of CshA_RD1-17 was monitored at 40,000 rpm and 20°C using the UV-visible absorption system at a wavelength of 280 nm. The sample concentration was 6.22 µM in buffer (20 mM Tris-HCl, 150 mM NaCl, pH 7.5). The sedimentation profiles were fitted in SEDFIT using the continuous distribution c(s) Lamm equation model. The partial specific volume of CshA_RD1-17 (0.7279 cm³ g⁻¹) was calculated from the primary sequence using SEDFIT. The density and viscosity of the buffer were measured using an Anton-Paar rolling-ball viscometer (Lovis 2000 M/ME) and found to be 1.002921 g cm⁻³ and 1.0218 mPa·s, respectively.
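As an illustration of how the partial specific volume and the measured buffer properties typically enter a sedimentation analysis, the sketch below computes the buoyancy term (1 − v̄ρ) and the standard factor for converting an observed sedimentation coefficient to s20,w. This is not taken from the SEDFIT workflow used here. The water density and viscosity at 20°C are assumed textbook values, and the function names are hypothetical.

```python
# Values reported in the text for CshA_RD1-17 and its buffer.
V_BAR = 0.7279          # partial specific volume, cm^3 g^-1
RHO_BUFFER = 1.002921   # buffer density, g cm^-3
ETA_BUFFER = 1.0218     # buffer viscosity, mPa.s

# Assumed textbook values for pure water at 20 degrees C (not from the source).
RHO_WATER_20 = 0.99823  # g cm^-3
ETA_WATER_20 = 1.002    # mPa.s


def buoyancy_factor(v_bar, rho):
    """Return the dimensionless buoyancy term (1 - v_bar * rho)."""
    return 1.0 - v_bar * rho


def s20w_correction(v_bar, rho_buffer, eta_buffer,
                    rho_water=RHO_WATER_20, eta_water=ETA_WATER_20):
    """Factor that multiplies an observed s-value (measured at 20 degrees C
    in buffer) to give the value expected in water at 20 degrees C."""
    viscosity_term = eta_buffer / eta_water
    buoyancy_term = (buoyancy_factor(v_bar, rho_water)
                     / buoyancy_factor(v_bar, rho_buffer))
    return viscosity_term * buoyancy_term


if __name__ == "__main__":
    print(f"buoyancy factor in buffer: {buoyancy_factor(V_BAR, RHO_BUFFER):.4f}")
    print(f"s20,w correction factor:   "
          f"{s20w_correction(V_BAR, RHO_BUFFER, ETA_BUFFER):.4f}")
```

Because the experiment was run at 20°C, only the buffer density and viscosity differ from the standard state, so the correction is expected to be small.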

Small angle X-ray scattering
SAXS data for CshA_RD1-17 were collected at the Diamond Light Source synchrotron (Beamline B21) with a fixed camera length configuration (4.014 m) at 12.4 keV. Data were collected by size-exclusion chromatography-coupled SAXS (SEC-SAXS) using an Agilent HPLC system. The sample was measured at a concentration of 25.8 µM in buffer (20 mM Tris-HCl, 150 mM NaCl, 5 mM KNO₃, 1% sucrose, pH 7.5). Two-dimensional scattering profiles were reduced using in-house software. The data were scaled, merged, and background-subtracted using the ScÅtter software package (34). GNOM and BAYESAPP were used to generate pair distance distribution plots from the scattering curves. Form factor fitting was carried out with SASVIEW using a flexible cylinder model. The model describes a chain that is defined by the contour length (L) and the Kuhn length (b). The Kuhn length is defined as twice the persistence length, the length over which the chain can be described as rigid. Values above that expected for a random coil can be ascribed to the range of possible torsional angles between residues and to folded structural elements within the polypeptide. The contour length is the linearly extended length of the particle without stretching the backbone. For completely disordered chains behaving as a random coil, b is 18–20 Å. The theoretical contour length for a fully disordered protein is 3.84 Å per residue and is defined by the number of residues and the spacing between Cα positions. EOM was used to analyze the experimental data by the ensemble optimization method. RANCH was used to generate a pool of 10,000 independent conformational models based on the primary sequence and homology models of folded RD domains. GAJOE was used to select, using a genetic algorithm, an ensemble of models whose combined theoretical scattering profiles best approximated the measured data.
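The relationship between the chain parameters described above can be made concrete with a short calculation. The sketch below uses the 3.84 Å per residue spacing and the 18–20 Å random-coil Kuhn length quoted in the text. The residue count of 1,700 is a hypothetical round number (17 repeat domains of about 100 residues, ignoring linkers), and the function names are illustrative only.

```python
CA_SPACING_A = 3.84  # Angstrom per residue for a fully disordered chain (from the text)


def contour_length(n_residues, spacing=CA_SPACING_A):
    """Theoretical contour length L (Angstrom) of a fully disordered chain."""
    return n_residues * spacing


def persistence_length(kuhn_length):
    """Persistence length (Angstrom); the Kuhn length b is twice this value."""
    return kuhn_length / 2.0


def kuhn_segments(contour, kuhn_length):
    """Number of statistically independent segments in the chain, L / b."""
    return contour / kuhn_length


if __name__ == "__main__":
    n_res = 1700  # hypothetical: 17 repeat domains of ~100 residues each
    length = contour_length(n_res)
    for b in (18.0, 20.0):  # random-coil range quoted in the text
        print(f"b = {b:.0f} A: L = {length:.0f} A, "
              f"persistence length = {persistence_length(b):.0f} A, "
              f"segments = {kuhn_segments(length, b):.1f}")
```

A fitted Kuhn length well above the random-coil range would correspond to fewer, longer rigid segments for the same contour length, which is how folded elements within the chain show up in this model.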

Proteolytic His-tag cleavage
Following nickel affinity and size-exclusion purification of recombinant CshA_RD1, CshA_RD3, CshA_RD5, CshA_RD7, and CshA_RD13, their hexahistidine tags were cleaved by 3C protease digestion (Pierce) according to the manufacturer's protocol: 3C protease (1 mg ml⁻¹) was incubated with His-tagged CshA protein (5 mg ml⁻¹) overnight at 4°C with agitation. The cleaved CshA proteins were separated from the uncleaved material by passage through a HiTrap Ni²⁺ affinity column (GE Healthcare) equilibrated with buffer (20 mM potassium phosphate, 100 mM NaCl, pH 7.0). Cleaved protein was eluted with 5 column volumes of the same buffer. Uncleaved protein was then eluted with elution buffer (20 mM potassium phosphate, 100 mM NaCl, 1 M imidazole, pH 7.0). Cleaved protein was concentrated to 5–10 mg ml⁻¹.

Equilibrium unfolding studies
Equilibrium unfolding studies were performed by monitoring the change in intrinsic tyrosine fluorescence as a consequence of increasing urea concentration. All spectra were collected using a Horiba-Jobin YVON Fluorolog. Protein concentrations of 10 µM in buffer (20 mM potassium phosphate, 100 mM NaCl, pH 7.0), plus varying concentrations of urea, were mixed, and samples were left to equilibrate for 1 h at 20°C prior to analysis. All fluorescence experiments were performed at 23°C. For each sample, an emission spectrum was measured over the range 290–320 nm using an excitation wavelength of 278 nm. For analysis, the fluorescence intensity at 306 nm was plotted as a function of urea concentration, and the data were fitted to a two-state equilibrium unfolding model.
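The exact form of the two-state model is not given in the text. A common choice is the linear extrapolation model, in which the unfolding free energy decreases linearly with denaturant concentration. The sketch below assumes that form with flat native and unfolded baselines; the parameter values are hypothetical and serve only to show the shape of the curve.

```python
import math

R_KJ = 8.314e-3   # gas constant, kJ mol^-1 K^-1
TEMP_K = 296.15   # 23 degrees C, the measurement temperature given in the text


def fraction_unfolded(urea, dg_water, m_value, temp_k=TEMP_K):
    """Fraction of unfolded protein at a urea concentration (M), assuming
    dG(urea) = dg_water - m_value * urea (linear extrapolation model)."""
    dg = dg_water - m_value * urea
    k_unfold = math.exp(-dg / (R_KJ * temp_k))
    return k_unfold / (1.0 + k_unfold)


def two_state_signal(urea, dg_water, m_value, f_native, f_unfolded,
                     temp_k=TEMP_K):
    """Expected fluorescence at 306 nm, assuming flat baselines (an
    assumption; real fits often add sloping baselines)."""
    fu = fraction_unfolded(urea, dg_water, m_value, temp_k)
    return f_native * (1.0 - fu) + f_unfolded * fu


def midpoint(dg_water, m_value):
    """Urea concentration at which half of the protein is unfolded."""
    return dg_water / m_value


if __name__ == "__main__":
    # Hypothetical parameters: dG = 20 kJ/mol in water, m = 5 kJ/mol/M.
    dg_w, m = 20.0, 5.0
    print(f"midpoint = {midpoint(dg_w, m):.1f} M urea")
    for urea in (0.0, 2.0, 4.0, 6.0, 8.0):
        signal = two_state_signal(urea, dg_w, m, f_native=100.0, f_unfolded=40.0)
        print(f"{urea:.1f} M urea: signal = {signal:.1f}")
```

In practice the four parameters would be obtained by nonlinear least-squares fitting of this function to the measured intensities, which also allows the stabilities of different repeat domains to be compared.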

NMR spectroscopy
NMR data sets were collected at 20°C using a Varian VNMRS 600-MHz spectrometer with a cryogenic cold probe. All NMR data were processed using NMRPipe (35). 1 […] were also recorded at 20°C on a Varian INOVA 900-MHz spectrometer with a cryogenic cold probe (Henry Wellcome Building for NMR, University of Birmingham). Proton chemical shifts were referenced with respect to the water signal relative to DSS. Spectra were assigned using CcpNmr Analysis 2.4 (36). Structure calculations were conducted using ARIA 2.3 (37). Twenty structures were calculated at each iteration except iteration 8, in which 200 structures were calculated. The 20 lowest energy structures from this iteration were water-refined, and the 15 lowest energy structures were chosen as a representative ensemble. Network anchoring was used during iterations 0, 1, and 2, and all iterations were corrected for spin diffusion (38). Two cooling phases, each with 8000 steps, were used. Torsion angle restraints were calculated using both TALOS+ (39) and DANGLE (40). Restraints were included for residues where both programs gave an unambiguous result in the same area of the Ramachandran plot. The restraints were based on those provided by DANGLE but were extended if the TALOS+ restraints went beyond these. This process resulted in slightly fewer, looser restraints than either program on its own but aimed to reduce the number of over-restrained angles. χ1 angle restraints were introduced for Val 2107 and Val 2109 because the orientation of these side chains was clearly defined by their NOE pattern, although the selection of structures based on global energy scores meant that not all structures adopted these orientations unless the restraints were introduced. The hydrogen-deuterium exchange experiment showed 28 NH groups to be protected after 8 min, including two Gln side-chain amides (see Fig. S4).
In addition, NOEs were observed to a Thr Hγ1 hydrogen, suggesting that this was also involved in a hydrogen bond. Initial structure calculations were conducted without hydrogen bond restraints. Hydrogen bond donors were then identified, and corresponding hydrogen bond restraints were included in later calculations. Structures were validated using the Protein Structure Validation Software suite 1.5 (41) and CING (42).
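The rule used to combine the torsion angle restraints from the two prediction programs can be sketched as a simple interval operation. The code below is an illustrative reading of that rule, not the scripts used in this study. It assumes that "the same area of the Ramachandran plot" can be approximated by overlapping angle ranges, and it ignores wrap-around at ±180°. The function name and example ranges are hypothetical.

```python
def combine_restraint(dangle, talos):
    """Combine one torsion-angle restraint from two predictors.

    Each argument is a (lower, upper) tuple in degrees, or None when the
    program gave no unambiguous prediction for that angle. A restraint is
    returned only when both programs predict overlapping ranges; it starts
    from the DANGLE range and is widened wherever the TALOS+ range extends
    beyond it. Wrap-around at +/-180 degrees is not handled.
    """
    if dangle is None or talos is None:
        return None
    d_lo, d_hi = dangle
    t_lo, t_hi = talos
    if t_lo > d_hi or d_lo > t_hi:
        return None  # the two predictions disagree, so no restraint is used
    return (min(d_lo, t_lo), max(d_hi, t_hi))


if __name__ == "__main__":
    examples = [
        ((-80.0, -50.0), (-90.0, -60.0)),  # TALOS+ extends the lower bound
        ((-80.0, -50.0), (-70.0, -60.0)),  # TALOS+ lies inside DANGLE
        ((-80.0, -50.0), (40.0, 80.0)),    # different regions
        (None, (-70.0, -60.0)),            # DANGLE ambiguous
    ]
    for dangle, talos in examples:
        print(dangle, talos, "->", combine_restraint(dangle, talos))
```

Taking the union of agreeing ranges yields restraints that are never tighter than either input, which matches the stated aim of avoiding over-restrained angles.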