Surface-exposed loops and an acidic patch in the Scl1 protein of group A Streptococcus enable Scl1 binding to wound-associated fibronectin

Keratinized epidermis constitutes a powerful barrier of the mucosa and skin, effectively preventing bacterial invasion, unless it is wounded and no longer protective. Wound healing involves deposition of distinct extracellular matrix (ECM) proteins enriched in cellular fibronectin (cFn) isoforms containing extra domain A (EDA). The streptococcal collagen-like protein 1 (Scl1) is a surface adhesin of group A Streptococcus (GAS), which contains an N-terminal variable (V) domain and a C-terminally located collagen-like domain. During wound infection, Scl1 selectively binds EDA/cFn isoforms and laminin, as well as low-density lipoprotein (LDL), through its V domain. The trimeric V domain has a six-helical bundle fold composed of three pairs of anti-parallel α-helices interconnected by hypervariable loops, but the roles of these structures in EDA/cFn binding are unclear. Here, using recombinant Scl (rScl) constructs to investigate structure–function determinants of the Scl1–EDA/cFn interaction, we found that full-length rScl1, containing both the globular V and the collagen domains, is necessary for EDA/cFn binding. We established that the surface-exposed loops, interconnecting conserved α-helices, guide recognition and binding of Scl1-V to EDA and binding to laminin and LDL. Moreover, electrostatic surface potential models of the Scl1-V domains pointed to a conserved, negatively charged pocket, surrounded by positively charged and neutral regions, as a determining factor for the binding. In light of these findings, we propose an updated model of EDA/cFn recognition by the Scl1 adhesin from GAS, representing a significant step in understanding the Scl1–ECM interactions within the wound microenvironment that underlie GAS pathogenesis.

Group A Streptococcus (GAS) is a human-adapted pathogen responsible for over 700 million disease-associated infections across the globe each year and an estimated healthcare cost of 224-539 million dollars in the United States alone (1,2). GAS asymptomatically colonizes the general population of adults (5-15%) and children (20-30%), and most of the morbidity is associated with superficial infections of the throat and skin (3). In addition, bacterial spread may result in severe soft tissue and systemic infections, like necrotizing fasciitis and streptococcal toxic shock syndrome (4), or may lead to post-infectious sequelae, such as acute rheumatic fever and rheumatic heart disease, post-streptococcal glomerulonephritis, and pediatric autoimmune neuropsychiatric disorders associated with streptococcal infections (5,6). For the initial colonization of the human host, GAS requires a portal of entry, which is constituted by a breach in the epithelial cell lining (referred to hereafter as a wound) that allows the pathogen to spread into deeper tissue and blood. GAS expresses numerous surface adhesins that bind to host extracellular matrix (ECM) (7). ECM binding facilitates initial localized colonization, during which GAS forms tissue microcolonies embedded in the glycocalyx (8). The interplay between GAS adhesins and the host's ECM components, stable tissue microcolony formation, and superficial versus invasive infection are of significant clinical interest. Among prominent GAS surface proteins are the streptococcal collagen-like (Scl) proteins 1 and 2 (9-13). Scl1 and Scl2 are homotrimeric proteins that have similar domain organization, which is composed of an N-terminal globular sequence-variable (V) domain, a collagen-like (CL) domain, and the C-terminal cell wall-anchoring domain (9,10).
The CL domain of both Scl1 and Scl2 consists of varying numbers of Gly-Xaa-Yaa collagen repeats that adopt a right-handed triple-helix structure, similar to mammalian collagen (14-16); the Scl1-CL domain may interact with human collagen-binding integrins, promoting GAS internalization (17-19). The outermost-exposed V domains vary in amino acid sequence between and within Scl1 and Scl2 variants and are typically associated with M-type antigen (9,11). Despite significant sequence variation, the V domains of Scl1 and Scl2 were predicted to contain two conserved α-helices, spaced by hypervariable regions (11,20). Recent reports of a first crystal structure for the Scl2-V domain validated these secondary structure predictions (21,22). The recombinant Scl2-V globular domain forms a trimeric structure, which folds into a six-helix bundle, with three pairs of anti-parallel α-helices interconnected by hypervariable loops (22). Although Scl1 and Scl2 have similar structural organization, they play different roles in GAS pathogenesis (23). Whereas few Scl2 targets have been identified (24), Scl1 variants from diverse GAS M types bind several host proteins. Ligands include tissue components, cellular fibronectin (cFn), and laminin (Lm), as well as blood components, low- and high-density lipoproteins, and complement Factor H (CFH) and Factor H-related protein 1 (CFHR1) (20,25-28).
This work was supported in part by National Institutes of Health Grants AI50666 and AI083683, as well as West Virginia University HSC Bridge Grant Funding (to S. L.). The authors declare that they have no conflicts of interest with the contents of this article. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. This article contains Table S1.
It is not clear how the variable sequences of the Scl1-V domain bind the same ligands, but two mutually exclusive binding patterns were observed. Scl1 proteins from several M types bind to the ECM proteins cFn and Lm, and to low-density lipoprotein (LDL), but not to CFH and CFHR1; the converse is also true.
One of the major ECM targets for GAS within wounded tissue is fibronectin (Fn) (7). Fn is a high-molecular weight glycoprotein that is found throughout the human body in two main forms, either as a soluble circulating plasma fibronectin (pFn) secreted by hepatocytes or as an insoluble cellular fibronectin (cFn) secreted by a variety of cells within tissues (29). All forms of Fn are transcribed from a single gene, FN1, and are composed of three modules of fibronectin repeats: types I, II, and III (29,30). There are about 20 isoforms of human cFn that include combinations of the type III repeats (e.g. extra domain A (EDA/EIIIA) and B (EDB/EIIIB)) and a varied number of V repeats (30-32). These isoforms differ by spatial-temporal distribution following injury (33,34). Specifically, the EDA-containing isoforms are normally expressed during embryogenesis, although they are found in negligible levels in most adult tissues, except in healing wounds, where EDA/cFn is highly up-regulated (33,35). The EDA fold is a β-sandwich structure formed by seven antiparallel β-strands (i.e. A, B, C, C′, D, E, and F) connected by loops (36,37). The C-C′ loop is an important site for cell attachment, via integrin receptors α9β1 and α4β1 (38,39), and attachment to the C-C′ loop by these integrins has been shown to regulate important tissue-repair and wound-healing functions (40-44).
Here, we established that surface-exposed loops within the V domain structure guide Scl1 to bind EDA/cFn, as well as two other Scl1 ligands, laminin and apolipoprotein B100 (apoB100) of LDL. A second conserved constituent within variable Scl1-V domain is a common topography of the electrostatic surface potential surrounding guiding loops, which includes a negatively charged patch formed by collective Asp and Glu residues contributed by all three monomeric chains. The proposed model accommodates binding of common ligands by phylogenetically diverse Scl1 variants. This work implies the evolutionary response of the Scl1-V domain to selective pressure, resulting in the retention of the conserved structure and function of ligand binding. This underscores the importance of Scl1-binding to EDA-containing fibronectins deposited at the pathogen portal of entry.

The V domains of Scl1 and Scl2 contain conserved helix-loop-helix structures
Two streptococcal collagen-like proteins, Scl1 and Scl2, display a similar "lollipop-like" structural organization (Fig. 1A), which is composed of an N-terminal globular V-domain head and a C-terminal CL-domain stalk (14). Multiple-sequence alignments illustrate significant variation in the V regions, both between and within the Scl1 and Scl2 families (Fig. 1B). Despite sequence variation, several secondary structure prediction algorithms deduced that both Scl1-V and Scl2-V domains contain two conserved α-helices interconnected by hypervariable regions. Crystallography data rendered for the recombinant Scl2-V construct, derived from M3-type GAS, revealed that rScl2.3-V protein forms a homotrimer, with three pairs of conserved α-helices forming a six-helix bundle and hypervariable regions forming surface-exposed loops (PDB code 4NSM). To examine the conservation of the helix-loop-helix motif, homology models of two sequence-diverse Scl1- and Scl2-V domains were generated (Fig. 1C), using the structure of the Scl2.3-V domain as template. The homology model structure of the Scl1.1-V domain from M1-type GAS was based on the consensus sequence alignment (24% sequence identity on 65 amino acid residues), and the model of the Scl2.28-V domain from M28 GAS was obtained with the same approach (51.5% identity with Scl2.3-V sequence on 74 residues). Homology models predict with high confidence that Scl1.1-V (E = 3.5 × 10⁻¹⁶) and Scl2.28-V (E = 1.8 × 10⁻⁴⁰) domains each adopt a six-helix bundle trimeric structure similar to that of the Scl2.3-V template. The solved crystal structure and homology models rendered here show that globular domains of Scl1.1 and Scl2.28 variants can trimerize without the presence of the CL domain, an observation that was also made for Scl2 proteins from other M types (45). Furthermore, we hypothesized that trimeric rScl1-V constructs would retain their binding function to ECM ligands.
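The identity figures quoted above (for example, 24% identity on 65 residues) are the kind of value obtained by counting matching columns in a pairwise alignment. The following is a minimal Python sketch of that calculation, assuming identity is computed over ungapped columns only; other conventions for the denominator exist, and the text does not state which one was used. The function name and the toy sequences are hypothetical and are not the Scl sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> tuple[float, int]:
    """Return (percent identity, number of compared columns) for two aligned sequences.

    Assumption: columns where either sequence has a gap are excluded
    from both the match count and the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared, compared


# Hypothetical toy alignment, for illustration only.
toy_a = "MKT-AEELLKQ"
toy_b = "MRTGAEDL-KQ"
identity, columns = percent_identity(toy_a, toy_b)
print(f"{identity:.1f}% identity over {columns} aligned residues")
```

With real data, the two strings would be rows taken from a multiple-sequence alignment of the V domains.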

The conserved structure of Scl1-V domain requires the collagen domain for ligand binding
Full-length rScl1 proteins, derived from different GAS M types, bind to host cFn via interactions with EDA, which are mediated by the Scl1-V domain (25,46). Here, we examined whether the V domain-alone rScl1 constructs bind to cFn and rEDA. Full-length recombinant Scl1 constructs, derived from M1-type (rScl1.1FL) and M3-type (rScl1.3FL) strains, as well as their corresponding V domain-only proteins (rScl1.1V and rScl1.3V) were immobilized in wells and tested for ECM-ligand binding. Unexpectedly, neither rScl1.1V nor rScl1.3V constructs bound rEDA and cFn, whereas the corresponding full-length rScl1FL control proteins bound both ligands (Fig. 2). Thus, whereas the V domains of Scl1 proteins can fold independently of their CL domains, there are currently unknown structural constraints that require the presence of continuous CL domains for EDA/cFn recognition.

The loop region of Scl1-V domain plays a central role in ECM recognition and binding
The binding site for the Scl1-V domain within the loop segment was located between the C- and C′-β-strands of the EDA β-sandwich (46). Here, we hypothesized that EDA binds to surface-exposed loops of the Scl1.1-V domain, which we have since identified in the crystal structure and deduced in the homology models (Fig. 1C). To test this hypothesis, we generated two full-length chimeric rScl.chi proteins: (i) we inserted the 22-amino acid loop sequence of the ECM binding-positive rScl1.1 variant (residues 25-46 of the V domain) in place of the respective loop sequence of the ECM binding-negative protein rScl2.28 (residues 20-45), resulting in the chimeric construct rScl.chi1FL (Fig. 3A); and (ii) the reverse was also constructed, by inserting the 26-amino acid loop region of rScl2.28-V in place of the loop sequence of rScl1.1-V, resulting in rScl.chi2FL. To identify the proper site for inserting these sequences, we considered secondary structure predictions, computed with JPRED; resulting models were then minimized using GROMACS.
To ensure that we did not substantially alter the overall structure of the chimeric proteins, both rScl.chi1FL and rScl.chi2FL were examined by CD spectroscopy and EM of rotary-shadowed samples (Fig. 3B). The wavelength scans of the original rScl1.1FL and rScl2.28FL proteins each exhibit a positive peak of molar ellipticity at 220 nm, [θ]220 (14), which is a characteristic of the collagen triple helix (47). The CD spectra of both chimeric proteins, rScl.chi1FL and rScl.chi2FL, rendered at 20°C, show peak ellipticity values at 220 nm (Fig. 3B, solid lines), thus demonstrating that both chimeric constructs adopt triple helix structures. These peaks were not observed at 50°C, indicating that each triple helix unfolded (Fig. 3B, dashed lines). The domain organization of both chimeric rScls was further examined via rotary shadowing and subsequent EM. This technique has been used previously by our laboratory and others to investigate the structure of multidomain collagens, containing collagenous and noncollagenous domains (14,48). Both chimeric rScls displayed the characteristic two-domain lollipop-like organization, with the CL domain as the stalk and the globular V domain as the head (Fig. 3B). Thus, swapping the loop regions between rScl1.1FL and rScl2.28FL did not alter the overall structural organization and triple helices in the chimeric proteins.
We next examined the binding characteristics of both chimeric proteins to rEDA and cFn ligands, with rScl1.1FL and rScl2.28FL serving as positive and negative controls, respectively. Binding data show that the rScl.chi1FL chimera, containing the donor-loop sequence of rScl1.1FL, binds both rEDA and cFn, whereas the reverse construct rScl.chi2FL did not bind to either ligand, similar to the negative control protein rScl2.28FL (Fig. 3C). These data demonstrate the importance of the loop region in ECM binding for Scl1.1FL and hypothetically for other Scl1 variants.
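The binding calls in these assays rest on two simple calculations described in the figure legend: a threshold derived from the binding-negative control (its absorbance plus 2 S.D.) and Student's two-tailed t test. The Python sketch below illustrates both, assuming the threshold is the control mean plus two sample standard deviations. The absorbance values, function names, and the choice to report only the t statistic are hypothetical illustrations, not the authors' analysis; a p value would additionally require the t distribution, for example via `scipy.stats.ttest_ind`.

```python
from math import sqrt
from statistics import mean, stdev


def binding_threshold(negative_control, n_sd=2.0):
    """Mean of the negative-control readings plus n_sd sample standard deviations."""
    return mean(negative_control) + n_sd * stdev(negative_control)


def student_t(sample_a, sample_b):
    """Pooled-variance (Student's) t statistic and degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    var_a, var_b = stdev(sample_a) ** 2, stdev(sample_b) ** 2
    pooled = ((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)
    t = (mean(sample_a) - mean(sample_b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical A415 readings (three experiments each), for illustration only.
negative = [0.10, 0.12, 0.08]   # stands in for a binding-negative control
chimera = [0.50, 0.55, 0.45]    # stands in for a test construct

threshold = binding_threshold(negative)
t_stat, dof = student_t(chimera, negative)
print(f"threshold = {threshold:.2f}")
print(f"binds above threshold: {mean(chimera) > threshold}")
print(f"t = {t_stat:.2f} with {dof} degrees of freedom")
```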
To assess the importance of the Scl1.1-V loop in rEDA recognition in the context of whole-GAS cells, we complemented in trans the scl1-inactivated mutant (Δscl1) of M1-type strain (9) for the expression of original Scl1.1 (Δscl1::scl1.1) or chimeric Scl.chi2 protein (Δscl1::scl.chi2), using plasmids pSL620 and pSL621 (Table 1). The M1-type WT and Δscl1 mutant strains were used as positive and negative controls, respectively (Fig. 3D, left). Expression of surface Scl1.1 and Scl.chi2 was assessed via Western immunoblotting of the cell wall fractions. Additionally, rScl1.1FL protein was loaded as a positive control (rScl1.1FL is smaller on the SDS gel because it is devoid of the cell wall region of native Scl1.1 protein). Western blot analysis detected an immunoreactive band of ~55 kDa, corresponding to the Scl1.1 surface protein (9), in WT and both complemented strains but not in Δscl1 mutant complemented with vector alone. It should be noted that lower amounts of protein samples, prepared from complemented strains (5 μl versus 25 μl of WT and Δscl1 mutant samples), were loaded on the SDS gel to avoid an overwhelming chemiluminescent signal. This is not surprising, because Scl1.1FL and Scl.chi2FL proteins are expressed in complemented strains from multicopy plasmids. A similar expression pattern was recorded by flow cytometry of GAS cells. Approximately 66% of the WT cells had detectable levels of surface Scl1.1, and only a 2% background level was recorded for the Δscl1 mutant, compared with 77-92% detection recorded for trans-complemented Δscl1::scl1.1 and Δscl1::scl.chi2, respectively (not shown).
We next investigated the rEDA binding to whole GAS cells by flow cytometry (Fig. 3D, right). As expected, the WT and Δscl1::scl1.1-complemented cells bind rEDA, although the complemented strain binds ~4 times more ligand due to overexpression of the surface Scl1.1 protein. In contrast, neither Δscl1 mutant nor complemented GAS cells, overexpressing the Scl.chi2 protein, bind rEDA. Thus, GAS cells that express native Scl1 adhesin bind rEDA to the cell surface but lose this capacity once cells express a mutated variant with loop substitution from Scl2 protein, which does not bind EDA/cFn.

The C terminus of the loop region of the Scl1-V domain plays a central role in ECM recognition and binding
We next designed additional chimeric rScl constructs to identify the specific region of the loop involved in rEDA and cFn binding. Instead of replacing the entire 22 amino acids, the Scl1.1-V loop was substituted into the Scl2.28-V loop in three segments (Fig. 4A). For the first construct, only the N-terminal 11 amino acids of the Scl1.1-V loop (residues 25-35) were substituted in place of the first 11 amino acids of the corresponding Scl2.28-V sequence (residues 20-30), resulting in construct rScl.chiNFL. Similarly, the C-terminal 11 amino acids of Scl1.1-V (residues 36-46) were substituted in place of the [...] All three chimeric rScls, containing partial-loop substitutions, were examined by CD spectroscopy and EM to confirm the essential structural characteristics (Fig. 4B). The CD spectra of all three chimeric proteins showed peak ellipticity values at 220 nm at 20°C, whereas these peaks were not observed at 50°C. Rotary shadowing images show that each chimeric rScl adopts a lollipop-like structural organization, with both CL-domain stalks and globular V domains. Thus, like whole-loop substitutions, partial-loop substitutions did not impact the structures of the chimeric rScls.

Figure 3 legend (fragment): Graph bars indicate the mean A415 nm normalized against BSA controls. Statistical analysis was calculated using Student's two-tailed t test, from three independent experiments, each performed in triplicate wells (n = 3 ± S.D.): **, p ≤ 0.01; ***, p ≤ 0.001. Statistical significance evaluates the differences between ECM binding by chimeric proteins rScl.chi1FL and rScl.chi2FL, and the control proteins, rScl2.28FL and rScl1.1FL. Solid (rEDA) and dashed (cFn) lines indicate threshold A415 nm ± 2 S.D. values (error bars) recorded for experimentally validated ECM binding-negative rScl2.28FL control protein. D, rEDA binding to whole GAS cells. M1-type GAS WT, isogenic scl1 inactivated mutant (Δscl1), and mutant complemented for the expression of Scl1.1 (Δscl1::scl1.1) and Scl.chi2 (Δscl1::scl.chi2) proteins are shown. Left, Western blotting detection of Scl1 expression in cell wall fractions. 25 μl of cell wall samples prepared from GAS WT and Δscl1 and 5 μl of samples from complemented Δscl1::scl1.1 and Δscl1::scl.chi2 cells overexpressing Scl proteins were loaded per well. rScl1.1FL protein was included as a control. Scl1 was detected with anti-Scl1.1-V antibody. Right, rEDA binding to whole GAS cells by flow cytometry. Cells were incubated with rEDA, and binding was detected with primary anti-His tag mAb and secondary Ab conjugated to Alexa Fluor® 568. Binding to WT cells was set as 100%. Statistical analysis was calculated using Student's two-tailed t test from three independent experiments (n = 3 ± S.D.): **, p ≤ 0.01.

Binding data to cFn and rEDA indicated that only one of the three chimeras with partial-loop substitutions, rScl.chiCFL (C-terminal substitution), binds to cFn and rEDA, similar to the whole-loop-substitution construct rScl.chi1FL and the loop donor protein rScl1.1FL (Fig. 4C); the other two chimeric proteins, as well as binding-negative control protein rScl2.28FL, do not bind to cFn and rEDA. We observed substantially lower OD values recorded during rEDA binding to rScl.chi1FL and rScl.chiCFL, compared with rScl1.1FL. This may reflect a less than optimal loop conformation in chimeric proteins associated with the rScl2 backbone. Additional binding assays were carried out with increasing concentrations (0.039-5.0 μM) of rEDA ligand and rScl proteins to assess binding specificity (Fig. 4D). Concentration-dependent binding of rEDA to rScl.chi1FL, rScl.chiCFL, and control protein rScl1.1FL plateaued between 2.5 and 5.0 μM concentration. In contrast, neither rScl.chiNFL, rScl.chiIFL, rScl.chi2FL, nor control rScl2.28FL showed significant binding to rEDA. Therefore, these data provide evidence that the C-terminal 11 amino acids of the rScl1.1-V loop are sufficient for conferring EDA/cFn binding, when inserted into the binding-negative rScl2.28 protein.
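The stated concentration range of 0.039-5.0 μM is numerically consistent with an eight-point twofold serial dilution starting at 5.0 μM, because 5.0 / 2^7 is approximately 0.039. The text does not state the dilution scheme, so the short Python sketch below treats this as an assumption; the function name is hypothetical.

```python
def twofold_series(top_concentration, n_points):
    """Concentrations of a twofold serial dilution, highest first."""
    return [top_concentration / (2 ** step) for step in range(n_points)]


# Assumed scheme: eight twofold steps down from 5.0 uM.
series_um = twofold_series(5.0, 8)
for conc in series_um:
    print(f"{conc:.3f} uM")
```

Under this assumption, the two highest points of the series (2.5 and 5.0 μM) are the concentrations between which binding is reported to plateau.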

Distribution of the charged residues on the surface of V domain hints at a binding mechanism for EDA/cFn
Our initial examination of the multiple-sequence alignment of Scl1- and Scl2-V domains failed to identify any conserved pattern of residues that could be responsible for EDA/cFn binding (Fig. 1A). Our subsequent experiments, employing the chimeric loop sequences, highlighted a pattern of charged amino acids (e.g. Lys45(Scl1.1), Glu46,47(Scl1.1), and Asp48(Scl1.1)) that were present within the uppermost portion of the Scl1-V loops and transitioning to inner α-helices but were absent in the respective regions of the ECM binding-negative Scl2-V loops (Fig. 1B). Therefore, we sought to investigate the surface potentials of the Scl1-, Scl2-, and chimeric rScl.chi-V domains.
Electrostatic surface maps were computed from the homology models of the ECM binding-positive and -negative proteins. Electrostatic maps computed for Scl1.1- and Scl2.28-V domains reveal striking differences in the distribution of charges across their surfaces. A uniform, negatively charged electrostatic potential characterizes the ECM binding-negative Scl2.28-V on the frontal apical region of the molecule, opposite to the triple-helix attachment (Fig. 5, black ovals). These charges are due to acidic residues located on the C-terminal part of the first outer α-helix, the loop region, and the N-terminal part of the second inner α-helix. In this same area, the electrostatic potential map of the binding-positive Scl1.1-V domain presents a central negatively charged pocket or cleft (Fig. 5, black arrowhead), surrounded by a neutral and basic crown surface. Although less pronounced, the same features of electrostatic potential, with a central negatively charged region surrounded by a neutral/basic surface, are present in maps of the binding-positive chimeric rScl.chi1 and rScl.chiC-V domains, but not binding-negative chimeric rScl proteins. Therefore, the specific charge pattern of the V-domain head strongly differentiates between binding-positive and binding-negative V domains. These considerations suggest that interactions with partnering molecules occur through electrostatic recognition of the observed charge pattern.
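Before computing a full electrostatic surface, a quick sequence-level tally of acidic and basic residues can flag segments worth inspecting. The Python sketch below is such a first-pass count and is not a substitute for the structure-based electrostatic maps described above. The toy segment is hypothetical: only the charged residues at positions 45-48 mirror the Lys45, Glu46,47, Asp48 pattern named in the text, and the flanking residues are invented. Histidine is ignored, which is a simplifying assumption.

```python
ACIDIC = {"D", "E"}  # Asp, Glu
BASIC = {"K", "R"}   # Lys, Arg (His ignored in this simplified tally)


def charge_profile(sequence):
    """Count acidic and basic residues and return a crude net charge."""
    seq = sequence.upper()
    acidic = sum(1 for residue in seq if residue in ACIDIC)
    basic = sum(1 for residue in seq if residue in BASIC)
    return {"acidic": acidic, "basic": basic, "net": basic - acidic}


def charged_positions(sequence, start=1):
    """List (position, residue) for every charged residue, numbering from start."""
    charged = ACIDIC | BASIC
    return [
        (start + offset, residue)
        for offset, residue in enumerate(sequence.upper())
        if residue in charged
    ]


# Hypothetical segment numbered so that K, E, E, D fall on positions 45-48.
toy_segment = "GSTKEEDLA"
profile = charge_profile(toy_segment)
positions = charged_positions(toy_segment, start=42)
print(profile)
print(positions)
```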

Scl1-V loop is responsible for the ECM binding in phylogenetically distant Scl1 variants
Phylogenetic analysis of the amino acid sequences of the Scl1-V and Scl2-V domains shows that Scl1 and Scl2 form two distinct clusters (Fig. 6A). Within Scl1, there are two observed ligand-binding patterns. The first pattern is represented by Scl1 variants (Fig. 6A, black boxes) that bind host ECM proteins, cFn/EDA and Lm (25) and apoB100-LDL (20). The second pattern is represented by two Scl1 variants (Fig. 6A, black dots) that bind host complement components, CFH and CFHR1 (26,27), suggesting the potential for a unique adaptation of Scl1 proteins in M6 and M55 GAS. In contrast, the majority of rScl1 constructs investigated show the former binding pattern, and these Scl1 variants are distributed across many branches of the Scl1 cluster, highlighting the importance of ECM and LDL binding among GAS of various M types (20,25,46,49,50).
We next investigated whether the loop region of the phylogenetically distant Scl1.28-V variant, found in M28-type GAS, which shares 46% sequence identity with the Scl1.1-V sequence (Fig. 6A, arrows), would support EDA/cFn binding. To do this, we substituted the loop region of Scl1.28-V (residues 25-46) in place of the loop region of Scl2.28-V (residues 20-45), resulting in the chimeric construct rScl.chi3FL (Fig. 6B). The CD spectra of rScl.chi3FL showed peak ellipticity at 220 nm at 20°C, but not at 50°C, and rotary-shadowed images show characteristic lollipop-like domain organization (Fig. 6C). Binding data to cFn and rEDA show that rScl.chi3FL binds both ligands (Fig. 6D). Thus, as in Scl1.1-V, the loop region of the divergent Scl1.28-V variant confers binding to both rEDA and cFn. Despite significant efforts, we could not generate a homology model of Scl1.28-V domain with a satisfactory global model quality estimation, due to the low sequence identity with the template structure of Scl2.28-V domain, close to 12%.

The same structural constraints determine recognition and binding of the Scl1-V domain to laminin and apolipoprotein B100-LDL
Here, we sought to understand whether the same structural constraints apply during Scl1 binding to Lm and apoB100-LDL (Fig. 7). Both rScl1.1FL and rScl1.3FL bound to Lm and apoB100, whereas their respective V domain-only counterparts, rScl1.1V and rScl1.3V, demonstrated negligible or no binding to both ligands (Fig. 7A). Chimeric constructs rScl.chi1FL, rScl.chiCFL, and rScl.chi3FL also bind to Lm and apoB100-LDL, whereas binding-negative control proteins rScl2.28FL and rScl.chi2FL (reverse loop replacement to that in rScl.chi1FL) did not (Fig. 7, B and C); in addition, neither rScl.chiN nor rScl.chiI bound to Lm and apoB100-LDL (data not shown). Collectively, our data show that (i) full-length Scl1 protein is required for binding to all ligands, and (ii) the C-terminal 11 amino acids of the Scl1-V loop region are sufficient to mediate binding of Scl1 to its sequence- and structure-diverse ligands.

Discussion
Adhesion to Fn is important for the pathogenesis of group A Streptococcus (7) by promoting the development of glycocalyx-embedded tissue microcolonies (in vivo biofilms) that are seen in clinical specimens collected from cases of pharyngitis and skin infections (8,51). We previously reported that the streptococcal collagen-like protein 1, or Scl1, uniquely adheres to cFn isoforms expressed in wounded tissue and that interactions involve the V globular domain of Scl1 and EDA of cFn (25,46). Here, we report on the structural determinants involved in Scl1 binding to cFn, in the context of diversity among fibronectin-binding proteins of GAS and other bacterial pathogens. We also report that the same determinant of the Scl1 adhesin is responsible for binding to Lm and apoB100-LDL complex.
Several mammalian collagens, such as collagen type I, IV, and X, contain an N-terminal and/or C-terminal noncollagenous domain, which serves as a nucleation or trimerization domain, thereby facilitating proper folding of the three adjacent α-chains into a mature collagen triple helix (52). Similarly, bacterial collagen-like proteins contain noncollagenous trimerization domains, often of unknown functions (14,45). The BclA of Bacillus anthracis endospores possesses a C-terminal noncollagenous trimerization domain, BclA-CTD, for CL-domain folding and increased thermal stability (53). Two different recombinant BclA-CTD constructs had the same crystal structure, similar to a noncollagenous globular domain of C1q, independent of whether these constructs included the continuous CL domain (54,55). Likewise, the N-terminal V domains of Scl1 and Scl2 assisted in the trimerization of homologous and heterologous CL domains (45,56), and the recombinant Scl2-V-only construct formed a trimeric six-helix bundle structure (22). However, here we found that the rScl1-V-only constructs, derived from the corresponding full-length rScl1FL proteins, did not bind rEDA, cFn, Lm, and apoB100-LDL. We propose that the triple helix imposes a restraint on the V domain in a way to destabilize it and make it more prone to conformational changes. We expect a slight effect on the tertiary structure but not on the secondary structure. A reason for the restraint may be the one-residue stagger of triple helices, whereas the V domains are perfectly aligned; these structural features probably distort the V domain slightly, to make it more flexible. Whereas Scl1 is unique among GAS Fn-binding proteins in recognizing the EDA type III repeat of cFn, several other bacterial adhesins may in fact recognize the EDA, due to their selective binding to cFn but not pFn, such as FimH of pathogenic Escherichia coli, OppA, and Msp of Treponema denticola and Tp0155 of Treponema pallidum (57). In addition, numerous bacterial surface proteins bind Fn type III repeats other than EDA that are present in both pFn and cFn, including proteins ShdA, ComE1, Pap31, and Ali, that are associated with the outer membrane of Gram-negative bacteria (58-61) and fibrillar surface proteins Embp and PavA-B of Gram-positive bacteria (62,63

located on β-strands B, C, and F, as well as the AB and BC loops (63). This is, however, different from the Scl1-binding site on EDA, mapped to the C-C′ loop (46). Interestingly, PavA/B did not bind the RGD sequence on FnIII10, which is recognized by the integrin α5β1, or the synergistic sites on repeats III8 and III9 (63), whereas the Scl1 binding site on the C-C′ loop maps in the vicinity of the binding site for integrins α9β1 and α4β1 (39,46). PavA and PavB binding to the FnIII repeats was modeled to be largely due to electrostatic interactions; however, basic amino acids were primarily responsible for binding, rather than negatively charged residues, as modeled here for Scl1-V. Thus, Scl1 remains a unique adhesin that targets EDA/cFn isoforms that are expressed at the pathogen portal of entry.
Here, we identified the loop region of the Scl1-V domain as the structural determinant that is central to EDA/cFn recognition and binding. When the 22-amino acid loop sequence of the EDA binding-positive Scl1-V variant, derived from the global M1-type strain, was transplanted into the corresponding region of the binding-negative Scl2 protein, the chimeric construct gained EDA-binding activity. When the reverse transplantation was performed, ligand binding was lost both in recombinant protein and when expressed on the surface of Scl1 mutant GAS. A similar gain of binding function by the chimeric protein was achieved when the loop region from a phylogenetically distant Scl1 variant from M28-type GAS was substituted into the Scl2-V loop. A novel "catch-clamp" Fn-binding mechanism was deciphered for the multifunctional filamentous adhesin of Streptococcus gordonii, CshA (64). The CshA protein, which also lacks the classical Fn-binding repeats, binds both pFn and cFn via its nonrepetitive domains 1 and 2. The NR2 globular domain has a β-sandwich structure with a lectin-like fold and probably binds glycosylated sites on Fn. The Fn-binding site of [...]
Homology modeling of the Scl1-V domain revealed that electrostatic surface potential and distribution of charged residues probably participate in ligand recognition and binding. A negatively charged region, surrounded by neutral and positively charged residues, is formed in the center of the Scl-V trimer of the ligand binding-positive proteins modeled (Scl1.1, rScl.chi1, and rScl.chiC). In contrast, the electrostatic models of the binding-negative Scl-V domains (Scl2.28, rScl.chi2, and rScl.chiN) display a uniform negative surface charge throughout the crown of the Scl-V domain.
The arrangement of the surface potential observed in the Scl1-V domains was associated with a pattern of conserved negatively charged amino acids, primarily Glu46,47(Scl1.1) and Asp48(Scl1.1), located at the C terminus of the loop sequence at the junction with the α-helices forming the inner core of the Scl1.1-V bundle. This charge distribution is reminiscent of the ligand-gating loops, capping negatively charged binding clefts, proposed in CshA and BspA (64,67). This conserved pattern is observed in Scl1 variants that are evolutionarily distant on the phylogenetic tree; for example, Glu48,50 of Scl1.28 from M28 GAS. Therefore, phylogenetically distant Scl1 proteins, which have significant amino acid sequence variation, exhibit conservation in the arrangement of charged amino acids and similar ligand-binding partners.
We previously reported that the Scl1-V domain bound EDA/cFn, Lm, and apoB100-LDL, thus supporting a dichotomous nature of Scl1-ligand binding in tissue and blood (23). Here, we established that the same regions of the globular Scl1 domain are critical in binding with all three ligands. It is not unusual that bacterial adhesins are multifunctional and bind several distinct ligands. Such proteins typically contain clearly defined functional domains; for example, the ECM-binding protein of S. epidermidis Embp, discussed earlier, contains both the Fn-binding FIVAR domains and the G-related albumin-binding GA domains (62). The collagen domain in streptococcal surface proteins is also found in association with additional domains; for instance, the pneumococcal protein PclA (69) harbors both FIVAR and G5 (which binds GlcNAc) domains, in addition to the collagen domain (23). The Scl1-V domain is different in that the short sequence-variable loop region and a conserved arrangement of the surface potential at the apical plane of the Scl1-V structure are involved in binding of assorted ligands that do not share substantial sequence or structural similarities. These data, combined with previous molecular evolutionary analysis (9,11), as well as structural and phylogenetic analyses (20,22), indicate that retaining the conserved structure of the Scl1-V domain is the selective constraint that influenced sequence variation.
Our work demonstrates several critical structure-function constraints that determine ligand-binding capacity by the globular domain of a major group A streptococcal adhesin, Scl1. Current prospects for a global GAS vaccine, based on M protein, are problematic because more than 240 M-protein types have been reported, especially in regions outside North America and Europe, where vaccines are acutely needed. The EDA binding to diverse Scl1 variants could be developed into a nonconventional treatment option for GAS infections. The EDA-derived cyclic peptide bound Scl1 with an affinity in the micromolar range (49). Ongoing research explores the possibility of designing an improved EDA-derived peptide for targeting diverse GAS strains. This work provides an important structural understanding of the Scl1 adhesin, which is essential during early stages of GAS infection, and lays the foundation for the development of nonvaccine inhibitors of Scl1-ligand interactions for the prevention of infection progression.

Production of rScls
Gene cloning and rScl protein production were performed in E. coli TB1 and BL21, respectively, grown in Luria-Bertani medium (Fisher) with ampicillin (100 μg/ml) at 37°C. rScl-encoding clones, which were derived from the original scl alleles, were generated by PCR amplification from GAS genomic DNA; primers contained flanking sequences compatible with the E. coli expression vector pASK-IBA2 digested with BsaI. Clones encoding the chimeric rScl proteins were generated using synthetic dsDNA fragments (gBlocks; Integrated DNA Technologies) flanked by PflMI and BsaI sites. All plasmids were verified by DNA sequencing. Plasmid constructs used in this study are described in Table 1, and gBlocks are shown in Table S1.
All rScl proteins were produced using the Strep-tag II cloning, expression, and purification system (IBA-GmbH, Göttingen, Germany). Proteins were expressed with a C-terminal affinity tag and purified on Strep-Tactin Sepharose, as described (14,15). The full-length (rSclFL) and V-region-only (rSclV) constructs were made, as designated. rScl1.1FL and rScl1.1V are derived from the Scl1 protein in M1-type strain MGAS6708 (14); rScl1.28FL and rScl2.28FL originate from M28-type strain MGAS6274 (14); and rScl1.3FL and rScl1.3V are derived from M3-type strain MGAS315 (49). Both naturally derived and chimeric rScl proteins were expressed in the E. coli BL21 periplasm following induction with anhydrotetracycline at 0.2 μg/ml for 3 h. Cells were centrifuged and suspended in high-sucrose buffer (100 mM Tris-HCl, 1 mM EDTA, pH 8.0, 500 mM sucrose) or CelLytic B buffer (Sigma) for separation of the periplasmic fraction and subsequent affinity purification. Purified proteins were analyzed by SDS-PAGE and stained with RAPIDstain™ (G-Biosciences); proteins were dialyzed against 25 mM HEPES, pH 8.0, and stored at −20°C. Protein concentrations were determined using Qubit fluorometric quantitation (ThermoFisher).

rEDA production
Recombinant EDA (Table 1) was produced using the pQE-30 His tag cloning, expression, and purification system (Qiagen) in the E. coli strain JM-109 (Promega), as described (70). E. coli containing the rEDA-encoding plasmid was grown in Luria-Bertani broth supplemented with ampicillin (100 μg/ml). Protein expression was induced with 1 mM isopropyl β-D-1-thiogalactopyranoside for 3 h. Cells were harvested by centrifugation, and pellets were frozen at −20°C for 2 h or overnight. Cells were next suspended in a lysis buffer (50 mM Tris/HCl, pH 8.0, 50 mM NaCl, 2 mM MgCl2, 2% Triton X-100, 10 mM β-mercaptoethanol, 0.2 mg/ml lysozyme, one EDTA-free protease inhibitor mixture tablet per 10 ml (Pierce), 1 mM phenylmethanesulfonyl fluoride, 10 units/ml DNaseI) and incubated on ice for 20 min. The cell lysate was centrifuged at high speed (16,000 × g for 20 min), and the supernatant was collected. The supernatant was mixed with nickel-nitrilotriacetic acid-agarose resin (Qiagen) by inversion for 1.5 h and then poured into a column (Bio-Rad). The sample was washed with a 50× resin bed volume of the wash buffer (50 mM NaH2PO4, 20 mM imidazole, 500 mM NaCl), and then rEDA protein was eluted in elution buffer (50 mM NaH2PO4, 250 mM imidazole, 300 mM NaCl). Purified protein was dialyzed against 25 mM HEPES buffer, pH 8.0, and protein concentration was measured with Qubit fluorometric quantitation; protein integrity and purity were assessed by 15% SDS-PAGE. Protein was stored at −20°C until use.

Rotary shadowing and EM
rScl proteins were rotary-shadowed and analyzed by EM, as described previously (14). Briefly, recombinant proteins were dialyzed against 0.1 M ammonium bicarbonate. rScl samples were mixed with glycerol to a final glycerol concentration of 70% (v/v), and 100 μl of each sample was nebulized with an airbrush onto freshly cleaved mica. Samples were then dried in a vacuum and rotary-shadowed with carbon/platinum using an electron beam gun tilted at an angle of 6°, relative to the mica surface, in a Balzers BAE 250 evaporator. The replicas were backed with carbon at 90° and then were floated in distilled water and picked up onto bare 600-mesh copper grids. Photomicrographs were taken using an FEI G2 electron microscope operated at 120 kV with a 30-μm objective aperture.

CD spectroscopy
CD spectroscopy of rScl proteins was performed as described previously (14). Briefly, protein samples were dialyzed against 1× Dulbecco's PBS, pH 7.4. CD spectra were taken with a Jasco 810 spectropolarimeter, in a thermostatically controlled cuvette, with a path length of 0.5 cm. Data were acquired at 10 nm/min. Wavelength scans were performed from 240 to 190 nm at 20°C or, for denaturation, at 50°C.

Protein-binding assays
rScl proteins (0.5 μM solutions) were immobilized onto Strep-Tactin-coated microplate wells for 1.5 h at room temperature and blocked with TBS supplemented with 1% BSA overnight at 4°C, followed by incubation with ECM ligands: cFn (Sigma), rEDA, murine Lm-111 (Sigma), and apoB100 and LDL (Sigma). The no-rScl controls were performed in BSA-coated wells for each ligand and each antibody used. Final ODs were normalized by subtracting the BSA controls in each experimental set-up.
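The background subtraction described above can be sketched in a few lines of Python. This is an illustration only: the function name `normalize_od` and all OD readings are hypothetical, and the sketch assumes that the mean of the BSA-coated control wells is subtracted from the mean of the rScl-coated wells probed with the same ligand and antibody.

```python
from statistics import mean


def normalize_od(rscl_wells, bsa_wells):
    """Background-subtracted OD: mean of rScl-coated wells minus mean of BSA-coated wells."""
    if not rscl_wells or not bsa_wells:
        raise ValueError("both well lists must contain at least one reading")
    return mean(rscl_wells) - mean(bsa_wells)


# Hypothetical triplicate readings (illustrative values, not data from this study).
readings = {
    "rEDA": {"rscl": [0.75, 0.80, 0.85], "bsa": [0.05, 0.05, 0.05]},
    "Lm-111": {"rscl": [0.50, 0.50, 0.50], "bsa": [0.25, 0.25, 0.25]},
}

# One normalized value per ligand, each using its own BSA control.
normalized = {
    ligand: normalize_od(wells["rscl"], wells["bsa"])
    for ligand, wells in readings.items()
}

for ligand, od in normalized.items():
    print(f"{ligand}: normalized OD = {od:.2f}")
```

Keeping a separate BSA control per ligand mirrors the set-up described above, in which a no-rScl control was run for each ligand and each antibody.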

Protein modeling
Multiple-sequence alignments of protein sequences were generated using MUSCLE and refined manually based on the results of secondary structure predictions. Predicted α-helices represent the consensus between PSIPRED, PHD, and SSPRO. The phylogenetic tree of Scl1-V and Scl2-V regions was inferred based on multiple-sequence alignments, using the neighbor-joining method (adapted from Ref. 20). Branch lengths were scaled to the distance computed using the Gonnet250 matrix, and bootstrap values (statistical support for given bifurcation) were calculated for all nodes.
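For readers unfamiliar with neighbor joining, the first joining step of the method can be sketched as follows. This is a minimal illustration, not the software used for the tree above: the function name `nj_first_join`, the taxon labels, and the toy distance matrix are all hypothetical, and the distances are not Gonnet250-derived values.

```python
def nj_first_join(names, dist):
    """Return the pair joined first by neighbor joining, its branch lengths, and its Q value."""
    n = len(names)
    if n < 3:
        raise ValueError("neighbor joining needs at least three taxa")
    totals = [sum(row) for row in dist]  # summed distance from each taxon to all others
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            # Q criterion: low values mark pairs close to each other but far from the rest.
            q = (n - 2) * dist[i][j] - totals[i] - totals[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    q, i, j = best
    # Branch lengths from each joined taxon to the new internal node.
    len_i = dist[i][j] / 2 + (totals[i] - totals[j]) / (2 * (n - 2))
    len_j = dist[i][j] - len_i
    return (names[i], names[j]), (len_i, len_j), q


# Toy symmetric distance matrix for five hypothetical V-region variants.
names = ["V1", "V2", "V3", "V4", "V5"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]

pair, branch_lengths, q_value = nj_first_join(names, dist)
print(pair, branch_lengths, q_value)
```

A full tree is obtained by replacing the joined pair with the new node, recomputing distances to it, and repeating until three nodes remain; note that the pair with the smallest raw distance (V4 and V5 here) is not necessarily the pair joined first.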
The homology models of the Scl1.1-V and Scl2.28-V domains were obtained after consensus-based sequence alignment, using HMMer, against the amino acid sequence of the Scl2.3-V domain, for which a crystal structure has been solved (22) (PDB code 4NSM). The homology models were built with MODELLER (71), and the stereochemical quality of the models was improved by energy minimization with GROMACS (72).

Complementation of M1 group A Streptococcus
The E. coli shuttle plasmid pSB027 (73) (Table 1) was used for in trans complementation. The DNA fragment encompassing the scl1.1 coding sequence with upstream promoter was PCR-amplified using genomic DNA from the M1-type strain MGAS5005 (WT) with primers 232Reg and 232Rev_BgIII (Table S1). The amplified DNA was cloned into pSB027 between BglII and HindIII sites, resulting in pSL620 (Table 1). The scl.chi2 sequence was cloned as a synthetic dsDNA fragment (gBlocks (Integrated DNA Technologies); SL621 gBlock, Table S1) between SpeI and PpuMI sites of pSL620, resulting in pSL621 (Table 1). Both pSL620 and pSL621 were electroporated into MGAS5005 Δscl1 mutant GAS (9), and transformants were selected on BHI agar containing 10 μg/ml chloramphenicol; plasmids were verified by DNA sequencing.

Structural adaptation of Scl1-V domain
GAS cultures were grown in Todd Hewitt broth (BD Biosciences) supplemented with 0.2% yeast extract (THY medium) and on brain heart infusion agar (BD Biosciences) at 37°C in an atmosphere with 5% CO2. For antibiotic selection, chloramphenicol (5 μg/ml) was added to the medium.

Western blot analysis
Expression of Scl1.1 and Scl.chi2 proteins was determined by Western immunoblotting of the bacterial cell wall protein fraction, as described previously (9,10). Briefly, bacterial cultures were grown to an A600 of 0.5, and cells were harvested by centrifugation. GAS cells were next digested with 100 μg of lysozyme and 5 units of mutanolysin, in the presence of phenylmethylsulfonyl fluoride, in a 20% sucrose buffer. Sample supernatants (5-25 μl) were separated by SDS-PAGE and transferred to a nitrocellulose membrane. Detection of Scl1.1 and Scl.chi2 proteins was performed using rabbit antibody (1:2000) specific to the Scl1.1-V domain (9). Horseradish peroxidase-conjugated goat anti-rabbit IgG secondary antibody (Bio-Rad/1721019; 1:2000) was added to blots, followed by the addition of Clarity ECL substrate (Bio-Rad) for detection. Images were acquired using a ChemiDoc touch imaging system (Bio-Rad).

Flow cytometry analysis
Scl surface expression and rEDA binding by GAS cells were measured by flow cytometry. Bacteria grown to an A600 of 0.5 were harvested by centrifugation and washed with flow cytometry buffer (PBS containing 10% Todd Hewitt broth with 0.2% yeast extract). For Scl surface detection, the anti-Scl1.1-V antibody (9) was preabsorbed with MGAS5005 Δscl1 cells, and the antibody (1:10 dilution) was then incubated with the tested GAS cells for 30 min on ice. Cells were next washed and incubated with allophycocyanin-conjugated donkey anti-rabbit antibody (Jackson ImmunoResearch; 1:150).
For rEDA binding, the tested GAS cells were incubated with rEDA for 30 min at room temperature, and bound rEDA was detected with anti-His tag mAb (1:10 dilution) (Proteintech) preabsorbed with MGAS5005 WT cells. Samples were then incubated with goat anti-mouse polyclonal antibody conjugated with Alexa Fluor 568 (ThermoFisher; 1:150). Cells were washed and fixed in 0.4% paraformaldehyde and stored at 4°C until analysis. For each sample, 50,000 events were collected using a BD LSRFortessa (BD Biosciences), and data were analyzed with FCS Express Flow version 6 software.

Statistical analyses
Statistics were performed using the two-tailed paired Student's t test or two-way analysis of variance. Significance was denoted at the following levels: *, p ≤ 0.05; **, p ≤ 0.01; ***, p ≤ 0.001. Error bars represent S.D. with analyses based on three independent experimental repeats (n = 3), each performed with technical triplicates.
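As an illustration of the paired test and the significance notation above, the sketch below computes a two-tailed paired t test for three paired values. It is a hedged example rather than the analysis pipeline used here: the function names and input values are hypothetical, the "ns" label for p > 0.05 is an added convention, and each input value is assumed to be the mean of the technical triplicates from one independent experiment. With n = 3 pairs there are 2 degrees of freedom, for which the Student t distribution has a closed-form CDF, so no statistics library is required.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_test_n3(group_a, group_b):
    """Two-tailed paired Student's t test for exactly three pairs (df = 2)."""
    if len(group_a) != 3 or len(group_b) != 3:
        raise ValueError("this closed form only holds for three paired values")
    diffs = [a - b for a, b in zip(group_a, group_b)]
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = mean(diffs) / (sd / sqrt(3))
    # For df = 2 the t CDF is 1/2 + t / (2 * sqrt(2 + t^2)),
    # so the two-tailed p value reduces to the expression below.
    p = 1 - abs(t) / sqrt(2 + t * t)
    return t, p


def significance_label(p):
    """Map a p value to the star notation used in the text ('ns' is an added label)."""
    if p <= 0.001:
        return "***"
    if p <= 0.01:
        return "**"
    if p <= 0.05:
        return "*"
    return "ns"


# Hypothetical per-experiment means for two conditions (not data from this study).
t_stat, p_value = paired_t_test_n3([1.0, 2.0, 3.0], [0.5, 1.4, 2.0])
print(f"t = {t_stat:.3f}, p = {p_value:.4f}, {significance_label(p_value)}")
```

For other sample sizes the closed form does not apply, and a general routine such as a library implementation of the paired t test would be needed.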