The Crystal Structure of the Streptococcal Collagen-like Protein 2 Globular Domain from Invasive M3-type Group A Streptococcus Shows Significant Similarity to Immunomodulatory HIV Protein gp41

Background: Streptococcal collagen-like proteins play crucial roles in host adhesion, host cell entry, and immunomodulation of host defenses.
Results: This study provides the first three-dimensional structural description of a streptococcal collagen-like protein.
Conclusion: The crystal structure evidences a six-helical bundle fold, which is unusual in bacterial proteins and characteristic of viral fusion proteins.
Significance: The high resolution structure provides a structural basis for the design of inhibitors of streptococcal invasion.

The arsenal of virulence factors deployed by streptococci includes streptococcal collagen-like (Scl) proteins. These proteins, which are characterized by a globular domain and a collagen-like domain, play key roles in host adhesion, host immune defense evasion, and biofilm formation. In this work, we demonstrate that the Scl2.3 protein is expressed on the surface of invasive M3-type strain MGAS315 of Streptococcus pyogenes. We report the crystal structure of the Scl2.3 globular domain, the first of any Scl. This structure shows a novel fold among collagen trimerization domains of either bacterial or human origin. Despite low sequence identity, the Scl2.3 globular domain structurally resembles the gp41 subunit of the envelope glycoprotein from human immunodeficiency virus type 1, an essential subunit for viral fusion to human T cells. We combined crystallographic data with modeling and molecular dynamics techniques to gather information on the entire lollipop-like Scl2.3 structure. Molecular dynamics data evidence a high flexibility of Scl2.3 with remarkable interdomain motions that are likely instrumental to the biological function of the protein in mediating adhesion or immune modulation in host-pathogen interactions. Altogether, our results provide molecular tools for the understanding of Scl-mediated streptococcal pathogenesis and important structural insights for the future design of small molecular inhibitors of streptococcal invasion.

Streptococcus pyogenes or group A Streptococcus (GAS) is a human-adapted pathogen that causes over 700 million cases of infection worldwide annually (1). GAS infections produce a wide range of clinical outcomes from superficial throat and skin infections to life-threatening invasive diseases such as streptococcal toxic shock syndrome and necrotizing fasciitis (2,3). The mortality resulting from invasive infections is high, with 163,000 deaths globally each year (4). M3-type GAS strains are known to be associated with severe infections. In a survey of 108 isolates from the United States, 50% of invasive diseases were caused by M1- and M3-type strains, and M3-type strains contributed to the majority of streptococcal toxic shock syndrome cases (5). Over the past decade, molecular pathogenomics has facilitated our understanding of the molecular basis for the more severe invasive diseases caused by M3-type strains (6-10). GAS produces cell-associated virulence factors that contribute to host colonization and immune evasion and include the streptococcal collagen-like (Scl) proteins Scl1 and Scl2, which are also known as SclA and SclB (11-15).
The Scl1 and Scl2 proteins share a similar structural organization including an N-terminal variable globular (V) domain, a highly charged collagen-like triple-helix (CL) domain consisting of (Gly-Xaa-Yaa)n triplet repeats, and a C-terminal Gram-positive cell wall attachment domain (Fig. 1A). Like collagen, an important structural protein in the extracellular matrix of animals, Scl1 and Scl2 form stable triple-helical structures (11, 16-19). The collagen triple helix is composed of three left-handed polyproline helices twisted into a right-handed supercoiled structure. In mammals, a strong contribution to triple-helix stability is given by a high content of hydroxyproline residues at the Y position of the Gly-X-Y triplets, whereas bacteria lack the prolyl hydroxylase needed for post-translational modification of proline residues (17,20,21). To explore the basis of bacterial collagen triple-helix stability in the absence of hydroxyproline, biophysical studies were carried out on recombinant Scl2 protein and a set of peptides modeling the Scl2 highly charged repetitive (Gly-X-Y)n sequences (17). These studies showed that bacteria have developed alternative strategies to stabilize the triple helix involving electrostatic interactions, interchain hydrogen bonds, and a hydration-mediated hydrogen bonding network (17,22). Similar to that observed for human collagen, the V domain was hypothesized to be needed for proper folding of the triple-helical regions because their high symmetry constitutes an obstacle for optimal folding. However, the observation that the recombinant CL domain of Scl1 is expressed as a stable triple helix (16,23) contrasts with this hypothesis at least in vivo. Scls have a characteristic "lollipop-shaped" domain organization, which seems apt for ligand binding. Indeed, antibody mapping and electron microscopy imaging analyses confirmed that the stalk-forming CL region projects the globular V region away from the bacterial surface (16), a feature that may facilitate interactions of V regions with their potential targets.
Several biologically relevant V region ligands have been identified using experimental approaches. Thus, different Scl variants bind the human extracellular matrix proteins cellular fibronectin and laminin (24) as well as plasma components including the low density lipoprotein, thrombin-activatable fibrinolysis inhibitor, and complement-regulatory proteins factor H and factor H-related protein-1 (18, 25-28). In addition, the CL domain of Scl can bind directly to host cells through the cellular receptors integrins α2β1 and α11β1 (29-31). Hence, the two main Scl structural domains bind human ligands and are essential for GAS adhesion, host cell entry, and immunomodulation of host defenses.
Because of the importance of invasive M3-type strains in human morbidity and mortality, the presumed expression of Scl2.3 (Scl2 from M3 strain) was used previously as an epidemiological marker of S. pyogenes (7), although its actual expression has not been shown. Here, we demonstrate that Scl2.3 protein is expressed on the cell surface of an invasive M3-type group A Streptococcus. Because no structural clues on Scl2 are available, we combined x-ray crystallography with molecular modeling and dynamics to obtain information on the structure of the entire molecule. This structure delivers the first atomic description of an Scl protein and opens the field for the understanding of the structure-function relationship of key proteins that mediate essential adhesive and immunomodulatory functions of group A Streptococcus.

EXPERIMENTAL PROCEDURES
Bacterial Strains and Growth-The GAS M3-type strain MGAS315 (6) used here was isolated from an invasive case of streptococcal toxic shock-like syndrome in Texas (5). GAS was routinely grown in Todd-Hewitt broth supplemented with 0.2% yeast extract (THY medium) or on tryptose agar with 5% sheep blood (BD Biosciences) at 37°C in a 5% CO2, 20% O2 atmosphere.
The Escherichia coli strain DH5α was used in cloning experiments, and E. coli BL21 was used for protein expression. E. coli strains were grown in Luria-Bertani medium (BD Biosciences) supplemented with ampicillin (100 μg/ml).
The rScl2.3-V polypeptide is fused at the N terminus to the OmpA signal peptide, which mediates periplasmic expression of the recombinant protein. The OmpA signal peptide is selectively cleaved off during protein export by an endogenous signal peptidase, thus releasing the rScl polypeptide; the N-terminal sequence of purified rScl2.3-V was confirmed by Edman degradation. The rScl2.3-V polypeptide also has a short affinity tag, the Strep-tag II (WSHPQFEK), at the C terminus that allowed for affinity chromatography purification on Strep-Tactin-Sepharose. Purified rScl2.3-V protein was dialyzed against 25 mM HEPES, pH 8.0 and stored at −20°C. Recombinant protein rScl2.3-V was tested for purity and integrity on a TGX 4-20% gradient gel (Bio-Rad) and stained with RAPIDstain™ (G-Biosciences).
The presence of the cell wall-associated Scl2.3 protein was studied using the method described before (11,15). MGAS315 was grown in THY medium until midlogarithmic phase (A600 ~0.5) before GAS cells were harvested by centrifugation. The cell wall-associated protein fraction was obtained by resuspending the cell pellet in a high sucrose buffer (10 mM Tris, pH 8.0, 20% sucrose) containing 25 units of mutanolysin and 1 mg/ml lysozyme followed by incubation at 37°C for 1 h. rScl2.3-V protein and the cell wall-associated fraction of MGAS315 were analyzed by SDS-PAGE and Western immunoblotting using rabbit polyclonal antibodies raised against rScl2.3-V (Proteintech Group, Inc.). Alkaline phosphatase-conjugated anti-rabbit IgG heavy and light goat polyclonal antibodies (Rockland) were used as the secondary antibody, and detection was performed using 1-Step™ nitro blue tetrazolium/5-bromo-4-chloro-3-indolyl phosphate substrate (Thermo Scientific). PageRuler™ Plus Prestained Protein Ladder (Thermo Scientific) was used as a molecular weight marker.
Structure and Dynamics of Scl2 FEBRUARY 21, 2014 • VOLUME 289 • NUMBER 8

Circular Dichroism (CD) Spectroscopy-To analyze the conformational state of rScl2.3-V, far-UV CD spectra were recorded at 20°C. All CD spectra were recorded with a Jasco J-810 spectropolarimeter equipped with a Peltier temperature control system (Model PTC-423-S). Molar ellipticity per mean residue, [θ] in degrees cm2⋅dmol−1, was calculated from the following equation,

$$[\theta] = \frac{[\theta]_{\mathrm{obs}} \cdot \mathrm{mrw}}{10 \cdot l \cdot C}$$

where [θ]obs is the ellipticity measured in degrees, mrw is the mean residue molecular mass (116.1 Da), C is the protein concentration in g⋅liter−1, and l is the optical path length of the cell in cm. Far-UV measurements (190-260 nm) were carried out at 20°C using a 0.1-cm-optical path length cell and a protein concentration of 0.2 mg⋅ml−1. Thermal denaturation studies were conducted at 222 nm with increasing temperature from 20 to 70°C. Proteins were equilibrated at each temperature point for 2 min, and the temperature was increased with an average rate of 0.5°C/min. Tm was obtained by taking the peak of the first derivative of the melting curve.
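The two calculations described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the observed ellipticity used in the example is an invented value, and the melting curve is a synthetic sigmoid rather than measured data. The conversion follows the equation as written in the text, and the melting temperature is taken as the midpoint of the interval with the steepest finite-difference slope.

```python
import math


def mean_residue_ellipticity(theta_obs, mrw, conc_g_per_l, path_cm):
    """Apply [theta] = theta_obs * mrw / (10 * l * C) as stated in the text."""
    return theta_obs * mrw / (10.0 * path_cm * conc_g_per_l)


def melting_temperature(temps, signal):
    """Return the temperature at the peak of the first derivative d(signal)/dT.

    The derivative is estimated by finite differences between neighbouring
    points and assigned to the midpoint of each temperature interval.
    """
    best_mid, best_slope = None, 0.0
    for i in range(len(temps) - 1):
        slope = abs((signal[i + 1] - signal[i]) / (temps[i + 1] - temps[i]))
        if best_mid is None or slope > best_slope:
            best_mid, best_slope = (temps[i] + temps[i + 1]) / 2.0, slope
    return best_mid


# Hypothetical observed ellipticity; mrw, concentration, and path length
# are the values quoted in the text (116.1 Da, 0.2 g/l, 0.1 cm).
theta = mean_residue_ellipticity(-0.01, 116.1, 0.2, 0.1)

# Synthetic two-state unfolding curve with its midpoint placed at 49.5 degrees C.
temps = [float(t) for t in range(20, 71)]
signal = [-1.0 / (1.0 + math.exp((t - 49.5) / 2.0)) for t in temps]
tm = melting_temperature(temps, signal)
```

With real data, smoothing the curve before differentiation is usually advisable because finite differences amplify noise.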
Multiangle Light Scattering-Purified rScl2.3-V was analyzed by size exclusion chromatography (SEC) coupled to a DAWN multiangle light scattering instrument (Wyatt Technology) and an Optilab™ rEX (Wyatt Technology). 1 mg of sample was loaded onto an S75 10/30 column equilibrated in 25 mM HEPES, 100 mM NaCl, pH 7.4. A constant flow rate of 0.5 ml/min was applied. The on-line measurement of the intensity of the Rayleigh scattering as a function of the angle as well as the differential refractive index of the eluting peak in SEC was used to determine the weight average molar mass of eluted protein using Astra 5.3.4.14 software (Wyatt Technology).
Crystallization, Data Collection, and Processing-Crystallization trials were performed at 293 K using the hanging drop vapor diffusion method. Preliminary crystallization conditions were set up using a robot station for high throughput crystallization screening (Hamilton STARlet NanoJet 8+1) and commercially available sparse matrix kits (Crystal Screen kits I and II and Index, Hampton Research). Optimization of the crystallization conditions was performed manually by tuning protein and precipitant concentrations. The best crystals were grown in 0.05 M ammonium sulfate, 0.05 M Bis-Tris, pH 6.5, 30% (v/v) pentaerythritol ethoxylate (15/4 EO/OH) (32). For structure solution, europium chloride derivative crystals were prepared by soaking a native crystal in a solution containing 8 mM EuCl3, 0.05 M ammonium sulfate, 0.05 M Bis-Tris, 30% (v/v) pentaerythritol ethoxylate (15/4 EO/OH) for 3 h at pH 6.5. A single wavelength anomalous diffraction experiment was recorded in house at 100 K using a Rigaku Micromax 007 HF generator producing Cu Kα radiation and equipped with a Saturn944 charge-coupled device detector. The data sets were scaled and merged using the HKL2000 program package (33) (see Table 1).
Structure Determination and Refinement-Phasing was achieved using in-house single anomalous dispersion data using a protocol adopted previously (34). Using these data, both SHELXD (35) and SOLVE (36) identified five europium ions. Phases, improved by phase extension and density modification by RESOLVE (36) and wARP (37), allowed us to trace nearly the entire molecule structure. Crystallographic refinement was carried out against 95% of the measured data using the CCP4 program suite (38). The remaining 5% of the observed data, which were randomly selected, was used in Rfree calculations to monitor the progress of refinement. The structure was validated using the program PROCHECK (39) and deposited with the Protein Data Bank (accession code 4nsm).
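The 95%/5% partition of the data used for cross-validation can be illustrated with a short Python sketch. It is not the procedure implemented in the CCP4 suite; the function names, the random seed, and the reflection count in the example are hypothetical. The R-factor is computed with the standard definition, the sum of absolute differences between observed and calculated amplitudes divided by the sum of observed amplitudes.

```python
import random


def split_free_set(n_reflections, free_fraction=0.05, seed=0):
    """Randomly assign a fraction of reflection indices to the free set."""
    rng = random.Random(seed)
    indices = list(range(n_reflections))
    rng.shuffle(indices)
    n_free = round(n_reflections * free_fraction)
    free = set(indices[:n_free])
    work = [i for i in range(n_reflections) if i not in free]
    return work, sorted(free)


def r_factor(f_obs, f_calc):
    """R = sum(| |Fo| - |Fc| |) / sum(|Fo|) over the given reflections."""
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


# Hypothetical data set of 20,000 reflections.
work, free = split_free_set(20000)
```

Evaluating `r_factor` on the working indices gives the conventional R, and evaluating it on the free indices, which never enter refinement, gives Rfree.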
Sequence and Structure Alignments-Alignments of all available Scl2 sequences were performed using the ClustalW program. These sequence alignments were used to obtain phylogenetic relationships. Structure alignments were carried out using the DALI server.
Modeling of the Full Scl2.3-Molecular modeling sessions were carried out to model the collagen-like domain and obtain the entire Scl2.3 structure. The collagen-like domain was modeled using the structure of the collagen-like peptide (Pro-Pro-Gly)10 as a template (40). The full sequence of Scl2.3 was adjusted on the domain structure using ad hoc routines. The full model was energy-minimized using the GROMACS package.
Molecular Dynamics Simulations-Molecular dynamics (MD) simulations were performed using the region 7-123 of Scl2.3 as a starting model including the crystallographic V domain (residues 7-77) and part of the modeled region (residues 78-123). MD simulations were carried out with the GROMACS package by using the all-atom AMBER99sb ILDN force field (41) in combination with the TIP4P-ew explicit water model (42). To avoid any bias on the hydration status of the protein derived from the MD analyses, crystallographic water molecules were removed from the starting model. The simulations were carried out in the NPT ensemble with periodic boundary conditions at a constant temperature of 300 K by using a weak coupling with an external bath (V-rescale method) (43) and a constant pressure of 1 atm (Berendsen pressure coupling) (44). A rectangular box was used to accommodate the protein, water molecules, and ions. The system included 28,827 water molecules and a total of 120,438 atoms.
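As a quick consistency check on the system size, the quoted totals can be broken down with a little arithmetic. The sketch below assumes that each TIP4P-ew water contributes four particles to the total (oxygen, two hydrogens, and the massless virtual site), which is how four-site water models are usually counted in simulation topologies; the text does not state how the total was counted, and the function name is hypothetical.

```python
def non_water_particles(n_total, n_waters, sites_per_water=4):
    """Particles left for protein and ions after subtracting water sites."""
    return n_total - n_waters * sites_per_water


# Values quoted in the text: 28,827 waters and 120,438 atoms in total.
water_sites = 28827 * 4
solute_and_ions = non_water_particles(120438, 28827)
print(water_sites, solute_and_ions)
```

Under this assumption, the remainder is the budget for the three protein chains (residues 7-123 each) plus counterions.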
Bending angles between CL and Scl2.3-V or between regions of the collagen triple helix were defined between the centers of mass of three groups of atoms. For the definition of the global interdomain angle, these atoms are the Cα atoms of residues 57 and 60 of each chain (from Scl2.3-V), residues 75-77 (hinge region), and residues 82-84 (from CL). For the bending angle between three zones of the CL domain, we selected the Cα atoms of residues 77-78 of each chain (bottom region), residues 94 and 95 (hinge region), and residues 112 and 113 (top region).
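The angle definition above translates directly into code. The following Python sketch computes the angle at the hinge between the centers of three atom groups. Because all selected atoms are Cα atoms of equal mass, the center of mass reduces to the geometric centroid. The function names and the coordinates in the example are hypothetical and are not taken from the deposited structure or the trajectories.

```python
import math


def centroid(points):
    """Geometric center of a list of (x, y, z) coordinates."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def bending_angle(group_a, hinge, group_b):
    """Angle in degrees at the hinge centroid between the two outer centroids."""
    ca, ch, cb = centroid(group_a), centroid(hinge), centroid(group_b)
    u = [a - h for a, h in zip(ca, ch)]
    v = [b - h for b, h in zip(cb, ch)]
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    # Clamp to [-1, 1] to guard against rounding just outside the acos domain.
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_angle))


# Hypothetical coordinates: a coaxial arrangement and a right-angle arrangement.
straight = bending_angle([(0, 0, -10)], [(0, 0, 0)], [(0, 0, 10)])
bent = bending_angle([(0, 0, -10)], [(0, 0, 0)], [(10, 0, 0)])
```

With this convention a fully coaxial arrangement of the two domains gives 180°, and smaller values indicate a kink at the hinge.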

RESULTS
Expression of Scl2.3 Protein by M3-type GAS-Expression of the Scl2 proteins is regulated at the level of translation and depends on a number of pentanucleotide repeats (CAAAA) found downstream of a GTG start codon (12,14,15). Based on the number of these repeats, the scl2 coding sequence may be in-frame, resulting in expression of the full-length protein, or out-of-frame, leading to early translation termination. We assessed the cell surface expression of Scl2.3 protein by MGAS315, a strain representative of global invasive M3 organisms.
To generate tools for the detection of Scl2.3 protein, we cloned, expressed, and purified the rScl2.3-V, corresponding to the V region of Scl2.3 from MGAS315. SDS-PAGE analysis of purified rScl2.3-V shows a single protein band of the expected size of about 10.1 kDa (Fig. 1B) as further confirmed by sequencing. Rabbits were immunized with rScl2.3-V to generate specific anti-Scl2.3 antibodies, which we used to test the presence of the Scl2.3 in the cell wall-associated protein fraction of MGAS315 by Western immunoblotting (Fig. 1C). In addition to the positive control (rScl2.3-V lane), we detected a prominent immunoreactive band of ~65 kDa in the cell wall fraction (Scl2.3 lane) using postimmune rabbit serum, whereas probing with control preimmune serum was negative for the rScl2.3 and Scl2.3 bands. Based on sequence analysis, the predicted molecular mass of the mature Scl2.3 protein is ~52.5 kDa. However, an aberrant migration of Scl proteins has been well documented (11,13). Altogether, our data show that the Scl2.3 protein is expressed on the cell surface of invasive M3-type strain MGAS315 of S. pyogenes.
Structural Studies in Solution-Structural features of rScl2.3-V in solution were checked using CD and light scattering studies. As shown previously (22), Scl2-V has a typical α-helical CD spectrum (Fig. 2A). Thermal stability curves determined by monitoring the CD signal at 222 nm evidence a cooperative unfolding with a melting transition at Tm = 50°C. Consistent with previous data, denaturation of rScl2.3-V is fully reversible (Fig. 2B). Analytical SEC coupled with multiangle light scattering was carried out to investigate the oligomerization state of rScl2.3-V in solution. The on-line measurement of the intensity of the Rayleigh scattering as a function of the angle as well as the differential refractive index of the eluting peak in SEC was used to determine weight average molar mass. This analysis produced a weight average molar mass value of 26,600 ± 107 Da, which corresponds to a trimeric organization of the molecule (Fig. 2C).
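The assignment of the oligomeric state amounts to dividing the measured molar mass by the monomer mass. The short Python sketch below uses the approximate monomer size of 10.1 kDa quoted for the SDS-PAGE band as the monomer mass, which is an assumption made for illustration; the function name is hypothetical.

```python
def oligomeric_state(measured_mass_da, monomer_mass_da):
    """Return the mass ratio and the nearest whole number of subunits."""
    ratio = measured_mass_da / monomer_mass_da
    return ratio, round(ratio)


# 26,600 Da from light scattering; about 10,100 Da assumed per monomer.
ratio, n_subunits = oligomeric_state(26600.0, 10100.0)
```

The ratio is not an exact integer because the monomer mass used here is only approximate, but the nearest whole number of subunits is three, in line with the trimer assignment.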
Overall Structure of rScl2.3-V-rScl2.3-V was crystallized in the space group H32. The structure was solved by single wavelength anomalous dispersion analysis of a europium-derivatized crystal and refined to a resolution of 1.6 Å (Table 1). Analysis of crystal packing using the software PISA confirms that the biological unit of rScl2.3-V is a trimer. Consistently, a large surface area is buried (32% of the total surface; 5470 Å2) upon trimer formation with a strong gain of free energy of solvation (ΔiG = −42.1 kcal/mol). rScl2.3-V molecules are organized about 3-fold crystallographic axes to form a six-helical bundle structure (Fig. 3A). The inner core of this bundle consists of a parallel, trimeric structure in which helices are wrapped in a gradual left-handed superhelix. Three further helices wrap antiparallel to the internal helices in a left-handed direction around the exterior of the central trimer. The six-helix bundle forms an elongated cylinder measuring about 30 Å in diameter and 60 Å in height. Interestingly, external helices are shifted with respect to internal helices as a 12-residue-long loop, embedding residues from Lys-31 to Asp-42, connects internal and external helices in each monomer. This region, which contains Pro-34 and Pro-36, adopts a well defined polyproline II conformation (Fig. 3B).
The V domain of Scl2 was proposed to be stabilized by coiled coil interactions (23), although prediction servers do not provide a clear answer. We searched the rScl2.3-V crystal structure for the typical structural features of coiled coils, named knobs-into-holes, using the software SOCKET (45). In typical coiled coils, hydrophobic side chains at "a" and "d" positions on one helix act as knobs and dock into holes formed by diamonds of four residues on the partnering helix. This analysis shows that rScl2.3-V does not contain coiled coils.
Interactions between inner helices of the rScl2.3-V six-helix bundle involve different types of contacts along the bundle. Hydrophobic interactions exist at the two poles of the molecule, whereas an intricate pattern of salt bridges is formed in the central part (Fig. 4). In this pattern, Arg-56 bridges Glu-60 of two adjacent protomers and interacts with Asp-61 of an adjacent protomer. Further salt bridges exist between the central Glu-60 and Arg-64 and between Asp-61 and more peripheral Lys-57 (Fig. 4). As a result, as many as 16 salt bridges stabilize the central region of the bundle.
Three outer N-terminal helices (residues 7-38) pack obliquely against the outside of the inner trimer in an antiparallel orientation. As such, they interact through hydrophobic interactions and salt bridges with residues in three grooves on the surface of the central helical trimer, whereas interactions mediated by the polyproline II strand are mostly hydrophobic (Fig. 5A). Analysis of the electrostatic potential surface reveals an uneven distribution of charged patches with a concentration of negative charges in the region opposite to the origin of the collagen triple helix (Fig. 5, C and D). The negatively charged patch generated by Asp-42 and Asp-43 of each chain surrounds a solvent-exposed hydrophobic region generated by Leu-41 and Met-46. Of these residues, the position of Met-46 is occupied by hydrophobic residues in all members of a subgroup of Scl2 sequences identified by phylogenetic analysis (Fig. 6, branch C). In the same subgroup, negatively charged residues often occur in a region embedding Asp-42 and Asp-43. Different features characterize the other two subgroups, but all sequences present both charged and hydrophobic residues in loop regions (either experimentally determined or predicted), indicating that these features may be functionally important.
Sequence Alignments-Several sequences of both Scl2 and Scl1, which are derived from different S. pyogenes strains, have been identified. Multiple sequence alignment shows that a hallmark of all Scl2 sequences is the occurrence of hydrophobic residues at regular positions, most of which are conserved in all analyzed sequences. An analysis of rScl2.3-V structure shows that these residues constitute the inner core of the six-helix bundle fold (Fig. 4). This finding suggests that all Scl2 proteins share the same six-helix bundle fold we observed in the rScl2.3-V structure. The same considerations apply to Scl1 sequences because most conserved hydrophobic residues are also conserved in Scl1 (data not shown). Phylogenetic analysis shows that Scl2 sequences can be subdivided into three main branching groups (A, B, and C; Fig. 6). In each branch, specific characteristics are conserved. For example, a striking difference between branches A and B and branch C, which contains Scl2.3, is the presence of a fully conserved Pro residue in branches A and B in a position corresponding to Scl2.3 Ser-26, which belongs to the α-helix α1 in Scl2.3 structure. Another almost conserved Pro residue characterizes branch B in place of Scl2.3 Ser-48, which is embedded in α-helix α2. These considerations suggest that the structures of proteins in each branch differ in the boundaries of α-helices constituting their six-helix bundle fold. Compared with Scl2.3, secondary structure predictions suggest that branches A and B are characterized by shorter α1 and α2 helices connected by a longer loop in branch A and by a loop-helix-loop motif in branch B (Fig. 6).
Scl2.3-V Structurally Resembles gp41-A search for similar folds in structural databases revealed a strong structural relationship between rScl2.3-V and subunit gp41 of the envelope glycoprotein from human immunodeficiency virus 1 (HIV-1) (Protein Data Bank code 3o40; Z = 12.1; root mean square deviation (r.m.s.d.), 2.8 Å) with a sequence identity between the two proteins of 9% after alignment of 165 residues. In addition, the 3-carboxy-cis,cis-muconate lactonizing enzyme from

rScl2.3-V structure on that of gp41 evidences a striking similarity in the helical arrangement of the two six-helix bundles (Fig. 7). However, the three inner α-helices of gp41 are packed together in the "knobs-into-holes" arrangement typical of coiled coils, whereas coiled coil interactions were not found in the rScl2.3-V structure. This feature is likely responsible for a more compact arrangement of inner helices in gp41 compared with rScl2.3-V (Fig. 7). Also, whereas gp41 fold is a highly regular six-helix bundle, rScl2.3-V presents a polyproline region at the N-terminal side of external helices. Although the position of these proline residues is not conserved among Scl2 sequences (Fig. 6), their presence in the region connecting the two main helices forming protomers of the six-helix bundle is a distinctive feature of all Scl2 sequences.
FIGURE 6. Multiple sequence alignment analysis of Scl2-V region variants. A, sequence alignment was performed using ClustalW. Conserved residues are shown in green (hydrophobic), blue (positively charged), red (negatively charged), and gray (polar); Pro residues are shown in cyan. A, B, and C refer to the three respective branches calculated by phylogenetic analysis using ClustalW. Secondary structure prediction according to JPRED is reported in magenta, whereas the secondary structure based on the Scl2.3-V crystal structure is reported in blue.

Modeling of Triple-helical Regions and MD Simulations-Good quality electron density maps allowed us to define the conformation of rScl2.3-V C-terminal ends up to Leu-76. In Scl2, this is the site of attachment of the collagen-like triple helix. Notably, whereas all three Leu-76 residues from rScl2.3-V are in a plane, the triplets of the collagen triple helix are typically staggered by one residue. This poses the question whether the asymmetry of the triple helix is accommodated by the rScl2.3-V structure or whether a kink of the two domains is necessary as observed previously for the engineered foldon-collagen (46). To tackle this question, we modeled the triple-helical part of Scl2.3, thus producing the first structural description of an Scl (Fig. 8A), and carried out MD simulations. To assess the evolution of the structure in the simulation time scale (100 ns), a number of stereochemical parameters (gyration radius, secondary structure, and r.m.s.d.) were monitored along the trajectories. The evaluation of r.m.s.d. (calculated on the Cα atoms) from the starting structure evidences that large motions characterize the simulated system (Fig. 8B). The r.m.s.d. values are smaller when they are separately computed for Scl2.3-V and Scl2.3-CL regions. Of these, r.m.s.d. values for the Scl2.3-V region are on average smaller than those of the Scl2.3-CL region (Fig. 8B). The difference in the r.m.s.d. behavior of the two regions likely lies in their structural characteristics. Indeed, local fluctuations on an elongated structure (i.e. Scl2.3-CL) propagate into larger effects on the r.m.s.d. than in globular structures (i.e. Scl2.3-V). The r.m.s.d. data are consistent with the presence of a principal motion involving a global interdomain bending motion between Scl2.3-CL and Scl2.3-V (Fig. 8C). Analysis of the MD trajectory structures evidences a continuous evolution from linear conformations in which the Scl2.3-V and Scl2.3-CL domains are coaxial to a more "bent" state. In particular, the bending angle between the axis of the Scl2.3-V and Scl2.3-CL domains (see "Experimental Procedures" for definition) ranges from 143 to 180° in an elastic fashion (Fig. 8C).
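The r.m.s.d. monitored along the trajectories is the root of the mean squared displacement of the selected atoms from their starting positions. A minimal Python sketch follows; it assumes that each frame has already been superposed on the reference by a least-squares fit, a step that analysis tools normally perform and that is omitted here. The function name and coordinates are hypothetical.

```python
import math


def rmsd(coords_ref, coords):
    """Root mean square deviation between two equally sized coordinate sets.

    Both inputs are lists of (x, y, z) tuples, for example the C-alpha atoms
    of the starting structure and of one trajectory frame. No superposition
    is performed, so the frame is assumed to be pre-aligned to the reference.
    """
    if len(coords_ref) != len(coords):
        raise ValueError("coordinate sets must contain the same number of atoms")
    squared = sum(
        (a - b) ** 2
        for p, q in zip(coords_ref, coords)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(coords_ref))


# Hypothetical two-atom example: every atom is displaced by 1 along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
value = rmsd(reference, frame)
```

Computing the same quantity separately on the atoms of each domain, after fitting on that domain alone, removes the contribution of the interdomain motion, which is why the per-domain values are smaller than the global one.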
A direct indicator of the stability of the triple-helix motif is the number of conserved main-chain hydrogen (H)-bonds along the trajectory. These intermolecular H-bonds are distinctive of the triple-helix motif and are established between the amide group of the Gly residues and carbonyl groups from complementary peptides that form the triple helix. The analysis confirms that the force field and simulation setup used were able to maintain the initial H-bonding pattern of the structure: on average, 87.5% of the native main-chain H-bonds were maintained. In addition to the observed interdomain rearrangements along the trajectory, MD data also evidence a high flexibility of the Scl2.3-CL domain with a bending angle around the center of the Scl2.3-CL region ranging between 152 and 180° (Fig. 8D). The increased flexibility of amino acid-rich triple helices compared with imino acid-rich triple helices is in line with previous MD analyses of other collagen-like polypeptides (47-49).
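The fraction of native H-bonds retained can be computed by comparing, frame by frame, the set of H-bonds present with the set found in the starting model. The Python sketch below assumes that H-bonds have already been detected by some geometric criterion and are represented as (donor, acceptor) pairs; the function name and the toy data are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
def hbond_conservation(frames, native):
    """Average fraction of native H-bonds present across trajectory frames.

    `native` is an iterable of (donor, acceptor) identifiers from the starting
    structure; `frames` is a list of such iterables, one per frame.
    """
    native = set(native)
    fractions = [len(native & set(frame)) / len(native) for frame in frames]
    return sum(fractions) / len(fractions)


# Toy example: 8 native H-bonds, all present in one frame and 6 in another.
native_bonds = [("Gly%d_N" % i, "X%d_O" % i) for i in range(8)]
frames = [native_bonds, native_bonds[:6]]
average = hbond_conservation(frames, native_bonds)
```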

DISCUSSION
The arsenal of virulence factors deployed by streptococci includes Scl proteins, which arm the cell wall of the bacterium and establish multiple functions such as host adhesion (29-31), evasion of host immune defenses (27,28), and biofilm formation (50,51). There are nearly 300 collagen-like proteins annotated to streptococci (52) including several pathogenic organisms like S. pyogenes (11-15), Streptococcus pneumoniae (53), and Streptococcus equi (54,55). In addition to a signature collagen-like domain, Scl proteins contain a globular domain (the V domain) and both Gram-positive signal peptide (YSIRK) and cell wall anchor (LPXTG) domains, predicting that they are all cell surface proteins. Despite their established importance in bacterial pathogenesis, no three-dimensional structural information is available so far for any of the Scl proteins. In this work, we formally demonstrate that the Scl2 protein is expressed by invasive M3-type strain MGAS315 and is found on the bacterial cell surface (Fig. 1). By combining x-ray crystallography with computational techniques, we provide a structural description of the entire Scl2.3 molecule.
Scl2 is known to be regulated at the level of translation by the varying number of CAAAA pentanucleotide repeats directly downstream of the start codon, which may result in a frameshift of the scl2 reading frame and early translation termination (12,14,15). Analysis of scl2 within 50 GAS strains representing 21 different M types showed that the scl2 allele is present in virtually all strains tested, although the number of the repeats as well as resulting Scl2 expression varied among strains. For example, none of the M1-type strains, but about half of the M28- and M12-type strains, were predicted to express the full-length Scl2 variants. Interestingly, all of the M3-type strains initially tested (15) and 84% of 255 M3 global isolates (7) were found to contain in-frame scl2.3 alleles. This suggests that there may be a selective advantage in M3 strains to express Scl2, and it may have an important role in the pathogenesis of M3-type GAS.
Previous binding studies have delineated roles for Scl1 and Scl2 in both host colonization and immune evasion. Thus, some Scl1 variants may aid host colonization by binding to cellular fibronectin and laminin, which are major components of human extracellular matrix, and integrins α2β1 and α11β1, which are present on the host cell surface (24,31). Scl1 has also been shown to bind the plasma lipoproteins and complement regulators of the immune system (18,27,28). Furthermore, both Scl1 and Scl2 proteins have been shown to bind thrombin-activatable fibrinolysis inhibitor, interfering with the normal fibrinolytic breakdown of blood clots (26), which may resemble a role of staphylococcal coagulases that produce clots as a protective barrier against the immune response (57). These observations suggest that Scl2 is involved in evasion or modulation of the immune response rather than in host colonization. Although the role of Scl2 during infection is currently unclear, the structural data gained from this study provide very interesting clues into its possible function. The crystal structure of the rScl2.3-V unveils a compact trimeric six-helix bundle fold. Consistently, light scattering experiments evidence that rScl2.3-V exhibits a trimeric arrangement also in solution (Fig. 2). To date, trimeric six-helix bundle folds have not been observed in bacteria but typically characterize several glycoproteins involved in viral fusion including the gp41 subunit of the envelope glycoprotein of HIV-1, glycoprotein B of herpes simplex virus (58), and the GP2 domain of the envelope glycoprotein GP from the Marburg and Ebola viruses (59-62). However, different from gp41 and from previously reported data on Scl2 (23), the structure of rScl2.3-V does not contain coiled coils, but it is stabilized by both hydrophobic and ion pair interactions (Fig. 4).
Alignment of proteins with known structure shows that, even with low sequence identity (9%), rScl2.3-V structurally resembles the gp41 subunit of HIV-1, the subunit responsible for HIV membrane fusion (63,64). Because gp41 mediates viral entry into CD4+ T lymphocytes, this resemblance might suggest a novel potential role for Scl2 in interacting with T cells and causing hyperactivation of the immune response, which is a hallmark of the streptococcal toxic shock syndrome infections that are often associated with M3-type strains.
We modeled the triple-helical region of Scl2 and performed MD analysis with the aim of investigating the structural and dynamic features of Scl2.3. Scl2.3-V is located at the tip of an extremely elongated triple-helical structure (about 1030 Å; Fig. 8A) and exposes highly hydrophobic residues such as Leu-41 and Met-46 (Fig. 5), a feature that may play a role in Scl2.3-mediated interaction of S. pyogenes with the hydrophobic milieu in the host. MD data show the extremely flexible nature of Scl2, with a dynamic kink in the interdomain organization (Fig. 8). A kinked structure was previously observed for an engineered foldon-collagen (46) and reflects the need of the structure to reconcile the 3-fold symmetry of the V domain (which brings the attachment sites of the three collagen-like chains into one plane) with the one-residue stagger of the collagen-like chains. Our data show that the Scl2.3 structure can adopt both kinked and linear conformations, which are in rapid equilibrium (Fig. 8).
The high structural flexibility we observed in Scl2.3 is likely instrumental to its adhesive or immune-modulatory functions in host-pathogen interactions.
The V domain of Scls has been proposed as a trimerization domain that helps collagen folding. Indeed, unlike in globular proteins, misfolding of the collagen triple helix is a likely event because of its repeating structure, whose stability is relatively insensitive to lateral shifts by one or more Gly-X-Y repeats. Consistently, trimerization domains have been found in many different proteins containing collagen triple helices. However, it has been shown that the V domain of Scls is not needed in vivo because the CL region of Scl1 can be expressed in a folded triple-helical state (16,23). This observation highlights a different folding mechanism of Scl proteins compared with human collagen, for which trimerization domains are crucial to the correct triple-helical arrangement (65). Likely, V domains of Scls have dual functions: they act as triple-helix stabilization domains, because they exhibit a higher folding temperature than the CL regions (23), and they mediate host-pathogen interaction (29-31).
To date, there are five known atomic structures of trimerization domains of collagen: the NC1 domain of collagen IV (66), the homologous NC1 domain of collagens VIII and X (67,68), and the trimerization domains of collagens XV and XVIII (69,70). A trimerization domain was also characterized for BclA, the major component of the exosporium of the Bacillus anthracis spore (71). All of these trimerization domains have a high content of β-structure but share no structural homology. The structure of rScl2.3-V presents novel features as it is mainly composed of α-helices. Multiple sequence alignment suggests that the six-helix bundle fold exhibited by Scl2.3 is conserved in all Scl2 sequences albeit with different α-helix boundaries and lengths of the loop connecting α-helices α1 and α2 (Fig. 6).
Phylogenetic analyses of variation in the Scl2 V region among different M types yielded several observations. Scl2 sequences from different M types formed three separate clades, referred to as A, B, and C (Fig. 6). The invasive M type 3, found in clade C, clustered with M types 1 and 28, which are also associated with invasive infections including streptococcal toxic shock syndrome and necrotizing fasciitis (5,72,73). Our analysis also showed that clade C contains M types associated with rheumatic fever, including types 1, 3, 6, and 18 (56,72), thus suggesting a possible role of Scl2.3 in this disease. Altogether, our work delivers the first atomic description of an Scl protein. This structural information, which can be extended to other members of the Scl family, is valuable for understanding the structural basis of Scl-mediated streptococcal infection.