Molecular analysis of an enigmatic Streptococcus pneumoniae virulence factor: The raffinose-family oligosaccharide utilization system

Streptococcus pneumoniae is an opportunistic respiratory pathogen that can spread to other body sites, including the ears, brain, and blood. The ability of this bacterium to break down, import, and metabolize a wide range of glycans is key to its virulence. Intriguingly, S. pneumoniae can utilize several plant oligosaccharides for growth in vitro, including raffinose-family oligosaccharides (RFOs, which are α-(1→6)-galactosyl extensions of sucrose). An RFO utilization locus has been identified in the pneumococcal genome; however, none of the proteins encoded by this locus have been biochemically characterized. The enigmatic ability of S. pneumoniae to utilize RFOs has recently received attention because mutations in two of the RFO locus genes have been linked to the tissue tropism of clinical pneumococcal isolates. Here, we use functional studies combined with X-ray crystallography to show that although the pneumococcal RFO locus encodes for all the machinery required for uptake and degradation of RFOs, the individual pathway components are biochemically inefficient. We also demonstrate that the initiating enzyme in this pathway, the α-galactosidase Aga (a family 36 glycoside hydrolase), can cleave α-(1→3)-linked galactose units from a linear blood group antigen. We propose that the pneumococcal RFO pathway is an evolutionary relic that is not utilized in this streptococcal species and, as such, is under no selection pressure to maintain binding affinity and/or catalytic efficiency. We speculate that the apparent contribution of RFO utilization to pneumococcal tissue tropism may, in fact, be due to the essential role the ATPase RafK plays in the transport of other carbohydrates.

ing to GH family 13; an ATP-binding cassette (ABC) transporter, consisting of the solute-binding protein (SBP) RafE (26) and the putative permeases RafF and RafG; RafR and RafS, two putative transcriptional regulators/repressors; and RafX, a putative protein of unknown function (Fig. 1). The ATP-binding protein RafK (also known as MsmK) that powers the ABC transporter is encoded elsewhere in the genome (15,19).
The complete RFO utilization locus is part of the "core" genome of S. pneumoniae (present in >98% of clinical isolates) (7,27). In terms of virulence, both Aga and RafF have been identified as putative virulence factors in a signature-tagged mutagenesis screen (9) (this has not been validated in directed studies). Deletion mutants of rafK have also been shown to be outcompeted by WT in mouse models of pneumococcal colonization and infection (15,19); however, RafK powers other carbohydrate transporters in addition to RafEFG (15). The most compelling evidence for the role of the RFO utilization locus in virulence comes from a recent study in which the ability of clinical pneumococcal isolates to utilize raffinose was correlated with their tissue tropism (28). Minhas et al. (28) compared the genomes of serotype- and sequence type-matched isolates from blood and ear infections and identified SNPs in rafK or rafR among the ear isolates. The ability of these isolates to grow on raffinose was significantly reduced compared with their paired blood isolates, and reversal of the rafR mutation restored full raffinose utilization competency. Expression of genes from each of the transcriptional units within the RFO locus (aga, rafG, and rafK) was also significantly lower in the ear isolates compared with the blood isolates. Finally, in a mouse model of infection, isolates bearing the rafR mutation were less able to persist in the lungs than rafR WT strains; however, the mutant strains exhibited significantly higher bacterial loads in the ear and, to a lesser extent, the brain. Therefore, the authors concluded that the capacity to utilize raffinose determines the ultimate site of infection following nasal challenge (28).
Exactly how expression of the RFO utilization locus affects the ability of S. pneumoniae to cause pneumonia or otitis media is not known. Evidence to indicate that the proteins encoded by this locus are involved in RFO processing is threefold. First, expression of aga is strongly induced by raffinose (25). Second, the SBP RafE has been crystallized in complex with raffinose (PDB code 2I58), and insertional inactivation of rafE abolishes raffinose uptake (19). Third, deletion of rafG abolishes growth of S. pneumoniae on raffinose and stachyose (14). However, none of the proteins encoded by the RFO utilization locus have been biochemically characterized to provide direct evidence of activity on RFOs.
The relationship between this locus, the processing of RFOs, and pneumococcal virulence is thus enigmatic. We therefore set out to biochemically and structurally characterize the key specificity determinants of the RFO utilization locus. Here we show that RafE binds RFOs, while Aga and GtfA sequentially degrade RFOs into free galactose, fructose, and glucose-1-phosphate (G1P). Overall, however, Aga has relatively poor activity on RFOs, and RafE demonstrates remarkably low affinities for these ligands, suggesting that this locus is not optimized for efficient RFO uptake and degradation. Given the apparent biochemical inefficiency of these proteins and the natural lifestyle of S. pneumoniae, we propose that this locus is a relic of streptococcal evolution that is not utilized by the pneumococcus in vivo. We also speculate that the reported link between the RFO locus and virulence may, in fact, be due to the promiscuous role RafK plays in the import of several human-derived carbohydrates.

The RFO binding properties of RafE
SBPs are the specificity determinants of ABC transporters and, in turn, they dictate the array of glycans that are made available to downstream processing enzymes. To examine the ligand-binding properties of RafE, the protein lacking the native N-terminal secretion signal sequence and lipid-anchoring motif was produced recombinantly in Escherichia coli, and the binding properties of the purified protein were studied using isothermal titration calorimetry (ITC; Table 1). The ligands evaluated included RFOs (raffinose, stachyose, and verbascose), fragments of these glycans (i.e. galactose, melibiose, and sucrose), α-(1→3)-galactobiose, and the glucose trisaccharides isomaltotriose and panose (see Table 1 for chemical descriptions of each). The affinity of RafE for the tested mono- and disaccharides was too low to be detected, and the dissociation constants (Kd) for the trisaccharides and larger RFOs varied from ~80 μM to ~500 μM (Table 1). Notably, the measured affinities of RafE are significantly lower than those typically reported for SBPs, which are in the Kd range of ~1 μM (4, 29-32).
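To put these dissociation constants in context, a simple one-site binding isotherm gives the fraction of an SBP that is occupied at a given free ligand concentration. The following is a minimal sketch under stated assumptions: the function name `fraction_bound` and the 10 μM ligand concentration are illustrative choices, not values or methods from this study, and the ligand is assumed to be in excess over the protein.

```python
def fraction_bound(ligand_um: float, kd_um: float) -> float:
    """Fractional occupancy of a single-site binding protein.

    Assumes a 1:1 interaction and that the free ligand concentration is
    approximately equal to the total ligand concentration.
    """
    if ligand_um < 0 or kd_um <= 0:
        raise ValueError("ligand must be >= 0 and Kd must be > 0")
    return ligand_um / (kd_um + ligand_um)


if __name__ == "__main__":
    ligand_um = 10.0  # hypothetical free ligand concentration (uM)
    # Kd of ~1 uM is typical of SBPs; ~80 and ~500 uM bracket the RafE range.
    for kd_um in (1.0, 80.0, 500.0):
        occupancy = fraction_bound(ligand_um, kd_um)
        print(f"Kd = {kd_um:6.1f} uM -> fraction bound = {occupancy:.3f}")
```

Under these assumptions, a protein with a Kd near 1 μM is mostly occupied at ligand concentrations where a protein with a Kd of 80 to 500 μM is largely empty.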
Given the ability of RafE to accommodate glycans of different lengths and monosaccharide composition, we explored the molecular basis of its ligand binding by X-ray crystallography. Crystals that provided a diffraction data set to 2.35 Å resolution were initially obtained in the presence of stachyose in the space group P2₁2₁2₁. This structure was solved by molecular replacement using the deposited coordinates of RafE in complex with raffinose (PDB code 2I58) as a search model; the final refined model comprised two molecules in the asymmetric unit. One of these monomers was used as a search model to solve the structures of RafE in complex with raffinose (to 2.65 Å resolution) and verbascose (to 2.45 Å resolution). In all three cases the electron density for the carbohydrates was clear, allowing unambiguous modeling of the ligand (Fig. S1).

Figure 1. Organization of the core RFO utilization locus from S. pneumoniae. Genetic organization of a carbohydrate processing locus and an accessory ORF present in strains of S. pneumoniae whose protein products are known or putatively associated with raffinose utilization. The main locus encodes for an α-galactosidase (Aga) (25), an ABC transporter that consists of a solute-binding protein (RafE) (26) and two putative permeases (RafF and RafG), a putative sucrose phosphorylase (GtfA), two putative transcriptional regulators/repressors (RafR and RafS), and a putative protein of unknown function (RafX). Elsewhere in the genome is an ATP-binding protein (RafK) that powers the ABC transporter (15,19). Arrows above ORFs indicate the presence of promoter sequences. Locus tags underneath ORFs correspond to the TIGR4 genome.

RafE adopts a fold characteristic of the Cluster D-I (oligosaccharide-specific) SBPs (33), which are composed of two α/β domains joined together by a hinge region (Fig. 2A). Within the two α/β domains is a β-sheet core comprised of four and five β-strands (N and C terminus, respectively) surrounded by α-helices. Like other SBPs, the binding pocket is a deep cavity located in between the two α/β domains (Fig. 2B). The binding pocket is open to solvent, completely exposing the reducing end residue of bound glycans and therefore is consistent with the capacity to bind ligands of various lengths.
An overlay of the three RafE complexes reveals a mode of binding whereby the nonreducing end galactose residues of the ligands are oriented toward the base of the binding pocket. In all three cases, the same set of interactions is made between this subsite, which we refer to as subsite 1, and the nonreducing end galactose (Fig. 2C). These comprise seven direct hydrogen bonds involving O2, O3, and O4 of the galactose residue. Trp376 provides a platform that interacts with the plane formed by O5-C1-C2. Although the axial O4 particular to galactose makes several hydrogen bonds with subsite 1, there appears to be nothing legislating against the presence of an equatorial O4, as in glucose, likely explaining the ability of this protein to also accommodate α-glucooligosaccharides as ligands.
Similar binding interactions are observed at subsite 2 with a mix of hydrophobic and polar interactions. When bound to stachyose or verbascose, subsite 2 of RafE contains a galactose residue; however, when bound to raffinose, this site is occupied by glucose (Fig. 2, D-F). Coordination is similar between both residues, with interactions between Trp274 and the O5-C1-C2 plane of galactose/glucose. Direct hydrogen bonds are made between O4, O3, O2, and Asp308, Lys42, and Glu44, respectively.
In subsite 3, binding interactions are more diverse because of the presence of a fructose when bound to raffinose, a glucose when bound to stachyose, and a galactose when bound to verbascose. The binding interactions between the third residues of the RFOs (as well as the fourth residue of stachyose) and RafE are likely the key determinants underpinning the binding preferences of RafE. The galactose and fructose residues of verbascose and raffinose appear to have little or no van der Waals interactions with Trp190 in subsite 3, whereas the glucose residue of stachyose forms a close-fitting C-H(π) interaction (Fig. 2E). Direct hydrogen bonds are limited for raffinose, with the main chain C=O and Oε of Gln41 being the only contributing atoms (to O3 and O4) (Fig. 2D). The third galactose residue of verbascose is flipped away from Trp190 and instead is coordinated by several hydrogen bonds between C4, C3, C2, and Tyr93, Gln41, and the N-H of Gly74 (Fig. 2F). The fourth and fifth residues of verbascose are not involved in any direct hydrogen bonding with RafE side chains. The glucose residue of stachyose only forms direct hydrogen bonding with Lys43, whereas its fructose residue makes several hydrogen bonds between O1, O3, O4, and Asn72, Gly74, and Gln41 (Fig. 2E).
A search for structural homologs of RafE using the Dali server (34) indicates that RafE shares the highest structural similarity (Z score 39.3; 23% sequence identity) with the raffinose- and panose-binding SBP BlG16BP from Bifidobacterium animalis subsp. lactis (PDB code 4ZZE) (35). Like RafE, BlG16BP exhibited the highest affinity for panose. A comparison of their structures reveals a conserved aromatic platform that complements the curvature of the ligands: Trp190, Trp274, and Trp376 in RafE, which are Trp216, Tyr291, and Phe392 in BlG16BP (Fig. 2G). Despite the similar specificity of the proteins, they have only ~50% conservation of the residues involved in hydrogen bonding, with none of the hydrogen bonding residues conserved in subsite 1, which accommodates the nonreducing end sugar (Fig. 2G).

The α-galactosidase activity of Aga
Given the membership of Aga in GH family 36 (GH36) and its previously postulated RFO activity, we anticipated the enzyme would have α-galactosidase activity (25). To examine this, we tested purified recombinant enzyme that was produced in E. coli against a panel of α- and β-configured synthetic substrates and found activity only on para-nitrophenyl-α-D-galactopyranoside (pNP-α-Gal; Fig. S2A). The pH optimum using pNP-α-Gal was between 6.0 and 6.4 (Fig. S2B), whereas the Km was 0.23 ± 0.02 mM and kcat was 1.8 ± 0.03 s⁻¹ (at pH 6.5; Fig. S2C).
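The kinetic constants reported for pNP-α-Gal can be combined into a catalytic efficiency (kcat/Km) and used to predict initial rates with the Michaelis-Menten equation. The sketch below is illustrative only: the function names are hypothetical, the 0.5 mM substrate and 1 nM enzyme concentrations are example inputs, and the model assumes simple Michaelis-Menten behavior with no substrate or product inhibition.

```python
def catalytic_efficiency(kcat_per_s: float, km_mm: float) -> float:
    """Return kcat/Km in M^-1 s^-1, given kcat in s^-1 and Km in mM."""
    if km_mm <= 0:
        raise ValueError("Km must be positive")
    return kcat_per_s / (km_mm * 1e-3)


def initial_rate_nm_per_s(substrate_mm: float, km_mm: float,
                          kcat_per_s: float, enzyme_nm: float) -> float:
    """Michaelis-Menten initial rate (nM product per second).

    v = kcat * [E] * [S] / (Km + [S]), with [S] and Km in the same units.
    """
    if substrate_mm < 0 or km_mm <= 0:
        raise ValueError("substrate must be >= 0 and Km must be > 0")
    return kcat_per_s * enzyme_nm * substrate_mm / (km_mm + substrate_mm)


if __name__ == "__main__":
    km_mm, kcat = 0.23, 1.8  # values reported in the text for pNP-alpha-Gal
    print(f"kcat/Km = {catalytic_efficiency(kcat, km_mm):.0f} M^-1 s^-1")
    # Example inputs (assumed): 0.5 mM substrate, 1 nM enzyme.
    rate = initial_rate_nm_per_s(0.5, km_mm, kcat, 1.0)
    print(f"v0 at 0.5 mM substrate = {rate:.2f} nM/s")
```

At a substrate concentration equal to Km, the predicted rate is half of kcat multiplied by the enzyme concentration, which is a quick sanity check for any implementation of this equation.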
The range of α-galactoside substrates accepted by Aga was further screened using TLC (Fig. 3). Aga displayed clear activity against the RFOs raffinose, stachyose, and verbascose, producing products with mobilities matching those of galactose and sucrose. Cleavage of melibiose produced products consistent with glucose and galactose. Aga also displayed activity on α-(1→3)-galactobiose but not β-(1→4)-galactobiose. Given the activity of Aga on α-(1→3)-galactobiose, we also tested glycans representing the major terminal α-linked galactose motifs present in mammals (36,37). The type II blood group A and B tetrasaccharides, which terminate in α-(1→3)-linked GalNAc and galactose, respectively, and the Gb3 trisaccharide, which terminates in α-(1→4)-linked galactose, were not substrates for Aga (Fig. 3). However, the enzyme did have activity on the linear type II blood group B antigen, most likely because, in this case, the activity of the enzyme on the terminal α-(1→3)-linked Gal was not blocked by the presence of the fucosyl modification that defines the typical blood group B antigen. To better define the selectivity of the enzyme, we performed a kinetic analysis using raffinose, stachyose, melibiose, and α-(1→3)-galactobiose as substrates (Table 2). Aga displayed similar kcat/Km values for melibiose and α-(1→3)-galactobiose, which were ~2-fold larger than the values obtained for raffinose and stachyose, thus suggesting the enzyme is better adapted to hydrolyze small substrates rather than RFOs.

Table 1. Binding constants for RafE determined by isothermal titration calorimetry. The data shown are the means of three independent titrations.

Substrate recognition by Aga
Given the unexpectedly lower activity on RFOs, we sought the molecular basis for this by X-ray crystallographic analysis of Aga. Crystals of Aga were initially obtained in the I222 space group and provided a diffraction data set to 2.10 Å resolution. The structure, comprising one molecule per asymmetric unit, was solved by molecular replacement using the structure of Geobacillus stearothermophilus GH36 α-galactosidase AgaA (PDB code 4FNU) (38) as the search model. Aga presents a three-domain fold characteristic of family GH36 members (Fig. 4A) (38,39). The structure comprises an N-terminal domain composed of 20 β-strands forming a large twisted β-sandwich fold followed by an α-helix linking it to the catalytic domain. The latter domain is a classical (β/α)8-barrel fold harboring the catalytic machinery. The C-terminal domain corresponds to a small β-sandwich made of four antiparallel β-strands. The structure of Aga is highly similar to Lactobacillus acidophilus Mel36A and G. stearothermophilus AgaA and AgaB, with root-mean-square deviation values below 0.95 Å (over at least 542 residues). Also, Aga possesses all the sequence characteristics of GH36 subgroup I (39). Furthermore, most GH36 α-galactosidases belonging to subgroup I are known to form tetramers. This multimeric state is also observed in Aga through crystallographic symmetry (Fig. S3); this tetramer is predicted to be stable by PISA analysis (40).
To gain further insight into the catalytic machinery, we solved the structure of Aga with its galactose product bound (Fig. 4B and Fig. S4A). The galactose product is accommodated within a -1 subsite in a manner identical to that described for other GH36s (38,39). Specifically, the plane formed by C3 to C6 of the galactose ring packs against the indole group of Trp330. A series of hydrogen bonds are made between the sugar hydroxyl groups and Asp360, Asp361, Trp405, Arg437, Lys470, Cys519, Gly522, and Asp541 (also identified as the acid/base catalytic residue) of the enzyme. Asp472, the second catalytic residue, sits 4 Å beneath the galactose C1, consistent with its role as a nucleophile.
To uncover the mode of substrate recognition by Aga, crystals of a catalytically inactive mutant of Aga, Aga D472N, were soaked with an excess (>50 mM) of melibiose or raffinose (Fig. 4C and Fig. S4, B and C), as well as with α-(1→3)-galactobiose or the linear blood group B type II trisaccharide (Fig. 4D and Fig. S4, D and E). The complexes of Aga D472N with unhydrolyzed substrates revealed that the galactose at the nonreducing end is bound via the same set of interactions as described above.
In the raffinose complex, the glucose and fructose residues are positioned beyond the -1 subsite and make few interactions with Aga (Fig. 4C). Indeed, the glucose makes only one hydrogen bond between its O5 and Tyr193. The C6 and C4 hydroxyls of the fructose interact through direct hydrogen bonds with Trp330 and Gln575, respectively, whereas O1 makes a water-mediated hydrogen bond with Asp370 and Arg437. In the Aga D472N structure complexed with melibiose, the glucose aglycon is slightly shifted compared with the one in the raffinose complex and does not interact with any residue (Fig. 4C). Therefore, the glucose and fructose are located in what appear to be weak subsites, which could be described as pseudo +1 and +2 subsites (referred to as +1* and +2*). Attempts to trap a longer substrate, such as stachyose or verbascose, resulted in Aga structures displaying substrate density reduced solely to the galactose housed in the -1 subsite (data not shown). This agrees with the observation of weak or absent plus (+) subsites. The structure of Aga D472N in complex with α-(1→3)-galactobiose also reveals a +1* subsite where the galactose interacts through hydrogen bonding only with Asp541 and Trp330 (Fig. 4D). Interestingly, the galactose aglycon in the structure of Aga D472N with the linear blood group B type II trisaccharide possesses only a partial density (Fig. S3E), which still allowed modeling of the sugar in an orientation similar to the one observed for the galactobiose complex (Fig. 4D). However, the density for the linear blood group B type II trisaccharide GlcNAc is lacking, indicating disorder of this portion of the sugar (Fig. S3E). Furthermore, based on the latter two complexes, Aga would appear unable to accommodate the type II blood group B tetrasaccharide because the fucose α-(1→2) branched to the galactose at the +1* subsite would clash with Trp330 (data not shown), which explains the lack of activity toward such a substrate.

Figure 2. A, global structure of RafE in complex with stachyose, showing secondary structure progression colored N to C termini. B, RafE surface representation of deep binding pocket accommodating the four sugar residues of stachyose. C, RafE RFO binding pocket overlay, displaying the highly conserved nonreducing end galactose residue(s) (yellow) direct hydrogen-bonding interactions (black dashes) with RafE side chains (magenta). The remaining residues of raffinose (red), stachyose (blue), and verbascose (gray) are shown as transparent lines. D-F, RafE binding pocket interactions with raffinose (D), stachyose (E), and verbascose (F), illustrating the binding pocket interactions between RafE side chains and fructose (green), glucose (blue), and galactose (yellow) residues. G, binding pocket overlay of RafE (magenta) bound to stachyose (green) and the B. animalis subsp. lactis SBP BlG16BP (light gray) bound to panose (orange) (PDB code 4ZZE). Residue numbering is shown in pink for RafE and black for BlG16BP. In C-F, the binding subsites are numbered in red.

These structural data describe a catalytic machinery tuned toward the hydrolysis of disaccharides and potentially also trisaccharides. These findings converge toward the hypothesis that GH36s in a tetrameric organization are prone to accommodate only small unbranched carbohydrates such as di- and trisaccharides because of the narrowing of the catalytic pocket resulting from tetramerization (Fig. S3B) (39).

Complete depolymerization of RFOs
The degalactosylation of RFOs by Aga results in the production of sucrose. The raffinose utilization locus from S. pneumoniae encodes for a putative sucrose phosphorylase, GtfA (Fig. 1), that belongs to GH family 13 and that we hypothesized would act on RFOs after Aga to complete the degradation of RFOs into their constituent monosaccharides. To initially test GtfA for sucrose phosphorylase activity, recombinant GtfA was expressed and purified from E. coli. The enzyme failed to show activity on sucrose when reactions were performed in Tris-HCl buffer (not shown). However, GtfA exhibited partial activity against sucrose when reactions were performed in phosphate buffer, producing products with mobilities matching those of fructose, G1P, and glucose (Fig. 5). GtfA also exhibited activity against raffinose and verbascose, but only in the presence of Aga. Therefore, Aga and GtfA act sequentially to depolymerize RFOs into their constituent monosaccharides.

Discussion
S. pneumoniae has the demonstrated ability to utilize RFOs for growth in vitro (14,15,28) and, as we have shown here biochemically, the pneumococcal genome encodes for the transporter and enzymes necessary to import RFOs and degrade them into their constituent monosaccharides (albeit with apparent inefficiency). Based on the data presented, we can assemble a model of a pathway that can be used by S. pneumoniae to utilize RFOs (Fig. 6). RafE binds RFOs outside of the cell and delivers them to the membrane components of the ABC transporter (RafF and RafG). The RFOs are then imported into the cytoplasm using energy provided by RafK. Once in the cytoplasm, Aga acts to sequentially degalactosylate the RFOs down to sucrose (we confirmed the intracellular localization of Aga in S. pneumoniae cells grown on raffinose; Fig. S5). Finally, the sucrose produced by Aga acts as a substrate for GtfA. The metabolism of free galactose in S. pneumoniae proceeds via either the Leloir or tagatose-6-phosphate pathway (41). The fructose released by GtfA is likely phosphorylated by a fructokinase, such as the previously characterized ScrK (16), prior to entering glycolysis. In addition to fructose, GtfA also generates G1P. G1P is interconverted to glucose-6-phosphate by a phosphoglucomutase (42), which can then enter either glycolysis or the pentose phosphate pathway (43). G1P itself is also an important intermediate in several pneumococcal anabolic pathways, including cell wall and capsule biosynthesis (42).
Although the data presented here indicate that S. pneumoniae possesses the ability to import and degrade RFOs, there are several lines of evidence to suggest that this capacity is unlikely to be biologically relevant to the normal lifestyle of this bacterium. First, RafE exhibits remarkably poor affinities for RFOs (Table 1). The Kd values determined for RafE with RFOs are ~10-fold higher than that exhibited by BlG16BP with raffinose (35) and >100-fold higher than the Kd values that are typically reported for SBPs (4, 29-32). Therefore, RafE would need to encounter very high concentrations of RFOs for a significant proportion of this selectivity determinant of the ABC transporter to be engaged. Second, Aga exhibited relatively high Km values (>5 mM) and low kcat/Km values for RFOs (Table 2). In fact, the activity of Aga is not restricted to RFOs, and Aga exhibited a >2-fold lower Km and >2-fold higher kcat/Km against α-(1→3)-galactobiose compared with raffinose. Therefore, whereas RafE and Aga appear able to bind and degrade RFOs, the specific properties of these proteins suggest a pathway that would be quite inefficient. Finally, the question still remains as to how S. pneumoniae would encounter RFOs in its host given that they are plant oligosaccharides. The previous suggestion that small amounts of RFOs absorbed by the intestinal epithelium may be presented to S. pneumoniae on mucosal surfaces (28) is not congruent with the very low binding affinity of RafE.
These observations continue to highlight the unlikely relationship between RFO metabolism and pneumococcal virulence, which led to the suggestion that the locus targets alternate glycans. Indeed, Aga, which shows properties consistent with a strict α-galactosidase, demonstrates activity on a terminal α-linked galactose motif found in some mammalian glycans. However, we do not believe that Aga possesses activity on an in vivo substrate on the following basis. α-(1→6)-Linked galactose residues are not found on any human glycans, and although Aga also exhibits activity against α-(1→3)-linked galactose residues, it is inactive against the only known human glycan to bear this motif (the type II blood group B tetrasaccharide). Prior defucosylation of this tetrasaccharide into the linear form would allow Aga to act; however, the two characterized pneumococcal α-fucosidases, which are the only known fucosidases in this bacterium, lack this activity (44). Furthermore, we found Aga to have an intracellular location, which makes a potential role in host glycan processing even less likely. Thus, when interpreted in the wider context, our data do not support the alternate glycan substrate hypothesis.
Loci with similar genes and gene organization to the pneumococcal RFO locus are relatively common among the Lactobacillales (as determined by a STRING (45) gene neighborhood analysis using Aga as a search protein). Consistent with this, many other streptococcal species, including Streptococcus mutans, and lactic acid bacteria also have the ability to grow on raffinose (46-51). These bacteria reside in the oral cavity or gastrointestinal tract, and therefore, they would be expected to regularly encounter RFOs as part of the host diet. In the case of S. pneumoniae, however, a bacterium that is unlikely to encounter RFOs, we hypothesize that the RFO pathway may be an evolutionary relic that is not utilized and, as such, is not under strong selection pressure to maintain binding affinity and/or catalytic efficiency. Supporting this concept of divergence, the amino acid sequence conservation between components of the RFO pathway is as low as 55% within members of the Streptococcus genus. For example, RafE and Aga from the S. mutans locus share amino acid sequence identities of only 60 and 66%, respectively, with the pneumococcal homologs. However, it remains to be determined whether the biochemical efficiency of the RFO pathway components generally correlates with the niche the bacterium inhabits.

Figure 6. [...] Powered by RafK (also known as MsmK), the RFOs are imported into the cytoplasm where they are sequentially degalactosylated by Aga. The released galactose can then enter metabolism either via the Leloir or tagatose-6-phosphate pathway (41). The remaining sucrose is cleaved by GtfA into fructose and G1P. Free fructose is likely phosphorylated by a fructokinase, such as ScrK (16), prior to entering glycolysis. G1P can be interconverted to glucose-6-phosphate by a phosphoglucomutase and enter either glycolysis or the pentose phosphate pathway (43). G1P is also an important intermediate in several pneumococcal anabolic pathways, including cell wall and capsule biosynthesis (42).

Despite the implications of our biochemical findings (as discussed above), there remains the enigmatic relationship between this locus and the tissue tropism and virulence of S. pneumoniae. In the recent study by Minhas et al. (28), isolates with a mutation in rafR were less able to persist in the murine lung than WT isolates but showed higher bacterial loads in the ear and brain. Isolates bearing a single amino acid substitution in rafK or a complete rafK deletion exhibited lower bacterial loads in all body sites. Given that RafK is a promiscuous ATPase known to power at least four different ABC transporters in S. pneumoniae (15,17,19) and that an effect of the reported rafR mutation was a significant reduction in rafK expression (28), we propose that the observed tissue tropism and virulence phenotype seemingly associated with the ability of S. pneumoniae to utilize raffinose may, in fact, be due to indirect effects arising from the lack or significantly reduced expression of RafK. RafK is known to be involved in the import of sialic acid, maltotetraose (derived from glycogen), FOSs, and RFOs by four different ABC transporters (15,17,19). It has also been proposed that RafK likely powers an additional two ABC transporters in S. pneumoniae, one of which we have shown imports N-glycans (4) (the other is uncharacterized). All of the characterized transporters have been associated with pneumococcal virulence and/or colonization in at least one animal model study (8, 9, 52-54). Furthermore, reduced bacterial loads in the lungs have been reported for ΔrafK pneumococci in two independent studies (19,28). S. pneumoniae has the capacity to utilize more than 30 different carbohydrates, and these carbohydrates are likely differentially available at different human body sites (for example, gangliosides are most abundant in the brain (55)).
Therefore, the essentiality of RafK for import of a subset of these carbohydrates has the potential to influence the tissue tropism of S. pneumoniae. Overall, the inefficiency of the RFO pathway suggests that it is not a key metabolic pathway in the pneumococcus, and previous virulence findings relating to this locus result from the more general context of carbohydrate uptake and the importance of this to the host-pneumococcus interaction.

Materials
Raffinose, stachyose, verbascose, and linear type II blood group B trisaccharide were purchased from Carbosynth Ltd. (Berkshire, UK). α-(1→3)-Galactobiose was from Dextra Laboratories Ltd. (Reading, UK). Type II blood group A and B tetrasaccharides and Gb3 were obtained from Elicityl (Crolles, France). All other materials were from Millipore-Sigma, unless otherwise stated.

Cloning and mutagenesis
The genes encoding for full-length Aga (locus tag SP_1898), RafE minus its secretion signal sequence and lipid-anchoring motif (amino acids 24-419; locus tag SP_1897), and full-length GtfA (locus tag SP_1894) were amplified by PCR from TIGR4 genomic DNA with the primers Aga-F and Aga-R, RafE-F and RafE-R, and GtfA-F and GtfA-R, respectively (Table S1). PCR products were cloned into pET28a between the NdeI and XhoI sites by In-Fusion cloning (Takara Bio USA Inc., Mountain View, CA) to produce pET28a-Aga, pET28a-RafE, and pET28a-GtfA. Mutagenesis of pET28a-Aga to generate the Aga D472N mutation was performed using the QuikChange site-directed mutagenesis kit (Agilent Technologies, Santa Clara, CA). Mutagenic primers are listed in Table S1. The integrity of all constructs was confirmed by bidirectional sequencing (Sequetech, Mountain View, CA).

Protein expression and purification
Protein expression constructs were transformed into BL21(DE3). Expression of Aga, RafE, and GtfA was performed in LB broth with 0.5 mM isopropyl β-D-1-thiogalactopyranoside induction at 16°C for 18 h. Standard procedures, as previously detailed (44), were used to lyse cells and purify the released proteins by immobilized metal affinity chromatography. Subsequent purification by size-exclusion chromatography, using either an S100 or S200 HiPrep 16/60 Sephacryl column (GE Healthcare) as appropriate, was performed using 20 mM Tris, pH 8.0, 500 mM NaCl. Protein purity was judged by SDS-PAGE analysis, and protein concentrations were determined using extinction coefficients calculated by ProtParam on the ExPASy server (56).

Isothermal titration calorimetry
ITC was performed as described previously (57) using a VP-ITC (MicroCal, Northampton, MA) in 50 mM potassium phosphate buffer, pH 6.6, at 25°C. RafE was used at a concentration of 100 μM, and ligands at a concentration of 2.5 mM were titrated into protein. Ligand solutions were prepared using buffer saved from the last step of extensive dialysis of the protein solution. All solutions were filtered and degassed immediately before use. The data were fit to a one-site binding model. Because conditions were not sufficient to allow fitting of the stoichiometry (n), the n value was fixed at 1, which was justified based on the 1:1 interactions revealed by the crystallographic analyses of RafE.
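The decision to fix n can be illustrated with the dimensionless Wiseman c parameter, c = n[M]/Kd, a standard ITC rule of thumb: when c is well below about 10, the isotherm loses its sigmoidal shape and the stoichiometry becomes poorly determined. The following is a minimal sketch, not the analysis used here. The cell concentration (100 μM) is taken from the text, whereas the function name and the Kd values passed to it are hypothetical placeholders chosen only for illustration.

```python
# Sketch: Wiseman c parameter for an ITC experiment.
# RAFE_CELL_CONC comes from the text; the Kd values below are
# hypothetical examples, NOT measured values from this study.

RAFE_CELL_CONC = 100e-6  # M, RafE concentration in the calorimeter cell


def wiseman_c(protein_molar, kd_molar, n=1.0):
    """Return c = n * [M] / Kd (dimensionless)."""
    return n * protein_molar / kd_molar


if __name__ == "__main__":
    for kd in (1e-6, 1e-4, 1e-3):  # hypothetical Kd values in M
        c = wiseman_c(RAFE_CELL_CONC, kd)
        print(f"Kd = {kd:.0e} M -> c = {c:g}")
```

Under these assumptions, a weak interaction (Kd near or above the cell concentration) gives c at or below 1, the regime in which n is usually fixed from independent evidence such as a crystal structure.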

␣-Galactosidase assays
All Aga enzyme assays were performed at 37°C. Aga (200 nM) was initially tested for activity against pNP-α-Gal and pNP-β-Gal in 20 mM Tris, pH 8.0, with 1.5 mM substrate. Release of pNP was monitored at 405 nm in a SpectraMax M5 plate reader (Molecular Devices, San Jose, CA). The pH optimum of Aga (100 nM) was determined using 0.5 mM pNP-α-Gal in McIlvaine buffers, pH 2.4–8.0. Reactions in quadruplicate were incubated for 15 min, stopped by the addition of NaOH to 60 mM, and read at 405 nm. Kinetic constants for Aga (1 nM) against pNP-α-Gal were determined in Aga assay buffer (50 mM NaH₂PO₄/K₂HPO₄, pH 6.5) by following the release of pNP directly at 405 nm. The extinction coefficient of pNP in Aga assay buffer was experimentally determined to be 3707 M⁻¹ cm⁻¹. Aga was also screened for activity against a range of pNP substrates as described above for pNP-α-Gal and pNP-β-Gal. For all other substrates, kinetic constants were determined by quantifying the release of galactose in a stopped assay. Reactions (in triplicate) contained 40 nM Aga and varying concentrations of substrate in Aga assay buffer. Samples (30 μl) were taken every 3–5 min and stopped by heating to 95°C for at least 10 min. Once cooled, 25 μl of each sample were mixed with components from the L-arabinose/D-galactose assay kit (Megazyme Inc., Chicago, IL), which contains a galactose mutarotase and an NAD⁺-dependent β-galactose dehydrogenase. Galactose detection reactions (100 μl) contained 0.5 μl of kit enzyme mix and 4 μl of NAD⁺ (kit supply) in kit buffer. Kit reactions were incubated at 25°C and read at 340 nm every 10 s until the absorbance stabilized. Final absorbances were converted to galactose concentrations according to the manufacturer's instructions and accounting for the dilution factor of the original Aga reaction.
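The two conversions described above can be sketched as follows. This is an illustrative outline rather than the authors' analysis script. The extinction coefficient (3707 M⁻¹ cm⁻¹) and the 25 μl in 100 μl kit dilution come from the text. The function names, the 1 cm default path length (plate-reader path lengths depend on well volume and are often shorter), and the example readings are assumptions.

```python
# Sketch: (1) Beer-Lambert conversion of an A405 slope to a pNP release
# rate, and (2) back-calculation of galactose in the original Aga
# reaction from the concentration measured in the kit reaction.

EPSILON_PNP = 3707.0  # M^-1 cm^-1, pNP in Aga assay buffer (from the text)


def pnp_rate_molar_per_s(slope_abs_per_min, path_cm=1.0):
    """Rate (M/s) = (dA405/dt) / (epsilon * path length).

    path_cm=1.0 is an assumed default, not a value from the text.
    """
    return (slope_abs_per_min / 60.0) / (EPSILON_PNP * path_cm)


def kit_dilution_factor(sample_ul=25.0, total_ul=100.0):
    """Dilution of the stopped Aga sample in the galactose detection reaction."""
    return total_ul / sample_ul


def galactose_in_aga_reaction(kit_conc_molar, sample_ul=25.0, total_ul=100.0):
    """Scale the kit-reaction galactose concentration back to the Aga reaction."""
    return kit_conc_molar * kit_dilution_factor(sample_ul, total_ul)


if __name__ == "__main__":
    slope = 0.05  # hypothetical A405 slope in absorbance units per minute
    print(f"pNP release rate: {pnp_rate_molar_per_s(slope):.3e} M/s")
    print(f"kit dilution factor: {kit_dilution_factor():g}")
    # hypothetical 50 uM galactose measured in the kit reaction
    print(f"galactose in Aga reaction: {galactose_in_aga_reaction(50e-6):.3e} M")
```

Dividing the resulting rate by the enzyme concentration (1 nM for pNP-α-Gal, 40 nM for the stopped assays) would give an apparent turnover number at that substrate concentration.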

Thin-layer chromatography
TLC screening of Aga linkage specificity was performed with 1 mM substrate and 100 nM enzyme in Aga assay buffer at 37°C for 18 h. Reactions and standards were spotted onto precoated POLYGRAM SIL G/UV 254 TLC sheets (Thermo Fisher Scientific), separated in a solvent of 2:1:1 butanol:acetic acid:distilled H₂O, and visualized with 0.2% (w/v) naphthoresorcinol in acidified ethanol followed by heating at 90°C. For GtfA-containing reactions, 10 mM substrate was digested with 30 μM GtfA and/or 100 nM Aga in 100 mM Aga assay buffer at 37°C for 18 h. Spotted reactions and standards were separated in a solvent of 6:7:1 chloroform:acetic acid:distilled H₂O, and visualized with acidified ethanol followed by heating at 90°C.

General crystallography procedures
Crystals were obtained using sitting-drop vapor diffusion for screening and hanging-drop vapor diffusion for optimization at 18°C. Prior to data collection, single crystals were flash-cooled with liquid nitrogen in crystallization solution supplemented with 20–25% (v/v) ethylene glycol as cryoprotectant. Diffraction data were collected on an "in-house" beam comprising a Pilatus 200K 2D detector coupled to a MicroMax-007HF X-ray generator with a VariMax™-HF Arc/Sec confocal optical system and an Oxford Cryostream 800. All diffraction data were processed using HKL2000 (58). Data collection and processing statistics are shown in Tables 3 and 4. All structures were solved by molecular replacement with PHASER (59). Initial models were built using BUCCANEER (60) followed by COOT (61). Refinement of atomic coordinates was performed with REFMAC (62). The addition of water molecules was performed in COOT with FINDWATERS and manually checked after refinement. In all data sets, refinement procedures were monitored by flagging 5% of all observations as "free" (63). Model validation was performed with MolProbity (64). Finally, the models obtained were represented using PyMOL (PyMOL Molecular Graphics System, version 1.6.0.0, Schrödinger, LLC).

RafE complex structure determinations
Cocrystals of RafE (28 mg ml⁻¹) were obtained in 0.1 M Tris, pH 8.0, 0.2 mM CsCl, 20% (w/v) PEG 3350 with 10 mM raffinose, stachyose, or verbascose. The crystallization condition for the verbascose complex also contained 0.1 M MnCl₂. The initial stachyose complex was solved by molecular replacement using an unpublished raffinose complex structure (PDB code 2I58) as the search model.

Aga Apo and complex structure determinations
Crystals of apo Aga (14 mg ml⁻¹) were obtained in 0.1 M sodium acetate:acetic acid, pH 4.6, 1.1 M ammonium tartrate dibasic. The structure was solved by molecular replacement using the structure of G. stearothermophilus GH36 α-galactosidase AgaA (PDB code 4FNU) (38) as the search model. To obtain a galactose product complex, a crystal of Aga obtained in 1.0 M sodium phosphate monobasic/potassium phosphate dibasic, pH 5.6, was soaked with excess (>50 mM) raffinose for 45 min prior to cryoprotection. For all other complexes, crystals of Aga D472N (17 mg ml⁻¹) obtained in 0.1 M sodium acetate:acetic acid, pH 4.6, 0.7–1.0 M ammonium tartrate dibasic were soaked with excess melibiose, raffinose, α-(1→3)-galactobiose, or linear type II blood group B trisaccharide for up to 30 min prior to cryoprotection.

Aga cellular localization
S. pneumoniae TIGR4 was grown in AGCH medium (65) containing 1% (w/v) raffinose at 37°C in a candle jar to an A₆₀₀ of 0.6, then fractionated as previously described (44). The presence of Aga in each cellular fraction was determined by adding 5 μl of fraction to 100 μl of 1 mM pNP-α-Gal in Aga assay buffer, incubating at 37°C, and monitoring the absorbance at 405 nm in a SpectraMax M5 plate reader. Equal volumes of each fraction buffer were added to control wells containing substrate, and the data were subtracted from the corresponding test wells.
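The buffer-control subtraction described above amounts to a per-fraction blank correction, sketched below. The fraction labels and absorbance readings are hypothetical placeholders, not data from this study, and the function name is introduced only for illustration.

```python
# Sketch: subtract each fraction-buffer control reading from the
# matching test-well reading (A405). All values here are hypothetical.


def blank_corrected(test_a405, control_a405):
    """Return {fraction: test - control} for every fraction in test_a405."""
    return {name: test_a405[name] - control_a405[name] for name in test_a405}


if __name__ == "__main__":
    # hypothetical fraction names and readings
    test = {"fraction_1": 0.12, "fraction_2": 0.85, "fraction_3": 0.10}
    control = {"fraction_1": 0.08, "fraction_2": 0.09, "fraction_3": 0.08}
    for name, value in blank_corrected(test, control).items():
        print(f"{name}: corrected A405 = {value:.2f}")
```

Using the matching fraction buffer as the blank, rather than a single shared blank, accounts for any difference in background absorbance between fractionation buffers.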