Structure of Surface Layer Homology (SLH) Domains from Bacillus anthracis Surface Array Protein

Surface (S)-layers, para-crystalline arrays of protein, are deposited in the envelope of most bacterial species. These surface organelles are retained in the bacterial envelope through the non-covalent association of proteins with cell wall carbohydrates. Bacillus anthracis, a Gram-positive pathogen, produces S-layers of the protein Sap, which uses three consecutive repeats of the surface-layer homology (SLH) domain to engage secondary cell wall polysaccharides (SCWP). Using x-ray crystallography, we reveal here the structure of these SLH domains, which assume the shape of a three-prong spindle. Each SLH domain contributes to a three-helical bundle at the spindle base, whereas another α-helix and its connecting loops generate the three prongs. The inter-prong grooves contain conserved cationic and anionic residues, which are necessary for SLH domains to bind the B. anthracis SCWP. Modeling experiments suggest that the SLH domains of other S-layer proteins also fold into three-prong spindles and capture bacterial envelope carbohydrates by a similar mechanism.

Surface layers (S-layers) are para-crystalline sheets of protein that self-assemble on the surface of microbial cells to form contiguous layers (1,2). Most organisms that elaborate S-layers do so by abundantly producing and secreting a single protein species (3). Whether an organism produces an S-layer as a component of its envelope is assessed by electron microscopy of the cell surface (4). In this manner, species from nearly every branch of the Bacteria and Archaea have been discovered to produce S-layers (2). Proteins within S-layers fulfill diverse functions: they act as a scaffold or enzyme in the bacterial envelope (5), promote nutrient diffusion or transport (6), or contribute to virulence by enabling microbial adhesion to infected host tissues (7).
Most, but not all, S-layer proteins of bacteria share three tandem ~55-amino-acid repeats of the Surface Layer Homology (SLH) domain (8-10). Secreted proteins carrying three tandem SLH domains are tethered to the bacterial envelope by non-covalent interactions between the SLH domains and a secondary cell wall carbohydrate (11). SLH domains are remarkable for being both necessary and sufficient for the incorporation of chimeric proteins into S-layers (12,13). The SbsC protein of Geobacillus stearothermophilus is an example of a class of proteins that form S-layers without SLH domains (14). SbsC binds to the secondary cell wall polysaccharide (SCWP) of G. stearothermophilus via its N-terminal domain, which consists of three triple-helical bundles connected by two contiguous helices (14). The N-terminal domain of SbsC is highly similar to S-layer proteins from G. stearothermophilus, Geobacillus kaustophilus, and Geobacillus tepidamans (14) but not to proteins with SLH domains.
The Gram-positive bacterium Bacillus anthracis is a rod-shaped, spore-forming pathogen of mammalian hosts (15). The envelope of its vegetative forms is composed of a plasma membrane and a peptidoglycan layer with attached secondary cell wall polysaccharide (SCWP) (16) and a poly-D-γ-glutamic acid (PDGA) capsule (17,18). The genome of B. anthracis encompasses 24 open reading frames whose predicted translation products each contain a secretion signal and three tandem SLH domains (19). An operon of two such genes, sap and eag, encodes the main S-layer proteins, Sap and EA1, and is adjacent to csaB, a gene required for decorating SCWP with ketal-pyruvate (11,20). B. anthracis SCWP is a polymer with the repeating structure [→6)-α-GlcNAc-(1→4)-β-ManNAc-(1→4)-β-GlcNAc-(1→]n, where α-GlcNAc is substituted with α-Gal and β-Gal at O3 and O4, respectively, and β-GlcNAc is substituted with α-Gal at O3 (21). The position of the ketal-pyruvate modification on the SCWP is not known. B. anthracis can form S-layers from both extractable antigen 1 (EA1) and surface array protein (Sap) by tethering the SLH domains of these polypeptides to pyruvylated SCWP (20,22,23). C-terminal to their SLH domains, S-layer proteins carry crystallization domains, sequences predicted to enable subunit-subunit interactions within the S-layer (22,24,25). A simple model for S-layer assembly is that secreted subunits are recruited to the edge of an extant S-layer network via enthalpy-driven interactions between crystallization domains and are then tethered to the SCWP via their SLH domains (11). In this model, growth of the S-layer is matched by an increasing avidity of the network for the cell wall. In this manner, bacilli assemble an S-layer on top of their peptidoglycan and thread the PDGA capsule between S-layer protein subunits (26,27).
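The branched repeat notation above can be hard to parse. The sketch below is one illustrative way to encode a single SCWP repeating unit as a data structure and count its monosaccharides; the class and function names are hypothetical, and the ketal-pyruvyl group is deliberately left out because its position is unknown.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Residue:
    """One monosaccharide of the SCWP repeating unit (illustrative encoding)."""
    name: str    # e.g. "GlcNAc"
    anomer: str  # "alpha" or "beta"
    # ring oxygen position on this residue -> substituent residue attached there
    substituents: dict = field(default_factory=dict)


def scwp_repeat_unit():
    """Return (backbone, linkages) for [->6)-a-GlcNAc-(1->4)-b-ManNAc-(1->4)-b-GlcNAc-(1->]n.

    The ketal-pyruvyl modification is not encoded because its position is unknown.
    """
    a_glcnac = Residue("GlcNAc", "alpha",
                       {3: Residue("Gal", "alpha"), 4: Residue("Gal", "beta")})
    b_mannac = Residue("ManNAc", "beta")
    b_glcnac = Residue("GlcNAc", "beta", {3: Residue("Gal", "alpha")})
    backbone = [a_glcnac, b_mannac, b_glcnac]
    # (donor index, acceptor index, acceptor oxygen): C1 of the donor is linked
    # to that oxygen of the acceptor; acceptor index 3 stands for the
    # alpha-GlcNAc of the next repeat.
    linkages = [(0, 1, 4), (1, 2, 4), (2, 3, 6)]
    return backbone, linkages


def composition(backbone):
    """Count monosaccharides per repeat, including side-chain substituents."""
    counts = Counter()
    for res in backbone:
        counts[res.name] += 1
        counts.update(sub.name for sub in res.substituents.values())
    return counts


backbone, linkages = scwp_repeat_unit()
print(composition(backbone))
```

Each repeat therefore carries six sugars: a three-residue HexNAc backbone plus three galactose branches.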
To gain insight into the molecular mechanisms of S-layer assembly, we determined here the three-dimensional structure of the SLH domains of Sap by x-ray crystallography and further explored the mechanism of binding to the SCWP.
Protein Crystallization-E. coli BL21(DE3) cells harboring psap SLH were grown under conditions that promote SeMet incorporation (28). Labeled protein was subjected to immobilized metal affinity chromatography (IMAC) (29), concentrated, and purified further by size exclusion chromatography on a Superdex 75 16/60 column equilibrated with 20 mM HEPES, 250 mM NaCl, 2 mM DTT, pH 8.0. A Mosquito liquid dispenser (TTP LabTech) was used to dispense 0.4 µl of protein (20 mg/ml) and 0.4 µl of reservoir solution into 96-well Crystal-Quick plates (Greiner); droplets were equilibrated against 140 µl of reservoir solution. Crystals that formed in 0.1 M citrate, pH 5.5, 2.0 M ammonium sulfate at 4°C were treated with cryoprotectant (crystallization buffer containing 3.5 M ammonium sulfate), mounted on CryoLoops (Hampton Research), and frozen in liquid nitrogen.
X-ray Diffraction and Structure Determination-Single-wavelength anomalous diffraction (SAD) data were collected near the selenium absorption peak at 100 K from a single SeMet-labeled crystal at the 19ID beamline of the Structural Biology Center at the Advanced Photon Source (Argonne National Laboratory) using the program SBCcollect. Intensities were integrated and scaled with the HKL3000 suite (30). Heavy atom sites were located using the program SHELXD (31) and phased with the program MLPHARE (32). The density-modified map was submitted for model building with ARP/wARP. The structure was completed by manual model building using the program COOT (33) and refined with REFMAC (34). The final R was 16.6% and the free R was 18.9%, with no σ cutoff. The stereochemistry of the structure was checked with PROCHECK (35). Comparison of the Sap SLH structure with other proteins in the Protein Data Bank (PDB) via the DALI server (Holm) identified only very limited structural homology. The closest homologue is a domain of X-prolyl dipeptidyl aminopeptidase (PDB ID 1LNS, Z-score 5.8, RMSD 3.1 Å) (36), which matches about half of Sap SLH.
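The R and free R values quoted above follow the standard crystallographic definition, R = Σ||Fo| - |Fc|| / Σ|Fo|, evaluated separately on the working reflections and on a held-out test set. A minimal sketch, assuming the calculated amplitudes are already scaled to the observed ones (refinement programs such as REFMAC handle scaling internally); the amplitudes in the usage example are invented toy numbers.

```python
def r_factor(f_obs, f_calc):
    """Crystallographic R = sum(| |Fo| - |Fc| |) / sum(|Fo|).

    Assumes f_calc has already been scaled to f_obs.
    """
    num = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    den = sum(abs(fo) for fo in f_obs)
    return num / den


def r_work_and_free(f_obs, f_calc, is_test):
    """Split reflections into working and test (free) sets and score each."""
    work = [(fo, fc) for fo, fc, t in zip(f_obs, f_calc, is_test) if not t]
    test = [(fo, fc) for fo, fc, t in zip(f_obs, f_calc, is_test) if t]
    r_work = r_factor(*zip(*work))
    r_free = r_factor(*zip(*test))
    return r_work, r_free


# Toy amplitudes for illustration only (not data from this study).
r_work, r_free = r_work_and_free(
    f_obs=[100.0, 200.0, 50.0, 80.0],
    f_calc=[90.0, 210.0, 55.0, 80.0],
    is_test=[False, False, True, False],
)
print(f"R = {r_work:.3f}, free R = {r_free:.3f}")
```

Because the test reflections are excluded from refinement, a free R close to R, as reported here, indicates the model is not overfit.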
GST-SLH Purification and S-layer Assembly-E. coli BL21(DE3) carrying pgst-SLH or its variants was grown in 1-liter cultures to an A600 of 1 and induced with 1 mM IPTG for 3 h at 37°C. Cells were harvested by centrifugation, suspended in PBS, and disrupted in a French press. Cleared lysate was subjected to affinity chromatography on glutathione-Sepharose and eluted with 20 mM reduced glutathione. For circular dichroism studies, GST-SLH variants were bound to glutathione-Sepharose and treated with 50 units of thrombin (Sigma) for 3 h at room temperature. Eluate was treated with 100 µl of benzamidine-Sepharose to remove thrombin and dialyzed against 8 mM NaH2PO4, 1.5 mM Na2HPO4. Purified protein (100 µg/ml) was analyzed by circular dichroism using an AVIV 202 CD spectrometer at room temperature.
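Comparing helix content between SLH constructs requires normalizing raw CD signals for protein concentration and length. The sketch below converts observed ellipticity (mdeg) to mean residue ellipticity and then applies a simple two-state estimate of helix fraction at 222 nm. The reference values for pure helix (-36,000) and random coil (+3,000 deg cm² dmol⁻¹) are assumptions taken as commonly quoted approximations, and the molecular weight, residue count, and path length in the test are hypothetical; this is not necessarily how helix content was derived in this study.

```python
def mean_residue_ellipticity(theta_mdeg, mw_da, n_residues, path_cm, conc_mg_ml):
    """Convert observed ellipticity (mdeg) to [theta] in deg cm^2 dmol^-1.

    Uses the mean residue weight MRW = MW / (n_residues - 1).
    """
    mrw = mw_da / (n_residues - 1)
    return theta_mdeg * mrw / (10.0 * path_cm * conc_mg_ml)


def helix_fraction(theta_222, theta_helix=-36000.0, theta_coil=3000.0):
    """Two-state estimate of helix fraction from [theta] at 222 nm.

    theta_helix and theta_coil are assumed reference values, not measured here.
    """
    return (theta_222 - theta_coil) / (theta_helix - theta_coil)


# Hypothetical reading: -10 mdeg at 222 nm, 0.1 mg/ml (the 100 ug/ml used above),
# 0.1-cm cuvette, 11 kDa construct with 101 residues.
mre = mean_residue_ellipticity(-10.0, 11000.0, 101, 0.1, 0.1)
print(round(mre), round(helix_fraction(mre), 3))
```

A stepwise loss of negative ellipticity at 222 nm across SLH 1-3, SLH 1-2, and SLH 1 would appear as a stepwise drop in this fraction.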
For S-layer assembly assays, B. anthracis Sterne was grown overnight in BHI. Vegetative forms were sedimented by centrifugation at 10,000 × g, suspended in 3 M urea, and heated in a boiling water bath for 30 min to strip the S-layer from murein sacculi with attached SCWP. Cell wall preparations were washed with water and then with PBS and stored in aliquots at −20°C. GST or GST-SLH was mixed with cell wall suspensions at the indicated A600 units and incubated at room temperature for 10 min, followed by centrifugation at 10,000 × g. Supernatant and sediment were mixed with sample buffer, heated to 95°C, and analyzed by SDS-PAGE. Protein was quantified by measuring the pixel intensity of acquired images with Adobe Photoshop CS3.

RESULTS
Crystal Structure of the SLH Domains of Sap-The 814-amino-acid Sap precursor harbors an N-terminal signal peptide (residues 1-30), three SLH domains (residues 34-197), and a large C-terminal domain (residues 210-814) that promotes the crystallization of the S-layer protein (25). The nucleotide sequence of the B. anthracis sap gene, i.e. codons 31 through 210, was cloned into the expression vector pET16b to generate psap SLH (Fig. 1A). This plasmid was transformed into E. coli BL21(DE3), and T7 RNA polymerase-mediated expression of Sap SLH in bacterial cultures was induced with IPTG. Sap SLH encompasses an N-terminal decahistidine tag and amino acids Gly 31-Glu 210 of Sap (Fig. 1C); it was purified from cleared bacterial lysates by affinity chromatography on Ni-NTA-Sepharose and analyzed for purity by Coomassie-stained SDS-PAGE (Fig. 1B). Sap SLH crystallized in the tetragonal space group P4 1 2 1 2 1 with one monomer in the asymmetric unit. Its structure was determined by single-wavelength anomalous diffraction (SAD) with SeMet-labeled crystals and refined to 1.80 Å resolution. All residues fall within either the most favored or the allowed regions of the Ramachandran plot (Table 1). The resulting structural model accounts for amino acid residues Lys 32-Thr 209 of Sap (Fig. 1C).
The overall structure of the three SLH domains resembles a three-prong spindle, in which each prong is derived from a single SLH domain (Fig. 2). The base of the spindle is assembled from all three domains, each of which contributes a single helix to a three-helix bundle (Fig. 2). The three SLH domains of Sap SLH, SLH 1 (residues 31-90), SLH 2 (91-151), and SLH 3 (152-209), share limited sequence identity: 26% between SLH 1 and SLH 2, 39% between SLH 1 and SLH 3, and 26% between SLH 2 and SLH 3. Nevertheless, the structures formed by the three SLH domains are nearly identical (Fig. 3). Thus, Sap SLH can be considered a pseudo-trimer assembled from its three SLH domains. When viewed from the C terminus (cargo view), the SLH domain prongs of Sap SLH proceed clockwise from the N terminus and surround the three-helix bundle at the base of the spindle (a, b, and c in Fig. 2A). Each SLH domain includes a helix on the lateral side of the molecule, two loops (A and B in Fig. 2B), and the beginning of one linker helix (Fig. 2B).
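Pairwise identity values like those above depend on how the denominator is chosen. A minimal sketch of one common definition, identical positions divided by aligned ungapped positions, is shown below; it is applied only to the three short ITRAE-motif sequences of Sap (LTRAE, IDRVS, VTKAE) for illustration, since the full-domain alignments behind the 26% and 39% values are not reproduced here.

```python
def percent_identity(a, b, gap="-"):
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    identical = sum(x == y for x, y in pairs)
    return 100.0 * identical / len(pairs)


motifs = {"SLH1": "LTRAE", "SLH2": "IDRVS", "SLH3": "VTKAE"}
for p, q in [("SLH1", "SLH2"), ("SLH1", "SLH3"), ("SLH2", "SLH3")]:
    print(p, q, percent_identity(motifs[p], motifs[q]))
```

Dividing instead by the full alignment length, or by the length of the shorter sequence, gives different numbers for gapped alignments, so the definition should be stated alongside reported values.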
A Model for SCWP Binding to Sap SLH-A group of five residues, named the ITRAE motif for its consensus sequence (Fig. 1C), is partially conserved among the SLH domains of bacterial S-layer proteins (10) and occupies the last four residues of loop B and the first residue of the central helix bundle (Fig. 2C). Within the SLH domains of Sap, these motifs have the sequences LTRAE, IDRVS, and VTKAE and contain the cationic residues Arg 72, Arg 131, and Lys 193, respectively (Fig. 1C). The corresponding positively charged residue of the ITRAE motif is considered crucial for the incorporation of protein into the S-layer of Thermoanaerobacterium thermosulfurigenes (13). Analysis of the solvent-accessible surface of Sap SLH revealed three small tunnels at the spindle base (Fig. 4). The tunnels and ITRAE motifs are arranged such that Arg 72 from SLH 1 penetrates the tunnel in SLH 2 and appears on the surface between SLH 2 and SLH 3 (Fig. 2D). Similarly, Arg 131 from SLH 2 penetrates the tunnel in SLH 3 and appears on the surface between SLH 1 and SLH 3, whereas Lys 193 from SLH 3 is inserted into the SLH 1 tunnel and displayed on the surface between SLH 1 and SLH 2 (Fig. 2D). Thus, all three SLH domains contribute residues to the surface of each of the inter-prong grooves (IPG) formed by adjacent prongs (Fig. 5, A and B).
We wondered whether the inter-prong grooves formed by adjacent SLH domains promote association of the S-layer protein with its carbohydrate ligand. In agreement with this conjecture, six amino acids corresponding to the most conserved residues among B. anthracis SLH domains contribute to the surface of the inter-prong grooves, for example Arg 72, Phe 95, Asp 97, Lys 143, Trp 164, and Lys 168 of IPG 2 (Figs. 5C and 1C). The SCWPs of different bacterial species represent a complex set of carbohydrates with hexose units organized in linear and branched fashion (37). We modeled B. anthracis SCWP into the structure of the SLH domains to analyze the anchoring of carbohydrates within their three-prong spindle (Fig. 5).

JULY 22, 2011 • VOLUME 286 • NUMBER 29

JOURNAL OF BIOLOGICAL CHEMISTRY 26045
The SCWP molecule was constructed as described by Choudhury et al. (21). A selection of low-energy conformers of the molecule was manually modeled into the presumed binding sites of IPG 1, IPG 2, and IPG 3 by rigid-body rotations and translations of the SCWP that minimize steric clashes with the SLH molecule. Once a suitable pose was identified, a short conjugate gradient minimization was performed (38).
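The manual docking step above amounts to searching rigid-body poses of the ligand for one with few steric clashes. The sketch below automates that idea with a random search. It is an assumption-laden stand-in, not the authors' procedure: poses are scored only by counting ligand-protein atom pairs closer than a hypothetical 3.0 Å cutoff, and no conjugate gradient minimization is included. All function names are illustrative.

```python
import numpy as np


def rotation_matrix(axis, angle_rad):
    """Rodrigues' formula: rotation by angle_rad about the given axis."""
    axis = np.asarray(axis, dtype=float)
    x, y, z = axis / np.linalg.norm(axis)
    k = np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])
    return np.eye(3) + np.sin(angle_rad) * k + (1.0 - np.cos(angle_rad)) * (k @ k)


def apply_rigid(coords, rot, shift):
    """Rotate coordinates about their centroid, then translate by shift."""
    center = coords.mean(axis=0)
    return (coords - center) @ rot.T + center + shift


def clash_count(ligand, protein, cutoff=3.0):
    """Number of ligand-protein atom pairs closer than cutoff (Angstrom, assumed)."""
    d = np.linalg.norm(ligand[:, None, :] - protein[None, :, :], axis=-1)
    return int((d < cutoff).sum())


def best_pose(ligand, protein, n_trials=500, max_shift=2.0, seed=0):
    """Random rigid-body search that keeps the pose with the fewest clashes."""
    rng = np.random.default_rng(seed)
    best_n, best_xyz = clash_count(ligand, protein), ligand
    for _ in range(n_trials):
        rot = rotation_matrix(rng.normal(size=3), rng.uniform(0.0, 2.0 * np.pi))
        shift = rng.uniform(-max_shift, max_shift, size=3)
        pose = apply_rigid(ligand, rot, shift)
        n = clash_count(pose, protein)
        if n < best_n:
            best_n, best_xyz = n, pose
    return best_n, best_xyz
```

In practice a real workflow would also score favorable contacts (hydrogen bonds, salt bridges with the conserved Arg/Lys residues), not just the absence of clashes.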
The conserved residues in IPG 2 and the modeled SCWP are shown in Fig. 5, C and D. Similar SCWP poses and conserved residue arrangements were observed for the equivalent residues on the surfaces of IPG 1 and IPG 3, but these are not shown for visual clarity. The conserved phenylalanine residues (Phe 34, Phe 95, and Phe 156) are located at the domain interfaces, where they most likely play an important role in domain packing (Fig. 5, A and B). By contrast, the conserved aspartic acid residues (Asp 36, Asp 97, and Asp 158) are located outside of the SCWP binding sites. Their planar arrangement on the bottom of the molecule suggests that they may interact with peptidoglycan or with the cell wall linkage units that tether SCWP to the envelope of bacilli (20). Of note, our modeling experiments cannot account for the essential contribution of the SCWP ketal-pyruvyl modification, as the position of this modification is not yet known. We therefore tested whether the conserved Asp and Arg/Lys residues of Sap SLH indeed contribute to binding SCWP in the envelope of bacilli.
Functional Studies of Sap SLH Variants-Glutathione S-transferase (GST) hybrids were used to examine the contribution of individual Sap SLH domains and their key residues to B. anthracis SCWP binding. Purified GST-SLH 1-3, a hybrid encompassing amino acids 31-210 of Sap fused to the C terminus of GST, was incubated with B. anthracis vegetative forms that had been stripped of their S-layer. As a measure of SCWP binding, we monitored co-sedimentation of GST hybrids with bacilli following centrifugation (Fig. 6). As a control, GST alone did not sediment with vegetative forms, whereas all of the GST-SLH 1-3 co-sedimented with 10 A600 units of bacilli (Fig. 6, A and B). Reducing the number of bacilli reduced co-sedimentation of GST-SLH 1-3 (Fig. 6, A and B). These results agree with the general concept that the SLH domains of S-layer proteins and the SCWP of bacilli represent a receptor-ligand interaction. Removal of individual SLH domains in GST-SLH 1-2 and GST-SLH 1 abolished the association of GST hybrids with the SCWP of bacilli (Fig. 6, A and B). Compared with SLH 1-3, the CD spectra of isolated SLH 1-2 and SLH 1 domains displayed stepwise decreases in helix content but retained the overall spectral shape (Fig. 6C). We presume that the failure of GST-SLH 1-2 and GST-SLH 1 to associate with SCWP reflects the inability of these variants to form the three-prong spindle rather than defects in the folding of individual SLH domains. These data demonstrate that a functional binding interface requires all three tandem SLH domains.
A striking feature of our Sap SLH structure is the juxtaposition of conserved basic residues extending into the inter-prong grooves of adjacent SLH domains (vide supra) with conserved aspartic acid residues positioned in the loop region of each spindle prong (Fig. 2D). To test whether these residues are required for SCWP association, we generated two variants.
GST-SLH RRK/AAA carries alanine substitutions at all three basic residues (R72A, R131A, and K193A), whereas GST-SLH DDD/AAA carries substitutions at the three aspartic acid residues (D36A, D97A, and D158A). GST-SLH RRK/AAA displayed reduced SCWP binding: approximately half of the protein failed to associate with vegetative forms even at 10 A600 units (Fig. 6, A and B). A similar phenotype was observed for the GST-SLH DDD/AAA variant. Thus, charged residues conserved in the inter-prong grooves of SLH domains contribute to the binding of S-layer proteins to the SCWP of B. anthracis.
In Silico Prediction of the Three-prong Spindle Structure of SLH Domains from Other S-layer Proteins-Except for the amino acids Phe 34, Asp 36, Trp 42, Gly 54, Gly 57, Glu 64, Pro 65, Arg 72, Ala 76, and Asn 84, which are conserved in many B. anthracis SLH domains, the SLH domain peptide sequence is variable (Fig. 1C). To ascertain whether the sequences of these domains can satisfy the spatial constraints imposed by the empirically derived Sap SLH structure, we modeled aligned sequences of tandem SLH domain triplets from the B. anthracis EA1 S-layer protein and 22 S-layer-associated proteins. Each SLH sequence was first aligned with Sap residues 31-209 using ClustalW.
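Once a query SLH triplet is aligned to Sap residues 31-209, each aligned query residue can be assigned the Sap residue number it sits on, which is what lets conserved positions such as Arg 72 be compared across proteins. A minimal sketch of that bookkeeping, assuming gapped alignment strings of equal length (as produced by ClustalW or any other aligner); the function name and the toy sequences are hypothetical, and the code does not run ClustalW itself.

```python
def map_to_reference_numbering(ref_aln, query_aln, ref_start=31, gap="-"):
    """Map 1-based query residue indices to reference (Sap) residue numbers.

    Only columns where both sequences have a residue are mapped.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    mapping = {}
    ref_num = ref_start - 1
    q_idx = 0
    for r, q in zip(ref_aln, query_aln):
        if r != gap:
            ref_num += 1
        if q != gap:
            q_idx += 1
        if r != gap and q != gap:
            mapping[q_idx] = ref_num
    return mapping


# Toy alignment: the query has an insertion relative to the reference.
print(map_to_reference_numbering("GK-LT", "GKALS"))
```

Query residues in insertions (the A above) receive no Sap number, which is one reason models of more divergent sequences superimpose less perfectly.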
These alignments and the Sap SLH PDB file were used to produce in silico predicted structures with MODELLER 9v8 using a standard Python script, and the models were refined with PyMOL (MacPyMOL 1.1r1). In silico modeled SLH structures were compared with the empirically derived Sap SLH structure in two ways. First, we used the PyMOL align function to measure the root-mean-square deviation (RMSD), in ångströms (Å), between all atoms in the modeled SLH domains and the Sap SLH structure (Table 2). By this criterion, every set of three SLH domains produced an average RMSD from Sap SLH of 0.461 Å or less (Table 2). Performing the same calculation with only the α-carbon backbone, we obtained alignments in which the largest average RMSD was 0.269 Å for 123 α-carbons (Table 2). Second, we subjected all 24 SLH domain PDB files (the modeled SLH domains and Sap SLH) to analysis with PROCHECK (35) to judge the stereochemical quality of each.

FIGURE 6. Structural requirements of Sap SLH domains to associate with secondary cell wall polysaccharide of B. anthracis. A, purified GST-SLH 1-3 and variants lacking the third (GST-SLH 1-2) or the second and third SLH domains (GST-SLH 1) or carrying substitutions at conserved acidic (GST-SLH DDD/AAA) or basic residues (GST-SLH RRK/AAA) were analyzed for their ability to co-sediment with variable numbers of B. anthracis vegetative forms that had been stripped of their S-layer proteins. GST (glutathione S-transferase) was used as a control. Co-sedimentation of purified protein with, and its depletion from solution by, the B. anthracis cell wall envelope containing secondary cell wall polysaccharide were detected by Coomassie-stained SDS-PAGE. B, quantification of the data in panel A. C, circular dichroism (CD) spectra for isolated SLH 1-3, SLH 1-2, and SLH 1 after their thrombin cleavage from GST hybrids.
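The all-atom and α-carbon RMSD comparisons described above measure how far matched atoms lie apart after the two structures are optimally superimposed. A minimal sketch of that calculation using the Kabsch algorithm with NumPy is shown below. It assumes the atom correspondence is already known and is not PyMOL's align, which additionally performs a sequence alignment and iteratively rejects outlier atoms, so its numbers can differ from this plain calculation.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """RMSD between matched N x 3 coordinate sets after optimal superposition."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p_c = p - p.mean(axis=0)          # remove translation
    q_c = q - q.mean(axis=0)
    h = p_c.T @ q_c                   # 3 x 3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    # Correct for a possible reflection so the result is a proper rotation.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    p_rot = p_c @ rot.T
    return float(np.sqrt(((p_rot - q_c) ** 2).sum(axis=1).mean()))


# Usage: a rotated and translated copy of a structure superimposes perfectly.
rng = np.random.default_rng(1)
q = rng.normal(size=(20, 3))
c, s = np.cos(0.7), np.sin(0.7)
rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
p = q @ rz.T + np.array([1.0, 2.0, 3.0])
print(kabsch_rmsd(p, q))
```

Because RMSD is averaged over all matched atoms, the number of atoms used (for example 123 or 152 α-carbons) should always be reported with it.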
PROCHECK assesses whether all residues lie within the allowed regions of the Ramachandran plot and scores how likely the modeled bond lengths and angles are. Supplemental Table S1 records the PROCHECK outputs for Sap SLH and all 23 modeled SLH domains. Many of our in silico structures satisfy all criteria, whereas some fail because they contain one or two amino acids with disallowed φ/ψ angles. Because the observed RMSD values, derived from more than 100 atoms, are low and each of the modeled SLH domains satisfies most, if not all, PROCHECK criteria, we propose that all SLH domains within the B. anthracis genome likely adopt a similar fold (Fig. 7), even though they are highly divergent in primary sequence (Supplemental Table S1).
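The Ramachandran check rests on the backbone torsion angles φ (C(i-1)-N-Cα-C) and ψ (N-Cα-C-N(i+1)). A minimal sketch of computing these dihedrals from atomic coordinates with NumPy is shown below; it uses a standard projection-and-atan2 formula and does not reproduce PROCHECK's region boundaries, which are derived empirically. The helper names are illustrative.

```python
import numpy as np


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees, -180 to 180) defined by four points."""
    p0, p1, p2, p3 = (np.asarray(x, dtype=float) for x in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))


def phi_psi(c_prev, n, ca, c, n_next):
    """Backbone phi and psi for one residue from five backbone atom positions."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)


# Simple geometric checks: eclipsed (cis) and anti (trans) arrangements.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0)))
print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0)))
```

Residues whose (φ, ψ) pair falls outside the allowed regions are the ones flagged as disallowed in the PROCHECK output.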
The high degree of similarity in shape and position can be appreciated by viewing the Sap SLH structure and the modeled structure for the SLH domains of EA1 (EA1 SLH) side by side or superimposed (Fig. 7, A and B). The modeled EA1 SLH structure aligns closely with the Sap SLH structure (Fig. 7, A-C), with an RMSD of 0.119 Å for 152 α-carbons (Table 2) and all amino acids within the allowed regions of the Ramachandran plot (Supplemental Table S1). These results may explain earlier observations that the SLH domains of Sap and EA1 bind to the envelope of B. anthracis with similar affinity (39). A superimposition of all 23 modeled SLH domains aligned to Sap SLH is provided in Fig. 7D. Perfect alignment is not achieved in all cases, as some SLH domain sequences diverge more strongly from Sap in primary sequence (Fig. 7D). Nevertheless, these models share remarkably similar secondary structures, with all models containing helices a, b, c, x, y, and z in the same positions and of the same lengths. We therefore hypothesize that all SLH domains of S-layer proteins in B. anthracis form a three-prong spindle structure and bind to the SCWP in a manner similar to the SLH domains of Sap. Analyzing the predicted secondary structure of SLH domains from S-layer proteins of diverse microbes, Engelhardt and Peters predicted that SLH domains may assume a similar structure (40); their prediction is corroborated by the results presented in Fig. 7. Structural coordinates for Sap SLH have been deposited in the Protein Data Bank under PDB ID 3PYW.

DISCUSSION
Binding of SLH domains to pyruvylated secondary cell wall carbohydrates is thought to be an ancestral mechanism for anchoring S-layer proteins to the bacterial envelope (41). B. anthracis elaborates S-layers from two secreted polypeptides with N-terminal SLH domains, Sap and EA1 (22,42). The SLH domains of Sap and EA1 bind to pyruvylated SCWP, a carbohydrate that is tethered via peptidoglycan linkage units to the murein sacculus of this microbe (20). Sap and EA1 are synthesized by bacilli in great abundance and form two-dimensional para-crystalline arrays on solid surfaces, a property encoded in their large C-terminal crystallization domains (25). Twenty-two B. anthracis S-layer-associated proteins (BSLs) also harbor N-terminal SLH domains (19). Unlike Sap and EA1, BSLs are minor components of the envelope that use their SLH domains to engage SCWP but do not form para-crystalline arrays (19,20). One of the S-layer-associated proteins, BslA, promotes the adhesion of bacilli to host tissues and is required for the pathogenesis of anthrax (7,19). The S-layer-associated protein BslK binds hemin, thereby contributing to an iron-uptake pathway that retrieves heme from hemoglobin for transport across the many layers of the B. anthracis envelope (6,43). The AmiA protein contains three tandem SLH domains and is a peptidoglycan hydrolase (44). The molecular functions of the other S-layer proteins are not yet known; however, a set of eight, including AmiA, are putative hydrolases that may shape the peptidoglycan layer of bacilli.
Here we used x-ray crystallography to reveal the three-dimensional structure of the SLH domains of Sap, whose overall shape resembles a three-prong spindle. Molecular modeling experiments suggest that the SLH domains of other proteins assume a similar structure. Further, our models suggest that the inter-prong grooves of these domains can accommodate both linear and branched carbohydrates, i.e. the secondary cell wall polysaccharides that are known to function as the docking platform for the assembly of S-layers. Our experiments identify conserved residues in the SLH domains of Sap and other S-layer proteins of bacilli (Phe 34, Asp 36, Trp 42, and Arg 72), and we show that two of these, Asp 36 and Arg 72, contribute to their interaction with SCWP.
Cholera toxin is another secreted bacterial polypeptide that binds to carbohydrates, specifically the GM1 glycosphingolipid of host cell membranes (45). Cholera toxin B assembles into a pentameric ring (46) and docks onto the host cell receptor such that subunit interfaces capture the glycosphingolipid ligand (47). Choleragenoid, the toxin B subunit alone, is used as a vaccine antigen to elicit protective antibodies that prevent association of the toxin with its GM1 receptor. A similar strategy might be used to prevent S-layer assembly and could provide protection against B. anthracis or other infectious agents that use S-layer proteins in the pathogenesis of their associated diseases (16).