The Crystal Structure of the Bacillus anthracis Spore Surface Protein BclA Shows Remarkable Similarity to Mammalian Proteins*

The lethal disease anthrax is propagated by spores of Bacillus anthracis, which can penetrate into the mammalian host by inhalation, causing a rapid progression of the disease and a mostly fatal outcome. We have solved the three-dimensional structure of the major surface protein BclA on B. anthracis spores. Surprisingly, the structure resembles C1q, the first component of complement, despite there being no sequence homology. Although most assays for C1q-like activity, including binding to C1q receptors, suggest that BclA does not mimic C1q, we show that BclA, as well as C1q, interacts with components of the lung alveolar surfactant layer. Thus, to better recognize and invade its hosts, this pathogenic soil bacterium may have evolved a surface protein whose structure is strikingly close to a mammalian protein.

Anthrax is a disease that was first described over a century ago and one of the first diseases for which a microbial cause was also determined. The infection is propagated by spores of Bacillus anthracis, which can penetrate into the mammalian host by ingestion, inoculation, or inhalation. Inhalation is the most severe mode of infection, with rapid progression of the disease and a mostly fatal outcome (1). The spore is a dormant cell form of some microorganisms characterized by high resistance to chemicals and to environmental factors, such as heat and drought, which allows the organism to persist in the environment for decades. After penetrating host tissues, B. anthracis spores germinate into the vegetative, lethal form.
The early stages of B. anthracis infection and the role of the host defenses in the lung are still unclear. A study by confocal microscopy (2) shows that spore germination takes place within the macrophages, where the bacilli start producing toxins, including the lethal factor, protective antigen, and edema factor. Production of these toxins leads to macrophage apoptosis and the release of vegetative bacilli into the bloodstream, causing septicemia and death (3). On the other hand, macrophages appear to protect animals against infection due to inhalation of the spores by eliminating them (4). Another protective barrier that B. anthracis spores will encounter upon inhalation by the host is the epithelial surface of lung alveoli, which is coated with a thin layer of pulmonary surfactant. Pulmonary surfactant is a lipid-protein complex that plays two roles: a mechanical role, reducing the surface tension in the alveolus (which prevents its collapse during expiration), and an important role in lung defense against infections (5). This immune function is attributed to surfactant proteins (SPs)² SP-A and SP-D, two hydrophilic members of the collectin family that can interact with pathogen surfaces, promote their destruction, and thus prevent infections (6). The two other surfactant proteins, SP-B and SP-C, are highly hydrophobic, the latter having been shown to bind bacterial lipopolysaccharides and modulate their cellular effects (7,8).
The B. anthracis spore is encased in a thick multilayered coat, which is further surrounded by the exosporium. Electron microscopy has revealed that the exosporium is composed of a paracrystalline basal layer with a hexagonal lattice structure and a hair-like outer region (9). Isolated B. anthracis exosporia contain at least 12 major protein components (10), one of the most prominent of which was named BclA (for Bacillus collagen-like protein of anthracis) (11), encoded by the bclA gene. BclA is highly glycosylated (12), is the immunodominant antigen of the spore surface (13), and was shown to be the structural constituent of the exosporium filaments (11). The length of bclA differs between strains of B. anthracis, encoding proteins of different sizes. All of these proteins possess an internal collagen-like region of GXX repeats of variable length (from 17 to 91 GXX repeats), which includes a large proportion of GPT triplets (11). It has been shown that the length of the BclA collagen-like region is responsible for the variation in filament length (14,15). Collagen-like sequences are rare in microorganisms but have been identified in some bacteria and viruses (16). Their function is generally not clear, but they are found in surface proteins. Bacillus cereus and Bacillus thuringiensis, species close to B. anthracis, both possess a protein related to BclA in their exosporia (17). There is no sequence similarity, however, between the C-terminal portion of BclA and proteins of known structure.
To obtain insight into the function of this abundant surface protein, we have determined the crystal structure of a recombinant form of BclA. While the collagen-like N-terminal third of the sequence is disordered in the crystal lattice, the structure of the C-terminal two-thirds of the protein is remarkably similar to that of the globular domain of human C1q (18), the first component of complement, which belongs to the tumor necrosis factor (TNF)-like family of proteins.

MATERIALS AND METHODS
Crystal Structure Determination-Recombinant BclA preparation, purification, and crystallization have been described elsewhere (19). The structure was determined using a single heavy atom derivative with data from a laboratory source (19). High resolution data were collected on the ID14-2 beamline at the European Synchrotron Radiation Facility equipped with an ADSC Q4 detector using a 0.933 Å wavelength. Data reduction was performed using the XDS program package (20). Further data treatment was carried out with programs from the CCP4 suite (21), with the CCP4 version of REFMAC used for refinement.
Circular Dichroism Spectroscopy-Far UV (190-250 nm) circular dichroism measurements were carried out using thermostated 0.2 mm path length quartz cells in a Jobin-Yvon CD6 instrument, calibrated with aqueous d-10-camphorsulfonic acid. BclA (2.9 mg/ml, determined by absorbance at 280 nm based on a theoretical absorbance of 0.120 at 1 mg/ml) was analyzed in 50 mM Tris-HCl, pH 8.0. Spectra were measured with a wavelength increment of 0.2 nm, integration time of 1 s, and a band pass of 2 nm. For stability studies, the signal at 221 nm was recorded as a function of temperature within the range 11-45°C at a heating rate of 19°C/h.
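The text does not state how a transition midpoint is read from the temperature scan. As an illustration only, the following minimal sketch estimates the midpoint of a single sigmoidal transition as the temperature at which the signal crosses halfway between its first and last values. The function name `melting_midpoint` and the synthetic trace (a two-state curve centered at 18°C) are assumptions made for the example and are not measured data.

```python
import math

def melting_midpoint(temps, signal):
    """Return the temperature at which the signal crosses the value halfway
    between its first and last points, using linear interpolation."""
    half = (signal[0] + signal[-1]) / 2.0
    for i in range(len(temps) - 1):
        s0, s1 = signal[i], signal[i + 1]
        # A sign change of (signal - half) marks the crossing interval.
        if (s0 - half) * (s1 - half) <= 0 and s0 != s1:
            t0, t1 = temps[i], temps[i + 1]
            return t0 + (half - s0) * (t1 - t0) / (s1 - s0)
    raise ValueError("no transition found in the scanned range")

# Hypothetical trace: fraction of triple helix from 11 to 45 degrees C
# in 0.5 degree steps, generated from a two-state model (not real data).
temps = [11 + 0.5 * i for i in range(69)]
signal = [1.0 / (1.0 + math.exp((t - 18.0) / 0.8)) for t in temps]

tm = melting_midpoint(temps, signal)
print(f"estimated midpoint: {tm:.2f} C")
```

A real trace would usually need baseline correction before the midpoint is taken; the sketch assumes flat baselines on both sides of the transition.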
Isolation of Surfactant Proteins-Human SP-A was purified from bronchoalveolar lavage from patients with alveolar proteinosis. After isolation of the surfactant by centrifugation, SP-A was extracted using sequential butanol and octyl glucoside extractions (22). Porcine SP-B and SP-C were prepared by an adaptation (23) of the method used by Curstedt et al. (24).
Binding to Surfactant Components-Solutions (100 µl) of hydrophobic surfactant constituents (phosphatidylcholine from egg yolk (Sigma), SP-C, or SP-B) dissolved in chloroform:methanol (1:1 by volume) were dispensed into wells (~4 mg/well) of solvent-resistant (polypropylene) microplates and evaporated under vacuum. Solutions of the hydrophilic surfactant constituent SP-A (100 µl, 40 mg/ml, in a carbonate/bicarbonate pH 9.6 buffer) were incubated overnight at 4°C in wells of polystyrene microplates (Immunolon 1; Dynatech). All plates were saturated by overnight incubation at 4°C with 200 µl of 2% bovine serum albumin (Sigma) in Tris buffer (5 mM with 150 mM NaCl, pH 7.4). After washing, the polypropylene and polystyrene plates were incubated overnight at 4°C or 3 h at 37°C, respectively, with solutions (100 µl) of radiolabeled ligands (¹²⁵I-BclA in Tris buffer plus 0.05% bovine serum albumin, ¹²⁵I-C1q in Tris buffer plus 0.2% bovine serum albumin, or ¹²⁵I-SP-A in Tris buffer plus 0.2% bovine serum albumin containing 2 mM CaCl₂). Plates were then washed extensively, and the remaining bound radioactivity was recovered with 10% SDS and measured.

RESULTS
Characterization of the Recombinant BclA Protein-BclA was used in all experiments as the recombinant histidine-tagged species. Its theoretical molecular mass is 21,389 Da (220 residues). When loaded onto an SDS-polyacrylamide gel, however, BclA migrates as a 29-kDa polypeptide, which could be due to its unusual collagen-like sequence, with a high proline content (14%). The identity of the recombinant protein was therefore ascertained using mass spectrometry. A molecular mass of 21,270.8 Da was determined, compatible with the theoretical value of 21,258 Da after cleavage of the initial methionine (range of error 0.05-0.1%). An identical value was obtained for BclA recovered from washed and dissolved crystals.
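The mass comparison above can be checked with a few lines of arithmetic. The sketch below uses only the three masses quoted in the text; the average methionine residue mass (131.19 Da) is a standard textbook value assumed here, not a number given in the paper.

```python
# Masses quoted in the text, in Da.
theoretical_full = 21389.0    # full-length recombinant BclA (220 residues)
theoretical_no_met = 21258.0  # after cleavage of the initial methionine
observed = 21270.8            # mass spectrometry result

# Assumed textbook value: average mass of one methionine residue, in Da.
MET_RESIDUE_MASS = 131.19

# Mass lost on removing the initial methionine, as implied by the text.
delta_met = theoretical_full - theoretical_no_met

# Deviation of the measurement from the processed theoretical mass.
deviation = observed - theoretical_no_met
relative_error_percent = 100.0 * deviation / theoretical_no_met

print(f"mass lost with initial Met: {delta_met:.1f} Da "
      f"(Met residue: {MET_RESIDUE_MASS} Da)")
print(f"observed - theoretical: {deviation:.1f} Da "
      f"({relative_error_percent:.3f}%)")
```

The relative deviation of about 0.06% falls inside the quoted error range, which is what makes the two values compatible.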
The protein crystallized in space group P6₃22 (a = 67.9 Å, c = 163.3 Å) with one molecule in the asymmetric unit. The crystals diffracted to >1.8 Å on a laboratory source and to 1.4 Å at the synchrotron (Table 1), yet the electron density revealed only two-thirds of the expected protein chain, although the entire protein was present in the crystals. The N-terminal third of the protein, consisting of the collagen-like region, is disordered (19). We performed a circular dichroism study of the protein to determine whether the collagen triple helix can form in recombinant BclA under certain conditions. When measured at room temperature, the spectrum of BclA resembled that of a β-sheet-containing protein with no sign of a collagen triple helical conformation (not shown). On cooling to 11°C, however, a change in conformation occurred, with a marked decrease in ellipticity at 195 nm and an increase in ellipticity in the region of 210-230 nm (not shown). Such features are characteristic of collagen triple helix formation. When the ellipticity at 221 nm was monitored as a function of temperature, after pre-equilibration at 11°C for 1 h, a sharp transition was detected at ~18°C (Fig. 1). This is typical of collagen triple helix unfolding, which for mammalian collagens normally occurs around body temperature (26). We conclude that the collagen triple helix in BclA can form at low temperature but not at the temperature used for crystallization.
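The statement that the asymmetric unit holds one molecule can be illustrated with a Matthews coefficient estimate. This calculation is not reported in the text; the sketch below is a consistency check under stated assumptions: the hexagonal cell volume formula V = a²c·sin(120°), 12 asymmetric units per cell for space group P6₃22, the processed theoretical mass of 21,258 Da, and the conventional factor 1.23 for converting the Matthews coefficient into a solvent fraction.

```python
import math

# Unit cell parameters from the text (hexagonal, angstroms).
a = 67.9
c = 163.3

# Assumptions for this sketch (not stated in the text):
ASYM_UNITS_PER_CELL = 12   # general positions in space group P6(3)22
MOLECULAR_MASS = 21258.0   # Da, theoretical mass without the initial Met
SOLVENT_FACTOR = 1.23      # conventional constant, cubic angstroms per Da

def matthews_coefficient(molecules_per_asu):
    """Crystal volume per unit of protein mass, in cubic angstroms per Da."""
    cell_volume = a * a * c * math.sin(math.radians(120.0))
    n_molecules = ASYM_UNITS_PER_CELL * molecules_per_asu
    return cell_volume / (n_molecules * MOLECULAR_MASS)

vm_one = matthews_coefficient(1)
vm_two = matthews_coefficient(2)
solvent_one = 1.0 - SOLVENT_FACTOR / vm_one

print(f"V_M, one molecule per asymmetric unit: {vm_one:.2f} A^3/Da")
print(f"V_M, two molecules per asymmetric unit: {vm_two:.2f} A^3/Da")
print(f"estimated solvent fraction (one molecule): {solvent_one:.2f}")
```

With one molecule the value lies in the range commonly observed for protein crystals, whereas two molecules would pack implausibly tightly.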
BclA Structure-The C-terminal two-thirds of the protein chain visible in the electron density fold into an all-β structure with a jelly roll topology (Fig. 2). A search for similar folds in structural databases revealed a strong structural relationship with the TNF-like family of proteins: ectodysplasin Eda-A1 with DALI (27) (Z-score 16.1) and C1q with DEJAVU (28) (DALI Z-score 14.6). The twelve β-strands, numbered sequentially A, A″, A′, B′, B, C, D, E, F, G, G′, and H, according to the TNF-like family conventions, are connected with the Greek key topology and form two β-sheets, each containing five strands, A′AHCF and B′BGDE, respectively. The main difference from other members of the TNF-like family is the presence of the extra A″ and G′ strands that form a two-strand antiparallel β-sheet, packing together the AA′ and GH loops. Three monomers related by crystallographic symmetry form a tight, globular trimer of ~50 Å in diameter. The total surface buried by the trimer formation is 1411 Å²/monomer, with two small cavities filled with solvent molecules at the center. The C and N termini come together on the bottom side of the trimer, with the N-terminal residues of the non-collagenous part intertwining and then pointing outwards (Fig. 3). The visible electron density is of excellent quality (19), except for the very N- and C-terminal residues, where for the latter, only the first two His residues of the affinity tag are visible. Seven residues/monomer have two alternate conformations, four that lie on the surface of the trimer close to the crystallographic 2-fold axis and three that lie at the trimer interface within one of the two cavities.
BclA trimer assembly is very similar to other TNF-like trimers. Because of the shorter protruding loops, the overall shape of the BclA trimer is somewhat more globular than that of TNF, and it is slightly smaller than the globular head of C1q (by ~5 Å in diameter). Upon trimer formation, the buried surface of 4234 Å² is less than the equivalent surface areas in C1q (5490 Å²) or collagen VIII NC1 (6150 Å²). This is due partly to its slightly smaller overall size but principally to the existence of a solvent-filled cavity within the trimer center.
We do not see any cations (e.g. calcium) bound to the protein; indeed, there are no acidic residues in the region corresponding to the environment of the calcium sites in collagen X or C1q (which are nearly equivalent). On the other hand, in the high resolution electron density map, we have located a solvent ion that we have interpreted as cacodylate sequestered from the crystallization buffer (a significant anomalous peak is visible when maps are calculated with data collected at a wavelength of 0.933 Å but none in maps from the 1.542 Å data). There are no other ions bound to the ordered part of the structure. The high resolution structure of the collagen VIII NC1 trimer does not show any bound calcium ions, because it lacks the appropriate ligands (29), but there is a sulfate ion within the inner solvent-filled cavity. When the two structures are superimposed, the cacodylate and sulfate ions lie within 8 Å of each other in similar environments. They lie at the bottom of the solvent channel, which in both cases is lined with hydrophobic residues. Whether this capacity to sequester ions from the surrounding solvent plays a functional role remains an open question for both proteins.
Structural Alignment of BclA with TNF-like Proteins-It has already been pointed out that the non-collagenous part of collagen VIII and X, the globular domain of ACRP30, and the globular domain of C1q (gC1q) belong to the TNF-like family of proteins (30). The TNF-like family contains proteins sharing the same jelly roll fold, usually organized as trimers but with very diverse functions, from cytokine activity (TNF-α), metabolism regulation (ACRP30), and connective tissue organization (collagen VIII and X) to innate immunity (C1q). Despite the very low sequence identity of the C-terminal part of BclA with C1q, collagen VIII, or TNF (23, 25, and 11% identity, respectively), the three-dimensional structure of BclA is very similar to a TNF-like domain.
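Percent identity figures such as those quoted above depend on how the value is computed from an alignment. The text does not specify the convention used, so the following sketch shows one common choice as an assumption: identical residues divided by the number of positions where both sequences have a residue (gap positions ignored). The function name `percent_identity` and the two short aligned fragments are hypothetical and are not taken from BclA or C1q.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns in which both sequences carry a residue.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != gap and y != gap]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)

# Hypothetical aligned fragments, for illustration only.
fragment_a = "GTY-SFKV"
fragment_b = "GAYLS-RV"

identity = percent_identity(fragment_a, fragment_b)
print(f"identity: {identity:.1f}%")
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, which gives lower values for the same alignment.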

The superimposed structures were used to make a structure-based alignment of BclA with known TNF-like structures (supplemental Fig. 1A). The alignment shows that the residues most conserved between members of the TNF-like family are also conserved in BclA; among residues Gly-159, Tyr-161, Phe-237, and Leu-242 (ACRP30 numbering), Gly and Tyr are conserved in BclA, Phe is replaced by an Ile with its side chain occupying the same position, and Leu is replaced by Val. Analysis of the alignment with ConSurf (35) shows that the most conserved residues are either buried or involved in trimer formation, whereas the most variable residues map to the exposed surface of the trimer. Three residues on the surface of the trimer are moderately conserved. Gln-113, which corresponds to Asn in C1q and the collagens, lies close to Asn-181 (Asn in C1qA) of the next monomer within the trimer, close to its base. Ser-166 (Cys in all C1q subunits) lies close to the top end of the BclA trimer (supplemental Fig. 1B).
Most of the differences are located in the loops connecting β-strands. Compared with TNF-like proteins recognized by specific TNF receptors (such as TNF-α and CD40L), BclA has shorter CD, DE, and EF loops, which were shown to be crucial for the binding of the ligand to its receptor. BclA does not have the exposed aromatic amino acids that are characteristic of the short chain collagens VIII and X and that may be important for lateral interactions of the collagen molecules. Thus, on the basis of the length of the loops and the amino acid content, BclA is closer to C1q than to collagens or to TNF-like proteins.
Search for a C1q-related Function-C1q plays a key role in innate immunity by recognition of immune complexes and the initiation of the classical complement pathway (36). We therefore tried to ascertain whether BclA might behave like C1q in this respect. We failed to show any significant interaction with immunoglobulin G by enzyme-linked immunosorbent assay and by surface plasmon resonance (data not shown). BclA did not provoke lysis of activated red blood cells either.³ More recently it has been recognized that C1q can be directly involved in the triggering of defensive cellular functions, such as chemotaxis, release of cytokines, phagocytosis, and cytotoxicity (37). These are mediated by specific receptors present on the effector cell surface (38,39), whose precise mechanisms have not yet been elucidated. Binding assays with calreticulin (cC1qR, receptor of the collagen-like domain of C1q) and with p33/gC1qR (receptor for the globular domain of C1q) were, however, also negative. Furthermore, unlike recombinant TNF-α, BclA fails to induce secretion of interleukin-8 by lung epithelial A549 cells (data not shown).
Binding of BclA to Lung Alveolar Surfactant Components-We next analyzed the binding of ¹²⁵I-BclA to plates coated with phosphatidylcholine, the most abundant lipid in the surfactant, with SP-B and SP-C, its two hydrophobic protein components, and with SP-A, its most abundant hydrophilic constituent. The results show that, among all these molecules, only SP-C interacts with ¹²⁵I-BclA (Fig. 4a) and that this interaction is dose-dependent (Fig. 4b). Interestingly, ¹²⁵I-C1q also binds to plates coated with SP-C (Fig. 4c), thus suggesting that BclA and C1q recognize common targets in the alveolar compartment. In contrast, the collectin ¹²⁵I-SP-A did not interact with plates coated with SP-C in the presence of calcium (data not shown).

DISCUSSION
We have solved the structure of the C-terminal part of BclA, the major component of the exosporium of the B. anthracis spore. The length of the fibers composing the hair-like surface of B. anthracis exosporium is related to the number of collagen-like repeats in BclA (14), yet the role of the collagen-like sequence is unclear. Unfortunately, we do not see well defined electron density for this part of the molecule, despite the presence of the entire N-terminal polypeptide in our crystals. Indeed, the fact that the monomers in the trimer are related by strict crystallographic symmetry is incompatible with the formation of an ordered collagen-like triple helix (19). We do not see even disordered electron density beyond Gly-80, however, and we therefore assume that the collagen-like N-terminal sequence is randomly oriented. Recombinant BclA may indeed have a less stable collagen part, because it was shown that glycosylation on Thr residues increases the stability of worm collagen (40) as well as synthetic (GPT)₁₀ (41) triple helices. Our thermal denaturation curves confirm that the temperature of the crystal growth (20-21°C) is above the melting temperature (Tm) of the recombinant BclA, and thus the triple helix would be unstable even in solution. The C-terminal part of the protein forms a much more stable trimer, which was recently shown to crystallize spontaneously (15). Interestingly, the same authors measured a Tm of 37°C for recombinant (i.e. non-glycosylated) BclA from a different anthrax strain containing 76 GXX repeats compared with 21 in our protein. In contrast, (GPT)₁₀ does not form a triple helix even at 5°C (41). It is known that the stability of collagenous triple helices increases as a function of the number of GXX triplets (42), and our results fit within this pattern.
³ V. Frémeaux-Bacchi, personal communication.
BclA was identified as the immunodominant protein of the B. anthracis exosporium (13). Although BclA is highly glycosylated and carries novel sugar moieties (12), the authors found that the antibodies raised against the exosporium were specific for the protein moiety of BclA and not the carbohydrate. A recent paper (15) has convincingly shown that the BclA molecules lie with the globular C-terminal part on the exterior and the collagen-like N terminus anchoring the protein within the exosporium base. It is therefore the globular domain of BclA that is immunogenic, and thus this exposed part of the protein is in contact with the environment as well as any host tissues. BclA does not seem to be essential for spore germination, because bacilli lacking BclA can produce spores and germinate (11). A striking feature of the TNF-like trimer is its stability; indeed, the trimeric globular head is thought to be the nucleation site for the formation of the collagen triple helix in several mammalian collagens (43). The trimer interface in the BclA structure shows surprising sequence similarity with TNF family proteins and has been shown to confer exceptional stability on the entire protein (15). Thus one of its roles would certainly appear to be that of a robust protective shield for the bacterial spore.
To date, very few bacterial proteins have been shown to possess TNF-like topology. The superantigen YPMa from Yersinia pseudotuberculosis has structural homology with TNF (44) but does not form trimers under physiological conditions. TNF-like topology is more frequently encountered in viral capsid proteins, a similarity noted some time ago (45,46). Although the YPMa protomer shows a higher structural similarity with viral proteins, BclA resembles the TNF superfamily proteins much more (DALI Z-score between 12 and 16 against only 5-6 for the viral capsid proteins). The resemblance is even more striking when the entire trimer is taken into account (Fig. 3). The trimers formed by most viral capsid proteins are fairly planar, with a rather variable interface (see e.g. Fig. 4 in Ref. 44). Very recently, a high resolution structure of a bacteriophage (PRD1) spike protein was solved, showing a remarkable similarity to the TNF superfamily of proteins, including the formation of TNF-like trimers (47). This protein consists of an N-terminal "shaft" and a C-terminal head domain. The structure is clearly similar to that of adenovirus and reovirus spike proteins, with a conserved assembly of the shaft domains and a similar assembly of the head domains. Yet only the PRD1 head domain structure superimposes well on the TNF-like family fold. Interestingly, the spike proteins of all three viruses are essential for virus attachment to host cells. Of all the microorganism proteins studied so far, the structure of the BclA trimer remains, however, the closest to the mammalian TNF-like family of proteins and in particular to C1q.
Our results show furthermore that BclA, as well as C1q, interacts with the extracellular molecule SP-C, which is a major component of pulmonary surfactant. This interaction suggests that BclA plays a role in the mechanism of B. anthracis adhesion to certain hydrophobic milieux in the host. Our observation that C1q behaves in a similar manner might suggest another facet of C1q-mediated recognition and a rationale for the exceptionally strong structural resemblance with BclA. The propensity for BclA to interact with hydrophobic components may suggest a more general role in host recognition. Indeed, spores lacking BclA appear to adhere more to various cell surfaces.⁴ Within the Bacillus genus, only B. cereus and B. thuringiensis, species close to B. anthracis, possess exosporia and a BclA protein with a highly conserved C-terminal domain on their surface (see the UNIPROT database at www.ebi.ac.uk). All three species are pathogenic soil bacteria that require specific conditions, such as those encountered in an animal host, to germinate (48). In their natural environment, the spores therefore need to be able to adhere selectively to potential hosts but not to soil particles. BclA could thus help the spores in attaining their correct target.