STRUCTURAL INSIGHTS INTO THE INHIBITION OF BACILLUS ANTHRACIS SPORULATION BY A NOVEL CLASS OF NON-HEME GLOBIN SENSOR DOMAINS

From the Infectious & Inflammatory Disease Center, Sanford-Burnham Medical Research Institute, La Jolla, CA 92037, USA; The Scripps Research Institute, Department of Molecular and Experimental Medicine, La Jolla, CA 92037, USA; and the Institute of Biochemistry and Biophysics PAS, Pawinskiego 5A, 02-106 Warsaw, Poland
Running Title: Structure of a non-heme globin sensor domain
Address Correspondence to: Robert C. Liddington, Sanford-Burnham Medical Research Institute, 10901 North Torrey Pines Road, La Jolla, CA 92037, USA. Tel: 858-646-3136 Fax: 858-7955225 Email: rlidding@sanfordburnham.org


Pathogenesis by Bacillus anthracis requires coordination between two distinct activities: plasmid-encoded virulence factor expression (which protects vegetative cells from immune surveillance during outgrowth and replication), and chromosomally-encoded sporulation (required only during the final stages of infection). Sporulation is regulated by at least 5 sensor histidine kinases that are activated in response to various environmental cues. One of these kinases, BA2291, harbors a sensor domain that has ~35% sequence identity with 2 plasmid proteins, pXO1-118 and pXO2-61. Since over-expression of pXO2-61 (or pXO1-118) inhibits sporulation of B. anthracis in a BA2291-dependent manner, and pXO2-61 expression is strongly upregulated by the major virulence gene regulator, AtxA, it was suggested that their function is to titrate out an environmental signal that would otherwise promote untimely sporulation. To explore this hypothesis, we determined crystal structures of both plasmid-encoded proteins. We found that they adopt a dimeric globin fold, but, most unusually, do not bind heme. Instead, they house a hydrophobic tunnel and hydrophilic chamber that are occupied by fatty acid, which engages a conserved arginine and chloride ion via its carboxyl head-group. In vivo, these domains may therefore recognize changes in fatty acid synthesis, chloride ion concentration, and/or pH. Structure-based comparisons with BA2291 suggest that it binds ligand and dimerizes in an analogous fashion, consistent with the titration hypothesis. Analysis of newly sequenced bacterial genomes points to the existence of a much broader family of non-heme, globin-based sensor domains, with related but distinct functionalities, that may have evolved from an ancestral heme-linked globin.
Fully virulent B. anthracis carries two large plasmids, pXO1 and pXO2, that are responsible for the production of its major virulence factors, anthrax toxin and the poly-γ-D-glutamic acid capsule (1)(2)(3).
Following host-triggered germination of B. anthracis spores, toxin and capsule enable the bacilli in their vegetative form to evade the host's immune system and replicate rapidly in the lymphatic system and bloodstream. If the infection is not treated at an early stage, toxemia and septicemia leading to host death may rapidly follow (4)(5)(6)(7).
Sporulation is required for long-term survival of B. anthracis following host death, but the process must be carefully coordinated with toxin expression and the progress of the infection, since sporulating cells (in contrast to encapsulated vegetative cells) are susceptible to host defenses. Given that sporulation is directed primarily by chromosomal genes, close coordination with the regulatory elements encoded by plasmid genes is required for effective pathogenesis.
Indeed, there is much evidence for such "cross-talk", although the overall picture is far from clear (reviewed in (8)). For example, the pXO1-encoded protein, AtxA, regulates synthesis of toxin (also pXO1-encoded) as well as capsule (pXO2-encoded) (9)(10)(11)(12). AtxA is also part of a regulatory network for S-layer synthesis (a further defensive layer distinct from the capsule and peptidoglycan cell wall), a task performed by chromosomal proteins (13). In turn, the synthesis of AtxA is regulated by the chromosomally-encoded transcription factor Spo0A, the master regulator of sporulation (14,15), which completes a regulatory link between sporulation and virulence factor expression.
Spo0A is activated by phosphorylation to induce or repress transcription of genes required or not required for sporulation, respectively (16). The regulatory pathway controlling Spo0A is more complex than most two-component signal transduction systems. In B. anthracis this pathway includes at least 5 (chromosomally-encoded) sensor histidine kinases that are capable of inducing sporulation (17), as well as several aspartyl phosphatases, one of them encoded by the pXO1 virulence plasmid, that inhibit sporulation (18,19).
Sensor histidine kinases are the primary sensors of environmental cues, and form a large family of signaling proteins in both Gram-positive and Gram-negative bacteria. They have a modular architecture comprising at least a "sensor" domain and a catalytic domain (which includes the phosphorylatable domain DHp and the ATP-binding domain) that autophosphorylates on a histidine residue in response to sensor domain activation. Phosphohistidine is a high-energy species that transfers its phosphoryl group to an aspartic acid residue on the downstream effector (for reviews see (20)). BA2291 is one of the most active kinases in promoting sporulation in B. anthracis (17). It also appears to be unique in using GTP rather than ATP as its energy source for phosphorylation (21), and orthologs are found in most members of the B. cereus group (a subfamily of the genus Bacillus, which includes B. anthracis and B. thuringiensis (an insect pathogen), but not B. subtilis). Two plasmid-encoded proteins, pXO1-118 and pXO2-61, express "sensor-only" domains that share ~35% sequence identity with BA2291 (22) and are only found in B. anthracis and certain strains of B. cereus that harbor similar plasmids. The pXO1-118 gene lies next to and is divergently transcribed from the atxA gene, while pXO2-61 lies within the region directing capsule synthesis. A microarray study found that transcription of pXO2-61 was strongly upregulated when the atxA gene was present (12), and overexpression of pXO2-61 was found to reduce sporulation of B. anthracis in a BA2291-dependent manner (22). Expression of BA2291 in a B. subtilis model induced sporulation when expressed at lower copy levels, and this was repressed by co-expression of either pXO1-118 or pXO2-61. However, higher levels of BA2291 expression led to repression of sporulation, suggesting that when the activating signal is in limited supply, the kinase activity is reversed, and BA2291 acts as a phosphatase.
These observations led to a model in which the plasmid-encoded sensor domains modulate BA2291 activity by titrating the sporulation signal, thereby preventing premature sporulation. This model completes a regulatory link between virulence factor expression and sporulation (22).
To explore the molecular mechanisms of BA2291 regulation, and the nature of the sporulation signal, we determined the crystal structures of pXO1-118 and pXO2-61 at high resolution. We show that they adopt a dimeric globin fold, but, most unusually, do not bind heme. We demonstrate that they bind fatty acid and a halide ion (most likely chloride) in a central cavity, and show that the key structural features of ligand recognition and dimerization are conserved in a large family of kinases found in Bacilli and related species. This suggests that they recognize the same environmental cue(s), and supports the titration mechanism for the sensor-only domains. Recognizable homologs are also found in a number of distinct bacterial phyla, and our analysis points to an evolution of the globin family into a "heme-free" group of versatile environmental sensors.

EXPERIMENTAL PROCEDURES
Cloning, expression and purification-The plasmid for E. coli over-expression of B. anthracis ORF118 was obtained by cloning the PCR-amplified coding sequence using oligonucleotides BaORF1185'Nde (5'-GAGTGGACATATGGAAGCAACAAAACG-3') and BaORF1183'Bam (5'-CTATAGGATCCAAAAATTTCAAGGTG-3') into plasmid pET28a (Stratagene) digested with NdeI and BamHI. The coding sequence for amino acids 1 to 146 of BA2291 was amplified using oligonucleotides BaKin5'Nde (5'-TATTCGTCATATGGAAATGGAGGGAATG-3') and BaKinXho2 (5'-TTTCCCTCGAGTTTTATAATATAATTTCCGAGTAT-3') and the fragment was cloned in pET28a digested with NdeI and XhoI. In the case of pXO2-61, a synthetic gene was purchased from GenScript Co., NJ, USA and subcloned in pET28 as described for pXO1-118. Expression was obtained in E. coli BL21(DE3) grown in LB medium after induction with 0.1 mM IPTG for 4 h at 32°C. All proteins included a cleavable 6-His tag, and were purified from the soluble fraction by nickel affinity chromatography on a His trap chelating column (Pharmacia), followed by thrombin removal of the tag, and size exclusion on Superdex75 or Superdex200 columns (Amersham, Pharmacia). The apparent MW in each case was consistent with a dimer. pXO1-118 was stored frozen in 20 mM Tris HCl buffer pH 7.4, 1 M NaCl, 50 µM KCl, 5 mM dithiothreitol. pXO2-61 and BA2291 sensor domain required the addition of 500 mM NaCl to keep the proteins stable for freeze-thawing and long-term storage. For pXO1-118, selenomethionine-labeled protein was purified using a similar protocol, except that cells were grown in minimal media supplemented with SeMet (23). The MW of all proteins was confirmed by SDS-PAGE and MALDI-TOF mass spectrometry.
Crystallization, data collection, and structure solution-Native and selenomethionine-labeled pXO1-118 were crystallized by sitting or hanging drop vapor diffusion at room temperature by mixing 3 µl of precipitant solution (40% (v/v) PEG-300, 100 mM Tris-HCl pH 5.4, 5% (w/v) PEG-1000) and 3 µl of protein solution at 14 mg/ml. Rod-shaped crystals grew within 3 days. They belong to space group P3₂21 with cell dimensions a=89.9 Å, c=35.3 Å. One native and one SeMet data set at the Se absorption edge were collected at SSRL beamline 9-2 and NSLS beamline X26C, respectively, at 100 K. Diffraction images were processed and scaled with the HKL package (24). SOLVE (25) located four Se sites, leading to a set of initial phases. Density modification increased the figure of merit (FOM) to 0.60, and automatic model-building using RESOLVE (26) resulted in a 77% complete model. Further model-building was carried out in O (27), and the structure was refined with REFMAC5 (28) and simulated annealing using CNS (29). The final model for pXO1-118 contains a single domain (residues 1-150), three non-native N-terminal residues, one molecule of undecanoic acid, 95 water molecules and one Cl⁻ ion. The final Rwork = 0.181 and Rfree = 0.225 for data from 80-1.76 Å resolution. The dimer is generated by rotation of the monomer about a crystallographic dyad.
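As a rough plausibility check on the pXO1-118 crystal form, the unit cell volume and a Matthews-type packing estimate can be computed from the cell dimensions quoted above. The sketch below is illustrative only: the monomer mass is an assumption (153 residues at a nominal ~110 Da each, not a value reported here), Z = 6 is the number of general positions in the trigonal space group with one monomer per asymmetric unit, and the 1.23/V_M solvent approximation is the standard textbook estimate rather than anything derived in this study.

```python
import math


def trigonal_cell_volume(a, c):
    """Volume (cubic angstroms) of a cell with a = b and gamma = 120 degrees."""
    return a * a * c * math.sin(math.radians(120.0))


def matthews(volume, z, mw_asu):
    """Return (V_M in cubic angstroms per Da, approximate solvent fraction)."""
    vm = volume / (z * mw_asu)
    solvent = 1.0 - 1.23 / vm  # standard approximation for protein crystals
    return vm, solvent


# Cell dimensions from the text: a = 89.9 A, c = 35.3 A.
volume = trigonal_cell_volume(89.9, 35.3)

# ASSUMPTION: 150 native + 3 non-native residues at ~110 Da per residue.
ASSUMED_MONOMER_MW = 153 * 110.0

vm, solvent = matthews(volume, z=6, mw_asu=ASSUMED_MONOMER_MW)
print(f"cell volume = {volume:.0f} A^3")
print(f"V_M = {vm:.2f} A^3/Da, solvent ~ {solvent:.0%}")
```

With these assumptions the estimate falls within the range commonly seen for protein crystals, which is consistent with a single monomer in the asymmetric unit; a measured molecular mass should replace the nominal one for any quantitative use.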
pXO2-61 was crystallized by the microbatch method under paraffin oil. Crystals were obtained in 2 days using a buffer containing 1 M NaI, 20% (v/v) PEG 3350, 100 mM Tris-HCl pH 7.5. Crystals belong to space group P2₁2₁2₁ with unit cell a=44.1, b=62.6 and c=124.7 Å. Data were collected at the SSRL to 1.49 Å resolution, and processed with the HKL package (24). The structure was solved by molecular replacement using the refined pXO1-118 structure as the search model. Model building and refinement were carried out in O (27) and REFMAC5 (28). The final model has Rwork = 17.7% and Rfree = 20.9% for data from 60-1.49 Å resolution. The asymmetric unit contains two molecules forming a dimer (residues 5-136 for each molecule), 364 water molecules, 26 I⁻ ions and 8 Na⁺ ions. The solvent content is 56.7%. Data collection and refinement statistics are summarized in Table I. The stereochemical quality of both models was confirmed using PROCHECK (30). The coordinates and structure factors for pXO2-61 and pXO1-118 have been deposited with the PDB (codes 3pmc and 3pmd).
Gas Chromatography-Mass Spectrometry (GC-MS)-200 µl of chloroform was added to 0.1-1 ml of 10 mg/ml sensor domain in solution. The resulting two-phase system was sonicated for 10 min, incubated at 70°C for 1 h, and then centrifuged. The organic phase was separated by syringe. For pXO2-61, the carboxylic acid group of the fatty acid was verified by using 20 µl bis(trimethylsilyl)trifluoroacetamide and 20 µl pyridine incubated for 1.5 h at 65°C. Samples were evaporated to dryness under a stream of N₂ and reconstituted in 100 µl methylene chloride, prior to analysis with GC-MS (Scripps Center for Mass Spectrometry, CA, USA).
Isothermal Titration Calorimetry (ITC)-ITC was performed using a VP-ITC calorimeter from Microcal (Northampton, MA).
8 µl of fatty acid solution (1.6-2.6 mM) was injected into the cell containing 100 µM protein (pXO1-118 or pXO2-61) in 20 mM Tris pH 7.4 and either 500 mM or 1 M NaCl, respectively. All titrations were performed at 23°C, and each experiment involved 37 injections. Myristic acid (n-C14:0) and palmitic acid (n-C16:0) were purchased from Sigma-Aldrich, MO, USA. 12-methyltetradecanoic acid (anteiso-C15:0) and 13-methyltetradecanoic acid (iso-C15:0) were purchased from Indofine Chemical Co, NJ, USA. Palmitoleic acid was purchased from Fluka. Data were analyzed using Microcal Origin software provided by the manufacturer.
Analytical ultracentrifugation-Sedimentation equilibrium experiments were performed in a ProteomeLab XL-I (Beckman Coulter) AUC. Protein samples (pXO2-61 or BA2291 sensor domain) in 20 mM Pipes pH 7.5 and 500 mM NaCl were loaded at concentrations of 0.5, 0.167 and 0.056 mg/ml in 6-channel equilibrium cells, and spun in an An-50 Ti 8-place rotor at 25,000 rpm for 24 h at 20°C. Data were analyzed with HeteroAnalysis software (J.L. Cole and J.W. Lary, University of Connecticut). In each case, an ideal equilibrium model gave a convincing fit for a monomer-dimer equilibrium in solution, with dimer MW values of 32.5 kDa (pXO2-61) and 33.8 kDa (BA2291 sensor domain), no evidence for higher oligomers, and dimerization Kds of 0.65 µM and 0.83 µM, respectively.
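To illustrate what a sub-micromolar dimerization Kd implies at the loading concentrations used, the following minimal sketch solves the ideal monomer-dimer mass balance (2M <-> D, Kd = [M]²/[D]) for the fraction of chains present as dimer. It is not the HeteroAnalysis fitting procedure; the function names are hypothetical, the monomer mass is simply taken as half of the fitted pXO2-61 dimer MW, and the calculation ignores the radial concentration gradient that develops in the centrifuge cell.

```python
import math


def dimer_fraction(total_um, kd_um):
    """Fraction of chains in dimers for 2M <-> D with Kd = [M]^2 / [D].

    Mass balance: total = [M] + 2[D], so 2[M]^2/Kd + [M] - total = 0.
    Concentrations are in micromolar of monomer chains.
    """
    monomer = kd_um * (math.sqrt(1.0 + 8.0 * total_um / kd_um) - 1.0) / 4.0
    return 1.0 - monomer / total_um


def mg_per_ml_to_um(mg_per_ml, monomer_mw_da):
    """Convert mg/ml (equivalently g/L) to micromolar of monomer chains."""
    return mg_per_ml / monomer_mw_da * 1e6


# ASSUMPTION: monomer mass = half of the fitted pXO2-61 dimer MW (32.5 kDa).
MONOMER_MW = 32500.0 / 2.0
KD_UM = 0.65  # dimerization Kd reported for pXO2-61

for loading in (0.5, 0.167, 0.056):  # mg/ml, as loaded
    conc = mg_per_ml_to_um(loading, MONOMER_MW)
    print(f"{loading} mg/ml = {conc:.1f} uM -> "
          f"{dimer_fraction(conc, KD_UM):.0%} of chains in dimers")
```

Under this idealized model the protein is predominantly dimeric across the loaded range, yet enough monomer remains at the lowest concentration for the equilibrium to be detectable in the fit.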
Yeast Two-hybrid analysis-The yeast two-hybrid system (Clontech) was used to explore interactions between pXO1-118 and AtxA. Coding genes were singly cloned into the bait plasmid (pGBT9) and prey plasmid (pGAD424). Assays were performed in the yeast strain AH109. We detected interaction when pXO1-118 was present on both pGBT9 and pGAD424, consistent with homodimer formation in yeast cells. There was no evidence for AtxA-pXO1-118 interactions (data not shown).

RESULTS
Crystal and solution structures of pXO1-118 and pXO2-61. We solved the crystal structures of pXO1-118 and pXO2-61 at 1.76 Å and 1.49 Å resolution, respectively (see Experimental Procedures, Table I, Figs. 1 and 2). The asymmetric unit of pXO1-118 contains a single sensor domain which adopts the globin fold. Although it does not bind heme, we follow the standard nomenclature for hemoglobins: helices A, B, E, F, G, H are present in pXO1-118, while helices C and D are replaced by an ordered loop ("BE"). A dimer is formed across a crystallographic dyad, mediated by the packing of the G and H helices from apposing monomers, forming an antiparallel, left-handed four-helix bundle (Fig. 1) that buries a large interface (~3500 Å²).
As expected, the structure of pXO2-61 (Fig. 2) is very similar to that of pXO1-118 in both tertiary fold and dimer organization. The asymmetric unit in this case contains a dimer, and the monomers superpose with an RMS deviation of 0.43 Å for Cα carbons. The most obvious external difference arises from a shorter C-terminal helix, leading to a more compact shape and a decrease in the dimerization interface, to 2200 Å². The dimers of pXO1-118 and pXO2-61 superpose with an RMS difference of 1.2 Å for 264 Cα carbons (0.85 Å for pairwise comparisons of single domains).
By analytical ultracentrifugation (AUC) we found that pXO2-61 forms dimers in solution at micromolar and higher concentrations (dimer Kd = 0.6 µM). We also purified the sensor domain from BA2291, and found that it also forms dimers in solution, with a similar Kd (~0.8 µM) (Suppl. Fig. 1). In addition, we demonstrated that pXO1-118 forms dimers in vivo, as evidenced in a yeast 2-hybrid system (see Experimental Procedures).
The dimer interfaces of pXO1-118 and pXO2-61 include upper and lower hydrophobic regions that are more closely packed; while in the central section the helices diverge, and the interface is chiefly hydrophilic, comprising a large number of direct and water-mediated hydrophilic interactions (Fig. 1C). We found a total of 34 (pXO1-118) and 20 (pXO2-61) solvent molecules buried at distinct locations in this space. pXO1-118 has an additional interface formed by the ends of the longer C-terminal helices, which form a short antiparallel cross-over β-sheet with clusters of Tyr and Lys residues. Given the high homology between the BA2291 sensor domain and the plasmid domains (35% identity, 59% similarity) and the existence of a dimer in all cases, it is reasonable to assume that BA2291 will share a very similar tertiary and quaternary organization.
pXO1-118 and pXO2-61 are non-heme globins. Three-dimensional comparisons of the plasmid-encoded sensor domains using the DALI server (31) identified many structures with the globin fold that superpose with RMS differences in the 2.3-2.7 Å range for ~150 Cαs (Suppl. Table I). However, the PDB database contains no close sequence homologs (identities range from 8 to 14%). Moreover, most proteins with the globin fold have an Fe-heme cofactor sandwiched between the E and F helices. The most similar structures (in both tertiary and quaternary organization) are two bacterial (B. subtilis and Geobacter) heme-containing oxygen sensor domains, which form dimers via a similar pairing of the G and H helices. There is one globin structure that lacks co-factor altogether: the B. subtilis stress response regulator RsbR (32); in this case, the dimerization helices (G and H) bend inwards, eliminating the co-factor cavity. Mammalian hemoglobins have closely related tertiary folds, but only one, the recently discovered cytoglobin, has a related dimeric organization (33).
The packing in the dimeric bacterial hemoglobin from Vitreoscilla sp. (PDB 2VHB) (34) is more closely related to the mammalian hemoglobins.
The cyanobacterial phycocyanins also adopt a globin-like fold (35); they are electron transfer proteins involved in photosynthesis, and bind porphyrin-like moieties via cysteine-mediated thioether bonds. However, they have large N-terminal extensions, and their quaternary organization is quite distinct from any of the hemoglobins.
pXO1-118 and pXO2-61 contain a central cavity. An overlay of pXO1-118 and the B. subtilis oxygen-sensor domain (36) illustrates the overall similarity in secondary and tertiary folds (Fig. 3A) and shows how helix E of pXO1-118 rotates and bends in order to pack more closely against helix F, partially filling the space that is occupied by heme in the oxygen sensor. The short rigid BE loop (which replaces the CD helix/turn) packs against the FG turn, stabilizing this conformation (see below). Oxygen-binding hemoglobins are linked to the heme via a "proximal histidine" at the 8th position of helix F (F8). In pXO1-118 (and pXO2-61), there is no similarly located histidine, and, as expected, attempts to reconstitute the proteins with hemin were unsuccessful (data not shown). Remarkably, DALI-based structural alignment places pXO1-118 residue Arg74 at the F8-heme location (Fig. 3B). A cross-section through the central cavity (Fig. 3C) further demonstrates the steric mismatch between heme and the pXO1-118 cavity. As discussed below, Arg74 is an invariant residue that appears to play a key role in binding an alternative cofactor (fatty acid, see Fig. 3B), and thus may be considered a structural and functional analog of the proximal histidine.
Note that in RsbR, DALI-based structural alignment shows that helix F is truncated such that there is no analog of F8 (Fig. 3D), consistent with its being cofactor-free.
Crystals of pXO1-118 contain a buried fatty acid. The central cavity runs roughly parallel with helices E and G, with a contour length of ~20 Å (Fig. 4). It comprises a narrow (~6 Å diameter) tunnel lined with hydrophobic and aromatic residues, capped at one end by residue Phe19, which appears to act as a gate, adopting conformations that either open or close the hydrophobic entrance to the tunnel. The other end of the tunnel opens into a hydrophilic chamber, which is sealed from bulk solvent by a "canopy" created by the packing of the BE and FG turns. It includes water molecules as well as a heavier anion, which we believe to be chloride (see below).
In pXO1-118, the cavity is also occupied by continuous worm-like electron density that ends in a symmetric bifurcation, consistent with the presence of a fatty acid (Fig. 4A). Using mass spectrometry (GC-MS), we determined that the major chloroform-soluble non-protein component in the crystals is palmitic (hexadecanoic) acid, presumably derived from the cell wall of E. coli (37) during expression or cell lysis (Suppl. Fig. 2). The tunnel is long enough to completely bury tetradecanoic acid (with the Phe19 gate in the open position), and additional methylene groups would presumably extrude into solvent. We searched for an authentic Bacillus cofactor by incubating E. coli-purified pXO1-118 with crude B. cereus cell extract. The protein was then repurified and its crystal structure determined. However, we observed no significant difference in the cofactor density (data not shown), suggesting that a similar molecule binds in vivo.
GC-MS analysis demonstrated the presence of fatty acid in E. coli-purified pXO2-61 as well as the BA2291 sensor domain (Suppl. Fig. 2). However, it is not present in the crystals of pXO2-61 (BA2291 failed to crystallize), most likely due to the different crystallization conditions. Thus, pXO1-118 crystallized in sodium chloride, while pXO2-61 required the presence of 1.0 M sodium iodide. Iodide is a strongly chaotropic agent that disfavors complex formation and increases the solubility of hydrophobic moieties (38), consistent with the expulsion of fatty acid from the protein.
The higher pH required for crystal growth of pXO2-61 (pH 7.5 vs. 5.4) may also disfavor binding (see below). Notwithstanding, pXO2-61 binds fatty acids in solution (in the absence of iodide) with an affinity similar to that of pXO1-118 (see below). Despite the absence of fatty acid, a nearly identical tunnel and canopy are observed in crystals of pXO2-61. Weak density observed within the tunnel has been modeled as solvent/buffer; it is quite distinct from that in pXO1-118, and the density stops well short of the Phe19 gatekeeper, which adopts an ordered, closed conformation.
A conserved hydrophilic chamber engages the fatty acid head-group and a halide ion. The BE and FG turns pack closely together at the top of the domain, forming an extended salt-bridge/H-bonded network ("canopy") that seals the chamber from bulk solvent, and engages the fatty acid carboxyl head-group and a tightly bound halide ion (Fig. 4). A prominent feature of the canopy is a motif, KIAxER (residues 69-74), at the end of the F helix, which is invariant among the sensor domain orthologs within the B. cereus group. Thus, Lys69 and Glu73 form part of a salt-bridge/H-bonding network that engages a fatty acid carboxyl oxygen (labeled "O-I" in Fig. 4) via the side-chain amide nitrogen of Asn42 (E helix). Ile70 lies at the beginning of the hydrophobic tunnel, making close contact with the first 2 methylenes of the fatty acid. Ala71 points away from the binding pocket, but its short side-chain allows the F helix to pack closely against the H helix, which likely explains its conservation. The residue at position 72 points out into solution, and accordingly is not conserved. Arg74 makes a bifurcated salt-bridge to Asp33 (BE turn), and engages both fatty acid carboxyl oxygens (O-I and O-II) via its positively charged Nδ, forming a direct H-bond to O-II and a water-mediated bond to O-I.
A second prominent feature within the hydrophilic chamber of both pXO1-118 and pXO2-61 is a halide ion. In the case of pXO2-61, a very strong electron density peak is observed. An anomalous difference Fourier map identifies the peak as an iodide ion (peak height ~10 σ), since it is the only species present in the protein or crystallization liquor with significant anomalous scattering at the wavelength employed, 0.98 Å (Suppl. Fig. 3). It is the only iodide ion visible within the tunnel or indeed within any buried region of the dimer. We have therefore assigned a strong peak in the analogous position in crystals of pXO1-118 as chloride, since iodide is absent from that crystallization liquor while chloride is abundant, and it has been established in other systems (see below) that Cl⁻ and I⁻ typically bind competitively to the same sites. Moreover, crystallographic refinement with the appropriate halide ions yielded B values comparable with those of the surrounding protein, consistent with full occupancy of the sites. The environment of the chloride ion is shown in Fig. 4D. Its closest contact (2.6 Å) is made with the second carboxyl oxygen of the fatty acid, "O-II" (suggesting that this oxygen is protonated and forms a H-bond); further contacts are with the side-chain of Arg74, as well as the side-chain amide nitrogen of Asn87. Contacts with the hydrophobic side-chain of Ile39, and one directly bound water molecule, complete the coordination sphere. The structure of this site is remarkably well preserved in pXO2-61, except that a second water molecule takes the place of the fatty acid, and the larger but more polarizable iodide ion has similar but distinct bond lengths (Table II). It should also be noted that there is an identical chain of well-defined H-bonded water molecules tracing a narrow hydrophilic channel through the top of the canopy into bulk solvent, suggesting a pathway for halide exchange that does not require release of the fatty acid (Fig. 4B).
The site fits the general characteristics of a buried chloride binding pocket, as described (39). Such sites require sufficient positive potential to neutralize the anion, through Coulombic and/or H-bonding interactions, but the coordination geometry is less stringent than for a typical metal-binding site. Notwithstanding, the site in our sensor domains resembles a well-characterized buried site in the E. coli protein, GadB, where chloride plays an allosteric, pH-dependent role (40). Furthermore, the authors in that case showed that Br⁻ and I⁻ could readily substitute for chloride (see Table II).
The binding pocket has similarities with those of other fatty acid binding proteins (but is created from a different protein architecture). For example, the nuclear receptor HNF4α (41) uses a hydrophobic tunnel and a buried arginine to engage the fatty acid. The RCSB Protein Data Bank also contains an example of a bacterial PAS domain (from the signaling protein, RV1364C, from Mycobacterium tuberculosis; PDB code: 3K3C) which houses a palmitic acid in a broader pocket. In this case, both an Arg and an Asp (O-O distance = 2.6 Å) coordinate the carboxylic acid, implying that the fatty acid is protonated in this case also. However, no other ions were identified in either pocket.
pXO1-118 and pXO2-61 bind reversibly to fatty acids in solution. The fatty acid binding pocket, as well as the dimerization interfaces, are well conserved structurally between pXO1-118 and pXO2-61, and sequence comparisons suggest that the key binding/dimerization determinants of the sensor domains are conserved in BA2291 and indeed throughout the B. cereus family of BA2291-related histidine kinases (as well as relatives such as Geobacillus) (Fig 5). This suggests that they will all bind the same or a similar ligand and function in a similar way.
The tunnel appears to be optimal for a C14 fatty acid, and there is also a distinct bend in the tunnel around methylene 8, raising the possibility of specificity for a cis-monounsaturated fatty acid (bacteria possess few polyunsaturated fatty acids). We therefore measured the binding of 5 different Bacillus fatty acids by ITC. We found that both pXO1-118 and pXO2-61 bound fatty acid reversibly, and measured binding to 2 saturated, unbranched (myristic and palmitic acids), 2 saturated branched (12-methyltetradecanoic and 13-methyltetradecanoic acids), and 1 monounsaturated (palmitoleic acid) species. We found that all bound exothermically with a stoichiometry of ~1:1. However, we did not detect significant selectivity among the fatty acids tested: binding affinities were in the range of 10-40 µM (Table III and Suppl. Fig. 4).
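To put the 10-40 µM affinity range in energetic terms, the standard binding free energy can be estimated from ΔG° = RT ln(Kd) at the 23°C titration temperature. The short sketch below is an illustrative conversion only, not an output of the ITC analysis; it assumes Kd is expressed relative to a 1 M standard state, and the function name is hypothetical.

```python
import math

R = 8.314462618  # gas constant, J/(mol*K)


def binding_free_energy_kj(kd_molar, temp_c=23.0):
    """Standard binding free energy (kJ/mol) from a dissociation constant.

    Uses dG = RT ln(Kd), with Kd in mol/L relative to a 1 M standard state.
    """
    temp_k = temp_c + 273.15
    return R * temp_k * math.log(kd_molar) / 1000.0


dg_tight = binding_free_energy_kj(10e-6)  # Kd = 10 uM
dg_weak = binding_free_energy_kj(40e-6)   # Kd = 40 uM

print(f"Kd 10 uM: dG = {dg_tight:.1f} kJ/mol")
print(f"Kd 40 uM: dG = {dg_weak:.1f} kJ/mol")
print(f"spread   : {dg_weak - dg_tight:.1f} kJ/mol")
```

The full four-fold spread in Kd corresponds to only about 3.4 kJ/mol (RT ln 4), which is small compared with the total binding energy and in keeping with the absence of marked selectivity among the fatty acids tested.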
A new globin functionality. Our crystal structures and sequence comparisons illustrate a novel ligand-binding modality for the globin fold, one that does not involve heme or any related porphyrin-like attachment. Structural alignments point to a remarkable equivalence between the proximal histidine of heme proteins and the key arginine residue within the KIAxER motif, strongly suggesting an evolutionary relationship. Indeed, the heme-based oxygen sensor has the related sequence KIGHAH, in which its Ile packs against the hydrophobic moiety of the cofactor (heme), just as the analogous Ile of pXO1-118 does against fatty acid. The alanine/glycine residue likewise serves a similar purpose, ensuring tight packing between the F and H helices (and providing a potential link between ligand recognition and dimerization).
The PFAM database currently contains more than 100 sequences, very few of which have been functionally characterized. Many are the sensor domains of B. cereus family kinases, which are identical (or nearly so) to BA2291. However, there are also many distinct sequences, which we have sub-classified into 4 groups in light of our structural analysis (see Fig. 5):
Group I comprises the "sensor-only" domains from the genus Bacillus. In addition to pXO1-118 and pXO2-61, a number of B. cereus species have acquired a pXO1-like plasmid, and these contain genes that are highly homologous to pXO1-118. The only other sequence is from the recently annotated genome of B. cellulosilyticus, an alkaliphilic, salt-tolerant, spore-forming bacterium (42). Its sequence is more divergent, but still demonstrates all of the key structural features associated with the other Group I members. It is not reported whether it is plasmid-encoded.
Group II comprises the sensor histidine kinases of the BA2291 family. As noted, these are found in a large number of B. cereus/B. anthracis strains/isolates, as well as the related B. thuringiensis, an insect pathogen. Their sensor domains are nearly invariant (>90-100% identity), so we assume that they are sensing the same or similar signal. Note that only a small fraction of these species/strains harbor a plasmid-borne sensor-only domain.
Group III comprises sensor domains from a more disparate collection of sensor histidine kinases. They are all from the class Bacilli, but from different genera within the family Bacillaceae, including Geobacillus and Brevibacillus. These typically show around 30-45% identity to the BA2291 class, but again display the key sequence characteristics, suggesting identical or similar functionality.
Group IV comprises "sensor-only" domains from bacteria from distinct phyla, for which no cognate kinases have been identified. Most of the organisms bearing these genes occupy distinct or unique environmental niches, and utilize a wide variety of energy sources. For example, the phylum Chlorobi (green sulfur bacteria) is well-represented. These are Gram-negative bacteria that can degrade a wide variety of chloro-aromatic compounds, as well as toxic metals (43,44). The sequences are, not surprisingly, more divergent (~20-25% identity) from those of Groups I-II, but they still include most of the hallmark residues involved in structural integrity, and, to a lesser extent, cofactor binding and dimerization. They all have Gly or Ala at the 3rd position of the motif, consistent with a conserved packing between helices F and H. They typically have a longer BE turn, suggesting a structure more closely related to heme-binding globins. Nothing is known about the function of these domains, but for some members of this Group there is divergence of the KIAxER motif, suggesting related but distinct functions. For example, in an Anaeromyxobacter sensor, a histidine replaces the inward-facing Ile, while in an Acidobacter sensor, the Arg at position 6 is replaced by cysteine. Both changes introduce side-chains with the potential to generate novel functionalities within the central cavity.

DISCUSSION
We have identified 2 ligands that bind to the pXO1-118 and pXO2-61 sensor domains in vitro: fatty acid and halide (chloride). Our data strongly suggest that the chromosomally encoded kinase, BA2291, behaves similarly, and will therefore sense the same environmental cue in vivo, thus supporting the proposal that the plasmid-encoded sensor domains inhibit untimely sporulation by titrating the signal (22). Our studies point to roles for fatty acid (or a similar molecule), chloride ion, and possibly pH (see below) as signaling cues, which should inform the next set of experiments to define their activity in vivo.
Fatty acid synthesis is upregulated in preparation for sporulation, both in quantity and type, with a shift toward monounsaturated species. Thus, the sensor domains might recognize a fatty acid that is synthesized in the build-up to sporulation. Our short survey of saturated and unsaturated fatty acids did not offer any clues in this regard, as they all bound with similar affinity. Nevertheless, the binding pocket we have described does have a distinct shape that could be optimized for a specific unsaturated species. Specificity of this kind has been observed for an unrelated sensor kinase, DesK, from B. subtilis (45), which recognizes C16 fatty acids with a double bond at the Δ5 position.
Chloride ion concentrations differ by 10-20 fold between the different host tissues that B. anthracis must encounter during pathogenesis, from the intracellular milieu of the macrophage (~5 mM) to the extracellular fluids (lymphatic and plasma, >100 mM) (46), which could reasonably offer a trigger for sporulation (that needs to be suppressed). We have demonstrated the existence of a specific chloride ion binding site that is intimately involved in the binding of fatty acid, raising the possibility that chloride (or some other small anion) is the "ligand" and fatty acid the "cofactor", by analogy with oxygen binding to heme in the hemoglobins. We note that the ligand binding geometry is consistent with the fatty acid being protonated, enabling one carboxyl oxygen (O-II) to form a favorable hydrogen bond with the chloride ion. Thus, fatty acid binding might be both chloride- and pH-dependent within the physiological range (cf. the Bohr/chloride effects in hemoglobin (47) and the combinatorial effects of chloride and pH on the activity of the E. coli decarboxylase, GadB (40)). Furthermore, significant changes in environmental pH are associated with different steps in B. anthracis pathogenesis, any of which could, in principle, offer a trigger for sporulation. However, further work is clearly required to test this hypothesis.
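To make the proposed chloride and pH dependence concrete, the following minimal sketch combines the two quoted chloride concentrations with textbook single-site binding and Henderson-Hasselbalch expressions. The dissociation constant and the pKa used here are assumptions chosen purely for illustration; neither value was measured in this work, and the pKa of a fatty acid buried in the protein cavity may differ substantially from its value in bulk solution.

```python
def fraction_protonated(pH, pKa):
    """Henderson-Hasselbalch: fraction of a monoprotic acid in the protonated form."""
    return 1.0 / (1.0 + 10 ** (pH - pKa))


def site_occupancy(conc_mM, kd_mM):
    """Fractional occupancy of a single, non-cooperative binding site."""
    return conc_mM / (kd_mM + conc_mM)


# Concentrations quoted in the text (approximate)
CL_INTRACELLULAR_MM = 5.0     # macrophage interior
CL_EXTRACELLULAR_MM = 100.0   # lower bound for lymph and plasma

# ASSUMPTIONS, for illustration only:
ASSUMED_KD_MM = 20.0   # hypothetical chloride dissociation constant
ASSUMED_PKA = 4.8      # typical carboxylic acid in bulk water

fold_change = CL_EXTRACELLULAR_MM / CL_INTRACELLULAR_MM
print(f"chloride fold change: {fold_change:.0f}")

for label, conc in [("intracellular", CL_INTRACELLULAR_MM),
                    ("extracellular", CL_EXTRACELLULAR_MM)]:
    print(f"{label}: chloride site occupancy = {site_occupancy(conc, ASSUMED_KD_MM):.2f}")

for pH in (5.0, 6.0, 7.4):
    print(f"pH {pH}: fraction protonated = {fraction_protonated(pH, ASSUMED_PKA):.3f}")
```

With a dissociation constant lying between the two tissue concentrations, the site would switch from mostly empty to mostly occupied across the quoted range, which is the kind of behavior a chloride-sensing model would require; whether the real site behaves this way remains to be tested.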
The broader significance of our work lies in the definition of a new family of globin-based sensor domains that utilize an alternative (non-heme) cofactor, but which nevertheless appear to be closely linked evolutionarily to the heme-bearing globins. The majority of sensor domains within this family (that have been sequenced so far) come from genomes within the family Bacillaceae, and are contained within a histidine kinase architecture that is very similar structurally, and most likely functionally, to the B. anthracis BA2291 sporulation kinase. By contrast, recent genome sequencing efforts have uncovered a subfamily of "sensor-only" domains that are found in a range of bacteria from unrelated phyla. In general, rather little is known about these bacteria, other than a shared propensity for utilizing unusual energy sources such as the hydrolysis of chloroaromatics. Nothing is currently known about the function of their putative sensor domains, but it will be most interesting to see if they play a role in regulating these unusual metabolic functions.

4 Present Address: Department of Molecular Biology, University of Salzburg, 5020 Salzburg, Austria.
5 The abbreviations used are: ITC, isothermal titration calorimetry; AUC, analytical ultracentrifugation; GC-MS, gas chromatography-mass spectrometry.

Fig. 2. Stereo Cα overlay of pXO1-118 (red) and pXO2-61 (gray) dimers, with N and C-termini labeled. Occasional residue numbering is given for guidance. The fatty acid in crystals of pXO1-118 is shown in cyan (methylene carbons) and red (carboxyl oxygens).

Fig. 3. pXO1-118 has a globin fold but does not bind heme. A. Overlay of pXO1-118 monomer (red) with the heme-based sensor from Geobacter sulfurreducens (blue). Heme and heme-linked proximal histidine are shown for the latter. Helices A, B, F and the FG turn, G (not visible) and H overlap closely, while for pXO1-118, helix E tilts and bends toward helix F, occluding the heme pocket (fatty acid has been omitted for clarity). B. The same overlay as in A (rotated ~90° about a vertical axis), showing the FG region in greater detail. Note how well the main chain aligns through the FG turn. The Cα of residue F8 is labeled. In the heme-based sensor, this residue is the "proximal histidine", which engages the heme-linked iron atom (gray sphere). In pXO1-118, this residue is Arg74, which engages the head-group of the fatty acid (carboxyl oxygens are in red), which nearly coincides with the position of the heme-linked iron. The chloride ion (green) also lies nearby. C. Cut-away view of the cavity at the center of pXO1-118 (view is similar to B), showing that its shape is incompatible with heme binding. Note that the hydrophilic propionate sidechains (on the right side of the heme) are confined within the protein (in this hypothetical model), owing to the close packing of helices E and F. D. Structure-based sequence comparisons of the F and G helices of the sensor domains and RsbR, which does not bind cofactor, illustrating the truncated F helix that lacks an "F8" residue (the standard globin numbering scheme is