The Structure of β-Carbonic Anhydrase from the Carboxysomal Shell Reveals a Distinct Subclass with One Active Site for the Price of Two*

CsoSCA (formerly CsoS3) is a bacterial carbonic anhydrase localized in the shell of a cellular microcompartment called the carboxysome, where it converts HCO3− to CO2 for use in carbon fixation by ribulose-bisphosphate carboxylase/oxygenase (RuBisCO). CsoSCA lacks significant sequence similarity to any of the four known classes of carbonic anhydrase (α, β, γ, or δ), and so it was initially classified as belonging to a new class, ϵ. The crystal structure of CsoSCA from Halothiobacillus neapolitanus reveals that it is actually a representative member of a new subclass of β-carbonic anhydrases, distinguished by a lack of active site pairing. Whereas a typical β-carbonic anhydrase maintains a pair of active sites organized within a two-fold symmetric homodimer or pair of fused, homologous domains, the two domains in CsoSCA have diverged to the point that only one domain in the pair retains a viable active site. We suggest that this defunct and somewhat diminished domain has evolved a new function, specific to its carboxysomal environment. Despite the level of sequence divergence that separates CsoSCA from the other two subclasses of β-carbonic anhydrases, there is a remarkable level of structural similarity among active site regions, which suggests a common catalytic mechanism for the interconversion of HCO3− and CO2. Crystal packing analysis suggests that CsoSCA exists within the carboxysome shell either as a homodimer or as extended filaments.

The efficiency of carbon fixation in cyanobacteria and many autotrophic bacteria is enhanced by a polyhedral microcompartment called the carboxysome. The carboxysome consists of a paracrystalline array of the carbon-fixing enzyme ribulose-bisphosphate carboxylase/oxygenase (RuBisCO) 3 enveloped by a proteinaceous shell composed of at least five polypeptides (reviewed in Refs. 1-5). The structures of some of these shell proteins have been recently reported (6). It is proposed that carboxysomes enhance carbon fixation by elevating [CO2] up to 1000-fold around the active site of RuBisCO, effectively saturating it with substrate. The rate enhancement afforded by this mechanism is made evident when one considers the remarkably low affinity RuBisCO has for its substrate, CO2 (Km > 150 μM). The mechanism by which CO2 is concentrated in the carboxysome is not clearly understood. It is proposed that HCO3−, which is actively transported and accumulated in the cytoplasm, diffuses through the proteinaceous shell and is dehydrated to CO2 by a carboxysome-associated carbonic anhydrase. The close proximity of the carbonic anhydrase to encapsulated RuBisCO is thought to be a key factor in speeding the fixation of CO2. It is also possible that the carboxysomal shell provides a barrier to the escape of CO2 from RuBisCO. Carbonic anhydrases have been found associated with carboxysomes of the two known types: the ccm type, which appears to be limited to cyanobacteria, and the cso type, which occurs in non-photosynthetic autotrophs and in some cyanobacteria (2,7). There are four known classes of carbonic anhydrases: α, β, γ, and δ. In the ccm-type carboxysomes, both β and γ classes were discovered. These are called CcaA and CcmM, respectively. CcaA activity co-purifies with carboxysome-enriched subcellular fractions (1-3, 8-10), and CcmM was found necessary for carboxysome formation (11).
In cso-type carboxysomes, however, sequence comparisons could not discern a gene that might encode any of the four known carbonic anhydrase classes. Classical biochemical approaches revealed carbonic anhydrase activity in the carboxysomal shell protein CsoS3 (which we rename here CsoSCA, for carboxysome shell carbonic anhydrase) isolated from homogeneous carboxysome preparations from Halothiobacillus neapolitanus or prepared by recombinant techniques (12).
Contrary to earlier predictions that placed the carboxysome-associated carbonic anhydrase into the interior of the microcompartment, CsoSCA was shown to be an integral component of the shell (12,13). The low abundance of the protein as compared with other shell proteins and the fact that a H. neapolitanus mutant carrying an in-frame kanamycin resistance cassette in the csoS3 gene is able to form carboxysomes 4 imply that CsoSCA is not a major structural determinant of the carboxysome shell. The protein is, however, crucial for carboxysome function since this csoS3 mutant has a HCR (high CO2 requiring) phenotype.
Because of its large size (57 kDa, approximately twice that of other carbonic anhydrases) and its lack of homology at the primary structure level, CsoSCA was assigned to a new (ε) class of carbonic anhydrase (12).
Here, we describe the 2.2 Å crystal structure of CsoSCA from the chemoautotrophic bacterium H. neapolitanus, which suggests that the enzyme is a variant β-type carbonic anhydrase. The structure also provides clues to the catalytic mechanism of the dehydration reaction and reveals quaternary structure assemblies that may be biologically relevant in the context of carboxysome function.
Size Exclusion Chromatography-Purified recombinant CsoSCA (~1 mg/ml) was loaded at 1 ml/min over a Sephacryl S300 column (1 × 16 cm, Amersham Biosciences) equilibrated in 10 mM Tris, pH 8.0, 50 mM KCl. Two peaks were observed, monitored by absorbance at 280-nm wavelength. The first peak (consisting of ~10% of the loaded protein) likely corresponds to aggregated species, lying completely within the exclusion volume (Mr > 1.5 × 10⁶). The second peak (having a symmetrical shape) corresponds to a dimer, eluting with an apparent Mr of 125,000. Measurements of carbonic anhydrase activity indicated that the dimer peak was active, and no other carbonic anhydrase activity was detected throughout the elution pattern. It should be noted that solutions of CsoSCA with concentrations above 1.5 mg/ml accumulated precipitated aggregates when held at 4°C over several days, and this aggregation roughly paralleled the loss of carbonic anhydrase activity, consistent with the observation that no activity was observed in the large molecular weight fractions eluted from the gel filtration column.
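The dimer assignment above follows from simple arithmetic on the two masses quoted in the text (an apparent Mr of 125,000 for the active peak and a monomer mass of about 57 kDa). The following is a minimal sketch of that check; the function name is hypothetical, and the rounding rule is an assumption, since apparent masses from gel filtration also depend on molecular shape.

```python
def estimate_subunits(apparent_mr: float, monomer_mr: float) -> int:
    """Return the nearest whole number of subunits for an apparent mass."""
    ratio = apparent_mr / monomer_mr
    return max(1, round(ratio))


APPARENT_MR = 125_000  # apparent Mr of the active peak (from the text)
MONOMER_MR = 57_000    # approximate CsoSCA monomer mass, 57 kDa (from the text)

ratio = APPARENT_MR / MONOMER_MR
n_subunits = estimate_subunits(APPARENT_MR, MONOMER_MR)
print(f"apparent/monomer = {ratio:.2f}, nearest oligomer: {n_subunits} subunits")
```

A ratio slightly above 2 is what one might expect for a non-spherical dimer, but this sketch does not model shape effects.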
Structure Determination and Refinement-Potential heavy atom derivatives were screened for binding to CsoSCA using the native gel shift approach of Boggon and Shapiro (15). The following heavy atom compounds were identified as promising candidates for derivatization: HgCl2, mersalyl, and di-μ-iodobis(ethylenediamine)diplatinum. The compounds were dissolved at 2 mM concentration in a mixture of 65% reservoir, 35% ethylene glycol for cryo-protection. Crystals were soaked at room temperature for 1 h and then flash-cooled and stored in liquid nitrogen until data collection. An iodide derivative was also prepared by dissolving 8 mg of solid KI in 100 μl of 65% reservoir, 35% ethylene glycol, and then soaking the crystal for 1 min in this solution.
The x-ray diffraction data set on the native crystal was collected at the Advanced Light Source beamline 8.2.2, equipped with an ADSC Quantum 315 CCD detector. Derivative data sets were collected on a Rigaku FR-D rotating copper anode x-ray generator equipped with an R-AXIS IV++ imaging plate detector. Data were processed using DENZO/SCALEPACK (16) (Table 1). Mercury sites were identified by SHELXD using isomorphous differences between the native and HgCl2 derivative data sets (17). Initial phases were calculated with MLPHARE and later improved by density modification and two-fold symmetry averaging with DM (18). Isomorphous and anomalous difference Fourier methods revealed the positions of the heavy atoms in the other derivatives and the location of four zinc ions in the unit cell. Phasing information from all the derivatives was combined. The model was built manually using the graphics program O (19) and refined using conjugate gradient and simulated annealing algorithms as implemented by the program CNS (20). Strong non-crystallographic symmetry restraints were used throughout. Hydrogen bond restraints were helpful in the early stages of refinement (21). This model was further refined with REFMAC (22), to introduce TLS parameters in the refinement. Data collection and refinement statistics are given in Table 1.
The geometric quality of the model was assessed with the following structure validation tools: ERRAT (23), PROCHECK (24), and WHATIF (25). PROCHECK reported that 92% of the residues fall in the most favored region of the Ramachandran plot and 8% of the residues were in additionally allowed regions. ERRAT reported an overall quality factor of 96%. Protein structures were illustrated using the program PyMOL (26). Atomic coordinates and structure factors have been deposited in the Protein Data Bank, with ID code 2FGY.
FIGURE 1. Ribbon diagrams of CsoSCA and β-carbonic anhydrase homologs. A, a stereo view of CsoSCA highlights its three domains: the N-terminal (N-term), catalytic, and C-terminal (C-term) domains. The catalytic domain (yellow) bears 5 residues that are strictly conserved among all β-carbonic anhydrases known to date (shown in sticks). They cluster around the catalytic zinc ion (green sphere). The C-terminal domain (red) is structurally similar to the catalytic domain but is smaller and lacks all the necessary zinc ligand residues; thus, no zinc is bound to it. The N-terminal domain (blue) bears a zinc ion, but this is thought to be non-physiological. B, structural superpositioning reveals a common fold among the three subclasses of β-carbonic anhydrases: plant (yellow, P. purpureum; E. coli; and M. tuberculosis, Rv3588c), cab (blue, M. thermoautotrophicum and M. tuberculosis, Rv1284), and carboxysomal (red, CsoSCA). The plant and cab subclasses are much more similar to each other than they are to CsoSCA. Numbered in the figure are three points of major structural difference between CsoSCA and the other two subclasses. The first region is an extension of helix F and its following loop (16-residue insertion). The second region is an insertion of an 8-residue loop preceding helix G. The third region corresponds to a severe foreshortening of helices G and H and intervening helices (30-residue deletion). C, the blue mesh corresponds to a 2Fo − Fc map of the catalytic site, contoured at 1.8 σ. The orange mesh centered on the zinc ion corresponds to anomalous difference Fourier density contoured at 8 σ. This stereo view of the active site is similar to Fig. 4.

RESULTS AND DISCUSSION
Domain Structure-CsoSCA from H. neapolitanus contains three domains (Fig. 1A): a novel N-terminal domain (residues 38-144) composed primarily of four α-helices; a catalytic (middle) domain (residues 151-397) bearing structural resemblance to the carbonic anhydrases of the β-class (Figs. 1B and 2) complete with a tightly bound zinc ion (B-factor of 35 Å² at full occupancy) (Fig. 1C); and a C-terminal domain (residues 398-514) with weak structural similarity to the catalytic domain but, unlike the middle domain, completely devoid of catalytic metal ions. Only the N-terminal residues, 1-37, and residues 145-150 (connecting the N-terminal and catalytic domains) are missing from the structural model, presumably due to disorder.
N-terminal Domain-The structure of the N-terminal domain can be classified as an up-down four-helix bundle, but the individual helices are of disparate lengths (helix A, 29 residues; helix B, 12 residues; helix C, 4 residues; and helix D, 20 residues). As such, the domain bears no significant structural resemblance to the numerous helical bundles deposited in the Protein Data Bank. The closest structural similarity uncovered by VAST (27) is with a putative signal recognition particle (Protein Data Bank ID 1WGW) in which 43 residues align with an r.m.s. deviation of 2.4 Å. An evolutionary relationship between the two could not be established with any confidence. Related β-carbonic anhydrase structures from Pisum sativum (28) and Escherichia coli (29) also have N-terminal helical extensions in which the helices strengthen homodimer interactions by domain swapping. In CsoSCA, the lack of contacts between the N-terminal domain and other symmetry-related molecules seems to preclude a role in oligomerization. Furthermore, the N-terminal domain is located at an isolated extremity of the protein, suggesting the possibility that it might function to anchor CsoSCA to the carboxysomal shell or to molecules of RuBisCO.
Catalytic and C-terminal Domains Related by Gene Duplication and Fusion-The catalytic and C-terminal domains possess structural similarity to each other (Fig. 3C), suggesting that they arose from a gene duplication event. Sixty-one Cα atoms from the catalytic domain can be superimposed on the C-terminal domain with an r.m.s. deviation of 1.5 Å. The two domains each contain a central five-stranded mixed β-sheet with two helices flanking one side of the sheet and one helix flanking the other side. Interestingly, this latter helix, corresponding to helix K in the catalytic domain, is structurally analogous to helix D, which although tightly associated with the C-terminal domain, originates from the N-terminal part of the primary structure (Fig. 1A). The catalytic and C-terminal domains are related by an approximate two-fold symmetry axis perpendicular to the β-sheet (Fig. 3C).
The catalytic domain of CsoSCA bears structural similarity to numerous other carbonic anhydrases of the β-class (Fig. 1B). It can be superimposed on carbonic anhydrases from P. sativum (28), E. coli (29), Methanobacterium thermoautotrophicum (30) . These structures all contain a central five-stranded β-sheet and at least three flanking helices. Each of these homologs is approximately half the mass of CsoSCA, or roughly the same mass as an isolated catalytic domain of CsoSCA.
In addition to the similarity in fold, the identity and positions of three zinc ligands are conserved in the active sites of all these β-carbonic anhydrases (Figs. 2 and 4A). Specifically, the zinc ion directly coordinates Cys-173, His-242, and Cys-253 in CsoSCA (Figs. 1C and 4B). An aspartic acid and an arginine are also structurally conserved in β-carbonic anhydrases (Asp-175 and Arg-177 in CsoSCA); these residues presumably assist in a number of catalytic steps, including substrate binding, proton shuffling, and product release.
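The r.m.s. deviation quoted above is the root-mean-square distance between matched Cα positions after the two domains have been superimposed. As a minimal sketch of that final step only, the function below computes the deviation for two coordinate lists that are assumed to be already superposed and paired one-to-one; it does not perform the superposition or the residue matching, and the toy coordinates are hypothetical rather than taken from the deposited structure.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) points, already superposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy coordinates (hypothetical): every atom displaced by 1.5 Å along x.
domain_1 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
domain_2 = [(x + 1.5, y, z) for x, y, z in domain_1]
print(round(rmsd(domain_1, domain_2), 3))
```

Because the value depends on how many atom pairs are included, an r.m.s. deviation is only meaningful alongside the number of matched atoms, which is why the text reports both.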
The similarity between CsoSCA and other β-carbonic anhydrases extends beyond the active site. The homologs cited above all dimerize with a two-fold symmetry axis that coincides with the pseudo-symmetry axis relating the catalytic and C-terminal domains of CsoSCA (Fig. 3). Apparently, the catalytic and C-terminal domains of CsoSCA have maintained their relative orientations following gene duplication and fusion, despite much subsequent divergence. The carbonic anhydrase from Porphyridium purpureum bears special resemblance to CsoSCA in that it appears to have undergone a similar sequence of events in which two carbonic anhydrase domains are fused and arranged about a central two-fold axis of pseudo-symmetry (32) (Fig. 3B). However, the extent of the divergence between gene duplicates in CsoSCA is much greater than in P. purpureum; there is 70% sequence identity between the N- and C-terminal domains of P. purpureum carbonic anhydrase but only 11% identity between the catalytic and C-terminal domains of CsoSCA (Fig. 2).
Only the very C-terminal (C-term) tails are not superimposable (colored gray). In P. purpureum (B, upper panel), the same pairing is accomplished by a single polypeptide, not a dimer. The N-terminal (N-term) domain is shown in yellow; the C-terminal domain is shown in red. Pseudo-twofold symmetry is still evident but is not exact (B, lower panel). Non-superimposable regions are shown in gray. Presumably, the P. purpureum carbonic anhydrase arose from gene duplication and fusion. Divergence appears minimal (70% sequence identity between domains). In A, upper panel, CsoSCA, like the P. purpureum enzyme appears to be the result of gene duplication and fusion. However, divergence between the two internal domains has progressed to the extent that the C-terminal domain has lost all active site residues that it presumably once contained. Superimposed catalytic and C-terminal domains (C, lower panel) show much larger areas of structural nonequivalence (gray).
FIGURE 4. Conservation of active site structure and mechanism. A, currently available structures of β-carbonic anhydrases can be divided into two groups: those in which the conserved aspartate and arginine residues are hydrogen-bonded to each other (CsoSCA, P. sativum, M. thermoautotrophicum, and Rv1284) and those in which the hydrogen bonds are broken (P. purpureum, E. coli, and Rv3588c). These are represented in dark gray and light gray, respectively. In the former group, a water molecule serves as the fourth ligand to the zinc ion. In the latter group, the aspartate plays this role. The duality of conformations suggests a conformational flexibility that might play a role in catalysis. Residue numbering corresponds to CsoSCA. B, a stereo figure showing a superpositioning of CsoSCA (light gray) and P. sativum (dark gray) enzymes. A HCO3− ion was modeled into the active site, based on similarities with the position of the acetate ion found in the P. sativum structure. Small changes in orientation of the HCO3− allowed hydrogen bonds to form between the HCO3− and zinc, His-397, Asp-175, and backbone nitrogens of Ala-254 and Ala-255.
CsoSCA Qualifies as a Distinct Subclass of the β-Carbonic Anhydrases-Although the structural similarities are sufficient to allow recognition of CsoSCA as a member of the β-class of carbonic anhydrases, three important differences merit qualification of CsoSCA as a member of a new, carboxysomal subclass of β-carbonic anhydrases. First, the sequence similarity between CsoSCA and the β-carbonic anhydrases is negligible; there is no statistically meaningful sequence similarity between CsoSCA and these proteins (less than 15% identity for aligned residues as compared with an average of 25% sequence identity among the remaining structurally determined β-carbonic anhydrases). The number of absolutely conserved residues among β-carbonic anhydrases decreases from 13 to 5 upon adding CsoSCA to the alignment (Fig. 2).
The numerous insertions and deletions in the CsoSCA primary structure as compared with other β-carbonic anhydrases offer a second distinguishing feature of carboxysomal carbonic anhydrases. These include: a 16-residue insertion extending helix F and its following loop; an 8-residue loop insertion between strand 3 and helix G; and severe shortening of helices G and H and intervening sequence (averaging a 30-residue deletion) (Figs. 1B and 2). This last region forms part of a dimer interface in some β-carbonic anhydrases (see below). Clearly, the structural differences between CsoSCA and the other β-carbonic anhydrases are far greater than those used to distinguish the two previously identified β subclasses, "plant" (P. purpureum, E. coli, P. sativum, M. tuberculosis, Rv3588c) and "cab" (M. thermoautotrophicum, M. tuberculosis, Rv1284). Indeed, the plant and cab subclasses superimpose with an average pairwise r.m.s. deviation of 0.9 Å for 132 Cα pairs, whereas CsoSCA superimposes on these carbonic anhydrases with an average pairwise r.m.s. deviation of 2.4 Å over only 81 Cα pairs.
Lastly, CsoSCA is distinct from the other structurally determined β-class carbonic anhydrases in that it alone has lost the two-fold symmetric pairing of active sites that most likely was once part of its heritage (Fig. 3). Active site symmetry is apparent in all species of β-carbonic anhydrases observed so far, except in CsoSCA. Active sites of carbonic anhydrases from E. coli, P. sativum, and M. thermoautotrophicum are organized as identical pairs through homodimerization of the carbonic anhydrase domain. The P. purpureum enzyme displays an intermediate degree of symmetry, in which two carbonic anhydrase domains have fused, and are organized with two-fold pseudo-symmetry (although only 70% sequence identity is maintained between domains). In CsoSCA, however, the C-terminal domain has diverged so greatly that it has lost the ligands for the catalytic zinc, including a loop on which one of the ligands (equivalent to Cys-253) would reside.
This loss of the catalytic potential of the C-terminal domain presents an evolutionary puzzle. In P. purpureum, one can speculate that by breaking the symmetry of its gene fusion product, the organism is able to gain some advantage by tuning the individual domains to have optimum activity under different conditions. However, in the carboxysomal CsoSCA, the complete loss of an active site offers no apparent advantage gained in return for the cost of synthesizing the defunct domain. The fact that the C-terminal domain is about 130 residues smaller than the catalytic domain might suggest that by diminishing the size of the domain, the organism can partially mitigate the cost of its synthesis. Alternatively, one could propose a more purposeful role for the extreme divergence of the C-terminal domain, such as regulating the activity of the catalytic domain. The F1 ATPase provides a well characterized analogy in which a non-catalytic α-subunit has evolved to regulate activity of the catalytic β-subunit (33). Still another possibility is that the C-terminal domain has evolved to interact with the carboxysome shell or RuBisCO. The observation of non-catalytic Gla and epidermal growth factor-like domains has led to similar proposals of their role in mediating protein-protein interactions (34).
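The percent-identity figures used in this section to gauge divergence (for example, identity "for aligned residues") are counted over alignment columns in which both sequences have a residue. A minimal sketch of that bookkeeping follows, assuming a pre-computed pairwise alignment with "-" as the gap character; the function name and the toy sequences are hypothetical and are not taken from Fig. 2.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where both aligned sequences have a residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


# Hypothetical toy alignment: 8 residue-residue columns, 6 of them identical.
seq_1 = "MKC-DSRVAH"
seq_2 = "MRCADS-LAH"
print(f"{percent_identity(seq_1, seq_2):.1f}% identity over aligned residues")
```

Other conventions (dividing by the full alignment length or by the shorter sequence) give different numbers, so the denominator should always be stated.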
A Layer of Dimers-CsoSCA protomers pack in layers within the crystal (Fig. 5A). The individual layers resemble egg crates, having two distinct faces: a flat face and a spiked face. The flat face of each layer originates from the tight-fitting lateral association of catalytic domains, whereas the spikes on the opposite face originate from the extremities of the N-terminal domains. Protomer-protomer contacts within the layer are numerous (see below), but layer-to-layer contacts are sparse, involving only one patch of 400 Å² between the N-terminal "spike" (Trp-117) and the flat catalytic domain surface (specifically, residues 345-349) of the next layer.
These layers obey plane group symmetry, p2, in which there are four unique two-fold symmetry axes normal to the plane (layer). The abundance of two-fold symmetry elements is not inconsistent with space group P1, the symmetry of the crystals used in this structure determination. The p2 layers apparently arise under the special circumstances observed here, in which the unit cell contains a homodimer having its non-crystallographic two-fold symmetry axis perpendicular to one of the unit cell planes, in this case perpendicular to the b-c plane.
The association of protomers within each layer is mediated by four distinct types of dimer interfaces, each centered on one of the four unique two-fold symmetry axes of plane group p2. Each interface is characterized by varying amounts of surface area buried and varying degrees of shape complementarity. In Fig. 5A, these are numbered 1-4 in order of decreasing amount of surface area buried by the two-fold symmetric interfaces (3304, 2188, 1030, and 584 Å², respectively). The shape complementarity of the four interfaces also decreases in the same order (0.717, 0.684, 0.584, <0.500).
The largest of these four dimer interfaces (Fig. 5A, interface 1) is most likely to be biologically relevant due to the immense amount of surface area it buries (3304 Å²). Also, the shape complementarity of this dimer interface is high (shape complementarity = 0.717), in the range of that seen for protease-inhibitor interfaces (35). Shape complementarity measures the closeness of fit between two molecular surfaces. Values can range from 0.0 to 1.0, where larger values indicate the absence of gaps between molecular surfaces. Lastly, 9 of the 29 residues lining this interface are absolutely conserved in an alignment of 10 carboxysomal carbonic anhydrase sequences. For these reasons, we term this dimer the principal dimer (Fig. 5, B and C). This dimer was chosen to represent the asymmetric unit in the coordinates deposited with the Protein Data Bank. A detailed description of the principal dimer interface is given in the following section.
The extent of the second largest interface (Fig. 5, interface 2) suggests that it may also be biologically relevant. With 2188 Å² of buried surface area, its size exceeds the threshold 1712 Å² used to discriminate between a true homodimer interface and simple crystal contacts between monomers (36). However, none of 19 residues are absolutely conserved in this interface. The remaining interfaces in the layer (Fig. 5A, interfaces 3 and 4) fall below this threshold and so would most likely require the support of other carboxysomal proteins if they were truly present in the carboxysome.
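The size criterion applied to the four interfaces can be restated as a short filter. This is a minimal sketch using only the buried surface areas and the 1712 Å² threshold quoted in the text; the names are hypothetical, and the filter deliberately ignores the other evidence discussed above (shape complementarity and residue conservation).

```python
HOMODIMER_THRESHOLD_A2 = 1712.0  # threshold quoted in the text (ref. 36)

# Buried surface areas (Å²) of interfaces 1-4, as listed in the text.
buried_area = {1: 3304.0, 2: 2188.0, 3: 1030.0, 4: 584.0}


def exceeds_threshold(area_a2: float, threshold_a2: float = HOMODIMER_THRESHOLD_A2) -> bool:
    """True if the buried area is larger than the homodimer size threshold."""
    return area_a2 > threshold_a2


candidates = [i for i, area in sorted(buried_area.items()) if exceeds_threshold(area)]
print("interfaces above the size threshold:", candidates)
```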
The existence of p2 layer symmetry within the carboxysomal shell presents an attractive hypothesis about functional organization. In whole carboxysomes, CsoSCA is found tightly associated with the shell component, rather than the core component of RuBisCO molecules (12,13). One might propose that interaction between CsoSCA and a shell protein such as CsoS1A might be facilitated by the complementarity of fit between their flat surfaces (6). 5 If the N-terminal spikes were facing inward toward the center of the carboxysome, they could interact with RuBisCO molecules, perhaps to organize them with the proper periodicity for efficiency and compaction into paracrystalline arrays (38,39). A possible role of CsoSCA in organizing RuBisCO molecules is implied by analogy with the functionally related CcmM protein in which four repeats of the RuBisCO small subunit precede the carbonic anhydrase domain. It has been proposed that CcmM, anchored to the inside shell of the carboxysome, could bind to the large subunits of RuBisCO (11).
Three lines of evidence argue against the hypothesis that CsoSCA molecules are arranged with p2 symmetry in the carboxysomal shell. CsoSCA is a minor component of the carboxysome, appearing as a very faint band in a Coomassie Blue-stained SDS-polyacrylamide gel of purified carboxysomes (12,13). Thus, it is unlikely that there is a sufficient number of molecules present to form a continuous layer in the shell. Secondly, the periodicity of molecules in the layer (a = 90 Å, b = 48 Å, γ = 90°, Fig. 5A) does not match that of the CsoS1A shell component known from crystallographic studies (a = b = 67 Å, γ = 120°). 5 Specific interactions between sheets would be difficult to imagine if the layers do not share the same repeating dimensions. Lastly, examination of the protomer interfaces within the layer indicates that two of the four protomer interfaces in the layer are significantly larger and more enmeshed than the remaining two (Fig. 5A, compare interfaces 1 and 2 with interfaces 3 and 4). Interestingly, the two strongest interfaces are juxtaposed to produce straight filaments (Fig. 5B). Perhaps it is these filaments, rather than sheets, that line the inside of the carboxysomal shell. Filaments have also been observed in the M. thermoautotrophicum β-carbonic anhydrase crystal structure (via an open-ended domain swap) and have been suggested to have functional importance (30). Although the extent of CsoSCA protomer assembly is uncertain, it is clear that the principal dimer is a key component. Indeed, size exclusion chromatography results suggest that a dimer is the principal species in vitro.
The Principal Dimer Interface-The interface of the principal dimer may be divided into two separate subregions; the first is centered on a pair of symmetry related helices F (residues 206-223), originating from the catalytic domains of protomers A and B (Fig. 5C). These two helices cross each other at an angle of 50° near their C-terminal ends. Hydrophobic interactions are observed near the crossing point where Leu-218 and Tyr-221 of protomer A contact equivalent residues in protomer B. Adjacent to these interactions, hydrogen bonds are reciprocated between Arg-222 (NH1) and Cys-283 (O) of opposing protomers, whereas salt bridges similarly link the side chains of Glu-207 (OE2) and Arg-468 (NE). Residues Glu-214 and Lys-215 are also paired across the dimer axis; however, these interactions are weaker (separated by 4 Å), presumably because these side chains are exposed to solvent.
The second subregion of this dimer interface is also centered on the dimer symmetry axis but ~20 Å away from the first subregion (Fig. 5C). It involves helix L of protomer A (from the C-terminal domain) and its two-fold symmetry-mate from protomer B. These two helices do not cross each other as the helices F (see above), but instead, their N-terminal ends abut, so their helical axes are roughly parallel. Such an interface would seem unfavorable due to the electrostatic repulsion arising from the proximity of the positive ends of the two-helix dipoles. However, Glu-426 at the very N-terminal end of the helix apparently neutralizes the charge repulsion. Hydrogen bonds are reciprocated between the side chain carboxylate of Glu-426 (Oε2) and the amide nitrogen of Glu-426 (N) of the opposing protomer. The remaining carboxylate oxygen of Glu-426 (Oε1) further participates in a second hydrogen bond with the amide nitrogen of Val-425 (N). This key residue, Glu-426, is absolutely conserved among carboxysomal carbonic anhydrases, as well as neighboring residue Thr-424, which forms van der Waals contacts between protomer A and its symmetry equivalent in protomer B. The interface is additionally strengthened by water-mediated hydrogen bonds between Glu-427 (Oε2) and Arg-475 (NH2) of opposing protomers.
The P. purpureum enzyme has a dimer interface analogous to CsoSCA, but more extensive (32). The P. purpureum enzyme uses roughly the same face of the molecule as a dimerization interface, but there is about 50° difference in the orientation of the monomers. The interface of the P. purpureum enzyme covers 8000 Å² overall, as compared with 3304 Å² in CsoSCA. A large segment of this interface is absent in CsoSCA and would correspond to a deletion of 30 residues between helices G and H (Fig. 2). Carboxysomal carbonic anhydrases appear to be the only subclass of β-carbonic anhydrases to have such an extensive deletion in this region. In E. coli (29) and Rv1284 (31), this region is used to form a tetrameric interface analogous to the dimeric interface of P. purpureum. However, in the octameric P. sativum (28), the filamentous M. thermoautotrophicum (30), and dimeric Rv3588c (31) enzymes, this region takes no part in any of its interfaces.
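The lattice mismatch cited as the second line of evidence can be made concrete by comparing the two-dimensional unit cells. The sketch below uses only the cell parameters given in the text and the standard formula for the area of a parallelogram, a·b·sin γ; the function and variable names are hypothetical. Differing cell areas and angles mean the two layers cannot share a common repeat, which is the point made in the paragraph above.

```python
import math


def cell_area(a: float, b: float, gamma_deg: float) -> float:
    """Area of a 2D unit cell with edges a, b (Å) and included angle gamma (degrees)."""
    return a * b * math.sin(math.radians(gamma_deg))


csosca_layer = cell_area(90.0, 48.0, 90.0)   # a, b, γ of the CsoSCA layer (from the text)
csos1a_layer = cell_area(67.0, 67.0, 120.0)  # a = b, γ of the CsoS1A layer (from the text)

print(f"CsoSCA layer cell area: {csosca_layer:.0f} Å²")
print(f"CsoS1A layer cell area: {csos1a_layer:.0f} Å²")
print(f"area ratio (CsoSCA/CsoS1A): {csosca_layer / csos1a_layer:.2f}")
```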
Thermodynamics of CO 2 Generation within the Carboxysome-For CsoSCA, as for all the structurally determined carbonic anhydrases to date, the enzyme-catalyzed reaction is reversible and is not driven by direct coupling to the hydrolysis of high energy compounds such as ATP. That is, whether CsoSCA catalyzes a net gain in reactant (HCO 3 Ϫ ) or product (CO 2 ) is determined by the difference in free energy (⌬G) between the two forms of inorganic carbon within the environment of the carboxysome shell. This energy difference must be sufficiently negative for CsoSCA to achieve its presumed biological role, the production of CO 2 for the fixation of carbon by RuBisCO inside the carboxysome (14). Since the standard free energy difference (⌬G°) contributes only a small negative term to ⌬G, it is likely that ⌬G (and the direction of the reaction) must be favorably influenced by the principle of mass action. That is, local concentrations of the reactant, HCO 3 Ϫ , are elevated by its active transport into the cell (1), whereas the local concentration of the product, CO 2 , is diminished by the close proximity of encapsulated RuBisCO, a CO 2 sink (37). The sensitivity of the reaction direction to the environmental conditions is evidently key to the diverse roles played by ␤-carbonic anhydrase in biology. For example, in contrast to CsoSCA, the biological role of ␤-carbonic anhydrase in C 4 plants is not the production of CO 2 but rather the hydration of CO 2 to produce HCO 3 Ϫ , the substrate for phosphoenolpyruvate carboxylase (40). Thus, small changes in environmental conditions are presumably sufficient to reverse the biological role of ␤-carbonic anhydrases in the cell. 
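The mass-action argument above can be illustrated with the usual relation ΔG = ΔG° + RT ln([CO2]/[HCO3−]), written here for the dehydration direction at fixed pH. The following is a minimal sketch under stated assumptions: the value used for the standard term and all concentration ratios are hypothetical placeholders chosen only to show the sign change, not measured values, and the function name is illustrative.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def delta_g(delta_g0_kj: float, co2: float, hco3: float, temp_k: float = 298.15) -> float:
    """Free energy change (kJ/mol) for HCO3- dehydration: dG0 + RT ln([CO2]/[HCO3-])."""
    if co2 <= 0 or hco3 <= 0:
        raise ValueError("concentrations must be positive")
    return delta_g0_kj + R_KJ * temp_k * math.log(co2 / hco3)


# Hypothetical placeholder standing in for the "small negative" standard term.
DG0 = -1.0  # kJ/mol

for ratio in (0.01, 1.0, 100.0):  # assumed [CO2]/[HCO3-] ratios
    dg = delta_g(DG0, co2=ratio, hco3=1.0)
    direction = "dehydration (CO2 production)" if dg < 0 else "hydration (HCO3- production)"
    print(f"[CO2]/[HCO3-] = {ratio:g}: dG = {dg:+.1f} kJ/mol, favors {direction}")
```

With a small standard term, the concentration ratio dominates the sign of ΔG, which is the sense in which a CO2 sink such as RuBisCO and an elevated HCO3− pool would favor dehydration.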
Active Site Similarities across Subclasses-Despite the level of sequence divergence that separates CsoSCA from the remaining two subclasses of β-carbonic anhydrases, there is a remarkable level of structural similarity in the active site region, which suggests a common catalytic mechanism for the interconversion of HCO3− and CO2. These active site residues fall into two main categories: zinc binding and substrate/product binding. Zinc-mediated catalysis is common to all β-carbonic anhydrases studied to date. Crystal structures of these enzymes reveal that the zinc ion is ligated by two cysteines and a histidine with tetrahedral geometry (28–32). In CsoSCA, these ligands are Cys-173(Sγ), His-242(Nε2), and Cys-253(Sγ). Indeed, zinc coordination geometry is remarkably conserved among β-carbonic anhydrase structures (Fig. 4A); however, the identity of the fourth zinc ligand can vary from structure to structure. In CsoSCA, it is a water molecule, analogous to those observed in β-carbonic anhydrase structures from P. sativum, M. thermoautotrophicum, and M. tuberculosis (Rv1284). In the remaining β-carbonic anhydrase structures, this ligand is a strictly conserved aspartate: Asp-151 and Asp-405 in P. purpureum, Asp-44 in E. coli, and Asp-53 in Rv3588c from M. tuberculosis. It is observed that whenever water occupies the position of the fourth zinc ligand, the aspartate (Asp-175 in CsoSCA) consequently takes a second conformation, fixed by two hydrogen bonds to a strictly conserved arginine (Arg-177 in CsoSCA) (Fig. 4A).
The alternate conformations of this aspartate might play multiple roles in the catalytic mechanism (see below).
The composition of the HCO3− binding site might appear to be more subclass-specific than that of the zinc binding site; however, this impression might simply be due to the uncertainty of the location of HCO3− binding. Within the plant subclass of β-carbonic anhydrases, a conserved glutamine residue has been implicated in HCO3− binding.
Specifically, in the P. sativum structure, Gln-151 is observed donating a hydrogen bond to the substrate analog acetate (28). In the carboxysomal carbonic anhydrases, this glutamine residue is substituted (conservatively) with a histidine (His-397 in CsoSCA). This histidine superimposes remarkably well with Gln-151 (Fig. 4B). The Nε2 atom of His-397 is in proximity to an oxygen atom of acetate. His-397 is held in position by a hydrogen bond between Nδ1 and the relatively buried Glu-399. Backbone amide nitrogens of Ala-254 and Ala-255 (helix G) are in position to donate hydrogen bonds to the remaining HCO3− oxygens. Structurally equivalent amide nitrogens are present in all β-carbonic anhydrases, even the cab subclass, which appears to lack a residue equivalent to Gln-151 of the plant enzyme. A model for HCO3− binding to CsoSCA is presented in Fig. 4B.
A Conserved Catalytic Mechanism-Structural similarity among β-carbonic anhydrases allows us to propose a mechanism for the dehydration of HCO3− that is analogous to that described for the reverse reaction (hydration of CO2) in the literature (28–32). Here, we adopt the convention that dehydration is the forward direction since CO2 is considered the biologically important product in the carboxysome. However, it should be understood that notions of forward or reverse directions are entirely a matter of convenience. Furthermore, we note that CsoSCA provides no advantage in directionality of the reaction as compared with other carbonic anhydrases, as enzymes have no effect on the equilibrium positions of the reactions they catalyze. The catalytic cycle begins with water bound as the fourth ligand in the coordination shell of zinc, just as observed in the current crystal structure (Figs. 1C and 6, A and B). In preparation for HCO3− binding, conformational flexibility in Asp-175 might aid in dislodging the water from this site. In addition to being one of the products of the dehydration reaction, water is present at a high concentration and so might otherwise be difficult to displace. In support of this role, the equivalent aspartate has been observed to successfully compete with water for binding to the zinc in the β-carbonic anhydrase structures of P. purpureum (32), E. coli (29), and M. tuberculosis (Rv3588c) (31). As HCO3− approaches the zinc site for coordination, Asp-175 further aids catalysis by selecting a catalytically competent binding geometry for HCO3−. Although all three oxygen atoms of HCO3− have affinity for zinc, it is important that the zinc coordinate only the protonated oxygen atom since this oxygen will form the leaving group. Selection is achieved by the close proximity of Asp-175 to the zinc-coordinated oxygen of HCO3− (Fig. 6C). Asp-175 is held in this conformation by a pair of hydrogen bonds with Arg-177 (as seen in the crystal structure, Fig. 1C). Since Asp-175 can serve only as a hydrogen bond acceptor, not as a donor, only the protonated oxygen of HCO3− can complete the hydrogen bond and coordinate zinc at the same time. An analogous "gatekeeper" role has been proposed for this strictly conserved aspartate in the hydration reaction (28). The backbone amide of Ala-254 donates a second hydrogen bond to the coordinated oxygen of HCO3−, and one of the two remaining HCO3− oxygen atoms accepts a hydrogen bond from the backbone amide of Ala-255. Decomposition of HCO3− to CO2 and OH− follows (Fig. 6D). CO2 diffuses from the active site, and a water molecule takes its place (Fig. 6E), binding where water 305 is observed in the crystal structure (Fig. 1C). The zinc-bound OH− ion is likely protonated via this water molecule (Fig. 6F), which is in contact with bulk solvent.
Fortuitous Zinc Binding Site-In the furthest extremity of the N-terminal domain, a zinc ion was observed bound between surface loops (Fig. 1A). Specifically, the zinc ion directly coordinates His-92, Asp-115, and His-121. The electron density attributed to zinc could not have originated from water since a strong peak (8σ) was observed centered on this atom in an anomalous difference Fourier map. An anomalous scattering signal of this magnitude could only have arisen from a metal ion. Indeed, the metal site was used to improve the phases used to calculate subsequent electron density maps. The atomic element was identified as zinc by the observation of a fluorescence maximum corresponding to the zinc peak wavelength. (Note: The same methods were used to verify the identity of the zinc ion in the catalytic domain.)
The appearance of a zinc ion bound to the novel N-terminal domain (residues His-92, Asp-115, and His-121) gave the initial impression that this domain contains a catalytic site for the carbonic anhydrase reaction. However, it was later discovered that His-92, one of the zinc ligands, arose only as the result of a fortuitous point mutation in the clone used to produce the protein for crystallization. Since this mutant does not display increased activity as compared with the wild-type protein,⁶ it seems unlikely that this zinc site is present in the wild-type protein, which contains a tyrosine at position 92. The tyrosine hydroxyl would likely displace the zinc ion due to the increased length of the side chain. Furthermore, if this apparently fortuitous metal binding site were truly advantageous, one would expect that the three residues directly coordinating the zinc ion would be evolutionarily conserved. Sequence alignments clearly show this not to be the case (Fig. 2, and additional alignments not shown). We are therefore compelled to conclude that the zinc ion found in the N-terminal domain is not biologically relevant.
Conclusions-The similarities observed in the active site structure and likely catalytic mechanism between the carboxysomal carbonic anhydrase CsoSCA from H. neapolitanus and known β-carbonic anhydrases clearly place this enzyme into the carbonic anhydrase β-class. However, CsoSCA and closely related carboxysomal enzymes must be considered as belonging to a distinct subclass since they display an almost complete lack of primary structure similarity to other carbonic anhydrases and do not possess the characteristic paired active sites of other β-carbonic anhydrases. The significance of these structural deviations for the function of carboxysomal carbonic anhydrases is not fully understood, but they might be essential to the role of CsoSCA as a component of this primitive prokaryotic organelle.