Structure and Function of Carbonic Anhydrases from Mycobacterium tuberculosis*

Carbonic anhydrases catalyze the reversible hydration of carbon dioxide to form bicarbonate. This activity is universally required for fatty acid biosynthesis as well as for the production of a number of small molecules, pH homeostasis, and other functions. At least three different carbonic anhydrase families are known, of which the α-class found in humans has been studied in most detail. In the present work, we describe the structures of two of the three β-class carbonic anhydrases that have been identified in Mycobacterium tuberculosis, Rv1284 and Rv3588c. Both structures were solved by molecular replacement and refined to resolutions of 2.0 and 1.75 Å, respectively. The active site of Rv1284 is small and almost completely shielded from solvent, whereas that of Rv3588c is larger and open to solution. The two enzymes also differ in the coordination of the active site metal. In Rv3588c, an aspartic acid side chain displaces a water molecule and coordinates directly to the zinc ion, thereby closing the zinc coordination sphere and breaking the salt link to a nearby arginine that is a feature of Rv1284. The two carbonic anhydrases thus exhibit both of the metal coordination geometries previously observed for structures in this family. Activity studies demonstrate that Rv3588c is a fully functional carbonic anhydrase. The apparent lack of activity of Rv1284 in the present assay system is probably due, at least in part, to the observed depletion of zinc in the preparation.

The World Health Organization estimates that 1.8 billion people are infected with Mycobacterium tuberculosis, with 8 million new cases and 2 million deaths per year. The so-called "short-term" antibiotic treatment currently available lasts 6 months and does not seem to offer much hope for eradicating the disease in the long run. In the milestone publication of the M. tuberculosis genome sequence, Cole et al. (1) were able to assign functions to some 60% of the gene products. Subsequent studies have attempted to identify genes that are likely to be required for bacterial survival and infection (e.g. studies by Sassetti and Rubin (2) and Sassetti et al. (3)). We have used such investigations as the basis for identifying M. tuberculosis gene products that are interesting subjects for detailed structural and functional studies.
The gene Rv1284 is annotated at the Pasteur Institute TubercuList server (genolist.pasteur.fr/TubercuList) as coding for a conserved hypothetical protein of unknown function; that is, it is found in related species and presumably plays an essential role in their survival, although that role is not known. Furthermore, Rv1284 was judged to be essential by Himar1-based transposon mutagenesis in strain H37Rv (3), and its transcription was highly up-regulated under the starvation conditions used to model persistent bacteria (4). Phylogenetic studies of putative carbonic anhydrases (CAs; EC 4.2.1.1) suggested that Rv1284 in fact encodes a β-CA and uncovered distant relationships to two other M. tuberculosis proteins, those encoded by the Rv3588c gene and the C-terminal region of Rv3273 (5). Rv3588c has been identified by Sassetti and Rubin (2) as being required for mycobacterial growth in vivo, whereas Rv3273 is not essential (2, 3). Human carbonic anhydrases belong to the α-class, which has a completely different structure; both α- and β-class CAs, however, are zinc-based enzymes (see Ref. 6 for a review of carbonic anhydrase structures). This difference, as well as the success of drug design for the α-CAs (reviewed in Ref. 7), prompted us to carry out a detailed analysis of the mycobacterial enzymes. In the present publication we report structural and functional studies of the Rv1284 and Rv3588c gene products. The results demonstrate that Rv3588c is indeed a β-CA and that Rv1284 is a β-CA-like protein.

EXPERIMENTAL PROCEDURES
Cloning, Protein Expression, and Purification-The sequences corresponding to the open reading frames of Rv1284 and Rv3588c were amplified by PCR from genomic DNA of M. tuberculosis strain H37Rv (1) with Pfu polymerase, using the primer pair 5′-GTGACGGTTACCGACGACTACCTG-3′ (Rv1284.forward) and 5′-TATGGTTATTGCCATCTATGCAAC-3′ (Rv1284.reverse) for Rv1284 and 5′-GTGAGGCATGCCCAACACCAATCCGGTA-3′ (Rv3588c.forward) and 5′-TCCCGGGTACCTATTAGACCTCCTCGCCGATGTT-3′ (Rv3588c.reverse) for Rv3588c. A second PCR to introduce an N-terminal His6 tag into each construct was performed with the same reverse primer and a new forward primer, Rv1284.his (5′-ATGGCCCATCATCATCATCATCATTCTGGTACGGTTACCGACGACTACCTG-3′) and Rv3588c.his (5′-ATGGCTCATCATCATCATCATCATGGTCCCAACACCAATCCGGTAGCCGCG-3′), respectively, using the products from the first PCRs as templates. The PCR products were incubated with deoxynucleotide triphosphates and Taq polymerase to create an A-overhang, purified, and ligated into vector pCR®T7 using the TOPO® Cloning pCR® kit (Invitrogen). The resulting plasmids were transformed into Top10 cells (Invitrogen) by heat shock and grown on Luria-Bertani agar plates containing 50 μg/ml ampicillin. Positive clones were transformed into Escherichia coli BL21/AI, in which protein was overexpressed after induction with 0.02% arabinose at 37°C for 2 h. Cells were harvested by centrifugation. Cell pellets were washed with 1× SSPE buffer (150 mM NaCl, 10 mM NaH2PO4, pH 7.5, and 1 mM EDTA) and stored at −70°C. All clones were verified by DNA sequencing.
* This work was supported by the Swedish Foundation for Strategic Research, the Swedish Research Council, and the European Commission programs SPINE (QLG2-CT-2002-00988) and X-TB (QLRT-2000-02018). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
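As a quick consistency check on the His6-tag primers in the cloning procedure, the short Python sketch below translates the 5′ ends of Rv1284.his and Rv3588c.his (and the coding part of the gene-specific forward primers) in frame 1 with the standard genetic code. It is an illustration only, not part of the original workflow; the helper `translate` and the constant names are hypothetical.

```python
# Standard genetic code: codons enumerated in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [x + y + z for x in BASES for y in BASES for z in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(dna: str) -> str:
    """Translate a 5'->3' sequence in frame 1, ignoring any trailing partial codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


# Primer sequences as given in the text (5'->3').
RV1284_HIS = "ATGGCCCATCATCATCATCATCATTCTGGTACGGTTACCGACGACTACCTG"
RV3588C_HIS = "ATGGCTCATCATCATCATCATCATGGTCCCAACACCAATCCGGTAGCCGCG"

for name, primer in [("Rv1284.his", RV1284_HIS), ("Rv3588c.his", RV3588C_HIS)]:
    print(name, translate(primer))

# Coding portions of the gene-specific forward primers, for comparison.
print("Rv1284.forward (after GTG start)", translate("ACGGTTACCGACGACTACCTG"))
print("Rv3588c.forward (from ATG)", translate("ATGCCCAACACCAATCCGGTA"))
```

Both tag primers encode Met-Ala followed by six histidines, a one- or two-residue linker (Ser-Gly or Gly), and then the N-terminal residues also encoded by the corresponding gene-specific forward primer (TVTDDYL for Rv1284, PNTNPV for Rv3588c).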
Assays and Metal Analysis-Esterase activity was measured as p-nitrophenylacetate hydrolysis at 25°C, using a modification of previously published methods (8). A 3 mM stock of p-nitrophenylacetate was freshly prepared by dissolving the compound in acetone and then diluting it 25-fold with water. The uncatalyzed reaction was measured by adding 1.9 ml of 50 mM Tris-HCl, pH 7.5, to 1 ml of this substrate solution and recording the change in A348. After 3 min, 100 μl of enzyme solution was added, and the catalyzed reaction was monitored for an additional 3 min. A non-enzymatic control (using bovine serum albumin) was included in each set of measurements; bovine carbonic anhydrase II (Sigma) was used as a positive control.
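The esterase measurement amounts to subtracting a background slope from a catalyzed slope. A minimal Python sketch of that arithmetic, assuming paired (time, A348) readings for the 3-min uncatalyzed and 3-min catalyzed windows, is shown below. The function names and the synthetic traces are hypothetical, and the sketch stops at ΔA348 per unit time because the text gives no extinction coefficient for converting to a molar rate.

```python
from typing import Sequence


def ols_slope(times: Sequence[float], values: Sequence[float]) -> float:
    """Ordinary least-squares slope of `values` against `times` (e.g. dA348/dt)."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired points")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return sxy / sxx


def enzyme_rate(t_uncat, a_uncat, t_cat, a_cat) -> float:
    """Absorbance change per unit time attributable to the enzyme:
    slope after enzyme addition minus slope of the spontaneous hydrolysis."""
    return ols_slope(t_cat, a_cat) - ols_slope(t_uncat, a_uncat)


# Synthetic example (hypothetical numbers): 3 min background, then 3 min with enzyme.
t_bg = [float(s) for s in range(0, 181, 10)]
a_bg = [0.100 + 0.0002 * t for t in t_bg]
t_enz = [float(s) for s in range(190, 371, 10)]
a_enz = [0.140 + 0.0010 * (t - 190) for t in t_enz]
print(f"{enzyme_rate(t_bg, a_bg, t_enz, a_enz):.4f} A348 units per s")
```

Fitting slopes rather than comparing absolute absorbances sidesteps the small offset introduced when the enzyme solution is added to the cuvette.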
Carbonic anhydrase activity was assayed by following the change in pH during the reaction with a dye indicator method (9). Assays were again performed at 25°C. The buffer/indicator pairs used were (a) 25 mM HEPES plus 100 mM phenol red (pH 7.5, A558) and (b) 25 mM TAPS plus 100 μM m-cresol purple (pH 8.4–9.0, A578); both solutions also contained 100 mM sodium sulfate to keep the ionic strength of the reaction medium approximately constant. The reaction was initiated by adding 0.5 ml of CO2-saturated water to 0.5 ml of a 2× buffer/indicator solution containing enzyme (final protein concentration, 2.5 μM). Non-enzymatic and positive controls were included as described for the esterase assay. Data were acquired at the appropriate wavelength at 0.1-s intervals using a Beckman DU-600 spectrophotometer, with time t = 0 coinciding with manual addition of substrate.
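The text does not state how the CO2 hydration traces were reduced to a single number. One widely used convention, shown here only as an illustration, is the Wilbur-Anderson unit U = (T0 − T)/T, where T0 and T are the times needed for the indicator absorbance to fall to the same level without and with enzyme. The Python sketch below assumes traces sampled at 0.1-s intervals from t = 0 and an arbitrarily chosen threshold absorbance; the helper names and the synthetic exponential traces are hypothetical.

```python
import math
from typing import Sequence


def time_to_reach(times: Sequence[float], absorbances: Sequence[float], target: float) -> float:
    """First sampled time at which the falling indicator absorbance is <= target."""
    for t, a in zip(times, absorbances):
        if a <= target:
            return t
    raise ValueError("trace never reaches the target absorbance")


def wilbur_anderson_units(t_control: float, t_enzyme: float) -> float:
    """U = (T0 - T) / T: fractional shortening of the reaction time by the enzyme."""
    return (t_control - t_enzyme) / t_enzyme


# Synthetic traces (hypothetical): A(t) = 0.2 + 0.8 * exp(-k t), sampled every 0.1 s.
times = [0.1 * i for i in range(200)]
control = [0.2 + 0.8 * math.exp(-0.1 * t) for t in times]
enzyme = [0.2 + 0.8 * math.exp(-1.0 * t) for t in times]

t0 = time_to_reach(times, control, target=0.6)
t = time_to_reach(times, enzyme, target=0.6)
print(round(t0, 1), round(t, 1), round(wilbur_anderson_units(t0, t), 1))
```

With 0.1-s sampling, the threshold time is resolved only to the nearest sample; interpolating between the two bracketing points would be a natural refinement.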
Metal content analysis was performed by Analytica AB (www.analytica.se/) using inductively coupled plasma sector field mass spectrometry for all elements except Hg, which was analyzed by atomic fluorescence spectroscopy.
Crystallization, Data Collection, and Processing-Rv1284 crystals were grown by the hanging-drop vapor-diffusion method (10) at 20°C. Each drop consisted of 2 μl of protein sample (14.5 mg/ml) mixed with 2 μl of reservoir solution containing 0.1 M sodium thiocyanate, pH 7.5, and 8% polyethylene glycol 2000. Needle-like crystals appeared after 1 day. Prior to data collection, the crystal was transferred to a cryoprotectant solution comprising reservoir solution with 30% glycerol and then immediately flash-frozen in liquid nitrogen. X-ray data were collected at 100 K at beamline ID14-1 of the European Synchrotron Radiation Facility in Grenoble, France. Diffraction data were processed with MOSFLM (11) and reduced and scaled with SCALA (12) from the CCP4 program suite (13).
Rv3588c was crystallized by the sitting-drop vapor-diffusion method. Final crystallization conditions were as follows: 1 μl of protein solution (34 mg/ml) was mixed with 1 μl of reservoir solution containing 200 mM MgCl2, 35% polyethylene glycol 400, and 100 mM HEPES, pH 7.0. Crystals grew to 0.1–0.2 mm in diameter in 5–8 days and could be flash-frozen directly from the mother liquor. Diffraction data were collected at 100 K at beamline I711 at the MAX II synchrotron in Lund, Sweden. The data were processed and scaled using Denzo and Scalepack (14).
Data collection statistics for both crystal forms are shown in Table I.
Structure Determination and Refinement-The Rv1284 structure was solved by molecular replacement in AMoRe (15) using a dimer of the carbonic anhydrase from Methanobacterium thermoautotrophicum (Protein Data Bank code 1G5C) (16) as the template for the search model. Residues identical in the two sequences were kept unchanged, whereas all other non-alanine and non-glycine residues were mutated to serine using SEAMAN (17). The first 25 residues of both molecules in the dimer were removed from the model. Because a strong translation peak appeared in the native Patterson map, the same top rotation solution was used to locate a second dimer. Rigid body refinement was then performed in REFMAC5 (18), with each molecule defined as a rigid unit. Noncrystallographic symmetry (NCS) phased refinement in NCSref (within the CCP4 interface) (19) was used to generate an averaged map. Additional features that were not present in the search model and were consistent with the M. tuberculosis sequence appeared in the averaged electron density. By alternating NCS phased refinement, model rebuilding in O (20), and data extension to 2.0 Å, the complete M. tuberculosis sequence could eventually be placed in density. NCS restraints were then released, and refinement proceeded with alternating cycles of REFMAC5 and manual rebuilding of the four separate subunits. Each dimer was defined as a separate TLS group (21) for the final cycles of refinement.
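The residue-by-residue rule used to build the Rv1284 search model (keep identical residues, keep Ala and Gly, truncate everything else to Ser, then drop the first 25 residues) can be written down explicitly. The Python function below is a hypothetical re-expression of that rule operating on a pairwise sequence alignment in which "-" marks a gap; it is not the SEAMAN interface, which works on coordinate files, and the example alignment is invented.

```python
def search_model_sequence(template_aln: str, target_aln: str, drop_n_terminal: int = 0) -> str:
    """Residue identities for a truncated molecular-replacement search model.

    template_aln / target_aln: equal-length aligned one-letter sequences, '-' = gap.
    Returns one letter per template residue kept in the model.
    """
    if len(template_aln) != len(target_aln):
        raise ValueError("alignment strings must have equal length")
    model = []
    for tmpl, targ in zip(template_aln, target_aln):
        if tmpl == "-":
            continue              # insertion in the target: no template coordinates to use
        if tmpl == targ or tmpl in "AG":
            model.append(tmpl)    # identical residue, or Ala/Gly: left unchanged
        else:
            model.append("S")     # any other residue: truncated to serine
    return "".join(model[drop_n_terminal:])


# Hypothetical toy alignment (template on top, target below).
print(search_model_sequence("MKAGLE-V", "MRAGIEQV"))
print(search_model_sequence("MKAGLE-V", "MRAGIEQV", drop_n_terminal=2))
```

In the actual procedure the N-terminal trimming would use drop_n_terminal=25 on each template chain.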
Rv3588c was solved by molecular replacement using CNS (22). The search model was constructed from the two structures in the Protein Data Bank with the highest sequence identity: Protein Data Bank code 1I6O (23), with 33.7% identity over a 166-residue segment, and Protein Data Bank code 1DDZ (24), with 34.6% identity over a 182-residue segment. The final search model consisted of four polyalanine chain segments containing 136 residues in total. Both enantiomorphic space groups, P41212 and P43212, were evaluated; the best solution was found in P41212. Packing analysis showed that the top solution formed a correct dimer with a symmetry-related molecule and gave plausible crystal packing. Rigid body refinement to 3-Å resolution was followed by alternating cycles of REFMAC5 refinement and manual rebuilding. During refinement, density inconsistent with protein or solvent became apparent in the vicinity of His199. This was modeled as a magnesium ion octahedrally coordinated by the histidine ring, the carbonyl oxygen of the His199 backbone, and four water molecules. Mg was chosen because of its high concentration (200 mM) in the mother liquor; the coordination geometry and atomic temperature factors are consistent with this assignment. Because this ion is distant from the active site and the dimer interface, we presume it lacks functional significance.
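Percent identity figures such as 33.7% over 166 residues depend on how aligned positions are counted. The Python sketch below assumes the common convention of counting identical residues over positions where neither sequence has a gap. The identical-residue counts it recovers (56 of 166 and 63 of 182) are back-calculated from the rounded percentages in the text, not values reported by the authors, and the helper names are hypothetical.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Identity (%) over aligned positions where neither sequence has a gap ('-')."""
    pairs = [(a, b) for a, b in zip(aln_a, aln_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no aligned positions")
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)


def implied_identical_count(percent: float, segment_length: int) -> int:
    """Integer number of identities consistent with a rounded percentage."""
    return round(percent * segment_length / 100.0)


print(percent_identity("ACDE-G", "ACDFKG"))  # toy alignment: 4 identities in 5 positions
for pct, length in [(33.7, 166), (34.6, 182)]:
    n = implied_identical_count(pct, length)
    print(length, n, round(100.0 * n / length, 1))
```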
The final refinement statistics for both structures are given in Table I. Atomic coordinates and structure factors have been deposited in the Protein Data Bank (25) with accession codes 1YLK and 1YM3.

RESULTS
Overall Structures of Rv1284 and Rv3588c-His-tagged full-length Rv1284 and Rv3588c were expressed, purified, and crystallized, and their structures were determined by molecular replacement. Both structures have been refined to 2-Å resolution or better, with the statistics reported in Table I. The Rv1284 crystal contains two dimers in the asymmetric unit, each of which is part of a tetramer generated by a crystallographic 2-fold axis. The Rv3588c crystal, on the other hand, contains a single subunit in the asymmetric unit, which forms a dimer around a crystallographic 2-fold axis. The Rv1284 structure shows continuous electron density for the entire sequence, i.e. from residue 1 to residue 163. Rv3588c, however, has no density for residues 1–3, 32–41, and 207.
Both structures have the β-CA α/β fold anticipated from sequence analysis. Each subunit contains a five-stranded β-sheet, in which strands 1–4 are parallel and ordered 2-1-3-4. The fifth strand is antiparallel and connected to the fourth by a short reverse turn. The connecting loops differ in length and structure between the two proteins. The β1-β2 loop is short and irregular in both cases. The β2-β3 connection includes a regular α-helix (α2) packing on the surface of the sheet. The β3-β4 loop is much longer, containing two helices in Rv1284 and four helices in Rv3588c. One of these helices (α4 in Rv1284) packs on the sheet near β4 in both proteins. The remaining portion of the loop stretches across the surface of the dimer to interact with the 2-fold-related helix linking β2 and β3. In Rv1284, the first α-helix clearly packs on the sheet of the other subunit, as was observed in three of the four previously known β-CA structures (16, 23, 24, 26) (see Fig. 1A). Because of the density break in Rv3588c, we cannot rule out the possibility that its first helix packs in the same manner, but the positioning of the abutting residues suggests that it is not domain-swapped (Fig. 1B). The subunits of Rv1284 and Rv3588c can be superimposed such that 106 Cα pairs have an r.m.s.d. of 1.45 Å (27), consistent with their structure-based sequence identity of ~15%. The largest structural variation is observed in the β3-β4 loop, as described above.
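An r.m.s.d. quoted for 106 superimposed Cα pairs presupposes an optimal rigid-body superposition of already-matched atoms. The text defers to its own alignment criteria (27) for choosing the pairs; the NumPy sketch below covers only the superposition step, using the standard Kabsch (SVD) method, and assumes the two (N, 3) coordinate arrays are already in corresponding order. It is not necessarily the program the authors used, and the self-check uses random synthetic coordinates.

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between matched (N, 3) coordinate sets after optimal rotation and translation."""
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of matched atoms")
    P0 = P - P.mean(axis=0)                 # remove translation
    Q0 = Q - Q.mean(axis=0)
    H = P0.T @ Q0                           # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) > 0 else -1.0   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T              # optimal rotation mapping P0 onto Q0
    diff = P0 @ R.T - Q0
    return float(np.sqrt((diff ** 2).sum() / len(P0)))


# Self-check on synthetic data: a rotated and translated copy should give ~0 Å.
rng = np.random.default_rng(0)
P = rng.normal(scale=10.0, size=(106, 3))
theta = np.deg2rad(40.0)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
Q = P @ Rz.T + np.array([5.0, -3.0, 2.0])
print(f"{kabsch_rmsd(P, Q):.6f}")
```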
Active Sites-The active site of a β-CA lies near a switch point at the C-terminal edge of its parallel β-sheet, as has been observed for other α/β proteins (28). Although neither of the present crystallization reagents included zinc, both structures show strong electron density at the position where one would expect the zinc ion of an active β-CA. In each case we have modeled this density as a zinc atom, although the peak height in Rv3588c is twice that in Rv1284 (6.3 and 3.3 electrons/Å³, respectively; both maps have an r.m.s. density of 0.32 electrons/Å³). Coordination of the active site ion is, however, different in the two proteins.
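Peak heights in absolute units are often easier to compare when expressed as multiples of the map r.m.s. density (σ). Because both maps have the same r.m.s. value, the conversion is a simple division, sketched here in Python with the numbers from the text; the helper name is hypothetical.

```python
def peak_in_sigma(peak_e_per_A3: float, map_rms_e_per_A3: float) -> float:
    """Express a density peak height as a multiple of the map r.m.s. (sigma)."""
    return peak_e_per_A3 / map_rms_e_per_A3


MAP_RMS = 0.32  # electrons/Å^3, same for both maps according to the text
peaks = {"Rv3588c": 6.3, "Rv1284": 3.3}  # metal-site peak heights, electrons/Å^3

for name, height in peaks.items():
    print(f"{name}: {peak_in_sigma(height, MAP_RMS):.1f} sigma")
print(f"Rv1284 / Rv3588c peak ratio: {peaks['Rv1284'] / peaks['Rv3588c']:.2f}")
```

On this scale the Rv1284 site is still a roughly 10σ feature, but only about half as strong as the roughly 20σ site in Rv3588c.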
In Rv1284, all of the metal ligands are contributed by one subunit of the dimer, although much of the active site cavity is shaped by the other subunit, in particular β2′ (i.e. β2 of the other chain in the dimer), the helix between β2 and β3, and residues in the long loop connecting the N-terminal helix to β1′. Residues Cys35, His88, and Cys91 and a water molecule coordinate the metal ion, whereas the neighboring Asp37 and Arg39 form a well-defined salt bridge (Fig. 2A). The interactions of the ligands in Rv1284 are thus essentially identical to those described by Kimber and Pai for the enzyme from Pisum sativum (26). The Sγ atoms of the cysteines hydrogen bond with main chain amides, whereas Nδ1 of His88 interacts with a main chain carbonyl oxygen. The water ligand forms a tight hydrogen bond with the carboxylate group of Asp37. A thiocyanate ion is also positioned in the active site pocket of Rv1284, although it is not coordinated to the metal ion. The Rv1284 active site pocket is small (accessible surface volume, ~7 Å³) and almost completely shielded from solvent; its shape closely matches that of the bound thiocyanate. It contains only one other ionizable side chain, His54′ from the second subunit.
In Rv3588c, the active site residue Asp53 displaces the water molecule and coordinates directly to the zinc ion, thus breaking a potential salt link to Arg55 (Fig. 2, B and C). The two M. tuberculosis β-CAs therefore exhibit both of the coordination variants reported for β-CAs elsewhere (16, 23, 24, 26). There is an additional hydrogen-bonding group in the active site cavity of Rv3588c, Tyr89, which is positioned 6.7 Å from the Asp53 ligand; this residue is structurally conserved in three of the four other structures. The Rv3588c active site is much more open than that of Rv1284 and directly accessible to the solvent. This difference may be primarily due to the lack of an ordered loop between the N-terminal helix and β1.
In Rv3588c the greatest deviations from tetrahedral geometry at the zinc atom involve the aspartyl ligand, with one angle differing from the ideal by 18°. The geometry of the protein ligands in Rv1284 appears more strained: the angle subtended at the metal by the two Sγ atoms again deviates by ~18° from the ideal, and the ion also lies ~0.7 Å out of the plane of the histidine imidazole ring.
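Statements such as "one angle differs from the ideal by 18°" and "the ion lies ~0.7 Å out of the plane of the imidazole ring" follow from simple vector geometry on the refined coordinates. The Python sketch below computes the largest deviation of any ligand-Zn-ligand angle from the ideal tetrahedral value, arccos(−1/3) ≈ 109.47°, and the distance of the metal from a plane through three ring atoms. The three-atom plane is a simplifying assumption (a least-squares plane through all five imidazole atoms would be more robust), and the coordinates in the self-check are idealized, not taken from either structure.

```python
import math
from itertools import combinations

Vec = tuple[float, float, float]
IDEAL_TETRAHEDRAL = math.degrees(math.acos(-1.0 / 3.0))  # ~109.47 degrees


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def angle_deg(center: Vec, a: Vec, b: Vec) -> float:
    """Angle a-center-b in degrees."""
    u, v = sub(a, center), sub(b, center)
    cos_t = dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def max_tetrahedral_deviation(metal: Vec, ligands: list[Vec]) -> float:
    """Largest |angle - 109.47| over all ligand pairs around the metal."""
    return max(abs(angle_deg(metal, a, b) - IDEAL_TETRAHEDRAL)
               for a, b in combinations(ligands, 2))


def distance_to_plane(point: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Distance from `point` to the plane through p1, p2, p3."""
    n = cross(sub(p2, p1), sub(p3, p1))
    return abs(dot(sub(point, p1), n)) / math.sqrt(dot(n, n))


# Idealized self-check: a perfect tetrahedron and a point 0.7 Å above the z = 0 plane.
tetra = [(1.0, 1.0, 1.0), (1.0, -1.0, -1.0), (-1.0, 1.0, -1.0), (-1.0, -1.0, 1.0)]
print(round(max_tetrahedral_deviation((0.0, 0.0, 0.0), tetra), 6))
print(round(distance_to_plane((0.2, 0.3, 0.7), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)), 6))
```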
Comparisons to Other β-CAs-Four β-CA structures have previously been reported in the literature. They can be aligned to the Rv1284 dimer with 220–240 matching Cα pairs having r.m.s.d. values of 1.4–1.6 Å using our standard alignment criteria (27) (see Table II). The sequence identity of the aligned residues is very low (~15%), except in the case of the M. thermoautotrophicum enzyme, to which Rv1284 has 29% identity. By contrast, similar alignments to Rv3588c indicate that it is much more similar to the other three CA structures (Table III). The four earlier β-CA structures are superimposed on Rv1284 in Fig. 3. The comparison shows that the β-sheet and two of the α-helices that pack on it (in the β2/β3 and β3/β4 loops) are well conserved, although there is a great deal of variation in other regions. One major difference involves the N-terminal α-helix. In Rv1284, as well as in the red alga (Protein Data Bank code 1DDZ) (24), plant (Protein Data Bank code 1EKJ) (26), and E. coli (Protein Data Bank code 1I6P) (23) enzymes, this helix packs on the sheet of the other subunit, although the superimposed coordinates do not form a tight cluster. In the enzyme from M. thermoautotrophicum (Protein Data Bank code 1G5C) (16), the equivalent helix packs on the sheets of other dimers in the crystal. All of the structures show good agreement within β1 as far as the conserved aspartic acid (Asp37 in Rv1284 numbering), β2, the helical portion of the β2-β3 connection, and β3. As described for the two M. tuberculosis proteins, large variations are seen in the loop region between β3 and β4. The C-terminal end of this long loop is, however, tightly clustered on the helix that stacks on strand β4 (α4 in Rv1284). The reverse turn between strands β4 and β5 is conserved in length but shows some spatial fanning. Some pairs of structures show similarities that are not present in the entire family. For example, the β2-β3 loops in the structures with Protein Data Bank codes 1I6P and 1DDZ are fully conserved, whereas in the other family members only the helix-turn-β3 unit is similar.
Table II footnote: Nr, number of residues in the dimer; Nc, number of atoms within a 3.5-Å cutoff; Ir, sequence identity in conserved region/number of residues in Rv1284; r.m.s. distance, root mean square distance to Rv1284; Z, Dali Z-score; PDB, Protein Data Bank.
Table III footnote: Nr, number of residues in the dimer; Nc, number of atoms within a 3.5-Å cutoff; Ir, identical residues in conserved region/number of residues in Rv3588c; r.m.s. distance, root mean square distance to Rv3588c; Z, Dali Z-score; PDB, Protein Data Bank.
In two of the previously published structures (those from P. sativum and M. thermoautotrophicum), three protein side chains and a water molecule coordinate the active site zinc atom, with geometries very similar to that observed in Rv1284. The pea enzyme, in particular, shows similar deviations from tetrahedral geometry at the zinc atom. The other two structures (from E. coli and red algae) exhibit the aspartyl coordination observed for Rv3588c. In other respects the coordination is similar, although other residues lining the active site pocket are not conserved. The most constricted entrance to the active site is found in Rv1284, primarily because of side chains from the loop connecting the first helix and strand, particularly Met24 and Pro25. The thiocyanate ion observed in Rv1284 overlaps with the acetate ion found in the active site of the P. sativum enzyme.
Only thirteen residues are completely conserved in alignments based on the common structural core (Fig. 4). Five of these are the observed metal ligands and the neighboring arginine residue. Outside this core, the sequence alignments must be treated with caution, largely because of the structural variations already described.
The first conserved region in the sequence contains two metal ligands, as well as the nearby arginine. Here, the side chains of Ala38/Ser54 (Rv1284/Rv3588c numbering, respectively) point in opposite directions (with Cα atoms separated by ~5 Å), as a result of the rearrangements that are associated with changes in the salt bridge and metal coordination. The movement of Leu40/Val56 is also part of the structural adjustments that allow the guanidinium group of Arg39/Arg55 to stay in approximately the same place.
The next region of sequence similarity corresponds to the edge strand (β2, i.e. residues Gly51–Gly60 in Rv1284) that is part of the dimer interface. The conserved glycine that begins this strand adopts an αL conformation, whereas the highly conserved asparagine side chain near the end (Asn58) forms an Asx turn hydrogen bond. The arginine residue in the middle of the strand, which may be conservatively replaced by lysine, is buried within the structure without compensating interactions with negatively charged side chains. Instead, the guanidinium group of Arg57 donates four hydrogen bonds to main chain carbonyl groups in Rv1284; the equivalent Arg73 is solvated in Rv3588c.
The next area of conservation lies in β3, the central strand of each subunit. Branched side chains in the middle of this strand interact with α-helices above and below the plane of the sheet (α2 and α3 on one side, and the β1/β2 loop and α1′ on the other). His88 and Cys91 at the C-terminal end of the strand are active site ligands. The conserved glycine immediately following, residue 92, adopts an α-helical conformation; any side chain at this position would impinge on the active site cavity. Helix α4 packs on the surface of the sheet, in particular on β4, to give a conserved pattern of branched hydrophobic side chains (Val127, Leu131, and Ile134 from the helix, and Leu146 and Gly148 from β4). The loop connecting the C terminus of α4 and β4 is variable in length and structure. The final highly conserved residue is Gly156, which lies in a variation of the classical three-residue reverse turn (29) between β4 and β5.
Metal Content-Rv1284 and Rv3588c protein samples were analyzed for Zn, Ni, Fe, Cu, Mn, Al, Cr, Co, Cd, Pb, As, Ba, and Hg. Rv3588c was found to contain 0.94 equivalent of Zn per protein molecule, together with 0.04 equivalent each of Ni and Cu and 0.01 equivalent of Al. The Rv1284 sample contained only 0.30 equivalent of Zn, together with 0.18 equivalent of Ni, 0.05 equivalent of Al, and 0.03 equivalent of Fe. Only trace amounts (<0.01 equivalent) were found for the other metals assayed.
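Metal contents are reported as equivalents, i.e. moles of metal per mole of protein. The Python sketch below shows that conversion from an ICP-MS concentration, assuming the metal is reported in μg/L and the protein concentration in μM (both hypothetical inputs; the text does not give the raw data), and then sums the reported equivalents for each sample. The helper name is hypothetical; 65.38 g/mol is the standard atomic weight of zinc.

```python
ZN_ATOMIC_WEIGHT = 65.38  # g/mol


def metal_equivalents(metal_ug_per_l: float, atomic_weight: float, protein_um: float) -> float:
    """mol metal per mol protein; µg/L divided by g/mol gives µmol/L (µM)."""
    metal_um = metal_ug_per_l / atomic_weight
    return metal_um / protein_um


# Hypothetical raw numbers: a 10 µM protein sample containing 614.6 µg/L Zn.
print(round(metal_equivalents(614.6, ZN_ATOMIC_WEIGHT, 10.0), 2))

# Equivalents per protein molecule as reported in the text.
reported = {
    "Rv3588c": {"Zn": 0.94, "Ni": 0.04, "Cu": 0.04, "Al": 0.01},
    "Rv1284": {"Zn": 0.30, "Ni": 0.18, "Al": 0.05, "Fe": 0.03},
}
for sample, metals in reported.items():
    total = sum(metals.values())
    print(f"{sample}: total {total:.2f}, Zn fraction of bound metal {metals['Zn'] / total:.2f}")
```

Assuming one metal site per subunit, these totals suggest that Rv3588c is close to fully occupied, whereas even counting nickel and the other metals only about half of the Rv1284 sites carry any metal.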
Activity Assays-The ability of Rv1284 and Rv3588c to catalyze the hydration of CO2 was tested using a colorimetric assay (9) at several starting pH values. These measurements indicate that Rv3588c is an active carbonic anhydrase at pH 8.4 (Fig. 5), although no activity was detectable at pH 7.5 under our assay conditions (data not shown). No activity of Rv1284 was detectable at either pH value; omitting sulfate or chloride from the solutions did not change this result.
In common with the other β-CA enzymes that have been tested (e.g. Ref. 30), neither Rv1284 nor Rv3588c showed the esterase activity that is characteristic of the α-CA family.

DISCUSSION
Carbonic anhydrases catalyze the reversible hydration of carbon dioxide to form bicarbonate. This simple conversion of a membrane-permeable gaseous substrate into a membrane-impermeable ionic product is vital to many biological functions, and such enzymes are therefore widely distributed in nature. At least three different CA families are known, the α-, β-, and γ-CAs, of which the α-class has been studied in the most detail (31). Although the three families are unrelated in sequence and structure, all consist of zinc-containing enzymes. All CAs in the animal kingdom are of the α-class, and the human enzyme is a validated drug target. The topical CA inhibitor dorzolamide, for example, binds to the active site zinc and is used for the treatment of glaucoma (32, 33). α-Class CAs have also been found in algae and prokaryotes. β-Class enzymes have not been found in the animal kingdom but have been observed in a wide variety of other organisms.
In an α-CA enzyme, a zinc-bound OH− ion attacks a CO2 molecule to form a metal-bound HCO3− ion that is subsequently displaced by a new water molecule (31). Proton transfer to the external buffer then regenerates the zinc-bound OH− ion. A detailed molecular scheme has been outlined that includes the initial binding of carbon dioxide in a pocket close to the zinc-bound hydroxide (where Thr199 in the human isozyme CA II plays an important role as the "door-keeper") and a proton transfer route from the metal center to His64 via a network of hydrogen-bonded water molecules (31).
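The zinc-hydroxide mechanism summarized above can be written as the following sequence of steps, where E-Zn denotes the enzyme-bound zinc and B stands for the ultimate proton acceptor (for CA II, the external buffer, reached via His64 and the water network). This is a simplified textbook-style rendering rather than a scheme taken from Ref. 31; adding the three steps gives the net reaction in the last line.

```latex
\begin{aligned}
\mathrm{E\text{-}Zn^{2+}\text{-}OH^{-}} + \mathrm{CO_2} &\rightleftharpoons \mathrm{E\text{-}Zn^{2+}\text{-}HCO_3^{-}} \\
\mathrm{E\text{-}Zn^{2+}\text{-}HCO_3^{-}} + \mathrm{H_2O} &\rightleftharpoons \mathrm{E\text{-}Zn^{2+}\text{-}OH_2} + \mathrm{HCO_3^{-}} \\
\mathrm{E\text{-}Zn^{2+}\text{-}OH_2} + \mathrm{B} &\rightleftharpoons \mathrm{E\text{-}Zn^{2+}\text{-}OH^{-}} + \mathrm{BH^{+}} \\
\text{net:}\quad \mathrm{CO_2} + \mathrm{H_2O} + \mathrm{B} &\rightleftharpoons \mathrm{HCO_3^{-}} + \mathrm{BH^{+}}
\end{aligned}
```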
The first structural information on β-CAs revealed an entirely different protein architecture and supplied the atomic details of its distinct type of active site. Independent structure determinations of β-CAs from pea (P. sativum) (26) and red algae (Porphyridium purpureum) (24) revealed similar overall folds but different coordination of the metal at the active site. In the pea enzyme, the protein supplied three ligands, whereas in the red algal enzyme a nearby aspartic acid residue furnished a fourth protein side chain ligand; this aspartate is conserved in the pea enzyme but is instead involved in a salt link with a nearby arginine. Subsequent structure determinations of the β-CAs from M. thermoautotrophicum (16) and E. coli (23) provided additional examples of the two alternative ligand coordinations in this class of enzyme. So far, there is no structural evidence that a particular β-CA can switch between alternative coordination geometries.
A reaction mechanism similar to that proposed for the α-CAs has been suggested for the β-class enzymes, including the attack of a zinc-bound hydroxide on the carbon dioxide substrate and subsequent proton transfer from a new water ligand (26). Mitsuhashi et al. (24) have described a variation on this scheme in which the zinc-bound aspartate observed in the red algal enzyme (equivalent to Asp37 in Rv1284) is more directly involved. The authors propose that this residue functions as a base that activates a water molecule and that the resulting hydroxide displaces the aspartyl group as the fourth zinc ligand in the next steps of the reaction. The roles of this aspartyl side chain and the neighboring arginine residue (the equivalents of Asp37 and Arg39 in Rv1284, both of which are conserved in the whole family) have been analyzed by site-directed mutagenesis of the M. thermoautotrophicum enzyme (34). The Asp→Ala change reduced the kcat/Km value, but not to an extent that would suggest a vital role for the residue in the reaction mechanism. A much larger reduction in kcat/Km was observed for the Arg→Ala mutant, which was interpreted to mean that the aspartyl residue, once free of the salt link, coordinates the zinc. Consistent with this hypothesis, the double mutant had a higher activity than the Arg→Ala variant. Use of an imidazole buffer rescued the activity of both the Asp→Ala and the double mutants, suggesting that this buffer could take on the role of the mutated groups in the proton transfer step. Site-directed mutagenesis of two side chains in the active site of the Arabidopsis thaliana enzyme (corresponding to His209 and Tyr205 in Protein Data Bank code 1EKJ) suggested that these residues are important for efficient proton transfer in that enzyme (35). However, these residues are not conserved in the whole β-CA family.
Our structural studies of two M. tuberculosis gene products, Rv1284 and Rv3588c, indicate that both are β-CAs. Our functional assays confirm this assignment for Rv3588c, but not for Rv1284 (Fig. 5). Metal content analysis indicates that the lack of activity is correlated with significant zinc depletion at the active site of Rv1284. This interpretation is further supported by the Rv1284 structure itself; the electron density of the metal is only half that observed in Rv3588c. However, simply adding more zinc causes the protein to precipitate. Because the His tag in the present construct may act as an internal zinc-chelating agent and thus cause the observed precipitation, we plan to modify the construct to produce protein without the tag. Both tertiary structure (DALI) (36) and motif-based (SPASM) (37) searches of the structural databases indicate that the structure of Rv1284 is consistent only with CA activity. Our Rv3588c crystals were grown at a pH value at which the enzyme has no activity. The observed structure therefore lends further support to the hypothesis that the inactive enzyme has four protein side chains coordinating the active site zinc atom.
The six β-CA structures now known enable us to define the structural core of this class of enzyme and to recognize the differences associated with a change from three to four protein side chain ligands at the zinc. The structures indicate that the active site metal of the β-CA enzymes is located in a shallower pocket than is seen in the α-CA family. However, the accessibility of the zinc ion varies between members of the β-CA family, ranging from the almost fully closed binding site cavity in Rv1284 to the active site of the M. thermoautotrophicum β-CA, which is open to solvent and indeed binds a molecule of HEPES buffer (16). The accessibility is primarily determined by the loop connecting the first α-helix in the structure to the first β-strand. In Rv1284 and the pea enzyme (26) this loop crosses over the active site, and a side chain (Met24 and Gln151, respectively) points toward the zinc. The glutamine residue of the pea enzyme has been suggested to interact with the gaseous carbon dioxide substrate as well as the zinc-bound bicarbonate. In Rv1284, the methionine side chain closes off the active site cavity, helping to form a mostly hydrophobic internal lining. The only polar residue, His54, interacts with the hydroxyl group of Ser74 through Nδ1 and with a water molecule through Nε2. This water molecule is in turn positioned by hydrogen bonds to a main chain carbonyl oxygen (from residue 24′) and a main chain nitrogen (from residue 38). A fourth, somewhat longer, hydrogen bond from this water links it to the nitrogen atom of the thiocyanate ligand. Because this nitrogen is positioned only 2.9 Å from the zinc hydroxide, we suggest that thiocyanate is a good starting model for the docking of carbon dioxide to the active site of this β-CA (see Fig. 2). Rv3588c lacks the occluding loop of Rv1284 because of local disorder and therefore belongs to the category of β-CA enzymes with more open active sites.
It lacks the histidine residue that lines the cavity in Rv1284 but retains the tyrosine residue (Tyr89 in Rv3588c) implicated in proton transfer for the A. thaliana enzyme (35).
Bicarbonate/carbon dioxide is important in fatty acid biosynthesis as well as in the synthesis of various small molecules. Merlin et al. (38) have shown that the β-CA encoded by the can gene (previously designated yadF) is required for the growth of E. coli in air, where the demand for bicarbonate is estimated to be 10³- to 10⁴-fold greater than can be supplied by uncatalyzed hydration. Although multiple CAs in a single organism are not uncommon (humans have at least 14 α-class enzymes) (7), our studies of two of the three CAs that have been identified in M. tuberculosis demonstrate that these enzymes have a number of distinct properties, and so they presumably occupy different functional niches. Several studies have suggested that Rv1284 has a particularly vital role. The closer relationship of this protein to the enzyme from the thermophilic methanoarchaeon M. thermoautotrophicum is interesting in this context, although it is not yet clear which aspects of this unusual organism are most relevant. The second mycobacterial β-CA, Rv3588c, appears to be very important for mycobacterial survival, although its closer relationships to the plant and bacterial enzymes may make it a less desirable target for drug design. We anticipate that such information will assist in the search for new inhibitors that can exploit the differences between these enzymes and the α-class CAs found in humans.