Mycobacterium tuberculosis strains possess functional cellulases.

The genomes of various Mycobacterium tuberculosis strains encode proteins that do not appear to play a role in the growth or survival of the bacterium in its mammalian host, including some implicated in plant cell wall breakdown. Here we show that M. tuberculosis H37Rv does indeed possess a functional cellulase. The x-ray crystal structure of this enzyme, in ligand complex forms, from 1.9 to 1.1 Å resolution, reveals a highly conserved substrate-binding cleft, which affords similar, and unusual, distortion of the substrate at the catalytic center. The endoglucanase activity, together with the existence of a putative membrane-associated crystalline polysaccharide-binding protein, may reflect the ancestral soil origin of the Mycobacterium or hint at a previously unconsidered environmental niche.

A genome-wide inventory of the carbohydrate-active enzyme repertoire of mycobacteria reveals a significant number of open reading frames (ORFs)¹ that encode proteins that do not appear to play a role in the growth or survival of the bacterium in its mammalian host, including some apparently implicated in cellulose breakdown. For example, the genomes of Mycobacterium tuberculosis H37Rv (1), M. tuberculosis CDC1551, and Mycobacterium bovis appear to possess plant cell wall hydrolases from the sequence-based glycoside hydrolase (GH) families (2) GH5, GH6, GH12, and GH16. Indeed, many of the unfinished mycobacterial genomes (M. tuberculosis 210, Mycobacterium marinum M, Mycobacterium smegmatis str. MC2 155, and Mycobacterium avium 104) possess similar ORFs, the functions of which are unknown.
M. tuberculosis H37Rv ORF Rv0062 encodes a protein, hereafter called Cel6, that contains an N-terminal sequence of 88 residues with an extended basic region and features consistent with a signal peptide, and a catalytic domain comprising residues 89-380, assigned to glycoside hydrolase family GH6. Characterized GH6 members fall into two classes, cellobiohydrolases and endoglucanases: the former liberate the disaccharide cellobiose from crystalline cellulose, while the latter make random internal cuts in single β-1,4-glucan chains. Structurally, these two classes differ mainly in the size of the two extended loops forming the active site (3-5). The sequence alignment of M. tuberculosis H37Rv Cel6 with other GH6 members suggests that it could be an endoglucanase.
We have cloned and expressed the protein derived from M. tuberculosis H37Rv ORF Rv0062. Here we show that it is a functional cellulase that liberates a repertoire of cello-oligosaccharides from acid-swollen cellulose and barley β-glucan with an endo action. The x-ray crystal structure of this protein, solved in ligand-complexed forms (Fig. 1), reveals that Rv0062 encodes a functional, indeed typical, GH6 endoglucanase with a catalytic "−1" subsite (6) that promotes the same distortion of the substrate glucoside as that observed previously for homologs from plant cell wall-degrading bacteria and fungi (7). The observation that Cel6 displays the structure and biochemical properties of a "classical" GH6 endo-acting cellulase poses intriguing questions for its role in mammalian pathogenic mycobacteria.

MATERIALS AND METHODS
Bacterial Strains, DNA Manipulation, and Cloning-The Escherichia coli strains used in this study were XL2blue and Origami (DE3) (Novagen), and the plasmids employed were pET22b and pET16b (Novagen) (8). E. coli was cultured in Luria broth supplemented with the appropriate antibiotics.
Protein Production and Purification-Cells were grown in Luria broth supplemented with 15 μg ml⁻¹ kanamycin and tetracycline and 100 μg ml⁻¹ ampicillin at 37°C to an A₅₉₅ of ~0.6 before induction of Cel6 expression by the addition of 0.2 mM isopropyl 1-thio-β-D-galactopyranoside and incubation for a further 7 h. Cell-free extracts were prepared, and His-Cel6 was purified by immobilized metal ion affinity chromatography using a Ni²⁺-chelating column (Amersham Biosciences) and a 5-500 mM imidazole gradient in 20 mM Tris-HCl, pH 7.9, 0.5 M NaCl, to elute the recombinant protein. Fractions containing Cel6 (as judged by SDS-PAGE) were concentrated using a Vivaspin concentrator before further purification by gel filtration using a S75 16/26 column (Amersham Biosciences) equilibrated in 20 mM sodium HEPES buffer, pH 7.5, containing 100 mM NaCl, using a fast protein liquid chromatography system. The eluted protein was concentrated using a 30,000 cut-off Filtron, at 4°C, and washed into water (MilliQ) to a final concentration of 10 mg ml⁻¹.
* This work was supported by the Biotechnology and Biological Sciences Research Council. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
To whom correspondence should be addressed. E-mail: davies@ysbl.york.ac.uk.
¹ The abbreviations used are: ORF, open reading frame; GH, glycoside hydrolase; HPAEC, high pressure anion exchange chromatography; PEG, polyethylene glycol; MES, 4-morpholineethanesulfonic acid; ISOF, isofagomine.
Cel6-native was purified by anion exchange chromatography on a DEAE column (Amersham Biosciences) with a 0-500 mM NaCl gradient. Fractions containing Cel6, as judged by SDS-PAGE, were pooled, and ammonium sulfate was added to a final concentration of 1.4 M. After removal of insoluble material, the protein solution was subjected to hydrophobic chromatography using a Phenyl-Sepharose column (Amersham Biosciences) and a 1.5 to 0 M ammonium sulfate gradient, in 20 mM Tris-HCl buffer, pH 8.0, to elute the protein. Cel6-native was further purified by gel filtration as described for His-Cel6.
Cel6 Assays-Purified Cel6 was assayed for activity against polysaccharides using the reducing sugar assay of Miller (9). Enzyme reactions were carried out in 50 mM sodium phosphate/12 mM citrate (PC) buffer, pH 6.5, containing 1 mg ml⁻¹ soluble polysaccharide or 4 mg ml⁻¹ insoluble polysaccharide and 1 mg ml⁻¹ bovine serum albumin. The reactions, which were incubated at 37°C, were initiated by the addition of Cel6 to a final concentration of 500 nM. At regular intervals up to 20 min, 500-μl aliquots were removed and assayed for reducing sugar (9), and the product profile was determined by high pressure anion exchange chromatography (HPAEC) (10). To evaluate the reaction products generated from cello-oligosaccharides, 3 μM Cel6 was incubated with 300 μM of the cognate oligosaccharide in PC buffer; at regular intervals aliquots were removed, and the reaction products were analyzed by HPAEC.
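The turnover rates quoted later (moles of product per mole of enzyme per second) follow from a time course of this kind by dividing the initial rate of reducing-sugar release by the enzyme concentration. The following Python sketch shows that arithmetic under stated assumptions: the function name `turnover_rate` is hypothetical, the time course is assumed to lie within the linear (initial-rate) region, and the numbers in the usage example are invented for illustration and are not data from this study.

```python
def turnover_rate(times_s, product_uM, enzyme_uM):
    """Estimate turnover (s^-1) from a reducing-sugar time course.

    times_s    : sampling times in seconds
    product_uM : reducing sugar released at each time, in micromolar
    enzyme_uM  : enzyme concentration in micromolar

    Assumes every point lies in the linear, initial-rate region.
    """
    n = len(times_s)
    if n < 2 or n != len(product_uM):
        raise ValueError("need at least two matched (time, product) points")
    if enzyme_uM <= 0:
        raise ValueError("enzyme concentration must be positive")

    # Ordinary least-squares slope of product against time (uM per second).
    mean_t = sum(times_s) / n
    mean_p = sum(product_uM) / n
    sxx = sum((t - mean_t) ** 2 for t in times_s)
    sxy = sum((t - mean_t) * (p - mean_p) for t, p in zip(times_s, product_uM))
    slope = sxy / sxx

    # Moles of product per mole of enzyme per second.
    return slope / enzyme_uM


# Illustrative values only (not measurements from the paper):
# 0.5 uM enzyme releasing 0.5 uM reducing sugar per second.
rate = turnover_rate([0, 60, 120, 180], [0.0, 30.0, 60.0, 90.0], 0.5)
```

In practice the linear region would need to be verified before fitting, since substrate depletion flattens the curve at later time points.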
Crystallization and Structure Solution-All crystals were obtained by hanging-drop vapor diffusion with 1-μl drops containing a 50:50 (v/v) mix of the protein and the reservoir solution. The protein was incubated with 1 mM ligand for at least 30 min prior to crystallization. The complex of Cel6A-native with thiocellopentaose (SDP5) was crystallized using 12-14% (w/v) PEG 4000 and 200 mM lithium sulfate, as precipitant, in 100 mM MES buffer, pH 6.5. Crystals with cellobio-derived isofagomine (ISOF) were obtained using 12-14% (w/v) PEG 4000 and 200 mM lithium sulfate, as precipitant, in 100 mM sodium acetate buffer, pH 4.6. Cryoprotection involved supplementing the growth conditions with 20% (v/v) glycerol or 30% PEG 400, respectively. Data for His-Cel6-SDP5 were collected at European Synchrotron Radiation Facility beamline ID14-2, and those for Cel6-ISOF in the home laboratory using a CuKα rotating-anode source operating at 50 kV and 100 mA, with Osmic mirrors and a MARresearch detector. Data were processed with MOSFLM from the CCP4 suite (11) or the HKL suite of programs (12). All further computing was performed using the CCP4 suite, unless otherwise stated.
The structure of His-Cel6-SDP5 was solved by molecular replacement using the coordinates of the endoglucanase from Thermobifida fusca (Protein Data Bank code 1TML) as a search model. The program AMoRE (13) was used in conjunction with data in the resolution range 20-4 Å. This structure was subsequently used to solve the other Cel6 complexes. Five percent of the observations were set aside for cross-validation analysis (14) and were used to monitor various refinement strategies with REFMAC (15). Manual corrections of the model were performed using the X-FIT routines of the program QUANTA (Accelrys, San Diego, CA). Water molecules were added in an automated manner using ARP (16) and verified manually. The crystal structures of Cel6 with cellobiose (Protein Data Bank code 1UP0) and cellobiose-S-cellobiose (Protein Data Bank code 1UP3) were also solved at resolutions of 1.75 and 1.6 Å and deposited but are not discussed further.
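The cross-validation step mentioned above can be illustrated with a short Python sketch. It is not the REFMAC or CCP4 implementation: it simply reserves a random 5% of reflection indices as a "free" set and evaluates the conventional crystallographic residual, R = Σ| |Fobs| − |Fcalc| | / Σ|Fobs|, separately on the working and free sets. The function names `split_free_set` and `r_factor` are hypothetical, and the calculated amplitudes are assumed to be already on the same scale as the observed ones.

```python
import random


def split_free_set(n_reflections, free_fraction=0.05, seed=0):
    """Randomly reserve a fraction of reflection indices for cross-validation.

    Returns (work_indices, free_indices); the two lists are disjoint.
    """
    rng = random.Random(seed)
    indices = list(range(n_reflections))
    rng.shuffle(indices)
    n_free = round(n_reflections * free_fraction)
    free = set(indices[:n_free])
    work = [i for i in range(n_reflections) if i not in free]
    return work, sorted(free)


def r_factor(f_obs, f_calc, indices):
    """Conventional R over the chosen reflections.

    Assumes f_calc has already been scaled to f_obs.
    """
    numerator = sum(abs(abs(f_obs[i]) - abs(f_calc[i])) for i in indices)
    denominator = sum(abs(f_obs[i]) for i in indices)
    return numerator / denominator


# The free set is excluded from refinement, so R computed on it
# (R-free) acts as an independent check against over-fitting.
work, free = split_free_set(1000)
```

A refinement strategy that lowers R on the working set without lowering it on the free set is generally treated as over-fitting the model to the data.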

RESULTS AND DISCUSSION
The Three-dimensional Structure of Cel6 -Cel6 was crystallized in various ligand-bound forms, with non-hydrolyzable thio-oligosaccharides and with a disaccharide inhibitor, cellobio-derived isofagomine (Fig. 1), and data were collected at between 1.9 and 1.1 Å resolution (Table I). The catalytic domain presents a distorted (α/β)₈ barrel typical for family GH6 enzymes (3-5) (Fig. 2a), composed of eight α-helices and eight β-strands, seven of which form the β-barrel. As predicted from sequence, the structure possesses the shortened C-terminal loop typical of endoglucanases, consistent with its biochemical properties (see below). In marked contrast to previous GH6 endoglucanase structures, however, the active-center loops of Cel6 enclose the ligand-binding cleft (Fig. 2a) rather than remaining "open." Complexes with thio-oligosaccharides reveal binding in the aglycone (leaving group) subsites, termed +1 and +2, with sugars in the ⁴C₁ (chair) conformation (data not shown). A "kink" at +2 may allow for the formation of a third "+3" subsite (and there is ample protein surface at the non-reducing end of the −2 site to accommodate binding of longer substrates). A strong indication that this mycobacterial enzyme is a functional β-glucanase is that the complex structures confirm that all the direct interactions of the "glycone" −2 and −1 subsites and the aglycone +1 subsite are invariant. The +2 subsite shows more divergence but still maintains its hydrophobic sugar-binding platform, Trp250.
Substrate Binding and Distortion at the Catalytic Center-Further evidence for the catalytic integrity of Cel6 comes from its complex with the cellobio-derived isofagomine. This is a disaccharide inhibitor in which a glucosyl moiety is β-1,4 linked to the inhibitor isofagomine, which possesses a nitrogen in place of the anomeric carbon (Fig. 1). It is presumed to be a tight-binding inhibitor by virtue of the transition state-mimicking positive charge on the ring nitrogen of the compound as its conjugate acid (7). In the Cel6 structure, two molecules of the inhibitor occupy the substrate-binding site, one spanning subsites −2 to −1 and the other +1 to +2 (Fig. 2b). In subsites −2, +1, and +2 the glucoside and isofagomine rings lie in the relaxed ⁴C₁ (chair) conformation, while in the "catalytic" −1 subsite the aza-sugar moiety is distorted to a conformation intermediate between ᴼS₂ (skew-boat) and ²,⁵B (boat), as observed previously for the cellobiohydrolase Cel6A from Humicola insolens (7). The ²,⁵B conformation is implicated as the transition state conformation during catalysis, as it keeps C5, O5, C1, and C2 coplanar, an essential prerequisite of an oxocarbonium-ion-like species. Notably, the active center of Cel6 possesses the constellation of aspartates required for transition state stabilization, distortion, and catalysis (3, 7): Asp168, Asp353, and the catalytic acid, Asp206, and all the usual substrate-binding residues that flank the active center (Fig. 3). The sole difference in the catalytic subsite of Cel6, compared with the previously published GH6 enzymes, is that a Ser residue implicated in stabilizing a solvent water in an unusual "Grotthuss" inverting mechanism is replaced by Ala173. The putative attacking water, however, still resides "below" the anomeric carbon (Figs. 2b and 3).
Catalytic Activity of Cel6 -The catalytic function of the recombinant enzyme is demonstrated by its activity on both barley β-glucan and acid-swollen cellulose. The enzyme hydrolyzes both substrates to release an array of different gluco-oligosaccharides, a product profile typical of endo-acting β-glucanases (Fig. 4). Reducing sugar analysis of the reaction products showed that Cel6 displayed turnover rates (moles of product per mole of enzyme per second) of 25 s⁻¹ and 0.9 s⁻¹ on β-glucan and acid-swollen cellulose, respectively. These rates are of the same order as those observed for endo-acting GH6 cellulases from known saprophytic plant cell wall-degrading microorganisms, which are typically in the range 1-10 s⁻¹ on acid-swollen cellulose, for example. The enzyme displays no activity against a range of other polysaccharides, including amylose, xylan, mannan, galactan, arabinan, pectin, and laminarin, or against a range of aryl glycosides, including 4-nitrophenyl β-D-cellobioside. Cel6 is, however, also able to hydrolyze cello-oligosaccharides: cellopentaose is converted into cellobiose and cellotriose, and cellohexaose into cellobiose, cellotriose, and cellotetraose in a molar ratio of 1:4:1 (data not shown), while the enzyme displays no measurable activity against cellotetraose or cellotriose. Thus, at least five subsites in Cel6, as also predicted from the crystal structures of the enzyme-ligand complexes, make a highly significant contribution to catalysis.
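The 1:4:1 product ratio reported for cellohexaose can be converted into approximate frequencies of the two possible cleavage modes. The following Python sketch does this under two assumptions that go beyond the text: each cellohexaose molecule is cut exactly once, and the products are not hydrolyzed further (the latter is at least consistent with the lack of measurable activity on cellotetraose and cellotriose). The function name `cleavage_fractions` is hypothetical and is not part of the original analysis.

```python
from fractions import Fraction


def cleavage_fractions(g2, g3, g4):
    """Infer cleavage-mode frequencies for cellohexaose (G6).

    g2, g3, g4 : molar amounts (or ratio) of cellobiose, cellotriose,
                 and cellotetraose among the products.

    A G6 molecule cut once yields either G3 + G3 or G2 + G4.
    """
    if g2 != g4:
        # Each G2 + G4 event produces exactly one G2 and one G4.
        raise ValueError("single-cut model requires equal G2 and G4")

    events_g2_g4 = Fraction(g2)     # one event per cellobiose
    events_g3_g3 = Fraction(g3, 2)  # one event per two cellotrioses
    total = events_g2_g4 + events_g3_g3
    return {
        "G3+G3": events_g3_g3 / total,
        "G2+G4": events_g2_g4 / total,
    }


modes = cleavage_fractions(1, 4, 1)
for mode, fraction in modes.items():
    print(f"{mode}: {fraction}")
# prints:
# G3+G3: 2/3
# G2+G4: 1/3
```

Under these assumptions, about two-thirds of the cuts fall at the central glycosidic bond. The product ratio alone cannot show whether the G2 + G4 cuts release cellobiose from the reducing or the non-reducing end.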
What are the implications for M. tuberculosis, a non-plant cell wall-degrading organism, possessing functional cellulases? That they are most likely important is underlined by the fact that all finished mycobacterial genomes, excluding Mycobacterium leprae (discussed below), together with those of M. tuberculosis 210, M. marinum, M. avium 104, and M. smegmatis str. MC2 155, possess a GH6 cellulase, at least one GH3 β-glucosidase candidate (the enzyme that is required for the subsequent degradation of cellulase-derived products), and potential β-glucanases from GH5 and GH16. M. smegmatis str. MC2 155 also possesses a candidate β-glucosidase from family GH1. It is interesting to note that the GH5 and GH16 enzymes form distinct "subfamilies" when compared with the other characterized members of their respective families, and one cannot speculate on their function beyond saying that they will be retaining β-glycosidases with potential for transglycosylation.
Despite possessing the enzymes responsible for cellulose degradation, it is, however, extremely unlikely that M. tuberculosis, or related mycobacterial strains, can utilize the plant cell wall as a significant nutrient, as they lack the full repertoire of hydrolytic enzymes required to attack this composite structure. For example, no xyloglucanase, α-glucuronidase, or xylanase candidates are found in any of the mycobacterial genomes. That Cel6 possesses a signal peptide certainly points to an extracellular role. Equally important, the sole obligate parasite among mycobacteria, M. leprae, possesses none of the potential cellulase candidates in its genome.

Fig. 4. The endo-catalytic activity of Cel6. Cel6 at concentrations of 500 nM and 3 μM was incubated with 4 mg ml⁻¹ β-glucan and 5 mg ml⁻¹ acid-swollen cellulose, respectively, in 50 mM sodium phosphate, 12 mM citrate buffer, pH 6.5, containing 1 mg ml⁻¹ bovine serum albumin, at 37°C. At timed intervals aliquots were removed, and the reaction products were identified and quantified by HPAEC (10).

A Functional Endoglucanase in M. tuberculosis 20183
It is possible that these enzymes date from a prehistoric time when primitive mycobacteria were soil-based organisms, as M. bovis is still. Indeed, Cel6 displays significant sequence identity (~40%) with numerous Streptomyces GH6s, indicating a strong evolutionary relationship among these enzymes (Fig. 5). This is consistent with the view that Streptomyces and Mycobacterium are related Actinomycetes. It is estimated that these bacteria diverged from a common ancestral organism around 80 million years ago. It is possible, therefore, that Cel6 is derived from the ancestral organism. The striking similarity between specific islands within the genomes of Streptomyces coelicolor A3(2) and M. tuberculosis, however, suggests that gene transfer has occurred between the organisms when the bacteria occupied soil niches, and these events could also have resulted in the acquisition of the cellulase gene by the Mycobacterium. Irrespective of the mechanism of Cel6 acquisition by M. tuberculosis, it remains difficult to provide definitive insight into its role in the human pathogen. It is possible that the gene is an evolutionary relic and there has not been sufficient evolutionary time for its removal. Yet it is surely significant that, while M. tuberculosis contains only β-glucanases and a cellulose-binding module, S. coelicolor A3(2) contains the complete and extensive repertoire of hydrolytic enzyme genes that encode proteins that attack all the major plant structural polysaccharides. Thus, the retention of a small number of cellulase genes within the genome of the Mycobacterium suggests that these genes do confer a selective advantage on the organism. Is it possible that there is a previously undescribed secondary host, such as a rumen protozoon? Such a species would possess a rudimentary plant cell wall-degrading system, in which cellulose, or its degradation products, could be presented to the Mycobacterium.
An alternative evolutionary rationale for the β-glucanase activity displayed by M. tuberculosis is that the bacterium encounters biofilms and that some of the exo-polysaccharides on which these tight microbial ecosystems form are β-glucans. While the chemical composition of most extracellular polysaccharides that are integral to biofilm formation is unknown, some, such as that made by Pseudomonas aeruginosa, have recently been shown to consist of β-glycans (17). Indeed, a recent report has identified the regulon in the important human pathogen P. aeruginosa (18) that enables the bacterium to switch between a virulent, disease-causing state and a biofilm state in the mammalian host. Of interest in this context is the occurrence of a gene encoding a membrane-associated family 2 carbohydrate-binding module, CBM2 (CBM2s are known to bind to β-glucans such as cellulose and chitin, reviewed in Ref. 19), yet another incongruous open reading frame in all completed, and four unfinished, mycobacterial genomes.

Fig. 5. ... (24) and bootstrap analysis by ClustalW (25). The tree was displayed with TreeView (26). The thick lines identify various subfamilies. The accession numbers (GenBank™ or Swiss-Prot) are given only for subfamily 2, which contains the mycobacterial enzymes (accession numbers of representative members of the various other subfamilies include subfamily 1, Q12646; subfamily 3, Q02321; subfamily 4, Q60029).