From Soil to Structure, a Novel Dimeric β-Glucosidase Belonging to Glycoside Hydrolase Family 3 Isolated from Compost Using Metagenomic Analysis

Background: β-Glucosidases hydrolyze cellobiose, an important step in biomass saccharification. Results: The structure of JMB19063 is a homodimer, and residues from the partnering monomer form a portion of the active site. Conclusion: JMB19063 is the first structure in the GH3 family that requires dimerization for catalysis. Significance: Structural characterization of novel β-glucosidases may provide insights into improving biofuel production.
A recent metagenomic analysis sequenced a switchgrass-adapted compost community to identify enzymes from microorganisms that were specifically adapted to switchgrass under thermophilic conditions. These enzymes are being examined as part of the pretreatment process for the production of "second-generation" biofuels. Among the enzymes discovered was JMB19063, a novel three-domain β-glucosidase that belongs to the GH3 (glycoside hydrolase 3) family. Here, we report the structure of JMB19063 in complex with glucose and the catalytic variant D261N crystallized in the presence of cellopentaose. JMB19063 is the first structure of a dimeric member of the GH3 family, and we demonstrate that dimerization is required for catalytic activity. Arg-587 and Phe-598 from the C-terminal domain of the opposing monomer are shown to interact with bound ligands in the D261N structure. Enzyme assays confirmed that these residues are absolutely essential for full catalytic activity.

Switchgrass (Panicum virgatum) is one of the leading lignocellulosic feedstock candidates for production of "second-generation" biofuels (1-3), and the development of optimized process technologies for this feedstock has attracted significant attention in the scientific and commercial biofuel sectors. This focus on switchgrass as a feedstock and the desire to pretreat biomass at elevated temperatures have spurred interest in expanding the current repertoire of enzymes used for biofuel production from switchgrass. In a recent metagenomic study, random shotgun sequencing of a switchgrass-adapted compost community was used to identify enzymes from microorganisms that were specifically adapted to switchgrass under thermophilic conditions (4). Sequence analysis identified JMB19063 as a novel β-glucosidase from the GH3 (glycoside hydrolase 3) family. Although the species of origin for JMB19063 is unknown, the sequence most closely resembles proteins from the genera Flavobacterium and Bacteroides, with ~70-80% sequence identity. In the complex mixture of cellulases required to break down cellulose to glucose, β-glucosidases are the enzymes that hydrolyze cellobiose and other short soluble oligosaccharides to release glucose and are commonly viewed as one of the rate-limiting enzymes (5). The GH3 family represents a collection of >2700 enzymes in the Carbohydrate-Active enZYmes (CAZy) Database, including β-glucosidases (EC 3.2.1.21), xylan 1,4-β-xylosidases (EC 3.2.1.37), and glucan 1,3-β-glucosidases (EC 3.2.1.58). However, there are only eight crystal structures of GH3 enzymes deposited in the Protein Data Bank.
Because the majority of β-glucosidases from lignocellulosic biomass microbiomes are members of the GH3 family (4, 6, 7), high-resolution structural characterization of a β-glucosidase from this family may provide insights into alleviating product inhibition and thereby enhancing the overall efficiency of saccharification by increasing the catalytic rate of the bottleneck enzyme.
Here, we present the x-ray crystal structure (solved to 2.20 Å resolution) and biochemical characterization of the GH3 β-glucosidase JMB19063, discovered from the metagenome sequence of a switchgrass-adapted compost microbial community. It is the first reported structure of a class of dimeric enzymes from the GH3 family, and we demonstrate that dimerization is absolutely required for catalytic activity. JMB19063 contains a C-terminal domain with an Ig-like region that contributes to both substrate binding and dimerization. We have also crystallized the catalytic variant D261N in the presence of cellopentaose to more closely examine the details of substrate binding. The results provide new insight into the GH3 family of enzymes and may enable the identification and optimization of other dimeric β-glucosidases.

EXPERIMENTAL PROCEDURES
Metagenome Sequencing, Assembly, and Analysis-The details of the sequence analysis and identification of the enzyme sequences have been described previously (4). Briefly, metagenomic DNA from a 31-day switchgrass-adapted compost community was shotgun-sequenced using the Roche 454 GS-FLX titanium system by the Department of Energy Joint Genome Institute and assembled using Newbler (version 2). Candidate full-length glycoside hydrolase enzyme sequences were identified by BLASTX hits (E < 1e-10) matching functionally characterized enzymes in the CAZy Database (8) over at least 90% of their length. Frameshift errors due to 454 homopolymer sequencing errors were identified and manually corrected based on the gapped BLASTX alignment to the closest protein sequences.
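The hit-selection criteria described above can be sketched in a few lines of Python. This is only an illustration, not the pipeline used in the study: the `Hit` record, its field names, and the choice to measure the 90% coverage against the length of the reference protein are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical record for one BLASTX hit (field names are assumptions)."""
    query_id: str      # assembled contig / candidate gene
    subject_id: str    # characterized CAZy reference protein
    evalue: float
    align_len: int     # aligned residues on the reference protein
    subject_len: int   # full length of the reference protein


def is_full_length_candidate(hit, max_evalue=1e-10, min_coverage=0.90):
    """Apply the two thresholds quoted in the text: E < 1e-10 and >= 90% coverage."""
    coverage = hit.align_len / hit.subject_len
    return hit.evalue < max_evalue and coverage >= min_coverage


def select_candidates(hits):
    """Keep only hits that pass both thresholds."""
    return [h for h in hits if is_full_length_candidate(h)]


if __name__ == "__main__":
    # Made-up hits for illustration only.
    example = [
        Hit("contig_1", "ref_a", 1e-50, 700, 750),  # passes both thresholds
        Hit("contig_2", "ref_b", 1e-5, 700, 750),   # E-value too high
        Hit("contig_3", "ref_c", 1e-30, 300, 750),  # coverage too low
    ]
    print([h.query_id for h in select_candidates(example)])
```

Manual frameshift correction, as described in the text, is not modeled here.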
Protein Expression and Purification-The codon-optimized β-glucosidase gene without the putative signal peptide sequence was inserted into the pET-DEST42 vector (Invitrogen) and then transformed into the BL21 Star (DE3) Escherichia coli strain (Invitrogen) for expression of the β-glucosidase with a C-terminal His6 tag as described previously (4). Overexpression and purification of the recombinant β-glucosidase were performed following a previously reported procedure (9). After purification, the buffer was exchanged with 200 mM NaCl and 25 mM HEPES at pH 7 using an Econo-Pac 10 desalting column (Bio-Rad). The protein solution was passed through a 0.45-μm membrane syringe filter to remove trace amounts of visible precipitate. Purified protein was then concentrated to 10 mg/ml using a Vivaspin 6 centrifugal concentrator (Sartorius Stedim Biotech, Göttingen, Germany) for crystallization. The protein concentrations were determined by the Bradford assay (Bio-Rad). The functional mutants of the β-glucosidase (D261N, R587A, F598A, and R587A/F598A) were created employing the QuikChange site-directed mutagenesis protocol (Stratagene). All recombinant β-glucosidase mutants were expressed and purified by the same methods described above for the wild-type β-glucosidase.
Enzyme Assays and Kinetic Measurements-The β-glucosidase activity at various temperatures and pH values was measured as described previously (4) using 4-nitrophenyl β-D-glucopyranoside (pNPG). The kinetics of pNPG hydrolysis by the β-glucosidase was measured at 50°C in 50 mM MES buffer (pH 6.5) containing 0.1-10 mM pNPG and 0.5 μg (wild-type) or 5 μg (R587A, F598A, and R587A/F598A) of the recombinant enzyme in a 1-ml reaction volume. The hydrolysis reaction was quenched by mixing the reaction mixture with an equal volume of 2% Na2CO3 solution. The amount of released p-nitrophenol was measured by reading the absorbance at 420 nm. Eadie-Hofstee analyses were used to determine the kinetic parameters (Vmax, Km, and kcat) for the wild-type and mutant enzymes.
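For readers unfamiliar with the Eadie-Hofstee transformation, the following Python sketch shows the general idea: the Michaelis-Menten equation is rearranged to v = Vmax − Km·(v/[S]), so a straight line fitted to v against v/[S] has intercept Vmax and slope −Km, and kcat follows as Vmax divided by the molar enzyme concentration. The function names and the synthetic, noise-free data are assumptions for illustration and are not taken from the measurements reported here.

```python
def eadie_hofstee(substrate_conc, rates):
    """Estimate (Vmax, Km) by least squares on the Eadie-Hofstee plot.

    substrate_conc: substrate concentrations [S]
    rates:          initial rates v measured at each [S]
    The line v = Vmax - Km * (v / [S]) is fitted, so the intercept is Vmax
    and the slope is -Km.
    """
    x = [v / s for s, v in zip(substrate_conc, rates)]
    y = list(rates)
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, -slope


def turnover_number(vmax, enzyme_conc):
    """kcat = Vmax / [E]total, with both values in consistent molar units."""
    return vmax / enzyme_conc


if __name__ == "__main__":
    # Synthetic data generated from the Michaelis-Menten equation
    # (illustrative values only, arbitrary rate units).
    true_vmax, true_km = 1.0, 0.79
    s_values = [0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0]
    v_values = [true_vmax * s / (true_km + s) for s in s_values]
    vmax, km = eadie_hofstee(s_values, v_values)
    print(f"Vmax = {vmax:.3f}, Km = {km:.3f}")
```

With real data, measurement error appears in both axes of an Eadie-Hofstee plot, so direct nonlinear fitting of the Michaelis-Menten equation is a common alternative; the linear form is shown here because it is the analysis named in the text.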
The hydrolysis of cellobiose, cellotetraose, and maltose by the wild-type β-glucosidase and the three mutants was further qualitatively analyzed by nanostructure-initiator mass spectrometry (10)-based enzyme activity assays (Nimzyme) (11,12). The principle of Nimzyme has been described previously (11,12). In brief, hydrolysis reactions were measured at 50°C in 25 mM MES buffer (pH 6.5). In each case, 0.1 mM of a Nimzyme substrate (cellobiose, cellotetraose, or maltose) was incubated with 2 μg (wild-type) or 10 μg (R587A, F598A, and R587A/F598A) of the recombinant enzyme in 25-μl reaction volumes for 15 min. For the hydrolysis of cellotetraose by the wild-type protein, additional time points were added. Reactions were quenched by the addition of an equal volume of methanol. Subsequently, 0.7 μl per sample was spotted onto a nanostructure-initiator mass spectrometry surface and analyzed using an Applied Biosystems 4800 MALDI-TOF/TOF mass analyzer. Enzyme activities were determined as product/substrate ratios. In addition, the enzyme activity on polysaccharides was tested by incubating 35 μg of wild-type JMB19063 with 1 mg of carboxymethylcellulose, glucomannan, arabinoxylan, galactomannose, carboxymethyl curdlan, or lichenan in a 1-ml reaction volume containing 50 mM MES buffer (pH 6.5) under the same conditions used for the pNPG assay. At the end of the incubation, a 3,5-dinitrosalicylic acid assay was performed to detect reducing sugar (33).
Crystallization-Crystallization screening was carried out on a Phoenix robot (Art Robbins Instruments, Sunnyvale, CA) using a sparse matrix screening method (13). Wild-type JMB19063 and the D261N point mutant were crystallized by sitting drop vapor diffusion in drops containing a 2:1 ratio of protein solution to a solution of 0.1 M sodium citrate and 30% (w/v) PEG 3350. For JMB19063 D261N, the crystallization buffer contained an additional 5 mM cellopentaose. Rod-like crystals were observed within 2 days. For data collection, crystals were flash-frozen in liquid nitrogen directly from the crystallization drop.
X-ray Data Collection and Structure Determination-The x-ray data sets for JMB19063 were collected at the Berkeley Center for Structural Biology on beamlines 8.2.1 and 8.2.2 of the Advanced Light Source at the Lawrence Berkeley National Laboratory. Diffraction data were recorded using ADSC Q315r detectors (Area Detector Systems Corp., San Diego, CA). Processing of image data was performed using the HKL2000 suite of programs (14). Phases were calculated by molecular replacement with the program Phaser (15) using the structure of barley β-D-glucan exohydrolase (Protein Data Bank code 1EX1) (16) as a search model. Automated model building was conducted using RESOLVE (17) from the PHENIX suite of programs (18), resulting in a model that was 85% complete. Manual building using Coot (19) was alternated with reciprocal space refinement using PHENIX (18). Waters were automatically placed using PHENIX and manually added or deleted with Coot according to peak height (3.0 σ in the Fo − Fc map) and distance from a potential hydrogen bonding partner (<3.5 Å). TLS refinement (20) using 10 groups, chosen using the TLSMD web server (21), was used in later rounds of refinement. For JMB19063 D261N, an initial rigid body refinement using PHENIX (18) was performed with the final wild-type model (all waters removed). This model was then refined and built in the same manner as the wild-type model. All data collection, phasing, and refinement statistics are summarized in Table 1.
Creation of Structure-based Sequence Alignment and the Phylogenetic Tree-To build a high-quality sequence alignment in this diverse protein family, we used a combination of structural and sequence information. First, we performed pairwise structural alignments with 3DHIT (22) of six GH3 family structures (chain A of proteins with Protein Data Bank codes 3BMX, 1X38, 2OXN, 2X41, 3F94, and 3ABZ) with the JMB19063 structure. These six structures were selected based on their resolution and to remove redundancy at the 90% sequence identity level. For each of these structures, we performed a BLAST search (23) with the GH3 sequences (after removing short sequence fragments) from the CAZy Database (8) to find sequences between 25 and 90% sequence identity to the structure's sequence, and the resulting sequences were aligned with MUSCLE (24). These seven multiple sequence alignments were then combined into one sequence alignment by aligning equivalent positions in the individual sequence alignments using the pairwise structural alignments with the JMB19063 structure. Redundant sequences were filtered out at 90% sequence identity, preferentially keeping sequences with structures, experimental characterization, and longer lengths, in this order of priority.
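The redundancy filter with its stated priority order (structure, then experimental characterization, then length) can be sketched as a greedy procedure. The Python below is a minimal illustration under assumptions that are not stated in the text: the `AlignedSeq` record is hypothetical, identity is computed over alignment columns where both sequences are ungapped, and the greedy strategy is one plausible way to apply the priorities.

```python
from dataclasses import dataclass


@dataclass
class AlignedSeq:
    """Hypothetical record for one row of the combined alignment."""
    name: str
    aligned: str                 # alignment row, '-' marks a gap
    has_structure: bool = False
    characterized: bool = False

    @property
    def length(self):
        """Number of residues (non-gap characters)."""
        return sum(c != "-" for c in self.aligned)


def identity(a, b):
    """Fraction of identical residues over columns ungapped in both rows."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def filter_redundant(seqs, threshold=0.90):
    """Greedy filter: visit sequences in priority order and keep one only if
    it is below the identity threshold to every sequence already kept."""
    ranked = sorted(
        seqs,
        key=lambda s: (s.has_structure, s.characterized, s.length),
        reverse=True,
    )
    kept = []
    for s in ranked:
        if all(identity(s.aligned, k.aligned) < threshold for k in kept):
            kept.append(s)
    return kept


if __name__ == "__main__":
    # Toy sequences for illustration only.
    seqs = [
        AlignedSeq("duplicate", "ACDEFGHIKL"),
        AlignedSeq("divergent", "ACDEFWWWWW"),
        AlignedSeq("with_structure", "ACDEFGHIKL", has_structure=True),
    ]
    print([s.name for s in filter_redundant(seqs)])
```

Because higher-priority sequences are visited first, a sequence with a structure is always retained over a near-identical sequence without one.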
Gapped positions and their neighbors were trimmed from the above structure-based sequence alignment by removing positions with <60% occupancy and two flanking positions. A tree was built from the resulting trimmed alignment using FastTree 2.1.3 (25), and the tree was rerooted such that the root was the midpoint between the leaves with the farthest evolutionary distance.
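The trimming step can be illustrated with a short Python sketch. It is written under one stated assumption: "two flanking positions" is interpreted as one column on each side of a low-occupancy column (`flank=1`); if two columns per side were intended, `flank=2` would apply. The function names are illustrative and do not come from the software used in the study.

```python
def column_occupancy(rows):
    """Fraction of non-gap characters in each alignment column."""
    n_rows = len(rows)
    n_cols = len(rows[0])
    return [sum(row[i] != "-" for row in rows) / n_rows for i in range(n_cols)]


def trim_alignment(rows, min_occupancy=0.60, flank=1):
    """Drop columns below the occupancy threshold plus `flank` neighbors per side."""
    occ = column_occupancy(rows)
    n_cols = len(occ)
    drop = set()
    for i, o in enumerate(occ):
        if o < min_occupancy:
            # Mark the gapped column and its neighbors, clipped to the alignment.
            for j in range(max(0, i - flank), min(n_cols, i + flank + 1)):
                drop.add(j)
    keep = [i for i in range(n_cols) if i not in drop]
    return ["".join(row[i] for i in keep) for row in rows]


if __name__ == "__main__":
    # Toy alignment: the third column is occupied in only one of three rows.
    alignment = ["ACDEFG", "AC-EFG", "AC-EFG"]
    print(trim_alignment(alignment))
```

Tree building and midpoint rerooting are performed by dedicated software in the text and are not reproduced in this sketch.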

RESULTS AND DISCUSSION
Overall Structure of JMB19063-The crystal structure of JMB19063 in complex with β-glucose was solved to 2.20 Å resolution. The structure of the catalytic variant D261N crystallized in the presence of cellopentaose was also determined to 2.20 Å. The electron density map is generally well ordered throughout the entire polypeptide chain, with the only disorder observed in the C-terminal His6 tag and linker region (amino acids 738-775). This is the first structure described for a dimeric GH3 enzyme, and the enzyme forms a dimer of identical monomers, each composed of three distinct domains (Fig. 1A). Domain 1 (amino acids 1-349) forms an (α/β)8-TIM barrel. Domain 2 (amino acids 350-571) is an α/β-sandwich comprising a six-stranded β-sheet sandwiched between three α-helices on either side. Most structurally described enzymes from the GH3 family contain this two-domain architecture, corresponding to the Glyco_hydro_3 (Pfam PF00933) and Glyco_hydro_3_C (Pfam PF01915) domain families. The first of these structures solved was the barley exohydrolase (HvExoI) (Fig. 1B) (16). JMB19063 contains an additional domain at the C terminus (amino acids 572-737) that consists of a 50-amino acid linker and an Ig-like β-sandwich fold region. The Ig-like region is most similar to the family of CARDB domains (Pfam PF07705), which are bacterial cell adhesion domains. A structural Ca2+ ion is observed in the Ig-like regions of both monomers. It is coordinated by Asp-664, the backbone carbonyl of Val-666, and four water molecules. The calcium is positioned similarly to Ca2+ ions in the structurally related cellulose-binding domains from the cellulosomal scaffoldin subunit of Clostridium cellulolyticum (Protein Data Bank code 1G43) (26) and the Cel9V glycoside hydrolase from Clostridium thermocellum (code 2WNX) (27).
In JMB19063, the Ca2+ ion helps position a loop, consisting of residues 664-674, that is at the dimer interface and makes five hydrogen bonds with the other monomer. Another three-domain β-glucosidase from the GH3 family has recently been described in Thermotoga neapolitana (TnBgl3B) (28). The majority of the TnBgl3B structure superposes well with JMB19063 (Fig. 1C), and TnBgl3B also has a third Ig-like domain at the C terminus. In the structure of TnBgl3B and all other previously described structures from the GH3 family, the active site is at the interface of domains 1 and 2, and residues from both domains play a role in substrate binding. The C-terminal domain of TnBgl3B does not contact the active site and is described as having no known function (28). In our structure of JMB19063, the C-terminal domain from the opposing monomer composes a portion of the enzyme active site, and residues from this domain play key roles in substrate binding (Fig. 2).
Dimerization-The buried solvent-accessible surface of the dimer interface as calculated by the protein-protein interaction interface server (PISA) (29) is 3286 Å², and the interface comprises 71 hydrogen bonds and 19 salt bridges. The majority of the interactions occur between the C-terminal linker region and domains 1 and 2 of the other monomer. One notable exception to this is the Ca2+-binding loop (positions 664-674) in the Ig-like region, which forms five hydrogen bonds with domain 1. The results from structural analysis were confirmed by size-exclusion chromatography, indicating that the enzyme is indeed homodimeric in nature. Only one other structure of a multimeric GH3 enzyme has been described. A β-glucosidase from Kluyveromyces marxianus (KmBglI) forms a tetramer (30). The crystal structure of KmBglI suggests that tetramer formation is not essential for catalysis because the subunit interfaces are far from the active site. In contrast, because the active site contribution of the C-terminal domain comes from residues of the opposing monomer, dimerization is essential for the activity of JMB19063. It appears that in the GH3 family, neither the role of a third domain in substrate binding nor the necessity of multimerization for catalytic activity has heretofore been described. Recently, the structure of a GH3 enzyme from Synechococcus sp. PCC 7002 was deposited in the Protein Data Bank (code 3SQL). Although it is currently unpublished, the structure suggests that, like JMB19063, it is dimeric. The enzyme has two domains and therefore does not have the same dimer interface as JMB19063. However, the interface is close to the active site, and thus, it is possible that residues from the partnering monomer may play a role in substrate binding.
Active Site-In the structure of wild-type JMB19063, although glucose was not specifically added to the crystallization buffer, well resolved density for a β-glucose is observed at the active site cleft formed by the interface of domains 1 and 2 in both monomers, and the glucose molecule occupies the −1 subsite (Fig. 2A). Asp-261 in domain 1 and Glu-488 in domain 2 are positioned similarly to the catalytic residues (Asp-285 and Glu-491) in HvExoI, suggesting that Asp-261 and Glu-488 act as the catalytic nucleophile and acid/base, respectively, in JMB19063. The glucose is tightly bound by a network of hydrogen bonds with Asp-84, Arg-142, Lys-181, His-182, Asp-261, and Glu-488 and is further stabilized by hydrophobic interaction with Trp-430 (Fig. 2C). The glucose molecule most likely originates from protein expression, as there is some glucose contamination in yeast extract (31) that is carried through the purification process.
The catalytic mutant JMB19063 D261N was crystallized in the presence of cellopentaose. However, in both subunits, interpretable density was observed only for three of the five sugars, in the −1, +1, and +2 subsites (Fig. 2B). Although the variant is catalytically inactive, it was not possible to model an intact cellotriose moiety into the electron density. It was necessary to introduce a break between the sugars in the −1 and +1 subsites to correctly fit the density. Therefore, a β-glucose was modeled into the −1 subsite, and a cellobiose moiety was modeled into the +1 and +2 subsites. A small residual activity coupled with long crystallographic time scales is the probable cause of the broken bond between the sugars in the −1 and +1 subsites. Some weak electron density exists beyond the +2 subsite; however, it was not clear enough to justify modeling in either of the remaining two sugars. This disorder is likely due to the lack of any interactions between the protein and sugar units beyond the +1 subsite. The glucose in the −1 subsite almost exactly overlaps with the wild-type JMB19063 glucose. Additionally, all active site residues, including Asn-261, are positioned nearly identically to their wild-type counterparts. Therefore, it is likely that the ligands are bound to the D261N variant in the same manner as we would expect from the wild-type enzyme. The sugar in the +1 subsite is stabilized by hydrogen bonds to Glu-488 and hydrophobic stacking with Tyr-262 (Fig. 2D). Additional hydrogen bonding and hydrophobic stacking are provided by Arg-587 and Phe-598, respectively, from the C-terminal domain of the opposing monomer. These interactions are spatially conserved in HvExoI at Arg-291 and Trp-434, which are both in domain 1.
Because Arg-587 and Phe-598 from the C-terminal domain of the opposing monomer contribute to substrate binding, it is necessary for the enzyme to be in a dimeric state to have full catalytic activity. Therefore, we hypothesized that any GH3 enzyme that has similar amino acids conserved at those sites is a functional dimer. To investigate this, we built a phylogenetic tree using the existing GH3 crystal structures to provide a map of the relationships in the family (Fig. 3; for a high-resolution version, including accession numbers, see supplemental Fig. S2). A large clade containing JMB19063 was found to have a conserved arginine at the position corresponding to Arg-587 and an aromatic residue at a position analogous to Phe-598. Of the enzymes in this clade, only a β-glucosidase from Flavobacterium meningosepticum has been biochemically characterized (32). Like JMB19063, this enzyme was found to be dimeric.
Enzyme Kinetics-JMB19063 hydrolyzed a variety of substrates with β-1,4 linkages up to cellotetraose, including cellotriose, cellobiose, pNPG, and 4-nitrophenyl β-D-cellobioside. The enzyme was unable to hydrolyze 4-nitrophenyl β-D-xylopyranoside and a range of polysaccharide substrates that we tested, including carboxymethylcellulose, glucomannan, lichenan, galactomannose, and carboxymethyl curdlan. The Michaelis-Menten parameters of the β-glucosidase activity were determined using pNPG as the substrate at pH and temperature optima (Fig. 4). JMB19063 hydrolyzed pNPG with a kcat of 150 s⁻¹ and a Km of 0.79 mM (Table 2). The crystal structure of the cellopentaose-bound D261N mutant revealed that, as a consequence of dimerization, Arg-587 and Phe-598 interact with the bound cellopentaose in the catalytic pocket of the opposing subunit. Therefore, we hypothesized that these non-catalytic residues may influence enzyme activity by complementing the catalytic pocket of the other subunit as a result of dimerization. To determine whether Arg-587 and Phe-598 influence enzyme activity, each of these residues was mutated to alanine (Table 2); the kcat values for R587A and F598A were 37 and 0.87% of that of the wild-type enzyme, respectively, and the Km values for R587A and F598A were 2.2- and 4.9-fold over that of the wild-type enzyme, respectively. The double mutant (R587A/F598A) had no detectable activity on pNPG, so the Michaelis-Menten parameters were not measured. Size-exclusion column chromatograms of the purified mutant enzymes overlapped the chromatogram of the wild-type enzyme at the 161-kDa region (supplemental Fig. S1), indicating that the mutants retain the oligomeric state of the wild-type enzyme and that the differences in kinetic parameters are direct results of the mutations. Therefore, these results strongly suggest that Arg-587 and Phe-598 are essential for formation of the JMB19063 catalytic pocket with the dimerized subunit and are important in the enzyme activity of the β-glucosidase. Additionally, the hydrolysis of cellobiose, cellotetraose, and maltose was qualitatively analyzed using the nanostructure-initiator mass spectrometry (10)-based Nimzyme assay (11,12). Whereas the wild-type protein hydrolyzed >90% of cellobiose under our prescribed assay conditions, R587A converted only ~5% and F598A only ~10% of cellobiose (Table 3). Considering that five times more mutant than wild-type protein was used for these assays, the catalytic activity of R587A was only ~1.0% and that of F598A was only ~2.3% of that of the wild-type enzyme. The double mutant (R587A/F598A) had no detectable activity.
Furthermore, the wild-type β-glucosidase was able to utilize cellotetraose, whereas none of the mutant proteins showed this activity (Table 3). Monitoring the hydrolysis of cellotetraose by the wild-type protein over time (Fig. 5) clearly showed the stepwise degradation from cellotetraose via cellotriose and cellobiose down to glucose. None of the tested enzyme variants was able to cleave the α-1,4 linkages in maltose (Table 3). Together, these data also suggest the involvement of Arg-587 and Phe-598 in the formation of the catalytic pocket.
FIGURE 5. Time course of stepwise conversion of cellotetraose by the wild-type protein to cellotriose, cellobiose, and glucose monitored in a Nimzyme assay. All activities were corrected by the values of negative control samples without enzyme. Error bars represent S.E. of three independent experiments.
TABLE 2. Enzymatic analysis of JMB19063. The Michaelis-Menten parameters were determined using pNPG as the substrate as described under "Experimental Procedures." Reactions were incubated for 30 min with 0.5 μg (wild-type) or 5 μg (R587A, F598A, and R587A/F598A) of the recombinant enzyme in a 1-ml reaction volume. Data represent the means ± S.E. of four independent experiments for each enzyme. The R587A/F598A mutant did not have any detectable activity.

Conclusions-We have presented the crystal structures of JMB19063 and a D261N catalytic variant of JMB19063. The structure of JMB19063, corroborated by size-exclusion chromatography, revealed that the enzyme is homodimeric. Each monomer is composed of three distinct domains. In addition to the two N-terminal domains, the C-terminal domain from the partnering subunit forms a portion of the active site cavity. Arg-587 and Phe-598 from this domain were shown to interact with bound ligands in the D261N catalytic variant structure. Enzyme assays confirmed that these residues are absolutely essential for full catalytic activity. Phylogenetic analysis showed that JMB19063 may be the first structure from a large subclass of enzymes in the GH3 family that has a unique and previously undescribed mode of substrate binding that requires dimerization.