Structure-Function Analysis of a Mixed-linkage β-Glucanase/Xyloglucanase from the Key Ruminal Bacteroidetes Prevotella bryantii B14

The recent classification of glycoside hydrolase family 5 (GH5) members into subfamilies enhances the prediction of substrate specificity by phylogenetic analysis. However, the small number of well characterized members is a current limitation to understanding the molecular basis of the diverse specificity observed across individual GH5 subfamilies. GH5 subfamily 4 (GH5_4) is one of the largest, with known activities comprising (carboxymethyl)cellulases, mixed-linkage endo-glucanases, and endo-xyloglucanases. Through detailed structure-function analysis, we have revisited the characterization of a classic GH5_4 carboxymethylcellulase, PbGH5A (also known as Orf4, carboxymethylcellulase, and Cel5A), from the symbiotic rumen Bacteroidetes Prevotella bryantii B14. We demonstrate that carboxymethylcellulose and phosphoric acid-swollen cellulose are in fact relatively poor substrates for PbGH5A, which instead exhibits clear primary specificity for the plant storage and cell wall polysaccharide, mixed-linkage β-glucan. Significant activity toward the plant cell wall polysaccharide xyloglucan was also observed. Determination of PbGH5A crystal structures in the apo-form and in complex with (xylo)glucan oligosaccharides and an active-site affinity label, together with detailed kinetic analysis using a variety of well defined oligosaccharide substrates, revealed the structural determinants of polysaccharide substrate specificity. In particular, this analysis highlighted the PbGH5A active-site motifs that engender predominant mixed-linkage endo-glucanase activity vis à vis predominant endo-xyloglucanases in GH5_4. However, the detailed phylogenetic analysis of GH5_4 members did not delineate particular clades of enzymes sharing these sequence motifs; the phylogeny was instead dominated by bacterial taxonomy. Nonetheless, our results provide key enzyme functional and structural reference data for future bioinformatics analyses of (meta)genomes to elucidate the biology of complex gut ecosystems.

The chemical and structural complexity of plant cell walls poses a challenge to organisms, from bacteria to humans, in extracting energy from biomass via polysaccharide saccharification and further metabolism. A diversity of amorphous polysaccharides ("hemicelluloses" and "pectins"), structural (glyco)proteins, and polyphenolics ("lignin") associate with paracrystalline cellulose microfibrils within the plant cell wall to form a composite framework that is both strong and dynamic (1). Among the many matrix glycans in land plants, the diverse family of xyloglucans and the mixed-linkage glucans predominate in varying ratios, depending on the plant lineage and tissue type (2-5). Mixed-linkage glucans have a general structure composed of short stretches of β(1,4)-linked glucosyl residues (typically 3-5 residues) that are linked together by β(1,3)-linkages (Fig. 1B) (6). In contrast, xyloglucans are composed of a linear backbone of β(1,4)-linked glucosyl residues decorated with a regular pattern of α(1,6)-linked xylosyl residues, which are further extended with galactosyl, fucosyl, and/or arabinosyl residues (Fig. 1A) (7). The β(1,3) "kinks" of mixed-linkage glucan and the complex branches of xyloglucan appear to serve a similar function of inducing structural disorder, thereby endowing these polysaccharides with significant water solubility and hydrogelation properties, while at the same time maintaining affinity to cellulose.
The vast diversity of glycoside hydrolases (GHs) directed toward plant cell walls is a testament to the importance, and the challenge, of biomass degradation in the biosphere. Indeed, hundreds of thousands of GHs have been annotated in over 130 structurally related families in the Carbohydrate-Active Enzymes (CAZy) classification, the majority of which are directed to plant polysaccharides (8-10). Moreover, considerable divergent evolution has occurred within individual GH families, giving rise to substrate specificity differences among members. Mapping functional diversity in such polyspecific families has been enabled by further division into phylogenetic subfamilies in some cases (11-14).
Glycoside hydrolase family 5 (GH5) is a key example of family diversity, with members demonstrating over 20 known specificities. GH5 members are united by a canonical double-displacement, anomeric configuration-retaining mechanism of hydrolysis, which involves two key catalytic carboxylic acid side chains presented on a conserved (β/α)8 protein fold (15). The recent division of GH5 into subfamilies has shown that many of these activities cluster into phylogenetic clades (11). Among these, GH5 subfamily 4 (GH5_4) is one of the largest and generally encompasses endo-β(1,4)-glucanases, including cellulases (EC 3.2.1.4), mixed-linkage endo-β(1,3)/β(1,4)-glucanases (EC 3.2.1.73), and highly specific endo-xyloglucanases (EC 3.2.1.151) evolved for the saccharification of plant biomass. GH5_4 endo-xyloglucanases (16,17) are particularly distinguished by their ability to accommodate and harness the numerous extended α(1,6)-xylosyl branches in diverse xyloglucans (18). Unfortunately, functional characterization of GH5_4 remains limited: most of the characterized members have not been tested consistently on the same panel of substrates (e.g. including xyloglucan) (19,20), so clear delineation of polysaccharide specificity in this subfamily is not straightforward. This presents a significant difficulty for in silico analysis of (meta)genomes for functional prediction, as well as for the selection and application of specific enzymes for industrial biomass utilization.
To address this issue, we present here the detailed characterization of a GH5_4 member, PbGH5A, from the symbiotic gut bacterium Prevotella bryantii B14 involved in dietary polysaccharide breakdown (21-23). Locus PBR_0368 of the P. bryantii B14 genome encodes a bi-modular gene product composed of a predicted N-terminal, Signal Peptidase I-cleavable signal peptide, followed by a GH26 module and a C-terminal GH5 module (PbGH5A) (23). Early efforts to clone PBR_0368 and characterize its product (equivalent to GenBank AAC97596, also known as ORF4, CMCase, or Cel5A) revealed general endoglucanase activity via assay on carboxymethylcellulose (24,25). However, detailed specificity data are currently lacking, especially in light of the identified bimodularity of this protein and the diverse specificities found within GH26 and GH5 (26). Notably, PBR_0368 is located in a predicted Polysaccharide Utilization Locus encoding hallmark SusD- and SusC-like proteins and at least two other GHs whose collective function is currently unknown (27). In this study, kinetic analyses on a range of natural and artificial substrates, together with tertiary structures of enzyme variants in complex with oligosaccharides and an active-site affinity label, yielded molecular level insight into interactions along the entire active-site cleft responsible for the specificity of recombinant PbGH5A for mixed-linkage glucan over xyloglucan.

Analytical Methods
HPAEC-PAD Carbohydrate Analysis-HPAEC-PAD was performed on a Dionex ICS-5000 system equipped with an AS-AP auto-sampler with a temperature-controlled sample tray run in a sequential injection configuration using Chromeleon 7 control software. The injection volume was 10 µl unless otherwise specified. A 3 × 250-mm Dionex CarboPac PA200 column with a 3 × 50-mm guard column was used for all HPAEC separations. Solvent A was ultrapure H2O; solvent B was 1.0 M NaOH prepared from a carbonate-free 50-52% stock, and solvent C was 1.000 M NaOAc prepared from anhydrous BioUltra-grade solid (Sigma). The gradients used were as follows. Gradient A: 0-5 min, 10% B, 2% C; 5-12 min, 10% B, 2-30% C linear gradient; 12-12.1 min, 50% B, 50% C; 12.1-13 min, return to initial conditions (exponential profile 9); 13-17 min, initial conditions. Gradient B: gradient A with 0% C initially. Gradient C: gradient A with 6% C and a 12-min linear gradient. Gradient D: gradient A with 3.5% C initially.
Xyloglucan Oligosaccharides (tXyGOs)-The tetradecasaccharide XXXGXXXG was prepared by partial digestion of xyloglucan from de-oiled tamarind kernel powder (dTKP, Premcem Gums) with His6-PpXG5 (16) followed by degalactosylation with Aspergillus niger β-galactosidase (Megazyme International, Bray, Ireland). XXXG was prepared similarly, with CjBgl35A replacing the A. niger β-galactosidase (33). Briefly, 100 g of dTKP were slowly added to 1 liter of 10 mM NH4OAc (pH 5.5) containing 500 units (~2.5 mg) of PpXG5 (where 1 unit is defined as the amount of enzyme that releases 1 µmol of glucose-eq reducing ends per min). The reaction was stirred at 50°C until a smooth, tan opaque suspension formed (~30 min). The reaction was sampled regularly. The samples were filtered, run over Dowex 1X2 Cl, and analyzed using HPAEC-PAD (gradient C) until the population of Glc8-tXyGOs was maximal (~4 h). The pH was raised to 8 (using 1 M NH4OH) to stop the reaction, and the solution was centrifuged for 15 min at 4000 × g. The translucent yellow supernatant was then decolorized by passage through Dowex 1X2 Cl and passed through a HisTrap FF crude column (GE Healthcare) to fully remove His6-PpXG5. The pH was then returned to 5.5 with 1 M AcOH, 1400 units of β-galactosidase were added, and the mixture was stirred at 30°C overnight. The degalactosylated tXyGOs were lyophilized for storage. 500 mg of this material was then dissolved in 5 ml of ultrapure water, 0.45-µm filtered, and purified using a 90-cm P6 BioGel (Bio-Rad) column (XK 26/100, GE Healthcare) run at 6 cm/h at room temperature. Fractions were monitored by HPAEC-PAD (gradient C), and homogeneous fractions of XXXG, XXXGXXXG, and XXXGXXXGXXXG were pooled and lyophilized to give a white foam (final yield: 200 mg of XXXG, 55 mg of XXXGXXXG, and 30 mg of XXXGXXXGXXXG).
Mixed-linkage Glucan Oligosaccharides (bMLGOs)-Glcβ(1,3)-Glcβ(1,4)-Glcβ(1,4)-Glcβ(1,3)-Glcβ(1,4)-Glcβ(1,4)-Glc (G3GGG3GGG) was prepared by the digestion of oat β-glucan (B-CAN, Garuda International) with Vitis vinifera family 16 endo-glucanase (VvEG16) (expressed and purified in-house) to give a mixture of oligosaccharides with the formula G3GGG(3GGG)n. 10 g of B-CAN was initially swelled in 500 ml of deionized H2O at 25°C for 15 min. The B-CAN was then collected by centrifugation at 1000 × g for 2 min, and the supernatant was discarded. The material was washed in this manner three times to extract glucose, unidentified oligosaccharides, fines, and colored material. The swelled particles were then resuspended in 500 ml of 10 mM NH4OAc (pH 5.5) and heated to 80°C. The solution was stirred until dissolved (~15 min) and allowed to cool to 37°C. 50 units (~10 mg) of VvEG16 was then added, and the reaction was stirred at 37°C overnight. 30 min into the digestion, the now significantly less viscous solution was centrifuged at 4000 × g for 5 min to remove a small amount of insoluble matter. The reaction completion was confirmed based on the oligosaccharide distribution observed by HPAEC-PAD (gradient A), and the opaque tan solution was centrifuged at 4000 × g for 5 min at room temperature to separate insoluble bMLG from soluble bMLG. The now clear and faintly yellow solution was then adjusted to pH 8 using 1 M NH4OH and decolorized by running through a 5-g plug of Dowex 1X2 Cl. The product was then precipitated from the clear colorless solution by the addition of 1 liter of acetone. After cooling to −20°C in the freezer, a well flocculated white precipitate was collected by centrifugation (in a high density polyethylene bottle) at 1000 × g for 2 min. The product was dried under vacuum for several hours to give 1.52 g of a white powder. 500 mg of this was then dissolved in 5 ml of ultrapure H2O and purified using a 90-cm P2 BioGel (Bio-Rad) column (XK 26/100, GE Healthcare) run at 6 cm/h at room temperature. Fractions were monitored by HPAEC-PAD (gradient A), and homogeneous fractions of G3GGG and G3GGG3GGG were pooled and lyophilized to give white foam (final yield: 41 mg of G3GGG and 62 mg of G3GGG3GGG).

Enzyme Cloning and Expression
A DNA fragment of P. bryantii B14 locus PBR_0368 encoding amino acid residues 425-776, corresponding to the GH5 catalytic domain (PbGH5A), was received from the Joint Genome Institute (jgi.doe.gov) in a pET101 plasmid and subcloned into the vector p15Tv-LIC (34), providing an N-terminal His6-tagged fusion with a tobacco etch virus protease cleavage site between the tag and the enzyme. PbGH5A was expressed in E. coli BL21(DE3) grown in auto-induction media (35) for 3 h at 37°C, followed by overnight growth at 18°C. Cells were harvested via centrifugation at 5000 × g. The resulting pellet was resuspended in a binding buffer (50 mM HEPES (pH 7.5), 500 mM NaCl, 5 mM imidazole, and 14% glycerol (v/v)) and lysed via sonication, and cell debris was removed via centrifugation at 30,000 × g for 30 min. Cleared lysate was loaded onto a 5-ml nickel-nitrilotriacetic acid column (Qiagen) pre-equilibrated with the binding buffer, and the column was washed with the binding buffer containing 30 mM imidazole. Bound proteins were eluted using the binding buffer with 250 mM imidazole. The His6 tag was removed by cleavage with tobacco etch virus protease (expressed and purified in-house per Ref. 36) overnight at 4°C during dialysis against 0.5 M NaCl, 10 mM HEPES (pH 7.5), and 0.5 mM tris(2-carboxyethyl)phosphine. The sample was passed over nickel-nitrilotriacetic acid resin, and the flow-through was collected. Fractions containing the protein of interest were identified by SDS-PAGE.

Enzyme Kinetics and Product Analysis
Polysaccharide Hydrolysis-Polysaccharide hydrolysis was quantified using either the BCA (37) or DNSA (38) assays. For BCA assays, reactions were prepared to a final volume of 100 µl and heated to the incubation temperature for 0, 15, and 30 min before being quenched by the addition of fresh BCA reagent (100 µl). A glucose series (10-500 µM) was run with each assay. Color was developed by heating to 80°C for 10 min before reading the absorbance at 563 nm. For DNSA assays, 100 µl of the reaction was quenched by adding 100 µl of DNSA reagent. The reaction was then heated to 95°C for 10 min to develop color, cooled to room temperature, and centrifuged for 1 min. The absorbance was read at 540 nm.
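The BCA readout described above relies on a glucose standard curve to convert absorbance into reducing-end concentration. The following is a minimal sketch of that conversion, assuming a linear standard curve; the function names and every number in it are hypothetical illustration values, not data or software from this study.

```python
# Sketch: convert BCA absorbance (A563) into reducing-end concentration
# via a glucose standard curve, then estimate an initial rate.
# All values below are made-up illustration data (assumption).

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def reducing_ends_uM(a563, slope, intercept):
    """Invert the standard curve: absorbance -> glucose equivalents (uM)."""
    return (a563 - intercept) / slope

# Hypothetical glucose standards spanning the 10-500 uM series
std_conc_uM = [10, 50, 100, 250, 500]
std_a563 = [0.07, 0.15, 0.25, 0.55, 1.05]
slope, intercept = fit_line(std_conc_uM, std_a563)

# Hypothetical readings for the 0, 15, and 30 min time points
times_min = [0, 15, 30]
readings = [0.09, 0.21, 0.33]
conc_uM = [reducing_ends_uM(a, slope, intercept) for a in readings]

# Initial rate = slope of concentration versus time
rate_uM_per_min, _ = fit_line(times_min, conc_uM)
print(f"initial rate: {rate_uM_per_min:.2f} uM reducing ends per min")
```

Fitting the three time points rather than taking a single end point gives a quick check that product formation is still linear over the assay window.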
The temperature optimum was determined in 50 mM (pH 5.5) sodium citrate buffer using 1 mg/ml bMLG as substrate (Fig. 3A) with 0.02 nM enzyme. The reaction was mixed at 4°C and incubated at a temperature ranging from 30 to 55°C for 30 min before reducing ends were quantified using the BCA assay. The specific activity of PbGH5A was standardized with 1 mg/ml bMLG substrate at 37°C in 10 mM (pH 5.5) sodium citrate buffer. The thermal stability of PbGH5A was determined by incubating the enzyme (1 µg/ml in 20 mM (pH 5.5) citrate) at temperatures ranging from 30 to 74°C. At regular time intervals, samples were taken, diluted into room temperature citrate buffer (pH 5.5), and assayed using 200 µM XXXG-CNP.
To determine limit-digestion products, PbGH5A (10 µg) was added to 1 ml of 0.1 mg/ml substrate in 50 mM NaOAc (pH 5.5) and incubated for 4 h at 37°C. 10 µl of the reaction was then analyzed directly by HPAEC-PAD using gradient A.
Chromogenic Oligosaccharide Hydrolysis-4-Nitrophenyl glycoside hydrolysis kinetics were determined by mixing enzyme (20-1000 nM), buffer (50 mM (pH 5.5) citrate), and substrate (0.1-25 mM) to a final volume of 200 µl. At 5-min intervals, 60 µl of the reaction was diluted into 540 µl of 50 mM Na2CO3, and A405 was measured on a Cary 60 UV-visible spectrometer with a 1-cm path length quartz cuvette. An extinction coefficient of 18.2 mM⁻¹ cm⁻¹ was used to quantify 4-nitrophenol release. 1 unit was defined as the amount of enzyme that releases 1 µmol of 4-nitrophenol/min. CNP substrate kinetics were determined by preheating 180 µl of 1.11× substrate stock (to give 0.02-10 mM final concentration) and adding 20 µl of 10× enzyme stock to give 0.01-100 nM final concentration in 20 mM NaOAc (pH 5.5). The change in absorbance at 405 nm was followed continuously over 10 min at 37°C in 200-µl quartz cuvettes using a Cary 300 UV-visible spectrometer with an 8-cell sample changer and thermostat. The extinction coefficient for CNP was determined to be 10.7 mM⁻¹ cm⁻¹ in the buffer used. For the XXXG-CNP substrate, the assay was optimized to obtain conditions compatible with residual activity measurement. The hydrolysis was monitored in 50 mM citrate buffer at pH 5.5; absorbance was measured at 405 nm, and the extinction coefficient for CNP was determined to be 11.2 mM⁻¹ cm⁻¹ in the buffer used. Specific activity measurements for wild-type enzyme and the three mutants, E280A, S119A, and H112A, were determined using GGG-CNP at 500 µM and XXXG-CNP at 200 µM in 50 mM citrate buffer at pH 5.5.
FIGURE 2 (legend fragment). Data were fit to determine the pH optimum (pH 5) and apparent kinetic pKa values (3.5 and 6.5). B, activity of PbGH5A on 1 mg/ml bMLG in various buffers across a range of pH at 37°C. Each point is the average of two replicates.
Native Oligosaccharide Hydrolysis-HPLC-based enzyme kinetics were determined by mixing a 10× enzyme in buffer stock (to give 0.02-10 nM and 20 mM (pH 5.5) citrate final enzyme and buffer concentration) with a 1.11× substrate stock (to give 0.005-1 mM final substrate concentration) preheated to 37°C. For example, 10 µl of 0.2 nM PbGH5A in 200 mM sodium citrate buffer (pH 5.5) was added to 90 µl of 1.11 mM GGGGG in ultrapure H2O preheated to 37°C. The reaction was then injected four times (10 µl each) at regular time intervals, and the change in peak area over time was quantified. Gradient A was used for monitoring cello-oligosaccharide and mixed-linkage glucan oligosaccharide degradation; gradient D was used for monitoring XXXGXXXG degradation. An 8-point linear calibration series from 0.4 to 100 µM was run for each product quantified. Rates were fit to the Michaelis-Menten model (39,40) using OriginPro graphing software (Origin Lab).
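The worked example in the paragraph above (10 µl of enzyme stock plus 90 µl of substrate stock) can be checked with simple dilution arithmetic. The sketch below reproduces those final concentrations and also writes out the Michaelis-Menten rate law to which the rates were fit; the helper names are hypothetical, and the actual fitting in this work was done in OriginPro, not with this code.

```python
# Sketch: dilution arithmetic for the 10x enzyme / 1.11x substrate mixing
# scheme, plus the Michaelis-Menten rate law. Helper names are hypothetical.

def final_conc(stock_conc, stock_vol, total_vol):
    """Concentration after diluting stock_vol of stock into total_vol."""
    return stock_conc * stock_vol / total_vol

enzyme_vol_ul = 10.0      # 10x enzyme-in-buffer stock
substrate_vol_ul = 90.0   # 1.11x substrate stock
total_vol_ul = enzyme_vol_ul + substrate_vol_ul

enzyme_nM = final_conc(0.2, enzyme_vol_ul, total_vol_ul)          # 0.2 nM stock
buffer_mM = final_conc(200.0, enzyme_vol_ul, total_vol_ul)        # 200 mM citrate
substrate_mM = final_conc(1.11, substrate_vol_ul, total_vol_ul)   # 1.11 mM GGGGG

def michaelis_menten(s, kcat, km, e0):
    """Initial rate v = kcat * [E]0 * [S] / (Km + [S])."""
    return kcat * e0 * s / (km + s)

print(f"enzyme: {enzyme_nM:.3f} nM, buffer: {buffer_mM:.1f} mM, "
      f"substrate: {substrate_mM:.3f} mM")
```

The 1.11× substrate stock is what makes a 9:1 volume ratio land on a round final substrate concentration (1.11 × 0.9 ≈ 1.0).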
To determine the regiospecificity of cellopentaose hydrolysis, ¹⁸O incorporation from [¹⁸O]water was determined by mass spectrometry (41). 1 µl of PbGH5A (0.10 µg/ml in 20 mM NH4OAc (pH 5.5)) and 1 µl of 0.5 M NH4OAc (pH 5.5) containing 5 mM NaOAc (to control adduct formation) were added to 22 µl of 97% [¹⁸O]water (Cambridge Isotope Laboratories) and mixed thoroughly by reciprocal pipetting. To this was then added 1 µl of 10 mM cellopentaose. The reaction was then mixed thoroughly again to give an estimated final ¹⁸O concentration of 85%. The reaction was drawn into a 50-µl gas-tight Hamilton syringe (Hamilton, model 1705) and infused into a Waters Xevo QTof at 2 µl/min using a syringe pump (Harvard Apparatus 11 Plus). The degree of isotopic labeling was quantified as a peak area ratio.
Inhibition Kinetics-Inhibition kinetic parameters were determined at 37°C using 0.038 µM PbGH5A in 25 mM citrate buffer at pH 5.5, containing 1% bovine serum albumin (BSA), and incubation with various putative inhibitor concentrations (0.1-3.5 mM, ca. 1/5 Ki to 5 Ki). Ten µl of enzyme/inhibitor solution was added to 190 µl of 0.13 mM XXXG-CNP in 5 mM sodium citrate buffer (pH 5.5), and the reaction was monitored at 405 nm over 1-2 min in a 1-cm quartz cuvette maintained at 37°C. The inhibition data were fit according to the Kitz-Wilson model (42), and apparent inactivation rate constants (kapp) were determined by fitting the exponential decay function, as shown in Equation 1, to the residual activity data. Ki and ki values were determined by fitting plots of inactivation rate constants versus putative inhibitor concentrations, as shown in Equation 2, by nonlinear regression using OriginPro graphing software.
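Equations 1 and 2 are cited in the inhibition kinetics description but are not reproduced in this text. As a hedged reconstruction, the Kitz-Wilson treatment is conventionally written in the form below; the symbols A_t and A_0 (residual and initial activity), t (incubation time), and [I] (inhibitor concentration) are labels assumed here and may differ from the notation of the original equations.

```latex
% Assumed standard Kitz-Wilson forms; notation is illustrative.
\begin{align}
  \frac{A_t}{A_0} &= e^{-k_{\mathrm{app}}\, t} \tag{1} \\
  k_{\mathrm{app}} &= \frac{k_i\,[I]}{K_i + [I]} \tag{2}
\end{align}
```

In this form, k_app approaches k_i at saturating inhibitor and equals k_i/2 when [I] = K_i, which is why sampling inhibitor concentrations on both sides of K_i constrains both parameters.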

Enzyme Crystallization
PbGH5A wild-type enzyme was crystallized at room temperature using the hanging drop method, with 1.8 µl of protein solution at 28 mg/ml mixed with 1.8 µl of reservoir solution (0.1 M sodium cacodylate (pH 6.3 to 7.1), 0.2 M calcium acetate, 25% PEG8K). The PbGH5A(E280A) mutant was crystallized in the same crystallization solution using serial dilution seeding with wild-type crystals. The PbGH5A·XXXG and PbGH5A(E280A)·GGGG complexes were obtained by soaking apoenzyme crystals in reservoir solution supplemented with 20 mM XXXG or 10 mM GGGG for 3.5 and 2 h, respectively. Prior to data collection, crystals were cryoprotected with Paratone-N oil and flash-frozen in liquid nitrogen.
The PbGH5A·XXXG-NHCOCH2Br and PbGH5A(E280A)·XXXGXXXG complexes were obtained through co-crystallization with the tag-less protein at a final concentration of 28 mg/ml. For the PbGH5A·XXXG-NHCOCH2Br complex, 6.6 mM EDTA was added to the protein, and the mixture was incubated at 4°C overnight. The inhibitor was then added to a final concentration of 8 mM, and the mixture was further incubated at 37°C for 3 h. For the PbGH5A(E280A)·XXXGXXXG complex, the protein solution was incubated with 2.4 mM ligand at 37°C for 3 h. The crystals were grown at room temperature using the sitting drop method, with 0.5 µl of complex solution mixed with 0.5 µl of reservoir solution: 0.2 M calcium chloride, 20% (w/v) PEG3350 for the XXXG-NHCOCH2Br complex, and 0.2 M magnesium acetate, 20% (w/v) PEG3350 for the XXXGXXXG complex. All complex crystals were cryoprotected with Paratone-N oil and flash-frozen in liquid nitrogen.

X-ray Crystal Structure Determination
Diffraction data at 100 K were collected using a Rigaku Micromax-007 HF rotating copper anode source with a Rigaku R-AXIS IV image plate detector (for the apoenzyme and PbGH5A·XXXG complexes) or using a Rigaku Micromax-007 HF rotating copper anode source with a Rigaku Saturn A200 CCD (at the Structural Genomics Consortium, for the PbGH5A(E280A)·GGGG, PbGH5A·XXXG-NHAcBr, and PbGH5A(E280A)·XXXGXXXG complexes). All x-ray data were reduced with HKL-3000 (43). First, the apoenzyme structure was determined by molecular replacement with Phenix.phaser (45), using a model of the PbGH5A sequence generated by the Phyre2 server (44) with the Paenibacillus pabuli GH5 structure as a template (PDB code 2JEQ (16)), followed by automated model building using Phenix.autobuild. The PbGH5A ligand complex structures were determined by molecular replacement using the apoenzyme structure as the search model and Phenix.phaser to obtain phasing information. All refinement was performed with Phenix.refine with manual editing in Coot (46). During refinement, B-factors were defined as anisotropic for all non-hydrogen atoms, and TLS parameterization was utilized. The final atomic model of the structures included chain A residues 7-352 and chain B residues 6-352. Average B-factor and bond angle/length root mean square deviation (r.m.s.d.) values were calculated using Phenix.b_factor_statistics. All geometry was verified using the Phenix Molprobity and Coot validation tools plus the wwPDB Deposition server. The data collection and refinement statistics are listed in Table 2.

Phylogenetic Analysis
Sequences from glycoside hydrolase family 5 (11) from bacteria were fetched from the CAZy Database (10) and aligned with MAFFT (47). The distances between sequences were calculated by FastTree (48) with this multiple sequence alignment. The resulting tree was displayed with Dendroscope (49).

Polysaccharide Kinetics
In light of the diverse specificities observed in GH5, we tested recombinant PbGH5A for activity on a library of linear β-glycan polysaccharides, including HEC, CMC, PASC, tXyG, kGM, and bMLG. As anticipated from its membership in GH5 subfamily 4, significant activity toward tamarind tXyG (kcat = 6800 min⁻¹, Km = 1.1 mg/ml; Table 1) was observed at the pH optimum of 4.5-5.5 (Fig. 2A) and at 37°C. However, our kinetic analysis revealed that PbGH5A is significantly more selective for barley mixed-linkage glucan, with a kcat of almost 3.5 × 10⁴ min⁻¹ and Km of 0.12 mg/ml (Fig. 4 and Table 1). PbGH5A was poorly active on the synthetic soluble cellulose mimics CMC and HEC, perhaps reflecting detrimental interactions of the pendant groups in the active-site cleft. Interestingly, PASC was a worse substrate for PbGH5A than CMC (Fig. 4). Low activity was also measured for the hydrolysis of kGM, which suggested some tolerance of β(1,4)-linked mannosyl residues in the polysaccharide backbone. Although very poor activity was observed for PbGH5A acting on xylans, no activity was observed with either galactomannan polysaccharide or mannohexaose, confirming the glucan specificity of the enzyme.
The pH-rate profile of PbGH5A was affected by the substrate used for its determination (Fig. 2). The pH-rate profile of PbGH5A with XXXG-CNP gave a pH optimum of 5 with kinetic pKa values of 3.5 and 6.5; however, the pH-rate profile with MLG demonstrated different kinetic pKa values depending on the buffer used. The activity-temperature profile of PbGH5A on bMLG substrate indicates that the enzyme has limited activity enhancement above 37°C (Fig. 3A); hence, kinetic measurements were routinely performed at 37°C in pH 5.5 buffer. The enzyme is stable below 45°C (Fig. 3B) but exhibits rapid (t1/2 = 15 min) and permanent inactivation at elevated temperatures.
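To give a feel for what a 15-min half-life means in practice, the sketch below converts it into a rate constant and a residual-activity curve. It assumes simple first-order inactivation, which is an assumption made here for illustration; the text reports only the half-life, and the function names are hypothetical.

```python
import math

# Sketch: residual activity under ASSUMED first-order thermal inactivation.
# Only the 15-min half-life comes from the text; the decay model is an
# illustrative assumption.

def inactivation_rate_constant(t_half_min):
    """First-order rate constant (min^-1) from a half-life."""
    return math.log(2) / t_half_min

def residual_fraction(t_min, t_half_min):
    """Fraction of activity remaining after t_min."""
    return math.exp(-inactivation_rate_constant(t_half_min) * t_min)

T_HALF_MIN = 15.0
k_inact = inactivation_rate_constant(T_HALF_MIN)
for t in (15, 30, 60):
    print(f"{t:>3} min: {100 * residual_fraction(t, T_HALF_MIN):.1f}% remaining")
```

Under this assumption, a 30-min assay at an inactivating temperature would retain only about a quarter of the starting activity by its end, which is one reason to keep routine measurements at 37°C.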

Polysaccharide Hydrolysis Product Distributions
Analysis of the limit-digestion products was subsequently performed to determine the cleavage specificity of PbGH5A. HPAEC-PAD analysis of the initial digest of bMLG (Fig. 5A) revealed three peaks with short retention times, corresponding primarily to cellotriose and cellotetraose with a small amount of cellobiose. When allowed to run significantly longer, the limit digestion of bMLG gave glucose, cellobiose, and cellotriose (data not shown). Interestingly, a small number of peaks with longer retention times were also generated but not further degraded in longer incubations. The major late-eluting peak was determined to be G3GGG based on retention time and standard addition; the other peaks were not identified (see "Substrates and Inhibitors" under the "Experimental Procedures" for oligosaccharide nomenclature).
The presence of more than the canonical four peaks corresponding to XXXG, XLXG, XXLG, and XLLG (ratio ~13:9:28:50 (18,50)) in the limit digest of tamarind tXyG (Fig. 5B) indicates that the enzyme is able to cut at sites other than the unbranched glucosyl residues. Indeed, MALDI-MS analysis of the digest revealed the presence of fragments with masses corresponding to XLLGX and XLG/LXG, which confirmed an alternate cleavage mode in which xylosyl-branched glucosyl units bind in the −1 and +1 subsites (active-site nomenclature according to Ref. 51).

Chromogenic Substrate Kinetics
To map the negative enzyme subsites and determine their specific contributions to catalysis, we employed a series of initial-rate kinetic experiments measuring the release of the aglycone from the CNP and PNP β-glycosides of glucose (G), cellobiose (GG), cellotriose (GGG), cellotetraose (GGGG), and the xyloglucan heptasaccharide XXXG. Hydrolysis of G-CNP was undetectable, and only weak activity (kcat/Km = 670 M⁻¹ s⁻¹) was observed with GG-CNP (Fig. 6 and Table 1). GGG-CNP was a significantly better substrate (kcat/Km = 1.01 × 10⁵ M⁻¹ s⁻¹), thus indicating a significant contribution to catalysis due to binding of the additional Glc residue in a −3 subsite; a similar trend was observed for the PNP congeners (Table 1). The specificity constant for GGGG-CNP hydrolysis (kcat/Km = 1.94 × 10⁵ M⁻¹ s⁻¹) was only 2-fold higher than that of GGG-CNP, suggesting little to no contribution from a −4 subsite. In keeping with the poorer leaving-group ability of the aglycone (31,52), GG-PNP and GGG-PNP were hydrolyzed significantly more slowly than the CNP congeners. Comparison of the kinetic constants for XXXG-CNP (Xyl3Glc4-CNP, composed of a GGGG backbone) with GGGG-CNP revealed a similar kcat value but a significantly (10-fold) lower Km value, yielding a corresponding increase in specificity constant (kcat/Km = 2.53 × 10⁶ M⁻¹ s⁻¹, Table 1) for the branched substrate. However, the observation of significant substrate inhibition and deviation from classical Michaelis-Menten kinetics with XXXG-CNP (Fig. 6) suggests that caution is warranted in interpreting the apparent positive effects of xylosyl branches in the negative subsites.
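The subsite argument above rests on ratios of specificity constants. The sketch below recomputes those fold differences from the kcat/Km values quoted in the text and, as an additional step not performed in the text, converts each ratio into an apparent transition-state stabilization energy using ΔΔG‡ = −RT ln(ratio) at 37°C. That energy conversion is a common textbook treatment applied here as an assumption, and the dictionary and function names are hypothetical.

```python
import math

# Sketch: fold differences in kcat/Km between chromogenic substrates and
# the corresponding apparent ddG (kJ/mol). The energy conversion is an
# illustrative assumption, not an analysis reported in the text.

R_J_PER_MOL_K = 8.314
T_KELVIN = 310.15  # 37 degrees C

# kcat/Km values (M^-1 s^-1) as quoted in the text
kcat_km = {
    "GG-CNP": 670.0,
    "GGG-CNP": 1.01e5,
    "GGGG-CNP": 1.94e5,
    "XXXG-CNP": 2.53e6,
}

def fold_change(longer, shorter):
    """Ratio of specificity constants for two substrates."""
    return kcat_km[longer] / kcat_km[shorter]

def apparent_ddg_kj(longer, shorter):
    """Apparent transition-state stabilization, -RT ln(ratio), in kJ/mol."""
    return -R_J_PER_MOL_K * T_KELVIN * math.log(fold_change(longer, shorter)) / 1000.0

for longer, shorter in [("GGG-CNP", "GG-CNP"),
                        ("GGGG-CNP", "GGG-CNP"),
                        ("XXXG-CNP", "GGGG-CNP")]:
    print(f"{longer} vs {shorter}: {fold_change(longer, shorter):.1f}-fold, "
          f"ddG ~ {apparent_ddg_kj(longer, shorter):.1f} kJ/mol")
```

The XXXG-CNP comparison should be read with the same caution as in the text, because substrate inhibition makes its apparent kcat/Km less reliable than the values for the linear substrates.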

Native Oligosaccharide Kinetics
To gain insight into the contribution of the positive subsites to substrate binding and catalysis, we determined the initial-rate kinetics of PbGH5A on cello-oligosaccharides, mixed-linkage β(1,3)/β(1,4)-glucan oligosaccharides, and xyloglucan oligosaccharides, using an HPLC-based assay. No activity was observed with laminaribiose (G3G), cellobiose (GG), or cellotriose (GGG), suggesting that PbGH5A requires the occupancy of at least four subsites for initiation of the glycosidic bond cleavage. Indeed, cellotetraose (GGGG) was readily hydrolyzed through two modes, one yielding two molecules of cellobiose (2 × GG), and one yielding glucose (G) plus cellotriose (GGG); Michaelis-Menten analysis revealed that the symmetric cleavage mode was favored by a 7-fold greater kcat/Km value (Fig. 7 and Table 1). Notably, cellohexaose (GGGGGG) was degraded to GG, GGG, and GGGG with similar kinetic constants to GGGGG, which was exclusively converted to cellotriose (GGG) and cellobiose (GG), with a kcat/Km value 130-fold higher than that for symmetric cleavage of cellotetraose (Figs. 7 and 8 and Table 1). Exclusive isotopic labeling of the product cellotriose (GGG) in H2¹⁸O revealed that recognition across the −3 → +2 subsites was responsible for this cleavage mode (Fig. 9). Specifically, the M + 2 peak of cellobiose did not increase in relative intensity above the natural abundance, whereas the intensity of the M + 2 peak of cellotriose indicated 74% ¹⁸O labeling (theoretical, 85%).
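The theoretical 85% figure follows from the reaction volumes given under "Experimental Procedures" (22 µl of 97% [¹⁸O]water diluted by three 1-µl aqueous additions). The short sketch below reproduces that estimate and expresses the observed 74% labeling relative to it; variable names are hypothetical, and the calculation assumes the added solutions contribute only unlabeled water.

```python
# Sketch: theoretical 18O enrichment of the labeling reaction and the
# observed labeling relative to it. Assumes the three 1-ul additions
# (enzyme, buffer, substrate) are unlabeled water.

h2_18o_vol_ul = 22.0
h2_18o_enrichment = 0.97
unlabeled_vols_ul = [1.0, 1.0, 1.0]  # enzyme, buffer, cellopentaose

total_vol_ul = h2_18o_vol_ul + sum(unlabeled_vols_ul)
theoretical_enrichment = h2_18o_vol_ul * h2_18o_enrichment / total_vol_ul

observed_labeling = 0.74  # cellotriose M + 2 labeling quoted in the text
relative_labeling = observed_labeling / theoretical_enrichment

print(f"theoretical 18O enrichment: {100 * theoretical_enrichment:.1f}%")
print(f"observed / theoretical:     {100 * relative_labeling:.0f}%")
```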
Turning our attention to mixed-linkage β(1,3)/β(1,4)-glucan oligosaccharides, we observed that G3GGG was not hydrolyzed, which suggested that β(1,3) bonds are not tolerated between the first three negative subsites. In contrast, GG3GG was a competent substrate, yielding cellobiose as the only product (Fig. 7 and Table 1). This recapitulated the rejection of β(1,3) bonds between the negative subsites, and furthermore, it highlighted the importance of −2 subsite binding. The specificity constant of the GG3GG degradation is only ~1.5-fold lower than that of cellotetraose (Table 1), which indicated a lack of selectivity for β(1,3) or β(1,4) bonds in the cleavage site. Interestingly, GGG3G was hydrolyzed via two modes, in which the production of cellotriose (GGG) plus glucose, via binding in the −3 → +1 subsites and cleavage of the β(1,3)-linkage, was favored by a factor of 5 in kcat/Km values over the production of cellobiose (GG) plus laminaribiose (G3G), via binding in the −2 → +2 subsites and cleavage of the β(1,4)-linkage (Fig. 7 and Table 1). The extended mixed-linkage heptasaccharide G3GGG3GGG was hydrolyzed most rapidly of all the substrates tested, with a kcat/Km value exceeding that of cellopentaose or cellohexaose by 2-fold, to give only cellotriose (GGG) and G3GGG as products (Fig. 7 and Table 1).

Inhibition and Covalent Labeling with an Active-site-directed Inhibitor
We have previously introduced N-bromoacetylglycosylamines and bromoketone C-glycoside derivatives of xyloglucan oligosaccharides as specific active-site affinity labels for endo-xyloglucanases (Fig. 10) (32). Incubation of PbGH5A with the N-bromoacetylglycosylamine derivative of XXXG (XXXG-NHCOCH2Br, compound 1) led to a rapid time- and concentration-dependent inactivation (Fig. 10), with a dissociation constant, Ki, of 0.63 ± 0.03 mM, and an irreversible inactivation constant, ki, of 0.0364 ± 0.0006 min⁻¹ (ki/Ki = 0.06 mM⁻¹ min⁻¹). Intact protein MS after a 3-h incubation of PbGH5A with compound 1 at 1.4 mM and 37°C revealed exclusive single labeling of the enzyme (Fig. 11). The bromoketone C-glycoside isostere (XXXG-CH2COCH2Br, compound 2) was a less potent, but nonetheless effective, inhibitor of PbGH5A, with a 3-fold lower ki value and 2.5-fold lower Ki (Ki = 0.27 ± 0.07 mM, ki = 0.0113 ± 0.0008 min⁻¹, and ki/Ki = 0.04 mM⁻¹ min⁻¹). Intact protein MS of PbGH5A under conditions similar to those giving essentially complete inactivation (7.3 µM PbGH5A, 1.4 mM inhibitor compound 2, 3-h incubation at 37°C) indicated near-complete labeling of the enzyme, also at 1:1 stoichiometry (Fig. 11).
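The ki/Ki ratios quoted above can be recomputed directly from the reported constants. The sketch below does so and also evaluates the Kitz-Wilson expression kapp = ki[I]/(Ki + [I]) together with the limiting inactivation half-time at saturating inhibitor (ln 2/ki). The function and dictionary names are hypothetical, and the half-time is an extrapolation from the fitted constants rather than a value reported in the text.

```python
import math

# Sketch: second-order inactivation constants and Kitz-Wilson k_app from
# the reported Ki (mM) and ki (min^-1). Half-times are model
# extrapolations, not reported measurements.

inhibitors = {
    "compound 1 (XXXG-NHCOCH2Br)": {"Ki_mM": 0.63, "ki_per_min": 0.0364},
    "compound 2 (XXXG-CH2COCH2Br)": {"Ki_mM": 0.27, "ki_per_min": 0.0113},
}

def k_app(conc_mM, Ki_mM, ki_per_min):
    """Apparent inactivation rate constant at a given inhibitor concentration."""
    return ki_per_min * conc_mM / (Ki_mM + conc_mM)

def second_order_constant(Ki_mM, ki_per_min):
    """ki/Ki in mM^-1 min^-1."""
    return ki_per_min / Ki_mM

for name, p in inhibitors.items():
    ratio = second_order_constant(p["Ki_mM"], p["ki_per_min"])
    t_half_sat = math.log(2) / p["ki_per_min"]
    print(f"{name}: ki/Ki = {ratio:.2f} mM^-1 min^-1, "
          f"limiting half-time ~ {t_half_sat:.0f} min")
```

Comparing ki/Ki rather than Ki alone explains why compound 2 is described as less potent despite its lower Ki: its lower ki outweighs the tighter binding.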

Structural Characterization of PbGH5A Variants in the Apoform and in Complexes with Oligosaccharides
To provide molecular level insight into substrate recognition by PbGH5A, we determined the crystal structure of this protein to 1.65 Å resolution (PDB code 3VDH). We also obtained high resolution (1.6-1.9 Å) structures of enzyme variants with four different ligands (Table 2). The complex structures of the catalytically inactive PbGH5A(E280A) site-directed mutant with the tetradecasaccharide substrate XXXGXXXG (PDB code 5D9M) and that of the wild-type enzyme with the covalent inhibitor XXXG-NHCOCH2Br (compound 1, PDB code 5D9P) contained clear electron density corresponding to ligand molecules that spanned the length of the active-site cleft for both complexes (Fig. 12). Together, these complexes provide the most complete view of enzyme-substrate interactions across the entire active site of a GH5 member to date. In the complex structures between the wild-type enzyme and heptasaccharide XXXG (PDB code 5D9N) and between the E280A variant and the linear glucan cellotetraose GGGG (PDB code 5D9O), the respective ligands occupied the positive subsites of the PbGH5A active site, thereby providing a unique opportunity to directly compare binding for branched and unbranched ligands in the GH5_4 subfamily.
Overall Structure of PbGH5A-The PbGH5A apoenzyme structure was determined by molecular replacement using the structure of P. pabuli GH5 (PDB code 2JEQ) as a search model. The asymmetric unit contained two polypeptide chains corresponding to PbGH5A residues 7-352. According to analytical gel filtration analysis (data not shown), the PbGH5A protein predominantly exists as a monomer in solution, suggesting that intermolecular contacts observed in the crystal structure are most probably a result of crystal packing. The overall structure of PbGH5A is a (β/α)8-barrel fold typical of the GH5 family (Fig. 12A). Structural comparison using the Dali server (53) identified other characterized GH5 subfamily 4 enzymes as the closest structural homologues of PbGH5A. The best match was the structure of endoglucanase A from Piromyces rhizinflata (Table 3).
The active site in the homologous enzymes is located in a large solvent-accessible cavity formed by loop regions at the top of the barrel. As inferred from comparison of primary and tertiary structures of GH5 members, Glu-280 and Glu-162 are the catalytic active-site nucleophile and general acid/base, respectively, in PbGH5A (55)(56)(57). Accordingly, the mutation of Glu-280 to alanine resulted in a >18,000-fold reduction in activity (below the limit of detection) compared with the wild type on both chromogenic substrates XXXG-CNP and GGG-CNP. Hence, the catalytically inactive PbGH5A(E280A) variant was used for co-crystallization with tetradecasaccharide and heptasaccharide substrates.
Non-covalent Complexes of PbGH5A Variants with Branched and Unbranched Ligands-The PbGH5A(E280A) variant in complex with tetradecasaccharide substrate exhibited unambiguous, well ordered electron density corresponding to a single XXXGXXXG molecule spanning the active sites of both monomers found in the asymmetric unit (Fig. 12B); one-half of the substrate occupied the positive subsites of one monomer (monomer A), and the other half localized to the negative subsites of the second monomer (monomer B). In addition, the negative subsites of monomer A featured density corresponding to the XXXG moiety of the second substrate molecule, the rest of which was apparently disordered in the solvent channel. The positive subsites of monomer B did not contain any additional electron density.
The conformation of the XXXGXXXG substrate molecule in the negative subsites of both PbGH5A(E280A) monomers was well defined and virtually identical. Detailed analysis of interactions between PbGH5A(E280A) monomers and the substrate molecule showed that subsite −1 of the enzyme forms by far the most direct interactions with the glucosyl moieties of the substrate (Fig. 13A). More specifically, this subsite was occupied by the α-anomer of the reducing-end glucose unit, with the C1-hydroxyl forming a hydrogen bond with the side chain of Tyr-240 (Fig. 14). This glucose unit is also positioned by a stacking interaction with Trp-324, and by hydrogen bonds between the C2-, C3-, and C6-hydroxyls and the side chains of Asn-161, His-112, and Asp-288, respectively. It is interesting to note that the C1-carbon atom itself is located 3.1 Å away from the C4-hydroxyl of the xylosyl-branched glucosyl unit bound in the +1 subsite, implying that little movement would be necessary to bring the two ligands within the distance required for formation of an intact β-1,4 bond in the reverse reaction. These observations suggested that the PbGH5A(E280A)·XXXGXXXG complex structure represents a good proxy for wild-type enzyme-product interactions, despite having been generated from a catalytically inactive enzyme variant.
The −2 subsite is formed by the side chains of Asn-28 and Asp-288, which hydrogen bond to the C2- and C3-hydroxyls of the corresponding glucosyl residue, respectively (Fig. 14). Binding of the −2′ xylosyl unit is water-mediated, with the exception of a hydrogen bond between the C4-hydroxyl and the backbone of Ala-117. In subsite −3, the main interaction is stacking of the glucosyl residue against Trp-48, whereas in subsite −4, the stacking is against Phe-47. These structural observations (see also the inhibitor complex below) are in line with the kinetic analysis presented above, which likewise suggests the existence of a total of four negative subsites in PbGH5A.
As mentioned above, the PbGH5A(E280A)·XXXGXXXG complex features one substrate molecule extending from the positive subsites of one monomer into the negative subsites of another monomer in the asymmetric unit. In the positive subsites, the substrate binding maps from the +1 subsite, in which the glucosyl moiety is closely stacked against Trp-170 (Fig. 13B). The glucosyl moiety forms a hydrogen bond with the side chain of Glu-162 via the C4-hydroxyl, although the other hydroxyls are solvent-coordinated. The xylosyl unit in the +1 subsite occupies the space between the active site cleft and the glucan backbone and forms an intricate network of hydrogen bonds with the protein, making use of all the available hydroxyls of the sugar moiety. The glucosyl unit in the +2 subsite stacks against Trp-243, and its C2-hydroxyl forms hydrogen bonds with the side chain of Asp-171. The xylosyl unit in the +2 position is bound away from the active site cleft and is completely solvent-exposed, as are the ligand units in the +3 and +4 positions.
In the wild-type PbGH5A·XXXG heptasaccharide complex, clear electron density corresponding to all seven monosaccharide residues bound in the positive subsites was observed (Fig. 13C). Although the general orientation of the ligand molecule in this complex structure is similar to that observed in the PbGH5A(E280A)·XXXGXXXG complex, it nevertheless has a distinct conformation in which the glucan backbone of the substrate is rotated ~15° away from the enzyme catalytic center (Fig. 15B). This alternative conformation of the ligand results in a more parallel binding along the bottom of the active site cleft, and it is observed for both PbGH5A monomers found in the asymmetric unit. Because of this shift, a unique set of interactions is formed between the enzyme and the ligand as compared with the tetradecasaccharide complex structure (Fig. 15B). The interactions conserved between the two structures include Trp-170 and Trp-243 stacking with the ligand in the +1 and +2 subsites, respectively, and the hydrogen bond between Lys-214 and the C4-hydroxyl of the +1′ xylosyl residue. The latter moiety forms four unique hydrogen bond interactions not observed in the complex with XXXGXXXG, demonstrating that although different, the recognition of XXXG is as elaborate as for the tetradecasaccharide. The xylosyl units for the rest of the ligand are completely solvent-exposed. The presence of available hydrogen bonding partners in the PbGH5A active site, and lack of conserved interactions with the side chain xylose moieties in the two complexes, implies that ligand binding is dominated by overall accommodation of the xyloglucan polymer rather than specific recognition of the pendant groups.
As observed for the complexes with branched ligands, the PbGH5A(E280A)·cellotetraose (GGGG) complex contained the ligand bound in the positive subsites, although clear electron density was present only for glucosyl residues +1 to +3 (Fig. 13D). In contrast to the xylogluco-oligosaccharide-bound structures, the binding conformation of the cellotetraose ligand is dramatically different. In particular, the glucosyl residue in the +1 subsite in the cellotetraose complex is flipped ~180° about the C1-C4 axis vis à vis the XXXG complex (Fig. 15C). In this orientation, the glucan backbone is pushed deeper into the core of the protein and results in all four glucosyl moieties of cellotetraose forming direct interactions with the protein. In the +2 subsite, the C6-hydroxyl of the glucosyl moiety occupies the space equivalent to the +1′-xylosyl residue of the branched ligands and forms hydrogen bonds with the side chain of Asp-
Active-site Affinity-labeled Complex of PbGH5A-In keeping with the anticipated reactivity of the electrophilic affinity label XXXG-NHCOCH2Br, the XXXG-NHCOCH2- moiety was observed in the negative subsites, covalently bound to the side chain oxygen of the catalytic acid/base, Glu-162, via displacement of the bromide nucleofuge (Fig. 16A). The specific labeling of Glu-162 was consistent with the observation by MS of a single protein·inhibitor covalent complex of both the wild-type and the E280A catalytic nucleophile mutant in solution (Fig. 11). Likewise, labeling of the general acid/base residue in a cellulase by a homologous N-bromoacetylcellobiosylamine has previously been observed (58). As with the PbGH5A(E280A)·XXXGXXXG complex structure, clear electron density for the entire ligand indicates well ordered binding. The position of the oligosaccharide moiety of the label in the negative subsites superimposes remarkably well with that of XXXGXXXG in the corresponding complex structure (Fig. 15A).
This observation confirmed that these inhibitors retain their full ability to interact with the negative subsites of PbGH5A and are thus accurate substrate mimics. The key difference between these structures results from accommodation of the inhibitor's "handle" in the −1 subsite. Here, the strictly conserved His-328 orients the amide moiety through a hydrogen bond, whereas the active site nucleophile, Glu-280, forms a hydrogen bond with the C2-hydroxyl of the glucose moiety (Fig. 16A).
An unexpected second inhibitor molecule occupied the positive subsites of the enzyme in crystallo. This inhibitor molecule was covalently linked to Met-62 of a neighboring enzyme molecule within the crystal packing, suggesting that in this orientation the terminal group of the inhibitor was solvent-exposed and suitably poised to react with the nucleophilic thioether. Multiple labeling of PbGH5A in the presence of this inhibitor in solution was not observed by MS, indicating that this was a fortuitous event, prompted by the particular crystal packing. Notably, this result follows a previous observation made by Black et al. (59), who suggested that nonspecific labeling could occur at solvent-exposed methionine residues.
This covalent pinning via the N-acetyl moiety resulted in well ordered binding for all four glucosyl units of the second inhibitor moiety (Fig. 16B). The oligosaccharide portion is oriented similarly to that in the PbGH5A·XXXG and PbGH5A(E280A)·XXXGXXXG complexes and is likewise distinct from the PbGH5A(E280A)·cellotetraose complex. Subtle differences in the conformation of XXXG-NHCOCH2- in the positive subsites vis à vis XXXG and XXXGXXXG again suggest that the recognition of the XXXG motif is plastic. Here, the hydrogen-bonding network represents a mixture of that observed for these other branched complexes. The xylosyl residue in the +1′ subsite retains the hydrogen bond with Lys-214 and forms an additional hydrogen bond with the side chain of Asp-241 (Fig. 16B). Similar to the XXXG complex, the +1′ xylosyl residue of the inhibitor forms a hydrogen bond with the backbone of Ala-212, yet similar to the XXXGXXXG complex, the same moiety forms a hydrogen bond to a hydroxyl of the +2 glucosyl residue. The xylosyl unit in the +3′ position is hydrogen-bonded to the main chain of Trp-243 and the side chain of Asp-241, reminiscent of glucose binding in the +3 subsite of the cellotetraose complex. The remainder of the interactions are entirely water-mediated.
In summary, the comparative analysis of PbGH5A·ligand complex structures revealed a striking variation in ligand-protein interactions within the positive but not the negative subsites. This, however, does not translate into conformational changes in the protein structure itself. The enzyme backbones in all four complex structures superimposed with an average r.m.s.d. of 0.3 Å for the protein Cα atoms. The binding of the various oligosaccharides in approximately the same location but with a drastically different orientation therefore points to the great versatility of the PbGH5A active site with respect to accommodation of different ligands within the positive subsites (Fig. 15A).

Discussion
The combination of detailed kinetic analysis together with new insight brought by novel crystallographic complexes of PbGH5A provides a unique opportunity to explore key enzyme-substrate interactions that define substrate specificity within GH5_4 and to further elucidate the roles of this subfamily in glucan catabolism.
Examination of the overall shape of the active site of these enzymes reveals a substantially shallower and narrower cleft of the predominant mixed-linkage endo-β-glucanases versus the predominant endo-xyloglucanases (Fig. 17A). Quantitation of this difference using the CASTp server indicated that both the active site surface area and volume are greater by approximately one-third for the former enzymes (Table 3 (Fig. 17). There is a great variability in the overall conformation of these loops compared with the other discussed GH5_4 enzymes; however, key features are conserved. The conserved regions encompass functionally equivalent residues participating in key protein-ligand interactions, which are generally found at the base of the loops and include catalytic residues Glu-162 and Glu-280, stacking residues Trp-48, Tyr-240, Trp-243, and Trp-324, and hydrogen bonding partners His-112 and Asn-161.
The constriction at the top of the catalytic center in PbGH5A is mainly formed by the loop residues 280-295, of which Asp-288 forms a direct interaction with the glucosyl moiety bound in the −1 subsite. This feature, which is conserved in the bMLG-specific enzymes (Fig. 17B), is absent in the xyloglucanases (Fig. 17C). At the bottom of the active center, the shallow pocket of PbGH5A is formed in part by a well conserved histidine residue, His-113. Here, the additional depth of the xyloglucan-specific enzymes is due to the presence of a bulky aromatic residue found in the −2 subsite and responsible for stacking of the −2′-xylosyl (Fig. 17, A and C). The distinct conservation in this region has been observed previously and reported as a potential signature motif (16). The active-site pocket widens beyond the −1 and +1 subsites, and although the branching residues of xyloglucan saccharides can be accommodated by the protein here, there are no obvious pockets that appear to be specifically tailored for this purpose. In the negative subsites, it is the glucan backbone that is intricately bound via stacking and hydrogen-bonding interactions, whereas the majority of the xylosyl moieties are solvent-exposed. This is distinct from the specific endo-xyloglucanases of GH5_4, in which aromatic residues have been identified to provide binding platforms for −2′ and −3′ xylosyl residues (Fig. 17, B and C) (16). Likewise in the positive subsites, PbGH5A appears to accommodate, rather than specifically harness, branched oligosaccharide in an open cleft. It is particularly striking that the glucan backbones of cellotetraose (GGGG) and its triple-xylosylated congener XXXG are bound with different trajectories through the positive subsite region (Fig. 15), which again implies significant flexibility in substrate binding.
In this context, it is notable that PbGH5A hydrolyzes xyloglucans at non-canonical backbone cleavage sites. Of the GH5 endo-xyloglucanases characterized to date, all cleave the dicot xyloglucan polysaccharide (exemplified by Tamarindus indica xyloglucan) at the unbranched backbone glucosyl unit (Fig. 1) to generate oligosaccharides based on a Glc4 backbone (16,17,64). This cleavage pattern is also typical for GH7, GH9, GH12, and GH16 members, with known exceptions of certain GH44 and GH74 members (16, 18, 66-71). Although the heptasaccharide XXXG was not hydrolyzed in the presence of high enzyme concentrations (0.1 mg/ml of PbGH5A), the limit-digestion products of tamarind xyloglucan hydrolysis contained oligosaccharides consistent with cleavage via binding "X" ([Xylα(1,6)]Glc-) units at subsite −1 (Fig. 15). Initial-rate kinetic analysis of the hydrolysis of the tetradecasaccharide XXXGXXXG revealed that cleavage of this substrate at the internal unbranched glucosyl residue predominated, although it was slow (kcat = 422 min−1, Km = 32 μM, Table 1). However, analysis of the limit-digest (data not shown) showed alternative cleavage modes resulting in the formation of XXG and XXXGX. Taken together, the data indicate that glucan chain branching is generally not well tolerated at the cleavage site due to constriction at subsites −1/+1 (Fig. 17A), although an overall lack of specificity for xyloglucan motifs allows variable substrate positioning in the active-site cleft.
Active Site of PbGH5A Comprises Seven Subsites in Total-Mapping the PbGH5A active site using chromogenic and native substrates, together with crystallographic analysis of enzyme·oligosaccharide complexes, suggests the presence of seven well defined subsites, four negative subsites and three positive subsites, in an open active-site cleft. Indeed, the highest activity was observed for a mixed-linkage heptasaccharide, G3GGG3GGG (closely followed by cellopentaose and cellohexaose), whereas unbranched tetrasaccharides represent the smallest competent naturally occurring substrates for PbGH5A (Table 1). The mode of hydrolysis of the minimal substrate cellotetraose defined the smallest subset of subsites utilized for activity on linear glucans, with the −2→+2 binding/hydrolysis mode significantly favored over −3→+1. When two positive subsites are occupied, the importance of the −3 subsite contribution is emphasized by the 130-fold increase in the kcat/Km value of the −3→+2 binding/hydrolysis mode for cellopentaose versus the −2→+2 mode (Table 1). An essentially identical increase in kcat/Km values for release of the aglycones from GGG-CNP versus GG-CNP and GGG-PNP versus GG-PNP was observed. Collectively, these data indicate that binding in the −3 subsite of PbGH5A contributes a ΔΔG of −12 kJ/mol to catalysis. Comparison of the −3→+1 binding/hydrolysis mode for cellotetraose with the −3→+2 binding/hydrolysis mode for cellopentaose reveals that binding in the +2 subsite contributes −17 kJ/mol to catalysis. As such, interactions in the +2 subsite (stacking with Trp-243 and hydrogen bonding with Asp-171) make a significantly greater contribution to catalysis than the interactions in the −3 subsite (stacking with Trp-48).
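The subsite free-energy contributions quoted above follow from the standard relation between a ratio of specificity constants and transition-state stabilization, ΔΔG = −RT ln[(kcat/Km)A/(kcat/Km)B]. The short sketch below works through that arithmetic. The temperature of 298.15 K is an assumption made here for illustration (the assay temperature is not restated in this passage), and the function names are hypothetical.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def ddg_from_fold_change(fold_change, temp_k=298.15):
    """Free-energy contribution (kJ/mol) implied by a ratio of kcat/Km values.

    ASSUMPTION: temp_k defaults to 298.15 K; adjust to the actual assay temperature.
    """
    return -R_KJ_PER_MOL_K * temp_k * math.log(fold_change)


def fold_change_from_ddg(ddg_kj_per_mol, temp_k=298.15):
    """Inverse relation: ratio of kcat/Km values implied by a given free-energy term."""
    return math.exp(-ddg_kj_per_mol / (R_KJ_PER_MOL_K * temp_k))


if __name__ == "__main__":
    # 130-fold gain in kcat/Km on engaging the -3 subsite (value from the text).
    print(f"-3 subsite: {ddg_from_fold_change(130):.1f} kJ/mol")
    # Fold change corresponding to the -17 kJ/mol quoted for the +2 subsite.
    print(f"+2 subsite: ~{fold_change_from_ddg(-17):.0f}-fold")
```

At the assumed temperature, a 130-fold ratio corresponds to roughly −12 kJ/mol, and −17 kJ/mol corresponds to a ratio of several hundred-fold, in line with the values given for the −3 and +2 subsites.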
Moving beyond the five core subsites spanning −3→+2, crystallographic complexes provide compelling evidence for an additional negative subsite that may explain the slight kinetic enhancement observed for the catalysis of GGGGGG → GGGG + GG over GGGGGG → GGG + GGG (Fig. 15 and Table 1). Specifically, the non-covalent PbGH5A(E280A)·XXXGXXXG structure (Fig. 13A) and the affinity-labeled PbGH5A·XXXG-NHCOCH2- structure (Fig. 12A) reveal a glucose-phenylalanine stacking interaction constituting subsite −4 (Fig. 13A). In contrast, differential binding of XXXG and GGGG ligands in the positive subsites makes clear definition of an additional positive subsite, +3, difficult. The apparent length of the active-site cleft (Fig. 15) and well ordered electron density for the +3 glucose residue in all ligands (Fig. 13, B-D) imply that a binding surface may exist, although the breadth of the cleft at this point is not sufficient to restrict the backbones of all ligands to lie on the same trajectory. The absence of a +4 subsite is less ambiguous (Fig. 15). Unfortunately, a lack of a sufficient diversity of higher oligosaccharide substrates precludes detailed kinetic dissection of these more distal subsites; yet the observation that the heptasaccharide G3GGG3GGG is cleaved exclusively through a −4→+3 binding/hydrolysis mode (Table 1) strongly supports the definition of seven subsites (Fig. 14).
PbGH5A Exhibits Subtle Discrimination of β-Glucan Linkage Regiochemistry in the Active Site-Kinetic analysis of the hydrolysis of mixed-linkage oligoglucosides revealed that PbGH5A was essentially equally competent at hydrolyzing β(1,3)- and β(1,4)-linkages at the catalytic center but demonstrated differential preference for these linkages in the positive and negative subsites. The tetrasaccharides GGGG (cellotetraose), GG3GG, and GGG3G are all hydrolyzed with similar kcat/Km values for the −2→+2 binding/hydrolysis mode, yet G3GGG was not cleaved by PbGH5A (Table 1).
The kcat/Km value of GGGG is only 1.5-fold greater than that of GG3GG (−2→+2 binding/hydrolysis mode). This effective lack of linkage specificity can be rationalized in light of the oligosaccharide orientation in the positive subsites of the GGGG and XXXG heptasaccharide ligand complexes. The dramatic difference in the binding mode of these two ligands results in the close superposition of the C3-hydroxyl of cellotetraose and the C4-hydroxyl of XXXG due to a 180° rotation of the glucan backbone (Fig. 15C). Assuming that these structures reflect both the EP (enzyme/product) and corresponding ES (Michaelis) complexes, both C3- and C4-hydroxyl moieties can be suitably positioned as nucleofuges at the catalytic center. Furthermore, the apparent breadth of the active site toward the positive subsites readily accommodates the different binding orientations distal to the catalytic center required for longer β-glucan substrates (Fig. 15). Here, a substantial number of ordered water molecules are present (data not shown), which can potentially be displaced variably during the binding of alternative substrates.
Further reflecting an ambivalence to linkage regiochemistry at the catalytic center, both GGGG and GGG3G were cleaved via the −3→+1 binding/hydrolysis mode to produce cellotriose (GGG) and glucose (G). However, the loss of +2 subsite binding and gain of −3 subsite binding had a large negative effect on the kcat/Km value of GGGG (7.7-fold lower than that of the −2→+2 mode). In contrast, the kcat/Km value for GGG3G hydrolyzed via the −3→+1 mode is increased 5.2-fold versus the −2→+2 mode. The results highlight the delicate balance between the contributions of subsite binding and glycosidic bond specificity to catalysis. Although it is difficult to fully disentangle these competing effects given the available kinetic and structural data, it is clear that +2 subsite binding is particularly important for catalysis of all-β(1,4)-linked substrates; based on kcat/Km values (Table 1), cellopentaose is hydrolyzed nearly 900-fold better in the −3→+2 mode (the exclusive hydrolysis mode) than cellotetraose is hydrolyzed in the −3→+1 mode (which, again, is 7.7-fold poorer than in the −2→+2 mode).
The observation that GGG3G is efficiently hydrolyzed to GG and laminaribiose (G3G) indicates that β(1,3) glucosidic bonds are tolerated between the +1 and +2 subsites. Comparison with the kcat/Km value for the −2→+2 mode of hydrolysis of cellotetraose (GGGG) indicates that β(1,3) bonds are slightly disfavored in this position by a factor of 4 (Table 1), although this equates to less than 2 kJ/mol of lost transition-state stabilization. The recognition of β(1,3)-linkages between subsites +1 and +2 is likely to be responsible for the generation of G3GGG in the limit digest of barley bMLG (Fig. 5). The complexes of PbGH5A with GGGG and XXXG in the positive subsites suggest that the presence of a β(1,3)-linkage between the +1 and +2 subsites would necessarily cause the saccharide chain to adopt a different conformation, possibly disrupting the +2 hydrogen bonding interaction with Asp-171, but stacking with Trp-243 in subsite +2 would be anticipated to remain, due to the plasticity of this interaction (Fig. 15).
Turning to the negative subsites, binding in subsite −2 is essential for catalysis; no substrates, including G-PNP, were hydrolyzed to release glucose via −1→+n modes (Table 1). Notably, kinetic analyses revealed that β(1,3)-linkages are not tolerated between three of four negative subsites. In particular, G3GGG is not hydrolyzed through possible −1→+3, −2→+2, or −3→+1 modes (Table 1). The lack of −2→+2 and −3→+1 activity vis à vis the three other mixed-linkage tetrasaccharides provides clear evidence that β(1,3)-linkages are not accepted between subsites −2 and −1, as well as −3 and −2. Furthermore, GG3GG is not hydrolyzed via the −3→+1 mode, unlike GGGG and GGG3G, which also indicates intolerance of β(1,3)-linkages between subsites −2 and −1. Similarly, the heptasaccharide G3GGG3GGG is only cleaved at the internal β(1,3)-glycosidic bond. The two β(1,3)-linkages prevent productive binding and cleavage at the four possible β(1,4)-glycosidic bonds, whereas the non-reducing-end β(1,3)-linkage is tolerated in subsite −4. The inability of PbGH5A to accept β(1,3) bonds in the negative subsites is partially substantiated by the structures of complexes with xyloglucan oligosaccharides bound in these subsites. As discussed above, the xylosyl residues of these XXXG-based ligands are mostly solvent-exposed, such that the observed binding of the backbone (Figs. 3A and 13A) might be anticipated to closely approximate that of the unbranched cellotetraosyl unit (GGGG). In the −1 subsite, the enzyme forms intimate contacts with each ligand, with the C1-hydroxyl hydrogen bonding to the catalytic acid/base Glu-162 and the C3-hydroxyl directly interacting with conserved residues His-112 and His-113. As such, accommodating a β(1,3) link to the −2 subsite would break this interaction and require a major change in substrate orientation, likely altering the position of the scissile bond relative to the catalytic center.
Beyond the −2 subsite, the active-site cleft widens significantly, such that there are no obvious steric factors that would hinder substrate binding in this region. Binding of β(1,3)-linked glucose across subsites −3 and −2 may be disfavored because the resulting kink in the glucan backbone could disrupt key stacking interactions with Trp-48 and Phe-47, which are the main contributors to the well ordered ligand binding seen in the −3 and −4 subsites, respectively. Regardless, the presence of a β(1,3)-linkage between subsites −4 and −3 would appear to be structurally accommodated, as underscored by the superior kinetics of G3GGG3GGG (Table 1).
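The qualitative linkage-tolerance rules discussed in this section can be summarized as a small enumeration. The sketch below is a bookkeeping aid only: it encodes the three constraints stated in the text (the −2 subsite must be occupied; β(1,3)-linkages are not accepted between subsites −2/−1 or −3/−2) and treats every other position as permissive, which is an assumption where the text is silent (for example, beyond +2). It does not predict rates or preferred modes, and the notation parser and function names are hypothetical.

```python
def parse_linkages(substrate):
    """Parse notation such as 'G3GGG3GGG' into a list of linkage types.

    Each 'G' is a glucosyl residue; a '3' between residues marks a beta(1,3)
    linkage, and adjacent residues are otherwise beta(1,4)-linked.
    Returns e.g. [3, 4, 4] for 'G3GGG'.
    """
    linkages, next_link, seen_residue = [], 4, False
    for ch in substrate:
        if ch == "G":
            if seen_residue:
                linkages.append(next_link)
            seen_residue, next_link = True, 4
        elif ch == "3":
            next_link = 3
        else:
            raise ValueError(f"unexpected character {ch!r}")
    return linkages


def to_notation(linkages):
    """Inverse of parse_linkages for a fragment with len(linkages) + 1 residues."""
    return "G" + "".join(("3" if link == 3 else "") + "G" for link in linkages)


def allowed_modes(substrate):
    """List (n_negative, n_positive, glycone, aglycone) for each permitted mode."""
    links = parse_linkages(substrate)
    n = len(links) + 1
    modes = []
    for bond in range(1, n):        # cleave between residue `bond` and `bond + 1`
        n_neg, n_pos = bond, n - bond
        if n_neg < 2:               # rule 1: the -2 subsite must be occupied
            continue
        if links[bond - 2] == 3:    # rule 2: no beta(1,3) between -2 and -1
            continue
        if n_neg >= 3 and links[bond - 3] == 3:  # rule 3: none between -3 and -2
            continue
        modes.append((n_neg, n_pos, to_notation(links[:bond - 1]), to_notation(links[bond:])))
    return modes


if __name__ == "__main__":
    for s in ("G3GGG", "GG3GG", "GGG3G", "G3GGG3GGG"):
        print(s, allowed_modes(s))
```

For the four mixed-linkage substrates, these rules alone reproduce the observed product patterns: no permitted mode for G3GGG, cellobiose only from GG3GG, two modes for GGG3G, and a single −4→+3 mode for G3GGG3GGG. For all-β(1,4) substrates the rules permit more modes than are kinetically significant, so the enumeration should be read as an upper bound.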
Implications for Specificity Prediction in GH5 Subfamily 4-Subfamily 4 is one of the largest GH5 subfamilies, which resulted from the merger of the previous cellulase subfamilies A3 and A4 (11). To explore the possibility of delineating the known "cellulase," mixed-linkage endo-glucanase, and endo-xyloglucanase activities within specific clades, we performed a new phylogenetic analysis of GH5_4 using all sequences in the public CAZy Database (supplemental Fig. S1). Bootstrap analysis revealed several well defined clades; however, endo-glucanase and endo-xyloglucanase activities were not absolutely segregated.
FIGURE 17. Comparison of GH5_4 structural homologs. A, overall shape of the active site pocket. The active sites are shown as a semi-transparent blue surface representation for six structures as follows: three MLG-active enzymes, PbGH5A, Xeg5A (PDB code, 4W88), and BhGH5 (PDB code, 4V2X); and three XyG-specific enzymes, Xeg5B (PDB code, 4W8B), PpXG5 (PDB code, 2JEQ), and BoGH5 (PDB code, 3ZMR). Ligands, if present, are shown in cyan ball-and-stick representation. Highlighted in red are the two catalytic glutamate residues present in all of the compared structures. Highlighted in green are two regions that contribute the most to the differences in the active-site shape between the compared structures: the narrowing at the top of the −1 subsite in MLG-active enzymes (absent in XyG-specific enzymes), and the presence of a bulky aromatic residue making up the binding platform for the −2′-xylose in XyG (absent in MLG-active enzymes). For emphasis, a white circle contours the binding surface available at the −1 and −2 subsites of the XyG-specific enzymes and points to the lack thereof for the MLG-active enzymes. B, superposition of the MLG-active enzymes from A. For clarity, only the secondary structure of PbGH5A is shown. The loops making up the active site are shown in cyan (top four loops) and blue (bottom three loops). Shown in ball-and-stick are the residues responsible for the unique shape of the active site: top acidic residues narrowing the −1 subsite and the bottom His residue forming the −2′ subsite. PbGH5A residues are in green, Xeg5A in gray, and BhGH5 in wheat. For general orientation, the XXXGXXXG ligand in PbGH5A structure is shown in line representation. C, comparison between PbGH5A and XyG-specific enzymes. The representation is the same as in B. Distinct residues in the −1 and −2′-subsites are in the following color code: PbGH5A, green; BoGH5, orange; Xeg5B, violet; PpGH5, pink.
A lack of systematic enzymological data further hampers efforts to delineate specificities by phylogeny. Although a generally low coverage of biochemical characterization is a ubiquitous problem for all GH families, a further significant issue arises from the use of CMC as a proxy to measure cellulase activity. As the present reanalysis of PbGH5A activity shows (Table 1), the original use of CMC as a substrate to characterize this enzyme was misleading (24); in fact, the amorphous, phosphoric acid-swollen cellulose is an even poorer substrate for PbGH5A. Analogously, it is therefore unclear how many of the 56 GH5_4 members currently assigned as cellulases or endo-β(1,4)-glucanases, often solely on the basis of activity toward this unnatural, anionic, polysaccharide derivative, have been incorrectly annotated. When assaying new GH5_4 members, a wider panel of soluble polysaccharide substrates must be tested, and more detailed re-evaluation of currently characterized members is certainly warranted. More broadly, it could be argued that CMC should be abandoned as a substrate altogether.
Regardless, a growing body of data suggests that GH5_4 members are more likely to be active on the amorphous crosslinking glycans of the composite plant cell wall, rather than on the para-crystalline cellulose component. Testing this hypothesis will require further characterization of this large and historically significant subfamily via structure-function analyses that are at the same time systematic and deep. As our work here shows, such endeavors are likely to be fruitful in uncovering unanticipated specificities, thereby increasing the library of biocatalysts for potential applications.