Tracing Determinants of Dual Substrate Specificity in Glycoside Hydrolase Family 5

Background: Glycoside hydrolase family 5 (GH5) comprises enzymes with a wide range of activities critical for the deconstruction of lignocellulose.
Results: Concurrent glucan and mannan specificity in over 70 members of GH5 can be ascribed to a conserved active site motif.
Conclusion: Single domain multispecific hydrolases are widely prevalent.
Significance: This finding has potential applications in improved enzyme mixture design or microbes engineered for consolidated bioprocessing of lignocellulose.

Enzymes are traditionally viewed as having exquisite substrate specificity; however, recent evidence supports the notion that many enzymes have evolved activities against a range of substrates. The diversity of activities across glycoside hydrolase family 5 (GH5) suggests that this family of enzymes may contain numerous members with activities on multiple substrates. In this study, we combined structure- and sequence-based phylogenetic analysis with biochemical characterization to survey the prevalence of dual specificity for glucan- and mannan-based substrates in the GH5 family. Examination of amino acid profile differences between the subfamilies led to the identification and subsequent experimental confirmation of an active site motif indicative of dual specificity. The motif enabled us to discover several new dually specific members of GH5, and this pattern is present in over 70 other enzymes, strongly suggesting that dual endoglucanase-mannanase activity is widespread in this family. In addition, reinstatement of the conserved motif in a wild type member of GH5 enhanced its catalytic efficiency on glucan and mannan substrates by 175 and 1,600%, respectively. Phylogenetic examination of other GH families further indicates that the prevalence of enzyme multispecificity in GHs may be greater than has been experimentally characterized.
Single domain multispecific GHs may be exploited for developing improved enzyme cocktails or facile engineering of microbial hosts for consolidated bioprocessing of lignocellulose.

Enzymes are commonly viewed as highly specific for their natural substrates; however, this view obscures the fact that many can perform multiple activities (1,2). This "promiscuity" typically involves applying the same chemistry to different substrates or, alternatively, using different catalytic machinery within the same active site (3,4). One well studied example is the serum paraoxonase PON1, which can hydrolyze lactones, thiolactones, carbonates, esters, and phosphotriesters, using one set of active site residues for some functions and other residues for others (4,5). In many cases, enzyme multispecificity is essential to organismal survival; it has also been argued to be a byproduct of divergent evolution from unspecialized ancestral enzymes, which could explain why secondary functions of one enzyme are often primary functions of other members of the same family or superfamily (3,4). The prevalence of enzyme promiscuity is an open question, but it has been suggested to be a fundamental characteristic of enzymes in general (2).
Multispecific enzymes can be found in numerous protein families, including the glycoside hydrolases (GHs), a large class of enzymes that catalyze the hydrolysis of plant polysaccharides (6). The Carbohydrate-Active Enzymes (CAZy) database (6) categorizes GHs into more than 100 sequence-based families, including endo-, exo-, and side chain-acting hydrolases specific to glucose-, xylose-, mannose-, galactose-, and arabinose-containing polysaccharides, among others. An important application of GHs is the hydrolysis of cellulose and hemicellulose into fermentable sugars for subsequent conversion to biofuels or commodity chemicals. Members of a given CAZy family share structural features and conserved catalytic residues but may or may not exhibit identical substrate specificity. For example, members of the GH1 family are active on a number of sugar types linked through the β-1,4 bond, including β-galactose, β-mannose, and β-glucose. In contrast, all existing members of GH64 are β-1,3-endoglucanases.
In this study, we probed the extent and mechanisms of multisubstrate specificity in a highly diverse GH family using phylogenetic analysis and biochemical characterization. GH family 5 (GH5) comprises enzymes with a wide range of activities critical for the deconstruction of lignocellulose, including endo-β-1,4-glucanase, endo-β-1,4-mannanase, endo-β-1,3-glucanase, endo-β-1,6-galactanase, lichenase, xyloglucan-specific endo-β-1,4-glucanase, and endo-β-1,4-xylanase (6). The range of substrates hydrolyzed by GH5 enzymes suggests that this family may contain single domain enzymes with multiple specificities. To test this hypothesis, we built a phylogenetic tree for this highly diverse family from a multiple sequence alignment (MSA) constructed using both sequence and structure information. This combined approach allowed the alignment to include genes with low pairwise sequence identity and a variety of functions, which would not have been possible with standard sequence-only MSA-building methods. We analyzed sequence patterns in the resulting subfamilies to identify glucan and mannan specificity-determining residues and, through extensive biochemical characterization, validated a conserved motif that enables dual substrate specificity within a single catalytic domain. We then applied this motif to enhance the catalytic efficiency of a GH5 enzyme for glucan and mannan hydrolysis by 175 and 1,600%, respectively. The conserved motif also allowed us to discover new enzymes active on both glucan and mannan substrates, and its presence in over 70 members of GH5 strongly suggests that dual specificity is widespread in this family. Finally, extending the phylogenetic analysis to two other CAZy families, GH1 and GH43, further indicates that single domain multispecific GH enzymes may be more common than is currently characterized.
Single domain multispecific GHs could therefore reduce the complexity of designing both enzyme mixtures and microbial hosts for consolidated bioprocessing of lignocellulose.

EXPERIMENTAL PROCEDURES
Creation of Structure-based Sequence Alignments-To build a high quality sequence alignment for this diverse protein family, we used a combination of structural and sequence information. First, we performed pairwise structural alignments with 3Dhit (7) of 22 GH5 family structures (chain A of Protein Data Bank (PDB) IDs 2JEP, 3VDH, 1BQC, 2WHL, 7A3H, 2OSX, 1H1N, 1QNR, 1TVN, 1RH9, 1UUQ, 1EDG, 2C0H, 2CKS, 1WKY, 1H4P, 2PC8, 1CEO, 2ZUM, 1VJZ, 1EGZ, and 1ECE) to the Cel5A_Tma structure (PDB ID 3MMW, chain A). These 23 structures were selected on the basis of resolution and to remove redundancy at 90% sequence identity. For each structure, we used BLAST to search GH5 sequences from the CAZy database (after removing short sequence fragments) for sequences sharing 25-90% sequence identity with the structure's sequence, and the resulting sequences were aligned with MUSCLE (8). These 23 MSAs were then combined into a single MSA by aligning equivalent positions in the individual MSAs using the pairwise structural alignments to 3MMW. Redundant sequences were filtered out at 90% sequence identity, preferentially keeping, in order of priority, sequences with structures, sequences with experimental characterization, and longer sequences. We did not filter explicitly for active site residue identity; however, 94% of the sequences in the alignment contained both catalytic glutamates, and we do not expect that removing the small number of sequences lacking one or both glutamates would significantly alter the tree. Filtering for the presence of other active site residues would be possible but was not performed here because some of these residues were of interest in finding the specificity-determining motif.
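The redundancy filter described above can be sketched as a greedy pass over the combined MSA: sequences are ranked by the stated priority (structure, then experimental characterization, then length) and kept only if they are below 90% identity to every sequence already kept. This is a minimal illustration, not the authors' code; the `Seq` record, the identity definition (identical residues over columns where neither sequence has a gap), and treating exactly 90% as redundant are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Seq:
    """Hypothetical record for one row of the combined MSA."""
    name: str
    aligned: str                 # gapped sequence; '-' marks a gap
    has_structure: bool = False
    characterized: bool = False


def percent_identity(a: str, b: str) -> float:
    """Percent identical residues over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def priority(s: Seq):
    """Sort key: structure first, then experimental characterization, then length."""
    return (s.has_structure, s.characterized, len(s.aligned.replace("-", "")))


def filter_redundant(seqs, cutoff=90.0):
    """Greedily keep high-priority sequences that are < cutoff % identical to all kept ones."""
    kept = []
    for s in sorted(seqs, key=priority, reverse=True):
        if all(percent_identity(s.aligned, k.aligned) < cutoff for k in kept):
            kept.append(s)
    return kept


seqs = [
    Seq("b", "ACDEFGHIKM"),
    Seq("a", "ACDEFGHIKL", has_structure=True),
    Seq("c", "MMMMMMMMMM"),
]
print([s.name for s in filter_redundant(seqs)])  # ['a', 'c']
```

Here "b" is 90% identical to the structure-bearing "a" and is dropped, whereas the unrelated "c" is kept.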
Creation of the Phylogenetic Tree-Gap positions and their neighbors were trimmed from the above structure-based sequence alignment by removing positions with less than 60% occupancy together with two flanking positions. The removed gap positions were 1-9, 24-31, 59-67, 137-141, 200-227, 256-262, 293-299, and 305-309 (Cel5A_Tma numbering). A tree was built from the resulting trimmed alignment using FastTree 2.1.3 (9) and was rerooted at the midpoint between the two leaves separated by the greatest evolutionary distance. To test the sensitivity of the alignment and tree to the method of creation, we rebuilt both starting from a different X-ray structure (PDB ID 2WHL from Bacillus agaradhaerens). The resulting tree was nearly identical, displaying essentially the same subfamily separations as the tree built from the Cel5A_Tma structure.
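A minimal sketch of the gap-trimming step follows, assuming that "two flanking positions" means two columns on each side of every column with less than 60% occupancy (the text does not say whether both sides are meant). Column indices are 0-based alignment columns, not Cel5A_Tma residue numbers.

```python
def trim_gappy_columns(msa, min_occupancy=0.60, flank=2):
    """Drop columns with < min_occupancy residues plus `flank` columns on each side.

    `msa` is a list of equal-length gapped strings ('-' = gap). Returns the
    trimmed rows and the sorted 0-based indices of the removed columns.
    """
    n_rows, n_cols = len(msa), len(msa[0])
    drop = set()
    for j in range(n_cols):
        occupancy = sum(row[j] != "-" for row in msa) / n_rows
        if occupancy < min_occupancy:
            # Assumed reading of "two flanking positions": two on each side.
            drop.update(k for k in range(j - flank, j + flank + 1) if 0 <= k < n_cols)
    keep = [j for j in range(n_cols) if j not in drop]
    trimmed = ["".join(row[j] for j in keep) for row in msa]
    return trimmed, sorted(drop)


msa = ["ACDEFGHIK", "ACDEF-HIK", "ACDEF-HIK"]
trimmed, removed = trim_gappy_columns(msa)
print(trimmed, removed)  # ['ACDK', 'ACDK', 'ACDK'] [3, 4, 5, 6, 7]
```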
Subfamily Identification-Subfamilies were defined from the clade divisions in the tree using evolutionary distance from the root node, branch length, and bootstrap support (above 80%; see Fig. 1). Specifically, we moved along branches away from the center of the tree until we reached a long branch descending from a node with bootstrap support above 80% (as for subfamilies A1 and A8). This identified the subfamilies and subfamily groups A1, A8, A7/10, A12, A5/6, A2, A11, A9, and A4. A3 did not have a long branch but clustered separately from the nearby A4; A3 was therefore assigned as a subfamily. A10 was split from A7 by applying the above procedure a second time, because there was a long branch from the common node of these two subfamilies. Subfamily names follow the designations used in the literature describing structures in each subfamily, with the exception of the two new subfamilies, A11 and A12, which contain PDB IDs 1VJZ and 2OSX, respectively; neither structure is associated with a subfamily designation in the literature.
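The subfamily-calling heuristic can be illustrated on a toy tree. In this hypothetical sketch, a clade is called a subfamily at the first node (moving away from the root) whose bootstrap support exceeds 80% and whose ancestral branch is "long"; the text gives no numeric cutoff for a long branch, so `long_branch` is an assumed parameter. Exceptions such as A3, which was assigned by clustering rather than by a long branch, are not captured.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical minimal tree node; real trees would come from a Newick parser."""
    name: str
    branch_length: float = 0.0  # length of the branch leading into this node
    support: float = 1.0        # bootstrap support of this clade, as a fraction
    children: list = field(default_factory=list)


def leaves(node):
    """Return the leaf names below a node."""
    if not node.children:
        return [node.name]
    return [name for child in node.children for name in leaves(child)]


def call_subfamilies(node, min_support=0.80, long_branch=0.5):
    """Walk away from `node`; stop at the first well-supported clade on a long branch.

    `long_branch` is an assumed cutoff. Leaves reached without meeting the
    criteria are left unassigned.
    """
    subfamilies = []
    for child in node.children:
        if child.support > min_support and child.branch_length >= long_branch:
            subfamilies.append(leaves(child))
        elif child.children:
            subfamilies.extend(call_subfamilies(child, min_support, long_branch))
    return subfamilies


# Toy tree: clade X sits on a long, well-supported branch; clade Z is nested
# inside the poorly supported clade Y; leaf e is left unassigned.
toy = Node("root", children=[
    Node("X", 0.8, 0.95, [Node("a"), Node("b")]),
    Node("Y", 0.1, 0.60, [
        Node("Z", 0.7, 0.90, [Node("c"), Node("d")]),
        Node("e", 0.05),
    ]),
])
print(call_subfamilies(toy))  # [['a', 'b'], ['c', 'd']]
```

Rerunning `call_subfamilies` inside an already called clade corresponds to the second iteration used to split A10 from A7.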
Selection of Cel5A_Tma Active Site Residues for Analysis-The ligand in PDB ID 1ECE was used to define active site positions because it represents a four-sugar substrate with units on both sides of the active site, whereas most other co-crystals of homologs contain ligands bound to only one side of the active site. Residues with side chain atoms within 6 Å of the 1ECE ligand were selected, with the exception of Ala-24, which points away from the active site. Residues with high sequence entropy (above 1.75) and low occupancy (below 70%) in the A4 subfamily MSA were removed.
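The entropy and occupancy filter can be sketched per alignment column as below. The logarithm base behind the 1.75 entropy cutoff is not stated, so natural logarithms are assumed; gaps are excluded from the entropy but counted against occupancy, and a column is assumed to be removed if it fails either criterion.

```python
import math
from collections import Counter


def column_entropy_and_occupancy(column: str):
    """Shannon entropy (natural log, gaps excluded) and fraction of non-gap rows."""
    residues = [c for c in column if c != "-"]
    occupancy = len(residues) / len(column)
    if not residues:
        return 0.0, occupancy
    n = len(residues)
    entropy = -sum((k / n) * math.log(k / n) for k in Counter(residues).values())
    return entropy, occupancy


def keep_position(column: str, max_entropy=1.75, min_occupancy=0.70) -> bool:
    """Assumed rule: a position is removed if it fails either criterion."""
    entropy, occupancy = column_entropy_and_occupancy(column)
    return entropy <= max_entropy and occupancy >= min_occupancy


# One string per alignment column (one character per A4 sequence).
print(keep_position("HHHHHHHHHN"))  # True: low entropy, full occupancy
print(keep_position("ACDEFGHIKL"))  # False: entropy ln(10), about 2.30
print(keep_position("HHH-------"))  # False: 30% occupancy
```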

Determination of Conserved and Nonconserved Active Site Positions-We calculated a BLOSUM-weighted profile difference score for each alignment position i from p_aa,i, the probability of amino acid aa occurring at position i in the alignment, and BLOSUM_aa1,aa2, the BLOSUM substitution matrix value for amino acids aa1 and aa2. The resulting profile difference scores for the active site profiles of subfamily A4 versus each of the large, primarily endoglucanase or mannanase subfamilies (A1, A2, A5/6, A7, and A8) are summarized in supplemental Table S3.

Structural Modeling-The structural model of Cel5A_Tma in complex with the disaccharide glucan-based substrate has been published previously (10). To create the Cel5A_Tma complex with the disaccharide mannan-based substrate, the configuration of the glucan-based substrate was altered at OH-C2 by comparison with co-crystal structures of other enzymes in complex with mannan-based ligands. Hydrogens were added using UCSF Chimera (11), and the His-95 and Asn-20 dihedrals were optimized for hydrogen bonding with the ligand (resulting in heavy atom root mean square deviation values of 0.43 and 0.32 Å, respectively); other rotatable hydrogen dihedrals were positioned by inspection to assess possible hydrogen bond geometries (supplemental Table S4). The subfamily A7 co-crystal structures referred to in the Discussion, which contain an asparagine that is distant in primary sequence but occupies three-dimensional coordinates similar to those of Asn-20, are Man5_Tfu (Thermomonospora fusca, PDB ID 3MAN (12)) and Man5A_Bag (B. agaradhaerens, PDB ID 2WHL (13)). The subfamily A8 co-crystal structures containing an aspartate at three-dimensional coordinates similar to those of Asn-20 are Man5A_Sly (Solanum lycopersicum, PDB ID 1RH9 (14)) and Man5A_Hje (Hypocrea jecorina, PDB ID 1QNR (15)). The model for Cel5B_Dtu was created with Phyre2 (16).
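The exact form of the BLOSUM-weighted profile difference score is not reproduced here, so the sketch below uses one plausible, hypothetical form built from the same ingredients (column profiles p_aa,i and BLOSUM_aa1,aa2): the mean BLOSUM self-similarity of two subfamily profiles minus their cross-similarity. It is zero for identical profiles and typically larger when the subfamilies favor dissimilar residues. It should not be read as the authors' equation. Only a small excerpt of BLOSUM62 (residues A, N, D, E, and H) is hard-coded to keep the example self-contained.

```python
from collections import Counter

# Excerpt of BLOSUM62 (symmetric), limited to the residues used in the example.
BLOSUM62_EXCERPT = {
    ("A", "A"): 4, ("A", "N"): -2, ("A", "D"): -2, ("A", "E"): -1, ("A", "H"): -2,
    ("N", "N"): 6, ("N", "D"): 1, ("N", "E"): 0, ("N", "H"): 1,
    ("D", "D"): 6, ("D", "E"): 2, ("D", "H"): -1,
    ("E", "E"): 5, ("E", "H"): 0,
    ("H", "H"): 8,
}


def blosum(a: str, b: str) -> int:
    """Symmetric lookup; raises KeyError for residues outside the excerpt."""
    if (a, b) in BLOSUM62_EXCERPT:
        return BLOSUM62_EXCERPT[(a, b)]
    return BLOSUM62_EXCERPT[(b, a)]


def profile(column: str) -> dict:
    """p_aa,i: amino acid frequencies at one alignment column, ignoring gaps."""
    residues = [c for c in column if c != "-"]
    return {aa: n / len(residues) for aa, n in Counter(residues).items()}


def expected_blosum(p: dict, q: dict) -> float:
    """Expected BLOSUM score of a residue drawn from p paired with one drawn from q."""
    return sum(pa * qb * blosum(a, b) for a, pa in p.items() for b, qb in q.items())


def profile_difference(col_x: str, col_y: str) -> float:
    """Hypothetical score: mean self-similarity minus cross-similarity (0 if identical)."""
    p, q = profile(col_x), profile(col_y)
    return 0.5 * (expected_blosum(p, p) + expected_blosum(q, q)) - expected_blosum(p, q)


print(profile_difference("HHHH", "HHHH"))  # 0.0
print(profile_difference("HHHH", "NNNN"))  # 6.0
```

For example, a column that is all histidine in one subfamily and all asparagine in another scores (8 + 6)/2 - 1 = 6 with these BLOSUM62 values.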
Chemicals and Reagents-All chemicals and enzymes were analytical grade from Sigma or EMD Chemicals. BugBuster protein extraction reagent, Popculture reagent, rLysozyme solution, Benzonase nuclease HC (purity >90%), and proteinase inhibitor mixture V (EDTA-free) were from Novagen and Calbiochem (EMD Biosciences). The Champion pET101 directional TOPO expression kit was from Invitrogen. Nickel-nitrilotriacetic acid spin columns were from Qiagen. Zeba spin desalting columns (2 ml, 70,000 molecular weight cutoff) were from Pierce (Thermo Fisher Scientific). The bicinchoninic acid kit (BCA1-1KT) was from Sigma-Aldrich. Luria-Bertani (LB) medium was from EMD Chemicals, and 2xYT medium was from Sigma-Aldrich.
Gene Synthesis, Cloning, and Mutagenesis-Genes were codon-optimized for Escherichia coli and synthesized by GenScript USA, Inc. All genes were amplified and cloned using the pCDF-2 Ek/LIC vector kit (Novagen, EMD Biosciences), except cel5a_Pbr, which was cloned into the pET101 vector (Invitrogen). Cloning primers are listed in supplemental Table S5a. The construct for Cel5A_Tma, pCDF2-cel5a_Tma, has been described previously (10). All constructs were confirmed by DNA sequencing (Quintara Biosciences). Site-directed mutagenesis was performed with the QuikChange Lightning site-directed mutagenesis kit (Agilent Technologies) according to the manufacturer's instructions. All mutagenic primers are listed in supplemental Table S5b. The mutant plasmids were extracted with the QIAprep spin miniprep kit (Qiagen) and confirmed by DNA sequencing (Quintara Biosciences).
Protein Expression and Purification-All constructs were transformed into BL21 (DE3) (Novagen, EMD Biosciences) for protein expression. Single colonies were inoculated into 5 ml of LB autoinduction medium (Overnight Express autoinduction system 1, Novagen, EMD Biosciences) containing the appropriate antibiotic (100 μg/ml carbenicillin for pET101 constructs and 100 μg/ml streptomycin for the others) and incubated at 30°C for 24 h. Induced cultures were harvested and stored at −80°C until use. Protein extraction, purification, buffer exchange, and concentration determination were performed as described previously (data not shown).
Reducing Sugar Assays-The dinitrosalicylic acid method in a microplate format (17), without added phenol and sulfite, was used for most enzyme assays, whereas the 3-methyl-2-benzothiazolinone hydrazone (MBTH) method was used for the kinetic assays. The MBTH method was performed as described by Anthon and Barrett (18) with the following modifications: 40 μl of sample was mixed with 80 μl of Reagent A (0.25 M sodium hydroxide, 0.075% (w/v) MBTH, and 0.025% (w/v) dithiothreitol) and heated at 80°C for 15 min. After the samples had cooled to room temperature, 80 μl of Reagent B (0.5% (w/v) FeNH4(SO4)2·12H2O, 0.5% (w/v) sulfamic acid, and 0.25 M hydrochloric acid) was added, and the mixtures were incubated at room temperature for 30 min. Absorbance was then measured at 620 nm. The linear range of the MBTH method is 0.05-1 mM reducing sugar (D-glucose for endoglucanases or D-mannose for mannanases).
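Converting A620 readings to reducing sugar concentrations requires a standard curve. The sketch below fits an ordinary least-squares line to D-glucose or D-mannose standards and rejects readings whose back-calculated concentration falls outside the 0.05-1 mM linear range stated above. The calibration numbers in the example are hypothetical, and blank subtraction is assumed to have been done already.

```python
def fit_standard_curve(conc_mM, absorbance):
    """Ordinary least-squares line A620 = slope * conc + intercept."""
    n = len(conc_mM)
    mx, my = sum(conc_mM) / n, sum(absorbance) / n
    sxx = sum((x - mx) ** 2 for x in conc_mM)
    sxy = sum((x - mx) * (y - my) for x, y in zip(conc_mM, absorbance))
    slope = sxy / sxx
    return slope, my - slope * mx


def absorbance_to_mM(a620, slope, intercept, linear_range=(0.05, 1.0)):
    """Back-calculate reducing sugar (mM); refuse values outside the linear range."""
    conc = (a620 - intercept) / slope
    low, high = linear_range
    if not low <= conc <= high:
        raise ValueError(f"{conc:.3f} mM is outside the linear range of the assay")
    return conc


# Hypothetical standards following A620 = 0.8 * c + 0.02.
standards = [0.05, 0.25, 0.5, 1.0]
slope, intercept = fit_standard_curve(standards, [0.8 * c + 0.02 for c in standards])
print(round(absorbance_to_mM(0.42, slope, intercept), 3))  # 0.5
```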
Enzyme Assays-Assays for Cel5A_Tma and its mutants were performed under the respective optimal conditions for the two activities, 70°C and pH 5.00 for endoglucanase activity and 90°C and pH 5.50 for mannanase activity (data not shown), both in 50 mM sodium citrate buffer. For the other enzymes and their mutants, mesophilic enzymes were assayed at 37°C and thermophilic enzymes at 60°C, in 50 mM sodium citrate buffer (pH 5.50). Reactions contained 0.5% (w/v) carboxymethyl cellulose (CMC; average molecular mass ~90 kDa, Aldrich) or lotus bean gum (Sigma) as the substrate for endoglucanase and mannanase activity assays, respectively. D-Glucose and D-mannose (0-5 mM) were used as reducing sugar standards, as described above, for endoglucanase and mannanase assays, respectively. The optimal temperatures of Cel5B_Dtu on CMC and lotus bean gum were determined from 50 to 100°C in 5°C intervals, and 50 mM sodium citrate buffers (pH 3.00-6.50 in 0.50-unit intervals) were used to determine the optimal pH values for its endoglucanase and mannanase activities. One unit of endoglucanase or mannanase activity is defined as the amount of enzyme required to produce 1 μmol of reducing sugar per minute.
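Given the unit definition above (1 U = 1 μmol of reducing sugar released per minute), specific activity follows from simple arithmetic, using the fact that a 1 mM solution contains 1 μmol per ml. This is a sketch with hypothetical input values; dilution factors and substrate or enzyme blanks are assumed to be handled before this step.

```python
def specific_activity_U_per_mg(sugar_mM, volume_ml, minutes, enzyme_mg):
    """Specific activity in U/mg, with 1 U = 1 umol reducing sugar released per minute."""
    umol_released = sugar_mM * volume_ml  # 1 mM x 1 ml = 1 umol
    units = umol_released / minutes       # umol per minute
    return units / enzyme_mg


# Hypothetical reaction: 0.5 mM sugar released in 0.1 ml over 10 min by 0.001 mg enzyme.
print(round(specific_activity_U_per_mg(0.5, 0.1, 10, 0.001), 6))  # 5.0
```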
Kinetic Assays-All specific activity and kinetic assays for Cel5B_Dtu were performed under the optimal conditions (pH 5.00 and 70°C for endoglucanase activity; pH 5.50 and 75°C for mannanase activity). CMC and carob galactomannan (low viscosity, Megazyme), instead of lotus bean gum, were used as substrates in the kinetic assays. Initial velocities over a wide range of substrate concentrations ([S], 0.2-40 mg/ml) were used to calculate kcat and Km from a Lineweaver-Burk plot.
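The Lineweaver-Burk analysis linearizes the Michaelis-Menten equation as 1/v = (Km/Vmax)(1/[S]) + 1/Vmax, so Vmax is the reciprocal of the intercept and Km is the slope multiplied by Vmax; kcat then follows as Vmax/[E]. The sketch below implements this with an unweighted least-squares fit on synthetic, error-free data (Vmax = 10 and Km = 2 in arbitrary units), not on data from this study.

```python
def lineweaver_burk(s_values, v_values):
    """Fit 1/v = (Km/Vmax)(1/[S]) + 1/Vmax by unweighted least squares; return (Vmax, Km)."""
    xs = [1.0 / s for s in s_values]
    ys = [1.0 / v for v in v_values]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    vmax = 1.0 / intercept  # y-intercept is 1/Vmax
    km = slope * vmax       # slope is Km/Vmax
    return vmax, km


def kcat(vmax, enzyme_conc):
    """Turnover number; Vmax and [E] must be in matching molar units."""
    return vmax / enzyme_conc


# Synthetic, noise-free data from v = 10[S]/(2 + [S]) over the [S] range used above (mg/ml).
s = [0.2, 0.5, 1, 2, 5, 10, 20, 40]
v = [10 * x / (2 + x) for x in s]
vmax, km = lineweaver_burk(s, v)
print(round(vmax, 6), round(km, 6))  # 10.0 2.0
```

Because the double-reciprocal transform gives the most weight to the lowest-[S] points, a direct nonlinear fit of v = Vmax[S]/(Km + [S]) is often preferred when the data are noisy.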

RESULTS
Building a High Quality Phylogenetic Tree for GH5-The CAZy database provides a wealth of data about GHs, including lists of genes, activities, and structures within the sequence-based families. However, relationships between members of a given family, as revealed by phylogenetic trees, are not generally available. To begin our search for single domain multispecific members of GH5, we constructed a phylogenetic tree using available sequence and structure information from this family. Such a tree places the genes in their evolutionary context and allows identification of subfamilies and of sequence patterns that distinguish subfamilies with different functions. Phylogenetic tree building first requires an MSA containing the sequences of interest. Although numerous tools are available for building MSAs, constructing them for families that are diverse in sequence and function is not trivial: standard MSA tools do not work well when pairwise sequence identity between members falls below 25%. For example, previous MSAs built with sequence-only approaches (19-23) covered only part of GH5, limiting the overall size and sequence diversity of their constituent genes. Incorporating complementary information from experimentally determined protein structures can significantly help in building alignments and trees for sequence-diverse families (24-27). Because this combined structure- and sequence-based tree building approach had not previously been applied to GH families, we drew on the large number of structures in various GH families to build high quality sequence alignments and phylogenetic trees.
Our approach uses the relatively large number of crystal structures in GH5 (more than 30) to combine the low sequence identity parts of the family into a larger MSA. To do this, we created MSAs containing sequences with greater than 25% sequence identity to enzymes with experimentally determined crystal structures and then combined these MSAs using structure alignment methods. We used the resulting GH5 alignment containing 681 sequences to build a phylogenetic tree using FastTree 2 (9) and annotated it with the experimentally characterized activities obtained from CAZy (Fig. 1). In contrast to this phylogenetic analysis, previous studies of GH5 subfamily classifications focused on one subfamily at a time and were limited to sequence identity-based metrics (e.g. Refs. 28 and 29). In this work, we used the tree to classify subfamilies using their distance from the root, the length of the ancestral branch split, and the bootstrap support (Fig. 1).
Comparison of the functional assignments between the subfamilies in this tree shows phylogenetic correspondence with the divisions of different sugar specificities (Fig. 1). Three large subfamilies appear to contain predominantly β-1,4-glucan-specific enzymes (A1, A2, and A5/6); two are predominantly β-1,4-mannan-specific (A7 and A8); and one is predominantly β-1,3-glucan-specific (A9). In terms of substrate specificity, subfamily A4 appears to be the most diverse in GH5 (supplemental Fig. S1a) in that it contains a variety of β-1,4-linked glucan-, mannan-, and xylan-specific enzymes (supplemental Table S1). Notably, several members of subfamily A4 have previously been reported to act on more than one substrate. For instance, GH5 proteins from Prevotella ruminicola (AAC36862.1) and Clostridium cellulovorans (AAA23231.1) have been reported to act on glucan as well as xylan substrates, although detailed biochemical characterization or structural information for these enzymes is not available (30,31). The most thoroughly characterized GH5 enzyme from subfamily A4 is the thermostable enzyme Cel5A_Tma (AAD36816.1) from Thermotoga maritima (32). Cel5A_Tma can degrade both galactomannan (71 units/mg) and CMC (616 units/mg) at rates comparable with those of its single substrate-specific counterparts Man5_Tma from GH5 (83 units/mg on galactomannan) and Cel74_Tma from family GH74 (121 units/mg on CMC). Functional genomics studies on T. maritima have revealed recruitment of this enzyme on mannan- and glucan-based growth substrates (33).
Discovery of a Specificity-determining Sequence Motif in Cel5A_Tma-To dissect the determinants of substrate specificity in subfamily A4, we used the comprehensive phylogenetic tree of family GH5 to examine the amino acid profiles of active site residues (see "Experimental Procedures") in the A4 subfamily, the mainly mannanase subfamilies (A7 and A8), and the predominantly endoglucanase subfamilies (A1, A2, and A5/6) (Fig. 2a). We categorized these positions as either conserved or variable based upon the extent of amino acid diversity (see "Experimental Procedures") between the subfamily alignments (Fig. 2b, green circles). For example, the catalytic glutamates at positions 136 and 253 (using sequence numbering from Cel5A_Tma) are conserved among all members of GH5; in contrast, position 96 is variable, having mainly histidines in subfamily A4 but relatively few histidines in the other GH5 subfamilies (Fig. 2a).
These analyses resulted in the identification of seven positions (20, 23, 53, 95, 96, 201, and 287; Cel5A_Tma numbering) that varied between the subfamilies, suggesting their involvement in substrate specificity. To evaluate the role of these seven residues in substrate specificity, we generated alanine substitutions at these positions in Cel5A_Tma and assayed the purified enzymes for endoglucanase and mannanase activities (Fig. 2b). Of the seven variable positions, alanine mutations at five positions (N20A, E23A, P53A, H96A, and E287A) resulted in reduced activity on mannan, one mutation (H95A) had reduced activity on glucan, and one mutation (F201A) did not have a large impact for either substrate. For each of the seven positions conserved between the subfamilies (30, 135, 136, 196, 198, 253, and 286), mutation to alanine eliminated activity on both substrates, consistent with the expected role of these positions either in catalysis or in nonspecific sugar substrate binding.
Application of the Motif to Predict Multispecificity-Next, we examined whether the pattern of amino acids at the six specificity-altering residues found in Cel5A_Tma could be generalized by assessing its presence across enzymes in the GH5 A4 subfamily. We reasoned that dual mannanase and glucanase activity might occur broadly within the A4 subfamily even though mannanase activity had not been experimentally characterized for any A4 member other than Cel5A_Tma (6); this gap might exist because previous studies did not test for mannanase activity in addition to the more typical glucanase assay. To this end, we searched for the six-residue pattern (allowing either aspartate or glutamate at positions 23 and 287) in the 143 genes in subfamily A4 and identified more than 70 sequences containing the motif (supplemental Fig. S1b). Based on the presence of the motif, we predicted that these enzymes may have both endoglucanase and mannanase activities (Fig. 1, pink branches).
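The motif search amounts to checking six aligned positions against allowed residue sets. In this minimal sketch, residues are read from the query row at the MSA columns corresponding to Cel5A_Tma positions 20, 23, 53, 95, 96, and 287, assuming that Cel5A_Tma numbering counts residues of the ungapped reference row from 1; the helper names are hypothetical. Aspartate or glutamate is accepted at positions 23 and 287, as in the search described above.

```python
# Allowed residues at the six motif positions (Cel5A_Tma numbering), from the text:
# Asn-20, Glu-23 (Asp also allowed), Pro-53, His-95, His-96, Glu-287 (Asp also allowed).
MOTIF = {20: {"N"}, 23: {"D", "E"}, 53: {"P"}, 95: {"H"}, 96: {"H"}, 287: {"D", "E"}}


def residues_at_reference_positions(aligned_query, aligned_reference, positions):
    """Read the query residue in each MSA column occupied by a given reference position.

    Assumes reference numbering counts the ungapped reference row from 1.
    A '-' in the query at that column is returned as '-'.
    """
    found = {}
    ref_pos = 0
    for q, r in zip(aligned_query, aligned_reference):
        if r != "-":
            ref_pos += 1
            if ref_pos in positions:
                found[ref_pos] = q
    return found


def motif_mismatches(residues):
    """Motif positions whose residue is not in the allowed set (empty list = match)."""
    return [pos for pos, allowed in MOTIF.items() if residues.get(pos) not in allowed]


# A Cel5A_Tma-like pattern matches; an Asp at position 20 is a single mismatch.
tma_like = {20: "N", 23: "E", 53: "P", 95: "H", 96: "H", 287: "E"}
print(motif_mismatches(tma_like))               # []
print(motif_mismatches({**tma_like, 20: "D"}))  # [20]
```

Reporting the mismatching positions, rather than a simple yes or no, makes near-misses such as Cel5B_Dtu (aspartate at the position equivalent to Asn-20) easy to flag.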
To test our prediction of broad multispecificity in subfamily A4, we assayed 10 additional enzymes, selected to broadly cover the phylogenetic diversity in A4 and to either match or differ from the six-residue pattern (Fig. 3a and supplemental Table S2). Of these enzymes, all exhibited endoglucanase activity, and six also had detectable mannanase activity. Of the six characterized dual specificity enzymes, four had the same pattern at the six residues as Cel5A_Tma, whereas two (Cel5A_Umi and Cel5B_Dtu) did not match the pattern, differing at only a single position. Of the four characterized single specificity enzymes, each differed at one position or more from the motif. We further confirmed the specificity determination of the six-residue pattern in other enzymes from the A4 subfamily by characterizing the endoglucanase and mannanase activities of alanine mutants in two dual specificity enzymes from subfamily A4 with low sequence identity to Cel5A_Tma: Cel5C_Cth (29% sequence identity) and Cel5A_Eec (25% sequence identity). The specificity changes resulting from mutations in both Cel5C_Cth and Cel5A_Eec were consistent with the specificity changes resulting from the corresponding mutations in Cel5A_Tma (Fig. 3b), with the exception of the P72A variant of Cel5C_Cth.
Using the Motif to Engineer Enhanced Activity-In addition to using the six-residue pattern to predict dual specificity, we applied it to engineer enhanced activity. We postulated that the activity of Cel5B_Dtu could be improved by mutating the aspartate at position 14 (corresponding to Asn-20 in Cel5A_Tma) to asparagine so that the enzyme fully matches the six-residue pattern. Homology modeling of Cel5B_Dtu (data not shown) suggested that an asparagine at this position could form three hydrogen bonds with the mannan substrate, whereas an aspartate might be limited to two. The D14N mutation enhanced hydrolysis of both substrates, increasing specific endoglucanase activity by ~70% and specific mannanase activity by ~300% (Table 1). Kinetic analysis revealed that this single amino acid substitution decreased the Km for galactomannan ~16-fold (the wild-type Km was ~1,500% higher than that of the mutant), accompanied by a 5.2% increase in kcat; for the glucan substrate, Km was reduced by ~50% and kcat increased by ~35%. Notably, the improvements in catalytic efficiency (kcat/Km) attributable to this single mutation were ~175 and ~1,600% for endoglucanase and mannanase activities, respectively.
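The reported changes in Km and kcat can be checked against the reported gains in catalytic efficiency, since the fold change in kcat/Km equals the fold change in kcat divided by the fold change in Km. The sketch below assumes that the ~50% Km reduction for CMC corresponds to a mutant-to-wild-type Km ratio of 0.5 and that the ~1,500% Km change for galactomannan corresponds to a ratio of about 1/16. With these rounded inputs, the recomputed gains (~170% and ~1,580%) are consistent with the reported ~175 and ~1,600%.

```python
def efficiency_gain_percent(kcat_ratio, km_ratio):
    """Percent change in kcat/Km from mutant/wild-type ratios of kcat and of Km."""
    return 100.0 * (kcat_ratio / km_ratio - 1.0)


# CMC: kcat +35% (ratio 1.35); Km reduced by ~50% (ratio 0.5).
glucan = efficiency_gain_percent(1.35, 0.5)
# Galactomannan: kcat +5.2% (ratio 1.052); Km ~16-fold lower (assumed ratio 1/16).
mannan = efficiency_gain_percent(1.052, 1 / 16)
print(f"{glucan:.0f}% {mannan:.0f}%")  # 170% 1583%
```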

DISCUSSION
The comprehensive GH5 phylogenetic tree described here led to the identification of an active site motif describing dual specificity for glucan- and mannan-based substrates in the large and diverse A4 subfamily of GH5. A sequence motif alone cannot fully determine the substrate specificity of a sequence-distant group of enzymes, given the importance of subtle sub-ångström interactions in the active site. It is therefore notable that this motif captured the endoglucanase and mannanase specificity pattern for almost all mutations at these sites in three sequence-distant enzymes (Fig. 3b) and helped to successfully identify dual specificity enzymes (Fig. 3a).
To propose structural explanations for the specificity changes at the six specificity-altering residues, we modeled (see "Experimental Procedures") glucan and mannan disaccharides into the Cel5A_Tma active site using the orientation from the structure of Cel5A_Bag (10) (Fig. 4, a and b). Mannan and glucan sugars differ in the configuration of the hydroxyl group at C2, which is axial in mannan units and equatorial in glucan units (supplemental Fig. S2) (13). With mannan in the active site, Asn-20 forms two hydrogen bonds with the axial OH-C2 group at the −2 subsite (Fig. 4b), an interaction that is unlikely to occur with the equatorial OH-C2 of the glucan-based substrate (Fig. 4a). The co-crystal structures of four strict mannanases from subfamilies A7 and A8 underscore the importance of this position, showing similar interactions between the OH-C2 group at the −2 subsite and an aspartate or asparagine (see "Experimental Procedures" for details). In the model, Glu-23 and Glu-287 form hydrogen bonds with the main chain and side chain atoms of Asn-20, respectively, which may stabilize the Asn-20 side chain orientation and support its hydrogen bonding with the OH-C2 group of mannan. Mutation of Pro-53 could disrupt the β2 strand, producing conformational changes that affect the nearby Asn-20 and Glu-23 residues. The strong reduction in glucanase activity caused by the His-95 mutation can be explained by the interaction of His-95 with the equatorial OH-C2 at the −1 subsite of the glucan substrate; this interaction does not appear to occur with the axial configuration found in the mannan substrate. A recently released structure of Cel5A_Tma in complex with different sugar moieties confirms our model and supports our interpretations (34).
Although there are no reports in the literature of engineered activity enhancement for mannan hydrolysis, the largest improvements reported to date for glucan and xylan hydrolysis are ~80% (35) and ~300% (36), respectively. That the ~300% increase in specific mannanase activity and the ~1,600% improvement in mannanase kcat/Km observed for the "back-to-motif" mutation in Cel5B_Dtu result from a single point mutation is intriguing, given the difficulty of enhancing activity in these enzymes through other optimization techniques (37-39). The back-to-motif result indicates that this substrate-determining motif could be ancestral to the GH5 family, supporting Jensen's hypothesis (40) that the spectrum of specificity in the ancestors of an enzyme family can be seen in the descendant families. Similar to back-to-ancestor mutations outside the active site, back-to-motif mutations within the active site may broaden enzyme activity and also make enzymes more evolvable (41).
In addition to endoglucanase and mannanase activity in A4 subfamily enzymes, preliminary work has shown the presence of a third specificity, xylanase, in some enzymes. This co-occurrence of single domain multispecificity and multiple specificities in the subfamily raises the interesting question of whether multispecificity is an "inherent" property of some groups of related enzymes, such as the GH5 A4 subfamily. To investigate the extent of this co-occurrence in CAZy, we extended the aforementioned MSA and phylogenetic analysis to two other well characterized GH families: GH1 and GH43. GH1 contains ~3,500 members (with 232 biochemical characterizations), and GH43 contains ~2,000 members (with 85 biochemical characterizations) (6). Similar to our findings in GH5, we observed single domain multispecific enzymes within subfamilies bearing different sugar specificities (supplemental Fig. S3, a and b), which suggests that these subfamilies could contain numerous multispecific members. Further analysis in other GH and non-GH families is needed to confirm this observation more generally, but these results support the idea that multispecificity could be an inherent property of some groups of enzymes.
FIGURE 4. … 3MMW (10)). Glucose and mannose differ in the configuration of the OH-C2 groups, which are labeled in orange. Hydrogen bonds between glucan (a) and mannan (b) substrates and Cel5A_Tma and between residues in the six-residue motif are shown with black dashed lines, and the hydrogen-acceptor distances are labeled; hydrogen bonds between OH-C2 and Cel5A_Tma are labeled in orange for clarity. The orientations of the substrates were modeled based on the orientation of cellotriose in the Cel5A_Bag crystal structure (45). Further details about the hydrogen bonding geometries are provided in supplemental Table S4.
In conclusion, our comprehensive phylogenetic and biochemical analyses of GH5 and subsequent phylogenetic analysis of GH1 and GH43 suggest that multispecific GH enzymes may be more prevalent than has been experimentally characterized. It will be interesting to investigate whether these multiple specificities are used under certain conditions by the host organism or whether they are a latent property of enzymes evolved from a promiscuous ancestor.