Advantages of a distant cellulase catalytic base

The inverting glycoside hydrolase Trichoderma reesei (Hypocrea jecorina) Cel6A is a promising candidate for protein engineering for more economical production of biofuels. Until recently, its catalytic mechanism had been uncertain: The best candidate residue to serve as a catalytic base, Asp-175, is farther from the glycosidic cleavage site than in other glycoside hydrolase enzymes. Recent unbiased transition path sampling simulations revealed the hydrolytic mechanism for this more distant base, employing a water wire; however, it is not clear why the enzyme employs a more distant catalytic base, a highly conserved feature among homologs across different kingdoms. In this work, we describe molecular dynamics simulations designed to uncover how a base with a longer side chain, as in a D175E mutant, affects procession and active site alignment in the Michaelis complex. We show that the hydrogen bond network is tuned to the shorter aspartate side chain, and that a longer glutamate side chain inhibits procession as well as being less likely to adopt a catalytically productive conformation. Furthermore, we draw comparisons between the active site in Trichoderma reesei Cel6A and another inverting, processive cellulase to deduce the contribution of the water wire to the overall enzyme function, revealing that the more distant catalytic base enhances product release. Our results can inform efforts in the study and design of enzymes by demonstrating how counterintuitive sacrifices in chemical reactivity can have worthwhile benefits for other steps in the catalytic cycle.

To tap into the deep reservoir of renewable energy represented by fuels derived from plant matter, an economical means of converting lignocellulose is required (1). Decomposition of the primary component, cellulose, is catalyzed by glycoside hydrolase (GH) 2 enzymes, which are found ubiquitously in nature (2); therefore, improved catalytic efficiency of cellulose decomposition enzymes would help biomass to compete with nonrenewable carbon sources. This motivates molecular-level studies into GH enzymatic mechanisms, as such understanding has previously proved invaluable in efforts to engineer variants with increased activities (3)(4)(5).
A particularly important GH enzyme is Trichoderma reesei Cel6A (TrCel6A), which plays a key synergistic role in industrial enzyme cocktails for cellulose digestion. This enzyme is a cellobiohydrolase of GH family 6, which cleaves ␤-1,4 glycosidic bonds processively along cellulose chains, from the nonreducing toward the reducing end, to release the glucose dimer cellobiose as the main product (6). This processive mode of action is believed to be key to their efficiency on highly crystalline cellulose (5). Glycoside hydrolase family 6 (GH6) enzymes exhibit a range of activity on a continuum between cellobiohydrolase processive activity and endoglucanase activity, characterized by cleavage of internal bonds (7)(8)(9)(10). Endoglucanases generally exhibit higher catalytic rate constants, only show appreciable activity on less ordered regions of cellulose, and produce a broader range of products (5). A crystal structure of TrCel6A was first solved in 1990 (11), but only recently has its molecular level mechanism begun to be established. GH6 enzymes function via an inverting mechanism wherein the stereochemistry at the carbon of the cleaved ␤-1,4 bond is changed from equatorial to axial. The classical inverting mechanism ( Fig. 1, left) by which such reactions typically take place requires a catalytic acid-base pair (a proton donor and acceptor), the latter of which activates a nucleophilic water molecule during hydrolysis (12). The identity of the catalytic acid in TrCel6A was identified based on the first crystal structure as Asp-221 without controversy (5,11), whereas the identity of the catalytic base was more ambiguous. Initial hypotheses of the catalytic base identity included Asp-263 or Asp-401 (5,11). Mutational studies of Asp-263 soon excluded this residue as the catalytic base (13); later, Asp-401 was also excluded on the basis of activity studies of site mutants of TrCel6A and homologous enzymes (5,9,14,15). Koivula et al. (15) proposed that the two water molecules that consistently appear in the active site of crystal structures would act as a water wire to shuttle a proton to TrCel6A Asp-175, thus allowing this residue to serve as the catalytic base. Recent computational studies provide convincing evidence of this hypothesis (16), and other studies have provided evidence of Grotthuss mechanisms in other GH enzymes (17).
An interesting aspect of the mechanism revealed is that the catalytic step wherein Asp-175 accepts the excess proton requires an additional water molecule compared with the canonical mechanism, positioned between the basic carboxylate group at the end of Asp-175 and the nucleophilic (or "attacking") water, as shown in Fig. 1 (at right). This "wire" or "bridge" water molecule, observed in crystal structures of the enzyme (5,18), momentarily forms a hydronium (H 3 O ϩ ) ion during hydrolysis before offloading its excess proton to Asp-175 (16). The mechanism therefore requires the stabilization of two water molecules in the active site as opposed to only the one participating in the overall chemistry, raising the question as to why the active site of TrCel6A includes the additional water. Previous studies indicate that there is an additional energetic barrier on the order of 5-6 kcal/mol associated with each additional water wire in a Grotthuss mechanism (19). If the additional water molecule is only needed to conduct a proton over the required distance, a mutant base with a longer side chain could obviate the need for the water wire. This suggests investigation of a TrCel6A D175E mutant, because glutamate (E) is identical in structure to aspartate (D) but for an additional CH 2 group in its side chain that could project its carboxylate group farther into the active site. Although no studies appear in the literature on a TrCel6A D175E mutant, mutants of the homologous aspartates have been constructed for the GH6 enzymes Thermobifida fusca Cel6A (TfCel6A, at that time known as Thermomonospora fusca endocellulase E2) (20) and Cellulomonas fimi Cel6A (CfCel6A, known as CenA at the time) (21). Note that these studies were published in 1999 and 1995, respectively, before Koivula et al.'s seminal 2002 paper informing our current understanding of which residues perform the function of the catalytic base in this family of enzymes. Wolfgang and Wilson's 1999 work described the TfCel6A catalytic base as Asp-265, but the homologous residue to TrCel6A Asp-175 is Asp-79 (5,20). Compared with the wildtype enzyme, the D79E mutant showed 0.4, 0.6, and 38% activity on carboxymethyl cellulose, phosphoric acid-swollen cellulose, and Whatman No. 1 filter paper, respectively. Similarly, Damude et al.'s 1995 paper (21) stated that the CfCel6A catalytic base was Asp-392, which was later found inconsistent with new data. Instead, their CfCel6A D216E mutant is homologous to the TrCel6A D175E mutant. The CfCel6A D216E activity was tested on carboxymethyl cellulose and phosphoric acid-swollen cellulose, showing just under 0.1% of wildtype activity on both substrates. Because of the close sequence and structural similarity between all these Cel6A enzymes, it is highly likely that a TrCel6A D175E mutant would show decreased activity, likely in the range of two to three orders of magnitude lower, rather than an increased rate because of obviating the need for the water wire.
To bridge the gap between atomistic enzymatic detail and human chemical intuition, we constructed molecular models of both wildtype and D175E TrCel6A and investigated the influence of the mutation on two nonreactive steps in the catalytic cycle: 1) procession of the substrate into the active site and 2) the transition to the reaction-competent active site conformation following procession. We found in both cases that the longer Glu-175 residue in the mutant was a hindrance to the catalytic cycle, highlighting the importance and intricacy of the remarkable network of hydrogen bonding interactions that stabilizes the active site of the wildtype enzyme. Finally, by means of comparison of the active site to that of a related processive cellulase that functions via the classical inverting mechanism (Thermobifida fusca Cel9A (TfCel9A)), we offer an explanation for the TrCel6A mechanism in terms of reduced association between the product and enzyme. Based on the results of these studies, we propose a benefit in cellulases to activation of the nucleophilic water via a wire water by taking into account aspects of the enzymatic cycle outside the reaction itself, broadening the context for rationalizing enzymatic features in carbohydrate-active enzymes.

D175E mutant: Procession of the substrate into the active site
As a nonreducing-end cellobiohydrolase, each enzymatic cycle in TrCel6A requires the leading ring of the substrate chain advancing from the ϩ1 site to the Ϫ2 site (5), described as moving from the "preslide" to "slide" positions as shown in Fig.  2 (at top). One hypothesis to explain the lower activity of the TrCel6A D175E homologue is that the bulkier side chain could protrude into the active site groove and hinder procession. To study this, we performed umbrella sampling simulations of procession using a collective variable (CV) that tracks the relative positions of the enzyme and substrate (see "Simulation Details" available in the supporting material). The resulting potential of mean force (PMF) plots for both enzyme types are shown in Figure 1. A comparison of the "classical" inverting cellulase hydrolase mechanism proposed by Koshland (12) (left) to that used by TrCel6A (right). The overall reaction chemistry is the same in each case, but TrCel6A requires the presence of the additional "wire water" in the active site because of the increased distance between the attacking water molecule and the catalytic base. Atoms and bonds belonging initially to the nucleophilic water and the wire water are depicted in cyan and red, respectively. Fig. 2 (bottom), with the zero point on the free energy axis set at around 2 Å, where we did not expect the identity of the residue at position 175 to have a large effect. We note that the CV used for sampling did not capture all key features that change during the transition from the preslide to slide positions. Specifically, as discussed in our previous study of TrCel6A wildtype procession, another key order parameter is the puckering of the second-to-leading glycosyl ring as it enters the Ϫ1 binding site, near 9 Å on the horizontal axis of the PMF in Fig. 2 (16). Additionally, we found that the serine loop moves from a more open to less open position during procession, appearing coincident to the procession at ϳ6 Å on the horizontal axis of the PMF. The multidimensional CV required to properly sample all these (and potentially more) key features changing during procession would be computationally prohibitive, and thus the PMF shown for the simplified CV (root mean square deviation (RMSD) only) is most appropriately analyzed in terms of comparative qualitative, not quantitative, differences between the WT and D175E mutant (further discussion of this point is included in the supporting text). Specifically, we note that the PMF for the wildtype enzyme has a pronounced energy well that stabilizes the productive structure just after 10 Å, providing a driving force for spontaneous procession. In contrast, the D175E mutant retains a fairly flat energy profile. This effect can be at least in part attributed to steric clashes with several residues near the catalytic base, as shown in Fig. 3, suggesting that one advantage of the shorter catalytic base is to enhance procession by widening the gap between the wall of the active site tunnel and the substrate.

D175E mutant: Reaction-competent active site conformation
During hydrolysis, the catalytic acid loses a proton whereas the base obtains an excess proton. Before the next reaction, these residues likely exchange a proton when the acid (Asp-221) bends away from the reaction site toward the base (residue 175), as shown for the wildtype enzyme in Fig. 4. After repro-tonating, the acid must rotate along the dihedral angle indicated in the figure to position its proton toward the glycosidic oxygen and realign itself for the next catalytic event (5,15).
The energy landscape in Fig. 4 shows the barriers for the wildtype and D175E mutant for this transition, with the acid and base residues farther apart at the larger dihedral angles. The PMFs are similar in shape, and the difference in barrier height of no greater than 1 kcal/mol is not expected to significantly affect the overall kinetics of the enzyme, as this reconfiguration has a barrier less than half that for hydrolysis of the glycosidic bond (16). However, as shown in Fig. 5, the wildtype and mutant conformations corresponding to the right-hand side energy wells in Fig. 4 have significantly different hydrogen bonding networks. In the wildtype, the hydrogen bond between the acid and base residues is broken, and the acid instead hydrogen-bonds with the glycosidic oxygen, in a reactive conformation for hydrolysis (16). In the mutant, the base often remains

Advantages of a distant cellulase catalytic base
hydrogen-bonded to the acid, rather than with the active-site bridge water, preventing formation of the hydrogen-bonding network that aligns the active site waters for hydrolysis. When unbiased simulations were run with the acid initially at a dihedral angle of 175°, the productive hydrogen bond between the Asp-221 proton and the glycosidic oxygen was never observed over the 1-ns trajectory in the D175E mutant, compared with roughly 5% of the frames in the wildtype.
In a separate simulation, the acid-base hydrogen bond in the conformation shown in Fig. 5B (between Asp-221 and Glu-175) remained stable during 972 picoseconds of simulation, at no point bonding instead with the attacking water. Efforts to obtain such a conformation indicated that it does not occupy a local energy minimum. As shown in Fig. 6, the energy barrier associated with this active conformation was quantified using umbrella sampling with restraints applied to the dihedral angle connecting the ␤and ␥-carbons. This parameterization approximates the motion that the residue undergoes during unbiased simulation initiated near the high-energy region around 40°. As shown, the free energy trough in this region is shallow and readily degenerates to the inactive state at Ϫ50°, and an even lower energy state is available at Ϫ170°(leftmost energy well in Fig. 6). A total free energy activation barrier of roughly 7.3 kcal/mol separates the active state from the low-energy state at Ϫ170°, posing significant hindrance to hydrolysis.
The simulations used to construct the PMF in Fig. 6 began from a structure with Asp-221 initially in a high-dihedral angle conformation from Fig. 4, which in the wildtype aligns the Asp-221 carboxylic proton with the glycosidic oxygen. However, in the mutant these atoms were not consistently aligned, indicating that 7.3 kcal/mol is an underestimation of the barrier to reactive alignment. Additionally, even when the attacking water is aligned by Glu-175, its lone pair is less favorably oriented for nucleophilic attack as compared with the wildtype. Considering these factors and assuming that the lack of a water wire reduces the hydrolysis barrier by between 4.8 and 5.8 kcal/ mol (19), the net effect of the mutation would be to increase the barrier by a conservative (low) estimate of between 1.5 and 2.5 kcal/mol. We find that although Glu-175 may adopt a conformation where it could accept a proton directly from the nucleophilic water (and thus may act as catalytic base in a classical inverting mechanism), activity in this mutant is lower than in the wildtype because (a) the "active" conformation of Glu-175 is disfavored, (b) the lone pair of the attacking water is less favorably oriented for nucleophilic attack at the anomeric carbon, and (c) the catalytic acid Asp-221 is less able to dissociate from Glu-175 (as compared with Asp-175) to enable protonation of the glycosidic bond.
The finding that a glutamate at position 175 disrupts the hydrogen bonding network that catalyzes hydrolysis indicates that active site of TrCel6A is tuned for the shorter, highly conserved aspartate (5). Several other GH6 enzymes have been crystallized, including TfCel6B (PDB ID: 4B4F (22)), Chaetomium thermophilum Cel6A (PDB ID: 4A05 (23)), Humicola insolens Cel6A (PDB ID: 1BVW (24)), and TfCel6A (PDB ID: 2BOD (25)). The distances between the glycosidic oxygens and catalytic bases, as well as the presence of two active site waters in structures apparently primed for hydrolysis, indicate that members of this family generally perform hydrolysis via a Grotthuss mechanism to a more distant catalytic base. Our simulations clearly indicate why a longer catalytic base decreases activity; however, the question remains as to why the active site is not tuned for a longer side chain, or why the aspartate is not positioned closer to the cleavage site.

Active site homology
To better understand why the TrCel6A active site is tuned to require a water wire, we compared its active site to that of another inverting glycoside hydrolase, TfCel9A, a processive endocellulase that is believed to employ the classical mechanism based on crystallography and site-directed mutagenesis studies (26,27). We created a TfCel9A model based on a product-state crystal structure (PDB ID: 4TF4 (28)) to compare with a product-state structure produced for TrCel6A in the course of our previous work (16), as shown in Fig. 7. Panels A and B highlight differences in the hydrogen-bonding networks in the

Advantages of a distant cellulase catalytic base
product states. In both cases, hydrogen-bonding interactions that helped stabilize the nucleophilic water in the appropriate position for catalysis pre-reaction become hydrogen bonds to the product monomer in the Ϫ1 position post-reaction. Specifically, the product hydrogen bonds with Ser-181 and the bridge water (in turn hydrogen-bound to Asp-175) in TrCel6A, and to Asp-55 as well as directly to Asp-58 in TfCel9A. TrCel6A Asp-175 and TfCel9A Asp-58 are the base residues whereas TrCel6A Ser-181 and TfCel9A Asp-55 serve to stabilize the nucleophilic water in the reactant state.
Viewed from the reducing end of the substrate, an interesting geometric distinction between the two active sites becomes apparent as depicted in Fig. 7, C and D. Although TfCel9A employs a glutamate residue instead of an aspartate as its catalytic acid, the acid-base pairs in these enzymes are in similar relative positions to one another (with 5.3 Å of separation between the carbon atoms in their carboxylate head groups in the relaxed product state), and the relative positions of the acid and base to the axis of the substrate in each enzyme are the same (making about a 90°angle in each case). However, the whole of the catalytic machinery of TrCel6A is rotated roughly 45°about the substrate. Because of the oblong cross-sectional shape of the substrate, the TrCel6A base is farther from the Ϫ1 anomeric carbon than in TfCel9A (6.0 versus 4.1 Å in the product state, respectively), requiring the addition of the wire water molecule to connect the base with the attacking water. Although this discussion focuses on the product state, for which a crystal structure of TfCel9A is available, the same conclusions should hold in the transition state structures based on the positions of the acid residue ␣-carbons.
One advantageous function of the water wire could be to destabilize the newly formed product in the Ϫ1 position after hydrolysis. We quantified this effect by measuring the contribution to the overall free energy of binding from the base residue in each enzyme. To focus on the effect of the mediating water wire alone, not obscured by other differences in the enzymes, we calculated the total change in binding free energy associated with mutating the bases to alanine, as detailed in the supporting text. The base-to-alanine mutations remove all of the polar interactions with the catalytic bases, so the difference between the total binding energy in the mutant and wildtype can be interpreted as the contribution only from the bases. As expected, the TfCel9A base (which binds directly to the substrate) has a larger contribution to the binding energy (ϳ7.8 kcal/mol) compared with that of the base in TrCel6A (ϳ2.1 kcal/mol). This result indicates that the GH6 enzyme greatly enhances product release by using the wire water as a buffer between the base and product.
Interestingly, the hydrogen-bonding network in the TrCel6A active site stabilizes the 2 S O pucker of the Ϫ1 ring for the product conformation. In TfCel9A this ring is relaxed to the lowenergy 4 C 1 chair, and its C1 hydroxyl group hydrogen bonds directly with the catalytic base Asp-58. This conformation is further stabilized by hydrogen-bonding between the C1 ␣hydroxyl and the nucleophilic water-stabilizing Asp-55 in TfCel9A, whereas the homologous residue in TrCel6A, Ser-181, stabilizes a puckered product because the latter pulls down the Ϫ1 ring oxygen. This pucker is known to promote reactivity in TrCel6A and occurs spontaneously as the second-to-leading glycosyl ring enters the Ϫ1 binding site (16). Quantum mechanics calculations of monosaccharides have shown that the energetic cost for puckering an ␣-glucose (as in the product) in the 2 S O orientation is ϳ4 kcal/mol greater than puckering a ␤-glucose (as in the reactant) in the 2 S O orientation (29). Holding the product in a puckered state may further promote product release by stabilizing an unfavorable conformation while bound.
In our aforementioned simulation of the TrCel6A base mutated to alanine, the Ϫ1 sugar became free to transition between skew puckers 2 S O and 1 S 3 . Because the product is in the ␣-glucose configuration, this transition stabilizes the bound product by 3 kcal/mol (29). This result points to an additional role of the base (indirectly, through the water wire) in stabilizing the unfavorable 2 S O pucker in the product state and promoting product release.

Discussion
The catalytic mechanism of TrCel6A involves a counterintuitive wire water molecule that is not strictly necessary to the overall chemistry of the enzyme's reaction. Motivated by a desire to rationalize this exception to the classical mechanism for an inverting glycoside hydrolase, we disrupted the active site with a D175E mutation that we hypothesized would supplant the wire water. We found that the longer Glu-175 residue in the Advantages of a distant cellulase catalytic base mutant was a hindrance to the catalytic cycle both during procession and alignment for hydrolysis, highlighting the importance and intricacy of the remarkable network of hydrogen bonding interactions that stabilizes the active site of the wildtype enzyme. During procession, only the wildtype base promoted hydrogen bonding that kept Arg-174 and Asn-182 out of the tunnel, contributing to an energetically favorable, spontaneous forward motion for the wildtype enzyme, which was not manifest in the mutant. In aligning for hydrolysis, hydrogen bonding in the mutant active site favors positioning the mutant carboxylate group even farther from the nucleophilic water than in the wildtype, which, compared with it, both makes accepting a proton more difficult and prevents it from helping align the nucleophilic water for attack. Calculations of low-energy conformations of the system in the hydrolysis-ready position with the Ϫ2 and Ϫ1 positions occupied by the cellulose chain indicate that the overall reaction barrier in the D175E mutant would be increased by at least between 1.5 and 2.5 kcal/ mol. Although this is a qualitative result, it corresponds with an activity reduction of approximately two orders of magnitude, and is therefore approximately consistent with measured activity reductions for the homologous mutants TfCel6A D79E (20) and CfCel6A D216E (21). This agreement allows us to address our primary question of why a longer side chain for the nucleophilic residue does not produce a more active enzyme. The effect on alignment for both proton transfer and procession may further explain the range of effects of these homologous mutants on substrates with different degrees of crystallinity.
To further understand the role of the more distant base in TrCel6A, we compared its active site to that of an inverting cellulase, TfCel9A, which does not require a water wire to shuttle a proton to the base during catalysis. Although the active site residues align similarly in the two enzymes, the substrate is rotated ϳ45°about its axis, creating the additional distance occupied by the bridge water in TrCel6A. We propose that the water wire buffers the post-reaction hydrogen bonding between the protonated base and the cellobiose product, easing product release. Consistent with this theory, we calculated an ϳ6 kcal/mol reduction in product binding energy because of differences in catalytic base alignment strategy. We thus suggest that the more distant base benefits the overall TrCel6A mechanism in terms of reduced association between the product and enzyme.
Although not addressed in this study, there are other potential mutations of TrCel6A that could obviate the need for a water wire during hydrolysis. Specifically, the wildtype positions 178 and 180 are occupied by alanines, but mutations to aspartate or glutamate (presumably in conjunction with a mutation of Asp-175 to a non-protonatable residue) might potentially place the side chain of the base close enough to allow direct proton transfer from the nucleophilic water. The alanines at these positions are quite highly conserved in this family of enzymes, although a glutamate is observed at the position analogous to TrCel6A Ala-180 in the H. insolens Cel6B enzyme (5). The large number of possible alternate sequences for these enzymes provides many opportunities to learn from how nature has optimized protein design. It is clear from this study that any such mutants would need to be carefully analyzed for how they could disrupt the highly tuned hydrogen-bonding network in the active site and their effects on multiple parts of the catalytic cycle.
Our results advance cellobiohydrolase enzyme engineering efforts by broadening the focus of the role of active site residues beyond the hydrolysis step. Family GH6 enzymes (and the conservation of their mechanism across different branches of life) expand our understanding of how enzymes can make small sacrifices in reactivity to enhance other aspects of the larger catalytic cycle.

Molecular dynamics simulations
The approach of our investigation using molecular dynamics (MD) simulations is briefly described here, with further detail in the supporting material. All models were based on crystal structures deposited in the Protein Data Bank (PDB) (30). The initial structure for the procession study was constructed by combining features from two crystal structures. First, the crystal structure PDB ID: 1QK2 (18) (wildtype TrCel6A with a nonhydrolyzable cellotetraose) was stripped of its non-protein components. Then, the cellohexaose substrate from PDB ID: 4AVO (31) was manually aligned with its leading nonreducingend ring in the ϩ1 position as in our previous work (16). For the active site conformation studies, we started with the PDB ID: 1QJW (18) crystal structure (TrCel6A Y169F mutant complexed with cellotetraose) reverted to the wildtype, and again replaced its substrate with that of PDB ID: 4AVO (31), this time in the active position with the leading ring in the Ϫ2 site. The active site water molecules appear in PDB ID: 1QJW and were retained for both the wildtype and D175E mutant models. The crystal structures in PDB format were converted into topology and coordinate files using the CHARMM (32) package and the CHARMM36 force field (33)(34)(35)(36). The models were solvated in a periodic box using the TIP3 water model (37) and converted into Amber format using the CHAMBER program in ParmEd (38).
All simulations were performed using the Amber14 package (39). Unless otherwise stated, the SHAKE algorithm (40,41) was used to constrain the lengths of all hydrogen bonds and the cutoff distance for nonbonded interactions was set to at least 8.0 Å. First, structures were minimized without SHAKE over 2500 steps (1250 steepest descent method followed by 1250 conjugate gradient method). The systems were then heated in a periodic box in the NVT ensemble, using the Andersen thermostat (42) to take the temperature from 100 K to 300 K over 10,000 2-femtosecond time steps followed by 1000 steps of constant-temperature simulation. Velocities were randomized every 1000 steps. Molecular dynamics production simulations were performed under the same conditions as the heating simulations at a constant temperature of 300 K.
Simulations of TrCel6A included positional restraints on the ␣-carbons of residues Ser-106, Ala-150, Asp-200, Asn-247, Ala-280, Ile-330, and Gln-437, as in our previous work (16). These atoms have been shown to have a low root mean square fluctuation (43), and restraining them prevents bulk motion of the protein.