Loop Motions Important to Product Expulsion in the Thermobifida fusca Glycoside Hydrolase Family 6 Cellobiohydrolase from Structural and Computational Studies*

Background: Family 6 glycoside hydrolases represent an important, diverse enzyme class in cellulolytic organisms.
Results: We solved structures of two Thermobifida fusca Cel6B (TfuCel6B) cellobiohydrolase mutants and examined ligand dynamics and product release with simulation.
Conclusion: These results suggest mechanisms for product release in TfuCel6B.
Significance: This study further elucidates the mechanism of a unique cellobiohydrolase with an extended, enclosed active site tunnel.

Cellobiohydrolases (CBHs) are typically major components of natural enzyme cocktails for biomass degradation. Their active sites are enclosed in a tunnel, enabling processive hydrolysis of cellulose chains. Glycoside hydrolase Family 6 (GH6) CBHs act from nonreducing ends by an inverting mechanism and are present in many cellulolytic fungi and bacteria. The bacterial Thermobifida fusca Cel6B (TfuCel6B) exhibits a longer and more enclosed active site tunnel than its fungal counterparts. Here, we determine the structures of two TfuCel6B mutants co-crystallized with cellobiose: D274A, in which the catalytic acid is mutated, and the double mutant D226A/S232A, which targets the putative catalytic base and a conserved serine that binds the nucleophilic water. Ligand binding and the structure of the active site are retained when compared with the wild type structure, supporting the hypothesis that these residues are directly involved in catalysis. One structure exhibits crystallographic waters that enable construction of a model of the α-anomer product after hydrolysis. Interestingly, the product sites of TfuCel6B are completely enclosed by an “exit loop” not present in fungal GH6 CBHs and by an extended “bottom loop”. From the structures, we hypothesize that either of the loops enclosing the product subsites in the TfuCel6B active site tunnel must open substantially for product release. With simulation, we demonstrate that both loops can readily open to allow product release with equal probability in solution or when the enzyme is engaged on cellulose. Overall, this study reveals new structural details of GH6 CBHs likely important for functional differences among enzymes from this important family.

Plants comprise the majority of terrestrial, organic carbon on earth, and organisms across all kingdoms of life have developed myriad strategies to depolymerize plant cell wall polymers for food (1-3). Enzyme cocktails from fungi and bacteria, the primary degraders of plant biomass in nature, have been widely studied for biomass degradation potential, often with the aim to harness these enzymes in support of renewable fuels portfolios (2,4,5). The primary organism studied for enzymatic breakdown of biomass to date is the ascomycete fungus, Hypocrea jecorina (Trichoderma reesei) (6,7). Driven mainly by studies of H. jecorina and similar organisms, a model of cellulose deconstruction by cellobiohydrolases (CBHs), endoglucanases, and β-glucosidases emerged (8). Namely, endoglucanases act on amorphous regions of cellulose to create points for attachment and detachment of processive CBHs (9), which hydrolyze successive units of the cellulose chain via a multistep mechanism involving processive motion, hydrolysis, and product expulsion (8,10-13). β-Glucosidases subsequently hydrolyze soluble cellobiose to glucose to reduce CBH product inhibition (14,15).
The aerobic filamentous bacterium Thermobifida fusca has received significant attention as a biomass degrader in heated organic materials (16). T. fusca secretes at least eight polysaccharide-degrading enzymes (17), including two glycoside hydrolase Family 6 enzymes (GH6): the endoglucanase TfuCel6A and the CBH TfuCel6B. GH6 enzymes are important components of many fungal and bacterial enzyme cocktails for biomass degradation (16, 18 -21). Additionally, there is substantial structural and sequence diversity in GH6 enzymes, further suggesting a broad range of activity, function, and stability. Fig. 1 shows a surface representation of known structures of GH6 CBHs (22)(23)(24)(25) and endoglucanases (26,27). The T. fusca GH6 enzymes represent the extremes of known structural diversity in terms of active site enclosure. TfuCel6A has the shortest and most open cleft, whereas TfuCel6B surprisingly exhibits the longest and most enclosed tunnel.
In addition to substantial structural diversity of GH6 enzymes, the catalytic mechanism remains a question of interest as the identity of a well defined catalytic base remains elusive (28). The GH6 catalytic acid has been identified as a conserved aspartic acid (Asp-221 in H. jecorina Cel6A (HjeCel6A) and Asp-274 in TfuCel6B) (29,30). Several potential titratable residues nearby have been proposed to be the catalytic base (29 -32), but none of these residues have been found in crystal structures within the necessary distance to act directly as the base. Instead, the predominant mechanism for GH6 hydrolysis proposed to date from several structural studies involves a water wire or Grotthuss mechanism. Several structures reveal a nucleophilic water molecule near the anomeric carbon stabilized by a serine residue on the "active center" loop (residues 175-183 in HjeCel6A and 226 -234 in TfuCel6B) (23,30). This water molecule is hydrogen-bonded to a second water molecule, which in turn hydrogen-bonds to a conserved aspartate (Asp-175 in HjeCel6A and Asp-226 in TfuCel6B). The mechanism for GH6-inverting hydrolysis has thus been proposed to proceed via a water wire mechanism wherein the nucleophilic water attacks the anomeric carbon and transfers a proton to a second water molecule, which then transfers a proton to the putative catalytic base (30). The data to support this mechanism primarily come from structural studies, but enzymes wherein the hypothesized catalytic base has been mutated to alanine retain partial activity (29), warranting further study.
In this study, we extend our previous studies on TfuCel6B (23,29). TfuCel6B is an interesting GH6 model system because of its extended and more enclosed tunnel, and also because Wilson and co-workers (33,34) have conducted considerable experimental work on its catalytic mechanism, substrate specificity, and synergy with other T. fusca enzymes, thus providing significant mutational data for mechanistic interpretations. TfuCel6B is a bimodular enzyme with an N terminus Family 2 carbohydrate-binding module attached to the catalytic domain via a Pro/Ser-rich linker. In a previous study, Asp-274 was shown to be the catalytic acid (29). As with HjeCel6A, no direct catalytic base was identified in TfuCel6B. Mutation of the conserved Asp-226 to alanine resulted in lower activity on insoluble substrates, yet the D226A mutant enzyme exhibits slightly higher carboxymethyl cellulose (CMC) activity than the wild type enzyme. Furthermore, the double mutant D226A/S232A was completely inactive but showed rescued activity on CMC by a low concentration of sodium azide (29).
Recently, we published ligand-free and ligand-bound structures of the wild type TfuCel6B catalytic domain, which are the first reported GH6 CBH structures from a bacterium (23), shown in Fig. 2. The TfuCel6B catalytic domain has ~100 more residues than its fungal counterparts, and the majority of the extra residues are located in insertions at six discrete sections of the sequence (referred to as sections I to VI, and labeled in Fig. 2). In one of the structures obtained by co-crystallization with cellobiose, we could model an intact cellohexaose molecule in the active site tunnel in chain A from the +4 to the −2 glucopyranose-binding sites (with a cellotetraose spanning +2 to −2 in chain B). The substrate-binding path is significantly longer and more enclosed than in fungal GH6 CBHs due to these insertions, one of which is an extra loop at the exit of the tunnel. In the ligand-free enzyme structure, this "exit loop" (section I) is disordered (residues 185-197), but in the ligand-bound enzyme structure, it completely encloses the product subsites, suggesting that the exit loop likely opens for product release.
In this study, we employ x-ray crystallography to examine two catalytically deficient mutants (D274A and D226A/S232A) of TfuCel6B (29-32). Based on the new structures presented here and the previously published wild type structures (23), we conduct various molecular dynamics (MD) simulations to examine questions of enzyme and ligand dynamics relative to HjeCel6A, cellobiose product release after hydrolysis, and the complexation of the TfuCel6B catalytic domain on the surface of a cellulose Iβ crystal.
FIGURE 1 (caption fragment). ... (22,24,25,27), and enzymes labeled with green are from bacteria (T. fusca) (23,26). The first four listed GH6 cellulases are cellobiohydrolases, and the last two are endoglucanases.

EXPERIMENTAL PROCEDURES
Protein Expression and Purification-We expressed TfuCel6B variants D226A/S232A and D274A in Escherichia coli and prepared them as described previously (29). To remove the carbohydrate-binding module, the full-length enzymes were subjected to papain digestion, and the resulting catalytic domains were purified using ion exchange chromatography on a Source 30Q column as described previously (23) followed by size-exclusion chromatography on a Superdex 200 column.
Crystallization and Data Collection-We co-crystallized the catalytic domains of TfuCel6B variants (6 mg/ml) with 10 mM cellobiose using the hanging drop method as described previously (23). The gradient seeding method was necessary both for obtaining crystals and for improving their quality. Both 20% ethylene glycol and 10% glycerol were added to the mother liquor as cryoprotectants. The crystals were flash-frozen in liquid nitrogen prior to the collection of x-ray diffraction data.
Two diffraction data sets were collected, one for each of the TfuCel6B mutants (D226A/S232A and D274A), from single crystals on beamlines ID14-4 and ID23-1, respectively, at the European Synchrotron Radiation Facility (ESRF; Grenoble, France). The data sets were reduced and scaled using iMosflm (35) and Scala in the CCP4 program suite (36).
Model Building and Refinement-The structures were solved by molecular replacement using Molrep (auto-MR function) in the CCP4 program suite (37,38) with the structure of wild type TfuCel6B as a search model. For the model refinement, REFMAC5 (39) was used, and the models were manually adjusted using Coot (40) by inspection of 2mFo − DFc and mFo − DFc σA-weighted maps. Solvent molecules were automatically added using the ARP/wARP water-adding function within REFMAC5 (41) and manually checked by inspection of electron density in the 2mFo − DFc map during refinement. The ligands were introduced by fitting cellohexaose, cellotetraose, or glucose to the electron density in different structure models at a late stage, when the refinement of the protein part was nearly finished.
Structural Preparation-The MD simulations were conducted with wild type TfuCel6B. Three ligand-bound structures and the ligand-free wild type structure were studied. For the cellodextrin-bound enzymes, we consider two different binding scenarios: (i) a cellobiose bound at the −2 and −1 subsites together with a cellotetraose bound from the +1 to +4 subsites, as shown in the crystal structure, and (ii) a single cellohexaose bound in the tunnel from the −2 to +4 subsites. Additionally, a previously equilibrated cellulose Iβ microfibril (42,43) was used to construct a TfuCel6B-cellulose complex (Fig. 3). In our previous studies, we have examined catalytic engagement of H. jecorina Cel7A and HjeCel6A on the crystalline cellulose Iβ microfibril by manually docking the enzymes on the cellulose [100] surface (44). In this work, we superimposed TfuCel6B on the equilibrated structure of the HjeCel6A-cellulose complex to generate the initial structure of the TfuCel6B-cellulose complex. The microfibril was composed of four layers, which consist of three, four, five, and six cellulose chains from the top layer to the bottom layer, respectively. Each cellulose chain has 16 glucose residues. The enzyme was placed on the hydrophobic surface at the nonreducing end. The cellodextrin chain from the crystal was bonded to the ligand at the +4 position manually, and the threaded chain was an "edge" chain; edge chains are easier to decrystallize than "middle" chains (13). The positions of the cellobiose and the cellotetraose bound in the tunnel were the same as in the TfuCel6B-cellodextrin system.
MD Simulations-All the systems were solvated in an orthogonal box of TIP3P water molecules (45,46) using CHARMM (47). The CHARMM22 force field parameters (47,48) with the CMAP correction (49-51) were used for the enzyme, and the C35 carbohydrate force field parameters (52,53) were used for the substrates. Ions were added to the systems to maintain charge neutrality. The final system size was ~56,000 atoms for the TfuCel6B-cellodextrin complex and 81,000 atoms for the TfuCel6B-cellulose complex. In the TfuCel6B-cellulose complex, all atoms except hydrogen atoms of the six cellulose chains at the bottom of the microfibril were harmonically restrained with a weak force constant of 2 kcal/(mol × Å²). The four systems were equilibrated in the NPT ensemble at 300 K with a Nosé-Hoover thermostat (54,55) and 1 atm for 100 ps with a step size of 1 fs. SHAKE was used to fix the covalent bonds to hydrogen atoms in all MD simulations (56). The nonbonded interactions were zeroed at 10 Å with a switching function. The electrostatic interactions were calculated using the particle mesh Ewald method (57) with a sixth-order B-spline interpolation, a Gaussian distribution width of 0.34 Å, and a mesh size of 96 × 96 × 96 for the ligand-free enzyme structure and the TfuCel6B-cellodextrin complexes and a mesh size of 120 × 90 × 90 for the TfuCel6B-cellulose complex. Following the equilibration, the production MD simulations were conducted in the NVT ensemble for 260 ns at 300 K with a step size of 2 fs using NAMD (58).
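As a quick sanity check on the run lengths quoted above, the following sketch converts them into integration step counts. It only restates the arithmetic implied by the text; the helper name `n_steps` is hypothetical and is not part of CHARMM or NAMD.

```python
# Convert the simulation lengths quoted in the text into integration steps.
FS_PER_PS = 1_000        # femtoseconds per picosecond
FS_PER_NS = 1_000_000    # femtoseconds per nanosecond


def n_steps(length_fs: int, step_fs: int) -> int:
    """Number of integration steps for a run of length_fs with step_fs per step."""
    if length_fs % step_fs:
        raise ValueError("run length is not a whole number of steps")
    return length_fs // step_fs


equilibration_steps = n_steps(100 * FS_PER_PS, 1)  # 100 ps at 1 fs per step
production_steps = n_steps(260 * FS_PER_NS, 2)     # 260 ns at 2 fs per step

print(f"equilibration: {equilibration_steps:,} steps")
print(f"production:    {production_steps:,} steps")
```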
Steered Molecular Dynamics (SMD) Simulations-The protocols for conducting SMD simulations on the TfuCel6B-cellodextrin complex and the TfuCel6B-cellulose complex follow our previous work on product inhibition (15,59). CHAMBER (61) was used to convert the protein structure file, coordinate file, and force field files in CHARMM format to Amber format. Each system was simulated in the NVT ensemble for 20 ns at 300 K with a step size of 2 fs using the PMEMD engine in Amber (62). Subsequently, a set of 100 starting structures was selected from the 20-ns simulation of the complex system as the initial structures of 100 SMD simulations, which were conducted using the Jarzynski implementation in Amber (62). During the SMD simulations, an external force was applied to the cellobiose ligand, and the distance between the two dummy atoms shown in Fig. 4, d (also referred to as the reaction coordinate), was increased at a speed of 1 Å/ns over 14 ns. We examined different pulling speeds, and 1 Å/ns was found to yield converged results similar to our previous work (15,59). The force constant is 5000 kcal/(mol × Å²). The work needed to drive the system from the cellobiose-bound state to the unbound state is calculated as a function of the distance d for each SMD simulation. The Jarzynski equality, ΔG = −kT ln⟨exp(−W/kT)⟩, was used to calculate the binding free energy ΔG (63-65), where W is the work associated with the distance d in each SMD simulation and the angle brackets denote an average over the SMD simulations. The exponential average of the 100 work profiles results in the determination of the free energy (i.e. the potential of mean force along the reaction coordinate, d).
The SMD measurement uncertainty was estimated with bootstrapping (66).
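The exponential work average and its bootstrap uncertainty can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Amber implementation used in this work: the function names, the example work values, and the number of bootstrap resamples are all hypothetical, and the estimator is shown for a single value of d (a full potential of mean force would repeat it at each d).

```python
import math
import random

KB = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def jarzynski_free_energy(works, temperature=300.0):
    """Return -kT ln<exp(-W/kT)> for work values given in kcal/mol."""
    kt = KB * temperature
    # Shift by the minimum work so the exponentials cannot underflow.
    w_min = min(works)
    mean_exp = sum(math.exp(-(w - w_min) / kt) for w in works) / len(works)
    return w_min - kt * math.log(mean_exp)


def bootstrap_error(works, n_resamples=1000, temperature=300.0, seed=0):
    """Standard deviation of the estimate over resampled sets of work values."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        # Resample the trajectories with replacement, keeping the set size.
        sample = [rng.choice(works) for _ in works]
        estimates.append(jarzynski_free_energy(sample, temperature))
    mean = sum(estimates) / n_resamples
    variance = sum((e - mean) ** 2 for e in estimates) / (n_resamples - 1)
    return math.sqrt(variance)


# Hypothetical work values (kcal/mol), for illustration only.
example_works = [20.0, 22.0, 25.0, 30.0]
print(jarzynski_free_energy(example_works), bootstrap_error(example_works))
```

Because the average is exponential, the estimate is dominated by the lowest-work trajectories and always lies between the minimum and the arithmetic mean of the work values.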
As shown in Fig. 4, two dummy atoms were chosen to represent the positions of the exit loop and the bottom loop (section V) (23). The distance between the red/purple dummy atom, representing the position of the exit/bottom loop, and the yellow dummy atom, representing the protein position, was measured from the 200-ns unbiased MD simulation of the TfuCel6B-cellodextrin complex. During the 100 SMD simulations, we measured the increase in this distance as a function of time, Δd(t) = d(t) − d(0), where d(t) is the distance between the two dummy atoms at time t and d(0) is the initial distance before the SMD simulation starts. The largest value of Δd(t) was chosen for each SMD trajectory to characterize the extent of loop opening.
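The loop-opening measure described above reduces to a maximum over the distance time series. A minimal sketch, assuming the dummy-atom distances have already been extracted as a list of values in Å with the first entry corresponding to d(0); the function name and the example values are hypothetical.

```python
def max_loop_opening(distances):
    """Largest increase d(t) - d(0) along one trajectory (distances in Å)."""
    if not distances:
        raise ValueError("need at least one distance sample")
    d0 = distances[0]
    # The t = 0 sample is included, so the result is never negative.
    return max(d - d0 for d in distances)


# Hypothetical distance series for one SMD trajectory (Å).
print(max_loop_opening([12.0, 13.5, 20.0, 18.0]))
```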

RESULTS
Overall TfuCel6B Structural Results-The catalytic modules of two catalytically deficient mutants of TfuCel6B, D274A and D226A/S232A (the latter is dubbed the "double mutant"), were crystallized in the presence of cellobiose in the space group P2₁. Both the crystal packing (one molecule/asymmetric unit) and the unit cell dimensions were different from those of the wild type TfuCel6B structure (two molecules/asymmetric unit) (23). The Cel6B D274A and D226A/S232A structures (Protein Data Bank (PDB) codes 4AVO and 4AVN, respectively) were solved and refined to 2.0 and 1.8 Å resolution and final Rwork/Rfree of 15.5/19.1% and 15.4/19.8%, respectively. The statistics of the x-ray diffraction data and structure refinement are summarized in Table 1. Overall, the two structures are similar to one another and to the wild type structure. The exit loop (residues 185-197) and the active center loop (226-234), which were open in the wild type ligand-free structure, are closed in both mutant structures, as in the ligand-bound structure of the wild type enzyme. Salient differences between the published structures (23) and the present structures are discussed below.
Both of the TfuCel6B variant structures have carbohydrate ligands bound in the active site tunnel. The glycan-binding subsites are numbered from the nonreducing end to the reducing end of the glycan chain, −2, −1, +1, +2, +3, +4, with the glycosidic bond to be cleaved between sites −1 and +1. The product subsites are at −1 and −2 because they harbor the cellobiose moiety that is cleaved off from the nonreducing end of a cellulose chain during processive action. In both of the structures, there is distinct electron density for one glucose unit in subsite −2 and contiguous density for four glucose units from the +1 to +4 subsites. The electron densities of the glycans were successively weaker from the nonreducing to the reducing end, which correlates with increasing temperature factors of the glucopyranosyl atoms. The D274A structure also displays partially occupied density for a glucose residue at site −1 (Fig. 5, top), but the D226A/S232A structure does not (Fig. 5, bottom). Consecutive glucose residues could be connected with β-1,4-glycosidic bonds in all cases without violation of bond distance and angle restraints. Therefore, a cellohexaose molecule was modeled into TfuCel6B D274A, spanning subsites −2 to +4, whereas the D226A/S232A structure contains a glucose residue in subsite −2 and cellotetraose in subsites +1 to +4 (Fig. 5).
The catalytic acid mutant D274A structure in complex with glycans is similar to the wild type enzyme. The −1 glucopyranosyl residue refines in the same ²Sₒ skew boat conformation (Fig. 5, top). The density for a glucopyranosyl residue at this position is weak, which makes the ²Sₒ conformation less certain. However, there is no support for any other sugar conformation, and the ²Sₒ skew boat conformation was previously observed in wild type TfuCel6B and other GH6 enzymes (23,32,67,68). Two water molecules are retained between the anomeric carbon and the proposed catalytic base, Asp-226, which adopts the "active" conformation.
The substrate binds similarly in the D226A/S232A complex. The catalytic acid, Asp-274, adopts the "inactive" conformation, not pointing toward 4OH of the +1 glucose unit, i.e. the position for the glycosidic oxygen to be protonated. It is noteworthy that the inactive conformation is adopted despite the lack of a hydrogen bond to the proposed catalytic base, as observed in GH6 structures where the latter residue has not been mutated (38). We observe electron density for a water molecule bound to the backbone carbonyl of Asp-497, corresponding to the position of the nucleophilic water, but not for the second water molecule in the proposed mechanism, suggesting that the D226A and S232A mutations have disrupted the putative Grotthuss proton transfer chain for nucleophilic attack on the anomeric carbon. Closer examination of the electron density in subsite −1 of the D226A/S232A glycan complex reveals that water molecules bind at the expected sites for hydroxyl groups of a reducing end glucose residue. Using these water molecules, an α-cellobiose molecule placed in sites −2 and −1 could be refined to provide a model of how the product binds after hydrolysis of the glycosidic bond. After refinement, the α-glucose residue at site −1 adopts a ²,⁵B boat conformation (Fig. 6, green), consistent with the proposed conformational itinerary for the inverting mechanism of GH6 enzymes, from a ²Sₒ skew boat via an oxocarbenium ion-like transition state to a ²,⁵B boat conformation (30,69,70). This result indicates that the −1 ring pucker should undergo a conformational change from the ²Sₒ to the ²,⁵B conformation through the transition state. Comparison of the glycan ring distortion at site −1 suggests that the glucose ring is held in place via the 4-O-glycosidic bond, the C2 and C3 hydroxyl groups, and through the steric restraints imposed on the C5-C6-O6 arm by Tyr-220. The anomeric carbon migrates ~1.2 Å from the β-bond with the glycosidic oxygen, over the transition state, to an α-linkage with the nucleophilic water molecule.
Computational Investigation of Enzyme and Ligand Dynamics-From the structural data, two loops in particular are noteworthy for their putative action during product expulsion and the differences that these loops may impart specifically to TfuCel6B ligand and enzyme dynamics. Namely, the exit loop (residues 185-197) is disordered and presumably flexible in the ligand-free structure, but in all of the holo structures, it closes over the product subsites. Moreover, the bottom loop (residues 501-510) forms, together with the exit loop, the other part of the enzyme that closes over the cellobiose product. Based on the structural results for the exit loop and on the proximity of the bottom loop to the product subsites, we hypothesize that either of these loops must open during product expulsion.
To examine this hypothesis, we first conducted long unbiased MD simulations of TfuCel6B to examine the flexibility of the ligand, the enzyme, and the loops surrounding the tunnel exit and active site. Three MD simulations were conducted in solution: a ligand-free MD simulation, an MD simulation with a cellotetraose spanning +4 to +1 with a cellobiose in −1 and −2 to mimic the post-hydrolysis system, and an MD simulation with a cellohexaose spanning the active site tunnel from +4 to −2 to model the catalytically active state. Moreover, catalytic engagement of TfuCel6B on a solid cellulose crystal may influence the conformation and flexibility of the tunnel-forming loops. Thus, we examined whether substrate complexation on the cellulose surface can block either of the loops from opening by running an MD simulation (Fig. 3) to model the post-hydrolysis system with a cellobiose in the −1 and −2 subsites and a cellotetraose spanning the +4 to +1 subsites, which is further bonded to the cellulose chain being decrystallized from the crystal surface.
The root mean square fluctuation (RMSF) values of each residue in the ligand-free enzyme and the two TfuCel6B-cellodextrin complexes are shown in Fig. 7A. The exit loop exhibits an RMSF value ~2 Å larger than that of the bottom loop for the simulations of TfuCel6B in solution. However, when the enzyme is catalytically engaged on the cellulose surface, the RMSF value of the bottom loop increases significantly, by ~3 Å. The bottom loop does not interact with the cellulose surface, but the subsequent portion of the section VI loop (residues 520-530) binds to the cellulose surface, which in turn dramatically modifies the dynamics of the bottom loop. As shown in the residue cross-correlation maps (supplemental Fig. S1), the fluctuations of the bottom loop are not correlated with the motions of other regions in TfuCel6B after catalytic engagement, during which the interaction energy between the bottom loop and other regions in TfuCel6B decreased by ~30 kcal/mol (supplemental Table S1), suggesting the loss of a significant number of protein-protein contacts. As a result, the bottom loop becomes more flexible after TfuCel6B is engaged on the cellulose surface. Our results suggest that catalytic engagement of TfuCel6B on a cellulose surface may influence the flexibility of the tunnel-forming loops in TfuCel6B differently from that in solution, and thus product expulsion was examined in both cases.
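For readers unfamiliar with the quantity, RMSF is the root mean square deviation of each atom from its own time-averaged position. The sketch below shows the definition on toy coordinates; it assumes the frames have already been aligned to remove overall rotation and translation, and it does not reproduce the atom selection or per-residue averaging used for Fig. 7, which the text does not specify. The function name and data layout are hypothetical.

```python
import math


def rmsf(frames):
    """Per-atom RMSF from frames[frame][atom] = (x, y, z), assumed pre-aligned."""
    n_frames = len(frames)
    n_atoms = len(frames[0])
    result = []
    for a in range(n_atoms):
        # Time-averaged position of atom a.
        mean = [sum(f[a][k] for f in frames) / n_frames for k in range(3)]
        # Mean squared displacement from that average position.
        msd = sum(
            sum((f[a][k] - mean[k]) ** 2 for k in range(3)) for f in frames
        ) / n_frames
        result.append(math.sqrt(msd))
    return result


# Two toy frames with two atoms each: atom 0 moves along x, atom 1 is fixed.
toy_frames = [
    [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
    [(2.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
]
print(rmsf(toy_frames))
```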
Additionally, we compared the overall enzyme fluctuations of TfuCel6B with HjeCel6A to understand how the more enclosed active site tunnel may affect enzyme and ligand dynamics. As mentioned above, the exit and bottom loops are absent in HjeCel6A, and overall the TfuCel6B fluctuations are on average higher (Fig. 7B). We also compared the fluctuations of the ligands between HjeCel6A and TfuCel6B in the cellohexaose-bound structures (Fig. 7B). The HjeCel6A data are taken from previous work (71). As shown in Fig. 7B, the ligand RMSF is surprisingly similar along the entire active site tunnel for both enzymes despite the fact that TfuCel6B exhibits a significantly more enclosed tunnel. As with HjeCel6A, the +4 subsite is the most flexible, likely due to greater solvent exposure along the active site tunnel.
FIGURE 6. Water molecules present in the double mutant structure enable construction of an α-cellobiose model in the product subsites of TfuCel6B in the ²,⁵B conformation. The orange glucopyranose residue is from the TfuCel6B D274A structure in the ²Sₒ conformation. The green glucose residue was built in the TfuCel6B D226A/S232A structure as a product-binding model in the ²,⁵B boat conformation. The distance between the two anomeric carbons in these structures is 1.2 Å, which suggests the itinerary of the anomeric carbon during nucleophilic attack by the water molecule and concomitant glycosidic bond cleavage.
Product Expulsion Pathways-To investigate whether the bottom loop or the exit loop opens after hydrolysis in TfuCel6B, we conducted 100 SMD simulations to slowly pull the product from the tunnel, both with TfuCel6B in solution and with TfuCel6B complexed on cellulose. The product expulsion pathways from the 100 SMD simulations in both scenarios were analyzed to identify the most probable pathway that the cellobiose follows by measuring the positions of the exit/bottom loops relative to the protein during the SMD simulations. The results are summarized in Fig. 8. The data suggest that the exit loop can open up to ~14 Å relative to its initial position, whereas the bottom loop can open up to ~16 Å. The exit loop opens widely, i.e. beyond 10 Å, in 19 simulations, whereas the bottom loop opens beyond 10 Å in 20 simulations. Interestingly, there is no overlap between these two subsets of the 100 trajectories, i.e. under no circumstances do both loops open more than 10 Å simultaneously. Most of the product expulsion pathways involve moderate opening of both the exit and bottom loops, suggesting that both the exit and the bottom loop open with equal probability for product expulsion from the closed catalytic tunnel of TfuCel6B. Moreover, there is no clear difference in the propensity for either loop to open more than the other in solution relative to when TfuCel6B is complexed on cellulose. Fig. 9 illustrates the potential of mean force of cellobiose expulsion from the catalytic domain of TfuCel6B. The computed reversible thermodynamic work required to remove cellobiose from the product sites of the enzyme is the negative of the cellobiose binding free energy. Thus, the overall binding free energy of cellobiose calculated using all 100 SMD trajectories from the TfuCel6B-cellodextrin complex and the TfuCel6B-cellulose complex is −21.2 ± 1.1 and −19.8 ± 0.5 kcal/mol, respectively, which is consistent with the binding free energy of cellobiose to the catalytic domain of HjeCel6A, −22.4 ± 0.3 kcal/mol, as characterized in our previous study (15). Specifically, for the TfuCel6B-cellodextrin complex, the calculated binding free energy is −25.9 ± 1.1 kcal/mol using the 19 trajectories involving only the large opening of the exit loop and −25.4 ± 0.2 kcal/mol using the 20 trajectories involving only the large opening of the bottom loop (supplemental Fig. S2). The quantified free energy of product expulsion via the two "extreme" pathways supports the observation in Fig. 8 that no preference exists between opening of the exit loop and opening of the bottom loop.
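The grouping of trajectories by loop opening can be illustrated as follows. This is a sketch of the bookkeeping only, assuming each trajectory has already been reduced to a pair of maximum openings (exit loop, bottom loop) in Å; the 10 Å cutoff is the one quoted above, while the function name and the toy values are hypothetical and are not data from this study.

```python
THRESHOLD = 10.0  # Å, the "large opening" cutoff quoted in the text


def classify_openings(openings, threshold=THRESHOLD):
    """Group trajectory indices by which loop opens beyond the threshold.

    openings: list of (exit_max, bottom_max) pairs in Å, one per trajectory.
    """
    groups = {"exit": [], "bottom": [], "both": [], "neither": []}
    for index, (exit_max, bottom_max) in enumerate(openings):
        exit_open = exit_max > threshold
        bottom_open = bottom_max > threshold
        if exit_open and bottom_open:
            groups["both"].append(index)
        elif exit_open:
            groups["exit"].append(index)
        elif bottom_open:
            groups["bottom"].append(index)
        else:
            groups["neither"].append(index)
    return groups


# Toy values for four trajectories, for illustration only.
toy = [(12.0, 3.0), (4.0, 11.5), (5.0, 6.0), (10.0, 2.0)]
print(classify_openings(toy))
```

An empty "both" group corresponds to the observation that the two subsets of trajectories do not overlap.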

DISCUSSION
In this study, we have characterized two structures of TfuCel6B catalytic mutants. These structures are noteworthy for understanding GH6 hydrolysis mechanisms, as neither mutation of the catalytic acid (D274A) nor mutation of the putative catalytic base along with the residue that stabilizes the nucleophilic water (D226A/S232A) disrupts the structure of the active site considerably. This result in turn supports the hypothesized role of Asp-226 and Ser-232 in the catalytic step (30). The results for the double mutant were additionally surprising in that the active center loop remains closed over the cellodextrin despite removal of stabilizing hydrogen bonds, which are hypothesized to be key interactions during CBH action. TfuCel6B was also chosen as a model GH6 enzyme because it has been extensively studied and its activity profile upon mutations has been well characterized (29,33,72), and the structures solved here also suggest a possible explanation for the observed catalytic profiles. A mutant enzyme where only the putative catalytic base is mutated, TfuCel6B D226A, exhibits higher activity on CMC than the wild type enzyme (29). Analysis of the TfuCel6B structures reveals that the active site of the mutant D226A could harbor a 3-hydroxy-substituted carboxymethyl group on the glucopyranose residue in the +1 subsite. Fig. 10 shows a model for how such a carboxymethyl substitution could be in a position to form a hydrogen bond directly with the nucleophilic water in the active site of a D226A mutant. The Grotthuss mechanism may thus be retained despite the lack of an aspartic acid side chain. In the D226A/S232A double mutant, the proton network would be disrupted due to the lack of Ser-232, which explains the loss of activity on CMC (29). For this mutant, activity could partially be regained by adding sodium azide (29). The structure of the double mutant shows that in the absence of the hydroxyl group, there is space to accommodate an azide molecule. A nucleophilic water molecule functionally similar to that of the wild type enzyme may thus be present with the help of a carboxymethyl group on the 3-hydroxyl group in the subsite +1 and the azide molecule.
Additionally, the exit loop and bottom loop are not found in fungal GH6 enzymes, and it is of interest to understand the evolutionary pressure to utilize a completely enclosed product region; e.g. the presence of a loop corresponding to the bottom loop has previously been linked to a more cellobiohydrolase-like behavior of the GH6 enzyme cbhA from Cellulomonas fimi (73). As illustrated in Fig. 1, the product regions are much more closed in TfuCel6B than in fungal counterparts. From the results shown in Fig. 7, it is surprising given the extent of ligand contacts in TfuCel6B relative to HjeCel6A that the ligand fluctuations are not lower (i.e. the enzyme stabilizes the ligand more) in TfuCel6B, which has been correlated in chitinases (74) and GH7 cellulases (75) to higher degrees of processivity. However, it is perhaps noteworthy that the Ϫ3 and Ϫ4 subsites of TfuCel6B are blocked by the presence of the exit loop. We hypothesize that a potential functional reason for blocking the Ϫ3 and Ϫ4 subsites in a GH6 enzyme could be to minimize the ability of the enzyme to perform endo-like initiation or endo-   glucanase-like cuts in cellulose, but rather force it to act more in a CBH-only mode. This result may explain why there is surprisingly no difference in ligand fluctuations, and yet TfuCel6B may still be more processive than HjeCel6A. Testing this hypothesis will require careful measurements of endo-initiation and processivity on cellulose substrates (76,77), which to date has proven challenging for nonreducing end-specific CBHs. Moreover, we conducted an extensive sequence alignment of GH6 cellulases. By comparing the exit and bottom loops together, as shown in supplemental Fig. S3, we find 7 basic groups of bacterial GH6s in terms of sequence diversity of these two loops. Neither of the loops is present in fungal GH6 cellulases. The first two groups comprise the majority of the available sequences (Fig. 11). 
Namely, we find that a long exit loop of 10-11 extra residues, such as that in TfuCel6B, is found only in bacteria from the order Actinomycetales, clustered mainly in Streptosporangineae, Pseudonocardineae and Micromonosporaceae. A second group, present only in Streptomycineae, contains an ~6-residue insertion with either a conserved asparagine or a conserved threonine residue, but no structures exist from this second class of GH6 enzymes to date. Conversely, in most cases the bottom loop in the bacterial cellulases examined in the sequence alignment is ~4 residues longer than in HjeCel6A, with only a few entries that are significantly longer.
The subsequent section VI is a surface loop located at the entrance to the active site tunnel that pairs with the section II loop on the opposite side of the tunnel (23,30). The structure formed by sections II and VI, together with the bottom loop, makes up a large part of the active site tunnel roof and is not present at all in bacterial endoglucanases. In our simulations of the TfuCel6B-cellulose complex, section VI interacts with the cellulose surface. This section is much more conserved than the exit and bottom loops, as shown in supplemental Fig. S4. More than half of the available sequences share a PXYXGNXRN pattern. Of the conserved residues, only the tyrosine and the arginine side chains could potentially interact with the cellulose surface. The conserved arginine (Arg-524) adopts several different side chain conformations among the different structure models of TfuCel6B. One of these is a direct interaction with the open active site loop in the ligand-free structure (23,30), an interaction that is not detected in any of the structure models with a closed active site loop. The active site loop of a GH6 CBH needs to open and close for processive action on the cellulose chain. Arg-524 may thus be a link between an interaction of the enzyme with the cellulose surface via section VI and the processive catalytic activity, which will be examined in a future study.
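As an illustration of how the PXYXGNXRN pattern can be located in a protein sequence, the following is a minimal Python sketch, assuming X denotes any of the 20 standard amino acids. The function name `find_section_vi_motif` and the example sequence are hypothetical and chosen only for demonstration; the sequence is not taken from the alignment in supplemental Fig. S4, and this is not the procedure used in this study.

```python
import re

# Assumption: "X" in PXYXGNXRN stands for any one of the 20 standard amino acids.
ANY_RESIDUE = "[ACDEFGHIKLMNPQRSTVWY]"
MOTIF = re.compile(
    "P" + ANY_RESIDUE + "Y" + ANY_RESIDUE + "GN" + ANY_RESIDUE + "RN"
)


def find_section_vi_motif(sequence):
    """Return (1-based start position, matched residues) for each motif hit.

    Hits are non-overlapping, as returned by re.finditer.
    """
    hits = []
    for match in MOTIF.finditer(sequence.upper()):
        hits.append((match.start() + 1, match.group()))
    return hits


if __name__ == "__main__":
    # Hypothetical toy sequence, NOT a real GH6 sequence.
    toy_sequence = "MSTAPAYDGNTRNLLK"
    for start, residues in find_section_vi_motif(toy_sequence):
        # The conserved arginine is the 8th residue of the 9-residue motif.
        arginine_position = start + 7
        print(start, residues, arginine_position)
```

Within a match, the conserved tyrosine and arginine occupy the third and eighth motif positions, so their sequence numbers follow directly from the reported start position.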
Overall, this study sheds new light on the structure of the active site and the role of long loops near the product site of a unique class of GH6 CBHs. However, many questions remain to be answered for GH6 action in general, regarding both the mechanisms of hydrolysis and processivity and the role in processive action of the long loops that block the product sites; these questions will be addressed in future structural and computational studies. Understanding the functional diversity in this important class of GH enzymes will be essential for developing structure-activity relationships and for understanding why certain organisms evolved such seemingly specialized features on a common enzyme fold and reaction mechanism.