Improving the thermal stability of cellobiohydrolase Cel7A from Hypocrea jecorina by directed evolution

Secreted mixtures of Hypocrea jecorina cellulases are able to efficiently degrade cellulosic biomass to fermentable sugars at large, commercially relevant scales. H. jecorina Cel7A, cellobiohydrolase I, from glycoside hydrolase family 7, is the workhorse enzyme of the process. However, the thermal stability of Cel7A limits its use to processes where temperatures are no higher than 50 °C. Enhanced thermal stability is desirable to enable the use of higher processing temperatures and to improve the economic feasibility of industrial biomass conversion. Here, we enhanced the thermal stability of Cel7A through directed evolution. Sites with increased thermal stability properties were combined, and a Cel7A variant (FCA398) was obtained, which exhibited a 10.4 °C increase in Tm and a 44-fold greater half-life compared with the wild-type enzyme. This Cel7A variant contains 18 mutated sites and is active under application conditions up to at least 75 °C. The X-ray crystal structure of the catalytic domain was determined at 2.1 Å resolution and showed that the effects of the mutations are local and do not introduce major backbone conformational changes. Molecular dynamics simulations revealed that the catalytic domain of wild-type Cel7A and the FCA398 variant exhibit similar behavior at 300 K, whereas at elevated temperature (475 and 525 K), the FCA398 variant fluctuates less and maintains more native contacts over time. Combining the structural and dynamic investigations, rationales were developed for the stabilizing effect at many of the mutated sites.

Industrial enzymatic conversion efficiency benefits from increases in temperature by virtue of the fundamental dependence of reaction rate on temperature. However, H. jecorina is a mesophilic organism that produces cellulases, including its native Cel7A, that do not have sufficient stability beyond 50°C (20). Thus, process optimization would greatly benefit from identification of increasingly thermally stable cellulase variants, where improvements to Cel7A stability, the most abundant component of industrial enzyme mixtures (18), offer the largest overall gain.
There are four commonly implemented approaches to identify stabilizing mutations of a given protein. These approaches include (i) comparison of the amino acid sequence of a protein with that of a single more or less thermally stable homologous protein, followed by replacement of selected amino acids by site-directed mutagenesis (21,22); (ii) site-directed mutagenesis to introduce amino acids derived from the study of the 3D structure of the protein of interest, where proteins with diverse thermal stabilities are compared and amino acid substitutions identified (23)(24)(25); (iii) random mutagenesis followed by selection (26 -28); and (iv) a structure-guided approach with recombination of stabilizing fragments (SCHEMA) (29,30). The latter two approaches rely on the ability to both express the enzyme in a microorganism and assay it using high-throughput screening methods. However, heterologous expression of Cel7A from H. jecorina in Escherichia coli, a traditional high-throughput expression host, has notoriously failed to produce either soluble or active enzyme. In yeast, expression levels are low, and the properties of expressed Cel7s have been unpredictable (31)(32)(33)(34). The latter obstacle is probably a result of both the lack of N-terminal glutamine cyclization of Cel7A expressed in yeast and the significant variation in protein glycosylation (35), highlighting the importance of expression host on enzyme activity.
To date, approaches ii and iv have been successfully applied in engineering more thermally stable GH7 cellobiohydrolases. Heinzelman et al. (36) used SCHEMA recombination to increase the half-life of a chimeric GH7 expressed in a glycosylation-deficient Saccharomyces cerevisiae strain. The variant, which was reported to contain an average of 42 unspecified mutations, retained some hydrolytic activity on soluble cellulose at temperatures up to 70°C. In a follow-up study, the inclusion of an additional eight mutations improved the T 50 (temperature at which a 10-min incubation without substrate results in loss of half of the activity) to 72.1°C (37). In a similar approach, chimeric libraries of Cel7A (not including H. jecorina Cel7A) were expressed in S. cerevisiae and found to contain stabilized variants (38). Voutilainen et al. have also described protein engineering efforts via approach ii, resulting in a 4.5°C and 9°C increase in T m for Melanocarpus albomyces Cel7B and Talaromyces emersonii Cel7A, respectively (39,40). Although these studies demonstrate that GH7 enzyme thermal stability can be improved through a variety of protein engineering approaches, we do not yet understand why these mutations, either singular or multiple, result in improved stability. Moreover, there has only been one report of improved thermal stability of H. jecorina Cel7A in the scientific literature (41).
As part of a large, broad campaign to find stabilizing mutations in H. jecorina Cel7A (42), we developed a host/vector sys-tem for heterologous expression of Cel7A in the filamentous fungus Aspergillus niger var. awamori AP4, using a constitutive promoter to limit the background of host proteins and potential interference from other carbohydrases. We demonstrated the successful expression of GH7 CBHs from various fungi in the A. niger AP4 system (20). Using approaches i, ii, and iii above, a collection of H. jecorina Cel7A variants was generated and found to be stable by thermal inactivation screening (42). In this paper we characterize the most stable of those variants, assessing their thermal stability, residual activity after incubation at elevated temperatures using a model substrate, and activity on amorphous cellulose. We also study combinations of the variants with further improvements in stability. This combinatorial approach has proven effective in other systems (43,44). To address the fundamental gap in understanding of structural and dynamic contributions to GH7 thermal stability, specifically that of H. jecorina Cel7A, we focus on the most stable Cel7A variant, FCA398, with 18 mutations, solving its crystal structure and using high-temperature molecular simulation to evaluate changes in its structural dynamics compared with wild type.
Overall, our study reveals that it is possible to make significant improvements to the thermal stability of H. jecorina Cel7A by the accumulation of many mutations, and through a combination of structural and dynamic approaches, we develop rationales for the beneficial effects of most of the mutations made.

Expression of thermostabilized H. jecorina Cel7A variants in A. niger var. awamori
To improve the thermal stability of H. jecorina Cel7A, mutations have been introduced in the protein molecule using protein engineering. Following high-throughput fungal expression and screening, a large set of point mutations resulting in enhanced residual activity after incubation at elevated temperature was identified (42). In the present study, a group of these were chosen for larger-scale expression in A. niger var. awamori AP4 and purification for more detailed analyses (Table 1), including wild-type Cel7A (FCA301) as a control. Site-specific combination mutants were then generated, expressed, and

H. jecorina Cel7A thermal stabilization
screened, leading to a Cel7A variant (FCA398) with 18 mutations and substantially improved thermal stability. Herein, we present the results for the most thermally stable Cel7A variant, FCA398, and key variants in its development ( Table 2).

Determination of protein T m values
Thermal stability was assessed by monitoring the irreversible thermal denaturation of the purified Cel7A variants with CD spectroscopy for determination of protein T m . The obtained T m values and ⌬T m , compared with wild type, are shown in Tables  1 and 2. For comparison, T m was also determined for Cel7A wild type isolated from a native H. jecorina culture; the T m of natively expressed Cel7A, 62.4 Ϯ 0.3°C, was the same as that of FCA301 (wild-type Cel7A expressed in A. niger var. awamori), 62.5 Ϯ 0.3°C. All the analyzed variants exhibited higher T m than the wild-type enzyme, as expected from their identification as more thermally stable in the residual activity screening. In general, the stabilizing effect of individual point mutations was small, where ⌬T m was in the range of ϩ1 to ϩ2°C for a majority, up to ϩ2.7°C for P227L (FCA334), and ϩ2.8°C for S196F/S411F (FCA375) ( Table 1). When a larger number of sites were combined, a significant enhancement of T m was achieved, as exemplified by FCA353, FCA367, and FCA398 in which 6, 11, and 18 mutations, respectively, increased the T m by 3.4, 7.4 and 10.4°C ( Table 2).
The contribution of single-site variants to the thermal stability of the corresponding multi-site combinations was not simply additive. That is, the ⌬T m of combinatorial variants was consistently lower than the direct sum of ⌬T m values for the contributing single-site variants. For instance, one of our best intermediate variants (supplemental Fig. S1), FCA367, with 11 mutations, had a measured ⌬T m of ϩ7.4°C, whereas the summative ⌬T m of the individual 11 mutations would be ϩ11.4°C if thermal stability gains were purely additive. To rule out the possibility that some mutations may counteract thermal stability when combined with each other, all 11 sites of FCA367 were recombined using QuikChange, generating a library of Cel7A mutants with different combinations of these sites. Around 2000 clones, with an average of 8 mutations/clone (990 possibilities), were screened for retention of activity at high temperature. The most thermally stable clone contained all 11 sites, suggesting that all of the point mutations of FCA367 contribute to increased thermal stability. FCA367 was further improved by the addition or substitution of stability-associated mutations using fusion PCR techniques to generate site-specific combination mutants with enhanced thermal stability. The most improved variant was FCA398, which exhibits a T m value of 72.9°C (i.e. 10.4°C higher than wild-type Cel7A) ( Table 2).

Thermal inactivation and half-life, t1 ⁄2 , at elevated temperature
Residual activity measurements were further used to determine inactivation t1 ⁄ 2 for the site-combined variants FCA353, FCA367, and FCA398 compared with wild-type FCA301. The activity of the purified enzymes on 4-methylumbelliferyl-␤-lactoside (MUL) was measured at 50°C, before and after incubation at elevated temperatures, 62, 66, and 69°C. Samples were taken at regular time points up to 20 h for activity measurements on MUL. This experimental design measured irreversible thermal inactivation. The half-life is defined as the time taken for a decrease of activity to 50% of the initial activity. The results show that the Cel7A variants were more stable than wild type, in agreement with measured T m values. Even a seemingly small increase in T m substantially improves the lifetime of the enzyme at high temperature (in the case of FCA353), and for FCA398, the most stable variant, we see large improvements (Table 2); the half-life is 44-, 34-, and 9-fold enhanced compared with wild type at 62, 66, and 69°C, respectively.

Degradation of phosphoric acid swollen cellulose (PASC) by Cel7A variant FCA398 at elevated temperature
Wild-type FCA301 and the most stable Cel7A variant, FCA398, were incubated at 53, 65, and 75°C with 1% PASC as substrate, and the cellulose degradation was monitored over the course of 3 days (72 h) by quantification of released soluble sugar by HPLC (Fig. 1). FCA398 is clearly more active and gave higher sugar yields at all three temperatures than did FCA301 at any temperature. For FCA301, the activity was highest at 53°C and slightly lower at 65°C, whereas the enzyme was practically inactive at 75°C. FCA398 retained relatively high activity even at 75°C, leveling off after ϳ48 h, and its highest activity was observed at 65°C.

3D structure of the catalytic domain of the Cel7A FCA398 variant
To understand the structural basis for improved thermal stability over wild type, the structure of the deglycosylated catalytic domain of FCA398 was determined by X-ray crystallography (Fig. 2). The protein crystallized in space group P2 1 2 1 2 with two protein chains in the asymmetric unit, in a new crystal form with unit cell parameters and crystal packing that are different from that of any previous structure of H. jecorina Cel7A. The structure of the FCA398 catalytic domain was solved by molecular replacement and was refined at 2.1 Å resolution to final R and R free values of 19.3 and 23.9%, respectively. Statistics on data collection, refinement, and the final structure model are given in Table 3. Ramachandran plots are provided in

H. jecorina Cel7A thermal stabilization
supplemental Fig. S2). The structure model has been deposited with the Protein Data Bank (PDB accession code 5OA5). Both protein chains, A and B, of the asymmetric unit contain all amino acid residues (positions 1-434) of the FCA398 catalytic domain and three N-acetyl glucosamine residues (NAG), which are covalently linked to Asn 64 , Asn 113 , and Asn 270 , respectively. Glycosylation at Asn 64 , as observed in this Aspergillus-expressed protein, has not been seen previously in H. jecorina Cel7A structures or glycosylation analyses (45), despite the presence of an N-glycosylation sequence motif at this site also in the wild-type enzyme. At Asn 113 , a new N-glycosylation motif is introduced by the S113N mutation in FCA398.
The structures of the two chains are practically identical with a root mean square deviation (RMSD) of 0.13 Å. The FCA398 structure is very similar to previously deposited H. jecorina Cel7A catalytic domain structures; superposition with the Cel7A-cellononaose complex (PDB code 4C4C (46)) gives RMSD values of 0.35 and 0.34 Å relative to FCA398 chain A and B, respectively. There is only one loop region, 244 -254, where the main chain of FCA398 deviates significantly. This loop bends toward and partially encloses the catalytic center of the enzyme. It has previously been referred to as the "exo-loop" (47) and, more recently, loop B3 (48), and flexibility in the loop has been reported previously (47,49,50). The largest backbone shift between the FCA398 and 4C4C structures is seen at residue 249 (4.6 Å between C␣ atoms), which is the site of the D249K mutation (Fig. 2, A and C).

Molecular dynamics (MD) simulations of Cel7A FCA301 and the FCA398 variant
Molecular dynamics simulations of wild-type Cel7A and FCA398 at 300, 475, and 525 K suggest that the 17 mutations in the FCA398 catalytic domain improve thermal stability by retaining tighter packing of the core domain at elevated temperatures. This is illustrated by two separate analyses of the triplicate simulation trajectories: root mean square fluctuation (RMSF) of the protein backbone and variation in native contacts maintained over time. First, the RMSF of the protein backbone was determined for each of the 18 simulation trajectories. The RMSF of the triplicate simulations were averaged together to obtain an average RMSF of a given enzyme at a given temperature (Fig. 3). The RMSF measures deviation of a residue from its time-averaged position over the course of a simulation. Higher RMSF in the protein backbone typically indicates a more flexible, potentially less thermally stable, region of the protein. At 300 K, the average RMSF of wild type and FCA398 are practically indistinguishable. However, with increasing temperature (475 and 525 K), the FCA398 variant fluctuates less in comparison with wild type, implying that the 17 selected mutations in the catalytic domain increased the stability of the protein. At 475 K, the stabilizing effect of the set of mutations had the most significant impact in the following residue ranges: 243-256, 310 -358, and 378 -396. At 525 K, we observe the same stabilization effects as in the set of simulations at 475 K, but also the region between 260 and 285 benefits from the stabilizing effects of the FCA398 mutations.
Quantitative determination of the total number of native contacts from MD simulations illustrates that FCA398 retained important molecular interactions within the core of the protein more effectively than wild-type Cel7A when faced with thermal stress. First, the number of native contacts formed by each residue was identified. Here, a native contact was defined as any amino acid whose side chain center of geometry was within 6.5 Å of the reference amino residue's C␣. The total number of native contacts (Fig. 4) is the sum of the native contacts formed by all residues in the protein. The total numbers of native contacts formed by both wild-type Cel7A and FCA398 were identical at the 300 K reference temperature over the entirety of the 50-ns MD simulations, although only 15 ns is shown in Fig. 4 for comparison with high temperature behavior. As expected at high temperatures, the protein began to partially unfold at ϳ7 ns, destroying native contacts present at 300 K. Regions having low initial numbers of native contacts began to unfold first during high-temperature simulations (supplemental Fig. S3), corresponding well with the increasing regional fluctuation described above.
The relative difference in number of native contacts retained over the 15-ns high-temperature simulations is easiest to see by examining the difference in the number of native contacts formed by each residue over the entire simulation. For both 475 and 525 K, the native contact heat maps of wild-type Cel7A and FCA398 have been subtracted from each other to reveal regions in which FCA398 maintains a greater number of native contacts at high temperature (supplemental Fig. S4). The region between residues 320 and 340 appears to benefit the most from the stabilizing effect of FCA398 mutations, forming and maintaining 2-6 new native contacts in many locations through the region. The FCA398 regions exhibiting the highest number of maintained native contacts relative to wild type largely correspond to the regions of decreased flexibility identified from RMSF. Notably, several of the FCA398 point mutations (P227L, D249K, T255P, and T332Y) were made in these regions. Interestingly, small regions around residues 160 (525 K) and 250 (475 K) exhibit fewer native contacts in FCA398. As described above, the increase in melting temperature due to an added mutation was never simply additive, and this localized destabilization of the FCA398 variant suggests a mechanism for this

Discussion
In this work, we examine stabilizing mutations of H. jecorina Cel7A and show that the accumulation of such mutations is an effective strategy in the creation of more stable variants. Although the increase in stability, as measured by thermal melting T m , by adding stable variants together was not as large as a purely arithmetic addition, the approach was effective in allowing us to progress from single variants with an increase in T m of up to 2.7°C (relative to wild type) to combinatorial variants with T m increases of 3.4°C, of 7.4°C and, by accumulating 18 mutations (in FCA398), of 10.4°C above the starting point. This stability increase led to greatly reduced rates of thermal inactivation and increased hydrolysis of cellulose (PASC) at elevated temperatures.
Most of the 18 point mutations introduced in the FCA398 variant are located on the surface of the protein and distributed over the whole molecule (Fig. 2). The side chains at six of these sites (N49S, N89D, S92T, S196T, D249K, and T332Y) point out into the surrounding solution, and there are no obvious structural explanations for why these mutations might stabilize the molecule.

H. jecorina Cel7A thermal stabilization
All mutations are found in the catalytic module except one, T462I, which is located at the junction between the linker peptide and the CBM ( Fig. 2A). Thr 462 corresponds to the first residue, Thr 1 , in the NMR structure of the CBM from H. jecorina Cel7A (PDB code 1CBH (51)). Replacement with isoleucine at this site increases hydrophobic interaction with surrounding residues (Pro 16/477 , Val 18/479 , and Ala 20/481 ; 1CBH/ FCA398 numbering) and could, thereby, strengthen the linker-CBM connection. However, T462I appeared neither as a single point variant nor in FCA367. Accordingly, the results suggest that T462I is an unlikely contributor to an overall increase in thermal stability for the FCA398 full-length protein.
None of the mutations in the FCA398 catalytic domain are at sites that are completely conserved in the phylogenetically diverse GH7 CBH sequences, examined in the recent paper by Hobdey et al. (52). Only one residue, Asn 49 , is conserved among the known GH7 CBH structures. Five of the specific amino acid substitutions in FCA398 (Leu 227 , Pro 255 , Lys 295 , Pro 296 , and Tyr 332 ) are not found in the sequences examined by Hobdey et al. (52). Of the remainder, eight (Pro 8 , Thr 68  The catalytic domain of Cel7A is built around two ␤-sheets packed against each other to form a ␤-sandwich that curves around the active site. Long loops extend from the ␤-sheets and fold into irregular loop regions, a few ␣-helices, and short ␤-strands at the surface of the protein; this is where the MD simulations show elevated backbone fluctuations and stabilization of FCA398 relative to wild type (Fig. 3). The largest differences between FCA398 and wild type at high temperature are seen in surface regions clustered around the side of the protein that harbor the product-binding site (to the right in Fig. 2A).

Mutation P227L
Near the product site, we find the P227L mutation, which gave the highest ⌬T m of the single point mutations analyzed (ϩ2.7°C; Table 1). It is the only mutation site in FCA398 where the side chain is completely buried within the protein, sitting on a ␤-strand anti-parallel to the strand that carries the catalytic residues. The flanking residues, Thr 226 and His 228 , point into the product subsite ϩ1, whereas the 227 side chain points in the opposite direction, into the hydrophobic core (Fig. 5). The mutation does not appear to cause significant conformational changes, although the ␤-strand is slightly shifted at Leu 227 in FCA398 compared with Pro 227 in the 4C4C structure (0.8 Å between C␣ atoms). The larger leucine side chain seems to fill the void better than proline, with more van der Waals contacts and tighter hydrophobic packing with surrounding side chains (Met 213 , Ile 300 , Leu 326 , Leu 349 , and Phe 352 ; Fig. 5). The P227L mutation probably contributes to the observed stabilization of the 310 -358 segment by strengthening the hydrophobic interaction with several hydrophobic residues in that segment. Furthermore, the peptide nitrogen at Leu 227 is accessible for a new hydrogen bond (3.0 Å) to the backbone oxygen of Cys 261 in the short 261-263 ␤-strand at the edge of the inner ␤-sheet. Cys 261 , in turn, is bound by a disulfide bridge to Cys 331 (next to the T332Y mutation) in the 328 -338 ␣-helix at the surface (Fig. 5).
The V403D and S411F sites are at either end of a surface helix (at the top of Fig. 2A), which is among the regions that unfold first and where stabilization of FCA398 is indicated by the hightemperature MD simulations. The V403D mutation introduces a new hydrogen bond in FCA398 between the Asp 403 side-chain and the backbone amide of Gln 406 in the helix. This type of hydrogen bonding, referred to as N-terminal capping, stabilizes the ␣-helix (53,54), which in turn should stabilize the whole protein (55). At the other end of the helix, the S411F mutation introduces an aromatic residue partially buried underneath the side chain of Gln 410 . The Phe 411 in FCA398 provides a larger hydrophobic interface with underlying amino acids (Pro 137 , Leu 140 , and Val 407 ), which should anchor the helix more firmly at the surface of the catalytic domain. Interestingly, an aromatic residue at this position is also present in the documented thermostable cellobiohydrolases M. albomyces Cel7B and Thermoascus aurantiacus Cel7A (39,56).

Tunnel entrance region
Near the entrance to the cellulose binding tunnel, at subsite Ϫ7, the extended B1 loop (residues 41-63) and the following short ␣-helix (residues 64 -70) cover the protein surface. Three of the FCA398 mutations, T41I, N49S, and A68T, are found in  Fig. S3), residue 41 in FCA398 retained more native contacts than in the wild type, indicating more favorable interactions with the neighboring amino acid residues. The A68T site is located in the 64 -70 ␣-helix, partially buried at the interface between the helix, the N-terminal pyroglu-

H. jecorina Cel7A thermal stabilization
tamate residue, one loop turn formed by residues 183-185, and Thr 160 in another loop turn at the base of loop B2. The FCA398 Thr 68 side chain makes additional van der Waals contacts with surrounding residues and forms two new hydrogen bonds (with the backbone carbonyl oxygen at position 64 within the helix and with the side chain of Gln 182 , which in turn bonds to Thr 160 ). The A68T mutation is thus likely to anchor the helix more firmly at the surface and may also contribute to stabilization of the neighboring loop regions. The MD simulations revealed that the A68T mutation in FCA398 variant significantly increased hydrogen bond interactions with Gln 186 and Ala 187 at native temperature, as well as at the two high temperatures, in comparison with the wild type. These newly formed interactions are assumed to provide additional stability to the disulfide bridge between Cys 61 and Cys 67 . Together, these contributions result in an increase of T m by about 1.2°C in the single A68T mutation (Table 1).

Surface loop turns
13 of the FCA398 mutations are at the surface of the catalytic module, and 10 of these sites are in, or close to, loop turns at the surface (S8P, N49S, N89D, S113N, S196T, D249K, T255P, S278P, E295K, and T296P). All four of the new proline residues introduced into FCA398 are found in such surface turns. The importance of proline residues in thermal stability of enzymes has been described by Matthews et al. (57). As a prime example, Muslin et al. (58) showed the effect of proline insertions on the thermostability of a barley ␣-glucosidase. Introduction of proline residues in FCA398 at these four positions, S8P, T255P, S278P, and T296P, probably stabilizes the Cel7A molecule by restricting main-chain flexibility and, thus, the mobility at these turns. MD simulations suggest that the T255P and S78P mutations have the largest impact on flexibility (Fig. 3). The S8P, T255P, and T296P mutations also provide a larger area of hydrophobic contact with underlying protein atoms, which may contribute further to stabilization of the respective surface turns. The T296P site is in a hairpin turn (residues 295-298), which also carries the E295K mutation. The Lys 295 side chain introduced in FCA398 forms a new salt bridge with Glu 325 , which may add to stabilization of the 295-298 turn and also contribute to the stabilization effect on the 310 -358 region seen in the high-temperature MD simulations ( Fig. 3 and supplemental Fig. S4).

New N-glycosylation at S113N site
One extra NAG residue in the FCA398 structure compared with the wild-type Cel7A structure was found covalently bound to Asn 113 (Fig. 6). The S113N mutation, also at a surface hairpin turn, introduces an additional consensus glycosylation motif (Asn-Asp-Thr) in the amino acid sequence, which has been recognized by the N-glycosylation machinery in the endoplasmic reticulum of the expression host A. niger to produce a new glycosylation. This single mutation resulted in an increase in T m of 1.5°C (Table 1). Mutating this residue to an aspartic acid did not result in a significant increase in T m compared with wild type (data not shown), suggesting that the stability effect of the S113N mutation may be the result of an altered glycosylation pattern of the protein. Interestingly, an N-glycosylation motif is present at the corresponding position in the GH7 CBHs from the close relative Trichoderma harzianum, as well as the distantly related social amoeba Dictyostelium purpureum and D. discoideum. An attached NAG residue is indeed visible at this site in the structures of the two former enzymes (52,59).

Role of cystine residues in stability
A striking observation is that several mutations are in the vicinity of a cystine residue and interact, either directly or indirectly, with the cystine residues. N49S and A68T are near Cys 50 , Cys 61 , Cys 67 , Cys 71 , and Cys 72 , within the 41-74 loop. Four of these cystine residues make short disulfide bridges (Cys 50 -Cys 71 and Cys 61 -Cys 67 ) within the loop, and the fifth, Cys 72 , makes a bridge with Cys 4 at the N terminus of the enzyme. Another example is at the apex of a loop extending from residue Thr 231 to Asp 259 , where a threonine has been replaced with a proline at position 255. The rigidity of this loop, probably increased by this mutation, in turn, stabilizes the position of Cys 256 , in a disulfide bond with Cys 230 . MD simulations support the proposal that the T255P mutation stabilized the protein and the disulfide bond between Cys 230 and Cys 256 (Fig. 3). The single mutant T255P improves T m by around 2°C (Table 1).
Interactions with cystines could be of importance for the thermal stability of Cel7A; some of the previously determined structures of H. jecorina Cel7A, solved by our group, have shown that the thiol group of some of these cystines can have multiple conformations. The N49S and A68T mutations, for example, may contribute to FCA398 stability by stabilizing lowenergy conformations of the disulfides in the 41-74 loop. The H. jecorina Cel7A molecule contains 10 disulfide bridges in the catalytic domain and 2 in the CBM (SwissProt P62694). Disulfide bridges between cystine residues can also be important factors in the thermal stability of enzymes. The cystines involved in disulfide bridges represent about 5% of the residues of the Cel7A catalytic domain, leading one to speculate that  (39) showed that introducing a disulfide bridge found in wild-type H. jecorina Cel7A into the thermostable M. albomyces Cel7B increased the unfolding temperature of this enzyme by 4°C. Although no disulfides were formed or broken during our experiments or simulations, presumably these covalent bridges limit the possible motions in their vicinity and could indicate regions that would otherwise be sites of local unfolding on the path to thermal denaturation. Thus, mutations near disulfides could further stabilize vulnerable regions and have an indirect effect by stabilizing the disulfide conformation.
In conclusion, we have developed a robust pathway of mutagenesis of the H. jecorina Cel7A molecule by combining mutated sites and adding their individual influences to improve the thermal stability of Cel7A. The best Cel7A variant obtained, FCA398, inactivated much more slowly than the wild type at elevated temperatures, having a half-life 44 times that of wild type at 62°C, and had an increase of 10.4°C in its thermal denaturation melting point. We have shown that this improvement in the thermostability of the Cel7A molecule enables degradation of amorphous cellulose at elevated temperatures, up to 75°C. The variant FCA398 contained 18 mutated sites compared with the wild-type enzyme, and we developed rationales for the stabilizing effects of most of the mutations on the Cel7A molecule by applying structural and molecular dynamics analyses.

H. jecorina Cel7A combinatorial variant construction
Sites with increased thermal stability were identified and selected by screening libraries of H. jecorina Cel7A mutants heterologously expressed in A. niger var. awamori AP4 (42).
Residual activity of small-scale (100 l) culture supernatants was assessed by measuring cellobiohydrolase activity against MUL (Sigma, M2405) at 50°C, as described previously (20), before and after heat treatment at elevated temperature (62°C, for initial rounds, or 69°C, as more stable variants were introduced as library starting points) for 4 h.
Specific mutagenized primers were developed, and mutated sites were combined using fusion PCR techniques as described (60). The PCR fragments containing the mutagenized Cel7A gene cbh1 were cloned in E. coli strain MAX Efficiency DH5␣ (Invitrogen), using plasmid pDONR TM 201 (Invitrogen), and then transferred to the E. coli/A. niger shuttle expression vector plasmid pRAXdes, which was subsequently transformed into A. niger var. awamori AP4 (61), as described previously (20,62). General recombinant DNA procedures were adapted from the literature (63). The mutated genes were sequenced by Base-Clear Holding BV (Leiden, The Netherlands), and sequence data were analyzed using the software package VectorNTI. The nucleotide sequences were translated into protein sequence and compared with the wild-type sequence to identify the mutated sites. The selected Cel7A variants that contributed to the design of the combinatorial variant FCA398 are listed in Table 1. Table 2 lists combinatorial variants. Variant FCA301 is wild-type H. jecorina Cel7A expressed in A. niger var. awamori as a control.

Recombination of mutated sites of variant FCA367
All 11 mutated sites of FCA367 were separately introduced into 5Ј-phosphorylated mutagenized primers. These 11 primers were then used to mutagenize Cel7A to produce a library of mutants with several different combinations, using the QuikChange multisite-directed mutagenesis kit (Stratagene, La Jolla, CA, catalog no. 200518). A total of around 2000 clones, containing an average of 8 mutations/clone (990 possibilities), were screened for thermal stability, as measured by retention of activity after incubation at high temperature (69°C).

Expression and purification of selected Cel7A variants
The A. niger var. awamori transformants were grown in 500-ml shake flasks for 3 days at 37°C (62). Expressed Cel7A variants were then purified from culture supernatants by hydrophobic interaction chromatography on phenyl-Sepharose resin (GE Healthcare catalog no. 17-0973-05) equilibrated with 5 column volumes (CV) of 0.020 M sodium phosphate, 0.5 M ammonium sulfate at pH 6.8. Ammonium sulfate was added to the supernatants to a final concentration of 0.5 M, and the pH was adjusted to 6.8. After filtration, the supernatant was loaded onto the column. The column was washed with 10 CV of equilibration buffer and then eluted with a 10-CV gradient from 0.5 to 0 M ammonium sulfate in 0.020 M sodium phosphate, pH 6.8. Cel7A eluted approximately mid-gradient. Fractions were collected and pooled on the basis of SDS-PAGE analysis.

Determination of T m
Protein T m values were determined for the purified enzymes by CD spectroscopy, as described previously (20). The thermal unfolding of the Cel7A variants was not reversible; in our measurements of Cel7A melting points, the native, starting, CD spectrum (and, specifically, the 230 nm value) was not recovered upon cooling.

Thermal inactivation half-life measurements
For determination of irreversible inactivation t1 ⁄ 2 , the purified variants, FCA301, FCA353, FCA367, and FCA398, were incubated at 62, 66, and 69°C in a microtiter plate incubator (iEMS TM microplate incubator/shaker HT, Thermo Fisher Scientific). Samples were taken at regular time intervals up to 20 h, cooled on ice for 10 min, and assayed for residual activity on MUL (measured at 50°C, as described (20)). The half-life was derived from the slope (k) of the natural logarithm of residual activity versus time using the formula t1 ⁄ 2 ϭ ln 2/k.

H. jecorina Cel7A thermal stabilization Preparation and crystallization of Cel7A variant FCA398 catalytic domain
Before crystallization, the H. jecorina Cel7A FCA398 variant enzyme (expressed in A. niger var. awamori AP4 and purified as described above) was deglycosylated and treated with papain to remove the C-terminal peptide linker and carbohydrate-binding module from the full-length protein, using a procedure similar to that described previously (67). The FCA398 protein, in 100 mM sodium acetate, pH 5.0, and 20 mM zinc acetate, was mixed with ␣-mannosidase (Sigma-Aldrich) and endo-Nacetyl glucose aminidase H to a final concentration of 40:2:1 (Cel7A/␣-mannosidase/endo-N-acetyl glucose aminidase H, w/w/w) and incubated at room temperature for 24 h. Activated papain (Sigma-Aldrich) was then added to a final concentration of 50:1 (Cel7A/papain, w/w). After overnight incubation at room temperature, the buffer was changed to 20 mM sodium acetate, pH 5.0, and the protein was loaded onto a Source 30Q anion exchange column (GE Healthcare) and eluted by a linear 0.1-1.0 M sodium chloride gradient. Fractions containing FCA398 catalytic domain were collected, and after buffer exchange to 25 mM sodium morpholine ethane sulfonic acid, pH 6.0, the protein was concentrated to 12 mg/ml. The purity of the protein was Ͼ95%, as determined by SDS-PAGE. Note that one of the mutations in FCA398, T462I, is in the linker and, hence, is removed by this process.
Initial screening for crystallization conditions for the FCA398 variant was performed with the JCSGϩ screen (Qiagen). Well-diffracting crystals were obtained in drops with 30% JCSGϩ solution D9 (25.5% PEG 4000, 0.17 M ammonium sulfate, and 15% glycerol) and 70% protein solution (ϳ12 mg/ml), equilibrated against D9 solution at 20°C (68). Large, single, square-shaped crystals grew to a size of 10 ϫ 200 ϫ 200 m in sitting drops within 1 week. The crystals belong to the space group P2 1 2 1 2 with the cell dimensions a ϭ 102.9 Å, b ϭ 92.1 Å, c ϭ 102.2 Å and have a calculated V m of 2.48 (69,70), a solvent content of 50.5%, and two Cel7A molecules in the asymmetric unit.

X-ray data collection, structure solution, and refinement
Single crystals were picked with cryoloops and flash-frozen in liquid nitrogen. X-ray diffraction data were collected at 100 K on beam line ID23-2 at the European Synchrotron Radiation Facility (Grenoble, France). Data collection and processing statistics for the final structure are given in Table 3. The data were integrated with XDS and scaled with SCALA in the CCP4 package (71)(72)(73). 5% of the reflections were set aside for calculation of R free (74). The structure of the H. jecorina Cel7A FCA398 catalytic domain was solved by molecular replacement using PHASER with a wild-type H. jecorina Cel7A structure as the search model (PDB code 2V3I (75)). The structure was refined with alternating cycles of model building using COOT and maximum likelihood refinement in Refmac version 5.0 (76). Most water molecules in the structure model were located automatically using water picking protocols in the refinement program. These water molecules were then manually selected for inclusion or discarded by visual inspection. A summary of the data collection and refinement statistics is given in Table 3. All structural comparisons were made with COOT, and figures were prepared with PyMOL (77). Coordinates and structurefactor amplitudes have been deposited with the Protein Data Bank (78) under PDB accession code 5OA5.

Molecular dynamics simulation
To evaluate the role of mutations in enhancing thermal stability, classical MD simulations of wild-type Cel7A and the FCA398 variant were performed at three temperatures: 300, 475, and 525 K. The simulation at 300 K was used to identify baseline dynamics of the two enzymes under standard conditions, providing a point of comparison for behavior at elevated temperatures. High-temperature simulations, at 475 and 525 K, were performed to observe the structural changes that occur as part of the thermal unfolding mechanisms. The selected temperatures are necessarily higher than those the proteins would be exposed to in vivo, so that unfolding was observable within a feasible timeframe for molecular simulation. Although the temperatures are quite high, we do not anticipate adverse impact on predicted thermal unfolding mechanisms; prior studies, conducted at simulation temperatures up to 600 K, have established that unfolding mechanisms are relatively independent of simulation temperature (79 -81). Comparisons between an averaged set of trajectories, thus defining the unfolding mechanism, were made on the basis of native contacts, where a native contact is any residue within 6.5 Å of a residue of interest (82).
Wild-type Cel7A and the FCA398 variant were constructed from deposited PDB structures, 4C4C (83) and 5OA5, respectively. The E217Q mutation in the 4C4C structure was reversed from Gln to Glu to obtain native Cel7A, and the cellononaose oligomer was removed from the catalytic tunnel. The FCA398 model required no manipulation of the active site. In this study, we modeled only the catalytic domains of the enzymes, maintaining consistency with the crystal structures. Accordingly, the FCA398 model contained only 17 of the 18 mutations (Table 2), as the T462I appears in the linker domain of the full-length protein. The catalytic domains were modeled in a deglycosylated state; based on previous observations, glycosylation effects on protein dynamics are marginal over the MD simulation timescales (84,85). Protonation states were determined using Hϩϩ at pH 5 with an internal and external dielectric constant of 10 and 80, respectively (86 -88). Selected protonation states and disulfide bonds are provided in supplemental Table S1. The enzymes were then explicitly solvated in water in an 80 ϫ 80 ϫ 80 Å cubic box, and sodium ions were added to ensure charge neutrality of the systems. The resultant systems contained roughly 52,000 atoms.
After system construction, the models were minimized, heated, and equilibrated before data collection. First, the water molecules and ions were minimized for 1000 steps of steepest descent in CHARMM, keeping the protein fixed (89). This was followed by 1000 steps of steepest descent in which the entire system was allowed to move freely. After minimization, the systems were heated from 100 to 300 K in the NVE ensemble over 20 ps in 50 K increments. At this point, three independent 300 K simulations were started from different random number seeds for both wild-type Cel7A and FCA398 (supplemental Fig.  S4). These six simulations were then density-equilibrated for 100 ps in the NPT ensemble in CHARMM using a Nosè-Hoover thermostat and barostat (90,91). The CHARMM36

H. jecorina Cel7A thermal stabilization
all-atom force field with CMAP corrections was used to model the proteins (89,92,93), and the modified TIP3P force field was used to model water (94,95). Long-range electrostatic interactions were modeled using the particle mesh Ewald approach. The nonbonded cut-off distance was 10 Å, with a switching distance of 9 Å and a nonbonded pair list distance of 12 Å.
The equilibrated systems were simulated for 50 ns in NAMD at 300 K in the NVT ensemble using a 2-fs time step (96). High-temperature simulations were started from a 10-ns snapshot of the 300 K simulations and run for 15 ns in the NVT ensemble (supplemental Fig. S5). In all, 18 simulations were performed: three replicates at 300, 475, and 525 K for both wild-type Cel7A and FCA398 (supplemental Fig. S4). VMD was used to visualize the trajectories (97).