A consensus-guided approach yields a heat-stable alkane-producing enzyme and identifies residues promoting thermostability

Aldehyde-deformylating oxygenase (ADO) is an essential enzyme for the production of long-chain alkanes as drop-in biofuels, which are compatible with existing fuel systems. The most active ADOs are present in mesophilic cyanobacteria, especially Nostoc punctiforme. Given the potential applications of thermostable enzymes in biorefineries, here we generated a consensus thermostable ADO (Cts-ADO) based on ADO sequences from several thermophilic cyanobacterial strains, using an in silico design pipeline and metagenome libraries from 41 hot-spring microbial communities. Cts-ADO displayed a 3.8-fold increase in pentadecane production on raising the temperature from 30 to 42 °C, whereas ADO from N. punctiforme (Np-ADO) exhibited a 1.7-fold decline. 3D structure modeling and molecular dynamics simulations of Cts- and Np-ADO at different temperatures revealed differences between the two enzymes in residues clustered on exposed loops, which affected the conformation of helices involved in forming the ADO catalytic core. In Cts-ADO, this conformational change promoted ligand binding to its preferred iron, Fe2, in the di-iron cluster at higher temperature, whereas the reverse was observed in Np-ADO. Detailed mapping of residues conferring Cts-ADO thermostability identified four amino acids, which we substituted individually and together in Np-ADO. Among these substitution variants, A161E was remarkably similar to Cts-ADO in terms of activity optima, kinetic parameters, and structure at higher temperature. Residue 161 is located in loop L6, which connects helices H5 and H6, and the A161E substitution supported ligand binding to Fe2 at higher temperatures, thereby promoting optimal activity at these temperatures and helping to explain the increased thermostability of Cts-ADO.

Employing engineered microbes for the production of fuels provides an attractive alternative for mitigating the dependence on currently available fossil fuels (1, 2). Research on alk(a/e)ne production has recently gained considerable attention because of the high energy density of these compounds and their compatibility with the existing fuel infrastructure, which make them ideal candidates for drop-in biofuels (3-7). Their production has been reported in various microorganisms, including algae, yeasts, fungi, and bacteria (8-11), with the most consistent reports coming from cyanobacteria and from natural habitats dominated primarily by cyanobacteria (12-15). Recently, two cyanobacterial genes encoding an acyl-ACP reductase (AAR) and an aldehyde-deformylating oxygenase (ADO) were identified and shown to be responsible for the production of fatty alk(a/e)nes in cyanobacteria (3). In this pathway, the first enzyme, AAR, reduces fatty acyl-ACPs to their corresponding aldehydes, and ADO subsequently converts these fatty aldehydes into the respective alk(a/e)nes that are one carbon shorter (Cn-1) (3). Enzymes functionally similar to AAR have been reported, such as the carboxylic acid reductase from Mycobacterium marinum, which converts a wide range of aliphatic fatty acids (C6-C18) into their corresponding aldehydes (16). An alternative route for producing fatty aldehydes from free fatty acids was also explored by artificially assembling a metabolic pathway consisting of the fatty acid reductase complex encoded by the genes luxC, luxE, and luxD (17, 18) from the bioluminescent bacterium Photorhabdus luminescens (19). The fatty aldehydes produced by both these routes (16, 19) can be coupled to cyanobacterial ADOs for their subsequent conversion to alkanes, a unique and chemically difficult process (20).
Because of the unique reaction it catalyzes, ADO has attracted considerable attention, and several reports on its function have reached differing conclusions (21). Crystal structures of both WT and single-point mutant ADOs from P. marinus MIT9313 were reported by Khara et al. in 2013 (22). ADOs have been proposed to belong to the superfamily of ferritin-like di-iron proteins and to contain a di-iron center (23, 24). The structures indicate that ADO adopts an α-helical fold with two iron atoms coordinated by two histidine and four glutamate residues acting as metal ligands (3, 22, 25, 26). ADO-catalyzed reactions invariably involve molecular oxygen and require an external reducing system, either chemical or biological, to supply electrons (24, 27-29). The chemical reducing system consists of phenazine methosulfate and NADH (24), and the biological reducing system involves ferredoxin (Fd), ferredoxin reductase (FR), and NADPH (3, 27). Both systems reduce the diferric form of the enzyme to its active diferrous form at the beginning of each turnover. It has been proposed that an iron-peroxo species, generated on oxygen binding, attacks the aldehyde substrate, forming a hemiacetal and causing scission of the C1-C2 bond (28, 30). The C1-derived product was initially hypothesized to be carbon monoxide but was later shown to be formate (3, 27, 28, 31). ADO was further found to be inhibited by the hydrogen peroxide (H2O2) formed during the reaction, and supplementing the depleted assays with catalase restored ADO activity (32). Zhang et al. (33) later showed that ADO reconstituted in vitro with an endogenous cyanobacterial reducing system exhibited greater activity and formed less H2O2. Moreover, fusing ADO with its cognate FR and Fd improved its activity by almost 3-fold (34).
Mechanistic studies have demonstrated that a radical intermediate is involved in the ADO-catalyzed reaction, and a possible catalytic process has been proposed based on the crystal structures of ADO from Synechococcus elongatus PCC7942 (35-38). Recently, ADO was engineered to improve its specificity toward short- to medium-chain aldehydes for the production of short-chain alkanes (22). The role of cysteine residues has been investigated in ADO from Nostoc punctiforme PCC 73102, where Cys-71 was shown to be involved in maintaining the activity, structure, and stability of the enzyme (39). Certain amino acid residues in the vicinity of the substrate channel have been identified that change the substrate chain-length selectivity of ADO (21). ADO belongs to the nonheme dinuclear iron oxygenase family of enzymes, which includes methane monooxygenase, type I ribonucleotide reductase, and ferritin (3, 35, 40-42). In these structures, bound substrate analogs such as fatty acids, fatty alcohols, or fatty aldehydes have been observed in the vicinity of the di-iron center (3, 22, 35, 40). The active site of the enzyme is housed within an antiparallel 4-helix bundle in which each of the two iron atoms is coordinated by a histidine and two glutamic acid residues from the protein. A few amino acids close to the di-iron center have been identified that exert beneficial effects on ADO activity (43). However, no studies on the thermostability of ADO have been reported so far.
Thermostable enzymes with longer operational stability at higher temperatures offer robust catalysts capable of withstanding the comparatively stringent environments of industrial processing (44). Given the biosynthetic potential of ADOs for industrial alkane production, we designed a robust metagenome pipeline to generate a synthetic thermostable consensus sequence. We then compared its activity in vivo and in vitro with that of an efficient mesophilic ADO selected on the basis of earlier reports (3) (Fig. 1). This was followed by a comparative computational evaluation of the mesophilic ADO and its thermophilic counterpart to identify amino acid residues responsible for imparting thermostability. Based on this analysis, four single-mutant variants and their combination were generated and assessed for their impact on the catalytic activity and thermostability of the mesophilic ADO from N. punctiforme. The thermostable ADO generated in this study is expected to sustain enzyme function and alkane formation for longer durations under in vivo conditions at higher temperatures.

Design of in silico pipeline for generation of consensus thermostable ADO
In the quest to identify thermostable ADO enzymes functional at higher temperatures, we designed a pipeline to construct a consensus thermostable ADO using hot-spring metagenome libraries of cyanobacterial communities (Fig. 1). Metagenome libraries of 41 hot-spring microbial communities were screened for ado gene homologs using the ado gene from S. elongatus PCC7942 (se-ado) as the query sequence. This search retrieved a total of 80 ado gene homologs. Of these, only 45 showed significant BLASTN alignment (45) with the se-ado gene and were selected for further analysis. The sequences were translated, and only one sequence per genus was retained, which left 10 unique ado gene homologs. The BLAST results for these homologs indicated that their closest hits were cyanobacterial species belonging to diverse genera (Table S1). These homologs were then used to generate a consensus thermostable ADO protein (Cts-ADO) sequence using the Jalview software (Fig. S1A). The gene encoding this sequence (cts-ado) was synthesized after codon optimization (Fig. S1B) for expression in Escherichia coli and was then expressed in E. coli to analyze its ability to produce alkanes in vivo.
To test the activity of recombinant Cts-ADO in vivo at different temperatures, cells were grown at 30, 37, and 42°C in culture medium supplemented with 100 mg/liter of the exogenous aldehyde hexadecanal as the ADO substrate. The resulting alkane was analyzed in the extracellular medium by GC-MS/MS. A synthetic gene encoding an efficient ADO from the mesophilic cyanobacterium N. punctiforme (np-ado) (3) was used as a control. At 30°C, the pentadecane level of the E. coli strain expressing np-ado was significantly higher than that of the strain expressing cts-ado (Fig. 2A). This could be attributed to a higher in vivo soluble expression level of np-ado compared with cts-ado, as monitored by Western blot analysis (Fig. 2B). However, whereas np-ado exhibited a 1.7-fold decline in pentadecane production when the growth temperature was raised from 30 to 42°C, cts-ado showed a remarkable 3.8-fold increase under the same conditions. We also noticed that although the soluble expression of np-ado declined at 37°C, the soluble expression of cts-ado improved at this temperature relative to 30°C (Fig. 2B). Interestingly, soluble ADO expression at 42°C did not change relative to 37°C in either E. coli strain (Fig. 2B), yet the alkane titer increased only in the case of cts-ado. This finding indicated a higher specific activity of the ADO encoded by cts-ado at higher temperatures.
Nevertheless, in addition to temperature, the soluble expression of ADOs in vivo also had a major impact on the alkane titer (Fig. 2, A and B). We therefore decided to monitor the specific activities of these ADOs in vitro. This approach also circumvented the limitation of not being able to test activity above 42°C in vivo in E. coli. Each protein was recombinantly expressed in E. coli and purified by metal-affinity chromatography (Fig. S2), and the in vitro assay was conducted as described under "Experimental procedures." The in vitro assay showed that Cts-ADO had maximum activity at 320 K (~47°C), 2.5-fold higher than its activity at 300 K (~27°C) (Fig. 3 and Table S2). It also retained significant activity at the much higher temperatures of 330 K (~57°C) and 333 K (~60°C), close to its activity at 300 K (~27°C). In contrast, Np-ADO showed a minor increase in activity from 300 to 310 K. Its activity then declined continuously with further increases in temperature, and no activity was detectable at 330 K (Fig. 3 and Table S2).
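The kelvin and Celsius values quoted throughout this section are related by T(°C) = T(K) - 273.15. The minimal Python check below applies this conversion to the assay temperatures mentioned above. When rounded to whole degrees, it reproduces the approximate Celsius values given in the text.

```python
# Assay temperatures mentioned in the text, in kelvin
ASSAY_TEMPERATURES_K = [300, 310, 320, 330, 333]


def kelvin_to_celsius(t_k):
    """Convert an absolute temperature in kelvin to degrees Celsius."""
    return t_k - 273.15


for t_k in ASSAY_TEMPERATURES_K:
    # rounded to whole degrees, matching the "~" values in the text
    print(f"{t_k} K ~ {kelvin_to_celsius(t_k):.0f} °C")
```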

Computational assessment of operational stability of ADOs at different temperatures
The in vivo and in vitro functional differences between the mesophilic ADO and its thermophilic counterpart prompted us to examine the molecular events that might contribute to protein stability and activity at higher temperatures, using computational modeling and simulation methods (46-51). Experimentally determined crystal structures of ADO are available in the PDB (35, 40) with different bound substrate analogs such as fatty acids, fatty alcohols, or fatty aldehydes (3, 22, 35, 40). From these, 2OC5 and 4TW3 from Prochlorococcus marinus MIT9313 and 4RC5 from S. elongatus PCC7942 were selected as templates. The amino acid sequences of these entries were retrieved and aligned to the target protein sequences with ClustalW in preparation for homology modeling. Of the 20 models generated with MODELLER 9.15 (54), the best model (Fig. 4) was selected based on the lowest DOPE score. The φ and ψ distributions of its nonglycine and nonproline residues are summarized in a Ramachandran plot (Fig. S3). As expected from previous reports (3, 22, 35, 40), the protein adopts an α-helical fold with two iron atoms (Fe1 and Fe2) coordinated by histidine residues and carboxylate ligands. The modeled structures of Np-ADO and Cts-ADO appeared quite similar, each consisting of eight α-helices connected by loops.

Figure 1. Schematic representation of the study design. A metagenome pipeline was constructed for generation of the consensus thermostable ADO (cts-ado), which was further compared with its mesophilic counterpart for the prediction of amino acid residues responsible for imparting thermostability. Computational assessment and in vivo and in vitro activity measurements were incorporated in the study to evaluate the mutants at higher temperatures.
Helices H1, H2, H4, and H5 formed a compact structure that housed the two iron atoms and constituted the catalytic core of the protein. The iron atoms are coordinated by six conserved amino acid residues acting as metal ligands:

- Glu-32 from helix H1
- Glu-60 and His-63 from helix H2
- Glu-115 from helix H4
- Glu-144 and His-147 from helix H5

Notable differences between the two structures were visible in loop L2, connecting helices H1 and H2, and in loop L6, connecting helices H5 and H6 (Fig. 4). These differences warranted further exploration.
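The Ramachandran plot mentioned above summarizes two backbone dihedral angles. φ is defined by atoms C(i-1), N, Cα, C, and ψ by atoms N, Cα, C, N(i+1). The minimal Python sketch below computes a signed dihedral angle from four atomic coordinates. It is not the tooling used in the study, and the coordinates in the usage lines are hypothetical.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees, between -180 and 180) for atoms p0..p3."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)  # unit vector of central bond
    # project the outer bonds onto the plane perpendicular to the central bond
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates: central bond along x, outer atoms at right angles
print(dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1)))
```

For a residue, φ would be `dihedral(C_prev, N, CA, C)` and ψ would be `dihedral(N, CA, C, N_next)`, using that residue's backbone atom coordinates.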
The modeled structures were used as starting points for explicit 100-ns molecular dynamics (MD) simulations to observe changes in the conformational behavior of the proteins with temperature. MD simulations performed at different temperatures can provide valuable information on protein unfolding. For each simulation, the root mean square deviation (r.m.s.d.) of the protein backbone was calculated, and the results are plotted in Fig. 5. Both the mesophilic and thermophilic proteins were mostly stable between 300 and 320 K (~27-47°C), with average r.m.s.d. values generally within 2-2.5 Å over the 100-ns simulations. However, increasing the temperature to 330 K (~57°C) markedly increased the r.m.s.d. fluctuations of the mesophilic ADO. In contrast, the r.m.s.d. of the thermophilic ADO continued to show an average deviation of less than 2.5 Å from the starting structure, indicating its relative stability. Further increasing the temperature to 340 K (~67°C) resulted in a complete loss of stability in the mesophilic ADO, whereas the thermophilic ADO remained stable, with r.m.s.d. values averaging between 1 and 3 Å over the 100-ns simulations.
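Backbone r.m.s.d. is the square root of the mean squared displacement of corresponding atoms relative to a reference, here the starting structure. The Python sketch below assumes that each frame has already been superposed onto the reference. MD analysis tools normally perform this least-squares fit before computing r.m.s.d. The coordinates in the usage lines are hypothetical.

```python
import math


def rmsd(frame, reference):
    """r.m.s.d. between two pre-superposed sets of (x, y, z) atom coordinates."""
    if len(frame) != len(reference) or not frame:
        raise ValueError("frames must contain the same non-zero number of atoms")
    total = 0.0
    for (x, y, z), (rx, ry, rz) in zip(frame, reference):
        total += (x - rx) ** 2 + (y - ry) ** 2 + (z - rz) ** 2
    return math.sqrt(total / len(frame))


def rmsd_series(trajectory):
    """r.m.s.d. of every frame relative to the first (starting) frame."""
    reference = trajectory[0]
    return [rmsd(frame, reference) for frame in trajectory]


# Hypothetical two-atom "trajectory": the second frame is displaced by 1 Å in x
start = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
moved = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
series = rmsd_series([start, moved])
print(series, sum(series) / len(series))  # per-frame values and their average
```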
Comparisons between mesophilic and thermophilic proteins have led to the postulate that rigidity is a prerequisite for improved thermostability (55, 56). Large thermal fluctuations within flexible regions can expose the hydrophobic core of a protein to water penetration, triggering unfolding (57). To identify thermally sensitive regions of ADO, we took the representative structures of the largest clusters at each temperature and plotted and analyzed the secondary structures of Np-ADO and Cts-ADO (Fig. 6). This analysis revealed 10 distinct changes spread across the entire Np-ADO structure when the temperature was increased from 300 to 320 K and to 340 K (Fig. 6A, panels 300 K, 320 K, and 340 K, with changes marked in red). Most of these changes involved elongation or shortening of loops or the loss or formation of additional loops. Changes were also observed in the structure of Cts-ADO when the temperature was increased from 300 to 320 K and then to 340 K. However, these changes were far less pronounced than those in Np-ADO under the same conditions (Fig. 6, A and B). A distinct pattern was observed in loops L2 and L6 of both enzymes as the temperature increased. In Np-ADO, these loops became progressively distorted as the temperature was raised from 300 K. In Cts-ADO, they remained structured when the temperature was raised from 300 to 320 K and, to a certain extent, to 340 K. Changes in loop L6, which connects helices H5 and H6, may affect the flexibility of helix H5, which has been shown to be important for optimal ADO activity (35). Loop L2, which connects helices H1 and H2, was also found to be important for enzyme activity, because these helices are directly involved in substrate binding (35).
These differences in how the Np-ADO and Cts-ADO structures respond to increasing temperature may explain why the catalytic activity of Cts-ADO increased at higher temperatures under both in vivo and in vitro conditions, whereas that of Np-ADO decreased.
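The secondary-structure comparison above relies on one representative structure from the largest cluster of MD frames at each temperature. The study generated these representatives with the CPPTRAJ module of AmberTools, whose clustering offers several algorithms. The Python sketch below is a simplified stand-in, not CPPTRAJ's method. It applies greedy "leader" clustering to a precomputed pairwise r.m.s.d. matrix and then returns the medoid of the largest cluster. The cutoff and the toy distance matrix are hypothetical.

```python
def leader_clusters(dist, cutoff):
    """Greedy leader clustering of frames, given a symmetric distance matrix."""
    leaders = []
    clusters = []
    for i in range(len(dist)):
        for k, leader in enumerate(leaders):
            if dist[i][leader] <= cutoff:
                clusters[k].append(i)  # join the first cluster whose leader is close
                break
        else:
            leaders.append(i)  # no leader within cutoff: start a new cluster
            clusters.append([i])
    return clusters


def largest_cluster_representative(dist, cutoff):
    """Index of the medoid frame of the largest cluster (first one wins ties)."""
    largest = max(leader_clusters(dist, cutoff), key=len)
    # medoid: member with the smallest summed distance to the rest of its cluster
    return min(largest, key=lambda i: sum(dist[i][j] for j in largest))


# Hypothetical pairwise r.m.s.d. matrix (Å) for four frames
toy_dist = [
    [0.0, 1.0, 2.0, 9.0],
    [1.0, 0.0, 1.0, 9.0],
    [2.0, 1.0, 0.0, 9.0],
    [9.0, 9.0, 9.0, 0.0],
]
print(largest_cluster_representative(toy_dist, cutoff=2.0))
```

In the toy matrix, frames 0 to 2 form the largest cluster. Frame 1 has the smallest summed distance to the other members and is chosen as the representative.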
We further assessed how these dynamic variations affected, at increasing temperatures, the positioning of active-site residues and their interactions with the iron atoms (Fe1 and Fe2) and the substrate ligand. These changing interactions were visualized using LigPlot+ version 2.1 (Fig. 7) (58). Changes in the conformation of the helices forming the binding pocket altered the interactions between the substrate ligand and the residues lining the substrate channel; these differences are summarized in Fig. 8. Previous studies have shown that the ligand approaches the di-iron cluster from the side opposite the His-148 residue, with its headgroup binding directly to Fe2, which is the preferred iron for substrate binding (21, 35). We observed similar interactions for Np-ADO at 300 K, where the ligand appeared to be bound to Fe2 (Fig. 7, Np-ADO, 300 K; Fig. 8, Np-ADO, 300 K). The distance between the ligand headgroup and Fe2 was 2.1 Å, compared with 4.0 Å between the ligand and Fe1 (Fig. S4A, 300 K). After the temperature was increased to 320 and 340 K, steric hindrance from Tyr-123 on helix H4 appeared to alter substrate binding, shifting it toward Fe1 (Fig. 8, Np-ADO, 320 K; Fig. S4A, 320 K and 340 K). This shift could lead to a loss of activity, because Fe2 is the preferred iron for substrate binding. Interestingly, the reverse trend was observed for Cts-ADO. Here, the ligand headgroup interacted with Fe1 at 300 K, at a distance of only 2.1 Å from Fe1 compared with 3.9 Å from Fe2 (Fig. 7, Cts-ADO, 300 K; Fig. 8, Cts-ADO, 300 K; and Fig. S4B).
Analysis of the active-site residues showed that at 300 K, Tyr-40 also interacted with Fe2 in Cts-ADO, possibly hindering substrate binding (Fig. 7, Cts-ADO, 300 K). Tyr-40 is located on helix H1 and is not one of the active-site residues. However, when the temperature was increased to 320 and 340 K, the substrate was repositioned, and its distance from Fe2 decreased from 3.9 Å to 2-2.1 Å (Fig. 7, Cts-ADO; Fig. 8, Cts-ADO; and Fig. S4B, 320 and 340 K). These observations may explain why the activity of Cts-ADO increased with temperature, whereas that of Np-ADO decreased under the same conditions.
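The iron-ligand distances discussed above are ordinary Euclidean distances between atom positions. The minimal Python sketch below computes both Fe distances for a ligand headgroup atom and reports which iron is closer. Treating the closer iron as the bound one is a simplifying assumption. The coordinates are hypothetical and were chosen only to reproduce the Np-ADO 300 K distances quoted in the text.

```python
import math


def closest_iron(headgroup, fe1, fe2):
    """Return (closer iron label, distance to Fe1, distance to Fe2), in Å.

    Assumes coordinates are in Å; "closer" is used as a simple proxy for the
    iron the headgroup binds (an illustrative assumption, not a bonding test).
    """
    d1 = math.dist(headgroup, fe1)
    d2 = math.dist(headgroup, fe2)
    return ("Fe1" if d1 < d2 else "Fe2"), d1, d2


# Hypothetical coordinates: 4.0 Å to Fe1 and 2.1 Å to Fe2, as reported for Np-ADO at 300 K
head = (0.0, 0.0, 0.0)
fe1 = (0.0, 4.0, 0.0)
fe2 = (2.1, 0.0, 0.0)
print(closest_iron(head, fe1, fe2))
```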
We further analyzed the change in surface hydrophobicity of Np-ADO and Cts-ADO at different temperatures using the preset option in the Chimera software, which determines hydrophobicity using the Kyte and Doolittle scale (Fig. S5) (59). The number of hydrophobic residues on the surface of Np-ADO increased when the temperature was raised from 300 to 320 K and further to 340 K, indicating unfolding of the protein. In contrast, the surface of the thermostable Cts-ADO remained predominantly hydrophilic as the temperature increased (Fig. S5). This behavior is also consistent with our in vivo observation that the solubility of Np-ADO decreased at higher temperature, whereas that of Cts-ADO increased (Fig. 2B).
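Chimera colors the molecular surface by per-residue Kyte-Doolittle hydropathy. As a hedged way to summarize such a surface numerically, the Python sketch below averages the Kyte-Doolittle values over a set of surface residues. It also counts the fraction of those residues that are hydrophobic, defined here as having a positive value. Selecting the surface-exposed residues, for example by a solvent-accessibility cutoff, is assumed to happen beforehand. The residue strings are hypothetical.

```python
# Kyte-Doolittle hydropathy index (positive values = hydrophobic)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(residues):
    """Average Kyte-Doolittle value over one-letter residue codes."""
    if not residues:
        raise ValueError("no residues given")
    values = [KYTE_DOOLITTLE[r] for r in residues.upper()]
    return sum(values) / len(values)


def hydrophobic_fraction(residues):
    """Fraction of residues with a positive Kyte-Doolittle value."""
    if not residues:
        raise ValueError("no residues given")
    return sum(KYTE_DOOLITTLE[r] > 0 for r in residues.upper()) / len(residues)


# Hypothetical surface residue sets at two simulation temperatures
for label, surface in [("300 K", "KEDRSTNQAE"), ("340 K", "KEDRSLIVAF")]:
    print(label, round(mean_hydropathy(surface), 2), hydrophobic_fraction(surface))
```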

Identification and functional validation of amino acid residues conferring thermostability to ADO
To identify amino acid residues conferring thermostability to ADO, we noted the regions that fluctuated with increasing temperature in the mesophilic ADO but not in the thermophilic ADO (Fig. 6). The protein sequences of Cts-ADO and Np-ADO were further aligned with the ADO sequences of the thermophile Thermosynechococcus elongatus and the mesophile S. elongatus PCC7942 (Fig. S6). Residues in the highly fluctuating regions were analyzed, and those not conserved between the mesophilic and thermophilic ADOs were noted. This analysis predicted four residues in Np-ADO: His-51, Asp-132, Ala-161, and Thr-191. In Cts-ADO, these are replaced by lysine (H51K), proline (D132P), glutamate (A161E), and isoleucine (T191I), respectively. Structural analysis revealed that all of these residues lie in surface-exposed regions of Np-ADO and are unlikely to interfere directly with the active site of the protein (Fig. S7).
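The residue-selection step above can be sketched as a scan over an aligned mesophile/thermophile pair. The scan reports positions that differ within the flexible regions and names them in mesophile numbering, for example A161E. This Python helper is a simplified, hypothetical illustration. The study also checked conservation against the T. elongatus and S. elongatus ADOs, which the sketch omits. The toy sequences and region bounds are invented.

```python
def candidate_substitutions(meso, thermo, regions, gap="-"):
    """List mesophile-to-thermophile substitutions inside flexible regions.

    `meso` and `thermo` are rows of the same alignment; `regions` holds
    inclusive (start, end) ranges in ungapped 1-based mesophile numbering.
    """
    hits = []
    resnum = 0  # position in the ungapped mesophilic sequence
    for m, t in zip(meso, thermo):
        if m == gap:
            continue  # insertion in the thermophile: no mesophilic residue number
        resnum += 1
        if t == gap or m == t:
            continue  # deletion in the thermophile, or a conserved residue
        if any(start <= resnum <= end for start, end in regions):
            hits.append(f"{m}{resnum}{t}")
    return hits


# Hypothetical aligned toy sequences and fluctuating regions
print(candidate_substitutions("MAH-DLT", "MKHGPLI", [(2, 3), (4, 6)]))
```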
The four naturally occurring substitutions, H51K, D132P, A161E, and T191I, were analyzed for their role in imparting thermostability by introducing each of them individually into the mesophilic protein, Np-ADO, using site-directed mutagenesis. All mutants were successfully overexpressed in E. coli DH5α, and the in vivo activity of the enzymes was estimated at different temperatures (Fig. 2), as described in the earlier section. Pentadecane formation by the WT declined from 3.25 mg/g DCW at 30°C to 1.96 mg/g DCW at 42°C, whereas all four mutants showed improved activity at higher temperatures. The improvement was prominent from 30 to 37°C in all mutants, and the A161E and H51K mutants showed further improved activity at 42°C.
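The WT titers above can be checked against the 1.7-fold decline reported earlier for np-ado. The Python sketch below expresses fold change as the ratio of the larger to the smaller value, with the direction of change reported separately. This is a common convention, assumed here rather than stated in the study. It gives 3.25/1.96 ≈ 1.66, which rounds to 1.7.

```python
def fold_change(before, after):
    """Fold change as a ratio >= 1 (values must be positive); direction is separate."""
    if before <= 0 or after <= 0:
        raise ValueError("titers must be positive")
    return max(before, after) / min(before, after)


wt_30c, wt_42c = 3.25, 1.96  # mg pentadecane per g DCW, values from the text
direction = "decline" if wt_42c < wt_30c else "increase"
print(f"{fold_change(wt_30c, wt_42c):.1f}-fold {direction}")
```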
The activities of all the mutants were further analyzed in vitro after purification by metal-affinity chromatography (Fig. S2). Consistent with the in vivo results, the A161E and H51K mutants exhibited 2- and 1.6-fold increases in pentadecane production at 320 K (~47°C), respectively, compared with their activities at 300 K (~27°C) (Fig. 3 and Table S2). These two mutants shared the temperature optimum of 320 K for specific activity with Cts-ADO. The other two mutants and the WT mesophilic Np-ADO behaved similarly, showing a small improvement in activity at 310 K (~37°C) followed by a decline at 320 K and beyond (Fig. 3 and Table S2). These results indicated that the substitutions at positions 161 and 51 were mainly responsible for imparting thermostability to Cts-ADO. The replacement of alanine at position 161 with glutamate suggests that substituting a neutral amino acid with an acidic one helped enhance thermostability. The replacement of histidine with lysine at position 51 substitutes one basic amino acid for another. However, the two side chains differ in pKa (lysine pKa ~10.53 and histidine pKa ~6.0), and lysine lacks an imidazole ring. These differences may confer different properties on the protein surface and thus contribute to the thermostability of Cts-ADO.

Tyr-40 residue (colored green) appeared to interact with Fe2 in Cts-ADO at 300 K; at higher temperature, this interaction was lost because of a shift in the position of helix H1. This rearrangement may have moved the ligand closer to Fe2 in Cts-ADO at 320 and 340 K. Representative structures of the largest cluster were taken for each enzyme at different temperatures from the respective MD trajectories using the CPPTRAJ module of AmberTools. The interactions between the iron atoms, active-site residues, and substrate ligand (LC16) were viewed using LigPlot+ version 2.1.
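To make the pKa argument concrete, the Henderson-Hasselbalch relation gives the fraction of a basic side chain that carries a proton, and hence a positive charge, at a given pH. The Python sketch below uses the free side-chain pKa values quoted above. It assumes pH 7.0 purely for illustration, because no pH is specified here. Local environments in a folded protein can shift these pKa values.

```python
def fraction_protonated(pka, ph):
    """Henderson-Hasselbalch: fraction of a basic group in its protonated form."""
    return 1.0 / (1.0 + 10 ** (ph - pka))


PH = 7.0  # assumed pH for illustration only
for name, pka in [("His", 6.0), ("Lys", 10.53)]:
    print(f"{name}: {fraction_protonated(pka, PH):.3f} protonated at pH {PH}")
```

Under these assumptions, histidine is mostly neutral (about 9% protonated), whereas lysine is almost fully protonated. The H51K substitution would therefore add a positive charge to this surface position.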

Evaluation of kinetic parameters of ADO variants at different temperatures
Steady-state kinetic studies were performed at different temperatures on Np-ADO, Cts-ADO, and the A161E mutant, which gave the greatest improvement in the specific activity of Np-ADO at higher temperatures. We also constructed two combination mutants of Np-ADO for kinetic evaluation. The double substitution (H51K/A161E) mutant combined the two substitutions that gave the most favorable thermostability experimentally. The quadruple substitution (H51K/D132P/A161E/T191I) mutant combined all four substitutions predicted in silico to confer thermostability. As expected, the kinetic efficiency (kcat/Km) of Np-ADO declined 1.7-fold when the temperature was increased from 27 to 47°C, whereas that of Cts-ADO increased 1.6-fold under the same conditions (Table 1 and Fig. S8). Moreover, Cts-ADO retained significant activity at 60°C, whereas no activity was observed for Np-ADO. Interestingly, the A161E mutant of Np-ADO showed kinetic efficiency similar to that of Cts-ADO at 27 and 47°C and higher at 60°C, which could be attributed to the lower affinity of Cts-ADO at 60°C. The double mutant of Np-ADO showed even higher kinetic efficiency at 60°C than the A161E mutant. The quadruple mutant showed higher kinetic efficiency at 27 and 47°C but lower at 60°C (Table 1 and Fig. S8). These results indicated that, at higher temperature, combining all four substitutions offered no further advantage over the double mutant carrying the A161E and H51K substitutions. They also narrowed the candidates to two major residues responsible for imparting thermostability to the enzyme, i.e. H51K and A161E. To understand how these residues individually contributed to thermal tolerance, their structural implications were explored further.
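The kcat/Km values above come from Michaelis-Menten analysis. The Python sketch below is a minimal, dependency-free least-squares fit of v = Vmax·S/(Km + S). It is not necessarily the fitting procedure or software used in the study. For any fixed Km, the best Vmax has a closed form, so only Km is searched, using golden-section search on log(Km). This assumes a single minimum within the search bracket. The enzyme concentration and the synthetic data are hypothetical.

```python
import math


def fit_michaelis_menten(s, v, km_lo=1e-6, km_hi=1e3, iters=100):
    """Least-squares fit of v = Vmax * S / (Km + S); returns (Vmax, Km)."""
    def sse_and_vmax(km):
        x = [si / (km + si) for si in s]
        # optimal Vmax for this Km (linear least squares through the origin)
        vmax = sum(vi * xi for vi, xi in zip(v, x)) / sum(xi * xi for xi in x)
        sse = sum((vi - vmax * xi) ** 2 for vi, xi in zip(v, x))
        return sse, vmax

    a, b = math.log(km_lo), math.log(km_hi)
    g = (math.sqrt(5) - 1) / 2  # inverse golden ratio
    c, d = b - g * (b - a), a + g * (b - a)
    for _ in range(iters):
        if sse_and_vmax(math.exp(c))[0] < sse_and_vmax(math.exp(d))[0]:
            b, d = d, c  # minimum lies in [a, d]
            c = b - g * (b - a)
        else:
            a, c = c, d  # minimum lies in [c, b]
            d = a + g * (b - a)
    km = math.exp((a + b) / 2)
    return sse_and_vmax(km)[1], km


def kinetic_efficiency(vmax, km, enzyme_conc):
    """kcat = Vmax / [E] (consistent units assumed); returns (kcat, kcat / Km)."""
    kcat = vmax / enzyme_conc
    return kcat, kcat / km


# Synthetic, noise-free data from hypothetical parameters (Vmax = 10, Km = 2);
# these are NOT the study's measurements.
substrate = [0.5, 1, 2, 4, 8, 16]
rates = [10 * si / (2 + si) for si in substrate]
vmax, km = fit_michaelis_menten(substrate, rates)
kcat, eff = kinetic_efficiency(vmax, km, enzyme_conc=0.5)
print(f"Vmax={vmax:.3f}, Km={km:.3f}, kcat={kcat:.3f}, kcat/Km={eff:.3f}")
```

With real, noisy data, a standard nonlinear regression routine that also reports parameter uncertainties would be preferable. The closed-form Vmax step shown here simply keeps the sketch short and free of dependencies.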

Probing the structural significance of the beneficial mutations A161E and H51K in silico
The performance of the ADO mutants under both in vivo and in vitro conditions pointed to two substitutions that endow the ADO enzyme with improved thermostability. Because a single mutation is unlikely to substantially change the overall stability of a protein, it was important to analyze the changes in secondary structure caused at the mutation site. For this purpose, WT Np-ADO was remodeled after introducing the single amino acid substitutions. The positions of the four mutated residues are indicated on WT Np-ADO in Fig. S9. The two substitutions A161E and H51K, located on loops L6 and L2, respectively, showed improved activity at 320 K and were selected for further analysis. 100-ns MD simulations were used to examine the flexibility changes of the A161E and H51K variants, and the protein backbone r.m.s.d. was calculated to assess the stability of these mutants (Fig. S10).

Table 1. Kinetic parameters of different ADOs at varying temperatures
The Michaelis-Menten graphs used for determination of kinetic parameters are given in Fig. S8.

enzyme assay results (Fig. 3) as discussed in the previous section, in which it showed almost 2-fold higher product formation at 320 K (~47°C). In its MD simulation at 340 K, the H51K mutant showed the lowest deviation from the starting structure toward the end of the 100 ns. However, this r.m.s.d. result was not reflected in the in vitro results, where the activity of H51K declined continuously beyond 320 K.
To understand how the secondary structures varied with temperature, the representative structures of the largest clusters at different temperatures were analyzed for the mutants (Fig. S11) and compared with those of Np-ADO and Cts-ADO (Fig. 6). The secondary structure analysis revealed that the naturally occurring substitution A161E lies at a transition between helices, on loop L6 connecting helices H5 and H6 (Fig. S9). The site of the second most effective substitution, H51K, lies at the beginning of helix H2, adjacent to loop L2 connecting helices H1 and H2 (Fig. S9). Introducing these single amino acid substitutions into WT Np-ADO resulted in significant variations in the overall secondary structures of the mutants (Fig. S11). As observed in Fig. 6, shortening of loop L6 at 340 K in the thermophile appeared to be important for the stability of the protein. In the A161E mutant (Fig. S11, panel A, 320 K), shortening of the loop together with an additional turn appeared to promote stability while maintaining the active conformation. To understand how these substitutions affected substrate binding, the active-site interactions were probed using LigPlot+ version 2.1, and the distances between the substrate and the iron atoms were recorded (Fig. S12). In the A161E mutant, the ligand appeared to be bound to Fe1 at 300 K and shifted to Fe2 when the temperature was increased to 320 K. A further temperature increase shifted the ligand back to Fe1, probably resulting in low activity. At 300 K, the H51K mutant showed a pattern similar to that of WT Np-ADO, with the substrate preferentially binding Fe2 (Fig. S12B and Fig. 8A). At 320 K, the ligand appeared to shift to Fe1 in WT Np-ADO (Fig. 8A) but remained bound to Fe2 in the H51K mutant. H51K thereby retained its activity, as shown in the in vitro study (Fig. 3).

Characterization of the thermostable mutants using differential scanning fluorimetry and CD spectroscopy
Differential scanning fluorimetry with SYPRO Orange was used to determine the Tm of WT Np-ADO, Cts-ADO, and the most effective single-mutant variant, A161E. SYPRO Orange is an environmentally sensitive dye that binds to hydrophobic patches exposed as a protein unfolds. This binding produces a large increase in fluorescence, which is used to monitor the unfolding transition (60). In our studies (Table 2 and Fig. 9), the Tm of the thermostable mutant A161E was 5°C higher than that of the WT. The observed Tm values for the A161E mutant and the consensus Cts-ADO were 58 and 60°C, respectively, in agreement with the in silico and in vitro studies. To further assess the impact of protein engineering on the overall secondary structure of the proteins, the circular dichroism (CD) spectra of WT Np-ADO, Cts-ADO, and the A161E mutant were compared. The CD spectra indicated that all three proteins had similar structures at 25°C, consistent with all of them being active under physiological conditions (Fig. 10 and Table S3). The secondary structure analysis revealed a predominance of α-helices (87%) in the WT, its mutant, and the thermostable consensus protein, in agreement with the homology modeling results.
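In differential scanning fluorimetry, Tm is commonly taken as the inflection point of the melt curve, where fluorescence rises most steeply. Fitting a sigmoidal (Boltzmann) model is a common alternative. The study does not state which method it used, so the Python sketch below illustrates only the first-derivative approach, using central finite differences. The melt curve is synthetic and hypothetical.

```python
import math


def tm_from_melt_curve(temps, fluor):
    """Temperature of the steepest fluorescence rise (maximum of dF/dT).

    Uses central finite differences; assumes `temps` increase monotonically
    and the curve contains a single unfolding transition.
    """
    if len(temps) != len(fluor) or len(temps) < 3:
        raise ValueError("need at least three paired (T, F) points")
    best_t, best_slope = temps[1], -math.inf
    for i in range(1, len(temps) - 1):
        slope = (fluor[i + 1] - fluor[i - 1]) / (temps[i + 1] - temps[i - 1])
        if slope > best_slope:
            best_t, best_slope = temps[i], slope
    return best_t


# Hypothetical noise-free melt curve: logistic transition centered at 58 °C
temps = [25 + 0.5 * i for i in range(141)]  # 25.0 to 95.0 °C in 0.5 °C steps
fluor = [1 / (1 + math.exp(-(t - 58) / 1.5)) for t in temps]
print(tm_from_melt_curve(temps, fluor))
```

Real DSF traces usually decline after the peak as the dye-protein aggregates quench. In practice, the derivative search is therefore restricted to the rising portion of the curve, and the data are often smoothed first.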

Discussion
Consolidated bioprocessing relies on a single microbial platform in which a blend of enzymes works synergistically under similar environmental conditions to produce industrially relevant fuels and chemicals. It is therefore important to rationally engineer proteins for improved thermal stability, so that they can withstand the harsh conditions of biorefineries while conserving their catalytic efficiency (61). For this purpose, it is crucial to identify the key structural determinants that maintain protein structure at high temperatures while preserving overall activity. Because long-chain alkanes are ideal drop-in biofuel candidates, engineering the alkane-producing enzyme ADO is essential for the feasibility of this process.
There have been various reports of improving ADO activity using protein engineering tools (43). However, no work has been done on improving the thermostability of this enzyme. High-temperature aquatic ecosystems, such as hot springs, harbor various thermophilic microorganisms (62) and could therefore be an excellent reservoir of thermostable enzymes. Because less than 1% of the microbial community in the environment is cultivable under laboratory conditions, preparing metagenome libraries by culture-independent methods provides a comprehensive representation of the majority of microorganisms present in an environmental sample. We therefore used metagenome libraries of 41 hot-spring microbial communities and designed a thermostable ADO using an in silico pipeline constructed specifically for generating Cts-ADO. An in-depth analysis showed that Cts-ADO has higher thermal tolerance than mesophilic Np-ADO under both in vivo and in vitro conditions. Interestingly, Cts-ADO showed a much lower in vivo soluble expression level at 30°C, which increased at 37 and 42°C, whereas the reverse trend was observed for Np-ADO (Fig. 2B). This suggested that the consensus thermostable ADO is preferentially expressed in soluble form at higher temperatures. There have been reports of higher soluble expression of recombinant thermostable proteins from thermophiles; however, the exact correlation between protein thermostability and soluble expression remains to be understood (63,64). The increase in product formation with temperature that we observed in vivo for Cts-ADO (Fig. 2A) could therefore be partly due to a concomitant improvement in solubility.
Nevertheless, measurements of the specific activity and kinetic efficiency of purified Cts-ADO in vitro indicated a temperature optimum at 320 K (~47°C), with significant activity retained up to 333 K (60°C). In contrast, the specific activity and kinetic efficiency of Np-ADO declined at 320 K, and no activity was detected above 330 K. These observations demonstrate the thermostable nature of Cts-ADO and prompted us to probe the structural features contributing to its thermal stability.
Experimentally determined ADO structures from the PDB served as templates for the homology models of Cts-ADO and Np-ADO. As expected, the modeled structures of both proteins were similar, consisting of eight α-helices connected by loops, with two iron atoms at the catalytic core. Molecular dynamics simulations of the modeled structures showed distinctly stable regions in Cts-ADO at higher temperatures that may impart thermostability to the protein. Examining these regions highlighted the role of residues clustered on exposed loops in conferring thermal stability, consistent with previous reports on the role of loops in enhancing protein thermostability (65). The effect of these changes on substrate binding and on the positioning of active-site residues also needed to be analyzed. It has been shown that, of the two iron atoms Fe1 and Fe2, preferential binding of the substrate to Fe2 triggers the decarbonylation of the fatty aldehyde (35). Analysis of the interactions among the active-site residues, the substrate ligand, and the iron atoms showed that ligand binding to the irons varied with temperature. In mesophilic Np-ADO, the ligand appeared to be bound to Fe2 at 300 K and shifted to Fe1 as the temperature was increased beyond 320 K. Interestingly, the reverse was observed in Cts-ADO, where the ligand was initially bound to Fe1 at 300 K but shifted to Fe2 when the temperature was raised to 320 K.
Based on these observations, and considering the changes in the secondary structures of Np-ADO and Cts-ADO with increasing temperature, the regions showing the highest fluctuations in Np-ADO while maintaining a stable conformation in Cts-ADO were probed further to predict the key residues responsible for thermal stability. This analysis identified four single amino acid substitutions, A161E, D132P, H51K, and T191I, all lying on the protein surface (Fig. S7). When these substitutions were introduced into Np-ADO individually, A161E gave the greatest improvement in thermostability, followed by H51K. Combining these two substitutions had a favorable impact on thermostability. When all four substitutions were introduced into Np-ADO, enzyme efficiency increased up to 47°C (320 K) but declined when the temperature was further increased to 60°C (333 K).
Structural analysis indicated that Ala-161 lies at a transition between helices, on loop L6, and that His-51 lies at the beginning of helix H2, adjacent to loop L2 connecting helices H1 and H2 (Fig. S9). Flexible sites such as loops connecting secondary structure elements are potential targets for engineering enzyme stability. Various studies have highlighted the importance of loops in modulating enzyme catalysis, specificity, stability, and protein-protein interactions (66-68), and numerous reports have described successful alteration of enzyme stability by engineering loop regions (65). The A161E substitution was particularly effective in improving the thermostability of mesophilic Np-ADO, and at higher temperatures the ligand in this mutant interacted with Fe2, as in thermostable Cts-ADO. This was further supported by the melting curves, in which the A161E substitution shifted the Tm of Np-ADO close to that of Cts-ADO (Fig. 9).
This study reports, for the first time, the construction of a thermostable ADO of cyanobacterial origin through structure-guided engineering with net retention of catalytic activity. Moreover, residues contributing to thermostability were identified using MD simulations, and the predicted residues were validated through in vivo and in vitro assays. A similar approach could help improve the thermostability of AAR, the first enzyme of the alkane biosynthesis pathway, which would aid in assembling this pathway in thermostable hosts, such as Geobacillus, for optimal alkane production at higher temperatures.

Microbial strains, media, and reagents
E. coli strains and plasmids used in this study are listed in Table S4. E. coli DH5α was used as the host for molecular cloning. Fatty aldehydes and fatty alcohols used in enzyme assays and as standards for GC-MS/MS were purchased from TCI Chemicals (India) Pvt. Ltd. Ethyl acetate and methanol used for hydrocarbon extraction were procured from Merck. Codon-optimized genes for the aldehyde-deformylating oxygenase of N. punctiforme PCC 73102 (np-ado), the consensus thermostable aldehyde-deformylating oxygenase (cts-ado), and the quadruple mutant of Np-ADO (H51K/D132P/A161E/T191I) were commercially synthesized by GenScript. The kits and enzymes used for molecular biology and their sources were as follows: genomic DNA isolation kit (ZR Fungal/Bacterial DNA Miniprep, Zymo Research); plasmid miniprep, PCR, and gel extraction kits (Qiagen); Phusion DNA polymerase (Finnzymes); Taq DNA polymerase (Himedia Pvt. Ltd.); dNTPs (Fermentas); FastDigest restriction endonucleases and T4 DNA ligase (Thermo Fisher Scientific). Site-directed mutagenesis was carried out using the QuikChange II site-directed mutagenesis kit (Agilent). The primers used for molecular cloning and mutagenesis are listed in Table S5.

Bacterial strains and plasmid construction
E. coli DH5α was used for routine DNA cloning and protein expression. The codon-optimized aldehyde-deformylating oxygenase gene from N. punctiforme PCC 73102 was used to construct the WT plasmid pQE30-Np-ADO. The codon-optimized consensus thermostable cts-ado gene, designed using the metagenome pipeline described below, was cloned into the BamHI/SacI sites of pQE30 to obtain pQE30-Cts-ADO. Engineered E. coli strains were constructed by transforming E. coli DH5α with these plasmids. All engineered E. coli strains were grown in LB broth containing antibiotics at standard concentrations; ampicillin (100 μg/ml) was added to the cultures when required.

Site-directed mutagenesis
Single and double amino acid mutants were constructed using the QuikChange site-directed mutagenesis kit (Agilent) following the protocols in the kit manual. pQE30-Np-ADO plasmid DNA, isolated using the Qiagen Plasmid Miniprep Kit, served as the template for mutagenesis. Site-directed mutants were constructed using the primers listed in Table S5. For the double mutant H51K/A161E, the single mutant A161E was used as the template, into which the second mutation, H51K, was introduced. The desired mutations were confirmed by DNA sequencing at Macrogen, Korea.
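Mutations here follow the usual wild-type residue, 1-based position, new residue notation, with "/" joining substitutions (as in H51K/A161E). The helper below is a hypothetical sanity check, not part of the described workflow: it applies such a specification to a protein sequence and refuses to proceed if the stated wild-type residue does not match. The sequence used is a toy string, not Np-ADO.

```python
import re

# One substitution: wild-type residue, 1-based position, new residue (e.g. "A161E").
MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")

def apply_mutations(seq, spec):
    """Apply substitutions such as 'H51K/A161E' to a one-letter protein sequence."""
    residues = list(seq)
    for token in spec.split("/"):
        m = MUTATION.match(token.strip())
        if m is None:
            raise ValueError(f"unparseable mutation: {token!r}")
        wt, pos, mut = m.group(1), int(m.group(2)), m.group(3)
        if not 1 <= pos <= len(residues):
            raise ValueError(f"position {pos} is outside the sequence")
        if residues[pos - 1] != wt:
            raise ValueError(f"expected {wt} at {pos}, found {residues[pos - 1]}")
        residues[pos - 1] = mut
    return "".join(residues)

print(apply_mutations("MHAG", "H2K/A3E"))  # toy sequence; prints MKEG
```

Checking the wild-type residue before substituting catches numbering offsets, for example when positions are counted with or without an N-terminal tag.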

Design of synthetic thermostable ADO using metagenome pipeline
A metagenome analysis pipeline was developed to generate a consensus sequence for designing a thermostable ado gene. Numerous metagenome libraries were screened using the IMG/MER database, and only sequences from hot-spring samples were retrieved. These libraries were searched for ado gene homologs using the ado gene from S. elongatus PCC7942 as the query sequence. Only homologs with a coverage score of ≥300 in BLASTN alignments were considered further. The sequences were translated using the ExPASy Translate tool (69) and aligned using MEGA6 (70). Polypeptide sequences assigned to the same genus based on the BLAST results and showing complete alignment in MEGA6 were identified, and only one sequence per genus was selected for further analysis. Jalview (71) was then used to generate a consensus sequence, which was sent to GenScript for synthesis.
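The last two steps of the pipeline (one sequence per genus, then a consensus) can be illustrated with a simple majority-rule consensus over aligned sequences. This is a sketch, not Jalview's implementation: the gap rule (columns whose most common symbol is a gap are dropped, and residues win ties against gaps) and alphabetical tie-breaking are assumptions, and the genus labels and sequences are toy data.

```python
from collections import Counter

def one_per_genus(hits):
    """Keep the first aligned sequence seen for each genus; hits are (genus, seq) pairs."""
    chosen = {}
    for genus, seq in hits:
        chosen.setdefault(genus, seq)
    return list(chosen.values())

def majority_consensus(seqs, gap="-"):
    """Most frequent symbol per alignment column.

    Residues beat gaps on equal counts; remaining ties go to the alphabetically
    first residue (an assumption). Columns whose winner is a gap are dropped.
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    out = []
    for column in zip(*seqs):
        counts = Counter(column)
        best = max(sorted(counts), key=lambda aa: (counts[aa], aa != gap))
        if best != gap:
            out.append(best)
    return "".join(out)

# Toy aligned fragments; the second Synechococcus entry is discarded as a duplicate genus.
hits = [
    ("Synechococcus", "MKA-"),
    ("Synechococcus", "MRAA"),
    ("Thermosynechococcus", "MKAA"),
    ("Fischerella", "MRSA"),
]
print(majority_consensus(one_per_genus(hits)))  # prints MKAA
```

Keeping one sequence per genus stops a heavily sampled genus from dominating the per-column vote.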

3D molecular modeling and structure-based protein design for improved thermostability
A BLASTp search against the PDB was performed using the Np-ADO and Cts-ADO sequences in FASTA format. Multiple sequence alignment of Np-ADO and Cts-ADO was performed using the Clustal Omega web server (72). Sequence similarity, alignment quality, and the availability of substrate-bound forms were the criteria used to select templates for modeling these enzymes. PDB entries 4RC5 (from S. elongatus PCC7942) and 2OC5 and 4TW3 (from P. marinus MIT9313) were chosen as consensus templates for homology modeling of Np-ADO and Cts-ADO. The hexadecanal bound in the 4RC5 crystal structure was used as the substrate in all three enzymes. A total of 20 models were generated separately for each enzyme in the substrate-bound (hexadecanal) form using MODELLER version 9.15 (54,73). The models were ranked by DOPE statistical potential score. The PROCHECK web server (74) was used for quality assessment, including Ramachandran plots, of the best models of the three enzymes. All models were aligned using sequence similarity and secondary structure information with ESPript3 (75). The modeled structures were visualized using the PyMOL Molecular Graphics System (DeLano) (76). The two single-residue substitutions (H51K and A161E) were introduced separately into the Np-ADO model using a Python script available in MODELLER. Five models were used to study temperature-dependent dynamics, and each was simulated at five temperatures from 300 to 340 K in 10 K intervals. All molecular dynamics simulations were performed using AMBER14, and data were analyzed with the AmberTools14 suite of programs (77,78). Topology and parameter files for Np-ADO, its mutants, and Cts-ADO were constructed using the leaprc.ff99SB force field.
Parameters and atom types for the hexadecanal substrate were generated with the ANTECHAMBER module, and charges were calculated using the AM1-BCC method. Each of the five systems was solvated with TIP3P water using a 12-Å buffer, and Na+ counterions were added for neutralization. Energy minimization comprised 500 cycles of the steepest descent (SD) algorithm followed by 19,500 cycles of the conjugate gradient (CG) algorithm, for 20,000 steps in total. After minimization, the systems were heated from 0 K to 300, 310, 320, 330, or 340 K over 50 ps using a Langevin thermostat with a collision frequency of 2.0 ps⁻¹ and weak harmonic restraints of 2 kcal mol⁻¹ Å⁻² on all atoms. Heating was followed by density equilibration (50 ps) and 5 ns of constant-pressure equilibration. Production molecular dynamics simulations were then run for 100 ns at each temperature. All simulations were performed using the PMEMD module of AMBER14. A 2-fs time step was used for all stages, and bonds involving hydrogen atoms were constrained using the SHAKE algorithm. Coordinates for each of the six systems were saved in a single trajectory every 2 ps. In total, 2.7 μs of production molecular dynamics was completed. r.m.s.d., RMSF, and cluster analyses of the trajectories were performed using the CPPTRAJ module of AmberTools (79). The representative structure of the largest cluster for each enzyme at each temperature was extracted from its MD trajectory with CPPTRAJ and used to generate the structures shown in Figs. 6-8 and Figs. S5, S6, S11, and S12.
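The stated durations translate into step counts as follows. The sketch converts them into Amber mdin-style values (nstlim for MD steps, ntwx for the coordinate-writing interval in steps, maxcyc and ncyc for minimization); these are real sander/pmemd input variables, but the authors' actual input files are not given, so the mapping is an illustration. Counts are per production trajectory and exclude heating and equilibration.

```python
FS_PER_NS = 1_000_000
FS_PER_PS = 1_000

def md_schedule(production_ns=100, timestep_fs=2, save_every_ps=2):
    """Per-trajectory step and frame counts for a production run."""
    nstlim = production_ns * FS_PER_NS // timestep_fs   # total MD steps
    ntwx = save_every_ps * FS_PER_PS // timestep_fs     # steps between saved frames
    return {
        "dt_ps": timestep_fs / FS_PER_PS,  # Amber expects the time step in ps
        "nstlim": nstlim,
        "ntwx": ntwx,
        "frames": nstlim // ntwx,
    }

# Minimization: 500 steepest-descent cycles, then conjugate gradient up to 20,000 total.
minimization = {"maxcyc": 500 + 19_500, "ncyc": 500}

temperatures_k = list(range(300, 341, 10))  # 300, 310, 320, 330, 340 K
schedule = md_schedule()
for t in temperatures_k:
    print(t, "K:", schedule["nstlim"], "steps,", schedule["frames"], "frames")
```

Each 100-ns production run at a 2-fs time step therefore needs 50 million steps and, with coordinates written every 2 ps, yields 50,000 frames for the r.m.s.d., RMSF, and clustering analyses.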
Based on the r.m.s.d. and RMSF analyses, amino acid residues in thermally sensitive regions were identified. Multiple sequence alignment of mesophilic and thermophilic ADOs was performed using Jalview. Chimera was used to visually confirm that the mutations identified by these approaches lay on the surface of ADO, away from the substrate-binding site and outside the catalytic center. LigPlot+ version 2.1 was used to view the interactions among the active-site residues, the iron atoms, and the substrate ligands. All protein structures shown in this study were rendered using Chimera (80).
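RMSF measures how far each atom strays from its average position over a trajectory. CPPTRAJ's atomicfluct command computes it in the actual workflow; the NumPy sketch below shows the underlying calculation and one hypothetical way to flag positions that fluctuate more in the mesophile than in the thermophile. It assumes the frames are already superposed on a reference and that the two per-residue RMSF arrays have been mapped onto common alignment positions. The 1.0-Å cutoff is illustrative only.

```python
import numpy as np

def rmsf(coords):
    """Per-atom RMSF from a (n_frames, n_atoms, 3) array of superposed coordinates."""
    xyz = np.asarray(coords, dtype=float)
    mean_xyz = xyz.mean(axis=0)                   # average position of each atom
    sq_dev = ((xyz - mean_xyz) ** 2).sum(axis=2)  # squared displacement per frame and atom
    return np.sqrt(sq_dev.mean(axis=0))

def flag_sensitive(rmsf_meso, rmsf_thermo, delta=1.0):
    """0-based positions where the mesophile's RMSF exceeds the thermophile's by > delta.

    delta (angstroms) is a hypothetical cutoff chosen for illustration.
    """
    diff = np.asarray(rmsf_meso, dtype=float) - np.asarray(rmsf_thermo, dtype=float)
    return np.flatnonzero(diff > delta)

# Toy trajectory: two frames, two atoms; only the second atom moves.
toy = [[[0, 0, 0], [0, 0, 0]],
       [[0, 0, 0], [2, 0, 0]]]
print(rmsf(toy))
print(flag_sensitive([0.5, 2.5, 1.0], [0.4, 1.0, 1.0]))
```

The returned indices are array positions, so they still need to be mapped back to residue numbers (for example, Ala-161 in Np-ADO) before candidate substitutions are chosen.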

Protein overexpression and purification
WT Np-ADO and its variants were overexpressed in E. coli DH5α, whereas Cts-ADO was overexpressed in E. coli SHuffle because of its low expression in DH5α. DH5α cells were grown at 37°C in LB broth containing ampicillin (100 μg/ml), whereas SHuffle cells were cultured at 30°C in LB broth containing ampicillin (100 μg/ml) and spectinomycin (100 μg/ml). Cultures were induced with 1 mM IPTG when the A600 reached ~0.5-0.6. After induction, DH5α cells were incubated at 37°C and 180 rpm, whereas SHuffle cells were incubated at 20°C and 180 rpm for 3.5 h before harvesting. Cells were collected by centrifugation at 8000 rpm at 4°C, resuspended in lysis buffer (50 mM KH2PO4, 50 mM K2HPO4, 400 mM NaCl, 100 mM KCl, 10% glycerol, 0.5% Triton X-100, and 10 mM imidazole), and lysed with lysozyme (1 mg/ml, USB Corp.) followed by sonication. The proteins were purified on nickel-nitrilotriacetic acid-agarose (Qiagen) as described previously, with some modifications (40). Eluates containing pure protein were pooled and dialyzed against the final assay buffer (100 mM potassium phosphate buffer, pH 7.2, 100 mM KCl, and 10% glycerol). For circular dichroism spectroscopy and differential scanning fluorimetry, proteins were dialyzed against 10 mM potassium phosphate buffer, pH 7.2, and concentrated using a Vivaspin 20 sample concentrator (50-kDa MWCO, GE Healthcare). Protein concentration was determined with a BCA kit (G-Biosciences) using BSA as the standard.
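Protein concentration by BCA is read off a BSA standard curve. The sketch below assumes the standards fall in a linear range and uses an ordinary least-squares line. Kits may recommend other curve fits, and the absorbance values (read at 562 nm, the usual BCA wavelength) are hypothetical.

```python
import numpy as np

def bca_concentration(std_conc, std_abs, sample_abs, dilution=1.0):
    """Interpolate sample concentrations from a linear BSA standard curve."""
    slope, intercept = np.polyfit(np.asarray(std_conc, dtype=float),
                                  np.asarray(std_abs, dtype=float), 1)
    # Invert A = slope * c + intercept, then correct for any sample dilution.
    return (np.asarray(sample_abs, dtype=float) - intercept) / slope * dilution

# Hypothetical A562 readings for BSA standards (mg/ml) and one 2-fold diluted sample.
bsa_mg_per_ml = [0.0, 0.5, 1.0, 2.0]
bsa_a562 = [0.10, 0.35, 0.60, 1.10]
protein_mg_per_ml = bca_concentration(bsa_mg_per_ml, bsa_a562, [0.85], dilution=2.0)
print(protein_mg_per_ml)
```

Samples whose absorbance falls outside the range of the standards should be diluted and re-measured rather than extrapolated.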

In vivo and in vitro enzyme assays
For in vivo screening of thermostable mutants, E. coli strains harboring the WT and mutant plasmids were grown overnight in 5 ml of LB broth with the appropriate antibiotic (100 μg/ml ampicillin) in 55-ml Borosil culture tubes at 37°C and 180 rpm in an incubator shaker. This culture (50 μl) was used to inoculate 3 ml of secondary culture in modified M9 medium supplemented with 2% glucose, 1 mg/liter thiamine, and 100 μg/ml ampicillin (3). At the time of secondary inoculation, the culture was induced with 0.01 mM IPTG and supplemented exogenously with 100 mg/liter hexadecanal, and the tubes were sealed with Parafilm to avoid evaporation at high temperatures. The tubes were then incubated at 30, 37, or 42°C for 48 h at 120 rpm. Hydrocarbons were extracted from the culture by adding an equal volume of ethyl acetate and vortexing for 20 min; the samples were then centrifuged, and the upper phase was collected in vials for GC analysis (3). 1-Octadecene was used as an internal standard. At each temperature, experiments were performed in triplicate, and the data are presented as the mean and standard deviation of three independent biological replicates.
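The text names 1-octadecene as the internal standard but does not give the calculation. A common internal-standard scheme, assumed here, first determines a relative response factor (RRF) from a standard mix and then converts each sample's analyte-to-standard peak-area ratio into a concentration. Replicates are summarized as the mean and sample standard deviation. All peak areas and concentrations are hypothetical.

```python
import statistics

def response_factor(area_std, area_is, conc_std, conc_is):
    """Relative response factor from a standard mix of analyte and internal standard."""
    return (area_std / area_is) / (conc_std / conc_is)

def analyte_conc(area_analyte, area_is, conc_is, rrf):
    """Analyte concentration from its peak-area ratio to the internal standard."""
    return (area_analyte / area_is) * conc_is / rrf

# Hypothetical peak areas and concentrations (mg/liter), for illustration only.
rrf = response_factor(area_std=200.0, area_is=100.0, conc_std=10.0, conc_is=10.0)
replicates = [
    analyte_conc(280.0, 100.0, 10.0, rrf),
    analyte_conc(300.0, 100.0, 10.0, rrf),
    analyte_conc(320.0, 100.0, 10.0, rrf),
]
mean = statistics.mean(replicates)
sd = statistics.stdev(replicates)  # sample SD (n - 1) across biological replicates
print(f"pentadecane: {mean:.1f} +/- {sd:.1f} mg/liter")
```

Normalizing to a spiked internal standard corrects for losses during ethyl acetate extraction and for variation in injection volume.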
The in vitro enzyme assay was carried out according to a published protocol (33). Assays were performed in 100 mM HEPES buffer containing 100 mM KCl and 10% glycerol, pH 7.2. The reaction mixtures contained NADH (750 μM), ferrous ammonium sulfate (80 μM), phenazine methosulfate (75 μM), hexadecanal (150 μM), and cADO (20 μM) in a total volume of 500 μl. For assays at different temperatures, both the reaction components and the enzymes were pre-incubated at the specified temperature before the reaction was started. Reactions were terminated by adding an equal volume of ethyl acetate. The samples were vortexed at high speed for 20 min and centrifuged, and 400 μl of the extract was analyzed by GC-MS/MS. To determine kinetic parameters, reactions were set up as described above at 300, 320, and 333 K for 15 min in an incubator shaker. All kinetic assays were carried out in triplicate. Michaelis-Menten plots were constructed from reaction rates measured at different substrate concentrations at each temperature, and the kinetic parameters Km, Vmax, kcat, and kcat/Km were obtained by curve fitting in GraphPad Prism 5.
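The kinetic parameters come from fitting v = Vmax[S]/(Km + [S]) to rates measured at several substrate concentrations, then computing kcat = Vmax/[E]total. The sketch reproduces that nonlinear fit with SciPy rather than GraphPad Prism. The rates are synthetic and noise-free. Using the 20 μM enzyme concentration from the assay mix as [E]total is an assumption about how kcat was derived.

```python
import numpy as np
from scipy.optimize import curve_fit

def michaelis_menten(s, vmax, km):
    """Initial rate v as a function of substrate concentration s."""
    return vmax * s / (km + s)

def fit_kinetics(substrate_um, rate_um_per_min, enzyme_um):
    """Nonlinear least-squares fit of Km and Vmax; kcat = Vmax / [E]total."""
    s = np.asarray(substrate_um, dtype=float)
    v = np.asarray(rate_um_per_min, dtype=float)
    p0 = (v.max(), float(np.median(s)))  # crude starting guesses for Vmax and Km
    (vmax, km), _ = curve_fit(michaelis_menten, s, v, p0=p0)
    kcat = vmax / enzyme_um               # per minute when v is in uM/min
    return {"Vmax": vmax, "Km": km, "kcat": kcat, "kcat_over_Km": kcat / km}

# Synthetic rates generated with Vmax = 2.0 uM/min and Km = 50 uM (hypothetical values).
substrate = np.array([10, 25, 50, 100, 150, 300], dtype=float)
rates = michaelis_menten(substrate, 2.0, 50.0)
params = fit_kinetics(substrate, rates, enzyme_um=20.0)
print(params)
```

Repeating the fit on rates measured at 300, 320, and 333 K gives per-temperature Km and kcat/Km values that can be compared directly between Np-ADO, Cts-ADO, and the mutants.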

Quantification of alkanes using GC
GC analysis was performed on an Agilent 7890A system equipped with an HP-5 column (30 m length, 0.32 mm internal diameter, 0.25 μm film thickness) and a flame ionization detector (81-83). The oven temperature program was as follows: 100°C held for 3 min, ramped to 250°C at 10°C min⁻¹, and held at 250°C for an additional 10 min, giving a total run time of 28 min. The inlet and detector temperatures were maintained at 150 and 280°C, respectively. Products were further confirmed on a 7890A GC system coupled to a 7000 GC/MS triple quadrupole system (Agilent) with an HP-5MS column using the same program, except that the oven was further ramped to 300°C at 20°C min⁻¹ and held at 300°C for an additional 5 min. The MS quadrupole scanned from 50 to 550 m/z. Retention times and fragmentation patterns of product peaks were compared with authentic references to confirm peak identity. Pentadecane, hexadecanal, 1-octadecene, and hexadecanol standards from TCI America were used for confirmation and quantification.
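Each segment of the oven program takes (temperature span ÷ ramp rate) plus any hold, so the run time can be checked directly. For the GC-FID program this gives 3 + 15 + 10 = 28 min, matching the stated total. The GC-MS total in the sketch assumes the 10-min hold at 250°C is kept before the extra ramp to 300°C, which the text does not state explicitly.

```python
def oven_run_time(initial_hold_min, segments):
    """Total GC oven time in minutes.

    segments: sequence of (start_c, end_c, rate_c_per_min, hold_min) ramps,
    each followed by an isothermal hold at end_c.
    """
    total = float(initial_hold_min)
    for start_c, end_c, rate, hold_min in segments:
        total += abs(end_c - start_c) / rate + hold_min
    return total

fid_program = [(100, 250, 10, 10)]          # as stated for GC-FID
print(oven_run_time(3, fid_program))        # prints 28.0

# Assumption: the GC-MS run keeps the 250 °C hold before the extra ramp.
gcms_program = fid_program + [(250, 300, 20, 5)]
print(oven_run_time(3, gcms_program))       # prints 35.5
```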

Differential scanning fluorimetry
The thermal stability of the purified WT and thermostable mutant proteins was evaluated by differential scanning fluorimetry on a CFX96 real-time PCR system (Bio-Rad). Briefly, 5 μl of 25× SYPRO Orange dye (Invitrogen) was added to 10 μl of protein (1 mg/ml) in 10 mM potassium phosphate buffer, pH 7.2. SYPRO Orange dye in buffer served as the control to correct for background fluorescence. The reaction volume was made up to 50 μl in a Bio-Rad 96-well plate, and the samples were heated from 25 to 95°C at 0.5°C per 5 s. Fluorescence intensity (excitation/emission: 450-490 nm/560-580 nm) was measured every 0.5°C. Thermal midpoint (Tm) values were determined with the CFX Manager program (Bio-Rad) from the negative first derivative of the melt curve (52). The melt-curve relative fluorescence units were also plotted against temperature, and the Boltzmann equation was fitted to the resulting denaturation curves by nonlinear regression in GraphPad Prism 5. Experiments were carried out in triplicate.
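Both Tm estimates described above can be sketched in a few lines: the temperature of the steepest fluorescence change (from the first derivative), and the midpoint of a fitted two-state Boltzmann sigmoid, seeded with the derivative estimate. This uses SciPy rather than CFX Manager or GraphPad Prism. Sign conventions for plotting the derivative differ between programs; for a signal that rises on unfolding, the transition is taken here as the maximum of +dF/dT. The melt curve is synthetic. Real SYPRO Orange curves usually fall after the peak because of aggregation, so the data would first be trimmed to the transition region.

```python
import numpy as np
from scipy.optimize import curve_fit

def boltzmann(t, f_min, f_max, tm, slope):
    """Two-state Boltzmann sigmoid: fluorescence rises from f_min to f_max around tm."""
    return f_min + (f_max - f_min) / (1.0 + np.exp((tm - t) / slope))

def tm_from_derivative(temps, rfu):
    """Temperature of the steepest rise in fluorescence (peak of dF/dT)."""
    dfdt = np.gradient(rfu, temps)
    return float(temps[np.argmax(dfdt)])

def tm_from_boltzmann(temps, rfu):
    """Fit the Boltzmann equation; the derivative estimate seeds the Tm guess."""
    temps = np.asarray(temps, dtype=float)
    rfu = np.asarray(rfu, dtype=float)
    p0 = (rfu.min(), rfu.max(), tm_from_derivative(temps, rfu), 1.0)
    popt, _ = curve_fit(boltzmann, temps, rfu, p0=p0)
    return float(popt[2])

# Synthetic, noise-free melt curve sampled every 0.5 °C from 25 to 95 °C (hypothetical Tm).
temps = np.arange(25.0, 95.25, 0.5)
rfu = boltzmann(temps, 1000.0, 9000.0, 58.0, 1.5)
print(tm_from_derivative(temps, rfu), tm_from_boltzmann(temps, rfu))
```

For reference, the pipetting scheme implies final concentrations of 2.5× SYPRO Orange and 0.2 mg/ml protein in each 50-μl well (10-fold and 5-fold dilutions, respectively).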

Characterization using CD spectroscopy
Far-UV CD spectra were measured for WT Np-ADO and its thermostable mutant. The proteins were concentrated to a final concentration of 0.12 mg/ml in 10 mM potassium phosphate buffer, pH 7.2. Spectra were recorded on a JASCO J-810 spectropolarimeter (JASCO, Japan) using a 1-mm pathlength cuvette at 25°C, from 190 to 250 nm at 0.5-nm intervals. Data were averaged over three runs and background-corrected using a buffer blank. Secondary structure was estimated with the K2D2 method (53).
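Before deconvolution, raw ellipticity in millidegrees is usually normalized to mean residue ellipticity, [θ]MRE = θ(mdeg) × MRW / (10 × l(cm) × c(mg/ml)), where the mean residue weight MRW is the protein mass divided by the number of peptide bonds (N − 1). The sketch applies this standard conversion with the pathlength and concentration from the Methods. The protein mass, residue count, and signal are hypothetical placeholders, and the text does not say which normalization was used before K2D2.

```python
def mean_residue_weight(molecular_weight_da, n_residues):
    """Mean residue weight: protein mass divided by the number of peptide bonds (N - 1)."""
    return molecular_weight_da / (n_residues - 1)

def mean_residue_ellipticity(theta_mdeg, mrw_da, pathlength_cm, conc_mg_per_ml):
    """Convert observed ellipticity (mdeg) to [theta]MRE in deg cm^2 dmol^-1."""
    return theta_mdeg * mrw_da / (10.0 * pathlength_cm * conc_mg_per_ml)

# Assay geometry from the Methods: 1-mm cuvette (0.1 cm), 0.12 mg/ml protein.
# The mass, residue count, and 222-nm signal below are hypothetical placeholders.
mrw = mean_residue_weight(molecular_weight_da=27_500.0, n_residues=251)
mre_222 = mean_residue_ellipticity(theta_mdeg=-30.0, mrw_da=mrw,
                                   pathlength_cm=0.1, conc_mg_per_ml=0.12)
print(f"MRW = {mrw:.1f} Da, [theta]222 = {mre_222:.0f} deg cm^2 dmol^-1")
```

Because the conversion scales the whole spectrum by a single factor, an error in the protein concentration propagates directly into the estimated helix content.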