Crystal Structure of a Thermophilic Cytochrome P450 from the Archaeon Sulfolobus solfataricus

The structure of the first P450 identified in Archaea, CYP119 from Sulfolobus solfataricus, has been solved in two different crystal forms that differ in the ligand (imidazole or 4-phenylimidazole) coordinated to the heme iron. A comparison of the two structures reveals an unprecedented rearrangement of the active site to adapt to the different size and shape of ligands bound to the heme iron. These changes involve unraveling of the C-terminal segment of the F helix to extend the loop connecting the F and G helices, allowing the longer loop to dip down into the active site and interact with the smaller imidazole ligand. A comparison of CYP119 with P450cam and P450eryF indicates that extensive clustering of aromatic residues may provide the structural basis for the enhanced thermal stability of CYP119. An additional feature of the 4-phenylimidazole-bound structure is a zinc ion tetrahedrally coordinated by symmetry-related His and Glu residues.

Understanding the structural basis for the enhanced stability of proteins from thermophilic organisms relative to their mesophilic counterparts is a challenging problem. The increasing availability of high-resolution crystal structures from both thermophiles and mesophiles has provided an important database for determining the structural basis of thermal stability. For example, Gromiha et al. (1) analyzed the thermal stability of 16 different protein families. Their collective results suggest that the stability of thermally stable proteins may result from a better balance between packing and solubility. Other researchers have identified factors such as changes in loop flexibility (2), salt bridge interactions (3), and amino acid substitutions (e.g. Ile for Ala) (1) as the origin of increased thermal stability. It is important to note that, despite all efforts to define specific traits, no single trait has been universally linked to increased thermal stability in all proteins. It is therefore important to use comparative analyses to determine the factors that increase the thermal stability of structurally related proteins.
In addition to the fundamental question of how structure relates to stability, the industrial and medical application of thermally stable enzymes is of considerable practical significance (4). In this regard, reactions catalyzed by cytochromes P450 are of particular interest. P450s are heme enzymes that catalyze the following reaction, where RH represents a variety of substrates, normally nonpolar aromatic or aliphatic molecules:
$$\mathrm{RH + O_2 + 2e^- + 2H^+ \longrightarrow ROH + H_2O} \qquad \text{(Reaction 1)}$$
One of the primary functions of P450s is to aid in the clearance of toxic hydrocarbons by rendering such molecules more soluble through hydroxylation. In addition to this catabolic function, P450s also participate in the biosynthesis of essential molecules such as steroids (5) and various natural products, including antibiotics (6).
A long-sought practical goal in P450 research is to capitalize on the exquisite specificity of P450s in regio- and stereoselective hydroxylation reactions (7). One example is the preparation of important hydroxylated molecules that are difficult to make by traditional organic synthesis. In addition, P450s could prove useful in the oxidative or reductive elimination of environmental pollutants (8). One current limitation is the requirement for relatively expensive reducing equivalents in the form of NADH or NADPH. It might be possible to circumvent this restriction by using H2O2 rather than O2 as the oxidant, because H2O2-supported P450 reactions do not require the transfer of electrons from a redox partner (9). A second limitation is the instability of P450 catalysts. There are two basic approaches to this problem. One is to seek out enzymes from thermophiles that catalyze the reaction of interest; the other is to use protein engineering to convert a mesophilic enzyme into a thermally stable one. The latter approach requires knowledge of the structural features that are critical for imparting thermal stability. Until now this has not been possible with P450s because all available structures are from mesophiles. However, Wright et al. (10) recently discovered a P450 by accident while attempting to clone the thymidylate synthase gene from the acidothermophilic archaeon Sulfolobus solfataricus. Following the accepted P450 nomenclature (5), this P450 has been termed CYP119. S. solfataricus grows optimally at 85°C and pH 3.5 (11), which operationally defines it as a thermophile. As expected, CYP119 exhibits unusual thermal stability, with a melting temperature of ~90°C compared with ~55°C for the bacterial P450cam (12,13). The function of CYP119 remains unknown, but because S. solfataricus can utilize elemental sulfur as an energy source (11) and P450s are known to participate in sulfoxide formation (14), the oxidative metabolism of sulfur compounds is a possibility. The genome of S. solfataricus is currently being sequenced (15); once completed, it may provide hints about the function and redox partners of CYP119. Here we describe the structure determination of CYP119 and compare its structure with those of its mesophilic counterparts in an attempt to understand the structural basis for thermal stability.

MATERIALS AND METHODS
Expression and Purification of CYP119-BL21(DE3) Escherichia coli cells were transformed with a pCWori plasmid containing the gene for CYP119 (12). Single colonies were selected for overnight growth in 5-ml cultures, which were used to inoculate 1-liter cultures. Two overnight 1-liter cultures were used to inoculate a 40-liter fermenter. At an A600 nm of 0.8, expression of CYP119 was induced with 500 µg/liter isopropyl β-D-thiogalactopyranoside (USB). Cells were grown in 2× YT medium (per liter: 16 g tryptone, 10 g yeast extract, 5 g NaCl) containing 0.1 g/liter ampicillin (Sigma), at 37°C with shaking at 220 rpm for the starter cultures and with slow stirring (80-100 rpm) at 30°C in the fermenter. Cells were harvested 36 h after induction, and the cell paste was stored at −70°C.
Cells were lysed in a French pressure cell in 50 mM Tris-Cl, pH 7.4, 0.1% Triton X-100, and 1 mM phenylmethylsulfonyl fluoride (Sigma). Cell debris was pelleted by centrifugation, and the cell-free extract was heated to 65°C for 1 h. Precipitated proteins were removed by centrifugation. The supernatant was brought to 80% ammonium sulfate, and the resulting precipitate was resuspended in a minimal volume and dialyzed overnight against 50 mM bis-Tris, pH 7.2 (three buffer exchanges). The dialyzed protein was loaded onto a QAE-Sepharose column (2.5 × 25 cm), washed with 2-3 column volumes of buffer, and eluted with a linear gradient of 0-250 mM NaCl in 50 mM bis-Tris, pH 7.2. Fractions with an absorbance ratio (A415 nm/A280 nm) > 1.40 were pooled, desalted, and exchanged into 50 mM bis-Tris, pH 7.0. A chromatofocusing column (PBE94, Amersham Pharmacia Biotech) was equilibrated with 50 mM bis-Tris, pH 6.0. The protein was loaded, washed, and eluted with Polybuffer 74, pH 5.0 (Amersham Pharmacia Biotech), according to the manufacturer's instructions. Fractions with an absorbance ratio (A415 nm/A280 nm) > 1.60 were pooled, desalted, exchanged into 50 mM bis-Tris, pH 6.0, and used for crystallization. P450 concentrations were determined using the extinction coefficient ε415 nm = 104 mM⁻¹ cm⁻¹ reported by McLean et al. (13).
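The two spectrophotometric criteria in this protocol, pooling fractions by their A415 nm/A280 nm ratio and quantifying P450 with ε415 nm = 104 mM⁻¹ cm⁻¹, reduce to simple arithmetic. The Python sketch below assumes a 1-cm path length; the function names, fraction labels, and absorbance readings are hypothetical and for illustration only.

```python
from __future__ import annotations

# Extinction coefficient at 415 nm reported by McLean et al. (13), in mM^-1 cm^-1.
EPSILON_415 = 104.0


def p450_concentration_um(a415: float, path_cm: float = 1.0) -> float:
    """Beer-Lambert law, c = A / (epsilon * l); returns the concentration in micromolar."""
    concentration_mm = a415 / (EPSILON_415 * path_cm)
    return concentration_mm * 1000.0


def fractions_to_pool(readings: dict[str, tuple[float, float]], min_ratio: float) -> list[str]:
    """Return the fractions whose A415/A280 ratio exceeds min_ratio.

    readings maps a (hypothetical) fraction label to its (A415, A280) pair.
    """
    return [
        label
        for label, (a415, a280) in readings.items()
        if a280 > 0 and a415 / a280 > min_ratio
    ]


if __name__ == "__main__":
    # Hypothetical absorbance readings, for illustration only.
    readings = {"F12": (0.80, 0.62), "F13": (1.10, 0.66), "F14": (0.95, 0.55)}
    print(fractions_to_pool(readings, 1.40))  # QAE-Sepharose pooling cutoff
    print(fractions_to_pool(readings, 1.60))  # chromatofocusing pooling cutoff
    print(p450_concentration_um(0.52))
```

The ratio test is applied per fraction, so the same helper serves both pooling steps; only the cutoff changes.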
Crystallization-Crystals belonging to the monoclinic space group P2₁ (unit cell dimensions a = 76.97 Å, b = 70.26 Å, c = 107.54 Å, β = 102.34°) were grown by free interface diffusion in 200-µl capillaries with a protein to precipitant ratio of 2:3. The initial protein concentration for capillary growth was 40 mg/ml. The precipitant solution consisted of 35% polyethylene glycol (PEG) 3350, 100 mM imidazole, pH 8.0, and 0.2 M Li2SO4. Crystals belonging to the orthorhombic space group C222₁ (unit cell dimensions a = 76.96 Å, b = 114.60 Å, c = 185.20 Å) were grown by hanging-drop vapor diffusion over a well solution of 20% PEG 3350, 100 mM EPPS, pH 8.5, and 0.2 M MgSO4. 4-Phenylimidazole (1 mM) was mixed with CYP119 (20 mg/ml; final concentrations) and used to set up 5 µl + 5 µl drops on siliconized glass coverslips. All crystals were grown at room temperature. The single crystals used for data collection were carefully separated from larger crystal clusters. For cryogenic data collection, C222₁ crystals were transferred in four steps into artificial precipitant solution with the ethylene glycol concentration increased to 20%. Attempts to cryocool P2₁ crystals were unsuccessful, producing severe anisotropy in diffraction. Both crystal forms contain two molecules per asymmetric unit.
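The unit cell dimensions and the two molecules per asymmetric unit are enough to estimate the Matthews coefficient, V_M = V_cell / (Z × M), and the approximate solvent content, 1 − 1.23/V_M, of each crystal form. This Python sketch is not part of the original analysis. The molecular mass is a crude placeholder (368 residues × an assumed average of 110 Da) that should be replaced by the mass computed from the actual CYP119 sequence; the number of asymmetric units per cell (2 for P2₁, 8 for C222₁) follows from the space-group symmetry.

```python
from __future__ import annotations

import math


def cell_volume(a: float, b: float, c: float,
                alpha: float = 90.0, beta: float = 90.0, gamma: float = 90.0) -> float:
    """Unit cell volume in cubic angstroms for a general (triclinic) cell; angles in degrees."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)


def matthews(volume: float, n_asu: int, mol_per_asu: int, mass_da: float) -> tuple[float, float]:
    """Return (V_M in A^3/Da, approximate solvent fraction 1 - 1.23 / V_M)."""
    vm = volume / (n_asu * mol_per_asu * mass_da)
    return vm, 1.0 - 1.23 / vm


# ASSUMPTION: placeholder mass, 368 residues x ~110 Da average; not a measured value.
ASSUMED_MASS_DA = 368 * 110.0

V_P21 = cell_volume(76.97, 70.26, 107.54, beta=102.34)  # monoclinic, imidazole complex
V_C2221 = cell_volume(76.96, 114.60, 185.20)             # orthorhombic, 4-phenylimidazole complex

if __name__ == "__main__":
    for name, vol, n_asu in (("P2_1", V_P21, 2), ("C222_1", V_C2221, 8)):
        vm, solvent = matthews(vol, n_asu, 2, ASSUMED_MASS_DA)
        print(f"{name}: V = {vol:,.0f} A^3, V_M = {vm:.2f} A^3/Da, solvent ~ {solvent:.0%}")
```

Independent of the assumed mass, the room-temperature P2₁ cell provides roughly 40% more volume per molecule than the C222₁ cell.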
Data Collection-Room temperature data were collected from the P2₁ crystal form using an in-house R-Axis IV imaging plate mounted on a Rigaku rotating copper anode x-ray generator. High-resolution data (50-1.93 Å) on the C222₁ crystal form were collected at 100 K at the Stanford Synchrotron Radiation Laboratory, beamline 7-1, with a Mar345 imaging plate. Low-resolution data (50-3.1 Å) on the C222₁ form were also collected in-house at 100 K. Data collection was optimized using the STRATEGY function of MOSFLM (16). All data were reduced using DENZO and SCALEPACK (17), and outlier rejection was performed with ENDHKL (Louis Sanchez, California Institute of Technology) in conjunction with SCALEPACK.
Molecular Replacement-Molecular replacement in the P2₁ space group was carried out with a version of BRUTE modified to use log-likelihood scoring, using data to 4.5 Å and a polyalanine model of P450eryF (PDB accession number 1OXA) (18) as the search model. The top rotation solution was correct; it was further optimized using data to 3 Å and followed by a translational search. The second rotation solution was generated by applying the self-rotation peak and was followed by another translational search. The final solution had an R-factor of 0.53 after rigid body refinement. Density modification in MAGICSQUASH (19) was used to improve the phases. Electron density map fitting, beginning from the polyalanine P450eryF model, was carried out with TOM (20). Several sections of the electron density map, especially the core of the structure near the heme, clearly showed the identity of CYP119 side chains. Regions of the molecule that lacked clear backbone electron density were deleted prior to further refinement with CNS (21). With successive rounds of refinement and model building, the entire CYP119 sequence except the last two residues was fit to the electron density map. In the first 12 rounds of refinement, the entire chain of one molecule was traced, and non-crystallographic symmetry operators were used to generate the second molecule in the asymmetric unit. In the last two rounds of refinement, each chain was fit separately. The final R-factor was 0.219 (R-free = 0.253). Backbone geometry was analyzed in PROCHECK (22); only two residues (Ala-152 in both molecules) were in the disallowed region. There were 29 water and 2 sulfate molecules per asymmetric unit.
Table 1 footnotes: a $R_{\mathrm{merge}} = \sum_h \sum_i |I_i(h) - \langle I(h) \rangle| / \sum_h \sum_i I_i(h)$, where $I_i(h)$ is the i-th measured intensity of reflection h, $\langle I(h) \rangle$ is its mean, $\sum_h$ is the sum over all reflections, and $\sum_i$ is the sum over all measurements of reflection h. b Measured as %. c R-free = R-factor calculated using 5% of the reflection data chosen randomly and set aside from the start of refinement.
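The refinement statistics quoted here (R-factor, and R-free on a randomly chosen 5% test set) and R_merge from the Table 1 footnotes are simple ratios. The Python sketch below spells them out using the conventional R = Σ||F_obs| − |F_calc|| / Σ|F_obs|; it is illustrative only, not the CNS or SCALEPACK implementation, and all function names are hypothetical.

```python
from __future__ import annotations

import random


def r_factor(f_obs: list[float], f_calc: list[float]) -> float:
    """Crystallographic R = sum(| |Fo| - |Fc| |) / sum(|Fo|)."""
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    return numerator / sum(abs(fo) for fo in f_obs)


def choose_free_set(n_reflections: int, fraction: float = 0.05, seed: int = 0) -> set[int]:
    """Randomly set aside a fraction of reflection indices before refinement starts."""
    rng = random.Random(seed)
    n_free = max(1, round(fraction * n_reflections))
    return set(rng.sample(range(n_reflections), n_free))


def r_work_and_free(f_obs: list[float], f_calc: list[float], free: set[int]) -> tuple[float, float]:
    """R over the working reflections and R-free over the held-out test reflections."""
    work = [i for i in range(len(f_obs)) if i not in free]
    test = sorted(free)
    r_work = r_factor([f_obs[i] for i in work], [f_calc[i] for i in work])
    r_free = r_factor([f_obs[i] for i in test], [f_calc[i] for i in test])
    return r_work, r_free


def r_merge(measurements: dict[tuple[int, int, int], list[float]]) -> float:
    """R_merge = sum_h sum_i |I_i(h) - <I(h)>| / sum_h sum_i I_i(h); keys are hkl indices."""
    numerator = denominator = 0.0
    for intensities in measurements.values():
        mean_i = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean_i) for i in intensities)
        denominator += sum(intensities)
    return numerator / denominator
```

Because the test reflections never enter refinement, a gap between R-work and R-free is the usual warning sign of overfitting.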
To determine phases for the C222₁ crystal form, the imidazole-bound (P2₁) structure was used as a search model in AMoRe (23). The first molecule produced a solution that stood out above the background; it was fixed to find the second molecule, giving an R-factor of 0.436. After one round of positional and temperature factor refinement in CNS (21), the R-factor dropped to 0.314 (R-free = 0.348) at a resolution of 1.93 Å. The F and G helical regions of the protein had to be extensively remodeled. The final R-factor was 0.225 (R-free = 0.274). Backbone geometry was checked in PROCHECK (22), and none of the residues were in the disallowed region. All residues except the last, residue 368, were present in the electron density maps. The final model contains 622 water molecules, 2 sulfates, and one zinc ion per asymmetric unit. Relevant statistics are given in Table 1.
The coordinates have been deposited in the Protein Data Bank under accession codes 1F4U (imidazole-bound) and 1F4T (4-phenylimidazole-bound).

RESULTS AND DISCUSSION
Overall Structure-CYP119 exhibits the typical P450 fold (Fig. 1). However, at 368 residues, CYP119 is considerably shorter than P450cam (24) or P450eryF (18), which have 414 and 403 residues, respectively. Most of this difference in length lies in the N-terminal region. P450cam and P450eryF begin with relatively ill-defined N-terminal tails, whereas CYP119 begins immediately with helix A. As a result, residue 1 in CYP119 corresponds to residue 42 in P450cam or residue 16 in P450eryF. Therefore, the shorter N terminus alone accounts for 41 of the 46 fewer amino acids in CYP119 compared with P450cam. The remaining differences in length occur primarily in surface turns (Fig. 1). For example, the β1-1/β1-2 hairpin (residues 13-24) in CYP119 is four residues shorter than that in P450cam (residues 52-66). Another difference is the 225-234 β5-turn in P450cam compared with the 191-195 loop in CYP119, a difference of five residues. This tight loop between the H and I helices in CYP119 resembles that found in P450nor (25), which does not require a redox protein partner for electron transfer. In contrast, all other known P450 structures show an extended but ill-defined β-hairpin similar to that in P450cam. Interestingly, this hairpin region makes direct contact with the FMN-binding domain in the crystal structure of the complex formed between the heme and FMN domains of P450BM-3 (26).
The most striking differences involve the B' helix region, which is known to be important in substrate binding (27). Residues 63-66 in CYP119, which correspond to the location of the B' helix in P450eryF, do not form a complete helix; instead, this region forms only one turn of 3₁₀ helix. However, CYP119 does have a helix, which we have termed the B' helix, consisting of residues 49-53 (Fig. 2). In CYP119, the B' helix is preceded by a long loop that places it away from the active site, toward the molecular surface (Fig. 2). Such a change should leave the active site relatively exposed. However, a solvent accessibility calculation without substrate or inhibitors included shows that only ~24 Å² of the heme is exposed in CYP119, compared with ~18 Å² in P450cam. CYP119 and P450cam are this similar in active site access by a solvent probe because of the repositioning of the loop connecting the F and G helices (Fig. 2). In P450cam and other P450s, this F/G loop extends out onto the surface. In CYP119, by contrast, this loop points in the opposite direction and dips into the active site, thereby occupying the space normally taken by the B' helix region. As described in the next section, this loop can adopt alternate conformations depending on the identity of the ligand coordinated to the heme iron.
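The heme exposure values above come from a solvent-probe accessibility calculation, but the text does not say which program or parameters were used. One widely used approach is the Shrake-Rupley algorithm: place test points on each atom's probe-expanded sphere and count the points not buried inside any neighboring expanded sphere. The Python sketch below is that generic algorithm, swapped in for illustration. The 1.4 Å probe radius and 960 points per atom are conventional assumptions, not parameters from the original calculation; heme exposure would be the sum over heme atoms only, computed with the whole protein present.

```python
from __future__ import annotations

import math


def sphere_points(n: int) -> list[tuple[float, float, float]]:
    """Roughly uniform points on the unit sphere (golden-angle spiral)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(n):
        z = 1.0 - 2.0 * (k + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        theta = golden_angle * k
        points.append((r * math.cos(theta), r * math.sin(theta), z))
    return points


def shrake_rupley(atoms, probe: float = 1.4, n_points: int = 960) -> list[float]:
    """Per-atom solvent accessible surface area (A^2) by the Shrake-Rupley method.

    atoms: list of (x, y, z, van der Waals radius). probe and n_points are
    conventional defaults (assumptions), not values from the original study.
    """
    unit = sphere_points(n_points)
    areas = []
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        big_r = ri + probe
        # Only atoms whose expanded spheres overlap atom i can bury its test points.
        neighbors = [
            (xj, yj, zj, (rj + probe) ** 2)
            for j, (xj, yj, zj, rj) in enumerate(atoms)
            if j != i and math.dist((xi, yi, zi), (xj, yj, zj)) < big_r + rj + probe
        ]
        exposed = 0
        for ux, uy, uz in unit:
            px, py, pz = xi + big_r * ux, yi + big_r * uy, zi + big_r * uz
            if all((px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2 >= r2
                   for xj, yj, zj, r2 in neighbors):
                exposed += 1
        areas.append(4.0 * math.pi * big_r ** 2 * exposed / n_points)
    return areas


def group_sasa(areas: list[float], indices) -> float:
    """Exposed area of a subset of atoms, e.g. the heme atoms only."""
    return sum(areas[i] for i in indices)
```

The accuracy scales with the number of test points; a few hundred points per atom is usually adequate for comparisons at the level of tens of square angstroms.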
The most highly conserved regions are near the heme and include the I and L helices. The polypeptide conformation and local environment surrounding the Cys thiolate axial heme ligand, Cys-317 in CYP119, are strictly conserved. The I helix spans the entire molecule and is situated directly above the heme. The I helix in CYP119 is unusual in having two additional Thr residues following the conserved Thr found in most P450s, which is Thr-213 in CYP119 (Fig. 3). The positioning of Thr-213 closer to the heme surface than Thr-214 and Thr-215 was correctly predicted from chemical modification data (12). This region has been postulated to form part of a proton shuttle network considered important for delivering the solvent protons required for O2 activation during the catalytic cycle (28). The H-bonding network involving the helix I Thr residues, however, is somewhat different in CYP119. The side chain hydroxyl group of the conserved Thr-213 does not donate an H-bond to a peptide carbonyl as it does in P450cam. Instead, Thr-214 donates an H-bond to the peptide oxygen atom of Gly-210 (Fig. 3). This is very similar to the H-bonding pattern found in P450BM-3, which also has a Thr corresponding to Thr-214 in CYP119 (29). Interestingly, the T214A (but not the T213A) mutation results in a large increase in the rate at which CYP119 catalyzes H2O2-supported styrene epoxidation (12). Because Thr-214, but not Thr-213, H-bonds with the backbone, mutation of Thr-214 might be expected to perturb the I helix more and thereby cause a greater change in reactivity toward styrene.
Zinc Binding Site-An interesting feature of the 4-phenylimidazole structure is a cation, most likely Zn²⁺, situated at the interface between the two molecules in the asymmetric unit (Figs. 4 and 5). The ion was identified as Zn²⁺ from anomalous difference Fourier maps calculated in PHASES (30) with data collected at different x-ray wavelengths. Zinc has an absorption edge at 1.28 Å, whereas that of iron is at 1.74 Å. With data from the copper rotating anode at a wavelength of 1.54 Å, which lies between the two edges, an anomalous difference Fourier map shows strong 10-11σ peaks for the two iron atoms but nothing for the presumed zinc. However, with synchrotron data at a wavelength of 1.08 Å, shorter than both edges, the zinc site has a peak of 17σ, whereas the iron peaks are 13-14σ. The type of ligands and the tetrahedral coordination geometry also support identification of the ion as zinc. As shown in Fig. 5, the Zn²⁺ is coordinated by Glu-139 and His-178 from one subunit and by their symmetry mates in the other subunit. The zinc binding motif is very likely native to the enzyme rather than an artifact of crystal lattice formation, because Zn²⁺ was not included in the crystallization buffers and so must have copurified with the protein. The absence of Zn²⁺ in the imidazole complex may be due to the 0.1 M imidazole in the crystallization buffer, which may have scavenged the low amounts of Zn²⁺ present as a contaminant. It remains unclear whether the dimer and Zn²⁺ site are functionally important. Only 727 Å² of accessible surface area per monomer is buried at the interface, which is below the range for most true dimers. However, tight coordination of Zn²⁺ might provide sufficient stabilization to favor the dimeric form under physiological conditions.
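The wavelength argument can be made explicit: an element gives a strong anomalous signal only when the x-ray energy exceeds its absorption edge, that is, when the wavelength is shorter than the edge wavelength. The Python sketch below encodes just that rule with the edge values quoted above; the function names are hypothetical, and it is a qualitative check, not a calculation of f'' or of peak heights.

```python
from __future__ import annotations

# E (keV) = 12.398 / wavelength (A).
HC_KEV_ANGSTROM = 12.398

# K absorption edges quoted in the text, in angstroms.
K_EDGES = {"Zn": 1.28, "Fe": 1.74}


def energy_kev(wavelength: float) -> float:
    """Photon energy in keV for a wavelength in angstroms."""
    return HC_KEV_ANGSTROM / wavelength


def elements_above_edge(wavelength: float) -> set[str]:
    """Elements whose K edge lies at a longer wavelength (lower energy) than the beam.

    Only these elements show the large jump in the anomalous term f'' that makes
    them stand out in an anomalous difference Fourier map; below its edge an
    element still scatters anomalously, but much more weakly.
    """
    return {element for element, edge in K_EDGES.items() if wavelength < edge}


if __name__ == "__main__":
    for label, wl in (("Cu K-alpha, in-house", 1.54), ("synchrotron", 1.08)):
        print(label, f"{energy_kev(wl):.2f} keV", sorted(elements_above_edge(wl)))
```

At 1.54 Å only iron is above its edge, whereas at 1.08 Å both metals are, which matches the zinc peak appearing only in the synchrotron map.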
Comparison of the Two Crystal Forms-A comparison of the imidazole and 4-phenylimidazole structures reveals interesting and unexpected differences. The active site adjusts its size and shape to the bound ligand, expanding to accommodate the larger 4-phenylimidazole ligand. The F/G region undergoes the largest movements, with backbone atoms shifting by as much as 6 Å. The most significant change occurs in the F helix. In the 4-phenylimidazole complex, the F helix includes residues 141-155, but in the imidazole complex the helix stops at residue 151, a loss of one full turn of helix (Fig. 6). This unraveling of the helix lengthens the F/G loop, which is then able to dip down into the active site and contact the smaller imidazole ligand. For example, the Leu-155 side chain moves approximately 6 Å, which enables it to interact with the imidazole ligand. The loop "in" conformation is stabilized by direct contacts between the F/G loop and helix I that are not present in the 4-phenylimidazole complex (Fig. 7).
In the 4-phenylimidazole complex, the F/G loop must move to make room for the larger ligand. One of the most dramatic changes is in the location of Arg-154. In the 4-phenylimidazole complex, Arg-154 is part of the F helix, where it H-bonds with the peptide oxygen atom of Glu-198 (Fig. 7). In the imidazole complex, this section of the F helix unfolds, enabling the Arg-154 side chain to reorient by 180° and interact with Glu-212. This new ion pair and the other F/G loop-helix I contacts in the imidazole complex help to stabilize the loop "in" conformation. It appears that favorable interactions lost upon unfolding of the F helix are partially offset by favorable interactions between the F/G loop and helix I.
Another region that undergoes a large change is the β4 hairpin turn centered on Val-353, which moves about 3 Å. In this case, the hairpin squeezes in closer to the ligand in the 4-phenylimidazole complex, in which the F/G loop is in the "out" conformation. This location of the turn is incompatible with the F/G loop "in" conformation of the imidazole complex. The F/G loop and the β4 hairpin as a whole thus undergo an "induced fit" conformational change that depends on the size and shape of the ligand bound at the heme active site.
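Statements such as "backbone atoms shifting by as much as 6 Å" or a hairpin that "moves about 3 Å" amount to per-residue distances between equivalent atoms after the two models are superposed. The text does not say how the superposition was done, so this Python sketch assumes the Cα coordinates are already in a common frame; the function names, residue numbers, and coordinates in the test are hypothetical.

```python
from __future__ import annotations

import math

# A Cartesian coordinate in angstroms.
Coord = tuple[float, float, float]


def displacements(model_a: dict[int, Coord], model_b: dict[int, Coord]) -> dict[int, float]:
    """Distance moved by each residue present in both (already superposed) models."""
    common = sorted(model_a.keys() & model_b.keys())
    return {res: math.dist(model_a[res], model_b[res]) for res in common}


def largest_movers(shifts: dict[int, float], threshold: float) -> list[tuple[int, float]]:
    """Residues that moved more than threshold angstroms, largest shift first."""
    movers = [(res, d) for res, d in shifts.items() if d > threshold]
    return sorted(movers, key=lambda item: item[1], reverse=True)
```

Residues missing from either model (for example, disordered termini) are simply skipped, since no equivalent atom exists to compare.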
The only other P450 in which similar changes have been observed is P450BM-3. In that case, there is a large difference in the positions of the F and G helices as well as the F/G loop (29). In going from the substrate-free to the substrate-bound structure, the entry to the active site closes down around the substrate. However, the secondary structural elements in P450BM-3 move as a unit, with no net gain or loss of secondary structure such as that observed with CYP119. The various studies with P450cam provide better analogies with CYP119 because several structures of P450cam are available with substrate analogues as well as with various imidazoles bound. In these cases, the changes are very modest compared with those we observe in CYP119 and involve primarily rearrangements in water structure and side chains and small movements of backbone atoms. The one very dramatic exception is a chemically modified derivative of P450cam in which the Cys residues were derivatized with N-(2-ferrocenylethyl)maleimide. In this case, part of the active site essentially unfolds so that the ferrocene moiety attached to Cys-85 can enter the active site, resulting in changes of as much as 10 Å in the positions of some residues (31). Although this demonstrates an unexpected level of flexibility in the P450cam active site, a covalently attached ferrocene is a very large perturbation compared with the change from imidazole to 4-phenylimidazole in CYP119. This suggests that the CYP119 active site pocket is unusually sensitive, compared with that of P450cam, to subtle changes in the types of molecules bound at the active site. Whether these changes are functionally relevant must await the discovery of the true function of CYP119.
Structural Basis for Thermal Stability-The enhanced thermal stability of proteins from extreme thermophiles has been attributed to a number of factors. For example, an increase in Arg content can aid thermal stability because Arg offers more H-bonding possibilities than Lys and because of the partial aromatic character of the guanidinium group (32). However, CYP119 has a normal Arg content, 7.6% of the total amino acids, compared with 8.4 and 6.2% for P450cam and P450eryF, respectively. CYP119 also does not contain an unusual proportion of aliphatic and aromatic residues. By contrast, the fraction of total accessible surface area contributed by nonpolar (aromatic and aliphatic) residues in CYP119 is relatively high, 17%, compared with 11 and 12% for P450cam and P450eryF, respectively.
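The comparisons in this paragraph are ratios over the sequence (Arg content) or over per-residue accessible surface area (nonpolar fraction). The Python sketch below assumes a one-letter sequence string and a list of (residue, accessible area) pairs from any SASA program. Which residues count as "aliphatic" is our assumption, since the text does not list them, and the function names are hypothetical.

```python
from __future__ import annotations

# ASSUMPTION: the text does not list which residues count as "aliphatic";
# this choice is illustrative only.
AROMATIC = {"F", "W", "Y"}
ALIPHATIC = {"A", "V", "L", "I"}
NONPOLAR = AROMATIC | ALIPHATIC


def percent_composition(sequence: str, residue_types: set[str]) -> float:
    """Percentage of positions in a one-letter sequence that belong to residue_types."""
    count = sum(1 for aa in sequence.upper() if aa in residue_types)
    return 100.0 * count / len(sequence)


def nonpolar_sasa_fraction(per_residue: list[tuple[str, float]]) -> float:
    """Fraction of the total accessible area contributed by NONPOLAR residues.

    per_residue holds (one-letter code, accessible area in A^2) pairs.
    """
    total = sum(area for _, area in per_residue)
    nonpolar = sum(area for aa, area in per_residue if aa in NONPOLAR)
    return nonpolar / total
```

For example, `percent_composition(seq, {"R"})` gives the Arg content used in the comparison above.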
Where CYP119 does differ is in the number and type of salt-linked networks and aromatic interactions. Salt bridges were computed with the program HBPLUS (33), and the results are summarized in Table 2. CYP119 has fewer 2-residue salt bridges in total but more salt-bridged networks. Although the two crystal forms of CYP119 differ slightly in their salt bridges (Table 2), both CYP119 structures have four salt-bridged networks, compared with three in P450cam and only one in P450eryF. A single additional salt-linked network in CYP119 relative to P450cam is unlikely to be the primary reason for its enhanced stability. There is, however, an additional interesting difference in salt bridges. In P450cam, the longest sequence separation between interacting side chains in a salt-bridged network is 78 residues, whereas in CYP119 it is 287 residues: Arg-9 interacts with Glu-296, which effectively ties the N- and C-terminal regions together. We doubt, however, that a single long-range salt-bridge network accounts for a large portion of the enhanced stability of CYP119 relative to P450cam. The most striking difference that might be associated with enhanced thermal stability is a unique clustering of aromatic residues in CYP119. A recent homology model of CYP119 (34) correctly predicted the aromatic clustering found in the crystal structure. Fig. 8 displays all the aromatic residues in CYP119, P450cam, and P450eryF. CYP119 has two aromatic clusters not found in the other P450s. The first cluster involves five residues (Tyr-2, Trp-4, Phe-5, Phe-24, and Trp-281) that span a distance of ~11.3 Å between α-carbons. Tyr-15 could also be considered part of this cluster because Met-8 contacts both Tyr-15 and Phe-24. The second cluster consists of seven residues (Phe-225, Phe-228, Trp-231, Tyr-250, Phe-298, Phe-334, and Phe-338) and spans a distance of ~24 Å between α-carbons.
Cluster 1 ends with Tyr-2, whereas cluster 2 begins with Tyr-250. The distance between these two coplanar aromatic rings is ~7 Å. However, the guanidinium group of Arg-287, which ion pairs with Asp-296, is stacked between the two Tyr side chains, providing a continuous stacking interaction between the two aromatic clusters. As a result, a continuous aromatic/nonpolar "ladder", formed by all of these aromatic residues and one arginine, spans the entire edge of CYP119 over a distance of ~39 Å. P450eryF, with a single cluster of six aromatic residues spanning a distance of ~11 Å, comes closest to matching this arrangement. Based on the analysis thus far, this is the one obvious difference most likely to be a critical part of the enhanced thermal stability of CYP119. Notably, a recent mutagenesis study showed that introducing a single additional aromatic side chain to increase aromatic interactions raised the thermal stability of a xylanase by 9°C (35). In addition, a single point mutation (F31Y) in ribonuclease P2 from S. solfataricus that disrupted an aromatic cluster "...led to an unprecedented loss" of thermal and barostability, by at least 27 K and 10 kilobar, respectively (36). Future mutagenesis work on CYP119 should reveal whether the aromatic clustering revealed by the crystal structure is the key to its increased thermal stability.
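Both analyses in this section, salt-bridged networks and aromatic clusters, reduce to finding connected components in a contact graph: residues are nodes, and an edge joins two residues that form a salt bridge or whose aromatic rings are in contact. The Python sketch below shows only that grouping step; it does not reproduce HBPLUS's geometric criteria. The 6.5 Å ring-centroid cutoff, the rule that a "network" contains more than two residues, the function names, and the labels and coordinates in the test are assumptions for illustration.

```python
from __future__ import annotations

import math
from itertools import combinations


def connected_groups(nodes, edges) -> list[set]:
    """Connected components of an undirected graph, via union-find."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving keeps trees shallow
            n = parent[n]
        return n

    for u, v in edges:
        parent[find(u)] = find(v)

    groups: dict = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())


def salt_bridge_networks(pairs: list[tuple[int, int]]) -> list[set[int]]:
    """ASSUMPTION: a 'network' is a component with more than two residues."""
    nodes = {res for pair in pairs for res in pair}
    return [g for g in connected_groups(nodes, pairs) if len(g) > 2]


def max_separation_in_networks(pairs: list[tuple[int, int]]) -> int:
    """Longest sequence separation between salt-bridge partners inside any network."""
    in_network = set().union(*salt_bridge_networks(pairs))
    return max((abs(i - j) for i, j in pairs if i in in_network), default=0)


def aromatic_clusters(centroids: dict, cutoff: float = 6.5) -> list[set]:
    """Group residues whose ring centroids lie within cutoff A (an assumed contact distance)."""
    edges = [(a, b) for a, b in combinations(centroids, 2)
             if math.dist(centroids[a], centroids[b]) <= cutoff]
    return [g for g in connected_groups(list(centroids), edges) if len(g) > 1]
```

Because clusters are transitive components, two residues can belong to the same cluster without touching directly, which is how a chain of stacked rings can span a much longer distance than any single contact.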
Conclusions-As expected, the CYP119 structure closely resembles other known P450 structures. CYP119 is shorter, owing primarily to fewer residues at the N terminus and in two surface β-hairpins. Unlike what has been observed in other P450s, the F/G loop connecting the F and G helices can adopt dramatically different conformations to optimize interactions with active site ligands of different sizes and shapes. Such changes may foreshadow the type of flexibility required by P450s that metabolize a wide range of substrates, especially the key drug-metabolizing P450s. One important structural feature that may be related to thermal stability is a large cluster of aromatic residues in CYP119 that is not found in the other P450s.