Hierarchic Finite Level Energy Landscape Model

One of the most intriguing predictions of energy landscape models is the existence of non-exponential protein folding kinetics caused by hierarchical structures in the landscapes. Here we provide the strongest evidence so far of such a hierarchy and determine the time constants and weights of the kinetic components of the suggested hierarchic energy landscape. To our knowledge, the idea of hierarchical folding energy barriers has never been tested over such a broad timescale. Refolding of yeast phosphoglycerate kinase was initiated from the guanidine-unfolded state by stopped-flow or manual mixing and monitored by tryptophan fluorescence from 1 ms to 15 min. Our strategy for building a model of yeast phosphoglycerate kinase folding was to start from the simplest paradigm and modify it stepwise, to the minimal extent necessary, after repeated comparisons with the experiments. We made no a priori assumptions about the folding landscape. The result was a hierarchic finite level landscape model that quantitatively describes the refolding of yeast phosphoglycerate kinase from 1 ms to 15 min. The early steps of the folding process happen in the upper region of the landscape, where the surface has a hierarchic structure. This leads to stretched kinetics in the early phase of folding. The lower region of the energy landscape is dominated by a trap that reflects the accumulation of a molten globule intermediate state. From this intermediate, the protein can reach the global energy minimum corresponding to the native state through a cross-barrier folding step.

The general mechanism of protein folding, i.e. how the amino acid sequence directs the organization of the polypeptide chain toward the native structure, is still an unresolved problem. A wealth of published experiments provides background for theories of protein folding, but a thorough understanding is still missing (1-4).
One of the most interesting predictions of landscape models is the existence of hierarchically structured protein folding energy surfaces that produce stretched folding kinetics lacking a characteristic time scale (25). Stretched refolding kinetics has been observed experimentally and explained using the energy landscape concept (23, 26-28). Although these findings provide experimental tests of hierarchic energy landscapes and show examples of kinetics that stretch over several orders of magnitude in time, the models presented to explain the results are based on a priori assumptions about the shape of the energy landscape.
A possible starting point to treat complex kinetics is to write the time dependence of the concentrations (C) of different states in the following form (25).
$$C(t) = \sum_{n=1}^{N} \frac{a_n}{\tau_n} \exp\left\{-\frac{t}{\tau_n}\right\} \qquad \text{(Eq. 1)}$$
Here the parameters $a_n$ are the amplitudes and $\tau_n$ the time constants of the Debye contributions, which correspond to different "levels" of the energy landscape. For a landscape that consists of a multitude of hierarchically structured kinetic barriers, the $a_n$ and $\tau_n$ parameters are not independent; they obey a scaling relation. These scaling relations are included a priori in models that explain strange protein folding kinetics (23, 26-28). Apart from a few exceptions (25), the summation is extended over an infinite number of components without considering the possibility of a hierarchic system with a finite number of Debye contributions.
Two types of scaling relations are used most frequently. If the parameters follow a power law scaling, $a_n = a_1 \cdot \alpha^{n-1}$ ($0 < \alpha < 1$) and $\tau_n = \tau_1 \cdot \lambda^{n-1}$ ($\lambda > 1$), and the summation is extended to infinity, the resulting kinetics can be approximated by a power law function (25). If the scaling of the parameters is $a_n = a_1 \cdot \alpha^{n-1}$ ($0 < \alpha < 1$) and $\tau_n = \tau_1 \cdot n^{\lambda}$ ($\lambda > 1$), and the summation is extended to infinity, the kinetics can be approximated by a stretched exponential (29). The discrete character of the energy levels is reflected in the summation and yields a correction term of logarithmic oscillations superimposed on the kinetics (25).
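The finite sum with power law scaling can be evaluated directly. The following is a minimal sketch, not the authors' code: the function names, the name `lam` for the ratio of successive time constants, and all numerical values are illustrative assumptions.

```python
import math

def scaled_parameters(a1, tau1, alpha, lam, n_levels):
    """Amplitudes a_n = a1*alpha**(n-1) and time constants tau_n = tau1*lam**(n-1)."""
    amps = [a1 * alpha ** k for k in range(n_levels)]
    taus = [tau1 * lam ** k for k in range(n_levels)]
    return amps, taus

def debye_sum(t, amps, taus):
    """Finite sum of Debye contributions: sum_n (a_n / tau_n) * exp(-t / tau_n)."""
    return sum(a / tau * math.exp(-t / tau) for a, tau in zip(amps, taus))

# Illustrative values only (not fitted parameters from the paper).
amps, taus = scaled_parameters(a1=1.0, tau1=1e-3, alpha=0.5, lam=10.0, n_levels=5)
for t in (1e-3, 1e-2, 1e-1, 1.0, 10.0):
    print(f"t = {t:8.3f} s   C(t) = {debye_sum(t, amps, taus):.6g}")
```

Because the number of levels is an explicit argument, the same function covers both a truncated (finite level) hierarchy and a numerical approximation of the infinite sum.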
Phosphoglycerate kinase (PGK) is one of the model proteins that has been used in unfolding and refolding studies (30-33). Circular dichroism-detected stopped-flow measurements have shown significant structural changes within the instrumental dead times (32, 34, 35). Studies using Förster transfer pairs have indicated that an initial collapse occurs within the 5 ms dead time of these experiments (34). The early collapse is followed by slower processes, and it takes minutes to refold to the native state (36-38). A molten globule intermediate was also observed during the refolding of PGK (39). Although several details of PGK folding have been revealed, no coherent model exists to date that explains the folding of this protein from a few milliseconds until full refolding.
Yeast phosphoglycerate kinase (yPGK) is a monomeric protein composed of two domains linked by a helical hinge. It contains two tryptophans, both located in the C-terminal domain. Trp 333 is buried in the main hydrophobic core, and its fluorescence is strongly quenched in the native state, while Trp 308 is located in a solvent-exposed turn and is less quenched in the native fold (40, 41). Proline 204 is located in a loop between the helix of the interdomain hinge and a β sheet of the C domain. It is highly conserved, and it is the only cis-proline in PGKs from Saccharomyces cerevisiae, Bacillus stearothermophilus, Trypanosoma brucei, and Thermotoga maritima (42).
In this paper, we follow refolding of yPGK and explain the observed kinetics by a hierarchic finite level (HiFi) energy landscape model. Our results provide the strongest test so far of the hierarchic protein folding landscape concept.

EXPERIMENTAL PROCEDURES
Protein Purification-Wild-type yPGK was purchased from Sigma and used after a purity check by denaturing gel electrophoresis. The P204H mutant of a histidine-tagged version of yPGK (HisPGK P204H) was constructed, expressed, and purified as described earlier (43). Protein concentrations were determined using the method of Edelhoch from the UV absorbance spectra measured on a Cary 4E spectrophotometer (44). Samples were stored in 50 mM phosphate buffer, pH 6.2, at −80 °C.
Kinetic Measurements-Refolding of samples containing 30-35 μM protein, unfolded in an aqueous solution of 1.7 M GuHCl, 50 mM potassium phosphate (pH 6.2), 1 mM EDTA, and 1 mM DTT, was initiated by 11-fold dilution with a similar buffer containing no GuHCl, using stopped-flow (Applied Photophysics π*-180) or manual mixing. Earlier experiments demonstrated that both yPGK variants studied here are completely unfolded under the conditions used before mixing (43). After mixing, the wild-type protein is completely folded, and 85% of the P204H mutant is in the native state (43). Refolding was followed by time-resolved changes of tryptophan fluorescence (excited at 295 nm, 5 nm bandwidth) or ANS fluorescence (excited at 365 nm, 5 nm bandwidth) as described earlier (45).

RESULTS AND DISCUSSION
Refolding of yPGK- Fig. 1A shows the refolding kinetics of the wild-type yPGK from the GuHCl-unfolded state between 1 ms and 15 min after initiating refolding by rapid mixing. To build a model that describes yPGK folding, we pursue the following strategy. We start from the simplest paradigm and modify it stepwise, to the minimal extent necessary, after repeated comparison with the experiments.
The simplest model for folding of any protein involves a single cooperative folding step, in which the unfolded (U) protein adopts its folded structure (F): $U \rightleftharpoons F$. Such simple folding mechanisms give rise to single exponential folding kinetics, which could not describe our results. The experimental folding kinetics shows the accumulation of a hyperfluorescent state a few seconds after initiating folding. The native structure forms later, with a time constant of a few minutes. This is in agreement with an earlier publication (39).
We modified the folding mechanism to describe yPGK by the addition of an intermediate state (I), and the reaction scheme changed to the following: $U \rightleftharpoons I \rightarrow F$.
Earlier circular dichroism-detected GuHCl titrations have shown that under our refolding conditions (0.15 M GuHCl) 99.997% of the sample is in the folded state (43). This means that the reaction leading from the folded state toward non-native states is roughly $3\cdot10^{4}$ times slower than the refolding reaction. Refolding of yPGK takes more than 1 min, so the back reaction has a time constant larger than $3\cdot10^{4}$ min (roughly 20 days). Since the full time window of our measurements is 15 min, the back reaction of the $I \rightarrow F$ transition was neglected in our model.
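The order-of-magnitude estimate above can be reproduced in a few lines. This is only a sketch of the arithmetic: it assumes a two-state equilibrium between folded and non-native protein and takes 1 min as the lower bound for the refolding time constant, as stated in the text.

```python
# Equilibrium populations under refolding conditions (from the text).
folded_fraction = 0.99997
non_native_fraction = 1.0 - folded_fraction

# For a two-state equilibrium, the ratio of populations equals the ratio of
# the backward to the forward time constant.
slowdown = folded_fraction / non_native_fraction

refolding_time_min = 1.0                       # lower bound used in the text
back_reaction_min = slowdown * refolding_time_min
back_reaction_days = back_reaction_min / (60 * 24)

print(f"back reaction is ~{slowdown:.1e} times slower")
print(f"time constant > {back_reaction_min:.1e} min (~{back_reaction_days:.0f} days)")
```

The result is several orders of magnitude longer than the 15 min observation window, which is the basis for neglecting the back reaction.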
The observed fluorescence signal can be calculated from the molar fractions of the unfolded (U), intermediate (I), and folded (F) species as follows.
Here $\Phi_U$, $\Phi_I$, and $\Phi_F$ denote the fluorescence yield of the sample in the unfolded, intermediate, and folded states, respectively. The $U \rightleftharpoons I \rightarrow F$ reaction yields bi-exponential kinetics. Such time dependencies differed significantly from the detected refolding kinetics. We observed a gradual fluorescence intensity increase that stretched over more than 3 orders of magnitude in time: from one millisecond to several seconds. To describe this, we generated the transition from the unfolded to the intermediate state as the sum of several Debye contributions.
Here the amplitudes $a_n$ and time constants $\tau_n$ are independent parameters of the kinetics.
If the time constant of the monomolecular transition from I to F is $\tau_F$, the system of differential equations describing such a scenario is as follows.
After solving the equations, the fluorescence intensity can be calculated as follows.
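A minimal sketch of how such a scheme can be evaluated, under assumptions that are ours rather than the authors': the unfolded population decays as a normalized sum of exponentials with weights `w_n` (playing the role of $a_n/\tau_n$ in Eq. 1), the step from I to F is irreversible and first order with time constant `tau_f`, and no `tau_n` equals `tau_f`. The function and parameter names are hypothetical.

```python
import math

def populations(t, weights, taus, tau_f):
    """Molar fractions (U, I, F) at time t, starting from pure U at t = 0.

    Assumes U(t) = sum_n w_n * exp(-t / tau_n) with sum(w_n) = 1, and
    dI/dt = -dU/dt - I / tau_f, dF/dt = I / tau_f (no back reaction).
    """
    u = sum(w * math.exp(-t / tau) for w, tau in zip(weights, taus))
    # Closed-form solution of the linear equation for I(t) with I(0) = 0.
    i = sum(
        w * tau_f / (tau_f - tau) * (math.exp(-t / tau_f) - math.exp(-t / tau))
        for w, tau in zip(weights, taus)
    )
    f = 1.0 - u - i  # conservation of protein
    return u, i, f

def fluorescence(t, weights, taus, tau_f, phi_u, phi_i, phi_f):
    """Signal as the yield-weighted sum of the three molar fractions."""
    u, i, f = populations(t, weights, taus, tau_f)
    return phi_u * u + phi_i * i + phi_f * f

# Illustrative values only: two fast components and a slow final step.
w, taus = [0.5, 0.5], [0.01, 0.1]
for t in (0.0, 0.1, 1.0, 100.0, 1000.0):
    print(t, fluorescence(t, w, taus, tau_f=100.0, phi_u=1.0, phi_i=1.5, phi_f=0.6))
```

With an intermediate that is brighter than both end states, as in this illustration, the signal first rises and then decays, which is the qualitative shape expected for a hyperfluorescent intermediate.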
To adjust this general description to the studied problem, we performed least squares fits of the folding kinetics for different numbers N of exponential components in the $U \leftarrow \cdots \rightarrow I$ transition. As the number of exponential components was increased, the mean square deviation between the observed and calculated kinetics ($\chi^2$) decreased. This is easily explained by the increase in the number of fit parameters. With more than five exponential components, however, unrealistic fit parameters were obtained (e.g. components with identical time constants, or time constants that fell outside the measured time range). The fit with five independent exponentials described the experimental data well. Fig. 1B shows the residuals of the fit. Fig. 2 shows the amplitudes ($a_n$) and time constants ($\tau_n$) of the best fit. The logarithmic plot clearly indicates that both the amplitudes and the time constants show a power law scaling. Such scaling can arise if the folding reaction is directed by an energy landscape that contains a high number of small traps arranged in a hierarchic structure (25). In this case the amplitudes $a_n$ and time constants $\tau_n$ are no longer independent parameters; they are connected through the scaling of the energy landscape (25).
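One simple way to check whether a set of fitted parameters scales in this way is to regress the logarithm of the parameter against the level index: a constant ratio between successive levels gives a straight line. The sketch below is an illustration of that check, not the analysis used for Fig. 2; the function name and the example values are hypothetical.

```python
import math

def geometric_fit(values):
    """Least-squares line through (n, ln v_n) for n = 1..N.

    Returns (ratio, r_squared), where ratio = exp(slope) estimates the
    constant factor between successive levels. All values must be positive.
    """
    n = len(values)
    xs = list(range(1, n + 1))
    ys = [math.log(v) for v in values]
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    syy = sum((y - y_mean) ** 2 for y in ys)
    slope = sxy / sxx
    r_squared = sxy * sxy / (sxx * syy) if syy > 0 else 1.0
    return math.exp(slope), r_squared

# Hypothetical time constants (in seconds) that scatter around a constant ratio.
example_taus = [2.5e-3, 1.9e-2, 0.16, 1.2, 10.1]
ratio, r2 = geometric_fit(example_taus)
print(f"estimated ratio between levels: {ratio:.2f}, R^2 = {r2:.4f}")
```

An R² close to 1 for both the amplitudes and the time constants would support replacing the independent parameters by a scaling relation.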
Following the implication of the scaling property found for the fit parameters, a scaled energy landscape model is proposed to describe the refolding of yPGK. The equation describing this scenario is identical to the equation used in the fit with independent exponentials, but the amplitudes and time constants are connected through a power law scaling relation (25).
$$a_n = a_1 \cdot \alpha^{n-1}, \qquad \tau_n = \tau_1 \cdot \lambda^{n-1} \qquad \text{(Eq. 6)}$$
The corresponding energy landscape has a large number of traps that can have N different depths, which give rise to a large number of independent relaxation pathways described by N different relaxation times. The fastest relaxation pathway has a time constant of $\tau_1$. Every deeper level has a $\lambda$ times longer relaxation time than the level before it. The number of protein molecules following the pathway with the characteristic time $\tau_n$ is proportional to the ratio $a_n/\tau_n$. The scaling property of the landscape reduces the number of free fit parameters to the constants $a_1$, $\tau_1$, $\alpha$, and $\lambda$, regardless of N. The refolding of yPGK was modeled with several different numbers N of levels in the scaled energy landscape. Fig. 3 shows the dependence of the mean square deviation of the best fit from the measured data ($\chi^2$) on the number N of levels in the scaled landscape.
Unlike in the fit with independent exponentials, here an increase in the number of hierarchic levels does not change the number of fit parameters; it only changes the shape of the fitting function. As is clearly visible, a minimum $\chi^2$ was obtained with $N = 5$ levels. The residuals of the fit are shown in Fig. 1C. The parameters of the fit are summarized in Table 1. The $\chi^2$ of the fit with the scaled landscape model is $28.8\cdot10^{-6}$, only slightly larger than that obtained with the fit using five independent exponentials for the $U \leftarrow \cdots \rightarrow I$ transition ($26.2\cdot10^{-6}$). The number of fit parameters is, however, much smaller for the landscape model (six independent parameters) than for the multi-exponential fit (13 independent parameters). This also argues in favor of the landscape model.
The scaling relation of the amplitudes and time constants seen in Fig. 2 could also come from an accidental coincidence in the interconversion rates of a small number of well-defined intermediate states and in their fluorescence yields. If the scaling of the parameters is a mere coincidence, then a mutation that changes the interconversion rates of the supposed intermediate states should destroy the above scaling property. Mutation of the only cis-proline of the yPGK structure changes the refolding kinetics of the protein (43), so we assayed the refolding of the P204H mutant of a histidine-tagged version of yPGK.
Refolding of the HisPGK P204H Mutant- Fig. 4A shows the refolding kinetics of the HisPGK P204H mutant. The same analysis was performed as for the wild-type protein. The best fit was obtained when $N = 5$ independent exponentials were used to describe the $U \leftarrow \cdots \rightarrow I$ transition. As for wild-type yPGK, a further increase in the number of exponentials yielded unrealistic fit parameters. Fig. 4B shows the residuals of the multi-exponential fit. The amplitudes $a_n$ and time constants $\tau_n$ of the fit were clearly different from the parameters obtained for the wild-type yPGK, but the scaling property remained valid.
The $\chi^2$ of a scaled landscape model fit was analyzed for HisPGK P204H in the same way as for the wild-type protein. The best $\chi^2$ was obtained with $N = 5$ hierarchic levels. Fig. 4C shows the residuals of the best fit with the scaled energy landscape model. The parameters of the best fit are summarized in Table 1. The $\chi^2$ obtained with the scaled landscape model ($85.9\cdot10^{-6}$) is in the same range as that of the multi-exponential fit ($53.4\cdot10^{-6}$), although it uses less than half as many independent fit parameters.
Based on the above comparison of the refolding kinetics of the wild-type protein and the HisPGK P204H mutant, we can state that the transition from the unfolded into the intermediate state during the refolding of yPGK is directed by a scaled energy landscape. This finding is further supported by earlier results of fast folding experiments that found stretched folding kinetics in several variants of yPGK in the 20 μs to 10 ms region (27, 40).
Refolding followed by ANS fluorescence (Fig. 5) coincides well with the tryptophan fluorescence-detected refolding of the wild-type protein (Fig. 1). A fit with the hierarchic landscape model and the residuals of the fit are also shown in Fig. 5. The kinetic parameters of the fit are identical within experimental error to those obtained from tryptophan fluorescence (Table 1); we can thus conclude that the intermediate state (I) of our model is a molten globule.
Table 1 legend. The $U \leftarrow \cdots \rightarrow I$ transition proceeds on a hierarchically structured energy landscape. The smallest kinetic trap has a time constant $\tau_1$. There are N levels in the hierarchic structure. Each level has a $\lambda$-fold longer relaxation time than the level before it, and $\alpha/\lambda$ times more protein molecules follow the pathway than that of the previous level. The time constant of the $I \rightarrow F$ transition is $\tau_F$. The mean square deviation of the model fit from the measured data is also indicated ($\chi^2$).
Hierarchic Landscape Versus Multi-exponential Models-Energy landscape models offer a unified view of protein folding that can interpret the diverse experimental findings. The physico-chemical environment of the protein in vitro or in vivo modulates the energy landscape that governs the dynamics of the molecule, but the underlying principles remain the same. If the energy landscape of the protein contains only a few deep energy minima, folding kinetics of the protein can be well described by simple mass action models that lead to multi-exponential kinetics. Since any kinetics can be decomposed into the sum of exponentials, there is no simple experimental way to distinguish between folding through a few discrete intermediate states and folding on a hierarchic landscape. Here we provide the strongest experimental evidence so far of a hierarchic structure in the folding energy landscape. In contrast to previous studies, we analyze our measurements without any a priori assumptions about the mechanism of the folding reaction, and we find that a hierarchic finite level model emerges. The model correctly describes both the tryptophan and the ANS fluorescence changes during the folding reaction. The observed kinetics can arise from discrete folding intermediates only if a coincidence happens: the amplitudes and time constants of the multi-exponential kinetics have to follow power law scaling.
In this case, however, a mutation that changes the folding kinetics changes one or more of the time constants and amplitudes of the transitions between the intermediate states. This should break the accidental coincidence and reveal the multi-exponential character of the mechanism. Mutating the evolutionarily conserved cis-proline of the structure changed the folding kinetics but did not reveal a multi-exponential mechanism; on the contrary, the kinetics was well described by the hierarchic finite level model.
Conclusions-We present, to our knowledge, the strongest test so far of the hierarchic folding landscape idea. Using a hierarchic landscape model, we describe the refolding kinetics of yPGK from 1 ms to the acquisition of the native structure (15 min) within experimental error.
According to our HiFi model, the early phase of yPGK folding proceeds on the scaled upper region of the energy landscape of the protein. A multitude of traps with different depths is organized in $N = 5$ levels, giving rise to a large number of independent relaxation pathways with $N = 5$ different relaxation times. The fastest relaxation pathway has a time constant of $\tau_1 = 2.4 \pm 0.5$ ms. Each level has a $\lambda = 8 \pm 0.9$-fold longer relaxation time than the level before it, and $\alpha/\lambda = 0.75 \pm 0.17$ times more protein molecules follow the pathway than that of the previous level. The hierarchically scaled landscape gives rise to stretched kinetics that extends up to several seconds. During this early phase, a molten globule intermediate is formed. The native structure is formed from the molten globule intermediate in a single cross-barrier folding step with a time constant of $\tau_F = 121 \pm 11$ s.
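As a worked illustration of these numbers, the five relaxation times and the relative share of molecules on each pathway follow directly from the quoted central values. The sketch below ignores the reported uncertainties, and the variable names (`lam` for the ratio of successive relaxation times, `pop_ratio` for the population ratio between successive levels) are our own.

```python
# Central values quoted in the text (uncertainties ignored in this sketch).
tau1_ms = 2.4        # fastest relaxation time, in ms
lam = 8.0            # ratio of successive relaxation times
pop_ratio = 0.75     # ratio of populations on successive levels
n_levels = 5

# Relaxation time of each level: tau_n = tau_1 * lam**(n - 1).
taus_ms = [tau1_ms * lam ** k for k in range(n_levels)]

# Relative populations decrease by pop_ratio per level; normalize to sum to 1.
relative = [pop_ratio ** k for k in range(n_levels)]
total = sum(relative)
fractions = [r / total for r in relative]

for level, (tau, frac) in enumerate(zip(taus_ms, fractions), start=1):
    print(f"level {level}: tau = {tau:9.1f} ms, fraction of molecules = {frac:.3f}")
```

The slowest of the five levels comes out near 9.8 s, consistent with stretched kinetics that extends up to several seconds and remains well separated from the final step of about two minutes.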