Folding Regulates Autoprocessing of HIV-1 Protease Precursor

Autoprocessing of HIV-1 protease (PR) precursors is a crucial step in the generation of the mature protease. Very little is known regarding the molecular mechanism and regulation of this important process in the viral life cycle. In this context we report here the first complete residue level investigations on the structural and folding characteristics of the 17-kDa precursor TFR-PR-Cnn (161 residues) of HIV-1 protease. The precursor shows autoprocessing activity, indicating that the solution has a certain population of the folded active dimer. Removal of the 5-residue extension, Cnn, at the C terminus of PR enhanced the activity to some extent. However, NMR structural characterization of the precursor containing a mutation, D25N, in the PR at pH 5.2 and 32 °C under different conditions of partial and complete denaturation by urea indicates that the precursor has a high tendency to be unfolded. The major population in the ensemble displays some weak folding propensities in both the TFR and the PR regions, and many of these in the PR region are of the non-native type. As both D25N mutant and wild-type PR are known to fold efficiently to the same native dimeric form, we infer that TFR cleavage enables removal of the non-native type of preferences in the PR domain to cause constructive folding of the protein. These results indicate that intrinsic structural and folding preferences in the precursor would have important regulatory roles in the autoprocessing reaction and generation of the mature enzyme.

Retroviruses, including human immunodeficiency virus (HIV), use their minimal genetic information by encoding their structural proteins and enzymes as two polyprotein precursors, Gag and Gag-Pol (1). Autoprocessing of these precursors is an essential step in the life cycle of the virus (2, 3). In HIV, HIV-1 protease (PR) plays a crucial role in virus maturation by processing these precursors into functional proteins (4, 5). The HIV-1 PR is a 22-kDa homodimeric aspartyl protease, with each monomer having 99 amino acids and contributing the conserved catalytic sequence Asp-(Ser/Thr)-Gly (6-9). As the HIV-1 protease, which is flanked by the highly variable p6pol at its N terminus and by the reverse transcriptase (RT) at its C terminus (10, 11), is responsible for all cleavages in the Gag-Pol precursor, its dimerization and autocatalytic release from the Gag-Pol are critical steps in the viral life cycle (4, 13, 14). Earlier studies have shown that premature activation or partial inhibition of the protease leads to retarded viral maturation (15-18). Hence understanding the exact sequence of protease maturation from the Gag-Pol precursor has gained importance in recent years because of its intrinsic importance in viral maturation and as a target for drugs against AIDS (19).
Pettit et al. (20) have recently shown, by co-expressing equivalent amounts of substituted Gag-Pol constructs, that the initial cleavage of the HIV-1 Gag-Pol precursor is intramolecular. Moreover, they showed that competitive active site inhibition by the drug ritonavir was 10,000-fold less for the protease embedded in the precursor than for the mature free protease (20). Earlier, kinetic studies on the model precursor system MBP-ΔTF-Protease-ΔRT showed that the protease maturation takes place in two steps. (ΔTF and ΔRT are short native sequences from the transframe protein and the reverse transcriptase, respectively. MBP stands for maltose-binding protein of Escherichia coli containing two native cleavage sites, p6pol/PR at the N terminus and PR/RT at the C terminus.) The first step involves an intramolecular cleavage of the N terminus that is followed by intermolecular cleavage of the C terminus (19, 21). A relatively low Km for peptide substrates representing the p6*-PR (where p6* is TFP+p6pol) cleavage site, compared with that for oligopeptides corresponding to other Gag or Pol cleavage sites (23), supports the view that the N-terminal cleavage is an early event in the proteolytic cascade. The activity of the protease-ΔRT was found to be nearly equal to that of the mature PR, though its conformational stability was much less than that of PR (19). However, a 600-fold decrease in catalytic activity was seen in MBP-ΔTF-protease-ΔRT compared with mature PR (19, 21). Thus the flanking N terminus of the protease seems to have important consequences for maturation.
The N-terminal transframe region (TFR), consisting of a conserved N-terminal transframe octapeptide (TFP) and a 48-60-amino acid-long variable p6pol, with a protease cleavage site at the intersection, does not have any stable secondary or tertiary structure in free solution (24), though some tendency for helix formation has been seen. However, when present with the PR, TFR does seem to act as a regulator for the autoprocessing of the protease (11, 23, 25-29). Interaction of recombinant p6* protein with HIV-1 PR was found to specifically inhibit its activity, and the inhibition was dependent on the C-terminal cleavage site residues SFNF in the p6*. In separate experiments with the precursor, these residues blocked the substrate binding cleft in HIV-1 PR after N-terminal autoprocessing of the precursor. At the same time it was also observed that the p6* stabilized the dimer, as the relative amount of dimer increased by 12% in its presence (25). Functional characterization of the model precursor ΔTFP-p6pol-PR (ΔTFP is a 5-residue variant of TFP) by examination of the mechanism and the pH rate profile of the autocatalytic reaction to produce mature PR shows that full-length TFR with its native cleavage sites is critical for the regulated autoprocessing of Gag-Pol and for optimal catalytic activity (28). The extensive study by Dautin et al. (27) on functional modulations due to N- and C-terminal extensions to PR, using an E. coli genetic assay for proteolytic activity and a bacterial two-hybrid system, shows that the TFR can restore enzymatic activity to a dimerization-deficient HIV protease variant.
Experiments with various deletion and addition mutants of PR and its precursors, Gag-Pol and TFR-PR, also give insights into folding and dimerization of PR (29). For example, deletion of the first four residues in PR led to >90% unfolded ΔPR. Similar destabilization was observed for PR with additional residues at the N terminus (29). Earlier, it was also shown that removal of the p6pol domain from the Gag-Pol polyprotein leads to a significantly higher rate of processing of the Gag-ΔPol precursor (31).
The studies discussed so far are mainly based on enzymatic activity assays for the HIV-1 PR and its precursors using the chromogenic peptide substrate Lys-Ala-Arg-Val-Nle-Phe(p-NO2)-Glu-Ala-Nle-NH2 (19, 21, 32), or immunoblotting assays of the autolytic products. These give very good quantitative as well as qualitative information with regard to the working of the various precursors of HIV-1 protease. However, there are very few reports about the residue level structural characteristics of these precursors, which are crucial to understanding the molecular mechanism of protease maturation (29, 33). Louis et al. (29) have earlier shown, through NMR, that the N-terminal TFR extension to the HIV-1 PR does not allow it to fold even in the presence of DMP323, which is one of the tightest binding inhibitors. Detailed NMR structural characterization of wild-type TFP-p6pol-PR was not possible because of its autolytic property. Hence, in a later study, an active site D25N mutation was introduced, and the HSQC spectra were seen to have many peaks at the same chemical shifts as in the spectra of the folded D25N PR, though they also had many intense peaks in a narrow region of amide proton chemical shifts (8.0-8.5 ppm), presumably belonging to the TFR residues. This indicated that the PR region folded properly, although the TFR region could not be characterized because of insufficient dispersion of the peaks (33). It was suggested that the TFR region was largely unstructured.
Thus, all the above studies demonstrate the importance of TFR for the folding and maturation of the protease. However, the mechanistic details at the residue level are still not understood. In this context we present here investigations on a precursor TFP-p6pol-PR-Cnn, where Cnn is a non-native pentapeptide extension at the C terminus of PR. Bacterial expression and MALDI analysis of the precursor show that TFR does not hamper the autoprocessing of the precursor so as to release the PR. Deletion of Cnn enhanced autoprocessing, indicating that the non-native C-terminal extension interferes in the cleavage mechanism. We carried out extensive NMR investigations on the precursor containing an active site mutation D25N, which was stable for several weeks for NMR experiments. We monitored the intrinsic folding propensities of the precursor by studying the graded changes in the dynamic as well as structural characteristics of the equilibrium intermediates, created by use of different concentrations of the chemical denaturant, urea. These results have significant implications for the regulation mechanism of the autoprocessing reaction of HIV-1 protease precursors.

MATERIALS AND METHODS
Protein Preparation-Starting with the clone for the TFR-PR-tethered dimer (TFR-PR-Cnn-PR), kindly supplied by Dr. M. V. Hosur of Bhabha Atomic Research Centre, Mumbai, we introduced an active site mutation, D25N, in PR, using a standard PCR-based site-directed mutagenesis strategy; this mutation does not affect PR folding but prevents its autocleavage. From this the TFR-PR-Cnn region was selected and introduced into the NdeI/BamHI multiple cloning site of a pET11a plasmid. The inclusion of the Cnn, besides providing a non-native flanking C terminus, has a practical advantage. It has the sequence GGSSG, and the glycines have a special significance in the NMR assignment strategy. At the same time, Cnn is known not to affect the folding characteristics of PR in the tethered dimer, which folds similarly to the native homodimer (34). Similarly, the TFR-PR construct was also prepared. The desired wild-type constructs were prepared by PCR amplification of TFR-PR and TFR-PR-Cnn regions from the full clone (TFR-PR-Cnn-PR) and introducing them into a pET11a plasmid as described above. The constructs were sequenced to verify that there were no inadvertent PCR-induced errors. The plasmid was transformed into E. coli strain BL21(DE3) for protein overexpression. Transformed bacteria were grown at 37°C in M9 medium to OD600 of ~0.8, and then induced for production of the desired proteins using 1 mM isopropyl-1-thio-β-D-galactopyranoside. Uniformly 15N- and 15N/13C-labeled protein samples were prepared by growing bacteria in M9 minimal media supplemented with 1 g/liter 15NH4Cl and 4 g/liter [13C]glucose. Protein was purified as described previously (35). MALDI analysis of the protein showed peaks at the expected molecular mass (17.3 kDa). The NMR samples contained 1 mM protein in 50 mM acetate buffer (pH 5.2) containing 5 mM EDTA, 20 mM dithiothreitol, and different concentrations of urea.
Gel Electrophoresis-The recombinant protein was induced in BL21(DE3) E. coli bacterial cells as described in the section on protein preparation. Aliquots were taken at two different induction times, 3 and 5 h, and analyzed on 12% SDS-PAGE.
Capillary Electrophoresis-The purified protein was concentrated to ~1 mM and analyzed by neutral capillary electrophoresis on a Beckman Coulter capillary electrophoresis system in the presence and absence of the denaturants, urea and guanidine hydrochloride.
Mass Spectroscopy-MALDI-TOF mass spectrometry analyses were carried out with a Micromass (UK) MALDI-TOF Spec 2E spectrometer equipped with a UV nitrogen laser (337 nm) and a dual microchannel microplate detector. The samples were prepared by mixing 1 µl of protein solution (~20 µM) with 1 µl of freshly prepared matrix solution (10 mg/ml of 2,5-dihydroxybenzoic acid in 3:2 0.1% trifluoroacetic acid/acetonitrile). A total of 1 µl of this mixture was placed on the stainless steel probe plate and allowed to dry at room temperature. The spectra were recorded in the positive reflector linear mode at an accelerating voltage of 20 kV in the range from 4000 to 30,000 Da. For each measurement, the spectra were externally calibrated using myoglobin and trypsinogen.
NMR Spectroscopy-All NMR experiments were performed at 32°C on a Varian Unity-plus 600 MHz NMR spectrometer equipped with pulse-shaping and pulsed-field gradient capabilities. For the HNN spectrum the delays T_N and T_C were both set to 28 ms. 40 complex points were used along the t1 and t2 dimensions. The HN(C)N spectrum was recorded with the same T_N and T_C parameters, the same number of t1 and t2 points, and the T_CC delay was set to 9 ms. TOCSY-HSQC was recorded with a mixing time of 60 ms, 32 complex points along the 15N (t1) dimension and 64 complex points along the 1H (t2) dimension. CBCANH and CBCA(CO)NH were recorded with 40 complex t1 points (15N) and 64 complex t2 points (13C). HNCO was recorded with 40 complex points along t1 and t2. An HSQC was recorded with 256 t1 increments. For the high resolution HSQC data, required for coupling constant measurements, 8192 and 512 complex points were acquired along the t2 and t1 dimensions, respectively. For the relaxation measurements 2048 and 256 complex points were collected along the two dimensions. For R2 measurements, the following Carr-Purcell-Meiboom-Gill (CPMG) delays were used: 10, 30, 50, 90, 130, 150, 190, and 230 ms, with spectra duplicated at 50 and 150 ms. The R2 values were extracted by fitting the peak intensities, as a function of the CPMG delay, to an exponential decay. The experiments were carried out using the pulse sequences described by Farrow et al. (36).

C-terminal Extension at PR Retards the Autoprocessing Activity of the Precursor-We checked the autoproteolytic activity of TFR-PR and TFR-PR-Cnn precursors in vivo (Fig. 1, lanes II and VII). For TFR-PR, we see no trace of the precursor in the SDS-PAGE after 6 h of induction. This is clear evidence that the TFR does not prevent the autoprocessing activity of the precursor (lane VII). However, in the case of TFR-PR-Cnn we see the presence of the intact precursor in the gel (lane II). Thus the C-terminal extension in our TFR-PR precursor slowed down autoprocessing. Our MALDI result with the purified protein also points to the same fact (Fig. 2). For the TFR-PR we see only an ~11-kDa peak for the PR and a ~7-kDa peak for the TFR part; however, for the TFR-PR-Cnn we see a peak at ~18 kDa corresponding to the precursor. This seems to suggest that the C-terminal extension possibly interacts with the PR region; either it interferes with dimer formation or it blocks the active site, as has been observed for the SFNF stretch at the C terminus of the TFR in an earlier study (26).
Intrinsic Folding Characteristics of the Precursor-Since for the autocleavage reaction the precursor has to become active by forming a dimer with the correct fold and generating an active site, we attempted to determine the structure of the precursor by NMR in solution. For this purpose we first prepared a D25N mutant of the precursor, which is inactive as the mutation is at the active site of PR. At the same time it is also known that the D25N mutation does not affect the folding of the protease (37). This mutant precursor is thus stable for weeks and is ideally suited for structural characterization by NMR. However, it turned out that the protein had a high tendency to aggregate, as seen by dynamic light scattering, capillary electrophoresis (data not shown), and also by NMR (see below), over a wide range of experimental conditions of pH and temperature. Deleting the C-terminal extension also did not make any difference with regard to this behavior. This is at variance with the earlier report by Ishima et al. (33) on a precursor, which had the mutations Q7K, D25N, L33I, L63I, C67A in the PR region, and three residues at the N terminus of TFR that were different from those in our precursor. Ishima et al. (33) found the protein to be a monomer and stable even at the high NMR concentration, and the spectra displayed features of native-like fold for the PR region. Therefore, to investigate the intrinsic folding characteristics of our present precursor, we undertook to elucidate the structural characteristics in 8 M urea and the transitions therefrom by NMR, and the various equilibrium intermediates were created by systematically varying the urea concentration. In the following, we first describe the sequence-specific resonance assignments for the various urea denatured states and then present the structural and dynamic characterizations of the precursor at pH 5.2 and 32°C.
Resonance Assignments-The TFR-PR-Cnn precursor is 161 residues long, of which the first 57 residues belong to the TFR portion. The next 99 residues, that is, 58-156, constitute the PR portion. Hence residues 59-62 and 152-156 form the dimerization domain, 82-84 form the active site, 138-140 form the substrate binding cleft, 100-106 form the hinge region, and 109-112 constitute the mobile flaps in the PR (35). The final five residues (157-161), having the sequence GGSSG, constitute an extension to the PR at the C terminus. Henceforth we will use these numbers for structural discussion.
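The numbering convention above amounts to a fixed offset of 57 between the usual PR numbering (1-99) and the precursor numbering (58-156). The following minimal Python sketch illustrates that bookkeeping; the function and constant names are illustrative and do not come from the original work.

```python
# Residue layout of the TFR-PR-Cnn precursor as stated in the text.
# All names in this block are hypothetical helpers for illustration only.
TFR_LENGTH = 57   # residues 1-57
PR_LENGTH = 99    # residues 58-156

REGIONS = [
    ("TFR", 1, 57),
    ("PR", 58, 156),
    ("Cnn", 157, 161),  # GGSSG extension
]


def pr_to_precursor(pr_residue: int) -> int:
    """Convert conventional PR numbering (1-99) to precursor numbering."""
    if not 1 <= pr_residue <= PR_LENGTH:
        raise ValueError("PR residues are numbered 1-99")
    return pr_residue + TFR_LENGTH


def precursor_to_pr(residue: int) -> int:
    """Convert precursor numbering (58-156) back to PR numbering."""
    if not 58 <= residue <= 156:
        raise ValueError("residue is outside the PR portion (58-156)")
    return residue - TFR_LENGTH


def region_of(residue: int) -> str:
    """Return the name of the segment that contains a precursor residue."""
    for name, start, end in REGIONS:
        if start <= residue <= end:
            return name
    raise ValueError("precursor residues are numbered 1-161")
```

With this offset, the D25N site of PR corresponds to residue 82 of the precursor, which falls in the 82-84 active site stretch quoted above.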
Conventionally, backbone assignment in proteins has been achieved by a combination of several three-dimensional triple resonance experiments, typically HNCA, HN(CO)CA, CBCANH, and CBCA(CO)NH (reviewed recently in Ref. 38). These experiments display correlations between HN, 15N, and (Cα, Cβ) nuclei along the protein backbone. The success of this approach depends critically on the dispersion of the Cα, Cβ chemical shifts, and therefore for unfolded proteins, where this dispersion is very poor, the method has serious limitations. Our methodology of assignment is based on the recently described triple resonance experiments HNN and HN(C)N (39). The most significant feature of these experiments is the observation of different patterns of positive and negative peaks in the (F1, F3) planes depending on the residue types at the i−1, i, and i+1 positions. These have been discussed in detail earlier (39, 40); suffice it to say here that glycines and prolines play important roles in this regard, the former because of the absence of the Cβ, and the latter because of the absence of the amide proton. Triplets containing these residues produce very characteristic patterns in the (F1, F3) planes, which can be termed fixed points. These provide many start and check points for the sequential walk, and hence it is less crucial to obtain side chain assignments to validate the backbone assignments. Nevertheless, simultaneous analysis of a 15N-resolved TOCSY (41) experiment helps in resolving occasional ambiguities in the connections because of degeneracies of the chemical shifts. This is particularly useful in unfolded proteins, since the side chain chemical shifts are close to their random coil values, and hence the spin systems of the residues can be relatively easily identified.
TFR-PR-Cnn has 20 glycines and 8 prolines, which are well distributed over the length of the polypeptide chain. Thus there are a number of fixed points, well distributed, to enable unambiguous assignments. Fig. 3A shows an illustrative sequential walk through the stretch 153-161, and Fig. 3B displays the summary of the connectivities. All the amide and 15N assignments are shown in the 15N HSQC spectrum in Fig. 4.
Following the amide and 15N assignments as discussed above, the carbon assignments were readily obtained from the well known triple resonance experiments CBCANH, CBCA(CO)NH, and HNCO (42, 43). The former two experiments together provide Cα, Cβ assignments while the HNCO provides C′ assignments. We also obtained many side chain assignments for the individual residues from TOCSY-HSQC spectra in a straightforward manner, making use of the amide and 15N assignments.
All the assignments made in 8 M urea have been listed in Table I of the Supplementary Material. The HSQC spectra at other urea concentrations were very similar to the one at 8 M urea (see below), and thus peak assignments could be readily obtained by simple transfer of assignments.
Residual Structure at 8 M Urea-It is now becoming increasingly evident that the denatured states of proteins are not always entirely random coils, but may contain regions with preferred conformations or propensities for transient structure formations (44-47). The regions having propensities for definite structures are the so-called folding cores, which indicate folding initiation sites when the folding reaction is initiated by dilution of the denaturant. We have probed for the existence of such preferences in the 8 M urea denatured state of TFR-PR by using 13Cα and 13C′ secondary shifts (deviations of chemical shifts from their random coil values); these are believed to be the most diagnostic from the point of view of residual structural characterization (47, 48). Positive secondary shifts for 13Cα and 13C′ indicate a preference for φ, ψ angles in the helical conformation, while negative secondary shifts indicate a preference for φ, ψ angles in the β-sheet conformation. If a contiguous stretch of 3-4 residues shows a specific pattern of secondary shifts, that can be taken to indicate the presence of a transient secondary structural propensity in that region of the protein. Now, in any protein the observed chemical shifts are influenced both by neighboring amino acids and by local backbone structure. Therefore, it is important to correct these for contributions from the local amino acid sequence (49). In the present analysis, the random coil values were corrected using sequence-dependent correction factors determined for a set of Ac-GGXGG-NH2 peptides in 8 M urea and pH 2.3 (50). For the residues D, E, and H, which are sensitive to pH, the random coil values given by Wishart et al. (51) at pH 5.0, appropriately corrected for the alanine neighbor, were used. Deviations in specific chemical shifts were then calculated by subtracting the corrected random coil values from the measured chemical shifts for all the residues in urea-unfolded TFR-PR-Cnn.
These secondary shifts are shown in Fig. 5. The data do not seem to indicate the presence of any long stretches of preferred conformations but suggest the presence of many short contiguous regions with trends of α, β preferences.
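The secondary shift analysis described above (observed shift minus corrected random coil shift, followed by a search for contiguous stretches of at least three residues with the same sign) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the function names, the threshold value, and the demonstration shifts are hypothetical, and the sequence correction of the random coil values is assumed to have been applied already.

```python
def secondary_shifts(observed, random_coil):
    """Secondary shift per residue: observed minus corrected random coil (ppm).

    Both arguments map residue number -> chemical shift; residues missing
    from either dictionary are skipped.
    """
    return {res: observed[res] - random_coil[res]
            for res in sorted(observed) if res in random_coil}


def contiguous_stretches(shifts, threshold, min_length=3):
    """Find runs of consecutive residues whose secondary shifts share a sign.

    Positive runs are labeled 'alpha' and negative runs 'beta', following the
    sign convention for 13C-alpha and 13C' shifts stated in the text.
    Residues with |shift| <= threshold break a run.
    """
    stretches = []
    run, run_sign = [], 0
    for res in sorted(shifts) + [None]:  # None is a sentinel closing the last run
        if res is None:
            sign = 0
        else:
            value = shifts[res]
            sign = 1 if value > threshold else (-1 if value < -threshold else 0)
        if run and sign != 0 and sign == run_sign and res == run[-1] + 1:
            run.append(res)
            continue
        if len(run) >= min_length:
            label = "alpha" if run_sign > 0 else "beta"
            stretches.append((run[0], run[-1], label))
        run = [res] if sign != 0 else []
        run_sign = sign
    return stretches


# Hypothetical demonstration values (not data from the paper).
observed_co = {10: 176.9, 11: 177.0, 12: 176.8, 13: 176.0,
               14: 175.2, 15: 175.1, 16: 175.3}
coil_co = {res: 176.0 for res in observed_co}
demo = contiguous_stretches(secondary_shifts(observed_co, coil_co), threshold=0.4)
```

For these made-up values the search reports one positive (helix-like) stretch at residues 10-12 and one negative (β-like) stretch at residues 14-16. The threshold is left as a parameter because the appropriate cutoff differs between nuclei.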
The Cα secondary shifts (top panel) are rather small for most residues, but show interesting sequence-dependent variations; about 15 discrete residues show large secondary shifts (>1.

The TFR segment (residues 1-57) of the protein appears to contain two short β segments and two short α segments. Previous qualitative reports on structural characteristics of TFR-PR (24, 28) suggested that the TFR segment may be largely unfolded in aqueous solutions. Our present observations, however, seem to suggest that there may be at least a few regions of some φ, ψ preferences, in an otherwise largely unstructured polypeptide. The PR segment of the protein contains many β segments and only one α stretch. The location of the α stretch (85-87) is certainly not the same as in the native PR, where it occurs near the C terminus; this corresponds to the stretch 143-151 in the present case. Several of the β stretches, namely, 62-66, 70-72, 98-102, 110-114, and 134-138, belong to the native-type structures (β type) in the dimeric structure of PR (34).
The C′ secondary shifts (bottom panel in Fig. 5) corroborate the results from Cα secondary shifts to a large extent. In both cases the contiguous stretches with α and β propensities are nearly at the same locations, as are the discrete residues with large secondary shifts. The short displacements of the stretches or a few mismatches may be attributed to the facts that the segments themselves are very short, and the sensitivities of the C′ and Cα secondary shifts to α, β preferences could be slightly different. Overall, the C′ secondary shifts are slightly larger in magnitude compared with the Cα secondary shifts (a cutoff of 0.4 ppm is used for C′ secondary shifts). The stretch at 103-111 is significantly longer, and this belongs to the flap segment of the native protease structure (34).
We also measured the HN-Hα coupling constants (see below), amide proton temperature coefficients, and 1H-1H nuclear Overhauser effects (data not shown), all of which indicate that the protein is devoid of any persistent structure in 8 M urea.
The sensitivities of the average coupling constants to the structural preferences are perhaps smaller than those of the secondary chemical shifts. The transverse relaxation rates (R2) (see below) indicated, however, sequence-dependent variations, suggesting possibilities of conformational transitions at certain locations. Thus we conclude that in 8 M urea at pH 5.2 and 32°C, the polypeptide is largely unstructured but with short pockets of specific secondary structural propensities in a dynamic ensemble.
Equilibrium Intermediates Along the Folding Funnel-Equilibrium intermediates created by different denaturant concentrations help to understand the folding transitions along the folding pathway of a protein. Fig. 6 shows the HSQC spectra of the precursor as a function of denaturant concentration. Interestingly, the spectra (Fig. 6, panels a-d) at 8, 6, 4, and 2 M urea do not show any substantial change in the profile of peak dispersions, thus showing that the protein has a high tendency to be unfolded. All the peaks present in the 8 M spectrum are also present in the 6, 4, and 2 M spectra at almost identical positions, barring a few that show small shifts. However, there are some weaker peaks in the spectra at all the denaturant concentrations, which suggest the presence of other conformations that may be partially folded forms. The presence of these peaks indicates that the state identified by the conserved peaks in the spectra would have differences in the dynamic characteristics under the different denaturant conditions. The spectrum (Fig. 6, panel e) in the absence of urea shows very few broad peaks, which is consistent with the aggregation behavior of the protein discussed earlier.
Folding Propensities-We have characterized the structural transitions from the 8 M urea state to the various other urea-created intermediates in our precursor using transverse relaxation rates (R2) and three-bond HN-Hα coupling constants.
The transverse relaxation rates are sensitive to slow motions and conformational transitions occurring on the milli- to microsecond time scale. In many instances these have provided valuable insights into sequence-dependent motional restrictions and flexibilities in denatured proteins, which in turn provide clues to the folding mechanisms (12, 52-57). In an earlier report, Bhavesh et al. (52) have shown the importance of sequence-specific variations in the transverse relaxation rates (R2), as denaturant concentration is decreased, for the folding hierarchy of HIV-1 protease. The changes in the magnitude of R2 values as the denaturant concentration is varied directly reflect the transient conformational changes along the sequence that may lead to order, and hence native structure development by formation of native contacts or breaking of non-native contacts. Fig. 7a shows the R2 values for the TFR-PR-Cnn as the denaturant concentration is decreased from 8 to 2 M. The R2 values do show sequence-specific variations indicating different degrees of restricted motions along the chain. Both upward and downward changes occur, suggesting sequence-dependent transient changes in the structural preferences. At this juncture we may mention that the absence of data points for some of the residues is caused by the difficulty in quantitation because of nearby weaker peaks; also, data points having more than 15% fitting error have not been included, and in most cases the errors are less than 6%. Fig. 7b shows the changes in the R2 values as we move toward lower denaturant concentration. Negative and positive deviations indicate increase or decrease in R2, respectively, and correspondingly represent increased and decreased conformational transitions, as long as the protein is still largely unfolded and there are no rigid structures. Once the rigid structures are formed, changes in R2 values would be dictated by internal motions only. The deviations in Fig. 7b may be divided into three classes: ±(>1.5), ±(1.5-1.0), and ±(1.0-0.0). The third class is roughly similar to the errors in the R2 measurements and hence may not be considered significant. Thus it follows that as we move from 8 to 6 M urea the residues 9-12, 16, 55-56, 75, 77, 111, 124, 144, 149, and 157 show large propensities for conformational transitions, followed by residues 18, 67, 80, and 102. The important numbers among these are 55-56 at the N terminus and 157 at the C terminus, both of which are cleavage sites of the TFR-PR, and 102 and 111 at the flaps of the PR domain. As we move to 4 M, mostly the same regions exhibit variations, but the magnitudes are somewhat reduced, except for the stretch 82-84 at the active site, which shows enhancements indicating large conformational transitions. The general reduction may indicate a tendency toward formation of stable contacts. This trend continues as we move to 2 M, where we see a large decrease in the contribution from the conformational transitions at milli- to microsecond time scales. This seems to indicate formation of relatively more rigid contacts. A more detailed characterization would require NOE quantitation and structure calculations, but this is hampered by the tendency of the protein to aggregate and precipitate.
The 3J(HN-Hα) coupling constant, which depends on the main chain torsion angle, is an NMR parameter that can be rigorously analyzed to get an insight into the secondary structural elements that define the conformational preferences (30). A value of the order of 3-5 Hz corresponds to α-helix, 8-11 Hz corresponds to β-sheet, and 6.0-7.5 Hz, which essentially is an average of the α, β values, corresponds to random coil.
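The R2 analysis described above (extracting R2 from the decay of peak intensity with CPMG delay, then classifying the change in R2 between two urea concentrations) can be sketched as follows. This is a minimal illustration under loudly stated assumptions: the decay is taken to be a single exponential, I(t) = I0 exp(−R2 t), fitted by linear least squares on ln I, which may differ from the fitting procedure actually used; the deviation is taken as R2 at the higher urea concentration minus R2 at the lower one, as inferred from the sign convention in the text; the class boundaries are assumed to be in s^-1; and all function names and demonstration values are hypothetical.

```python
import math

# CPMG delays quoted in the Methods, converted from ms to s.
CPMG_DELAYS_S = [0.010, 0.030, 0.050, 0.090, 0.130, 0.150, 0.190, 0.230]


def fit_r2(delays_s, intensities):
    """Estimate R2 (s^-1) from peak intensities by a log-linear fit.

    Assumes I(t) = I0 * exp(-R2 * t); the slope of ln(I) versus t is -R2.
    """
    logs = [math.log(value) for value in intensities]
    n = len(delays_s)
    mean_t = sum(delays_s) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in delays_s)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(delays_s, logs))
    return -sxy / sxx


def delta_r2(r2_high_urea, r2_low_urea):
    """Per-residue deviation: R2(high urea) - R2(low urea).

    A negative value therefore means that R2 increased on lowering urea.
    """
    return {res: r2_high_urea[res] - r2_low_urea[res]
            for res in sorted(r2_high_urea) if res in r2_low_urea}


def classify_change(delta):
    """Assign a deviation to one of the three magnitude classes in the text."""
    magnitude = abs(delta)
    if magnitude > 1.5:
        return "large"
    if magnitude >= 1.0:
        return "moderate"
    return "insignificant"  # comparable to the measurement error


# Synthetic decay generated from a known R2 of 8.0 s^-1 (not measured data).
synthetic = [100.0 * math.exp(-8.0 * t) for t in CPMG_DELAYS_S]
fitted = fit_r2(CPMG_DELAYS_S, synthetic)

# Hypothetical per-residue R2 values at two urea concentrations.
changes = delta_r2({9: 6.0, 10: 5.0, 11: 7.0}, {9: 8.0, 10: 5.2, 11: 5.8})
labels = {res: classify_change(value) for res, value in changes.items()}
```

A log-linear fit is used here only to keep the sketch dependency-free; with noisy intensities a nonlinear fit with error estimates would normally be preferred, which is also what a 15% fitting-error filter of the kind mentioned above would require.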
It is also observed that the random coil value for any residue is influenced by its N-terminal neighbor and thus two sets of values have been reported for each residue, depending upon whether the N-terminal neighbor belongs to one of the two groups of residues (22) (group I: Phe, Trp, His, Tyr, Ile, Thr, Val, and group II: remaining residues except glycine). Thus, under any given experimental conditions, deviation of the observed coupling constants from the sequencedependent random coil value, (J obs Ϫ J rc ), which we call as secondary coupling constants analogous to secondary chemical shifts, would throw valuable light on the secondary structural propensities along the polypeptide chain. Negative secondary coupling constants would indicate ␣-helical propensities and positive secondary coupling constants would indicate ␤ propensities. Fig. 8a shows the fine structures of the peaks in the HSQC spectra from which the couplings were derived. The measured values range from 5.0 to 9.1 Hz along the sequence in all the cases from 8 to 2 M urea as shown in Fig. 8b, and the average value (indicated by horizontal line) is roughly the same (ϳ6.8 Hz); the estimated error in the measured coupling constants is ϳ1 Hz. The calculated secondary coupling constants are shown in Fig. 8c. As expected, many of the secondary coupling values are zero or close to zero because of random coil characteristics; however there are also few residues that show either positive or negative deviations larger than 1 Hz. Notable among these are the contiguous stretches at 8 -12 in 6 M and at 19 -25 and 110 -116 in 4 M data, which may be taken to indicate some conformational transitions. These have been marked on the figure by empty cylinders and all of them correspond to ␣ propensities. Interestingly, those in the PR region are nonnative type. Fig. 
9 summarizes all the observations with regard to the residual structures in 8 M urea, which represent intrinsic initial preferences, and the folding transitions as the urea concentration is reduced. An important inference that can be derived from these at a glance is that folding is not a unidirectional process, in that a structure formed at any particular intermediate stage does not necessarily persist to the end. Structure forming and breaking events occur continuously in the progress toward the folded state. It is interesting to note that the TFR region in the precursor, which is considered to be largely unstructured, also has some intrinsic preferences. These may also influence the preferences in the PR domain by direct interaction. The PR domain has not only native preferences but also many non-native preferences, which must be removed for the protein to fold properly to its native form. In the PR alone this does seem to happen and a fully folded dimeric protein is obtained. However, in the inactive precursor, failure to remove the TFR portion prevents removal of non-native contacts and possibly leads to interaction between the TFR and PR regions, which in turn results in a misfolded state that is highly prone to aggregation at NMR concentrations.

CONCLUSIONS

We have tried here to obtain residue level insights into the intrinsic conformational preferences in the precursor TFP-p6 pol -PR-C nn and the characteristics of the folding intermediates by investigating the structural and dynamic features of the species created by systematic variation of the urea concentration in the solution. The protein seems to have intrinsic preferences for many non-native contacts in addition to several native preferences, and the presence of TFR prevents proper folding of the protein to the native dimeric structure. However, these non-native contacts seem to be removed after cleavage of the TFR in the normal course.
By extrapolation, blocking of this cleavage by an active site mutation seems to prevent removal of the non-native contacts, which in turn leads to misfolding and consequent aggregation of the protein. Possibly there is an interaction between the TFR region and the PR region, which hinders such a change. Indeed, previous reports have indicated the existence of such an interaction (26). An interesting observation in our precursor was that the C-terminal extension led to some retardation of its autoprocessing activity, as shown by the induction SDS-PAGE analysis and the MALDI data. This may suggest an interaction between the C-terminal extension and the PR domain as well. These results, viewed against the background of previous results by Ishima et al. on another precursor, which differed from the one studied here by several mutations and by differences in the TFR sequence at the N terminus, indicate that the sequence plays a major role in dictating the folding propensities of the precursor. In real life, mutations occur frequently in HIV-1 protease precursors because of the poor fidelity of the reverse transcriptase of the virus. As the different mutant proteins would have different folding characteristics, they can also be expected to have different activities and responses to substrates and inhibitors. Taken together, it emerges that folding plays a crucial role in the regulation of precursor processing, and hence in protease function in the viral life cycle.

FIG. 8. Secondary coupling constants. a, portion of the high resolution HSQC spectrum at 6 M urea, shown to illustrate the high resolution of the peaks, which has enabled measurement of HN-Hα coupling constants from peak separations. b, measured coupling constants from such spectra at 8, 6, 4, and 2 M urea concentrations. c, deviations of the measured coupling constants from the random coil values at the four urea concentrations. Contiguous stretches have been marked with empty cylinders.

This study, to our knowledge, gives the first full-length residue level description of the folding characteristics of an HIV-1 protease precursor, and provides useful insights into how folding can have a regulatory role in its autoprocessing and vice versa. The work also represents another significant advance, namely that this is the longest unfolded polypeptide chain assigned and studied by NMR so far.