pH-induced Conformational Transitions of the Propeptide of Human Cathepsin L: A Role for a Molten Globule State in Zymogen Activation

Synthesis of proteases as inactive zymogens is a very important mechanism for the regulation of their activity. For lysosomal proteases, proteolytic cleavage of the propeptide is triggered by acidic pH. By using fluorescence, circular dichroism, and NMR spectroscopy, we show that upon decreasing the pH from 6.5 to the propeptide of cathepsin L loses most of the tertiary structure, but almost none of the secondary structure is lost. Another partially structured intermediate, prone to aggregation, was identified between pH 6.5 and 4. The conformation populated below pH 4, where the activation of cathepsin L occurs, is not completely unfolded and has the properties of a molten globule, including characteristic binding of 1-anilinonaphthalene-8-sulfonic acid. This pH unfolding of the propeptide parallels a decrease of its affinity for cathepsin L and suggests a mechanism for the acidic zymogen activation. Addition of anionic polysaccharides, which activate cathepsin L already at pH 5.5, unfolds the tertiary structure of the propeptide at this pH. The propeptide of human cathepsin L, which is able to fold, represents an evolutionary intermediate in the emergence of novel inhibitors originating from the enzyme proregions.

All lysosomal and most other proteases are synthesized in the form of inactive precursors (1,2). Propeptides are generally located N-terminal to the mature enzyme, and activation of the enzyme is accomplished by cis- or trans-cleavage of the propeptide. Propeptides vary from a few (e.g. trypsin) to more than 200 residues (e.g. cathepsin C). Longer propeptides are generally strong and specific inhibitors of their mature enzymes (3-7). In most cases propeptides are also indispensable for correct folding of the enzymes (8). In some enzymes folding of the mature form is extremely slow, and the propeptide assists in overcoming the kinetic barrier (9), which may also be overcome by physicochemical factors such as high ionic strength, as in subtilisin (10). Proenzymes are quite often more stable than mature enzymes (11,12) and can represent a pool of latent enzyme until activation occurs under the proper conditions. Propeptides are also involved in targeting to specific organelles (13,14); they can affect posttranslational modification such as glycosylation (15) and mediate interactions with other molecules (16,17). Propeptides can be cleaved either by other proteases or by intra- or intermolecular autocatalysis. A change in pH is one of the most common environmental triggers for the activation of proteases, occurring in cysteine, aspartic acid, and metalloproteases (1,18,19). Low pH is thought either to increase the susceptibility of the propeptide as a substrate, due to the protonation of groups close to the cleavage site, or to cause a conformational change in the propeptide or enzyme.
Cathepsin L is one of the most active cysteine proteases and accounts for most of the lysosomal cysteine protease activity (20). It has been implicated in a range of processes including turnover of proteins involved in growth regulation, tumor invasion and metastasis (21), and bone resorption (22). The mouse analogue of procathepsin L is secreted from transformed mouse fibroblasts and was initially called major excreted protein (11). Procathepsin L is stable under neutral and slightly alkaline pH conditions, where mature cathepsin L is rapidly inactivated (11). The proregion of cathepsin L is required for its correct folding both in vivo (23) and in vitro (24). Activation of cathepsin L occurs autocatalytically below pH 4 (11,25), and also at higher pH (around pH 5.5) in the presence of anionic oligosaccharides such as dextran sulfate or heparin (26,27). This mode of activation is very similar to that of other cysteine proteases of the papain family (28,29).
The crystal structure of procathepsin L (30) shows that the proregion consists of a compact domain composed of three α-helices and a short stretch of β-strand, comprising the first 75 residues, followed by a less ordered, extended chain of the remaining 20 residues. This domain lies above the active site cleft and inhibits the activity by sterically preventing access to the active site. The structure of the proregion in procaricain is very similar (31), whereas in procathepsin B it lacks the first helix (32,33). The helical domain of the proregion has a hydrophobic core containing three of the four tryptophans, which provide a convenient spectroscopic probe for investigating its conformation. The complete propeptide as well as its fragments that comprise the compact domain are potent inhibitors of cathepsin L (34). This inhibition is strongly pH-dependent and drops sharply at acidic pH. The main contribution to this decrease is a decrease in the rate of formation of the propeptide-enzyme complex (34). However, the mechanism of the pH-dependent decrease in the Ki and the resulting cathepsin L activation is unknown.
In the present study we were interested in the connection between the conformation of the propeptide and the process of acid-induced zymogen activation. Recombinant propeptide of cathepsin L was prepared in Escherichia coli. We applied fluorescence, circular dichroism, and nuclear magnetic resonance to investigate the structure of the free propeptide in solution. We show that the propeptide folds into a defined tertiary structure at neutral pH, which has not yet been observed for any propeptide of the lysosomal proteases. Moreover, we found a distinct conformational change upon decreasing the pH. Below pH 3.5, the pH of activation of procathepsin L, the propeptide adopts a molten globule conformation, which can explain a decrease in the inhibitory constant, and particularly in the association rate, and provides a mechanism for the activation.

Materials
Restriction enzymes and Vent DNA polymerase were purchased from New England Biolabs, and the DNA sequencing kit was from Amersham Pharmacia Biotech. Oligonucleotides were custom-synthesized by The Great American Gene Co. (Ramona, CA).

Methods
Production of Recombinant Propeptides in E. coli-The region coding for the propeptide of human cathepsin L (PRL) was amplified by polymerase chain reaction using oligonucleotides 5′-GCGCATATGACTCTAACATTTGATCAC-3′ and 5′-CGCGGATCCTACTCATAAAACAGAGGTTC-3′. The gene coding for human preprocathepsin L was used as a template. An additional methionine was added at the N terminus. For the shortened propeptide PRL78, comprising the N-terminal 78 residues of the propeptide and including the additional methionine, oligonucleotide 5′-ATCGGGATCCTAGCCATTCATCACCTGCCTGA-3′ was used instead of the 3′ primer. Polymerase chain reaction products were subcloned into the pET3a expression vector (Novagen). Recombinant proteins were expressed intracellularly in E. coli under the T7 RNA polymerase promoter, induced with 0.4 mM isopropyl-1-thio-β-D-galactopyranoside. 15N-Labeled proteins were prepared by growing bacteria in M9 minimal medium supplemented with 1 g/liter 15NH4Cl (Isotec). Full-length propeptide and propeptide-(1-78) of cathepsin L were produced at yields of approximately 20 and 7 mg/liter bacterial culture, respectively. Both proteins were produced in the form of inclusion bodies, which were isolated by sonication of the bacterial paste, suspended in 20 mM Tris, pH 8.0, 0.1 M NaCl, 0.1% Triton X-100, and centrifuged. The resulting compact pellet was extensively washed in the same buffer and finally in buffer containing 1 M urea. Recombinant proteins were purified on a gel filtration column under denaturing conditions (8 M urea, 50 mM acetate, pH 3.3). PRL was refolded by gel filtration, which transferred the protein from the denaturing buffer into 1 M urea, 20 mM Tris, pH 8.0. The protein was further purified on a Q-Sepharose column in the presence of 1 M urea. PRL78 was refolded by dilution into 20 mM phosphate buffer, pH 8.0, dialyzed, and further purified on a Q-Sepharose column in the same buffer.
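The primer design described above can be checked with a short Python sketch that locates restriction sites in the two full-length primers and reverse-complements the 3′ primer to expose the stop codon. The assignment of CATATG to NdeI and GGATCC to BamHI is general enzyme knowledge and is not stated in the text; that these sites were used for subcloning is an assumption, and all function and variable names are illustrative.

```python
# Primer sequences as given in the Methods (written 5'->3').
FWD_PRIMER = "GCGCATATGACTCTAACATTTGATCAC"
REV_PRIMER = "CGCGGATCCTACTCATAAAACAGAGGTTC"

# Recognition sequences; their role in the cloning is an assumption.
NDE_I = "CATATG"
BAM_HI = "GGATCC"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def site_position(seq: str, site: str) -> int:
    """Return the 0-based index of the first occurrence of site, or -1."""
    return seq.find(site)


if __name__ == "__main__":
    # The ATG inside the NdeI site would supply the added N-terminal Met.
    print("NdeI site in 5' primer at index", site_position(FWD_PRIMER, NDE_I))
    print("BamHI site in 3' primer at index", site_position(REV_PRIMER, BAM_HI))

    # Coding-strand view of the 3' primer: the propeptide C terminus,
    # then a TAG stop codon that overlaps the BamHI site.
    coding = reverse_complement(REV_PRIMER)
    print("3' primer, coding strand:", coding)
    print("first TAG at index", coding.find("TAG"))
```

Read this way, the 3′ primer ends the open reading frame with a stop codon directly before the restriction site, which is consistent with expression of the propeptide as a separate polypeptide.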
For spectroscopic experiments propeptides were dialyzed against water and adjusted to pH 8. Any precipitate was removed by filtration, and the corresponding buffer was added for each pH value.
Fluorescence-A Perkin-Elmer model LS-50 luminescence spectrometer was used for fluorescence measurements. Tryptophan emission of PRL was measured using an excitation wavelength of 290 nm. Three scans from 300 to 410 nm were recorded at a scan rate of 180 nm per min. PRL concentration was 0.02 mg/ml. ANS fluorescence was excited at 370 nm, and the emission spectra were measured from 400 to 610 nm. Two scans at a scan rate of 220 nm per min were averaged. 10 µl of ANS (4 × 10⁻³ M) was added to 620 µl of the same PRL solutions as for the tryptophan emission, giving a molar ratio of ANS to protein of 35. Buffers for pH measurements were 10 mM Tris-HCl, pH 8.0, 10 mM potassium phosphate, pH 7.6 to 6, 15 mM sodium acetate, pH 5.7 to 3.8, and 15 mM glycine buffer, pH 3.5 to 2.2. pH 2 and lower was achieved by the addition of HCl to the protein in water.
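The stated ANS-to-protein ratio can be cross-checked from the quantities given above. The Python sketch below back-calculates the protein molar mass implied by a ratio of 35; the molar mass of PRL is not given in the text, so the result is only a consistency check under that stated ratio, and the function names are illustrative.

```python
import math


def ans_to_protein_ratio(ans_volume_l, ans_molar, protein_volume_l,
                         protein_g_per_l, protein_molar_mass):
    """Molar ratio of ANS to protein in the mixed sample."""
    ans_mol = ans_volume_l * ans_molar
    protein_mol = protein_g_per_l * protein_volume_l / protein_molar_mass
    return ans_mol / protein_mol


def implied_molar_mass(ratio, ans_volume_l, ans_molar, protein_volume_l,
                       protein_g_per_l):
    """Protein molar mass (g/mol) that reproduces a given ANS:protein ratio."""
    ans_mol = ans_volume_l * ans_molar
    protein_g = protein_g_per_l * protein_volume_l
    return ratio * protein_g / ans_mol


if __name__ == "__main__":
    # Values from the Methods: 10 ul of 4e-3 M ANS added to 620 ul of
    # 0.02 mg/ml PRL (0.02 mg/ml is numerically equal to 0.02 g/l).
    mass = implied_molar_mass(35, 10e-6, 4e-3, 620e-6, 0.02)
    print(f"implied molar mass: {mass:.0f} g/mol")
    ratio = ans_to_protein_ratio(10e-6, 4e-3, 620e-6, 0.02, mass)
    print(f"ratio recovered from that mass: {ratio:.1f}")
```

An implied molar mass of roughly 11 kDa is of the order expected for a polypeptide of about 96 residues, so the stated ratio is internally consistent with the stated volumes and concentrations.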
Circular Dichroism-CD spectra were measured on an Aviv model 62A DS CD spectrometer equipped with a thermostated cell holder. All measurements were performed at 18°C to avoid thermal denaturation and to reduce the mobility of any structure present. Near and far UV CD spectra were measured in cells with path lengths of 10 and 1 mm, respectively. Base-line spectra were subtracted from each spectrum. Concentration of the proteins was determined from the absorbance at 280 nm using extinction coefficients calculated from the amino acid composition (2.27 and 2.4 for 1‰ solutions of PRL and PRL78, respectively). Typical concentrations of the propeptides were 20 and 5 µM for the near and far UV CD spectra, respectively. Results were converted into mean residue ellipticities in the far UV region (250-190 nm) and into molar ellipticities in the near UV region (340-250 nm).
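The two conversions mentioned above, absorbance to concentration and observed ellipticity to mean residue ellipticity, follow standard formulas and can be sketched in Python as below. This is not the authors' processing code. It assumes that the 1‰ extinction coefficient refers to a 1 mg/ml solution in a 1-cm cell; the absorbance, ellipticity, and residue count used in the example are hypothetical inputs chosen only to exercise the functions.

```python
def concentration_mg_per_ml(a280, ext_coeff_1mg_per_ml, path_cm=1.0):
    """Protein concentration from A280 using a mass extinction coefficient.

    ext_coeff_1mg_per_ml is the absorbance of a 1 mg/ml solution in a
    1-cm cell (assumed meaning of the 1 per mille value in the text).
    """
    return a280 / (ext_coeff_1mg_per_ml * path_cm)


def mean_residue_ellipticity(theta_mdeg, molar_conc, path_cm, n_residues):
    """Mean residue ellipticity in deg cm^2 dmol^-1.

    theta_mdeg : observed ellipticity in millidegrees
    molar_conc : protein concentration in mol/l
    path_cm    : optical path length in cm
    n_residues : number of residues in the polypeptide
    """
    return theta_mdeg / (10.0 * molar_conc * path_cm * n_residues)


if __name__ == "__main__":
    # Hypothetical reading: A280 = 0.227 for PRL (coefficient 2.27).
    print(concentration_mg_per_ml(0.227, 2.27), "mg/ml")

    # Hypothetical far UV reading of -10 mdeg at 5 uM in a 1-mm cell,
    # taking 78 residues for PRL78 purely as an illustration.
    mre = mean_residue_ellipticity(-10.0, 5e-6, 0.1, 78)
    print(f"{mre:.0f} deg cm^2 dmol^-1")
```

Dividing by the number of residues is what makes far UV spectra of PRL and PRL78 directly comparable despite their different lengths.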
Stopped-flow Fluorescence-Stopped-flow experiments were performed on an SX-17MV Applied Photophysics stopped-flow spectrofluorimeter. A protein sample at a concentration of 200 µg/ml at pH 7.2 was mixed in the stopped-flow chamber with 50 mM glycine-HCl buffer at pH 3.5. As a control the propeptide was mixed with 50 mM potassium phosphate buffer at pH 7.2. Experiments were performed at 18°C.
Nuclear Magnetic Resonance-NMR spectra were measured on an INOVA 600 Varian NMR spectrometer equipped with a triple resonance probe and z gradients. Concentration of PRL78 was around 0.5 mM in 20 mM buffers. Two-dimensional nuclear Overhauser effect spectroscopy spectra were measured on proteins dissolved in D2O at pH 7.0, without correction of the pH for the solvent. Spectral width was 12 ppm, with 256 increments in the indirect dimension and 2048 points in the direct dimension. Free induction decays were apodized with a sine square function. 1H-15N HSQC spectra were measured with spectral widths of 1700 and 3500 Hz for the 15N and 1H dimensions, respectively. Protein samples were dissolved in 90% H2O, 10% D2O. Quadrature detection was achieved with the States-time proportional phase increment method. Pulsed field gradients were used to suppress the water signal, which was further decreased by time domain spectral deconvolution. Data were transformed using FELIX software (MSI) on a Silicon Graphics workstation.
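Because the spectral widths above are quoted partly in ppm and partly in Hz, a small Python sketch of the conversion may help when comparing them. It assumes a nominal 1H frequency of 600 MHz (inferred from the instrument name, not stated explicitly) and an approximate 15N-to-1H frequency ratio of 0.1014; both are assumptions, so the results are indicative only.

```python
NOMINAL_1H_MHZ = 600.0          # assumed from the spectrometer model name
RATIO_15N_TO_1H = 0.1014        # approximate |gamma(15N) / gamma(1H)|


def hz_to_ppm(width_hz: float, carrier_mhz: float) -> float:
    """Convert a spectral width in Hz to ppm at a given carrier frequency."""
    return width_hz / carrier_mhz


def ppm_to_hz(width_ppm: float, carrier_mhz: float) -> float:
    """Convert a spectral width in ppm to Hz at a given carrier frequency."""
    return width_ppm * carrier_mhz


if __name__ == "__main__":
    n15_mhz = NOMINAL_1H_MHZ * RATIO_15N_TO_1H
    print(f"1H HSQC width:  {hz_to_ppm(3500.0, NOMINAL_1H_MHZ):.2f} ppm")
    print(f"15N HSQC width: {hz_to_ppm(1700.0, n15_mhz):.1f} ppm")
    print(f"NOESY width:    {ppm_to_hz(12.0, NOMINAL_1H_MHZ):.0f} Hz")
```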

Conformation of the Free Propeptide-The fluorescence emission spectrum of the recombinant propeptide of human cathepsin L (PRL) at neutral pH has a maximum at 335 nm, showing that tryptophan residues are predominantly in a hydrophobic environment shielded from the solvent. The near UV CD spectrum exhibits fine structure characteristic of a folded protein (Fig. 1), with high positive ellipticity centered at 280 nm, a shoulder at 288 nm, and a trough at 294 nm. This indicates rigid asymmetric environments of the aromatic residues, of which four are tryptophans. The near UV CD spectrum does not change significantly between pH 8 and 6.5. Far UV CD spectra indicate that the free propeptide has a high content of α-helix (44%, predicted with the neural network program k2d (35)), consistent with the crystal structure of the propeptide in procathepsin L. With ellipticity at 220 nm as a probe, the propeptide undergoes a cooperative and reversible transition upon heating, with a midpoint at 45°C (not shown), as expected for a protein with a defined tertiary structure. Full-length propeptide aggregates at concentrations above 0.5 mg/ml, so we also produced fragment 1-78 of the PRL (PRL78). This fragment comprises the compact domain of the proregion and contains all four tryptophan residues. It has a lower tendency to aggregate than the full-length propeptide, which makes it particularly suited for the NMR experiments. The fluorescence emission maximum of PRL78 is at 336 nm, and its near and far UV CD spectra are closely similar to the spectra of PRL (not shown), implying that the defined tertiary structure (or at least the core with the aromatic residues) is confined to this fragment. The two-dimensional nuclear Overhauser effect spectroscopy spectrum of PRL78 at pH 7.0 displays many nuclear Overhauser effects between the aromatic and aliphatic side chain protons (Fig. 2), providing additional evidence for tertiary structure interactions.
pH-dependent Conformational Changes of the Propeptide-Ellipticity in the near UV CD spectra decreases with decreasing pH (Fig. 1), indicating unfolding of tertiary structure upon acidification. Fluorescence emission intensity decreases likewise, with a midpoint of transition around pH 5.5 (Fig. 3). The wavelength maximum of the fluorescence increases from 335 nm at pH 8 (state N) to 342 nm at pH 5.4 and persists at 346 nm in the pH range 4.8 to 3.6 (state X). Further lowering the pH causes an additional increase in the wavelength maximum, which reaches 353 nm below pH 3, characteristic of solvent-exposed tryptophan residues. Far UV CD spectra remain quite similar upon decrease of pH, with only a small difference in shape at pH below 4 (Fig. 4). This indicates that the secondary structure content of the propeptide is conserved. The acidic state (state A) is thus characterized by a high content of secondary structure and the absence of the aromatic core and of persistent tertiary structure. Presence of secondary and absence of persistent tertiary structure are two of the characteristics of molten globule type conformations (36). ANS binding to the propeptide, as monitored by fluorescence emission at 478 nm, is low at neutral pH values and increases in the pH range from 3.5 to 1.5 (Fig. 5), where there is almost no tertiary structure left but still a high amount of secondary structure, again confirming the presence of a molten globule type conformation. At even lower pH the propeptide transforms into an unfolded state that does not bind ANS, as shown also for other proteins (37). Higher ionic strength shifts the transition toward higher pH values and increases the aggregation of the X state (not shown). The ANS molecules were efficiently excited by energy transfer from tryptophans when we used an excitation wavelength of 293 nm. This shows the close proximity of the hydrophobic, ANS binding surfaces to tryptophans.
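A transition with a midpoint near pH 5.5, as reported above for the fluorescence intensity, is often described with a simple cooperative titration model. The Python sketch below gives one such model; it is an illustrative assumption and not the analysis used here, and the cooperativity parameter n_protons is hypothetical (the text reports no fitted value).

```python
def fraction_native(ph: float, midpoint: float = 5.5,
                    n_protons: float = 1.0) -> float:
    """Fraction of molecules in the native-like state at a given pH.

    Hill-type model: the native state is lost on uptake of n_protons
    protons, with half of the population converted at pH == midpoint.
    """
    return 1.0 / (1.0 + 10.0 ** (n_protons * (midpoint - ph)))


def normalized_signal(ph: float, signal_native: float = 1.0,
                      signal_acid: float = 0.0, **model) -> float:
    """Population-weighted signal between the two limiting values."""
    f = fraction_native(ph, **model)
    return f * signal_native + (1.0 - f) * signal_acid


if __name__ == "__main__":
    for ph in (8.0, 6.5, 5.5, 4.5, 3.5):
        print(f"pH {ph:.1f}: fraction native = {fraction_native(ph):.3f}")
```

A two-state model of this kind cannot represent the intermediate state X described above; capturing both transitions would require at least a three-state scheme.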
Unfolding of the truncated propeptide PRL78, monitored by near and far UV CD, followed very closely the unfolding of the full-length propeptide. PRL78 was used for the two-dimensional 1H-15N HSQC NMR experiments due to its higher solubility. At pH 3.3, when the protein is in the A state, dispersion of the amide cross-peaks in the H-N plane is typical for a partially structured conformation, whereas in the unfolded state, in the presence of 8 M urea (U state), dispersion is much decreased (Fig. 6), showing that the A state is not completely unfolded. Stopped-flow fluorescence experiments were performed by transferring PRL78 from pH 7.2 to 3.5 in order to evaluate the rate of acidic unfolding. The fluorescence decrease occurred within the mixing time of the device (Fig. 7), indicating a rate of unfolding faster than 100 s⁻¹.
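To put the lower bound of 100 s⁻¹ in terms of time, the Python sketch below converts a first-order rate constant into a half-time and into the fraction of native protein remaining after a given delay. It assumes single-exponential kinetics, which the text does not establish; the 5-ms delay in the example is a hypothetical value, not an instrument specification.

```python
import math


def half_time(rate_constant: float) -> float:
    """Half-time (s) of a first-order process with rate constant in s^-1."""
    return math.log(2.0) / rate_constant


def fraction_remaining(rate_constant: float, t: float) -> float:
    """Fraction of the starting state left after time t (s)."""
    return math.exp(-rate_constant * t)


if __name__ == "__main__":
    k_min = 100.0  # lower bound on the unfolding rate from the text
    print(f"half-time at the bound: {half_time(k_min) * 1e3:.2f} ms")
    # Hypothetical 5-ms delay between mixing and the first reading.
    print(f"native fraction after 5 ms: {fraction_remaining(k_min, 5e-3):.2f}")
```

A rate at or above this bound corresponds to a half-time of about 7 ms or less, which is why the transition appears complete within the mixing time.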
Unfolding of the Propeptide in the Presence of Anionic Polysaccharides-Addition of dextran sulfate to PRL78 at pH 5.5 causes a complete elimination of the near UV CD ellipticity, showing that the tertiary structure is destroyed (Fig. 8). Ellipticity in the far UV is also much decreased at this pH, unlike that of the propeptide in the absence of dextran sulfate. At lower pH the negative far UV ellipticity increases again. In the presence of dextran sulfate at pH 5.5, aggregation at micromolar concentrations was also observed in the UV absorption spectra. This demonstrates that dextran sulfate unfolds the propeptide already at pH values below ~5.5.

DISCUSSION
Structure of the Free Propeptide-Propeptides of proteases characterized in the literature are either completely unfolded or exhibit a limited content of secondary structure (38,39), with the exception of the 166-residue propeptide of α-lytic protease and the activation domain of carboxypeptidase B (9,40). Secondary structure was in some cases induced by co-solvents such as polyethylene glycol or methanol (38,41). We have shown that the free propeptide of cathepsin L and its fragment (PRL78) are folded at neutral pH into a compact structure with defined secondary and tertiary structure. The structure has, like the proregion in the zymogen, a high content of α-helix. The near UV CD spectrum is very similar to the difference between the spectra of procathepsin L and cathepsin L (compare Fig. 2 and Fig. 3A in Ref. 34), suggesting that the conformations of the free propeptide and of the proregion in the zymogen are probably similar. The proregion is essential for folding of cathepsin L (23), and, particularly because of its ability to fold independently, as demonstrated in this work, it is a good candidate for a folding cofactor of the mature enzyme, similar to the propeptide of α-lytic protease (9) or of subtilisin (42), which, however, does not fold per se. Previously it was reported that the propeptide of cathepsin L is devoid of persistent tertiary structure (34); however, those measurements were performed in the presence of 10% acetonitrile and at pH 5.5, where most of the tertiary structure is already destroyed. Propeptides of papain and papaya protease IV have also been expressed and shown to contain some secondary structure (39).
Existence of tertiary structure in the free propeptide is particularly interesting in light of the fact that the homologues of the proregion of cathepsin L, cytotoxic T-cell lymphocyte antigen-2 (CTLA-2α and CTLA-2β), are expressed in murine T-cells and mastocytes as independent proteins (43) and are functional as inhibitors of cysteine proteases (44). This represents an interesting demonstration of a co-evolution of the enzyme and inhibitor as one protein (proenzyme), until the proregion is able to fold independently, which is a necessary condition for the independent existence of the propeptide as an autonomous protein. Of all the proregions of cysteine proteases, those of CTLA-2 are most similar to the proregion of cathepsin L. A phylogenetic tree of the proregions of cathepsins L, S, K, B, and H (not shown) has been constructed, which suggests that divergence of CTLA-2 and the proregion of cathepsin L occurred by gene duplication after divergence of cathepsin L and other cysteine proteases. Similarity between CTLA-2 and the proregion of cathepsin L is, as in proregions of other cysteine proteases, concentrated in the first 80 residues, corresponding to the compact domain. Gene structures of the CTLA-2s are not yet known. Proregions of human and murine cathepsin L are coded by exons 2-4 (45,46). The second exon codes for the signal sequence and the first helix of the proregion, whereas exon 3 starts precisely at the beginning of the longest helix, helix 2, and includes the hairpin with the extended strand. Exon 4 starts at the beginning of the third helix and includes the linker between the compact proregion domain and the enzyme. The end of the fourth exon does not correspond to the border between the proregion and the mature enzyme but also includes 19 residues of the mature enzyme.
Acidic Unfolding of the Propeptide and Activation of the Zymogen-Activation of the protease zymogen must occur under conditions where the propeptide no longer inhibits the enzyme.
This can be accomplished either by cleavage of the proregion into noninhibitory fragments, generally by another protease, as for example in the coagulation cascade (48), or by changes in the conformation of the proregion in response to changes in the medium (e.g. acidic pH, binding of anionic oligosaccharides or membranes) with subsequent cleavage either intra- or intermolecularly (49). We have shown that the free propeptide of cathepsin L undergoes a conformational transition upon decreasing the pH. Two different, partially structured conformations were observed: one in the pH range from 6 to 4, which has a low amount of tertiary structure and is more prone to aggregation, and the second, the acidic state, which has almost no tertiary structure but a high amount of secondary structure and is populated in the pH range below 4, where the activation of cathepsin L occurs in vitro. This A state also binds ANS and so has the attributes of a typical molten globule state. It has been shown previously that the inhibitory activity of the propeptide of cathepsin L decreases with decreasing pH (34). This decrease is mainly due to a decrease in the association rate constant, showing that at acidic pH conformational rearrangements have to be made in order for the propeptide to bind to cathepsin L. The rate of dissociation, on the contrary, is almost independent of pH. In the tertiary structure of the proenzyme there are few electrostatic interactions between the proregion and enzyme (30) compared with those within the proregion. Interactions between the proregion and the enzyme are achieved mainly through the short β-strand, the conformation of which is secured by the second, longest, helix. The first helix of the proregion does not form any contacts with cathepsin L, but its deletion decreases the inhibition by 2 orders of magnitude (34), showing the importance of the tertiary structure of the propeptide for the interactions with the enzyme.
In the case of propapain, the optimum pH of activation was increased by one pH unit through the introduction of an additional charge into the proregion by an F38H mutation (50), consistent with the role of electrostatic interactions in disruption of the tertiary structure of the proregion.
It is likely that the propeptide could be stabilized by cathepsin L, as has been demonstrated for subtilisin (51), where the propeptide per se does not possess persistent tertiary structure. However, procathepsin L has approximately one-tenth of the enzymatic activity of mature cathepsin L (11), and active site inhibitors can be bound to the proenzyme (11), indicating either an exchange between the active and inactive forms of the proenzyme or that in solution the complex between the enzyme and proregion is not as tight as in the crystal. At acidic pH, once the proregion dissociates from the enzyme it would rapidly unfold into the molten globule state, which has a low rate of association with the enzyme, and could be cleaved off either intra- or intermolecularly. We have demonstrated with stopped-flow experiments that at pH 3.5 unfolding from the N into the A state is a fast process, which may be able to compete with formation of the propeptide-enzyme complex.
It has been observed for other proteases that a conformational change accompanies the activation, as in the case of the matrix metalloproteases (52,53). Formation of molten globule conformations under acidic conditions is a rather common phenomenon (54-60) and probably occurs due to the nonspecific long range electrostatic interactions caused by protonation at low pH (61). Unfolding of a domain of a protein could represent a more general mechanism of regulation employing a pH switch. For this mechanism two domains of a protein with different pH stability should be discernible and connected by a linker, as between the propeptides and cysteine proteases.
Anionic oligosaccharides appear to activate the zymogen by unfolding the propeptide and thereby increasing its susceptibility to proteolysis. Histidine residues are probably responsible for binding to dextran sulfate below their pK, which coincides with the pH around 5.5 where this type of activation occurs. The binding motif of the proregion of mouse cathepsin L that is responsible for its binding to microsomal membranes has been identified (62,63). This motif is restricted to a nonapeptide that includes several basic residues and histidines. An effect similar to that of the anionic oligosaccharides might be exerted by membranes containing anionic phospholipids, which have been shown to shift induction of molten globule formation to higher pH (64,65). Molten globule conformation has been implicated in physiological processes such as membrane insertion and pore formation (36,66,67) or chaperone binding (68), and acid activation of proteases may be another role of this type of protein conformation.