Processing, Activity, and Inhibition of Recombinant Cyprosin, an Aspartic Proteinase from Cardoon (Cynara cardunculus)*

The cDNA encoding the precursor of an aspartic proteinase from the flowers of the cardoon, Cynara cardunculus, was expressed in Pichia pastoris, and the recombinant, mature cyprosin that accumulated in the culture medium was purified and characterized. The resultant mixture of microheterogeneous forms was shown to consist of glycosylated heavy chains (34 or 32 kDa) plus associated light chains with molecular weights in the region of 14,000–18,000, resulting from excision of most, but not all, of the 104 residues contributed by the unique region known as the plant specific insert. SDS-polyacrylamide gel electrophoresis under non-reducing conditions indicated that disulfide bonding held the heavy and light chains together in the heterodimeric enzyme forms. In contrast, when a construct was expressed in which the nucleotides encoding the 104 residues of the plant specific insert were deleted, the inactive, unprocessed precursor form (procyprosin) accumulated, indicating that the plant-specific insert has a role in ensuring that the nascent polypeptide is folded properly and rendered capable of being activated to generate mature, active proteinase. Kinetic parameters were derived for the hydrolysis of a synthetic peptide substrate by wild-type, recombinant cyprosin at a variety of pH and temperature values and the subsite requirements of the enzyme were mapped using a systematic series of synthetic inhibitors. The significance is discussed of the susceptibility of cyprosin to inhibitors of human immunodeficiency virus proteinase and particularly of renin, some of which were found to have subnanomolar potencies against the plant enzyme.

Sequences have been elucidated recently for genes encoding aspartic proteinases of plant origin, e.g. barley (1), rice (2), tomato (3), and oilseed rape and Arabidopsis (4). All of these sequences predict that, by comparison with the well studied aspartic proteinases from mammals, fungi, yeasts like Saccharomyces (5) and Candida (6), parasites like Plasmodium falciparum (7), and viruses such as HIV 1 (8), approximately 100 extra amino acids are introduced into the C-terminal domain of the newly synthesized plant polypeptides. The function of this region or plant-specific insert (9) is currently unknown, but, as it does share considerable sequence identity with a group of mammalian sphingolipid activator proteins known as saposins (10,11), it has been postulated to bind selectively to certain lipids and thus direct the precursor form of the aspartic proteinase into the appropriate cytomorphological compartment in the plant cell (10).
Alignment of the sequences of the inserts predicted by the plant aspartic proteinase genes with those of saposins correctly positions all six cysteine residues and a glycosylation site and also maintains the pattern of hydrophobic residues described previously for all known saposins (12). However, each plant-specific insert does not consist of a single saposin domain but appears to correspond to the C-terminal portion of one saposin domain linked to the N-terminal portion of a second saposin domain. On this basis, the plant-specific inserts have been described as swaposins (12), indicating that they are likely to have a structure similar to that of the saposins, but with N- and C-terminal halves interchanged in sequence order. Relatively little information has been reported to date on aspartic proteinases originating from plant tissues, but, in the few enzymes that have been isolated, this swaposin domain is not present and appears to have been excised by post-translational processing. Nothing is known of the enzyme(s) responsible in planta, but the excision of each insert is imprecise, resulting in the generation of a complex mixture of heterogeneous, mature aspartic proteinases within the tissues of each of the plants that has been studied, e.g. from seeds of barley (13), pumpkin (14), and Arabidopsis thaliana (15), from rice (16) and from flowers of the cardoon, Cynara cardunculus (9,17,18). Commonly, the enzymes thus generated are heterodimers with molecular weights in the region of 40,000-45,000.
This complexity of natural isoforms is compounded even further by the expression of several genes in the plant tissues, each encoding closely related enzymes so that physicochemical and enzymatic characterization of naturally occurring aspartic proteinases isolated directly from plants has been made rather difficult. A recombinant approach was thus employed in attempts to gain an initial insight into the significance of the plant-specific insert and to examine the activity and specificity of an aspartic proteinase from the plant kingdom. We have expressed in the methylotrophic yeast, Pichia pastoris, the cDNA encoding the precursor of one aspartic proteinase from the flowers of the cardoon, C. cardunculus (19). Extracts of these flowers have been documented previously to contain a number of isoforms of aspartic proteinases called cyprosins and cardosins (9,17,18) and have been used for centuries as coagulants in traditional cheese-making in regions of southern Europe, particularly the Iberian peninsula (20).
* The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
§ Supported by a studentship award from the Biotechnology and Biological Sciences Research Council.

MATERIALS AND METHODS
Gene Cloning and Mutagenesis-A cDNA clone encompassing the full-length precursor of a cyprosin was isolated by courtesy of Dr. M. Pietrzak (Basel, Switzerland) by rescreening of a cDNA library prepared from flower buds of C. cardunculus, as described previously (19,21). This was amplified by PCR using Vent DNA polymerase (New England Biolabs) with appropriate forward and reverse primers (5′-GG AAT TCC GGA TCC TCA CCT ACT GCA TTT TCG GTC-3′ and 5′-GA ATT CCG GGA TCC TCA AGC TGC TTC TGC AAA-3′, respectively; purchased from Amersham Pharmacia Biotech, Cambridge, United Kingdom (UK)) and subcloned into the pUC18 vector. In turn, this recombinant pUC18 was used as template DNA for overlapping PCR mutagenesis reactions, as described previously (22). Mutations were introduced at appropriate locations by two initial and one subsequent PCR reaction. To modify the sequence connecting the propart region to the mature cyprosin enzyme, a 241-bp fragment spanning from the pUC18 vector to the desired, mutated cleavage junction was amplified using appropriate forward and reverse primers (5′-GTT GGG TAA CGC CAG GG-3′ = F1 and 5′-CCT CAG AAA AGC AGC GAA GCC-3′, respectively). The second PCR amplified a 478-bp fragment using a forward primer (5′-GGC TTC GCT GCT TTT CTG AGG-3′) and a reverse primer (5′-GTT CTT GAA CAA GAC CTT G-3′ = R1) complementary to the sequence located downstream from the region into which the mutations were to be introduced and downstream from a BglII restriction site. The purified fragments were combined and used as template DNA in a final PCR using the F1 and R1 flanking primers. The resultant 698-bp amplicon was digested with the restriction endonucleases HindIII and BglII and subcloned to replace the corresponding wild-type segment in the original procyprosin construct.
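The fragment sizes quoted above can be cross-checked from the primer sequences alone. The sketch below, in Python, tests whether the two internal mutagenic primers are reverse complements of one another and recomputes the size of the fused product from the 241- and 478-bp fragments. The helper names (`reverse_complement`, `fused_length`) are illustrative and are not part of the original protocol; the only inputs are the primer sequences and fragment lengths given in the text.

```python
# Illustrative check of the overlap-extension PCR arithmetic; helper names are hypothetical.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def fused_length(len_a: int, len_b: int, overlap: int) -> int:
    """Length of the product when two fragments are joined across a shared overlap."""
    return len_a + len_b - overlap


# Internal mutagenic primers from the text (spaces removed).
REVERSE_MUT = "CCTCAGAAAAGCAGCGAAGCC"  # reverse primer of the first PCR
FORWARD_MUT = "GGCTTCGCTGCTTTTCTGAGG"  # forward primer of the second PCR

# If the primers anneal to opposite strands of the same region, this is True.
primers_are_complementary = reverse_complement(REVERSE_MUT) == FORWARD_MUT
overlap = len(FORWARD_MUT)

if __name__ == "__main__":
    print(primers_are_complementary)         # True
    print(fused_length(241, 478, overlap))   # 698
```

The two primers share a 21-nucleotide region, so 241 + 478 - 21 = 698 bp, in agreement with the amplicon size reported in the text.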
The region encoding the 104 residues of the plant-specific insert was excised by an identical strategy. The primer pairs were 5′-GG AAT TCC GGA TCC TCA CCT ACT GCA TTT TCG GTC-3′ = F2 and 5′-CTG GAT GAG AGG TAC CGC ACC AAT TGC ATG ATT GAT TTC-3′; and 5′-GTA CCT CTC ATC CAG GGA GAA TCA GCA GTA GAC TGC AAC-3′ and 5′-GA ATT CCG GGA TCC TCA AGC TGC TTC TGC AAA-3′ = R2. The resultant 911- and 296-bp fragments were combined and used in the final reaction using the F2/R2 combination of primers. The 1192-bp amplicon thus generated was digested with BamHI, ligated into a similarly treated, dephosphorylated pUC18 vector, and the reaction mixture was used to transform competent Escherichia coli (DH5α) cells. The authenticity of all manipulations was verified by dideoxy sequencing of both DNA strands.
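As a cross-check on the insert-deleted construct, the sketch below finds the region shared by the two internal primers, recomputes the size of the fused amplicon, and translates the forward internal primer with the standard genetic code. The function names are hypothetical, and reading the primer in frame from its first base is an assumption; it is consistent with the Val-Pro-Leu-Ile-Gln replacement described under "Results and Discussion."

```python
# Illustrative sketch; function names are hypothetical and the reading frame is assumed.
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in TCAG order.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate a DNA sequence in frame 1, ignoring any trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def shared_overlap(upstream_end: str, downstream_start: str) -> int:
    """Longest suffix of upstream_end that is also a prefix of downstream_start."""
    for k in range(min(len(upstream_end), len(downstream_start)), 0, -1):
        if upstream_end.endswith(downstream_start[:k]):
            return k
    return 0


# Internal primers from the text (spaces removed).
REVERSE_INTERNAL = "CTGGATGAGAGGTACCGCACCAATTGCATGATTGATTTC"
FORWARD_INTERNAL = "GTACCTCTCATCCAGGGAGAATCAGCAGTAGACTGCAAC"

overlap = shared_overlap(reverse_complement(REVERSE_INTERNAL), FORWARD_INTERNAL)
amplicon = 911 + 296 - overlap
junction = translate(FORWARD_INTERNAL)

if __name__ == "__main__":
    print(overlap, amplicon, junction)  # 15 1192 VPLIQGESAVDCN
```

The 15-nucleotide overlap encodes Val-Pro-Leu-Ile-Gln, and 911 + 296 - 15 = 1192 bp matches the reported amplicon.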
Expression in E. coli-Restriction digestion with BamHI enabled subcloning of each desired cDNA into the pET-3a expression vector (AMS Biotechnology, Witney, UK). Expression of this recombinant plasmid was induced by the addition of isopropyl-1-thio-β-D-galactopyranoside (0.4 mM final concentration) when the cells had reached an A600 of 0.6. After 3 h, the cells were harvested and lysed, and the resultant insoluble material was washed at 4°C for 4 h with 100 mM Tris-HCl, pH 11.0, containing 50 mM β-mercaptoethanol. After centrifugation at 16,000 × g for 30 min, the resultant pellet was solubilized by stirring at 25°C for 16 h in 6 M urea in 100 mM Tris-HCl buffer, pH 8.0, supplemented with 1 mM glycine, 1 mM EDTA, and 50 mM β-mercaptoethanol.
Expression in P. pastoris-The cDNA encoding each procyprosin (wild-type or mutants) was subcloned into the expression vector pPICZα C (Invitrogen, Leek, Netherlands) and used to transform E. coli (Top10F′) cells. Selection of transformants containing the pPICZα C vector was made on low salt LB agar containing zeocin (50 μg ml⁻¹). Plasmid DNA was purified from selected colonies, linearized by digestion with the restriction endonuclease PmeI, and electroporated into P. pastoris (KM71) cells according to the manufacturer's instructions (Invitrogen).
Yeast colonies that had undergone the appropriate recombination events to incorporate the procyprosin gene into the host chromosome were used to inoculate BMGY medium (100 ml) containing ampicillin (100 μg/ml). The flasks (250 ml) were shaken at 30°C until the cells attained an A600 of 5.0. After harvesting by centrifugation at 3000 × g for 5 min at room temperature, the cell pellets were resuspended in 20-ml aliquots of BMMY induction medium containing ampicillin (100 μg/ml) in 100-ml flasks and shaken at 30°C over a period of 7 days. Expression was induced over this time period by the addition of methanol to a final concentration of 0.5% every 24 h.
Recombinant Proteinase Purification and Characterization-Samples of medium from induced Pichia cells were analyzed by SDS-PAGE followed by staining with Coomassie Blue or by Western blotting using an anti-cyprosin antiserum that had been raised in rabbits (17). Detection of immunoreactive bands used a goat anti-rabbit IgG-alkaline phosphatase conjugate as described previously (22). Proteinase activity was monitored during purification steps using the substrate Lys-Pro-Ile-Glu-Phe*Nph-Arg-Leu (footnote 2) at pH 4.0, with the cleavage being monitored by fast protein liquid chromatography using a Pep-RPC reverse phase column (22). Aliquots of conditioned medium were diluted 10-fold prior to each assay in order to circumvent the complications caused by the increasing amounts of a yellow pigment that is released by the yeast cells into the growth medium. This pigment absorbs in the UV region of the spectrum and thus interferes with detection of the products of peptide substrate digestion.
Samples of conditioned medium harvested after 6 days were adjusted to pH 4.0 by the addition of 1 M sodium formate buffer, pH 3.0, and dialyzed at 4°C against 10 mM sodium formate buffer, pH 4.0, containing 50 mM sodium chloride with five changes of buffer. Dialysates were applied to a Hi-Trap SP column fitted into a fast protein liquid chromatography instrument (Amersham Pharmacia Biotech), and elution was continued with the same pH 4.0 buffer. Under these conditions, the yellow Pichia-derived pigment was not retained by the column. After extensive washing, recombinant cyprosin was eluted using a linear gradient of 50-500 mM NaCl in the sodium formate buffer at pH 4.0. Fractions were monitored for activity and combined as appropriate.
Samples for N-terminal sequencing were subjected to SDS-PAGE under reducing or non-reducing conditions and blotted onto polyvinylidene difluoride membrane, and appropriate bands were subjected to automated Edman degradation, as described previously (23). Attempts to derive C-terminal sequence on relevant bands were carried out (24), using a Hewlett-Packard G1009A C-terminal sequencing system. Deglycosylation reactions were carried out with N-glycosidase F (Roche Molecular Biochemicals, Mannheim, Germany) in 250 mM Tris-HCl buffer, pH 8.8, at 37°C for 1 h.
A naturally occurring preparation of isoform 3 of cyprosin was purified to homogeneity from the dried flowers of C. cardunculus as described previously (17).
Kinetic parameters for hydrolysis of the chromogenic substrate Lys-Pro-Ile-Glu-Phe*Nph-Arg-Leu were derived spectrophotometrically as described previously (23). Values of kcat were derived from the equation $V_{\max} = k_{cat}[E_t]$, where the concentration of active enzyme, $E_t$, was derived by active site titration using preparations of isovaleryl-pepstatin of precisely defined concentration (23). Inhibition constants were derived at pH 5.0, and the estimated error on all measurements was always less than ±15%. However, it was necessary to use final concentrations of cyprosin in the assay cuvettes of approximately 5 nM, and so

RESULTS AND DISCUSSION
The nucleotide sequence of the procyprosin clone isolated by rescreening of the cDNA library from flower buds of C. cardunculus (19,21) has been deposited in the EMBL/GenBank data bases under the accession number X81984. The amino acid sequence predicted by this clone is aligned with that of human procathepsin D in Fig. 1. From this, it is apparent that (i) the plant gene encodes an insert of 104 residues within the C-terminal domain that is not present in the mammalian enzyme, and (ii) both proteins are predicted to have one common glycosylation motif (at Asn67-Gly68-Thr69); the other known site of carbohydrate attachment in cathepsin D (at Asn183-Val184-Thr185) is not present in the cyprosin sequence, although an additional glycosylation motif (at Asn83I-Glu84I-Thr85I) (footnote 3) is present in the plant-specific insert of cyprosin (Fig. 1 and see Introduction). Excluding the plant-specific insert, the sequence of the mature enzyme region of this cyprosin shares 52% identity with that of cathepsin D; in contrast, the two prosegments have very little similarity to one another.
As described under "Materials and Methods," the cDNA encoding this wild-type form of procyprosin was introduced into pET-3a. Expression in E. coli strain BL21 (DE3) pLysS resulted in the accumulation of recombinant protein at a high level (estimated to be 10-20 mg/liter of culture) but this was insoluble and misfolded. Despite extensive efforts, suitable conditions to prepare significant amounts of properly folded protein could not be established. Consequently a different expression system was required and the methylotrophic yeast, P. pastoris, was selected since this has been used previously to produce an aspartic proteinase (precursor) in a soluble, properly folded form (25). The procyprosin construct was subcloned into the pPICZαC plasmid, and appropriate recombinants of the P. pastoris cells (KM71 strain) were selected as described under "Materials and Methods." Expression of the procyprosin gene resulted in the appearance in the culture medium of a cluster of immunoreactive bands in the 32-34-kDa region, together with a second cluster in the region between 14 and 18 kDa (Fig. 2A). The relative proportions of each band in each of the clusters varied from batch to batch of induced culture medium so the clusters are referred to as heavy and light chains, respectively. The heavy chain (32-34 kDa) cluster was apparent as early as the second day of induction, while the light chains (14-18 kDa) became visible after about day 5. The recombinant protein was purified from medium harvested after 6 days, as described under "Materials and Methods." Acidification and dialysis of the medium at pH 4.0, followed by chromatography on a HiTrap SP column, successfully removed the yellow pigment that is a persistent contaminant released into the culture medium of induced Pichia cells.
Footnote 3: [Plant-specific insert res]idues have been designated with the suffix I, and prosegment residues are assigned the suffix P.
The recombinant protein that was eluted by the salt gradient from the HiTrap SP column did not emerge in a sharp peak but rather was a broad smear of material that absorbed at 280 nm and that reacted with the cyprosin antiserum (data not shown). Early fractions of material contained a mixture of 34- and 32-kDa heavy chains in which the 34-kDa band was predominant (Fig. 3A, lane 1), while in the later fractions the 32-kDa band was most abundant (Fig. 3A, lane 2). All of the fractions also contained light chains migrating in the 14-18-kDa range, which stained only weakly with Coomassie Blue. The 34- and 32-kDa heavy chains and the light chains were all immunoreactive with anti-cyprosin antiserum (data not shown). Thus, it was apparent that all of the contaminating proteins had been removed and that the only remaining proteins were derived from recombinant (pro)cyprosin. The yield of (total) purified protein was approximately 1 mg/liter of culture medium. However, microheterogeneity was clearly evident.
The 34- and 32-kDa heavy chains were resolved from one another by SDS-PAGE under reducing conditions and blotted onto polyvinylidene difluoride membrane. N-terminal sequencing by Edman degradation revealed that both were identical and contained overlapping sequences (in the ratio 40:60).

SEQUENCES 1 AND 2
The distinction in the sizes of these heavy chains must therefore be accounted for by differences at their C termini. The N termini identified for the recombinant heavy chains are displaced further upstream by three and one residue, respectively, from the Asp residue (Fig. 1) that was suggested to be at the N terminus of the heavy chain of a cyprosin isoform isolated from flowers of C. cardunculus (21). Processing of the recombinant procyprosin precursor had thus taken place to remove the prosegment by cleavage at two adjacent sites in the sequence, resulting in microheterogeneity at the N termini of the resultant (34/32 kDa) heavy chains of mature cyprosin. After SDS-PAGE under reducing conditions and blotting onto polyvinylidene difluoride, attempts were also made at N-terminal sequencing of the 14-18-kDa light chains. In all cases, however, multiple residues were detected in every cycle, making assignment difficult, but indicating that the array of light chains observed was likely to have arisen from microheterogeneity at their N termini. However, when samples of recombinant cyprosin were subjected to SDS-PAGE under non-reducing conditions, a different effect was observed. A typical example is depicted in Fig. 3B (compare lanes 3 and 5). This preparation of recombinant cyprosin consisted almost exclusively of 32-kDa heavy chain under reducing conditions, but in the absence of reducing agent, the dominant band detected was at ~46 kDa. N-terminal analysis of this 46-kDa band gave the same microheterogeneous pair of sequences as was detected for the 34- and 32-kDa bands from the reduced SDS gel. In addition, however, a third sequence was detected in the non-reduced sample.
Edman degradation: VNELXDRLP
Predicted from cDNA: VNELCDRLP (Fig. 1)
This sequence is likely to have been derived from a light chain that had remained attached to the heavy chains under the non-reducing conditions, thereby indicating that the recombinant 34/32-kDa heavy chains and the light chains may be linked by disulfide bonds formed between sequences of the plant-specific insert that remain attached at the C and N termini of the heavy and light chains, respectively, after excision of the bulk of the plant-specific insert by the Pichia cells. The presence of Cys97I in the N-terminal sequence of this light chain indicated its availability for disulfide bonding.
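The comparison of the two sequences can be made explicit. The sketch below aligns the Edman read with the cDNA-predicted sequence, treats X as an unidentified cycle, and reports which predicted residue sits at each blank. The numbering offset is an assumption: the leading Val is numbered 93 within the plant-specific insert so that the blank cycle falls on Cys 97I, as the text states. Function and variable names are illustrative only.

```python
# Illustrative sketch; names and the numbering offset are assumptions, not the authors' method.

def unassigned_cycles(edman: str, predicted: str, first_residue: int):
    """Return (residue_number, predicted_residue) for each cycle read as 'X'.

    Raises ValueError if any identified residue disagrees with the prediction.
    """
    if len(edman) != len(predicted):
        raise ValueError("sequences must be the same length")
    blanks = []
    for offset, (observed, expected) in enumerate(zip(edman, predicted)):
        if observed == "X":
            blanks.append((first_residue + offset, expected))
        elif observed != expected:
            raise ValueError(f"mismatch at cycle {offset + 1}: {observed} vs {expected}")
    return blanks


if __name__ == "__main__":
    # Leading Val numbered 93 in the plant-specific insert (assumed offset).
    print(unassigned_cycles("VNELXDRLP", "VNELCDRLP", first_residue=93))  # [(97, 'C')]
```

Every identified cycle agrees with the prediction, and the single blank coincides with the cysteine proposed to take part in the interchain disulfide bond.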
No data were obtained from attempts at C-terminal sequencing of the 34-kDa heavy chain, but, for the 32-kDa heavy chain, leucine was identified as the C-terminal residue, with threonine occupying the penultimate position. This ~Thr-Leu combination occurs only once in the sequence Cys3I-Lys4I-Thr5I-Leu6I near the predicted N terminus of the plant-specific insert region (Fig. 1). Proteolytic processing after this leucine residue would thus result in Cys3I remaining within the heavy chain, which would be predicted to have an Mr of 29,000 on SDS-PAGE under reducing conditions. Treatment (of a different preparation of recombinant cyprosin which contained both 34- and 32-kDa heavy chains) with N-glycosidase F resulted in a decrease in the sizes observed for the heavy chains from 34 and 32 kDa to approximately 32 and 29 kDa, respectively (Fig. 3C, lanes 7 and 8).
FIG. 3 (legend; beginning not recovered). ... 5) and a sample of naturally occurring cyprosin (isoform 3; lanes 4 and 6) were subjected to electrophoresis under reducing (lanes 3 and 4) and non-reducing (lanes 5 and 6) conditions. C, aliquots of recombinant wild-type cyprosin before (lane 7) and after (lane 8) treatment with N-glycosidase F were analyzed under reducing conditions. Lane 9 depicts under reducing conditions a sample of the mutant procyprosin in which the plant-specific insert was deleted and the sequence between propart and mature enzyme regions was altered to ~Phe-Ala-Ala-Phe~. Lanes 1-6 and 9 were stained with Coomassie Blue after electrophoresis, whereas lanes 7 and 8 were revealed by Western blotting with an anti-cyprosin antiserum, demonstrating that the antibodies are capable of recognizing protein epitopes. Markers of Mr approximately 43,000, 29,000, 18,000, and 14,000 migrated as indicated. Since the cyprosin light chain bands were difficult to visualize (panel A), only the heavy chain regions of the gels are depicted in lanes 3-9.
The Pichia cells had thus carried out glycosylation of the heavy chains, compatible with the presence of the Asn67-Gly68-Thr69 motif (see above) and with our previous observation that naturally occurring cyprosins are glycoproteins (17). A disulfide bond between Cys3I near the C terminus of the heavy chains and Cys97I near the N terminus of the light chain would account for the increased size (46 kDa) of the recombinant cyprosin observed under non-reducing conditions (Fig. 3, lane 5). Six cysteine residues are predicted to be present within the plant-specific insert (Fig. 1) of cyprosin, and these are all conserved in the insert sequences of aspartic proteinase precursors from other plants (1-4). A putative assignment of these into disulfide-bonded pairs (3I-97I, 28I-69I, and 34I-66I) has been proposed (12) on the basis of the swaposin similarity (see Introduction).
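The size argument here is simple bookkeeping, sketched below under the assumption that apparent masses on SDS-PAGE are roughly additive for a disulfide-linked heterodimer. That is only an approximation, and the gel values themselves are estimates. The function name is hypothetical.

```python
# Illustrative sketch; assumes apparent SDS-PAGE masses add for a disulfide-linked pair.

def heterodimer_range(heavy_kda: float, light_range_kda: tuple[float, float]) -> tuple[float, float]:
    """Expected non-reduced size range for one heavy chain plus one light chain."""
    low, high = light_range_kda
    return heavy_kda + low, heavy_kda + high


if __name__ == "__main__":
    # 32-kDa heavy chain plus a light chain from the 14-18-kDa cluster.
    print(heterodimer_range(32, (14, 18)))  # (46, 50)
```

On this reckoning the ~46-kDa band seen without reducing agent sits at the lower edge of the range expected for a 32-kDa heavy chain joined to one light chain.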
All of these data can be interpreted on the basis of the scheme depicted in Fig. 4. Processing of the initial protein product translated in P. pastoris removes the prosegment and generates mature cyprosin with microheterogeneity at the N terminus of the heavy chain(s), as described above. Much, but not all, of the plant-specific insert region is processed away, but cleavage occurs in (at least) two locations near the predicted N terminus of the plant-specific insert sequence, thereby generating 34- and 32-kDa heavy chains that differ at their C termini. Additionally, processing at a number of locations toward the C terminus of the insert generates a complex mixture of light chains that remain attached by disulfide bonding to the heavy chains to generate a series of isoforms of mature cyprosin. In contrast, the heavy and light chains of a naturally occurring cyprosin (isoform 3) isolated from flowers of C. cardunculus are not held together by disulfide bonding, since this heterodimeric protein was still resolved into heavy and light chains on SDS-PAGE under non-reducing conditions (Fig. 3B, lanes 4 and 6). No N-terminal sequence was determined for the light chain of this isoform, but the sequence PMGESAVD~ was identified at the N terminus of the closely related isoform 1 (footnote 4) that had been separated as described previously (17) from the mixture of naturally occurring enzyme forms extracted from the flowers of the plant. This sequence begins at residue 103I of the plant-specific insert (Fig. 1) and is located downstream from Cys97I. Processing in planta to remove the plant-specific insert must therefore have taken place at a slightly different location from those observed in P. pastoris, i.e. downstream from Cys97I, which serves therefore to indicate the C-terminal boundary of the plant-specific insert.
No C-terminal sequence has been obtained for the heavy chain of any naturally occurring cyprosin, but the homologous cardosin A heavy chain has been reported recently (9) to have microheterogeneity at its C terminus, corresponding to cleavage at the N-terminal end of the plant-specific insert, after the residues (equivalent to) Lys238, Val240, and Met241 (Fig. 1). This also suggests that processing in planta may take place at adjacent site(s) to those observed in Pichia, i.e. upstream from the Cys3I-Lys4I-Thr5I-Leu6I sequence at the beginning of the plant-specific insert.
Processing in Pichia was only marginally affected by inclusion (at 400 nM final concentration) of the aspartic proteinase inhibitor, pepstatin, in the culture medium of the Pichia cells. This caused only a small increase in size of the 34-and 32-kDa bands to 36 and 33 kDa, respectively (Fig. 2B), but did not result in the accumulation of a protein band consistent in size with that of the unprocessed procyprosin precursor. As this report was nearing completion, Glathe et al. (26) reported the expression of the cDNA encoding the precursor of the aspartic proteinase, phytepsin, from barley (Hordeum vulgare L.) using baculovirus. In this heterologous system, the intact precursor accumulated in the medium, enabling it to be purified and its ability to be processed under defined conditions in vitro to be determined. It was shown to be capable of undergoing autoactivation to generate initially bands of 36 and 17 kDa, as monitored by SDS-PAGE under reducing conditions, followed by further processing to produce finally subunits of approximately 28 and 11 kDa, respectively. Microheterogeneity was detected at the N terminus of the 11-kDa subunit but the residues identified were (equivalent to) Tyr 92I and Val 93I (of cyprosin; Fig. 1); thus, autoprocessing had taken place finally at an identical location to that observed with recombinant procyprosin in the present study. In the prophytepsin case, no indication was given of the migration pattern of the subunits on SDS-PAGE under non-reducing conditions. However, these autoprocessing reactions of prophytepsin only occurred when the pH was below 4.5 (26). In our case, the medium of the Pichia cells was buffered well above pH 4.5 (pH 6.0), yet it was mature cyprosin and not the intact precursor that accumulated in the medium. 
Since the two plant proteinases are 78% identical, unless the initial translation product of procyprosin encounters pH values below 4.5 during its translocation through the lumen of the secretory apparatus in the Pichia cells, it would seem unlikely that the (multiple forms) of mature cyprosin that accumulated in the medium had been produced as a consequence of autoactivation of the recombinant precursor. A more likely explanation is that processing took place through the action of host cell proteinases at exposed sites that were susceptible not only to heterocatalytic attack under the conditions likely to be encountered in living cells but also to autocatalytic cleavage under the somewhat more extreme conditions that can be employed in vitro. Just as in the present case with recombinant and naturally occurring forms of cyprosin, the subunits of recombinant phytepsin differed slightly from those of the enzyme extracted from barley seeds (26).
Some precursors of mammalian/fungal aspartic proteinases require the action of extrinsic proteinases to generate the mature enzyme, e.g. prorenin, whereas others such as pepsinogen undergo autocatalytic activation (27). In the case of procathepsin D, which is the mammalian enzyme most closely related to cyprosin and phytepsin, cysteine proteinases are required for activation to occur within the lysosomes of the cell although, in vitro, limited autoprocessing occurs under acidic conditions to generate an activation intermediate (known as pseudo-cathepsin D) in which only 26 of the 44 residues of the prosegment have been removed (28). In order to achieve complete autocatalytic removal of all 44 prosegment residues in vitro, it was necessary to introduce an autoactivation sequence at the junction between the prosegment and mature enzyme region of human procathepsin D (28). A comparable alteration was introduced into the procyprosin sequence by changing the residues linking the prosegment to the N terminus of the (heavy chain of the) mature cyprosin (Fig. 1) from ~Phe40P-Gly-Gly-Ala-Leu-Arg45P-Asp1~ to ~Phe40P-Ala-Ala-Phe-Leu-Arg45P-Asp1~. Expression of this construct in Pichia resulted in the appearance of active, mature enzyme that was indistinguishable in its features and behavior from wild-type cyprosin (data not shown). In order to gain some insight into the significance of the plant-specific insert, a further adaptation was introduced into this mutant construct by deleting the nucleotides encoding the 104-residue plant-specific insert of procyprosin and replacing the five residues (~Lys238-Gly-Val-Met-Ser242~; Fig. 1) immediately preceding the plant insert by the corresponding residues (~Val238-Pro-Leu-Ile-Gln242~) from human cathepsin D, which are known from the crystal structure to form a surface loop in this region of the human enzyme (29).
Expression of this insert-deleted construct in Pichia resulted in the production of lower amounts of recombinant protein. The protein that was detected (Fig. 3C, lane 9) had a molecular weight (46,000) which corresponded to that of unactivated (insert-deleted) procyprosin and it was inactive against the synthetic peptide substrate. Thus, it seems that, at least in Pichia cells where the processes do approximate those likely to be operative in plant cells, the residues of the plant-specific insert would appear to be essential to ensure that the nascent polypeptide is folded properly and rendered capable of being activated to generate mature enzyme, albeit in slightly altered forms from the multiple isoforms that are generated within the flowers of the plant (9,17,26).
Kinetic parameters (Km, kcat) were determined for the hydrolysis of a synthetic chromogenic peptide substrate by a purified preparation of wild-type, recombinant cyprosin in which the heavy chain consisted predominantly of the 32-kDa form (Table I, A). Consistent values for Km (varying by less than 2-fold) were observed across the pH range 3.0-6.0. Similarly, the values for kcat varied by only 2-fold in absolute magnitude but did increase progressively between pH 3.0 and 4.5-5.0, decreasing again at yet higher pH values, as has been observed previously for aspartic proteinases from other species (30). Values were also measured for the hydrolysis of the same substrate at pH 5.0 by one of the naturally occurring isoforms (isoenzyme 3) of cyprosin, purified from the flowers of the cardoon plant as described previously (17). The Km value obtained (25 ± 5 μM) was little different from that measured for the recombinant form of cyprosin produced in Pichia (Table I, A). The kcat value determined for the natural isoform (29 ± 3 s⁻¹) was somewhat higher (by ~3-fold) than the corresponding value derived at pH 5.0 for the recombinant form of the enzyme. A homology-based model for plant aspartic proteinases, constructed on the basis of their similarity to mammalian/fungal aspartic proteinases for which structures have been solved by x-ray crystallography, indicated that the plant-specific insert residues are likely to be located adjacent to the active site cleft (10). Since the specificity constants (kcat/Km) measured for isoform 3 and recombinant cyprosin at pH 5.0 were 1.2 and 0.18 μM⁻¹ s⁻¹, respectively (Table I, A), it may be that the residual residues remaining after excision of (most of) the plant-specific insert by the Pichia cells have a minor influence on kcat (and hence on kcat/Km) but have relatively little effect on substrate interaction (as reflected in Km).
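For orientation, the specificity constant quoted for the natural isoform follows directly from the Km and kcat values in this paragraph, taking Km in micromolar units. The sketch below recomputes it and also evaluates the Michaelis-Menten rate law. The function names are hypothetical, the 5 nM enzyme concentration is the approximate value given under "Materials and Methods," and the substrate concentration is an arbitrary illustration rather than an experimental condition.

```python
# Illustrative sketch; function names are hypothetical.

def specificity_constant(kcat_per_s: float, km_um: float) -> float:
    """kcat/Km in uM^-1 s^-1."""
    return kcat_per_s / km_um


def initial_rate(kcat_per_s: float, km_um: float, enzyme_um: float, substrate_um: float) -> float:
    """Michaelis-Menten initial rate (uM s^-1): v = kcat * [E]t * [S] / (Km + [S])."""
    return kcat_per_s * enzyme_um * substrate_um / (km_um + substrate_um)


if __name__ == "__main__":
    # Natural isoform 3 at pH 5.0: Km = 25 uM, kcat = 29 s^-1 (values from the text).
    print(round(specificity_constant(29, 25), 2))  # 1.16, reported as 1.2
    # Ratio of the reported specificity constants (natural vs recombinant).
    print(round(1.2 / 0.18, 1))  # 6.7
    # With [S] = Km the rate is half of Vmax = kcat * [E]t (5 nM enzyme, illustrative).
    print(initial_rate(29, 25, 0.005, 25))
```

The recomputed value of about 1.2 μM⁻¹ s⁻¹ agrees with the figure reported for isoform 3.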
Comparable situations have been reported for pseudocathepsin D relative to cathepsin D (29) and from a comparison of recombinant and naturally occurring forms of plasmepsin I, an aspartic proteinase from the malaria parasite Plasmodium falciparum.⁵ The temperature dependence of the kinetic parameters for chromogenic substrate hydrolysis by the recombinant cyprosin was also determined. The specificity constant (kcat/Km) increased progressively in magnitude, reaching a maximum value at 55°C, but activity was still readily detectable at 65°C (Table I, B). The stability of the recombinant proteinase at 55°C was measured at pH 5.0, 6.0, and 7.5, and half-lives of 40, 40, and <2 min were determined, respectively. Recombinant cyprosin thus displays a remarkable stability at temperatures up to 55°C and at pH values as high as 6.0.
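As a rough illustration of the arithmetic behind these parameters, the following minimal Python sketch recomputes the specificity constant of the natural isoform from the kcat and Km values quoted in the text (taking Km in μM, the unit implied by the reported kcat/Km) and converts a half-life into a first-order inactivation rate constant. The function names are hypothetical, and the assumption that thermal inactivation follows simple first-order kinetics is made here for illustration only; it is not stated in the text.

```python
import math


def specificity_constant(kcat_per_s, km_um):
    """Return kcat/Km in uM^-1 s^-1, given kcat in s^-1 and Km in uM."""
    if km_um <= 0:
        raise ValueError("Km must be positive")
    return kcat_per_s / km_um


def inactivation_rate_constant(t_half_min):
    """First-order rate constant (min^-1) from a half-life in minutes.

    Assumes single-exponential (first-order) loss of activity,
    which is an illustrative assumption, not a result from the text.
    """
    if t_half_min <= 0:
        raise ValueError("half-life must be positive")
    return math.log(2) / t_half_min


def fraction_remaining(t_min, t_half_min):
    """Fraction of activity left after t_min minutes under first-order decay."""
    return 0.5 ** (t_min / t_half_min)


if __name__ == "__main__":
    # Natural isoform 3 at pH 5.0: kcat = 29 s^-1, Km = 25 uM (values from the text)
    print(round(specificity_constant(29.0, 25.0), 2))
    # Half-life of 40 min at 55 degrees C (pH 5.0 or 6.0, from the text)
    print(round(inactivation_rate_constant(40.0), 4))
    print(fraction_remaining(80.0, 40.0))
```

With the quoted values, kcat/Km for isoform 3 comes out at 1.16 μM⁻¹ s⁻¹, in line with the reported 1.2 μM⁻¹ s⁻¹, and a 40-min half-life corresponds to a rate constant of about 0.017 min⁻¹ under the first-order assumption.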
The recombinant plant cyprosin was not affected (IC50 >> 2,000 nM) by a plant-derived protein isolated from potatoes, which is a potent inhibitor (Ki ~4 nM) of cathepsin D (32). Similarly, the naturally occurring protein inhibitors of pepsin/cathepsin E and yeast proteinase A from the parasitic worm A. lumbricoides (33) and S. cerevisiae (34), respectively, did not inhibit the recombinant cyprosin to any significant extent (IC50 >> 2,000 nM).
In contrast, the naturally occurring peptide isovaleryl-pepstatin, which contains a central statine moiety as the transition state analogue occupying the P1-P1′ positions in the acylated pentapeptide (35), showed subnanomolar potency as an inhibitor of recombinant cyprosin (inhibitor 1 in Table II). Whereas this response might perhaps have been expected as typical of an aspartic proteinase (30), a completely distinct acylated pentapeptide (inhibitor 2; Table II) was almost as effective as an inhibitor of cyprosin, despite having only the central statine residue in common with the sequence of isovaleryl-pepstatin. Replacement of statine (which has a leucine side chain in P1) by its cyclohexylalanine analogue (ACHPA, inhibitor 3) resulted in a 5-fold reduction in potency (compare inhibitors 3 and 2 in Table II). In longer inhibitors spanning nine subsites (Table III), replacement of statine by ACHPA also resulted in a reduction of potency (by 2.5-fold; compare inhibitors 8 and 6 in Table III); replacement by the variant containing a phenylalanine substituent in P1 (AHPPA, inhibitor 7; Table III) produced a compound with comparable inhibitory potency to that of the statine-containing inhibitor 6. Consequently, statine was retained as the centerpiece, occupying the P1-P1′ positions of inhibitors, and the effect of substitutions in other positions was examined systematically. In the P6 position, truncation of the (CH3)2CH·CH2·CO~ (isovaleryl) substituent by removal of a methylene carbon (generating (CH3)2CH·CO~ = isobutyryl) resulted in an ~3-fold improvement in Ki value toward recombinant cyprosin (cf. inhibitors 9 and 6 in Table III). Replacement of the methylene carbon atom with an oxygen (in (CH3)3C·O·CO~ = t-butoxycarbonyl) caused an additional 6-fold improvement in potency, resulting in an inhibitor with a potency of 1 nM (cf. inhibitors 10 and 6 in Table III).
It may be that the oxygen atom in the -O-CO~ arrangement of inhibitor 10 can enter into a hydrogen bonding arrangement with an H-bond donor in the enzyme that is not possible with the -CH2-CO~ equivalent in inhibitor 6.
At the P3 position, the replacement of phenylalanine by homophenylalanine (which has an additional -CH2- in its side chain) caused a significant improvement (10-fold) in Ki (cf. inhibitors 12 and 6 in Table III). Replacement of Phe with 2-naphthylalanine (inhibitor 13) had only a minor (3-fold) effect on potency, whereas a 1-naphthylalanine substituent in the P3 position resulted in an inhibitor (number 14) whose potency toward cyprosin was almost equivalent to that measured for pepstatin (Table II). In contrast to this increase in size upon replacing the benzene ring of phenylalanine with the 1-naphthylalanine substituent, removal of the benzene ring altogether, resulting in an alanine substituent in P3 (compare 11 and 10 in Table III), caused a reduction in potency of more than 3 orders of magnitude.
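Fold changes in Ki such as those described above can be placed on an energy scale through the standard thermodynamic relation ΔΔG = RT ln(Ki,variant / Ki,reference). The sketch below is offered only as an interpretive aid; the function name is hypothetical and the default temperature (25°C) is an assumption, since the temperature of the inhibition assays is not given in this section.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def ddg_from_fold_change(fold_change, temp_k=298.15):
    """Binding free-energy difference (kcal/mol) for a fold change in Ki.

    fold_change = Ki(variant) / Ki(reference); values > 1 mean weaker binding.
    temp_k defaults to 25 degrees C, an assumption rather than a value from the text.
    """
    if fold_change <= 0:
        raise ValueError("fold_change must be positive")
    return R_KCAL_PER_MOL_K * temp_k * math.log(fold_change)


if __name__ == "__main__":
    # 10-fold change (e.g. Phe -> homophenylalanine in P3)
    print(round(ddg_from_fold_change(10.0), 2))
    # 3 orders of magnitude (e.g. Phe -> Ala in P3)
    print(round(ddg_from_fold_change(1000.0), 2))
```

On this scale, a 10-fold change in Ki corresponds to about 1.4 kcal/mol at 25°C, and a change of 3 orders of magnitude to about 4.1 kcal/mol.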
In the P2 position, substitution of His by the more hydrophobic side chain of Tyr resulted in increased inhibitor potency toward cyprosin by about an order of magnitude (cf. inhibitors 15 and 9; Table III). Replacement of the imidazole side chain of His with the CH3 substituent of Ala generated an inhibitor (number 16) with subnanomolar potency comparable to that of pepstatin (Table II), and substitution with the longer CH3-CH2~ side chain of α-aminobutyric acid (in inhibitor 17) resulted in the best inhibitor of recombinant cyprosin yet (Table III). Indeed, the interaction of this compound with the enzyme was so tight that problems of mutual depletion were encountered, so that an accurate Ki value could not be measured using the methodology employed (see "Materials and Methods").
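The mutual depletion problem can be illustrated with the Morrison equation for tight-binding inhibition, which accounts for the inhibitor consumed by binding to the enzyme. The following is a minimal sketch and not the procedure described under "Materials and Methods"; the function name and the enzyme and inhibitor concentrations are hypothetical, chosen only to show why Ki becomes hard to determine once it falls well below the enzyme concentration.

```python
import math


def fractional_activity(e_total, i_total, ki_app):
    """Morrison equation: residual activity v_i/v_0 for a tight-binding inhibitor.

    All three arguments must share the same concentration unit (e.g. nM).
    ki_app is the apparent Ki under the assay conditions.
    """
    if e_total <= 0:
        raise ValueError("enzyme concentration must be positive")
    s = e_total + i_total + ki_app
    # Concentration of enzyme-inhibitor complex (smaller root of the quadratic)
    bound = (s - math.sqrt(s * s - 4.0 * e_total * i_total)) / 2.0
    return 1.0 - bound / e_total


if __name__ == "__main__":
    # Hypothetical titration regime: 1 nM enzyme, 0.5 nM inhibitor, Ki << [E]
    for ki in (0.001, 0.01):
        print(ki, round(fractional_activity(1.0, 0.5, ki), 3))
    # Hypothetical classical regime: [E] << Ki, activity depends strongly on Ki
    for ki in (0.5, 5.0):
        print(ki, round(fractional_activity(0.001, 0.5, ki), 3))
```

With 1 nM enzyme and 0.5 nM inhibitor (hypothetical values), changing Ki from 0.001 to 0.01 nM shifts the predicted residual activity by less than 0.01 on a 0 to 1 scale, so data collected under such conditions cannot discriminate between these values, whereas the same 10-fold change is easily resolved when the enzyme concentration is far below Ki.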
In the P2′ position, substitution of Leu by its β-branched isomer, Ile, resulted in a 5-fold improvement in inhibitor potency (cf. 18 and 6 in Tables II and III, it may be concluded that the compound ethoxycarbonyl-1-naphthylalanine-α-amino-n-butyric acid-statine-Ile-Phe-NH2 would be likely to be an extremely potent, if somewhat water-insoluble, inhibitor of recombinant cyprosin. It will be apparent then that the active site cleft of the plant enzyme is very hydrophobic. In this regard, however, it has long been held that, among the aspartic proteinases, cathepsin D has one of the most hydrophobic active site clefts in its specificity requirements (36). For the sake of comparison, then, the Ki values that we have reported previously (37) for the interaction of this human enzyme with the same set of inhibitors are included in Tables II and III. From these data, it is apparent that each inhibitor binds substantially tighter (varying from ~5- to 800-fold) to recombinant cyprosin than to human cathepsin D, with the exception of pepstatin and inhibitor 11, which contains an Ala in the P3 position. The dislike of cyprosin for a small hydrophobic substituent in P3 reflects the situation reported previously (37) for human renin, gastricsin, and cathepsins D and E. In contrast, human pepsin exhibits a striking preference for Ala rather than Phe as a P3 substituent (37). The inhibitors described in Tables II and III were synthesized originally as potential inhibitors of human renin, which is one of the most specific proteinases, if not the most specific, so far described. It acts solely on one protein substrate, angiotensinogen, to generate angiotensin I, and so there have been substantial efforts made to design specific renin inhibitors, such as those in Tables II and III, as potential anti-hypertensive agents.
It seems all the more remarkable then that inhibitors designed on the basis of the sequence of residues in plasma angiotensinogen recognized by this highly specific enzyme, which is involved in the mammalian cardiovascular system, should prove to be so effective against a proteinase from the flowers of the cardoon plant.
Following this rationale, with the advent of AIDS, several synthetic compounds have also been developed recently as inhibitors of the aspartic proteinase from human immunodeficiency virus (8). These have been shown to have differing potencies toward the human aspartic proteinases (38-41). Consequently, these were also examined for their effectiveness toward recombinant cyprosin. HBY-793 (compound 21), which is a symmetrical compound containing a central dihydroxyethylene moiety (38), was found to act as an inhibitor of cyprosin (Table IV). It is, however, much less effective toward the plant enzyme than toward HIV proteinase or some of the human enzymes, for which subnanomolar Ki values were obtained. Ritonavir (compound 22) contains a hydroxyethylene transition state analogue and has been shown to have Ki values of 20 and 8 nM, respectively, for human cathepsin D and cathepsin E (39), and so is not totally specific for HIV proteinase either. This trend was reflected with recombinant cyprosin, against which Ritonavir was a weak (Ki = 110 nM) inhibitor. In total contrast, Saquinavir (inhibitor 23 in Table IV) has been determined previously (40) to be completely specific for the aspartic proteinases from HIV-1, HIV-2, and SIV and to have no measurable effect on any other aspartic proteinase, including those from other retroviruses (41). This hydroxyethylamine-containing inhibitor was found to inhibit recombinant cyprosin, albeit weakly, at 140 nM (Table IV). In contrast, Indinavir (compound 24 in Table IV), which also contains a hydroxyethylamine transition state analogue, did not inhibit cyprosin to any significant extent. From these data, it is evident that the active site cleft of cyprosin can readily accommodate chemically synthesized inhibitors of mammalian and retroviral aspartic proteinases, although the plant enzyme is not susceptible to protein inhibitors, including one that is itself of plant origin (potato).
Thus, it would seem that the most distinctive feature of the plant aspartic proteinase is the plant-specific insert. This conserved feature has been hypothesized to direct the newly translated plant polypeptides into appropriate cytomorphological compartments within the cell. However, in the present study, recombinant cyprosin was secreted by the yeast cells into the medium, so that it did not appear to contain appropriate intracellular targeting signals that were functional in the Pichia cells. Thus, in order to gain further insight into the significance of the plant-specific insert, it will be necessary to express mutants encoding diminishing lengths of the insert within plant cells.
Nevertheless, from the present study, the plant-specific insert appears to be necessary for the production of the recombinant precursor in such a form that it can be processed to produce the mature enzyme.