Cloning, isolation, and characterization of mammalian legumain, an asparaginyl endopeptidase.

Legumain is a cysteine endopeptidase that shows strict specificity for hydrolysis of asparaginyl bonds. The enzyme belongs to peptidase family C13, and is thus unrelated to the better known cysteine peptidases of the papain family, C1 (Rawlings, N. D., and Barrett, A. J. (1994) Methods Enzymol. 244, 461-486). To date, legumain has been described only from plants and a blood fluke, Schistosoma mansoni. We now show that legumain is present in mammals. We have cloned and sequenced human legumain and part of pig legumain. We have also purified legumain to homogeneity (2200-fold, 8% yield) from pig kidney. The mammalian sequences are clearly homologous with legumains from non-mammalian species. Pig legumain is a glycoprotein of about 34 kDa, decreasing to 31 kDa on deglycosylation. It is an asparaginyl endopeptidase, hydrolyzing Z-Ala-Ala-Asn-7-(4-methyl)coumarylamide and benzoyl-Asn-p-nitroanilide. Maximal activity is seen at pH 5.8 under normal assay conditions, and the enzyme is irreversibly denatured at pH 7 and above. Mammalian legumain is a cysteine endopeptidase, inhibited by iodoacetamide and maleimides, but unaffected by compound E64 (trans-epoxysuccinyl-L-leucylamido-(4-guanidino)butane). It is inhibited by ovocystatin (cystatin from chicken egg white) and human cystatin C with Ki values < 5 nM. We discuss the significance of the discovery of a cysteine endopeptidase of a new family and distinctive specificity in man and other mammals.

Cysteine peptidases form one of the major groups of proteolytic enzymes, and can be divided into about 30 separate families on the basis of their molecular structures (reviewed in Refs. 1 and 2). Three families of cysteine endopeptidases have been known to be represented in mammals. The most numerous are those of the papain family (C1), which include cathepsins B, H, L, S, and others. These are predominantly lysosomal enzymes, responsible for proteolysis in the lysosomal/endosomal system, and are also secreted to act extracellularly. In the cytosolic fraction of the cell, there are members of the other two families of cysteine endopeptidases: the families of calpain (family C2) and caspase (previously interleukin 1β-converting enzyme; C14). These peptidases mediate limited proteolysis of cytosolic substrates. We now report that the legumain family (C13) can be added to the list of mammalian cysteine endopeptidases.
Legumain is the name that was given by Kembhavi et al. (3) to an endopeptidase that is present in many leguminous and other seeds, after they had isolated and characterized the enzyme from Vigna aconitifolia (moth bean). Legumain is specific for the hydrolysis of asparaginyl bonds. The amino acid sequence of legumain from Ricinus communis (castor bean) showed it to be homologous with an enzyme from the fluke Schistosoma mansoni (4). At that time, the fluke enzyme was of unknown specificity, but it has now been shown also to be an asparaginyl endopeptidase (5), active on the test substrate that had been introduced by Kembhavi et al.
The appearance of sequences homologous to legumain among the human expressed sequence tags (ESTs) in the databases alerted us to the presence of the enzyme in vertebrates, and accordingly, we made assays for asparaginyl endopeptidase activity in mammalian tissues. It was soon evident that legumain is present in human and other mammalian cells. We have now cloned and sequenced human legumain from a placenta library, and isolated and characterized pig legumain from kidney. We discuss the implications of the presence of an asparaginyl endopeptidase in mammalian cells.

EXPERIMENTAL PROCEDURES
Materials-Z-Ala-Ala-Asn-NHMec prepared as described (3) was standardized by complete hydrolysis with legumain. Bz-Asn-NHPhNO2 was supplied by Bachem. Sodium citrate buffer solutions were prepared by mixing equimolar solutions of citric acid and trisodium citrate, to the desired pH value. Sodium citrate/phosphate buffers in the range pH 3.0-7.5 were as described by McIlvaine (see Ref. 6).
Ovocystatin (cystatin from chicken egg white) was prepared as described (7), and recombinant human cystatin C was the kind gift of Dr. Magnus Abrahamson (Department of Clinical Chemistry, University of Lund, Sweden). Both cystatins were titrated with a solution of papain (Sigma, 2× crystallized) that had been standardized with E64 (8,9), and the concentrations quoted in the text relate to active inhibitor. Azocasein was prepared as described by Barrett and Kirschke (10). For the preparation of the His-tagged, recombinant C-fragment of tetanus toxoid, the construct of Makoff et al. (44) was modified to include a histidine tag. A normal, full-term human placenta was kindly made available by the Rosie Maternity Hospital, Cambridge, and kidneys from freshly killed pigs were purchased from a local abattoir. Rats were Porton Wistar and rabbits New Zealand Whites.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EBI Data Bank with accession number(s) Y09862.
§ To whom correspondence should be addressed. Tel.: 44-1223-832312; Fax: 44-1223-837952; E-mail: alan.barrett@bbsrc.ac.uk.
Primer Design and PCR-based Cloning of Human and Pig Legumain-One forward PCR primer (Hsl1f, 5′-CTCCAGGAATTCTGATCAACAGGC-3′) and three reverse primers (Hsl2r, 5′-GTCCGAAGCTTCCATCCAGTTGACG-3′; Hsl3r, 5′-GCCAGTTGAAGCTTTGGGTCCGGAAG-3′; Hsl4r, 5′-GTCTCTGATCAGCACACAGTCGG-3′) were synthesized with Perceptive Biosystems Expedite 8909 equipment. The sequences of the primers were based on those of human ESTs identified as homologous to legumains of plants or Schistosoma. Accession numbers of representative ESTs are F01300 for Hsl1f, R17110 for Hsl2r, R36331 for Hsl3r, and R25616 for Hsl4r. The locations of the primers in the sequence of legumain are shown in Fig. 1; the sequence of Hsl4r is located in the 3′-untranslated region of human legumain. An EcoRI restriction site was built into the primer Hsl1f, and HindIII sites into Hsl2r and Hsl3r for use in cloning; these are underlined in the primer sequences. PCR reactions were in 50 μl for 30 cycles (94°C, 1 min; 48°C, 1 min; and 72°C, 2 min). The 3′ proofreading polymerase Pfu (Ref. 11; Stratagene, La Jolla, CA) was used for all amplification reactions. Combinations of the Hsl1f forward primer (500 nM) with each of the three reverse primers (500 nM) were used with human placenta 5′-Stretch Plus cDNA library in λgt11 (Clontech, Palo Alto, CA) as the template. The PCR products were separated in 1% agarose gels and purified with a Geneclean kit (Bio 101, La Jolla, CA). The cDNA fragments were then digested with EcoRI and HindIII, and cloned into pSP65 (Promega, Madison, WI). The nucleotide sequences of the various clones were determined for both strands either by the dideoxy chain-termination method (12), or by Taq DyeDeoxy terminator cycle sequencing reactions in conjunction with an Applied Biosystems model 373 DNA Sequencer.
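The restriction sites built into the primers can be located with a short script. The following is a minimal sketch, assuming the primer sequences read as printed once the line-break hyphens are removed, and using the standard recognition sequences GAATTC (EcoRI) and AAGCTT (HindIII); the names PRIMERS, SITES, find_sites, and site_map are illustrative and do not come from the original work.

```python
# Primer sequences as given in the text (line-break hyphens removed).
PRIMERS = {
    "Hsl1f": "CTCCAGGAATTCTGATCAACAGGC",
    "Hsl2r": "GTCCGAAGCTTCCATCCAGTTGACG",
    "Hsl3r": "GCCAGTTGAAGCTTTGGGTCCGGAAG",
    "Hsl4r": "GTCTCTGATCAGCACACAGTCGG",
}

# Standard recognition sequences of the two enzymes named in the text.
SITES = {"EcoRI": "GAATTC", "HindIII": "AAGCTT"}


def find_sites(sequence):
    """Return {enzyme: 0-based start index} for each site found in sequence."""
    found = {}
    for enzyme, site in SITES.items():
        index = sequence.find(site)
        if index != -1:
            found[enzyme] = index
    return found


# Map every primer to the restriction sites it carries.
site_map = {name: find_sites(seq) for name, seq in PRIMERS.items()}

for name, hits in site_map.items():
    print(name, hits)
```

Under these assumptions the script reports an EcoRI site only in Hsl1f, HindIII sites only in Hsl2r and Hsl3r, and neither site in Hsl4r, in agreement with the description in the text.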
To obtain the clone containing the N terminus of legumain, a human placenta Marathon-Ready™ cDNA (Clontech) was used as the template for 5′-rapid amplification of cDNA ends (13). The forward adaptor primer (AP1) and the reverse, nested gene-specific primer (Hsl5r, 5′-CACGTGATCCTGGGGGCCACTC-3′, based on EST R17110) were used for amplification reactions as recommended by the manufacturer. A single product of approximately 500 bp was obtained. The cDNA fragment representing the 5′ end of the legumain sequence was cloned unidirectionally into pSP65 at AvaI and BamHI sites. Use of the nested AP2 primer in conjunction with the reverse Hsl4r primer gave a full-length legumain cDNA, which was sequenced.
Partial cloning and sequencing of pig legumain cDNA was achieved by a similar PCR strategy. Two pig kidney cDNA libraries (gifts of Dr. Jerzy Adamski, Max-Planck-Institut für Experimentelle Endokrinologie, Hannover, Germany, and Dr. Claudia T. Evans, University of Texas Medical Center, Dallas, TX) were used as templates for the amplification reactions. A 936-bp cDNA fragment was obtained when primers Hsl1f and Hsl3r were used.
SDS-Polyacrylamide Gel Electrophoresis-Except when stated, SDS-polyacrylamide gel electrophoresis (SDS-PAGE) was in gels of 10% T, 2.6% C containing the Ammediol buffer system (14). Staining for protein was with Coomassie Brilliant Blue R, and glycoproteins were detected with the DIG glycan detection kit (Boehringer Mannheim). Transferrin and thimet oligopeptidase served as positive and negative controls, respectively. In some experiments, samples were run in a Tris/Tricine-buffered, 10-20% polyacrylamide gradient Ready gel (Bio-Rad).
Fluorometric Assay for Legumain-Continuous fluorometric assays with Z-Ala-Ala-Asn-NHMec were as described (3), with slight modifications. The substrate (10 μM) was incubated at 30°C in 2.50 ml of 39.5 mM citric acid, 121 mM Na2HPO4, pH 5.8, containing 1 mM DTT, 1 mM EDTA, and 0.1% CHAPS (assay buffer). The rate of formation of product was followed in a Perkin-Elmer fluorometer under the control of an IBM-compatible computer running the FLUSYS software (15). The excitation and emission wavelengths were 360 and 460 nm, respectively, and 1 unit of activity was defined as that releasing 1 μmol of product/min under the standard conditions.
For the assay of activity in a tissue sample, the tissue was homogenized in four volumes (w/v) of 0.1 M sodium citrate buffer, pH 6.0, containing 1 mM EDTA and 2 mM 2-mercaptoethanol. Five complete strokes were used in a Braun-Potter homogenizer running at 1000 rpm. The homogenate was centrifuged (11,600 × g), and the supernatant was used in the enzyme assay as described above. Once a steady reaction rate had been determined (about 5 min), E64 was added to 10 μM. Two min later, ovocystatin was added to 50 nM, and the reaction was followed for an additional 5 min. Values for specific enzymic activity were calculated by use of protein concentrations determined in the Bradford assay (16), with bovine serum albumin as standard.
Rate constants for irreversible inactivation were found by nonlinear regression analysis of the pseudo-first-order curves of inactivation by use of the FLUSYS software, giving kobs. The second-order rate constant, k2, was calculated as kobs/[I]. Since the assay substrate was used at [S] ≪ Km, no correction for competition with the inhibitors was required.
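The calculation of k2 from a pseudo-first-order inactivation curve can be illustrated as follows. This is a minimal sketch, not the FLUSYS procedure: it swaps the nonlinear regression for a simpler least-squares fit of ln(activity) against time, and it uses synthetic data generated from an assumed kobs. The function names, the time points, and the value of true_k_obs are hypothetical; only the relation k2 = kobs/[I] and the 20 μM inhibitor concentration (that used for N-ethylmaleimide) are taken from the text.

```python
import math


def fit_k_obs(times_s, activities):
    """Estimate k_obs (s^-1) as minus the least-squares slope of ln(activity) vs time."""
    logs = [math.log(a) for a in activities]
    n = len(times_s)
    mean_t = sum(times_s) / n
    mean_y = sum(logs) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_s, logs))
    sxx = sum((t - mean_t) ** 2 for t in times_s)
    return -sxy / sxx


def second_order_rate_constant(k_obs, inhibitor_molar):
    """k2 = k_obs / [I]; valid without correction when [S] << Km."""
    return k_obs / inhibitor_molar


# Synthetic, noise-free data from an assumed k_obs (hypothetical value).
true_k_obs = 2.0e-3                      # s^-1, illustration only
times = [0, 60, 120, 180, 240, 300]      # s, hypothetical sampling times
activities = [100.0 * math.exp(-true_k_obs * t) for t in times]

k_obs = fit_k_obs(times, activities)
k2 = second_order_rate_constant(k_obs, 20e-6)   # 20 uM inhibitor
print(k_obs, k2)
```

With real progress curves, which carry noise and a non-zero end point, a direct nonlinear fit of the exponential as used in the original work would normally be preferable to the log-linear shortcut shown here.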
Purification of Pig Legumain-A temperature of 0-4°C was maintained throughout the procedure. All buffers contained 1 mM EDTA and 2 mM mercaptoethanol, and centrifugation was at 15,000 × g for 20 min, unless otherwise stated.
Cortical tissue (250 g) was dissected from pig kidney and homogenized with an equal weight of water for 15 s in a Waring blender. The crude homogenate was stored at −20°C, or used immediately. The crude homogenate was mixed with an equal volume of 0.1 M sodium citrate, pH 6.0, containing 0.4 M ammonium sulfate. The homogenate was centrifuged, and the pellet was discarded. The supernatant was adjusted from 0.2 to 1.2 M ammonium sulfate by addition of the solid, stirred for 30 min, and centrifuged. The pellet was discarded and the supernatant made 3.2 M ammonium sulfate. After 30 min, the pellet containing the enzyme was collected by centrifugation.
The pellet was resuspended in water, adjusted to pH 5.0 with 0.2 M citric acid, and dialyzed overnight against several changes of 10 mM sodium citrate, pH 5.0. During the dialysis, a heavy, brown precipitate formed, which contained the legumain. The precipitate was collected by centrifugation, dispersed in 50 mM sodium citrate, pH 5.0, by use of an Ultra Turrax homogenizer (Janke & Kunkel KG, Ika Werk, Staufen i. Breisgau, Germany), and collected by centrifugation. The pellet was washed again, but this time in 0.1 M sodium citrate, pH 5.0, containing 10% (v/v) ethanediol. Legumain was then eluted from the pellet by resuspending it in 0.2 M sodium citrate, pH 6.0, containing 10% (v/v) ethanediol. Centrifugation (100,000 × g) gave a supernatant containing activity, which was dialyzed into 10 mM sodium citrate, pH 5.0, and centrifuged once more at 100,000 × g. The supernatant was adjusted to pH 5.5 with 0.1 M trisodium citrate, and applied to a column (20-ml bed volume) of SP-Sepharose FF (Pharmacia) that had been pre-equilibrated with 50 mM sodium citrate, pH 5.5. The column was washed with two bed volumes of the equilibrating buffer before the enzyme was eluted with 0.4 M NaCl in the same buffer. Effluent fractions containing activity were combined and dialyzed into 10 mM sodium citrate, pH 5.0.
The solution was passed through 0.45- and 0.22-μm nitrocellulose filters in preparation for running on the Pharmacia Mono S FPLC column (type HR 10/10). A column (8-ml bed volume) of Thiopropyl-Sepharose 6B (Pharmacia) was activated with 2-pyridyl disulfide as described by the manufacturers. All buffers for use on the Mono S and Thiopropyl-Sepharose columns were mercaptoethanol-free, and were deoxygenated by sparging with nitrogen gas. The Mono S column was equilibrated with 50 mM sodium citrate, pH 5.5. The two columns were connected in tandem. The sample was first run on to the Mono S column, which was then washed with three bed volumes of 50 mM sodium citrate, pH 5.5, 1 mM EDTA at a flow rate of 3 ml/min. When the A280 of the effluent had fallen nearly to zero, activity was eluted with a step of 0.4 M NaCl in 50 mM sodium citrate buffer, pH 5.5. As soon as protein was detected in the effluent, flow was redirected to the Thiopropyl-Sepharose column, and the flow rate was decreased to 0.25 ml/min. Once the entire peak of protein from the Mono S column had run through the Thiopropyl-Sepharose column, the column was washed with five bed volumes of 50 mM sodium citrate, 0.2 M NaCl, 1 mM EDTA, pH 5.5, before being filled with the same buffer containing 10 mM cysteine, 0.1% CHAPS, pH 5.5. The column was allowed to stand overnight at 4°C, and elution with the cysteine-containing buffer was resumed. Fractions containing activity were combined as the final product, which was stored at 4°C or −20°C in the eluting buffer.
Deglycosylation with N-Glycosidase F-Legumain (15 μg in 0.25 ml) was dialyzed into 0.1 M citric acid, 0.2 M Na2HPO4, 1 mM EDTA, 0.025% CHAPS, pH 7.2, and heated at 100°C for 5 min. N-Glycosidase F (Boehringer Mannheim) was then introduced (0.125 milliunit), and the mixture was incubated at 37°C for 24 h. The protein was precipitated from 10% trichloroacetic acid and taken up for SDS-PAGE.
Partial Amino Acid Sequencing-Legumain (90 μg) was digested with 50 μg of CNBr in 0.2 ml of 70% formic acid for 24 h at 25°C. The mixture was diluted 10-fold with water and freeze-dried. The sample was then resuspended and run in Tris/Tricine-buffered SDS-PAGE, and the separated peptides were transblotted on to polyvinylidene difluoride membrane for N-terminal microsequence analysis.
Active Site Titration of Legumain-Legumain was titrated with ovocystatin, under tight-binding conditions. A set of assay tubes was prepared, each containing assay buffer (0.85 ml), legumain (1.0 milliunit/ml, 50 μl), and ovocystatin solution (0-50 μl, step 5 μl, of 3.0 μM, in 50 μl total). The tubes were equilibrated to 30°C for 5 min, and assays were started by adding 25 μl of 20 mM Bz-Asn-NHPhNO2 solution in Me2SO. After 5 min, the reaction was stopped by addition of 1.0 ml of 0.1 M sodium chloroacetate in 0.1 M Tris/HCl, pH 8.0. Activity was expressed as increase in A410 relative to a blank without enzyme. A linear plot of activity against ovocystatin concentration allowed the calculation of the molarity of the legumain solution, on the assumption of stoichiometric inhibition.
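The extrapolation used in such a titration can be sketched in a few lines. The example below is an illustration under stated assumptions, not the authors' calculation: it fits a straight line to residual activity against inhibitor concentration and takes the intercept on the concentration axis as the concentration of active sites, which is valid only if inhibition is stoichiometric (1:1) and binding is tight. The function name and all data values are hypothetical.

```python
def titration_endpoint(inhibitor_conc, activity):
    """Fit activity = slope * [I] + intercept by least squares and return the
    [I] at which the fitted line reaches zero activity (the x-intercept)."""
    n = len(inhibitor_conc)
    mean_x = sum(inhibitor_conc) / n
    mean_y = sum(activity) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(inhibitor_conc, activity))
    sxx = sum((x - mean_x) ** 2 for x in inhibitor_conc)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -intercept / slope


# Hypothetical titration: inhibitor in nM, activity as a fraction of the
# uninhibited rate. Only points on the linear part of the plot are included.
inhibitor_nM = [0.0, 20.0, 40.0, 60.0, 80.0]
relative_activity = [1.0, 0.8, 0.6, 0.4, 0.2]

active_sites_nM = titration_endpoint(inhibitor_nM, relative_activity)
print(active_sites_nM)
```

The intercept gives the active-site concentration in the assay tube; multiplying by the dilution factor of the enzyme in the tube would give the molarity of the stock solution.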

RESULTS
Cloning and Sequencing of Human and Pig Legumain-Human legumain cDNA clones, containing either the 5′ or 3′ ends with overlapping regions and the full-length cDNA, were obtained by PCR amplification from human placenta cDNA libraries, and sequenced as described under "Experimental Procedures." The nucleotide sequence of the full-length cDNA ( A clone containing part of pig legumain cDNA was also obtained, by similar PCR-based cloning methods, when pig kidney cDNA libraries were used as templates. The pig legumain clone consists of 884 nucleotides encoding 294 amino acid residues. The deduced amino acid sequence of the pig legumain cDNA corresponds to the region Asn91-Leu384 in human legumain (Fig. 2). Comparison by use of BESTFIT (18) reveals 83% identity at the nucleotide level and 84% at the amino acid level.
Analysis of the N terminus of the deduced amino acid sequence of human legumain with the SIGCLEAVE program of EGCG (19) indicates the presence of a signal peptide, predicted to be cleaved at Ala17-Val18 (Fig. 1).
There are four potential glycosylation sites in the deduced amino acid sequences of human and pig legumains, at Asn91, Asn167, Asn263, and Asn272 (in the numbering of Fig. 2).
An Arg-Gly-Asp (RGD) motif appears in the sequences of the mammalian legumains, and KGD in those of the flukes (Fig. 2, residues 118-120). RGD sequences in cell-adhesive proteins such as fibronectin (20, 21) and the disintegrins (22) are responsible for the binding of the proteins to their cell-surface receptors, the integrins, but the significance of the RGD sequence in legumain has yet to be determined.
Detection of Legumain Activity in Mammalian Tissues-Legumain is unusual among mammalian proteolytic enzymes in being unaffected by E64 but inhibited by cystatin, and these characteristics were used to confirm the specificity of the assays. E64 is a potent inhibitor of the cysteine peptidases of family C1 such as cathepsins B, H, and L (9), but 10 μM E64 had no significant effect on the rates of hydrolysis of Z-Ala-Ala-Asn-NHMec by tissue extracts. In all cases, however, the activity was totally inhibited by 50 nM ovocystatin. It was noted that with these crude samples, the full activity of legumain was expressed even in the absence of exogenous thiol activator.
The results of assays with a variety of mammalian tissues are shown in Table I. Activity was greatest in kidney, among the rat, pig, and rabbit tissues examined. A low level of activity (0.29 milliunit/mg) was unambiguously detected in human placenta (Fig. 3), but none was detected in blood leukocytes.
Purification of Legumain from Pig Kidney-Pig kidney was selected as the source for purification of legumain, and the results of a typical preparation are summarized in Table II. We found it necessary to devise an unconventional procedure for the purification of the enzyme, because legumain is stable only in the range pH 3-6 (see below), and at low salt concentrations it tends to be adsorbed to any solid material present. When the 1.2-3.2 M ammonium sulfate fraction (stage B) was dialyzed at pH 5.0, a heavy precipitate formed, to which the legumain was adsorbed. The enzyme eluted by raising the salt concentration at pH 6.0 showed a large increase in specific activity (stage C). The enzyme was also bound exceptionally tightly to SP-Sepharose at pH 5.5, requiring 0.4 M NaCl for elution. A large purification factor was achieved in this step (stage D). The enzyme was then run on Mono S in FPLC to separate it from low Mr thiols, and bound to Thiopropyl-Sepharose activated with 2-pyridyl disulfide. After overnight exposure to 10 mM cysteine, it was eluted in satisfactory yield. The overall purification factor was about 2200-fold, with a recovery of 8% (stage E).
FIG. 2. Alignment of amino acid sequences of legumains. Sequences are numbered according to human preprolegumain, and residues identical to those in human legumain are shown in white on black. Key to sequences: a, human preprolegumain; b, pig legumain (from protein sequencing); c, pig legumain (from cDNA); d, S. mansoni prepro-"hemoglobinase" (EMBL accessions M17423 and M21308); and e, C. ensiformis preprolegumain (D31787). The alignment was constructed by use of the PILEUP program.
The final product ran as a single, somewhat diffuse band in SDS-PAGE, at about 34 kDa (Fig. 4). The enzyme was stable to storage at pH 5.8, showing little loss of activity over several months. This material was used for all further characterization of the enzyme.
Partial Amino Acid Sequences of Pig Legumain-N-terminal and internal amino acid sequence data were obtained for the purified pig kidney legumain. An N-terminal sequence of 25 residues was obtained, together with an internal sequence of 19 residues (Fig. 2).
Pig Legumain Is N-Glycosylated-Legumain was run in SDS-PAGE, with and without treatment with N-glycosidase F. As can be seen in Fig. 4, the original band at about 34 kDa was converted to a band at 31 kDa, consistent with deglycosylation. Proteins from a parallel gel were transblotted on to a polyvinylidene difluoride membrane, and stained for glycan. Legumain gave a clear positive reaction, which was lost following exposure to N-glycosidase F, and the positive and negative controls gave the expected results (data not shown).
Dependence of Activity and Stability of Pig Legumain on pH-Legumain solutions in sodium phosphate/citrate, pH 3.0-7.2, were incubated at 30°C, and samples were removed for assay at 2 and 4 h. It was found (Fig. 5A) that there was no appreciable loss of activity in the range pH 4.2-5.5, and even during 24 h there was little loss of activity in this range (data not shown). Below pH 4.2, stability fell off gradually, but above pH 6.0, it fell off precipitously, and no activity survived 2 h at pH 6.6. The rate of loss of activity above pH 6.0 was strongly dependent on temperature, so that at 25°C linear rates could be obtained in brief, continuous assays up to pH 7.0.
The pH dependence of activity was determined in the same buffer system, at 25°C, with the result shown in Fig. 5B. It can be seen that maximal activity occurred at pH 6.4. However, routine assays were more conveniently made at 30°C for longer periods, and pH 5.8 was selected as the most suitable value for these.
Catalytic Activity of Isolated Pig Legumain-Active site titration of a 1 unit/ml solution of legumain with 3.0 μM ovocystatin showed 2.0 μM reactive sites. For a 34-kDa protein, this would imply a specific activity of 14.7 units/mg. The actual specific activity of combined fractions of our final enzyme preparations in the standard assay, with Z-Ala-Ala-Asn-NHMec at pH 5.8 and 30°C, was only 7.7 (Table I), but values of 10 or more were seen with individual fractions on several occasions.
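The figure of 14.7 units/mg follows directly from the titration result, as the short calculation below shows. It is a sketch of the arithmetic only; the function and variable names are illustrative, and the inputs (1 unit/ml, 2.0 μM reactive sites, 34 kDa) are those quoted in the text.

```python
def specific_activity_units_per_mg(units_per_ml, active_site_molar, molar_mass_g_per_mol):
    """Specific activity implied by an active-site titration.

    mol/l multiplied by g/mol gives g/l, which is numerically equal to mg/ml.
    """
    protein_mg_per_ml = active_site_molar * molar_mass_g_per_mol
    return units_per_ml / protein_mg_per_ml


implied = specific_activity_units_per_mg(
    units_per_ml=1.0,               # activity of the titrated solution
    active_site_molar=2.0e-6,       # 2.0 uM reactive sites
    molar_mass_g_per_mol=34000.0,   # 34-kDa protein
)
print(round(implied, 1))
```

The active enzyme concentration corresponds to 0.068 mg/ml, so the implied specific activity is 1/0.068, or about 14.7 units/mg.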
Kinetic parameters for the hydrolysis of Z-Ala-Ala-Asn-NHMec by pig kidney legumain were: kcat 46 s⁻¹, Km 50 μM, and kcat/Km 920,000 s⁻¹ M⁻¹. These values are of similar magnitude to those for cathepsins B and L acting on their preferred aminomethylcoumarin substrates (10).
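The relation between the three kinetic parameters can be checked numerically. The sketch below, which assumes simple Michaelis-Menten behaviour, computes kcat/Km from the quoted kcat and Km and compares the full rate equation with the low-substrate approximation v/[E] = (kcat/Km)[S] at the 10 μM substrate concentration of the standard assay; the function and variable names are illustrative.

```python
K_CAT = 46.0    # s^-1, from the text
K_M = 50e-6     # M (50 uM), from the text


def turnover_per_enzyme(substrate_molar, k_cat=K_CAT, k_m=K_M):
    """Michaelis-Menten rate per mole of enzyme, v/[E], in s^-1."""
    return k_cat * substrate_molar / (k_m + substrate_molar)


specificity_constant = K_CAT / K_M           # s^-1 M^-1

assay_substrate = 10e-6                      # M, standard assay concentration
exact = turnover_per_enzyme(assay_substrate)
approximate = specificity_constant * assay_substrate

print(round(specificity_constant), round(exact, 2), round(approximate, 2))
```

At 10 μM substrate the full equation gives about 7.7 s⁻¹ per enzyme molecule and the linear approximation 9.2 s⁻¹, so the two differ by a factor of 1.2 at this concentration.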
Legumain was tested for hydrolysis of a range of oligopeptide substrates (from Sigma). Each peptide (50 μM in 100 μl) was incubated with 0.3 milliunit of legumain/ml in 50 mM sodium citrate buffer, pH 6.0, containing 1 mM EDTA, 1 mM DTT, for 2 h at 30°C. Neurotensin, human VIP fragment 1-12, and chicken VIP fragment 16-28 were cleaved. The products were isolated by high performance liquid chromatography, and identified by amino acid analysis, revealing cleavage positions as shown in Fig. 6. It is evident that in each case, cleavage was at an asparaginyl bond.
Other peptides that were cleaved, but for which the products were not identified, were neurotensin fragment 1-11, pig VIP fragment 1-28, bombesin, somatostatin, and allatotropin. All of these peptides contain asparaginyl bonds.
Not all asparagine-containing peptides were cleaved, however; under the conditions of assay, there was no detectable hydrolysis of [Asn1,Val5]angiotensin II, Ala-Ser-Thr-Thr-Asn-Tyr-Thr, or Gly-Ser-Asn-Lys-Gly-Ala-Ile-Ile-Gly-Leu-Met. Other peptides that were not cleaved were bradykinin, substance P, dynorphin A fragment 1-13, and angiotensin I, none of which contains an asparaginyl bond.
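Candidate asparaginyl bonds in a peptide can be listed mechanically, which makes the point that the presence of such a bond is necessary but not sufficient for cleavage. The following is a minimal sketch: the function name is illustrative, the first two sequences are the uncleaved peptides quoted in the text, and the bradykinin sequence is supplied from general knowledge rather than from this paper.

```python
def asparaginyl_bonds(peptide):
    """List bonds on the carboxyl side of Asn in a hyphenated three-letter sequence.

    Residues are numbered from 1; a C-terminal Asn has no following bond.
    """
    residues = peptide.split("-")
    bonds = []
    for index in range(len(residues) - 1):
        if residues[index] == "Asn":
            bonds.append(f"Asn{index + 1}-{residues[index + 1]}{index + 2}")
    return bonds


# Two peptides from the text that contain Asn but were not hydrolyzed.
heptapeptide = asparaginyl_bonds("Ala-Ser-Thr-Thr-Asn-Tyr-Thr")
undecapeptide = asparaginyl_bonds("Gly-Ser-Asn-Lys-Gly-Ala-Ile-Ile-Gly-Leu-Met")

# Bradykinin (sequence assumed from general knowledge) has no Asn at all.
bradykinin = asparaginyl_bonds("Arg-Pro-Pro-Gly-Phe-Ser-Pro-Phe-Arg")

print(heptapeptide, undecapeptide, bradykinin)
```

Each of the two Asn-containing peptides offers exactly one potential cleavage site (Asn5-Tyr6 and Asn3-Lys4, respectively), yet neither was hydrolyzed under the assay conditions described.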
Action of Legumain on Proteins-Azocasein (1.5%, w/v) in 50 mM sodium citrate buffer, pH 5.5, containing 1 mM EDTA and 5 mM DTT, was incubated with legumain (5 milliunits/ml) for 1 h at 30°C. The reaction was stopped by making the mixture 3% (w/v) trichloroacetic acid, and soluble peptides were separated by filtration for quantification by A366. The enzyme caused an increase in A366 of 0.128 under these conditions. Under the same conditions, cleavage of both bovine serum albumin and human transferrin was evident in SDS-PAGE. There was no obvious enhancement of the rate of hydrolysis of these proteins by prior denaturation at 100°C for 5 min.
The recombinant C-fragment of tetanus toxoid (50 μg in 100 μl) was cleaved at three bonds by legumain (5 milliunits, 1 h), under the conditions used for azocasein (Fig. 7). Again, only asparaginyl bonds had been hydrolyzed, but many other asparaginyl bonds in the protein were unaffected. The bonds that were cleaved were in some of the most hydrophilic parts of the molecule, as identified by the Kyte and Doolittle (23) hydropathy prediction algorithm.
Inhibitors-Potential inhibitors of pig legumain were tested in the standard assay (Table III). General inhibitors of peptidases of serine, aspartic, and metallo-catalytic type (including phenylmethanesulfonyl fluoride, pepstatin, and 1,10-phenanthroline) had no effect on the enzyme. In contrast, p-chloromercuribenzoate, iodoacetate, iodoacetamide, and N-ethylmaleimide were reasonably effective inhibitors even in the presence of 1 mM DTT. E64 had no effect on legumain, even at 100 μM concentration with incubation for 12 h at 30°C.
Ovocystatin and recombinant human cystatin C were slow, tight-binding inhibitors with ID50 values of about 5 nM. This value is in the same order of magnitude as the Ki for cathepsin B, but 2 or 3 orders of magnitude higher than those for papain and most other members of peptidase family C1 (24).
For tests with more general thiol-blocking agents, the assay was adapted to contain only low concentrations of thiol compounds, and a stock solution of enzyme was preactivated before dilution into the incubation mixture. The progress of inactivation by iodoacetate and iodoacetamide (each 1 mM), N-ethylmaleimide (20 μM), and N-phenylmaleimide (2 μM) was monitored with only 50 μM DTT, and the curves were analyzed to yield second-order rate constants of inactivation (Table IV).

DISCUSSION
Covalent Structures of Mammalian Legumains-The deduced amino acid sequences of the human and pig enzymes show that the mammalian legumains are unquestionably homologous to the legumains of plants and Schistosoma, as can be judged from inspection of Fig. 2. As aligned by PILEUP (18), the sequence of human legumain is 30% identical to that of S. mansoni and 34-35% identical to those of Canavalia ensiformis (jack bean), R. communis (castor bean), Citrus sativa (orange), and other plants. The RDF test of Lipman and Pearson (25) gave highly significant z values for these comparisons. Human and pig legumains are therefore members of peptidase family C13 as defined by Rawlings and Barrett (1), and this family has to be added to the known mammalian families of cysteine peptidases.
The sequence of human legumain is that of a protein of 49 kDa, but SDS-PAGE shows a band at 34 kDa for mature pig legumain. This is consistent with what is known of the posttranslational processing of legumain in other organisms, in which the enzyme is processed at both N and C termini, and glycosylated.
The mammalian legumain sequences showed potential glycosylation sites, and pig legumain gave the staining reaction of a glycoprotein, and showed the expected decrease in apparent molecular mass on treatment with an N-glycosidase. Several of the plant legumains are also known to be glycosylated (3,26).
If we make the assumption that the N terminus of the mature human protein is Gly26, by analogy with that of the pig sequence in this study (Fig. 2), then the human enzyme has an 8-amino acid propeptide. The C-terminal processing of the legumains of plants and Schistosoma takes the form of removal of a segment of about 14 kDa (4,27). Cleavage of human legumain in the vicinity of residue 300 would give rise to a mature protein of about 31 kDa, consistent with our experimental data for the mass of the deglycosylated pig enzyme (Fig. 4).
Presence of Legumain in Mammalian Tissues-The extracts of mammalian tissues hydrolyzed the fluorometric substrate Z-Ala-Ala-Asn-NHMec, and essentially all of the activity had the inhibition characteristics expected of a legumain. In V. aconitifolia, too, we found this substrate to be quite specific (3), but Schistosoma contains at least one other major enzyme active in the assay (5).
In rat, rabbit, and pig, the highest specific activity was detected in kidney, with appreciable activities also in rat spleen and pig liver. A low but significant level of activity was detected in human placenta. In view of the countless assays of the proteolytic activities of mammalian tissues that have been made over past decades, it came as a surprise to find a novel enzyme, particularly one that can be detected with a simple protein substrate such as azocasein.
Catalytic Activity-Pig legumain was found to be very labile at neutral pH. In this respect it resembles such lysosomal cysteine endopeptidases of family C1 as cathepsins B and L. Legumains of the plants Phaseolus vulgaris and C. ensiformis are also unstable at pH 7.5 (28,29), and pH optima are in the region of 5.5 (29,30). Legumain of Schistosoma differs in this respect, with its pH optimum of 6.8, as has been shown directly by Dalton et al. (5).
There seems little doubt that the legumains are cysteine peptidases. The purified pig enzyme is active only in the presence of thiol compounds, but the enzyme in crude extracts of mammalian tissues did not require a thiol activator, and the same has been reported for Schistosoma (5).
The inhibition properties of pig legumain (Table III) were consistent with a cysteine peptidase. With low nanomolar Ki values, however, the two cystatins tested were more potent by about 1000-fold than they are against legumains of V. aconitifolia and C. ensiformis (3,33).
The rate constants for inactivation of pig legumain by iodoacetate and iodoacetamide (Table IV) are very low compared to those for a typical cysteine endopeptidase of family C1 such as papain (31), but are of similar magnitude to those for glycyl endopeptidase and bromelain (32). It was notable that the maleimides were much more reactive, especially N-phenylmaleimide. 3,4-Dichloroisocoumarin is normally regarded as an inhibitor specific for serine peptidases, so it was not expected to inhibit legumain, but it has been found to inhibit calpains (family C2) and caspase 1 (C14).
Legumain of C. ensiformis is highly specific for the cleavage of asparaginyl bonds (33), and all indications are that the pig enzyme is equally selective. However, the legumains may cleave only some of the asparaginyl bonds in a polypeptide substrate. This was suggested by the fact that several of the asparagine-containing oligopeptides tested were not hydrolyzed, and only 3 of the 47 asparaginyl bonds in the C-fragment of tetanus toxoid were cleaved. This implies the existence of additional determinants of specificity that have yet to be identified. The specificity of plant legumain in its action on seed proteins seems to include a preference for hydrophilic surface loops (34), and a similar preference would be consistent with the action of pig legumain on the C-fragment of tetanus toxoid reported here.
There has been no report of a peptidase with strict specificity for asparaginyl bonds apart from those for species variants of legumain, so there are no direct precedents as a basis for speculation about the structural mechanism of this specificity. Perhaps the closest analogy is the strong preference of picornains for cleavage of -Gln-Gly-bonds in their processing of picornaviral polyproteins, and in these enzymes, a histidine side-chain seems primarily responsible for the interaction with glutamine (35).
Possible Biological Functions. Legumain was previously known from plants and a simple animal, Schistosoma, and we now know that it is present also in mammals. The enzyme was evidently present in the protozoan ancestor of plants and animals at the time of their divergence about 1000 million years ago, and since the strict specificity for hydrolysis of asparaginyl bonds is seen in both plants and animals, there is little doubt that it was also exhibited by the archetypal enzyme. The functional conservation of the enzyme across such a wide spread of organisms suggests that there may be some fundamental biological importance in the hydrolysis of asparaginyl bonds. The roles of the enzyme that we are aware of seem to have been acquired secondarily. For example, the enzyme has been recruited to process seed storage proteins at asparaginyl residues (34), and probably also to participate in their mobilization in germination (36), but these obviously cannot represent the original functions of the enzyme.
The hydrolysis of asparaginyl bonds is prominent in the post-translational processing of lysosomal hydrolases. Examples are the cleavages that generate the two-chain forms of cathepsins B and H (Fig. 8), and there is also hydrolysis of asparaginyl bonds in the processing of pig cathepsin D (37) and human β-N-acetylhexosaminidase (38). Until now there has been no indication of which peptidase might be responsible for this, but we would suggest that it is legumain.
Preliminary experiments in which rat kidney has been fractionated by the procedure of Maunsbach (39) have indicated that legumain is a lysosomal enzyme. If confirmed, this finding will be consistent with the properties of the enzyme, since it is synthesized with a signal peptide, and is N-glycosylated, and moreover has a requirement for an acidic environment. Plant legumain is present in the vacuole, an analog of the lysosome (34,40), and the enzyme processes a papain-family endopeptidase, SH-EP, in Vigna mungo (mung bean; Ref. 41).
In C. ensiformis, legumain apparently catalyzes the transpeptidation of concanavalin A (42), but whether a protein-splicing function could be significant for mammalian legumain remains to be established.
It is perhaps too early in the study of legumain in mammals to have clear ideas about possible disease involvement, but a significant report is that of Sharma et al. (43), who have detected an enzyme in lens that may be responsible for an age-dependent degradation of α-crystallin. The enzyme hydrolyzes an Asn-Glu bond and is inhibited by N-ethylmaleimide, but not E64, and therefore possibly is legumain.
Legumain may well be secreted from cells. The instability of the enzyme at neutral pH would potentially restrict its extracellular activity, except perhaps in an acidified pericellular environment. The interactions of the RGD sequence with membrane components may help to retain it in this environment.
In summary, we have shown that mammalian tissues contain a cysteine endopeptidase of a family not previously known to be present in mammals, and with a distinctive specificity. By analogy with the important functions of the other families of cysteine peptidases found in mammals (families C1 (cathepsin B), C2 (calpains), and C14 (caspase)), family C13 may also prove to be of considerable biomedical interest.

Inhibitors of pig legumain
The enzyme was preincubated with the potential inhibitor in the usual assay buffer (containing 1 mM DTT) for 10 min at 30°C before the reaction was initiated by addition of substrate. Some irreversible inhibitors (marked *) were selected for determination of rate constants (see Table IV).