Cystatin E is a Novel Human Cysteine Proteinase Inhibitor with Structural Resemblance to Family 2 Cystatins*

A new member of the human cystatin superfamily, called cystatin E, has been found by expressed sequence tag (EST) sequencing in amniotic cell and fetal skin epithelial cell cDNA libraries. The sequence of a full-length amniotic cell cDNA clone contained an open reading frame encoding a putative 28-residue signal peptide and a mature protein of 121 amino acids, including four cysteine residues and motifs of importance for the inhibitory activity of Family 2 cystatins like cystatin C. Recombinant cystatin E was produced in a baculovirus expression system and isolated. An antiserum against the recombinant protein could be used for affinity purification of cystatin E from human urine, as confirmed by N-terminal sequencing. The mature recombinant protein processed by insect cells started at amino acid 4 (cystatin C numbering) and displayed reversible inhibition of papain and cathepsin B (Ki values of 0.39 and 32 nM, respectively), in competition with substrate. Cystatin E is thus a functional cysteine proteinase inhibitor despite relatively low amino acid sequence similarities with human cystatins (26–34% with quences for the Family 2 cystatins S, SN, and SA; < 30% the Family 1

Cysteine proteinase inhibitors of the cystatin superfamily are present in a variety of human tissues and body fluids. They seem, therefore, to have an important regulatory role in normal body processes involving cysteine proteinase activity such as bone resorption (1). In addition, numerous pathological conditions have been connected to an increased cysteine proteinase activity, including septic shock (2), cancer progression and metastasis (for review, see Ref. 3), parasitic and viral infections (4), as well as inflammatory reactions in rheumatoid arthritis, purulent bronchiectasis, and periodontitis (5)(6)(7). The cystatins are thus implied to function as general regulators of potentially harmful proteinase activities besides being members of the non-immune defense system of the body.
Structurally, the cystatins constitute a single superfamily of evolutionarily related proteins (8). The known human members of this superfamily can be grouped into three protein families as defined by Dayhoff et al. (9). Cystatins A and B (Family 1) contain about 100 amino acid residues (Mr = 11,000-12,000) and lack disulfide bridges, and their genes do not encode signal peptides. Family 2 cystatins are secreted proteins of ~120 amino acid residues (Mr = 13,000-14,000) and have two characteristic intrachain disulfide bonds. The human Family 2 cystatins C, D, S, SN, and SA are encoded by genes located in a multigene locus at chromosome 20 together with two pseudogenes (10-12). Family 3 cystatins, represented by human L- and H-kininogen, are more complex members of the cystatin superfamily and contain three Family 2 cystatin-like domains besides being kinin precursors (13). All of the human cystatin superfamily members are tight-binding enzyme inhibitors with specificity against papain-like cysteine proteinases such as the mammalian cathepsins B, H, L, and S (for review, see Ref. 14).
In the present study, we report the discovery of another human cystatin, cystatin E, with sequence similarities too low to agree with any of the three previous cystatin families and with the unusual characteristic of being a glycoprotein. Still, it resembles Family 2 cystatins structurally in containing two putative disulfide bridges and by being a secreted protein.
Functional characterization and distribution studies demonstrated that cystatin E is a tight-binding cysteine proteinase inhibitor with unusual tissue expression.

EXPERIMENTAL PROCEDURES
Identification of cDNA for Cystatin E-A data base containing more than one million ESTs obtained from over 600 different cDNA libraries has been generated through the combined efforts of Human Genome Sciences, Inc. and the Institute for Genomic Research using high throughput automated DNA sequence analysis of randomly selected human cDNA clones (15,16). Sequence homology comparisons of each EST were performed against the GenBank data base using the BLAST and BLASTN algorithms (17). ESTs having homology to previously identified sequences (probability equal to or less than 0.01) were given a tentative name based on the name of the sequence to which they were homologous. A specific homology and motif search using the known amino acid sequence of human cystatin C (M27891) against this human EST data base revealed several ESTs having translated sequences >30% homologous to that of cystatin C. One clone (HAQBM60) encoding an intact N-terminal signal peptide was identified in a human amniotic cell (primary culture) library and was selected for further investigation. This EST and two other ESTs from amniotic or fetal skin epithelium libraries were sequenced on both strands to the 3′ end, and their homology to the cystatin C cDNA was confirmed.

* This work was supported by grants from Magn. Bergvall's, A. Österlund's, A. Påhlsson's, and G. and J. Kock's Foundations, the Medical Faculty of the University of Lund, and the Swedish Medical Research Council (Project No. 09915 and 05196). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank TM
Cloning of Cystatin E in Baculovirus-The entire coding sequence of the cystatin C-related cDNA was amplified using standard PCR techniques with primers corresponding to the 5′ and 3′ sequences of the gene (upstream primer, 5′-CGC GGA TCC GCC ATC ATG GCG CGT TCG AAC CTC-3′; downstream primer, 5′-CGC GGT ACC GAA TGG CCT TCG CCC TC-3′). The amplified fragment was purified and digested with BamHI and Asp718 followed by a second purification. The baculovirus expression vector, pA2, derived from pNR704 (18-20) was digested with BamHI and Asp718 followed by agarose gel purification. The purified pA2 vector was ligated with the amplified cystatin coding sequence using T4 DNA ligase (Life Technologies, Inc.).
Purification of Recombinant Cystatin E-Recombinant cystatin E was purified from baculovirus-infected Sf9 cell supernatants. All purification steps were carried out at 4°C, utilizing a BioCAD 250 (PerSeptive Biosystems, Inc.). 500 ml of supernatant was first adjusted to pH 4.5 and then applied at a flow rate of 20 ml/min to a 10-ml Poros HS column pre-equilibrated with 100 mM NaOAc buffer, pH 4.5. The cystatin E was found in the flow-through fraction. After adjusting pH to 8.5, the flow-through fraction was applied at a flow rate of 20 ml/min to a 10-ml Poros HQ column pre-equilibrated in 20 mM Tris-HCl buffer, pH 8.5. Cystatin E was again collected in the flow-through fraction. Finally, cystatin E was captured on a Mimetic Green 1 A6XL Alpha column (10 ml, ProMetic BioSciences, Inc., Burtonsville, MD) pre-equilibrated with 25 mM sodium phosphate buffer, pH 6.0. After washing the column, the cystatin E was eluted by 2 M KCl in 25 mM sodium phosphate buffer, pH 6.0. The resulting cystatin E preparation was found to be more than 95% pure by SDS-PAGE and contained <10 endotoxin units/mg.
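The cloning strategy described under "Cloning of Cystatin E in Baculovirus" relies on the two PCR primers carrying the restriction sites used for digestion. As a minimal sketch, the check below looks for the well-established recognition sequences of BamHI (GGATCC) and Asp718 (GGTACC) in the primers as printed in the text. The function name and data layout are illustrative assumptions, not part of the original method.

```python
# Recognition sequences of the two enzymes named in the text.
SITES = {"BamHI": "GGATCC", "Asp718": "GGTACC"}

# Primers copied from the text (5' to 3'), with the spacing removed.
UPSTREAM = "CGC GGA TCC GCC ATC ATG GCG CGT TCG AAC CTC".replace(" ", "")
DOWNSTREAM = "CGC GGT ACC GAA TGG CCT TCG CCC TC".replace(" ", "")


def find_sites(primer: str) -> dict:
    """Return {enzyme: 0-based index} for every recognition site found in the primer."""
    return {name: primer.find(seq) for name, seq in SITES.items() if seq in primer}


if __name__ == "__main__":
    print("upstream:", find_sites(UPSTREAM))
    print("downstream:", find_sites(DOWNSTREAM))
    # Position of the first ATG in the upstream primer (hypothetical helper check).
    print("first ATG in upstream primer at index", UPSTREAM.find("ATG"))
```

Under these assumptions, each primer carries exactly one of the two sites near its 5′ end, which is consistent with directional insertion of the amplified fragment into the BamHI/Asp718-digested vector.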
Production and Use of an Antiserum Against Cystatin E-An antiserum against cystatin E was raised by injecting 0.2 mg isolated recombinant antigen (above) in Freund's complete adjuvant (Difco Laboratories) subcutaneously into a rabbit. The injection was repeated after three weeks, and the rabbit was bled every third week. The specificity of the antiserum was tested by crossed and classical immunoelectrophoresis of the recombinant cystatin E used as starting material and of concentrated proteinuria urine containing cystatins A, B, C, S, SN, and kininogen (21). The IgG fraction of 100 ml of antiserum was isolated by absorption to protein A-Sepharose (Pharmacia-LKB, Uppsala, Sweden) and subsequent elution with a glycine buffer at pH 2.2. The IgG fraction was coupled to MiniLeak resin (Kem-En-Tec, Copenhagen, Denmark) as recommended by the manufacturer.
Purification of Cystatin E from Human Urine-Urine from one single individual with mixed glomerular-tubular proteinuria was used as starting material in the purification procedure described below. The urine was collected directly into a flask containing benzamidinium chloride, EDTA, Tris, and sodium azide to final concentrations of at least 6, 30, 50, and 15 mM, respectively, to avoid proteolytic degradation. 200 ml of urine was concentrated 10-fold by pressure ultrafiltration using a C-DAK artificial kidney with a retention limit of approximately 1,500 Da (Cordia Dow Corp., Miami, FL) and then incubated for 3 h at room temperature with 3 ml of the immunoaffinity resin described above, containing about 100 mg of covalently coupled IgG from the anti-(cystatin E) serum. The resin was then packed into a column and washed with about 300 ml of a 50 mM Tris buffer, pH 7.4, containing 0.5 M NaCl, 5 mM benzamidinium chloride, 10 mM EDTA, and 15 mM sodium azide. Immunosorbed material was eluted with 0.2 M glycine buffer, pH 2.2, containing 0.5 M NaCl, 5 mM benzamidinium chloride, 10 mM EDTA, and 15 mM sodium azide, and the eluate was immediately neutralized by addition of a 2 M Tris buffer, pH 8.6, and then concentrated to about 700 μl by ultrafiltration using Centricon-3 concentrators (Amicon Corp., Danvers, MA). Agarose gel electrophoresis of the eluate revealed the presence of a major protein band, and blotting of the electrophoretically separated components onto polyvinylidene difluoride membranes followed by immunostaining and sequencing identified the major protein band as cystatin E.
Protein Analyses-Cystatin E was analyzed for glycosylation by determining the monosaccharide content in two purified preparations of the recombinant protein. About 10 μg of the protein was hydrolyzed with 0.2 M trifluoroacetic acid at 100°C for 4 h. After drying the hydrolysate in a SpeedVac and reconstituting in 50 μl of deionized water, the resulting monosaccharides were analyzed on a Dionex carbohydrate analyzer (22). A PA-1 column (Dionex) was used to separate the monosaccharides by isocratic elution with 12 mM NaOH. The monosaccharides were detected by integrated amperometry using a pulsed amperometric detector. Glycosylation analysis was also done by incubation with peptide-N-glycosidase F (EC 3. recommended by the enzyme supplier (Oxford GlycoSystems, Abingdon, UK) followed by SDS-PAGE. Purified preparations of recombinant and natural cystatin E were characterized by SDS-PAGE after reduction in 15 or 16.5% gels with the buffer systems described by Laemmli (23) and Schägger and von Jagow (24), respectively, and by agarose gel electrophoresis at pH 8.6 (25). Automated N-terminal sequencing was carried out after electrophoresis in SDS-polyacrylamide (Novex 4-20% gels), blotting onto a ProBlott membrane (Applied Biosystems), staining with Ponceau S (0.2% in 4% acetic acid), and excision of the band of interest (26), using an Applied Biosystems 492 sequencer and the gas-phase blot cycles. Alternatively, proteins in mixtures were separated by agarose gel electrophoresis and transferred to a polyvinylidene difluoride membrane, and N-terminal sequencing was carried out on the individual protein bands using an Applied Biosystems 477A sequencer (27).
Enzyme Inhibition Assays-The methods used for active site titration of papain, titration of the molar enzyme inhibitory concentration in cystatin E preparations, and for determination of equilibrium constants for dissociation (Ki) of complexes between cystatin E and cysteine peptidases have been described in detail earlier (14,28). The enzymes used were papain (EC 3.4.22.2, from Sigma), activable to 70-75% after affinity purification on Sepharose-coupled Gly-Gly-Tyr-Arg as detailed previously (29), and affinity purified human cathepsin B (EC 3.4.22.1, from Calbiochem, La Jolla, CA). The fluorogenic substrate used was Z-Phe-Arg-NHMec (10 μM, from Bachem Feinchemikalien, Bubendorf, Switzerland), and the assay buffer was 100 mM sodium phosphate buffer (pH 6.5 and 6.0 for papain and cathepsin B, respectively), containing 1 mM dithiothreitol and 2 mM EDTA. Steady-state velocities were measured, and Ki values were calculated according to Henderson (30). Corrections for substrate competition were made using Km values of 150 μM for cathepsin B (31) and 60 μM for papain (32).
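The correction for substrate competition mentioned above can be illustrated with the standard relation for a competitive inhibitor, Ki = Ki(app) / (1 + [S]/Km). The sketch below assumes that this is the form of correction applied, which the text does not state explicitly, and reads the substrate concentration and Km values as micromolar. The function names are hypothetical, and the apparent Ki used in the usage lines is an invented illustration, not a measured value.

```python
def correction_factor(substrate_uM: float, km_uM: float) -> float:
    """Factor (1 + [S]/Km) by which substrate competition inflates the apparent Ki."""
    if km_uM <= 0:
        raise ValueError("Km must be positive")
    return 1.0 + substrate_uM / km_uM


def true_ki(ki_app_nM: float, substrate_uM: float, km_uM: float) -> float:
    """Convert an apparent Ki (nM) to a substrate-independent Ki (nM)."""
    return ki_app_nM / correction_factor(substrate_uM, km_uM)


if __name__ == "__main__":
    substrate = 10.0  # uM Z-Phe-Arg-NHMec, as given in the text
    for enzyme, km in (("papain", 60.0), ("cathepsin B", 150.0)):
        print(enzyme, "correction factor:", round(correction_factor(substrate, km), 4))
    # Hypothetical apparent Ki of 7.0 nM for papain, for illustration only.
    print("corrected Ki (nM):", true_ki(7.0, substrate, 60.0))
```

With the substrate at 10 μM, well below either Km, the correction is small under these assumptions (roughly 17% for papain and 7% for cathepsin B).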

RESULTS AND DISCUSSION
In EST sequencing of human cDNA libraries, several clones with significant homology to the cystatin C sequence (33) were identified. Clones encoding the same unknown protein were identified in libraries from amniotic membrane cells (32 clones) and fetal skin epithelium (27 clones). Complete sequencing of these identified a full-length cDNA clone, designated HAQBM60, in an amniotic cell library. The clone contained an open reading frame encoding a 149-residue preprotein (Fig. 1), of which the first 28 residues likely constitute the signal peptide according to an alignment with human cystatin sequences (Fig. 2) and theoretical considerations (38). This indicated a closer relationship with the secreted Family 2 cystatins than with the intracellular Family 1 cystatins. The open reading frame contained a typical consensus sequence for initiation of translation (39) around the start ATG codon and was followed by a poly(A) signal, AATAAA, 78 nucleotides downstream from the stop codon, after which a poly(A) sequence was evident a further 20 nucleotides downstream (Fig. 1).
The deduced mature protein sequence was just 34% identical to that of cystatin C, showed lower resemblance (26-30% identity) to the sequences of the other known Family 2 cystatins D, S, SN, and SA (Fig. 2), and even lower similarities of 18 and 23% identical residues when compared with the Family 1 cystatins, A and B (not shown). The resemblance to cystatin domain 2 of human kininogen was similarly low (18% identity), whereas the domain 3 sequence was 29% identical to the cystatin E sequence. However, the sequence contained a Gly residue at exactly the same distance from a central Gln-Xaa-Val-Xaa-Gly motif as the other cystatin sequences and also a Pro-Trp pair toward the C-terminal end of the translation product, like that of the human Family 2 or 3 cystatins. The sequence also contained four Cys residues toward the C-terminal end, alignable with those in Family 2 cystatins. The four Cys residues in cystatin C and the avian analogue, chicken cystatin, form two disulfide bridges stabilizing the cystatin structure (37,40). The novel protein was thus similar to Family 2 cystatins in parts essential for structure and function and was designated cystatin E. Its evolutionary relationship to the cystatin superfamily seems undisputed, but according to the relatively low sequence similarities, it should be seen as a first member in a new protein family (9). The cystatin E sequence also had some unusual characteristics, including a five-residue insertion between amino acids 76 and 77 and a deletion of residue 91 (cystatin C numbering). These sequence positions correspond to polypeptide parts on the side opposite to the proteinase binding region of chicken cystatin (40) and would likely not affect an inhibitory function of cystatin E. A motif search in addition showed a target Asn-Xaa-Ser/Thr sequence for glycosylation at positions 108-110. On the gene level, a cystatin multigene locus on the short arm of chromosome 20 has been investigated in detail.
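The percent identity figures quoted above are derived from pairwise alignments. As a rough illustration of how such a figure can be computed, the sketch below counts identical residues over aligned columns in which neither sequence has a gap. The choice of denominator is an assumption (the text does not say which convention was used), and the two short aligned strings are invented toy fragments, not the cystatin sequences of Fig. 2.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identical residues over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns in which both sequences carry a residue.
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


if __name__ == "__main__":
    # Hypothetical aligned fragments, for illustration only.
    seq_a = "QVVAG-PW"
    seq_b = "QLVSGKPW"
    print(f"{percent_identity(seq_a, seq_b):.1f}% identity")
```

Other conventions divide by the length of the shorter sequence or by the full alignment length, which can shift the reported value by a few percentage points for sequences with insertions or deletions.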
This locus harbors the genes for the known Family 2 cystatins C, D, S, SN, and SA and in addition two pseudogenes but, according to estimates using cross-hybridizing probes in Southern blotting, likely no additional genes (10-12). Again, this supports that cystatin E is a protein distantly, but significantly, related to the Family 2 cystatins.

FIG. 2. Alignment of amino acid sequences for cystatin E with human members of cystatin Family 2.
The numbering refers to the cystatin C sequence as deduced from its cDNA, starting from the first residue of the mature protein (33,34). For the other cystatins, the naturally occurring forms with longest N-terminal segments are shown (35,36). Dashes indicate gaps introduced to optimize the alignment. Vertical lines indicate residues identical in cystatin E and all five known human Family 2 cystatins. Boxes indicate residues that are involved in the cysteine proteinase inhibitory activity of the Family 2 cystatins. The four cysteine residues shown to form two disulfide bridges in cystatin C (37) are marked below the sequences. The putative N-glycosylation site in the cystatin E sequence is marked with an asterisk.
Initial attempts to produce cystatin E in E. coli gave negligible amounts of recombinant protein. The cystatin E cDNA was therefore subcloned in a baculovirus expression vector and was expressed in insect Sf9 cells (see "Experimental Procedures"). The recombinant protein was secreted into the cell media of such cultures, with a yield of ~10-20 mg/liter of culture medium. The protein was purified by a combination of ion exchange and dye affinity column chromatographies, resulting in a >95% pure protein preparation according to SDS-PAGE (Fig. 3A), provided that the observed protein band doublet was due to microheterogeneity of the same protein (see below). N-terminal sequence analysis of both protein bands confirmed that this was the case and demonstrated, for both, that a 28-residue segment is proteolytically removed during secretion from the insect cells to yield a mature protein beginning with the sequence Arg-Pro-Gln-Glu-. The N-terminal Arg residue corresponds to residue 4 in the cystatin C sequence and agrees with a theoretical signal peptidase cleavage site with small side chains for residues in positions −3 and −1 (38) (cf. Fig. 1). The Mr values of the mature protein bands, calculated from their SDS-PAGE mobility after reduction, were ~14,300 and 15,700, in good agreement with a theoretical mass of 13,652 calculated from the sequence.
The recombinant cystatin E remained as a doublet through the above purification procedures (Fig. 3A). As the protein bands shared the same N-terminal sequence, the upper band of the doublet was suspected to be a glycosylated species of cystatin E, given that a theoretical glycosylation site was found by motif analysis of the protein sequence (Asn-Ser-Ser at positions 108-110; cf. Fig. 1). Indeed, monosaccharide composition analyses showed that the recombinant cystatin E contains N-acetylglucosamine (1.1 and 0.4 mol/mol of protein, for two different preparations) and mannose (1.6 and 0.3 mol/mol of protein), thus demonstrating that the single theoretical site, Asn-108, likely is glycosylated with an N-linked oligosaccharide. That the upper SDS-PAGE band corresponds to a glycosylated cystatin E species was verified by its size reduction upon incubation with the N-linked oligosaccharide cleaving enzyme, PNGase (Fig. 3A). This glycosylation of cystatin E is unique among the human low Mr cysteine proteinase inhibitors. It has been reported that rat cystatin C isolated from urine is partly glycosylated, but in this case, the oligosaccharide is linked to residue 79, located on the opposite side of the molecule and distant from the proteinase contact area (40,42).
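The motif analysis referred to above searches a protein sequence for the N-glycosylation target Asn-Xaa-Ser/Thr. A minimal sketch of such a search is given below. It additionally excludes Pro at the Xaa position, a commonly used refinement that is an assumption here and is not stated in the text. The function name, the regular expression, and the short test sequence are all hypothetical illustrations; the sequence is not taken from cystatin E.

```python
import re

# Asn, then any residue except Pro (assumed refinement), then Ser or Thr.
# The lookahead lets overlapping candidate sites be reported as well.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(sequence: str, offset: int = 1) -> list:
    """Return (position, tripeptide) for each candidate site; positions start at `offset`."""
    return [(m.start() + offset, m.group(1)) for m in SEQUON.finditer(sequence.upper())]


if __name__ == "__main__":
    # Invented toy sequence: contains one accepted site (NSS) and one rejected (NPT).
    toy = "AANSSGNPTK"
    for position, tripeptide in find_sequons(toy):
        print(f"candidate N-glycosylation site at residue {position}: {tripeptide}")
```

A match only marks a potential site. Whether it is actually occupied has to be shown experimentally, as done here by monosaccharide analysis and PNGase digestion.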
To test whether cystatin E is a functional proteinase inhibitor, the effect of isolated recombinant cystatin E (a preparation containing <35% glycosylated protein) on the papain hydrolysis of casein was investigated. The cystatin showed a dose- and time-dependent inhibition of the papain activity (not shown). At μM concentrations, the recombinant cystatin E completely inhibited papain hydrolysis of Bz-Arg-pNA in 10-min assays. Titration curves drawn from the results of assays with varying inhibitor concentrations were linear and thus compatible with a reversible inhibition with a Ki < 10 nM. The active concentration of the preparation studied was 27 μM and the concentration determined by quantitative amino acid analysis 40 μM, demonstrating that the recombinant protein was close to 100% active. (The apparently lower active concentration is most likely due to binding of the cystatin also to a papain species not capable of hydrolyzing the substrate, as has been shown to be the case for cystatin C in similar experiments (41,43); the affinity purified papain used was ~70% active, as determined by titration of its activity with compound E-64.) Determinations of equilibrium constants for the complexes of cystatin E with papain and human cathepsin B were done by steady-state measurements in more dilute fluorogenic enzyme assays, showing that cystatin E is a tight-binding inhibitor of both enzymes (Ki values of 0.39 and 32 nM, respectively). Cystatin E is thus intermediate to cystatins C and D with respect to affinity for papain and has, unlike cystatin D, the property of being a cathepsin B inhibitor (Table I). The structural element that has been shown to be at least partly responsible for the target enzyme specificity of cystatin C, the N-terminal segment Arg-Leu-Val-Gly at positions 8-11, with the Leu and Val residues interacting with the substrate subpockets S3 and S2 of target enzymes, respectively (28,29,40,45), corresponds to an Arg-Met-Val-Gly segment in cystatin E.
The corresponding N-terminal segment in cystatin D is Thr-Leu-Ala-Gly; the small side chain of the Ala residue might be a partial explanation for the lack of cathepsin B inhibition observed for this inhibitor, in contrast to cystatins C and E which both have a hydrophobic (Val or Met) residue that could be interacting with a deep S2 subpocket of cathepsin B (45).
An antiserum was raised against the recombinant cystatin E and used in an attempt to find a corresponding protein in human tissues. Placenta was originally investigated as the cystatin E cDNA had been found in amniotic membrane and fetal epithelial cells. Homogenates of placenta contained a cystatin E-immunoreactive protein, with an Mr of ~15,000 according to SDS-PAGE (not shown), but the protein proved hard to purify for sequencing due to a high content of the similarly charged and sized hemoglobin α-chain in the homogenates. Investigation of urine from a patient with mixed glomerular-tubular proteinuria, a body fluid enriched in low Mr proteins originating from blood plasma and previously shown to be a good source of cystatins (21), again identified a protein with cystatin E immunoreactivity. By using the IgG fraction of the antiserum to produce an immunoaffinity column, the immunoreactive protein could be isolated and characterized by agarose gel electrophoresis, immunoblotting, gel filtration, and N-terminal sequencing. The protein migrated in the β1-zone on agarose gel electrophoresis (Fig. 3B), reacted with the antiserum raised against recombinant cystatin E, cochromatographed with other low Mr cystatins upon gel filtration, and displayed the N-terminal sequence Met-Val-Gly-Glu-Leu-Xaa-Asp-Leu-. Thus, the resulting sequence agreed with that of the postulated mature recombinant cystatin E from residue 6 (residue 9 in cystatin C numbering) onwards for all seven residues that could be identified. These results show that human fluids contain a protein related or identical to recombinant cystatin E.
The shorter N-terminal segment of the protein isolated from human urine could be due to proteolytic processing in urine before addition of protease inhibitors and could be analogous with the presence of several N-terminally truncated forms of cystatin C in urine (21). The reasons why cystatin E has been overlooked in several studies aiming at isolation of cystatins from human tissues and body fluids using affinity chromatography with carboxymethyl-papain as the catching ligand (e.g. see Ref. 21) are not clear, but one possibility is that cystatin E, although with a high affinity for active papain (Ka > 10⁹ M⁻¹), has significantly lower affinity for the carboxymethylated enzyme. Another explanation might be that cystatin E is present in very low concentrations in body fluids, or that the majority of the inhibitor is complexed to target proteinases in such fluids, preventing binding to the catching proteinase. Estimated from Coomassie staining of cystatin E isolated by immunoaffinity, and from the yields of the amino acid sequencing, the urine used as starting material contains around 1-10 nmol/200 ml of (10× concentrated) urine. In similar experiments aiming at isolation of cystatins from urine using carboxymethyl-papain as affinity ligand, the yield of cystatin C was ~3,000 nmol/100 ml of 20× concentrated urine, i.e. 1,000-fold higher (21).
To investigate the tissue distribution of cystatin E, Northern blot experiments with the cystatin E cDNA as probe were performed. The low nucleotide sequence similarity with Family 2 cystatin cDNAs made it unlikely that the cystatin E probe would cross-hybridize with mRNAs of other cystatins. Indeed, in control experiments, there were no significant cross-reactions with cDNAs for cystatins C and D (not shown). However, given the fact that a previously overlooked cystatin sequence now has been identified, it cannot be ruled out that additional low copy human cystatin gene transcripts with higher similarity to the cystatin E probe could be present in specialized tissues. The indications from the Northern blot results (Fig. 4) are that the cystatin E gene is expressed in most tissues in the body. The distribution pattern of relatively strong mRNA signals in uterus and liver, slightly weaker but significant signals in placenta, pancreas, heart, spleen, small intestine, and peripheral blood leukocytes, and low cystatin E mRNA content in brain, testis, and kidney, is clearly different from the distribution of e.g. cystatin C mRNA. The fact that all of the cDNA clones we identified originated either from amniotic membrane cell or fetal skin epithelium libraries (a total of 59 clones out of 10,000 sequenced, i.e. almost 0.6%) indicates that the cystatin E gene expression is strongly up-regulated in, or specific for, epithelial cells. In addition, the origin of these cells suggests that the inhibitor serves a protective role during fetal development.