Cystatin F Is a Glycosylated Human Low Molecular Weight Cysteine Proteinase Inhibitor*

A previously undescribed human member of the cystatin superfamily called cystatin F has been identified by expressed sequence tag sequencing in human cDNA libraries. A full-length cDNA clone was obtained from a library made from mRNA of CD34-depleted cord blood cells. The sequence of the cDNA contained an open reading frame encoding a putative 19-residue signal peptide and a mature protein of 126 amino acids with two disulfide bridges and enzyme-binding motifs homologous to those of Family 2 cystatins. Unlike other human cystatins, cystatin F has 2 additional Cys residues, indicating the presence of an extra disulfide bridge stabilizing the N-terminal region of the molecule. Recombinant cystatin F was produced in a baculovirus expression system and characterized. The mature recombinant protein processed by insect cells had an N-terminal segment 7 residues longer than that of cystatin C and displayed reversible inhibition of papain and cathepsin L (K_i = 1.1 and 0.31 nM, respectively), but not cathepsin B. Like cystatin E/M, cystatin F is a glycoprotein, carrying two N-linked carbohydrate chains at positions 36 and 88. An immunoassay for quantification of cystatin F showed that blood contains low levels of the inhibitor (0.9 ng/ml). Six B cell lines in culture secreted barely detectable amounts of cystatin F, but several T cell lines and especially one myeloid cell line secreted significant amounts of the inhibitor. Northern blot analysis revealed that the cystatin F gene is primarily expressed in peripheral blood cells and spleen. Tissue expression clearly different from that of the ubiquitous inhibitor, cystatin C, was also indicated by a high incidence of cystatin F clones in cDNA libraries from dendritic and T cells, but no clones identified by expressed sequence tag sequencing in several B cell libraries and in >600 libraries from other human tissues and cells.

Cysteine proteinase inhibitors of the cystatin superfamily are ubiquitous in the body and are generally tight-binding inhibitors of papain-like cysteine proteinases, such as cathepsins B, H, L, S, and K (for review, see Ref. 1). They should therefore serve a protective function to regulate the activities of such endogenous proteinases, which otherwise may cause uncontrolled proteolysis and tissue damage. Cysteine proteinase activity can not normally be measured in body fluids, but can be detected extracellularly in conditions like endotoxin-induced sepsis (2) and metastasizing cancer (3) and at local inflammatory processes such as in rheumatoid arthritis (4), purulent bronchiectasis (5), and periodontitis (6), which indicates that a tight enzyme regulation by cystatins is a necessity in the normal state. A deficiency state in which the levels of the intracellular cystatin, cystatin B, are lowered due to mutations has recently been shown to segregate with a form of progressive myoclonus epilepsy (7), which points to additional specialized functions of cystatins. Moreover, cell culture results showing that chicken cystatin, human cystatin C, and human cystatin D inhibit the replication of polio (8), herpes simplex (9) and corona (10) viruses, respectively, and that human cystatin A inhibits rhabdovirus-induced apoptosis (11) indicate that cystatins play additional roles in the human defense system.
The cystatins constitute a superfamily of evolutionarily related proteins, all composed of at least one 100-120-residue domain with conserved sequence motifs (12). The previously well characterized single-domain human members of this superfamily could be grouped in two protein families. The Family 1 members, cystatins (or stefins) A and B, contain ~100 amino acid residues, lack disulfide bridges, and are not synthesized as preproteins with signal peptides. The Family 2 cystatins, cystatins C, D, S, SN, and SA, are secreted proteins of ~120 amino acid residues (M_r 13,000-14,000) and have two characteristic intrachain disulfide bonds. Recently, we identified an additional human cystatin superfamily member by EST sequencing in epithelial cell-derived cDNA libraries that we named cystatin E (13). The same cystatin was independently discovered by differential display experiments as an mRNA species down-regulated in breast tumor tissue, but present in the surrounding epithelium, and reported under the name cystatin M (14). Cystatin E/M is an atypical secreted low M_r cystatin in that it is a glycoprotein and shows only 30-35% sequence identity in alignments with the human Family 2 cystatins, which demonstrates that additional cystatin families are yet to be identified (13). The cystatin E/M gene has been localized to chromosome 11 (15), whereas all human Family 2 cystatin genes are clustered on the short arm of chromosome 20 (16), which further stresses that cystatin E/M is only distantly related to the other secreted human low M_r cystatins.
In this investigation, we continued our search for novel human proteins with distant relationship to the cystatin superfamily. We report the characteristics of a cystatin identified in cDNA libraries from immune cells, cystatin F, which is only 30-34% homologous in overall sequence to the human Family 2 cystatins, but also only 29% homologous to cystatin E/M. Like cystatin E/M, cystatin F is a glycoprotein and a potent cysteine proteinase inhibitor.

EXPERIMENTAL PROCEDURES
Identification of cDNAs Encoding Cystatin F-A data base containing >1,000,000 ESTs obtained from >650 different cDNA libraries has been generated by Human Genome Sciences, Inc. and the Institute for Genomic Research using high throughput automated DNA sequence analysis of randomly selected human cDNA clones (17,18). Sequence homology comparisons of each EST were performed against the GenBank™ Data Bank using the BLAST and BLASTN algorithms (19). ESTs having homology to previously identified sequences (p ≤ 0.01) were collected in a data base. A specific homology and motif search using the known amino acid sequence of human cystatin C (M27891) against this data base revealed several ESTs having translated sequences >30% homologous to that of cystatin C. One clone (HCUDE60) encoding an intact N-terminal signal peptide was identified in a human CD34-depleted cord blood cell library and selected for further investigation. The complete cDNA sequence of both strands of this clone was determined, and its homology to the cystatin C cDNA was confirmed.
Baculovirus Expression for Production of Cystatin F-The entire coding sequence of the cystatin F cDNA was amplified using standard polymerase chain reaction techniques with primers corresponding to the 5′- and 3′-sequences of the gene (upstream primer (with tailing BamHI site, GGA TCC), 5′-CGC GGA TCC GCC ATG CGA GCG GCT GGA-3′; and downstream primer (with Asp718 site, GGT ACC), 5′-CGC GGT ACC CTG AAG AGG CGG GGG TCA-3′). The amplified fragment was purified, digested with BamHI and Asp718, and again purified. The baculovirus expression vector pA2, derived from pNR704 (20,21), was digested with BamHI and Asp718, followed by agarose gel purification. The opened and purified pA2 vector was ligated with the amplified cystatin coding sequence using T4 DNA ligase (Life Technologies, Inc.).
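As a quick consistency check on the primer design described above, the following minimal sketch locates the two restriction sites and the start codon in the primers as printed. The recognition sequences (BamHI, GGATCC; Asp718, GGTACC) are standard; the function and variable names are illustrative only and are not part of the original work.

```python
# Recognition sequences of the two restriction enzymes named in the text.
SITES = {"BamHI": "GGATCC", "Asp718": "GGTACC"}

# Primers as printed in the text, with codon spacing removed.
UPSTREAM = "CGCGGATCCGCCATGCGAGCGGCTGGA"
DOWNSTREAM = "CGCGGTACCCTGAAGAGGCGGGGGTCA"


def find_site(primer: str, enzyme: str) -> int:
    """Return the 0-based index of the enzyme's recognition site, or -1 if absent."""
    return primer.upper().find(SITES[enzyme])


upstream_site = find_site(UPSTREAM, "BamHI")
downstream_site = find_site(DOWNSTREAM, "Asp718")
start_codon = UPSTREAM.find("ATG")  # first ATG in the upstream primer

print(f"BamHI site at index {upstream_site}, ATG at index {start_codon}")
print(f"Asp718 site at index {downstream_site}")
```

Both sites sit directly behind a three-nucleotide 5′ clamp, and the ATG follows the BamHI site, which is consistent with directional cloning of the full coding sequence into the BamHI/Asp718-digested vector.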
Recombinant baculoviruses encoding cystatin F were generated by co-transfection of Sf9 cells with 5 µg of transfer vector containing the HCUDE60 cDNA and 1 µg of BaculoGold viral DNA (Pharmingen) using Lipofectin (Life Technologies, Inc.). Details on screening and cloning of the recombinant baculoviruses have been reported earlier (13).
Recombinant cystatin F was purified from 10 liters of Sf9 cell supernatants at 92 h post-infection. The insect cells were grown in EX-CEL401 medium (JRH Biosciences) with 1% (v/v) fetal bovine serum. The harvested supernatants were pooled and clarified by centrifugation at 18,000 × g in a continuous flow centrifuge. The supernatant was loaded onto a strong cation-exchange column (Poros HS50, PerSeptive Biosystems) at pH 6.2. The column had previously been equilibrated in 20 mM sodium acetate buffer, pH 5.5, containing 100 mM NaCl. After loading the supernatant onto the HS50 column, the column was washed with the equilibration buffer, followed by successive elutions with the same buffer containing 250, 400, 600, and 1000 mM NaCl. The recombinant cystatin F eluted in the 400 mM NaCl fraction. This fraction was then loaded onto a size-exclusion chromatography column (Superdex S-75, Amersham Pharmacia Biotech), which had previously been equilibrated in 20 mM sodium acetate buffer, pH 5.5, containing 100 mM NaCl. The fractions containing cystatin F were identified by SDS-PAGE analysis and pooled for the next step. The pooled fractions were dialyzed against 40 mM Tris-HCl buffer, pH 8.0, containing 100 mM NaCl. This pool was then loaded onto a pre-equilibrated strong anion-exchange chromatography column (Poros HQ50, PerSeptive Biosystems), and the flow-through fraction was collected. The flow-through pool from the HQ50 column was finally re-equilibrated in 20 mM sodium acetate buffer, pH 6.0, containing 100 mM NaCl and then loaded onto a weak cation-exchange column (Poros CM20, PerSeptive Biosystems). The recombinant cystatin F was eluted using a linear salt gradient (from 100 to 1000 mM NaCl in 20 mM sodium acetate buffer, pH 6.0). The eluted fractions containing purified cystatin F (at ~400 mM NaCl) were identified by SDS-PAGE and verified by N-terminal amino acid sequencing. The final protein concentration (3.6 mg/ml) was estimated using the Bradford assay (22).
Endotoxin levels were assayed using the Amebocyte lysate test (BioWhittaker, Inc.) and were <0.05 enzyme units/mg of protein. The overall yield of recombinant cystatin F from the 10 liters of insect cell culture supernatants was ~42 mg.
Antiserum Production and Construction of an ELISA for Cystatin F Quantification-An antiserum against cystatin F was raised by injecting 0.2 mg of isolated recombinant antigen (see above) in Freund's complete adjuvant (Difco) subcutaneously into a rabbit. The injection was repeated after 3 weeks, and the rabbit was bled every third week. The specificity of the antiserum was tested by immunoelectrophoresis of the recombinant cystatin F used as antigen; of concentrated proteinuria urine containing cystatins A, B, C, S, SN, and E and kininogen (23); and of recombinant cystatins C, D, and E (13,24,25). The IgG fraction of 50 ml of antiserum was isolated by absorption to protein A-Sepharose (Amersham Pharmacia Biotech) and subsequent elution with 0.1 M glycine buffer, pH 2.2. The eluate was immediately neutralized by addition of 2 M Tris buffer, pH 7.4.
The IgG fraction of the antiserum was used for construction of a double-sandwich ELISA, using detailed methods and materials described earlier for a mouse cystatin C ELISA (26). Briefly, wells in polystyrene microtiter plates (MaxiSorp, Nunc, Copenhagen, Denmark) were coated with the IgG fraction of the antiserum against cystatin F as catching antibody. Sample or, as calibrator, isolated recombinant cystatin F (in dilutions from 0.2 to 100 ng/ml) was added to the wells. A portion of the same IgG fraction of the antiserum that had been biotinylated (27) was used as detecting antibody. Bound cystatin F was quantified using horseradish peroxidase-conjugated streptavidin (Amersham Pharmacia Biotech) and the color reaction with 2,2′-azinobis(3-ethylbenzthiazoline-6-sulfonic acid) (Sigma) to detect peroxidase activity. The absorbance was read at 405 nm in a Titertek Multiskan spectrophotometer.
For quantitation of human cystatin C, a similar double-sandwich ELISA, but with a monoclonal antibody as detecting reagent, was used (28). As an assay control for both the cystatin F and cystatin C ELISAs, pooled normal human serum (DAKO A/S, Copenhagen, Denmark) was included in the sample series at each measurement.
Purification of Cystatin F from Human Blood and Cells-As starting material for cystatin F purification attempts, blood was drawn from healthy volunteers in tubes containing EDTA to prevent coagulation. The blood plasma was separated from cells, and an inhibitor preservation mixture resulting in final concentrations of 5 mM benzamidinium chloride, 10 mM EDTA, and 15 mM sodium azide was immediately added to prevent breakdown of sample proteins.  (33), and THP-1 (ATCC TIB-202) were cultured in RPMI 1640 medium supplemented with 4 mM L-glutamine, 1% (v/v) 100× nonessential amino acids, and 10% (v/v) fetal calf serum (all from Life Technologies, Inc.) at 37°C in 5% CO2 atmosphere. The initial seeding of cells was routinely 0.5 × 10^6 cells/ml, and samples of conditioned media were collected after 24-72 h of culturing, whereupon the separated cells were counted. For purification attempts, Jurkat, MOLT-4, and U-937 cells were cultured in RPMI 1640 medium with supplements as described above, but without serum. Following 72 h of incubation under these conditions, counting of cells in the presence of trypan blue showed ~20% dead cells. A total of 1 liter of media from such cultures was pooled, recentrifuged at 10,000 × g for 10 min to completely remove cell debris, and stored at −80°C after addition of the inhibitor preservation mixture described above.
For fractionation of blood plasma, solid ammonium sulfate was added to a series of tubes containing 2-ml portions of plasma to achieve 25, 30, 40, 45, 50, 55, 60, 70, and 75% (w/v) saturation. Following overnight incubation at 4°C, the tubes were centrifuged (10,000 × g, 30 min), the supernatants were collected, and the precipitates were dissolved in water. To concentrate such samples and samples of spent culture medium, ultrafiltration using Centricon-3 concentrators (Amicon, Inc., Beverly, MA) was carried out. Immunoblotting of samples was done after electrophoretic separation on 16.5% SDS-polyacrylamide gels using the buffer system of Schägger and von Jagow (34) or after agarose gel electrophoresis at pH 8.6 (35), followed by electric transfer to polyvinylidene difluoride membranes (Millipore Corp.). The membranes were incubated with the IgG fraction of the polyclonal antiserum against recombinant cystatin F (1-10 µg/ml), and bound antibodies were detected with horseradish peroxidase-conjugated swine anti-rabbit IgG (DAKO A/S, Copenhagen, Denmark) and a coupled chemiluminescence reaction (ECL Plus, Amersham Pharmacia Biotech). Ion-exchange chromatography was performed on a MonoQ 10/10 FPLC column (Amersham Pharmacia Biotech) equilibrated in 50 mM ethanolamine buffer, pH 9.0, and eluted with a linear gradient of 0-500 mM NaCl in the same buffer. Gel filtration was performed on a Superdex 75 FPLC column (Amersham Pharmacia Biotech) equilibrated in 50 mM ammonium bicarbonate buffer, pH 8.0, at a flow rate of 0.5 ml/min.
Protein Analyses-Glycosylation analysis of cystatin F was performed by determining the monosaccharide content in a purified preparation of the recombinant protein. Approximately 10 µg of the protein was hydrolyzed with 0.2 M trifluoroacetic acid at 100°C for 4 h. The hydrolysate was dried in a SpeedVac and reconstituted in 50 µl of deionized water, after which monosaccharides were analyzed on a Dionex carbohydrate analyzer (36). A Dionex PA-1 column was used to separate the monosaccharides by isocratic elution with 12 mM NaOH. The monosaccharides were detected by integrated amperometry. Glycosylation analysis was also done by incubation with peptide N-glycosidase F (EC 3.2.2.18) under conditions recommended by the enzyme supplier (Oxford GlycoSystems, Abingdon, United Kingdom), followed by 16.5% SDS-PAGE and silver staining. Automated N-terminal sequencing using an ABI492 sequencer (Applied Biosystems, Inc.) was carried out after electrophoresis on 4-20% SDS-polyacrylamide gels (Novex) using the buffer system by Laemmli (37), blotting onto a ProBlott membrane (Applied Biosystems, Inc.), staining with Ponceau S (0.2% in 4% acetic acid), and excision of the band of interest. Amino acid composition analysis was carried out using a Beckman High Performance Analyser System 6300 after in vacuo hydrolysis of samples for 24, 48, and 72 h in 6 M HCl. Size determination by gel chromatography was performed on the Superdex 75 column run as described above, with the elution volumes for aprotinin (M_r = 6500), cytochrome c (M_r = 12,400), chymotrypsinogen (M_r = 23,400), and β-lactoglobulin (M_r = 35,000) used for construction of a calibration curve. To analyze the content of free thiol groups in recombinant cystatin F, the Ellman reaction with 5,5′-dithiobis(2-nitrobenzoic acid) (Sigma) was performed under conditions described earlier (38). One sample of isolated recombinant cystatin F was analyzed directly after purification (see above).
A second sample of cystatin F and a sample of papain, used as a positive reaction control, were both analyzed after mild reduction to reduce oxidized sulfhydryl groups. Papain (17.5 mg; Sigma type III) and cystatin F (0.36 mg) were incubated in 100 mM sodium phosphate buffer, pH 6.5, containing 1 mM dithiothreitol for 15 min. Removal of excess dithiothreitol was accomplished by chromatography on a HighTrap desalting column (Amersham Pharmacia Biotech) in 40 mM sodium phosphate buffer, pH 6.5. Portions of the desalted samples were directly analyzed for free thiol groups by the Ellman reaction. The total concentrations of papain and cystatin F in the desalted samples were determined by A280 measurement and the Bradford assay (22), respectively. The proportion of active enzyme in the papain sample was measured by E-64 titration (see below).
Enzyme Assays-The methods used for active-site titration of papain (using benzoyl-DL-Arg p-nitroanilide (Bachem Feinchemikalien, Bubendorf, Switzerland) as substrate), titration of the molar enzyme inhibitory concentration in cystatin F preparations, and determination of equilibrium constants for dissociation (K_i) of complexes between cystatin F and cysteine proteinases have been reviewed (1). The enzymes used were papain (EC 3.4.22.2; from Sigma), activatable to 70-75% after affinity purification on Sepharose-coupled Gly-Gly-Tyr-Arg as detailed previously (39,40), human cathepsin B (EC 3.4.22.1; from Calbiochem, La Jolla, CA), and human cathepsin L (EC 3.4.22.15; Calbiochem). The fluorogenic substrate used for K_i determinations was benzyloxycarbonyl-Phe-Arg-7-amido-4-methylcoumarin (10 µM; Bachem Feinchemikalien), and the assay buffer was 100 mM sodium phosphate buffer (adjusted to pH 6.5 for papain and to pH 6.0 for cathepsin B and cathepsin L) containing 1 mM dithiothreitol and 2 mM EDTA.
Steady-state velocities were measured before and after addition of cystatin F in assays at 37°C, and K_i values were calculated according to Henderson (41). Corrections for substrate competition were made using K_m values determined for the substrate batch used under the assay conditions employed (60, 55, and 3.2 µM for papain, cathepsin B, and cathepsin L, respectively).
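The substrate-competition correction can be illustrated with a short sketch. It assumes the usual relation for a substrate that competes with the inhibitor for the enzyme, K_i = K_i(app) / (1 + [S]/K_m), and takes the substrate concentration (10) and the K_m values (60, 55, and 3.2) as micromolar. The function name is illustrative, and no apparent K_i values from the study are used here because the text does not report them.

```python
def correct_for_substrate(ki_apparent: float, substrate_conc: float, km: float) -> float:
    """Convert an apparent K_i into a substrate-independent K_i.

    Assumes the substrate competes with the inhibitor for the enzyme:
    K_i = K_i(app) / (1 + [S]/K_m). [S] and K_m must share the same unit.
    """
    if km <= 0:
        raise ValueError("K_m must be positive")
    return ki_apparent / (1.0 + substrate_conc / km)


SUBSTRATE_UM = 10.0  # substrate concentration in the K_i assays (assumed micromolar)
KM_UM = {"papain": 60.0, "cathepsin B": 55.0, "cathepsin L": 3.2}

# Factor by which each apparent K_i is divided.
correction_factor = {enzyme: 1.0 + SUBSTRATE_UM / km for enzyme, km in KM_UM.items()}

for enzyme, factor in correction_factor.items():
    print(f"{enzyme}: apparent K_i is divided by {factor:.2f}")
```

With these numbers the correction is modest for papain and cathepsin B but roughly fourfold for cathepsin L, whose K_m lies below the substrate concentration used.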

RESULTS AND DISCUSSION
Discovery of a Novel Human Cystatin-On analysis of EST sequences obtained from human cDNA libraries, several clones encoding polypeptides with low but significant homology (30-35%) to the cystatin C sequence (42) were identified. Some of these encoded cystatin E/M (13), but a group of others encoded another cystatin-related protein. A full-length cDNA clone in the latter group (designated HCUDE60) was identified in a library from CD34-depleted cord blood cells. The 727-base pair cDNA included an open reading frame encoding a 145-residue preprotein (Fig. 1), of which the first 19 theoretically (43) should constitute the signal peptide. The cDNA contained a typical consensus sequence for initiation of translation (44) around the start ATG codon of the open reading frame and a normal polyadenylation signal (ATTAAA) at a distance of 18 nucleotides from a poly(A) stretch in the 3′-end. An alignment of the deduced 126-residue mature protein sequence with the six known secreted single-domain human cystatins demonstrated that the novel protein was most similar to cystatin C of the Family 2 cystatins (34% identical residues) and a bit more distantly related to cystatins D, S, SN, and SA (30-32% identity) (Fig. 2). The resemblance to the recently reported atypical secreted cystatin, cystatin E/M, was similarly low, with only 29% of alignable residues identical. Thus, the novel protein should most likely be seen as a first representative of a third protein family within the category of secreted single-domain human cystatins (12). The homologies to the human Family 1 cystatins, cystatins A and B, were even lower (22 and 20% identities, respectively), as was the homology to the inhibitor-active domain 2 of human kininogen (24% identity). As also seen for cystatin E/M, the novel cystatin showed a closer relationship with domain 3 of kininogen, almost as high as with the Family 2 cystatins (32% identical residues).
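The identity percentages quoted above are computed over alignable residues. A minimal sketch of that calculation follows, assuming two sequences that have already been aligned with `-` as the gap character. The toy strings are invented for illustration and are not cystatin sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identical residues over positions where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns in which both sequences carry a residue.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


# Hypothetical aligned fragments (not taken from Fig. 2).
toy_a = "QVVAG-KW"
toy_b = "QLVSGNRW"
print(f"{percent_identity(toy_a, toy_b):.1f}% identity over alignable positions")
```

Excluding gapped columns from the denominator matters for distantly related proteins such as these, where the choice of denominator can shift the reported identity by several percentage points.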
The sequence contained a central Gln-Xaa-Val-Xaa-Gly motif, a typical Gly residue in the N-terminal segment 44 positions earlier, and also a Pro-Trp pair toward the C-terminal end of the translation product, like the sequences of the human Family 2 or 3 cystatins and cystatin E/M. The sequence also contained 4 Cys residues toward the C-terminal end, corresponding to those forming disulfide bridges in cystatin C and chicken cystatin, thereby stabilizing the cystatin structure (47,49). Since the novel protein was significantly similar in overall sequence to known cystatins, and especially in parts essential for structure and function, it was designated cystatin F.
FIG. 2. Relationship between human cysteine proteinase inhibitors.
A, amino acid sequence alignment of single-domain cystatins (Cys). The numbering refers to the cystatin C sequence as deduced from its cDNA, starting from the first residue of the mature protein (42,45). For the other cystatins, the naturally occurring forms with the longest N-terminal segments are shown (13,25,46). Dashes indicate gaps introduced to optimize the alignment. Vertical lines indicate residues identical in cystatin F and the other secreted single-domain cystatins. Boxes indicate residues that are involved in the cysteine proteinase inhibitory activity of the Family 2 cystatins and cystatin E/M. The 4 Cys residues shown to form two disulfide bridges in cystatin C (47) are marked with solid brackets. The 2 additional Cys residues in the cystatin F sequence likely forming a third disulfide bridge are shown with a dashed bracket. The N-glycosylation sites in the cystatin F and cystatin E/M sequences are marked with asterisks. The secondary structures forming segments in cystatin C (48) and chicken cystatin (49) are indicated above the sequences (α, α-helix; β1-5, strands of the β-sheet). B, schematic illustration of evolutionary relationships. The evolutionary relationships between all known inhibitory active human cystatin domains are indicated. The phylogenetic tree was constructed using GROWTREE included in the GCG software package (Version 8.1; Genetics Computer Group, Inc., Madison, WI). The phylogenetic distances were obtained according to the method described by Kimura (50). The reconstruction of the tree was done by the unweighted pair group method using arithmetic averages.

The cystatin F sequence contained some remarkable characteristics. The presence of a second Trp residue, in addition to the conserved Trp-106 (numbering refers to the human cystatin C sequence; see Fig. 2A) of the second hairpin loop of the enzyme-binding surface, is unusual. This Trp-102 residue is located in a position where Tyr is relatively conserved in other cystatins (Fig. 2A). Tyr-102 of cystatin C has been shown to be involved in a hydrophobic patch likely important for the integrity of the enzyme-binding surface of the inhibitor (48). The substitution of Trp for Tyr-102 in cystatin F would thus likely influence the enzyme-binding property of the protein. Moreover, the presence of a charged residue in the Gln-Xaa-Val-Xaa-Gly motif of the first hairpin loop of the enzyme-binding surface (Lys-58) is highly unusual. It can be predicted that the generally hydrophobic nature of the enzyme-binding surface and hence the specificity of target enzyme binding would be affected by this feature. The proposed signal peptidase-processing site in pre-cystatin F would generate a mature protein with an extended N-terminal segment, being 6-10 residues longer than those of other single-domain cystatins. The presence of a Cys residue in this segment, and another not seen in other cystatins in position 37 (Fig. 2A), indicated the presence of a third disulfide bridge in addition to the two present in all known secretory cystatins of higher animals.
Assuming a structure of cystatin F similar to those of human cystatin C and chicken cystatin (48,49), Cys-37 would be located in a loop just after an α-helix-forming segment, on the side of the molecule opposite from the enzyme-binding site. According to molecular modeling, this residue could be in close contact with Cys-1, provided that the extended N-terminal segment of cystatin F is stretched from the proposed anchored Gly-11 residue included in the enzyme-binding site, along the α-helix on the surface of the molecule. The extra disulfide could thus function to stabilize the N-terminal segment, presumably in a conformation that would affect the specificity of target enzyme inhibition for cystatin F. A motif search, in addition, showed two target Asn-Xaa-(Ser/Thr) sequences for glycosylation at positions 36-38 and 88-90. The first site would be located in the same surface loop as the likely disulfide-bonded Cys-37 residue. The second site lies in a segment that, assuming three-dimensional structure homology to chicken cystatin and human cystatin C, forms a surface loop close to the fourth strand of the β-sheet. As both of these loops are on the opposite side from the enzyme-binding region, attached carbohydrate chains would likely not affect the function of cystatin F.
Characterization of Recombinant Cystatin F-To allow studies of the properties of cystatin F, recombinant production of the protein was attempted. The cystatin F cDNA was subcloned in a baculovirus expression vector as described under "Experimental Procedures." Recombinant cystatin F was secreted into the cell media of Sf9 cell cultures at a level of ~10 mg/liter of culture medium. Recombinant cystatin F was purified from 10 liters of such cultures (see "Experimental Procedures"), yielding a >95% pure protein preparation according to charge-separating agarose gel electrophoresis (data not shown). N-terminal sequence analysis gave the single sequence (<5% background) Gly-Pro-Ser-Pro-Asp-, thus confirming the purity and demonstrating that, in the insect cells, the theoretical 19-residue signal peptide indeed is proteolytically removed during secretion (cf. Fig. 1). Additionally, amino acid composition analysis of the Sf9 cell-produced cystatin F agreed within experimental errors with a mature cystatin F sequence as deduced from the cDNA from residue 20 of the preprotein onwards. Calculated from the sequence, the M_r of the mature cystatin F polypeptide would be 14,543.
Although the isolated recombinant cystatin F displayed a single N-terminal sequence, it showed a two-band pattern when analyzed by SDS-PAGE (Fig. 3, lane 1). Given the presence of two theoretical glycosylation sites, which moreover likely are present on the surface of the molecule, the recombinant cystatin F was analyzed for carbohydrate content. Monosaccharide composition analysis showed that the recombinant protein contained 3.0 and 4.9 mol of N-acetylglucosamine and mannose, respectively, per mol of protein, a composition compatible with the presence of only N-linked carbohydrate chains. Thus, at least one of the potential sites is glycosylated. To investigate whether one or two of the sites were occupied, cystatin F was incubated with the N-linked oligosaccharide-cleaving enzyme peptide N-glycosidase F. The results from a time course experiment (Fig. 3) clearly demonstrated that the two SDS-PAGE bands for recombinant cystatin F correspond to one form with two N-linked carbohydrate chains and one form with a single carbohydrate chain, being synthesized in approximately equal quantities by the Sf9 cells. Prolonged incubation with peptide N-glycosidase F resulted in complete removal of the carbohydrate, yielding an SDS-PAGE band for reduced cystatin F corresponding to an M_r of 15,500, in good agreement with the M_r of 14,543 calculated from the sequence. The M_r values for monoglycosylated and diglycosylated species calculated from their SDS-PAGE bands were 17,300 and 19,200, respectively, i.e. each carbohydrate chain equaled ~2,000 Da in size. The size of recombinant cystatin F was also estimated by gel chromatography on a calibrated Superdex 75 column, giving an M_r of 26,800 for the native inhibitor (Fig. 4).
This is significantly higher than for the carbohydrate-free cystatins and for the monoglycosylated cystatin E/M and likely reflects a drastic increase in hydrodynamic volume due to the carbohydrate chains, although it cannot be ruled out that cystatin F exists as a dimer under native conditions.
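The size estimates reported in this section can be compared directly. The sketch below only restates the arithmetic on the M_r values given in the text (SDS-PAGE bands, sequence-derived mass, gel chromatography); the variable names are illustrative.

```python
# Apparent M_r from SDS-PAGE, keyed by number of N-linked chains attached.
MR_SDS_PAGE = {0: 15_500, 1: 17_300, 2: 19_200}
MR_FROM_SEQUENCE = 14_543   # calculated for the mature polypeptide
MR_GEL_FILTRATION = 26_800  # native inhibitor on Superdex 75

# Apparent mass added by each successive carbohydrate chain.
increments = [MR_SDS_PAGE[n] - MR_SDS_PAGE[n - 1] for n in (1, 2)]
mean_increment = sum(increments) / len(increments)

# How far the native gel-filtration estimate exceeds the diglycosylated SDS-PAGE form.
native_excess = MR_GEL_FILTRATION / MR_SDS_PAGE[2]

print(f"per-chain increments: {increments}, mean {mean_increment:.0f}")
print(f"deglycosylated band vs sequence: {MR_SDS_PAGE[0] - MR_FROM_SEQUENCE} higher")
print(f"gel filtration / diglycosylated SDS-PAGE: {native_excess:.2f}")
```

The per-chain increments of 1,800 and 1,900 are what the text rounds to about 2,000 Da per chain, while the gel-filtration value is about 1.4 times that of the largest SDS-PAGE species, which is why neither an enlarged hydrodynamic volume nor dimerization can be excluded from these data alone.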
To investigate the possibility of a third disulfide bridge in cystatin F as discussed above, the isolated recombinant cystatin was analyzed by reaction with 5,5′-dithiobis(2-nitrobenzoic acid) after mild reduction. The cystatin F sample contained <0.09 mol of free sulfhydryl/mol of protein, i.e. <4.5% of what would have been expected if Cys-1 and Cys-37 both had free thiol groups. A control reaction with activated crude papain (222 µM according to absorbance and 60 µM (27%) activable enzyme according to E-64 titration) gave 51 µM thiol groups, i.e. close to the expected 1 mol of free thiol/mol of active papain (38). Thus, the results of the experiment strongly indicate that all 6 Cys residues of cystatin F are disulfide-paired and that a third disulfide bond, unique for human cystatins and stabilizing the N-terminal segment, is likely present in cystatin F.
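The stoichiometry behind the Ellman result can be restated as a short calculation. The sketch uses only the figures given in the paragraph above and reads the papain concentrations as micromolar; the variable names are illustrative.

```python
# Cystatin F: upper bound on free thiol found, and expectation if Cys-1 and Cys-37 were both free.
FREE_SH_UPPER_BOUND = 0.09    # mol SH per mol protein
EXPECTED_IF_UNPAIRED = 2.0    # mol SH per mol protein

# Papain positive control (concentrations assumed to be micromolar).
PAPAIN_TOTAL_UM = 222.0
PAPAIN_ACTIVE_UM = 60.0
PAPAIN_THIOL_UM = 51.0

fraction_of_expected = FREE_SH_UPPER_BOUND / EXPECTED_IF_UNPAIRED
active_papain_fraction = PAPAIN_ACTIVE_UM / PAPAIN_TOTAL_UM
thiol_per_active_papain = PAPAIN_THIOL_UM / PAPAIN_ACTIVE_UM

print(f"cystatin F free thiol: <{fraction_of_expected:.1%} of the unpaired expectation")
print(f"active papain: {active_papain_fraction:.0%} of total")
print(f"thiol per active papain: {thiol_per_active_papain:.2f} mol/mol")
```

The control recovers 0.85 mol of thiol per mol of active papain, so the assay would have detected two free cysteines in cystatin F had they been present.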
FIG. 3. Cystatin F is a glycoprotein. A time course experiment for deglycosylation of recombinant cystatin F by incubation with peptide N-glycosidase F is shown. Samples taken from the incubation mixture were analyzed by SDS-PAGE using a 16.5% gel. The gel was silver-stained. Lane 1, isolated recombinant cystatin F, with no peptide N-glycosidase F added; lanes 2-6, cystatin F incubated with peptide N-glycosidase F at 37°C for 0.5 min, 30 min, 1 h, 3 h, and 6 h, respectively. The positions of relevant molecular weight marker bands are indicated to the left.

Cysteine Proteinase Inhibitory Activity of Cystatin F-To investigate a potential proteinase inhibitory function of cystatin F, the recombinant protein was incubated with papain, and residual activity against a peptidyl substrate was measured. Cystatin F showed a dose- and time-dependent inhibition of the papain activity (data not shown). At micromolar concentrations, the recombinant cystatin F completely inhibited papain hydrolysis of benzoyl-Arg p-nitroanilide in 10-min assays, like other human cystatins (1). Titration curves drawn from the results of assays with varying inhibitor concentrations were linear and thus compatible with a reversible inhibition, with K_i < 10 nM. The active concentration of the preparation studied was 3.8 µM, whereas that determined by quantitative amino acid analysis was 12 µM. Because the affinity-purified papain used for the titration experiments was activable to 70% (according to a parallel titration using a reference solution of E-64; see "Experimental Procedures") and cystatins also bind the fraction of papain that is inactive against peptidyl substrates (1), these results demonstrated that the recombinant protein was 45% active.
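The 45% figure from the papain titration follows from a two-step correction, sketched below with the values given in the text (read as micromolar). The function name is illustrative, and the reasoning assumes, as the text states, that cystatin F also binds the papain fraction that is inactive against peptidyl substrates, so the titrated value undercounts the inhibitor by the activable fraction of the enzyme.

```python
def active_inhibitor_fraction(titrated_um: float, enzyme_activable: float, total_um: float) -> float:
    """Fraction of inhibitor molecules that bind enzyme.

    titrated_um: inhibitor concentration from titration against active papain
    enzyme_activable: fraction of the papain preparation that is active (0-1)
    total_um: total inhibitor concentration from amino acid analysis
    """
    if not 0 < enzyme_activable <= 1:
        raise ValueError("enzyme_activable must be in (0, 1]")
    # Inactive papain also binds inhibitor, so scale the titrated value up.
    binding_competent_um = titrated_um / enzyme_activable
    return binding_competent_um / total_um


fraction = active_inhibitor_fraction(titrated_um=3.8, enzyme_activable=0.70, total_um=12.0)
print(f"active cystatin F: {fraction:.0%}")
```

Without the correction for inactive papain the preparation would appear to be only about 32% active (3.8/12).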
Equilibrium constants for cystatin F complexes with papain, human cathepsin B, and human cathepsin L were determined by steady-state measurements in fluorogenic enzyme assays to allow dilution of the enzymes to nanomolar concentrations; under such assay conditions, the cystatin dissociated significantly from formed enzyme complexes. The results (Table I) show that cystatin F is a relatively tight-binding inhibitor of papain, with a K_i value for the complex similar to that for the cystatin D complex, but 5 orders of magnitude higher than that for the cystatin C complex. The cystatin F affinity for cathepsin L was somewhat higher (K_i = 0.31 nM) than for papain and moreover significantly higher than the affinity of cystatin D for cathepsin L. Like cystatin D, however, cystatin F did not appreciably inhibit cathepsin B. This contrasts with the inhibitory specificities of cystatins C and E/M, which both have the capacity to inhibit cathepsin B (Table I). A structural element that has been shown to be at least partly responsible for the target enzyme specificity of cystatin C and of crucial importance for efficient cystatin C binding of cathepsin B is the N-terminal segment with Arg-Leu-Val-Gly at positions 8-11 (40,53). Leu-9 and Val-10, interacting with target enzyme substrate subpockets S3 and S2, respectively, are especially important for a fast interaction between the cystatin and cathepsin B (54,55). The corresponding N-terminal segment in cystatin F is Val-Lys-Pro-Gly. The presence of Pro-10 in the proposed P2 position would likely not be favorable for binding of the N-terminal segment to cathepsin B, nor would a charged Lys-9 residue in the P3 position be beneficial, thus likely explaining the lack of cathepsin B inhibitory activity seen for cystatin F.
Identification and Distribution of Cystatin F in Human Tissue-To identify cystatin F in human samples, an ELISA for quantification of the cystatin was constructed using a polyclonal antiserum against recombinant cystatin F and cystatin F quantified by amino acid analysis for construction of the standard curve (see "Experimental Procedures"). The ELISA showed a sensitivity of 0.5 ng/ml (defined as 2 S.D. values of blank readings), and the linear part of the standard curve allowed us to measure concentrations up to 30 ng/ml. The assay should thus be sufficiently sensitive to measure physiologically relevant concentrations of cystatin F in human samples. ELISA measurements demonstrated that blood (plasma and serum) contained cystatin F in low amounts. The cystatin F concentration in pooled normal serum was 0.89 ng/ml (mean value from three measurements). The concentration of cystatin F in urine from patients with tubular proteinuria, which has earlier been shown to contain significant amounts of the single-domain cystatins A, B, C, E, S, and SN and has proven a good starting material for purification of human cystatins (13,23), was below the detection limit, however. Measurement of bovine serum samples demonstrated that the ELISA cross-reacted with neither a putative bovine cystatin F-like inhibitor nor other bovine cystatins. Immunoblotting using the antiserum against cystatin F after separation of blood plasma proteins according to size (SDS-PAGE) or charge (agarose gel electrophoresis) gave a negative result. This was not unexpected, given the low blood concentration of the protein according to ELISA, which should mean that <<1 ng of cystatin F would be present in the 30 μl that was the maximum volume that could be electrophoresed per lane.
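The estimate of the amount of cystatin F loaded per electrophoresis lane follows directly from the ELISA value. A minimal sketch of the calculation is given below, assuming that the plasma concentration equals the pooled serum value of 0.89 ng/ml and that the full 30 μl is loaded; the variable names are illustrative.

```python
SERUM_CONC_NG_PER_ML = 0.89  # pooled normal serum, from the ELISA
LANE_VOLUME_UL = 30.0        # maximum volume loaded per lane

# Convert microliters to milliliters before multiplying.
mass_per_lane_ng = SERUM_CONC_NG_PER_ML * LANE_VOLUME_UL / 1000.0
print(f"Cystatin F per lane: {mass_per_lane_ng:.3f} ng")
```

The result is on the order of 0.03 ng per lane, well below 1 ng, consistent with the negative immunoblotting result.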
Fractionation of blood plasma by ammonium sulfate precipitation followed by concentration of the fractions and immunoblotting after SDS-PAGE separation indicated that cystatin F precipitated already in 25% saturated ammonium sulfate solution. The dissolved precipitate obtained in 25% saturated ammonium sulfate solution also contained high amounts of immunoglobulins. In attempts to purify the cystatin F further by ion-exchange chromatography or gel filtration, the immunoglobulin light chains co-eluted with cystatin F, as judged by immunoblotting of the fractions (data not shown).
Because the cystatin F concentration of <1 μg in a background of some 70 g of other proteins/liter made blood plasma a far from ideal starting material for purification of natural cystatin F, we instead investigated human blood cell lines in culture for cystatin F secretion. Quantification of cystatin F in spent culture media from 10 cell lines (Table II) revealed that B cell lines generally did not produce appreciable amounts of cystatin F, but some T cell lines and one out of two monocyte-related cell lines secreted the inhibitor in significant amounts. The highest cystatin F production was seen for U-937 cells, which secreted 2.0 ng/10^6 cells/72 h, an amount equaling 10% of the cystatin C secretion from the same cells. Serum-free culturing of MOLT-4, Jurkat, and U-937 cells was attempted. The ELISA showed that conditioned medium from such cultures also contained cystatin F (Table II). The cystatin F immunoreactivity co-eluted with recombinant cystatin F in ion-exchange chromatography on MonoQ and in gel filtration as demonstrated by ELISA measurements of fractions from such chromatographies (Fig. 4), but the amounts present in the fractions after attempts to start purification from 1-liter scale serum-free cultures of U-937 cells did not allow purification of the natural cystatin F to homogeneity and were not sufficient for N-terminal sequencing from electrophoresis blots. What the results clearly demonstrated, however, was that the cystatin F measured by ELISA was not a result of cross-reactions with other cystatins, as they elute in significantly lower Mr fractions on gel filtration (Fig. 4). The apparent native size of the natural cystatin F according to the gel filtration experiment exactly equaled that of recombinant cystatin F (Mr = 26,800).
The combined results from the purification attempts and protein analyses of human tissue and cell samples thus demonstrate that a protein with the immunoreactivity, charge, and size of recombinant cystatin F is present in human blood and produced by cell lines derived from the immune system. To assess the overall tissue distribution of cystatin F, Northern blot experiments with the cystatin F cDNA as probe were performed. The cystatin F cDNA showed no cross-hybridization with cDNAs for cystatins C, D, and E/M in control experiments under identical washing conditions. The Northern blot results (Fig. 5) indicate a distribution of cystatin F mRNA in whole tissues that is quite different from the distribution of cystatin C or cystatin E/M mRNA (13,56). The strongest mRNA signals were seen for spleen and peripheral blood leukocytes; moderate signals were observed for thymus and small intestine; and cystatin F gene expression appeared relatively low in all other tissues examined. The Northern blot results were thus compatible with up-regulated cystatin F expression in lymphoid tissue. The expression pattern was additionally assessed by analysis of transcripts in cDNA library data bases. In the CD34-depleted cord blood cell cDNA library from which the first clone was obtained, one more cystatin F clone was identified. In the total of >650 libraries investigated, EST sequencing identified 52 additional cystatin F clones in 19 of the libraries. All other libraries were negative.
The positive libraries were made from primary dendritic cells (27 clones), anergic T cells (four clones), bone marrow (two libraries; three clones and one clone, respectively), activated T cells (two libraries; one clone in each), multiple sclerosis lesions (white matter of the central nervous system) (two libraries; one clone in each), and osteoarthritic OA-4 cells (four clones), with one clone each in libraries from unstimulated and phorbol 12-myristate 13-acetate- and retinoic acid-stimulated HL-60 cells and from breast lymph node, apoptotic T cells, osteosarcoma cells, pancreatic islet tumor cells, thymus, Jurkat cells, helper T cells (Th2), and peripheral blood mononuclear cells stimulated with poly(I-C) (interferon inducer). To put the distribution pattern of cystatin F clones in the cDNA data bases in perspective, the fact that 51 of the 54 cystatin F clones were found in libraries directly related to the immune system could be compared with the distribution pattern of cystatin C clones in the same libraries. A total number of 539 cystatin C clones were identified in 163 of the >650 libraries, without any obvious bias toward a certain tissue or cell type. The combined results from the analyses of cystatin F mRNA by Northern blotting and EST data base screening thus suggest that cystatin F is a cysteine proteinase inhibitor primarily produced by cells of the immune system, possibly with a function crucial for specialized cells of the myeloid or T cell lineages. In this context, it is noteworthy that a novel papain-like cysteine proteinase, cathepsin W, with specific expression in cytotoxic T lymphocytes has recently been reported (57).
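The comparison between the cystatin F and cystatin C clone distributions can be summarized with a few ratios. The sketch below uses only the counts quoted in the text; because the number of libraries is given as ">650", the value 650 is used as a lower bound, so the library share computed here is an upper estimate. The variable names are illustrative.

```python
TOTAL_LIBRARIES_LOWER_BOUND = 650  # text states ">650" libraries

CYSTATIN_F_CLONES = 54          # all cystatin F clones identified
CYSTATIN_F_IMMUNE_CLONES = 51   # clones from immune-related libraries
CYSTATIN_C_CLONES = 539
CYSTATIN_C_LIBRARIES = 163

# Share of cystatin F clones found in immune-related libraries.
immune_share = CYSTATIN_F_IMMUNE_CLONES / CYSTATIN_F_CLONES

# Upper estimate of the share of libraries containing cystatin C clones.
cystatin_c_library_share = CYSTATIN_C_LIBRARIES / TOTAL_LIBRARIES_LOWER_BOUND

# How many cystatin C clones were found per cystatin F clone overall.
clone_ratio = CYSTATIN_C_CLONES / CYSTATIN_F_CLONES

print(f"Cystatin F clones from immune-related libraries: {immune_share:.0%}")
print(f"Libraries with cystatin C clones (at most): {cystatin_c_library_share:.0%}")
print(f"Cystatin C clones per cystatin F clone: {clone_ratio:.1f}")
```

About 94% of the cystatin F clones derive from immune-related libraries, whereas cystatin C clones are roughly ten times more numerous and spread over about a quarter of all libraries.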

TABLE II Distribution of cystatin F in cultured cells
The human blood cell-derived cell lines were cultured as described under "Experimental Procedures." Conditioned culture media were collected and measured for cystatin F content by ELISA using a polyclonal rabbit antiserum against isolated recombinant human cystatin F. Cystatin C content in the samples was also measured by a specific ELISA (28).