Molecular Cloning and Structural and Functional Characterization of Human Cathepsin F, a New Cysteine Proteinase of the Papain Family with a Long Propeptide Domain*

A cDNA encoding a new cysteine proteinase belonging to the papain family and called cathepsin F has been cloned from a human prostate cDNA library. This cDNA encodes a polypeptide of 484 amino acids, with the same domain organization as other cysteine proteinases, including a hydrophobic signal sequence, a prodomain, and a catalytic region. However, this propeptide domain is unusually long and distinguishes cathepsin F from other proteinases of the papain family. Cathepsin F also shows all structural motifs characteristic of these proteinases, including the essential cysteine residue of the active site. Consistent with these structural features, cathepsin F produced in Escherichia coli as a fusion protein with glutathione S-transferase degrades the synthetic peptide benzyloxycarbonyl-Phe-Arg-7-amido-4-methylcoumarin, a substrate commonly used for functional characterization of cysteine proteinases. Furthermore, this proteolytic activity is blocked by trans-epoxysuccinyl-L-leucylamido-(4-guanidino)butane, an inhibitor of cysteine proteinases. The gene encoding cathepsin F maps to chromosome 11q13, close to that encoding cathepsin W. Cathepsin F is widely expressed in human tissues, suggesting a role in normal protein catabolism. Northern blot analysis also revealed a significant level of expression in some cancer cell lines, opening the possibility that this enzyme could be involved in degradative processes occurring during tumor progression.

The cysteine proteinases are a widespread group of enzymes that catalyze the hydrolysis of many different proteins and play a major role in intracellular protein degradation and turnover (1,2). These proteolytic enzymes can be subdivided into more than 20 different families, including the papain family, calpains, streptopains, clostripains, viral cysteine proteinases, and caspases, the largest one being that of papain (3). In fact, the papain family of cysteine proteinases comprises a large number of enzymes from both prokaryotes and eukaryotes, with representative members expressed in bacteria, fungi, protozoa, plants, and humans (3,4). In recent years, the number of human cysteine proteinases belonging to the papain family has considerably increased, and a total of 10 different family members has been characterized at the amino acid sequence level. These human cysteine proteinases include cathepsin B (5), cathepsin L (6,7), cathepsin H (8,9), cathepsin S (10,11), cathepsin C (11,12), cathepsin O (13), cathepsin K (14,15), cathepsin W (16), cathepsin L2 (17), and cathepsin Z (18). Structural analysis of these enzymes has revealed that all of them contain a series of conserved features including an essential cysteine residue in their active site. In addition, it is well established that all these cysteine proteinases are synthesized as preproenzymes, which are processed to the corresponding proenzymes and targeted to the lysosomes by the mannose 6-phosphate signal attached to them. However, these enzymes differ in tissue distribution and in some enzymatic properties, including substrate specificities and pH stability. Functional analysis of these proteinases has shown that in addition to their intracellular role in protein recycling, they are involved in other normal processes such as antigen presentation (19), bone remodeling (20), and prohormone activation (21).
In addition, it has been suggested that cysteine proteinases are involved in a variety of disease processes such as pulmonary emphysema (22), osteoporosis (23), Alzheimer's disease (24), rheumatoid arthritis (25), and cancer invasion and metastasis (26). Therefore, these enzymes represent primary targets for the development of inhibitors that could block their uncontrolled activity in these pathological conditions.
As part of our work directed to look for proteolytic enzymes that could be of importance in tumor progression, we have recently identified different cysteine proteinases of the papain family overexpressed in human carcinomas from diverse sources. These proteases include cathepsin O, originally cloned from a breast carcinoma (13), cathepsin L2, overexpressed in breast and colon carcinomas (17), and cathepsin Z, ubiquitously distributed in cancer cell lines and primary tumors and characterized by containing an unusually short propeptide in its amino acid sequence (18). We have also identified human bleomycin hydrolase, a cytosolic cysteine proteinase distantly related to other members of the papain family and involved in chemotherapy resistance (27,28). In this work, we describe the molecular cloning and complete nucleotide sequence of a cDNA encoding a new member of the papain family of cysteine proteinases, which has been called cathepsin F and is mainly characterized by possessing a uniquely long propeptide domain in its amino acid sequence. We also report the expression of the gene in Escherichia coli and the functional characterization of the recombinant enzyme. Finally, we determine the chromosomal location of the cathepsin F gene and analyze its expression in human tissues and cancer cell lines.

EXPERIMENTAL PROCEDURES
Materials-A human prostate cDNA library, constructed in λgt11, and different Northern blots containing poly(A)+ RNAs prepared from diverse human tissues and cancer cell lines were from CLONTECH (Palo Alto, CA). A high density gridded human P1 artificial chromosome (PAC) genomic library and a panel of somatic cell hybrids containing a single human chromosome in a rodent background were supplied by the Human Genome Mapping Resource Center (Cambridgeshire, UK). Restriction endonucleases and other reagents used for molecular cloning were purchased from Roche Molecular Biochemicals (Mannheim, Germany). Synthetic peptides Z-Phe-Arg-AMC, Z-Arg-Arg-AMC, and Z-Arg-AMC were from Bachem (Bubendorf, Switzerland), and proteinase inhibitor E-64 was from Sigma. Oligonucleotides were synthesized by the phosphoramidite method in an Applied Biosystems DNA synthesizer (model 392A) and used directly after synthesis. Double-stranded DNA probes were radiolabeled with [α-32P]dCTP (3000 Ci/mmol) from Amersham Pharmacia Biotech (Buckinghamshire, UK) using a commercial random-priming kit from Amersham Pharmacia Biotech (Uppsala, Sweden).
Probe Preparation and cDNA Library Screening-After searching the GenBank™ data base of human expressed sequence tags (ESTs) for sequences with homology to members of the papain family of cysteine proteinases, we identified a sequence derived from an ovarian tumor cDNA clone (H39591) that could be a good candidate to encode a new family member. Further searching of the EST data base for sequences similar to H39591 led us to identify 6 overlapping ESTs, spanning around 680 bp, that were useful for preparing a probe to clone a cDNA encoding this putative novel human cysteine proteinase. To obtain this probe, we performed a PCR amplification of DNA prepared from a prostate cDNA library using two specific primers, 5′-GGACTGTGGACAAGATGGAC and 5′-AGCTGTTCTTGATGGCCCA, whose sequences were derived from the overlapping ESTs. The PCR reaction was carried out in a GeneAmp 2400 PCR system from Perkin-Elmer for 35 cycles of denaturation (94°C, 15 s), annealing (56°C, 15 s), and extension (72°C, 15 s). The PCR-amplified product was cloned and, after confirming its identity by nucleotide sequence analysis, was used to screen a human prostate cDNA library, according to standard procedures (29). Hybridization to the radiolabeled probe was carried out for 18 h in 6× SSC (1× = 150 mM NaCl, 15 mM sodium citrate, pH 7.0), 5× Denhardt's (1× = 0.02% bovine serum albumin, 0.02% polyvinylpyrrolidone, 0.02% Ficoll), 0.1% SDS, and 100 μg/ml denatured herring sperm DNA at 65°C. The membranes were washed twice for 1 h at 65°C in 0.1× SSC, 0.1% SDS and exposed to autoradiography. After plaque purification, cloned inserts were excised by EcoRI digestion and the resulting fragments subcloned into the EcoRI site of pUC18. The isolated prostate cDNA encoding a novel human cysteine proteinase was also used as a probe to screen a mouse brain cDNA library following the same procedure described above. Positive clones were isolated and characterized by nucleotide sequence analysis.
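As a quick sanity check on the two probe primers quoted above, the following minimal Python sketch computes their length, GC content, and an approximate melting temperature. The Wallace rule of thumb (2°C per A/T and 4°C per G/C) is an assumption introduced here for illustration; the original procedure does not state how the 56°C annealing temperature was chosen, and the function names are hypothetical.

```python
# Illustrative sketch only: basic properties of the two probe primers.
# The Wallace rule is a rough rule of thumb, not the authors' stated method.

PRIMERS = {
    "forward": "GGACTGTGGACAAGATGGAC",
    "reverse": "AGCTGTTCTTGATGGCCCA",
}


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in an oligonucleotide."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Approximate Tm in degrees C: 2 per A/T plus 4 per G/C."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name}: {len(seq)} nt, GC {gc_fraction(seq):.2f}, "
              f"Tm ~{wallace_tm(seq)} C")
```

Under this rule of thumb both primers have estimated melting temperatures a few degrees above the 56°C annealing step, which is the usual relationship between the two values.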
DNA and Protein Sequence Analysis-DNA fragments selected for nucleotide sequencing were inserted in the polylinker region of phage vector M13mp19 and sequenced by the dideoxy chain termination method using either M13 universal primer or cDNA-specific primers and the Sequenase Version 2.0 kit (U. S. Biochemical Corp.). All nucleotides were identified in both strands. Sequence ambiguities were solved by substituting dITP for dGTP in the sequencing reactions. Computer analysis of DNA and protein sequences was performed with the software package of the University of Wisconsin Genetics Computer Group (30). A phylogenetic tree directed to examine the evolutionary relationships between human cysteine proteinases was constructed using the NEIGHBOR program, included in the PHYLIP software package (31). The construction of the tree was done by the unweighted pair group method using arithmetic averages. The phylogenetic distances were obtained according to the method described by Kimura (32).
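The distance correction can be illustrated with a short Python sketch. It assumes that the method of Kimura (32) corresponds to the approximation offered as the "Kimura" option of the PHYLIP PROTDIST program, D = -ln(1 - p - 0.2p²), where p is the fraction of differing aligned residues; this correspondence is an assumption made here, and the function name is hypothetical.

```python
import math


def kimura_protein_distance(fraction_identical: float) -> float:
    """Kimura-corrected distance from the fraction of identical residues.

    Assumes the approximation D = -ln(1 - p - 0.2 * p**2), with p the
    fraction of differing positions in the pairwise alignment.
    """
    p = 1.0 - fraction_identical
    arg = 1.0 - p - 0.2 * p * p
    if arg <= 0.0:
        # The correction is undefined for very divergent sequence pairs.
        raise ValueError("sequences too divergent for the Kimura correction")
    return -math.log(arg)


if __name__ == "__main__":
    # Illustrative identity values only; real input would be the
    # pairwise identities taken from the multiple alignment.
    for identity in (1.00, 0.50, 0.37, 0.26):
        print(f"{identity:.2f} -> {kimura_protein_distance(identity):.3f}")
```

The corrected distance grows faster than the raw fraction of differences, reflecting multiple substitutions at the same site in distantly related sequences.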

Construction of Expression Vectors and Expression in E. coli-To prepare an expression vector suitable for production of recombinant cathepsin F in E. coli, we first generated a 647-bp DNA fragment containing the coding sequence for the mature human cathepsin F by PCR amplification of the isolated full-length cDNA with primers 5′-ATGGCCCCACCTGAATGGGACT and 5′-TCAGTCCACCACCGCCGAG. The PCR reaction was carried out for 20 cycles of denaturation (95°C, 30 s), annealing (60°C, 30 s), and extension (68°C, 1 min) using the Expand™ Long Template PCR System (Roche Molecular Biochemicals) to reduce the error frequency. The PCR product was phosphorylated with T4 polynucleotide kinase, repaired with Klenow fragment, and ligated to the expression vector pGEX-3X (Amersham Pharmacia Biotech), previously treated with SmaI and alkaline phosphatase. The resulting plasmid, called pGEX-3X CTSF, was transformed into E. coli strain BL21(DE3), and the transformed cells were grown in LB broth containing 100 μg/ml ampicillin at 37°C for 16 h, diluted 1/100 with the same medium, and grown to an A600 of 1.0. Then, isopropyl-1-thio-β-D-galactopyranoside was added to a final concentration of 1 mM, and the incubation was continued for 3 h. Cells were collected by centrifugation, washed, resuspended in 0.05 volumes of PBS, lysed using a French press, and centrifuged at 20,000 × g for 20 min at 4°C. The soluble extract was treated with glutathione-Sepharose 4B and eluted with glutathione elution buffer (10 mM reduced glutathione in 50 mM Tris-HCl, pH 8.0) following the manufacturer's instructions.
Enzyme Activity Assays-The enzymatic activity of purified cathepsin F produced in E. coli was measured using 20 μM Z-Phe-Arg-AMC, Z-Arg-Arg-AMC, or Z-Arg-AMC as substrates and following the procedure described by Barrett and Kirschke (33) with minor modifications. Assays were performed at 30°C, in 100 mM sodium acetate buffer, pH 5.5, containing 8 mM dithiothreitol, 2 mM EDTA, and 0.05% Brij 35. Substrate hydrolysis was monitored in a Cytofluor 2350 fluorometer (Millipore, Bedford, MA) at excitation and emission wavelengths of 360 and 460 nm, respectively. For inhibition assays, the reaction mixture was preincubated with 20 μM E-64 at 30°C for 15 min, and the remaining activity was determined using the fluorogenic substrate Z-Phe-Arg-AMC as above.
Chromosomal Mapping-Total DNA from a panel of 24 monochromosomal somatic cell hybrids containing a single human chromosome in a mouse or hamster cell line background was PCR-screened for the presence of the genomic sequence flanked by the cathepsin F-specific primers 5′-GTGCTGATCAGAAGTGCTGCTGC and 5′-AGTTTCCTGGACATGGATAGGGAC. Amplification conditions were as follows: 35 cycles of denaturation (94°C, 15 s), annealing (68°C, 15 s), and extension (72°C, 1 min). To determine more precisely the physical location of the cathepsin F gene within the human genome, fluorescent in situ hybridization (FISH) of genomic DNA clones for cathepsin F was performed as described previously (34). Briefly, genomic clones were isolated from a human P1 artificial chromosome (PAC) genomic library screened by filter hybridization with the full-length cathepsin F cDNA as probe. Two independent clones were identified enclosing the cathepsin F gene as demonstrated by PCR and Southern blot analysis. DNA from one of these PAC clones (called 123H18) was obtained with the standard alkaline lysis method and then used for FISH mapping. For this purpose, 2 μg of the PAC DNA was nick-translated with biotin-16-dUTP and hybridized to normal male metaphase chromosomes obtained from phytohemagglutinin-stimulated cultured lymphocytes. Biotinylated probe was detected using two avidin-fluorescein layers. Chromosomes were diamidine-2-phenylindole dihydrochloride-banded, and images were captured in a Zeiss Axiophot fluorescent microscope equipped with a CCD camera (Photometrics).
Northern Blot Analysis-Northern blots containing 2 μg of poly(A)+ RNA of different human tissue specimens and cancer cell lines were prehybridized at 42°C for 3 h in 50% formamide, 5× SSPE (1× = 150 mM NaCl, 10 mM NaH2PO4, 1 mM EDTA, pH 7.4), 10× Denhardt's, 2% SDS, and 100 μg/ml denatured herring sperm DNA. After prehybridization, filters were hybridized with a full-length cDNA for cathepsin F. After hybridization, filters were washed with 0.1× SSC, 0.1% SDS for 2 h at 50°C and exposed to autoradiography. RNA integrity and equal loading were assessed by hybridization with an actin probe.

RESULTS
Identification and Characterization of a cDNA Encoding Human Cathepsin F-As a first step to identify and characterize new human cysteine proteinases belonging to the papain family, we performed an analysis of the human EST data bases, searching for expressed sequences with significant similarity to those previously determined for human cathepsins. This computer search allowed us to identify several overlapping ESTs that, after translation, generated an open reading frame with a significant degree of similarity to papain-like cysteine proteinases. A cDNA containing part of these overlapping ESTs was PCR-amplified from DNA of a human prostate cDNA library and used as a probe to hybridize this library. After screening of approximately 1 × 10⁶ plaque-forming units, DNA was isolated from 10 independent clones selected according to their positive hybridization with the probe, and their nucleotide sequences were determined. Computer analysis of the obtained sequences confirmed that all of them derived from the same gene, potentially encoding a new member of the papain family of cysteine proteinases. Further analysis of the nucleotide sequence derived from the isolated clone containing the largest insert revealed the presence of an open reading frame coding for a protein of 484 amino acids and a predicted molecular weight of 53,365 (Fig. 1).
To provide additional evidence that the isolated prostate cDNA encoded a putative cysteine proteinase, we performed a detailed amino acid sequence comparison between the identified sequence and those present in the data bases. This analysis revealed that the highest degree of identity was found with a cysteine proteinase from Schistosoma mansoni (48%). Significant similarities were also found with the different human cysteine proteinases of the papain family, with the percentage of identities ranging from 37% with cathepsin L2 to 26% with cathepsin B. Furthermore, the identified amino acid sequence exhibits the domain organization and structural motifs characteristic of the papain-like cysteine proteinases (Fig. 2). Thus, it contains a stretch of hydrophobic amino acids close to the initial methionine which likely corresponds to the signal peptide found in all other family members. In addition, the multiple amino acid sequence alignment between all human cysteine proteinases of the papain family characterized to date (Fig. 2) also allows the identification of a proregion and a mature proteinase domain in the identified protein sequence, as well as the definition of the putative cleavage site between both domains. Thus, the active processed form of the putative novel cysteine proteinase would start at the alanine residue located at position 271, since it immediately precedes the absolutely conserved proline residue located at the +2 position in all family members. The amino acid sequence alignment shown in Fig. 2 also allows the identification of the putative active site Cys residue (at position 295) of the deduced amino acid sequence as well as other residues proposed to be important for the catalytic properties of cysteine proteinases, including the His-431 and Asn-451 residues (35,36). Furthermore, the amino acid sequences surrounding these three residues are also well conserved.
Thus, the N-terminal region contains the glutamine residue (at position 289) of the oxyanion hole present in the structure of these enzymes as well as the conserved tryptophan residue and hydrophobic segment immediately adjacent to the active site cysteine residue (35,36). In addition, the C-terminal region contains a series of conserved aromatic and glycine residues located around the histidine and asparagine residues of the active site. Finally, the deduced amino acid sequence also contains five potential sites of N-glycosylation (Asn-Glu-Thr at position 160, Asn-Arg-Thr at 195, Asn-Phe-Ser at 367, Asn-Asp-Ser at 378, and Asn-Arg-Ser at 440). It is very likely that at least one of them is effectively glycosylated and carries the mannose 6-phosphate marker required for lysosomal targeting of these enzymes. On the basis of all these structural characteristics, we can conclude that the isolated human prostate cDNA codes for a novel member of the papain family of cysteine proteinases, which we propose to call cathepsin F. A phylogenetic tree constructed to evaluate the evolutionary relationships of human cathepsin F to other human cysteine proteinases revealed that this novel protein is only distantly related to other family members, the most closely related member being cathepsin W (Fig. 3).
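The search for potential N-glycosylation sites mentioned above can be sketched in a few lines of Python. The pattern used is the standard Asn-X-Ser/Thr sequon with X being any residue except proline; the proline exclusion and the toy sequence are assumptions introduced here for illustration, since the text does not describe how the sites were located and the full cathepsin F sequence is given only in Fig. 1.

```python
import re

# Asn-X-Ser/Thr sequon, X != Pro. A lookahead is used so that
# overlapping candidate sites are all reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")

# Tripeptides reported in the text for positions 160, 195, 367, 378, 440.
REPORTED_SITES = ["NET", "NRT", "NFS", "NDS", "NRS"]


def find_sequons(protein: str):
    """Return (1-based Asn position, tripeptide) for every sequon found."""
    return [(m.start() + 1, m.group(1))
            for m in SEQUON.finditer(protein.upper())]


if __name__ == "__main__":
    toy = "MANETKLNPSGGNRSA"  # hypothetical sequence, not cathepsin F
    print(find_sequons(toy))
    # Every tripeptide reported in the text fits the sequon pattern.
    print(all(find_sequons(site) for site in REPORTED_SITES))
```

Applied to the full deduced sequence, a scan of this kind would be expected to return the five positions listed in the text, although a matching sequon indicates only a potential site, not actual glycosylation.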
After first submission of the present work, a paper describing a sequence closely related to that described herein was reported (37). According to these authors, cathepsin F should be much smaller than the protein identified in this work (302 versus 484 amino acid residues), and contrary to all known members of this protease family, it would lack a hydrophobic signal sequence. Consequently, they propose that cathepsin F would be targeted to the lysosomal compartment via an N-terminal signal peptide-independent lysosomal targeting pathway. However, an alternative explanation would be that Wang et al. (37) have isolated a partial cDNA clone for this novel cysteine proteinase. In fact, a detailed comparison of both sequences shows that they are identical in the 3′-region, but the open reading frame identified in this work extends more than 180 amino acids upstream from the methionine residue considered by Wang et al. (37) as the first residue in cathepsin F. In addition, and as can be seen in Figs. 1 and 2, the sequence reported herein contains a hydrophobic signal sequence like all the remaining cathepsins. This 19-residue leader peptide ends in a sequence Ala-Val-Ala, which matches perfectly the Ala-X-Ala motif found at the processing site of eukaryotic preproteins (38). Computer analysis using the algorithm developed by Nielsen et al. (39) confirmed that this site presented the highest probability to be the processing peptide bond of the cathepsin F leader sequence. Taken together, these data indicate that human cathepsin F contains a bona fide signal sequence; and consequently, its domain organization is identical to that previously described for all known members of the papain family of cysteine proteinases.
Nevertheless, the amino acid sequence reported in this work for cathepsin F exhibits a very long propeptide (251 residues) which distinguishes this enzyme from all other family members, whose equivalent domains range from 206 to 41 residues in length (cathepsin C and cathepsin Z, respectively) (Fig. 2).
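The domain lengths quoted in this section are mutually consistent, as the following minimal Python sketch shows. It uses only figures given in the text (484 residues in total, a 19-residue signal peptide, a 251-residue propeptide, and a mature enzyme starting at Ala-271); the function names are hypothetical, and the boundaries are derived by simple addition rather than taken from the sequence itself.

```python
from typing import Dict, Tuple


def domain_boundaries(total: int, signal_len: int,
                      pro_len: int) -> Dict[str, Tuple[int, int]]:
    """1-based inclusive residue ranges for the three domains."""
    pro_end = signal_len + pro_len
    return {
        "signal peptide": (1, signal_len),
        "propeptide": (signal_len + 1, pro_end),
        "mature enzyme": (pro_end + 1, total),
    }


def locate(position: int, domains: Dict[str, Tuple[int, int]]) -> str:
    """Name of the domain that contains a given residue position."""
    for name, (start, end) in domains.items():
        if start <= position <= end:
            return name
    raise ValueError(f"position {position} is outside the protein")


if __name__ == "__main__":
    domains = domain_boundaries(total=484, signal_len=19, pro_len=251)
    print(domains)  # the mature enzyme range starts at residue 271
    # Catalytic and oxyanion-hole residues named in the text.
    for label, pos in [("Gln", 289), ("Cys", 295), ("His", 431), ("Asn", 451)]:
        print(f"{label}-{pos}: {locate(pos, domains)}")
```

With these figures the mature enzyme spans residues 271 to 484, that is, 214 residues, and all four catalytically relevant residues fall within it.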
To provide additional information on the structural organization of cathepsin F, studies were undertaken to identify and clone the murine homolog of this cysteine proteinase. To this purpose, the cDNA encoding human cathepsin F was used as a probe to screen a mouse brain cDNA library. Nucleotide sequence analysis from clones selected after positive hybridization to the radiolabeled probe revealed the presence of an open reading frame coding for a protein showing 72% of identities with human cathepsin F (Fig. 2; accession number AJ131851). This murine protein also contains a hydrophobic signal sequence as well as a long propeptide domain, thus providing further evidence that these structural features are not exclusive of the sequence reported herein for human cathepsin F. Finally, it is remarkable that according to preliminary studies (I. Santamaría and C. López-Otín, unpublished results), there is a putative homolog of cathepsin F in Drosophila, whose amino acid sequence also exhibits a signal sequence and a long prodomain, confirming again that both domains are present in this novel cysteine proteinase.
FIG. 2 (partial legend). The amino acid sequences of human cysteine proteinases were extracted from the SwissProt data base, and the multiple alignment was performed with the PILEUP program of the GCG package (30). Numbering corresponds to the sequence of cathepsin F. Residues that are common to all sequences are shown in bold. Gaps introduced to optimize the alignment are indicated by hyphens.

Expression of the Human Cathepsin F cDNA in E. coli and Analysis of the Proteolytic Activity of the Purified Recombinant Protein-To examine further the possibility that the isolated cathepsin F cDNA encodes a catalytically active cysteine proteinase, studies were undertaken to produce the human protein in a bacterial expression system following the strategy previously used to produce other cysteine proteinases of the papain family (17,18,40). For this purpose, a 647-bp fragment encoding the predicted mature cathepsin F was PCR-amplified as described under "Experimental Procedures" and cloned in the polylinker region of the expression vector pGEX-3X. The resulting plasmid (pGEX-3X CTSF), whose identity was verified by nucleotide sequencing, was transformed into E. coli BL21(DE3), and the transformed bacteria were induced to produce the recombinant protein by treatment with isopropyl-1-thio-β-D-galactopyranoside. Protein extracts were prepared from the induced bacteria and analyzed by SDS-PAGE. As shown in Fig. 4A, the bacteria transformed with the recombinant plasmid contained a fusion protein of about 52 kDa, which was not present in the control extracts, whereas these control extracts contained a 29-kDa band corresponding to the parental glutathione S-transferase that was absent in the recombinant bacteria. The fusion protein containing cathepsin F was purified by affinity chromatography in a glutathione-Sepharose 4B column, which was eluted with a reduced glutathione-containing buffer. The protein material present in the chromatographic eluate was analyzed by SDS-PAGE, and as shown in Fig. 4A, a single band of the expected size was detected. The fractions containing purified cathepsin F were pooled, and their enzymatic activities against Z-Phe-Arg-AMC, Z-Arg-Arg-AMC, and Z-Arg-AMC were examined. As can be seen in Fig. 4B, these analyses revealed that recombinant human cathepsin F exhibits a significant proteolytic activity (3.75 mol/min/mol enzyme) against the synthetic peptide Z-Phe-Arg-AMC, which has been defined as an optimal substrate for different cysteine proteinases (41).
This enzymatic activity is slightly higher than that obtained for cathepsins L2 and Z produced in E. coli as fusions with glutathione S-transferase and assayed under the same experimental conditions (17,18) (Fig. 4B). It is remarkable that, similar to these cathepsins, the proteolytic activity of recombinant cathepsin F against other fluorogenic substrates such as Z-Arg-AMC and Z-Arg-Arg-AMC was extremely low or undetectable. Finally, we examined the possibility that the degrading activity of recombinant cathepsin F against Z-Phe-Arg-AMC was inhibited by specific inhibitors of cysteine proteinases. In fact, this proteolytic activity was completely abolished by E-64, a commonly used inhibitor of these enzymes, whereas inhibitors of serine proteinases (phenylmethylsulfonyl fluoride), aspartyl proteinases (pepstatin A), and metalloproteinases (EDTA) did not show any significant effect (Fig. 4B, and data not shown). According to these preliminary enzymatic analyses, together with the above mentioned structural characteristics, we can conclude that cathepsin F is a cysteine proteinase with the substrate specificity and sensitivity toward inhibitors characteristic of these enzymes.
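The size of the fusion protein observed by SDS-PAGE can be checked against the sequence data with a rough calculation, sketched below in Python. The average residue mass is taken from the paper's own figures (53,365 for 484 residues), the mature domain is assumed to run from Ala-271 to residue 484, and the 29-kDa value is the apparent size of the glutathione S-transferase band reported above. Applying a whole-protein average to the mature domain is an approximation introduced here, so the result is only an order-of-magnitude check.

```python
# Rough consistency check of the ~52-kDa fusion protein band.
# All inputs come from the text; the averaging step is an approximation.

FULL_LENGTH_RESIDUES = 484     # deduced open reading frame
FULL_LENGTH_MASS_DA = 53_365   # predicted molecular weight
MATURE_START = 271             # first residue of the processed enzyme
GST_BAND_KDA = 29.0            # apparent size of the GST control band


def estimate_fusion_mass_kda() -> float:
    """Estimated mass (kDa) of GST fused to mature cathepsin F."""
    avg_residue_da = FULL_LENGTH_MASS_DA / FULL_LENGTH_RESIDUES
    mature_residues = FULL_LENGTH_RESIDUES - MATURE_START + 1
    return GST_BAND_KDA + mature_residues * avg_residue_da / 1000.0


if __name__ == "__main__":
    print(f"estimated fusion protein: {estimate_fusion_mass_kda():.1f} kDa")
```

The estimate falls between 52 and 53 kDa, in line with the band of about 52 kDa seen in the induced extracts.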
FIG. 3. Schematic illustration of evolutionary relationships between human cysteine proteinases. The phylogenetic tree was constructed using the NEIGHBOR program of the PHYLIP software package (31). The phylogenetic distances were obtained according to the method described by Kimura (32).
Physical Mapping of the Human Cathepsin F Gene-To provide additional information on the structural and evolutionary relationship of human cathepsin F to other members of the papain family of cysteine proteinases, we carried out studies directed to establish the chromosomal localization of the cathepsin F gene. To this purpose, we first performed a PCR-based strategy directed to screen a panel of somatic cell hybrid lines containing a single human chromosome in a mouse or hamster background. As can be seen in Fig. 5A, positive amplification results were obtained in hybrids containing human chromosomes 11 and 15. However, the somatic cell hybrids containing human chromosome 15 also carry fragments from chromosome 11 (42), strongly suggesting that the human cathepsin F gene maps to this latter chromosome. To establish more precisely the chromosomal location of the cathepsin F gene in chromosome 11, we carried out FISH analysis. To do that, we first isolated PAC clones containing this gene by screening a genomic library using as probe the full-length cDNA for cathepsin F. After characterization of the isolated PAC clones by both Southern blot and nucleotide sequencing analysis, DNA isolated from one of them (123H18) was then employed in FISH experiments on human chromosome metaphase spreads. After diamidine-2-phenylindole dihydrochloride banding of the metaphase cells showing specific hybridization signals, fluorescent spots corresponding to the biotinylated PAC clone were mapped to the q13 region of chromosome 11 (Fig. 5B).
Interestingly, the gene encoding cathepsin W has also been mapped to this region of chromosome 11 (43), strongly suggesting that cathepsins F and W could be tightly linked in the human genome. The gene encoding cathepsin C has also been located at the same region, but in a different band, 11q14.1-14.3 (44), whereas the genes coding for the remaining members of the family have been mapped to different chromosomes (45-51).
Expression Analysis of Cathepsin F in Human Tissues and Cancer Cell Lines-As a preliminary step to elucidate the potential role of cathepsin F in human tissues, we examined by Northern blot analysis the expression pattern of this enzyme in a wide variety of cells and tissues including leukocytes, colon, small intestine, ovary, testis, prostate, thymus, spleen, pancreas, kidney, skeletal muscle, liver, lung, placenta, brain, and heart. After hybridization with a radiolabeled probe specific for cathepsin F, a single transcript of approximately 2.1 kb was observed with variable intensity in most examined tissues (Fig. 6A). The major sites of cathepsin F expression were skeletal muscle and testis, whereas expression in leukocytes and thymus was virtually undetectable. The widespread distribution of cathepsin F in human tissues is consistent with a putative role for this enzyme in the intracellular protein catabolism taking place in lysosomes from all cell types. However, the wide variability observed in cathepsin F expression in the different tissues analyzed in this work suggests that in addition to its housekeeping role as a lysosomal digestive enzyme, it may play a more specific role in those tissues like skeletal muscle and testis in which its relative levels are very high.
Finally, in this work we have addressed the possibility that cathepsin F could be overexpressed by human cancer cell lines from different sources, as already shown for other cysteine proteinases of the papain family (26,52). To this purpose, we first hybridized a Northern blot containing poly(A)+ RNAs extracted from different cancer cell lines (HL-60, HeLa, K-562, MOLT-4, Burkitt's lymphoma Raji, colorectal adenocarcinoma SW480, lung carcinoma A549, and melanoma G361) with the full-length cDNA for cathepsin F. As shown in Fig. 6B, high levels of a transcript identical in size (about 2.1 kb) to the one detected in normal tissues were observed in HeLa cells. Lower levels of this transcript were also detected in melanoma, K-562, and lung carcinoma cells.

DISCUSSION
The availability of EST data bases represents an excellent tool to look for novel genes through computer search of short expressed DNA sequences with nucleotide sequence similarity to genes of interest. In this work, we have used this strategy as a first step to clone a new member of the papain family of cysteine proteinases, which we have called cathepsin F. The identification of this human protease was based on the finding of a series of overlapping ESTs, whose sequence was similar to previously characterized human cysteine proteinases. These sequences were used to design a DNA probe that was PCR-amplified from a human prostate cDNA and subsequently employed to screen a cDNA library from the same tissue. This screening led finally to the finding of a full-length cDNA coding for cathepsin F. Pairwise comparisons for structural similarities between the identified amino acid sequence for this protein and those for the remaining papain-like cysteine proteinases confirmed that cathepsin F displays the same domain organization as other family members. Thus, a signal peptide, a propeptide domain, and a catalytic region can be identified in the amino acid sequence deduced for this protein. The identification of this signal sequence, which is also present in the mouse and Drosophila homologs of cathepsin F (Fig. 2, and data not shown), does not support the proposal of Wang et al. (37), reported after submission of this manuscript, that cathepsin F lacks a signal sequence. Furthermore, the catalytic domain contains all structural motifs characteristic of cysteine proteinases, including the nucleophilic cysteine residue involved in covalent intermediate formation during peptide hydrolysis, as well as the histidine and asparagine residues that constitute the catalytic triad of these enzymes (35,36).
Consistent with these structural characteristics, functional analysis of recombinant cathepsin F produced in a bacterial expression system provided additional evidence that the isolated cDNA codes for a catalytically active cysteine proteinase. In fact, the purified recombinant protein exhibits significant proteolytic activity against fluorogenic substrates used for assaying the enzymatic activity of these proteinases. In addition, this degrading activity was abolished by inhibitors of cysteine proteinases but not by inhibitors of any other class of proteolytic enzymes. Nevertheless, this novel protease also displays some distinctive features in its amino acid sequence. Of special interest in this regard is the finding that its N-terminal propeptide domain is extremely long when compared with those described for all the remaining papain-like cysteine proteinases. According to their structural properties, the prosegments found in these enzymes can be classified into two groups (53,54). The first one contains cathepsin L-like enzymes with prodomains of about 90 amino acids in length and bearing two highly conserved motifs called ERFNIN and GNFD. The second group comprises the cathepsins B from different sources and is characterized by a smaller proregion of about 60 amino acids lacking the ERFNIN consensus sequence. In addition, there are two cysteine proteinases that cannot be classified into either of these groups. Thus, the human cathepsin C propeptide contains 206 amino acids (12), whereas the recently described human cathepsin Z contains a proregion that is only 41 residues in length and lacks the above-mentioned conserved motifs (18). Human cathepsin F markedly deviates from all of them because its prosegment contains 251 amino acids. Interestingly, both mouse and Drosophila homologs of cathepsin F also exhibit a very long prodomain (Fig. 2 and data not shown), indicating that it is a characteristic feature of this enzyme.
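The two-group classification of prosegments described above can be illustrated with a minimal Python sketch. It is not part of the analysis performed in this work: the function name `classify_prosegment`, the `has_erfnin` flag, and the length tolerance of 15 residues are hypothetical choices made for illustration, since the text only gives approximate lengths ("about 90" and "about 60" amino acids).

```python
# Approximate prosegment lengths quoted in the text (amino acids).
CATHEPSIN_L_LIKE_LENGTH = 90
CATHEPSIN_B_LIKE_LENGTH = 60
# Hypothetical tolerance: the text only says "about", so this cutoff is an assumption.
LENGTH_TOLERANCE = 15


def classify_prosegment(length: int, has_erfnin: bool) -> str:
    """Assign a prosegment to one of the two groups, or report it as unclassified.

    length     -- prosegment length in amino acids
    has_erfnin -- whether the ERFNIN motif is present (supplied by the caller)
    """
    if has_erfnin and abs(length - CATHEPSIN_L_LIKE_LENGTH) <= LENGTH_TOLERANCE:
        return "cathepsin L-like"
    if not has_erfnin and abs(length - CATHEPSIN_B_LIKE_LENGTH) <= LENGTH_TOLERANCE:
        return "cathepsin B-like"
    return "unclassified"


# Lengths taken from the text; the group examples use the nominal group lengths.
print(classify_prosegment(90, has_erfnin=True))    # nominal cathepsin L-like prodomain
print(classify_prosegment(60, has_erfnin=False))   # nominal cathepsin B-like proregion
print(classify_prosegment(41, has_erfnin=False))   # human cathepsin Z proregion
print(classify_prosegment(251, has_erfnin=False))  # human cathepsin F prosegment
```

Under these assumed cutoffs, the 206-residue cathepsin C propeptide and the 251-residue cathepsin F prosegment fall outside both groups whatever value is passed for `has_erfnin`, so the sketch does not depend on any claim about the motif content of those two enzymes.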
At present, the functional significance of this extremely long prosegment is unknown. In this regard, it is well established that the propeptide found in papain-like enzymes acts as an intrinsic inhibitor of their proteolytic activity (55). In addition, this region has also been found to be essential for the proper folding of these enzymes, for stabilizing their structure upon exposure to changes in pH, or for providing the structural markers required for microsomal membrane binding or lysosomal targeting (55-59). It is likely that the long prosegment of cathepsin F plays some specific role in addition to those proposed for this domain of papain-like cysteine proteinases.
In this work, we have also analyzed the chromosomal location of the cathepsin F gene as well as its expression in normal and tumor cells. According to both FISH and somatic hybrid mapping techniques, this gene localizes to the long arm of chromosome 11, at 11q13. This position is the same as that recently reported for the cathepsin W gene, indicating that these genes are clustered in the human genome. Consistent with these results, a phylogenetic tree constructed to analyze the evolutionary relationships between all known human cysteine proteinases of the papain family demonstrated that cathepsin F and cathepsin W are closely related. In addition to its possible value in the context of evolutionary studies of the human cysteine proteinases, knowledge of the chromosomal location of the cathepsin F gene reported here may be useful in searching for putative genetic diseases associated with this gene. Interestingly, different studies have reported that the 11q13 region is frequently altered in diverse human tumors (60,61). Consequently, it will be of great interest to examine the possibility that cathepsin F may be a target of these genetic abnormalities associated with human carcinomas.
FIG. 6. Expression of the cathepsin F gene in human tissues and cancer cell lines. A, 2 μg of poly(A)+ RNA prepared from the indicated tissues were analyzed by Northern blot hybridization with the full-length cDNA for human cathepsin F. The positions of RNA size markers are shown. Filters were subsequently hybridized with a human actin probe in order to ascertain the differences in RNA loading among the different samples. B, 2 μg of poly(A)+ RNA prepared from the indicated tumor cell lines were hybridized with the above-described probe specific for human cathepsin F. Filters were finally hybridized with a human actin probe.
On the other hand, analysis of the expression of cathepsin F in human tissues has provided some information about the putative functional significance of this protein. Thus, the finding that it is expressed in most normal tissues analyzed suggests a general role for this enzyme in the lysosomal protein catabolism taking place in all cell types. This expression pattern places cathepsin F within the group of widely distributed cysteine proteinases such as cathepsins B, L, H, O, and Z, as opposed to a series of recently described family members including cathepsins K, S, W, and L2, which appear to play highly specific roles in those tissues in which they are overexpressed or even exclusively expressed (see Ref. 2 for a review). Nevertheless, it is remarkable that cathepsin F expression levels in normal tissues exhibit a large variability: in tissues such as skeletal muscle and testis, its mRNA levels are up to 20-fold higher than in others such as kidney and colon, which also produce this novel protease, albeit at low levels. The finding of very high levels of cathepsin F mRNA in skeletal muscle is of particular interest in light of previous data reporting an essential role of cysteine proteinases in muscle proteolysis in both normal and pathological conditions, including some forms of muscular dystrophy (62-64). Further studies will be required to evaluate the possibility that cathepsin F could be responsible for the catabolism of specific protein substrates in the muscle. In turn, its high-level expression in the testis is also suggestive of a role for this novel cathepsin in fertilization processes, as proposed for other family members including the recently described cathepsin L2 (17,65). Finally, the expression analysis of cathepsin F has also revealed the presence of its transcript in several human cancer cell lines, with especially high levels in HeLa cells. This finding suggests that cathepsin F may play some role in the progression of some human carcinomas, thereby providing additional interest to the further functional characterization of this proteinase.
In conclusion, we have identified and characterized a new human cysteine proteinase of the papain family that shows similarities and differences with the remaining family members previously described. Cathepsin F exhibits a signal sequence and all structural features of cysteine proteinases, as well as a profile of activity against fluorogenic substrates and a sensitivity to inhibitors typical of these enzymes. However, it shows an extremely long propeptide domain, which distinguishes this enzyme from other family members. Furthermore, its high-level expression in certain tissues such as skeletal muscle and testis is suggestive of a specific activity in some physiological processes taking place in these tissues. The availability of recombinant cathepsin F and specific reagents for this new proteinase generated in this work will be very helpful to evaluate its precise functional role in the context of the increasingly complex pathways of protein degradation and turnover in human tissues.