Molecular Cloning of cDNA for Matriptase, a Matrix-degrading Serine Protease with Trypsin-like Activity*

A major protease from human breast cancer cells was previously detected by gelatin zymography and proposed to play a role in breast cancer invasion and metastasis. To structurally characterize the enzyme, we isolated a cDNA encoding the protease. Analysis of the cDNA reveals three sequence motifs: a carboxyl-terminal region with similarity to the trypsin-like serine proteases, four tandem cysteine-rich repeats homologous to the low density lipoprotein receptor, and two copies of tandem repeats originally found in the complement subcomponents C1r and C1s. By comparison with other serine proteases, the active-site triad was identified as His-484, Asp-539, and Ser-633. The protease contains a characteristic Arg-Val-Val-Gly-Gly motif that may serve as a proteolytic activation site. The bottom of the substrate specificity pocket was identified to be Asp-627 by comparison with other trypsin-like serine proteases. In addition, this protease exhibits trypsin-like activity as defined by cleavage of synthetic substrates with Arg or Lys as the P1 site. Thus, the protease is a mosaic protein with broad spectrum cleavage activity and two potential regulatory modules. Given its ability to degrade extracellular matrix and its trypsin-like activity, the name matriptase is proposed for the protease.

Elevated proteolytic activity has been implicated in neoplastic progression. Although the exact role(s) of proteolytic enzymes in tumor progression remains unclear, it seems that proteases may be involved in almost every step of the development and spread of cancer. A widely proposed view is that proteases contribute to the degradation of extracellular matrix and to tissue remodeling and are necessary for cancer invasion and metastasis. A wide array of extracellular matrix-degrading proteases has been discovered, the expression of some of which correlates with tumor progression, as reviewed by Mignatti and Rifkin (1). The plasmin/urokinase-type plasminogen activator system and the 72-kDa gelatinase (MMP-2)/membrane-type MMP system have received the most attention for their potential roles in the process of invasion of breast cancer and other carcinomas. However, both systems appear to be largely synthesized by stromal cells in vivo (2)(3)(4)(5) and require indirect mechanisms for their recruitment and activation on the surfaces of cancer cells. The stromal origins of these well characterized extracellular matrix-degrading proteases may suggest that cancer invasion is an event that either depends entirely upon stromal-epithelial cooperation or is controlled by some other unknown epithelium-derived protease(s). A search for these epithelium-derived proteolytic systems that may interact with the plasmin/urokinase-type plasminogen activator system and/or with the MMP family could provide a missing link in our understanding of malignant invasion.
We have pursued studies of a novel protease with the hypothesis that a tumor itself may be a major source of proteases important for multiple aspects of malignant behavior, including invasion and metastasis. To this end, we systematically altered several conditions such as the pH using gelatin zymography to search for potentially important breast cancer cell-derived gelatinases. This search led us to the discovery of a major protease, which on a gelatin zymogram had a slightly alkaline pH optimum and a size between those of MMP-2 and MMP-9 in T-47D human breast cancer cells (6). We now propose to call this protease matriptase. Matriptase has been purified from T-47D cell-conditioned medium and has been used as an immunogen to produce monoclonal antibodies (7). Although matriptase was initially isolated from cell-conditioned medium, three lines of evidence, including immunofluorescence staining, surface biotinylation, and subcellular fractionation, suggested that a portion of the enzyme molecules were localized on the surfaces of cells. Given its extracellular matrix-degrading activity and presentation on the surfaces of breast cancer cells, we hypothesize that matriptase may be involved in breast cancer invasion. To further characterize the newly discovered matrix-degrading protease in this study, we have purified the enzyme and its binding protein from human milk, a biological source of relatively high abundance. A cDNA clone for matriptase has now been generated and characterized.

MATERIALS AND METHODS
Cell Lines and Culture Conditions-COS-7 cells were maintained in modified Iscove's minimal essential medium (Biofluids, Inc., Rockville, MD) supplemented with 5% fetal calf serum (Life Technologies, Inc.).
Purification of Matriptase-To obtain enough matriptase for amino acid sequencing, the enzyme was isolated from human milk (39). Briefly, human milk from the Georgetown University Medical Center Milk Bank was precipitated and collected by addition of ammonium sulfate between 40 and 60% saturation. Matriptase was purified by a combination of CM-Sepharose and immunoaffinity chromatography.
Amino Acid Sequence Analysis-To obtain internal amino acid sequences, purified matriptase was separated by SDS-polyacrylamide gel electrophoresis and lightly stained with Coomassie Blue, and protein bands were excised. Matriptase was then subjected to in-gel digestion and amino acid sequencing at the Howard Hughes Medical Institute Biopolymer Laboratory and W. M. Keck Foundation Biotechnology Resource Laboratory at Yale University. The amino-terminal sequences were determined as described previously (8). Briefly, the proteins were resolved by SDS-polyacrylamide gel electrophoresis, transferred to polyvinylidene difluoride membrane, and lightly stained with Coomassie Blue. The proteins were then excised and subjected to amino-terminal sequencing in the Chemistry Department of Florida State University (Tallahassee, FL). The two short sequences obtained were identical to a deduced amino acid sequence from a cDNA termed SNC19 (GenBank™ accession number U20428).
Amplification of an SNC19 cDNA from T-47D Breast Cancer Cells-An SNC19 cDNA clone was generated by reverse transcriptase-polymerase chain reaction utilizing mRNA from T-47D human breast cancer cells. Primer sequences for SNC19 (5′-CCTCCTCTTGGTCTTGCTGGGG-3′ and 5′-AGACCCGTCTGTTTTCCAGG-3′) were derived from the published sequence. Standard reverse transcription-polymerase chain reaction was conducted using the Advantage RT-PCR kit (CLONTECH). Products were analyzed on a 0.8% agarose gel; and the resultant band of ~2.8 kilobase pairs, corresponding to the expected product size, was excised from the gel, purified, and ligated into pCR2.1 (Invitrogen, San Diego, CA) by TA cloning (pCR-SNC19).
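As a quick sanity check on the primer pair above, the following Python sketch computes the length, GC content, and reverse complement of each oligonucleotide. It assumes that the hyphen inside the first primer is a line-break artifact rather than part of the sequence. The labels `primer_1` and `primer_2` and the helper names are illustrative only and are not part of the original protocol.

```python
# Primer sequences as given in the text (5' to 3'); labels are hypothetical.
PRIMERS = {
    "primer_1": "CCTCCTCTTGGTCTTGCTGGGG",
    "primer_2": "AGACCCGTCTGTTTTCCAGG",
}

# Translation table for Watson-Crick base pairing.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (5' to 3')."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, len(seq), f"{gc_fraction(seq):.2f}", reverse_complement(seq))
```

The reverse complement is the strand to which a primer anneals, so it is the string one would search for in the template sequence.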
Sequencing-DNA sequencing was performed on an Applied Biosystems automated 377 DNA sequencer using standard methods, with the assistance of the Lombardi Cancer Center Sequencing and Synthesis Shared Resource. The sequences were assembled and analyzed with Lasergene software for Windows (DNASTAR, Inc., Madison, WI). The predicted protein sequence was compared with sequences in the Swiss-Prot data base at the National Center for Biotechnology Information using the BLAST network server.
Expression of SNC19 in COS-7 Cells-To verify that the SNC19 cDNA encodes matriptase, we constructed a eukaryotic expression vector (pcDNA/SNC19) utilizing the commercially available pcDNA3.1 vector (Invitrogen, San Diego, CA). A 2.83-kilobase pair EcoRI fragment containing the SNC19 cDNA was produced by digestion of pCR-SNC19 and cloned into the EcoRI site of pcDNA3.1. This construct contains the open reading frame of SNC19 driven by the cytomegalovirus promoter. Correct insertion of the SNC19 cDNA was verified by restriction mapping (data not shown). Transfections were carried out using SuperFect transfection reagent (QIAGEN Inc., Valencia, CA) as specified in the manufacturer's handbook. After 48 h, the matriptase-transfected COS-7 cells and the control COS-7 cells, which were transfected with LacZ to monitor transfection efficiency, were extracted with 1% Triton X-100 in 20 mM Tris-HCl, pH 7.4.
Immunoblot Analysis-Immunoblotting was conducted as described previously (7). Proteins were separated by 10% SDS-polyacrylamide gel electrophoresis, transferred to polyvinylidene fluoride membrane, and subsequently probed with anti-matriptase mAb M32. Immunoreactive polypeptides were visualized using peroxidase-labeled secondary antiserum and the ECL detection system (Amersham Pharmacia Biotech).
Gelatin Zymography-Gelatin zymography was carried out as described previously with some modifications (13). Gelatin (1 mg/ml) as a substrate was copolymerized with regular SDS-polyacrylamide gel. Electrophoresis was performed at a constant current of 15 mA. The gelatin gels were washed three times with phosphate-buffered saline containing 2% Triton X-100 and incubated in phosphate-buffered saline at 37°C overnight.
Cleavage of Synthetic Substrates-To demonstrate the trypsin-like activity of matriptase, various synthetic fluorescent protease substrates with arginine or lysine as the P1 site were tested with purified matriptase from human milk. Matriptase was assayed in 20 mM Tris buffer, pH 8. [...] substrates were purchased from Sigma. The rate of cleavage of each substrate was determined against time with a Hitachi F-4500 fluorescence spectrophotometer.
FIG. 1 legend (fragment). [...] 1 and 2). Some of them could be the degraded products of the protease since they were recognized by mAb 21-9 after longer exposure to the x-ray film. A 40-kDa protein doublet was seen in low levels in a nonboiled sample (A, lane 1), but its levels were increased after boiling (A, lane 2). This 40-kDa doublet was not recognized by mAb 21-9 (B). We propose that these two polypeptides could be binding proteins (BPs) of matriptase. The sizes of the molecular mass markers are indicated.

RESULTS AND DISCUSSION
Purification of Matriptase from Human Milk-In our previous study (7), a small proportion of the matriptase molecules were identified as complexes in human breast cancer cells. We have subsequently found human milk to be a good source for isolation of larger quantities of the matriptase complexes (39). We first purified from human milk a matriptase complex with an apparent size of 95 kDa using anti-matriptase mAb 21-9-Sepharose affinity chromatography (Fig. 1A). The 95-kDa complex is capable of being converted by boiling to matriptase plus a 40-kDa protein doublet. Both the 95-kDa complex and matriptase itself were recognized by anti-matriptase mAb 21-9 (Fig. 1B). Although sequence analysis of the 40-kDa binding protein has shown it to be a serine protease inhibitor (see below), some residual gelatinolytic activity was observed for the 95-kDa matriptase-inhibitor complex (Fig. 1C). When matriptase and its binding protein were subjected to N-terminal sequencing, only 11 amino acid residues (VVGGTDADEGE) from matriptase were obtained, with relatively low recovery. In addition, 12 amino acid residues (GPPPAPPGLPAG) were obtained from the amino terminus of the 40-kDa binding protein. We searched GenBank™ using these amino acid sequences for proteins related or corresponding to matriptase and its binding protein. The binding protein of matriptase was identified to be a Kunitz-type serine protease inhibitor. This inhibitor is known to be a reversible and competitive serine protease inhibitor that was reported to inhibit the hepatocyte growth factor activator; thus, it was named HAI (9). The detailed characterization of HAI from the matriptase complex is reported in the accompanying paper (39). The 11 amino acid residues from matriptase were identical to a deduced amino acid sequence from a 2.9-kilobase pair cDNA called SNC19. We subsequently obtained nine internal amino acid residues (DYVEINGEK) from matriptase. These were also identical to the predicted translated protein sequences of SNC19. However, numerous stop codons were observed in this deposited SNC19 sequence, resulting in several small predicted translation products. Thus, a 2830-base pair cDNA fragment was obtained by reverse transcriptase-polymerase chain reaction using two primers based on the sequence of SNC19. We observed extensive discrepancy (132 bases) between our sequence and that of SNC19. These analyses suggest that there might be some errors in the bank-deposited SNC19 sequences or that this cDNA encodes a distinct but related protein(s).
Verification of SNC19 cDNA Encoding Matriptase-In addition to the sequence identity of matriptase to a portion of SNC19, we examined the immunoreactivity of anti-matriptase mAbs with the SNC19-encoded protein to verify whether SNC19 encodes matriptase. SNC19 cDNA was inserted into the eukaryotic expression vector pcDNA3.1 and transfected into COS-7 monkey kidney fibroblasts, which do not express matriptase. An immunoreactive band of the same size as matriptase from T-47D human breast cancer cells (Fig. 2, lane 3) [...]
FIG. 4 legend (fragment). [...] (17), human blood coagulation factor XI (19), and human plasminogen; and the serine protease domains of two transmembrane serine proteases, human TMPRSS2 (32) and the Drosophila Stubble-stubbloid gene (Sb-sbd) (33). Gaps to maximize homologies are indicated by dashes. Residues in the catalytic triads (matriptase His-484, Asp-539, and Ser-633) are boxed and indicated. The conserved activation motif ((R/K)VIGG) is boxed, and the proteolytic activation site is indicated. Eight conserved cysteines needed to form four intramolecular disulfide bonds are boxed, and the likely pairings are as follows: Cys-469-Cys-485, Cys-604-Cys-618, Cys-629-Cys-658, and Cys-432-Cys-559. The disulfide bond Cys-432-Cys-559 is observed in two-chain serine proteases, but not in trypsin and chymotrypsin. Residues in the substrate pocket (Asp-627, Gly-655, and Gly-665) are boxed and indicated. It is evident that the residue positioned at the bottom of the substrate pocket is Asp in trypsin-like proteases, including matriptase, but Ser in chymotrypsin.
[...] fifth methionine codon because the sequence GTCATGG matches a favorable Kozak consensus sequence (10). This methionine is followed by four positively charged amino acids and a 14-amino acid hydrophobic region (Ser-18-Ser-31), a putative signal peptide. Assuming this methionine codon to be the initiator, the open reading frame was 2049 base pairs long, and thus, the deduced amino acid sequence was composed of 683 residues with a calculated molecular mass of 75,626 Da. The two stretches of amino acid sequences (DYVEINGEK and VVGGTDADEGE) obtained from matriptase are located in amino acids 228-236 and 443-453; thus, the translation frame is likely to be correct. There are three potential N-glycosylation sites with the canonical Asn-X-(Ser/Thr) sequence and an RGD sequence. An RGD sequence from proteins of the extracellular matrix has been found to mediate their interactions with integrins (11).
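The arithmetic and motif searches described above can be sketched in a few lines of Python. The open reading frame length (2049 base pairs), residue count (683), and calculated mass (75,626 Da) are taken from the text; everything else is an assumption made for illustration. In particular, `TOY_PROTEIN` is an invented peptide and not the matriptase sequence, and the exclusion of proline at the X position of the sequon is a commonly used refinement that the text itself does not state.

```python
import re

# Values reported in the text.
ORF_BASE_PAIRS = 2049
CALCULATED_MASS_DA = 75_626

# Each codon is three bases; the reported residue count implies that the
# stop codon is not included in the 2049-bp figure.
assert ORF_BASE_PAIRS % 3 == 0
n_residues = ORF_BASE_PAIRS // 3
mean_residue_mass = CALCULATED_MASS_DA / n_residues

# Asn-X-(Ser/Thr) sequon; a lookahead is used so overlapping sites are found.
# Assumption: X may be any residue except Pro.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(protein: str) -> list[tuple[int, str]]:
    """Return 1-based positions and matched triplets of potential N-glycosylation sites."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(protein)]


def find_motif(protein: str, motif: str = "RGD") -> list[int]:
    """Return 1-based start positions of every occurrence of a literal motif."""
    positions = []
    start = protein.find(motif)
    while start != -1:
        positions.append(start + 1)
        start = protein.find(motif, start + 1)
    return positions


# Hypothetical toy peptide used only to exercise the functions.
TOY_PROTEIN = "MANGSRGDLNPTQNKT"

if __name__ == "__main__":
    print(n_residues, round(mean_residue_mass, 1))
    print(find_sequons(TOY_PROTEIN))
    print(find_motif(TOY_PROTEIN))
```

A mean residue mass close to 110 Da is typical for proteins, so the reported mass and residue count are mutually consistent.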
Structure of the Matriptase Catalytic Domain-A homology search for the deduced amino acid sequence by BLAST in the Swiss-Prot data base revealed that the carboxyl terminus at residues 432-683 of matriptase is homologous to other serine proteases and that matriptase contains the invariant catalytic triad, a characteristic disulfide bond pattern, and overall sequence similarity. Compared with the archetype serine protease chymotrypsin (12,13) and other serine proteases, the three amino acids (His-484, Asp-539, and Ser-633) are likely to correspond to those in chymotrypsinogen (His-57, Asp-102, and Ser-195) and are likely to be essential for catalytic activity (14). The six most conserved cysteines needed to form three intramolecular disulfide bonds that stabilize the catalytic pocket have been determined in other chymotrypsin-related proteases. The most likely cysteine pairings in matriptase are thus as follows: Cys-469-Cys-485, Cys-604-Cys-618, and Cys-629-Cys-658. Matriptase also contains two additional cysteines (Cys-432-Cys-559) that correspond to those used in two-chain proteases, such as enteropeptidase (15,16), hepsin (17), plasma kallikrein (18), blood coagulation factor XI (19), and plasminogen (20), but not in trypsin (21) or chymotrypsin (22) (Fig. 4).
A putative proteolytic activation site (Arg-442) of matriptase in an Arg-Val-Val-Gly-Gly motif is similar to the characteristic RIVGG motif in other serine proteases. As mentioned above, a conserved intramolecular disulfide bond is found in those serine proteases that are synthesized as single-chain zymogens and are proteolytically activated to become active two-chain forms. This disulfide bond is proposed to hold the active catalytic fragment together with its noncatalytic N-terminal fragment. This conserved intramolecular disulfide bond has also been observed in matriptase (Cys-432-Cys-559). These sequence analyses suggest that matriptase may be synthesized as a single-chain zymogen and may become proteolytically activated to a two-chain form. If this is the case, the majority of matriptase molecules in the conditioned medium of T-47D breast cancer cells are likely to be in the zymogen form; the two-chain matriptase represents only a minor proportion of the total, consistent with the purified matriptase from T-47D human breast cancer cells exhibiting an apparent size of 80 kDa under reduced conditions (data not shown). This conclusion is also supported by the observation that the proposed N-terminal sequence of the catalytic chain of matriptase is identical to the stretch of amino acid residues (VVGGTDADEGE) that was obtained from milk-derived matriptase with very low recovery when matriptase was subjected to N-terminal sequencing.
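The residue numbering discussed in the two preceding paragraphs can be checked with a short Python sketch. All residue numbers are those reported in the text; the variable and function names are illustrative. The sketch makes two points: the offset between matriptase and chymotrypsinogen numbering is not constant across the catalytic triad, as expected when an alignment contains gaps, and only the Cys-432-Cys-559 pair spans the proposed activation cleavage after Arg-442.

```python
# (matriptase residue, chymotrypsinogen residue), as reported in the text.
CATALYTIC_TRIAD = {"His": (484, 57), "Asp": (539, 102), "Ser": (633, 195)}

# Proposed activation cleavage occurs after this residue.
ACTIVATION_SITE = 442

# Likely disulfide pairings in the catalytic domain, as reported in the text.
DISULFIDES = [(469, 485), (604, 618), (629, 658), (432, 559)]

# Offset between the two numbering schemes at each triad position.
numbering_offsets = {res: m - c for res, (m, c) in CATALYTIC_TRIAD.items()}


def spans_cleavage(pair: tuple[int, int], site: int) -> bool:
    """True if one cysteine lies at or before the cleaved residue and the other after it."""
    low, high = sorted(pair)
    return low <= site < high


# Disulfides that would keep the two chains joined after activation cleavage.
interchain = [pair for pair in DISULFIDES if spans_cleavage(pair, ACTIVATION_SITE)]

# Every cysteine should take part in exactly one pairing.
cysteines = [c for pair in DISULFIDES for c in pair]

if __name__ == "__main__":
    print(numbering_offsets)
    print(interchain)
    print(len(cysteines) == len(set(cysteines)))
```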
The substrate specificity (S1) pocket of matriptase is likely to be composed of Asp-627, positioned at its bottom, with Gly-655 and Gly-665 at its neck, indicating that matriptase is a typical trypsin-like serine protease. The predicted preferential cleavage for matriptase at amino acid residues with positively charged side chains was tested with 10 synthetic substrates with Arg and Lys residues as P1 sites. In our preliminary studies (data not shown), matriptase was able to cleave the following synthetic substrates, presented as follows from the most rapid to the slowest: Boc-Gln-Ala-Arg-AMC, Boc-benzyl-Glu-Gly-Arg-AMC, Boc-Leu-Gly-Arg-AMC, Boc-benzyl-Asp-Pro-Arg-AMC, Boc-Phe-Ser-Arg-AMC, Boc-Val-Pro-Arg-AMC, succinyl-Ala-Phe-Lys-AMC, Boc-Leu-Arg-Arg-AMC, Boc-Gly-Lys-Arg-AMC, and Boc-Leu-Ser-Thr-Arg-AMC. Thus, matriptase may prefer substrates with amino acid residues containing small side chains, such as Ala and Gly, as P2 sites.
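The P2 preference inferred above can be made explicit by reading the P1 and P2 residues directly from the substrate names. The Python sketch below uses the ten substrates in the order given in the text (fastest to slowest) and assumes that the residue immediately preceding the AMC leaving group is P1 and the one before it is P2. No rate values are used, since none are reported; the function name is illustrative.

```python
# Substrates in the order reported in the text, fastest cleavage first.
SUBSTRATES = [
    "Boc-Gln-Ala-Arg-AMC",
    "Boc-benzyl-Glu-Gly-Arg-AMC",
    "Boc-Leu-Gly-Arg-AMC",
    "Boc-benzyl-Asp-Pro-Arg-AMC",
    "Boc-Phe-Ser-Arg-AMC",
    "Boc-Val-Pro-Arg-AMC",
    "succinyl-Ala-Phe-Lys-AMC",
    "Boc-Leu-Arg-Arg-AMC",
    "Boc-Gly-Lys-Arg-AMC",
    "Boc-Leu-Ser-Thr-Arg-AMC",
]


def p2_and_p1(substrate: str) -> tuple[str, str]:
    """Return (P2, P1) residues, assuming the name ends in ...-P2-P1-AMC."""
    parts = substrate.split("-")
    if parts[-1] != "AMC" or len(parts) < 4:
        raise ValueError(f"unexpected substrate name: {substrate}")
    return parts[-3], parts[-2]


# Rank (1 = fastest), P2 residue, and P1 residue for each substrate.
ranked_sites = [(rank, *p2_and_p1(name)) for rank, name in enumerate(SUBSTRATES, start=1)]

if __name__ == "__main__":
    for rank, p2, p1 in ranked_sites:
        print(f"{rank:2d}  P2={p2}  P1={p1}")
```

Listed this way, the three most rapidly cleaved substrates carry Ala or Gly at P2, whereas the slower ones carry larger or charged P2 residues, which is the pattern the text describes.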
Structure Motifs of the Noncatalytic Region of Matriptase-The noncatalytic region of matriptase contains two sets of repeating sequences, which may serve as regulatory and/or binding domains for interactions with other proteins. Four tandem repeats of ~35 amino acids including six conserved cysteine residues (Fig. 5A) were found at the amino-terminal region (amino acids 280-430) of its serine protease domain. They are homologous to the cysteine-containing repeat of the LDL receptor (23) and related proteins (24). All of these cysteine residues are likely to be involved in disulfide bonds. In the LDL receptor, the homologous seven repeating sequences serve as the ligand-binding domain. By analogy, the four tandem cysteine-containing repeats in matriptase may also be the sites of interaction with other macromolecules. In addition, the cysteine-containing LDL receptor domain was found in other proteases such as enteropeptidase (15,16).
The amino-terminal region of matriptase (amino acids 42-268) contains another two tandem segments with internal homology. These segments resemble partial sequences originally identified in complement subcomponents C1r (25,26) and C1s (27,28). This C1r/s domain was also found in other serine proteases, such as enteropeptidase, an activator of trypsinogen (15,16), and in the astacin subfamily of zinc metalloproteases, such as bone morphogenetic protein-1 (29) and the Drosophila tolloid gene product, a dorsal-ventral patterning protein (30). Although the exact roles of the C1r/s domains in these proteins remain unclear, a deletion of the first C1r/s domain in complement subcomponent C1r impairs tetramer formation of C1r with C1s (31). These results suggest that this domain may be involved in protein-protein interactions. In our previous study (7), a small proportion of the matriptase in breast cancer cells was identified in complexes. One of the complexes has been isolated from human milk, and the binding protein was identified as a fragment of a Kunitz-type serine protease inhibitor. Whether the LDL receptor domain and the C1r/s domain in matriptase are both involved in the interaction with the Kunitz-type serine protease inhibitor remains to be investigated.
In conclusion, matriptase is a trypsin-like serine protease with several potential regulatory modules (Fig. 6). Its broad spectrum cleavage activity may contribute to the degradation of the extracellular matrix, activation of other proteases, and processing of growth factors. All of these ascribed functions could contribute to important aspects of tumor progression such as cancer invasion and to physiological processes such as differentiation and lactation. The presence of potential protein-protein interaction domains and ligand-binding domains in matriptase suggests that the interaction of matriptase with other macromolecules on the cell surface (such as the luminal surface of the mammary gland) may regulate its activation, inhibition, and presentation. Aberrant regulation of matriptase processing may be involved in the malignant progression of cancers.