Catalytic domain structures of MT-SP1/matriptase, a matrix-degrading transmembrane serine proteinase.

The type II transmembrane multidomain serine proteinase MT-SP1/matriptase is highly expressed in many human cancer-derived cell lines and has been implicated in extracellular matrix re-modeling, tumor growth, and metastasis. We have expressed the catalytic domain of MT-SP1 and solved the crystal structures of complexes with benzamidine at 1.3 Å and bovine pancreatic trypsin inhibitor at 2.9 Å. MT-SP1 exhibits a trypsin-like serine proteinase fold, featuring a unique nine-residue 60-insertion loop that influences interactions with protein substrates. The structure discloses a trypsin-like S1 pocket, a small hydrophobic S2 subsite, and an open negatively charged S4 cavity that favors the binding of basic P3/P4 residues. A complementary charge pattern on the surface opposite the active site cleft suggests a distinct docking of the preceding low density lipoprotein receptor class A domain. The benzamidine crystals possess a freely accessible active site and are hence well suited for soaking small molecules, facilitating the improvement of inhibitors. The crystal structure of the MT-SP1 complex with bovine pancreatic trypsin inhibitor serves as a model for hepatocyte growth factor activator inhibitor 1, the physiological inhibitor of MT-SP1, and suggests determinants for the substrate specificity.

plasmin, and cathepsin G; matrix metalloproteinases including gelatinases, interstitial collagenases, stromelysins, matrilysin, and membrane-type metalloproteinases; and lysosomal cysteine proteinases such as cathepsin B. Recently, several members of an important, emerging subfamily of serine proteinases, the type II transmembrane serine proteinases (TTSPs; reviewed in Ref. 3), have also been implicated in tumor growth and progression.
MT-SP1 (matriptase/TADG-15/suppressor of tumorigenicity 14; EC 3.4.21) was first isolated by Shi et al. (4) as a novel proteinase that was expressed by human breast cancer cells. The enzyme was initially assigned as a gelatinase, because of its gelatinolytic properties and gelatinase-like molecular weight. However, isolation and sequencing of the cDNA revealed a 683-residue multidomain proteinase with a C-terminal serine proteinase domain. The enzyme was then named matriptase to emphasize its matrix degrading properties and trypsin-like specificity (5). Independently, Takeuchi and coworkers (6) cloned and characterized a type-II membranebound trypsin-like serine proteinase from a human prostatic cancer cell line, which they called membrane-type serine proteinase 1, MT-SP1. This 855-residue proteinase contained two tandem repeats of the complement component C1r/s domain (CUB, derived from complement factor/1R-urchin embryonic growth factor/bone morphogenetic protein) and four tandem repeats of the low density lipoprotein receptor (LDLR) class A domain between the N-terminal transmembrane signal anchor and the C-terminal catalytic domain (5,6). Because the matriptase sequence reported by Lin turned out to be part of the translated MT-SP1 cDNA sequence, matriptase is likely to be a form of MT-SP1 produced by ectodomain shedding (7). Alternatively, the two cDNAs may result from alternative splicing. MT-SP1 is highly expressed in prostate, breast, and colorectal cancers in vitro and in vivo (8), and inhibition of this enzyme suppresses both primary tumor growth and metastasis in a rat model of prostate cancer (5,6). A mouse homologue was cloned by another group and called epithin (9).
The substrate specificity of MT-SP1 has been mapped using a positional scanning synthetic combinatorial library and substrate phage display (10). The preferred cleavage sequences contained Arg/Lys at P4 and basic residues or Gln at P3, small residues at P2, Arg or Lys at P1, and Ala at P1′. This specificity profile corresponds well to the cleavage sequences of recognized surface-localized protein substrates of MT-SP1 such as the proteinase-activated receptor-2 (PAR2), single-chain uPA (sc-uPA), the proform of MT-SP1, and the hepatocyte growth (scattering) factor, which have been shown in vitro and/or in vivo to be efficiently activated by MT-SP1 (10,11).
Although human breast cancer cells produce MT-SP1 primarily as the free enzyme, in human milk and normal tissues the enzyme is found in complex with an inhibitor called hepatocyte growth factor activator inhibitor 1 (HAI-1; Ref. 12). This membrane-bound inhibitor was originally isolated from human stomach carcinoma cells (13) as a 478-residue glycoprotein containing two Kunitz-type domains separated by an LDLR domain and followed by a transmembrane segment, but has subsequently been detected in several tissues. Soluble, presumably proteolytically cleaved forms of HAI-1, lacking the C-terminal hydrophobic domain, have also been reported (14). In addition to HAI-1, a smaller inhibitor (HAI-2) has been identified and characterized, which lacks the LDLR domain separating the two Kunitz-type modules in HAI-1. Site-directed mutagenesis studies suggested that the first Kunitz domain of HAI-2 is responsible for the inhibitory activity toward hepatocyte growth factor activator (15).
We have expressed and purified the catalytic domain of human MT-SP1 (MT-SP1(cd)), and we have crystallized and solved the high resolution x-ray crystal structure of this enzyme in the presence of benzamidine (Bz). Because recombinant HAI-1 was not available, we also determined the structure of the MT-SP1 complex with the Kunitz-type bovine pancreatic trypsin inhibitor (BPTI), which shares a 36% sequence identity with the first Kunitz domain of HAI-1 and is a nanomolar range inhibitor of MT-SP1. These crystal structures provide important new insights into the molecular determinants of the unique specificity of MT-SP1 not obtainable from modeling (16), and give hints about the interaction of this proteinase with the physiological inhibitor HAI-1. This information is expected to facilitate the design of potent, selective small molecule inhibitors of MT-SP1 that may yield lead compounds for the development of novel anti-cancer agents. Because of the high accessibility of the substrate binding site, the MT-SP1(cd) crystals are well suited for soaking of small molecule inhibitors facilitating the further elaboration of initial lead compounds.

EXPERIMENTAL PROCEDURES
Cloning and Purification-The human prostate adenocarcinoma cell line, PC-3, was purchased from ATCC (CRL-1435). PC-3 cells were lysed in Trizol reagent (Invitrogen, Carlsbad, CA) and total RNA was isolated according to the manufacturer's protocol. Poly(A)+ RNAs were purified using oligo(dT) beads (Oligotex; Qiagen, Valencia, CA) and subsequently converted to single-stranded cDNAs by reverse transcription using ProSTAR first-strand reverse transcriptase-PCR kit (Stratagene, La Jolla, CA) and SuperScript II RNase H− reverse transcriptase (Invitrogen). Single-stranded cDNAs from PC-3 cell RNA were subjected to PCR with sense and antisense degenerate oligonucleotide primers (sense primer, 5′-TGGRT(I)VT(I)WS(I)GC(I)RC(I)CAYTG-3′; antisense primer, 5′-(I)GG(I)CC(I)CC(I)SWRTC(I)CCYT(I)RCA(I)GHRTC-3′, where R = A or G; V = G, A, or C; W = A or T; S = G or C; Y = C or T; and H = A, T, or C). The primer sequences corresponded to two highly conserved regions in all chymotrypsin-like serine proteinases. PCR products were purified using a gel extraction kit (QIAquick gel extraction kit; Qiagen), ligated into pCR2.1-TOPO (Invitrogen), and transformed into Escherichia coli TOP10 cells (Invitrogen). To obtain additional MT-SP1 cDNA sequences, both rapid amplification of cDNA ends and gene-specific amplification reactions were performed. A human prostate Marathon-Ready cDNA (CLONTECH, Palo Alto, CA) was used to isolate part of the cDNA encoding MT-SP1. The 3′ region of MT-SP1 cDNA was successfully obtained by a 3′-rapid amplification of cDNA ends reaction using a gene-specific primer, 5′-CACCCCTTCTTCAATGACTTCACCTTCG-3′. The 5′ end of the MT-SP1 proteinase domain was obtained by a PCR amplification reaction using two MT-SP1-specific primers, 5′-TACCTCTCCTACGACTCC-3′ for the sense primer and 5′-GAGGTTCTCGCAGGTGGTCTGGTTG-3′ for the antisense primer. These fragments were subcloned into pCR2.1-TOPO.
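The degeneracy of the two primers follows directly from the code key given above. The following is a minimal sketch, not part of the original protocol, that counts how many distinct oligonucleotides each degenerate primer represents. The function names are hypothetical, inosine (I) is treated as a single fixed position, and any hyphen in a sequence is assumed to be a line-break artifact.

```python
from itertools import product

# Degenerate-base key as stated in the text; "I" (inosine) is kept as one symbol.
CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "V": "GAC", "W": "AT", "S": "GC", "Y": "CT", "H": "ATC",
    "I": "I",
}

def normalize(primer: str) -> str:
    """Remove the parentheses around inosine and any line-break hyphens."""
    return primer.replace("(", "").replace(")", "").replace("-", "")

def degeneracy(primer: str) -> int:
    """Number of distinct sequences encoded by a degenerate primer."""
    count = 1
    for base in normalize(primer):
        count *= len(CODES[base])
    return count

def expand(primer: str):
    """Yield every concrete sequence encoded by a degenerate primer."""
    pools = [CODES[base] for base in normalize(primer)]
    return ("".join(combo) for combo in product(*pools))

SENSE = "TGGRT(I)VT(I)WS(I)GC(I)RC(I)CAYTG"
ANTISENSE = "(I)GG(I)CC(I)CC(I)SWRTC(I)CCYT(I)RCA(I)GHRTC"

if __name__ == "__main__":
    print("sense degeneracy:", degeneracy(SENSE))
    print("antisense degeneracy:", degeneracy(ANTISENSE))
```

Treating inosine as a fixed symbol reflects its use as a base that pairs broadly, which keeps the count of synthesized species low while still covering variable codon positions.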
After transformation into E. coli cells, the insert DNAs were characterized by Southern blot analysis (using the internal cDNA fragment as probe) and by DNA sequence analysis. To obtain a cDNA encoding the entire proteinase domain of MT-SP1, an end-to-end PCR amplification was performed using the gene-specific primers 5′-TCTCTCGAGAAAAGAGTTGTTGGGGGCACGGATGCGGATGAG-3′ for the 5′ end and 5′-ATTCGCGGCCGCCTATACCCCAGTGTTCTCTTTGATCCA-3′ for the 3′ end. An 800-bp DNA fragment was amplified, purified, digested with XhoI and NotI, and subcloned into the Pichia pastoris expression vector, pPIC9KX. Transformation was performed in a Bio-Rad GenePulser II (voltage = 1500 V, capacitance = 50 microfarads, and resistance = 200 ohms). The screening of transformed Pichia clones for MT-SP1 expression was performed by testing clones with Spectrozyme t-PA (CH3SO4-D-HHT-Gly-Arg-pNA.HCl; American Diagnostica).
The production of multimilligram amounts of MT-SP1 was carried out by fermentation in a BioFlo 3000 fermentor (New Brunswick Scientific, NJ) using a SMD1168/pPIC9K:MT-SP1 Sac SC1 clone. The medium was inoculated with an overnight culture of the P. pastoris transformant. Cells and cell debris were removed by centrifugation, the supernatant was concentrated, and the buffer was exchanged into 50 mM Tris-HCl, 50 mM NaCl, 0.05% Tween 80, pH 8.0 (buffer A). The concentrated MT-SP1-containing solution was applied onto a 150-ml benzamidine column equilibrated with buffer A, and the column was washed with 50 mM Tris-HCl, 1.0 M NaCl, 0.05% Tween 80, pH 8.0 (buffer B), and eluted with 50 mM Tris-HCl, 1.0 M L-arginine, 0.05% Tween 80, pH 8.0 (buffer C). Fractions containing MT-SP1 activity were pooled and concentrated. The buffer was exchanged into 50 mM Na2HPO4, 125 mM NaCl, pH 5.5 (buffer D), and the partially purified MT-SP1 was passed through a Q-Sepharose Fast Flow HiTrap column (Amersham Biosciences, Inc.) pre-equilibrated with buffer D. The flow-through was collected, and the protein concentration was determined by measurement of A280 (using an extinction coefficient of 2.012 mg/A280). Purified MT-SP1 was deglycosylated with endoglycosidase H (ProZyme, 5 units/ml) and further purified on an Äkta Explorer system using a 7-ml Source 15Q anion exchange column (Amersham Biosciences, Inc.). The protein was eluted in a buffer containing 50 mM HEPES, pH 6.5, with a 0-0.33 M NaCl gradient. Fractions containing protein were pooled, and benzamidine was added to a final concentration of 10 mM. The protein purity was examined by SDS-PAGE, and the protein concentration was determined by measurement of A280.
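The A280-based concentration estimate is simple arithmetic. The sketch below assumes that the quoted factor of 2.012 mg/A280 means 2.012 mg/ml of protein per absorbance unit at a 1-cm path length; that reading, the function name, and the dilution parameter are assumptions made for illustration.

```python
def concentration_mg_per_ml(a280: float, factor: float = 2.012, dilution: float = 1.0) -> float:
    """Estimate protein concentration from absorbance at 280 nm.

    a280     -- measured absorbance of the (possibly diluted) sample
    factor   -- assumed mg/ml per absorbance unit (2.012 is the value quoted in the text)
    dilution -- fold dilution applied before the measurement (1.0 = undiluted)
    """
    if a280 < 0:
        raise ValueError("absorbance must be non-negative")
    return a280 * factor * dilution

if __name__ == "__main__":
    # A sample diluted 10-fold that reads A280 = 0.25
    print(concentration_mg_per_ml(0.25, dilution=10.0))
```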
Crystallization, Structure Determination, and Crystallographic Refinement-Plate-like crystals of the Bz-MT-SP1 complex were grown from 0.1 M Tris-HCl, pH 8.0, 1.5 M ammonium sulfate, 3% ethanol at 18°C using the hanging drop vapor diffusion technique. These crystals belong to the orthorhombic space group C222, diffract x-rays to beyond 1.3-Å resolution, and have one molecule in the asymmetric unit. The Bz-MT-SP1 crystals were transferred to 0.1 M Tris-HCl, pH 8.0, 1.5 M ammonium sulfate, 23% glycerol. A complete native data set to 1.3-Å resolution was collected from a single crystal under a nitrogen stream at 100 K using synchrotron radiation and a CCD system (MAR Research, Hamburg, Germany) at DESY, Hamburg, Germany.
These data were evaluated with the MOSFLM package (43) and loaded and scaled using SCALA from the CCP4 program suite (17). For the determination of the orientation and position of the MT-SP1 molecules in the crystals, rotational and translational searches were performed with AMoRe (18) using data from 20- to 3.5-Å resolution and a modified enteropeptidase search model (19) with all nonidentical residues reduced to Ala. A unique solution was found with a correlation factor of 39.4% and an R-factor of 46.2%; the corresponding values of the next best solution were 12.8 and 56.3%, respectively. Crystallographic refinement was done in several cycles consisting of model building performed with MAIN (20) and conjugate gradient minimization and simulated annealing using CNS (21). The target parameters of Engh and Huber (22) were used. This procedure converged rapidly, yielding a model with excellent parameters (see Table I). In the final model building/refinement cycles, water molecules were inserted at stereochemically reasonable sites, and individual restrained atomic B-values were refined. 5.1% of all reflections were omitted from the refinement to calculate the Rfree; the final R and Rfree are 18.4 and 19.3%, respectively, for all data to 1.3 Å. The whole main chain of the MT-SP1 catalytic domain is in appropriate electron density. Only a few side chains projecting out into solution are partially undefined in the electron density; the occupancy of all undefined atoms was set to zero. Modeling was performed interactively using MAIN; these models were energy refined with CNS.
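The R and Rfree values quoted above are residuals between observed and calculated structure factor amplitudes, with Rfree computed on reflections excluded from refinement. The sketch below illustrates the conventional definition, R = Σ||Fobs| − k|Fcalc|| / Σ|Fobs|, together with a random work/free partition. It is an illustration only and does not reproduce how CNS selects or scales reflections; the function names, the scale factor k, and the seed are assumptions.

```python
import random

def r_factor(f_obs, f_calc, scale=1.0):
    """Conventional crystallographic R: sum(| |Fo| - k|Fc| |) / sum(|Fo|)."""
    numerator = sum(abs(abs(fo) - scale * abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator

def split_work_free(n_reflections, free_fraction=0.051, seed=0):
    """Randomly set aside a test set of reflection indices for R-free.

    Returns (work_indices, free_indices); the free set is never used in refinement.
    """
    rng = random.Random(seed)
    n_free = round(n_reflections * free_fraction)
    free = set(rng.sample(range(n_reflections), n_free))
    work = [i for i in range(n_reflections) if i not in free]
    return work, sorted(free)

if __name__ == "__main__":
    work, free = split_work_free(1000)
    print(len(work), len(free))
    print(r_factor([100.0, 200.0], [90.0, 210.0]))
```

A gap between R and Rfree as small as the one reported for the 1.3-Å structure (18.4 versus 19.3%) is generally read as a sign that the model is not overfitted to the working reflections.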
Crystals of the BPTI-MT-SP1 complex were grown from 0.1 M Hepes, pH 6.5, 20% polyethylene glycol 4000, 2% CsCl2 at 18°C using the hanging drop vapor diffusion technique. These crystals belong to the triclinic space group P1, diffract x-rays to beyond 2.9-Å resolution, and have two molecules in the asymmetric unit. After transfer of one crystal to its mother solution containing 20% glycerol, a complete native data set to 2.93-Å resolution was collected from a single crystal under a nitrogen stream at 100 K using an Image Plate system (MAR Research). The data were processed as described above, with a combined model of the BPTI-trypsin complex (23) and our coordinates as replacement input. The procedure converged rapidly, leading to R and Rfree of 19.9 and 26.9%, respectively. 8.1% of the reflections were omitted from the refinement for the calculation of the Rfree.

RESULTS
Overall Structure of the MT-SP1 Catalytic Domain-Our cloning, expression, and purification procedure described under "Experimental Procedures" yielded multimilligram quantities of highly purified MT-SP1(cd), which in the presence of Bz formed crystals diffracting to beyond 1.3-Å resolution. MT-SP1(cd) resembles an oblate ellipsoid with diameters of 35 and 50 Å. Similar to other trypsin-like serine proteinases, the chain is folded into two adjacent six-stranded β-barrels strapped together by three trans-domain segments. The surface contains several turn structures, a 3₁₀-helix (residues 60(I) to 64, using the chymotrypsinogen numbering, see Figs. 1A and 2), and two α-helices (segments 164-172 and 235-243). The catalytic triad is located along the junction of the barrels, whereas the active site cleft runs perpendicular to this junction.
Recombinant MT-SP1(cd) consists of the B-chain of mature MT-SP1. The chain starts with Val 16 (corresponding to Val 617 (g) in the generic MT-SP1 sequence numbering; Ref. 6). Cys 122, which would be disulfide-linked with Cys 1 of the A-chain in the full-length MT-SP1 molecule, is an unpaired surface-located cysteine residue in this construct. The MT-SP1 B-chain contains three disulfide bridges (Cys 42-Cys 58, Cys 168-Cys 182, and Cys 191-Cys 220) that are also present in most other trypsin-like serine proteinases. However, based on the electron density of the 1.3-Å Bz-MT-SP1(cd) structure, the disulfide bridge Cys 42-Cys 58 is present in only about half of the crystallized protein molecules. In the other half, these cysteines clearly exist in the reduced form, with Cys 58 Sγ located in the same position as in the Cys 42-Cys 58 disulfide, and Cys 42 Sγ rotated toward the interior of the molecule between Tyr 59 Oγ and the Leu 33 side chain, thereby avoiding steric hindrance with the adjacent Cys 58 thiol group. A similar partial opening of the Cys 42-Cys 58 disulfide bridge has recently been found in a high resolution structure of the recombinant human uPA catalytic domain (24).
The α-ammonium group of the N-terminal Val 16 of MT-SP1 forms the highly conserved internal salt bridge with the side chain carboxylate of Asp 194, stabilizing the substrate binding site and the active site in the catalytically active conformation. In contrast to most (chymo)trypsin-like proteinases, the whole C-terminal region of MT-SP1, including the last residue, Val 244, is fully defined by electron density. After the conserved C-terminal α-helix, the MT-SP1 polypeptide makes a sharp turn at Gly 243 and forms an as yet unobserved surface-located salt bridge with the Arg 235 guanidyl group via its C-terminal carboxylate group.
An optimal superposition with several related serine proteinases reveals highest topological similarity of MT-SP1(cd) with the catalytic domain of another membrane-type serine proteinase, enteropeptidase/enterokinase (Fig. 1B). 222 Cα atoms of topologically equivalent residues are found within a 2.0-Å distance, corresponding to a root mean square deviation of 0.70 Å, with 109 of these topologically equivalent residues being identical. The next best fit, with a 0.73-Å root mean square deviation for 212 Cα atoms, is observed with bovine trypsin (25), followed by bovine chymotrypsin (26) and human thrombin (27). The topological equivalence with chymotrypsin(ogen) formed the basis for the sequence alignment and the chymotrypsinogen numbering of the MT-SP1 catalytic domain used in this paper (Fig. 2). This alignment requires a nine-residue insertion between residues Ile 60 and Pro 61, single-residue insertions behind residues Gly 184, Glu 186, Ala 204, and Ala 221, and single-residue deletions at positions 149 and 218. Inserted residues are marked by suffixes following the residue number of the preceding common residue.
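The similarity measure used above, the number of equivalent Cα atoms within 2.0 Å and their root mean square deviation, can be sketched as follows. The sketch assumes that the two structures have already been optimally superposed and that equivalent residues are supplied as paired coordinates; the function name and the toy coordinates are hypothetical, and the superposition itself is not shown.

```python
import math

def rmsd_within_cutoff(coords_a, coords_b, cutoff=2.0):
    """Count paired atoms closer than `cutoff` (in Å) and compute their RMSD.

    coords_a, coords_b -- equal-length lists of (x, y, z) tuples for
    topologically equivalent C-alpha atoms of two superposed structures.
    Returns (number_of_pairs_within_cutoff, rmsd_over_those_pairs).
    """
    squared = []
    for a, b in zip(coords_a, coords_b):
        d2 = sum((p - q) ** 2 for p, q in zip(a, b))
        if d2 <= cutoff ** 2:
            squared.append(d2)
    if not squared:
        raise ValueError("no atom pairs within the cutoff")
    return len(squared), math.sqrt(sum(squared) / len(squared))

if __name__ == "__main__":
    # Toy coordinates: two close pairs and one pair far outside the cutoff
    a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 5.0, 5.0)]
    b = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (9.0, 9.0, 9.0)]
    print(rmsd_within_cutoff(a, b))
```

Because pairs beyond the cutoff are excluded, the RMSD and the pair count must be read together: a low RMSD over few atoms indicates less overall similarity than the same RMSD over many.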
Loops Surrounding the Active Site Cleft-The narrow substrate specificity of MT-SP1 arises from unique structural determinants of its active site cleft that is shaped in part by the surrounding surface loops. In the following brief description, these loops (defined according to the residue number of their central residue) will be addressed in an anticlockwise manner with respect to the standard orientation displayed in Fig. 1A.
To the east of the catalytic triad, the rigid 37 loop projects out of the molecular surface of MT-SP1. This loop contains two of the "zymogen triad" residues, Ser 32 and His 40, which, together with Asp 194, would stabilize the inactive zymogen-like conformation of the proenzyme (28,29). Around Gln 38, this loop deviates markedly from the path followed in most other chymotrypsin-like proteinases (Fig. 1B). This conformation seems to be mainly stabilized by the Gln 38 side chain, which is held in an exposed position via hydrogen bonds made through its terminal carboxamide group with Tyr 60 (G) Oγ and with the π-electron system of the Phe 60 (E) phenyl ring.
The most striking feature of MT-SP1 is the unusually large 60 insertion loop, which is of the same length (nine additional residues) and exhibits a similar ␤-hairpin conformation as the corresponding loop in thrombin (27), but is oriented differently (Fig. 1B). The eight residues of MT-SP1 between Tyr 59 and  catalytic residues Asp 102 and His 57 from the solvent, delimits the S2 subsite. Unique to MT-SP1 is the well defined benzyl side chain of Phe 97 , which extends away from the apex of the 99 loop partially shielding the circularly arranged carbonyl groups of Asp 96 , Phe 97 , and Thr 98 (see Fig. 3).
The "autolysis" or 145 loop of MT-SP1 is one residue shorter and less exposed than the corresponding segment of chymotrypsin. This loop encircles the extended side chain of His 143 and forms the southern boundary or "floor" of the active site cleft (Fig. 4). The presence of Gly 151 in this loop is noteworthy, because this would allow for the accommodation of bulkier P2′ side chains of peptide substrates.
In the pancreatic serine proteinases, the 70-80 loop forms the calcium binding site, with the carboxylates of Glu 70 and Glu 80 coordinating the calcium ion. In MT-SP1, both positions are occupied by aliphatic residues (Leu 70 and Val 80), hence rendering the catalytic domain calcium-independent. Together with the proximal side chain parts of Arg 76 and Ala 77 (A), the hydrophobic side chains at positions 70 and 80 clamp both loop ends together in a manner similar to that for the metal ion in the calcium-containing serine proteinases.
Active Site and Substrate Binding Sites of MT-SP1-The residues of the active site triad, Ser 195, His 57, Asp 102, and other catalytic elements such as the oxyanion hole created by the main chain nitrogens of Gly 193 and Ser 195 are arranged in the active site cleft exactly as in trypsin and chymotrypsin. The specificity pocket S1, which opens to the west of Ser 195, is bordered by segments Val 213-Gly 220, Ser 190-Ser 195, Pro 225-Tyr 228, and the Cys 191-Cys 220 disulfide bridge. The side chains of all MT-SP1 residues lining the interior of this pocket are virtually superimposable with those of the corresponding residues in trypsin. This applies not only to Asp 189 at the bottom of the pocket, determining the specificity for basic residues, but also to Ser 190, making Lys P1 residues equally acceptable as Arg, and residues forming the inner wall (Gly 226, Tyr 228, and Val 213). The phenyl group of the bound benzamidine molecule is sandwiched between the parallel peptide groups Trp 215-Gly 216 and Cys 191-Gln 192, whereas its amidino group opposes the Asp 189 carboxylate at the bottom of the pocket forming a two-O/two-N salt bridge, with the distal nitrogen atom additionally hydrogen-bonded to Gly 219 O and to a buried solvent molecule.
An extended S2/S4 pocket based on the indole moiety of Trp 215 is located above the S1 pocket. This pocket is delimited to the east by the flat side of the His 57 imidazole ring and to the north by the flat side of the Phe 99 benzyl moiety, and shaped to accept small hydrophobic P2 residues (Fig. 4). The adjacent S4 pocket is bordered by the carbonyl groups of Asp 96, Phe 97, and Thr 98, which are partially shielded from solvent by the benzyl group of Phe 97, a residue rarely present at this position in (chymo)trypsin-like proteinases. Because of the presence of the π-electron system of Phe 99 and the nearby Asp 96, this S4 pocket is well suited to accommodate positively charged side chains. It is noteworthy that the benzyl group of Phe 97 is fully defined in Bz-MT-SP1, in striking contrast to the equivalent Arg 97 residue in enteropeptidase (19). The hydrophobic S1′/S3′ cavity to the east of Ser 195 is centered on the Cys 42-Cys 58 disulfide bridge, and bordered by the isobutyl side chain of Ile 41 and the aromatic side chains of Tyr 60 (G) and Trp 64. These positions can thus be preferentially filled with large hydrophobic P1′ or P3′ side chains, but also the adjacent S2′ pocket, lined by Gln 192 and His 143, gives space for hydrophobic residues (Fig. 4).
Interaction with Kunitz-type Inhibitors-Because the prototypic Kunitz-type inhibitor BPTI is closely related to the first Kunitz domain of HAI-1 at the sequence level, we crystallized the complex of BPTI with MT-SP1 and solved its structure at 2.9-Å resolution (Figs. 5 and 6). BPTI docks into the concave substrate binding surface of MT-SP1 via the reactive site loop (Thr 11 (I) to Ile 18 (I), with the Lys 15 (I)↓Ala 16 (I) scissile bond) and the secondary binding segment (Gly 36 (I) to Arg 39 (I)), similar to the prototypical trypsin complex (23). The reactive site loop of BPTI runs anti-parallel to MT-SP1 segments Ser 214-Asp 217 and His 40-Ile 41, with the P1-Lys 15 (I) side chain extending into the S1 pocket and interacting via its ε-ammonium group with the Asp 189 carboxylate. Because of the main chain kink at P3-Pro 13 (I), which is typical for Kunitz-type inhibitors, the P3 pyrrolidine ring nestles into the S4 depression.
The Phe 99 side chain, if positioned as observed in Bz-MT-SP1, would clash with the Cys 14 (I)-Cys 38 (I) disulfide bridge. In the BPTI complex, this clash is avoided by a rotation of the Phe 99 benzyl group away from this site, in this way enlarging the S2 cavity and making it accessible for the disulfide bridge (Fig. 5). The Arg 39 (I) side chain extends toward the 99 loop of MT-SP1, with its guanidyl group stacking between the Phe 97 and the Phe 99 benzyl moieties and hydrogen bonding to the Phe 97 carbonyl oxygen much more favorably than in the BPTI-trypsin complex, because of the additional shielding by the Phe 97 side chain. On the primed side of the reactive center loop, the P1′-Ala 16 (I) and the P3′-Ile 18 (I) side chains form a hydrophobic knob that interacts with the hydrophobic S1′/S3′ cavity.
The characteristic protruding 60 loop of MT-SP1 provides a large extra surface to make a number of favorable new contacts with BPTI, and hence with any other canonically bound Kunitz domain (Fig. 5). A comparison with Bz-MT-SP1 shows that BPTI induces some conformational changes in the 60 loop, to allow for instance the formation of a salt bridge between Asp 60 (B) (MT-SP1) and Arg 20 (I), and of charged hydrogen bonds between Arg 60 (C) and the carbonyl groups of Lys 41 (I) and Asn 44 (I). The well defined 60 loop of MT-SP1 thus exhibits some capability to adapt to the rigid Kunitz-type inhibitor. In this respect it strongly differs from the also exposed but much more rigid thrombin 60 loop, which delimits the S2 cavity and normally prevents access of bulkier substrates and protein inhibitors (30). The 60 loop of MT-SP1, in contrast, does not impair BPTI binding, but instead strengthens it by forming a number of additional favorable interactions (Figs. 5 and 6).

DISCUSSION
The catalytic domain of human MT-SP1 reveals the overall fold of a (chymo)trypsin-like serine proteinase, but displays unique properties such as the hydrophobic/acidic S2/S4 subsites and an exposed 60 loop, which considerably affect its substrate recognition and binding properties. The MT-SP1 polypeptide fold deviates notably from that of the serine proteinase tryptase, however, which exhibits six novel surface loops through which the tryptase monomers form the intermolecular interactions that stabilize the unusual tetramer structure (31). Thus, MT-SP1 is not closely related to tryptase, and the name "matriptase" is therefore potentially confusing. However, because enterokinase and hepsin are members of the "MT-SP" family that were discovered and described before matriptase, assignment of the name MT-SP1 to matriptase may also produce controversy. 
One potential solution to these nomenclature issues would be to adopt either the "MT-SP" or the TTSP (3) terminology that was suggested previously by others, and to assign numbers to individual family members based on their date of discovery.
MT-SP1 can cleave selected synthetic substrates as effectively as trypsin, but exhibits a significantly more restricted specificity than trypsin (10). This may reflect, on the one hand, the near identity of the S1 specificity pockets of these two enzymes, presumably allowing both tight binding of substrates and optimal presentation of their scissile peptide bonds to the enzyme active site, and, on the other hand, clear structural distinctions between the extended binding subsites of the two proteinases. With respect to shape and chemical composition, the S1 pocket of MT-SP1 (and trypsin) provides good complementarity for Lys as well as Arg P1 residues. The efficient accommodation of P1-Lys residues, which has been demonstrated experimentally (5,10), is facilitated by Ser 190, whose side chain provides an additional hydrogen bond acceptor to stabilize the buried ε-ammonium group. P1-Arg residues are accommodated equally well, because of the overall better space filling by the guanidinium group (30).
The hydrophobic S2 groove of MT-SP1 is shaped to accommodate small to medium-sized hydrophobic side chains of P2-L-amino acids, with a wide hydrophobic exit toward the bulk solvent that would expose longer and more polar side chains. In addition, the rotation of the Phe 99 benzyl group upon BPTI binding suggests that the S2 subsite of MT-SP1 is not rigid. This observation is consistent with experimental findings from positional scanning (10), which indicated that MT-SP1 accepts a broad range of amino acids in the P2 position.
Craik and co-workers (10) have reported previously that MT-SP1 exhibits a strong preference for peptide substrates that contain Arg or Lys at P4 position, and a certain preference for either Gln or basic residues at P3. This interesting specificity may be mediated by electrostatic interactions with the acidic side chains of Asp-217 and/or Asp-96 (see Fig. 4), which could favorably pre-orient specific basic peptide substrates as they approach the enzyme active site cleft. Because canonically binding substrates align antiparallel to the Ser 214 -Asp 217 segment in an extended conformation, the side chains of P4-Arg or Lys residues presumably extend toward the 99 loop with their guanidyl or ammonium groups not reaching the Asp 96 carboxylate directly but forming a charged hydrogen bond with Phe 97 O. In addition, they can favorably interact with the overall negative potential created by the 96-, 97-, and 98-carbonyls, in full agreement with the preference of MT-SP1 for basic P4 residues. This MT-SP1 S4 subsite is reminiscent of the corresponding region of coagulation factor Xa, which has also been shown to function as a cation binding site (32). The Arg 39 (I) side chain of BPTI (see Fig. 5), although provided by the secondary binding segment of the inhibitor, is a nice example of such interactions; in the BPTI-MT-SP1 complex, the hydrogen bond between the Arg 39 (I) guanidyl and the Phe 97 O is shielded from solvent by the unique Phe 97 benzyl side chain (see Figs. 4 and 5), which would significantly strengthen these interactions.
In canonically bound peptide substrates, the side chain of a P3 residue projects out of the active site cleft, where it can hydrogen-bond the carboxamide group of Gln 192. It is also possible, however, that a bound substrate could adopt a "kinked" conformation at the P3 position, as seen for Kunitz-type inhibitors (Fig. 5). In this alternative conformation, the P3 side chain extends into the S4 subsite, i.e. toward the 99 loop. To allow the guanidyl or the ammonium group of a P3-Arg or Lys residue, bound to MT-SP1 in the alternative conformation, to form direct hydrogen bonds with Phe 97 O, the substrate main chain around P3 would have to rotate and thereby weaken considerably the inter-main chain hydrogen bonds to Gly 216. In either conformation, however, a basic P3 side chain would interact favorably with the negative potential of the MT-SP1 S4 pocket (Fig. 4). Long range electrostatic interactions between side chain charges and complementary surface potentials are often found in protein-protein complexes. For instance, in the thrombin complexes with hirudin and other protein inhibitors, the removal of the charges of acidic residues not involved in direct salt bridges has been shown to affect the electrostatic interactions strongly (33,34). The relatively low probability of the simultaneous occurrence of Arg/Lys residues at P3 and P4 in good MT-SP1 substrates (10) would then be explained by mutual charge compensation and exclusion from the same (S4) site.
The specificity of MT-SP1 differs substantially from that of trypsin, because MT-SP1 does not indiscriminately cleave peptide substrates at accessible Lys or Arg residues, but instead requires recognition of additional residues surrounding the scissile peptide bond. This requirement for recognition of an extended primary sequence for efficient catalysis of substrates suggests that MT-SP1 is a relatively specific proteinase that may play a regulatory role (5,6). Recognition of an extended primary sequence appears to be also required for efficient cleavage of macromolecular substrates by MT-SP1. Efficient auto-activation of MT-SP1 entails recognition and cleavage of an Arg-Gln-Ala-Arg P4-P1 target sequence. MT-SP1 can also efficiently activate the proteinase-activated receptor-2 (PAR2), sc-uPA (10), and the hepatocyte growth factor/scatter factor (11). These extracellular surface-localized proteins display the P4 to P1 target sequences Ser-Lys-Gly-Arg, Pro-Arg-Phe-Lys, and Lys-Gln-Gly-Arg, respectively, which match closely the MT-SP1 cleavage specificity requirements observed for small peptidic substrates (10). Another indication of the substrate specificity of MT-SP1 is that the enzyme does not activate proteins closely related to these substrates, such as PAR-1, PAR-3, PAR-4, and plasminogen, that do not display target sequences matching the extended MT-SP1 specificity near the scissile bond.
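The agreement between these macromolecular target sequences and the peptide-derived profile can be checked with a deliberately simplified rule: Arg or Lys at P1, and a basic residue at either P3 or P4 but not both. The sketch below encodes only that heuristic. It ignores the P2 and P1′ preferences and the Gln preference at P3, the function names are hypothetical, and the Gly-Gly-Gly-Arg sequence is an invented negative example rather than a real substrate.

```python
# Three-letter to one-letter codes for the residues that occur in the examples.
THREE_TO_ONE = {
    "Arg": "R", "Lys": "K", "Gln": "Q", "Ala": "A",
    "Ser": "S", "Gly": "G", "Pro": "P", "Phe": "F",
}
BASIC = {"R", "K"}

def parse_p4_to_p1(sequence: str):
    """Split a hyphenated P4-P3-P2-P1 string such as 'Arg-Gln-Ala-Arg'."""
    residues = [THREE_TO_ONE[name] for name in sequence.split("-")]
    if len(residues) != 4:
        raise ValueError("expected exactly four residues (P4 to P1)")
    return residues

def fits_simplified_profile(sequence: str) -> bool:
    """Basic P1, plus a basic residue at exactly one of P4 and P3."""
    p4, p3, _p2, p1 = parse_p4_to_p1(sequence)
    return p1 in BASIC and ((p4 in BASIC) != (p3 in BASIC))

# P4-P1 sequences quoted in the text
TARGETS = {
    "MT-SP1 auto-activation": "Arg-Gln-Ala-Arg",
    "PAR2": "Ser-Lys-Gly-Arg",
    "sc-uPA": "Pro-Arg-Phe-Lys",
    "hepatocyte growth factor": "Lys-Gln-Gly-Arg",
}

if __name__ == "__main__":
    for name, seq in TARGETS.items():
        print(name, seq, fits_simplified_profile(seq))
    # Invented control lacking a basic P3/P4 residue
    print("control", fits_simplified_profile("Gly-Gly-Gly-Arg"))
```

The exclusive-or condition mirrors the observation that basic residues rarely occupy P3 and P4 simultaneously in good substrates, which the structural analysis attributes to competition for the same S4 site.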
Because MT-SP1 has been found co-localized with sc-uPA in several cell types, it has been suggested that MT-SP1 may be a physiologically relevant activator of this proteinase zymogen (10). uPA plays an important role in angiogenesis and/or tumor progression and can also activate both plasminogen and matrix metalloproteinases (35). The latter, in turn, play important roles in matrix degradation and re-modeling, events that are required both during angiogenesis and tumor invasion and metastasis. In addition, MT-SP1 can directly activate hepatocyte growth factor, a protein that promotes cell growth as well as angiogenesis; therefore, it may play both direct and indirect roles in cell growth and migration and, when improperly regulated, may contribute to tumor angiogenesis, growth, and progression.
In normal tissues, the proteolytic activity of proteinases is carefully controlled, localizing their action in time and space. As mentioned above, HAI-1, a type-II transmembrane protein containing two Kunitz-type domains, appears to be the primary physiological inhibitor of MT-SP1 (12). Based on the structure of our BPTI-MT-SP1 complex, we have modeled the complex between MT-SP1 and the first Kunitz domain of HAI-1, which is 36% identical to BPTI and appears more likely to exhibit high affinity for the MT-SP1 active site than the second HAI-1 Kunitz domain. This model suggests that a large number of favorable interactions could form between the first HAI-1 Kunitz domain and MT-SP1, both between the Gly 12 (I)-Arg 13 (I)-Cys 14 (I)-Arg 15 (I)↓Gly 16 (I)-Ser 17 (I)-Phe 18 (I) reactive site loop (using the BPTI nomenclature, with Arg 15 (I)↓Gly 16 (I) representing the scissile bond) and the active site cleft, and also at secondary interaction sites such as those made by the 60 loop. On the primed side, the first HAI-1 Kunitz domain possesses an uncommon P3′-Phe 18 (I), which could, because of the small P1′-Gly residue, be nicely packed in the large hydrophobic S1′/S3′ pocket of MT-SP1 (Fig. 4). The side chain of the P3-equivalent Arg 13 (I) is expected to extend into the S4 subsite, where its guanidyl group would make favorable electrostatic interactions with the 99 loop carbonyls, similar to those observed for Arg 39 (I) in the BPTI complex (see Fig. 5).
The putative Gly 12 (I)-Leu 13 (I)-Cys 14 (I)-Lys 15 (I)↓Glu 16 (I)-Ser 17 (I)-Ile 18 (I) reactive site loop of the second Kunitz domain of HAI-1, in contrast, does not match the reported substrate specificity of MT-SP1. The second HAI-1 domain does, however, possess a number of negatively charged residues in its C-terminal α-helix that could form favorable electrostatic interactions with basic surface residues of MT-SP1 that map in or near the region corresponding to the anion binding exosite I of thrombin (such as Arg 75, Arg 83, Arg 85, and Lys 110; see Fig. 6). Such an additional exosite binding of the second HAI-1 domain accompanying the interaction of the first domain with the MT-SP1 active site would considerably increase the affinity and specificity of HAI-1 for MT-SP1. A similar cooperative action with consequent increase in affinity and specificity has been previously observed in the complex between thrombin and ornithodorin, a two-domain Kunitz-type inhibitor derived from the blood-sucking tick Ornithodoros moubata (36), where the first and the second Kunitz domains interact (noncanonically, however) with the active site and electrostatically with exosite I, respectively, of thrombin.
The role of the four LDLR domains that precede the catalytic domain remains unclear, but they have been implicated in mediating interactions with other membrane or membrane-associated proteins (5,6). The LDLR (4) domain of MT-SP1 was modeled (Fig. 6) based on the structure of the fifth low density lipoprotein class A binding domain of LDLR (37), which shares a conserved disulfide bonding pattern and 45% amino acid identity with LDLR (4). In full-length MT-SP1, the last cysteine residue (Cys 604 (g)) of LDLR (4) is separated by only one amino acid (Asp 605 (g)) from the first residue (Cys 606 (g) = Cys 1) of the catalytic domain, which, in turn, forms an intradomain disulfide bridge with Cys 122. This implies close proximity of the two modules. A careful inspection of the electrostatic potentials suggests a distinct rotational orientation of LDLR (4) and the catalytic domain relative to each other, which would create both favorable interactions between electrostatic potentials of the two domains and a good steric fit of the two complementary surfaces. Our docking experiment predicts four interdomain salt bridges (Glu 593 (g)-Arg 206, Lys 592 (g)-Asp 125, Asp 605 (g)-Arg 119, and Asp 605 (g)-Lys 602 (g)) and charged hydrogen bonds from Lys 239 and Arg 235 to the 589(g)-590(g) backbone. The LDLR (4) domain of MT-SP1 carries all four acidic side chains (of Asp 590 (g), Asp 594 (g), Asp 600 (g), and Glu 601 (g)) engaged in calcium coordination in the fifth low density lipoprotein-A domain of LDLR (37). The suggested association of LDLR (4) with the catalytic domain does not directly involve this putative calcium binding site, which would, however, be located close to the interface. The location of the LDLR (4) domain on the proteinase surface opposite to the active site cleft allows unrestricted substrate or inhibitor binding to MT-SP1.
It has only recently been appreciated that TTSPs represent an important, emerging subfamily of the (chymo)trypsin enzyme family. With the exception of enterokinase, an important digestive enzyme with a unique substrate specificity, the biological roles of individual members of this recently discovered enzyme subfamily have not yet been unambiguously established. The first review of this new class was published earlier this year and discussed seven human TTSPs (3). As described by these authors, the majority of these human proteinases have been associated with tumor cells and/or cell growth. Because both cell growth and tumor progression (including angiogenesis) are expected to require localized activation of growth factors and degradation and/or remodeling of the extracellular matrix, it seems likely that these processes require cell-associated proteolytic activities. Other membrane-associated proteinases such as a disintegrin and metalloproteinase (ADAM), matrix metalloproteinases, and uPA have previously been implicated in these processes, and it is conceivable that specific members of the TTSPs will also contribute to these key processes. The development of highly potent and selective inhibitors of individual TTSPs, therefore, may represent an exciting new strategy to discover compounds with anti-angiogenic or antitumor activities.
MT-SP1 has been implicated in the progression of prostate cancer in a rat model, and our solution of the high resolution structures of the proteinase domain, in complex with benzamidine or BPTI, is an important first step toward the design of potent selective inhibitors. Because the proteinase active site is fully solvent-exposed in the Bz-MT-SP1 crystals, as demonstrated by initial soaking experiments (data not shown), small molecule inhibitors can be soaked into these crystals, thereby facilitating rapid progress in structure-based drug design efforts. In addition, the structures reported here can be used to model the structure of other cancer-associated type II transmembrane serine proteinases, and may therefore facilitate efforts to find potent, specific inhibitors of those important proteinases as well. First homology-based inhibitor design studies have been undertaken by Enyedy et al. (16), based on a thrombin model, which led to the development of selective bis-benzamidine inhibitors. This work showed that there is sufficient structural difference between MT-SP1 and closely related serine proteinases for the development of potent and selective inhibitors. Small molecule inhibitors of TTSPs can be used not only to discover the biological and pathological roles of specific members of this intriguing new enzyme subfamily but, because of the strong association of these enzymes with tumor cells, can also be tested as potential lead anticancer compounds.