Molecular Analysis of Ulilysin, the Structural Prototype of a New Family of Metzincin Metalloproteases

The metzincin clan encompasses several families of zinc-dependent metalloproteases with proven function both in physiology and pathology. They act either as broad spectrum protein degraders or as sheddases, operating through limited proteolysis. Among the structurally uncharacterized metzincin families are the pappalysins, of which the most thoroughly studied member is human pregnancy-associated plasma protein A (PAPP-A), a heavily glycosylated 170-kDa multidomain protein specifically cleaving insulin-like growth factor (IGF)-binding proteins (IGFBPs). Proulilysin is a 38-kDa archaeal protein that shares sequence similarity with PAPP-A but encompasses only the pro-domain and the catalytic domain. It undergoes calcium-mediated autolytic activation, and the mature protein adopts a three-dimensional structure with two subdomains separated by an active site cleft containing the catalytic zinc ion. This structure is reminiscent of human members of the adamalysin/ADAMs (a disintegrin and a metalloprotease) family of metzincins. A bound dipeptide yields information on the substrate specificity of ulilysin, which specifically hydrolyzes IGFBP-2 to -6, insulin, and extracellular matrix proteins but not IGFBP-1 or IGF-II. Accordingly, ulilysin has higher proteolytic efficiency and a broader substrate specificity than human PAPP-A. The structure of ulilysin represents a prototype for the catalytic domain of pappalysins.

Insulin-like growth factors (IGF)-I and -II regulate human somatic growth and development. Their activity is also correlated with several diseases, including atherosclerosis, cardiovascular disease, diabetes, and cancer (1). Most circulating IGF molecules are sequestered in complexes with soluble IGF-binding proteins (IGFBP-1 to -6) (2). IGFBPs antagonize binding of IGFs to their receptors because of their much higher affinity, and they are thus carriers, mediators, and reservoirs of IGFs (3). IGFs are released from these complexes through proteolytic inactivation of the IGFBPs, which is mediated by IGFBP proteases, including serine proteinases, cysteine proteinases, and metalloproteases (MPs). Among these, human PAPP-A specifically inactivates IGFBPs (4-6). This MP was originally identified as an antigen in human pregnancy plasma (7). It is ubiquitously expressed and plays central roles in ovarian follicular development, myogenesis, human embryo implantation, and wound healing (8). Mature PAPP-A is a glycosylated multidomain protein of 1,547 residues that specifically hydrolyzes human IGFBP-4 in an IGF-dependent manner. Only human IGFBP-5 and bovine and porcine IGFBP-2 have been identified as further substrates (6). In addition to its proteolytic domain, three Lin12-Notch repeats (LNR-1, -2, and -3), modules that regulate ligand-induced proteolytic cleavage of the Notch receptor, and five complement control protein modules (CCP1-5) have been identified (9). Human PAPP-A is the founding member of the pappalysin family of MPs, which further comprises the paralogue PAPP-A2 (10) and has been included in the metzincin clan of MPs (5).
This clan encompasses protease families containing an extended zinc-binding consensus sequence (ZBCS), HEXXHXXGXXH/D, which comprises three zinc ligands (the two histidines of the HEXXH segment and the final histidine or aspartate) and a glutamate, which acts as a general base. Currently, representative three-dimensional structures of six different proteinase families bearing this motif (the astacins, adamalysins/ADAMs, serralysins, matrixins, snapalysins, and leishmanolysins) show, despite negligible sequence similarity, topological elements in common, among them a methionine-containing 1,4-β-turn called the Met turn. These two features, the ZBCS and the Met turn, gave rise to the name of the clan. They are linked by variable connecting segments that are characteristic for each constituting family (11-13). Interestingly, members of the adamalysin/ADAM (ADAM-9, -12, and -28) and the matrix metalloprotease (MMP) families (MMP-1, -2, -3, -7, -9, -11, and -19) have been identified as IGFBP proteases (14,15). These two latter families have previously been associated with the fate of other growth factors and derived pathologies. They mediate shedding of the ectodomains of membrane-anchored growth factors, cytokines, and receptors and thus increase their circulating forms (16).
We have identified a series of novel potential pappalysins through bioinformatic searches and studied a potential orthologue from Methanosarcina acetivorans (SwissProt protein sequence data base access code Q8TL28). This was the only archaeal form found, for which we hereby propose the name "ulilysin." We present the molecular analysis of this fully functional prokaryotic protease and discuss its mechanism of activation, as well as implications for other members of the pappalysin family.
Cysteine-to-alanine mutants were constructed, because the wild-type protein tended to form aggregates that hampered reproducibility in crystallization. With the mutation at position 269 (Fig. 1), crystallization was reproducible. This variant was expressed in Escherichia coli BL21 Star™ cells with plasmid pET-28a, which attaches an N-terminal His6 tag and a thrombin cleavage site. A selenomethionine variant of proulilysin was obtained analogously, except that cells were grown in minimal medium containing selenomethionine and amino acids. The bacterial cell cultures were collected and centrifuged, the pellets were resuspended, and the cells were disrupted by means of a cell disruptor before centrifugation. The supernatant was subjected to protein purification through nickel-nitrilotriacetic acid affinity chromatography, and the polyhistidine tag was removed with thrombin. The partially purified protein was further purified by anion exchange fast protein liquid chromatography. Fractions containing the tag-free 38-kDa proulilysin protein were identified by SDS-PAGE, pooled, and subjected to final size exclusion fast protein liquid chromatography. The central fractions of the peak were mixed and concentrated by centrifugal filter devices. The concentrated protein samples (5-20 mg/ml) were stored at 4°C.
Biochemical Studies in Vitro-If not otherwise stated, the proteolytic assays were performed at 38-kDa proulilysin and 29-kDa ulilysin concentrations of 0.45 and 0.65 mg/ml, respectively, in 50 mM Tris·HCl, pH 7.5, 5 mM CaCl2 at 37°C for 2-4 h and at (pro)protease:substrate weight ratios of 1:100 or 1:200. The proteolytic capacity of ulilysin was tested on several substrates (Table 1). For the collagen substrates the buffer was 100 mM Tris·HCl, pH 7.5, 0.2 M NaCl. In the case of azosubstrates, absorbance was monitored at 440 nm. IGFBP-1 to -6 were tested at 1:300 and IGF-II at 1:100 at 20°C for 4 h. Casein and gelatin zymography was performed according to the manufacturer's instructions on precast Tris-Tricine Bio-Rad Ready Gels (with either 10% gelatin or 12% casein). The optimum pH for ulilysin activity was determined with the azocasein proteolysis assay (1:200) at 37°C using 50 mM Bis-Tris (pH 5-7) or Hepes (pH 7-9). The same assay was used to study activation of proulilysin in the presence of 5 mM CaCl2 (in 50 mM Bis-Tris, pH 6) and to assess inhibition by standard protease inhibitors and by synthetic small molecule MMP inhibitors (Table 2). The oligomerization state of proulilysin and ulilysin (at 0.5-1.8 mg/ml) in solution was assessed by size exclusion chromatography in a calibrated Superdex 75 HR 10/30 column. N-terminal Edman degradation, enzymatic digestion, and mass spectrometry analyses were performed in collaboration with the Proteomics Unit of the Technical and Scientific Service of the Barcelona Science Park and the Laboratory of Oncology (Vall d'Hebron Hospital, Barcelona, Spain).
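For comparison with other studies, the mass concentrations quoted for the assays can be converted to approximate molar concentrations. The following is a minimal sketch, assuming the nominal masses of 38 kDa (proulilysin) and 29 kDa (ulilysin) given in the text; the function name is illustrative and not part of the original protocol.

```python
def molar_conc_uM(mg_per_ml: float, mass_kda: float) -> float:
    """Convert a protein mass concentration (mg/ml) to micromolar.

    mg/ml is numerically equal to g/l; dividing by the molar mass in g/mol
    gives mol/l, and multiplying by 1e6 gives micromolar.
    """
    if mass_kda <= 0:
        raise ValueError("molecular mass must be positive")
    return mg_per_ml / (mass_kda * 1000.0) * 1e6


# Nominal values from the assay description (masses are approximate).
proulilysin_uM = molar_conc_uM(0.45, 38.0)
ulilysin_uM = molar_conc_uM(0.65, 29.0)
print(f"proulilysin: {proulilysin_uM:.1f} uM, ulilysin: {ulilysin_uM:.1f} uM")
```

With these nominal masses, the two preparations correspond to roughly 12 and 22 µM, respectively.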
Crystallization of Ulilysin, Structure Solution, and Refinement-Orthorhombic P2₁2₁2 crystals with two molecules in the asymmetric unit were obtained from sitting drops consisting of full-length C269A 38-kDa proulilysin (5 mg/ml in 30 mM Tris·HCl, pH 7.5, 2 mM dithiothreitol, 100 mM NaCl) and reservoir solution (18% polyethylene glycol 8000, 0.1 M MES, pH 6.5, 0.2 M CaCl2) after several weeks at 20°C. Mass spectrometry analysis and N-terminal sequencing of the crystallized protein revealed a molecular mass of 28,885 ± 50 Da and that the N terminus started at position Arg61, thus corresponding to activated 29-kDa ulilysin. The crystal structure was solved employing the selenomethionine derivative, which crystallized in the same conditions as the native protein. A three-wavelength multiple-wavelength anomalous diffraction experiment was carried out at the European Synchrotron Radiation Facility Beamline BM16 (Grenoble, France). Furthermore, a high resolution data set was collected from the same crystal. Diffraction data were collected on a MAR CCD detector and processed with MOSFLM (17); they were then scaled, reduced, and merged with SCALA (18) within the CCP4 suite (19) (Table 3). The multiple-wavelength anomalous diffraction data enabled identification of 10 of the 12 selenium sites present (6/monomer) with XPREP/SHELXD (20), and the phases were computed with SHARP (21). These phases gave rise to interpretable electron density maps. Subsequently, manual model building using TURBO-Frodo (22) alternated with crystallographic refinement with REFMAC5 (23) within CCP4, until the final model was obtained. This model contains protein residues Arg61-Ala322. Furthermore, one zinc (Zn999) and two calcium cations (Ca998 and Ca997) were assigned to each molecule present in the asymmetric unit based on the ion coordination spheres and the distance and geometry of the ligands.
B-factors comparable to those of the bound atoms and the absence of positive or negative difference electron density after crystallographic refinement further supported this assignment, as did the requirement of both zinc and calcium for activity. All of the protein residues were found in the most favored and additionally allowed regions of a Ramachandran plot. Each polypeptide chain displayed one disulfide bond (Cys250-Cys277). In addition, 587 solvent molecules (Hoh7W-Hoh604W), a fifth calcium cation engaged in crystal contacts (Ca1W), and six (tentatively assigned) glycerol molecules (Gol2W-Gol7W) were identified in the structure. Finally, a dipeptide was found (Arg401-Val402) in each of the two active sites.
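As a quick plausibility check on the identity of the crystallized species, the measured mass can be compared with the length of the modeled segment. The sketch below uses only numbers quoted in the text (residues 61 to 322; 28,885 Da); the function names are illustrative, and the comparison value of roughly 110 Da per residue is a general rule of thumb for proteins rather than a figure from this study.

```python
def segment_length(first_residue: int, last_residue: int) -> int:
    """Number of residues in an inclusive residue range."""
    if last_residue < first_residue:
        raise ValueError("last_residue must not precede first_residue")
    return last_residue - first_residue + 1


def mean_residue_mass(total_mass_da: float, n_residues: int) -> float:
    """Average mass per residue in daltons."""
    return total_mass_da / n_residues


n_res = segment_length(61, 322)          # modeled segment of mature ulilysin
avg = mean_residue_mass(28885.0, n_res)  # mass measured by mass spectrometry
print(n_res, round(avg, 1))
```

The segment spans 262 residues, giving about 110 Da per residue, which is in line with a polypeptide of that length.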
Miscellaneous-The two molecules within the asymmetric unit (suffixes A and B) are related by an almost perfect noncrystallographic dyad and are structurally equivalent, with a root mean square deviation of 0.36 Å for all atoms of each polypeptide chain. Accordingly, discussion will focus on molecule A. The figures were prepared with TURBO-Frodo and MOLSCRIPT (24). Structural superimpositions were performed with TURBO-Frodo.
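The root mean square deviation quoted for the two molecules is a simple function of the distances between equivalent atoms once the structures have been superimposed. The following minimal sketch shows that calculation only; it assumes the two coordinate lists are already optimally superimposed and listed in the same atom order, and it is not the procedure implemented in TURBO-Frodo. The coordinates in the usage lines are made up for illustration.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Coord], coords_b: Sequence[Coord]) -> float:
    """Root mean square deviation between paired, pre-superimposed atoms."""
    if len(coords_a) != len(coords_b) or len(coords_a) == 0:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Hypothetical two-atom example: every atom is displaced by 1 Å along z.
mol_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
mol_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(mol_a, mol_b))
```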
Bioinformatic amino acid sequence similarity searches were undertaken within the MEROPS data base (merops.sanger.ac.uk) and with the servers ProDom (protein.toulouse.inra.fr/prodom.html), Pfam (www.sanger.ac.uk/software/pfam), and PSI-BLAST (www.ncbi.nlm.nih.gov/blast). For the last server, the sequence of PAPP-A shown in Fig. 1, which includes the putative MP region, was employed as bait after exclusion of the two LNR regions. Structural similarity searches were performed with the DALI server (www.ebi.ac.uk/msd). Multiple sequence alignments were calculated with MULTALIN (prodes.toulouse.inra.fr/multalin). Close contacts and interaction surfaces were calculated with the program CNS (25). The final coordinates of ulilysin have been deposited with the Protein Data Bank at Research Collaboratory for Structural Bioinformatics (www.rcsb.org/pdb) with access code 2cki.

Table 3 footnotes: b, ... is the ith intensity measurement of reflection hkl, including symmetry-related reflections, and $\langle I(hkl) \rangle$ is its average. c, $R_{\mathrm{factor}} = \sum_{hkl} \big| |F_{obs}| - k|F_{calc}| \big| / \sum_{hkl} |F_{obs}|$, with $F_{obs}$ and $F_{calc}$ as the observed and calculated structure factor amplitudes. The free R factor is the same for a test set of reflections (>500) not used during refinement. d, Including atoms present with alternate occupancy.
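The crystallographic R factor defined in the Table 3 footnote, the sum over reflections of ||F_obs| - k|F_calc|| divided by the sum of |F_obs|, can be written out directly. The sketch below is only an illustration of that definition, with k treated as a user-supplied scale factor; it is not the REFMAC5 implementation, and the amplitudes in the usage lines are invented.

```python
from typing import Sequence


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float], k: float = 1.0) -> float:
    """R = sum(| |F_obs| - k*|F_calc| |) / sum(|F_obs|) over paired reflections."""
    if len(f_obs) != len(f_calc) or len(f_obs) == 0:
        raise ValueError("need two non-empty amplitude lists of equal length")
    numerator = sum(abs(abs(fo) - k * abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    if denominator == 0:
        raise ValueError("sum of observed amplitudes is zero")
    return numerator / denominator


# Hypothetical amplitudes for two reflections.
print(r_factor([10.0, 20.0], [9.0, 22.0]))
```

The free R factor uses the same expression, evaluated only over the test set of reflections that was excluded from refinement.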

RESULTS AND DISCUSSION
Autolytic Activation of Ulilysin Is Mediated by Calcium-Both the full-length protein of 38 kDa and its selenomethionine variant are monomeric in solution and, once purified, stable over weeks, with no traces of degradation even at 37°C (Fig. 2A). However, the addition of calcium causes a band of 29 kDa to appear in both cases (Fig. 2B). The full-length protein is inactive (see below and Fig. 2C), but preincubation with calcium confers activity after an initial lag phase. In turn, the 29-kDa form, which is also monomeric, is highly active. The full-length protein crystallizes in the presence of a high concentration of CaCl2. N-terminal Edman degradation and mass spectrometry analyses of the crystalline material reveal that it contains segment Arg61-Ala322 (Fig. 1), showing that the full-length protein was cleaved twice before crystallizing. The amino acid sequence around these two cleavage sites shows a similar motif, with an arginine in the P1′ position, the first position downstream of the scissile bond (nomenclature according to Ref. 26; see also Fig. 2E). These data indicate that the 38-kDa form is the inactive zymogen, proulilysin, which undergoes calcium-triggered autoactivation to the 29-kDa form, mature ulilysin. Autolytic activation entails removal of the first 60 residues, which would correspond to the prodomain, and of a highly charged, 20-residue C-terminal tail that contains five glutamate and six arginine/lysine residues. Within the prodomain, a potentially free cysteine (Cys23; Fig. 1) could play the role of a "cysteine switch" or "Velcro" element, where the cysteine Sγ atom coordinates the catalytic zinc ion in the zymogen and prevents substrates from binding to the active site. Such mechanisms for the maintenance of latency have been described in other metzincins (13,27,28).
Ulilysin Is a Functional Metalloprotease and a Specific and Selective IGFBP Protease-Ulilysin cleaved IGFBP-2, -3, -4, -5, and -6, as well as insulin, in an IGF-independent and mostly specific manner, but not IGFBP-1 or IGF-II (Fig. 2D). Furthermore, IGFBP-4 and -5 proteolysis occurred more efficiently in vitro than when mediated by human PAPP-A and PAPP-A2. Ulilysin produced comparable cleavage levels at much lower doses and shorter reaction times than the human enzymes, as inferred from SDS-PAGE (data not shown). In addition, ulilysin had an optimum pH of around 6 and performed limited proteolysis of a whole series of substrates, including casein and extracellular matrix (derived) components like azocollagen, gelatin, and fibronectin (Table 1). Natural collagens of type I from kangaroo tail and of type V were also cleaved, but not collagen type I from human placenta or collagen type IV. However, these results must be taken with care, as cleavage occurred at 37°C but not at room temperature. Furthermore, these substrates were also cleaved by trypsin in the same conditions, thus indicating that the purchased samples contained gelatinous material. In this context, it is noteworthy that extracellular matrix components interact with IGFBP-5 and modulate its activity in vivo (1). Moreover, the rather proteolysis-resistant skeletal proteins actin and elastin were also cleaved, as was fibrinogen from the blood coagulation cascade, but not plasmin or α1-antitrypsin. N-terminal sequence analyses of the major fragments obtained after proteolysis of insulin and fibrinogen confirm the specificity for arginine in P1′ (Fig. 2E), as already suggested by the autolytic cleavage points, and indicate that the enzyme has a substrate preference with the pattern BX2RB(E/Q) (B, bulky hydrophobic or aromatic).
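The preference for arginine in P1′ means that candidate scissile bonds in a substrate are those immediately preceding an arginine. The following sketch lists such bonds in a one-letter amino acid sequence. It is a deliberately simplified illustration: it considers only the P1′ arginine, ignores the flanking positions of the reported preference pattern, and says nothing about whether a given bond is accessible or actually cleaved. The function name and the test sequence are hypothetical.

```python
from typing import List, Tuple


def candidate_p1prime_arg_bonds(sequence: str) -> List[Tuple[int, int]]:
    """Return (P1, P1') residue numbers (1-based) for every bond followed by Arg.

    An arginine at position 1 is skipped because no peptide bond precedes it.
    """
    seq = sequence.upper()
    bonds = []
    for index, residue in enumerate(seq):
        if residue == "R" and index > 0:
            # index is 0-based, so P1 is residue number `index`
            # and P1' is residue number `index + 1`.
            bonds.append((index, index + 1))
    return bonds


# Hypothetical pentapeptide: the only candidate bond is Ala2-Arg3.
print(candidate_p1prime_arg_bonds("GARVK"))
```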
Ulilysin was only inhibited by unspecific MP inhibitors like the zinc chelators o-phenanthroline and EDTA, and by excess zinc (Table 2). Only batimastat and CT1746, which are broad spectrum small molecule inhibitors of MMPs, caused partial inhibition. The preference for an arginine in the P1′ position also led us to test L-arginine and two arginine-based molecules that inhibit serine proteases and carboxypeptidases (benzamidine and guanidinoethylmercaptosuccinic acid). None of them caused significant inhibition. These data indicate that ulilysin is both a specific and an efficient metalloprotease that targets selected IGFBPs and extracellular matrix proteins and that selective inhibitors remain to be found.
Overall Structure of Ulilysin-Ulilysin is ellipsoidal, with an α/β topology within a polypeptide chain that runs from residue Arg61 to Ala322 (Fig. 3, a and b). The protein is partitioned into two moieties separated by an extended active site cleft running from left to right, namely an upper regular N-terminal subdomain (NTSD; Arg61-Asn235) and a rather irregular lower C-terminal subdomain (CTSD; Leu236-Ala322). NTSD starts on the back of the molecule and enters, through strand β1, a strongly twisted, mainly parallel five-stranded β-sheet (Figs. 3a and 4A). Following β1, helix α1 runs on the back of the molecule from the upper right to the lower left and finishes at the junction with the CTSD. Here, the polypeptide chain follows an extended loop (Lys100-Gly120) whose tip almost reaches the bottom of the molecule, resembling a cape over the back of the CTSD. This segment features two short 3₁₀-helices (α2 and α3 in Figs. 3a and 4A). The peptide chain rejoins the NTSD at the second β-strand, which is subdivided into two, named β2 and β3. They are separated by an insertion (Thr127-Thr136), here termed the "LNR-like loop," located on the convex side of the β-sheet and jutting out from the molecular surface (Fig. 3a). After β3, the protein chain spanning residues Ser144-Tyr170 runs back over the convex surface of the sheet in an irregular manner and rejoins the β-sheet at β4. Of particular importance here is a double main chain to side chain interaction performed by Trp165 with Asn172, which anchors the long β3β4-connecting segment or loop (Lβ3β4) to the molecular scaffold. At the end of β4, the chain leads to a small β-ribbon, also on the convex side of the sheet, constituted by strands β5 and β6. Thereafter, the chain enters the outermost strand of the sheet, β7, the only one that runs antiparallel. After a bulky protrusion created by the connecting loop (Lβ7β8), the chain enters β8, which leads to the active site helix α4.
The NTSD finishes at the end of the latter, at Asn235. This subdomain is mainly kept together by an extended hydrophobic cluster placed on the concave side of the β-sheet that spans almost the entire width of the molecule. Hydrophobic residues constituting this cluster are provided by β1, Lβ1α1, Lα3β2, β2, β3, β4, Lβ7β8, β8, Lβ8α4, α4, and α5. On the convex side of the sheet, a smaller hydrophobic cluster is also formed by residues contributed by Lβ3β4, Lβ7β8, and the convex face of strands β1, β4, β5, β7, and β8.
The CTSD begins at Leu236 with a double-S loop structure (Trp240-Asp264) that serves as a scaffold for two nearby calcium-binding sites (see below and Fig. 3a). The back of this loop is connected to the main molecular body by a strong hydrogen bond, Thr302 Oγ1-Asp258 Oδ2. The subsequent loop runs over the molecular surface and is cross-linked with the double-S loop through a disulfide bond, Cys250-Cys277, and with the C-terminal part of the molecule possibly via a second one, Cys269-Cys297. Because the presently studied protein is a variant that contains the mutation C269A, the second of these bridges is not formed. However, the distance between the Cα atoms (6.1 Å) and the relative arrangement of the side chains indicate that these two residues are covalently linked in the wild-type protein (Fig. 3a). The polypeptide chain rejoins the molecular body at Gly283, passes between the calcium-binding segment and the cape described above, and eventually reaches a tight 1,4-turn made up by residues Asn288-Tyr289-Met290-Asp291, the so-called Met turn. Within this turn, Asn288 establishes a hydrogen bond with Gln305 of the C-terminal helix α5 (see below), and Tyr289 points into the NTSD hydrophobic cluster. These two interactions keep the Met turn in position under the zinc-coordinating residues, where it creates a hydrophobic pillow but contacts neither the cation nor its ligands (see below). After the Met turn, the polypeptide chain turns round to reach the back surface of the protein and enters the C-terminal helix α5 (Fig. 3a). Approximately in its middle, Arg308 points into the interior of the molecule contacting both Asn235 Oδ1 and O atoms, thus contributing to structural integrity, together with the previously mentioned Asn288-Gln305 interaction. Helix α5 ends up at the molecular surface with the C-terminal residue, Ala322, solvent exposed and in the proximity of the N terminus (15.4 Å between the respective Cα atoms).
The Active Site Cleft-The active site of ulilysin traverses the molecule from left to right (Fig. 3, a and b). Its top is framed by the antiparallel strand β7 of NTSD, which runs antiparallel to a bound peptide substrate, and by the β5β6-ribbon, which projects away from the region of the active site cleft accommodating the primed side substrate residues. The bottom of the cleft is paved by segments Tyr237-Trp240, Pro265-Gly268, the Met turn, and the subsequent four residues (Asn288-Asp295), all within the CTSD. A segment constituted by the first part of the calcium-binding double-S loop (Asp242-Arg252) protrudes from the molecular surface and further influences substrate binding, in this case mainly on the nonprimed side of the active site. The catalytic zinc cation (Zn999) is tetrahedrally coordinated by a solvent molecule and the Nε2 atoms of His228, His232, and His238 from the ZBCS, embedded in the active site helix α4 (Figs. 3, a and b, and 4A). Binding distances range from 2.0 to 2.1 Å. The solvent molecule bound to the catalytic zinc is further anchored to the general base, Glu229 of the ZBCS. The side chain of Tyr292, immediately after the Met turn, is also close to the catalytic cation (Fig. 3b) but is swung out from its (probable) zinc-liganding position, as observed in other metzincins upon substrate binding (29,30). In the present structure, this is due to the presence of a peptide occupying the primed side of the active site cavity, probably left behind after a proteolytic event during purification or crystallization. The electron density map unambiguously identifies the residue penetrating the deep S1′ pocket as an arginine (Arg401), followed by a possible valine fitting into S2′ (Val402).
This dipeptide may represent the ordered N-terminal segment of a larger peptide and is bound to the protein mainly through two inter-main chain hydrogen bonds with the upper rim strand β7 (Arg401 N-Gly189 O and Arg401 O-Leu188 N). The presence of Arg401 allows us to identify the residues shaping the specificity pocket as Thr225 from α4, Leu188 from β7, Phe220 from Lβ8α4, Met298 from the extended segment preceding α5, and the main chain from Tyr292 to Asp295 (Fig. 3b). Of particular importance for specificity is the latter aspartate at the bottom of the pocket. It strongly binds, through its Oδ1 atom, both the Nη1 and Nη2 atoms of Arg401. The latter atom further contacts Val293 O. The nature of this cavity, mostly hydrophobic except for the pocket bottom, is ideally suited to accommodate an arginine residue, thus explaining the strong preference of ulilysin for such a residue in P1′ (see above and Fig. 2E). The bound peptide also indicates that the S2′ pocket is much shallower and mainly created by the aromatic surface of Tyr292, as well as by Gln185 and Ile187 from the β5β6 ribbon.
The Double Calcium-binding Site-Two calcium cations are present in the ulilysin CTSD, 9.1 Å apart (Figs. 3a and 4, B-D). (Pro)ulilysin is inactive in their absence and rigid in their presence. It is reversibly inhibited by EDTA (Table 2) and reactivated by dialysis against a calcium-containing buffer (data not shown). These data indicate that calcium is a switch for this protease.
The first site is centered on Ca998 and shows eight oxygen ligands, five approximately in a plane, two apical ligands on one side of the plane, and another on the opposite side (Fig. 4, B and C). Four ligands are provided by the protein, and four are solvent molecules, and coordinating distances range between 2.3 and 2.6 Å. This site is reminiscent of that on Ca317 of thermolysin (Protein Data Bank code 8tln) (31), although in the latter case there is only one solvent ligand, and a second calcium ion is just 3.8 Å away. The second ulilysin calcium, Ca997, has four ligands in a plane with the cation and, again, two apical ligands on one side and another at the opposite apical position (Fig. 4, B and D). Here, only one of the seven oxygen atoms comes from a solvent molecule. In this case, the site is reminiscent of the EF hands seen in calbindin 9K (32). However, in ulilysin this calcium site is not flanked by the characteristic helix-turn-helix motif found in EF hands. Taken together with the results of structural similarity searches, these findings suggest that the region of ulilysin encompassing the calcium binding sites is novel.
Implications for the Other Members of the Pappalysin Family-Bioinformatic sequence similarity searches suggest that pappalysins should be grouped into protease family M43 in the MEROPS data base, into the ProDom family PD332581, and into the Pfam family PF05572. Furthermore, these searches permitted the identification of two groups of sequences. The first included close relatives of human PAPP-A, with E values below 3E-20 (see Ref. 33), from several mammals, birds (chicken), fish (zebrafish and green-spotted pufferfish), amphibians (African clawed frog and pipid frog), and echinoderms (sea urchin). A second group comprised sequences with E values between 9E-12 and 7E-4 from fungi (Pleurotus ostreatus PoMTP, Coccidioides posadasii MEP1, Ustilago maydis, Aspergillus nidulans and fumigatus, Magnaporthe grisea, Neurospora crassa, Gibberella zeae, and Metarhizium anisopliae), bacteria (Cytophaga sp. and hutchinsonii cytophagalysin and sequences from Gloeobacter violaceus, Shewanella sp. and amazonensis), and archaea (M. acetivorans ulilysin). Among these, Mep1 (SwissProt code Q71H76) from the fungal pathogen C. posadasii, which can cause the respiratory San Joaquin Valley fever in immunocompromised humans and animals, is the only member, apart from human PAPP-A and -A2, to have been assessed biochemically and in vivo. The 283-residue (pro)Mep1 is secreted during endospore differentiation within the host, and it digests a host-cell surface antigen, in this way preventing recognition of endospores by host phagocytes (34). Another member studied in vitro is PoMTP metalloprotease from the edible oyster mushroom, P. ostreatus (SwissProt code Q5Y972). The mRNA of this 290-residue (pro)protein is abundant at primordial and fruit body stages, thus suggesting a role in mushroom fruiting (35). These authors proposed that this protein should be grouped with a series of putative fungal orthologues in a separate metzincin family termed eucolycins.
Finally, another member is cytophagalysin, a bacterial collagenase obtained from Cytophaga sp. L43-1 (36). It is a polypeptide of 1,282 amino acid residues, putatively synthesized from cog as a zymogen. It is capable of digesting both insoluble and acid-soluble collagens and gelatin, as well as casein (37). Whereas most prokaryotic and fungal forms merely span the catalytic domain (plus a putative pro-domain), cytophagalysin encompasses a further ~950 residues C-terminal to the catalytic domain, a stretch of similar length to the one found in the vertebrate pappalysins, thus potentially encompassing additional domains of distinct function (Fig. 1). Detailed inspection of selected forms reveals that pappalysins present a higher similarity within their CTSDs, because most determinants of specificity are found within this subdomain. In the NTSD, the main structural features for domain integrity are the two internal hydrophobic clusters on either side of the central β-sheet. Strict residue conservation is dispensable because compensating substitutions may still maintain these clusters. Key residues for catalysis, substrate binding, and structural stability are absolutely conserved, such as Asp295, which determines the substrate specificity in P1′, and Asn288, at the beginning of the Met turn, which forms a hydrogen bond with Gln305 important for structural integrity. The same holds for Arg308, engaged in binding of the main chain at Asn235, and for the aforementioned pairs, Thr302-Asp258 and Trp165-Asn172. In the calcium-binding site, most ligands are provided by solvent molecules or main chain carbonyl oxygen atoms in ulilysin (Fig. 4, B-D). Only three protein side chains participate: Asp254, Thr259, and Glu243. The first two are strictly conserved, whereas the third could be replaced by another conserved glutamate, e.g. Glu580 (following the PAPP-A numbering; see Fig. 1).
This correlates well with the finding that PAPP-A activity, like ulilysin activity, depends on calcium (9).
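The two groups of sequences identified in the similarity searches are distinguished by their E values relative to the PAPP-A bait: below 3E-20 for the close relatives and between 9E-12 and 7E-4 for the more distant fungal, bacterial, and archaeal sequences. A minimal sketch of that binning is given below. The thresholds are those quoted in the text; the function name, group labels, and example E values are illustrative assumptions and do not correspond to actual search hits.

```python
CLOSE_MAX = 3e-20      # upper bound quoted for close PAPP-A relatives
DISTANT_MIN = 9e-12    # lower bound quoted for the second group
DISTANT_MAX = 7e-4     # upper bound quoted for the second group


def classify_hit(e_value: float) -> str:
    """Assign a search hit to one of the two groups described in the text."""
    if e_value < 0:
        raise ValueError("E values cannot be negative")
    if e_value < CLOSE_MAX:
        return "close relative"
    if DISTANT_MIN < e_value < DISTANT_MAX:
        return "distant relative"
    return "unassigned"


# Hypothetical E values, for illustration only.
for e in (1e-30, 1e-6, 1e-15):
    print(e, classify_hit(e))
```

E values falling between the two quoted ranges, or above the upper limit, are left unassigned here because the text does not describe such hits.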
As previously discussed, the start of the mature ulilysin sequence suggests that a pro-domain with a putative cysteine switch might be present in pappalysins (Fig. 1). The disulfide bond pattern of ulilysin and, most likely, the closer bacterial and fungal relatives includes two SS bonds in the CTSD. Although this double clamp affects the region where all pappalysins show the closest sequence similarity, the SS pattern reported for PAPP-A and the other more closely related vertebrate members diverges (38). This difference may be intrinsic and could therefore be compatible with the maintenance of the overall fold. Alternatively, the activity of PAPP-A in vivo may be dependent on reducing agents, putatively accounting for changes in the disulfide bond pattern (8). Accordingly, reduction or formation of specific SS bonds may regulate the extracellular PAPP-A activity, and the SS bonds may show a pattern similar to those in ulilysin in certain conditions.
There is little sequence consensus between the cleavage points identified in ulilysin and those reported for PAPP-A. Further, aside from IGFBPs, no other protein or low molecular weight substrates have been identified for the human enzyme. It has thus been proposed that steric regulation could account for substrate specificity, possibly mediated by the three LNR motifs, two of which are inserted into the catalytic protease domain (Fig. 1) (9). These LNR sequences are absent from bacterial orthologues (Fig. 1), but the region of insertion fully coincides with the LNR-like loop observed between strands β2 and β3 in ulilysin. Accordingly, compared with the protruding 10-residue loop of ulilysin, this loop may be shorter or even absent in other family members, or much longer, as in PAPP-A with its ~66-residue insertion. In any case, it is clear that this insertion would be compatible with the overall structure of the protease moiety and that it would occur on the surface, thus in a potential disposition to carry out binding functions.
Structural Similarities with Other MPs-Ulilysin bears structural similarity with members of the adamalysin/ADAM family within the metzincin clan of MPs. In particular, the most closely related structures are those of ADAM-17/TACE (Protein Data Bank code 1bkc) (39) and ADAM-33 (Protein Data Bank code 1r54) (40). A total of 177 and 167 residues can be structurally aligned with a root mean square deviation over all atoms of 3.0 and 2.8 Å, respectively, despite negligible sequence similarity (12 and 16%) (Fig. 3c). These human MPs are engaged in cancer, inflammation, and modulation of the immune response and in asthma, respectively. Members of this family are IGFBP proteases. Likewise, adamalysins/ADAMs are subdivided into an NTSD and a CTSD, separated by an active site cleft harboring the catalytic zinc ion. Generally, the structural similarity depends on a series of conserved regular secondary structure elements (Fig. 3c). In detail, similarity is high within each polypeptide chain around the long ZBCS, with a superimposable catalytic base and zinc-liganding histidines. However, the structures of the CTSDs deviate strongly. Furthermore, adamalysins/ADAMs and MMPs strongly prefer bulky hydrophobic residues in P1′ of substrates, whereas ulilysin (and probably other pappalysins) favors an arginine. Finally, the possible presence of a fifth tyrosine ligand in ulilysin that may be swung out upon substrate binding is shared with astacins and serralysins (41,42).
Conclusions-We have discovered and studied a new MP that belongs to the family of the pappalysins, characterized by the prototypical human PAPP-A, which was identified as a specific IGFBP protease. In contrast with PAPP-A, ulilysin displays a characteristic cleavage pattern preference and broad substrate specificity and efficiency. Recently, it has been reported that the isolated catalytic domain of procollagen C proteinase, a multidomain MP of the astacin family of metzincins, also shows a much broader substrate profile and higher efficiency in cleaving a variety of extracellular matrix proteins than the full-length protein. This difference has been attributed to the additional domains present in the latter, which may modulate and restrict the substrate specificity through their potential protein binding or steric hindering competence (43). Our results, taken together with the extensive work performed on human PAPP-A, suggest a similar scenario, where activity of a ulilysin-like protease domain could be restrained by the LNR and CCP motifs and further domains potentially contained in the additional ~1,200 residues of the vertebrate forms.