Catalytic Domain Architecture of Metzincin Metalloproteases

Metalloproteases cleave proteins and peptides, and deregulation of their function leads to pathology. An understanding of their structure and mechanisms of action is necessary to the development of strategies for their regulation. Among metallopeptidases are the metzincins, which are mostly multidomain proteins with ∼130–260-residue globular catalytic domains showing a common core architecture characterized by a long zinc-binding consensus motif, HEXXHXXGXX(H/D), and a methionine-containing Met-turn. Metzincins participate in unspecific protein degradation such as digestion of intake proteins and tissue development, maintenance, and remodeling, but they are also involved in highly specific cleavage events to activate or inactivate themselves or other (pro)enzymes and bioactive peptides. Metzincins are subdivided into families, and seven such families have been analyzed at the structural level: the astacins, ADAMs/adamalysins/reprolysins, serralysins, matrix metalloproteinases, snapalysins, leishmanolysins, and pappalysins. These families are reviewed from a structural point of view.


Metalloproteases
Cleavage of peptide bonds is essential for life, and the factors responsible for peptide cleavage are ubiquitous. Among them are metalloproteases (MPs), which are mostly zinc-dependent peptide-bond hydrolases. They participate in metabolism through both extensive and unspecific protein degradation and controlled hydrolysis of specific peptide bonds (1). Deregulation of such vast degrading potential leads to pathologies, and in addition, MPs may act as virulence factors during poisoning and microbial infection. Such a wide range of biological functions makes structural studies of these proteins indispensable to any understanding of their function and to the design of novel, highly specific therapeutic agents to modulate their activity (2).
Most MPs are members of a protease tribe, the zincins, and they possess a short consensus amino acid sequence, HEXXH. This motif contains two protein ligands of the catalytic zinc and a glutamate that acts as general base/acid during the catalytic process (3,4). A third metal ligand is a solvent molecule, further bound to and polarized by the glutamate. This solvent performs nucleophilic attack on the carbonyl carbon of the scissile bond of a bound substrate, leading to a tetrahedral gem-diolate reaction intermediate, which is stabilized by the positively charged metal ion and neighboring protein residues (5). Subsequent evolution of the intermediate under assistance of the general base/acid eventually leads to bond disruption.
Zincins are divided into the gluzincin, aspzincin, and metzincin clans (4,7). The latter contains mostly multidomain proteins with an N-terminal prodomain engaged in latency maintenance, a catalytic protease domain, and farther downstream domains engaged in protein-protein and cell-cell interactions and other regulatory functions. The protease domain is characterized by a C-terminally extended zinc-binding motif, HEXXHXXGXX(H/D), with a hallmark glycine and a third zinc-binding histidine or aspartate. In addition, a methionine is present in a conserved downstream turn, the Met-turn (3,7,8). Metzincins split into families, seven of which have been characterized at the structural level for at least one of their members: astacins, ADAMs/adamalysins/reprolysins, serralysins, matrix metalloproteinases, snapalysins, leishmanolysins, and pappalysins. In addition, a series of sequences reported from genome sequencing projects indicate there are further structurally yet uncharacterized families, tentatively referred to as fragilysins, gametolysins, archaemetzincins, thuringilysins, coelilysins, ascomycolysins, helicolysins, and cholerilysins (for a detailed review, see Ref. 7).
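The extended consensus described above lends itself to a simple sequence scan. The following is a minimal sketch, assuming that X stands for any standard one-letter amino acid code; the function name `find_metzincin_motifs` and the toy sequence are hypothetical and serve only as illustration, not as a validated classification tool.

```python
import re

# Extended zinc-binding consensus from the text: HEXXHXXGXX(H/D).
# Assumption: X is any upper-case one-letter amino acid code.
# The lookahead lets overlapping candidates be reported as well.
METZINCIN_MOTIF = re.compile(r"(?=(HE[A-Z]{2}H[A-Z]{2}G[A-Z]{2}[HD]))")


def find_metzincin_motifs(sequence):
    """Return (start_index, matched_segment, third_ligand) per hit, 0-based."""
    seq = sequence.upper()
    hits = []
    for match in METZINCIN_MOTIF.finditer(seq):
        segment = match.group(1)
        # The last residue of the motif is the third zinc ligand (H or D).
        hits.append((match.start(), segment, segment[-1]))
    return hits


# Toy sequence invented for illustration only (not a real protease).
toy_sequence = "AAAHEAAHAAGAAHAAA"
print(find_metzincin_motifs(toy_sequence))
```

A hit is at best a necessary condition: the text also requires a downstream Met-turn, which this pattern does not check. The pattern is also strict about the hallmark glycine, so it would miss variants such as ulilysin, in which the text reports an asparagine at that position.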

The Metzincin Fold
To date, >200 structures of metzincins, comprising at least the catalytic domain, have been deposited with the Protein Data Bank (supplemental Table 1). In this review, seven lead structures, one from each of the aforementioned families, are discussed: Astacus astacus astacin, Crotalus adamanteus adamalysin II, Pseudomonas aeruginosa aeruginolysin, human neutrophil collagenase (MMP-8), Streptomyces caespitosus neutral protease (snapalysin), Leishmania major leishmanolysin, and Methanosarcina acetivorans ulilysin (9–15). These prototypes cover distinct kingdoms of life and represent archaea, bacteria, protozoa, crustaceans, reptiles, and mammals. The structures reveal that metzincins share a common scaffold and active-site environment, but each family has distinguishing structural elements. Mature catalytic domains are ∼130–260-residue globular moieties that bifurcate into an upper N-terminal subdomain (NSD) and a lower C-terminal subdomain (CSD) with respect to a central active-site cleft. Substrates bind horizontally to this cleft from left to right in an approximately extended conformation (supplemental Fig. 1A). (All topological indications refer to the standard orientation displayed in this figure.) The NSD displays a five-stranded twisted β-sheet at the top (except leishmanolysin, which has four strands) (supplemental Fig. 1A). All strands (βI–βV) except the fourth are parallel to each other and to any substrate that is bound in the cleft. The antiparallel strand, βIV, forms the lower edge of this subdomain and creates an upper rim or northern wall of the active-site crevice (16). This strand binds a substrate in an antiparallel manner, mainly on its non-primed side. The loop segment connecting strands βIII and βIV (referred to as LβIIIβIV) leads to the appearance of bulge-like elements, which mainly affect subsites S1′ and S2′ (see Ref. 17 for subsite nomenclature in proteases). This gives rise to extensive variations in enzyme-substrate interactions on the primed side of the active-site clefts.
The NSD also contains two long α-helices, αA and αB, the backing helix and the active-site helix, respectively. Both are arranged on the concave side of the β-sheet in an identical manner in all metzincin structures (supplemental Fig. 1B). Helix αB superimposes well in all seven structures (Fig. 1A) and encompasses the first half of the zinc-binding motif, which includes the first two zinc-binding histidine residues (Fig. 1A and supplemental Fig. 1). At the end of helix αB, the polypeptide chain takes a sharp downward turn, mediated by the glycine of the consensus sequence. The main-chain angles of this residue in the different lead structures indicate that any other residue would be in a high-energy conformation and thus disfavored, with the exception of ulilysin (see below).
The CSD starts after this glycine, and the chain leads to the third zinc ligand, a histidine or an aspartate, which approaches the metal from below (supplemental Fig. 1). This subdomain contains few repetitive secondary structure elements, mainly a C-terminal helix αC at the end of the polypeptide chain. Helices αB and αC are connected by structures that vary both in length and conformation. However, all structures coincide at a conserved 1,4-β-turn containing a methionine at position 3, the Met-turn, which is separated from the third zinc-binding histidine by connecting segments of 6–53 amino acids in the different structures. The Met-turn is superimposable (including the conformation of the methionine side chain) (Fig. 1A) and is positioned underneath the catalytic Zn²⁺, forming a hydrophobic pillow. However, no direct contact with the metal is observed. Mutation studies suggested a role for this methionine in the folding and stability of the catalytic domains, although the strict conservation of this residue remains to be explained (18,19). The S1′ pocket of metzincins is shaped at the top by a protruding bulge made by LβIIIβIV and at the bottom by a wall-forming segment made up of residues intercalated between the Met-turn and the C-terminal helix αC (16). This segment diverges in structure and length (ranging from 11 to 37 residues between the Met-turn methionine and the first residue of helix αC) in all seven of the reference structures.

The Zinc-binding Site
The catalytic zinc ion lies roughly at the center of the bottom of the active-site cleft. It is coordinated by the Nε2 atoms of the three consensus histidines (two histidine Nε2 atoms and one aspartate Oδ2 atom in snapalysin) and the catalytic solvent molecule, substituted by other ligands in the enzyme/inhibitor-product complexes reported (supplemental Table 1). Some metzincins display an additional protein ligand at a slightly greater distance from the catalytic cation in the form of a tyrosine Oη atom, as seen in unbound astacin and serralysins and as hypothesized in ulilysin (21). This tyrosine residue lies two positions ahead of the Met-turn methionine. In (unbound) snapalysin, a tyrosine lies two positions downstream of the metal-binding aspartate, though no longer within binding distance of the zinc ion. These tyrosine residues flip back and forth during substrate anchoring, cleavage, and product release in a motion referred to as the tyrosine switch. In this, they may play a role in substrate and catalytic solvent binding and stabilization of the tetrahedral intermediate or the product amino group (7,9,22–24). Such a role is performed by other, non-conserved residues in tyrosine-lacking metzincins.

Distinguishing Features of Each Family
Astacin is a 200-residue digestive enzyme from the crayfish A. astacus and the first member of the astacin family to be structurally analyzed (9). Through protein degradation, growth factor activation, extracellular matrix turnover, and extracellular coat degradation (hatching), astacins participate in diverse biological processes such as digestion, development, and tissue remodeling and differentiation (e.g. promoting cartilage and bone formation and collagen biosynthesis) (27).
The three-dimensional structure of astacin shows a Pac-Man-like spherical shape and two subdomains of approximately equal size (supplemental Fig. 1A) (9). Whereas this proteinase contains only a propeptide and a catalytic MP domain, most other astacins display additional C-terminal MATH, MAM, CUB-like, Ser/Thr-rich, I (inserted)-, epidermal growth factor-like, Tox1, and transmembrane modules (for details, see Refs. 8, 27, and 28). The astacin family has the longest region of the CSD lacking regular secondary structure elements, just a small helix and a short β-ribbon, although this is the sequentially most conserved connecting segment among metzincins (see Table 1 in Ref. 7). The protein scaffold is cross-linked by four conserved cysteine residues forming two disulfide bridges (supplemental Fig. 1). The first two residues of the mature enzyme are buried in an internal cavity in the CSD, and the N-terminal α-amino group establishes an interaction with a conserved glutamate next to the third zinc-binding histidine. The unbound coordination of the catalytic zinc is trigonal-bipyramidal due to the presence of the fourth invariant, although somewhat more distal, tyrosine zinc ligand downstream of the Met-turn (supplemental Fig. 1A). In addition to astacin, the structures of human tolloid-like protease-1 and bone morphogenetic protein-1 have recently been reported (29).
Serralysins are ∼50-kDa bacterial virulence factors secreted as autoactivatable zymogens by pathogenic γ-class proteobacteria (30). These organisms are responsible for human diseases such as meningitis, endocarditis, pyelonephritis, plague, dermatitis, soft tissue infections, septicemia, melioidosis, pneumonia, and other respiratory and urinary tract infections. They play a major role in hospital-acquired infections due to their capacity to produce surgical wound infections and to infect neonates. As part of the virulence potential of these bacteria, serralysins are directed against coagulation factors and defense-oriented proteins, protease inhibitors, lysozyme, and transferrin and may cause an anaphylactic response.
The first serralysin to be biochemically and structurally characterized was P. aeruginosa aeruginolysin (10). Its mature 220-residue catalytic domain lacks disulfide connections and is flanked on its C-terminal end by a calcium-stabilized β-roll domain. As in astacin, its two subdomains are of similar size. The polypeptide chain starts with an α-helix in the CSD (characteristic for the family) that is anchored to the molecular body by a conserved salt bridge with the C-terminal helix αC (supplemental Fig. 1B). The NSD features a flap made up by an elongated LβIαA and runs across the convex surface of the β-sheet. This flap varies greatly among serralysins (20 reported structures) (supplemental Table 1), distinctly affecting substrate binding. The CSD of aeruginolysin presents an extra α-helix within the segment linking the Met-turn with the wall-forming stretch and a second flap shaped by residues of the connecting segment. These elements also modulate substrate binding. Comparison of aeruginolysin, which was first solved in complex with a bound tetrapeptide (10), with the closely related structure of unbound Serratia marcescens serralysin reveals that the unliganded zinc coordination is similar to that of astacin (trigonal-bipyramidal). It also includes a tyrosine that undergoes a hinge motion upon substrate binding (31).
S. caespitosus snapalysin is a secreted neutral protease that comprises a 132-residue catalytic domain preceded by an alanine-rich 100-amino acid N-terminal extension including a signal peptide and a prodomain. Similar sequences have been reported for other Streptomyces species, and they have been termed SnpA (Prt and snapalysin), MprA, and SnpA. They show milk-hydrolyzing activity.
Snapalysin is the smallest metzincin and the only member of its family that has been structurally characterized. Its structure recalls a flattened ellipsoid and bifurcates into two asymmetric subdomains (12). It displays all the characteristic metzincin features, connected by short loops. Distinguishing elements are a small LβIIβIII protruding from the upper sheet within the NSD, a small bulge on top of the primed side of the active-site crevice, a short helix in LβVαB, and a calcium-binding site (supplemental Fig. 1A). In addition, an aspartate is found at the position of the third zinc-binding histidine, and two positions ahead in the sequence, a conserved tyrosine approaches but does not bind the metal.
MMPs are secreted or membrane-bound proteinases that were discovered 47 years ago through their participation in tail resorption during tadpole-to-frog metamorphosis. They are found mainly in higher mammals, although related sequences have been found in fish, amphibians, insects, plants, prokaryotes, and viruses. Through turnover of extracellular matrix proteins, MMPs are involved in tissue resorption, remodeling, and repair, as observed during embryogenesis and development, branching and organ morphogenesis, and angiogenesis. However, their potent proteolytic potential or its absence may also lead to pathologies such as inflammation, ulcers, rheumatoid arthritis and osteoarthritis, periodontitis, heart failure and cardiovascular disease, fibrosis, emphysema, and cancer and metastasis (32). More recently, MMPs have been observed to be engaged in (in)activation events following limited proteolysis, as observed in apoptosis and intestinal defense protein activation but also in pathologies including stroke, human immunodeficiency virus-associated dementia, atherosclerosis, multiple sclerosis, bacterial meningitis, and Alzheimer disease. MMP substrates include extracellular proteins such as other (pro)proteinases, inhibitors, clotting factors, antimicrobial peptides, and chemotactic and adhesion molecules. In common with ADAMs (see below), MMPs are also involved in ectodomain shedding of growth factors, growth factor-binding proteins, hormones and hormone receptors, and cytokines and cytokine receptors from the cell surface (33).
Like other metzincin families, MMPs are mosaic proteins constituted by a series of inserts and domains. These may include an ∼20-residue secretory signal peptide, an ∼80-residue propeptide, a 160–170-residue zinc- and calcium-dependent catalytic proteinase domain, a linker region, and a 4-fold propeller hemopexin-like C-terminal domain. Further insertions may include fibronectin type II-related domains; a collagen type V-like and vitronectin-like insertion domain; a cysteine-rich, a proline-rich, and an interleukin-1 receptor-like domain; an immunoglobulin-like domain; a glycosylphosphatidylinositol linkage signal; a membrane anchor; and a cytoplasmic tail. Naming of MMPs started historically with fibroblast collagenase as MMP-1 and has currently reached MMP-28, with 23 different forms described in humans (34). Those MMPs encompassing a membrane anchor gave rise to the membrane-type MMP subfamily (16,32). Zymogen activation proceeds in MMPs according to a cysteine switch or Velcro mechanism. This removes the prodomain and switches from an inactive state, where the Sγ atom of a cysteine residue within a conserved motif, PRCGVPD, replaces the catalytic solvent molecule in the zinc coordination sphere, to the fully accessible active enzyme (35,36). MMPs are the structurally most thoroughly studied metzincin family, with >120 structures reported (supplemental Table 1) (37). The mature catalytic domain of human neutrophil collagenase (MMP-8) (13,37) has a shallow active-site cavity, which separates a larger NSD (∼120 residues) from a smaller CSD (∼40 amino acids), and generally a deep hydrophobic S1′ pocket. No disulfide bonds are present in the structure. The N-terminal α-amino group is anchored to the first of two conserved aspartates embedded in helix αC. The NSD displays an S-shaped double loop connecting strands βIII and βIV, which embraces a structural zinc cation and a tightly bound calcium ion.
The downstream residues of this segment form a prominent bulge that protrudes into the active-site groove. LβIVβV and LβIIβIII contribute to a second calcium-binding site on top of the NSD β-sheet (supplemental Fig. 1, A and B). In the CSD, the MMP-8 chain displays the shortest and most conserved connecting segment within metzincins.
ADAMs/adamalysins/reprolysins split into three subgroups, the snake venom MPs, the mammalian ADAMs, and the likewise mammalian ADAMTSs (38–41). The first of these are responsible for post-envenomation hemorrhage through digestion of extracellular matrix components surrounding capillaries, resulting in tissue necrosis. In turn, ADAMs were originally described to play a role in fertilization and sperm function in mammalian reproductive tracts. They are involved in myogenesis, development, neurogenesis, differentiation of osteoblastic cells, cell migration modulation, and muscle fusion. They are also engaged in human disorders like asthma, cardiac hypertrophy, obesity-associated adipogenesis and cachexia, rheumatoid arthritis, endotoxic shock, inflammation, and Alzheimer disease. They also have a major role in protein ectodomain shedding as described previously for MMPs. Finally, some family members lacking the transmembrane domain and harboring multiple copies of a thrombospondin 1-like repeat and a CUB domain gave rise to a distinct subfamily of soluble extracellular proteases, the ADAMTSs (42). These enzymes disable cell adhesion by binding to integrins. They are also involved in gonad formation, embryonic development and angiogenesis, and procollagen activation, as well as in inflammatory processes, cartilage (aggrecan) degradation in arthritic diseases, bleeding disorders, and glioma tumor invasion.
All ADAMs/adamalysins/reprolysins are extracellular multidomain proteins containing a catalytic zinc- and calcium-dependent MP domain. In addition, they can display a prodomain and C-terminal disintegrin-like, cysteine-rich, C-type lectin, epidermal growth factor-like, thrombospondin 1-like, and/or transmembrane domains, as well as a cytoplasmic domain. Latency is maintained by the prodomain, and activation is believed to occur as in MMPs, i.e. by cleavage of the prodomain according to a cysteine switch-like mechanism (7,36,43). The first catalytic domain structure to be analyzed was that of adamalysin II from C. adamanteus snake venom (11). This is a compact 203-residue molecule of oblate ellipsoidal shape, notched at the periphery to render a relatively flat substrate-fixing cleft. This cleft separates a large ∼150-residue NSD from a small ∼50-residue CSD (supplemental Fig. 1A). Both the N and C termini are surface-located; the former is linked by a salt bridge to the C-terminal helix αC (7). Adamalysin II deviates most from the metzincin consensus sequence within the conserved regular secondary structure elements, especially at strand βI and helices αA and αC (Fig. 1B). Inserted into the common scaffold, two additional helices are found within the NSD. In the CSD, two disulfide bonds cross-link the irregular connecting segment and attach helix αC to the NSD, respectively. A calcium ion is located on the surface, opposite the active site and close to the C terminus (supplemental Fig. 1A). The S1′ pocket, characterized by a pronounced bulge segment LβIVβV, is hydrophobic and deep, reminiscent of some MMPs. In addition to adamalysin II, a number of snake venom MPs, as well as ADAM-17, ADAM-33, and ADAMTS-1, -4, and -5, have been structurally analyzed to date (supplemental Table 1).
Leishmanolysins are cell-surface proteins present in most trypanosomatid, plasmodiid, and sarcocystid protozoa. They constitute the major component of the promastigote surface and are enzymatically active against polypeptide substrates. They cleave CD4 molecules at the surface of human T cells and protect promastigotes from lysis by complement proteins, suggesting a possible role as a virulence factor. Related sequences have been found in mammals (here called invadolysin), fruit flies, thale cress, nematodes, and bacteria (7,44). The only structurally analyzed family member is L. major leishmanolysin. It is synthesized as a 602-residue inactive precursor in the endoplasmic reticulum with a signal and a 100-residue propeptide, which includes a highly conserved cysteine residue potentially acting as a cysteine switch (see above). Activation liberates a mature MP of ∼280 residues, followed by an ∼200-residue C-terminal domain. A 63-residue insertion domain is observed between the glycine and the third zinc-binding histidine of the long consensus motif (supplemental Fig. 1A). The MP domain is the most asymmetric among metzincins, with a 175-residue NSD and just an ∼45-amino acid CSD. Its N terminus is located on the back left surface. The NSD is characterized by a β-sheet, which lacks strand βII (supplemental Fig. 1, A and B), and by the presence of two unique ∼40-amino acid inserted flaps, which account for most of the differences in size from the other proteins of the clan. The NSD is cross-linked by two disulfide bonds. Preceding strand βIV, a slightly prominent bulge segment lies on top of the shallow, medium-sized S1′ pocket, which is delimited by the wall-forming segment and the beginning of the active-site helix αB. At the end of the CSD, helix αC is followed by a segment in an extended conformation, which runs from left to right across the back surface (14).
The most recent family to be structurally characterized is the pappalysins. They were named after human PAPP-A, a heavily glycosylated 170-kDa multidomain protein specifically cleaving insulin-like growth factor-binding proteins (45). Proulilysin is a 38-kDa archaeal protein from M. acetivorans that shares sequence similarity with PAPP-A, but it encompasses only the prodomain and the catalytic domain (15). The proprotein may undergo cysteine switch-mediated activation, as suggested by the presence of a conserved cysteine in the prodomain. Activation occurs autolytically in the presence of calcium. With 262 residues, mature ulilysin is the largest of the metzincin catalytic domains. As distinguishing features, it presents in the NSD a loop dividing strand βII into two substrands (βII and βII′) and a β-ribbon inserted within LβIIIβIV that protrudes from the molecular surface and frames the active site on its primed side. The segment connecting helix αA with strand βII is the largest among metzincins; it covers almost all of the back of the molecule from the NSD to the CSD in a cape-like fashion and includes two unique α-helices. The glycine of the zinc-binding motif is replaced in ulilysin and a small subset of pappalysins by an asparagine under slight variation of the main-chain angles, which do not correspond here to a high-energy conformation but to a left-handed α-helix. Overall, the chain trace flanking this residue is indistinguishable from other metzincins (supplemental Fig. 1) (7). The CSD shows two disulfide bonds and a unique two-calcium site. This site is a molecular switch for activity, as the proteinase can be reversibly inhibited through calcium chelators (15,20,21). Finally, in the absence of an unbound structure, ulilysin may possess a fifth zinc-binding tyrosine ligand provided by the Met-turn that is swung out upon substrate binding.

Conclusions
The metzincins constitute a clan of ubiquitous MPs present in all kingdoms of life, which are pivotal for physiology and pathology. So far, seven families have been structurally analyzed. The clan represents a case of divergent evolution from an urmetzincin, which may not look very different from the smallest member, snapalysin. Into such a minimal scaffold, evolution has introduced unique structural elements conserved among each family. Based on the presence of an extended zinc-binding consensus sequence pattern, several more families have been suggested to enlarge the clan, although future structural analysis will be required to confirm their ascription. Structural information on each of the seven families may help to elucidate common catalytic and processing mechanisms and to assign the function of proteins encoded by newly discovered gene sequences.