Proteases: History, discovery, and roles in health and disease

The Journal of Biological Chemistry (JBC) has been a major vehicle for disseminating and recording the discovery and characterization of proteolytic enzymes. The pace of discovery in the protease field accelerated during the 1971–2010 period that Dr. Herb Tabor served as the JBC's editor-in-chief. When he began his tenure, the fine structure and kinetics of only a few proteases were known; now thousands of proteases have been characterized, and over 600 genes for proteases have been identified in the human genome. In this review, besides reflecting on Dr. Tabor's invaluable contributions to the JBC and the American Society for Biochemistry and Molecular Biology (ASBMB), I endeavor to provide an overview of the extensive history of protease research, highlighting a few discoveries and roles of proteases in vivo. In addition, metalloproteinases, particularly meprins of the astacin family, will be discussed with regard to structural characteristics, regulation, mechanisms of action, and roles in health and disease. Proteases and protein degradation play crucial roles in living systems, and I briefly address future directions in this highly diverse and thriving research area.


Historical aspects of proteases and their role in protein degradation
In the very first issue of the Journal of Biological Chemistry (JBC) in 1905, P. A. Levene published studies on "The Cleavage Products of Proteoses" (1). The Journal continually published state-of-the-art work on proteases over the years, but the pace of discovery in the field accelerated during the 39 years that Herb Tabor served as Editor of the JBC. When Herb began his tenure as Chief Editor of the JBC (1971), we knew the fine structure and a substantial amount about the kinetics of only a few proteases. Some examples of the major classes of proteolytic enzymes (aspartic, serine, cysteine, metallo) that were well studied before 1970 are as follows.
• Pepsin, an aspartic protease of the stomach, was one of the first enzymes to be discovered, characterized, and named (in 1825), and it was crystallized in 1930 (2). Studies of pepsin's action can be found in the JBC as far back as 1907 (3), and mechanistic studies were well under way in the 1970s.
• The serine proteases, trypsin and chymotrypsin from pancreatic secretions, were also discovered in the 1800s and crystallized in the 1930s (4). Studies of the action of trypsin appeared in the JBC in 1907 (5), whereas those for chymotrypsin appeared in the 1930s (6).
• Papain, the cysteine protease from papaya, was also discovered in the 1800s, and pure forms were reported in the JBC as early as 1954 (7).
• Thermolysin, an extracellular metalloprotease from thermophilic bacteria, was the first metalloendoproteinase to be crystallized and to have its structure solved (8).
There are many excellent reviews available for individually characterized proteases and for clans and families of proteases, as well as for general insights into functional aspects of proteases (e.g. see Ref. 13). A comprehensive database, MEROPS, of the more than 1000 individual proteases is available to all and contains a wealth of information on the characterization and evolutionary relationships of the proteases and the current literature (https://www.ebi.ac.uk/merops/) (98). A degradome database of human proteases (14) and the Handbook of Proteolytic Enzymes (15) are also valuable resources.
There was ample new information coming forth in the 1960s and early 1970s on the structure and function of small, secreted proteases (such as those cited above), but little or nothing was known about cell-associated proteases, cellular functions of proteases, or protein turnover. In an era when there were great advances and interest in the mechanisms of protein synthesis (the 1950s and 1960s), there was a comparative dearth of information and effort devoted to studies of protein degradation. That said, it had been known since the pioneering studies of Schoenheimer (1942) (16) that there was continuous turnover (synthesis and breakdown) of cellular proteins in eukaryotic cells. The extent of that turnover (intracellular protein degradative process) and its importance to the vitality of the cell, however, were unappreciated. Cell death was recognized to involve proteases, as were wasting diseases (e.g. type 1 diabetes), and lysosomes (17) were thought to handle these "downhill" processes through autophagy. Studies with individual proteins indicated great differences in turnover of specific proteins (18,19), and the concept of short- and long-lived proteins grew with studies of many individual cellular proteins. There was expanding interest in intracellular protein degradation in the 1970s, and one of the first conferences in the United States that heralded that interest was organized by Bob Schimke (an Associate Editor of the JBC) and Nobuhiko Katunuma (a prominent biochemist in Japan) in 1973, the Conference on Protein Turnover in Palo Alto, California (20). Intracellular protein degradation was clearly of international interest and activity, leading to several conferences in Europe in the 1970s.
For example, Alan Barrett organized a meeting at Strangeways Research Laboratory in Cambridge, England, in 1970 on tissue proteinases; in 1973, a group of scientists at the Martin Luther University in Halle, German Democratic Republic (GDR), organized a symposium on intracellular protein catabolism in Reinhardsbrunn, GDR; Vito Turk organized a meeting in 1975 in Ljubljana, Yugoslavia (now Slovenia); and Professors Horst Hanson and Peter Bohley organized additional conferences on intracellular proteolytic enzymes and protein turnover in vivo in 1977 and 1981. The 1970s were times in which GDR scientists could not leave their country for meetings, so scientists in Western countries went to the GDR, placing science above politics. This interest resulted in the formation of committees to increase communication among scientists who work on proteases and protein turnover. First there was ECOP, the European Committee on Proteolysis, in 1981, followed by ACOP (the American Committee on Proteolysis, which organized the 5th International Symposium on Intracellular Protein Catabolism) and then JCOP, the Japanese Committee on Proteolysis, and finally ICOP, the International Committee on Proteolysis. These were forerunners of the current International Proteolysis Society formed in 1999.
This JBC Review is part of a collection honoring Herbert Tabor on the occasion of his 100th birthday. The author declares that she has no conflicts of interest with the contents of this article. To whom correspondence may be addressed. E-mail: jsb13@psu.edu.
Before the 1970s, there were several myths, or misconceptions, regarding proteolytic enzymes and protein turnover.
• There were many who thought the only function of proteases was to totally degrade proteins at certain stages of life (particularly end-stages) or that their only function was to be secreted in order to degrade extracellular proteins, thereby releasing amino acids so that other proteins could be synthesized.
• It was thought that there were very few proteases in cells and that they could handle a great variety of degradative functions, similar to the trypsins and chymotrypsins along with some exopeptidases that could degrade almost anything in the intestinal tract.
• There were bacteriologists who argued that protein degradation did not occur in growing procaryotes because there was no need to degrade proteins; it was thought that defective, damaged, or useless proteins could be diluted out as cells divided rapidly.
• The known proteases were small (20-35 kDa), compact, uncomplicated (no carbohydrate, lipids, or cofactors) proteins, and it was assumed that this was generally true of all proteases.
• Lysosomes were thought to be the primary or only site for degrading cellular proteins, as well as proteins taken up by endocytosis, and degradation was thought to occur through the merging of lysosomes with other cell components to form autophagic vacuoles.
But now we know that there are a large number of proteases in and secreted from cells. Proteinases are the largest enzyme gene family in vertebrates.
• There are 641 protease genes in the human genome and 677 in the mouse genome (i.e. ~3% of the human and mouse genomes).
• Proteolysis occurs in virtually all stages of a cell's life, in all cell compartments, and in many stages of a protein's existence: from processing of preproproteins coincident or soon after protein synthesis to total destruction of the protein.
• There are a great variety of protease structures, from small to large (20 kDa to 6 MDa), highly complex structures, some containing multiple domains with many posttranslational moieties, such as carbohydrates and lipids.
• Lysosomal proteases are not the only intracellular proteases and, under many circumstances, are not the major proteases responsible for intracellular protein degradation.
• Evolutionary clans and families of proteases have been identified, and the classification of individual proteases is highly developed.
• Proteases regulate fate, localization, and activity of many proteins.
• Proteases are key factors in the health and viability of cells, involved in multiple processes, such as replication, transcription, cell proliferation, differentiation, extracellular matrix remodeling, and processing of hormones and biologically active peptides.
• Proteases are involved in many diseases (e.g. cancer, Alzheimer's, arthritis, blood clotting disorders, allergies, and infections, to name a few).
• Protease inhibitors are useful medically (e.g. angiotensin-converting enzyme inhibitors for blood pressure, HIV inhibitors, proteasome inhibitors for myeloma, dipeptidyl peptidase IV inhibitors for type II diabetes).

Herb Tabor's leadership and protease advances during his oversight of the JBC
The JBC has been a major vehicle for elucidating the structures and functions of proteases and especially the fundamental aspects of these enzymes. Herb was responsible for keeping the Journal focused on fundamental/basic science, not the "hot science" of the day. His emphasis was on high-quality science that stood the test of time and had the potential of long-range importance and impact.
Herb also has had a strong commitment to and influence on the ASBMB. I know this through my role as an Associate Editor of the JBC from 1999 to 2012 and as a president of the ASBMB (2004-2006). Herb participated in many activities of ASBMB, including business and financial meetings, publication committee meetings, centennial planning meetings, and Associate Editor and editorial board member activities. When it was time for the centennial celebration, he felt strongly that both the Society and the JBC should be celebrated together, even though the Journal started in 1905, one year before the Society was established (1906). He gave strong support to the Associate Editors and staff. He was always thinking ahead about issues, best ways to communicate, and new emerging areas. Herb always listened to various viewpoints, considered alternatives, and had an uncanny way of getting people to "agree" with his view. He has always been forward-looking and especially encouraged the online version of the Journal; the JBC was the first of the life science journals to appear online (in 1995).
There has been great excitement about proteases and their functions in the last half-century. A few examples of discoveries that created that excitement will be mentioned here.
• The discovery of proteasomes and the ATP-ubiquitin proteolytic pathway certainly changed our view of the world of protein degradation. The role of ubiquitin and the proteasome in intracellular protein breakdown began to unfold in the 1970s (e.g. see Refs. 21-23) and expanded rapidly in the 1980s (e.g. see Refs. 24-28).
• Signal peptidases that cleave signal peptides from secretory and membrane-associated proteins as they are translocated across membranes and into the endoplasmic reticulum were discovered in the 1970s and 1980s (e.g. see Refs. 29-31).
• Caspases, proteases involved in programmed cell death (apoptosis), were discovered in Caenorhabditis elegans in the 1980s, and the complexity of the caspase family in humans and the role of these enzymes in apoptosis and cytokine processing were revealed in the 1990s (e.g. see Refs. 32 and 33).
• The HIV-1 protease, the retroviral aspartic protease that is essential for the maturation of the AIDS virus, was discovered in the 1980s (see Ref. 34). This protease is a prime target for drug therapy, and inhibitors of the protease, along with other drugs, have greatly prolonged the lives of people infected with the virus. The development of inhibitors of the HIV-1 protease was accelerated by the large body of information available about aspartic proteases in many organisms, which allowed development of specific viral protease inhibitors. This is an example of the importance of basic science for therapeutic advances.
• The great variety of cysteine proteases (e.g. cathepsins and calpains) and their diverse functions have come to light in recent decades (see, for example, Ref. 35). They participate in a variety of processes, including autophagy, the lysosomal degradation of cellular constituents. The discovery of the molecular players in the autophagic process has enhanced our understanding of this process in health and disease (36,37).

Advances in metalloproteases, highlighting meprins
Metalloproteases have emerged as a fascinating group of enzymes. They are present in all kingdoms of living organisms and have expanded widely during evolution. In 1980, 11 metalloproteinases were identified (38). Now we know that the mouse and human genomes encode ~200 metalloproteinases, the largest group in the proteolytic enzyme realm (39). Most of these enzymes are secreted from cells or plasma membrane-bound, and they act pericellularly and extracellularly. They are involved in tissue differentiation and remodeling during embryogenesis and in processing biologically active peptides and cytokines in adult tissues. Angiotensin-converting enzyme inhibitors to control blood pressure are among the most widely used inhibitors for humans. Metalloproteinases are also involved in many diseases, such as cancer and inflammatory diseases. They and their inhibitors (e.g. TIMPs (tissue inhibitors of metalloproteinases)) are of great medical interest and have provided both optimism and disappointment in clinical trials. The use of synthetic inhibitors of metalloproteinases to inhibit cancer cell mobility holds great promise but has not yet reached its potential.
The metzincin superfamily contains most of the known metalloendoproteinases (zinc-containing enzymes that cleave peptide bonds internally on protein substrates) (39). The superfamily is composed of six evolutionarily related families: a disintegrin and metalloproteinases (ADAMs), MMPs, pappalysins (pregnancy-associated plasma proteins), serralysins (bacterial enzymes), leishmanolysins (protozoan proteinases), and astacins (Fig. 1). Each of these families has multiple individual enzymes and fascinating stories of discovery and functions. Interestingly, there are relatively low amino acid sequence similarities between the protease domains of different families. However, all of the catalytic domains have strikingly similar three-dimensional structures as well as a conserved zinc binding domain (HEXXHXX(G/N)XX(H/D)) at the active site and a conserved methionine-containing turn (Met-turn) (40). This review will focus on the astacin family (41,42), and particularly meprins of this family.
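The conserved zinc-binding consensus quoted above, HEXXHXX(G/N)XX(H/D), can be read as a simple sequence pattern. The following is a minimal sketch, assuming X stands for any single residue written as an uppercase one-letter code; the function name and the toy sequences are hypothetical and invented purely for illustration, and they are not taken from any real protease.

```python
import re

# Consensus quoted in the text: HEXXHXX(G/N)XX(H/D), with X = any residue.
# Wrapping the pattern in a lookahead reports overlapping hits as well.
ZINC_MOTIF = re.compile(r"(?=(HE[A-Z]{2}H[A-Z]{2}[GN][A-Z]{2}[HD]))")


def find_zinc_motifs(sequence):
    """Return (0-based start, matched residues) for each motif occurrence."""
    return [(m.start(), m.group(1)) for m in ZINC_MOTIF.finditer(sequence.upper())]


# Hypothetical toy sequences, invented for illustration (not real proteases).
toy_with_motif = "MKTAHEAAHAAGAAHLLK"     # G at the (G/N) position, H at the end
toy_variant = "HEAAHAANAAD"               # N at the (G/N) position, D at the end
toy_without_motif = "MKTAHEAAHAAPAAHLLK"  # P breaks the (G/N) position

print(find_zinc_motifs(toy_with_motif))
print(find_zinc_motifs(toy_variant))
print(find_zinc_motifs(toy_without_motif))
```

A match of this kind only flags a candidate zinc-binding site; as noted above, family membership also rests on the shared three-dimensional fold and the Met-turn, which a sequence pattern alone cannot establish.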
The astacin family was recognized as a consequence of extensive cloning and sequencing that occurred in the 1980s and 1990s (see Fig. 1). The original members of the family, identified by sequence similarities, were as follows: the crayfish digestive enzyme astacin, bone morphogenetic protein-1 (BMP-1) from human bone, meprins from mouse kidney and human intestine, and UVS.2, a partial sequence from Xenopus laevis embryos (41). The name "astacin family" was chosen because the crayfish Astacus astacus enzyme was the first to be sequenced and characterized (43,44). Astacins are present in animals and bacteria; none have yet been found in plants and fungi. Hundreds of astacins have been identified as genome sequencing expands to many species (45). In the human and mouse genomes, six astacin family genes have been identified, including two meprin genes, three BMP-1/tolloid-like genes, and one ovastacin gene. However, in Drosophila melanogaster, there are 16, and in C. elegans there are 40 astacins. The functions of most of the astacin genes in D. melanogaster and C. elegans have not been determined, but astacin enzymes are involved in movement through extracellular matrices in parasitic nematodes and in head regeneration in Hydra (46).
Of the characterized astacins, the crayfish astacin is the smallest, containing a 200-amino acid residue catalytic domain. From cDNA sequencing, it is known that there is a prepro sequence that is cleaved off during protein synthesis. Pre or signal sequences are found in all of the astacin family members examined thus far, presumably to direct the protein into the endoplasmic reticulum and the secretory pathway. Pro sequences keep the enzymes inactive as a regulatory mechanism. Whereas the active crayfish protein contains only the ~20-kDa catalytic domain, most of the astacin family members contain one or more noncatalytic domains, C-terminal to the protease domain. Many contain one or more copies of an epidermal growth factor (EGF)-like domain, and a CUB (complement subcomponents C1r/C1s, embryonic sea urchin protein UEGF, BMP-1) domain (42). These are important for protein-protein or protein-substrate interactions. The noncatalytic domains are responsible for the variety of sizes of family members, which range from 200 (crayfish astacin) to 900 amino acids (mouse BMP-1). In addition, many of the more complex astacins are highly glycosylated proteins, further increasing their molecular mass and complexity.
Meprins, members of the astacin family, are unique oligomeric metalloproteases, containing homo- and hetero-oligomers of two evolutionarily related subunits, α and β (see Fig. 2). They exemplify the complexity of the metzincin superfamily members.
Meprins were discovered in 1980 as a consequence of a search for proteolytic enzymes in diabetic mice (47,48). In the process of searching for changes in proteolytic activity and protein turnover in streptozotocin-induced diabetes in BALB/c mice, proteolytic activities were measured in liver, kidney, and muscle tissues using a variety of substrates. No fundamental changes were found in degradation rates in the liver or in the proteolytic activities measured in diabetic mouse tissues compared with controls. However, it was noted that the kidneys of these mice had a relatively high activity using azocasein as substrate at basic pH values (pH 9). The enzyme was then purified from BALB/c mouse kidney and found to be a glycosylated membrane-bound metalloprotease with a subunit molecular mass of 85-90 kDa (48). The subunits formed disulfide-bridged dimers, and the dimers formed tetramers of 320 kDa. The human equivalent of mouse meprin, an intestinal enzyme called PABA-peptide hydrolase (named after the substrate hydrolyzed) was reported in 1982 (49), but the similarities of the mouse and human enzymes were not recognized until both were cloned and sequenced (41). In 1983, BALB/c mice were unavailable for a period, and members of my laboratory collected kidneys from two other inbred mouse strains, C3H/He and CBA mice. In contrast to BALB/c mice, these mice had very low kidney azocaseinase activity. This led to a publication (50) describing the "deficiency" in many "C stock" mice. The publication was noticed by Chella David, a mouse immunogeneticist at the Mayo Clinic, who informed us that many C stock mice were bred for transplantation studies and had differences in the major histocompatibility genes (H-2 genes). These observations led to collaborative studies and the discovery that a gene (the Mep-1a gene) on mouse chromosome 17 near the H-2 complex was responsible for the level of expression of meprin activity in mouse kidney (51,52). 
It is now known that the Mep-1a gene codes for the meprin α subunit (53). Further studies with the "deficient" or low-meprin mouse strains revealed that they expressed a latent form of meprin (54), containing meprin β subunits. The catalytic domain of meprin β is 58% identical to that of meprin α, and the β subunit is encoded on mouse chromosome 18 (55), an example of divergent evolution from a single gene (53). The studies of meprin α and β in different strains of mice have led us to understand that there are several different combinations of meprin α and β that exist in mouse kidney to form the quaternary structure of the meprins, and the isoforms of meprin A and B (56), EC 3.4.24.18 and EC 3.4.24.63, respectively. The reason for the lack of expression of the α subunit in adults of some inbred mouse strains is unknown, but it is known that this is developmentally regulated because all mouse strains express both meprin α and β in embryonic kidney and until puberty (57).
The work on meprins in the 1980s was with proteins isolated from kidney brush-border membranes of adult mice; those from mice with high azocaseinase activity contain both α and β subunits, whereas those with the low meprin activity ("deficient" strains) contain only β subunits. The mouse α subunit is fully active at the plasma membrane, whereas the β subunit is predominantly latent but can be activated by trypsin-like enzymes. Studies of the activation of meprin α indicate that removal of the prosequence allows for formation of hydrogen bonds involving the two N-terminal residues that are critical for enzyme structure (58). With the advances of molecular biology in the 1980s and 1990s, especially cloning, sequencing, and site-directed mutagenesis, progress on structure and function proceeded at a rapid pace.
Meprin isoforms are structurally quite complex, with multidomain, multimeric structures, as shown diagrammatically in Fig. 2. The oligomers are composed of meprin α and/or β disulfide-linked dimers that may self-associate to form higher-molecular weight isoforms. Homomeric meprin A contains only α subunits, heteromeric meprin A contains both α and β subunits, and homomeric meprin B contains only β subunits. Both subunits are glycosylated, and the asparagine-linked sugars are important for disulfide bond formation, oligomerization, stability, secretion, and enzymatic activity (59,60). As for the domain structure, both subunits contain a signal sequence, prosequence, protease (catalytic) domain, MAM domain, TRAF domain, EGF-like domain, transmembrane domain, and small C-terminal tail (6-26 amino acids). The noncatalytic domains are important for transport, structure, and activity of the proteases (Fig. 3) (61, 62). One notable difference between the α and β domain structures is that meprin α contains an I (inserted) domain between the EGF and TRAF domains that is missing in the β subunit. There is a proteolytic cleavage within the I domain during maturation in the secretory pathway that results in the release of this subunit from the membrane (63-65). Removing the I domain from the meprin α subunit by site-directed mutagenesis showed that the I domain is necessary and sufficient for proteolysis and release of the subunit from the membrane (Fig. 3). Therefore, meprin isoforms containing only meprin α are secreted into the extracellular space. The disulfide-linked meprin α dimers tend to associate noncovalently into high-molecular weight complexes of 1-6 MDa, among the largest proteolytic complexes secreted in living systems (66) (Fig. 4). This self-association concentrates the monomer meprin A in the extracellular environment and may be important for stability and action, particularly at sites of infection and in the harsh environment of the intestine. Membrane-associated forms of meprins all contain meprin β subunits and may consist of meprin β disulfide-linked dimers, meprin α/β dimers that form tetramers, or meprin α/β dimers that associate noncovalently with meprin α dimers. Cross-linking studies with the meprin B dimer have revealed a compact structure with inter- and intradomain contacts within the protein, including TRAF-TRAF interactions (67). An X-ray structure of human meprin B is available showing that the active site is close to the membrane, which has implications for the shedding activity of this isoform (68).
[Figure legend, beginning not recovered] ..., and C (cytoplasmic). During maturation, the meprin α subunit is cleaved in the I domain, separating the subunit from the membrane. As a result, three isoforms of meprin exist: membrane-bound meprin B (a homodimer of β subunits), membrane-bound meprin A (heterotetramers of α and β subunits, found in ratios of α2β2 and α1β3), and secreted meprin A (homomeric multimers of α subunit dimers). The secreted forms of meprin α dimers tend to self-associate and form large multimers (1-6 MDa) extracellularly.
The localization of meprins to kidney and intestinal brush-border membranes was originally deduced from cell fractionation studies and later through immunohistochemical studies (69,70). Meprins are also expressed in leukocytes, and studies with these cells from mice with a deleted meprin β gene showed a diminished ability to move through extracellular matrix (71). Meprin expression in monocytes and natural killer cells affects homeostasis of these cells (72). Human meprin α and β subunits are expressed in different layers of the epidermis and are involved in cell proliferation and terminal differentiation of human skin cells (73). Through mRNA studies, meprin subunits have been detected in pancreas, testis, fetal liver, embryonic stem cells, and brain tissues. The highest expression levels are in kidney and intestine. For this reason, the functions of meprins have primarily been studied in these tissues. Additional insights into the function of the meprins have been gathered by creating knockout mice and challenging the mice (74,75).
Meprins are capable of hydrolyzing a wide variety of substrates, from peptides to proteins. Mouse meprin B has a clear preference for acidic residues at cleavage sites; by contrast, homomeric meprin A prefers to cleave bonds containing small or hydrophobic amino acid residues (76,77). These preferences lead to clear differences in the hydrolysis of peptide substrates. For example, meprin B, but not homomeric meprin A, cleaves gastrin and osteopontin; homomeric meprin A, but not meprin B, cleaves bradykinin and substance P. Meprin B also cleaves cell surface proteins, such as E-cadherin and ENaC (epithelial sodium channel) and thereby can affect cell-cell interactions and ion transport (78). Meprin A cleaves the tight junction protein occludin, which impairs epithelial barrier function and enhances monocyte migration (79). The meprin isoforms also have different preferences for cytokine activation and degradation, and the balance of the meprin isoforms could have important implications in cytokine profiles and the progression of inflammatory responses (80,81,82). High-throughput techniques, in a search for substrates of human meprins, have revealed many substrates and interesting links between meprins and ADAMs (83).
There is good evidence that meprins are involved in several disease processes, and these are areas that will be explored in the future and have therapeutic possibilities. For example, meprins are present at sites of inflammation, where they affect migration of leukocytes, degrade tight junction proteins, and activate/degrade cytokines. Studies of meprin α knockout mice have shown that decreased expression of this subunit is associated with increased intestinal inflammation in an experimental model of inflammatory bowel disease. Furthermore, the human MEP1A gene is a susceptibility gene for inflammatory bowel disease, particularly ulcerative colitis (84). Meprins also influence the course of urinary tract infections in mice (85). Other studies with mice have implicated meprins in the pathogenesis of kidney diseases (75, 85-89), and polymorphisms in the human MEP1B gene are associated with diabetic nephropathy in Pima Indians (90). Meprins are expressed in various cancer cells (e.g. colon and breast) and are thought to play a role in tumor cell invasion and migration (91-94). Meprins have also been found to cleave amyloid precursor protein (APP) in vivo, implying a role in neurodegenerative diseases, such as Alzheimer's (95). Recent studies of the interaction of meprins with mucins in the intestine imply a role in protecting the host epithelium from bacteria and affecting the microbiome (96).
One of the challenges of the future is to understand the function of proteases in living organisms and to be able to activate and inhibit them selectively in specific tissues. Proteases exist in the context of networks of other molecules and other proteases, in cellular compartments, at cellular membranes, and in the extracellular milieu, and these environments are no doubt critical in determining function. System-wide approaches, such as degradomics, which uses a combination of genetics, cell biology, and proteomics to identify substrates and active proteases, will be necessary to understand and regulate proteolytic systems (97). Herb Tabor has built the foundation for the JBC to move past the identification and characterization of individual enzymes and into complex multicomponent systems to shed more light on the role of proteases in the fabric of life.