Substrate Recognition by Recombinant Serine Collagenase 1 from Uca pugilator

Uca pugilator serine collagenase 1 was cloned and sequenced from a fiddler crab hepatopancreas cDNA library. The full-length sequence encodes a 270-amino acid pre-pro-enzyme that is closely related in structure to the chymotrypsin family of serine proteases. The zymogen form of the enzyme was expressed in Saccharomyces cerevisiae as a fusion with the α-factor signal sequence under the control of the alcohol dehydrogenase/glyceraldehyde-3-phosphate dehydrogenase promoter. Upon activation with trypsin, the recombinant collagenase possesses collagenolytic properties identical to those of the enzyme isolated from the crab hepatopancreas. The collagenase substrate binding pocket recognizes a wide range of basic, hydrophobic, and neutral polar residues; β-branched and acidic amino acids are poor substrates. Acylation, rather than deacylation as for trypsin and chymotrypsin, is rate-limiting for collagenase acting on peptidyl amides. Correlations relating substrate volume and hydrophobicity to catalysis were found for collagenase and compared with those for chymotrypsin and elastase. Relative enzyme efficiencies on single amino acid versus tetrapeptide amide substrates show that collagenase derives less catalytic efficiency from binding of the primary substrate residue than trypsin or chymotrypsin do, but compensates through binding of the extended peptidyl residues. By virtue of its amino acid sequence and multifunctional active site, serine collagenase 1 is a novel member of the chymotrypsin protease family.

The chymotrypsin family of serine proteases is a paradigm for enzymic substrate recognition. The family is subdivided on the basis of four major classes of P1 residue¹ substrate specificity: basic, aromatic, aliphatic, and acidic. These specificities are usually mutually exclusive; substrate discrimination is on the order of 10⁴ to 10⁵ in kcat/Km for trypsin (Lys > Phe), chymotrypsin (Phe > Lys, Phe > Ala), elastase (Ala > Tyr), and V8 protease (Glu > Ala) (2,3). These distinct specificities arise from subtle modification of surface loops surrounding a conserved double β-barrel core structure (3). Sequence and structural similarity suggested a classical model in which only a few critical residues determine substrate specificity (4). However, recent studies demonstrate that converting one protease into another is complex, requiring the transplantation of several active site loops (5,6). Thus, the evolutionary optimization of this enzyme family may obscure important mechanistic and structural commonalities regarding substrate specificity.
Serine collagenase 1 (EC 3.4.21.32), isolated from the hepatopancreas of the fiddler crab Uca pugilator, is a serine protease capable of cleaving native triple-helical collagen (7). The serine collagenases comprise a large family of homologous, yet nonidentical, enzymes of mostly invertebrate origin (8). These collagenases appear to serve primarily a digestive function. Other serine collagenases have been implicated in pulmonary, parasitic, and bacterial diseases (9-11). The enzymology of crab collagenase is unusual, as it possesses activities similar not only to those of the matrix metallocollagenases but also to those of the serine proteases trypsin, chymotrypsin, and elastase (12-14). The collagen cleavage sites of crab collagenase have recently been identified and are located in the protease-sensitive region, 3/4 of the length of the collagen chain from the amino terminus (14). Because crab collagenase and the metallocollagenases attack collagen at similar locations, crab collagenase is an alternative model system for elucidating protease-collagen interactions. Crab collagenase also offers the opportunity to study, in a unified manner, the nature of hydrophobic and basic substrate specificity in the chymotrypsin family of serine proteases.
We present here the cloning, expression, and characterization of crab serine collagenase 1. The collagenolytic activity of the recombinant enzyme is identical to that of the enzyme isolated from crab hepatopancreas. Quantitative structure-activity relationships are determined for collagenase and compared with those of the serine protease homologs trypsin, chymotrypsin, and elastase. These criteria show serine collagenase 1 to be a novel member of the chymotrypsin protease family.

EXPERIMENTAL PROCEDURES
RNA Isolation and cDNA Library Construction-Live fiddler crabs (U. pugilator) were obtained from Gulf Specimen Marine Laboratory (Panacea, FL). The hepatopancreas was dissected, immediately frozen in liquid nitrogen, and stored at −80 °C. Total RNA was extracted from the frozen hepatopancreas using guanidine thiocyanate and partially purified by ultracentrifugation through a cesium trifluoroacetate gradient (15). Poly(A)+ RNA was isolated from total RNA by hybridization to biotinylated oligo(dT), which was recovered from solution using streptavidin-coated paramagnetic beads (PolyATract, Promega). All RNA was stored under ethanol at −80 °C.
A Lambda Zap II crab hepatopancreas cDNA library was constructed and amplified by Clontech Laboratories (Palo Alto, CA). The library contains 1.8 × 10⁶ independent clones, with a cDNA insert size range of 1.0-5 kilobase pairs.
Isolation of the Crab Collagenase cDNA-The polymerase chain reaction (PCR)² was used to amplify a fragment of the crab collagenase cDNA from the U. pugilator hepatopancreas library. Two degenerate PCR primers, denoted FCN1 and FCC1, were synthesized based on the amino and carboxyl termini of the mature protease amino acid sequence (16) (FCN1, 5′-TGCTCTAGA-GTI-GA(A/G)-GCI-GTI-CCI-AA(T/C)-TCI-TGG-3′; FCC1, 5′-GATAAGCTTGA-TTA-IGG-IGT-IAT-ICC-IGT-(T/C)TG-IGT-(T/C)TG-IAT-CCA-3′). Inosine was used to reduce the degeneracy of the oligonucleotide pool by broadening the base-pairing potential at these positions. Library stock (5 μl) containing 3.5 × 10⁸ phage was subjected to PCR with the FCN1 and FCC1 oligonucleotides using standard conditions (17). The PCR program consisted of five cycles of 1 min of annealing at 44 °C, 2 min of polymerization at 72 °C, and 1 min of denaturation at 95 °C, followed by 30 cycles with an elevated annealing temperature of 50 °C. The single-band PCR product was purified by agarose gel electrophoresis and Geneclean (Bio 101). The PCR product was sequenced by the dideoxy method using Sequenase T7 DNA polymerase (U. S. Biochemical Corp.) and the FCN1 and FCC1 primers.
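The two-stage cycling program described above can be written down as data so that the cycle count and cumulative hold time are easy to check. This is a minimal, hypothetical sketch: the Step and Stage structures are illustrative names, the step order follows the text (annealing, polymerization, denaturation), and ramp times and any initial denaturation or final extension, which the text does not specify, are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    minutes: float


@dataclass(frozen=True)
class Stage:
    cycles: int
    steps: tuple


def cycle(anneal_c):
    """One cycle in the order given in the text: anneal, extend, denature."""
    return (
        Step("anneal", anneal_c, 1.0),
        Step("extend", 72.0, 2.0),
        Step("denature", 95.0, 1.0),
    )


def build_program():
    """5 cycles annealing at 44 °C, then 30 cycles annealing at 50 °C."""
    return [Stage(5, cycle(44.0)), Stage(30, cycle(50.0))]


def total_cycles(program):
    return sum(stage.cycles for stage in program)


def hold_minutes(program):
    """Sum of programmed hold times only; instrument ramp times are ignored."""
    return sum(stage.cycles * sum(s.minutes for s in stage.steps) for stage in program)


program = build_program()
print(total_cycles(program), hold_minutes(program))
```

Representing the program as data rather than prose makes the low-stringency first stage explicit: only the annealing temperature changes between stages.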
The library was plated with Escherichia coli strain XL1-Blue, adsorbed in duplicate to nitrocellulose filters, denatured, and fixed according to the manufacturers' standard instructions (Stratagene, Clontech). The probe, 5′-CA-(G/A)AA-(G/A)TA-CAT-(G/A)TC-(G/A)TC-(G/A/T)AT-(G/A)AA-3′, was a degenerate oligodeoxynucleotide based on the FIDDMYFC motif (residues 34-42) of the crab collagenase protein sequence (16). The 5′ end of the degenerate probe was radiolabeled using T4 polynucleotide kinase and [γ-³²P]ATP and hybridized to the plaque lifts overnight at 42 °C as described (18). The filters were washed at 47 °C and autoradiographed (18). Excision and rescue of the Bluescript plasmid containing the cDNA insert were carried out according to the manufacturer's instructions (Stratagene). Both strands of the cDNA clones comprising the composite map were sequenced by the dideoxy method using Sequenase.
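The 96-fold degeneracy quoted for the FIDDMYFC probe follows from multiplying the number of alternative bases at each mixed position. The sketch below is illustrative and assumes the notation used here: alternatives are written in parentheses separated by "/", and inosine (I) is counted as a single base, since it broadens pairing rather than adding distinct species to the pool.

```python
import re
from math import prod


def degeneracy(oligo: str) -> int:
    """Number of distinct sequences encoded by a degenerate oligonucleotide.

    Mixed positions are written as "(G/A)" or "(G/A/T)"; hyphens and the
    5'/3' labels are ignored. Inosine (I) is treated as one base.
    """
    groups = re.findall(r"\(([^)]*)\)", oligo)
    # prod([]) == 1, so a fully defined oligo has degeneracy 1.
    return prod(len(g.split("/")) for g in groups)


probe = "5'-CA-(G/A)AA-(G/A)TA-CAT-(G/A)TC-(G/A)TC-(G/A/T)AT-(G/A)AA-3'"
fcn1 = "5'-TGCTCTAGA-GTI-GA(A/G)-GCI-GTI-CCI-AA(T/C)-TCI-TGG-3'"
fcc1 = "5'-GATAAGCTTGA-TTA-IGG-IGT-IAT-ICC-IGT-(T/C)TG-IGT-(T/C)TG-IAT-CCA-3'"

for name, seq in (("probe", probe), ("FCN1", fcn1), ("FCC1", fcc1)):
    print(name, degeneracy(seq))
```

Counting inosine as one base is what makes the PCR primers only 4-fold degenerate despite their many wobble positions; without inosine each of those positions would multiply the pool size.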
Subsequent screens of the library were carried out using homologous probes generated by PCR with [α-³²P]dCTP from the collagenase clone denoted FC1 (see below) (19). Either an EcoRI fragment containing the entire FC1 cDNA or a 200-base pair EcoRI-NheI fragment from the 5′ end of the cDNA was used as the template. Under conditions of limiting dCTP and high template concentration, the reaction products resembled those of primer extension rather than fragment amplification. These homologous probes were hybridized overnight at 50 °C (18). The filters were then washed at 65 °C and autoradiographed as described (18).
Amino Acid Alignment and Secondary Structure Modeling of Crab Collagenase-The putative signal peptide of crab collagenase was identified from the hydrophobic nature of its amino acids (20). The amino acid sequences of crab procollagenase and shrimp chymotrypsinogen (EMBL accession no. X66415), rat anionic trypsinogen 2 (Protein Identification Resource (PIR) code, TRRT2; Protein Data Bank (PDB) code, 1BRA), bovine chymotrypsinogen A (PIR code, KYBOA; PDB code, 7GCH), and porcine proelastase 1 (PIR code, ELPG; PDB code, 3EST) were aligned using the PILEUP program of the GCG software package (Genetics Computer Group, Madison, Wisconsin) together with consensus structural constraints derived from the alignment of proteases of known three-dimensional structure (21,22).
Expression and Purification of the Recombinant Crab Procollagenase in Yeast-The zymogen form of crab collagenase (procollagenase) was cloned in frame with the α-factor leader of the PsT vector (5). PCR with Pfu DNA polymerase (Stratagene) was used to generate the necessary HindIII and SalI restriction endonuclease cleavage sites. This construct was named PsFC. The full expression vector was created by subcloning the PsFC SstI/SalI fragment containing the alcohol dehydrogenase/glyceraldehyde-3-phosphate dehydrogenase promoter, α-factor leader, and procollagenase into the PyT 1 M circle yeast/E. coli shuttle vector (5), yielding PyFC.
The PyFC construct was electroporated into the AB110 or DM101α strain of Saccharomyces cerevisiae, and transformants were selected by growth at 30 °C on SD (8% glucose) plates lacking either uracil or leucine (23). A small culture was grown in SD-Leu− (8% glucose) for 36 h at 30 °C with gentle shaking. This culture was diluted 1:20 into YPD (2% glucose) and grown for 60-72 h at 30 °C with gentle shaking.
The yeast cells were removed by centrifugation, and the supernatant was adjusted to pH 7.4 by addition of Tris base to a final concentration of 10 mM. DEAE chromatography was performed as described for the enzyme isolated from the crab hepatopancreas (14). Fractions were assayed for procollagenase either by Western blot analysis or by activation with trypsin. The activation assay contained 20 μl of sample, 5 μl of 1 μM TPCK-treated bovine trypsin (Sigma), and 200 μl of 400 μM Suc-AAP-Leu-pNA in 50 mM Tris, 100 mM NaCl, 20 mM CaCl2, pH 8.0. The reaction course was monitored at 405 nm at room temperature using a UVmax microtiter plate reader (Molecular Devices). The fractions containing procollagenase were pooled and adjusted to 50 mM Tris, 100 mM NaCl, 20 mM CaCl2, pH 8.0. Addition of a 0.5% volume of TPCK-treated, agarose-immobilized bovine trypsin (Sigma) resulted in complete activation of the zymogen after 2 h of gentle shaking at room temperature, as monitored by the increase in activity toward Suc-AAP-Leu-pNA. The activated collagenase was further purified by bovine pancreatic trypsin inhibitor affinity chromatography (14). An overall yield of 1 mg of recombinant collagenase/liter of yeast culture was achieved.
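For reference, the final concentrations in the 225-μl trypsin activation assay follow from simple dilution, C_final = C_stock × V_added / V_total. This worked sketch assumes microliter volumes and micromolar stocks for the trypsin and substrate, as given in the procedure; the helper name is illustrative.

```python
def final_conc(stock, vol_added, total_vol):
    """Dilution: C_final = C_stock * V_added / V_total (any consistent units)."""
    return stock * vol_added / total_vol


# Activation assay volumes (μl) and stock concentrations (μM), per the procedure.
sample_ul, trypsin_ul, substrate_ul = 20.0, 5.0, 200.0
total_ul = sample_ul + trypsin_ul + substrate_ul  # 225 μl

trypsin_final_nM = final_conc(1.0, trypsin_ul, total_ul) * 1000.0  # μM -> nM
substrate_final_uM = final_conc(400.0, substrate_ul, total_ul)

print(f"{total_ul:.0f} ul, trypsin {trypsin_final_nM:.1f} nM, "
      f"Suc-AAP-Leu-pNA {substrate_final_uM:.0f} uM")
```

The calculation shows that trypsin is present at roughly 22 nM in the activation assay, while the pNA substrate remains in the hundreds of micromolar range.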
Kinetic Analysis of Recombinant Collagenase, Trypsin, Chymotrypsin, and Elastase-Collagenase was prepared from crab hepatopancreas as described (14). Recombinant rat trypsin was purified as described (24). Other reagents were purchased from the following sources: p-tosyl-L-lysine chloromethyl ketone-treated bovine chymotrypsin (Sigma), porcine elastase (Calbiochem), bovine calf skin collagen (U. S. Biochemical Corp.), Suc-AAP-Abu-pNA (Bachem, Torrance, CA), and Z-GPR-Sbzl (Enzyme Systems Products). All other substrates were from Bachem Bioscience. All enzyme active site titrations, substrate calibrations, kinetic assays, and collagen digestions were carried out as described (14,25). Briefly, pNA kinetic assays were monitored at 410 nm (ε410 = 8,480 M⁻¹ cm⁻¹) in 50 mM Tris, 100 mM NaCl, 20 mM CaCl2, pH 8.0, at 25 °C. A total of 1-4% N,N-dimethylformamide or 2% Me2SO was present in the final reaction buffer. Benzylthioester kinetic assays were monitored at 324 nm (ε324 = 19,800 M⁻¹ cm⁻¹) in the above buffer at 25 °C with the inclusion of 250 μM dithiodipyridine (Chemical Dynamics) and 2% N,N-dimethylformamide. 7-Amino-4-methylcoumarin spectrofluorimetric assays were monitored at an excitation wavelength of 380 nm and an emission wavelength of 460 nm, under conditions identical to those for pNA. Assays were done in duplicate at 5 substrate concentrations, except for Suc-AAP-Asp-pNA, for which kcat/Km was determined using three substrate concentrations in duplicate. The steady-state kinetic parameters were determined by non-linear regression fit to the Michaelis-Menten equation. The standard deviation in kcat/Km was generally less than 10%, although individual rate and binding constants varied to a greater extent; in particular, the error for elastase was 15% in kcat versus Suc-AAP-Val-pNA and 25% in Km versus Suc-AAP-Ile-pNA. Kinetic parameters were plotted versus P1 residue volume (26) and the hydrophobicity constant π (27).
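The pNA workflow described above (convert the absorbance slope to a rate with the extinction coefficient, then fit the Michaelis-Menten equation by non-linear regression) can be sketched in pure Python. This is an illustration, not the authors' software: the 1-cm path length, the function names, and the fitting strategy (solving Vmax exactly for each trial Km and golden-section searching over log Km) are assumptions. kcat would then follow as Vmax divided by the titrated active-enzyme concentration.

```python
import math

EPS_410 = 8480.0  # M^-1 cm^-1, pNA at 410 nm (value from the text)


def rate_from_absorbance(dA_per_min, path_cm=1.0, eps=EPS_410):
    """Beer-Lambert: convert dA/min into product formed, in M/min."""
    return dA_per_min / (eps * path_cm)


def fit_michaelis_menten(s, v):
    """Least-squares fit of v = Vmax*S/(Km + S); returns (Vmax, Km).

    For a fixed Km the optimal Vmax is a linear least-squares solution,
    so only Km is searched (golden-section search on log Km, assuming a
    single minimum inside the bracket).
    """
    def best_vmax(km):
        f = [si / (km + si) for si in s]
        return sum(vi * fi for vi, fi in zip(v, f)) / sum(fi * fi for fi in f)

    def sse(log_km):
        km = math.exp(log_km)
        vmax = best_vmax(km)
        return sum((vi - vmax * si / (km + si)) ** 2 for si, vi in zip(s, v))

    lo, hi = math.log(min(s) / 100), math.log(max(s) * 100)
    g = (math.sqrt(5) - 1) / 2
    a, b = hi - g * (hi - lo), lo + g * (hi - lo)
    for _ in range(200):
        if sse(a) < sse(b):      # minimum lies in [lo, b]
            hi, b = b, a
            a = hi - g * (hi - lo)
        else:                    # minimum lies in [a, hi]
            lo, a = a, b
            b = lo + g * (hi - lo)
    km = math.exp((lo + hi) / 2)
    return best_vmax(km), km


# Synthetic check: noise-free data generated from Vmax = 10, Km = 50 (arbitrary units).
s = [12.5, 25, 50, 100, 200] * 2   # five concentrations in duplicate
v = [10 * x / (50 + x) for x in s]
vmax, km = fit_michaelis_menten(s, v)
print(round(vmax, 3), round(km, 3))
```

Fitting the hyperbola directly, rather than a linearized form, avoids distorting the error structure of the rate data; with only five concentrations, the Km estimate is most reliable when the concentrations bracket Km.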

Detection and Isolation of Crab Collagenase Clones from the Hepatopancreas cDNA Library-Crab collagenase clones were detected in the cDNA library by two methods using degenerate oligonucleotides based on the amino acid sequence of the protease (16). In the first method, a pair of oligonucleotides, FCN1 and FCC1, complementary to the amino and carboxyl termini of mature collagenase, was used in the polymerase chain reaction to amplify a DNA fragment from the cDNA library. A single, intense band of approximately the size of the mature protease coding sequence (670 base pairs) was produced.³ Direct sequencing of the PCR DNA yielded sequence around His57, Gly189, and Phe215 (chymotrypsinogen numbering) of the collagenase. The cDNA library was also screened with a degenerate oligonucleotide complementary to the FIDDMYFC sequence of the collagenase (residues 34-42). This sequence was chosen for three reasons: 1) minimal sequence identity to other serine proteases, 2) proximity to the 5′ end of the gene, permitting isolation of more full-length clones from the oligo(dT)-primed cDNA library, and 3) low amino acid coding degeneracy (96-fold degenerate). Screening of 40,000 plaques yielded 10 primary, 7 secondary, and 3 tertiary isolates. The most complete clone, denoted FC1, contains a 15-amino acid signal sequence, a 29-amino acid zymogen peptide, and the entire 226-amino acid mature form of the collagenase, as well as 143 bases of 5′- and 153 bases of 3′-untranslated sequence (see Fig. 1 and below). The likely start codon of clone FC1 is a non-optimal AGG (Arg), rather than the expected ATG (Met) (28). Because no ATG start codon could be located in any reading frame near the expected start site, further screening of the library was required. Screening of an additional 30,000 plaques with PCR fragments generated from the FC1 template yielded 15 primary, 9 secondary, and 6 tertiary isolates. Two clones, FC2 and FC3, yielded the necessary sequence data.
Clone FC2 provided the requisite ATG start codon, although uncharacterized recombination events rendered its 5′-untranslated region and the 3′ third of the cDNA unusable. Clone FC3 encoded the complete collagenase zymogen minus the signal sequence and 5′-untranslated region, and its 3′-untranslated region extends into the poly(A) tail. The cDNA presented in Fig. 1 is a composite of FC1, the ATG start of FC2, and the poly(A) tail of FC3. The coding sequences of all clones were identical.
Sequence Analysis of Recombinant Collagenase-The published amino acid sequence (16) contained six changes relative to the sequence predicted from the cDNA. These changes appear to reflect errors in the original amino acid sequence determination, rather than amino acid variation due to the cloning of an isozyme of crab collagenase.⁴ The discrepancies and their possible causes are as follows (chymotrypsinogen numbering; the first letter denotes the amino acid predicted from the cDNA sequence and the second letter the amino acid from the original sequence determination): I106V, carryover of Val105; S110V, weak detection of Ser; S164N/N165S, acid-induced N→O acyl shift and weak detection of Ser and Asn; N192D and N202D, acid-induced deamidation. One of the errors in the protein sequence, N192D, maps to the rim of the S1 site and must be considered with regard to the possible effect of its negative charge on substrate recognition. The other errors appear to map to the surface of the enzyme and are most likely functionally inconsequential.
The amino acid sequence of mature crab collagenase is homologous to the mammalian serine proteases trypsin, chymotrypsin, and elastase (35% identity) and to shrimp chymotrypsin (75% identity), another serine collagenase (Fig. 2) (16,29). Virtually all major structural features of a chymotrypsin-like serine protease are found in crab collagenase. Three disulfide bonds (residues 42:58, 168:182, and 191:220) are conserved. Conservation of the double β-barrel core is strict, and the surface loops are similar in size to those of the vertebrate paradigms. Some loops are of unique sequence and may play a role in determining the broad substrate specificity of crab collagenase. The unusual active site geometry of crab collagenase, with Gly189 and Asp226 in place of the Asp189 and Gly226 found in trypsin, is maintained in the cDNA (16).
Comparison of the zymogen peptides of these enzymes further delineates the group, as they are of variable length and share little identity (Fig. 2). Crab collagenase and shrimp chymotrypsin possess zymogen peptides that are 2-3 times longer than those of the vertebrate proteases. The purpose of these large activation domains is unclear, as they are not required for heterologous expression of vertebrate proteases such as trypsin (30). The activation site of procollagenase, VKSSR-IVGG, is more similar to those of chymotrypsinogen, SGLSR-IVVG, and proelastase, ETNAR-VVGG, which are activated by trypsin, than to that of trypsinogen, DDDDK-IVGG, which is activated by enterokinase (31). Crab collagenase may self-activate, or another trypsin-like protease in the crab hepatopancreas may perform this function (32). The primary sequence alignment suggests that crab collagenase and shrimp chymotrypsin are members of a novel serine protease subfamily.
⁴ R. A. Bradshaw, personal communication.
Expression and Purification of Crab Collagenase in S. cerevisiae-Crab procollagenase was cloned into the PyT S. cerevisiae expression vector (5) as a fusion with the α-factor signal sequence under the transcriptional control of the alcohol dehydrogenase/glyceraldehyde-3-phosphate dehydrogenase promoter and alcohol dehydrogenase terminator, yielding the PyFC construct. Yeast containing PyFC secrete a 30-kDa protein into the medium that cross-reacts with anti-crab collagenase antibodies on Western blots.³ The recombinant procollagenase is purified from the yeast medium in much the same manner as the native collagenase from crab hepatopancreas (14). DEAE chromatography, trypsin activation, and subsequent bovine pancreatic trypsin inhibitor affinity chromatography are used to purify the recombinant enzyme to homogeneity. The mature recombinant collagenase is identical in size to that isolated from the hepatopancreas (Fig. 3a).
Activity of Recombinant Collagenase Versus Type I Collagen-The collagenolytic activity of the recombinant collagenase was compared directly with that of the enzyme isolated from the crab hepatopancreas (Fig. 3b). The specificity and rate of collagen cleavage are similar. The signature 3/4- and 1/4-length fragments are identical in morphology, including the 1/4-length triplet. Furthermore, the collagenolytic activity of the recombinant enzyme is completely inhibited by the serine protease inhibitor 4-(2-aminoethyl)benzenesulfonyl fluoride, as previously demonstrated for the hepatopancreas collagenase (14).
Activity of Recombinant Collagenase Versus Peptidyl pNA Substrates-The steady-state kinetic parameters of the recombinant collagenase were determined for a matched set of 15 Suc-AAP-Xaa-pNA substrates varying only in the P1 residue (Table I). Versus the Arg, Lys, Gln, Leu, and Phe substrates, the relative balance of specificities (kcat/Km) of the recombinant enzyme is similar to that reported previously for the hepatopancreas enzyme, within an error of 15-30% (14). The remaining 10 substrates (Ala, Abu, Nva, Val, Nle, Ile, Met, Orn, Asp, and Glu) were selected to map more fully the specificity of crab collagenase for hydrophobic, basic, and acidic residues. The substrate preference of the collagenase is quite broad. The most striking aspect of the specificity of the enzyme concerns the amino acid residues it rejects (Fig. 4): β-branched and acidic side chains are extremely poor substrates. Although the apparent binding constants (Km) for Val and Ile are similar to those of the other hydrophobic substrates, their kcat values are as much as 10³-fold lower. Km does not correlate with P1 residue volume across the substrate set (r = 0.57; see Equation 3 in Table II), suggesting that there are several modes of ground-state binding. This implies the existence of several distinct S1 sites or a single flexible site (14). A correlation of log kcat with P1 residue volume (Å³) is observed irrespective of hydrophobicity (r = 0.76; Equation 1 in Table II) (26). The correlation improves, with an essentially unchanged slope (r = 0.95; Equation 2 in Table II), if only the hydrophobic residues Ala, Abu, Nva, Nle, Leu, Met, and Phe are included. A weaker correlation of log(kcat/Km) versus residue volume (r = 0.89; Equation 4 in Table II) is found for this hydrophobic subset (Fig. 5). These results suggest that the transition state may be stabilized in part by hydrophobic interactions.
It is unclear how the enzyme binds the neutral hydrophilic and basic residues so as to minimize the effects of charge or polarity in the transition state. Bias or insensitivity in the data set may also affect the interpretation of the correlations.
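Correlations such as Equations 1 and 2 in Table II are ordinary least-squares fits of log kcat against P1 side-chain volume, reported with the Pearson coefficient r, optionally restricted to a residue subset. The sketch below shows that computation. The residue labels and values are hypothetical placeholders generated from an exact log-linear relation, not the data of Table I, so the fit simply recovers the generating slope with r = 1.

```python
import math


def linear_fit(x, y):
    """Ordinary least squares y = m*x + b; returns (m, b, Pearson r)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    m = sxy / sxx
    b = my - m * mx
    r = sxy / math.sqrt(sxx * syy)
    return m, b, r


def fit_subset(data, residues):
    """Regress log10(kcat) on P1 volume for the chosen residues.

    `data` maps a residue label to (volume in cubic angstroms, kcat in s^-1).
    """
    xs = [data[res][0] for res in residues]
    ys = [math.log10(data[res][1]) for res in residues]
    return linear_fit(xs, ys)


# Hypothetical placeholder values (NOT Table I): exactly log-linear in volume.
data = {res: (vol, 10 ** (0.02 * vol - 1.0))
        for res, vol in [("A", 90.0), ("B", 120.0), ("C", 150.0), ("D", 180.0)]}
m, b, r = fit_subset(data, ["A", "B", "C", "D"])
print(f"slope {m:.4f} per A^3, intercept {b:.2f}, r = {r:.3f}")
```

Passing a shorter residue list to fit_subset reproduces the kind of restriction used for the hydrophobic-only correlation; comparing slopes between the full and restricted fits is how an "essentially unchanged slope" would be judged.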
Correlations of Serine Protease Specificity-The steady-state kinetic parameters of chymotrypsin and elastase versus the Suc-AAP-Xaa-pNA substrate set were determined under conditions identical to those for crab collagenase (Table I), so that the activities of these different enzymes could be compared accurately. Strong positive (chymotrypsin) and negative (elastase) correlations were found for log kcat or log(kcat/Km) versus P1 residue volume (r ≥ 0.95; Equations 6, 8, 10, and 12 in Table II; Fig. 5). Val and Ile were omitted for chymotrypsin, and Nva and Leu for elastase, as these points deviated significantly from the rest of the data (see "Discussion"). A tight negative correlation of Km versus volume was found for chymotrypsin (r = 0.95; Equation 7 in Table II), whereas a much weaker positive correlation was seen for elastase (r = 0.68; Equation 11 in Table II). The sensitivities of chymotrypsin and elastase log(kcat/Km) to residue volume are equal in magnitude, opposite in sign, and twice that of collagenase (Fig. 5). Chymotrypsin log(kcat/Km) also correlated with π, the log of the octanol:water partition coefficient of the residue minus the log of the coefficient for Gly (27) (m = 2.0, r = 0.98; Equation 9 in Table II). This result with tetrapeptide amides is consistent with the correlation of log(k2/KS) for single-residue esters with π, for which a slope of 2.2 was found (33). Collagenase log(kcat/Km) is less sensitive to π (m = 0.80, r = 0.89; Equation 5 in Table II), while elastase log(kcat/Km) correlated well, with a slope equal and opposite to that for chymotrypsin (m = −2.0, r = 0.94; Equation 13 in Table II).
Contribution of the P1 Residue to Catalytic Efficiency-The relative contribution of the P1 residue to the cleavage of peptidyl substrates was estimated by comparing the catalytic efficiencies of collagenase, trypsin, chymotrypsin, and elastase versus single-residue and tetrapeptide P1-Arg, Phe, or Ala substrates (Fig. 6). While the kcat/Km values of all enzymes for the peptidyl substrates are similar (within 2- to 20-fold), there is a 10- to 10⁴-fold difference in kcat/Km for the single-residue substrates. Trypsin derives the highest kcat/Km from its single-residue Arg substrate, showing a 100-fold differential compared with the peptidyl Arg cognate. Chymotrypsin shows a 10,000-fold differential in efficiency for single-residue Phe versus peptidyl Phe substrates, while elastase kcat/Km versus single-residue Ala is 100,000-fold lower than that versus peptidyl Ala. Interestingly, collagenase shows the same 100,000-fold difference in kcat/Km for both single-residue Arg and Phe substrates, 10- to 1,000-fold greater than that of chymotrypsin or trypsin and similar to that of elastase. Collagenase and elastase depend most on the P2-P4 residues for catalytic efficiency; their low activity on single-residue substrates is a consequence of small P1 residue size or non-optimal P1 residue binding.
Structurally, the degree of P2-P4 binding correlates with the length of the loop formed by residues 215-220 (Fig. 2). This loop forms the lip of the binding pocket and forms a β sheet with the P2-Pn substrate residues (3). Elastase and collagenase have the longest loops, while those of chymotrypsin and trypsin are 1 and 2 residues shorter, respectively.
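One way to put these fold differences on an energy scale is the standard transition-state-theory conversion ΔΔG‡ = RT ln(ratio), applied to the ratio of tetrapeptide to single-residue kcat/Km. This is an illustrative conversion, not an analysis reported here; it assumes 25 °C (the assay temperature) and uses the fold values quoted in the text.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1
T_K = 298.15          # 25 °C, the assay temperature


def ddg_kcal(fold):
    """Apparent free-energy difference RT ln(fold) for a fold change in kcat/Km."""
    return R_KCAL * T_K * math.log(fold)


# Tetrapeptide versus single-residue kcat/Km differentials quoted in the text.
for label, fold in [("trypsin, Arg", 1e2),
                    ("chymotrypsin, Phe", 1e4),
                    ("elastase, Ala; collagenase, Arg or Phe", 1e5)]:
    print(f"{label}: {ddg_kcal(fold):.1f} kcal/mol")
```

On this scale, the P2-P4 interactions would be worth roughly 2.7 kcal/mol for trypsin but about 6.8 kcal/mol for collagenase and elastase, which restates in energetic terms their stronger reliance on extended substrate binding.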
Acylation Is Rate-limiting for Crab Collagenase, Versus Deacylation for Trypsin and Chymotrypsin-The relationship between broad specificity and catalysis was further investigated by determining the steady-state kinetic parameters of collagenase, trypsin, and chymotrypsin versus two series (P1-Arg or P1-Phe) of peptidyl amides and esters that vary only in the leaving group (Table III). The highly specific enzymes trypsin and chymotrypsin maintain high kcat values whether the leaving group is an activated amide (7-amino-4-methylcoumarin or pNA) or a benzylthioester. Either deacylation⁵ or product dissociation is therefore rate-limiting for these enzymes (34). In contrast, collagenase reacts with both sets of substrates and shows an increase of up to 1,000-fold in kcat as the leaving group is changed from 7-amino-4-methylcoumarin to the more labile pNA and Sbzl moieties. Acylation is therefore the likely rate-limiting step for collagenase-catalyzed cleavage of both the P1-Arg and P1-Phe peptidyl amide substrates (34).
⁵ The serine protease mechanism can be depicted as shown by Reaction 1. Under conditions where acylation is rate-limiting, kcat = k2 and Km = KS. Under conditions where deacylation is rate-limiting, kcat = k3 and Km = KS·k3/k2.
DISCUSSION
The cloning and expression of crab serine collagenase 1 has resolved several issues regarding the molecular biology and enzymology of this unusual enzyme. 1) The sequence was verified, and minor errors were corrected. 2) Heterologous expression verified that collagenolytic activity is intrinsic to this serine protease and provided a source of reagent quantities of the enzyme. Serine proteases, along with the matrix metalloproteases, can now be considered true collagenases. The unique nature of the collagenase active site justifies its classification as a major new branch of the chymotrypsin family of serine proteases.
Crab Collagenase and Shrimp Chymotrypsin: Implications for Collagen Recognition and Cleavage-High levels of identity between the pre-pro forms of crab collagenase and shrimp chymotrypsin, another serine collagenase (29), suggest that a region responsible for collagen recognition and cleavage may include the S4-S2′ substrate binding sites of the enzyme. Most of these sites are conserved between crab collagenase and shrimp chymotrypsin, including the acidic residues thought to be important in the recognition of Arg in the P1′ position by the crab enzyme (14). This suggests that the two enzymes bind collagen by a similar mechanism. A notable structural difference between the two enzymes occurs in the primary substrate binding (S1) site. A major determinant of the trypsin-like (Arg, Lys) P1 specificity of crab collagenase is likely to be Asp226 (13,14,16). Shrimp chymotrypsin lacks an Asp at this position, possessing an Ala instead. Several other conservative substitutions at positions 189, 217a, and 218 may further perturb the P4-P1 specificity of the shrimp enzyme. This suggests that shrimp chymotrypsin may cleave collagen at a subset of the sites (Gln and Leu, but not Arg) recognized by crab collagenase (14).
Fig. 6. … Table I, except for trypsin, which is from Ref. 14. Single-residue substrates are Ac-Arg-pNA, Suc-Phe-pNA, and Ac-Ala-pNA. Enzymes are grouped according to P1 residue. Conditions were 50 mM Tris, 100 mM NaCl, 20 mM CaCl2, pH 8.0, at 25 °C, as described under "Experimental Procedures." Gray bars, single residue; striped bars, tetrapeptide.
The Active Site of Collagenase Is Less Hydrophobic than That of Chymotrypsin and Larger than That of Elastase-Extensive quantitative analysis of serine protease specificity has provided the foundation for general theories concerning the interaction of enzymes and substrates (see Refs. 27 and 35 for early reviews). However, much of the groundbreaking work on the specificity of the S1 site was carried out with single-residue esters (27). Because these compounds bear little structural or chemical resemblance to the presumed physiological peptide substrates, their use in examining biological function is open to question. Partial data sets for chymotrypsin and elastase versus the peptidyl amides Suc-AAP-Xaa-pNA demonstrated the utility of this substrate series in mapping specificity (36-38). Our results agreed well with those reported previously for single-residue esters (33) and confirmed the assumption that, at least for hydrophobic P1 substrates, S1 site specificity is largely independent of the nature of the scissile bond and of NH2-terminal groups (27). This allowed an accurate comparative analysis of the recombinant crab collagenase.
Correlations of P1 residue volume with log kcat or log(kcat/Km) were found for the serine protease paradigms chymotrypsin and elastase. Although these enzymes are commonly considered to be specific for aromatic or small hydrophobic residues, respectively, these specificities represent only the upper range of linear continua that span more than 4 orders of magnitude in kcat/Km. The sensitivities of chymotrypsin and elastase to P1 side chain volume, as reflected in the slopes of the correlations, are equal and opposite. This is also the case for the hydrophobicity constant π, a measure of the free energy of transfer of an amino acid side chain from octanol to water.⁶ The slope of +2.0 found for chymotrypsin log(kcat/Km) versus π suggests that the free energy of transfer of a hydrophobic amino acid side chain from the active site of chymotrypsin to water is double the free energy of transfer from octanol to water (−40 to −50 cal/Å²/mol versus −20 to −25 cal/Å²/mol, where Å² refers to the solvent-accessible surface area of the side chain) (39-42). This behavior is attributed to the favorable desolvation of both free enzyme and free substrate in forming the hydrophobic enzyme-substrate complex, equivalent to two transfers from water to octanol (42). Full desolvation of the complex occurs when the hydrophobic surfaces of enzyme and substrate are complementary. The relative slopes of the π and P1 residue volume correlations are identical, suggesting either that the interactions observed are purely hydrophobic or that steric and hydrophobic effects contribute equally in this system. The inverse correlation of elastase log(kcat/Km) with π may represent increasing solvation of the complex as larger substrates are bound to the enzyme, but is likely also to include unfavorable steric effects.
Collagenase log(kcat/Km) is roughly half as sensitive to P1 residue volume and π as that of chymotrypsin and elastase, which possess strongly hydrophobic S1 sites. According to the desolvation model, the collagenase S1 site is less hydrophobic than those of the other two enzymes. The positive slope of the correlation also suggests an active site that is larger than that of elastase. The collagenase S1 site increasingly, but never completely, desolvates larger substrates, and it may also be partially exposed to bulk solvent. Hydrophilic residues, such as Asp226, involved in binding Arg, Lys, Orn, and Gln substrates, likely compromise the hydrophobicity of the region.
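The "double the free energy of transfer" reading of the slopes can be made explicit. Assuming the linear free-energy relationship log(kcat/Km) = mπ + c, with π defined as a difference of log10 octanol:water partition coefficients relative to Gly, each unit of π corresponds to 2.303RT (about 1.36 kcal/mol at 25 °C) of transfer free energy, and the fitted slope scales that quantity. This worked conversion is illustrative and is not part of the original analysis; the sign convention makes favorable transfer or binding negative.

```latex
\begin{aligned}
\pi_X &= \log_{10} P_X - \log_{10} P_{\mathrm{Gly}}, \qquad
\log_{10}\!\left(k_{cat}/K_m\right) = m\,\pi_X + c \\
\Delta\Delta G_{\mathrm{tr}}^{\,\mathrm{water}\to\mathrm{octanol}}
  &= -2.303\,RT\,\pi_X \approx -1.36\,\pi_X~\mathrm{kcal\,mol^{-1}} \\
\Delta\Delta G^{\ddagger}
  &= -2.303\,RT\,\Delta\log_{10}\!\left(k_{cat}/K_m\right) = -2.303\,RT\,m\,\pi_X \\
  &\approx -2.73\,\pi_X~\mathrm{kcal\,mol^{-1}} \quad (m = 2.0,\ \text{chymotrypsin}) \\
  &\approx -1.09\,\pi_X~\mathrm{kcal\,mol^{-1}} \quad (m = 0.80,\ \text{collagenase})
\end{aligned}
```

With m = 2.0, chymotrypsin gains about twice the octanol transfer energy per unit π, whereas the collagenase slope of 0.80 gives less than a single octanol transfer, consistent with a less hydrophobic, partially solvated S1 site.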
Several amino acid residues were consistent outliers in the correlations. The β-branched amino acids Val and Ile are unexpectedly poor substrates for chymotrypsin and collagenase, indicating a constriction in the S1 sites of these enzymes around the β carbon. In contrast, Nva and Leu (and, to some extent, Abu) are exceptionally good substrates for elastase, suggesting that they may bind productively in a hydrophobic region not accessible to other residues. A detailed analysis must await three-dimensional structural verification.
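One simple way to flag "consistent outliers" is to fit the correlation and mark residues whose residual exceeds a chosen cutoff. This is a sketch, not necessarily the authors' procedure; the residue labels, values, and the 1-log-unit cutoff are hypothetical. Because the outlier is included in the fit it pulls the line slightly, so a leave-one-out refit would be a more robust variant.

```python
# HYPOTHETICAL sketch: flag points whose residual from a least-squares line
# exceeds a cutoff (in log10 units). Labels and values are invented.

def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def flag_outliers(labels, xs, ys, cutoff=1.0):
    """Return (label, residual) pairs whose |residual| exceeds the cutoff."""
    slope, intercept = fit_line(xs, ys)
    flagged = []
    for label, x, y in zip(labels, xs, ys):
        residual = y - (slope * x + intercept)
        if abs(residual) > cutoff:
            flagged.append((label, residual))
    return flagged

labels = ["X1", "X2", "X3", "X4", "X5"]   # hypothetical residues
descriptor = [0.0, 1.0, 2.0, 3.0, 4.0]    # e.g. a scaled P1 volume
# X3 lies 2 log units below the line through the other four points.
log_kcat_km = [1.0, 3.0, 3.0, 7.0, 9.0]

outliers = flag_outliers(labels, descriptor, log_kcat_km)
for label, residual in outliers:
    print(f"{label}: residual {residual:+.2f} log units")
```

Here only X3 is flagged; its residual is smaller than 2 log units because the point itself lowers the fitted intercept.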
Ground-state Substrate Binding Does Not Correlate with Transition State Catalysis. Although the serine protease kinetic mechanism⁵ (34,43) describes the formation of a ground-state Michaelis complex (KS) prior to several steps of transition state catalysis (rate-determining step ≈ kcat), the tightness of the complex may not in itself predict the rate of catalysis. Collagenase, with its broad specificity for basic, neutral hydrophilic, and hydrophobic residues, illustrates the generality of this hypothesis. The value of kcat correlates well with P1 residue volume irrespective of chemical nature, suggesting that size is a component of transition state stabilization. In contrast, Km correlates with neither residue volume nor kcat (assuming that acylation is rate-limiting for most substrates, Km ≈ KS). Gln, Arg, and Phe reach similar kcat values with Km values spanning a 100-fold range, indicating that ground-state binding is independent of transition state catalysis. Elastase and chymotrypsin also show better correlations of kcat than of Km with P1 residue volume, again suggesting that these enzymes are designed for transition state catalysis rather than ground-state binding. Site-directed mutagenesis studies of trypsin further support the hypothesis that ground-state binding does not correlate with transition state catalysis (44).
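The fold differences quoted in this discussion (the 100-fold Km range here, and the 35,000-fold, 20-fold, and up to 1,000-fold kcat/Km differences discussed in the next section) can be placed on a free-energy scale with ΔΔG = RT ln(fold). A minimal sketch, assuming 25 °C since the assay temperature is not restated in this section; the Km entry is a binding energy only if Km ≈ KS holds.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)

def fold_to_ddg(fold, temp_k=298.15):
    """Return RT*ln(fold) in kcal/mol; the default temperature (25 C) is an ASSUMPTION."""
    if fold <= 0:
        raise ValueError("fold change must be positive")
    return R_KCAL * temp_k * math.log(fold)

fold_changes = {
    "Km range for Gln, Arg, Phe": 100,
    "kcat/Km, single-residue substrates": 35_000,
    "kcat/Km, cognate tetrapeptides": 20,
    "collagenase S2-S4 contribution (upper bound)": 1_000,
}
for label, fold in fold_changes.items():
    print(f"{label}: {fold}-fold ~ {fold_to_ddg(fold):.1f} kcal/mol")
```

On this scale the 35,000-fold spread among single-residue substrates is roughly 6 kcal/mol, whereas the 20-fold spread among cognate tetrapeptides is under 2 kcal/mol.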
The Coupling of Primary and Subsite Binding in Serine Protease Catalysis. One striking observation of this study is the similar rate of catalysis and level of catalytic efficiency of all the enzymes toward their preferred tetrapeptide substrates, despite large differences in enzyme and substrate structure. Trypsin, chymotrypsin, elastase, and collagenase cleave their preferred tetrapeptide substrates with kcat values within 2-fold of one another. Given the shared chemical mechanism and the similar nature of their physiological oligopeptide substrates, this suggests that all serine proteases of the chymotrypsin family reach a common maximal level of transition state stabilization in the limit of full subsite-induced activation. A key component of high-level catalysis is the coupling of the S1 site with the S2-S4 ... Sn sites (5,45). The structural basis of this productive substrate recognition differs for each enzyme and is a major contributor to substrate discrimination (5,46). This is illustrated by the 35,000-fold variation in kcat/Km for single-residue substrates versus the 20-fold variation for the cognate tetrapeptides. Clearly, the chymotrypsin class of serine proteases uses several different compensatory mechanisms of substrate binding. The degree of productive P2-P4 binding correlates inversely with the selectivity of the S1 site or the size of the preferred P1 residue. Collagenase, which possesses the P1 specificities of both chymotrypsin and trypsin, relies on the S2-S4 sites to a greater extent than the more specific enzymes do, by up to 1,000-fold in kcat/Km. The collagenase kcat/Km values for P1-Phe and P1-Arg substrates are equally sensitive to peptide binding, suggesting that nondiscriminating P2-P4 interactions are a critical component of its broad specificity.
Mechanistic Consequences of Broad Specificity. The optimization of enzyme specificity can also be assessed mechanistically. The serine proteases hydrolyze substrates in two chemical steps after the formation of the Michaelis complex⁵ (34,43). Ser195 attacks the carbonyl carbon of the amide or ester substrate (k2), forming the acyl enzyme and releasing the free amine or alcohol. This covalent intermediate is then deacylated (k3) by water, generating the carboxylic acid product and free enzyme. Acylation is generally rate-limiting for amides and deacylation for esters, in part because of the higher pKa of the leaving-group amine relative to the alcohol (43). Although this is almost invariably true for single-residue substrates, deacylation can become rate-limiting for longer peptidyl amide substrates that contain more potential binding energy (5). Trypsin and chymotrypsin are highly efficient, specific proteases, and deacylation (or product dissociation preceding or following deacylation) is likely rate-limiting for their preferred peptide substrates (5,6,25,47). In contrast, acylation remains the rate-limiting step for collagenase with peptide substrates, apparently as a consequence of its much broader activity. That acylation is rate-limiting for collagenase is advantageous for future work, especially in protein engineering. A key issue in mutagenesis studies is a shift in the rate-limiting step of variants relative to the wild-type enzyme. For example, variant trypsins are often severely deficient in catalysis (48-50), so that acylation rather than deacylation becomes rate-limiting (5,24,47). This in turn alters the mechanistic definitions of kcat and Km,⁵ preventing accurate structure/function correlations. Corrective measures include the use of single-residue substrates, the estimation of mechanistic constants from steady-state parameters, or, ultimately, pre-steady-state kinetics (5,6,24). These results are specific to the substrates examined here and should not be extrapolated, as other mechanistic steps may be rate-limiting for longer oligopeptide or natural substrates.
In this regard, collagenase may prove especially useful in exploring the interplay between substrate binding and catalysis at a macromolecular level.
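The kinetic mechanism of footnote 5 is not reproduced in this section; the sketch below assumes the standard three-step acyl-enzyme scheme described above (Michaelis complex KS, acylation k2, deacylation k3). For that scheme, kcat = k2k3/(k2 + k3), Km = KS·k3/(k2 + k3), and kcat/Km = k2/KS. The code shows why a shift in the rate-limiting step changes what kcat and Km report, and it illustrates one common way to estimate mechanistic constants from steady-state parameters when k3 is known independently (for example, from an ester that forms the same acyl enzyme). All rate constants are hypothetical, and this is not necessarily the procedure of refs. 5, 6, or 24.

```python
# ASSUMED scheme: E + S <=> ES (Ks, rapid equilibrium)
#   ES --k2--> acyl-E + P1 (acylation); acyl-E --k3--> E + P2 (deacylation)

def steady_state(k2, k3, Ks):
    """Return (kcat, Km) for the three-step acyl-enzyme scheme."""
    kcat = k2 * k3 / (k2 + k3)
    Km = Ks * k3 / (k2 + k3)
    return kcat, Km

def mechanistic_constants(kcat, Km, k3):
    """Back-calculate (k2, Ks) from kcat, Km and an independently known k3."""
    if not 0 < kcat < k3:
        raise ValueError("kcat must be positive and smaller than k3")
    k2 = kcat * k3 / (k3 - kcat)  # rearranged from 1/kcat = 1/k2 + 1/k3
    Ks = Km * k2 / kcat           # from kcat/Km = k2/Ks
    return k2, Ks

KS = 1e-3  # hypothetical ES dissociation constant, M

# Acylation rate-limiting (k2 << k3): kcat ~ k2 and Km ~ Ks.
acyl_kcat, acyl_Km = steady_state(k2=1.0, k3=1000.0, Ks=KS)

# Deacylation rate-limiting (k3 << k2): kcat ~ k3 and Km falls well below Ks.
deacyl_kcat, deacyl_Km = steady_state(k2=100.0, k3=10.0, Ks=KS)

# kcat/Km equals k2/Ks in both regimes.
print("kcat/Km:", acyl_kcat / acyl_Km, deacyl_kcat / deacyl_Km)
print("Km/Ks:", acyl_Km / KS, deacyl_Km / KS)

# Recover k2 and Ks for the deacylation-limited case, given k3.
k2_est, Ks_est = mechanistic_constants(deacyl_kcat, deacyl_Km, k3=10.0)
print("k2 =", k2_est, "Ks =", Ks_est)
```

In the acylation-limited case Km is within 0.1% of KS, which is the condition assumed for collagenase; in the deacylation-limited case Km is about 11-fold smaller than KS, so reading Km as a binding constant would be misleading.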