Long-range Electrostatic Complementarity Governs Substrate Recognition by Human Chymotrypsin C, a Key Regulator of Digestive Enzyme Activation*

Background: Chymotrypsin C (CTRC) targets specific regulatory cleavage sites within trypsinogens and procarboxypeptidases.
Results: The crystal structure of CTRC reveals the structural basis of substrate specificity.
Conclusion: Long-range electrostatic and hydrophobic complementarity drives CTRC association with preferred substrates.
Significance: The observations reveal the mechanistic basis for CTRC selectivity in digestive enzyme activation and degradation.

Human chymotrypsin C (CTRC) is a pancreatic serine protease that regulates activation and degradation of trypsinogens and procarboxypeptidases by targeting specific cleavage sites within their zymogen precursors. In cleaving these regulatory sites, which are characterized by multiple flanking acidic residues, CTRC shows substrate specificity that is distinct from that of other isoforms of chymotrypsin and elastase. Here, we report the first crystal structure of active CTRC, determined at 1.9-Å resolution, revealing the structural basis for binding specificity. The structure shows human CTRC bound to the small protein protease inhibitor eglin c, which binds in a substrate-like manner filling the S6-S5′ subsites of the substrate binding cleft. Significant binding affinity derives from burial of preferred hydrophobic residues at the P1, P4, and P2′ positions in the CTRC binding cleft, although acidic P2′ residues can also be accommodated by formation of an interfacial salt bridge. Acidic residues may also be specifically accommodated in the P6 position. The most distinctive structural feature of CTRC is a ring of intense positive electrostatic surface potential surrounding the primarily hydrophobic substrate binding site. Our results indicate that long-range electrostatic attraction toward substrates of concentrated negative charge governs substrate discrimination, which explains CTRC selectivity in regulating active digestive enzyme levels.

Digestive proteases are synthesized and secreted by the pancreas as inactive zymogens. Physiological activation takes place in the duodenum, where enteropeptidase initiates an activation cascade by specifically activating trypsinogens, which in turn activate chymotrypsinogens, proelastases, procarboxypeptidases, and other digestive enzymes (1). Premature activation of trypsin within the pancreas is understood to be a major initiating factor in chronic pancreatitis, and chymotrypsin C (CTRC)⁴ is a significant player in this process. In approximately half of the families affected by autosomal dominant hereditary pancreatitis, the disease is caused by mutations in the cationic trypsinogen gene PRSS1 that result in either enhanced trypsinogen activation or resistance to degradation (2-4). CTRC possesses the unique capacity to impact trypsinogen activation and stability via two opposing mechanisms: it can cleave cationic trypsinogen either at Phe18-Asp19 within the trypsinogen activation peptide, leading to enhanced autoactivation (5), or at Leu81-Glu82 within the Ca2+-binding loop, resulting in degradation (6). A number of disease-causing cationic trypsinogen mutations exert their effect in part through accelerating cleavage by CTRC at Phe18-Asp19 or through diminishing cleavage by CTRC at Leu81-Glu82 (4). The p.A16V mutation, which accounts for a small percentage of hereditary pancreatitis kindreds and is also associated with idiopathic chronic pancreatitis (7,8), appears to exert its pathological effect solely by increasing the vulnerability of the cationic trypsinogen activation peptide to cleavage by CTRC (4). Finally, several mutations in the CTRC gene itself that lead to loss or impairment of protein function are significantly associated with chronic pancreatitis (9-12).
Both cationic trypsin and CTRC are members of the chymotrypsin family of serine peptidases, which share a common two-β-barrel fold, a well-known catalytic triad of Ser, His, and Asp residues, and a conserved catalytic mechanism for nucleophilic cleavage of peptide bonds. Although these enzymes all catalyze the same reaction, they are differentiated by the distinct substrate sequences that they recognize through a series of subsites located within the active site cleft between the two β-barrels (13,14). CTRC has 50-66% sequence identity with human pancreatic elastase isoforms, which have broad specificity for cleavage after hydrophobic P1 residues,⁵ and ~40% sequence identity with other human chymotrypsin isoforms, which typically cleave after aromatic P1 residues. CTRC distinguishes itself from these other family members by uniquely targeting the regulatory cleavage sites involved in trypsinogen activation (5) and degradation (6), and also serves as a co-activator, with trypsin, of procarboxypeptidases CPA1 and CPA2 (15). The inability of other chymotrypsin and elastase isoforms to target these regulatory sequences (5,15,16) points to the existence of unique elements of CTRC specificity, as does a recent study in which we identified selective inhibitors of CTRC using phage display (17).
The mechanism by which CTRC, confronted with multiple potential substrates and cleavage sites, selects among them is critical to understanding its protective role in the pancreas and its pathological role in disease. Given the opposing effects of trypsinogen cleavage by CTRC within the activation peptide versus the Ca2+-binding loop, it is apparent that the hierarchy of CTRC selectivity toward these competing sites has been carefully titrated by nature. Remarkably, even a very subtle mutation within a cleavage site can shift the activity of CTRC and tip the balance toward a disease state.
In the absence of a crystal structure for CTRC, our efforts to understand this specificity have made use of inhibitor phage display selection and mutagenesis of the natural substrate cationic trypsinogen, with somewhat contradictory results (16,17). Inhibitor studies suggest strong preference for Leu followed by Met at the P1 position (17), whereas substrate mutagenesis suggests that CTRC is much more permissive of alternative P1 residues (16). Another potential specificity feature is suggested by the presence of multiple Asp and Glu residues within favored CTRC target sequences (Table 1); these acidic residues appear consistently in the P4′ position but are otherwise non-uniformly situated. Inhibitor phage display confirms a strong preference for acidic residues at the P4′ position (17), but mutagenesis studies again show only moderate effects upon alteration of individual charged residues within the cationic trypsinogen Ca2+-binding loop (16).
Here, the first structure of active CTRC reveals a familiar fold with distinctive electrostatic features surrounding the substrate binding cleft. The CTRC active site is occupied by the inhibitor eglin c bound in a substrate-like conformation, offering insights into the structural basis for the unique substrate specificity of CTRC toward key physiological and pathological substrate sequences. Analysis of the structure suggests that whereas the bulk of binding energy derives from burial of the P1 residue and several other hydrophobic side chains, specificity may derive largely from an exaggerated role of long-range electrostatic interactions and from a moderate preference for Leu at the P1 position, and may also be influenced by local sequence-dependent backbone conformational tendencies.

EXPERIMENTAL PROCEDURES
Protein Expression and Purification-Human chymotrypsinogen C bearing a C-terminal His10 tag was expressed using transiently transfected HEK 293T cells, purified using metal chelation chromatography, dialyzed against 50 mM sodium phosphate (pH 8.0) and 300 mM NaCl, concentrated to 20 mg/ml, and activated by cleavage with cationic trypsin as previously described (4).
TABLE 1
inhibitor sequences recognized by CTRC
The abbreviations used in the table are: Tg1, human cationic trypsinogen; Tg2, human anionic trypsinogen; Tg3, human mesotrypsinogen; proCPA1, human procarboxypeptidase A1; proCPA2, human procarboxypeptidase A2; SGPI-2, Schistocerca gregaria proteinase inhibitor-2. *, substrate peptides shaded in gray have been modeled in the CTRC active site as described in the text; structural coordinates for the models are provided as supplemental Models S1-S3. **, sequence shown is the consensus of multiple clones selected from a phage-displayed SGPI-2 library that was diversified at P4, P2, P1, P1′, P2′, and P4′. Residues shown in parentheses were at positions that were not diversified in the library.

A construct for recombinant bacterial expression of eglin c from Hirudo medicinalis was a generous gift from Professor Robert S. Fuller, University of Michigan Medical School. Eglin c was expressed in Escherichia coli host strain BL21(DE3) and purified to homogeneity from periplasmic extract by SP-Sepharose chromatography essentially as previously described (18). Eglin c was dialyzed into 10 mM (NH4)2OAc at pH 6.0, lyophilized, and stored at −80 °C until use, when it was reconstituted with H2O.

Crystallization-A 1:1 (mol/mol) mixture of CTRC and eglin c was concentrated to achieve a protein concentration of 4-5 mg/ml. Crystallization employed the hanging drop vapor diffusion method. Diffraction quality crystals were grown at room temperature from droplets containing 0.2 M lithium sulfate monohydrate, 0.1 M Tris-HCl (pH 8.5), and 30% (w/v) PEG 4000. Crystals grew over the course of 3 weeks to 0.1 × 0.2 × 0.2 mm. Crystals were soaked in a cryoprotectant solution (0.2 M lithium sulfate monohydrate, 0.1 M Tris-HCl (pH 8.5), 30% (w/v) PEG 4000, and 15% glycerol) and cryocooled in liquid N2.
Crystals were screened for diffraction and data were collected at 100 K at beamline X29 at the National Synchrotron Light Source, Brookhaven National Laboratory. The best crystals diffracted to 1.9-Å resolution. We identified crystals of CTRC/eglin c belonging to orthorhombic space group P2₁2₁2₁, with unit cell dimensions a = 56.27, b = 76.25, c = 81.82 Å, and containing one complex in the asymmetric unit. The data were merged and scaled using DENZO/SCALEPACK (19).
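As a rough consistency check on the reported cell, the unit cell volume and a Matthews coefficient can be estimated from the three cell edges. The sketch below assumes the cell edges are given in Å, uses the four asymmetric units per cell of space group P2₁2₁2₁, and takes a hypothetical complex mass of 36 kDa, an illustrative placeholder that is not stated in the text.

```python
# Minimal sketch: unit cell volume and Matthews coefficient for an
# orthorhombic crystal. The complex mass below is an assumed placeholder.

def orthorhombic_cell_volume(a, b, c):
    """Volume of an orthorhombic cell (all angles 90 degrees), in cubic angstroms."""
    return a * b * c


def matthews_coefficient(cell_volume, asu_per_cell, complexes_per_asu, mass_da):
    """Crystal volume per unit of protein mass, in cubic angstroms per dalton."""
    return cell_volume / (asu_per_cell * complexes_per_asu * mass_da)


CELL_A, CELL_B, CELL_C = 56.27, 76.25, 81.82  # reported cell edges
ASU_PER_CELL = 4                               # P2(1)2(1)2(1): four asymmetric units per cell
ASSUMED_COMPLEX_MASS_DA = 36_000.0             # hypothetical, for illustration only

volume = orthorhombic_cell_volume(CELL_A, CELL_B, CELL_C)
vm = matthews_coefficient(volume, ASU_PER_CELL, 1, ASSUMED_COMPLEX_MASS_DA)
print(f"cell volume = {volume:.0f} A^3, V_M = {vm:.2f} A^3/Da")
```

Changing the assumed mass or the number of complexes per asymmetric unit shows how sensitive the packing estimate is to those inputs.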
Structure Determination-The x-ray structure of chymotrypsin C in complex with eglin c was solved by molecular replacement using Phaser (20) supported by CCP4. The elastase chain from the previously solved structure of a porcine elastase-inhibitor complex (PDB code 1EAI, chain A) (21) and eglin c from bovine α-chymotrypsin-eglin c (PDB 1ACB, chain I) (22) were used as search models. ARP/wARP was employed after molecular replacement for automated rebuilding of the CTRC-eglin c complex structure (23,24). A test set of 5% of the total reflections was excluded from rebuilding and refinement of the model. Refmac5 (25) was used to carry out refinement and to place water molecules into difference peaks (Fo − Fc) greater than 3σ; manual rebuilding was done using COOT (26). A 10-residue activation peptide chain that was not liberated following proteolytic activation of CTRC due to a disulfide link with the enzyme (chain Q) was manually built into electron density maps (2Fo − Fc) using COOT. Phosphate ions were also added using COOT. The final stage of the restrained refinement included water molecules with peaks greater than 1σ and within acceptable H-bonding distances from neighboring protein atoms and three phosphate ions. Surprisingly, we were unable to build either the His10 tag or the N-linked glycosyl groups of CTRC due to the absence of electron density. The final R/Rfree was 0.157/0.210. Figs. 1, 2, 4A, and 5 were generated using PyMOL (Schrödinger, LLC).
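The R and Rfree values quoted here follow the conventional crystallographic residual, evaluated on the working reflections and on the excluded 5% test set, respectively. A minimal sketch of that residual, assuming observed and calculated structure factor amplitudes that are already on a common scale (the function name and toy amplitudes are illustrative and are not taken from this study):

```python
# Minimal sketch of the crystallographic residual R = sum| |Fo| - |Fc| | / sum|Fo|.
# Amplitudes are assumed to be pre-scaled; all numbers below are toy values.

def r_factor(f_obs, f_calc):
    """Residual between observed and calculated structure factor amplitudes."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


# Working set: reflections used in refinement (toy data).
work_obs, work_calc = [100.0, 50.0, 25.0], [90.0, 55.0, 25.0]
# Test set: reflections withheld from refinement (toy data).
test_obs, test_calc = [80.0, 40.0], [68.0, 46.0]

r_work = r_factor(work_obs, work_calc)
r_free = r_factor(test_obs, test_calc)
print(f"R = {r_work:.3f}, Rfree = {r_free:.3f}")
```

Because the test reflections never influence the model, Rfree is normally somewhat higher than R, as in the values reported for this structure.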
Homology Modeling of Human Elastase and Chymotrypsin Isoforms-Homology models of human elastases (ELA2A, ELA3A, and ELA3B) and chymotrypsins (CTRB1, CTRB2, and CTRL1) were constructed using the SWISS-MODEL workspace (27). Homology models were constructed using porcine elastase bound to an Ascaris chymotrypsin/elastase inhibitor (PDB code 1EAI, chain A) (21) or our model of active CTRC bound to eglin c (PDB code 4H4F, chains A and Q) as templates with 40-50% amino acid identity to the modeled proteins. Molecular surfaces were generated using the Molecular Surface module of Schrödinger 2012 (Schrödinger, LLC). Molecular surfaces were based upon high-resolution settings (0.3 Å grid, probe radius 1.4 Å, and van der Waals radius scale of 1.0 Å). All surfaces were rendered in red-white-blue color scale using PB electrostatic potential from calculated charges at pH 8.5 with the color ramp set to a minimum of −0.35 […]

Molecular Modeling of CTRC Substrate Binding-The x-ray structure for CTRC was imported into the Protein-Preparation-Wizard graphical user interface of Schrödinger with Maestro 2012 version 9.3.5 (Schrödinger, LLC) for adaptation to the OPLS2005 force field. Bond orders were assigned, zero-order bonds to metals were determined, disulfide bonds were created, and all hydrogens were generated for every residue. Hydrogen bond assignment was based on sampling water orientations and taking into account crystallographic waters. Protonation states were predicted for pH 8.5 using PROPKA (28,29). Steric clashes were resolved with convergence of a root mean square deviation to 0.3 Å using the OPLS2005 force field within the Schrödinger interface.
For modeling and molecular docking of peptide substrate sequences, starting conformations of substrates were obtained by Polak-Ribiere Conjugate Gradient energy minimization (30) with the OPLS2005 force field for 5000 steps, or until the energy difference between subsequent structures was <0.001 kJ/mol·Å (28). Force field minimization used a water-based solvent, generating charges with an extended cutoff (van der Waals 8.0 Å, electrostatic 20 Å, H-bond 4.0 Å). We placed soft restraints on all residues >6 Å from the modeled substrate by using harmonic restraints at 100 kcal/mol, and allowed the residues within the 6-Å cutoff to move freely during Polak-Ribiere Conjugate Gradient energy minimization over 500 iterations with repetition as necessary to converge upon a gradient threshold of <0.05.
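The minimization protocol described above combines a Polak-Ribiere search direction with a step limit and a gradient threshold. The sketch below illustrates that general scheme on a toy two-variable energy function. It is a generic textbook variant (Polak-Ribiere with the coefficient clipped at zero and a simple backtracking line search), not the Schrödinger implementation, and the function names, toy energy, and line search constants are assumptions made for illustration.

```python
import math

# Generic Polak-Ribiere conjugate gradient sketch with a step limit and a
# gradient-norm stopping threshold. Illustrative only.

def minimize_polak_ribiere(f, grad, x0, max_steps=5000, grad_tol=0.05):
    """Minimize f starting from x0; returns (x, iterations_used)."""
    x = list(x0)
    g = grad(x)
    d = [-gi for gi in g]                       # start along steepest descent
    iterations = 0
    while iterations < max_steps:
        g_norm = math.sqrt(sum(gi * gi for gi in g))
        if g_norm < grad_tol:                   # gradient threshold reached
            break
        slope = sum(gi * di for gi, di in zip(g, d))
        if slope >= 0.0:                        # not a descent direction: restart
            d = [-gi for gi in g]
            slope = -g_norm * g_norm
        # Backtracking (Armijo) line search along d.
        t, fx = 1.0, f(x)
        x_new = [xi + t * di for xi, di in zip(x, d)]
        while f(x_new) > fx + 1e-4 * t * slope and t > 1e-12:
            t *= 0.5
            x_new = [xi + t * di for xi, di in zip(x, d)]
        g_new = grad(x_new)
        # Polak-Ribiere coefficient, clipped at zero.
        beta = sum(gn * (gn - go) for gn, go in zip(g_new, g)) / sum(go * go for go in g)
        beta = max(0.0, beta)
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        x, g = x_new, g_new
        iterations += 1
    return x, iterations


def toy_energy(x):
    """Hypothetical anisotropic quadratic well with its minimum at (1, -2)."""
    return (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2


def toy_gradient(x):
    return [2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)]


minimum, n_iter = minimize_polak_ribiere(toy_energy, toy_gradient, [0.0, 0.0])
print(minimum, n_iter)
```

In a real force field calculation the energy and gradient functions would be supplied by the modeling package, and restrained atoms would contribute harmonic penalty terms to both.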
We have previously described the methodology used for substrate docking (31); briefly, the binding site was generated via overlapping grids based on the x-ray structure with a default rectangular box centered on the target substrate. Substrates were docked into the binding site of CTRC using Glide extra precision (XP) (Glide, version 5.6, Schrödinger, LLC); molecular conformations were sampled using methods described previously (32). A structure-based pharmacophore score was generated from the optimized, best-scoring pose for each substrate ligand based on the descriptors from the Glide XP score using an established approach (31,33,34). The energetic value assigned to each pharmacophore feature was calculated using Phase (Phase, version 3.2, Schrödinger, LLC) as the sum of the Glide XP contributions of the atoms comprising the site. Overall dockings at the active site were quantified and ranked on the basis of these energetic terms (33,34). To account for protein flexibility and lessen the effects of minor steric clashes, excluded volume spheres corresponding to 80% of the van der Waals atomic radii were created for all CTRC atoms within 6 Å of each substrate or mutagenized residue modeled. A minimum of two poses per substrate, chosen for a combination of best-scoring features, was selected for visual and energetic comparison (31,33,34).

RESULTS
Structure Solution and Refinement-The structure of active human CTRC bound to inhibitor eglin c was determined to 1.9-Å resolution in space group P2₁2₁2₁ with one bimolecular complex per asymmetric unit. The structure was solved by molecular replacement using as search models the previously reported structures of porcine pancreatic elastase (PDB 1EAI) (21), featuring 52% sequence identity to human CTRC, and eglin c (PDB 1ACB) (22). A structure of the bovine CTRC precursor chymotrypsinogen C, possessing 80% sequence identity to human CTRC, has been reported previously (PDB 1PYT) (35); however, the elastase structure was judged to offer a better search model as a result of substantial conformational changes that take place upon protease activation (36). The model was rebuilt and refined to a final Rcryst (Rfree) of 15.7% (21.0%); crystallographic statistics are summarized in Table 2.
Overall Structure of the CTRC-Eglin c Complex-Like other serine proteases of the chymotrypsin family (37), CTRC is composed of two β-barrels, at the interface of which is located the active site containing the catalytic triad of Ser195 (Ser216),⁶ His57 (His74), and Asp102 (Asp121) (Fig. 1A). The substrate binding cleft between the β-barrels is occupied by bound inhibitor eglin c, a 70-amino acid protein protease inhibitor originally isolated from the leech H. medicinalis (38). As has been described previously for eglin c (22,39,40) and for the structurally related chymotrypsin inhibitor 2 (41,42), the inhibitor is a wedge-shaped molecule featuring a hydrophobic core formed by a helix and a small β-sheet, from which protrudes the inhibitory canonical loop, forming the thin edge of the wedge, which fits into the substrate binding cleft of the enzyme (Fig. 1, A and B). The canonical loop of eglin c, composed of residues 40-50, binds to the active site of CTRC in the substrate-like fashion typical of canonical serine protease inhibitors (43-45) (Fig. 1B).
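The two numbers quoted for each residue (chymotrypsin-based numbering, with sequential precursor numbering in parentheses) do not differ by a constant offset, because the chymotrypsin-based scheme contains gaps and insertion codes. A small lookup built only from residue pairs quoted in this report makes that explicit; the dictionary and helper names are illustrative.

```python
import re

# Residue pairs as quoted in the text: chymotrypsin-based label -> sequential label.
CHYMO_TO_SEQUENTIAL = {
    "Cys1": "Cys17",
    "Val16": "Val30",
    "Asn36B": "Asn52",   # insertion code in the chymotrypsin-based scheme
    "His57": "His74",
    "Asp102": "Asp121",
    "Cys122": "Cys141",
    "Asp194": "Asp215",
    "Ser195": "Ser216",
}


def residue_number(label):
    """Extract the integer part of a label such as 'Asn36B' or 'Ser216'."""
    return int(re.search(r"\d+", label).group())


def numbering_offset(chymo_label):
    """Sequential number minus chymotrypsin-based number for one residue."""
    return residue_number(CHYMO_TO_SEQUENTIAL[chymo_label]) - residue_number(chymo_label)


for label in ("His57", "Asp102", "Ser195"):
    print(label, "->", CHYMO_TO_SEQUENTIAL[label], "offset", numbering_offset(label))
```

Because the offset varies along the chain, converting between the schemes requires a per-residue table (or a structure-based alignment) rather than simple arithmetic.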
CTRC residues 1-10 of the cleaved activation peptide chain (CGVPSFPPNL) are retained by the activated enzyme due to a disulfide link between Cys1 (Cys17) of the activation peptide and Cys122 (Cys141), located in the linker between the two β-barrels on the enzyme face distal from the active site (Fig. 1C). The disulfide bonding pattern and consequent retention of the activation peptide, conserved among other chymotrypsins and the elastase 2A isoform, has been demonstrated to stabilize the enzyme against denaturation and proteolytic digestion by pepsin (46). In addition to the covalent link, the activation peptide association is stabilized by two backbone-backbone H-bonds of Gly2 (Gly18), an H-bond between the carbonyl of Pro7 (Pro23) and the side chain of Arg23 (Arg37), and substantial hydrophobic interactions of Pro4 (Pro20), Phe6 (Phe22), Pro8 (Pro24), and Leu10 (Leu26) (Fig. 1C). Clear density is observed for the C-terminal Leu10 (Leu26) of the activation peptide chain, revealing that residues Ser11-Ala12-Arg15 (Ser27-Ala28-Arg29) of the activation peptide have been proteolytically removed. Given the binding preference of CTRC for Leu at the P1 position (17), it is probable that removal of the tripeptide is accomplished through autoproteolytic cleavage in trans, as suggested previously (47).
Although the recombinant CTRC possessed a His10 tag, none of the residues of the tag were visible in the electron density, suggesting that the tag either was disordered or had been removed by autoproteolysis after purification. Likewise, although CTRC has been shown to be glycosylated on Asn36B (Asn52), a modification required for efficient folding and secretion (48), we did not observe density for the glycosyl group. The side chain of Asn36B (Asn52) was poorly defined, and it is likely that the sugar at this site was present but disordered.
Insights into Chymotrypsinogen C Activation-Comparison of the human CTRC structure with the earlier reported structure of bovine chymotrypsinogen C (PDB 1PYT) (35) shows that upon cleavage of the activation loop at Arg15-Val16 (Arg29-Val30), the new N-terminal residue Val16 (Val30) becomes buried, forming a salt bridge with Asp194 (Asp215) and an H-bond with the carbonyl oxygen of Arg143 (Arg162). This reorganization has little impact on the structure of the first β-barrel domain, but results in major conformational alterations of the second β-barrel domain, particularly within loops 1 and 2, which shape the S1 specificity pocket and oxyanion hole, and loop D, which helps to shape the primed side subsites. Although the catalytic triad residues are already roughly positioned in chymotrypsinogen C, small conformational adjustments in all three members of the triad bring them into appropriate alignment for catalysis, reducing the Ser195 Oγ-His57 Nε2 distance from 3.83 to 2.88 Å, and shortening the His57 Nδ1-Asp102 Oδ2 H-bond from 2.95 to 2.55 Å. These alterations are typical of the conserved mechanism of activation of chymotrypsinogen family members (13).
⁶ The CTRC residue numbering used in this report and in the crystal structure coordinates is derived by homology to bovine chymotrypsin, the archetypal member of this peptidase family, for consistency with structural literature in the serine protease field. Designations in parentheses are the corresponding residue numbers based on sequential numbering of the CTRC precursor.

Structure of the CTRC-Eglin c Complex

Details of the Inhibitory Interaction-The inhibitory loop of eglin c is stabilized in the distinctive canonical conformation by a hydrophobic "mini-core" (49) composed of Leu37, Val43, and Phe55, and by an H-bond network involving Thr44, Asp46, the Arg48 backbone N, Arg51, Arg53, and the C-terminal carboxyl group of Gly70 (Fig. 2A). This H-bond network is critical for maintaining protease affinity and resistance to proteolysis of the inhibitor (50-54). The mode of protease binding of canonical inhibitors like eglin c has been shown to very closely mimic the reactive Michaelis complex with a true peptide substrate (45). As is typical of such complexes, the Leu45-Asp46 reactive site peptide bond of eglin c is appropriately positioned for nucleophilic attack by the catalytic Ser195 (Ser216) of CTRC (Fig. 2B), the initial step in enzyme-catalyzed proteolysis. The largely hydrophobic substrate binding cleft of CTRC is fully occupied by eglin c residues 40-50, which mimic non-primed side substrate residues P1-P6 and primed side residues P1′-P5′. The non-primed side residues are positioned by antiparallel backbone-backbone H-bonds between the P3 residue Val43 and CTRC Gly216 (Gly238), and between the P1 residue Leu45 and CTRC Ser214 (Ser236) (Fig. 2C). The Leu45 carbonyl oxygen is positioned to interact with the CTRC oxyanion hole amide nitrogens of Ser195 (Ser216) and Gly193 (Gly214). On the primed side, additional H-bonds are formed between P2′ residue Leu47 and CTRC Thr41 (Thr58) (Fig. 2C).
The S1 primary specificity pocket of CTRC, occupied in the complex by the side chain of eglin c Leu45, is shaped by the hydrophobic side chains of CTRC residues Ala190 (Ala211), Val213 (Val235), and Val226 (Val250) (Fig. 2C). A shallower hydrophobic depression in the substrate binding cleft is shaped by the side chains of CTRC Leu99 (Leu118) and Phe215 (Phe237), which form a binding pocket for P4 residue Pro42. On the primed side, the S2′ subsite, a pocket bordered by the basic side chain of CTRC Arg143 (Arg162) and the hydrophobic side chain of Ile151 (Ile169), is filled by the hydrophobic P2′ residue Leu47 (Fig. 2C).
The backbone conformation of eglin c residues 43-49 bound to CTRC closely parallels that seen in the previously reported structure of eglin c with bovine α-chymotrypsin (PDB 1ACB (22)). By contrast, eglin c residues 39-42 are shifted compared with the bovine α-chymotrypsin-eglin c complex to lie ~3 Å further removed from a basic patch formed by CTRC residues Arg175 (Arg195), Arg218 (Arg241), and Lys224 (Lys248) (Fig. 2D). This basic pocket forms the S6 subsite of CTRC, which is occupied in the complex by a coordinated phosphate ion in the absence of a side chain on the eglin c P6 residue Gly40 (Fig. 2D).
Structural Insights into CTRC Substrate Specificity-CTRC has been shown to act as a regulator of pancreatic zymogen activation by targeting a specific set of substrate cleavage sites not recognized by other chymotrypsin or elastase-like digestive proteases (Table 1) (5,6,15,16). One element of this specificity is more highly efficient cleavage after Leu residues when compared with other chymotrypsin and elastase isoforms (16). By contrast, the elastase isoforms show broad P1 specificity but comparatively low catalytic efficiency on short peptide substrates, whereas chymotrypsins prefer aromatic residues Phe, Tyr, or Trp at P1 (16). The position occupied in CTRC by Val226 (Val250), one of the hydrophobic residues shaping the S1 pocket (Fig. 2C), is in other chymotrypsins filled by Gly or Ala, and the bulkier Val residue at this position is likely to be responsible for the modest binding selectivity of CTRC for Leu in preference to Met, Phe, or Tyr at the P1 position, as identified by phage display selection and inhibitor binding studies (17) and by Km values for cleavage of tetrapeptide substrates (16). Nevertheless, the S1 subsite of CTRC is still capable of accommodating aromatic residues, and in fact, CTRC catalytic rates are slightly enhanced for cleavage after these bulkier residues, resulting in comparable catalytic efficiencies for cleavage after Leu, Met, Phe, or Tyr (16). This result is also consistent with the identification of natural CTRC cleavage sites within protein substrates after Phe, Leu, and Tyr (Table 1).

FIGURE 1 (legend, partial). Eglin c binding loop residues 40-50 are rendered in stick representation, filling (from left to right) CTRC S6-S1 and S1′-S5′ subsites. C, retained activation peptide of CTRC. Residues 1-10 of the chymotrypsin C activation peptide are tethered to the activated enzyme through a disulfide link between Cys1 and Cys122. The activation peptide is depicted in cyan in stick representation, with a 2Fo − Fc electron density map contoured at 1.6σ. Strong density around the Leu10 carboxyl terminus confirms that residues 11-13 of the activation peptide are not disordered but have been proteolytically removed.
Another distinctive feature shared by the natural target sites of CTRC is an unusual clustering of acidic residues. Asp or Glu appear very frequently at the P4′ position, an element of specificity corroborated by phage display selection (17). Acidic residues can also be found at P1′, P2′, P3′, and P5′ on the primed side of the cleavage site, and at P3, P5, and P6 on the non-primed side of the cleavage site (Table 1). To gain insight into the probable electrostatic contribution to this unusual substrate specificity, we calculated the predicted electrostatic surface potential of the CTRC structure, and for comparison we generated homology models and calculated electrostatic surfaces for other human chymotrypsin and elastase isoforms, which do not target the same regulatory cleavage sites (Fig. 3). We observed a particularly striking concentration of positive charge in a ring surrounding the substrate binding cleft of CTRC (Fig. 3, top).
The most intense concentrations of positive charge are created by one cluster of basic residues on the non-primed side of the cleft in the region that makes contact with P5 and P6 substrate residues, and a second cluster of basic residues on the primed side bordering the subsites that recognize P2′, P3′, and P4′ substrate residues. Notably, this charge distribution contrasts markedly with the predicted electrostatic surfaces of other elastase and chymotrypsin isoforms (Fig. 3, lower panels). The human elastases feature small patches of both positive and negative charge in roughly equal proportions, whereas human chymotrypsins possess substrate binding clefts lined with primarily negative charge. Thus, the unusual concentration of positive charge surrounding the CTRC substrate binding cleft is likely to be a major determinant driving specificity for substrate sequences of net negative charge.
Modeling of CTRC-Substrate Complexes-Selection by CTRC among several potential cleavage sites within human cationic trypsinogen has profound health implications. Whereas cleavage in the trypsinogen Ca2+-binding loop is a normal mechanism for CTRC to apply the brakes to a premature cascade of digestive enzyme activation in the pancreas (6), an alternative cleavage within the activation peptide has the potential to accelerate this activation cascade (5). To serve its protective function, CTRC must significantly favor the former cleavage over the latter in the pancreas environment. To gain insight into the interactions of competing cleavage sites with CTRC, we examined the structural context of the Ca2+-binding loop site in our previously reported structure of human cationic trypsin (55), and we also used the new CTRC structure as a starting point to generate models of CTRC bound to specific substrate sequences.
In the cationic trypsin structure, Ca2+ is coordinated within the loop by the side chains of residues Glu75 and Glu85, which anchor the base of the loop, and by carbonyl oxygens of Asn77 and Val80 (Fig. 4A).⁷ CTRC targets the Leu81-Glu82 peptide bond for cleavage. With Ca2+ bound, Leu81 is exposed and accessible to CTRC; however, the peptide backbone stretching away from the cleavage site is unable to assume the canonical conformation required for cleavage (56). Cationic trypsinogen possesses a large number of acidic residues in this region; in addition to Glu75 and Glu85, there are two additional glutamates in the loop (Glu79 and Glu82), as well as three acidic residues in neighboring loops (Glu32, Asp156, and Glu157; Fig. 4A). Thus, even with Ca2+ present to neutralize the charge of Glu75 and Glu85, there is a strongly negative electrostatic potential covering this region of the molecule (Fig. 4B). This potential will generate macroscopic electrostatic complementarity between the Ca2+-binding loop of cationic trypsin or trypsinogen and the substrate-binding site of CTRC. It is anticipated that when Ca2+ is released, this loop becomes more flexible and accessible to CTRC, able to assume a productive binding orientation, and that electrostatic attraction may increase further due to exposure of Glu75 and Glu85.

⁷ The human cationic trypsinogen residue numbering used here is based on sequential numbering of the trypsinogen precursor as is conventionally used in designating natural polymorphisms, e.g. p.A16V.

Because eglin c binds to CTRC in the canonical orientation of an ideal substrate (45), we were able to model substrate sequences into the binding cleft of CTRC using eglin c as a template and then to optimize the interactions through energy minimization. Models were generated for CTRC bound to the human cationic trypsinogen activation peptide (APFDDDDK) and the CTRC-labile site within the cationic trypsinogen Ca2+-binding loop (HNIEVLEGNEQ). The resulting "global" total enthalpies (ΔH) from energy minimization were −70,572 and −71,057 kcal/mol, respectively. Following modeling and minimization, we conducted substrate docking for each substrate with CTRC, giving docking scores of −10.16 and −17.72 kcal/mol, respectively. The cationic trypsinogen Ca2+-binding loop, which contacts the greater number of non-primed side subsites, was predicted to be the more preferred substrate based on overall lowest energy from docking/binding with CTRC.
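The acidic character of the two modeled substrate peptides can be summarized with a simple formal charge count. The sketch below is only a back-of-the-envelope illustration: it assumes fully ionized Asp, Glu, Lys, and Arg side chains, treats His as neutral (a reasonable simplification near pH 8.5), and ignores the peptide termini. The helper names are hypothetical.

```python
# Formal side-chain charge count for the two modeled substrate peptides.
# Assumptions: Asp/Glu = -1, Lys/Arg = +1, His neutral, termini ignored.

ACIDIC = set("DE")
BASIC = set("KR")


def count_acidic(sequence):
    """Number of Asp and Glu residues in a one-letter sequence."""
    return sum(1 for residue in sequence if residue in ACIDIC)


def side_chain_net_charge(sequence):
    """Formal net charge from ionizable side chains only."""
    basic = sum(1 for residue in sequence if residue in BASIC)
    return basic - count_acidic(sequence)


peptides = {
    "activation peptide": "APFDDDDK",
    "Ca2+-binding loop site": "HNIEVLEGNEQ",
}

for name, sequence in peptides.items():
    print(name, sequence,
          "acidic residues:", count_acidic(sequence),
          "net side-chain charge:", side_chain_net_charge(sequence))
```

A count like this captures only the total charge of a short peptide; it says nothing about how the charges are distributed in space or about acidic residues contributed by neighboring loops of the folded protein.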
The docked model of preferred substrate HNIEVLEGNEQ shows the positioning of substrate residues relative to the electrostatic features of the CTRC surface (Fig. 4C); residues Asn77-Ile78-Glu79-Val80-Leu81-Glu82-Gly83-Asn84-Glu85 fill the largely hydrophobic cleft between the flanking clusters of positive charge, with the Leu81 side chain embedded in the hydrophobic S1 subsite. Surprisingly, none of the acidic side chains of the substrate, Glu79, Glu82, or Glu85, which fill the P3, P1′, and P4′ positions, respectively, form direct salt bridges with the clustered basic side chains of CTRC in the energy minimized docked model (Fig. 5A). Instead, it would appear that the complex stabilization attributable to charge complementarity derives from longer range electrostatic interactions. Trypsinogen Glu82 in the P1′ position is stabilized by CTRC Arg62A via an interaction bridged by H-bonds with the P3′ side chain of trypsinogen Asn84. The trypsinogen Glu85 P4′ side chain is positioned equidistant from the guanidinium groups of CTRC Arg39 and Arg143 (about 6 Å from each). The major favorable close interactions in the complex consist primarily of hydrophobic and van der Waals interactions and a series of hydrogen bonds tethering the substrate peptide backbone within the binding cleft (Fig. 5A).
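To give a feel for the magnitude of a charge-charge interaction at roughly 6 Å, where no direct salt bridge is formed, the pairwise Coulomb energy can be estimated with a uniform dielectric. This is a deliberately crude sketch: the point-charge model, the dielectric constants, and the neglect of ionic screening are all assumptions made here for illustration and are not part of the modeling protocol in this report.

```python
# Pairwise Coulomb energy between two point charges in a uniform dielectric.
# 332.06 kcal*angstrom/(mol*e^2) is the usual electrostatic conversion factor.

COULOMB_CONSTANT = 332.06


def coulomb_energy(q1, q2, distance_angstrom, dielectric):
    """Interaction energy in kcal/mol for charges in units of e."""
    return COULOMB_CONSTANT * q1 * q2 / (dielectric * distance_angstrom)


# A carboxylate (-1) about 6 angstroms from a guanidinium group (+1),
# evaluated for a few assumed effective dielectric constants.
for dielectric in (4.0, 20.0, 80.0):
    energy = coulomb_energy(-1.0, 1.0, 6.0, dielectric)
    print(f"dielectric {dielectric:>4}: {energy:6.2f} kcal/mol")
```

Because the energy falls off only as 1/r, several such contributions from clustered basic residues can add up even when no individual pair is within salt-bridge distance.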
The docked model of the competing substrate cleavage site, APFDDDDK (the trypsinogen activation peptide), shows that this shorter substrate fills only the S1-S3 subsites on the nonprimed side of the cleft, forming fewer H-bonds and hydrophobic interactions as compared with the longer substrate (Fig. 5B). In this model, Phe18 fills the hydrophobic S1 subsite, and the substrate does form a single direct salt bridge between Asp20 at the P2′ position and Arg143 (Arg162) of CTRC (Fig. 5B). However, as was the case for the Ca2+-binding loop site, electrostatic stabilization conferred by the Asp residues at the P1′, P3′, and P4′ positions is apparently mediated through longer-range interactions with CTRC.
Mutation of a single residue in the trypsinogen activation peptide, where Val is substituted for Ala16 at the P3 position, predisposes carriers to development of pancreatitis, apparently by altering the balance of the CTRC substrate selectivity in favor of the activation peptide (4,7,8). To explore the structural basis for this shift in selectivity, we modeled the complex of CTRC with the p.A16V mutant sequence. The docked model of the p.A16V mutant trypsinogen activation peptide VPFDDDDK very closely resembles that of the wild-type sequence, with significant differences confined to the mutated residue itself. The N-terminal amine of Ala16 H-bonds to the CTRC Gly216 carbonyl in the model of the wild-type sequence (Fig. 5C), whereas in the mutant model this H-bond is disrupted by slight rotation of Val16 to optimize hydrophobic contacts with the Pro17 ring and with the β-carbon of Arg217A (Fig. 5D). The impact of the mutation on binding interactions with CTRC suggested by these models would not appear to be sufficient to explain the functional importance of the mutation in predisposing carriers to pancreatitis.
Consistent with these minimal structural differences, the calculated binding energy for the p.A16V mutant sequence was very close to that of the wild-type sequence (−8.811 versus −10.16 kcal/mol, respectively) and did not suggest enhanced binding upon mutation as might be predicted by functional studies (4,5). One possible explanation for this apparent discrepancy is that the energy calculations do not take into account entropic components of binding energy, which may differ between the substrate sequences. This explanation is highly plausible, because entropic factors involved in binding are anticipated to differ between the wild-type and mutant activation peptides as detailed further under "Discussion." Thus, it is probable that the mutation critically influences the conformational ensemble and dynamics of the activation peptide in its unbound form, resulting in differential entropic contributions to formation of the CTRC-substrate complex and thereby altering cleavage selectivity.

DISCUSSION
On one hand, biochemical studies with purified proteins show that CTRC, like most digestive proteases, is capable of proteolytic cleavage of multiple target sequences revealing no highly conserved recognition motif (Table 1). CTRC appears to be relatively insensitive to a variety of mutations of the trypsinogen Ca2+-binding loop cleavage site (16), further suggesting fairly promiscuous activity. On the other hand, incontrovertible genetic evidence demonstrates that a subtle mutation of the CTRC cleavage site within the cationic trypsinogen activation peptide predisposes to development of chronic pancreatitis (7,8), as does loss of function of CTRC itself (9-12); these observations reveal a specific regulatory role for the enzyme.
To reconcile these contrasting views of CTRC specificity, we consider how the protease selects its substrates in vivo. Steady state enzyme kinetics studies using individual substrates can reveal the thermodynamic stability of the Michaelis complex (Km) and the overall catalytic rate of the reaction (kcat), from which the specificity constant kcat/Km is derived. However, in the scenario in which multiple candidate substrates are in direct competition, kinetic rather than thermodynamic control may prevail (57,58). Protein-protein association is a multistep process, with different molecular forces influencing the rates of each step, and electrostatic interactions provide the dominant long-range force capable of accelerating the rate of initial molecular collision (58,59). The intense concentration of positive charge surrounding the CTRC active site, yet apparently not positioned for optimal formation of tight salt bridges in the final Michaelis complex, suggests that this funnel-like ring of surface charge may instead be optimized to kinetically favor the initial attraction of polyacidic substrate sequences, akin to a molecular tractor beam.
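The distinction between single-substrate measurements and direct competition can be illustrated with a brief numerical sketch. It assumes the minimal scheme E + S <-> ES -> E + P, for which kcat/Km = k_on·k_cat/(k_off + k_cat), and the standard steady-state result that two substrates competing for one enzyme are cleaved in the ratio of their specificity constants weighted by concentration. The function names and all rate constants below are hypothetical values chosen for illustration; none are measurements from this study.

```python
def specificity_constant(k_on, k_off, k_cat):
    """kcat/Km for the minimal scheme E + S <-> ES -> E + P.

    Km = (k_off + k_cat) / k_on, so kcat/Km = k_on * k_cat / (k_off + k_cat).
    When k_cat >> k_off this approaches k_on, the association rate constant.
    """
    return k_on * k_cat / (k_off + k_cat)


def cleavage_rate_ratio(spec_a, conc_a, spec_b, conc_b):
    """Ratio v_A / v_B for two substrates competing for the same enzyme."""
    return (spec_a * conc_a) / (spec_b * conc_b)


# Hypothetical rate constants (not from the paper): substrate A associates
# 10-fold faster than substrate B; dissociation and catalysis are identical.
spec_a = specificity_constant(k_on=1e7, k_off=1.0, k_cat=100.0)
spec_b = specificity_constant(k_on=1e6, k_off=1.0, k_cat=100.0)

# Equal (hypothetical) substrate concentrations of 1 micromolar each.
ratio = cleavage_rate_ratio(spec_a, 1e-6, spec_b, 1e-6)
print(f"v_A / v_B = {ratio:.1f}")
```

In this simplified picture, a substrate whose association is accelerated, for example by electrostatic steering, is consumed proportionally faster under competition even when the later catalytic steps are the same.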
Nonspecific long-range electrostatic attraction will be a dominant driving force as CTRC and its substrate first approach each other, at a stage when they may not yet be correctly rotationally oriented. The stabilization conferred by electrostatic attraction, combined with the viscosity of the surrounding solution, may also extend the lifespan of the low affinity transient complex, allowing the two proteins to rotate and sample multiple trajectories of approach in repeated microcollisions (58). Thus, an electrostatically complementary substrate will have enhanced probability of finding a productive binding conformation, achieving the Michaelis complex, and becoming proteolyzed. A similar influence of electrostatic potential on substrate specificity was observed previously for the hepatitis C virus NS3 protease using pre-steady state kinetics, where clusters of positively charged residues near the active site, complementary to clusters of negative charge on the substrate, were found to drive very rapid association (60). The influence of these long-range interactions on substrate selection in vivo may be underestimated in comparisons of overall substrate affinity, which depend upon the relative rates of formation and dissociation of the high affinity Michaelis complex. The later stages of binding that mark the progression from transient to Michaelis complex are likely to be slower, due to the radical alterations of local substrate conformation often required (56).
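To give a rough sense of how such long-range attraction scales with charge and distance, the following sketch evaluates a Debye-Hückel screened Coulomb energy between two point charges in water. This is a deliberately crude approximation and not the electrostatic calculation used in this work: it treats each charge cluster as a single point charge in a uniform dielectric, and the net charges (+5 and −4), the ionic strength (0.15 M), and the dielectric constant (78.5) are assumed illustrative values.

```python
import math

# Coulomb constant in kcal * Angstrom / (mol * e^2)
COULOMB_KCAL_ANGSTROM = 332.06


def debye_length_angstrom(ionic_strength_molar):
    """Approximate Debye length in water at 25 C for a 1:1 electrolyte."""
    return 3.04 / math.sqrt(ionic_strength_molar)


def screened_coulomb_kcal(q1, q2, r_angstrom,
                          ionic_strength_molar=0.15, dielectric=78.5):
    """Debye-Hueckel screened Coulomb energy (kcal/mol) of two point charges."""
    screening_length = debye_length_angstrom(ionic_strength_molar)
    bare = COULOMB_KCAL_ANGSTROM * q1 * q2 / (dielectric * r_angstrom)
    return bare * math.exp(-r_angstrom / screening_length)


# Hypothetical net charges: +5 for the enzyme's basic ring, -4 for a
# polyacidic substrate segment (illustrative only).
for distance in (5.0, 10.0, 20.0):
    energy = screened_coulomb_kcal(+5, -4, distance)
    print(f"r = {distance:4.1f} A  U = {energy:6.2f} kcal/mol")
```

The energy scales with the product of the net charges, which is why a cluster of acidic residues is expected to be steered far more effectively than a single charged side chain.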
Following formation of a roughly aligned encounter complex, the next stage of CTRC-substrate association is likely to be docking of the large hydrophobic P1 "anchor" residue within the S1 subsite (57). As with electrostatic charge, the kinetic importance of this interaction and its impact on specificity under conditions of direct substrate competition may be underestimated in studies with single purified substrates. For example, CTRC binding studies with tight binding inhibitors revealed a clear preference for P1 Leu over Met, both of which were represented much more highly than Phe or Tyr in a pool of inhibitors selected by phage display (17). By contrast, CTRC appeared insensitive to mutation of the P1 Leu in the trypsinogen Ca2+-binding loop to Met, Tyr, or Phe (16). We would speculate that in the case of the rigidly structured inhibitors, the overall rate of association reflects the rate of P1-S1 docking, which is fastest for Leu due to optimal complementarity with the S1 pocket. By contrast, with highly flexible or natively unstructured substrate sequences, the slower rate with which flanking residues conform to adjacent subsites through induced fit (57) may mask the importance of P1 specificity.
CTRC serves a protective function in the pancreas, as demonstrated by the association of CTRC loss of function with chronic pancreatitis (9-12), and by the finding that pathological activation of hereditary pancreatitis-causing cationic trypsinogen mutants is dependent on CTRC activity (4). We have found that CTRC can cleave cationic trypsinogen at alternative cleavage sites; proteolysis within the trypsinogen activation peptide leads to enhanced autoactivation of trypsin (5), whereas proteolysis within the Ca2+-binding loop leads to trypsinogen degradation (6). The protective function of CTRC would suggest that the dominant activity of CTRC activated in the pancreas must be the degradation of cationic trypsinogen, initiated by cleavage within the Ca2+-binding loop.
Our modeling results are consistent with the idea that, when sterically accessible, the Ca2+-binding loop is the preferred site of cleavage. This site fills a greater number of nonprimed subsites, burying a greater solvent-accessible surface area; it possesses the preferred Leu at the anchor P1 position; and it is calculated to have a thermodynamically more favorable binding energy. In the absence of bound Ca2+, the Ca2+-binding loop may be preferred by CTRC from the perspective of long-range electrostatic interactions as well, because it possesses four Glu residues within the primary sequence of the loop and an additional three exposed acidic residues on nearby loops. In comparison, the unstructured activation peptide possesses four tandem Asp residues, but the negative charge of these side chains would be offset by an adjacent Lys residue and by the N-terminal amine.
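The charge bookkeeping in this comparison can be sketched as a simple count of formal charges at neutral pH for the two modeled sequences. The sketch makes several simplifying assumptions: Asp and Glu are counted as −1, Lys and Arg as +1, His as neutral, and a free N-terminal amine as +1; pKa shifts and bound Ca2+ are ignored. The function name is hypothetical. Note that the modeled loop peptide HNIEVLEGNEQ contains only three of the four loop glutamates (Glu75 lies outside it), and the acidic residues on neighboring loops are not counted.

```python
ACIDIC = {"D", "E"}  # Asp, Glu: formal charge -1 at neutral pH
BASIC = {"K", "R"}   # Lys, Arg: formal charge +1 (His treated as neutral)


def formal_charge(sequence, free_n_terminus=False, free_c_terminus=False):
    """Sum formal side-chain charges, plus termini if they are free."""
    charge = 0
    for residue in sequence:
        if residue in ACIDIC:
            charge -= 1
        elif residue in BASIC:
            charge += 1
    if free_n_terminus:
        charge += 1  # protonated N-terminal amine
    if free_c_terminus:
        charge -= 1  # deprotonated C-terminal carboxylate
    return charge


# Activation peptide: at the N terminus of trypsinogen, so the amine is free.
activation_peptide = formal_charge("APFDDDDK", free_n_terminus=True)

# Ca2+-binding loop segment: internal to the protein, so no free termini.
loop_segment = formal_charge("HNIEVLEGNEQ")

print(activation_peptide)  # -2
print(loop_segment)        # -3
```

Under these assumptions, the four tandem Asp residues of the activation peptide are partly offset by Lys and the N-terminal amine, whereas the loop segment carries no compensating positive charge.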
It is more difficult to explain from a structural perspective why the p.A16V mutant trypsinogen activation peptide is improved as a CTRC substrate, to the extent of significantly diminishing the protective function of CTRC in the pancreas. However, the improvement may result from the impact of this substitution on the conformational ensemble sampled by the activation peptide and on its dynamics. The activation peptide is at the N terminus of trypsinogen and is unstructured in the crystal structures of bovine and rat trypsinogens (61-63). In this unstructured state, the Ala16-Pro17 (or Val16-Pro17) peptide bond exists as an equilibrium mixture of cis and trans isomers (64,65), and the bulkier Val16 will greatly reduce the rate of cis-trans isomerization (66). Perhaps more importantly, the steric bulk of the proline ring restricts the conformations available to the preceding residue (67), and the β-branched Val16 will encounter greater restrictions than the smaller Ala16 (68). To bind to the CTRC active site and undergo proteolysis, the Pro peptide bond must assume the trans configuration (69), and both residues must conform to the idealized binding mode of a canonical loop, in which the P3 residue main chain assumes an antiparallel β-strand conformation and the P2 residue main chain assumes a polyproline II conformation (43). We speculate that the steric restrictions and reduced mobility of the p.A16V mutant have the effect of locking the activation peptide into a more substrate-like conformation, or of biasing the conformational ensemble toward a more substrate-like conformation. The impact would be to minimize the unfavorable entropic contribution to the binding energy, rendering the p.A16V mutant trypsinogen activation peptide a more favorable substrate.
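The magnitude of such an entropic effect can be estimated with a simple conformational-selection argument: if only a fraction p of the unbound ensemble is binding-competent, selecting that fraction costs a free energy of −RT ln p. The sketch below applies this relation to two hypothetical populations; the values 0.01 and 0.10 are assumptions chosen purely for illustration and are not derived from the models or from experiment.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # kcal / (mol * K)


def conformational_penalty_kcal(fraction_competent, temperature_k=298.15):
    """Free energy cost (kcal/mol) of selecting binding-competent conformers.

    Uses the relation dG = -RT * ln(p), where p is the fraction of the
    unbound ensemble already in a substrate-like conformation.
    """
    if not 0.0 < fraction_competent <= 1.0:
        raise ValueError("fraction_competent must be in (0, 1]")
    return -GAS_CONSTANT_KCAL * temperature_k * math.log(fraction_competent)


# Hypothetical populations (not measured): wild-type peptide 1% competent,
# p.A16V peptide 10% competent.
penalty_wild_type = conformational_penalty_kcal(0.01)
penalty_mutant = conformational_penalty_kcal(0.10)
gain = penalty_wild_type - penalty_mutant

print(f"wild type: {penalty_wild_type:.2f} kcal/mol")
print(f"p.A16V:    {penalty_mutant:.2f} kcal/mol")
print(f"gain:      {gain:.2f} kcal/mol")
```

A 10-fold enrichment of substrate-like conformers would thus be worth roughly RT ln 10, comparable in size to the difference between the docking scores of the wild-type and mutant peptides, which do not include such a term.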
In conclusion, the molecular structure and analyses of CTRC presented here offer mechanistic insight into the striking selectivity of the enzyme for regulatory sites in trypsinogens and procarboxypeptidases. Long-range electrostatic interactions between enzyme and substrate, rather than specific charge pairing, underlie substrate discrimination. In addition, the structure of CTRC establishes a framework that may enable interpretation of the functional effects of other clinically significant trypsinogen mutations that impact CTRC substrate selectivity, as well as mutations within CTRC itself that modify risk for development of chronic pancreatitis.