Three-dimensional Models of Proteases Involved in Patterning of the Drosophila Embryo

Three-dimensional models of the catalytic domains of Nudel (Ndl), Gastrulation Defective (Gd), Snake (Snk), and Easter (Ea), and their complexes with substrate suggest a possible organization of the enzyme cascade controlling the dorsoventral fate of the fruit fly embryo. The models predict that Gd activates Snk, which in turn activates Ea. Gd can be activated either autoproteolytically or by Ndl. The three-dimensional models of each enzyme-substrate complex in the cascade rationalize existing mutagenesis data and the associated phenotypes. The models also predict unanticipated features like a Ca2+ binding site in Ea and a Na+ binding site in Ndl and Gd. These binding sites are likely to play a crucial role in vivo as suggested by mutant enzymes introduced into embryos as mRNAs. The mutations in Gd that eliminate Na+ binding cause an apparent increase in activity, whereas mutations in Ea that abrogate Ca2+ binding result in complete loss of activity. A mutation in Ea predicted to introduce Na+ binding results in apparently increased activity with ventralization of the embryo, an effect not observed with wild-type Ea mRNA.

secreted during oogenesis as inactive zymogens into a thin, fluid-filled perivitelline space that lies between the eggshell and the oocyte. Genetic and molecular studies suggest that these proteins act in a proteolytic cascade many hours later in the early embryo (1, 3-5). The cascade resembles in its general organization those controlling the innate immune response and blood coagulation (6). Ovulation of the egg in some way triggers the self-activation of Ndl into Ndl*. Gd can be activated either by Ndl* or by self-activation in the presence of Snk. Subsequently, Gd* activates diffusible Snk, and Snk* activates diffusible Ea. The result of this cascade is cleavage by Ea* of the diffusible dimeric nerve growth factor-like Spz (7). The processed Spz appears to function as a dimer to activate the transmembrane receptor Toll only on the embryo surface that will become ventralized through the Toll signaling pathway.
In contrast to the significant knowledge garnered from previous in vivo studies, quantitative information on the activity and specificity of the members of the cascade is still lacking because none has been characterized as a purified protein. Several questions remain regarding the activation of Ndl and Gd (1, 3-5) and the specificity of Gd* and Snk*. Answering these questions would benefit from knowledge of the structural organization of the enzymes involved in the cascade. However, none of the members of the cascade has been crystallized so far or even expressed successfully for detailed in vitro characterization. Hence, we felt that the construction of three-dimensional models of Ndl*, Gd*, Snk*, and Ea* in complex with their targets could fill a critical structure-function gap in the field, as recently shown for thrombin interactions with the platelet receptors (8) and fibrinogen (9). The value of these models stems from their timeliness and the new insight they offer for future mutagenesis studies, as illustrated in the present work by the effect on embryo polarization when putative cation binding sites of the examined proteases were mutated.

MATERIALS AND METHODS
Sequence Alignment and Comparative Modeling-The fly sequences came from the Berkeley strain in the Flybase (FB) and Swiss Protein (SP) databases: Ea (FBgn0000533, SP-P13582), Gd (FBgn0000808, SP-O62589), Ndl (FBgn0002926, SP-P98159), and Snk (FBgn0003450, SP-P05049). These sequences were aligned with 1800 serine proteases of the trypsin family. The latter were retrieved with the BLAST program, using trypsin homologues as seeds, from the non-redundant database at the National Center for Biotechnology Information (NCBI) (National Library of Medicine, National Institutes of Health, Bethesda, MD) and from the Flybase image at the NCBI, and all sequences were aligned together with ClustalX as described recently (10). Sequences were clustered into 100 groups from a neighbor-joining tree built with 500 bootstrap replicates in ClustalX. One hundred sequences were selected, one per cluster. Three-dimensional models of 70 structures were built by comparative modeling based on 12 of 20 crystal structures of serine proteases used in the sequence core. These models were used to refine the alignment of the 100-sequence core (10). The theoretical three-dimensional models of the activated protease domains Ndl* (central, Ndl1*-(1146-1385), and C-terminal, Ndl2*-(2017-2616)), Gd*-(256-528), Snk*-(191-430), and Ea*-(127-392) were constructed by comparative modeling using the program Modeller 4 (11). The following crystal structures of serine proteases downloaded from the Protein Data Bank (PDB) (12) were used as templates: trypsin (PDB code 1tld, 1.50 Å resolution); chymotrypsin (PDB code 4cha, 1.68 Å); tPA (PDB code 1rtf, 2.30 Å); plasmin (PDB code 1bui, 2.65 Å); plasma kallikrein (PDB code 2pka, 2.05 Å); thrombin (PDB code 1ppb, 1.92 Å); factor Xa (PDB code 1hcg, 2.20 Å); factor IXa (PDB code 1rfn, 2.80 Å); factor VIIa (PDB code 1dan, 2.00 Å); and activated protein C (PDB code 1aut, 2.80 Å). These proteases were chosen because they span the breadth of diversity of trypsin-related domains and regulatory cation binding sites.
Alignments extracted from the 100-sequence core were optimized manually during preliminary comparative modeling processes according to the distance violation from templates provided by the Modeller program output files.
Two hundred models were built for each protease with different frameshifts of the alignment in the poorly conserved loops and different seeds for the random number generator. Models were then checked and ranked for stereochemistry, structural topology features, and amino acid spatial distribution with Procheck (13), WhatCheck (14), and Verify3D (15). Conformers were pooled into the same cluster when the root mean square deviation of their backbones was <2.5 Å. The best conformer was kept for each cluster and optimized by molecular dynamics (fast annealing from 50 to 600 K in 3 ps, slow cooling from 600 to 50 K from 5 to 15 ps) and then minimized (200 steepest descent cycles and then 500 conjugate gradient cycles) using the program Discover (Accelrys, San Diego, CA). The following parameters were used in all of the procedures: force field CFF91, dielectric constant set at 2, and a cutoff for non-covalent interactions set at 14 Å during dynamics and at ∞ during minimizations. The highest ranked models were used for analysis. The in-and-out side chain distributions and the sequence-structure compatibility analyzed with Verify3D gave the following current score/expected score/threshold for the final three-dimensional models: Ndl1* (118/116/52), Gd* (130/130/58), Snk* (112/112/50), and Ea* (117/120/54). These values are comparable with those obtained with the crystal structures used as templates. The Ramachandran plot put the φ-ψ dihedral angle pairs per residue mostly in the favored and allowed regions as per the program Procheck: Ndl1* (92.4%), Gd* (86.0%), Snk* (94.2%), and Ea* (91.4%). The stereochemistry of the three-dimensional models satisfied Procheck and WhatCheck requirements found for crystals of proteases solved at a resolution lower than 2.5 Å. Solvent-accessible surfaces were displayed with Insight II (Accelrys) using Connolly's algorithm with a 1.4-Å probe radius.
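The pooling of conformers by backbone root mean square deviation can be sketched as follows. This is only an illustrative sketch: the text does not state which clustering algorithm was used, so a greedy "leader" scheme is assumed here, and the names `pool_conformers` and `rmsd` are hypothetical. Models are assumed to be supplied best-ranked first, and any pairwise deviation function can be plugged in.

```python
from typing import Callable, List, Sequence, TypeVar

Model = TypeVar("Model")


def pool_conformers(
    ranked_models: Sequence[Model],
    rmsd: Callable[[Model, Model], float],
    cutoff: float = 2.5,
) -> List[Model]:
    """Return one representative per cluster (assumed greedy leader clustering).

    ranked_models must be ordered best first. A model joins an existing
    cluster when its deviation from that cluster's representative is below
    the cutoff (2.5 A in the text); otherwise it founds a new cluster.
    Because models arrive best first, each representative is the best
    conformer of its cluster.
    """
    representatives: List[Model] = []
    for model in ranked_models:
        if all(rmsd(model, rep) >= cutoff for rep in representatives):
            representatives.append(model)
    return representatives


if __name__ == "__main__":
    # Toy example: "models" are points on a line, deviation is their distance.
    toy_models = [0.0, 1.0, 3.0, 3.5, 6.0]
    print(pool_conformers(toy_models, lambda a, b: abs(a - b)))
```

In practice `rmsd` would superimpose two backbones before measuring their deviation; the toy distance above only stands in for that step.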
We screened the three-dimensional models of catalytic domains for putative sites of cleavage by proteases. Only Arg, Lys, Ile, Leu, Phe, and Val residues not followed by Pro were selected, provided that their side chain was at least 50% exposed compared with the same residue in the tripeptide GXG. Every selected position i in the model was scored for the solvent accessibility of neighbor residues from i − 4 to i + 2. We defined a protease cleavage site when $[(r_{sc,i-4} + r_{sc,i-3} + r_{sc,i-2})/3 + r_i + (r_{sc,i+1} + r_{sc,i+2})/2]/3 > 0.50$, where $r_i$ is the percentage of overall solvent accessibility and $r_{sc,i}$ is the percentage of side chain solvent accessibility.
Ca2+ and Na+ binding sites were identified with the program VALE (16) using a grid of 0.1 Å, a water molecule radius of 1.4 Å, and a minimum threshold of 0.8 for the sum of oxygen-cation bond-strength contributions.
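The accessibility criterion for a cleavage site can be written out as a short function. This is a minimal sketch under stated assumptions: accessibilities are expressed as fractions between 0 and 1 (so that the 0.50 threshold applies directly), positions are 0-based list indices, and the function names are hypothetical. It follows the inequality as printed, which uses positions i − 4, i − 3, i − 2, i, i + 1, and i + 2.

```python
from typing import Sequence


def cleavage_site_score(
    r_sc: Sequence[float], r_total: Sequence[float], i: int
) -> float:
    """Accessibility score of candidate P1 position i (0-based index).

    r_sc[k]    : relative side chain accessibility of residue k (0..1)
    r_total[k] : relative overall accessibility of residue k (0..1)
    """
    if i < 4 or i + 2 >= len(r_sc):
        raise ValueError("position i needs 4 residues before and 2 after it")
    upstream = (r_sc[i - 4] + r_sc[i - 3] + r_sc[i - 2]) / 3
    downstream = (r_sc[i + 1] + r_sc[i + 2]) / 2
    return (upstream + r_total[i] + downstream) / 3


def is_accessible_cleavage_site(
    r_sc: Sequence[float], r_total: Sequence[float], i: int,
    threshold: float = 0.50,
) -> bool:
    """True when the score exceeds the 0.50 threshold used in the text."""
    return cleavage_site_score(r_sc, r_total, i) > threshold


if __name__ == "__main__":
    # Made-up accessibilities for a 7-residue stretch; position 4 is P1.
    side_chain = [0.9, 0.6, 0.3, 0.0, 0.5, 0.4, 0.8]
    overall = [0.8, 0.5, 0.3, 0.1, 0.7, 0.4, 0.7]
    print(cleavage_site_score(side_chain, overall, 4))
    print(is_accessible_cleavage_site(side_chain, overall, 4))
```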
Modeling of Enzyme-Substrate Complex-Three-dimensional models of protease-substrate complexes were built by comparative modeling in a thorough or quick mode. In the thorough mode, the protease-fragment complexes were threaded over thrombin-peptide crystal structures (8,9). We used the following templates: peptide-Ac-DFLAEGGGVR from PDB (1bbr and 1ucy); PPACK from PDB (1ppb); hirugen peptide-NG-DFEEIPEEYL from PDB (1hah); and peptide-LDPR from PDB (1nrs). Fifty three-dimensional models were built and ranked in terms of stereochemical quality and lowest potential binding energies. The accepted computer-generated models of protease-peptide substrate complexes had root mean square deviations of <1.5 Å for the protease backbone and for peptide residues <10 Å from protease residues. Models containing a ligand with a root mean square deviation of <2 Å from a higher ranked model were discarded. We selected the best ten models, extracted the ligand, docked it on the best free protease three-dimensional model as a starting point for a new modeling process, and optimized the best complex as described above for the free proteases. The thorough mode screened 1 of 50 three-dimensional models of enzyme-target peptide complexes and was used to screen every putative activation cleavage site of zymogens with every selected protease to assess activator-activated pairs.
The quick mode was used to screen possible cleavage sites all along the target sequences, in and out of the catalytic domain. We used only one of seven peptide three-dimensional models as a template for positions P1-P11, chosen according to the length of the loop between P1 and the closest hydrophobic side chain from P4 to P10 (seven possibilities). Five three-dimensional models of the complex were provided by Modeller runs, and then the best one was minimized as in the thorough mode.
Binding Free Energy Calculations-We examined the relative binding free energies of substrates on proteases by applying an empirical method on bound and free components. We used the potential energy of the system as an enthalpy term (force field CFF91), a conformational entropy term based on solvent-accessible surface area (SASA) of residues and a hydration free energy term based on finite difference approximation of the Poisson-Boltzmann equation.
The predicted free energy of association between receptor (R) and peptide (P), $\Delta G$, was calculated considering that free R and P have the same conformation as in the complex RP, from $\Delta G = \Delta G_{RP} - \Delta G_{R} - \Delta G_{P}$ with $\Delta G_{x} = \Delta G_{x,gas(\varepsilon=1)} - \Delta G_{x,hyd(\varepsilon=80)}$ and $x$ = RP, R, or P. The value of $\Delta G_{x,gas}$ was calculated from its enthalpic and entropic contributions expressed as $\Delta G_{x,gas} = \Delta H_{x,gas} - T\Delta S_{x,gas}$ with $\Delta H_{x,gas} = E_{x,vdw} + E_{x,coul}$ and $\Delta S_{x,gas} = \Delta S_{x,conf,gas} + \Delta S_{x,rt,gas} + \Delta S_{x,vib,gas}$. The enthalpy $\Delta H_{x,gas}$ is a function of the van der Waals ($E_{vdw}$) and coulombic ($E_{coul}$) components, whereas $\Delta S_{x,gas}$ is defined in terms of the rotational, configurational, and vibrational components. $E_{vdw}$ and $E_{coul}$ were computed from the CFF91 force field without cutoff with $\varepsilon = 2$.
The value of the conformational entropy $\Delta S_{x,conf,gas}$ was computed from the loss of side chain and main chain rotational freedom using the definition shown in Equation 1, $T\Delta S_{x,conf,gas} = T\Delta S_{x,confsc,gas} + T\Delta S_{x,confmc,gas}$, where $f_1(r_{sc,i}) = r_{sc,i}^{8}/(r_{sc,i}^{8} + 0.5)$ and $r_{sc,i}$ is the relative accessibility of the side chain of the $i$th residue, $r_{sc,i} = SASA_{sc,i}/SASA_{sc,i,GXG}$. $SASA_{sc,i,GXG}$ refers to the side chain solvent-accessible surface area of amino acid X in the tripeptide Gly-X-Gly. The empirical scales of side chain rotational freedom, $\Delta s_i$, were taken from Pickett and Sternberg (17). The function $f_1$ decreases the entropy values when the accessibility of the side chain is <50%. The loss of freedom of the main chain dihedral angles φ and ψ of residue i was roughly considered as a function of the steric hindrance around residues i − 1 to i + 1, which affects access to the allowed and core regions of the Ramachandran plot. $r_i$ is the smaller of $SASA_{mc,i}/SASA_{mc,i,GXG}$ (accessibility of the main chain only) and $SASA_{i}/SASA_{i,GXG}$ (overall accessibility), weighted by the attenuation function $f_2(x) = x^{7}/(x^{7} + 0.5)$. The accessible area fraction for residue i was fixed from the φ-ψ dihedral pair of X in the tripeptide Ala-X-Ala in the Ramachandran plot: 0.28 for X = Pro; 0.56 for X = Gly; and 0.40 for all other amino acids according to the allowed and core regions in Procheck plots (13).
The sizes of the different ligands are very similar, so the difference between ligands in the loss of rotational and translational entropy upon binding, $\Delta S_{rt,gas}$, is negligible. For 25-residue peptides associated with proteases modeled in the quick and thorough modes, $T\Delta S_{rt,gas}$ was ~18-20 kcal/mol at 298 K (18). $\Delta S_{vib,gas}$ was not computed in the absence of experimental data on the examined structures or their normal modes of vibration. The main modes of vibration are weakly affected for peptides of similar length targeting the same site of a protease in slightly different conformations. The values of $\Delta G_{x,hyd}$ were calculated from their electrostatic energy $G_e$ and non-polar energy of hydration $G_n$ as $\Delta G_{hyd} = \Delta G_{e} + \Delta G_{n}$. The electrostatic energies $\Delta G_e$ were computed using the finite difference Poisson-Boltzmann method implemented in the program DelPhi (19) and averaged over eight 1-Å resolution grids shifted by 0.5 Å in one, two, or three of the x, y, and z directions. The choice of the grid position and resolution affects the final values (the mean ± S.D. is 0.8-1.8 kcal/mol). $G_e$ values were computed for the transfer of the solute into water from $\varepsilon = 2.0$ to 80.0, $G_e(80.0, 2.0)$, and then into gas from $\varepsilon = 2.0$ to 1.0, $G_e(1.0, 2.0)$, as $\Delta G_e = G_e(80.0, 2.0) - G_e(1.0, 2.0)$. The radius was fixed at 1.4 Å for solvent molecules and 2 Å for ions. Ionic strength was set at 145 mM, and the protonation state and partial charge distribution were assigned by the program Biopolymer for a pH fixed at 7.0. The non-polar contribution $G_n$ was considered as linearly dependent on the solvent-accessible surface area of the molecule using a surface tension coefficient of 25 cal/mol/Å$^2$ (20), i.e. $\Delta G_n = 25\,\Delta SASA$.
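The assembly of the hydration term from its two components can be illustrated with a few lines of code. This is a sketch, not the procedure run with DelPhi: the electrostatic transfer energies are assumed to be supplied already computed (one pair per grid placement, in kcal/mol), the function names are hypothetical, and the surface tension coefficient of 25 cal/mol/Å$^2$ is converted to kcal/mol/Å$^2$.

```python
from statistics import mean
from typing import Sequence

# 25 cal/mol/A^2 expressed in kcal/mol/A^2
SURFACE_TENSION_KCAL = 0.025


def electrostatic_term(
    g_water: Sequence[float], g_gas: Sequence[float]
) -> float:
    """Delta G_e = G_e(80.0, 2.0) - G_e(1.0, 2.0), averaged over grid placements.

    g_water[k] and g_gas[k] are the two transfer energies (kcal/mol)
    obtained with the k-th grid placement (eight placements in the text).
    """
    if len(g_water) != len(g_gas) or not g_water:
        raise ValueError("need one water and one gas value per grid placement")
    return mean(w - g for w, g in zip(g_water, g_gas))


def nonpolar_term(delta_sasa: float) -> float:
    """Delta G_n in kcal/mol from a surface area change in A^2."""
    return SURFACE_TENSION_KCAL * delta_sasa


def hydration_free_energy(
    g_water: Sequence[float], g_gas: Sequence[float], delta_sasa: float
) -> float:
    """Delta G_hyd = Delta G_e + Delta G_n (kcal/mol)."""
    return electrostatic_term(g_water, g_gas) + nonpolar_term(delta_sasa)


if __name__ == "__main__":
    # Made-up energies for two grid placements and a buried area of 400 A^2.
    print(hydration_free_energy([-10.0, -12.0], [2.0, 4.0], -400.0))
```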
Based on the above definitions, the free energy for the receptor-peptide complex becomes $\Delta G = \Delta H_{gas} - T\Delta S_{rt,gas} - T\Delta S_{vib,gas} - T\Delta S_{conf,gas} + \Delta G_{hyd}$.
Some of the terms cancel if we compare the association of same-length peptides bound to the same protease. The approximation of the relative binding free energy is given by $\Delta\Delta G \sim \Delta\Delta H_{gas} - T\Delta\Delta S_{conf,gas} + \Delta\Delta G_{hyd}$.
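The relative binding free energy can be illustrated by combining the three retained terms for two peptides bound to the same protease. This is a minimal sketch with made-up numbers: the class and function names are hypothetical, all values are in kcal/mol, and the rotational-translational and vibrational entropy terms are omitted because they are taken to cancel between same-length peptides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BindingTerms:
    """Association terms for one peptide on a given protease (kcal/mol)."""

    dH_gas: float    # gas-phase enthalpy change, E_vdw + E_coul
    TdS_conf: float  # T times the conformational entropy change
    dG_hyd: float    # hydration free energy change

    def retained_free_energy(self) -> float:
        """Delta H_gas - T Delta S_conf,gas + Delta G_hyd."""
        return self.dH_gas - self.TdS_conf + self.dG_hyd


def relative_binding_free_energy(
    peptide: BindingTerms, reference: BindingTerms
) -> float:
    """Delta Delta G of a peptide relative to a reference peptide.

    Negative values mean the peptide is predicted to bind more
    favorably than the reference on the same protease.
    """
    return peptide.retained_free_energy() - reference.retained_free_energy()


if __name__ == "__main__":
    # Made-up values for two 25-residue peptides on the same protease.
    candidate = BindingTerms(dH_gas=-50.0, TdS_conf=-10.0, dG_hyd=20.0)
    reference = BindingTerms(dH_gas=-40.0, TdS_conf=-8.0, dG_hyd=15.0)
    print(relative_binding_free_energy(candidate, reference))
```

Because only differences are meaningful here, the result should not be used to compare a peptide across two different proteases.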
This approach does not allow comparison of the binding of a peptide to two different proteases unless the vibrational entropy variation upon binding is comparable.
The ΔΔG values refer to selected conformations and are affected by the choice of the "best model" according to the global potential energy of the system and the goodness of its stereochemistry. The deviation (mean ± S.D.) is ~2.4 kcal/mol among the ΔΔG values of the 10 best models of Snk* when it is bound to the activation site of Ea. Lower deviations were estimated as 1.2 kcal/mol for Ea* with Spz peptide, 1.7 kcal/mol for Ndl1* with Gd peptide, and 1.4 kcal/mol for Gd* with Snk peptide.
mRNA Preparation of Mutated gd and ea-The plasmid pNB-GD2 containing a full-length gd cDNA was obtained from J. L. Marsh (University of California, Irvine, CA) (21). The plasmid pGEM7Zf(+) containing a full-length ea cDNA was obtained from K. V. Anderson (Sloan-Kettering Institute) (22). Mutations were introduced using the QuikChange Exchange kit (Stratagene). We mutated Tyr-225 to Ala and Pro in the putative Na+ binding site of Gd. We also mutated separately Pro-225 to Ile, Ser, and Tyr to create the putative Na+ binding site of Ea, and we mutated Glu-70 to Ala and Lys in the putative Ca2+ binding site of Ea. mRNAs encoding wild-type and mutant Gd and Ea were transcribed from plasmids by using the SP6 mMessage mMachine kit (Ambion, Austin, TX) and were dissolved in water at concentrations ranging from 0.06 to 1 mg/ml as estimated by UV absorbance (4).
Fly Stocks and Embryo Injection-The mutations and allelic combinations used here were described previously: gd7/gd7 (23) and ea4/ea5022rx1 (24). Embryos (0.5-1.5 h post-fertilization) were injected centrally at 40-60% egg length after the removal of the outer eggshell layer according to a standard procedure (2). Injected embryos were visually examined during gastrulation, and their cuticles were prepared for examination as described previously (25,26). The injection of mRNAs encoding wild-type Gd or Ea was used as a positive control (4).

Identification of Cleavage Sites by Alignment of Primary Sequences-Zymogens Gd (528 amino acids), Snk (430 amino acids), and Ea (392 amino acids) are organized in three domains: an N-terminal signal that is cleaved during protein secretion and a zymogen that gives rise to A (N-terminal) and catalytic B (C-terminal) chains (Fig. 1). The A and B chains remain covalently linked through disulfide bridges after proteolytic activation. The topology of Ndl is more complex and unusual because it carries two S1a protease domains. The first catalytic domain (Ndl1*-(1145-1385)) is central, and the second (Ndl2*-(2017-2616)) is C-terminal. Eleven low density lipoprotein (LDL) receptor-binding repeats intercalate the two protease domains (27). Four LDL receptor repeats are inserted in the second protease catalytic domain.
Ea and Snk show 25% identity overall and 33% within the B chain, and they feature the same potential disulfide bridges (Fig. 1). Alignment with other proteases suggests that only one disulfide bridge, 1-122 in the chymotrypsin numbering, 2 (Fig. 2). Insertions or deletions relative to chymotrypsin occur in loops at the protein surface and outside the active site. Although the identity between Ndl2* and other trypsin-like proteases is low, it spreads uniformly among all domains and especially at the level of the two β-barrels. Ndl2* features an unusual catalytic triad, where the nucleophile Ser-195 is coupled to Glu-102 and Ser-57 that replace the canonical Asp-102 and His-57. There is no other example of a Ser-Glu-Ser catalytic triad among 1800 other serine proteases in the NCBI (www.ncbi.nlm.nih.gov) and MEROPS (www.merops.co.uk) databases, which suggests that Ndl2* may not participate in the cascade as an active protease. The LDL domain is inserted away from the potential active site in the 186-loop, where insertions of various lengths also exist in thrombin and tissue plasminogen activator. The presence of Asp-189 in the S1 (31) pocket shows that specificity is unambiguously trypsin-like for Ndl1* and Ea*, whereas Ser-189 suggests a chymotrypsin-like specificity for Gd*. The presence of Gly-189 in Snk* makes the prediction of specificity ambiguous. The shape and volume of the S1 pocket in Snk* could accommodate a variety of side chains. Leukocyte elastase, which carries Gly-189 and cleaves after Val in P1, shows a 23% identity with Snk* in the catalytic B chain. The structure of elastase (PDB code 1ppg) complexed with the tetrapeptide AAPV (32) shows that Val-190 defines the S1 specificity toward hydrophobic P1 residues. In the model of Snk*, the unusual His-190 (His-371) points out of the S1 pocket and interacts with Asp-194 (Asp-375), thereby leaving the S1 pocket free to interact with a variety of side chains besides hydrophobic residues.
Figure legend (fragment): The secondary target sites were predicted from high binding free energy scores and are >50% accessible based on the three-dimensional models. Disulfide bonds predicted by sequence alignment and from three-dimensional models are shown in black. Also shown in the S1a domain in orange are residues of the catalytic triad (red circles), residues at the bottom of the S1 pocket (green circles), and residues involved in Ca2+ or Na+ binding (black circle). The Ndl1 and Ndl2 protease domains of Ndl are shown separately. The topologies of bovine β-trypsin (Try) and bovine α-chymotrypsin (Chy) are reported for comparison. Low density lipoprotein (LDL) marks the position of low density lipoprotein receptor-like domain repeats (40 residues, Cys 6-7) with the number of repeats shown in parentheses. Ea and Snk share a disulfide-knot or "Clip" motif (1).
Preferred Cleavage Sites from Enzyme-Substrate Three-dimensional Models-We predicted the position of potential protease targets in every protease sequence based on the binding energy of 25-residue peptides to each protease active site (Table I) or the accessibility of the peptide within the catalytic domain (Fig. 1). We computed the theoretical binding energy of all of the 25-residue fragments with Arg or Lys at position 11 derived from the sequence of Ndl1 (37 fragments), Ndl2 (78 fragments), Gd (60 fragments), Snk (42 fragments), Ea (38 fragments), and Spz (36 fragments) after docking them using the quick mode on Ndl1*, Snk*, and Ea*. The same procedure was used for fragments containing Leu, Ile, Val, or Phe at position 11 derived from the sequence of Ndl1 (88 fragments), Gd (142 fragments), Snk (105 fragments), Ea (92 fragments), and Spz (74 fragments) and docked on Gd* and Snk*. All of the fragments containing Pro at position 12 (~3% of total) and cleavage sites inaccessible in protein three-dimensional models (score <0.5) were discarded. We used an empirical relative binding free energy ΔΔG as a criterion to select optimal cleavage sites. Values were computed for all possible cross-activations to test protease-target specificity (Table I). The best cleavage sites for Ndl1* are Arg-1144 in Ndl and Lys-211 in Gd. The best cleavage site for Gd* is Leu-183 in Snk. The best cleavage site for Snk* is Arg-127 in Ea. The best cleavage site for Ea* is Arg-220 in Spz (Fig. 1, Table I). Interestingly, the other fragments are predicted to bind with high scores and can be regarded as secondary target sites. Fig. 1 reports potential cleavage sites within catalytic domains sorted by trypsin-like cleavage sites (blue, Arg or Lys in P1) or chymotrypsin-like cleavage sites (pink, Leu, Val, Ile, and Phe in P1) when the accessibility score of the corresponding site is >0.50. For example, Ndl1* can cleave Ndl at Arg-1385 and can separate Ndl2 from Ndl1.
Ndl1* can also cleave at Arg-1094, yielding a fragment that may correspond to the non-diffusible 38-kDa fragment reported by LeMosy et al. (30). Ea* and Snk* are predicted to cleave the prodomain of Gd at Arg-187 and can generate the 50-kDa fragment described by LeMosy et al. (3). Furthermore, Ea* can cleave the prodomain of Snk at Arg-100, thereby producing the 50-kDa fragment observed when Ea* and Snk are coexpressed (3).
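The enumeration of candidate 25-residue fragments described above (a chosen P1 residue at position 11, fragments with Pro at position 12 discarded) can be sketched as a sliding window. This is an illustrative sketch only: the function name and the `first_residue_number` argument are hypothetical, and the docking and accessibility filters that follow in the actual procedure are not reproduced.

```python
from typing import Iterator, Tuple


def candidate_fragments(
    sequence: str,
    p1_residues: str = "RK",
    first_residue_number: int = 1,
    length: int = 25,
    p1_position: int = 11,
) -> Iterator[Tuple[int, str]]:
    """Yield (P1 residue number, fragment) for every qualifying window.

    A window qualifies when the residue at p1_position (1-based within the
    window) belongs to p1_residues and the next residue (P1') is not Pro.
    first_residue_number is the sequence number of sequence[0].
    """
    for start in range(len(sequence) - length + 1):
        fragment = sequence[start:start + length]
        p1 = fragment[p1_position - 1]
        p1_prime = fragment[p1_position]
        if p1 in p1_residues and p1_prime != "P":
            yield first_residue_number + start + p1_position - 1, fragment


if __name__ == "__main__":
    # Ndl fragment 1134-1158 quoted in the text; P1 is Arg-1144.
    ndl = "SDSKEIVGDGRIVGGSHTSALQWPF"
    for number, fragment in candidate_fragments(ndl, "RK", 1134):
        print(number, fragment)
```

Passing "LIVF" as `p1_residues` gives the chymotrypsin-like candidates used for docking on Gd* and Snk*.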
The four proteases in complex with their primary targets were examined further using the thorough mode. Relevant contacts of the best targets with each enzyme are shown in Fig. 2, and the structure of the peptide and the epitope of recognition are displayed in Fig. 3. The Ndl fragment SDSKEIVGDGR↓IVGGSHTSALQWPF (residues 1134-1158) and the two Gd fragments ESLHVAIGEPK↓SSDGITSPVFVDDD (residues 201-225; cleavage 30 residues upstream of the standard activation site) and FMTQIQLEHIR↓KLSFIPDKKSSLLL (residues 126-150; C2/factor B-type cleavage site 83 residues upstream of the standard activation site) were docked on the active site of Ndl1*. Several single mutations have been made in ndl, and their associated phenotypes have been reported previously (33). We introduced these mutations in the three-dimensional model of Ndl1* and optimized the structure by 500 cycles of conjugate gradient minimization on the residues within 6 Å of the site of mutation. Mutations are displayed in Fig. 3 where residues are visible at the protein surface in its front view. The mutant C1114S (C1S) loses the disulfide bond with Cys-1252 (Cys-122) that connects the A and B chains, but it is processed and secreted normally and retains partial activity. The A chain is usually inconsequential to function in serine proteases (34). The mutants G1280S (G140S) and G1282R (G142R) affect the position of the highly conserved Trp-1281 (Trp-141) that lines part of the S1′ specificity site. Either mutation may change the backbone and side chain orientation of Trp-141 with resulting poor substrate binding and loss of protease activity as seen experimentally (33). The same argument holds for the mutant V1278M (V138M) whose bulkier side chain is expected to perturb Trp-141. The mutant G1334R (G197R) has a protrusion into the S2′ pocket that may cause steric hindrance with the substrate backbone around the scissile bond, thereby explaining the loss of activity seen experimentally (33).
The mutant H1355L (H215L) perturbs the hydrophobic core next to the active site (Fig. 3A). The His residue at this position replaces the highly conserved Trp seen in almost all of the serine proteases. The hydrophobic residue Ile-207 of Gd or Val-1140 of Ndl can make contacts with Leu-1355 (Leu-215). However, non-conservative replacements of residue 215 in thrombin cause a drastic drop in activity (35), which may explain the total loss of activity of the Ndl1* mutant (33). The mutant A1360T (A221T) perturbs the backbone of the 220-loop responsible for Na+ binding (36). Mutations at the same position in thrombin result in the loss of Na+ binding and decreased protease activity (36), which again explains the results seen for the Ndl1* mutant (33). The Gd* active site is characterized by very hydrophobic properties of both primed and unprimed subsites (Fig. 2B). Residue Ile-468 (Ile-194) replaces the canonical Asp in the S1 pocket and contributes to the enhanced hydrophobicity of this site together with Ile-463 (Ile-190) and the unusual Ile-511 (Ile-226) that replaces a highly conserved Gly. This largely hydrophobic architecture of the S1 pocket is unusual in serine proteases and probably compensates for the unusual Ala-488 (Ala-215) that replaces the highly conserved Trp at this position. Residue Val-220 of Gd is potentially a good cleavage site for Gd*. To test the possibility of Gd activation by Gd*, the Gd fragment PKSSDGITSPV↓FVDDDEDDVLEHQF (residues 210-234) was docked onto the active site of Gd*. A potential activation site with chymotrypsin-like specificity is TQIQLEHIRKL↓SFIPDKKSSLLLDP (residues 128-152), located near the C2/factor B-type cleavage site (Fig. 1). Gd* carries two insertions in the 60- and 149-loops relative to chymotrypsin, a feature also observed in thrombin (34) where the insertions contribute to the narrow substrate specificity. The 60-loop in Gd* covers the substrate residues at P1 and P1′.
The 149-loop is quite flexible, judging from the various conformations obtained in the fifty best models, and interacts loosely with substrate residues at P3′-P8′ (Fig. 3B). Mutant alleles of gd have been identified and grouped in three complementation groups (37). The Ea fragment LPGQCGNILSNR↓IYGGMKTKIDEFPW (residues 116-141) was docked on Snk* (contacts detailed in Fig. 2C and structure detailed in Fig. 3C). Snk* carries Gly-370 (Gly-189) in the S1 pocket, consistent with either trypsin or chymotrypsin activity. Residue His-371 (His-190) makes the S1 pocket more prone to interact with hydrophilic rather than hydrophobic side chains. Ea residue Arg-127 fills the S1 pocket of Snk*. The Spz fragment NDLQPTDVSSR↓VGGSDERFLCRSIR (residues 210-234; FBgn0003495, SP-48607) was docked onto the Ea* active site (contacts detailed in Fig. 2D and structure detailed in Fig. 3D) with Arg-220 at P1 bound to Asp-332 (Asp-189). Several naturally occurring mutations of Ea* have been identified that lead to dominant or recessive phenotypes of dorsoventral differentiation (24). Dominant alleles are A325V (A183V), P373S (P225S), R335C (R192C), G336S (G193S), G371R (G223R), G283S (G142S), V360M (V213M), and G131E (G19E). Recessive alleles are G339R (G196R), G363E (G216E), S172L (S56L), and C324Y (C182Y). We introduced these mutations in Ea* and optimized the three-dimensional models by 500 cycles of conjugate gradient minimization of residues within 6 Å of the site of mutation. The mutant A325V (A183V) carries a bigger side chain in a densely packed region. We colored the positions of the mutated residues over the Ea* surface in Fig. 3D. The mutant P373S (P225S) perturbs the backbone of the 220-loop that is crucial for Na+ binding and substrate recognition (36,38). The substitution may promote weak Na+ binding and enhanced catalytic activity, thereby explaining the gain-of-function phenotype observed experimentally (24). The mutant R335C (R192C) lacks one ion-pair interaction with the bound Spz.
The mutant G336S (G193S) introduces a side chain into the S1′ pocket that may lead to an incorrect orientation of the scissile bond and loss of catalytic activity. The mutant G371R (G223R) perturbs the 220-loop backbone. The mutant G283S (G142S) may displace Trp-141 nearby by constraining the backbone of the S′ sites. The mutant V360M (V213M) reduces the accessibility of the S1 pocket and impairs the binding of substrates carrying Arg or Lys at P1. Other Ea* mutants that are expected to compromise substrate binding are G131E (G19E) that places an acidic side chain in a hydrophobic environment, G339R (G196R) and G363E (G216E) that occlude the S1 pocket, S172L (S56L) that places a bulkier side chain in a densely packed region, and C324Y (C182Y) that removes the disulfide bond and stability of the hydrophobic core next to the active site.
Putative Cation Binding Sites and Their Alteration in Vivo-Many vertebrate serine proteases contain functional cation binding sites that allosterically regulate activity and stability of the enzymes (39, 40), but such sites have not previously been described in invertebrate serine proteases. The inspection of the primary sequence and screening of the dorsoventral protease three-dimensional models with the program VALE (16) identified binding sites for Na+ in Ndl1* and Gd* and for Ca2+ in Ea*, each corresponding to the positions of similar sites in the vertebrate proteases. The Na+ binding sites of Ndl1* (Fig. 2A) and Gd* (Fig. 2B) have an architecture similar to that described for thrombin (36,38). Two carbonyl O atoms from residues 221 and 224 contribute together with four buried water molecules to the octahedral coordination of the cation: Arg-1361 (Arg-221) and Glu-1364 (Glu-224) for Ndl1*, and Cys-506 (Cys-221) and Gln-509 (Gln-224) for Gd*. The Ca2+ binding site of Ea* (Fig. 2D) is similar to that of trypsin (40) with two carboxylic side chains from Glu-193 (Glu-70) and Glu-203 (Glu-80) contributing to the octahedral coordination. In Ea*, three additional carbonyl oxygens from the backbones of Thr-196 (Thr-73) and Thr-198 (Thr-75) and the side chain of Asn-199 (Asn-76) contribute to Ca2+ binding. A water molecule could provide the sixth oxygen in the coordination shell.
To determine whether these putative cation binding sites influence protease function in vivo, we mutagenized key residues in Gd and in Ea and then compared the ability of wild-type and mutant proteases to rescue embryos lacking maternal function for the respective proteins (Table II; Figs. 4 and 5). In previous studies using the same wild-type mRNAs and recipient embryos, Gd has been shown to act in a dose-dependent manner to cause an abnormal expansion of ventral pattern elements ("ventralization"), whereas Ea rescues to wild type but cannot ventralize the embryo (4,24). Similar studies could not be undertaken for the putative Na+ binding site in Ndl, as wild-type Ndl is unable to rescue in embryo RNA microinjection assays, presumably because of the complex activation mechanism and early action of this protease (E. LeMosy, unpublished data). For Gd, we injected synthetic mRNAs encoding wild-type Gd protein or mutations Y510A (Y225A) and Y510P (Y225P), expecting to disrupt Na+ binding (38,39). At high doses (0.6 mg/ml), all three RNAs gave a strong ventralization phenotype in which excess Gd activity causes too much signaling through the Toll pathway (4), indicating that the mutant proteins are active. At a 10-fold lower dose (0.06 mg/ml), the wild-type RNA provides a broad range of phenotypes from

TABLE II
Rescue of ea- or gd-null embryos by wild-type and mutant mRNA injections
Ea and Gd mRNAs were injected into the corresponding null embryos and scored at gastrulation (criteria described in Fig. 4 legend). This scoring could be correlated with the cuticle phenotypes seen at the end of embryogenesis (Fig. 5).

(41,42). These mutants resulted in a complete loss of Ea activity equivalent to the injection of the S337A (S195A) mutant lacking the catalytic serine, indicating that the Glu-193 (Glu-70) residue and possibly Ca2+ binding are critical for Ea function.
FIG. 4. Representative gastrulation patterns of injected embryos. Recipient embryos in A-F are of the genotype ea4/ea5022rx1, whereas those in G and H are gd7/gd7. All of the embryos are oriented with their anterior ends to the left and dorsal surface up. Injection of Ea wild type (WT) (A) results in a wild-type gastrulation pattern in which cells on the ventral side invaginate to form the ventral furrow, posterior cells migrate anteriorly on the dorsal side (arrowhead), and a headfold is visible only faintly along the lateral surface (asterisk). The Ea S195A (B), Ea E70A (C), and Ea E70K (data not shown) mutants are completely inactive in dorsoventral patterning, giving a complete dorsalization in which there is no ventral furrow or headfold, cells at the posterior do not migrate, and multiple symmetric folds appear along the anterior-posterior axis of the embryo. The Ea P225I mutant gives a weak partial rescue (D) in which the posterior cells migrate forward, but no ventral furrow forms and there are still multiple infoldings along the embryo anterior-posterior axis. Ea P225S (E) and Ea P225Y (F) mutants cause an intermediate ventralization of the embryo with the headfold prominent on the dorsal side (asterisk) and some anteriorward displacement of posterior cells along the dorsal side. Ea P225S typically gave more anteriorward movement of these cells than did Ea P225Y, consistent with a milder ventralization (supported by analysis of cuticle elements in Fig. 5), but these could not be readily distinguished in scoring gastrulation so they were grouped together in Table II. A more extreme ventralization could be seen with the injection of Gd wild type (WT) (G), Gd Y225P (H), or Gd Y225A (data not shown) in which there is a very prominent ventral furrow, no anterior displacement of posterior cells, and only a small headfold visible on the dorsal side of the embryo.

FIG. 5. Representative cuticle patterns of injected embryos. Recipient embryos in A-E are of the genotype ea4/ea5022rx1, whereas those in F and G are gd7/gd7. When evident, embryos are oriented approximately with their anterior ends to the left and dorsal surface up. The external cuticle develops late in embryogenesis, but its elements reflect the earlier patterning along the dorsoventral axis. A, injection of Ea wild type (WT) results in a hatching embryo with bare dorsal cuticle, laterally derived filzkörper (arrowhead) and head skeleton (asterisk), and rows of ventral denticles (inverted V). B, an uninjected embryo does not develop either lateral or ventral structures and is considered completely dorsalized. The Ea S195A, Ea E70A, and Ea E70K mutants showed a similar phenotype, but their weak cuticles did not usually survive additional processing required for injected embryos. C, Ea P225S typically produced mildly ventralized embryos that retained filzkörper and a partial head skeleton, but their ventral denticles extend more dorsally than WT (this embryo was injected with 0.5 mg/ml mRNA). D, Ea P225I produced moderately dorsalized embryos with the rescue of filzkörper but rarely head skeleton and never ventral denticles. E, Ea P225Y typically produced moderately ventralized embryos lacking lateral filzkörper or head skeletons and having moderate expansion of disorganized ventral denticles around the embryo circumference (this embryo was injected with 0.5 mg/ml mRNA). F, a moderately to strongly ventralized embryo injected with Gd Y225P mRNA (0.06 mg/ml), showing strong expansion of ventral denticles. At high doses (0.6 mg/ml), Gd WT (G) and the Gd Tyr-225 mutants often gave rise to embryos with completely circumferential, disorganized ventral denticles.

Engineering a Na+ Binding Site in Ea-Most invertebrate proteases contain Pro-225, which is incompatible with Na+ binding, rather than Tyr-225 or Phe-225, which are compatible with such binding (39). This usage dichotomy at residue 225 has profound structural (38,39) and evolutionary (6) implications. One exciting possibility raised by the role of residue 225 in serine proteases (39) is the rational engineering of Na+ binding with the P225Y substitution. We surmised that Ea would be an excellent candidate to engineer a Na+ site, because it had already been shown that the P373S (P225S) substitution by an EMS mutation resulted in increased activity and ventralized phenotype (24). Ser is an intermediate in the genetic code between Pro and Tyr, and a saturation mutagenesis study has shown that Ser is also intermediate in catalytic activity between Pro and Tyr at position 225 in thrombin (38).
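The statement that Ser is an intermediate between Pro and Tyr in the genetic code can be checked directly against the standard codon table. The sketch below is illustrative only: the helper names `hamming`, `min_substitutions`, and `single_step_paths` are hypothetical, and the codon lists are those of the standard genetic code. It counts the minimum number of nucleotide substitutions between residues and lists the codon routes that pass from Pro to Tyr through Ser in single-nucleotide steps.

```python
from itertools import product

# Standard genetic code codons (RNA alphabet) for the three residues discussed.
CODONS = {
    "Pro": ["CCU", "CCC", "CCA", "CCG"],
    "Ser": ["UCU", "UCC", "UCA", "UCG", "AGU", "AGC"],
    "Tyr": ["UAU", "UAC"],
}


def hamming(a, b):
    """Number of positions at which two codons differ."""
    return sum(x != y for x, y in zip(a, b))


def min_substitutions(res_a, res_b):
    """Fewest nucleotide changes needed to convert any codon of res_a into one of res_b."""
    return min(hamming(a, b) for a, b in product(CODONS[res_a], CODONS[res_b]))


def single_step_paths(start, middle, end):
    """Codon triples in which each consecutive pair differs by exactly one nucleotide."""
    return [
        (s, m, e)
        for s in CODONS[start]
        for m in CODONS[middle]
        for e in CODONS[end]
        if hamming(s, m) == 1 and hamming(m, e) == 1
    ]


print(min_substitutions("Pro", "Tyr"))         # 2
print(min_substitutions("Pro", "Ser"))         # 1
print(min_substitutions("Ser", "Tyr"))         # 1
print(single_step_paths("Pro", "Ser", "Tyr"))  # [('CCU', 'UCU', 'UAU'), ('CCC', 'UCC', 'UAC')]
```

Pro cannot be converted to Tyr by a single nucleotide change, whereas Pro to Ser and Ser to Tyr each require only one, which is the sense in which Ser lies between them.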
We compared the activities of Ea proteins containing the mutations P373Y (P225Y) and P373S (P225S) with those of wild-type Ea. We found that P373Y (P225Y), like P373S (P225S), was capable of ventralizing easter-mutant embryos, something that the wild-type Ea protein cannot do even when injected at high levels (Table II) (24). The P373Y (P225Y) mutant had significantly stronger ventralizing capacity than did P373S (P225S) and only rarely resulted in wild-type gastrulation or embryo hatching, even when titrated to a level (0.2 mg/ml) at which incomplete rescue was commonly seen together with weak ventralization. This behavior differs from that of previously described ventralizing easter alleles (24), which can be titrated to give a significant level of wild-type rescue, and suggests that this mutant enzyme may be less influenced by normal regulatory controls (43). A P373I (P225I) substitution resulted in a significant loss of activity, with only weak partial rescue seen. The corresponding mutation in thrombin drastically reduced catalytic activity and did not support Na+ binding, as judged from the crystal structure of the mutant thrombin (38).

DISCUSSION

The primary cleavage sites of Ndl, Snk, Ea, and Spz have been proposed previously (3,5,7,30) and are confirmed in the present study. Ndl is secreted into the perivitelline space and is required for the ventralization process upstream of Snk activation (30). The activation of Gd remains controversial, but our models propose that Ndl1* has trypsin-like activity and may bind Gd at its activation site Lys-211 better than Snk, Ea, or Spz. We believe that the second protease domain of Ndl*, Ndl2*, is inactive and plays no role in the cascade. Therefore, although Gd can be activated at Val-220 by Gd*, Ndl1* should be retained as the more likely activator of Gd on the basis of predicted binding energies (3).
Gd autoactivation might be detected when Gd is overexpressed either in embryos (4) or in cell culture (3,5), as suggested by the predicted low affinity of Gd for its own activation site, but this leaky autoactivation might not be as effective at physiologic expression levels of Gd. Hence, the proposed three-dimensional models of Ndl1*, Gd*, Snk*, and Ea* are consistent with the overall organization of the enzymatic cascade defining dorsoventral polarity in the fruit fly as recently described from cell culture and embryo studies (3,5). The cascade is initiated by Gd activation, more probably by Ndl1* as suggested in vivo (4) and alternatively, more weakly, by Gd or Gd* as also proposed previously in vivo (5). Gd* then activates Snk, and Snk* activates Ea. Ea* then processes Spz for signaling via the receptor Toll. The three-dimensional models are also consistent with previous mutagenesis studies of Ndl1* (33) and Ea* (24) and offer a structural explanation of the observed mutant phenotypes.
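The order of the cascade summarized above can be written as a small directed graph. The following is a minimal sketch, not part of the modeling work: the dictionary `ACTIVATES` and the function `activation_path` are hypothetical names, zymogen and activated forms are collapsed into one node per protein, and the weaker Gd autoactivation route is represented as a self-edge.

```python
from collections import deque

# Edges read "X activates (or signals to) Y", as stated in the text.
ACTIVATES = {
    "Ndl": ["Gd"],
    "Gd": ["Gd", "Snk"],  # self-edge: the proposed weaker autoactivation of Gd
    "Snk": ["Ea"],
    "Ea": ["Spz"],
    "Spz": ["Toll"],
    "Toll": [],
}


def activation_path(start, goal):
    """Breadth-first search for the shortest chain of steps from start to goal."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in ACTIVATES.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no route: goal is not downstream of start


print(activation_path("Ndl", "Toll"))  # ['Ndl', 'Gd', 'Snk', 'Ea', 'Spz', 'Toll']
print(activation_path("Ea", "Gd"))     # None
```

The graph encodes only the direction of activation. It says nothing about relative rates or binding energies, which are what the models use to favor Ndl1* over Gd* as the activator of Gd.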
Notably, the three-dimensional models reveal new structural features that can be exploited in future in vivo studies. Of particular importance is the unanticipated identification of a Ca2+ binding site in Ea* and a Na+ binding site in Gd* and Ndl1*. The binding of Ca2+ in trypsin stabilizes the fold of the protease domain (40), and Na+ binding to thrombin and many other serine proteases increases the catalytic activity toward synthetic and natural substrates (36,38,39). Based on the results presented here, it is highly likely that Ca2+ binding to Ea* plays a key role in the function of this enzyme in vivo.
Likewise, Na+ binding to Gd* and possibly Ndl1* has functional significance. Interestingly, Na+ binding to Gd* may actually result in the inhibition of the catalytic activity of the enzyme, in contrast to the effect observed in all other Na+-dependent allosteric serine proteases studied to date (39). The current knowledge on the role of residue 225 in serine proteases (39) predicts that Na+ binding can be introduced in proteases carrying Pro-225 using the Pro → Tyr replacement. However, the P225Y substitution in tissue plasminogen activator is not sufficient to introduce Na+ binding and actually results in reduced catalytic activity (44). The introduction of Na+ binding in this protease requires substitution of a large number of residues in addition to Pro-225 (Footnote 4). Therefore, it is remarkable that the P225Y substitution in Ea* has such a profound effect on its catalytic activity, consistent with a gain of function that likely results from Na+ binding. This observation motivates the analysis of this protease in terms of kinetic and direct structural studies and offers new and important insights into ongoing efforts to engineer Na+ binding and enhanced catalytic activity in serine proteases of medical and biotechnological relevance.