Identification of a Novel Family of Proteins in Snake Venoms

The three-dimensional structure of nawaprin has been determined by nuclear magnetic resonance spectroscopy. This 51-amino acid residue peptide was isolated from the venom of the spitting cobra, Naja nigricollis, and is the first member of a new family of snake venom proteins referred to as waprins. Nawaprin is relatively flat and disc-like in shape, characterized by a spiral backbone configuration that forms outer and inner circular segments. The two circular segments are held together by four disulfide bonds, three of which are clustered at the base of the molecule. The inner segment contains a short antiparallel β-sheet, whereas the outer segment is devoid of secondary structures except for a small turn or 3₁₀ helix. The structure of nawaprin is very similar to elafin, a human leukocyte elastase-specific inhibitor. Although substantial parts of the nawaprin molecule are well defined, the tips of the outer and inner circular segments, which are hypothesized to be critical for binding interactions, are apparently disordered, similar to that found in elafin. The amino acid residues in these important regions in nawaprin are different from those in elafin, suggesting that nawaprin is not an elastase-specific inhibitor and therefore has a different function in the snake venom.

proteins and polypeptides do not exhibit these and other enzymatic activities and thus are described as "nonenzymatic proteins." These proteins include neurotoxins, cardiotoxins, myotoxins, ion channel inhibitors, and anticoagulant proteins (3,4). Thus, snake venom proteins, whether they are enzymatic or nonenzymatic, have evolved as a complex mixture of proteins that target several tissues, organs, and physiological systems and interfere in their normal functions. Therefore snake venoms, when injected into a prey or victim, result in the simultaneous assault on various tissues, leading to multiple organ or system failure and often death.
A large number of protein toxins have been purified and characterized from snake venoms. These studies have shown that each venom contains over a hundred protein toxins. These toxins, however, belong to a very small number of superfamilies of proteins. For example, a single snake venom can contain as many as 15 isoforms of phospholipase A2 (5-7). As one would expect, they share remarkable similarities in their primary, secondary, and tertiary structures. However, at times they differ from each other in their biological targeting and hence their pharmacological effects. Similarly, other enzymes as well as nonenzymatic proteins in snake venoms also exist in many isoforms (8) and can be classified in protein families. So far more than 1000 nonenzymatic proteins have been characterized, and these protein toxins are grouped into well recognized families as follows: 1) three-finger toxins (including neurotoxins and cardiotoxins), 2) serine proteinase inhibitors (including proteinase inhibitors and dendrotoxins), 3) lectins, 4) sarafotoxins, 5) nerve growth factors, 6) atrial natriuretic peptides, 7) bradykinin-potentiating peptides, 8) disintegrins, and 9) helveprins/CRISP (8-11). The members in each family of protein toxins have a similar molecular scaffold, but they exhibit multiple functions. Thus, it appears that during the evolution of venoms some of the molecular scaffolds have been "selected," and various "functional sites" were generated by accelerated evolution on a common molecular scaffold. We are interested in the structure-function relationships of various families of toxins from snake and other venoms.
Many of the early efforts in venom research were directed toward the isolation and characterization of either proteins that are found in abundance or the most toxic components. With the advent of more sophisticated purification techniques, there have been studies of new and interesting protein components that are found in smaller quantities. In this paper, we describe a novel toxin that is a member of a new family of snake venom toxins. Thus, we have isolated and purified nawaprin, the first member of this family, from Naja nigricollis venom.
The complete amino acid sequence and the solution structure of this toxin have been determined. Nawaprin is structurally similar to secretory leukocyte proteinase inhibitor (SLPI) and elafin, the tertiary structures of which have been studied by NMR (12) and x-ray crystallography (13). Both nawaprin and elafin contain four disulfide bonds and several proline residues. Elafin, which was first obtained from exfoliated skin (scales) of patients with psoriasis (14,15), is a specific inhibitor of human leukocyte elastase and porcine pancreatic elastase. This new protein fold has also been used as a scaffold in the evolution of snake venom toxins and may be useful in the engineering of proteins with novel pharmacological actions.

EXPERIMENTAL PROCEDURES
Materials-Lyophilized crude N. nigricollis venom was obtained from Miami Serpentarium Laboratories (Miami, FL). Trypsin endopeptidase was purchased from Wako Pure Chemicals (Osaka, Japan). 4-Vinylpyridine was obtained from Sigma. Superdex 30 and Sephasil C18 columns were obtained from Amersham Biosciences.
Isolation and Purification of Nawaprin from N. nigricollis Snake Venom-Nawaprin was purified by a three-step protocol: gel filtration of venom on a Superdex 30 column was followed by ion exchange chromatography on a UNO S6 column and HPLC on a Jupiter C18 column. Crude venom (200 mg) was loaded onto a Superdex 30 column (HiLoad™ 16/60) equilibrated with 50 mM Tris-HCl buffer, pH 7.5. The proteins were eluted with the same buffer at a flow rate of 1 ml/min on a fast performance liquid chromatography system (Amersham Biosciences). The protein elution was monitored at 280 nm. The fraction with the peak of interest (~2-5 mg) was applied separately onto a UNO S6 cation exchange column (Bio-Rad) pre-equilibrated with 50 mM Tris-HCl buffer, pH 7.5 (Buffer A). The bound proteins were eluted by a linear gradient of 1 M NaCl in Buffer A. Protein elution was carried out at a flow rate of 2 ml/min and monitored at 280 nm. The unbound fraction from the UNO S6 column was loaded onto a Jupiter C18, 10 μm (10 mm/250 mm) column equilibrated with 0.1% (v/v) trifluoroacetic acid on a Vision Work station (PerkinElmer Life Sciences). The bound proteins were eluted using a linear gradient of 80% acetonitrile (ACN) in 0.1% (v/v) trifluoroacetic acid at a flow rate of 2 ml/min. The elution of proteins was monitored at 215 nm.
Reduction and Pyridylethylation-Purified protein was reduced and pyridylethylated using procedures described earlier (16). Protein (0.5 mg) was dissolved in 500 μl of denaturant buffer (6 M guanidinium hydrochloride, 0.25 M Tris-HCl, 1 mM EDTA, pH 8.5). After the addition of 10 μl of β-mercaptoethanol, the mixture was incubated under vacuum for 2 h at 37°C. 4-Vinylpyridine (50 μl) was then added, and the mixture was kept at room temperature for 2 h. Pyridylethylated protein was purified on a μRPC C2/C18 (2.1 mm/10 mm) column using ACN in 0.1% (v/v) trifluoroacetic acid at a flow rate of 200 μl/min.
Chemical and Enzymatic Cleavage-Peptides of pyridylethylated protein were obtained by chemical cleavage using formic acid (Asp-specific) as described by Inglis (17). Briefly, the desalted protein sample (500 μg) was dissolved in 2% formic acid in a glass vial and then frozen. Subsequently, under vacuum, the vial was thawed at room temperature and then sealed off. The vial was then heated at 108°C for 2 h and allowed to cool to room temperature. Peptides of the pyridylethylated protein were also obtained by enzymatic cleavage with trypsin. Pyridylethylated protein (300 μg) was dissolved in 300 μl of 100 mM ammonium bicarbonate buffer and digested overnight by trypsin at 37°C. The peptides generated by both formic acid and tryptic digestion were separated by reverse phase HPLC on a Sephasil C18 (5 μm, 2.1 mm/10 mm) column, equilibrated with 0.1% (v/v) trifluoroacetic acid. A linear gradient of 80% (v/v) ACN in 0.1% (v/v) trifluoroacetic acid was used to elute bound peptides.
Mass Spectrometry-The protein fractions eluted from the columns were screened for novel molecular weight peptides using matrix-assisted laser desorption/ionization time-of-flight mass spectrometry on a Voyager DE-STR Biospectrometry Work station (Applied Biosystems, Foster City, CA). Typically, 1-5 pmol/μl of the sample was co-crystallized with an equal volume of the matrix (10 mg/ml of α-cyano-4-hydroxycinnamic acid freshly prepared in 1:1 ACN:water containing 0.3% (v/v) trifluoroacetic acid) on a 100-well stainless steel sample plate. The accelerating voltage was set at 25,000 V, the grid voltage was set at 93.0%, and the guide wire voltage was set at 0.3%. Molecular ions were generated using a nitrogen laser (wavelength, 337 nm) at an intensity of 1800-2200. Extraction of ions was delayed by 800 ns. The spectrum was obtained by averaging several scans. The spectrum was calibrated using external molecular weight standards.
Precise masses (0.01%) of the native protein and peptides were determined by electrospray ionization mass spectrometry (ESI-MS) using a PerkinElmer Sciex API 300 LC/MS/MS system. Typically, reverse phase HPLC fractions were directly used for analysis; alternatively, samples were prepared by dissolving desalted, lyophilized samples in 1:1:1 ACN:methanol:water (v/v/v) containing 1% acetic acid. The samples were delivered either by direct infusion or flow injection. Ion spray, orifice, and ring voltages were set at 4600, 50, and 350 V, respectively. Nitrogen was used as the nebulizer and curtain gas. An LC-10AD Shimadzu liquid chromatograph was used for solvent delivery (40% (v/v) ACN in 0.1% trifluoroacetic acid). The software Biomultiview (PerkinElmer Sciex) was used to analyze and deconvolute the raw mass spectrum.
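The deconvolution step can be illustrated with a minimal sketch. It is not the Biomultiview algorithm; it only shows the standard relationship between two adjacent charge states of the same protonated species. The peak positions below are hypothetical values back-calculated from the reported mass of 5288.5, and the function names are illustrative.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton


def mz_for_charge(neutral_mass, z):
    """m/z of an [M + zH]z+ ion for a neutral mass M."""
    return (neutral_mass + z * PROTON_MASS) / z


def deconvolute_adjacent(mz_high, mz_low):
    """Infer charge and neutral mass from two adjacent charge-state peaks.

    mz_high carries charge z and mz_low carries charge z + 1,
    so mz_low < mz_high.
    """
    z = round((mz_low - PROTON_MASS) / (mz_high - mz_low))
    neutral_mass = z * (mz_high - PROTON_MASS)
    return z, neutral_mass


# Hypothetical peaks: the 3+ and 4+ ions of a 5288.5 Da peptide
mz3 = mz_for_charge(5288.5, 3)
mz4 = mz_for_charge(5288.5, 4)
charge, mass = deconvolute_adjacent(mz3, mz4)
print(f"3+ at {mz3:.2f}, 4+ at {mz4:.2f} -> z = {charge}, M = {mass:.1f} Da")
```

Real spectra contain noise and several species, so production software fits many charge states at once; the two-peak form is only the simplest case.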
Amino-terminal Sequencing-Amino-terminal sequencing of the native and pyridylethylated protein as well as peptides was performed by automated Edman degradation using a PerkinElmer Life Sciences 494 pulsed-liquid phase protein sequencer (Procise) with an on-line 785A phenylthiohydantoin-derivative analyzer.
NMR Spectroscopy-The NMR sample was prepared by dissolving 3.1 mg of the lyophilized peptide in 0.350 ml of 90% H2O, 10% D2O in a 5-mm Shigemi (Allison Park, PA) NMR tube, resulting in a final protein concentration of 1.7 mM and pH 3.1.
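The stated concentration can be checked with a short calculation. This sketch assumes that the molecular weight used is the ESI-MS value of 5288.5 reported in the results and that the lyophilized material is pure peptide (no counterions or residual water); the function name is illustrative.

```python
def millimolar(mass_mg, volume_ml, mol_weight):
    """Concentration in mM from mass (mg), volume (ml), and molecular weight (g/mol)."""
    # mg / (g/mol) gives mmol; mmol / ml gives mol/L; multiplying by 1000 gives mM
    return mass_mg / mol_weight / volume_ml * 1000.0


conc = millimolar(mass_mg=3.1, volume_ml=0.350, mol_weight=5288.5)
print(f"{conc:.2f} mM")
```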
NMR experiments were performed on a Bruker (Karlsruhe, Germany) AVANCE-600 DRX spectrometer using a 5-mm ¹H inverse probe operating at temperatures of 10, 20, 25, 30, and 35°C. Two-dimensional NMR spectra were acquired in phase-sensitive mode using time-proportional phase detection (18). Homonuclear two-dimensional spectra recorded were double-quantum filtered COSY (19) with a fast recycle time (20), total correlation spectroscopy (21) with a spin-lock period of 60 ms, and NOESY (22) with a mixing time of 200 ms. Solvent signal suppression was achieved either by presaturation or by using the WATERGATE (23) pulse sequence. H-D exchange experiments were carried out by reconstituting the freeze-dried sample with D2O, acquiring a series of one-dimensional spectra for 15 min, and then acquiring two 1-h total correlation spectroscopy spectra. All of the spectra were processed using XWIN-NMR software (Bruker) and analyzed using the program XEASY (24).
Structural Calculations-The final structure was obtained by using restraints consisting of 503 nonredundant NOE-derived distances, 18 hydrogen bonds, 9 dihedral angles, and 4 disulfide bonds (see Table II). The NOESY spectrum recorded at 25°C with a mixing time of 200 ms provided the NOE constraints. The H-D exchange experiments and the preliminary calculated structures were used in deducing hydrogen-bonding pairs. The hydrogen-bonding constraints were assigned upper distance limits of 2.2 Å for NH(i) to O(j) and 3.2 Å for N(i) to O(j). The disulfide bond configuration was determined from the characteristic NOE interactions between the α and β protons of two paired cysteine residues.
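As an illustration of how the hydrogen-bonding upper limits act as restraints, the following minimal sketch reports by how much a given geometry exceeds each limit. It is not DYANA or CNS code; the coordinates and the function name are hypothetical, and only the two distance limits come from the text.

```python
import math

NH_O_UPPER = 2.2  # Å, upper limit for amide H(i) to O(j)
N_O_UPPER = 3.2   # Å, upper limit for amide N(i) to O(j)


def hbond_violations(h, n, o):
    """Return (H-O excess, N-O excess) in Å; 0.0 means the limit is satisfied."""
    h_o = math.dist(h, o)
    n_o = math.dist(n, o)
    return (max(0.0, h_o - NH_O_UPPER), max(0.0, n_o - N_O_UPPER))


# Hypothetical collinear N-H...O geometry (coordinates in Å)
n_atom = (0.0, 0.0, 0.0)
h_atom = (1.0, 0.0, 0.0)
satisfied = hbond_violations(h_atom, n_atom, (2.9, 0.0, 0.0))
violated = hbond_violations(h_atom, n_atom, (3.5, 0.0, 0.0))
print(satisfied, violated)
```

Structure calculation programs sum such excesses over all restraints into a penalty term; this sketch only evaluates a single hydrogen bond.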
The standard simulated annealing procedure in DYANA was employed to obtain preliminary structures prior to refinement. An iterative cycle of calculations, structure analysis, manual assignment, and constraint revision was implemented to improve the quality of calculated structures. The final DYANA calculation yielded 3000 structures, 60 of which (with the lowest NOE violations) were selected for refinement by using the standard simulated annealing script in CNS (25). In this refinement process, the high temperature dynamics and cooling cycle were performed in Cartesian space. The 20 structures with the lowest overall energy were considered as representative of nawaprin. Secondary structures in nawaprin were determined using MOLMOL (26).

RESULTS
Purification of a Novel Protein from N. nigricollis Venom-Gel filtration of the crude venom of N. nigricollis on a Superdex 30 column yielded eight major peaks (Fig. 1A). Because our interest lay in isolating small novel peptides (4000-9000 Da), we searched for polypeptides that had masses that were distinctly different from the well established toxin families, using matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (data not shown). Peak 4 had a mass of ~5290 Da. The molecular size was smaller than that of three-finger toxins and serine proteinase inhibitors but larger than that of atrial natriuretic peptides (8). Thus, based on its mass, we had identified a polypeptide belonging to no other known family of snake venom proteins. Proteins in peak 4 were further separated on a cation exchange column, UNO S6 (Fig. 1B). The protein of interest did not bind to the column; it was eluted in the unbound fraction. The protein was further purified on a reverse phase column, Jupiter C18 (Fig. 1C). The major peak in the HPLC chromatogram had a molecular weight of 5288.50 ± 0.08 by ESI-MS (Fig. 1D). The overall yield of the protein varied from batch to batch of venom samples between 0.09 and 0.51% (n = 8).
Determination of the Amino Acid Sequence-Amino-terminal sequencing of the native protein by Edman degradation identified the first 34 residues (Fig. 2A). To complete the sequence, the pyridylethylated protein and the two peptides F1 and F2 purified from the formic acid digest were analyzed (data not shown). All of the 51 residues, except the 48th residue, were unequivocally identified. To confirm the sequence, the pyridylethylated protein was digested with trypsin, and the tryptic peptides were then purified (data not shown). The carboxyl-terminal peptide was identified based on its mass and sequence, which allowed us to identify the 48th residue as Thr. Hence, this novel protein contains 51 residues, including eight cysteine residues (Fig. 2A). The calculated molecular weights of the native and pyridylethylated proteins were 5288.12 (with the assumption that all eight cysteine residues are involved in disulfide bond formation) and 6137.37, respectively; these values matched the estimated masses determined by ESI-MS (Table I). BLAST search for sequence homology indicated that this protein belongs to the family of whey acidic proteins (WAP) (Fig. 2B). Since then, we have isolated and purified two other peptides from snake venoms that show similar mass and amino acid sequence. Although all of the cysteine residues are conserved in these proteins, the intercysteine segments are distinctly different. Because of their homology with WAPs, we have named this new family of snake venom proteins Waprins (WAP-related proteins) and named the protein from N. nigricollis venom Nawaprin (Naja waprin).
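The relationship between the two calculated masses can be sketched as follows. The sketch assumes that each cysteine gains one hydrogen on reduction of its disulfide and one 4-vinylpyridine (C7H7N) on alkylation; the average atomic masses are rounded, so agreement with the reported 6137.37 is expected only to within about 0.1 Da. The function names are illustrative.

```python
VINYLPYRIDINE = 105.137  # Da, average mass of 4-vinylpyridine (C7H7N)
HYDROGEN = 1.008         # Da, average mass of hydrogen


def pyridylethylated_mass(native_mass, n_cys):
    """Mass after reducing all disulfides and pyridylethylating every Cys."""
    # each Cys gains one H (reduction) and one vinylpyridine (alkylation)
    return native_mass + n_cys * (HYDROGEN + VINYLPYRIDINE)


def cysteines_from_shift(native_mass, modified_mass):
    """Number of modified Cys residues implied by an observed mass shift."""
    return round((modified_mass - native_mass) / (HYDROGEN + VINYLPYRIDINE))


predicted = pyridylethylated_mass(5288.12, 8)
n_cys = cysteines_from_shift(5288.12, 6137.37)
print(f"predicted {predicted:.2f} Da; shift implies {n_cys} Cys")
```

Read in the reverse direction, the mass shift of about 849 Da is consistent with eight modified cysteines, which supports the assumption that all eight form disulfide bonds in the native protein.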
WAPs were the first members of this family to be isolated. They are small secretory proteins widely distributed in the whey of many species (27)(28)(29). They contain two four-disulfide core domains. Waprins are structurally closer to the epididymal secretory protein members of the WAP family (Fig. 2B).

[Fig. 1: C, reverse phase HPLC at a flow rate of 2 ml/min; the peak containing nawaprin is indicated. D, electrospray ionization MS of nawaprin; the spectrum shows ions with three and four charges, corresponding to a single, homogeneous peptide with a molecular mass of 5288.5 (inset).]

and the distal epididymis (30). Trappins (transglutaminase substrate and WAP domain-containing proteins) of the WAP family are "trapped" in the tissues through covalent cross-linking (31)(32)(33). They contain an amino-terminal transglutaminase substrate domain (also called the cementoin domain (34)) with a variable number of hexapeptide repeats with the consensus sequence GQDPVK. Trappins anchor the biologically active WAP motifs at appropriate sites in the extracellular matrix through this domain. Elafin/SKALP (skin-derived antileukoproteinase), SPAI-2 (Na+/K+-ATPase inhibitor), and porcine WAP-3 are some of the members of the trappin family, although elafin and SPAI were first isolated as soluble proteins (14,36). In contrast, some of the WAP proteins such as SLPI and human seminal plasma inhibitor do not have the cementoin domain and are produced as secreted proteins (37)(38)(39). WAPs and other family members have one to three similar domains, whereas waprins contain a single four-disulfide core domain and are found in snake venoms in soluble form.
Determination of Solution Structure-The NOESY spectrum of nawaprin obtained at 25°C and pH 3.1 showed wide dispersion of amide proton signals, indicating β-sheet secondary structures (Fig. 3). The analysis of the spectra was, however, not straightforward because of a number of complicating factors. These included the presence of seven proline residues, excessive line broadening for a number of peaks, and an extra set of small peaks suggesting the presence of minor conformations of the peptide in solution.
The seven proline residues in nawaprin presented a major difficulty in resonance assignment because they led to peak overlap in the relevant regions of the spectra. All proline residues displayed strong dαδ(i−1,i) connectivities, suggesting that they were mainly in the trans-conformation (40), but the existence of minor cis-conformations could not be completely discounted. For example, Pro31 showed an additional dαα(i−1,i) cross-peak, although its intensity was very weak. This could explain the presence of minor sets of peaks in the spectra that could not be readily assigned to a specific residue in the sequence. The presence of minor conformation(s) in solution was confirmed by reverse phase HPLC of the repurified sample wherein two minor peaks, whose intensities were ~5% of the large (major) peak, were detected.
In addition to these unwanted factors, broad peaks were also observed for many backbone amide protons, which may suggest intermediate chemical exchange of protons with the aqueous solvent, slow conformational averaging, or flexibility in the molecule. Moreover, the backbone amide peaks of Lys21, Leu23, Cys41, Met44, and Thr45 were split, suggesting a slow-to-medium state of conformational exchange. In the final analysis, the difficulties encountered in the resonance assignments were

[Fig. 2: A, the complete amino acid sequence, determined by amino-terminal sequencing of the native and pyridylethylated protein and of peptides obtained by formic acid and trypsin digestion. B, amino acid sequence of waprin aligned with epididymal proteins of the WAP family to show maximum similarity. C, structural similarity of waprin with functionally characterized WAPs; the disulfide-bonding pattern is shown by lines connecting corresponding cysteine pairs, intercysteine loops are numbered, and the "primary" binding segments in elafin and SLPI are shown in horizontal boxes. In B and C, identical residues are shown in black boxes and conserved residues in shaded boxes.]

The 34% sequence similarity between elafin and nawaprin, and the eight conserved cysteine residues, suggest equivalent cysteine pairing patterns (Fig. 2C). This was confirmed by NMR based on the characteristic NOEs between α and β protons of the bonded cysteine pairs. NOE connectivities that were observed included Cys30 Hα-Cys46 Hβ, Cys30 Hβ-Cys46 Hα, Cys30 Hβ-Cys46 Hβ, Cys7 Hβ-Cys37 Hα, Cys7 Hβ-Cys37 Hβ, and Cys24 Hα-Cys36 Hβ. The NOE cross-peaks linking Cys20-Cys41 were not observed, probably because of rather broad lines in the corresponding part of the spectrum. However, this pairing was easily established by elimination (because the other three disulfide pairings were by then known) and later during preliminary structure calculations, because considerable long range NOE connectivities were also observed among the protons of their neighboring residues (Cys41, Phe43, Thr22, and Lys21).
Structure Description-The structure of nawaprin in solution was characterized by the presence of both well and poorly defined regions, the extents of which were comparable in magnitude. Fig. 4A shows the ensemble of the best 20 structures superimposed over the backbone atoms of the "well defined" residues, 2-8, 22-38, and 44-51 of the mean structure. It is clear that although large sections of the nawaprin molecule were well ordered, some regions were apparently disordered. The mean global backbone root mean square deviation with respect to the mean structure was 1.81 Å when all of the residues were superimposed; this was reduced to 0.32 Å when only the well defined residues of 2-8, 22-38, and 44-51 were considered (Table II). Although the amino and carboxyl termini of nawaprin were relatively well defined, there was substantial disorder in the upper regions defined by residues 9-21, called the "outer loop," and 39-43, called the "inner loop"; this suggests that these two regions of the molecule have higher flexibility than the rest. This apparent structural disorder reflected in the NMR spectra was mainly caused by the dearth of long range NOEs that would provide information on "connections" between the two loops. The few NOE connectivities that were observed were those between the protons of Ile19-Cys41 and Lys21-Phe43.

[Table II, partial: backbone atoms (2-8, 22-38, and 44-51), 0.32 ± 0.07; heavy atoms (2-8, 22-38, and 44-51), 0]

Unlike many disulfide cross-linked polypeptides, nawaprin is not compact but is rather flat and disc-like (Fig. 4, B and C). The backbone configuration is essentially spiral in shape, characterized by outer and inner circular segments that are connected by disulfide bonds. The inner segment incorporates a small twisted antiparallel β-sheet (a β-hairpin) at residues 35-37 and 45-47, whereas the outer segment is devoid of any defined secondary structures except for some β-turns that are situated at residues 26-29, 27-30, and 31-34. Note that in seven of the 20 "best" structures in the ensemble, a continuous 3₁₀ helix spanning residues 26-30 was found instead of the usual β-turn(s). The three disulfide bridges are clustered together at the "base" of the molecule, anchoring the lower inner loop to the two ends of the outer loop; the fourth disulfide bridge defined by Cys20-Cys41 holds the tips of the two loops together.

DISCUSSION
We have described the purification and three-dimensional structure determination of the first snake-toxin member, nawaprin from N. nigricollis, of a new family of proteins. Nawaprin and other waprins from snake venoms are small proteins with ~50 amino acid residues that have a four-disulfide core domain structure, making them members of the WAP family of proteins (Fig. 2).

Comparison with Elafin and Other Proteins-A DALI algorithm (41) search for similar structures in the Protein Data Bank revealed that the overall fold of nawaprin has significant similarity to that of elafin. This is expected given a "respectable" sequence similarity of 34% but more importantly the basically identical disulfide bonding pattern (Fig. 2). The three-dimensional structures of nawaprin and elafin superimposed over the backbones of 47 equivalent residues found by DALI are shown in Fig. 5 (A and B). A root mean square deviation of 3.2 Å was obtained when the backbone of residues 2-9, 11-41, and 43-50 of nawaprin were superimposed onto the backbone of residues 11-18, 19-49, and 50-57 of elafin. It is clear that the tertiary folds of the two peptides are very similar. Even the locations of the secondary structures, the small antiparallel β-sheet in the inner segment, and the small turn or helix in the outer segment are basically identical (Fig. 5B). Because the tertiary structure of elafin is similar to those of other protease inhibitors such as the carboxyl-terminal half of human seminal plasma inhibitor (42), also known as SLPI, and SPAI-1 (43), it may be regarded that nawaprin belongs to the same class of enzyme inhibitor proteins that have an equivalent tertiary fold.
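The residue ranges quoted above can be checked for consistency with the 47 equivalent residues, and the root mean square deviation itself is a simple quantity once two coordinate sets are paired. The sketch below is not the DALI procedure: it pairs residues in order and computes an RMSD without performing any superposition, so the coordinates are assumed to be already superimposed. The function names and the example coordinates are hypothetical.

```python
import math


def expand(ranges):
    """Expand inclusive (start, end) residue ranges into a list of residue numbers."""
    return [res for start, end in ranges for res in range(start, end + 1)]


def rmsd(coords_a, coords_b):
    """RMSD (Å) between paired, already superimposed coordinates."""
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


nawaprin = expand([(2, 9), (11, 41), (43, 50)])
elafin = expand([(11, 18), (19, 49), (50, 57)])
pairs = list(zip(nawaprin, elafin))
print(len(pairs), pairs[0], pairs[-1])

# Hypothetical two-atom example: displacements of 3 Å and 4 Å
example = rmsd([(0, 0, 0), (0, 0, 0)], [(3, 0, 0), (0, 4, 0)])
print(f"{example:.3f} Å")
```

The changing offset between paired residue numbers (nawaprin 2 with elafin 11, nawaprin 50 with elafin 57) reflects the single-residue gaps at positions 10 and 42 of nawaprin in the structural alignment.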
There are only a few subtle differences in the secondary structures of nawaprin and elafin. In nawaprin, each of the two β-strands is composed of three residues (35-37 and 45-47) and is one residue shorter than its counterpart in elafin. This perceived difference in their structures may not be real and may actually be due to uncertainties in the analysis of the spectra of nawaprin. In the elafin structure determined by NMR (12), the tip of the outer segment, which is the binding site, shows some apparent disorder suggesting higher mobility, whereas the entire inner loop, containing the twisted β-sheet, is well defined. Nawaprin appears to be more flexible than elafin, as shown by the large degree of apparent disorder in the tips of the outer and inner loops. Such characteristic disorder at the tip of the inner loop of nawaprin may be due to the fact that this inner loop is tethered by the Cys20-Cys41 disulfide bond to a very disordered outer loop. There is a possibility that this large continuous region has higher mobility, as evidenced by the lack of long range NOE connectivities in this part of the nawaprin molecule. NMR relaxation experiments could be used to probe the mobility of this region in the molecule.
Functional Implications-The tip of the outer circular segment in elafin, defined by residues 20-26, is important for its activity, because it is the primary binding segment that interacts with the active-site pocket in porcine pancreatic elastase (13). This binding segment in elafin is composed of at least seven residues, LIRCAML (boxed parts of loops 1 and 2 in Fig. 2C), six of which are hydrophobic; the presence of several hydrophobic residues in this region is known to be crucial for the activities of elafin and other proteinase inhibitors such as SLPI (12). This region also incorporates a disulfide bond that connects the outer segment to the inner core of the inhibitor. In the porcine pancreatic elastase-elafin complex (13), the primary binding loop (outer loop) in elafin is actually in an extended β-strand conformation, forming an antiparallel β-sheet with porcine pancreatic elastase through a series of hydrogen bonds. In free solution, however, this outer loop segment is disordered (12). Nawaprin in solution also has an apparently disordered outer loop segment similar to that in elafin. However, based on the sequence alignment in Fig. 2C alone, nawaprin does not have a fragment analogous to the primary binding segment defined by residues 20-26. The DALI algorithm finds that this primary binding segment in elafin, which is composed of LIRCAML, is topologically similar to the segment defined by residues 12-18 in nawaprin, which is composed of MPIPPLG. Although these two segments are both hydrophobic, they are not sequentially similar to each other. Furthermore, the relative positions of the cysteine pairs that connect the tips of the outer and inner loops are also different in the two molecules. Fig. 5 (C and D) shows the electrostatic potential surfaces of nawaprin and elafin. One can clearly see that the charge distributions in the two molecules are different. Although the upper halves of the two molecules, which incorporate the inner and the outer loops, contain a number of hydrophobic residues, one side of nawaprin has a more hydrophobic upper part than the corresponding region in elafin. In addition to this, there is a large continuous negative patch in nawaprin defined by Glu2, Asp9, Asp27, and the carboxyl-terminal Pro51, which is absent in elafin. In fact, part of this region in elafin is positively charged because it includes two lysine residues, Lys12 and Lys43. The difference in the nature of the side chains of the two molecules therefore suggests that nawaprin may not be a protease inhibitor, although its overall fold is very similar to that of elafin.
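The comparison of the two topologically equivalent segments can be made concrete with a minimal sketch that counts position-by-position identities and hydrophobic residues. The hydrophobic residue set is an assumption chosen for illustration (it counts Cys, Pro, and Gly as hydrophobic, which reproduces the "six of seven" stated for LIRCAML); other hydrophobicity scales would classify some residues differently.

```python
# Assumed classification of hydrophobic residues (one-letter codes)
HYDROPHOBIC = set("ACFGILMPVW")


def identities(seg_a, seg_b):
    """Number of positions at which two equal-length segments share a residue."""
    return sum(a == b for a, b in zip(seg_a, seg_b))


def hydrophobic_count(segment):
    """Number of residues in the segment that belong to the assumed hydrophobic set."""
    return sum(res in HYDROPHOBIC for res in segment)


elafin_segment = "LIRCAML"    # elafin residues 20-26
nawaprin_segment = "MPIPPLG"  # nawaprin residues 12-18

shared = identities(elafin_segment, nawaprin_segment)
print(f"identical positions: {shared} of {len(elafin_segment)}")
print(f"hydrophobic: elafin {hydrophobic_count(elafin_segment)}, "
      f"nawaprin {hydrophobic_count(nawaprin_segment)}")
```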
The modest sequence similarity of 34%, the fact that both nawaprin and elafin incorporate several proline residues (including two consecutive proline residues in the external segments), and their similar three-dimensional folds strongly suggest that the two polypeptides have evolved from a common ancestral molecule.
Physiological Role of WAPs and Related Proteins-WAP is a major protein constituent in whey, and it is suggested to be the major food source for the young (44). Its secretion varies with different phases of lactation (44), and it possibly plays a role in hair and nail growth (45). Although the physiological functions of epididymal proteins such as HE4, CE4, and BE20 (30,46,47) in the male genital tract are not clear, they may act as decapacitation factors that bind to sperm that are released in the female reproductive tract (48).
Proteinase inhibitors of the WAP family play an important physiological role in regulating the activity of various proteinases. Generally, these inhibitors prevent the invasion of bacteria and other microbes. In addition, some of them play a specific role in host defense. For example, SLPI and elastase maintain the balance between proepithelin and epithelins and thus regulate innate immunity and wound healing (49). SLPI also acts as a potent anti-microbial agent, a function that appears to be independent of its anti-proteinase activity (50). Mouse SWAM1 and SWAM2 have potent antibacterial activity, but they fail to inhibit elastase or cathepsin G (51). A caltrin-like protein secreted by guinea pig seminal vesicles inhibits Ca2+ uptake by spermatozoa (52). SPAI-1 inhibits Na+,K+-ATPase (36) but not proteinase (53). Therefore, based on their occurrence in snake venoms, waprins may play a part in the offensive armamentarium of this complex mixture of proteins.
Accelerated Evolution of WAPs and Related Proteins-The structures of complexes of SLPI and α-chymotrypsin (42) and elafin and pancreatic elastase (13) show that outer segments TYGQCLML and LIRCAML (boxed parts of loops 1 and 2 in Fig. 2C) are the primary binding segments of these inhibitors to their respective proteinases. In both complexes the scissile bonds (Leu-Met and Ala-Met, respectively) that determine the proteinase specificity are intact. A comparison of amino acid sequences indicates that the loop 2 region is the most variable one in WAPs (54-56). This reflects the ability of WAPs to target not only different proteinases but also other enzymes (for example, Na+,K+-ATPase), receptors, or ion channels. Loops 1 and 2 of nawaprin are distinctly different from those of all other WAPs in both the number and the chemical nature of the constituent residues. Thus, we believe that it will not be a proteinase inhibitor but a ligand for a specific enzyme/receptor/ion channel. Similar variation of functionalities in the loops within the same structural fold is seen also in other proteinase inhibitors (57) as well as toxins (9). It is also important to note that in mini-proteins other surface loops could also play a role in the "bait region" or "functional site" (9). In nature, this diversification could be achieved through gene duplication and accelerated evolution of WAP genes. Recent studies have shown that a locus on human chromosome 20 contains 14 genes encoding WAPs and related proteins, suggesting the evolution of WAP gene(s) by repeated duplications (56). Further, the region in exon 2 encoding the reactive site shows only 60-77% nucleotide identity compared with 97-98% identity in other regions (54). This suggests accelerated evolution of WAP genes.
In a similar fashion to the evolution of WAP proteins, several molecular scaffolds have been used during the evolution of "cocktails" of the toxins in snake venoms. The selected genes are duplicated several times, and the core of each protein scaffold is conserved, whereas the loops and surfaces are altered through mutations. As in the case of WAPs, some exons of the toxin genes of snake venom mutate more rapidly than their introns, thus speeding up the generation of new toxins (35). Therefore, members of a protein family share structurally important core residues including cysteine residues. However, the intercysteine loops show considerable differences in the sequence. This results in toxins with distinctly different molecular surfaces and hence different abilities to interact with target receptor/acceptor proteins. Hence they display differences in their biological properties (9).
So far, we have been able to isolate only a single waprin from each snake venom. However, three waprins showed a conserved molecular framework with significantly different intercysteine loops. It will be interesting to search for other snake venom proteins with the WAP motif.
In summary, we have isolated and characterized a new structural family of snake venom proteins, waprins. They contain a four-disulfide core structure and resemble the WAP structural fold. Furthermore, analysis of the structures indicates that waprins could have a range of different biological properties.