Isolation, cDNA Cloning, and Structure-based Functional Characterization of Oryctin, a Hemolymph Protein from the Coconut Rhinoceros Beetle, Oryctes rhinoceros, as a Novel Serine Protease Inhibitor*

We isolated oryctin, a 66-residue peptide, from the hemolymph of the coconut rhinoceros beetle Oryctes rhinoceros and cloned its cDNA. Oryctin is dissimilar to any other known peptide in amino acid sequence, and its function has been unknown. To reveal that function, we determined the solution structure of recombinant 13C,15N-labeled oryctin by heteronuclear NMR spectroscopy. Oryctin exhibits a fold similar to that of Kazal-type serine protease inhibitors but has a unique additional C-terminal α-helix. We performed protease inhibition assays of oryctin against several bacterial and eukaryotic proteases. Oryctin inhibits the following serine proteases: α-chymotrypsin, endopeptidase K, subtilisin Carlsberg, and leukocyte elastase, with Ki values of 3.9 × 10⁻¹⁰ M, 6.2 × 10⁻¹⁰ M, 1.4 × 10⁻⁹ M, and 1.2 × 10⁻⁸ M, respectively. Although the target molecule of oryctin in the beetle hemolymph remains obscure, our results show that oryctin is a novel single domain Kazal-type inhibitor and could play a key role in protecting against bacterial infections.

Eggs and larvae of the coconut rhinoceros beetle Oryctes rhinoceros live in compost in warm areas such as Southeast Asia. This beetle has thus developed a self-defense system that includes antimicrobial peptides. Several antimicrobial peptides have been isolated from the hemolymph of the beetle: defensin (1), rhinocerosin (2), and scarabaecin (3). Oryctin (GenBank accession no. BAA36402), a 66-residue peptide with three intramolecular disulfide bonds, was found as a hemolymph peptide of the beetle during a search for antibacterial peptides.
The function of oryctin has been unclear because no other known peptide or protein is similar to oryctin in sequence. We took a structural approach to reveal the function of oryctin. First, we determined the solution structure of recombinant 13C,15N-labeled oryctin by heteronuclear NMR spectroscopy. Next, we searched the Protein Data Bank (PDB) for proteins that are structurally similar to oryctin. The search revealed that oryctin has a fold that is similar in part to that of the turkey ovomucoid third domain (OMTKY3), a serine protease inhibitor. We then performed protease inhibition assays of oryctin and found that oryctin inhibits eukaryotic chymotrypsin-like serine proteases such as α-chymotrypsin and leukocyte elastase and bacterial subtilisin-like serine proteases such as subtilisin Carlsberg and endopeptidase K. Therefore, oryctin is a novel single domain Kazal-type inhibitor despite its unique amino acid sequence. Kazal-type serine protease inhibitors usually consist of multiple Kazal domains, each of which has a characteristic disulfide linkage pattern, CysI-CysV (where CysI, for example, is the first cysteine residue from the N terminus), CysII-CysIV, and CysIII-CysVI, as well as a secondary structure consisting of an α-helix and an antiparallel β-sheet. Here, we discuss the structure and function of oryctin by comparing the sequences, the patterns of the disulfide linkages, and the tertiary structures.

Purification of Oryctin
Twenty-four hours after injection of heat-killed Escherichia coli, the hemolymph of third instar larvae of O. rhinoceros (collected on the islands of Okinawa and Ishigaki, Japan) was drawn into an ice-cooled 50-ml centrifugation tube containing 1 mg of aprotinin. The hemolymph was centrifuged at 39,100 × g for 50 min at 4°C. The supernatant was heated in boiling water for 10 min, cooled on ice, and centrifuged again at 39,100 × g for 50 min at 4°C. The supernatant was acidified with 0.1% (v/v) trifluoroacetic acid (TFA) and applied onto a Sep-Pak Vac tC18 0 -20% (v/v) in 5 min, followed by 20-40% (v/v) in 40 min at a flow rate of 0.5 ml/min. The fractions containing oryctin were applied to the same system but using 0.05% (v/v) TFA instead of 0.05% (v/v) heptafluorobutanoic acid.

Matrix-assisted Laser Desorption Ionization Time-of-flight Mass Spectrometry (MALDI-TOF MS)
MALDI-TOF mass spectra were measured on a Voyager Linear spectrometer (Applied Biosystems). About 1 pmol of purified oryctin was dissolved in 1 μl of 0.1% (v/v) TFA. The sample solution was then mixed with saturated sinapinic acid solution in 50% (v/v) acetonitrile containing 0.1% (v/v) TFA directly on the target.

cDNA Cloning
The cDNA encoding oryctin was cloned using the following three-step PCR amplification.
Step 1: Reverse transcriptase-PCR. The fat body was collected 10 h after the injection of heat-killed E. coli. The poly(A)+ RNA was purified from the fat body using a Quick Prep mRNA purification kit (GE Healthcare). The first-strand cDNA was synthesized from 1.32 μg of the poly(A)+ RNA using a First-Strand cDNA synthesis kit (GE Healthcare). Using the first-strand cDNA as a template, RT-PCR was performed using the following degenerate primers: the sense primer O1F (17-mer), whose sequence was deduced from the amino acid sequence of Val1-Asp6 (5′-GTNCCNGTNGGN(AT)(CG)NGA-3′), and the antisense primer O1R (20-mer), whose sequence was deduced from the amino acid sequence of Asn23-Val29 (5′-AC(AGT)ATNCCYTTYTCNGGRTT-3′). A 45-cycle stepdown protocol was used: denaturation at 94°C for 8 min (first cycle) or 30 s (second and following cycles); annealing for 30 s at 60°C (initial five cycles), 55°C (second five cycles), 50°C (third five cycles), 45°C (fourth five cycles), or 40°C (final 25 cycles); and polymerization at 72°C for 30 s. The resultant 86-bp fragment was subcloned into a TA cloning vector (Invitrogen). DNA sequencing was performed using an ABI Prism model 373A DNA sequencer with the dye-terminator protocol (Applied Biosystems).
Step 2: 3′-Rapid Amplification of cDNA Ends. 3′-RACE was done using the first-strand cDNA as the template, the antisense NotI adaptor primer 5′-AACTGGAAGAATTCGCGGC-3′, and the sense primer O2F (22-mer), whose sequence was derived from the result of Step 1: 5′-CTGTGAACCCAAACTATGCACC-3′. A 35-cycle PCR was used: denaturation at 94°C for 8 min (first cycle) or 30 s (second and following cycles), annealing at 50°C for 30 s, and polymerization at 72°C for 30 s. The amplified fragment was subcloned and sequenced as described above.
Step 3: 5′-Rapid Amplification of cDNA Ends. 5′-RACE was done using the 5′-RACE kit (Invitrogen). The first-strand cDNA was synthesized using the antisense primer OGSP1 (16-mer), whose sequence was derived from the result of Step 2: 5′-TTATGGACGTGGTGCA-3′. The terminal deoxynucleotidyl transferase reaction was performed on ice for 1 h. The nested PCR was performed using the antisense primer OGSP2 (24-mer), 5′-TTTTTGATTGCTTCTTCACACTCG-3′, and the abridged anchor primer. A 40-cycle PCR was used: denaturation at 94°C for 6 min (first cycle) or 1 min (second and following cycles), annealing at 50°C for 30 s, and polymerization at 72°C for 1 min. The amplified fragment was subcloned and sequenced as described above.
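As a quick consistency check on the primers described in Steps 1-3, the following minimal sketch computes the length and degeneracy (number of distinct sequences in the pool) of each primer and confirms that the stepdown annealing schedule adds up to 45 cycles. The function and variable names are illustrative and not part of the original protocol; the primer strings are written without the line-break hyphens, and a parenthesized group such as (AT) is assumed to denote a single mixed position.

```python
import math
import re

# Number of bases represented by each IUPAC code used in these primers.
IUPAC_DEGENERACY = {"A": 1, "C": 1, "G": 1, "T": 1, "R": 2, "Y": 2, "N": 4}

def position_degeneracies(primer):
    """Return the degeneracy of every position in a primer.

    A parenthesized group such as '(AT)' counts as one position that can be
    any of the listed bases; single letters are read as IUPAC codes.
    """
    tokens = re.findall(r"\(([ACGT]+)\)|([ACGTRYN])", primer)
    return [len(set(alt)) if alt else IUPAC_DEGENERACY[single]
            for alt, single in tokens]

def primer_summary(primer):
    """Return (length in nucleotides, number of distinct sequences in the pool)."""
    degs = position_degeneracies(primer)
    return len(degs), math.prod(degs)

# Primer sequences as given in Steps 1-3 (5' to 3').
PRIMERS = {
    "O1F": "GTNCCNGTNGGN(AT)(CG)NGA",
    "O1R": "AC(AGT)ATNCCYTTYTCNGGRTT",
    "O2F": "CTGTGAACCCAAACTATGCACC",
    "OGSP1": "TTATGGACGTGGTGCA",
    "OGSP2": "TTTTTGATTGCTTCTTCACACTCG",
}

# Stepdown annealing schedule of Step 1: (temperature in deg C, cycles).
STEPDOWN = [(60, 5), (55, 5), (50, 5), (45, 5), (40, 25)]

if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        length, degeneracy = primer_summary(seq)
        print(f"{name}: {length}-mer, degeneracy {degeneracy}")
    print("total stepdown cycles:", sum(cycles for _, cycles in STEPDOWN))
```

The lengths reproduce the 17-, 20-, 22-, 16-, and 24-mer sizes stated in the text, and only the two primers deduced from the amino acid sequence (O1F and O1R) are degenerate.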

Determination of Disulfide Linkages
About 1 μg of oryctin was dissolved in 10 μl of 20 mM sodium bicarbonate (pH 8.5) containing 0.1 μg of lysyl endopeptidase (Wako, Tokyo, Japan) and endoprotease Asp-N (Roche Applied Science) and incubated at 37°C for 4 h. The reaction mixture was applied directly to MALDI-TOF MS. α-Cyano-4-hydroxycinnamic acid was used as the matrix.

Expression and Purification of 15N- and 13C,15N-labeled Oryctin
Oryctin with an N-terminal His6 tag cleavable by tobacco etch virus (TEV) protease (Invitrogen) was expressed in E. coli BL21-Star(DE3) (Invitrogen) grown in M9 minimal medium, using a pET28a-based (Novagen, Madison, WI) expression plasmid. Expression was induced with isopropyl 1-thio-β-D-galactopyranoside at a final concentration of 1 mM at 37°C for 3 h. Cells were harvested by centrifugation, resuspended in lysis buffer (50 mM Tris-HCl (pH 8.0), 300 mM NaCl, 10 mM imidazole), and disrupted by sonication. After centrifugation, 15N-labeled and 13C,15N-labeled oryctin with an N-terminal His6 tag was purified with Ni Sepharose 6 Fast Flow (GE Healthcare). The His6 tag was removed by TEV protease digestion. 15N-labeled and 13C,15N-labeled oryctin were further purified by cation exchange followed by size exclusion chromatography on Mono S HR 10/10 and Superdex 200 HR 10/30 (GE Healthcare), respectively.

Dihedral Angle Restraints
The programs CSI (5) and TALOS (6) were used to predict the regions of α-helix and β-strand based on 13Cα, 13Cβ, 13C′, 1Hα, and 15N chemical shifts. The predicted angle ranges were used as the dihedral angle restraints.

Hydrogen Bond Restraints
Hydrogen bonds were detected by the following H/D exchange experiment. First, the 1H-15N HSQC spectrum of 0.2 mM 15N-labeled oryctin dissolved in 50 mM sodium phosphate (pH 6.8), 100 mM NaCl, and 0.02% (w/v) NaN3 in 90% H2O, 10% D2O (v/v) was acquired. Then, the solvent was changed to 50 mM sodium phosphate (pH 6.8), 100 mM NaCl, and 0.02% (w/v) NaN3 in D2O by ultrafiltration using a Vivaspin-20 (molecular weight cutoff 3000; Sartorius Stedim Biotech, Aubagne, France). 1H-15N HSQC spectra were acquired at 6, 12, 24, 36, and 48 h after the solvent exchange. The peak intensities in these 1H-15N HSQC spectra were analyzed. Slowly exchanging amide protons were detected, and their respective carbonyl acceptors were deduced from the NOE data and from the oryctin structures at intermediate stages of the structure calculations. Hydrogen bond distance restraints were set as

Structure Calculation and Structural Similarity Search
Interproton distance restraints were derived from peak intensities in the 15N-edited NOESY-HSQC and 13C-edited NOESY-HSQC spectra of 13C,15N-labeled oryctin with a mixing time of 75 ms. The cross-peak intensities were translated into interproton distances based on the relationship NOE ∝ (distance)⁻⁶; the standard distance between HN(i) and HN(i+1) in the α-helix was 2.8 Å, and the standard distance between Hα(i) and Hα(j) in the antiparallel β-sheet was 2.3 Å. Structure calculations were performed using CYANA (7). Two hundred conformers were annealed in 10,000 steps of torsion angle dynamics calculations, of which the 20 conformers with the lowest target function values were used to represent the solution structure of oryctin. The conformer with the lowest target function was used as the representative structure of oryctin. The tertiary structure was visualized with the programs MOLMOL (8) and PyMOL (DeLano, W. L.). The five conformers with the lowest target functions were submitted to the Dali server (9) to search for proteins structurally similar to oryctin.
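The r⁻⁶ relationship used for distance calibration can be illustrated with a short sketch. It assumes a single reference cross-peak of known distance (for example, the 2.8 Å sequential HN-HN distance in an α-helix) and converts another cross-peak intensity into a distance. The function name and the intensity values are hypothetical, and this is not the calibration routine actually implemented in CYANA.

```python
def noe_distance(intensity, ref_intensity, ref_distance):
    """Estimate an interproton distance (in angstroms) from a NOE intensity.

    Uses NOE proportional to r**-6, so that
        r = r_ref * (I_ref / I) ** (1/6),
    where I_ref and r_ref belong to a reference cross-peak of known distance.
    """
    if intensity <= 0 or ref_intensity <= 0:
        raise ValueError("intensities must be positive")
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)

if __name__ == "__main__":
    # Hypothetical intensities in arbitrary units; the reference peak is a
    # sequential HN-HN contact in an alpha-helix (2.8 angstroms).
    reference_intensity = 1000.0
    for peak in (1000.0, 250.0, 50.0):
        r = noe_distance(peak, reference_intensity, 2.8)
        print(f"intensity {peak:7.1f} -> {r:.2f} angstroms")
```

Because of the sixth-root dependence, a 64-fold drop in intensity corresponds to only a doubling of the estimated distance, which is why NOE-derived restraints are usually applied as upper bounds rather than exact distances.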

Protease Inhibition Assays
α-Chymotrypsin, leukocyte elastase, subtilisin Carlsberg, endopeptidase K, trypsin, papain, and thermolysin were purchased from Sigma-Aldrich; V8 protease, pepsin, and lysyl endopeptidase were from Wako Pure Chemical Industries (Osaka, Japan); proline-specific endopeptidase was from Seikagaku Corp. (Tokyo, Japan); and endoproteinases Arg-C and Asp-N were from Roche Diagnostics. Peptidyl-MCA substrates (referred to as MCA substrates) and MOCAc/Dnp type fluorescence-quenching substrates (referred to as MOCAc/Dnp substrates) for the above-mentioned proteases (shown in Table 2) were purchased from Peptide Institute (Osaka, Japan). In the case of MCA substrates, the protease activity was measured by fluorescence intensity with excitation at 380 nm and detection at 460 nm. In the case of MOCAc/Dnp substrates, the protease activity was measured by fluorescence intensity with excitation at 328 nm and detection at 393 nm. The pH-dependent inhibitory activity was assayed as follows. A reaction mixture containing 3.3 nM protease, 10 nM oryctin, and 66, 100, or 130 μM MCA substrate or 6.6, 10, or 13 μM MOCAc/Dnp substrate in 10 mM Tris-HCl (pH 7.2-9.2) was incubated at 37°C for 10 min, and then HCl was added to stop the reaction. The acquired data were plotted on Lineweaver-Burk plots. The kinetic parameters (Km and kcat) were obtained with a reaction mixture containing 3.3 nM protease and 66, 100, or 130 μM MCA substrate or 6.6, 10, or 13 μM MOCAc/Dnp substrate in 10 mM Tris-HCl (pH 8.0). The reaction mixture was incubated at 37°C for 3 min, and then HCl was added to stop the reaction. This reaction was performed in triplicate. Exceptionally, for a reaction involving pepsin, the pH of the reaction mixture was adjusted to 1.5 with HCl, and the reaction was stopped by adding 3 M Tris-HCl (pH 8.0).
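The Lineweaver-Burk analysis mentioned above can be sketched as follows. The sketch assumes classical competitive inhibition, in which the inhibitor raises the apparent Km to Km(1 + [I]/Ki) while leaving Vmax unchanged, so that Ki = [I] / (Km,app/Km − 1). All rates are synthetic values generated from the Michaelis-Menten equation with hypothetical parameters; only the substrate concentrations (66, 100, and 130 μM) and the inhibitor concentration (10 nM) are taken from the text. The classical treatment also assumes that the free inhibitor concentration equals the total concentration.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def lineweaver_burk(substrate, rates):
    """Fit 1/v = (Km/Vmax)(1/[S]) + 1/Vmax and return (Km, Vmax)."""
    slope, intercept = linear_fit([1.0 / s for s in substrate],
                                  [1.0 / v for v in rates])
    vmax = 1.0 / intercept
    return slope * vmax, vmax

def competitive_ki(km, km_app, inhibitor):
    """Ki from the shift in apparent Km under competitive inhibition."""
    return inhibitor / (km_app / km - 1.0)

def mm_rate(s, vmax, km):
    """Michaelis-Menten rate, used here only to generate synthetic data."""
    return vmax * s / (km + s)

# Substrate concentrations from the text (micromolar).
SUBSTRATE_UM = [66.0, 100.0, 130.0]
# Hypothetical parameters for the synthetic example.
TRUE_KM_UM, TRUE_VMAX, TRUE_KI_NM, INHIBITOR_NM = 50.0, 1.0, 1.0, 10.0

if __name__ == "__main__":
    v0 = [mm_rate(s, TRUE_VMAX, TRUE_KM_UM) for s in SUBSTRATE_UM]
    km_shifted = TRUE_KM_UM * (1.0 + INHIBITOR_NM / TRUE_KI_NM)
    vi = [mm_rate(s, TRUE_VMAX, km_shifted) for s in SUBSTRATE_UM]
    km, vmax = lineweaver_burk(SUBSTRATE_UM, v0)
    km_app, _ = lineweaver_burk(SUBSTRATE_UM, vi)
    print(f"Km = {km:.1f} uM, Vmax = {vmax:.2f}")
    print(f"Ki = {competitive_ki(km, km_app, INHIBITOR_NM):.2f} nM")
```

In a Lineweaver-Burk plot, competitive inhibition shows up as lines with different slopes that share the same intercept on the 1/v axis, which is the pattern the fit above relies on.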

RESULTS
Peptide Isolation, cDNA Cloning, and Primary Structure Determination. Oryctin was found and isolated during the purification of antibacterial peptides from the hemolymph of the coconut rhinoceros beetle O. rhinoceros. About 1.0 μg of oryctin was obtained from 1.0 ml of the hemolymph by four steps of reverse-phase HPLC. The N-terminal amino acid sequence was determined using a protein sequencer, and the oryctin cDNA was cloned by three steps of PCR using fat body mRNA from immunized larvae. The amino acid sequence deduced from the nucleotide sequence indicated that oryctin is synthesized as an 85-amino acid precursor, and a 66-amino acid mature peptide is assumed to be produced by cleavage of the signal peptide (supplemental Fig. S1). There is no sequence similarity between oryctin and any other known peptides or proteins. The MALDI-TOF MS data indicated that oryctin is a monomeric peptide with three intramolecular disulfide bonds and without any other modifications.
The m/z values of the ions detected by MALDI-TOF MS after digestion with lysyl endopeptidase and endoproteinase Asp-N were 1707.57, 1998.21, and 2808.69 (Fig. 1). These ions were assigned to the disulfide-linked pairs CysI-CysV, CysII-CysIV, and CysIII-CysVI, respectively. An ion with an m/z of 1800.48 was assigned to fragments 3-10 and 42-49, which also indicated the disulfide bond CysI-CysV. Thus, we have concluded that three disulfide bonds are formed in oryctin: Cys7-Cys42, Cys12-Cys35, and Cys20-Cys56 (Fig. 1).
Gene Expression Analysis. Oryctin was constitutively expressed in the fat bodies, hemocytes, midguts, and Malpighian tubules of the third instar larvae. Injection of E. coli into the larvae did not affect the expression of oryctin (supplemental Fig. S2).
NMR Structural Analysis. Using conventional three-dimensional NMR data, 607 backbone and side chain 1H, 13C, and 15N atoms were assigned. The 1H-15N HSQC spectrum of oryctin is shown in Fig. 2A. 100% (57 residues, except proline and the N-terminal residue) of the backbone 1 H, 13 Fig. 2B.
Tertiary Structure. The three-dimensional structure of oryctin was calculated based on 552 NOE-derived distance restraints, three disulfide bond restraints (nine distance restraints), three hydrogen bond restraints (12 distance restraints), and 79 dihedral angle (38 φ and 41 ψ) restraints. A total of 200 structures were calculated, and the 20 structures with the lowest target functions were selected as an ensemble representing the solution structure of oryctin (Fig. 2C). The restraints used and the structural statistics for the final structure are summarized in Table 1.

TABLE 2 Kinetic parameters, specificity constants and inhibition constants of oryctin and aprotinin for the hydrolysis of MCA and MOCAc/Dnp substrates by eukaryotic and bacterial proteases
The parameters and constants were obtained at 37°C and pH 8.0, except for pepsin (at 37°C and pH 1.5). NI, no detectable inhibition.

Structural Similarity to the Third Domain of Turkey Ovomucoid. A structural similarity search using the Dali server (9) showed that oryctin is structurally similar to OMTKY3, the third domain of turkey ovomucoid (Fig. 3B; root mean square deviation = 3.2 and 3.0 Å for 41 and 39 Cα pairs, respectively; PDB codes 1PPF and 1YU6, respectively), which is a Kazal-type serine protease inhibitor. Despite the lack of amino acid sequence similarity between them, their patterns of three disulfide bridges are the same: CysI-CysV, CysII-CysIV, and CysIII-CysVI (supplemental Fig. S3A). Oryctin, consisting of 66 residues, has three intramolecular disulfide bonds (Cys7-Cys42, Cys12-Cys35, and Cys20-Cys56), and OMTKY3, consisting of 51 residues, also has three intramolecular disulfide bonds (Cys8-Cys38, Cys16-Cys35, and Cys24-Cys56). In terms of secondary structure, oryctin has two α-helices and two β-strands, whereas OMTKY3 has one α-helix and three β-strands. Both oryctin and OMTKY3 possess the backbone fold characteristic of Kazal-type inhibitors: a central α-helix and an antiparallel β-sheet. The structural difference between oryctin and OMTKY3 is largest in the C-terminal regions, where oryctin has an amphipathic α-helix, whereas OMTKY3 has a short β-strand. The amino acid sequence of the reactive site loop (P4-P3-P2-P1-P1′) including CysII of OMTKY3 is AlaP4-CysP3(II)-ThrP2-LeuP1-GluP1′, whereas that of oryctin is LeuP4-CysP3(II)-ThrP2-MetP1-AspP1′ (shown in purple in Fig. 3, A and B), suggesting that oryctin also could function as a serine protease inhibitor.
Protease Inhibitory Activity. Supplemental Fig. S4 shows the pH-dependent inhibitory activity of oryctin against eukaryotic and bacterial proteases. Oryctin inhibited the eukaryotic serine proteases α-chymotrypsin and leukocyte elastase and the bacterial serine proteases subtilisin Carlsberg and endopeptidase K over a pH range of 7.2-9.2, indicating that it can act as a serine protease inhibitor in the beetle hemolymph. Fig. 4 shows the temperature-dependent inhibitory activity of oryctin against the above-mentioned four proteases, demonstrating that oryctin has inhibitory activity against the four proteases in a temperature range of 10-70°C in 10 mM Tris-HCl (pH 8.0). Based on Lineweaver-Burk plots, oryctin inhibited α-chymotrypsin, endopeptidase K, subtilisin Carlsberg, and leukocyte elastase in a competitive manner with Ki values of 3.9 × 10⁻¹⁰ M, 6.2 × 10⁻¹⁰ M, 1.4 × 10⁻⁹ M, and 1.2 × 10⁻⁸ M, respectively (Fig. 5).

The Number of Residues between CysII and CysIII Provides Insight into the Functional Characterization of Kazal-type Inhibitors. Oryctin and the reported Kazal-type inhibitors that contain three disulfide bonds show no sequence similarity, suggesting that oryctin and the other Kazal-type inhibitors have evolved convergently, whereas all of the Kazal-type inhibitors except oryctin have evolved divergently (supplemental Table S1 and supplemental Fig. S3B). In fact, the phylogenetic analysis of all the Kazal-type inhibitors in supplemental Table S1 except oryctin can be performed without error using the Phylogeny.fr server (supplemental Fig. S6) (14), but the analysis that includes oryctin results in an error. Supplemental Table S1 compares the number of residues between the six cysteine residues (CysI-Xna-CysII-Xnb-CysIII-Xnc-CysIV-Xnd-CysV-Xne-CysVI). The putative reactive site loop (LeuP4-CysP3(II)-ThrP2-MetP1-AspP1′) of oryctin contains CysP3(II). MetP1, probably the most important residue for the inhibitory activity, is located between CysII and CysIII. The number of residues between CysII and CysIII (seven) is strictly conserved in all the members, including oryctin, with the exception of the second domain of rhodniin (15), which contains eight amino acid residues in that region. The number of residues between CysII and CysIII yields insight into the functional characterization of the Kazal-type inhibitors. Interestingly, there are more residues between CysI and CysII in vertebrate inhibitors than in invertebrate ones (supplemental Table S1). The lymphoepithelial Kazal-type-related inhibitor (LEKTI) isolated from human blood ultrafiltrate, which consists of 15 Kazal domains, contains longer sequence stretches of 12-13 residues between CysI and CysII (16). This may imply that the N-terminal region of the Kazal-type inhibitors in higher animals acquired some additional role in the course of evolution.
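The cysteine spacing and linkage pattern discussed here can be recomputed from the disulfide pairs reported in this paper for oryctin (Cys7-Cys42, Cys12-Cys35, Cys20-Cys56) and OMTKY3 (Cys8-Cys38, Cys16-Cys35, Cys24-Cys56). The following sketch is only an illustration; the function names are hypothetical, and the residue numbers are taken as given in the text.

```python
ROMAN = ["I", "II", "III", "IV", "V", "VI"]

def cys_spacings(disulfides):
    """Number of residues between consecutive cysteines (Xn a through Xn e)."""
    positions = sorted(p for pair in disulfides for p in pair)
    return [later - earlier - 1 for earlier, later in zip(positions, positions[1:])]

def linkage_pattern(disulfides):
    """Express each disulfide as a pair of ordinal cysteine labels (I to VI)."""
    positions = sorted(p for pair in disulfides for p in pair)
    label = {pos: ROMAN[i] for i, pos in enumerate(positions)}
    ordered_pairs = sorted((min(a, b), max(a, b)) for a, b in disulfides)
    return [f"Cys{label[a]}-Cys{label[b]}" for a, b in ordered_pairs]

# Disulfide pairs as reported in the text (residue numbers).
ORYCTIN = [(7, 42), (12, 35), (20, 56)]
OMTKY3 = [(8, 38), (16, 35), (24, 56)]

if __name__ == "__main__":
    for name, bonds in (("oryctin", ORYCTIN), ("OMTKY3", OMTKY3)):
        print(name, linkage_pattern(bonds), cys_spacings(bonds))
```

Both proteins give the CysI-CysV, CysII-CysIV, CysIII-CysVI pattern and seven residues between CysII and CysIII, whereas the CysI-CysII spacing is four residues in oryctin and seven in the vertebrate inhibitor OMTKY3, in line with the comparison above.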
Possible Biological Roles of Oryctin. Proteases and protease inhibitors are involved in the metabolism, immunity, and metamorphosis of insects (17,18). In particular, hemolymph proteases and protease inhibitors are involved in immune responses in insects, e.g. antimicrobial peptide induction, prophenoloxidase activation, hemolymph coagulation, and protection against virulence-related microbial proteases (17). In the present study, we have shown that oryctin inhibits subtilisin-like serine proteases such as subtilisin Carlsberg and endopeptidase K. This suggests that oryctin is also involved in protection against microbial proteases. In the silkworm Bombyx mori, a protease inhibitor against fungal protease was isolated and is considered to play a role in innate immunity (19). Although further investigation is required to reveal the in vivo function(s) of oryctin, the results obtained in this study indicate that oryctin is a novel and unique member of the Kazal-type serine protease inhibitors. Despite the lack of sequence similarity, oryctin exhibits a similar linkage pattern of three disulfide bonds and an inhibition mechanism similar to that of the traditional Kazal-type inhibitors.