Functional analyses yield detailed insight into the mechanism of thrombin inhibition by the antihemostatic salivary protein cE5 from Anopheles gambiae

Saliva of blood-feeding arthropods carries several antihemostatic compounds whose physiological role is to facilitate successful acquisition of blood. The identification of novel natural anticoagulants and the understanding of their mechanism of action may offer opportunities for designing new antithrombotics disrupting blood clotting. We report here an in-depth structural and functional analysis of the anophelin family member cE5, a salivary protein from the major African malaria vector Anopheles gambiae that specifically, tightly, and quickly binds and inhibits thrombin. Using calorimetry, functional assays, and complementary structural techniques, we show that the central region of the protein, encompassing amino acids Asp-31–Arg-62, is the region mainly responsible for α-thrombin binding and inhibition. As previously reported for the Anopheles albimanus orthologue anophelin, cE5 binds both thrombin exosite I with segment Glu-35–Asp-47 and the catalytic site with the region Pro-49–Arg-56, which includes the highly conserved DPGR tetrapeptide. Moreover, the N-terminal Ala-1–Ser-30 region of cE5 (which includes an RGD tripeptide) and the additional C-terminal serine-rich Asn-63–Glu-82 region (absent in orthologues from anophelines of the New World species A. albimanus and Anopheles darlingi) also played some functionally relevant role. Indeed, we observed decreased thrombin binding and inhibitory properties even when using the central cE5 fragment (Asp-31–Arg-62) alone. In summary, these results shed additional light on the mechanism of thrombin binding and inhibition by this family of salivary anticoagulants from anopheline mosquitoes.

The ability of hematophagous insects to use blood as a food source involves complex behavioral, morphological, and physiological adaptations to find suitable hosts, pierce their skin, and efficiently acquire and digest blood. As a result, the saliva of blood-feeding arthropods carries a complex mixture of compounds playing crucial roles in counteracting the three powerful and highly redundant responses of vertebrates to tissue injury: hemostasis, inflammation, and immunity (1). Blood feeding evolved independently several times in different taxa (2), and this resulted in a large variety of salivary anti-hemostatic factors targeting platelet aggregation, vasoconstriction, and blood clotting (1,3). Salivary anticoagulants found in the mosquito family offer a good example of this convergent evolution; members of the Culicinae subfamily (i.e. Aedes and Culex species) inhibit factor Xa, whereas species belonging to the anopheline subfamily adopted a thrombin-directed anticoagulant activity (4).
The anti-thrombin of anopheline mosquitoes was first identified in the South American malaria vector Anopheles albimanus as a small cysteine-less polypeptide of 61 amino acids named anophelin (5). Synthetic anophelin was shown to be a highly specific, slow, tight-binding, reversible inhibitor of α-thrombin (6). It binds both the catalytic site and the anion binding exosite 1 (TABE1) of thrombin, a region that is known to be involved in the recognition of fibrinogen (7). A cDNA encoding the anophelin orthologue in the African malaria vector Anopheles gambiae had been previously identified by a selective cloning strategy, and the encoded protein was named cE5 (8); it displays an additional stretch of amino acids at the C terminus. Despite these differences, recombinant A. gambiae cE5 showed full preservation of anti-thrombin function, although the two proteins displayed slightly different binding kinetics. The A. albimanus anophelin is a slow-binding thrombin inhibitor (6), whereas the A. gambiae cE5 behaves as a fast-binding inhibitor (9), raising the possibility that the extended C terminus may participate in thrombin binding. The crystal structure of the A. albimanus anophelin in complex with human thrombin revealed its unique inhibition mechanism in comparison to other known thrombin inhibitors (10). In particular, the C-terminal segment (A(Glu-32–Pro-61)) appeared sufficient for thrombin inhibition, with residues A(Asp-33–Phe-45) blocking the exosite I and the highly conserved DPGR tetrapeptide (A(Asp-50–Arg-53)) occupying the active site cleft of the enzyme and disrupting the characteristic catalytic triad of the serine proteinase.
The release of the genomes of 16 Anopheles mosquitoes (11) allowed identifying anophelin/cE5 orthologues from several additional species, offering further insights into the evolution of this unique family of thrombin inhibitors. Multiple alignment of cE5/anophelin family members showed that, in comparison to the New World species A. albimanus and Anopheles darlingi, orthologues from the Old World Anopheles species display an extended, serine-rich, C-terminal region (10,12). In order to investigate the possible involvement of the C-terminal region (cE5(Asn-63–Glu-82)) in thrombin binding and inhibition as well as the role of the conserved N-terminal portion (cE5(Ala-1–Ser-30)), we performed a structural and functional analysis using both the cE5 protein and a set of cE5-derived peptides. Moreover, the three-dimensional structure of the human α-thrombin–A. gambiae cE5 complex unveiled the details of thrombin recognition and inhibition by an anophelin orthologue from an Old World Anopheles species.

Structural properties of cE5 as determined by circular dichroism and NMR spectroscopy
Prediction analysis carried out by PONDR (VL-XT predictor; www.pondr.com) (50) suggested that the cE5 protein, as previously reported for the A. albimanus anophelin (10), was intrinsically disordered in solution. However, in contrast to A. albimanus anophelin, cE5 displayed a slight propensity to be structured in the region encompassing amino acids cE5(Pro-54) to cE5(Ser-64) (Fig. 1). Similar results were obtained using the database of intrinsically disordered proteins, MobiDB (not shown) (13). Furthermore, cE5 displayed a typical circular dichroism (CD) spectrum of an unstructured protein, with a minimum at 202 nm (Fig. 2). After the addition of 2,2,2-trifluoroethanol (TFE) as co-solvent, which may help explore the intrinsic tendency of a polypeptide to assume secondary structure elements (14–16), we found a very low propensity of cE5 to adopt an α-helical conformation. Indeed, even with high concentrations of TFE (60%), an increase of only ~10% in the α-helical content was observed (inset in Fig. 2).
The conformational properties of cE5 were also investigated using NMR spectroscopy. The ¹H,¹⁵N HSQC spectrum of ¹⁵N-labeled cE5 showed a limited chemical shift range (Fig. 3A), indicative of an unfolded molecule having few secondary structure elements, consistent with the random-coil state suggested by CD data. In addition, the ¹H,¹⁵N HSQC spectra of uniformly ¹⁵N-labeled cE5 (40 μM) in the absence and the presence of different amounts of TFE (10, 30, and 60% (v/v)) (Fig. 3, B–D) were recorded in the same experimental conditions used for CD analyses. In complete agreement with the CD data, these NMR experiments indicated a continuous increase in signal dispersion (i.e. in structuration) with increasing TFE concentration. However, even at relatively high TFE concentrations, HSQC spectra still resemble those of canonical intrinsically disordered proteins (Fig. 3, B–D).

Alignment of anophelin family members from different Anopheles species
The sequencing of the genomes of 16 anopheline species (11) spanning ~100 million years of evolution allowed identification of several additional members of the cE5/anophelin family (12). Alignment of the 18 full-length orthologues highlighted their considerable divergence: 18–77% amino acid identity among the different Anopheles species (excluding comparisons within the A. gambiae species complex). Overall, a block of 16 amino acids, including the highly conserved tetrapeptide APQY, can be recognized at the N-terminal region, and another highly conserved tetrapeptide, DPGR, is found toward the C terminus (Fig. 4). Between these two conserved blocks there is a more divergent central region enriched in acidic residues (Asp + Glu = 19–33%). A careful examination of the aligned proteins disclosed three features of potential functional relevance: (i) the N-terminal RGD tripeptide in A. gambiae and in a few other species of the complex; (ii) the conserved DPGR tetrapeptide, with proline to alanine replacement in Anopheles atroparvus and Anopheles epiroticus; (iii) the presence of a longer C-terminal region in Old World anophelines.
The tripeptide RGD is known for its involvement in binding to integrins (17). RGD-containing proteins with anti-platelet activity (disintegrins) have been found in snake venoms and in the saliva of blood-feeding arthropods (ticks and horseflies), leeches, and worms but never in mosquitoes. Disintegrins act as antagonists of integrin αIIbβ3 and inhibit fibrinogen binding to platelets and subsequent platelet cross-linking (18). Typically, integrin-binding RGD motifs are positioned in the loop of peptide hairpins formed by disulfide bonds (19), which is not the case for the cE5 RGD. We tested cE5 and the cE5-derived P1 peptide (Fig. 4) in platelet aggregation assays. We found no or very little effect on platelet aggregation (supplemental Fig. S1), which seems to rule out interaction of cE5 RGD with integrin αIIbβ3 (even though we cannot exclude binding to other integrins).
The DPGR tetrapeptide was previously found to play a crucial role in in vitro binding to α-thrombin (20) and shown to occupy the active-site cleft of the enzyme in the anophelin-thrombin crystal structure (10). Substitutions of the catalytic triad-disrupting aspartate and of the S1-targeting arginine residues in the A. albimanus protein were detrimental for the inhibitory activity (10). We do not know the effect of replacing the proline in this tetrapeptide with an alanine. However, it is conceivable that, given the compact size of alanine, it may replace the pivotal proline residue without much impact on inhibitory activity.
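As an illustration of how pairwise identity figures such as the 18–77% range quoted for the alignment can be obtained, the following minimal Python sketch computes percent identity between two pre-aligned sequences. It assumes identity is counted over ungapped columns only (the source does not state which denominator was used), and the two fragments are made-up toy strings, not real cE5/anophelin sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: gapped columns are excluded from the denominator; other
    conventions (e.g. dividing by the full alignment length) also exist.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy fragments for illustration only (not real sequences).
toy_a = "APQYDPGR-EE"
toy_b = "APQYAPGRSED"
identity = percent_identity(toy_a, toy_b)
print(f"{identity:.1f}% identity")
```

Applied to every pair of orthologues in a multiple alignment, a function of this kind would yield the identity matrix from which a minimum and maximum could be read.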

Binding studies
As determined by ITC, cE5 formed a very stable complex with thrombin (Kd ≤ 1 nM), with an apparent affinity slightly higher than anophelin (Kd ≤ 2 nM; Table 1, Fig. 5, A and B). In addition, binding assays with the cE5-based peptides P1, P2, P3, and P5 suggested that the region involved in the interaction with thrombin resides almost completely on the fragment cE5(Asp-31–Arg-62), corresponding to the P2 peptide (Kd ≤ 12 nM). This was underscored by the comparable ΔH values obtained for cE5 (−9.85 kcal/mol) and P2 (−9.66 kcal/mol) (Table 1 and Fig. 5, C–E). Surprisingly, binding experiments with P5 yielded a larger Kd (≤ 25 nM) and a smaller ΔH (−4.89 kcal/mol) than those obtained for P2, pointing to a negative contribution to thrombin binding of the region corresponding to P3 in the absence of the N-terminal portion encompassed by P1 (Table 1, Fig. 5F).
It is worth noting that the binding affinity (Ka) of cE5 to thrombin is beyond the limit of direct calorimetric determination. Under these conditions, ITC titration still provides an accurate measurement of the binding enthalpy (Table 1). However, ITC can be used to determine the complete binding thermodynamics of ligands with affinities down to the picomolar range using displacement titration (21), which was performed for this system in the presence of a weak thrombin inhibitor, the synthetic tetrapeptide D-Phe-Pro-D-Arg-Ala (22). Under these conditions, cE5 displayed a dissociation constant of 0.3 nM (Table 1), in line with the value previously reported (9,10) and with that obtained from the thrombin amidolytic activity inhibition assays in this study (Table 2).
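The logic of a displacement titration can be sketched with the standard competitive-binding relations: when the tight ligand is titrated into enzyme pre-saturated with a weak competitor, the apparent dissociation constant is inflated by the factor (1 + [weak]/Kd,weak), and the apparent enthalpy is reduced by the enthalpy of displacing the bound competitor. The Python sketch below applies these relations under the simplifying assumption that the free competitor concentration equals its total concentration. All numerical inputs are hypothetical placeholders, not values from this study.

```python
def true_kd_from_displacement(kd_apparent: float,
                              weak_conc: float,
                              kd_weak: float) -> float:
    """Recover the tight ligand's Kd from the apparent Kd measured in the
    presence of a weak competitor (all quantities in molar units).

    Assumes the competitor is in large excess, so free ~ total concentration.
    """
    return kd_apparent / (1.0 + weak_conc / kd_weak)


def true_dh_from_displacement(dh_apparent: float,
                              dh_weak: float,
                              weak_conc: float,
                              kd_weak: float) -> float:
    """Recover the tight ligand's binding enthalpy (same units as the inputs).

    The apparent enthalpy equals the true enthalpy minus the enthalpy of the
    fraction of enzyme that was occupied by the weak competitor.
    """
    occupied = (weak_conc / kd_weak) / (1.0 + weak_conc / kd_weak)
    return dh_apparent + dh_weak * occupied


# Hypothetical example values (NOT taken from the paper).
kd_app = 50e-9        # apparent Kd of tight ligand, M
weak_total = 198e-6   # weak competitor concentration, M
kd_w = 2e-6           # Kd of weak competitor, M
print(true_kd_from_displacement(kd_app, weak_total, kd_w))
```

With these placeholder inputs the competitor shifts the apparent affinity by a factor of about 100, which is what brings a sub-nanomolar interaction into the window measurable by direct titration.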
We also used NMR titration experiments and analysis of 2D ¹H,¹⁵N HSQC spectra (23) to study protein-protein interactions. HSQC spectra of ¹⁵N-labeled cE5 were recorded in the absence and presence of increasing amounts of unlabeled thrombin. At cE5/thrombin ratios >1/5, numerous spectral changes could be observed, pointing to interactions taking place between the two proteins (supplemental Fig. S3). However, it was not possible to reach saturating conditions, which would have allowed gaining more detailed structural information. This is most likely due to the relatively large size of the complex, which caused severe line broadening and loss of sensitivity in the NMR spectra (supplemental Fig. S3). In addition, 2D ¹H,¹H NMR spectra collected for the P2 peptide showed a similar lack of relevant chemical shift dispersion and the absence of a consistent number of nuclear Overhauser effect (NOE) contacts, confirming a disordered state for the P2 peptide in aqueous solution (supplemental Fig. S4, right panel). Moreover, no significant change in chemical shift or NOE intensity could be observed when comparing 2D ¹H,¹H NOESY (Nuclear Overhauser Enhancement Spectroscopy) experiments with the P2 peptide (470 μM) in the absence and the presence of a substoichiometric amount of thrombin (25 μM) (supplemental Fig. S4). These observations suggest that under our experimental conditions, the peptide "dynamically" binds to thrombin by preserving disorder and flexibility.

Thrombin inhibition assays
The inhibitory effect of cE5 and of peptides P1, P2, P3, and P5 on the in vitro amidolytic activity of thrombin was measured by following the hydrolysis of a fluorogenic substrate. A thrombin inhibition of ~80% was found for cE5, in line with previous measurements using a chromogenic substrate (9). The P2 peptide still retained inhibitory properties even though it was significantly less effective than cE5 (~39% inhibition), whereas neither P1 nor P3 affected thrombin activity (Fig. 6). The inhibitory effect of P5 was not significantly different from those observed using P2 alone or any combination including either P2 or P5.
Overall, thrombin inhibition results fit quite well with ITC data, confirming that the region corresponding to the P2 peptide (cE5(Asp-31-Arg-62)) is the main segment responsible for both binding to thrombin and for the inhibitory activity. The P1 peptide (cE5(Ala-1-Ser-30)) did not bind thrombin and did not affect its activity, and these results are fully consistent with previous observations on A. albimanus anophelin (10). Moreover, our results seem to rule out any direct functional role of the serine-rich C-terminal region in thrombin inhibition. In fact, the P3 peptide alone did not bind to or inhibit thrombin, and the P5 peptide, which encompasses both P2 and P3, was not a better inhibitor or a more efficient binder than P2.
Unsurprisingly, A. gambiae cE5 displays the same unique mode of thrombin inhibition observed for A. albimanus anophelin (10), further confirming it as a general mechanism for this class of inhibitors (MEROPS family I77; Ref. 24; Fig. 7B). In agreement with the results of CD (Fig. 2) and NMR measurements (Fig. 3), the inhibitor adopts a mostly extended conformation, running in a reverse orientation to substrates on the surface of thrombin and interacting with both the exosite I and the active-site region of the proteinase (Fig. 7, A, C, and D). Further supporting the observed binding mode, both full-length A. gambiae cE5 and its truncated P5 fragment display 3 orders of magnitude larger inhibition constants toward the exosite I-disrupted γ-thrombin than toward α-thrombin (Table 2).

Bidentate binding of cE5 to thrombin
Similar to the A. albimanus anophelin-thrombin complex (10), the cE5 fragment spanning residues cE5(Glu-35) to cE5(Asp-47) blocks the exosite I of the proteinase (Figs. 4 and 7C). However, despite the overall resemblance, the first half of this cE5 segment runs closer to the proteinase surface, with its main-chain atoms deviating significantly from the path of the equivalent region of A. albimanus anophelin (Fig. 7B). The inhibitor's tripeptide cE5(Glu-35)-cE5(Glu-36)-cE5(Phe-37)
Table 2. Inhibition of human α- and γ-thrombin by A. gambiae cE5. The affinity for thrombin of the P5 fragment is comparable with that of full-length cE5. Disruption of thrombin's exosite I decreases the inhibition potency of cE5 (and P5) by 3 orders of magnitude. Ki values ± S.E. given are representative of two independent experiments.

Discussion
Overall, the cE5 central region represented by the P2 peptide appears to be mainly responsible for both thrombin binding and inhibition. The fact that only the cE5(Glu-35–Asn-63) segment of peptide P5 could be modeled in the crystal structure, with no significant electron density detectable for the serine-rich C-terminal portion, reinforces the main role of the region corresponding to P2. Moreover, the C-terminal region was not constrained by crystal packing, explaining its observed disorder (supplemental Fig. S5). The binding of cE5 to thrombin was to a large extent similar to that previously described for A. albimanus anophelin. Nonetheless, the cE5 N-terminal and C-terminal regions still appear to play some role in both binding and inhibition. Indeed, when these two disordered segments were missing, we observed a decrease in binding affinity (≤1 nM for cE5 versus ≤12 nM for P2) as well as in thrombin inhibition (80% for cE5 versus 39% for P2). These more efficient binding and inhibitory properties of the full-length protein are in agreement with the flanking model, already proposed for interactions involving intrinsically disordered proteins (28,29). Unexpectedly, ITC studies indicated a negative contribution of the C-terminal region encompassed by P3 (higher Kd and lower ΔH values for P5 than for either P2 or the entire cE5 protein). This appears to conflict with the conservation of the C-terminal serine-rich extension among Old World anophelines, which suggests a potential functional role for this region. However, specific experimental conditions and/or the in vitro nature of our study should be taken into account. The situation may be different in vivo, where the regions encompassed by peptides P1 and P3 may provide interaction sites for other binding partners (e.g. the RGD tripeptide may be involved in binding to integrins) and/or undergo post-translational modifications (as suggested from the different mobility between recombinant and native cE5 protein in SDS-PAGE) (9). In conclusion, through structural, functional, and binding studies we shed additional light on the unique mechanism of thrombin binding and inhibition by this family of salivary anticoagulants from anopheline mosquitoes. We anticipate that future proteomic and structural studies on salivary proteins of hematophagous animals will provide insights into novel antihemostatic molecules and novel mechanisms of disrupting blood clotting that may be of great help in the design of novel antithrombotics. This appears especially true considering (i) the fast evolutionary rate of salivary proteins (11,12), (ii) the convergent evolutionary nature of hematophagy, and (iii) the fact that so far we have acquired some knowledge on the salivary repertoires of just a few of the >14,000 arthropod species estimated to feed on blood (30).

Circular dichroism analyses
CD spectra were recorded at 20°C in the far UV (190–260 nm) on a Jasco J-710 spectropolarimeter (31). Each spectrum was obtained averaging three scans, subtracting contributions from corresponding blanks and converting the signal to mean residue ellipticity in units of degree × cm² × dmol⁻¹. Spectra were recorded using 10 μM cE5 in 10 mM phosphate buffer at pH 7.5 in the absence and in the presence of increasing concentrations (up to 60% (v/v)) of 2,2,2-trifluoroethanol (TFE, 99.5% isotopic purity, Sigma). Prediction analysis of secondary structure content was performed with CDPro (31,32).
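The conversion of a blank-subtracted CD signal to mean residue ellipticity follows the usual relation [θ] = θ(mdeg) / (10 × c × l × n), with c in mol/liter, l in cm, and n the number of peptide bonds. The Python sketch below is a minimal illustration under stated assumptions: the cuvette path length and the raw signal value are hypothetical (neither is given in the text), and n is taken as 81 on the assumption that the measured protein corresponds to the 82-residue Ala-1–Glu-82 sequence with no extra tag residues.

```python
def mean_residue_ellipticity(theta_mdeg: float,
                             conc_molar: float,
                             path_cm: float,
                             n_residues: int) -> float:
    """Convert an observed ellipticity (millidegrees) to mean residue
    ellipticity in degree x cm^2 x dmol^-1.

    Uses the number of peptide bonds (n_residues - 1) as the normalising
    count; some laboratories divide by n_residues instead.
    """
    n_bonds = n_residues - 1
    return theta_mdeg / (10.0 * conc_molar * path_cm * n_bonds)


# Hypothetical reading: -10 mdeg, 10 uM protein, 0.1 cm cuvette (assumed).
mre = mean_residue_ellipticity(theta_mdeg=-10.0,
                               conc_molar=10e-6,
                               path_cm=0.1,
                               n_residues=82)
print(f"{mre:.0f} deg cm^2 dmol^-1")
```

Normalising per residue is what allows spectra of proteins and peptides of different lengths to be compared on the same scale.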

Isothermal titration calorimetry
ITC studies were performed at 22°C with an ITC200 calorimeter (MicroCal/GE Healthcare). Anophelin and cE5 (100 μM) or the peptides P1, P2, P3, and P5 (150 μM) were titrated into a solution of human α-thrombin (20 μM; Hematologic Technologies Inc., HCT-0020). Before all titration experiments proteins and peptides were dialyzed against 20 mM HEPES, pH 7.5, 50 mM NaCl. All data were analyzed and fitted using the Microcal Origin version 7.0 software package. Binding enthalpy, dissociation constants, and stoichiometry were determined by fitting the data using a one-set-of-sites binding model. ITC runs were repeated twice to evaluate the reproducibility of the results.
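Once Kd and ΔH are available from the fit, the remaining thermodynamic terms follow from ΔG = RT ln Kd and −TΔS = ΔG − ΔH. The short Python sketch below applies these textbook relations at the 22°C used here, taking the Kd limits and ΔH values quoted in the binding studies for cE5 and P2 as inputs. Because the directly fitted Kd values are upper limits, the resulting ΔG and −TΔS figures should be read as bounds rather than measurements; the function names are illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def binding_free_energy(kd_molar: float, temp_celsius: float) -> float:
    """Standard binding free energy (kcal/mol) from a dissociation constant,
    relative to a 1 M standard state: dG = R * T * ln(Kd)."""
    temp_kelvin = temp_celsius + 273.15
    return R_KCAL * temp_kelvin * math.log(kd_molar)


def entropic_term(delta_g: float, delta_h: float) -> float:
    """Return -T*dS (kcal/mol) from dG = dH - T*dS."""
    return delta_g - delta_h


# Kd limits and dH values as quoted in the text for cE5 and P2.
for name, kd, dh in (("cE5", 1e-9, -9.85), ("P2", 12e-9, -9.66)):
    dg = binding_free_energy(kd, 22.0)
    print(f"{name}: dG = {dg:.2f} kcal/mol, -TdS = {entropic_term(dg, dh):.2f} kcal/mol")
```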

Thrombin amidolytic activity assay
The amidolytic activity of human α-thrombin or γ-thrombin (Hematologic Technologies) was followed using Tos-Gly-Pro-Arg-p-nitroanilide (Chromozym TH; Roche Applied Science) as the chromogenic substrate. Inhibition assays were performed in 50 mM Tris, pH 8.0, 50 mM NaCl, 1 mg/ml bovine serum albumin with 0.2 nM human α-thrombin or γ-thrombin, 100 μM substrate, and varying concentrations of inhibitor (0–80 nM for α-thrombin or 0–200 nM for γ-thrombin). Inhibition constants (Ki) were determined according to a tight-binding inhibitor model using the Morrison equation (39) with GraphPad Prism 6.0. All reactions were initiated by the addition of thrombin and were carried out at least in duplicate at 37°C in 96-well microtiter plates. Reaction progress was monitored at 405 nm for 30 min on a Synergy2 multimode microplate reader (BioTek).
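For reference, the Morrison equation expresses the fractional residual velocity of an enzyme in the presence of a tight-binding inhibitor without assuming that the free inhibitor concentration equals the total. The Python sketch below is a minimal illustration of that model, not the GraphPad Prism routine used in the study. The correction of the apparent Ki for substrate competition assumes a purely competitive mechanism, and the Ki and Km values in the example are hypothetical; only the enzyme and substrate concentrations are taken from the assay description.

```python
import math


def morrison_fractional_activity(e_total: float,
                                 i_total: float,
                                 ki_app: float) -> float:
    """Fractional velocity v_i / v_0 from the Morrison equation.

    All concentrations must share the same units (e.g. nM).
    """
    s = e_total + i_total + ki_app
    # max() guards against a tiny negative discriminant from rounding.
    root = math.sqrt(max(s * s - 4.0 * e_total * i_total, 0.0))
    enzyme_inhibitor_complex = (s - root) / 2.0
    return 1.0 - enzyme_inhibitor_complex / e_total


def apparent_ki(ki: float, substrate: float, km: float) -> float:
    """Apparent Ki for a competitive inhibitor: Ki * (1 + [S] / Km).

    Assumption: competitive inhibition; Km must be supplied by the user.
    """
    return ki * (1.0 + substrate / km)


# 0.2 nM enzyme and 100 uM substrate as in the assay; Ki and Km hypothetical.
ki_app_nm = apparent_ki(ki=0.5, substrate=100.0, km=10.0)  # nM; S and Km in uM
for inhibitor_nm in (0.0, 5.0, 20.0, 80.0):
    frac = morrison_fractional_activity(0.2, inhibitor_nm, ki_app_nm)
    print(f"[I] = {inhibitor_nm:5.1f} nM -> v_i/v_0 = {frac:.3f}")
```

Fitting this function to measured v_i/v_0 values over the inhibitor range, with the enzyme concentration fixed, is what yields the apparent Ki.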

Platelet assays
Inhibition of collagen-mediated platelet aggregation was evaluated using human platelet-rich plasma obtained from donors at the National Institutes of Health blood bank and diluted to a density of 2 × 10⁵ platelets/μl in Tyrode-BSA (40) (final volume 300 μl). cE5, P1, or an equal volume of Tyrode buffer was added to each sample, which was then placed in an aggregometer (40) and stirred at 1200 rpm at 37°C for 1 min before the addition of collagen (type-1 fibrils, Chrono-log) to a concentration of 1.6 μg/ml.

Crystallization of A. gambiae cE5 in complex with human ␣-thrombin
Human α-thrombin (Hematologic Technologies) was mixed in 20 mM HEPES, pH 7.5, 125 mM NaCl with a 4-fold molar excess of P5 peptide (cE5(Asp-31–Glu-82)) and incubated on ice for 1 h. The resulting complex was concentrated by ultrafiltration using a 2-kDa cutoff centrifugal device (Sartorius). An initial crystallization screen at 20°C in sitting-drop geometry was performed at the High Throughput Crystallization Laboratory of the European Molecular Biology Laboratory (Grenoble, France). Preliminary crystallization conditions were systematically optimized in-house until single monoclinic crystals belonging to space group C2 were obtained after 4–6 days at 20°C using the vapor diffusion sitting-drop method from drops consisting of equal volumes (1 μl) of protein complex (at 6.4 mg/ml) and precipitant solution (0.1 M PCTP, pH 5.0, 25% (w/v) PEG 1500) equilibrated against a 300-μl reservoir.

Data collection and processing
Crystals were cryoprotected by brief immersion in 0.1 M PCTP, pH 5.0, 20% (w/v) PEG1500, 20% (v/v) ethylene glycol or 0.1 M PCTP, pH 5.0, 35% (w/v) PEG1500 and flash-cooled in liquid nitrogen. Diffraction data were collected from two isomorphous crystals (2100 and 2400 images in 0.05° oscillation steps and 0.037-s exposure) on a Pilatus 6M-F detector (Dectris) using a wavelength of 0.973 Å at beam line ID30B of the European Synchrotron Radiation Facility (Grenoble, France). Data were integrated with XDS (41), scaled with XSCALE (41), and reduced with utilities from the CCP4 program suite (42). Data collection statistics are summarized in supplemental Table 1. The raw experimental datasets for the three-dimensional structure reported in this manuscript (PDB 5NHU) are available in the SBGrid Data Bank (https://data.sbgrid.org/data/).
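The collection parameters quoted above translate directly into the total rotation range and exposure time per crystal, and the wavelength into a photon energy. The Python sketch below does this arithmetic as a simple consistency check; the function names are illustrative, and it assumes that each crystal was collected as a single continuous sweep, which the text does not state explicitly.

```python
HC_KEV_ANGSTROM = 12.398419843  # h*c expressed in keV x Angstrom


def sweep_summary(n_images: int, oscillation_deg: float, exposure_s: float):
    """Return (total rotation in degrees, total exposure in seconds).

    Assumes one continuous sweep with a fixed oscillation step per image.
    """
    return n_images * oscillation_deg, n_images * exposure_s


def photon_energy_kev(wavelength_angstrom: float) -> float:
    """X-ray photon energy in keV from the wavelength in Angstrom."""
    return HC_KEV_ANGSTROM / wavelength_angstrom


for n_images in (2100, 2400):
    rotation, exposure = sweep_summary(n_images, 0.05, 0.037)
    print(f"{n_images} images: {rotation:.0f} deg, {exposure:.1f} s")
print(f"0.973 A corresponds to {photon_energy_kev(0.973):.2f} keV")
```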

Structure determination and refinement
The structure of the complex was solved by molecular replacement with PHASER (46) using the coordinates of free wild-type human α-thrombin (PDB entry 3U69; Ref. 22) as the search model. Alternating cycles of model building with COOT (47) and refinement with PHENIX (48) were performed until model completion (refinement statistics are summarized in supplemental Table 1). All crystallographic software was supported by SBGrid (49). Refined coordinates and structure factors were deposited at the PDB with accession number 5NHU.