Structure of the C-terminal Domains of Merozoite Surface Protein-1 from Plasmodium knowlesi Reveals a Novel Histidine Binding Site

The protozoan parasite Plasmodium causes malaria, with hundreds of millions of cases recorded annually. Protection against malaria infection can be conferred by antibodies against merozoite surface protein (MSP)-1, making it an attractive vaccine candidate. Here we present the structure of the C-terminal domains of MSP-1 (known as MSP-1 19) from Plasmodium knowlesi. The structure reveals two tightly packed epidermal growth factor-like domains oriented head to tail. In domain 1, the molecule displays a histidine binding site formed primarily by a highly conserved tryptophan. The protein carries a pronounced overall negative charge, primarily due to the large number of acidic groups in domain 2. To map protein binding surfaces on MSP-1 19, we have analyzed the crystal contacts in five different crystal environments, revealing that domain 1 is highly preferred in protein-protein interactions. A comparison of MSP-1 19 structures from P. knowlesi, P. cynomolgi, and P. falciparum shows that, although the overall protein folds are similar, the molecules show significant differences in charge distribution. We propose the histidine binding site in domain 1 as a target for inhibitors of protein binding to MSP-1, which might prevent invasion of the merozoite into red blood cells.

A necessary step in the life cycle of the malaria parasite is its entry into the red blood cell of a mammalian host. This process involves many surface proteins specifically expressed at the merozoite stage of the parasite, including merozoite surface protein (MSP)-1 (also known as merozoite surface antigen-1 or MSA-1). MSP-1, a required component of the merozoite invasion machinery, associates noncovalently with other parasite proteins, including the C-terminal portion of MSP-6 (1). MSP-1 is synthesized as a 180-225-kDa polypeptide capable of binding sialic acid (2, 3). As a requirement for merozoite entry into a red cell, MSP-1 undergoes two processing steps, the first at merozoite release from an infected cell and the second during invasion of a red cell (4) (Fig. 1a). In the first processing step, the precursor polypeptide is cleaved into four chains (with apparent molecular masses of 83, 30, 38, and 42 kDa) held together by noncovalent contacts (4-6). In the second processing step, the 42-kDa segment (MSP-1 42) is cleaved into two fragments with apparent molecular masses of 33 kDa (MSP-1 33) and 19 kDa (MSP-1 19 or MSP-1 inv) (4). MSP-1 33 is shed from the surface, whereas MSP-1 19 remains anchored to the merozoite membrane by a glycosylphosphatidylinositol tail attached to the C-terminal residue (7-10).
Because a high antibody titer against the C-terminal region of MSP-1 protects against severe malaria, this molecule represents an attractive vaccine candidate (11-13). Antibodies against MSP-1 19 correlate with clinical immunity (14, 15); however, recombinant MSP-1 19 is poorly immunogenic, presumably due to poor processing of the highly disulfide-bonded protein (16, 17). Antibodies to the C-terminal region of MSP-1 fall into two categories: protective antibodies that inhibit merozoite invasion of red cells, and blocking antibodies that enhance invasion by interfering with the binding of protective antibodies. Protective antibodies include those that recognize the native conformation of MSP-1 19 (7, 18-20), those that block proteolytic processing of MSP-1 42 (21), and certain antibodies to the 83-kDa N-terminal fragment of intact MSP-1 (22). Blocking antibodies bind to the first domain of MSP-1 19, near the N terminus introduced by the processing of MSP-1 42 (22, 23).
The extremely stable MSP-1 19 contains two consecutive epidermal growth factor (EGF)-like domains, a small disulfide-rich fold also found in diverse proteins including Factor Xa (24), E-selectin (25), cyclooxygenase-2 (26), and integrin αVβ3 (27). In MSP-1, disulfide bonds are integral to the correct folding of the molecule (20). Although the full-length MSP-1 is highly polymorphic, the C-terminal EGF-like domains in MSP-1 19 are much less so, and they represent a conserved segment of a variable molecule. MSP-1 19 shows high sequence identity across Plasmodium species: P. knowlesi MSP-1 19 shares 82% sequence identity with P. cynomolgi and 51% sequence identity with P. falciparum.
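As an illustration of how a pairwise sequence identity such as the values quoted above can be computed, the following minimal sketch counts identical residues over the aligned, ungapped columns of two sequences. The function name, the choice of denominator (columns without gaps), and the toy fragments are assumptions made for illustration; they are not the alignment or the procedure used in this study.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Return percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Toy aligned fragments (hypothetical, not the MSP-1 sequences).
toy_a = "CNENNGGCDA"
toy_b = "CNANNGGC-A"
print(round(percent_identity(toy_a, toy_b), 1))
```

Other conventions divide by the length of the shorter sequence or by the full alignment length, so reported identities depend on the convention chosen.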
To better understand the process of invasion of the Plasmodium merozoite into a red blood cell, we have solved the structure of the C-terminal domains of MSP-1 from P. knowlesi by x-ray crystallography. We have refined the structure at 2.4 Å resolution to an R-factor of 23.4% and an R-free of 26.4% (Table I). Our structure reveals features of MSP-1 novel to P. knowlesi and provides new insight into MSP-1 from other Plasmodium species. With four copies in the crystallographic asymmetric unit, we observe MSP-1 in four different environments, allowing us to infer novel properties of the molecule.

MATERIALS AND METHODS
Expression and Purification-DNA encoding the C-terminal 92 amino acids of P. knowlesi (Malayan H strain) MSP-1 (GenBank accession code AAG24615) was inserted into the vector YEpRPEU-3 and expressed in the Saccharomyces cerevisiae VK1 cell line (36). The vector included an N-terminal yeast α-factor pre-pro secretory signal and a C-terminal hexahistidine tag. The plasmid is episomally maintained by supplying a functional HIS4 gene for the host VK1 cell line (his4). Protein expression is under control of the ADH2 promoter and induced by introducing ethanol as a carbon source during fermentation. The secreted protein product included 105 amino acids: 5 amino acids (E-A-E-A-S) from the cleaved mating factor, the 92 amino acids from MSP-1, and 8 amino acids (G-P-H6) from the affinity tag. The protein was purified using nickel affinity, ion exchange, and gel filtration chromatography.
Crystallization and X-ray Data Collection-Purified MSP-1 19 protein was concentrated to 10 mg/ml for crystallization trials. Crystals were grown via vapor diffusion with a solution containing 30% polyethylene glycol 6000, 100 mM HEPES, pH 7.0. Crystals were transferred into the same solution supplemented with 20% glycerol before cooling to 100 K in a nitrogen stream. X-ray data were collected on a Rigaku RU-200 rotating anode generator equipped with nickel mirrors and a RAXIS 4 detector (Molecular Structure Corp., The Woodlands, TX). A total of 180° of diffraction data were processed using DENZO (37), SCALEPACK (37), and TRUNCATE (38) to a resolution limit of 2.4 Å.
Phasing and Refinement-Molecular replacement was performed with the AMoRe package (38) using coordinates from the C-terminal domains of MSP-1 from P. cynomolgi (29). The P. cynomolgi model was rotated and translated against the 8-4 Å P. knowlesi diffraction amplitudes, using correlation coefficients to rank solutions. Four successive objects were placed in the crystallographic asymmetric unit, whereas insertion of a fifth object failed to improve the correlation coefficient. Inspection of the packing showed no steric clashes in a unit cell with 49% solvent content. Rigid body refinement in the programs AMoRe (38) and Crystallography and NMR System (CNS) (39) was followed by model building in the program O (40). Residue numbering of the P. knowlesi structure corresponds to the mature secreted protein and begins at the first residue of the cleaved α mating factor secretory signal. Refinement protocols in CNS included conjugate gradient minimization, simulated annealing, and temperature factor refinement. Models were built into simulated annealing composite omit maps calculated in CNS. Tight 4-fold noncrystallographic symmetry restraints (300 kcal/mol·Å²) were imposed on all atoms in the early stages of refinement and later relaxed for atoms that differ among the four copies in the asymmetric unit. Refinement steps were accepted only if they reduced the R-free (calculated from a test set of 860 reflections, 5% of the total, selected using resolution shells). The R-work and R-free are 23.4% and 26.4%, respectively, using all reflections to 2.4 Å.
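The R-work and R-free statistics quoted above can be illustrated with a minimal sketch, assuming the conventional definition R = Σ||F_P| − |F_C|| / Σ|F_P|, where F_P is the observed and F_C the calculated structure factor amplitude, summed over the working set or the free (test) set. The function names and the toy amplitudes are hypothetical, and the sketch assumes the amplitudes are already on a common scale; it is not the CNS implementation.

```python
def r_factor(f_obs, f_calc):
    """R = sum(||F_P| - |F_C||) / sum(|F_P|) over one set of reflections."""
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    denominator = sum(abs(fo) for fo in f_obs)
    if denominator == 0:
        raise ValueError("observed amplitudes sum to zero")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    return numerator / denominator


def r_work_and_free(f_obs, f_calc, free_flags):
    """Split reflections by the free flag and return (R-work, R-free)."""
    rows = list(zip(f_obs, f_calc, free_flags))
    work = [(fo, fc) for fo, fc, free in rows if not free]
    test = [(fo, fc) for fo, fc, free in rows if free]
    r_work = r_factor([fo for fo, _ in work], [fc for _, fc in work])
    r_free = r_factor([fo for fo, _ in test], [fc for _, fc in test])
    return r_work, r_free


# Toy amplitudes (hypothetical, not data from this study).
f_obs = [100.0, 80.0, 60.0, 40.0]
f_calc = [90.0, 85.0, 50.0, 44.0]
free_flags = [False, False, False, True]
r_work, r_free = r_work_and_free(f_obs, f_calc, free_flags)
print(f"R-work = {r_work:.3f}, R-free = {r_free:.3f}")
```

Because the free reflections are excluded from refinement, a refinement step that lowers R-work without lowering R-free is likely fitting noise, which is the rationale for accepting only steps that reduce R-free.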
Calculations-Least squares superpositions were performed using the program LSQMAN (41) with a distance cutoff of 3.8 Å, and coordinate transformations were applied using the program MOLEMAN2 (41). Structural comparisons with the NMR structure of P. falciparum MSP-1 19 (28) use the most representative member of the ensemble. Electrostatic calculations include only the native MSP-1 19 sequence, without the secretory signal and purification epitope. Molecular figures were prepared using the programs MOLSCRIPT (42), BOBSCRIPT (43), and GRASP (44).
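To show how a distance cutoff affects a reported r.m.s. deviation and the number of matched atoms, the following minimal sketch computes the r.m.s. deviation over paired Cα positions that lie within a cutoff after the two structures have already been superimposed. The function name and the toy coordinates are hypothetical, and the sketch does not perform the least squares fit itself, so it should not be read as the LSQMAN procedure.

```python
import math


def rmsd_within_cutoff(coords_a, coords_b, cutoff=3.8):
    """RMSD over paired atoms closer than `cutoff` (in angstroms).

    Assumes the two coordinate lists are already superimposed and paired
    by index. Returns (rmsd, number_of_pairs_used).
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    squared = []
    for p, q in zip(coords_a, coords_b):
        d = math.dist(p, q)
        if d <= cutoff:
            squared.append(d * d)  # keep only pairs within the cutoff
    if not squared:
        raise ValueError("no atom pairs within the cutoff")
    return math.sqrt(sum(squared) / len(squared)), len(squared)


# Toy coordinates (hypothetical): the third pair is 10 angstroms apart
# and is therefore excluded by the 3.8 angstrom cutoff.
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
b = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (2.0, 0.0, 10.0)]
rmsd, n_pairs = rmsd_within_cutoff(a, b)
print(f"r.m.s. deviation = {rmsd:.2f} over {n_pairs} atoms")
```

A cutoff of this kind explains why superpositions of more divergent structures are reported over fewer Cα atoms: pairs that remain far apart after fitting are dropped from the statistic.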

RESULTS
Overall Description of the Structure-The C-terminal region of P. knowlesi MSP-1 contains two successive EGF-like domains packed together tightly and related by a rotation of ~170° (Fig. 1b). The molecule is roughly square-shaped, with dimensions of about 35 Å × 35 Å × 15 Å. The angle between domains 1 and 2 leads to close approach of the N terminus of domain 1 and the C terminus of domain 2. Thus, as domain 2 is anchored in the membrane by a glycosylphosphatidylinositol linkage attached to the C-terminal residue, the N-terminal residue of domain 1 (i.e. the cleavage site in MSP-1 42) also points toward the membrane, in an appropriate orientation for processing by proteases attached to the merozoite membrane (28).
Each EGF-like domain in P. knowlesi MSP-1 contains a segment of random coil followed by four antiparallel β strands (Fig. 1b). Typically, EGF-like domains contain six cysteines in three disulfide bonds with a connectivity of 1-3, 2-4, and 5-6. The MSP-1 EGF-like domains follow this paradigm, except that in domain 1, valine and tryptophan replace the middle (2-4) disulfide. The valine and tryptophan pack into the domain 1 hydrophobic core, in the same volume as the disulfide bond in canonical EGF-like domains.
FIG. 1. The structure of MSP-1 19 from P. knowlesi. a, the two proteolytic processing stages of MSP-1. In the first processing step, the 200-kDa precursor is cleaved into four pieces. In the second step, the 42-kDa C-terminal segment is cleaved into two fragments, MSP-1 33 and MSP-1 19. Segments are not to scale. b, the MSP-1 19 ribbon representation including disulfide bonds. In each domain, β strands 1-4 are shown as ribbons labeled β1-β4. Disulfide bonds connect random coil to β1, coil to β2, and β2 to β4. This front view, with domain 1 on the left and domain 2 on the right, is also shown in c and in Figs. 3 and 4. The non-native residues from the C-terminal purification epitope are colored gray. c, the α carbons of the four molecules in the crystallographic asymmetric unit are shown superimposed. Chain A is yellow, chain B is green, chain C is blue, and chain D is red. The structures differ significantly only at the termini and loop 69-74.
Table I footnote: F C is the calculated and F P is the observed structure factor amplitude of reflection h for the working/free set, respectively.
The crystals of P. knowlesi MSP-1 19 contain four molecules in the asymmetric unit, revealing the protein in four different environments. The four molecules are highly similar, with r.m.s. deviations of 0.4-0.6 Å for 77-81 Cα atoms, with differences primarily at the two termini and residues 69-74 on the loop between β strands 1 and 2 in domain 2 (Fig. 1c). This loop shows differences across the four molecules in the crystallographic asymmetric unit and also within one molecule (as reflected by high B factors). The flexibility in this loop and in the termini corresponds to the mobility seen in the NMR structure of the P. falciparum MSP-1 19 (28).
In P. knowlesi MSP-1 19, the extensive interface between the two domains buries 26 residues and 1090 Å² of surface area. This produces a rigid interface with little variation in the interdomain angle among the four copies in the asymmetric unit (Fig. 1c). The residues in the interface are among the most conserved in the MSP-1 19 protein: 23 of 26 residues are identical between P. knowlesi and P. cynomolgi, with three conservative substitutions; and 14 of 26 residues are identical between P. knowlesi and P. falciparum, with seven conservative substitutions. The highly conserved domain 2 sequence Asn 57-Asn 58-Gly 59-Gly 60 packs in the interface between domains 1 and 2, forming a short stretch of left-handed helix in P. knowlesi and P. cynomolgi and a tight turn in P. falciparum. P. knowlesi domain 1 superimposes on domain 2 with an r.m.s. deviation of 1.2-1.6 Å for the four independent copies in the asymmetric unit (Table II).
Histidine Binding Site-In all four copies of MSP-1 19 in the crystallographic asymmetric unit, electron density appears in a shallow depression on the surface (Fig. 2a). In each copy, the density traces back to a histidine from the hexahistidine tag of a symmetry-related molecule. For each of the four bound histidines, the plane of the imidazole ring stacks roughly parallel to the plane of the indole ring of Trp 34, and the carboxylic acid of Glu 42 pairs with the bound (and presumably protonated) histidine. In the histidine-binding pocket, Trp 34 forms the floor, and Glu 33 and Arg 35 form the walls (Fig. 2b). The hydrophobic tryptophan and the bound histidine appear in the middle of a region of negative electrostatic potential (Fig. 2c). Although the conformation of the C-terminal hexahistidine tail differs in the four copies in the asymmetric unit, one histidine from the tail fills a histidine-binding pocket of a symmetry-related molecule. In the four copies in the asymmetric unit, each histidine binding buries between 167 and 206 Å² of accessible surface area. This same pocket can bind other positively charged moieties: in the crystal of MSP-1 19 from P. cynomolgi (29), a lysine from a symmetry-related molecule inserts into this pocket.
In all Plasmodium species except P. falciparum, domain 1 of MSP-1 19 lacks the middle disulfide bond found in canonical EGF-like domains. The replacement of a disulfide by a small residue (valine, isoleucine, or threonine) and a tryptophan produces a shelf on domain 1 capable of binding histidine. Tryptophan is the residue most commonly found buried in protein-protein interfaces (30) because burial of its large hydrophobic surface area yields thermodynamic gain.
Protein Binding Surfaces-The analysis of crystal contacts can be used to map protein binding surfaces and thereby reveal potentially biologically important surfaces on a molecule (31, 32). We searched crystal contacts in MSP-1 19 to identify the favored sites for binding to proteins. Combining the four copies of P. knowlesi MSP-1 19 in the crystallographic asymmetric unit with the one copy of P. cynomolgi MSP-1 19 (29), we have observations of MSP-1 19 in five different crystal environments. We counted the number of times each residue participates in crystal contacts and mapped that number onto the surface of the protein (Fig. 3a). The P. knowlesi and P. cynomolgi crystal contacts correlate: of the 9 residues involved in crystal contacts in all four P. knowlesi copies, 7 make contacts in the P. cynomolgi crystals as well. These seven residues (Ile 14, Asp 15, Leu 28, Trp 34, Glu 42, Ala 50, and Ser 51) thus participate in crystal contacts in all five crystal packing environments. All of these except Ser 51 are conserved between P. knowlesi and P. cynomolgi, suggesting protein binding sites conserved across species. All of these except Leu 28 differ between P. knowlesi and P. falciparum, resulting in distinctly shaped MSP-1 19 surfaces across these species.
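The counting step described above can be sketched as follows, assuming each crystal environment is represented simply as the set of residue numbers that make at least one crystal contact. The function names and the toy residue sets are hypothetical placeholders; the criterion that defines a contact (for example, a distance threshold) is outside this sketch and is not specified here.

```python
from collections import Counter


def contact_counts(environments):
    """Count, for each residue, the number of environments in which it makes a contact."""
    counts = Counter()
    for residues in environments:
        counts.update(set(residues))  # each residue counts once per environment
    return counts


def residues_in_all(environments):
    """Return the residues that make contacts in every environment."""
    counts = contact_counts(environments)
    total = len(environments)
    return sorted(res for res, n in counts.items() if n == total)


# Toy data (hypothetical residue numbers): five crystal environments,
# standing in for four P. knowlesi copies plus one P. cynomolgi copy.
environments = [
    {1, 2, 3, 5},
    {1, 2, 4},
    {1, 2, 3},
    {1, 2, 5},
    {1, 2, 3, 4},
]
counts = contact_counts(environments)
print(dict(sorted(counts.items())))
print(residues_in_all(environments))
```

The per-residue count, which ranges from 0 to the number of environments, is the quantity that can then be mapped onto the molecular surface as a color gradient.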
The crystal contacts show that MSP-1 19 domain 1 is highly favored for binding protein (Fig. 3a). Trp 34, the floor of the histidine binding site in the center of domain 1, falls in an ideal position to participate in protein-protein interactions on the surface of the merozoite. In P. falciparum MSP-1 19, domain 1 has been shown to be the binding site of monoclonal antibody G17.12 (33).
Comparison across Species-The high level of sequence identity in MSP-1 19 across Plasmodium species results in highly similar three-dimensional structures of the molecule (Figs. 3b and 4a). The P. cynomolgi MSP-1 19 superimposes on the P. knowlesi structure with an r.m.s. deviation of 1.3 Å for 78 Cα atoms, whereas the P. falciparum structure superimposes on the P. knowlesi structure with an r.m.s. deviation of 1.7 Å for 65 Cα atoms (Table II). A superposition of the four P. knowlesi MSP-1 19 molecules with the P. cynomolgi and P. falciparum homologues reveals that the differences occur primarily at the termini and in the 69-74 loop, the same loop that shows variability among the four molecules in the P. knowlesi crystal.
MSP-1 19 from P. knowlesi exhibits a striking charge distribution (Fig. 4b). The molecule carries a pronounced negative potential, with an overall charge of −7 in the 92 residues of the two domains. There are no buried ion pairs in P. knowlesi MSP-1 19; rather, all charged residues are surface-accessible. The overall charge is species-specific because the equivalent segments of MSP-1 from P. cynomolgi and P. falciparum have overall charges of −4 and −2, respectively (Fig. 4, c and d). In P. knowlesi MSP-1 19, the negative charge is concentrated in domain 2 (with an overall charge of −7), whereas domain 1 is neutrally charged overall. The acidic regions in domain 1 cluster near the histidine binding site.
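An overall charge of the kind quoted above can be estimated from sequence alone. The following minimal sketch assumes formal charges of −1 for Asp and Glu and +1 for Lys and Arg, treats histidine and the chain termini as neutral, and splits the sequence at a domain boundary supplied by the caller. The function names, the boundary argument, and the toy sequence are hypothetical; this is a simple residue count, not the electrostatic calculation used for the figures.

```python
# Formal side-chain charges near neutral pH (assumption: His and termini neutral).
FORMAL_CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}


def net_charge(sequence: str) -> int:
    """Sum the formal charges of the acidic and basic residues in a sequence."""
    return sum(FORMAL_CHARGE.get(res, 0) for res in sequence.upper())


def charge_by_domain(sequence: str, boundary: int):
    """Return (charge of residues before `boundary`, charge of the remainder)."""
    return net_charge(sequence[:boundary]), net_charge(sequence[boundary:])


# Toy sequence (hypothetical, not MSP-1): the two segments sum to the total.
toy = "DEEKRAGD"
domain_1, domain_2 = charge_by_domain(toy, boundary=4)
print(net_charge(toy), domain_1, domain_2)
```

Because the per-domain charges add up to the overall charge, a molecule with a neutral domain 1 and a domain 2 charge of −7 has an overall charge of −7, consistent with the values stated in the text.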
Hydrophobicity maps of the three Plasmodium MSP-1 19 structures show a patch of nonpolar residues along the interface between the two domains (Fig. 4, b-d). Of the 26 residues contributing to the interface, 15 are hydrophobic, and 3 are cysteines in disulfide bonds. Because the molecule is only 15 Å thick, some hydrophobic residues in the interdomain interface (including Tyr 25, Tyr 27, Leu 37, Leu 38, and Phe 88) remain partially solvent-accessible. Surprisingly, the large patch of hydrophobic residues on the back surface is among the least active surfaces in protein binding, as judged by the crystal contact analysis (Fig. 3a).
The arrangement of charged and hydrophobic residues on MSP-1 19 may maintain specificity of the interactions between MSP-1 19 and its binding partners such as the other MSP-1 fragments and MSP-6. Complementary changes between residues in MSP-1 19 and its binding partners may allow evolution of the MSP-1 19 protein surface.
FIG. 3. MSP-1 19 in crystal contacts and across species. a, the P. knowlesi MSP-1 19 molecular surface is shown, colored by the number of crystal contacts each residue makes, using the four molecules in the P. knowlesi asymmetric unit and the one molecule in the P. cynomolgi asymmetric unit. The surface has a color gradient from 0 (white; no contacts in any of the five molecules) to 5 (blue; contacts in each of the five molecules) crystal contacts/residue. The histidine binding site is circled in red. b, α carbon traces of MSP-1 19 from P. knowlesi, P. cynomolgi, and P. falciparum are superimposed. The four copies of P. knowlesi MSP-1 19 are colored yellow, green, blue, and red; the P. cynomolgi MSP-1 19 is colored magenta; and the P. falciparum MSP-1 19 NMR structure is colored white. Non-native residues from the purification epitope of P. knowlesi are included at the bottom of domain 2.

DISCUSSION
The region around Trp 34 appears to be a site on P. knowlesi MSP-1 19 [...] MSP-1 19. Of the 40 independent EGF/laminin-like domain structures currently in the Protein Data Bank (34), only Plasmodium MSP-1 19 has lost one of its three canonical disulfide bonds. The loss of a structurally important and highly conserved disulfide bond is most likely due to a functional requirement. Second, all of the Plasmodium species (except P. falciparum) contain a tryptophan in domain 1 of MSP-1 19 in place of the canonical cysteine. The disulfide is replaced with a tryptophan and a smaller residue; whereas the smaller residue may become a valine, isoleucine, or threonine, the other member of the pair is always tryptophan, suggesting functional selection.
Third, tryptophan is the residue most commonly found in protein-protein interfaces (30), so the replacement of a strictly conserved disulfide-forming cysteine with a tryptophan suggests the evolution of a protein binding site. Fourth, there are no instances of crystals of MSP-1 19 with an empty histidine-binding pocket. In all five independent crystallographic environments, a histidine or lysine is found stacked above the tryptophan ring. This pocket fills when the concentration of nearby protein becomes sufficiently high. Fifth, the crystal contact map of MSP-1 19 indicates that the tryptophan and nearby regions in domain 1 are very active in participating in protein-protein interactions.
The sequence differences between P. falciparum and the other Plasmodium species represent divergent evolution from a common precursor, but it is unclear whether P. falciparum has lost a binding site or the other Plasmodium species have evolved new functionality. Alternatively, the difference between P. falciparum and the other Plasmodia may be due to the strong bias against tryptophan in P. falciparum (35). The virulent P. falciparum displays somewhat different disease pathology compared with the other Plasmodium species; perhaps the differences in MSP-1 19 proteins are related to the differences in disease severity. For example, P. falciparum invades a broader spectrum of red blood cells and reaches higher levels of parasitemia compared with other malaria species, potentially due to loss of receptor specificity in P. falciparum. Some of this specificity may be conferred by the MSP-1 protein during red blood cell invasion. Although P. falciparum lacks the tryptophan defining the floor of the binding pocket, all the other species of Plasmodium have this tryptophan, which generates a site for binding histidine. We propose the use of this histidine binding site as a target for the design of compounds that bind, for example, P. vivax MSP-1. From such a scaffold compound, new inhibitors that protect against merozoite invasion of a red blood cell might be developed.
Our structure reveals a new feature of the MSP-1 19 molecule, a histidine-binding pocket in domain 1. Because antibodies that protect against malarial infections often bind specifically to domain 1, this new property of the molecule might be used to engineer tighter binding and thus more protective antibodies. Alternatively, this histidine-binding pocket might be used to interfere with the binding of blocking antibodies. For example, a small molecule inhibitor containing an imidazole scaffold may be designed to overlap with the footprint of blocking antibodies and thus disrupt the binding of blocking antibodies. The specific disruption of MSP-1 19 blocking antibodies without perturbation of protective antibodies would greatly enhance the immune response against the malaria parasite.
FIG. 4. Electrostatic potential and hydrophobicity. a, the amino acid sequences from the crystals of MSP-1 19 from P. knowlesi, P. cynomolgi, and P. falciparum. Cysteines are highlighted in yellow. b-d, surface representations of MSP-1 19 from three Plasmodium species are shown, colored by electrostatic potential (left) and hydrophobicity (right). Molecules are shown in two views 180° apart. Electrostatic potentials contoured from −10 kT (red) to +10 kT (blue) are plotted onto the molecular surface of MSP-1 19 from P. knowlesi, P. cynomolgi, and P. falciparum, respectively. Hydrophobicity values calculated from modified solvent transfer experiments (45) are contoured from −2 kcal/mol (most hydrophobic, in yellow) to 2 kcal/mol (hydrophilic, in green). The hydrophobic patches found on the surface tend to be from exposed portions of residues forming the interface between domains 1 and 2.