Processing of an Apicoplast Leader Sequence in Plasmodium falciparum and the Identification of a Putative Leader Cleavage Enzyme

The plastid (apicoplast) of the malaria-causing parasite Plasmodium falciparum was derived via a secondary endosymbiotic process. As in other secondary endosymbionts, numerous genes for apicoplast proteins are located in the nucleus, and the encoded proteins are targeted to the organelle courtesy of a bipartite N-terminal extension. The first part of this leader sequence is a signal peptide that targets proteins to the secretory pathway. The second, so-called transit peptide region is required to direct proteins from the secretory pathway across the multiple membranes surrounding the apicoplast. In this paper we perform a pulse-chase experiment and N-terminal sequencing to show that the transit peptide of an apicoplast-targeted protein is cleaved, presumably upon import of the protein into the apicoplast. We identify a gene whose product likely performs this cleavage reaction, namely a stromal-processing peptidase (SPP) homologue. In plants SPP cleaves the transit peptides of plastid-targeted proteins. The P. falciparum SPP homologue contains a bipartite N-terminal apicoplast-targeting leader. Interestingly, it shares this leader sequence with a Δ-aminolevulinic acid dehydratase homologue via an alternative splicing event.

Plasmodium spp., the causative agents of malaria, belong to a family of intracellular parasites called the Apicomplexa. Plasmodium infects approximately 300 million people annually, causing over 1 million deaths, the great majority of which are caused by one species, Plasmodium falciparum (1). P. falciparum infects both humans and mosquitoes during its life cycle, with the pathogenic part of this cycle occurring predominantly in the erythrocytes of humans. The discovery of a non-photosynthetic plastid (the apicoplast) in the Apicomplexa has opened up a new area of anti-malarial drug targets (2,3). However, the rational development of drugs targeting the apicoplast requires knowledge of apicoplast function. Preliminary studies indicate that the apicoplast is a site of fatty acid and isoprenoid biosynthesis (4-6), and drugs targeting these pathways have been shown to kill P. falciparum (4,6,7).
Like plant plastids, the apicoplast contains a reduced bacterial-like genome (8), from which a small number of proteins are expressed. The great majority of apicoplast proteins, as in plant plastids, are encoded in the nucleus and must be post-translationally targeted to the plastid. In plants, nuclear-encoded plastid proteins require a cleavable, N-terminal sequence called the transit peptide, which directs these proteins across the two membranes surrounding plant plastids (for reviews, see Refs. 9 and 10). Once in the plastid stroma, this transit peptide is cleaved by a stromal-processing peptidase (SPP; Refs. 11 and 12). Apicoplasts, however, are bound by four membranes (Ref. 2, but see Ref. 13), and proteins targeted to this organelle have been shown to require a bipartite N-terminal leader sequence (5,14,15). By fusing these N-terminal leader sequences to green fluorescent reporter protein (GFP), Waller et al. (5) showed that the first part of this leader functions as a signal peptide, directing proteins to the secretory pathway. The second part of the leader is necessary to direct these proteins away from the default secretory pathway and into the apicoplast (5). Indeed, this second region has several similarities to the transit peptides of plants, and the transit peptide-like region of an apicoplast-targeted protein from the related species Toxoplasma gondii is sufficient to target proteins into isolated pea chloroplasts (14).
In a Western blot analysis of these apicoplast-targeted proteins, Waller et al. (5) found a prominent band whose size corresponded to that of the mature protein as well as a less abundant band slightly greater in size. In this paper, we perform a pulse-chase experiment to demonstrate that this larger band represents the apicoplast-targeted protein before cleavage of the transit peptide-like region has occurred. We then identify the site of transit peptide cleavage by N-terminal sequencing of the processed protein. Finally, we identify a putative SPP homologue in P. falciparum, which we propose is responsible for this cleavage. Interestingly, this gene shares, through alternative splicing, a putative N-terminal apicoplast-targeting sequence with a Δ-aminolevulinic acid dehydratase (ALAD; also known as porphobilinogen synthase; EC 4.2.1.24) homologue, a protein that functions in heme biosynthesis. The evolution of this alternative splicing is considered.

EXPERIMENTAL PROCEDURES
Transfected Cell Line-Pulse-chase and N-terminal sequencing were performed on P. falciparum cells transfected with the acyl carrier protein (ACP) leader-GFP fusion protein, produced by Waller et al. (5). The fusion protein consists of the bipartite N-terminal leader sequence of the ACP (a fatty acid biosynthesis protein) fused to the N terminus of GFP. Cells were cultured using standard techniques (16), with pyrimethamine drug added to select for parasites containing the ACP leader-GFP-containing episomes (5).
Pulse-chase of the ACP Leader-GFP Construct-Red cell cultures containing parasites were washed twice in methionine-free RPMI-HEPES medium (Sigma) then incubated for 45 min in RPMI-HEPES medium containing ³⁵S-labeled methionine and cysteine (PerkinElmer Life Sciences Expre³⁵S³⁵S protein labeling mix, final concentration 80 μCi/ml) supplemented with 5% Albumax (Invitrogen) and 5% heat-inactivated human serum at 37°C and under 5% CO2 and 1% O2 gas in nitrogen. Cells were pelleted and washed twice with non-radioactive RPMI-HEPES (Sigma) and resuspended in this medium supplemented with 5% Albumax (Invitrogen) and 5% heat-inactivated human serum, and equal aliquots were pipetted into 5 separate dishes. Cells were kept at 37°C under 5% CO2 and 1% O2 gas in nitrogen and harvested at t = 0, 1, 2, 3, and 4 h by saponin lysis and subsequent immunoprecipitation (see below). Saponin lysis involves selectively permeabilizing the red cell membrane and parasitophorous vacuole with 0.15% (w/v) saponin in phosphate-buffered saline (0.8% w/v NaCl, 0.02% w/v KCl, 0.14% w/v Na2HPO4, 0.02% KH2PO4, pH 7.4). After immunoprecipitation, samples were boiled in sample buffer and separated by SDS-PAGE (17). Radioactive bands were visualized using x-ray imaging film (Eastman Kodak Co., Biomax).
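The phosphate-buffered saline recipe above is given in percent weight per volume. As a reading aid, the following minimal sketch converts those figures to millimolar concentrations. It assumes that all four salts were weighed as anhydrous forms and that the KH2PO4 figure is also w/v; neither is stated in the text. The molar masses are standard values, and the function and variable names are illustrative only.

```python
# Convert the PBS recipe from % (w/v) to mM.
# Assumption: anhydrous salts; 0.02% KH2PO4 is taken to be w/v as well.

MOLAR_MASS_G_PER_MOL = {
    "NaCl": 58.44,
    "KCl": 74.55,
    "Na2HPO4": 141.96,
    "KH2PO4": 136.09,
}

PERCENT_WV = {
    "NaCl": 0.8,
    "KCl": 0.02,
    "Na2HPO4": 0.14,
    "KH2PO4": 0.02,
}


def percent_wv_to_millimolar(percent_wv, molar_mass):
    """1% (w/v) is 1 g per 100 ml, i.e. 10 g per litre."""
    grams_per_litre = percent_wv * 10.0
    return grams_per_litre / molar_mass * 1000.0


pbs_millimolar = {
    salt: percent_wv_to_millimolar(PERCENT_WV[salt], MOLAR_MASS_G_PER_MOL[salt])
    for salt in PERCENT_WV
}

for salt, conc in pbs_millimolar.items():
    print(f"{salt}: {conc:.2f} mM")
```

Under these assumptions the recipe works out to roughly 137 mM NaCl, 2.7 mM KCl, 9.9 mM Na2HPO4, and 1.5 mM KH2PO4.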
Immunoprecipitation of ACP Leader-GFP Constructs-After saponin lysis (see above), parasite-containing pellets were washed in phosphate-buffered saline and lysed in 500 μl of immunoprecipitation lysis buffer (0.05 M Tris-HCl, pH 7.5, 1% v/v Triton X-100, 0.6 M KCl) containing protease inhibitors (0.4 mM phenylmethylsulfonyl fluoride, 5 μg/ml pepstatin A, 5 μg/ml amino-n-caproic acid, 5 μg/ml aprotinin, 5 μg/ml leupeptin). After lysis at room temperature for 5 min and then on ice for 30 min, the cell extract was spun down, and the supernatant was collected.
To remove proteins that bind non-specifically to protein A or the Sepharose beads used in the immunoprecipitation, the supernatant was incubated for 1 h in 80 μl of protein A-Sepharose CL-4B beads (Amersham Biosciences) made to a 50% (v/v) slurry with immunoprecipitation wash buffer (0.05 M Tris-HCl, pH 7.5, 1% v/v Triton X-100, 1 mM EDTA, 0.15 M NaCl, 0.25% w/v bovine serum albumin). 5 μl of polyclonal rabbit anti-GFP antibody (CLONTECH, catalog number 8372) was added to 60 μl of 50% Sepharose slurry. This was incubated for 1 h at 4°C, added to the pre-cleared supernatant from the previous step, and incubated for a further 2 h at 4°C. Beads were washed four times in wash buffer and twice in phosphate-buffered saline. Proteins were eluted from the beads by boiling in non-reducing sample buffer (18).
N-terminal Sequencing-Immunoprecipitated proteins were separated using SDS-PAGE (17) and transferred onto polyvinylidene difluoride-plus membranes (Micron Separations Inc.; manufacturer's protocol) in transfer buffer (10% v/v methanol, 10 mM CAPS, pH 11). A band corresponding to the expected size for processed GFP was visualized with a Coomassie Blue stain, excised, and subjected to N-terminal sequencing on a Beckman N-terminal sequencer.
Gene Cloning and Sequencing-Homologues of a stromal-processing peptidase (PfSPP), an α-subunit mitochondrial-processing peptidase (PfMPPA), a β-subunit mitochondrial-processing peptidase (PfMPPB), and a Δ-aminolevulinic acid dehydratase (PfALAD) were identified in the P. falciparum genome data base. The preliminary sequence data for P. falciparum chromosome 14 was obtained from The Institute for Genomic Research web site (www.tigr.org). Sequencing of chromosome 14 was part of the International Malaria Genome Sequencing Project and was supported by awards from the Burroughs Wellcome Fund and the United States Department of Defense. Sequence data for P. falciparum chromosomes 6, 7, 8, and 9 was obtained from The Sanger Institute web site at www.sanger.ac.uk/Projects/P_falciparum. Sequencing of P. falciparum chromosomes 6, 7, 8, and 9 was accomplished as part of the Malaria Genome Project with support by The Wellcome Trust. Reverse transcription-PCR using gene-specific primers was used to construct cDNA clones containing the full protein coding region for each of these genes. GenBank™ accession numbers of the genes are as follows: PfSPP (AF453250), PfMPPA (AY069958), PfMPPB (AY064478), and PfALAD (AY064477). cDNA was produced from the D10 line of P. falciparum parasites. To determine whether any of these sequences has an N-terminal, apicoplast-targeting leader, we used the Prediction of Apicoplast Targeted Sequences (PATS) neural network (modlab.biologie.uni-freiburg.de/gecco/pats) (19).
Sequence Analysis-Although there is probably more than one sort of SPP in plants (20), only one has been cloned (21). This SPP belongs to a family of metalloendopeptidases known as the pitrilysin family (Ref. 21; peptidase family M16). To determine whether any of the pitrilysin proteins in P. falciparum are homologous to the plant SPPs, a phylogenetic tree of the pitrilysin family was constructed. The P. falciparum pitrilysin homologues (PfSPP, PfMPPA, PfMPPB, and falcilysin, an enzyme involved in the hemoglobin degradation pathway (accession number AAF06062)) were translated to the amino acid sequence and aligned to other pitrilysin protein sequences retrieved from the NCBI data base using ClustalW 1.8 and PIMA 1.4 in BCM Search Launcher (dot.imgen.bcm.tmc.edu:9331/multi-align/multi-align.html). Alignments were refined manually in PAUP*4.0b3. Graphical representations of alignments were produced in Boxshade 3.21 (www.ch.embnet.org/software/BOX_form.html). The other pitrilysins used in this analysis were Escherichia coli pitrilysin (P05458), an E. coli putative zinc protease (P31828), a Borrelia burgdorferi putative zinc protease (G70166), two putative zinc proteases from Synechocystis (S74378 and S77157), human insulysin (NP_004960), rat nardilysin (P47245), Arabidopsis thaliana SPP (BAB10480), Pisum sativum SPP (AAA81472), Eimeria bovis sporozoite developmental protein (SDP; P42789), A. thaliana β-MPP (AAF14827), rat β-MPP (Q03346), Schizosaccharomyces pombe β-MPP (T42428), A. thaliana α-MPP (AAK59675), rat α-MPP (P20069), and S. pombe α-MPP (CAA22672). In addition to those available in GenBank™, two sequences representing homologues of PfSPP in Theileria parva and Plasmodium yoelii were included in the analysis. Preliminary sequence data from the P. yoelii genome was obtained from The Institute for Genomic Research web site (www.tigr.org).
This sequencing program is carried on in collaboration with the Naval Medical Research Center and is supported by the United States Department of Defense. Preliminary sequence data from the T. parva genome was obtained from The Institute for Genomic Research web site at www.tigr.org. Neighbor-joining and maximum parsimony trees were constructed with PAUP*4.0b3. Trees excluded regions of the alignment where poor matching occurred, taking into account the common region of pitrilysin proteins, which centers around the active site area, leaving 189 characters for analysis. Bootstrap values from 1000 replicates were obtained.
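For readers unfamiliar with bootstrap support values, the resampling step can be sketched as follows. This is an illustration only, not the PAUP* procedure used here: it shows how each pseudo-replicate is built by drawing alignment columns with replacement, and leaves out the tree building and consensus counting. The three-taxon toy alignment and all names in the code are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Build one bootstrap pseudo-replicate by sampling columns with replacement.

    `alignment` maps taxon name -> aligned sequence (all the same length).
    The same sampled column indices are applied to every taxon, so each
    resampled column is an intact column of the original alignment.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_columns = lengths.pop()
    sampled = [rng.randrange(n_columns) for _ in range(n_columns)]
    return {
        taxon: "".join(seq[i] for i in sampled)
        for taxon, seq in alignment.items()
    }


# Hypothetical toy alignment (10 characters); the real analysis used 189.
toy_alignment = {
    "taxonA": "HFLEHMAFKG",
    "taxonB": "HYLEHLLFKG",
    "taxonC": "HFIEHMVFNG",
}

rng = random.Random(0)  # fixed seed so the sketch is reproducible
replicates = [bootstrap_alignment(toy_alignment, rng) for _ in range(1000)]
```

A tree would then be inferred from each replicate, and the bootstrap value of a grouping is the percentage of replicate trees in which that grouping appears.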
Northern Analysis of PfSPP and PfALAD-Sequencing of the PfSPP and PfALAD cDNAs and comparison with the genome data revealed that the two 5′ exons incorporated into the two cDNAs are identical, suggesting that the N-terminal regions of both proteins are identical. To begin analyzing the nature of this shared N terminus, a Northern analysis was performed. RNA was extracted as described (22) and electrophoretically separated on a 0.9% agarose gel containing 5 mM guanidine thiocyanate followed by capillary transfer to Hybond N+ membrane (Amersham Biosciences) overnight (22). Membranes were prehybridized for 3 h in 7% SDS, 0.5 M sodium phosphate, 1 mM EDTA. Three probes were used for the analysis, one binding to the shared 5′ region (amplified by the primers 5′-CATTATGAAAATTTTGAAATTTTCAG and 5′-CACGGGATTATTCCATCTTC), one specific to PfALAD (5′-GGGAAAGACGTATAAAGAGG and 5′-CGTTATAAACAGCAATAGG), and one specific to PfSPP (5′-GAGCTACAAAATGGGTTAAGCAAC and 5′-CGTCTGTGTCTTTAAACG). Probes were labeled with 25 μCi of [³²P]dCTP following the Prime-a-Gene (Promega) manufacturer's protocol, and membranes were hybridized overnight in the prehybridization buffer at 65°C. Membranes were washed 2-4 times in 0.5-1× SSC (1× SSC = 0.15 M NaCl and 0.015 M sodium citrate), 0.1% SDS at 60-65°C depending on the probe used, and the film was developed at −70°C for up to 2 weeks. Bands were sized using a 0.28-6.58-kb RNA marker (Promega; 3 μl in 30% v/v formamide, 4% formaldehyde, 1× RNA gel-loading buffer (18) in 1× Tris-borate/EDTA (Merck)).

RESULTS
Pulse-chase to Analyze Processing of Apicoplast-targeted GFP-Proteins targeted to plant and green algal plastids contain N-terminal transit peptides that are both necessary and sufficient to target those proteins into the plastid (23). After import into the plant plastid, this transit peptide is cleaved (11). Proteins targeted to the apicoplast of Apicomplexa also contain N-terminal extensions, which differ from plant counterparts in that they consist of two parts. The first part is a signal peptide that directs proteins to the secretory pathway and is probably cleaved upon the protein entering the endoplasmic reticulum (5). The second part of the leader (the so-called transit peptide) is thought to direct proteins through the secretory pathway and into the multi-membraned apicoplast. Western blots of apicoplast-targeted proteins have consistently revealed the presence of two forms of the given apicoplast protein (4,5,24). It is thought that the smaller of these two forms represents the mature, active protein, whereas the other, larger protein represents the pre-processed form, which still bears the transit peptide part of the N-terminal leader. To determine whether this processing occurs and to analyze the kinetics of this process, a pulse-chase experiment was performed on a P. falciparum cell line containing an apicoplast-targeted GFP construct.
An anti-GFP antibody was used to immunoprecipitate the variously processed forms of GFP in the ACP leader-GFP cell line produced by Waller et al. (5). Western blotting was used to confirm that the procedure was successful in immunoprecipitating GFP (results not shown). Cells were pulsed for 45 min with radioactive methionine and cysteine. Immunoprecipitated proteins obtained from cells harvested at hourly intervals after the pulse were separated using SDS-PAGE, and an autoradiograph image was taken (Fig. 1). The two protein bands at around 28 and 32 kDa correspond to the expected sizes of the processed and transit peptide-containing ACP leader-GFP proteins, respectively (Fig. 1). Over the course of the chase, the top band became reduced in relative intensity compared with the bottom band. After the 45-min pulse (at t = 0), some of the bottom band was already present, increasing in relative intensity over the course of the chase. This suggests that the minimum time for targeting proteins to the apicoplast is less than 45 min. It is interesting that the total amount of immunoprecipitated protein at t = 0 in our pulse-chase experiment seems to be less than at the other time points. We interpret this as resulting from some radioactive label being present in the red cell and parasite that, after the 45-min pulse, was not yet incorporated into protein.
N-terminal Sequencing-To confirm the identity of the immunoprecipitated protein and to determine whether and where cleavage of the transit peptide-like domain occurs, we attempted to sequence the N terminus of the processed form of the apicoplast-targeted GFP construct. Immunoprecipitated proteins from the ACP leader-GFP cell line were separated using SDS-PAGE and transferred to the polyvinylidene difluoride membrane. A band of around 28 kDa, corresponding to the expected size for processed GFP, was visualized after Coomassie staining (not shown). N-terminal sequencing of this band from the polyvinylidene difluoride membrane produced a five-residue sequence of LNRKN, corresponding to the first five amino acids of the mature ACP protein. This sequence begins 25 amino acids downstream of the predicted cleavage site for the signal peptide component of the bipartite leader of ACP (Fig. 2). The pre-processed form of GFP was not present in great enough abundance to enable N-terminal sequencing.
Phylogenetic Analysis of PfSPP-In plants, a SPP cleaves plant transit peptides soon after plastid-targeted proteins are imported into the plastid (12,25,26). The stromal-processing peptidases of plants belong to a family of metalloendopeptidases known as the pitrilysins (21), which also include both subunits of mitochondrial-processing peptidases (27) and the insulin-degrading enzyme insulysin (28). To assess the phylogeny of the cloned PfSPP homologue within the pitrilysin family, a phylogenetic tree incorporating a range of pitrilysin proteins was constructed (Fig. 3). Bootstrap values for neighbor joining (93%) and maximum parsimony (85%) trees support the grouping of PfSPP as well as the other apicomplexan SPP homologues with SPPs of plants. Contrary to previous suggestions (29), there is no obvious Synechocystis SPP homologue, with both Synechocystis pitrilysin proteins included in this tree not grouping with the SPPs. Of the other P. falciparum pitrilysins sequenced in this paper, one (PfMPPB) formed a clade with the β-MPP subunits of other organisms, whereas the other (PfMPPA) grouped with the α-MPP subunits of other organisms.
FIG. 1. P. falciparum cells were pulsed with radioactive methionine and cysteine for 45 min. Cells were harvested at 1-h intervals over 4 h. GFP was purified by immunoprecipitation and separated by SDS-PAGE. The band at around 28 kDa represents processed GFP, with the band slightly higher than this thought to represent GFP with the transit peptide still attached (TP-GFP). Directly after the radioactive pulse (t = 0), the relative amount of TP-GFP is high compared with processed GFP. Over the course of the chase, TP-GFP decreases in relative intensity, corresponding to cleavage of the transit peptide region to form processed GFP. After the 45-min pulse, there is a small amount of processed GFP. Assuming the processing occurs in the apicoplast, this suggests that 45 min closely approximates the minimum amount of time it takes for import of cytosolically translated, apicoplast-targeted proteins into the apicoplast.

FIG. 2. ... Fig. 1 are shown in bold. Also shown is the putative signal peptide cleavage site, as predicted by SignalP V2.0.b2 (www.cbs.dtu.dk/services/SignalP-2.0). Cleavage of the signal peptide probably occurs shortly after co-translational import into the endoplasmic reticulum. It is thought that the transit peptide (lowercase) then mediates transportation of the apicoplast-targeted protein through the secretory pathway and across the membranes surrounding the apicoplast. Transit peptide cleavage most likely occurs after the protein has been imported into the apicoplast.

All catalytically active members of the pitrilysin family contain several highly conserved residues (30), including a conserved histidine-X-X-glutamate-histidine (HXXEH) zinc binding domain (28,31). Sequencing of cDNA from the putative SPP homologue reveals that it contains this domain as well as sharing several other highly conserved residues characteristic of the pitrilysin family (Fig. 4). Active pitrilysins contain a conserved glutamate downstream of the HXXEH motif that functions in the active site (32) and a second downstream glutamate that functions in zinc binding (33). Pitrilysins also contain a conserved asparagine around 25 residues downstream of the HXXEH motif and a tyrosine residue around 150 residues downstream of the HXXEH motif (30). An alignment of PfSPP to other pitrilysins (Fig. 4) indicates that PfSPP contains all of these conserved residues and motifs except the conserved tyrosine residue, with PfSPP having a Phe in this position. One other catalytically active pitrilysin, the AXL1 protein involved in yeast budding (34), resembles PfSPP in having a phenylalanine residue in this position, suggesting that either tyrosine or phenylalanine (both aromatic residues) may be sufficient. Richter and Lamppa (12) identify four regions in plant SPPs that share a high level of identity. In an alignment of PfSPP to plant SPPs (not shown), PfSPP matches well to plant SPPs in region I (32% identity, 53% similarity), which encompasses the catalytic site and zinc binding domain. Nothing is known about the function of the other regions, but PfSPP matches less well to plant SPPs in regions II (24% identity, 44% similarity) and IV (19% identity, 38% similarity) and poorly to region III (11% identity, 26% similarity).
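The HXXEH zinc-binding motif described above is simple enough to locate with a pattern search. The sketch below is one hedged way to do so, assuming a one-letter amino acid string as input; the function name and the short test sequence are hypothetical and are not taken from PfSPP or any other real protein.

```python
import re

# H, any two residues, E, H. The lookahead lets overlapping hits be reported.
HXXEH_PATTERN = re.compile(r"(?=(H..EH))")


def find_hxxeh(protein):
    """Return (1-based position, matched residues) for every HXXEH occurrence."""
    return [
        (match.start() + 1, match.group(1))
        for match in HXXEH_PATTERN.finditer(protein.upper())
    ]


# Hypothetical toy sequence, used only to exercise the function.
toy_protein = "MKNLAHFLEHMLFKGT"

print(find_hxxeh(toy_protein))
```

A match on its own does not show that a protein is an active pitrilysin; as noted above, the downstream glutamates, the asparagine, and the aromatic residue also need to be checked in an alignment.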
PfSPP Shares an N-terminal Leader with a Heme Biosynthesis Protein-The phylogenetic tree of pitrilysin proteins suggests PfSPP is a homologue of plant SPPs and, therefore, a likely candidate for cleavage of the transit peptide-like domain of apicoplast-targeted proteins. The transit peptides of apicoplast-targeting leader sequences are required to direct proteins from the secretory pathway to within the apicoplast (5). Thus, one prerequisite of PfSPP functioning in cleaving transit peptides of apicoplast-targeted proteins is that this nuclear-encoded protein is itself targeted to the apicoplast. All apicoplast-targeted proteins identified to date contain a bipartite leader sequence (4-6, 14, 24, 35). In P. falciparum, this leader consists of a hydrophobic signal peptide at the very N terminus followed by a region rich in basic and asparagine residues (19). Sequencing of PfSPP cDNA and comparison with the genome data revealed that it encodes a large transcript consisting of six exons. The N terminus of the predicted protein sequence contains a hydrophobic region followed by a region rich in basic and asparagine residues, consistent with PfSPP being an apicoplast-targeted protein. The PATS neural network, designed specifically to detect apicoplast-targeted proteins in P. falciparum (19), predicts PfSPP to be apicoplast-targeted. Unexpectedly, however, the two 5′ exons of PfSPP (E1 and E2 in Fig. 5), which encode most of the predicted apicoplast leader sequence, are not immediately upstream of the exons comprising the remainder of the SPP gene. Rather, these exons are immediately upstream of another gene, PfALAD, situated 5′ to the remaining SPP exons (Fig. 5). This suggested that the two genes might share the leader sequence exons by alternative splicing (Fig. 5). PfALAD, an ALAD homologue (36), is involved in the heme biosynthesis pathway, and in plants ALAD is located in the plastid (37).
Sequencing of a cDNA clone of PfALAD confirms that it also bears the same two 5′ exons, encoding the same bipartite apicoplast-targeting domain found on PfSPP. The ALAD and SPP cDNA sequences are incomplete at their 5′ and 3′ ends, meaning that the exact sizes of E1, E6, and E10 (Fig. 5) are not known. A recent paper sequenced the entire ALAD transcript and found it to be around 1.6 kb in size (38). Contrary to this, our cDNA sequencing suggests PfALAD is at least 1.8 kb in size, with Northern analysis (Fig. 6) indicating it may be up to 2.6 kb.
To begin analyzing how such unusual genes are transcribed, we performed a Northern blot. We used three separate probes for the blot. A probe specific to PfSPP revealed a transcript of around 6.6 kb and a probe specific to PfALAD revealed a clear band at ~2.6 kb, whereas a probe to the common exons of PfSPP and PfALAD revealed two bands of ~6.6 and 2.6 kb in size (Fig. 6). The two bands identified with the common probe correspond to the sizes of the single bands observed for the two gene-specific probes. This confirms the cDNA analysis showing that the two 5′ exons of the two transcripts are identical and suggests that alternative splicing after these two exons produces the separate PfSPP and PfALAD transcripts. Significantly, the PfSPP band (6.6 kb) is considerably larger in size than that of PfALAD (2.6 kb).

DISCUSSION
FIG. 3. Phylogenetic tree of the pitrilysin family. An unrooted neighbor-joining tree is shown, with bootstrap confidence values (1000 replications) shown for both neighbor-joining (left) and maximum parsimony (right) algorithms. Bootstrap values of less than 50% are indicated by a dash (-). The putative P. falciparum SPP and the corresponding homologues in P. yoelii and T. parva group with the SPPs of plants, with bootstrap values of 93% for neighbor-joining and 85% for maximum parsimony algorithms. The SPP group appears to branch with putative zinc proteases from E. coli and B. burgdorferi and not with any Synechocystis pitrilysin proteins as previously predicted (29). MPPs group together with reasonable bootstrap support, with the α and β subunits forming apparent clades within this group. Falcilysin, a P. falciparum food vacuole protein, appears quite distinct from other members of the group represented on the tree. E. coli pitrilysin clades with several eukaryotic proteins, including human insulysin and E. bovis sporozoite developmental protein (SDP). The tree is based on 189 characters.

FIG. 4. ... (30). P. falciparum SPP contains most of these conserved residues. The exception is the conserved tyrosine (Y), with PfSPP containing a phenylalanine (F) in this position. Of the other P. falciparum pitrilysin proteins sequenced in this paper, PfMPPB contains all the conserved residues. PfMPPA lacks many of the conserved residues, including the histidines within the HXXEH motif. A similar observation has been made in other MPP α subunits, which are thought to play a lesser role in the enzymatic activity of MPPs compared with the β subunit (64).

This paper describes processing of the leader sequence of an apicoplast-targeted protein in P. falciparum and presents a candidate for the enzyme that performs this cleavage reaction. Together, the pulse-chase experiment (Fig. 1) and the N-terminal sequencing (Fig. 2) provide direct evidence that transit peptide regions of apicoplast-targeted proteins are post-translationally removed. The pulse-chase experiment also provides information about the kinetics of the import process. The time taken for the unprocessed band to disappear over the chase corresponds to the time it takes for proteins to be transported from the ER into the apicoplast, where the cleavage presumably takes place. Because a small amount of processed protein was already visible after the 45-min pulse (Fig. 1), we can infer that the minimum time from synthesis to cleavage is less than 45 min. We attempted to determine the minimum time for processing, but shorter pulses did not provide enough labeling to visualize the proteins. Nevertheless, the relative amounts of precursor versus processed protein at the various chase intervals suggest that the minimum time required for apicoplast targeting is probably around 30-40 min. The ACP leader-GFP fusion protein is under the direction of the calmodulin promoter, a promoter that is likely expressed at high levels (39). In our estimate of processing time, we are assuming that the ACP leader-GFP fusion protein is not expressed at levels where it interferes with the apicoplast import machinery, for instance by saturating apicoplast membrane receptors or the cleavage enzyme (PfSPP). Clearly, analyzing the processing of a native protein rather than an introduced one would address this confounding variable. Attempts to purify native ACP using an anti-ACP antibody produced by Waller et al. (5) were unsuccessful.
Assuming this limitation has a minimal effect on the kinetics of apicoplast targeting, how does an apicoplast import time of about 40 min compare with plastid targeting in other organisms? Evidence suggests that less than 10 min is required for plastid targeting in plants and green algae (40), whereas around 40 min is required in Euglena gracilis (41). Like apicoplasts, the plastids of E. gracilis are derived by secondary endosymbiosis, and nuclear-encoded plastid proteins are targeted via the secretory pathway (41). The greater time it takes for the proteins of these secondarily derived plastids to arrive at their destination compared with plant plastid proteins could, thus, be explained by their trafficking via the somewhat more circuitous secretory pathway.
Cheresh et al. (42) recently described similar analyses of an ACP leader-GFP fusion protein, where pulse label and chase also confirmed that the putative transit peptide-bearing precursor was converted to a processed form. Whereas our ACP leader-GFP fusion protein is under direction of the calmodulin promoter (5) and is maximally expressed at trophozoite and schizont stages of the erythrocyte life cycle (43), Cheresh et al. (42) utilized the histidine-rich protein promoter to drive ACP leader-GFP fusion protein, which results in maximal expression early on (ring stage) in the intraerythrocytic life cycle. Somewhat unexpectedly, Cheresh et al. (42) were able to immunoprecipitate significant amounts of pre-processed protein from outside the parasite (i.e. in the supernatant of the saponin lysis). They interpreted this to mean that proteins are targeted to the apicoplast via the parasitophorous vacuole, a compartment that surrounds the outside of the parasite but is still within the host erythrocyte (see review in Ref. 44). We did not attempt to recover ACP leader-GFP fusion protein from the parasitophorous vacuole in our experiments because light microscopic analysis of our parasites has never indicated that GFP is targeted to this compartment. It is possible that the targeting of ACP leader-GFP fusion protein to the parasitophorous vacuole observed by Cheresh et al. (42) is an artifact, with the apicoplast import apparatus not sufficiently well established at ring stage to cope with overexpressed apicoplast-targeted GFP, resulting in the protein exiting the cell via the default secretory pathway. No information about expression of apicoplast-targeted proteins through the erythrocyte stages is yet available, so it is unclear which promoters are suitable when studying apicoplast targeting in transgenic systems. Clearly it will be necessary to track a native apicoplast-targeted protein through the endomembrane system of Plasmodium to resolve the exact route.
Our laboratory is currently investigating the role the parasitophorous vacuole may play in apicoplast targeting.
N-terminal sequencing of the processed reporter identified the cleavage motif QVNF↓LNRKN (where ↓ represents the point of cleavage). This indicates that the transit peptide domain of P. falciparum ACP is 24 amino acids, consistent with Western blot analyses (5). Because the fusion protein was designed to contain several amino acids that were almost certainly part of the mature ACP (5), the "context" of the cleavage site should have been maintained, so we believe the cleavage is faithful to the native processing. Clearly, though, it would be desirable to purify and sequence native proteins. A loosely defined cleavage motif has been identified for plant transit peptides (45). This plant motif of (isoleucine/valine)-X-(alanine/cysteine)↓alanine (where ↓ represents the cleavage site) is clearly different from the P. falciparum ACP site (Fig. 2), suggesting such a motif is not relevant for apicoplast proteins. A more valid comparison may be to organisms whose plastid is more closely related to the apicoplast. Proteins targeted to the plastids of several other phyla require a bipartite leader similar to that of P. falciparum (see review in Ref. 46), and evidence suggests that the apicoplast shares a common secondary endosymbiotic origin with many of these (47,48). The cleavage sites of a few heterokont (brown algal and diatom) and cryptomonad proteins have been identified or predicted (49 -51), with many containing a methionine residue in the -1 position (50,51). The cleavage site of one other putative apicoplast-targeted protein, the fatty acid biosynthesis protein FabI, has been determined in P. falciparum. This protein has a methionine in the -1 position (7), thus conforming to the "methionine at -1" rule of other secondary endosymbionts. However, given that the ACP cleavage site lacks a methionine in this position, not all apicoplast transit peptide cleavage sites conform to this motif.
FIG. 6. Northern blot on P. falciparum RNA using probes specific to SPP and ALAD RNA as well as a probe specific to the shared exons of the SPP and ALAD transcripts. The common probe binds to two transcripts, one corresponding to the size of the message identified with the SPP-specific probe (~6.6 kb) and the other corresponding to the message identified with the ALAD-specific probe (~2.6 kb). The size of these transcripts is somewhat larger than the cDNA that was sequenced, suggesting that parts of the 5′- and/or 3′-untranslated regions remain to be sequenced. The fact that the ALAD transcript is smaller in size than the SPP transcript suggests it may contain a different polyadenylation site to the SPP transcript.
Fig. 3 is, to our knowledge, the first phylogenetic tree of the pitrilysin family. The mitochondrial-processing peptidases form a clade within the pitrilysin family, with the α- and β-MPP subunits forming groups within this clade. Another subfamily seems to be represented by a clade that includes human insulysin, rat nardilysin, Eimeria SDP, and E. coli pitrilysin. The tree includes four pitrilysins from P. falciparum, two of which branch with the α- and β-MPP subunits. The food vacuole protein falcilysin (30) seems quite distinct from other members of the group represented on the tree. The remaining P. falciparum pitrilysin branches with plant SPPs, which together form a distinct SPP clade within the family. Indeed, the presence of a putative apicoplast-targeting leader at its N terminus suggests that PfSPP is targeted to the apicoplast. Analysis of the preliminary sequence data from a range of Apicomplexa indicates that this SPP homologue is widespread throughout the group. These putative apicomplexan SPPs are the first non-plant SPPs that have been identified.
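The cleavage-site comparisons made earlier in this section (the plant motif (isoleucine/valine)-X-(alanine/cysteine)↓alanine and the "methionine at -1" rule) can be illustrated with a minimal sketch. The function names and the "upstream|downstream" notation are assumptions introduced here for illustration and are not part of the original analysis. Only the ACP site QVNF↓LNRKN is taken from the text. The FabI site is not reproduced in this section, so it is omitted rather than guessed.

```python
import re

# Cleavage site written as "upstream|downstream", where "|" marks the cut.
# The ACP site is the only sequence taken from the text.
ACP_SITE = "QVNF|LNRKN"

# Plant motif (I/V)-X-(A/C) occupying positions -3, -2 and -1,
# i.e. the last three residues before the cut.
PLANT_UPSTREAM = re.compile(r"[IV].[AC]$")


def split_site(site):
    """Return the residues before and after the cleavage point."""
    upstream, downstream = site.split("|")
    return upstream, downstream


def matches_plant_motif(site):
    """True if the site fits (I/V)-X-(A/C) before the cut and A at +1."""
    upstream, downstream = split_site(site)
    return bool(PLANT_UPSTREAM.search(upstream)) and downstream.startswith("A")


def has_met_at_minus_one(site):
    """True if the residue immediately before the cut is methionine."""
    upstream, _ = split_site(site)
    return upstream.endswith("M")


if __name__ == "__main__":
    print("ACP fits plant motif:", matches_plant_motif(ACP_SITE))
    print("ACP has Met at -1:", has_met_at_minus_one(ACP_SITE))
```

Under these assumptions the ACP site fails both checks, which is the point made in the text: it has phenylalanine at -1 and leucine at +1, so it fits neither the plant motif nor the methionine rule.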
Although pitrilysin proteins probably have similar catalytic mechanisms (28), the substrates they bind and cleave are very diverse, ranging from plastid and mitochondrial-targeting peptides to insulin and hemoglobin fragments. An alignment of PfSPP with other pitrilysins revealed that PfSPP shares most of the conserved residues around the active site (Fig. 4). Given that homology in the group exists around the catalytic site, it might be assumed that regions where the pitrilysins do not match are important in substrate recognition. PfSPP is quite divergent from plant SPPs in these other regions. This may reflect the difference in P. falciparum transit peptides (rich in asparagine and basic residues) compared with plant transit peptides (rich in hydroxylated and basic residues) that must be recognized by their respective SPPs. Our laboratory is currently attempting to determine whether PfSPP actually functions in recognizing and cleaving apicoplast transit peptides.
The Apicomplexa obtained their plastids by secondary endosymbiosis (for review, see Ref. 46). A feature of secondary endosymbioses is that a large proportion of the genes from the nucleus of the endosymbiont are transferred to the host cell nucleus (for review, see Ref. 52), with the products of genes required for plastid function targeted back to the plastid. An unexpected finding during this study was that PfSPP shares its two 5′ exons (E1 and E2 in Fig. 5), essentially its predicted apicoplast-targeting leader sequence, with a homologue of the heme biosynthesis enzyme ALAD. To our knowledge, this is the first time two plastid-targeted proteins have been shown to share a predicted plastid-targeting leader. One other example of proteins sharing a leader is known. In rice and maize, the iron-sulfur subunit of succinate dehydrogenase and the ribosomal protein S14 (rps14) share a mitochondrial-targeting sequence (53,54). In rice, a non-functional copy of rps14 is present in the mitochondrial genome. Mitochondria, like plastids, arose via an endosymbiotic process, with much of the genetic material of the symbiont transferred to the nucleus of the host (55). It is believed that rice rps14 is an example of how this transfer might occur. The transferred gene inserts into the nuclear genome near a gene encoding a protein that already contains a mitochondrial-targeting sequence and uses this existing targeting sequence to get the gene product back into the organelle (53,55).
Alternative splicing enables PfSPP and PfALAD to share an N-terminal, organelle-targeting leader. This raises the intriguing possibility that one of the two genes was transferred from the endosymbiont nucleus (or perhaps even from the plastid genome) and inserted itself behind an already existing apicoplast leader sequence in the nuclear genome. In the case of PfALAD, this would mean inserting within PfSPP, directly behind the leader, and for PfSPP this would mean inserting behind PfALAD (Fig. 5). With the genomes of several Apicomplexa currently being sequenced, it may be possible in the near future to chart how this situation may have evolved. It is interesting to note that P. yoelii has an ALAD homologue inserted within its SPP, whereas this is not the case for the SPP homologue of the more distantly related apicomplexan T. parva.
Regardless of whether they transferred from the symbiont nucleus, the plastid genome or from another point within the host cell genome, at some point either PfALAD or PfSPP became associated with the leader sequence of the other. But simply inserting within or behind an existing apicoplast-targeted gene is not in itself sufficient for the protein to become targeted to the apicoplast. First, the transferred gene needs to be spliced in a way for it to be functionally expressed as a protein with a leader. How, then, does this splicing occur?
A Northern blot indicates that the PfALAD transcript is smaller than the PfSPP transcript (Fig. 6). This is a somewhat unexpected result, since we might expect the PfALAD transcript to be larger than the PfSPP transcript, given that the PfSPP transcript would have the four PfALAD exons spliced out as an intron, whereas this is not the case for PfALAD. This result is best explained by the PfALAD transcript containing a different polyadenylation site from that of PfSPP (Fig. 5). Numerous examples are known where alternatively spliced proteins have different polyadenylation sites (for review, see Refs. 56 and 57). In many of these examples, the proximal (i.e. first) polyadenylation site can be regulated. This suggests that the proximal polyadenylation site at the PfSPP/PfALAD locus may be subject to regulation, with transcripts containing the proximal poly(A) site spliced to form PfALAD mRNA and transcripts containing the distal poly(A) site spliced to form PfSPP mRNA. Upon prolonged exposure of the Northern blots in Fig. 6, a weak band, larger in size than the SPP band, appears on membranes incubated with both the SPP-specific and the ALAD-specific probes (not shown). It is possible that this band represents transcripts cleaved at the distal polyadenylation site, but where the ALAD-specific exons and introns have not been spliced out. It would be interesting to determine whether such transcripts are functional in translation, whether they represent a primary transcript (i.e. heterogeneous nuclear RNA), or indeed whether they are non-functional transcripts formed from leaky regulation at the proximal polyadenylation site.
ALAD catalyzes the second step of the heme biosynthesis pathway, forming porphobilinogen from two molecules of Δ-aminolevulinic acid. Recent experiments have suggested that the majority of ALAD activity in Plasmodium spp. is derived from enzymes imported from the host erythrocyte (58,59). The Northern blot results presented here suggest that PfALAD is actively transcribed during the intraerythrocytic stage of the life cycle of P. falciparum. A recent paper, furthermore, found that PfALAD rescues an E. coli ALAD null mutant (38), suggesting the enzyme is functional in heme biosynthesis. This raises interesting questions about the role PfALAD may play in the intraerythrocytic cycle. The presence of an apicoplast-targeting leader strongly suggests that PfALAD localizes to the apicoplast. Plant ALAD proteins are similarly targeted to the plastid (37), and PfALAD appears to have a Mg²⁺-binding domain unique to plant ALADs (36). In plants, Δ-aminolevulinic acid is formed via the so-called 5-carbon pathway (for review, see Ref. 60). It is curious, however, that Δ-aminolevulinic acid in Plasmodium spp. is formed via the so-called 4-carbon pathway, a single-step reaction catalyzed by Δ-aminolevulinic acid synthase (61,62), an enzyme not found in plants (for review, see Ref. 60). In other organisms, Δ-aminolevulinic acid synthase is located in the mitochondrion (63). It appears then that enzymes that potentially function in de novo heme synthesis in Plasmodium spp. occur in up to three separate compartments. How the parasite coordinates heme biosynthesis and how such a composite and seemingly complex system evolved is an area that warrants further study.