Unexpected Protein Families Including Cell Defense Components Feature in the N-Myristoylome of a Higher Eukaryote*

N-Myristoylation is an irreversible modification that affects the membrane binding properties of crucial cytoplasmic proteins from signal transduction cascades. We characterized the two putative N-myristoyltransferases of Arabidopsis thaliana as a means of investigating the entire N-myristoylation proteome (N-myristoylome) in a higher eukaryote. AtNMT1 compensated for the nmt1 defect in yeast, whereas AtNMT2 and chimeras of the two genes did not. Only AtNMT1 modified known N-myristoylated proteins in vitro. AtNMT1 is therefore responsible for the A. thaliana N-myristoylome, whereas AtNMT2 does not seem to have usual myristoylation activity. We began with the whole set of N-myristoylated G proteins in the A. thaliana proteome. We then used a reiterative approach, based on the in vitro N-myristoylation of more than 60 different polypeptides, to determine the substrate specificity of AtNMT1. We found that the positive charge on residue 7 of the substrate was particularly important in substrate recognition. The A. thaliana N-myristoylome consists of 437 proteins, accounting for 1.7% of the complete proteome. We demonstrated the N-myristoylation of several unexpected protein families, including innate immunity proteins, thioredoxins, components of the protein degradation pathway, transcription factors, and a crucial regulatory enzyme of glycolysis. The role of N-myristoylation is discussed in each case; in particular, this process may underlie the “guard” hypothesis of innate immunity.

N-terminal methionine excision (NME) is a conserved, essential modification that affects approximately two thirds of the proteins of all proteomes (1,2). We use the higher plant Arabidopsis thaliana as a model system for in-depth studies of NME. In A. thaliana, 56% of the proteins of the cytoplasmic proteome are predicted to undergo NME. The molecular basis of NME was recently elucidated in A. thaliana. This pathway involves (i) organellar peptide deformylases (2)(3)(4), (ii) cytoplasmic and organellar methionine aminopeptidases (2,4), and (iii) a number of cytoplasmic N-acylases (5)(6)(7). We need to compare NME substrate characterization data with functional genomics data if we are to understand this process in all the compartments in which it occurs. For example, NME was recently shown to be essential in the plastid, where it stabilizes a small subset of key proteins (8).
In the cytoplasm, NME leads to a small number of posttranslational N-acylations (9,10). For example, protein N-terminal myristoylation (MYR) results in the irreversible addition of a saturated C:14 fatty acid to the N-terminal glycines of some proteins. Myristoyl-CoA:protein N-myristoyltransferase (NMT; EC 2.3.1.97) catalyzes the transfer of myristate to a number of eukaryotic and viral proteins (for a review on MYR, see Ref. 11). NMT is essential for viability in protozoans and fungi such as Saccharomyces cerevisiae (12)(13)(14). It has been suggested that MYR involves important protein components of signal transduction cascades and apoptotic proteins (15). The requirement for the unmasking, by NME, of glycine as the N-terminal residue for the initiation of MYR may account for NME being essential in the cytoplasm of lower and higher eukaryotes. Our studies aim to identify the molecular basis of the requirement for NME in higher eukaryotes. To achieve this goal, we need to characterize all the myristoylated proteins (MYR proteome or N-myristoylome), to identify the most sensitive, primary targets. Recent efforts at N-myristoylome prediction resulted in fine definition of the substrate sequence motif, by reanalysis of comprehensive kinetic and structural data (16). This led to the development of a computer program, Predictor, which predicts N-myristoylation by means of two types of analysis, for fungi or higher eukaryotes (17). However, regardless of the organism studied, approximately half of all bioinformatic predictions of potential substrates fall into the so-called "twilight zone." This precludes MYR prediction for these candidates. Therefore, kinetic measurements on N-terminal peptides or full-length proteins are required to obtain definitive data on a given NMT in higher eukaryotes. We recently developed a rapid and reliable MYR diagnostic to achieve this (7).
A straightforward approach to studies of the N-myristoylome is possible, as only approximately 0.5% of cytoplasmic proteins are thought to undergo MYR (i.e. approximately 30-200 proteins in a eukaryotic proteome; Ref. 17). In this study, we aimed to determine the pattern of N-myristoylated proteins in a higher eukaryote proteome (that of A. thaliana) using experimental data obtained directly from the complete proteome. This method was first applied to one class of important signal proteins, GTP-binding proteins (G proteins; GP), and was then extended to several other natural protein substrates for which it was possible to make reliable N-myristoylome predictions. We identified a number of unexpected myristoylated proteins. This study was the first direct analysis of the N-myristoylome of a higher eukaryote.

Materials
All chemicals were purchased from Sigma. Stock Myr-CoA solutions (200 μM) were dissolved in 1% Triton X-100 in 10 mM sodium acetate buffer, pH 5.6. Oligonucleotides were synthesized at MWG Biotech SA (Courtaboeuf, France), and peptides were synthesized at Eurogentec (Seraing, Belgium). In each case, 2-5 mg of peptide was synthesized, and peptides (85% pure) were dissolved in H2O to give 4 mM solutions. We defined the initiator methionine in polypeptide sequences as Met-1, although this residue is removed by NME. Thus, in peptide sequences, the N-terminal residue, usually a Gly, was numbered as amino acid 2. Nucleotide sequences were determined by the Big-Dye Terminator V3 method, with a 16-capillary ABI Prism 3100 Genetic Analyzer (Applied Biosystems).

Molecular Biology and General Genetic Methods
We obtained pBB131, which encodes Saccharomyces cerevisiae NMT (ScNMT1), from Jeffrey I. Gordon (Washington University, St. Louis, MO). ScNMT1 was excised from this plasmid and inserted into pET16b as previously described (7). The cDNAs for both NMTs of A. thaliana (AtNMT1 and AtNMT2) were cloned by rapid amplification of cDNA ends from a cDNA library prepared from 2-week-old A. thaliana seedlings as previously described (4). The nucleotide sequences of AtNMT1 and AtNMT2 are available from GenBank, under the accession numbers AF250956 and AF250957, respectively. NMT open reading frames (ORFs) were recloned in pQE31 (Qiagen), to generate N-terminal fusions with a His6 tag, and were then inserted into pET16b. NMT fusions were also inserted between the EcoRI and XhoI restriction sites of the yeast URA3 galactose-inducible plasmid pYES (Invitrogen). The Saccharomyces cerevisiae strain YB332 (MATα ura3 his3Δ200 ade2 lys2-801 leu2) and its thermosensitive derivative YB336 (MATα nmt1-181 ura3 his3Δ200 ade2 lys2-801 leu2) (18) were provided by J. I. Gordon.
YP medium (1% yeast extract, 2% peptone) supplemented with 2% dextrose was used as the standard yeast liquid medium. Yeasts were transformed as described elsewhere (19) and cultured at 24°C. Transformed cells were selected on SD medium (1.8% agar, 1.43 g/liter yeast nitrogen base, 0.5% ammonium sulfate, Clontech yeast dropout amino acid supplement without uracil, 2% dextrose). We checked for effective complementation by culturing cells for 4 days on SD plates supplemented with 2% raffinose and then restreaking them on SD and YP plates supplemented with 2% galactose and incubating them at 24 and 35°C.

Protein Production, Purification, and Analysis
Protein production in Escherichia coli was achieved by transforming BL21-pRares (Rosetta; Novagen) cells with a given plasmid construct. Cells were grown at 22°C for 6 h in 2× TY medium supplemented with 50 μg/ml ampicillin and 34 μg/ml chloramphenicol, to reach an A600 of 0.9. They were then induced with 0.4 mM isopropyl-1-thio-β-D-galactopyranoside and incubated for another 12 h, with shaking, for AtNMT1 and ScNMT1 (7). For AtNMT2, we used the same conditions, except that the growth medium was supplemented with 3% ethanol and the induction time was 5 h. In all cases, cells were harvested by centrifugation and resuspended in 10-20 ml of buffer A, which consisted of 20 mM sodium phosphate buffer, pH 7.3, 500 mM NaCl, plus 10 mM 2-mercaptoethanol. Samples were subjected to sonication, and cell debris was removed by centrifugation. The supernatant (5-15 ml) was applied to a Hi-Trap chelating HP (0.7 × 2.5 cm; AP-Biotech) nickel affinity column equilibrated in buffer A. Elution was carried out at a flow rate of 0.5 ml/min, in two steps, with buffer B (buffer A plus 0.5 M imidazole) followed by a linear 0.35 mM/min imidazole gradient. The pool of purified protein (5 ml) was first dialyzed against buffer A for 12 h and then against buffer A plus 55% glycerol for 24 h before storage at −20°C. Matrix-assisted laser desorption ionization time-of-flight analysis (MS Facility, Institut National de la Recherche Agronomique, Jouy, France) of purified AtNMT1 and AtNMT2 gave molecular masses of 51.3 ± 0.5 and 50.8 ± 0.5 kDa, respectively. These values are consistent with the theoretical values of 50.9 and 50.7 kDa, indicating that each purified enzyme corresponded to the full-length product of the corresponding ORF.
Proteins were overproduced in yeast cultured in YP medium containing raffinose (2%) until an A600 of 0.9 was reached and then induced with 2% galactose. The cells were collected by centrifugation, and the equivalent of 5 A600 was extracted with glass beads (425-600 μm; Sigma) in a buffer consisting of 50 mM Tris, pH 8.0, 10% glycerol, 5 mM MgCl2, 5 mM EDTA, 1 mM dithiothreitol, and anti-protease mixture (Roche).
Protein concentration was determined with the Bio-Rad protein assay kit. Bovine serum albumin was used as the protein standard. Polyacrylamide gel electrophoresis in SDS denaturing gels (10% polyacrylamide gels; 0.75 mm thickness) was performed using the Mini-PROTEAN III system (Bio-Rad). Gels were stained with Bio-Safe Coomassie Stain (Bio-Rad) and blotted onto membranes, which were probed with anti-His antibodies (AP Biotech), as previously described (4).

NMT Assays
NMT activity was assayed at 30°C by coupling the reaction to pyruvate dehydrogenase activity and continuously monitoring the absorbance of NADH at 340 nm (7). The standard assay was performed in a final volume of 200 μl, in 1-cm optical-path quartz cuvettes. Changes in absorbance over time were followed using an Ultrospec-4000 spectrophotometer (AP Biotech), equipped with a temperature control unit and a 6-position Peltier heated cell changer. The reaction mixture contained 50 mM Tris, pH 8.0, 1 mM MgCl2, 0.193 mM EGTA, 0.32 mM dithiothreitol, 0.2 mM thiamine pyrophosphate, 2 mM pyruvate, 0.1 mg/ml bovine serum albumin, 0.1% Triton X-100, 5-1000 μM peptide, 2.5 mM NAD+, and 0.125 unit/ml porcine heart pyruvate dehydrogenase (2 units/mg). A radioactive discontinuous assay (20) was also used to confirm some results (detailed protocol available in Ref. 7).
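The conversion from a recorded absorbance slope to a reaction rate in such a coupled assay can be sketched as follows. This is a minimal illustration, not the protocol of Ref. 7: it assumes the commonly used molar absorption coefficient of NADH at 340 nm (6220 M⁻¹ cm⁻¹), a 1-cm path length as stated above, and one NADH formed per CoA released by NMT. The function names and the numerical inputs are hypothetical.

```python
# Assumed constant: molar absorption coefficient of NADH at 340 nm.
EPSILON_NADH_340 = 6220.0  # M^-1 cm^-1


def rate_from_slope(delta_a340_per_min, path_cm=1.0):
    """Convert an absorbance slope (A340 units per minute) into a rate in
    micromolar NADH formed per minute, using the Beer-Lambert law."""
    molar_per_min = delta_a340_per_min / (EPSILON_NADH_340 * path_cm)
    return molar_per_min * 1e6  # M/min -> uM/min


def turnover_per_second(rate_um_per_min, enzyme_um):
    """Apparent turnover (s^-1): rate divided by enzyme concentration,
    both expressed in micromolar, converted from minutes to seconds."""
    return rate_um_per_min / enzyme_um / 60.0


if __name__ == "__main__":
    slope = 0.0311      # hypothetical slope, A340 units per minute
    enzyme = 0.05       # hypothetical NMT concentration, micromolar
    rate = rate_from_slope(slope)
    print("rate (uM/min):", round(rate, 3))
    print("turnover (s^-1):", round(turnover_per_second(rate, enzyme), 3))
```

Because the coupling assumes a 1:1 stoichiometry between CoA release and NADH formation, the pyruvate dehydrogenase step must not be rate-limiting for the conversion to hold.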
The original A. thaliana proteome data base (23) was progressively purged of redundant and incorrect entries with the improved annotation developed by the Arabidopsis Genome Initiative (last update February 2003). This was achieved by cross-comparison of the data available at The Arabidopsis Information Resource (www.arabidopsis.org) and Munich Information Center for Protein Sequences (mips.gsf.de/proj/thal/db/index.html). We extracted ORFs with given patterns from the data base, with the Pattern Matching program (www.arabidopsis.org/cgi-bin/patmatch/nph-patmatch.pl; see Ref. 24). The pattern syntax used was: "<M" constrains the pattern to the N-terminal residue (i.e. an initiator M), "[^Y]" means that residue Y (or a subset list) is excluded and "[Z]" means that residue Z (or a subset list) is included. X means any normal amino acid.
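The pattern syntax described above maps closely onto ordinary regular expressions. The sketch below is an illustration written for this text, not the Pattern Matching program itself: the function name pattern_to_regex and the test peptides are hypothetical, and only the four syntax elements listed above are handled.

```python
import re

# The 20 standard amino acids, used to expand the wildcard "X".
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def pattern_to_regex(pattern):
    """Translate a pattern written with '<', '[...]', '[^...]' and 'X'
    into a compiled Python regular expression."""
    out = []
    in_class = False
    for i, ch in enumerate(pattern):
        if ch == "<" and i == 0:
            out.append("^")            # anchor at the N terminus
        elif ch == "[":
            in_class = True
            out.append(ch)
        elif ch == "]":
            in_class = False
            out.append(ch)
        elif ch == "X" and not in_class:
            out.append("[" + AMINO_ACIDS + "]")  # any normal amino acid
        else:
            out.append(ch)             # literal residue or class member
    return re.compile("".join(out))


if __name__ == "__main__":
    rx = pattern_to_regex("<MG[^DEFKRVWY]XXX[^DE]")
    # Hypothetical N-terminal sequences, for illustration only.
    for seq in ("MGCSVSKK", "MGVSVSKK", "MGCSVSDK", "AMGCSVSK"):
        print(seq, bool(rx.search(seq)))
```

Excluded and included residue lists already use the same bracket notation as regular expressions, so only the anchor and the wildcard need translation.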
The Predictor scores corresponding to each ORF of the B4 library were calculated at mendel.imp.univie.ac.at/myristate/. To obtain the Arabidopsis Predictor score, the "Higher Eukaryote" option was used. Only the profile score taking into account essentially the physical properties of the peptide was used, and we did not consider the special weight parameters (terms T9-T11) originally used to add penalties deduced from various parameters, including kinetic measurements (17). Instead, we included a special T12 enthalpic parameter derived from our kinetic measurements and analysis (Tables I-III). Corrections to create the Arabidopsis score were as follows.
Enzymatic and Structural Data-The kinetic parameters (kcat, Km, and kcat/Km values) were derived from iterative non-linear least square fits of the Michaelis-Menten equation to the experimental data (25). Confidence limits for the fitted values were determined by 100 Monte Carlo iterations, using the experimental standard deviations on individual measurements.
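The fitting and confidence-limit procedure can be sketched in a few lines. This is an illustration under stated assumptions, not the software of Ref. 25: the optimizer (a log-spaced scan over Km with Vmax solved in closed form, then golden-section refinement) is our own choice, and the function names, concentrations, rates, and standard deviations are hypothetical. With a known enzyme concentration, kcat is Vmax divided by that concentration, and kcat/Km follows directly.

```python
import math
import random


def mm_rate(s, vmax, km):
    """Michaelis-Menten rate v = Vmax * S / (Km + S)."""
    return vmax * s / (km + s)


def _profile(km, s_vals, v_vals):
    """For a fixed Km, return the least-squares Vmax and the residual sum of squares."""
    f = [s / (km + s) for s in s_vals]
    vmax = sum(v * fi for v, fi in zip(v_vals, f)) / sum(fi * fi for fi in f)
    sse = sum((v - vmax * fi) ** 2 for v, fi in zip(v_vals, f))
    return vmax, sse


def fit_mm(s_vals, v_vals, km_lo=1e-3, km_hi=1e5, n_grid=200, n_refine=60):
    """Non-linear least-squares fit of (Vmax, Km): coarse log-grid scan over Km,
    then golden-section refinement around the best grid point."""
    lo, hi = math.log(km_lo), math.log(km_hi)
    step = (hi - lo) / (n_grid - 1)
    grid = [lo + i * step for i in range(n_grid)]
    best = min(range(n_grid),
               key=lambda i: _profile(math.exp(grid[i]), s_vals, v_vals)[1])
    a = grid[max(best - 1, 0)]
    b = grid[min(best + 1, n_grid - 1)]
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(n_refine):
        c = b - phi * (b - a)
        d = a + phi * (b - a)
        if _profile(math.exp(c), s_vals, v_vals)[1] < _profile(math.exp(d), s_vals, v_vals)[1]:
            b = d   # minimum lies in [a, d]
        else:
            a = c   # minimum lies in [c, b]
    km = math.exp(0.5 * (a + b))
    vmax, _ = _profile(km, s_vals, v_vals)
    return vmax, km


def monte_carlo_limits(s_vals, v_vals, v_sd, n_iter=100, seed=0):
    """Approximate 95% limits: refit n_iter data sets resampled with Gaussian
    noise drawn from the standard deviation of each measurement."""
    rng = random.Random(seed)
    fits = []
    for _ in range(n_iter):
        noisy = [rng.gauss(v, sd) for v, sd in zip(v_vals, v_sd)]
        fits.append(fit_mm(s_vals, noisy))

    def limits(values):
        ordered = sorted(values)
        return ordered[int(0.025 * n_iter)], ordered[min(n_iter - 1, int(0.975 * n_iter))]

    return {"vmax": limits([f[0] for f in fits]), "km": limits([f[1] for f in fits])}


if __name__ == "__main__":
    s = [5, 10, 25, 50, 100, 250, 500, 1000]        # hypothetical peptide concentrations (uM)
    v = [mm_rate(x, 2.0, 50.0) for x in s]          # hypothetical noise-free rates
    print(fit_mm(s, v))
    print(monte_carlo_limits(s, v, [0.02] * len(s)))
```

Solving Vmax in closed form for each trial Km reduces the problem to a one-dimensional search, which keeps the sketch dependency-free; a general-purpose non-linear least-squares routine would serve equally well.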
The amino acid sequence of AtNMT1 was aligned with that of ScNMT1 with InsightII software (Accelrys). The three-dimensional structure of ScNMT1 bound to a peptide and a myristyl-CoA analog (26) was used to construct several three-dimensional models of AtNMT1, with the homology modeler module. The lowest energy structure of AtNMT1 was further minimized with the CHARMM force field and superimposed on the structure of the peptide bound to ScNMT1.

AtNMT1 Complements an S. cerevisiae nmt Conditional Mutation, whereas AtNMT2 Does Not-Thanks to the systematic genome projects underway and the recent release of new data, many putative NMT sequences can now be identified from cDNAs or from genomic fragments. In particular, a number of new plant NMT sequences have been identified. We carried out a phylogenetic analysis of all available full-length putative NMT amino acid sequences (Fig. 1). This analysis indicated that plant NMTs are located between fungal and animal NMTs, on the basis of sequence similarity. Two ORFs encoding two different NMTs were identified in A. thaliana. AtNMT2 was located at the point of divergence from green algae, whereas AtNMT1 displayed strong sequence similarity to NMTs from other dicotyledonous plants. The cDNAs corresponding to these two proteins were cloned and used to produce AtNMT1 and AtNMT2 in E. coli, and these proteins were then purified (Fig. 2, panel A). We noted that a DNA sequencing error resulting in the elimination of a single C from the sequence and the absence of known cDNAs had resulted in incorrect annotation, intron-exon structure, and deduced amino acid sequence (At2g44170) of AtNMT2 in the various genome libraries. AtNMT1 and AtNMT2 encode proteins composed of 434 and 430 amino acids, respectively. They are 80% identical to each other and display 45 and 35% identity, respectively, with ScNMT1.
We then examined whether AtNMT1 and AtNMT2 displayed NMT activity. We first investigated whether the corresponding ORFs could compensate for an NMT defect in the yeast S. cerevisiae. A point mutation in the nmt gene in the S. cerevisiae mutant nmt1-181 of strain YB336 induces a thermosensitive phenotype, with growth at 24°C and death at 35°C. Under non-permissive conditions, the growth defect can be cured by the expression of a gene encoding a functional NMT (27,28). The cDNAs for AtNMT1 and AtNMT2 were inserted into yeast-inducible shuttle plasmids. Only the AtNMT1 construct was able to complement the growth defect of strain nmt1-181 (Fig. 2, panel B). As a control, we checked that both NMTs were actually produced in the transformed strains. Western blotting with anti-His antibodies showed this to be the case (data not shown). The known three-dimensional structure of NMTs can be roughly approximated to two separate domains: the N-terminal domain, which is involved in lipid binding and the catalytic mechanism; and a second domain involved in peptide recognition. We investigated why AtNMT2 was unable to complement the nmt yeast strain, despite displaying 80% sequence identity to AtNMT1, by constructing six protein chimeras of AtNMT1 and AtNMT2, in which various changes were incorporated into the specific motifs of NMTs. Each construct was tested in the complementation assay (Fig. 2, panel C), and the corresponding protein products were purified (Fig. 2, panel A). None of the constructs, except AtNMT1 and an AtNMT1 variant with a single serine substitution of residue Phe-183 (FS), was able to complement the yeast defect (Fig. 2, panel B).
The purified AtNMT proteins and variants were assayed in vitro with two substrates: the calcium sensor AtSOS3 (29), an in vivo myristoylated protein, and its non-myristoylatable derivative AtSOS3-G2A (7). Only AtNMT1 and, to a lesser extent, the single F183S variant (FS) displayed significant NMT activity, the catalytic efficiency of AtNMT2 and the various chimeras being at least 4 orders of magnitude lower than that of AtNMT1 (8900 ± 1300 M⁻¹ s⁻¹). In addition, neither AtNMT1 nor AtNMT2 was able to use AtSOS3-G2A as a substrate. Together, the genetic and enzymatic data indicate that AtNMT1 supports normal levels of NMT activity in A. thaliana and is kinetically responsible for the N-myristoylome. In contrast, AtNMT2 appears to display no such activity, because of complex changes affecting both domains of the enzyme.
NMT1 showed that AtNMT1 and HsNMT1 differed considerably in terms of specificity (6), consistent with the phylogenetic analysis (Fig. 1). These data also suggested that AtNMT1 may differ from ScNMT1 and HsNMT1 in certain other specific features that require clarification. We studied the small GP superfamily, checking for effective myristoylation of these proteins. We chose this superfamily for the following reasons: (i) genome annotation is simple because of known similarities in sequence characteristics, for GTP binding, for instance (32)(33)(34), and (ii) the members of one of its five distinct families (RAS, RAB, RHO, ARF, and RAN), the ADP-ribosylation factor (ARF) family, are almost systematically N-myristoylated (35). Consistent with the most recent report on this topic (36), we were able to identify 94 genes encoding GP in A. thaliana, although we did also detect an additional ARF (ARFE; At2g24765) (see Fig. s1, which is supplemental material available in the on-line version of this article). As a glycine residue is essential to NMT function, we selected all the GP with G as the second residue.
The proteins selected comprised 13 ARFs, 4 ARLs, and 7 RABs (Fig. s1). The α subunit of the heterotrimeric GP (GPA1), which belongs to the GP family and also has a G in position 2, was added to the GP set under study. In vitro MYR experiments were performed with the various derived N-terminal peptides (Table II). Consistent with our expectations, all the ARF proteins were N-myristoylated, whereas the ARLs and the RABs were not. The only exception was ARA6, a RAB GP recently described as N-myristoylated in vivo (37) and in vitro (7). Finally, GPA1 was also found to be myristoylated.

Study of the Complete Set of Small G Proteins N-Myristoylated by AtNMT1 Demonstrates the Unreliability of Currently Available Prediction Tools-Determination
Investigation of the specificity of a large number of in vitro synthesized octapeptides in S. cerevisiae led to determination of the PS00008 PROSITE pattern (MG[^EDRKHPFYW]XX[ACGNST][^P]) for MYR. This motif could not be used to predict N-myristoylation in A. thaliana, as (i) it did not predict the N-myristoylation of ARFB1b and ARF1c, which have an R at position 6, and (ii) it wrongly predicted the myristoylation of nine non-myristoylatable proteins: ARLA1a, ARLA1c, ARL1d, RABC1, RABC2a, RABC2b, and the three RABA1s. Thus, as reported in other studies, this pattern led to incorrect prediction of the N-myristoylome (16).
For the A. thaliana N-myristoylome, studied by considering the subset of GP, the "Higher Eukaryote" and "Fungi" options of the Predictor software (17) were used to calculate the associated scores. Catalytic efficiency values were plotted as a function of either of the two associated Predictor scores. As expected, the Fungi score was less reliable than the Higher Eukaryote score. However, even with the Higher Eukaryote score, we obtained 50% false predictions (Fig. 3, panel A). This result was not unexpected, as (i) the Higher Eukaryote score was optimized with kinetic parameters derived from the study of HsNMT1 and (ii) preliminary biochemical characterization of AtNMT1 showed clear differences in substrate specificity between HsNMT1 and AtNMT1 (6). The precise determination of myristoylation in a representative set of protein examples is also recognized to be an effective approach to studies of the specificity of new NMTs, as the substrate binding pocket is less conserved than that of Myr-CoA (16). We therefore entirely agree with Maurer-Stroh et al. (17) that a large set of experimentally verified MYR proteins from a given taxon is required to improve prediction.
Reducing the N-Myristoylome to a Set of 724 Proteins in A. thaliana-MYR measurements must be performed on a large number of putative substrates to obtain definitive data on the N-myristoylomes of higher eukaryotes such as A. thaliana. As the only feature common to all studied NMTs is the clear requirement for the N-terminal Gly (Gly-2) unmasked by NME, we extracted full-length protein sequences displaying this feature from the products of 25,498 ORFs. This feature was present in 2477 ORF products (9% of the proteome). This 2477-ORF library corresponded to the first generation (B1) library.
FIG. 2. Complementation of a yeast nmt-conditional mutant by various NMT derivatives. Nomenclature is detailed in panel C. Panel A, 100 ng of purified AtNMT1, AtNMT2, and various chimeras (see panel C) were analyzed by polyacrylamide gel electrophoresis in denaturing gels containing SDS. A Coomassie-stained gel is shown. The arrow indicates the location of a contaminant of higher molecular weight that was present in some preparations. Lane M corresponds to the molecular size markers (values indicated on the left). Panel B, YB336 yeast cells containing the indicated plasmid were cultured at 24°C. Cells were re-streaked onto SD plates and cultured at 24 or 35°C for 4 days. I, induced with galactose; NI, non-induced or repressed (sugar was raffinose or glucose). Panel C, schematic diagram of the various cDNA fusions between AtNMT1 and AtNMT2. FS is a single variant of AtNMT1 with an F183S substitution, a crucial residue of the active site. SF is the corresponding variant of AtNMT2 with a single S181F substitution. 1-2 means that the corresponding NMT variant has the N terminus of AtNMT1 and the C terminus of AtNMT2. 1-2-1 is a more complex chimera with only a central domain of AtNMT2 and the rest corresponding to the sequence of AtNMT1. Exact fusion limit of each chimera is available upon request. Sc1 is ScNMT1.
According to previous studies on fungal NMTs, the probability of a protein undergoing N-myristoylation depends strongly on two positions: residues 3 and 6. We assessed the relative contributions made by each of these positions to MYR. In fungi, bulky residues such as Asp, Glu, Phe, His, Lys, Pro, Arg, Trp, and Tyr are excluded from position 3. Other studies have shown that a similar set of excluded amino acid residues direct N-acetylation rather than MYR in vivo (9).
We investigated whether peptides with such residues at position 3 could serve as substrates of AtNMT1, by studying the effects on MYR potency of nine other peptides derived from the predicted N termini of 40 ORF products from the B1 library (At3g54840 to At1g80290 in Table III). Our results showed that the observed amino acid exclusion pattern at position 3 in yeast could not be used for substrate prediction because the presence of His in this position did not prevent MYR, whereas the presence of Val did. We therefore used a modified pattern, <MG[^DEFKRVWY], and restricted our 2477-ORF library to 1473 candidates, constituting our second generation (B2) library.
In the GP study (Table I), we found that all AtNMT1 substrates had a positively charged residue at position 7, with the exception of ARA6, which has a Leu. The impact of the nature of residue 7 was investigated by creating two variants of SOS3 with modified side chains at position 7 (K7D and K7L). For AtNMT1, we showed that replacing Lys-7 by a neutral residue decreased catalytic efficiency by an order of magnitude but did not abolish the reaction. In contrast, replacement of the Lys residue in position 7 of SOS3 by the negatively charged Asp prevented AtNMT1 from modifying this molecule (Table IV). This suggested that we should exclude from our library all proteins with negatively charged residues (Asp or Glu) at position 7. Based on the pattern <MG[^DEFKRVWY]XXX[^DE], our third generation (B3) library consisted of 1334 proteins.
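The successive restrictions leading from the B1 to the B3 library can be expressed as a cascade of increasingly specific filters. The sketch below is illustrative only: it restates the three patterns given in the text as Python regular expressions (with "." standing for X), and the function name and test sequences are hypothetical rather than taken from the A. thaliana proteome.

```python
import re

# Patterns from the text, ordered from least to most restrictive.
FILTERS = [
    ("B1", re.compile(r"^MG")),                      # Gly at position 2
    ("B2", re.compile(r"^MG[^DEFKRVWY]")),           # exclusions at position 3
    ("B3", re.compile(r"^MG[^DEFKRVWY].{3}[^DE]")),  # no Asp/Glu at position 7
]


def assign_library(seq):
    """Return the name of the most restrictive library whose pattern the
    N-terminal sequence satisfies, or None if it lacks the MG start."""
    best = None
    for name, rx in FILTERS:
        if rx.match(seq):
            best = name
        else:
            break  # later filters are strictly more restrictive
    return best


if __name__ == "__main__":
    # Hypothetical octapeptides, for illustration only.
    for seq in ("MGCSVSKK", "MGCSVSDK", "MGVSVSKK", "MASSVSKK"):
        print(seq, assign_library(seq))
```

Because each pattern extends the previous one, a sequence that fails one filter cannot pass a later one, which is why the loop stops at the first failure.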
We then used the B3 library to study position 6. In the set of myristoylated GP (Table I), all the members of which were present in the B3 library, we found that, as in S. cerevisiae, amino acids with small side chains (Ala, Gly, Ser, Thr) were generally present but that a residue with a longer side chain, Arg, was occasionally found. We further investigated the effect of position 6 on catalysis by selecting additional putative protein substrates from the B3 library with different amino acids in this position. We chose proteins with a positively charged residue at position 7, to increase the probability of MYR by AtNMT1 (At1g80290 to At1g64625 in Table III). We found that, with the exception of Arg and Phe, only small side chain residues were accepted at position 6 (Table III). In particular, Val, which has a small side chain, was tolerated at this position, in contrast to the Prosite pattern.
Table I); in addition, 168 have a Cys at position 3 or 4. Moreover, 112 have an Asn at position 3 and 284 have a Ser at position 6. Finally, Pro, an amino acid that is thought to disrupt the extended structure adopted by the peptide on binding to the enzyme, was observed at position 8 in 52 instances.
Medium Scale Probing of the A. thaliana B4 Library by Selected Subset Analysis of MYR Candidates Reveals New Important Criteria-As only a limited proportion of the proteins in the B4 library would be expected to be true MYR proteins, we decided to select a number of ORF products to refine our analysis further. MYR candidates were selected according to the following criteria: (i) expressed sequence tags corresponding to the N-terminal sequence had been identified, (ii) the predicted initiator Met codon was located in a favorable sequence context for translation (38), and (iii) the full-length sequence showed significant similarity to other known sequences. We decided to essentially concentrate on proteins for which a predicted function was given in genome sequence data base annotations (46%). In each case, we checked that the annotation was confirmed by experimental data. For instance, six protein kinases (PK) were annotated as cyclin-dependent PK, although they did not contain the consensus motifs characteristic of cyclin-dependent PKs (39). Another example was the N terminus of thioredoxin h6 (TRXh6) that was incorrectly annotated at The Arabidopsis Information Resource and SwissProt but correctly at Munich Information Center for Protein Sequences.
The ~330 proteins were classified into various subcategories and analyzed (data not shown). We retrieved several interesting candidates: (i) in the "protein fate" category, 4 proteasome components (2 UBP3 and 2 RPT2); (ii) in the "cell defense" category, 20 members of the nucleotide-binding site and leucine-rich repeat domains (NBS-LRR) family (40) and 5 cytosolic thioredoxins (TRX); (iii) in the "signal transduction" category, in addition to the expected PK, such as AKIN, several calcium-dependent PK such as CPK2, CPK3, and CRK6 (for a review, see Ref. 41), several phosphatases, and the 15 GP listed above; (iv) in the "cell division" category, PTEN, a phosphatase recently implicated in male gamete development (42); (v) several "transport" proteins, including the SKOR membrane potassium channel (43) and an oxaloacetate/malate transporter (At1g14560); (vi) several "metabolism" proteins including the bifunctional enzyme 6-phosphofructo-2-kinase/fructose-2,6-bisphosphate 2-phosphatase (F2KP), a central enzyme in the regulation of glycolysis; and, finally, (vii) in the "development" category, two protein homologs (DEM1 and DEM2) thought to play an important role in plant embryo development (44). Finally, in the "transcription" category (30 ORFs in B4), none of the members of which have ever been shown to be myristoylated, (i) six proteins were true or predicted b-Zip transcription factors (TF), such as BZP12 and BZP44 (45,46) and (ii) three were putative MYB-like proteins. We analyzed the MYR of several such putative transcription factors (AtBZP44, At1g02110, or At1g52320). We synthesized 14 of the corresponding octapeptides and assayed MYR (all in Table III except BZP44, data in Table I).
a The amino acid sequence of each peptide is indicated. The impact of the position studied is indicated in bold.
b See text for more information. In the AGI annotation, ARA6 corresponds to protein At3g54840, SOS3 to protein At5g24270, and BZP44 to protein At1g75390 in the proteome. Some single substitutions in these three peptides were studied. For instance, in SOS3-G2A, the N-terminal Gly-2 was replaced by an Ala.
c Values in parentheses refer to data obtained in the radioactive discontinuous assay.
As an additional control, we added the peptide derived from CPK2, a PK recently shown to be myristoylated (47). All the selected substrates, including CPK2, were efficiently myristoylated, with the exception of SKOR and BZP44. These two negative results were unexpected, as these substrates were predicted on the basis of both the Prosite pattern and Predictor score. However, closer study of the amino acid sequences of these molecules indicated in both cases, in contrast to other positive substrates: (i) the absence of a positively charged residue at position 7, (ii) the presence of a Gly at position 7, (iii) a series of amino acids at positions 3-5 with small radii of gyration, and (iv) the absence at position 3 of a Cys or an Asn, two residues for which a high bias at this position was found in the B4 library. We therefore hypothesized that at least one of these features prevented these two peptides from undergoing myristoylation. We investigated the impact of a small side chain at position 7 by replacing the Lys of SOS3 by an Ala (SOS3-K7A; Table I). This resulted in a strong decrease in catalytic efficiency, consistent with our observations. However, despite this decrease in catalytic efficiency, this peptide was nonetheless a substrate of AtNMT1. We then investigated whether the positions displaying bias in the B4 library had a strong positive impact on MYR in the absence of a positively charged residue at position 7. We therefore assessed the effect of replacing the Cys in position 3 by an Ala in such a context (i.e. ARA6). We investigated the effect of an Asn residue in position 3 and of a Phe in position 5 by replacing the Ser-3 of BZP44 by an Asn or the Thr-5 by a Phe. Finally, we assessed the impact of Ser-6 and Pro-8 by creating two peptides with single substitutions of the Ala in ARA6.
We found (Table I) that (i) an Asn in position 3, in contrast to a Cys in this position, had a strong positive effect, converting BZP44 into an NMT substrate, (ii) a hydrophobic group at position 5 was beneficial, (iii) the hydroxyl group of Ser had a strong impact at position 6, and (iv) a Pro at position 8 had no negative effect on MYR of AtARA6.
a The amino acid sequence of each peptide is indicated. The impact of the position studied is indicated in bold.
b See text for more information. In the AGI annotation, ARA6 corresponds to protein At3g54840, SOS3 to protein At5g24270, and BZP44 to protein At1g75390 in the proteome. Some single substitutions in these three peptides were studied. For instance, in SOS3-G2A, the N-terminal Gly-2 was replaced by an Ala.
c Values in parentheses refer to data obtained in the radioactive discontinuous assay.
Restricting the A. thaliana N-Myristoylome to 437 Proteins-Our medium scale analysis (>60 protein substrates tested) of putative substrates of AtNMT1 goes some way toward the analysis of a large set of experimentally verified MYR proteins from a given taxon required to improve N-myristoylome prediction (17). We calculated the Predictor scores of the proteins in the B4 library with the Higher Eukaryote option, which was found to be the most reliable for MYR prediction (Fig. 3, panel A). We decided to use the Higher Eukaryote Predictor score as a basis for the creation of a special pattern for A. thaliana N-myristoylome prediction. Only the "profile" score, which essentially takes into account the physical properties of the peptide, was used. A specific parameter was added to this value to give a specific weight derived from kinetic measurements (see ''Experimental Procedures''). This new parameterization of Predictor defined the "Arabidopsis Predictor." This correction increased the scores of the AtNMT1 substrates but had no effect on other, negative scores. We then plotted the MYR catalytic efficiencies of many of the peptides assayed (Tables I-III) as a function of their Arabidopsis Predictor score. This new score was clearly correlated with the results of kinetic analysis (Fig. 3, panel B). All peptides with positive scores were NMT substrates (i.e. kcat/Km > 2%), whereas those with negative scores were not (i.e. kcat/Km < 1%). This validated the calculation a posteriori. We then calculated the Arabidopsis score of each peptide of the B4 library and annotated it for MYR probability. Our results suggest that the A. thaliana N-myristoylome consists of 437 ORFs, accounting for 1.7% of the complete proteome. Detailed data for each ORF are given in Table s1, which is supplemental material available in the on-line version of this article.
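The a posteriori validation described in this section (positive Arabidopsis Predictor score for substrates with kcat/Km above 2%, negative score for peptides below 1%) amounts to a concordance check between two classifications. The Python sketch below illustrates that check under stated assumptions: the class and function names are hypothetical, and the scores and efficiencies are invented for illustration, not values from Tables I-III.

```python
from dataclasses import dataclass

SUBSTRATE_MIN = 2.0      # relative kcat/Km (%) above which a peptide is a substrate
NON_SUBSTRATE_MAX = 1.0  # relative kcat/Km (%) below which it is not a substrate


@dataclass(frozen=True)
class Peptide:
    name: str
    score: float       # Arabidopsis Predictor score (hypothetical values below)
    efficiency: float  # relative kcat/Km in percent (hypothetical values below)


def predicted(p: Peptide) -> str:
    """Classify from the sign of the score; a score of zero is left undetermined."""
    if p.score > 0:
        return "substrate"
    if p.score < 0:
        return "non-substrate"
    return "undetermined"


def observed(p: Peptide) -> str:
    """Classify from the measured efficiency, using the two thresholds of the text."""
    if p.efficiency > SUBSTRATE_MIN:
        return "substrate"
    if p.efficiency < NON_SUBSTRATE_MAX:
        return "non-substrate"
    return "undetermined"


def discordant(peptides: list[Peptide]) -> list[str]:
    """Names of peptides whose definite prediction and definite measurement disagree."""
    return [
        p.name
        for p in peptides
        if "undetermined" not in (predicted(p), observed(p))
        and predicted(p) != observed(p)
    ]


# Invented data: pep3 has a positive score but a measured efficiency below 1%.
EXAMPLE = [
    Peptide("pep1", 1.8, 45.0),
    Peptide("pep2", -0.6, 0.2),
    Peptide("pep3", 0.4, 0.5),
]

if __name__ == "__main__":
    print(discordant(EXAMPLE))
```

An empty list returned for the full set of assayed peptides would correspond to the agreement reported in Fig. 3, panel B.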
In Vitro and in Silico Comparison of the Substrate Specificity of AtNMT1 and ScNMT1 Highlights Major Differences, Including the Crucial Role of Position 7 for MYR by AtNMT1-We used the three-dimensional structure of ScNMT1 complexed with two substrates (26) to model the three-dimensional structure of AtNMT1. We investigated the differences observed between ScNMT1 and AtNMT1, by focusing on the peptide-binding pocket. We found that the negative charges on the carboxylates of the side chains of two aspartates of AtNMT1 (Asp-120 and Asp-408) were located very close (1-4 Å) to the positive charge of the ε-ammonium of the side chain of Lys-7 of the substrate (Fig. 4). This strongly suggested that an electrostatic interaction between the two partners stabilized the enzyme-substrate complex, making MYR more likely in the presence of a positively charged side chain at position 7. It also accounted for the strong negative effect of negatively charged residues at this position on the reaction catalyzed by AtNMT1. In ScNMT1, the residue equivalent to Asp-120 is Arg-107, a positively charged residue. This suggested that the yeast enzyme should be less sensitive to the ionic nature of the side chain at position 7 (Fig. 4). As SOS3 has already been reported to be myristoylated by ScNMT1 (7), we studied the impact of several changes at position 7. The effect of these changes on reactions with ScNMT1 was less severe than that with AtNMT1, consistent with our model (Table IV). This is clearly a major difference between the two NMTs.
We then investigated the reactions with ScNMT1 (Table IV) of several of the peptides studied with AtNMT1 (Table III). Several actual AtNMT1 substrates, such as AtDEM1 and AtDEM2, are not substrates of ScNMT1 (Table IV). Both these peptides have negatively charged residues at positions 8 and 9. Peptides with this structure are known to be poor substrates of ScNMT1 (48). In the absence of clear contacts derived from the crystal structure of a complex between a peptide and ScNMT1 (26) explaining why these remote positions should be excluded, the role of a negative electrostatic environment was suggested (16). Maurer-Stroh et al. gave peptides with negative charges at positions 8 and 9 a special negative "position penalty" for these positions in calculation of the Fungi Predictor score. We think that this penalty is appropriate. Our three-dimensional model shows that AtNMT1 does not have a negative electrostatic environment at the binding site for positions 8 and 9. It is probably for this reason that it is able to accept peptides with negative residues in those positions, whereas ScNMT1 cannot. This is another major difference between fungal and higher eukaryote NMTs.
In conclusion, all substrates of ScNMT1 were substrates of AtNMT1, but not all substrates of AtNMT1 were substrates of ScNMT1. Thus, the specificity of AtNMT1 is more relaxed than that of ScNMT1.
d CB, calcium-binding protein; DE, development; GP, GTP-binding protein; MB, membrane protein; ME, metabolism; PD, protein degradation; PH, phosphatase; PK, protein kinase; R, resistance protein; TF, transcription factor; TR, thioredoxin. Le indicates that the protein sequence was from Lycopersicon esculentum. Ps indicates that the protein sequence was from Pseudomonas syringae. In either case, the two letters are followed by the GenBank™ entry (for instance LeAAF76314). ND, not determined.
e Data from Ref. 6.
DISCUSSION
In this study, we aimed to analyze the entire N-myristoylome in a model higher eukaryote, A. thaliana. Although this work involved predictions based on the annotated ORFs, experimental verification was essential. Our approach was validated a posteriori as it predicted the MYR of expected substrates (ARFs and CPKs) and of proteins already shown to be myristoylated in vivo (ARA6, SOS3, or CPK2).

MYR in a Higher Eukaryote Proteome: A Case Study-ScNMT1 is by far the best characterized NMT. The actual myristoylation status of only a few (~13) of the 70 putative substrates of this enzyme has been verified to date. A large number (>200) of model peptides have now been assayed with ScNMT1, but the substrate specificity of this enzyme remains too vague or so restrictive that it is difficult to identify actual myristoylated proteins easily a priori (see, for instance, At3g49370 in Table IV). This makes it even more difficult to predict MYR in any new non-fungal proteome, necessitating further characterization of the substrate specificity of the corresponding NMTs. We therefore used an approach of exhaustive prediction followed by gradual restriction of the data set on the basis of further kinetic analysis. We observed a number of differences in substrate specificity between ScNMT1 and AtNMT1 (Tables III and IV). One major difference concerns position 7, the weight of which is crucial in A. thaliana because of a probable electrostatic interaction with the enzyme (Fig. 4). However, the specificity of AtNMT1 appears to be more relaxed than that of ScNMT1. This may explain why (i) AtNMT1 complements the yeast nmt-181 mutant and (ii) the relative percentage of the N-myristoylome of A. thaliana is expected to be significantly larger than that of S. cerevisiae.
Although our experimental investigation of the A. thaliana N-myristoylome was exhaustive, it is incomplete as it accounts for only 10% of the final N-myristoylome. We therefore reconciled the recently developed prediction software for eukaryotic proteomes (17) with our experimental data. We found that the Higher Eukaryotes option was the best suited to AtNMT1 but that major corrections to the weights of the kinetic terms were needed to fit predictions to the experimental data, as one in three predictions were wrong (Fig. 3).

All proteins starting with MGNXX[ACGSTV][^DE] or MG[^DEFKRVWY]XX[ACGSTV][KR] were myristoylated, and these proteins accounted for 60% of the N-myristoylome. Despite our prediction of double the number of putative proteins (437 versus 198), we fully agree with the recent proposal of Maurer-Stroh et al. (17) concerning the A. thaliana N-myristoylome. Thus, plants probably have up to twice as many myristoylated proteins as fungi (1.7% versus 0.5-0.8%). Statistical analysis of the A. thaliana N-myristoylome revealed that 30% of the proteins had a Cys at position 3 or 4, although this residue did not increase the probability of MYR (Table I) and was not used to select N-myristoylome proteins. This result strongly suggests that further S-palmitoylation of these proteins occurs at these positions. S-Palmitoylation is another lipid modification, initially induced by MYR, facilitating stable membrane anchorage (49,50).

Transduction Pathways Involve a Large Number of N-Myristoylated Proteins in A. thaliana-Close study of the N-myristoylome of A. thaliana (Table s1) showed that an exceptionally large proportion of these proteins were predicted to be components of signal transduction pathways (54% of the N-myristoylome versus 10% of the proteome). Most of these proteins (71%) were PK, many of them calcium-dependent PK (i.e. most of the CDK, all CRKs, and one SnRK; see Ref. 41). This finding is entirely consistent with previous reports providing a partial description of the situation in animals (50,51). The large size of the plant N-myristoylome may therefore result from the more intricate signal transduction pathways in plants than in fungi. It may also be partly the result of the complexity of defense mechanisms in higher plants (11% of the Arabidopsis proteome), several of which involve MYR (see discussion below and Refs. 23 and 52).
Among the MYR candidates belonging to signal transduction cascades, we focused our attention on the family of GP. Fifteen GP (14.9% of GP) are thought to be truly N-myristoylated. These proteins probably have similar functions to their counterparts in the S. cerevisiae proteome. Thirteen proteins (versus three in yeast) belong to the ARF GP family; another corresponds to the α-subunit of heterotrimeric GP, a unique protein in the genomes of both A. thaliana (GPA1) and yeast (ScGPA1); the remaining protein is a plant-specific small GP of the RAB family. Although data on the physiological role of several of these crucial GP were available (53)(54)(55)(56), actual MYR status had been checked in vivo only for ARA6 (37). Our in vitro data indicate that ARA6 is the only non-ARF GP to undergo N-myristoylation in A. thaliana. Two nuclear ARLs, ARL4 and ARL5, are known to be myristoylated in animals (57,58). This situation contrasts with that in plants, in which none of the ARLs is myristoylated (Table II). Overall, our data suggest that plant GP differ from those found in animals, as suggested in a recent study (59). Our results indicate that AtGPA1 is myristoylated, like fungal and animal GPAs (35).
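The two N-terminal motifs quoted earlier in the Discussion, MGNXX[ACGSTV][^DE] and MG[^DEFKRVWY]XX[ACGSTV][KR], translate directly into regular expressions, with X read as any residue. The Python sketch below is one possible transcription. The constant and function names are assumptions, and the test sequences are hypothetical placeholders rather than entries from Table s1.

```python
import re

# Motif 1: Asn at position 3, small residue at position 6, no Asp/Glu at position 7.
MOTIF_ASN3 = re.compile(r"^MGN[A-Z]{2}[ACGSTV](?![DE])[A-Z]")

# Motif 2: restricted residue at position 3, small residue at position 6,
# Lys or Arg at position 7.
MOTIF_BASIC7 = re.compile(r"^MG(?![DEFKRVWY])[A-Z][A-Z]{2}[ACGSTV][KR]")


def matches_motif(sequence: str) -> bool:
    """Return True if the N terminus matches either of the two motifs."""
    seq = sequence.strip().upper()
    return bool(MOTIF_ASN3.match(seq) or MOTIF_BASIC7.match(seq))


if __name__ == "__main__":
    # Hypothetical N termini, for illustration only.
    for seq in ("MGNAASKL", "MGCSVSKK", "MGCSVSAK", "MGNAASDL"):
        print(seq, matches_motif(seq))
```

Because the text states that these motifs cover 60% of the N-myristoylome, a match is a sufficient criterion only. A sequence that fails both patterns may still be a substrate, as was observed for the K7A variant of SOS3, which remained a (weaker) substrate of AtNMT1.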
New N-Myristoylated Proteins in A. thaliana-This study identified a number of unexpected proteins as undergoing MYR. This was the case for some TF, including a number of those belonging to the bZIP family (for a recent review on plant BZPs, see Ref. 46). Orthologs of these TF were found in other plants and mammals, where they also appear to undergo MYR, as shown by the high Predictor scores obtained. We did not expect to find that some TF were substrates of NMT and the functional significance of this result is unclear. Three nuclear proteins were recently shown to be N-myristoylated in yeast and humans (57,58,60). The role of MYR in nuclear proteins can be explained by several variations on a basic idea, as myristate is known to act in concert with other mechanisms in the regulation of protein targeting and function (61). For instance, MYR may promote anchoring to a given membrane or protein partner, trapping the TF and preventing it from acting until another signal switches off this interaction, thereby reversing the mechanism. MYR is thought to be a stable modification. This signal may therefore correspond to (i) phosphorylation, (ii) calcium binding, as in recoverin, or (iii) proteolytic cleavage, as in the caspase-induced pro-apoptotic protein BID (62).
FIG. 4. An electrostatic interaction between AtNMT1 and the peptide substrate. A schematic of the proposed electrostatic interaction between the two aspartate (Asp-120 and Asp-408) side chains (green) of AtNMT1 (blue) and the side chain of Lys-7 of the peptide (yellow) is shown. The distance between two given atoms is indicated as a white line, with the corresponding value expressed in Å.
Few myristoylated proteins seem to be involved in metabolism. The bifunctional enzyme F2KP (EC 2.7.1.105/EC 3.1.3.46), the central enzyme in the regulation of glycolysis, was another protein for which the prediction of MYR was totally unexpected. The corresponding ORF in A. thaliana was recently characterized (63). Our data (Table III) clearly indicate that this protein is N-myristoylated. For other plant F2KP for which unambiguous full-length cDNA sequences are available (i.e. soybean, spinach, lettuce, lotus, poplar, potato, and Bruguiera gymnorhiza), the corresponding MYR sites and associated positive Arabidopsis Predictor scores strongly suggest that MYR of these proteins occurs. Interestingly, plant F2KP are known to have an N-terminal extension not present in yeast and animal forms, which is crucial for tetrameric assembly and for kinetic properties, as demonstrated by N-terminal deletion analysis (63,64). F2KP is reversibly phosphorylated, and its activity in degrading the regulatory metabolite fructose 2,6-bisphosphate is important in modulating the rate of glycolysis (65). F2KP activity is tightly regulated in higher plants (66). It is therefore likely that phosphorylation and N-myristoylation (see previous paragraph) act together in regulating the activity of F2KP, and thus glycolysis. Interestingly, hexokinase, another regulated enzyme of glycolysis that acts as a glucose sensor (67), is known to interact with various membranes and compartments in mammals via its 10 most N-terminal residues (68). The N-terminal end of this protein has conserved Gly-2 and Lys-7.
However, it remains unclear whether hexokinase is myristoylated in mammals. In A. thaliana, despite the presence of a Gly in position 2 of hexokinase, this enzyme is not predicted to be myristoylated. It was excluded from our analysis as it contains a Lys-3.
Five thioredoxins (TRXh2, -h6, -h7, -h8, and -h9) from among the nine members of the cytosolic thioredoxin (h) family (69,70) are predicted to be myristoylated. These five proteins have a 20-30-amino acid extension in common, and their sequences are more similar to each other than to those of the other forms that more closely resemble their human counterparts (70). The associated role of MYR is unknown in plants. However, cytosolic thioredoxins are known to be involved in the redox activation of fructose-6-phosphate 1-phosphotransferase by fructose 2,6-bisphosphate (71), the substrate of F2KP. This suggests that the various protein regulators of glycolysis in plants (thioredoxins and F2KP) may be located in the membrane, channeling the information required to regulate the pathway correctly.
Another group of proteins that were unexpectedly predicted to be myristoylated function in the protein degradation pathway. Two ubiquitin proteases (UBP3) and two proteasome subunits (RPT2) were predicted to be myristoylated. The myristoylation of RPT2 (YDL007W) and UBP5 (YER144C) has been predicted in S. cerevisiae (11,17), but recent biochemical data are available only for RPT2 (72).
A Role of MYR for Innate Immunity Proteins in Higher Eukaryotes-Twenty of a total of 149 (13.4%) nucleotide-binding site and leucine-rich repeat domain (NBS-LRR) proteins were retrieved in the N-myristoylome. Eleven of these proteins were predicted to be S-palmitoylated. After the GP and PK families, the NBS-LRR family is the protein family with the largest number of myristoylated members. The members of this new family of myristoylated proteins have a nucleotide-binding site.
Nothing is yet known about the relationship between nucleotide binding and myristoylation in NBS-LRR proteins, but it probably plays a regulatory role similar to that described for ARFs (73). According to the phylogenetic tree and protein classification (40), the MYR NBS-LRR proteins belong to only two subfamilies: RPS5/RPS2 and RPP1. NBS-LRR proteins are a family of innate immunity (R) proteins involved in plant disease resistance (for a review, see Ref. 74). R proteins enable the organism to perceive pathogen attack through protein-mediated recognition of a pathogen avirulence (avr) protein injected into the plant, and to induce cell death as a result. Interestingly, avr proteins undergo (i) bacterium-mediated NME (avrRpm1, avrB, or avrC) or proteolytic cleavage (avrRpt2 and avrPphB; Refs. 75 and 76), both of which unmask an N-terminal glycine and (ii) plant-mediated N-terminal lipid (MYR and often palmitoylation) modifications (Table III and [...] data and Arabidopsis Predictor tool indicate the MYR of such avr proteins. The guard hypothesis (74) has been put forward to explain the functioning of R proteins. According to this hypothesis, avr proteins compete for the same binding site as the N-terminal domain of R proteins, on a cellular complex of proteins involving the ''guardee'' molecule, which is probably a protein such as RIN4 (78,79). Our data are entirely consistent with this hypothesis and suggest that the lipid modification of both avr and R proteins may be part of the primary signals directing these two proteins to the same binding site on the cellular membrane complex.
This competition for the same binding site results in the release of at least a fraction of R proteins from the membrane and their migration to the nucleus, where they trigger cell death. Two non-NBS-LRR R proteins with PK activity, FEN and PTO, are myristoylated in tomato (6,80). Both these proteins are induced by avrPto, a myristoylated secreted protein from bacteria (81). MYR appears to be involved in the R mechanisms of many higher plants (see also Refs. 82 and 83), but it is unclear whether this is also the case in other higher eukaryotes. In particular, a pathogenesis-related protein conserved in animals has been recently reported to be myristoylated in humans (84).
a The amino acid sequence of the peptide corresponding to the indicated protein is given in Table I, II, or III.
b Values in parentheses refer to data obtained in the radioactive discontinuous assay.
c The Fungi Predictor score associated with each protein was calculated.
CONCLUSION
The strategy used here could potentially be used for the study of any new N-myristoylome. This approach involves (i) yeast complementation experiments, (ii) in vitro assays, (iii) predictive and reiterative data library construction depending on complete genome sequence availability, and (iv) Predictor scoring improvement and measurement. Our data indicated that MYR is a general process involved in many steps of crucial cellular mechanisms, not only signal transduction, and that greater characterization and annotation are required to improve our understanding of cell physiology and protein functions. This study is one step toward this goal. Finally, it should be noted that of the two NMTs identified in A. thaliana, AtNMT1 and AtNMT2, only AtNMT1 shows NMT activity in vitro or in vivo. To date, we have detected no NMT activity associated with AtNMT2.