Covalent Structures of Potato Tuber Lipases (Patatins) and Implications for Vacuolar Import

Proteome data of potato (Solanum tuberosum) tuber juice and of purified potato tuber vacuoles indicated that mature patatins may lack a C-terminal propeptide. We have confirmed this by complete mass spectrometric sequencing of a number of patatin variants, as well as their N-linked complex-type glycans, from the starch-rich cultivar Kuras. Because the patatin locus is highly polymorphic, full-length patatin cDNAs have also been sequenced for this cultivar. It is well known that patatins are located in the vacuoles of potato tubers. Furthermore, the complex glycan structures show that the path is via the Golgi apparatus. However, the vacuolar targeting signal has never been identified for this storage and defense protein, which amounts to 25–40% of tuber protein. We propose that a six-residue C-terminal propeptide, -ANKASY-COO−, comprises this signal. The crystallographic structure of a recombinant patatin (Rydel, T. J., Williams, J. M., Krieger, E., Moshiri, F., Stallings, W. C., Brown, S. M., Pershing, J. C., Purcell, J. P., and Alibhai, M. F. (2003) Biochemistry 42, 6696–6708), which included this propeptide, thus shows for the first time the structure of a putative ligand of the vacuolar sorting receptor and processing enzyme responsible for patatin import.

Patatins comprise 25-40% of the total soluble tuber protein of potato, the world's third most important food crop. They have a broad acyl-hydrolase activity and are homologous to human phospholipase A2 (1,2). Patatins are encoded by a single gene locus of ~1.4 Mb at the end of the long arm of chromosome 8 (3). Sequencing of a 154-kb bacterial artificial chromosome clone containing a portion of this locus revealed two putative functional and 12 pseudo patatin genes (4). This structure of the locus suggests that variants might be frequent. The patatin protein sequences vary significantly among cultivars (5). They are targeted to the endoplasmic reticulum by a typical 23-residue signal peptide, which is removed on endoplasmic reticulum import. The mature tuber patatin variants are dimers of 40- to 42-kDa subunits without disulfide bridges but carry from one to three N-linked glycans. The complex type of N-linked glycans of patatins shows that patatins must pass the Golgi apparatus before they end up in the vacuole (6,7). The import signal to the vacuole is unknown. Surprisingly, mature patatins purified from potato tubers have so far only been characterized by N-terminal protein sequencing and sequencing of a few peptides (8). The sequences have been translated from cDNA sequences, but never confirmed by complete protein sequencing. In recent proteome studies of potato tuber juice (5,9) and soluble proteins from purified tuber vacuoles, we never observed the last amino acid residues expected from the cDNA-derived patatin sequences. Because there are several arginine and lysine residues within the C-terminal sequence of patatins, which will give rise to trypsin cleavage, we have subjected purified patatins to chymotrypsin and Lys-C protease digestions and determined the complete amino acid sequences of eight patatins, except for a few residues in three patatin variants.
We show that patatins indeed lose a six-residue C-terminal propeptide (ct-propeptide) on their way to the vacuole or inside the vacuole, and that all asparagine-X-serine/threonine-Y sequences (X and Y are any amino acid residue except proline (10)) carry a complex N-linked glycan. The surface locations of glycans are demonstrated, and the properties of the exposed ct-propeptide are discussed.
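The acceptor-site rule quoted above (asparagine-X-serine/threonine-Y, with X and Y being any residue except proline) can be written as a short pattern search. The following is a minimal sketch, not the procedure used in this study; the function name and the toy sequence are hypothetical and are not taken from any patatin sequence.

```python
import re

# N-glycosylation acceptor site as stated in the text: Asn-X-Ser/Thr-Y,
# where X and Y are any residue except proline.
# A lookahead is used so that overlapping sites are all reported.
SEQUON = re.compile(r"N(?=[^P][ST][^P])")


def find_sequons(sequence: str) -> list[int]:
    """Return 1-based positions of asparagines that start an acceptor site."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence.upper())]


if __name__ == "__main__":
    # Hypothetical toy sequence for illustration only.
    toy = "MANGTLKNPSANVSGNNTSA"
    print(find_sequons(toy))
```

Because the rule requires a residue Y after serine/threonine, this sketch does not report a site whose Ser/Thr is the last residue of the sequence.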

EXPERIMENTAL PROCEDURES
Purification of Patatins-Juice of mature cv Kuras potato tubers was prepared at 4°C and fractionated by Superdex 200 gel filtration at room temperature as described elsewhere (9), except that the eluant was 20 mM Tris-Cl buffer, pH 8.2 (buffer A). Two milliliters of the 90-kDa peak dominated by dimeric patatins were then fractionated by anion exchange chromatography on a Mono Q 5/5 column (Amersham Biosciences) equilibrated in buffer A. The Mono Q column was eluted by 5 ml of buffer A followed by a linear gradient up to 0.5 M NaCl in buffer A (flow rate 1 ml min⁻¹; fractions of 0.5 ml). Fractions were monitored by A280 nm and MALDI-TOF MS (9).
Proteolytic Digestions-Mono Q fractions (30 µl) were precipitated with ice-cold ethanol at a final concentration of 60% overnight at -20°C, which precipitates patatins. Pellets were reduced in 50 µl of buffer (8 M urea, 200 mM Tris, 20 [...] for 30 min at 25°C in the dark and precipitated with 6 vol of ice-cold ethanol overnight at -20°C. The pellets were dissolved in 20 µl of 50 mM NH4HCO3, pH 8.0, and digested by sequencing grade modified bovine chymotrypsin (Princeton Separation Inc., Adelphia, NJ), Lys-C (Roche Diagnostic, Mannheim, Germany), or modified sequencing grade porcine trypsin (Promega, Madison, WI) dissolved in 50 mM acetic acid. Samples were digested at 37°C at E:S = 1:100 (w/w) for 30 min, and after addition of more protease (1:100) digestions were continued for 1 h and stopped with 5 µl of 5% formic acid. Digests were concentrated 10-fold by vacuum centrifugation and diluted with 20 µl of 5% formic acid prior to LC-MS/MS, or storage at -20°C.
Deglycosylation with Glycopeptidase A-Carboxymethylated patatin fraction 48 was prepared and precipitated as described above. The pellet was dissolved in 15 µl of 0.1 mM ammonium acetate, pH 5, and incubated with 60 milliunits of glycopeptidase A from almonds (Sigma-Aldrich) for 18 h at 37°C, dried by vacuum centrifugation, and digested with chymotrypsin as described above.
Nano-LC-Electrospray Ionization-MS/MS and Data Analyses-Aliquots of proteolytic digests were analyzed by nanoflow capillary high pressure liquid chromatography interfaced directly to an electrospray ionization Q-TOF tandem mass spectrometer (MicroTOFQ, Bruker Daltonics, Bremen, DE) as described elsewhere (11). Protein and peptide databases translated from all available potato expressed sequence tag sequences (DFCI Potato Gene Index, Release 12.0, July 24, 2008), and from Kuras-specific expressed sequence tag sequences (12) and full-length cDNAs were created as described by Emmersen (13). The potato protein database is available on-line (supplemental database S1).
Compiled lists of MS data from one type of proteolysis of all fractions, or data from single Mono Q fractions, were analyzed and searched by Mascot software v2.2 (Matrix Science) (14) against the potato protein database. Search parameters were: enzyme, semi-chymotrypsin, allowing two missed cleavages; complete modification: carboxymethylated; partial modification: oxidized methionine; peptide tolerance: 0.1 Da. This tolerance was also used for interpreting peptide fragment MS/MS data. Settings for Lys-C and trypsin were similar. Glycopeptides were extracted manually from the raw MS/MS spectra using DataAnalysis version 3.4 (Bruker Daltonics, Bremen, DE). Error-tolerant searches were performed to identify deglycosylated asparagines in the digest of a glycopeptidase A-treated sample, and patatin peptides with substitutions relative to Fig. 2.

RESULTS
Patatin Purification and Sequence Analyses-Potato tuber juice was fractionated by Superdex 200 gel filtration at room temperature. The 90-kDa peak, dominated by dimeric patatins, was further fractionated by anion exchange chromatography (Fig. 1). Fractions were analyzed by MALDI-TOF MS. Fractions 43-60 contained proteins of 40-42 kDa and were subjected to S-carboxymethylation and digestion with chymotrypsin. Additionally, fractions 44, 48, and 51 were digested with Lys-C protease and trypsin. Nano-LC-MS/MS analyses of each digest and Mascot searches against our in-house potato protein database provided full or nearly full amino acid sequence coverage of pat1-k1 through pat4-k1, including the glycosylated peptides as shown in Fig. 2. The MS data from chymotryptic and Lys-C peptides are shown in supplemental Tables S1 and S2. All text refers to position numbers according to Fig. 2 and not to residue numbers, because deletions at positions 118 and 223 misalign corresponding residues of patatin variants.
The C Terminus of Mature Patatins-A Mascot search list of the 5905 MS/MS spectra of chymotryptic peptides of fractions 43-60 identified pat1-k1 peptides ending at position arginine 381 seven times, and the identical C-terminal peptides of pat1-k2 through pat4-k1, also ending at position 381, 29 times (Fig. 2 and supplemental Table S1). In addition, Lys-C-generated peptides identified arginine 381 as the C terminus of pat1-k1 twice, and of the other patatins once (supplemental Table S2). The cleaved peptide bonds are in agreement with the specificity of chymotrypsin (after leucine 373 or leucine 374) and Lys-C (after lysine 372). No traces of peptides that included residues beyond arginine 381 were identified. Because neither chymotrypsin nor Lys-C usually cleaves after arginine residues, arginine 381 must constitute the C terminus of mature patatins. Furthermore, no peptides indicating shorter forms of mature patatins were observed, suggesting a high arginine-C specificity of the protease removing the patatin propeptide during vacuolar import.
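The reasoning behind this assignment can be illustrated with a small check: a peptide whose last residue is not a cleavage site of the protease used cannot have been ended by that protease, so its C terminus is a candidate for the native protein C terminus. The sketch below is an assumption-laden illustration, not the Mascot semi-specific search itself; the cleavage rules are simplified, the function names are hypothetical, and only the residue stretches quoted in the text (RKKLR and ANKASY) are used as input.

```python
# Residues after which each protease is expected to cleave (C-terminal side).
# Simplified specificity rules (an assumption); real enzymes show minor
# secondary cleavages.
CLEAVES_AFTER = {
    "chymotrypsin": set("FYWLM"),
    "lys-c": set("K"),
    "trypsin": set("KR"),
}


def explains_c_terminus(peptide: str, enzyme: str) -> bool:
    """True if the peptide's last residue is a cleavage site of the enzyme."""
    return peptide[-1].upper() in CLEAVES_AFTER[enzyme.lower()]


def candidate_protein_c_terminus(peptide: str, enzymes: list[str]) -> bool:
    """True if none of the enzymes used can account for the peptide's end."""
    return not any(explains_c_terminus(peptide, e) for e in enzymes)


if __name__ == "__main__":
    # Residues 377-381 of mature patatin, as quoted in the text.
    print(candidate_protein_c_terminus("RKKLR", ["chymotrypsin", "lys-c"]))
    # With trypsin, an end at arginine is uninformative.
    print(candidate_protein_c_terminus("RKKLR", ["trypsin"]))
```

The second call shows why tryptic data alone cannot define this C terminus: trypsin itself cleaves after arginine.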
Sites of N-Linked Glycans-Mass spectra of glycopeptides were extracted by searching for ions of 204.1 ± 0.2 for GlcNAc (acetylated glucosamine), 366.1 ± 0.2 for GlcNAc-Man1, and 528.2 ± 0.2 for GlcNAc-Man2. Chymotryptic peptides of low molecular weight gave a good coverage of double-charged glycopeptides as the mass spectrometer was optimal for the m/z range 100 to 2100. In some cases triple- and quadruple-charged precursor ions of larger glycopeptides were also automatically isolated by the mass spectrometer and subjected to fragmentation (MS/MS). All potential asparagine-X-threonine/serine glycan acceptor sites (10) [...] the most abundant N-linked glycan in potato tuber (15). The MS/MS spectrum of glycopeptide 106-122 of pat2-k1 is typical and demonstrates the glycan sequencing (Fig. 3A). MS cannot distinguish α- and β-anomers or carbon numbers involved in links. The remaining 11 glycan sites are documented by their MS/MS spectra in supplemental Figs. S1-S11, and the corresponding glycopeptides are underlined in Fig. 2. The standard MS/MS settings of the MicroTOFQ instrument fragmented the glycan side chain only, not the peptide backbone. The same fragmentation pattern has been seen previously in Q-TOF MS (Fig. 5 of Ref. 16). Glycan variants stemming from incompletely processed glycans have been observed (not shown). More importantly, glycosylation appears to be essentially complete, because the non-glycosylated forms of these peptides have not been observed in our data (supplemental Tables S1 and S2).
In a separate experiment the S-carboxymethylated patatin in fraction 48 of Fig. 1, which had given rise to most of the observed glycopeptides, was subjected to deglycosylation with glycopeptidase A, and then digested with chymotrypsin (supplemental Table S3). Four different deglycosylated peptides all including position 115 were seen. These deglycosylated peptides were now accessible to normal peptide fragmentation by MS/MS (example in Fig. 3B). The deglycosylated asparagines now appeared as aspartates, the well known product of enzymatic deglycosylation (17).
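The marker-ion search described above amounts to testing each MS/MS peak list for fragment ions within ±0.2 of the three quoted m/z values. A minimal sketch follows; the extraction in this study was done manually in DataAnalysis, so the function names, the peak lists, and the requirement of at least two marker ions are assumptions made only for illustration.

```python
# Diagnostic glycan fragment ions (m/z) and tolerance quoted in the text.
MARKER_IONS = {
    "GlcNAc": 204.1,
    "GlcNAc-Man1": 366.1,
    "GlcNAc-Man2": 528.2,
}
TOLERANCE = 0.2


def marker_hits(peaks_mz, tolerance=TOLERANCE):
    """Return the names of marker ions found in a list of fragment m/z values."""
    hits = []
    for name, target in MARKER_IONS.items():
        if any(abs(mz - target) <= tolerance for mz in peaks_mz):
            hits.append(name)
    return hits


def looks_like_glycopeptide(peaks_mz, min_markers=2):
    """Flag a spectrum when at least min_markers marker ions are present.

    The threshold of two is an assumption, not a value from the text.
    """
    return len(marker_hits(peaks_mz)) >= min_markers


if __name__ == "__main__":
    # Hypothetical fragment m/z values, not measured data.
    spectrum = [175.12, 204.09, 366.14, 528.19, 690.30]
    print(marker_hits(spectrum), looks_like_glycopeptide(spectrum))
```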
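The conversion of asparagine to aspartate changes the peptide mass by a small, fixed amount, which is why deglycosylated sites can be recognized in an error-tolerant search. The sketch below works this out from standard monoisotopic residue masses; the function names are hypothetical, and the comparison with the 0.1-Da peptide tolerance is an illustration rather than part of the reported analysis.

```python
ASN = 114.042927  # monoisotopic residue mass of asparagine (Da)
ASP = 115.026943  # monoisotopic residue mass of aspartate (Da)
PEPTIDE_TOLERANCE = 0.1  # Da, the search tolerance quoted in the text


def deamidation_shift(n_sites: int = 1) -> float:
    """Mass increase (Da) when n_sites asparagines are converted to aspartate."""
    return n_sites * (ASP - ASN)


def mz_shift(n_sites: int, charge: int) -> float:
    """Shift in m/z observed for a precursor ion of the given charge."""
    return deamidation_shift(n_sites) / charge


if __name__ == "__main__":
    print(round(deamidation_shift(), 6))   # shift for one site, in Da
    print(round(mz_shift(1, 2), 6))        # shift for a double-charged ion
    # The shift exceeds the tolerance, so an unmodified-peptide search
    # would not match a deglycosylated peptide.
    print(deamidation_shift() > PEPTIDE_TOLERANCE)
```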

DISCUSSION
MS/MS sequencing of proteolytic digests of fractions 43-60 of Fig. 1 showed, like MALDI-TOF MS, that no fraction contained solely one patatin gene product (supplemental Tables S4 and S5). Essentially, patatin heterogeneity can originate from differential maturation of the protein chain or the glycan side chains of each patatin gene transcript. We found an unusually clear-cut processing of patatins at the N termini (position 24), as well as at the C termini (position 381), which is demonstrated in supplemental Tables S1-S3. Pots et al. (19) isolated and characterized four groups of active patatins from the Bintje variety, which were also heterogeneous. These authors suggested that glycans might cause the heterogeneity. We did see minor heterogeneity of the uncharged complex glycans, which can affect the molecular weight of a patatin variant, but not the charge or protein sequence.
FIGURE 2. Alignment of translated patatin cDNA sequences using Kuras pat1-k1 as template. A dot indicates an identical residue, a dash a deletion. Numbers are residue positions common to all. GenBank™ accession numbers for pat1-k1 through pat3-k1 clones are DQ114415 through DQ114421. Pat4-k1 has not been cloned and sequenced from cv Kuras; the proteome data scored highest for CAA27571 (translated from gene X03932). Recombinant pat17, rpat-17, from Solanum cardiophyllum (AY033231) is shown for comparison, and its structure is shown in Fig. 4. Rpat17 was expressed in E. coli with an N-terminal His tag and the full C terminus, as shown. Mature patatin proteins are shown in capital letters with the active site residues serine 77 and aspartate 216 highlighted in reversed print (2,3). Glycine 77 in pat3-k1 excludes enzymatic activity of this variant. N-terminal signal peptides (1-23) and the ct-propeptides (382-387) are shown in lowercase letters. All potential sites of N-linked glycans are shown in lowercase italics. All have been observed in the underlined peptides. Sequenced chymotryptic peptides are shaded in light gray; dark gray indicates additional sequences observed in overlapping Lys-C or tryptic peptides. White residues have not been assigned from mass spectra.
So why can we and others not obtain pure potato tuber patatin variants by standard chromatographic procedures? First, the patatins are encoded by a large family of rather similar genes. If the sequence of the 154-kb bacterial artificial chromosome clone from cv Katahdin (4) is representative of the estimated 1.4-Mb patatin locus (3), then the locus might encode 18 active patatin variants (and 118 partial or pseudo genes). Second, recent work shows that patatin variants are quite similar, because a 635-nucleotide RNA interference sequence from the transcript of Kuras pat3-k1 covering active site serine 77 can suppress 99% of all tuber patatins in transgenic cv Desiree (20). Third, it is presently unknown whether active patatins are homo- or heterodimers. Heterodimers would multiply the number of different active patatins. Fourth, the tetraploid nature of potato may give rise to similar allelic variants. Fortunately, modern methods of proteomics do not require pure proteins for sequencing. However, reliable translated cDNA sequences are crucial.
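The estimate of 18 active variants and the effect of heterodimer formation can be reproduced with simple arithmetic. The sketch below scales the two functional genes of the 154-kb clone linearly to the 1.4-Mb locus and then counts the unordered pairs of subunits; it assumes, purely for illustration, that any two variants can pair and that all variants are expressed. The function names are hypothetical.

```python
from math import comb


def scaled_gene_count(genes_in_clone: int, clone_kb: float = 154,
                      locus_kb: float = 1400) -> int:
    """Scale a gene count from the sequenced clone linearly to the whole locus."""
    return round(genes_in_clone * locus_kb / clone_kb)


def dimer_species(n_variants: int) -> tuple[int, int]:
    """Return (homodimers, heterodimers) for n_variants, ignoring subunit order."""
    return n_variants, comb(n_variants, 2)


if __name__ == "__main__":
    active = scaled_gene_count(2)          # two functional genes in the clone
    homo, hetero = dimer_species(active)
    print(active, homo, hetero, homo + hetero)
```

Under these assumptions, 18 variants would allow 18 homodimers and 153 heterodimers, which illustrates why chromatographic fractions are unlikely to contain a single molecular species.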
The patatin locus appears to evolve rapidly, giving rise to extensive patatin sequence variability among potato cultivars. This might be explained by a non-essential storage function of patatin proteins and the complexity of the patatin locus, with patatin genes consisting of 5-7 exons and an abundance of pseudo genes. Therefore, it has been essential to sequence Kuras patatin genes for protein assignments. We have sequenced 500 full-length cDNA single clones from cv Kuras, including seven different patatins (GenBank™ accession numbers DQ114415 through DQ114421, supplemental Table S4), and assembled Kuras contigs from ~9000 good expressed sequence tag sequences (12) (supplemental database S1), which indicate the presence of 25 different patatins in Kuras tubers. The sequences corresponding to Kuras GenBank™ accession numbers DQ114415 through DQ114421 are from single clones sequenced by primer walking. This guarantees that they are not chimeras, which might be the case for contigs. So, the seven pat1-k1 through pat3-k1 sequences of Fig. 2 represent true patatin genes in Kuras. In contrast, the Kuras patatin gene for pat4-k1 is unknown but is highly similar to gene X03932 from a haploid potato (21).
In addition to the many MS/MS data with a perfect fit to the patatins in Fig. 2, we have 31 high quality patatin-like chymotryptic sequences, each seen at least twice, which deviate by one residue from pat1-k1 through pat4-k1 (supplemental Table S6). These most likely represent less expressed, unknown Kuras patatin gene variants or alleles. The data (full list of significantly scoring chymotryptic peptides; not shown) provide evidence for at least 24 different patatin proteins; taken together with our 25 different Kuras patatin contigs, this documents the high complexity and polymorphism of the potato patatin locus.
The lists of compiled sequences from fractions 43-60 show unambiguously that the last six residues of translated Kuras tuber patatins are absent in the mature vacuolar patatins and thus comprise a ct-propeptide. Indeed, the last ct-residues, ASY or ASF, are similar to the ends of many other ct-propeptides of vacuolar storage proteins (reviewed in Refs. 22,23).
Fig. 4 shows the pat-17 structure, highlighting the locations of glycans and the interactions of the C-terminal residues of mature patatin. The amino acid sequence of pat-17 from Solanum cardiophyllum is very similar to patatins pat1-k1 through pat4-k1 from Solanum tuberosum cv Kuras. The recombinant non-glycosylated form was expressed in Escherichia coli with an N-terminal histidine tag and all of the translated C-terminal residues (Fig. 2), which the present study shows are removed in mature potato patatins as a ct-propeptide. N-linked complex glycans observed experimentally in Kuras patatins are located at positions homologous to asparagines 60, 90, 115, and 203 and threonine 270 of pat-17, although only asparagine 203 can be glycosylated in pat-17. All sites of glycan attachment are located in loops or turns on the molecular surface, as seen in Fig. 4. Glycans increase solubility in salt solutions and decrease solubility in organic solutions (24). Glycans also protect against protein unfolding by decreasing the rate of unfolding (25).
The crystals of pat-17 contained three molecules per unit cell, A, B, and C. Fig. 4 shows the A molecule for which four residues of the ct-propeptide, 382 ANKA 385, could be traced (2). The structure shows the ct-propeptide sticking out from the remainder of the molecular surface. This protrusion becomes even more pronounced if the electron density of the last two unstructured residues (386-387) is visualized (not shown). Therefore, the ct-propeptide, including its free α-carboxylate, ANKASY-COO−, is easily accessible to a vacuolar sorting receptor (VSR). A homologous cytosolic patatin from Hevea brasiliensis, Hev b 7 (26), ends at position 384 of Fig. 2 and has no ct-propeptide-like residues.
Ct-peptides of the general "hydrophobic-negative charge" pattern are well known ligands of some VSRs (22,23). Many studies have associated ct-peptides with certain VSRs (also referred to as ELPs, BP-80, or PV-72 homologs), and sequence-specific vacuolar sorting signals with others. We have aligned 16 VSRs from potato, Arabidopsis (AtVSR1 through AtVSR7), and other plants (not shown). All ~625-residue VSR transmembrane proteins, whether they bind ct-propeptides or sequence-specific vacuolar sorting signals, have homologous sequences: an N-terminal signal peptide (~20 residues), a luminal domain (550 residues) containing 30 residues followed by a PA domain (protease associated, 114 residues), another 250 residues followed by three EGF-like domains (3 × 50 residues), a 10- to 15-residue serine/threonine-rich segment, the transmembrane domain (22 residues), and a cytoplasmic domain (35 residues) with a 4-residue internalizing motif (see for example, UniProt O22925). Thus the domains of the luminal 550 residues of any VSR might have adapted to different high affinity ligands such as the various ct-propeptide or sequence-specific vacuolar sorting signal motifs.
Vacuolar proteins are recognized for sorting as discussed, but they are also bound and cleaved by a vacuolar processing enzyme. We analyzed the interactions of the helical C-terminal residues 377 RKKLR 381 of mature patatins within a 5-Å sphere. Invariant arginine 377 is intimately bound to invariant aspartate 71 in a binary mode via short 2.80- and 2.92-Å contacts between the terminal nitrogen atoms of the guanidinium group and the side-chain carboxylate oxygens (salt bridge) (Fig. 4). One of the nitrogens has an additional 2.77-Å hydrogen bond to the backbone carbonyl oxygen of methionine 28 (residue 4 of mature patatin). The following residues have weak contacts only. The side-chain nitrogen of lysine 379 is 3.65 Å away from the carboxylate of aspartate 59. Also the contacts of the C-terminal arginine 381 and the N-terminal residues 25 LGE 27 are weak, because the carboxylate of glutamate 27 is pointing away from arginine 381. We conclude that the C-terminal residues of mature patatins after arginine 377 are loosely bound and, therefore, might easily unfold on binding to a processing enzyme, if required. The processing enzyme is presently unknown in potato tuber vacuoles. Because our proteomics data show the cleavage being exclusively after arginine 381, we propose that it might be an endopeptidase rather than an exopeptidase involved in removal of the six-residue patatin ct-propeptide.