New functions of the thylakoid membrane proteome of Arabidopsis thaliana revealed by a simple, fast, and versatile fractionation strategy.

Identification of membrane proteomes remains challenging. Here, we present a simple, fast, and scalable off-line procedure based on three-phase partitioning with butanol to fractionate membrane proteomes in combination with both in-gel and in-solution digestions and mass spectrometry. This should help to further accelerate the field of membrane proteomics. Using this new strategy, we analyzed the salt-stripped thylakoid membrane of chloroplasts of Arabidopsis thaliana. 242 proteins were identified, at least 40% of which are integral membrane proteins. The functions of 86 proteins are unknown; these include proteins with TPR, PPR, rhodanese, and DnaJ domains. These proteins were combined with all known thylakoid proteins and chloroplast (associated) envelope proteins, collected from primary literature, resulting in 714 non-redundant proteins. They were assigned to functional categories using a classification developed for MapMan (Thimm, O., Blasing, O., Gibon, Y., Nagel, A., Meyer, S., Kruger, P., Selbig, J., Muller, L. A., Rhee, S. Y., and Stitt, M. (2004) Plant J. 37, 914-939), updated with information from primary literature. The analysis elucidated the likely location of many membrane proteins, including 190 proteins of unknown function, holding the key to better understanding the two membrane systems. The three-phase partitioning procedure added a new level of dynamic resolution to the known thylakoid proteome. An automated strategy was developed to track possible ambiguous identifications to more than one gene model or family member. Mass spectrometry search results, ambiguities, and functional classifications can be searched via the Plastid Proteome Database.

Chloroplasts in green algae and plants have evolved from a cyanobacterial endosymbiont. They contain a small genome that encodes ~120 proteins and RNA molecules. Most chloroplast proteins are encoded by the nuclear genome, synthesized in the cytosol as precursors, and subsequently targeted to chloroplasts via an N-terminal chloroplast transit peptide (cTP). This cTP is proteolytically removed after import into chloroplasts. Chloroplasts contain three membrane systems: a double membrane system of inner and outer envelopes, surrounding the organelle, and the thylakoid membrane system. Thylakoid membranes contain four major multisubunit protein complexes involved in photosynthesis (2), comprising nearly 100 proteins; 65 of these have one or more transmembrane domains (TMDs) (3).
Subcellular prediction of all proteins encoded by the Arabidopsis nuclear genome using TargetP (4) suggested that 4255 proteins have a cTP (14.9% of the total Arabidopsis proteome); at least 520 of these are predicted to have one or more TMDs and are located in the inner envelope or thylakoid membrane (5). Many of the proteins in the outer chloroplast envelope membrane do not have a cleavable cTP that can be predicted by the current subcellular localization predictors. These hydrophobic proteins are distributed between the envelope and the thylakoids via a sorting process that remains elusive. Recent proteomics studies on both membrane systems identified significant sets of proteins in and associated with the membrane system (3,6,7). However, it is unlikely that these studies have been exhaustive, and many low abundance proteins remain to be detected. This motivated us to further analyze the thylakoid membrane proteome and to develop a method that allows identification of low abundance proteins.
20-30% of the genes in any sequenced genome encode integral membrane proteins with one or more α-helical TMDs (8). These membrane proteins fulfill critical functions in the transport of ions, small organic molecules, and proteins and in intra- and intercellular communication. Unfortunately, membrane proteomes are difficult to analyze experimentally because of their hydrophobic nature, which results in insolubility and adsorption, and because ionic detergents are incompatible with mass spectrometry.
Different experimental strategies for large-scale identification of membrane proteins have been explored (reviewed in Ref. 9). Despite extensive efforts to synthesize nonionic detergents for separation of membrane proteins on two-dimensional gels with immobilized pH gradient strips in the first dimension, no significant membrane protein separation on two-dimensional electrophoresis gels with high dynamic resolution has been reported (10). Organic solvent extraction using a mixture of chloroform/methanol (3,6,11,12), direct extraction by methanol (13), and chromatography (14-16) have proven more successful, combined with one-dimensional SDS-PAGE, followed by either in-gel digestion and mass spectrometry (MS) or in-solution digestion and on-line liquid chromatography (LC)-tandem mass spectrometry (MS/MS). CNBr cleavage of the insoluble fraction of yeast cells combined with extensive two-dimensional LC-MS/MS (MudPit) has also been successful in identification of yeast membrane proteins (17). Recently, direct methanol extraction of membranes followed by in-solution trypsin digestion in buffered methanol of membrane fragments and nano-LC-electrospray ionization (ESI)-MS/MS was reportedly successful in identifying a significant fraction of integral membrane proteins (18,19).
Despite these successes, simple and fast fractionation methods for membrane proteomes are lacking that (i) do not require HPLC, (ii) allow for removal of lipids and apolar pigments such as chlorophylls and carotenoids, (iii) allow processing of small amounts of membrane proteins, and (iv) are compatible with ESI-MS. Such off-line fractionation techniques are needed to reach dynamic resolutions that are sufficient to identify ion channels and transmembrane signal transducers as well as low abundance enzymes such as kinases and phosphatases.
Three-phase partitioning (TPP) was originally developed in the 1950s to purify proteins using 1-butanol in the upper phase and (NH4)2SO4 and pH in the aqueous lower phase (20). Low speed centrifugation will induce an interphase with precipitated proteins above the aqueous phase because butanol bound to proteins increases their buoyancy (21). The precipitated proteins are largely free of salts and are easy to resuspend. The amount of salt required to precipitate a protein varies inversely with the protein's molecular mass, and proteins precipitate most readily at their pI (22). The method has previously been used for the purification of soluble proteins and recently for the purification of overexpressed proteins (23-25). However, this methodology has not been explored for integral membrane proteins.
In this work, we have adapted the original TPP method to hydrophobic proteins and used this modified protocol to fractionate the proteome of salt-stripped thylakoids of chloroplasts from the model plant Arabidopsis thaliana. After removal of SDS, fractionated membrane proteins were solubilized in (CH3)2SO, digested with trypsin, and analyzed by on-line reverse phase (RP)-nano-LC-ESI-MS/MS. In addition, protein fractions were separated by SDS-PAGE and in-gel digested with trypsin, followed by on-line RP-nano-LC-ESI-MS/MS.
A total of 242 proteins were identified, at least 40% of which were predicted transmembrane proteins, and 50% of which were not identified in our earlier thylakoid membrane analysis (3). Most of the proteins newly identified here have no predicted functions, and they include several proteins with DnaJ-like domains, rhodanese domains, or PPR/TPR domains. All proteins, together with thylakoid-associated proteins identified in other studies, were cross-checked against experimental analysis of the envelope membrane proteome surrounding the chloroplast. These 714 non-redundant proteins were classified in functional categories using an adapted classification developed for MapMan (26) combined with information from KEGG, Aracyc, and functional domain predictors as well as primary literature. This analysis showed strong differentiation in cellular functions between the two membrane systems and elucidated the suborganellar location of many chloroplast membrane proteins. The data, including details of MS-based identification as well as functional classifications, can be extracted and searched via the Plastid Proteome Database (PPDB; ppdb.tc.cornell.edu/).

EXPERIMENTAL PROCEDURES
Plant Growth and Thylakoid Preparations-A. thaliana (Columbia ecotype 0) was grown on soil with 10-h light/14-h dark periods at 22/19°C (light/dark). Thylakoids were purified from fully developed leaves using a combination of differential and gradient centrifugation steps as described (3).
Removal of Water-soluble and Peripheral Thylakoid Proteins-The thylakoid membranes were resuspended at 10 mg/ml chlorophyll in ice-cold 10 mM Hepes containing a mixture of protease inhibitors as detailed previously (27). The suspension was diluted 20 times with ice-cold solutions of 2 M KBr and KNO3 to reach a final concentration of 1 M each. The suspension was stirred slowly for 30 min on ice with 10-s sonication steps every 5 min. Membranes were collected by centrifugation at 150,000 × g for 25 min at 4°C, washed with 10 mM Tris-HCl (pH 8) to remove the remaining salts, and resuspended in solubilization buffer (10 mM Tris-HCl (pH 8), 2% SDS, 5 mM TBP, and 6 M urea). The solubilized membranes were used as starting material for the TPP fractionation. Salt-extracted proteins in the supernatant (water-soluble and peripheral) were precipitated with 10% trichloroacetic acid, left on ice for 15 min, and collected by centrifugation (10,000 × g for 25 min at 4°C). The pellet was washed with 10 mM Tris-HCl (pH 8) and resuspended in solubilization buffer.
One-dimensional Electrophoretic Analysis-Proteins from the different samples were heated at 100°C for 2 min and spun for 5 min at 14,000 × g prior to separation by one-dimensional SDS-PAGE. Proteins were separated on 24 × 18-cm Tricine/SDS/urea-polyacrylamide gels (8-15% acrylamide gradient) (28). The gels were fixed, and proteins were visualized by high sensitivity silver staining (29), Coomassie Brilliant Blue R-250, or SYPRO Ruby (Molecular Probes, Inc.). Images of the gels were taken with an Amersham Biosciences Personal Densitometer™ II or FluorS (Bio-Rad).
Extraction of Pigments and Fractionation of Hydrophobic Membranes Using TPP-KBr/KNO3-treated thylakoid membranes (containing 20 mg of chlorophyll) were resuspended in solubilization buffer. To avoid interference with protein separation or MS, chlorophylls and carotenoids were removed prior to fractionation by sequential extractions with 1-butanol saturated with solubilization buffer. 1-Butanol was added to the solubilized membrane proteins at a 4:7 ratio to facilitate two-phase separation between a butanol-enriched upper phase and the aqueous lower phase. Since pigments (unlike proteins) preferentially partition to this upper phase, they could easily be removed by collecting the pigment-enriched upper phase (after shaking the upper and lower phases and spinning the tubes for 5 min at 7000 × g). The upper phase was then replaced with a fresh upper phase. Five extractions were necessary to remove the pigments from the lower phase.
Once the pigments were removed, fractionation of the hydrophobic membrane proteins was achieved either by stepwise increasing the (NH4)2SO4 concentration or by stepwise shifting the pH. After each step, an interphase (or third phase) consisting of precipitated proteins was collected between the upper and lower phases and resuspended in solubilization buffer by sonication. Six fractions were collected using six different steps: 1) TPP1, addition of 55 mM (NH4)2SO4 (crystals); 2) TPP2, a shift from pH 8 to 5 with Mes (50 mM final concentration); 3) TPP3, addition of (NH4)2SO4 to 275 mM; 4) TPP4, saturation of the solution with (NH4)2SO4; 5) TPP5, shift to very low pH (~1) by addition of phosphoric acid (150 mM final concentration); and 6) TPP6, shift to very high pH (~14) by addition of sodium hydroxide (400 mM final concentration). No proteins were detected in the collected upper phases.
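For readers who want to script or log the fractionation, the six steps listed above can be written down as an ordered data structure. The following is only a minimal sketch: the class name `TppStep`, the `kind` labels, and the helper `steps_of_kind` are illustrative assumptions and are not part of the published protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TppStep:
    """One precipitation step of the TPP scheme (names are illustrative)."""
    name: str    # fraction label used in the text
    kind: str    # "salt" for an (NH4)2SO4 step, "pH" for a pH shift
    action: str  # condition as stated in the text


# Ordered exactly as described in the text; order matters because each
# step is applied to the lower phase left over from the previous one.
TPP_SCHEME = [
    TppStep("TPP1", "salt", "add (NH4)2SO4 crystals to 55 mM"),
    TppStep("TPP2", "pH", "shift from pH 8 to 5 with Mes (50 mM final)"),
    TppStep("TPP3", "salt", "raise (NH4)2SO4 to 275 mM"),
    TppStep("TPP4", "salt", "saturate with (NH4)2SO4"),
    TppStep("TPP5", "pH", "shift to pH ~1 with phosphoric acid (150 mM final)"),
    TppStep("TPP6", "pH", "shift to pH ~14 with sodium hydroxide (400 mM final)"),
]


def steps_of_kind(kind):
    """Return the fraction names whose step is of the given kind."""
    return [step.name for step in TPP_SCHEME if step.kind == kind]
```

Calling `steps_of_kind("salt")` lists the three (NH4)2SO4 steps, and `steps_of_kind("pH")` lists the three pH shifts, which makes the alternation of the two precipitation principles easy to see.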
Protein Digestion, MS Analysis, and Extraction of MS Data-More than 700 protein spots were excised manually from the one-dimensional gels. The spots were automatically washed, digested with modified trypsin (Promega) (30), and extracted using the ProGest robot (Genomic Solutions, Ann Arbor) as summarized previously (3).
For in-solution digestion, aliquots from the dissolved TPP fractions were precipitated, and SDS was removed essentially as described (31). Proteins were precipitated with 100% acetone for 30 min at −20°C. Precipitates were collected by centrifugation at 10,000 × g for 15 min at 4°C. The supernatant was removed, and a solution of 80% acetone, 10% methanol, and 0.2% acetic acid was added. The samples were incubated for 30 min at −20°C, followed by centrifugation at 18,000 × g for 15 min at 4°C. The residual acetone was removed by vacuum centrifugation. Because it has been reported that trypsin is active in 10% Me2SO (32), we solubilized the proteins in 100% Me2SO for 20 min, followed by dilution with digestion buffer (final concentrations of 50 mM ammonium bicarbonate and 10% Me2SO; 1:20 (w/w) trypsin/protein ratio). The proteins were digested overnight at 37°C. The Me2SO was then removed by vacuum centrifugation of the peptide mixtures.
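The dilution and enzyme amounts implied by these final concentrations follow from simple arithmetic. The sketch below works them out under stated assumptions: the function name `digestion_plan`, its parameters, and the example amounts are hypothetical, and the trypsin is assumed to be delivered within the digestion buffer volume.

```python
def digestion_plan(protein_ug, dmso_ul, final_dmso_fraction=0.10,
                   trypsin_to_protein=1 / 20, final_bicarbonate_mm=50.0):
    """Work out volumes and amounts for the in-solution digest.

    protein_ug: mass of precipitated protein (assumed known, in micrograms)
    dmso_ul:    volume of 100% Me2SO used to dissolve the pellet (microliters)
    """
    if protein_ug <= 0 or dmso_ul <= 0:
        raise ValueError("protein mass and Me2SO volume must be positive")
    if not 0 < final_dmso_fraction < 1:
        raise ValueError("final Me2SO fraction must be between 0 and 1")

    # 1:20 (w/w) trypsin/protein ratio, as stated in the text.
    trypsin_ug = protein_ug * trypsin_to_protein
    # Diluting 100% Me2SO to the final fraction fixes the total volume.
    final_volume_ul = dmso_ul / final_dmso_fraction
    buffer_ul = final_volume_ul - dmso_ul
    # The added buffer must be slightly more concentrated than the final
    # value because the Me2SO volume contains no ammonium bicarbonate.
    buffer_stock_mm = final_bicarbonate_mm * final_volume_ul / buffer_ul
    return {
        "trypsin_ug": trypsin_ug,
        "final_volume_ul": final_volume_ul,
        "buffer_ul": buffer_ul,
        "buffer_stock_mm": buffer_stock_mm,
    }
```

For example, 100 µg of protein dissolved in 10 µl of Me2SO would call for 5 µg of trypsin and 90 µl of buffer, giving a 100-µl digest at 10% Me2SO.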
All samples were analyzed using nano-LC-ESI-MS/MS in automated mode on a quadrupole/orthogonal acceleration time-of-flight tandem mass spectrometer (Micromass Q-TOF1). Details for on-line RP-HPLC were described previously (33). The spectra were searched against public databases (downloaded locally) in automated fashion using an in-house Mascot server (Matrix Science). For the Mascot searches, the maximum peptide and fragment errors were 1.2 and 0.8 Da, respectively. The probability-based Mowse score, the number of matching peptides, and the highest peptide score were extracted from the Mascot peptide summary report pages using software written in-house. The minimum criteria for identification were as follows: (i) one matching peptide with a peptide score higher than the minimum significant (p < 0.05) individual ion score, (ii) two matching peptides with a peptide score >21, or (iii) three matching peptides with a peptide score ≥20. Since >50% of Arabidopsis proteins are part of multigene families, protein identifications were checked for ambiguities resulting from the presence of multigene families or conserved domains. This was largely automated via software written in-house. Proteins that were not identified with any "unique" peptides (which are not shared with any other protein entries) were manually verified, and peptides were ignored if a better fit to the MS/MS spectrum could be found in a higher ranking entry.
The highest peptide score, the probability-based Mowse score from the peptide summary report, and the number of matching peptides as well as remaining ambiguities can be viewed directly in PPDB by searching for an accession number. All identifications based on one matching peptide or low Mascot scores were manually verified, and all proteins that were identified only once were carefully considered.
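The three minimum identification criteria can be expressed as a small predicate. This is a minimal sketch and not the in-house software mentioned above: the function name and arguments are hypothetical, each criterion is assumed to apply to every peptide counted toward it, and the significance threshold must be supplied by the caller because it depends on the individual Mascot search.

```python
def meets_minimum_criteria(peptide_scores, significance_threshold):
    """Return True if a protein hit satisfies any of the three criteria.

    peptide_scores:         Mascot ion scores of the matching peptides
    significance_threshold: minimum significant (p < 0.05) individual ion
                            score reported for the search (assumed input)
    """
    scores = list(peptide_scores)
    # (i) one matching peptide scoring above the significance threshold
    one_significant = any(s > significance_threshold for s in scores)
    # (ii) two matching peptides, each with a score > 21
    two_above_21 = sum(s > 21 for s in scores) >= 2
    # (iii) three matching peptides, each with a score >= 20
    three_at_least_20 = sum(s >= 20 for s in scores) >= 3
    return one_significant or two_above_21 or three_at_least_20
```

With an illustrative threshold of 35, a single peptide scoring 40 passes, whereas two peptides scoring exactly 21 do not, because criterion (ii) requires scores strictly above 21.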
The Plastid Proteome Database-The construction of PPDB was described previously (3). PPDB contains the theoretical analysis of all Arabidopsis entries (currently Release 4.0 of ATH1.pep, with 28,581 nuclear encoded Arabidopsis proteins as well as the mitochondrial and plastid genomes). Predicted TMDs from the Aramemnon Database (aramemnon.botanik.uni-koeln.de/) are from Release 1.2. In contrast to the earlier PPDB release, the Mascot scores, the number of matching peptides, and the highest peptide score for each identification as well as the functional classification are now listed. Ambiguous identifications of members of multigene families can be directly viewed in PPDB.

Fractionation of Hydrophobic Thylakoid Proteins Using TPP
The latest genome-wide analysis of the predicted Arabidopsis proteome suggested that the chloroplast proteome contains 520 proteins with one or more TMDs when using TMHMM as the predictor (www.cbs.dtu.dk/services/TMHMM/) (5). When using the consensus prediction listed in the latest release (Release 1.2) of the Aramemnon Database, 728 proteins with TMDs were found. These integral membrane proteins can be part of the thylakoid membrane or inner envelope membrane. The number of integral membrane proteins in the outer membrane is unknown because we cannot predict proteins in this membrane system (most do not have cTPs). Currently, fewer than 200 integral membrane proteins in the thylakoid and envelope membrane systems together have been identified, despite significant experimental efforts in both membrane systems. These efforts included substantial fractionation with organic solvents of salt-stripped thylakoid and envelope membranes, followed by ESI-MS/MS, as well as one-dimensional gels combined with on-line nano-LC-MS/MS (3,6,16,36,37). In addition, many integral membrane proteins were initially identified via genetic tools and more "traditional" protein analyses (see Ref. 5).
To identify additional integral membrane proteins or proteins that are tightly bound to the thylakoid via lipid anchors or short helical structures parallel to the membrane plane, we developed a new fractionation protocol based on the "three-phase partitioning" technique using t-butyl alcohol, originally developed for soluble proteins (21). These TPP protocols were adapted here to membrane proteins using salt-stripped thylakoid membranes as our target proteome (Fig. 1, A and B). This adaptation comprises essentially the switch from t-butyl alcohol to 1-butanol, addition of solubilization buffer (SDS, urea, and TBP) to the upper and lower phases, as well as selection of appropriate (NH4)2SO4 concentrations and pH shifts. We chose to use 1-butanol because it does not require any addition of (NH4)2SO4 to form a two-phase system, which proved important for the fractionation of membrane proteins. The 1-butanol was saturated with SDS solubilization buffer to avoid depleting the SDS from the lower phase during the extraction of pigments, which would cause massive precipitation of the hydrophobic proteome. It should be pointed out that 1-butanol is only poorly miscible with water (maximum of 7%, w/v), but this increases up to ~20% when SDS and urea are added (data not shown).
Prior to the 1-butanol fractionation, the salt-stripped thylakoid membranes were solubilized with urea, SDS, and reductant to dissociate the lipids and to denature proteins and protein complexes. In addition, this liberated the large amount of apolar pigments (chlorophylls and carotenoids) from their protein carriers in the thylakoid membranes. Stripping of the membrane was carried out with 1 M KBr and 1 M KNO3 combined, and we found this very effective (see "Experimental Procedures" for details) (Fig. 2, A and B). This detergent/lipid/pigment/protein mixture was then mixed with 1-butanol at a 4:7 ratio, leading to partitioning of chlorophylls and carotenoids to the organic upper phase (Fig. 1, A and B). The pigment-enriched upper phase was then removed and replenished with new 1-butanol. Five successive extractions were necessary to completely remove the pigments from the lower phase. Analysis of these pooled upper phases showed that they contained only very small amounts of proteins from carryover (data not shown). Fig. 1B shows the efficiency of the pigment extraction; whereas the butanol phase from the first extraction (fraction E1) was dark green, that from the last extraction (fraction E5) was almost colorless. During the extractions (TPP0), a red precipitate was formed (Fig. 1B) that was highly enriched in carotenoids and devoid of proteins (data not shown).
Once the pigments were removed, the proteins were stepwise precipitated by changing the concentrations of (NH4)2SO4 or changing the pH in the solution (Fig. 1A). The interphases (TPP1-TPP6) collected during the fractionation were all white/gray, with the exception of TPP1, which was slightly green (Fig. 1B). Different (NH4)2SO4 concentrations and pH steps were tested, and the scheme presented in Fig. 1A was optimal for the thylakoid proteome. The amounts of proteins that precipitated in each TPP fraction were determined and tested for reproducibility by protein determinations and separation by SDS-PAGE (see legend to Fig. 1). Image analysis of four different gel-separated extractions stained with the fluorescent dye SYPRO Ruby also showed a very good reproducibility (Fig. 2C).
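The average yields given in the legend to Fig. 1 can be checked for overall recovery and re-expressed relative to the stripped membrane alone. The short sketch below does this; the variable and function names are illustrative, and it assumes that all yields are percentages of the same starting amount of thylakoid protein, which the legend does not state explicitly.

```python
# Average yields (%) as listed in the legend to Fig. 1.
AVERAGE_YIELDS_PERCENT = {
    "peripheral": 12.0,
    "TPP1": 78.0,
    "TPP2": 1.7,
    "TPP3": 1.5,
    "TPP4": 3.9,
    "TPP5": 0.2,
    "TPP6": 2.1,
}


def total_yield(yields):
    """Sum of all fractions: an estimate of overall recovery."""
    return sum(yields.values())


def share_within_tpp(yields):
    """Express each TPP fraction as a percentage of the six TPP fractions only."""
    tpp = {name: y for name, y in yields.items() if name.startswith("TPP")}
    tpp_total = sum(tpp.values())
    return {name: 100.0 * y / tpp_total for name, y in tpp.items()}
```

Under these assumptions the fractions add up to about 99.4%, and TPP1 alone accounts for roughly 89% of the protein recovered in the six interphases, which illustrates why the later, much smaller fractions help in reaching low abundance proteins.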

Fig. 2 (A and B) shows the protein profiles (stained with silver or Coomassie Brilliant Blue) of the total thylakoid membrane, the KBr/KNO3-extracted peripheral population, the stripped membrane (the starting material for TPP fractionation), and the six TPP fractions. MS analysis showed that the major bands in the salt-extracted fraction corresponded to the three proteins of the water-splitting complex of Photosystem II and the α- and β-subunits of the CF1 complex (ATP synthase). These proteins were nearly completely removed from the stripped thylakoid membranes. The major bands in the stripped thylakoid membranes were the very abundant antenna proteins of Photosystems I and II as well as other membrane subunits of Photosystems I and II, the cytochrome b6f complex, and the CF0 complex of the ATP synthase.
The membranes were tested for contamination with chloroplast envelopes by Western blotting as described previously (3). As will become clear from the very extensive MS analysis (described below), the starting material did not contain any of the abundant chloroplast envelope proteins (e.g. the triose phosphate translocator representing 10% of the protein mass of envelopes) and only one protein of the envelope protein translocase complex Tic-Toc (Tic40). No obvious components from mitochondria or other non-plastid membrane systems (plasma membrane, endoplasmic reticulum, etc.) were identified, with the exception of a weak ambiguous identification of two predicted mitochondrial members of the FtsH protease family (see below). Thus, we concluded that the starting material for TPP fractionation and MS analysis was sufficiently pure to obtain meaningful proteome data and to assign the vast majority of the identified proteins to the thylakoid membrane.
We started the TPP fractionation at pH 8, which corresponds to the in vivo pH of the stroma. The hydrophobic proteome seems to have a population of proteins very sensitive to (NH4)2SO4 (TPP1) since a large portion of proteins precipitated already at 55 mM, including a significant amount of the abundant light-harvesting chlorophyll-binding proteins. A subsequent decrease from pH 8 to 5 was sufficient to selectively precipitate a number of proteins such as a 100-kDa protein (At3g18890) with unknown function (Fig. 2B, TPP2 lane). By then increasing the (NH4)2SO4 concentration to 275 mM, a broad range of proteins precipitated quite specifically (TPP3): allene-oxide synthase (At5g42650), a glycine-rich protein with unknown function (At4g01050), the four thylakoid membrane members of the FtsH protease family (FtsH1, FtsH2, FtsH5, and FtsH8), and the thylakoid-bound nucleoid-binding protein MFP1 (At3g16000) (38) (Fig. 2B, TPP3 lane). Some of the very abundant light-harvesting chlorophyll-binding proteins were still found in this fraction, but clearly the majority of them precipitated in TPP1 and TPP2. In the subsequent fraction (TPP4, formed by addition of (NH4)2SO4 until saturation), light-harvesting chlorophyll-binding proteins no longer significantly contributed to the population. Instead, the major bands were small proteins below 20 kDa. Components of the photosynthetic apparatus (PsbS, CF0-II, PsbR, and PsaF) were the more abundant proteins in this fraction (Fig. 2B, TPP4 lane).
FIG. 1. Scheme of fractionation of the hydrophobic thylakoid membrane proteome by TPP. Thylakoid membranes were stripped of water-soluble and peripheral proteins using chaotropic agents (KBr and KNO3, both at 1 M) combined with sonication and dissolved in solubilization buffer at a final concentration of 3 mg/ml chlorophyll (A). 1-Butanol (n-butanol) saturated with solubilization buffer was added to the solubilized membrane at a ratio of 4:7. Chlorophylls and carotenoids were removed by sequential extraction. Upper and lower phases were vigorously shaken and spun at 7000 × g for 5 min. A tiny interphase of precipitated carotenoids (TPP0) (B) was collected at the interface (or third phase) between the chlorophyll-enriched upper phase and the lower phase. The upper phase was also collected and replaced with a fresh upper phase to extract more chlorophylls. Five extractions (fractions E1-E5) (B) were necessary to remove all the chlorophylls. The membrane proteins were then fractionated by increasing the concentration of (NH4)2SO4 or by a pH shift. After each step, the third phase was collected and resuspended in solubilization buffer by sonication for a further separation by one-dimensional SDS-PAGE. The conditions under which the different fractions (TPP1-TPP6) (B) were precipitated and collected are indicated in the scheme. Average yields are as follows: peripheral fraction, 12%; TPP1, 78%; TPP2, 1.7%; TPP3, 1.5%; TPP4, 3.9%; TPP5, 0.2%; and TPP6, 2.1%.
To be sure that we precipitated all proteins, we further changed the pH of the solution to first a very low pH (TPP5) and then to a very high pH (TPP6). PsbS (22 kDa) was the most abundant protein in the last two fractions, in addition to several proteins below 10 kDa (Fig. 2B, TPP5 and TPP6 lanes).

Identification of the Hydrophobic Thylakoid Proteome by In-gel Digestion and MS
FIG. 2. [...] Fig. 1). Selected protein identities are indicated on the Coomassie Blue-stained gel. Note that in most protein bands more than one protein was detected. The gel image is split by a line; this is because these lanes were derived from different gels. Relevant molecular mass markers are indicated for the two parts of the gel image. C, SYPRO Ruby-stained gel lanes with TPP3 and TPP6 from four independent TPP fractionations, demonstrating the reproducibility of the TPP extraction procedure. LHCP, light-harvesting chlorophyll-binding protein; AOS, allene-oxide synthase; cytf, cytochrome f; GGR, geranylgeranyl reductase.
An extensive MS analysis using one-dimensional RP-LC-ESI-MS/MS was carried out after in-gel digestion with trypsin. About 700 gel plugs (2.0-mm diameter) were excised from the gel-separated TPP fractions. Proteins in the gel plugs were digested with trypsin, and the extracted peptides were analyzed by RP-LC-ESI-MS/MS. The rationale for this extensive sampling was that we also "picked" gel regions outside the visible protein bands, avoiding domination of spectra by peptides derived from the most abundant proteins and enabling us to identify proteins with relatively low expression levels.
Indeed, 242 proteins were identified; 21 of these were chloroplast-encoded proteins, and the remainder were nuclear encoded proteins (Supplemental Table 1). The minimum criteria for MS-based identification are detailed under "Experimental Procedures," and the probability-based Mowse score (as a measure of significance), the number of matched peptides, and the highest peptide score in Mascot are available via PPDB for each accession number and for each time the accession number was identified. Many of the proteins were identified more than once, providing additional significance to the identifications. As will be discussed in more detail, care was taken not to overannotate proteins that are members of multigene families. In case the MS/MS data could not distinguish between two closely related proteins, their identification was marked as "ambiguous," and these can also be viewed in PPDB (see "Experimental Procedures" for details and "Discussion").
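The idea of marking an identification as "ambiguous" when no peptide distinguishes it from a related protein can be illustrated with a few lines of code. This is a simplified sketch and not the software used for PPDB: the function name, the input format, and the placeholder protein and peptide labels in the usage note are all hypothetical.

```python
from collections import defaultdict


def classify_identifications(protein_to_peptides):
    """Label each protein "unique" or "ambiguous".

    protein_to_peptides maps a protein accession to the set of peptide
    sequences matched to it. A protein is "unique" if at least one of its
    peptides is matched to no other protein in the mapping; otherwise all
    of its evidence is shared and it is labelled "ambiguous".
    """
    # Invert the mapping: which proteins does each peptide point to?
    peptide_owners = defaultdict(set)
    for protein, peptides in protein_to_peptides.items():
        for peptide in peptides:
            peptide_owners[peptide].add(protein)

    labels = {}
    for protein, peptides in protein_to_peptides.items():
        has_unique = any(len(peptide_owners[p]) == 1 for p in peptides)
        labels[protein] = "unique" if has_unique else "ambiguous"
    return labels
```

For instance, if placeholder family members `member_A` and `member_B` share the peptide `PEPTIDEONE` and only `member_A` also has `PEPTIDETWO`, then `member_A` is labelled "unique" and `member_B` "ambiguous"; the latter would be a candidate for manual verification.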
The nuclear encoded proteins were analyzed for the presence of a predicted cTP using TargetP. 84% of the 221 identified nuclear encoded proteins were predicted to have a cTP, suggesting that indeed most, if not all, are chloroplast proteins. After correction of detected incorrect gene models, the number of predicted cTP-containing proteins increased to 85% (see below). This is very similar to sensitivities of TargetP with curated data sets (4,5), despite reports of much lower sensitivities (see "Discussion") (37). To determine the number of proteins with TMDs, we used the most conservative predictor TMHMM, verified the prediction in case the topology had been experimentally determined, and corrected the prediction where necessary. After correction, 40% of the proteins were predicted to have at least one TMD. A consensus prediction from the Aramemnon Database (Release 1.2) suggested nearly 50% TMD-containing proteins. All these parameters and predictions can be retrieved via PPDB.

In-solution Digestion of the Six TPP Fractions and Compatibility with ESI-MS/MS
In-solution enzyme digestion of hydrophobic proteins has proven to be very difficult for various reasons. However, if successful, it provides several major advantages over in-gel digestions. Particularly, it avoids time-consuming gel separation steps (gel cutting, washing, and concentration of relatively large volumes of peptides). In addition, sensitivity is expected to increase since gel extraction of digested proteins leads to losses, avoided by in-solution digests.
Therefore, we made an effort to develop a protocol that can digest the TPP fractions with trypsin (the protease of choice) in-solution. Using a combination of a protocol for SDS removal by acetone precipitation (31) and in-solution trypsin digestion in 10% Me2SO, we were able to obtain very efficient and reproducible digestions while being fully compatible with nano-RP-HPLC and ESI-MS/MS. Efficiency of digestion was verified by MALDI time-of-flight MS measurements in linear mode (data not shown). Compared with the very extensive analysis of gel-separated proteins discussed above, no additional proteins were identified by the limited MS/MS analysis of the in-solution trypsin digestion. However, the in-solution digestion typically resulted in higher sequence coverage (and higher Mascot scores) for abundant hydrophobic proteins with many TMDs (e.g. the D1 and D2 proteins with five TMDs and the Photosystem I reaction center proteins PsaA and PsaB, both with 11 TMDs).

Annotation of Gene Models
The TAIR Database reports more than one gene model for 1411 (5%) of 27,170 genes of the annotated A. thaliana genome (1141 genes with two, 109 with three, and 17 with four or more gene models). Each gene model might be biologically relevant (5). We therefore checked all identified proteins for large discrepancies between the apparent molecular masses from the one-dimensional gels (Fig. 2) and the masses calculated from the gene models (also after removal of predicted cTPs). After careful analyses using the MS/MS data as well as the latest information from expressed sequence tags (ESTs) in Arabidopsis and other plant species, we determined four interesting discrepancies, including "gene fusions." At3g12340 contains two gene models in the TAIR Database with predicted masses of 42 and 22 kDa; the apparent mass found in the gel was 20 kDa. By matching our MS/MS data (three peptides) to the protein sequence, we determined that the identified protein corresponded to Gene Model .2. The full-length cDNA (AY099869) and different ESTs (e.g. AV530598) matched only Gene Model .2 (data not shown).
At5g51540 (one gene model; no full-length cDNA in the TAIR Database) is annotated as a predicted mitochondrial endopeptidase-like protein with a predicted mass of 96 kDa, whereas we identified the protein in the 20-kDa region on the one-dimensional electrophoresis gels. Our MS/MS data (five peptides) fit only the C-terminal part of the protein (Fig. 3A). Searches in EST databases (e.g. 32499817 and 41053037) showed a coverage of only this C-terminal domain (Fig. 3A). Interestingly, EST 32499817 from Brassica napus, a species closely related to A. thaliana, contains the likely N terminus and, importantly, is predicted to have a cTP by TargetP. We suggest that this C-terminal domain is in fact a small thylakoid protein of ~20 kDa containing two predicted TMDs and that this gene should be renamed. The N-terminal part of At5g51540 shows some similarity to an endopeptidase, but no EST or full-length cDNA has been found for this part of the gene model.
At4g37920 (one gene model) has a predicted mass of 76 kDa, but was found on the gels at 26 kDa. Our MS/MS data (four peptides) fit only the N-terminal part of the gene model (Fig. 3B). A full-length cDNA (BT002988) overlaps completely with the C-terminal part of the gene model, but not at all with the N-terminal half (Fig. 3B). Matching ESTs (AJ610260, AV831009, CD814385, AU237830, and others) do not bridge these C- and N-terminal parts of the gene model (Fig. 3B), suggesting that At4g37920 covers two expressed proteins.
At3g46780 (one gene model) was identified by nine peptides in MS/MS, fully supporting the gene model (data not shown). Interestingly, the corresponding full-length cDNAs (AF367356 and AY143685) do not support this gene model, possibly suggesting alternative splicing. We found a number of additional proteins (e.g. At3g18890 and At3g14110) with a significant mismatch between experimental protein mass and predicted mass, but we were not able to explain these discrepancies.
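The mass comparison described above can be illustrated with a minimal sketch. It uses only the apparent and predicted masses quoted in the text; the `GelHit` class, the `closest_gene_model` helper, and the 25% tolerance are assumptions made for illustration and are not the procedure used in this study. Predicted masses after cTP removal are not given in the text and are therefore not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class GelHit:
    accession: str
    apparent_kda: float          # mass estimated from the one-dimensional gel
    predicted_kda: List[float]   # one predicted mass per gene model (.1, .2, ...)


def closest_gene_model(hit: GelHit, tolerance: float = 0.25) -> Optional[Tuple[int, float]]:
    """Return (model number, predicted mass) for the gene model closest to the
    apparent mass, or None if even the closest model deviates by more than
    `tolerance` (relative to the predicted mass; 25% is an arbitrary choice)."""
    number, mass = min(
        enumerate(hit.predicted_kda, start=1),
        key=lambda pair: abs(pair[1] - hit.apparent_kda),
    )
    relative_gap = abs(mass - hit.apparent_kda) / mass
    return (number, mass) if relative_gap <= tolerance else None


# Values quoted in the text; the order of the two At3g12340 models (.1 = 42 kDa,
# .2 = 22 kDa) is assumed from the statement that the protein matched Gene Model .2.
hits = [
    GelHit("At3g12340", 20.0, [42.0, 22.0]),
    GelHit("At5g51540", 20.0, [96.0]),
    GelHit("At4g37920", 26.0, [76.0]),
]

for hit in hits:
    print(hit.accession, closest_gene_model(hit))
```

With these inputs, At3g12340 is assigned to its second gene model, whereas At5g51540 and At4g37920 return `None`. Such unexplained cases are the ones that were examined further with MS/MS peptide positions and EST alignments.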

Newly Identified Thylakoid Proteins in TPP
FIG. 3. Identification and annotation of different gene models in A. thaliana. A, At5g51540 is predicted to be a 96-kDa mitochondrial protein, but was identified in this study as a 20-kDa protein on one-dimensional electrophoresis gels. Five internal sequences determined by nano-ESI-MS/MS matched the predicted protein and are indicated by boxes in the predicted sequence. No full-length cDNA has yet been reported for this gene model. Alignment of the two different ESTs (237785 from A. thaliana and CD817877 from B. napus) indicates a match with the C-terminal part of the gene model. Predicted TMDs (by TMHMM) are indicated by green rectangles. The putative starting methionine in the C-terminal part of the EST from B. napus is indicated by the arrow, and the protein is predicted to have a cTP (by TargetP). B, At4g37920 is predicted to be a 77-kDa mitochondrial protein, but was identified in this study as a 26-kDa protein on one-dimensional electrophoresis gels.

TABLE I Accession numbers, features, and functions of the 119 proteins newly identified by TPP analysis
Laboratory annotation, number of (curated) TMDs using TMHMM, manual curation based on experimental primary literature, and functional classifications are listed. All identifications are based on MS/MS data using Mascot as the search engine. Maximum probability-based Mowse scores of identification, highest peptide scores, and Bins with more detailed functional assignments can be found via PPDB and in supplemental Table 1.

The 242 identified proteins were cross-checked against our earlier study of the salt-stripped (Na2CO3) thylakoid membrane proteome using acetone/chloroform/methanol extractions as well as against our study on the luminal and peripheral thylakoid proteome using two-dimensional electrophoresis gels for protein separation (3,27). These three approaches resulted together in 319 identified proteins, with 119 proteins identified exclusively after TPP fractionation (Fig. 4A). 73 proteins were also found in the analysis of the proteome of the salt-stripped thylakoids using chloroform/methanol extractions, and 32 were identified using both two-dimensional electrophoresis gels and solvent extractions (Fig. 4A). Table I lists these 119 TPP proteins, predicted cTP and TMDs, and functional classifications. When excluding two obvious mitochondrial proteins and three outer envelope proteins, 83% have predicted cTPs.
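The cross-check of the three data sets amounts to splitting three sets of accession numbers into the seven exclusive regions of a Venn diagram. The sketch below shows one way to do this; the function name `venn3_regions` and the toy identifiers `p1` to `p6` are placeholders, not data from this study. The last lines derive one count that is not stated explicitly in the text, under the assumption that the 119, 73, and 32 proteins are mutually exclusive subsets of the 242 TPP proteins.

```python
def venn3_regions(a: set, b: set, c: set) -> dict:
    """Return the seven mutually exclusive regions of a three-set Venn diagram."""
    return {
        "a_only": a - b - c,
        "b_only": b - a - c,
        "c_only": c - a - b,
        "a_b_only": (a & b) - c,
        "a_c_only": (a & c) - b,
        "b_c_only": (b & c) - a,
        "a_b_c": a & b & c,
    }


# Placeholder accession sets (not real identifications).
tpp = {"p1", "p2", "p3", "p4"}
solvent_extraction = {"p2", "p3", "p5"}
gels_2de = {"p3", "p4", "p6"}

regions = venn3_regions(tpp, solvent_extraction, gels_2de)

# Every protein falls into exactly one region, so the region sizes
# add up to the size of the union.
total = sum(len(members) for members in regions.values())

# Counts quoted in the text: 242 TPP proteins, 119 TPP-only, 73 shared with
# solvent extraction, 32 found by all three approaches. If these subsets are
# exclusive (an assumption), the remainder is shared only with the 2-DE data.
tpp_and_2de_only = 242 - (119 + 73 + 32)
print(total, tpp_and_2de_only)
```

Under that assumption, the remainder is 18 proteins shared only between the TPP and two-dimensional gel analyses.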
To determine how this partitioning relates to abundance and functional role, we used the recent classification in MapMan (26) as our starting point. We generously received the classification for 22,000 Arabidopsis genes from the curator of MapMan (26) and verified the classification for the 320 proteins. Where needed, classifications, named "Bins," were corrected or refined, or if proteins had not yet been assigned, we placed them in the appropriate Bin based on published literature and prediction of functional domains using PFAM HMM (39) as well as on information from KEGG (www.genome.ad.jp/kegg/kegg2.html) and Aracyc (www.arabidopsis.org/tools/aracyc/). These assignments can be found in Supplemental Tables 1 and 2 and can also be extracted via PPDB. Together, the 320 proteins were assigned to >50 Bins. However, to obtain a better overview of the thylakoid function and fractionation, the Bin assignments were grouped into seven categories (Fig. 4B).
The majority (63%) of the 73 shared identifications between TPP and acetone/chloroform/methanol extraction were highly abundant proteins of the photosynthetic apparatus, with 23% proteins without any known function (Fig. 4B). Many (56%) of the 32 proteins identified in all three analyses were abundant "photosynthetic" soluble or peripheral proteins, in addition to 16% proteins with unknown function and 16% involved in oxidative defense (Fig. 4B). As expected, the overlap between the TPP analysis and the luminal and peripheral proteins was only small (Fig. 4A) because most of the soluble proteins were stripped from the thylakoid membranes before the TPP fractionation. However, this overlap consisted of proteins with either unknown function or function related to protein assembly, folding, and degradation (protein fate) (Fig. 4B). Only a small percentage of the proteins identified only by TPP were involved in photosynthesis, whereas a large percentage had no known function (50%) or were involved in protein fate (14%) and various other functions (see below). Whereas our previous studies extracted most of the abundant photosynthetic machinery, the TPP fractionation revealed a new class of proteins of much lower abundance, thus significantly improving the dynamic resolution of the thylakoid proteome analysis. 39 of the 119 newly identified proteins have one or more predicted TMDs, and 79% have predicted cTPs. We highlight below the function of some of the 119 identified proteins.

FIG. 4. Cross-correlation of the different proteome analyses of the thylakoid membrane system and functional classification of the uniquely identified TPP fraction. A, comparison of the protein accession numbers identified by two-dimensional electrophoresis (2-DE) gel analysis of proteins extracted by Yeda press, salt wash, and Triton X-114 (3,27); by acetone/chloroform/methanol (A/C/M) extraction of salt-stripped thylakoids (3); and by TPP (this study). All proteins were analyzed by TargetP for the presence of a predicted cTP. The sum of the predicted cTP-containing proteins and chloroplast-encoded proteins (Cp) is indicated as a percentage for each cross-section. The number of predicted TMDs was also calculated for each cross-section using TMHMM or the consensus prediction listed in the Aramemnon Database. Predictions were corrected if the number of TMDs was experimentally determined. B, simplified functional classification for the different protein populations indicated in the Venn diagram in A. The functional categories are (i) Calvin cycle, photorespiration, minor carbohydrates, starch, OPP, and glycolysis; (ii) DNA organization, transcription, and translation; (iii) redox proteins, oxidative defense, and stress response; (iv) protein fate; (v) thylakoid (cyclic) electron transport, ATP synthesis, and chlororespiration; (vi) unknown function; and (vii) other functions. This demonstrates that most of the abundant proteins involved in the electron transport chain were identified by two or all three experimental approaches. In contrast, the majority of the 119 proteins that were identified only in the TPP fractionation experiment have no known function. OPP, oxidative pentose phosphate pathway.

Chlorophyll/Prenyl Lipid Biosynthesis-PorB, PorC, CHL27, FLU, and geranylgeranyl reductase are all involved in chlorophyll/prenyl lipid biosynthesis. CHL27, a putative diiron protein required for the synthesis of protochlorophyllide and a candidate subunit of the aerobic cyclase in chlorophyll biosynthesis, has been shown to be associated with both inner envelope and thylakoid membranes (40). FLU, a negative regulator of chlorophyll biosynthesis, is associated with a total chloroplast membrane fraction (41). Western blotting showed that PorB and PorC associate exclusively with thylakoid membranes (42,43), although PorB in particular was also identified in experimental proteomics studies on the chloroplast envelope (6,7).
Protein Sorting-The TPP fractionation allowed the identification of proteins involved in protein translocation (in)to or across the thylakoid membrane. These include ALBINO3 (44,45) and TatA/E (Tha4) and TatB (HCF106), subunits of the twin arginine translocation pathway, as well as soluble cpSRP43. VIPP1 (At1g65260) was earlier identified genetically as a chloroplast mutant and was found peripherally associated with both the thylakoid and inner envelope membranes (46). VIPP1, with a predicted PspA domain, is a possible enhancer of the Tat translocation pathway (47). Interestingly, the Arabidopsis nuclear genome has six genes with predicted PspA domains, but only the identified VIPP1 protein has a predicted cTP.
Stress Defense-Several stress-related proteins were among the 119 TPP proteins. Two Lil3 proteins were identified; these are small chlorophyll-binding proteins with two TMDs and are believed to be induced during light stress in thylakoids (48). Another stress-related thylakoid-associated protein, fibrillin (At1g51110), was identified. Fibrillins are believed to be associated with lipids/carotenoids and are typically induced by different stresses (49). All 14 predicted fibrillins in Arabidopsis are also predicted to be localized to chloroplasts. So far, we identified nine of them in our experimental studies, and three in particular (At4g04020, At4g22240, and At3g23400) were identified many times (>20 times), suggesting a high abundance. Other stress-related proteins include the thylakoid-bound ascorbate peroxidase (At1g77490), peroxiredoxin Q (At3g26060), and glutathione peroxidase (At2g25080). These redox-active components are known chloroplast components involved in different aspects of oxidative stress defense and pathogen response (reviewed in Refs. 50 and 51).
Protease and Peptidases-A thylakoid-processing peptidase, TPP-2, involved in cleavage of luminal transit peptides, was identified among the 119 TPP proteins. TPP-2 is one of the three general thylakoid-processing peptidases. A small family of thylakoid-bound Zn2+-containing metalloproteases (FtsH1, FtsH2, FtsH5, and FtsH8) was well resolved on the gel and enriched in almost a single TPP fraction, TPP3 (Fig. 2B). They are all believed to have two TMDs, and each of them was identified >20 times. We also ambiguously identified two mitochondrial isoforms, FtsH3 and FtsH10; they were identified only once (probability-based Mowse score of 50). Finally, we identified a putative and new chloroplast-predicted metalloprotease (At5g05740) that belongs to the M50 family and is predicted to have three to six TMDs. Eight of these M50 metalloproteases are present in Arabidopsis, and five of them are predicted to have cTPs.
Signaling and Jasmonic Acid Biosynthesis-At5g22640 is a protein with a MORN (membrane occupation and recognition nexus) motif, which is present in proteins of neuronal cells and is involved in the junction between two membranes. Some MORN motifs have been found in phosphatidylinositol 4,5-phosphate kinase involved in signaling. About 15 proteins in Arabidopsis have a MORN motif, and only two are predicted to have a cTP. This protein was recently also identified in an envelope proteome analysis (7). We identified three proteins involved in the synthesis of jasmonic acid; two are closely related paralogs of allene-oxide cyclase, AOC1 (At3g25760) and AOC2 (At3g25770), and one is allene-oxide synthase (At5g42650). They were all identified multiple times in the TPP fractions, often with very high probability-based Mowse scores, suggesting that they are quite abundant. Allene-oxide synthase is a single copy gene, whereas four genes encode allene-oxide cyclase (52). Interestingly, we identified only two of the four paralogs of allene-oxide cyclase; one form, AOC2 (At3g25770), was identified >30 times, with high probability-based Mowse scores, suggesting the highest abundance of this particular paralog. Such details are accessible via PPDB. One protein (At1g55480) is a homolog of shoot1 from soybean, with very weak homology to tyrosine phosphatase. This soybean gene was part of a quantitative trait locus implicated in fasciation (53). Chloroplast-predicted At2g21530, convincingly identified a number of times, contains a Forkhead-associated domain, which is a modular phosphopeptide recognition motif. Only 13 proteins in Arabidopsis have a Forkhead-associated domain, and four are predicted to have cTPs.
DNA Binding-MFP1 (At3g16000) has been shown to interact with the plastid chromosome, organized as so-called nucleoids. After a number of conflicting reports, the protein was shown to be localized to the thylakoid (38). At1g14345 has some similarity to VIP1, a cytosolic component shown to interact with a DNA-binding protein from Agrobacterium.
Plastoskeleton-At1g50020 is annotated as a tubulin α-6 chain, but none of the domain predictors tested suggest such a domain. Interestingly, in vitro experiments showed that γ-tubulin is detectable on the surface of isolated plastids (54). Six proteins with a tubulin domain are predicted to be localized to chloroplasts and can be structural components of the plastoskeleton. Moreover, at least 30 proteins with a kinesin domain and 14 with a myosin domain are predicted to be plastidic.
Rhodanese Domain Proteins-We have now identified four proteins in the thylakoid with a rhodanese domain (At3g25480, At2g42220, At4g27700, and At4g01050), with three identified exclusively by TPP analysis. In particular, At4g01050 seemed highly expressed given that we identified this protein many times. 7 of 17 predicted rhodanese domain proteins in Arabidopsis have a predicted cTP, but the function of this domain is not well understood.
DnaJ Central Domain Proteins-Six proteins (At4g13670, At1g80030, At5g21430, At2g22360, At1g75690, and At4g39960) have a DnaJ-like domain (central domains), suggesting that they might have chaperone-like functions. In particular, At5g21430 was identified many times (see PPDB). Interestingly, several have a predicted TMD.

Proteins with a TPR or PPR Motif-4 of 119 newly identified proteins have a TPR (55) or PPR (56) motif. The PPR proteins compose a very large protein family in mitochondria and chloroplasts often involved in RNA-related processes (e.g. Refs. 57-59). The TPR family in Arabidopsis is even larger and is emerging as a family of important protein assembly factors and/or co-chaperones (60,61).
Envelope Proteins-We identified two inner envelope proteins (Tic40 and ceQORH) and two outer membrane proteins (OE6 and OM24). Tic40 (At5g16620) has been studied quite intensively; it has one TMD and a large hydrophilic loop on the stromal side, and an Arabidopsis disruption mutant was recently identified (60). ceQORH was shown to be located at the inner envelope membrane. This protein is unusual in that it does not have a cleavable cTP, and neither the N nor C terminus seems to be required for chloroplast targeting (62). The functions of ceQORH, OE6 (63), and OM24 (64) are unknown. These four proteins likely represent an envelope contamination, in contrast to VIPP1 and CHL27, which are known to associate with both thylakoid and envelope membranes.

Cross-correlation of the 242 TPP Proteins with Proteomics Studies on Other Cellular Compartments in A. thaliana
As we penetrate deeper into the thylakoid proteome, there is potentially an increasing risk of identifying proteins that in fact come from other subcellular, non-chloroplast locations. Over the last few years, several proteome studies have been published on non-chloroplast compartments in A. thaliana leaves and cell cultures. We collected these data sets: two studies on plasma membranes (65, 66); one study each on the total vacuole (67), the tonoplast (68), the peroxisome (69), the nucleus (70), the cell wall (71), and the hydrophobic mitochondrial membranes (72); and a dozen other mitochondrial proteome analyses, all of which are stored in a mitochondrial proteome data base (www.mitoz.bcs.uwa.edu.au/) (73). Where necessary, we converted gi accession numbers into AGI accession numbers. We then cross-correlated the 242 TPP proteins with these data sets (Table II). In total, 15 TPP proteins were identified in these non-chloroplast proteome analyses; however, nine are confirmed and abundant thylakoid or chloroplast proteins. FtsH3 and FtsH10 (the MS data did not allow us to distinguish between these paralogs) are likely mitochondrial based on their subcellular localization prediction and were already classified in Fig. 5B as non-chloroplast components. One protein (the signal particle receptor) is likely an endoplasmic reticulum protein. Finally, the location and function of four others are unknown, but three of them (At2g43630, At3g63490, and At5g23060) were identified in chloroplast proteome analysis by other laboratories (Table II). We can thus conclude that the 242 proteins identified in this TPP analysis likely contain very few proteins from non-chloroplast compartments.

DISCUSSION

TPP Is a Versatile Method in Membrane Proteomics-Identification of membrane proteomes with high dynamic resolution remains challenging, despite significant progress in recent years (9,19,74,75).
High dynamic resolution of any complex proteome, hydrophobic or hydrophilic, can currently be achieved only upon sufficient fractionation. Here, we have presented a simple, fast, and scalable off-line procedure based on TPP with 1-butanol to fractionate membrane proteomes in combination with either in-gel or in-solution digestion and LC-ESI-MS/MS. The main advantages of TPP compared with other fractionation techniques are speed (fractionation can be done in <2 h) and the easy removal of apolar pigments and lipids. In contrast to our experience with sequential extraction with organic solvents (chloroform/methanol) (see Ref. 3), the protein aggregates were easy to resolubilize in SDS solubilization buffer, and no residual proteins remained at the end of the fractionation procedure. Many parameters in the TPP fractionation procedure can be changed (solvent, salt, and pH) and adapted to different membrane proteomes (e.g. Escherichia coli inner membranes) (data not shown), making it a versatile procedure. We have also shown that the TPP fractionation method is compatible with in-solution digestion with trypsin and both ESI and MALDI. Thus, TPP fractionation, together with one-dimensional gels and in-gel digestion or SDS removal and in-solution trypsinization in Me2SO, offers a very attractive strategy for analysis of membrane proteomes. This should help to further accelerate the field of membrane proteomics and to eliminate the under-representation of membrane proteins in experimental studies.

FIG. 5. Comparison of thylakoid and envelope proteomes. Proteins were identified by large-scale proteomics studies, by more classical biochemical tools, or by forward or reverse genetics. Thylakoid proteins were collectively identified in this study and in other experimental thylakoid proteomics studies (3,27) as well as in envelope proteome analysis as described previously (6,7,36). In addition, the literature was carefully screened for additional thylakoid and envelope proteins. This resulted in 384 thylakoid and 429 envelope proteins (associated or co-purified). A and B, functional classification of thylakoid and envelope proteomes displayed as pie diagrams. The functional classification in Bins and assignment into more general functional categories are listed for each accession number in Supplemental Table 2. aa, amino acids. OPP, oxidative pentose phosphate pathway. C, cross-correlation of the thylakoid (384 proteins) and envelope (429 proteins) proteomes and the proteome analysis of total chloroplasts (687 proteins) described in Ref. 37. All proteins were analyzed by TargetP for the presence of a predicted cTP, indicated as a percentage for each cross-section. The sum of the predicted cTP-containing proteins and chloroplast-encoded proteins (Cp) is also indicated as a percentage for each cross-section.
TPP Fractionation Unveils a New Set of Thylakoid Proteins: Improved Dynamic Resolution of TPP-To make the functional classification of this data set compatible with other classifications, we used the recent classification in MapMan (26) as our starting point. Our aim is to contribute to a functional classification of the Arabidopsis genome that will be used by the Arabidopsis community for proteome, metabolome, and transcriptome analysis. Comparison of earlier experimental analysis with the new TPP analysis showed that the TPP fractionation was very effective in identifying the abundant photosynthetic machinery. More importantly, the TPP analysis visualized a whole new "layer" of proteins, including nearly 100 proteins with unknown function as well as two low abundant enzymes involved in tetrapyrrole biosynthesis and protein translocation components of the thylakoid membrane system.
Comparison of the Composition and Function of the Thylakoid and Envelope Proteomes: Is There a Common Denominator?-The chloroplast membrane proteome includes the thylakoid membranes and the outer and inner membranes of the chloroplast envelope. The envelope double membrane forms a barrier between the cytosol and plastid stroma and is therefore critical in the translocation of small molecules and proteins (76-78). The envelope is also the site for a number of important biosynthetic pathways, including fatty acid and lipid biosynthesis (79-81). The chloroplast envelope proteome has been studied using "classical" biochemical techniques and genetics and, in recent years, by mass spectrometry (6,7,36,62,82). The accession numbers from these large-scale envelope proteomics studies were collected, as well as envelope proteins identified in more "gene for gene" studies, totaling 429 proteins. To compare the envelope proteome with the thylakoid proteome, we pooled the 319 proteins that we experimentally identified in our proteomics studies and those from Ref. 83 and carefully screened the primary literature, resulting in an additional 65 thylakoid proteins. This resulted in a total of 384 thylakoid (associated) proteins.

TABLE III 24 proteins with unknown function identified in both thylakoid and envelope preparations
Laboratory annotation, predicted cTPs, the number of predicted TMDs using two different predictions, the thylakoid preparation in which the protein was identified, the highest peptide scores in MS/MS-based identification, and references for envelope location are listed, plus the calculated molecular masses (in kilodaltons) of the unprocessed and processed proteins (after removal of the predicted cTP). All identifications in the thylakoid fractions are based on MS/MS data using Mascot as the search engine.
We did not attempt to remove any contaminations from these large-scale proteomics studies to avoid selecting against proteins that might associate with both the thylakoid and envelope membranes. This was possible only because the different membrane proteome preparations were of relatively high purity. Here it is relevant to point out the interesting finding that a set of six enzymes involved in cytosolic glycolysis were identified in a proteome analysis of purified mitochondria from A. thaliana (84). These "cytosolic" enzymes likely function in association with the mitochondria, thus allowing pyruvate (the end product of glycolysis) to be provided directly to the mitochondrion, where it is used as a respiratory substrate (84).
The assembled 384 and 429 thylakoid and envelope proteins together result in 714 non-redundant proteins, and each accession number was assigned to a functional class (Bin), as discussed above. These Bins were then grouped into 15 different categories. These Bin assignments are listed in Supplemental Table 2 and are also available via PPDB. Their distribution for both membrane proteomes is displayed in two "pie" diagrams (Fig. 5, A and B).
The difference between biological functions is striking. Fig. 5 clearly shows the primary function of each membrane system: photosynthetic electron transport in the thylakoid fraction (31%) versus transport of small molecules (9%) and hormone and lipid metabolism (10%) in the envelope fraction. It was quite surprising to note the relatively large percentage of proteins in the envelope fraction involved in DNA/RNA binding and plastid gene expression (9%). The thylakoid fraction contains a significant population of proteins involved in oxidative stress response (8%), which is understandable given the high rates of electron transport and high redox potentials. Both membrane fractions have a significant number of proteins devoted to protein targeting, translocation, folding, and proteolysis as well as other protein biogenesis factors. 23 proteins in this functional category are found in both membrane fractions. These include 10 members of the Clp (ClpP3, ClpP5, ClpR2, ClpR3, ClpR4, ClpS1, and ClpS2) and FtsH (FtsH2, FtsH5, and FtsH8) protease families and four chaperones (ClpC1, Hsp70, Cpn60, and Cpn21). The Clp protease family members are part of a 325-kDa complex that is relatively abundant in the stroma but that also associates with the thylakoid membrane (33). The three FtsH proteins were shown to be thylakoid membrane components (85,86). In addition, these 23 proteins include cpSRP43, cpSecA, Tha4, and Tic40, involved in protein targeting/translocation; Tha4 is part of the thylakoid Tat translocon (87), and Tic40 is part of the inner membrane protein translocon (60), whereas SRP43 and SecA are proteins known to be involved in targeting of proteins to the thylakoid membrane (88,89). VIPP1 was shown to associate with both membrane systems (see above), and At5g17170 has a PDZ and rubredoxin domain and is likely involved in protein-cofactor assembly.
Proteins with Unknown Function: Where Do They Belong?-Finally, both membrane fractions contain a similar percentage of proteins with completely unknown functions (25 and 28%) (Fig. 5, A and B). This population is of great interest, as it holds the key to further understanding the two membrane systems. When cross-correlating these proteins with unknown function (120 envelope-associated and 97 thylakoid-associated proteins), only 24 were found in both membrane fractions (Table III), suggesting that most of the 193 non-redundant proteins are specific to either membrane system. Some of these are probably also present as soluble proteins in the stroma and, in the case of the proteins associated with the outer membrane, also present in the cytosol. 18 of the 24 (75%) have a predicted cTP. As discussed earlier, our MS/MS data and EST information showed that gene model At4g37920.1 is not correct. Two proteins were established to be outer envelope proteins (OEP6 (63) and OM24 (64)); and indeed, they do not have predicted cTPs. When excluding these outer membrane proteins and including the correction of the gene model, 19 of 22 proteins (86%) have a predicted cTP, which reflects previous observations of TargetP sensitivity (5). We also verified the origin of their identification in each membrane system and the highest peptide score for those identified in the TPP fractionation (Table III). Further experimentation will be needed to firmly establish the location of each of these 24 proteins; but based on currently available information, we suggest that many are thylakoid proteins.
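The counts reported for the pooled proteomes and for the unknown-function subset can be checked with simple set arithmetic. The following minimal sketch uses only numbers quoted in the text; the helper names `overlap_from_union`, `union_from_overlap`, and `percent` are introduced here purely for illustration.

```python
def overlap_from_union(n_a: int, n_b: int, n_union: int) -> int:
    """Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|."""
    return n_a + n_b - n_union


def union_from_overlap(n_a: int, n_b: int, n_overlap: int) -> int:
    """Inclusion-exclusion: |A or B| = |A| + |B| - |A and B|."""
    return n_a + n_b - n_overlap


def percent(part: int, whole: int) -> int:
    """Percentage rounded to the nearest integer."""
    return round(100 * part / whole)


# 384 thylakoid and 429 envelope proteins give 714 non-redundant proteins,
# which implies the number of proteins assigned to both membrane systems.
shared_all = overlap_from_union(384, 429, 714)

# 97 thylakoid and 120 envelope proteins of unknown function, 24 in both.
unknown_nonredundant = union_from_overlap(97, 120, 24)

# Share of unknown-function proteins in each proteome, and predicted cTP
# frequencies among the 24 shared proteins before and after correction.
unknown_thylakoid_pct = percent(97, 384)
unknown_envelope_pct = percent(120, 429)
ctp_before_pct = percent(18, 24)
ctp_after_pct = percent(19, 22)

print(shared_all, unknown_nonredundant)
print(unknown_thylakoid_pct, unknown_envelope_pct, ctp_before_pct, ctp_after_pct)
```

The arithmetic reproduces the 193 non-redundant unknown-function proteins and the reported percentages (25, 28, 75, and 86%), and it implies that 99 accessions are shared between the assembled thylakoid and envelope lists.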
Recently, we proposed a "rule of thumb" for prediction of integral membrane proteins in the thylakoid versus envelope membrane (5). We suggested that thylakoid membrane proteins typically have no or few cysteines, are small, and have acidic pI values (after removal of the cTP). Indeed, when plotting the 24 proteins against these three predicted parameters, most follow the rule of thumb for thylakoid membrane proteins (data not shown).
Recently, an extensive proteomics study was published in which the complete chloroplast was analyzed, resulting in a list of 687 proteins (37). It was reported that some 50% of the identified proteins were not predicted to have a cTP (by TargetP) and therefore that TargetP was not a sensitive subcellular localization predictor. This set of 687 chloroplast proteins was cross-correlated with the thylakoid (384 proteins) and envelope (429 proteins) proteomes and displayed in the Venn diagram in Fig. 5C. All proteins were analyzed by TargetP for the presence of a predicted cTP, as is indicated by a percentage for each cross-section. The Venn diagram shows that the thylakoid, envelope, and total chloroplast fractions each contain a unique set of proteins (150, 222, and 370 proteins, respectively). In particular, the cross-sections between the thylakoid and total chloroplast and the shared population among all different fractions show a predicted cTP content of >90%, suggesting that the TargetP predictor is at least as sensitive as originally reported (4) and confirmed by larger curated data sets (5). The percentage of predicted cTPs in the envelope fraction is significantly lower, which is expected since the outer envelope membrane proteins do not have a classical cleavable cTP. Finally, the 370 proteins unique to the total chloroplast fraction show only 37% predicted cTPs, suggesting that half of them are not chloroplast proteins.
PPDB, MS-based Identifications, and Ambiguity Display-Information concerning all proteins discussed in this study can be retrieved from PPDB. This includes in-house annotation and details for each MS/MS-based identification (probability-based Mowse scores, highest peptide scores, and number of matching peptides). Here, we want to highlight the problem of ambiguous MS-based identification, often ignored in the literature. In quite a few instances, it is possible that the same MS/MS spectrum matches more than one protein. In simple cases, these matches represent different interpretations of the same MS/MS spectrum, which is then often reflected in different peptide scores (and peptide rank in Mascot). In many other cases, only one peptide sequence is interpreted from the MS/MS spectrum, but it matches more than one protein. This can easily lead to "over-identification" and overestimation of the number of proteins in the sample. If the identification of a protein depends entirely on peptides (MS/MS spectra) that also match other proteins, there is a significant chance that it is a false-positive identification. Using an in-house built software routine, we have filtered for such ambiguous identifications and manually removed the obvious false-positives. Many of these ambiguities reflect the different gene models; in other cases, they represent very closely related proteins. These ambiguous identifications are linked to each other in PPDB and can be examined directly.
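The filtering idea described above can be sketched as follows. This is not the in-house routine, whose implementation is not described here; it is a minimal illustration that flags proteins supported only by peptides that also match other proteins. The function names and the peptide labels (`pepA` to `pepD`) are hypothetical, and the peptide-to-protein assignments are invented for the example; only the FtsH3/FtsH10 ambiguity itself is taken from the text.

```python
from collections import defaultdict


def build_report(peptide_hits):
    """peptide_hits: iterable of (peptide, protein) matches.

    Returns {protein: {"unique": peptides matching only this protein,
                       "shared": peptides also matching other proteins,
                       "ambiguous_with": the other proteins involved}}.
    """
    proteins_by_peptide = defaultdict(set)
    for peptide, protein in peptide_hits:
        proteins_by_peptide[peptide].add(protein)

    report = defaultdict(
        lambda: {"unique": set(), "shared": set(), "ambiguous_with": set()}
    )
    for peptide, proteins in proteins_by_peptide.items():
        for protein in proteins:
            if len(proteins) == 1:
                report[protein]["unique"].add(peptide)
            else:
                report[protein]["shared"].add(peptide)
                report[protein]["ambiguous_with"] |= proteins - {protein}
    return dict(report)


def fully_ambiguous(report):
    """Proteins whose identification rests entirely on shared peptides."""
    return sorted(p for p, entry in report.items() if not entry["unique"])


# Invented peptide assignments for illustration only.
hits = [
    ("pepA", "FtsH3"), ("pepA", "FtsH10"),   # one peptide, two paralogs
    ("pepB", "FtsH2"),                       # unique to FtsH2
    ("pepC", "FtsH2"), ("pepC", "FtsH8"),    # shared between family members
    ("pepD", "FtsH8"),                       # unique to FtsH8
]

report = build_report(hits)
print(fully_ambiguous(report))
print(report["FtsH2"]["ambiguous_with"])
```

In this toy input, FtsH3 and FtsH10 are flagged because neither has a unique peptide, whereas FtsH2 and FtsH8 are retained but remain linked through their shared peptide, comparable to the way ambiguous identifications are linked to each other in PPDB.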
Conclusion-Knowledge of the thylakoid proteome has greatly expanded over the last few years. Fig. 6 provides a schematic overview of the different functions of experimentally determined proteins in and around the thylakoid. From this analysis, we can draw a number of conclusions. (i) TPP fractionation of the proteome of the stripped thylakoid membrane allowed identification of a new group of low abundant proteins, most of which have never been studied before. (ii) Proteome analyses using different fractionation techniques in parallel are critical to establish subcellular protein locations. (iii) Thylakoid membranes and chloroplast envelopes have distinct proteomes, supporting the absence of significant physical interactions. (iv) TargetP sensitivity for prediction of chloroplast proteins is close to 90%, particularly if gene models are correct.