Non-native, N-terminal Hsp70 Molecular Motor Recognition Elements in Transit Peptides Support Plastid Protein Translocation*

Background: Transit peptides direct the import of the majority of chloroplast-destined proteins into chloroplasts. Results: Multiple unrelated Hsp70 recognition elements are able to substitute for the translocation determinant domain of transit peptides. Conclusion: The majority of transit peptides utilize Hsp70-interacting property to initiate import. Significance: This finding expands the mechanistic view of the chloroplast protein import process. Previously, we identified the N-terminal domain of transit peptides (TPs) as a major determinant for the translocation step in plastid protein import. Analysis of Arabidopsis TP dataset revealed that this domain has two overlapping characteristics, highly uncharged and Hsp70-interacting. To investigate these two properties, we replaced the N-terminal domains of the TP of the small subunit of ribulose-1,5-bisphosphate carboxylase/oxygenase and its reverse peptide with a series of unrelated peptides whose affinities to the chloroplast stromal Hsp70 have been determined. Bioinformatic analysis indicated that eight out of nine peptides in this series are not similar to the TP N terminus. Using in vivo and in vitro protein import assays, the majority of the precursors containing Hsp70-binding elements were targeted to plastids, whereas none of the chimeric precursors lacking an N-terminal Hsp70-binding element were targeted to the plastids. Moreover, a pulse-chase assay showed that two chimeric precursors with the most uncharged peptides failed to translocate into the stroma. The ability of multiple unrelated Hsp70-binding elements to support protein import verified that the majority of TPs utilize an N-terminal Hsp70-binding domain during translocation and expand the mechanistic view of the import process. This work also indicates that synthetic biology may be utilized to create de novo TPs that exceed the targeting activity of naturally occurring sequences.

In plants, between 2000 and 5000 nucleus-encoded proteins are predicted to target to plastids (1). More than 92% of plastid-localized proteins contain an N-ter extension, TP, 3 that has been added to the mature domain via evolution creating a precursor form (2,3). TPs govern post-translational targeting of precursors into the plastid stroma via sequential interaction with the TOC and TIC (the translocons at outer and inner chloroplast envelopes), respectively (4,5). Upon entry into the stroma, TPs are removed via proteolytic processing by the stromal processing peptidase. Despite identification Ͼ30 years ago (6), little is known about how TPs accomplish this function.
Although there is a plethora of programs to predict TPs (7-9), these tools do not provide insight into how TPs can function in a common import pathway. Moreover, recent bioinformatic analysis has not been fruitful in identifying the consensus motifs within TPs (10). However, when TPs are clustered into subgroups, a few short conserved peptide motifs were identified within each subgroup, but their roles remain largely uncharacterized (10). Despite the large number of TPs and short amino acid (aa) length, structural approaches have not provided much insight. This is due in part to TPs being highly unstructured in aqueous solutions (11,12) hindering direct structure-function analysis. Nevertheless, three loosely defined regions in TPs have been identified as follows: (i) N-ter of ϳ10 uncharged aa ending with Pro/Gly and preferably Ala as the second aa, (ii) central domain lacking acidic aa but rich in hydroxylated aa, and (iii) C-ter rich in Arg and possibly forming an amphiphilic ␤-strand (5,13).
The importance of the highly uncharged TP N-ter (13) in plastid protein import has been shown both in vitro (14 -17) and in vivo (10, 18 -20). These studies have proposed that the TP N-ter of the small subunit of ribulose-1,5-bisphosphate carboxylase/oxygenase (SSU) and ferredoxin (FD) directs precursor binding during the import process by interacting with the envelope lipids or the Toc159 receptor. However, we generated synthetic TPs lacking the uncharged N-ter but still able to bind to plastids, similar to the wild-type TPs. These synthetic TPs of SSU (SStp) and FD (FDtp) contain the reversed aa sequences from C-to N-ter. Together with a series of their N-ter mutants, these constructs have confirmed the TP N-ter as an essential determinant for protein translocation into plastid stroma (21).
In addition to the uncharged properties, the TP N-ter has been shown to harbor a strong Hsp70-binding site (H70BS) (22)(23)(24). Based on the predictions, all of our import-competent mutants contain a strong H70BS at their N-ter, which the import-deficient mutants lack (21). Because the stromal Hsp70 was shown to act as a plastid translocation motor (25)(26)(27)(28)(29), we suspected that the N-ter H70BS in our import-competent constructs is required for the stromal Hsp70 interaction to trap and/or pull the precursor into plastids (21) similar to the proposed unfolding and pulling model of chloroplast protein import (30).
To further address these two N-ter properties, we have constructed new TPs based on the forward (native) and reverse constructs of SStp that are called SSF and SSR, respectively. The N-ter of these constructs was extended to include peptide sequences derived from a comprehensive study of 36 peptides where their affinities for multiple Hsp70s have been determined (31). Nine peptides were selected that are strongly interacting with all chaperones to noninteracting with any of them. We further determined the relative affinity of these peptides to the chloroplast Hsp70 CSS1. The effect of the N-ter alteration in preprotein import was assessed by in vivo and in vitro import assays. In addition, the recognition of physicochemical properties was further elucidated by reversing and scrambling the N-ter 10 aa. Collectively, these results demonstrate that the addition of a short, non-native, Hsp70-interacting domain that even includes charged residues at the N-ter of a TP can support preprotein translocation into plastids in vivo and in vitro. This work indicated that the majority of TPs utilize an Hsp70-binding N-ter during translocation and opens the way for synthetic biology to create de novo TPs that exceed the targeting activity of natural TPs.
To ease the generation of Escherichia coli expression vectors, an NheI site was added to a pET-30a-based pET-SSF-YFP (21), producing pET-NheI-YFP. The expression cassette in pBS was subcloned into pET-NheI-YFP to generate pET constructs. The negative control, pET-m20-YFPHis 6 , containing the first 20 aa (m20) of the mature domain of SSU (mSSU) fused to YFP and His 6 tag.
All of the primers are listed in supplemental Table S1. All constructs have been verified by sequencing.
In Vivo Plastid Import Assays-Arabidopsis thaliana Col-0 and onion (Allium cepa) cultivar Vidalia were used if not stated otherwise. Transient expression in onion epidermis peels and Arabidopsis seedlings using particle bombardment was done as reported previously (32). Epifluorescence imaging and the relative intensity ratio calculations were performed as described previously (21). The number of cells used is stated in the figure legend.
In Vitro Chloroplast Import Assays-The TP-YFP fusion proteins and m20-YFPHis 6 control were produced from pET constructs using TNT T7 Coupled Wheat Germ Extract System (Promega, Madison, WI) with 0.8 Ci/l [ 35 S]methionine (MP Biomedical). Two batches of translation were used. The translation products were separated on SDS-polyacrylamide gel followed by digital autoradiography using Storage Phosphor Screen (Molecular Dynamics) or Molecular Imager FX (Bio-Rad) and quantified using Quantity One software (Bio-Rad). The relative concentration of each protein was determined from the radioactivity and the numbers of Met. The concentrations were equalized by adding 50% TNT Wheat Germ Extract (Promega). An equal volume of equalized proteins was used in the import assay.
The chloroplasts were isolated from 12-to 14-day-old seedlings of dwarf pea (Pisum sativum) cultivar Green Arrow as described previously (33). The import assays were performed using a modified method from Dabney-Smith et al. (34). Briefly, the assays were performed in a 500-l reaction containing 0.25 mg of chlorophyll/ml chloroplasts, labeled protein, 2 mM L-Met, 10 mM DTT, 2 mM Mg-ATP, 0.5% BSA, 300 mM urea, 330 mM sorbitol, 50 mM HEPES-KOH, pH 8.0. The reactions were incubated at room temperature. A 150-l sample was taken at 5, 10, and 15 min after the reaction was started and mixed with 600 l of ice-cold buffer (330 mM sorbitol, 50 mM HEPES-KOH, pH 8.0). The chloroplasts were re-isolated and separated on a 15% SDS-polyacrylamide gel and quantified by digital autoradiography. Two sets of independent assays were performed.
The amount of imported proteins was normalized where the amount of mature domain of the SSF construct at 15 min was assigned as 100% for the SSF mutant series and the m20-YFPHis 6 control. The amount of mature domain of FDF at 15 min was assigned as 100% for the FDF mutant series. The relative import rate of each time point was calculated from the relative amount of the mature proteins at that time point divided by the import time. The fastest rate among the three time points was assigned as the maximum import rate.
Pulse-Chase Assays-To avoid multiple products presented in the in vitro translation of TP-YFP constructs, we chose to use the construct containing TP followed by mSSU with C-ter 3ϫHA and His 6 tags. The precursor of SSU from pET-11 vector (35) was subcloned into pET-30a. The C-ter tag was added by PCR. Primer sequences are listed in supplemental Table S1. The mSSU-3ϫHA-His 6 region was subcloned into the pET constructs containing TP-YFP fusion producing TP-mSSU-3ϫHA-His 6 constructs. The 35 S-labeled precursors were expressed in E. coli BL21(DE3) and purified as described previously (33).
The chloroplasts were isolated from 12-day-old seedlings of dwarf pea cultivar Wando as described previously (33). The plants were kept in the dark for at least 14 h to deplete ATP, and all procedures were performed under dim light to prevent ATP synthesis. The precursors were incubated with isolated chloroplasts in the presence of 0.1 mM ATP in the buffer containing 10 mM DTT, 4 mM MgCl 2 , 1% BSA, 300 mM urea, 330 mM sorbitol, and 50 mM HEPES-KOH, pH 8.0, at room temperature for 20 min. In this condition (pulse), the precursors are bound to the TOC/TIC translocons as the import intermediate (21,36). The intact chloroplasts were re-isolated using 40% Percoll (GE Healthcare). To initiate the translocation (chase), the re-isolated chloroplasts were suspended into the same buffer with the presence of 2 mM ATP and incubated at room temperature. Samples were taken at 0, 2, 4, 8, and 20 min and rapidly diluted with 3 volumes of ice-cold buffer (330 mM sorbitol, 50 mM HEPES-KOH, pH 8.0) to stop the import. The chloroplasts were re-isolated, and a fraction was used to determine total protein concentration using BCA assay (Thermo Fisher Scientific). Equal amounts of proteins were separated on a 10 -20% SDS-polyacrylamide gel and quantified by autoradiography. Two sets of independent assays were performed.
The substrate was produced from bovine ␣-lactalbumin (Sigma). The reduced carboxymethylated ␣-lactalbumin (RCMLA) was made based on the published method (37). To produce fluorescein-labeled RCMLA (F-RCMLA), the protein was dialyzed into 50 mM borate buffer, pH 8.50. NHS-fluorescein (Thermo Scientific) was added at 25-fold molar excess and incubated at room temperature for 1 h in the dark. F-RCMLA was then separated using a PD-10 column (GE Healthcare) into 20 mM potassium phosphate buffer, pH 7.40, and finally dialyzed into Buffer M.
The SSF peptide was expressed and purified as reported previously (21). The N-ter peptide competitors were synthesized and modified with N-ter acetylation and the C-ter amidation by New England Peptide, Inc. All peptides are 15 aa in length containing the Hsp70-binding domain ( Fig. 2A) and part of SSR sequence similar to the first 15 aa of the N-ter mutant constructs based on SSR (Fig. 2D). These synthetic peptides were solubilized in DMSO before diluting with Buffer M containing 8 M urea to the final concentration of DMSO at 5%.
The competitive binding assays were performed in a 40-l reaction containing varying concentrations of SSF or 40 M of the N-ter peptide, 10 M His-mCSS1, 1 M F-RCMLA, 2 mM ADP, 250 mM urea, 1.25% DMSO in Buffer M. The reaction was incubated at 37°C for 30 min before separation using native PAGE. The gel with 6% acrylamide was used in 0.5ϫ TBE buffer (44.5 mM Tris borate, pH 8.30, 1 mM EDTA). To increase detection sensitivity, the proteins in the gel were transferred onto a nitrocellulose membrane. The fluorescent signal was detected with Bioimaging Systems (UVP) equipped with 302 nm UV light source and SYBR Green emission filter. The image was captured using LabWorks software (UVP), and the quantitation of band intensity was performed using Quantity One software (Bio-Rad).
MALDI-TOF MS-The saturated sinapinic acid matrix in 33% acetonitrile and 0.1% trifluoroacetic acid was mixed with equal volume of protein sample. The mixture was spotted onto the stainless steel target plate and dried. MALDI-TOF MS was performed on a Bruker Daltonics Microflex mass spectrometer with positive ion mode.
TP and Other Protein Datasets-The 912-and 208-TP datasets were acquired from the studies by Chotewutmontri et al. (21) and Lee et al. (10), respectively. The plant protein training sets of TargetP, including 141-TP (called cTP141), mTP368, Nu54, Cy108, and SP269, were part of the work by Emanuelsson et al. (38). The numbers in the TargetP dataset names indicate the total number of sequences in that set.
Comparison of TP Datasets-To compare the TPs in different datasets, a standalone BLAST program version 2.2.25ϩ was utilized (39). The TP datasets were converted into BLAST databases. The TP sequences from each set were searched against each database to identify the same protein.
Percentage of Uncharged aa (%UA) Calculations-The calculation was performed according to Chotewutmontri et al. (21). The %UA (not Asp, Glu, His, Lys, or Arg) was calculated within a subsequence in a window of length (L). The calculation was repeated from L of 5 to 17 aa. The %UA data within the first 30 aa was fitted to the inverted Boltzmann sigmoidal model using GraphPad Prism 5.0 (GraphPad Software).
Hydrophobic Accessible Surface Area (HASA) Calculations-The HASA of the hydrophobic aa (Ala, Ile, Leu, Met, Phe, Trp, Tyr, and Val) within an 8-aa subsequence was calculated by summation of the corresponding area of each aa based on the published total side-chain accessible surface area values (40). The 8-aa window was moved along the length of the sequences to show the HASA at different areas of the sequences. The Perl script for this analysis is available upon request.
Clustering of Sequences Based on H70BS Patterns-For the 208-TP dataset (10), the N-ter 80-aa sequences were utilized. The RPPD algorithm was used to predict H70BSs for each TP. The MATLAB program (MathWorks) was used. The clustergram function from the bioinformatics toolbox was applied to cluster data based on the hierarchical clustering and generated a dendrogram along with a heat map of the clustering. To cluster the TPs according to their Hsp70-binding patterns, Pearson correlation was used in distance matrix calculation. Unweighted pair group method with arithmetic mean was used in dendrogram construction. The clustering results are listed in supplemental Table S2.
For the analysis in Fig. 1B, 20 random sequence datasets were generated based on the aa frequency of the entire 208-TP set. Each set contains 208 of 80-aa sequences. The H70BS was predicted based on the RPPD algorithm. The mean RPPD score as a function of aa position was calculated for an entire dset. The MATLAB (MathWorks) was used to cluster the mean RPPD score patterns of all the datasets similar to the above paragraph.
Hydrophobicity Calculations-The hydrophobicity of the peptides was estimated using an on-line program ProtScale on the ExPASy server (42). The hydrophobic scale of aa by Kyte and Doolittle (43) was applied.
Sequence Logo-To generate the logo, the sequence datasets were submitted to the on-line program WebLogo (44).
Evaluation of the N-ter Sequences Using Position-specific Scoring Matrix (PSSM)-To evaluate whether the N-ter peptide sequences were similar to the TP N-ter sequences, the PSSM method was used. This method is widely used in identifying the motifs or patterns within the sequences (45). Generally, the aa sequences containing the motif of interest were aligned. This alignment of the motif with the length of M aa was then used to generate a PSSM with a dimension of 20 aa ϫ M positions. Each row represents an aa (i), and each column represents an aa position (j) of the motif. The matrix element contains a score (s i,j ). To identify the motif within a sequence, a sliding window of size M is scanned through the sequence. At each location, the score of the subsequence is calculated from the summation of the score s i,j of each aa in the subsequence (46). A type of scoring used the relative frequency (F i,j ) of the aa i at position j and the background frequency (p i ) of the aa i (46). The matrix element score (s i,j ) was calculated as the log ratio of these frequencies (f i,j /p i ). The subsequence score (SS) can then be calculated from the summation of the s i,j scores associated with the aa in the subsequences (46).
Because the peptides in the series are 8 -12 aa long, a TP PSSM corresponding to the TP N-ter 10 aa was created. The N-ter sequences of TPs from the 208-TP set (10) were used without aligning the sequences. Because the first position is always Met (Fig. 2G), this position is ignored. Although position 2 had a distinct aa distribution, positions 3-12 had approximately the same distributions (Fig. 2E). For the first position of the PSSM matrix (j ϭ 1), the calculated relative frequency of aa at the sequence position 2 (f i,l ) showed that the 208-TP lacked Cys, His, Trp, and Tyr at this position. We instead calculated these frequencies from a larger set, the 912-TP set (21). However, we found no Trp at this position. We fixed this missing value by using the value of 0.01 as the number of Trp counts at this position. To avoid the same problem, the aa from the sequence position 3-12 were combined to calculate an averaged frequency (f i,a (3)(4)(5)(6)(7)(8)(9)(10)(11)(12) ), which was used as the frequencies of the PSSM matrix at positions 2-9. The aa frequencies of Uni-Prot Release 2012_10 were used as the background frequencies (p i ). We used log base 2 of the ratio of F i,j over ps i to calculate the matrix element s i,j scores. The PSSM score of a sequence was then calculated from the summation of the s i,j scores from nine positions corresponding to sequence positions 2-10. The frequencies and the s i,j scores of this TP PSSM are reported in Table 1. A Perl script was written to perform this calculation and is available upon request.
Homology Modeling of Chloroplast Hsp70 -The crystal structure of substrate-binding domain (SBD) of E. coli DnaK with its peptide substrate (PDB 1DKZ) (50) was used as the template. First, the sequence of an Arabidopsis chloroplast Hsp70 (AtcpHsc70-1) was aligned to the other Hsp70 sequences using T-Coffee (51). The alignment of the SBD as shown in Fig. 10B was used to build the homology model of AtcpHsc70-1 on the Swiss-Model server (52). The electrostatic surfaces of the structures were produced from Molecular Operating Environment program (MOE; Chemical Computing Group) using CHARMM force field.

RESULTS
Bioinformatic Analysis of TP Datasets-Previously, Ivey et al. (22) suggested that ϳ70% of TPs in the CHLPEP dataset (53) harbored a strong H70BS at their N-ter. However, because only 14 TPs in this set are from Arabidopsis and it is redundant (53 instances of SStp), we chose to revisit the H70BS analysis. Using the 208-TP dataset of experimentally verified chloroplast proteins from Arabidopsis (10), we analyzed the occurrence and placement of H70BSs with the RPPD algorithm (22, 41) that utilizes 6-aa window analysis. The analysis with the Rudiger et al. (54) algorithm using a 13-aa window was not performed

components of the TP PSSM for analysis of the N-ter sequences
These parameters were derived from the N-ter sequences of TP datasets. f i,j indicates relative frequency of the aa i at position j; p i indicates background frequency of the aa i derived from UniProt database; s i,j indicates the matrix element score calculated from the log ratio of f i,j /p i . because it is less accessible to the terminal regions and shows inconsistency with our previous results (21). Fig. 1A shows this analysis and TargetP prediction. The orange-yellow heat map in Fig. 1 represents levels of Hsp70 affinity predicted by RPPD with higher score (yellow/white) corresponding to higher affinity. The TPs were clustered into nine groups based on their H70BS patterns within the first 80 aa (TP and part of mature protein sequence) using Hierarchical clustering. These groups show pronounced differences in the position of their highest affinity, which varied from position 5 to 70 (5,10,17,23,32,46,52,58, and 70 for groups 1-9, respectively). This fairly even distribution raised a concern that the H70BS was randomly distributed. To resolve this, we generated 20 random sequence datasets, each contain 208 sequences derived from the aa frequency of the 208-TP set. The RPPD score of each sequence were calculated. We then asked how similar are the mean RPPD scores of an entire set as a function of aa positions between the random sets, the 208-TP set, and the other six control sets. The control sets include a much larger set of 912 reliably predicted TP from Arabidopsis (21) and the training sets of TargetP (38) for chloroplast (141-TP), mitochondrial (mTP368), nuclear (Nu54), cytosol (Cy108), and secretory pathway (SP269) localized proteins. The number in the dataset name indicates the total sequences in the set. We applied hierarchical clustering to the mean RPPD score patterns of theses sets. All three TP datasets (208-TP, 912-TP, and 141-TP) were found to cluster together (Fig. 1B). This TP cluster only contains two random sets despite the fact that the 20 random sets were derived from the 208-TP set. Thus, the observation that the RPPD patterns among three separate TP sets are more similar than those of the random sets indicates that this distribution is not random and seems to be conserved among TPs. In addition, the TP groups in Fig Table S2. We further analyzed this dataset using TargetP (38). About 90% were predicted to localize to chloroplasts (C-M-S-O heat map in Fig. 1A) with 68% of these TPs classified into TargetP reliability classes 1 and 2 (white for class 1 in RC heat map of Fig.  1A). The 208-TP set has a predicted TP length of 52.2 Ϯ 17.2 aa (mean Ϯ S.D.).

Amino acids
Previously, the uncharged N-ter of TPs was identified by von Heijne et al. (13), and we have verified them using a larger set of 912 predicted TPs from Arabidopsis (21). The same analysis was applied to the 208-TP set. To show %UA along the length of TPs, we used a sliding window of analysis (from 5 to 17 aa) and observed that the %UA at the N-ter had a maximum value of 94% and dropped to 81% between aa 20 and 30 (Fig. 1C). The C-ter after aa 40 contained ϳ75%UA. When fit to a sigmoidal curve, the transition mid-point between 94 to 81%UA occurred at aa 14, regardless of the window size.
To investigate distribution of the %UA of the TP N-ter, we segregated the N-ter into three groups based on their %UA (60 -79%, 80 -99, and 100%). The fraction of TPs observed in each group is shown as a function of N-ter length in Fig. 1D. The fraction of 100% uncharged N-ter reduced from ϳ78 to 30% when the analyzed N-ter length increased from 5 to 17 aa. Inversely, the fraction of 80 -99% uncharged N-ter increased from 18 to 65%. Yet, the fraction of 60 -79% uncharged N-ter remained very low and fluctuated between 3 and 13%. Thus, Ͼ90% of the TPs in the 208-TP set contained a predominantly uncharged N-ter (Ͼ80%UA) regardless of analysis length. Moreover, Ͼ40% of the TPs have totally uncharged N-ter within their first 15 aa.
When separated into nine clusters as shown in Fig. 1A, each TP cluster had about the same average %UA in the N-ter 15 aa (Fig. 1E). We also plotted the frequency distribution of the maximal RPPD score within the N-ter 15 aa of the 208-TP set (Fig.  1F). Based on the finding that Hsp70-interacting peptides have an RPPD score above 2 (22), we found that 77% of TPs in the 208-TP set had an RPPD score above 2 within the first 15 aa, indicating that these TPs contain an N-ter H70BS. The same analysis was applied to the 912-TP set from Arabidopsis (21) and the 141-TP set containing TPs from various plant species (38), and we found that 82 and 78% of the TPs had an RPPD score above 2, respectively ( Fig. 1F). Although Fig. 1A shows that 32-47% of TPs harboring the strongest H70BS at the N-ter, these analyses indicate that the majority of the N-ter of TPs (77-82%) is able to interact with Hsp70.
We then asked whether there is a correlation between %UA and predicted Hsp70 affinity of the TP N-ter. Using the 208-TP set, the maximal RPPD scores from the TP N-ter 15 aa were plotted against the %UA of the N-ter 15 aa (Fig. 1G). The RPPD scores of the TPs with different %UA were distributed throughout the range of 2 to 30, suggesting that there is no direct correlation between the %UA and Hsp70 affinity of the TP N-ter.
Different TP datasets have been used in various works, yet dataset comparison has not been done. We therefore compared three TP datasets, the 208-TP set (10), the 912-TP set (21), and the 141-TP training set of TargetP (38). We found that the majority of TPs in each set did not overlap with the others (Fig.  1H). This was expected for the 141-TP set where TPs were derived from all plants in Swiss-Prot, but it was unexpected for the 208-TP and the 912-TP sets because both were composed solely of Arabidopsis proteins. The discrepancy between the 208-TP and the 912-TP sets may result from the 912-TP set containing only TPs that are classified by TargetP with the top two reliability classes. Similar analysis of the 208-TP set revealed that only 125 TPs belong to classes 1 and 2. The large differences observed between the TargetP training set and the other two datasets clearly indicate that this set is not a good representation of higher plant TPs, and it provides motivation for a more systematic investigation of TP properties in general.
Construction of TP N-ter Mutants-We developed a quantitative in vivo plastid protein import assay as described in Chotewutmontri et al. (21). This assay utilizes the transient expression of different TP-YFP fusion proteins in onion epidermal cells to determine the plastid targeting efficiency of the TPs and is expressed as a ratio of plastid/cytosolic YFP signals. To evaluate the variability of this assay, we expressed SStp followed by m20 from tobacco fused to YFP and measured the ratios from three widely available onion cultivars. We observed that    the ratios were reproducible within a given cultivar but differ significantly between cultivars (supplemental Fig. S1). For this work, we used only the Vidalia cultivar because it was used in our previous study (21). Using this in vivo assay, we investigated the roles of uncharged and Hsp70-binding properties of the TP N-ter by utilizing two previously generated constructs. The first construct is based on a native TP, SSF, fused to m20 followed by YFP (SSF-20-YFP) and the second related construct that contains a dysfunctional synthetic TP coding for the reversed sequence of SSF called SSR-20-YFP. Although SSF-20-YFP localized to the plastids, SSR-20-YFP had a very low targeting efficiency (21). These two constructs served as the positive and negative starting vectors that provided import-competent and import-deficient constructs, respectively.
The N-ter of these two constructs were further modified by adding additional peptide sequences in front of SSF and SSR. To estimate the function of the N-ter peptides, we interpreted the results from SSF-20-YFP and SSR-20-YFP as the upper and lower limit of the real values. Using only the data from the dysfunctional SSR case may not be accurate because it is not a real TP and the fact that the fusion product can have unforeseen characteristics. These new peptide sequences were derived from a detailed study where their affinities for three different Hsp70s were experimentally determined (31). We selected nine peptides with a range of Hsp70 affinities. The sequence and the experimental affinity of these peptides are shown in Fig. 2A. These peptides originated from very different sources, yet none are from plants. We have truncated the original peptide to only the Hsp70-binding domains (31) creating peptides ranging  from 8 to 12 aa in length. All truncated peptides contain polar and/or charged aa except for pp38 peptide. The affinities of full-length peptide to E. coli DnaK, bovine e luminal BiP, and bovine cytosolic Hsc70 had been determined as rating affinities of the competitiveness against RCMLA in binding to Hsp70s (31). We calculated the summation of the rating affinities to generate a combined Hsp70 rating affinity for each peptide (Fig.  2A). The TP N-ter mutants were generated by extending their original N-ter to include the truncated peptide sequences together with the substitution of internal Met to Ala or Ser (Fig.  2B).
To assess whether the TP N-ter recognition is based on local physicochemical properties, another series of mutants was generated. The N-ter 10 aa of both SSF in SSF-20-YFP and FDtp named FDF in FDF-20-YFP (21) were replaced with their reversed or scrambled sequences (Fig. 2B). These mutants should retain the same physicochemical properties as the original constructs.
Bioinformatic Analysis of the Hsp70-interacting and Noninteracting Peptides-To further characterize these nine peptides, we determined hydrophobicity and %UA of the peptides. The hydrophobicity scale of aa by Kyte and Doolittle (43) was used to calculate the hydrophobicity score. When the combined rating affinities were plotted against the hydrophobicity scores (Fig. 2C), seven of the nine peptides showed some correlation (R 2 of 0.82; p value of 0.0046). However, PepG and np09 showed no correlation, and PepG had a strong Hsp70 affinity yet is hydrophilic. Inversely, np09 was very hydrophobic but has a weak affinity. The combined rating affinities versus %UA plot (Fig. 2D) also showed a correlation with R 2 of 0.72 (p value of 0.015).
Although these peptides were not plant-derived, it was possible that they were simply functioning as a TP due to random sequence similarity, so we tested whether the utilized N-ter peptides were similar to TP N-ter. The PSSM method, which is heavily used in identification of motif or pattern in the sequences (45), was utilized, even though TP N-ter are poorly conserved with high levels of Ser and Thr (Fig. 2, E and F). The PSSM scores were calculated based on the aa distributions at specific positions in the TP sequences in comparison with the background frequencies (the aa frequencies of UniProt database). A total PSSM score greater than 0 indicates a higher chance of being a TP N-ter than a random sequence. About 90% of tested TP N-ter had the PSSM scores greater than 0 (Fig. 2G). The PSSM score distributions of nonchloroplast proteins (mTP368, Nu54, Cy108, and SP269 datasets mentioned earlier) were plotted as references. We found that DRC8, pp9, HA, pp38, PepG, HbS, np09, and V10 had the scores of Ϫ12.8, Ϫ12.7, Ϫ12.6, Ϫ8.8, Ϫ8.7, Ϫ5.2, Ϫ5.1, and Ϫ3.0, respectively, suggesting that they are not similar to TP N-ter (Fig. 2G). Only A6R had a positive score of 1.4, which makes it a possible TP N-ter. However, about 20% of known TPs, (both TargetP and 208-TP datasets) had scores less than 1.4 (Fig. 2G), which is consistent with this predictive tool having a specificity of ϳ87% (55).
To further characterize these new TPs, we used five subcellular localization programs to predict localization of the mutants. The most frequent predictions indicated that all mutants are targeted to the chloroplasts ( Table 2).
Affinity of the Peptides to the Chloroplast Hsp70 CSS1-To determine the affinity of the peptides, the mature domain of chloroplast CSS1 precursor protein from pea (56) was expressed in E. coli and purified with N-ter His tag (His-mCSS1, Fig. 3A). Native PAGE indicated that the majority of His-mCSS1 was in monomer form, although dimer and aggregated forms were also detected (Fig. 3B). We chose to work with the well characterized model Hsp70 substrate, RCMLA. We produced RCMLA from bovine ␣-lactalbumin and further labeled it with fluorescein, generating F-RCMLA. The MALDI spectra indicated all eight Cys residues were carboxymethylated, and every RCMLA was labeled with 2-5 fluoresceins (Fig. 3C).
Using this modified RCMLA, we performed competitive binding assay using SSF against F-RCMLA binding to His-mCSS1. Native PAGE was used to separate the CSS1-bound and free forms of F-RCMLA. The proteins were electroblotted onto nitrocellulose membrane to increase sensitivity of fluorescent detection. Fig. 3, D and E, shows that the level of F-RCMLA bound to the monomeric His-mCSS1 (RϩM) decreases when the higher concentration of SSF was added. At 40 M SSF, the bound F-RCMLA was reduced to about 50% (Fig. 3E).
We then performed the competitive binding assay using short synthetic peptides corresponding to the first 15 aa of the N-ter mutant constructs based on SSR (Fig. 2, A and D). We did not use the peptide based on SSF constructs because the N-ter of SSF contains a strong H70BS and would complicate the results (22). A single concentration at 40 M of the peptide was used. The level of F-RCMLA bound to the monomeric His-mCSS1 (RϩM) was quantified and expressed as the percentage of reduction of the bound F-RCMLA from no competitor control (Fig. 3, F and G). These relative affinities of the peptides are in good agreement with the previously determined affinities to other Hsp70s (31). Both peptides derived from noninteracting peptides HA and np09 showed very weak affinities although other peptides derived from Hsp70-interacting peptides showed greater affinities to His-mCSS1. This result corroborated the prior results and confirmed their activity with the recombinant form of the major stromal Hsp70, His-mCSS1.
In Vivo Import Assays-The N-ter peptide extensions were then tested for their ability to direct the import of YFP protein in vivo in both onion epidermal cells and Arabidopsis seedlings. All constructs are based on the same expression cassette differing by only about 10 aa from others, and we assumed that the expression should be very similar. The mature forms of the precursors are exactly the same and should have the same stability in the chloroplasts. Although a range in expression was observed in different cells as intrinsic variations of biolistic transformation, only a limited range suitable for fluorescent imaging detection was used. To account for variation, the images from at least 30 cells per construct were taken. The ratio analysis of the data also partially controls for varying expression levels. Fig. 4, A-D, shows representative images of the onion cells transiently expressing YFP 12 h after transformation. Although most of the constructs were able to target to plastids at different levels, pp9-SSR could not be detected in plastids, and np09-SSF was not expressed. Four constructs, pp38-SSF, The imported proteins were measured within 15 min of import. The precursors were assigned as not imported proteins if the import rates were lower than 20% of SSF-20-YFP import rate.

Properties of Transit Peptide N Terminus
pp9-SSF, PepG-SSR, and V10-SSF, had dual localization to plastids and mitochondria. Fig. 5 shows a representative of the dual-localized construct. V10-SSF was localized to the compartments with the same characteristics as those of plastid (around 5-m compartments that sometimes form small tubular structures called stromules) and mitochondrial (small with dramatic morphology ranging from round spots to tubular shape) markers (32). To extend the targeting results from the non-green plastid found in the monocot (onion), the constructs were also transiently expressed in dicot chloroplasts using Arabidopsis seedlings. Fig. 4, E-H, shows the localization patterns of these constructs 2 days after transformation. More than 30 cells expressing each construct were observed. The construct was assigned as cytosol-localized if the majority of the cells showed a strong YFP signal in the nucleus resulting from diffusion of YFP in the cytosol. The localization observed in Arabidopsis is fairly similar to that observed in onion cells in terms of plastid targeting ( Table 2). We considered the dual-targeted constructs as having the ability to target to plastids. However, the frequency of dual localization was significantly reduced and only observed from the V10-SSF and A6R-SSF constructs. This difference may reflect the more robust targeting to green plastids with the SSF TP than what is observed in the non-green, nonphotosynthetic leucoplasts found in onion cells.
To quantify the efficiencies of plastid targeting, images of onion cells similar to those of Fig. 4, A-D, were used. Although both onion and Arabidopsis showed similar qualitative results, we quantitated the targeting activity in onion because the distribution and uniform shape of the plastids permitted a robust quantitative analysis. We calculated the ratio of plastid/cytosolic YFP targeting for 2-3 plastids in each of Ն20 different cells. These values were averaged, and the targeting ratio is reported in Fig. 4, I and J. These ratios indicate that the mutants of both a weak Hsp70-binding peptide (np09) and a nonbinding peptide (HA) were unable to support the targeting of either SSF or SSR constructs and had the lowest targeting efficiencies at ϳ1.7 (Fig. 4I). The addition of the peptides with strong Hsp70binding affinities (PepG, V10, DRC8 and A6R) yielded the much higher ratios of ϳ5.5 for both SSF and SSR. Consistent with this trend, the addition of the moderate Hsp70-binding peptide, HbS, was found to have intermediate ratios at ϳ4.0 (the average of both SSF and SSR). Only two peptides behaved differently from their predicted and measured Hsp70 affinities; the two strong Hsp70-binding peptides, pp9 and pp38, displayed low targeting ratios at ϳ2.0. We also found that all constructs based on SSF have higher efficiency than those with SSR. We used the ratio cutoff of three to assign the targeting of the constructs in onion cells (Table 2).   The reversed (N10R) and scrambled (N10S) mutants that we described in Fig. 2B show reduced targeting ratios compared with the original constructs (Fig. 4J). However, these ratios are still significantly higher than those of their reversed TP con-structs (FDR and SSR) based on Tukey's test (␣ ϭ 0.05), except N10R-FDF versus FDR with a p value of 0.0576. This result suggests that the recognition of the TP N-ter domain is based on physicochemical properties and is independent of sequence.

Properties of Transit Peptide N Terminus
To further determine the relationship between two properties of the TP N-ter (%UA and mCSS1 affinity) and the targeting efficiency, we plotted the plastid/cytosolic YFP ratios of the mutants with unambiguous data. The ratios from the dual-targeting constructs were omitted because the defined cytosolic region in our analysis includes some mitochondria, and the measured cytosolic YFP signal includes the signal from mitochondrial YFP in these constructs. The plastid/cytosolic YFP signal ratio of these constructs is always lower than the true value. The ratios of pp9 and pp38 were also omitted because they lack a native property of TPs, which will be discussed below. When the plastid/cytosolic YFP ratios were plotted against %UA (Fig. 4K), %UA showed weak correlation with Pearson correlation coefficient (r) of 0.22 and a p value of 0.51. The targeting ratio versus mCSS1 affinity (Fig. 4L) showed a stronger correlation with r of 0.72 and p value of 0.01.
In Vitro Import Assays-To determine the import rate of the N-ter mutants, in vitro import assays using isolated pea chloroplasts were employed. The precursors were labeled with [ 35 S]Met via in vitro translation. We have performed the assays in two batches. The translation products from each batch are shown in Fig. 6, A and B. Unfortunately, in vitro translations of these constructs produced both full-length proteins and some smaller species possibly due to the translation using internal ATG start sites. The translation from the ATG of the last resi-dues of SSF, SSR, FDF, and FDR corresponds to the same 30-kDa proteins, and the translation from the ATG of the first residue of YFP corresponds to the 27-kDa proteins. To account for this variation, the translation products were analyzed by SDS-PAGE and autoradiography to allow the quantitation of the full-length proteins. We then tested each batch of precursors for import using equal input of full-length precursors in the assays. Fig. 6, C-H, shows the import time course of the precursors where the accumulation of the imported and processed mature domain was quantitated from the autoradiographs. The precursors and the mature proteins were well separated and easily identified. However, all of the translation products yield a similar sized preprotein whose apparent molecular weight is accurate based on the pre-stained protein standards. The mature band is also the correct molecular weight and also has a uniform size for the different constructs (Fig. 6H). The import of some precursors showed an early plateau after 5 min, although others did not plateau after 15 min of import. The maximum import rates determined from these time points were plotted in Fig. 7A. Surprisingly, V10 mutants import at a much higher rate than SSF-20-YFP in vitro but not as high in vivo. To compare the import rates with the targeting efficiencies, the in vivo targeting ratios of the precursors were plotted against the in vitro import rates (Fig. 7B). This plot indicates that the precursors with higher import rates have higher target-  (21). D and H, reversed (N10R) and scrambled (N10S) mutants. I and J, ratios of plastid/cytosolic YFP were measured from onion assays similar to those shown in A-D. Means Ϯ S.E. are shown. 20 Յ n Յ 30. In addition, FDF, FDR, SSF10, SSR10, FDF10, and FDR10 from a previous study (21) are included. These constructs with suffix 10 contain the N-ter 10 aa of the opposite TP at their N-ter. I, ratios from the N-ter peptide constructs. J, ratios from the reversed (N10R) and scrambled (N10S) N-ter constructs. K, plastid targeting efficiency ratios were plotted against the %UA calculated from the N-ter 15 aa of the precursors. The correlation line was determined without the dual-localized mutants, pp9 and pp38 mutants. L, plastid targeting efficiency ratios were plotted against the relative mCSS1 affinity. The correlation line was determined without the dual-localized mutants, pp9 and pp38 mutants.  MARCH 20, 2015 • VOLUME 290 • NUMBER 12

Properties of Transit Peptide N Terminus
ing efficiencies. The different amount of precursors used between two batches also resulted in different import rates between two batches. The variation in import in vitro versus in vivo for certain constructs such as V10 will require additional investigation.
Analysis of pp9 and pp8 Constructs-We further investigated the constructs based on the two strong Hsp70-binding peptides, pp9 and pp38, which were unexpectedly import-deficient. The sequences of pp9 and pp38 contain multiple aromatic aa, including 2 and 3 Trp, respectively ( Fig. 2A). Trp is the  (Table 1). Trp is not only hydrophobic but also has very large side chains. We developed a sliding window analysis that plots the total HASA of the peptide chain (Fig.  8A). It is clear that SSF and TPs as a group (the 208-TP set) have a relatively constant yet low HASA (Ͻ500 Å 2 /8 aa). However, when we measured pp9 and pp38 constructs, they have a much higher HASA at their N-ter compared with others (supplemental Fig. S2). The fact that this area is Ͼ2 times the TP average suggests that this property may interfere with the translocation. We investigated the failure of import of pp9 and pp38 constructs using pulse-chase assays (Fig. 8B). The 35 S-labeled peptide-SSF fusion to mSSU followed by 3ϫHA and His 6 tags was used as the precursor. The precursors were incubated with chloroplasts in the presence of 0.1 mM ATP, which allows the precursors to pulse as the import intermediates inside the TOC/TIC translocons and not translocated into the stroma (36). The chloroplasts with bound precursors were re-isolate, and the translocation was initiated with the presence of 2 mM ATP. The import was terminated at different times, and the  A and B). For each translation batch, equal molar concentrations of precursor proteins were used in the import assays. The precursors from translation batches 1 and 2 were used in C-E and F and G, respectively. Two sets of assays were performed. Representative autoradiographs of the importprocessed mature domains are shown on the right (C-G). The amounts of mature domains were measured at 5, 10, and 15 min after import started. The values from the SSF and SSR series were normalized where the amount of mature domain of SSF construct at 15 min was assigned as 100%. The amount of mature domain of FDF at 15 min was assigned as 100% for FDF series. A and B show the autoradiographs of 1-l translation products from batches 1 and 2, respectively. Top labels, constructs; p, precursor size; a, additional products; asterisk, mature domain size. C, constructs based on SSF-20-YFP from the translation batch 1. The in vitro import rates were derived from the fastest rates within 15 min of import. Two separate sets are shown based on the batch of in vitro translation of the precursors. The raw values from m20-YFPHis 6 , SSF, and SSR series were normalized to the raw value of mature domains from a 15-min import of SSF-20-YFP (100%). The FDF series were normalized to the raw value of mature domains from a 15-min import of FDF-20-YFP. The precursor without TP, m20-YFPHis 6 , was used as a negative control. n ϭ 2. Means Ϯ S.E. are shown. A, maximum import rates. The rates of batch 2 (filled bars) were scaled to match the batch 1 rates (opened bars) using the native SSF and SSR rates as controls. B, maximum import rates plotted against the plastid targeting efficiency ratios. chloroplasts were re-isolated. Equal amounts of total proteins were loaded on the gel in Fig. 8B. As expected for the wild-type SSF, the amount of precursor was reduced, although the amount of imported protein was increased over time. The import-deficient and non-Hsp70-interacting HA precursor failed to form the bound intermediate. Although the precursor forms of both pp9 and pp38 were reduced over time, the imported forms were not detected confirming the import deficiency of these constructs. This result also suggested that the pp9 and pp38 precursors were able to form the import intermediates but unable to translocate into the stroma, and the intermediates were released from the envelopes over time.

DISCUSSION
Our previous work demonstrated that the N-ter 10 aa of two well characterized TPs (SStp and FDtp) determined the translocation efficiency of the preprotein both in vivo and in vitro (21). When these 10 aa were added to the N-ter of an inactive precursor, both of these sequences restored the translocation of the preprotein in vitro and in vivo, suggesting that the critical element enabling translocation was found within these 10 aa. Bioinformatic analysis of these short N-ter indicated that both had a predicted strong H70BS, whereas the N-ter of inactive preproteins lacked this property (21). In this study, we directly confirm that it is the presence of N-ter H70BS that determines the ability of a precursor to be translocated by testing a set of nine unrelated peptides that possess a range of affinities for the chloroplast Hsp70 CSS1.
Locations of H70BSs Vary in TPs-Previous works had predicted that a large percentage of TPs contained an H70BS at the N-ter region based on the redundant TP datasets (22,23). When this analysis was repeated using a nonredundant and experimentally confirmed set of chloroplast TPs, we observed a much greater distribution in the placement of H70BSs (Fig. 1A). Using a clustering algorithm, we can group these TPs into nine clusters where the position of the strongest H70BS is located in different regions. How this H70BS placement functions in relation to the other identified motifs (15,18) and contributes to a general TP design (21,57,58) will be the subject for further study. It is now clear that only three clusters of the TPs have a strongest H70BS within their N-ter 20 aa and together represent 32% of the TPs. This is considerably less than 70% reported by our laboratory (22). However, this result may suggest that many of the well studied proteins that are highly abundant and associated with major photosynthetic processes, such as SSU and FD, may have evolved a TP organization that is very efficient to enable rapid translocation and that a common feature of these proteins is the N-ter placement of a strong H70BS. The inclusion of these proteins largely bias the previous TP sets because they are both abundant and very well studied (7,23,53). Despite 32% of the TPs having the strongest H70BS at the N-ter (Fig. 1A), our analysis also indicates that the majority of the TP N-ter (77-82%) is able to interact with Hsp70 (Fig. 1F). However, it is possible that these remaining TPs contain a N-ter-binding site for another molecular chaperone such as Hsp90 or Hsp100.
Role of N-ter Hydrophobicity Versus Hsp70 Recognition-Our earlier paper verified the observation that TP N-ter tends to be uncharged (13,21). Because it has been shown that Hsp70 proteins recognize hydrophobic peptides (31,41,54), it was also important to show that translocation was dependent on an actual Hsp70 interaction and not just a consequence of their hydrophobicity. It is now clear that the hydrophobic core of a peptide/protein fits into the substrate cavity of Hsp70s, although the charged flanking regions interact with the surrounding area (59). Because most TPs contain an uncharged N-ter, it creates a challenge to separate the Hsp70-interacting function from the uncharged property (31,41,54). Using the database of 208 experimentally verified TPs, we observe a transition point between the highly to moderately uncharged regions at aa 14 (Fig. 1C), which is almost the same as aa 15 reported in the 912-TP set (21). Interestingly, comparison of the proteins that make up these two sets, as well as the TargetP training set, indicates that there is very little overlap in composition (Ͻ8% are common to any two sets). Even when restricted to only Arabidopsis protein datasets (208-TP and 912-TP), only 7.4% are common to both (Fig. 1H). However, in all three sets, the highly uncharged N-ter is a common feature. Considering only this highly uncharged 15 aa region, we found that 43% of the 208 TPs contain purely uncharged aa (Fig. 1C). When the TPs from clusters 1 to 3 (Fig. 1A) that contain the strongest H70BS within the first 20 aa were analyzed, 36% of them are completely uncharged in the N-ter 15 aa. These results indicate that more than a third of TPs have purely uncharged N-ter regions regardless of their N-ter Hsp70 affinities.
Non-native N-ter Are Not Similar to TP N-ter-The peptide set used here is composed of nine peptides that are 8 -12 aa in length. For each peptide, two mutants were generated by fusing the peptide at the N-ter of SSF-and SSR-20-YFP fusion constructs (Fig. 2B). All peptides except A6R were predicted to be compositionally different from the TP N-ter (Fig. 2G). When we analyzed the complete set of 18 fusion proteins for their abilities to be predicted as TPs by five different subcellular localization programs (Table 2), 14 were predicted by Ն3 programs, 11 by Ն4 programs, and only 5 by all 5 programs. These results and the PSSM analysis suggest that these non-native peptides have functional properties not recognized by the leading prediction tools that are largely based on aa composition.
N-ter Peptides Restore Import in Vivo-We have selected nine tested peptides that were previously tested for binding to three functional and evolutionarily distinct Hsp70s as follows: E. coli DnaK, bovine cytosolic Hsc70, and bovine endoplasmic reticulum-localized BiP (31). We extended this assay for affinity to the major plastid Hsp70 CSS1 (Fig. 3, F and G); these peptides span a range of interaction strengths. The peptides, pp9 and pp38, are very nonpolar and contain no charged aa, and they strongly and moderately interact with CSS1, respectively (Fig.  2, A, C, and D, and 3G). The PepG, V10, DRC8, A6R, and HbS contain charged aa, range from 67-91%UA, and interact with CSS1. Peptides, np09 and HA, contain acidic aa and do not interact with CSS1. Thus, our peptide set contains an experimentally verified set of peptides with strong, moderate, and no interaction with CSS1.
We have previously designed and described a positive (SSF-20-YFP) and a negative SSR-20-YFP) construct that targets YFP to plastids (21). These constructs allowed the testing of target-ing and translocation in multiple plants as follows: Arabidopsis, onion, and tobacco. We used these constructs as parental constructs to test the effect of the nine N-ter peptides described above and shown in Fig. 2A. The effects on YFP targeting were observed using in vivo plastid protein import assays in both onion (Fig. 4, A and B) and Arabidopsis (Fig. 4, E and F) cells. The results from both plants are similar in terms of plastid targeting ( Table 2). We quantified the in vivo targeting efficiencies of these 18 constructs from onion cells. In each pair, the mutant based on SSF had higher efficiency than the SSR mutant (Fig. 4I). This small but reproducible difference indicated that the targeting efficiency of the constructs is not only affected by the N-ter peptide sequence but also from secondary contributions derived from the sequence of SSF or SSR. Nevertheless, the better ability to translocate preproteins containing two tandem H70BSs (N-ter peptide and N-ter of SSF) may be the result of cooperativity at one or more step during translocation.
By comparing in vivo import efficiency with %UA and CSS1 affinity values, we found a positive correlation in both %UA and CSS1 affinity (Fig. 4, K and L). These correlations were determined without the values from the dual-targeting constructs and the pp9 and pp38 constructs. In general, TPs with a higher %UA or CSS1 affinity N-ter had higher in vivo import efficiency. However, the correlation between CSS1 affinity and the import efficiency at r of 0.72 (p value of 0.01) is much greater than the correlation of %UA at r of 0.22 (p value of 0.51). Fig. 9 summarizes the import assays for both in vivo (left axis) and in vitro (right axis) targeting results of each peptide when placed in front of either the SSF and SSR constructs. The N-ter peptides are shown according to their experimental CSS1 affinities (x axis). If we used a cutoff of 10 for the affinity, it is clear that the mutants with noninteracting N-ter cannot support the import, although five out of seven mutant sets with H70BS N-ter can. The pp9 and pp38 peptides failed to support import both in vivo and in vitro despite having strong Hsp70 affinities. The lack of protein import of the pp9 and pp38 constructs requires some additional analysis because both contain N-ter with 100%UA and strong Hsp70 affinity (Figs. 2, C and D, 3G,  and 4I). This indicates that other properties of their N-ter either fail to comply with some other criteria for the import or possibly have some elements that block translocation. Our results (Fig. 4I) and previous results (21) showed that only precursors containing H70BS N-ter were competent in plastid targeting indicating that these TPs met the requirements for every step.
Because the TP N-ter domain is required for the translocation step (21), we hypothesize that the failure of pp9 and pp38 peptides occurs during binding and/or intermediate steps.
The sequences of pp9 and pp38 contain multiple aromatic aa ( Fig. 2A). Aromatic aa are well known to interact directly with lipids in the membrane (60,61). Trp and Phe are not only hydrophobic but also have very large side chains. Although TPs on average have a relatively constant HASA at Ͻ500 Å 2 /8 aa, pp9 and pp38 constructs have Ͼ2 times HASA ( Fig. 8A and supplemental Fig. S2). Pulse-chase assays showed that these constructs were able to form the early import intermediates but failed to be translocated (Fig. 8B). It may be that these aa interact very strongly with the lipid bilayer in such a way that the TP cannot productively engage one or more components of the translocon. Further experimentation and mutagenesis may uncover how these peptides interact with artificial membranes mimicking the chloroplast outer envelope. It has already been shown that the N-ter of SStp can interact with lipids and undergo a conformational change (12,14) that may be involved in the initial reversible binding step.
Plastid-localized Hsp70s May Have Different Affinity for Peptides-This work and previous studies (22)(23)(24) have shown that most TPs contain a strongly predicted H70BS. Analysis of early datasets such as the TargetP training set have indicated that these sequences are largely located at the N-ter; however, our current work as well as the work by Rial et al. (23) indicate that these sequences may be distributed throughout the TP sequence with only ϳ35% being found within the N-ter third of the TP. One limitation of this type of analysis is that the algorithms used were developed from bacterial DnaK. It is possible that the structural properties of the SBD of the plastid Hsp70 may be quite different from DnaK. To investigate this, we built a model of the SBD (aa 460 -682) of a plastid Hsp70 from Arabidopsis, cpHsc70-1, using the crystal structure of the E. coli DnaK SBD (aa 389 -607) bound to a 7-aa peptide (Fig. 10A). This domain has 52% identity and 72% similarity to the SBD of E. coli DnaK. AtcpHsc70-1 is even less similar to the mammalian chaperones with only 50 and 49% identity to bovine Hsc70 and BiP, respectively. The alignment is shown in Fig. 10B.
Analysis of the homology model indicates that AtcpHsc70-1 has considerable differences in aa that line the peptide-binding channel (Fig. 10, C and D). Based on the bound peptide in DnaK structure, it is clear that in the AtcpHsc70-1 model several aa The in vivo ratio of 3, the relative in vitro import rate of 3, and the relative mCSS1 affinity of 10 were used to divide these data into four groups as follows: negative affinity-negative import (Ϫ/Ϫ), negative affinity-positive import (Ϫ/ϩ), positive affinity-negative import (ϩ/Ϫ), and positive affinity-positive import (ϩ/ϩ), which are highlighted in red, blue, yellow, and green areas, respectively. A line is drawn to connect the data points from the same N-ter sequence. Note that the scaled values from in vitro import of batch 2 were used here.
near the C-ter of the peptide are quite a bit larger and more charged (Glu-508 and Arg-529 compared with Ser-437 and Asn-458 of DnaK, respectively). It is possible that these structural changes could result in a significant difference in the affinity and specificity for how AtcpHsc70-1 binds peptides. Therefore, it is possible that the algorithms used to predict the H70BSs of TPs may not be fully accurate in predicting where the plastid Hsp70 binds TPs. Unfortunately, without any additional data, these existing prediction tools are the best we have to predict Hsp70 binding. Multiple Chaperones May Drive Translocation-Although the mechanism of protein translocation into plastids is still poorly understood (4, 62-65), most current models illustrate a vectorial transfer of the TP into the organelle with the N-ter entering the stroma first and the C-ter and the mature domain remaining behind. It has also been suggested that the TP enters and is translocated via the TOC and TIC translocons as an extended linear peptide devoid of secondary structure (66 -68). These models and our observation of an N-ter placement of a strong H70BS indicate that when the TP emerges from the translocon pore, the interaction with the stromal Hsp70 traps the precursor and initiates the translocation. This is consistent with the role of Hsp70s functioning as a molecular motor that couples ATP hydrolysis to the vectorial movement of preprotein translocation across the plastid membranes. This work is supported by experimental work that has shown reduced protein import upon disruption of plastid Hsp70 in several plants, including moss (28), pea, and Arabidopsis (29). It was clearly shown that knock-out mutants of either of the two stromal Hsp70s lead to a defect in precursor translocation across the membrane (29). However, protein import still partially functioned in these plants suggesting other ATP-dependent protein translocation mechanisms. Although our current analysis did not preclude the involvement of cytosolic Hsp70s, our previous report indicated that the interaction of the TP N-ter with a component within the isolated chloroplasts is essential for the translocation process (21). Nevertheless, the cis-located or cytosolic Hsp70 may involve in this process in vivo.
In prokaryotes, the ATPases of the Hsp100/Clp family enable the ATP-dependent degradation of the ClpAP, ClpXP, and other proteases (69). The plastid stroma contains both ClpP and ClpC that function as two-component ATP-dependent protease (70,71). It has been shown that some Hsp100 members, ClpB/Hsp104, are capable of disaggregating protein without proteolytic degradation, suggesting that ClpB functions independently of ClpP or another proteolytic subunit (72). Several studies indicate that ClpC (now called Hsp93) is involved in the import of preproteins into chloroplasts (73,74). In Arabidopsis, there are two plastid forms of ClpC, AtHsp93V and AtHsp93III. Knock-out mutants of hsp93III have a normal phenotype, yet hsp93V mutant plants are defective in protein import (75,76). It is believed that Hsp93 interacts with incoming preproteins in the Tic110-Tic40-Hsp93 complex (77).
Unfortunately, the details of peptide recognition by the Clp/ Hsp100 chaperones are poorly understood. Earlier structural work with E. coli ClpA indicates it has three functional domains as follows: an N-ter domain and two ATPase domains (78). The N-ter is believed to function as the SBD. Both ClpA and ClpB have a prominent hydrophobic surface that lies between two helices (79). Interestingly, it was recently shown that it is the N-ter domain of chloroplast Hsp93 that is important for its function in the import (80). These data in combination with the structural data of ClpA suggest that the Hsp93 N-ter is responsible for binding both to substrates and possibly to TIC. Future work will be needed to elucidate whether Hsp93 has a similar hydrophobic patch that may bind to incoming TPs.
Concluding Remarks-Our results indicate that about 77% of TPs contain an N-ter placed H70BS, although about a third of the TPs contain the strongest HS70BS within their N-ter 20 aa. Although most models put Hsp93 as the motor that drives protein translocation, recent publications have suggested that a stromal Hsp70 is also involved in chloroplast protein import (28,29,57,63,81,82). Thus import may be mediated by at least two motor proteins, Hsp93 and Hsp70. Interestingly, when Hsp93V is knocked out and Hsp93III is knocked down, plastids can still import proteins at 40 -60% of the wild type, suggesting that Hsp70 or some other ATPase function as the motor (83). Stromal Hsp70 interaction may initiate the translocation process based on N-ter H70BS in TPs, although the TPs lacking an N-ter H70BS may utilize other stromal chaperones such as Hsp93. For example, the TP of ferredoxin-NADPH reductase (FNRtp) intrinsically lacks a strong N-ter H70BS based on our analysis in Fig. 10E. Both FNRtp and its mutant (FNRtp-1234) having all of the H70BSs mutated were still able to support import (84). FNRtp was recently found to interact with Hsp93 only when it is located at the N-ter of preprotein (85). This suggests that FNRtp may be an example of a TP that utilizes a N-ter Hsp93-binding site to initiate translocation.
It is clear that two chaperones either alone or together are responsible for the stromal ATP requirement during import (25). Although all known translocons involve one or more peripherally associated NTPase to drive translocation (86), the Hsp100 family has only been shown to participate in protein translocation in other systems like the endoplasmic reticulum and mitochondria where the Hsp70/BiP system is lost or compromised (87). This suggests that the plastid may be unique because it utilizes both chaperones as motors even in wild-type plants (81). Considering the possible dual involvement of Hsp93 and Hsp70, one can envision two modes of function as follows: (a) they bind TPs sequentially and independently or (b) they bind TPs simultaneously and synergistically (81). Recent work has shown some Hsp100 are intimately associated with DnaK, and it is now clear that disaggregation function of ClpB involved interaction with DnaK (88,89). This observation would support synergistic model B. Many works have shown that the same chloroplast import intermediates were associated with both Hsp93 and Hsp70 (28,29) and recently also with Hsp90 (90). It is possible that different TPs may utilize different modes of Hsp70/Hsp93/Hsp90 interaction that depend on the TP sequence and organization. These variations may result in different levels of translocation force and may be why the plastid translocation process is a more robust unfoldase than what is observed in mitochondria (91) or the AAA ϩ ATPases alone (92). It will be very interesting to develop single molecule force measurements or other methods that may reveal how different TPs are recognized by this multichaperone system. Our work has clearly shown that one class of TPs seems to require an Hsp70 recognition domain at the N-ter and indicates that other TPs may be initially recognized by the Hsp93 or Hsp90 chaperones by binding to currently unknown elements probably also placed at the N-ter.