Structural Requirements for Interaction of Peroxisomal Targeting Signal 2 and Its Receptor PEX7

Background: Type 2 peroxisomal targeting signals (PTS2) tag proteins for import into peroxisomes. Results: Characterization of structural properties of PTS2 allows the prediction of novel PTS2 and identification of the binding site on the receptor PEX7. Conclusion: PTS2 forms helical structures that bind to a groove on PEX7. Significance: Understanding the recognition of PTS2 by its receptor is a critical step in peroxisomal protein transport. The import of a subset of peroxisomal matrix proteins is mediated by the peroxisomal targeting signal 2 (PTS2). The results of our sequence and physical property analysis of known PTS2 signals and of a mutational study of the least characterized amino acids of a canonical PTS2 motif indicate that PTS2 forms an amphipathic helix accumulating all conserved residues on one side. Three-dimensional structural modeling of the PTS2 receptor PEX7 reveals a groove with an evolutionarily conserved charge distribution complementary to PTS2 signals. Mammalian two-hybrid assays and cross-complementation of a mutation in PTS2 by a compensatory mutation in PEX7 confirm the interaction site. An unstructured linker region separates the PTS2 signal from the core protein. This additional information on PTS2 signals was used to generate a PTS2 prediction algorithm that enabled us to identify novel PTS2 signals within human proteins and to describe KChIP4 as a novel peroxisomal protein.

Peroxisomes are single membrane-bound organelles, which are found in all nucleated cells. They host a variety of metabolic functions such as detoxification of hydrogen peroxide (H2O2), the degradation of very long and branched chain fatty acids or D-amino acids, and the synthesis of plasmalogens, docosahexaenoic acid, or bile acids (1).
Soluble peroxisomal proteins contain cis-acting peroxisomal targeting signals that mediate their recognition and import into peroxisomes. These signals reside either at the extreme C terminus (PTS1) (2,3) or in proximity to the N terminus (PTS2) (4,5). PTS1 is recognized by its receptor, the peroxin 5 (PEX5) (6,7), and similarly, PTS2 is specifically bound by PEX7 (8,9). These soluble receptors mediate the transport of their cargo proteins to the peroxisomal surface. There they bind to a multimeric protein complex (docking complex) initiating the transfer of the proteins across the peroxisomal membrane (10). In contrast to many other transport mechanisms, the import machinery of peroxisomes can transport fully folded and even oligomerized proteins across the membrane (11,12).
Most of the proven peroxisomal matrix proteins of yeast and mammals harbor a PTS1, but in Arabidopsis thaliana 30% of the known peroxisomal proteins are transported via the PTS2 pathway (13). The PTS2 motif was originally inferred from the analysis of the first 40 amino acids of yeast (4) and rat thiolase (5). More detailed studies on the thiolase PTS2 of yeast (14), rat (15), and tobacco (16) identified relevant positions of the core nonapeptide, and the motif (R/K)(L/V/I)X5(Q/H)(L/A) was established as a canonical consensus sequence (14). Recent investigations took advantage of the increasing amount of available sequence data and tried to extract a more restrictive consensus sequence based on sequence comparison (13,17), which finally led to the suggestion of R(L/V/I/Q)X2(L/V/I/H)(L/S/G/A)X(H/Q)(L/A) for the most common PTS2 variants and (R/K)(L/V/I/Q)X2(L/V/I/H/Q)(L/S/G/A/K)X(H/Q)(L/A/F) comprising essentially all known possibilities (17). The binding of PTS2 to PEX7 is mediated by a conserved WD-40 domain of PEX7 usually folding into a propeller-like structure, which is often found in peptide-binding proteins (18). The whole transport process appears saturable and can be inhibited by antibodies blocking chaperones of the Hsp70 and Hsp40 family (19). In most species the N-terminal part of the protein including the PTS2 (transit peptide) is processed inside peroxisomes (20), in mammals by the protease TYSND1 (21).
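The three consensus sequences quoted above can be written as regular expressions, which makes it easy to check which of them a given nonapeptide satisfies. The following is a minimal sketch; the dictionary keys and the function name `matching_consensus` are hypothetical labels chosen for illustration, and X is treated as "any residue".

```python
import re

# Consensus patterns transcribed from the text above; "." stands for X (any residue).
# The keys are illustrative labels, not terms used in the cited studies.
PTS2_PATTERNS = {
    "canonical": r"[RK][LVI].{5}[QH][LA]",
    "common": r"R[LVIQ].{2}[LVIH][LSGA].[HQ][LA]",
    "broad": r"[RK][LVIQ].{2}[LVIHQ][LSGAK].[HQ][LAF]",
}


def matching_consensus(nonapeptide: str) -> list[str]:
    """Return the labels of all consensus patterns the nonapeptide satisfies."""
    return [
        name
        for name, pattern in PTS2_PATTERNS.items()
        if re.fullmatch(pattern, nonapeptide)
    ]


# Human thiolase PTS2 (RLQVVLGHL) satisfies all three patterns.
print(matching_consensus("RLQVVLGHL"))
```

Replacing the leading arginine by lysine (KLQVVLGHL) would still satisfy the canonical and the broad pattern, but not the pattern for the most common variants, which requires arginine at the first position.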
In humans, the selective defect in the PTS2-dependent import pathway due to mutations in PEX7 leads to the severe disease rhizomelic chondrodysplasia punctata type 1 (RCDP1) (22). Patients suffer from congenital cataracts, growth and mental retardation, shortening of the upper extremities (rhizomelia), and stippled foci of calcification in epiphyseal cartilage (chondrodysplasia punctata) (23). In mammals, three enzymes are known to harbor a PTS2, namely acyl-CoA thiolase exerting the last step of fatty acid β-oxidation, alkylglycerone-phosphate synthase (alkyldihydroxyacetone phosphate synthase) (24) participating in plasmalogen biosynthesis, and phytanoyl-CoA hydroxylase (25) exerting the first step of the α-oxidation of branched chain fatty acids. Mevalonate kinase, which participates in the synthesis of cholesterol, has been reported to be peroxisomal (26) and harbors a PTS2-like sequence (27), yet no interaction with PEX7 could be found (28).
The quality of algorithms evaluating putative targeting signals based on their similarity to naturally occurring signals depends on a large learning set. A limited learning set can be compensated for by implementing structural characteristics. Conversely, the quality of the prediction can serve as a criterion for the relevance of the implemented parameters. Recent investigations primarily analyzed the amino acid frequencies at each position of known PTS2 motifs and of putative PTS2 signals encoded in orthologues of PTS2-carrying proteins (13,17) without evaluation of physical property patterns of side chains or sequence segment-based properties.
Using biochemical, cell biological, and computational methods, we have revealed structural requirements for functional PTS2 signals that are important for their interaction with PEX7. This allowed us to generate a prediction algorithm that identified functional PTS2 signals and a novel peroxisomal protein demonstrating the relevance of the identified criteria.

EXPERIMENTAL PROCEDURES
Cross-complementation-Plasmids encoding PTS2 thiolase -EGFP (HS3E) and myc-hPEX7 variants (ratio 1:3) were transfected into COS7 cells by electroporation, and cells were processed for immunofluorescence microscopy and Western blot analysis as described above.
DNA Cloning-For details on DNA cloning, see supplemental material.
Luciferase Assay-COS7 cells were transfected in 24-well plates using Lipofectamine 2000 (Invitrogen) according to the manufacturer's instructions with the following plasmids: the appropriate combinations of 0.35 µg of bait (pM-GAL4 encoding plasmids) and 0.35 µg of prey (VP16-DNA-BD encoding plasmids) together with 0.1 µg of luciferase reporter plasmid pFRluc (P1383, Stratagene) and 0.05 µg of pCMV-β-Gal (P204, Promega) for normalization. After 48 h, the cells were washed once with PBS and incubated with 50 µl of lysis buffer (100 mM phosphate buffer, pH 7.8, 0.5% Triton X-100, complete protease inhibitor mixture (Roche Applied Science)) for 20 min. The extracts were centrifuged for 20 min at 15,300 × g, and the supernatant was measured. The luciferase assay was performed according to the protocol of the Matchmaker™ system (Clontech) using the pFR-Luc vector (Stratagene) for detection of interaction by luminescence measurements.
Sequence Analysis of PTS2 Segments and Three-dimensional Structural Modeling-cDNA sequences of proteins were derived from the NCBI-based GenBank™ data base (30). For comparison of the proteins within the chordate lineage, the Ensembl data base (31) was used.
Sequence Sets-For the generation of the positive set, only soluble proteins were considered that required the PTS2 signal for their import into peroxisomes (i.e. the PTS2 is either sufficient to target a reporter protein to peroxisomes or mutations in the PTS2 signal destroyed the peroxisomal targeting signal or the encoded protein was found in the cytosol of PEX7-deficient cells). In contrast, PTS2 signals encoded in membrane proteins, such as rat PEX11 (32) or mouse stearoyl-CoA desaturase (SCD1) (33), were not considered. Thus, in summary, 14 evolutionarily independent protein families were identified, namely acyl-CoA thiolase, alkylglycerone-phosphate synthase, phytanoyl-CoA hydroxylase, mevalonate kinase, malate dehydrogenase, citrate synthase, acyl-CoA oxidase, heat shock protein 26 (Hsp26), heat shock protein 70 (Hsp70), transthyretin-like protein, long chain acyl-CoA synthetase, aspartate aminotransferase, amine oxidase, and fructose-1,6-bisphosphate aldolase. If one were to take the whole pool of sequence data from these families, a bias would arise because thiolases are widely conserved in eukaryotic evolution, whereas the majority of the other proteins with PTS2 signals are only found in the plant kingdom (eight families). Metazoa (three families), fungi (one family), or protozoa (one family) together contribute five independent protein families. Moreover, the number of available protein sequences differed between the protein families. To produce an evolutionarily balanced and unbiased set of PTS2 proteins, we selected (if possible) three proteins from each protein family, except for thiolase from which three proteins from each eukaryotic kingdom were selected (supplemental Table 1). Within the kingdoms, the chosen proteins originate from evolutionarily distant species such as fish, amphibians, and mammals from metazoa or monocotyledons and dicotyledons from plant species to cover the whole width of the respective kingdom.
Finally, the resulting set of 43 selected sequences was aligned according to their PTS2 nonapeptide motif together with the 15 preceding and 25 succeeding amino acids. The maximal pairwise sequence identity in the motif region was determined to be below 70%.
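The redundancy check on the motif region can be illustrated as follows. This is a minimal sketch assuming ungapped, equal-length motif segments and identity defined as the fraction of identical positions; the exact definition used for the 70% threshold is not stated here, and the function names are hypothetical.

```python
from itertools import combinations


def pairwise_identity(a: str, b: str) -> float:
    """Fraction of identical residues between two aligned, equal-length segments."""
    if len(a) != len(b):
        raise ValueError("segments must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def max_pairwise_identity(segments: list[str]) -> float:
    """Highest identity found among all pairs of segments in the set."""
    return max(pairwise_identity(a, b) for a, b in combinations(segments, 2))


# A set would pass the redundancy criterion if this value stays below 0.70.
example = ["RLQVVLGHL", "RLDVVLGHL", "KVAAAAAQA"]  # illustrative peptides only
print(max_pairwise_identity(example))
```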
A negative or background set was created to judge statistical significance of enrichment of amino acids in the PTS2 motif positions. It was derived by random selection of eukaryotic N termini out of the IPI proteomes (34) from Homo sapiens, Mus musculus, Rattus norvegicus, Danio rerio, Bos taurus, Gallus gallus, and Arabidopsis thaliana, after removing sequences with greater than 98% sequence identity from each proteome (with cd-hit (35)). To obtain more stable background frequencies, the negative set chosen was 10 times bigger than the positive set. Special care was taken so that the length distribution was identical in both sets to replicate the varying distances of PTS2 motifs from the N terminus.
Sequence Logo-The sequence logo in Fig. 2C was created with the twosamplelogo webserver (36). Only amino acids are shown at the respective positions that are over-represented in PTS2 motifs with a statistical significance of p < 0.005 (t test). The coloring is according to amino acid type. The height of amino acid letters and position columns in general are proportional to their level of enrichment.
Entropy Difference Analysis-Significance of positional amino acid restrictions (Fig. 2A) was further evaluated with randomized entropy difference analysis as implemented in the HCV database (65). In short, Shannon entropies were calculated for each position of a positive/query and a negative/background alignment (same sets as described above for the sequence logo). Next, using a Monte Carlo procedure, the two sets were mixed randomly with replacement, resulting in two random sets with the same sizes as the original two. Positions marked in red (Fig. 2A) are significant with a p value <0.001, i.e. the random sets obtained a higher entropy difference than the original sets in at most 1 of 1000 set randomizations.
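The randomization procedure can be sketched for a single alignment column as shown below. This is an illustrative reimplementation, not the code of the HCV database tool; the sign convention (background entropy minus query entropy), the function names, and the fixed random seed are assumptions.

```python
import math
import random
from collections import Counter


def shannon_entropy(column) -> float:
    """Shannon entropy (bits) of the residues observed in one alignment column."""
    counts = Counter(column)
    n = len(column)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def entropy_difference_pvalue(query_col, background_col, n_rand=1000, seed=0) -> float:
    """Fraction of randomizations whose entropy difference reaches the observed one."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    observed = shannon_entropy(background_col) - shannon_entropy(query_col)
    pooled = list(query_col) + list(background_col)
    hits = 0
    for _ in range(n_rand):
        # Mix both sets and redraw with replacement, keeping the original set sizes.
        rand_query = rng.choices(pooled, k=len(query_col))
        rand_background = rng.choices(pooled, k=len(background_col))
        if shannon_entropy(rand_background) - shannon_entropy(rand_query) >= observed:
            hits += 1
    return hits / n_rand


# Toy column: an invariant arginine in the query versus a uniform background.
query = "R" * 40
background = "ACDEFGHIKLMNPQRSTVWY" * 20
print(entropy_difference_pvalue(query, background))
```

With 1000 randomizations, a position would be reported at the p < 0.001 level only if at most one random pair of sets reaches the observed difference.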
Physical Property Single Position Deviation-The 20-dimensional vector of amino acid frequencies at each position of the alignment of 43 selected PTS2 motifs was tested for correlation with physical properties from a data base of roughly 700 parameter sets (37-39). The best correlating representatives of nonredundant physical properties for positive charge (40), negative charge (40), bulkiness (41), and aliphatic side chains (42) were selected. Next, the selected physical property parameters were normalized between 0 and 1, and the average was calculated for each position in the PTS2 motif alignment compared with the average of the same physical property in the UniRef50 data base (43). If the absolute value of the difference of the averages at one motif position is higher than twice the absolute value of the median of the physical property over all motif positions, the physical property deviation at the respective position is shown in Fig. 2B.
Physical Property Window Deviation-Averages of windows of physical properties (~700 parameter sets from the data base described above) with length 1-12 were evaluated for maximal deviation between the set of 43 selected PTS2 motifs and the UniRef50 data base (43). The influence of different window sizes is balanced by deriving the average not by dividing by the number of positions but by the square root of the number of positions. The resulting property window averages are ranked by their difference from the UniRef50 average, and among sets of redundant properties (with R-value >0.4), only the highest deviating instance is kept. In Fig. 3, we show the identified characteristic physical properties "normalized frequency of α-helix" (44), "flexibility parameter with no rigid neighbors" (45), and "information measure of coil" (46). Only deviations are shown that are consistently above or below the data base average for a window length of at least four positions.
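The effect of dividing by the square root of the window length, rather than by the length itself, can be illustrated with a short sketch. It assumes that the per-position property averages and a single background mean are already available and that the deviation from the background is what gets scaled; both the function names and this exact formulation are assumptions made for illustration.

```python
import math


def window_deviation(values, background_mean, start, length) -> float:
    """Summed deviation from the background, scaled by sqrt(window length)."""
    window = values[start:start + length]
    if len(window) != length:
        raise ValueError("window runs past the end of the profile")
    return sum(v - background_mean for v in window) / math.sqrt(length)


def best_window(values, background_mean, max_len=12):
    """Return (deviation, start, length) of the most strongly deviating window."""
    best = None
    for length in range(1, min(max_len, len(values)) + 1):
        for start in range(len(values) - length + 1):
            dev = window_deviation(values, background_mean, start, length)
            if best is None or abs(dev) > abs(best[0]):
                best = (dev, start, length)
    return best


# Toy profile: four consecutive positions deviate from a background mean of 0.
profile = [0, 0, 1, 1, 1, 1, 0, 0]
print(best_window(profile, 0.0))
```

With a plain average, every window lying inside the deviating stretch would score equally, down to a single position. The square-root scaling rewards longer consistent stretches, so the full four-position window is selected in this toy profile.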
Structural Modeling-The three-dimensional structure of PEX7_HUMAN was modeled according to multiple structural templates identified by the consensus structure prediction server three-dimensional jury (47) by using the stand-alone version of MODELLER (version 9.5) (48). The templates used are histone-binding protein RBBP7 (PDB code 3cfs, chain B) from H. sapiens and chromatin assembly factor 1 p55 subunit (PDB code 3c9c, chain A) from Drosophila melanogaster. The modeling process was performed in two steps.
Step 1 is the building of a three-dimensional structural model according to multiple templates. A dynamic programming-based structural alignment of the aforementioned templates and the amino acid sequence of PEX7_Human was performed by using the salign class of MODELLER (48), and then 100 structural models were built based on this alignment and the structures of the templates by using the automodel class of MODELLER. At the same time, the discrete optimized protein energy score of each model was calculated, and the one with the lowest energy was selected for further loop refinement. Step 2 is the loop refinement. According to the alignment, some amino acids in PEX7 correspond to gaps in the templates. These loop regions can be further refined by using the loopmodel class of MODELLER. Because the alignment is available, the refinement can be carried out automatically. During this process, 200 models were built; the discrete optimized protein energy score of each model was calculated, and the one with the lowest energy was selected as the final model for further amino acid conservation value mapping.
Mapping of Amino Acid Conservation Values-In total, 41 orthologue sequences of PEX7 were retrieved from the website of OMA (49), which were aligned together with PEX7_Human by using the multiple sequence alignment toolkit of MAFFT (L-INS-I settings) (50). The conservation values of each amino acid of PEX7_Human were calculated by using the method of real valued evolutionary trace (51), excluding positions with more than 30% gaps. Those values were then mapped to the B-factor column of the PDB file of the structural model built above. The structure and conservation mapping were then visualized in Yasara (52). To confirm that the observed conservation site in PEX7 is protein family-specific rather than fold-specific, we have repeated the procedure for the RBBP7 protein, which has the same fold as PEX7 but is from a different family and found a distinct pattern of conservation on the side that interacts with the histone helix (data not shown).
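Writing per-residue values into the B-factor column is a common way to color a structure by conservation. The sketch below relies only on the fixed-column PDB layout (residue number in columns 23-26, temperature factor in columns 61-66); the function name and the `scores` dictionary are hypothetical, and the conservation values themselves are assumed to have been computed elsewhere.

```python
def map_scores_to_bfactor(pdb_lines, scores, default=0.0):
    """Return PDB lines with the B-factor field replaced by per-residue scores.

    `scores` maps residue numbers to conservation values; residues without a
    value receive `default`.
    """
    mapped = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")) and len(line) >= 66:
            residue_number = int(line[22:26])      # columns 23-26
            value = scores.get(residue_number, default)
            # Columns 61-66 hold the temperature factor as a 6.2f field.
            line = f"{line[:60]}{value:6.2f}{line[66:]}"
        mapped.append(line)
    return mapped
```

Non-coordinate records are passed through unchanged, so the output remains a valid PDB file that molecular viewers can color by B-factor.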
Helix Docking-To evaluate possible binding conformations of the putative PTS2 helix, multiple orientations were tried through manual placement, and one tentative candidate orientation was chosen that satisfied the complementary pattern of charge and hydrophobicity. The complex of PEX7 with the PTS2 helix was then energy-minimized through short simulated annealing molecular dynamics simulations using the AMBER03 force field as implemented in Yasara (52).
PTS2 Signal in Silico Screening-The methods and detailed description of the basic PTS2 in silico screening algorithm used in this study are summarized in the supplemental material.

RESULTS
The experimentally verified consensus sequence for PTS2 signals is dominated by the characterizations of positions S1, S2, S3, and S4 in various species. In contrast, restrictions of the central five positions of the signal (X1-X5) are hardly understood and primarily extrapolated from amino acid frequencies in naturally occurring PTS2 signals. However, mammalian proteins are usually under-represented in such comparisons.
Mutational Analysis of Canonical PTS2 Signal-Thus, we performed a mutational analysis of these central five amino acids, using the human thiolase PTS2 as a model peptide. A reporter construct was generated, in which the first 30 amino acids of rat thiolase B were cloned in front of EGFP, and the PTS2 nonapeptide was flanked by two restriction sites (PstI and EcoRI) allowing the simple exchange of nonapeptides (Fig. 1A). When the plasmid encoding the reporter protein with the human thiolase PTS2 (RLQVVLGHL) was transfected into COS7 cells and the subcellular localization of EGFP was analyzed by immunofluorescence microscopy, we obtained a punctate staining pattern. EGFP was found to be colocalized with the peroxisomal membrane protein PMP70 (Fig. 1B), indicating peroxisomal targeting of the fusion protein. In contrast, when the reporter protein harbored an arbitrary tripeptide (-RSL) instead of a PTS2, EGFP was found evenly distributed across the cell, indicating a cytosolic and nuclear distribution (Fig. 1C). This proved that the import of the reporter protein was dependent on a functional PTS2. Using this reporter construct, we analyzed the effect of single amino acid substitutions of the central five amino acids (X1-X5) of human thiolase by either acidic (aspartate), basic (lysine or arginine), or bulky hydrophobic (leucine) amino acids (Table 1).
We found that the introduction of a negative charge (aspartate) at position X2 (VX2D) and X3 (VX3D) destroyed the PTS2, but it was well tolerated at positions X1 (QX1D), X4 (LX4D), and X5 (GX5D). Similarly, the introduction of a positive charge at position X3 (VX3K) destroyed the PTS2, but at all other positions the import was retained. Interestingly, the reporter protein was found in peroxisomes and mitochondria when the positive charge was introduced at position X2 (VX2K, Fig. 1, D and E) or X5 (GX5R, Fig. 1, F and G). This was indicated by colocalization with the mitochondrial marker protein ATPase. The introduction of the hydrophobic amino acid leucine at position X1 (QX1L) (Fig. 1H) and X5 (GX5L) did not destroy the PTS2, but the mutation QX1L introduced an additional ER targeting signal as indicated by colocalization with the ER marker protein-disulfide isomerase (Fig. 1I).
In summary, these experiments demonstrate that in an evolutionarily optimized PTS2 (human thiolase), the majority of mutations in the central five amino acids are well tolerated. However, charged residues at position X3 or a negatively charged residue at position X2 destroy the PTS2 signal, revealing new restrictions for functional PTS2 signals. Nonetheless, these restrictions are still too loose to explain the low number of functional PTS2 signals and the strong conservation of PTS2 signals across evolution.
Sequence and Physical Property Analysis of Naturally Occurring PTS2 Signals-To elucidate further restrictions for functional PTS2 signals, we performed a detailed sequence and physical property analysis of naturally occurring PTS2 signals. In contrast to previous investigations, we compensated for the over-representation of plant proteins, which contribute more than 50% of protein families harboring PTS2. Therefore, a positive set of PTS2-carrying proteins was compiled, in which each protein family and each phylum are represented by three evolutionarily distant proteins. Moreover, the deviation of amino acid frequencies found in sequences proximal to the N terminus compared with frequencies of overall proteins was taken into account (details on the selection of PTS2-carrying proteins and the determination of amino acid frequency are summarized under "Experimental Procedures"). When the extended PTS2 motif alignment was analyzed for the information density (Shannon entropy) (Fig. 2A) at each position, significant differences between the positive set comprising the PTS2-harboring sequences and the background set were found at all positions of the core PTS2 nonapeptide. Maximum differences were obtained at the characteristic positions of the consensus sequence (S1-S4) and at position X3. However, several positions outside of the core PTS2 signal also showed significant differences between the sets, suggesting further restrictions for PTS2 signals.
A similar pattern of positions, with significant differences between positive and background set, was found when the relative abundance of classes of amino acids sharing physical properties of their side chains such as charge or bulkiness was compared (Fig. 2B, upper part). Within canonical PTS2 sequences (Fig. 2B, lower part), the known characteristics of conserved positions are well reflected in that basic residues are found over-represented at S1 and S3 and large hydrophobic residues are preferred at positions S2 and S4. The properties of residues preferentially found at position X3, and to a lesser extent at X2, resemble those at positions S2 and S4. Minor but significant differences between positive and background sets were also found, namely an over-representation of basic residues at position X1 and an under-representation of acidic residues at positions X2, X3, and X4 and of bulky hydrophobic residues at position X5. Furthermore, significant deviations of the positive set were again found at several positions outside of the core PTS2, but information density and amino acid properties together were significantly different only at positions y3 (aliphatic), y1 (basic), and z2 (aliphatic).
The relative enrichment of individual amino acids within the PTS2 motif alignment also reflects the canonical consensus sequence at positions S1, S2, S3, and S4 (Fig. 2C), except for a lack of lysine over-representation at position S1. At position X3, the bulky aliphatic residues leucine and isoleucine were found over-represented. Moreover, minor over-representations were found at many other positions inside and outside the PTS2, but only the preferences for alanine at position y3, for arginine at position y1, and for proline at position z2 coincide with preferences at other levels of conservation. These results corroborate the experimental evidence for the importance of position X3, but suggest further restrictions for PTS2 signals, which act within and outside the core PTS2.
Helical Structure of PTS2 Motifs-Next, we considered the contribution of position X3 to functional signals, because there the preference for large and hydrophobic residues coincides with the inactivating effect of charged amino acids. As X3 is separated from the two other hydrophobic amino acids (S2 and S4) by two and three amino acids, respectively, the hydrophobic residues S2, X3, and S4 can be aligned on one side of an α-helix with seven amino acids per two turns. Moreover, the two basic amino acids (S1 and S3) would align alongside the helix leading to a positive flank. A model of such an α-helix with the charge distribution pattern of the human thiolase PTS2 is depicted in a top (Fig. 3A) or frontal (Fig. 3B) view.
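The geometric argument can be checked with a simple helical wheel calculation. The sketch below assumes an idealized helix with exactly seven residues per two turns (720°/7 per residue) and measures angles relative to S1; the names `POSITIONS`, `wheel_angle`, and `wheel` are hypothetical.

```python
POSITIONS = ["S1", "S2", "X1", "X2", "X3", "X4", "X5", "S3", "S4"]


def wheel_angle(index: int) -> float:
    """Angle (degrees) of the residue at 0-based `index` on the helical wheel.

    Seven residues span two turns (720 degrees); integer arithmetic before the
    final division avoids floating-point drift after full turns.
    """
    return ((index * 720) % 2520) / 7


def wheel(nonapeptide: str) -> dict:
    """Map each motif position to its residue and wheel angle."""
    return {
        name: (residue, wheel_angle(i))
        for i, (name, residue) in enumerate(zip(POSITIONS, nonapeptide))
    }


for name, (residue, angle) in wheel("RLQVVLGHL").items():
    print(f"{name} {residue} {angle:6.1f}")
```

In this idealized geometry S1 and S3 fall on the same angle, S2 and S4 coincide roughly 103° away, and X3 lies between the two pairs, so all five conserved positions occupy one face of the helix while X1, X2, X4, and X5 point elsewhere.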
When the sequences harboring PTS2 signals (positive data set) were analyzed for their probability to contain α-helical structures (Fig. 3C), we found amino acids supporting the formation of α-helices to be over-represented between positions y2 and S3 (green line), whereas the flanking regions are rich in amino acids mediating a high flexibility (blue line) or the formation of coiled structures (orange line) rather than regular structures. Thus, naturally occurring PTS2 signals are probably represented by an α-helix, which is flanked by unstructured regions. To corroborate the hypothesis that the ability to form an α-helical structure is necessary for a functional PTS2 signal, the helix-breaking amino acid proline was introduced at the least conserved position, X4, of the human thiolase PTS2 within the reporter construct. When this plasmid was transfected into COS7 cells, the protein was found diffusely distributed across the cell, indicating that this mutation destroyed the PTS2 signal. In contrast, other mutations at the same position (basic, acidic, large, or small residues) did not interfere with peroxisomal import (Fig. 3D).
Interaction between PTS2 and Its Receptor PEX7-Provided that the PTS2 forms a well defined α-helical structure with a conserved charge distribution, the cognate receptor PEX7 should recognize this signal by a complementary domain on its surface. Thus, a three-dimensional homology-based model of human PEX7 was generated as described in detail under "Experimental Procedures." Because the structure of PTS2 appears conserved across evolution, the complementary PTS2-binding domain of PEX7 should behave similarly. Thus, the evolutionarily conserved amino acids on the surface of the PEX7 model were labeled by a color code (Fig. 4A), whereas the less conserved sequences are indicated in gray. The most conserved region of PEX7 is a cluster on top of the propeller-like structure of the WD-40 domain (Fig. 4A), whereas other areas appear less conserved as illustrated in side view (Fig. 4B) or a view from the bottom of this structure (Fig. 4C). This conserved area forms a groove-like structure, in which the PTS2 helix can be embedded (Fig. 4B). Furthermore, the surface charge pattern of this groove appears complementary to that of the hypothetical PTS2 helix (Fig. 4A). Interestingly, when mutations in PEX7 occurring in RCDP1 patients were entered into this model of PEX7, the majority of missense mutations, which still result in a stable protein, are located at the conserved part of the protein (indicated as spheres in the side view of Fig. 4D and the top view of Fig. 4E).
Table 1 footnotes: − indicates no import of the reporter protein harboring the human thiolase PTS2; boldface type indicates residues that were introduced into the human thiolase PTS2 within the reporter protein context.
To test this model, we investigated the interaction between the PTS2 and PEX7 in more detail. Two glutamate residues of PEX7 (Glu-113 and Glu-200) are predicted to lie in close proximity to R(S1) and H(S3) of the PTS2 (Fig. 4A and schematically depicted in Fig. 5A) and should contribute to the interaction between signal and receptor. A third glutamate residue (Glu-287), which resembles the other glutamates with respect to its position in the WD-40 domain and its conservation, but appears remote from the bound PTS2, served as a control. When the interaction between human PEX7 and PTS2 thiolase -EGFP was measured in a mammalian two-hybrid assay, we found that this interaction caused a strong and specific signal in the luciferase reporter activity (Fig. 5B). However, when the glutamate residues Glu-113 and Glu-200 of PEX7 were substituted by arginine, the interaction between PTS2 and these PEX7 variants was no longer detectable. In contrast, the substitution of Glu-287 retained most of the interaction between the PTS2 and PEX7, indicating that such mutations are compatible with a functional WD-40 structure. Next, the functional consequences of these mutations were tested by the restoration of PTS2-mediated import in cultured human fibroblasts of an RCDP1 patient lacking functional PEX7. To this end, these fibroblasts were cotransfected with expression plasmids for PTS2 thiolase -EGFP together with either the empty vector (Fig. 5C) or with normal human PEX7 carrying an N-terminal Myc tag (Fig. 5D) or with mutated variants thereof (E113R in Fig. 5E; E200R in Fig. 5F). As expected, PTS2 thiolase -EGFP was found evenly distributed across RCDP1 fibroblasts (Fig. 5C) but colocalized with PMP70 upon coexpression of myc-hPEX7 (Fig. 5D). myc-hPEX7 carrying the mutation E113R was not able to compensate for PEX7 deficiency (Fig. 5E), but cotransfection with myc-hPEX7 (E200R) caused a punctate staining against a cytosolic background (Fig. 5F), suggesting that the latter mutation can still partially complement PEX7 deficiency.
To further corroborate the close proximity between Glu-200 of PEX7 and histidine at position S3 of the PTS2, a cross-complementation experiment was performed. When the histidine at position S3 of PTS2 thiolase -EGFP was substituted by glutamate (HS3E), the PTS2 signal was destroyed (Fig. 5G), and this effect was compensated neither by overexpression of myc-hPEX7 (Fig. 5H) nor by myc-hPEX7 (E113R) (Fig. 5I). However, when PTS2 thiolase -EGFP (HS3E) was coexpressed with myc-hPEX7 (E200R), the reporter protein was found in peroxisomes (Fig. 5J), indicating that the mutation HS3E is specifically compensated by E200R. Western blot analysis of protein extracts from similarly transfected COS7 cells demonstrated comparable levels of PTS2 thiolase -EGFP-HS3E and of the myc-hPEX7 variants (Fig. 5K). Together, these results support the three-dimensional model of the interaction between PTS2 thiolase and PEX7 as illustrated in Fig. 4.
PTS2 in Silico Screening Algorithm-Provided that the helical structure followed by a flexible domain is an important characteristic of PTS2 signals, these parameters could serve to categorize peptides that fulfill the minimal consensus sequence of  PTS2 signals. Thus, an in silico screening algorithm was developed, which evaluates the N-terminal 40 amino acids of putative PTS2-carrying proteins based on the following: (i) comparison of the core nonapeptide to amino acid frequencies found at each position of naturally occurring PTS2 signals; (ii) restrictions deduced from the mutational analysis of the central five positions (X 1 -X 5 ) of human thiolase PTS2; (iii) evaluation of the helical propensity of the putative PTS2 signal; and (iv) the presence of an unstructured domain C terminus to the PTS2 signal (see supplemental material). This algorithm was used to evaluate the N termini of all human proteins (ϳ39,000 human RefSeq sequences from the NCBI GenBank TM (30)), which were then ranked according to their prediction score. After the exemption of all transmembrane proteins, which should not be substrates for PTS2-mediated protein transport, a list of promising candidates was obtained (30 top candidates except proteins included in the learning set are listed in Table 2). Fourteen of these candidates were chosen for further investigation (boldface in Table 2) to evaluate the reliability of our algorithm. First, the minimal PTS2 signals encoded in these proteins were tested for their ability to mediate peroxisomal targeting in the context of the reporter protein. We found that three peptides encoded in the proteins KChIP4 (potassium channel interacting protein 4), GLOXD1 (glyoxalase domain containing protein 1), and TGF␤2 (transforming growth factor ␤2), respectively, acted as functional PTS2 signals and transported EGFP selectively into peroxisomes (Fig. 6, A-C). 
The peptide encoded in RAI17 (retinoic acid-induced protein/ZIMP10) acted as a PTS2 but also as a mitochondrial targeting signal, because the reporter protein was found to colocalize with PMP70 (Fig. 6D) and with the mitochondrial marker ATPase (data not shown). The other 10 peptides investigated were not able to target the reporter protein to peroxisomes and thus do not represent PTS2 signals. Thus, roughly 29% (4 of 14) of the chosen candidate proteins actually harbor a functional PTS2 signal.
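The screening criteria listed above can be illustrated with a minimal sketch. It is not the published algorithm, whose details are given only in the supplemental material. The sketch assumes the widely cited minimal consensus (R/K)(L/V/I)X5(H/Q)(L/A), reads the nonapeptide as S1 S2 X1 X2 X3 X4 X5 S3 S4, and omits criterion (i) because the position-specific frequency tables are not reproduced in this section. The helix check (no proline in the core) and the linker check (fraction of residues from a commonly used set of disorder-promoting amino acids) are crude stand-ins for real predictors. The function name, parameters, residue sets, and the toy sequence are all hypothetical.

```python
import re

# ASSUMPTION: minimal PTS2 consensus (R/K)(L/V/I)X5(H/Q)(L/A), with the core
# nonapeptide read as S1 S2 X1 X2 X3 X4 X5 S3 S4. The lookahead allows
# overlapping matches to be reported.
CONSENSUS = re.compile(r"(?=([RK][LVI].{5}[HQ][LA]))")

ACIDIC = set("DE")
CHARGED = set("DEKR")
# ASSUMPTION: crude stand-in for a disorder predictor (residues often
# described as disorder-promoting); not taken from the paper.
DISORDER_PRONE = set("ARGQSPEK")


def screen_n_terminus(seq, window=40, linker_len=15):
    """Return one report per consensus match within the first `window` residues."""
    reports = []
    for match in CONSENSUS.finditer(seq[:window]):
        start, core = match.start(), match.group(1)
        x2, x3, x4 = core[3], core[4], core[5]
        linker = seq[start + 9:start + 9 + linker_len]
        if linker:
            disorder = sum(aa in DISORDER_PRONE for aa in linker) / len(linker)
        else:
            disorder = 0.0
        reports.append({
            "start": start,
            "core": core,
            # (ii) substitutions described in the text as inactivating the
            # thiolase PTS2: acidic X2, charged X3, proline at X4
            "restrictions_ok": x2 not in ACIDIC and x3 not in CHARGED and x4 != "P",
            # (iii) stand-in for helical propensity: no proline in the core
            "helix_ok": "P" not in core,
            # (iv) fraction of disorder-prone residues following the core
            "linker_disorder": disorder,
        })
    return reports


# Hypothetical toy N terminus. The core nonapeptide is chosen to be consistent
# with the V(X2)K and G(X5)R substitutions mentioned in the text; the flanking
# residues are invented for illustration.
TOY = "MQRLQVVLGHLSGPADSGSPQAAPSGSGAPQASAADVVVV"

for report in screen_n_terminus(TOY):
    print(report)
```

A real implementation would replace the boolean checks with graded scores and combine them into a ranking, as described for the prediction score above; the weights needed for that are not stated in this section and are therefore left out.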
To investigate whether the identified PTS2 signals are functionally active in their native protein context, the subcellular distribution of KChIP4, GLOXD1, TGFβ2, and RAI17 was investigated when expressed as EGFP-tagged full-length proteins. We found that KChIP4-EGFP (Fig. 6E) selectively colocalized with PMP70, indicating a peroxisomal localization of the fusion protein. TGFβ2-EGFP was mainly found colocalized with the ER marker protein-disulfide isomerase, although additional peroxisomal targeting was observed in some cells (data not shown). GLOXD1-EGFP was found to colocalize with the mitochondrial marker MnSOD, and RAI17-EGFP was found in the cytosol and the nucleus of cells (data not shown). The overall summary of our investigation is depicted in Table 3.

DISCUSSION
Since its initial description, the PTS2 has attracted less attention than PTS1. Although major determinants for PTS2 motifs have been elucidated previously, the consensus sequence for this targeting signal was too loose to explain the low number of functional PTS2 signals. Here, important new properties of mammalian PTS2 signals were elucidated, and their binding site on the receptor PEX7 was identified.
The mutational analysis of the central five amino acids of a human thiolase PTS2 identified functional restrictions that exclude specific residues at individual positions of the motif. In contrast, the detailed sequence and physical property analysis of available PTS2 signals reveals the optimal shaping of PTS2 signals by evolutionary adaptation processes.
FIGURE 5. Interaction between PEX7 and PTS2. A, schematic representation of the interaction face between PEX7 (green) and PTS2 (orange) as suggested by the three-dimensional model. B, mammalian two-hybrid assay. COS7 cells were cotransfected with plasmids encoding GAL4 DBD-PEX7 or variants thereof (E113R, E200R, and E287R) and VP16-AD-PTS2 thiolase-EGFP fusion proteins or the empty vector together with a plasmid encoding the UAS GAL4-luciferase reporter and a plasmid expressing β-galactosidase for normalization. C-F, immunofluorescence microscopy of human RCDP1 fibroblasts lacking functional PEX7 after cotransfection of expression plasmids for PTS2 thiolase-EGFP and an empty vector (EV) (C), or for myc-hPEX7 (D), myc-hPEX7 (E113R) (E), or myc-hPEX7 (E200R) (F) using α-EGFP (green) and α-PMP70 (red) antibodies. G-J, immunofluorescence microscopy of COS7 cells that were cotransfected with the reporter plasmid PTS2 thiolase-EGFP encoding the mutation HS3E together with either an empty vector (G), myc-hPEX7 (H), myc-hPEX7 (E113R) (I), or myc-hPEX7 (E200R) (J). K, Western blot analysis of protein extracts from COS7 cells cotransfected with PTS2 thiolase-EGFP (HS3E) together with Myc-tagged versions of PEX7 (myc-hPEX7), myc-hPEX7 (E113R), or myc-hPEX7 (E200R) or an empty plasmid using α-EGFP and α-Myc antibodies. Labeling with α-β-actin served as loading control. Scale bars, 50 μm for C-F and 20 μm for G-J.
We can demonstrate experimentally that bulky aliphatic amino acids are not only preferred at position X3 but are essential for a functional PTS2, as the conversion into a charged residue inactivates the signal. In contrast, at position X2, lysine is well tolerated, whereas aspartate inactivates the signal although the amino acid preference resembles that of position X3.
However, basic residues at position X2 generate an additional mitochondrial targeting signal, suggesting that the avoidance of a competing targeting signal is an additional reason for the under-representation of amino acid classes at individual positions. Similarly, basic residues at X5 (arginine) can promote additional mitochondrial targeting, and large aliphatic residues (leucine) at X1 can generate an additional targeting signal for the ER. In addition to the three type-changing substitutions that inactivate the PTS2, sequence alignment of known PTS2 motifs suggested further restrictions that are reflected by over- and under-representation of specific types of amino acids (e.g. bulky residues are significantly underrepresented at position X5, but the large hydrophobic amino acid leucine is well tolerated). It remains to be emphasized that our mutational analysis was performed in a reporter system that exposed the PTS2 at the extreme N terminus and thus might confer fewer restrictions than PTS2 signals located further away from the N terminus. Moreover, our investigations revealed a bipartite structural motif, which appears conserved across all PTS2 signals, namely a helical structure supposed to interact with PEX7 and an unstructured region connecting the core protein with the actual PEX7-interacting sequence segment.
The helical structure is indicated by the over-representation of amino acids supporting the formation of α-helices and the absence of the helix-breaking amino acid proline in naturally occurring PTS2 signals. Furthermore, the PTS2 signal of human thiolase is sensitive to the insertion of proline at position X4, whereas large, small, acidic, and basic amino acids are well tolerated. In a helical structure, the two basic residues (S1 and S3) as well as the two large hydrophobic residues (S2 and S4) are aligned on one side of the helix, separated by two turns (Fig. 3, A and B). Together with the important position X3, they comprise one side of the helix with a hydrophobic area aligned by positive charges. The other side of the helix appears less conserved, and functional PTS2 signals are compatible with all amino acid classes at positions X1, X2, X4, and X5, except for a negative charge at position X2. However, in an α-helix, X2 is positioned in close proximity to the basic residues S1 and S3, and a negative charge could neutralize one of these charges and thereby inactivate the PTS2. The helical structure of PTS2 signals resembles mitochondrial targeting signals, which have been described as positively charged amphipathic helices (53). This similarity is supported by our observation that single point mutations in PTS2 thiolase such as V(X2)K or G(X5)R can generate a mitochondrial targeting signal without affecting peroxisomal targeting. These mutations alter the side of the helix that is not required for peroxisomal targeting, whereas in rat thiolase PTS2 the substitution of histidine (S3) by basic amino acids also generated mitochondrial targeting but destroyed the PTS2 (5).
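The geometric argument above can be checked with a simple helical-wheel calculation. The sketch below assumes an ideal α-helix (3.6 residues per turn, i.e. 100° of rotation per residue) and assumes that the nine core positions are ordered S1 S2 X1 X2 X3 X4 X5 S3 S4; both are simplifications and not the three-dimensional model itself. The function and variable names are illustrative.

```python
# Ideal alpha-helix: 3.6 residues per turn, i.e. 100 degrees per residue.
DEGREES_PER_RESIDUE = 100

# ASSUMPTION: order of the nine core positions (S = conserved, X = variable).
POSITIONS = ["S1", "S2", "X1", "X2", "X3", "X4", "X5", "S3", "S4"]
# Positions the text places on the conserved face of the helix.
CONSERVED = ["S1", "S2", "X3", "S3", "S4"]


def wheel_angles(positions=POSITIONS, step=DEGREES_PER_RESIDUE):
    """Angle of each position on a helical wheel, with the first position at 0."""
    return {name: (i * step) % 360 for i, name in enumerate(positions)}


def angular_distance(a, b):
    """Smallest angle in degrees between two directions on the wheel."""
    d = abs(a - b) % 360
    return min(d, 360 - d)


angles = wheel_angles()
for name in POSITIONS:
    face = "conserved" if name in CONSERVED else "variable"
    print(f"{name}: {angles[name]:3d} degrees ({face})")
```

Under these assumptions S1, X3, S4, S2, and S3 all fall within a 120° arc of the wheel, whereas X1, X4, and X5 point to the opposite side. S1 and S3 are seven residues apart, which corresponds to almost exactly two turns, and X2 lies only 40° from S3 and 60° from S1, in line with the suggestion that a negative charge at X2 could neutralize one of the neighboring basic residues.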
As a helical PTS2 motif exposes all highly conserved residues on one face of the helix, the interaction with the receptor PEX7 should involve this side of the helix. Accordingly, the newly generated three-dimensional model of PEX7 revealed a groove on the most conserved surface area, which shows a charge distribution complementary to the conserved side of the PTS2 helix. Moreover, many missense mutations in PEX7 identified in patients suffering from RCDP1 (54) affect this region.
Overall, our refined three-dimensional model of PEX7 resembles structures suggested previously (29,55), but it identified conserved residues that appear concentrated on one side of the WD-40 structure and allowed the prediction of residues that contribute to the interaction. Two conserved glutamic acid residues of PEX7 (Glu-113 and Glu-200) are proposed to interact with arginine S1 and histidine S3, respectively. Substitution of each of these glutamates by arginine reduced the interaction of PEX7 with PTS2 thiolase below the detection level of the mammalian two-hybrid assay, whereas another glutamate to arginine mutation at a similar position in the WD-40 domain largely retained the interaction. Moreover, the mutation E113R in myc-hPEX7 destroyed its ability to complement PEX7 deficiency in RCDP1 fibroblasts, and the mutation E200R can restore the import of a PTS2 thiolase variant harboring the reciprocal charge exchange HS3E, which normally inactivates the PTS2. Thus, we consider the predicted model of interaction highly probable despite the surprising finding that myc-hPEX7 (E200R) partially complements PEX7 deficiency. The latter could be due to residual binding between PTS2 thiolase and PEX7-E200R, which was below the detection limit of the two-hybrid assay. In the two-hybrid system, the PTS2 is fused to the VP16 activation domain, which dislocates the PTS2 from the extreme N terminus, and this might render the strength of the interaction with PEX7 more sensitive to mutations. Our prediction positions the PTS2 helix horizontally in the groove on top of the WD-40 domain of PEX7 and thus differs from the insertion of a linear unfolded peptide into the channel in the middle of the propeller as suggested previously (55).
This orientation is supported by docking experiments and by the fact that the PTS2 signal of some proteins appears up to 37 amino acids away from the start (human alkylglycerone-phosphate synthase), which renders a linear insertion of the N terminus before the recognition of the PTS2 signal less probable.
The exposure of the PTS2 signal away from the bulk of the protein was concluded from predictions that indicate an unstructured linker region, as noticed between C-terminal PTS1 motifs and the core proteins. Our computational analysis is in agreement with three-dimensional structures of naturally occurring proteins: in the available X-ray structures of human acyl-CoA thiolase (PDB code 2IIK) and human phytanoyl-CoA hydroxylase (PDB code 2A1X), the N-terminal sequences, including the putative linker regions, were not resolved, suggesting that these sequences are not sufficiently structured. Such a linker has been shown to be of functional importance for the exposure of the PTS1 signal (56,57), but in the case of PTS2 the linker domain should also contain the cleavage site for the processing peptidase.
We are confident that the identified criteria are relevant for typical PTS2 signals because their implementation into an in silico screening algorithm allowed the generation of a preliminary PTS2 prediction program, which led to a hit rate of 4 out of 14 when testing a list of PTS2 signals with a high PTS2 score derived from the whole human proteome. Moreover, KChIP4 (58) was imported into peroxisomes when expressed as EGFP-tagged full-length protein, and thus the algorithm led to the identification of a novel peroxisomal protein. EGFP-tagged full-length TGFβ2 (59) was found predominantly in the ER, GLOXD1 in mitochondria, and RAI17 in the nucleus and cytosol as described recently (60), suggesting that the targeting signals for these organelles can over-rule PTS2 as previously demonstrated for PTS1 signals (56). Alternatively, the lack of peroxisomal targeting despite a functional PTS2 might be due to a modulating influence of the amino acids directly surrounding the PTS2.
The newly identified peroxisomal protein KChIP4 (Kv-channel interacting protein 4) was first described to interact with a potassium channel and presenilin (60). However, the protein appears in various splice variants, some of which have been described in the cytosol or at the plasma membrane (61,62). The subcellular localization of the variant analyzed in this study has not been investigated, although its expression is well described. KChIP4 belongs to a family of proteins harboring the structural motif of an EF-hand, which mediates structural changes upon Ca2+ binding (63). The expression profile of KChIP4 (64) shows brain selectivity, which explains the absence of the protein from peroxisomal fractions analyzed by proteomic approaches.
In summary, this investigation refines structural requirements for functional PTS2 signals and suggests a model for the interaction with the receptor PEX7. These criteria allowed the identification of four functional PTS2 signals encoded in human proteins and a novel peroxisomal protein.