* This research was supported in part by an American Association for Cancer Research-Fox Chase Cancer Center Career Development Award in Translational Cancer Research, by Department of Defense Neurofibromatosis Research Program Grant W81XWH-05-1-0200, and by a grant from the Pennsylvania Department of Health (all to J. R. P.). Additional support was provided by the National Institutes of Health Grant CA006927 and by an appropriation from the Commonwealth of Pennsylvania to Fox Chase Cancer Center. The Structural Genomics Consortium is a registered charity (number 1097737) funded by the Wellcome Trust, GlaxoSmithKline, Genome Canada, the Canadian Institutes of Health Research, the Ontario Innovation Trust, the Ontario Research and Development Challenge Fund, the Canadian Foundation for Innovation, VINNOVA, the Knut and Alice Wallenberg Foundation, the Swedish Foundation for Strategic Research, and Karolinska Institutet. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact. The on-line version of this article (available at http://www.jbc.org) contains supplemental Tables 1 and 2 and Fig. 1. 1 Supported by funding from NCI, National Institutes of Health (NIH), Grant T32 CA009035. 2 Present address: Division of Biology, Kansas State University, Manhattan, KS 66502.
The p21-activated kinases (Paks) serve as effectors of the Rho family GTPases Rac and Cdc42. The six human Paks are divided into two groups based on sequence similarity. Group I Paks (Pak1 to -3) phosphorylate a number of substrates linking this group to regulation of the cytoskeleton and both proliferative and anti-apoptotic signaling. Group II Paks (Pak4 to -6) are thought to play distinct functional roles, yet their few known substrates are also targeted by Group I Paks. To determine if the two groups recognize distinct target sequences, we used a degenerate peptide library method to comprehensively characterize the consensus phosphorylation motifs of Group I and II Paks. We find that Pak1 and Pak2 exhibit virtually identical substrate specificity that is distinct from that of Pak4. Based on structural comparisons and mutagenesis, we identified two key amino acid residues that mediate the distinct specificities of Group I and II Paks and suggest a structural basis for these differences. These results implicate, for the first time, residues from the small lobe of a kinase in substrate selectivity. Finally, we utilized the Pak1 consensus motif to predict a novel Pak1 phosphorylation site in Pix (Pak-interactive exchange factor) and demonstrate that Pak1 phosphorylates this site both in vitro and in cultured cells. Collectively, these results elucidate the specificity of Pak kinases and illustrate a general method for the identification of novel sites phosphorylated by Paks.
The abbreviations used are: Pak, p21-activated kinase; GST, glutathione S-transferase; PSSM, position-specific scoring matrix; OPS, optimal Pak substrate; PKA, protein kinase A; ROC, receiver operating characteristic; AUC, area under the ROC curve; GTPγS, guanosine 5′-3-O-(thio)triphosphate.
The p21-activated kinases (Paks) interact with active forms of the 21-kDa molecular mass Rho-related GTPases Rac and Cdc42. Paks can be divided into Group I (Pak1 to -3) and the more recently discovered Group II (Pak4 to -6), based on sequence similarity and regulatory mechanism (
). Whereas the catalytic activity of the Group I Paks is markedly up-regulated upon binding to GTP-bound Rac or Cdc42, the Group II Paks interact with but are not demonstrably activated by Rac or Cdc42.
Substantial data link the expression and hyperactivity of Paks to tumorigenesis and metastasis (
). These observations have led to a great interest in understanding the effector pathways downstream of Paks that mediate tumorigenesis and metastasis.
Over 30 direct substrates of Group I Paks have been identified, outlining major functional roles in cytoskeletal regulation, survival signaling, cell cycle progression, and mitogen-activated protein kinase pathway activation (reviewed in Ref.
; see also supplemental Table 2). Pak1 and -2 are the most distantly related Group I Paks yet share 91% sequence identity within their kinase domains, suggesting that they may recognize similar substrates. Indeed, Pak1 and -2 phosphorylate a number of common substrates, including Bad, Raf, Mek, and Merlin.
For the Group II Paks, much less is known regarding regulation of kinase activity, identity of substrates, or function. Similar to Group I Paks, studies have implicated Group II in survival signaling, mitogen signaling, and cell motility (
). The kinase domain of Pak4 shares only 53-55% sequence identity with Group I Paks. The sequence divergence between the kinase domains of the two Pak groups suggests that they may recognize distinct substrates and serve at least partially divergent functions (
). Thus, an important outstanding question is to what degree these two groups recognize similar substrates and are functionally redundant.
Previous studies have generally investigated Pak kinase substrate specificity in an ad hoc manner based on testing the impact of amino acid substitutions within known Pak protein and peptide substrates. Such studies, however, are generally not comprehensive and can incur bias from the particular sequence context. For example, using mutagenesis, King et al. (
) identified important sequence determinants for Pak3 phosphorylation of Raf-1 at Ser338, but the importance of these specific residues in other sequence contexts is unknown. Similar work analyzing Pak2 phosphorylation of synthetic peptides derived from the Rous sarcoma virus nucleocapsid NC protein has elucidated recognition determinants in this context (
). Unfortunately, whereas the Raf-1 study focused primarily on amino acids C-terminal to the targeted serine, the NC study mainly investigated residues N-terminal to the phosphorylation site, making it difficult to determine if the features identified in each study are context-dependent or independent. However, both studies did identify an arginine residue two amino acids N-terminal to the phosphoacceptor site (the -2-position) as a critical determinant for Pak recognition. Recently Shaw and co-workers (
) confirmed this finding using an alternative approach employing degenerate peptide mixtures and observed a strong bias for peptides containing arginine at the -2-position and lesser preferences for arginine at the -3- and -4-positions.
We report here the application of a recently described positional scanning peptide library approach to determine the complete, context-independent substrate sequence preferences of members of both the Group I and Group II Paks and apply that information to identify a novel Pak1 phosphorylation site in Pix. Although several serine/threonine kinases have been analyzed with this method (
), it has not yet been applied to a member of the Ste20-related kinase family, and no comprehensive peptide library analysis has yet been conducted for a Pak kinase.
Cell Culture—HEK293 cells were cultured in Dulbecco's modified Eagle's medium supplemented with 10% heat-inactivated fetal calf serum, 2 mM L-glutamine, and 100 units/ml penicillin/streptomycin.
Antibodies—A phosphopeptide and corresponding unphosphorylated peptide were synthesized according to the sequence surrounding βPix serine 340 (acetyl-CLSASPRMS(PO3)GFI-CONH2; Synpep Corp.). The anti-phosphoserine 340 βPix antibody was generated by immunizing rabbits with the phosphorylated peptide (Covance). Phosphospecific Pix antibodies were isolated in our laboratory by negative selection on immobilized, unphosphorylated peptide followed by positive affinity selection on immobilized immunogen. Antibodies eluting in 100 mM glycine, pH 2.5, were recovered and dialyzed against phosphate-buffered saline. Bovine serum albumin (2 mg/ml final concentration) and glycerol (50% final concentration) were added, and the purified antibodies were stored at -20 °C. The remaining antibodies used were anti-βPix (polyclonal; Chemicon), phospho-PAK1/2 (Thr423/Thr402) (polyclonal; Cell Signaling), and anti-Myc 9B11 antibody (monoclonal; Cell Signaling).
Plasmids—pcDNA3-Myc-βPix was obtained from R. Cerione (Cornell). pJ3H-Pak1 K299R and pJ3H-Pak1 T423E were prepared by subcloning Pak1 from pJ3M-Pak1 using BamHI/EcoRI restriction sites. Pak2 was cloned as a BamHI/XhoI fragment into pET28. A cDNA for human αPix obtained from the Kazusa DNA Research Institute (KIAA0006) was used as a PCR template to generate a BamHI/EcoRI fragment encoding amino acids 155-546 of human αPix (MTEN... LNRL) that was cloned into pGEX-6P-1 (GE Healthcare) for expression as a GST fusion protein. The GST-Pak4 (amino acids 209-501) expression construct was previously reported (
). Mutagenesis was conducted using the QuikChange site-directed mutagenesis kit (Stratagene) using the following primers (mutated nucleotides in boldface type): βPix S340A, forward (5′-CTG CCA GTC CTA GGA TGGCTG GCT TTA TCT ATC AGG-3′) and reverse (5′-CCT GAT AGA TAA AGC CAGCCA TCC TAG GAC TGG CAG-3′); αPix S488A, forward (5′-CTG CAA GTC CTC GGA TGGCTG GCT TTA TCT ATC AGG G-3′) and reverse (5′-CCC TGA TAG ATA AAG CCA GCC ATC CGA GGA CTT GCA G-3′); Pak2-QR, forward (5′-CAG AAA CAG CAAAGG AAG GAG CTC ATC ATT AAC G-3′) and reverse (5′-CGT TAA TGA TGAGCT CCT TCC TTT GCT GTT TCT G-3′); Pak4-PK, forward (5′-GCG CAA GCA GCC GAA GCG CGA GCT CCT CTT CAA CG-3′) and reverse (5′-CGT TGA AGA GGA GCT CGC GCT TCG GCT GCT TGC GC-3′).
All mutant constructs were fully sequenced. For expression in Escherichia coli, GST-βPix wild type and GST-βPix S340A were subcloned from pcDNA3-Myc-βPix into pGEX-6P-1 using BamHI/EcoRI.
Sequence Logos—Position-specific scoring matrix (PSSM) sequence logos were generated manually in Adobe Illustrator using the Arial black font and scaling letters by the absolute value of the log2 of the raw selectivity score.
Determination of Pak Phosphorylation Specificity—Peptide library screens were carried out as described previously with minor modifications (
). Briefly, a series of 198 partially degenerate peptides with the general sequence YAXXXXX(S/T)XXXXAGKK(biotin) was employed, in which S/T indicates an even mixture of Ser and Thr, and all positions X except one are a degenerate mixture of the 17 amino acid residues excluding Ser, Thr, and Cys. Each individual peptide bears one of 22 amino acid residues (all unmodified residues, plus phosphothreonine and phosphotyrosine) fixed at one of the X positions. Reactions were carried out in sealed multiwell plates for 2 h at 30 °C in a buffer containing 50 mM HEPES, pH 7.4, 12.5 mM NaCl, 1.5 mM MgCl2, 1.5 mM MnCl2, 0.1% Tween 20 with 50 μM [γ-32P]ATP or [33P]ATP at 0.3 μCi/μl, 50 μM peptide substrate, and the kinase of interest. Pak1 reactions also included 1.3 μM GTPγS-charged Cdc42. Aliquots of the reactions were spotted onto streptavidin membranes and washed as described previously (
). Incorporation of radioactivity into peptides was quantified by exposure to a phosphor screen and analysis using ImageQuant software. PSSMs were generated from background-subtracted data that were normalized as described (
). Data reflecting the average selectivity values from at least two separate runs are shown. To determine the phosphoacceptor residue specificity, similar peptides bearing the sequence YAXXXXXZXXXXAGKK(biotin) were used in which all X positions were degenerate, and the Z residue was either Ser, Thr, or Tyr. Reactions were carried out in microcentrifuge tubes as described above for the peptide library screens. Aliquots (2 μl) were spotted onto the streptavidin membrane, which was washed and quantified as above.
In Vitro Kinase Assays with Optimal Pak Substrates (OPS)—Kinase assays were performed at 30 °C in 1× phospho buffer (50 mM HEPES, pH 7.5, 12.5 mM NaCl, 0.625 mM MgCl2, 0.625 mM MnCl2). For reaction rate determinations, Pak2 (10 nM final concentration) or Pak4 kinase domain (50 nM final concentration) was mixed with 10 μM OPS-I or -II and pre-equilibrated to 30 °C. Reactions were started by the addition of 100 μM ATP containing [γ-32P]ATP. Mixtures were incubated for 2, 5, or 10 min and stopped on dry ice, followed by incubation at 95 °C for 10 min. Reactions were then spotted onto P81 cation exchange paper (Whatman), washed extensively in 0.1% phosphoric acid, and analyzed by scintillation counting on a Beckman LS 6000 SC instrument. Under these conditions, phosphoryl transfer remained linear over the time course, without substantial substrate depletion. Phosphate incorporation was calculated using counts obtained from ATP standard solutions. Substrate titration reactions were carried out as described above in the presence of increasing amounts of either OPS-I or -II (0-50 μM final concentration) at 30 °C for 10 min. OPS-I and -II were obtained as crude synthetic products (Biosynthesis, Inc.) and were purified by high pressure liquid chromatography to >95% purity.
Recombinant Protein Expression and Purification—GST-βPix wild type and GST-βPix S340A were purified from Rosetta (DE3) pLysS bacteria (Novagen) as described (
). Human Pak1 was subcloned using BamHI/EcoRI sites into pFastBac HTB (Invitrogen), and baculovirus expressing recombinant His-Pak1 was prepared according to the manufacturer's protocol. Serum-free adapted Sf9 cells were grown in suspension (in SFM-900) to a density of 1 × 10⁶ cells/ml and infected for 50-60 h with a 25-fold dilution of the viral stock. His-Pak1 was purified from cell pellets by sonication in 50 mM sodium phosphate, pH 8.0, 0.5 M NaCl, 5 mM imidazole, 1 mM phenylmethylsulfonyl fluoride, and 10 μg/ml each of chymostatin, leupeptin, and pepstatin. 20,000 × g supernatants of the resulting extract were applied to nickel-nitrilotriacetic acid beads. Beads were washed in 50 mM sodium phosphate, pH 8.0, 0.5 M NaCl, 20 mM imidazole, and Pak1 was eluted by wash buffer containing 250 mM imidazole. Pak1 was dialyzed against 50 mM Tris, pH 7.5, 100 mM NaCl, 5% glycerol, 1 mM dithiothreitol and stored at -80 °C. Expression, purification, and charging of Cdc42 were performed as described.
In Vitro βPix Phosphorylation—Recombinant Pak1 was activated by incubation with GTPγS-charged Cdc42 for 30 min at 30 °C in phospho buffer containing 1 mM ATP. Activated Pak1 was then incubated with 2.5 μg of either wild type βPix or S340A βPix in the presence of 1.3 mM ATP in phospho buffer for 30 min at 30 °C. Reactions were stopped by the addition of 2× sample buffer (125 mM Tris, pH 6.8, 4% SDS, 10% glycerol, 200 mM dithiothreitol, 0.02% bromphenol blue). Reaction products were analyzed by SDS-PAGE and Western blotting using the indicated antibodies.
In Vitro αPix Phosphorylation—∼1.4 μg each of wild type or S488A αPix was incubated with recombinant His-Pak2 for 30 min at 30 °C in phospho buffer containing 20 μM ATP and ∼0.5 μCi of [γ-32P]ATP. Reactions were stopped by the addition of 2× sample buffer and analyzed by SDS-PAGE and autoradiography.
βPix/Pak Expression in Cells—HEK293 cells were seeded at 6 × 10⁶ cells/well of a 6-well plate and grown for 24 h before transfection using Lipofectamine 2000 (Invitrogen). 36 h later, cells were washed once with phosphate-buffered saline and lysed in radioimmune precipitation buffer (25 mM Tris-HCl, pH 8, 137 mM NaCl, 10% glycerol, 0.1% SDS, 0.5% deoxycholate, 1% Nonidet P-40, 2 mM EDTA, 1 mM sodium ortho-vanadate, 1 mM phenylmethylsulfonyl fluoride, and 10 μg/ml chymostatin, leupeptin, and pepstatin). 15,000 × g (10 min) supernatants were prepared, and Myc-tagged βPix was immunoprecipitated from equal amounts of total protein from each lysate with anti-Myc antibody and protein A-agarose (Pierce). Immunoprecipitations were washed twice in radioimmune precipitation buffer without phosphatase inhibitors and washed twice in 1× dephosphorylation buffer (50 mM Tris, pH 8.5, 0.1 mM EDTA) and were then incubated for 90 min at 30 °C in the presence or absence of 80 units of calf intestinal phosphatase (Invitrogen). Samples were then analyzed by SDS-PAGE and Western blotting with the indicated antibodies.
The Substrate Specificity of Group I and II Paks—In order to determine the relative preference of Paks for phosphorylation of serine, threonine, and tyrosine independent of sequence context, we incubated full-length, recombinant Pak2 (
) with radiolabeled ATP and three degenerate peptide mixtures containing either serine, threonine, or tyrosine as the phosphoacceptor. Quantitation of the reaction products revealed a substantial preference of both Paks for serine over threonine that was more pronounced for Pak4 (Fig. 1). As expected, no significant phosphorylation of tyrosine-containing peptides was observed for either Pak. We assume that the substrate specificity of the isolated kinase domain of Pak4 toward short peptides reflects that of the full-length protein.
We next characterized the overall phosphorylation specificity of two Group I Paks (Pak1 and Pak2) and one Group II Pak (Pak4) in radiolabel kinase assays using mixtures of partially degenerate peptides as previously described (
). Each Pak was incubated in parallel with an array of 198 peptide substrate mixtures in which each peptide in each mixture contained a fixed, central serine or threonine residue as a phosphoacceptor. In each peptide mixture, one of the 20 amino acids was systematically fixed at one of nine positions surrounding the phosphorylation site (see Fig. 2), and the other eight positions were degenerate. In addition, phosphothreonine and phosphotyrosine were included at the fixed positions to investigate the influence of prior nearby phosphorylation on recognition by Paks. After incubation, each reaction was spotted on a filter membrane, which was then washed to remove unincorporated label, and the membrane was analyzed by phosphorimaging. Those peptide mixtures that include residues favored at a particular position are preferentially phosphorylated by the kinase and thus provide increased signals on the resulting array (e.g. Arg at the -2-position in Fig. 2A). This method allows a complete and quantitative description of the kinase substrate specificity.
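The layout of the peptide array can be made concrete with a short enumeration. The following Python sketch is only an illustration of the design described in the text (22 fixed residues at each of nine positions flanking the phosphoacceptor); the names `FIXED_POSITIONS`, `FIXED_RESIDUES`, and `peptide` are hypothetical, and the labels `pT` and `pY` for phosphothreonine and phosphotyrosine are an assumed notation.

```python
# Nine positions flank the phosphoacceptor in YAXXXXX(S/T)XXXXAGKK(biotin):
# five upstream (-5 to -1) and four downstream (+1 to +4).
FIXED_POSITIONS = [-5, -4, -3, -2, -1, 1, 2, 3, 4]

# The 20 unmodified amino acids plus phosphothreonine and phosphotyrosine.
FIXED_RESIDUES = list("ACDEFGHIKLMNPQRSTVWY") + ["pT", "pY"]


def peptide(position, residue):
    """Return the template with one residue fixed and all other X positions degenerate."""
    slots = {p: "X" for p in FIXED_POSITIONS}
    slots[position] = residue
    upstream = "".join(slots[p] for p in FIXED_POSITIONS if p < 0)
    downstream = "".join(slots[p] for p in FIXED_POSITIONS if p > 0)
    return f"YA{upstream}(S/T){downstream}AGKK(biotin)"


# One peptide mixture per (position, residue) pair.
library = [(pos, res, peptide(pos, res))
           for pos in FIXED_POSITIONS
           for res in FIXED_RESIDUES]

print(len(library))      # number of peptide mixtures in the array
print(peptide(-2, "R"))  # mixture with Arg fixed at the -2-position
```

The count of 22 residues times nine positions reproduces the 198 mixtures stated in the text.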
Quantified data were normalized by dividing the amount of phosphate incorporated into each peptide by the average amount incorporated into all peptides with the same fixed position to generate selectivity scores for each residue. PSSMs that include the complete set of selectivity values for each Pak are presented in supplemental Table 1. To compare the global similarity in the substrate preferences between pairs of Pak kinases, we prepared log score scatter plots (
) (Fig. 3, A-C). In these plots, each data point corresponds to a particular amino acid residue at a particular position (198 data points in this case). The abscissa reflects the log2 of the selectivity score (log score) for this residue position for one kinase, and the ordinate reflects the log score for the same residue position for the second kinase. Differences in the log scores of the two kinases for a particular residue position, therefore, are reflected by off-diagonal residue position scores. Low log scores (strong negative selection) have inherently greater variability due to the poorer signal/noise ratios of the raw data. Consequently, departures from the central diagonal in the bottom, left-hand quadrant are less likely to reflect true specificity differences.
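The normalization and log transformation described here can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code used in the study; the function names and the toy signal values are hypothetical, and the sketch assumes all background-subtracted signals are positive so that the logarithm is defined.

```python
import math


def selectivity_scores(signal):
    """Divide each signal by the mean signal of all peptides sharing its fixed position.

    signal: dict mapping position -> {residue: background-subtracted signal}
    """
    scores = {}
    for position, by_residue in signal.items():
        mean_signal = sum(by_residue.values()) / len(by_residue)
        scores[position] = {res: value / mean_signal
                            for res, value in by_residue.items()}
    return scores


def log_scores(scores):
    """Convert selectivity scores to log2 scores (positive = favored, negative = disfavored)."""
    return {pos: {res: math.log2(s) for res, s in by_res.items()}
            for pos, by_res in scores.items()}


# Hypothetical signals for three residues fixed at the -2-position.
toy_signal = {-2: {"R": 6.0, "K": 1.5, "A": 1.5}}
sel = selectivity_scores(toy_signal)
logs = log_scores(sel)
print(sel[-2]["R"], logs[-2]["R"])  # score relative to the position mean, and its log2
```

Plotting the log scores of one kinase against those of another, one point per residue position, gives the scatter plots described above.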
Consistent with their high degree of sequence identity, Pak1 and -2 exhibited highly correlated residue position scores (Fig. 3A). Spearman's rank correlation analysis of the paired kinase log scores returned a correlation coefficient (ρS) of 0.77 (Pearson correlation (ρP) = 0.83). By contrast, the scatter plot comparing Pak1 and -4 shows significant dispersion from the diagonal (Fig. 3B; ρS = 0.55, ρP = 0.64), indicating greater differences in position-specific residue selectivity. A two-sided test was used to establish the statistical significance of the correlation coefficients in each pairwise test (p < 1e-06 in each case). To confirm that this dispersion is not due to experimental variability, we generated two replicate peptide phosphorylation data sets each for Pak2 (not shown; ρS = 0.96, ρP = 0.94) and for Pak4 (Fig. 3C; ρS = 0.94, ρP = 0.93), which revealed very good reproducibility of the experimental method. Additionally, we applied the two-sided, one-sample t test to the mean residue position score differences between replicates of either Pak2 or Pak4. In neither case could we reject the null hypothesis that the true mean difference is 0 (Pak2 p value = 0.25; Pak4 p value = 0.26). Thus, the distinct sequence preferences observed for Pak1/2 versus Pak4 are significant.
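For readers who wish to reproduce this kind of comparison, the two correlation coefficients can be computed as in the sketch below. It uses only the Python standard library and assumes two equal-length lists of paired log scores; the helper names and the toy data are hypothetical, and no significance test is included.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy data: a monotonic but nonlinear relationship.
kinase_a = [1.0, 2.0, 3.0, 4.0]
kinase_b = [1.0, 4.0, 9.0, 16.0]
print(spearman(kinase_a, kinase_b), pearson(kinase_a, kinase_b))
```

Because Spearman's coefficient depends only on rank order, it is less sensitive than Pearson's to the noisier, strongly negative log scores noted above.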
To facilitate visualization of the amino acid preferences at each position, we prepared PSSM logos (
) based on the data set for each kinase (Fig. 4). At each sequence position in the PSSM logo, letters representing each amino acid residue are stacked from most favored to most disfavored, with the height of the letter reflecting the absolute value of the log score (note that disfavored residues have negative log scores). This representation allows for rapid assessment and comparison of both positively and negatively selected residues at each position relative to the phosphoacceptor.
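The stacking rule for a single logo column can be expressed as a short Python sketch. The logos in this work were drawn manually, so this is only an illustration of the ordering and scaling described above; `logo_column` and the example selectivity values are hypothetical.

```python
import math


def logo_column(selectivity):
    """Order residues from most favored to most disfavored.

    Returns (residue, letter_height) pairs, where the height is the
    absolute value of the log2 selectivity score.
    """
    logs = {res: math.log2(score) for res, score in selectivity.items()}
    ordered = sorted(logs, key=logs.get, reverse=True)
    return [(res, abs(logs[res])) for res in ordered]


# Hypothetical selectivity scores at one position.
column = logo_column({"R": 4.0, "A": 1.0, "P": 0.25})
print(column)
```

In this toy column the favored and disfavored residues have equal letter heights, so position in the stack, not height alone, indicates the direction of selection.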
As expected from the analysis above, Pak1 and -2 exhibit virtually identical substrate specificity, with a predominant positive selection for arginine at all positions from -5 to -1 (Fig. 4, A and B). As previously reported, arginine at -2 is the most strongly positively selected residue (
). Interestingly, lysine was slightly disfavored at this position by both Pak1 and -2 despite the conservation of charge and the presence of lysine at this position in several published Pak1 substrates (e.g. c-Myc and RhoGDI; see supplemental Table 2). Both Group I Paks favored large hydrophobic residues (Trp, Ile, Val, and Tyr) at positions +1 to +3 from the phosphoacceptor. Proline was strongly disfavored at +1, as is also the case for many AGC and calmodulin-dependent protein kinases (
). Interestingly, a significant positive selection for both tyrosine and phosphotyrosine at +2 was also observed.
Although having a similar preference for upstream arginine residues and hydrophobic residues at the +1-position, the Group II representative Pak4 exhibited distinct substrate specificity from Pak1/2 at the +2- and +3-positions (Fig. 4, compare C with A and B). Indeed, the second strongest positive selection in Pak4 (after arginine at -2) was for serine at +3. By contrast, Pak1/2 did not strongly favor any particular amino acid at the +3-position other than a slight preference for large hydrophobic residues (Fig. 4, A and B). Similarly, a marked selection for alanine at the +2-position in Pak4 was not shared with Pak1 or -2. The positive selection for phosphotyrosine at +2 in the Group I Paks was not observed for Pak4, although tyrosine was somewhat favored. These results indicate that Pak4 selects distinct substrate sequences from Pak1/2, particularly in the +2- and +3-positions, and that these residues downstream from the phosphoacceptor appear to contribute disproportionately to the efficiency of phosphorylation by Pak4 relative to Pak1/2. These distinct sequence preferences provide a potential mechanism by which Pak4 (and perhaps other Group II Paks) could recognize distinct substrates and consequently fulfill at least partially distinct functions.
Optimized Peptide Substrates for Group I and II Paks—Peptide screening data can be used to design consensus peptide substrates for protein kinases that incorporate the most strongly selected residue at each position analyzed. Although such peptides are truly optimal substrates only if selection at each position is completely independent of the surrounding sequence, this approach has been used to produce highly efficient and selective peptide substrates for Pim kinases, Akt/protein kinase B, protein kinase C, and MAPKAP kinases.
We synthesized peptides corresponding to the optimal Pak substrate sequences of Group I (OPS-I; GGRRRRRSWYFGGGK) and Group II (OPS-II; GGRRRRRSWASPGGK) and used these as substrates for Pak2 and Pak4 in kinase assays in vitro. Consistent with their anticipated preferences, Pak2 phosphorylated OPS-I at an ∼4-fold higher rate than OPS-II (Fig. 5A). Pak4 generally exhibited slightly faster phosphorylation of OPS-II than OPS-I (Fig. 5A), although this did not always reach statistical significance (Fig. 5, compare A and C). Titrations of each kinase with OPS-I and -II are shown in Fig. 5, B and C. Attempts to determine kinetic constants for the two enzymes toward OPS-I and -II were complicated by substrate inhibition at higher concentrations (Fig. 5, B and C). Nevertheless, these data support the results of the peptide specificity determinations and introduce “optimal” peptide reagents for measuring Pak kinase activity in vitro. Indeed, we have found that OPS-I serves as a sensitive and specific substrate for Group I Paks in in-gel kinase assays (data not shown).
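The idea of building a consensus peptide by taking the most strongly selected residue at each position can be sketched as follows. This is an illustration under stated assumptions, not the procedure used to design OPS-I and OPS-II: the function name `consensus` is hypothetical, the toy matrix values are invented for the example, and the sketch ignores flanking residues and any manual adjustments to the final sequences.

```python
def consensus(pssm, positions, phosphoacceptor="S"):
    """Pick the highest-scoring residue at each position; position 0 is the phosphoacceptor."""
    residues = []
    for pos in positions:
        if pos == 0:
            residues.append(phosphoacceptor)
        else:
            column = pssm[pos]
            residues.append(max(column, key=column.get))
    return "".join(residues)


# Hypothetical selectivity scores (not the measured values).
toy_pssm = {
    -2: {"R": 5.0, "K": 0.8},
    -1: {"R": 2.0, "A": 1.0},
    1: {"W": 2.5, "P": 0.2},
}
print(consensus(toy_pssm, [-2, -1, 0, 1]))
```

As noted in the text, a peptide assembled this way is truly optimal only if selection at each position is independent of the surrounding sequence.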
Pak Residues Mediating the Distinct Substrate Preferences of Group I and II Paks—Pak2 and Pak4 share 55% amino acid sequence identity within their kinase domains. We sought to determine what sequence differences in the substrate-binding region are responsible for the selectivity differences between these two related kinases. Since no structures of Pak kinases bound to substrates have yet been reported, we predicted the regions of these kinases that interact with substrates by comparison with the reported structure of the related basophilic kinase protein kinase A (PKA) bound to a peptide substrate (
) (Protein Data Bank code 1JBP) as shown in Fig. 6A. One of the few differences between Pak1 and Pak4 in the putative substrate-binding region is the presence of the sequence Pro307/Lys308 in human Pak1, which corresponds to Gln358/Arg359 in human Pak4. These residues are absolutely conserved among other members of their respective Group I and Group II families and are located in the β3-αC loop of the small lobe of the kinase, near the predicted locations of the +2 and +3 amino acid side chains of the substrate. Since the phosphorylation motifs of the two kinase groups differ largely at these positions within substrates, we hypothesized that this pair of residues could account for their differences. To test this, we mutated the Pro-Lys sequence found in Pak2 to Gln-Arg and conversely mutated the Gln-Arg sequence of Pak4 to Pro-Lys. The mutant proteins were expressed in E. coli, and their specificity was assessed using the degenerate peptide arrays as above.
These mutations had a profound impact on substrate selectivity at the +2- and +3-positions (Fig. 6B), although selectivity at other positions was essentially unchanged (supplemental Fig. 1 and supplemental Table 1). In the Pak4 mutant, selection for alanine at +2 and serine at +3 was dramatically reduced, and instead selection for phosphotyrosine, as observed in Pak2, was evident. In the Pak2 mutant, phosphotyrosine selection at +2 was lost, but the alanine and serine preferences of Pak4 were not acquired. These results demonstrate that the amino acid residues at positions 358 and 359 (human Pak4 numbering) are necessary and sufficient to mediate phosphotyrosine selection at the +2-position by Pak2. However, the alanine and serine selectivity of Pak4 (at +2 and +3, respectively), although dependent on these residues, also requires additional amino acids that may not contact the substrate directly.
Identification of a Novel Pak1 Phosphorylation Site—Knowledge of protein kinase phosphorylation motifs has been useful in some cases for identifying novel protein substrates (
). Nevertheless, other factors also influence kinase substrate selectivity in vivo, including surface accessibility of the target site, other sites of interaction between kinase and substrates, and co-expression and co-localization of the kinase and substrate within the cell. To determine if the specificity data described above were sufficient to identify novel Pak1 substrates, we used Scansite (
) (available on the World Wide Web at scansite.mit.edu) and the Pak1 recognition matrix (see supplemental Table 1) to search the SwissProt database for human proteins best matching the Pak1 peptide substrate preferences. We found that established Pak substrates harboring serine phosphorylation sites (listed in supplemental Table 2) were not among the top 2% of proteins scored (of 11,655 human SwissProt sequences searched), suggesting that recognition motifs alone may not be sufficient to identify novel Pak substrates.
To test the predictive power of the specificity matrices to identify phosphorylation sites in specific proteins we used Scansite to quantitatively evaluate all serine residues in each protein for which at least one specific Group I Pak phosphorylation site has been identified (supplemental Table 2). Based on the selectivity data, Scansite calculates a single numeric score for each 15-amino acid sequence centered on a serine residue (
). Lower scores reflect better matches to the Pak1 PSSM. Fig. 7A shows a density histogram of the Scansite scores for serines targeted by Group I Paks and the scores for nonphosphorylated serines occurring in the same proteins, highlighting the ability of the PSSM score to discriminate between these two populations. Thus, given a known Pak1 substrate, the preference matrix can be used to facilitate the identification of serine residues most likely to be phosphorylated by Pak1.
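To illustrate how a selectivity matrix can rank the serines within a protein, the Python sketch below scores each serine by summing negative log2 selectivity values over the positions covered by the matrix, so that lower scores indicate better matches. This scoring rule is an assumption chosen for illustration and is not Scansite's actual formula; the function names, the toy matrix, and the toy sequence are hypothetical.

```python
import math


def score_site(sequence, index, pssm):
    """Sum of -log2(selectivity) over matrix positions; lower means a better match.

    Residues absent from a column, and positions that fall outside the
    sequence, contribute nothing (treated as selectivity 1.0).
    """
    total = 0.0
    for offset, column in pssm.items():
        j = index + offset
        if 0 <= j < len(sequence):
            total -= math.log2(column.get(sequence[j], 1.0))
    return total


def rank_serines(sequence, pssm):
    """Return (score, 1-based position) for every serine, best match first."""
    scored = [(score_site(sequence, i, pssm), i + 1)
              for i, aa in enumerate(sequence) if aa == "S"]
    return sorted(scored)


# Hypothetical matrix: Arg favored at -2, Pro disfavored at +1.
toy_pssm = {-2: {"R": 4.0}, 1: {"P": 0.25}}
ranking = rank_serines("AARASWAAASP", toy_pssm)
print(ranking)
```

In the toy sequence, the serine preceded by arginine at -2 ranks first, and the serine followed by proline at +1 ranks last.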
Because the population of PSSM scores of true Pak1 substrate serines partially overlaps with the distribution of nonphosphorylated serines as seen in Fig. 7A, the PSSM does not completely distinguish between these two populations. Consequently, use of the scores to predict sites phosphorylated by Pak1 must balance sensitivity of site detection against the increasing likelihood of false positives at higher scores. This compromise between sensitivity (detection of true positives) and specificity (elimination of false positives) of an algorithm is typically represented by a receiver operating characteristic (ROC) curve.
We prepared a table of calculated Scansite scores for all of the analyzed serines (both Pak-phosphorylated and control, nonphosphorylated serines), ranked by increasing score. We then treated each Scansite score as a threshold and calculated the fraction of scores below that threshold that correspond to serines actually phosphorylated by Pak (true positives) and those not phosphorylated by Pak (false positives). Similarly, for Scansite scores above that threshold, we calculated the fraction of scores corresponding to serines phosphorylated by Pak (false negatives) and those not phosphorylated by Pak (true negatives). These fractions were calculated for all score thresholds. The ROC curve plots the fraction of true positives versus the fraction of false positives for all possible thresholds based on this empirically derived data set.
The area under the ROC curve (AUC) represents the probability that for two randomly picked serines, where one is truly phosphorylated by Pak1 and one is not, Scansite would correctly assign a lower Scansite score to the truly phosphorylated serine than the serine not actually phosphorylated by Pak1 (
). The closer the ROC curve is to the diagonal (AUC = 0.5), the less the algorithm is capable of discriminating between the two populations. An AUC of 1 would represent a perfect ability to discriminate between the groups. In our case, the estimated AUC is 0.78, indicating good sensitivity and specificity of the prediction. A useful feature of the ROC curve is that the AUC is not significantly affected by the distributions of the underlying populations. It is simply based on ranking the scores from the two populations.
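The ROC construction and the ranking interpretation of the AUC described above can be illustrated with a minimal sketch. It assumes that a serine is called positive when its score is at or below the threshold (lower scores being better matches) and that ties count as half; the function names and any scores passed to them are hypothetical and are not the values analyzed in this study.

```python
from typing import List, Sequence, Tuple


def roc_points(pos_scores: Sequence[float],
               neg_scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (false-positive fraction, true-positive fraction) pairs.

    Every observed score is used in turn as a threshold; a serine is
    called positive when its score is <= the threshold (assumption).
    """
    thresholds = sorted(set(pos_scores) | set(neg_scores))
    points = [(0.0, 0.0)]  # threshold below every score: nothing is called
    for t in thresholds:
        tpf = sum(s <= t for s in pos_scores) / len(pos_scores)
        fpf = sum(s <= t for s in neg_scores) / len(neg_scores)
        points.append((fpf, tpf))
    return points


def auc_trapezoid(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def auc_by_ranking(pos_scores: Sequence[float],
                   neg_scores: Sequence[float]) -> float:
    """Probability that a randomly chosen phosphorylated serine receives a
    lower score than a randomly chosen nonphosphorylated serine
    (ties count as one half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p < n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

Because both functions depend only on the ordering of the scores, they return the same value for a given data set, which is why the AUC is insensitive to the shapes of the two underlying score distributions.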
We also plotted the expected distribution of true positives, true negatives, false positives, and false negatives as a function of Scansite score (Fig. 7C). This representation is useful, after having used Scansite to score all serines in a putative Pak1 substrate, for choosing a threshold score to reduce the number of candidate serines to be considered. This threshold could, for example, be made more stringent, based on a low tolerance for false positives, or more relaxed, in order to improve the likelihood of detecting all true sites, as desired.
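Choosing a threshold in this way can be sketched as follows. This is an illustration only: it assumes that serines scoring at or below the threshold are called positive, and the tolerance parameter `max_false_positive_fraction` and both function names are hypothetical.

```python
from typing import Dict, Optional, Sequence


def confusion_at_threshold(pos_scores: Sequence[float],
                           neg_scores: Sequence[float],
                           threshold: float) -> Dict[str, int]:
    """Count true/false positives and negatives at one score threshold."""
    tp = sum(s <= threshold for s in pos_scores)
    fp = sum(s <= threshold for s in neg_scores)
    return {
        "TP": tp,
        "FN": len(pos_scores) - tp,
        "FP": fp,
        "TN": len(neg_scores) - fp,
    }


def most_permissive_threshold(pos_scores: Sequence[float],
                              neg_scores: Sequence[float],
                              max_false_positive_fraction: float
                              ) -> Optional[float]:
    """Return the highest observed score usable as a threshold while the
    fraction of nonphosphorylated serines called positive stays within
    the stated tolerance; None if no observed score qualifies."""
    best = None
    for t in sorted(set(pos_scores) | set(neg_scores)):
        counts = confusion_at_threshold(pos_scores, neg_scores, t)
        if counts["FP"] / len(neg_scores) <= max_false_positive_fraction:
            best = t
        else:
            break  # FP count never decreases as the threshold rises
    return best
```

A small tolerance gives a stringent threshold with few false positives, whereas a larger tolerance relaxes the threshold and recovers more of the true sites.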
We previously determined that Pak1 can phosphorylate βPix (Pak-interactive exchange factor, also called Cool-1) at an unknown site (or sites) in vitro (data not shown). βPix and its paralog αPix (Cool-2) are guanine nucleotide exchange factors for Rac and Cdc42 that associate via a Src homology 3 domain with a proline-rich region in Pak1 and -2. In order to determine the phosphorylated residue(s) and to test the predictive power of the PSSM to identify novel Pak phosphorylation sites, we used Scansite and the Pak1 PSSM to score all serines in βPix. Notably, Ser525 in βPix, a previously identified Pak2 phosphorylation site (
), was the second best scoring serine in βPix. The best candidate, Ser340, has not previously been identified as a Pak phosphorylation site. The analogous serine in αPix, Ser488, is known from NMR structural studies (Protein Data Bank code 1V61)
to be located in a surface-exposed loop (the β3-β4 loop) in the Pix pleckstrin homology domain and therefore is probably accessible to kinases. Notably, Ser340 and its surrounding sequence are highly conserved in Pix orthologs from humans, zebrafish, chicken, and Drosophila (Fig. 7D). Both αPix and βPix isoforms contain an identical sequence, suggesting that this region may play an important structural or functional role. Furthermore, independent, large scale phosphoproteomics studies have detected phosphorylation of βPix Ser340 in A431 cells (PhosphoSite data base) (available on the World Wide Web at www.cellsignal.com), HeLa cells (
). We therefore tested whether this site could be phosphorylated by Pak1.
We generated a phosphospecific antibody against the conserved 10-amino acid phosphoepitope surrounding Ser340 and used it to monitor βPix phosphorylation in vitro using recombinant Pak1 and βPix. As shown in Fig. 8A, active Pak1 promoted phosphorylation of Ser340, as observed by Western blots using the phosphospecific antibody. By contrast, when a nonphosphorylatable βPix mutant (S340A) was used as substrate, no reactivity with the antibody was seen. These results demonstrate the ability of Pak1 to phosphorylate Ser340 in vitro and validate the specificity of the phosphospecific antibody.
To determine if Group I Paks can also phosphorylate the analogous residue in αPix (Ser488), we expressed and purified a truncated form of αPix (amino acids 155-546) in E. coli comprising either the wild-type sequence or with Ser488 mutated to alanine (S488A). Phosphate incorporation into αPix in vitro with either Pak1 (not shown) or Pak2 (Fig. 8B) was dramatically reduced in the S488A mutant, suggesting that Ser488 in αPix is also directly targeted by Group I Paks. A similar experiment using full-length wild type or S340A βPix as a substrate revealed an ∼55% reduction in radiolabeled phosphate incorporation in the mutant relative to wild type (data not shown).
Phosphoproteomic studies suggest that Ser340 (βPix) and Ser488 (αPix) are indeed phosphorylated under physiological conditions (
). To test if Pak1 can mediate phosphorylation of this site in living cells, HEK293 cells were transfected with Myc-tagged βPix along with constitutively active (T423E) or dominant negative (K299R) forms of Pak1, and phosphorylation of βPix was monitored by Western analysis of anti-Myc immunoprecipitates using the phosphospecific Pix antibody (Fig. 8C). Myc-tagged βPix expressed alone exhibited some phosphorylation on Ser340 (lane 3), presumably due to endogenous Paks. Co-expression of constitutively active (CA) Pak1, however, increased the level of Ser340 phosphorylation (lane 4). Phosphatase treatment (calf intestinal phosphatase) of the immunoprecipitates abolished reactivity with the phosphospecific antibody (lane 5), confirming the specificity of the antibody for the phosphorylated species. Finally, expression of dominant negative Pak1 (kinase-dead (KD) Pak1) significantly reduced βPix Ser340 phosphorylation (lane 6). Thus, Pak1 can phosphorylate βPix at Ser340 not only in vitro, but also in the context of live cells. These results validate the use of the Pak1 PSSM for the identification of new Group I Pak phosphorylation sites. As yet, too few substrates of Group II Paks are known to determine rigorously if the Pak4 selectivity matrix can be used to identify Pak4 phosphorylation sites in Group II Pak substrates.
Pak Substrate Specificity—Although Pak1 and Pak2 are the most divergent members of the Group I Pak family, we find that they share essentially indistinguishable substrate specificity. Based on sequence homology, we anticipate Pak3 to behave similarly, suggesting the possibility of functional redundancy among Group I Paks. Significant differences were found, however, in the substrate specificities of Pak1/2 and the Group II member Pak4 (see Figs. 2, 3, 4). Whereas Pak1 and Pak2 preferred large hydrophobic residues in positions from +1 to +3, Pak4 showed unusually strong and markedly different specificity at the +2- and +3-positions. Indeed, the second most strongly selected residue for Pak4 was serine at the +3-position. In addition, alanine was the most preferred residue at the +2-position. We could attribute at least part of the distinct substrate preferences of Pak1/2 and Pak4 to two residues in the small lobe of the Paks (Gln358 and Arg359 in Pak4; Fig. 6). These residues are absolutely conserved within each Pak group, suggesting that the remaining Group II Paks will probably share similar sequence preferences at the substrate +2- and +3-positions. To our knowledge, this is the first time that residues in the small lobe of a kinase have been experimentally linked to substrate selectivity.
These results are consistent with the possibility that Group I and II Paks may phosphorylate partially distinct sets of substrates and, therefore, fulfill distinct biological functions. Nevertheless, the few substrates of Group II Paks currently known are also phosphorylated by Group I Paks. Thus, the identification of additional Group II Pak substrates will be required to determine whether the peptide substrate specificity differences revealed here are exploited physiologically.
In addition to the distinguishing features described above, Group I and II Paks also share some common substrate preferences. Paks are unusual among the basophilic kinases in exhibiting a predominant preference for arginine at the -2-position rather than at the -3-position (
) recently proposed two acidic residues termed PEM+1 and YEM+1 as critical for selection of arginine at the -2-position. Consistent with this hypothesis, all six Paks conserve Asp and Glu at these positions, respectively, and our structural comparison with PKA places these residues in Pak at similar positions relative to the substrate arginine as in PKA (not shown). Similarly, the shallow, hydrophobic +1-binding pocket in PKA is also largely conserved in both Group I and II Paks (composed in Pak4 of Pro479, Met482, Leu475, and Met524) and probably explains the modest selectivity for hydrophobic residues at the +1 substrate position by both Pak groups.
Based on the analysis of crystal structures of kinases bound to substrates, Zhu et al. proposed that two conserved amino acid residues known as the “toggle residue” and the “toggle-regulating residue” mediate proline deselection at the +1 substrate position by AGC and calmodulin-dependent protein kinases (
). As for several members of these kinase families, the toggle residue in both Group I and II Paks is glycine, and the toggle-regulating residue is methionine. This observation is consistent with the hypothesis that these residues prevent phosphorylation of substrates with proline at +1 and that this deselection evolved to confer a reciprocal substrate specificity in AGC, calmodulin-dependent protein kinases, and Ste20-related kinases from proline-directed CMGC kinases (
) revealed significant differences between Group I and II Paks in the rearrangements of the αC helix that most likely accompany catalysis. The different conformers analyzed in six high resolution structures of active Group II Paks revealed rearrangements of helix αC that result in an additional helical turn at the αC N terminus and a distortion of its C terminus. Interestingly, the result of this structural rearrangement is a swinging movement of Arg359 in Pak4 as it becomes incorporated into the αC helix extension. In this conformation, Arg359 is turned away from the substrate-binding site, forming a network of hydrogen bonds with the glycine-rich loop. These interactions stabilize a closed conformation of the glycine-rich loop, which would not be compatible with ATP binding. In another conformation, Arg359 is oriented toward the activation loop and putative substrate-binding region, presumably contributing to the substrate selectivity described here. Strikingly, a proline residue is absolutely conserved at the position analogous to 358 in Group I Paks, which prevents the helical extension observed in Group II Paks. Indeed, we have found that the Q358P/R359K mutant of Pak4 exhibits an elevated Km for ATP (S. W. Deacon and J. R. Peterson, unpublished data).
Additional studies will be required to determine if these amino acid residues facilitate coordination of substrate binding and catalysis. Taken together, the specificity data and structural comparisons described here provide a framework for understanding Pak kinase substrate selectivity.
Identification of Novel Pak Phosphorylation Sites—Data base mining for proteins best meeting kinase sequence specificity has been used to identify new kinase substrates. However, many variables other than amino acid sequence can influence substrate phosphorylation, such as accessibility of the target sequence to the kinase or the presence or absence of “docking sites” distinct from the phosphorylated sequence that may contribute to kinase-substrate interactions. Indeed, we observed that established Pak kinase substrates did not rank among the human proteins most closely matching the Pak1 or Pak2 specificity matrices. Thus, these other specificity-determining mechanisms may play a larger role in substrate selection by Paks than by some other kinases. Nevertheless, the specificity data are predictive of Pak1 phosphorylation sites in proteins known or expected to be Pak substrates (Fig. 7). Here we outline a strategy for using the substrate specificity data to identify new Pak1 phosphorylation sites in proteins already known or suspected to be phosphorylated by Pak1.
First, using the Pak PSSM determined here and the World Wide Web-based Scansite tool, all serine residues within the putative substrate are scored and ranked. A threshold score is chosen based on the expected fraction of true positives and false positives as outlined in Fig. 7C to reduce the number of candidate serines. Additional information can then be applied in order to prioritize these candidates for direct testing, such as 1) predicted or actual surface accessibility of each serine, 2) evolutionary conservation of the serine, 3) its location within a functionally important domain, or 4) known sites of serine phosphorylation from phosphoproteomic studies.
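The prioritization strategy outlined above could be organized along the following lines. This is a sketch under stated assumptions: the `CandidateSerine` record, its field names, and the rule of ordering first by the number of supporting criteria and then by score are hypothetical choices made for illustration. The scores themselves would come from Scansite, and nothing here reproduces Scansite's own scoring.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CandidateSerine:
    """One serine in a putative substrate (hypothetical record layout)."""
    position: int
    score: float                        # PSSM score; lower = better match
    surface_exposed: bool = False       # criterion 1
    conserved: bool = False             # criterion 2
    in_functional_domain: bool = False  # criterion 3
    seen_in_phosphoproteomics: bool = False  # criterion 4

    def evidence_count(self) -> int:
        """Number of the four supporting criteria that are met."""
        return sum([
            self.surface_exposed,
            self.conserved,
            self.in_functional_domain,
            self.seen_in_phosphoproteomics,
        ])


def prioritize(candidates: List[CandidateSerine],
               threshold: float) -> List[CandidateSerine]:
    """Keep serines scoring at or below the threshold, then order them by
    supporting evidence (most first) and by score (lowest first)."""
    kept = [c for c in candidates if c.score <= threshold]
    return sorted(kept, key=lambda c: (-c.evidence_count(), c.score))
```

Ordering by evidence before score is only one possible weighting; ranking by score alone, with the additional criteria used as tie-breakers, would be an equally reasonable choice.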
The Pak1 phosphorylation site in αPix and βPix identified here falls within a surface-exposed loop between the β3 and β4 strands of the pleckstrin homology domain of Pix proteins. Intriguingly, the analogous loop in the pleckstrin homology domain of the guanine nucleotide exchange factor Dbs directly contacts its substrate, Cdc42, promoting guanine nucleotide exchange (
). We are currently investigating whether this region of Pix plays a similar role in regulating Pix guanine nucleotide exchange factor activity.
We gratefully acknowledge Dr. Richard Cerione for the βPix expression plasmid, Dr. Warren Kruger for critical reading of the manuscript, and Jami Fukui for preparation of recombinant proteins. B. E. T. gratefully acknowledges the support and guidance of L. Cantley during the early stages of this work.