Specificity Profiling of Pak Kinases Allows Identification of Novel Phosphorylation Sites*

The p21-activated kinases (Paks) serve as effectors of the Rho family GTPases Rac and Cdc42. The six human Paks are divided into two groups based on sequence similarity. Group I Paks (Pak1 to -3) phosphorylate a number of substrates linking this group to regulation of the cytoskeleton and both proliferative and anti-apoptotic signaling. Group II Paks (Pak4 to -6) are thought to play distinct functional roles, yet their few known substrates are also targeted by Group I Paks. To determine if the two groups recognize distinct target sequences, we used a degenerate peptide library method to comprehensively characterize the consensus phosphorylation motifs of Group I and II Paks. We find that Pak1 and Pak2 exhibit virtually identical substrate specificity that is distinct from that of Pak4. Based on structural comparisons and mutagenesis, we identified two key amino acid residues that mediate the distinct specificities of Group I and II Paks and suggest a structural basis for these differences. These results implicate, for the first time, residues from the small lobe of a kinase in substrate selectivity. Finally, we utilized the Pak1 consensus motif to predict a novel Pak1 phosphorylation site in Pix (Pak-interactive exchange factor) and demonstrate that Pak1 phosphorylates this site both in vitro and in cultured cells. Collectively, these results elucidate the specificity of Pak kinases and illustrate a general method for the identification of novel sites phosphorylated by Paks.

The p21-activated kinases (Paks) interact with active forms of the 21-kDa molecular mass Rho-related GTPases Rac and Cdc42. Paks can be divided into Group I (Pak1 to -3) and the more recently discovered Group II (Pak4 to -6), based on sequence similarity and regulatory mechanism (1, 2). Whereas the catalytic activity of the Group I Paks is markedly up-regulated upon binding to GTP-bound Rac or Cdc42, the Group II Paks interact with but are not demonstrably activated by Rac or Cdc42.
Substantial data link the expression and hyperactivity of Paks to tumorigenesis and metastasis (1,3). Elevated Pak1 is associated with cancer of the breast (4), colon (5), and ovary (6). Transgenic expression of a constitutively active allele of Pak1 in murine mammary glands is sufficient to induce tumors (7), and the expression of a dominant negative form of Pak1 inhibits invasiveness of a breast cancer cell line (8). These observations have led to a great interest in understanding the effector pathways downstream of Paks that mediate tumorigenesis and metastasis.
Over 30 direct substrates of Group I Paks have been identified, outlining major functional roles in cytoskeletal regulation, survival signaling, cell cycle progression, and mitogen-activated protein kinase pathway activation (reviewed in Ref. 1; see also supplemental Table 2). Pak1 and -2 are the most distantly related Group I Paks yet share 91% sequence identity within their kinase domains, suggesting that they may recognize similar substrates. Indeed, Pak1 and -2 phosphorylate a number of common substrates, including Bad, Raf, Mek, and Merlin (9-13). Both Pak1 and Pak2 are expressed in many tissue types, but only Pak2 is essential for viability in mice (14).
For the Group II Paks, much less is known regarding regulation of kinase activity, identity of substrates, or function. Similar to Group I Paks, studies have implicated Group II in survival signaling, mitogen signaling, and cell motility (15-24). The best studied member, Pak4, is widely expressed and is an essential gene in mice (24). The kinase domain of Pak4 shares only 53-55% sequence identity with Group I Paks. The sequence divergence between the kinase domains of the two Pak groups suggests that they may recognize distinct substrates and serve at least partially divergent functions (2). Nevertheless, reported substrates for Pak4 are limited to Raf (15), Bad (17), Lim domain kinase 1 (16), and guanine nucleotide exchange factor-H1 (25), proteins also phosphorylated by Pak1 (11, 26-30). Thus, an important outstanding question is to what degree these two groups recognize similar substrates and are functionally redundant.
Previous studies have generally investigated Pak kinase substrate specificity in an ad hoc manner based on testing the impact of amino acid substitutions within known Pak protein and peptide substrates. Such studies, however, are generally not comprehensive and can incur bias from the particular sequence context. For example, using mutagenesis, King et al. (31) identified important sequence determinants for Pak3 phosphorylation of Raf-1 at Ser338, but the importance of these specific residues in other sequence contexts is unknown. Similar work analyzing Pak2 phosphorylation of synthetic peptides derived from the Rous sarcoma virus nucleocapsid NC protein has elucidated recognition determinants in this context (32). Unfortunately, whereas the Raf-1 study focused primarily on amino acids C-terminal to the targeted serine, the NC study mainly investigated residues N-terminal to the phosphorylation site, making it difficult to determine if the features identified in each study are context-dependent or independent. However, both studies did identify an arginine residue two amino acids N-terminal to the phosphoacceptor site (the −2-position) as a critical determinant for Pak recognition. Recently Shaw and coworkers (33) confirmed this finding using an alternative approach employing degenerate peptide mixtures and observed a strong bias for peptides containing arginine at the −2-position and lesser preferences for arginine at the −3- and −4-positions.
We report here the application of a recently described positional scanning peptide library approach to determine the complete, context-independent substrate sequence preferences of members of both the Group I and Group II Paks and apply that information to identify a novel Pak1 phosphorylation site in Pix. Although several serine/threonine kinases have been analyzed with this method (34-36), it has not yet been applied to a member of the Ste20-related kinase family, and no comprehensive peptide library analysis has yet been conducted for a Pak kinase.
Sequence Logos-Position-specific scoring matrix (PSSM) sequence logos were generated manually in Adobe Illustrator using the Arial Black font and scaling letters by the absolute value of the log2 of the raw selectivity score.
Determination of Pak Phosphorylation Specificity-Peptide library screens were carried out as described previously with minor modifications (36). Briefly, a series of 198 partially degenerate peptides with the general sequence YAXXXXX(S/T)XXXXAGKK(biotin) was employed, in which S/T indicates an even mixture of Ser and Thr, and all positions X except one are a degenerate mixture of the 17 amino acid residues excluding Ser, Thr, and Cys. Each individual peptide bears one of 22 amino acid residues (all unmodified residues, plus phosphothreonine and phosphotyrosine) fixed at one of the X positions. Reactions were carried out in sealed multiwell plates for 2 h at 30°C in a buffer containing 50 mM HEPES, pH 7.4, 12.5 mM NaCl, 1.5 mM MgCl2, 1.5 mM MnCl2, 0.1% Tween 20 with 50 µM [γ-32P]ATP or [33P]ATP at 0.3 µCi/µl, 50 µM peptide substrate, and the kinase of interest. Pak1 reactions also included 1.3 µM GTPγS-charged Cdc42. Aliquots of the reactions were spotted onto streptavidin membranes and washed as described previously (38). Incorporation of radioactivity into peptides was quantified by exposure to a phosphor screen and analysis using ImageQuant software. PSSMs were generated from background-subtracted data that were normalized as described (36). Data reflecting the average selectivity values from at least two separate runs are shown. To determine the phosphoacceptor residue specificity, similar peptides bearing the sequence YAXXXXXZXXXXAGKK(biotin) were used in which all X positions were degenerate, and the Z residue was either Ser, Thr, or Tyr. Reactions were carried out in microcentrifuge tubes as described above for the peptide library screens. Aliquots (2 µl) were spotted onto the streptavidin membrane, which was washed and quantified as above.
In Vitro Kinase Assays with Optimal Pak Substrates (OPS)-Kinase assays were performed at 30°C in 1× phospho buffer (50 mM HEPES, pH 7.5, 12.5 mM NaCl, 0.625 mM MgCl2, 0.625 mM MnCl2). For reaction rate determinations, Pak2 (10 nM final concentration) or Pak4 kinase domain (50 nM final concentration) was mixed with 10 µM OPS-I or -II and pre-equilibrated to 30°C. Reactions were started by the addition of 100 µM ATP containing [γ-32P]ATP. Mixtures were incubated for 2, 5, or 10 min and stopped on dry ice, followed by incubation at 95°C for 10 min. Reactions were then spotted onto P81 cation exchange paper (Whatman), washed extensively in 0.1% phosphoric acid, and analyzed by scintillation counting on a Beckman LS 6000 SC instrument. Under these conditions, phosphoryl transfer remained linear over the time course, without substantial substrate depletion. Phosphate incorporation was calculated using counts obtained from ATP standard solutions. Substrate titration reactions were carried out as described above in the presence of increasing amounts of either OPS-I or -II (0-50 µM final concentration) at 30°C for 10 min. OPS-I and -II were obtained as crude synthetic products (Biosynthesis, Inc.) and were purified by high pressure liquid chromatography to >95% purity.
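The conversion from scintillation counts to phosphate incorporated, and from a short time course to a reaction rate, can be sketched as follows. This is only a minimal illustration under stated assumptions: the function names and all numeric values are hypothetical, a single blank value stands in for background, and a least-squares slope stands in for whatever rate calculation was actually used.

```python
def specific_activity(standard_cpm, standard_pmol_atp):
    """Counts per pmol of ATP, from an ATP standard of known amount."""
    return standard_cpm / standard_pmol_atp


def pmol_incorporated(sample_cpm, background_cpm, cpm_per_pmol):
    """Background-subtracted counts converted to pmol of phosphate."""
    return (sample_cpm - background_cpm) / cpm_per_pmol


def initial_rate(times_min, pmol):
    """Least-squares slope (pmol/min) over the linear part of the time course."""
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_p = sum(pmol) / n
    num = sum((t - mean_t) * (p - mean_p) for t, p in zip(times_min, pmol))
    den = sum((t - mean_t) ** 2 for t in times_min)
    return num / den


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
cpm_per_pmol = specific_activity(standard_cpm=50000.0, standard_pmol_atp=100.0)
times = [2.0, 5.0, 10.0]                      # minutes, as in the time course above
counts = [1100.0, 2600.0, 5100.0]             # invented sample counts
amounts = [pmol_incorporated(c, 100.0, cpm_per_pmol) for c in counts]
rate = initial_rate(times, amounts)
```

A slope fitted this way is meaningful only while incorporation is linear in time, which is the condition stated above for these assays.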
Recombinant Protein Expression and Purification-GST-βPix wild type and GST-βPix S340A were purified from Rosetta (DE3) pLysS bacteria (Novagen) as described (39). Full-length wild type Pak2 was expressed as a His-tagged fusion protein as previously reported (40). As reported (40), this protein is constitutively active as purified and is not further stimulated by Cdc42. The wild type Pak4 kinase domain (37) and mutant were expressed and purified as GST-tagged fusion proteins essentially as reported (37). Human Pak1 was subcloned using BamHI/EcoRI sites into pFastBac HTB (Invitrogen), and baculovirus expressing recombinant His-Pak1 was prepared according to the manufacturer's protocol. Serum-free adapted Sf9 cells were grown in suspension (in SFM-900) to a density of 1 × 10^6 cells/ml and infected for 50-60 h with a 25-fold dilution of the viral stock. His-Pak1 was purified from cell pellets by sonication in 50 mM sodium phosphate, pH 8.0, 0.5 M NaCl, 5 mM imidazole, 1 mM phenylmethylsulfonyl fluoride, and 10 µg/ml each of chymostatin, leupeptin, and pepstatin. 20,000 × g supernatants of the resulting extract were applied to nickel-nitrilotriacetic acid beads. Beads were washed in 50 mM sodium phosphate, pH 8.0, 0.5 M NaCl, 20 mM imidazole, and Pak1 was eluted by wash buffer containing 250 mM imidazole. Pak1 was dialyzed against 50 mM Tris, pH 7.5, 100 mM NaCl, 5% glycerol, 1 mM dithiothreitol and stored at −80°C. Expression, purification, and charging of Cdc42 were performed as described (41).
In Vitro βPix Phosphorylation-Recombinant Pak1 was activated by incubation with GTPγS-charged Cdc42 for 30 min at 30°C in phospho buffer containing 1 mM ATP. Activated Pak1 was then incubated with 2.5 µg of either wild type βPix or S340A βPix in the presence of 1.3 mM ATP in phospho buffer for 30 min at 30°C. Reactions were stopped by the addition of 2× sample buffer (125 mM Tris, pH 6.8, 4% SDS, 10% glycerol, 200 mM dithiothreitol, 0.02% bromphenol blue). Reaction products were analyzed by SDS-PAGE and Western blotting using the indicated antibodies.
In Vitro αPix Phosphorylation-~1.4 µg each of wild type or S340A αPix was incubated with recombinant His-Pak2 for 30 min at 30°C in phospho buffer containing 20 µM ATP and ~0.5 µCi of [γ-32P]ATP. Reactions were stopped by the addition of 2× sample buffer and analyzed by SDS-PAGE and autoradiography.
βPix/Pak Expression in Cells-HEK293 cells were seeded at 6 × 10^6 cells/well of a 6-well plate and grown for 24 h before transfection using Lipofectamine 2000 (Invitrogen). 36 h later, cells were washed once with phosphate-buffered saline and lysed in radioimmune precipitation buffer (25 mM Tris-HCl, pH 8, 137 mM NaCl, 10% glycerol, 0.1% SDS, 0.5% deoxycholate, 1% Nonidet P-40, 2 mM EDTA, 1 mM sodium orthovanadate, 1 mM phenylmethylsulfonyl fluoride, and 10 µg/ml chymostatin, leupeptin, and pepstatin). 15,000 × g (10 min) supernatants were prepared, and Myc-tagged βPix was immunoprecipitated from equal amounts of total protein from each lysate with anti-Myc antibody and protein A-agarose (Pierce). Immunoprecipitations were washed twice in radioimmune precipitation buffer without phosphatase inhibitors and washed twice in 1× dephosphorylation buffer (50 mM Tris, pH 8.5, 0.1 mM EDTA) and were then incubated for 90 min at 30°C in the presence or absence of 80 units of calf intestinal phosphatase (Invitrogen). Samples were then analyzed by SDS-PAGE and Western blotting with the indicated antibodies.

RESULTS
The Substrate Specificity of Group I and II Paks-In order to determine the relative preference of Paks for phosphorylation of serine, threonine, and tyrosine independent of sequence context, we incubated full-length, recombinant Pak2 (40) or Pak4 kinase domain (37) with radiolabeled ATP and three degenerate peptide mixtures containing either serine, threonine, or tyrosine as the phosphoacceptor. Quantitation of the reaction products revealed a substantial preference of both Paks for serine over threonine that was more pronounced for Pak4 (Fig. 1). As expected, no significant phosphorylation of tyrosine-containing peptides was observed for either Pak. We assume that the substrate specificity of the isolated kinase domain of Pak4 toward short peptides reflects that of the full-length protein.
We next characterized the overall phosphorylation specificity of two Group I Paks (Pak1 and Pak2) and one Group II Pak (Pak4) in radiolabel kinase assays using mixtures of partially degenerate peptides as previously described (36, 38). Each Pak was incubated in parallel with an array of 198 peptide substrate mixtures in which each peptide in each mixture contained a fixed, central serine or threonine residue as a phosphoacceptor. In each peptide mixture, one of the 20 amino acids was systematically fixed at one of nine positions surrounding the phosphorylation site (see Fig. 2), and the other eight positions were degenerate. In addition, phosphothreonine and phosphotyrosine were included at the fixed positions to investigate the influence of prior nearby phosphorylation on recognition by Paks. After incubation, each reaction was spotted on a filter membrane, which was then washed to remove unincorporated label, and the membrane was analyzed by phosphorimaging. Those peptide mixtures that include residues favored at a particular position are preferentially phosphorylated by the kinase and thus provide increased signals on the resulting array (e.g. Arg at the −2-position in Fig. 2A). This method allows a complete and quantitative description of the kinase substrate specificity.
Quantified data were normalized by dividing the amount of phosphate incorporated into each peptide by the average amount incorporated into all peptides with the same fixed position to generate selectivity scores for each residue. PSSMs that include the complete set of selectivity values for each Pak are presented in supplemental Table 1. To compare the global similarity in the substrate preferences between pairs of Pak kinases, we prepared log score scatter plots (43) (Fig. 3, A-C). In these plots, each data point corresponds to a particular amino acid residue at a particular position (198 data points in this case). The abscissa reflects the log2 of the selectivity score (log score) for this residue position for one kinase, and the ordinate reflects the log score for the same residue position for the second kinase. Differences in the log scores of the two kinases for a particular residue position, therefore, are reflected by off-diagonal residue position scores. Low log scores (strong negative selection) have inherently greater variability due to the poorer signal/noise ratios of the raw data. Consequently, departures from the central diagonal in the bottom, left-hand quadrant are less likely to reflect true specificity differences.
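The normalization and log transformation described above can be illustrated with a short sketch. The data layout (a dictionary of positions, each mapping residues to background-subtracted signal), the function names, the `floor` guard against non-positive values, and the toy numbers are all assumptions made for illustration and are not taken from the analysis itself.

```python
import math


def selectivity_scores(signal):
    """Divide each signal by the mean signal of all peptides fixed at the same position.

    signal: {position: {residue: background-subtracted signal}}
    """
    scores = {}
    for pos, by_residue in signal.items():
        mean_signal = sum(by_residue.values()) / len(by_residue)
        scores[pos] = {res: v / mean_signal for res, v in by_residue.items()}
    return scores


def log_scores(scores, floor=1e-3):
    """log2 of each selectivity score; `floor` is a hypothetical guard for values <= 0."""
    return {
        pos: {res: math.log2(max(s, floor)) for res, s in by_residue.items()}
        for pos, by_residue in scores.items()
    }


# Toy signal for a single position with three residues (invented values).
toy = {-2: {"R": 6.0, "K": 1.0, "A": 2.0}}
sel = selectivity_scores(toy)   # mean signal is 3.0, so R scores 2.0
logs = log_scores(sel)          # log2(2.0) = 1.0 for R; K and A are negative
```

By construction the selectivity scores at each position average to 1, so a log score of 0 marks a residue that is neither favored nor disfavored.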
Consistent with their high degree of sequence identity, Pak1 and -2 exhibited highly correlated residue position scores (Fig. 3A). Spearman's rank correlation analysis of the paired kinase log scores returned a correlation coefficient (S) of 0.77 (Pearson correlation (P) = 0.83). By contrast, the scatter plot comparing Pak1 and -4 shows significant dispersion from the diagonal (Fig. 3B; S = 0.55, P = 0.64), indicating greater differences in position-specific residue selectivity. A two-sided test was used to establish the statistical significance of the correlation coefficients in each pairwise test (p < 1e−06 in each case). To confirm that this dispersion is not due to experimental variability, we generated two replicate peptide phosphorylation data sets each for Pak2 (not shown; S = 0.96, P = 0.94) and for Pak4 (Fig. 3C; S = 0.94, P = 0.93), which revealed very good reproducibility of the experimental method. Additionally, we applied the two-sided, one-sample t test to the mean residue position score differences between replicates of either Pak2 or Pak4. In neither case could we reject the null hypothesis that the true mean difference is 0 (Pak2 p value = 0.25; Pak4 p value = 0.26). Thus, the distinct sequence preferences observed for Pak1/2 versus Pak4 are significant.
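For readers who wish to reproduce this kind of comparison, the two correlation coefficients can be computed from paired log scores as sketched below. The implementation is a generic one written for illustration (Spearman's coefficient is obtained as the Pearson correlation of average ranks); the function names and toy vectors are hypothetical, and the statistical software actually used is not specified here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy log scores for two kinases (invented): monotonic but not linear.
kinase_a = [1.0, 2.0, 3.0, 4.0]
kinase_b = [1.0, 4.0, 9.0, 16.0]
rho_s = spearman(kinase_a, kinase_b)
rho_p = pearson(kinase_a, kinase_b)
```

In the toy example the rank correlation is perfect while the Pearson coefficient is slightly below 1, which illustrates why the two measures can differ for the same pair of kinases.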
To facilitate visualization of the amino acid preferences at each position, we prepared PSSM logos (43) based on the data set for each kinase (Fig. 4). At each sequence position in the PSSM logo, letters representing each amino acid residue are stacked from most favored to most disfavored, with the height of the letter reflecting the absolute value of the log score (note that disfavored residues have negative log scores). This representation allows for rapid assessment and comparison of both positively and negatively selected residues at each position relative to the phosphoacceptor.
As expected from the analysis above, Pak1 and -2 exhibit virtually identical substrate specificity, with a predominant positive selection for arginine at all positions from −5 to −1 (Fig. 4, A and B). As previously reported, arginine at −2 is the most strongly positively selected residue (32, 33). Interestingly, lysine was slightly disfavored at this position by both Pak1 and -2 despite the conservation of charge and the presence of lysine at this position in several published Pak1 substrates (e.g. c-Myc and RhoGDI; see supplemental Table 2). Both Group I Paks favored large hydrophobic residues (Trp, Ile, Val, and Tyr) at positions +1 to +3 from the phosphoacceptor. Proline was strongly disfavored at +1, as is also the case for many AGC and calmodulin-dependent protein kinases (44). Interestingly, a significant positive selection for both tyrosine and phosphotyrosine at +2 was also observed.
Although having a similar preference for upstream arginine residues and hydrophobic residues at the +1-position, the Group II representative Pak4 exhibited distinct substrate specificity from Pak1/2 at the +2- and +3-positions (Fig. 4, compare C with A and B). Indeed, the second strongest positive selection in Pak4 (after arginine at −2) was for serine at +3. By contrast, Pak1/2 did not strongly favor any particular amino acid at the +3-position other than a slight preference for large hydrophobic residues (Fig. 4, A and B). Similarly, a marked selection for alanine at the +2-position in Pak4 was not shared with Pak1 or -2. The positive selection for phosphotyrosine at +2 in the Group I Paks was not observed for Pak4, although tyrosine was somewhat favored. These results indicate that Pak4 selects distinct substrate sequences from Pak1/2, particularly in the +2- and +3-positions, and that these residues downstream from the phosphoacceptor appear to contribute disproportionately to the efficiency of phosphorylation by Pak4 relative to Pak1/2. These distinct sequence preferences provide a potential mechanism by which Pak4 (and perhaps other Group II Paks) could recognize distinct substrates and consequently fulfill at least partially distinct functions.
FIGURE 3. Pairwise comparison of the substrate selectivity of Group I and II Paks. Each plot represents a substrate specificity comparison of two kinases, and each data point represents a specific amino acid residue at a specific sequence position. The abscissa is the log score of this residue-position combination for one kinase, and the ordinate is the log score for the same residue-position combination for the other kinase. Similar residue-position scores produce data points that fall on the diagonal, whereas selectivity differences are reflected by points more distant from the diagonal. A, a comparison of Pak1 and -2 reveals a globally similar substrate selectivity. B, by contrast, the scatter plot for Pak1 versus Pak4 exhibits greater dispersion, indicating differences in substrate preferences. C, the comparison of two identical runs of Pak4 illustrates the consistency of the method from experiment to experiment. Correlation coefficients calculated by Spearman's rank correlation analysis (S) or Pearson's correlation (P) are shown for each pairwise comparison.
Optimized Peptide Substrates for Group I and II Paks-Peptide screening data can be used to design consensus peptide substrates for protein kinases that incorporate the most strongly selected residue at each position analyzed. Although such peptides are truly optimal substrates only if selection at each position is completely independent of the surrounding sequence, this approach has been used to produce highly efficient and selective peptide substrates for Pim kinases, Akt/protein kinase B, protein kinase C, and MAPKAP kinases (36, 45-47).
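Reading a consensus sequence off a PSSM amounts to taking the highest scoring residue at each position, as in the following sketch. The matrix values here are invented for illustration and are not the measured Pak selectivities; the function name and the choice of serine as the phosphoacceptor are likewise assumptions of the example.

```python
def consensus_motif(pssm, acceptor="S"):
    """Join the top-scoring residue at each position, with the acceptor at position 0.

    pssm: {position relative to phosphoacceptor: {residue: selectivity score}}
    """
    letters = []
    for pos in sorted(set(pssm) | {0}):
        if pos == 0:
            letters.append(acceptor)
        else:
            by_residue = pssm[pos]
            letters.append(max(by_residue, key=by_residue.get))
    return "".join(letters)


# Toy matrix covering only three positions (values invented).
toy_pssm = {
    -2: {"R": 4.0, "K": 0.8},
    -1: {"R": 1.5, "A": 1.0},
    1: {"W": 2.0, "P": 0.2},
}
motif = consensus_motif(toy_pssm)
```

Because each position is chosen independently, a motif built this way inherits the caveat stated above: it is optimal only if selection at one position does not depend on the residues at the others.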
We synthesized peptides corresponding to the optimal Pak substrate sequences of Group I (OPS-I; GGRRRRRSWYFGGGK) and Group II (OPS-II; GGRRRRRSWASPGGK) and used these as substrates for Pak2 and Pak4 in kinase assays in vitro. Consistent with their anticipated preferences, Pak2 phosphorylated OPS-I at an ~4-fold higher rate than OPS-II (Fig. 5A). Pak4 generally exhibited slightly faster phosphorylation of OPS-II than OPS-I (Fig. 5A), although this did not always reach statistical significance (Fig. 5, compare A and C). Titrations of each kinase with OPS-I and -II are shown in Fig. 5, B and C. Attempts to determine kinetic constants for the two enzymes toward OPS-I and -II were complicated by substrate inhibition at higher concentrations (Fig. 5, B and C). Nevertheless, these data support the results of the peptide specificity determinations and introduce "optimal" peptide reagents for measuring Pak kinase activity in vitro. Indeed, we have found that OPS-I serves as a sensitive and specific substrate for Group I Paks in in-gel kinase assays (data not shown).
Pak Residues Mediating the Distinct Substrate Preferences of Group I and II Paks-Pak2 and Pak4 share 55% amino acid sequence identity within their kinase domains. We sought to determine what sequence differences in the substrate-binding region are responsible for the selectivity differences between these two related kinases. Since no structures of Pak kinases bound to substrates have yet been reported, we predicted the regions of these kinases that interact with substrates by comparison with the reported structure of the related basophilic kinase protein kinase A (PKA) bound to a peptide substrate (48). We aligned the large (C-terminal) lobes of the kinase domains from the crystal structures of active forms of Pak1 (49) (Protein Data Bank code 1YHV) and Pak4 (37) (Protein Data Bank code 2CDZ) with PKA bound to a peptide substrate (48) (Protein Data Bank code 1JBP) as shown in Fig. 6A. One of the few differences between Pak1 and Pak4 in the putative substrate-binding region is the presence of the sequence Pro307/Lys308 in human Pak1, which corresponds to Gln358/Arg359 in human Pak4. These residues are absolutely conserved among other members of their respective Group I and Group II families and are located in the β3-αC loop of the small lobe of the kinase, near the predicted locations of the +2 and +3 amino acid side chains of the substrate. Since the phosphorylation motifs of the two kinase groups differ largely at these positions within substrates, we hypothesized that this pair of residues could account for their differences. To test this, we mutated the Pro-Lys sequence found in Pak2 to Gln-Arg and conversely mutated the Gln-Arg sequence of Pak4 to Pro-Lys. The mutant proteins were expressed in E. coli, and their specificity was assessed using the degenerate peptide arrays as above.
FIGURE 4. ... Fig. 2. At each sequence position, each of the naturally occurring amino acids (in single letter representation) is shown along with phosphotyrosine and phosphothreonine. The height of the letter is scaled by the absolute value of its log score. Positively selected amino acids are stacked above the position identifier from most selected (top) to least selected. Disfavored residues are stacked below the position identifier, from least to most disfavored (bottom). Negatively charged residues are shown in red, positively charged residues in blue, nonpolar residues in black, uncharged polar residues in gray, and phosphorylated residues in orange.
These mutations had a profound impact on substrate selectivity at the +2- and +3-positions (Fig. 6B), although selectivity at other positions was essentially unchanged (supplemental Fig. 1 and supplemental Table 1). In the Pak4 mutant, selection for alanine at +2 and serine at +3 was dramatically reduced, and instead selection for phosphotyrosine, as observed in Pak2, was evident. In the Pak2 mutant, phosphotyrosine selection at +2 was lost, but the alanine and serine preferences of Pak4 were not acquired. These results demonstrate that the amino acid residues at positions 358 and 359 (human Pak4 numbering) are necessary and sufficient to mediate phosphotyrosine selection at the +2-position by Pak2. However, the alanine and serine selectivity of Pak4 (at +2 and +3, respectively), although dependent on these residues, also requires additional amino acids that may not contact the substrate directly.
Identification of a Novel Pak1 Phosphorylation Site-Knowledge of protein kinase phosphorylation motifs has been useful in some cases for identifying novel protein substrates (50-53). Indeed, the amino acid sequence surrounding a candidate phosphoacceptor is one critical determinant of whether a putative substrate will be phosphorylated by a kinase (54). Nevertheless, other factors also influence kinase substrate selectivity in vivo, including surface accessibility of the target site, other sites of interaction between kinase and substrates, and co-expression and co-localization of the kinase and substrate within the cell. To determine if the specificity data described above were sufficient to identify novel Pak1 substrates, we used Scansite (55) (available on the World Wide Web at scansite.mit.edu) and the Pak1 recognition matrix (see supplemental Table 1) to search the SwissProt database for human proteins best matching the Pak1 peptide substrate preferences. We found that established Pak substrates harboring serine phosphorylation sites (listed in supplemental Table 2) were not among the top 2% of proteins scored (of 11,655 human SwissProt sequences searched), suggesting that recognition motifs alone may not be sufficient to identify novel Pak substrates.
To test the predictive power of the specificity matrices to identify phosphorylation sites in specific proteins we used Scansite to quantitatively evaluate all serine residues in each protein for which at least one specific Group I Pak phosphorylation site has been identified (supplemental Table 2). Based on the selectivity data, Scansite calculates a single numeric score for each 15-amino acid sequence centered on a serine residue (55). Lower scores reflect better matches to the Pak1 PSSM. Fig. 7A shows a density histogram of the Scansite scores for serines targeted by Group I Paks and the scores for nonphosphorylated serines occurring in the same proteins, highlighting the ability of the PSSM score to discriminate between these two populations. Thus, given a known Pak1 substrate, the preference matrix can be used to facilitate the identification of serine residues most likely to be phosphorylated by Pak1.
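The idea of scoring every serine-centered 15-residue window against a PSSM can be sketched as follows. This is explicitly not Scansite's scoring formula, which is not reproduced here: the sketch simply sums log2 selectivity values over the window and negates the sum so that, as with Scansite, lower scores mean better matches. The padding character, the default score of 1.0 for unlisted residues, the toy matrix, and the toy sequence are all assumptions of the example.

```python
import math


def serine_windows(sequence, flank=7):
    """Yield (1-based position, window) for each serine; ends are padded with 'X'."""
    padded = "X" * flank + sequence + "X" * flank
    for i, residue in enumerate(sequence):
        if residue == "S":
            yield i + 1, padded[i:i + 2 * flank + 1]


def window_score(window, pssm, flank=7):
    """Negated sum of log2 selectivities; residues missing from the PSSM count as neutral."""
    total = 0.0
    for offset, residue in enumerate(window):
        pos = offset - flank
        total += math.log2(pssm.get(pos, {}).get(residue, 1.0))
    return -total


# Toy matrix and sequence (invented): Arg favored at -2, Trp favored and Pro disfavored at +1.
toy_pssm = {-2: {"R": 4.0}, 1: {"W": 2.0, "P": 0.25}}
toy_sequence = "AARASWAAAASPA"
ranked = sorted(
    (window_score(window, toy_pssm), position)
    for position, window in serine_windows(toy_sequence)
)
```

In this toy case the serine at position 5 (Arg at −2, Trp at +1) ranks ahead of the serine at position 11 (Pro at +1), mirroring the way a ranked list of candidate serines would be read.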
Because the population of PSSM scores of true Pak1 substrate serines partially overlaps with the distribution of non-phosphorylated serines as seen in Fig. 7A, the PSSM does not completely distinguish between these two populations. Consequently, use of the scores to predict sites phosphorylated by Pak1 must balance sensitivity of site detection against the increasing likelihood of false positives at higher scores. This compromise between sensitivity (detection of true positives) and specificity (elimination of false positives) of an algorithm is typically represented by a receiver operating characteristic (ROC) curve (56). We explain this in more detail below.
We prepared a table of calculated Scansite scores for all of the analyzed serines (both Pak-phosphorylated and control, nonphosphorylated serines), ranked by increasing score. We then treated each Scansite score as a threshold and calculated the fraction of scores below that threshold that correspond to serines actually phosphorylated by Pak (true positives) and those not phosphorylated by Pak (false positives). Similarly, for Scansite scores above that threshold, we calculated the fraction of scores corresponding to serines phosphorylated by Pak (false negatives) and those not phosphorylated by Pak (true negatives). These fractions were calculated for all score thresholds. The ROC curve plots the fraction of true positives versus the fraction of false positives for all possible thresholds based on this empirically derived data set (54). Fig. 7B shows the ROC curve for the data shown in Fig. 7A.
The area under the ROC curve (AUC) represents the probability that for two randomly picked serines, where one is truly phosphorylated by Pak1 and one is not, Scansite would correctly assign a lower Scansite score to the truly phosphorylated serine than the serine not actually phosphorylated by Pak1 (56). The closer the ROC curve is to the diagonal (AUC = 0.5), the less the algorithm is capable of discriminating between the two populations. An AUC of 1 would represent a perfect ability to discriminate between the groups. In our case, the estimated AUC is 0.78, indicating good sensitivity and specificity of the prediction. A useful feature of the ROC curve is that the AUC is not significantly affected by the distributions of the underlying populations. It is simply based on ranking the scores from the two populations.
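The construction of the ROC curve and the ranking interpretation of the AUC can be made concrete with a small sketch. The scores below are invented, the function names are hypothetical, and a site is called positive when its score is at or below the threshold, following the convention that lower scores are better matches.

```python
def roc_points(phospho_scores, control_scores):
    """(false positive fraction, true positive fraction) at every observed threshold."""
    thresholds = sorted(set(phospho_scores) | set(control_scores))
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpf = sum(s <= t for s in phospho_scores) / len(phospho_scores)
        fpf = sum(s <= t for s in control_scores) / len(control_scores)
        points.append((fpf, tpf))
    return points


def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoid rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


def auc_by_ranking(phospho_scores, control_scores):
    """Fraction of (phosphorylated, control) pairs in which the phosphorylated
    serine has the lower score; ties count as one half."""
    wins = 0.0
    for p in phospho_scores:
        for c in control_scores:
            if p < c:
                wins += 1.0
            elif p == c:
                wins += 0.5
    return wins / (len(phospho_scores) * len(control_scores))


# Invented scores: three phosphorylated serines and four control serines.
phospho = [0.2, 0.4, 0.9]
control = [0.3, 0.5, 0.8, 1.0]
area = auc_trapezoid(roc_points(phospho, control))
rank_auc = auc_by_ranking(phospho, control)
```

Both routes give the same value for the toy data (8 of 12 pairs correctly ordered), which is the sense in which the AUC depends only on the ranking of the scores.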
We also plotted the expected distribution of true positives, true negatives, false positives, and false negatives as a function of Scansite score (Fig. 7C). This representation is useful, after having used Scansite to score all serines in a putative Pak1 substrate, for choosing a threshold score to reduce the number of candidate serines to be considered. This threshold could, for example, be made more stringent, based on a low tolerance for false positives, or more relaxed, in order to improve the likelihood of detecting all true sites, as desired.
We previously determined that Pak1 can phosphorylate βPix (Pak-interactive exchange factor, also called Cool-1) at an unknown site (or sites) in vitro (data not shown). βPix and its paralog αPix (Cool-2) are guanine nucleotide exchange factors for Rac and Cdc42 that associate via a Src homology 3 domain with a proline-rich region in Pak1 and -2. In order to determine the phosphorylated residue(s) and to test the predictive power of the PSSM to identify novel Pak phosphorylation sites, we used Scansite and the Pak1 PSSM to score all serines in βPix. Notably, Ser525 in βPix, a previously identified Pak2 phosphorylation site (57), was the second best scoring serine in βPix.
The best candidate, Ser 340, has not previously been identified as a Pak phosphorylation site. The analogous serine in αPix, Ser 488, is known from NMR structural studies (Protein Data Bank code 1V61) 5 to be located in a surface-exposed loop (the β3-β4 loop) in the Pix pleckstrin homology domain and therefore is probably accessible to kinases. Notably, Ser 340 and its surrounding sequence are highly conserved in Pix orthologs from humans, zebrafish, chicken, and Drosophila (Fig. 7D). Both αPix and βPix isoforms contain an identical sequence, suggesting that this region may play an important structural or functional role. Furthermore, independent, large scale phosphoproteomics studies have detected phosphorylation of βPix Ser 340 in A431 cells (PhosphoSite data base) (available on the World Wide Web at www.cellsignal.com), HeLa cells (58), and mouse brain (59), and phosphorylation of the analogous serine in αPix, Ser 488, was reported in HeLa (58) and Jurkat cell lines (60). We therefore tested whether this site could be phosphorylated by Pak1.
We generated a phosphospecific antibody against the conserved 10-amino acid phosphoepitope surrounding Ser 340 and used it to monitor βPix phosphorylation in vitro using recombinant Pak1 and βPix. As shown in Fig. 8A, active Pak1 promoted phosphorylation of Ser 340, as observed by Western blots using the phosphospecific antibody. By contrast, when a nonphosphorylatable βPix mutant (S340A) was used as substrate, no reactivity with the antibody was seen. These results demonstrate the ability of Pak1 to phosphorylate Ser 340 in vitro and validate the specificity of the phosphospecific antibody.
To determine if Group I Paks can also phosphorylate the analogous residue in αPix (Ser 488), we expressed and purified a truncated form of αPix (amino acids 155-546) in E. coli comprising either the wild type sequence or with Ser 488 mutated to alanine (S488A). Phosphate incorporation into αPix in vitro with either Pak1 (not shown) or Pak2 (Fig. 8B) was dramatically reduced in the S488A mutant, suggesting that Ser 488 in αPix is also directly targeted by Group I Paks. A similar experiment using full-length wild type or S340A βPix as a substrate revealed an ~55% reduction in radiolabeled phosphate incorporation in the mutant relative to wild type (data not shown).
Phosphoproteomic studies suggest that Ser 340 (βPix) and Ser 488 (αPix) are indeed phosphorylated under physiological conditions (58-60). To test if Pak1 can mediate phosphorylation of this site in living cells, HEK293 cells were transfected with Myc-tagged βPix along with constitutively active (T423E) or dominant negative (K299R) forms of Pak1, and phosphorylation of βPix was monitored by Western analysis of anti-Myc immunoprecipitates using the phosphospecific Pix antibody (Fig. 8C).

DISCUSSION
Pak Substrate Specificity-Despite the fact that Pak1 and Pak2 are the most divergent members of the Group I Pak family, we find that they share essentially indistinguishable substrate specificity. Based on sequence homology, we anticipate Pak3 to behave similarly, suggesting the possibility of functional redundancy among Group I Paks. Significant differences were found, however, in the substrate specificities of Pak1/2 and the Group II member Pak4 (see Figs. 2-4). Whereas Pak1 and Pak2 preferred large hydrophobic residues in positions from +1 to +3, Pak4 showed unusually strong and markedly different specificity at the +2- and +3-positions. Indeed, serine at the +3-position was the second most strongly selected residue for Pak4. In addition, alanine was the most preferred residue at the +2-position. We could attribute at least part of the distinct substrate preferences of Pak1/2 and Pak4 to two residues in the small lobe of the Paks (Gln 358 and Arg 359 in Pak4; Fig. 6). These residues are absolutely conserved within each Pak group, suggesting that the remaining Group II Paks will probably share similar sequence preferences at the substrate +2- and +3-positions. To our knowledge, this is the first time that residues in the small lobe of a kinase have been experimentally linked to substrate selectivity.
These results are consistent with the possibility that Group I and II Paks may phosphorylate partially distinct sets of substrates and, therefore, fulfill distinct biological functions. Nevertheless, the few substrates of Group II Paks currently known are also phosphorylated by Group I Paks. Thus, the identification of additional Group II Pak substrates will be required to determine whether the peptide substrate specificity differences revealed here are exploited physiologically.
In addition to the distinguishing features described above, Group I and II Paks also share some common substrate preferences. Paks are unusual among the basophilic kinases in exhibiting a predominant preference for arginine at the −2-position rather than at the −3-position (33). Shaw and co-workers (33) recently proposed two acidic residues termed PEM+1 and YEM+1 as critical for selection of arginine at the −2-position. Consistent with this hypothesis, all six Paks conserve Asp and Glu at these positions, respectively, and our structural comparison with PKA places these residues in Pak at similar positions relative to the substrate arginine as in PKA (not shown). Similarly, the shallow, hydrophobic +1-binding pocket in PKA is also largely conserved in both Group I and II Paks (composed in Pak4 of Pro 479, Met 482, Leu 475, and Met 524) and probably explains the modest selectivity for hydrophobic residues at the +1 substrate position by both Pak groups.
Based on the analysis of crystal structures of kinases bound to substrates, Zhu et al. proposed that two conserved amino acid residues known as the "toggle residue" and the "toggle-regulating residue" mediate proline deselection at the +1 substrate position by AGC and calmodulin-dependent protein kinases (44). As for several members of these kinase families, the toggle residue in both Group I and II Paks is glycine, and the toggle-regulating residue is methionine. This observation is consistent with the hypothesis that these residues prevent phosphorylation of substrates with proline at +1 and that this deselection evolved to confer a reciprocal substrate specificity in AGC, calmodulin-dependent protein kinases, and Ste20-related kinases from proline-directed CMGC kinases (44).
We recently reported crystal structures of the kinase domains of all three Group II Paks (Pak4 to -6) (37). Comparison of these structures with that of the kinase domain of Pak1 (49) revealed significant differences between Group I and II Paks in the rearrangements of the αC helix that most likely accompany catalysis. The different conformers analyzed in six high resolution structures of active Group II Paks revealed rearrangements of helix αC that result in an additional helical turn at the αC N terminus and a distortion of its C terminus. Interestingly, the result of this structural rearrangement is a swinging movement of Arg 359 in Pak4 as it becomes incorporated into the αC helix extension. In this conformation, Arg 359 is turned away from the substrate-binding site, forming a network of hydrogen bonds with the glycine-rich loop. These interactions stabilize a closed conformation of the glycine-rich loop, which would not be compatible with ATP binding. In another conformation, Arg 359 is oriented toward the activation loop and putative substrate-binding region, presumably contributing to the substrate selectivity described here. Strikingly, a proline residue is absolutely conserved at the position analogous to 358 in Group I Paks, which prevents the helical extension observed in Group II Paks. Indeed, we have found that the Q358P/R359K mutant of Pak4 exhibits an elevated Km for ATP. 6 Additional studies will be required to determine if these amino acid residues facilitate coordination of substrate binding and catalysis. Taken together, the specificity data and structural comparisons described here provide a framework for understanding Pak kinase substrate selectivity.
Identification of Novel Pak Phosphorylation Sites-Data base mining for proteins best meeting kinase sequence specificity has been used to identify new kinase substrates. However, many variables can influence substrate phosphorylation other than amino acid sequence, such as accessibility of the target sequence to the kinase or the presence or absence of "docking sites" distinct from the phosphorylated sequence that may contribute to kinase-substrate interactions. Indeed, we observed that established Pak kinase substrates did not rank among the human proteins most closely matching the Pak1 or Pak2 specificity matrices. Thus, these other specificity-determining mechanisms may play a larger role in Pak substrate selection than in some other kinases. Nevertheless, the specificity data are predictive of Pak1 phosphorylation sites in proteins known or expected to be Pak substrates (Fig. 7). Here we outline a strategy for using the substrate specificity data to identify new Pak1 phosphorylation sites in proteins already known or suspected to be phosphorylated by Pak1.
First, using the Pak PSSM determined here and the World Wide Web-based Scansite tool, all serine residues within the putative substrate are scored and ranked. A threshold score is chosen based on the expected fraction of true positives and false positives as outlined in Fig. 7C to reduce the number of candidate serines. Additional information can then be applied in order to prioritize these candidates for direct testing, such as 1) predicted or actual surface accessibility of each serine, 2) evolutionary conservation of the serine, 3) its location within functionally important domains, or 4) known sites of serine phosphorylation from phosphoproteomic studies.
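The scoring and thresholding steps of this strategy can be sketched in a few lines. This is an illustration only and does not reproduce the Scansite algorithm or the Pak1 PSSM determined here: the matrix weights, the sequence, the function name, and the scoring rule (negative sum of log2 weights, so that lower scores indicate better matches) are all hypothetical. The toy weights merely mimic the qualitative preferences described above (arginine at −2, hydrophobic residues at +1 and +2, proline disfavored at +1).

```python
import math

# Hypothetical weights keyed by position relative to the serine.
# Residues not listed receive a neutral weight of 1.0.
TOY_PSSM = {
    -2: {"R": 4.0},
    1: {"V": 2.0, "I": 2.0, "L": 2.0, "P": 0.25},
    2: {"V": 2.0, "I": 2.0, "L": 2.0},
}


def rank_serines(sequence, pssm, threshold=None):
    """Score every serine and return (1-based position, score) pairs,
    best (lowest) score first; optionally keep only scores <= threshold."""
    ranked = []
    for i, residue in enumerate(sequence):
        if residue != "S":
            continue
        score = 0.0
        for offset, weights in pssm.items():
            j = i + offset
            if 0 <= j < len(sequence):  # skip positions beyond the termini
                score -= math.log2(weights.get(sequence[j], 1.0))
        if threshold is None or score <= threshold:
            ranked.append((i + 1, score))
    ranked.sort(key=lambda item: item[1])
    return ranked


# Hypothetical sequence, for illustration only.
toy_sequence = "GGRASVLGGSPGG"
all_serines = rank_serines(toy_sequence, TOY_PSSM)
candidates = rank_serines(toy_sequence, TOY_PSSM, threshold=0.0)
```

In this toy example, the serine at position 5 (Arg at −2, Val at +1, Leu at +2) ranks first, and the serine followed by proline falls above the threshold. The surviving candidates would then be prioritized using the additional criteria listed above, such as surface accessibility and conservation.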
The Pak1 phosphorylation site in αPix and βPix identified here falls within a surface-exposed loop between β3 and β4 strands of the pleckstrin homology domain of Pix proteins. Intriguingly, the analogous loop in the pleckstrin homology domain of the guanine nucleotide exchange factor Dbs directly contacts its substrate, Cdc42, promoting guanine nucleotide exchange (61). We are currently investigating whether this region of Pix plays a similar role in regulating Pix guanine nucleotide exchange factor activity.