An Oriented Peptide Array Library (OPAL) Strategy to Study Protein-Protein Interactions

One of the major questions in signal transduction is how the specificities of protein-protein interactions determine the assembly of distinct signaling complexes in response to stimuli. Several peptide library methods have been developed and widely used to study protein-protein interactions. These approaches primarily rely on peptide or DNA sequencing to identify the peptide or consensus motif for binding and may prove too costly or difficult to accommodate high throughput applications. We report here an oriented peptide array library (OPAL) approach that should facilitate high throughput proteomic analysis of protein-protein interactions. OPAL integrates the principles of both oriented peptide libraries and array technologies. Hundreds of pools of oriented peptide libraries are synthesized as amino acid scan arrays. We demonstrate that these arrays can be used to map the specificities of a variety of interactions, including those of antibodies, protein domains such as Src homology 2 domains, and protein kinases.

Signaling proteins are often composed of distinct protein domains or modules. Some of these serve as protein-protein interaction units, whereas others carry out enzymatic reactions (1). The elucidation of the genetic codes of various organisms, including humans, has presented an increasing number of protein domain families with unknown functions.
To understand how signaling proteins interact with each other and maintain the fidelity of signal transduction, it is critical to determine the specificities of these protein domains. Protein domains (such as SH2, SH3, and phosphotyrosine binding (PTB) domains) often mediate protein-protein interactions by recognizing short stretches of peptide sequences on target proteins (2,3). In the past several years, many combinatorial peptide library-based approaches have been developed to analyze the binding or substrate specificities of protein domains (reviewed in Ref. 4). These peptide libraries can be classified as either display or pool peptide libraries.
The display libraries employ a variety of matrices for individual peptide presentation, including pins (5), phage (6-8), beads (9), DNA binding proteins (10), or ribosomes (11). Following multiple rounds of affinity enrichment, the identities of the selected peptides can be ascertained by DNA or peptide sequencing (4). Display peptide libraries generally require isolation and sequencing of individual peptides to decode a binding motif, which is time-consuming and costly and limits them to low throughput applications.
In contrast, the pool peptide libraries rely on the comparison of binding affinities of different pools of random peptides to obtain binding consensus motifs (12). Thus, the sequences of individual bound peptides do not need to be resolved. A few years ago, we developed an oriented peptide library method to further simplify and facilitate screening (13). In this method, soluble pools of random peptides are oriented via a central "fixed" amino acid, which reduces the degeneracy of the library and prevents possible phase problems arising from interacting residues being located at different positions on different peptides. Bound peptides or those modified by the protein of interest are then enriched and sequenced as a mixture to deconvolute the consensus motif in a statistical manner. The soluble oriented peptide library approach still depends on peptide sequencing, and a large amount of purified protein (~100 μg) is generally needed for the assay. In addition, the occasional incomplete cleavage of peptides during Edman degradation sequencing can result in a carryover of amino acids from one sequencing cycle to the next, thereby complicating the results.
We report here an oriented peptide array library (OPAL) approach in which hundreds of pools of oriented peptide libraries are synthesized and arranged as scan arrays. We showed that OPAL could be used as a high throughput method to study the specificities of protein domains.
Peptide Library Synthesis-The peptide libraries were synthesized on cellulose membrane using the ASP222 SPOT robot (16). An extra fine needle tip was used to achieve a density of ~1600 peptides per SPOT membrane (8 × 12 cm). The estimated yield of peptide at each spot was ~5 nmol. The following oriented peptide library arrays were synthesized for phosphorylation-dependent interactions: AXXXX [ where X is a mixture of 19 amino acids (except Cys), pS is phosphoserine, pY is phosphotyrosine, pT is phosphothreonine, and the brackets ([/]) encase the amino acids that were preferred by the protein of interest. To generate oriented peptide library pools, each degenerate position was scanned with each of the 19 amino acids (excluding Cys).
A library with the sequence AX1X1X1X1X1[S/T]X1X1X1X1A was synthesized for protein kinases, where X1 represents any amino acid except Ser, Thr, and Cys. Ser and Thr are excluded at the degenerate X1 positions; thus, only the fixed Ser/Thr can be phosphorylated. To generate oriented peptide library pools, each degenerate position was substituted with each of the 20 amino acids.
OPAL Analysis of Antibody and SH2 Domain Specificities-All steps were carried out at room temperature unless otherwise specified. The OPAL membrane was first blocked with 5% bovine serum albumin in TBST (0.1 M Tris-HCl (pH 7.4), 150 mM NaCl, and 0.1% Tween 20) for 2 h. GST fusion proteins (3 μg) were labeled with GSH-horseradish peroxidase (Sigma) (0.6 μg) for 30 min and then added to the array membrane at a final concentration of 1 μg/ml for 30 min (17). The array membrane was subsequently washed three times with TBST for 10 min, and peptide spots that bound the domain of interest were visualized by enhanced chemiluminescence.
To determine the specificities of antibodies, monoclonal antibodies MPM2 (1 μg/ml) (18), 3F3/2 (10 μg/ml) (19), and PY20 (1 μg/ml, Santa Cruz Biotechnology) were first incubated with OPAL membranes for 30 min at room temperature. The membranes were washed three times with TBST for 15 min and probed with anti-mouse horseradish peroxidase, and bound peptide spots were visualized by enhanced chemiluminescence.
Kinase Assays-Kinase assays on membranes were performed as described (20). Briefly, the OPAL membrane was blocked overnight in kinase buffer (62.5 mM Mes, pH 6.5, 20 mM MgAc2, 100 mM NaCl, 1 mM dithiothreitol, and 3% bovine serum albumin). The next day, 2 μg of crude protein kinase A (Sigma) in kinase buffer containing 25 μM ATP and 20 μCi of [γ-32P]ATP was added to the OPAL membrane and incubated for 2 h at 30°C. The membrane was then washed five times with 1 M NaCl and ten times with 5% phosphoric acid. 32P-labeled peptide spots were visualized and quantitated with a PhosphorImager (Amersham Biosciences). His6-Cdc15 was prepared in insect cells and purified using nickel-nitrilotriacetic acid resin (Qiagen).
Data Base Search-A weighted data matrix was first generated based on the OPAL analysis result. A data base search for PKA substrates was then performed on Scansite (21).

RESULTS AND DISCUSSION
The Oriented Peptide Array Library Strategy-The OPAL method involves synthesis and printing of oriented peptide libraries as spotted arrays on solid supports (Fig. 1). OPAL can be synthesized on solid phase and then cleaved to be spotted on array surfaces such as gold or glass slides. Alternatively, OPAL can be synthesized directly onto solid supports such as SPOT membranes (16). For example, an oriented library AXXXX[pS/pT]XXXXA was designed for proteins that recognize phosphoserine or phosphothreonine (Fig. 2A) in which pS and pT are fixed while X is random (any natural amino acid except for Cys). Because there is no pS or pT in the random positions, pS/pT becomes the functional interaction center. The random positions are thus oriented with regard to their distance to the fixed pS/pT. The relative importance of different amino acids at any given position can then be addressed by using a scan library strategy (12,22). For instance, for Z in ZXXX(pS/pT)XXXX (i.e. position P − 4, N-terminal to the pS/pT), 19 pools of libraries were synthesized. Each pool had one of the 19 amino acids (except Cys) fixed at position Z and was spotted as a single spot on the array. As a result, each pool/spot can, in theory, contain a half-billion oriented peptides. A total of 152 (19 amino acids × 8 positions) pools of libraries were similarly spotted for the AXXXX[pS/pT]XXXXA library. The OPAL array essentially scans for the amino acids on the x axis and for the positions on the y axis. If a library pool at any given spot is preferred by the protein of interest, this mixture will be detected by antibody blotting and show up as a positive spot. Consequently, the specificity of a protein domain can be determined by reading the library array without complicated data manipulation.
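The scan layout described above can be sketched in a few lines of Python. This is only an illustration of the bookkeeping (which residue is fixed at which position in each spot); the function and variable names are our own assumptions and are not part of the synthesis protocol.

```python
# Illustrative sketch of the OPAL scan layout for AXXXX[pS/pT]XXXXA.
# All names are hypothetical; only the library design comes from the text.

AMINO_ACIDS_19 = "ADEFGHIKLMNPQRSTVWY"  # the 20 natural amino acids minus Cys


def scan_pools(n_before=4, n_after=4, center="[pS/pT]",
               fixed_choices=AMINO_ACIDS_19):
    """Return {(position, amino_acid): pool_sequence} for every spot.

    Positions are numbered relative to the fixed center:
    -n_before..-1 on the N-terminal side, +1..+n_after on the C-terminal side.
    """
    positions = list(range(-n_before, 0)) + list(range(1, n_after + 1))
    pools = {}
    for pos in positions:
        for aa in fixed_choices:
            left = ["X"] * n_before
            right = ["X"] * n_after
            if pos < 0:
                left[n_before + pos] = aa   # e.g. pos = -4 -> index 0
            else:
                right[pos - 1] = aa         # e.g. pos = +2 -> index 1
            pools[(pos, aa)] = (
                "A" + "".join(left) + center + "".join(right) + "A"
            )
    return pools


pools = scan_pools()
print(len(pools))        # 152
print(pools[(-4, "R")])  # ARXXX[pS/pT]XXXXA
print(pools[(2, "N")])   # AXXXX[pS/pT]XNXXA
```

Each dictionary key corresponds to one spot: the position gives the row (y axis) and the fixed amino acid gives the column (x axis) of the array.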
Epitope Mapping of Phospho-specific Antibodies Using OPAL-The OPAL approach was first tested to determine the epitopes of phospho-specific antibodies such as MPM2. MPM2 has been shown to react with phosphoproteins in the M phase of the eukaryotic cell cycle (18 (23,24).
We synthesized a pS/pT fixed OPAL (AXXXX[pS/pT]XXXXA, Fig. 2A) directly on SPOT membranes with a density (~20 library pools/cm2) four times that of conventional SPOT arrays. The MPM2 antibody was incubated with this OPAL, and the MPM2-bound library spots were then visualized by enhanced chemiluminescence. As shown in Fig. 2B, stronger binding was observed at the P − 1 and P + 2 positions, where hydrophobic amino acids (mainly Ile, Leu, Val, and Phe) were selected. Hydrophobic amino acids were also preferred (but more weakly) by MPM2 at the P − 2 and P + 3 positions. At the P + 1 position, we found that Gly was also selected in addition to Pro, which was previously known to be there. These findings are in good agreement with published data (23,24) and indicate that the OPAL approach can be used to analyze the binding specificity of antibodies.
We went on to study the epitopes of two additional phosphospecific antibodies, 3F3/2 and PY20 (Fig. 2, C and D). Monoclonal antibody 3F3/2 was originally developed against Xenopus egg extracts that had been supplemented with ATPγS (25). It was later found to recognize an unknown kinetochore phospho-epitope (19 (Fig. 2C). This specificity is consistent with a known site recognized by 3F3/2 on human topoisomerase II (DFDEKTDDED) (26). Furthermore, the OPAL method can be applied to examine a phosphotyrosine-specific antibody. As shown in Fig. 2D, the monoclonal PY20 antibody recognized almost all the spots equally well except for Asp/Glu at positions P + 1 and P + 3, implying a broad specificity of this antibody. These results further validated the OPAL method.
Determining the Binding Specificities of Protein Domains-A large number of both known and new domains that have been identified through the genome projects await the elucidation of their function and targets. Conventional peptide and oriented peptide library techniques have been used extensively to map the binding specificities of protein-protein interaction domains. In many cases, this has yielded novel insights into signaling interactions through data base searches for sequence motifs. In comparison, the OPAL technique holds a clear advantage over the conventional methods, as it requires no peptide sequencing or data manipulation. We decided to verify this with SH2 domains. SH2 domains are a large family of phosphotyrosine (pY)-dependent interaction modules that have been extensively studied.
We started with the SH2 domain of the adaptor signaling molecule GRB2, because its specificity is known (27). As predicted, OPAL analysis indicated that the GRB2 SH2 domain recognized primarily Asn at the P + 2 position with little or no selectivity for residues N-terminal to the pY (Fig. 3A). Ile and Val or Leu residues were also selected at the P + 1 and P + 3 positions. This motif was determined in a single scan and is in perfect agreement with previous analyses of the GRB2 SH2 domain (13).
FIG. 5 legend (fragment). ... Fig. 4A were transformed into bar graphs to indicate the relative importance of amino acids at each position N- and C-terminal to the fixed Ser/Thr residue. The total values were normalized to 20 (number of possible natural amino acids) such that a value of <1 would denote negative selection.
We then went on to study three additional SH2 domains, namely SHP-2 C-terminal SH2 (28,29), GRB7, and GRB10 (14). SHP-2, a phosphatase containing two SH2 domains, is an important regulator of cell proliferation and development (30). The N-terminal SH2 domain has been shown to select a pY[IV]X[FIV] motif (13), but the preference of the C-terminal SH2 domain (CSH2) has not been systematically investigated. As shown in Fig. 3B, the SHP-2 CSH2 domain selected hydrophobic amino acids at the P + 1, P + 2, and P + 3 positions, similar to previous results (30). Interestingly, our OPAL analysis showed that hydrophobic amino acids (particularly Ile, Leu, and Val) were also strongly preferred at the P − 3, P − 2, and P − 1 positions. Thus, the consensus motif for the SHP-2 CSH2 domain is [ILV][ILV][ILVFTY]pY[TILV][IL][ILVP]. This result suggests that the Ile, Leu, and Val residues flanking the phosphotyrosine may be necessary for SHP-2 CSH2 domain binding. It further offers an explanation for the confusion regarding the known in vivo binding sites of SHP-2, where Ile, Leu, and Val have been shown to be important for binding at either the P − 2, P + 2, or P + 3 positions (30).
GRB7 and GRB10 are two closely related members of the GRB7 family proteins (14). They have been shown to interact with many receptor protein tyrosine kinases and mediate a variety of signaling activities (31). However, the specificities of their SH2 domains are largely unknown. Examination of both the GRB7 and the GRB10 SH2 domains using the phosphotyrosine OPAL revealed that both domains strongly selected Asn at the P + 2 position and preferred hydrophobic residues at the P + 3 position (Fig. 3, C and D). Therefore, the GRB7 and GRB10 SH2 domains share the same specificity, recognizing the motif [FY]pY[ETYS]N[ILVPTYS]. It should be noted that the selection of Asn at P + 2 fits well with the known binding sites of these SH2 domains (31,32). The above studies demonstrate that OPAL is reliable at decoding protein-protein interaction domain specificities. Furthermore, it provides new insights into the function of domains with both known and unknown specificities.
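Consensus motifs written in the bracket notation used above can be applied to candidate sequences almost directly as regular expressions. The following is a minimal sketch, assuming peptides are written as plain strings with phosphotyrosine spelled "pY"; the helper names and the example peptides are hypothetical and serve only to show the matching logic. The "pYXN" pattern is a simplified rendering of the GRB2 preference for Asn at P + 2.

```python
import re


def motif_to_regex(motif):
    """Translate bracket-style motif notation into a compiled regex.

    Bracketed sets such as [ILV] and the literal 'pY' are already valid
    regex syntax; 'X' (any residue) is expanded to any uppercase letter.
    """
    return re.compile(motif.replace("X", "[A-Z]"))


def matches(regex, peptide):
    """True if the motif occurs anywhere in the peptide string."""
    return regex.search(peptide) is not None


# Motifs as reported in the text.
GRB7_GRB10 = motif_to_regex("[FY]pY[ETYS]N[ILVPTYS]")
SHP2_CSH2 = motif_to_regex("[ILV][ILV][ILVFTY]pY[TILV][IL][ILVP]")
GRB2_SIMPLIFIED = motif_to_regex("pYXN")  # simplification, for illustration

# Hypothetical peptides, not taken from any real protein.
print(matches(GRB7_GRB10, "AEFpYENVGK"))    # True
print(matches(GRB7_GRB10, "AEFpYEDVGK"))    # False (Asp instead of Asn at P + 2)
print(matches(SHP2_CSH2, "GLIVpYTILG"))     # True
print(matches(GRB2_SIMPLIFIED, "GSpYVNQA")) # True
```

A match of this kind is all-or-nothing and ignores the graded preferences visible on the array, so it is best regarded as a first filter rather than a quantitative prediction.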
OPAL as a Tool to Study Substrate Specificities of Protein Kinases-The target specificities of protein kinase domains play a critical part in directing the flow of signal cascades. We therefore went on to investigate whether OPAL can be applied to the study of protein kinase specificity. For this purpose, an OPAL array with the sequence AX1X1X1X1X1[S/T]X1X1X1X1A was generated for protein Ser/Thr kinases, where Ser and Thr are invariant and X1 is random (any natural or modified amino acid except for Ser, Thr, and Cys) (Fig. 4). Each random position was in turn substituted by one of the 20 natural amino acids, including Ser and Thr, and thereby became fixed. At any given spot, if a library pool was preferred by the kinase of interest, this mixture would be labeled in the presence of [γ-32P]ATP. The amount of 32P incorporation should then reflect the preference of the kinase for the fixed amino acid at this position. An added advantage of this design is the inclusion of Ser and Thr among the "scanned" amino acids, which should allow us to determine whether a kinase prefers Ser or Thr.
We first examined the kinase domain specificity of PKA using the array. As shown in Fig. 4A, positions relative to Ser/Thr are indicated on the y axis. The amino acid fixed for each spot is indicated on the top x axis. At the P − 2 and P − 3 positions, Arg is preferred by PKA. At the P + 1 position, hydrophobic amino acids are generally favored by PKA, with Ile being the most favored. PKA also preferred Ser over Thr (see positions −1 to +4). These results correspond perfectly with the known specificity of PKA (RRXSI) and yield a clearer picture regarding PKA specificity than was previously obtained using the soluble oriented peptide library approach (33).
Interestingly, the OPAL approach also identified novel sequence preferences of PKA. At the P + 2 site we found that PKA phosphorylated peptides with Cys and Asp 1.9- and 2.7-fold better, respectively, than those with other amino acids (e.g. Gly). To confirm this finding, several peptides that differed only at the P + 2 residue (ARRGSIGFI, ARRGSIDFI, and ARRGSICFI) were synthesized based on the deduced PKA substrate specificity (Fig. 4A). Kinetic analyses demonstrated that the Kemptide-like peptide was phosphorylated by PKA with a Km of ~23 μM (Table I). This value is similar to previously reported data using a LRRASLG peptide as the substrate (34). In comparison, substitution of the P + 2 residue with Asp or Cys lowered the Km value to 8.8 and 9.1 μM, respectively, while improving the overall kcat by 3.5- and 2-fold. These data are in good agreement with the array analysis result. Our analyses underscored, therefore, the simplicity and power of the OPAL approach to mapping the specificities of protein kinases.
We also extended the OPAL analysis of protein kinase specificity to budding yeast Cdc15, a kinase whose specificity had not previously been studied using conventional peptide library approaches. Cdc15 functions in the mitotic exit network (MEN) and acts directly upon another protein kinase, Dbf2, and its ortholog, Dbf20 (35). In contrast to PKA, little or no selectivity was observed for amino acids at the P − 5 to P − 1 positions or at the P + 1 position (Fig. 4B). However, arginine and lysine residues were strongly selected at positions P + 2, P + 3, and P + 4 (Fig. 4B). Interestingly, the only known targets of Cdc15 (35), Thr-544 in Dbf2 and Thr-536 in Dbf20, are both compatible with the specificity determined by OPAL (TFRHR, where Thr is phosphorylated).
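Returning to the PKA kinetics quoted above, the combined effect of a lower Km and a higher kcat can be summarized as a relative catalytic efficiency (kcat/Km). The short calculation below is our own back-of-the-envelope sketch based only on the values given in the text; the derived fold changes are not reported in the paper, and the function name is hypothetical.

```python
def relative_efficiency(km_ref, km_variant, kcat_fold):
    """Fold change in kcat/Km of a variant relative to the reference peptide.

    (kcat_var / Km_var) / (kcat_ref / Km_ref) = kcat_fold * Km_ref / Km_var.
    Km values only need to share the same unit, because the unit cancels.
    """
    return kcat_fold * km_ref / km_variant


KM_REFERENCE = 23.0  # approximate Km of the reference peptide, from the text

asp = relative_efficiency(KM_REFERENCE, km_variant=8.8, kcat_fold=3.5)
cys = relative_efficiency(KM_REFERENCE, km_variant=9.1, kcat_fold=2.0)

print(f"P+2 Asp: {asp:.1f}-fold")  # P+2 Asp: 9.1-fold
print(f"P+2 Cys: {cys:.1f}-fold")  # P+2 Cys: 5.1-fold
```

Under these assumptions the Asp and Cys substitutions would improve catalytic efficiency by roughly 9- and 5-fold, respectively, which is larger than the 2.7- and 1.9-fold differences seen on the array. Because the reference Km is itself approximate, these figures should be read as rough estimates.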
Prediction of in Vivo Target Sites of Protein Kinases-In the era of genome-wide studies, the identification of recognition motifs should more readily allow the prediction of candidate targets. The OPAL array from Fig. 4A was quantitated using a PhosphorImager, and the data were plotted into bar graphs to compare the relative importance of amino acids at each scanned position (Fig. 5). These graphs again highlighted the requirements for Arg residues at positions P − 3 and P − 2 and hydrophobic amino acids at P + 1 for PKA substrates. Because the amount of 32P incorporation at each OPAL spot should correlate with the degree to which a given amino acid was preferred by the kinase examined, we reasoned that the relative values in Fig. 5 could be used as a Scansite matrix to search for potential in vivo targets of PKA. Indeed, the resulting matrix successfully predicted PKA substrates in the Swiss-Prot data base (Table II). Among the top 25 candidates, 11 are known substrates of PKA, whereas three others have been shown to be regulated by the cAMP/PKA pathway. This result demonstrates that OPAL analyses can be coupled with the Scansite program to predict in vivo substrates of protein kinases.
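The step from quantitated spots to a searchable matrix can be illustrated as follows. This is a minimal sketch under stated assumptions: the spot intensities are invented, only three positions are modeled, and the log-ratio score is a generic position weight matrix score rather than the actual Scansite algorithm. The normalization follows the convention described for the bar graphs, in which the values at each position sum to 20 so that 1 is the no-preference level.

```python
import math

AA20 = "ACDEFGHIKLMNPQRSTVWY"


def normalize_row(intensities, total=20.0):
    """Scale the intensities at one position so that they sum to `total`.

    With 20 scanned amino acids, a value above 1 indicates positive
    selection and a value below 1 indicates negative selection.
    """
    row_sum = sum(intensities.values())
    return {aa: total * value / row_sum for aa, value in intensities.items()}


def toy_row(preferred, strong=10.0, weak=1.0):
    """Invented spot intensities: `strong` for preferred residues, else `weak`."""
    return {aa: (strong if aa in preferred else weak) for aa in AA20}


# Hypothetical matrix, loosely shaped like the PKA preferences in the text
# (Arg at P-3 and P-2, Ile at P+1). Real data would cover every position.
matrix = {
    -3: normalize_row(toy_row("R")),
    -2: normalize_row(toy_row("R")),
    +1: normalize_row(toy_row("I")),
}


def score_site(matrix, window, center_index):
    """Sum of log2 normalized weights over the positions present in `matrix`."""
    score = 0.0
    for i, aa in enumerate(window):
        pos = i - center_index
        if pos in matrix:
            score += math.log2(matrix[pos][aa])
    return score


good = score_site(matrix, "RRGSI", center_index=3)
poor = score_site(matrix, "GGGSG", center_index=3)
print(good > 0 > poor)  # True
```

Sliding such a window over every Ser/Thr in a protein sequence and ranking the scores gives a simple candidate list, which is the general idea behind coupling the array data to a data base search.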
Perspective-Traditional SPOT and pool peptide library approaches may encounter the phasing problem. For a pentapeptide library (XXXXX), positions 1 to 5 can be scanned with any one of the 20 amino acids to make 100 (5 positions × 20 amino acids) library pools (36). In this library, if a DR motif is preferred by the protein, pool libraries such as DRXXX, XDRXX, XXDRX, or XXXDR can all be positively selected. The recognition motif will then become difficult to interpret. Similarly, screening of pool libraries has employed an iterative approach in which multiple rounds of library synthesis and screening are required (12,20,37). For example, sub-libraries (XXX12XXX) that contain two defined amino acids at positions 1 and 2 flanked by degenerate residues X (a mixture of 20 amino acids) are synthesized as arrays on cellulose. The best amino acids at positions 1 and 2 (B1B2) from the first round of screening are then incorporated into the next library synthesis (e.g. XXXB1B2 12X). In contrast, peptide arrays of OPAL are oriented via a central amino acid for interactions. This not only allows convenient data interpretation but also generates a weighted data set that can be used to search for potential targets in the data base.
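The phasing problem described above can be made concrete with a small calculation. The sketch below computes, for an unoriented pentapeptide position scan, the exact fraction of peptides in each pool that contain a DR motif anywhere in the sequence. It assumes equal representation of all 20 amino acids at random positions and treats "contains DR" as a stand-in for binding; both are simplifying assumptions of ours, and all names are hypothetical.

```python
from itertools import product

# 'o' lumps together the 18 amino acids that are neither D nor R.
LETTER_WEIGHT = {"D": 1, "R": 1, "o": 18}


def motif_fraction(fixed_pos, fixed_letter, length=5, motif="DR"):
    """Exact fraction of peptides in one pool that contain `motif`.

    The pool has `fixed_letter` at index `fixed_pos` (0-based) and an
    equimolar mixture of 20 amino acids at every other position.
    """
    free = [i for i in range(length) if i != fixed_pos]
    hit = total = 0
    for combo in product(LETTER_WEIGHT, repeat=len(free)):
        seq = [None] * length
        seq[fixed_pos] = fixed_letter
        weight = 1
        for i, letter in zip(free, combo):
            seq[i] = letter
            weight *= LETTER_WEIGHT[letter]
        total += weight
        if motif in "".join(seq):
            hit += weight
    return hit / total


enriched = []
for letter in "DR":
    for pos in range(5):
        background = motif_fraction(pos, "o")  # unrelated residue fixed here
        if motif_fraction(pos, letter) > 5 * background:
            enriched.append((pos + 1, letter))  # report 1-based positions

print(enriched)
# [(1, 'D'), (2, 'D'), (3, 'D'), (4, 'D'), (2, 'R'), (3, 'R'), (4, 'R'), (5, 'R')]
```

Eight pools spread over all five rows are enriched, so the row in which a spot appears no longer indicates where the residue sits relative to the binding center. Fixing a central residue, as in OPAL, removes this ambiguity because every scanned position is defined by its distance from that residue.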
In this report, we have demonstrated that the OPAL method is a highly efficient and informative tool for studying protein-protein interactions. Compared with conventional peptide library methods, several advantages of OPAL analysis stand out. For example, data interpretation becomes easier as the need for peptide sequencing is eliminated and the results are "read" directly from the array. The amount of protein required for analysis is also far less than that needed for soluble oriented libraries. In particular, if peptides are printed as microarrays on glass slides, the materials used will be minimal. For analyzing protein module binding, the arrays can be reused multiple times. Furthermore, by improving synthesis conditions, we were able to synthesize a large number of arrays within a short time (>8,000 peptide pools (12-15-mer) in a week). Therefore, it should be easier to adopt OPAL for high throughput screening of binding or enzymatic specificities of protein domains in the human genome.