Characterization of Phosphopeptide Motifs Specific for the Src Homology 2 Domains of Signal Transducer and Activator of Transcription 1 (STAT1) and STAT3

Signal transducers and activators of transcription (STAT) 1 and STAT3 are activated by overlapping but distinct sets of cytokines. STATs are recruited to the different cytokine receptors through their Src homology (SH) 2 domains, which make highly specific interactions with phosphotyrosine-docking sites on the receptors. We used a degenerate phosphopeptide library synthesized on 35-μm TentaGel beads and fluorescence-activated bead sorting to determine the sequence specificity of the peptide-binding sites of the SH2 domains of STAT1 and STAT3. The large bead library allowed peptide sequencing not only of pools of beads but also of single beads. The method was validated through surface plasmon resonance measurements of the affinities of different peptides for the STAT SH2 domains. Furthermore, when selected peptides were attached to a truncated erythropoietin receptor and stably expressed in DA3 cells, activation of STAT1 or STAT3 could be achieved by stimulation with erythropoietin. The combined analysis of pool sequencing, individual peptide sequences, and plasmon resonance measurements allowed the definition of SH2 domain binding motifs. STAT1 preferentially binds peptides with the motif phosphotyrosine-(aspartic acid/glutamic acid)-(proline/arginine)-(arginine/proline/glutamine), whereby a negatively charged amino acid at +1 excludes a proline at +2 and vice versa. STAT3 preferentially binds peptides with the motif phosphotyrosine-(basic or hydrophobic)-(proline or basic)-glutamine. For both STAT1 and STAT3, specific high affinity phosphopeptides were identified that can be used for the design of inhibitory molecules.
The signal transducers and activators of transcription (STATs) constitute a family of latent cytoplasmic transcription factors that are activated by a large number of cytokines, growth factors, and hormones. The binding of these extracellular signaling polypeptides to specific cell surface receptors typically results in receptor homo- or heterodimerization and subsequent activation of receptor-associated protein tyrosine kinases of the Jak family. Activated Jak kinases phosphorylate tyrosine residues in the intracellular domains of the receptors (1). STATs then bind with their SH2 domains to these receptor docking sites. The Jak kinases phosphorylate the STATs on a single tyrosine located carboxyl-terminal to the SH2 domain (2). Tyrosine phosphorylation of STATs is the decisive activation event, resulting in STAT dimer formation through mutual SH2 domain-phosphotyrosine interactions. STAT dimers translocate into the nucleus, bind to response elements in gene promoters, and enhance the transcription of these target genes (3-5).
Seven mammalian STAT genes have been identified in three chromosomal clusters (6). The different STAT proteins are activated by distinct cytokines and growth factors, and each STAT protein activates a distinct set of target genes (5,7). The specific coupling of the different STAT family members to cytokine receptors is crucial for the generation of diverse intracellular signals by different cytokines.
The molecular mechanisms and rules underlying the selective binding of STAT family members to cytokine receptors are only partly understood. Specificity is not controlled by the Jaks but rather by the ability of tyrosine-phosphorylated receptor complexes to recruit specific STATs through binding of their SH2 domains. Elegant experiments have demonstrated that such receptor "docking" sites can be transferred from one receptor to another and that such chimeric receptors then activate additional STAT family members (8). Conversely, swapping of STAT SH2 domains can change the receptors to which the chimeric STATs are recruited (9). Further experiments have confirmed that the highly specific phosphotyrosine-SH2 domain interaction is the basis for the specific activation of STATs by cytokines (10-13).
In recent years, most of the cytokine receptor docking sites for STATs have been identified by mutational analysis of intracellular receptor domains (reviewed in Ref. 5). However, attempts to define binding motifs specific for the different STAT proteins by analyzing several dozen such docking sites have had only modest success, because the number of known receptor docking sites is not large enough to define the peptide-binding motifs of the different STAT SH2 domains.
Analysis of crystal structures of phosphotyrosine peptides bound to SH2 domains has shown that amino acids carboxyl-terminal to the phosphorylated tyrosine make extensive contacts with the SH2 domain (14-17). The critical importance of the four to five carboxyl-terminal amino acids for the specificity of binding to the SH2 domain has been confirmed by experiments with phosphopeptide libraries (18-21). In these experiments, purified SH2 domains of several signaling molecules were incubated with libraries of degenerate phosphopeptides, and mixtures of high affinity binding peptides were isolated and sequenced. The results of such pool sequencing experiments can be used to define binding motifs specific for the different SH2 domains. The approach is very powerful for screening large numbers of different peptides, but it has an important caveat. Predicting optimal phosphopeptide motifs from pool data assumes that selection at each position is independent of the adjacent amino acids. However, adjacent amino acids may well influence each other strongly. In that case, pool sequencing approaches are prone to predicting incorrect optimal peptide motifs.
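To make this caveat concrete, the toy Python sketch below builds a position-by-position consensus from a small, invented set of +1 to +3 tripeptides in which the residues at +1 and +2 covary (an acidic +1 residue never occurs together with proline at +2). The data and the helper names (`position_consensus`, `pair_count`) are hypothetical and serve only as an illustration: treating positions independently yields a "best" tripeptide that never occurs among the selected sequences.

```python
from collections import Counter


def position_consensus(seqs):
    """Most frequent residue at each position, treating positions independently."""
    return "".join(
        Counter(s[i] for s in seqs).most_common(1)[0][0]
        for i in range(len(seqs[0]))
    )


def pair_count(seqs, i, j, a, b):
    """Number of sequences with residue a at position i AND residue b at position j."""
    return sum(1 for s in seqs if s[i] == a and s[j] == b)


# Invented +1/+2/+3 tripeptides: acidic residues at +1 pair only with R at +2,
# and P at +2 follows only non-acidic residues at +1.
selected = ["DRQ", "DRQ", "ERQ", "KPQ", "APQ", "LPQ", "VPQ"]

consensus = position_consensus(selected)
print(consensus)                             # DPQ
print(consensus in selected)                 # False
print(pair_count(selected, 0, 1, "D", "P"))  # 0
```

The per-position winners (D at +1, P at +2) are each frequent on their own, but the pair never co-occurs, which is exactly the situation in which pool sequencing can mislead.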
STAT SH2 domains have not yet been analyzed by phosphopeptide library experiments. We previously published (20) a phosphopeptide screening method using fluorescence-activated bead sorting. Here we have analyzed the binding properties of the STAT1 and STAT3 SH2 domains with an improved library approach using a phosphopeptide library synthesized on 35-μm TentaGel beads, which allowed single peptide sequencing as well as pool sequencing of high affinity peptides. The novel large bead library method was validated through systematic affinity measurements by surface plasmon resonance of selected library peptides. Furthermore, selected peptides were studied in vivo. They were attached to a truncated erythropoietin receptor (EpoR) that lacks intracellular phosphotyrosine-docking sites, and the chimeric receptors were stably expressed in DA3 cells, which have no endogenous EpoRs. Stimulation of transfected cells with erythropoietin (Epo) resulted in specific activation of STAT1 or STAT3.
Based on the combined analysis of single peptide and pool sequencing results, we predict optimal binding motifs for STAT1 and for STAT3. Furthermore, affinity measurements by surface plasmon resonance identified high affinity peptides specific for STAT1 or STAT3 that can be used as models for the rational design of specific inhibitory molecules.
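As a rough illustration of how the two motifs summarized in the abstract could be applied to candidate docking sites, the Python sketch below checks positions +1 to +3 of a peptide. It is one possible reading of the motifs, not the authors' definition. Phosphotyrosine is written as a lowercase `y` (a convention assumed here), the set of seven hydrophobic residues is an assumed grouping, and the STAT1 rule encodes the mutual exclusion by requiring arginine at +2 when +1 is acidic, and a non-acidic +1 residue when +2 is proline.

```python
ACIDIC = set("DE")
BASIC = set("RKH")
HYDROPHOBIC = set("AVLIMFW")  # assumed grouping of seven hydrophobic residues


def downstream(peptide, n=3):
    """Residues +1..+n after the phosphotyrosine 'y', or None if absent or too short."""
    i = peptide.find("y")
    if i < 0 or i + n >= len(peptide):
        return None
    return peptide[i + 1:i + 1 + n]


def matches_stat1(peptide):
    """pY-(D/E)-(P/R)-(R/P/Q), with D/E at +1 and P at +2 mutually exclusive."""
    site = downstream(peptide)
    if site is None:
        return False
    p1, p2, p3 = site
    if p3 not in "RPQ":
        return False
    if p1 in ACIDIC:
        return p2 == "R"  # an acidic +1 residue excludes proline at +2
    return p2 == "P"      # proline at +2 requires a non-acidic +1 residue


def matches_stat3(peptide):
    """pY-(basic or hydrophobic)-(P or basic)-Q."""
    site = downstream(peptide)
    if site is None:
        return False
    p1, p2, p3 = site
    return (p1 in BASIC | HYDROPHOBIC) and (p2 == "P" or p2 in BASIC) and p3 == "Q"


# Pool 1 (STAT1), a hypothetical R-at-+2 variant of it, and 3-Pool 1 (STAT3).
for pep in ["VDYKYyDPRHDL", "VDYKYyDRRHDL", "RRRRRyRRQYRR"]:
    print(pep, matches_stat1(pep), matches_stat3(pep))
# VDYKYyDPRHDL False False
# VDYKYyDRRHDL True False
# RRRRRyRRQYRR False True
```

Under this reading the Pool 1 peptide fails the STAT1 rule because it combines D at +1 with P at +2, and the 3-Pool 2 peptide VKYKDY*KPQYAY satisfies both rules, so a simple rule check of this kind does not by itself separate STAT1 from STAT3 sites.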

Subcloning, Expression, and Purification of STAT1, STAT1t, and STAT3
The portion of the human STAT1 gene encoding residues 135-712 was amplified by PCR (Vent polymerase; New England Biolabs). The 5′ primer introduced an NdeI restriction site, and the 3′ primer introduced the FLAG epitope followed by a stop codon and a BamHI restriction site. The product was cloned into the NdeI/BamHI restriction sites of the pET20b Escherichia coli expression vector (Novagen) and sequenced. Growth and induction of transformed E. coli (BL21(DE3)pLysS) were performed as outlined in the instruction manual. About 50% of the induced protein remained soluble and was subsequently isolated. Cells were collected by centrifugation (20 min; 4°C; 20,000 × g) and resuspended in ice-cold extraction buffer (100 ml/30 g of cells: 20 mM Hepes/HCl, 0.1 M KCl, 10% glycerol, 1 mM EDTA, 10 mM MnCl2, 20 mM DTT, 100 units/ml DNase I (Roche Molecular Biochemicals), Complete™ protease inhibitor tablets (Roche Molecular Biochemicals), pH 7.6). Cells were lysed by 3 cycles of freezing/thawing, and lysis was allowed to complete at 4°C with slow stirring for 1 h. The homogenate was centrifuged for 20 min at 21,000 rpm at 4°C. Polyethyleneimine (0.1%; Sigma) was added to the supernatant; the solution was mixed gently and centrifuged for 15 min at 15,000 rpm. NaCl (50 mM) was added to the supernatant containing the soluble STAT1t before it was applied to anti-FLAG M2 affinity gel beads and purified according to the manufacturer's instructions. The FLAG fusion protein was eluted competitively with 3 column volumes of FLAG peptide, 3 times for 1 h at 4°C on a rotating wheel. Because the FLAG peptide did not interfere with subsequent experiments, no further purification steps were necessary. The purified proteins were stored at −70°C. All buffers used during protein purification contained 2 mM DTT and were chilled, thoroughly degassed, and flushed with N2 before use.
The protein concentration of the purified STAT1t was determined using the Bio-Rad protein assay. The purity of purified STAT1t was analyzed by SDS-PAGE and Coomassie Brilliant Blue R-250 (Bio-Rad) staining.
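The centrifugation steps above are given partly in × g and partly in rpm. For readers converting between the two, the standard relation RCF = 1.118 × 10^-5 × r × rpm², with r the rotor radius in centimetres, can be sketched as below. The rotor radius used here (10 cm) is a hypothetical value, because the rotors used are not specified.

```python
import math


def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (x g) from rotor speed (rpm) and radius (cm)."""
    return 1.118e-5 * radius_cm * rpm ** 2


def rpm_from_rcf(rcf, radius_cm):
    """Rotor speed (rpm) needed to reach a given relative centrifugal force."""
    return math.sqrt(rcf / (1.118e-5 * radius_cm))


RADIUS_CM = 10.0  # hypothetical rotor radius; not given in the text
print(round(rcf_from_rpm(21_000, RADIUS_CM)))  # 49304
print(round(rpm_from_rcf(20_000, RADIUS_CM)))
```

With a different rotor the same rpm setting gives a proportionally different force, which is why quoting × g is generally more reproducible.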
Complementary DNAs encoding human full-length STAT1 and murine full-length STAT3 lacking their stop codons were amplified by PCR (Vent polymerase); an XhoI restriction site was introduced by the 5′ primer, and the FLAG epitope followed by a stop codon and a PacI site were introduced by the 3′ primer. The products were sequenced, cloned into the XhoI/PacI restriction sites of the baculovirus transfer vector pBacPAK8 (Clontech), and co-transfected into Sf21 insect cells using the Clontech BacPAK6 DNA kit as outlined in the instruction manual. Single plaques were picked and checked for proper STAT1 and STAT3 recombinant virus expression by PCR and by Western blotting with an anti-FLAG antibody. Positive clones were further amplified in TN5 insect cells for subsequent expression and purification. TN5 insect cells in suspension culture (500,000 cells/ml) were infected with recombinant virus at a multiplicity of infection of 1 and harvested by centrifugation (1500 × g, 15 min) 32 h (STAT1) or 48 h (STAT3) post-infection. The cells were lysed in ice-cold extraction buffer (10 ml/1 × 10^8 cells; 20 mM Mes, 100 mM KCl, 10 mM NaF, 10 mM Na2HPO4/NaH2PO4, pH 7.0, 0.02% NaN3, 4 mM EDTA, 1 mM EGTA, 20 mM DTT, Complete protease inhibitors (Roche Molecular Biochemicals), pH adjusted to 7.0 with Tris) with a Dounce homogenizer (2 × 10 strokes). The lysate was cleared by centrifugation at 20,000 × g for 30 min at 4°C. NaCl (50 mM) was added to the supernatant before it was applied to anti-FLAG M2 affinity gel beads. Further purification and analysis of the final preparation were performed as described above for STAT1t, except that for STAT3 the elution buffer contained 20 mM Tris, 20% glycerol, 300 mM KCl, 0.2 mM EDTA, 2 mM DTT, and 0.1% Nonidet P-40.
The Grb2-SH2 domain-GST fusion protein was expressed and purified as described (20).

Synthesis of the Phosphopeptide Libraries
Synthesis of Phosphotyrosine Library PL408-2br-Aminomethyl-functionalized TentaGel M30352 (6.0 g; RAPP Polymere GmbH) with a diameter of 35 μm (~2.1 billion polymer particles) was used as starting material for solid phase peptide synthesis by the Fmoc/t-butyl protection strategy. Fmoc amino acids, including Fmoc-Tyr(PO(OH)2), were obtained from Novabiochem and Propeptide (Vert-Le-Petit, France). Fmoc-protected amino acids (6 equivalents; 0.4 M in DMF) were coupled by in situ activation with 6 eq of benzotriazol-1-yl-oxytripyrrolidinophosphonium hexafluorophosphate (0.4 M in DMF) and 12 eq of N-methylmorpholine (0.8 M in DMF) for 1.5 h at room temperature. The complete library was synthesized by the "split and pool" method described by Lam et al. (22), in which, at each coupling stage, each amino acid is coupled individually in a separate reactor. After completion of the coupling, the contents of all reactors were combined, washed 6 times with DMF, N-deprotected twice with 20% piperidine/DMF for 30 min, and washed 6 times with DMF. The wet slurry of beads was transferred into a polypropylene bag, which was then closed with a heat-sealing device. The bag was placed on a flat surface and gently treated with a 3-cm-wide rubber-coated roller to deaggregate and mix the beads efficiently, as required by the split and pool process. The beads were then removed from the bag, washed with DMF, and divided among the reactors for the next coupling step. Side chain deprotection was carried out with 90% trifluoroacetic acid, with 5% water and 5% triisopropylsilane as scavengers, at room temperature for 3 h. The beads were washed successively with trifluoroacetic acid/water (95:5), tetrahydrofuran, DMF, 0.5 M N-ethyldiisopropylamine in DMF, DMF, isopropyl alcohol, and finally ethanol. Each bead carries peptides of a single sequence, covalently linked to the polymer via a branched Lys-Gly-ε-aminohexanoic acid spacer.
The library beads were resuspended in ethanol, passed through a cell strainer of 0.1-mm mesh size (BD Biosciences, Falcon catalog number 2360), and stored at −20°C.
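The split and pool principle (22) can be illustrated with a small simulation: at every coupling step the bead pool is mixed, split into one equal portion per amino acid, each portion is coupled with a single amino acid, and the portions are pooled again, so every bead ends up carrying a single sequence. The Python sketch below is a toy model under simplifying assumptions (complete coupling, an exactly equal split, a three-letter alphabet instead of 19 amino acids); the function name is hypothetical.

```python
import random
from collections import Counter


def split_and_pool(n_beads, alphabet, n_steps, seed=0):
    """Simulate split and pool synthesis; returns one sequence per bead."""
    rng = random.Random(seed)
    beads = [[] for _ in range(n_beads)]        # each bead starts with an empty chain
    for _ in range(n_steps):
        rng.shuffle(beads)                      # pooling and mixing
        for k, residue in enumerate(alphabet):  # split: one reactor per residue
            for bead in beads[k::len(alphabet)]:
                bead.append(residue)            # couple this reactor's residue
    return ["".join(b) for b in beads]


library = split_and_pool(n_beads=900, alphabet="DPR", n_steps=2)
print(len(set(library)))                          # 9
print(sorted(Counter(s[0] for s in library).items()))
# [('D', 300), ('P', 300), ('R', 300)]
```

Because the pool is re-mixed before every split, each bead follows its own random path through the reactors, giving one sequence per bead while all combinations are represented in roughly equal numbers.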
Synthesis of Phosphotyrosine Library PL407 and of the Non-phosphorylated Tyrosine Library PL409-10 -The libraries PL407 and PL409-10 were synthesized as described for the library PL408-2br, except that 5.7 and 5 g, respectively, of aminomethyl-functionalized TentaGel M30102 (RAPP Polymere GmbH) with a diameter of 10 μm were used. After the coupling steps, the beads were mixed directly by stirring the wet slurry in DMF, without the roller deaggregation step described for the library PL408-2br. The structure of the library PL407 was XXXY*XXXXX-Ahx-NH-bead and that of the library PL409-10 was XXXXXYXXXXXX-(Ahx)2-NH-bead, where Y is tyrosine, Y* is phosphotyrosine, X is any of the natural amino acids except cysteine, and Ahx is the spacer ε-aminohexanoic acid.
Incubation of STAT1, STAT1t, and STAT3 with the Libraries PL407 and PL409-10-Different concentrations of FLAG-tagged purified STAT1, STAT1t, or STAT3 proteins were incubated with 250 μg of beads (600 beads/μg) in a total volume of 100 μl of incubation buffer (PBS, 1% bovine serum albumin, 0.1% NaN3, 0.05% Tween 20, 1 mM DTT, sterile-filtered, pH 7.2) overnight at 4°C on a rotating wheel. Beads were centrifuged at 1500 × g for 5 min, washed 4 times with cold incubation buffer, resuspended in 50 μl of incubation buffer, and incubated with 1 μl of FITC-labeled anti-FLAG M2 antibody (0.22 mg/ml) for 1 h in the dark on ice with occasional vortexing. The anti-FLAG-FITC antibody was produced by labeling the mouse monoclonal anti-FLAG antibody M2 (IntegraBiosciences, Eastman Kodak, catalog number IB13025) with fluorescein 5-isothiocyanate (Molecular Probes, F-1906), giving a molar dye to protein ratio of 3.6. The beads were washed again with cold incubation buffer and transferred to Micronics tubes for analysis on a FACSort (BD Biosciences) with a modified sample uptake needle of 275-μm inner diameter. All tubes were coated for at least 1 h with 100% FCS.

Incubation of STAT1 with PL407 for Sorting
25 mg of PL407 beads (600 beads/μg) were incubated in 10 ml of incubation buffer with 8 nM STAT1 overnight at 4°C on a rotating wheel. The beads were washed in incubation buffer and incubated in 2.5 ml of the same buffer with 5.5 μg of FITC-labeled anti-FLAG antibody for 1 h.

Sorting of Beads of the PL407 Library after Incubation with STAT1
The beads of the library PL407 incubated with STAT1 and fluorescent antibodies were analyzed and isolated on an Elite ESP cell sorter (Beckman Coulter, Inc.) using a 150-μm nozzle at a sheath pressure of 4.4 pounds/square inch, a droplet frequency of 7.3 kHz, and a one-droplet collection mode aborting coincident events. Fluorescence was measured in channel FL1 at 530 nm in peak area logarithmic detection mode, and forward and side scatter were measured in peak area linear detection mode. The sort gate combined a forward and side scatter region corresponding to non-aggregated, monomeric beads, linked by "and", with an FL1 histogram region. The combined gate was set to isolate the most highly fluorescent events, amounting to about 1.7% of total events. The event rate was held at about 250 beads/s. The beads were collected in tubes previously coated with 100% fetal calf serum for 1 h.
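The sort gate described above (a scatter region for single beads combined by "and" with a fluorescence criterion that keeps roughly the brightest 1.7% of all events) can be sketched in Python as follows. The event representation, the scatter limits, and the function names are hypothetical. On a real sorter the decision is made event by event against a predefined region; the sketch ranks a recorded set of events only to show how such a region corresponds to a fraction of total events.

```python
def sort_gate(events, scatter_ok, top_fraction):
    """Brightest scatter-gated events, up to top_fraction of ALL events."""
    gated = [e for e in events if scatter_ok(e)]      # scatter region ("and" ...)
    gated.sort(key=lambda e: e["fl1"], reverse=True)  # ... FL1 criterion
    n_keep = round(top_fraction * len(events))
    return gated[:n_keep]


def singlet(event):
    # Hypothetical scatter window for non-aggregated, monomeric beads.
    return 50 <= event["fsc"] <= 150 and 20 <= event["ssc"] <= 80


# Synthetic events: every tenth event is a bead doublet with doubled forward scatter.
events = [{"fsc": 200 if i % 10 == 0 else 100, "ssc": 50, "fl1": float(i)}
          for i in range(1000)]
picked = sort_gate(events, singlet, top_fraction=0.017)
print(len(picked), min(e["fl1"] for e in picked))  # 17 982.0
```

The doublets at FL1 = 990 and 980 are excluded by the scatter region even though they are bright, which is the purpose of combining the two regions with "and".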

Incubation of STAT1t and STAT3 with Library PL408-2br
Purified STAT1t or STAT3 protein (30 nM) was incubated with 30 mg of beads (25,000 beads/mg) in a total volume of 2 ml of incubation buffer overnight at 4°C on a rotating wheel. Beads were centrifuged at 1500 × g for 5 min, washed 4 times with cold incubation buffer, resuspended in 1 ml of incubation buffer, and incubated with 8.8 μg of anti-FLAG FITC antibody for 1 h in the dark on ice with occasional vortexing. The beads were allowed to settle, the supernatant was removed, and PBS was added. A 50-μl aliquot was used for a control analytical run (data not shown). All tubes were coated beforehand for at least 1 h with 100% FCS.

Sorting of Beads of the PL408-2br Library after Incubation with STAT1t or STAT3
The beads of the library PL408-2br incubated with STAT proteins and fluorescent antibodies were analyzed and isolated on an Elite ESP cell sorter (Beckman Coulter, Inc.) using a 150-μm nozzle at a sheath pressure of 5.2 pounds/square inch, a droplet frequency of 9.1 kHz, and a two-droplet collection mode aborting coincident events. Fluorescence was measured simultaneously in two channels, FL1 at 530 nm and FL2 at 575 nm, in linear peak detection mode. The sample beads to be sorted were kept in suspension by continuous slow vortexing. The Elite settings were peak linear for forward scatter, FL1, and FL2, and logarithmic integral for side scatter. The sort gate combined a forward scatter/side scatter region, linked by "and", with an FL1/FL2 region. An FL1/FL2 region was chosen instead of a simple FL1 histogram region for better discrimination between FITC fluorescence and bead autofluorescence. The gate was set to isolate the most highly fluorescent events, amounting to about 0.3% of total events for STAT1t and 0.5% for STAT3. The event rate was held at about 100 beads/s. The beads were collected in tubes coated with 100% fetal calf serum for 1 h.
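The rationale for the FL1/FL2 region is presumably that bead autofluorescence raises both channels, whereas FITC contributes mainly to FL1, so a region drawn in the FL1/FL2 plane can separate the two more cleanly than an FL1 histogram alone. The Python sketch below uses a standard ray-casting point-in-polygon test for such a region. The polygon vertices, the 0-1023 linear channel scale, and the function names are hypothetical, not the region used for these sorts.

```python
def in_polygon(x, y, vertices):
    """Ray-casting test: True if point (x, y) lies inside the polygon."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:       # crossing lies to the right of the point
                inside = not inside
    return inside


# Hypothetical FL1 (x) / FL2 (y) region: high FL1 with comparatively low FL2.
FITC_REGION = [(600, 0), (1023, 0), (1023, 500), (600, 200)]


def sort_decision(event, scatter_ok):
    """Scatter region AND FL1/FL2 region, as in the combined gate."""
    return scatter_ok(event) and in_polygon(event["fl1"], event["fl2"], FITC_REGION)


print(in_polygon(900, 100, FITC_REGION))  # True  (bright in FL1 only)
print(in_polygon(900, 900, FITC_REGION))  # False (bright in both channels)
print(in_polygon(300, 50, FITC_REGION))   # False (dim bead)
```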

Peptide Sequencing
Library beads were analyzed on an HP G1000A system (Hewlett-Packard). A newly optimized chemistry method, double couple 3.0, was used to enhance chemical efficiency, minimize lag, and increase yield. The biphasic column in the sample cartridge was modified: 20% of the packing material was removed from the bottom (SAX) part of the column to create enough space for the small polyvinylidene fluoride strips (Immobilon P, Millipore) on which the single bead was deposited for handling and analysis. Pool sequencing of the sorted library beads was performed as described for single bead sequencing, except that about 650 beads were loaded in the sample cartridge. The pool sequencing results were corrected linearly in cycles 2-12 for the lag carried over from the previous cycle; the lag fraction of each amino acid was chosen so that its corrected amount in cycle 6, the position of the phosphotyrosine, is zero. The corrected amount $C_{x,n}$ of amino acid $n$ at cycle $x$ was obtained from the original value $A_{x,n}$ by $C_{x,n} = A_{x,n} - A_{x-1,n}\,A_{6,n}/A_{5,n}$. These values were normalized to the detected, lag-corrected amounts of each amino acid obtained by sequencing the original, unsorted library PL408-2br. Then the relative amounts of the amino acids of the pool were calculated for each cycle.
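The lag correction and normalization can be sketched in Python as follows. Sequencing data are assumed to be a list of per-cycle dictionaries mapping amino acid to detected amount (cycle 1 first); the function names and the toy data are hypothetical. Cycle 1 is left uncorrected because it has no preceding cycle, and negative corrected values are not clipped because the text does not say how they were handled. The final scaling, which sets the mean over amino acids to 1 so that a single surviving amino acid scores 19, is inferred from the Fig. 2 legend rather than stated in the methods.

```python
def lag_correct(raw, py_cycle=6):
    """C[x][n] = A[x][n] - A[x-1][n] * A[py][n] / A[py-1][n], for cycles 2..end."""
    before_py, at_py = raw[py_cycle - 2], raw[py_cycle - 1]
    # Lag fraction per amino acid, chosen so that the corrected pY cycle becomes zero.
    lag = {aa: at_py.get(aa, 0.0) / a5 for aa, a5 in before_py.items() if a5}
    corrected = [dict(raw[0])]  # cycle 1 has no preceding cycle
    for x in range(1, len(raw)):
        corrected.append({aa: a - raw[x - 1].get(aa, 0.0) * lag.get(aa, 0.0)
                          for aa, a in raw[x].items()})
    return corrected


def relative_amounts(pool, library, skip_cycles=(6,)):
    """Normalize pool amounts to the unsorted library; scale each cycle to a mean of 1."""
    result = []
    for cycle, (p_row, l_row) in enumerate(zip(pool, library), start=1):
        ratios = {aa: p_row[aa] / l_row[aa] for aa in p_row if l_row.get(aa, 0) > 0}
        total = sum(ratios.values())
        if cycle in skip_cycles or not total:
            result.append({})  # phosphotyrosine cycle, or nothing to normalize
            continue
        result.append({aa: len(ratios) * r / total for aa, r in ratios.items()})
    return result


# Toy two-amino-acid data; cycle 6 is the phosphotyrosine position (lag only).
raw_pool = [{"D": 10.0, "P": 10.0}, {"D": 10.0, "P": 2.0}, {"D": 2.0, "P": 2.0},
            {"D": 1.0, "P": 1.0}, {"D": 4.0, "P": 2.0}, {"D": 0.4, "P": 0.2}]
pool = lag_correct(raw_pool)
library = [{"D": 5.0, "P": 5.0}] * 6  # stands in for the lag-corrected unsorted library
print(pool[1])  # {'D': 9.0, 'P': 1.0}
print(relative_amounts(pool, library)[1])
```

In the toy data both amino acids have a lag fraction of 0.1, so cycle 6 is driven to zero and each later cycle loses 10% of the preceding cycle's raw signal.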

Calculation of Probabilities for Common Sequences
The probability P(n) that, at the three positions +1, +2, and +3 of the 54 sequences obtained from beads (Tables I and II), n sequences would be found by chance in common with the 17 different sequences at the same positions from natural docking sites (Table IV)
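As one possible null model for P(n), not necessarily the calculation used here, one can assume that each of the 54 bead-derived tripeptides at +1 to +3 is drawn independently and uniformly from the 19^3 = 6859 possible tripeptides, so that each matches one of the 17 docking-site tripeptides with probability p = 17/6859. The number of matches then follows a binomial distribution. The Python sketch below implements that assumption; the function names are hypothetical.

```python
from math import comb


def p_common(n, n_beads=54, n_sites=17, n_aa=19, length=3):
    """Binomial probability that exactly n bead tripeptides match a docking-site tripeptide."""
    p = n_sites / n_aa ** length
    return comb(n_beads, n) * p ** n * (1 - p) ** (n_beads - n)


def p_at_least(k, **kwargs):
    """Probability of k or more matches under the same null model."""
    n_beads = kwargs.get("n_beads", 54)
    return sum(p_common(n, **kwargs) for n in range(k, n_beads + 1))


for k in range(4):
    print(k, f"{p_at_least(k):.3g}")
```

A uniform null model ignores any compositional bias in the sorted beads, so the resulting probabilities should be read as a rough benchmark only.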

Peptide Synthesis
All reagents and solvents used in the synthesis were of the highest quality commercially available. Amino acid derivatives were purchased from Alexis (Grünberg, Germany) or prepared according to standard protocols; TentaGel-S-PHB resin was from Rapp (Tübingen, Germany). The peptides were synthesized by standard protocols using Fmoc/t-butyl chemistry (23) on an ACT 396 synthesizer (Advanced Chemtech, Louisville, KY). Cleavage from the resin and deprotection were performed with trifluoroacetic acid/triethylsilane/water (95:3:2) for 1.5 h, and the peptides were purified by high pressure liquid chromatography on Nucleosil 5 C18 PPN with a linear gradient of acetonitrile (0.08% trifluoroacetic acid) and 0.1% aqueous trifluoroacetic acid from 15:85 to 60:40 over 40 min.

BiaCore Competition Assays
A Bialite instrument was used to perform BiaCore competition binding assays. Biotinylated tyrosine-phosphorylated and unphosphorylated peptides corresponding to IFNGR1 residues 435-446 were immobilized on a streptavidin sensorchip surface (SA-5, BiaCore) by injecting 10 μl of 20 nM peptide in HBS buffer (BiaCore) at a flow rate of 5 μl/min (reaching a signal of about 30-60 resonance units). The unphosphorylated peptide was immobilized on the first flow cell and the phosphorylated form on the second flow cell of the same sensorchip. Biotinylated tyrosine-phosphorylated and unphosphorylated peptides derived from the gp130 receptor (residues 762-773) and from SHC (residues 424-431) were immobilized on separate sensorchips in the same manner.
STAT1t (400 nM) and STAT3 (250 nM) were incubated with various concentrations of different non-biotinylated phosphopeptides for 20 min at room temperature. The samples were then injected onto the respective sensorchips described above. To correct for bulk refractive index changes due to differences between the injected solution and the running buffer, as well as for nonspecific associations resulting from binding of the protein and/or soluble peptides to the immobilized peptides and/or the sensorchip surface, the protein sample was passed sequentially over both flow cells. The signal from the reference flow cell (unphosphorylated peptide), corresponding to these nonspecific contributions, was then subtracted from the signal obtained on the flow cell carrying the immobilized phosphorylated peptide. STAT1t or STAT3 binding in the presence of phosphopeptide was then calculated as a percentage of total STAT1t or STAT3 binding in the absence of competitor.
Grb2-SH2-GST fusion protein (50 nM) was incubated with various concentrations of different non-biotinylated phosphopeptides for 20 min at room temperature. Measurements were then made as outlined above.
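The reference subtraction and normalization can be written out as a short Python sketch. Responses are in resonance units (RU); the function names and the example numbers are invented for illustration and are not measured data.

```python
def specific_response(ru_phospho, ru_reference):
    """Reference-subtracted signal: phosphopeptide flow cell minus unphosphorylated cell."""
    return ru_phospho - ru_reference


def percent_binding(with_competitor, without_competitor):
    """Binding with competitor as a percentage of binding without competitor.

    Each argument is a (phospho-cell RU, reference-cell RU) pair.
    """
    total = specific_response(*without_competitor)
    if total <= 0:
        raise ValueError("no specific binding in the absence of competitor")
    return 100.0 * specific_response(*with_competitor) / total


# Invented responses (RU) for one protein injection without and with competitor.
no_competitor = (150.0, 30.0)
titration = {0.1: (138.0, 30.0), 1.0: (90.0, 30.0), 10.0: (42.0, 30.0)}  # uM: (phospho, ref)
for conc, pair in titration.items():
    print(f"{conc:>5} uM competitor: {percent_binding(pair, no_competitor):.0f}% binding")
```

Subtracting the reference cell before taking the ratio removes bulk refractive index and nonspecific contributions from both the numerator and the denominator.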

Cloning of Plasmids and Mutagenesis
cDNA of a mutant, truncated murine EpoR (EpoR-His/Tyr343-Phe) (24) was kindly provided by James Ihle. The cDNA was amplified by PCR using a common forward primer (5′-CCGGGCTGCAGGAATTCCCCCTCGAGCTGCAG-3′) and reverse primers that introduce 7 amino acids corresponding to positions −2 to +4 of the peptides listed in Table III. The amplification products were digested with XbaI and EcoRI and cloned into plasmid pcDNA3 (Invitrogen). The correct sequence of all constructs was verified by sequencing with T7 and SP6 primers (data not shown).
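Two small checks on the primer design can be sketched in Python: scanning the common forward primer for the EcoRI (GAATTC) and XbaI (TCTAGA) recognition sequences, and generating the 5′ extension that a reverse primer would carry to append a 7-residue peptide and a stop codon to the receptor. The one-codon-per-residue table, the TGA stop codon, and the example peptide are illustrative assumptions. The example takes positions −2 to +4 of the STAT1 Pool 1 peptide VDYKYY*DPRHDL and encodes the phosphotyrosine with an ordinary tyrosine codon, since phosphorylation occurs in the cell. The annealing region and restriction site of the actual reverse primers are not given here.

```python
SITES = {"EcoRI": "GAATTC", "XbaI": "TCTAGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")
# Illustrative codon choice (one codon per residue); not the codons actually used.
CODON = {"K": "AAG", "Y": "TAC", "D": "GAC", "P": "CCC", "R": "CGC", "H": "CAC"}


def find_sites(seq, sites=SITES):
    """0-based start positions of each recognition sequence in seq."""
    return {name: [i for i in range(len(seq) - len(motif) + 1)
                   if seq[i:i + len(motif)] == motif]
            for name, motif in sites.items()}


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def reverse_primer_tail(peptide, stop="TGA"):
    """5' extension of a reverse primer encoding `peptide` followed by a stop codon."""
    coding = "".join(CODON[aa] for aa in peptide) + stop
    return reverse_complement(coding)


forward = "CCGGGCTGCAGGAATTCCCCCTCGAGCTGCAG"
print(find_sites(forward))             # {'EcoRI': [11], 'XbaI': []}
print(reverse_primer_tail("KYYDPRH"))  # TCAGTGGCGGGGGTCGTAGTACTT
```

The forward primer carries the EcoRI site but not the XbaI site, so the XbaI site used for cloning is presumably supplied by the reverse primers.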

Cell Lines and Cell Culture
DA3 cells (murine leukemia-derived cell line) were kindly provided by James Ihle (24) and were grown in RPMI 1640 medium containing 10% fetal calf serum (FCS), 25 units/ml murine IL-3, and 100 units/ml penicillin/streptomycin (all Invitrogen, except murine IL-3, Sigma I4144) at 37°C and 5% CO2. Cells were transfected with the different constructs by electroporation. At 72 h post-transfection, G418 (Calbiochem) was added to the medium at a concentration of 550 μg/ml to select transfected cells. At day 14 post-transfection, single clones were obtained from the growing pool by limiting dilution. Clones were selected for cell surface expression of EpoR by flow cytometric analysis.
Flow Cytometric Analysis-DA3 cells (10^6) were harvested by centrifugation at 500 × g for 5 min, filtered, and washed in 1 ml of ice-cold FACS buffer (1% FCS, 0.02 mM NaN3 in phosphate-buffered saline (PBS)). Cells were then incubated on ice with anti-EpoR (Santa Cruz Biotechnology, sc-5624) for 45 min, washed twice with FACS buffer, and incubated on ice for 20 min with anti-rabbit Alexa 488 (Molecular Probes, A-11008). Cells were washed twice more with FACS buffer, resuspended in 100 μl, and analyzed on a two-laser FACSCalibur (BD Biosciences).

Binding of Recombinant STAT Proteins to the Peptide Library-
To study the molecular basis of the SH2 domain specificities of STATs, we first purified the closely related STAT1 and STAT3 proteins. Because full-length STAT proteins cannot be expressed in bacteria, the full-length human STAT1 and murine STAT3 cDNAs were fused with a FLAG epitope at the carboxyl terminus and cloned into a baculovirus vector. The fusion proteins were expressed in Sf21 and TN5 insect cells and immunoaffinity-purified with anti-FLAG gel beads to over 90% purity. For expression in E. coli, a shorter form of STAT1, designated STAT1t, was prepared as well: amino acids 135-712 of human STAT1 were subcloned and fused with a FLAG epitope at the carboxyl terminus. This truncated form of STAT1 lacks the amino-terminal STAT domain and the carboxyl-terminal transactivation domain but contains the coiled-coil domain, the DNA binding domain, the linker domain, and the SH2 domain. The fusion protein was expressed in E. coli and purified with anti-FLAG affinity gel beads to over 90% purity.
The activity of the SH2 domains of the recombinant proteins and the optimal concentrations for incubation with the phosphopeptide library were tested in a pilot study. Different concentrations (3 to 100 nM) of recombinant FLAG-tagged STAT1, STAT1t, and STAT3 were incubated with 250 μg of beads of the phosphotyrosine peptide library PL407. This library, with the general structure X19X19X19Y*X19X19X19X19X19-spacer-bead, was synthesized by the split and pool method on 10-μm TentaGel beads. X19 indicates the 19 natural amino acids except cysteine, and Y* indicates a phosphorylated tyrosine residue. The split and pool method generates beads that each carry a single peptide sequence (22). The diversity of the library theoretically amounted to 19^9 = 3.2 × 10^11 different sequences. After incubation of the recombinant STATs with the library, an anti-FLAG monoclonal antibody labeled with the fluorescent dye FITC was added, and the beads were analyzed by flow cytometry. The general binding properties of STAT1t and STAT3 are shown in Fig. 1, A and B, respectively. The black line shows the background fluorescence of the beads. The addition of FITC-labeled FLAG antibodies (green curve) increases background fluorescence through nonspecific binding to the beads, shifting the curve to the right. Highly fluorescent beads are detected after the addition of recombinant STAT1t protein, as shown in Fig. 1A (red line = 3 nM, blue line = 10 nM, orange line = 30 nM, and dark blue line = 100 nM STAT1t). STAT1t binds to the phosphopeptides on these beads, and the FITC-labeled FLAG antibody binds to the FLAG tag at the carboxyl terminus of STAT1t. Fig. 1B shows the same set of experiments (with the same color coding for the different concentrations) with recombinant full-length STAT3.
We then tested whether the observed binding indeed reflects phosphotyrosine-SH2 domain binding. We therefore synthesized the non-phosphorylated tyrosine peptide library PL409-10 with the general structure X19X19X19X19X19YX19X19X19X19X19X19-spacer-bead on 10-μm TentaGel beads. This library was again incubated with the same concentrations of STAT1t (Fig. 1C) and STAT3 (Fig. 1D) as used with the phosphorylated library. Addition of increasing amounts of recombinant STATs resulted in a dose-dependent increase in the number of highly fluorescent beads, but this increase was clearly smaller than with the phosphorylated library. Fig. 1, E and F, shows an overlay of the curves obtained with 100 nM STAT1 (Fig. 1E) and 30 nM STAT3 (Fig. 1F) with the phosphorylated library (red line) and the non-phosphorylated library (dark blue line). The importance of the SH2 domain for the generation of highly fluorescent beads was further confirmed with an R602K mutant of recombinant STAT1t. The change of arginine 602 to lysine abolishes the function of the SH2 domain, because arginine 602 is necessary for binding of the phosphate group in the phosphotyrosine-binding pocket of the SH2 domain (16,25). Incubation of the phosphotyrosine library PL407 with 100 nM of the R602K STAT1t mutant (yellow line in Fig. 1E) results in a curve similar to the background fluorescence generated by the addition of FITC-labeled FLAG antibodies in the absence of STAT proteins (green curve in Fig. 1, A-D).
Pool Sequencing of PL407 Beads Bound by Full-length STAT1-25 mg of PL407 beads (600 beads/μg; 1.5 × 10^7 beads) were then incubated with 8 nM STAT1 overnight; FITC-labeled FLAG antibody was added, and the beads were sorted on an Elite ESP. The gate was set to isolate the most highly fluorescent events, amounting to about 1.7% of total events. The selected beads were then used for pool sequencing (Fig. 2A). The binding motif for STAT1 derived from this pool sequencing approach was YRYY*RRRYF. Strikingly, arginine was found at a relative frequency of 1.8 to 4 at positions −3, −2, −1, +1, and +2 of the phosphopeptides. At position +3, it was 11 times more frequent than the statistical average.
Synthesis and Properties of the Large Bead Phosphopeptide Library PL408-2br-Next, we synthesized a new peptide library on large beads by split and pool synthesis. Each of these large 35-μm TentaGel beads carried ~1 pmol of an individual peptide (about 6 × 10^11 molecules), which was sufficient for sequence determination at the single bead level. This second phosphopeptide library was designated PL408-2br and has the general structure X19X19X19X19X19Y*X19X19X19X19X19X19-spacer-bead. In additional pilot experiments with this new library, the optimal concentration of recombinant STAT1t and STAT3 for bead sorting was found to be 30 nM (data not shown).
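The bead loading quoted above, together with the incubation conditions used for sorting (30 mg of beads at 25,000 beads/mg with 30 nM protein in 2 ml), implies a very large molar excess of bead-bound peptide over STAT protein. The arithmetic is sketched in Python below; it uses only the approximate figures given in the text.

```python
AVOGADRO = 6.02214076e23       # molecules per mol

peptide_per_bead_mol = 1e-12   # ~1 pmol of peptide per 35-um bead
beads = 30 * 25_000            # 30 mg x 25,000 beads/mg
protein_mol = 30e-9 * 2e-3     # 30 nM x 2 ml (volume in litres)

molecules_per_bead = peptide_per_bead_mol * AVOGADRO
peptide_total_mol = beads * peptide_per_bead_mol
excess = peptide_total_mol / protein_mol

print(f"{beads} beads, {molecules_per_bead:.1e} peptide molecules per bead")
print(f"{protein_mol * 1e12:.0f} pmol STAT protein; peptide excess ~{excess:,.0f}-fold")
```

With about 60 pmol of protein against roughly 750,000 pmol of bead-bound peptide, only a tiny fraction of the peptides on any bead can be occupied, even if all of the protein were bound.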
Sorting and Sequencing of High Fluorescence Beads-30 mg of beads (25,000 beads/mg; 750,000 beads) were incubated with 30 nM recombinant STAT1t or STAT3 in a total volume of 2 ml overnight at 4°C on a rotating wheel. FITC-labeled FLAG antibodies were then added, and the mixture was incubated for an additional hour. The beads were then analyzed and sorted by flow cytometry. The gate was set to isolate the most highly fluorescent events, amounting to about 0.3% of total events for STAT1t and 0.5% for STAT3. The selected beads were then subjected to peptide sequencing. Fig. 2B shows the result of pool sequencing of the beads selected by STAT1t. According to these results, the STAT1t SH2 domain is not very selective for the amino acids upstream of the phosphorylated tyrosine, with the exception of position −1 relative to Y*, where tyrosine is preferred. At position +1, the negatively charged amino acids aspartic acid and glutamic acid are favored; at +2, a proline or an arginine is chosen; at +3, a strong preference for the positively charged arginine is found; and at +4, histidine is slightly preferred. The negatively charged amino acids are disfavored at positions +2 and +3. Hydrophobic amino acids are well tolerated at +1 but not at +2 or +3. At positions +5 and +6, no clear preferences can be detected. The peptide representing the amino acids most frequently present at each position has the sequence VDYKYY*DPRHDL (Pool 1 in Table I). At several positions, additional amino acids are clearly preferred, and a second ideal binding peptide incorporating such alternative amino acids was synthesized for further analysis (Pool 2 in Table I). We then performed single bead sequencing of 27 beads selected by STAT1t. The sequences of these peptides are shown in Table I. The consensus peptide was calculated with the computer program Lineup (Wisconsin Package, Genetics Computer Group, Inc.) and shows the amino acids found most often at each position along the peptide.
Fig. 2C shows the pool sequencing of the beads selected by full-length STAT3. Strikingly, full-length STAT3 displays the same 2-4-fold preference for arginine at all positions along the peptide that was found with full-length STAT1.
Only glutamine at position +3 and tyrosine at position +4 are more frequent than arginine. The optimal binding motif for STAT3 is predicted to be RRRRRY*RRQYRR (3-Pool 1 in Table II). Excluding arginine results in the binding motif VKYKDY*KPQYAY (3-Pool 2 in Table II). In this corrected motif, the position with the strongest selective preference is +3, where glutamine is found 3 times more frequently than expected by chance. Table II shows the sequences obtained from single bead sequencing of 27 different peptides (designated 3-1 to 3-27). The consensus sequence was calculated with the program LINEUP; no consensus amino acid could be calculated at positions −4, −2, +4, and +6. The consensus peptide VXRXRY*RRQXRX shows the preference of full-length STAT3 for glutamine at position +3. Interestingly, a hydrophobic amino acid at position +1 is found in 11 of the 27 peptides, and, as shown in Table IV, hydrophobic amino acids are frequently found in known receptor docking sites for STAT3. Basic amino acids are found at position +1 in 10 of the 27 peptides; again, basic amino acids are frequent in known receptor docking sites for STAT3. Strikingly, this equally strong preference for hydrophobic and for basic residues at position +1 in the individual peptides is not represented in the consensus peptide or in the pool sequence-derived motifs. In both cases, the hydrophobic residues are under-represented because no single one of the seven hydrophobic amino acids is preferred, so the weight is distributed over seven amino acids. The weight of the basic residues is distributed over only three amino acids, which can therefore emerge from the amino acid pool as preferred residues.

Fig. 2 (partial legend). The value 1 indicates no selectivity, and 19 would represent a case where only one amino acid is present at a certain position. Amino acids with values greater than 1 are positively selected, and amino acids with values smaller than 1 are disfavored. The amino acids are grouped by biochemical properties (acidic, basic, polar, and hydrophobic) and appear in the same order and colors as in the boxed legend. A, full-length STAT1 was incubated with the phosphotyrosine peptide library PL407. B, the truncated STAT1t lacking the amino- and carboxyl-terminal domains was incubated with the large bead library PL408-2br. C, full-length STAT3 was incubated with PL408-2br.

Table I. STAT1-SH2 domain binding peptides (positions −5 to +6 relative to Y*; pool and single bead sequencing).
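The dilution argument in the STAT3 analysis above can be made concrete with the selectivity value of Fig. 2, taken here to be the observed frequency of a residue at a position divided by the 1/19 expected if all amino acids were equally likely (so 1 means no selectivity and 19 means a single amino acid). The toy +1 column has 11 hydrophobic, 10 basic, and 6 other residues out of 27, as reported for the STAT3 single bead data; the split within each class, the choice of A, V, I, L, M, F, W as the seven hydrophobic residues, and the "other" residues are assumptions for illustration only.

```python
from collections import Counter

N_AMINO_ACIDS = 19  # each randomized position draws from 19 amino acids

# Assumed class membership; the text states only the class sizes (7 hydrophobic, 3 basic).
HYDROPHOBIC = set("AVILMFW")
BASIC = set("RKH")


def selectivity(column: str) -> dict[str, float]:
    """Selectivity per residue: observed frequency divided by the uniform 1/19."""
    counts = Counter(column)
    return {aa: n / len(column) * N_AMINO_ACIDS for aa, n in counts.items()}


def class_share(column: str, members: set[str]) -> float:
    """Fraction of peptides carrying a residue of the given class at this position."""
    return sum(aa in members for aa in column) / len(column)


# Hypothetical +1 column for 27 peptides: 11 hydrophobic residues spread over
# all 7 hydrophobic amino acids, 10 basic residues over 3, and 6 others.
plus1 = "AAVVIILMFFW" + "RRRRKKKHHH" + "SSTNQG"

sel = selectivity(plus1)
best = max(sel, key=sel.get)
best_hydrophobic = max(sel[aa] for aa in HYDROPHOBIC if aa in sel)

print(f"class shares: hydrophobic {class_share(plus1, HYDROPHOBIC):.2f}, "
      f"basic {class_share(plus1, BASIC):.2f}")           # 0.41 vs 0.37
print(f"top single residue: {best} ({sel[best]:.2f})")     # R (2.81)
print(f"best hydrophobic residue: {best_hydrophobic:.2f}")  # 1.41
```

Although hydrophobic residues occupy +1 slightly more often than basic ones in this toy column, no single hydrophobic amino acid exceeds about 1.4, whereas R, K, and H each exceed 2, so a per-residue readout favors the basic class.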
In order to eliminate additional binding sites in the amino-terminal or carboxyl-terminal domain that could bind arginine-containing peptides, we tried to express and purify a truncated version of STAT3 equivalent to STAT1t. Unfortunately, we could not get enough soluble and active protein after expression in E. coli for the peptide library experiments.
Resynthesis and Affinity Determination of Selected Peptides-Six of the 27 peptides selected by STAT1t and six of the 27 peptides selected by STAT3 were randomly chosen for resynthesis (Table III). The consensus peptides and the pool sequence motifs of STAT1t and STAT3 were also synthesized, as were a number of known binding sites for STAT1 or STAT3 (Table III). These peptides were then used for affinity measurements in a BiaCore competition binding assay. For the STAT1 SH2 domain, a tyrosine-phosphorylated peptide corresponding to the known STAT1 docking site on IFNGR1 at Tyr440 was used as the reference for the competition assay (26). The signal obtained for the binding of 400 nM STAT1t to the sensor chip coated with the IFNGR1-Y440 phosphopeptide was set to 100%. For competition experiments, increasing concentrations of peptides were incubated with the recombinant STAT1t protein, and the degree of inhibition of the signal was measured. The results in Table III, column 3, are expressed as ID50 values, that is, the concentration of competing peptide that achieves 50% inhibition of the binding of STAT1t to the IFNGR1-Y440-coated sensor chip. Shown in parentheses are the ID50 values relative to that of the reference peptide IFNGR1-Y440. The non-phosphorylated IFNGR1-Y440 peptide cannot compete with the phosphorylated IFNGR1-Y440 peptide for binding of STAT1; its ID50 is >1 mM. The phosphopeptides corresponding to the two known STAT1-docking sites IFNGR1-Y440 and CSF-1R-Y708 (27) show strong and medium binding, with ID50 values of 0.49 and 4.03 μM, respectively. Four of the six library peptides selected by STAT1 (1-4, 1-8, 1-9, and 1-26) have ID50 values in the range of the docking sites IFNGR1-Y440 and CSF-1R-Y708. Peptide 1-24 has a slightly higher ID50, and peptide 1-22 has an ID50 of 49 μM, 100 times higher than that of the reference peptide. The consensus peptide and the pool sequence-derived motif 1 have ID50 values similar to those of the known docking sites (Table III, column 3).
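The relative ID50 values given in parentheses in Table III are ratios to the reference peptide. A minimal sketch using only the ID50 values quoted in the paragraph above (in μM); the dictionary and function names are illustrative, not the authors' analysis code.

```python
# ID50 values (uM) quoted in the text for the STAT1t / IFNGR1-Y440 competition assay.
ID50_STAT1T = {
    "IFNGR1-Y440": 0.49,  # reference peptide
    "CSF-1R-Y708": 4.03,
    "1-22": 49.0,
}


def relative_id50(id50: dict[str, float], reference: str) -> dict[str, float]:
    """Express each ID50 as a multiple of the reference peptide's ID50.

    Values above 1 mean a weaker competitor than the reference; below 1, a stronger one.
    """
    ref = id50[reference]
    return {name: value / ref for name, value in id50.items()}


for name, rel in relative_id50(ID50_STAT1T, "IFNGR1-Y440").items():
    print(f"{name:12s} {rel:6.1f}x reference")
```

The same function applies to the STAT3 assay with gp130-Y767 as the reference; for example, 1-22 comes out at 100 times the reference, matching the text.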
We then tested whether the presence of the amino-terminal or carboxyl-terminal domain changes the ID50 values in our BiaCore competition assay. The peptides IFNGR1-Y440, 1-Pool 1, and 1-26 were tested with the full-length and the truncated forms of recombinant STAT1. The ID50 of IFNGR1-Y440 was 0.51 μM for STAT1 and 0.49 μM for STAT1t. The peptide 1-Pool 1 had an ID50 of 3.63 μM with STAT1 and 3.19 μM with STAT1t, and the peptide 1-26 had an ID50 of 1.5 μM with STAT1 and 2.89 μM with STAT1t. We conclude that the binding characteristics of the STAT1 SH2 domain are not changed by the presence of the amino-terminal and carboxyl-terminal domains. We therefore do not believe that the selection of arginine-rich peptides by full-length STAT1 and STAT3 results from a change in SH2 domain binding specificity. It is more likely caused by nonspecific binding of the arginine side chain to the amino- and carboxyl-terminal parts of the STAT proteins.

For STAT3, the specific docking site on gp130 at Tyr767 served as the reference tyrosine-phosphorylated peptide. The signal obtained for the binding of 250 nM STAT3 to the sensor chip coated with the gp130-Y767 phosphopeptide was set to 100%. The results of the competition assay with the resynthesized peptides are expressed as ID50 values and shown in Table III, column 4. The importance of the SH2 domain-phosphotyrosine interaction was confirmed by the absence of any inhibitory activity of the non-phosphorylated gp130-Y767 peptide. The known STAT3-docking sites on gp130, Tyr767 and Tyr905, have ID50 values of 5.32 and 2.25 μM, respectively. We could not measure an ID50 value for the pool sequence-derived peptide 3-Pool 1 because it bound strongly and nonspecifically to the sensor chip surface. The consensus peptide and the pool sequence-derived 3-Pool 2 peptide bound very strongly to the STAT3 SH2 domain, with ID50 values about 10 times lower than that of the reference peptide gp130-Y767. The peptides 3-8 and 3-11 have very high ID50 values and do not bind specifically to the STAT3 SH2 domain. The peptides 3-3, 3-4, 3-13, and 3-25 are 2 to 6 times weaker inhibitors than the gp130-Y767 peptide.
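The conclusion that the amino- and carboxyl-terminal domains do not alter SH2 binding rests on the paired full-length versus truncated STAT1 ID50 values given earlier in this section. One way to summarize them is a symmetric fold difference per peptide; the 2-fold tolerance below is an assumption chosen for illustration, not a criterion stated by the authors.

```python
# Paired ID50 values (uM) from the text: (full-length STAT1, truncated STAT1t).
PAIRED_ID50 = {
    "IFNGR1-Y440": (0.51, 0.49),
    "1-Pool 1": (3.63, 3.19),
    "1-26": (1.5, 2.89),
}

TOLERANCE = 2.0  # assumed: differences below 2-fold treated as "unchanged"


def fold_difference(a: float, b: float) -> float:
    """Symmetric fold difference (always >= 1) between two positive values."""
    return max(a, b) / min(a, b)


for peptide, (full, trunc) in PAIRED_ID50.items():
    fd = fold_difference(full, trunc)
    verdict = "within" if fd < TOLERANCE else "outside"
    print(f"{peptide:12s} {fd:4.2f}-fold ({verdict} {TOLERANCE:.0f}-fold)")
```

Peptide 1-26 differs by nearly 2-fold, so how strictly "unchanged" is read depends on the tolerance adopted.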
In order to confirm that the selected peptides are specific for STAT proteins, we tested their affinities for the SH2 domain of the adapter protein Grb2. With the exception of peptides 1-22 and 1-24, they all showed an ID50 of more than 100 μM, and many of them more than 500 μM. Conversely, the high ID50 values of a Grb2-specific phosphopeptide derived from SHC toward the STAT1 and STAT3 SH2 domains confirm the specificity of the phosphopeptide-SH2 domain interaction.
STAT proteins are phosphorylated on a single tyrosine carboxyl-terminal to the SH2 domain. This phosphotyrosine is then bound by the SH2 domain of the partner STAT during the formation of STAT dimers. We tested phosphotyrosine peptides corresponding to the tyrosine phosphorylation sites of STAT1, STAT2, and STAT3. The STAT1 and STAT3 peptides are only moderate to weak inhibitors of the STAT-sensor chip interaction. The ID50 values of STAT1-Y701 and STAT3-Y705 in the STAT1-IFNGR1-Y440 assay are over 50 times higher than that of the reference peptide IFNGR1-Y440. The same is found in the STAT3-gp130-Y767 assay, where STAT1-Y701 and STAT3-Y705 are 22 and 7 times weaker inhibitors than the reference peptide gp130-Y767, respectively. STAT2-Y690 is an exception: its ID50 values of 3.59 μM on the IFNGR1-Y440 sensor chip and 3.9 μM on the gp130-Y767 chip are within the range of the ID50 values of the known receptor-docking site peptides that we tested.
In Vivo Specificity of Phosphopeptides-We next tested if peptides selected by our library approach could activate the corresponding STAT proteins in cells. We used an erythropoietin receptor truncated at amino acid 377 and with a Y343F mutation that lacks all intracellular phosphotyrosine-docking sites (24) as a backbone, onto which different phosphopeptide sequences were transferred. The fusion proteins were stably transfected into DA3 cells that have no endogenous EpoR. The expression of the chimeric receptors was confirmed by flow cytometric analysis using an antibody against the EpoR (data not shown).
We constructed six fusion proteins with individual potential docking sites for STATs, corresponding to the peptide sequences IFNGR1-Y440, gp130-Y767, 1-4, 1-9, 3-8, and 3-consensus shown in Table III. The first two served as controls for activation of STAT1 and STAT3, respectively. The other peptides were selected because of their affinities for STAT1 or STAT3 as measured in the BiaCore assays (Table III). Peptide 1-4 had a high affinity for STAT1 but not for STAT3, whereas 1-9 had high affinities for both STATs. 3-Consensus had a very high affinity for STAT3 and a 10-fold lower affinity for STAT1. 3-8 had a low affinity for both STATs.
The different clones were then stimulated with recombinant human erythropoietin, and STAT activation was tested by phosphotyrosine Western blots (Fig. 3). Because the expression levels of the chimeric receptors cannot be controlled exactly, the differences in signal intensity between clones are not quantitative data reflecting the absolute affinities of the receptors for STAT1 or STAT3. However, the relative activation of STAT1 versus STAT3 can be compared between clones and allows conclusions about the specificity of STAT activation. The EpoR fusion construct with the IFNGR1-Y440 peptide activated STAT1 only (Fig. 3A), in agreement with the known activation of STAT1 but not STAT3 by the interferon γ receptor. Fusion of peptide 1-4 to the truncated EpoR allowed activation of STAT1 and some STAT3, whereas peptide 1-9 activated both STAT1 and STAT3 (Fig. 3A). These activation patterns correspond very well to the BiaCore affinity measurements (Table III). The gp130-Y767 peptide can bind both STAT3 and STAT1, reflecting the observation that cytokines that engage the gp130 receptor chain can activate both STATs (28). Similarly, the 3-consensus peptide can bind to STAT3 and to STAT1 (Fig. 3B). We could obtain neither phospho-STAT1 nor phospho-STAT3 signals with several clones expressing the EpoR-peptide 3-8 fusion protein.
We conclude that the phosphopeptide library approach selects biologically relevant peptides and that the BiaCore affinity measurements allow predictions about the specificity of such peptides for different SH2 domain-containing proteins.

Fig. 3 (partial legend). Phosphopeptide sequences (Table III) were added to a truncated EpoR with a Y343F mutation that lacks all original EpoR phosphotyrosine-docking sites. DA3 cells were transfected, and clones with stable expression of the fusion proteins were selected. The different clones were stimulated for 20 min with 10 IU/ml human recombinant Epo. Cell lysates were tested for STAT1 or STAT3 activation by Western blotting with phospho-STAT1- or phospho-STAT3-specific antibodies. A, peptides selected by the STAT1 SH2 domain. The IFNGR1-Y440 peptide attached to the EpoR activates STAT1 only; the 1-4 peptide activates STAT1 and a little STAT3; and the 1-9 peptide activates both STAT1 and STAT3. B, peptides selected by the STAT3 SH2 domain. Both the gp130-Y767 and 3-consensus peptides activate STAT3 and STAT1.

DISCUSSION

In the current model of STAT activation, the specific contact of the SH2 domains of the different STATs with phosphotyrosine-docking sites on the receptors is crucial for the selection of STAT family members by the different cytokines. We wanted to better understand the molecular rules underlying the specific binding of STATs to the receptors. The binding preferences of SH2 domains can be elucidated by sequencing high-affinity ligands from phosphopeptide libraries. This approach has been used for a number of SH2 domain-containing signaling proteins, in which the isolated SH2 domains were expressed as GST fusion proteins (19,20). The STAT SH2 domains had not yet been analyzed, and previous attempts to use STAT SH2 domain fusion proteins were unsuccessful (footnote 2). We therefore expressed truncated STATs in E. coli and full-length STATs in insect cells.
Pool sequencing of the peptides selected by full-length STAT1 or full-length STAT3 revealed a striking preference for arginine at each position along the phosphotyrosine peptide. This finding is in contrast to the result obtained with the truncated form of STAT1 that lacks the amino terminus and the carboxyl terminus. Most likely, the over-representation of arginine is an artifact caused by nonspecific interactions between the full-length STATs and the arginine side chain with its hydrophobic and basic properties. We do not believe that it reflects a genuine change of the SH2 domain structure in full-length STATs versus truncated STATs, because we tested the affinities of both forms of STAT1 for 3 peptides (IFNGR1-Y440, 1-Pool 1, and 1-26) and found no systematic difference between them.
Because the pool sequencing approach can predict wrong motifs if the amino acids at different positions are not independent of each other, we synthesized a new peptide library on the large 35-μm TentaGel beads suited for single bead sequencing. This library, with the general structure X19X19X19X19X19Y*X19X19X19X19X19X19, has the enormous theoretical diversity of 19^11 ≈ 1.16 × 10^14. However, only 750,000 beads were present in our binding experiments, and it might seem that many of the potential high-affinity peptides were missed. This would indeed be true if the binding specificity were determined by amino acids along the entire dodecapeptide. However, most of the selectivity comes from the residues at positions +1, +2, and +3, and only 6859 (19^3) different amino acid combinations are possible at these positions. (Each combination of amino acids at +1 to +3 can be embedded in 19^8 (1.7 × 10^10) different peptides.) On average, every unique combination of amino acids at +1 to +3 was present 109 times (750,000 divided by 6859) among the 750,000 beads incubated with the recombinant STAT proteins. We are therefore confident that all high-affinity peptides were present in these reactions. Of course, the 2 × 27 individual peptides that were sequenced are just a tiny fraction of the high-affinity peptides selected by the STAT proteins, and obtaining the exact sequence of a known receptor docking site would have been a very unlikely event. However, at positions +1, +2, and +3, which are crucial for specificity, we found three sequences identical to receptor sites, namely LPQ in peptide 1-9 and in IL-9R-Y407, RPQ in peptide 3-6 and in leukemia inhibitory factor receptor-Y1028, and VRR in peptide 3-13 and in CSF-1R-Y708 (Tables III and IV). The probability of finding three such common sequences by chance is 0.033% (see Materials and Methods for the calculation). Therefore, our finding of three common sequences is highly significant evidence for the correctness of the sequences (or a subset thereof) obtained by fluorescence-activated bead sorting. These theoretical considerations are confirmed by our in vivo data obtained with EpoR fusion proteins carrying different phosphopeptides (Fig. 3). The three peptides 1-4, 1-9, and 3-consensus can bind STAT1 and STAT3, as shown by Epo-induced phosphorylation of these STATs.
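The sampling argument above can be checked numerically. The sketch computes the theoretical diversity, the average number of beads per +1 to +3 combination, and, under the added assumption (not stated in the text) that each bead carries a uniformly random sequence independently of the others, the probability that a given +1 to +3 combination is absent from all 750,000 beads.

```python
import math

N_AA = 19          # amino acids per randomized position
N_RANDOM = 11      # randomized positions (the phosphotyrosine is fixed)
N_BEADS = 750_000  # beads per binding experiment

diversity = N_AA ** N_RANDOM                # possible library sequences
core_combos = N_AA ** 3                     # combinations at +1, +2, +3
contexts_per_core = N_AA ** (N_RANDOM - 3)  # peptides sharing one +1..+3 combination
mean_copies = N_BEADS / core_combos         # average beads per +1..+3 combination

# Assumption (not in the text): each bead carries a uniformly random sequence,
# independently of the other beads. Then P(a given combination is absent) is
# (1 - 1/core_combos) ** N_BEADS, computed via log1p for numerical accuracy.
p_absent = math.exp(N_BEADS * math.log1p(-1 / core_combos))

print(f"theoretical diversity: {diversity:.3g}")          # 1.16e+14
print(f"+1..+3 combinations:   {core_combos}")             # 6859
print(f"contexts per combo:    {contexts_per_core:.2g}")   # 1.7e+10
print(f"mean beads per combo:  {mean_copies:.0f}")         # 109
print(f"P(combo absent):       {p_absent:.1e}")
```

Under this idealized sampling model, the chance that any particular +1 to +3 combination is missing is vanishingly small, which is the quantitative basis for the confidence expressed above.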
The proposed optimal binding motif for STAT1, as derived from our pool sequencing experiments with recombinant STAT1t, is XXYY*(D/E)(P/R)RHXX. In the single peptide sequencing approach, none of the 27 randomly selected and sequenced peptides had the optimal Y*DPR sequence. This could happen by chance. Alternatively, the pool sequence motif could be a superimposition of more than one binding motif. For example, none of the single peptide sequences show a combination of a negative charge at +1 and a proline at +2. The BiaCore data also favor more than one optimal binding motif: the three peptides IFNGR1-Y440, 1-4, and 1-9 fit the above motif at only one of the three positions carboxyl-terminal to the phosphotyrosine, but they bind better to the STAT1 SH2 domain than the 1-Pool 1 peptide with its Y*DPR sequence (Table III). Based on the combined analysis of the pool sequencing, the individual peptide sequences, and the BiaCore competition assay, we propose the sequence Y*(D/E)(P/R)(R/P/Q) as the binding motif for STAT1, with mutual exclusion of negatively charged amino acids at position +1 and proline at position +2.

Known STAT phosphotyrosine sites (table excerpt; positions −5 to +6 relative to Y*): STAT1-Y701, PKGTGY*IKTELI; STAT2-Y690, QERRKY*LKHRLI; STAT3-Y705, GSAAPY*LKTKFI.

The definition of the binding motif for the STAT3 SH2 domain is hampered by the nonspecific binding of arginine to the full-length STAT proteins. We were unable to obtain enough soluble recombinant protein of a truncated STAT3 lacking the carboxyl- and amino-terminal domains, and therefore could not directly determine the peptide binding preferences of such a short STAT3. The collection of 27 peptides selected by full-length STAT3 will contain both phosphopeptides that bound to the SH2 domain and arginine-rich peptides that bound nonspecifically to domains outside the SH2 domain, and we cannot reliably identify the true SH2 domain ligands.
Of the six peptides randomly chosen for resynthesis and BiaCore competition analysis, only two (3-14 and 3-29) had reasonably low ID50 values in the range of the known docking sites on the gp130 receptor. Interestingly, the consensus peptide calculated from the 27 individually sequenced STAT3 peptides with the computer program LINEUP has a very low ID50 value; it binds over 10 times better to the STAT3 SH2 domain than the gp130-Y767 peptide. Both peptides have positively charged amino acids at positions +1 and +2 and a glutamine at position +3, but the arginine at position +2 of the consensus peptide is apparently bound better than the histidine at the same position in gp130-Y767. Based on the combined analysis of the pool sequencing, the individual peptide sequences, and the BiaCore competition assay, we propose the sequence Y*-(basic/hydrophobic)-(P/basic)-Q as the binding motif for STAT3.
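The two proposed motifs can be written as position-wise rules on the residues at +1 to +3. The sketch below counts how many of these positions fit each motif and flags the STAT1 exclusion (D/E at +1 together with P at +2). Peptides are written without hyphens, with Y* marking the phosphotyrosine; the set of hydrophobic residues is an assumption, and a match count is only a descriptive rule, not an affinity predictor.

```python
ACIDIC = set("DE")
BASIC = set("RKH")
HYDROPHOBIC = set("AVILMFW")  # assumption: the text does not list the residues

Motif = tuple[set[str], set[str], set[str]]

# Allowed residues at +1, +2, +3 for each proposed motif.
STAT1_MOTIF: Motif = (ACIDIC, {"P", "R"}, {"R", "P", "Q"})         # Y*(D/E)(P/R)(R/P/Q)
STAT3_MOTIF: Motif = (BASIC | HYDROPHOBIC, {"P"} | BASIC, {"Q"})   # Y*-(basic/hydrophobic)-(P/basic)-Q


def downstream(peptide: str, n: int = 3) -> str:
    """Return the n residues following the phosphotyrosine marker 'Y*'."""
    start = peptide.index("Y*") + 2
    return peptide[start:start + n]


def motif_score(peptide: str, motif: Motif) -> int:
    """Count how many of positions +1..+3 fall in the motif's allowed sets."""
    return sum(aa in allowed for aa, allowed in zip(downstream(peptide), motif))


def violates_stat1_exclusion(peptide: str) -> bool:
    """True if a negative charge at +1 co-occurs with a proline at +2."""
    r = downstream(peptide, 2)
    return len(r) == 2 and r[0] in ACIDIC and r[1] == "P"


for name, pep in [("1-Pool 1", "VDYKYY*DPRHDL"),
                  ("3-Pool 2", "VKYKDY*KPQYAY"),
                  ("STAT3 consensus", "VXRXRY*RRQXRX")]:
    print(f"{name:16s} STAT1 {motif_score(pep, STAT1_MOTIF)}/3 "
          f"(exclusion violated: {violates_stat1_exclusion(pep)}), "
          f"STAT3 {motif_score(pep, STAT3_MOTIF)}/3")
```

With these rules, 1-Pool 1 fits all three STAT1 positions yet violates the exclusion, which is consistent with its weaker binding relative to IFNGR1-Y440 reported in Table III.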
The BiaCore competition assays reveal another basic property of the Jak-STAT signaling pathway. The affinities of the STAT SH2 domains for the phosphotyrosine peptides of other STAT proteins in homo- or heterodimers are much lower than for peptides corresponding to receptor docking sites (Table III). The STAT2-Y690 peptide is an exception: its ID50 value for the STAT1-IFNGR1-Y440 interaction is in the range of receptor docking sites. Phosphorylated STAT2 could therefore serve as a docking site for both the STAT1 and the STAT3 SH2 domains. In fact, no STAT1 docking sites have been identified on IFN-α receptors, whereas possible STAT2-docking sites were found (29,30). Furthermore, the activation of STAT1 by IFN-α depends on the presence of a functional STAT2 (31). The mechanism of IFN-α-induced STAT3 activation is not yet clear. IFNAR1-Y527, with the sequence Y*SSQ, has been proposed as the docking site for STAT3 (32), but this peptide is a very weak inhibitor of STAT3 binding to gp130-Y767 (Table III). We believe that the STAT2-Y690 site is a better candidate docking site for STAT3 on the IFN-α receptor. How do the STATs form dimers if their affinities for the tyrosine sites of the partner STATs are so much lower than for the receptor docking sites? The only possibility is the formation of reciprocal phosphotyrosine-SH2 domain interactions and thereby high-avidity complexes. IFN-α induces the formation of STAT1 homodimers, STAT1-STAT2 heterodimers, STAT1-STAT3 heterodimers, and STAT3 homodimers, but no STAT2-STAT3 heterodimers are observed. Our ID50 measurements demonstrate that the STAT3 SH2 domain binds STAT2-Y690 with an affinity similar to that of the STAT1 SH2 domain. Therefore, the lack of STAT2-STAT3 heterodimer formation is probably caused by the inability of the STAT2 SH2 domain to bind the STAT3-Y705 site.
Selection and sequencing of peptides that bind with high affinity to the SH2 domains of STATs can potentially help to identify new STAT-binding proteins. We performed a BLAST search with pentapeptides containing the tyrosine and the four carboxyl-terminal amino acids (positions +1 to +4) from our lists of STAT1- and STAT3-binding peptides (Tables I and II, respectively). We searched the human genome database at NCBI using the preset parameters for short sequences. Table V shows a selection of hits obtained with STAT1-binding peptides. As already mentioned, peptide 1-9 is identical to the known STAT1-docking site on the IL-9 receptor (33). Interestingly, peptide 1-6 is identical to the four carboxyl-terminal amino acids of the long isoform of IFNAR2. The other proteins listed in Table V have not been implicated in Jak-STAT signaling so far. However, several kinases and phosphatases are among them, and further work might reveal new cross-talk between them and the Jak-STAT signaling pathway.
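The query construction described above, taking the tyrosine and positions +1 to +4 from each selected peptide, can be sketched as follows. BLAST itself is not reproduced here; an exact substring scan over a small dictionary stands in for the database search, and the protein names and sequences in it are invented placeholders, not real proteins.

```python
def pentapeptide_query(peptide: str) -> str:
    """Tyrosine plus the four residues carboxyl-terminal to it (phospho mark dropped)."""
    i = peptide.index("Y*")
    return "Y" + peptide[i + 2:i + 6]


def exact_hits(query: str, proteins: dict[str, str]) -> list[tuple[str, int]]:
    """Return (protein, 1-based tyrosine position) for every exact occurrence of query."""
    hits = []
    for name, seq in proteins.items():
        start = seq.find(query)
        while start != -1:
            hits.append((name, start + 1))
            start = seq.find(query, start + 1)
    return hits


# Hypothetical sequences for illustration only; not real proteins.
TOY_PROTEINS = {
    "toy_kinase": "MSTGKLYDPRHLLEQ",
    "toy_phosphatase": "MAEEYKPQYAGGSYDPRHK",
}

query = pentapeptide_query("VDYKYY*DPRHDL")  # STAT1 pool peptide 1-Pool 1
print(query, exact_hits(query, TOY_PROTEINS))
```

An exact scan only finds identical pentapeptides; BLAST with short-query settings additionally scores near matches, which is why it was the tool of choice for the database search.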
We have shown here the feasibility and value of the large bead peptide library approach for the analysis of the phosphopeptide binding specificities of SH2 domain-containing proteins. An important potential benefit of our approach is the identification of peptides with very high affinities for a specific SH2 domain. Such peptides could serve as models for the design of specific inhibitory molecules. In the case of STAT1, the known IFNGR1-Y440 docking site is such a high-affinity peptide, and the 1-4 library peptide has an even lower ID50 value. Both peptides have low affinities for the STAT3 SH2 domain and could serve as models for a STAT1-specific inhibitory molecule. For STAT3, the pool sequence-derived motif and the computer consensus motif have ID50 values of 0.34 and 0.64 μM, respectively. These peptides have comparatively low affinities for STAT1 and might serve as models for specific STAT3 inhibitory molecules.