The Structure of Prp40 FF1 Domain and Its Interaction with the crn-TPR1 Motif of Clf1 Gives a New Insight into the Binding Mode of FF Domains*

The yeast splicing factor Prp40 (pre-mRNA processing protein 40) consists of a pair of WW domains followed by several FF domains. The region comprising the FF domains has been shown to associate with the 5′ end of U1 small nuclear RNA and to interact directly with two proteins, the Clf1 (Crooked neck-like factor 1) and the phosphorylated repeats of the C-terminal domain of RNA polymerase II (CTD-RNAPII). In this work we reported the solution structure of the first FF domain of Prp40 and the identification of a novel ligand-binding site in FF domains. By using chemical shift assays, we found a binding site for the N-terminal crooked neck tetratricopeptide repeat of Clf1 that is distinct and structurally separate from the previously identified CTD-RNAPII binding pocket of the FBP11 (formin-binding protein 11) FF1 domain. No interaction, however, was observed between the Prp40 FF1 domain and three different peptides derived from the CTD-RNAPII protein. Indeed, the equivalent CTD-RNAPII-binding site in the Prp40 FF1 domain is predominantly negatively charged and thus unfavorable for an interaction with phosphorylated peptide sequences. Sequence alignments and phylogenetic tree reconstructions using the FF domains of three functionally related proteins, Prp40, FBP11, and CA150, revealed that Prp40 and FBP11 are not orthologous proteins and supported the different ligand specificities shown by their respective FF1 domains. Our results also revealed that not all FF domains in Prp40 are functionally equivalent. We proposed that at least two different interaction surfaces exist in FF domains that have evolved to recognize distinct binding motifs.

Splicing of pre-mRNA is catalyzed by the spliceosome, a large ribonucleoprotein complex composed of five snRNPs and about 100 accessory, non-snRNP splicing factors (1). In the early stages of spliceosome assembly, the splicing factor Prp40 associates with the U1 snRNP and plays an important role in bringing the 5′ and the 3′ splice sites into spatial proximity (2). Prp40 is a modular protein consisting of an N-terminal WW domain pair followed by several FF domains. Whereas the Prp40 WW domains bind to proline-rich sequences in the branch point-binding protein and the U5 snRNP-associated protein Prp8 (2,3), the region spanning the FF domains has been shown to associate with the 5′ end of U1 snRNP (4) and to interact with at least two different proteins: the splicing factor Clf1/Syf3p (crooked neck-like factor 1/synthetic lethal with cdc 40) (5-7), and the C-terminal domain (CTD) of the largest subunit of RNA polymerase II (RNAPII) (8).
FF domains were first identified in the murine splicing factor FBP11 (formin-binding protein 11) as a repeated sequence of about 60 amino acids containing two conserved phenylalanine (F) residues that give the domain its name (9). Although protein-interaction modules are commonly found in functionally unrelated proteins, FF domains are primarily present in only the following three protein families: the splicing factors FBP11 and Prp40, the transcription factor CA150, and p190 RhoGTPase-related proteins (9). With only a few exceptions, FF domains are arranged in arrays of up to six domains that in the case of CA150 seem to create multiple independent binding sites rather than to confer binding cooperativity (10). Most interestingly, Prp40, FBP11, and CA150 have all been shown to recognize phospho-CTD repeats through their FF domains (11). Some progress has been made in understanding how FF domains recognize phosphorylated ligands. In this respect, the solution structure of the N-terminal FF domain (FF1) of the human Huntingtin yeast partner (HYPA)/FBP11 has shown that the FF domain fold consists of three α-helices arranged as an orthogonal bundle and a 3₁₀ helix in the loop connecting the second and the third helix (12). By using chemical shift perturbation experiments, Bycroft and co-workers (12) were able to identify the binding site of the HYPA/FBP11 FF1 domain for a peptide corresponding to two doubly phosphorylated RNAPII CTD repeats, leading to the suggestion that FF domains may represent a new class of phosphopeptide-binding modules. Most significantly, the phospho-ligand recognition site described for the HYPA/FBP11 FF1 domain is only conserved in a particular subset of FF
* This work was supported by Human Frontier Science Program Research Grant RG0234/2000M and by the Spanish Ministerio de Educación y Ciencia Grants GEN2003-20642-C09-04/NAC and BFU2005-06276/BMC. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact. This paper is dedicated to Professor Manuel Grande Benito on the occasion of his 60th birthday. The on-line version of this article (available at http://www.jbc.org) contains Supplemental Figs. S1-S6. The atomic coordinates and structure factors (
For the splicing factor Prp40, previous studies have shown that a region including its first FF domain interacts with the protein Clf1. Clf1/Syf3p is an essential and well conserved multifunctional protein involved in cell cycle progression, pre-mRNA splicing, and initiation of DNA replication in yeast (5-7). Clf1 is composed of 15 crooked neck-like tetratricopeptide repeats (crn-TPR) (5,6,13). TPR motifs represent a class of ubiquitous protein-interaction modules of partially conserved sequence. Although some RNAPII CTD heptapeptides form β-turns at the central SPTS motif in the presence of the target protein (14-16) and others adopt an extended conformation (17), TPR motifs in general fold as a pair of anti-parallel α-helices. Repeats of TPR motifs often assemble into right-handed helical superstructures creating extensive binding surfaces with diverse specificities (18). Furthermore, in contrast to the RNAPII CTD, phosphorylation does not seem to occur in TPR motifs. The diversity not only in the function but also in the structure of these Prp40 FF domain binding partners raises important questions as to whether distinct binding specificities exist in FF domains and how the FF domain fold enables this potential promiscuity in ligand binding.
To gain more insight into the interaction mode of FF domains, we have determined the solution structure of the first FF domain of Prp40, and we characterized its interaction with the first crn-TPR motif (crn-TPR1) of Clf1 by chemical shift perturbation experiments. Most interestingly, the interaction surfaces of the Prp40 and HYPA/FBP11 FF1 domains are adjacent but structurally separate. NMR binding studies with the Prp40 FF1 domain using phosphorylated and unphosphorylated RNAPII CTD repeats revealed no interaction. Furthermore, no interaction was observed between the C-terminal FF domain (FF4) of Prp40 and the Clf1 crn-TPR1 motif showing that not all FF domains in the splicing factor Prp40 are functionally equivalent. Sequence alignments and phylogenetic reconstructions using the FF domains of Prp40, HYPA/FBP11, and CA150 proteins reveal that Prp40 and HYPA/FBP11 are not orthologous proteins and support the different ligand specificities shown by their respective FF1 domains. Taken together, at least two spatially separate interaction surfaces seem to exist in FF domains that recognize distinct binding motifs.

MATERIALS AND METHODS
Sequence Alignment and Phylogenetic Tree Reconstructions-Sequence alignments were performed using T-Coffee (19). For initial alignments, all predicted FF domain sequences in the literature (8) and those found in databases, such as smart.embl-heidelberg.de and www.sanger.ac.uk/Software/Pfam, were used. After an initial alignment, FF domains that did not satisfy our confidence level were manually removed, resulting in the final alignment (see supplemental Fig. 1). Proteins annotated in databases as hypothetical were classified by domain architecture and length of the linkers connecting the WW domains as follows: proteins containing two WW domains separated by a short (~15 aa) conserved linker were initially classified as similar to FBP11 proteins, whereas proteins with two or more WW domains separated by long, variable linkers were categorized as CA150-related. Because in both Saccharomyces cerevisiae and Schizosaccharomyces pombe there is only a single Prp40 protein, whereas all remaining organisms contain at least one CA150-type (transcription factor) and one FBP11-type (splicing factor) protein, independence of the Prp40 subclass was maintained for the analysis. As sequence identity between equivalent FF domains within each of the three protein families is higher than 75%, two, three, and four representative protein sequences were selected from Prp40, CA150, and FBP11 proteins, respectively, for sequence alignment. Phylogenetic trees were generated with Phylip 3.5 based on the previous alignment. Distance matrices were computed from the protein sequences with the protdist program employing the Dayhoff PAM matrix, and trees were built from them with the neighbor program. The final tree is shown in Fig. 5A. The SwissProt/UniProt accession numbers of the protein sequences are as follows: CG3542, Drosophila melanogaster FBP11; Q90WG3, Gallus gallus FBP11 (formin-binding protein 11-related protein; the FF domain-containing region is identical to human protein FNB3_HUMAN (Huntingtin yeast partner A/Huntingtin-interacting protein HYPA/FBP11), and because of this we named the sequence Hs/Gg HYPA/FBP11); Q6NWY9, Homo sapiens HYPC/FBP11 (Huntingtin-interacting protein C/formin-binding protein 11-related protein); Q8IEF0, Plasmodium falciparum FBP11; O14776, H. sapiens CA150; Q86MP, Caenorhabditis elegans CA150; Q7PMV0, Anopheles gambiae CA150; O14176, S. pombe Prp40; and P33203, S. cerevisiae Prp40.
Sample Preparation-DNA fragments encoding the respective FF domains and TPR motifs were amplified by PCR using genomic S. cerevisiae DNA as template. FF domain constructs correspond to residues 134-189 (FF1) and 121-189 (extended FF1), to residues 488-552 and 465-552 (FF4 and extended FF4) of Prp40, and to residues 212-266 of Ypr152. Protein constructs of TPR motifs correspond to residues 31-64 and 1-64 (crn-TPR1 and extended crn-TPR1) of Clf1. The gene fragments were inserted into a modified pET24d vector (G. Stier, EMBL Heidelberg, Germany) enabling production of the N-terminal fusion protein with a His tag followed by glutathione S-transferase and a TEV protease cleavage site. The proteins were expressed in Escherichia coli strain BL21(DE3) and purified in three steps. After affinity purification with glutathione-Sepharose, the fusion protein was cleaved with an N-His-tagged TEV protease, and both the protease and the glutathione S-transferase proteins were separated from the domain by a Ni²⁺-chelation step and gel filtration chromatography. Protein purity was determined by SDS-PAGE. All constructs contain two to four additional residues (GA, GAM, and GAMD) preceding the domain that result from primer design and the TEV cleavage site.
To prepare ¹³C- and/or ¹⁵N-labeled samples, cells were grown in minimal medium (M9) with D-[¹³C]glucose and/or ¹⁵NH₄Cl as sole sources of carbon and nitrogen, respectively. NMR samples of the Prp40 FF1 domain and Clf1 crn-TPR1 motif had a protein concentration of ~1 mM for structure determination and were dissolved in 20 mM sodium phosphate buffer, 100 mM NaCl, 0.02% (w/v) NaN₃ in 90% H₂O, 10% D₂O or 100% D₂O at pH 6.2.
NMR Spectroscopy-Except for chemical shift perturbation experiments, all NMR data were collected at 285 K for the Prp40 FF1 domain and at 295 K for the Clf1 crn-TPR1 motif on Bruker DRX-500, DRX-600, and DRX-800 NMR spectrometers equipped with triple-resonance z-gradient probes. All spectra were processed with the NMRPipe/NMRDraw package (20) and were analyzed with XEASY (21), whereas ¹⁵N relaxation data were analyzed using NMRView (22). ¹H, ¹³C, and ¹⁵N chemical shifts were assigned by standard methods (23) using ¹H,¹⁵N-HSQC, HNCA, CBCANH, CBCA(CO)NH, and HNHα-J experiments for backbone assignments and ¹H-¹³C-CT-HSQC, HC(CO)NH-, C(CO)NH-, and HC(C)H-TOCSY experiments for aliphatic side chain assignments. For the Prp40 FF1 domain, aromatic side chain assignments were achieved using two-dimensional (Hβ)Cβ(CγCδ)Hδ and (Hβ)Cβ(CγCδCε)Hε experiments (24), two-dimensional homonuclear ¹H-TOCSY and NOESY, and three-dimensional ¹³C-edited NOESY experiments, whereas the three-dimensional ¹³C-edited NOESY was sufficient to assign aromatic side chains in the Clf1 crn-TPR1 motif. Heteronuclear {¹H}-¹⁵N NOE experiments were recorded at 500.13-MHz proton frequency by standard two-dimensional methods (25) using a 1.0 mM ¹⁵N-labeled sample. The heteronuclear NOE experiments were run twice in an interleaved fashion with and without (reference experiment) proton saturation during the recovery delay. Errors in the peak intensities were estimated from the average base-line noise in the spectra of repeated experiments. {¹H}-¹⁵N NOEs were determined as the peak intensity ratio between the reference and the saturation experiment and are the average of two measurements.
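The heteronuclear NOE evaluation described above can be illustrated with a short sketch. It assumes the conventional definition NOE = I_sat/I_ref (intensity with proton saturation divided by the reference intensity) and first-order propagation of the base-line noise; the function names and the numbers in the usage note are hypothetical and are not taken from the data of this work.

```python
import math


def het_noe(i_sat, i_ref, noise_sat, noise_ref):
    """Return (NOE, error) for one amide peak.

    Assumption: NOE = I_sat / I_ref, with the uncertainty obtained by
    first-order propagation of the base-line noise of both spectra.
    """
    noe = i_sat / i_ref
    err = math.sqrt((noise_sat / i_ref) ** 2
                    + (i_sat * noise_ref / i_ref ** 2) ** 2)
    return noe, err


def average_noe(measurements):
    """Average repeated (NOE, error) pairs, e.g. the two interleaved runs."""
    n = len(measurements)
    mean = sum(value for value, _ in measurements) / n
    # Error of the mean of independent measurements.
    err = math.sqrt(sum(e ** 2 for _, e in measurements)) / n
    return mean, err
```

For example, `het_noe(80.0, 100.0, 2.0, 2.0)` gives an NOE of 0.8 with an error of about 0.026. Values near zero or negative, as reported for the N-terminal extensions, would indicate flexible residues.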
Structure Calculation-Inter-proton distance restraints were derived from fully assigned peaks in three-dimensional ¹⁵N- and ¹³C-edited NOE experiments (using mixing times of 140 ms) integrated with the XEASY package. Hydrogen bond restraints were applied as indicated by ³J(HN,Hα) coupling constants, ¹³Cα and ¹³Cβ secondary chemical shifts, and NOE patterns. Restraints for the backbone angles φ and ψ were derived from the program TALOS (26) and were applied where they agreed with the experimental angles determined by quantitative ³J(HN,Hα) correlation experiments (27). For the dihedral angles predicted by TALOS, the upper and lower restraint boundaries were doubled as compared with those suggested by the program. ³J(N,Cγ) coupling constants were measured by spin-echo difference experiments (28) to restrain the side chain angle χ1 to 180 ± 40° for trans-rotamers and to 0 ± 90° for gauche-rotamers. For structure calculation the programs CNS (29) and ARIA 2.0 (30, 31) were used with a mixed torsion angle and Cartesian dynamics simulated annealing protocol. Structures were calculated in 8 iterations producing 30 structures in each of the first 7 iterations and 50 structures in the final iteration. The 20 lowest energy structures of the final iteration were submitted to water refinement. The quality of the 10 lowest energy structures was analyzed after water refinement using the programs CNS and PROCHECK-NMR (32), and the results are summarized in Table 1.
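As an illustration of how the side chain χ1 windows quoted above (180 ± 40° for trans and 0 ± 90° for gauche rotamers) can be checked against a calculated structure, the following sketch tests whether a dihedral angle lies inside a periodic restraint window. It is not the CNS/ARIA implementation; the function and dictionary names are hypothetical.

```python
def angular_difference(a, b):
    """Smallest signed difference a - b in degrees, in the range (-180, 180]."""
    d = (a - b) % 360.0
    if d > 180.0:
        d -= 360.0
    return d


def satisfies_restraint(angle, center, half_width):
    """True if `angle` lies within center +/- half_width, respecting periodicity."""
    return abs(angular_difference(angle, center)) <= half_width


# Windows as stated in the text: (center, half-width) in degrees.
CHI1_RESTRAINTS = {"trans": (180.0, 40.0), "gauche": (0.0, 90.0)}
```

With these windows, a χ1 of -170° satisfies the trans restraint because the comparison wraps around ±180°, whereas a χ1 of 120° does not.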
Ultracentrifugation Experiments on the crn-TPR Motif-Equilibrium analytical centrifugation of the crn-TPR motif at starting concentrations of 0.1 and 0.3 mM was performed on a Beckman Optima XL-I centrifuge at 285 and 295 K to investigate temperature-dependent aggregation of the crn-TPR1 motif. Equilibrium was achieved at 55000 rpm after 12 h on a Ti60 rotor. Curves were fitted with the ULTRASCAN software to a monomer-dimer equilibrium profile (see supplemental Fig. 4).
Models of the crn-TPR Motif-These models were generated with InsightII using the NMR data to define the boundaries of the helices and their relative orientation (see supplemental Fig. 5). Available structures of TPR motifs were not used as templates to generate the model because the crn-TPR motif used in this work is quite divergent with respect to the canonical TPR repeats characterized so far. In them, the packing of both helices is normally stabilized by contacts between the highly conserved residues WXXXGXXL from the first helix that form a hydrophobic pocket and the side chain of a conserved aromatic residue of the second helix that fits into the pocket. In the case of the crn-TPR1 repeat, the equivalent positions of Trp and Gly are occupied by an Asn and a Glu, respectively, unlikely to form the classical cavity. As such, models that would have been generated only by sequence homology would have given an orientation of the helices different from the one obtained in this work, which is based on NOE data.
Chemical Shift Mapping Experiments-Chemical shift perturbation experiments with the ¹⁵N-labeled Prp40 FF1 domain and unlabeled Clf1 crn-TPR1 were performed at 600-MHz proton frequency on a Bruker DRX-600 NMR spectrometer at 295 K using two-dimensional ¹H,¹⁵N-HSQC experiments. All samples were dissolved in the same NMR buffer as described above. The ¹⁵N-labeled proteins were 0.5 mM in concentration. Unlabeled Clf1 crn-TPR1 was added to the ¹⁵N-labeled Prp40 FF1 domain to a final molar ligand:protein ratio of ~2:1. As a control experiment unlabeled Clf1 TPR1 was added to the ¹⁵N-labeled C-terminal FF4 domain of Prp40 to a final molar ratio of ~3:1 (data not shown).
For binding studies with the ¹⁵N-labeled Prp40 FF1 domain and peptides derived from the RNAPII CTD, a synthetic doubly phosphorylated CTD repeat with the sequence SYpSPTpSPS (where pS indicates phosphoserine) and an unphosphorylated CTD repeat with the sequence (YSPTSPS)₂ were purchased from MWG Biotec AG (Ebersberg, Germany). A (YpSPTpSPS)₂ peptide was synthesized in-house by using the Fmoc strategy. The Fmoc-Ser(PO(benzyloxy)OH)-OH amino acid was purchased from Novabiochem. The final peptide was cleaved with 95% trifluoroacetic acid, 5% H₂O and precipitated in cold ether. The crude material was purified by preparative HPLC to 90% purity (as characterized by HPLC-MS). The peptides were added to the ¹⁵N-labeled Prp40 FF1 domain up to a molar peptide:protein ratio of ~7:1. The NMR data corresponding to the (YpSPTpSPS)₂ titration were acquired on a Bruker DRX-800 NMR spectrometer (see supplemental Fig. 6). Average chemical shift changes upon ligand binding were calculated with the equation $\Delta\delta_{\mathrm{av}} = \left((\Delta\delta_{1\mathrm{H}})^2 + (\Delta\delta_{15\mathrm{N}}/n)^2\right)^{1/2}$, where $\Delta\delta_{1\mathrm{H}}$ and $\Delta\delta_{15\mathrm{N}}$ are the linear changes along the ¹H and ¹⁵N axes, respectively, and $n$ is the ratio of the chemical shift dispersion in ¹⁵N and ¹H. Chemical shift perturbations were determined to be significant if they were >0.2 ppm.
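The averaged chemical shift change defined above, together with the 0.2 ppm significance threshold, can be sketched as follows. The value of the scaling factor n is not stated in the text, so it is left as a parameter; the residue labels and peak positions used in the usage note are invented for illustration only.

```python
import math


def average_shift_change(delta_h, delta_n, n):
    """Combined 1H/15N shift change in ppm: sqrt(dH^2 + (dN / n)^2)."""
    return math.sqrt(delta_h ** 2 + (delta_n / n) ** 2)


def significant_residues(free, bound, n, threshold=0.2):
    """Return {residue: change} for residues shifting by more than `threshold` ppm.

    `free` and `bound` map residue labels to (1H ppm, 15N ppm) peak positions.
    Residues missing from `bound` (e.g. broadened beyond detection) are skipped
    here and would need to be flagged separately.
    """
    hits = {}
    for residue, (h_free, n_free) in free.items():
        if residue not in bound:
            continue
        h_bound, n_bound = bound[residue]
        change = average_shift_change(h_bound - h_free, n_bound - n_free, n)
        if change > threshold:
            hits[residue] = change
    return hits
```

With hypothetical peak lists `free = {"res_1": (8.10, 120.0), "res_2": (8.50, 118.0)}` and `bound = {"res_1": (8.40, 122.0), "res_2": (8.52, 118.1)}` and an assumed n = 5, only `res_1` exceeds the threshold, with a change of about 0.5 ppm.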
Affinity Measurements-The dissociation constant of the Prp40 FF1-Clf1 TPR1 complex was determined as 150 ± 20 µM by fluorescence spectroscopy, titrating a 60 µM solution of the Prp40 FF1 domain with a 1.1 mM solution of the Clf1 crn-TPR motif. Both samples were used in the same buffer as for the NMR experiments. The temperature was set to 295 K to avoid dimerization of the ligand. The excitation wavelength was 297 nm. Changes in the intensity of the fluorescence signal at 343 nm upon addition of the ligand were used for calculating the dissociation constant.
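The fitting procedure is not detailed in the text. As a minimal sketch, assuming a simple 1:1 binding model without dilution correction, the concentration of complex follows from the quadratic solution of the binding equilibrium, and a dissociation constant can be estimated by scanning candidate values against the titration signal. All function names are hypothetical, and this is not presented as the authors' analysis.

```python
import math


def complex_conc(p_tot, l_tot, kd):
    """Concentration of the 1:1 complex (same units as the inputs)."""
    s = p_tot + l_tot + kd
    return (s - math.sqrt(s * s - 4.0 * p_tot * l_tot)) / 2.0


def fraction_bound(p_tot, l_tot, kd):
    """Fraction of protein in complex with ligand."""
    return complex_conc(p_tot, l_tot, kd) / p_tot


def fit_kd(p_tot, l_tots, signals, kd_grid):
    """Grid-search estimate of (Kd, amplitude) for signal = amplitude * fraction bound.

    For each candidate Kd the amplitude is obtained by linear least squares;
    the candidate with the smallest sum of squared residuals is returned.
    """
    best = None
    for kd in kd_grid:
        f = [fraction_bound(p_tot, l, kd) for l in l_tots]
        denom = sum(x * x for x in f)
        if denom == 0.0:
            continue
        amp = sum(x * y for x, y in zip(f, signals)) / denom
        sse = sum((amp * x - y) ** 2 for x, y in zip(f, signals))
        if best is None or sse < best[0]:
            best = (sse, kd, amp)
    return best[1], best[2]
```

Under this model, `fraction_bound(0.5, 1.0, 0.15)` (concentrations in mM) evaluates to 0.8, so with a dissociation constant of 150 µM roughly 80% of a 0.5 mM domain sample would be ligand-bound at the 2:1 ligand:protein ratio used in the chemical shift mapping experiments.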
Homology Models-FF2 homology models were generated using MODELLER6.1 (33) based on the sequence alignment shown in supplemental Fig. 1, the lowest energy structure of the Prp40 FF1 domain and the solution structure of the HYPA/FBP11 FF1 domain (Protein Data Bank entry 1uzc).
Other Methods-Figures of three-dimensional structures and surface representations were prepared with MOLMOL (34). Solvent-accessible surface areas were calculated using NACCESS (35). NMR data were represented using XWinPlot, version 3.1.

RESULTS
Prp40 FF1 Domain Boundaries-Because sequence alignments produced consistent domain boundaries for the N- and the C-terminal Prp40 FF domain (hereafter referred to as FF1 and FF4, respectively (8,9,36)), we designed two initial constructs of the Prp40 FF1 and FF4 domain (Prp40 aa 134-189 and Prp40 aa 488-551, respectively). We also expressed a 55-residue construct that corresponds to the unique FF domain present in Ypr152. The three constructs yielded folded samples as judged from two-dimensional ¹H,¹⁵N correlation (HSQC) spectra. To evaluate further whether the charged and hydrophobic residues preceding the Prp40 FF1 and FF4 domains, respectively, were important for the domain fold, protein constructs with extended N termini were prepared, extended FF1 (aa 121-189) and extended FF4 (aa 471-551). However, in the two-dimensional ¹H,¹⁵N-HSQC spectra of the extended constructs, we did not observe chemical shift changes for the folded residues as compared with the ones in the initial constructs. The additional peaks displayed random coil chemical shifts, and in addition, heteronuclear NOE experiments showed that they have negative or very small intensities, supporting the conclusion that the N-terminal extensions of both Prp40 FF1 and FF4 domains were unstructured in solution (HSQC data shown for both FF4 constructs, supplemental Fig. 2). In general, the flanking regions of FF domains are highly variable in sequence, further indicating that FF domains seem to fold autonomously with about 55 residues. Remarkably, one exception to the rule has already been observed in the HYPA/FBP11 FF1 structure (12), where additional residues located at the N terminus of the FF domain directly interact with the canonical domain fold. When this 10-residue extension is compared with the corresponding region of the Prp40 FF1 domain, no residue is conserved, suggesting that only HYPA/FBP11 FF1-related sequences may display these additional contacts.
Given the direct evidence for an interaction between the Prp40 FF1 domain and the splicing factor Clf1, we decided to focus on the structural characterization of the Prp40 FF1 domain construct without N-terminal extension (aa 134-189).
Description of the Prp40 FF1 Domain Structure-Analysis of the NMR spectra of the Prp40 FF1 domain resulted in almost complete assignment of ¹H, ¹⁵N, and ¹³C resonances. For structure determination, a total of 1192 nonredundant NOE distance restraints were assigned and applied together with 42 hydrogen bond restraints and 59 dihedral angle restraints derived from J-coupling experiments and database mining using the program TALOS (26) (Table 1). The final ensemble of the 10 lowest energy structures is well defined with a backbone (N, Cα, C′ atoms) r.m.s.d. of 0.20 Å to the mean structure (Fig. 1A). The structural statistics for the NMR structure ensemble are summarized in Table 1.
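For readers unfamiliar with the ensemble statistic quoted above, the following sketch shows how an average backbone r.m.s.d. to the mean structure can be computed. It assumes the models have already been superimposed (the fitting step itself is omitted) and that each model is a list of (x, y, z) coordinates for the same backbone atoms in the same order; it is an illustration, not the procedure used by CNS.

```python
import math


def mean_structure(ensemble):
    """Coordinate-wise average over all models of a pre-superimposed ensemble."""
    n_models = len(ensemble)
    n_atoms = len(ensemble[0])
    return [
        tuple(sum(model[i][k] for model in ensemble) / n_models for k in range(3))
        for i in range(n_atoms)
    ]


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally ordered coordinate lists."""
    squared = sum(
        (a[k] - b[k]) ** 2 for a, b in zip(coords_a, coords_b) for k in range(3)
    )
    return math.sqrt(squared / len(coords_a))


def rmsd_to_mean(ensemble):
    """Average r.m.s.d. of the individual models to the mean structure."""
    mean = mean_structure(ensemble)
    values = [rmsd(model, mean) for model in ensemble]
    return sum(values) / len(values)
```

As a toy check, two two-atom models displaced from each other by 2 Å along z each deviate by 1 Å from their mean, so `rmsd_to_mean` returns 1.0.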
The fold of the Prp40 FF1 domain consists of three α-helices, α1 (aa 134-146), α2 (aa 154-163), and α3 (aa 175-187), and a 3₁₀ helix (aa 167-170) located in the loop that connects the second and the third helix (Fig. 1B). As in the previously determined structure of the HYPA/FBP11 FF1 domain (12), an extensive network of semi-conserved aromatic (Phe-139, Phe-154, Tyr-168, Phe-182, and Tyr-185) and aliphatic residues (Ala-135, Ile-140, Leu-143, Ile-157, Leu-161, Val-171, and Pro-175) forms the core of the domain (Fig. 1B). The first loop (aa 147-153) is not disordered, in agreement with the large number of medium and long range NOEs observed for these residues and as judged from the average {¹H}-¹⁵N heteronuclear NOE values (data not shown). Asp-149, Ser-150, Thr-151, and Trp-152 form a type I β-turn, with Val-148 and Trp-152 displaying numerous long range NOEs to the core of the FF domain. The importance of Val-148 for the FF domain fold is underlined by the conservation of aliphatic amino acids at this position in FF domain sequences, whereas Trp-152 is exclusively present in Prp40 FF1 domains. The loop connecting the second and third helix (loop 2) consists of a 3₁₀ helix (residues 167-170) with a conserved DXRY motif whose aromatic residue contributes to the hydrophobic core of the domain. Another interesting feature of this linker is the presence of three successive negatively charged residues (Asp-172 to Asp-174) that are unique to FF1 and FF4 Prp40 sequences.
Comparison with the Structure of the HYPA/FBP11 FF1 Domain-The structure of the N-terminal FF domain of human HYPA/FBP11 has been determined by NMR (12). With a backbone r.m.s.d. of ~0.7 Å (excluding α2 from the fit), the overall fold of the Prp40 FF1 domain is very similar to that described for the HYPA/FBP11 FF1 domain (Fig. 1C). The main differences between the Prp40 and HYPA/FBP11 FF1 domain structures reside in the orientation of the second helix (α2). Although this helix packs tightly to the core of the FF domain in the HYPA/FBP11 structure, the Prp40 structure is more open. This is probably caused by the bulky side chain of Trp-152, which intercalates between the first and second helix. The corresponding residue in the HYPA/FBP11 FF1 domain is an alanine (Ala-409) that is smaller than the tryptophan and requires less space to fit into the structure (Fig. 1C).

TABLE 1 (partial; only the Ramachandran statistics and footnotes were recovered)
Residues in most favored regions: 91.0%
Residues in additionally allowed regions: 9.0%
a No distance restraint was violated by more than 0.3 Å, and no dihedral angle restraint was violated by more than 5°.
b ⟨SA⟩ refers to the ensemble of the 10 structures with the lowest energy.
c E(L-J) is the Lennard-Jones energy calculated using the CHARMM PARMALLH6 parameters. E(L-J) was not included in the target function during the structure calculation.
d Excluding glycine and proline residues.

Although numerous long range NOEs were observed from Trp-152 to Asn-146 (at the C terminus of α1) and to Leu-161 (at the C terminus of α2) for the Prp40 FF1 domain, none of the Ala-409 protons lie within 5 Å of the corresponding residues in the HYPA/FBP11 FF1 domain structure (Lys-403 in α1 and Ile-418 in α2). On the other hand, the Hα proton of Ile-418 (α2) contacts many side chain protons of Leu-399 (α1) in the HYPA/FBP11 FF1 domain structure, whereas no NOEs were observed between the corresponding Hα proton of Leu-161 and the side chain of Met-142 in the Prp40 FF1 domain. Moreover, the second turn of the second helix is irregular in the Prp40 FF1 domain structure, resulting from an unusual pattern of NOEs involving Ile-157 (α(i)) and Leu-161 (N(i+4) and β(i+4), respectively) instead of the regular α(i) to N(i+3)/β(i+3) NOE pattern (supplemental Fig. 3, A and B). It is worth noting that Trp-152 is only conserved in the FF1 domains of yeast Prp40 (S. cerevisiae and S. pombe) and that the sequence of the second helix is the least conserved region in the FF domain sequences (supplemental Fig. 1).
Interaction between Prp40 FF1 and crn-TPR Motifs-Although four proteins containing TPR motifs are present in the spliceosome, only Clf1 has been shown to interact with Prp40. This interaction has been mapped to the N-terminal eight crn-TPR motifs of Clf1 and a region of Prp40 harboring the first FF domain (5-7). For proteins such as Clf1 consisting of multiple structural repeats separated by short linkers, the selection of the correct motif frame is critical, because changes in the motif register will have important structural consequences (37). For construct design, we have used the boundaries of Clf1 crn-TPR motifs published by Ben-Yehuda et al. (13). Of all prepared Clf1 TPR motif constructs, only those corresponding to the N-terminal TPR (crn-TPR1) motif of Clf1 (aa 31-64) and an extended crn-TPR1 motif (aa 1-64) yielded sufficient amounts of soluble and folded proteins for binding studies. To investigate the solubility and monomeric properties of the (aa 31-64) crn-TPR1 motif, sedimentation equilibrium data at different temperatures were acquired. At 295 K (supplemental Fig. 4), the motif behaves as a monomer up to 1 M concentration, whereas at lower temperatures (285 K) there is a detectable monomer-dimer equilibrium at concentrations in the millimolar range. To avoid the presence of a dimer during the binding experiments, these were always carried out at 295 K. Because of the presence of three tryptophan residues in Prp40 FF1, fluorescence spectroscopy was used to determine its affinity for the crn-TPR1 (aa 31-64) construct, which does not contain tryptophan residues, resulting in a medium range interaction (Kd of 150 ± 20 µM).
Chemical Shift Mapping of the Prp40 FF1 Domain Binding Site-Addition of each of the two soluble Clf1 constructs to the ¹⁵N-labeled Prp40 FF1 domain resulted in chemical shift changes for a number of amide resonances with concomitant line broadening (Fig. 2, A-C; data not shown for the extended Clf1 crn-TPR1). Because the same residues of the Prp40 FF1 domain were affected by ligand binding with both crn-TPR1 constructs, it can be concluded that the additional N-terminal residues of the extended crn-TPR1 construct do not contribute to the interaction. As shown in Fig. 2B, the residues involved in binding (those whose amide chemical shifts change more than 0.2 ppm averaged units in Fig. 2C) Most interestingly, the binding site of the Prp40 FF1 domain is spatially close to, but separate from, the phospho-CTD-binding site previously mapped for the HYPA/FBP11 FF1 domain, which is located at the N termini of α1 and α3 (12) (Fig. 2D). In contrast to the predominantly positively charged RNAPII CTD-binding site of the HYPA/FBP11 FF1 domain, most hydrophobic residues with more than 50% solvent-accessible surface area in the Prp40 FF1 domain structure (Ile-158, Pro-166, Trp-169, Met-170, Pro-175, and Leu-176) and a number of negatively charged residues map to the determined Clf1 TPR1-binding surface, underlining the different characters of both binding surfaces. Furthermore, almost all residues in the α2 helix of the Prp40 FF1 domain are involved in the interaction with the crn-TPR1 motif. The slightly different orientation of this helix as compared with the HYPA/FBP11 FF1 domain might thus indeed be of functional importance for the Prp40 FF1 domain.
In contrast to the Prp40 FF1 domain, no interaction was observed between the ¹⁵N-labeled Prp40 FF4 domain and the unlabeled Clf1 TPR1 motif under virtually identical experimental conditions. This suggests that the FF domains present in Prp40 have distinct ligand specificities and thus are not functionally equivalent.
Chemical Shift Perturbation Studies of the Clf1 TPR1 Motif-In order to gain more detailed information about the interaction between the Prp40 FF1 domain and the Clf1 TPR1 motif, we also identified the residues of the Clf1 crn-TPR1 motif involved in the binding. Mainly due to the intrinsic overlap characteristic of α-helical peptides, resonance assignment of the Clf1 crn-TPR1 motif was more difficult than expected for a 36-residue peptide. Despite its quite divergent sequence as compared with canonical TPR motifs, the Clf1 crn-TPR1 motif folds as a pair of α-helices (residues 39-47 and 51-59, respectively) connected by a short, positively charged loop as determined by ¹³Cα and ¹³Cβ secondary chemical shifts, heteronuclear {¹H}-¹⁵N NOEs, and short to medium range NOEs (supplemental Fig. 5). Because of spectral overlap, the positively charged loop could only be partially assigned. However, a set of long range NOEs between both helices was identified and used to build a model of the crn-TPR1 motif that best represents the experimental data (Fig. 3A). Addition of the unlabeled Prp40 FF1 domain to the ¹⁵N-labeled crn-TPR1 motif (2:1 FF1:motif ratio) resulted in a number of resonances that shifted and/or disappeared in the ¹H,¹⁵N-HSQC spectra (Fig. 3B). Chemical shift changes were observed in residues located in both α-helices, namely Leu-40, Leu-43, Arg-44, Tyr-46, and Gln-47 in the α1 helix and Thr-52, Glu-53, Glu-55, Tyr-57, Leu-58, and Asn-61 in the α2 helix, with the biggest changes observed for the amides of Leu-43, Glu-53, Tyr-57, and Glu-58. According to the model, the affected residues cluster on one side of the motif (Fig. 3C). The crn-TPR1 sequence used in this work is shown in Fig. 3D. Unfortunately, additional information about the orientation of the Clf1 crn-TPR1 motif on the Prp40 FF1 domain structure could not be obtained from NOESY spectra of the complex because of the line broadening observed.
However, our results demonstrate that the independent N-terminal crn-TPR motif of Clf1 is sufficient for recognizing the Prp40 FF1 domain. This is in agreement with previous studies showing that individual TPR repeats may have different peptide-binding specificities and thus the same binding affinity for their ligands as the full-length protein (38,39).
Interaction Studies with the RNAPII CTD-In a previous study the region spanning the FF domains of Prp40 was shown to interact with hyperphosphorylated RNAPII CTD (8). Furthermore, mapping of the interaction surface between the HYPA/FBP11 FF1 domain and a doubly phosphorylated pair of CTD repeats revealed that two positively charged residues in the FF domain-binding site may recognize the phosphate groups of the CTD (12). We therefore performed chemical shift perturbation studies with the 15N-labeled Prp40 FF1 domain and three synthetic peptides derived from the RNAPII CTD. However, even at high molar ligand:domain ratios, we found that the Prp40 FF1 domain interacted neither with the phosphorylated CTD peptides (SYpSPTpSPS and (SYpSPTpSPS)2, respectively) nor with the unphosphorylated tandem CTD repeat (YSPTSPSYSPTSPS). Titration data for the (SYpSPTpSPS)2 peptide are shown in supplemental Fig. 6. It is worth noting that two positive control experiments using the 15N-labeled Rsp5 WW2 domain and the unlabeled unphosphorylated and phosphorylated CTD peptides, respectively, revealed chemical shift changes (3). Furthermore, in contrast to the RNAPII CTD-binding site of the HYPA/FBP11 FF1 domain, the corresponding surface area of the Prp40 FF1 domain is predominantly negatively charged and therefore unfavorable for binding the phospho-CTD (Fig. 4A). Although we cannot rule out the possibility that the wrong phosphorylation pattern was used in the synthetic peptides employed in our experiments, the negative surface potential of the FF1 domain suggests that other FF domains in Prp40 are responsible for recognizing the phosphorylated RNAPII CTD.

15N-HSQC spectra of the 15N-labeled Prp40 FF1 domain in the absence (reference spectrum in black) and presence of increasing amounts of unlabeled crn-TPR1 motif. Spectra in dark blue, cyan, orange, and red correspond to molar protein:ligand ratios of 1:0.5, 1:1, 1:1.5, and 1:2, respectively. Labels indicate residues affected by ligand binding. B, ribbon representation of the Prp40 FF1 domain displaying amide atoms of residues (represented as blue spheres) that are affected by ligand binding (as defined in C). C, bar representation of the chemical shift changes corresponding to the data shown in A. Chemical shift changes were calculated as the difference between the value obtained at the 1:2 protein:peptide ratio and the equivalent chemical shift corresponding to the free domain, using the equation $\Delta\delta_{\mathrm{av}} = \left((\Delta\delta_{1\mathrm{H}})^{2} + (\Delta\delta_{15\mathrm{N}}/n)^{2}\right)^{1/2}$. Residues that get broadened or that disappear upon ligand binding are represented with blue stars, and residues that change more than 0.2 ppm are represented as blue bars. These two types of residues were considered to be affected upon ligand binding and thus were displayed in B. D, comparison of the binding sites of the Prp40
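The combined chemical shift change defined in the figure legend above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the processing actually used in this work: the function names, the dictionary layout of the peak lists, and the default scaling factor n = 5 are hypothetical (the legend does not give the value of n), whereas the 0.2 ppm cutoff and the treatment of broadened or vanished peaks follow the legend.

```python
import math


def combined_shift(delta_h, delta_n, n=5.0):
    """Combined 1H/15N shift change in ppm: sqrt(dH^2 + (dN / n)^2).

    n is the 15N scaling factor; 5.0 is an assumed default, not a value from the text.
    """
    return math.hypot(delta_h, delta_n / n)


def affected_residues(free, bound, n=5.0, cutoff=0.2):
    """Return residues considered affected by ligand binding.

    free, bound: dicts mapping a residue label to (1H ppm, 15N ppm).
    A residue that is missing from `bound` (or mapped to None) is treated as
    broadened or disappeared and is reported with the value None.
    """
    hits = {}
    for residue, (h_free, n_free) in free.items():
        peak = bound.get(residue)
        if peak is None:
            hits[residue] = None  # peak broadened beyond detection or vanished
            continue
        change = combined_shift(peak[0] - h_free, peak[1] - n_free, n)
        if change > cutoff:
            hits[residue] = change
    return hits


# Invented peak positions, for illustration only.
free_peaks = {"A": (8.00, 120.0), "B": (7.50, 115.0), "C": (9.00, 110.0)}
bound_peaks = {"A": (8.30, 122.0), "B": (7.51, 115.1), "C": None}
print(affected_residues(free_peaks, bound_peaks))
```

In this toy input, residue A exceeds the cutoff, residue B does not, and residue C is flagged because its peak is absent from the bound spectrum.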
Domain Composition of Prp40, FBP11, and CA150 Proteins-Prp40 and HYPA/FBP11 associate with the U1 snRNP in yeast and metazoans, respectively. Given that their first FF domains seem to recognize different ligand motifs, we were interested in clarifying whether Prp40 and FBP11 are orthologous proteins. The overall sequence conservation of FF domains, as already mentioned, is relatively low, leading to discrepancies in the literature regarding the number and location of the yeast Prp40 FF domains. To clarify both questions, we have combined the knowledge of the structure-determining residues of Prp40 and FBP11/HYPA FF1 domains with sequence alignments and phylogenetic tree reconstruction tools using representative sequences from the three functionally related FBP11, Prp40, and CA150 protein families.
In agreement with previous predictions (8), our sequence analysis showed that CA150 proteins contain six FF domains and that S. pombe Prp40 contains five well defined FF domains. As predicted by Bedford et al. (9), S. cerevisiae Prp40 includes only four FF domains. However, we cannot exclude the possibility that two additional FF domains exist in Prp40, as suggested previously (8), but that their sequences have diverged too much to be detected with the level of confidence used in our analysis. In contrast to CA150 proteins, the number of FF domains in FBP11 proteins seems to depend on the organism as follows: human and chicken FBP11 proteins display six, the Drosophila counterpart five, and Plasmodium only three.
Based on the multiple sequence alignment (supplemental Fig. 1), we generated a neighbor-joining tree with the PHYLIP package (Fig. 5A). If Prp40 and FBP11 proteins were orthologous, their FF domains should form a cluster within the phylogenetic tree. If, on the contrary, they were different proteins, some Prp40 FF domains should group together with FBP11 FF domains and others with CA150 FF domains. Indeed, we find that in general domains belonging to a certain subclass are grouped together (for instance, all FBP11 FF1 domains form a cluster as shown in Fig. 5A). Prp40 FF domains, on the other hand, cluster with either CA150 or FBP11 FF domains, depending on the particular FF domain analyzed. For instance, the S. cerevisiae Prp40 FF1 domain clusters with CA150 FF1 domains and is clearly distinct from FF1 domains in FBP11 proteins. This behavior once more supports the different binding specificities observed for Prp40 and HYPA/FBP11 FF1 domains. Furthermore, the Prp40 FF2 domain seems to be more closely related to CA150 FF5 domains and FBP11 FF4 domains than to the corresponding CA150 and FBP11 FF2 domains (Fig. 5A). Taken together, we conclude from this analysis that, despite their presumably similar functions in spliceosome assembly, FBP11 and Prp40 are not orthologous proteins.
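For readers unfamiliar with the clustering step, the neighbor-joining procedure can be sketched as follows. This is a bare-bones illustration of the general algorithm, not the implementation in the package used for Fig. 5A; the function name, the input layout, and the toy distance matrix are hypothetical, and the distances are arbitrary numbers rather than values derived from FF domain sequences.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Build an unrooted tree by neighbor joining and return a Newick string.

    names: list of taxon labels.
    dist:  dict mapping frozenset({a, b}) to the distance between a and b.
    """
    d = dict(dist)  # working copy; grows as internal nodes are created
    nodes = list(names)

    def get(a, b):
        return d[frozenset((a, b))]

    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each node from all the others.
        r = {a: sum(get(a, b) for b in nodes if b != a) for a in nodes}
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * get(p[0], p[1]) - r[p[0]] - r[p[1]],
        )
        dij = get(i, j)
        # Branch lengths from i and j to their new parent node.
        li = 0.5 * dij + (r[i] - r[j]) / (2 * (n - 2))
        lj = dij - li
        parent = f"({i}:{li:.3f},{j}:{lj:.3f})"
        # Distances from the new parent to every remaining node.
        for k in nodes:
            if k != i and k != j:
                d[frozenset((parent, k))] = 0.5 * (get(i, k) + get(j, k) - dij)
        nodes = [k for k in nodes if k != i and k != j] + [parent]

    a, b = nodes
    return f"({a}:{get(a, b):.3f},{b});"


# Toy additive distance matrix in arbitrary units (not data from this work).
toy = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
}
toy_dist = {frozenset(pair): value for pair, value in toy.items()}
tree = neighbor_joining(["a", "b", "c", "d", "e"], toy_dist)
print(tree)
```

In the toy matrix, a and b end up as nearest neighbors, as do d and e; in the same way, domains that share a recent common ancestor are expected to appear as neighbors in a tree built from real alignment distances.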

DISCUSSION
The sequence alignment generated with sequences of three FF domain-containing proteins, Prp40, FBP11, and CA150, reveals that positions contributing to the secondary structure are always occupied by hydrophobic residues with little preference for any particular hydrophobic amino acid. Only a pair of positively charged residues is conserved at the C terminus of FF domains in most sequences. The most divergent part of the sequence is localized in helix 2 and in loop 2, which connects helix 2 to helix 3. Within loop 2, we notice that half of the FF domain sequences contain a DXR(Y/F) motif. The side chain of the aromatic residue forms part of the protein core, which explains its presence in many FF sequences. In Prp40, the ligand-binding site identified in the NMR titrations is mostly formed by the DXR(Y/F) motif, encircled by negatively charged and neutral residues together with two aromatic residues, Trp-169 and Trp-177, only present in the Prp40 FF1 domain. We have also observed that FF4, which does not contain the motif, is unable to interact with the crn-TPR repeat used in this work. It thus seems possible that other ligands yet to be identified may exist as targets for the FF domains that lack the DXR(Y/F) motif, for instance Prp40 FF2 and FF4, several FBP11 FF domains, and members of the p190 RhoGAP family (the latter not included in the alignment given in supplemental Fig. 1). Indeed, the p190-A RhoGAP region consisting of the FF domains has been described recently to interact with the transcription factor TFII-I, and this interaction is regulated by tyrosine phosphorylation on the first FF domain (40). Most interestingly, the phosphorylated tyrosine is located in the loop 2 region. Thus, the FF domains that lack the DXR(Y/F) motif may have a more flexible loop 2, possibly even with the 3₁₀ helix unformed. This flexibility, especially in the case of the p190-A RhoGAP FF1 domain, may allow the kinase to access and phosphorylate the tyrosine.
Because tyrosine phosphorylation has an inhibitory role in the interaction with the transcription factor TFII-I, this may indicate that the tyrosine is directly involved in ligand binding. If this is the case, then both the p190-A RhoGAP FF1- and Prp40 FF1-binding sites may be localized on the same part of the surface. Structural information on RhoGAP FF domains and on the complexes they form will be needed to test this hypothesis. Moreover, the study of the potential effects that phosphorylation may have on the domain structure will also be very valuable to obtain a detailed description of the role of loop 2 in the interaction.
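The presence or absence of the DXR(Y/F) motif discussed above is easy to check in a sequence. The following sketch reads the notation as Asp, any residue, Arg, then Tyr or Phe; the function name and the example sequences are made up for illustration and are not real FF domain sequences.

```python
import re

# D, any residue, R, then Y or F: one reading of the DXR(Y/F) notation.
DXR_MOTIF = re.compile(r"D.R[YF]")


def find_dxr_motif(sequence):
    """Return (1-based start position, matched residues) for every DXR(Y/F) hit."""
    return [(m.start() + 1, m.group()) for m in DXR_MOTIF.finditer(sequence.upper())]


# Invented sequences, for illustration only.
examples = {"with_motif": "MKDARYLLE", "without_motif": "MKDAKYLLE"}
for label, seq in examples.items():
    print(label, find_dxr_motif(seq))
```

A scan of this kind only reports whether the sequence pattern is present; whether the motif sits in loop 2 and adopts the same conformation still has to be judged from the alignment or the structure.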
TPR motifs are not the only targets that can be recognized by FF domains containing the DXR(Y/F) motif. Recently, a number of transcription and splicing factors were identified as potential interaction partners for CA150 (10,11). In particular, the Tat-specific factor 1 (Tat-SF1) encompasses multiple weak binding sites for the CA150 FF domains that conform to the consensus motif (D/E)2/5 F/W/Y (D/E)2/5 (10). The RNAPII CTD consists of up to 52 heptapeptide repeats with the consensus sequence YSPTSPS (41). Phosphorylation of CTD repeats at positions 2 and 5 is thought to create a "CTD code" that regulates interactions with a variety of transcription and splicing factors, including CA150 (42-44). Most interestingly, both these serines are phosphorylated during the M phase of the cell cycle to inhibit RNA splicing and promote gene silencing (45). Phosphorylation at these serine positions in a tandem CTD repeat creates a motif that resembles the (D/E)2/5 F/W/Y (D/E)2/5 motif in the sense that an aromatic residue is surrounded by negatively charged residues. This similarity may provide a basis for the involvement of FF domain-containing proteins in linking mRNA splicing to transcription. Although no structural information is available for the binding mode of (D/E)2/5 F/W/Y (D/E)2/5 motifs, the phospho-CTD-binding site of the HYPA/FBP11 FF1 domain has been mapped by chemical shift perturbation experiments (12). The phospho-CTD-binding site includes two lysine residues, often present in FBP11 and CA150 FF domains, suitable for forming salt bridges with the phosphate groups of the tandem CTD repeat. With the exception of the FF2 domain, these lysines are rare in Prp40 FF domains.
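To make the idea of an aromatic residue flanked by acidic residues concrete, the sketch below scans a sequence for Phe, Trp, or Tyr with a run of Asp/Glu on each side. It rests on an assumption about the notation: the flank length in the consensus is read here as a minimum of two acidic residues per side, and no upper limit is enforced. The function names and the test sequences are invented for illustration.

```python
ACIDIC = set("DE")
AROMATIC = set("FWY")


def acidic_run(sequence, start, step):
    """Count consecutive D/E residues from index `start`, moving by `step` (+1 or -1)."""
    count = 0
    i = start
    while 0 <= i < len(sequence) and sequence[i] in ACIDIC:
        count += 1
        i += step
    return count


def find_acidic_aromatic(sequence, min_flank=2):
    """Return (1-based position, left run, right run) for each flanked aromatic residue.

    min_flank=2 is an assumed reading of the consensus, not a value stated in the text.
    """
    sequence = sequence.upper()
    hits = []
    for i, residue in enumerate(sequence):
        if residue in AROMATIC:
            left = acidic_run(sequence, i - 1, -1)
            right = acidic_run(sequence, i + 1, +1)
            if left >= min_flank and right >= min_flank:
                hits.append((i + 1, left, right))
    return hits


# Invented sequences, for illustration only.
print(find_acidic_aromatic("GSDDEFEEDKA"))
print(find_acidic_aromatic("AEFDDK"))
```

Counting the flanking runs separately, rather than matching one fixed pattern, also reports how acidic each side is, which may be useful when comparing weak sites of differing strength.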
The differences in charge distribution of the phosphopeptide-binding site in the Prp40 and HYPA/FBP11 FF1 domains led us to calculate pKa values for all FF domains included in the sequence alignment shown in supplemental Fig. 1. Whereas the Prp40 FF1 domain possesses an overall pKa of 4.7, the FBP11 and CA150 FF1 domains have pKa values ranging from 8.8 to 9.9 (Fig. 5B). In contrast to the negatively charged FF1 domain, the Prp40 FF2 domain has an overall pKa of 9.9 and thus seems to be more favorable for an interaction with the phospho-CTD than the Prp40 FF1 domain. It is also of note that CA150 FF domains are almost exclusively positively charged. In FBP11 proteins, however, only the FF1 domains have a pKa >8.0, whereas the FF2, FF3, and FF4 domains are negatively charged (with the exception of the P. falciparum FF4), and the FF5 and FF6 domains have a neutral pKa. This might suggest that, in contrast to CA150 proteins, only the FF1 domains of FBP11 proteins can interact with the phospho-CTD, whereas other FBP11 FF domains may instead recognize splicing factors, as does the Prp40 FF1 domain.

FIGURE 5. Domain composition of Prp40, FBP11, and CA150 proteins. A, neighbor-joining phylogenetic tree of FF domains in the three related proteins Prp40, FBP11, and CA150. Species are named as follows: Ag., A. gambiae; Ce., C. elegans; Dm., D. melanogaster; Gg., G. gallus; Hs., H. sapiens; Pf., P. falciparum; Sc., S. cerevisiae; and Sp., S. pombe. FF domains of proteins with fewer than six FF domains may have two numbers. The first number corresponds to the position of the FF domain in the protein sequence (e.g. Prp40 FF4), and the second number indicates the similarity to other FF domains based on their phylogenetic relationship (e.g. Prp40 FF4_6). The scale bar represents a distance of 0.2 substitutions per site. B, pKa values of FF domains in the three related proteins Prp40, FBP11, and CA150. White, gray, and black boxes were used for neutral (6.0 < pKa < 8.0), positively charged (pKa > 8.0), and negatively charged (pKa < 6.0) FF domains, respectively. Open boxes indicate absent domains.
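The three charge classes used in Fig. 5B amount to a simple threshold rule, sketched here. The function name is hypothetical, and assigning values of exactly 6.0 or 8.0 to the neutral class is an assumption, because the legend leaves those boundary cases open; the two example values are the ones quoted in the text.

```python
def charge_class(pka, low=6.0, high=8.0):
    """Classify a domain by its overall pKa using the Fig. 5B thresholds.

    Values exactly equal to `low` or `high` are treated as neutral (an assumption).
    """
    if pka > high:
        return "positive"
    if pka < low:
        return "negative"
    return "neutral"


# Overall pKa values quoted in the text.
for name, value in {"Prp40 FF1": 4.7, "Prp40 FF2": 9.9}.items():
    print(name, charge_class(value))
```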
Prp40 was originally discovered as a protein associated with U1 snRNA (4). Indeed, a suppressor mutation in the Prp40 FF2 domain (S240F) was found to rescue otherwise lethal mutations in the 5′ end of U1 snRNA (4). To our surprise, sequence analysis and a homology model of the Prp40 FF2 domain revealed that the S240F suppressor mutation maps to the crn-TPR-binding site of the Prp40 FF1 domain and not to the phosphopeptide-binding site of the HYPA/FBP11 FF1 domain (Fig. 4, A-C). In contrast to the Prp40 FF1 domain, the binding pocket of the Prp40 FF2 domain is highly positively charged with three lysine residues (Lys-228, Lys-247, and Lys-248) in close proximity to the S240F mutation. Furthermore, four hydrophobic residues (Phe-204, Pro-235, Tyr-238, and His-239) surround the S240F mutation, potentially creating a favorable environment for negatively charged molecules such as phosphoproteins or even RNA (in the case of a direct interaction with the 5′ end of U1 snRNA). Supporting this hypothesis is the recent finding that an ortholog of CA150, the dipteran Chironomus tentans protein hrp130, associates with nascent pre-mRNA (46). However, it is still unclear whether this interaction involves the FF domains or other regions of the protein. If a direct RNA interaction is confirmed, some FF domains may contain a new RNA-binding motif. Taken together, our analysis of sequences and surface potentials of FF domains provides a first step toward understanding the binding versatility of these domains. Distinct ligand-binding sites and specificities even within a single FF domain may enable scaffolding proteins such as Prp40 to mediate multiple interactions with unrelated binding partners and thereby enhance the assembly of large ribonucleoprotein complexes such as the spliceosome.
Clearly, further studies, in particular of FF domain interactions, will be necessary to elucidate possible individual roles of FF domains and to understand the principles governing their binding specificities.