Dimerization and Protein Binding Specificity of the U2AF Homology Motif of the Splicing Factor Puf60

PUF60 is an essential splicing factor functionally related and homologous to U2AF65. Its C-terminal domain belongs to the family of U2AF (U2 auxiliary factor) homology motifs (UHM), a subgroup of RNA recognition motifs that bind to tryptophan-containing linear peptide motifs (UHM ligand motifs, ULMs) in several nuclear proteins. Here, we show that the Puf60 UHM is mainly monomeric in physiological buffer, whereas its dimerization is induced upon the addition of SDS. The crystal structure of PUF60-UHM at 2.2 Å resolution, NMR data, and mutational analysis reveal that the dimer interface is mediated by electrostatic interactions involving a flexible loop. Using glutathione S-transferase pulldown experiments, isothermal titration calorimetry, and NMR titrations, we find that Puf60-UHM binds to ULM sequences in the splicing factors SF1, U2AF65, and SF3b155. Compared with U2AF65-UHM, Puf60-UHM has distinct binding preferences to ULMs in the N terminus of SF3b155. Our data suggest that the functional cooperativity between U2AF65 and Puf60 may involve simultaneous interactions of the two proteins with SF3b155.

Pre-mRNA splicing is a stepwise process initiated by the recognition of sequence elements at the splice site by specific splicing factors (1). The branch point sequence is recognized by splicing factor SF1 (2,3), whereas the polypyrimidine tract and the 3′ splice site AG dinucleotide are bound by the heterodimer U2AF65-U2AF35 (4-7). Although SF1 alone interacts only weakly with the branch point sequence, this interaction is stabilized significantly by U2AF65, which binds simultaneously to SF1 and to the polypyrimidine tract (8). In the next step of splicing initiation, the U2 snRNP is brought to the 3′ splice site. This involves base pairing of the U2 RNA with the branch site RNA (9) and localization of the SF3b subunit p14 near the branch point adenosine through an interaction with the N terminus of the U2 snRNP component SF3b155 (10-12). Initial contacts between the U2 snRNP and the pre-mRNA are mediated by the N terminus of SF3b155, which binds to U2AF65 and displaces SF1 from the branch point sequence (13).
PUF60 (poly-U-binding factor 60 kDa, also called FIR, Hfp, or Ro-bp1) is a splicing factor that is homologous to U2AF65 and complementary to it in function. Like U2AF65, it consists of a predicted intrinsically unstructured N terminus, two central RRM domains, and a C-terminal UHM. The UHM domain is special in that it has been reported to mediate homodimerization of Puf60 in SDS-PAGE (21). Full-length Puf60 was found to interact with itself in yeast two-hybrid analyses, suggesting that the oligomerization detected in SDS-PAGE also occurs under physiological conditions (26,27).
Puf60 was discovered as a poly-U RNA-binding protein required to reconstitute splicing in depleted nuclear extracts. Its function is enhanced by the presence of U2AF65, but not by the small U2AF subunit, U2AF35 (21). Puf60 and U2AF65 can interact in vitro and in yeast cells (21,26,27). It was recently demonstrated that Puf60 and U2AF65 cooperatively enhance each other's affinity for polypyrimidine tract RNA. Moreover, the ratio of U2AF65 to Puf60 can directly influence the selective inclusion or skipping of alternatively spliced exons in several genes (28). The function of Puf60 in splicing is thus closely linked to that of U2AF65.
In addition to its role in alternative splicing, Puf60 also controls human c-myc gene expression. Under the synonym FIR (FBP-interacting repressor), Puf60 was reported to interact with and inhibit the transcription factor FBP (FUSE (far upstream sequence element)-binding protein), an activator of c-myc promoters (29). Probably through a similar mechanism, mutations in the Drosophila homolog of Puf60, Hfp (Half Pint), lead to increased expression of d-myc, consistent with a role of Hfp as a negative regulator of cell cycle progression (30). Hfp mutations also lead to aberrant splicing of specific mRNAs in Drosophila ovaries (31). Like its mammalian ortholog Puf60, Hfp is thus a regulator of both transcription and alternative splicing.
Here, we report that Puf60 UHM is mainly monomeric under physiological conditions, whereas it dimerizes upon the addition of SDS. The crystal structure of PUF60-UHM and mutational analysis reveal that the dimerization is entirely mediated by electrostatic interactions. NMR relaxation data show that the dimer interface involves a loop that is highly flexible in solution. Furthermore, we show that PUF60-UHM binds to ULM sequences in U2AF65, SF1, and SF3b155. The UHMs in PUF60 and U2AF65 show preferences for binding to different ULMs in the N terminus of SF3b155. We propose that PUF60 and U2AF65 may cooperatively recruit the U2 snRNP by simultaneously binding to SF3b155.
NMR-All NMR spectra were recorded at 300 K on a Bruker DRX500 spectrometer, processed with NMRPipe (32), and analyzed with NMRView (33). Backbone 1H, 15N, and 13C resonances were assigned with standard triple-resonance experiments (34). 15N relaxation data were recorded as described (35). Dissociation constants were derived from chemical shift displacements in HSQC spectra upon the addition of ligands as described (36) (see supplemental data).
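As an illustration of how a dissociation constant can be extracted from HSQC chemical shift displacements, the sketch below fits the exact 1:1 binding isotherm in fast exchange by a simple grid search over Kd. This is a generic approach and an assumption on our part; the procedure of reference (36) may differ. The function names and the titration values are hypothetical, and the synthetic data are generated with Kd = 20 μM so that the fit can be checked.

```python
import math

def fraction_bound(p_tot, l_tot, kd):
    """Fraction of protein in a 1:1 complex (exact quadratic solution).

    p_tot, l_tot and kd must share the same concentration unit (here uM).
    """
    b = p_tot + l_tot + kd
    complex_conc = (b - math.sqrt(b * b - 4.0 * p_tot * l_tot)) / 2.0
    return complex_conc / p_tot

def fit_kd(p_tot, ligand_concs, shifts, kd_grid):
    """Grid search for Kd; for a fixed Kd the best maximal shift has a closed form."""
    best = None
    for kd in kd_grid:
        f = [fraction_bound(p_tot, l, kd) for l in ligand_concs]
        # Least-squares scale factor for the model shift = d_max * f
        d_max = sum(x * y for x, y in zip(f, shifts)) / sum(x * x for x in f)
        rss = sum((y - d_max * x) ** 2 for x, y in zip(f, shifts))
        if best is None or rss < best[0]:
            best = (rss, kd, d_max)
    return best[1], best[2]

# Synthetic titration (hypothetical): 100 uM 15N-labeled UHM, Kd = 20 uM, 0.3 ppm maximal shift
protein_uM = 100.0
ligand_uM = [0, 25, 50, 100, 200, 400, 800]
shifts_ppm = [0.3 * fraction_bound(protein_uM, l, 20.0) for l in ligand_uM]

grid_uM = [10 ** (k / 100) for k in range(0, 301)]  # 1 uM to 1 mM, log-spaced
kd_fit, dmax_fit = fit_kd(protein_uM, ligand_uM, shifts_ppm, grid_uM)
print(f"Kd ~ {kd_fit:.1f} uM, max shift ~ {dmax_fit:.3f} ppm")
```

Because the model is linear in the maximal shift once Kd is fixed, only Kd needs to be searched; a nonlinear least-squares routine would serve equally well.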
Crystallization and Data Collection-For crystallization, the chimeric thioredoxin-Puf60(460-559) fusion protein was concentrated to about 70 mg/ml in 20 mM Tris (pH 7.0), 150 mM NaCl, 5 mM β-mercaptoethanol. Crystals were grown by vapor diffusion in hanging drops composed of 1 μl of protein solution and 1 μl of crystallization buffer (1.4 M (NH4)2SO4, 50 mM potassium formate), suspended over 1 ml of the latter as reservoir solution. The crystals grew to about 100 × 100 × 500 μm and were cryoprotected by serial transfer into a solution containing 20% (v/v) ethylene glycol, 1.5 M (NH4)2SO4, and 50 mM potassium formate. Diffraction data were recorded at beamline PX01 of the Swiss Light Source (Villigen, Switzerland). Data processing and scaling were carried out with XDS (37).
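The protein concentration in a freshly set hanging drop follows from the stock concentration and the 1:1 mixing ratio, and mg/ml divided by kDa gives mM directly. The sketch below assumes a fusion-protein mass of about 24 kDa (roughly thioredoxin plus the UHM); this mass is not stated in the text and is used only for illustration. With it, the initial drop concentration comes out at about 1.5 mM.

```python
def drop_concentration_mM(stock_mg_per_ml, mass_kDa, protein_ul=1.0, buffer_ul=1.0):
    """Protein concentration (mM) immediately after setting up a hanging drop.

    mg/ml divided by kDa gives mM, since 1 g/l / 1000 g/mol = 1 mmol/l.
    Equilibration against the reservoir later concentrates the drop further.
    """
    diluted_mg_per_ml = stock_mg_per_ml * protein_ul / (protein_ul + buffer_ul)
    return diluted_mg_per_ml / mass_kDa

# ASSUMED fusion-protein mass of ~24 kDa (Trx plus UHM); not given in the text
print(f"{drop_concentration_mM(70.0, 24.0):.2f} mM")
```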
Structure Determination and Refinement-The structure of the thioredoxin-Puf60 fusion protein was solved by molecular replacement as implemented in PHASER (38). The structure of E. coli thioredoxin (Protein Data Bank code 2TRX) and a homology model of Puf60-UHM, generated with MODELLER (39) using the structure of free SPF45-UHM (Protein Data Bank code 2PE8) as a template, were used as search models. The solution comprises eight Trx-Puf60-UHM molecules, which were refined in alternating cycles of model correction in COOT (40) and restrained refinement as implemented in REFMAC (41) and PHENIX.REFINE (Ref. 42; see Table 1 for structural statistics). Structures were visualized with PYMOL (DeLano Scientific LLC, San Carlos, CA). The eight UHM domains in the asymmetric unit of the crystal structure can be superimposed onto a reported solution structure of Puf60-UHM (Protein Data Bank code 2DNY) with root mean square deviations of 0.9-1.1 Å over 90 of 100 Cα atoms. The solution structure, however, does not indicate dimerization of Puf60-UHM.
GST Pulldown Experiments-GST-tagged ULMs (1 nmol) were mixed with 3 nmol of His6-tagged UHMs in 150 μl of phosphate-buffered saline supplemented with 2 mM β-mercaptoethanol and 0.1% (w/v) Igepal CA-630 and agitated vigorously for 1 h at 22 °C. For GST precipitation, 8 μl of glutathione-Sepharose 4B (Amersham Biosciences) pre-equilibrated in phosphate-buffered saline was added, and the mixture was agitated vigorously for 30 min. The beads were washed three times for 1 min in the buffer described above and analyzed by SDS-PAGE. Western blotting was carried out with an α-Puf60 antibody (Abcam 22819).
[…] Puf60-UHM in solution (50 mM Pi, pH 7.0, 150 mM NaCl, 5 mM dithiothreitol). NMR secondary chemical shifts (Fig. 1A) show that Puf60-UHM adopts the typical β1-αA-β2-β3-αB-β4-αC topology found for all RRMs and UHMs (20,43). The overall rotational correlation time (τc) of Puf60-UHM was calculated from the ratio of the trimmed mean 15N longitudinal (T1) and transverse (T2) relaxation times of residues with heteronuclear 1H-15N NOE values above 0.65 (Fig. 1B) (44,45). The average 15N T1/T2 ratio for these residues is 7.4 (Fig. 1C), corresponding to a τc of 9.7 ns. By comparison, at a 15N Larmor frequency of 50 MHz and 297 K, τc values of 8.3 ns (T1/T2 = 5.7) and 15.8 ns (T1/T2 = 17.7) would be expected for a 13-kDa monomer and a 26-kDa UHM dimer, respectively (46,47) (gray lines in Fig. 1C). Thus, the observed relaxation times indicate that Puf60-UHM is mainly monomeric rather than dimeric. The slightly increased T1/T2 ratio, compared with that expected for a pure monomer, might result from some nonspecific aggregation, because the T1/T2 ratio of Puf60-UHM lacking an N-terminal His tag (T1/T2 = 4.4, τc = 7.0 ns; data not shown) is consistent with a monomeric protein.
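The conversion from the 15N T1/T2 ratio to τc can be sketched with the widely used rigid-rotor approximation τc = √(6·T1/T2 − 7) / (4π·νN), where νN is the 15N Larmor frequency. We assume this form because, with νN = 50 MHz, it reproduces the τc values quoted above (9.7, 8.3, 15.8 and 7.0 ns) to within about 0.1 ns; references (44-47) may use a more complete treatment. The 10% trimming fraction and the helper names are hypothetical choices.

```python
import math

def trimmed_mean(values, proportion=0.1):
    """Mean after dropping `proportion` of the sorted values from each end."""
    data = sorted(values)
    k = int(len(data) * proportion)
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)

def tau_c_ns(t1_over_t2, nu_n_hz=50e6):
    """Isotropic rotational correlation time (ns) from the 15N T1/T2 ratio.

    Approximation tau_c = sqrt(6 * T1/T2 - 7) / (4 * pi * nu_N), valid for a rigid,
    isotropically tumbling protein; nu_N is the 15N Larmor frequency in Hz.
    """
    return math.sqrt(6.0 * t1_over_t2 - 7.0) / (4.0 * math.pi * nu_n_hz) * 1e9

def tau_c_from_residues(t1, t2, noe, noe_cutoff=0.65, proportion=0.1):
    """Keep residues with hetNOE above the cutoff, then use trimmed-mean T1 / trimmed-mean T2."""
    rigid = [r for r in t1 if r in t2 and noe.get(r, 0.0) > noe_cutoff]
    mean_t1 = trimmed_mean([t1[r] for r in rigid], proportion)
    mean_t2 = trimmed_mean([t2[r] for r in rigid], proportion)
    return tau_c_ns(mean_t1 / mean_t2)

for label, ratio in [("observed", 7.4), ("13-kDa monomer", 5.7), ("26-kDa dimer", 17.7)]:
    print(f"{label}: T1/T2 = {ratio} -> tau_c = {tau_c_ns(ratio):.2f} ns")
```

Filtering on the heteronuclear NOE first keeps flexible segments, such as the β2-β3 loop, from biasing the overall tumbling estimate.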
To further investigate the oligomerization state of the UHM, we used sedimentation velocity analytical ultracentrifugation (AUC). The AUC data also indicate a largely monomeric state of the UHM domain (Fig. 1D, solid gray line), whereas partial dimerization is observed at higher protein concentrations (Fig. 1D, dotted gray line). By fitting the AUC data to a monomer-dimer equilibrium model, the dimerization constant is estimated to be Kdimer = 3-4 mM. The two central RRM domains of Puf60 were reported to dimerize in the presence of DNA (48). We therefore tested whether a construct comprising RRM1-RRM2 and the C-terminal UHM tends to dimerize in the absence of DNA or SDS. Our AUC data indicate that this construct is largely monomeric as well (Fig. 1D, solid black line). Taken together, these data demonstrate that Puf60-UHM (in the absence of SDS) and Puf60 RRM1-RRM2-UHM (in the absence of DNA or SDS) are predominantly monomeric in solution. Therefore, the UHM dimerization observed in denaturing and reducing SDS-PAGE (21) is presumably induced by the experimental conditions.
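To make the consequence of Kdimer = 3-4 mM concrete, the sketch below computes the fraction of UHM protomers in dimers for a simple 2M ⇌ D equilibrium. We assume that Kdimer is a dissociation constant defined on monomer concentrations, Kd = [M]²/[D], with total protomer concentration c = [M] + 2[D]; the fitting model used for the AUC data is not specified beyond "monomer-dimer equilibrium". The 0.1 mM value is illustrative; 1.5 mM and 32 mM are the mother-liquor and crystal-lattice concentrations quoted for the crystallization experiment.

```python
import math

def monomer_dimer(c_total_mM, kd_mM):
    """Free monomer and dimer concentrations (mM) for 2 M <=> D, Kd = [M]^2 / [D].

    Mass balance c_total = [M] + 2[D] gives 2[M]^2/Kd + [M] - c_total = 0.
    """
    m = (kd_mM / 4.0) * (math.sqrt(1.0 + 8.0 * c_total_mM / kd_mM) - 1.0)
    return m, m * m / kd_mM

def fraction_in_dimer(c_total_mM, kd_mM):
    """Fraction of UHM protomers that are part of a dimer."""
    m, _ = monomer_dimer(c_total_mM, kd_mM)
    return 1.0 - m / c_total_mM

KD_DIMER_MM = 3.5  # midpoint of the 3-4 mM range estimated from AUC
for c in (0.1, 1.5, 32.0):  # total UHM concentration in mM
    print(f"{c:5.1f} mM total -> {100 * fraction_in_dimer(c, KD_DIMER_MM):.0f}% of protomers in dimers")
```

Under these assumptions, roughly 5%, 36%, and 79% of protomers are dimeric at 0.1, 1.5, and 32 mM, respectively, which fits a largely monomeric UHM at typical solution concentrations and substantial dimer formation only at very high concentrations.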
The Three-dimensional Structure of the PUF60-UHM-Next, we determined the crystal structure of Puf60-UHM at 2.2 Å resolution. Diffracting crystals could only be obtained using a fusion protein, in which E. coli thioredoxin A (Trx) is connected to the N terminus of Puf60-UHM via a short linker sequence (49). We confirmed that the Trx-UHM construct dimerizes in SDS-PAGE similarly to what is seen for the UHM alone (data not shown).
The asymmetric unit consists of eight Trx-Puf60 fusion proteins arranged in a doughnut shape. Eight Trx molecules are stacked in two layers in the center of the doughnut, surrounded by a ring of eight PUF60-UHM domains (Fig. 2A). Consistent with the NMR secondary chemical shifts, Puf60-UHM adopts a β1-αA-β2-β3-αB-β4-αC secondary structure. A central four-stranded β-sheet is sandwiched by helices αA and αB on one side and helix αC on the other side (Fig. 2B). As seen in other UHM structures (14,23), Puf60-UHM has an additional strand β3′ adjacent to β4, which forms a β-hairpin extension to the central four-stranded β-sheet. The β3′ strand comprises the conserved Arg-Xaa-Phe motif (RWF, residues 535-537 in Puf60), which plays a crucial role in ULM binding in all known UHM·ULM complexes (14,15,23). A unique structural feature of Puf60-UHM is the presence of unusually long β2 and β3 strands, which form a β-hairpin that protrudes out of the β-sheet (Fig. 2B). In solution, the acidic β2-β3 loop is flexible, as indicated by low heteronuclear NOE values, which drop to a minimum of 0.12 for Gly504 (Fig. 1B). In contrast, an average heteronuclear NOE of 0.74 for residues 462-501 and 512-559 indicates the absence of internal motion on subnanosecond time scales.
Dimerization Interface-Because the Puf60-UHM concentration approaches the dimerization constant of 3-4 mM in the mother liquor (1.5 mM) and far exceeds it in the crystal lattice (32 mM), we expected to detect a dimeric UHM in the crystal. Analysis with PISA (50) shows that each of the eight Puf60-UHM domains contacts three other Puf60-UHM domains, two in the same and one in a symmetry-related asymmetric unit. One of the UHM-UHM interfaces within an asymmetric unit is composed of charged interactions between residues EEE (505-507) in the β2-β3 loop of one monomer and Arg467 and Arg540/Lys541 in the adjacent strands β1 and β4, respectively, of the other dimer subunit (Fig. 2C). As shown in the electrostatic surface representation (Fig. 2C, right panel), Arg467, Arg540, and Lys541 form a positively charged surface, which is contacted by the negatively charged acidic β2-β3 loop. The electrostatic interactions involve the tips of the long arginine and lysine side chains, which contact glutamate and aspartate residues in the mobile β2-β3 loop. Of the six salt bridges that can be formed, electron density is visible for at most four in any of the putative dimer interfaces in the asymmetric unit. Notably, the combinations of charged residues involved in direct salt bridges vary among the different dimer interfaces in the asymmetric unit.
Table 1 (excerpt), Ramachandran plot statistics, number (%): most favored, 1387 (92.3%); additionally allowed, 106 (7.1%); generously allowed, 2 (0.1%); disallowed, 7 (0.5%).
Table 1 footnotes: a, as defined in XDS (37); b, as defined in REFMAC (41); c, residual isotropic B-factor after TLS refinement; d, as defined in PROCHECK (59).
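Comparing which salt bridges are formed in each putative dimer interface amounts to a distance search between charged side-chain atoms of two chains. The sketch below assumes a common 4.0 Å cutoff between carboxylate oxygens and Arg/Lys nitrogens; this criterion is an assumption, not necessarily the one used by PISA. The coordinates are hypothetical placeholders rather than values from the deposited structure; in practice the atom tuples would be parsed from the coordinate file for each pair of chains in the asymmetric unit.

```python
import math
from itertools import product

# Charged side-chain atoms (PDB atom names)
ACIDIC_ATOMS = {("GLU", "OE1"), ("GLU", "OE2"), ("ASP", "OD1"), ("ASP", "OD2")}
BASIC_ATOMS = {("ARG", "NE"), ("ARG", "NH1"), ("ARG", "NH2"), ("LYS", "NZ")}

def salt_bridges(chain_a, chain_b, cutoff=4.0):
    """Acidic residues in chain_a paired with basic residues in chain_b whose
    charged atoms lie within `cutoff` angstroms.

    Each chain is a list of (res_name, res_num, atom_name, (x, y, z)) tuples.
    """
    acids = [a for a in chain_a if (a[0], a[2]) in ACIDIC_ATOMS]
    bases = [b for b in chain_b if (b[0], b[2]) in BASIC_ATOMS]
    pairs = set()
    for a, b in product(acids, bases):
        if math.dist(a[3], b[3]) <= cutoff:
            pairs.add((f"{a[0]}{a[1]}", f"{b[0]}{b[1]}"))
    return sorted(pairs)

# Hypothetical coordinates for one interface (not taken from the deposited structure)
loop_chain = [("GLU", 505, "OE1", (0.0, 0.0, 0.0)), ("GLU", 506, "OE2", (10.0, 0.0, 0.0))]
sheet_chain = [("ARG", 467, "NH1", (3.0, 0.0, 0.0)), ("LYS", 541, "NZ", (20.0, 0.0, 0.0))]
print(salt_bridges(loop_chain, sheet_chain))
```

Running such a function over every interface and comparing the returned residue pairs is one way to tabulate how the salt-bridge combinations vary.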
To determine which residues are involved in the dimerization of Puf60 in SDS-PAGE, we introduced amino acid changes for the residues in the β2-β3 loop (E501A/K502A/Q503A, E505A/E506A/E507A, and D508A/A509A/E510A) and for the positively charged residues that contact the acidic loop (R540A/K541A, R467A, and RK540-541AA+R467A). The structural integrity of the E505A/E506A/E507A mutant was confirmed by comparison of the HSQC spectra (supplemental Fig. S1). The integrity of the other mutants was confirmed by one-dimensional NMR (data not shown).
These findings indicate that the dimerization of Puf60-UHM involves the acidic residues 505EEEDAE510 in the flexible β2-β3 loop and the basic residues Arg467 and Arg540-Lys541. Salt bridges and electrostatic contacts between these regions thus mediate dimerization of Puf60-UHM in the presence of SDS and presumably also contribute to the small population of dimeric species in physiological buffers (Fig. 1D).
To confirm that the observed bands indeed correspond to UHM dimers in SDS-PAGE, rather than to species that fortuitously migrate at unusual positions, we mixed recombinant, purified ZZ-tagged wild-type UHM (28.4/56.8 kDa for monomer/dimer; Fig. 2E, lane 1) with untagged wild-type UHM (12.6/25.2 kDa; Fig. 2E, lane 6). A species at 41.2 kDa appears in the mixtures (lanes 2 and 3) but is absent from both pure ZZ-tagged UHM (lane 1) and pure untagged UHM (lane 6); this band therefore corresponds to a mixed ZZ-UHM·UHM dimer. The UHM mutant E505A/E506A/E507A does not form the mixed dimer species, confirming that these mutations impair dimerization of the UHM in SDS-PAGE (Fig. 2E, lanes 4, 5, and 7). […] Fig. S3). With further increasing SDS concentrations, both wild-type Puf60-UHM and the mutant protein are denatured (Fig. 3B).
Binding of Puf60 UHM to Tandem ULMs-The dimerization propensity of Puf60-UHM raises the possibility that simultaneous binding of two UHM domains to two adjacent ULMs on the same peptide chain (a tandem ULM motif) could cooperatively induce dimerization also in the absence of SDS. Based on the distance between the ULM-binding sites of the Puf60 homodimer in the crystal structure, we estimated that the ULMs should be separated by a minimum of 15-20 residues in an extended conformation. We identified evolutionarily conserved tandem ULMs in intrinsically disordered regions of several proteins with the program SIRW (51). Of these, tandem ULMs in SF3b155 (residues 194-229, 210-251, and 229-269) and in the nuclear RNA helicase Prp16 (residues 201-238) (Fig. 4, A and B) were tested experimentally for binding to Puf60-UHM.
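The spacing criterion for tandem ULMs can be illustrated with a simple sequence scan. The sketch below is not the SIRW program; it uses a deliberately simplified, hypothetical ULM definition (a Trp with at least two Arg/Lys residues among the six preceding residues) and reports pairs of candidate tryptophans at least 15 residues apart, the lower bound estimated above. The window size, basic-residue threshold, and test peptide are all assumptions for illustration.

```python
def candidate_ulm_trps(seq, window=6, min_basic=2):
    """1-based positions of Trp residues with >= min_basic Arg/Lys in the preceding `window` residues."""
    hits = []
    for i, aa in enumerate(seq):
        if aa != "W":
            continue
        upstream = seq[max(0, i - window):i]
        if sum(upstream.count(x) for x in "RK") >= min_basic:
            hits.append(i + 1)
    return hits

def tandem_ulm_pairs(seq, min_gap=15, **motif_kwargs):
    """Pairs of candidate ULM tryptophans separated by at least `min_gap` residues."""
    trps = candidate_ulm_trps(seq, **motif_kwargs)
    return [(a, b) for i, a in enumerate(trps) for b in trps[i + 1:] if b - a >= min_gap]

# Hypothetical test peptide, not a real protein sequence
peptide = "GSKRSRW" + "A" * 15 + "KKRDW" + "AAAAW"
print(candidate_ulm_trps(peptide), tandem_ulm_pairs(peptide))
```

A real search would additionally require evolutionary conservation and intrinsic disorder of the intervening region, as described in the text.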
Using Western blot-detected GST pulldown experiments, we found that the tandem ULM sequence of SF3b155(194-229) binds Puf60-UHM (supplemental Fig. S4A). However, ITC (supplemental Fig. S4, C and D) and NMR (supplemental Fig. S5A) […] S5B). Thus, dimerization of Puf60-UHM is not induced upon binding to these tandem ULMs in the absence of SDS. ULM binding in the presence of SDS (350 μM to 1.4 mM) was not observed in GST pulldown experiments (data not shown). This is consistent with the observation that the SDS interaction […]
Distinct ULM Binding Properties of Puf60 and U2AF65-GST pulldown experiments show that Puf60-UHM binds to ULMs in SF1, U2AF65, and SF3b155 (supplemental Fig. S4A). We quantified the affinities of Puf60-UHM for these ULMs by ITC (Fig. 4C, supplemental Fig. S4C, and […]
The ULM binding preferences of Puf60-UHM are distinct from those of U2AF65-UHM (supplemental Fig. S4B and Table 2). As reported previously, U2AF65-UHM preferentially binds the ULMs in SF1 (8,15) and SF3b155(317-357) (13,18,24). Of the ULMs in SF3b155, U2AF65-UHM binds the ULM around Trp338 (Kd = 6 μM (18)) with higher affinity than the one at Trp200 (Kd = 16 μM (18)). This preference is weak but reproducible (supplemental Fig. S4B) and has been described previously (18).
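A difference between Kd values of 6 μM and 16 μM can be put on an energy scale using ΔG° = RT ln(Kd/1 M). The sketch below assumes 25 °C, since the ITC temperature is not stated here, and uses the two Kd values cited from reference (18). The preference of U2AF65-UHM for the Trp338 ULM corresponds to only about 0.6 kcal/mol, comparable to RT, which is why it is described as weak.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1

def binding_free_energy(kd_molar, temp_k=298.15):
    """Standard binding free energy (kcal/mol): dG = RT ln(Kd / 1 M)."""
    return R_KCAL * temp_k * math.log(kd_molar)

def ddg_preference(kd_weak_molar, kd_strong_molar, temp_k=298.15):
    """Free-energy difference (kcal/mol) between binding the weaker and the stronger ligand."""
    return binding_free_energy(kd_weak_molar, temp_k) - binding_free_energy(kd_strong_molar, temp_k)

# U2AF65-UHM: Trp338 ULM (Kd = 6 uM) versus Trp200 ULM (Kd = 16 uM), values from ref. 18
print(f"dG(Trp338) = {binding_free_energy(6e-6):.2f} kcal/mol")
print(f"dG(Trp200) = {binding_free_energy(16e-6):.2f} kcal/mol")
print(f"ddG = {ddg_preference(16e-6, 6e-6):.2f} kcal/mol")
```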
Structural Basis for ULM Specificity-To further characterize the binding specificity of Puf60-UHM for distinct ULMs, we compared its structure with the structures of the UHM·ULM complexes U2AF65·SF1 (Protein Data Bank code 1O0P) and SPF45·SF3b155 (Protein Data Bank code 2PEH). As shown in Fig. 5C, the ULM-binding region of Puf60-UHM, defined by the NMR titrations (Fig. 5B), is structurally more similar to SPF45 than to U2AF65. Helix αA in U2AF65-UHM is N-terminally extended by four residues compared with the αA helices in Puf60-UHM and SPF45-UHM. As a consequence, the conformation of the β1-αA loop in U2AF65-UHM differs considerably from that in Puf60-UHM or SPF45-UHM. It is likely that these differences, in combination with amino acid variations in the ULM sequences (Fig. 4A), determine the specificity of the UHM·ULM complexes. For example, SF1 has a longer stretch of positively charged residues preceding the ULM tryptophan than the SF3b155 ULMs. In the U2AF65·SF1 structure, this region contacts the highly negatively charged helix αA of U2AF65-UHM (10 Glu/Asp residues). Because helix αA of Puf60 is shorter and less negatively charged (5 Glu/Asp residues), ionic interactions involving these residues should contribute less to ULM binding by Puf60-UHM. A second specificity-mediating region in the SF1 and SF3b155 ULMs involves […]

DISCUSSION
Puf60 was repeatedly found to interact with itself in yeast two-hybrid assays (26,27), and the Puf60 UHM domain was reported to be necessary and sufficient for the dimerization of Puf60 in SDS-PAGE (21). […] and NMR data show that the UHM domain is mainly monomeric in physiological buffer, whereas SDS is required for dimerization. A crystal structure and mutational analysis reveal a dimer interface that is stabilized by electrostatic interactions and involves the acidic β2-β3 loop of one subunit and basic residues (Arg467, Arg540, and Lys541) on the β-sheet surface of the other subunit. The acidic β2-β3 loop is conserved in all higher eukaryotic orthologs of Puf60 but differs in other UHM or RRM domains (20). This suggests that Puf60 orthologs may share a similar dimerization mode, which is unique to Puf60 and not found in other UHMs.
The flexibility of the β2-β3 loop in solution (indicated by the NMR relaxation data) and the variability of the electrostatic contacts seen in the crystal structure suggest that the dimer interface is dynamic. The electrostatic nature of the dimer interface presumably contributes to the stability of the Puf60-UHM dimer in SDS-PAGE (21) (Fig. 2, D and E). Because the dimerization interface is stabilized by electrostatic contacts, the SDS alkyl chains might not be able to energetically favor the solvation of the UHM monomer.
We found that a longer construct, comprising the two central RRM domains and the UHM, is also largely monomeric in the […] Detergent-induced oligomerization has been reported for several membrane-associated proteins (52-56). No experimental evidence for a functional role of the SDS-induced dimerization of Puf60-UHM is known. However, detergent-induced (or lipid-induced) dimerization might play a role in the molecular functions of Puf60. Alternatively, SDS may mimic a putative, as yet unknown ligand of Puf60.
Puf60 was reported to interact directly with U2AF65 (26,27). Our data provide a rationale for how the two proteins interact and suggest that a minimal binding interface involves the ULM sequence of U2AF65 and the UHM domain of Puf60. Note that binding of Puf60 to the U2AF65 ULM can occur only if this ULM is not already bound by U2AF35-UHM, which has a significantly higher affinity for it. Thus, for this interaction to occur in vivo, a population of U2AF65 molecules that is not bound to U2AF35 would have to exist in the nucleus.
It was shown in pulldown experiments from nuclear extract that Puf60 associates with SF3b155 (28). We suggest that this interaction likely involves direct binding of Puf60-UHM to ULM sequences in the N terminus of SF3b155. Interestingly, Puf60-UHM and U2AF65-UHM have distinct binding affinities for ULMs. Puf60-UHM binds only weakly to the SF1 ULM, whereas this ULM interacts strongly with U2AF65-UHM. Furthermore, Puf60-UHM has a higher affinity for SF3b155(194-229) than for SF3b155(317-357), whereas the opposite is found for U2AF65-UHM (Table 2). These affinity differences are rationalized by comparing structural models of the interactions. As shown in Fig. 5C, the ULM-binding regions of the two UHMs differ significantly, which may underlie their distinct binding preferences.
Our biochemical data imply that Puf60 and U2AF65 can bind to the N terminus of SF3b155 simultaneously and noncompetitively (Fig. 6). The mutual enhancement of splicing activation by these two splicing factors (28) could thus involve simultaneous and potentially cooperative recruitment of SF3b155 to the 3′ splice site.
Recently, it was reported that the UHM domain of the kinase KIS binds strongly to SF1 (similar to U2AF65-UHM) (22). It also binds to ULMs in the N terminus of SF3b155 and prefers Trp200 over Trp338 (similar to Puf60-UHM) (57). The distinct binding preferences of the Puf60, KIS, and U2AF65 UHMs suggest that UHM-ULM interactions have evolved to achieve some binding selectivity. Thus, a given ULM cannot simply be classified as strong or weak but may bind with different affinities to each UHM. Our data provide molecular insights into the intricate network of UHM-ULM interactions. Structure-based analysis allows the design of mutations in ULM and/or UHM sequences to modulate this network and to study its role in the regulation of splicing in vivo.