Snapshots of the RNA Processing Factor SCAF8 Bound to Different Phosphorylated Forms of the Carboxyl-terminal Domain of RNA Polymerase II

Concomitant with RNA polymerase II (Pol II) transcription, RNA maturation factors are recruited to the carboxyl-terminal domain (CTD) of Pol II, whose phosphorylation state changes during a transcription cycle. CTD phosphorylation triggers recruitment of functionally different factors involved in RNA processing and transcription termination; most of these factors harbor a conserved CTD-interacting domain (CID). Orchestration of factor recruitment is believed to be conducted by CID recognition of distinct phosphorylated forms of the CTD. We show that the human RNA processing factor SCAF8 interacts weakly with the unphosphorylated CTD of Pol II. Upon phosphorylation, affinity for the CTD is increased; however, SCAF8 is promiscuous with respect to the phosphorylation pattern on the CTD. Employing a combined structural and biophysical approach, we were able to distinguish motifs within CIDs that are involved in generic CTD sequence recognition from those that confer phospho-specificity.

Eukaryotic RNA transcription by RNA polymerase II (Pol II) is accomplished in a fine-tuned interplay of Pol II, RNA transcription factors, and RNA processing factors. The regulatory platform for this coupling of nuclear events of RNA biogenesis and RNA maturation is thought to be the carboxyl-terminal domain (CTD) of the largest subunit Rpb1 of Pol II. The CTD is required for efficient capping, splicing, cleavage, and polyadenylation reactions (1-4), and an orchestration of specific RNA processing factor recruitment is accompanied by a dynamic phosphorylation pattern of the CTD during a transcription cycle (5-10). The CTD consists of multiple tandem-heptapeptide repeats of the consensus sequence ¹YSPTSPS⁷, and this sequence is highly conserved from yeast to human. All three serine residues can be phosphorylated during transcription elongation (11-13), and a varying phosphorylation pattern is generally believed to be the result of a balanced action of site-specific CTD kinases and phosphatases. However, phosphorylation of either Ser-2, Ser-5, or Ser-7 or a combination of several phosphorylation sites seems not to be equivalent in function (14,15); moreover, individual phosphorylation events reflect the actual position of Pol II within the transcription cycle. Whereas position Ser-5 becomes phosphorylated when Pol II is in promoter-proximal regions and thereby leads to recruitment of the capping enzyme (1, 16-18), position Ser-2 is predominantly phosphorylated when Pol II is in regions that are more distal from the promoter. This triggers binding of different RNA 3′-end processing factors (18,19). Phosphorylation of Ser-7 was recently shown to be involved in the transcription and processing of small nuclear RNA in mammalian cell lines and is required for Integrator recruitment (13).
Although Pol II phosphorylated at position Ser-7 of the CTD is also found on mRNA genes in chromatin immunoprecipitation experiments (12), a specific function of phosphorylation at position Ser-7 during transcription of mRNA genes remains enigmatic. Apparently, a so-called "CTD code" (20,21) is enciphered by phosphorylation events of either position, Ser-2, Ser-5, or Ser-7, within one repeat or by a combination of several phosphorylation events. In addition, a CTD code need not be restricted exclusively to one single repeat but can extend over several repeats of the multiple tandem-heptapeptide sequence. Indeed, genetic analysis indicates that the minimal functional unit of the CTD lies within heptapeptide pairs (22,23). A CTD code becomes even more complex, since the two proline residues, Pro-3 and Pro-6, can either be in cis or trans configuration. A functional role for prolyl isomerization in transcription is demonstrated by the fact that the prolyl isomerase Pin1 (in yeast Ess1) binds to the CTD (24) and can influence CTD phosphorylation and Pol II transcription (25,26).
Obviously, such a CTD code should also be paralleled by the factors that bind to the CTD at different stages of the transcription cycle. One of the first systematic screens for CTD-associated proteins involved a yeast two-hybrid screen with mammalian CTD as bait and identified four CTD-binding proteins from rat that interact with Pol II in vivo, designated rA1, rA4, rA8, and rA9 (15). All four CTD-binding proteins contain a common Ser/Arg diamino acid-rich polypeptide stretch, which led to the renaming of the proteins to SCAFs (SR-like CTD-associated factors) (27). Two of them, SCAF4 and SCAF8, contain a conserved CTD-interacting domain (CID) at the amino terminus and an RNA recognition motif positioned shortly after the Ser/Arg-rich domain. SCAF8 was previously reported to bind to the Ser-2 and Ser-5 double-phosphorylated form of the CTD of Pol II in vivo and to co-localize to sites of transcription (27). Whereas SCAF proteins seem only to occur in mammals, a yeast protein, Nrd1, was reported to share sequence homology to SCAF4 and SCAF8 in the CID as well as in the RNA recognition motif and to have a similar domain arrangement (15). Like SCAF8, Nrd1 binds to the phosphorylated form of the CTD utilizing a CID. Whereas SCAF8 was postulated to be a splicing-related protein (27), Nrd1 was shown to be involved in Pol II termination at small nucleolar RNA genes (28-30) and apparently is also involved in the regulation of cryptic unstable transcripts (31,32). Furthermore, CIDs are also found in protein factors that couple mRNA 3′-end processing to transcription. In common with SCAF4, SCAF8, and Nrd1, the mRNA 3′-end processing factor Pcf11 binds to the CTD using an amino-terminal CID. The overall domain architecture of Pcf11, however, is different, since the Pcf11 polypeptide chain does not contain any RNA recognition motif. Pcf11 binds to the unphosphorylated CTD (33), but binding is enhanced upon Ser-2 phosphorylation (33-35).
The most sequence-divergent CID is found in the amino-terminal region of the yeast protein Rtt103, which is part of the Rat1 exonuclease complex that promotes Pol II transcription termination (36). In contrast to all CIDs reported so far, Rtt103 exclusively binds to the Ser-2-phosphorylated form of the CTD (36).
In summary, CIDs are highly conserved at their amino acid sequence level, but they vary in their CTD affinity and specificity to various phospho-forms of the CTD. Thus, CIDs appear to be a privileged family of CTD binding domains, whose structural and biophysical characterization should enable us to shed substantial light on the CTD code. To date, structural data are limited to the RNA 3′-end processing factor Pcf11 bound to a synthetic Ser-2-phosphorylated CTD peptide (37). Based on the crystal structure, an indirect read-out for the phosphorylation status of the CTD by stabilization of a β-turn conformation around the phosphorylated Ser-2 (Ser(P)-2) residue within the ²pSPTS⁵ motif was proposed. The phosphate moiety of Ser(P)-2 is involved in a network of intramolecular hydrogen bonds within the β-turn, but the phosphate group is not bound or recognized directly by the CID of Pcf11 at all. Such an indirect read-out of the phosphorylation status of the CTD by Pcf11-CID was corroborated by the results of NMR experiments (38), although these results were interpreted to support an induced-fit mechanism upon CTD binding rather than recognition of a stabilized preformed conformation.
In this report we characterize the CTD binding properties of the CID of the human SCAF8 protein. In a structural and biophysical approach, we addressed the question of specificity of binding of SCAF8 to different phosphorylation forms of the CTD. By means of co-crystal structures, we further addressed the question of whether a CTD phosphorylated at position Ser-2 and Ser-5 can bind to SCAF8 in a β-turn conformation and how and to what extent different phosphorylated and unphosphorylated forms of the CTD are recognized by the CID of SCAF8. Additionally, for the first time we obtained structural evidence for a CTD phosphorylated at position Ser-7.
Cells were harvested by centrifugation and resuspended in buffer A (50 mM Tris, pH 8.0, 500 mM NaCl, 30 mM (NH4)2SO4, and 10 mM β-mercaptoethanol). Cell walls were broken by sonication, and the cell debris was cleared by centrifugation. The supernatant was loaded onto a 1-ml column volume HisTrap FF column (GE Healthcare) equilibrated with buffer A. Nonspecifically bound proteins were removed in a high-salt wash with buffer A containing 1 M NaCl. Bound proteins were eluted in a gradient of 15 column volumes to buffer A supplemented with 250 mM imidazole. Fractions containing SCAF8-CID were pooled, diluted with buffer B (50 mM Hepes, pH 7.3, 25 mM NaCl, 30 mM (NH4)2SO4, 1 mM EDTA, and 1 mM dithioerythritol), and loaded onto a 1-ml column volume Mono S column (GE Healthcare) equilibrated with buffer B. Bound proteins were eluted in a gradient of 15 column volumes to 1 M NaCl. Finally, the SCAF8-CID was further purified by size exclusion chromatography using Superose 12 (GE Healthcare) equilibrated with buffer C (25 mM Hepes, pH 8.0, 100 mM NaCl, 1 mM EDTA, and 1 mM dithioerythritol). The selenomethionine-labeled SCAF8-CID(N15M/Y17M) variant was purified as described for the wild-type protein, except that 5 mM dithioerythritol was used instead of 1 mM dithioerythritol. All purification steps were monitored by SDS-PAGE, and complete selenomethionine incorporation was confirmed by matrix-assisted laser desorption ionization time-of-flight mass spectrometry. For crystallization, the protein was concentrated to 20 mg/ml, as measured by the Bradford assay (Bio-Rad) using lysozyme as a standard.
Fluorescence Anisotropy Measurements-Double repeat peptides of 5,6-carboxyfluorescein (FAM)-labeled, biotinylated, and unlabeled CTD peptides were purchased from Anaspec (San Jose, CA) and used for competitive fluorescence anisotropy experiments (Table 1). Measurements were carried out in a fluorescence spectrometer in T-configuration (Model FL322, Jobin Yvon) in buffer C at 25°C. Samples were excited with vertically polarized light at 477 nm, and both vertical and horizontal emission was recorded at 525 nm. For binding experiments, wild-type SCAF8-CID or mutated variants were titrated into a reaction mixture containing buffer C supplemented with 2 μM of either FAM-Ser(P)-2 or FAM-Ser(P)-5. In competition experiments, a fixed amount of biotinylated or unbiotinylated CTD peptides (given in the figure legends) was added to the reaction mixture before titration of the protein. Protein and peptide concentrations were determined according to their absorbance at 280 nm. Data were fitted to the cubic equation for a 1:1 competitive binding model as described (41). Estimates of the S.E. of Kd from competition experiments were obtained by residual resampling (42). 100 sample data sets were refitted including the uncertainties of the reference Kd values and peptide concentrations, for which the variance could be estimated experimentally.
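A 1:1 competitive binding model, in which a labeled and an unlabeled peptide compete for the same site on the protein, leads to a cubic equation in the free protein concentration. The following Python sketch solves that cubic with the standard trigonometric closed form and returns the concentration of the labeled complex. It is an illustration only and not the fitting procedure of reference 41; the function name, the parameter names, and the concentrations in the usage note are hypothetical.

```python
import math


def competitive_binding(p_total, tracer_total, comp_total, kd_tracer, kd_comp):
    """Return (free protein, tracer-protein complex) for 1:1 competition.

    All concentrations and Kd values must share one unit (e.g. micromolar).
    Names are illustrative assumptions, not taken from the paper.
    """
    # Coefficients of P^3 + a*P^2 + b*P + c = 0 for the free protein P,
    # obtained from mass conservation of protein, tracer, and competitor.
    a = kd_tracer + kd_comp + tracer_total + comp_total - p_total
    b = (kd_comp * (tracer_total - p_total)
         + kd_tracer * (comp_total - p_total)
         + kd_tracer * kd_comp)
    c = -kd_tracer * kd_comp * p_total

    # Trigonometric solution; the largest root is the only positive one.
    q = math.sqrt(a * a - 3.0 * b)
    arg = (-2.0 * a ** 3 + 9.0 * a * b - 27.0 * c) / (2.0 * q ** 3)
    theta = math.acos(max(-1.0, min(1.0, arg)))  # clamp rounding noise
    p_free = -a / 3.0 + (2.0 / 3.0) * q * math.cos(theta / 3.0)

    tracer_complex = tracer_total * p_free / (kd_tracer + p_free)
    return p_free, tracer_complex
```

For instance, `competitive_binding(50.0, 2.0, 100.0, 68.0, 19.0)` describes 50 units of protein titrated into 2 units of labeled peptide in the presence of 100 units of competitor. The fraction of labeled peptide bound, `tracer_complex / tracer_total`, is the quantity that would scale the observed anisotropy between its free and bound limits in a fit.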
Crystallization and Crystal Flash-cooling-Both native free SCAF8-CID as well as selenomethionine-labeled free SCAF8-CID(N15M/Y17M) crystals were obtained in a hanging drop vapor diffusion setup using 100 mM citric acid buffer, pH 5.5, 2.8 M (NH4)2SO4 and varying concentrations from 0 to 6% (v/v) glycerol as a reservoir solution. In all experimental setups a total of 1.5 μl of the protein solution was mixed with an equal volume of the reservoir solution and equilibrated against 700 μl of reservoir at 298 K.
Unlabeled and biotinylated CTD peptides for co-crystallization were identical to those used in the fluorescence anisotropy measurements. Briefly, the CTD peptides were mixed with protein in 2-fold molar excess and incubated at 305 K for 15 min before crystallization. Co-crystals of SCAF8-CID with Ser(P)-2/Ser(P)-5-CTD, Ser(P)-5-CTD and unphosphorylated CTD peptides were obtained when a reservoir solution containing 1.2 M Li2SO4 and 1.6 M (NH4)2SO4 was applied, and for co-crystallization of SCAF8-CID with Ser(P)-2-CTD a mother solution containing 0.1 M potassium thiocyanate and 30% (w/v) polyethylene glycol 2000MME was used. Macroseeding with a native crystal of free SCAF8-CID after overnight equilibration was required for crystallization. The only exception was the case of co-crystals with an unphosphorylated CTD peptide and the SCAF8-CID(N15M/Y17M) variant, which appeared immediately without seeding. In contrast, complex crystals of SCAF8-CID with a Ser(P)-2/Ser(P)-7-CTD peptide were obtained using 3.2 M (NH4)2SO4 and 1% (v/v) ethylene glycol as a mother solution. For cryo-protection before flash-cooling in liquid nitrogen, native, SCAF8-CID/Ser(P)-2-CTD, and selenomethionine-substituted protein crystals were transferred into their reservoir solution containing an additional 20% (v/v) glycerol, whereas co-crystals of SCAF8-CID with Ser(P)-2/Ser(P)-7-CTD peptides were transferred into the reservoir solution containing an increased ethylene glycol concentration (final 10% (v/v)). All other SCAF8-CID/CTD complex crystals were transferred into a cryo-solution containing 2.5 M Li2SO4.
X-ray Data Collection, Phasing, and Structure Refinement-Synchrotron diffraction data were collected at the Swiss Light Source (Villigen, Switzerland) at 100 K (supplemental Table 1). Diffraction data were processed and scaled with the XDS package (43). Initial selenium sites were obtained from the selenomethionine-labeled SCAF8-CID(N15M/Y17M) data using SHELXD (44). For improved phasing power, crystals of this mutant were used for SAD data collection, since the wild-type protein only contained the amino-terminal and a second methionine residue in the amino acid sequence (Fig. 1). Phase improvement and density modification were carried out with SHELXE (45). An initial model for SCAF8-CID(N15M/Y17M) was manually built with O (46) and refined with CNS (47). Refinement convergence was achieved in cycles of manual building and subsequent structure refinement. Phase extension to the wild-type, high resolution SCAF8-CID data set was performed following a rigid-body protocol (47) using the SCAF8-CID(N15M/Y17M) model. In the later stages of final SCAF8-CID structure refinement, the program REFMAC (48) was used, including TLS refinement (49), and manual building was performed with COOT (50). Phases for the co-crystal structures of SCAF8-CID with the CTD peptides were obtained by rigid-body refinement (47) with the refined native SCAF8-CID. Model building and refinement were performed as described above. For calculation of a free R-factor, a randomly generated set of 5% of the reflections from the diffraction data set was used and excluded from refinement. Model quality was assessed using the program PROCHECK (51). Statistics describing diffraction data and model quality are listed in supplemental Table 1. Structure factor amplitudes and atomic coordinates are deposited at the Protein Data Bank under the entries 3D9I, 3D9J, 3D9K, 3D9L, 3D9M, 3D9N, 3D9O and 3D9P.
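The free R-factor calculation can be illustrated with a minimal Python sketch: a random 5% of the reflections is set aside, and the conventional R-factor, the sum of |Fobs - Fcalc| divided by the sum of Fobs, is evaluated on that subset only. This is a simplified illustration, not the procedure implemented in CNS or REFMAC; it assumes that the calculated amplitudes are already on the same scale as the observed ones, and all function names are hypothetical.

```python
import random


def select_free_set(n_reflections, fraction=0.05, seed=0):
    """Pick a random subset of reflection indices to exclude from refinement."""
    rng = random.Random(seed)  # fixed seed keeps the test set reproducible
    n_free = round(n_reflections * fraction)
    return set(rng.sample(range(n_reflections), n_free))


def r_factor(f_obs, f_calc, indices):
    """R = sum(|Fobs - Fcalc|) / sum(Fobs) over the given reflections.

    Assumes f_calc has already been scaled to f_obs (an assumption of
    this sketch; refinement programs determine the scale themselves).
    """
    numerator = sum(abs(f_obs[i] - f_calc[i]) for i in indices)
    denominator = sum(f_obs[i] for i in indices)
    return numerator / denominator
```

Calling `r_factor` with the indices returned by `select_free_set` gives the free R-factor, whereas calling it with the remaining indices gives the working R-factor.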

RESULTS
SCAF8-CID Binding to the CTD-Because different CIDs bind to different phospho-forms of the CTD, we asked whether there is a generic affinity of SCAF8-CID for the unphosphorylated form of the CTD. When recombinant CTD was loaded onto immobilized SCAF8-CID, we could detect significant amounts of bound CTD, even after extensive washing of the reconstituted, immobilized SCAF8-CID/CTD complex (supplemental Methods Fig. 1A). However, when a mixture of phosphorylated and unphosphorylated CTD was applied, selective binding of the hyperphosphorylated CTD was observed (27). Apparently, SCAF8-CID might have a generic affinity for the CTD amino acid sequence per se, but binding is strongly enhanced upon phosphorylation of the CTD. To obtain more quantitative data, we performed fluorescence anisotropy titration competition experiments with FAM-Ser(P)-2-CTD peptides competing against unmodified or biotinylated peptides (Table 1). SCAF8 binds weakly to a double repeat of unphosphorylated CTD, and the equilibrium dissociation constant (Kd) can only be estimated to be higher than 1 mM (supplemental Fig. 1B). However, the affinity of SCAF8-CID for the CTD is dramatically enhanced upon both Ser-2 and Ser-5 phosphorylation, and the Kd for SCAF8 binding to Ser(P)-2/Ser(P)-5-CTD was reduced to 19 (+2/-2) μM (Fig. 1). To avoid detrimental effects in our experiments caused by labeled peptides, we conducted a series of cross-controls where either fluorescently labeled FAM-Ser(P)-2-CTD or FAM-Ser(P)-5-CTD peptides competed against either unmodified or biotinylated peptides (supplemental Fig. 1, C and D). Whereas slightly increased affinities of SCAF8 to 5,6-carboxyfluorescein-labeled peptides were observed, the affinities of biotinylated and unmodified Ser(P)-2/Ser(P)-5 CTD peptides remained constant. Thus, we concluded that both unmodified and biotinylated CTD peptides can be used in our experiments, but only in measurements where the fluorescent probe was displaced.
We further addressed the question of whether high affinity binding of SCAF8 requires both residue Ser-2 and Ser-5 to be phosphorylated, since mutations of either Ser-2 or Ser-5 within the CTD sequence abolished binding to the hyperphosphorylated CTD (27). In accordance with the qualitative results, we found that SCAF8-CID binds to Ser(P)-2-CTD with a slightly weaker affinity (Kd = 68 (+8/-6) μM) than to Ser(P)-2/Ser(P)-5 CTD. However, the affinity of SCAF8-CID for Ser(P)-5-CTD is decreased by an order of magnitude (Kd = 330 (+50/-30) μM) but is still higher than for unphosphorylated CTD. This suggests that Ser-2 phosphorylation is a key determinant of high affinity binding of SCAF8, but phosphorylation at position Ser-5 contributes to a much weaker extent. To corroborate this hypothesis and to explore the structural features of SCAF8-CID for its CTD specificity and affinity, we determined the crystal structures of SCAF8-CID alone and in complex with various phospho-forms of the CTD.
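The relative contributions of the two phosphorylation sites can be made concrete by converting the measured dissociation constants into fold changes and free energy differences with the standard relation ΔΔG = RT ln(Kd,1/Kd,2). The Python sketch below uses the three Kd values reported above and the measurement temperature of 25°C; the variable and function names are illustrative, and the experimental uncertainties of the Kd values are ignored.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)
T_KELVIN = 298.15     # 25 degrees C, the temperature of the measurements

# Dissociation constants reported in the text, in micromolar.
KD_UM = {
    "Ser(P)-2/Ser(P)-5": 19.0,
    "Ser(P)-2": 68.0,
    "Ser(P)-5": 330.0,
}


def fold_and_ddg(kd_um, reference="Ser(P)-2/Ser(P)-5"):
    """Return {peptide: (fold change in Kd, ddG in kcal/mol)} vs. the reference."""
    ref = kd_um[reference]
    result = {}
    for name, kd in kd_um.items():
        fold = kd / ref
        # Positive ddG means weaker binding than the reference peptide.
        result[name] = (fold, R_KCAL * T_KELVIN * math.log(fold))
    return result


if __name__ == "__main__":
    for name, (fold, ddg) in fold_and_ddg(KD_UM).items():
        print(f"{name}: {fold:.1f}-fold, ddG = {ddg:.2f} kcal/mol")
```

With these values, omitting the Ser-5 phosphate weakens binding about 3.6-fold (roughly 0.8 kcal/mol), whereas omitting the Ser-2 phosphate weakens it about 17-fold (roughly 1.7 kcal/mol), in line with Ser-2 phosphorylation being the main determinant of affinity.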
Structure Determination-The structure of human SCAF8-CID (hereafter referred to as free SCAF8-CID) was solved by single-wavelength anomalous dispersion (SAD) methods using selenomethionine-substituted protein crystals of a double mutant SCAF8-CID(N15M/Y17M) and phase extension to 1.6 Å using a native data set. Co-crystal structures were obtained with peptides representing four different phosphorylation states of the CTD, namely Ser(P)-2/Ser(P)-5-CTD, Ser(P)-2-CTD, Ser(P)-5-CTD, and Ser(P)-2/Ser(P)-7-CTD, and furthermore, the co-crystal structure for SCAF8-CID with an unphosphorylated CTD could be determined using phases from the native structure (supplemental Table 1). Notably, both SCAF8-CID molecules within the asymmetric unit (AU) had a CTD peptide bound for all co-crystals, except for the structure with the unphosphorylated CTD peptide, where only one peptide molecule per AU was observed. Whereas the peptide bound to one molecule was fully solvent-exposed, the second additionally contacted a symmetry-related molecule, and the conformation observed is probably influenced by the crystal contact. Thus, in the following discussion we will only focus on observations made for the solvent-exposed peptide.
Structure of Free SCAF8-CID-The SCAF8-CID folds into an eight-helix bundle in a right-handed superhelical arrangement (Fig. 2), and the overall structure of SCAF8-CID closely resembles the CID-structure of Pcf11-CID (37,38) with the highest root mean square deviation on Cα positions of 1.7 Å over the entire polypeptide chain. However, in three distinct regions the SCAF8-CID fold deviates from Pcf11-CID. One difference observed is in helix 1 of SCAF8-CID which transforms after three α-helical turns into a 3₁₀ helix with two additional turns (Fig. 2B). In contrast, helix 1 in Pcf11-CID is purely α-helical. Furthermore, this difference causes the adjacent carboxyl-terminal loop between helix 1 and helix 2 to adopt a different conformation from that observed for Pcf11-CID. A rather unusual finding was that this loop contains two proline residues which are both in cis configuration. Notably, this region was recently proposed to be involved significantly in Pcf11-CID binding to RNA (52) and is structurally different in SCAF8-CID. Most important is the second difference found in helix 4, which contains the majority of residues involved in CTD binding and which is highly conserved in CIDs (Fig. 2A). An amino acid insertion in SCAF8 relative to Pcf11 leads to an additional helical turn in helix 4. Hence, the loop between helix 4 and helix 5 adopts a different geometry, and helix 5 is also extended by an additional helical turn when compared with the Pcf11-CID structure. Last, the turn between helix 7 and helix 8 is shorter in SCAF8-CID compared with Pcf11-CID, and the carboxyl-terminal helix 8 packs closer to the hydrophobic core of SCAF8-CID than in Pcf11-CID. A very intriguing property of SCAF8-CID is the high abundance of positively charged residues at the molecular surface (Fig. 2C). This was somewhat expected, since the amino acid sequence already revealed a high content of positively charged residues. However, it is worth mentioning that no significant negatively charged surface patch could be identified.
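The root mean square deviation quoted for the comparison with Pcf11-CID is the square root of the mean squared distance between equivalent Cα atoms. A minimal Python sketch of that calculation is given below. It assumes that the two coordinate lists are already superimposed and that equivalent residues have been paired beforehand, for example by a structural alignment; neither step is shown, and the function name is hypothetical.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD (same unit as the input, e.g. angstroms) between paired atoms.

    coords_a and coords_b are equally long sequences of (x, y, z) tuples
    for equivalent C-alpha atoms of two pre-superimposed structures.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))
```

Because squared distances are averaged, a few strongly deviating regions, such as the loops described above, can dominate the value even when the helical core superimposes closely.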
The CTD Is Bound in a β-Turn Conformation-As anticipated from the structural and amino acid sequence conservation between SCAF8-CID and Pcf11-CID (Fig. 2A), we expected SCAF8 to bind the Ser(P)-2/Ser(P)-5-CTD in a β-turn conformation, with the phosphate moieties protruding from a shallow binding groove on SCAF8-CID. In the co-crystal structure of SCAF8-CID with a Ser(P)-2/Ser(P)-5-CTD peptide, one repeat of the double repeat peptide is bound in a β-turn conformation (Fig. 3A). Residues belonging to one entire repeat plus Tyr-8 and Ser(P)-9 of the adjacent carboxyl-terminal repeat could be unambiguously modeled into the electron density (Fig. 3), whereas the rest of the peptide was apparently disordered and not visible in the electron density.
There are two SPXX motifs within the CTD amino acid sequence that were predicted to form β-turn structures (53). Similar to the CTD bound to Pcf11, the ²pSPTS⁵ motif adopts this conformation; however, the ⁵SPSY⁸ motif is bound in an extended conformation. A further similarity in CTD binding of these two CIDs is that the bound proline residues are found in all trans configurations. In the case of Pcf11, it was recently shown that Pcf11-CID binds exclusively to a population of CTD molecules that is in all trans configuration (38), and a similar binding preference is, thus, likely the case in SCAF8.
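The positions of the two SPXX motifs follow directly from the heptapeptide consensus and can be located with a short Python sketch. The search uses a lookahead so that the overlapping motifs at residues 2 to 5 and 5 to 8 are both reported; the function name is hypothetical, and the double repeat below is the unphosphorylated consensus sequence rather than any specific peptide used in the experiments.

```python
import re

CONSENSUS = "YSPTSPS"  # CTD heptapeptide consensus, residues 1 to 7


def spxx_motifs(sequence):
    """Return (start, end, motif) for every SPXX motif, 1-based inclusive.

    A zero-width lookahead is used so that overlapping motifs are found.
    """
    return [
        (m.start() + 1, m.start() + 4, m.group(1))
        for m in re.finditer(r"(?=(SP..))", sequence)
    ]


if __name__ == "__main__":
    double_repeat = CONSENSUS * 2
    for start, end, motif in spxx_motifs(double_repeat):
        print(start, end, motif)
    # 2 5 SPTS
    # 5 8 SPSY
    # 9 12 SPTS
```

The second motif spans the repeat boundary, since it ends with Tyr-8, the first residue of the following repeat.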
SPXX motifs require two intramolecular hydrogen bonds to form a stable β-turn structure (53). In the co-crystal structure of SCAF8-CID and Ser(P)-2/Ser(P)-5-CTD, these two hydrogen bonds are established from Ser(P)-2 to backbone amides from Thr-4 and Ser(P)-5. Additionally, we observe a third short intramolecular hydrogen bond in the β-turn formed between the hydroxyl group of Thr-4 and an oxygen atom of the phosphate group of Ser(P)-2 (Fig. 3A). Notably, this hydrogen-bond network was also observed in the co-crystal structures of SCAF8-CID in complex with a CTD peptide phosphorylated at position Ser-2 or at both Ser-2 and Ser-7, respectively (Fig. 3, B and E). Thus, we initially concluded that Ser-2 phosphorylation stabilizes the β-turn conformation of the CTD when bound to SCAF8, similarly as proposed for Pcf11 (37). However, such a stabilizing effect was disproved by the finding that the CTD not phosphorylated at position Ser-2 was also bound in a β-turn conformation (Fig. 3, C and D). In both co-crystal structures, when SCAF8 was bound to Ser(P)-5-CTD or unphosphorylated CTD peptides, the γ-oxygen atom of Thr-4 is within hydrogen-bonding distance to the γ-oxygen atom of the unphosphorylated Ser-2 residue. Consequently, the number of hydrogen bonds formed within the β-turn is constant irrespective of the phosphorylation state of residue Ser-2. It still remains possible that the hydrogen bonds within the β-turn are stronger when residue Ser-2 is phosphorylated.
FIGURE 2. Primary and tertiary structure of SCAF8-CID. A, sequence alignment of CIDs from human SCAF8 (AAH70071), human SCAF4 (AAH43353), and S. cerevisiae Nrd1 (CAA65493), Pcf11 (NP_010514), and Rtt103 (NP_010575). Conserved amino acids within CIDs are highlighted from dark green for identical amino acids to yellow for conserved amino acids. Secondary structure elements are illustrated as cyan cylinders for helical elements and gray lines for loop regions. Amino acids which contribute to the hydrophobic core are marked as black triangles. Residues that are involved in CTD binding are marked as purple squares if they are conserved within CIDs and as gray circles if not. Arg-112, which performs a direct read-out of the phosphorylation status of Ser(P)-2, is labeled with a red square. Residues subjected to site-directed mutagenesis are boxed. Sequence alignment was performed with ClustalW (59) and illustrated with ALSCRIPT (60). B, ribbon representation of SCAF8-CID. C, electrostatic surface potential over the range of ±6 kT/e. D, surface representation colored according to the sequence conservation given in A. Ribbon diagram and surface representations were illustrated using PYMOL (61) and APBS (62).
Direct Recognition of Ser(P)-2 but No Direct Read-out for Ser(P)-5-In contrast to Pcf11, SCAF8 is able to directly sense the phosphorylation status of Ser-2 within the Ser(P)-2/Ser(P)-5-CTD. This direct interaction is established between the side chain of Arg-112 of the SCAF8-CID and the Ser(P)-2 side chain (Fig. 3A). In a bifurcated manner, the guanidinium group of Arg-112 forms a salt bridge and a hydrogen bond to the Ser(P)-2 phosphate moiety. A similar mode of direct read-out of Ser(P)-2 was observed in the co-crystal structures of SCAF8-CID with the Ser(P)-2-CTD and the Ser(P)-2/Ser(P)-7-CTD (Fig. 3, B and E). However, in co-crystal structures where Ser-2 is not phosphorylated (Fig. 3, C and D), the guanidinium group of Arg-112 is not within hydrogen-bonding distance to any CTD residue. Moreover, it is sometimes found in multiple conformations, suggesting that this particular residue is mainly involved in recognition of the phosphorylation state of Ser-2 of the CTD.
In contrast to the extended conformation of the Ser(P)-2/Ser(P)-5-CTD when bound to the cis-trans isomerase Pin1 (54), SCAF8 binds the Ser(P)-2/Ser(P)-5-CTD in a β-turn conformation. Both negatively charged phosphate groups are located in close proximity, only 4.5 Å away from each other, although from electrostatic considerations one would expect the two phosphate groups to be located as far apart as possible. Assuming that both phosphate groups are singly protonated, the negative net charge of Ser(P)-2 can be compensated for by the salt bridge with Arg-112, allowing this close localization. Additionally, we observe electron density for putative water molecules within hydrogen-bonding distance to both phosphate moieties, which as hydronium ions could also compensate for the negative charge of the two groups. Although their tetrahedral coordination argues for water molecules, we cannot entirely exclude that cations such as sodium ions can occupy these positions.
Whereas Ser(P)-2 is tightly bound by SCAF8-CID, no direct interaction between the phosphate group of Ser(P)-5 and any residue of SCAF8-CID was observed in the co-crystal structure with the Ser(P)-2/Ser(P)-5-CTD. The closest possible hydrogen-bonding partner to an oxygen atom of the phosphate moiety of Ser(P)-5 is the ε-amino group of Lys-23 (Fig. 3A). However, this is a rather long range interaction (4.5 Å), and thus, the side chain of Lys-23 is probably not involved in direct read-out of the Ser(P)-5 phosphate group. A weak interaction is also supported by the flexible nature of Lys-23 with poorly defined electron density for the side chain. A similar observation was made in all other structures of SCAF8-CID/CTD complexes independent of whether residue Ser-5 is phosphorylated or not. A weak recognition of the phosphorylation state of residue Ser-5 of the CTD is also corroborated by the finding that in the co-crystal structure of SCAF8-CID with Ser(P)-5-CTD, we do not observe any shift in register of CTD binding. A shift in register would place Ser(P)-5 at the position of Ser(P)-2, which in principle could be favored by the interaction of the phosphate group and Arg-112. Furthermore, both SPXX motifs of the CTD could potentially adopt a β-turn conformation (53), and consequently, such a putative shift in register within the CTD sequence could not be excluded a priori. However, from our structural experiments it seems likely that SCAF8-CID has a generic sequence specificity for the CTD which ensures that the bound sequence stays in register and that phosphorylation at position Ser-5 only enhances affinity by favorable electrostatic interactions. This hypothesis is further corroborated by the positive electrostatic surface potential of SCAF8-CID.
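The argument that the 4.5 Å contact between Lys-23 and the Ser(P)-5 phosphate is too long for a direct hydrogen bond rests on a simple distance criterion, which can be sketched in Python as follows. The 3.5 Å donor-acceptor cutoff is a commonly used rule of thumb and an assumption of this sketch, not a value stated in the text; the function name and the coordinates in the usage note are likewise hypothetical, and angular criteria are ignored.

```python
import math

HBOND_CUTOFF = 3.5  # angstroms; assumed donor-acceptor cutoff for this sketch


def within_hbond_distance(donor_xyz, acceptor_xyz, cutoff=HBOND_CUTOFF):
    """Return True if two heavy atoms are close enough for a hydrogen bond.

    Only the donor-acceptor distance is tested; geometry (angles) is not.
    """
    return math.dist(donor_xyz, acceptor_xyz) <= cutoff
```

With this cutoff, two atoms placed 2.8 Å apart, for example `(0, 0, 0)` and `(0, 0, 2.8)`, pass the test, whereas a separation of 4.5 Å, as observed for Lys-23, does not.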
Generic Sequence Specificity of SCAF8 to the CTD-The structures of SCAF8-CID with various phosphorylated forms of the CTD enabled us to distinguish sequence motifs in SCAF8 that are responsible for recognition of various phosphorylation modifications of the CTD from motifs that contribute to a generic specificity for the CTD sequence. Generic sequence specificity for the CTD amino acid sequence is also supported by the finding that all residues in SCAF8-CID that bind to the CTD, irrespective of the phosphorylation status, are conserved between all CIDs (Fig. 2A). In SCAF8-CID as well as in Pcf11, they reside in helix 4 and helix 7 and cluster into a conserved interaction patch at the molecular surface of SCAF8 (Fig. 2D). Only Tyr-1 and Pro-3 of the CTD sequence are recognized in this conserved manner. In the SCAF8-CID structure, Tyr-1 is recognized by the carboxyl group of the Asp-67 side chain that is within short hydrogen-bonding distance to the phenolic oxygen atom of Tyr-1 (Fig. 3). Thus, binding of SCAF8-CID to a CTD phosphorylated at position Tyr-1 is impossible, as observed in vitro (27). An important function of this conserved aspartate residue of CIDs for CTD binding in vivo was shown in the case of Pcf11, where mutations of the structural equivalent and the two consecutive residues to alanine residues led to a lethal phenotype in Saccharomyces cerevisiae (34). Additionally important for Tyr-1 recognition are hydrophobic interactions with Met-26 and the aromatic ring system of Tyr-64, which stacks against the side chain of Tyr-1 of the CTD (Fig. 3). Notably, Tyr-64 is a strictly invariant residue in all CIDs identified to date (Fig. 2A). Aside from side chain interactions, the backbone of Tyr-1 is also locked into position by a hydrogen bond between the amide group of the latter residue and the carbonyl group of Ile-21 (Fig. 3).
In summary, any shift in register between the two SPXX motifs within the CTD sequence is abolished by these strong interactions leading to Tyr-1 recognition. Furthermore, Tyr-64 is also involved in Pro-3 recognition because its side chain is sandwiched between Tyr-1 and Pro-3 of the CTD, and together with the conserved residues Val-113 and Leu-116, Tyr-64 creates a hydrophobic depression at the surface of SCAF8-CID which anchors Pro-3 of the CTD (Fig. 3).
Structural Differences among the SCAF8-CID CTD Complexes-Aside from the direct Ser(P)-2-CTD recognition, additional differences were noticed in the co-crystal structures of SCAF8-CID with various phosphorylation forms of the CTD. When a Ser(P)-2/Ser(P)-5-CTD is bound by SCAF8-CID, we observed electron density for a longer stretch of the extended region carboxyl-terminal to the β-turn, ranging from Ser(P)-5 to Ser(P)-9. In contrast, in the Ser(P)-2-CTD, Ser(P)-5-CTD, and Ser(P)-2/Ser(P)-7-CTD complexes, distinct electron density was only found up to residue Ser-5 or Ser(P)-5. Thus, SCAF8-CID can establish more interactions with the CTD when both residues Ser-2 and Ser-5 are phosphorylated. A unique feature of the SCAF8-CID Ser(P)-2/Ser(P)-5-CTD complex is that Tyr-8 of the CTD is bound by Gln-72. The side chain of the latter residue is within hydrogen-bonding distance of both the backbone amide and the carbonyl group of Tyr-8, and thereby the CTD is clamped onto the surface of SCAF8 (Fig. 3A). Additionally, a water molecule binds to the phenolic oxygen atom of Tyr-8. Although we observed electron density for Ser(P)-9, no significant direct interactions either with the backbone or a side chain were identified. Consequently, SCAF8-CID only recognizes Ser-2 phosphorylation within a single heptapeptide repeat.
A second interaction between SCAF8-CID and the extended region after the β-turn is established by the guanidinium group of Arg-71. Irrespective of the phosphorylation state of the CTD, the side chain forms hydrogen bonds to the carbonyl group of the backbone of Ser-5 of the CTD, and the guanidinium group is additionally within hydrogen-bonding distance of the phenolic oxygen of Tyr-1 of the CTD (Fig. 3). By these bifurcated hydrogen bonds, Arg-71 is possibly able either to measure or to influence the curvature of the β-turn. It is noteworthy that the donor-acceptor distance between the phenolic oxygen atom and the amide group in Arg-71 becomes longer when Ser-2 is phosphorylated, since the side chain conformation is changed.
Except for the Ser(P)-2/Ser(P)-5-CTD bound to the CID, all other structures must have the second repeat bound in the β-turn conformation, as we observe electron density for the peptide chain amino-terminal to Tyr-1 (Fig. 3). The only significant interaction observed is a hydrogen bond between the carbonyl group of Pro-6 of the upstream repeat and the amide of Lys-23 of SCAF8-CID. A similar backbone interaction was also observed in the complex structure of Pcf11 with Ser(P)-2-CTD (37), and binding an amino acid stretch upstream from the central tyrosine residue of the CTD therefore appears to be a general property of CIDs.

DISCUSSION
We have determined the structure of SCAF8-CID alone and in complex with variously phosphorylated forms of the CTD and have verified SCAF8-CID binding specificity by fluorescence anisotropy titration competition experiments. Although the overall structure of SCAF8-CID and the CTD binding mode are very similar to Pcf11-CID, apparently subtle differences lead to specificity. The most obvious difference in CTD binding is the direct read-out of Ser-2 phosphorylation in SCAF8. This difference in direct interaction observed between Arg-112 of SCAF8-CID and the Ser(P)-2 phosphate moiety of the CTD could explain why the Kd determined for SCAF8-CID is decreased by an order of magnitude relative to Pcf11 (38), when a phospho-CTD with highest affinity was used. Such an increase in affinity by direct read-out is further corroborated by the finding that the Kd for a Ser(P)-2-CTD peptide is slightly increased when compared with results obtained with a Ser(P)-2/Ser(P)-5-CTD peptide, but affinity for a Ser(P)-5-CTD peptide is more strongly reduced. To emphasize the importance of this arginine residue of SCAF8-CID in CTD binding, we performed similar experiments on a variant of SCAF8-CID in which Arg-112 was replaced by a threonine residue (R112T; see Table 2). Indeed, this variant showed decreased affinity for a CTD peptide phosphorylated at position Ser-2, and the affinity for a Ser(P)-2-CTD peptide dropped to a value comparable with that determined for Ser(P)-5-CTD, which remained unchanged. However, the affinity for the Ser(P)-2/Ser(P)-5-CTD was not affected dramatically. Because we propose that Arg-112 neutralizes the negative charge of the phosphate moiety of Ser(P)-2, we wondered how the two phosphate groups could be accommodated within the β-turn in this variant.
In the co-crystal structure of SCAF8-CID(R112T) with the Ser(P)-2/Ser(P)-5-CTD peptide, we observed a water molecule close to the phosphate groups that potentially compensates for a negative net charge (supplemental Fig. 2). Thus, it is rather likely that additional interactions contribute to the high affinity binding of SCAF8-CID to the Ser(P)-2/Ser(P)-5-CTD. An additional contribution to the high affinity for the Ser(P)-2/Ser(P)-5-CTD could also arise from residues of SCAF8 that interact with the backbone of the β-turn and the adjacent extended region. Using a protein variant in which both residues Arg-71 and Gln-72 were mutated to alanine (R71A/Q72A), we obtained results similar to those observed for R112T (Table 2). For this variant of SCAF8-CID, not only the affinity for the Ser(P)-2-CTD, but also that for the Ser(P)-5-CTD peptide, was decreased. However, the change in affinity of SCAF8-CID(R71A/Q72A) for the Ser(P)-2/Ser(P)-5-CTD was less pronounced than expected, and mutation of all three residues (SCAF8-CID R71A/Q72A/R112T) showed the same trend (Table 2). Thus, by serendipity we were able to produce a "supervariant" of SCAF8 that, at least in steady state measurements, apparently discriminates even better than wild-type SCAF8 between the double-phosphorylated Ser(P)-2/Ser(P)-5-CTD peptide and single-phosphorylated versions of the CTD.
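To give a sense of scale for the Kd comparisons discussed above, a ratio of two equilibrium dissociation constants can be converted into a difference in binding free energy through the standard relation ΔΔG = RT ln(Kd,A/Kd,B). The following is a minimal sketch; the temperature of 298.15 K and the function name are assumptions made for illustration, and the example values are a generic 10-fold ratio rather than entries from Table 2.

```python
import math

R_J_PER_MOL_K = 8.314462618  # molar gas constant, J/(mol*K)


def ddg_kj_per_mol(kd_a, kd_b, temperature_k=298.15):
    """Binding free energy of complex A minus that of complex B, in kJ/mol.

    kd_a and kd_b must be in the same units. A negative result means
    that A binds more tightly than B. The default temperature is an
    assumption, not a value reported in the text.
    """
    return R_J_PER_MOL_K * temperature_k * math.log(kd_a / kd_b) / 1000.0


# Generic example: a Kd that is lower by one order of magnitude.
print(round(ddg_kj_per_mol(1.0, 10.0), 2))  # -5.71
```

Under these assumptions, an order-of-magnitude difference in Kd corresponds to roughly 5.7 kJ/mol (about 1.4 kcal/mol), which is in the range expected for a single additional salt bridge or hydrogen bond.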
We further hypothesize that attractive electrostatic forces between the CTD and the SCAF8-CID play a considerable role in CTD binding. Because SCAF8-CID binds the unphosphorylated CTD in a fashion very similar to any of the phosphorylated CTDs, yet the equilibrium dissociation constants are very different, electrostatic steering (55) might influence SCAF8-CID CTD binding. This hypothesis is further supported by our preliminary experiments, which indicate that the affinity for Ser(P)-2/Ser(P)-5-CTD is much more dependent on the ionic strength than the affinity for Ser(P)-2-CTD (data not shown).
Although we see this dependence on electrostatic effects, which influences the affinity for a Ser(P)-2/Ser(P)-5-CTD, other effects might additionally play a role, as Ser(P)-2/Ser(P)-7-CTD should show similar behavior, yet its Kd is much higher than that for Ser(P)-2/Ser(P)-5-CTD (Fig. 1 and Table 2). Another possible explanation for our finding could be that a CTD in solution has different structural properties when both positions Ser-2 and Ser-5 are phosphorylated compared with a CTD that is only phosphorylated at either serine position, or that factors influencing, for instance, prolyl cis-trans isomerization play an important role in CTD binding by SCAF8. None of these effects can be addressed by our equilibrium experiments; they would only be revealed by kinetic experiments. Unfortunately, such experiments were not feasible because of the large amount of peptide required and the exceptionally high cost of peptide synthesis.
An important detail of the work presented here is that the CTD is bound in a β-turn conformation irrespective of the phosphorylation state of the CTD. Moreover, the number of intramolecular hydrogen bonds within the β-turn of the CTD remains constant, and thus, stabilization of this conformation by Ser-2 phosphorylation as discussed previously is rather unlikely. Because the side chain of the unphosphorylated Ser-2 is involved in this network of hydrogen bonds, caution has to be used when mutated versions of the CTD are investigated. Alanine screening in the CTD sequence is a method commonly used for deriving information on phosphorylation specificity of CTD-associating factors. Based on circular dichroism studies of free CTD in solution, it was recently proposed that mutation of Ser-2 to alanine only weakly influences CTD structure, whereas mutation of Ser-5 to alanine changed the circular dichroism spectra significantly (56). From our structural data, we can predict that mutation of Ser-2 can also have detrimental effects and might perturb binding experiments artificially.
For the first time we can now address the question of how various CIDs can exhibit different phospho-CTD specificity, which is information that could not be derived from the crystal structure of Pcf11 alone. Apparently, CIDs have a generic affinity for the CTD sequence per se, and residues that establish this generic affinity are highly conserved (Fig. 2A). We, therefore, propose that all CIDs will bind a CTD in a β-turn conformation, and Tyr-1 and Pro-3 recognition will be conserved among them. However, residues that interact in SCAF8-CID with Ser(P)-2 and Ser(P)-5 are not conserved. For instance, Arg-112 of SCAF8-CID is conserved in SCAF4 and Rtt103 but not in Nrd1 from S. cerevisiae and Pcf11. This is in good agreement with the finding that Rtt103 exclusively binds to a Ser(P)-2-CTD (36) and that SCAF4 binds more tightly to a CTD phosphorylated at position Ser-2 than at position  In contrast, Nrd1 does not have this particular side chain at this position, but instead it contains a methionine residue, and thus, a direct interaction of Nrd1 with any phosphate moiety at this position is rather unlikely. This nicely explains why, in contrast to SCAF8, Nrd1 binds more tightly to a CTD phosphorylated at position Ser-5 than to one phosphorylated at position Ser-2. 4 An additional effect that contributes to the latter observation is that Arg-71 in SCAF8-CID, which binds to the backbone of Ser(P)-5, is invariant in the amino acid sequence of Nrd1 and SCAF4. However, this residue is a lysine in Pcf11; not only does the side chain not contact Ser-5 at all but it forms a hydrogen bond to Pro-3 in the crystal structure. Interestingly, in Rtt103 this residue is a glutamine.
Aside from this more apparent finding reported above, our data agree well with recently reported minimal sequence motifs within the CTD derived from genetic data in yeast (23). Similar to their predictions, we observe in the complex structure of SCAF8-CID with Ser(P)-2/Ser(P)-5-CTD a bound stretch of the CTD that contains two tyrosine residues of adjacent repeats and a sequence motif of successive phosphoserine residues, Ser(P)-2-Ser(P)-5-Ser(P)-2 (Fig. 3A). This coincides with the hypothesis posed in the latter report that a requirement for a minimal CTD sequence is an amino acid stretch of ¹YSPTSPSYSP¹⁰. Our findings, therefore, suggest that similar requirements are also present in higher eukaryotes. Furthermore, plant and yeast CTD phosphatases apparently also require this minimal CTD sequence (57), and most intriguingly, the yeast CTD phosphatase Scp1 binds to a CTD that is also in a β-turn conformation and requires the same minimal CTD sequence (58). With the exception of the Ser(P)-2/Ser(P)-5-CTD complex structure, we observe electron density amino-terminal to Tyr-1 of the CTD when other peptides were bound by SCAF8-CID. This could point either toward a slightly longer CTD recognition site than observed in the complex structure with Ser(P)-2/Ser(P)-5-CTD, because of the lack of a third hepta repeat in the provided peptide, or toward a gliding of SCAF8-CID along the CTD, similar to what was described for Pcf11-CID (38).
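The minimal CTD sequence YSPTSPSYSP discussed above spans one full heptad repeat plus the first three residues of the next. As a simple illustration, the following sketch locates every occurrence of this 10-residue stretch in a sequence built from consensus repeats. The function and variable names are hypothetical, and the use of three idealized consensus repeats is an assumption; natural CTDs contain degenerate repeats that would not all match.

```python
CONSENSUS_REPEAT = "YSPTSPS"   # consensus heptad repeat of the CTD
MINIMAL_MOTIF = "YSPTSPSYSP"   # minimal sequence discussed in the text


def find_motif(sequence, motif=MINIMAL_MOTIF):
    """Return 0-based start indices of all matches, including overlapping ones."""
    starts = []
    pos = sequence.find(motif)
    while pos != -1:
        starts.append(pos)
        # Advance by one residue so that overlapping matches are found.
        pos = sequence.find(motif, pos + 1)
    return starts


# Idealized CTD fragment of three consensus repeats (an assumption).
three_repeats = CONSENSUS_REPEAT * 3
print(find_motif(three_repeats))  # [0, 7]
```

In this idealized fragment the motif starts at each of the first two repeats, whereas a single isolated repeat is too short to contain it.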
Our current working model predicts that SCAF8 binds to a CTD probably shortly after transcription initiation, when the CTD becomes phosphorylated at position Ser-5, and perhaps stays attached to the CTD. By gliding along the CTD, which becomes increasingly phosphorylated as transcription proceeds, SCAF8 could then primarily scan for stretches of the CTD sequence where phosphorylation at appropriate sites would be found or, strictly speaking, until a sequence motif of three consecutive phosphoserine residues Ser(P)-2-Ser(P)-5-Ser(P)-2 is found, at which time it would bind to the elongating Pol II with highest affinity. As soon as position Ser-5 becomes dephosphorylated, SCAF8 could again set forth to find other stretches of the CTD for binding. Our model would imply that SCAF8 also stays attached to a CTD phosphorylated at position Ser-2. Because the affinity for a Ser(P)-2-CTD is higher than for a Ser(P)-5-CTD, we predict that the highest density of SCAF8 would be found at RNA polymerase II that is in the middle of transcribing a gene and is located distal rather than proximal to the promoter of that gene.
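The scanning step of this working model can be expressed as a toy search over per-repeat phosphorylation states. The sketch below is an illustration only and not a description of how SCAF8 actually moves along the CTD. It assumes that each heptad repeat is represented by the set of its phosphorylated serine positions, that Ser-7 is ignored, and that a preferred site is a repeat carrying Ser(P)-2 and Ser(P)-5 followed by a repeat carrying Ser(P)-2. All names are hypothetical.

```python
def find_high_affinity_sites(repeats):
    """Return indices i where repeat i carries Ser(P)-2 and Ser(P)-5
    and repeat i + 1 carries Ser(P)-2.

    `repeats` is a list of sets; each set holds the phosphorylated
    serine positions (2 and/or 5) of one heptad repeat. Ser-7 is
    ignored in this simplified representation.
    """
    hits = []
    for i in range(len(repeats) - 1):
        if {2, 5} <= repeats[i] and 2 in repeats[i + 1]:
            hits.append(i)
    return hits


# Hypothetical CTD snapshot: Ser-5 marks at the start, Ser-2 marks later.
ctd_state = [{5}, {5}, {2, 5}, {2}, {2}, set()]
print(find_high_affinity_sites(ctd_state))  # [2]
```

In this hypothetical snapshot only the third repeat starts a Ser(P)-2-Ser(P)-5-Ser(P)-2 stretch. Removing the Ser-5 mark from that repeat eliminates the site, mirroring the proposal that SCAF8 resumes scanning once Ser-5 is dephosphorylated.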