Solution Structure of Tandem SH2 Domains from Spt6 Protein and Their Binding to the Phosphorylated RNA Polymerase II C-terminal Domain

Spt6 is a highly conserved transcription elongation factor and histone chaperone. It binds directly to the RNA polymerase II C-terminal domain (RNAPII CTD) through its C-terminal region that recognizes RNAPII CTD phosphorylation. In this study, we determined the solution structure of the C-terminal region of Saccharomyces cerevisiae Spt6, and we discovered that Spt6 has two SH2 domains in tandem. Structural and phylogenetic analysis revealed that the second SH2 domain was evolutionarily distant from canonical SH2 domains and represented a novel SH2 subfamily with a novel binding site for phosphoserine. In addition, NMR chemical shift perturbation experiments demonstrated that the tandem SH2 domains recognized Tyr1, Ser2, Ser5, and Ser7 phosphorylation of RNAPII CTD with millimolar binding affinities. The structural basis for the binding of the tandem SH2 domains to different forms of phosphorylated RNAPII CTD and its physiological relevance are discussed. Our results also suggest that Spt6 may use the tandem SH2 domain module to sense the phosphorylation level of RNAPII CTD.

sites Ser5, Ser7, and Ser2 (5). Specifically, both Ser5 and Ser7 are phosphorylated by the basal transcription factor TFIIH (Transcription factor II H) at the promoter-proximal region, whereas Ser2 is phosphorylated by the kinase complex pTEFb·CDK9 at the gene-coding region (6,7). Various combinations of phosphorylation and cis-trans isomerization of prolines constitute the so-called "CTD code" (8,9), which serves as a recognition marker for diverse regulatory factors that are involved in transcription initiation, elongation, termination, mRNA processing, and mRNA transport (7,10).
Spt6 is an elongation factor and histone H3·H4 chaperone, which interacts with the nucleosome and RNAPII in Saccharomyces cerevisiae (11,12). Spt6 activates transcription elongation of many genes in vivo (11,13-15), regulates histone H3 lysine 36 methylation, and participates in mRNA 3′-end processing and export (16-18). Many reports have indicated that Spt6 cooperates with FACT (FAcilitates Chromatin Transcription), a histone H2A·H2B chaperone, in the reassembly of chromatin in the wake of transcribing RNAPII (12,19-21). In addition, functional loss of Spt6 results in aberrant short transcripts that initiate from cryptic promoters in the coding regions of genes (15). A previous study reported that murine Spt6 associated with hyperphosphorylated RNAPII in cell extracts through its C-terminal region (residues 1476-1726), where an SH2 domain preferentially recognized Ser2 phosphorylation of RNAPII CTD (18). A point mutation of the phosphate-binding arginine (Arg1528) to lysine in this SH2 domain impaired the binding of Spt6 to RNAPII. Cells that contained this R1528K mutant produced transcripts with splicing defects and were defective in mRNA export (18).
Previous studies have shown that Spt6 contains the first SH2 domain reported to recognize the phosphoserine of RNAPII CTD rather than phosphotyrosine, which is the preferred partner of canonical SH2 domains (18). Structural investigation of this unusual SH2 domain should therefore increase our understanding of its interaction with RNAPII CTD. Recently, three groups reported the crystal structures of the tandem SH2 domains of Saccharomyces cerevisiae, Candida glabrata, and Antonospora locustae Spt6 (22-24). They found that the tandem SH2 domains directly bound to Ser2-phosphorylated RNAPII CTD and that both SH2 domains were essential for Spt6 function in vivo. In all three structures, the first SH2 domain (designated "SH2N" hereafter) had a canonical pocket for phosphate recognition. However, because no complex structure with RNAPII CTD was available, two of the groups proposed different hypotheses about the binding site in the second SH2 domain (designated "SH2C"). To resolve this discrepancy, we used solution NMR to characterize the interaction between these tandem SH2 domains (designated "SH2NC") and RNAPII CTD at the atomic level.
Here, we report the solution structure of the C-terminal region (residues 1250-1440) of S. cerevisiae Spt6. The structure reveals that this fragment consists of two tandem SH2 domains, which are packed in a head-to-tail manner through a conserved interdomain hydrophobic core. Using NMR chemical shift perturbation experiments, we determined the binding affinities and interaction surfaces of the tandem SH2 domains with RNAPII CTD peptides phosphorylated at one or more residues (Tyr1, Ser2, Ser5, or Ser7). The data show that the SH2N domain binds to phosphoserine or phosphotyrosine with the same canonical phosphate binding pocket, which is missing in the SH2C domain. However, the SH2C domain still weakly binds to phosphoserine via a conserved noncanonical binding site and enhances the association between SH2N and RNAPII CTD. Surprisingly, we found that the tandem SH2 domains also bound to Ser7- or Tyr1-phosphorylated RNAPII CTD, modifications that have been reported to be involved in snRNA expression and transcription-coupled DNA repair, respectively (1,25,26). Our data suggest that the SH2NC module of Spt6 can sense the phosphorylation level of RNAPII CTD because the weak binding between the SH2NC module and RNAPII CTD can be additively enhanced by simultaneous phosphorylation of the same or different repeat units of RNAPII CTD.
Chemical Shift Perturbation Experiments-Both synthetic peptides (SciLight-Peptide Biotechnology LLC, Beijing, China) and 15N-labeled proteins were prepared in the same buffer: 10 mM Bis-Tris, pH 7.0, 100 mM NaCl, 1 mM DTT, 1 mM EDTA, and 10% (v/v) 2H2O; the pH was readjusted to 7.0 if needed. A series of 1H-15N HSQC spectra were recorded at 310 K as the peptides were titrated into 500 μl of 0.5 mM protein solution. The total volume of added peptide solution did not exceed 50 μl to ensure that there was no significant dilution of the protein. The combined chemical shift change Δ was calculated using Equation 1, where Δ1H and Δ15N are the chemical shift changes in the 1H and 15N dimensions, respectively. The dissociation constant Kd was deduced by fitting the combined chemical shift change to Equation 2, where Δmax denotes the maximal change in chemical shift when the protein is saturated by the peptide, and c is the total concentration of peptide in solution.
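The fitting procedure described above can be illustrated with a minimal sketch. Equations 1 and 2 are not reproduced in this text, so the forms used here are assumptions: a commonly used weighted combination of the 1H and 15N shift changes (the 15N scaling factor `n_scale` is a placeholder, not the authors' value) and a simple one-site hyperbolic binding curve in which the peptide concentration c stands in for the free ligand concentration. The function names and the synthetic titration points are hypothetical.

```python
import math


def combined_shift(d_h, d_n, n_scale=0.2):
    """Combine 1H and 15N shift changes (ppm) into one value.

    n_scale is an assumed 15N weighting factor, not taken from the paper.
    """
    return math.sqrt(d_h ** 2 + (n_scale * d_n) ** 2)


def bound_shift(c, kd, d_max):
    """Assumed one-site model: delta = d_max * c / (kd + c)."""
    return d_max * c / (kd + c)


def _sse_for_kd(kd, conc, shifts):
    """For a fixed kd the model is linear in d_max, so d_max has a closed form."""
    x = [c / (kd + c) for c in conc]
    d_max = sum(v * y for v, y in zip(x, shifts)) / sum(v * v for v in x)
    sse = sum((y - d_max * v) ** 2 for v, y in zip(x, shifts))
    return sse, d_max


def fit_kd(conc, shifts, kd_min=1e-3, kd_max=1e3, grid=400, refine=60):
    """Least-squares fit of (kd, d_max); kd is in the same units as conc.

    A coarse log-spaced grid locates the minimum, then a golden-section
    search on log(kd) refines it between the neighbouring grid points.
    """
    lo_log, hi_log = math.log(kd_min), math.log(kd_max)
    logs = [lo_log + i * (hi_log - lo_log) / (grid - 1) for i in range(grid)]
    best = min(range(grid),
               key=lambda i: _sse_for_kd(math.exp(logs[i]), conc, shifts)[0])
    lo = logs[max(best - 1, 0)]
    hi = logs[min(best + 1, grid - 1)]
    phi = (math.sqrt(5) - 1) / 2
    for _ in range(refine):
        a = hi - phi * (hi - lo)
        b = lo + phi * (hi - lo)
        if (_sse_for_kd(math.exp(a), conc, shifts)[0]
                < _sse_for_kd(math.exp(b), conc, shifts)[0]):
            hi = b
        else:
            lo = a
    kd = math.exp((lo + hi) / 2)
    return kd, _sse_for_kd(kd, conc, shifts)[1]


if __name__ == "__main__":
    # Synthetic, noise-free titration (mM and ppm); not data from the paper.
    conc = [0.25, 0.5, 1.0, 2.0, 4.0, 8.0]
    shifts = [bound_shift(c, 1.5, 0.2) for c in conc]
    kd, d_max = fit_kd(conc, shifts)
    print(f"fitted Kd = {kd:.3f} mM, fitted max shift = {d_max:.3f} ppm")
```

With millimolar dissociation constants and 0.5 mM protein, the simple hyperbolic form is only an approximation; a fit that accounts for ligand depletion would need the total protein concentration as an additional input.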
Phylogenetic Analysis-We searched the PDB database with the structure of yeast Spt6 SH2C (residues 1359-1440) on the DALI server (33). Structures with a Z-score of >5.0 were aligned with SH2C. The secondary structure regions were aligned according to the three-dimensional comparison, whereas the loop regions were aligned using the ClustalW program (34). With the established sequence alignment (see supplemental data set 1), a neighbor-joining phylogenetic tree was built in the MEGA4 program (35).

Spt6 Has Two SH2 Domains in Tandem-It is reported that RNAPII can be pulled down by a C-terminal fragment (residues 1162-1496) of murine Spt6 (18), which encompasses an SH2 domain (residues 1250-1355) and an additional C-terminal region. The predicted secondary structure (data not shown) of this additional region from different species exhibited a pattern very similar to that of the upstream SH2 domain (namely SH2N), indicating that this region (namely SH2C) is evolutionarily conserved and might also be involved in the interaction between Spt6 and RNAPII.
Sequence analysis indicated that the SH2C domain shares little sequence similarity with canonical SH2 domains and is present only in Spt6. The SH2C domains of the Spt6 proteins were not detected by several important databases, including UniProt, SMART, and Pfam. Further BLAST searches using SH2C as a query in the UniProt database did not identify any SH2 domains from any other protein except for the Spt6 homologues. A sequence alignment showed that the SH2C domain is less conserved than the SH2N domain across different species (see Fig. 2). For instance, between yeast and human, the SH2C domains have only 7.3% sequence identity, compared with 16.5% for the SH2N domains.
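As an illustration of how a pairwise identity figure such as those quoted above can be obtained from an alignment, the following minimal sketch counts identical residues over the aligned columns in which neither sequence has a gap. The text does not state which denominator the authors used, so this convention is an assumption, and the function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither aligned sequence has a gap.

    The choice of denominator (gap-free columns) is an assumption; other
    conventions divide by the alignment length or the shorter sequence.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


if __name__ == "__main__":
    # Toy aligned fragments, not taken from the Spt6 alignment.
    print(percent_identity("ACD-EF", "ACE-EF"))
```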
We determined the solution structure of the tandem SH2 domains (residues 1250-1440) of S. cerevisiae Spt6 using heteronuclear NMR spectroscopy. Structures were calculated with experimental restraints that included NOEs, predicted dihedral angles, backbone amide 1H-15N RDC data, and hydrogen bond information from a hydrogen-deuterium exchange experiment (Table 1). Fig. 1A shows 20 energy-minimized structures of the tandem SH2 module. The backbone root mean square deviation is 0.63 Å for the well restrained region (residues 1259-1283, 1291-1310, and 1321-1438). The overall architecture of the structure clearly displays two SH2 domains arranged in a head-to-tail manner. Both domains adopt the canonical SH2 fold, which consists of three anti-parallel β strands (βB, βC, and βD) that are sandwiched by two α helices (αA and αB) at the N and C termini (Fig. 1B). Two additional short anti-parallel β strands (βE and βF) are inserted between the βD strand and the αB helix (Fig. 1B). Although the sequence similarity between the SH2N and SH2C domains is very low (13.3% identity), their backbone Cα atoms can be superimposed well, with a 1.1 Å root mean square deviation for the secondary structure regions (Fig. 1B).
Two SH2 Domains Intimately Associate with Each Other-The tandem SH2 domains are closely packed against each other through hydrophobic residues and are connected by a short linker (residues 1553-1558). Notably, compared with the SH2C domain, the SH2N domain has a longer αB1 helix, which makes extensive contacts with the SH2C domain. Indeed, some long range NOEs were identified between residues in the αB1 helix of SH2N (Leu1346 and Met1350) and the SH2C domain (Phe1383, Phe1393, Tyr1422, and Val1427). These residues form an interdomain hydrophobic core (Fig. 1C). A sequence alignment further shows that, except for Tyr1422, the residues are all type-conserved among the Spt6 homologues (Fig. 2), suggesting the importance of this interdomain hydrophobic core in the evolution of this tandem SH2 domain module.
NMR dynamics and RDC data support the formation of a rigid module by the tandem SH2 domains. We measured the longitudinal and transverse relaxation times (T1 and T2) and the 1H-15N NOE of the backbone amides (Fig. 3). These dynamics data revealed that the two SH2 domains have similar dynamic properties in solution and that the interdomain linker is as rigid as the secondary structure regions. In addition, a single common alignment tensor is sufficient to back-calculate the RDC data for both domains, and the back-calculated RDC values agree well with the measured ones (see supplemental Fig. S1). These results show that the two SH2 domains tumble as a single unit in solution and have a fixed relative orientation (Fig. 2B).
Tandem SH2 Domains Bind to Phosphorylated RNAPII CTD-The interaction between the tandem SH2 domains and the phosphorylated RNAPII CTD peptides was investigated using chemical shift perturbation experiments. Eight unmodified or phosphorylated RNAPII CTD peptides were individually titrated into the 15N-labeled protein (Table 2). All of the phosphorylated peptides induced a number of distinct chemical shift perturbations in the 1H-15N HSQC spectra of SH2NC, whereas no perturbation was found for the unmodified peptide (Fig. 4 and supplemental Fig. S2), indicating that the interaction between the SH2NC module and the RNAPII CTD is phosphorylation-dependent. In addition, the observation of a single set of resonances averaged over the bound and free states in the 1H-15N HSQC spectra indicated that the interaction between the phosphorylated peptides and the protein was in the fast exchange regime.
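For reference, the standard two-state description of fast exchange explains why a single averaged resonance is seen and why its position tracks the bound fraction. The symbols below (populations p, shifts δ, exchange rate k_ex, and the free-bound frequency difference Δω) are introduced here for illustration and do not appear in the original text.

```latex
\delta_{\mathrm{obs}} = p_{\mathrm{free}}\,\delta_{\mathrm{free}} + p_{\mathrm{bound}}\,\delta_{\mathrm{bound}},
\qquad p_{\mathrm{free}} + p_{\mathrm{bound}} = 1,
\qquad k_{\mathrm{ex}} \gg \lvert \Delta\omega \rvert
```

Under this condition the observed shift moves continuously from the free to the bound position as peptide is added, which is what allows a dissociation constant to be extracted from the titration curves.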
Yoh and coworkers (18) have reported that Spt6 preferentially binds to Ser2-phosphorylated (pS2) over Ser5-phosphorylated (pS5) RNAPII CTD. Our data confirmed that the binding affinity of the SH2NC module for the pS2 peptide is higher than that for the pS5 peptide (see Table 2). More interestingly, the pY1 peptide showed the highest binding affinity. We found that the tandem SH2 domains also bound to pS7, which is a recently confirmed phosphorylation site of RNAPII CTD (3,26).
Double phosphorylation at Ser2 and Ser5 was identified previously in the same RNAPII CTD repeat (6,7). We wanted to investigate whether these two phosphorylations act cooperatively in the interaction between Spt6 and RNAPII CTD. We found that adding pS5 to the pS2 peptide increased the binding affinity by ~2-fold. This result reveals that the simultaneous phosphorylation of Ser2 and Ser5 in the same repeat enhances the association between RNAPII CTD and Spt6 but does not exhibit a synergistic effect.
We also investigated whether longer peptides with multiple phosphorylation sites would bind more tightly to the SH2NC module, which presumably has two binding pockets for RNAPII CTD. Indeed, both the two-repeat peptide pS2(1)+pS2(2) and the three-repeat peptide pS2(1)+pS2(3), each with two Ser2 sites phosphorylated, have 2-3 times higher affinities than the single-repeat pS2 peptide (Table 2). These results show that SH2NC preferentially binds to RNAPII CTD peptides with double or multiple phosphorylation sites. Furthermore, it appears that two consecutive repeats are long enough to span the binding sites in the two SH2 domains, as indicated by the similar affinities of the pS2(1)+pS2(2) and pS2(1)+pS2(3) peptides.
Using solution NMR, we showed that the SH2NC module of Spt6 bound to both phosphoserine and phosphotyrosine residues of RNAPII CTD with very low affinities (mM), which are difficult to detect by other analytical techniques such as fluorescence polarization, surface plasmon resonance, or isothermal titration calorimetry (23). Our work also confirmed that NMR is well suited to characterizing extremely weak protein-protein interactions (36).
Interaction Surface on Tandem SH2 Domains-All phosphorylated RNAPII CTD peptides perturbed almost the same region in the SH2N domain (Fig. 4, A-E), whereas only a few residues in the SH2C domain were weakly affected by the [...] (Figs. 4B and 5C). Arg1282 is an invariant residue corresponding to Arg175 in the Src SH2 domain (Fig. 6, A and B), which binds to phosphate through its positively charged side chain (Fig. 6A) (37). Other perturbed residues, except Leu1316 and Gly1319, are located around the canonical binding pocket of the SH2N domain (Fig. 5, C and D). Among them, residues Asn1263, Gly1264, Arg1265, Gln1266, and Asp1269 reside in the αA1 helix, which is close to the side chain of Arg1282. Residues Asp1289-Val1292 are in the βC1 strand opposite to Arg1282, whereas His1304 and Ile1307 are located between the canonical pocket and the EF loop.
In the Src SH2 domain, the EF loop is involved in the recognition of the hydrophobic residues C-terminal to the phosphotyrosine (37). However, in the SH2N domain, only one residue (Asp1325) in this region was slightly perturbed (Fig. 4C).
In contrast to the pY1, pS5, and pS7 peptides, both the single-repeat pS2 peptide and the double Ser2-phosphorylated peptides (pS2(1)+pS2(2) and pS2(1)+pS2(3)) induced weak chemical shift perturbations of a few residues in the SH2C domain. Interestingly, all of these perturbed residues (Tyr1381, Tyr1382, Thr1436, and Leu1437) are located on a positively charged surface where two invariant residues, Tyr1381 and Lys1435, are found (Fig. 5, C and D). The structural analysis and sequence alignment of the SH2C domains from different species showed a high degree of conservation of this positively charged surface. In contrast, in the region corresponding to the binding pocket of a canonical SH2 domain, the critical arginine residue in the βB2 strand is replaced by a serine (Ser1384) (Fig. 6C). More importantly, no residues in this area experienced a chemical shift perturbation. These results indicate that the SH2C domain has lost its canonical phosphate binding pocket but uses a novel binding site to recognize phosphoserine.
Both SH2 Domains Contribute to RNAPII CTD Binding-Compared with the SH2N domain, the chemical shift perturbation of the SH2C domain was much weaker upon titration with the phosphorylated RNAPII CTD peptides (Fig. 4). To find out whether both domains were essential for RNAPII CTD binding, we tried to express the SH2N and SH2C domains individually. However, the SH2C domain was expressed in inclusion bodies, possibly because of the solvent exposure of the interdomain hydrophobic interface. Fortunately, the isolated SH2N domain was soluble and stable enough for NMR analysis. We successfully assigned the backbone resonances of the isolated SH2N domain and titrated it with the double Ser2-phosphorylated peptide pS2(1)+pS2(2). The peptide perturbed the same residues as in the SH2NC module, and to a similar extent (Fig. 4H). However, the dissociation constant increased from 1.5 ± 0.2 mM (for SH2NC) to 2.5 ± 0.2 mM (supplemental Fig. S3). Furthermore, the simultaneous mutation of both binding sites (R1282A, S1283A, Y1381F, and K1435A) in the SH2N and SH2C domains completely abolished the interaction between the SH2NC module and the pS2(1)+pS2(2) peptide (Fig. 5B). These results suggested that both the SH2N and the SH2C domains contribute to the interaction between Spt6 and phosphorylated RNAPII CTD.
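To put the change in dissociation constant into energetic terms, the following sketch converts the two reported values (1.5 mM for SH2NC and 2.5 mM for the isolated SH2N domain) into standard binding free energies using ΔG° = RT ln Kd with a 1 M standard state, at the 310 K used for the titrations. The function names are hypothetical, the reported uncertainties (± 0.2 mM) are ignored, and the calculation is an illustration rather than an analysis performed in the paper.

```python
import math

GAS_CONSTANT = 8.314462618  # J/(mol*K)


def binding_free_energy_kj(kd_molar, temp_k=310.0):
    """Standard binding free energy in kJ/mol from Kd in mol/L (1 M standard state)."""
    return GAS_CONSTANT * temp_k * math.log(kd_molar) / 1000.0


def delta_delta_g_kj(kd_tight_molar, kd_weak_molar, temp_k=310.0):
    """Free energy lost on going from the tighter to the weaker binder, in kJ/mol."""
    return GAS_CONSTANT * temp_k * math.log(kd_weak_molar / kd_tight_molar) / 1000.0


if __name__ == "__main__":
    kd_sh2nc = 1.5e-3  # M, tandem module (value reported in the text)
    kd_sh2n = 2.5e-3   # M, isolated SH2N domain (value reported in the text)
    print(f"dG(SH2NC) = {binding_free_energy_kj(kd_sh2nc):.2f} kJ/mol")
    print(f"dG(SH2N)  = {binding_free_energy_kj(kd_sh2n):.2f} kJ/mol")
    print(f"ddG       = {delta_delta_g_kj(kd_sh2nc, kd_sh2n):.2f} kJ/mol")
```

The difference comes to roughly 1.3 kJ/mol, which is consistent with the description of the SH2C contribution as weak.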

SH2C Domain Represents a Novel SH2 Subfamily-The SH2 domain is a widespread protein interaction module and is involved in various cell functions such as cell surface receptor signal transduction, protein trafficking, cell cycle progression, gene expression, DNA repair, and cell polarity (38,39). Canonical SH2 domains specifically recognize phosphotyrosine with relatively high affinity (μM) (37).
Canonical SH2 domains, such as the Src SH2 domain, have a phosphate binding pocket around a highly conserved arginine residue in the βB strand, whose side chain forms a hydrogen bond with the phosphate group of its binding partner. This arginine is highly conserved even in SH2 domains that can bind unmodified peptides, such as the SAP and CTEN SH2 domains (40,41). However, in the yeast Spt6 SH2C domain, this characteristic arginine residue is replaced by a serine (Ser1384) (Figs. 2 and 6C). The phylogenetic tree in Fig. 7 suggests that there is no close evolutionary relationship between the Spt6 SH2C domain and any other SH2 domain of known structure. Interestingly, two residues (Tyr1381 and Lys1435) in the noncanonical phosphate binding site are invariant in all aligned species (Fig. 2). All of these analyses suggest that the Spt6 SH2C domain represents a novel SH2 domain subfamily, which has a noncanonical binding site for phosphoserine.

Structural Basis for Interaction between SH2NC Module and Phosphorylated RNAPII CTD-Our NMR studies reveal that the SH2N domain binds to the phosphoserine and phosphotyrosine of RNAPII CTD with millimolar affinity, which is ~1000 times weaker than the binding of canonical SH2 domains to phosphotyrosine. The canonical Src SH2 domain binds to phosphotyrosine using a highly conserved arginine (Arg175) and a less well conserved serine (Ser177) (Fig. 6A). A region around the EF loop recognizes the +3 to +5 residues downstream of the phosphotyrosine and defines the sequence specificity. This binding mode has been called the "two-pronged" model (37). Canonical SH2 domains accommodate the phosphotyrosine aromatic ring through the attraction between the side chain amino groups of Arg155/Lys203 and the π-electrons of phosphotyrosine (37). However, in the SH2N domain of Spt6, the two residues corresponding to Arg155 and Lys203 are replaced by Gly1264 and Asp1306 (Fig. 4A). Residue Gly1264 lacks a side chain to attract the aromatic ring, whereas the negatively charged Asp1306 may repel the π-electrons of the phosphotyrosine. These residue differences help explain why the binding affinity of the SH2N domain for phosphotyrosine drops dramatically (to the millimolar range) despite the retention of the phosphate binding arginine (Arg1282) and serine (Ser1284). The binding fidelity of the canonical pocket of the SH2N domain is also affected, so that both phosphoserine and phosphotyrosine can bind to SH2N.
In all of the titration experiments, only one residue (Asp1325) of the EF loop was weakly perturbed upon the addition of a phosphorylated RNAPII CTD peptide (Fig. 4), indicating that the EF loop is only marginally involved in the binding to RNAPII CTD. This result is consistent with the fact that the SH2N domain shows little selectivity toward the residues C-terminal to the phosphorylation site. In a typical interaction between an SH2 domain and phosphotyrosine, the binding strength is mostly determined by the recognition of the phosphotyrosine, which contributes more than half of the binding free energy (42). Thus, the electrostatic interaction between the side chains of residues Arg1282/Ser1284 and the phosphate group should contribute the majority of the binding energy of the SH2N domain.
Our data also indicated that the SH2C domain has a noncanonical phosphate binding site for phosphoserine. Based on the structural analysis and chemical shift perturbation information, we identified a conserved positively charged site around Tyr1381 and Lys1435 that weakly bound to Ser2-phosphorylated RNAPII CTD.
Physiological Relevance of Binding between Tandem SH2 Domains and Phosphorylated RNAPII CTD-Our NMR studies indicated that the SH2NC module of Spt6 exhibited only very weak binding affinity (in the millimolar range) for short phosphorylated RNAPII CTD peptides in vitro. However, it was demonstrated that the SH2NC module could readily pull down the hyperphosphorylated full-length RNAPII from a cell extract in a Ser2 phosphorylation-dependent manner (18). Disruption of this phosphorylation-dependent binding between Spt6 and RNAPII CTD causes transcription reinitiation from a cryptic promoter and a high level of mRNA accumulation in the nucleus (15,18).
In vivo, the binding of SH2NC to RNAPII CTD could be enhanced significantly. The most likely scenario is that dozens of YSPTSPS repeats of RNAPII are simultaneously phosphorylated at multiple sites and that this dramatically increases the effective local concentration of the recognition sites for the SH2NC module, facilitating the recruitment of Spt6. Our chemical shift perturbation experiments indicated that double phosphorylation in either the same repeat unit (pS2+pS5 peptide) or different repeat units (pS2(1)+pS2(2) and pS2(1)+pS2(3) peptides) could additively increase the affinity of the SH2NC module for RNAPII CTD. However, the small enhancement also showed that double phosphorylation in the same or different repeats was not synergistic. Given this observation, the SH2NC module may function as a sensor for the phosphate density in RNAPII CTD, i.e. the higher the phosphorylation level of RNAPII CTD, the tighter the binding between Spt6 and RNAPII.
Another way to enhance the association between Spt6 and RNAPII would be for a third RNAPII-associated factor to tether Spt6 in the proximity of RNAPII CTD, thereby increasing the chance of an encounter between SH2NC and RNAPII CTD. For example, it has been reported that Spt6 co-immunoprecipitates with RNAPII, Pob3, Spt16, Spt4, and Spt5 (43) and that Spt5 associates with RNAPII in vivo (13). Recently, Mayer et al. (44) reported a genome-wide occupancy profile of the different phosphorylated forms of RNAPII and Spt6 along the top 50% of the most highly expressed genes in yeast. They found that the deletion of the SH2NC module led to much less recruitment of Spt6 to RNAPII but did not completely abolish this process (44). This finding not only shows that the SH2NC module is required for the recruitment of Spt6 to phosphorylated RNAPII during transcription but also supports the idea that there is another mechanism for Spt6 association with RNAPII, which is independent of the binding between SH2NC and RNAPII CTD.
We found that the SH2NC module of Spt6 also weakly binds to pS7 RNAPII CTD. Ser7 phosphorylation is a recently discovered modification of RNAPII CTD (3,26) that is catalyzed by the same kinase complex (TFIIH) as Ser5 phosphorylation (4). This RNAPII CTD phosphorylation can be detected in both protein-coding and snRNA genes (26). Substituting Ser7 with an alanine residue did not affect the transcription or processing of protein-coding genes but dramatically reduced the transcription and processing of two spliceosomal snRNA genes (26). Our results imply that Spt6 might be involved in snRNA expression.
Compared with Ser2, Ser5, and Ser7 phosphorylation of RNAPII CTD, the role of Tyr1 phosphorylation in gene transcription and processing has not been studied thoroughly. It has been reported that Tyr1 phosphorylation is catalyzed by c-Abl and Abl-related gene (Arg) kinases in the DNA damage response in mammalian cells (1,25). DNA damage in the coding region of a gene can arrest RNAPII-directed transcription and trigger a process called transcription-coupled repair to remove the lesion in the transcribed strand (45). Because the SH2NC modules are conserved from yeast to humans, we assume that mammalian Spt6 also has the ability to bind to pY1 RNAPII CTD and might participate in transcription-coupled DNA repair.
The broad involvement of Spt6 in the transcription cycle depends on its association with RNAPII. The ability to bind to different phosphorylated forms of RNAPII CTD is critical for the recruitment of Spt6 to dynamically phosphorylated RNAPII. We find that Spt6 directly binds to Tyr1-, Ser2-, Ser5-, or Ser7-phosphorylated RNAPII CTD. We propose that the SH2NC module of Spt6 probably functions as a sensor for the phosphate density in RNAPII CTD and regulates the interaction between Spt6 and RNAPII. In addition, our data suggest that Spt6 may play a role in snRNA expression and transcription-coupled DNA repair.