Structural determinants for accurate dephosphorylation of RNA polymerase II by its cognate C-terminal domain (CTD) phosphatase during eukaryotic transcription

The C-terminal domain (CTD) of RNA polymerase II contains a repetitive heptad sequence (YSPTSPS) whose phosphorylation states coordinate eukaryotic transcription by recruiting protein regulators. The precise placement and removal of phosphate groups on specific residues of the CTD are critical for the fidelity and effectiveness of RNA polymerase II–mediated transcription. During transcriptional elongation, phosphoryl-Ser5 (pSer5) is gradually dephosphorylated by CTD phosphatases, whereas Ser2 phosphorylation accumulates. Here, using MS, X-ray crystallography, protein engineering, and immunoblotting analyses, we investigated the structure and function of Drosophila melanogaster SSU72 homolog, RNA polymerase II CTD phosphatase (Ssu72), an essential CTD phosphatase that dephosphorylates pSer5 at the transition from elongation to termination, to determine how Ssu72 distinguishes the highly similar pSer2 and pSer5 CTD motifs. We found that Ssu72 dephosphorylates pSer5 efficiently but has only low activity toward pSer7 and pSer2. The structural analysis revealed that Ssu72 requires the proline residue in the substrate's SP motif to be in the cis configuration, forming a tight β-turn for recognition by Ssu72. We also noted that residues flanking the SP motif, such as the bulky Tyr1 next to Ser2, prevent formation of this configuration and enable Ssu72 to distinguish among the different SP motifs. Phosphorylation of Tyr1 further prevented Ssu72 binding to pSer2 and thereby prevented untimely Ser2 dephosphorylation. Our results reveal critical roles for Tyr1 in differentiating the phosphorylation states of Ser2 and Ser5 in the RNA polymerase II CTD, which occur at different stages of transcription.

The CTD² is an intrinsically disordered domain of eukaryotic RNA polymerase II, the polymerase responsible for transcribing all protein-coding mRNAs as well as some small nuclear RNAs and microRNAs in eukaryotes (1,2). Although the CTD sequence is surprisingly simple, with the consensus heptad Y1S2P3T4S5P6S7 repeated 17–52 times depending on the species, this domain undergoes extensive post-translational modification (PTM) during transcription, predominantly phosphorylation (3). Different PTMs or combinations of PTMs orchestrate transcription by recruiting various regulatory proteins at specific stages of the eukaryotic transcription cycle (2,4). Coordinated phosphorylation and dephosphorylation of the CTD by CTD-modifying enzymes is thus essential for accurate transcription of genetic information (5,6).
Phosphorylation at Ser2 and Ser5 of the CTD is the primary modification in each round of transcription (7). pSer5 is detected predominantly during transcription initiation and is required for recruitment of the capping enzyme (8,9). During the transition from initiation to productive elongation, Ser5 sites are gradually dephosphorylated, whereas Ser2 sites become phosphorylated and eventually become the major phosphorylated species in the later stages of transcription (7,10). At the end of transcription, all remaining phosphate groups are removed from the CTD, because nonphosphorylated CTD is required for RNA polymerase II to recycle and bind the next promoter (11). In addition to Ser2 and Ser5, the other three residues of the heptad (Tyr1, Thr4, and Ser7) are also phosphorylated in vivo (1). Among these three alternative phosphorylation sites, Tyr1 is particularly interesting because it is highly conserved in all species, and its mutation leads to defects in RNA polymerase II stability (12), antisense transcription (13), and elongation/termination (14,15). However, the molecular mechanism by which Tyr1 and its phosphorylation regulate transcription remains unclear.
Just as proper placement of phosphorylation marks on the CTD is essential for engaging in transcription, so too is specific and accurately timed removal of these marks. For example, deficiency in pSer5 removal causes termination defects that lead to transcriptional read-through (16). Failure to dephosphorylate pSer2 leads to cell death, because promoter binding of RNA polymerase II requires a hypophosphorylated CTD (17). However, it remains unclear how CTD phosphatases distinguish the highly similar phosphoryl motifs once they are recruited to RNA polymerase II. Discrimination between pSer2 and pSer5 is particularly challenging, because both are phosphorylated during elongation and their flanking residues are very similar (Y1S2P3 and T4S5P6). Nevertheless, cells achieve this with precision: during transcription elongation and termination, pSer5 levels diminish rapidly while pSer2 levels steadily increase (18). This task is accomplished mainly by Ssu72, a conserved, essential CTD phosphatase that dephosphorylates pSer5 at the transition from elongation to termination (19,20).

This work was supported by National Institutes of Health Grants R01 GM104896 (to Y. Z.) and R01 GM125882 (to Y. Z. and J. S. B.) and Welch Foundation Grant F-1778 (to Y. Z.). The authors declare that they have no conflicts of interest with the contents of this article. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. This article contains Figs. S1–S4 and Tables S1 and S2. The atomic coordinates and structure factors (code 6NPW) have been deposited in the Protein Data Bank (http://wwpdb.org/).
¹ To whom correspondence should be addressed. Tel.: 512-471-8645; E-mail: jzhang@cm.utexas.edu.
² The abbreviations used are: CTD, C-terminal domain; PTM, post-translational modification; EMSA, electrophoretic mobility shift assay; UVPD, UV photodissociation; BME, β-mercaptoethanol; yCTD, yeast CTD; MALDI-TOF, matrix-assisted laser desorption ionization-time of flight.
To understand the molecular basis of this highly specific pSer5 dephosphorylation, we investigated the structural determinants of Ssu72 specificity. Using synthetically phosphorylated CTD peptides and recombinant CTD mutants, we found that Ssu72 has weak phosphatase activity against pSer2 in addition to its previously reported strong activity against pSer5. The unique substrate-binding mode of Ssu72, which requires the proline of the SP motif to adopt the cis configuration and form a tight β-turn, provides the structural basis for the pSer5 preference (Protein Data Bank (PDB) code 6NPW). Because of steric hindrance, the bulky side chain of Tyr1 makes it energetically unfavorable for pSer2-Pro3 to adopt this configuration. Using CTD substrates in which Tyr1 is mutated to other residues, which alter the inherent specificity of Ssu72, we further show that Tyr1 underlies the ability of Ssu72 to distinguish pSer5 from pSer2. Additionally, structural analysis indicates that the cis-proline configuration at the Ser2-Pro3 motif becomes even more unfavorable when Tyr1 is phosphorylated, which may help prevent inappropriate pSer2 dephosphorylation by Ssu72 during the transcription cycle. Our results provide a structural explanation for the accurate removal of CTD phosphate groups and reveal a new role of Tyr1 in preventing inappropriate removal of Ser2 phosphorylation during transcription.

Dephosphorylation activity toward other heptad residues of the CTD by Ssu72
As shown by previous kinetic and crystallographic studies from our laboratory and others, Ssu72 predominantly dephosphorylates pSer5 on CTD heptad repeats (21)(22)(23). However, Ssu72 has also been reported to dephosphorylate pSer7, albeit with much lower activity than for pSer5 (24). Ssu72 is part of the cleavage and polyadenylation factor complex (25), in which it forms stable interactions with the N-terminal domain of the scaffolding protein symplekin (26,27). Complex formation enhances Ssu72 phosphatase activity by increasing protein stability in vitro (21). We therefore used the Ssu72-symplekin complex in our biochemical and structural studies. In all experiments below, results reported for Ssu72 reactions were obtained with the Ssu72-symplekin complex, which provides better stability in vitro and higher experimental reproducibility.
To test whether Ssu72 can dephosphorylate other sites of the CTD in addition to the reported pSer5 and pSer7, we needed a highly sensitive assay that detects CTD dephosphorylation products even at low abundance. Instead of the malachite green assay, which suffers from a narrow detection window and a poor signal/background ratio (28), we used matrix-assisted laser desorption ionization-time of flight (MALDI-TOF) MS, which can detect low-abundance products in the picomolar range. Using a short 10-mer pSer5 CTD peptide treated with Ssu72 as a proof of concept, we found that 2,5-dihydroxybenzoic acid as the matrix allows effective ionization of both phosphorylated and dephosphorylated CTD species (Fig. 1A). We observed an 80-Da downshift in the MALDI spectra when the phosphoryl peptide was incubated with Ssu72 (Fig. 1A, gold trace), consistent with removal of a single phosphate group from pSer5. Our MALDI-based assay is therefore suitable for detecting Ssu72 activity against suboptimal substrates.
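The 80-Da shift used to read out dephosphorylation follows from the mass of the HPO3 unit (about 79.966 Da, monoisotopic) that phosphorylation adds. For singly charged MALDI ions, the mass difference between two peaks divided by this value gives the number of phosphates gained or lost. A minimal Python sketch of that arithmetic follows; the function names and peak masses are hypothetical and chosen only for illustration.

```python
# Monoisotopic mass change for adding one phosphate (HPO3), in daltons.
PHOSPHATE_DA = 79.966


def phosphates_removed(mass_before: float, mass_after: float) -> int:
    """Estimate phosphates removed (positive) or added (negative) between two
    singly charged MALDI peaks, rounding to the nearest whole phosphate."""
    return round((mass_before - mass_after) / PHOSPHATE_DA)


def expected_mass(unmodified_mass: float, n_phosphates: int) -> float:
    """Mass of a peptide carrying n phosphate groups."""
    return unmodified_mass + n_phosphates * PHOSPHATE_DA


# Hypothetical peak positions (Da) for a singly phosphorylated 10-mer before and
# after Ssu72 treatment; illustrative numbers, not measured values.
substrate_peak = 1160.4
product_peak = 1080.5
print(phosphates_removed(substrate_peak, product_peak))  # 1
```

Rounding to the nearest integer tolerates small calibration errors, but for multiply phosphorylated full-length CTD the same idea only gives an approximate count, which is why the text reports values such as "~52" phosphates.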
Ssu72 exhibits high structural similarity to low-molecular-weight tyrosine phosphatases and has an almost identical reaction mechanism (23). To test whether Ssu72 has phosphatase activity against pTyr1 of the CTD, we first assayed a 9-mer pTyr1 CTD peptide by MALDI-TOF and detected no dephosphorylation (Fig. 1B). To ensure that the lack of activity was not due to a failure to target Ssu72 to the substrate, we treated with Ssu72 a 10-mer peptide doubly phosphorylated at Tyr1 and Ser5, in which pSer5 recruits Ssu72 to the substrate (Fig. 1C). One phosphate was rapidly removed in the reaction, which we attribute to dephosphorylation of pSer5 as shown previously (Fig. 1A), but no further dephosphorylation of the peptide was detected (Fig. 1C). Our results indicate that Ssu72 cannot dephosphorylate pTyr1 of the RNA polymerase II CTD.
We next asked whether Ssu72 could dephosphorylate other serine/threonine residues of the CTD. Ssu72 was previously reported to dephosphorylate pSer7 with activity about 4000-fold lower than against pSer5 (24), but no activity against pThr4 or pSer2 has been reported. To test whether Ssu72 can dephosphorylate pThr4, we used a CTD peptide doubly phosphorylated at Thr4 and Ser5 as the substrate, so that pSer5 ensures recruitment of the phosphatase. When the product was analyzed by MALDI-TOF, only one phosphate had been removed, even after prolonged incubation (Fig. 1D, gold trace). We have previously shown crystallographically that Ssu72 maintains strong pSer5 specificity even in the presence of pThr4, suggesting that the removed phosphate derives from pSer5 and not pThr4 (29). Therefore, Ssu72 shows no detectable phosphatase activity against pThr4.
In contrast, pSer2 could potentially act as an Ssu72 substrate, because, like pSer5, it is followed by a proline that forms an SP motif for recognition. To determine whether pSer2 is subject to dephosphorylation by Ssu72, we used a 10-mer synthetic peptide containing pSer2 as the substrate. Incubation of this peptide with Ssu72 produced a peak 80 Da lighter, consistent with at least partial removal of the pSer2 mark (Fig. 1E, gold trace). Longer incubation increases the intensity of this peak, but, unlike with the pSer5 peptide (Fig. 1A), full dephosphorylation is never reached. This result is surprising, because Ssu72 activity against pSer2 has not been reported previously. Our sensitive MALDI-TOF data suggest that Ssu72 can dephosphorylate pSer2 in vitro, albeit with lower activity than toward pSer5.

Tyr1 determines the specificity of CTD phosphatase Ssu72

Ssu72 activity against full-length CTD
Although Ssu72 shows only weak phosphatase activity against pSer2 in CTD peptides, we considered the possibility that dephosphorylation could be substantially more prevalent with full-length CTD as the substrate, owing to the high local concentration of pSer2. To test this, we used full-length yeast CTD (26 repeats with two SP motifs in each repeat, Ser2-Pro3 and Ser5-Pro6; Fig. S1) as the substrate for Ssu72. To achieve saturating phosphorylation at all SP motifs, we incubated the CTD with Erk2 under conditions of excess ATP and extended reaction time, and quantified the number of phosphates added using MALDI-TOF. WT full-length yeast CTD was almost completely phosphorylated on all SP motifs (~52 phosphorylation sites detected for the 52 SP motifs of 26 CTD repeats) (Fig. 2A, blue trace). Using this fully phosphorylated CTD, with all Ser2 and Ser5 sites phosphorylated, as the substrate, we characterized the pattern of CTD dephosphorylation by Ssu72 using MALDI-TOF, electrophoretic mobility shift assay (EMSA), and immunoblotting (Fig. 2, A–C). A time-course EMSA indicated that Ssu72-mediated CTD dephosphorylation occurs in two phases: a fast phase completed within the first ~10 min, followed by a slow phase in which dephosphorylation proceeds gradually with prolonged incubation (Fig. 2B). A sample taken after 90 min of reaction was analyzed by MALDI-TOF and carried ~26 phosphates on the CTD (Fig. 2A, gold trace). To determine the identities of the phosphoryl species, we monitored dephosphorylation by immunoblotting with phosphoryl-specific antibodies (Fig. 2C). Incubation of fully phosphorylated yeast CTD with Ssu72 caused a rapid loss of the pSer5 signal detected by the pSer5-specific antibody (3E8) (Fig. 2C). In contrast, the pSer2 signal initially increased during dephosphorylation, because removal of pSer5 allows better binding of the pSer2 antibody (3E10) (2). The pSer2 level eventually starts to drop after 5 h of incubation. Overall, when Ssu72 dephosphorylates full-length CTD carrying both pSer2 and pSer5, it removes pSer5 rapidly, leaving pSer2 as an intermediate species that is dephosphorylated very slowly.

Figure 1. Analysis of the specificity of Ssu72 toward phosphorylated CTD peptides using MALDI. A–E, sequences of the CTD peptides are indicated in the top left-hand corner of each graph, with phosphoryl residues highlighted in red and a "p" indicating phosphorylation. The color scheme of the traces is shown in the top right-hand corner: the black trace is the control experiment with no phosphatase treatment, and the gold trace is with Ssu72 treatment. Three independent biological replicates were sampled for each assay with consistent results; one representative sample is shown in each panel. "M" indicates the substrate peptide, "−#P" indicates the approximate number of phosphates removed based on mass shift, and "+Na" indicates a sodium adduct.

Figure 2. Samples with no kinase/phosphatase treatment are used as a control and shown as a black trace. Samples treated under saturating conditions with the kinase Erk2 are shown as a blue trace, and samples further treated with Ssu72 are shown as a gold trace. Mass labels indicate m/z at peak intensity. "#P" indicates the approximate number of phosphates added based on mass shift. A general figure legend is shown in the top right-hand corner. A, portion of the MALDI-MS spectrum of WT GST-yCTD treated with Erk2 (blue) followed by dephosphorylation with Ssu72 (gold), with the control shown in black. B, electrophoretic mobility shift assay using a 10% native gel for Erk2-treated GST-yCTD after the indicated times of Ssu72 treatment. C, immunoblotting of Ssu72-mediated dephosphorylation of GST-yeast CTD probed with pSer5 (3E8) and pSer2 (3E10) antibodies. The error bars represent the S.E. D, portion of the MALDI-MS spectrum of 3C protease-digested GST-5CTD treated with Erk2 (blue) followed by dephosphorylation with Ssu72 (gold), with the control reaction shown in black. E, portion of the MALDI-MS spectrum of 3C protease-digested GST-3CTD treated with Erk2 (blue) followed by dephosphorylation with Ssu72 (gold), with the no-kinase control reaction shown in black. F, phosphorylation localized to the residues highlighted in yellow by UVPD-MS of the product in E. Extracted ion chromatograms of 3C protease-digested GST-3CTD carrying one (green; m/z 915.4) and two phosphates (blue; m/z 942.0) are shown.
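The fast-slow behavior of full-length CTD is what one would expect if the substrate carried two pools of sites that Ssu72 removes at very different rates. A minimal sketch, assuming two independent first-order pools of 26 sites each (pSer5 and pSer2) and hypothetical rate constants that the study does not report, reproduces the qualitative shape: the fast pool is essentially gone within ~10 min, and roughly half of the 52 phosphates remain at 90 min.

```python
import math

# Illustrative parameters only: the two phases are described qualitatively
# (fast phase within ~10 min, slow phase over hours), not as rate constants.
N_FAST, N_SLOW = 26, 26       # pSer5 sites (fast pool) and pSer2 sites (slow pool)
K_FAST, K_SLOW = 0.5, 0.0005  # hypothetical first-order rate constants, 1/min


def phosphates_remaining(t_min: float) -> float:
    """Phosphates left on a 26-repeat CTD after t minutes of Ssu72 treatment,
    modeled as two independent first-order pools (a biexponential decay)."""
    return N_FAST * math.exp(-K_FAST * t_min) + N_SLOW * math.exp(-K_SLOW * t_min)


def half_life(k: float) -> float:
    """Half-life (in the time unit of 1/k) of a first-order process."""
    return math.log(2) / k


for t in (0, 10, 90, 300):
    print(f"t = {t:>3} min: {phosphates_remaining(t):5.1f} phosphates")
print(f"half-lives: fast {half_life(K_FAST):.1f} min, slow {half_life(K_SLOW):.0f} min")
```

Fitting such a model to real time-course data would require quantitative measurements (for example, densitometry of the EMSA bands); the constants above only illustrate how a three-orders-of-magnitude difference in rate produces a plateau at about half phosphorylation.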
To identify the phosphorylation sites with single-residue resolution, we intended to use tandem MS. However, because the CTD lacks basic residues, trypsin cannot cleave it into smaller fragments for identification. We therefore investigated whether the same fast-slow two-phase dephosphorylation by Ssu72 also occurs on shorter recombinant CTDs, which could be mapped without trypsinolysis. Indeed, when Erk2 was used to phosphorylate a recombinant GST fusion carrying five CTD repeats at its C terminus, a maximum of 10 phosphates was added, accounting for all Ser2 and Ser5 sites (Fig. 2D, blue trace). Ssu72 treatment removed approximately half of these phosphoryl groups (Fig. 2D, gold trace). A second construct was generated with three CTD repeats fused C-terminal to GST. Phosphorylation under saturating conditions added six phosphates (Fig. 2E, blue trace), up to half of which were removed by Ssu72 (Fig. 2E, gold trace). To locate phosphorylation with single-residue resolution, we applied UV photodissociation (UVPD) using 193-nm photons as an alternative to the existing collision- and electron-based activation methods for tandem MS (30). UVPD provides several compelling advantages for MS/MS analysis of the CTD (31). First, because the UVPD process is not modulated by mobile protons (unlike collision-induced dissociation), UVPD generates rich fragmentation patterns for both positively and negatively charged peptides, which makes it especially suitable for highly phosphorylated peptides such as those of the CTD. Second, UVPD is a fast, high-energy activation method that, unlike collision-induced dissociation, does not dislodge labile modifications, making it well suited to phosphopeptide analysis. Both advantages make UVPD a natural choice for CTD analysis. UVPD-MS of the Ssu72 dephosphorylation products showed that the major species are consistent with a mixture of CTD peptides retaining pSer2 at different heptads (Fig. 2F). Overall, our results with CTD peptides and recombinant CTD proteins support a model in which Ssu72 rapidly and preferentially dephosphorylates pSer5 but can also dephosphorylate pSer2 with much lower activity.
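As a consistency check on the extracted ion chromatograms listed in the Figure 2 legend (m/z 915.4 for one phosphate and 942.0 for two), the m/z spacing between species that differ by one phosphate reveals their charge state: 942.0 − 915.4 = 26.6, and 79.966/26.6 ≈ 3, consistent with triply protonated ions. The sketch below performs this arithmetic under the assumption that both ions are positive-mode [M + zH]z+ species of equal charge; the charge state itself is not stated in the text, and the function names are illustrative.

```python
PHOSPHATE_DA = 79.966  # mass of one phosphate (HPO3), Da
PROTON_DA = 1.00728    # mass of a proton, Da


def charge_from_spacing(mz_low: float, mz_high: float, n_phosphates: int = 1) -> int:
    """Infer charge z from the m/z gap between ions that differ by n phosphates,
    assuming both ions carry the same charge."""
    return round(n_phosphates * PHOSPHATE_DA / (mz_high - mz_low))


def neutral_mass(mz: float, z: int) -> float:
    """Neutral mass of an [M + zH]z+ ion."""
    return z * (mz - PROTON_DA)


z = charge_from_spacing(915.4, 942.0)
print(z)  # 3
print(round(neutral_mass(942.0, z) - neutral_mass(915.4, z), 1))  # 79.8
```

The recovered neutral mass difference (79.8 Da) matches one phosphate within the precision of the reported m/z values.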

Structural determinants for Ssu72 recognition of pSer2
Because Ser2 and Ser5 of the CTD have similar flanking residues but their phosphorylation states are linked to different biological functions, we asked why Ssu72 favors pSer5 over pSer2 as a substrate and used crystallography to identify structural elements that distinguish the two SP motifs. We incubated the catalytically inactive Ssu72 (C13D/D144N) complex with a 19-mer CTD peptide phosphorylated at Ser5 and Ser2 on two separate heptad repeats (Fig. 3A and Table S1). When both phosphoryl sites are available, the crystal structure reveals strong positive density in the active site that is best fitted by bound pSer5, whereas pSer2 extends outward into the bulk solvent (Figs. 3, A and B, and S2A). We next crystallized Ssu72 with a CTD peptide containing only pSer2. A much smaller, tetrahedral positive density was observed in the active site, consistent with a phosphate or sulfate group (Fig. 3C). In our experience with Ssu72, such density is often observed in the active site even when no phosphoryl substrate is included in crystallization. The absence of peptide electron density in the active site and the weak activity toward the pSer2 peptide both suggest that the pSer2 peptide interacts only weakly with Ssu72. Thus, consistent with the biochemical assays, our crystal structures indicate that Ssu72 recognizes pSer5 as a substrate much better than pSer2.
To understand why Ssu72 strongly prefers Ser5 dephosphorylation even though the flanking residues of Ser2 are quite similar (Y1S2P3 versus T4S5P6), we analyzed all published X-ray crystal structures of Ssu72 bound to various CTD peptides and noticed that all substrates adopt the same sharp β-turn configuration (Fig. 3D). In this configuration, the proline residue next to the phosphoserine is bound in the cis configuration by a hydrophobic pocket of Ssu72, regardless of the PTM state of flanking residues, peptide length, or the presence of the scaffold protein symplekin (Fig. 3D). Although cis-proline accounts for only 10–20% of the overall proline population, a cis-proline flanking the phosphoserine appears to be a prerequisite for Ssu72 recognition, because peptidomimetics locked in the trans-proline configuration cannot bind the active site (32). When pSer7 occupies the active site, the peptide reverses its N- to C-terminal orientation to retain the same substrate-bound configuration, with cis-proline bound in the hydrophobic pocket (Fig. S2B) (24). This 180° flip of peptide directionality places the same cis-proline in the hydrophobic pocket but loses favorable interactions that stabilize pSer5 recognition, resulting in much weaker activity against pSer7.
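The 10–20% cis population quoted above can be translated into an isomerization free energy with ΔG = −RT ln([cis]/[trans]). A short worked sketch, assuming 298.15 K and treating the quoted fractions as equilibrium values, gives roughly 0.8–1.3 kcal/mol for the trans-to-cis step. If only the cis isomer binds (conformational selection, an assumption here rather than a measured mechanism), the observed association constant is the intrinsic one scaled by the cis fraction, so a flanking residue that further disfavors cis-proline lowers apparent binding further.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def trans_to_cis_dg(frac_cis: float, temp_k: float = 298.15) -> float:
    """Free energy (kcal/mol) of trans -> cis prolyl isomerization implied by an
    equilibrium cis fraction: dG = -RT ln K, with K = [cis]/[trans]."""
    k_eq = frac_cis / (1.0 - frac_cis)
    return -R_KCAL * temp_k * math.log(k_eq)


# Under conformational selection (assumed), Ka_observed = Ka_cis * frac_cis,
# so the cis fraction directly caps the apparent affinity.
for f in (0.10, 0.20):
    print(f"{f:.0%} cis: dG(trans->cis) = {trans_to_cis_dg(f):.2f} kcal/mol")
```

Each additional 1.36 kcal/mol of penalty at room temperature corresponds to roughly a further 10-fold drop in the cis population, which is one way to picture how a bulky or phosphorylated Tyr1 could suppress recognition.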
Because Ssu72 appears to bind each of its substrates in an identical configuration (Fig. 3D), recognition of the pSer2 CTD peptide likely also requires Pro3 to adopt the cis conformation and bind the hydrophobic pocket close to the active site. However, such an arrangement would place Tyr1, with its bulky side chain, in the elongated tunnel of the Ssu72 active site (Fig. 3E). When we modeled the flanking residues into the active site next to pSer2, Tyr1 could barely be fitted into the tight β-turn (Fig. 3E). The bulky tyrosine side chain makes unfavorable van der Waals contacts or even steric clashes in all four of its most frequently used rotamer states (33) (Fig. 3E). The most common tyrosine rotamer in folded proteins (named m-85° after its torsion angles; 44% occurrence) would place the phenol ring of Tyr1 very close to Pro53 of Ssu72 (1.2 Å). If Tyr1 instead adopts the t80° rotamer (~34% frequency), the hydroxyl group of its side chain would collide directly with Pro46 of Ssu72. The p90° rotamer (observed in 19% of protein structures) would clash with the ligand itself, and the m-30° rotamer (9% occurrence) also places the Tyr1 side chain in unfavorable van der Waals contact with Pro53 and Lys44 (Fig. 3E). Even allowing for protein dynamics, the Tyr1 side chain would have to adopt an energetically unfavorable rotamer to maintain the substrate-binding mode recognized by Ssu72. Thus, the bulky side chain of Tyr1 puts the pSer2 substrate at a disadvantage in forming the substrate-recognition mode, distinguishing it from the more favorable pSer5 substrate.
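The rotamer argument amounts to a steric check: an atom pair clashes when the interatomic distance is well below the sum of the two van der Waals radii. A minimal sketch of such a check follows, assuming Bondi radii and a 0.4-Å overlap cutoff (a common convention, for example in MolProbity clash scoring, and not necessarily the criterion used in this study). The atom coordinates are hypothetical placeholders, not values from PDB 6NPW.

```python
import math

# Approximate Bondi van der Waals radii, in angstroms.
VDW_RADII = {"C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}


def find_clashes(ligand_atoms, protein_atoms, overlap_cutoff=0.4):
    """Return (ligand atom, protein atom, distance) for pairs whose vdW spheres
    overlap by at least overlap_cutoff angstroms."""
    hits = []
    for lig_name, lig_elem, lig_xyz in ligand_atoms:
        for prot_name, prot_elem, prot_xyz in protein_atoms:
            d = math.dist(lig_xyz, prot_xyz)
            overlap = VDW_RADII[lig_elem] + VDW_RADII[prot_elem] - d
            if overlap >= overlap_cutoff:
                hits.append((lig_name, prot_name, round(d, 2)))
    return hits


# Hypothetical coordinates (angstroms) for one modeled Tyr1 rotamer and two
# nearby Ssu72 atoms; real coordinates would come from the refined model.
tyr1_rotamer = [
    ("Tyr1 CZ", "C", (0.0, 0.0, 0.0)),
    ("Tyr1 OH", "O", (1.4, 0.0, 0.0)),
]
ssu72_atoms = [
    ("Pro53 CG", "C", (0.0, 1.2, 0.0)),
    ("Lys44 NZ", "N", (6.0, 0.0, 0.0)),
]

for clash in find_clashes(tyr1_rotamer, ssu72_atoms):
    print(clash)
```

Running the same check for each of the four rotamers, with hydrogens included and coordinates taken from the modeled complex, would turn the qualitative rotamer comparison into a per-rotamer clash count.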

Tyr1 differentiates the two SP motifs for Ssu72 dephosphorylation
Our structural studies suggest that the residue preceding the phosphorylated CTD SP motif allows Ssu72 to differentiate the two phosphoserine sites (pSer5 versus pSer2). To evaluate whether the identity of this residue is critical for Ssu72 specificity, we engineered CTDs with various mutations adjacent to the SP motif (Fig. S1) and analyzed Ssu72 dephosphorylation after all SP motifs had been phosphorylated by a kinase. We hypothesized that the bulky Tyr1 sterically hinders formation of the cis-proline configuration required for Ssu72 recognition at pSer2 but not at pSer5. We therefore replaced Tyr1 with a small residue to test whether removing the bulky side chain is sufficient to eliminate substrate differentiation. We generated a three-repeat CTD with each Tyr1 mutated to threonine (Fig. 4A), so that both Ser2 and Ser5 are preceded by a small residue that poses no steric hindrance to forming the cis-proline configuration. Treating these TSPTSPS repeats with a saturating amount of Erk2 phosphorylated every SP motif (Fig. 4A, blue trace). When these hyperphosphorylated CTDs were used as Ssu72 substrates, all phosphates were removed rapidly and without discrimination, as detected by MALDI-TOF (Fig. 4A, gold trace). This contrasts sharply with WT CTD, for which Ssu72 shows fast-slow two-phase dephosphorylation, removing pSer5 rapidly and pSer2 much more slowly (Fig. 2, A and F). This result strongly supports our hypothesis that a bulky residue in front of the SP motif protects pSer2 from Ssu72 dephosphorylation.
To further establish that steric hindrance from the residue in front of the SP motif disfavors dephosphorylation by Ssu72, we replaced Tyr1 with another bulky residue to test whether it also permits Ssu72 to distinguish pSer5 from pSer2. In a five-heptad construct with Tyr1 replaced by histidine in each repeat (Fig. 4B), Erk2 phosphorylated all 10 SP motifs (Fig. 4B, blue trace), but Ssu72 removed only approximately half of these phosphates before further dephosphorylation stalled (Fig. 4B, gold trace). This resembles the consensus sequence with tyrosine in the first position, for which Ssu72 dephosphorylation shows fast-slow two-phase dynamics (Fig. 2C). Inspection of the Ssu72 structure reveals that a histidine side chain modeled at the Tyr1 position sits in the active site with unfavorable van der Waals interactions with Ssu72, similar to Tyr1 (Fig. S3). Thus, any bulky residue in front of a CTD SP motif appears to attenuate Ssu72 dephosphorylation activity.
We also tested whether inserting a small residue between the bulky residue and the SP motif relieves the steric hindrance and allows Ssu72 to dephosphorylate SP motifs more efficiently. We generated a CTD construct of three octamer repeats with the sequence YTSPTSPS (Fig. 4C), in which Thr is inserted between Tyr1 and the Ser2-Pro3 SP motif. Insertion of this small residue allows all SP motifs to be dephosphorylated rapidly by Ssu72 (Fig. 4C, gold trace). Thus, the bulkiness of the residue immediately preceding the SP motif determines how efficiently Ssu72 dephosphorylates that motif.
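The mutant results can be summarized as a simple rule: an SP motif preceded by a bulky residue is dephosphorylated slowly by Ssu72, and one preceded by a small residue is dephosphorylated rapidly. A minimal Python sketch of that rule, applied to the repeat sequences tested above, follows. The set of residues counted as "bulky" (Tyr and His were tested; Phe and Trp are added by analogy) is an assumption for illustration, and the sketch ignores GST and linker sequence in the recombinant constructs.

```python
# Residues treated as "bulky" in front of an SP motif: an illustrative
# assumption (Tyr and His were tested; Phe and Trp are added by analogy).
BULKY = set("YHFW")


def sp_motifs(seq: str):
    """Yield (index of Ser, preceding residue or None) for each Ser-Pro motif."""
    for i in range(len(seq) - 1):
        if seq[i] == "S" and seq[i + 1] == "P":
            yield i, (seq[i - 1] if i > 0 else None)


def predicted_rates(seq: str) -> list:
    """Label each SP motif 'slow' if a bulky residue precedes it, else 'fast'."""
    return ["slow" if prev in BULKY else "fast" for _, prev in sp_motifs(seq)]


constructs = {
    "WT yCTD (26 x YSPTSPS)": "YSPTSPS" * 26,
    "Y1T (3 x TSPTSPS)": "TSPTSPS" * 3,
    "Y1H (5 x HSPTSPS)": "HSPTSPS" * 5,
    "Thr insertion (3 x YTSPTSPS)": "YTSPTSPS" * 3,
}
for name, seq in constructs.items():
    rates = predicted_rates(seq)
    print(f"{name}: {len(rates)} SP motifs, {rates.count('slow')} predicted slow")
```

The rule predicts half of the sites as slow for the WT and histidine constructs and none for the threonine-substituted and threonine-inserted constructs, mirroring the MALDI-TOF outcomes described above; Ser7 is never counted because it is followed by Tyr1 rather than Pro.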

Tyr1 phosphorylation prevents the dephosphorylation of Ser2 by Ssu72
Our results with full-length CTD suggest that Ssu72 activity against pSer2 is low because Tyr1 hinders effective formation of the substrate-recognition configuration, so that pSer5 is preferentially dephosphorylated. Because accurate discrimination between pSer5 and pSer2 is important during transcription, we asked whether PTMs on flanking residues affect Ssu72 specificity. We previously showed that Thr4 phosphorylation reduces Ssu72 dephosphorylation of pSer5 3–4-fold (29). Given the weak activity of Ssu72 toward pSer2, we wanted to determine how Tyr1 phosphorylation affects this activity. Both Tyr1 and Ser2 phosphorylation are observed by ChIP-Seq during transcription elongation (1). Recent MS analyses of CTD phosphorylation in engineered RNA polymerase II identified a Tyr1-Ser2 double phosphorylation motif (50 and 20% of all diphosphorylated heptads in yeast and human, respectively), although doubly phosphorylated heptads are substantially less abundant than singly phosphorylated ones (34). If Ssu72 binds substrate in the cis-proline conformation, as in all Ssu72 structures, phosphorylation of Tyr1 should further hinder formation of the tight β-turn required for recognition of pSer2 by Ssu72 (Fig. 5A). To test whether pTyr1 indeed prevents dephosphorylation of pSer2 in vitro, we incubated Ssu72 with a synthetic peptide in which Tyr1 and Ser2 of the same heptad are both phosphorylated. Little dephosphorylation was detected in the sensitive MALDI-TOF assay (Fig. 5B, gold trace). Thus, phosphorylation of the preceding Tyr1 prevents pSer2 from being dephosphorylated by Ssu72.

Figure 3. … (Table S1) represented as sticks, with carbon atoms colored white and the intramolecular hydrogen bond that holds the tight β-turn conformation of the peptide highlighted as yellow dashes. C, 2Fo − Fc map (contoured at 1.8σ) shown as a blue mesh for the pSer2 peptide soaked into the Ssu72-symplekin C13D/D144N complex. D, superimposition of the Ssu72-peptide complex structures with PDB codes 3P9Y (green), 3O2Q (blue), and 4IMI (light yellow) onto the structure of Ssu72 bound to the doubly phosphorylated Ser2-Ser5 peptide (obtained in this study; shown in yellow). E, the four most frequently observed tyrosine rotamers modeled in the active site of Ssu72, assuming pSer2 is bound in the active site.
To establish that phosphorylation of Tyr1 protects pSer2 from dephosphorylation by Ssu72, we tested whether this protection could be reversed by removing the Tyr1 phosphorylation mark. PTP1B is a potent tyrosine phosphatase (35). Although PTP1B is not a physiological phosphatase for RNA polymerase II in vivo, its activity against pTyr1 allows us to use it as a tool to interrogate the effect of pTyr1 on the dephosphorylation of pSer2. We first validated the in vitro specificity of PTP1B on each phosphorylatable residue of the CTD heptad repeat and found that pTyr1 is the only residue it dephosphorylates (Fig. 5C); the other residues (Ser2, Ser5, Ser7, and Thr4) are not dephosphorylated by PTP1B (Fig. S4). When we treated the Tyr1-Ser2 doubly phosphorylated CTD peptide with PTP1B, one phosphate was removed, with a loss of 80 Da consistent with the phosphate on pTyr1 (Fig. 5D, blue trace). We then added Ssu72 to the PTP1B-treated reaction, which yielded a product at 1500 Da corresponding to the peptide lacking both phosphate groups (Fig. 5D, gold trace). In contrast, Ssu72 alone removed no phosphate from the doubly phosphorylated peptide (Fig. 5B). Thus, once the Tyr1 phosphorylation is removed, pSer2 becomes a substrate for Ssu72. This suggests that the timing of Tyr1 phosphorylation/dephosphorylation may be important for ensuring the proper Ser2 phosphorylation state near the end of transcription.

…Tyr replaced by His in each heptad repeat. C, an engineered octamer repeat of three with a Thr residue inserted after Tyr. For sample C, the variants were first treated with Erk2 (blue) followed by dephosphorylation with Ssu72 (gold), with the nonphosphorylated control shown in black. Mass labels indicate m/z at peak intensity. "M" indicates the substrate peptide, and "+#P" indicates the approximate number of phosphates added based on the mass shift.

Figure 5. Phosphorylation of Tyr1 protects Ser2 from dephosphorylation by Ssu72.
A, model of phosphorylated Tyr1 adjacent to CTD pSer2 bound in the Ssu72 active site. The Tyr1 side chain is shown in its four most frequently observed rotamers. B, MALDI-TOF mass spectrometry analysis of the synthetic Tyr1-Ser2 phosphopeptide before (black) and after (gold) Ssu72 treatment. Mass labels indicate m/z at peak intensity. C, MALDI-TOF analysis of the synthetic phosphopeptide: control (black) and after PTP1B treatment (blue). D, MALDI-TOF analysis of the synthetic phosphopeptide after treatment with PTP1B followed by Ssu72. Three independent biological replicates were analyzed for each assay with consistent results; one representative is shown in each panel. The trace after PTP1B treatment is in blue, and the trace after Ssu72 treatment is in gold. Mass labels indicate m/z at peak intensity. "M" indicates the substrate peptide, and "−#P" indicates the approximate number of phosphates removed based on the mass shift.

Tyr1 determines the specificity of CTD phosphatase Ssu72
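The "+#P" and "−#P" annotations in the MALDI-TOF panels reduce to dividing an observed mass shift by the mass added per phosphorylation (HPO3, about 79.97 Da) and rounding. The Python sketch below illustrates that bookkeeping; the function name, the loose 2-Da tolerance (chosen because linear-mode MALDI reports average masses), and the example masses are hypothetical and not taken from the figures.

```python
# Hypothetical helper illustrating how a MALDI mass shift maps to a phosphate count.
PHOSPHATE_DA = 79.96633  # monoisotopic mass of HPO3, added per phosphorylation


def phosphate_count(mass_before: float, mass_after: float, tol_da: float = 2.0) -> int:
    """Return the signed number of phosphates gained (+) or lost (-).

    Raises ValueError if the shift is not within tol_da of a whole number of
    phosphates (e.g. a salt adduct rather than a (de)phosphorylation event).
    """
    shift = mass_after - mass_before
    n = round(shift / PHOSPHATE_DA)
    if abs(shift - n * PHOSPHATE_DA) > tol_da:
        raise ValueError(f"shift of {shift:.2f} Da is not a whole number of phosphates")
    return n


if __name__ == "__main__":
    # Made-up peak positions for a doubly phosphorylated peptide.
    start = 1000.0
    print(phosphate_count(start, start - 80.0))   # one phosphate removed
    print(phosphate_count(start, start - 160.0))  # both phosphates removed
```

A shift that falls between whole multiples (for example, +40 Da) raises an error instead of being forced onto the nearest phosphate count.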

Discussion
As the primary mechanism of PTM, phosphorylation/dephosphorylation requires high precision for the accurate transmission of information. Phosphatases, which usually have broad enzymatic activity against phosphorylated substrates, rely heavily on their subcellular localization for substrate specificity (35). In hyperphosphorylated proteins where different phosphorylation sites encode different functions, however, it is not yet known how these phosphatases distinguish among phosphorylation sites once they are recruited to their targets. The CTD of RNA polymerase II is highly phosphorylated, with five phosphorylatable residues in each heptad repeat, which raises the question of how CTD phosphatases remove the correct phosphate marks at the right stage of transcription. During elongation, for example, pSer5 of the CTD is dephosphorylated rapidly, whereas Ser2 phosphorylation continues to increase steadily (4). Herein, we show that the Tyr1 residue of the substrate plays a critical role in distinguishing the almost identical SP motifs. The major phosphatase involved in elongation, Ssu72, has a unique active site that requires the proline residue in the substrate motif to be in the cis configuration. This requirement disfavors pSer2, because the bulky side chain of the neighboring Tyr1 hinders formation of the energetically favorable CTD configuration recognized by Ssu72. Using CTD mutants, we showed that the bulky side chain of Tyr1 is responsible for the differentiation between the two SP motifs in the CTD. Intriguingly, Ssu72 activity toward pSer2 is abolished when the preceding Tyr1 is phosphorylated, a combination that has been detected in cells (34).
Tyr1 is highly conserved across all species, even in the highly divergent CTD sequence of Drosophila melanogaster, where only three of 46 heptads match the consensus (36). Furthermore, even conservative mutations (i.e. to phenylalanine) of all Tyr1 positions in Saccharomyces cerevisiae, Schizosaccharomyces pombe, and vertebrates result in lethal or severely growth-deficient phenotypes (12). Tyr1 phosphorylation is critical for the stability of RNA polymerase II, termination factor recruitment, and antisense transcription (12)(13)(14)(15). Our research suggests a plausible mechanism for the diverse biological roles of Tyr1 conservation and phosphorylation: regulating the activity of CTD modifiers such as Ssu72 and thereby shaping the phosphorylation state of other CTD residues. In human cells, Tyr1 phosphorylation occurs at the beginning of transcription, placing it at a critical point to influence subsequent CTD modifications. Tyr1 and Ser2 phosphorylation overlap during transcriptional elongation in vivo, as shown by ChIP-Seq analysis (37), and tandemly phosphorylated Tyr1-Ser2 has been detected by MS in both human and yeast RNA polymerase II (34,38). Our results show not only that Tyr1 greatly reduces the activity of Ssu72 against pSer2 in vitro, but also that, as a further safeguard, Tyr1-Ser2 double phosphorylation can prevent inappropriate dephosphorylation of pSer2 by Ssu72. This strategy could provide timely control of the phosphorylation state of specific CTD residues for the accurate regulation of transcription.

Conclusion
In this investigation, we identified Tyr1 of the CTD consensus heptad repeat as critical for differentiating the two SP motifs for dephosphorylation by the CTD phosphatase Ssu72. This allows Ser5 phosphorylation marks to be removed while Ser2 phosphorylation accumulates. Phosphorylation of Tyr1 further prevents inappropriate dephosphorylation of pSer2 by Ssu72. This study reveals a new biological function of Tyr1 in determining the accurate removal of phosphorylation from other CTD residues.

Experimental procedures
Phosphorylated CTD peptides used in this study were custom-ordered from Anaspec and CPC BioSciences; their sequences are listed in Table S1. The gene blocks and oligonucleotides for 3CTD, 5CTD, and their variants (described in Fig. S1) were ordered from Integrated DNA Technologies, Inc. and GenScript. All other reagents and chemicals were purchased from Sigma-Aldrich unless specified otherwise.

Protein expression and purification
Purification of Ssu72: D. melanogaster Ssu72 and symplekin constructs and expression plasmids used in this study are the same as those described previously (29). Ssu72-symplekin was also purified as detailed before (23, 29). Briefly, the Ssu72 and symplekin plasmids were separately transformed into Escherichia coli BL21 (DE3) cells, which were grown in Luria-Bertani (LB) medium with 50 μg/ml kanamycin at 37°C. When the OD600 of the culture reached 0.4-0.6, expression was induced by adding isopropyl β-D-thiogalactopyranoside to a final concentration of 0.5 mM, after which the temperature was lowered to 16°C and the cultures were grown for another 18 h. The cells were pelleted by centrifugation, resuspended in lysis buffer, and lysed by sonication. High-speed centrifugation was used to separate the cellular debris from the soluble fraction, which was passed through a nickel-nitrilotriacetic acid column (Qiagen); the His-tagged protein bound to the nickel beads and was eluted with a buffer containing a high concentration of imidazole. Thrombin protease was used to cleave off the N-terminal His tag during overnight dialysis at 4°C, after which the proteins were further purified by ion-exchange and size-exclusion chromatography. The separately purified Ssu72 and symplekin were then mixed and incubated at 4°C overnight to allow complex formation. The complex was separated from any uncomplexed excess protein by gel-filtration chromatography. Homogeneity of the complex was confirmed by running the gel-filtration fractions on an SDS-polyacrylamide gel. The inactive Ssu72 C13D/D144N-symplekin complex, which was used in all crystallization experiments, was purified using the same method.
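The induction step adds IPTG to a final concentration of 0.5 mM. A minimal sketch of the corresponding volume calculation follows: C1V1 = C2V2, corrected for the fact that the stock is added to an existing culture. The 1 M stock and 1-L culture are hypothetical values that the protocol does not state.

```python
def volume_to_add(stock_conc: float, final_conc: float, existing_volume: float) -> float:
    """Volume of stock to add to existing_volume so the mixture reaches final_conc.

    Concentrations must share units (e.g. mM); the returned volume has the same
    units as existing_volume. Solving C1*V1 = C2*(V0 + V1) for V1 gives
    V1 = C2*V0 / (C1 - C2).
    """
    if stock_conc <= final_conc:
        raise ValueError("stock must be more concentrated than the target")
    return final_conc * existing_volume / (stock_conc - final_conc)


if __name__ == "__main__":
    # Hypothetical: 1 M (1000 mM) IPTG stock, 1000 ml culture, 0.5 mM final.
    v = volume_to_add(1000.0, 0.5, 1000.0)
    print(f"add {v:.3f} ml of IPTG stock")
```

With a concentrated stock the correction is negligible (about 0.5 ml either way), but it becomes significant when the stock is only a few-fold above the target concentration.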
Purification of the kinase Erk2: Human Erk2, used to phosphorylate GST-CTD and its variants, was expressed from the pET-His6-ERK2-MEK1_R4F_coexpression vector, a gift from Melanie Cobb (Addgene plasmid number 39212). E. coli BL21 (DE3) cells transformed with this vector were grown in LB medium at 37°C in the presence of 50 μg/ml ampicillin to maintain plasmid selection. When the OD600 of the culture reached 0.4-0.6, isopropyl β-D-thiogalactopyranoside was added to a final concentration of 0.5 mM to induce expression. Erk2 was coexpressed with MEK1 by growing the culture after induction at 37°C for another 4 h before pelleting the cells by centrifugation at 5000 × g for 20 min. The cells were lysed by sonication in a buffer containing 50 mM Tris-HCl, pH 8.0, 500 mM NaCl, 15 mM imidazole, 10% glycerol, 0.1% Triton X-100, and 10 mM β-mercaptoethanol (BME). The lysate was cleared by centrifugation at 15,000 rpm for 45 min at 4°C. The supernatant was first purified by affinity chromatography on nickel-nitrilotriacetic acid beads (Qiagen), and the His6-tagged target protein was eluted with buffer containing 50 mM Tris-HCl, pH 8.0, 500 mM NaCl, 200 mM imidazole, and 10 mM BME. The eluted protein was dialyzed against buffer containing 20 mM Tris-HCl, pH 7.5, 50 mM NaCl, 10 mM BME, and 20% glycerol, then diluted to a final glycerol concentration of 10% and purified by anion-exchange chromatography on Mono Q resin (GE Healthcare). A 50-1000 mM NaCl gradient was used to elute the protein into fractions, which were analyzed by SDS-PAGE. Fractions containing phosphorylated protein were pooled and, if necessary, concentrated in Vivaspin columns (Sartorius).
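The protocol reports one spin in × g and another in rpm; the two are related through the rotor radius, which is not stated. The sketch below uses the standard relation RCF = ω²r/g (ω in rad/s); the 10-cm radius in the example is a hypothetical value for illustration only.

```python
import math

G_STANDARD = 9.80665  # standard gravity, m/s^2


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) at radius_cm for a rotor spinning at rpm."""
    omega = 2.0 * math.pi * rpm / 60.0  # angular velocity, rad/s
    return omega ** 2 * (radius_cm / 100.0) / G_STANDARD


def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    """Inverse of rcf_from_rpm: speed (rpm) needed to reach rcf at radius_cm."""
    omega = math.sqrt(rcf * G_STANDARD / (radius_cm / 100.0))
    return omega * 60.0 / (2.0 * math.pi)


if __name__ == "__main__":
    radius = 10.0  # hypothetical rotor radius in cm
    print(f"15,000 rpm at {radius} cm is about {rcf_from_rpm(15000, radius):,.0f} x g")
```

This is equivalent to the familiar shortcut RCF ≈ 1.118 × 10⁻⁵ · r(cm) · rpm²; reporting spins in × g avoids depending on the rotor used.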
Purification of engineered GST-CTD variants: The CTD (three and five repeats) and its variants (sequences are described in Fig. S1) were amplified from synthetic DNA templates generated by Integrated DNA Technologies, Inc. These were then cloned into a pET28a (Novagen)-derived vector encoding an N-terminal His tag followed by a GST tag and a 3C protease site to generate the GST-CTD constructs.
GST-CTD proteins were expressed and purified following established protocols published previously (30). Briefly, the cell pellet was lysed by sonication in a pH 8.0 buffer containing 500 mM NaCl, 0.1% Triton X-100, and 10% glycerol. Cell debris was removed by high-speed centrifugation at 16,000 × g for 45 min at 4°C, and the supernatant was passed through a nickel-affinity column to capture the His-tagged GST-CTD variants, which were eluted with a high concentration of imidazole. The eluate was further purified by gel filtration in pH 8.0 buffer containing 50 mM NaCl and 10 mM BME. Protein-containing fractions were run on an SDS-polyacrylamide gel to verify size, then concentrated, aliquoted, and flash-frozen in liquid nitrogen.
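The exact construct sequences are given in Fig. S1 and are not reproduced here. As a hedged illustration of how the heptad variants described in the figure legends (Tyr replaced by His; an octamer with Thr inserted after Tyr) relate to the consensus YSPTSPS repeat, the sketch below builds repeat sequences and lists the S/T/Y positions and the Ser residues that sit in SP motifs. The helper names and example constructs are hypothetical.

```python
CONSENSUS = "YSPTSPS"  # Tyr1-Ser2-Pro3-Thr4-Ser5-Pro6-Ser7


def repeat(unit: str, n: int) -> str:
    """Concatenate n copies of a repeat unit (e.g. 3CTD = three heptads)."""
    return unit * n


def substitute(unit: str, position: int, residue: str) -> str:
    """Replace the residue at a 1-based position in the repeat unit."""
    i = position - 1
    return unit[:i] + residue + unit[i + 1:]


def insert_after(unit: str, position: int, residue: str) -> str:
    """Insert a residue after the 1-based position, lengthening the unit by one."""
    return unit[:position] + residue + unit[position:]


def phosphorylatable(seq: str) -> list[int]:
    """1-based positions of Ser, Thr, and Tyr."""
    return [i + 1 for i, aa in enumerate(seq) if aa in "STY"]


def sp_serines(seq: str) -> list[int]:
    """1-based positions of Ser residues immediately followed by Pro (SP motifs)."""
    return [i + 1 for i in range(len(seq) - 1) if seq[i] == "S" and seq[i + 1] == "P"]


if __name__ == "__main__":
    his_variant = substitute(CONSENSUS, 1, "H")  # Tyr1 -> His
    octamer = insert_after(CONSENSUS, 1, "T")    # Thr inserted after Tyr
    for name, unit in [("consensus", CONSENSUS), ("Y1H", his_variant), ("octamer", octamer)]:
        print(name, repeat(unit, 3), phosphorylatable(unit), sp_serines(unit))
```

In the consensus heptad only Ser2 and Ser5 sit in SP motifs, so sequence alone does not separate them; what differs is the neighboring residue, which is the position these variants alter.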
Purification of PTP1B phosphatase: The PTP1B expression plasmid was kindly provided by Dr. Zhong Yin Zhang (Purdue University). E. coli BL21 (DE3) cells overexpressing His-tagged PTP1B were lysed, and the tagged protein was purified using the standard protocols described in the earlier sections. The nickel-column eluate was further purified by gel filtration in a buffer containing 100 mM MES, pH 6.5, 1 mM EDTA, 50 mM NaCl, and 10 mM BME. Protein-containing fractions were concentrated before flash freezing in liquid nitrogen.
Generating phosphorylated GST-CTD substrate: 20 μg of GST-3CTD, GST-5CTD, or another GST-CTD variant (described in Fig. S1) was treated with 3 μg of Erk2, prepared as above, in a buffer containing 50 mM Tris-Cl, pH 7.5, 50 mM MgCl2, and 2 mM ATP for 16 h at 30°C.

Phosphatase assay
Ssu72: 100 μM synthetic peptide (described in Table S1) was treated with 10-20 μM Ssu72-symplekin complex in 100 mM MES, pH 6.0, at 28°C for 90 min before desalting for MALDI analysis. Similarly, 1 μg of kinase-treated GST-CTD sample was treated with 10-20 μM Ssu72-symplekin in 100 mM MES, pH 6.0. The samples were then run on a 15% native gel for EMSA analysis or treated with 3C protease before desalting for MALDI.
PTP1B: 100 μM peptide (described in Table S1) was treated with 100 μM PTP1B in 100 mM HEPES, pH 7.5, at 28°C for 60 min before desalting for MALDI analysis.
Immunoblotting
60 μg of Erk2-treated GST-yCTD was incubated with 17 μg of Ssu72-symplekin complex for various lengths of time in a buffer containing 100 mM MES, pH 6.5, and 1 mM EDTA. Samples were removed at each time point, and the reaction was quenched by adding loading dye and heating at 95°C for 2 min. 0.55 μg of this substrate was then spotted three times (three technical replicates) on each of two 0.45-μm nitrocellulose membranes (Bio-Rad), one for each antibody, and allowed to air dry for 20 min. The membranes were blocked in 5% BSA in 1× TBST (Tris-buffered saline with 0.2% Tween) for 1 h at room temperature. To probe CTD Ser2 phosphorylation, one membrane was incubated with the 3E10 antibody (Millipore catalog number 04-1571-1) diluted 1:600 in blocking solution. To detect Ser5 phosphorylation, the second membrane was incubated with the 3E8 antibody (Millipore catalog number 04-1572) diluted 1:600 in blocking solution. The membranes were incubated with the respective antibodies overnight at 6°C on a shaker and then washed five times for 5 min each with TBST. The membranes were then incubated with a 1:30,000 dilution of secondary antibody in blocking buffer for 2 h at room temperature. After another set of washes, the membranes were exposed to SuperSignal West Pico Chemiluminescent Substrate (Pierce catalog number 34079) according to the manufacturer's instructions and imaged on a G:BOX gel doc system (Syngene). The images were analyzed, and spot intensities were quantified in ImageJ (39). Statistical analysis and p value calculations were performed with the t test function in Microsoft Excel. Column scatter plots with mean and S.D. were generated in RStudio (40). Three independent biological replicates were prepared and imaged as described, all revealing the same trend.
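The quantification reduces to mean ± S.D. of spot intensities and a two-sample t test. Excel's T.TEST can run either an equal-variance or an unequal-variance (Welch) test, and the protocol does not say which was used; the standard-library sketch below computes Welch's t statistic and degrees of freedom as one possible choice. Converting t into a p value requires the t distribution (for example, scipy.stats.ttest_ind with equal_var=False performs the whole test). The intensity values are made up.

```python
import math
import statistics


def mean_sd(values: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation (n - 1 denominator)."""
    return statistics.mean(values), statistics.stdev(values)


def normalize(values: list[float], reference: list[float]) -> list[float]:
    """Express intensities relative to the mean of a reference group (e.g. time zero)."""
    ref = statistics.mean(reference)
    return [v / ref for v in values]


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for two samples."""
    va = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)  # squared standard error of mean(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


if __name__ == "__main__":
    # Made-up spot intensities (three technical replicates) at two time points.
    t0 = [1520.0, 1480.0, 1555.0]
    t60 = [610.0, 655.0, 590.0]
    print(mean_sd(normalize(t60, t0)))
    print(welch_t(t0, t60))
```

With only three replicates per group, the choice between the equal-variance and Welch tests can noticeably change the p value, which is why stating the test type is useful.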

MALDI-TOF MS
Trifluoroacetic acid (TFA) was added to protein samples (0.1 mg/ml, after 3-4 h of 3C protease treatment) or to peptide samples (100 μM). The final TFA concentration was approximately 0.1-0.4%, and pHydrion pH test paper (Sigma) was used to verify that the pH was below 4. Samples were then desalted and concentrated using ZipTips with 0.6 μl of C18 resin (Millipore), washed with water/methanol/TFA (95:5:0.1, v/v/v), and eluted from the resin in acetonitrile/water/TFA (50:50:0.1, v/v/v). 0.5 μl of 2,5-dihydroxybenzoic acid matrix (saturated in acetonitrile/water, 50:50, v/v) was spotted on a MALDI metal sample plate, and 0.5 μl of the desalted sample was added and mixed thoroughly by pipetting several times. The drops were allowed to air dry at room temperature to form crystals on the sample plate. MALDI-TOF spectra were acquired on a Voyager-DE PRO instrument (Applied Biosystems) in positive mode; the preloaded ACTH_linear parameter settings were used for all peptides, GST-3CTD, GST-5CTD, and their variants, and the myoglobin_linear settings were used for GST-yCTD. Laser intensity was adjusted manually to obtain the highest signal-to-noise ratio. Spectra were postprocessed in Data Explorer (Applied Biosystems), with the noise removal utility applied where required to improve the visualization of certain spectra. All MALDI assays were carried out in three independent biological replicates. The final graphs were plotted in R using the ggplot package (40).
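To anticipate where a CTD peptide and its phosphoforms should appear in a MALDI spectrum, one can compute the singly protonated mass [M+H]+ from the sequence. The sketch below uses standard monoisotopic residue masses. Because the spectra here were acquired in linear mode, which reports average masses, observed peaks will sit somewhat higher than these values. The example sequence is the consensus heptad, not one of the Table S1 peptides.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once for the free peptide termini
PROTON = 1.00728   # charge carrier in [M+H]+
HPO3 = 79.96633    # mass added per phosphorylation


def mh_plus(seq: str, n_phospho: int = 0) -> float:
    """Monoisotopic m/z of the singly protonated peptide carrying n_phospho phosphates."""
    return sum(RESIDUE_MONO[aa] for aa in seq) + WATER + n_phospho * HPO3 + PROTON


if __name__ == "__main__":
    for n in range(3):
        print(n, round(mh_plus("YSPTSPS", n), 3))
```

Each added phosphate moves the singly charged peak by the same ~79.97 units, which is why the "+#P" and "−#P" labels can be read directly from peak spacing.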

Liquid chromatography-MS for phosphate localization
Following Erk2 and Ssu72 treatment, GST-CTD proteins (~5 μg) were digested with 3C protease for 5-6 h. The resulting 3CTD peptides were desalted with PepClean™ C18 spin columns according to the manufacturer's instructions and eluted with water/acetonitrile (30:70, v/v). The solvent was evaporated, and peptides were reconstituted in water/acetonitrile/formic acid (98:2:0.1, v/v/v) for LC separation. Separations were carried out on a Dionex Ultimate 3000 Nano liquid chromatograph configured for direct injection. PicoFrit™ analytical columns with a 75-μm inner diameter (New Objective, Woburn, MA) were packed in-house to 20 cm with 1.8-μm UChrom C18 (NanoLCMS Solutions, Oroville, CA). Mobile phases A and B were water and acetonitrile, respectively, each containing 0.1% formic acid. Separations used a two-step gradient: mobile phase B was increased from 2 to 20% over 15 min and then from 20 to 35% over another 13 min. The flow rate was maintained at 0.300 μl/min throughout the separation.
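The two-step gradient can be written as a piecewise-linear program of %B versus time. A minimal sketch follows, assuming that t = 0 is the start of the gradient and that %B is held at its last value afterward; loading, wash, and re-equilibration steps are not described in the protocol, so the hold is a hypothetical simplification.

```python
# (time in min, %B) breakpoints of the two-step gradient described above.
# Assumption: t = 0 is the start of the gradient, and %B is held at the last
# value afterward (loading, wash, and re-equilibration are not described).
GRADIENT = [(0.0, 2.0), (15.0, 20.0), (28.0, 35.0)]


def percent_b(t_min: float, program=GRADIENT) -> float:
    """Linearly interpolated mobile-phase B percentage at time t_min."""
    if t_min <= program[0][0]:
        return program[0][1]
    for (t0, b0), (t1, b1) in zip(program, program[1:]):
        if t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    return program[-1][1]


def slopes(program=GRADIENT) -> list[float]:
    """Gradient steepness of each segment in %B per minute."""
    return [(b1 - b0) / (t1 - t0) for (t0, b0), (t1, b1) in zip(program, program[1:])]


if __name__ == "__main__":
    print([percent_b(t) for t in (0, 7.5, 15, 21.5, 28)])
    print(slopes())  # the second segment is slightly shallower than the first
```

The first segment rises at 1.2 %B/min and the second at about 1.15 %B/min, so the two steps have similar steepness over different %B windows.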
An Orbitrap™ Fusion Lumos Tribrid mass spectrometer (Thermo Fisher Scientific, San Jose, CA) equipped with a Coherent ExciStar XS excimer laser operating at 193 nm and 500 Hz was coupled to the liquid chromatograph. Two 2-mJ pulses were used for UVPD. A targeted analysis was used to select the 3+ charge state of the mono- and diphosphorylated 3CTD peptides, at m/z 915.4 and 942.0, respectively. Orbitrap resolving power was set to 60,000 for MS1 and 30,000 for tandem MS spectra, both at m/z 200.
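The two targeted precursors differ by one phosphate. At charge z, the neutral mass is M = z·(m/z) − z·m(H+), so one phosphate (79.97 Da) shifts a 3+ ion by about 26.66 m/z units. The sketch below checks the two listed targets against that expectation; since the m/z values are quoted to one decimal place, agreement is expected only to within a few tenths of a dalton.

```python
PROTON = 1.00728  # Da
HPO3 = 79.96633   # Da added per phosphate


def neutral_mass(mz: float, z: int) -> float:
    """Neutral monoisotopic mass from the m/z of an [M+zH]z+ ion."""
    return z * mz - z * PROTON


def mz_of(mass: float, z: int) -> float:
    """m/z of the [M+zH]z+ ion for a neutral mass."""
    return (mass + z * PROTON) / z


if __name__ == "__main__":
    mono, di = neutral_mass(915.4, 3), neutral_mass(942.0, 3)
    print(f"mass difference: {di - mono:.2f} Da (one phosphate = {HPO3:.2f} Da)")
    print(f"predicted diphospho 3+ m/z: {mz_of(mono + HPO3, 3):.2f}")
```

The computed difference (79.80 Da) and predicted m/z (942.06) agree with the listed targets within the rounding of the quoted values.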
MS/MS spectra were deconvoluted to neutral masses using the Xtract algorithm and matched to the nine UVPD ion types (a, a•, b, c, x, x•, y, y−1, and z) within 10-ppm accuracy using ProSight Lite. Phosphates were localized by adding a phosphate group (+79.97 Da) to candidate serine, threonine, and tyrosine residues and selecting the placement that optimized the characterization score in ProSight Lite.
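ProSight Lite's characterization score is not reproduced here. As a simplified illustration of the localization logic, the sketch below enumerates every placement of a given number of phosphates on S/T/Y residues, predicts neutral b- and y-ion masses for each placement (only two of the nine UVPD ion types), and counts observed fragment masses matched within 10 ppm. The function names, the match-count scoring rule, and the synthetic "observed" data are assumptions for illustration.

```python
from itertools import combinations

# Monoisotopic residue masses (Da) for residues found in CTD repeats and variants.
RESIDUE = {"S": 87.03203, "P": 97.05276, "T": 101.04768, "Y": 163.06333, "H": 137.05891}
WATER = 18.01056
HPO3 = 79.96633  # +79.97 Da per phosphate


def placements(seq: str, n_phospho: int) -> list[tuple[int, ...]]:
    """All ways to put n_phospho phosphates on S/T/Y residues (0-based indices)."""
    sites = [i for i, aa in enumerate(seq) if aa in "STY"]
    return list(combinations(sites, n_phospho))


def fragments(seq: str, sites: tuple[int, ...]) -> list[float]:
    """Neutral b-ion (sum of residues) and y-ion (sum of residues + water) masses."""
    masses = [RESIDUE[aa] + (HPO3 if i in sites else 0.0) for i, aa in enumerate(seq)]
    b = [sum(masses[:k]) for k in range(1, len(seq))]
    y = [sum(masses[k:]) + WATER for k in range(1, len(seq))]
    return b + y


def matches(theoretical: list[float], observed: list[float], ppm: float = 10.0) -> int:
    """Number of observed masses within ppm of any theoretical fragment."""
    return sum(any(abs(o - t) <= t * ppm * 1e-6 for t in theoretical) for o in observed)


def localize(seq: str, n_phospho: int, observed: list[float], ppm: float = 10.0):
    """Placements ranked by number of matched fragments, best first."""
    scored = [(matches(fragments(seq, s), observed, ppm), s) for s in placements(seq, n_phospho)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    peptide = "YSPTSPS"
    # Synthetic "observed" fragments generated from a pSer2 peptide (index 1).
    observed = fragments(peptide, (1,))
    for score, sites in localize(peptide, 1, observed)[:3]:
        print(score, [f"{peptide[i]}{i + 1}" for i in sites])
```

Adjacent sites such as Tyr1 and Ser2 are distinguished only by the few fragments that cleave between them, which is why dense fragmentation such as UVPD helps localize phosphates in Tyr1-Ser2 pairs.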

Crystallization, crystal soaking of Ssu72 with CTD peptides, and structure determination
A purified activity-deficient variant of Ssu72 (C13D/D144N) was incubated with symplekin at a 1.2:1 ratio overnight in a cold room and then purified by gel-filtration chromatography. The complex was crystallized at room temperature in a solution containing 12% (w/v) PEG 3350 and 100 mM HEPES, pH 7.5. To obtain phosphatase-CTD peptide complexes, Ssu72 C13D/D144N-symplekin crystals were soaked with 0.1 mM pSer2/pSer5 doubly phosphorylated CTD peptide and with 10 mM pSer2 peptide. The crystals were cryoprotected with mother liquor containing 15% (v/v) glycerol before vitrification in liquid nitrogen.
Crystallographic data for Ssu72-symplekin crystals soaked with CTD peptides were collected at beamline BL 5.0.3 of the Advanced Light Source (ALS) at Lawrence Berkeley National Laboratory. All diffraction data were processed with HKL2000 (42). The structures were determined by molecular replacement with Phaser-MR in the Phenix suite (41), using the structure of the Drosophila Ssu72-symplekin complex (PDB code 4IMI) as the search model. The ligand was built iteratively in Coot (42), and refinement was performed with Phenix Refine (43). The quality of the final refined structures was evaluated with MolProbity (44). Final data collection and refinement statistics are summarized in Table S2.