Structure and Carboxyl-terminal Domain (CTD) Binding of the Set2 SRI Domain That Couples Histone H3 Lys36 Methylation to Transcription*

During mRNA elongation, the SRI domain of the histone H3 methyltransferase Set2 binds to the phosphorylated carboxyl-terminal domain (CTD) of RNA polymerase II. The solution structure of the yeast Set2 SRI domain reveals a novel CTD-binding fold consisting of a left-handed three-helix bundle. NMR titration shows that the SRI domain binds a Ser2/Ser5-phosphorylated CTD peptide comprising two heptapeptide repeats and three flanking NH2-terminal residues, whereas a single CTD repeat is insufficient for binding. Residues that show strong chemical shift perturbations upon CTD binding cluster in two regions. Both CTD tyrosine side chains contact the SRI domain. One of the tyrosines binds in the region with the strongest chemical shift perturbations, formed by the two NH2-terminal helices. Unexpectedly, the SRI domain fold resembles the structure of an RNA polymerase-interacting domain in bacterial σ factors (domain σ2 in σ70).

Gene transcription by RNA polymerase II (Pol II) is physically and functionally coupled to other nuclear events, most notably mRNA processing (1-7). Transcription-coupled events generally depend on the carboxyl-terminal repeat domain (CTD) of the largest Pol II subunit, which binds many nuclear factors during transcription elongation. The CTD forms a mobile extension from the structural core of Pol II (8) and consists of heptapeptide repeats of the consensus sequence Tyr1-Ser2-Pro3-Thr4-Ser5-Pro6-Ser7, which can be phosphorylated at residues Ser2 and Ser5. The CTD phosphorylation pattern changes during the transcription cycle. Ser5 phosphorylation occurs in promoter-proximal regions and leads to recruitment of the 5′-RNA capping enzyme (9-12). Ser2 phosphorylation occurs in regions that are more distal from the promoter and triggers binding of the 3′-RNA processing machinery (10,13).
Recently it emerged that transcription is also coupled to the alteration of chromatin structure. The histone methyltransferases Set1 and Set2, which catalyze methylation of histone H3 lysines Lys4 and Lys36, respectively, are associated with Pol II during elongation (reviewed in Refs. 14 and 15). Histone methylation apparently controls newly initiated Pol II, and two phases of histone H3 methylation can be distinguished after transcription initiation (16). Set1 association with Pol II is mediated by the Paf complex, which occurs in promoter regions, and depends on Ser5 phosphorylation of the CTD (17,18). In contrast, Set2 directly interacts with the phosphorylated CTD of Pol II and is observed throughout the coding region of genes (17-20). Set2 recruitment to Pol II requires the CTD kinase CTDK-I, which phosphorylates Ser2 residues in the CTD (17,18,20,21).
Set2 interacts with the Pol II CTD via a novel domain, the Set2 Rpb1-interacting (SRI) domain (22,23). The SRI domain of S. cerevisiae comprises the COOH-terminal residues 619-718 of Set2 (22). In vitro, the yeast Set2 SRI domain binds specifically and with high affinity to the CTD doubly phosphorylated at Ser2 and Ser5 (22). In vivo, deletion of the Set2 SRI domain abolishes H3 Lys36 methylation and impairs transcription elongation (22), suggesting that the SRI domain is responsible for coupling transcription to histone methylation by Set2.
Here we report the solution structure of the Set2 SRI domain from the yeast S. cerevisiae and present NMR binding experiments with phospho-CTD peptides. Our results elucidate the molecular determinants for Set2 CTD binding, which underlies coupling of transcription to Set2-directed chromatin modification.

EXPERIMENTAL PROCEDURES
Sample Preparation-The region of the Saccharomyces cerevisiae SET2 gene (Swiss-Prot P46995) encoding Set2 residues 620-719 was cloned into a modified pET9d vector with an NH2-terminal hexahistidine tag. The protein was overexpressed in Escherichia coli pLysS cells at 18°C for 16 h. For labeling of the protein with 15N/13C or 15N, cells were grown in M9 minimal medium supplemented with [13C6]glucose and/or 15NH4Cl. Cell lysates were subjected to affinity chromatography on a nickel-nitrilotriacetic acid (Ni-NTA) column (Qiagen), followed by cleavage of the hexahistidine tag with tobacco etch virus protease and dialysis overnight. The tag and the His6-tagged protease were removed on a second Ni-NTA column. DNA was removed by cation exchange chromatography (Mono S, Amersham Biosciences). After gel filtration the sample was dissolved in 20 mM sodium phosphate, pH 6.5, 200 mM NaCl, 0.2 mM dithiothreitol. Edman sequencing of the protein confirmed the presence of four additional residues (GAMG) at the NH2 terminus, which result from the cloning strategy. NMR samples were prepared in H2O or 100% D2O at a protein concentration of 0.4-1 mM.
NMR Structure Determination-NMR spectra were acquired at 292 K on Bruker DRX500, DRX600, or DRX900 spectrometers with cryogenic triple resonance probes. Spectra were processed with NMRPipe (24) and analyzed using NMRVIEW (25). The 1H, 13C, and 15N chemical shifts were assigned by standard methods (26). Distance restraints were derived from two-dimensional NOESY and 15N- or 13C-resolved three-dimensional NOESY. Restraints for the backbone angles φ and ψ were derived from TALOS (27). Slowly exchanging amide protons were identified from 1H,15N correlation experiments after dissolving lyophilized protein in D2O. 15N relaxation (T1, T2) and the heteronuclear {1H}-15N NOE were measured on a 15N-labeled protein sample at 292 K as described (28) (supplemental Fig. S1). The experimentally determined distance and dihedral restraints (supplemental Table S1 and Fig. 1C) were applied in a simulated-annealing protocol using ARIA (29) and CNS (30). NOEs were manually assigned, and distance calibrations were performed by ARIA. The final ensemble of NMR structures was refined in a shell of water molecules (31). Structural quality was analyzed with PROCHECK (32).
Phosphopeptide Interaction Studies-The phospho-CTD peptides used for binding experiments were chemically synthesized (one-repeat peptide, YpSPTpSPS; two-repeat peptide, SPS-YpSPTpSPS-YpSPTpSPS; pS = phosphoserine). For NMR titration, increasing amounts of the CTD peptide were added to a 0.4 mM solution of 15N,13C-labeled SRI domain up to a 1.25-fold molar excess. Chemical shifts were monitored in two-dimensional 1H,15N HSQC experiments.
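As an illustration of how per-residue perturbations from such 1H,15N HSQC titrations are commonly quantified, the following minimal Python sketch combines the 1H and 15N shift changes into a single weighted value and flags residues above a cutoff. The text does not state which weighting or cutoff was used; the 15N scaling factor of 0.14, the function names, and any input values are assumptions made only for illustration.

```python
import math

# Assumed scaling of 15N shift changes relative to 1H (a commonly used value,
# not taken from this article).
DEFAULT_N_WEIGHT = 0.14


def combined_csp(d_h_ppm, d_n_ppm, n_weight=DEFAULT_N_WEIGHT):
    """Weighted combination of 1H and 15N shift changes (ppm) for one amide."""
    return math.sqrt(d_h_ppm ** 2 + (n_weight * d_n_ppm) ** 2)


def strongly_perturbed(shifts_free, shifts_bound, threshold_ppm,
                       n_weight=DEFAULT_N_WEIGHT):
    """Return {residue: csp} for residues whose combined shift change
    reaches the threshold.

    shifts_free / shifts_bound map a residue label to a (1H ppm, 15N ppm)
    tuple. Residues missing from the bound spectrum are skipped.
    """
    hits = {}
    for residue, (h_free, n_free) in shifts_free.items():
        if residue not in shifts_bound:
            continue
        h_bound, n_bound = shifts_bound[residue]
        csp = combined_csp(h_bound - h_free, n_bound - n_free, n_weight)
        if csp >= threshold_ppm:
            hits[residue] = csp
    return hits
```

The cutoff is left as an explicit argument because the choice of what counts as a "strong" perturbation depends on the data set.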

RESULTS AND DISCUSSION
The Set2 SRI Domain Forms a Conserved Three-helix Bundle-The solution structure of the yeast Set2 SRI domain was determined by multidimensional NMR (supplemental Table S1; also see "Experimental Procedures"). The structure revealed three α-helices arranged in a left-handed bundle (Fig. 1). The NH2-terminal helix α1 is slightly kinked at residues Phe639 and Val640, and the linker between helices α1 and α2 includes a short 3₁₀-helical turn at residues Ser650-Gln652. A hydrophobic core is formed by numerous residues located at the interface between the three helices, including four residues in the two regions linking the helices (Fig. 1C). Consistently, the heteronuclear {1H}-15N NOE measurements demonstrate that the polypeptide backbone in all three helices and the connecting linker regions is rigid (Fig. 1C and supplemental Fig. S1). The hydrophobic core residues are generally conserved across species (Fig. 1C), demonstrating that our structure is a good model for SRI domains in Set2 of other species.
* This work was supported in part by the Deutsche Forschungsgemeinschaft and the Fonds der Chemischen Industrie. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact. The on-line version of this article (available at http://www.jbc.org) contains supplemental Table S1 and Figs. S1-S3.
The SRI Domain Defines a Novel CTD-binding Fold-Comparison with the five known structures of CTD-binding domains reveals that the SRI domain defines a novel CTD-binding fold. Other CTD-binding domains include FF domains, CTD-interacting domains, WW domains, BRCT domains, and a domain in the Cgt1 subunit of the 5′-capping enzyme (reviewed in Ref. 7). Of these, FF and CTD-interacting domains also form helical bundles (33,34), but, in contrast to the SRI domain, the superhelical arrangement in these two domains is right-handed (supplemental Fig. S2). Thus the six CTD-binding domains that have been structurally characterized use different folds for specific CTD recognition.
The SRI Domain Binds a Two-repeat CTD Phosphopeptide-To characterize the CTD-binding determinants of the SRI domain, we performed NMR titration experiments with Ser2/Ser5-phosphorylated CTD peptides (Fig. 1C). A phosphopeptide consisting of a single CTD repeat (YpSPTpSPS, pS = phosphoserine; supplemental Fig. S3A) did not perturb chemical shifts in a two-dimensional 1H,15N HSQC spectrum, indicating that there is no significant binding (data not shown). However, titration with a peptide that comprised two CTD repeats and three flanking NH2-terminal residues (SPS-YpSPTpSPS-YpSPTpSPS) resulted in many strong chemical shift perturbations (Fig. 1C and supplemental Fig. S3). From the titration data the dissociation constant is estimated to be in the low micromolar range, comparable with the reported approximate affinity of 6 μM for a CTD phosphopeptide comprising three repeats (22).
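For readers who wish to see how a dissociation constant can in principle be estimated from such titration data, the sketch below fits a simple 1:1 binding model with ligand depletion by a grid search over Kd. It is not the procedure used in this work, which is not described in detail; the 1:1 stoichiometry, the fast-exchange assumption (observed shift change proportional to the bound fraction), the function names, and the grid are all assumptions for illustration.

```python
import math


def fraction_bound(p_total, l_total, kd):
    """Fraction of protein bound for 1:1 binding (all values in the same
    concentration unit), from the quadratic solution with ligand depletion."""
    b = p_total + l_total + kd
    return (b - math.sqrt(b * b - 4.0 * p_total * l_total)) / (2.0 * p_total)


def fit_kd(p_total, l_totals, observed, kd_grid):
    """Grid search for Kd; the maximal shift change is solved by linear
    least squares at each grid point.

    Returns (kd, delta_max, sum_of_squared_errors) for the best grid point,
    or None if no grid point could be evaluated.
    """
    best = None
    for kd in kd_grid:
        f = [fraction_bound(p_total, l, kd) for l in l_totals]
        denom = sum(x * x for x in f)
        if denom == 0.0:
            continue  # no ligand added at any point; nothing to fit
        delta_max = sum(x * y for x, y in zip(f, observed)) / denom
        sse = sum((delta_max * x - y) ** 2 for x, y in zip(f, observed))
        if best is None or sse < best[2]:
            best = (kd, delta_max, sse)
    return best
```

Note that when the protein concentration (here 0.4 mM) greatly exceeds Kd, the titration curve approaches stoichiometric binding and constrains Kd only loosely, which is consistent with reporting an estimated range rather than a precise value.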
Regions in the SRI Domain That Interact with the CTD-Residues that show strong chemical shift perturbations of their backbone NH groups cluster in two regions on the SRI domain structure (Fig. 2A). The first region includes residues Lys634 and Phe635 in α1, and Ala662, Val666, Lys667, Thr670, Thr671, and Glu673 in α2, whereas the second region includes residues Phe653, His655, and Glu656 in the α1-α2 linker, and residue Ile705 in α3 (Figs. 1C and 2A and supplemental Fig. S3). With the exception of Ile705, the strongest perturbations upon peptide binding were observed in region 1 (Phe635, Ala662, Val666, Lys667, and Glu673). In this region, the side chain NH2 groups of residues Asn631 and Asn633 also show significant chemical shift perturbations (supplemental Fig. S3B). Both regions are conserved among fungal Set2 homologues (Fig. 2B), befitting the conserved function of the Schizosaccharomyces pombe and Neurospora crassa Set2 homologues (35,36). The observation of two putative CTD-binding regions, and the finding that two CTD repeats are required for SRI domain binding, indicate that the phospho-CTD extends over a long distance along helices α1 and α2 and the connecting linker.
FIGURE 1. Structure and CTD binding of the yeast Set2 SRI domain. A, ensemble of final NMR structures. The three α-helices are shown in green, and a short 3₁₀-helix is shown in pink. B, ribbon diagram of the lowest energy structure in A. C, alignment of SRI domain sequences and NMR structure determination and CTD binding data. The secondary structure is shown above the sequence. Solvent-protected amide protons that show slow H/D exchange are indicated by filled circles. Secondary chemical shifts Δδ(Cα-Cβ) are indicated by black bars. Residues that experience large chemical shift perturbations upon addition of the CTD two-repeat phosphopeptide SPS-YpSPTpSPS-YpSPTpSPS (pS = phosphoserine) are indicated above the alignment with crosses and circled crosses for backbone and side chain amides, respectively. Yellow stars indicate residues Ala662 and Val666 that are implicated in binding of a CTD tyrosine side chain. Residues that are identical and conserved in fungal Set2 homologues are on red background and in red, respectively. Hydrophobic core residues are marked with a black square.
CTD Tyrosine Side Chains Contribute to SRI Domain Binding-The peptide titration experiments also revealed that the two-repeat CTD peptide (supplemental Fig. S3A) binds to the SRI domain via its tyrosine residues. Intermolecular NOEs between both CTD tyrosine side chains and the SRI domain were detected (data not shown). Preliminary assignments indicate that one of the tyrosine side chains is in proximity to residues Ala662 and Val666 in region 1 (Figs. 1C and 2B). These two residues are part of a hydrophobic patch between helices α1 and α2 and are flanked by positively charged surfaces (Fig. 2C), as expected for interaction with the negatively charged phospho-CTD. Interestingly, the tyrosine-proximal residue Ala662 is identical in human Set2, as are Phe635, Glu656, and Glu673 in the putative CTD-binding regions (Fig. 1C). In the three known CTD-protein complex structures, the Y1 side chain is also involved in hydrophobic contacts (34,37,38), suggesting that Y1 binding is a general feature of CTD recognition. Previous studies revealed that the CTD can adopt different conformations (reviewed in Ref. 7), and this structurally versatile nature of the CTD discourages any detailed model building.
The SRI Domain Resembles a Polymerase-interacting Domain in Bacterial σ Factors-Comparison of our structure with known folds in the database (DALI (39)) strikingly shows that the SRI domain resembles a region in bacterial σ factors (Fig. 3). The four highest hits were the σ factors σ28 (PDB code 1rp3), σE (PDB code 1or7), σR (PDB code 1h3l), and σ70 (PDB code 1sig), which show DALI scores of 5.6, 5.4, 5.1, and 4.9, respectively, and root mean square deviations between 3.3 and 3.7 Å. The region in σ70 that is structurally related to the SRI domain is domain 2 (σ2), which interacts with the clamp region of the core RNA polymerase upon formation of the holoenzyme (40). The σ2 domain is involved in binding the −10 element of promoter DNA and contributes to DNA melting during initiation (reviewed in Ref. 41). In the eukaryotic initiation complex, promoter DNA around position −10 lies near the NH2-terminal domain of the initiation factor TFIIEα (42), which shows weak sequence homology (43) and structural similarity (44) to the bacterial σ2 domain. We speculate that the eukaryotic TFIIEα NH2-terminal domain, which may contact promoter DNA, and the Set2 SRI domain, which binds the negatively charged phospho-CTD, both evolved from the bacterial σ2 domain.