Identification of the Single-stranded DNA Binding Surface of the Transcriptional Coactivator PC4 by NMR

The C-terminal domain of the eukaryotic transcriptional cofactor PC4 (PC4 CTD) is known to bind single-stranded (ss)DNA with nanomolar affinity. Here, NMR is used to study DNA binding by this domain in more detail. Amide resonance shifts observed in a 1H-15N HSQC-monitored titration of 15N-labeled protein with the oligonucleotide dT18 indicate that the nucleic acid is bound by two anti-parallel channels that were previously identified in the PC4 CTD crystal structure. In the absence of ssDNA, the β-sheets and loops that make up these channels exhibit above-average flexibility, reflected in higher values of T1ρ, reduced heteronuclear nuclear Overhauser effects, and faster deuterium exchange of the amides in this region. Upon ssDNA binding, this excess flexibility is significantly reduced. The binding of ssDNA by symmetry-related channels reported here provides a structural rationale for the preference of PC4 CTD for juxtaposed single-stranded regions (e.g. in heteroduplexes) observed in earlier work.

The eukaryotic general transcriptional cofactor PC4 is known to enhance activated transcription from various RNA polymerase II promoters in vitro and in vivo, in concert with members of all major classes of transcriptional activators (1–3). The activator-dependent stimulation of transcription by PC4 has been shown to originate mostly from increased recruitment of basal transcription factors during the early stages of preinitiation complex (PIC) formation (4). This effect is likely caused by the interactions that have been reported between PC4 and, on the one hand, the activation domains of promoter-binding activator proteins and, on the other, basal factors (including TFIIA) (1, 3). Thus, PC4 appears to act as a bridging factor that stabilizes the preinitiation complex.
Recently it has become clear that PC4 has several additional functions. First, the protein was found to repress transcription under specific conditions (5–7). Repression at moderate PC4 concentrations is observed in minimal in vitro transcription systems that do not require the basal factor TFIIH (5, 6), and for nonspecific transcription initiation by RNA polymerase II from DNA ends and unwound regions (6). Repression is not observed in full-factor in vitro transcription, because TFIIH is able to overcome repression by PC4 (5, 6). Thus, PC4 appears to act as what could be called a fidelity-enhancing factor for RNA polymerase II transcription initiation: it represses adventitious transcription from incomplete PICs while enhancing transcription from fully assembled PICs, i.e. those that include TFIIH, which is generally believed to be the last factor to enter the PIC during assembly (8). Surprisingly, PC4 has recently also been found to associate in vivo with the RNA polymerase III transcription factor TFIIIC and to stimulate transcription by RNA polymerase III by increasing the rate of reinitiation (9). In addition, copurification of PC4 with human replication protein A (RPA) and effects of PC4 observed in a reconstituted SV40 replication system suggest that the protein may play a further role in DNA replication (10). Thus, PC4 is emerging as an extremely versatile factor, involved in the processing of genetic information at multiple levels.
Although stimulation of RNA polymerase II transcription by PC4 is probably largely effected by protein-protein interactions that depend on the N-terminal half of the protein, the C-terminal half comprises a powerful ssDNA-binding domain (3, 4, 11, 12). The physiological role of this domain long remained enigmatic. Recently, ssDNA binding was found to be entirely dispensable for transcription activation by PC4 but strictly required for repression of transcription in the absence of TFIIH (6). Earlier analysis of PC4 CTD (12) showed that it binds with nanomolar affinity to juxtaposed stretches of ssDNA running in opposite directions, such as are present in partially melted double-stranded DNA or heteroduplexes. Optimal binding to such structures requires approximately 8 unpaired nucleotides in each of the two strands. Single-stranded oligonucleotides can be bound with almost equal affinity, provided they contain a minimum of 16–20 nucleotides. These observations support the idea that, for PC4 to bind, ssDNA has to be bent 180° to create an arrangement of anti-parallel strands similar to the topology of a heteroduplex.
Earlier, the crystal structure of PC4 CTD was solved (13). This structure revealed a novel homodimeric fold in which each monomer consists of a curved four-stranded β-sheet and a 45°-kinked α-helix, both contributing to a complex dimerization interface. A prominent feature of the structure is the presence of two channel-like depressions running in anti-parallel directions. These channels, formed by the curved β-sheets, are lined by both positively charged and aromatic residues, suggesting involvement in ssDNA binding in a manner similar to that of other single-stranded DNA-binding proteins such as RPA (14). To investigate whether these channels indeed constitute the DNA binding surface of PC4 CTD, we have studied this protein domain and its interaction with a single-stranded oligonucleotide by heteronuclear NMR.

EXPERIMENTAL PROCEDURES
NMR Sample Preparation. PC4 CTD was overexpressed in Escherichia coli strain BL21(DE3) using pET-11a (Novagen) constructs. The PC4 CTD construct encodes amino acids 63–127 of the full-length protein, preceded by Met-Ala originating from the vector sequence. Bacteria were grown at 37 °C in synthetic medium (6.0 g/liter Na2HPO4·2H2O, 3.0 g/liter KH2PO4, 0.5 g/liter NaCl, 1.0 mM MgSO4, 0.2 g/liter FeCl3, 20 μM CaCl2, 0.5 mg/liter thiamine) containing 0.5 g/liter 15NH4Cl as the sole nitrogen source and either 4.0 g/liter 12C-glucose or 1.0 g/liter 13C-glucose as the sole carbon source. Expression was induced at an A600 of 0.5 by the addition of isopropyl-β-D-thiogalactopyranoside to a final concentration of 1 mM. After 4 h the cells were harvested, resuspended in buffer A (20 mM Tris-HCl, pH 7.3, 10% glycerol, 1 mM EDTA, 5 mM dithiothreitol, 10 mM Na2S2O5, 1 mM phenylmethylsulfonyl fluoride, 10 μg/ml aprotinin, 20 μg/ml leupeptin) containing 500 mM KCl, and lysed by a single freeze-thaw cycle followed by sonication for a total of 5 min (in 20-s bursts). The lysate was then centrifuged at 20,000 × g for 20 min at 4 °C. The supernatant was diluted to 200 mM KCl by adding buffer A without KCl and immediately loaded onto a heparin-Sepharose column (Amersham Pharmacia Biotech). The column was washed with buffer A containing 200 mM KCl and eluted with buffer A containing 500 mM KCl. The purest fractions, selected on the basis of Coomassie-stained SDS-polyacrylamide gels, were pooled, diluted with buffer A without KCl to a final concentration of 75 mM KCl, and applied to an SP-Sepharose column (Pharmacia). This column was washed with buffer B (20 mM potassium phosphate, pH 5.5, 5 mM dithiothreitol) containing 75 mM KCl and eluted with a linear salt gradient (75–1,000 mM KCl) in buffer B.
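The two dilution steps above (from 500 mM to 200 mM KCl before the heparin column, and to 75 mM KCl before the SP-Sepharose column) follow from conservation of the amount of salt, C1·V1 = C2·(V1 + V_add), assuming the diluent (buffer A without KCl) contains no KCl. The helper below is a minimal sketch; the function name and the example volumes are hypothetical and not taken from the protocol.

```python
def diluent_volume(v_sample_ml: float, c_start_mm: float, c_target_mm: float) -> float:
    """Volume (ml) of KCl-free buffer needed to lower KCl from c_start_mm to c_target_mm.

    From c_start * v_sample = c_target * (v_sample + v_add), assuming the diluent has no KCl.
    """
    if not 0 < c_target_mm <= c_start_mm:
        raise ValueError("target must be positive and not above the starting concentration")
    return v_sample_ml * (c_start_mm / c_target_mm - 1.0)


# Hypothetical volumes: 10 ml of cleared lysate, 3 ml of pooled heparin fractions.
print(diluent_volume(10.0, 500.0, 200.0))  # 15 ml buffer A without KCl
print(diluent_volume(3.0, 500.0, 75.0))    # about 17 ml
```

In other words, each milliliter of heparin eluate needs roughly 5.7 ml of KCl-free buffer to reach 75 mM KCl, which is why this step substantially increases the loading volume.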
Peak fractions were further purified on a Superdex 75 column (Pharmacia) in the same buffer containing 400 mM KCl and subsequently concentrated in Centricon spin-dialysis tubes (molecular mass cut-off 10 kDa, Amicon). At this stage, protein samples appeared as single bands on silver-stained polyacrylamide gels. Protein concentrations were determined using the Bio-Rad protein assay with bovine gamma globulin as a standard. Deuterated glycine was added to the final samples to a final concentration of 2 M, because this resulted in better long-term stability of the protein. The oligonucleotide dT18 was purchased from Carl Roth GmbH & Co. The DNA was dissolved in buffer B containing 400 mM KCl and 2 M deuterated glycine to an oligonucleotide concentration of 11 mM.
Protein-DNA titrations were carried out by repeated addition of small aliquots of an 11 mM dT18 solution to a 1.5 mM 15N-PC4 CTD sample. Formation of the protein-DNA complex was followed by recording a 500 MHz 1H-15N HSQC spectrum after each addition. 15N-T1 and 15N-T1ρ relaxation and heteronuclear NOE experiments were carried out at 500 MHz as described earlier (15–17), using pulsed-field gradients for coherence selection in combination with the sensitivity enhancement scheme (18–20). Longitudinal 15N relaxation rates were determined from a series of 7 spectra with delays of 32, 64, 128, 256, 512, 768, and 1024 ms. Transverse in-phase 15N relaxation rates (21) were determined from a series of 7 spectra with delays of 8, 16, 32, 48, 64, 96, and 128 ms, using a 15N spin lock with a field strength of 1.4 kHz. Heteronuclear cross-relaxation constants were derived from two 500 MHz spectra recorded with and without 2.5 s of saturation of the amide protons.
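Because the titrant is added in aliquots, the DNA:protein ratio at each step follows from the cumulative amount of DNA added, while the protein concentration drops with each dilution. The sketch below assumes a starting sample volume of 500 µL and 15-µL aliquots, neither of which is stated in the text, and expresses the ratio in whatever unit the 1.5 mM protein concentration refers to (the text does not say whether this is monomer or dimer).

```python
def titration_series(v0_ul, c_protein_mm, c_dna_mm, aliquots_ul):
    """Return (DNA:protein molar ratio, protein concentration in mM) after each aliquot.

    1 mM equals 1 nmol/uL, so concentration (mM) times volume (uL) gives nmol.
    """
    n_protein = c_protein_mm * v0_ul  # nmol protein, constant during the titration
    n_dna, volume = 0.0, v0_ul
    steps = []
    for a in aliquots_ul:
        n_dna += c_dna_mm * a   # cumulative nmol of dT18 added
        volume += a             # sample volume grows with each aliquot
        steps.append((n_dna / n_protein, n_protein / volume))
    return steps


# Hypothetical: 500 uL of 1.5 mM protein, five 15-uL additions of 11 mM dT18.
series = titration_series(500.0, 1.5, 11.0, [15.0] * 5)
for ratio, c_prot in series:
    print(f"DNA:protein = {ratio:.2f}, protein = {c_prot:.2f} mM")
```

Under these assumed volumes, the equimolar point requires about 68 µL of the 11 mM stock, so the protein is diluted by only about 12% over the titration; a concentrated titrant keeps the loss of signal from dilution small.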
To measure amide proton exchange rates, the protein was lyophilized and redissolved in D2O, after which a series of 1H-15N HSQC spectra was recorded up to 200 h after redissolution.
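Hydrogen-deuterium exchange can be quantified by following the decay of each amide cross-peak over such a spectral series. A minimal sketch, assuming single-exponential loss of intensity, I(t) = I0·exp(−k·t), fitted by a straight line through ln(I) versus t. The three-way split into fast, medium, and slow exchange uses hypothetical criteria (peak already below the noise in the first spectrum, or at least half of the initial intensity left in the last spectrum), since the criteria used in this study are not given.

```python
import math


def exchange_rate(times_h, intensities):
    """Exchange rate k (1/h) from a least-squares line through ln(I) versus t.

    Assumes I(t) = I0 * exp(-k * t); points with non-positive intensity are skipped.
    Returns None if fewer than two usable points remain.
    """
    pts = [(t, math.log(i)) for t, i in zip(times_h, intensities) if i > 0]
    if len(pts) < 2:
        return None
    n = len(pts)
    mt = sum(t for t, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((t - mt) ** 2 for t, _ in pts)
    sxy = sum((t - mt) * (y - my) for t, y in pts)
    return -sxy / sxx  # the slope of ln(I) versus t is -k


def classify_exchange(intensities, noise_floor, slow_fraction=0.5):
    """Hypothetical three-way classification over the measured time window."""
    if intensities[0] <= noise_floor:
        return "fast"    # already exchanged before the first spectrum
    if intensities[-1] >= slow_fraction * intensities[0]:
        return "slow"    # most of the signal survives the whole series
    return "medium"


# Synthetic example: a peak decaying with k = 0.05 per hour, sampled up to 200 h.
times = [1, 2, 5, 10, 20, 50, 100, 200]
peak = [100.0 * math.exp(-0.05 * t) for t in times]
print(exchange_rate(times, peak), classify_exchange(peak, noise_floor=1.0))
```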
All spectra were processed on Silicon Graphics O2 workstations using the NMRPipe software package (22). Longitudinal and transverse in-phase relaxation rates, and the errors in these parameters, were obtained by fitting a mono-exponential function to the peak intensities with the Levenberg-Marquardt algorithm, using in-house software as described by Vis et al. (17).
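The curve fitting described above can be sketched as a two-parameter Levenberg-Marquardt fit of I(t) = A·exp(−R·t), with R = 1/T1 (or 1/T1ρ). This is a generic implementation, not the in-house software of Vis et al. (17); taking the standard error of R from the covariance matrix scaled by the residual variance is one common choice and an assumption here. The intensities in the usage lines are synthetic.

```python
import math


def fit_monoexp(t, y, max_iter=200, tol=1e-12):
    """Fit y = A * exp(-R * t) by Levenberg-Marquardt.

    Returns (A, R, sigma_R); sigma_R is taken from (J^T J)^-1 scaled by the
    residual variance sse / (n - 2).
    """
    n = len(t)

    # Starting values from a straight-line fit of ln(y) versus t.
    ly = [math.log(v) for v in y]
    mt, ml = sum(t) / n, sum(ly) / n
    slope = sum((a - mt) * (b - ml) for a, b in zip(t, ly)) / sum((a - mt) ** 2 for a in t)
    A, R = math.exp(ml - slope * mt), -slope

    def sse(a, r):
        return sum((yi - a * math.exp(-r * ti)) ** 2 for ti, yi in zip(t, y))

    def normal_eqs(a, r):
        # Accumulate J^T J (jaa, jar, jrr) and J^T residuals (ga, gr).
        jaa = jar = jrr = ga = gr = 0.0
        for ti, yi in zip(t, y):
            e = math.exp(-r * ti)
            da, dr = e, -a * ti * e  # d(model)/dA, d(model)/dR
            res = yi - a * e
            jaa += da * da
            jar += da * dr
            jrr += dr * dr
            ga += da * res
            gr += dr * res
        return jaa, jar, jrr, ga, gr

    cur, lam = sse(A, R), 1e-3
    for _ in range(max_iter):
        jaa, jar, jrr, ga, gr = normal_eqs(A, R)
        # Marquardt damping: scale the diagonal of J^T J by (1 + lambda).
        a11, a22 = jaa * (1 + lam), jrr * (1 + lam)
        det = a11 * a22 - jar * jar
        if det == 0:
            break
        dA = (ga * a22 - jar * gr) / det  # Cramer's rule for the 2x2 system
        dR = (a11 * gr - jar * ga) / det
        new = sse(A + dA, R + dR)
        if new < cur:  # accept the step and move toward Gauss-Newton
            A, R = A + dA, R + dR
            converged = cur - new <= tol * cur
            cur, lam = new, lam / 10
            if converged:
                break
        else:          # reject the step and move toward gradient descent
            lam *= 10
            if lam > 1e12:
                break

    jaa, jar, jrr, _, _ = normal_eqs(A, R)
    s2 = cur / (n - 2) if n > 2 else float("nan")
    sigma_R = math.sqrt(s2 * jaa / (jaa * jrr - jar * jar))
    return A, R, sigma_R


# Usage with the T1 delays listed above (in s) and synthetic intensities (T1 = 0.6 s, +/-1%).
delays = [0.032, 0.064, 0.128, 0.256, 0.512, 0.768, 1.024]
wobble = [1.01, 0.99, 1.01, 0.99, 1.01, 0.99, 1.01]
peaks = [1000.0 * math.exp(-d / 0.6) * w for d, w in zip(delays, wobble)]
A, R, sigma_R = fit_monoexp(delays, peaks)
T1, sigma_T1 = 1 / R, sigma_R / R ** 2
print(f"T1 = {T1:.3f} +/- {sigma_T1:.3f} s")
```

The log-linear starting guess is usually close enough that only a few damped steps are needed; fitting in linear intensity space, as here, avoids the distortion of the noise that a purely log-linear fit would introduce.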

RESULTS AND DISCUSSION
The 600 MHz 1H-15N HSQC spectrum of PC4 CTD is shown in Fig. 1. A nearly complete backbone assignment (listed in Table I) was obtained using double and triple resonance methods, essentially as described by Vis et al. (23). Amide 15N and 1H frequencies were obtained from a high-resolution 600 MHz 1H-15N HSQC spectrum, and the corresponding intra- and inter-residue Cα and CO connectivities were collected from 600 MHz HNCA, HN(CO)CA, HNCO, and HN(CA)CO spectra. In most cases, this information allowed the inter-residue Cα and CO connectivities of each amide to be matched uniquely to the intra-residue connectivities of the preceding amide. Peptide fragments identified in this way were then located in the protein sequence on the basis of characteristic Cα frequencies (24) and, where detectable, side chain resonances obtained from a 750 MHz total correlation spectroscopy (TOCSY)-HSQC spectrum. Nearly all Hα frequencies could also be obtained from this TOCSY-HSQC spectrum. Remaining ambiguities in the sequential assignment were resolved using short-range NOEs from a 750 MHz NOE spectroscopy (NOESY)-HSQC spectrum.
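The sequential matching step described above can be illustrated as follows: a spin system b is taken to follow a if b's inter-residue (i−1) Cα and CO shifts agree with a's intra-residue shifts within a tolerance, and only unique matches are accepted. This is a simplified sketch; the data class, the 0.2 ppm tolerances, and the toy chemical shifts are hypothetical, and placing the resulting fragments in the sequence would still require the characteristic Cα frequencies and side chain data mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpinSystem:
    label: str      # identifier of the HSQC peak
    ca: float       # intra-residue C-alpha shift (ppm)
    co: float       # intra-residue CO shift (ppm)
    ca_prev: float  # C-alpha shift of residue i-1 (ppm)
    co_prev: float  # CO shift of residue i-1 (ppm)


def successors(systems, tol_ca=0.2, tol_co=0.2):
    """Map each spin system to the unique system whose (i-1) shifts match its own (i) shifts."""
    links = {}
    for a in systems:
        hits = [b for b in systems if b is not a
                and abs(b.ca_prev - a.ca) <= tol_ca
                and abs(b.co_prev - a.co) <= tol_co]
        if len(hits) == 1:  # ambiguous matches are left for other data to resolve
            links[a.label] = hits[0].label
    return links


def chains(links):
    """Follow unambiguous links from fragment starts into sequential fragments."""
    targets = set(links.values())
    fragments = []
    for start in links:
        if start in targets:
            continue  # has a predecessor, so it is not the first member of a fragment
        frag = [start]
        while frag[-1] in links and links[frag[-1]] not in frag:
            frag.append(links[frag[-1]])
        fragments.append(frag)
    return fragments


# Toy shifts (ppm); p2 has a glycine-like C-alpha near 45 ppm.
toy = [
    SpinSystem("p1", 58.1, 176.2, 52.0, 174.0),
    SpinSystem("p2", 45.3, 173.9, 58.1, 176.2),
    SpinSystem("p3", 61.0, 175.5, 45.3, 173.9),
]
print(chains(successors(toy)))
```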
To investigate which parts of PC4 CTD are involved in the interaction with ssDNA, a titration was performed with the oligonucleotide dT18. An 18-mer was chosen because earlier experiments indicated that PC4 CTD interacts with approximately 16–20 nucleotides of ssDNA (12). The complex that forms when the oligonucleotide is added is in slow exchange with the free protein on the NMR time scale, resulting in a simultaneous decrease in the intensity of the original peaks and an increase in the intensity of new signals (data not shown). Nevertheless, the new signals could in almost all cases be assigned straightforwardly, because the perturbations in the spectrum were relatively modest. Fig. 2A shows a selected region of the superimposed 1H-15N HSQC spectra of the free protein and the 1:1 complex, containing resonances that are unaffected (Gln-116), slightly affected (Lys-126), or severely affected (Asn-106) by the addition of ssDNA. Fig. 2B shows all perturbations observed, given as Euclidean distances (in Hz) between the peak maxima in the HSQC spectra with and without an equimolar amount of dT18. Although small shifts (around 10 Hz) are observed for residues throughout the protein, all perturbations exceeding 20 Hz map to the β-sheet region. Especially large shifts of 30 Hz or more are seen for the outermost parts of β-strands β2 and β3 and the loop that connects them (Phe-77, Lys-78, Gly-79, Lys-80, and Val-81), for the "β-ridge" that separates the two anti-parallel channels (Trp-89, Met-90, Asp-91, and Lys-101), and for the region that connects strand β4 to the α-helix (Leu-105 and Asn-106). In addition, a surprisingly large perturbation was found for the N-terminal Ala residue, which originates from the expression vector sequence but is located close to the ends of the anti-parallel channels in the crystal structure.

FIG. 2. HSQC-monitored titration of PC4 CTD with the single-stranded oligonucleotide dT18. A, selected HSQC region showing peaks from residues Asn-106, Gln-116, and Lys-126 in the absence (dotted contour lines) and presence (solid contour lines) of equimolar amounts of dT18. Arrows indicate the perturbations caused by the addition of ssDNA. B, combined 15N and 1H chemical shift perturbations ($\Delta\delta_{\mathrm{NH}}$) along the PC4 CTD sequence, calculated as Euclidean distances between peak maxima, $\Delta\delta_{\mathrm{NH}} = \sqrt{(\Delta\delta_{^{15}\mathrm{N}})^2 + (\Delta\delta_{^{1}\mathrm{H}})^2}$, where $\Delta\delta_{^{15}\mathrm{N}}$ and $\Delta\delta_{^{1}\mathrm{H}}$ denote the 15N and 1H chemical shift changes (in Hz), respectively. In cases of degeneracy, the perturbation of the peak with the highest intensity is shown. Residues Lys-78 and Ile-83 could not be assigned unambiguously in the complex; for these residues, the values shown are lower limits. The secondary structure of PC4 CTD is indicated schematically below the graph.

When the perturbations of Fig. 2B are mapped onto the PC4 CTD crystal structure by color coding, ranging from green (no perturbation) through white to red (a perturbation of 30 Hz or more), the data support a model in which the ssDNA is held between the β2β3-arms and the β-ridge of the protein.
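The combined perturbation used in Fig. 2B and the green-white-red color scale can be written as two small functions. A sketch under stated assumptions: the linear interpolation with white at 15 Hz is a guess at the scale's midpoint, and the 15N frequency of roughly 50.7 MHz on a 500 MHz spectrometer is used only to show how ppm changes convert to Hz. Because the distance is computed in Hz rather than ppm, a given 15N shift change in ppm contributes about ten times less than the same 1H change in ppm.

```python
import math


def perturbation_hz(d15n_hz: float, d1h_hz: float) -> float:
    """Combined amide perturbation: Euclidean distance between peak maxima, in Hz."""
    return math.hypot(d15n_hz, d1h_hz)


def ppm_to_hz(delta_ppm: float, larmor_mhz: float) -> float:
    """Convert a shift change in ppm to Hz at the given nuclear frequency (MHz)."""
    return delta_ppm * larmor_mhz


def perturbation_color(p_hz: float, p_max: float = 30.0):
    """RGB in [0, 1]: green (0 Hz) -> white (p_max / 2, assumed) -> red (>= p_max)."""
    f = min(max(p_hz / p_max, 0.0), 1.0)
    if f <= 0.5:
        w = f / 0.5              # 0 = pure green, 1 = white
        return (w, 1.0, w)
    r = (f - 0.5) / 0.5          # 0 = white, 1 = pure red
    return (1.0, 1.0 - r, 1.0 - r)


# Hypothetical shift changes: 0.3 ppm in 15N (about 50.7 MHz) and 0.05 ppm in 1H (500 MHz).
p = perturbation_hz(ppm_to_hz(0.3, 50.7), ppm_to_hz(0.05, 500.0))
print(round(p, 1), perturbation_color(p))
```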
Consistent with current views concerning protein-ssDNA interactions (25), we find a large number of positively charged and aromatic residues at the binding surface, with their side chains exposed to solvent (Lys-68, Arg-70, Tyr-71, Arg-75, Phe-77, Lys-78, Lys-80, Arg-86, Tyr-88, Trp-89, Lys-97, Arg-100, and Lys-101). Interestingly, superposition of one of the PC4 anti-parallel channels onto the ssDNA-binding channel in the RPA-ssDNA cocrystal structure revealed that several of these residues (Arg-75, Phe-77, Arg-86, Tyr-88, Trp-89, and Arg-100) occupy positions relative to the channel that correspond well to those of similar residues in the two RPA subunits (13, 14). This indicates that PC4 and RPA may contact ssDNA in similar ways even though their β-strand topologies are markedly different. For Tyr-71, Arg-75, Phe-77, Lys-78, Lys-80, Arg-86, Trp-89, and Lys-101, large ssDNA-induced amide resonance perturbations were observed in our titration, and we expect these residues to make side chain contacts with the DNA. For Trp-89, involvement in binding has recently been confirmed by mutagenesis (6). For Lys-68, Arg-70, Tyr-88, Lys-97, and Arg-100, no assignments could be obtained, but the positions of these residues relative to the binding channels strongly suggest that their side chains are also involved in binding.
Comparison with other single-stranded DNA-binding proteins suggests a particularly important role in ssDNA binding for the ends of β-strands β2 and β3 and the β2β3-loop that connects them (residues 77–80). This part of the protein is reminiscent of the ssDNA-binding loops found in many other single-stranded DNA-binding proteins, such as the L45-loop of the OB-fold proteins (26). It contains an aromatic residue (Phe-77) and several positively charged residues (Lys-78 and Lys-80, with Arg-75 located nearby in strand β2).

… the two optimal binding sites that were found in an earlier study (12). Protein monomers are shown in light blue and dark blue. DNA is shown in green, with ribbons representing the phosphate backbone and base pairs shown as horizontal bars. An arrow (40 Å) indicates the scale of both models. Positively charged and aromatic side chains that, on the basis of our NMR data, are expected to mediate contacts are shown in red. Side chains whose involvement in DNA binding is suggested both by our NMR data and by a recent mutagenesis study (6) are shown in yellow.

… supported by a recent mutagenesis study (6).
Temperature factors from the crystallographic structure determination of PC4 CTD (13) suggested that the β2β3-loop and the β-ridge exhibit above-average mobility. To assess the flexibility of these regions in solution, we carried out relaxation measurements (15N-T1, 15N-T1ρ, and 1H-15N NOE) for the backbone amides (Fig. 4). Although 15N-T1 did not vary significantly along the protein sequence, 15N-T1ρ values are indeed higher in these regions than in the rest of the protein. Heteronuclear 1H-15N NOE values were considerably lower in the β2β3-loop and the β-ridge than in the rest of the molecule. Consistent with these relaxation data, fast hydrogen exchange was observed in both the β2β3-loop and the β-ridge (Fig. 4, lower panel), whereas most amides in the core of the protein exhibit either medium or slow exchange. Thus, the regions of PC4 that mediate ssDNA contacts are relatively flexible in solution. Upon formation of the complex with dT18, T1ρ values for amide nitrogens throughout the protein are reduced, most likely because of the larger size of the complex (Fig. 5). However, the reduction for residues in the β2β3-loop region is significantly larger than this uniform decrease. This additional reduction may reflect attenuation of the higher mobility in this region, which can be understood in terms of stabilization by ssDNA contacts.
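The heteronuclear NOE is the ratio of peak intensities in the spectra recorded with and without proton saturation. A sketch, assuming the error is propagated from the noise level of each spectrum, together with a hypothetical criterion for flagging flexible residues (more than one standard deviation from the mean in the direction indicating extra mobility: low NOE, or high T1ρ). The residue labels and values in the usage lines are synthetic and do not come from Fig. 4.

```python
import math
import statistics


def het_noe(i_sat, i_ref, noise_sat, noise_ref):
    """NOE = I_sat / I_ref, with the error propagated from the spectral noise levels."""
    noe = i_sat / i_ref
    err = abs(noe) * math.sqrt((noise_sat / i_sat) ** 2 + (noise_ref / i_ref) ** 2)
    return noe, err


def flag_flexible(values, higher_is_flexible, n_sd=1.0):
    """Residues deviating from the mean by more than n_sd standard deviations
    in the direction that indicates extra mobility (hypothetical criterion)."""
    data = list(values.values())
    mean, sd = statistics.mean(data), statistics.stdev(data)
    if higher_is_flexible:  # e.g. T1-rho
        return sorted(k for k, v in values.items() if v > mean + n_sd * sd)
    return sorted(k for k, v in values.items() if v < mean - n_sd * sd)  # e.g. NOE


# Synthetic NOE values for six hypothetical residues.
noe_values = {"r1": 0.80, "r2": 0.82, "r3": 0.79, "r4": 0.30, "r5": 0.81, "r6": 0.78}
print(het_noe(80.0, 100.0, 2.0, 2.0), flag_flexible(noe_values, higher_is_flexible=False))
```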
DNA binding by the two channels, as demonstrated here, is consistent with and supports existing biochemical data in several ways. First, the high affinity for heteroduplex structures and the apparent requirement for two juxtaposed single-stranded regions (12) are readily explained by the present model: each binding channel can contact one of the two strands running in opposite directions, as illustrated in Fig. 6. Moreover, the binding site size (approximately 8 nucleotides in each of the two single-stranded regions), as determined from the affinities for heteroduplexes and for single-stranded oligonucleotides of increasing length (12), corresponds closely to what would be expected from the channel length in the PC4 structure: if ssDNA bound by PC4 is assumed to be similar in conformation to ssDNA bound by RPA, 8 nucleotides should span a distance of nearly 40 Å, which is also approximately the length of the channels (Fig. 6).
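The length argument can be made explicit. Taking the figures quoted above at face value, the implied rise per nucleotide and the number of nucleotides left over for the turn in a single-stranded oligonucleotide are:

$$
\frac{40~\text{\AA}}{8~\text{nt}} \approx 5~\text{\AA per nucleotide}, \qquad
N_{\text{turn}} \approx N_{\text{total}} - 2 \times 8 = (16\text{--}20) - 16 = 0\text{--}4~\text{nt}
$$

If each channel holds about 8 nucleotides, a 16–20-nucleotide oligonucleotide would thus leave roughly 0–4 nucleotides for the 180° bend between the two channels; this reading of the earlier binding data (12) is an inference, not a measured quantity.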
The backbone resonances of Met-69, Arg-70, Tyr-88, and Lys-97–Arg-100 of PC4 CTD could not be assigned because the corresponding signals were missing from all spectra. Interestingly, several residues close to the unassigned ones (Glu-93–Met-96) give rise to multiple sets of signals, suggesting conformational heterogeneity. Because all of these residues are located in the region of the PC4 CTD structure that connects the tips of the β-ridge (which themselves produce well defined, nondegenerate resonances) to the core of the protein dimer, it is tempting to speculate that the orientation of the tips relative to the protein core is ill-defined in solution. Notably, the tips contain several negatively charged residues (Asp-91, Glu-93, and Glu-95) that would seem to interfere with the binding of an 8-nucleotide heteroduplex in the hypothetical model published earlier (see Ref. 13 and Fig. 6B), because these charges would be close to the phosphate backbone of the DNA. However, such DNA molecules are bound with very high affinity (12), suggesting that steric hindrance does not occur. A possible explanation is that the tips can adopt alternative conformations in which they have moved "inward" relative to the crystal structure (i.e. toward each other). Such conformations may allow the β-ridge to be inserted into the heteroduplex opening more easily. Conformational heterogeneity would be consistent with the multiple NMR signals found for the amino acids in this hinge region and with the crystallographic temperature factors for this region, which are significantly higher than the average for PC4 CTD (13).
So far, only a single protein (other than the direct homologues of PC4 in various species) has been characterized that contains a region homologous to the PC4 C terminus. This protein, the yeast factor Sub1/Tsp1 (27, 28), is also an RNA polymerase II transcriptional coactivator but differs from PC4 in several important respects. First, Sub1/Tsp1 (33 kDa) is much larger than PC4 and bears no homology to PC4 outside the region corresponding to the PC4 ssDNA-binding domain. Furthermore, the mechanisms of coactivation reported for Sub1/Tsp1 (28) and PC4 (4) differ considerably. Nevertheless, the homology between PC4 CTD and residues 40–105 of Sub1/Tsp1 is very high (73% conserved, 49% identical), suggesting a similar fold and function. Indeed, Sub1/Tsp1 has been reported to bind ssDNA (27). The fact that two otherwise quite different transcriptional cofactors both contain this type of ssDNA-binding domain suggests that interaction with ssDNA or heteroduplexes is part of a general mechanism that helps regulate transcription in eukaryotes. Direct evidence for a role of ssDNA binding by PC4 in transcriptional regulation was indeed recently provided (6). The identification of DNA-contacting residues described here and the construction of point mutants that do not interact with ssDNA (6) should help to elucidate the precise role of the PC4 C-terminal domain, and of the homologous domain in Sub1/Tsp1, in transcription and other processes.
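Percent identity and percent conservation, as quoted above for PC4 CTD and Sub1/Tsp1, depend on the alignment and on which substitutions are counted as conservative. The sketch below shows one way to compute both from a pairwise alignment; the substitution groups, the convention of counting only gap-free columns, and the toy sequences are assumptions, so it will not necessarily reproduce the published 73% and 49%.

```python
# Hypothetical groups of "conservative" substitutions; published values may use other criteria.
GROUPS = ("ILMV", "FWY", "KRH", "DE", "NQ", "ST", "AG")


def same_group(x: str, y: str) -> bool:
    """True if both residues belong to one of the assumed substitution groups."""
    return any(x in g and y in g for g in GROUPS)


def identity_and_conservation(aln_a: str, aln_b: str):
    """Percent identical and percent conserved over gap-free columns of a pairwise alignment."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    cols = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    if not cols:
        return 0.0, 0.0
    identical = sum(x == y for x, y in cols)
    conserved = sum(x == y or same_group(x, y) for x, y in cols)
    return 100.0 * identical / len(cols), 100.0 * conserved / len(cols)


# Toy alignment, not the PC4 or Sub1/Tsp1 sequences.
print(identity_and_conservation("KL-EF", "KIVDA"))  # (25.0, 75.0)
```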