Specific Recognition of the C-rich Strand of Human Telomeric DNA and the RNA Template of Human Telomerase by the First KH Domain of Human Poly(C)-Binding Protein-2

Poly(C)-binding proteins (PCBPs) constitute a family of nucleic acid-binding proteins that play important roles in a wide spectrum of regulatory mechanisms. The diverse functions of PCBPs depend on their ability to recognize poly(C) sequences with high affinity and specificity. PCBPs contain three copies of the KH (hnRNP K homology) domain, which is responsible for binding nucleic acids. We have determined the NMR structure of the first KH domain (KH1) of PCBP2. The PCBP2 KH1 domain adopts a structure with three α-helices packed against one side of a three-stranded antiparallel β-sheet. Specific binding of PCBP2 KH1 to a number of poly(C) RNA and DNA sequences, including the C-rich strand of the human telomeric DNA repeat, the RNA template region of human telomerase, and regulatory recognition motifs in the poliovirus-1 5'UTR, was established by monitoring chemical shift changes in 15N-HSQC spectra of the protein. The nucleic acid binding groove was further mapped by chemical shift perturbation upon binding to a six-nucleotide human telomeric DNA. The binding groove is an α/β platform formed by the juxtaposition of two α-helices, one β-strand, and two flanking loops. All of the DNA and RNA targets bind in this common groove, whose hydrophobic floor accommodates a stretch of three C residues, while hydrogen bonding partners in the KH domain provide nuances in the recognition of flanking residues. Specific interactions of PCBP2 KH1 with telomeric DNA and telomerase RNA suggest that PCBPs may participate in mechanisms that regulate telomere and telomerase functions.

molecule of PCBP1 or PCBP2, while in the silencing complexes, two PCBPs are involved: hnRNP K and PCBP1 or 2.
Besides cellular RNAs, PCBPs also participate critically in the regulation of viral RNA functions. The 5'UTR of poliovirus mRNA (which is also the genomic RNA) harbors two binding sites for PCBP1 or 2. Binding of PCBP2 to one of the sites, a C-rich internal bulge sequence known as loop B RNA within the IRES (internal ribosomal entry site) element, is required for cap-independent translation of the viral RNA (12-16). Binding of PCBP1 or 2 to the other site, a C-rich loop in the stem-loop B domain within a so-called cloverleaf-like RNA structure, achieves high affinity only in the presence of the viral protein 3CD (the precursor of the viral protease 3C and the viral polymerase 3D) bound to a different part of the cloverleaf. Formation of the PCBP-cloverleaf-3CD ternary complex inhibits translation and switches the viral RNA to a template for RNA replication (15,17). Unlike the PCBP-binding sites within the 3'UTRs of cellular mRNAs, the two viral C-rich recognition sequences are presented in the context of a structured RNA, as demonstrated by our recent NMR structures of two sequence variants of the loop B RNA (18).
Involvement of PCBPs in regulating RNA functions is certainly not limited to the cases mentioned above. A recent study identified 160 mRNA species that associate in vivo with PCBP2 in a human hematopoietic cell line (19), suggesting that the contribution of PCBPs to posttranscriptional gene regulation may be far more extensive than currently known. Moreover, PCBP actions may not be limited to mRNAs. For example, an interaction between a PCBP4 isoform known as MCG10 or PCBP4b and the C-rich RNA template region of human telomerase has been proposed as a possible mechanism for the functional role of PCBP4b as a p53-induced regulator of apoptosis and cell cycle arrest at G2/M (20).
Biological functions of PCBPs are further diversified by their ability to interact specifically with not only RNA but also DNA sequences. Specific binding of hnRNP K to the single-stranded pyrimidine-rich sequence in the promoter of the human c-myc gene activates transcription (21). hnRNP K was also shown to recognize the C-rich strand of human telomeric DNA with high affinity (22). This property of high-affinity binding to telomeric repeats has recently been extended to PCBP1: using affinity chromatography and mass spectrometry, Bandiera and coworkers identified PCBP1 as the most specific C-rich telomeric DNA binder in the K562 human cell line (23). The authors also suggested that PCBP2 should share this binding property, based on two observations. First, PCBP2 and PCBP1 are highly homologous (82% identical, 88% positive), and their three nucleic acid-binding KH domains are even more so (93% identical and 97% positive collectively; see Figure 1b). Second, the high-affinity SELEX RNA target sequences for PCBP2, W₂₋₆[C₃₋₅W₂₋₆]₃₋₄, where W is A or U (24), bear a remarkable similarity to the C-rich human telomeric repeats, (CCCTAA)ₙ.
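The resemblance between the SELEX consensus and the telomeric repeat can be checked mechanically. The sketch below, a minimal illustration rather than anything taken from (23) or (24), encodes the consensus W₂₋₆[C₃₋₅W₂₋₆]₃₋₄ as a Python regular expression. Treating T as equivalent to U, so that a DNA sequence can be scored against an RNA consensus, is our assumption, as are the helper name `matches_consensus` and the test sequences.

```python
import re

# W = A or U in the RNA consensus; T is accepted as the DNA counterpart of U
# (an assumption made so that telomeric DNA can be scored against it).
W = "[AUT]"
# W(2-6) followed by 3-4 repeats of [C(3-5) W(2-6)]
SELEX_CONSENSUS = re.compile(rf"{W}{{2,6}}(?:C{{3,5}}{W}{{2,6}}){{3,4}}")


def matches_consensus(seq: str) -> bool:
    """Return True if the whole sequence fits the PCBP2 SELEX consensus."""
    return SELEX_CONSENSUS.fullmatch(seq.upper()) is not None


# Four C-rich telomeric repeats with a leading TAA (an illustrative construct)
telomeric = "TAA" + "CCCTAA" * 4
print(matches_consensus(telomeric))             # True
print(matches_consensus("TAA" + "CCCTAA" * 2))  # False: only two C-blocks
print(matches_consensus("TAAAAA"))              # False: no C-block
```

`fullmatch` scores a whole oligonucleotide; for a long telomeric tract, `SELEX_CONSENSUS.search` would locate a fitting window instead. A match shows only sequence resemblance, not binding.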
Further studies are needed to confirm the interaction between the PCBPs and human telomeric DNA, and to assign the functional importance of such an interaction.
Given all the current knowledge about PCBPs, it is striking that their highly diverse functional roles nearly all depend on the ability to recognize C-rich RNA or DNA sequences with high specificity and affinity. Individual KH domains from PCBPs are known to bind nucleic acids as discrete and independent entities (25). Because PCBPs contain three KH domains, it is of interest to determine which KH domain, or which combination of domains acting in synergy, contributes to nucleic acid-binding specificity and affinity. It is equally important to reveal the detailed interactions that define specificity and affinity, and to understand how the seemingly similar events of binding to single-stranded C-rich motifs give rise to such a broad spectrum of functions.
In an effort to address these fundamental questions about the structures and functions of PCBPs, we have begun a program to study their nucleic acid-protein interactions, with a focus on PCBP2. A large number of nucleic acid targets have been established or proposed for PCBP2. This portfolio covers RNA and DNA sequences, naturally occurring and SELEX-derived motifs, and both unstructured and highly structured RNAs. Moreover, the nucleic acid-binding properties of the three KH domains of PCBP2 are relatively well characterized: both the KH1 and KH3 domains, as independent entities, bind strongly to poly(C) ribohomopolymers (25), but only the KH1 domain was shown to interact specifically with the two C-rich sequences presented within the cloverleaf and IRES structures of the poliovirus 5'UTR. With all these data, PCBP2 is arguably one of the best biochemically characterized proteins, not only in the PCBP subfamily but also in the superfamily of KH domain-containing proteins, for most of which no bona fide nucleic acid targets have been identified.
A detailed structural study of the PCBP2 system should provide valuable insights into the molecular mechanisms of PCBP functions in particular, and of KH domain-mediated processes in general. As part of this continuing research effort, we have determined the solution structure of the first KH domain (KH1) of PCBP2 and characterized its interaction with DNA and RNA targets, including sequences from cellular mRNA, poliovirus RNA, the C-rich strand of human telomeric DNA, and the RNA template of human telomerase.
Binding Protein-2 (PCBP2) was amplified by PCR using appropriate primers and a plasmid containing the gene for full-length PCBP2. The amplified gene was cloned between the NdeI and XhoI sites of the plasmid vector pET-24a, so that the protein was overexpressed with a C-terminal His-tag. The cloned plasmid was transformed into a BL21(DE3) strain of Escherichia coli RNA polymerase and a synthetic DNA template (26,27). For annealing, the RNA solutions were heated to 90 °C and slowly cooled to room temperature. DNA and RNA oligonucleotides were ordered from Integrated DNA Technologies (IDT).
Samples of the KH1-nucleic acid complexes for NMR studies were prepared by titrating solutions of the nucleic acids into solutions of the KH1 domain while monitoring the protein signals in 15N-HSQC spectra. To monitor complex formation between the PCBP2 KH1 domain and the various DNA and RNA molecules, we compared 15N-HSQC or TROSY-15N-HSQC spectra of the 15N-labeled protein before and after nucleic acid titration. Although TROSY-type experiments are usually applied to systems that suffer from line broadening caused by increased molecular weight or chemical exchange, and their use might therefore suggest such problems, the choice of a TROSY experiment for some of the complexes in this study was largely a matter of personal preference rather than necessity. The TROSY experiment is less sensitive but gives superior line shapes because no decoupling is required during data acquisition. Because the KH1 domain and its nucleic acid complexes are generally quite soluble, we were able to work at very high protein concentrations. Regular and TROSY-type 15N-HSQC spectra were used interchangeably without noticeable differences; the only exception was the complex with a 34-nt bulged stem-loop RNA, for which reasonable spectra could be obtained only with TROSY-15N-HSQC.
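The titrations here are read qualitatively from spectral changes, and no dissociation constants are fitted. For readers who want to quantify such a titration, a common model is 1:1 binding with ligand depletion; the sketch below (hypothetical function names, concentrations in any consistent unit) gives the bound fraction and, assuming fast exchange, the expected shift change of a single cross-peak. For tight complexes in slow exchange, free and bound peaks coexist instead, and the bound fraction governs their relative intensities rather than the peak position.

```python
import math


def fraction_bound(p_total: float, l_total: float, kd: float) -> float:
    """Fraction of protein in a 1:1 complex, accounting for ligand depletion.

    Solves Kd = [P][L]/[PL] for [PL], taking the physically meaningful root.
    All concentrations must be in the same unit.
    """
    b = p_total + l_total + kd
    # Guard against tiny negative values from floating-point rounding
    disc = max(b * b - 4.0 * p_total * l_total, 0.0)
    complex_conc = (b - math.sqrt(disc)) / 2.0
    return complex_conc / p_total


def fast_exchange_shift(p_total: float, l_total: float, kd: float, delta_max: float) -> float:
    """Observed shift change when free and bound states are averaged (fast exchange)."""
    return delta_max * fraction_bound(p_total, l_total, kd)


# Hypothetical titration: 1.0 mM protein, Kd = 0.01 mM, full shift change 0.5 ppm
for ratio in (0.25, 0.5, 1.0, 2.0):
    print(ratio, round(fast_exchange_shift(1.0, ratio * 1.0, 0.01, 0.5), 3))
```

When the protein concentration greatly exceeds Kd, as in concentrated NMR samples, the curve rises almost linearly up to a 1:1 ratio and then flattens, so Kd is poorly determined from such data; titrations under these conditions are better suited to mapping a binding site than to measuring affinity.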
Structure Calculations. In addition to NOE-derived distance restraints, residual dipolar coupling restraints, and TALOS/HNHA-derived torsion angle restraints, generic hydrogen bond distance restraints were used for regions of regular secondary structure identified from characteristic NOE patterns and secondary chemical shifts. Structure refinement was carried out by simulated annealing in torsion angle space using established procedures implemented in the program XPLOR-NIH (37). NMR restraint violations and structure quality were analyzed with the programs AQUA and PROCHECK-NMR (38).
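Residual dipolar coupling restraints encode the orientation of each bond vector relative to a molecular alignment tensor. A minimal sketch of the standard expression for an N-H coupling, D = Da[(3cos²θ − 1) + (3/2)R sin²θ cos 2φ], is given below, written for a bond vector expressed in the principal axis frame of the tensor. The function name and the Da and R values are illustrative assumptions; in practice Da and R are estimated from the measured couplings or refined along with the structure. The example shows why vectors nearly parallel and nearly perpendicular to the principal axis give couplings of opposite sign.

```python
import math


def dipolar_coupling(vector, da, rhombicity=0.0):
    """RDC (Hz) for a bond vector given in the alignment tensor's principal frame.

    Uses D = Da * [(3 cos^2(theta) - 1) + 1.5 * R * sin^2(theta) * cos(2 phi)],
    where for a unit vector (x, y, z): cos^2(theta) = z^2 and
    sin^2(theta) * cos(2 phi) = x^2 - y^2.
    """
    x, y, z = vector
    norm = math.sqrt(x * x + y * y + z * z)
    x, y, z = x / norm, y / norm, z / norm
    return da * ((3.0 * z * z - 1.0) + 1.5 * rhombicity * (x * x - y * y))


# Illustrative axially symmetric tensor (Da = 10 Hz, R = 0)
print(dipolar_coupling((0.0, 0.0, 1.0), 10.0))  # along z: 20.0
print(dipolar_coupling((1.0, 0.0, 0.0), 10.0))  # in the x-y plane: -10.0
```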
Structure figures were generated using MidasPlus (39) or Chimera (40) from the Computer Graphics Laboratory, University of California, San Francisco.

RESULTS
Structure of the PCBP2 KH1 Domain. The secondary structure of the PCBP2 KH1 domain was defined by established NMR methods based on chemical shift indexing (CSI), backbone torsion angles derived from the HNHA experiment and TALOS prediction, and characteristic NOE patterns. Figure 1c shows the NMR-derived secondary structure, with a summary of the sequential and medium-range NOEs characteristic of α-helices or β-strands, and CSI data that take into account the chemical shifts of the amide proton, amide nitrogen, and α and β carbons. The secondary structure consists of three α-helices and three β-strands arranged in the order β1-α1-α2-β2-β3-α3. The evolutionarily conserved, invariant GXXG loop is located between α1 and α2; the variable loop lies between β2 and β3. As shown in Figure 1c, the measured one-bond amide H-N RDC values have opposite signs for the α-helices and the β-strands. Because amide N-H bonds point roughly along the helix axis in an α-helix but roughly perpendicular to the strand direction in a β-strand, this sign difference indicates that the helices and strands are oriented roughly parallel to one another.
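The chemical shift index assigns each residue +1, 0, or −1 according to whether its secondary shift (observed minus random-coil value) lies above, within, or below a threshold, and secondary structure is then read from runs of like indices. The sketch below is a simplified, Cα-only version of that idea, not the multi-nucleus consensus used in this work: it assumes the secondary shifts have already been computed against a random-coil table, uses the commonly quoted ±0.7 ppm Cα threshold, and, unlike the published method, requires runs to be strictly uninterrupted. For Cα, helices show runs of at least four +1 values and strands runs of at least three −1 values.

```python
def csi_indices(secondary_shifts, threshold=0.7):
    """Map C-alpha secondary shifts (ppm) to CSI values of +1, 0, or -1."""
    return [1 if d > threshold else -1 if d < -threshold else 0 for d in secondary_shifts]


def runs_of(indices, value, min_length):
    """Return (start, end) pairs of uninterrupted runs of `value` at least `min_length` long."""
    runs, start = [], None
    for i, v in enumerate(list(indices) + [None]):  # None is a sentinel that closes the last run
        if v == value and start is None:
            start = i
        elif v != value and start is not None:
            if i - start >= min_length:
                runs.append((start, i - 1))
            start = None
    return runs


# Synthetic secondary shifts for ten consecutive residues (illustrative only)
shifts = [0.1, 1.2, 1.5, 2.0, 1.1, 0.2, -1.0, -0.9, -1.3, 0.0]
idx = csi_indices(shifts)
print("helix:", runs_of(idx, 1, 4))    # C-alpha: >= 4 consecutive +1
print("strand:", runs_of(idx, -1, 3))  # C-alpha: >= 3 consecutive -1
```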
The three-dimensional structure of the PCBP2 KH1 domain was calculated using residual dipolar coupling (RDC) restraints in addition to distance restraints derived from NOE measurements and torsion angle restraints derived from scalar coupling experiments. Structure calculation statistics are summarized in Table 1. Figure 2a shows the backbone superimposition of the twenty final structures with the lowest energy penalty; Figure 2b is a ribbon diagram of the structure with the lowest overall energy. The three β-strands form an antiparallel β-sheet with a spatial order of β1-β3-β2, and the three α-helices pack against one side of the β-sheet. The two loops, which are less ordered, point upward in the orientation shown and help to define the edges of the nucleic acid binding platform (details in later sections). The three α-helices and three β-strands are all amphipathic, with their hydrophobic residues pointing inward to form a hydrophobic core. The core residues (L14, I16, L18, I47, I49, I59, and L61 from the β-strands; V25, I28, I29, V36, M39, I68, A71, F72, I75, I76, and L79 from the α-helices) are exclusively hydrophobic; most of them are isoleucines, leucines, or valines, the most hydrophobic amino acids by the Kyte-Doolittle hydropathy index (highest positive values). A very high degree of hydrophobicity therefore appears to be required in the core, which also explains the high degree of conservation of these residues among KH domains (Figure 1b). The backbone makes several sharp turns, at the junctions between β1 and α1, α2 and β2, and β3 and α3 (Figures 2a and 2b); each of these turns is mediated by a glycine (G22, G44, and G63, respectively). G44 and G63 are highly conserved among the KH domains of PCBPs.
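The hydrophobicity of the core can be made concrete with the Kyte-Doolittle scale. The sketch below averages the standard Kyte-Doolittle values over the eighteen core residues listed above (numbering as in the main text); the dictionary and helper names are ours, and the calculation is an illustration rather than an analysis reported in this work.

```python
# Kyte-Doolittle hydropathy scale (one-letter amino acid codes)
KYTE_DOOLITTLE = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}

# Core residues of PCBP2 KH1 as listed in the text (one-letter code + residue number)
CORE = [
    "L14", "I16", "L18", "I47", "I49", "I59", "L61",  # from the beta-strands
    "V25", "I28", "I29", "V36", "M39", "I68", "A71",  # from the alpha-helices
    "F72", "I75", "I76", "L79",
]


def mean_hydropathy(residues):
    """Average Kyte-Doolittle value of residues given as labels such as 'I16'."""
    values = [KYTE_DOOLITTLE[r[0]] for r in residues]
    return sum(values) / len(values)


ilv = sum(1 for r in CORE if r[0] in "ILV")
print(f"{len(CORE)} core residues, {ilv} are I/L/V, mean hydropathy {mean_hydropathy(CORE):.2f}")
# prints: 18 core residues, 15 are I/L/V, mean hydropathy 3.92
```

On a scale whose maximum is 4.5 (isoleucine), a core average near 3.9 reflects how strongly the fold favors I, L, and V at these positions.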
G22 is conserved among the Figure 3b from the DNA and RNA complexes show identical or very similar chemical shift changes for most residues, with only a couple of NMR signals showing a relatively larger difference between the DNA and RNA complexes (some such cross-peaks from the RNA complex spectrum are boxed).
Because we had established that a DNA oligonucleotide with the sequence 5'-TAAAAA-3' did not interact with PCBP2 KH1, we hypothesized that the three C residues within the telomeric DNA repeat or the telomerase RNA template provide the determinant for specific interaction with PCBP2 KH1. Indeed, titrating PCBP2 KH1 with the 6-nucleotide DNA 5'-AACCCT-3', a shorter version of the C-rich human telomeric DNA, gave a 15N-HSQC spectrum virtually identical to that obtained with the 10-nucleotide DNA (Figure 3d). The G30 and I49 cross-peaks are not observed in the 10-nucleotide DNA complex, most likely because of different dynamics; all other cross-peaks superimpose on the corresponding cross-peaks of the 10-nucleotide DNA complex spectrum. Based on these data, we conclude that specific recognition of the human telomeric DNA sequence by PCBP2 KH1 is mediated mainly by a motif centered on the three C residues. Although we did not perform a similar experiment with the telomerase RNA template, we expect this conclusion to hold for the RNA as well, given our results for the other RNA oligonucleotides and the similarity between DNA and RNA binding.

Nucleic Acid Binding Does Not Cause Substantial Structural Changes in the PCBP2 KH1 Domain. We have made almost complete backbone assignments for the PCBP2 KH1 domain in complex with the 6-nucleotide (5'-AACCCT-3') human telomeric DNA, and we have assigned enough NOEs to permit a meaningful analysis of the NOE patterns. Based on characteristic short- and medium-range NOEs and chemical shift indexing, the DNA-bound form of the PCBP2 KH1 domain clearly assumes the same secondary structure as the free form; DNA binding neither disrupts existing secondary structure nor induces new secondary structure. Moreover, DNA binding appears to affect only residues in and around the nucleic acid-binding pocket, which is presented on one surface of the protein (vide infra).
Structures of the free and nucleic acid-bound forms of two KH domains, the KH3 domains of hnRNP K (DNA-bound) and Nova2 (RNA-bound), have been solved by NMR and X-ray crystallography (42-45). In both cases, nucleic acid binding did not significantly change the overall structure of the KH domain. Our data so far indicate that the same holds for the PCBP2 KH1 domain interacting with human telomeric DNA.

Mapping the Telomeric DNA Binding Interface of PCBP2 KH1 by Chemical Shift Perturbations. With substantial assignments available for the PCBP2 KH1 domain in both the free and the telomeric DNA-bound forms, we were able to map the DNA binding interface unambiguously by chemical shift perturbations. Chemical shift changes for the backbone amide protons and nitrogens of PCBP2 KH1 are plotted against residue number in the top two panels of Figure 4. Except for T60, all residues with significant chemical shift changes are located in the first two α-helices (α1 and α2), the second β-strand (β2), and the two loops (the GXXG loop and the variable loop). Significant chemical shift changes for the other backbone nuclei (Hα, Cα, and C') are likewise confined to the α1-α2-GXXG loop-β2-variable loop region. These data indicate that the nucleic acid binding site of PCBP2 KH1 is an α/β platform defined by the juxtaposition of two α-helices and one β-strand, with further contributions from the two flanking loops. In Figures 5a and 5b, residues colored red or orange cluster on one surface of the structure and clearly define a narrow, elongated groove for nucleic acid binding. The narrow width of the groove may sterically exclude bulky purine residues, which would explain the preference for pyrimidine-rich sequences; the limited length of the groove may restrict the core recognition sequence to a small number of nucleotides.
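The amide 1H and 15N shift changes are plotted separately in Figure 4. A frequent alternative, shown here only as a hedged sketch, combines them into one weighted value, Δδ = sqrt(ΔδH² + (α·ΔδN)²), and flags residues whose value exceeds the mean plus one standard deviation. Both the weighting factor (α = 0.2 here; values of roughly 0.1 to 0.2 are common) and the cutoff rule are assumptions rather than the criteria used in this study, and the input values and residue labels are synthetic.

```python
import math
import statistics


def combined_shift(d_h, d_n, n_weight=0.2):
    """Weighted amide shift change (ppm); n_weight scales 15N relative to 1H."""
    return math.sqrt(d_h ** 2 + (n_weight * d_n) ** 2)


def perturbed_residues(changes, n_weight=0.2):
    """Return residues whose combined change exceeds mean + 1 SD, and that cutoff.

    `changes` maps a residue label to (delta_H, delta_N) in ppm.
    """
    csp = {res: combined_shift(h, n, n_weight) for res, (h, n) in changes.items()}
    values = list(csp.values())
    cutoff = statistics.mean(values) + statistics.stdev(values)
    return [res for res, v in csp.items() if v > cutoff], cutoff


# Synthetic example: one strongly perturbed residue among weakly perturbed ones
changes = {
    "r1": (0.01, 0.10), "r2": (0.02, 0.05), "r3": (0.30, 1.50),
    "r4": (0.01, 0.20), "r5": (0.03, 0.10), "r6": (0.02, 0.00),
}
hits, cutoff = perturbed_residues(changes)
print(hits, round(cutoff, 3))
```

The flagged residues would then be mapped onto the structure surface, as in the representations of Figures 5a and 5b.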
Properties of the Nucleic Acid Binding Groove of PCBP2 KH1. With the nucleic acid binding groove of PCBP2 KH1 mapped, we can analyze it in more detail to gain insight into the attributes that may dictate its specificity and affinity for nucleic acids. To assist this analysis, we generated another surface representation of the binding groove, with the same viewing angle as in Figure 5a but colored by residue properties instead of chemical shift perturbation. In Figure 6, hydrophobic residues are colored green, positively charged residues blue, negatively charged residues red, and uncharged hydrophilic residues yellow. The floor of the nucleic acid binding groove clearly consists exclusively of hydrophobic residues. Isoleucine and valine, two of the most hydrophobic residues, dominate the floor, with a string of four glycines lying along the groove ridge at the upper left corner of the floor. The absolute conservation of hydrophobicity on the floor strongly suggests that it is functionally or structurally important. Indeed, a single point mutation, Ile to Asn, in KH2 of FMR1 (at the position corresponding to V36 in PCBP2 KH1) results in fragile X mental retardation syndrome; the introduced asparagine makes the floor of the nucleic acid binding groove significantly less hydrophobic. At the ridge on the left-hand side of the groove, all residues but one are positively charged (lysine or arginine). Residues on the right-hand ridge are more diverse in electrostatic character, but all are hydrophilic and carry side-chain functional groups that can act as hydrogen bond donors; this latter attribute is probably more critical in defining nucleic acid binding specificity and affinity.
No aromatic residues are present in the binding groove, suggesting that stacking interactions between protein side chains and nucleic acid bases do not contribute to nucleic acid binding by the PCBP2 KH1 domain.  (25). This domain specifically recognizes a DNA tetrad, 5'-TCCC-3', whose RNA equivalent conforms to the optimal consensus SELEX RNA motif for hnRNP K, UC₃W₂, where W is either A or U (24). Molecular details of the specific recognition of the DNA tetrad by hnRNP K KH3 have recently been revealed by an NMR structure of the complex (45). In the complex, specific recognition of the bases of the three C residues is mediated by an extensive network of hydrogen bonds involving Ile29, Ile36, Ile49, and Arg59 (for example, the backbone amide of Ile49 forms a hydrogen bond with the O2 atom of the third C residue in the triple C sequence); electrostatic interactions with the DNA backbone phosphate groups come from the backbone amide of Gly32 and the side chains of Lys31, Lys37, and Arg40. In the PCBP2 KH1 domain, the corresponding residues at these key positions are Ile29, Val36, Ile49, Arg57, Lys32, Lys31, Lys37, and Arg40, respectively (Figure 1b). In the structure of the free PCBP2 KH1 domain (Figure 5), these residues are positioned roughly where their counterparts lie in the hnRNP K KH3-DNA complex structure, so they could form a similar set of interactions with a triple C sequence; only relatively minor rearrangements of these residues would be required for complex formation.
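The residue correspondence listed above can be written out as a small table to show exactly where the two domains differ in residue type. The sketch below only encodes the pairs given in the text (Figure 1b); the data structure and function names are illustrative.

```python
# Key hnRNP K KH3 residues in the DNA complex and the corresponding PCBP2 KH1
# residues, in the order given in the text
CORRESPONDENCE = [
    ("Ile29", "Ile29"), ("Ile36", "Val36"), ("Ile49", "Ile49"), ("Arg59", "Arg57"),
    ("Gly32", "Lys32"), ("Lys31", "Lys31"), ("Lys37", "Lys37"), ("Arg40", "Arg40"),
]


def residue_type(label):
    """Three-letter residue type, e.g. 'Ile' from 'Ile29'."""
    return label[:3]


def type_changes(pairs):
    """Pairs whose residue type differs between the two domains."""
    return [(k, p) for k, p in pairs if residue_type(k) != residue_type(p)]


print(type_changes(CORRESPONDENCE))
```

Only two positions change type. Ile36 to Val36 is conservative, and Gly32 contacts the phosphate backbone through its backbone amide, so its replacement by Lys32 need not remove that contact.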

DISCUSSION
Although both hnRNP K KH3 and PCBP2 KH1 domains exhibit specificity towards poly(C) sequences, they might differ in their recognition of the identity of the first nucleotide of the core recognition sequence. As indicated by SELEX experiments (24), the first nucleotide of the optimal RNA sequences for hnRNP K was always a U, while it could be U or A for PCBP2.
Consistent with this SELEX study, we show that PCBP2 KH1 can recognize RNA and DNA sequences with an A or U/T preceding the poly(C) patch (Figure 3). Superposition of the 15N-HSQC spectra of PCBP2 KH1 in complex with 5'-AACCCT-3' and 5'-TCCCCA-3' showed that they are virtually identical, indicating that these two DNAs bind in a very similar manner. In the hnRNP K KH3-DNA complex, the first T residue participates exclusively in hydrophobic interactions with the α-methylene groups of Gly26 and Gly30, the α-methylene group of Ser27, and the aromatic ring of Tyr84 (45). In the PCBP2 KH1 domain, the corresponding residues are Gly26, Gly30, Ser27, and Asp82 (Figure 1b), with Asp82 versus Tyr84 as the only difference. This difference might be significant given the very different natures of tyrosine and aspartic acid. We also note that a stretch of residues immediately N-terminal to Gly26 is the least conserved between the two domains (Figure 1b; HGKEV in PCBP2 KH1 versus PKDLA in hnRNP K KH3). This may also produce a difference in local structure, resulting in the broader binding preference of the PCBP2 KH1 domain.
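The SELEX-derived distinction concerns a single position, the nucleotide immediately 5' of the C run. The sketch below extracts that nucleotide and checks it against each protein's stated preference (U for hnRNP K; A or U for PCBP2, with T treated as the DNA equivalent of U). It merely encodes the rule from (24) and the two DNA sequences discussed above; it does not predict binding, and the function names are hypothetical.

```python
import re

# First-position preferences stated for the optimal SELEX sequences (T = DNA equivalent of U)
PREFERRED_5PRIME = {"hnRNP K": {"U", "T"}, "PCBP2": {"A", "U", "T"}}


def base_before_c_run(seq, min_c=3):
    """Nucleotide immediately 5' of the first run of >= min_c C residues, or None."""
    seq = seq.upper()
    match = re.search(f"C{{{min_c},}}", seq)
    if match is None or match.start() == 0:
        return None
    return seq[match.start() - 1]


def fits_rule(seq, protein):
    """True if the nucleotide preceding the C run matches the protein's preference."""
    base = base_before_c_run(seq)
    return base is not None and base in PREFERRED_5PRIME[protein]


for dna in ("AACCCT", "TCCCCA"):
    print(dna, base_before_c_run(dna), fits_rule(dna, "hnRNP K"), fits_rule(dna, "PCBP2"))
```

Both DNAs satisfy the PCBP2 rule, whereas only 5'-TCCCCA-3' satisfies the hnRNP K rule, matching the distinction drawn in the text.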
Prior to this report, studies on the biological functions of PCBP2 focused heavily on its RNA-binding ability, with only one report (to the best of our knowledge) suggesting its possible involvement in mechanisms associated with DNA recognition. In the present study, we have established that the PCBP2 KH1 domain specifically recognizes a few poly(C) DNA sequences, including the C-rich strand of human telomeric repeats. Comparison with binding to the RNA counterpart sequences demonstrates that the PCBP2 KH1 domain uses not only the same nucleic acid binding platform but also very similar interactions to recognize both RNA and DNA targets. This contrasts with some other dual-specificity nucleic acid binding proteins, such as the prototypic Xenopus TFIIIA, which recognizes RNA and DNA targets with different motifs and whose RNA and DNA targets also differ in structure (49). The interactions we established between the PCBP2 KH1 domain and human telomere/telomerase DNA/RNA suggest that proteins of the PCBP family may participate in the regulation of telomere and telomerase activities through specific binding to the C-rich sequences of telomeric DNA and telomerase RNA. Because PCBPs have three KH domains, it is also conceivable that different KH domains of the same molecule bind telomeric DNA and the telomerase RNA template simultaneously. Indeed, should any poly(C)-specific binding activity be required in telomere and telomerase biology, the ubiquitous PCBPs would be prime candidates, because they are the major poly(C)-specific binding proteins in mammalian cells. Adding to the plausibility of PCBP involvement in telomere/telomerase regulation, hnRNPs (a family to which the PCBPs also belong) are known to play roles in telomere biology (50).
hnRNPs A1, C1/C2, and D can interact with the human telomerase holoenzyme; hnRNPs A1, A2-B1, D, and E, as well as hnRNP homologs from other organisms, can associate with the single-stranded telomeric sequence (the G-rich overhang). hnRNPs are integral components of the nuclear matrix, a putative attachment site for telomeres, so hnRNPs and telomeres are in close proximity if not directly associated. hnRNP C1/C2 binds directly and specifically to a six-nucleotide U-rich tract just 5' of the hTR template region. Because hnRNPs are often found in complex with one another, other hnRNPs may also bind to the hTR template region.
Note that the red spot is seen through a deep and narrow hole, which should not be accessible to the nucleic acids from this side of the structure. Hydrophobic residues are colored green; positively and negatively charged residues are colored blue and red, respectively; uncharged hydrophilic residues are colored yellow. Note that the residues on the floor of the binding groove are exclusively hydrophobic.
sample. Note that residue numbering starts from the first residue of the NMR construct as residue 1; therefore, 8 must be added to all residue numbers to agree with the numbering used in the main text, which corresponds to the residue numbering of the full-length PCBP2 protein.
Figure S2: Example of a backbone sequential walk from S27 to C46 (S35 to C54 in the numbering system used in the main text) in a 3D HNCA experiment on a 13C/15N doubly labeled PCBP2 KH1 sample.
Figure S3: Sequence alignments, generated by the program CLUSTAL X, of the two KH domains (KH1 and KH2) of MCG10 (also known as PCBP4b; the name MCG10 is used here to avoid confusion with the major isoform of PCBP4) with the three KH domains of PCBP4 (KH1, KH2, and KH3), the third KH domain of hnRNP K, and PCBP2 KH1. Residue numbering in the "ruler" field is for PCBP2 KH1 as in the main text.
Figure S4: Superimposition of the lowest-energy structures of PCBP2 KH1 (red) and hnRNP K KH3 (blue). The pairwise backbone heavy-atom RMSD between the two structures is ~1.4 Å (excluding residues in the variable loops, which differ in length).
Figure S5: Ramachandran plot of the NMR ensemble of PCBP2 KH1 produced by the quality-checking program PROCHECK-NMR.
Table S1: Chemical shift assignments for PCBP2 KH1. Note that residue numbering starts from the first residue of the NMR construct as residue 1; therefore, 8 must be added to all residue numbers to agree with the numbering used in the main text, which corresponds to the residue numbering of the full-length PCBP2 protein.