Nucleophosmin C-terminal Leukemia-associated Domain Interacts with G-rich Quadruplex Forming DNA

Nucleophosmin (NPM1) is a nucleocytoplasmic shuttling phosphoprotein, mainly localized at nucleoli, that plays a key role in ribogenesis, centrosome duplication, and response to stress stimuli. Mutations at the C-terminal domain of NPM1 are the most frequent genetic lesion in acute myeloid leukemia and cause the aberrant and stable translocation of the protein in the cytoplasm. The NPM1 C-terminal domain was previously shown to bind nucleic acids. Here we further investigate the DNA binding properties of the NPM1 C-terminal domain both at the protein and nucleic acid levels; we investigate the domain boundaries and identify key residues for high affinity recognition. Furthermore, we demonstrate that the NPM1 C-terminal domain has a preference for G-quadruplex forming DNA regions and induces the formation of G-quadruplex structures in vitro. Finally we show that a specific sequence found at the SOD2 gene promoter, which was previously shown to be a target of NPM1 in vivo, is indeed folded as a G-quadruplex in vitro under physiological conditions. Our data extend considerably present knowledge on the DNA binding properties of NPM1 and suggest a general role in the transcription of genes characterized by the presence of G-quadruplex forming regions at their promoters.

Nucleophosmin (also known as NPM1, B23, or numatrin, and hereafter termed NPM1) is an abundant phosphoprotein that was originally identified as a non-ribosomal nucleolar protein playing a key role in ribosome biogenesis: NPM1 is able to bind RNA as well as DNA, and has intrinsic RNase activity that preferentially cleaves pre-rRNA (1-4). NPM1 is also involved in the regulation of the important tumor suppressors p53 (5) and p14arf (6,7), and therefore mutations at the NPM1 locus or aberrant localization of NPM1 may interfere with p53 or p14arf transcriptional programs and activities. Furthermore, NPM1 plays more than one role outside the nucleolus, including the control of centrosome duplication (8). This function is confirmed by analysis of NPM1 knock-out mice showing unrestricted centrosome duplication, genomic instability, and mid-gestation embryonic lethality (9). The multiple cellular functions of NPM1 inside and outside the nucleolus are due to its chaperone activity and ability to shuttle between the nucleus and cytoplasm (10,11). These properties are in turn dictated by its modular organization. In fact, several distinct, although partially overlapping, functional domains and signatures have been identified in NPM1 (12,13).
NPM1 was first identified in 2005 as the most frequently mutated gene in acute myeloid leukemia, its mutations being observed in about 30% of cases (14) that display distinctive molecular and clinical features (13,15,16). Mutations are all localized at the C-terminal end of the protein (exon 12) and invariably result in the aberrant cytoplasmic localization of NPM1 (17). More than 40 alterations have been detected and were found to be mutually exclusive with major chromosomal abnormalities (13,14). NPM1 mutations, which are typically heterozygous, have similar consequences on the mutated protein. The reading frame is altered by the duplication of short base sequences, leading to a protein that is longer by 4 residues and differs from the wild type in its last 7 residues. As a result, the novel C-terminal sequence in the mutated proteins lacks the wild-type nucleolar localization signal and acquires a newly formed, Crm1-dependent, nuclear export signal (LXXXVXXVXL) (17,18). Furthermore, the protein C-terminal domain appears greatly destabilized and partially or totally unfolded, due to alteration of the protein hydrophobic core (19-21). Together, these two consequences at the protein level determine the aberrant and stable cytoplasmic localization of the protein (17,18).
The NPM1 C-terminal domain is responsible for the nucleic acid binding activity of the protein (Fig. 1A). NPM1 was shown to bind both DNA and RNA oligos with a preference for single-stranded structures over double-stranded B-DNA structures, in a sequence-independent manner (3,22,23). Therefore, a role for NPM1 as a single-stranded binding protein was proposed (22), a property that may be linked to the export of ribosome subunits from the nucleus (17). Recently, however, the protein was also identified as a cofactor in the transcriptional activation of the mitochondrial superoxide dismutase 2 (SOD2) gene; in particular it was shown that full-length NPM1 binds a G-rich region at the SOD2 promoter (24). Because leukemic mutant NPM1 is stably localized in the cytoplasm, we decided to further investigate the DNA binding properties of the protein, assuming that some important function related to nucleic acid binding in the nucleus might be impaired in these mutants.
In this work we use different DNA sequences, protein constructs, and site-directed mutants to investigate the NPM1 C-terminal domain boundaries involved in DNA binding, assess the role of individual residues in binding, and analyze the effect of DNA and stabilizing salts on NPM1 folding. Importantly, we show that NPM1, while able to recognize with low affinity any DNA oligo tested, binds with high affinity, and is able to induce the folding of, G-rich sequences that form three-dimensional structures known as G-quadruplexes. These include the above mentioned G-rich sequence at the SOD2 gene promoter, which we demonstrate here is indeed folded as a G-quadruplex under physiological conditions. Taken together, these data extend our knowledge of the DNA binding properties of NPM1 and suggest that future efforts should be made to identify G-quadruplex forming DNA regions that are recognized by NPM1 in vivo.

EXPERIMENTAL PROCEDURES
Protein Expression and Purification-NPM1-C53 was expressed and purified as previously reported (20).
A gene construct for the expression of NPM1-C70 was obtained through gene synthesis (GeneArt, Regensburg, Germany) and cloned into expression vector pET28a(+) (Novagen, San Diego, CA) between restriction sites NdeI and BamHI. Escherichia coli cells, strain BL21(DE3), transformed with the expression vector, were grown to A_600 = 0.5 in LB medium supplemented with kanamycin at 37°C. At this point 1 mM isopropyl 1-thio-β-D-galactopyranoside was added and cells were further grown at 20°C for 16 h. Cells were harvested, resuspended in lysis buffer (20 mM Hepes, pH 7.0, 150 mM NaCl, 5 mM β-mercaptoethanol, and 20 mM imidazole) and sonicated. After centrifugation, the supernatant was loaded on a nickel-nitrilotriacetic acid column, pre-equilibrated with lysis buffer. Protein was eluted using a linear gradient of lysis buffer plus imidazole (20 mM to 1.0 M). Fractions containing the protein, as judged from SDS-PAGE, were collected, diluted 10-fold with lysis buffer, and incubated with thrombin (0.5 units/mg of protein) for 1.5 h at 4°C. After thrombin cleavage the reaction mixture was supplemented with protease inhibitors (Roche Applied Science) and loaded on a nickel-nitrilotriacetic acid column, pre-equilibrated with lysis buffer, to remove the thrombin-cleaved N-terminal His tag and uncleaved protein. Protein was recovered from the flow-through, diluted 10-fold with 20 mM Hepes, pH 7.0, 5 mM β-mercaptoethanol, and loaded on a SP-Sepharose column pre-equilibrated with the same buffer. Protein was eluted using a linear NaCl gradient; the buffer was then exchanged to remove the salt, and the protein was concentrated up to 40 mg/ml in 20 mM Hepes buffer, pH 7.0, 5 mM β-mercaptoethanol, and stored at −80°C.
Oligonucleotides-Oligonucleotides used in this study (Fig. 1B) were purchased from PRIMM s.r.l (Milan, Italy) and purified by HPLC. Oligonucleotides used for SPR analysis were biotinylated at the 5′-end. Prior to use, lyophilized oligos were resuspended in the appropriate buffer (20 mM Hepes, pH 7.0, with or without 100 mM NaCl or KCl according to the different experiments), quantified spectrophotometrically, and annealed. For annealing, oligos were heated to 95°C for 15 min and slowly cooled down to room temperature overnight.
Surface Plasmon Resonance-The interactions of biotinylated DNA constructs (ligands) with purified proteins NPM1-C70 or NPM1-C53 (analytes), as well as with the porphyrin TmPyP4 (analyte), were all measured using the SPR technique and a Biacore X100 instrument (Biacore, Uppsala, Sweden). Each biotinylated DNA construct was immobilized on a Sensor Chip SA, pre-coated with streptavidin, from Biacore AB. The capturing procedure on the biosensor surface was performed according to the manufacturer's instructions, setting the aim for ligand immobilization to 1000 response units. Running buffer was Hepes-buffered saline-EP, which contains 10 mM Hepes, pH 7.4, 0.15 M NaCl, 3 mM EDTA, 0.005% (v/v) Surfactant P20 (Biacore AB). Analytes were dissolved in running buffer, and binding experiments were performed at 25°C with a flow rate of 30 µl/min. The association phase (k_on) was followed for 180 s, whereas the dissociation phase (k_off) was followed for 300 s. Complete dissociation of the active complex formed was achieved by addition of 10 mM Hepes, 2 M NaCl, 3 mM EDTA, 0.005% (v/v) P20, pH 7.4, for 60 s before the start of each new cycle. Analytes were tested over a wide range of concentrations, reaching at least a 2-fold increase over the lowest concentration tested. When experimental data met quality criteria, kinetic parameters were estimated according to a 1:1 binding model using Biacore X100 Evaluation Software. Otherwise, a steady-state affinity model was applied to fit the data. In this latter case two possible situations were considered: (a) a single binding site, using the equation y = R_max × [analyte]/([analyte] + K_D); and (b) two independent binding sites, using the corresponding two-site equation.
Circular Dichroism-All circular dichroism experiments were performed using a Jasco J710 instrument (Jasco Inc., Easton, MD) equipped with a Peltier apparatus for temperature control.
Static spectra of G10-loop and c-MYC oligos were collected at 25°C, using oligos annealed in the appropriate buffer and concentrated to 20 µM. Spectra were collected using a quartz cell with 1-mm optical path length (Hellma, Plainview, NY) and a scanning speed of 100 nm/min. The reported spectra are the average of five scans. To measure NPM1-C70 induced G-quadruplex formation, oligos dissolved in buffer (without additional salts) but not annealed, and concentrated to 20 µM, were incubated for 1 h with appropriate amounts of NPM1-C70 and spectra were collected as above. To monitor NPM1-C70 induced unfolding, oligos were annealed in Hepes buffer supplemented with 100 mM NaCl and incubated with increasing amounts of NPM1-C70. The spectral contribution of buffers and proteins was subtracted as appropriate. To measure the effect of the c-MYC and G10-loop oligos on NPM1-C70 structure, CD spectra of the protein (20 µM) in 20 mM Hepes, pH 7.0, alone or incubated with pre-annealed oligos (20 µM), were recorded. The spectral contribution of the oligo alone was then recorded and subtracted as appropriate. Thermal denaturation experiments were performed using a quartz cell with 1-mm optical path length and monitoring the variation of the CD signal at 260 nm. Temperature was progressively increased, at 1°C/min, from 25 to 105°C. The Kaleidagraph software was used for CD spectra analysis and representation.
Urea-induced Protein Denaturation-Urea-induced denaturation of NPM1-C70 and NPM1-C53 was followed by monitoring the change in CD signal at 222 nm in the presence of increasing amounts of urea. The same experiments were performed in the absence of NaCl and in the presence of 0.25, 0.5, and 1.0 M NaCl. Data were plotted and analyzed using Kaleidagraph software. A simple two-state model was used in all cases to fit the data according to Equation 1, where ΔG_d is the free energy of folding at a concentration D of denaturant, m_D-N is the slope of the transition (proportional to the increase in the surface-accessible area on going from the native to the denatured state), and D_1/2 is the midpoint of the denaturation transition. An equation that takes into account the pre- and post-transition baselines was used (25).
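The two-state analysis described above can be illustrated with a minimal sketch. It assumes that Equation 1 has the standard linear-extrapolation form ΔG_d = m_D-N × (D_1/2 − D); this form is an assumption based on the symbol definitions given, since the equation itself is not reproduced in the text. The function names, the baseline values, and the m value of 1.0 kcal/mol/M are hypothetical and chosen only for illustration; the two midpoints (2.6 and 3.8 M) are those reported for NPM1-C53 and NPM1-C70 in the absence of salt.

```python
import math

R_KCAL = 1.987e-3  # gas constant, kcal/(mol*K)


def delta_g(d, m, d_half):
    """Unfolding free energy (kcal/mol) at denaturant concentration d (M).

    ASSUMPTION: linear extrapolation, dG = m * (d_half - d).
    """
    return m * (d_half - d)


def fraction_unfolded(d, m, d_half, temp_k=298.15):
    """Two-state fraction of denatured molecules at denaturant concentration d."""
    k_unf = math.exp(-delta_g(d, m, d_half) / (R_KCAL * temp_k))
    return k_unf / (1.0 + k_unf)


def cd_signal(d, m, d_half, native=(-20.0, 0.0), denatured=(-4.0, 0.0),
              temp_k=298.15):
    """Observed signal with linear pre- and post-transition baselines.

    native and denatured are hypothetical (intercept, slope) pairs.
    """
    f_u = fraction_unfolded(d, m, d_half, temp_k)
    y_native = native[0] + native[1] * d
    y_denatured = denatured[0] + denatured[1] * d
    return (1.0 - f_u) * y_native + f_u * y_denatured


if __name__ == "__main__":
    m_value = 1.0  # hypothetical m_D-N, kcal/(mol*M)
    for name, d_half in (("NPM1-C53", 2.6), ("NPM1-C70", 3.8)):
        for urea in (0.0, 2.0, 4.0, 6.0):
            print(name, urea, round(fraction_unfolded(urea, m_value, d_half), 3))
```

In an actual fit, m, D_1/2, and the four baseline parameters would be adjusted together against the measured CD signal at 222 nm; the sketch only evaluates the model for given parameters.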

Surface Plasmon Resonance Analysis of NPM1-DNA Interaction-Full-length NPM1 is able to interact with a G-rich sequence found at the SOD2 promoter and to function as a cofactor in transcriptional activation (24). This region is predicted to form a hairpin with a five-base pair stem and a stretch of 10 consecutive guanines in the loop (hereafter named G10-loop; Fig. 1B). Our first aim was to determine whether this DNA binding activity could be mapped to the C-terminal domain of the protein. To this end, a biotinylated version of the G10-loop was immobilized on a streptavidin chip and used as bait in SPR analysis. Two different constructs of the NPM1 C-terminal domain were used as the analytes: NPM1-C70, comprising the last 70 C-terminal residues, and NPM1-C53, comprising the last 53 residues (Fig. 1A). The first construct was chosen because previous binding data on truncated proteins mapped the NPM1 nucleic acid binding activity to this segment (3), whereas the NPM1-C53 construct was prepared because it is the minimal folding unit of the C-terminal domain (20) that, as established by NMR data, is made of three helices (Fig. 1A) (19).
When testing NPM1-C70 as the analyte we obtained a K_D = 7.2 µM (Fig. 2A, Table 1), whereas with NPM1-C53 we obtained a K_D = 169 µM (Fig. 2, B and C, Table 1). Interestingly, both the association and dissociation rate constants could be determined when NPM1-C70 was the analyte (Fig. 2A), whereas they were too rapid to be determined with NPM1-C53 (Fig. 2B). This result suggests that the nucleic acid binding site is altered in the shorter protein construct and that, in this case, binding is mainly dictated by electrostatic interactions of the positively charged protein domain (Fig. 1A) with the negatively charged G10-loop.
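The two ways of obtaining K_D described under "Experimental Procedures" (from rate constants when these can be measured, or from steady-state responses when they are too fast) can be sketched as follows. This is an illustrative sketch, not the Biacore X100 Evaluation Software procedure. The single-site expression is the one given in the text; the two-site form is an assumption (a sum of two independent single-site terms), as its equation is not reproduced in the text. All function names, the concentration series, and the R_max value are hypothetical.

```python
import math


def single_site(conc, r_max, k_d):
    """Steady-state response: y = R_max * [analyte] / ([analyte] + K_D)."""
    return r_max * conc / (conc + k_d)


def two_site(conc, r_max1, k_d1, r_max2, k_d2):
    """ASSUMPTION: two independent sites modeled as a sum of single-site terms."""
    return single_site(conc, r_max1, k_d1) + single_site(conc, r_max2, k_d2)


def kd_from_rates(k_on, k_off):
    """K_D for a 1:1 interaction from association and dissociation rate constants."""
    return k_off / k_on


def _profile(concs, responses, k_d):
    """Best R_max (closed form) and sum of squared errors for a trial K_D."""
    f = [c / (c + k_d) for c in concs]
    r_max = sum(y * x for y, x in zip(responses, f)) / sum(x * x for x in f)
    sse = sum((y - r_max * x) ** 2 for y, x in zip(responses, f))
    return r_max, sse


def fit_single_site(concs, responses, kd_lo=1e-3, kd_hi=1e4, n_grid=200):
    """Least-squares estimate of (R_max, K_D): log-grid scan, then refinement."""
    grid = [kd_lo * (kd_hi / kd_lo) ** (i / n_grid) for i in range(n_grid + 1)]
    best = min(range(n_grid + 1),
               key=lambda i: _profile(concs, responses, grid[i])[1])
    lo = math.log(grid[max(best - 1, 0)])
    hi = math.log(grid[min(best + 1, n_grid)])
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(80):  # golden-section search on log(K_D)
        a = hi - phi * (hi - lo)
        b = lo + phi * (hi - lo)
        if _profile(concs, responses, math.exp(a))[1] < \
                _profile(concs, responses, math.exp(b))[1]:
            hi = b
        else:
            lo = a
    k_d = math.exp((lo + hi) / 2.0)
    return _profile(concs, responses, k_d)[0], k_d


if __name__ == "__main__":
    concs = [0.5, 1, 2, 5, 10, 20, 50, 100]  # hypothetical series, in µM
    data = [single_site(c, 100.0, 7.2) for c in concs]  # noise-free synthetic data
    print(fit_single_site(concs, data))
```

The fit returns K_D in the same units as the concentrations supplied, so a series expressed in µM yields K_D in µM.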
Next, we wanted to clarify the structural properties of the DNA hairpin necessary for binding NPM1. To this end we first tested a so-called T10-loop, where the 10 guanines at the loop of the G10-hairpin are replaced by 10 thymines, whereas all the other bases are conserved (Fig. 1B). With NPM1-C70 we obtained a K_D = 307 µM that increased to K_D = 1.17 mM with NPM1-C53 (Table 1). This experiment indicates that a sequence that forms a hairpin structure resembling that of the native G10-loop, but with thymines instead of guanines at the loop, is poorly recognized by NPM1-C70, with a 42-fold lower affinity, and that NPM1-C53 binds it even more weakly.
To explore the dimensional requirements of the hairpin for high affinity recognition, we next immobilized a G5-loop, which maintains the same hairpin arrangement as the G10-loop but with only five guanines in the loop (Fig. 1B). With this oligo, we obtained a K_D = 40 µM with NPM1-C70 and 224 µM with NPM1-C53. These experiments indicate that both the presence of guanines at the hairpin loop and their number contribute to the global affinity, suggesting that the three-dimensional structure of the G10-loop plays an important role.
To further assess the DNA binding properties of NPM1, we next tested an oligo made only of T-bases (38-mer), hereafter named poly(T) (Fig. 1B), and obtained a K_D = 120 µM with NPM1-C70 and a K_D = 520 µM with NPM1-C53 (Table 1). This suggests that the increased flexibility of the poly(T) linear oligo with respect to the T10-loop can partly compensate for the absence of guanines.
A further experiment was designed to establish the role played by guanines in a DNA sequence that does not natively form hairpin structures. We thought that if the protein preferentially recognizes with high affinity a hairpin loop made of guanines, it might also be able to induce such loop formation in a poly(G) oligo and recognize it with good affinity (Fig. 1B). Thus with poly(G) we expected to find a K_D for NPM1-C70 higher than that found with the G10-loop but lower than that found with the poly(T) oligo.
To our surprise, with NPM1-C70 we obtained a K_D = 5.8 µM, comparable with that of the physiological substrate G10-loop; with NPM1-C53 we obtained instead a K_D = 31 µM, the lowest measured so far with this protein construct (Table 1). These results suggest that the recognition of a poly(G) oligo is far more specific than we might have anticipated. This may be rationalized by hypothesizing that this oligo is not linear but has the potential to form a structure that resembles that of the G10-loop. Indeed, by inspecting the literature, we realized that a poly(G) oligo, if long enough, such as our 38-mer (Fig. 1B), has the potential to form three-dimensional structures known as G-quadruplexes (26,27).
NPM1 C-terminal Domain Binds a Sequence Known to Form a G-quadruplex-G-quadruplexes are formed by sequences displaying at least four stretches of at least three guanines, with no sequence requirements for the intervening loops, which are usually one to seven nucleotides long (G3 N1-7 G3 N1-7 G3 N1-7 G3) (26,28). The guanines interact with each other in an arrangement different from the classical B-DNA pairing, forming planar tetrads stabilized by the so-called Hoogsteen type H-bonds (27,28). These structures are greatly stabilized by Na+ or K+ ions that intercalate in the rings formed by the four guanines in the tetrad. Repetitive G-rich sequence stretches are highly represented in the human genome and are clustered at gene promoters, suggesting their functional importance (27,28).
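The folding rule stated above (four runs of at least three guanines separated by loops of one to seven nucleotides) can be written as a regular expression. The following is a minimal sketch of that rule only; it is not the Quadparser implementation, and the function name and test sequences are hypothetical.

```python
import re

# Four G-runs of >= 3 guanines separated by three loops of 1-7 nucleotides.
# Loops may contain any base, including guanine.
G4_MOTIF = re.compile(
    r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}"
)


def has_g4_motif(sequence):
    """Return True if the DNA sequence contains a putative G-quadruplex motif."""
    cleaned = sequence.upper().replace(" ", "")
    return G4_MOTIF.search(cleaned) is not None


if __name__ == "__main__":
    for name, seq in (
        ("minimal motif", "GGGTGGGTGGGTGGG"),
        ("poly(T) 38-mer", "T" * 38),
        ("poly(G) 38-mer", "G" * 38),
    ):
        print(name, has_g4_motif(seq))
```

Because loops may be made of any nucleotide, a sufficiently long poly(G) oligo satisfies the rule, with some guanines formally acting as loop residues. A match indicates only sequence-level potential; whether a quadruplex actually forms must be established experimentally.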
We decided to investigate whether one well characterized G-quadruplex might interact with our NPM1 constructs. We focused our attention on a sequence contained in the NHE III1 (nuclease hypersensitive element III1) region of the c-MYC promoter that is known to regulate up to 90% of total c-MYC expression. This is a well characterized example of a parallel G-quadruplex forming region, both in vitro and in vivo (29). An oligo representing the G-quadruplex forming region of this promoter (hereafter c-MYC oligo) was therefore immobilized on a chip for SPR analysis (Fig. 1B).
FIGURE 1. A, the region differentiating NPM1-C70 from NPM1-C53 (whose three-dimensional structure is shown in the inset) is underlined. Mutated lysines are shown in bold. The PSIPRED prediction is also shown (C = coil, H = α-helix, E = β-strand). B, oligos used in this study are shown both in sequence and in putative structure.

Interestingly, we found that NPM1-C70 binds the c-MYC quadruplex with a K_D = 1.9 µM (Fig. 3A and Table 1), confirming that NPM1-C70 has high affinity for a sequence that adopts a G-quadruplex structure. Similarly to the G10-loop, and contrary to the other oligos, both the association and dissociation rates could be determined, suggesting that c-MYC G-quadruplex recognition is specific. Moreover, the experiment performed using NPM1-C53 as the analyte led to a K_D = 82 µM, confirming reduced affinity with this shorter domain (Fig. 3, B and C, and Table 1).
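The affinity differences discussed in this section can be put on a common scale by expressing each K_D as a fold change and as a binding free energy difference, ΔΔG = RT ln(K_D/K_D,ref). The sketch below uses only the K_D values quoted in the text; the conversion to free energy at 25°C (the temperature of the SPR experiments) and all names in the code are illustrative additions, not part of the original analysis.

```python
import math

R_KCAL = 1.987e-3   # gas constant, kcal/(mol*K)
TEMP_K = 298.15     # 25 °C

# K_D values in µM as quoted in the text: (NPM1-C70, NPM1-C53)
KD_UM = {
    "G10-loop": (7.2, 169.0),
    "T10-loop": (307.0, 1170.0),
    "G5-loop": (40.0, 224.0),
    "poly(T)": (120.0, 520.0),
    "poly(G)": (5.8, 31.0),
    "c-MYC": (1.9, 82.0),
}


def fold_change(k_d, k_d_ref):
    """How many times weaker (values > 1) binding is relative to the reference."""
    return k_d / k_d_ref


def ddg_binding(k_d, k_d_ref, temp_k=TEMP_K):
    """Binding free energy difference (kcal/mol) relative to the reference."""
    return R_KCAL * temp_k * math.log(k_d / k_d_ref)


if __name__ == "__main__":
    ref_c70 = KD_UM["G10-loop"][0]
    for oligo, (kd_c70, kd_c53) in KD_UM.items():
        print(
            oligo,
            "vs G10-loop (C70): %.1f-fold, %.2f kcal/mol"
            % (fold_change(kd_c70, ref_c70), ddg_binding(kd_c70, ref_c70)),
            "| C53 vs C70: %.1f-fold" % fold_change(kd_c53, kd_c70),
        )
```

On this scale the quadruplex-forming oligos (G10-loop, poly(G), c-MYC) cluster within a few fold of each other for NPM1-C70, whereas the T10-loop lies roughly 2 kcal/mol away.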
The G10-loop Forms a G-quadruplex Structure in Vitro-The results reported so far suggest that the NPM1 C-terminal domain recognizes with particularly high affinity sequences known to form G-quadruplex three-dimensional arrangements. In this light, it is interesting to note that the G10-loop sequence, which is predicted to form a hairpin according to conventional Watson-Crick pairing (Fig. 1B), also matches the above mentioned folding rule for G-quadruplexes and, accordingly, is predicted to form a G-quadruplex by the Quadparser algorithm (26). Therefore, when annealing, this oligo has the potential to form at least two alternative structures. To infer which of these two structures is the most likely to be populated, we first collected the CD spectra of G10-loop and c-MYC, for comparison. It is well known that circular dichroism is diagnostic of G-quadruplex formation (28). In particular, by comparing the spectra of the c-MYC oligo annealed in the absence or presence of 100 mM NaCl or 100 mM KCl, we observe a red-shift and increase in intensity of the peak at around 260 nm and the formation of a trough at 240 nm (Fig. 4A). Importantly, the same features are observed in the case of the G10-loop spectra (Fig. 4B). These variations are both considered hallmarks of parallel G-quadruplex formation (28,30). A second indication of a G-quadruplex structure for the G10-loop is derived from denaturation experiments. In Fig. 4C the thermal melting profiles of the G10-loop in the 25-105°C interval are shown, whereas their corresponding static spectra are reported in Fig. 4D. These data indicate that the melting transition is still not complete at 105°C for the G10-loop in the presence of 100 mM KCl or NaCl, whereas a poorly cooperative transition with a T_m centered at around 65-70°C is observed when the same experiment is performed in the absence of monovalent cations.
The predicted melting temperature for a G10-loop adopting a hairpin structure is T_m = 69.1°C in 100 mM monovalent cations, according to the mfold server. Conversely, the Quadpredict algorithm predicts higher T_m values, i.e. 94.9 and 77.1°C, in the presence of 100 mM KCl or NaCl, respectively. These higher values are in better agreement with our experimental data, suggesting a G-quadruplex structure in these conditions. Thermal melting profiles were also collected for the c-MYC oligo (Fig. 4E), as a control, and found to be similar to those obtained with the G10-loop oligo (Fig. 4C). Finally, we analyzed the binding of the porphyrin TmPyP4 to our oligos by means of SPR. This molecule is known to bind G-quadruplex structures with high affinity and a complex stoichiometry involving at least two binding sites with different affinities (31-33). The interaction of TmPyP4 with the poly(T) oligo (Fig. 5A) produces a series of sensorgrams that, in the concentration range explored, always reach equilibrium before the end of the contact time between analyte and ligand, allowing both kinetic analysis with a simple 1:1 interaction model (Fig. 5A) and the construction of a Scatchard plot (Fig. 5B). The average value for this double determination is K_D = 345 ± 15 nM and is compatible with the presence of a single binding site (Table 2). When analyzing the T10-loop data, we again observed a single binding site, with a lower dissociation constant (K_D = 35 ± 15 nM) (Fig. 5, C and D, and Table 2). Conversely, when analyzing TmPyP4 binding to the c-MYC oligo (Fig. 5, E and F), a more complex behavior was observed, depending on the TmPyP4 concentration. At low concentrations (bottom curves of Fig. 5E) the dissociation phase is characterized by a single process corresponding to the off-rate constant of a high affinity binding site. As the TmPyP4 concentration increases, the dissociation phase becomes clearly biphasic (upper curves of Fig. 5E), reflecting the titration of a second binding site with lower affinity. Moreover, not all the traces reach equilibrium, so that the corresponding Scatchard plot is determined for a subset of the porphyrin concentrations studied. As a result, the equilibrium analysis performed via the Scatchard plot is not compatible with a single binding site (see the dashed line in Fig. 5F), whereas excellent agreement is obtained assuming two independent binding sites (solid line in Fig. 5F; see Table 2 for the corresponding K_D values). Importantly, the experiments performed with the G10-loop indicated that TmPyP4 binding to this oligo follows the same behavior as observed with c-MYC, with the same stoichiometry and similar dissociation constants for the high and low affinity sites, respectively (Fig. 5, G and H, and Table 2). In conclusion, TmPyP4 binding data obtained with the G10-loop are in agreement with those obtained with c-MYC and with previous work on G-quadruplex forming oligos (31-33), whereas data obtained with the hairpin T10-loop are not. Taken together, our results suggest that, in the presence of physiological amounts of monovalent cations, the G10-loop folds as a G-quadruplex in vitro and might also have this structure when recognized in vivo by NPM1 (24).
NOVEMBER 26, 2010 • VOLUME 285 • NUMBER 48

NPM1-C70 Stimulates the Formation of G-quadruplex Structures-Having established that NPM1-C70 binds preformed G-quadruplex structures with high affinity, we next investigated whether it is able to induce G-quadruplex formation in vitro. In Fig. 6A we report CD spectra of the non-annealed c-MYC oligo titrated with NPM1-C70, in the absence of monovalent ions. By progressively increasing the amount of NPM1-C70 we observe, once again, a red-shift and increase in signal of the 260 nm peak and the progressive formation of a trough at 240 nm, which indicates that a G-quadruplex is formed upon protein binding. The same effect, albeit less evident, is obtained when using the G10-loop oligo (Fig. 6B). In a mirror experiment, using pre-annealed c-MYC or G10-loop oligos (Fig. 6, C and D, respectively), we checked whether NPM1-C70 might destabilize pre-formed G-quadruplex structures. In both cases increasing amounts of protein had no effect on the CD signal of oligos pre-annealed in the presence of monovalent cations. Thus NPM1-C70 is able to induce G-quadruplex formation in unstructured oligos, whereas it does not unwind pre-structured oligos.
Identification of Key Residues in the NPM1-quadruplex Interaction-Data reported so far indicate that the NPM1 region that distinguishes the longer NPM1-C70 construct from the shorter NPM1-C53 construct (aa 225-241) is necessary for high affinity binding. This region contains five lysine residues that we hypothesized might play a role in the specific recognition displayed by NPM1-C70 but not by NPM1-C53 (Fig. 1A). To test this hypothesis we mutated each of them into alanine. The K229A/K230A double mutant was also prepared. With these variant proteins we performed a complete set of SPR experiments to measure the binding affinities for the G10-loop and c-MYC oligos. Interestingly, none of the single mutations significantly affected the interaction with either oligo (Table 3). However, when testing the double mutant K229A/K230A, we observed a marked loss of affinity and obtained a K_D = 135 µM for the G10-loop and K_D = 78 µM for the c-MYC oligo. These values are very close to those obtained when testing the two oligos with NPM1-C53 (compare Table 3 with Table 1) and suggest that residues Lys 229 and Lys 230 cooperate in the specific and high affinity recognition of both oligos by NPM1-C70. In addition, they suggest that the mode of recognition of the c-MYC and G10-loop oligos is likely to be similar, pointing again to a G-quadruplex structure for the G10-loop oligo.
Folding Studies-Previous structural studies on the NPM1 C-terminal domain (19) as well as our own folding studies (20,21) were performed on the NPM1-C53 construct rather than on NPM1-C70 (Fig. 1A). The reason is that the segment comprising the residues that distinguish NPM1-C70 from NPM1-C53 is predicted to belong to the large natively unstructured domain that separates the chaperone N-terminal domain of the protein from the C-terminal domain (12). In Fig. 1A we report the prediction made using the PSIPRED algorithm, which shows a high consensus coil prediction for this region (sequence underlined). The same predictions are obtained when using all the algorithms available in the Disprot server (not shown). Because we showed that this region is necessary for high affinity binding, we wondered whether the folding predictions were correct and, if so, whether this region folds upon contact with DNA. A first indication comes from CD analysis of the protein in the absence or presence of c-MYC (Fig. 7A) or G10-loop (Fig. 7B) oligos. In both cases, CD spectra are almost completely superimposable, suggesting that the secondary structure content is invariant, with no extra folding induced upon DNA binding.
To determine whether this segment is actually folded or unfolded, we performed comparative urea denaturation experiments on NPM1-C70 and NPM1-C53 in the presence of increasing amounts of NaCl (Fig. 7, C and D, respectively). Two critical values are extrapolated by fitting the denaturation curves according to a simple two-state model, i.e. the midpoint of denaturation Urea_1/2 and the m_D-N value. The latter is the slope of the transition and gives an estimate of the change in solvent accessible surface when moving from the denatured to the native state (therefore the higher this value the more compact the native state). The product of these two values represents protein stability (ΔG). In Table 4 it is shown that, in the absence of salt, the ΔG value for NPM1-C70 is considerably higher than the ΔG for NPM1-C53, by roughly 1.5 kcal/mol. In fact, whereas the m_D-N values are similar, the Urea_1/2 value for NPM1-C70 is 3.8 versus 2.6 M for NPM1-C53. If the region differentiating the two constructs were natively unstructured and free to move in the solvent in NPM1-C70, one would expect similar Urea_1/2 and ΔG values. On the contrary, our data suggest that this segment forms a compact structure with the remaining protein segment and thus contributes to overall stability. To further corroborate this finding we performed the same experiments in the presence of increasing salt amounts (Table 3). It is well known that NaCl is highly effective in stabilizing protein structure and promoting compactness (20,21). Therefore we reasoned that, if the segment were natively unstructured in the absence of salt, it should progressively acquire compactness when increasing the salt concentration and this should be reflected in increased m_D-N values. Our data indicate that this does not occur, with m_D-N values remaining roughly the same, within the experimental error, when increasing NaCl concentration up to 1.0 M.
Taken together these experiments indicate that, contrary to predictions, NPM1-C70 adopts a compact structure, without an unstructured N-terminal arm, which constitutes the functional DNA binding domain of the protein.
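The stability comparison above can be followed with a short worked calculation. It uses only the figures quoted in the text (midpoints of 3.8 and 2.6 M and a stability difference of roughly 1.5 kcal/mol) and assumes, as stated, that the two constructs have similar m_D-N values; the implied m_D-N below is therefore a back-calculated estimate for illustration, not a value taken from Table 4.

```latex
\begin{align*}
\Delta G &= m_{D\text{-}N} \times [\mathrm{urea}]_{1/2} \\
\Delta\Delta G &= \Delta G_{\mathrm{C70}} - \Delta G_{\mathrm{C53}}
  \approx m_{D\text{-}N}\,(3.8 - 2.6)\ \mathrm{M}
  = 1.2\ \mathrm{M} \times m_{D\text{-}N} \\
\Delta\Delta G \approx 1.5\ \mathrm{kcal\,mol^{-1}}
  &\;\Rightarrow\;
  m_{D\text{-}N} \approx \frac{1.5}{1.2}
  \approx 1.25\ \mathrm{kcal\,mol^{-1}\,M^{-1}}
\end{align*}
```

Under the equal-slope assumption, the entire stability gain of NPM1-C70 is thus carried by the 1.2 M shift in the denaturation midpoint.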

DISCUSSION
In this work, we have investigated the DNA binding properties of NPM1, building on the foundations already established by others (3,22-24). Our analysis suggests that, although NPM1-C53 is able to recognize any oligo tested with low affinity, high affinity recognition is achieved only with NPM1-C70. Moreover, rather than being just a single-stranded binding protein as initially suggested (3,22), NPM1 preferentially binds oligos with a defined three-dimensional structure. This is the case of the G10-loop, a sequence found at the promoter of the SOD2 gene, which binds NPM1 and was predicted to form a hairpin based on conventional Watson-Crick pairing (24). However, a hairpin structure is not sufficient to confer high affinity recognition, as shown by the results obtained with the T10-loop. The presence and number (see the G5-loop) of guanines in the loop, and therefore the three-dimensional structure adopted by this particular sequence, is indeed critical (see Table 1).
Further investigations indicate that the expected hairpin might not be the structure adopted by the G10-loop in vitro and possibly also in vivo. First, we showed that NPM1-C70 is able to recognize with high affinity G-quadruplex forming oligos, like the poly(G) oligo and, above all, the G-rich sequence found at the NHE III1 region of the c-MYC promoter. Then we presented several lines of evidence that the G10-loop, whose sequence obeys the G-quadruplex folding rule (26), is indeed folded as a parallel G-quadruplex, at least in vitro. In fact, we showed by CD spectroscopy that the G10-loop presents all the hallmarks of parallel G-quadruplex formation in the presence of physiological concentrations of K+ and Na+ ions. Moreover we found that the thermal unfolding of this oligo in the presence of monovalent cations is still not complete at temperatures as high as 105°C, reflecting the enhanced stability conferred to G-quadruplex structures by K+ and Na+. Finally we showed that the G10-loop, contrary to the T10-loop, is capable of binding the porphyrin TmPyP4 with the same stoichiometry and dissociation constants comparable with those of the c-MYC oligo, thus behaving as expected for a G-quadruplex forming oligo.
G-quadruplex forming regions are frequent in the human genome and, interestingly, they are particularly abundant at gene promoter regions, with roughly 47% of all gene promoters having putative G-quadruplex forming regions (26,28). A large number of these regions have been investigated and shown to form fully folded quadruplexes at least in vitro (34). In particular, it is interesting to note that these regions are frequently associated with oncogene promoters, including those of the c-Myc, k-ras, bcl2, and c-kit genes, whereas a markedly reduced frequency was observed at tumor suppressor promoters (34). Moreover telomeres, which are made of the repetitive sequence TTAGGG, were also shown to form G-quadruplexes both in vitro and in vivo, these structures being detrimental to telomerase function (35). These findings suggested that drugs capable of stabilizing G-quadruplexes might have therapeutic value in cancer treatment through either telomerase inhibition or oncogene suppression (27). As a proof of concept, it was shown that the NHE III1 region at the c-MYC promoter suppresses c-MYC gene transcription when adopting a G-quadruplex structure and that the porphyrin TmPyP4, which stabilizes the G-quadruplex structure, is able to down-regulate this gene in vivo and to inhibit tumor growth in xenograft tumor models (36). The role of G-quadruplexes in gene regulation may, however, differ in other cases and be associated with gene activation, possibly through protein binding (28). This might be the case of the SOD2 promoter, because we showed a G-quadruplex arrangement for the G10-loop region.
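Putative G-quadruplex forming regions of the kind counted in genome-wide surveys are usually identified from sequence alone. The sketch below assumes the commonly used motif of four runs of at least three guanines separated by loops of one to seven nucleotides; this is our assumption about the form of the folding rule cited above, not a detail stated in this paper. The function name is hypothetical, and the test sequence is simply four copies of the telomeric repeat TTAGGG mentioned in the text.

```python
import re

# Assumed motif: G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+ (four G-runs, three loops).
G4_MOTIF = re.compile(r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}")


def find_putative_g4(sequence):
    """Return (start, end, matched_sequence) for each non-overlapping motif hit.

    Only the given strand is scanned; a full survey would also scan the
    reverse complement (equivalently, C-rich runs on this strand).
    """
    seq = sequence.upper()
    return [(m.start(), m.end(), m.group()) for m in G4_MOTIF.finditer(seq)]


telomeric = "TTAGGG" * 4   # four telomeric repeats
control = "T" * 24         # no guanines, so no hit is possible

print(find_putative_g4(telomeric))
print(find_putative_g4(control))
```

A sequence match of this kind only flags a candidate region; whether it actually folds as a quadruplex has to be established experimentally, as done here for the G10-loop by CD spectroscopy, thermal unfolding, and TmPyP4 binding.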
Given their importance in processes such as gene transcription and regulation and telomere elongation control, it is not surprising that a considerable number of proteins have been reported to interact with G-quadruplexes, most of them associated with telomere function (28). These include proteins that only bind G-quadruplexes, proteins that promote G-quadruplex formation, helicases that unwind G-quadruplexes, proteins that destabilize G-quadruplexes, and nucleases that are specific for G-quadruplexes (37). We have shown here that NPM1-C70, besides binding G-quadruplex forming regions, is also capable of promoting G-quadruplex formation in vitro, a property that may be relevant for NPM1 function in the cell.
Having established that the G10-loop folds as a G-quadruplex, and because this particular sequence is recognized by NPM1 in vivo and contributes to SOD2 gene activation (24), it is tempting to speculate that there might be other G-quadruplex forming regions in the genome that are recognized and stabilized by NPM1.
This will be an important subject for future investigations. In fact, we know that mutations of NPM1 associated with acute myeloid leukemia map to the C-terminal G-quadruplex binding domain, destabilize the protein, and drive the aberrant translocation of both wild-type and mutant protein to the cytosol (17). Therefore, a significant amount of the NPM1 DNA binding activity that normally occurs in the nucleus is likely to be lost in acute myeloid leukemia blasts, and this may contribute to driving transformation through the loss of as yet uncharacterized gene regulation activities.
Our studies also suggest that further work is needed to understand the structure and function of the NPM1 C-terminal domain. In fact whereas we showed that NPM1-C70 is the functional domain for G-quadruplex binding, only the structure of NPM1-C53 is available to date (19). This observation led us to focus our attention on the sequence differentiating these two constructs (see Fig. 1A). We mutated all five lysines contained in this sequence and found that single alanine substitutions cannot explain the marked loss of function of NPM1-C53 with respect to NPM1-C70. However, when mutating both Lys 229 and Lys 230 to alanine, we obtained dissociation constants comparable with those of NPM1-C53 when testing both the G10-loop and c-MYC oligos, suggesting that these two residues cooperate in recognizing some presently unknown structural feature within the G-quadruplex arrangement.
What then is the structure of NPM1-C70 and how are these two lysines positioned with respect to each other? We have previously underlined that all disorder prediction algorithms indicate that this protein construct resembles the structure of NPM1-C53, with an additional natively unfolded tail at the N terminus. However, we have shown here that (i) NPM1-C70 does not acquire additional secondary structure upon DNA binding, (ii) its stability is considerably higher than that of NPM1-C53, and (iii) its structural compactness is not increased by the presence of salts. These findings contrast with the predictions and suggest instead that NPM1-C70 is folded as a compact domain, with the region between amino acids 225 and 241 contributing to shape a G-quadruplex-binding site whose structure is presently unknown.

TABLE 3 Effect of single alanine mutations on NPM1-C70 binding to G10-loop and c-MYC oligos
Therefore structural studies of NPM1-C70 and its complexes with relevant G-quadruplex forming oligos are much needed, also because there is a paucity of structural information concerning the mechanism of G-quadruplex recognition by their interacting proteins (27). Moreover the C-terminal domain of NPM1 is considered a possible drug target for the treatment of acute myeloid leukemia with cytoplasmic NPM1 (13,21). In principle, any molecule that targets this domain could stabilize a native-like state even in the mutated protein and prevent its association with the nuclear export system or translocation. In this light, the structural analysis of NPM1 binding to nucleic acids would help in the rational design of such molecules.

FIGURE 7 (legend fragment). ... Table 4. D, urea-induced denaturation of NPM1-C53 under the same conditions as in panel C. Symbols are also the same.