High Affinity Interactions of Nucleolin with G-G-paired rDNA

Nucleolin is a very abundant eukaryotic protein that localizes to the nucleolus, where the rDNA undergoes transcription, replication, and recombination and where rRNA processing occurs. The top (non-template) strand of the rDNA is very guanine-rich and has considerable potential to form structures stabilized by G-G pairing. We have assayed binding of endogenous and recombinant nucleolin to synthetic oligonucleotides in which G-rich regions have formed intermolecular G-G pairs to produce either two-stranded G2 DNA or four-stranded G4 DNA. We report that nucleolin binds G-G-paired DNA with very high affinity; the dissociation constant for interaction with G4 DNA is K_D = 1 nM. Two separate domains of nucleolin can interact with G-G-paired DNA: the four RNA binding domains and the C-terminal Arg-Gly-Gly repeats. Both domains bind G4 DNA with high specificity and recognize G4 DNA structure independent of sequence context. The high affinity of the nucleolin/G4 DNA interaction suggests that G-G-paired structures are natural binding targets of nucleolin in the nucleolus. The ability of two independent domains of nucleolin to bind G-G-paired structures suggests that nucleolin can function as an architectural factor in rDNA transcription, replication, or recombination.

Transcription and processing of rRNA occur within a specialized subnuclear compartment, the nucleolus. In cells that are actively transcribing the rDNA, nucleoli appear to be composed of three compartments: the fibrillar center, which contains DNA that is not being transcribed; the dense fibrillar component, where rDNA transcription occurs; and the peripheral granular component, where pre-rRNA processing and pre-ribosome assembly take place (1,2). In proliferating cells, RNA polymerase I (pol I) and other components of the transcription complex localize to the dense fibrillar component, whereas molecules essential for rRNA processing, such as fibrillarin and the small nucleolar RNAs, localize to the peripheral granular component (for review, see Refs. 3-5). The rate at which the rDNA is transcribed in actively dividing cells is remarkable: electron microscopic analysis shows that during active rDNA transcription in metazoan cells, pol I complexes are spaced only 100 base pairs apart (6).
One of the most abundant proteins in the nucleoli of vertebrate cells is the highly conserved protein nucleolin. Mammalian nucleolin is 709 amino acids in length and consists of an unusual combination of sequence and structural motifs (7-14). The N-terminal region of nucleolin contains several long stretches of acidic residues that may function as "acid blobs" in transcriptional activation (15). The central region of nucleolin contains four RNA binding domains (RBDs; also called RNA recognition motifs or RRMs). RBDs are common among proteins that interact with single-stranded nucleic acids (16,17), and the RBDs of nucleolin are believed to mediate its interactions with RNA (18-22). The C terminus of nucleolin contains nine repeats of the tripeptide motif arginine-glycine-glycine (RGG), in which the arginine residues are dimethylated (23,24).
The distribution of nucleolin within the nucleolus is unusual. Whereas proteins like pol I and fibrillarin appear to be restricted to a single compartment of the nucleolus, nucleolin is abundant within both the dense fibrillar component and the granular component (for review, see Ref. 4). The presence of nucleolin in the peripheral granular component is consistent with its participation in rRNA processing and ribosome assembly (19-22). Its abundance within the dense fibrillar component suggests that nucleolin also functions in other processes, such as transcription, replication, or recombination of the rDNA. Nonetheless, conserved and specific interactions of nucleolin with duplex rDNA have not been reported.
The rDNA transcription unit includes the regions that template the mature 18, 5.8, and 28 S rRNAs, together with the external and internal transcribed spacers (Fig. 1A). In all eukaryotes, the entire transcribed region of the rDNA is very rich in guanine (34.2% G in humans), both within the spacers and within the regions that template the mature rRNAs. This G-richness is restricted to a single strand, the non-template strand, and most guanines lie within runs of three or more consecutive Gs (Fig. 1B).
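The two compositional properties described above, overall G content and the clustering of guanines into runs of three or more, can be measured on any sequence with a few lines of code. The sketch below is illustrative only: the function names are hypothetical, the test sequence is an invented toy fragment rather than real rDNA, and the run threshold of three Gs follows the text. Comparing a strand with its reverse complement shows the kind of strand asymmetry described for the rDNA.

```python
import re

# Hypothetical helper names; the toy sequence below is not real rDNA.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand: str) -> str:
    """Return the reverse complement of a DNA strand (A, C, G, T only)."""
    return strand.upper().translate(_COMPLEMENT)[::-1]


def g_composition(strand: str, min_run: int = 3) -> tuple[float, float]:
    """Return (fraction of bases that are G,
    fraction of Gs that lie in runs of at least `min_run` consecutive Gs)."""
    s = strand.upper()
    n_g = s.count("G")
    if n_g == 0:
        return 0.0, 0.0
    # Greedy, left-to-right matching captures each maximal G run that meets the threshold.
    in_runs = sum(len(m.group()) for m in re.finditer("G{%d,}" % min_run, s))
    return n_g / len(s), in_runs / n_g


top = "GGGATGGGGCTGAGGG"          # invented G-rich "top strand" fragment
bottom = reverse_complement(top)  # the complementary strand, read 5'->3'
print(top, g_composition(top))
print(bottom, g_composition(bottom))
```

On the toy fragment, most bases are G and nearly all Gs sit in runs of three or more, while the complementary strand carries a single isolated G.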
Single-stranded DNAs that contain runs of three or more consecutive guanines readily self-associate in vitro to form structures stabilized by G-G pairing (25-31). In these structures, guanines interact via Hoogsteen hydrogen bonding to form planar rings called G quartets (Fig. 2A), and the G quartets stack upon one another to stabilize higher order structures (Fig. 2B). That guanine-guanine interactions can occur readily in solution was first established nearly 40 years ago (32). Although G-G-paired DNA has not been directly observed in vivo, G-G-paired structures form rapidly and spontaneously in vitro and are very stable once formed. Because of its sequence, the G-rich strand of the rDNA has considerable potential to form G-G-paired structures (Fig. 2A). Formation of such structures may be stimulated by the unwinding and localized denaturation that accompany rDNA transcription.
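Sequence-based screens for G-G pairing potential commonly use a regular expression requiring four runs of at least three Gs separated by short loops; the pattern below (loops of 1 to 7 nt) follows that widely used heuristic. It is an assumption introduced here for illustration, not a method used in this study, and it models intramolecular folding, whereas the G2 and G4 DNAs studied here are intermolecular. The function name and test sequence are hypothetical.

```python
import re

# Widely used intramolecular G4 heuristic: four G-runs (>= 3 Gs) separated
# by loops of 1-7 nucleotides. Illustrative assumption, not from this study.
G4_MOTIF = re.compile(r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}")


def find_g4_motifs(strand: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched sequence) for each non-overlapping motif hit."""
    return [(m.start(), m.group()) for m in G4_MOTIF.finditer(strand.upper())]


print(find_g4_motifs("TTGGGAGGGAGGGAGGGTT"))  # invented G-rich test sequence
print(find_g4_motifs("CCCTCCCTCCCTCCC"))      # C-rich complement-like strand: no hits
```

A regex scan only flags candidate sequences; whether a given region actually forms G-G-paired DNA still has to be tested experimentally, as with the dimethyl sulfate footprinting used here.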
These observations led us to investigate the interaction of nucleolin with G-G-paired DNA. Here we report that mammalian nucleolin binds tightly and specifically to both four-stranded G4 DNA and two-stranded G2 DNA. The dissociation constant for binding is K_D = 1 nM, a remarkably high affinity for the interaction of a eukaryotic protein with nucleic acid. Mutational analysis shows that two separable domains of nucleolin can bind G4 DNA, one composed of the four RBDs (RBD-1,2,3,4) and the other composed of the C-terminal Arg-Gly-Gly repeats (RGG9). These results suggest that G-G-paired DNA is a natural binding target of nucleolin within the nucleolus. Nucleolin may therefore be an architectural factor that organizes the G-rich non-template strand of the rDNA during transcription, replication, or recombination.
Protein Purification
Full-length (106-kDa) murine nucleolin was purified from nuclear extract prepared from PD31 pre-B cells, which was first chromatographed on heparin-agarose resin as described (33). Fractions containing nucleolin were identified at this and subsequent steps by immunoblotting with anti-nucleolin antibodies (14). These fractions were dialyzed against Buffer L (10 mM Tris, pH 7.4, and 1 mM EDTA) containing 0.2 M NaCl, 0.1 mM dithiothreitol, and 0.1 mM phenylmethylsulfonyl fluoride, applied to a Hi-Trap Q column (Amersham Pharmacia Biotech), and eluted with a 0.2-1.0 M NaCl gradient in Buffer L. Fractions containing nucleolin were dialyzed against Buffer B (10 mM Hepes, pH 7.8, 1 mM EDTA) containing 0.2 M NaCl, applied to a Hi-Trap SP column (Amersham Pharmacia Biotech), and eluted with a 0.2-1.0 M NaCl linear gradient in Buffer B. Nucleolin-containing fractions were dialyzed against Buffer L containing 0.2 M NaCl, applied to polyguanosine-agarose resin (Sigma), and eluted with Buffer L containing 1.0 M NaCl.
All recombinant proteins were produced by overexpression in E. coli as described previously (14). As the final purification step, fusion proteins that contained RGG9 domains were applied to a Mono S column and eluted with a 0.05-1.0 M NaCl linear gradient in Buffer B. Other proteins were instead fractionated on Mono Q and eluted with a 0.05-1.0 M NaCl linear gradient in Buffer L. All purified fusion proteins migrated as single species on SDS-polyacrylamide gel electrophoresis. Protein concentrations were determined by Bradford microassay (Bio-Rad).
Formation of G-G-paired DNAs was carried out as described by Sen and Gilbert (25,26,34) with minor modifications. Briefly, synthetic oligonucleotides were incubated at 2-3 mg/ml in TE (10 mM Tris-HCl, pH 7.4, 1 mM EDTA) containing 1 M NaCl for G4 DNA formation or 1 M KCl for G2 DNA formation, at 60°C for 48 h. After incubation, samples were diluted 1:5 with 10 mM Tris-HCl, pH 7.4, 1 mM EDTA, 12.5 mM KCl, and 2.5% glycerol, and the DNAs were resolved on an 8% nondenaturing polyacrylamide gel (29:1, acrylamide:bisacrylamide) run in 0.5× TBE (50 mM Tris borate, pH 8.2, 0.5 mM EDTA) containing 10 mM KCl at 4°C and 5-8 V/cm. Bands corresponding to G4 DNA, G2 DNA, and single-stranded DNA were identified by their relative mobility using UV shadowing or autoradiography, and excised. DNAs were eluted from the crushed gel slices by soaking in TE containing 50 mM NaCl and 20 mM KCl at room temperature for 8-12 h, precipitated with ethanol, washed, and stored at −20°C. G4 DNAs were 5′ end-labeled with T4 polynucleotide kinase (New England Biolabs), and G-G pairing was verified by the characteristic protection of guanine N-7 from methylation by dimethyl sulfate (35).
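To put the 2-3 mg/ml oligonucleotide concentration used for G4 formation into molar terms, the mass concentration can be divided by the strand's molecular weight. The sketch below assumes an average of about 330 g/mol per nucleotide, a common rule of thumb rather than the exact mass of the ETS-1 oligonucleotide; the function name is hypothetical. Because four strands assemble into each G4 DNA molecule, the maximum G4 concentration is one quarter of the strand concentration.

```python
# Rough conversion from mg/ml to micromolar for a single-stranded oligonucleotide.
# ASSUMPTION: ~330 g/mol per nucleotide (rule of thumb, not the exact ETS-1 mass).
AVG_NT_MASS_G_PER_MOL = 330.0


def strand_conc_uM(mg_per_ml: float, length_nt: int) -> float:
    """Strand concentration in µM; note that 1 mg/ml equals 1 g/L."""
    mw = length_nt * AVG_NT_MASS_G_PER_MOL  # g/mol
    return mg_per_ml / mw * 1e6             # mol/L -> µmol/L


for mg in (2.0, 3.0):
    strands = strand_conc_uM(mg, 40)   # ETS-1 is a 40-mer
    tetramers = strands / 4            # four strands per G4 DNA molecule
    print(f"{mg} mg/ml: ~{strands:.0f} µM strands, <= ~{tetramers:.0f} µM G4 DNA")
```

Under this assumption, a 40-mer at 2-3 mg/ml is roughly 150-230 µM in strands, which helps explain why such high concentrations favor multistranded G4 assembly.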
DNA Mobility Shift Analysis and Measurements of Binding Affinities
Binding to G4 DNA and G2 DNA was carried out for 30 min at 37°C in 15-µl reactions containing 10 mM Tris, pH 7.4, 100 mM NaCl, 1 mM EDTA, 100 µg/ml bovine serum albumin, and 1 fmol of 32P-labeled DNA. Glycerol was then added to a final concentration of 5% (w/v), and complexes were resolved by electrophoresis on 6% (29:1, acrylamide:bisacrylamide) 0.5× TBE gels at 5 V/cm for 10 h at 4°C. Affinities were estimated by gel mobility shift assays in which binding to a fixed amount of G4 DNA was assayed in the presence of increasing amounts of protein. Protein-DNA complex formation was quantitated by PhosphorImager analysis of the dried gels, and K_D values were calculated by plotting the fraction of DNA bound as a function of protein concentration. Reported K_D values are averages from at least three separate experiments. To verify the very low K_D values for G4 DNA interactions, assays were performed at three DNA concentrations, 330 fM, 3.3 pM, and 33 pM; the apparent K_D was the same at all three concentrations.
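The control at three DNA concentrations works because, when labeled DNA is far below K_D, binding hardly depletes the free protein, so the protein concentration giving half-maximal binding approximates K_D. The sketch below assumes simple 1:1 binding (it ignores the second shifted complex seen with recombinant nucleolin), computes the exact complex concentration from the quadratic binding equation, and fits K_D to a hyperbola by a grid search over log10(K_D). Function names, the protein titration points, and the grid bounds are hypothetical; the three DNA concentrations and the 1-nM K_D are the values quoted in the text.

```python
import math

# Hypothetical 1:1 binding model; all concentrations in nM.


def fraction_bound(p_total: float, d_total: float, kd: float) -> float:
    """Exact fraction of DNA in complex for P + D <-> PD, including depletion.

    [PD] solves [PD]^2 - s[PD] + P*D = 0 with s = P + D + K_D; the physical
    root (s - sqrt(s^2 - 4PD)) / 2 is rewritten as 2PD / (s + sqrt(...))
    to avoid cancellation when D is tiny.
    """
    s = p_total + d_total + kd
    root = math.sqrt(s * s - 4.0 * p_total * d_total)
    complex_conc = 2.0 * p_total * d_total / (s + root)
    return complex_conc / d_total


def fit_kd(proteins, fractions, lo=1e-3, hi=1e3, steps=6001):
    """Grid search over log10(K_D) minimizing squared error to f = P / (P + K_D)."""
    log_lo, log_hi = math.log10(lo), math.log10(hi)
    best_kd, best_err = lo, float("inf")
    for i in range(steps):
        kd = 10 ** (log_lo + (log_hi - log_lo) * i / (steps - 1))
        err = sum((f - p / (p + kd)) ** 2 for p, f in zip(proteins, fractions))
        if err < best_err:
            best_kd, best_err = kd, err
    return best_kd


TRUE_KD = 1.0                                   # nM, as reported for nucleolin
proteins = [0.1, 0.3, 1.0, 3.0, 10.0, 30.0]     # hypothetical titration (nM)
for dna in (0.00033, 0.0033, 0.033):            # 330 fM, 3.3 pM, 33 pM
    fractions = [fraction_bound(p, dna, TRUE_KD) for p in proteins]
    print(f"DNA {dna * 1000:g} pM: apparent K_D ~ {fit_kd(proteins, fractions):.3f} nM")
```

Because the hyperbola assumes that free protein equals total protein, the fitted value drifts upward only slightly (by roughly half the DNA concentration) even at 33 pM DNA, consistent with an apparent K_D that is the same at all three concentrations.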

Binding of G4 DNA by Endogenous Nucleolin
G4 DNA forms spontaneously in solutions of G-rich synthetic oligonucleotides, but to obtain G4 DNA in very high yield, synthetic oligonucleotides were incubated at high concentrations at 60°C for 48 h (32). Under these conditions, over 90% of the starting material typically formed G4 DNA. G4 DNA formation was verified in all cases by dimethyl sulfate footprinting (35). Fig. 2C shows a typical footprint obtained by probing G4 DNA formed from the ETS-1 oligonucleotide. This 40-mer derives from a sequence in the 5′-ETS region of the human rDNA and represents one of many regions in the rDNA that readily form G-G-paired structures in vitro. The structures of the other G-G-paired DNAs used in binding assays were verified similarly (data not shown).
Nucleolin was purified from nuclear extracts of murine PD31 pre-B cells. Silver staining showed the preparation to be homogeneous, and Western blot analysis with anti-nucleolin antibodies confirmed that the 106-kDa polypeptide is nucleolin (Fig. 3A). The ability of nucleolin to bind G4 DNA was assayed by gel mobility shift using G4 DNA formed from the ETS-1 oligonucleotide. Full-length mammalian nucleolin bound this G4 DNA very tightly, with K_D = 1 nM (Fig. 3B). Similar results were obtained with G4 DNA generated from other oligonucleotides (not shown).
Binding of G4 DNA by Recombinant Nucleolin
Full-length nucleolin cannot be expressed in E. coli, but deletion of the N terminus permits good expression of recombinant protein (14). The Nuc-1,2,3,4-RGG9 fusion protein (nucleolin residues 284-709), which carries RBDs 1, 2, 3, and 4 and the RGG9 domain, was assayed for interaction with G4 DNA formed from the ETS-1 oligonucleotide and bound this G4 DNA with K_D = 0.5 nM (Fig. 4A). Binding produced two shifted complexes of distinct mobilities, which probably reflect the association of more than one polypeptide with each G4 DNA substrate, via protein-DNA or protein-protein interactions. Nuc-1,2,3,4-RGG9 bound comparably to G4 DNAs formed from the ETS-1 oligonucleotide and from other oligonucleotides (data not shown). Incubation with nucleolin did not permanently alter G4 DNA structure, because after addition of SDS and proteinase K to the binding reaction, all DNA migrated as free G4 DNA (not shown). Maltose-binding protein (MBP) alone did not bind G4 DNA (K_D >> 40 nM) (Fig. 4B).
Both full-length mammalian nucleolin and recombinant nucleolin (residues 284-709) therefore bind G4 DNA with high affinity. Nucleolin undergoes extensive posttranslational modification, including phosphorylation and arginine dimethylation (36-38). Because recombinant nucleolin expressed in E. coli lacks these modifications, its high affinity binding shows that they are not essential for interaction with G-G-paired DNA.
RGG9 Binds G4 DNA
The 41-amino acid C-terminal region of nucleolin consists of nine repeats of the RGG motif. Nuc-RGG9, which expresses the RGG9 domain as a chimeric MBP fusion protein, bound G4 DNA with K_D = 3.3 nM (Fig. 6A). The RGG9 domain therefore constitutes a second, independent high affinity G4 DNA binding domain. In competition experiments carried out in the presence of unlabeled G4 DNA or single-stranded DNA, G4 DNA competed effectively for binding, whereas the single-stranded oligonucleotide had no effect even at 1000-fold molar excess (Fig. 6B). Additional binding and competition studies showed that recombinant Nuc-RGG9 does not bind duplex or single-stranded DNA (K_D > 1 µM; data not shown) and that deletion of five of the nine RGG repeats (Nuc-RGG4) abolished G4 DNA binding (Fig. 7). The RGG9 domain of nucleolin binds comparably to G4 DNAs formed from other synthetic oligonucleotides and thus appears to recognize G4 DNA structure independent of sequence context.
RBD-3,4 Combines with RGG9 to Produce a High Affinity G4 DNA Binding Domain
Having identified RBD-1,2,3,4 and RGG9 as separable G4 DNA binding domains, we carried out additional deletion analysis to define smaller subdomains capable of high affinity interaction with G4 DNA. Binding assays were performed with eight different deletion mutants, expressed in E. coli as chimeric MBP fusion proteins and purified to homogeneity. The results of these experiments, summarized in Fig. 7, showed that Nuc-3,4-RGG9, which carries RBDs 3 and 4 and the RGG9 domain, bound G4 DNA with high affinity (K_D = 0.5 nM). Binding affinity decreased 4-fold (K_D = 2 nM) when RBDs 1 and 2 were substituted for RBDs 3 and 4 to produce Nuc-1,2-RGG9.
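The fold changes in K_D reported for these constructs can also be restated as binding free-energy differences using ΔΔG = RT ln(K_D,mutant / K_D,reference). This conversion is an interpretive step added for illustration, not one made in the study, and it assumes that each K_D reflects a single binding equilibrium. The reference construct (Nuc-3,4-RGG9), the 37°C binding temperature, and the K_D values are taken from the text; the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3   # gas constant, kcal/(mol*K)
T_K = 310.15           # 37 degrees C, the binding temperature used in these assays

# K_D values (nM) as stated in the text.
KD_NM = {
    "Nuc-1,2,3,4-RGG9": 0.5,
    "Nuc-3,4-RGG9": 0.5,
    "Nuc-1,2-RGG9": 2.0,
    "Nuc-RGG9": 3.3,
}


def fold_and_ddg(kd_mut: float, kd_ref: float) -> tuple[float, float]:
    """Return (fold weaker binding, ddG in kcal/mol); positive ddG means weaker binding."""
    fold = kd_mut / kd_ref
    return fold, R_KCAL * T_K * math.log(fold)


ref = KD_NM["Nuc-3,4-RGG9"]
for name, kd in KD_NM.items():
    fold, ddg = fold_and_ddg(kd, ref)
    print(f"{name:18s} K_D {kd:4.1f} nM  {fold:4.1f}-fold  ddG {ddg:+.2f} kcal/mol")
```

On this reading, the 4-fold loss on swapping RBDs 3 and 4 for RBDs 1 and 2 corresponds to under 1 kcal/mol at 37°C.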
The importance of the RGG9 domain in G4 DNA recognition is reinforced by the observation that whereas the Nuc-3,4-RGG9 chimera bound G4 DNA with relatively high affinity, deletion of RGG9 to produce Nuc-3,4 resulted in a complete loss of binding (K_D > 40 nM). Similarly, Nuc-1,2 was not active in G4 DNA binding (K_D > 40 nM). Finally, G4 DNA binding was completely lost when the RGG9 region was truncated by deletion of the N-terminal five RGG repeats to create MBP-RGG4 (Fig. 7).
DISCUSSION
We have shown that the abundant nucleolar protein nucleolin binds G-G-paired DNA with very high affinity (K_D = 1 nM). Nucleolin binds both four-stranded G4 DNA and two-stranded G2 DNA, and it recognizes G-G-paired structures independent of sequence context. These remarkably high binding affinities suggest that G-G-paired structures are binding targets of nucleolin in vivo. Because nucleolin recognizes G-G-paired structures independent of sequence context, it should be able to bind such structures wherever they form within the G-rich rDNA.
Dynamic Formation of G-G-paired DNA in the Nucleolus
Most nuclear DNA is double-stranded, and complementary base pairing normally prevents duplex DNA from forming G-G-paired structures. However, duplex DNA becomes transiently single-stranded during three critical and dynamic processes: transcription, replication, and recombination. Cells have evolved sophisticated mechanisms to prevent DNA from adopting alternative structures, including a variety of proteins that bind to transiently exposed single-stranded regions. Nonetheless, these mechanisms are not foolproof. For example, there is considerable evidence that triplet repeat expansion results from formation of non-Watson-Crick structures during replication (see Ref. 40 and references therein).
The sequence composition and strand asymmetry of the rDNA give it considerable potential to form G-G-paired structures. The rDNA is G-rich on the top (non-template) strand, not only within the region transcribed into pre-rRNA but also within the spacers (Fig. 1). During active transcription, pol I molecules pack at extremely high density on the rDNA repeats; electron micrographic analysis shows that pol I complexes are spaced only 100 base pairs apart (6). Transcription at this density requires that a considerable fraction of the rDNA duplex be denatured at any given time. We hypothesize that G-G-paired structures form within the G-rich top strand of the rDNA during transcription, or when the duplex is transiently denatured during replication or recombination. G-G-paired structures are very stable once formed (26) and would not be expected to dissociate spontaneously in vivo.
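The claim that dense pol I loading requires substantial denaturation can be made concrete with simple arithmetic. The sketch below takes the 100-bp spacing from the text but assumes, for illustration, a transcription unit of about 13 kb and a transcription bubble of about 13 bp per polymerase; both are round-number assumptions rather than measurements from this study, and the function name is hypothetical.

```python
def pol_loading(unit_bp: int, spacing_bp: int, bubble_bp: int) -> tuple[int, float]:
    """Return (polymerases per transcription unit, fraction of the duplex melted).

    ASSUMPTION: polymerases are evenly spaced and each holds open one bubble of
    `bubble_bp` base pairs; melting driven by supercoiling is ignored.
    """
    n_pol = unit_bp // spacing_bp
    melted_fraction = bubble_bp / spacing_bp
    return n_pol, melted_fraction


# 100-bp spacing from the text; the 13-kb unit and 13-bp bubble are assumed values.
n_pol, melted = pol_loading(unit_bp=13_000, spacing_bp=100, bubble_bp=13)
print(f"~{n_pol} polymerases per unit, ~{melted:.0%} of the duplex unwound at any instant")
```

Even under these conservative assumptions, roughly one base pair in eight along an actively transcribed repeat would be unpaired at any moment, before counting any additional unwinding from transcription-driven supercoiling.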
Other experiments further support the notion that G-G-paired structures are dynamically formed and unwound within the active rDNA. We have recently shown that G-G-paired DNA is the preferred substrate of two eukaryotic helicases: the human BLM helicase, which is deficient in Bloom's syndrome (41), and the Saccharomyces cerevisiae Sgs1p helicase (42). Both of these helicases are members of the highly conserved RecQ helicase family. Moreover, S. cerevisiae Sgs1p localizes predominantly to the nucleolus (43,44), where it could function to maintain the structure of the G-rich rDNA. The human functional homolog of S. cerevisiae Sgs1p appears to be the WRN helicase, which is deficient in Werner's syndrome. Like Sgs1p, WRN is a RecQ family helicase that localizes predominantly to the nucleolus (45,46). The unwinding activity of Sgs1p mapped to its conserved helicase core domain (42), suggesting that preferential activity on G-G-paired substrates may be a general property of helicases in this family. We therefore predict that WRN will also prove to be active on G-G-paired rDNA substrates.
Nucleolin as an Architectural Factor in rDNA Transcription, Replication, or Recombination
Two separable domains within nucleolin can bind G-G-paired structures, one composed of RBDs 1, 2, 3, and 4 and the other composed of the C-terminal RGG9 domain. The presence of two independent G-G-paired DNA binding domains could contribute to the ability of nucleolin to organize G-G-paired regions. Nucleolin may thus be an architectural factor, in effect forming a scaffold for the structured G-rich strand. The long acidic runs in the N terminus of nucleolin are consistent with a function in transcription, but nucleolin is a complex molecule with multiple distinct domains, and it may have multiple functions. We have identified nucleolin as one component of a heterodimeric protein, LR1, that is induced specifically in B cells activated for immunoglobulin heavy chain switch recombination (14,33,47). The rDNA repeats must undergo active recombination to maintain the homogeneity of this gene family, and one function of nucleolin may be to stimulate or regulate recombination of the rDNA.
Nucleolin in the Nucleolus
Nucleolin is abundant in the peripheral granular component of the nucleolus, where rRNA processing occurs, and also in the central dense fibrillar component, where rDNA transcription occurs (for review, see Ref. 4). Reported functions of nucleolin in rRNA processing (21) and ribosome assembly (24) are consistent with its presence in the peripheral granular component. A function in rDNA transcription, replication, and/or recombination is consistent with its localization within the central dense fibrillar component. The N terminus of nucleolin contains long acidic regions, with as many as 38 aspartate and glutamate residues in an uninterrupted stretch, which could function as acid blobs (15) to activate transcription by pol I. The N terminus of nucleolin also contains sites for the mitosis-specific cdc2 kinase (38) and casein kinase II (36,37). Both of these kinases phosphorylate histone H1, and they could similarly regulate nucleolin in response to cell cycle-dependent controls.
Many proteins that contain RBDs and RGG motifs have been identified, but the mutational analysis of nucleolin makes it unlikely that high affinity binding to G-G-paired DNA is a general property of RBD/RGG proteins. Most RBD-containing proteins contain only two or three RBDs, and deletion of two of the RBDs of nucleolin to produce Nuc-1,2, Nuc-2,3, or Nuc-3,4 greatly diminished binding affinity (Fig. 7). Similarly, whereas many proteins contain RGG motifs, nucleolin is unusual in containing nine RGG repeats, and deletion analysis showed that Nuc-RGG4 does not bind G4 DNA.
The broad nucleolar distribution of nucleolin has led to considerable interest in how it is localized within the nucleolus. The two domains of nucleolin that bind G-G-paired DNA (RBD-1,2,3,4 and RGG9) are also essential for nucleolar localization (48-50), whereas the N-terminal acidic region is dispensable. The ability to interact with G-G-paired nucleic acids may therefore be essential for the localization or retention of nucleolin within the nucleolus.