G4 DNA Binding by LR1 and Its Subunits, Nucleolin and hnRNP D: A Role for G-G Pairing in Immunoglobulin Switch Recombination

The immunoglobulin heavy chain switch regions contain multiple runs of guanines on the top (nontemplate) DNA strand. Here we show that LR1, a B cell-specific, duplex DNA binding factor, binds tightly and specifically to synthetic oligonucleotides containing G-G base pairs (K_D ≤ 0.25 nM). LR1 also binds to single-stranded G-rich sequences (K_D ≈ 10 nM). The two subunits of LR1, nucleolin and hnRNP D, bind with high affinity to G4 DNA (K_D = 0.4 and 0.5 nM, respectively). LR1 therefore contains two independent G4 DNA binding domains. We propose that LR1 binds to G-G-paired structures that form during the transcription of the S regions that is prerequisite to recombination in vivo. Interactions of donor and acceptor S regions with subunits of LR1 could then juxtapose the switch regions for recombination.

Immunoglobulin class switch recombination is a regulated recombination event that joins a rearranged and expressed heavy chain variable region (VDJ) to a new downstream constant (C) region, deleting the intervening DNA. Each immunoglobulin class removes antigen from the body in a distinct way, so switch recombination alters the pathway of antigen clearance without affecting antigen specificity. Switch recombination is region-specific: junction sites are found throughout the upstream (donor) and downstream (acceptor) S regions. Comparison of switch junction sequences shows that switch recombination does not depend on either sequence-specific or homologous recombination mechanisms (1). Circular molecules containing the deleted C region and flanking sequences are produced during switching (2), suggesting that switching involves a synapsis event in which distant switch regions are brought together into a recombination complex, which undergoes cleavage and religation to produce an excised switch circle and a chromosomal switch junction (see Fig. 1).
Switch recombination depends on G-rich DNA regions called switch or S regions. The S regions are 2-10 kilobases in length and located in the intron upstream of each C region that undergoes switching: Cμ, Cγ, Cε, and Cα (Fig. 1). The organization of the S regions in the heavy chain locus reflects their functional importance in switch recombination: there is an S region upstream of each C region except Cδ, and use of the Cδ region is governed by RNA processing, not DNA recombination. Simultaneous transcription of both activated switch regions is essential for subsequent recombination (reviewed in Ref. 3). During S-region transcription, the G-rich strand is the "top" or nontemplate strand.
G-rich DNA has unusual properties because of the unique pairing potential of guanine. Guanine can interact with cytosine in standard Watson-Crick G-C base pairs, and guanine can also interact with guanine in structures stabilized by G-G Hoogsteen bonding (Fig. 2). One stable structure formed by G-G bonding is the G quartet, in which four guanines associate in a planar ring in which each G interacts with two other Gs (Fig. 2A). G quartets can, in turn, stabilize interactions between runs of Gs, thus allowing nucleic acids to form four-stranded structures called G4 DNA (Fig. 2B; Refs. 4-8). Synthetic oligonucleotides derived from the Sμ and Sγ2b switch regions were among the first sequences shown to form G4 DNA in vitro (4). Since that time, G4 DNA formation has been characterized extensively in experiments which show that a run of three Gs is sufficient to drive G4 DNA formation and that G-G interactions are essentially independent of the sequence context in which the runs of Gs occur (reviewed in Ref. 9).
In mammalian cells, G-rich DNAs are found in three distinct genomic microenvironments: the heavy chain switch regions, the rDNA, and the telomeres. The fact that G-rich DNA occurs at very specific regions of the genome suggests that G-G pairing may be important to specific cellular functions. Consistent with this hypothesis, a number of proteins have been described that bind to, cleave, or promote formation of G-G-paired DNAs (10-17), including some that interact with telomeric sequences (18-21). We have recently shown that G4 DNA is the preferred substrate of one critical mammalian helicase, the BLM helicase (22). The BLM helicase is deficient in Bloom's syndrome, a human genetic disease characterized by extreme genomic instability, a tendency to develop malignancies, and immunodeficiency (23-25).
LR1 is a B cell-specific, sequence-specific DNA binding factor that binds to duplex sites in the S regions (26,27). LR1 DNA binding activity is present in pre-B and B cell lines, and it is absent from resting B cells but induced in primary B cells activated to carry out switch recombination. This spectrum of LR1 activity correlates with the ability of a cell type to support recombination of extrachromosomal switch substrates (28). LR1 binds with very high affinity (K_D = 1.8 nM) to duplex DNA sites conforming to the consensus, GGNCNAG(G/C)CTG(G/A) (29).
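The consensus notation can be made concrete with a short pattern-matching sketch. The code below is only an illustration, not part of the reported experiments: the function name `find_lr1_sites` and the test sequence are hypothetical, N is expanded to any base, and only exact matches on the strand supplied are reported. Exact matching is a simplification of real site recognition.

```python
import re

# Consensus GGNCNAG(G/C)CTG(G/A): N = any base, (G/C) and (G/A) = either base.
# The lookahead form reports overlapping matches as well.
LR1_CONSENSUS = re.compile(r"(?=(GG[ACGT]C[ACGT]AG[GC]CTG[GA]))")


def find_lr1_sites(seq):
    """Return (start, matched_text) for each exact consensus match.

    Hypothetical helper for illustration: scans only the strand given,
    and does not model binding to near-consensus sites.
    """
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in LR1_CONSENSUS.finditer(seq)]


if __name__ == "__main__":
    # Made-up sequence containing one exact match to the consensus.
    example = "ttGGACCAGGCTGGaa"
    print(find_lr1_sites(example))
```

A scan of the complementary strand would require reverse-complementing the input first; that step is omitted here for brevity.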
LR1 is a heterodimer of nucleolin and a specific isoform of hnRNP D (29,30). Both nucleolin and hnRNP D are members of the large family of eukaryotic nuclear proteins that contain RNA binding domains (RBDs, also called RNA recognition motifs, or RRMs) and Arg-Gly-Gly repeats (RGGs). These structural motifs are commonly found in proteins that interact with RNA or single-stranded DNA (reviewed in Refs. 31-33). Surprisingly, despite the high affinity and sequence specificity of LR1 duplex DNA binding, neither of its subunits contains domains that commonly mediate duplex DNA interactions. The unusual subunit composition of LR1 suggested that duplex DNA might not be its only binding target. Here we report that LR1 specifically binds G4 DNA with K_D = 0.25 nM, 7-fold lower than its K_D for duplex DNA binding. We further show that both recombinant nucleolin and hnRNP D also bind G4 DNA (K_D = 0.4 nM and 0.5 nM, respectively). As nucleolin and hnRNP D, the two components of LR1, can both independently bind G4 DNA, a single LR1 heterodimer can bind to two separate G-rich regions of DNA. We suggest that LR1 binds to G-G-paired structures that form during S region transcription, to juxtapose donor and acceptor switch regions for recombination. G-G pairing has the further potential to stabilize intermediates in the recombination process.
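The 7-fold figure follows directly from the ratio of the two dissociation constants. The sketch below simply tabulates the K_D values quoted in this paragraph and computes their ratios; the dictionary keys and the function name `fold_difference` are labels introduced here for illustration only.

```python
# Dissociation constants quoted in the text, in nM.
KD_NM = {
    "LR1 / duplex site": 1.8,
    "LR1 / G4 DNA": 0.25,
    "nucleolin / G4 DNA": 0.4,
    "hnRNP D / G4 DNA": 0.5,
}


def fold_difference(kd_reference, kd_other):
    """Ratio kd_reference / kd_other; values above 1 mean the second
    interaction has the lower K_D, i.e. the tighter binding."""
    return kd_reference / kd_other


if __name__ == "__main__":
    duplex = KD_NM["LR1 / duplex site"]
    for name, kd in KD_NM.items():
        ratio = fold_difference(duplex, kd)
        print(f"{name}: K_D = {kd} nM, {ratio:.1f}-fold relative to duplex")
```

The ratio 1.8 / 0.25 is 7.2, which is the "7-fold" difference stated above.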

EXPERIMENTAL PROCEDURES
Protein Preparations and Antibodies-LR1 was purified 12,000-fold from nuclear extract of the murine pre-B cell line, PD31, by four chromatographic steps (29). Recombinant nucleolin was produced as a maltose-binding protein fusion protein containing amino acids 284-709 of human nucleolin, expressed in Escherichia coli from the pMalNuc plasmid, and purified as described previously (30). The fusion protein contains the four RBDs of nucleolin and the C-terminal RGG motifs; deletion of the acidic N terminus of nucleolin was essential to permit bacterial expression. Recombinant His6-tagged hnRNP D M20 was produced from a fusion construct in the pET30A(+) (Novagen) bacterial expression vector using an engineered murine cDNA clone in which an N-terminal His6 tag is fused to hnRNP D amino acid 31 (29). The M20 isoform of hnRNP D contains sequences encoded by alternative codon exon 2 but not exon 7 (34). His6-tagged hnRNP D M20 was expressed in E. coli and purified by nickel-chelate chromatography as described by the manufacturer (Novagen). Polyclonal antibodies were raised against recombinant human nucleolin and a synthetic peptide bearing a C-terminal sequence of hnRNP D and purified as described previously (29,30).
G4 DNA Formation, DNA Labeling, and Methylation Footprinting-Sequences of synthetic deoxyoligonucleotides used to form G4 DNA are shown in Table I. G4 DNAs were formed and end-labeled as previously described (22). In all cases, the characteristic G-G pairing was verified by methylation footprinting using dimethylsulfate (35). End-labeling of single-stranded oligonucleotides (22) and formation and labeling of DNA duplexes (26) also followed previously described procedures.
DNA Binding and Measurements of Binding Affinity-Binding to duplex DNA was carried out in 15-μl reactions containing 20 mM HEPES, pH 7.5, 100 mM NaCl, 1 mM dithiothreitol, 0.1% Nonidet P-40, 2.5% glycerol, 2% polyvinyl alcohol, 100 μg/ml bovine serum albumin, and 4 fmol of 32P-labeled duplex DNA for 15 min at room temperature. Protein-DNA complexes were resolved by electrophoresis on 5% polyacrylamide gels in 90 mM Tris-borate, 1 mM EDTA, pH 8.3. Binding to G4 DNA and single-stranded DNA was carried out in 15-μl reactions containing 10 mM Tris, pH 7.4, 100 mM NaCl, 1 mM EDTA, 100 μg/ml bovine serum albumin, and 1 fmol of 32P-labeled DNA for 30 min at 37°C, and the complexes were resolved by gel electrophoresis on 6% polyacrylamide, 45 mM Tris-borate-EDTA gels at 4°C. When antibodies were

LR1 Binds G4 DNA-Each of the immunoglobulin switch regions contains reiterations of a consensus repeat characterized by at least one run of three or more Gs (Table I). To test the possibility that LR1 might recognize G4 DNA formed by S region sequences, we carried out gel mobility shift experiments to assay binding of highly purified protein to 5′ end-labeled G4 DNA formed from the P oligonucleotide. This oligonucleotide is a synthetic 49-mer derived from the Sγ2b switch region (4). G4 DNA, formed from four separate G-rich strands, provides an excellent model for G-G-paired structures, as it is readily formed at high yield and is very stable in solution. Formation of G4 DNA was in all cases verified by methylation footprinting (Refs. 4, 6, and 22; data not shown).
Highly purified LR1 (29) bound to G4 DNA formed from the P oligonucleotide with K_D = 0.25 nM (Fig. 3A). This is a very low dissociation constant for interaction between a eukaryotic protein and DNA. LR1 also bound to single-stranded P oligonucleotide (Fig. 3A); the dissociation constant for this interaction is K_D = 11 nM. LR1 bound other G-rich single-stranded DNA with similar K_D values (data not shown). The complexes of LR1 with G4 DNA or single-stranded DNA were sensitive to proteinase K/SDS treatment, and protein binding therefore does not permanently alter DNA structure or conformation (data not shown). In assays of LR1 binding to Watson-Crick duplex DNA formed from the P oligonucleotide, only a very small fraction of labeled DNA interacted with protein.
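To make the meaning of these dissociation constants concrete, the sketch below evaluates a simple one-site equilibrium binding model at the probe concentration implied by the binding conditions (1 fmol of labeled DNA in a 15-μl reaction). This is an illustrative assumption, not the fitting procedure used for Fig. 3: the model treats each DNA molecule as a single independent site at equilibrium, and all function names are hypothetical. The second function uses the standard quadratic solution that accounts for depletion of free protein by the probe.

```python
import math


def probe_concentration_nM(fmol, volume_ul):
    """fmol per microliter is numerically equal to nM."""
    return fmol / volume_ul


def fraction_bound_simple(protein_nM, kd_nM):
    """One-site isotherm, assuming free protein ~ total protein."""
    return protein_nM / (protein_nM + kd_nM)


def fraction_bound_exact(protein_nM, dna_nM, kd_nM):
    """One-site model without the free ~ total approximation
    (quadratic solution for the concentration of complex)."""
    s = protein_nM + dna_nM + kd_nM
    complex_nM = (s - math.sqrt(s * s - 4.0 * protein_nM * dna_nM)) / 2.0
    return complex_nM / dna_nM


if __name__ == "__main__":
    dna = probe_concentration_nM(1, 15)  # about 0.067 nM labeled DNA
    for label, kd in (("G4 DNA", 0.25), ("single-stranded", 11.0)):
        simple = fraction_bound_simple(0.25, kd)
        exact = fraction_bound_exact(0.25, dna, kd)
        print(f"{label}: simple {simple:.3f}, exact {exact:.3f} at 0.25 nM protein")
```

Under the simple model, half of the probe is bound when the protein concentration equals K_D. Because the probe concentration here is not negligible relative to 0.25 nM, the depletion-corrected value at that protein concentration is somewhat below one half.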
In the assay of LR1 binding to single-stranded P oligonucleotide shown in Fig. 3A, two bands are apparent in the lane that contains no protein. The faster-migrating band is single-stranded oligonucleotide, and the other is G4 DNA that has formed spontaneously. This illustrates the propensity of guanines to interact in solution. Similarly, concentrated solutions of GMP have been shown to form a viscous gel (36). Binding of the G4 DNA by LR1 probably accounts for the highly retarded species.
We used rabbit polyclonal antibodies raised against a recombinant fusion protein carrying amino acids 284-709 of nucleolin (30) or a C-terminal peptide of hnRNP D (see "Experimental Procedures") to verify that LR1 interacts with G4 DNA. As shown in Fig. 3B, neither preimmune nor immune serum antibodies affected the mobility of G4 DNA in the absence of protein. Anti-nucleolin antibodies dramatically inhibited DNA binding. Antibodies raised against the hnRNP D C-terminal peptide supershifted the protein-DNA complex. The LR1 heterodimer is therefore responsible for the observed G4 DNA binding activity.
At increasing concentrations of LR1, multiple complexes were evident in assays of LR1 binding to G4 DNA (Fig. 3A). This may reflect protein interaction with more than one G4 DNA molecule, as would occur if the LR1 heterodimer contained multiple G4 DNA binding domains. To investigate this possibility, we asked if recombinant nucleolin or hnRNP D could bind G4 DNA.
Recombinant Nucleolin Binds G4 DNA-Recombinant nucleolin was expressed and purified as described under "Experimental Procedures" and assayed for binding to G4 DNA in gel mobility shift experiments. Recombinant nucleolin bound P oligonucleotide G4 DNA, with an estimated K_D = 0.4 nM. It did not bind single-stranded DNA or Watson-Crick duplexes formed from the same oligonucleotide, even at very high protein concentrations (K_D > 200 nM; Fig. 4A). The nucleolin-G4 DNA complex was competed by G4 DNA but not by single-stranded P oligonucleotide, as was expected from the direct binding assays (Fig. 4B).
Switch recombination in vivo frequently joins Sμ and Sγ switch regions. In the Sγ switch regions, the characteristic repeat is about 50 base pairs in length and consists of one or more runs of G. In contrast, the Sμ switch region (like Sα and Sε) is composed of variations of pentameric motifs like GGGGT, GAGCT, and GGGCT (see Table I). We tested the ability of recombinant nucleolin to bind to G4 DNA formed from an oligonucleotide, RX1, that derives from the murine Sμ region. RX1 carries one GGGGT repeat and two GAGCT repeats (Table I). As shown in Fig. 4C, recombinant nucleolin bound to RX1-G4 DNA with K_D = 0.4 nM, comparable with binding to P oligonucleotide G4 DNA. Recombinant nucleolin also bound to G4 DNAs formed from G-rich sequences not derived from the switch regions (data not shown). G4 DNA binding by nucleolin therefore appears to be specific for the G4 DNA structure, independent of surrounding sequence.
Recombinant hnRNP D Binds G4 DNA-hnRNP D is a highly conserved protein that is expressed as three isoforms related by alternative splicing (34). Recombinant hnRNP D bound to G4 DNA (K_D = 0.5 nM) but not to single-stranded DNA or Watson-Crick duplexes formed from the same oligonucleotide (K_D > 200 nM; Fig. 5A). Formation of the complex between hnRNP D and P oligonucleotide G4 DNA was competed by G4 DNA formed from the P oligonucleotide but not by single-stranded P oligonucleotide (Fig. 5B). Recombinant hnRNP D also bound to G4 DNA formed from the Sμ region RX1 oligonucleotide (Fig. 5C) and to G4 DNAs formed from other G-rich sequences (data not shown).

Antibody Recognition of Nucleolin or hnRNP D Bound to G4 DNA-We verified binding of recombinant nucleolin and hnRNP D to G4 DNA by assaying sensitivity to anti-nucleolin and anti-hnRNP D antibodies. As shown in Fig. 6, anti-nucleolin antibodies inhibited the interaction of nucleolin with G4 DNA but had no effect on the mobility of G4 DNA in the absence of added nucleolin. Anti-hnRNP D antibodies supershifted the complex of hnRNP D with G4 DNA but did not affect the mobility of free G4 DNA. The observation that the anti-C-terminal hnRNP D antibodies supershifted the binding complex suggests that the epitopes recognized by this anti-peptide antibody preparation are in an exposed region of the hnRNP D polypeptide. In contrast, the dominant epitopes recognized by the polyclonal anti-nucleolin antibodies appear to be within the region of nucleolin that makes contact with DNA.
We previously studied recognition of LR1 duplex DNA binding activity by these anti-nucleolin and anti-hnRNP D antibodies (29,30). Analogous to the observations shown in Fig. 6, in those experiments we found that the anti-nucleolin antibodies removed nucleolin from the LR1 (nucleolin/hnRNP D) heterodimer (30), whereas anti-hnRNP D antibodies supershifted the LR1-DNA complex (29).

DISCUSSION
We have shown that the B cell-specific factor, LR1, binds to G-rich S region sequences as Watson-Crick duplexes and as G-G-paired structures stabilized by Hoogsteen pairing. The dissociation constant of the interaction of LR1 with G4 DNA is 0.25 nM. This is a very high affinity interaction for a eukaryotic nucleic acid-binding protein. Moreover, both subunits of the LR1 heterodimer, nucleolin and hnRNP D, can independently bind G4 DNA with comparably high affinity.
G-G-paired DNA May Form during Transcription That Is Prerequisite to Switch Recombination-Simultaneous transcription of both activated S regions is prerequisite to switch recombination (reviewed in Ref. 3). During switch region transcription, the G-rich nontemplate strand is unwound from the C-rich template strand, which is transcribed by RNA polymerase. The mechanistic basis for the dependence of switch recombination on transcription has not been understood. We hypothesize that transcription may allow the G-rich strand to transiently form structures stabilized by G-G pairing and that proteins involved in recombination recognize this DNA structure. Such structures are likely to be transient, as they can be unwound by the BLM helicase (22) and possibly other mammalian helicases. S regions are long: in the mouse and human, S regions are from 2 to 10 kilobases in length. If G-G pairing occurs during S-region transcription, then the potential for G-G pairing should be roughly proportional to the length of an S region and the number of runs of three or more Gs it contains. Long S regions will therefore increase the opportunity for recombination by providing additional sites at which G-G pairing can occur.
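The proportionality argument above can be illustrated by counting runs of three or more Gs in a sequence. The sketch below is a toy illustration only: the function names are hypothetical, and the example sequence is assembled from the pentameric motifs named in the text rather than taken from an actual S region.

```python
import re


def g_runs(seq, min_len=3):
    """Return (start, length) for each run of at least min_len consecutive Gs."""
    pattern = re.compile("G{%d,}" % min_len)
    return [(m.start(), len(m.group())) for m in pattern.finditer(seq.upper())]


def g_runs_per_kb(seq, min_len=3):
    """Density of qualifying G runs per 1000 nucleotides."""
    if not seq:
        return 0.0
    return 1000.0 * len(g_runs(seq, min_len)) / len(seq)


if __name__ == "__main__":
    # Toy sequence built from GGGGT, GAGCT, GAGCT, and GGGCT motifs.
    toy = "GGGGT" + "GAGCT" + "GAGCT" + "GGGCT"
    print(g_runs(toy))
    print(g_runs_per_kb(toy))
```

At a fixed density of G runs, the number of candidate sites for G-G pairing scales linearly with S region length, which is the sense in which longer S regions would offer more opportunity for pairing.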
Other laboratories have shown that during in vitro transcription of the G-rich S regions, stable RNA:DNA hybrids form between the newly synthesized transcript and the C-rich template strand (37-39). Formation of such hybrids would increase the opportunity for G-G pairing, by increasing the time during which the G-rich top strand is freed from the Watson-Crick duplex.
LR1 Binding May Promote Synapsis of Two G-rich Switch Regions-LR1 can bind duplex DNA, single-stranded G-rich DNA, and G4 DNA. Recognition of each of these substrates might contribute to LR1 function in switch recombination.
The LR1 heterodimer contains six RBDs: four in nucleolin, and two in hnRNP D. These domains are found in many proteins that interact with single-stranded nucleic acids (reviewed in Refs. 31-33). Structurally, an RBD forms a platform upon which a single strand of nucleic acid is bound in an open conformation (40,41). As others have pointed out (4, 7), G-rich DNA has an unusual potential to function in recombination because G-rich but nonhomologous regions can interact via G-G Hoogsteen pairing. RBDs have been shown to function in nucleic acid annealing (42-46). It is an interesting possibility that, by binding to a single-stranded region, LR1 may render the DNA available to interactions with other G-rich nucleic acids. LR1 binds to duplex sites in the S regions that conform loosely to the consensus GGNCNAG(G/C)CTG(G/A), and LR1 duplex DNA binding activity is found only in B cells, where it correlates with switch recombination (26-28). LR1 binds to one of its sites in the Sγ1 switch region with K_D = 1.8 nM, and binding is relatively insensitive to mutations at most positions in this consensus (26,27,29,30,47,48). The S regions are dense with sites that are very similar to the LR1 binding consensus, and LR1 is likely to occupy some fraction of these sites in B cells that have been activated for switch recombination.
LR1 bound to either single-stranded or duplex regions would be poised to capture G4 DNA that formed even transiently. Because the affinity of the LR1/G4 DNA interaction is so high, LR1 bound to duplex sites would constitute a reservoir of protein available to bind G-G-paired DNA as it formed. Moreover, as both components of the LR1 heterodimer, nucleolin and hnRNP D, can bind G4 DNA, LR1 has two independent G4 DNA binding domains. The presence of these two domains would enable a single LR1 heterodimer to interact with two G-G-paired regions. If these regions are located on donor and acceptor switch regions, then interaction with LR1 could juxtapose these two switch regions for recombination.