The Central Domain of Core RAG1 Preferentially Recognizes Single-stranded Recombination Signal Sequence Heptamer*

RAG1 and RAG2 initiate V(D)J recombination by introducing DNA double strand breaks between each selected gene segment and its bordering recombination signal sequence (RSS) in a two-step mechanism in which the DNA is first nicked, followed by hairpin formation. The RSS consists of a conserved nonamer and heptamer sequence, in which the latter borders the site of DNA cleavage. A region within RAG1, referred to as the central domain (residues 528–760 of 1040 in the full-length protein), has been shown previously to bind specifically to the double-stranded (ds) RSS heptamer, but with both weak specificity and affinity. However, additional investigations into the RAG1-RSS heptamer interaction are required because the DNA substrate forms intermediate conformations during the V(D)J recombination reaction. These include the nicked and hairpin products, as well as likely base unpairing to produce single-stranded (ss) DNA near the cleavage site. Here, it was determined that although the central domain showed substantially higher binding affinity for ss and nicked versus ds substrate, the interaction with ss RSS was particularly robust. In addition, the central domain bound with greater sequence specificity to the ss RSS heptamer than to the ds form. This study provides important insight into the V(D)J recombination reaction, specifically that significant interaction of the RSS heptamer with RAG1 occurs only after the induction of conformational changes at the RSS heptamer.

V(D)J recombination leads to the formation of functional immunoglobulin and T cell receptor genes in developing B and T cells, respectively. Through DNA rearrangement of the immunoglobulin and T cell receptor genetic loci, the genes are assembled by combining selected gene segments termed variable (V), joining (J), and at some loci, diversity (D). The variability of the assembly process in each developing lymphocyte yields an immune system that contains a repertoire of antigen receptors with an array of binding specificities (1,2).
The first phase of V(D)J recombination consists of site-specific DNA cleavage steps adjacent to selected gene segments and requires the lymphoid-specific recombination activating proteins, RAG1 and RAG2 (1,2). Together the RAG proteins are directed to potential DNA cleavage sites by recognition of the recombination signal sequence (RSS), which flanks each gene segment. The RSS contains a conserved heptamer and nonamer sequence separated by either 12 (12-RSS) or 23 (23-RSS) base pairs of poorly conserved DNA. Successful assembly of two gene segments requires that the segments are adjoined to dissimilar RSSs, a requirement referred to as the 12/23 rule. The RAG proteins cleave at each selected RSS to first nick the double-stranded DNA between the RSS heptamer and the bordering antigen receptor gene segment. The result is a 3′-OH group that executes a nucleophilic attack on the opposite strand, generating a covalently closed hairpin at the coding end and a signal end that is blunt-ended at the 5′-end of the RSS heptamer.
The second phase of the V(D)J recombination reaction consists of joining the respective coding and signal ends and requires both RAG proteins along with the non-homologous end-joining proteins DNA-PKcs, Ku70/80, Artemis, XRCC4, and DNA ligase IV. This phase joins the 12 and 23 signal ends in a precise junction (1). The coding end joint formation, in contrast to the signal end junction, is variable because of the addition or deletion of bases. During this phase, studies have shown that the RAG proteins remain bound to both the coding ends and signal ends, perhaps to stabilize the joining procedure (3,4).
The majority of biochemical studies with the RAG proteins have been accomplished using truncated murine proteins consisting of the core regions, which are the minimal catalytically active regions required for in vivo recombination activity. The core regions of RAG1 and RAG2 include residues 384–1008 of 1040 and residues 1–387 of 527, respectively (2).
Although both RAG proteins are required for DNA cleavage, biochemical characterizations have revealed that RAG1 contains the sequence-specific DNA binding domains to the RSS. The DNA binding domain with specificity for the RSS nonamer lies within the N-terminal region of core RAG1 (residues 384–454) and was found to contain sequences similar to the homeodomain DNA binding domains (5,6). The RSS heptamer binding site has been found to lie within the central domain of core RAG1, which includes residues 528–760 (7). Along with its ability to bind to the RSS heptamer, the central domain also contains the predominant RAG2 binding site (7,8). In contrast to RAG1, RAG2 alone does not bind to the RSS. Results from DNA footprinting (9), protein-DNA photo cross-linking (10–12), and RAG2 mutagenesis studies (13–15) indicate that RAG2 enhances the interaction of RAG1 with the RSS heptamer, perhaps by inducing conformational changes in RAG1 and/or the DNA.
The RAG protein-RSS heptamer interaction is a critical interaction for successful DNA cleavage. A survey of the antigen receptor loci has demonstrated that the heptamer sequence in each RSS is highly conserved, particularly the three bases at the 5′-end (5′-CAC-3′), whereas the nonamer sequence is more poorly retained (16,17). Previous evidence from DNA footprinting studies suggested the structure of the RSS becomes distorted at the cleavage site upon binding of the RAG proteins (9). Furthermore, substrates that included single-stranded (ss) RSS (either entirely ss RSS flanked by a double-stranded (ds) coding region or RSS with mispaired bases at the coding/heptamer border) were much more efficiently cleaved compared with an entirely ds substrate (18,19). Together these results suggest that a likely loss of base pairing at the coding/heptamer border occurs upon RAG binding, which then facilitates subsequent catalytic events.
Although the central domain of RAG1 appeared to recognize the ds RSS heptamer, the binding affinity was exceptionally weak and the specificity was low relative to nonsequence-specific DNA (7). Because the structure of the RSS heptamer may be altered from ds to ss form upon RAG protein binding and because nicking may lead to altered RAG interactions, we have characterized the interaction of the central domain to these variant configurations of the RSS heptamer. Here we describe the substantial enhancement in binding affinity and sequence specificity of the RAG1 central domain for certain conformations of the RSS heptamer versus the ds form. Thus, the RSS heptamer binding site is located in the RAG1 central domain, but strong complex formation depends on disruption of the ds DNA helix. The implications of these results to the V(D)J recombination reaction are discussed.

EXPERIMENTAL PROCEDURES
Protein Cloning-Plasmid pAEC5 encodes a gene for maltose binding protein (MBP) fused to RAG1 residues 528–721, where the fusion protein is referred to as MBP·R1cd(ΔZFB). The RAG1 gene fragment was created by amplification of the appropriate region from the full-length murine RAG1 gene. The primers introduced a BamHI site at the 5′-end of the product and two stop codons and a SalI site at the 3′-end of the product. The fragment was inserted into the BamHI and SalI sites of the multiple cloning site of pMAL-c2 (New England Biolabs).
Protein Expression and Purification-The plasmids pCJM233 (encoding MBP fused to core RAG1, referred to as MBP·core RAG1), pRS3 (encoding MBP fused to RAG1 residues 528–760, referred to as MBP·R1cd), and pAEC5 (encoding MBP fused to RAG1 residues 528–721, referred to as MBP·R1cd(ΔZFB)) were transformed into Escherichia coli BL21 cells. The fusion proteins were expressed and purified as previously described (7), with the following modification for MBP·R1cd preparations. Size-exclusion chromatography with a Superdex 200 column of MBP·R1cd yielded ~80 and 20% monomeric and dimeric protein, respectively. The monomeric and dimeric fractions of MBP·R1cd were collected and stored separately. GST·core RAG2, expressed by transient transfection in 293T cells, was purified as previously reported (5).
Oligonucleotide Substrates for Electrophoretic Mobility Shift Assay-The 59-base sequence of the top strand of the wild type (WT) 12-RSS is d(GATATGGCTCGTCCTACACAGTGATATAGACCTTAACAAAAACCTCCAATCGAGCGGAG). The mutant heptamer (MH) 12-RSS oligonucleotide sequence is identical to the WT 12-RSS except the sequence GAGAAGC replaced the WT heptamer sequence (CACAGTG). The mutant heptamer and mutant nonamer 12-RSS sequence is identical to the WT 12-RSS, except the MH is changed as described above and the nonamer sequence (ACAAAAACC) has been replaced by AGGCTCTGA. Oligonucleotides were commercially synthesized and PAGE-purified (Integrated DNA Technologies). The ss 12-RSS substrates used corresponded to the top (sense) strand, unless otherwise stated. The ds 12-RSS substrate used was prepared by annealing complementary oligonucleotides. The nicked 12-RSS substrates were prepared by annealing a 16-base oligonucleotide (corresponding to the first 16 bases of the top strand) and a 43-base oligonucleotide (corresponding to bases 17–59 of the top strand and 5′-labeled with cold ATP) to a 59-base complementary oligonucleotide. Substrates were labeled at the 5′-end of the top (sense) strand with [γ-32P]ATP where indicated, using T4 polynucleotide kinase.
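The substrate set described above can be checked with a short script. The following is only an illustrative sketch: the sequences are those given in the text, whereas the helper names (reverse_complement, replace_at) and the 0-based offsets are assumptions introduced here, derived from the 16-base coding flank, 7-base heptamer, and 12-base spacer.

```python
# Top-strand sequences (5' to 3') exactly as listed in the text.
WT_TOP = "GATATGGCTCGTCCTACACAGTGATATAGACCTTAACAAAAACCTCCAATCGAGCGGAG"
WT_HEPTAMER, MUT_HEPTAMER = "CACAGTG", "GAGAAGC"
WT_NONAMER, MUT_NONAMER = "ACAAAAACC", "AGGCTCTGA"

# Assumed layout: 16-base coding flank, heptamer, 12-base spacer, nonamer.
HEPTAMER_START = 16                      # 0-based index of the heptamer
NONAMER_START = HEPTAMER_START + 7 + 12  # 0-based index of the nonamer


def reverse_complement(seq):
    """Return the complementary strand, read 5' to 3'."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq))


def replace_at(seq, start, new):
    """Overwrite len(new) bases of seq beginning at index start."""
    return seq[:start] + new + seq[start + len(new):]


# Sanity checks: the conserved elements sit where the layout says they do.
assert WT_TOP[HEPTAMER_START:HEPTAMER_START + 7] == WT_HEPTAMER
assert WT_TOP[NONAMER_START:NONAMER_START + 9] == WT_NONAMER

# Mutant heptamer (MH) and mutant heptamer/mutant nonamer top strands.
MH_TOP = replace_at(WT_TOP, HEPTAMER_START, MUT_HEPTAMER)
MHMN_TOP = replace_at(MH_TOP, NONAMER_START, MUT_NONAMER)

# Bottom strand used for the ds and nicked substrates.
WT_BOTTOM = reverse_complement(WT_TOP)

# Nicked substrate: the top strand split at the coding flank/heptamer border.
NICKED_TOP = (WT_TOP[:HEPTAMER_START], WT_TOP[HEPTAMER_START:])
```

Splitting the top strand at index 16 places the nick at the coding flank/heptamer border, which reproduces the 16-base and 43-base oligonucleotides described for the nicked substrate.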
Electrophoretic Mobility Shift Assay-MBP·R1cd and MBP·R1cd(ΔZFB) were incubated with 1 nM 32P-labeled substrates and the complexes resolved on 6% nondenaturing polyacrylamide gels as previously described (20). The dimeric fraction of MBP·R1cd was used in the experiments, except where stated otherwise. MBP·core RAG1 was incubated with 1 nM 32P-labeled substrates and resolved on a discontinuous 3.5/8% nondenaturing gel. The binding buffer contained 10 mM Tris, pH 8.0, 5 mM MgCl2, 2 mM dithiothreitol, 6% glycerol, and 100 mM NaCl. The bands were visualized using an Amersham Biosciences SI PhosphorImager and densitometer and analyzed using ImageQuaNT software. Protein concentrations in titrations to all ss substrates and WT nicked RSS ranged up to 1.5 μM. Kd values were determined as previously reported (20) and are the result of n = 3 experiments per protein-DNA interaction.
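The procedure used to extract Kd values is deferred to Ref. 20 and is not reproduced here. As a rough illustration of how a dissociation constant can be estimated from an EMSA titration, the sketch below assumes a simple one-site binding isotherm, fraction bound = [P]/(Kd + [P]), which is reasonable only when the labeled DNA (1 nM here) is far below Kd. The function names, the grid-search fit, and the data are all hypothetical; the data are synthetic and generated from a made-up Kd, not taken from Table I.

```python
def fraction_bound(protein_nM, kd_nM):
    """One-site isotherm; assumes free protein ~ total protein."""
    return protein_nM / (kd_nM + protein_nM)


def fit_kd(protein_nM, observed, lo=1.0, hi=1e5, steps=2000):
    """Least-squares Kd estimate from a scan of log-spaced candidates (nM)."""
    best_kd, best_sse = None, float("inf")
    for i in range(steps + 1):
        kd = lo * (hi / lo) ** (i / steps)
        sse = sum(
            (fraction_bound(p, kd) - y) ** 2
            for p, y in zip(protein_nM, observed)
        )
        if sse < best_sse:
            best_kd, best_sse = kd, sse
    return best_kd


# Synthetic, noise-free titration up to 1.5 uM protein (values in nM),
# generated from an arbitrary Kd of 200 nM purely for illustration.
titration_nM = [25, 50, 100, 200, 400, 800, 1500]
synthetic_fraction = [fraction_bound(p, 200.0) for p in titration_nM]

kd_estimate = fit_kd(titration_nM, synthetic_fraction)
```

With real densitometry data, the fraction bound at each protein concentration would be the shifted-band intensity divided by the total lane intensity, and the estimates from the three replicate experiments would be averaged.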
To analyze the effect of RAG2 on the interaction of RSS with the RAG1 proteins, GST·core RAG2 was incubated with either MBP·core RAG1, MBP·R1cd, or MBP·R1cd(ΔZFB) for 30 min at 4°C, followed by the addition of 1 nM 32P-labeled substrates and incubation for 30 min at 25°C, with subsequent resolution on 5% nondenaturing gels. The binding buffer was the same as listed above, except for experiments containing both GST·core RAG2 and MBP·core RAG1, in which 5 mM CaCl2 was substituted for MgCl2.

RESULTS AND DISCUSSION
The Central Domain of Core RAG1 Binds with High Affinity to ss RSS-The central domain of core RAG1 spans residues 528–760 of the full-length murine protein (Fig. 1) and contains two active-site residues (Asp-600 and Asp-708) of the three that were found by mutagenesis studies to reside in RAG1 (21–23). Further evidence of the importance of this region to catalytic activity is that point mutations in the central domain lead to immunodeficiency diseases, which are the result of either completely defective recombinase activity (i.e. T−B− severe combined immunodeficiency) or significantly reduced activity (i.e. Omenn syndrome) (24–27). In addition, mutagenesis studies of conserved residues in the central domain region of intact core RAG1 showed several positions that severely affected cleavage activity (28,29).
FIG. 1. Domains in core RAG1. Core RAG1 is the minimal region required for catalysis in in vivo assays. Within core RAG1, NBR (nonamer binding region) is the region that binds specifically to the RSS nonamer. ZFB (zinc finger B) is a C2H2 zinc finger previously identified. The core is shown as three separate DNA binding regions: the N-terminal Region, which contains the NBR; the Central Domain, which recognizes the RSS heptamer; and the C-terminal Domain, which binds DNA in a sequence-independent manner. The locations of the putative active site residues (Asp-600, Asp-708, and Glu-962) are shown.

We have found previously that the central domain demonstrated specific binding to the ds RSS heptamer. However, the binding affinity was very weak (with little protein-DNA complex formed even at micromolar concentrations of protein) and with only 3-fold specificity to the heptamer over nonsequence-specific DNA (7). Because the conformation of the RSS heptamer is predicted to gain ss character during the cleavage reaction and cleavage intermediates occur during the course of the reaction, we have tested the interaction of the central domain with the corresponding RSS substrates, including ds, ss, and nicked RSS (Fig. 2A). In these experiments, increasing concentrations of the RAG1 central domain fused to maltose-binding protein (referred to as MBP·R1cd) were titrated into oligonucleotides containing a 12-RSS, with subsequent resolution of the bound complexes by electrophoretic mobility shift assay (EMSA). In the binding assays, 12-RSS was used rather than 23-RSS, because the RAG proteins bind and cleave more efficiently at a 12-RSS (2). It is evident from Fig. 2A that the binding affinity of MBP·R1cd is significantly greater for ss and nicked RSS versus the ds substrate. Quantitation of the binding curves (Fig. 2B) shows that MBP·R1cd binds with >30-fold preference for ss DNA over the ds substrate and a >7-fold increase in affinity for nicked RSS over ds substrate (Table I).
The binding affinity to the ss RSS heptamer corresponding to the bottom strand sequence (5′-CACTGTG) was within error of that to the top strand (data not shown), which is not unexpected given the near dyad symmetry of the ds heptamer sequence. These results clearly demonstrate that the central domain in core RAG1 has a significantly enhanced capacity to bind to a ss versus a ds DNA helix. However, it is not clear what would result in the increased affinity for nicked versus ds RSS. Perhaps the central domain is capable of unwinding the DNA helix near the nicked site. Previously, the RAG proteins were found to form a complex with nicked RSS that had an apparently slower off rate than with unnicked RSS (30). An increased affinity of the RAG1 heptamer binding site with newly nicked DNA after the first cleavage step may account for this behavior.
Two distinct protein-DNA complexes are formed in the titration of MBP·R1cd to the different RSS substrates (Fig. 2A). These complexes may represent monomeric and dimeric forms of the central domain bound to the RSS. Whereas MBP·R1cd eluted from size-exclusion chromatography predominantly in the monomeric form, initial loading concentrations of ~10–20 μM yielded ~20% of the protein in the dimeric form. To determine whether the two shifted MBP·R1cd·12-RSS complexes in Fig. 2A were because of a difference in stoichiometry of MBP·R1cd in the protein-DNA complexes, we utilized the separate monomeric and dimeric protein fractions from size-exclusion chromatography. The separate fractions were first incubated with ss 12-RSS and the resulting protein-DNA complexes subjected to EMSA (Fig. 2C). At the protein concentrations used, only a fraction of the DNA was complexed, likely producing the formation of only sequence-specific protein-DNA complexes. Significantly, the dimeric fraction yielded two shifted bands, whereas the monomeric fraction resulted in one shifted band even though the protein concentration in the latter fraction was slightly higher. The dimeric fraction of MBP·R1cd was used in the DNA-binding experiments described here (see "Experimental Procedures" and Table I). However, the monomeric fraction of protein yielded similar binding affinities to ss 12-RSS (within error) as the dimeric fraction (data not shown). In Fig. 2C, the appearance of two complexes with the dimeric fraction is most likely because of the partial dissociation of dimer to monomer after completion of size-exclusion chromatography. This weak self-association of MBP·R1cd also explains why the relative proportions of dimeric to monomeric MBP·R1cd complexed with 12-RSS did not change significantly during the course of the titration (Fig. 2A).
It is possible though that in the intact RAG1 protein, the central domain may dimerize more readily, because it is bordered on each side by domains that have been shown to oligomerize in the absence of DNA (7,31).
The Central Domain Demonstrates Specific Binding to the ss RSS Heptamer-We have previously shown that the isolated central domain and intact core RAG1 bind preferentially to the ds RSS heptamer (7). However, the binding specificity was only ~3-fold greater than to ds nonsequence-specific DNA. This difference is likely not sufficient to warrant the high conservation of the heptamer in the endogenous RSSs located in the antigen receptor loci. Because the central domain demonstrates vastly enhanced binding affinity to ss over ds DNA, we asked whether the central domain showed greater specificity if the RSS heptamer were in ss form. To determine the extent of preferential recognition of the central domain to the RSS heptamer sequence, we compared the binding affinity to an oligonucleotide in which the heptamer sequence has been replaced. Comparison of the results between WT ss 12-RSS (in Fig. 2A) and MH ss 12-RSS (Fig. 3A) demonstrates that equivalent protein concentrations yield less complex with the latter substrate. Mutation of both the heptamer and nonamer sequences did not significantly further reduce the binding, as expected, because the nonamer binding site is not located in the central domain. From analysis of the binding curves, the central domain demonstrates ~10-fold specificity for ss RSS heptamer over nonsequence-specific ss DNA (Table I). These results demonstrate that the central domain has a significantly increased specificity for the RSS heptamer in ss versus ds form and provide further support for the model that ss DNA is an important structural intermediate in the DNA cleavage phase of V(D)J recombination.
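To put the fold differences quoted above on a common scale, a ratio of dissociation constants can be converted to a difference in binding free energy using the standard relation ΔΔG = RT ln(fold). The sketch below is illustrative only: the function names are introduced here, the temperature of 298.15 K (25°C) is an assumption, and the inputs are the approximate fold values stated in the text (~3 for the ds heptamer, ~10 for the ss heptamer, and >30 for ss over ds substrate, the last being a lower bound), not Kd values from Table I.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)


def fold_preference(kd_reference, kd_target):
    """How many times tighter the target binds than the reference."""
    return kd_reference / kd_target


def ddg_kcal_per_mol(fold, temp_k=298.15):
    """Free energy difference corresponding to a fold change in Kd."""
    return R_KCAL_PER_MOL_K * temp_k * math.log(fold)


# Approximate fold values quoted in the text (30 is a lower bound).
quoted_folds = {
    "ds heptamer vs nonspecific ds DNA": 3.0,
    "ss heptamer vs nonspecific ss DNA": 10.0,
    "ss RSS vs ds RSS": 30.0,
}

for label, fold in quoted_folds.items():
    print(f"{label}: {fold:g}-fold, {ddg_kcal_per_mol(fold):.2f} kcal/mol")
```

On this scale each 10-fold change in Kd corresponds to roughly 1.4 kcal/mol at 25°C, so the gain in heptamer specificity on going from ds to ss substrate is modest in energetic terms even though it is a severalfold change in affinity.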
In addition, by a similar method, we found that the central domain demonstrates >7-fold decreased affinity for nicked 12-RSS in substrates containing mutated heptamer and nonamer sequences (Table I). Thus, as the RSS is nicked the central domain gains greater affinity and specificity for the RSS heptamer, likely because of local unwinding of the ds DNA near the nicked site.
The ZFB Contributes to DNA Binding Affinity of the Central Domain to the ss RSS Heptamer-Besides two active site residues, the central domain also contains a near classic C2H2 zinc finger motif between residues 722–760, which is referred to as the ZFB (zinc finger B) (31). It has been previously shown that the ZFB is the predominant binding site to core RAG2 (8) and also that the isolated central domain is capable of binding core RAG2 (7). Because zinc fingers are often sequence-specific DNA binding motifs, we asked whether the ZFB contributes to the interaction with the RSS heptamer. An MBP fusion protein with the ZFB deleted from the central domain of RAG1 was constructed and is referred to as MBP·R1cd(ΔZFB). Comparison of the titration of MBP·R1cd(ΔZFB) (Fig. 3B) versus MBP·R1cd (Fig. 2A) to ss 12-RSS shows that the truncated central domain did not form a complex with WT ss 12-RSS as efficiently as the central domain containing the ZFB. From analysis of the respective binding curves the difference in affinity is 3- to 4-fold (Table I). These results demonstrate that in the context of the isolated central domain, the ZFB makes a significant contribution to ss RSS heptamer binding. However, residues 528–721 within the central domain clearly make the major sequence-specific contacts with the heptamer, because the truncated domain still demonstrated an ~3-fold increase in binding affinity to WT ss 12-RSS over nonsequence-specific ss DNA.
Despite a small fraction (<20%) of MBP·R1cd(ΔZFB) eluting from size-exclusion chromatography as a dimer, both the monomeric (Fig. 3B) and dimeric (data not shown) fractions yielded only one protein-DNA complex, not two as occurred with MBP·R1cd bound to ss 12-RSS. This may indicate that deletion of the ZFB significantly decreases stability of the central domain dimer.
Comparison of the Central Domain to Intact Core RAG1-Given the ability of the isolated central domain to preferentially bind ss over ds RSS, we asked whether intact core RAG1 showed similar DNA binding properties. Quantitation of binding curves demonstrated that intact core RAG1 bound with slightly lower affinity to ss versus ds RSS (Table I). It is not surprising that the differential effects of the central domain are not observed in the intact core RAG1 protein because of the presence of additional DNA binding domains (Fig. 1). Specifically, the N-terminal region of core RAG1 contains the nonamer binding site, which efficiently recognizes the ds nonamer sequence (5,6). Furthermore, a C-terminal domain in core RAG1 (residues 761–979) was found to bind ds DNA in a nonsequence-specific manner cooperatively and with relatively high affinity (7). Moreover, it is possible that the heptamer binding site in the central domain is not as accessible to DNA in the intact core RAG1 protein, particularly in the absence of RAG2.
RAG2 Inhibits Association of the RAG1 Central Domain with ss RSS-Because the central domain of RAG1 demonstrated high affinity for WT ss 12-RSS, we asked whether this interaction would be affected upon the addition of RAG2. In these experiments, samples containing increasing concentrations of GST·core RAG2 were incubated with MBP·R1cd, followed by the addition of WT ss 12-RSS. After an additional incubation of the protein complexes with ss 12-RSS, the resulting protein-DNA complexes were resolved by EMSA. Surprisingly, the results demonstrate that as the concentration of RAG2 increased, the amount of ss 12-RSS complexed with MBP·R1cd decreased (Fig. 4A). The order in which components were combined had no effect on the results, because experiments in which RAG2 was added last yielded similar results (data not shown). However, in the same assay performed with MBP·R1cd(ΔZFB), no decrease in DNA binding activity was observed, indicating that RAG2 does not have substantial interaction with residues 528–721 of RAG1 (Fig. 4B).
We also tested the effect of RAG2 on the interaction of intact core RAG1 with ss 12-RSS, using buffer conditions that prevent DNA cleavage activity. After utilizing EMSA to resolve the protein-DNA complexes, supershifted bands due to RAG1·RAG2·12-RSS complexes were detected (Fig. 4C). As the concentration of GST·core RAG2 increased, the amount of bound DNA increased, indicating that RAG2 enhanced the association of MBP·core RAG1 to the ss 12-RSS. It is possible that RAG2 alters the conformation of RAG1 to increase the DNA binding ability of the latter. Core RAG2 has also been found to induce a similar increase in binding affinity of core RAG1 for ds 12-RSS (32).
We suggest two possible interpretations for the contrasting influence of RAG2 on the association to the ss 12-RSS of the isolated central domain versus intact core RAG1. First, upon binding the ZFB, RAG2 blocks the DNA binding site in the isolated central domain, but in intact core RAG1 other regions of the protein (i.e. the N- or C-terminal core RAG1 domains) prevent this inappropriate orientation of RAG2. Second, it is possible that prior to V(D)J recombination, RAG2 specifically inhibits the central domain of RAG1 from binding to DNA until a correct initial complex is formed with an RSS. This latter possibility would be a means for RAG2 to regulate the binding of the central domain and, hence, the active site of RAG1 from contacting DNA until an appropriate protein-DNA complex was formed.

CONCLUSION
In this study, we have found that the central domain of core RAG1 demonstrated substantially enhanced affinity to putative and known RSS intermediates, namely ss and nicked substrates as compared with ds RSS. Binding to the ss RSS substrate was particularly favorable. Furthermore, the central domain showed a significant increase in specificity for the RSS heptamer in the ss and nicked forms versus the ds form. Given these results, we propose the following series of events upon binding of the RAG proteins to a single RSS. First, the RAG proteins bind to the ds RSS substrate with strong interactions between the nonamer binding site of RAG1 and the RSS nonamer but only weakly specific interactions with the RSS heptamer. DNA helical distortions, which likely include base unpairing, are induced at the RSS heptamer-coding flank border. In the 12-RSS, the spacing between the nonamer and heptamer of two helical turns is most likely optimal for binding of both conserved elements by the RAG proteins. In the 23-RSS, additional bending of the DNA helix, which may be facilitated by the high mobility group proteins, HMG1 or HMG2, is likely required for optimal interaction (33,34).
In either case, the introduction of ss conformation in the RSS heptamer results in robust and specific association of the RAG1 central domain with the heptamer and appropriate proximity of the aspartate active site residues to the DNA cleavage site for subsequent nicking activity. After nicking occurs, the RAG1 central domain remains tightly associated with the RSS heptamer with the active site primed for hairpin formation.
The above model introduces additional constraints on the complete association of the RAG proteins with the RSS in that optimal interaction of RAG1 with the RSS heptamer requires DNA distortion. This may reduce the ability of sequences dissimilar to the canonical RSS heptamer to bind to the RAG1 central domain if they are less prone to helix distortion or if the ss sequences do not interact with the central domain in a sequence-specific manner. Additional questions relevant to the model presented here include how the RAG proteins distort the DNA helix at the RSS heptamer-coding flank border as well as the number of base pairs that are affected by this interaction. Answers to these questions will shed light on the specific events that occur during protein-DNA interactions in the V(D)J recombination reaction.