Biochemical Analysis of Hypermutational Targeting by Wild Type and Mutant Activation-induced Cytidine Deaminase*

The synthesis of high affinity antibodies requires activation-induced cytidine deaminase (AID) to initiate somatic hypermutation and class-switch recombination. Here we investigate AID-catalyzed deamination of C → U on single-stranded DNA and on actively transcribed closed circular double-stranded DNA. Mutations are initially favored at canonical WRC (W = A or T, R = A or G) somatic hypermutation hot spot motifs, but over time mutations at neighboring non-hot spot sites increase creating random clusters of mutated regions in a seemingly processive manner. N-terminal AID mutants R35E and R35E/R36D appear less processive and have altered mutational specificity compared with wild type AID. In contrast, a C-terminal deletion mutant defective in CSR in vivo closely resembles wild type AID. A mutational spectrum generated during transcription of closed circular double-stranded DNA indicates that wild type AID retains its specificity for WRC hot spot motifs within the confines of a moving transcription bubble while introducing clusters of multiple deaminations predominantly on the nontranscribed strand.

AID is required for the secondary Ig gene diversification processes, somatic hypermutation (SHM) and class-switch recombination (CSR) (1). SHM and CSR are abolished in mice and humans (HIGM-2 patients) deficient for AID (1,2). SHM entails the generation of point mutations in the V region of Ig genes at about a million-fold higher rate than the rest of the genome, whereas CSR performs recombinational events with downstream sequences that delete the intervening constant regions (3). The two processes help to generate high affinity antibodies with an optimized fit to an antigen (V gene SHM) along with specialized Ig isotypes (CSR) enabling a more efficient clearing of systemic infections, and both require active transcription (4-6).
Although AID is synthesized in B cells selectively under tight regulation, the ectopic expression of AID is sufficient to induce SHM and CSR in non-B cell lines and can induce C·G → T·A mutations in uracil glycosylase-deficient Escherichia coli and Saccharomyces cerevisiae (7-11).
Based on its similarity in sequence to the apoB mRNA processing enzyme Apobec-1, AID was initially thought to use RNA as its substrate (3,12). However, biochemical assays using partially purified AID (13-16) and in vivo studies with overexpressed AID (10,17) offer convincing documentation that ssDNA is a substrate for AID. AID appears to be inactive on RNA, dsDNA, and RNA-DNA hybrid molecules in vitro (13), but it cannot be ruled out that AID might be active on RNA in vivo (18). AID simulates hallmark properties of SHM when acting alone on naked DNA in vitro (14,19), especially by targeting canonical WRC (20) V gene mutational hot spots (W = A or T; R = A or G) preferentially, while avoiding SYC cold spots (S = G or C; Y = C or T) (14). AID-catalyzed C deaminations in vitro exhibit broad clonal mutagenic heterogeneity (14) reminiscent of Ig V gene mutational distributions (21,22). Transcription could enable AID to act on ssDNA exposed within the moving transcription bubble. We (14) and others (15,17,23,24) have shown that transcription-dependent AID-catalyzed deaminations in vitro are strongly favored on the exposed nontranscribed strand, but the specific nature of the sequence specificity and distribution of deaminations during transcription has not been addressed.
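The hot spot and cold spot definitions above can be written down as a short classifier. The following Python sketch is illustrative only and is not part of the analysis reported here; the function names `classify_c` and `classify_all` are hypothetical, and the choice to call a C with fewer than two 5′ bases "neutral" is an assumption.

```python
# Motif definitions taken from the text: WRC hot spots and SYC cold spots.
W = set("AT")  # W = A or T
R = set("AG")  # R = A or G
S = set("GC")  # S = G or C
Y = set("CT")  # Y = C or T


def classify_c(seq, i):
    """Classify the C at 0-based index i by the two bases on its 5' side."""
    seq = seq.upper()
    if seq[i] != "C":
        raise ValueError("position %d is not a C" % i)
    if i < 2:
        # Assumption: without two 5' bases the motif cannot be called.
        return "neutral"
    two_back, one_back = seq[i - 2], seq[i - 1]
    if two_back in W and one_back in R:
        return "hot"
    if two_back in S and one_back in Y:
        return "cold"
    return "neutral"


def classify_all(seq):
    """Return {index: class} for every C in the sequence."""
    seq = seq.upper()
    return {i: classify_c(seq, i) for i, base in enumerate(seq) if base == "C"}
```

For example, `classify_c("CAC", 2)` returns `"neutral"`, because the base two positions 5′ of the target is C, which is neither A nor T (so the site is not WRC), and the base immediately 5′ is A, which is neither C nor T (so the site is not SYC).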
An in-depth in vitro analysis geared to reveal the biochemical basis for the distributions of mutations and the favored targeting of deaminations to hot spot motifs is the focus of this paper. We have taken a two-tiered approach to evaluate the biochemical properties of AID: when acting on naked ssDNA and when acting on closed circular dsDNA undergoing transcription. Mutational spectra are obtained using wt AID and mutant forms of AID. The AID mutants were designed to have reduced interactions with DNA and allow us to determine how binding and processivity perturb mutation frequency and specificity. An analysis of mutations performed on ensembles of individual DNA clones has enabled a determination to be made of the precise location, timing, and distribution of AID-initiated mutations.

Construction and Purification of AID Mutants and DNA Substrates-
Mutant AID proteins (R35E, R24E, R35E/R36D, and a 10-amino acid C-terminal deletion) were constructed by site-directed mutagenesis (QuikChange site-directed mutagenesis kit, Stratagene) using the pAcG2T-AID vector (13,14) as the template. The following primers and their complementary strands were used: 5′-tgg gct aag ggt gag cgt gag acc tac-3′ (for R24E mutant), 5′-tac gta gtg aag gag cgt gac agt gct-3′ (for R35E mutant), 5′-ctg tgc tac gta gtg aag gag gat gac agt gct aca tcc ttt-3′ (for R35E/R36D double mutant), and 5′-ctg tat gag gtt gat gac taa cga gac gca ttt cgt act-3′ (a 10-amino acid C-terminal deletion). The entire coding region of AID in the mutant constructs was verified by DNA sequencing. Following the recommended protocol, the mutant plasmids were then cotransfected with linearized baculovirus DNA (BD Biosciences) to generate recombinant baculovirus encoding the mutant AID proteins. To express the mutant proteins, Sf9 insect cells (Invitrogen) were infected with the corresponding viruses at a multiplicity of infection of 3 for 72 h. Wild type and mutant GST-AID proteins were purified as described previously (13,14), with an additional purification step. All AID proteins were fractionated on Superdex 75 (Amersham Biosciences) and eluted in a buffer containing 50 mM Tris-HCl, 200 mM NaCl, 5 mM dithiothreitol, and 10% glycerol. Fractions were stored at −80 °C. Our current level of purity for wt AID is about 85% based on analyzing integrated resolved band intensities from a Coomassie-stained denaturing polyacrylamide gel (Fig. 1). The deamination activity was tested for all preparations of AID and mutant AID using the primer extension assay described previously (13).
The deamination activities (pmol of DNA deaminated/μg of enzyme/min) of the AID preparations on one specific DNA oligomer, 5′-aaa ggg gaa agC aaa gag gaa agg tga gga ggt-3′, were as follows: 2 pmol/μg/min (R35E), 1.4 pmol/μg/min (R35E/R36D), 1.8 pmol/μg/min (C-terminal deletion), and 0.6 pmol/μg/min (wt AID). There was no activity for the R24E AID mutant above the background for our assay (<6 fmol/μg/min). T7 RNA polymerase was purchased from Promega and ultrapure rNTP from Amersham Biosciences. M13mp2 gapped substrate DNA was prepared as described previously (14). M13mp2T7 covalently closed circular dsDNA substrate was constructed by site-directed mutagenesis of M13mp2 phage DNA, in which lacZα nucleotides 279-299 were replaced with a T7 promoter sequence 5′-CTATAGTGAGTCGTATTACGT-3′.
Mutation Analysis of AID-targeted C Deamination in Vitro-Deamination specificities of wt and mutant AID were measured under the following reaction conditions: 30-μl volume, 50 mM HEPES (pH 7.5), 1 mM dithiothreitol, 10 mM MgCl₂, 500 ng of gapped DNA (= 100 fmol), 0.2 μg of RNase A, and 25-200 ng of wt or mutant AID (= 0.5-4 pmol). After incubations for 2.5, 5, and 10 min at 37 °C, the reactions were quenched by a double extraction with phenol:chloroform:isoamyl alcohol (25:24:1). Conversions of C → U on the DNA substrate were detected as white or light blue plaques indicating C → T mutations in a lacZα target gene after transfection into uracil glycosylase-deficient E. coli, as described previously (14). Transcription-dependent AID deamination specificity was measured using a covalently closed circular dsDNA M13mp2T7 substrate undergoing active transcription by T7 RNA polymerase. In a typical 30-μl reaction, RNase-activated GST-AID attached to 25 μl of glutathione-Sepharose beads (13), 50 ng of dsDNA M13mp2T7 substrate, and 1 μl of T7 RNA polymerase were incubated at 37 °C for 30 min in a reaction buffer containing 50 mM HEPES (pH 7.5), 1 mM dithiothreitol, 10 mM MgCl₂, and 250 μM rNTPs. The reaction products were extracted twice with phenol:chloroform:isoamyl alcohol (25:24:1). The DNA product was transfected into ung− E. coli-competent cells and plated on α-complementation host cells. Deamination was analyzed by scoring C → T and G → A mutations in the lacZα gene of light blue and white M13mp2T7 phage. C → T mutations indicate that deamination has occurred on the transcribed strand and G → A mutations on the nontranscribed strand. To verify that transcription was occurring, we labeled the RNA by including [α-³²P]ATP in the transcription reaction and followed the synthesis by PAGE using a 6% denaturing gel. Transcription of the supercoiled M13mp2T7 resulted in RNA transcripts with lengths primarily in the range of 300-500 nucleotides.
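The strand assignment rule stated above (C → T scored as transcribed strand, G → A as nontranscribed strand) can be expressed as a small helper. This Python sketch is illustrative only; the names `deaminated_strand` and `summarize_clone` are hypothetical, and labeling any clone that carries another base change as "background" is an assumption made for the example rather than a rule taken from the procedure.

```python
def deaminated_strand(ref, alt):
    """Map one sequenced base change to the strand on which C -> U occurred."""
    change = (ref.upper(), alt.upper())
    if change == ("C", "T"):
        return "transcribed"
    if change == ("G", "A"):
        return "nontranscribed"
    return "other"  # not a signature of C deamination


def summarize_clone(changes):
    """Summarize a clone given as a list of (ref, alt) base changes."""
    strands = {deaminated_strand(ref, alt) for ref, alt in changes}
    if not strands:
        return "unmutated"
    if "other" in strands:
        # Assumption: any non-signature change marks the clone as background.
        return "background"
    if len(strands) == 2:
        return "both strands"
    return next(iter(strands))
```

A clone carrying only G → A changes, for instance `summarize_clone([("G", "A"), ("G", "A")])`, is reported as `"nontranscribed"`.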

RESULTS
The specificity of AID-catalyzed deamination of C → U on ssDNA recapitulates keynote features of SHM in humans, perhaps most importantly the ability to distinguish between WRC hot spots and SYC cold spots (14). To probe the biochemical basis of AID deamination specificity, we have analyzed spectra from large numbers (~50) of individual clones generated with wt AID and AID mutants.
Spectral and Clonal Analysis of Wild Type AID-catalyzed C Deamination Specificity on ssDNA-The WRC hot spot and SYC cold spot distinctions are not absolute because some WRC sites are more reactive than others, even those having identical WR bases preceding a target C residue, and non-hot spot C residues are occasionally deaminated in preference to WRC sites (14). Here we generate and analyze wt mutational spectra derived from individual ssDNA phage clones to probe the mechanisms of AID deamination specificity.
Human wt AID was incubated for 2.5 and 10 min in the presence of gapped closed circular M13 phage DNA containing a lacZα reporter gene within the single-strand gap (Fig. 2a). C → U deaminations on gapped ssDNA were detected as C → T mutations in M13 phage (white and light blue plaques) (Table I) after transfection into uracil glycosylase-deficient E. coli (14) (see sketch, Fig. 2). The fraction of mutated clones is small, 3.7, 5.1, and 5.2% at 2.5, 5, and 10 min, respectively (Table I), and increases approximately linearly with increased levels of AID for incubations of at least 10 min (data not shown). The average number of deaminations/clone is 13, 17, and 23 at 2.5, 5, and 10 min, respectively (Table I). Half of the mutated clones have large numbers of mutations (20-70 mutations), yet ~95% of the clones reacted with wt AID are not mutated, even after a 10-min incubation (Table I; see also Ref. 14).
The observation that the fraction of mutated clones saturates after 5 min whereas the average number of mutations/clone increases for at least 10 min suggests that wt AID may be acting processively; the enzyme appears to remain bound and active on one ssDNA substrate for at least 10 min (Table I). As mentioned previously (14), our experiments using gapped M13 DNA substrate contain roughly equal numbers of unannealed 7.2-kb ssDNA and gapped circular dsDNA substrate. Therefore, it seems unlikely that AID cycles on the same gapped DNA molecule because one might expect that excess ssDNA would act to trap AID that had dissociated from the gapped substrate. Cycling between DNA substrates could, of course, be occurring on a longer time scale, where partitioning between processivity and cycling could depend on such factors as ssDNA sequence and length.
Mutational spectra (Fig. 2a, bar graphs) were obtained from sequencing approximately 50 mutant clones at each incubation time. Deamination patterns for individual DNA clones depict the deamination specificity and spatial distribution of mutations at high resolution (Fig. 2a). The 10 clones shown below each spectrum portray the spatial mutational patterns for the most weakly and strongly deaminated DNA (Fig. 2a, sparse and dense clones, respectively). The colored circular symbols show the locations of WRC hot spot sites (red circles), NNC neutral sites (green circles), and SYC cold spot sites (blue circles) in the target region. The colored T symbols denote hot spot, neutral, and cold spot sites where an AID-catalyzed deamination has occurred resulting in a C → T mutation in lacZα.
FIG. 1. GST-tagged AID was purified by glutathione-Sepharose affinity chromatography followed by Superdex 75 gel filtration. Lane 1, protein molecular mass standards. Lane 2, 20 μl of a Superdex 75 peak fraction (~0.5 μg).

FIG. 2. Analysis of C deamination specificity of wt AID acting on ssDNA. a, C → U deamination spectra and the identification of C → T mutations in DNA clones after incubation with wt AID for 2.5 and 10 min were obtained by sequencing about 50 individual mutant clones. Deaminations resulting in C → T mutations in the lacZα target sequence on M13 bacteriophage (sketch at top) are identified as clear or light blue plaques, whereas nonmutated phage appear as dark blue plaques (see "Experimental Procedures"). Each colored bar represents a percentage of mutated phage clones with a C → T mutation at the indicated position on the lacZα target sequence (−217 to +149). Red bars identify C deaminations occurring in 5′-WRC hot spot motifs, blue bars represent 5′-SYC cold spot motifs, and green bars represent neutral motifs (neither WRC nor SYC). Subsets of individual mutant phage DNA clones, five with lowest and highest numbers of C → T mutations, sparse clones, and dense clones, respectively, are shown below each spectrum. The colored dots show the locations of WRC hot spot sites (red), NNC neutral sites (green),

The spectrum for the 2.5-min incubation with wt AID shows that C → T mutations occur prominently in WRC hot spots (Fig. 2a), where 40-60% of the clones have mutations at 11 different WRC sites distributed more or less uniformly over the 365-nucleotide target sequence. Mutations at SYC cold spots occur infrequently; two SYC sites (−202, +109) are mutated in 20% of the clones and five occur in 5-10% of the clones (−204, −171, +56, +69, +98) (Fig. 2a). Neutral C sites having neither WR nor SY 5′-motifs are mutated at frequencies between those for hot and cold spots (Fig. 2a).
Deaminations tend to congregate in regions in which WRC hot spots appear to act as nucleation sites for multiple deaminations at nearby intermediate and cold spot motifs (Fig. 2a, dense clones). The increased numbers of C → T mutations in the 10-min clones occur primarily in neutral and cold spots, with smaller numbers of new hot spot sites undergoing deamination at the later time point. In other words, hot spot deaminations occur first and become saturated in time so that they must decrease at longer incubation times relative to neutral and cold spot mutations, a result that is consistent with processive action of AID. Scatter diagrams show that as the number of mutations/clone increases, the mutant fraction occurring in hot spot motifs decreases, but the fractions in cold spot and neutral spot motifs increase (Fig. 2b). Therefore, AID usually deaminates hot spots first and then targets intermediate and cold spots on the same DNA strand at a later time. Clones with the fewest numbers of mutations (about 30% of the mutated clones have between 1 and 10 mutations, located primarily at WRC hot spots) provide additional evidence that AID deaminates hot spots first and only later acts at neutral and cold spots on the same DNA molecule, as opposed to acting at a hot spot on a different DNA molecule.
The distribution of mutations in sparsely mutated clones suggests that AID can access distal target C residues, separated by as many as 50-100 bases while remaining bound to a single ssDNA substrate (Fig. 2a, 2.5 min, sparse clones). Clone 1, for example, is deaminated at one hot spot residue and two intermediate residues located on either side of the hot spot, 34 nucleotides to the 3′- and 113 nucleotides to the 5′-side. Clone 2 is deaminated at three widely separated hot spot residues. Mutational clusters anchored by at least one deaminated hot spot C are evident in clones containing at least five deaminated C residues and appear common in clones having at least 10 mutations (Fig. 2a, 2.5 min, dense clones 45-49). However, even in clones containing multiple mutational clusters, there are widely spaced regions, often greater than 50 nucleotides and up to as many as 150 nucleotides, with no deaminated C residues, even though these nonmutated regions contain numerous WRC hot spot motifs (Fig. 2a, 2.5 and 10 min, dense clones 44-49). Evidence that the nondeaminated hot spots are accessible to AID is that some of the clones have mutations at specific hot spots that are absent in other clones (e.g. Fig. 2a, 2.5 min, compare dense clones 48 and 49). We have observed many independent examples that substantiate this point.
Mutational Distribution and Specificity of N- and C-terminal AID Mutants-AID is composed of a strongly positively charged N-terminal domain (+11 net charge), rich in Lys and Arg, an active site region, and a C-terminal domain that is required for CSR but not SHM (25,26) (Fig. 3a). We had proposed (14) that the positively charged N terminus is likely to be involved in ssDNA binding and might thereby be responsible for AID processivity. However, we had not anticipated the major change in AID deamination specificity that occurred when replacing Arg with Glu (R35E) in the N-terminal nonactive site region (Fig. 3a).
The strongest hot spot, CAC, site 130, in the R35E AID deamination spectrum is mutated in ~50% of the clones but does not have a WRC signature motif (Fig. 3b). Note that ~40% of the clones have a proximal WRC (site 128) mutated. Four of 50 clones contained just one mutation, with three of the four mutated at CAC 130 (Fig. 3b, sparse clones 1-3, 5). The remainder of the spectrum generally resembles wt AID; the mutations are clustered near WRC hot spots and are accompanied by sizable gaps between clusters within which hot spots are deaminated in some clones but not in others (Fig. 3b, dense clones). Introducing a second mutation in an adjacent amino acid to form R35E/R36D accentuates the specificity difference with wt AID (Fig. 3c). Although CAC 130 is mutated in about 50% of the clones, as observed with R35E, deamination of the nearby WRC 128 hot spot is attenuated 4-fold, from 40 to 10% for clones with both mutations. The attenuated hot spot makes mutations in the neutral motif CAC 130 stand out even more and provides another example of altered mutational specificity (compare C sites 128 (red) and 130 (green) in Fig. 3, c and b).
FIG. 2, continued. and SYC cold spot sites (blue) in the target region. Similarly, the colored T symbols indicate C → T mutations at hot, neutral, and cold spots. b, relationship between the numbers of mutations/clone and fractions of C → T mutations at hot, neutral, and cold spots. Each dot in the graphs represents a single clone with the indicated number of mutations and the fraction of C → T mutations at hot, neutral, and cold spots. Trend lines were obtained by linear regression analysis of the data using SigmaPlot (version 7.0) software. p values are for the null hypothesis that the slope of the trend lines equals 0.

We suggest that the double mutant, as a consequence of its reduced N-terminal charge (+7) compared with wt AID (+11), is less processive and cycles much more readily between DNA substrates. Reduced processivity is implied by a 40% reduction in the average number of deaminations/clone for R35E/R36D compared with wt AID (Table I, 2.5-min incubation). An increase in cycling for the double mutant is indicated by a significant increase in the ratio of mutated to nonmutated clones with increasing length of incubation. At 2.5 min, the fraction of mutated clones is more than twice as large for wt compared with R35E/R36D, 3.7% versus 1.5%, respectively (Table I).
However, the opposite is true at 10 min, where the mutated clone fraction is 5.2% for wt AID compared with 19.7% for the R35E/R36D double mutant (Table I). Thus, a reduction in N-terminal positive charge has enabled the AID double mutant to access different DNA substrates, presumably by cycling from one to another substrate, with reduced processivity on each. The mutational spectrum for the C-terminal deletion mutant ΔC AID (Fig. 3d) is similar to wt AID (Fig. 2a). An examination of individual clones for wt and ΔC shows no significant differences in the spatial distribution of mutations (data not shown). The average number of deaminations/clone and the fraction of mutated clones are similar for wt and ΔC (Table I). Thus, ΔC, which is proficient in SHM but deficient in CSR (25,26), appears to mimic wt AID in carrying out SHM-specific processes, i.e. hot spot and cold spot mutational specificities and processivity.

FIG. 3. Deamination specificity of N- and C-terminal AID mutants acting on ssDNA. a, schematic representation of AID domain structure. b, C → U deamination spectrum and the identification of C → T mutations in DNA clones after incubation with AID mutant R35E for 2.5 min. Individual mutant phage DNA clones, five with lowest and highest numbers of C → T mutations, sparse clones, and dense clones, respectively, are shown below each spectrum. Designations for the colored bars in the spectrum and colored dots and T symbols in the clones are defined in the legend for Fig. 2. The R35E AID mutant has reduced the positive charge of the N-terminal α helix from 11 (wt AID) to 9. c, C → U deamination spectrum after incubation with AID double mutant R35E/R36D for 2.5 min. The R35E/R36D AID double mutant has reduced the positive charge of the N-terminal α helix from 11 (wt AID) to 7. d, C → U deamination spectrum after incubation with AID C-terminal deletion mutant ΔC for 2.5 min. ΔC has a deletion of 10 amino acids at the C-terminal end and is proficient for SHM but deficient in CSR.
Transcription-dependent Deamination by AID, Spectral and Clonal Analysis-We have determined the spectrum of transcription-dependent AID-catalyzed C deamination by integrating the lacZα reporter gene along with a bacteriophage T7 promoter into a covalently closed circular M13 phage dsDNA and transcribing the gene with T7 RNA polymerase in the presence of wt AID (Fig. 4). C → U deaminations were detected as C → T and G → A mutations in M13 mutant phage (white and light blue plaques) after transfection into uracil glycosylase-deficient E. coli (14) (see sketch, Fig. 4). The background mutant frequency of dsDNA substrate alone is 0.5 × 10⁻³. A slight increase in mutation frequency (~2-fold) is observed in the presence of either T7 RNA polymerase or AID alone (Table II). When both AID and T7 RNA polymerase are present, a 4-fold increase in mutation frequencies (3.7 × 10⁻³) is observed as a result of transcription-dependent AID-catalyzed deamination (Table II).
We selected 86 mutant phage clones from the transcription-dependent AID-catalyzed deamination reactions and performed a sequencing analysis. Among these clones, 68 contain only G → A or C → T mutations, characteristic of AID deamination on the nontranscribed and transcribed strands. The other 18 clones have single background mutations other than G → A or C → T that may have been generated during transcription because there was an increase in mutation frequency when only T7 RNA polymerase was added to the reaction. To determine the background frequency of G → A or C → T mutations generated by T7 transcription alone, we sequenced a few randomly selected mutant phage clones reacted with T7 RNA polymerase in the absence of AID. The data show that all clones contain single mutations, but only a small portion of clones have G → A (1 of 18) or C → T (4 of 18). Thus, almost all of the G → A or C → T mutations observed in the 68 clones from reactions including both T7 RNA polymerase and AID represent deamination action of AID on actively transcribed dsDNA, especially in the clones where there is more than one mutation on either strand.
Half of the 68 analyzed clones have only a single mutation, whereas 20% have two, 23% have 3-6, and 9% have between 10 and 35 mutations (Table III). Most of the mutations are on the nontranscribed strand, with a 13:1 bias over the transcribed strand (219 mutations on the nontranscribed strand and 17 mutations on the transcribed strand). Deaminations occur on the nontranscribed strand in 60 of the clones and on the transcribed strand in 5 clones. In 3 clones, deaminations occur on both strands indicating that AID may be able to perform strand switching during transcription (Table III). The far greater number of mutations on the nontranscribed strand is consistent with AID acting on ssDNA (14,15,17,23), i.e. on the single-stranded portion of the transcription bubble. Note that deaminations on the nontranscribed strand are detected as G → A mutations, whereas mutations occurring on the transcribed strand are shown as C → T (Fig. 4a).
We also observed that the largest numbers of mutations occur close to the promoter region and decline sharply as the distance increases farther downstream (Fig. 4a). Although the lacZα target can detect mutations from nucleotide positions −70 to 250 (see Fig. 4a
An examination of the mutation spectrum of AID-catalyzed C → U deamination on an actively transcribed dsDNA template reveals that the strong specificity of AID for deaminating WRC hot spot motifs and the avoidance of SYC cold spots are maintained during transcription (Fig. 4a). Among 219 mutations scored on the nontranscribed strand, 124 (57%) occur in the WRC hot spots, and only 13 (6%) occur in the SYC cold spots. Adjusting for the base composition, the average number of mutations/site is 3.5 for WRC hot spots (124 mutations/35 sites), 1.2 for neutral spots (82 mutations/69 sites), and only 0.5 for SYC cold spots (13 mutations/26 sites). Thus, the average number of mutations at WRC hot spots appears to be about 7-fold higher than those at SYC cold spots.
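The per-site normalization above is simple arithmetic and can be reproduced from the counts given in the text. The Python sketch below is only a worked check of those numbers; the dictionary layout and the name `per_site_rates` are illustrative choices, not part of the original analysis.

```python
# (mutations, sites) on the nontranscribed strand, as quoted in the text.
counts = {
    "WRC hot": (124, 35),
    "neutral": (82, 69),
    "SYC cold": (13, 26),
}


def per_site_rates(table):
    """Average number of mutations per available site for each motif class."""
    return {motif: muts / sites for motif, (muts, sites) in table.items()}


rates = per_site_rates(counts)
total_mutations = sum(muts for muts, _ in counts.values())
hot_fraction = counts["WRC hot"][0] / total_mutations
hot_vs_cold = rates["WRC hot"] / rates["SYC cold"]

for motif, rate in rates.items():
    print(f"{motif}: {rate:.1f} mutations/site")
print(f"hot spot share of mutations: {hot_fraction:.0%}")
print(f"hot/cold ratio: {hot_vs_cold:.1f}")
```

The three classes sum to the 219 nontranscribed-strand mutations, and the hot-to-cold ratio of per-site rates comes out at roughly 7, in line with the statement above.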
A more detailed examination of the multiple mutations within individual mutant clones suggests that AID may be tracking along within the moving transcription bubble while deaminating cytosines in a processive manner. As noted above, 32% of the mutant clones have three or more mutations.
Clone 14 contains five small mutation clusters having deaminated C residues from 2 to 5 bases apart. The clusters are separated from each other by ~20-50 bases, with an occasional C deamination occurring between them (Fig. 4b). Clone 73 contains a minimum cluster composed of two nearby Cs but mainly has single C deaminations with wide spaces in between (Fig. 4b). Clones 87 and 66 show similar patterns of clustered deaminations (Fig. 4b). In these various clones, it seems unlikely that more than one AID molecule could deaminate a tightly spaced cluster of mutations considering the physical constraints of a transcription bubble and the proximity of these mutations, especially when mutations are only 1 or 3 nucleotides apart. These deamination patterns in many mutated clones are consistent with what we observe in AID-deaminated ssDNA clones (Fig. 2a).
Clone 36 contains mutational clusters on both the nontranscribed and transcribed strands (Fig. 4b). The mutations begin on the nontranscribed strand from just inside the promoter region and continue for about 30 nucleotides into the lacZα structural gene. AID appears to switch strands and catalyze deamination of C over a 40-nucleotide region on the transcribed strand and then switch back again, causing individual and clustered deaminations for an additional 50 nucleotides on the nontranscribed strand (Fig. 4b). Although we cannot rule out the possibility that more than one transcription bubble is occurring on one molecule of DNA or that more than one AID is acting on a substrate undergoing transcription, the evidence within the individual clones suggests that AID may be acting processively within the confines of a transcription bubble.

DISCUSSION
An investigation into the biochemical basis underlying the stochastic behavior of AID is the principal theme of this paper. Our approach is to probe broad spectral properties of wt AID-catalyzed C deamination and then to focus on the identities and distributions of mutations by examining the individual DNA clones that contribute to the overall mutational spectra. One analysis is performed with AID and AID mutants acting on ssDNA. A related analysis examines the action of AID during the active transcription of a covalently closed circular DNA by T7 RNA polymerase. SHM and CSR require active transcription (4-6).
FIG. 4. Analysis of C deamination specificity of wt AID acting on dsDNA undergoing transcription by T7 RNA polymerase. a, C → U deaminations on the nontranscribed strand give rise to G → A mutations (shown above the DNA sequence), and C → U deaminations on the transcribed strand give rise to C → T mutations (shown below the DNA sequence). Note that although C deaminations result in C → T mutations on both strands, the C → T mutations on the nontranscribed strand are detected as G → A when the DNA clone is sequenced. Arrows in the M13mp2T7 plasmid sketch on the top (see "Experimental Procedures") show the transcription directions of the lacZα gene and the T7 promoter. Red A or T refers to mutations occurring at C residues located within WRC hot spot motifs; mutations at C located in SYC cold spots are shown as blue A or T; mutations at C sites having neither hot spot nor cold spot motifs are shown as green A or T. b, five

SHM patterns in B cells (20,28) and E. coli (29) and in vitro (14) show that hot spots, although clearly favored, are not overtly dominant because C → T mutations in WRC occur only about 2-fold over randomly mutated C sites (14,28). Hot spot C residues are occasionally not mutated, whereas intermediate and cold spot Cs are (14,28). A compilation of mutated V gene DNA sequences from humans, mice, and cultured B cell lines shows broad heterogeneity in the number and distribution of C → T mutations (21,22). Our data show that wt AID working on naked ssDNA behaves in a similar manner (Fig. 2).
Beyond comparing in vitro with in vivo mutational data, a biochemical analysis allows us to delve more deeply into SHM mechanisms. An examination of individual DNA clones reveals the presence of densely clustered mutations separated by lightly mutated regions (Fig. 2a, dense clones). The wide variability in the location and number of mutations for different clones suggests that AID initially binds at a random position on ssDNA and deaminates C residues processively over a short (~10-base) region but then appears to translocate randomly to a different region on the same DNA substrate (Fig. 2a). Longer exposures to AID result in an increase in the fraction of deaminations occurring at intermediate and cold spots relative to hot spots, as would be expected for a processive scanning mechanism (Fig. 2b). However, our evidence to date suggesting a processive scanning mechanism is indirect. Other mechanisms could account in principle for similar types of randomly clustered mutational patterns, including aggregation of AID or binding by different AID molecules. A direct measurement of AID scanning on a single ssDNA substrate will be necessary to resolve the matter of processivity.
A simple model to account for the seemingly disparate processive and translocation properties of AID, as revealed by the spectral and clonal data, envisions AID as a dimer composed of two monomer subunits containing independent active sites (30), bound at two distant locations on the same DNA substrate strand (Fig. 5). Each subunit scans a limited span of DNA processively, resulting in mutational clusters with far fewer mutations in the intervening regions. The processive scanning action of each monomer of AID could be limited to short spans of DNA if the monomers bound at two distant locations were also anchoring each other to those locations via a pulling action by each monomer as it attempts to scan in opposite directions independently. If one monomer subunit were released from the DNA, it would be able to translocate to a distant position within the same DNA molecule, as the other monomer may still be bound and anchoring the AID dimer to that same DNA molecule. We have observed that AID binds with similar affinities to ssDNA (≥30 nucleotides) whether or not C residues are present,² which might account for the random and heterogeneous distribution of mutations for different DNA clones.
The N-terminal α helical region of AID carries a large positive charge (+11) (Fig. 3a) that should facilitate binding to the negatively charged ssDNA backbone. AID mutants with N-terminal amino acid substitutions that reduce the charge would thereby be expected to cycle between different ssDNA substrates and be less processive compared with wt AID. Increased cycling and decreased processivity are observed for the R35E/R36D AID double mutant (+7 charge); the AID double mutant catalyzes deamination on 20% of the DNA substrates compared with just 5% for wt AID (Table I, 10-min incubation) and exhibits a concomitant 3-fold reduction in the average number of deaminations/clone (Table I, 2.5-min incubation).
Unexpectedly, however, the R35E/R36D mutant has also altered the mutational specificity so that the most prominent C deamination site, found in 50% of the clones, occurs in a non-hot spot motif (Fig. 3c). This aberrant selectivity is attributable to the R35E substitution (Fig. 3b). The additional R36D substitution further alters the specificity by reducing WRC hot spot mutations located adjacent to the favored non-hot spot motif by 4-fold, from 40 to 10% for clones having both mutations. Two recent studies document the biological importance of N-terminal mutants. First, the mutation R24W (Fig. 3a) has been identified in immunodeficient patients (25), and we have determined that a mutation at this residue (R24E) inactivates AID deaminase activity in vitro (data not shown). Second, Honjo and co-workers (31) screened AID mutants for defects in SHM by measuring the reversion of G to C at a single hot spot and verified the mutations by sequencing individual clones. SHM-defective AID mutants were found to map at several locations in the positively charged N-terminal α helix. Our biochemical data predict that a change in mutational specificity coupled with a reduction in processivity would result in a general reduction in SHM in accord with the in vivo data (31).

representative clones are shown to illustrate clusters of deaminated C residues (indicated by the lacZα nucleotide position) on the nontranscribed strand (clones 14, 87, and 66), widely dispersed C deaminations on the nontranscribed strand (clone 73), and strand switching by AID to generate clustered and dispersed mutations on both nontranscribed and transcribed strands (clone 36). The deamination pattern for clone 36 is consistent with AID carrying out C deamination while translocating on the nontranscribed strand, then switching to the transcribed strand to deaminate C residues, and switching back to the nontranscribed strand.

FIG. 5. Model describing the processive translocation action of AID on ssDNA. Based on structural data obtained for yeast cytidine deaminase, a model was proposed suggesting that AID is a dimer with separate active sites on each subunit (30). Localized mutational clusters separated by sparsely mutated regions observed in single DNA clones (Fig. 1a) could occur if each subunit performed processive C deamination over a localized region, ~10 bases, on ssDNA.
In contrast to the N-terminal mutations, a 10-amino acid C-terminal deletion (ΔC) appears to have essentially no effect on either the activity or specificity of AID in vitro (Fig. 3d). The ΔC mutant causes a specific deficiency in CSR in vivo (25), but it still induces mutations in the S region, suggesting that the C terminus of AID is required for the interaction with a CSR-specific factor (26).
SHM requires the active transcription of Ig V regions (4 -6). In the model transcription-dependent AID-catalyzed deamination system (Fig. 4), we observe heterogeneous mutational clustering favoring WRC hot spot deaminations, consistent with the stochastic action of AID on ssDNA in the absence of transcription (Fig. 2). There is a stronger preference for deaminations at WRC hot spots in the transcription-dependent reaction (7-fold) than on naked ssDNA (4.6-fold) (Table II) (14). Unlike gapped ssDNA, where AID has access to all possible cytosines at any given time, within a moving transcription bubble AID must select among a limited number of C residues present in a short span of ssDNA that is only briefly exposed. These results are consistent with the analysis of deamination by AID on ssDNA substrate at short incubation times. The hot spot preference is much more prominent in the AID mutation spectrum at 2.5 min than in the spectrum at 10 min, indicating that AID first targets C residues in WRC hot spots and later targets those in neutral and cold spots, provided that AID still has access to them (Fig. 2a).
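The hot spot, cold spot, and neutral categories used throughout this discussion can be made concrete with a short sketch. The code below assumes the standard IUPAC codes (W = A or T, R = A or G, S = G or C, Y = C or T) and treats the C as the third base of each trinucleotide. The function names and the per-site normalization in `hot_spot_preference` are illustrative assumptions; they are not necessarily how the fold preferences in Table II were calculated.

```python
W, R = set("AT"), set("AG")      # WRC hot spot: W = A/T, R = A/G
S, Y = set("GC"), set("CT")      # SYC cold spot: S = G/C, Y = C/T


def classify_c_sites(seq):
    """Map each C (with two 5' neighbors) to 'hot', 'cold', or 'neutral'."""
    seq = seq.upper()
    sites = {}
    for i in range(2, len(seq)):
        if seq[i] != "C":
            continue
        first, second = seq[i - 2], seq[i - 1]
        if first in W and second in R:
            sites[i] = "hot"
        elif first in S and second in Y:
            sites[i] = "cold"
        else:
            sites[i] = "neutral"
    return sites


def hot_spot_preference(seq, mutated_positions):
    """Mutations per hot spot C divided by mutations per non-hot spot C.

    `mutated_positions` lists 0-based positions pooled over clones
    (repeats allowed). Returns None when the ratio is undefined.
    """
    sites = classify_c_sites(seq)
    n_hot = sum(1 for kind in sites.values() if kind == "hot")
    n_other = len(sites) - n_hot
    hits_hot = sum(1 for p in mutated_positions if sites.get(p) == "hot")
    hits_other = sum(
        1 for p in mutated_positions if sites.get(p) in ("cold", "neutral")
    )
    if n_hot == 0 or n_other == 0 or hits_other == 0:
        return None
    return (hits_hot / n_hot) / (hits_other / n_other)
```

For the arbitrary test sequence `AACGTCGACTGC`, the C residues at positions 2 and 11 fall in WRC motifs, position 5 in an SYC motif, and position 8 in neither. Three pooled hot spot mutations and one cold spot mutation then give a preference of 3.0.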
It has been suggested that AID is unlikely to act processively on transcribed dsDNA because of physical constraints imposed by a moving transcription bubble (32); however, the observation that ~50% of mutant clones contain multiple mutations, many in clusters of tightly spaced mutations only 1-3 nucleotides apart, suggests that AID may be able to keep up with a moving bubble (4). We also observed that mutations decrease as the distance from the promoter increases (Fig. 4a). Although the observed decline of mutations in our transcription assay occurs only over a range of 200 nucleotides downstream of the promoter, it is nevertheless reminiscent of V gene mutational polarity in vivo, where a significant drop in mutations is observed ~500 nucleotides from the promoter (33-35).
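One way to tabulate tightly spaced mutations in individual clones is sketched below. The function name `find_clusters` and the `max_gap` threshold are assumptions made for illustration: here a cluster is any run of two or more mutated positions in which neighboring positions differ by at most 3 nucleotides, which is one possible reading of "1-3 nucleotides apart".

```python
def find_clusters(positions, max_gap=3):
    """Group mutated positions from one clone into clusters.

    Consecutive positions that differ by at most `max_gap` nucleotides are
    placed in the same cluster; isolated single mutations are dropped.
    """
    clusters = []
    for pos in sorted(set(positions)):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)     # extend the current cluster
        else:
            clusters.append([pos])       # start a new candidate cluster
    return [c for c in clusters if len(c) >= 2]


def fraction_with_multiple(clones):
    """Fraction of mutant clones that carry more than one mutation."""
    mutant = [c for c in clones if len(c) > 0]
    if not mutant:
        return 0.0
    return sum(1 for c in mutant if len(set(c)) > 1) / len(mutant)
```

For a hypothetical clone with mutations at positions 10, 12, 13, 40, 90, and 92, `find_clusters` returns two clusters, `[10, 12, 13]` and `[90, 92]`, and discards the isolated mutation at position 40.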
However, it is important to emphasize that the simplified in vitro T7 transcription assay used here does not reveal the complex in vivo events in which mutations occur on nontranscribed and transcribed strands in roughly equal numbers (36) and where V genes but not C genes are targets for mutation. Although C deaminations occur preferentially on the nontranscribed single-stranded portion of the bubble in agreement with earlier studies (14,17,23,24), and clones exist for which it appears that AID is switching from the nontranscribed to the transcribed and then back to the nontranscribed strand (Fig. 4b), it would be naïve to suggest that strand switching in this simple model transcription system reflects the in vivo situation, in which roughly equal numbers of C → T mutations occur on both strands (36). Nevertheless, the clonal data demonstrate the physical possibility that AID could access a strand undergoing transcription and switch back and forth, deaminating Cs on both strands (Fig. 4b). In vivo, within a human RNA polymerase II-generated transcription bubble, AID could interact with transcription factors that allow it to deaminate both strands. Alternatively, transient negatively supercoiled regions in DNA sequences undergoing transcription upstream of human RNA polymerase II might allow AID to access both DNA strands (37). It is perhaps even more likely that mutations on the transcribed strand could be introduced by error-prone DNA polymerases filling in repair patches or by AID deaminating C residues within the ssDNA gaps created by the excision of AID-generated uracils on the nontranscribed strand. What our data show is that, once having gained access to a transcription bubble, AID is then able to target WRC hot spots for deamination, mainly on the nontranscribed strand (Fig. 4).
A similar approach using eukaryotic RNA polymerase II acting in conjunction with specific transcription factors and human single-strand binding protein RPA (24) will help to elucidate how AID gains access to a particular region undergoing transcription.
The initiation step of SHM in actively transcribed Ig V genes appears likely to be AID-catalyzed C → U deaminations. Therefore, the stochastic behavior and processive action of AID on ssDNA and on transcribed dsDNA substrates might serve as a biochemical basis for understanding the broad and heterogeneous SHM patterns seen in vivo. Copying U by a high fidelity replicative DNA polymerase would generate C → T transitions at SHM WRC motifs. Uracil glycosylase-catalyzed conversion of U to an abasic site, with subsequent copying by a low fidelity enzyme such as polymerase (38,39), could generate general C → N transversions and transitions in the same motif. And, as mentioned above, the triggering of base excision or mismatch repair enzymes (40,41) to excise an abasic moiety can lead to a diversification of SHM to facilitate mutations at additional hot spot motifs, e.g. TA on both DNA strands.