Activation-induced Cytidine Deaminase-mediated Sequence Diversification Is Transiently Targeted to Newly Integrated DNA Substrates*

The molecular features that allow activation-induced cytidine deaminase (AID) to target Ig and certain non-Ig genes are not understood, although transcription has been implicated as one important parameter. We explored this issue by testing the mutability of a non-Ig transcription cassette in Ig and non-Ig loci of the chicken B cell line DT40. The cassette did not act as a stable long term mutation target but could be mutated in an AID-dependent manner for a limited time post-integration. This indicates that newly integrated DNA has molecular characteristics that render it susceptible to modification by AID, with implications for how targeting and mis-targeting of AID occur.

Gene conversion (GCV) and somatic hypermutation (SHM) are two mechanisms crucial for diversifying the variable region sequence of rearranged Ig loci of B cells. In many organisms, V(D)J recombination is able to generate a large primary repertoire of highly diversified antigen receptors, and SHM is used to expand and fine-tune antibody specificities after antigen encounter in a process called affinity maturation. In the chicken Ig light chain (IgL) and heavy chain loci, combinatorial diversity is severely limited, and GCV and SHM are employed to diversify the primary antibody receptor pool (reviewed in Ref. 1). In the chicken IgL locus, there are 25 pseudo V elements with sequence similarities to the sole functional V region, and they serve as sequence donors in the GCV pathway to create sequence variability. Nontemplated SHM events, which occur at a lower frequency, also contribute to sequence diversity of the primary Ig receptor repertoire (2). In the chicken B cell line DT40, Ig GCV and SHM have been shown to arise from a common intermediate (3).
Activation-induced cytidine deaminase (AID) is required for GCV/SHM and is thought to initiate these reactions by deamination of cytosine to generate uracil in DNA (4-8). In DT40 cells, the uracils are processed predominantly by uracil DNA-glycosylase (UNG), and the resulting abasic site is channeled into either homology-based repair using pseudo V elements to produce GCV events or error-prone repair to produce SHM events.
Little is known about the molecular characteristics that render a gene accessible to AID-mediated diversification processes. Mouse and cell line studies of SHM have demonstrated that transcription is essential, whereas the endogenous Ig promoters and the Ig variable region sequences themselves are not required (9-13). Non-Ig genes have also been reported to undergo SHM, albeit at frequencies much lower than Ig genes, and such mis-targeting of SHM is thought to contribute to B cell malignancies (14-18).
We tested the ability of a non-Ig expression cassette to undergo sequence diversification in Ig and non-Ig loci of DT40 cells and found that it underwent AID-dependent mutation in a locus-independent fashion to generate predominantly transition mutations at G/C bp. In addition, despite stable levels of transcription, the mutability of the cassettes was transient and waned in the weeks after integration. Our results indicate that high level transcription is insufficient for stably committing a gene to AID-mediated sequence diversification. Furthermore, the lack of specific targeting information in the cassette is overcome by it being a recently incorporated component of the genome. These results suggest that newly integrated DNA assumes an AID-permissive state similar to that of Ig genes, with implications for how mis-targeting of AID occurs.

EXPERIMENTAL PROCEDURES
Constructs-The human EF1-α promoter with 1 kb of its downstream intronic sequence was PCR-amplified (EFNHEF, 5′-cttgaaaggagctagcattggctccg-3′; EFLHINR, 5′-ctctagagaagctttcacgacacc-3′) from the pEBB plasmid (19) to replace the β-actin promoter in a puromycin selection cassette flanked by mutant loxP sites (20). The resulting EF1α-puro cassette was cloned into targeting vectors L and E. There are two versions of vector L, both flanked by a left homology arm spanning 3.5 kb upstream of the IgL promoter and a 1-kb right homology arm that starts at the IgL leader intron and extends into the J-C intron. Targeted integration of either L construct inserts the EF1α-puro cassette upstream of the promoter; one of the constructs contains a T7 promoter that replaces the endogenous IgL promoter, whereas the other preserves the endogenous promoter. The homology arms of construct E include 2.2 kb upstream and downstream of the IgL enhancer, and targeted integration replaces the enhancer with the EF1α-puro cassette. Clones included in this study have the EF1α-puro cassette integrated in either transcriptional orientation. As no phenotypic differences were observed between clones carrying the EF1α-puro cassette in different transcriptional orientations, the results are discussed without distinction between the two scenarios.
* This work was supported in part by Grant AI066130 from the National Institutes of Health (to D. G. S.). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Cell Culture and Transfection-All CL18-derivative clones of DT40 cells were cultured as described previously (21). Transfections were performed by electroporating 25 μg of NotI-linearized plasmid or a BamHI fragment containing the EF1α-puro cassette into cells using a Gene Pulser (Bio-Rad) at 580 V and 25 microfarads. At 10-14 h after transfection, the cells were seeded at limiting dilutions and selected with 0.5 μg/ml puromycin (Sigma). Stable transfectants were identified 6-8 days after transfection.
Flow Cytometry-Surface IgM expression was monitored by staining cells with a phycoerythrin-α-chicken IgM antibody (clone M-1, Southern Biotech) and analyzing them on a FACScan (BD Biosciences).

Generation and Integration of Heterologous Transcription Cassettes-We designed a non-Ig expression cassette (Fig. 1a) composed of the puromycin resistance gene driven by the human EF1-α promoter, which included a 1-kb region of EF1-α intronic sequences downstream of the transcription start site known to contribute to high level transcription (22). In mammalian systems, SHM occurs at highest levels within 1 kb downstream of transcription start sites (23), and thus by separating the puromycin resistance gene from the promoter, we hoped to minimize mutation of the gene and hence the loss of mutated clones during the selection process.
The EF1α-puro cassette was incorporated into two IgL targeting constructs. The L construct was designed to insert the cassette upstream of the coding VJ region, whereas the E construct was designed to replace the IgL enhancer with the EF1α-puro cassette (Fig. 1b). Either the L or E construct was electroporated into the CL18 clone of DT40 cells, and stably transfected single cell clones were selected in puromycin, expanded, and analyzed for targeted or random integration by Southern blotting (supplemental Fig. 1, A and B). Southern blotting was also used to determine the copy number of the integrated cassettes, and only single-copy integrants were chosen for subsequent analyses. Curiously, the majority of clones generated were single-copy integrants regardless of whether targeted integration had occurred, possibly implying that even random integrants utilized homology-based mechanisms for construct integration. Northern blots demonstrated that all clones expressed the EF1α-puro cassette, although transcription levels were higher in targeted integrants than random integrants (see below); other clones discussed in subsequent sections were also analyzed by Northern blots to measure expression of the EF1α-puro cassette (see below, and data not shown).
FIGURE 1. a, the EF1α-puro cassette comprises the human EF1-α promoter (gray rectangle) driving expression of a puromycin resistance gene (white rectangle) and an SV40 polyadenylation signal (white oval). The cassette is flanked by loxP sites (black triangles) to allow Cre-mediated removal. The black bar marks the EF1-α intronic sequences included in the promoter region. b, stepwise targeting schemes for constructs L and E. Relevant IgL components are indicated as white rectangles, whereas the promoter is gray, and the transcription start site is indicated by the arrow. The seven promoter-proximal pseudo V gene segments are shown, and the EF1α-puro cassette is indicated by a white rectangle flanked by loxP sites (black triangles). Construct L contains homology arms flanking the IgL promoter; targeted integration would insert the EF1α-puro cassette upstream of the promoter, and the endogenous promoter would be replaced by the T7 promoter. Construct E has homology arms flanking the IgL enhancer; targeted integration would replace the enhancer with the EF1α-puro cassette.
Heterologous Cassettes Can Undergo AID-dependent Hypermutation in Ig and Non-Ig Loci-We cultured two targeted L clones (L4 and L6) and three targeted E clones (E7, E8, and E18) for 4 weeks after transfection and sequenced the region 125-650 bp downstream of the transcription start site (the sequenced region is entirely within EF1-α intronic sequences).
Both L4 and L6 were generated using the L construct that contained the T7 promoter so that the IgL promoter was replaced to avoid possible promoter competition (Fig. 1b). As there are no DNA elements homologous to the EF1α-puro cassette in the DT40 genome, we expected the majority of sequence variations to arise as untemplated point mutations and not by GCV. Mutations were found in all five clones, and collectively, 39 mutations were observed in 164 sequences analyzed, translating into a mutation frequency of 4.5 × 10⁻⁴ mutations per base pair (Table 1, lines 1-5). This is comparable with the combined IgL GCV/SHM frequency in other clones in which the IgL locus was unaltered (see below, 7.5 × 10⁻⁴ events per base pair; Table 2, lines 4 and 5, before and after subcloning). These results indicate that the EF1α-puro cassette can be mutated when integrated into the IgL locus.
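The mutation frequency quoted above is the number of mutations divided by the total number of base pairs sequenced. The following minimal sketch reproduces that arithmetic. It assumes that every sequence covers the full 125-650 bp window (525 bp); the exact read length is not stated in the text, so `REGION_BP` and the helper name `mutation_frequency` are illustrative assumptions rather than the authors' procedure.

```python
def mutation_frequency(mutations: int, sequences: int, bp_per_sequence: int) -> float:
    """Return mutations per base pair sequenced."""
    total_bp = sequences * bp_per_sequence
    if total_bp <= 0:
        raise ValueError("at least one base pair must have been sequenced")
    return mutations / total_bp


# Assumption: each sequence spans the whole 125-650 bp window.
REGION_BP = 650 - 125

# Counts taken from the text: 39 mutations in 164 sequences (targeted clones).
targeted = mutation_frequency(39, 164, REGION_BP)
print(f"targeted integrants: {targeted:.2e} mutations per bp")
```

Under this assumption the result is about 4.5 × 10⁻⁴ mutations per base pair, in agreement with the reported value.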
We next asked if the mutability of the EF1α-puro cassette was restricted to the IgL locus by performing a similar sequence analysis of three random integrant clones L1, L12, and E20 after 4 weeks of culture. The L construct used to generate L1 and L12 contained the IgL promoter (Fig. 1b). Interestingly, the three randomly integrated clones also exhibited a substantial accumulation of mutations in the EF1α-puro cassette (combined mutation frequency of 7.6 × 10⁻⁴ mutations per base pair; Table 1, lines 6-8 pooled together). To test whether the constructs were being mutated because of a genome-wide mutation phenomenon, we sequenced the endogenous EF1-α gene, reasoning that if the cells had a generally elevated mutation activity, other highly transcribed endogenous genes should become targets as well. This was not the case, as very few mutations were found in the endogenous EF1-α gene (0.45 × 10⁻⁴ mutations per base pair; Table 1, line 10), in line with mutation frequencies of non-Ig genes reported previously in DT40 cells (3). These data suggest that mutation activity was specific to the transfected constructs.
Because the L and E constructs contained IgL locus sequences in the homology arms (albeit nonoverlapping sequences), we wondered if those sequences were contributing to the mutability of the EF1α-puro cassette. To address this, we generated single-copy, randomly integrated clones of the EF1α-puro cassette without any flanking IgL sequences (NF cells). Sequence analysis of three independent clones after at least 4 weeks of culture revealed that the cassettes were mutated at levels higher than those in AID-deficient cells (as described below), although the mutation frequency of the NF construct was lower than that of L or E (Table 1, line 9). We conclude that the flanking IgL sequences in constructs L and E are not required for the introduction of mutations into the EF1α-puro cassette.
As is the case for untemplated point mutations in DT40 Ig variable regions (3), the mutations observed in the EF1α-puro cassette in L, E, and NF clones were almost entirely at GC base pairs, suggesting that AID might be responsible. To test this, the L construct was transfected into AID-deficient DT40 cells (4), and six independent clones were subjected to sequence analysis after at least 4 weeks of culture. We found only three mutations out of 185 sequences, yielding a mutation frequency (0.31 × 10⁻⁴ mutations/bp; Table 1, line 11) more than 10-fold lower than that observed in wild-type CL18 cells. We conclude that most of the mutations found in the EF1α-puro cassette in wild-type cells were dependent on AID.
Heterologous Cassettes Mutate Only Transiently in Ig and Non-Ig Loci-The above results indicated that the EF1α-puro cassette was mutated regardless of its integration site. This led us to wonder if mutation of the cassette had something to do with it being a piece of DNA recently incorporated into the genome, in which case mutability might be lost after some time in culture. This was investigated by subcloning cells that had been grown for 3 weeks after transfection; randomly chosen subclones were then cultured for 4 additional weeks, which allowed mutations to accumulate for the same amount of time as in the analyses above. Sequence analysis revealed that the random integrant clones L1, L12, and E20 lost their ability to mutate the EF1-α construct after subcloning, with combined mutation levels dropping from 7.6 × 10⁻⁴ to 0.42 × 10⁻⁴ mutations/bp (Table 2, lines 4-6). Intriguingly, even cassettes integrated in the IgL locus did not mutate after subcloning (Table 2, lines 1-3). We recently observed that the EF1-α promoter has a substantial defect in supporting GCV/SHM of IgL when used to replace the endogenous IgL promoter (13). Thus the inability of the EF1α-puro cassette to support stable mutation in the IgL locus could be due to a defect in the promoter itself. Nonetheless, any defects in mutation targeting could be overcome during the initial period post-integration.
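To put the drop after subcloning in context, the reported frequencies can be compared with the two background measurements given in the text (AID-deficient cells and the endogenous EF1-α gene). The sketch below is only an illustrative calculation on the published summary values; the dictionary labels and the helper `fold_over` are hypothetical names, and no statistical test is implied.

```python
# Reported frequencies (mutations per bp), copied from the text.
REPORTED = {
    "targeted integrants, 4 weeks": 4.5e-4,
    "random integrants, before subcloning": 7.6e-4,
    "random integrants, after subcloning": 0.42e-4,
    "endogenous EF1-alpha gene": 0.45e-4,
    "AID-deficient cells": 0.31e-4,
}


def fold_over(value: float, baseline: float) -> float:
    """Return how many times larger `value` is than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return value / baseline


baseline = REPORTED["AID-deficient cells"]
for label, freq in REPORTED.items():
    print(f"{label:40s} {freq:.2e}  {fold_over(freq, baseline):5.1f}x background")

drop = fold_over(
    REPORTED["random integrants, before subcloning"],
    REPORTED["random integrants, after subcloning"],
)
print(f"fold reduction after subcloning: {drop:.1f}")
```

On these numbers, the random integrants fall roughly 18-fold after subcloning, to a level close to both background measurements.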
To ensure that the loss of mutability of the EF1α-puro cassette observed after subcloning was not the result of a general defect in AID-mediated diversification, we examined the IgL locus for evidence of GCV/SHM. The CL18 clone carries a frameshift mutation in the IgL variable region that can be corrected by GCV to allow surface IgM expression, and this provides a convenient readout for ongoing GCV (21). Measurements of surface IgM expression revealed that, after 4 weeks of culture, the L1 and L12 clones gave rise to comparable percentages of IgM+ cells before and after subcloning (Fig. 2a). Sequence analysis corroborated results from IgM reversion assays and showed that these cells were still capable of performing robust IgL GCV/SHM after they were subcloned (Table 2, lines 4 and 5). These results demonstrate that the loss of mutability that occurs after subcloning is specific to the EF1α-puro cassette and cannot be explained by the loss of an activity or factor, such as AID, required for GCV/SHM. It was also possible that the clones stopped mutating the EF1α-puro cassette because of decreased transcription after subcloning. To address this, we performed Northern blots on cells before and after subcloning, and we found that expression of the heterologous cassette was unchanged after subcloning (Fig. 2b, and data not shown). Therefore, the loss in mutability was not a result of changes in expression of the cassette.
FIGURE 2. a, flow cytometric plots showing surface IgM staining on the y axis of clones L1 and L12 (random integrants) at day 28 before and after subcloning. b, Northern blot analysis of expression in clones L4 and L6 (targeted integrants) and L1 and L12 (random integrants) before and after subcloning. The blot was hybridized with probes for puromycin and GAPDH, the latter being the internal control. Quantitation was performed by calculating the puromycin:GAPDH ratio.
The Origin of Mutations in the Heterologous Cassettes-We analyzed the distribution and pattern of mutations occurring in the EF1α-puro cassette by combining all mutations recorded before subcloning. Mutations were detected throughout the region sequenced, with two 50-bp intervals (400-450 and 500-550 bp from the transcription start site) showing the highest frequencies of mutation (Fig. 3b). This distribution is similar to that reported in mammalian Ig genes and transgenes and does not correlate in any obvious way with the distribution of SHM hot spots (RGYW motifs) in the EF1-α intronic region sequenced (Fig. 3b). In addition, there was a heavy bias for transition mutations at G and C residues (67%; Fig. 3a). This is in sharp contrast to what has been reported for SHM events in the DT40 IgL locus, which are biased toward G to C and C to G transversion mutations (3, 24, 25). It is thought that uracil DNA-glycosylase (UNG) recognizes and processes uracils introduced by AID; in UNG−/− DT40 cells, GCV is abrogated, and mutation patterns shift largely toward G/C to A/T transitions, presumably because the uracils remain in the DNA during replication (6-8, 26). The pattern we observe is therefore consistent with a model in which mutation is initiated by AID-mediated deamination of C, with replication of the resulting uracil generating G to A and C to T mutations.
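The RGYW hot spot motif uses IUPAC codes (R = A or G, Y = C or T, W = A or T), and its reverse complement is WRCY. As an illustration of how such motifs can be located in a sequenced region, the following sketch scans both strands with regular expressions. The function name `find_hotspots` and the toy sequence are hypothetical; the sketch does not reproduce the analysis behind Fig. 3b.

```python
import re

# Lookahead patterns so that overlapping motif occurrences are all reported.
RGYW = re.compile(r"(?=([AG]G[CT][AT]))")  # motif on the top strand
WRCY = re.compile(r"(?=([AT][AG]C[CT]))")  # reverse complement of RGYW


def find_hotspots(seq: str) -> list[tuple[int, str, str]]:
    """Return sorted (0-based start, strand, 4-mer) tuples for RGYW/WRCY motifs."""
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", RGYW), ("-", WRCY)):
        for match in pattern.finditer(seq):
            hits.append((match.start(), strand, match.group(1)))
    return sorted(hits)


# Toy sequence for illustration only (not taken from the EF1-alpha intron).
for start, strand, motif in find_hotspots("TTAGCTAGCACC"):
    print(start, strand, motif)
```

Note that AGCT matches both RGYW and WRCY, so it is reported once per strand.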
The high prevalence of G/C transition mutations seen in our EF1α-puro cassette suggests that UNG was not efficiently recruited to the cassette and that the repair pathways that would normally follow AID-mediated deamination of Ig genes were not engaged there. This suggests that targeting of AID and targeting of error-prone repair can be dissociated from each other.
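The distinction drawn here rests on classifying each point mutation as a transition (purine to purine, or pyrimidine to pyrimidine) or a transversion, and on noting whether the mutated base is G or C. A minimal sketch of that bookkeeping follows; the function names and the toy mutation list are hypothetical and are not the data underlying Fig. 3a.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def classify_substitution(ref: str, alt: str) -> str:
    """Classify a single-base substitution as 'transition' or 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError(f"not a valid substitution: {ref}>{alt}")
    pair = {ref, alt}
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


def gc_transition_fraction(substitutions) -> float:
    """Fraction of all substitutions that are transitions at a G or C base."""
    subs = list(substitutions)
    if not subs:
        raise ValueError("no substitutions given")
    hits = sum(
        1
        for ref, alt in subs
        if ref.upper() in ("G", "C")
        and classify_substitution(ref, alt) == "transition"
    )
    return hits / len(subs)


# Toy data for illustration only: (reference base, mutated base).
toy = [("C", "T"), ("G", "A"), ("G", "C"), ("A", "G")]
print(gc_transition_fraction(toy))  # 0.5
```

In the toy list, C to T and G to A count as G/C transitions, G to C is a transversion, and A to G is a transition at an A/T base pair, so it is excluded.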
A recent report used a stop codon reversion assay to examine GCV of an artificial substrate containing a transcription cassette and an upstream homologous sequence donor in DT40 cells (27); results showed that the unit underwent GCV in the IgL locus but not in a non-Ig locus, suggesting that GCV is limited to Ig regions. There could be two explanations for the differences in the results of that study compared with those reported here. First, the published GCV unit might not have been transcribed strongly enough to be targeted by AID in non-Ig loci. Second, GCV requires the participation of sequence donors, UNG, and homology-based repair, and features of the Ig loci might facilitate this repair pathway. Therefore, even if the GCV unit was targeted by AID when integrated into a non-Ig locus, UNG would not have been recruited efficiently, and GCV would have been inefficient. Our use of direct sequencing enabled detection of AID action even though the normal repair pathways utilized for GCV/SHM of Ig loci in DT40s were not efficiently involved.
Our results suggest that DNA newly introduced into cells has properties, acquired or intrinsic, that distinguish it from existing genomic DNA, and such properties allow it to be targeted by AID. Although it is possible that the cassette can be accessed by AID prior to integration, all mutations reported here occurred post-integration. This is because the clones underwent single-cell cloning prior to expansion and sequencing; therefore, each expanded culture originated from a single cell harboring a single copy of the transfected transgene. Because only unique mutations within a clone were scored, such mutations could only have arisen post-integration. The EF1α-puro cassette might have particular features that contribute to its mutability in these experiments, most notably that it contains a strong promoter. The human EF1-α promoter used here includes most of the sequences from the EF1-α first intron, a region that contains numerous transcription factor binding sites that greatly enhance transcription (22). Further studies will be required to determine how promoter strength and structure contribute to the mutability observed in our experiments.
Numerous studies using mammalian B cell lines or non-B cell lines overexpressing AID have reported the mutation of non-Ig cassettes when integrated into non-Ig loci (10, 28-31). These studies have led to the idea that transcription itself is the major determinant for AID targeting. Our results initially seemed to be in good agreement with this idea. However, it became apparent after subcloning that the mutability of the cassette was only transient and had dropped to background levels by 3 weeks post-transfection; in contrast, transcription of the cassette was unaffected by subcloning. It is still possible that mutations continued to accumulate after subcloning at rates below the sensitivity of our sequencing assay. Our results indicate that high levels of transcription alone are not sufficient to stably commit a gene to being an efficient target of AID and suggest that high level transcription is not the primary targeting mechanism of the Ig genes.
The transient nature of mutations we observed in the non-Ig cassette suggests that there are molecular properties that distinguish newly integrated DNA from genomic DNA, which allow the newly incorporated DNA to be targeted by AID. These properties are lost gradually after construct integration. The DNA used in transfections was of bacterial origin and hence differs from eukaryotic genomic DNA in several ways, including the fact that it is not packaged in chromatin and that it is not methylated at CpG islands. The relatively small number of redundant mutations found within each clone (7% on average) and the relatively high proportion of sequences without a mutation (78%) indicate that the window of AID accessibility encompasses more than just the first few cell divisions. Our results suggest that intermediate chromatin states of the newly integrated DNA allow it to bypass the lack of specific AID targeting information, possibly because these states resemble a chromatin structure that makes Ig genes predominant substrates of AID. A link between chromatin structure and AID accessibility was previously suggested by the finding that a histone deacetylase inhibitor could expand the region of the IgH locus subject to SHM (32). The existence of molecular similarities between newly incorporated DNA and mutationally active Ig genes could reflect the evolutionary origin of AID as a member of a deaminase family shown to have antiviral activities in part by editing viral genomes (see Ref. 33 and reviewed in Ref. 34). The potential of non-Ig genes to mimic features of Ig genes that confer AID accessibility also provides a plausible explanation for how mis-targeting of SHM occurs.