Overcoming Transcription Activator-like Effector (TALE) DNA Binding Domain Sensitivity to Cytosine Methylation

Background: TALE-based technologies are poised to revolutionize the field of biotechnology; however, their sensitivity to cytosine methylation may drastically restrict their range of applications.
Results: TALE repeat N* proficiently accommodates 5-methylated cytosine.
Conclusion: Sensitivity of TALE to cytosine methylation can be overcome by using TALE repeat N*.
Significance: Utilization of TALE repeat N* broadens the scope of TALE-based technologies.

Within the past 2 years, transcription activator-like effector (TALE) DNA binding domains have emerged as the new generation of engineerable platform for production of custom DNA binding domains. However, their recently described sensitivity to cytosine methylation represents a major bottleneck for genome engineering applications. Using a combination of biochemical, structural, and cellular approaches, we were able to identify the molecular basis of such sensitivity and propose a simple, drug-free, and universal method to overcome it.

Transcription activator-like effectors (TALEs), a group of bacterial plant pathogen proteins, have recently emerged as new engineerable scaffolds for production of tailored DNA binding domains with chosen specificities (1). Interest in these systems comes from the apparently simple cipher governing DNA recognition by their DNA binding domain (2,3). The TALE DNA binding domain is composed of multiple TALE repeats that individually recognize one DNA base pair through specific amino acid di-residues (repeat variable di-residues or RVDs). The remarkably high specificity of TALE repeats and the apparent absence of context-dependent effects among repeats in an array allow modular assembly of TALE DNA binding domains able to recognize almost any DNA sequence of interest. Within the past 2 years, engineered TALE DNA binding domains have been fused to transcription activator (dTALEs) (4), repressor (5), or nuclease domains (TALENs) (6) and used to specifically regulate or modify genes of interest (1). Although successfully used in different cellular contexts, engineered TALE DNA binding domains have recently been reported to be affected by the presence of 5-methylated cytosine (5mC) in their endogenous cognate target (7). Often considered as the fifth base, 5mC is found in about 70% of CpG dinucleotides in mammalian and plant somatic/pluripotent cells (8,9) and has also been reported in 5-cytosine-phosphoadenine, 5-cytosine-phosphothymine, and 5-cytosine-phosphocytosine dinucleotides (10). Moreover, 5mC has been identified in CpG islands embedded in many promoters (11) and, to a higher extent, in proximal exons of several genes (12). These two critical regulatory regions are generally chosen by investigators to knock out genes of therapeutic and biotechnological interest or to modulate their expression using TALE-based technologies.
The ubiquity of 5mC in different cell types and genomic kingdoms, its particular localization, and its negative impact on dTALE activity reported in Ref. 7 make this epigenetic modification a major drawback for all TALE-based technologies, ranging from genome engineering and iPS production/differentiation to therapeutic applications. Utilization of 5-aza-2′-deoxycytidine (5-aza-dC) as a demethylating agent could be an option to overcome such a drawback, as suggested earlier (7). However, its cytotoxicity and well known pleiotropic effect (13) make it unsuitable for precise therapeutic and biotechnological genome engineering applications. Therefore, alternative and harmless strategies are highly desired. One of the most relevant strategies would be to develop 5mC-insensitive TALE DNA binding domains. To date, the cipher governing TALE repeat/DNA base recognition has been thoroughly documented for the unmodified bases A, T, G, and C (2, 3). However, 5mC has not been reported to be specifically recognized by any of the naturally occurring TALE repeats described so far. Here we unravel the ability of TALE repeat N* to efficiently accommodate 5mC and describe a simple, drug-free, and universal approach to overcome TALE DNA binding domain sensitivity to cytosine methylation. Besides the important biotechnological prospects raised by our work, we provide the first example of TALEN-mediated processing of methylated XPC loci, paving the way for xeroderma pigmentosum-targeted gene therapy.

EXPERIMENTAL PROCEDURES
TALEN Constructs-XPCT1, XPCT2, and XPCT3 TALENs and their respective variants were purchased from Cellectis Bioresearch (Paris, France). Their amino acid sequences and their cognate DNA targets are documented in supplemental Table III.
Monitoring TALEN Extrachromosomal SSA Activity-CHO-K1 cells were plated at 2500 cells/well in a 96-well plate. The next day, cells were cotransfected with increasing amounts of DNA encoding XPC TALEN (from 0 to 25 ng each) and a constant amount of XPC extrachromosomal unmethylated target (75 ng) using PolyFect transfection reagent (Qiagen) according to the manufacturer's protocol. TALEN single strand annealing (SSA) activities were determined according to the protocol described in Ref. 14.
Monitoring of TALEN-induced Targeted Mutagenesis (TM)-To evaluate the ability of different XPC TALENs to induce TM at their endogenous loci, 293H cells were first plated at a density of 1.2 × 10^6 cells/10-cm dish. The next day, cells were transfected with a total amount of 2, 5, or 10 μg of TALEN-expressing vector or empty vector using Lipofectamine 2000 transfection reagent (Life Technologies) according to the manufacturer's protocol. Two or 3 days after transfection, genomic DNA was extracted, and the loci of interest were amplified with locus-specific primers (supplemental Table I) linked to adaptor sequences needed for the deep sequencing method. Amplicons were analyzed either by EndoT7 assay according to the protocol described in Ref. 15 or by deep sequencing using the 454 system (Roche Applied Science; an average of 5000 sequences/sample was analyzed).
5-Aza-2′-deoxycytidine Treatment, Bisulfite Treatment, and DNA Sequencing-To investigate the influence of 5-aza-dC on the methylation status of XPC loci and on XPC TALEN activities, 293H cells were pretreated daily for 2 days with 0.2 μM 5-aza-dC before transfection, and this treatment was maintained up to 48 h after transfection. To determine the level of DNA methylation, genomic DNA was extracted, treated with bisulfite according to the manufacturer's protocol (EZ DNA Methylation-Gold kit, Zymo Research), and then amplified by PCR using primers that were specific for the flanking regions of the bisulfite-treated XPC loci (supplemental Table I). PCR amplicons were then analyzed by regular sequencing or deep sequencing using the 454 system.
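As an illustration of how methylation levels can be read out from bisulfite sequencing, the following minimal sketch relies on the standard principle that bisulfite converts unmethylated cytosine to uracil (read as T after PCR), whereas 5mC is protected and still reads as C. It is not the analysis pipeline used in this work. The function name, the input format (gap-free reads already aligned to the reference and of equal length), and the toy sequences are assumptions made for the example.

```python
def methylation_per_cytosine(reference, reads):
    """Estimate the methylated fraction at each reference cytosine.

    reference: unconverted sequence of the sequenced strand (hypothetical input).
    reads: bisulfite reads, assumed gap-free and aligned to `reference`
           (same length). A 'C' call means protected (methylated),
           a 'T' call means converted (unmethylated).
    Returns {0-based position: fraction of informative reads showing 'C'}.
    """
    fractions = {}
    for pos, base in enumerate(reference.upper()):
        if base != "C":
            continue
        # Ignore calls other than C/T (sequencing errors, ambiguous bases).
        calls = [read[pos].upper() for read in reads if read[pos].upper() in "CT"]
        if calls:
            fractions[pos] = calls.count("C") / len(calls)
    return fractions


# Toy example with made-up sequences (not the XPC loci).
toy_reference = "ACGTCG"
toy_reads = ["ACGTTG", "ACGTTG", "ATGTTG", "ACGTCG"]
toy_levels = methylation_per_cytosine(toy_reference, toy_reads)
```

In this toy input, three of four reads keep C at position 1 and one of four keeps C at position 4, so the two cytosines are scored as 75% and 25% methylated, respectively.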
Toxicity Assay-The CHO-K1 cell line was transfected in a 96-well plate as described above, with increasing amounts of TALEN expression vectors and a constant amount of GFP-encoding plasmid. GFP levels were monitored by flow cytometry (Guava EasyCyte, Guava Technologies) 1 and 6 days after transfection. Cell survival was calculated as a ratio (TALEN-transfected cells expressing GFP at day 6/control-transfected cells expressing GFP at day 6). Ratios were corrected for the transfection efficiency determined at day 1 and plotted as a function of final concentration of DNA transfected (in mM).
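The survival calculation can be sketched as follows. The day-6 ratio is taken directly from the description above; the form of the day-1 correction is not specified in the text, so dividing by the day-1 TALEN/control ratio is an assumption of this sketch, as are the function and parameter names.

```python
def corrected_survival(talen_gfp_day6, control_gfp_day6,
                       talen_gfp_day1, control_gfp_day1):
    """Relative cell survival normalised to transfection efficiency.

    Inputs are percentages (or counts) of GFP-positive cells.
    Assumption: the correction is a division of the day-6 ratio
    by the corresponding day-1 ratio.
    """
    if min(control_gfp_day6, talen_gfp_day1, control_gfp_day1) <= 0:
        raise ValueError("GFP measurements used as denominators must be positive")
    raw_ratio = talen_gfp_day6 / control_gfp_day6          # survival at day 6
    efficiency_ratio = talen_gfp_day1 / control_gfp_day1   # transfection at day 1
    return raw_ratio / efficiency_ratio


# Made-up numbers for illustration only.
example_survival = corrected_survival(20.0, 40.0, 45.0, 50.0)
```

A value close to 1 would indicate that TALEN-transfected cells survive as well as control-transfected cells.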
TALE DNA Binding Domain Overexpression and Purification-The coding sequences for XPCT1L_HD and N* recombinant TALE DNA binding domains were subcloned into the MCS of the kanamycin-resistant pET-24 vector, located upstream of a His6 tag coding sequence. Recombinant proteins were overexpressed and purified according to the protocol described previously (16). Freshly purified recombinant XPCT1 left TALEN (XPCT1L) DNA binding domains (80% homogeneity, ~2 mg/ml) were used to perform equilibrium binding assays.
In Vitro Binding Assay-3′-Carboxyfluorescein (FAM)-labeled DNA single strand oligonucleotide, corresponding to XPC1 left target (XPC1L), was synthesized and HPLC-purified by Eurogentec (supplemental Table II). To prepare DNA duplex corresponding to XPC1L double strand DNA, XPC1L_Forward labeled with FAM on its 3′ end was mixed with 1 eq of XPC1L_Reverse in 100 mM Tris-HCl, 50 mM EDTA, 150 mM NaCl, pH 8. The mixture was heated to 95°C for 2 min and then cooled down to 25°C over 1 h. XPC1L duplex and FAM final concentrations were assessed by spectrophotometry using their respective extinction coefficients at 260 and 495 nm. As expected, a ratio [XPC1L duplex]/[FAM] of ~1 was obtained. This procedure was used to prepare all other DNA targets used in our experiments. Oligonucleotide sequences are documented in supplemental Table II. To investigate the binding of TALE DNA binding domain to XPCT1L duplex, 5 nM XPC1L duplex was incubated with increasing concentrations of XPCT1L TALE DNA binding domain (from 0 to 250 nM) in binding buffer (10 mM Tris-HCl, 175 mM NaCl, pH 8) at 25°C. After a 5-h incubation, necessary to reach equilibrium, the fluorescence anisotropy of the mixture was recorded with a PHERAstar Plus (BMG Labtech) operating in fluorescence polarization end point mode with excitation and emission wavelengths set to 495 and 520 nm, respectively.
To avoid any experimental bias due to the presence of protein aggregates in our purified protein preparations and to compare binding isotherms obtained from one protein with those obtained from another, the fraction of active recombinant proteins (i.e. folded proteins that are competent for DNA binding) was determined using a stoichiometry binding assay (17). Using three different concentrations of XPC1L unmethylated DNA duplex (100, 200, and 300 nM), we were able to determine that the mean percentages of active recombinant XPCT1L-HD and N* proteins were 9 and 13%, respectively. Variations of fluorescence anisotropy were represented as a function of active XPCT1L-HD or N* concentrations. All binding isotherms displayed positive cooperativity ([XPCT1L]_0.9/[XPCT1L]_0.1 ≈ 10, where the subscripts denote the protein concentrations giving 90 and 10% saturation), consistent with Ref. 18. Apparent dissociation constants (Kd) were determined with the Simfit/Sffit programs using a cooperative ligand binding saturation function (19).
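To make the cooperativity criterion concrete, the sketch below uses the Hill equation as a stand-in for the cooperative saturation function; the exact function fitted by Simfit is not given here, so the Hill form, the function names, and the example numbers are assumptions. For a Hill isotherm, the ratio of the concentrations giving 90 and 10% saturation equals 81^(1/n), so a noncooperative system (n = 1) gives a ratio of 81 and a ratio of about 10 corresponds to n of about 1.9.

```python
import math


def active_concentration(total_conc_nm, active_fraction):
    """Convert total protein concentration to active (DNA binding competent) protein."""
    return total_conc_nm * active_fraction


def hill_fraction_bound(conc, kd_app, n):
    """Fractional saturation for a Hill isotherm (assumed model, not Simfit's exact function)."""
    return conc ** n / (kd_app ** n + conc ** n)


def hill_coefficient_from_ratio(ratio_90_10):
    """Hill coefficient implied by the ratio [P]_0.9 / [P]_0.1.

    Derivation: theta = c^n / (Kd^n + c^n) gives c_theta = Kd * (theta / (1 - theta))^(1/n),
    hence c_0.9 / c_0.1 = (9 / (1/9))^(1/n) = 81^(1/n).
    """
    if ratio_90_10 <= 1:
        raise ValueError("ratio must be greater than 1")
    return math.log(81.0) / math.log(ratio_90_10)


# Ratio of ~10 as reported for the binding isotherms.
n_estimate = hill_coefficient_from_ratio(10.0)

# Example: 250 nM total XPCT1L-HD protein with 9% active fraction.
active_hd = active_concentration(250.0, 0.09)
```

With the 9% active fraction reported for XPCT1L-HD, the highest total concentration used (250 nM) corresponds to 22.5 nM active protein, which is why isotherms are plotted against active rather than total concentration.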
FoldX Modeling of TALE Repeat HD and N* in Complex with 5mC-TALE repeat N* and HD three-dimensional structures were obtained from the PthXo1 TAL effector crystal structure (Protein Data Bank (PDB) ID 3UGM) (20). TALE repeat N* 7 and TALE repeat HD 8 in complex with their respective cytosines were used as templates to model methylation at position 5 of each cytosine, using the FoldX software. It should be noted that the FoldX modeling algorithm did not allow for conformational change of the TALE repeat backbone.
U2OS Methylome Survey-The U2OS methylome documented by Ohm et al. (21) under the sample and series names GSM739944 and GSE29872, respectively, was surveyed using a BLAST search to look for the TALEN DNA targets documented by Reyon et al. (15) and determine their methylation status. The β value associated with the retrieved sequences (hits) indicates their methylation status. A score of 1 is expected for full methylation, 0 for absence of methylation, and 0 ≤ β ≤ 1 for all signals between (21).
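The interpretation of β values can be sketched as a simple classifier. The text defines only the two end points (0 and 1); the tolerance used here to call a site unmethylated or fully methylated is a hypothetical parameter chosen for illustration and is not taken from Ref. 21.

```python
def methylation_status(beta, tolerance=0.05):
    """Classify a beta value (0 = unmethylated, 1 = fully methylated).

    `tolerance` is a hypothetical cutoff around the two end points;
    it is an assumption of this sketch, not a published threshold.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie between 0 and 1")
    if beta >= 1.0 - tolerance:
        return "fully methylated"
    if beta <= tolerance:
        return "unmethylated"
    return "partially methylated"


# Made-up beta values for illustration.
example_calls = [methylation_status(b) for b in (0.98, 0.02, 0.50)]
```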

RESULTS AND DISCUSSION
Although there was no prior evidence that the TALE repeat N* could bind 5mC, our interest in exploring this possibility was driven by its dual ability to bind cytosine and thymine with the same efficiency (2). Due to its missing glycine residue in position 13, the RVD loop of TALE repeat N* does not extend as deep into the DNA major groove as the other TALE repeats (20). Consistent with its apparent dual specificity, this aspect of its structure suggested that N* could avoid steric clashes with the additional methyl moiety present at position 5 of thymine and, by extension, potentially accommodate the methyl present in 5mC. This property was unlikely to be shared by TALE repeat HD, which displayed hydrophobic interactions and hydrogen bonding with the aromatic ring and the amine moieties of cytosine (20,22). Therefore, as suggested by previous observations (7), TALE DNA binding domains bearing the TALE repeat HD are likely to be sensitive to cytosine methylation, and we envisioned that substituting HD repeats with N* repeats could overcome methylation sensitivity. To examine this hypothesis, we engineered two TALEN variants (XPCT1-HD and XPCT1-N*) containing either HD or N* at position +2 of their left TALE DNA binding domain (Fig. 1A) and compared their ability to introduce targeted DNA modifications into the methylated XPC locus, involved in the development of the human genetic disease xeroderma pigmentosum (23).
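The design logic behind the HD/N* substitution can be sketched in a few lines. The sketch assumes the commonly used one-repeat-per-base cipher (NI for A, HD for C, NN for G, NG for T) and replaces HD with N* wherever the target cytosine is known to be methylated. The function name, the 1-based repeat numbering, and the toy sequence are assumptions of this example; the sequence is not the XPC1 target.

```python
# Commonly used RVD cipher for unmodified bases.
RVD_FOR_BASE = {"A": "NI", "C": "HD", "G": "NN", "T": "NG"}
METHYL_TOLERANT_RVD = "N*"  # repeat proposed here to accommodate 5mC


def design_rvd_array(target, methylated_positions=()):
    """Return one RVD per target base, using N* at methylated cytosines.

    target: DNA sequence bound by the repeat array (hypothetical input).
    methylated_positions: 1-based positions of 5mC within `target`
    (the numbering convention is an assumption of this sketch).
    """
    target = target.upper()
    methylated = set(methylated_positions)
    for position in methylated:
        if not 1 <= position <= len(target):
            raise ValueError(f"position {position} is outside the target")
    rvds = []
    for position, base in enumerate(target, start=1):
        if base not in RVD_FOR_BASE:
            raise ValueError(f"unsupported base {base!r} at position {position}")
        if position in methylated:
            if base != "C":
                raise ValueError(f"position {position} is not a cytosine")
            rvds.append(METHYL_TOLERANT_RVD)
        else:
            rvds.append(RVD_FOR_BASE[base])
    return rvds


# Toy target with a methylated cytosine at position 2.
hd_variant = design_rvd_array("TCGA")
nstar_variant = design_rvd_array("TCGA", methylated_positions=[2])
```

The two variants differ only at the methylated position, mirroring the XPCT1-HD and XPCT1-N* pair compared in this study.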
To first check whether substituting HD with N* affected the intrinsic nuclease activity of XPCT1, we performed an SSA assay in Chinese hamster ovary (CHO) cells (14), using an unmethylated extrachromosomal XPC1 target (Fig. 1). Our results showed that both TALENs displayed similar intrinsic activity (Fig. 1B). We then assessed the ability of these TALENs to disrupt the endogenous methylated XPC1 target in 293H cells by TM. TALEN-induced TM, consisting of small insertions or deletions of nucleotides generated via imprecise nonhomologous end joining, was determined by an EndoT7 assay and/or by deep sequencing as described previously (15,24). We observed that although both TALENs share similar intrinsic nuclease activities, XPCT1-N* induced at least 17 times more TM events than XPCT1-HD (Fig. 1C and supplemental Fig. 1A). The difference in TM frequency induced by the two TALENs was due neither to variation of transfection efficiency from one TALEN to another nor to a difference in protein expression (supplemental Fig. 1B), thus suggesting a possible involvement of 5mC in the observed difference in TM efficiency. To evaluate whether this difference of activity originated from the presence of 5mC in the endogenous XPC1 target, we tested the ability of 5-aza-dC, a well known demethylating agent, to rescue XPCT1-HD activity. Interestingly, treatment of 293H cells with 200 nM 5-aza-dC resulted in about 20% of XPC1 demethylation (supplemental Fig. 2A) and induced a more than 6-fold increase of XPCT1-HD-dependent TM frequency (supplemental Fig. 2B, left panel) without influencing its protein expression (supplemental Fig. 2C). In contrast, XPCT1-N* activity remained unchanged (supplemental Fig. 2B, right panel), indicating that TALE repeat N* binding capacity was not influenced by the presence of the methyl moiety of 5mC. Together, our results showed that XPCT1-HD was sensitive to cytosine methylation and that such sensitivity can be efficiently overcome by substituting the TALE repeat HD with N* at position +2 of its DNA binding domain.
FIGURE 1. TALE repeat N* enables overcoming XPCT1 TALEN sensitivity to 5mC in 293H cells. A, schematic representation of the XPCT1 TALEN model used to investigate the influence of TALE repeat N* on TALE DNA binding domain sensitivity to 5mC. Sequence of the XPC1 DNA target is indicated; the 5mC located at position +2 of the left target is colored in red, and the 11-bp spacer between the right and left targets is colored in blue. TALE repeat array sequences bearing HD (blue), NI (yellow), NN (green), and NG (purple) and constituting the TALE DNA binding domains of XPCT1 left (XPCT1L-HD or N*) and right (XPCT1R) are indicated. N-terminal, C-terminal, and FokI domains are colored in black. B, intrinsic nuclease activity of XPCT1-HD or N* TALENs investigated by SSA assay using an extrachromosomal unmethylated XPC1 target (14). C, TM of endogenous methylated XPC1 target induced by 5 μg of XPCT1-HD- or N* TALEN-encoding plasmids in 293H cells, determined by deep sequencing (n = 2). Error bars indicate S.D.
NOVEMBER 9, 2012 • VOLUME 287 • NUMBER 46

Efficient Recognition of 5-methylated Cytosine by TALE Repeat N*
To investigate the molecular mechanism underlying sensitivity of XPCT1-HD to 5mC, the DNA binding affinity of the XPCT1L was investigated in vitro in the presence of unmethylated or methylated DNA targets. TALE DNA binding domains of XPCT1L-HD and XPCT1L-N* fused to an N-terminal His tag were overexpressed in Escherichia coli, purified by affinity column chromatography, and used to perform equilibrium binding assays with fluorescent XPCT1L DNA targets (supplemental Fig. 3). Binding isotherms, obtained from the variation of DNA fluorescence anisotropy upon increasing protein concentration, showed that full methylation of XPC1L DNA target impaired the DNA binding capacity of XPCT1L-HD (supplemental Fig. 3A, compare open circles and open triangles; supplemental Table IV). This impairment was essentially due to the presence of 5mC at position +2 of the sense strand (supplemental Fig. 3A, closed triangles; supplemental Table IV), in full agreement with the general DNA binding mode of the TALE repeat, which is known to interact exclusively with the sense strand (2,3,20,22). In contrast, the DNA binding capacity of XPCT1L-N* remained unaffected by the presence of 5mC on both DNA strands (supplemental Fig. 6B). We thus conclude that the additional methyl moiety present in 5mC induces a binding penalty that prevents proficient binding by the TALE repeat HD. Such a binding penalty is alleviated in the case of TALE repeat N*, probably because of its peculiar RVD loop structure. The availability of three-dimensional coordinates of TALE repeats HD and N* in complex with cytosine (20) allowed us to rationalize this assumption at a molecular level. Indeed, when the TALE repeat HD was modeled in the presence of 5mC, an obvious steric clash could be observed between the Cβ of aspartate 13 (D) and the methyl moiety of 5mC (distance between D_Cβ and C_C5 <3 Å, supplemental Fig. 4B), thus implying that the RVD HD loop must undergo conformational changes to accommodate 5mC.
TALE repeat N*, however, remained free of any steric hindrance with regard to 5mC (supplemental Fig. 4C), explaining its ability to efficiently accommodate the 5mC in position +2 of the endogenous XPC1 target. Interestingly, our in vitro results also showed that XPCT1-N* and XPCT1-HD harbored the same affinity for unmethylated XPC1L target (supplemental Fig. 3 and supplemental Table IV). This indicates that the binding penalty caused by the loss of interactions between aspartate 13 and cytosine (supplemental Fig. 4, B and C, top) is not sufficient to affect the overall TALE DNA binding affinity.
In light of these structural and biochemical considerations, we hypothesized that other naturally occurring TALE repeats, either lacking or harboring small side chain residues at position 13, could also bind 5mC. To test this, we assessed the ability of TALE repeats H* and NG to substitute for HD within the XPCT1 TALE DNA binding domain and rescue its activity toward its endogenous methylated locus in 293H cells (supplemental Fig. 5). Our results showed that both TALE repeats H* and NG could rescue XPCT1 activity, with a clear advantage for H*, which was almost as efficient as N*. We thus conclude that although small amino acids in position 13 can accommodate 5mC, absence of such amino acids, the hallmark of the TALE repeat *, leads to more proficient 5mC recognition.
The ability of TALE repeat N* to overcome TALE DNA binding domain sensitivity to 5mC was then challenged using two other engineered TALENs, XPCT2 and XPCT3, specifically designed to process different methylated endogenous XPC targets (XPC2 and XPC3). These targets contained one and two 5mCs, respectively, located at different positions (Fig. 2A), making it possible to evaluate the influence of the number and position of N* repeats in a TALE DNA binding domain. TALEN activities of XPCT2-N* and XPCT3-N* were determined in 293H cells according to the protocol described above and then compared with their HD counterparts (Fig. 2B). Our EndoT7 assays and deep sequencing results showed that N* variants were always the most active, indicating that TALE repeat N* is able to successfully bind 5mC in different contexts. Interestingly, the basal activities of TALEN-HD variants and their enhancement observed after HD/N* substitution were different from one TALEN to another (from 0.7 to 3% of TM frequency and from 2- to 17-fold enhancement, respectively; Fig. 2B). These results suggest that the binding penalty induced by 5mC and the activity enhancement promoted by HD/N* substitution depend on the number and the position of 5mC within the TALE DNA binding site. However, we cannot rule out the influence of TALE DNA target sequence on the overall sensitivity of TALENs to 5mC, and further experiments are needed to address this point.
Finally, we verified that HD/N* substitution within TALE DNA binding domains did not increase TALEN-induced toxicity in CHO cells using the protocol described by Grizot et al. (14). For all TALENs tested, we found that the presence of single or multiple TALE repeats N* did not influence TALEN-induced toxicity, as seen by similar cell survival patterns obtained between HD and N* variants (supplemental Fig. 6). In full agreement with their lack of toxicity, TALEN-N* variants displayed similar TM frequencies in 293H cells, 3 or 7 days after transfection (less than 2-fold difference of TM frequencies was observed between day 3 and day 7, data not shown). Consistent with this absence of toxicity, naturally occurring TAL effectors were reported to bear up to 23% of TALE repeat N* (2, 3) within their DNA binding domain, while retaining high specificity. Therefore, taken together, our results showed that the TALE repeat N* could be used as a universal 5mC binding module without affecting toxicity of engineered TALE DNA binding domains.
The negative impact of cytosine methylation on TALEN activity could also be found in the recently published work of Reyon et al. (15). This extensive work reported that 12 out of 96 different TALENs tested in U2OS cells were inactive, without clearly identifying the reason for this failure. The presence of multiple CpG dinucleotides in almost all DNA sequences targeted in this study prompted us to look for a possible correlation between TALEN activity and CpG methylation of their respective targets. A BLAST survey of the U2OS cell methylome (21) enabled us to identify 15 different TALEN targets and determine their methylation status (supplemental Table V). Among them, two were found to be fully methylated. Interestingly, these two methylated sequences, named HOXD11 and TLX3, fell into the group of unprocessed targets reported by Reyon et al. (15), which was consistent with the inhibitory effect of cytosine methylation on TALEN activity (supplemental Fig. 7). The 13 other unmethylated targets were reported to be processed with widely variable outcomes (15). The failure to process the unmethylated TGFBR2 target represents an extreme case indicating that other factors, including chromatin compaction, target sequence, or TALEN expression as the most obvious ones, could also significantly influence TALEN activity.
The identification of TALE repeat N* as a universal 5mC binding module unmasks exciting prospects for the fields of genome engineering, gene regulation, and gene therapy. Among the most promising ones is the ability to reprogram somatic cells into iPS cells by dTALE-mediated activations of pluripotency genes, known to be silenced by hypermethylation (c-Myc and Oct4) (7). A similar approach could also be used to prevent development of a wide variety of malignant cell lines by promoting activation of hypermethylated genes involved in tumor suppression (25,26). Finally, the successful TALEN-mediated processing of methylated XPC loci paved the way for XPC gene correction in a patient cell line, a step further in xeroderma pigmentosum-targeted gene therapy.
In summary, our work demonstrates the ability of TALE repeat N* to efficiently accommodate 5mC. Based on this finding, we present a simple, efficient, and universal method to overcome TALE DNA binding domain sensitivity to cytosine methylation. Such a method presents three major advantages. First, it allows one to bypass the need for chemical demethylation of endogenous targets, which is unsuitable for cell engineering and therapeutic applications. Second, it is readily applicable to all TALE-derived proteins, and in particular, to engineered transcription activators, thus potentially enabling site-specific activation of methylated promoters responsible for gene silencing. Third, it is transposable to the broad range of cellular systems including ES, iPS mammalian cells, and plant cells that have already been shown to be engineerable with TALE-based technologies.