Factor-induced Reprogramming and Zinc Finger Nuclease-aided Gene Targeting Cause Different Genome Instability in β-Thalassemia Induced Pluripotent Stem Cells (iPSCs)

Background: Genome alterations need to be investigated before iPSCs can be applied clinically.
Results: Reprogramming and gene targeting can each generate substantial, but different, genomic variations.
Conclusion: Stringent genomic monitoring and selection are needed both at the time of iPSC derivation and after gene targeting.
Significance: This study examined genome instability during iPSC generation and subsequent gene correction and revealed different genome alterations at each step.

The generation of personalized induced pluripotent stem cells (iPSCs) followed by targeted genome editing provides an opportunity for developing customized, effective cellular therapies for genetic disorders. However, it is critical to ascertain whether edited iPSCs harbor unfavorable genomic variations before their clinical application. To examine the mutation status of the edited iPSC genome and trace the origin of possible mutations at different steps, we generated virus-free iPSCs from amniotic cells carrying a homozygous point mutation in the β-hemoglobin gene (HBB) that causes severe β-thalassemia (β-Thal), corrected the mutation in both HBB alleles by zinc finger nuclease-aided gene targeting, and obtained the final HBB gene-corrected iPSCs by excising the exogenous drug resistance gene with Cre recombinase. Through comparative genomic hybridization and whole-exome sequencing, we uncovered seven copy number variations, five small insertions/deletions, and 64 single nucleotide variations (SNVs) in β-Thal iPSCs before the gene targeting step, and a single small copy number variation, 19 insertions/deletions, and 340 SNVs in the final gene-corrected β-Thal iPSCs. Our data revealed that substantial but different genomic variations occurred during the factor-induced somatic cell reprogramming and zinc finger nuclease-aided gene targeting steps, suggesting that stringent genomic monitoring and selection are needed both at the time of iPSC derivation and after gene targeting.

Human iPSCs can undergo indefinite self-renewal while maintaining the potential to generate all somatic cell types in the body, thus opening up new avenues for developmental biology research, disease modeling, and regenerative medicine. Indeed, iPSC generation combined with targeted genome editing has been used to model various genetic diseases, including β-Thal (1,2). β-Thal is one of the most common genetic diseases in the world, and patients with severe anemia require regular blood transfusions. It is caused by mutations or deletions in the β-hemoglobin gene (HBB) that disrupt the normal function of red blood cells (3,4). Currently, transplantation of bone marrow from a healthy donor is the only cure for β-Thal, but this treatment is limited by the lack of human leukocyte antigen-matched donors. In principle, generating iPSCs from β-Thal patients and then correcting the mutated HBB by targeted genome editing could provide an ideal new treatment for this disease (5). Recently developed genome-editing tools, such as zinc finger nucleases (ZFNs) (6), transcription activator-like effector nucleases (7), and clustered regularly interspaced short palindromic repeat/Cas9-based RNA-guided DNA endonucleases (8), have significantly improved gene targeting efficiency in human iPSCs and embryonic stem cells, making it practicable to generate personalized, gene-corrected iPSCs for cell therapy. However, before this type of cellular therapy is applied in clinical practice, it is critical to evaluate whether the reprogramming and subsequent gene targeting steps generate unwanted genome alterations.
Generating gene-corrected iPSCs requires both factor-induced somatic reprogramming and nuclease-aided gene targeting, and the impact of each step on genome stability has drawn considerable attention. For example, iPSCs were reported to carry copy number variations (CNVs) more frequently than other cell types, such as ES cells and somatic cells (9,10), and some of these CNVs have been attributed to the reprogramming process (11–14). In contrast, another report found very few nucleotide-level variations, such as nonsynonymous single nucleotide variations (SNVs) and insertions/deletions (Indels), in iPSCs generated through a non-viral approach (15). Similarly, the impact on genome stability of genome-editing tools, such as transcription activator-like effector nucleases and clustered regularly interspaced short palindromic repeat/Cas9, has also been analyzed (16). In general, whole-genome sequencing data suggest that these tools do not induce substantial genome variation (17–19), implying that they might be safe for clinical applications.
The current study was designed to examine the genome variations generated throughout the process of producing gene-corrected β-Thal iPSCs, including non-viral iPSC generation, clonal selection, expansion, genome editing, and exogenous gene excision. We first generated an integration-free β-Thal iPSC line from amniocytes carrying a homozygous point mutation in the second intron of HBB (site 654). We then corrected both mutated HBB alleles by ZFN-aided gene targeting and excised the exogenous drug resistance genes to obtain the final HBB-corrected iPSCs. Next, we performed sequential comparative genomic hybridization (CGH) and exome sequencing on the parental cells used for iPSC derivation, the iPSCs before gene correction, and the final gene-corrected β-Thal iPSCs. Our results showed that iPSC derivation, even with a non-viral approach, generated a number of variations, including both CNVs and SNVs, whereas the subsequent ZFN-aided gene targeting caused negligible CNVs but many more SNVs. Our analysis indicates that factor-induced somatic cell reprogramming and ZFN-aided gene targeting tend to generate different types of genomic variation. These variations need to be carefully analyzed and evaluated before personalized gene-corrected iPSCs are used clinically for cellular therapy.

EXPERIMENTAL PROCEDURES
iPSC Generation and Cell Culture-Amniotic fluid cells were isolated from a β-thalassemia patient and reprogrammed into iPSCs as described previously (2,20). The amniotic fluid cells were cultured in Amniogrow PLUS (Cytogen), and the iPSCs were maintained in mTeSR1 (Stemcell Technologies). All cell types were maintained at 5% CO2.
ZFNs and Donor Vectors for Gene Targeting-ZFNs were designed by Sigma-Aldrich. The ZFN pair was designed to target a region on the 3′ side of the HBB gene, ~600 bp downstream of its last exon (see Fig. 1D, underlined). The HBB-ZFN pair recognition sequences were 5′-CACTCTTTCACAGTCTGC and 5′-CTAAGCCCAGTCCTT. The two ZFNs were expressed from separate plasmids under the control of the CMV promoter. The left and right homology arms were amplified from the genomic DNA of a healthy individual; primer sets HBBL-F/R and HBBR-F/R amplified the 2.3-kb left arm and the 1.5-kb right arm, respectively. A loxP-flanked PGK-neomycin or loxP-flanked PGK-puromycin cassette was inserted between the two homology arms in the pMD-18T vector (Takara). For targeting, 1 × 10⁶ iPSCs were electroporated with 2 μg of donor DNA and 4.5 μg of each ZFN plasmid. The electroporated iPSCs were then plated onto Matrigel-coated 6-well plates in the presence of Y-27632 (10 nM; Sigma) for 1 day. Positive clones were selected with puromycin (0.5 μg/ml) or G418 (100 μg/ml; Sigma) in mTeSR1. Primer sequences are listed in Table 1.
PCR Detection of Corrected Clones-Genomic DNA was extracted using the TIANamp Genomic DNA kit (Tiangen) for PCR analysis. All PCRs used 50–100 ng of genomic DNA template and LA Taq (Takara). Primer pair P1/P2 was used to amplify a 2.8-kb product spanning the 5′ junction of a targeted integration (see Fig. 1D). Primer pair P3/P4 was used to amplify a 2-kb or a 500-bp product to determine whether random integration had occurred. Primer pair IVS-654-F/R was used to amplify a 600-bp product containing the mutant region of HBB, and the PCR products were sequenced to identify corrected clones. All primer sequences are listed in Table 1.
Southern Blotting-A 502-bp HBB-specific probe located 5′ of the left homology arm was synthesized by PCR amplification using the primer pair 5′probe-F/R and a DIG-dUTP labeling kit (Roche Applied Science). Genomic DNA was digested with BglII, and standard Southern blotting was then performed following the instructions of the DIG High Prime DNA Labeling and Detection Starter Kit II (Roche Applied Science).
Flow Cytometry Analysis-Cells were dissociated with 0.25% trypsin (Invitrogen) and fixed with 1% paraformaldehyde for 10 min at 37°C. After washing with 2% fetal bovine serum (FBS; Excell) in PBS, cells were permeabilized with 90% methanol for 30 min on ice. After washing, cells were incubated with primary antibodies for 30 min at 37°C; control samples were incubated with isotype control antibodies under the same conditions. After washing, cells were incubated with secondary antibodies for 30 min at 37°C. The cells were then washed, resuspended in PBS, and analyzed on an Accuri C6 flow cytometer (BD Biosciences) (21). The antibodies used were OCT3/4 (Santa Cruz Biotechnology, sc-5279), SSEA4 (Abcam, AB16287), and HBB (Santa Cruz Biotechnology, sc-21757).

Quantitative Real Time PCR-Total RNA was extracted using TRIzol (Invitrogen) and reverse transcribed using oligo(dT) (Takara), and quantitative PCR was performed on a CFX96 system (Bio-Rad) with the SYBR Green Premix Ex Taq kit (Takara) following the manufacturers' instructions. β-Actin was used for normalization in quantitative RT-PCR, and all measurements were made in triplicate. Primer sequences are listed in Table 1.
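The text states only that β-actin served as the normalizer and that measurements were made in triplicate; it does not name the quantification model. A common choice is the comparative Ct (2^−ΔΔCt) method, which assumes near-100% amplification efficiency for both the target and reference primers. The minimal sketch below applies that method to hypothetical Ct triplicates; the function name and all values are illustrative assumptions, not data or code from this study.

```python
from statistics import mean


def fold_change_ddct(target_sample, ref_sample, target_control, ref_control):
    """Relative expression by the comparative Ct (2^-ddCt) method.

    Each argument is a list of technical-replicate Ct values (e.g. triplicates).
    Assumes ~100% PCR efficiency for both the target gene and the reference
    gene (here, beta-actin).
    """
    # Normalize the target gene to the reference gene within each sample.
    dct_sample = mean(target_sample) - mean(ref_sample)
    dct_control = mean(target_control) - mean(ref_control)
    # Compare the normalized values between the test and control samples.
    ddct = dct_sample - dct_control
    return 2 ** (-ddct)


# Hypothetical triplicate Ct values (not from this study):
# target gene and beta-actin in a test sample and in a control sample.
fc = fold_change_ddct(
    target_sample=[22.0, 22.1, 21.9],
    ref_sample=[16.0, 16.1, 15.9],
    target_control=[24.0, 24.1, 23.9],
    ref_control=[16.0, 16.0, 16.0],
)
print(f"fold change: {fc:.2f}")
```

A target that crosses threshold two cycles earlier after normalization corresponds to roughly a fourfold higher level under this model; if primer efficiencies differ markedly, an efficiency-corrected model would be more appropriate.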
Erythroblast Differentiation of Human iPS Cells-βThal654_iPS and βThal654_iPSCre16 cells were harvested by Dispase (Invitrogen) digestion and co-cultured with OP9 stromal cells for 8 days at 2.5 × 10⁶ cells/10-cm dish in 20 ml of α-minimum Eagle's medium (Gibco) supplemented with 10% FBS (HyClone), 100 μM monothioglycerol (Sigma), and 100 μM vitamin C. Half of the culture medium was changed on days 4 and 6. On day 8, CD34+ cells were isolated directly using a CD34 Progenitor Cell Isolation kit (Miltenyi Biotec). Hematopoietic colony-forming unit (cfu) assays were performed on 35-mm low-adherence plastic dishes (Monroe) using 2 ml/dish MethoCult GF+ H4435 semisolid medium (Stemcell Technologies) following the manufacturer's instructions. Approximately 5 × 10⁵ CD34+ cells sorted by magnetic activated cell sorting were used for cfu assays. Colonies were counted after 12–14 days.

High Resolution Assay of Comparative Genomic Hybridization Microarray and Genome-wide Copy Number Variation Analyses-Genomic DNAs extracted from donor cells (amniotic fluid cells), βThal654_iPS cells, and βThal654_iPSCre16 cells were digested with AluI and RsaI. A SureTag DNA labeling kit (Agilent) was used for DNA labeling. Two hybridizations were performed. First, βThal654_iPS cell DNA was labeled with Cy5-dUTP and donor amniotic fluid cell DNA with Cy3-dUTP, and the two labeled DNAs were co-hybridized following the instructions of the SurePrint G3 human CGH microarray kit (1 × 1M, Agilent). Second, βThal654_iPSCre16 cell DNA was labeled with Cy5-dUTP and βThal654_iPS cell DNA with Cy3-dUTP, and the two labeled DNAs were co-hybridized in the same way. DNA samples were processed and microarrays were handled and scanned according to the Agilent oligonucleotide array-based CGH protocol, version 6.0. The scanned profiles were then processed and analyzed with Feature Extraction 10.7.3.1 (Agilent) and Workbench 7.0 (Agilent). The threshold of the Aberration Detection Method-2 (ADM-2) algorithm was set to 6.0 with Fuzzy Zero. A CNV was called when at least four consecutive probes had log2 ratios (Cy5/Cy3) consistent with a duplication or deletion; only duplications and deletions larger than 1 kb were reported.
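ADM-2 is Agilent's own segmentation algorithm and is not reproduced here. As a much simpler stand-in, the sketch below applies only the reporting rule stated above: a call requires at least four consecutive probes whose log2(Cy5/Cy3) ratios point in the same direction, spanning more than 1 kb. The ±0.3 log2-ratio cutoff, the `Probe` layout, and the probe data are hypothetical assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    chrom: str
    pos: int           # probe genomic position (bp)
    log2_ratio: float  # log2(Cy5/Cy3): test DNA vs. reference DNA


def call_cnvs(probes, cutoff=0.3, min_probes=4, min_size_bp=1000):
    """Return (chrom, start, end, 'gain'|'loss', n_probes) tuples.

    Simplified run-based caller (NOT Agilent ADM-2): consecutive probes on
    one chromosome with log2 ratio >= cutoff form a gain, <= -cutoff a loss.
    Runs with fewer than min_probes probes, or spanning <= min_size_bp, are
    discarded. The cutoff value is a hypothetical choice.
    """
    def state(p):
        if p.log2_ratio >= cutoff:
            return "gain"
        if p.log2_ratio <= -cutoff:
            return "loss"
        return None

    calls, run = [], []

    def flush():
        # Keep the current run only if it satisfies both size rules.
        if len(run) >= min_probes:
            start, end = run[0].pos, run[-1].pos
            if end - start > min_size_bp:
                calls.append((run[0].chrom, start, end, state(run[0]), len(run)))

    for p in sorted(probes, key=lambda p: (p.chrom, p.pos)):
        s = state(p)
        # A change of direction, a neutral probe, or a new chromosome ends a run.
        if run and (s != state(run[-1]) or p.chrom != run[-1].chrom):
            flush()
            run = []
        if s is not None:
            run.append(p)
    flush()
    return calls


# Hypothetical probes spaced 500 bp apart on one chromosome.
ratios = [0.0, 0.05, -0.6, -0.7, -0.55, -0.65, 0.02, 0.5, 0.6, 0.0]
probes = [Probe("chr1", 1000 + 500 * i, r) for i, r in enumerate(ratios)]
print(call_cnvs(probes))
```

Here the four-probe loss spanning 1.5 kb is kept, while the two-probe gain is discarded by the four-probe rule. A real segmentation algorithm such as ADM-2 weighs probe-level noise statistically instead of using a fixed cutoff.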
Exome Sequencing-Genomic DNAs were extracted from donor amniotic fluid cells, βThal654_iPS cells, and βThal654_iPSCre16 cells. The exome was captured with SeqCap EZ Exome 64M (Roche NimbleGen), and sequencing libraries were prepared with a TruSeq DNA sample preparation kit (Illumina), following the manufacturers' instructions. All sequencing was carried out on an Illumina HiSeq 2000 sequencer with multiplexed 2 × 100-nucleotide paired-end reads. Human genome build GRCh37 (hg19) was used as the reference genome. The paired-end reads were mapped onto the reference genome using Burrows-Wheeler Aligner version 0.5.9. Potential PCR duplicates were removed using SAMtools (version 0.1.18), and mapping statistics were summarized with samtools flagstat.
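For readers reproducing the mapping step, the sketch below assembles the commands in the order described, using the BWA 0.5.x aln/sampe workflow (bwa mem is not part of 0.5.9) and SAMtools 0.1.x syntax (`view -bS`, the old-style `sort <in.bam> <out.prefix>`, `rmdup`, and `flagstat`). All file names are hypothetical, default options are assumed throughout, and the exact parameters used in this study are not given in the text. Commands are only printed unless `run_pipeline` is called.

```python
import subprocess


def build_commands(ref="hg19.fa", r1="sample_R1.fq", r2="sample_R2.fq", prefix="sample"):
    """Return (argv, stdout_file) pairs for a BWA 0.5.x / SAMtools 0.1.x run.

    File names are hypothetical placeholders. stdout_file is None when the
    tool writes its own output file.
    """
    return [
        # Build the BWA index for the hg19 reference (large-genome algorithm).
        (["bwa", "index", "-a", "bwtsw", ref], None),
        # Align each mate separately, then pair the alignments with sampe.
        (["bwa", "aln", ref, r1], f"{prefix}_R1.sai"),
        (["bwa", "aln", ref, r2], f"{prefix}_R2.sai"),
        (["bwa", "sampe", ref, f"{prefix}_R1.sai", f"{prefix}_R2.sai", r1, r2],
         f"{prefix}.sam"),
        # Convert SAM to BAM and coordinate-sort (SAMtools 0.1.x syntax).
        (["samtools", "view", "-bS", f"{prefix}.sam"], f"{prefix}.bam"),
        (["samtools", "sort", f"{prefix}.bam", f"{prefix}.sorted"], None),
        # Remove likely PCR duplicates, then summarize the mapping.
        (["samtools", "rmdup", f"{prefix}.sorted.bam", f"{prefix}.rmdup.bam"], None),
        (["samtools", "flagstat", f"{prefix}.rmdup.bam"], f"{prefix}.flagstat.txt"),
    ]


def run_pipeline(commands):
    """Execute the commands in order; raise on the first failure."""
    for argv, out in commands:
        if out is None:
            subprocess.run(argv, check=True)
        else:
            with open(out, "w") as fh:
                subprocess.run(argv, stdout=fh, check=True)


# Dry run: show the shell-equivalent commands without executing them.
for argv, out in build_commands():
    print(" ".join(argv) + (f" > {out}" if out else ""))
```

Keeping command construction separate from execution makes it easy to inspect or log the exact pipeline before committing compute time to it.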
SNV and Indel Analyses-Targeted genomic regions were sequenced to at least 30× coverage. Candidate βThal654_iPS mutations were defined as variants present in the βThal654_iPS exome but not in the donor amniotic cells, and candidate βThal654_iPSCre16 mutations were defined as variants present in the βThal654_iPSCre16 exome but not in βThal654_iPS. To exclude false positives caused by insufficient sequencing depth, we first filtered out Indels and SNVs with a sequencing depth below 10×. For Indel analysis, we also filtered out calls located in direct repeats, homopolymers, and other repetitive sequences, which are prone to artifacts owing to the technical limitations of high-throughput, short-read sequencing technologies (16). Selected SNVs and Indels were validated by Sanger sequencing. All primer sequences are listed in Table 1.
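The candidate-calling logic above amounts to a set difference between a derived line and its parent, followed by depth and sequence-context filters. The sketch below illustrates that logic on hypothetical variant records. The `Variant` layout, the 10× cutoff applied to the derived sample's depth, and the homopolymer rule (discard an Indel whose flanking reference sequence contains a run of five or more identical bases) are assumptions; the direct-repeat and other repeat filters are omitted for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str


def longest_homopolymer(seq):
    """Length of the longest run of a single repeated base in seq."""
    best = run = 0
    prev = ""
    for base in seq.upper():
        run = run + 1 if base == prev else 1
        prev = base
        best = max(best, run)
    return best


def candidate_mutations(derived, parent, depth, context, min_depth=10, max_hp=4):
    """Variants in the derived line absent from its parent, after filtering.

    derived, parent: sets of Variant calls (e.g. iPSC vs. donor cells).
    depth: Variant -> read depth in the derived sample.
    context: Variant -> flanking reference sequence (checked for Indels only).
    Homopolymer runs longer than max_hp (a hypothetical threshold) exclude an Indel.
    """
    kept = []
    for v in sorted(derived - parent, key=lambda v: (v.chrom, v.pos)):
        if depth.get(v, 0) < min_depth:
            continue  # too shallow to trust
        is_indel = len(v.ref) != len(v.alt)
        if is_indel and longest_homopolymer(context.get(v, "")) > max_hp:
            continue  # homopolymer context: likely sequencing artifact
        kept.append(v)
    return kept


# Hypothetical example records (not data from this study).
inherited = Variant("chr1", 100, "A", "G")
snv = Variant("chr2", 200, "C", "T")
shallow = Variant("chr3", 300, "G", "A")
hp_indel = Variant("chr4", 400, "A", "AT")
good_indel = Variant("chr5", 500, "CAG", "C")

parent = {inherited}
derived = {inherited, snv, shallow, hp_indel, good_indel}
depth = {snv: 35, shallow: 6, hp_indel: 40, good_indel: 28}
context = {hp_indel: "GCTTTTTTTAGC", good_indel: "ACGTCAGTGCA"}

result = candidate_mutations(derived, parent, depth, context)
print(result)
```

The inherited variant is removed by the set difference, the 6× call by the depth filter, and the insertion next to a seven-base T run by the homopolymer filter.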

RESULTS

Correction of Homozygous Mutations of HBB Genes in β-Thal iPSCs with Aid of ZFNs-We previously derived an iPSC line, named βThal654_iPS, from the amniotic cells of a fetus diagnosed with β-Thal major (IVS2-654). This cell line carries a homozygous C→T mutation in the second intron of the HBB gene (2). A reporter assay showed that the ZFNs we designed for HBB targeting exhibited satisfactory activity and specificity (2) (Fig. 1, A, B, and C). Because we failed to obtain an iPSC line with both HBB alleles corrected in a single round of gene targeting, we used a two-step strategy to correct the mutated HBB alleles sequentially with HBB-specific ZFNs (Fig. 1D). To this end, we constructed two targeting vectors containing different drug resistance genes, one for neomycin and the other for puromycin, to achieve homologous recombination (Fig. 1D). We first introduced the neomycin-resistant donor template together with the ZFNs into βThal654_iPS cells. After G418 selection, we obtained iPSCs with a single HBB allele targeted, which were named βThal654_iPSG2 (Fig. 1E). The correction was further confirmed by genomic PCR, Southern blotting, and Sanger sequencing (Fig. 1, F, G, and H). Similarly, we introduced the second donor template, carrying the puromycin resistance gene, together with the ZFNs into βThal654_iPSG2 and obtained an iPSC line with both HBB alleles targeted, named βThal654_iPSG2Pu11 (Fig. 1E). Both drug resistance genes were then excised by Cre recombinase to generate the final gene-corrected iPSCs, named βThal654_iPSCre16 (Fig. 1E). This clone was validated by genomic PCR and Southern blotting (Fig. 1, F and G), and Sanger sequencing confirmed that the C→T mutation was corrected in both alleles of βThal654_iPSCre16 (Fig. 1H). G-banding analysis showed that both uncorrected (2) and corrected β-Thal iPSCs maintained a normal karyotype (Fig. 1I).
Characterization of the Gene-corrected iPSCs-The gene-corrected βThal654_iPSCre16 cells exhibited typical human embryonic stem cell morphology (Fig. 1E) and expressed pluripotency markers, including OCT4, SOX2, NANOG, and SSEA4, as detected by FACS and quantitative RT-PCR (Fig. 2, A and B). When injected into immunodeficient mice, the corrected βThal654_iPSCre16 cells formed teratomas containing all three germ layers (Fig. 2C). These data demonstrate that the iPSCs retained pluripotency after ZFN-mediated gene targeting.
To further examine whether correcting the disease-causing mutation could restore normal HBB expression, we performed hematopoietic differentiation of uncorrected and corrected β-Thal iPSCs using an OP9 co-culture protocol described previously (2,22). Upon OP9 co-culture, both uncorrected and corrected β-Thal iPSCs differentiated rapidly and produced CD34+/CD43+ hematopoietic progenitor cells (23). These iPSC-derived hematopoietic progenitor cells could further differentiate into various mature blood lineages, as analyzed by a cfu assay (Fig. 2D). In semisolid culture, all colony types were observed, including erythrocyte, granulocyte, megakaryocyte, granulocyte/megakaryocyte, and erythrocyte/granulocyte/megakaryocyte/macrophage colonies (Fig. 2E). To examine HBB expression, we manually picked erythrocyte colonies and analyzed HBB expression by quantitative RT-PCR and FACS. Because the C→T mutation in the second intron of HBB causes aberrant splicing of the full-length mRNA, its correction should restore normal β-globin expression in red blood cells. Indeed, β-globin levels were significantly higher in erythroid cells derived from gene-corrected β-Thal iPSCs than in those derived from their uncorrected counterparts (Fig. 2F). Thus, these data demonstrate that the gene-corrected β-Thal iPSCs retained the capacity to differentiate into blood lineages and that our correction restored HBB expression.
Copy Number Variations-To assess subchromosomal changes during reprogramming and gene targeting, we performed sequential array-based comparative genomic hybridization (aCGH) on the original amniocytes, the amniocyte-derived βThal654_iPS cells, and the gene-corrected βThal654_iPSCre16 cells (Fig. 3). Using the genome of the original donor cells as a reference, aCGH detected several large deletions and duplications in βThal654_iPS cells after reprogramming, comprising three deletions and four duplications that affected 20 genes (Table 2). Surprisingly, we detected only one small deletion in gene-corrected βThal654_iPSCre16 cells (Table 3) when compared with the genome of their uncorrected counterpart. All CNVs detected by aCGH were further verified by quantitative genomic PCR (Fig. 4). These data indicate that the gene targeting process may not lead to large-fragment abnormalities of the genome, even though it involved multiple clonal events and genome editing. To rule out CNVs potentially caused by long-term culture, we reanalyzed the detected CNVs in parental amniocytes and uncorrected iPSCs at different passages. As expected, quantitative PCR did not detect these CNVs in parental amniocytes at any of the passages examined. Moreover, the CNVs detected in iPSCs remained unchanged with prolonged expansion and passaging (Fig. 4). These data indicate that the CNVs arose during reprogramming or gene targeting rather than during cell expansion and passaging.
Indels and SNVs-To detect minor genomic changes at the nucleotide level, we performed whole-exome sequencing on the original amniocytes, the amniocyte-derived βThal654_iPS cells, and the corrected βThal654_iPSCre16 cells. Compared with the parental donor cells, a total of 83 Indels were initially called in βThal654_iPS cells before targeting. Consistent with previous reports that Indel calling usually generates a high rate of false positives (15,24,25), only five of these 83 Indels passed the more stringent filters. We confirmed that most of the false-positive Indels were located within highly repetitive regions. Applying the same stringent criteria for Indel calling (see "Experimental Procedures"), we detected 19 Indels in the gene-corrected iPSCs (Table 4). These data indicate that the gene targeting process tends to trigger more nucleotide-level variations than the reprogramming process. Of the 19 Indels found in corrected iPSCs, only one affected the coding region of a known gene.
We also detected a considerable number of SNVs generated by reprogramming and gene targeting (Table 5). Again, gene targeting generated many more SNVs than reprogramming (340 versus 64 SNVs in total, including 45 nonsynonymous SNVs in gene-corrected iPSCs versus two in uncorrected iPSCs; Table 5).
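Whether a coding SNV is synonymous or nonsynonymous depends only on whether it changes the encoded amino acid. The sketch below makes that call for a single substitution using the standard genetic code. Real annotation pipelines take the transcript, strand, and reading frame from an annotation database; this hypothetical `classify_snv` helper instead assumes they are already resolved, with the coding sequence given 5′→3′ and starting at the first codon. Stop-gain changes are counted as nonsynonymous here.

```python
# Standard genetic code, generated in TCAG order ('*' marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_snv(cds, pos, alt):
    """Classify a substitution at 0-based position pos of an in-frame coding sequence.

    Returns 'synonymous' if the encoded amino acid is unchanged and
    'nonsynonymous' otherwise (missense and stop-gain alike).
    """
    cds = cds.upper()
    offset = pos % 3                 # position of the SNV within its codon
    start = pos - offset             # first base of the affected codon
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    return "synonymous" if ref_aa == alt_aa else "nonsynonymous"


# Hypothetical three-codon sequence: ATG (Met) GCT (Ala) TGG (Trp).
cds = "ATGGCTTGG"
print(classify_snv(cds, 2, "A"))  # ATG -> ATA, Met -> Ile
print(classify_snv(cds, 5, "C"))  # GCT -> GCC, Ala -> Ala
print(classify_snv(cds, 7, "A"))  # TGG -> TAG, Trp -> stop
```

Generating the 64-codon table from the TCAG ordering avoids hand-typing codons and makes typos in the table easy to spot.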
We further examined whether these SNVs could have been generated by long-term culture and repeated passaging before gene targeting. We randomly selected seven SNVs detected in uncorrected iPSCs (at passage 16) and reanalyzed them by Sanger sequencing in parental amniocytes and βThal654_iPS cells at different passages (passages 1, 3, 5, and 7 for amniocytes and passages 16 and 26 for iPSCs). The results showed that all the randomly selected SNVs were absent in both parental amniocytes and iPS cells regardless of passage number (Table 6).
As for the SNVs newly generated in gene-corrected iPSCs, they were maintained in corrected iPSCs through multiple passages but were never present in uncorrected iPSCs (Table 7). These data indicate that the detected variations arose during reprogramming and gene targeting rather than from long-term culture and repeated passaging.

DISCUSSION
iPSC technology combined with gene targeting provides new ways to treat and investigate genetic diseases. However, standards for evaluating the safety of these genetically modified, personalized iPSCs are lacking. Genomic variation is an important parameter to consider for safe clinical application. In this study, using β-Thal iPSCs as a model, we assessed the genomic variations generated during factor-induced reprogramming and during subsequent gene correction by ZFN-aided gene targeting. We found that both steps affected genome integrity, but in different ways. A fair number of large-fragment variations (CNVs) were detected in β-Thal iPSCs after reprogramming, whereas few CNVs arose during gene targeting. In contrast, gene targeting tended to generate nucleotide-level variations rather than large-fragment changes. It remains unclear whether these variations were due to off-target effects of the ZFNs or were caused by multiple rounds of clonal selection. We did detect more SNVs on chromosome 20 than on other chromosomes (Fig. 5), but we found no evidence that the HBB ZFNs preferentially recognize sequences on chromosome 20.
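One way to probe the chromosome 20 observation is to scan sequence for near-matches to each HBB-ZFN recognition sequence (5′-CACTCTTTCACAGTCTGC and 5′-CTAAGCCCAGTCCTT, as listed under Experimental Procedures) on both strands, allowing a few mismatches. The brute-force sketch below does this on a hypothetical toy sequence; it is an illustrative assumption, not the analysis performed in this study. Genome-scale searches would normally use an indexed aligner, and real ZFN activity also depends on the spacing and orientation of the two half-sites, which this scan ignores.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA sequence (A/C/G/T/N)."""
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def scan_sites(genome, site, max_mismatches=2):
    """Yield (position, strand, mismatches) for near-matches of site.

    position is the 0-based start on the forward strand of genome; a '-'
    hit means the reverse complement of site matches there.
    """
    genome, site = genome.upper(), site.upper()
    n = len(site)
    for strand, probe in (("+", site), ("-", revcomp(site))):
        for i in range(len(genome) - n + 1):
            mm = hamming(genome[i:i + n], probe)
            if mm <= max_mismatches:
                yield i, strand, mm


LEFT_SITE = "CACTCTTTCACAGTCTGC"
RIGHT_SITE = "CTAAGCCCAGTCCTT"

# Hypothetical toy sequence: an exact left site on the plus strand and a
# right-site copy with one mismatch (last base T->A) on the minus strand.
toy = "GATTACA" + LEFT_SITE + "TTGAC" + revcomp("CTAAGCCCAGTCCTA") + "GGC"
for name, site in (("left", LEFT_SITE), ("right", RIGHT_SITE)):
    for hit in scan_sites(toy, site, max_mismatches=1):
        print(name, hit)
```

Tallying such near-matches per chromosome, and comparing them with where the observed SNVs fall, would be one simple way to test whether ZFN off-target binding could explain an excess of SNVs on a given chromosome.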
Other recent studies reported that genome-editing tools did not appear to generate excessive variation at the single-nucleotide level, such as SNVs or Indels (16). However, in the final gene-corrected β-Thal iPSCs, we did detect three Indels and 46 nonsynonymous SNVs that could affect known gene functions. Considering that the whole gene correction process described here comprises two rounds of gene targeting and one round of drug resistance gene excision, the numbers of detected SNVs and Indels are comparable with those of previous reports. With the decreasing cost of genome sequencing, it is now both feasible and necessary to assess these variations carefully before clinical application. In addition, it is difficult to assess subchromosomal changes based solely on genome sequencing data (15). Using aCGH, we detected a fair number of CNVs in β-Thal iPSCs after reprogramming, whereas the subsequent gene targeting and drug selection cassette excision generated minimal additional CNVs. Our data suggest that reprogramming and gene targeting cause different genomic variations, and these variations need to be analyzed by appropriate approaches during safety evaluation before clinical application.