Role of the PWWP Domain of Lens Epithelium-derived Growth Factor (LEDGF)/p75 Cofactor in Lentiviral Integration Targeting*

LEDGF/p75 is a chromatin-interacting, cellular cofactor of HIV integrase that dictates lentiviral integration site preference. In this study we determined the role of the PWWP domain of LEDGF/p75 in tethering and targeting of the lentiviral pre-integration complex, employing potent knockdown cell lines allowing analysis in the absence of endogenous LEDGF/p75. Deletion of the PWWP domain resulted in a diffuse subnuclear distribution pattern, loss of interaction with condensed chromatin, and failure to rescue proviral integration, integration site distribution, and productive virus replication. Substitution of the PWWP domain of LEDGF/p75 with that of hepatoma-derived growth factor or HDGF-related protein-2 rescued viral replication and lentiviral integration site distribution in LEDGF/p75-depleted cells. Replacing all chromatin binding elements of LEDGF/p75 with full-length hepatoma-derived growth factor resulted in more integration in genes combined with a preference for CpG islands. In addition, we showed that any PWWP domain targets SMYD1-like sequences. Analysis of integration preferences of lentiviral vectors for epigenetic marks indicates that the PWWP domain is critical for interactions specifying the relationship of integration sites to regions enriched in specific histone post-translational modifications.

Stable integration of the viral DNA into the host genome is one of the hallmarks of retroviral replication and has profound consequences for both the virus and the host. For the virus it is essential to direct integration in the host cell chromatin to sites that allow efficient gene expression to fulfill a successful infection cycle. Retroviruses from different genera have evolved to integrate their genomic DNA at different sites in the genome of their respective hosts. Human immunodeficiency virus (HIV) and other lentiviruses have a strong preference for integration in active transcription units (1), and murine leukemia virus (MLV) genomes preferentially integrate near transcription start sites and CpG island regions (2), whereas avian sarcoma leukosis virus (3,4) and the human T-cell leukemia virus display a much weaker preference for these features (5,6), only weakly favoring integration in active transcription units. Deciphering the mechanisms that dictate integration site selection is instrumental to better understand basic retrovirology and its clinical applications such as drug development and gene therapy.
Defects in lentiviral infection are only observed after potent knockdown of LEDGF/p75 because residual protein levels are sufficient to support integration (13,14,21). This initially blurred the interpretation and caused controversy, as two studies failed to observe a reduction in HIV replication after partial knockdown of LEDGF/p75 (22,23). Although weak knockdown fails to demonstrate clear effects on HIV replication, the genomic distribution of HIV integration sites is already significantly affected under these circumstances (16,24). The recently developed LEDGINs, small molecules that disrupt the LEDGF/p75-IN interaction, potently block HIV-1 replication and prove the requirement of LEDGF/p75 for HIV replication (25). The requirement of potent LEDGF/p75 knockdown (>90% depletion) to obtain a phenotype for HIV replication is the reason why LEDGF/p75 was not detected as a cofactor in any of the four genome-wide RNAi-based screens (26-29).
Detailed mapping of the chromatin binding profile of LEDGF/p75 by DamID technology revealed an association with markers of active chromatin and a disfavoring of promoter regions, a profile paralleling that of HIV-1 integration (30). Whereas the interaction between LEDGF/p75 and IN is well studied (12, 31-35), the molecular basis underlying the LEDGF/p75 chromatin interaction is largely unknown. Elements in the N-terminal portion of LEDGF/p75 have been shown to be necessary for chromatin binding, including a PWWP domain (amino acids 1-93) and two AT hook-like motifs (amino acids 178-197) (19, 36-38) (Fig. 1).
A PWWP domain contains a relatively well conserved Pro-Trp-Trp-Pro signature that is related to the Tudor domain Royal Family (39,40). The domain is present in diverse chromatin-associated proteins involved in DNA repair, histone modification, transcriptional regulation, and DNA methylation (41-43). Although the function of the domain is unknown, the PWWP module has been reported to bind DNA without sequence specificity (41,44) and to have methyllysine binding ability (39,40). The largest homologous group of PWWP-containing proteins is related to hepatoma-derived growth factor (HDGF), classified as HDGF-related proteins (HRPs) (41). HDGF has been demonstrated to act as a transcriptional repressor; HDGF specifically binds to a conserved DNA element in the promoter of target genes, such as SMYD1, via its PWWP domain (45). Besides LEDGF/p75, one other HRP family member, HRP-2, contains a functional integrase binding domain. Although HRP-2 stimulates recombinant IN activity in vitro and restores function to salt-depleted HIV-1 PICs (23,34), HRP-2 does not mediate tethering of HIV integrase to condensed chromatin (46).
In an effort to understand the role of the PWWP domain in LEDGF/p75-mediated tethering and targeting of the PIC, we employed potent LEDGF/p75 knockdown cell lines to permit unambiguous analysis in the absence of endogenous LEDGF/p75 (17). Deletion of the PWWP domain disrupts association of LEDGF/p75 with condensed mitotic chromatin (37). We generated an N-terminal LEDGF/p75 truncation that lacks the PWWP module together with a set of chimeric proteins in which the PWWP domain of LEDGF/p75 was swapped for that of two other HRPs: HDGF and HRP-2. In addition, we fused full-length HDGF to the C-terminal end of LEDGF/p75 (LEDGF 325-530). These proteins were used to complement LEDGF/p75-depleted cells. All fusion proteins and chimeras were evaluated for nuclear localization, binding to cellular chromatin, interaction with HIV-1 IN, and rescue of lentiviral vector transduction and HIV replication. Ultimately, proviral integration sites were identified, and their genomic distribution was analyzed.
Vector Transduction and Analysis-For transduction experiments 20,000 cells were plated per well in a 96-well plate and transduced. 72 h post-infection, cells were reseeded in two new 96-well plates, one for eGFP FACS analysis and one for luciferase activity assay. For FACS, transduced cells were fixed in a final concentration of 2% paraformaldehyde. Overall eGFP expression (mean fluorescence intensity multiplied by the percentage of gated cells) was measured with a FACSCalibur flow cytometer (BD Biosciences) and analyzed using the CellQuest software package provided with the instrument. For luciferase activity measurement, transduced cells were washed with 1× PBS and subsequently lysed with 70 μl of lysis buffer (50 mM Tris, pH 7.5, 200 mM NaCl, 0.2% Nonidet P-40, 10% glycerol). The lysate was assayed according to the manufacturer's protocol (ONE-Glo™ Luciferase Assay System, Promega, Madison, WI). Data were normalized for total protein (BCA Protein Assay kit, Pierce). All conditions were run at least in triplicate in each experiment.
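The two reporter readouts described above reduce to simple arithmetic. The following is a minimal sketch, not the authors' analysis code; the function names and the example units (relative light units, μg of protein) are assumptions made for illustration.

```python
def overall_egfp_expression(mean_fluorescence_intensity, percent_gated):
    """Overall eGFP expression: mean fluorescence intensity x % gated cells."""
    return mean_fluorescence_intensity * percent_gated


def normalized_luciferase(relative_light_units, total_protein_ug):
    """Luciferase activity normalized for total protein (hypothetical units)."""
    if total_protein_ug <= 0:
        raise ValueError("total protein must be positive")
    return relative_light_units / total_protein_ug


def mean_of_replicates(values):
    """Average of replicate wells (conditions were run at least in triplicate)."""
    return sum(values) / len(values)
```

For example, a well with a mean fluorescence intensity of 200 and 25% gated cells gives an overall expression value of 5000 in these arbitrary units.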
Determination of Integrated Copies Using Quantitative Real-time PCR-Integrated proviral copies were determined essentially as described earlier (17). Cells transduced with HIV-based or EIAV-based lentiviral vectors were cultured for 2 weeks to eliminate all non-integrated DNA. Cells infected with wild type HIV-1 NL4.3 virus were split after 5 days of infection and cultured under azidothymidine/ritonavir at 1 μg/ml (i.e. 20-fold IC50 as determined in the MT4/3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide (MTT) assay (53); NIH AIDS Research and Reference Reagent Program) to eliminate all nonintegrated proviral copies. Genomic DNA was extracted using the GenElute Mammalian genomic DNA miniprep kit (Sigma). Samples corresponding to 100 ng of genomic DNA were used as input for quantitative PCR. Integrated copies for HIV-based lentiviral vectors or HIV-1 NL4.3 virus were measured using a Woodchuck Hepatitis Virus Posttranscriptional Regulatory Element or Gag-derived primer-probe set, respectively. Integrated copies for EIAV-based viral vectors were measured using an EIAV-specific WPRE-derived primer-probe set (16). The reaction contained 12.5 μl of 2× iQ Supermix (Bio-Rad), 40 nM forward and reverse primer, and a 40 nM concentration of the probe in a final volume of 25 μl. In all cases RNaseP was quantified as an endogenous housekeeping control (TaqMan RNaseP control reagent, Applied Biosystems, The Netherlands). All samples were run in quadruplicate and subjected to an initial 3-min denaturation at 95°C followed by 50 cycles of 10-s denaturation at 95°C and extension at 55°C for 30 s. Data were analyzed with iQ5 Optical System Software (Bio-Rad).
Virus Strains-The molecular clone pNL4.3 was obtained through the NIH AIDS Research and Reference Reagent Program. Virus was produced on 293T cells after PEI transfection of the molecular clone. After 4 days the supernatant was harvested and used to inoculate MT-4 cells to grow HIV NL4.3 virus stocks. Supernatant was collected after 4 days, and aliquots were stored at −80°C. The p24 titer was determined to quantify the viral concentration.
HIV-1 Replication Assay-Cells were seeded at 20,000 cells per well in a 6-well dish or at 30,000 cells per T25 flask and infected 1 day later as described earlier with minor modifications (14). Briefly, cells were infected for 24 h with HIV NL4.3 (multiplicity of infection 0.01). Cells were washed twice with 1× PBS before the addition of 4 ml of fresh complete DMEM medium. HIV replication was followed over time by sampling the supernatant every 24 h, by visual scoring of the cytopathogenic effect (CPE), and by p24 ELISA (HIV-1 p24 Elisa kit, PerkinElmer Life Sciences). Cultures were sampled until cells were confluent or until full-blown CPE. In each experiment all cell lines were included in triplicate.
Integration Site Cloning-Integration sites were amplified by linker-mediated PCR as described previously (17). Genomic DNA was digested using MseI, and linkers were ligated. Provirus-host junctions were amplified by nested PCR using barcoded primers. This enabled pooling of PCR products into one sequencing reaction. Products were gel-purified and sequenced on the 454 GS-FLX instrument at the University of Pennsylvania.
Bioinformatic Analysis-Integration sites were analyzed essentially as described earlier (17). Briefly, sites were judged to be authentic when the sequences contained a proper bar code and long terminal repeat (LTR) primer and had a best unique hit when aligned to the human genome as appropriate (hg18) using BLAT, and the alignment began within 3 bp of the viral LTR end and had >98% sequence identity. Statistical methods are described in detail by Berry et al. (54). Integration site frequencies were compared with matched random controls (MRCs) by Fisher's exact test (where stated). Analysis was carried out using The R Project for Statistical Computing. Heat maps were developed to summarize many relationships using the receiver operating characteristic (ROC) area method (54). The construction of a ROC curve and a guide to interpreting heat maps is detailed in Brady et al. (55). Using the 80-bp SMYD1 promoter sequence obtained from Yang and Everett (45), the number of 80-bp sequence motif tags (80% sequence identity was set as a threshold to be considered as a valid hit) in a defined window around each integration site or MRC was calculated. Histone modification data were used from Barski et al. (56), Wang et al. (57), and Robertson et al. (58).
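The authenticity criteria listed above can be expressed as a single predicate. The sketch below is illustrative only and is not the pipeline used in the study; the `Alignment` record and its field names are hypothetical, and the values are assumed to have been parsed beforehand from the sequencing reads and BLAT output.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical summary of one read after alignment to hg18."""
    has_barcode: bool        # proper bar code found in the read
    has_ltr_primer: bool     # LTR primer sequence found in the read
    n_best_hits: int         # number of equally good best genomic hits
    offset_from_ltr_end: int # bp between the viral LTR end and alignment start
    percent_identity: float  # sequence identity of the alignment, in percent


def is_authentic(aln, max_offset=3, min_identity=98.0):
    """Apply the stated criteria: bar code and LTR primer present, a unique
    best hit, alignment starting within 3 bp of the LTR end, identity >98%."""
    return (
        aln.has_barcode
        and aln.has_ltr_primer
        and aln.n_best_hits == 1
        and 0 <= aln.offset_from_ltr_end <= max_offset
        and aln.percent_identity > min_identity
    )
```

A read that aligns equally well to two genomic positions, or whose alignment starts 5 bp away from the LTR end, would be discarded under these rules.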

Generation of LEDGF/p75 Chimeras-To investigate the role of the N-terminal DNA binding elements of LEDGF/p75 in HIV integration targeting, we generated a series of chimeric proteins, specifically deleting, replacing, or altering these elements (Fig. 1). In a first step we targeted the PWWP domain of LEDGF/p75. We deleted the complete domain (amino acids 1-93) to generate ΔN93-LEDGF. To study the role of the PWWP domain, we replaced the PWWP domain of LEDGF/p75 with that of HDGF (amino acids 1-146) and HRP-2 (amino acids 1-106), generating PWWP HDGF-LEDGF and PWWP HRP2-LEDGF, respectively. Although the PWWP domains of LEDGF/p75, HDGF, and HRP-2 are 80% identical, essential amino acids believed to be involved in the interaction with chromatin are different among the different proteins (41) (supplemental Fig. S1). HDGF has been shown to act as a transcriptional repressor by binding a conserved element in the promoter of target genes via its N-terminal PWWP domain (45). Additionally, we replaced the chromatin binding elements of LEDGF/p75 (amino acids 1-324) with full-length HDGF (HDGF-LEDGF 325-530). Subsequently, constructs encoding the chimeric proteins were stably introduced into LEDGF/p75-depleted cells (17) using MLV-based viral vectors and selected with blasticidin. Control cell lines expressing RNAi-resistant LEDGF/p75 (LEDGF BC) or the HIV integrase interaction-deficient D366A mutant (LEDGF D366A) were generated in parallel (11,35). Growth rates were comparable with that of the parental HeLaP4-CCR5 cell line for all generated cell lines (data not shown). Expression of the fusion proteins was verified by Western blotting (Fig. 2). Although not all proteins were expressed to the same extent, all fusion proteins migrated at the correct molecular weight.
LEDGF/p75 Chimeras Tether HIV IN to the Nucleus-Whereas endogenous LEDGF/p75 typically appears as dense fine speckles in the interphase nucleoplasm, it localizes to condensed chromatin during mitosis (59). Immunocytochemistry for LEDGF/p75 demonstrated no fluorescence in the LEDGF/p75-depleted cells (Fig. 3a) (17). Complementation of the depleted cells with LEDGF BC or the D366A mutant (LEDGF D366A) restored both the fine speckled nuclear pattern during interphase and the binding to mitotic chromatin, in line with earlier reports (46,59,60) (Fig. 3, b and i, and Fig. 3, c and j, respectively). Truncation of the PWWP domain (ΔN93-LEDGF) resulted in a more diffuse subnuclear distribution with loss of nucleolar exclusion (Fig. 3d) and a loss of interaction with condensed chromatin (Fig. 3k), indicating that the PWWP domain is a major determinant for chromatin association. Swapping the PWWP domain of LEDGF/p75 with that of either HDGF or HRP-2 recovered the speckled nuclear distribution with nucleolar exclusion (Fig. 3, e and f, respectively), comparable with wild-type LEDGF/p75, and restored interaction with mitotic chromosomes (Fig. 3, l and m, respectively). Replacing the N-terminal end of LEDGF/p75, containing all chromatin binding elements, with full-length HDGF resulted in similar staining (Fig. 3, g and n, respectively), demonstrating that the N-terminal domain of LEDGF/p75 can be replaced with alternative chromatin binding elements, in line with earlier reports (17,19).
Figure legend (Western blot of the LEDGF/p75 hybrids). Stable KD cell lines were complemented with the respective LEDGF/p75 hybrids. Total cell extracts were prepared and separated on a 12.5% SDS gel. A LEDGF 325-530-specific antibody was used for detection.
In addition to nuclear localization, LEDGF/p75 hybrids should support chromatin tethering of HIV-IN. Transient expression of HIV-1 integrase fused to monomeric red fluorescent protein (mRFP-INs) resulted in a diffuse fluorescent signal throughout the cytoplasm and the nucleus in the absence of LEDGF/p75 (Fig. 3, a and h, red fluorescence), as previously reported (17,22,59). Expression of LEDGF BC relocated mRFP-INs to the nucleus and to condensed chromatin (Fig. 3, b and i), whereas complementation with the interaction-deficient LEDGF D366A did not (Fig. 3, c and j), corroborating the requirement of a direct interaction of the integrase binding domain with IN for both the nuclear and condensed chromatin binding. Although deletion of the LEDGF/p75 PWWP domain (ΔN93-LEDGF) supported nuclear localization of IN during interphase, interaction with condensed chromosomes was lost (Fig. 3, d and k, respectively). However, substitution of the PWWP domain of LEDGF/p75 with that of HDGF or HRP-2 rescued nuclear localization of IN, resembling the wild-type LEDGF/p75 phenotype (Fig. 3, e and l, and Fig. 3, f and m, respectively). Likewise, fusion of LEDGF 325-530 to full-length HDGF relocated IN to the nucleus of interphase cells and to condensed chromatin (Fig. 3, g and n, respectively).
Next we quantified the number of integrated proviral copies in the genomic DNA of the respective cell lines after transduction with HIV-based lentiviral vectors (Fig. 4B). In consonance with previous data (17), introduction of LEDGF BC in KD cells resulted in a 3.9-fold increase in integrated copies, with levels comparable with those obtained in wild-type cells (data not shown), whereas complementation with LEDGF D366A or ΔN93-LEDGF did not significantly increase the number of integrated copies over KD cells (p > 0.1, two-tailed t test). Complementation of LEDGF/p75-depleted cells with the PWWP chimeras rescued integration to near wild-type levels (87 and 73% that of LEDGF BC for PWWP HDGF-LEDGF and PWWP HRP2-LEDGF, respectively). Similarly, introduction of HDGF-LEDGF 325-530 recovered 75.4% of the LEDGF BC levels.
In addition to HIV-IN, LEDGF/p75 interacts with other lentiviral integrases (22,61,62). To evaluate the potency of the LEDGF/p75 chimeras to complement other lentiviruses, we transduced the respective cell lines and control cells with an EIAV-based viral vector, engineered to encode eGFP and fLuc reporters. Introduction of the respective LEDGF/p75 fusions in knockdown cells rescued EIAV transduction and integration, paralleling the results obtained for the HIV-based viral vectors (supplemental Fig. S4).
Rescue of Spreading HIV Infection by LEDGF/p75 Chimeras-We next evaluated the capacity of the LEDGF/p75 chimeras to rescue HIV-1 replication. Cell lines stably expressing LEDGF/p75 chimeras were infected with HIV-1 NL4.3 virus (28,000 pg of p24/ml; multiplicity of infection 0.01). Viral replication was monitored by sampling the culture medium at regular intervals and subsequent determination of the p24 concentration by ELISA. Experiments were repeated independently at least three times. A representative HIV replication experiment is shown (Fig. 5A). As expected, HIV replication was significantly inhibited in KD cells compared with HeLaP4-CCR5 control cells (WT). Complementation of the KD cells with LEDGF BC restored HIV replication to near wild-type levels, whereas expression of the interaction-deficient LEDGF D366A did not restore HIV replication. Cells expressing ΔN93-LEDGF, lacking the PWWP domain, supported HIV replication only marginally but significantly above the level observed for the KD cells. By comparison, exchanging the PWWP of LEDGF/p75 with that of either HDGF or HRP-2 restored HIV replication to near wild-type levels (Fig. 5A). Comparable results were obtained when cells were infected at higher multiplicity of infection (supplemental Fig. S5). Estimation of integrated HIV copies by quantitative PCR (Fig. 5B) demonstrated rescue of viral integration to wild-type levels upon introduction of LEDGF BC (8.3-fold more than KD), whereas expression of LEDGF D366A or ΔN93-LEDGF did not. The number of integrated proviral copies for LEDGF D366A-complemented cells was not different from that in KD cells (p = 0.2857, two-tailed t test), whereas ΔN93-LEDGF reached significance (p = 0.0193, two-tailed t test). Swapping the LEDGF/p75 PWWP domain with that of HDGF or HRP-2 or fusion of LEDGF 325-530 to HDGF restored HIV proviral integration to 61.6, 65.3, and 39.8% of LEDGF BC, respectively.

PWWP Domain and Integration Targeting
The Integration Site Consensus Sequence Is Not Affected by LEDGF/p75 Hybrids-We next asked whether altering the DNA binding elements of LEDGF/p75 affects integration site distribution. Integration sites were determined for an EIAV-based viral vector, as HeLaP4 cells contain an integrated HIV long terminal repeat that might interfere with the isolation of HIV proviral integration sites. Both EIAV and HIV-IN interact with LEDGF/p75 and show the same integration site preferences (16,63). EIAV integration sites were analyzed as described previously (17), yielding a total of 4923 integration sites. Random control sites, matched to each experimental integration site in terms of distance to the nearest MseI cleavage site, were generated computationally (MRC; see "Experimental Procedures"). Generation of MRCs accounts for biases in the recovery of integration sites based on their proximity to MseI sites and allows for more accurate statistical analysis (3,54). Retroviral integrases display a virus-specific target sequence preference around the site of integration. Consistent with previous reports (15-17), the characteristic palindrome at the site of integration was conserved (supplemental Fig. S6) even when the genomic distribution of integration sites was altered, underscoring that IN determines the local sequence preference in the target DNA independent of the tethering mechanism.
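The idea behind matched random controls can be sketched on a toy sequence: each control is drawn at random from positions that lie at the same distance from an MseI recognition site (TTAA) as the experimental site. This is a simplified illustration under stated assumptions, not the procedure of Berry et al. (54); the function names are hypothetical, distance is measured to the nearest site in either direction, and a real implementation would work on genome coordinates rather than an in-memory string.

```python
import bisect
import random


def msei_sites(seq):
    """Start positions of every MseI recognition site (TTAA) in seq."""
    sites, i = [], seq.find("TTAA")
    while i != -1:
        sites.append(i)
        i = seq.find("TTAA", i + 1)
    return sites


def distance_to_nearest(pos, sites):
    """Distance in bp from pos to the closest site (sites must be sorted)."""
    if not sites:
        raise ValueError("no MseI sites in sequence")
    i = bisect.bisect_left(sites, pos)
    neighbours = sites[max(i - 1, 0):i + 1]
    return min(abs(pos - s) for s in neighbours)


def matched_random_controls(pos, seq, sites, n, rng):
    """Draw n random positions with the same MseI distance as pos."""
    d = distance_to_nearest(pos, sites)
    pool = [p for p in range(len(seq)) if distance_to_nearest(p, sites) == d]
    return [rng.choice(pool) for _ in range(n)]
```

Because experimental and control sites share the same distance to a cleavage site, any recovery bias that depends on that distance affects both sets equally.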
Genomic Distribution of Integration Sites in Cells Containing LEDGF/p75 Hybrids-Lentiviruses favor integration into transcription units and gene-dense regions (1,3,63). Depletion of LEDGF/p75 reduces this preference, and a preference for integration close to CpG islands and gene 5′ ends emerges instead (15,16). As an initial survey of the proviral integration site distribution in the different complemented cell lines, we first analyzed the integration frequency relative to these features. All integration sets were significantly different from their respective MRC sites for the integration into RefSeq genes in a Mann-Whitney test (p < 0.001; Table 1, statistics not shown). Integration frequency in RefSeq transcription units was reduced to 52% in KD cells (Table 1), and tests using other human gene catalogs yielded similar results (supplemental Fig. S7), in agreement with earlier data for LEDGF/p75-depleted cells (16,17). Integration in LEDGF KD cells was favored near CpG islands (Table 1). Both trends were reversed to wild type by LEDGF/p75 complementation (LEDGF BC). The integration site distribution after expression of integrase interaction-deficient LEDGF D366A was similar to that of KD cells (Table 1). In addition, we compared integration site distributions of the different sets for a selection of genomic features. The heat map (Fig. 6) summarizes relationships between integration sites and specific genomic features (54,55). Enriched associations compared with random (MRC) are displayed as increasing shades of red, and negative associations are displayed as increasing shades of blue, with no difference as gray tiles. Statistical significance compared with the KD data set is determined by regression and represented by asterisks overlaid on the tile (54). Again, integration site distribution in LEDGF D366A cells was similar to that of KD cells (Fig. 6).
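The value behind each heat map tile is a ROC area. A minimal sketch of the quantity follows, in a pooled form that compares all integration sites with all controls at once; this simplification is an assumption made for illustration, and the method of Berry et al. (54) should be consulted for the exact per-site matching and the regression-based significance tests.

```python
def roc_area(site_values, control_values):
    """Probability that a randomly chosen integration site has a higher
    feature value than a randomly chosen control, counting ties as one half.
    0.5 means no difference from random, above 0.5 means the feature is
    favored, below 0.5 means it is disfavored."""
    if not site_values or not control_values:
        raise ValueError("both value lists must be non-empty")
    score = 0.0
    for s in site_values:
        for c in control_values:
            if s > c:
                score += 1.0
            elif s == c:
                score += 0.5
    return score / (len(site_values) * len(control_values))
```

With gene density as the feature, for instance, a ROC area well above 0.5 would correspond to a tile marking enrichment, and a value of exactly 0.5 to a neutral tile.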
PWWP Swaps Rescue the Knockdown Phenotype-Next, the role of the PWWP domain of LEDGF/p75 in integration site targeting was investigated. In a first step we deleted the PWWP domain (ΔN93-LEDGF). Introduction of ΔN93-LEDGF into LEDGF/p75-depleted cells failed to rescue the KD phenotype (Table 1) and showed diminished integration in RefSeq genes and a preference for integration near CpG islands as in LEDGF/p75 KD (p = 0.929 and p = 0.234 compared with KD, respectively; Mann-Whitney test). There was no additional alteration in the genomic features heat map (Fig. 6, no statistical difference compared with KD). Swapping the PWWP domain of LEDGF/p75 with that of HDGF or HRP-2 rescued integration frequency in RefSeq genes with the concomitant reduction of integration near CpG islands to levels observed in LEDGF BC cells (Table 1). The correlation of integration sites with several genomic features was restored to levels observed in LEDGF BC cells (supplemental Fig. S8, no statistical difference compared with LEDGF BC). Replacing all DNA binding elements in LEDGF/p75 with full-length HDGF (HDGF-LEDGF 325-530) recovered integration into transcription units (84.1% in RefSeq genes; Table 1), favoring integration in the body of genes significantly more than LEDGF BC (p = 1.84e-6, Mann-Whitney test). Integration in HDGF-LEDGF 325-530-expressing cells was still preferred near CpG islands as observed in KD cells (4.8%; p = 0.5477 compared with KD, Mann-Whitney test). In addition, the genomic heat map demonstrated favored integration in GC-rich regions for HDGF-LEDGF 325-530-expressing cells (Fig. 6 and supplemental Fig. S8).
FIGURE 6. Heat map of integration frequency relative to genomic features. A heat map summarizes the relationships of proviral integration site data sets to genomic features. Integration data sets are indicated above the columns. Genomic features analyzed are shown to the left of the corresponding row of the heat map (55). Tile color indicates whether a chosen feature is favored (red, enrichment compared with random) or disfavored (blue, depletion compared with random) for integration for the respective data sets relative to their MRCs, as detailed in the colored ROC area scale at the bottom of the panel. p values showing significance of departures from the KD data set are shown with asterisks (*, p < 0.05; **, p < 0.01; ***, p < 0.001, Wald statistics referred to χ² distribution). The naming of the genomic features is described in Brady et al. (55); TSS, transcription start site.
HDGF Fusion Does Not Retarget to HDGF Binding Sites-We and others demonstrated that fusion of alternative DNA-binding proteins to the C-terminal end of LEDGF/p75 allows retargeting of integration in the neighborhood of the respective protein binding sites (17,18). HDGF has been demonstrated to bind specifically to SMYD1 promoter-like sequences (45). We analyzed the number of the HDGF binding sites around each lentiviral integration site using various window sizes (5 and 10 kb). The frequency of the SMYD1-like sequence near integration sites of KD and ΔN93-LEDGF-expressing cells was not different compared with their respective MRCs (Table 2; p > 0.05, Fisher's exact test). However, the frequency was significantly different for all other sets (p < 0.001, Fisher's exact test). Pairwise comparison of the frequency of the SMYD1-like sequences of the different integration site sets to that of LEDGF BC only reached significance for KD cells, which showed reduced frequency (Mann-Whitney test). Thus all the functional fusion proteins targeted integration near SMYD1-like sequences, likely a consequence of targeting integration to gene-rich regions. Elevated frequency of targeting SMYD1-like sites was seen for HDGF-LEDGF 325-530 (52.8% for the fusion versus 47.8% for LEDGF BC; 5-kb window), but the trend did not achieve significance given the size of the data set; when combining the LEDGF BC and WT data sets, a significant difference was observed for HDGF-LEDGF 325-530 (p < 0.01 for 5- and 10-kb windows).
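The comparison of SMYD1-like sequence frequencies between integration sites and their MRCs rests on Fisher's exact test applied to a 2 × 2 table (sites with or without a motif hit in the window, for the experimental and control sets). The study ran its statistics in R; the following is a self-contained sketch of the two-sided test, with a hypothetical function name and made-up example counts.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    a, b: integration sites with / without a motif hit in the window
    c, d: matched random controls with / without a motif hit
    Returns the sum of hypergeometric probabilities of all tables that are
    no more likely than the observed one."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(
        prob(x) for x in range(lo, hi + 1)
        if prob(x) <= p_observed * (1 + 1e-9)  # tolerance for float ties
    )
    return min(1.0, total)
```

As a toy example, `fisher_exact_two_sided(3, 0, 0, 3)` returns 0.1, whereas a perfectly balanced table such as `fisher_exact_two_sided(1, 1, 1, 1)` returns 1.0.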
PWWP Domains Retarget to Specific Histone Modifications-Recently, it was demonstrated that PWWP modules in several other proteins recognize methyllysine residues on histone tails (57, 64-66). We thus evaluated the role of the PWWP domain in the integration site distribution relative to histone modifications for LEDGF BC, PWWP HDGF-LEDGF, and PWWP HRP2-LEDGF, the PWWP domain being the sole domain that differs among these proteins. For analysis we used high resolution maps from a study in HeLa cells, where H3K4 mono- and trimethylation ChIP-seq profiles were identified (58), supplemented with the maps for the genome-wide distribution of 39 histone modifications in human CD4+ T cells (56,57). Associations of integration sites and histone methylation/acetylation were quantified (Fig. 7, 10-kb window). Histone modifications are grouped into clusters reported to co-localize and associate with classes of functional genomic elements (57). Tile color reflects whether a feature is favored (blue, enriched relative to random) or disfavored (yellow, depleted compared with random) for integration in the respective data set. Integration frequency showed similar patterns for all four data sets, positively correlating with histone modifications generally associated with active transcription (acetylations and monomethylation of H3K27, H3K9, H4K20, H3K79, H2BK5, etc.) and negatively associating with markers common to transcriptionally repressed regions (e.g. H3K27me3, H3K9me3) and heterochromatin (e.g. H4K20me3 and H3K79me3). Significant departures from WT, indicated with asterisks (**, p < 0.01; ***, p < 0.001, Wald statistics referred to χ² distribution), are most prominent for PWWP HRP2-LEDGF (Fig. 7). To deduce whether certain epigenetic marks are favored by a specific PWWP domain, we counted the epigenetic marks around integration sites for all PWWP data sets (Fig. 8). The plot compares the distribution of epigenetic counts of experimental PWWP data sets versus control (WT) for several window sizes (10-kb, 100-kb, and 1-Mb windows). As expected, no significant difference was observed for LEDGF BC (p > 0.05; Wilcoxon rank sum test). However, when the PWWP of LEDGF/p75 was replaced with that of HDGF or HRP-2, significance was reached for several modifications (10-kb and 100-kb windows). Integration was preferred near H3K36me3 for PWWP HDGF-LEDGF (p < 0.001; Wilcoxon rank sum test), and a ratio >2 was obtained for H4R3me2 and H3K9me3 (p < 0.05; Wilcoxon rank sum test). Integration in PWWP HRP2-LEDGF-expressing cells was preferred near acetylated H2BK5, H3K27, H4K8, H4K12, and H4K16 and near H2BK5me, H3K9me1, H3K27me1, H3K36me3, and H4K20me1 (p < 0.01; Wilcoxon rank sum test). Taken together, these results indicate that the PWWP domain either directly recognizes epigenetic marks or binds factors positioned on chromosomes in a fashion correlated with epigenetic marks.
FIGURE 7. Heat map of integration frequency relative to epigenetic features. The heat map of integration frequency relative to epigenetic marks is shown. ChIP-seq data from HeLaS3 cells (58) and human CD4+ T cells (56,57) were used. Detailed information on the epigenetic marks can be found in Refs. 56 and 57. Associations of integration and histone methylation/acetylation were quantified using ROC areas, comparing the association of integration site data sets with the frequency in corresponding MRC sets. A ROC area scale is shown along the bottom of the panel. Tile color indicates whether a chosen feature is favored (blue, enrichment relative to random) or disfavored (yellow, negative correlation compared with random) for integration (10-kb window) in the respective data sets relative to their MRCs. p values showing significance of departures from WT cells are shown with asterisks (**, p < 0.01; ***, p < 0.001, Wald statistics referred to χ² distribution); dashes overlay control tiles.
Role of the AT Hook-like Motifs-Integration site analysis indicated that deletion of the PWWP domain failed to rescue the KD integration site distribution (Table 1 and Fig. 6). In addition, we demonstrated that the PWWP domain is needed for efficient HIV integration and replication (Fig. 5). However, complementation of LEDGF/p75-depleted cells with ΔN93-LEDGF still rescued viral vector transduction and virus replication, albeit only marginally. One explanation could be that the AT hook-like motifs support nonspecific binding to chromatin (36,47,70). In an effort to study the contribution of the AT hook-like domains, we mutated both AT hook-like motifs by site-directed mutagenesis (R183D, K192D, R196D) (47) and complemented LEDGF/p75-depleted cells (ΔN93-LEDGF AT−). ΔN93-LEDGF AT− locates to the nucleus and does not interact with mitotic chromatin (supplemental Fig. S9a) (36,38). HIV-based lentiviral vectors showed a reduced transduction efficiency compared with ΔN93-LEDGF cells (supplemental Fig. S9b; p = 0.0113, two-tailed t test), in line with earlier reports (38,67). Next, stable ΔN93-LEDGF and ΔN93-LEDGF AT− cells were infected with HIV-1 NL4.3 virus (supplemental Fig. S9c). Although HIV replication was already severely affected in cells complemented with ΔN93-LEDGF compared with LEDGF/p75 BC, additional mutation of the AT hook-like motifs reduced HIV replication to the background levels obtained in LEDGF D366A or LEDGF/p75-depleted cells (supplemental Fig. S9c, inset).

DISCUSSION
To determine whether the PWWP domain plays a role in LEDGF/p75-mediated tethering and targeting of the lentiviral PIC, we employed stable knockdown cell lines to allow analysis in the absence of endogenous LEDGF/p75 protein (17). We deleted or replaced the PWWP domain of LEDGF/p75 and studied IN localization, HIV replication, and lentiviral integration site distribution in the host genome. Stable integration of the viral DNA links the fate of the invading virus with that of the host cell. For the virus, integration in active chromatin facilitates viral gene expression, which in turn will result in a successful infection cycle. Although the role of the PWWP domain has been studied before (19, 36-38, 68-73), we provide the first analysis of its impact on integration site selection.
Deletion of the PWWP domain in LEDGF/p75 resulted in a diffuse subnuclear distribution, loss of nucleolar exclusion (Fig. 3) (19,36,37,68), and loss of interaction with mitotic chromatin (19,37). The PWWP domain of LEDGF/p75 appears to exert a dominant effect on the intranuclear localization of the protein, excluding it from the nucleoli (19,46,68,74). Nucleolar localization is not a default pathway; current understanding suggests that interaction with nucleolar hub proteins is required (75,76). The fact that PWWP-deleted LEDGF/p75 locates to the nucleolus might be explained by a "cryptic" nucleolar localization sequence that is overruled by the presence of PWWP. Several motifs have been identified that can target a protein to the nucleolus (76), such as the (R/K)(R/K)X(R/K) motif of the human La protein (77), which occurs in multiple copies in ΔN93-LEDGF.
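As an illustration of how occurrences of the (R/K)(R/K)X(R/K) motif can be located in a protein sequence, the following minimal sketch scans a one-letter amino acid string with a regular expression. The function name `find_motif` is hypothetical, and the example sequence in the usage note is a toy string, not the LEDGF/p75 sequence. A lookahead is used so that overlapping occurrences are all reported.

```python
import re

# (R/K)(R/K)X(R/K): R or K, R or K, any residue, R or K.
# Wrapping the pattern in a lookahead lets overlapping hits be found.
MOTIF = re.compile(r"(?=([RK][RK].[RK]))")


def find_motif(seq):
    """Return (0-based start, matched tetrapeptide) for every motif hit."""
    return [(m.start(), m.group(1)) for m in MOTIF.finditer(seq.upper())]
```

For the toy sequence `"AKRAKKRSRG"`, `find_motif` reports two hits, `KRAK` at index 1 and `KRSR` at index 5.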
Conflicting results with regard to the relevance of the PWWP domain for the LEDGF/p75 cofactor activity during HIV replication have been reported. Although the domain was shown to be essential for HIV-based vector transduction in LEDGF/p75 −/− mouse fibroblasts (38) (rescuing ≤0.1-20% of WT activity) and in LEDGF/p75-depleted human CD4+ T cells (19, 67) (44.2 and 63% of WT activity, respectively), it also was found to be dispensable (73). In our hands complementation of LEDGF/p75-deficient cells with ΔN93-LEDGF restored transduction efficiency only partially (48% of WT activity; Fig. 4A), in agreement with the initial reports in the LEDGF/p75-depleted CD4+ T cells (19,67). The absence of a significant rescue of integrated proviral copies relative to LEDGF/p75-depleted cells indicates that ΔN93-LEDGF supports productive virus replication only poorly. This was confirmed in multiple-round HIV infection experiments (Fig. 5B). The residual activity compared with LEDGF/p75 KD may originate from non-integrated vector copies or an increased stability of the IN complex in the presence of ΔN93-LEDGF (60). One explanation could be that the AT-hook-like motifs support nonspecific binding to chromatin (36,47,70). Additional mutation of the AT-hook-like motifs in ΔN93-LEDGF further reduced viral vector transduction and HIV replication to the background levels of LEDGF/p75-depleted or interaction-deficient LEDGF D366A control cells (p > 0.05 compared with KD and LEDGF D366A; Student's t test), suggesting a role for the AT-hook-like region in the marginal rescue of HIV replication by ΔN93-LEDGF.
We also generated LEDGF/p75 hybrids where we exchanged the PWWP domain of LEDGF/p75 with that of HDGF or HRP-2, and we replaced the chromatin binding N terminus of LEDGF/p75 with HDGF (HDGF-LEDGF 325-530) [...] transduction with HIV-based viral vectors to near wild-type levels (Fig. 4). We conclude that the PWWP domain in LEDGF/p75 is essential for HIV integration and replication and that different PWWP domains can fulfill this requirement. However, the expression of PWWP HDGF-LEDGF and PWWP HRP2-LEDGF proteins in LEDGF/p75-depleted cells did not fully rescue HIV NL4.3 replication to wild-type levels (Fig. 5), indicating that the natural LEDGF/p75 PWWP domain may have unique properties. We analyzed lentiviral integration site distributions and confirmed that the characteristic palindrome at the site of integration was conserved in all data sets (supplemental Fig. S6), consistent with data showing that the sequence is determined by integrase (78,79). Deletion of the PWWP domain resulted in an integration pattern like that found in LEDGF/p75-depleted cells or cells containing interaction-deficient LEDGF D366A (Table 1 and Fig. 6). The patterns in the absence of the PWWP domain showed reduced integration into transcription units and gene-dense regions and a preference for CpG islands and gene 5′ ends. Integration frequency relative to genomic features was restored to wild-type levels when the PWWP domain of LEDGF/p75 was replaced with that of other HRP proteins (HDGF, HRP-2). Replacing all chromatin binding elements in LEDGF/p75 with full-length HDGF resulted in significantly more integration in genes compared with LEDGF BC, combined with a preference for CpG islands and GC-rich regions (Table 1 and Fig. 6). Even though integration into genes was favored, HDGF-LEDGF 325-530 could not rescue HIV replication or integration to fully wild-type levels.
Lentiviral integration generally favors gene-dense regions, which are predictably rich in GC content when large windows around integration sites are considered in WT cells (80). However, when considering only small genomic intervals surrounding integration sites, lentiviral integration prefers A/T-rich loci within the larger GC-rich regions (supplemental Fig. S8; compare GC content >50 and <50 kb for WT and LEDGF BC). It has been suggested that the AT-hook-like motifs in LEDGF/p75 contribute to favored binding to A/T-rich sequences, or possibly that A/T-rich sequences favor wrapping around nucleosomes, which promotes integration (24,81,82). Introduction of HDGF-LEDGF 325-530 in LEDGF/p75-depleted cells did not restore integration in A/T-rich regions compared with LEDGF BC or any of the PWWP swaps but directed integration even more toward GC-rich segments of the genome, arguing in favor of the idea that non-PWWP domains of LEDGF/p75 that are absent in HDGF (such as AT-hook-like motifs or charged regions) affect the GC content of integration site neighborhoods.
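The dependence of apparent GC content on window size can be made concrete with a small sketch. It is purely illustrative: the function name `gc_fraction`, the toy sequence, and the choice of a window centered on the integration position are assumptions, not the analysis performed for supplemental Fig. S8.

```python
def gc_fraction(sequence, pos, window):
    """Fraction of G/C among A/C/G/T bases in a window centered on pos.

    sequence : DNA string (toy input; a real analysis would read a genome)
    pos      : 0-based integration position
    window   : total window size in bp
    Returns None if the window contains no unambiguous bases.
    """
    half = window // 2
    start = max(0, pos - half)
    end = min(len(sequence), pos + half)
    region = sequence[start:end].upper()
    acgt = sum(region.count(base) for base in "ACGT")
    if acgt == 0:
        return None
    return (region.count("G") + region.count("C")) / acgt


# Toy locus: a short A/T-rich core embedded in a GC-rich neighborhood.
toy = "GC" * 20 + "AT" * 5 + "GC" * 20
small = gc_fraction(toy, 45, 10)   # window covers only the A/T-rich core
large = gc_fraction(toy, 45, 90)   # window covers the whole toy locus
```

In this toy case the small window is entirely A/T (`small` is 0.0), whereas the large window is predominantly GC (`large` is 80/90), mirroring the qualitative pattern described above.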
In evolutionary terms the PWWP domain first appears in eukaryotic proteins. The domain is often associated with chromatin function, including histone and DNA modification, transcription, and DNA repair (41). Although several publications indicate that the PWWP domain recognizes chromatin and DNA without sequence specificity (43,83), Yang and Everett (45) demonstrated that HDGF acts as a transcriptional repressor through binding of its PWWP element to a conserved DNA element in the promoter of target genes, including the SMYD1 gene. We and others demonstrated that fusion of alternative DNA-binding proteins to the C-terminal end of LEDGF/p75 allows retargeting of integration in the neighborhood of the respective protein binding sites (17,18). Indeed, we found that PWWP HDGF-LEDGF or HDGF-LEDGF 325-530 targets lentiviral integration to SMYD1-like sequences. However, a similar preference for integration near this sequence was detected in cells expressing any of the PWWP domain-containing constructs tested. It is possible that the PWWP domains of LEDGF/p75, HDGF, and HRP-2 all bind SMYD1-like sequences. Alternatively, the PWWP domains might interact with specific genes via epigenetic modifications present at SMYD1-like sequences in a cellular context. Another possibility is that SMYD1-like motifs are enriched near promoters and thus are enriched near genes and gene-rich regions, thereby correlating these sites with lentiviral integration that targets these regions as well.
Chromatin function is regulated in part by reversible methylation of specific lysine residues of histone proteins. Specialized chromatin regulatory factors named "chromatin readers" (66) specifically recognize distinct histone signatures on the chromatin, thereby defining the functional consequence of histone methylation. Several PWWP-containing proteins interact via their PWWP domain with specific histone modifications; the PWWP domain of Pdp1 (PWWP domain protein 1) specifically interacts with H4K20me1 (57), the PWWP domains of bromo and plant homeodomain finger-containing protein 1 (BRPF1) (64) and DNA methyltransferase 3a (65) interact with H3K36me3 (40,84), and the PWWP domains of N-PAC, MSH-6, NSD1, and NSD2 interact with H3K36me3 (66). We thus analyzed lentiviral integration preference near specific epigenetic marks. In HeLa cells only two histone modifications (H3K4me1 and -me3) have been mapped (58), so we also included epigenetic maps obtained in CD4+ T cells (56,57). Although we cannot conclude from the epigenetic heat map (Fig. 7) which specific marks are recognized, these data suggest that PWWP HDGF-LEDGF-, PWWP HRP2-LEDGF-, and LEDGF BC-tethered integration associates with specific epigenetic marks. To deduce whether certain epigenetic marks are favored by a specific PWWP domain, we compared the distribution of epigenetic counts of the PWWP sets versus WT cells (Fig. 8).
One explanation for our data might be that variation among the PWWP domains results in different integration targeting. Binding motifs differ among family members; the HRP family, containing LEDGF/p75, HDGF, and HRP-2 among other proteins, typically contains a PHWP motif, whereas the NSD1 family of histone methyltransferases often contains an RWWP motif, and DNA methyltransferase 3a/b carries an SWWP motif. Less direct models are also possible in which different PWWP domains favor binding to chromosomal sites that have differing associations with histone residues and modifications. For the virus, selection of integration sites adjacent to particular epigenetic marks through binding of a cellular cofactor that recognizes these marks may be beneficial for proviral gene expression and/or may affect the ratio of transcriptionally latent versus active proviruses. More detailed studies of PWWP domain biology and effects on integration should clarify the molecular mechanisms involved.