Interactome mapping defines BRG1, a component of the SWI/SNF chromatin remodeling complex, as a new partner of the transcriptional regulator CTCF

The highly conserved zinc finger CCCTC-binding factor (CTCF) regulates genomic imprinting and gene expression by acting as a transcriptional activator or repressor of promoters and insulator of enhancers. The multiple functions of CTCF are accomplished by co-association with other protein partners and are dependent on genomic context and tissue specificity. Despite the critical role of CTCF in the organization of genome structure, to date, only a subset of CTCF interaction partners have been identified. Here we present a large-scale identification of CTCF-binding partners using affinity purification and high-resolution LC-MS/MS analysis. In addition to functional enrichment of specific protein families such as the ribosomal proteins and the DEAD box helicases, we identified novel high-confidence CTCF interactors that provide a still unexplored biochemical context for CTCF's multiple functions. One of the newly validated CTCF interactors is BRG1, the major ATPase subunit of the chromatin remodeling complex SWI/SNF, establishing a relationship between two master regulators of genome organization. This work significantly expands the current knowledge of the human CTCF interactome and represents an important resource to direct future studies aimed at uncovering molecular mechanisms modulating CTCF pleiotropic functions throughout the genome.

CTCF is a ubiquitously expressed transcription factor with pleiotropic functions driven by recognition and binding of a preferentially unmethylated CpG-rich consensus sequence within several genomic sites. Regulatory functions of CTCF as an enhancer-blocking insulator were first discovered at the β-globin (1) and the imprinted H19-Igf2 loci (2)(3)(4). These pioneering studies revealed the ability of CTCF to act as an insulator, thereby preventing the interaction between enhancers and promoters and regulating transcription at selected gene loci.
A transcriptional repressor role for CTCF was also reported, mediated by its binding to different sequences in mouse, human, and chicken MYC promoters (5,6). Later, genome-wide mapping of CTCF-binding sites revealed that CTCF can recognize a wide variety of DNA target sequences (7)(8)(9), with high-occupancy sites being conserved across cell types (10).
Over the past years, a broader view of CTCF as a uniquely versatile zinc finger protein has emerged, adding knowledge on CTCF functions in transcriptional activation/repression, enhancer-blocking and/or chromatin barrier insulation, hormone-responsive silencing, genomic imprinting, transcription pausing, alternative mRNA splicing, and, more recently, as an architectural protein regulating higher-order chromatin structure and genome topology (11)(12)(13)(14)(15)(16)(17). Indeed, recent advances in chromosome conformation capture and high-throughput chromosome conformation capture methods have significantly increased our understanding of the role of CTCF in mediating long-range interactions as the basis of genome partitioning into topologically associating domains (TADs), defined as chromosomal units exhibiting a higher frequency of interaction within domains than with adjacent domains. Interestingly, CTCF-binding sites have been found to be enriched at TAD boundaries along with transcription start sites (TSSs), further supporting its role as a chromatin organizer (18). In contrast to these earlier studies, Ramírez et al. (19) recently found that in flies the CTCF DNA-binding motif is rarely associated with TAD boundaries and that specific DNA motifs can recruit different boundary proteins, thus guiding genome architecture. In particular, a DNA-guided chromatin assembly model has been proposed based on the recognition of boundary elements by specific proteins, which help load TAD assembly factors onto chromatin (19).
Several efforts have been made to unravel the mechanisms underlying CTCF pleiotropic functions, to understand how this unique transcription factor can execute diverse functions in different contexts and cell types. Nakahashi et al. (20) demonstrated that CTCF associates with a wide array of DNA modules via combinatorial clustering of its 11 zinc fingers. An additional mechanism widely recognized to modulate CTCF recruitment at various genomic loci is the interaction with other proteins, which affects its functional specificity in a genomic context- and tissue-specific manner (12-16, 21, 22). Indeed, an array of classical biochemical techniques has been used to identify binding partners of CTCF, including traditional co-immunoprecipitation strategies and yeast two-hybrid assays with CTCF as bait; for other proteins, genome-wide co-localization with CTCF by conventional ChIP, ChIP-on-chip, or ChIP-Seq has also been reported (12-16, 21, 22). These approaches have shown that CTCF exerts its function through specific co-associations with a plethora of other proteins belonging to distinct functional groups, such as DNA-binding proteins (e.g. Yin Yang 1 (YY1), YB1, and Kaiso), DNA and RNA helicases (e.g. CHD8 and the DEAD box RNA helicase p68), histones (e.g. H2A and H2A.Z), and other regulatory proteins including poly(ADP-ribose) polymerase, nucleophosmin, topoisomerase II, RNA polymerase II, and transcription factor II-I (15,21). In addition, cooperation of CTCF with cohesin has emerged as crucial in determining the spatial organization of genomes into chromatin loops (23,24) and TADs (18, 25-27).
It is therefore clear that the identification of novel CTCF-binding partners is of central interest, both to shed light on the mechanisms driving well-known CTCF functions and to open new perspectives on still unexplored roles of this multivalent transcription factor. Indeed, despite the biological importance of CTCF, our general knowledge of the human CTCF interaction network is limited to selected protein partners, mainly involved in specific functions such as binding and modification of DNA or chromatin. Recent advances in MS instrumentation and computational tools have enabled the identification of high-confidence interaction proteomes of several biologically relevant protein groups by large-scale affinity purification coupled to MS (AP-MS), markedly improving our knowledge of protein interaction networks and functions (28-35).
Here we present a global interaction study of human CTCF by high-resolution nano-LC-electrospray ionization-MS/MS. We identified 90 high-confidence protein-protein interactions that constitute a network of proteins with specific functions in chromatin binding, promoter-specific chromatin binding, transcription, and other processes. In addition to confirming a number of well-known CTCF interactors, our study reveals co-associations of CTCF with previously uncharacterized partners that are important for genome organization, such as BRG1, the major ATPase subunit of the chromatin remodeling complex SWI/SNF. This work significantly expands the current knowledge of the human CTCF interactome and represents an important resource to direct future studies aimed at uncovering molecular mechanisms modulating CTCF pleiotropic functions throughout the genome.

Purification and identification of CTCF-interacting complexes by high-resolution MS
Despite the master role of CTCF in regulating gene expression and genome structure, a large-scale study to identify CTCF interaction partners by high-resolution LC-MS/MS analysis has not been previously reported. Here, we applied an AP-MS approach to characterize the human CTCF interactome in WiT49 cells overexpressing CTCF. A schematic outline of the AP-MS procedure used in this study is shown in Fig. 1. Following transfection of WiT49 cells with the pcDNA3 vector bearing the full-length CTCF coding sequence, CTCF overexpression was verified by quantitative RT-PCR (Fig. S1). Protein complexes were purified from whole cell lysates by immunoprecipitation and protein A affinity pulldown. After tryptic digestion, peptides were subjected to MS/MS in technical replicates using a nano-LC Orbitrap system. By applying very stringent filtering criteria, including presence in replicate injections and/or identification with more than one unique peptide, 90 high-confidence proteins putatively belonging to the CTCF interactome were identified (Table 1). Details of the identifications are reported in Table S1.
Two of our selected CTCF-interacting proteins, nucleophosmin (NPM1) and rRNA 2′-O-methyltransferase fibrillarin (FBL), are among the CTCF interactors included in the curated BioGRID interaction database, as are other proteins in our list that have previously been associated with CTCF, such as the DEAD box proteins 5 and 17 (DDX5 and DDX17), also known as RNA helicases p68 and p72 (36). We also identified several other known CTCF interactors, such as DNA topoisomerase 2-beta (TOP2B), poly(ADP-ribose) polymerase 1, and Cullin-associated NEDD8-dissociated protein 1 (CAND1), that were not considered further because they did not meet the filtering criteria used in this study. Other previously known CTCF-binding partners were not identified in our screen, likely because their low abundance prevented detection by MS. However, we cannot exclude that this failure reflects the association of CTCF with different binding partners that are more relevant in our cell model.

Clustering of CTCF-binding partners based on known protein interactions and functions
The clustering and visualization of protein-protein interaction networks is critical for the functional interpretation of MS data and for prioritizing biologically relevant novel binding partners for validation. To this end, candidate CTCF-interacting proteins were mapped onto a single interconnected network with the NetworkAnalyst software using the literature-curated IMEx Interactome database. The ClusterMaker2 Cytoscape plug-in was used to cluster and visualize network nodes into modules for the detection of previously annotated complexes.

Protein interaction landscape of human CTCF

By this approach, 76 of the 91 candidate proteins were mapped onto a network including a large cluster containing several ribosomal proteins (Fig. 2, blue) connected to several smaller ones, including a cluster of ATP-dependent RNA helicases (Fig. 2, yellow). An unbiased gene ontology-based classification was then applied to investigate the functions of proteins associated with CTCF in the pulldown experiment. To this end, the ClueGO Cytoscape plug-in was used to generate a functionally grouped gene ontology (GO)/pathway term network of enriched molecular function categories for the identified proteins based on kappa statistics. Identified proteins were assigned to 13 groups that were mapped onto a functionally clustered network (Fig. 3 and Table S2). Not surprisingly, the largest cluster of the output network revealed that a subset of identified proteins was involved in specific functions related to transcription, as well as to chromatin DNA binding and promoter-specific chromatin binding. An additional molecular function ontology group enriched in the CTCF interactome is ATP-dependent helicase activity. To gain further insight into the proteins associated with each GO term, we visualized these terms and their associated proteins in a heat-map layout (Fig. S2). This analysis highlights the presence of several DEAD or DEAH box helicases across multiple groups. Among these, we identified EIF4A1, DDX3X, DHX9, DDX5, and DDX17, the last two of which are known CTCF-interacting proteins (36).
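ClueGO groups GO terms according to how many identified proteins they share, scored with a kappa statistic. As a minimal sketch, assuming this is Cohen's kappa computed on binary term-membership vectors over the list of identified proteins (our reading; ClueGO's exact implementation is not described in this section), the score can be computed as below. The function name and the gene sets in the example are hypothetical and are not the annotations behind Table S2.

```python
def term_kappa(term_a, term_b, universe):
    """Cohen's kappa between two GO terms, each treated as a gene set.

    Every gene in `universe` is scored in/out for each term; kappa compares the
    observed agreement with the agreement expected by chance from the term sizes.
    """
    genes = list(universe)
    n = len(genes)
    if n == 0:
        raise ValueError("universe must not be empty")
    in_a = [g in term_a for g in genes]
    in_b = [g in term_b for g in genes]

    # Observed agreement: genes annotated to both terms or to neither term
    observed = sum(1 for x, y in zip(in_a, in_b) if x == y) / n

    # Chance agreement from each term's marginal frequency
    p_a = sum(in_a) / n
    p_b = sum(in_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)

    if expected == 1.0:
        # Both terms contain all genes or none: agreement is trivially perfect
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical gene sets, for illustration only
identified = ["DDX5", "DDX17", "DHX9", "DDX3X", "EIF4A1", "SMARCA4", "NPM1", "FBL"]
helicase_activity = {"DDX5", "DDX17", "DHX9", "DDX3X", "EIF4A1"}
rna_binding = {"DDX5", "DDX17", "DHX9", "NPM1", "FBL"}
print(round(term_kappa(helicase_activity, rna_binding, identified), 3))
```

Terms whose kappa exceeds a user-chosen threshold are linked in the network, so two terms annotated with nearly the same proteins (kappa close to 1) end up in the same functional group.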
We also detected the transcription activator BRG1, also known as the ATP-dependent chromatin remodeler SMARCA4, which, together with BRM (also known as SMARCA2), is one of the two mutually exclusive core ATPase subunits of the switch/sucrose non-fermentable (SWI/SNF) chromatin remodeling complex. Interestingly, we also detected AT-rich interactive domain-containing protein 1A (ARID1A), another component present in only some variants of the SWI/SNF complex. Because an interaction between BRG1 and CTCF has long been postulated but not yet experimentally demonstrated (37-39), we focused on this interaction and selected BRG1 for further investigation. To rule out that the interaction between BRG1 and CTCF was mediated by nucleic acids, we performed pulldown assays in the presence or absence of Benzonase nuclease to degrade DNA/RNA, followed by targeted LC-MS/MS analyses. The presence of BRG1 in the CTCF IP was unchanged following Benzonase digestion, suggesting that the interaction between the two proteins is DNA/RNA-independent (Figs. S3-S20).

Validation by co-immunoprecipitation of interactions of CTCF with BRG1 and DDX5
Co-immunoprecipitation (co-IP) followed by immunoblotting was performed to further validate the interaction of CTCF with BRG1. DDX5 was also selected to validate the specificity of the interaction, based on previous evidence reporting DDX5 as a common interaction partner of both BRG1 and CTCF (36,40). Both BRG1 and DDX5 co-immunoprecipitated with anti-CTCF but not with control IgG (Fig. 4A). By CTCF IP followed by Western blotting, we also confirmed that Benzonase treatment did not affect the interaction of CTCF with BRG1 (Fig. S21). Moreover, we performed reciprocal co-IP assays using whole cell lysates from WiT49 cells and antibodies against BRG1 (Fig. 4B) and DDX5 (Fig. 4C). The reciprocal co-IP analysis demonstrated that CTCF co-purified with endogenous BRG1 and DDX5. These results further support the AP-MS data and confirm specific interactions of the selected candidate proteins with CTCF.

Genomic co-occupancy by CTCF, BRG1, and DDX5
To provide further insights into the functional interaction of CTCF with BRG1 and DDX5, we asked whether these proteins share common DNA-binding sites. We therefore reanalyzed ChIP-Seq data on the genome-wide binding profiles of the three proteins in HeLa cells (Table S4) to assess their chromatin co-occupancy and to determine whether they co-localize to the same genomic regions. First, we performed pairwise comparisons of the genomic sites occupied by each protein (Fig. 5, A-C). Consistent with the physical interaction revealed by AP-MS, we found significant co-localization for all the analyzed comparisons (p value < 0.005). In particular, when CTCF and BRG1 sites were compared, ~9% of BRG1 sites were shared with 6% of CTCF sites (Fig. 5A). Similarly, comparison between CTCF and DDX5 revealed that 11% of DDX5 sites were co-occupied by ~7% of CTCF sites (Fig. 5B). A higher number of overlapping sites was shared by BRG1 and DDX5, with 26% of DDX5 sites also bound by 22% of BRG1 sites (Fig. 5C). We also identified a set of 497 sites simultaneously co-occupied by CTCF, DDX5, and BRG1, with ~44% of sites co-occupied by BRG1 and CTCF also bound by DDX5 (Fig. 5D).

Figure 1. Schematic workflow of the immunoprecipitation-MS approach to identify CTCF-interacting proteins. Protein complexes were purified from WiT49 whole cell extract by a two-step affinity purification with an anti-CTCF antibody followed by protein A/G pulldown. Following elution and tryptic digestion, the resulting peptides were subjected to nano-LC-MS/MS in technical replicates for protein identification. Selected preys were then validated by co-immunoprecipitation and Western blotting (WB).
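The pairwise co-occupancy percentages reported for the ChIP-Seq comparisons are asymmetric because each is computed relative to a different peak set. A minimal sketch of that bookkeeping follows, assuming peaks are (chrom, start, end) tuples in half-open coordinates and that "co-occupied" means sharing at least one base; the section does not name the overlap tool, the minimum overlap, or the significance test, so none of those are reproduced here. Function names and coordinates are hypothetical.

```python
from collections import defaultdict


def index_by_chrom(sites):
    """Group (chrom, start, end) intervals by chromosome."""
    index = defaultdict(list)
    for chrom, start, end in sites:
        index[chrom].append((start, end))
    return index


def overlapping(sites_a, sites_b):
    """Return the sites in sites_a that overlap at least one site in sites_b.

    Intervals are half-open [start, end); they overlap if they share >= 1 base.
    This is a plain linear scan: fine for a sketch, too slow for genome-scale sets.
    """
    index_b = index_by_chrom(sites_b)
    return [
        (chrom, start, end)
        for chrom, start, end in sites_a
        if any(s < end and start < e for s, e in index_b.get(chrom, []))
    ]


def overlap_fractions(sites_a, sites_b):
    """Fraction of A sites overlapping B, and fraction of B sites overlapping A."""
    frac_a = len(overlapping(sites_a, sites_b)) / len(sites_a)
    frac_b = len(overlapping(sites_b, sites_a)) / len(sites_b)
    return frac_a, frac_b


# Toy coordinates (hypothetical, not the HeLa peaks)
ctcf = [("chr1", 100, 200), ("chr1", 1000, 1100), ("chr2", 50, 150), ("chr2", 500, 600)]
brg1 = [("chr1", 150, 250), ("chr2", 100, 120)]
ddx5 = [("chr1", 180, 190)]

print(overlap_fractions(brg1, ctcf))  # (1.0, 0.5)
# One possible three-way definition: CTCF sites overlapping BRG1 and then DDX5
print(overlapping(overlapping(ctcf, brg1), ddx5))  # [('chr1', 100, 200)]
```

In the toy data every BRG1 peak hits CTCF but only half of the CTCF peaks hit BRG1, which is the same kind of asymmetry as "~9% of BRG1 sites shared with 6% of CTCF sites" when the two peak sets differ in size.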
To further characterize the localization of the three proteins, we investigated the distribution of both co-occupied sites (Fig. 6, blue bars) and sites occupied by CTCF alone (Fig. 6, green bars) as a function of distance from TSSs. Co-localized regions for CTCF-BRG1, CTCF-DDX5, and CTCF-BRG1-DDX5 were enriched in a window of 0-2 kb around TSSs relative to sites occupied by CTCF alone (p value < 2.2e-16). Interestingly, the CTCF-BRG1-DDX5 intersection was significantly enriched around TSSs even relative to both CTCF-BRG1 (p value < 2.2e-16) and CTCF-DDX5 (p value < 6.4e-08), suggesting that sites co-occupied by all three proteins are more strongly enriched at promoter regions than the overlapping sites of the pairwise comparisons.
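The TSS-proximity comparison amounts to measuring, for each site, the distance to the closest annotated TSS and counting how many sites fall within 2 kb. A minimal sketch follows, assuming distances are taken from the peak midpoint, that "0-2 kb around TSSs" means within 2 kb upstream or downstream, and that sites on chromosomes without any TSS count as distal; the TSS annotation used by the authors is not stated, and all names and coordinates here are hypothetical.

```python
import bisect
from collections import defaultdict


def index_tss(tss_positions):
    """Sort TSS coordinates per chromosome so they can be binary-searched."""
    index = defaultdict(list)
    for chrom, pos in tss_positions:
        index[chrom].append(pos)
    for positions in index.values():
        positions.sort()
    return index


def distance_to_nearest_tss(site, tss_index):
    """Absolute distance (bp) from a site's midpoint to the closest TSS, or None."""
    chrom, start, end = site
    positions = tss_index.get(chrom)
    if not positions:
        return None
    mid = (start + end) // 2
    i = bisect.bisect_left(positions, mid)
    # The closest TSS is the first one at/after mid or the last one before it
    candidates = [abs(positions[j] - mid) for j in (i - 1, i) if 0 <= j < len(positions)]
    return min(candidates)


def fraction_near_tss(sites, tss_index, window=2000):
    """Fraction of sites whose midpoint lies within `window` bp of a TSS."""
    near = 0
    for site in sites:
        d = distance_to_nearest_tss(site, tss_index)
        if d is not None and d <= window:
            near += 1
    return near / len(sites)


# Hypothetical annotation and sites
tss = index_tss([("chr1", 1000), ("chr1", 50000), ("chr2", 10000)])
sites = [("chr1", 1500, 1700), ("chr1", 20000, 20200), ("chr2", 11500, 12500), ("chr3", 0, 100)]
print(fraction_near_tss(sites, tss))  # 0.5
```

Comparing this fraction between co-occupied and CTCF-only sites gives the proportions behind the enrichment statements; the statistical test applied to them is not specified in this section.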
Moreover, relative to CTCF sites alone, we found an over-representation (p value < 2.2e-16) of CTCF-BRG1, CTCF-DDX5, and CTCF-BRG1-DDX5 co-localized regions carrying trimethylation of histone H3 at lysine 4 (H3K4me3) and trimethylation of histone H3 at lysine 36 (H3K36me3), marks associated with active transcription. Accordingly, for the same co-localized regions, we observed an under-representation (p value < 2.2e-16) of the repressive histone mark H3K27me3. Overall, these results suggest that CTCF, BRG1, and DDX5 co-occupy transcriptionally active chromatin regions (Table S3).
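Over- and under-representation of a histone mark can be framed as a 2 × 2 contingency table: rows are co-occupied regions versus CTCF-only regions, columns are regions that do or do not overlap the mark. The section does not say which test produced the reported p values, so the sketch below uses Pearson's chi-squared test with 1 degree of freedom as a stand-in (Fisher's exact test is a common alternative). The function name and the counts in the example are hypothetical.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (1 df, no continuity correction) on the table
        [[a, b],    row 1: e.g. co-occupied regions with / without the mark
         [c, d]]    row 2: e.g. CTCF-only regions with / without the mark
    Returns (statistic, p_value, direction of row 1 relative to row 2).
    """
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    n = row1 + row2
    if 0 in (row1, row2, col1, col2):
        raise ValueError("each row and column needs at least one count")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2))
    p = math.erfc(math.sqrt(stat / 2))
    # Compare the fraction of regions carrying the mark in row 1 vs row 2
    f1, f2 = a / row1, c / row2
    direction = "over" if f1 > f2 else "under" if f1 < f2 else "none"
    return stat, p, direction


# Hypothetical counts: regions overlapping H3K4me3 peaks (yes, no)
print(chi2_2x2(400, 97, 8000, 32000))
```

A very small p value with direction "over" would correspond to the H3K4me3/H3K36me3 result, and "under" to the H3K27me3 result.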

Discussion
The transcription factor CTCF plays a pivotal role in a myriad of genomic processes, including transcription, imprinting, and long-range chromatin interactions. It is widely recognized that the versatility of this multitasking master regulator is at least in part determined by co-association with genomic context-specific binding partners (14,21). Protein-protein interaction maps have proven very useful for understanding protein molecular functions.
Here, we present the first global CTCF-associated protein interactome map generated by high-resolution MS. Our study confirms previously reported interactions and reveals novel potential CTCF-binding partners, suggesting that the annotated CTCF interaction proteome is far from complete.
Figure 3. Functionally grouped network of enriched molecular function categories for the identified proteins, generated using the ClueGO Cytoscape plug-in. The proportion of shared proteins between terms was evaluated using kappa statistics. GO terms are represented as nodes whose size reflects the significance of term enrichment. Partially overlapping functionally related groups are represented as squares, whereas nonoverlapping terms are represented as circles. Clusters including more than two terms are numbered as clusters 1 (green), 2 (blue), and 3 (yellow). The group number resulting from ClueGO associations of GO terms is indicated for each cluster (Table S2). Nonclustered terms, including groups 0-7, are colored pink.

Consistent with other studies, we identified several ribosomal proteins together with the nucleolar protein nucleophosmin, a molecular chaperone involved in the transport of ribosome subunits and histones from the cytoplasm to the nucleus and nucleoli (41). Nucleophosmin has been shown to interact with CTCF at insulator sites in vivo (41), and the CTCF/nucleophosmin association has been hypothesized to account for the co-purification of ribosomal proteins (41). We also confirmed, by both MS identification and co-IP experiments, the interaction of CTCF with the DEAD box RNA helicase p68 (DDX5). This complex has also been reported to include the steroid receptor RNA activator and is essential for CTCF function as an enhancer-blocking insulator in vivo (36). Interestingly, together with DDX5, we also identified the highly homologous protein DDX17 (p72), previously reported to be associated with CTCF (36), and several additional members of the DEX(D/H) box family, such as DHX9 (RNA helicase A) and DDX3X. These proteins are engaged in multiple processes of RNA biology, including pre-mRNA processing (i.e. cap formation, splicing/alternative splicing, and polyadenylation), ribosome biogenesis, RNA turnover, export, and translation (reviewed in Refs. 42 and 43). In addition, a growing body of evidence suggests that several DEX(D/H) box proteins act as transcriptional regulators (42,43). Intriguingly, these roles in the transcriptional machinery appear to be independent of their RNA helicase or unwindase activity. Indeed, it has been reported that they may either stabilize the transcription initiation complex or act as bridging factors that facilitate the recruitment of other transcription factors/co-activators, such as CBP, p300, and RNA polymerase (Pol) II, to responsive promoters (44). RNA helicases p68/p72 and the noncoding steroid receptor RNA activator have also been found associated with MyoD and are directly involved in its co-activation by promoting the assembly of a transcription initiation complex including the TATA-binding protein TBP and RNA Pol II (40).
BRG1, the catalytic ATPase subunit of the SWI/SNF chromatin remodeling complex, which physically interacts with p68/p72 (40), also takes part in this mechanism.
Our AP-MS analysis reveals, for the first time, that BRG1 is co-associated with CTCF. The high-confidence interaction was verified in independent replicate experiments and was further validated by co-IP and by reverse co-IP, in which BRG1 was immunoprecipitated and the immunoblot was probed with a CTCF antibody. Consistent with the fundamental role of chromatin remodeling complexes in regulating chromatin accessibility for gene expression, a genome-wide screen of the binding sites of SWI/SNF components (i.e. Ini1, BAF155, BAF170, and BRG1) demonstrated an extensive overlap with promoters, enhancers, and many regions occupied by Pol II and CTCF (37). More recently, in addition to the transcriptional role of BRG1 at gene promoters, a more complex scenario has emerged that identifies BRG1 as a dynamic component of higher-order chromatin organization enriched at TAD boundaries (38,39). BRG1 has also been implicated in the maintenance of nuclear structure integrity and in mediating specific long-range chromatin interactions through interactions with transcription factors and other co-factors (37,45). In this context, it was suggested that BRG1 plays a role at TAD boundaries by regulating nucleosome occupancy and possibly CTCF localization. Indeed, intersecting a CTCF ChIP-seq data set generated in MCF-10A cells (46) with BRG1 peaks revealed that ~10% of all BRG1 peaks, and 12% of the BRG1 peaks located specifically at TAD boundaries, directly overlapped with CTCF (38). Moreover, BRG1 knockdown was associated with reduced nucleosome occupancy around CTCF sites in mouse fibroblast cells (38), and similar effects of BRG1 knockdown were observed around the TSSs of known genes (47). Although cross-talk between BRG1 and CTCF has long been hypothesized and is supported by genome-wide approaches, attempts to co-purify these factors by AP-MS had so far been unsuccessful. In eukaryotic cells, a balance between tight packaging and accessibility of chromatin is usually achieved by specific proteins that dynamically modify chromatin structure. BRG1 and CTCF are regarded as master regulators of chromatin architecture: BRG1 fine-tunes DNA accessibility in an ATP-dependent manner, whereas CTCF is widely recognized as a global genome organizer able to coordinate higher-order chromatin structures and to regulate gene expression (12,13,15). Our data point toward a cooperation between the two proteins that may be crucial in determining their functional specificity. Interestingly, our data also suggest an unanticipated interplay in transcriptional regulation between CTCF, BRG1, and DDX5, because regions simultaneously co-occupied by the three proteins are significantly enriched at promoter regions.
The high-resolution map of CTCF-binding sites in the human genome revealed that only ~20% of CTCF sites are near transcription start sites (8). Unlike the distribution of general transcription factors, the localization of CTCF sites distal to TSSs has been suggested to be consistent with its putative role as an insulator-binding protein (8). Nevertheless, considerable evidence for a direct role of CTCF in the transcriptional regulation of individual genes has been reported (48,49). Moreover, Peña-Hernández et al. (50) reported that the interaction between CTCF and transcription factor II-I was essential in directing CTCF to the promoter regions of genes involved in metabolism. We also observed a significant over-representation, relative to CTCF sites alone, of CTCF, BRG1, and DDX5 co-localized regions with H3K4me3 and H3K36me3. H3K4me3 is typically enriched at TSS/promoter regions with open chromatin structure, and H3K36me3 marks the bodies of transcribed genes; both are positively correlated with transcriptional activation. Accordingly, we observed an under-representation of CTCF, BRG1, and DDX5 co-localized regions with the repressive histone modification H3K27me3, which is associated with silent genes. Taken together, our findings suggest that the sites where CTCF co-localizes with BRG1 and DDX5 are mostly a subset of genome-wide CTCF sites located around TSSs and associated with histone marks of transcriptionally active chromatin. Overall, it can be hypothesized that, whatever the effect of CTCF on transcription (e.g. repression, activation/transactivation, or pausing), these different outputs can only occur through cooperation with other proteins involved in remodeling chromatin architecture, such as BRG1. Additional proteins of the transcriptional machinery, such as DDX5, may contribute to the diversification of CTCF functions through the formation of alternative complexes, possibly involved in recruiting other transcription factors/co-activators to promoters.
Although the roles of several identified proteins are still undefined, our study highlights the capability of AP-MS to fill gaps in our knowledge about novel CTCF interactors contributing to the fine-tuning of its multiple functions. The presented CTCF interaction proteome represents a knowledge base for further elucidating individual protein interactions with CTCF and for guiding future functional experiments to uncover the molecular bases responsible for the high versatility of this unique transcription factor.

Cell culture, cloning, and transfections
The WiT49 cell line, derived from a Wilms tumor primary lung metastasis (51), was cultured in Iscove's modified Dulbecco's medium supplemented with 10% fetal calf serum, 100 units/ml penicillin, and 100 mg/ml streptomycin at 37°C in a humidified 5% CO2 atmosphere. The cDNA encoding full-length CTCF was cloned into the pcDNA3 expression vector under the control of the constitutive cytomegalovirus promoter. WiT49 cells were transfected with the pcDNA3-CTCF plasmid or with the empty control vector using Lipofectamine 3000 according to the manufacturer's protocol (Thermo Fisher Scientific). Stably transfected cells were selected with 1 mg/ml G418 (Life Technologies) and maintained in 0.6 mg/ml G418.

Sample preparation for MS analysis
WiT49 cells overexpressing CTCF from ten 150-mm plates were harvested by trypsinization and washed with PBS. Cells were lysed for 45 min at 4°C in lysis buffer (200 µl/plate) containing 10 mM Tris-HCl, pH 7.4, 350 mM NaCl, 1 mM EDTA, 1% Triton X-100, and 10% glycerol, and lysates were clarified at 15,000 × g for 15 min at 4°C. For Benzonase digestion, cells were lysed in 10 mM Tris-HCl, pH 7.4, 350 mM NaCl, 1 mM MgCl2, 1% Triton X-100, 10% glycerol and incubated for 30 min at room temperature in the presence or absence of 250 units of Benzonase (Sigma-Aldrich). Aliquots of lysates (10 µl) were analyzed by 1% agarose gel electrophoresis and ethidium bromide staining to verify DNA/RNA degradation (data not shown). Protein concentration was determined by Bradford assay. For IP, protein lysates (1-2 mg for Benzonase-treated/untreated samples) were diluted to 1 ml in IP buffer (50 mM Tris-HCl, pH 7.4, 150 mM NaCl, 0.25% sodium deoxycholate) and incubated for 1 h at 4°C with DiaMag protein A-coated magnetic beads (40 µl, Diagenode). After this preclearing step, samples were incubated overnight at 4°C with polyclonal anti-CTCF (Diagenode C15010210, 10 µg) or, as a negative control, polyclonal rabbit anti-IgG (Diagenode C15410206, 10 µg). Immune complexes were then incubated for 3 h under rotation at 4°C with DiaMag protein A-coated magnetic beads (40 µl, Diagenode) prewashed in IP buffer. The beads were collected on a magnetic stand, washed three times with 100 µl of 100 mM NH4HCO3, pH 8.0, and resuspended in 100 µl of the same buffer. Proteins were reduced with 10 mM DTT (final concentration) at 55°C for 1 h and, following a wash step with 100 µl of NH4HCO3, carbamidomethylated with 7.5 mM iodoacetamide (final concentration) at room temperature in the dark for 15 min.
Following a further wash step with 100 µl of NH4HCO3, enzymatic hydrolysis was performed by adding 0.2 µg of tosyl phenylalanyl chloromethyl ketone-treated trypsin to the reduced and alkylated mixture, and digestion was carried out at 37°C for 16 h. After digestion, samples were centrifuged at 10,000 × g for 15 min, and supernatants were dried in a Speed-Vac vacuum concentrator (Savant Instruments, Holbrook, NY). Samples were then resuspended in 40 µl of H2O, 0.1% TFA and centrifuged at 10,000 × g for 15 min. Aliquots of the supernatant (3 µl) were analyzed by high-resolution nano-LC-tandem mass spectrometry.

High resolution nano-LC-tandem mass spectrometry
Mass spectrometry analysis was performed on a Q Exactive Orbitrap mass spectrometer equipped with an EASY-Spray nano-electrospray ion source (Thermo Fisher Scientific) and coupled to a Dionex UltiMate 3000RSLC nano system (Thermo Fisher Scientific). Solvent A was 0.1% formic acid in water, and solvent B was 0.1% formic acid in acetonitrile. Peptides were loaded on a PepMap 100 C18 trapping cartridge column (300 µm × 0.5 cm, 5 µm, 100 Å) and desalted with solvent A for 3 min at a flow rate of 10 µl/min. After trapping, the eluted peptides were separated on an EASY-Spray analytical column (PepMap RSLC C18, 15 cm × 75 µm inner diameter, 3 µm, 100 Å) heated to 35°C, at a flow rate of 300 nl/min, using the following gradient: 4% B for 3 min, 4 to 22% B in 50 min, 22 to 35% B in 10 min, and 35 to 90% B in 5 min. A washing step (90% B for 5 min) and a re-equilibration step (4% B for 15 min) were always included at the end of the gradient.
Eluting peptides were analyzed on the Q Exactive mass spectrometer operating in positive polarity mode, with a capillary temperature of 280°C and a potential of 1.9 kV applied to the capillary probe. Full MS survey scans were acquired at a resolution of 70,000, with an automatic gain control (AGC) target value of 3 × 10⁶, a scan range of 375-1500 m/z, and a maximum ion injection time of 100 ms. The ion at m/z 445.12003 was used as lock mass. A data-dependent top-five method was used, in which higher-energy collisional dissociation (HCD) spectra were acquired at an MS2 resolution of 17,500 with an AGC target of 1 × 10⁵, a scan range of 200-2000 m/z, a maximum injection time of 55 ms, a 2 m/z isolation width, and a normalized collision energy of 27. Precursor ions targeted for HCD were dynamically excluded for 15 s. Full scans and Orbitrap MS/MS scans were acquired in profile mode, whereas ion trap mass spectra were acquired in centroid mode. Charge state recognition was enabled, and unassigned and singly charged precursors were excluded.
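The precursor-selection logic of a data-dependent top-five method with dynamic exclusion and charge-state filtering can be sketched as follows. This is a simplified model, not the instrument's acquisition software: the `Precursor` record, the function `select_top_n`, and the 10 ppm tolerance used to match an m/z against the exclusion list are hypothetical. Only the top-five selection, the 15 s exclusion window, and the rejection of unassigned and singly charged precursors come from the text above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Precursor:
    mz: float
    intensity: float
    charge: Optional[int]  # None means the charge state could not be assigned


def select_top_n(precursors: List[Precursor], exclusion: Dict[float, float],
                 now_s: float, top_n: int = 5, exclusion_s: float = 15.0,
                 exclusion_ppm: float = 10.0) -> List[Precursor]:
    """Pick up to top_n precursors from one survey scan for HCD fragmentation.

    `exclusion` maps an excluded m/z to the time (s) at which its exclusion
    expires; it is updated in place with the newly selected precursors.
    The ppm matching tolerance is an assumption, not a reported parameter.
    """
    # Drop exclusion entries whose window has elapsed.
    for mz in [m for m, until in exclusion.items() if until <= now_s]:
        del exclusion[mz]

    def is_excluded(mz: float) -> bool:
        return any(abs(mz - ex) / ex * 1e6 <= exclusion_ppm for ex in exclusion)

    # Keep only precursors with an assigned charge of 2+ that are not excluded.
    candidates = [p for p in precursors
                  if p.charge is not None and p.charge > 1 and not is_excluded(p.mz)]
    # Most intense first; take the top N.
    candidates.sort(key=lambda p: p.intensity, reverse=True)
    selected = candidates[:top_n]
    for p in selected:
        exclusion[p.mz] = now_s + exclusion_s  # exclude for the next 15 s
    return selected
```

Calling `select_top_n` on consecutive survey scans with the same `exclusion` dictionary shows why dynamic exclusion matters: the most abundant peptides are fragmented once, and lower-abundance precursors get sampled in the following scans until the 15 s window expires.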

MS data processing
The acquired raw files were analyzed with Proteome Discoverer 2.1 (Thermo Fisher Scientific) using the SEQUEST HT search engine. HCD MS/MS spectra were searched against the Homo sapiens UniProtKB/Swiss-Prot database (release 2015_11_11, 42,084 entries), specifying full trypsin specificity and allowing up to two missed cleavages. Mass tolerances were set to 10 ppm for precursor ions and 0.02 Da for fragment ions. Methionine oxidation (+15.995 Da) and N-terminal acetylation (+42.011 Da) were set as dynamic modifications, and cysteine carbamidomethylation (+57.021 Da) as a static modification. False discovery rates (FDRs) for peptide-spectrum matches (PSMs) were calculated and filtered with the Target Decoy PSM Validator node in Proteome Discoverer. This node assigns PSM confidences on the basis of dynamic score-based thresholds: it calculates the score thresholds required to reach the target FDRs given as its input parameters. The Target Decoy PSM Validator was run with the following settings: maximum ΔCn 0.05, strict target FDR 0.01, relaxed target FDR 0.05, and validation based on q value. The Protein FDR Validator node in Proteome Discoverer was used to classify protein identifications by q value: proteins with q < 0.01 were classified as high-confidence identifications, and proteins with q between 0.01 and 0.05 as medium-confidence identifications. Only high-confidence proteins (FDR 1%) were retained. The resulting list of CTCF-interacting proteins was finally uploaded to the Contaminant Repository for Affinity Purification (CRAPome, www.crapome.org) (63) to assess the presence of potential contaminants in the identified protein list.
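The target-decoy principle behind q-value-based filtering can be illustrated with a minimal sketch. It assumes one score per PSM where higher is better, no tied scores, and an FDR estimated as decoys/targets above a score threshold; the Proteome Discoverer validator nodes are not reproduced here, and `q_values` and `confidence` are hypothetical helpers. The 0.01 and 0.05 cut-offs are the strict and relaxed targets reported above.

```python
from typing import List, Tuple


def q_values(psms: List[Tuple[float, bool]]) -> List[float]:
    """psms: (score, is_decoy) pairs, higher score = better match.

    Returns one q value per PSM, in input order. The q value is the lowest
    estimated FDR (decoys / targets) at which that PSM would still be accepted.
    """
    order = sorted(range(len(psms)), key=lambda i: psms[i][0], reverse=True)
    fdr_at_rank = []
    targets = decoys = 0
    for i in order:  # walk down the ranked list, widening the threshold
        if psms[i][1]:
            decoys += 1
        else:
            targets += 1
        fdr_at_rank.append(decoys / max(targets, 1))

    # q value = minimum FDR over this threshold and every less stringent one.
    q_ranked = [0.0] * len(order)
    running_min = float("inf")
    for k in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdr_at_rank[k])
        q_ranked[k] = running_min

    q = [0.0] * len(psms)
    for k, i in enumerate(order):
        q[i] = q_ranked[k]
    return q


def confidence(q: float, strict: float = 0.01, relaxed: float = 0.05) -> str:
    """Map a q value onto the high/medium confidence classes used in the text."""
    if q < strict:
        return "high"
    if q <= relaxed:
        return "medium"
    return "low"
```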
The CRAPome results were not used as an exclusion criterion but as an estimate of the probability and significance of each interaction. To be included as a potential CTCF-interacting protein, a protein had to be identified in replicate injections with more than one unique peptide and be absent from the control IgG IP sample. Proteins identified by searching the MS/MS spectra against a custom common-contaminant database were also excluded.
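The inclusion rules above can be expressed as a simple filter. This sketch assumes that "more than one unique peptide" must hold in every replicate injection, and that proteins are keyed by accession in plain dictionaries and sets; the function name `ctcf_candidates` and the data layout are hypothetical. As stated in the text, CRAPome scores are not applied as a filter here.

```python
from typing import Dict, List, Set


def ctcf_candidates(replicates: List[Dict[str, int]], igg_control: Set[str],
                    contaminants: Set[str], min_unique: int = 2) -> List[str]:
    """Return accessions that pass the inclusion criteria.

    replicates: one dict per replicate injection, mapping protein accession
    to its number of unique peptides (assumed layout).
    """
    # Present in every replicate injection.
    shared = set.intersection(*(set(r) for r in replicates))
    return sorted(
        acc for acc in shared
        if all(r[acc] >= min_unique for r in replicates)  # more than one unique peptide
        and acc not in igg_control                         # absent from the IgG IP
        and acc not in contaminants                        # not a common contaminant
    )
```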

Bioinformatic analyses
The list of CTCF interactors identified by LC-MS/MS was imported into NetworkAnalyst for integrative statistical, visual, and network-based analysis of the protein data (52). The literature-curated IMEx interactome database from InnateDB (53) was selected for the protein-protein interaction analysis. The resulting zero-order network was visualized and further analyzed in Cytoscape 3.6.0 (54). Network clustering was performed with the Markov clustering algorithm implemented in the Cytoscape plug-in clusterMaker2 (55). Molecular function enrichment analysis was performed with the Cytoscape plug-in ClueGO, which generates a functionally grouped network of enriched GO/pathway terms for the identified proteins, with term grouping based on kappa statistics (56).
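Markov clustering (MCL) alternates expansion (taking a power of a column-stochastic matrix, which simulates random walks) and inflation (raising entries to a power and renormalizing columns, which strengthens strong flows and weakens weak ones) until the matrix stops changing. The pure-Python sketch below shows that loop on a small unweighted adjacency matrix. It is not clusterMaker2's implementation, and its defaults (expansion 2, inflation 2.0, self-loops of weight 1) are assumptions, since the text does not report the settings used.

```python
from typing import FrozenSet, List

Matrix = List[List[float]]


def _normalize_columns(m: Matrix) -> Matrix:
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)] for i in range(n)]


def _matmul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def markov_clustering(adjacency: List[List[float]], expansion: int = 2,
                      inflation: float = 2.0, max_iter: int = 100,
                      tol: float = 1e-9) -> List[FrozenSet[int]]:
    n = len(adjacency)
    # Add a self-loop to every node, then make the matrix column-stochastic.
    m = _normalize_columns([[float(adjacency[i][j]) + (1.0 if i == j else 0.0)
                             for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        expanded = m
        for _ in range(expansion - 1):  # expansion: M ** expansion
            expanded = _matmul(expanded, m)
        # Inflation: element-wise power followed by column normalization.
        inflated = _normalize_columns([[v ** inflation for v in row] for row in expanded])
        converged = max(abs(inflated[i][j] - m[i][j])
                        for i in range(n) for j in range(n)) < tol
        m = inflated
        if converged:
            break
    # Rows with non-zero entries are attractors; the columns they hold form a cluster.
    clusters: List[FrozenSet[int]] = []
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members and members not in clusters:
            clusters.append(members)
    return clusters
```

Raising `inflation` generally yields smaller, tighter clusters; in an interactome this tends to separate densely connected complexes (for example ribosomal proteins) from more loosely connected partners.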

ChIP-seq data analysis
The ChIP-seq data used in this study come from previous publications and are listed in Table S4 (36,37,57). The numbers of consensus peaks for CTCF, BRG1, DDX5, H3K4me3, H3K36me3, and H3K27me3 are summarized in Table S4. Where necessary, the freely available LiftOver tool (https://genome.ucsc.edu/) was used to convert genome coordinates from NCBI36/hg18 to GRCh37/hg19, and all analyses were carried out on GRCh37/hg19 coordinates. When replicate tracks were available, consensus regions for CTCF, BRG1, DDX5, H3K4me3, H3K36me3, and H3K27me3 were defined as co-localizations between replicates (i.e. overlaps with a distance of zero). The CTCF consensus peaks were used as the reference. The Bioconductor package ChIPpeakAnno was used to quantify co-localizations by computing, for each comparison, the numbers of overlapping and non-overlapping regions and the corresponding region lists (58,59). The significance of the co-localizations was assessed by a permutation test using the shuffle function of the Bioconductor package ChIPseeker (60). Co-localized regions were annotated with respect to gene positions using ChIPseeker, with parameters set to assign each region to the closest gene (by TSS) within a window of 3 kbp. Ensembl release GRCh37.p13 was used as the reference database and imported into R with the Bioconductor package biomaRt (https://bioconductor.org/packages/release/bioc/html/biomaRt.html) (64,65). Fisher's exact test as implemented in R was used to evaluate the statistical significance of associations (true odds ratio > 1 to test for over-representation and true odds ratio < 1 to test for under-representation). Statistical significance is reported as p values.
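The co-localization statistics described above can be sketched in plain Python: counting query peaks that overlap a reference peak, estimating significance by shuffling the query peaks, and a one-sided Fisher's exact test on a 2 × 2 table. This is a simplified stand-in for ChIPpeakAnno, ChIPseeker's `shuffle`, and R's `fisher.test`, not a reimplementation of them. It assumes half-open intervals on a single chromosome of known length, treats "distance zero" as sharing at least one base, and shuffles uniformly while preserving widths; all function names are hypothetical.

```python
import math
import random
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open (start, end) on one chromosome (assumption)


def count_overlaps(query: List[Interval], reference: List[Interval]) -> int:
    """Number of query intervals sharing at least one base with a reference interval."""
    return sum(any(qs < re and rs < qe for rs, re in reference) for qs, qe in query)


def shuffle_intervals(intervals: List[Interval], chrom_len: int,
                      rng: random.Random) -> List[Interval]:
    """Place each interval at a uniformly random position, preserving its width."""
    out = []
    for s, e in intervals:
        w = e - s
        start = rng.randrange(0, chrom_len - w + 1)
        out.append((start, start + w))
    return out


def permutation_pvalue(query: List[Interval], reference: List[Interval], chrom_len: int,
                       n_perm: int = 1000, seed: int = 0) -> Tuple[int, float]:
    """Observed overlap count and the empirical p value of seeing at least as many by chance."""
    rng = random.Random(seed)
    observed = count_overlaps(query, reference)
    hits = sum(count_overlaps(shuffle_intervals(query, chrom_len, rng), reference) >= observed
               for _ in range(n_perm))
    return observed, (hits + 1) / (n_perm + 1)


def fisher_one_sided(a: int, b: int, c: int, d: int, alternative: str = "greater") -> float:
    """One-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    "greater" tests odds ratio > 1 (over-representation), "less" tests odds ratio < 1.
    """
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)

    def pmf(k: int) -> float:  # hypergeometric probability of k in the top-left cell
        return math.comb(col1, k) * math.comb(n - col1, row1 - k) / math.comb(n, row1)

    ks = range(a, hi + 1) if alternative == "greater" else range(lo, a + 1)
    return sum(pmf(k) for k in ks)
```

The `(hits + 1) / (n_perm + 1)` estimate keeps the permutation p value strictly above zero, which reflects the fact that a finite number of shuffles cannot show a probability of exactly zero.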