Genome-wide Studies of CCCTC-binding Factor (CTCF) and Cohesin Provide Insight into Chromatin Structure and Regulation

Eukaryotic genomes are organized into higher order chromatin architectures by protein-mediated long-range interactions in the nucleus. CCCTC-binding factor (CTCF), a sequence-specific transcription factor, serves as a chromatin organizer in building this complex chromatin structure by linking chromosomal domains. Recent genome-wide studies mapping the binding sites of CTCF and its interacting partner, cohesin, using chromatin immunoprecipitation coupled with deep sequencing (ChIP-seq) revealed that CTCF globally co-localizes with cohesin. This partnership between CTCF and cohesin is emerging as a novel and perhaps pivotal aspect of gene regulatory mechanisms, in addition to playing a role in the organization of higher order chromatin architecture.

Genetic information in the eukaryotic genome is highly organized. Higher order chromatin organization includes interactions between DNA elements on the same or different chromosomes (1-3). Transcription factors can bridge promoters and distal cis-regulatory elements such as enhancers, insulators, and silencers, with looping out of the intervening DNA. For example, estrogen receptor-α (ER-α) mediates intra- and interchromosomal interactions across the human genome, as observed by chromatin interaction analysis with paired-end tag sequencing (ChIA-PET) (4). Similar techniques designed to investigate chromatin-interacting loci suggest the existence of spatial chromatin compartments where genes are differentially regulated based on higher order chromatin structure (5-7). Although some of these methods cannot distinguish between true interactions and mere proximity, they demonstrate that the chromosome does not exist in linear form. Significant progress has been made toward understanding local chromatin structure in terms of histone modifications, but the higher order architecture of chromosomes remains less well understood.
Recent genome-wide assays have shown that the transcription factor CTCF can link chromatin domains through long-range interactions between distal genomic regions, suggesting a crucial role for it as an architect of chromatin conformation (8). Cohesin, the protein complex that holds two sister chromatids together until they partition into daughter cells during mitosis (9), is also seen to interact with CTCF. Genome-wide analysis of cohesin binding using chromatin immunoprecipitation followed by microarrays (ChIP-chip) or deep sequencing (ChIP-seq) indicates that cohesin can also regulate gene expression in conjunction with CTCF (10). Thus, together with cohesin, CTCF is becoming recognized as an important factor involved in transcriptional gene regulation and chromatin architecture organization. Groups like the ENCODE Project Consortium are continuing to generate systematic data sets of CTCF and cohesin binding along with other assays of chromatin structure in a variety of human cells, promising to yield further insights (11).
Here, we review the diverse roles of CTCF and the possible mechanisms behind them and discuss advances in our understanding of the pivotal roles of CTCF and its partner, cohesin, in higher order chromatin architecture and transcription regulation.

Versatile Roles of CTCF
CTCF, first identified as a transcriptional repressor of c-MYC (12), is a zinc-finger protein that is ubiquitously expressed and highly conserved in vertebrates (13). A paralog of CTCF called BORIS is expressed only in testes and may be involved in resetting epigenetic marks in the male germ line (14). In conjunction with hormone receptors, CTCF mediates synergistic transcriptional repression of the chicken lysozyme gene (15) and activates the amyloid β-protein precursor gene (16). CTCF binds to insulators and prevents communication between promoters and enhancers. Its enhancer-blocking activity is seen at many insulators such as the 5′-end of the chicken β-globin locus (17), the imprinting control region (ICR) of the mouse Igf2/H19 gene (18), and the DM1 locus underlying myotonic dystrophy (19). By flanking the 3′- and 5′-boundaries of the β-globin locus in human and mouse, CTCF isolates the β-globin gene from strong neighboring enhancers, suggesting another role for it as a barrier element (17,20).
CTCF is also implicated in genomic imprinting, namely differential expression of maternally and paternally inherited genes. In somatic cells, the Igf2/H19 ICR shows differential DNA methylation according to parental origin (18,21,22). Hypermethylation of the paternally inherited ICR inhibits CTCF binding to it, which silences expression of the H19 gene and also, in conjunction with its distal enhancers, indirectly activates transcription of Igf2. In contrast, CTCF binding to the hypomethylated maternal ICR leads to activation of H19 while repressing Igf2 by isolating its promoter from its distal enhancer through looping interactions.
In mammals, CTCF is a trans-acting factor in X chromosome inactivation, which silences one X chromosome in females. This process involves counting of X chromosome number, mutually exclusive choice, and silencing by noncoding RNAs Xist and Tsix (23,24). Overexpression of CTCF can cause cell cycle arrest and apoptosis, whereas its knockdown induces proliferation and inhibits differentiation (25,26), suggesting a role for it as a tumor suppressor.

Genome-wide Mapping of CTCF-binding Sites
Many genome-wide studies have revealed tens of thousands of CTCF-binding sites and shed light on its functions (27-29). An initial study in mouse fetal liver using ChIP-chip found >200 methylation-free CTCF-binding sites, many of which functioned as insulators (30). DNase I hypersensitivity mapping showed that 10% of 3904 DNase I sites were also bound by CTCF, suggesting that a subset of CTCF sites are functional cis-regulatory elements (31). The first genome-wide analysis of CTCF binding using ChIP-chip identified 13,804 sites in human fibroblasts (32), mostly outside promoters. More than 75% of these binding sites contained the consensus CTCF-binding motif (CCGCGNGGNGGCAG) (18). A landmark ChIP-seq study by Barski et al. (33) identified 20,262 CTCF-binding sites in human CD4+ T cells. Analyzed in conjunction with a variety of histone modifications, many CTCF sites were seen to be located between active and repressive chromatin domains, suggesting that CTCF functions as a barrier preventing the spread of these domains. An analysis of 13 pluripotency-related transcription factors and CTCF in mouse embryonic stem cells (ESCs) showed that a subset of CTCF sites coincided with multiple transcription factor-binding loci, in particular with c-MYC and n-MYC, implying that CTCF may cooperate with other transcription factors at some target promoters (34).
Another genome-wide study in human CD4+ T, HeLa, and Jurkat cells showed that CTCF-binding sites were largely invariant across these cell types, but a subset showed cell-type specificity (35). This study provided additional evidence that CTCF acts as a chromatin domain barrier because it was seen to occupy the boundaries between active chromatin regions marked by H2AK5ac and repressive regions marked by H3K27me3. The ENCODE Project Consortium has now identified tens of thousands of CTCF-binding sites in a large number of human cell types, confirming on a genomic scale that CTCF binding is associated with both gene activation and repression (27). However, it is still unclear what proportion of CTCF sites function as insulators involved in enhancer blocking or function as barrier elements.
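As an illustration of how a consensus such as the one quoted above can be located in a sequence, the following is a minimal sketch, not the method used in the cited studies. It treats N as any base and scans both strands with an exact-match regular expression. The function names and the test sequence are hypothetical, and real analyses typically use position weight matrices, which tolerate mismatches, rather than exact consensus matching.

```python
import re

# Consensus quoted in the text; N is taken to mean "any base" (assumption).
CONSENSUS = "CCGCGNGGNGGCAG"

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def consensus_to_regex(consensus):
    """Compile a consensus into a regex; the lookahead reports overlapping hits."""
    pattern = consensus.replace("N", "[ACGT]")
    return re.compile("(?=(" + pattern + "))")


def find_motif_sites(sequence, consensus=CONSENSUS):
    """Return sorted (start, strand, matched_text) tuples, 0-based starts."""
    sequence = sequence.upper()
    hits = []
    for strand, motif in (("+", consensus), ("-", reverse_complement(consensus))):
        for match in consensus_to_regex(motif).finditer(sequence):
            hits.append((match.start(), strand, match.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical sequence containing one site on each strand.
    example = "AAA" + "CCGCGAGGTGGCAG" + "TTT" + "CTGCCACCTCGCGG" + "AA"
    for start, strand, text in find_motif_sites(example):
        print(start, strand, text)
```

Exact matching is deliberately strict here; it will miss the many bound sites that deviate from the consensus at one or two positions.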

How Can CTCF Perform These Diverse Functions?
It has been proposed that CTCF plays diverse roles through the ability of its 11 zinc-finger domains to recognize various sequences (36). However, recent genome-wide studies of its binding found only one consensus motif to be overrepresented in most CTCF-binding sites (27,32,35). Analysis of cohesin binding (see below) revealed many CTCF-independent cohesin sites, most of which (84%) also contained the consensus CTCF motif (37). Therefore, the CTCF motif may not be the only determinant of its binding to DNA. Given that ChIP detects indirect as well as direct interactions between a protein and its target sites on DNA, it is possible that CTCF plays diverse roles through its binding partners, which it recognizes using its zinc-finger domains. Because many transcription factors can co-localize with CTCF, it is also possible that CTCF acts as a landmark to recruit other transcription factors by constitutively binding on the genome.
Post-translational modification can also contribute to the diverse roles of CTCF. So far, phosphorylation (38,39), poly(ADP-ribosylation) (40,41), and sumoylation (42,43) have been reported. Several lines of evidence support a role for these modifications in CTCF function. A point mutation of CTCF phosphorylation sites affected the expression of its target, c-MYC (38). An insulator assay coupled with 3-aminobenzamide, an inhibitor of poly(ADP-ribose) polymerase, demonstrated that poly(ADP-ribosylation) is required for the chromatin insulator activity of CTCF (40). Although not influencing the DNA binding of CTCF, sumoylation nevertheless represses the c-MYC P2 promoter through its co-localization with the repressive Polycomb protein Pc2 (43). These examples indicate that post-translational modifications of CTCF contribute to its versatile roles.
CTCF binds to the linker regions between nucleosomes at the ICR of the imprinted Igf2/H19 and DM1 loci (19,44), and these sites are enriched in the H2A.Z histone variant. DNase I hypersensitivity studies have revealed that enhancer blocking is correlated with nucleosome depletion at insulators (45), suggesting that nucleosome positioning is an important aspect of CTCF-mediated insulator function. A genome-wide analysis of nucleosomes and DNase I-hypersensitive sites in relation to CTCF binding identified positioned nucleosomes flanking CTCF sites (46), suggesting that CTCF acts as a landmark to position nucleosomes around its binding sites.

CTCF-mediated Higher Order Chromatin Architecture
An accumulating body of evidence supports the idea that CTCF mediates long-range chromosomal interactions by looping. Chromatin conformation capture assays and fluorescence in situ hybridization indicate that CTCF mediates interchromosomal interactions by promoting the co-localization of the Igf2/H19 locus with the Wsb/Nf1 locus on a different chromosome into a "transcription factory." Only one allele of each locus is included in such interactions, and it is possible that such interchromosomal interactions mediated by CTCF are a common feature of higher order chromatin structure (47).
An integrative analysis combining data from Hi-C assays, which use proximity ligation to identify long-range chromosomal interactions (7), with genome-wide data on CTCF binding supports the idea that long-range intra- and interchromosomal interactions could be mediated by CTCF (48). Further support comes from the mapping of CTCF-mediated chromatin interaction sites using ChIA-PET, combining ChIP, proximity ligation, and deep sequencing. This study identified 1480 cis- and 336 trans-interacting sites in mouse ESCs (8) and suggested that CTCF organizes the genome into discrete regions enriched with distinct epigenetic marks.
CTCF-mediated interactions may be particularly important in immune cells. CTCF has a crucial role in regulating SATB1 (special AT-rich binding protein 1)-dependent Th2 cytokine expression (49). SATB1 is also a chromatin organizer in thymocytes (50) and organizes loop structures in the MHC class I locus by attaching chromatin to the nuclear matrix, leading to alteration of gene expression (51). There is as yet no evidence for a direct interaction between CTCF and SATB1, but because CTCF can also interact with the nuclear matrix, it is possible that CTCF interacts with SATB1 either directly or indirectly to construct the appropriate higher order chromatin structure (52,53). Genome-wide comparison between CTCF- and SATB1-binding sites may illuminate the relationship between these chromatin organizers. Recent work has shown that CTCF, possibly in conjunction with cohesin, mediates looping interactions important for Ig locus contraction and V(D)J recombination (54-56).
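The distinction between cis (same chromosome) and trans (different chromosome) interactions in proximity-ligation data can be sketched as follows. This is only an illustration of the bookkeeping, assuming each interaction is a pair of anchors written as (chromosome, start, end) tuples. The function names and coordinates are hypothetical and are not taken from the cited study.

```python
from collections import Counter


def classify_interaction(anchor_a, anchor_b):
    """Label a pair of anchors 'cis' if both lie on the same chromosome, else 'trans'."""
    chrom_a, chrom_b = anchor_a[0], anchor_b[0]
    return "cis" if chrom_a == chrom_b else "trans"


def count_interactions(pairs):
    """Count cis and trans interactions in an iterable of (anchor_a, anchor_b) pairs."""
    return Counter(classify_interaction(a, b) for a, b in pairs)


if __name__ == "__main__":
    # Hypothetical anchor pairs: (chromosome, start, end).
    pairs = [
        (("chr7", 1_000, 1_500), ("chr7", 90_000, 90_400)),
        (("chr7", 1_000, 1_500), ("chr11", 5_000, 5_600)),
        (("chr2", 300, 800), ("chr2", 40_000, 40_900)),
    ]
    counts = count_interactions(pairs)
    print(counts["cis"], counts["trans"])
```

In practice, such counts are taken only after filtering out ligation noise, a step this sketch omits.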

How Can CTCF Manage Chromatin Organization?
The association of CTCF with matrix-associated regions in the genome allows chromatin to attach to the nuclear matrix and form functionally and structurally distinct domains (52). Such topologically independent loops may result in gene silencing, but most insulator elements may not necessarily be attached to the nuclear matrix. Attachment of chromosomal domains to the nuclear lamina (NL) can also contribute to the higher order structure of chromatin (57,58). In human fibroblasts, lamina-associated domains (LADs) separate the genome into large discrete domains that interact with the NL. These LADs contain repressive chromatin modifications, and their borders are demarcated by CTCF (52,59). It is conceivable that CTCF prevents chromatin from anchoring onto the NL at the boundary of LADs by masking interacting sites of chromatin and NL components.
Many CTCF binding partners have been reported, which could aid in the diverse functions of CTCF. They include DNA-binding proteins (YB1, YY1, RFX, CIITA, and Kaiso), chromatin proteins (H2A.Z, SIN3A, CHD8, TAF-1/Set, Suz12, and cohesin), and others (lamin, RNA polymerase II, importins, CP190, topoisomerase II, and nucleophosmin) (60,61). CTCF interacts with a nuclear matrix phosphoprotein, nucleophosmin, which is localized in the periphery of the nucleolus (62). Co-localization of CTCF and nucleophosmin onto the β-globin insulator resulted in localization of the insulator to the peripheral region of the nucleolus (62). Based on such observations, it is plausible that some CTCF binding partners may be important for organizing chromatin structure by localizing chromatin to specific areas in the nucleus in combination with CTCF.

Cohesin Binds in Association with CTCF and Is Involved in Insulator Function
A spate of genome-wide binding studies have shown co-localization of CTCF and cohesin at their chromosomal sites, suggesting a functional connection (63-66). The cohesin complex includes SMC1, SMC3, SCC3, and the α-kleisin SCC1, also known as MCD1 or RAD21. SMC1 and SMC3 dimerize at one end of their linear, folded-back, coiled-coil structures. SCC1 links the nucleotide-binding domains at the other end of SMC1/3 to form a closed V-shaped ring and also recruits SCC3 to the complex (9). The two sister chromatids are encircled by cohesin rings from S phase to G2, preventing their premature segregation until anaphase of mitosis (67,68). Cohesin expression is not limited to dividing cells, suggesting additional cellular roles for it. Furthermore, mutations in cohesin underlie diseases such as Cornelia de Lange syndrome (69,70) and developmental defects (71). Because most of these disease-linked cohesin mutations do not completely abrogate cell division, these disorders may be triggered by malfunction of noncanonical cohesin roles such as altered gene expression rather than loss of chromatid cohesion (66).
In yeast, cohesin is localized to centromeres, pericentric domains, intergenic regions, and ORFs, without a consensus sequence (72). Unlike yeast, mammalian cohesin sites contain consensus sequences similar to the CTCF motif (63,66). SCC3 associates with CTCF at the c-MYC insulator and also interacts with the ICR of the imprinted Igf2/H19 locus in an allele-specific manner (64). Another genome-wide ChIP-chip analysis identified 8811 cohesin (SCC1) sites and 13,894 CTCF sites in the human genome (66). Like CTCF, the largest proportion of cohesin sites (49%) were located in intergenic regions, and 89% of SCC1 sites overlapped with those of CTCF. An insulator assay revealed that cohesin is essential for CTCF insulator function at the Igf2/H19 imprinting locus, although it is dispensable for CTCF binding on the genome (66). In agreement with these studies, an independent mapping of SCC1/RAD21 sites in the human and mouse genomes also revealed overlap between their binding sites and demonstrated a dependence on CTCF for the recruitment of the cohesin subunits SCC1 and SMC3 to specific CTCF sites (63). However, CTCF was not required for the sister chromatid cohesion functions of cohesin. Other studies mapping different cohesin subunits such as SMC1A and SMC3 in mouse ESCs showed similar results (73). These studies also revealed many CTCF/cohesin sites at promoters, suggesting a joint regulatory role. Although most cohesin sites overlap with CTCF, a significant proportion of each factor's sites are independent of the other, implying CTCF-independent functions of cohesin as well as cohesin-independent CTCF roles.
Several studies have demonstrated cohesin-mediated long-range looping (37,74-78). Although CTCF-binding sites are highly conserved across cell types (27,35), loop formation at these CTCF sites occurs in a cell type-specific manner (79). If such loops are mediated by cohesin, it remains unclear how cell type-specific looping is achieved. Moreover, CTCF functions as an enhancer blocker as well as a barrier preventing the spread of heterochromatin. The enhancer-blocking function depends on looping, and cohesin plays an important role in CTCF enhancer-blocking activity. However, it is not clear whether cohesin functions with CTCF to construct chromatin barriers because barrier activity at the CTCF sites of the human and mouse homeobox gene A (HOXA) loci appears to be independent of cohesin (80), suggesting that distinct functions of CTCF may or may not depend on cohesin.
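Overlap statistics of the kind reported above (the fraction of one factor's sites that coincide with another's) can be computed from two lists of genomic intervals. The sketch below is one simple way to do so, not the procedure of the cited studies. It assumes sites are (chromosome, start, end) tuples with half-open coordinates and counts any shared base as an overlap. All names and coordinates are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


def build_index(sites):
    """Index sites per chromosome as sorted starts plus a running maximum of ends."""
    by_chrom = defaultdict(list)
    for chrom, start, end in sites:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        starts = [start for start, _ in intervals]
        max_ends = []
        running = float("-inf")
        for _, end in intervals:
            running = max(running, end)
            max_ends.append(running)
        index[chrom] = (starts, max_ends)
    return index


def overlaps_any(index, chrom, start, end):
    """True if [start, end) shares at least one base with any indexed site."""
    if chrom not in index:
        return False
    starts, max_ends = index[chrom]
    # Number of indexed sites that begin before this query ends.
    i = bisect_left(starts, end)
    # One of them overlaps if the furthest-reaching end passes the query start.
    return i > 0 and max_ends[i - 1] > start


def fraction_overlapping(query_sites, reference_sites):
    """Fraction of query sites overlapping at least one reference site."""
    if not query_sites:
        return 0.0
    index = build_index(reference_sites)
    hits = sum(overlaps_any(index, c, s, e) for c, s, e in query_sites)
    return hits / len(query_sites)


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    cohesin = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80), ("chr3", 10, 20)]
    ctcf = [("chr1", 150, 250), ("chr1", 600, 700), ("chr2", 0, 1000)]
    print(fraction_overlapping(cohesin, ctcf))  # 0.5
```

The measure is asymmetric: the fraction of cohesin sites overlapping CTCF generally differs from the fraction of CTCF sites overlapping cohesin, which is why the direction of the comparison matters when interpreting such percentages.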

Cohesin Regulates Gene Expression with and without CTCF
A CTCF-independent role for cohesin in gene regulation has been shown in multiple cell types (78). Without any CTCF co-binding, cohesin was functionally co-localized with ER-α in human MCF-7 breast cancer cells and with liver-specific transcription factors in hepatic cells. An integrative analysis of cohesin binding in MCF-7 cells with chromosomal interaction data from ChIA-PET (4) indicated that cohesin was enriched in ER-α-bound regions that were involved in intrachromosomal loops. Cohesin could thus mediate long-range looping in a CTCF-independent manner. SCC1 co-localizes with pluripotency-related factors, including KLF4, OCT4, SOX2, ESRRB, and NANOG, in a CTCF-independent manner, suggesting that cohesin also plays a role in ESC identity (37).
Direct regulation of oncogenic c-MYC by cohesin has been reported in human cells and model organisms (64,65,81-84). Expression profiling of disease/mutant cohesin cell lines identified several hundred dysregulated genes (82). Many cohesin sites were enriched in the promoters of dysregulated genes, suggesting direct regulation by cohesin. Some studies have shown significant overlap between genes dysregulated in cohesin mutant cell lines and genes dysregulated in mutants of NIPBL (SCC2), a cohesin-loading factor (81,82). Accordingly, depletion of cohesin and NIPBL results in similar dysregulation of cohesin target genes (73). These mutants do not show significant defects in chromosome segregation. Thus, it is possible that NIPBL affects gene expression by regulating cohesin loading at regulatory sites.
In one recent example of cohesin-mediated looping in gene regulation, cohesin interacted with the mediator complex, which is a transcriptional coactivator. NIPBL binds to the cohesin-mediator complex and loads cohesin onto promoters, resulting in DNA looping between enhancers and promoters of key pluripotency genes such as OCT4 and NANOG (73). In another example, TAF3, a core promoter factor, localized to a subset of CTCF/cohesin sites and regulated genes by long-range looping, which is indispensable for endoderm lineage differentiation and prevention of premature differentiation of neuroectoderm and mesoderm in mouse ESCs (85). These examples suggest that looping often involves cohesin, but the positioning of the loops and the location of cohesin depend on other factors such as mediator or CTCF.

CTCF and RNA Polymerase II
CTCF can interact with the initiation and elongation forms of RNA polymerase II (RNAPII) in vitro, but in vivo, it displays a preference for the initiation form (86). It has been observed that RNAPII tends to stall at CTCF/cohesin-binding sites (87). It is possible that a CTCF-cohesin complex constrains RNAPII from proceeding along its DNA template. In contrast, a single CTCF-binding site cloned into a luciferase assay vector exhibits transcriptional activity (60). Thus, CTCF may help recruit RNAPII to promoters for initiation but could also mediate promoter-proximal pausing of RNAPII, which poises it for swift expression of developmental or stress-responsive genes (88).
Because splicing occurs co-transcriptionally, pausing or slowing down of RNAPII can affect splicing in addition to elongation and overall transcript levels. CTCF also appears to be involved in RNAPII pausing-dependent splicing. At the CD45 gene, a model for alternative splicing, as well as other genes genome-wide, CTCF binding causes RNAPII pausing and promotes the incorporation of "weak" upstream exons into spliced transcripts. Just as in the case of the Igf2/H19 ICR, CTCF binding to these pause sites is inhibited by DNA methylation, raising the possibility that this epigenetic mark affects splicing by a CTCF-mediated process (89). Interestingly, an independent study showed that cohesin is also recruited to genes where RNAPII pausing occurs (90). Although pausing was shown to be independent of cohesin binding, it is not clear to what extent, or how, cohesin and CTCF work together in RNAPII pausing-related functions.

Model for the Role of CTCF, Cohesin, and Other Factors in Chromatin Architecture
The diverse chromosomal and cellular contexts in which CTCF, cohesin, and other factors function, particularly with regard to long-range looping interactions, are unlikely to have a single underlying mechanism. However, some unifying principles suggest themselves (Fig. 1).
First, an attractive explanation for the role of cohesin is that a cohesin ring could encircle two DNA segments from distant loci on the same chromosome to form an intramolecular loop. Although this notion has been proposed before (9,68,91), there is as yet no direct molecular evidence for such structures. Second, in this view of looping interactions, the important mechanical stabilizing influence that maintains loops is provided by cohesin rather than other factors like CTCF that are also part of this structure. However, because the cohesin ring lacks DNA-binding specificity and could slide along DNA (92), factors like CTCF, SATB1, or others could serve critical roles as positioning factors, specifying where cohesin locates on the chromosome to generate loops. These positioning factors interact with specific DNA loci, as well as with cohesin, directly or indirectly, to target the placement of loops and lock them in place once formed.
Third, different combinations of factors could be required under different circumstances (Fig. 1). Loops in stable structural features such as chromosome scaffolding interactions, matrix-associated regions, or distinct heterochromatin or euchromatin domains could be somewhat invariant across cell types, and such "permanent" loops are more likely to require CTCF as a positioning factor and cohesin as the clasp. Other situations in which the long-term structural integrity and topology of the loops are important, such as in imprinting and Ig locus contraction, are also likely to require cohesin in conjunction with CTCF as the preferred positioning partner. However, some functions of CTCF, such as its barrier function preventing the spread of heterochromatin, could conceivably be achieved just by its strong binding, without the requirement for topological loops, and therefore do not have a requirement for cohesin (78). Interactions between proteins binding at distant loci, such as enhancers and promoters, could form transient loops depending on cell type and physiological state, and they may require cohesin, although perhaps at a subset of these "temporary" regulatory loops. Such regulatory loops could use the ubiquitous CTCF as a positioning factor, but they are more likely to involve alternative positioning factors, such as in the case of ER and mediator in cell type-specific regulation (73,78).
Finally, cohesin-loading factors like NIPBL/SCC2 could be important not just for the act of cohesin loading but also to ensure that it occurs at proper locations, in conjunction with the appropriate positioning factors. For sister chromatid cohesion, cohesin loading along chromosomes is undoubtedly critical, but perhaps not its location. In contrast, intramolecular looping depends on proper positioning, and it is possible that NIPBL and other mutations seen in Cornelia de Lange syndrome disrupt the location of the intramolecular cohesin clasps and thus have pathological effects in a cell- and tissue-specific manner while largely sparing the chromatid cohesion function.
Although the ideas discussed above can account for many of the current observations, many questions arise. During cell division, cohesin is deposited on individual DNA chromatids before S phase, and DNA replication creates the two catenated chromatids within a cohesin ring. Is it possible for a cohesin ring to be loaded across two non-homologous chromosomal segments to form an intramolecular loop? In yeast, intramolecular cohesin rings can occur at pericentric heterochromatin, so this may not be an insurmountable problem (93). Although much evidence supports the so-called single-ring model for cohesin in chromatid cohesion, alternative models involving two rings are also possible (9,67). Is it possible that intramolecular loops involve an alternative form of the ring? Once formed, are cohesin rings at the base of intramolecular loops permanent, or can they be removed when necessary? If the latter, how might this occur? Dissociation of cohesin rings around sister chromatids occurs due to proteolytic cleavage of SCC1/RAD21 by separase at the start of anaphase, so this is a highly cell cycle-regulated process. It is unclear whether this mode of cohesin ring dissolution or an alternative mechanism involving ATP hydrolysis or removal by sliding occurs at intramolecular loops.

Conclusions
Eukaryotic genomes are compartmentalized into functional domains that can dynamically reorganize depending on cellular demands. The spatiotemporal architecture of chromatin can establish and maintain active, poised, or repressive chromatin states with regard to gene expression. Genome-wide mapping of CTCF- and cohesin-binding sites has advanced our understanding of the roles of these key proteins in establishing and maintaining chromatin architecture and gene regulation. These studies indicate that CTCF- and cohesin-mediated chromatin organization is prevalent in the genome and is pivotal for malleability in pluripotency, development, and cell identity.
More work remains, however, to determine the molecular details of how cohesin works with CTCF and other factors to organize topological looping and establish regulatory interactions between distant regions of the genome. It also remains to be seen how long-range chromatin-organizing factors such as CTCF and cohesin might cooperate with locally acting chromatin-remodeling factors and chromatin-modifying enzymes to generate distinct transcriptional environments. How this long-range and higher order epigenetic architecture of chromosomes is stably maintained or altered during cell division and development is largely unknown. Continued investigation of the genomic binding locations of these chromatin factors in different cellular contexts coupled with mechanistic studies will be required to shed light on these questions.