Biological Implications and Regulatory Mechanisms of Long-range Chromosomal Interactions

Development of high-throughput sequencing-based methods has enabled us to examine nuclear architecture at unprecedented resolution, allowing further examination of the function of long-range chromosomal interactions. Here, we review methods used to investigate novel long-range chromosomal interactions and genome-wide organization of chromatin. We further discuss transcriptional activation and silencing in relation to organization and positioning of gene loci and regulation of chromatin organization through protein complexes and noncoding RNAs.

The nucleus is a three-dimensional structure that is often simplified to a one-dimensional linear structure, which overlooks the importance of nuclear organization in biological functions. Early microscopy studies on nuclear architecture focused on general organizing principles such as chromosome positioning, which led to the hypothesis that each chromosome remains within its own territory during interphase (1,2). However, it is now recognized that chromosome territories frequently intermingle and that this intermingling correlates with genome function and stability (3,4). The advent of chromosome conformation capture (3C)-based methods has transformed our understanding of long-range chromosomal interactions, revealing through sequencing analysis a highly complex landscape consisting of specific localization patterns that correlate with transcriptional activation, repression, translocation, and other biological events. Together with traditional imaging-based methods (5), these newer technologies are starting to uncover more general principles of chromosomal organization among a variety of cell types (6-8).
In this minireview, we will first examine recent technologies used to define long-range interactions. We will then discuss the functional implications of these long-range interactions and the different factors involved in organizing long-range interactions in the nucleus. Our focus will be mainly on chromatin conformation. Many other excellent reviews focusing on global nuclear organization relating to chromosome positioning (9) and nuclear bodies (10) are available for interested readers.

Studying Long-range Chromosomal Interactions with FISH- and 3C-based Methods
Nuclear organization was initially explored using imaging analysis such as FISH-based methods (11). Multicolor FISH provided direct evidence for chromosome territories (5). FISH was also used in several studies to correlate transcriptional activity with the position of genes within a chromosome territory. Actively transcribed genes at the HoxB locus were more likely to be seen looped out of their chromosome territory (12). Positioning of genes at the periphery of a chromosome territory reflected a repressed transcriptional state (13). Genes located far from one another can often aggregate and colocalize when activated (14). Interestingly, FISH studies have also characterized in trans long-range interactions as rare events, occurring in 5-15% of cells (15-20). However, FISH requires fixation of cells, leading to two interpretations for these data (21). It may be that long-range interactions are so transient that, at any given time, we can observe only a small percentage of these interactions or that long-range interactions are stable but present in only a subset of cells. The latter interpretation is supported by the finding that transactivation, measured by accumulation of cytoplasmic mRNA in interchromosomal interactions, occurs only in a small percentage of cells (22). Although further study is required, both interpretations imply that interactions are heterogeneous across a cell population.
Recently, new approaches have been developed based mostly on 3C technology, introduced by Dekker et al. (23) to quantify interactions between distant DNA regions (Fig. 1). In the original protocol, cells are fixed with formaldehyde, and chromatin is digested with either "six cutter" restriction enzymes such as HindIII, BamHI, and EcoRI or "four cutter" enzymes such as DpnII, MboI, and Csp6I. Then, to ensure that only DNA fragments within the same DNA-protein complex will ligate to each other, the fragments are diluted to favor in cis ligation. Ligation products are identified by PCR using primers targeting sequences flanking the restriction enzyme cutting sites. Usually, various primer combinations targeting every restriction site in a specific region are used to construct a matrix of ligation efficiency, revealing interaction frequencies within that region. The traditional 3C approach captures "one-to-one" interactions, in which the gene of interest is tested against candidate interacting partner loci; however, in many cases, there is no prior indication of where these potential interacting sites are.
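The matrix of ligation efficiency described above can be illustrated with a minimal sketch. The function name, the input format, and the signal values are hypothetical and chosen only for illustration; real 3C quantification also normalizes against control templates and primer efficiencies, which this sketch omits.

```python
def ligation_matrix(n_sites, junction_signal):
    """Build a symmetric matrix of relative ligation frequencies.

    n_sites: number of restriction sites assayed in the region.
    junction_signal: dict mapping a site pair (i, j) to a PCR-derived
        ligation signal (hypothetical, arbitrary units).
    """
    # Self-ligation products carry no long-range information, so drop them.
    pairs = {k: v for k, v in junction_signal.items() if k[0] != k[1]}
    total = sum(pairs.values())
    matrix = [[0.0] * n_sites for _ in range(n_sites)]
    if total == 0:
        return matrix
    for (i, j), signal in pairs.items():
        # Express each junction as a fraction of all informative junctions
        # and mirror it, because ligation of i to j equals ligation of j to i.
        matrix[i][j] += signal / total
        matrix[j][i] += signal / total
    return matrix


# Hypothetical signals for four restriction sites in one region.
signal = {(0, 1): 40, (0, 3): 10, (1, 2): 30, (2, 3): 20}
matrix = ligation_matrix(4, signal)
for row in matrix:
    print(["%.2f" % value for value in row])
```

In this toy input, sites 0 and 3 ligate less often than adjacent sites, which is the kind of pattern a 3C matrix is read for: an elevated frequency between distant sites would suggest a loop.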
Building on the 3C procedure, chromosome conformation capture-on-chip (4C) defines a bait region to discover novel interacting regions (15,16). A variety of 4C strategies have been developed, but generally, the fixation and digestion steps are similar to those of 3C analysis, except that circular DNA molecules are formed by ligation (Fig. 1). Inverse PCR primers are designed to amplify all unknown sequences ligated to the bait region. The amplification product is then subjected to microarray or next-generation sequencing. As an alternative to enzyme digestion, sonication can also be adapted to 4C to avoid the systematic bias of enzyme cutting (24,25).
To look at all contact frequencies between any two points within a single large genomic region, chromosome conformation capture carbon copy (5C) was designed to generate a matrix of interaction frequencies (7,26) using oligonucleotide pairs matched to every ligation site between interacting 3C fragments. After amplification, readout of the junctions using microarray or next-generation sequencing can generate a three-dimensional organization map of a large genomic region at high resolution. 5C is powerful because it can theoretically measure the interaction efficiency between any two digestion sites; however, a very large number of oligonucleotides is needed to evaluate the conformation of a whole chromosome or an entire genome, so the cost of synthesizing so many primers prevents this technology from being applied to genome-wide studies.
A similar but more powerful method, HiC, can generate an "all-to-all" genome-wide interaction frequency matrix (6,7). The fixation and digestion steps are similar to those in the basic 3C protocol, but after digestion, the restriction ends are filled in with biotin-labeled nucleotides, which after blunt end ligation can be pulled down for high-throughput sequencing. This removes the need to design specific oligonucleotide pairs (7) and increases the resolution to ~1 Mb based on 10 million paired-end reads (7). However, increasing HiC resolution is difficult because a 10-fold increase in resolution requires a 100-fold increase in sequencing depth (27). Because HiC is able to resolve only at the Mb level, correlation with specific genes or epigenetic marks remains unrealistic. Only in organisms with smaller genomes such as yeast can kb level resolution be reached (28). Nevertheless, HiC remains a powerful tool for revealing chromosome territories and genome compartmentalization.
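The quadratic cost of HiC resolution can be made concrete with a short calculation. The sketch below assumes that the number of reads per pair of genomic bins is held constant, so the required depth grows with the square of the number of bins; the function name is hypothetical, and the reference point of 10 million reads at 1 Mb is taken from the figures quoted above.

```python
def reads_needed(target_bin_bp, ref_bin_bp=1_000_000, ref_reads=10_000_000):
    """Estimate the read pairs needed to reach a target bin size.

    Assumes constant coverage per bin pair: shrinking bins by a factor f
    multiplies the number of bin pairs, and hence the reads, by f squared.
    """
    fold = ref_bin_bp / target_bin_bp
    return int(round(ref_reads * fold ** 2))


for bin_size in (1_000_000, 100_000, 10_000):
    print(f"{bin_size:>9} bp bins: {reads_needed(bin_size):,} read pairs")
```

Under this assumption, moving from 1 Mb to 100 kb bins raises the requirement from 10 million to 1 billion read pairs, which illustrates why gene-level resolution was out of reach for large genomes at the sequencing depths discussed here.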
To specifically screen "point-to-point" interactions, another strategy called ChIA-PET was developed combining ChIP with 3C to discover loops bound by particular proteins (29). In the initial study, Fullwood et al. reported thousands of intrachromosomal contacts between estrogen receptor α-binding sites. Recently, this technique was extended to delineate looping activity between CCCTC-binding factor (CTCF) and RNA polymerase II (30,31). ChIA-PET is different from most other 3C-based protocols (Fig. 1). First, sonication is used instead of enzyme digestion, which rules out the digestion inefficiency at different restriction enzyme cutting sites. Second, an immunoprecipitation step is added after ligation. The interactions detected by ChIA-PET can be validated by traditional 3C and 4C (29), suggesting that ChIA-PET is robust and reproducible.
These 3C-based techniques are powerful but limited because they can capture only the average chromatin contact frequency in all cells. Whether these contact frequencies are significant will require validation in individual cells through FISH, which remains the only way to detect organization at the single cell level so far.
FIGURE 1. There are many methods derived from the original 3C design. Here, we present a few popular methods. In brief, cells are cross-linked, and chromatin is digested by restriction enzymes or sonicated. The structures of protein complexes containing DNA are preserved. These complexes are then diluted to a very low concentration, and ligation reactions are performed. Different amplification strategies are used to measure the relative cross-linking efficiency between loci. 3C is used to detect one specific interaction. 4C detects all possible interacting regions of one given locus. 5C and HiC provide "many-to-many" interacting efficiencies in a large genomic region or the whole genome. ChIA-PET includes immunoprecipitation to specifically examine the long-range interactions associated with a specific protein.

Long-range Interactions and Transcription
The classic explanation of transcription imagines a one-dimensional process in which RNA polymerase slides through a region to be transcribed (Fig. 2, A and B), but transcriptional mechanisms in vivo are likely very different. Early immunolabeling experiments of nascent RNA transcripts identified uneven distribution of transcripts in nuclei. These discrete foci were found to be sensitive to transcriptional inhibition (32). Even at saturation, the number of foci observed in such labeling studies ranged from hundreds to several thousand per nucleus (14,33,34), but there were far fewer foci than active transcription units observed (33,35). Thus, regions in which several transcription units shared the same foci came to be called "transcription factories" (36).
Immunogold and immunofluorescent detection of RNA polymerase II (RNAPII) coupled with nascent RNA revealed that many RNAPII foci and nascent RNA foci overlap (37). However, not all RNAPII sites are positive for nascent RNA labeling, indicating that not all RNAPII-enriched foci are equally active. RNAPII phosphorylated at Ser-2 and Ser-5 labels active transcription factories, whereas RNAPII foci phosphorylated only at Ser-5 are transcription poised sites (13,34). Recent studies showed that when transcription initiation is inhibited by heat shock, RNAPII remains at the transcription foci 30 min after active genes have moved away (38), suggesting that transcription factories are not self-assembled RNAPII proteins brought together by transcription but are relatively stable subnuclear compartments.
A key feature of the transcription factory model is that disparate transcription units can share the same factory at a given frequency, a concept initially referred to as "aggregation neighboring active genes" (39). Subsequent studies revealed that "neighboring genes" up to 40 Mb apart can share the same factories at higher-than-expected frequencies, as confirmed by both RNA FISH/RNAPII Ser-5 colocalization and 3C assays (14). Colocalization of active loci is not limited to genes on the same chromosome but can be expanded to the whole genome, although trans colocalization occurs with reduced frequency (40). Studies comparing transcription of the α-globin (Hba) and β-globin (Hbb) "super genes," which nascent RNA FISH analysis shows to be constitutively expressed in all erythroid cells (14), with that of other active genes suggested that whereas super genes constantly occupy transcription factories, other active genes move into these factories transiently, only when transcription is needed. Supporting this "burst" model, kinetic analysis of immediate-early genes such as Fos and Myc indicated that induction as short as 5 min is sufficient for Fos relocation from outside of transcription factories into the constitutively transcribed Igh locus (41). Although the concept of transcription factories has become popular in recent years, debate continues on whether transcription is the direct and rapidly acting force that drives loci to transcription factories. One study showed that inhibition of transcription elongation for 5 h does not perturb colocalization (42). The interpretation may be that although correlated with transcription, colocalization is independent of transcriptional bursts. Other examples argue that colocalization of active genes may be mediated through nuclear speckles enriched in splicing factor SC35 (43). However, this may be an experimental artifact because active RNAPII redistributes with nuclear speckles in protocols utilizing chromatin fixation (44).
If active alleles "gather" to share basic transcriptional machinery, one would expect that active alleles in the same foci would behave similarly, but the evidence suggests otherwise (40). Specialized transcription factories seem to be the organizers of coregulated genes (45). As shown by an immuno-RNA FISH tracking experiment, a minichromosome containing active genes introduced into cells localized with endogenous genes within a subset of transcription factories depending on the promoters of the active genes, suggesting that coregulated genes share the same factory (46). Schoenfelder et al. (40) provided more direct evidence of in vivo clustering of active genes; when a human HBB allele was introduced into a mouse genome, the integrated HBB preferentially localized with the endogenous mouse Hbb locus compared with the mouse Hba locus. When an enhanced 4C assay was applied to screen loci occupied by active RNAPII and interacting with Hbb or Hba, hundreds of overlapping loci were observed that contained both in cis and in trans transcription partners. These colocalization events were regulated by Klf1, which, together with RNAPII, formed these specialized transcription factories. Similarly, NFκB factories transcribing microRNAs were observed during TNFα induction (47).
In summary, a functionally organized nucleus consists of specialized transcription factories containing active RNAPII, specific transcription factors, and various genes active or poised for transcription. Organization of the nucleus' intra- and interchromosomal interactions ensures compartmentalization of nonrandom transcription. These conformations may be tissue-specific and dynamically regulated in particular biological events.

Nuclear Lamins and Silenced Chromosome Regions
Nuclear lamins are type V intermediate filament proteins that are located between the inner nuclear membrane and peripheral heterochromatin (48). They are involved in various functions, including regulation of nuclear envelope shape (49), formation of the mitotic spindle (50), DNA replication (51), and transcription (52). More recently, lamins were shown to associate with transcriptionally inactive chromatin in humans and Drosophila (53,54). These large lamin-interacting domains (LADs) span >1 Mb and are mostly heterochromatic. The location of these LADs depends partly on their interaction with lamins, although definitive evidence is still lacking. For example, in lamin B null (LMNB−/−) mouse embryonic fibroblasts, despite having lower gene density, chromosome 18 is positioned away from the nuclear periphery (55,56). Apart from LADs, centromeres and telomeres also interact with lamins. Centromeres are usually anchored near the nuclear periphery by B-type lamins (57,58). Similarly, the distribution and stability of telomeres are influenced by lamins (59).
Although the nuclear lamina associates with largely repressed regions, whether it causes transcriptional silencing remains an unanswered question. Transcriptional activity at the nuclear lamina has been shown to be similar to that within internal regions of the nucleus (52). Moreover, when chromosome regions relocate to the nuclear periphery through tethering to the inner nuclear membrane, only a fraction of the genes show reduced expression (60). Additionally, genes can relocate to other organizing structures such as pericentromeric heterochromatin loci and be silenced (61,62). Further investigation will clarify what role nuclear lamins play in transcriptional silencing.

Potential Role of Long-range Interactions in Chromosomal Translocation
Long-range interactions may be involved in creating chromosomal translocations. As a hallmark of cancer, specific chromosome regions frequently become translocated (63) through DNA single- or double-strand breaks (DSBs) (64,65). In many cases, this results in fusion proteins that confer selective advantage to cells harboring the translocations, contributing to cell transformation. Particular chromosomal translocations seem to recur due to the nonrandom compartmentalization of the genome. Interestingly, translocations are often associated with active transcription (66), which may correlate certain translocation patterns with transcription factory colocalization (67). As mentioned previously, although Myc and Igh are located on different chromosomes, they share the same transcription factory, with 25% of Myc alleles colocalizing with the Igh locus after Fos induction. This preferential colocalization may lead to more frequent chromosomal translocation as seen in Burkitt lymphoma (41). Recently, two studies assessing genome-wide translocation compared translocation sites using 4C- or HiC-based sequencing (68,69). To experimentally manipulate translocation frequency, both studies applied a knock-in or integrated I-SceI locus, which can be cut when the restriction enzyme is introduced into the cells to cause artificial DSBs. Using high-throughput genome-wide translocation sequencing or translocation-capture sequencing, numerous translocations were identified after introducing DSBs at various loci. When comparing translocation frequency with spatial proximity, the studies reached complementary conclusions. Zhang et al. (69) utilized HiC to investigate translocation frequencies of various integrated I-SceI loci and found that both in cis and in trans translocation correlated with spatial proximity. Using 4C on a more defined system, Hakim et al. (68) found that whereas Aid−/− (Aid encodes an enzyme that creates DNA breaks) translocation correlated with nuclear contact frequency, Aid-dependent translocations, which corresponded to recurrent translocations, did not correlate with nuclear contacts. Instead, it was DNA damage itself detected through AID activity in specific chromatin regions that correlated with recurrent translocations (68). Therefore, both studies concluded that although most types of translocation are associated with contact frequency, some are not.

Regulatory Mechanisms in Nuclear Organization
As discussed, the nucleus is highly organized and exhibits nonrandom conformations. Many powerful tools such as 3C-based methods and FISH have helped model nuclear organization in detail; however, there is still no efficient way to systematically identify the proteins that organize these interactions. Nonetheless, we discuss selected examples of protein and RNA factors that have been shown to influence chromatin conformation and may contribute to the organization of nuclear architecture.

Control of the β-Globin Locus by Transcription Factors EKLF, GATA-1, and FOG-1
The β-globin locus is one of the most extensively studied examples of long-range interaction. The locus control region (LCR), located 25 kb from the closest gene, regulates a set of genes encoding variants of the β-chain of hemoglobin as far as 80 kb away. As determined by 3C analysis, LCR loops out genes to interact with the γ-globin loci in fetal cells and switches to regulating the β-globin loci in adult cells (70). LCR contains binding sites for transcription factors EKLF (Klf1) and GATA-1, both essential regulators of β-globin expression (71). EKLF knock-out mice show severely reduced β-globin expression and loss of long-range interactions between LCR and β-globin (72). Induction of GATA-1 fused with the estrogen receptor in a GATA-1−/− background promotes long-range interactions and GATA-1 occupancy of LCR/β-globin loci. Interaction of its cofactor FOG-1 with GATA-1 is also essential to mediate looping (73). More recently, enhanced 4C analysis of genome-wide long-range interactions involving β-globin loci showed that many of these loci colocalize with EKLF staining and contain EKLF-regulated genes (Fig. 3A) (40). These results validate the role of EKLF in organizing specialized transcription, including β-globin and its in cis and in trans interacting partners GATA-1 and FOG-1.
FIGURE 3. Green arrows indicate genes with active epigenetic marks. Red brackets represent genes with repressive histone modification. Black boxes represent active enhancer elements. C, XIST-mediated XCI. The XIST lncRNA is expressed on Xi and spreads to cover the full chromosome (upper panel). The PRC2 complex is recruited by XIST and also loaded onto Xi. Xi is compacted and forms unique higher order chromosome structure (lower panel), with inactivated genes located inside, and active genes ("escapees" represented by red boxes) located on the outer surface.

CTCF Regulates Genome Organization in Pluripotent Stem Cells
Pluripotent stem cells are distinct from somatic cells in transcriptional and epigenetic status. Recent studies using 3C and 4C methods have shown that pluripotency factors Nanog and Klf4 have distinct interaction networks that maintain stem cell identity (74-76). Additionally, Dixon et al. (6) systematically explored genome-wide organization by HiC and provided significant insights into genome structure in human and mouse pluripotent stem cells. Using an optimized HiC method and computational algorithms, a high-resolution interaction map generated in stem cells and differentiated progenitors established the concept of the "topological domain," defined as a region bound by narrow segments where chromatin interactions end. By determining long-range interacting partners at each specific domain and demarcating transition regions, this work provided a simple but elegant way of modeling genome organization.
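The idea of a domain bounded by segments where interactions end can be illustrated on a toy contact matrix. The sketch below is not the algorithm used by Dixon et al. (6); it is a deliberately simplified stand-in that scores each bin by whether it contacts downstream or upstream neighbors more often and calls a boundary where that preference flips from upstream to downstream. All function names, the window size, and the contact values are hypothetical.

```python
def bias_scores(contacts, window):
    """Score each bin by its downstream versus upstream contact preference.

    Positive scores mean a bin mostly contacts bins after it (typical of
    the start of a domain); negative scores mean it mostly contacts bins
    before it (typical of the end of a domain).
    """
    n = len(contacts)
    scores = []
    for i in range(n):
        up = sum(contacts[i][j] for j in range(max(0, i - window), i))
        down = sum(contacts[i][j] for j in range(i + 1, min(n, i + window + 1)))
        total = up + down
        scores.append((down - up) / total if total else 0.0)
    return scores


def boundaries(scores):
    """Return each bin index at which a new domain appears to start."""
    return [i for i in range(1, len(scores))
            if scores[i - 1] < 0 and scores[i] > 0]


def toy_matrix(domain_sizes, inside=10, outside=1):
    """Make a block-structured matrix: frequent contacts inside a domain,
    rare contacts between domains (hypothetical values)."""
    labels = [d for d, size in enumerate(domain_sizes) for _ in range(size)]
    return [[inside if a == b else outside for b in labels] for a in labels]


contacts = toy_matrix([4, 3, 5])  # three domains covering bins 0-3, 4-6, 7-11
print(boundaries(bias_scores(contacts, window=3)))
```

On this idealized input the sign flips fall at bins 4 and 7, the first bins of the second and third domains. Real HiC matrices are noisy and show distance-dependent decay, so published methods add statistical modeling that this sketch leaves out.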
HiC mapping of embryonic stem (ES) cells identified CTCF to be strongly enriched at topological boundary regions. As the only insulator characterized in vertebrates, CTCF is unique in its ability to block enhancers and demarcate euchromatin/heterochromatin boundaries (77). However, genome-wide occupancy profiling indicated that CTCF binds many more sites than anticipated, suggesting roles other than as an insulator (78,79). On the local level, CTCF binding reportedly participates in long-range interactions such as colocalization of gene loci Igf2/H19 and Wsb1/Nf1, organization of the β-globin loci, and pairing of X chromosomes during X inactivation (19,80,81). Direct evidence that CTCF regulates long-range interactions on a genome-wide level was not available until Handoko et al. (30) characterized CTCF-mediated functional chromatin interactomes. The authors applied ChIA-PET through pulldown of CTCF in biological and technical replicates and deep sequenced for in cis and in trans long-range interactions (30). Altogether, 3306 CTCF-binding sites with 1480 intrachromosomal interactions and 336 interchromosomal interactions were identified and validated by 4C and FISH assays. These interactions were CTCF-dependent, as demonstrated by 3C and FISH analysis of CTCF knockdown cells. Most importantly, the authors defined four categories of looping with corresponding biological function based on assessment of epigenetic marks inside and outside the loop (Fig. 3B). The first category featured active epigenetic marks inside the loop and repressive marks outside, whereas the second described the opposite. The third category consisted of enhancer-promoter loops, and the fourth considered loops acting as barriers between active and repressive regions. Defining these types of looping showed that CTCF has multiple roles in regulating higher order chromatin structure and linked chromatin loops with local transcription.
We are beginning to understand how CTCF functions in organizing chromatin structure, but several questions remain. First, CTCF has >68,000 binding sites in mouse ES cells, but only 3000 of them have been identified as participating in looping (30). These 3000 sites cannot be distinguished from other CTCF sites, suggesting that cofactors may be involved. Another explanation may be that binding interactions are transient or occur only in a subpopulation of cells, so that greater sequencing depth is required to capture those interactions. Furthermore, although CTCF is ubiquitously expressed, no comprehensive studies have looked at whether transient depletion of CTCF has a significant influence on transcription, thus leaving the full biological impact of looping elusive for now.

Long Noncoding RNAs in X Chromosome Inactivation
Perhaps the best example of long noncoding RNAs (lncRNAs) mediating higher order chromatin structure is in X chromosome inactivation (XCI). XIST was the first lncRNA identified to be essential for XCI, which was later followed by the discovery of other lncRNA players such as RepA, Tsix, and Jpx (82).
XCI proceeds stepwise through development. Before differentiation of ES cells, the lncRNA Tsix is expressed biallelically and prevents binding of the RepA-PRC2 protein complex to the Xist promoter (83). When differentiation initiates, Tsix expression becomes limited to only one randomly selected allele (83). This leads to RepA-PRC2 complex loading onto the Xist locus on the to-be inactivated X chromosome (Xi) and activates Xist expression. Xist transcripts recruit PRC2 as transcription proceeds (84), while at the same time, transcription factor YY1 binds to the "nucleation center" of Xi but not to the active X chromosome (85). Finally, the PRC2-XIST complex is loaded onto the YY1 protein and begins the spread of heterochromatic regions from the nucleation center to neighboring regions, eventually "coating" the surface of the entire Xi with PRC2 and XIST (Fig. 3C) (86). Although XIST clearly functions in almost every step of XCI, XIST depletion after XCI has little effect on chromosome conformation (87).
The central role of lncRNAs in XCI may not be unique but rather an example of their universal role as mediators of higher order chromatin structure because lncRNAs appear to be inherently suitable to mediate these types of interactions. lncRNAs can tether to transcription units to serve as allele-specific tags, as their binding sites are more selective than those of DNA-binding proteins (88), or serve as adaptors and interact with multiple proteins (89,90). For example, a recent study showed HOTTIP lncRNA as an essential mediator of the WDR5-MLL complex in regulating long-range interactions and gene expression at the HoxA cluster (90). lncRNAs are more likely to be retained in the nucleus, making them essential regulators of nuclear architecture (91). Further characterization of lncRNA function through knock-out studies is needed to conclude which specific lncRNAs are involved in nuclear organization because some knock-out studies have shown that some lncRNAs are dispensable for nuclear organization (92-94).

How Dynamic Are the Long-range Interactions? Insights from Mobility Studies
To date, most chromatin interaction studies use fixed cells, which can reflect interaction efficiency only as a population average. FISH experiments face limitations as well, because well defined long-range interactions are seen only in a small population of cells (92-94). Chromatin motility studies provide a first glimpse of chromatin dynamics. One study in yeast revealed that chromatin undergoes Brownian diffusion within a confined nuclear subregion (95), which accords with the concept of chromosome territories (96). In mammalian cells, it is estimated that the radius of rapid motion is ~40 nm and that this motion requires ATP, making it an active rather than a passive mode of Brownian diffusion (97). Long-range motion, occurring usually at the micrometer level, is also much rarer than short-range motion (98), consistent with 4C and HiC studies showing that most interactions occur within domains at the Mb level. Thus, these interchromosomal interactions and their overall biological significance still remain to be determined.

Conclusion
The significance of higher order chromatin structure and long-range chromosomal interactions has opened up new fields of research in molecular biology. A combination of FISH- and 3C-based technology has helped probe nuclear architecture at unprecedented resolution, and instead of imagining chromatin as a randomly compacted structure, we can now treat chromatin as a dynamic structure that loops within and between compartments in the nucleus. Many distal elements control specialized cellular functions such as transcription, gene silencing, and translocation through long-range interactions. Although the details of how features of nuclear architecture regulate these functions are still largely unknown, studies on insulators, transcription factors, the polycomb complex, and noncoding RNAs provide some preliminary insight into the regulation of biological function by these chromatin structure components.