CpG-binding Protein Is a Nuclear Matrix- and Euchromatin-associated Protein Localized to Nuclear Speckles Containing Human Trithorax

CpG-binding protein (CGBP) binds unmethylated CpG dinucleotides and is essential for mammalian development. CGBP exhibits a punctate nuclear localization correlated with 4′,6-diamidino-2-phenylindole light regions and is excluded from metaphase chromosomes. The distribution of CGBP is distinct from the heterochromatin-associated proteins MBD1, methyl-CpG-binding protein 2, and HP1α. Some CGBP-containing nuclear speckles co-localize with splicing factor SC-35 and actively transcribed regions of the genome, whereas most CGBP co-localizes with acetylated histones, indicating that CGBP is localized to active chromatin. CGBP contains two nuclear localization signals that are insufficient to direct punctate subnuclear distribution. Instead, localization of CGBP to nuclear speckles requires signals within the acidic, basic, and coiled-coil domains. CGBP associates with the nuclear matrix, and fragments of CGBP that fail to associate with the nuclear matrix fail to localize to nuclear speckles and exhibit reduced transcriptional activation activity. Mutated versions of CGBP that lack DNA binding activity exhibit a normal nuclear distribution, suggesting that CGBP accumulates at nuclear speckles as a result of protein/protein interactions. Importantly, the subcellular distribution of CGBP is identical to human trithorax, suggesting that these proteins may be components of a multimeric complex analogous to the histone-methylating Set1 complex of Saccharomyces cerevisiae that contains CGBP and trithorax homologues.

Cytosine methylation of CpG motifs is an important epigenetic modification in higher eukaryotes. Methylated DNA is generally associated with transcriptionally inactive heterochromatin (1). In contrast, actively expressed genes are generally hypo-methylated and found in an open euchromatin configuration. Appropriate cytosine methylation is essential for normal mammalian development. For example, individual disruption of the genes encoding the methyltransferases Dnmt1, Dnmt3a, or Dnmt3b results in an embryonic lethal phenotype in mice (2,3), and overexpression of Dnmt1 leads to hyper-methylation of DNA, loss of genomic imprinting, and embryonic lethality (4). More subtle mutations within human Dnmt3b result in ICF (immunodeficiency, centromere instability, and facial anomalies) syndrome (5), and mutations of methyl-CpG-binding protein 2 (MeCP2) lead to Rett syndrome, a progressive neurodegenerative disorder (6). In addition, hyper-methylation of tumor suppressor genes is commonly observed in cancer cells (7). Despite the importance of appropriate cytosine methylation patterns for normal mammalian development, little is known regarding the regulation of this process.
Several DNA-binding proteins have been described that interact specifically with methylated CpG motifs. These include MeCP2, methyl CpG-binding domain (MBD) protein 1, MBD2, and MBD4 (8). In addition, MBD3 is a component of the Mi-2 histone deacetylase and nucleosome remodeling complex (9). Similarly, MBD2 is a component of the MeCP1 histone deacetylase complex (10), thus linking cytosine methylation with histone acetylation and providing a unifying framework for the control of chromatin structure and gene regulation.
CpG-binding protein (CGBP) is a recently described DNA-binding protein that exhibits a novel binding affinity for DNA sequences containing unmethylated CpG motifs (11). Consistent with this binding specificity, CGBP acts as a trans-activator of transcription in co-transfection experiments. Interestingly, the single DNA-binding domain of CGBP is composed of a cysteine-rich CXXC domain (12). The CXXC domain is only found in a few other proteins, including Dnmt1, MBD1, human trithorax (HRX/MLL/ALL-1), and MLL-2 (11,13,14). Reciprocal chromosomal translocations involving the HRX gene are commonly observed in acute leukemia. Little is known regarding the function of the related MLL-2 gene, although it is amplified in some cancer cell lines (15,16). In contrast to CGBP, these other CXXC domain proteins contain additional distinct DNA-binding domains, and the role of the CXXC domain within these factors is unclear. However, recent studies (17,18) demonstrate that similar to CGBP, CXXC domains within MBD1 and HRX exhibit binding affinity for unmethylated CpG motifs.
Ligand selection of high affinity binding sites from a pool of degenerate double-stranded oligonucleotides reveals a consensus binding site for CGBP of (A/C)CpG(A/C) (12). Individual mutation of highly conserved cysteine residues within the CXXC domain of CGBP completely abrogates DNA binding activity, and this domain requires the presence of zinc for efficient DNA binding activity (12). In addition, CGBP contains two copies of the plant homeodomain (PHD), a motif found in several dozen proteins implicated in modulation of chromatin structure and gene regulation (11,19). Recently, the PHD finger of the polycomb-like protein was found to mediate direct interaction with the protein "enhancer of zeste" (20), and a PHD domain within the chromatin-associated KAP-1 protein cooperates with a bromodomain to recruit a chromatin-remodeling complex (21). Furthermore, the PHD finger within CREB-binding protein contributes to the acetyltransferase activity of this factor (22). Importantly, we have recently reported that CGBP-null mice exhibit a peri-implantation lethal phenotype (23), thus demonstrating the requirement of CGBP for normal mammalian development. Although the molecular basis for this phenotype is still under investigation, it is intriguing that CGBP-null embryos die at a developmental stage during which global changes of cytosine methylation occur (24,25). Given the DNA binding specificity of CGBP and its structural similarity to other proteins involved in the regulation and function of cytosine methylation and chromatin structure, it is tempting to speculate that CGBP-null embryos die from dysregulated gene expression, at least partially as a consequence of altered chromatin structure. In an effort to better understand the global function of CGBP, and its possible role in modulating chromatin structure, experiments were performed to determine the subcellular localization of CGBP and define the protein domains required to direct this distribution.
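As an illustration of the (A/C)CpG(A/C) consensus described above, the following minimal sketch scans one strand of a DNA sequence for matching 4-mers. The function name and the test sequence are hypothetical and are not taken from this study; the sketch says nothing about methylation status or binding affinity and only locates candidate sequence matches.

```python
import re

# Consensus CGBP site quoted in the text: (A/C)CpG(A/C), i.e. [AC]CG[AC].
# A lookahead is used so that overlapping candidate sites are all reported.
CGBP_SITE = re.compile(r"(?=([AC]CG[AC]))")


def find_cgbp_sites(seq: str) -> list[tuple[int, str]]:
    """Return (0-based position, matched 4-mer) for each consensus match.

    Only the given strand is scanned; the reverse complement is not checked.
    """
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in CGBP_SITE.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical sequence used purely for illustration.
    example = "TTACGATTGCGTTCCGCGGA"
    print(find_cgbp_sites(example))
```

Scanning the complementary strand as well would require searching for the reverse-complement pattern, which is omitted here for brevity.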

EXPERIMENTAL PROCEDURES
Cell Culture and Transfection-HEK-293 (human embryonic kidney) and NIH-3T3 (mouse embryonic fibroblast) cells were grown as monolayers in Dulbecco's modified Eagle's medium supplemented with 10% fetal calf serum (HyClone, Logan, UT) at 37°C in a humidified atmosphere of 5% CO2. For subcellular localization and immunofluorescence experiments, cells were cultured on a coverslip in a 24-well dish and transfected with 1-2 μg of DNA by calcium phosphate co-precipitation (26). Following transfection, cells were grown in fresh medium for 24 h and fixed with cold methanol or 4% paraformaldehyde. For co-transfection assays, cells at ~60% confluency were transfected with 10-20 μg of DNA by calcium phosphate co-precipitation. Following transfection, cells were grown in fresh medium for 24-48 h prior to analysis. Transactivation assays were performed by co-transfecting HEK-293 cells with a fixed amount (2.5 μg) of CMV-based pEGFP reporter vector and various amounts of pcDNA3 FLAG-CGBP expression vector. The total amount of DNA in each sample was equalized by adding pcDNA3 plasmid. Cells were harvested and analyzed by Western blot as described below.
Subnuclear Biochemical Fractionation and Western Blot Analysis-Sequential nuclear extraction was performed as described (27). Cells were transfected with 5-10 μg of DNA by calcium phosphate co-precipitation. Following transfection, cells were grown in fresh medium for 48 h and used for analysis. Cells were washed with cold phosphate-buffered saline (PBS) and extracted with cytoskeleton buffer (CSK) containing 10 mM Pipes (pH 6.8), 100 mM NaCl, 300 mM sucrose, 3 mM MgCl2, 1 mM EGTA, supplemented with the protease inhibitors leupeptin, aprotinin, and pepstatin (1 μg/ml each), 1 mM phenylmethylsulfonyl fluoride, 1 mM dithiothreitol, and 0.5% (v/v) Triton X-100. After extraction for 5 min on ice, the proteins solubilized in CSK buffer were recovered following separation from the insoluble fraction by centrifugation at 5,000 × g for 3 min. Chromatin was solubilized by digestion with 30 units of DNase I (RNase-free) in CSK buffer containing protease inhibitors for 20 min at 37°C. The chromatin fraction was extracted for 5 min on ice by adding 1 M ammonium sulfate in CSK buffer to a final concentration of 0.25 M and centrifuging as above. The pellet was further extracted with 2 M NaCl in CSK buffer for 5 min on ice. The remaining pellet, which was considered the nuclear matrix fraction, was solubilized with sodium phosphate buffer containing 8 M urea. Protein concentration was determined by the Bradford assay. For Western blot analysis, an equivalent proportion of each fraction was solubilized in Laemmli sample buffer (28). Following electrophoresis on a 6-12% SDS-polyacrylamide gel, proteins were transferred onto nitrocellulose membrane (MSI, Westborough, MA).
The membrane was then incubated with either anti-CGBP antibody (11), anti-acetyl histone H3 antibody (Upstate Biotechnology, Inc., Lake Placid, NY), anti-NuMA antibody (Oncogene, Cambridge, MA), anti-FLAG antibody (Sigma), or anti-GFP antibody (Clontech, Palo Alto, CA) followed by horseradish peroxidase-labeled secondary antibody and detected by using an ECL detection kit (Amersham Biosciences) according to the manufacturer's instructions.
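The chromatin extraction step above brings the sample to 0.25 M ammonium sulfate by adding a 1 M stock. A minimal sketch of that dilution arithmetic follows. It assumes that volumes are additive and that the sample contains no ammonium sulfate before the addition; the 300 μl sample volume is a hypothetical value chosen for illustration and is not stated in the protocol.

```python
def stock_volume_for_final(c_stock: float, c_final: float, v_sample: float) -> float:
    """Volume of stock to add to v_sample so that the mixture reaches c_final.

    Mass balance (solute comes only from the stock):
        c_stock * v_add = c_final * (v_sample + v_add)
    which rearranges to
        v_add = c_final * v_sample / (c_stock - c_final)
    Concentrations share one unit (e.g. M); volumes share one unit (e.g. ul).
    """
    if not 0 < c_final < c_stock:
        raise ValueError("need 0 < c_final < c_stock")
    return c_final * v_sample / (c_stock - c_final)


if __name__ == "__main__":
    # Hypothetical 300 ul digest brought to 0.25 M with a 1 M stock.
    v_add = stock_volume_for_final(c_stock=1.0, c_final=0.25, v_sample=300.0)
    print(f"add {v_add:.1f} ul of 1 M stock")
```

Under these assumptions the stock volume is always one-third of the sample volume for a 1 M stock and a 0.25 M target.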
Deletion mutations of CGBP were prepared using a combination of restriction enzyme digestions and PCR amplification using Pfu polymerase (Stratagene, La Jolla, CA) and subcloned into pEGFP and/or pcDNA3-FLAG vectors. Site-directed mutagenesis was performed on the CXXC domain of CGBP with primers that mutate cysteine residues to alanine, using the QuikChange site-directed mutagenesis kit (Stratagene) in accordance with the protocol provided by the manufacturer. Mutagenesis oligonucleotides include C169A (mutates cysteine 169 to alanine), 5′-cggtcagcccgcatggctggtgagtgtgaggcatg-3′, and C208A (mutates cysteine 208 to alanine), 5′-gccggctgcgccaggcccagctgcgggccc-3′. Each mutation was subcloned into the pFLAG-CMV (Sigma) and pEGFP vectors. The nucleotide sequences of truncated and mutated CGBP constructs were confirmed by automated DNA sequencing.
Indirect Immunofluorescence and Confocal Microscopy-Cells were seeded onto a cover glass at 2-5 × 10⁴ cells/well in a 24-well dish and transfected as described above. Cells were then washed twice with cold PBS, fixed with 4% (v/v) paraformaldehyde in PBS for 20 min at room temperature, and washed with PBS. Cells were permeabilized with 0.2% Triton X-100 in PBS for 10 min at room temperature, and a blocking solution (PBS containing 2.5% normal serum (Santa Cruz Biotechnology, Santa Cruz, CA) and 0.2% Tween 20) was added and incubated for 1 h at room temperature. Anti-CGBP rabbit IgG (1:500) (11), anti-acetyl histone H3 rabbit IgG (1:100, 5 μg/ml; di-acetylated Lys-9 and Lys-14; catalog number 06-599; Upstate Biotechnology, Inc.), anti-acetyl histone H4 rabbit IgG (1:100, 5 μg/ml; tetra-acetylated Lys-4, Lys-7, Lys-11, and Lys-15; catalog number 06-598; Upstate Biotechnology, Inc.), anti-SC-35 (1:500, 9.2 μg/ml) (Sigma), or anti-FLAG mouse IgG (1:1000, 3.5 μg/ml) (Sigma) was added and incubated for 2 h at room temperature. Cells were then washed three times with PBS containing 0.2% Tween 20 for 5 min. Appropriate secondary antibody labeled with Texas Red (2 μg/ml in blocking solution) (Santa Cruz Biotechnology) was added and incubated for 1 h at room temperature. Cells were washed three times with PBS containing 0.2% Tween 20 for 5 min. Nuclear counterstaining was performed with 0.1 μg/ml DAPI in PBS for 5 min followed by washing with PBS. Cells were mounted with 10 μl of Fluoromount G (Southern Biotechnology Associates, Birmingham, AL) and observed using a fluorescence microscope or were scanned with a Zeiss LSM 510 laser scanning confocal microscope. Bromouridine triphosphate was used for in situ detection of RNA synthesis in permeabilized cells using immunodetection with antibody directed against bromouridine (Harlan Seralab, Indianapolis, IN) as described previously (32).

FIG. 1. CGBP localizes to nuclear speckles in DAPI light regions. A, nuclear distribution pattern of endogenous CGBP and GFP-CGBP fusion proteins. Endogenous CGBP protein was detected in NIH-3T3 and HEK-293 cells using anti-CGBP antibody and Texas Red-conjugated secondary antibody. Nuclei were counterstained with DAPI and observed by confocal microscopy. Vectors expressing GFP-tagged CGBP were transiently transfected into NIH-3T3 and HEK-293 cells as described under "Experimental Procedures." B, nuclear distribution of speckles containing GFP-CGBP was compared with DAPI staining in NIH-3T3 and HEK-293 cells.

RESULTS
Determination of the Subcellular Localization of CGBP-Confocal immunofluorescence was performed on HEK-293 and NIH-3T3 cells using antisera directed against CGBP. The endogenous CGBP protein is localized to the nucleus in both cell lines and exhibits a punctate or speckled distribution (Fig. 1A). Transfection of these cells with a vector expressing GFP-tagged full-length human CGBP results in a similarly speckled nuclear distribution of the GFP-CGBP fusion protein (Fig. 1A). These speckles are concentrated in areas of DAPI light staining, consistent with localization of CGBP to euchromatic regions of the nucleus (Fig. 1B).
Co-localization experiments were conducted to better characterize the nature of the nuclear speckles containing CGBP. Cells were co-transfected with vectors expressing epitope-tagged (GFP or FLAG) CGBP and an epitope-tagged marker protein. The results presented in Fig. 2 indicate that CGBP exhibits a subnuclear distribution that is entirely distinct from that of MBD1, MeCP2, and HP1α, each of which has been found to localize within regions of heterochromatin (8,30,33,34). In contrast, a small fraction of CGBP-containing speckles co-localize with the RNA splicing factor SC-35 and with actively transcribed regions of the genome. However, CGBP exhibits significant co-localization with acetylated histone H3 and acetylated histone H4, which are markers for areas of euchromatin. Importantly, the set of nuclear speckles that contain CGBP is identical to that detected as containing the amino terminus (amino acids 1-1436) of HRX, which has previously been reported to localize within a novel class of nuclear speckles and to associate with the nuclear matrix (35-37).

FIG. 2. CGBP is localized to euchromatin and co-localizes with HRX. Nuclear distribution of CGBP was compared with the methyl-CpG-specific proteins MBD1 and MeCP2, heterochromatin marker protein HP1α, RNA splicing factor SC-35, actively transcribed regions, the active chromatin markers acetylated histone H3 and acetylated histone H4, and HRX. FLAG-tagged or GFP-tagged CGBP was co-transfected with GFP-tagged MBD1 or FLAG-tagged MeCP2, HP1α, or amino-terminal HRX (amino acids 1-1436). The FLAG epitope was detected with anti-FLAG antibody and Texas Red-conjugated secondary antibody. GFP-tagged CGBP was transiently expressed and compared with endogenous SC-35, acetylated histone H3, and acetylated histone H4 using specific antisera and Texas Red-conjugated secondary antibody as described under "Experimental Procedures." Actively transcribed regions were labeled with bromouridine and detected as described under "Experimental Procedures." All experiments were performed in NIH-3T3 cells except for SC-35, which was examined in HEK-293 cells.
Further microscopy studies were performed to examine the localization of CGBP in metaphase cells. As reported previously (18), MBD1 is found tightly associated with heterochromatic condensed chromosomes at metaphase (Fig. 3, bottom row). In contrast, GFP-CGBP exhibits a highly diffuse distribution during this phase of the cell cycle (Fig. 3, top row). It is excluded from the condensed chromosomes and does not exhibit a punctate distribution during metaphase.
Finally, biochemical fractionation experiments were utilized as an independent method to further assess the subcellular localization of CGBP. HEK-293 and NIH-3T3 cells were successively extracted to recover soluble, chromatin-associated, and nuclear matrix-associated protein fractions. Western blot analyses demonstrate that endogenous CGBP is nearly exclusively associated with the nuclear matrix fraction (Fig. 4). Western blots were also performed with antisera directed against NuMA and acetylated histone H3 as controls for the integrity and purity of the nuclear matrix and chromatin-associated protein fractions, respectively (38,39). We conclude that CGBP associates with the nuclear matrix, consistent with the previously reported nuclear matrix association of the co-localizing protein HRX (35).
Identification of Protein Domains Responsible for Nuclear Localization of CGBP and for Targeting to Nuclear Speckles and Association with the Nuclear Matrix-Experiments were performed to identify the signals required for CGBP association with the nuclear matrix and nuclear speckles. For example, the DNA binding activity of Ikaros has been found to be essential for its localization to heterochromatin speckles (40). Experiments were conducted to determine whether the DNA binding activity of CGBP is similarly required for the observed subcellular distribution. Mutations of conserved cysteine residues within the CXXC domain that ablate DNA binding activity (12) were introduced into GFP-CGBP fusion proteins. The subcellular localization of these DNA-binding deficient forms of CGBP was assessed in both NIH-3T3 and HEK-293 cells. Confocal microscopy and biochemical fractionation reveal that both of these mutated constructs co-localize with wild type full-length FLAG-tagged CGBP protein (Fig. 5A) and associate with the nuclear matrix (Fig. 5B). Hence, DNA binding activity is not required for targeting of CGBP to nuclear speckles. Rather, this localization is presumably mediated via protein/protein interactions.
We have demonstrated previously (11) that CGBP strongly trans-activates the cytomegalovirus (CMV) promoter, which contains numerous CpG motifs. As expected, the DNA-binding deficient CGBP constructs fail to trans-activate a co-transfected reporter gene vector composed of the CMV promoter/enhancer driving expression of GFP (Fig. 5C). Note that wild type CGBP additionally auto-activates expression of the CGBP expression vector, which also contains the CMV promoter (Fig. 5C, top panel).
In order to define the nuclear localization signals within CGBP, a series of truncated versions of CGBP (Fig. 6A) was generated and expressed in NIH-3T3 cells as GFP fusion proteins (Fig. 6B). Successive truncations from the carboxyl terminus of CGBP reveal that a fragment as short as amino acids 1-122 is localized to the nucleus (Fig. 6). However, truncation to amino acids 1-103 results in a cytoplasmic localization. Hence, a nuclear localization signal resides within amino acids 104-122 of CGBP. Inspection of the amino acid sequence of this domain (DEGGGRKRPVPDPDLQRRA) (11) reveals two clusters of basic residues (RKR and RR) consistent with a consensus bipartite nuclear localization signal (RKRX8RR) (41).
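The match between the 104-122 fragment and the bipartite consensus can be checked mechanically. The sketch below, which assumes only the sequence and numbering quoted above, encodes RKRX8RR as a regular expression and maps the match back to CGBP residue numbers; the function and constant names are hypothetical.

```python
import re

# Bipartite NLS consensus quoted in the text: RKR, any 8 residues, then RR.
BIPARTITE_NLS = re.compile(r"RKR.{8}RR")

# CGBP amino acids 104-122 as given in the text.
FRAGMENT = "DEGGGRKRPVPDPDLQRRA"
FRAGMENT_START = 104  # residue number of the first letter of FRAGMENT


def locate_nls(fragment: str, start: int):
    """Return (first residue, last residue, matched sequence) or None."""
    match = BIPARTITE_NLS.search(fragment)
    if match is None:
        return None
    first = start + match.start()
    last = start + match.end() - 1  # match.end() is exclusive
    return first, last, match.group(0)


if __name__ == "__main__":
    print(locate_nls(FRAGMENT, FRAGMENT_START))
```

The matched span corresponds to residues 109-121, the same coordinates used for the amino-terminal nuclear localization signal that is appended to several constructs in Fig. 6.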
Importantly, the 302-656 amino acid fragment of CGBP is also partially localized to the nucleus, indicating the presence of at least one additional nuclear localization signal in the carboxyl half of the CGBP protein. Expression of the basic domain of CGBP (amino acids 318-367) as a GFP fusion protein is sufficient to direct efficient nuclear localization, thus defining a second nuclear localization signal. This 40-amino acid fragment contains 65% histidine, arginine, or lysine residues. The GFP fusion protein containing CGBP amino acids 361-481 is evenly distributed between the nucleus and cytoplasm, suggesting a nonspecific diffusion of this small peptide throughout the cell. The downstream fragment of CGBP containing amino acids 483-656 is excluded from the nucleus, indicating the absence of nuclear localization signals downstream of the coiled-coil domain. It is surprising that the isolated basic domain localized to the nucleus more efficiently than a longer fragment (amino acids 302-656), which also contains the basic domain. Presumably elements within the downstream sequence act to mask the nuclear localization signal located within the basic domain when expressed in this fashion. Addition of the upstream CGBP nuclear localization signal (amino acids 109-121) to the 302-656 construct results in efficient nuclear localization (Fig. 6B).

FIG. 3. CGBP is excluded from chromosomes during mitosis. GFP-tagged CGBP or GFP-tagged MBD1 was transiently expressed in HEK-293 cells, which were then grown in medium containing 0.1 μg/ml Colcemid to block the cells in mitosis. Nuclei were counterstained with DAPI and observed with confocal microscopy.

FIG. 4. CGBP is associated with the nuclear matrix. NIH-3T3 and HEK-293 cells were fractionated by sequential nuclear extraction as described under "Experimental Procedures." An equal proportion of each fraction was analyzed by Western blotting using anti-CGBP, anti-acetyl histone H3, or anti-NuMA antisera.
Additional truncated versions of GFP-CGBP fusion proteins were expressed in NIH-3T3 cells to identify protein domains required to target CGBP to nuclear speckles. A fragment of CGBP (amino acids 1-481) lacking the PHD2 domain exhibits a speckled nuclear distribution (Fig. 6B). However, additional truncation to amino acid 367 (amino acids 1-367), which deletes the coiled-coil domain, results in a partially speckled pattern with much of the GFP fusion protein distributed diffusely throughout the nucleus. Further truncation to amino acid 320 (amino acids 1-320), which additionally deletes the basic domain, results in a complete loss of nuclear speckling and a diffuse nuclear distribution. Hence, signals encoded within the basic and coiled-coil domains are required for directing a punctate distribution of CGBP.
Truncation of the amino terminus of the protein reveals that amino acids 1-301, including the PHD1 domain, CXXC DNA-binding domain, and a portion of the acidic domain, are not required for directing a punctate distribution, as the CGBP fragment containing amino acids 302-656 exhibits a speckled nuclear distribution (Fig. 6B). The CGBP fragment containing amino acids 103-481 also exhibits a punctate nuclear distribution, similar to the pattern produced by expression of full-length GFP- or FLAG-CGBP (Fig. 6, B and C). Similar to the results described above, removal of the coiled-coil domain from the carboxyl-terminal end of this fragment of CGBP (leaving amino acids 103-367) leads to a partially speckled nuclear distribution, whereas additional removal of the basic domain (leaving amino acids 103-320) leads to a diffuse nuclear distribution. Stepwise truncations from the amino terminus of the 103-481-amino acid CGBP fragment produced similar results. Truncation that leaves amino acids 213-481, which removes the CXXC domain, leads to a speckled subnuclear distribution. Additional truncation to position 317 (leaving amino acids 318-481), which removes the acidic domain, leads to a partially speckled distribution with a background of diffuse localization. This indicates that the acidic domain, which contains 29% glutamate and aspartate residues, participates in the appropriate localization of CGBP to nuclear speckles.
Importantly, however, none of the individual domains located in the central region of CGBP are sufficient to direct a speckled nuclear distribution (Fig. 6B). Expression of the coiled-coil domain (amino acids 361-481), acidic domain (amino acids 213-320), or basic domain (amino acids 318-367) as individual GFP fusion proteins results in a diffuse distribution throughout the nucleus. The coiled-coil and acidic domains were linked to the amino-terminal CGBP nuclear localization signal (amino acids 109-121), as these fragments lack both of the CGBP nuclear localization signals. Hence, multiple domains distributed throughout the central region of CGBP contribute cooperative signals that lead to a speckled nuclear distribution.

FIG. 5. Association of CGBP with nuclear speckles and the nuclear matrix is independent of DNA binding activity. A, targeting of CGBP to nuclear speckles does not require DNA binding activity. GFP-tagged CGBP mutants (C169A and C208A) that fail to bind to DNA (12) and FLAG-tagged CGBP were co-transfected into NIH-3T3 and HEK-293 cells and detected with confocal microscopy using anti-FLAG antibody and Texas Red-conjugated secondary antibody as described under "Experimental Procedures." B, DNA-binding domain mutants of CGBP are associated with the nuclear matrix. FLAG-tagged CGBP expression vectors were transiently expressed in HEK-293 cells and fractionated as described under "Experimental Procedures." An equal proportion of each fraction was analyzed by Western blotting using anti-FLAG antibody. C, DNA-binding domain mutants of CGBP do not trans-activate reporter genes. Increasing amounts (1, 2.5, 5, and 10 μg) of FLAG-tagged CGBP expression vectors were co-transfected with fixed amounts of CMV-based pEGFP-C2 vector into HEK-293 cells. Total amounts of transfected DNA were normalized using pcDNA3 vector DNA. Cells were harvested 2 days following transfection, washed with PBS, and lysed with 8 M urea buffer. Equal proportions of each fraction were analyzed by Western blot using anti-FLAG or anti-GFP antisera as described under "Experimental Procedures."
Biochemical fractionation experiments were also performed on cells expressing truncated GFP-CGBP fusion proteins to compare nuclear matrix association with the subnuclear distribution observed by confocal microscopy. These two parameters are found to be highly correlated. Without exception, CGBP constructs that exhibit a predominant punctate nuclear distribution, such as fragments containing amino acids 1-481, 302-656, 103-481, and 213-481, are also nearly exclusively associated with the nuclear matrix (Fig. 7). However, constructs that exhibit a partially speckled distribution, such as fragments containing amino acids 1-367, 103-367, 318-367, or 318-481, are partitioned between the chromatin-associated fraction and the nuclear matrix fraction. Constructs that exhibit a diffuse nuclear staining, such as fragments containing amino acids 1-320, 103-320, or 361-481, are found predominantly in the soluble fraction.
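The correspondence described in the preceding paragraph can be tabulated as a simple lookup. The following sketch encodes only the constructs and assignments listed in that paragraph; the category labels ("punctate", "partial", "diffuse", "matrix", "chromatin+matrix", "soluble") are shorthand introduced here for illustration, and the check merely confirms that each microscopy pattern maps to a single fractionation outcome.

```python
from collections import defaultdict

# construct (amino acid range) -> (microscopy pattern, predominant fraction),
# transcribed from the paragraph above; labels are illustrative shorthand.
OBSERVATIONS = {
    "1-481": ("punctate", "matrix"),
    "302-656": ("punctate", "matrix"),
    "103-481": ("punctate", "matrix"),
    "213-481": ("punctate", "matrix"),
    "1-367": ("partial", "chromatin+matrix"),
    "103-367": ("partial", "chromatin+matrix"),
    "318-367": ("partial", "chromatin+matrix"),
    "318-481": ("partial", "chromatin+matrix"),
    "1-320": ("diffuse", "soluble"),
    "103-320": ("diffuse", "soluble"),
    "361-481": ("diffuse", "soluble"),
}


def fractions_by_pattern(observations):
    """Group the fractionation outcomes seen for each microscopy pattern."""
    grouped = defaultdict(set)
    for pattern, fraction in observations.values():
        grouped[pattern].add(fraction)
    return dict(grouped)


def is_concordant(observations):
    """True if every microscopy pattern maps to exactly one fraction."""
    return all(len(f) == 1 for f in fractions_by_pattern(observations).values())


if __name__ == "__main__":
    for pattern, fractions in fractions_by_pattern(OBSERVATIONS).items():
        print(pattern, sorted(fractions))
    print("concordant:", is_concordant(OBSERVATIONS))
```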
We conclude that multiple signals present within the central portion of CGBP are responsible for association with the nuclear matrix and generation of a punctate subnuclear distribution. Signals within the acidic, basic, and coiled-coil domains all contribute to this localization, yet no individual domain is sufficient to direct this subcellular targeting. Interestingly, in addition to its activity as a nuclear localization signal, the basic domain appears essential for proper subnuclear localization of CGBP and is sufficient for directing at least a partially speckled distribution when linked to either the coiled-coil or acidic domains. Although not sufficient for normal CGBP localization, expression of the isolated basic domain as a GFP fusion protein permits partial association with the chromatin fraction.
The functional significance of the association of CGBP with the nuclear matrix and nuclear speckles was examined in co-transfection assays. As demonstrated in Fig. 5C, CGBP trans-activates expression of a GFP reporter gene under the control of the CMV promoter, which contains several dozen CpG motifs. A similar analysis was performed with several truncated CGBP constructs that exhibit various degrees of nuclear speckling and association with the nuclear matrix. All of the constructs tested contain both the acidic domain that is sufficient for trans-activation activity of CGBP (42) and the CXXC DNA-binding domain (11,12). A construct encoding amino acids 1-481, which exhibits a subnuclear distribution identical to that of the full-length CGBP protein, exhibits a trans-activation activity similar to the wild type CGBP construct (Fig. 8). The construct containing amino acids 1-367, which exhibits a partially speckled distribution and partial association with the nuclear matrix, exhibits a reduced trans-activation activity (~50% of wild type when normalized for the level of FLAG-CGBP expression). Importantly, the CGBP construct containing amino acids 1-320, which exhibits a diffuse nuclear distribution and fails to associate with the nuclear matrix, exhibits a dramatically reduced trans-activation activity (~15% of wild type) despite containing DNA-binding and trans-activation domains. Hence, appropriate subnuclear targeting appears to be critical for normal CGBP function.

FIG. 6. Analysis of protein domains that direct nuclear localization and target CGBP to nuclear speckles. A, schematic diagram of CGBP and various truncated fragments. Protein regions such as the PHD, CXXC, acidic, basic, and coiled-coil domains are indicated, along with corresponding amino acid positions. The amino-terminal bipartite nuclear localization signal of CGBP (amino acids 109-121) was inserted between the GFP epitope and CGBP sequence for constructs containing amino acids 361-481, 302-656, and 213-320. B, subcellular distribution of GFP-CGBP fusion proteins. GFP-tagged constructs were transiently expressed in NIH-3T3 cells and observed with confocal microscopy as described under "Experimental Procedures." C, the co-localization of a truncated mutant (amino acids 103-481) with wild type CGBP. GFP-tagged mutant and FLAG-tagged CGBP were co-transfected and detected using anti-FLAG antibody and Texas Red-conjugated secondary antibody with confocal microscopy.

DISCUSSION
These studies were conducted to determine the physical distribution of CGBP within a cell in the expectation that such information will offer insight into the global function and mechanism of action of CGBP. CGBP localizes nearly exclusively to euchromatic nuclear speckles. The distribution of CGBP is distinct from that exhibited by the heterochromatin-associated proteins MBD1, MeCP2, and HP1α. In contrast, the distribution of CGBP slightly overlaps with the RNA splicing factor SC-35 and with actively transcribed regions of the genome, and exhibits significant overlap with a subset of regions containing acetylated histones. Hence, CGBP localizes to areas of euchromatin rather than sites of active transcription, suggesting that it plays a role in modulation of chromatin structure.
The CGBP protein contains two nuclear localization signals, located upstream of the CXXC DNA-binding domain and within the basic domain. Neither of these nuclear localization signals is capable of directing a speckled nuclear distribution. Rather, the signals that direct CGBP to active chromatin nuclear speckles are distributed throughout an extended central region of the protein that includes acidic, basic, and coiled-coil domains. However, none of these domains is individually able to compose a nuclear speckle targeting signal, although the basic domain associates with the chromatin fraction and directs partial targeting to nuclear speckles when linked to either the adjacent acidic or coiled-coil domains. In addition, biochemical fractionation experiments demonstrate that CGBP associates nearly exclusively with the nuclear matrix, an interaction that is independent of DNA binding activity, and is presumably a consequence of protein/protein interactions. Association of CGBP with the nuclear matrix appears to be causally related to localization to nuclear speckles, as these two characteristics coincide for all mutated versions of CGBP analyzed. Importantly, association of CGBP with nuclear speckles and the nuclear matrix is functionally important for the trans-activation activity of this factor.
Interestingly, CGBP uniquely localizes to a set of nuclear speckles that also contains HRX (MLL/ALL-1). HRX is the human homologue of Drosophila trithorax, which is a regulator of chromatin structure and homeobox gene expression. HRX is at least partially associated with the nuclear matrix (35), and HRX-containing nuclear speckles have been shown to be distinct from nuclear speckles that contain PML or TAL-1 (36). Consistent with the results reported here, HRX-containing speckles were found previously to be distinct from sites of active transcription, as determined by the localization of the SC-35 splicing factor (43). Interestingly, in contrast to HRX, which associates with condensed chromosomes during mitosis, CGBP fails to interact with metaphase chromosomes. Hence, these proteins co-localize to a unique class of nuclear speckles during interphase, yet exhibit disparate subcellular localizations during cell division, indicating that CGBP-containing nuclear speckles are dynamic structures that undergo remodeling throughout the cell cycle.
FIG. 7. Identification of CGBP domains that mediate association with the nuclear matrix. A, schematic diagram of CGBP and truncated fragments. The amino-terminal bipartite nuclear localization signal of CGBP (amino acids 109-121) was inserted between the epitope (GFP or FLAG) and CGBP sequence for constructs containing amino acids 302-656 and 361-481. Numbers indicate amino acid residues of CGBP. B, biochemical fractionation of GFP- and FLAG-CGBP fusion proteins. FLAG or GFP-tagged constructs were transiently expressed in HEK-293 cells and fractionated as described under "Experimental Procedures." An equal proportion of each fraction was analyzed by Western blotting using anti-FLAG or anti-GFP antisera. S, soluble fraction; C, chromatin fraction; M, nuclear matrix fraction.

Although CGBP and HRX both contain CXXC and PHD domains, analysis of truncation mutations indicates that neither of these domains is responsible for directing CGBP to HRX-containing nuclear speckles. Similar to CGBP, the signals directing HRX to nuclear speckles are complex. At least two domains (SNL-1 and SNL-2) within the amino terminus of HRX are capable of directing HRX to nuclear speckles (37). However, a version of HRX lacking these two domains still localizes to nuclear speckles, indicating that at least one additional domain is involved in this subcellular targeting. Comparison of the SNL-1 and SNL-2 sequences of HRX with the central region of CGBP reveals intriguing pockets of sequence homology. A highly basic region within SNL-2 (amino acids 1067-1074; RIKHVCRR) shares 63% identity and 88% similarity to sequence present within the basic domain of CGBP (amino acids 326-333; KVKHVKRR). Mutation of residues 1065-1073 of HRX results in the loss of ~80% of nuclear speckles (37). Similarly, mutation of amino acids 418-423 (SSRIIK) of HRX results in the loss of 60% of nuclear speckles.
Sequence comparison reveals that this region is embedded within a short domain that exhibits 40% identity and 60% similarity with a region of the CGBP acidic domain, amino acids 260-269 (AVASSTVKEP). Whether these short regions of similarity will prove to be critical for directing association of CGBP with the nuclear matrix and localization to nuclear speckles is currently under investigation.
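The identity and similarity values quoted for the SNL-2 comparison can be checked with a simple position-by-position count over the two ungapped 8-residue sequences. The sketch below is illustrative only: the text does not state which scoring scheme was used to define "similarity," so the conservative-substitution groups (`SIMILAR_GROUPS`) and the function name `identity_and_similarity` are assumptions made here, not the authors' method.

```python
# Conservative-substitution groups. ASSUMPTION: the paper does not specify
# its similarity criteria; these are commonly used physicochemical classes.
SIMILAR_GROUPS = [
    set("KRH"),   # basic
    set("DE"),    # acidic
    set("ILVM"),  # aliphatic / hydrophobic
    set("FYW"),   # aromatic
    set("ST"),    # small hydroxyl
    set("NQ"),    # amide
    set("AG"),    # small
]


def is_similar(x: str, y: str) -> bool:
    """True if two residues are identical or fall in the same group."""
    return x == y or any(x in g and y in g for g in SIMILAR_GROUPS)


def identity_and_similarity(a: str, b: str) -> tuple[float, float]:
    """Percent identity and percent similarity for two equal-length,
    ungapped peptide sequences (hypothetical helper for illustration)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = list(zip(a.upper(), b.upper()))
    identical = sum(x == y for x, y in pairs)
    similar = sum(is_similar(x, y) for x, y in pairs)
    return 100 * identical / len(pairs), 100 * similar / len(pairs)


# HRX SNL-2 (amino acids 1067-1074) vs. CGBP basic domain (amino acids 326-333)
ident, sim = identity_and_similarity("RIKHVCRR", "KVKHVKRR")
print(f"identity = {ident}%, similarity = {sim}%")
```

Under these assumed groups, five of eight positions are identical (62.5%) and seven of eight are identical or conservatively substituted (87.5%), which is consistent with the 63% identity and 88% similarity reported above. The second comparison (with CGBP amino acids 260-269) cannot be checked in the same way because the full corresponding HRX sequence is not given in the text.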
Co-localization of HRX and CGBP to an identical set of nuclear speckles suggests that they may directly interact or may be components of a common multimeric protein complex. A direct interaction of HRX with CGBP has not been reported, despite several yeast two-hybrid screens performed with the amino-terminal portion of HRX that co-localizes with CGBP to nuclear speckles (44-50). The alternative hypothesis that CGBP and HRX are components of a common multimeric complex gains additional support from studies in Saccharomyces cerevisiae, in which the yeast homologue of CGBP (Spp1) and a trithorax family member (Bre2) are both components of a complex denoted COMPASS or Set1 (51-54). This complex contains 7-8 components and is required for methylation of the Lys-4 residue of histone H3, which is associated with transcriptionally competent chromatin (55). The Set1 complex is also involved in gene silencing at telomeres and mating type loci (56) and activation of DNA repair genes (57). The results presented here provide evidence that an analogous complex may exist in higher eukaryotes. Yeast two-hybrid screening failed to demonstrate a direct interaction between Spp1 and Bre2 (51), consistent with the failure to detect an interaction with CGBP using HRX as the bait in two-hybrid screens.
Direct analysis of the protein interactions of CGBP in vivo is complicated by the finding that CGBP is nearly exclusively associated with the nuclear matrix, thus making co-immunoprecipitation and affinity pull-down assays difficult. However, the results described here predict that CGBP−/− cells may exhibit abnormal patterns of histone modification and chromatin structure, consistent with peri-implantation embryonic death of CGBP-null embryos, a time during which the genome undergoes global remodeling of chromatin structure and cytosine methylation (24,25,58).