Identification of novel binding elements and gene targets for the homeodomain protein BARX2.

BARX2 is a homeobox transcription factor that influences cellular differentiation in various developmental contexts. To begin to identify the gene targets that mediate its effects, chromatin immunoprecipitation (ChIP) was used to isolate BARX2 binding sites from the human MCF7 breast cancer cell line. Cloning and sequencing of BARX2-ChIP-derived DNA fragments identified 60 potential BARX2 target loci that were proximal to or within introns of genes involved in cytoskeletal organization, cell adhesion, growth factor signaling, transcriptional regulation, and RNA metabolism. The sequences of over half of the fragments showed homology with the mouse genome, and several sequences could be mapped to orthologous human and mouse genes. Binding of BARX2 to 21 genomic loci examined was confirmed quantitatively by replicate ChIP assays. A combination of sequence analysis and electrophoretic mobility shift assays revealed homeodomain binding sites (HBS) within several fragments that bind to BARX2 in vitro. The majority of BARX2 binding fragments tested (14/19) also affected transcription in luciferase reporter gene assays. Mutation analyses of three fragments showed that their transcriptional activities required the HBS, and suggested that BARX2 regulates gene expression by binding to DNA elements containing paired TAAT motifs that are separated by a poly(T) sequence. Inhibition of BARX2 expression in MCF7 cells led to reduced expression of eight genes associated with BARX2 binding sites, indicating that BARX2 directly regulates their expression. The data suggest that BARX2 can coordinate the expression of a network of genes that influence the growth of MCF7 cells.

Homeodomain transcription factors control development by regulating regional patterns of gene expression (1,2). This regulation influences diverse cellular behaviors including adhesion, migration, proliferation, differentiation, and apoptosis. In most cases, the identities of the target genes that link homeodomain proteins to these processes have not been elucidated. Previous work characterized the role of homeodomain and paired domain transcription factors in the temporal and spatial regulation of cell adhesion molecules (CAMs) (3-5). A Southwestern screen to identify proteins that bound to a homeobox recognition motif from the chicken Ng-CAM gene led to the discovery of the murine homeodomain protein Barx2 (6). Related binding sites for BARX2 were subsequently identified in the mouse L1cam and Ncam1 genes, and these sites were shown in transgenic mice to be required for the native pattern of gene expression (3,5-7).
Murine Barx2 is expressed in several mesenchymal and epithelial tissues during development including cartilage, skeletal and smooth muscle, the central nervous system, and branching epithelial structures, such as the mammary, lacrimal, and salivary glands (6,9,10). 2,3 Our recent studies indicate that Barx2 influences cellular processes that control cell adhesion and remodeling of the actin cytoskeleton seen in myoblast fusion (10), and chondrogenesis. 3 Human BARX2 has also been described as a tumor suppressor gene; loss of BARX2 expression correlates with invasiveness in ovarian cancer, and ectopic expression of BARX2 inhibits the invasiveness of an ovarian cancer cell line (11).
The role of BARX2 in development and cancer progression prompted us to search for genes that are regulated by BARX2, and to investigate the requirements for binding of BARX2 to genomic regulatory elements. While microarray analysis is useful for the identification of genes regulated by a specific transcription factor, direct target genes cannot be readily distinguished from genes affected by downstream pathways. Additionally, microarray analysis does not identify the DNA elements that are required for regulation of gene expression. Since 4-6-bp motifs such as those recognized by homeodomain factors occur frequently in the genome, bioinformatic approaches may identify an unrealistically large number of potential binding sites without distinguishing those that are functionally relevant. In an effort to reduce this complexity, conventional methods of promoter analysis usually focus on the region proximal to the start of gene transcription, despite the fact that regulatory elements can be scattered throughout the gene locus. Moreover, in vitro methods for identification of transcription factor binding sites, such as electrophoretic mobility shift assays (EMSAs) that utilize short oligonucleotides to examine protein-DNA interactions, do not take into consideration chromatin structure or the accessibility of transcription factors to binding sites in vivo.
In contrast to these methods, chromatin immunoprecipitation (ChIP) identifies native transcription factor binding sites in a given cellular context and can potentially reveal target genes that are influenced by binding. In a recent example, Weinmann et al. (12) isolated gene promoter fragments bound by E2F by combining ChIP with hybridization to a microarray containing CpG island-enriched genomic DNA. Further analyses showed direct binding of E2F to 14 of these fragments, and identified one promoter fragment that was regulated by E2F in cells. A more recent study identified more than two hundred genomic binding sites for the Drosophila homeodomain factor Engrailed (13). The expression of twelve genes associated with these sites was activated in Drosophila embryos by a fusion protein that linked the Engrailed DNA binding domain to a strong activation domain, providing further evidence that Engrailed binds to these genes in vivo. Moreover, in both of these studies, analyses of the isolated genomic sequences identified novel binding motifs for the factors examined.
Using ChIP, the present study identified 60 potential genomic binding sites in MCF7 cells for BARX2. These sites are associated with genes involved in cytoskeletal organization and cytokinesis, cell adhesion, growth factor signaling, and transcription. Binding sites for BARX2 were preferentially located within gene introns, and many of the BARX2 binding fragments were conserved both in sequence and chromosomal location in the human and mouse genomes. In vitro and cellular assays showed that BARX2 regulates gene expression by binding to DNA sequences containing paired TAAT motifs that are separated by a tract of poly(T) residues. Moreover, the expression of eight genes identified by ChIP was reduced after inhibition of BARX2 expression, demonstrating that ChIP can identify bona fide regulatory target genes.

EXPERIMENTAL PROCEDURES
Cell Lines-MCF7 and MDA-MB-231 cells were obtained from the American Type Culture Collection. MCF7 cells were cultured in Dulbecco's modified Eagle's medium supplemented with 10% fetal bovine serum (Invitrogen), 2 mM L-glutamine, and antibiotics (100 units/ml penicillin and 100 μg/ml streptomycin). MDA-MB-231 cells were cultured in RPMI medium 1640 (Invitrogen) containing 5% fetal bovine serum, 2 mM L-glutamine, and antibiotics. All cells were maintained at 37°C in 5% CO2.
Chromatin Immunoprecipitation-The ChIP protocol used in this study was adapted from Weinmann et al. (14) and from the protocol recommended by Upstate Biotechnologies. MCF7 cells were grown on three 10-cm plates to 85% confluence. Formaldehyde was added to a final concentration of 1%, and the plates were incubated for 10 min at 37°C. The cross-linking reaction was stopped by the addition of 100 mM glycine containing protease inhibitors (Complete; Roche Applied Science). Cells were washed in dilution buffer (0.01% SDS, 1% Triton X-100, 1.2 mM EDTA, 16.7 mM Tris-HCl, 150 mM NaCl, pH 8.0 plus protease inhibitors), resuspended in lysis buffer (1% SDS, 10 mM EDTA, 50 mM Tris-HCl, pH 8.0 plus protease inhibitors) and sonicated to shear the DNA into 0.3-3-kb fragments. Insoluble material was removed by centrifugation, and the extract was precleared by incubation with blocked protein A-Sepharose (14) to reduce nonspecific interactions. The precleared chromatin was split into two samples, one to which 3 μg of anti-BARX2 antiserum (Santa Cruz Biotechnology) was added, and one to which no antibody was added (negative control). Both samples were treated identically in every other respect. Samples were incubated overnight at 4°C and blocked protein A-Sepharose (14) was then added. The immunoprecipitated complexes were washed twice in dilution buffer, once in high salt dilution buffer (0.01% SDS, 1% Triton X-100, 1.2 mM EDTA, 16.7 mM Tris-HCl, 500 mM NaCl, pH 8.0), once in LiCl buffer (0.01% SDS, 1% Triton X-100, 1.2 mM EDTA, 16.7 mM Tris-HCl, 250 mM LiCl, pH 8.0) and once in TE buffer (10 mM Tris-HCl, 1 mM EDTA, pH 8.0). Following treatment of the samples with RNase A (Roche Applied Science) and proteinase K (Roche Applied Science), cross-links were reversed by incubation at 65°C overnight. The DNA was purified using the Qiagen MinElute kit. In analytical ChIP experiments, purified DNA was used directly for real-time PCR amplification of selected genomic fragments.
To clone the purified DNA fragments, T4 DNA polymerase (Roche Applied Science) was used to generate blunt ends, the DNA was ligated to the pCR-BluntII-TOPO vector (Invitrogen) and transformed into TOP10 cells (Invitrogen) following the manufacturer's recommendations. Ampicillin-resistant colonies were selected, and DNA was extracted from overnight cultures using the Qiagen spin miniprep kit. The 5′- and 3′-ends of the fragments were sequenced on an ABI model 373 automated DNA sequencer.
Real-time PCR-To examine the enrichment of specific BARX2 binding fragments in independent ChIP assays, primers were generated corresponding to regions within each ChIP-derived genomic fragment examined. Primers were synthesized on an ABI DNA/RNA synthesizer, or obtained from Integrated DNA Technologies (IDT) or GenSet oligos. Sequences of primers are shown in Appendix I (see Supplementary Data). For each PCR assay, similar concentrations of genomic DNA isolated from MCF7 cells by ChIP in the presence or absence of anti-BARX2 antibodies were used. In addition, unfractionated chromatin was used for each set of primers as a positive control to verify the efficiency and specificity of amplification. These experiments were performed using a Roche Lightcycler and the SYBR-Green HotStart Master Kit following manufacturer's instructions. The relative enrichment of each ChIP-derived fragment was determined based on the crossing point (CP) values for each amplification reaction. The CP value is defined as the cycle number where all samples have equivalent fluorescence. The concentration of DNA in the ChIP-derived and negative control samples at the CP was determined using a standard curve. The enrichment of each ChIP-derived fragment was determined from the ratio of these two concentrations. For every fragment analyzed, enrichment was measured in two or three independent immunoprecipitations and mean and S.E. values were determined. A melting curve analysis was performed for each sample after PCR amplification to ensure that a single amplification product was obtained.
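The enrichment calculation described above can be sketched as follows. This is a minimal illustration, assuming a log-linear standard curve of the form CP = slope × log10(concentration) + intercept; the function names and the standard-curve values are hypothetical and are not taken from the study or from the Lightcycler software.

```python
import math
from statistics import mean, stdev

def fit_standard_curve(concs, cps):
    """Least-squares fit of CP = slope * log10(conc) + intercept."""
    xs = [math.log10(c) for c in concs]
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(cps) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, cps))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def concentration(cp, slope, intercept):
    """Invert the standard curve to get a relative DNA concentration."""
    return 10 ** ((cp - intercept) / slope)

def enrichment(cp_chip, cp_control, slope, intercept):
    """Ratio of anti-BARX2 ChIP DNA to no-antibody control DNA."""
    return (concentration(cp_chip, slope, intercept)
            / concentration(cp_control, slope, intercept))

def mean_and_se(values):
    """Mean and standard error across independent immunoprecipitations."""
    se = stdev(values) / math.sqrt(len(values)) if len(values) > 1 else float("nan")
    return mean(values), se

# Hypothetical dilution series (arbitrary units) and crossing points.
slope, intercept = fit_standard_curve([1, 10, 100, 1000], [30.0, 27.0, 24.0, 21.0])
```

With a fitted curve, `enrichment(cp_chip, cp_control, slope, intercept)` gives the fold enrichment for one immunoprecipitation, and `mean_and_se` summarizes the two or three replicates per fragment.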
Computational Analyses-A Unix-based tool kit called GeneHuggers was used to analyze the ChIP dataset (www.scripps.edu/services/gh/). Each GeneHuggers program utilizes a common datatype so that the output of one module can be used as input to another module. The BLAST program (15) was used to align sequences from the ends of the ChIP fragments with a human genome sequence database assembled with GeneHuggers tools, using an expect value of 1e-90. After conversion of BLAST reports into the GeneHuggers data type, the best hit for each sequence was selected manually. The region between the end sequences was extracted and the complete fragment sequences were again aligned to the human genome using BLAST. Occasionally, BLAST missed part of the clone sequence and a sim4 alignment was performed between the clone and the contig sequence to completely assign the clone sequence. The position of each complete fragment relative to the nearest mRNA feature was determined by querying the UniGene field of each corresponding GenBank™ record. For consistency, UniGene identifiers and conventions for human and mouse genes are used throughout this article. Functional assignments for potential target genes were determined manually.
For analyses of cross-species homology, the BARX2-ChIP fragments were compared with the mouse genome assembly using the BLAST algorithm with a word size of 7. The position of each homologous sequence relative to UniGene identifiers was determined as described above. Additional searches of the mouse genome Trace Archive were performed using the Discontiguous MegaBLAST program (16). Matches to unassembled contigs were scored if they were assigned an alignment score of at least 200. The contigs were then compared with the annotated mouse genome assembly using MegaBLAST. Fragments that were located in introns or very proximal to non-hypothetical gene transcripts were examined in more detail for homology to the mouse genome using the BLAST2 program (bl2seq). The gene name was used to query the HomoloGene database to find the corresponding mouse transcript identifier, and sequences encompassing the entire gene transcript and 50,000 bases upstream and downstream were extracted from the mouse genome. The most conserved subsequences between the human and mouse sequences were identified using bl2seq with a reduced gap penalty (-0.5), a reduced mismatch penalty (-1), and a word size of 7. Conserved regions that overlapped with the human ChIP clone were then extracted from both human and mouse sequences.
The sequences of the ChIP fragments were searched using the MEME program (meme.sdsc.edu/meme/website/intro.html) to identify over-represented subsequences (17). In addition, the regular expression search tool RIGHT (18) was used to search the sequences for potential BARX2 binding motifs similar to those identified previously in the mouse Ncam1 and L1cam, and chicken NgCAM genes (3), or to the consensus binding motif identified previously by in vitro selection (19).
EMSAs-Oligonucleotide probes were synthesized corresponding to HBS elements from each BARX2-ChIP fragment, and to mutant versions in which the core TAAT motifs were disrupted. Double-stranded probes were end-labeled with [γ-32P]ATP (3000 Ci/mmol; DuPont) using polynucleotide kinase (New England Biolabs). Probes were purified by elution from an 8% polyacrylamide gel and their specific activity was determined. All probes were of comparable specific activity. EMSAs used 25,000 cpm of each probe and in vitro translated proteins, as described previously (3). Relative binding of the protein to the HBS probes was determined by measuring the intensity of the probe/protein complexes using a PhosphorImager (Molecular Dynamics).
Construction and Mutagenesis of BARX2-ChIP-Fragment Luciferase Vectors-BARX2-ChIP fragments were excised from the pCR-BluntII-TOPO vector using XhoI and SpeI, and inserted into the XhoI and NheI sites of the pGL3basic or pGL3promoter vectors (Promega). In some cases, the fragments were excised with EcoRV and XhoI and cloned into the SmaI and XhoI sites of the pGL3 vectors. Site-directed mutagenesis was performed on each of the two TAAT motifs (mut1 and mut2, respectively) within the HBS elements of the Bx23, Bx68, and Bx98 fragments. Oligonucleotide primers corresponding to both strands were designed such that the mutant bases were located in the middle of the primer with 15 bases of correct sequence on either side. Primers were designed to produce mut1 and mut2 versions of fragments Bx23, Bx68, and Bx98, and also a mut1+2 mutation of Bx98.
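The primer layout described above (mutant bases centered, with 15 bases of correct sequence on either side, one primer per strand) can be sketched as follows. The template and the substituted bases in the usage note are hypothetical; the actual mutant bases used for Bx23, Bx68, and Bx98 are not given in this section.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def mutagenic_primers(template, start, mutant, flank=15):
    """Return (forward, reverse) primers that replace
    template[start:start + len(mutant)] with `mutant`, keeping `flank`
    bases of unaltered sequence on each side of the mutation."""
    end = start + len(mutant)
    if start < flank or end + flank > len(template):
        raise ValueError("not enough flanking sequence around the mutation")
    forward = template[start - flank:start] + mutant + template[end:end + flank]
    return forward, reverse_complement(forward)
```

For example, with a toy template `"A" * 15 + "TAAT" + "C" * 15` and a hypothetical replacement `"GCCG"` at position 15, the function returns a 34-base forward primer and its reverse complement.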
Mutagenesis was performed as described in the QuikChange site-directed mutagenesis kit (Stratagene), with the following modifications. Primers were annealed and purified by elution from an 8% acrylamide gel. The pGL3promoter and pGL3basic vectors containing each fragment were used as templates. PCR reactions were performed using PfuTurbo DNA polymerase (Stratagene) as recommended by the manufacturer. Plasmids were digested with DpnI (New England Biolabs), purified using the Qiagen PCR purification kit, ligated using the Rapid DNA Ligation kit (Roche Applied Science) and repurified. An aliquot of each sample was transformed into TOP10 cells (Invitrogen), and ampicillin-resistant colonies were selected for PCR screening. Colonies were amplified, lysed at 100°C for 15 min, and used directly for PCR with Taq DNA polymerase (New England Biolabs), and products were visualized by agarose gel electrophoresis. Mutations were confirmed by sequencing.
Cellular Transfection of MCF7 and MDA-MB-231 cells-For transfections, purified plasmids were obtained by large scale purification using the Ultraclean plasmid prep kit (Mo Bio). MCF7 or MDA-MB-231 cells were placed in 24-well tissue culture plates at an initial density of 1 × 10^5 cells/well and transfected with 500 ng of the luciferase reporter construct and 300 ng of the LacZ reporter CMVβgal (Clontech) to provide an internal reference for transfection efficiency, using LipofectAMINE 2000 reagent (Invitrogen). Cellular transfection and assay conditions were otherwise as described previously (6), using the Promega Luciferase assay system. All experiments were performed in duplicate, and the data shown were derived from at least three independent experiments.
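As an illustration of how a co-transfected LacZ reporter serves as an internal reference, the sketch below divides each luciferase reading by the matching β-galactosidase reading, averages the duplicate wells, and expresses the result relative to the empty vector. The function names and the readings in the usage note are hypothetical; the exact normalization used in the study follows reference 6 and is not detailed here.

```python
from statistics import mean

def normalized_activity(luciferase, beta_gal):
    """Mean luciferase/beta-gal ratio across replicate wells."""
    if len(luciferase) != len(beta_gal):
        raise ValueError("each luciferase reading needs a beta-gal reading")
    return mean(l / b for l, b in zip(luciferase, beta_gal))

def fold_over_vector(construct_activity, vector_activity):
    """Activity of a fragment-containing construct relative to empty vector."""
    return construct_activity / vector_activity
```

For example, duplicate wells reading 200 and 400 luciferase units with 10 and 20 β-galactosidase units both normalize to 20, so differences in transfection efficiency between wells cancel out.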
RNAi-mediated Inhibition of BARX2 Expression and RT-PCR Analysis of BARX2 Target Genes-Three RNA interference (RNAi) expression constructs that target the human BARX2 gene were prepared using the pSuper vector (kindly provided by Dr. R. Agami (20)). Oligonucleotides were generated containing a 19-nt sequence derived from BARX2, separated by a short spacer from the reverse complement of the same sequence, and followed by a transcription termination signal. The resulting transcript is predicted to form a 19-bp stem-loop structure that acts as a small interfering (si) RNA. The sequences were flanked with BamHI and XhoI sites to facilitate cloning. The sequences, designated BARX2si1, -2, and -3, are as follows: BARX2si1, 5′-GATCCCCGATCCTCTCCAAGGAGACCTTCAAGAGAGGTCTCCTTGGAGAGGATCTTTTTGGAAC-3′; BARX2si2, 5′-GATCCCCGAGTCAGAGACGGAACAGCCCTTCAAGAGAGGGCTGTTCCGTCTCTGACTCTTTTTGGAAC-3′; and BARX2si3, 5′-GATCCCAAGGAGACCTGCGATTACTTTTCAAGAGAAAGTAATCGCAGGTCTCCTTGTTTTTAAC-3′. Two of these sequences were selected using publicly available software from the Dharmacon RNAi Design Center and all were screened for possible matches to other transcripts using BLAST to minimize the possibility of off-target effects.
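The hairpin design (target sequence, spacer, then the reverse complement of the target) can be checked with a short sketch. The BARX2si1 oligonucleotide is written here without line-break hyphens. The spacer TTCAAGAGA is inferred by inspection of the listed sequences and is treated as an assumption, and `hairpin_stem` is a hypothetical helper, not part of any published tool.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def hairpin_stem(oligo, loop="TTCAAGAGA"):
    """Return the sense-strand stem that base-pairs across `loop`.

    Starting at the first occurrence of the loop, extend outward one
    base at a time while the upstream base is complementary to the
    downstream base. Returns "" if the loop is absent."""
    i = oligo.find(loop)
    if i < 0:
        return ""
    left, right = i - 1, i + len(loop)
    while (left >= 0 and right < len(oligo)
           and oligo[left].translate(COMPLEMENT) == oligo[right]):
        left -= 1
        right += 1
    return oligo[left + 1:i]

# BARX2si1 as listed in the text (assumed spacer: TTCAAGAGA).
BARX2SI1 = ("GATCCCCGATCCTCTCCAAGGAGACCTTCAAGAGA"
            "GGTCTCCTTGGAGAGGATCTTTTTGGAAC")
stem = hairpin_stem(BARX2SI1)
print(len(stem), stem)  # → 19 GATCCTCTCCAAGGAGACC
```

For BARX2si1 the paired region is the 19-nt targeting sequence described in the text. Because the check extends the stem for as long as bases pair, flanking bases that happen to be complementary would be counted as part of the stem for other oligonucleotides.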
Two 100-mm plates of 50% confluent MCF7 cells were co-transfected with 2 μg of each of the pSuper expression plasmids, or an equivalent total amount of the empty vector, and 2 μg of the pEYFPN1 expression vector (Clontech), using LipofectAMINE 2000 (Invitrogen). Cells were examined for yellow fluorescent protein (YFP) fluorescence to estimate transfection efficiency, and harvested after 4 days in culture. Transfection efficiency in two separate experiments was 40-50%. RNA was prepared using Trizol (Invitrogen), treated with DNase, and purified using Qiagen RNeasy columns. cDNA was prepared using M-MuLV reverse transcriptase (New England Biolabs). Primers were prepared corresponding to selected BARX2 target gene transcripts and PCR amplification was performed using Qiagen HotStarTaq. Sequences of primers are shown in Appendix I (see Supplementary Data). Limited PCR cycles were used to maintain amplification within the linear range. PCR products were resolved on 2% agarose gels, captured as digital images, and quantified using Scion Image software (Scion Corporation). Replicate PCR products amplified from cDNA prepared from BARX2si or control transfected cells were compared with each other. Results were normalized to the housekeeping gene cyclophilin.

RESULTS
ChIP Identifies 60 Genomic Binding Sites for BARX2 in MCF7 Breast Cancer Cells-As a first step to identifying regulatory elements in target genes that bind to BARX2, we used ChIP to isolate BARX2 binding sites in MCF7 cells. Briefly, proteins in intact cells were cross-linked to DNA using formaldehyde, cells were lysed, and chromatin was sonicated to produce DNA fragments between 0.3 and 3 kb in size. Protein/DNA complexes were immunoprecipitated using polyclonal anti-BARX2 antibodies, and the DNA was purified and cloned into a plasmid vector. The 5′- and 3′-ends of the 120 cloned fragments were sequenced, and these sequences were mapped to the human genome using the BLAST algorithm.
For 82 of the clones, the sequences of their 5′- and 3′-ends mapped within 100 bp to 3 kb of each other, and the intervening sequences were extracted from the human genome. Thirty-eight samples were not examined further because we could not sequence both ends, because one or both end sequences mapped to repetitive elements, or because the end sequences mapped to different chromosomes, suggesting that two fragments had ligated to each other.
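The clone triage described above can be summarized as a small decision function. This is an illustrative sketch only: the `EndHit` record, the category labels, and the "inconsistent span" outcome (ends on the same chromosome but outside the 100-bp to 3-kb window) are hypothetical names introduced here, not part of the GeneHuggers pipeline.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EndHit:
    """Best BLAST hit for one end sequence of a clone (hypothetical record)."""
    chrom: str
    pos: int
    repetitive: bool = False

def classify_clone(five: Optional[EndHit], three: Optional[EndHit],
                   min_span: int = 100, max_span: int = 3000) -> str:
    """Apply the exclusion criteria in the order given in the text."""
    if five is None or three is None:
        return "end not sequenced"
    if five.repetitive or three.repetitive:
        return "repetitive element"
    if five.chrom != three.chrom:
        return "possible chimera"          # two fragments ligated together
    span = abs(three.pos - five.pos)
    if min_span <= span <= max_span:
        return "mapped"                    # intervening sequence can be extracted
    return "inconsistent span"
```

Only clones classified as "mapped" would have their intervening genomic sequence extracted for further analysis.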

BARX2 Is Expressed in MCF7, but Not in
The genes nearest to the 82 extracted BARX2-ChIP sequences were determined using a bioinformatics program called GeneHuggers, which extracted UniGene identifiers from the GenBank™ database. Through the combination of BLAST and GeneHuggers, 60 BARX2-ChIP fragments were mapped in proximity to known or predicted genes (those supported by mRNA sequences, or predicted by expressed sequence tags (ESTs) that overlapped assigned exons, respectively). The remaining BARX2-ChIP fragments were not analyzed further, as no mRNA or EST sequences that would predict a gene were found within a 500-kb distance of the fragment. Table I lists the genomic distribution of the 60 gene-associated BARX2 binding fragments.
Thirty-five percent (21 of 60) of the BARX2-ChIP fragments were located within gene introns, generally within the first or second intron. This is higher than the proportion (24%) of the human genome that is estimated to be contained within introns (21), and suggests a preference of BARX2 to bind intronic DNA. A further 35% of the fragments were located within the region spanning 50-kb upstream or downstream of a gene. Moreover, several were located less than 10-kb upstream of a transcription start site, suggesting that they are associated with the gene promoter. The remaining 30% of intergenic BARX2-ChIP fragments were located greater than 50 kb from a gene. Since this screen did not isolate the same genomic region more than once, it is likely that many BARX2 binding sites within the genome remain to be identified.
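The comparison between the observed intronic fraction (21 of 60) and the 24% genomic background can be made explicit with a one-sided binomial tail. The text reports no such test; this is a hedged sketch that assumes each fragment is an independent draw, which may not hold for ChIP-derived clones.

```python
from math import comb

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

observed, total, background = 21, 60, 0.24   # values quoted in the text
fraction = observed / total
p_value = binomial_upper_tail(observed, total, background)
print(f"intronic fraction = {fraction:.2f}, P(X >= {observed}) = {p_value:.3f}")
```

A small tail probability would be consistent with the suggested preference for intronic DNA, but it does not account for any cloning or mapping bias.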
Specificity of BARX2 Binding Is Confirmed by Independent ChIP Assays-The binding of BARX2 to the isolated ChIP fragments was confirmed using independent ChIP assays. Twenty-one loci were selected from the 60 initially identified by ChIP for these assays. These fragments were located within 50 kb of a gene locus, or within an intron. The relative enrichment of each fragment was examined by quantitative PCR in at least two independent ChIP experiments. Two parallel ChIP assays were performed in each experiment: one in which an anti-BARX2 antibody was used for immunoprecipitation, and another in which no antibody was added (negative control). Enrichment of BARX2-ChIP fragments was determined by comparing the amount of fragment-specific PCR product amplified from the anti-BARX2 ChIP and negative control ChIP samples. For the 21 BARX2-ChIP fragments examined, the enrichment ranged between ~2- and 46-fold (Table II). This specific enrichment suggests that the fragments represent bona fide binding sites for BARX2 in MCF7 cells. The absence of false positive fragments may be due to the selection of only gene proximal loci for these assays.
These analyses revealed many genomic regions that bind to BARX2. However, ChIP assays lack the resolution necessary to identify the discrete BARX2 binding elements within these regions. We therefore utilized a combination of computational, in vitro, and cellular analyses to further delineate the sequence requirements for binding and regulation by BARX2.
Human-Mouse Comparisons Reveal Extensive Sequence Conservation in BARX2 Binding Fragments-The sequences of the BARX2-ChIP fragments were compared with the mouse genome using the discontiguous BLAST algorithm. Of the 60 fragments that mapped to the human genome, 40 showed sequence homology to the mouse genome. The conserved regions spanned between 29 and 1116 bp and exhibited 77-100% sequence identity. The majority of the conserved sequences matched unannotated genomic contigs, and the location of these sequences with respect to any mouse gene could not be determined. However, 10 sequences were mapped to the annotated mouse genome assembly, revealing conserved sequences in syntenic regions of mouse and human chromosomes (Table III). In each case, the sequence mapped to the mouse ortholog of the gene that it mapped to in the human genome. Moreover, the sequences located in the NCAM1, FLNA, RER1, HSPA9B, RBM15, PTPRR, TLE3, and DNCL1 genes all showed positional correspondence, being located in the same region of the mouse and human genes with respect to their intron/exon structure (Table III). In the case of DMXL1, the ortholog has not yet been annotated in the mouse genome. However, by comparing human DMXL1 mRNA to the mouse genome using BLAST, the location of the mouse DMXL1 gene could be predicted and shown to overlap the region containing the conserved BARX2 binding site.
The Bx68 fragment is located 160 kb from the NMU2 gene in the human and mouse genomes. In addition, BLAST analysis of the genomic region flanking the Bx68 fragment identified several matches to the human and mouse EST databases, suggesting that a novel gene overlaps the Bx68 fragment. The studies described below indicate that the Bx68 fragment exhibits significant promoter activity and might control transcription of this novel gene locus.
It has been estimated that the extent of sequence conservation between non-coding regions of the human and mouse genomes is ~20% (22,23). Thus, the observation that over 50% of the BARX2-ChIP fragments examined exhibit significant sequence conservation with the mouse genome suggests that many of these fragments are functionally relevant.
BARX2 Binding Fragments Are Associated with Genes Involved in Multiple Cellular Processes-The gene located nearest to each BARX2-ChIP fragment was considered a potential target for binding and regulation by BARX2. For fragments located within introns, the gene associated with the intron was designated a potential BARX2 target gene. To gain insight into the particular cellular processes that might be influenced by BARX2 in MCF7 cells, these genes were grouped into functional categories (Table II). The most highly represented categories were genes encoding transcription factors, receptors or ligands, and proteins involved in cytoskeletal organization and remodeling. Genes in the latter category include the actin-binding proteins anillin and filamin-A, and the microtubule motor protein dynein. The association of BARX2 with genes involved in cytoskeletal organization has also been observed in recent studies where α-actin was shown to be a target of BARX2 during muscle differentiation (10).
In earlier studies, BARX2 was shown to bind to regulatory elements from the mouse L1cam and Ncam1 genes (6). In the present study, two CAMs, human NCAM1 and CNTN4, were identified as potential targets in MCF7 cells. The BARX2-ChIP fragment that was isolated from human NCAM1 (Bx23) mapped within the first intron, immediately downstream of the first exon. Our earlier studies indicated that BARX2 bound to a homeodomain binding site (HBS) in the murine Ncam1 promoter immediately upstream of the first exon, and that different BARX2 domains could repress or activate the promoter (3). Comparison of the mouse and human genomes revealed that the sequences of both of these binding sites (promoter and intronic) are conserved; however, the significance of these two sites for the regulation of the mouse and human NCAM1 genes remains to be determined.
Several BARX2-ChIP fragments were located in genomic regions where two potential target genes could be identified, and in some cases these genes overlapped. For example, the Bx56 fragment is located within intron five of COG5, and this intron also contains GPR22, one of two G protein-coupled receptor genes identified as potential BARX2 targets in this study. Similarly, the Bx36 fragment localized to the first intron of synapsin II (SYNII); however, the tissue inhibitor of metalloproteinase 4 (TIMP4) gene lies within intron five of SYNII, placing the fragment 40-kb downstream of TIMP4. Interestingly, these two genes show a reciprocal pattern of expression in breast cancer cell lines that differentially express BARX2 (see Table II). Two other BARX2 binding fragments were located within short intergenic regions: the Bx15 fragment is located less than 10 kb from both contactin 4 (CNTN4) and interleukin 5 receptor α (IL5RA), and the Bx28 fragment is located between anillin (ANLN) and acyloxyacyl hydrolase (AOAH). In the latter case, the results of studies described below indicate that ANLN is a functional target of BARX2 (see Fig. 4).
BARX2 Binds Preferentially to Paired TAAT Motifs within BARX2-ChIP Fragments-Like most homeodomain proteins, BARX2 binds to motifs that contain the core sequence TAAT (3,6,19); however, further requirements for BARX2 binding have not been well defined. Earlier studies showed that BARX2 bound with high affinity to the HBS element from the murine Ncam1 promoter (3). This element contained two closely spaced TAAT motifs (paired motifs), one of which was an overlapping TAAT motif. We used the regular expression search program RIGHT (18) to search the BARX2-ChIP sequences for similarity to the murine Ncam1 HBS element. Most of the fragments contained at least one paired TAAT motif (separated by less than 10 bp), and in several cases, one of the motifs was an overlapping TAAT sequence. In addition, we used the MEME program to perform an unbiased search for other TAAT-containing elements that were common to three or more BARX2-ChIP fragments. In 25% of the sequences (15 of 60), MEME identified a common sequence element that included a single TAAT in close proximity to a TTTGTATTT motif. BLAST analysis determined that this element represents a region of the Alu repeat.
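A search for paired TAAT motifs separated by less than 10 bp can be sketched with a regular expression. This uses Python's `re` module as a stand-in and is not the RIGHT program or its pattern syntax. Accepting ATTA (a TAAT core on the opposite strand) as a partner is an assumption made here, and the example sequence is hypothetical.

```python
import re

def paired_taat_sites(seq, max_gap=9):
    """Return (start, match) for every pair of TAAT cores, on either
    strand, separated by 0..max_gap intervening bases.

    A zero-width lookahead is used so that pairs starting at adjacent
    positions are all reported; the gap is matched lazily so each start
    position yields its closest partner."""
    core = "(?:TAAT|ATTA)"
    pattern = re.compile(rf"(?=({core}[ACGT]{{0,{max_gap}}}?{core}))")
    return [(m.start(1), m.group(1)) for m in pattern.finditer(seq.upper())]

# Hypothetical fragment: two TAAT cores separated by a short poly(T) tract.
print(paired_taat_sites("GGTAATTTTTTAATCC"))  # → [(2, 'TAATTTTTTAAT')]
```

Cores separated by 10 or more bases are not reported, which matches the "less than 10 bp" criterion given in the text.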
To test whether BARX2 binds to these elements, radiolabeled probes were prepared corresponding to eight HBS elements that contained paired TAAT motifs, and five BARX2-ChIP Alu repeat sequences (Fig. 2A). The HBS probes represent variations in HBS element composition, including the presence or absence of an overlapping TAAT motif, and differences in the number and composition of the bases intervening between the paired TAAT motifs. In addition, a mutant version of each probe was produced in which the TAAT motif(s) were disrupted. Both wild-type and mutant probes were tested for binding to in vitro translated BARX2 protein in EMSA (Fig. 2). EMSAs were also performed using a truncated BARX2 fragment, HDBBR, containing the homeodomain (HD) and the Barx basic region (BBR). HDBBR binds in a manner similar to full-length BARX2; however, the intensity of binding is greater due to the absence of the N-terminal domain, which is inhibitory to DNA binding (3). All EMSAs used similar molar amounts of each probe, to allow the relative binding to each probe to be compared. The results of these assays are summarized in Fig. 2A, and five exemplar binding assays using wild-type and mutated HBS elements are shown in Fig. 2, B and C, respectively.
Full-length BARX2 and HDBBR bound to each of the eight HBS probes, and binding was disrupted by mutation of the TAAT motifs, indicating that binding was specific to this motif (Fig. 2). The strength of binding was variable, with Bx98(HBS1), Bx68(HBS1), Bx81(HBS1), and Bx103(HBS1) probes binding most strongly to both full-length BARX2 and HDBBR. Three of these probes contained at least one overlapping TAAT motif, indicating that BARX2 binds preferentially to paired TAAT motifs, and that BARX2 might form particularly strong interactions with overlapping TAAT motifs. In addition, the HDBBR fragment formed larger, albeit weak, complexes with several of the HBS probes (Fig. 2B), indicating that in some cases more than one molecule of HDBBR can bind to the probe simultaneously.
In contrast, full-length BARX2 bound to only one Alu probe, Bx23(Alu), although the HDBBR fragment of BARX2 bound weakly to all five Alu probes. The Bx23(Alu) probe differed from the other Alu probes in that it contained a paired TAAT motif in addition to the Alu consensus sequence (Fig. 2A). This paired motif might be the preferred site for BARX2 binding. Thus the Alu repeat element, while present in many BARX2-ChIP fragments, is unlikely to be the primary binding site for BARX2. However, this does not preclude the possibility that BARX2 binds to Alu repeats in vivo, where additional DNA sequences or cofactors might strengthen the interaction.
BARX2 Binding Fragments Exhibit Regulatory Activity in Vivo-BARX2 has been shown to both activate and repress gene transcription, presumably through its activator and repressor domains (3). To determine whether the immunoprecipitated BARX2 binding fragments contain sequences that confer promoter, enhancer, or repressor activities, we cloned 19 fragments into a promoterless luciferase reporter vector (pGL3basic), and a vector containing a weak promoter driving the luciferase gene (pGL3promoter). Individual constructs were transfected into MCF7 cells and expression of the luciferase reporter gene was assayed. Based on their average activities in at least three independent experiments, the fragments were placed into one of four functional categories (Table IV). The fragments were defined as having promoter activity if they were able to increase the activity of the pGL3basic construct by at least 1.5-fold. This suggested that the fragment had the ability to recruit the transcriptional machinery. Fragments were defined as having activator or repressor activities if they increased or decreased the activity of the pGL3promoter vector by at least 1.5-fold, respectively. Fragments in Set A had promoter activity when tested in pGL3basic, with Bx93 and Bx68 exhibiting very strong activity (14- and 11-fold, respectively). Four fragments from Set A also repressed the pGL3promoter construct, possibly as a result of interference between two promoter elements. The five fragments composing Set B repressed the pGL3promoter construct, with both Bx34 and Bx94 exhibiting 12.5- to 100-fold repression. These fragments had little effect on the promoterless construct. The fragments in Sets C and D had no effect on the basal activity of either vector.
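The 1.5-fold criteria described above can be written as a small classification rule. The following is a minimal sketch under stated assumptions: the function name `classify_fragment` and its arguments are hypothetical, fold values are taken to be luciferase activity relative to the corresponding empty vector (set to 1), and a decrease of at least 1.5-fold is interpreted as a ratio of 1/1.5 or lower. It does not reproduce the assignment of fragments to Sets A through D in Table IV.

```python
def classify_fragment(basic_fold, promoter_fold, threshold=1.5):
    """Return the activities implied by the 1.5-fold criteria.

    basic_fold:    activity in pGL3basic relative to the empty vector
    promoter_fold: activity in pGL3promoter relative to the empty vector
    """
    activities = set()
    if basic_fold >= threshold:
        activities.add("promoter")
    if promoter_fold >= threshold:
        activities.add("activator")
    elif promoter_fold <= 1 / threshold:
        # A decrease of at least 1.5-fold corresponds to a ratio <= 1/1.5.
        activities.add("repressor")
    return activities


# Hypothetical values for illustration only.
print(classify_fragment(14.0, 1.0))   # strong promoter activity, no effect on pGL3promoter
print(classify_fragment(1.0, 0.05))   # 20-fold repression of pGL3promoter
```

A fragment can carry more than one label under this rule, which matches the observation that some Set A fragments also repressed the pGL3promoter construct.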
To examine whether the activity of these fragments is altered by changes in BARX2 expression, transfection experiments were performed as described above in the MDA-MB-231 breast cancer cell line, which we determined did not express detectable amounts of BARX2 by RT-PCR (Fig. 1). The transcriptional activities of 13 of 19 fragments were different between MCF7 and MDA-MB-231 cells (Table IV). In contrast to the varied activities of these fragments in MCF7 cells, fragments in Sets A, B, and C all exhibited modest promoter activity in MDA-MB-231 cells, while the fragment in Set D had no effect on the basal activity of either vector. This suggests that in the absence of BARX2 the fragments have a basal activity that is controlled by other elements and factors. Overall, these observations support the notion that BARX2 is required for the activities of these fragments in MCF7 cells.
The HBS Element of Several BARX2 Binding Fragments Is Necessary for Activity-To determine whether HBS elements that bound to BARX2 in vitro are involved in the regulatory activities of these fragments, we mutated the HBS elements within three BARX2 luciferase reporter constructs. The Bx68 and Bx98 fragments were selected, as they bound strongly to BARX2 in vitro, and displayed moderate to strong promoter activity in vivo. In addition, we mutated the HBS element within the Bx23 fragment to determine whether this second site for BARX2 binding in NCAM1 is functional. The mutations generated in the TAAT motifs corresponded to those that had disrupted BARX2 binding in vitro (see Fig. 2). For each fragment, the two TAAT motifs were mutated independently of each other to create mut1 and mut2 versions of the fragments. Both TAAT motifs in the Bx98 fragment were also mutated simultaneously (mut1 + 2). Since each of the fragments exhibited both promoter and activator or repressor activities, the mutant constructs were tested in both the pGL3basic and pGL3promoter vector backgrounds. The constructs were transfected into MCF7 cells, and expression of the luciferase gene was assayed in at least three independent experiments. In the pGL3basic background, the promoter activity of both the Bx68 and Bx23 fragments required an intact HBS element (Fig. 3A). Mutation of the first TAAT motif in Bx68 abolished the promoter activity of the fragment, and mutation of either TAAT motif in Bx23 significantly reduced Bx23 activity. In addition, the activator function of Bx68 and the repressor function of Bx98 in pGL3promoter also exhibited HBS dependence (Fig. 3B).
The TAAT motifs within Bx68 and Bx98 HBS elements can also act independently of one another. The activity of the Bx68 fragment in the pGL3basic background requires the first TAAT motif (mut1), but not the second TAAT motif (mut2; Fig. 3A). However, in the pGL3promoter background, both TAAT motifs are required for activity (Fig. 3B). Similarly, the TAAT motifs within the Bx98 HBS element exhibit independence of one another in the pGL3promoter vector (Fig. 3B). Mutation of the first TAAT motif (mut1) abolishes activity, suggesting that it provides enhancer activity, while mutation of the second TAAT motif (mut2) increases activity, indicating that it exerts a repressor function. Mutation of both motifs simultaneously abolishes activity. These results indicate that both TAAT motifs are necessary for Bx98 activity and that each acts independently to control the activity of the fragment.
To examine whether the activity of the fragments requires BARX2, we transfected the wild-type and mutant forms of Bx68, Bx23, and Bx98 plasmids into MDA-MB-231 cells, which do not express BARX2, and assayed for luciferase gene expression (Fig. 3, C and D). Mutation of either TAAT motif in pGL3basic Bx23, pGL3promoter Bx23 and pGL3promoter Bx68 fragments did not significantly affect their activities (Fig. 3, C and D), indicating that these HBS elements are not required in MDA-MB-231 cells. Since the same HBS elements were required for activity in MCF7 cells, these results indicate that binding of BARX2 to these elements controls the activity of both the Bx23 and Bx68 fragments. In contrast, the activity of the Bx98 fragment in MDA-MB-231 cells is affected by mutation of the TAAT motifs, raising the possibility that in the absence of BARX2, other homeodomain transcription factors can act through these HBS elements to control gene expression.

FIG. 3. The HBS element is required for activity of selected BARX2 binding fragments. Wild-type and mutant versions of Bx23, Bx68 and Bx98 fragments were cloned into pGL3basic and pGL3promoter vectors, transfected, and assayed for luciferase gene expression. Expression of the empty vector, C, was used as a reference standard and is set to 1 (stippled bar). All samples are represented by at least 3 independent luciferase assays. WT is the wild-type fragment. mut1 is a site-directed mutation of TAAT motif1. mut2 is a site-directed mutation of TAAT motif2. mut1 + 2 is a site-directed mutation of both TAAT motifs.
Since the binding of BARX2 to Bx23 and Bx68 is required for their activity, the sequences of their HBS elements were compared. The two elements were similar and consisted of paired TAAT motifs, one of which is an overlapping motif. In addition, the paired motifs were separated by a stretch of poly(T) residues (see Fig. 2). These three characteristics may be important determinants of functional BARX2 binding sites in vivo.
Inhibition of BARX2 using RNAi Reduces the Expression of Eight BARX2 Target Genes-To determine whether the genes associated with any of the fragments identified using ChIP are functional targets of BARX2, we initially asked whether any of these genes were differentially expressed in MCF7 and MDA-MB-231 cells. The relative expression levels of 16 of the 21 genes that were verified by re-ChIP analysis were examined in MCF7 and MDA-MB-231 cells by quantitative RT-PCR. As shown in Table II, seven of these putative target genes were differentially expressed between the two cell lines: ESR1, a transcription factor involved in the response of cells to estradiol, netrin G1 (NTNG1), an axon guidance molecule (24), fibroblast growth factor 12 (FGF12), a regulator of growth and differentiation, and tissue inhibitor of metalloproteinase 4 (TIMP4), were expressed in MCF7 cells but not in MDA-MB-231 cells, while synapsin II (SYNII), a gene involved in synaptogenesis and the modulation of neurotransmitter release (25), was expressed in MDA-MB-231 but not in MCF7 cells. The expression of anillin, an actin-binding protein involved in cytokinesis (26), was greater in MDA-MB-231 cells, and G protein-coupled receptor 114 (GPR114) expression was greater in MCF7 cells.
To determine more directly whether BARX2 is involved in regulating expression of putative BARX2 target genes, we examined the effect of inhibiting BARX2 expression. RNA inhibition (RNAi) was used to reduce BARX2 expression in MCF7 cells. Briefly, MCF7 cells were co-transfected with a mixture of three RNAi expression constructs or a control vector. At least three quantitative RT-PCR assays were performed using cDNA synthesized from two independent RNAi experiments. The housekeeping gene cyclophilin was used as a reference standard. In cells transfected with the BARX2 RNAi expression constructs, BARX2 mRNA levels were reduced by ~50% (Fig. 4). The expression of 11 putative BARX2 target genes was examined after BARX2 RNAi using RT-PCR. The genes chosen, with the exception of NCAM1, were robustly expressed in MCF7 cells. As shown in Fig. 4, the expression of eight of these genes was reduced by 20-70% relative to cyclophilin expression. The genes affected by BARX2 RNAi included ESR1, FGF12, ANLN, the actin-binding protein filamin-A (FLNA), the RNA-binding protein RBM15, a novel zinc finger protein (ZnF), and DMXL1, a WD repeat protein of unknown function. It is interesting to note that ESR1, FGF12, and ANLN are also differentially expressed between MCF7 and MDA-MB-231 cells.
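Expression relative to the cyclophilin reference can be computed in several ways, and the text does not state which quantification method was used. As an illustration only, the sketch below assumes the common comparative threshold cycle (2^-ΔΔCt) calculation with equal amplification efficiencies; the function name `relative_expression` and all Ct values are hypothetical and are not data from this study.

```python
def relative_expression(ct_target_rnai, ct_ref_rnai, ct_target_ctrl, ct_ref_ctrl):
    """Target expression in RNAi cells relative to control cells.

    Assumes the comparative Ct calculation (2 ** -ddCt), with the reference
    gene (here cyclophilin) used to normalize each sample.
    """
    d_rnai = ct_target_rnai - ct_ref_rnai  # normalize the RNAi sample
    d_ctrl = ct_target_ctrl - ct_ref_ctrl  # normalize the control sample
    return 2.0 ** -(d_rnai - d_ctrl)


# Hypothetical Ct values: the target crosses threshold one cycle later after
# RNAi while the reference is unchanged.
ratio = relative_expression(25.0, 18.0, 24.0, 18.0)
print(f"relative expression: {ratio:.2f}, reduction: {(1 - ratio) * 100:.0f}%")
```

Under these assumptions a one-cycle delay corresponds to a twofold (50%) reduction, which is the scale of the BARX2 knockdown reported above.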
The expression of FGF12, the novel ZnF gene, and ESR1 was dramatically reduced in response to BARX2 RNAi, by ~70, 55, and 50%, respectively. This indicates that a threshold level of BARX2 is required to maintain expression of these genes. The relatively modest reduction in expression of other target genes may be due to the fact that BARX2 expression was only partially inhibited by RNAi. Three genes (ELK4, HSPA9B, and NCAM1) were not significantly affected by BARX2 RNAi treatment, suggesting either that BARX2 is not involved in their regulation, or that other factors can substitute for BARX2. Overall, of the genes identified by BARX2-ChIP that were examined in this study, a large proportion are regulatory targets of BARX2 in MCF7 cells. These results indicate that ChIP is a useful method for identifying direct regulatory targets of transcription factors.

DISCUSSION

It has become increasingly apparent that chromatin is a dynamic structure that plays a central role in the regulation of development. Chromatin controls the accessibility of DNA binding sites to transcription factors, influences the interactions between factors, and controls the subnuclear localization of the various proteins required for transcriptional regulation (27-30). Using ChIP as a selection method, we have isolated BARX2 binding sites from accessible regions of native chromatin in MCF7 breast cancer cells. This allowed the identification of a number of BARX2 target genes as well as determination of the binding sites through which BARX2 influences gene expression.
These studies revealed a strong preference for binding of BARX2 to elements that contain paired TAAT motifs. Moreover, we identified a functional BARX2 binding element that contains an overlapping TAAT motif separated from a single TAAT motif by a T-rich stretch of nucleotides. Over half of the BARX2 binding fragments identified in this study were conserved between the human and mouse genomes, and we were able to map 10 of these sequences to orthologous genes. The remainder may also be located in gene orthologs, but could not be mapped, as the assembly and annotation of the mouse genome is still in progress. These studies also led to the identification of several direct regulatory targets of BARX2. These genes, ESR1, FGF12, ANLN, FLNA, RBM15, DNCL1, novel ZnF, and DMXL1, showed reduced expression after inhibition of BARX2 expression in MCF7 cells. The ESR1, FGF12, and ANLN genes also showed differential expression in MCF7 cells and in MDA-MB-231 cells that do not express BARX2, further supporting a role for BARX2 in their regulation. Interestingly, the expression of these genes was not significantly altered by overexpression of BARX2 in MCF7 cells (not shown). It is possible that the exogenously expressed BARX2 protein does not gain access to these genomic sites in vivo. Alternatively, this result may reflect limiting amounts of particular cofactors that are required for BARX2 to control gene expression. Several other putative BARX2 target genes, NTNG1, TIMP4, SYNII, and GPR114, are also differentially expressed in MCF7 and MDA-MB-231 cells, suggesting that BARX2 may play a role in their regulation.
Many functionally relevant binding sites for BARX2 were identified in regions outside of gene promoters, particularly in introns. These sites would not have been identified using conventional promoter analyses, which are generally restricted to the region proximal to the transcription start site. BARX2 binding elements located distal to genes or within introns might function through long range interactions that involve looping of chromatin to bring distal elements within proximity of gene promoters (31), or by inducing permissive or repressive structures that propagate along the chromosome and affect the accessibility of gene promoters to the transcriptional machinery (32). Moreover, the high proportion of intronic BARX2 binding sites identified in this study raises the possibility that BARX2 affects transcript elongation through the intron by influencing local chromatin structure. Additional studies utilizing in vivo methods may reveal that many functionally relevant transcription factor binding sites are scattered throughout the genome.
The observation that more than 50% of the BARX2 binding fragments isolated by ChIP are conserved between mouse and human is striking, since the extent of sequence conservation between non-coding regions of the human and mouse genomes is estimated to be only 20% (22,23). This further supports the idea that BARX2-ChIP enriched for sequences that are likely to be functionally relevant, as conserved non-coding sequences are generally considered to be reliable guides to regulatory regions (33,34).
Since most transcription factor recognition motifs are short degenerate sequences, the probability of their occurrence in the genome is high, although only a small proportion of these motifs are likely to be bound in vivo, with even fewer functioning directly in gene regulation. We used a combination of in vitro and in vivo methods to effectively demonstrate the functional relevance of several BARX2 binding sites in MCF7 cells. This approach was designed to circumvent the inherent limitations of each method when used in isolation. As an example, characterization of the Bx23 and Bx68 fragments using ChIP, computational analysis, EMSA, and promoter assays demonstrated that they contain bona fide BARX2 binding elements that are required for regulation by BARX2 in MCF7 cells. This functional HBS element is similar to the consensus BARX2 binding sequence identified through in vitro selection studies (19) in that it contained a TAAT motif and a poly-T rich region. However, the HBS element derived from Bx23 and Bx68 is longer and contains multiple TAAT motifs, some of which overlap. This clustering of TAAT motifs might increase the probability of BARX2 binding to the HBS or allow several molecules to bind together, as was observed for the HDBBR fragment in EMSA. Mechanisms that increase the local concentration of transcription factors on DNA have been shown to facilitate transcriptional regulation (35).
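The point that short motifs occur frequently by chance can be made concrete with a back-of-the-envelope estimate. The sketch below assumes a random sequence with equal base frequencies and independent positions, which real genomes do not satisfy (AT-rich regions would contain more TAAT cores than this estimate). The function name `expected_motif_count` and the sequence lengths are illustrative assumptions.

```python
def expected_motif_count(sequence_length, motif_length, both_strands=True):
    """Expected number of exact matches to one fixed motif in random DNA.

    Assumes each base is A, C, G or T with probability 1/4, independently.
    """
    positions = sequence_length - motif_length + 1  # possible start sites
    per_strand = positions / 4 ** motif_length
    return 2 * per_strand if both_strands else per_strand


# A 4-bp core such as TAAT is expected about once every 256 bp per strand.
print(expected_motif_count(259, 4, both_strands=False))  # 256 start sites / 256
print(expected_motif_count(1_000_000, 4))                # per megabase, both strands
```

Under these assumptions a single TAAT core is expected thousands of times per megabase, which is why additional features such as motif pairing and spacing are needed to distinguish sites that are likely to be bound.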
BARX2 may also control transcription in cooperation with other factors that bind to particular HBS elements. Mutagenesis studies indicated that individual TAAT motifs within an element could function independently of one another, suggesting that different factors may be bound at each motif. The proteins that bind with BARX2 to HBS elements remain to be elucidated and are likely to depend on the sequence composition and the cellular context. The cooperation between these factors might involve dimerization as has been observed for other homeodomain proteins (36,37).
Two genomic fragments characterized in this study acted as strong promoters, suggesting that they might control the transcription of neighboring genes. Several other fragments acted as modest promoters when tested in a promoterless reporter vector. However, most of these fragments were located in gene introns rather than proximal to a known transcription start site. While they are unlikely to act as promoters in vivo, their activity indicates that they are able to recruit components of the transcription machinery. These sequences may therefore enhance gene expression in vivo through long range interactions with the gene promoter.
Individual regulatory elements generally function in an integrated manner with neighboring, or even distal, genomic sequences to give the native pattern of gene regulation. Two of the BARX2 binding fragments that exhibited promoter activity in heterologous assays (Bx98 and Bx82) are associated with the DMXL1 and novel ZnF genes, which require BARX2 for their expression. Interaction between these fragments and promoters in trans might be sufficient to control the transcription of these genes in response to BARX2. In contrast, the activities of BARX2 binding fragments from other genes, including ANLN, ESR1, RBM15, and DNCL1, did not correlate with the requirement for BARX2 to drive their expression. This suggests that the regulation of these genes by BARX2 may require additional factors that bind to elements located elsewhere in the gene.
BARX2 may also bind to multiple sites within a single gene target, and these sites might be used differentially in various developmental contexts. For example, our studies indicate that BARX2 binds two conserved functional HBS elements in NCAM1, one in the promoter region and one in the first intron. Previous studies showed that the promoter HBS element is required for expression in the spinal cord in embryonic mice; however, the promoter region is not sufficient to drive expression in all of the tissues that normally express Ncam1 (7). Thus, the overall combination of both promoter and intronic regulatory regions might be necessary to drive the wild-type NCAM1 expression pattern during development.
The direct regulatory targets of BARX2 identified in this study include molecules that control the growth of MCF7 breast cancer cells. The estrogen receptor α controls proliferation via direct effects on cell cycle progression (38-40), and anillin plays an essential role in cytokinesis (26,41). FGF12 is poorly characterized; however, most members of this family can influence cell growth as well as differentiation. Two types of breast cancers are defined based on the presence or absence of ESR1 gene expression: one type is estrogen-dependent for growth and can be treated with anti-estrogens, while the other type is estrogen-independent and resistant to anti-estrogen treatment. In this study we have shown that BARX2 and ESR1 are both expressed in MCF7 cells, while neither gene is expressed in MDA-MB-231 cells. Moreover, we have observed that T47D breast cancer cells also express both BARX2 and ESR1 (not shown). This raises the possibility that BARX2 and ESR1 gene expression are correlated; further studies will be required to determine whether co-expression of BARX2 and ESR1 is a common feature of breast cancers in vivo. In addition, the two cell lines that express BARX2 and ESR1, MCF7 and T47D, are non-invasive, while MDA-MB-231 cells are highly invasive. Thus, there appears to be a negative correlation between BARX2 and ESR1 expression and invasion. This is consistent with the role of BARX2 in inhibiting invasion of ovarian cancer cells through regulation of cadherin-6 expression (11). Future studies will focus on determining whether specific BARX2 target genes identified in this study act downstream of BARX2 to influence the growth and invasion of breast cancer cells.
In addition to its role in cancer progression, we have recently shown that BARX2 regulates chondrogenesis and muscle differentiation (8,10). Two putative BARX2 target genes identified in this study, NCAM1 and SOX5, are known to promote the differentiation of mesenchymal cells into muscle and/or cartilage (42-44). These genes are good candidates for mediating the effects of BARX2 on these developmental processes. The cellular changes that occur during the differentiation of myoblasts into myotubes require dramatic alterations of the actin cytoskeleton. BARX2 induces smooth muscle α-actin expression during skeletal myotube formation and promotes stress fiber formation in fibroblasts (10), suggesting that it promotes muscle development by modulating the actin cytoskeleton. Several other BARX2 target genes identified in this study are known to influence the cytoskeleton directly. Anillin and filamin-A are actin bundling proteins that control cytokinesis and the formation of stress fibers, respectively (26,41,45), and dynein controls the transport of molecules and organelles along microtubules (8). The cellular processes that involve the NCAM1, SOX5, anillin, filamin-A, and dynein genes, such as cell adhesion, cytoskeletal remodeling and cell cycle regulation, are essential components of the myoblast differentiation program. Regulation of these genes by BARX2 would indicate that rather than influencing only one pathway, BARX2 promotes differentiation by coordinating these processes.
Using ChIP to identify BARX2 binding sites and target genes has provided insight into the nature of the sequences and interactions that are involved in regulation of gene expression by BARX2 in the context of native chromatin. Moreover, the results of this study greatly increase our understanding of the regulatory network that is controlled by BARX2 during development. Further studies may help to elucidate the mechanisms by which this network is coordinately controlled.