Pearson Correlation Analysis of Microarray Data Allows for the Identification of Genetic Targets for Early B-cell Factor

B lymphocyte development is a complex biological process critically dependent on the transcription factor early B cell factor (EBF). To deepen understanding of the roles of EBF in this process, we have used Pearson correlation analysis to evaluate microarray data from a set of mouse B lymphoid cell lines representing different stages of development. Comparing the expression pattern of EBF to that of the other genes in the data set revealed that VpreB1, mb-1, and λ5, all known target genes, presented high correlation values to EBF. High correlations were also seen for the VpreB3 and CD19 genes, and biochemical as well as functional data supported that they are target genes for EBF, even though the expression of CD19 was critically dependent on Pax-5. We also obtained evidence for extensive collaborative actions of EBF and E47, even though microarray analysis of hematopoietic progenitor cells ectopically expressing these proteins suggested that they activated only a subset of pre-B cell-restricted genes.

B cell development proceeds from a multipotent progenitor in the bone marrow into a highly specialized immunoglobulin-secreting plasma cell. The process can be divided into several stages based on the recombination status of the immunoglobulin genes and gene expression patterns (1,2). The differentiation pathway is dependent on transcription factors that coordinate the expression of these stage-specific genes (3). In the earliest stages of B cell development there is an apparent need for the coordinated action of the transcription factors Pu.1 (4), EBF (5), and E2A (6) for the formation of the earliest progenitors, whereas the transcription factor BSAP (7,8) is crucial for lineage commitment (9-11) and progression into the pre-B cell stage (12,13). Homologous disruptions of the genes encoding these proteins in mice have proven their importance in vivo (12-19), but even though several target genes have been identified (3,20,21), there is still a need to elucidate how they exert their biological functions to establish and promote B cell development. One way to obtain information about genetic programs and coordinated gene expression is microarray technology, which allows for the simultaneous measurement of the expression levels of several thousand genes. We have earlier used a set of B cell lines arrested at different stages of their development to establish a crude map of gene expression patterns in B cell differentiation (22). The analysis of control gene expression suggested that this approach allowed for a reasonable approximation of expression patterns (22), giving us a tool to investigate also the coordination of stage-specific gene activation. We have used part of these data to estimate the relative importance of transcription factors in the activation of the mouse mb-1 (Igα) promoter by Pearson correlation analysis (23).
This promoter contains binding sites for EBF, E47, BSAP, and Ets proteins (23-27), and when comparing the expression levels of the mb-1 message to the levels of the transcription factors, EBF gave a 98% correlation, whereas BSAP and Ets proteins resulted in lower values (23). This indicated to us that the expression of the mb-1 gene was highly dependent on EBF and that this type of analysis can provide important biological information.
To increase the understanding of the role of EBF in early B cell development, we decided to investigate whether global Pearson correlation analysis of microarray data can be used to identify genetic targets. The expression levels of EBF displayed a high correlation with a number of known EBF target genes but also with other genes, including the VpreB3 and CD19 genes. A role for EBF in the regulation of these genes could also be supported by biochemical and functional assays, suggesting that correlation analysis can be used to investigate genetic networks and identify targets for transcription factors in complex biological systems.

MATERIALS AND METHODS
Tissue Culture Conditions-All cells were grown at 37°C and 5% CO2 in RPMI supplemented with 7.5% fetal calf serum, 10 mM HEPES, 2 mM pyruvate, 50 µM 2-mercaptoethanol, and 50 µg/ml gentamycin (all purchased from Invitrogen AB). The pro-B cell lines were grown in RPMI as above supplemented with 10% of interleukin-3-containing WEHI3-conditioned medium. The Ba/F3 cells were a kind gift from Dr. R. Grosschedl (Gene Center, Munich).
Gene Expression Analysis-RNA was prepared using Trizol (Invitrogen) and 7.5 µg of total RNA was annealed to a T7-oligo(dT) primer by denaturation at 70°C for 10 min followed by 10 min of incubation of the samples on ice. First strand synthesis was performed for 2 h at 42°C using 20 units of SuperScript reverse transcriptase (Invitrogen) in buffers and nucleotide mixtures according to the manufacturer's instructions. This was followed by second strand synthesis for 2 h at 16°C, using RNase H, Escherichia coli DNA polymerase I, and E. coli DNA ligase (all from Invitrogen), according to the manufacturer's instructions. The obtained double-stranded cDNA was then blunted by the addition of 20 units of T4 DNA polymerase and incubated for 5 min at 16°C. The material was then purified by phenol:chloroform:isoamyl alcohol extraction followed by precipitation with NH4Ac and ethanol. The cDNA was then used in an in vitro transcription reaction for 6 h at 37°C using a T7 in vitro transcription kit and biotin-labeled ribonucleotides. The obtained cRNA was purified from unincorporated nucleotides on an RNeasy column (Qiagen). The eluted cRNA was then fragmented by incubation of the products for 2 h in fragmentation buffer (40 mM Tris acetate, pH 8.1, 100 mM KOAc, 150 mM MgOAc). 20 µg of the final fragmented cRNA was hybridized to the Affymetrix chip U74Av2 (Affymetrix) in 200 µl of hybridization buffer (100 mM MES buffer, pH 6.6, 1 M NaCl, 20 mM EDTA, 0.01% Tween 20) supplemented with herring sperm DNA (100 µg/ml) and acetylated bovine serum albumin (500 µg/ml) in an Affymetrix Gene Chip hybridization oven 320. The chip was then developed by the addition of fluorescein isothiocyanate-streptavidin followed by washing using an Affymetrix Gene Chip Fluidics Station 400. Scanning was performed using a Hewlett-Packard Gene Array scanner.
Data Analysis-Affymetrix gene chip arrays were analyzed using the dChip software (28), and the obtained model-based expression values (based on the PM/MM model) were exported. Standard Pearson correlation coefficients were determined for each gene versus all other genes using data from 16 different Affymetrix arrays. As a measure of the random/average correlation between genes, the average number of correlations per gene at or above each given correlation value was calculated. (The process was automated and programs can be made available on request.) The slope of the fitted line, Ki, used to compare expression levels of EBF to the rest of the data set, was calculated using linear regression minimizing the sum of the squared errors.
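The two quantities defined above can be illustrated with a minimal sketch. This is not the authors' program (which is available only on request); it assumes that each gene is represented by a plain list of expression values, one per array, and that the slope is taken with the EBF profile on the x axis and without any rescaling. The function names and the example values are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def slope(x, y):
    """Slope of the least-squares line of y on x (minimizes squared errors in y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Hypothetical expression values across five arrays (not data from this study).
ebf = [10.0, 20.0, 30.0, 40.0, 50.0]
target = [25.0, 45.0, 65.0, 85.0, 105.0]  # constructed as 2 * ebf + 5

r = pearson(ebf, target)   # perfectly linear relation, so r is 1
k = slope(ebf, target)     # slope of the constructed line, i.e. 2
```

A profile with zero variance would make the denominators zero; a real analysis would need to exclude or handle such genes.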
Protein Extracts and Electrophoretic Mobility Shift Assay (EMSA)-Nuclear extracts were prepared according to Schreiber et al. (29). DNA probes were labeled with [γ-32P]ATP by incubation with T4 polynucleotide kinase (Roche Applied Science), annealed with the complementary strand, and purified on a 5% polyacrylamide TBE gel. 5-10 µg of nuclear extract or 0.5-2 µl of in vitro transcribed/translated protein was incubated with the labeled probe (20,000 cpm, 3 fmol) for 30 min at room temperature in binding buffer (10 mM HEPES, pH 7.9, 70 mM KCl, 1 mM dithiothreitol, 1 mM EDTA, 2.5 mM MgCl2, 5% glycerol) with 0.75 µg of poly(dI-dC) (Amersham Biosciences Inc.). 1 mM ZnCl2 was added to shift assays performed in the presence of EBF. DNA competitors were added 10 min before addition of the DNA probe. The samples were separated on a 6% acrylamide TBE gel, which was dried and subjected to autoradiography. Competitors based on synthetic oligonucleotides were added at the molar excesses indicated in the respective figures. Full-length λ5, mb-1, CD19, and VpreB promoters were generated by PCR (see below) and were added at the molar excesses indicated in the respective figures.
Plasmids and Constructs-The E47FD plasmid was based on the neomycin resistance-encoding eukaryotic expression vector pcDNA3 (Invitrogen), placing the inserted cDNA under the control of a cytomegalovirus promoter. The EBF-encoding plasmid mediating puromycin resistance was based on the retroviral vector pBabe (30). The mb-1 and λ5 promoter constructs have been reported previously (23,31), whereas the VpreB1, CD19, and VpreB3 promoters were cloned by PCR using genomic mouse DNA as template and cloning of the resulting products into the SmaI site of the luciferase reporter plasmid pGL3 basic (Promega). All constructs were verified by sequencing.
In Vitro Transcription and Translation-Recombinant protein was generated by coupled in vitro transcription/translation using a TNT reticulocyte lysate kit (Promega) and pcDNA3 (Invitrogen) template plasmids (32).
Retroviruses and Infection of Ba/F3 Cells-The Pax-5-encoding retrovirus and the control virus were kind gifts from Professor Sten-Eirik Jacobsen. The viruses were based on murine stem cell virus vectors expressing BSAP and green fluorescent protein or, for the control, only green fluorescent protein, as a single bicistronic transcript containing an internal ribosomal entry site. Prior to viral coating, 24-well non-tissue culture plates were incubated for 2 h at room temperature with retronectin (40 µg/ml, Stemcell Technologies) followed by 30 min in phosphate-buffered saline supplemented with 2% bovine serum albumin. Plates were then incubated with the viral supernatant for 1 h at tissue culture conditions (see above). 50,000 Ba/F3 cells were seeded into the coated wells in media supplemented with 10% viral supernatant, 6 µg/ml Polybrene (Sigma), and 25 ng/ml murine IL-3 (Stemcell Technologies). Cells were incubated for 24 h before repeating the infection procedure. On day 7 after infection, green fluorescent protein-positive cells were sorted on a FACS-Vantage Cell Sorter (BD Biosciences, San Jose, CA).
Reverse Transcriptase and Polymerase Chain Reactions-RNA was prepared from cells using Trizol (Invitrogen), and cDNA was generated by annealing 1 µg of total RNA to 0.5 µg of random hexamers in 10 µl of diethyl pyrocarbonate-treated water. Reverse transcriptase reactions were performed with 200 units of SuperScript reverse transcriptase (Invitrogen) in the manufacturer's buffer supplemented with 0.5 mM dNTP, 10 mM dithiothreitol, and 20 units of RNase inhibitor (Roche Diagnostics) in a total volume of 20 µl, at 37°C for 1 h. One-twentieth of the RT reaction was used in each PCR assay. PCR reactions were performed with 1 unit of Taq polymerase (Invitrogen) in the manufacturer's buffer supplemented with 0.2 mM dNTP, in a total volume of 25 µl. Primers were added to a final concentration of 1 mM.
For all PCRs the program was identical with the exception of the number of cycles (Y) and the annealing temperature (XX).
Q-PCR-RNA was prepared using Trizol (Invitrogen). Total RNA cleanup including DNase treatment was performed with the RNeasy Micro kit (Qiagen). Reverse transcription was performed as described above (see RT and PCR). Q-PCR was performed using 2× TaqMan PCR master mixture (Applied Biosystems) and 20× assay-on-demand TaqMan probes (Applied Biosystems) in a total volume of 20 µl. TaqMan probes used for Q-PCR were: CD19, Mm00515420_m1; Mb-1, Mm00432423_m1; and HPRT, Mm00446968_m1 (all from Applied Biosystems).

Isolation and Purification of Bone Marrow Progenitors and Mature Peripheral B-cells-Bone marrow cells were sorted on a FACS-Vantage Cell Sorter (BD Biosciences, San Jose, CA), equipped with a 488-nm argon ion (Coherent Enterprise II, Santa Clara, CA) and a 633-nm He-Ne (model 127, Spectra-Physics, Mountain View, CA) laser. Antibodies used were B220-APC, CD43-PE, IgM-biotin (streptavidin TRI), CD19-fluorescein isothiocyanate, and CD138 (Syndecan)-PE, all from Pharmingen. The purity of all sorted cell populations was reproducibly over 95%.

RESULTS
Pearson Correlation Analysis Allows for the Identification of EBF Target Genes-Making the presumption that genes regulated in a similar fashion would have correlating expression patterns, we decided to attempt to define genetic targets for EBF using a data set generated from microarray analysis of 14 mouse B lineage cell lines representing different stages of development (22). These cell lines displayed a wide spectrum of EBF expression levels, and the relative amount of RNA, as estimated by the microarray analysis, appeared to correlate well with the amount of EBF DNA binding activity observed using nuclear extracts from the same cell lines (data not shown). To investigate the correlations between genes in the data set and EBF, we decided to use standard Pearson correlation analysis, which results in a numerical value for how well alterations in the expression levels of two genes correlate. This type of analysis will generate a wide variety of correlations between different genes, so to obtain information about the levels of correlation that can be expected for any given gene, either by chance or by true correlation, we calculated the correlation values for the whole data set, i.e. ~156 million correlations. We then plotted the average number of correlations ≥X (X being the Pearson correlation value, between -1 and +1) found for all the genes in the data set (Fig. 1). This shows that any given gene on average only displays a correlation ≥0.9 to three other genes, i.e. about 1 in 4000. Thus, Pearson correlation values above or around 0.9 would only rarely be seen as random events and could therefore indicate a biologically relevant link. To investigate if we would be able to use this approach to define target genes for EBF, we performed the same analysis focusing on the correlations obtained between genes within the data set and EBF. The numbers of correlating genes were plotted into a diagram and compared with the average numbers previously calculated (Fig. 1).
The analysis suggested that EBF expression correlated significantly better with a set of genes than would be expected for an average gene. Fifteen genes were found to correlate with a value >0.9 and another 15 with a value above 0.85, as compared with the predicted value of around 9. Investigating the identity of these highly correlated mRNAs revealed that they included λ5, VpreB1, and mb-1 transcripts, all suggested EBF target genes (5,31). This shows that from the 12,000 genes we were able to mathematically extract 3 of 6 known/suggested EBF target genes among the top 15 (0.12%). The other proposed EBF targets B29 (33), Blk (34), and BSAP (35) were found to display correlations of 0.81, 0.26, and 0.56, respectively, suggesting that the expression of these genes is not closely linked to the expression of EBF in this data set.
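The comparison between one gene of interest and the data set average can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the program used for Fig. 1: profiles are held in a dictionary keyed by gene name, the all-against-all computation is done naively, and the gene names and values are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def count_at_least(profiles, gene, threshold):
    """Number of other genes whose correlation with `gene` is >= threshold."""
    ref = profiles[gene]
    return sum(
        1
        for name, values in profiles.items()
        if name != gene and pearson(ref, values) >= threshold
    )

def average_count(profiles, threshold):
    """Average of count_at_least over every gene in the data set."""
    total = sum(count_at_least(profiles, g, threshold) for g in profiles)
    return total / len(profiles)

# Hypothetical four-gene data set measured on four arrays.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 3.0, 2.0, 4.0],
}

hits_for_a = count_at_least(profiles, "geneA", 0.9)  # only geneB passes
background = average_count(profiles, 0.9)            # (1 + 1 + 0 + 0) / 4
```

Evaluating both functions over a range of thresholds gives the two curves that are compared: the count for the gene of interest and the average count expected for any gene.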
To further investigate the highly correlated genes, we postulated that for a gene to be a genuine genetic target its expression should be highly dependent on EBF. That is, a doubling in EBF expression levels should result in at least a doubling of target gene expression. Thus, by plotting the expression values of genes displaying a high Pearson correlation with EBF against the levels of EBF, we should obtain a graph with a rather steep slope (high Ki). Calculating the Ki value for all the genes in the data set and plotting the correlation values on the x axis and the Ki values on the y axis (Fig. 2, Table I), it became apparent that the majority of the genes that displayed a high Ki were found in the group with a high Pearson correlation value. Among these were λ5, VpreB1, mb-1, VpreB3, IgM, and CD19 (Fig. 2). This indicates that all the known EBF target genes with high correlations displayed a high dependence on EBF, whereas some other highly correlated genes did not. One gene that fulfilled the criteria for being a potentially new EBF target was the VpreB3 gene (36). The expression of this gene has not been extensively studied, so to investigate the expression pattern in primary bone marrow B cells we sorted B220+, CD43+, IgM- pro-B cells, B220+, CD19+, IgM- pre-B cells, B220+, IgM+, CD19+ B cells, and CD19- Syndecan+ plasma cells and performed RT-PCR analysis of the RNA content of these cells (Fig. 3). EBF was expressed at the highest level in the pre-B cell population, whereas some expression was detected also among the B and plasma cell populations. The peak of expression of λ5 and VpreB3 was also observed at the pre-B cell stage, but while the λ5 message could be detected only in the pre-B cells, VpreB3 expression followed that of EBF also in primary cells.
Thus, our mathematical analysis of microarray data suggests that the expression levels of a limited but well defined set of genes are correlated to the expression of EBF in B cell development.
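The two-criterion selection used above (high Pearson correlation together with a high Ki) amounts to a simple filter. The sketch below assumes that the correlation and slope have already been computed for every gene; the cutoffs of 0.9 and 1.0 and all gene names and values are hypothetical choices for illustration and are not taken from Table I.

```python
# Hypothetical (correlation, slope) pairs per gene; not values from Table I.
stats = {
    "geneA": (0.95, 2.4),
    "geneB": (0.93, 0.3),   # well correlated but weakly dependent (low slope)
    "geneC": (0.40, 3.1),   # steep slope but poorly correlated
    "geneD": (0.91, 1.2),
}

def candidates(stats, min_r=0.9, min_k=1.0):
    """Genes passing both cutoffs, sorted by decreasing correlation."""
    hits = [g for g, (r, k) in stats.items() if r >= min_r and k >= min_k]
    return sorted(hits, key=lambda g: stats[g][0], reverse=True)

selected = candidates(stats)
```

Requiring both criteria removes genes that merely share an expression pattern with the factor without changing much in magnitude as its level changes.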
EBF Shares Several Target Genes with E47-To investigate if the promoters of the potential target genes we defined by the Pearson/Ki analysis had the ability to interact with EBF, we performed EMSA analysis using the mb-1 promoter EBF site (25) and recombinant in vitro translated EBF protein (Fig. 4A). The ability of the target promoters to compete for complex formation between the recombinant protein and the binding site was then assayed by the inclusion of PCR products obtained by amplification using the promoters cloned in a pGL3 vector and primers directed against vector sequences flanking the promoters or the polylinker in the control vector (Fig. 4A). The λ5 and mb-1 promoters competed as expected for complex formation, whereas the amplified pGL3 polylinker did not. The VpreB1 promoter has been suggested to be a direct target for EBF (37), and this control element competed for complex formation, although with a low efficiency. The CD19 promoter also competed for complex formation, suggesting that this promoter contains a binding site for EBF. There have been no reports concerning the characterization of the VpreB3 promoter, but the GenBank registration of three full-length enriched cDNA sequences (NM009514, AK008794, and BL062250) allowed for the identification of a putative promoter 5′ of the coding gene (36). We cloned this putative promoter into a luciferase reporter plasmid after PCR amplification of genomic DNA, and subsequent transfections of reporter constructs under control of this element suggested that it displayed a weak promoter activity in pre-B and B lineage cell lines (data not shown). Using a PCR product from this putative VpreB3 promoter as competitor resulted in a reduced formation of the labeled complex, suggesting that the promoter was able to compete for binding to recombinant EBF protein.
Because VpreB3 and CD19 represent novel target genes, we investigated their promoter sequences and tested potential EBF binding sites in competition experiments as above (Fig. 4A). This allowed us to define EBF binding sites in both the VpreB3 (VpreB3:EBF) and the CD19 (CD19:EBF) promoters, supporting the idea that they are direct targets for EBF. The limited ability of the VpreB1 promoter to compete for complex formation made us examine the genomic surroundings of this gene, searching for potential additional EBF binding sites. One of these presumed sites, located in the VpreB1 intron, was able to interact with EBF (VpreB1:intron, Fig. 4A), possibly contributing to the EBF responsiveness of this gene. The λ5, VpreB1, and mb-1 promoters have been shown to be direct targets for E-proteins (E2A, Heb, and E2-2) (38,39) as well as EBF (23,31), and a collaboration between these proteins in gene regulatory events in early B cells has been supported both in vitro (23,31,32) and in vivo (35). Inspection of the DNA sequence of the VpreB1, CD19, and VpreB3 promoters revealed that they all contained E-boxes (CANNTG) (41), suggesting that these genes could be targets for E-proteins such as E47. To investigate if E47 was able to interact with these promoters, we competed for complex formation of recombinant E47 and an E-box from the mouse immunoglobulin heavy chain enhancer (µE5) by the addition of PCR-amplified promoter elements (Fig. 4C). The addition of the promoter DNA from the mb-1, λ5, VpreB1, VpreB3, or CD19 genes resulted in competition for complex formation, whereas the addition of amplified polylinker did not. This suggests that all the promoters are able to interact with E47 in vitro. By competition experiments using oligonucleotides spanning specific E-boxes, we were also able to define functional E-boxes in the VpreB1, CD19, and VpreB3 promoters as well as 3′ of the VpreB1 gene.
These data support the idea that several of the genes defined as dependent on EBF from the correlation analysis contain binding sites for E47 and thus might represent targets for coordinated action of these transactivators.
Stable Ectopic Expression of E47 and EBF Induces Expression of Pre-B Cell-restricted Target Genes-To investigate the ability of EBF and E47 to induce B lineage genes in a hematopoietic progenitor cell, we made stably transfected Ba/F3 cells ectopically expressing these transcription factors. Ba/F3 cells represent an interleukin-3-dependent hematopoietic progenitor cell (42) with low expression of B-lineage genes such as B29 and mb-1 but without any expression of EBF or surrogate light chain genes (31). These progenitor cells express E47 as judged by Western blot (23), but none of the B lineage-restricted E47 homodimer BCF1 as judged by EMSA analysis (Fig. 5) (23). The lack of BCF1 is believed to be a result of lineage-restricted post-translational modifications of the E47 protein (43), so to achieve the formation of this protein complex in the progenitor cell, we utilized a forced dimer of E47 (31) to generate stably transfected Ba/F3 cells expressing both BCF1 and EBF. One of these clones has been reported earlier (31), whereas three more (clones 3, 5, and 6, Fig. 5) were newly established by co-transfection of a neomycin resistance-encoding E47FD (31) and a puromycin resistance-encoding EBF expression plasmid (30). The cells were double selected and clones were screened for protein expression by EMSA using an mb-1 promoter EBF site or an IgH intron enhancer E-box (µE5) (31) (Fig. 5). Screening of 60 clones resulted in three with significant expression of both E47FD and EBF as judged by EMSA (Fig. 5). RNA was prepared from these cells in parallel with the production of the nuclear extracts and, after cDNA synthesis and in vitro transcription, used to hybridize an Affymetrix U74Av2 chip. As reference we used two selected neo/puro-resistant lines and one parental Ba/F3 clone.
The obtained data were analyzed by treating the different measurements within the control or transfected groups as replicas to reduce the impact of clone-specific features, and the average alterations in expression levels of genes classified as present in at least three of the clones were investigated. This suggested a more than 3-fold increase in expression of 29 genes and a similar decrease in the expression of 15 genes (Table II and Supplementary Materials). Among the regulated genes were several of those we predicted to be genetic targets for EBF and E47 from the correlation analysis. These included λ5 and VpreB1, where two probe sets for each gene detected induction, and also mb-1, Blk, and IgM were modestly induced in the transfected cells. To verify the induction of some presumed target genes we performed an RT-PCR analysis of the newly generated clones (Fig. 6, data not shown). This indicated that VpreB1 and λ5, as expected, were induced to a low level in EBF-transduced cells, whereas no expression was detected in E47FD-expressing clones. The expression was, however, more robust in the cells expressing both EBF and E47FD, supporting the idea of a functional synergy between these proteins. The same profile was observed when we investigated the expression of the VpreB2 gene. We could also detect an increased expression of VpreB3, even though we were unable to detect this in the microarray analysis. However, in contrast to λ5 and VpreB1, we detected a basal expression of VpreB3 in the parental Ba/F3 cells. We were also able to verify a modest up-regulation of mb-1 expression by real time PCR (Fig. 6B). Even though a set of pre-B cell-specific genes were induced, we were unable to detect expression of Rag-1, Pax-5, CD19, or TdT by PCR (Fig. 7 and data not shown) in any of the clones. Thus, we suggest that EBF has the ability to induce expression of several defined target genes and that the function of EBF in early B cell development is highly dependent on the presence of BCF1.

FIG. 3. RT-PCR analysis of the VpreB3 expression pattern in bone marrow B cells supports a correlation with EBF levels. The panels show ethidium bromide-stained agarose gels of PCR products obtained from 5-fold serial dilutions of cDNA. Primary cells from different stages of B cell development were isolated and sorted from bone marrow using antibodies against surface markers. RNA was extracted and after cDNA synthesis RT-PCR was performed to detect the levels of HPRT, λ5, EBF, and VpreB3 transcripts as indicated.
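The group-wise comparison of transfected and control clones described in this section (replicate averaging, a present-call requirement, and a 3-fold cutoff in either direction) can be sketched as follows. The data layout, the function names, and all values are hypothetical; the present-call counts are assumed to come from the array software, and the actual analysis pipeline is not specified in enough detail to reproduce exactly.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)

def fold_changes(transfected, controls, present_calls, min_present=3, min_fold=3.0):
    """Return {gene: ratio} for genes changed at least min_fold up or down.

    transfected / controls: {gene: [expression per clone]}
    present_calls: {gene: number of samples in which the gene was called present}
    """
    result = {}
    for gene, values in transfected.items():
        if present_calls.get(gene, 0) < min_present:
            continue  # too few present calls to trust the measurement
        ratio = mean(values) / mean(controls[gene])
        if ratio >= min_fold or ratio <= 1.0 / min_fold:
            result[gene] = ratio
    return result

# Hypothetical values: four transfected clones and three control samples.
transfected = {
    "geneX": [400.0, 420.0, 380.0, 400.0],
    "geneY": [50.0, 50.0, 50.0, 50.0],
    "geneZ": [100.0, 110.0, 90.0, 100.0],
    "geneW": [900.0, 950.0, 880.0, 910.0],
}
controls = {
    "geneX": [100.0, 90.0, 110.0],
    "geneY": [200.0, 200.0, 200.0],
    "geneZ": [100.0, 100.0, 100.0],
    "geneW": [100.0, 100.0, 100.0],
}
present_calls = {"geneX": 4, "geneY": 3, "geneZ": 4, "geneW": 2}

changed = fold_changes(transfected, controls, present_calls)
```

In this example geneX is reported as induced and geneY as repressed, geneZ is unchanged, and geneW is skipped because of too few present calls. A control mean of zero would need separate handling.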
EBF/E47 and Pax-5 Cooperate in the Induction of the CD19 Gene-Even though we were able to find both EBF and E47 binding sites in the CD19 promoter, we were unable to detect any expression of CD19 message in our stably transfected clones. The expression of CD19 has been shown to be highly dependent on Pax-5 (12,13), and to investigate if EBF/E47 and Pax-5 were coordinately involved in the regulation of the CD19 gene, we transduced two of our stably transfected EBF/E47FD-expressing Ba/F3 clones and one E47FD-expressing clone with a Pax-5/green fluorescent protein-encoding retrovirus. The expression of the transcription factors after 7 days of cultivation was then assayed by EMSA (Fig. 7A). RT-PCR analysis suggested that Pax-5 induced a low expression of the CD19 gene, but cells expressing all three proteins displayed a more robust CD19 expression (Fig. 7B). This was totally dependent on Pax-5 because control virus-infected cells from the same clones did not express any detectable amounts of CD19 message even after 40 cycles of PCR (Fig. 7, B and C). To get a more accurate quantification of the induction levels we performed real time PCR analysis of CD19 expression (Fig. 7C). This revealed that the expression levels were 10-12-fold higher in the cell lines expressing EBF and E47 together with Pax-5, indicating that the CD19 gene may be a target for coordinated activation by Pax-5 and EBF/E47.

TABLE II. Selection of genes up-regulated in Ba/F3 cells expressing EBF and E47FD. The induction value is based on the average increase in expression in four independently generated stably transfected clones as compared to the level of three control samples. For a gene to be included in the analysis we demanded present calls in at least three of the samples unless indicated by an asterisk (*).

DISCUSSION
EBF is a crucial transcription factor in early B cell development, and we here report the identification of additional genetic targets for this protein. The identification of the VpreB2, VpreB3, and CD19 genes as targets for EBF and E47 confirms that these two proteins often act in concert. This has now been shown for several genes in cell line systems (23,31,44,45) but also by in vivo studies of mice that are transheterozygous for mutations in the EBF and E47-encoding E2A genes (35). These mice display a more pronounced impairment of B cell development than mice carrying either of the heterozygous mutations. The exact molecular mechanisms of this functional synergy have not yet been resolved, even though some studies indicate an increased stability of the DNA binding complexes in the presence of both factors in vitro (23,32). λ5 and VpreB1 are encoded by genes closely linked in the genome (46), and the coordinate activation of these genes could be due to the activation of common regulatory elements. However, the coordinated activation of the VpreB2 gene, highly homologous to the VpreB1 gene but located in another position on the same chromosome (46,47), suggests that VpreB1 and VpreB2 are both regulated in a similar manner by control elements in the conserved regions. This was further supported by the identification of conserved E47 and EBF binding sites also in the intron and 3′ of the coding gene. We could also detect modulation of expression levels of a set of other genes, some of which were dependent on EBF alone, whereas the majority of these responded to the coordinated expression of the transcription factors (data not shown, Table II). The majority of the genes were not associated with the pre-B cell stage, and even though this does not exclude that they indeed are target genes for EBF and E47, we have not investigated them further.
One gene that could be of potential interest is the CEACAM1 gene that was up-regulated in the stably transfected cells (Table II). In contrast to what could be observed for most of the other genes, CEACAM1 expression was enhanced also in the absence of E47FD (data not shown). Even though this gene is broadly expressed, this surface protein has been shown to enhance B cell receptor signaling (48) and thus, may play a role in B-cell development. Even though it is likely that the function of EBF and E47 can be affected by both the transcription factor context and by the epigenetic status of the parental cell line, we believe that these two proteins themselves induce a limited set of genes in the developing B lymphocyte. Another potential role for EBF and E47 could be to aid Pax-5 in the activation of pre-B cell restricted target genes. Several lines of investigation, including studies in Pax-5-deficient pro-B cells, support the idea that CD19 expression is critically dependent on Pax-5 (12,13). Our studies in BaF/3 cells support the idea of a crucial role for Pax-5 but also provide some data suggesting that EBF and E47 may support the activity of Pax-5 on its target genes. This would then provide another important mechanism in B cell differentiation where early acting factors provide an environment for increased activity of the commitment factor Pax-5. We also report that putting numerical values on the relative correlations between a transcription factor and a whole gene set using data generated by microarray analysis, can aid in the identification of genetic targets for the protein. Using this type of mathematical analysis will to a certain degree produce false correlations both because of random events and simply overlapping expression patterns but the numbers of randomly generated positive correlations appear to decrease dramatically around 0.9. 
It also appears that the Ki value can be used to further reduce the number of potential target genes down to reasonable numbers. Obtaining useful information from this approach requires that the transcription factor in focus acts in a dominant manner so as to create a high correlation, and the usefulness of the method will be highly dependent on the factor under observation. Correlation analysis of E47 expression versus the other genes in the data set was, for instance, uninformative because we were unable to observe any strong correlations to the expression of the E2A mRNA. This might be a result of the fact that E47 is a part of a network involving both repressors and redundant activators of transcription (38). The functional relevance of this network in B cell development has been established both in transgenic mouse models where expression of the E47 inhibitor Id-1 in pro-B cells resulted in impaired B cell development (49) and in mice lacking combinations of E-proteins (18,19,50,51). The role of a specific factor in such networks will be hard to elucidate by correlation analysis unless several dimensions of the data set are analyzed simultaneously.

FIG. 6. Ectopic expression of EBF and E47 induces a subset of pre-B cell-restricted genes. Panel A shows ethidium bromide-stained agarose gels with the resulting PCR products after amplification of either the control gene 36B4 or potential EBF target genes using cDNA from control or EBF-, E47-, or EBF/E47-transfected Ba/F3 cells as indicated. The -RT reaction contains the input material that has not been converted into cDNA by addition of reverse transcriptase. The experiment is representative of several independent RT-PCR experiments, and at least two clones of each type were analyzed with comparable results. Panel B displays a real time PCR analysis of mb-1 expression in parental or EBF/E47-expressing Ba/F3 cells.
The Ki value could also provide information about amplification loops within a defined genetic system as well as about genes upstream of the transcription factor (low Ki). Optimal usage of the large amounts of information generated in microarray experiments represents a large challenge, and even though the method we describe in this report has limitations and may not be useful in all model systems, it allowed us to isolate three known and two novel genetic targets for EBF in the top 0.12% of the investigated genes. This provides proof of concept, and it is likely that this approach can be used also in other model systems to provide clues to genetic networks and transcription factor target genes. Even though we here report additional EBF target genes, none of those defined to date are likely to explain either the complete lack of mature B cells in EBF-deficient mice (17) or the apparent ability of EBF to reduce T cell development (40), and a continued search for additional genetic targets will be needed to understand the full function of this factor.