Identification of Biologically Relevant Enhancers in Human Erythroid Cells*

Background: Programs of cellular development and differentiation are controlled by enhancers.
Results: Human erythroid cell type-specific enhancers are marked by p300 and groups of transcription factors.
Conclusion: Enhancers are important regulators of species-specific erythroid cell structure and function.
Significance: Deciphering how nonpromoter regulatory elements control gene expression in erythroid cells is important for understanding inherited and acquired hematologic disease.

Identification of cell type-specific enhancers is important for understanding the regulation of programs controlling cellular development and differentiation. Enhancers are typically marked by the co-transcriptional activator protein p300 or by groups of cell-expressed transcription factors. We hypothesized that a unique set of enhancers regulates gene expression in human erythroid cells, a highly specialized cell type evolved to provide adequate amounts of oxygen throughout the body. Using chromatin immunoprecipitation followed by massively parallel sequencing, genome-wide maps of candidate enhancers were constructed for p300 and four transcription factors, GATA1, NF-E2, KLF1, and SCL, using primary human erythroid cells. These data were combined with gene expression analyses, and candidate enhancers were identified. Consistent with their predicted function as candidate enhancers, there was statistically significant enrichment of p300 and combinations of co-localizing erythroid transcription factors within 1–50 kb of the transcriptional start site (TSS) of genes highly expressed in erythroid cells. Candidate enhancers were also enriched near genes with known erythroid cell function or phenotype. Candidate enhancers exhibited moderate conservation with mouse and minimal conservation with nonplacental vertebrates. Candidate enhancers were mapped to a set of erythroid-associated, biologically relevant SNPs from the genome-wide association studies (GWAS) catalogue of NHGRI, National Institutes of Health. Fourteen candidate enhancers, representing 10 genetic loci, mapped to sites associated with biologically relevant erythroid traits. Fragments from these loci directed statistically significant expression in reporter gene assays. Identification of enhancers in human erythroid cells will allow a better understanding of erythroid cell development, differentiation, structure, and function and provide insights into inherited and acquired hematologic disease.

Erythrocytes are specialized cells that have evolved to efficiently carry out their primary functions of oxygen transport and delivery. Among the vertebrates, mammalian erythrocytes are unique. Mature mammalian erythrocytes are enucleate and lack most cellular organelles. Erythroid progenitor cells contain nuclei, but they are extruded, presumably to allow for additional hemoglobin content for more efficient oxygen transport. Lacking DNA, mature erythrocytes lack the capacity for cell division or RNA synthesis, and they have very limited capacity for self-repair. Thus, erythroid cells are highly specialized cells with a number of unique characteristics.
The regulation of programs controlling cellular development and differentiation varies temporally, between cell and tissue types, and between species. These programs are controlled by critical regulatory DNA sequences, cis-regulatory modules (CRMs), which include gene promoters, enhancers, silencers, and insulators. A detailed understanding of the structure and function of CRMs in varying cell types will provide insights into these regulatory programs and provide crucial information for predicting and understanding the phenotypic consequences of genetic variation in noncoding DNA. Recent studies utilizing genomic methodologies have shown that enhancers, a class of CRMs, are frequently associated with disease-associated genetic variants (1-6).
Mammalian genomes contain more enhancers than promoters, with enhancers subserving numerous roles in controlling gene regulation (6,13,39). Tissue-specific genes are more dependent on enhancer regulation and exhibit less promoter diversity than housekeeping genes, which are primarily regulated by their promoters with few enhancers in their genomic vicinity (1). Identification and characterization of enhancers that control programs of gene expression in highly specialized human erythroid cells will allow a better understanding of erythroid cell development, differentiation, structure, and function as well as provide insights into inherited and acquired hematologic disease.
This report describes the construction of genome-wide maps of candidate enhancers in primary human erythroid cells. Genome-wide maps of p300 and four erythroid transcription factors, GATA1, NF-E2, KLF1, and SCL, in human primary erythroid cell chromatin were constructed and analyzed with parallel gene expression analyses. Consistent with their predicted function, these regulatory elements were enriched near genes highly expressed in erythroid cells or involved in erythroid cell structure and function. Conservation analyses of candidate human enhancers revealed only moderate conservation with mouse and minimal conservation with nonplacental vertebrates. Fourteen candidate enhancers, representing 10 genetic loci, mapped to sites previously associated with biologically relevant erythroid cell traits in GWAS. Fragments from 9 of the 10 biologically relevant regions directed statistically significant expression in reporter gene assays.

EXPERIMENTAL PROCEDURES
Cell Culture and Selection-To obtain primary human erythroid cells, CD34+ cells were cultured and selected as described (41,42). These cells represent the R3/R4 cell population of nucleated erythroid cells defined by Zhang et al. (43).
RNA Isolation and Preparation, Microarray Data Acquisition, and Analyses-RNA was isolated from primary human erythroid cells and prepared for microarray analyses as described (44,45) and detailed in the supplemental Methods. Gene expression microarray quality control and data analyses are described in the supplemental Methods. Quantitative real-time PCR was performed to confirm expression levels of RNA transcripts with the primers in supplemental Table S1. Real-time PCR data were normalized as described (45). Triplicate analyses were performed for each target (44,46).
Illumina High Throughput Sequencing and Data Analyses-DNA processing and high throughput sequencing were performed as described (44). Sequenced reads were mapped to the human genome (UCSC Genome Browser hg18 (47), NCBI Build 36) using the Eland short-read alignment program. The Model-based Alignment of ChIP-Seq (MACS) program was used to identify peaks with a p value of <10e-5 (48). Localization of binding sites relative to known genes was done using the ChIPseeqer package (49). Factor co-localization was determined using the Active Region Comparer. Motif finding was done using the Homer algorithm (50). Conservation of candidate enhancer regions between corresponding genomic regions of vertebrates was determined using the UCSC hg18 genome browser database (47) with the 44-way vertebrate and placental mammal PhastCons track (51).
The PhastCons conservation scores of regions surrounding promoters, exons, and distal and intergenic regions were compared with the PhastCons scores of randomized regions. The randomized regions were generated by combining the regions for all transcription factor binding sites and moving them to random locations in the genome outside of gaps in the known hg18 sequence using the BedTools ShuffleBed function. Conservation plots were generated using Cistrome (52). Conservation of human candidate enhancer regions was analyzed using the UCSC LiftOver tool. For LiftOver controls, sites were concatenated, randomly shuffled across the genome, and analyzed. The maximum PhastCons score for each candidate enhancer mapped to sites previously associated with biologically relevant erythroid cell traits in GWAS was determined using the Galaxy aggregate function (53,54). The UCSC Genome Browser 7X regulatory potential table was used to determine the maximum regulatory potential (RP) scores for each region (54,55).
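The randomized control described above can be illustrated with a minimal Python sketch. It is not the BedTools implementation: the function name `shuffle_regions`, the input layout, and the choice to keep each region on its original chromosome are assumptions made for illustration only.

```python
import random

def shuffle_regions(regions, chrom_sizes, gaps, seed=0, max_tries=1000):
    """Relocate each (chrom, start, end) region to a random gap-free position.

    Assumptions (not from the paper): intervals are half-open, regions stay
    on their original chromosome, and `gaps` maps chrom -> [(start, end)].
    """
    rng = random.Random(seed)
    shuffled = []
    for chrom, start, end in regions:
        length = end - start
        for _ in range(max_tries):
            new_start = rng.randint(0, chrom_sizes[chrom] - length)
            new_end = new_start + length
            # Two half-open intervals overlap when each starts before the other ends.
            overlaps_gap = any(g_start < new_end and new_start < g_end
                               for g_start, g_end in gaps.get(chrom, []))
            if not overlaps_gap:
                shuffled.append((chrom, new_start, new_end))
                break
        else:
            raise RuntimeError(f"no gap-free placement for {chrom}:{start}-{end}")
    return shuffled
```

Each shuffled region keeps the length of the original, so conservation scores over the control set are computed on intervals of matching size.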
Identification and Analysis of Biologically Relevant SNPs-The locations of SNPs shown to demonstrate highly significant linkage to erythroid cell-related traits were obtained from the UCSC Genome Browser database and the catalogue of GWAS compiled by NHGRI, National Institutes of Health (4). Using BedTools software (see supplemental Methods), nonpromoter-related p300 peaks (TSS to ±1 kb) were intersected with erythroid-related SNPs, and overlap was identified. Similarly, peaks with two or more sites of erythroid transcription factor binding identified by Active Region Comparer were intersected with erythroid-related SNPs.
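The intersection step can be sketched in plain Python as follows. This is an illustration of the logic only, not the BedTools commands used in the study; the function names, the half-open coordinate convention, and the input layout are assumptions.

```python
def is_promoter_peak(peak, tss_positions, window=1000):
    """True if the peak overlaps the TSS +/- `window` bp of any gene.

    `peak` is (chrom, start, end), half-open; `tss_positions` maps
    chrom -> list of TSS coordinates (hypothetical input layout).
    """
    chrom, start, end = peak
    return any(start - window <= tss < end + window
               for tss in tss_positions.get(chrom, []))

def snps_in_nonpromoter_peaks(peaks, snps, tss_positions, window=1000):
    """Return (snp_name, peak) pairs for SNPs inside nonpromoter peaks.

    `snps` is a list of (name, chrom, position) tuples.
    """
    hits = []
    for peak in peaks:
        if is_promoter_peak(peak, tss_positions, window):
            continue  # promoter-associated peaks are excluded
        chrom, start, end = peak
        for name, snp_chrom, pos in snps:
            if snp_chrom == chrom and start <= pos < end:
                hits.append((name, peak))
    return hits
```

The nested loop is quadratic and is only meant to make the filtering and overlap rules explicit; an interval tree or a sorted sweep would be the usual choice at genome scale.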
Validation of ChIP-seq Results-Primers were designed for representative binding regions for all five antibodies in target genes identified by the MACS program (supplemental Table S2). Immunoprecipitated DNA was analyzed by quantitative real-time PCR (iCycler, Bio-Rad) as described (44).
Reporter Gene Assays-Fourteen candidate enhancer regions were PCR-amplified using oligonucleotide primers immediately flanking the boundaries of the called peaks (supplemental Table S3). These fragments were cloned upstream of an SV40 promoter-firefly luciferase reporter cassette in the pGL2Promoter plasmid. The integrity of all test plasmids was confirmed by sequencing. The negative control plasmid contained a promoterless firefly luciferase gene cassette, pGL2Basic (Promega), and the positive control plasmid contained a γ-globin gene promoter-firefly luciferase reporter gene cassette with the human β-globin gene HS2 enhancer cloned upstream of the γ-globin-luciferase cassette (56). 10^7 K562 cells (CCL 243, ATCC) were transfected by electroporation with a single pulse of 300 V at 950 microfarads with 15 μg of test plasmid and 0.3 μg of pRL-TK, a reporter plasmid expressing Renilla luciferase driven by the herpes simplex virus thymidine kinase promoter (Promega) as described (57). At least two preparations of each plasmid were tested in triplicate. Two days after transfection, cell extracts were analyzed using the Dual-Luciferase assay according to the manufacturer's instructions (Promega). Firefly luciferase activity directed by each of the test plasmids, corrected for the Renilla luciferase activity of the cotransfection control, was normalized by firefly luciferase activity from the pGL2P control plasmid to obtain the fold change. Statistical significance was determined as p < 0.05 by a one-tailed Student's t test.
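The normalization described above can be written out as a short Python sketch. It assumes that each replicate is recorded as a (firefly, Renilla) pair and that the control value is the mean Renilla-corrected ratio of the control plasmid replicates; the function name and data layout are hypothetical, and the significance test is not included.

```python
from statistics import mean

def fold_changes(test_wells, control_wells):
    """Fold change of each test replicate over the control plasmid.

    Each well is a (firefly, renilla) pair of luciferase readings.
    Firefly activity is first corrected for transfection efficiency by
    dividing by the co-transfected Renilla signal, then divided by the
    mean corrected activity of the control plasmid.
    """
    control_ratio = mean(firefly / renilla for firefly, renilla in control_wells)
    return [(firefly / renilla) / control_ratio for firefly, renilla in test_wells]
```

For example, with control wells of (100, 10) and (200, 20), a test well of (300, 10) corresponds to a 3-fold change.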
Data Access-The raw data files generated by the ChIP-seq analyses and microarray assays have been submitted to the Gene Expression Omnibus (GEO) for use by other investigators (reference series GSE43626). The mRNA microarray experiments comply with MIAME (Minimum Information About a Microarray Experiment) standards (58).

RESULTS
mRNA Expression and p300 and Erythroid Transcription Factor ChIP-seq Analyses in Human Primary Erythroid Cells-Human primary erythroid cells, representing the R3/R4 populations of cells (43), were cultured from human CD34+ stem and progenitor cells. Transcriptome analyses were performed with erythroid cell mRNA hybridized to Illumina human v2 mRNA expression arrays. Levels of expression were assigned absent or present calls using the Illumina detection p values based on negative control hybridization probes. Of 19,707 transcripts examined, 8678 transcripts were expressed. Quantitative real-time PCR was performed to validate expression levels of representative mRNA transcripts assigned by the expression arrays (supplemental Table S4 and Fig. S1).
Using primary erythroid cell chromatin, ChIP-seq was performed utilizing antibodies specific for the transcriptional coactivator p300 and the erythroid transcription factors GATA1, NF-E2, KLF1, and SCL/Tal1 to generate genome-wide maps of factor binding. Genome-wide maps of H3K4me2 and H3K4me3 occupancy were similarly constructed. The MACS program was used to identify peaks with a cut-off of p < 10e-5 (supplemental Table S5). Validation of factor enrichment at selected peaks identified by ChIP-seq was performed by quantitative ChIP PCR for all five antibodies (supplemental Table S6).
Sites of p300 and Erythroid Transcription Factor Occupancy in Erythroid Cell Chromatin-The human genome was partitioned into six bins relative to RefSeq genes corresponding to exons, introns, promoters, distal (-1 to -50 kb), downstream (+1 to +50 kb), and intergenic regions. Sites of factor occupancy were assigned to these bins, and percentages were calculated (Fig. 1). p300 and erythroid transcription factors were enriched in introns and distal regions (1-50 kb from a RefSeq gene) (Fig. 1). p300 occupancy was also enriched in promoters and exons, consistent with the alternate role of p300 as a transcriptional co-activator at gene promoters. Sites of erythroid transcription factor binding were also enriched in intergenic regions (>50 kb from a RefSeq gene; 17-18%) (Fig. 1). As in previous reports, erythroid transcription factor binding was very common in intron 1 of RefSeq genes (data not shown) (44,59). These data suggest that transcription factors mark enhancers more commonly than p300 in erythroid cells because bona fide enhancers are expected to act in the genomic vicinity of their cognate genes (35).
Co-localization of p300 and Erythroid Transcription Factors-The co-localization of p300 and erythroid cell transcription factors was analyzed using Active Region Comparer. Erythroid transcription factors commonly co-localized, especially the combinations of KLF1 and NF-E2, GATA1 and KLF1, and GATA1 and NF-E2 (Table 1). Interestingly, three or more erythroid transcription factors co-localized frequently, ~17% of the time. Co-localization of p300 with individual erythroid transcription factors was less frequent (Table 1). These data indicate that, as in other cell types studied, candidate erythroid cell enhancers are typically identified by p300 occupancy or co-localization of tissue-expressed transcription factors (60,61).
The Homer program was utilized to identify overrepresented DNA motifs at sites of factor binding. Not surprisingly, related motifs were found among the erythroid transcription factors (e.g. GATA1 with PU.1, KLF1 with GATA1, NF-E2 with GATA1, and SCL with GATA1). These results are shown in supplemental Fig. S2.

Identification of Candidate Erythroid Enhancer Regions-Genomic studies have identified two classes of enhancers, those marked by p300 binding and those marked by binding of multiple cell- and tissue type-specific transcription factors. We defined candidate erythroid enhancers as regions of DNA marked by nonpromoter-associated p300 occupancy or nonpromoter binding of two or more erythroid transcription factors. Typically, cell- and tissue type-specific enhancers act over distances of tens to hundreds of kilobases (34). Thus, bona fide erythroid enhancers are expected to be enriched in the genomic vicinity of genes that are expressed and functional in erythroid cells (1,13,62). To determine whether erythroid enhancers are localized in this manner, gene expression in erythroid cells was correlated with sites of occupancy of p300 and erythroid transcription factors. To exclude gene promoters, sites of p300 or erythroid transcription factor occupancy within 1 kb of annotated TSSs were omitted from the analyses. Erythroid expression of genes with p300 binding sites within 1-50 kb of the TSS was significantly higher than expression of genes with p300 binding sites >50 kb from a TSS (Fig. 2, p < 2.2e-16; supplemental Table S7). Similarly, expression of genes with erythroid transcription factor binding sites within 1-50 kb of the TSS was significantly higher than expression of genes with binding sites >50 kb from a TSS (Fig. 2). This was true when combinations of two, three, or four co-localizing erythroid transcription factors were analyzed (p < 2.2e-16 for all three combinations). As expected (6), H3K4me3 occupancy was uncommonly found at sites of candidate enhancers (supplemental Table S7).
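The distance classes used in this comparison can be illustrated with a small Python sketch. It assigns a binding site to a class from its distance to the nearest TSS; the function name, the use of a single coordinate per site, and the handling of the exact 1 kb and 50 kb boundaries are assumptions, not details taken from the analysis.

```python
import bisect

def distance_class(site, sorted_tss, promoter=1000, near=50000):
    """Classify a binding site by distance to the nearest TSS.

    `site` is a single coordinate (for example a peak midpoint) and
    `sorted_tss` is a non-empty, ascending list of TSS coordinates on
    the same chromosome. Boundary handling is an assumption.
    """
    i = bisect.bisect_left(sorted_tss, site)
    # Only the TSSs immediately left and right of the site can be nearest.
    neighbors = sorted_tss[max(i - 1, 0):i + 1]
    dist = min(abs(site - tss) for tss in neighbors)
    if dist <= promoter:
        return "promoter (excluded)"
    if dist <= near:
        return "1-50 kb"
    return ">50 kb"
```

Expression values of the genes in the "1-50 kb" and ">50 kb" classes could then be compared with whatever two-sample test is appropriate.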
We also examined whether candidate enhancers were enriched near genes with known erythroid cell structure or function. We performed an unsupervised statistical enrichment analysis of functional gene annotations (63). Candidate erythroid enhancers identified by two of four erythroid transcription factors were associated with genes linked to erythroid cell-related phenotypes (Table 2 and supplemental Table S6).
Candidate enhancers identified by p300 were not associated with genes linked to erythroid cell-related phenotypes (data not shown). Analyzing genes with candidate enhancers identified by 2 of 4 erythroid transcription factors by Gene Ontology (GO) annotation identified biological processes involved in erythroid cell function, including K-Cl cotransporter activity, myosin binding, glucose transport, and cellular iron homeostasis (supplemental Table S8).
To further determine whether candidate enhancers identified by two of four erythroid transcription factors were associated with genes with erythroid cell function, the number of genes induced during erythroid differentiation associated with candidate enhancers (1-50 kb) was compared with the number of genes with randomized candidate enhancers (1-50 kb). The number of genes induced during erythroid differentiation was significantly higher than the number of genes from randomized enhancer locations (p < 0.01).
In addition, candidate enhancers were associated with genes in the GO term categories erythrocyte differentiation and erythrocyte homeostasis (p < 0.01 and p < 0.01, respectively) and were not associated with genes in the GO term categories muscle differentiation and neuron differentiation (p = 0.50 and p = 0.91, respectively).
Conservation Analyses of Candidate Enhancer Regions-Evolutionary constraint in regions of noncoding DNA has served as a proxy for functional constraint in the identification of candidate enhancer regions. However, recent studies have demonstrated that many enhancers are rapidly evolving, and in some species, many enhancers are both evolutionarily young and species-specific (33,36). We investigated conservation of candidate erythroid enhancers between humans, mice, chickens, frogs, and zebrafish at different levels of stringency using the UCSC Genome Browser LiftOver tool, a computational tool that utilizes BLAT algorithm alignments to identify orthologous sequences between species (47,64). Conservation was analyzed for candidate enhancers located in distal, intergenic, and intron regions, avoiding the high degree of conservation typically found between gene promoters and exons. Even at lower stringency (50% minimum ratio of bases that must remap), there was very high conservation between humans and mice for p300, all four erythroid transcription factors, and the combination of two of four erythroid transcription factors compared with randomly shuffled control sequences (Table 3). There was a very large falloff of conservation between human and lower nonmammalian species with nucleated circulating erythrocytes for the erythroid transcription factors, even at low stringency (50% minimum ratio of bases that must remap).
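The stringency threshold can be made concrete with a minimal Python sketch. It does not run LiftOver; it assumes that a remap ratio (fraction of bases that map to the other genome) has already been obtained for each region, and the function name and example values are hypothetical.

```python
def fraction_conserved(remap_ratios, min_ratio):
    """Fraction of regions whose remapped-base ratio meets the threshold.

    `remap_ratios` holds, per region, the fraction of its bases that
    remap to the target genome; `min_ratio` is the stringency (0.5 or
    0.75 correspond to the 50% and 75% settings).
    """
    conserved = sum(ratio >= min_ratio for ratio in remap_ratios)
    return conserved / len(remap_ratios)
```

With illustrative ratios of 0.9, 0.6, 0.3, and 0.8, three of four regions count as conserved at the 50% setting and two of four at the 75% setting. The same calculation applied to shuffled control regions gives the background rate for comparison.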
Conservation plots using PhastCons conservation scores with the 44-way vertebrate and placental mammal PhastCons track were constructed for binding regions of p300 and the four erythroid transcription factors. Strong conservation for p300 and all of the erythroid transcription factors was present in gene promoters and exons. However, there was weak constraint for p300 and the erythroid transcription factors, with the exception of SCL/Tal1, at distal and intergenic sites (Fig. 3).
Candidate Enhancer Regions and Biologically Relevant Single Nucleotide Polymorphisms-We explored whether candidate erythroid enhancers are enriched in regions associated with biologically relevant erythroid cell traits. We collected a data set of erythroid-associated noncoding SNPs (see "Experimental Procedures") from the GWAS catalogue of NHGRI, National Institutes of Health (4). Currently, the functional significance of the overwhelming majority of these SNPs is unknown. SNP locations were compared with the sites of p300 or erythroid transcription factor occupancy. Fourteen SNPs associated with erythroid cell phenotypes were identified (Table 4 and supplemental Fig. S3), with four of the biologically relevant SNPs located in intron 2 of the BCL11A gene on chromosome 2. p300 occupancy was found at six SNPs: three without erythroid transcription factors and one each with two, three, and four co-localizing erythroid factors. Nine of the fourteen SNPs had co-occupancy with the combination of erythroid factors GATA1, NF-E2, and KLF1.
Recent studies have used PhastCons analyses and RP scores to predict whether or not a region of DNA contains a functional CRM (54,59,65). PhastCons uses a hidden Markov model method on aligned genomic sequences to estimate a probability that any nucleotide is conserved (66). The UCSC Genome Browser was used to determine 44-way placental mammal PhastCons scores for each of the 14 candidate biologically relevant enhancer regions. Twelve of 14 enhancer regions had maximal PhastCons scores of >0.8, suggesting that they contain a functional CRM (Table 5). An alternative way to predict the presence of CRMs is the RP score, which evaluates whether regions of DNA sequence have patterns more similar to those of

TABLE 2 Annotations of putative target genes near candidate human erythroid enhancers
Unsupervised enrichment analysis of annotated genes in the proximity of candidate enhancer regions identified by at least two of four erythroid transcription factors. The top enriched Mouse Genome Informatics phenotype ontology terms showing highly significant enrichment of genes implicated in erythroid cell-related phenotypes are shown. Only terms that showed significant enrichment and had a binomial fold enrichment of ≥2 were considered.

Reporter Gene Assay of Biologically Relevant Enhancers in Erythroid Cells-Individual reporter gene plasmids were prepared with the biologically relevant enhancer elements cloned upstream of a human γ-globin gene promoter-luciferase reporter gene cassette. These plasmids were transfected into human K562 cells, which have features of human erythroid cells. After 2 days, the cells were harvested, and luciferase activity was analyzed. The luciferase reporter gene was driven by the human γ-globin gene promoter, which is expressed in K562 cells. Activity from test plasmids was normalized to that directed by the human γ-globin gene promoter-luciferase reporter gene control plasmid. Twelve of the 14 candidate enhancers mapped to biologically relevant SNPs directed statistically significant (p < 0.05) reporter gene activity (Fig. 4 and supplemental Table S9). A cluster of four of these SNPs, all linked with levels of hemoglobin F, were located in intron 2 of the BCL11A gene. Fragments containing three of four of these SNPs directed statistically significant reporter gene activity, suggesting that the other SNPs required intact chromatin for their function, they were in linkage disequilibrium, they were nonfunctional, or they were associated with other functions.

TABLE 3 Evolutionary conservation at sites of p300 and erythroid transcription factor occupancy in distal, intergenic, and intron regions in erythroid cell chromatin
Conservation of human candidate enhancer regions was analyzed using the UCSC LiftOver tool at stringency levels of 75 or 50% of bases in the region that must remap.

DISCUSSION
Mammalian erythroid cells are an excellent example of the complexity in temporal, developmental, and differentiation stage-specific changes exhibited by a single cell type. Mammalian erythroid cells originate from hematopoietic stem and progenitor cells. In the embryo and fetus, erythroid cells have differing developmental origins, with the primitive erythroid cell lineage developing from yolk sac-derived erythroid progenitors and the definitive cell lineage maturing from two different developmentally regulated stem and progenitor cell populations (67-70). These cells have different programs of regulation, with variation in spatial, temporal, and site-specific differentiation. Indeed, altered programs of erythropoiesis are activated throughout the life of the organism, such as occurs after blood loss, oxidative stress, or other organismal stress.
Our conservation analyses revealed that erythroid cell enhancers, like heart enhancers, are under weak evolutionary constraint, particularly when comparing placental mammals and nonplacental vertebrates. They also indicated that many candidate erythroid enhancers are species-specific and evolutionarily young. Mammalian erythrocytes are among the most highly specialized cells known, having evolved to an enucleate cell endowed with a highly redundant cell membrane. These changes, which increase surface area and cytoplasmic volume ratios, are primarily attributed to the need for additional hemoglobin content for oxygen transport, making cellular oxygen diffusion more efficient. As homeotherms evolved, oxygen demands increased, and organisms evolved to meet these demands. Birds developed a flow-through respiratory system, significantly more efficient than mammalian respiratory systems. It has been suggested that mammals diverged at this point, developing enucleate erythrocytes with increased oxygen carrying capacity to adapt to increased oxygen demands (71-74). Because discrete changes in CRMs may alter gene expression, generating potential for the genesis of novel species-specific traits (75,76), identification of gene expression changes occurring over short evolutionary distances can suggest the origin of species-specific traits. Thus, comparative studies of enhancers in human and nonplacental vertebrates will probably provide novel information about the evolution of erythrocyte structure and function.
Our understanding of enhancer structure and function continues to expand. Previous studies, such as those of the globin gene loci (77-81), the GATA1 and SCL/Tal1 gene loci (82-93), and the erythropoietin gene locus (83), characterized enhancers as distantly located, positively acting cis-regulatory elements (77,80). Recent studies have shown that enhancers have additional, complex roles in cellular gene regulation. These include roles in determining nuclear organization (6), transcription initiation and release of RNA polymerase II from promoter pausing (18), transcriptional competence (11), insulator activity (95,96), development, and cell fate determination (11,19,97). Recent data indicate that secondary enhancers synergize with primary enhancers to fine-tune gene expression (98,99). Noncoding RNAs have also been linked to enhancer function ... (107).
Rapid advances in genomic technologies, including genome-wide association studies, functional genomics, and high throughput gene expression analyses, are increasing our knowledge of gene regulation and its role in determining complex traits (75,108). GWAS have identified a catalogue of polymorphisms associated with phenotypic traits, with most of these polymorphic variants located in noncoding regions of the genome. In parallel, functional genomics studies, particularly ChIP-seq-based analyses, have identified regions of DNA with regulatory potential on a genome-wide scale (109-112). Catalogs of genome-wide erythroid transcription factor occupancy in erythroid cells (113-124), which localize and define cis-regulatory elements, are essential for our understanding of the mechanisms of phenotypic variation in inherited and acquired disease (35).
Other studies of erythroid enhancers have demonstrated the role of intragenic enhancers as alternative promoters (107) and the combinatorial assembly of developmental stage-specific enhancers in regulating gene expression during erythropoiesis (115). Our data demonstrate the role of cell-expressed transcription factors and p300 in marking erythroid cell enhancers, reveal the lack of evolutionary constraint of human erythroid enhancers, and show a significant link of enhancers with human erythroid cell phenotypes. Ongoing synthesis of the data obtained from complementary lines of investigation is beginning to unravel the complex mechanisms of genetic variation in disease susceptibility (125).
Identification of critical cis-regulatory elements in erythroid cells will also be extremely useful in the genetic diagnosis of patients with hematologic disease. In some cases of inherited disease, deleterious coding region mutations have been identified on one allele, but the causative mutation in trans has not been identified. For instance, erythroid cells from a subset of patients with recessively inherited, α-spectrin-linked anemia have decreased α-spectrin mRNA levels and diminished α-spectrin protein synthesis, leading to abnormal, spectrin-deficient erythrocytes (126-128). The precise genetic basis (or bases) (i.e. the mutations on one or both alleles) of decreased spectrin mRNA accumulation in these cases is not known, even after mutation screening of the promoter and coding exons of the α-spectrin gene. Similarly, in congenital dyserythropoietic anemia type II, a recessively inherited disorder due to mutations in the SEC23B gene, a number of patients exhibit all of the phenotypic characteristics of congenital dyserythropoietic anemia type II, but a SEC23B mutation has only been identified on one allele (40,94). Both of these genes have candidate enhancers in the genomic vicinity, making these regions excellent candidates for disease-associated mutations in these patients.