Nuclear Factor Binding Sites in Human β Globin IVS2*

The second intron of the human β globin gene (β IVS2) has been previously identified as a region required for proper expression of β globin. To further characterize this region, we have footprinted the entire β IVS2 and have analyzed regions of interest by electrophoretic mobility shift assay (EMSA). Through these studies we have identified four utilized binding sites for the erythroid regulatory factor GATA-1, two sites bound by the general transcription factor Oct-1, two sites bound by the nuclear matrix attachment DNA binding protein special A-T-rich binding protein 1 (SATB1), and a site bound by a potential homeobox protein. Additionally, we have found several factors displaying temporal or tissue specificity by EMSA, which may be involved in the regulation of β globin expression. These proteins are not supershifted by antibodies to factors important in erythroid regulation such as GATA-1, NFE-2, or YY1, or by antibodies against more general transcription factors.
The 70-kb human β globin gene complex has been extensively studied as a model of gene regulation. The region consists of 20 kb known as the locus control region, or LCR, which can confer erythroid-specific expression and position independence on any gene of interest (1-3), followed by a series of β family genes, five of which are temporally expressed (embryonic ε, fetal Aγ and Gγ, adult δ and β) (4,5). A number of nuclear proteins have been identified that play a role in transcription of one or more β family genes through binding to the LCR, promoter, and/or enhancer sequences of these genes (6-12). Some of these factors may have a role in the normal temporal switching of globin, as switching has been shown to be regulated at the level of transcription (4,5). Despite these many findings, the details of the molecular mechanisms regulating β family gene expression and switching are still unclear.
The adult human β globin gene has been shown to include enhancers 3′ to the structural gene and within the gene itself, specifically in the region spanning the 3′-end of IVS2 and the beginning of exon 3 (13-15). Previous work has indicated that β IVS2 is required for β globin expression (16). Two DNase I hypersensitive sites have been identified within the β globin structural gene, a stronger site in exon 3 and a weaker site in the center of IVS2 (17). The β IVS2 intronic enhancer region has a utilized binding site for the erythroid transcription factor GATA-1 (Footnote 2). We have previously shown in murine erythroleukemia (MEL) cells that the IVS2 sequences from the δ globin gene will not substitute for β IVS2 in human β globin expression. The replacement of β IVS2 with δ IVS2 leads to a substantial decline in the expression of β globin and renders the gene uninducible (18). β IVS2 has also been shown to be a nuclear matrix attachment region (MAR), one of 10 found in the β globin complex (19,20). These MARs may be involved in transcription, splicing, and replication of DNA (21-23).
In order to identify DNA binding proteins that may play a role in the expression of human β globin, we have characterized the entire human β IVS2 region by DNase I footprint analysis. Based on the footprint pattern and on a computer-generated map of potential transcription factor binding sites, sites were chosen to be analyzed by EMSA. We report here evidence that three of nine potential GATA-1 binding sites are utilized, as well as a fourth degenerate GATA-1 site. Additional proteins seen to bind IVS2 are Oct-1, a ubiquitous factor binding to a homeobox consensus site, several stage- or tissue-specific factors, and the nuclear matrix binding protein SATB1. The binding of this latter protein to β IVS2 suggests that β IVS2 and nuclear matrix attachment may be involved in the regulation of β globin transcription.

MATERIALS AND METHODS
Subcloning and Footprint Analysis-β IVS2 was cleaved into three smaller pieces as depicted (see Fig. 1). The parent vector for β IVS2 subcloning was pSP72β3.4, which contains a 3.4-kb ClaI/SphI fragment of the human β globin gene subcloned into the ClaI and SphI sites of vector pSP72 (Promega). The BamHI-EcoRI fragment of the gene, including all of β IVS2, was analyzed further. To produce subclone I, pSP72β3.4 was sequentially digested with restriction enzymes BamHI and MunI, and the 200-bp insert was ligated into the EcoRI and BamHI sites of the Bluescript SK+ vector (Stratagene). For subclone II, pSP72β3.4 was sequentially digested with MunI and DraI, and the 329-bp insert was subcloned into the EcoRI and SmaI sites of plasmid pGEM 7Zf+ (Promega). For subclone III, pSP72β3.4 was sequentially digested with EcoRI and DraI, and the resultant 400-bp insert was subcloned into the EcoRI and SmaI sites of pGEM 7Zf+.
All three subclones were entirely footprinted at least twice each in both orientations, using nuclear extracts from CEM (24), HEL-92 (25), HeLa (26), K562 (27), and MEL (28) cells. CEM is a human T-lymphocyte cell line; HEL-92 and K562 are human fetal-embryonic hematopoietic lines; and MEL is a murine adult hematopoietic line. HeLa cells are human cervical carcinoma cells and are not hematopoietic. Linearized subclones were dephosphorylated and end-labeled with [γ-32P]ATP using T4 polynucleotide kinase. Following a second restriction digest, probes were purified by electrophoresis on a 5% nondenaturing polyacrylamide gel (29). Some of each probe was subjected to Maxam-Gilbert sequencing (29) to provide a sequence ladder for footprint gels. Labeled probe was footprinted by DNase I digestion as described previously (30) using 50 μg of nuclear extract and 160 ng of DNase I/reaction, except where indicated.
Nuclear Extracts-All nuclear extracts were prepared from cells as described previously (31) or by a small scale adaptation of this method (32).
Electrophoretic Mobility Shift Assays-After annealing, oligonucleotide probes were end-labeled using T4 polynucleotide kinase and [γ-32P]ATP. In general, 20,000 cpm of probe/reaction were used. EMSAs were run on 3.5, 4, or 5% nondenaturing polyacrylamide gels as described previously (30). Each reaction included 10 μg of nuclear extract and 5 μg of double-stranded poly(dI·dC) as nonspecific competitor.
EMSA buffer was 20 mM Tris-HCl, pH 7.6, 10% glycerol, 0.2 mM EDTA, 2.5 mM MgCl2, and 60 mM KCl, with 1 mM dithiothreitol and 0.2 mM phenylmethylsulfonyl fluoride. Reactions were incubated for 20 min on ice. For competition assays, a 10-, 100-, or 1000-fold molar excess of unlabeled competitor oligonucleotide was added to the reaction. For supershift assays, reactions were incubated as usual for 20 min on ice, and then 1 μg of antibody was added and the reactions were further incubated for 45 min to 1 h on ice. All antibodies were Supershift grade antibodies from Santa Cruz Biochemicals, except for the anti-SATB1 antibody, which was the generous gift of Dr. T. Kohwi-Shigematsu (La Jolla Cancer Research Foundation, La Jolla, CA).
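As a rough illustration of the competition setup, the sketch below converts a fold molar excess into a mass of unlabeled double-stranded competitor. The probe amount (10 fmol), the 40-bp oligonucleotide length used in the loop, and the function name are assumptions made for illustration only; the text specifies probe input in cpm, not in moles. The figure of about 650 g/mol per base pair is a common approximation for double-stranded DNA.

```python
# Approximate average mass of one base pair of double-stranded DNA (g/mol).
BP_MASS_G_PER_MOL = 650.0


def competitor_ng(probe_fmol, fold_excess, length_bp):
    """Mass (ng) of unlabeled ds competitor giving the requested molar excess.

    probe_fmol  -- amount of labeled probe in the reaction (hypothetical value)
    fold_excess -- molar excess over probe, e.g. 10, 100, or 1000
    length_bp   -- length of the competitor oligonucleotide in base pairs
    """
    competitor_mol = probe_fmol * fold_excess * 1e-15
    grams = competitor_mol * length_bp * BP_MASS_G_PER_MOL
    return grams * 1e9


if __name__ == "__main__":
    # Assumed: 10 fmol of labeled probe and a 40-bp competitor.
    for fold in (10, 100, 1000):
        print(f"{fold}-fold excess: {competitor_ng(10.0, fold, 40):.1f} ng")
```

Because the required mass scales linearly with both the fold excess and the oligonucleotide length, competitors of different lengths need different masses to reach the same molar excess.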

RESULTS
Footprinting Analysis of Human β IVS2-The 850-bp human β IVS2 was subcloned in three pieces into pGEM 7 or Bluescript SK+ vectors, as described under "Materials and Methods" (Fig. 1). DNase I footprinting was performed on both strands of each construct using nuclear extracts from the following cell lines: CEM, HEL-92, K562, HeLa, and MEL. The intron was found to be extensively footprinted. The central region of the intron has previously been shown to include a DNase I hypersensitive site (17). Of the 14 footprinted regions identified (Table I), seven were selected for further analysis, and one site had previously been characterized in this laboratory (Footnote 2). The sequence of β IVS2 was mapped using the Eukaryotic Transcription Factor Binding Sites (tfsites) data base of the University of Wisconsin GCG package (33). This data base includes consensus sequences for the binding of sequence-specific eukaryotic transcription factors. Footprint data and this map were analyzed to determine which sequences to characterize by EMSA.
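The kind of consensus map described above can be sketched as a scan of both strands for exact matches to a degenerate (IUPAC) consensus. This is only an illustrative sketch and not the GCG tfsites implementation; the function names and the 22-nucleotide toy sequence are hypothetical, and the toy sequence was assembled from consensus elements quoted in this paper (ATTTGCAT and TTATCTTA), not taken from β IVS2.

```python
import re

# IUPAC nucleotide codes expressed as regular-expression character classes.
IUPAC_RE = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "M": "[AC]", "K": "[GT]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq, consensus):
    """Return sorted (start, strand, matched_text) hits on both strands.

    start is always given in forward-strand coordinates (0-based).
    A lookahead is used so that overlapping matches are all reported.
    """
    pattern = re.compile("(?=(" + "".join(IUPAC_RE[c] for c in consensus) + "))")
    n = len(consensus)
    hits = [(m.start(), "+", m.group(1)) for m in pattern.finditer(seq)]
    for m in pattern.finditer(revcomp(seq)):
        hits.append((len(seq) - m.start() - n, "-", m.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    toy = "GGATTTGCATCCTTATCTTAGG"  # hypothetical sequence, not beta IVS2
    print(find_sites(toy, "YTATCW"))    # GATA-1 consensus as written in the text
    print(find_sites(toy, "WGATAR"))    # the same motif read on the other strand
    print(find_sites(toy, "ATTTGCAT"))  # Oct-1/Oct-2 octamer
```

Scanning the reverse complement matters because a consensus such as YTATCW is the reverse complement of WGATAR; the same physical site is reported on one strand or the other depending on how the consensus is written.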
GATA-1 Binding of Human β IVS2-β IVS2 includes nine consensus sequences for the binding of GATA-1, yet not all of these sites are bound in vitro by GATA-1. Four sites were found to bind GATA-1 (footprints 2, 7, 10, and 14, Fig. 2; Table I; Fig. 3, A and B; and see Fig. 8B). Sites 7, 10, and 14 conform to the general GATA-1 consensus sequences as listed in the GCG tfsites data base (33,34), but site 2 does not match any of these sequences. However, it is clear that GATA-1 can bind a great variety of sequences (35,36). When compared with the commonly used GATA-1 consensus sequences YTATCW (35,36) or MYWATCWY (34), the site 2 sequence (TGCATCAG) matches the first sequence at four of six positions and the second sequence at four of eight positions. Double-stranded oligonucleotides were synthesized to generate EMSA probes for sites 2, 7, and 10. Each of these three probes was specifically competed by an excess of unlabeled competitor GATA-1 consensus oligonucleotide, but not by a GATA-1 mutant consensus oligonucleotide (Fig. 3B and Fig. 8B). Site 14 has been previously characterized in this laboratory (Footnote 2) and was found to bind GATA-1.
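The position counts quoted for site 2 can be reproduced with a short sketch that scores a sequence against a degenerate consensus, position by position, over every window on the given strand. The function names are hypothetical and the scoring (one point per position allowed by the IUPAC code) is an assumption about how the comparison was made; only the sequences and consensus strings come from the text.

```python
# IUPAC nucleotide codes and the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "W": "AT", "S": "CG",
    "M": "AC", "K": "GT", "N": "ACGT",
}


def match_count(window, consensus):
    """Number of positions at which window agrees with the consensus."""
    return sum(base in IUPAC[code] for base, code in zip(window, consensus))


def best_match(seq, consensus):
    """Best (score, offset) over all windows of seq on the given strand."""
    n = len(consensus)
    scores = [(match_count(seq[i:i + n], consensus), i)
              for i in range(len(seq) - n + 1)]
    return max(scores, key=lambda s: s[0])


if __name__ == "__main__":
    site2 = "TGCATCAG"
    print(best_match(site2, "YTATCW"))       # (4, 1): four of six positions
    print(best_match(site2, "MYWATCWY"))     # (4, 0): four of eight positions
    print(best_match("TTATCTTA", "YTATCW"))  # (6, 0): full match within site 7
```

For site 2 the four matching positions are the shared ATCA core in both comparisons, which is consistent with its description as a degenerate GATA-1 site.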
Site 5 Analysis-Footprint site 5, a broad footprint seen with all nuclear extracts tested (Fig. 4), contains consensus sequences for transcription factors Oct-1 and Oct-2 (ATTTGCAT), and GATA-1 (ATAATCTC) (Table I). EMSA done with a 60-bp oligonucleotide containing the ubiquitous footprint 5 sequence revealed a complex pattern (Fig. 5A). Within this site are two possibly stage-specific bands seen only with HEL-92, K562 (bands 3, a and b), and weakly with CEM nuclear extracts in some gels (not shown). There is also a different sized band seen only with MEL nuclear extract (band 4), a unique band found only with CEM and K562 (band 5), and a ubiquitous band (band 1). Each of these bands was competed by an excess of unlabeled site 5 probe (Fig. 5B, lanes 2, 8, and 9) but not by a nonspecific probe for the general transcription factor AP-1 (not shown). The large ubiquitous band (Fig. 5A, band 1) was competed by an Oct-1 consensus oligonucleotide (Fig. 5B, lanes 3, 10, and 11) and was also supershifted by an anti-Oct-1 antibody (lane 5). Additionally, the GATA-1 consensus oligonucleotide did not compete any bands even though a GATA-1 consensus is present in the site 5 oligonucleotide (lane 4). In an attempt to identify the nuclear factors responsible for bands 3, a and b, 4, and 5, supershift analyses were performed using antibodies against c-Fos, c-Jun/AP-1, Oct-2 (lane 6), Fli-1, Pu.1, Ets 1/Ets 2, GATA-1, NFE-2, YY1, and SATB1. All of these were negative by supershift assay with the site 5 oligonucleotide.
Site 6 Analysis-Footprint site 6 is also seen with all nuclear extracts (Fig. 4) and includes consensus sequences for the homeobox protein engrailed (CAATTAAA) and GATA-1 (ATAATCAT). Fig. 6A shows an EMSA gel with an oligonucleotide synthesized to the site 6 footprint sequence. As occurred for site 5, the site 6 oligonucleotide EMSAs revealed a complex pattern buried within this ubiquitous footprint. This includes an embryonic-fetal erythroid-specific band seen only with HEL-92 and K562 (band 2b). There is also a larger band seen with HEL-92, K562, and possibly MEL (band 2a), although the low mobility of this band makes resolution difficult. All bands are competed by an excess of unlabeled site 6 probe (Fig. 6B, lanes 2, 3, 6, and 7; Fig. 6C, lanes 2 and 9; Fig. 6D, lanes 2, 3, and 4), but not by the GATA-1 consensus oligonucleotide, although the gel shift oligonucleotide contains a GATA-1 consensus sequence (Fig. 6C, lanes 4 and 10). One ubiquitous band is competed by an oligonucleotide containing the consensus sequence for the homeobox protein Antennapedia (CAATTAAA) (Fig. 6D, band 4, lanes 5, 6, and 7), and is also competed by an oligonucleotide containing the consensus sequence for the homeobox protein Oct-1 (ATTTGCAT) (Fig. 6C, lanes 6 and 12) but not by the mutant Oct-1 oligonucleotide (Fig. 6C, lanes 7 and 13). This may be the site in IVS2 previously described to bind the homeobox protein HOX 2B (B6) (37,38).
Interestingly, the nuclear matrix attachment DNA binding protein SATB1 was found to bind to the site 6 probe in CEM nuclear extract (Fig. 6A, band 3). β IVS2 has been described as one of nine sites in 90 kb of globin gene sequence studied to contain a MAR (19). Additionally, it has been noted that MARs from the human β globin gene can bind SATB1 (21). A SATB1 consensus oligonucleotide inhibits band 3 formation with the site 6 probe (Fig. 6C, lane 3), and the SATB1 band is supershifted by an anti-SATB1 antibody (Fig. 6B, lane 4). SATB1 is a 103-kDa protein (39), and the SATB1 band runs very slowly on EMSA. It is also possible that the SATB1 band seen in Fig. 6B might be a complex of SATB1 and some other protein. SATB1 complexes have been suggested in a recent paper describing the binding of SATB1 to an Aγ globin regulatory region (20). As was done for footprinted site 5, a supershift assay was performed using the same panel of antibodies against general factors, erythroid factors, and ets proteins. No additional supershifts were seen by EMSA.
Site 7 Analysis-The sense strand footprint at site 7, which is seen with CEM, HEL-92, and K562 nuclear extracts (Fig. 7), has consensus sequences for the homeobox protein bicoid (CCTAATCTC) and GATA-1 (CCTAATCTC). This footprint is broader in HEL-92 and K562 nuclear extracts than in CEM, and the region footprinted only in HEL-92 and K562 includes a GATA-1 consensus sequence. EMSA with a 40-bp oligonucleotide including the site 7 footprint showed a more complex pattern with erythroid nuclear extracts than with other extracts (Fig. 8A). The site 7 probe used spanned two GATA-1 consensus sequences (CCTAATCTC and TTATCTTA), and in fact proved to bind GATA-1 in the erythroid lines, generating a GATA-1 band that could be supershifted with an anti-GATA-1 antibody (Fig. 8A, lane 8; Fig. 8C, lane 7). This band was also specifically competed by a GATA-1 consensus oligonucleotide, but not by a GATA-1 mutant oligonucleotide (Fig. 8B, band 2, lanes 13 and 14). Additionally, a higher band seen strongly with CEM and faintly with K562 was due to binding of the nuclear matrix binding protein SATB1 (band 1). Band 1 is specifically competed by the SATB1 consensus oligonucleotide (Fig. 8B, band 1, lanes 3 and 11) and is supershifted by the anti-SATB1 antibody (Fig. 8C, lanes 2 and 6). The SATB1 binding of the site 7 oligonucleotide seemed stronger than the binding to the site 6 oligonucleotide. A faint band was seen in CEM and K562 nuclear extracts when the site 7 probe was labeled to a high specific activity (Fig. 8, B and C, band 4). This band could be competed by an Oct-1 consensus oligonucleotide, but not by an Oct-1 mutant oligonucleotide (Fig. 8B, lanes 7 and 15 and 8 and 16). However, no supershift was seen using the anti-Oct-1 antibody (Fig. 8C, lanes 4 and 8).
Other Gel Shift Analyses-A 20-bp oligonucleotide was synthesized to further investigate a faint but reproducible 7-bp footprint (site 9) found only on the sense strand (data not shown). It is possible that this represents either a functional RNA binding protein or a single-stranded DNA binding protein (40). However, no bands were seen by EMSA of double-stranded probe or labeled sense or antisense single-stranded probe with any nuclear extract (data not shown).
A 28-bp oligonucleotide was synthesized to characterize a footprint (site 12, Fig. 9) seen with nuclear extracts CEM, HEL-92, and K562. This oligonucleotide generated a complex gel shift pattern. One band was seen only with CEM, HEL-92, K562, and NIH 3T3 embryonic fibroblast nuclear extracts (41) (data not shown). Further characterization of binding to this site 12 probe showed that the bands are not competed by a general factor binding oligonucleotide (AP-1), and no supershifts were observed with the anti-SATB1 antibody (data not shown).

DISCUSSION
Human β globin IVS2 has been entirely footprinted and further characterized by EMSA. Previous data have indicated that this region has several interesting structural and functional features; it contains a 3′-enhancer region (13-15), two DNase I hypersensitive sites (17), and is required for proper expression of the β globin gene (16). We have previously analyzed the expression in MEL cells of β constructs in which β IVS2 has been replaced by δ or γ globin IVS2, and have found that these globin IVSs are not interchangeable. When β IVS2 is replaced with δ IVS2, the base-line expression of β is greatly decreased, and the cells are not inducible with Me2SO (18). In addition, constructs in which β IVS2 has been replaced with γ IVS2 produce β transcripts that are improperly initiated in K562 cells (42). Comparison of ε, γ, δ, and β IVS2 using restriction maps and maps generated by the tfsites data base reveals no significant sequence conservation on the nucleotide level, and few conserved potential transcription factor binding sites, except for two GATA-1 binding sites that are conserved in position. The first is the second intronic GATA-1 site (Fig. 2), which is conserved in position between ε and β globin. The second is the seventh GATA-1 site, which is conserved in position between Aγ and β globin. Neither of these sites was found to bind GATA-1 in our experiments, and neither appears to be functionally important, at least in the expression of β globin.
The footprint pattern of β IVS2 and those areas studied by EMSA have revealed a very dense and complex pattern of protein binding (Table I). Previous studies in which only the DNase I hypersensitive site of murine β IVS2 was characterized also revealed a complex gel shift pattern (43). These gel shift analyses covered about one-third of the total sequence of murine β IVS2. Two proteins were identified as binding to murine β IVS2, GATA-1 and Spi-1/Pu.1, an Ets family protein (44). These murine β IVS2 binding sites are not conserved in human β IVS2. Human β IVS2 does contain four potential Ets binding sites, but only one of these is footprinted in human β IVS2 (site 14), and this one site has been shown to bind GATA-1 only (Footnote 2). The complexity of the binding pattern seen in IVS2 suggests that it is an area of complex regulatory function, and might be involved in the regulation of β expression in the adult, and perhaps in earlier stages of erythropoiesis. Certainly a regulatory function is supported by the extensive binding of the erythroid transcription factor GATA-1 to this region. The redundancy of the GATA-1 consensus sequences alone (nine sites), unique among the globin genes, indicates that some function is likely. By comparison, the human δ globin gene has only three GATA-1 consensus sequences, and the human γ gene only two GATA-1 consensus sequences. Three of the nine consensus sites in β IVS2 are bound by GATA-1, as is a fourth related sequence. It is interesting that although binding sites for GATA-1 have been extensively characterized (34-36), one still cannot predict with certainty which sites are utilized in vivo or in vitro.
Several stage- or tissue-specific bands were seen on EMSA of β IVS2. The gel shift analyses on site 5 in particular revealed several bands of interest (Fig. 5, A and B), none of which could be supershifted by antibodies to general transcription factors or known factors important in the regulation of globin expression such as NFE-2, YY1, or GATA-1. Of particular interest is a band seen only with MEL (adult erythroid) cells (Fig. 5A, band 4), as this could represent a potential factor for positive expression of β globin. The two bands seen with HEL-92 and K562 (bands 3, a and b) and faintly in CEM and the band seen only with CEM and K562 nuclear extracts (band 5) are also intriguing. Perhaps these proteins are not seen in murine MEL nuclear extract due to the species difference, or possibly they play a role in embryonic-fetal erythropoiesis, or in lymphoid cells. The HEL-92-, K562-, and possibly MEL-specific band bound to the site 6 oligonucleotide (Fig. 6A, band 2a) can be approximately sized due to the presence of SATB1 (band 3) binding to this oligonucleotide in CEM nuclear extract. The HEL-92- and K562-specific band runs more slowly than SATB1, which has a molecular mass of 103 kDa. This size would be larger than known erythroid regulatory proteins, with the exception of γPE. γPE is a 108-kDa protein that binds to sites near human γ globin (45). Its broad pattern of tissue distribution argues against it being any of the uncharacterized proteins we have found. The particular pattern of expression of the HEL-92- and K562-specific band, i.e. only in embryonic-fetal erythroid cells, could be relevant to down-regulation of β globin expression early in development. This could be of particular importance if GATA-1 binding to β IVS2 is indeed important in positive regulation of β globin expression, as GATA-1 is certainly present in embryonic-fetal erythroid cells. Another differentially expressed band at site 12 (Fig. 9), seen with CEM, HEL-92, K562, and NIH 3T3 nuclear extracts, but not with the adult HeLa or MEL cells, is of unknown significance.
We have found one potential homeobox protein binding site in β IVS2 (site 6, Fig. 6A, band 4). Previous data have shown that homeobox proteins may be important in erythroid differentiation (46). Eight of nine genes in the HOX 2 cluster are expressed in erythroid cells, but rarely in B or T cells (46-49). There is also indirect evidence that HOX 3C may be necessary for adult hematopoiesis (50). A band found in all extracts at site 6 was competed by an oligonucleotide with the Oct-1 consensus sequence (ATTTGCAT) (Fig. 6C, band 4, lanes 6 and 12) and by an oligonucleotide with the Antennapedia consensus sequence (CAATTAAA) (Fig. 6D, band 4, lane 7). The Antennapedia sequence is within the site 6 footprint and site 6 probe and is listed in the tfsites data base as the engrailed consensus. This is the core consensus for many HOX proteins including HOX B6, which may have a role in erythroid differentiation (51). However, we do not see any erythroid cell specificity of the particular protein binding at this site. Although the site 7 footprinted region and oligonucleotide contain the consensus sequence for the homeobox protein bicoid, all bands seen with K562 nuclear extracts could be competed by GATA-1, SATB1, or Oct-1 sequence oligonucleotides, and so all bands in erythroid cell extracts are accounted for. There seems to be no erythroid-related homeobox binding at this site in β IVS2.

[Fig. 8 legend, partial] ... 8) and K562 (lanes 9-16) nuclear extracts, and competitor oligonucleotide as marked. Lanes 3 and 11 include 100× molar excess of unlabeled competitor oligonucleotide; all other competition lanes include 1000× molar excess of unlabeled competitor oligonucleotide. Band 4 is a faint band that is seen in CEM and K562 nuclear extracts when probe has been labeled to a high specific activity. C, supershift EMSA with CEM (lanes 1-4) and K562 (lanes 5-8) nuclear extracts, and antibodies (Ab) as marked.
We have found that the nuclear matrix-associated DNA binding protein SATB1 binds to site 6 of β IVS2 with CEM nuclear extract and more intensely to site 7 of β IVS2 with CEM and K562 nuclear extract (Fig. 6A, band 3; Fig. 8A, band 1). MARs are postulated to play an important role in the functional organization of chromatin loop domains. There is evidence that replication and transcription occur at the interface of DNA and the nuclear matrix and that the nuclear matrix is involved in RNA splicing (21-23). Recent reports have indicated that DNA binding of some transcription factors is associated with the nuclear matrix (52-55). MARs have a strong potential for extensive unpairing or unwinding. Although MARs often contain or reside close to enhancer sequences (21), their role is not clear as yet.
SATB1 is one of the characterized MAR binding proteins. It is a 103-kDa protein that binds as a monomer and is expressed primarily in thymus (21,39). It binds selectively to MARs with well mixed ATC sequences (21,39). β IVS2 has been previously characterized to be one of nine sites in the 90 kb of the human β globin gene locus to function as an MAR (19). MARs are regions of DNA at least 200 bp in length and are generally 70% AT-rich (21,22). The areas binding SATB1 in β IVS2 are about 73% AT-rich and do consist of a well mixed ATC sequence. Two distinct sites, in footprints 6 and 7, bind SATB1; two sites seem to be required for a strong SATB1 interaction to occur (21,39). The site 6 and 7 oligonucleotides are, respectively, 83 and 75% AT-rich. The bands run very slowly on EMSA and may consist of a complex of SATB1 and some other protein, as has been suggested (20).
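The AT-richness figures above are simple base-composition fractions, which can be sketched as follows. The helper names are hypothetical, and the full site 6 and site 7 oligonucleotide sequences are not given in this text, so the example uses only the short consensus elements quoted earlier. The second helper encodes one reading of "ATC sequence" (a stretch in which one strand contains only A, T, and C, that is, no G); that definition is an assumption stated here for illustration and should be checked against refs. 21 and 39.

```python
def at_fraction(seq):
    """Fraction of positions in seq that are A or T."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "AT" for base in seq) / len(seq)


def is_atc_stretch(seq):
    """True if one strand of the duplex lacks G (assumed ATC-sequence test).

    The given strand is G-free if it has no G; the complementary strand is
    G-free if the given strand has no C.
    """
    seq = seq.upper()
    return "G" not in seq or "C" not in seq


if __name__ == "__main__":
    for name, seq in [("CAATTAAA", "CAATTAAA"), ("ATTTGCAT", "ATTTGCAT")]:
        print(name, f"{100 * at_fraction(seq):.1f}% AT",
              "ATC-like" if is_atc_stretch(seq) else "contains G on both strands")
```

Applied to a full probe sequence, at_fraction would give the percentage reported for that probe; on an 8-nucleotide element it only illustrates the calculation.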
Preliminary data show that SATB1 is a suppressor of transcription based on transient cotransfection assays with a reporter gene (21). One regulatory region to which SATB1 binds is the Ig heavy chain intronic enhancer, which is flanked by MARs. In this context SATB1 may help to repress expression in non-B cells (56). SATB1 has also been observed to bind the β globin gene (21). Also, SATB1 has been recently reported to bind to the human Aγ 3′-regulatory region at sites I and IV (20). These sites had been previously characterized as binding HOX protein 2.8 (2H) (57,58). Besides being highly expressed in CEM cells, SATB1 was also found in heart, skeletal muscle, fetal liver, K562 cells, and B and T cells (20). It was proposed that the Aγ regulatory region might influence gene expression through interaction with the nuclear matrix. The regulatory region was also found to be an MAR, and this group speculated that promoter/enhancer interaction is mediated by SATB1 binding of MARs. However, they found no MAR near the Aγ promoter (20).
MARs and SATB1 binding in β IVS2 could have any of several functions. Previous data seem to indicate a correlation between MARs and enhancer regions (21,22). Also, each β family gene (except Gγ) harbors an MAR, while by comparison, no such sites exist in the large α globin gene complex (19). MARs might mediate an attachment between individual β globin family genes and the β globin LCR (which also contains MARs) (19), possibly having some role in globin switching. The β IVS2 MAR might increase expression mediated by the β IVS2 enhancer, just as the β 3′-MAR (19), situated 500 bp downstream of the β 3′-enhancer, might facilitate expression mediated by that enhancer. Alternatively, the β IVS2 MAR in combination with GATA-1 binding or binding of other factors might function as an independent enhancer in β IVS2. Possibly, MARs could mediate interaction between IVS2 and the β promoter or 3′-enhancer.
From the complexity of the DNase I footprint and EMSA results we have obtained, it is clear that there are many interactions between human β IVS2 sequences and nuclear factors, both known factors and those yet to be characterized. The biological significance of the protein-DNA interactions in β IVS2, and of any interactions between β IVS2 and other regulatory sequences 5′ or 3′ to the human β gene, other β family genes, or the LCR, remains to be determined. The details of the relationships between DNA binding factors and globin gene switching also remain to be elucidated. Deletion analysis and site-directed mutagenesis of human β IVS2 trans-acting factor binding sites may provide new insights into the relationship between protein binding to this region and human β globin gene function.