An atypical homeodomain in SATB1 promotes specific recognition of the key structural element in a matrix attachment region.

SATB1 is a cell type-specific nuclear matrix attachment region (MAR) DNA-binding protein, predominantly expressed in thymocytes. We identified an atypical homeodomain and two Cut-like repeats in SATB1, in addition to the known MAR-binding domain. The isolated MAR-binding domain recognizes a certain DNA sequence context within MARs that is highly potentiated for base unpairing. Unlike the MAR-binding domain, the homeodomain when isolated binds poorly and with low specificity to DNA. However, the combined action of the MAR-binding domain and the homeodomain allows SATB1 to specifically recognize the core unwinding element within the base-unpairing region. The core unwinding element is critical for MAR structure, since point mutations within this core abolish the unwinding propensity of the MAR. The contribution of the homeodomain is abolished by alanine substitutions of arginine 3 and arginine 5 in the N-terminal arm of the homeodomain. Site-directed mutagenesis of the core unwinding element in the 3′ MAR of the immunoglobulin heavy chain gene enhancer revealed the sequence 5′-(C/A)TAATA-3′ to be essential for the increase in affinity mediated by the homeodomain. SATB1 may regulate T-cell development and function at the level of higher order chromatin structure through the critical DNA structural elements within MARs.

Eukaryotic chromosomes are thought to be separated into topologically independent loop domains by periodic attachment onto an intranuclear frame known as the nuclear matrix or skeleton, defined as the insoluble material left in the nucleus after a series of biochemical extraction steps (1). Specific DNA sequences that bind to the nuclear matrix in vitro are called matrix attachment regions (MARs), and these sequences have been postulated to form the base of chromosomal loops (reviewed in Refs. 2 and 3). MARs may be important to organize chromosomes and regulate DNA transcription and replication within the nucleus. In support of this notion, MARs often colocalize or are located in close proximity to regulatory sequences including enhancers (4-9), and some MARs can augment transcription from heterologous promoters in stable transformants (5-7, 10, 11). Recent evidence shows that MARs play a role in tissue-specific gene expression. The MARs associated with the immunoglobulin heavy chain locus are essential for transcription of a rearranged gene in transgenic B lymphocytes (12). Identification of the cell type-specific MAR-binding protein SATB1, which is predominantly expressed in thymocytes, shows that MARs can be specific targets for a cell type-specific factor (13).
SATB1 defines a novel class of DNA-binding proteins that recognize a specific sequence context that exhibits a high base unpairing or unwinding propensity. MARs are generally AT-rich and typically contain a subregion(s) that exhibits a strong potential to base-unpair under negative superhelical strain (10, 14). A high AT content, however, is not sufficient to confer high affinity binding to SATB1; specific mutations within MARs, which maintain the AT-richness but eradicate the unwinding capability, substantially reduce or abolish SATB1 binding (13). Analysis of SATB1 binding sites in MARs revealed that binding is restricted to the subregion of MARs that has a high unwinding propensity. This base-unpairing region consists of a cluster of sequence stretches with a special AT-rich DNA sequence context, in which Cs are sequestered exclusively on one strand and Gs on the other (ATC sequences) (13). A short core unwinding element can be present within one of these ATC sequences, which can be detected by virtue of its most persistent base unpairing even under conditions that favor the double-stranded DNA configuration; mutation of this element abolishes the base-unpairing propensity of MARs (14). The unpairing potential was demonstrated to be essential for MAR function; a concatemer, wild-type (25)₇, of the core unwinding element of the 3′ MAR of the immunoglobulin heavy chain (IgH) enhancer displays high binding affinity to the nuclear matrix, unwinds under superhelical strain, and enhances transcription from a linked reporter gene. A corresponding mutated version, mutated (24)₈, has lost all of these properties (10).
To date, three proteins with similar binding specificity have been identified in addition to SATB1: nucleolin, a major nucleolar protein with multiple functions (15), p114, isolated from breast carcinoma (16), and Bright, a protein that is predominantly expressed in B-cells (17). These proteins bind with high affinity to MARs, and we showed that nucleolin and p114 can distinguish wild-type (25)₅ from mutated (24)₈. Unlike other proteins known to bind MARs such as lamin B1 (18) and topoisomerase II (19), SATB1 binds MARs with very high affinity, exhibiting dissociation constants (Kd) in the range of 10⁻⁹ to 10⁻¹⁰ M, comparable to many sequence-specific transcription factors.
To understand the biological role of SATB1, it is important to delineate the functional domains in this protein. A minimum 150-amino acid MAR-binding domain that contains novel DNA-binding motifs was previously identified (20). We report here that SATB1 contains an additional domain that shares homology with known homeodomains. Homeodomains are 60-amino acid DNA-binding domains, and their amino acid sequence is highly conserved, as well as their three-dimensional structure. Homeodomain proteins function in vitro and in vivo as sequence-specific transcription factors, and they are important developmental regulators that determine position or cell-type specificity (reviewed in Refs. 21 and 22). Unlike known homeodomains that directly and independently bind DNA, the homeodomain in SATB1 does not bind to the MAR probes analyzed here nor does it bind to a dimerized sequence (RP₂) that resembles the homeodomain consensus sequence (23). When associated with the MAR-binding domain, however, the SATB1 homeodomain enhances binding specificity toward the core unwinding element of a MAR.

EXPERIMENTAL PROCEDURES
Protein Domain Analysis-We performed searches of the SWISS-PROT database (release 26.0, August, 1993) using the programs Blast (24) and Blitz (25). Computations were performed using the Blast server at NCBI and the Blitz server at EMBL. Best results were obtained with Blitz searches using the PAM 120 matrix and a gap penalty of 13. To lower the background of nonsignificant matches, it was necessary to remove a segment rich in glutamines and prolines from the query sequence (residues 593-619 of SATB1). The MAR-binding domain was previously delineated by successive deletion mapping combined with gel mobility shift analysis, and the repeated regions (boxes I and II) were detected by computer-aided sequence comparisons (20).
Protein Expression-Plasmids for the fusion protein expression were constructed as follows. The desired SATB1 fragments were amplified from the human cDNA clone pAT1146 (13) by the polymerase chain reaction using Taq DNA polymerase and the appropriate primers containing a BamHI or EcoRI site. The fragments were isolated from agarose gels, purified by Elutip D columns (Schleicher & Schuell), and cloned in frame in the BamHI or BamHI/EcoRI site of the vector pGEX2T (Pharmacia Biotech Inc.). Deletion of the homeodomain was achieved by first synthesizing a fragment ranging from the N-terminal residue of the MAR domain (position 346) to the N-terminal residue of the homeodomain (position 641) and cloning it into BamHI/EcoRI-digested pGEX2T. In a second step, a fragment ranging from the C-terminal residue of the homeodomain (position 702) to the end of the cDNA was amplified and ligated in frame in the EcoRI site downstream of the insert made in the first step. Glutathione S-transferase (GST) fusion proteins were overexpressed in Escherichia coli (XL1 Blue) and purified on glutathione-Sepharose according to standard procedures (26). Protein concentrations were determined using a Bradford protein assay kit (Bio-Rad), which was followed by quantitation of the fusion proteins by Coomassie Blue staining of SDS-polyacrylamide gels. To obtain precise comparisons, the different fusion proteins were run side by side on the same gel, and the band intensities were compared by laser densitometer scanning.
DNA-binding Assays-Gel mobility shift assays were carried out as described (13), with no poly(dI-dC)·poly(dI-dC) added or with 0.5 µg/20 µl. The 3′ MAR is identical to the IgH 3′-En fragment described previously (13). The wild-type 3′ MAR and the mutated fragments were subcloned in the EcoRI site of Bluescript (Stratagene), and the fragments were isolated by EcoRI restriction enzyme digestion and purification from an agarose gel. Pentamer repeats of binding sites V and VI were made exactly as described for wt(25)₅ (13), using the following oligonucleotides: 5′-CTTAAAATTACTCTATTATTCGAAttc-3′ with its complementary strand 5′-TTCGAATAATAGAGTAATTTTAAGgaa-3′ for wt(V)₅, and 5′-TTCCCTCTGATTATTGGTCTCCATGAAttc-3′ with 5′-TTCATGGAGACCAATAATCAGAGGGAAgaa-3′ for wt(VI)₅. The lowercase letters indicate single-stranded overhangs used for end to end ligation of the double-stranded oligonucleotides.
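The complementarity of the oligonucleotide pairs listed above can be checked mechanically. The following is a minimal Python sketch, not part of the original procedures; the helper names (`reverse_complement`, `split_oligo`, `check_pair`) are illustrative. It tests whether the uppercase portions of each pair anneal as a duplex and whether the lowercase 3′ overhangs are complementary to each other, which is what end to end ligation requires.

```python
# Oligonucleotide pairs as quoted in the text; lowercase = single-stranded overhang.
OLIGOS = {
    "wt(V)": ("CTTAAAATTACTCTATTATTCGAAttc",
              "TTCGAATAATAGAGTAATTTTAAGgaa"),
    "wt(VI)": ("TTCCCTCTGATTATTGGTCTCCATGAAttc",
               "TTCATGGAGACCAATAATCAGAGGGAAgaa"),
}

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement, preserving letter case."""
    return seq.translate(COMPLEMENT)[::-1]


def split_oligo(oligo: str) -> tuple[str, str]:
    """Split an oligo into its duplex part (uppercase) and overhang (lowercase)."""
    core = "".join(c for c in oligo if c.isupper())
    overhang = "".join(c for c in oligo if c.islower())
    return core, overhang


def check_pair(top: str, bottom: str) -> tuple[bool, bool]:
    """Return (duplex parts complementary, overhangs complementary)."""
    top_core, top_overhang = split_oligo(top)
    bottom_core, bottom_overhang = split_oligo(bottom)
    duplex_ok = reverse_complement(top_core) == bottom_core
    overhang_ok = reverse_complement(top_overhang) == bottom_overhang
    return duplex_ok, overhang_ok


if __name__ == "__main__":
    for name, (top, bottom) in OLIGOS.items():
        print(name, check_pair(top, bottom))
```

Because the two overhangs of each duplex are complementary to each other rather than to themselves, annealed units can join only head to tail, which is consistent with the direct repeats described for the concatemers.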
Probes for gel mobility shift analysis were prepared by labeling isolated restriction fragments at both ends using Klenow polymerase and [³²P]dATP. Under conditions of protein excess, the protein concentration required for half-maximal binding may be considered an estimate of the equilibrium binding coefficient (27). Autoradiographs of the gel mobility shift experiments were scanned by laser densitometry, and the percentage of free probe remaining was plotted against the protein concentration in nM.
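The half-maximal binding estimate described above can be illustrated with a short Python sketch. It is a hedged example rather than the procedure actually used: the function name, the interpolation on a logarithmic concentration axis, and the data points are all assumptions made for illustration (the data are simulated from a simple 1:1 binding curve and are not measurements from this study).

```python
import math


def half_maximal_concentration(protein_nM, percent_free):
    """Interpolate the protein concentration at which 50% of the probe is free.

    Points must be ordered by increasing protein concentration. Interpolation
    is linear in log10(concentration), matching a semi-log binding plot.
    """
    points = list(zip(protein_nM, percent_free))
    for (c_lo, f_lo), (c_hi, f_hi) in zip(points, points[1:]):
        if f_lo >= 50.0 >= f_hi and f_lo != f_hi:
            fraction = (f_lo - 50.0) / (f_lo - f_hi)
            log_c = math.log10(c_lo) + fraction * (math.log10(c_hi) - math.log10(c_lo))
            return 10.0 ** log_c
    raise ValueError("free-probe values never cross 50%")


# Hypothetical titration: percent free probe simulated for Kd = 1.0 nM.
concentrations = [0.1, 0.5, 2.0, 10.0]
free = [100.0 * 1.0 / (1.0 + c) for c in concentrations]

if __name__ == "__main__":
    print(half_maximal_concentration(concentrations, free))
```

The estimate is only meaningful when protein is in excess over probe, so that the free protein concentration is close to the total protein concentration added.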
DNA titration experiments were performed as described (28) with some modifications. The concentration of the DNA fragment to be labeled was determined using a TKO 100 minifluorometer (Hoefer Scientific Instruments), followed by agarose gel electrophoresis and ethidium bromide staining using a plasmid of known concentration as a standard. The concentration of protein that gave rise to a 40-70% shift at the lowest DNA concentration was determined empirically. All the DNA titrations were done in the presence of 0.5 µg/20 µl of poly(dI-dC). The binding reaction was incubated for 30 min at room temperature to ensure that equilibrium was reached. After electrophoresis the gels were dried and analyzed by a PhosphorImager (Bio-Rad).
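The text does not state how the DNA titration data were fitted. Assuming simple 1:1 binding, one common treatment is a Scatchard-type regression of bound/free DNA against bound DNA, whose slope is −1/Kd. The Python sketch below illustrates that approach only; the function names and all numerical values are hypothetical, and the data are simulated rather than taken from this study.

```python
import math


def simulate_bound(dna_total_nM, protein_active_nM, kd_nM):
    """Complex concentration (nM) for 1:1 binding, from the exact quadratic."""
    s = dna_total_nM + protein_active_nM + kd_nM
    return (s - math.sqrt(s * s - 4.0 * dna_total_nM * protein_active_nM)) / 2.0


def scatchard_fit(bound_nM, free_nM):
    """Least-squares line through (bound, bound/free).

    Returns (Kd, active protein concentration), both in nM:
    slope = -1/Kd and intercept = active protein / Kd.
    """
    xs = list(bound_nM)
    ys = [b / f for b, f in zip(bound_nM, free_nM)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    kd = -1.0 / slope
    return kd, intercept * kd


# Hypothetical titration: fixed 0.5 nM active protein, Kd = 0.1 nM.
dna_totals = [0.05, 0.1, 0.2, 0.4, 0.8, 1.6]
bound = [simulate_bound(d, 0.5, 0.1) for d in dna_totals]
free = [d - b for d, b in zip(dna_totals, bound)]
kd_est, protein_est = scatchard_fit(bound, free)

if __name__ == "__main__":
    print(kd_est, protein_est)
```

In this treatment Kd comes from the slope alone, so it does not depend on knowing the active protein concentration beforehand; the intercept instead reports how much protein was active.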
Site-directed Mutagenesis-The single point mutations mut 2 to mut 7, and mut IV of the 3′ MAR were previously described (14). Mut V, mut VI, and mut 8 were made by a PCR-based approach using four primer sets (29). Briefly, complementary oligonucleotides containing the desired mutations were synthesized, and they were used separately as primers in two PCR reactions with either KS or SK primer from the pBluescript polylinker region flanking the 300-bp 3′ MAR. The two PCR products, one containing the desired mutation at its 5′-end and the other at its 3′-end, were mixed at an equimolar ratio, annealed, and amplified by PCR with both KS and SK primers. The amplified fragments containing the mutation were purified with a Wizard PCR preps system (Promega), digested with EcoRI, and subcloned in the EcoRI site of the vector Bluescript. Mut 8 was confirmed by Sanger sequencing. Mut V and mut VI were confirmed by the presence of an XhoI site in mut V or an SpeI site in mut VI, which were introduced by the multiple point mutations (see Fig. 3A). Alanine substitutions were introduced in the homeodomain following the Exsite PCR-based mutagenesis protocol (Stratagene), with TaqPlus and Pfu polymerase (both from Stratagene) and the pGEX2T plasmid containing the (MD + HD)-encoding insert as template. The mutations were designed to introduce a novel restriction site and were confirmed by restriction enzyme digestion and protein expression.

RESULTS
SATB1 Contains a Homeodomain and Cut-like Repeats-In addition to the MAR-binding domain (residues 346-495) previously reported (20), computer-aided homology searches of the Swiss-Prot database (30) identified a homeodomain homology at the C terminus of SATB1 (residues 641-702) (Fig. 1A). Many of the residues that are most conserved among homeodomains are also found in the SATB1 homeodomain, which shares 33% identity with the engrailed class of homeodomains (reviewed in Ref. 31, Fig. 1B). Identities are found with residues that in the x-ray structure of other homeodomains contribute to the hydrophobic core and residues that directly interact with DNA (32). This putative homeodomain is, however, divergent. Major differences include a single amino acid insertion at the end of the first helix and a substitution of the highly conserved WFQ motif in the third helix of known homeodomains by FFQ in both human and mouse SATB1.
In addition to the homeodomain homology, a set of two repeats was found near the center of SATB1 (residues 370-445 and 493-568), similar to the Cut repeats of the Cut and Clox homeoproteins of Drosophila and mammals, respectively. Cut proteins contain a homeodomain and three additional DNA-binding domains of 73 amino acids, called Cut repeats (33-36). The two Cut-like repeats in SATB1 (named here A and B) contain the previously documented repeats box I (residues 382-415 and 505-538) and box II (429-445 and 552-568), respectively (20) (Fig. 1, A and C). Repeat A occurs at the center of the MAR-binding domain of SATB1, but it does not include the N- and C-terminal amino acids that are mandatory for MAR binding (20). The two repeats of SATB1 are 45% identical over 75 residues with each other and display 27-35% identity with the Cut repeats. This similarity is considered to be significant, since no gaps were required for optimal alignment (Fig. 1C).
The Homeodomain Increases Binding Affinity of SATB1 to a MAR-Most homeodomain proteins contain a homeodomain as the sole DNA-binding domain. A group of homeodomain proteins have additional domains that assist the homeodomain in DNA binding specificity (reviewed in Refs. 37 and 22). In the case of SATB1, the MAR-binding domain by itself is sufficient to recognize and bind a specific region (base-unpairing region) within MARs that has a high propensity for base unpairing, and the homeodomain may have a new role in DNA recognition. To explore this possibility, glutathione S-transferase (GST)-SATB1 fusion proteins were constructed; one protein contained the MAR domain and homeodomain linked together in their natural protein context (GST(MD + HD)); one protein had the 60-amino acid homeodomain specifically deleted (GST(MDΔHD)), and one fusion protein contained the homeodomain separately (GST(HD)) (Fig. 2A). These purified fusion proteins were used in quantitative gel mobility shift experiments with a fixed concentration of a synthetic MAR probe, wild-type (25)₅, and increasing protein concentrations. This probe was derived from the core unwinding element of the MAR located 3′ of the IgH enhancer, and it has the same properties as a natural MAR (10). Fig. 2B shows the gel mobility shift experiments and the binding curves that were derived from these autoradiographs. Each of these and the following gel shift experiments were repeated at least three times giving similar results. The isolated HD showed virtually no binding activity for the wt(25)₅ probe; however, when HD was associated with MD (MD + HD), the binding affinity was approximately 10 times higher (Kd = 0.1 nM) than for MDΔHD (Kd = 1.0 nM). The affinity of MDΔHD toward wt(25)₅ was virtually identical to that of the isolated MD alone (GST(MD)), indicating that the C-terminal amino acids from 496 to 763 besides HD have no additional contribution toward binding to wt(25)₅ (data not shown).
HD weakly bound to longer MAR fragments, but this activity was mainly nonspecific, since it could be competed by nonspecific competitors (data not shown). This effect of the homeodomain on binding affinity was confirmed by additional DNA titration experiments, in which the dissociation constants were determined using a fixed protein concentration and increasing concentrations of the DNA probe (Fig. 2C). Dissociation constants determined in this manner are independent of minor variations in the protein concentration determination or the amount of active protein in the protein preparation. The results obtained from gel mobility shift experiments were quantitated using a PhosphorImager (Fig. 2C). The dissociation constants determined by protein titration or DNA titration were similar, indicating that nearly all the protein in the protein sample was in an active form.
The SATB1 Homeodomain Promotes Binding of the MAR Domain to the Core Unwinding Element of the IgH 3′ MAR-SATB1 binds a variety of MARs from different species and selectively recognizes sites within MARs that are prone to become stably base-unpaired under negative superhelical strain (13). The structural properties and the SATB1 binding sites of the 5′ MAR and the 3′ MAR, which flank the IgH enhancer, were previously characterized (13, 14) (Fig. 3A). These natural MARs were used as probes in quantitative gel mobility shift experiments to determine whether HD can increase binding affinity to MARs in general. When the 5′ MAR fragment was used as probe, HD had no effect on the binding affinity; both MD + HD and MDΔHD exhibited nearly equal affinity to the 5′ MAR (Kd = 7 and 10 nM, respectively) (Fig. 3B). In contrast, for the 3′ MAR, the dissociation constants (Kd) for MD + HD and MDΔHD were 2.5 and 15 nM, respectively (Fig. 3B). This differential effect of the homeodomain could be due to the different structural properties that distinguish these two MARs. Both MARs contain a base-unpairing region, but only the IgH 3′ MAR has a core unwinding element. The unwinding propensity is much greater for the 3′ MAR than the 5′ MAR; in a supercoiled plasmid, significant unwinding can be detected in the 5′ MAR only when the 3′ MAR is deleted (14).
Fig. 3. A, the ATC sequence cluster in the IgH 3′ MAR is shown. Each ATC sequence is indicated by a bracket, and the SATB1 direct contact sites are shown by double-headed arrows and roman numerals. Residues that constitute the core unwinding element within site IV are indicated by filled dots. The sequences of the mutated binding sites are shown below, with an asterisk to mark the mutated residues. B, gel mobility shift experiments and binding curves comparing the affinities of (MD + HD) and (MDΔHD) to wild-type 5′ MAR, wild-type 3′ MAR, and mut IV.
The core unwinding element is defined as a short discrete site that resists base pairing even under conditions that greatly favor a double-stranded configuration, and mutation of these sites results in a complete loss of the unwinding propensity of the MAR. Previous missing nucleoside experiments (13) showed that SATB1 directly contacts three sites in the 5′ MAR (sites I, II, and III) when the isolated 5′ MAR was used as a substrate. Using the 3′ MAR as a substrate, three adjacent ATC sequence stretches (sites IV, V, and VI) were detected as the SATB1 contact sites (Fig. 3A). Binding site IV overlaps with the core unwinding element and is the major binding site, since SATB1 makes contacts with sites V and VI only when site IV is mutated and is no longer bound (13).
To determine whether the homeodomain in SATB1 contributes to this preferential recognition of site IV, we used mutated MAR fragments as probes in gel mobility shift experiments with GST(MD + HD) and GST(MDΔHD). Each of these mutated MARs had one of the three sites destroyed by mutation and two sites intact (Fig. 3A). The affinity of the MAR-binding domain alone (MDΔHD) to each of the three mutated fragments was nearly the same as to wild-type 3′ MAR with estimated Kd values of 15, 20, 22, and 12 nM for 3′ MAR, mut IV, mut V, and mut VI, respectively (Fig. 3B, only the results for wild-type 5′ and 3′ MARs and mut IV are shown). This result indicates that the MAR domain, in the absence of the homeodomain, cannot effectively distinguish among the three sites in the ATC sequence cluster. Regardless of which site was mutated, binding by (MDΔHD) was unaffected. On the other hand, the homeodomain together with the MAR-binding domain (MD + HD) exhibited a significantly reduced binding affinity to mut IV (Kd = 14 nM) compared with wild-type 3′ MAR (Kd = 2.5 nM) (Fig. 3B). No significant decrease in binding affinity was detected for MD + HD to mut V or mut VI compared with wild type, as long as site IV remained intact (data not shown). These results also show that the HD-mediated increase in affinity to the 3′ MAR does not merely reflect a cooperativity of binding, caused by the presence of adjacent binding sites. If this were the case, any one of the three mutations would be expected to abolish the effect of the homeodomain and not just mutation of site IV. In fact, binding of (MD + HD) is virtually noncooperative, since a Hill coefficient of 1.2 was determined. On the contrary, the weaker binding of (MDΔHD) to 3′ MAR appears to be cooperative, with a Hill coefficient of 2.2 (data not shown).
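A Hill coefficient such as those quoted above is conventionally obtained as the slope of a Hill plot, log(θ/(1 − θ)) against log[protein], where θ is the fraction of probe bound. The Python sketch below illustrates that calculation under stated assumptions: the function names are illustrative, the half-saturation values are arbitrary, and the binding fractions are simulated rather than taken from the gel shift data.

```python
import math


def hill_fraction(conc_nM, k_half_nM, n):
    """Fraction of probe bound for a Hill-type binding curve."""
    return conc_nM ** n / (k_half_nM ** n + conc_nM ** n)


def hill_coefficient(protein_nM, fraction_bound):
    """Least-squares slope of log10(theta / (1 - theta)) vs log10([protein])."""
    xs, ys = [], []
    for conc, theta in zip(protein_nM, fraction_bound):
        if 0.0 < theta < 1.0:  # the Hill transform is undefined at 0 and 1
            xs.append(math.log10(conc))
            ys.append(math.log10(theta / (1.0 - theta)))
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical titrations simulated with Hill coefficients of 1.2 and 2.2.
concentrations = [1.0, 2.0, 5.0, 10.0, 20.0, 50.0]
near_noncooperative = [hill_fraction(c, 2.5, 1.2) for c in concentrations]
cooperative = [hill_fraction(c, 15.0, 2.2) for c in concentrations]

if __name__ == "__main__":
    print(hill_coefficient(concentrations, near_noncooperative))
    print(hill_coefficient(concentrations, cooperative))
```

A slope close to 1 indicates independent binding, whereas a slope well above 1 indicates that binding at one site favors binding at another.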
The contribution of the homeodomain in directing SATB1 to the core unwinding element was further confirmed by employing concatemers of each site with short surrounding sequences as probes in gel mobility shift experiments (data not shown). The concatemer wt(IV)₅ is identical to the previously described synthetic MAR wild-type (25)₅ (10). If the homeodomain does assist the MAR-binding domain to preferentially recognize the core unwinding element, it should specifically increase affinity to wt(IV)₅ but not to wt(V)₅ or wt(VI)₅. Indeed, the increase in binding affinity observed with (MD + HD) compared with (MDΔHD) was 10-fold for wt(IV)₅ but less than 2-fold for wt(V)₅ and wt(VI)₅ (data not shown). Thus, the homeodomain of SATB1 contributes to binding specificity by selectively increasing the affinity to site IV that contains the wild-type core unwinding element. It should be noted that, although the MAR-binding domain alone cannot distinguish among the three sites in the natural context of the 3′ MAR, when each binding site was concatemerized and used separately as probe, (MDΔHD) showed moderate preference for site IV over site V and site VI. This preference for site IV, however, was much more pronounced when MD was associated with HD.
These results strongly indicate that in the context of the 3′ MAR fragment, the MAR-binding domain of SATB1 is sufficient for the ATC sequence context recognition, because it can bind to any one of the three sites in the ATC sequence cluster of the IgH 3′ MAR with comparable affinities. The specific recognition of the core unwinding element within the 300-bp MAR fragment, however, requires the association of the MAR-binding domain with the homeodomain. The homeodomain appears to direct SATB1 toward a preferential recognition of the core unwinding element in a cluster of ATC sequences, as illustrated in Fig. 5.
Specific Mutations within the N-terminal Arm of the SATB1 Homeodomain Reduce Homeodomain Activity-Homeodomains generally contact DNA by two separate regions. An N-terminal arm lies in the minor groove, where specific DNA contacts are mediated by Arg-3 and Arg-5. The third α-helix or recognition helix fits in the major groove of the recognition site, and Gln-50 and Asn-51 were shown to specifically contact DNA (32, 37). These residues are conserved in the SATB1 homeodomain, and we tested by site-directed mutagenesis whether these residues are required for the homeodomain-mediated increase in affinity. In GST(MD + HD), Arg-3 and Arg-5 of the N-terminal arm of the homeodomain were substituted with alanine residues (mut R3R5), and in the putative third helix the FQN motif (position 50-52) was replaced with alanine residues (mut FQN) (Fig. 4A). Mut R3R5 showed a 4.4-fold decrease in binding affinity to the 3′ MAR in comparison to that of wild-type MD + HD (Fig. 4B). The effect of mut R3R5 is, therefore, comparable to the effect of the homeodomain deletion that resulted in a 6-fold decrease in affinity. Mut FQN showed an intermediate effect on binding affinity by exhibiting a 2.4-fold decrease in binding (Fig. 4B). Thus, the major contribution of the homeodomain is mediated by its N-terminal arm, most likely in the minor groove. This binding may be supported by the interaction of the third helix of the homeodomain in the major groove.
The Homeodomain Recognizes a Short (C/A)TAATA Motif That Colocalizes with the Core Unwinding Element-To examine if specific residues in binding site IV are necessary for homeodomain recognition, we analyzed a series of single point mutations as shown in Fig. 4B, left panel. Among these, mut 4, mut 5, and mut 6 each had one of the three base substitutions made in mut IV. These single point mutations did not alter the high unpairing propensity of DNA sequences in the 3′ MAR (14). When Kd values were determined for (MD + HD) versus (MDΔHD) using these singly mutated fragments, it was found that the homeodomain did not increase binding affinity of the GST-fused SATB1 to mut 5, mut 6, or mut 7 (just like for mut IV). Mut 8, in which 5′-CTAATA-3′ was replaced with 5′-ATAATA-3′, had an intermediate effect; the homeodomain still increased binding affinity to mut 8, although to a lesser extent than wild type. These experiments show that the specific sequence 5′-(C/A)TAATA-3′ (742-747), located within binding site IV 5′-TTCTAATATAT-3′ (740-750), is essential for recognition by the SATB1 homeodomain. The MAR domain alone did not distinguish the point mutations in the 300-bp 3′ MAR fragment; the Kd values for (MDΔHD) were essentially the same for wild-type 3′ MAR, mut IV, and mut 2-8 (Fig. 4B).
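The position of the (C/A)TAATA motif within site IV can be reproduced with a simple pattern search. The Python sketch below is illustrative only: the function name and the `DISRUPTED` sequence are hypothetical (the latter is not one of the mutants analyzed here), whereas the site IV sequence, its coordinates, and the mut 8 substitution are taken from the text.

```python
import re

# (C/A)TAATA written as a regular expression; the lookahead allows
# overlapping occurrences to be reported.
MOTIF = re.compile(r"(?=([CA]TAATA))")


def find_motif(seq, first_position=1):
    """Return (start, end, matched sequence) for each motif occurrence.

    Coordinates are inclusive and numbered from first_position.
    """
    hits = []
    for match in MOTIF.finditer(seq.upper()):
        start = match.start() + first_position
        hits.append((start, start + 5, match.group(1)))
    return hits


SITE_IV = "TTCTAATATAT"                        # binding site IV, positions 740-750
MUT_8 = SITE_IV.replace("CTAATA", "ATAATA")    # C-to-A substitution described for mut 8
DISRUPTED = "TTCTAGTATAT"                      # hypothetical variant, for contrast only

if __name__ == "__main__":
    print(find_motif(SITE_IV, 740))
    print(find_motif(MUT_8, 740))
    print(find_motif(DISRUPTED, 740))
```

With site IV numbered from position 740, the single hit spans 742-747, in agreement with the coordinates given above, and the mut 8 sequence still satisfies the degenerate motif at the same position.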
Furthermore, mut R3R5 had a similar effect on binding affinity as the homeodomain deletion (MDΔHD). This series of experiments indicates that the specificity of SATB1 toward the core unwinding element of the 3′ MAR is achieved by the presence of both the MAR-binding domain and the homeodomain. It remains to be established whether the homeodomain, when linked to the MAR domain in the natural protein context, directly contacts DNA.

DISCUSSION
SATB1, a cell type-specific MAR-binding protein essential for T-cell development, contains a MAR-binding domain and a newly identified atypical homeodomain. These two domains act together to confer binding specificity toward the core unwinding element found within a MAR.
Multiple Domain Structure of SATB1-The MAR-binding protein SATB1 contains a MAR-binding domain, a homeodomain, and two Cut-like repeats. The SATB1 homeodomain is unique among known homeodomains; a striking feature is the replacement of the invariant tryptophan at position 49 of the homeodomain with a phenylalanine in SATB1. This may have important implications for structure and function of the protein, since tryptophan 49 is not only conserved in all the homeodomains so far identified but is also essential for homeodomain function. Mutations of the WFQ motif containing the tryptophan 49 in the oct-1 homeodomain abolished DNA binding (38), and the mutant phenotype of dwarf mice, characterized by abnormal development of the anterior pituitary gland, is caused by a single point mutation that replaces tryptophan with cysteine in the POU homeodomain of pit-1 (39).
The presence of Cut-like repeats and a homeodomain in SATB1 suggests structural similarity to the Cut proteins identified from various species (34,36,40,41). Cut proteins contain a set of three cut repeats followed by a homeodomain. The phenotype of mutants in Drosophila suggests a role for cut protein in cell specification in several tissues including the wing (42), the external sensory organs (34), and Malpighian tubules (43). SATB1 may be considered a distant relative of the cut family of proteins; however, the SATB1 homeodomain shares more homology with the homeodomain of engrailed (33% identity) than with that of Cut proteins (26% identity). Furthermore, unlike known Cut repeats that were shown to be specific DNA-binding domains (33,35,44), the Cut-like repeats in SATB1 did not appear to bind SATB1-binding sites. It remains to be established if the SATB1 cut-like repeats recognize other DNA sequences that were not tested here.
Homeodomain Contribution to MAR Binding-The isolated SATB1 homeodomain exhibits only very weak nonspecific binding activity to base-unpairing sequences. The MAR domain, on the other hand, can bind independently with high affinity and specificity; it distinguishes MARs that can unwind from mutated MARs that have lost this capability. Thus, the homeodomain initially appeared to be nonsignificant in DNA binding. However, a unique function is now attributed to this homeodomain. When associated with the MAR domain in the natural protein context, the SATB1 homeodomain directs the MAR domain to the core unwinding element of a MAR. This distinguishes SATB1 from the way by which Paired protein, the Drosophila Cut, and the mammalian Cut-like proteins recognize their target DNA. In these proteins, the homeodomains bind DNA independently, and the associated domains contribute to binding specificity by making additional DNA contacts (33, 35, 44, 45). The SATB1 homeodomain is similar to the homeodomains in the POU transcription factors, which cannot bind independently or bind with low affinity and relaxed specificity (reviewed in Ref. 46). In the case of the POU transcription factors, both the POU domains and the homeodomains are equally required for high affinity binding, and together they form a bipartite binding domain (38). For SATB1, on the other hand, the MAR domain alone displays fully functional MAR-binding activity, and the contribution of the homeodomain results in further selection of specific elements embedded within a MAR sequence context. The contribution of the homeodomain is small, however, and was previously missed when the minimum domain that confers MAR binding was delineated (20). This in part could be due to the active protein component in the full-sized bacterially produced SATB1 not being accurately determined.
The dissection of SATB1 protein in individual components has brought to light how these multiple levels of recognition are ultimately put together to achieve a high degree of binding site specificity that is unprecedented among MAR-binding proteins. This is illustrated in Fig. 5. We had previously shown that SATB1 does not bind MARs merely on the basis of their high AT content but that it specifically recognizes AT-rich regions in MARs that have a high propensity for base unpairing, and within these base-unpairing regions it exhibits a preference for binding to the core unwinding element (13). First, we showed in a separate study using a phage display library of random peptides that a short peptide homologous to the N-terminal arm of the MAR-binding domain can effectively recognize AT-rich DNA (47). This suggests that the short homologous N- and C-terminal amino acid stretches of the MAR-binding domain are individually sufficient for recognizing AT-rich DNA, but to distinguish between AT-rich DNA with high unwinding propensity and DNA that lacks this property, the entire 150-amino acid MAR-binding domain is required. Within an AT-rich DNA sequence with high unwinding propensity, the specific recognition of the core unwinding element that is critical in affecting overall DNA structure of the MAR (14) is achieved by the combined action of a unique homeodomain and a MAR-binding domain. Core unwinding elements have been identified in several other MARs, such as in the MAR at the 5′ boundary of the human β-globin locus control region (48). These elements are remarkably similar to the SATB1 homeodomain recognition site of the IgH 3′ MAR, which suggests that SATB1 may exhibit preference for core unwinding elements in general.
Unusual Mode of Binding of the MAR Domain and Homeodomain in SATB1-The MAR-binding domain in SATB1 binds DNA in the minor groove, making little contact with DNA bases. SATB1 presumably recognizes DNA sequences indirectly by binding to the altered sugar phosphate backbone structure dictated by a specific DNA sequence context (13). Although the homeodomain in SATB1 does not bind DNA independently, mutagenesis of the target DNA revealed that a specific sequence, 5′-(C/A)TAATA-3′, in the SATB1 binding site IV is necessary for the increase in affinity mediated by the homeodomain. Furthermore, the increase in affinity was almost completely abolished by alanine substitutions of arginine residues in the N-terminal arm of the SATB1 homeodomain, a region known in other homeodomains to bind the minor groove. The corresponding region of other homeodomains was found to be flexible and to lack any secondary structure, as shown by NMR and x-ray crystallography (reviewed in Ref. 49). Therefore, the effect resulting from alanine substitutions of the two arginine residues is unlikely to be a consequence of a change in the overall protein folding. These results taken together suggest, but do not prove, that the homeodomain, in the context of the SATB1 protein, may directly contact the target DNA site in the minor groove. Unlike in other homeodomains, mutagenesis of residues in the third helix, which is known to interact with the major groove, has only a minor effect on SATB1 binding. This finding is consistent with previous results showing that SATB1 is a minor groove binding protein.
The SATB1 homeodomain recognition sequence found in site IV is similar to the homeodomain binding site consensus, the TAAT core (22,50), and it overlaps with the direct SATB1 contact site IV. Missing nucleoside experiments revealed no additional contacts with (MD + HD) compared with (MDΔHD) (data not shown). This result, taken together with the fact that the sequence 5′-(C/A)TAATA-3′ in site IV is responsible for the positive effect of the homeodomain in SATB1 binding, may suggest that upon binding to a MAR, the SATB1 homeodomain and the MAR domain contact the same site simultaneously, possibly from opposite sides of the DNA helix. Crystal structure analysis will be needed to determine whether the SATB1 homeodomain in its natural protein context makes direct contact with DNA. It is of interest that the crystal structure of the even-skipped homeodomain showed that two homeodomains bind one 10-bp consensus sequence on both faces of the DNA, without any steric hindrance (51). Such simultaneous occupation of one site from both sides of the DNA helix could provide significant stability to the protein-DNA complex. This protein-DNA interaction is unusual, however. The multiple DNA-binding domains found in the POU, Cut, and Paired proteins bind to sites that are juxtaposed. Similarly, in the transcription factor Oct-1, the POU-specific domain and the homeodomain were suggested to occupy adjacent positions in the major groove (52).
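As an illustration only, occurrences of the sequence 5′-(C/A)TAATA-3′ discussed above can be located in a candidate DNA sequence by a simple pattern search on both strands. The Python sketch below is not part of the experimental work reported here; the function names and the example sequences are hypothetical, and a sequence match by itself says nothing about unwinding propensity or SATB1 binding.

```python
import re

# The motif 5'-(C/A)TAATA-3' written as a regular expression.
# A lookahead is used so that overlapping occurrences are also reported.
MOTIF = re.compile(r"(?=([CA]TAATA))")
MOTIF_LENGTH = 6

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_motif_sites(seq):
    """Return a list of (strand, start, match) tuples.

    'start' is the 0-based position of the leftmost base of the site on
    the input (top) strand, for matches on either strand.
    """
    seq = seq.upper()
    sites = []
    # Top strand: positions are reported directly.
    for m in MOTIF.finditer(seq):
        sites.append(("+", m.start(), m.group(1)))
    # Bottom strand: search the reverse complement, then convert the
    # position back to top-strand coordinates.
    for m in MOTIF.finditer(reverse_complement(seq)):
        start = len(seq) - (m.start() + MOTIF_LENGTH)
        sites.append(("-", start, m.group(1)))
    return sites


if __name__ == "__main__":
    # Hypothetical test sequence, not taken from the IgH 3' MAR.
    print(find_motif_sites("GGCTAATAGG"))
```

Calling `find_motif_sites` on a longer AT-rich sequence returns every candidate site with its strand, which could serve as a starting list for mutagenesis or binding assays of the kind described in the text.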
Biological Significance of SATB1 Recognition of MARs-Homeodomains represent the hallmark of developmental regulatory proteins (reviewed in Ref. 21), and the presence of this domain in a MAR-binding protein is unprecedented. In this regard, SATB1 is unique among the several other proteins that preferentially bind MARs in vitro, including nucleolin (15), topoisomerase II (19), histone H1 (53), the high mobility group proteins HMG I/Y (54), lamin B1 (18), ARBP (55), and hnRNPU (SAF-A) (56-58). In fact, a recent study of SATB1 knockout mice showed that SATB1 ablation results in a major defect in T-cell development and alterations in expression of multiple genes. 3 Genomic DNA sequences that are bound to SATB1 in vivo have recently been characterized based on cross-linking techniques. This study revealed that in the nucleus SATB1 actually binds DNA sequences containing ATC sequence clusters and that these sequences are tightly bound to the nuclear matrix, representing MARs. 4 This result, together with the results from the SATB1 knockout experiments, suggests that higher order chromatin structure may be involved in T-cell-specific gene regulation. Such regulation could be directed toward MARs at the base of chromatin loops, in particular toward the core unwinding elements, as specified by the combined action of the MAR-binding domain and the homeodomain of SATB1.
the manuscript, and Dr. Joel Gottesfeld for expert advice and constructive criticism of the manuscript.