CCAAT Displacement Activity Involves CUT Repeats 1 and 2, Not the CUT Homeodomain*

The CCAAT displacement protein, the homolog of the Drosophila melanogaster CUT protein, contains four DNA-binding domains: three CUT repeats (CR1, CR2, and CR3) and the CUT homeodomain (HD). Using a panel of fusion proteins, we found that a CUT repeat cannot bind to DNA as a monomer, but that certain combinations of domains exhibit high DNA-binding affinity: CR1+2, CR3HD, CR1HD, and CR2HD. One combination (CR1+2) exhibited strikingly different DNA-binding kinetics and specificities. CR1+2 displayed rapid on and off rates and bound preferably to two C(A/G)AT sites, organized as direct or inverted repeats. Accordingly, only CR1+2 was able to bind to the CCAAT sequence, and its affinity was increased by the presence of a C(A/G)AT site at close proximity. A purified CCAAT displacement protein/CUT protein exhibited DNA-binding properties similar to those of CR1+2; and in nuclear extracts, the CCAAT displacement activity also required the simultaneous presence of a C(A/G)AT site. Moreover, CR1+2, but not CR3HD, was able to displace nuclear factor Y. Thus, the CCAAT displacement activity requires the presence of an additional sequence (CAAT or CGAT) and involves CR1 and CR2, but not the CUT homeodomain.

A CCAAT displacement activity was identified originally in sea urchin and later in humans and other mammals (1-3). Sequence analysis of the cDNA encoding the human CCAAT displacement protein (CDP) revealed a high degree of conservation with the Drosophila melanogaster homeodomain protein CUT (4, 5). Several lethal and viable cut mutations have been reported in Drosophila (reviewed in Ref. 6). Altogether, genetic studies in Drosophila suggested that the CUT protein plays an important role in determining cell-type specificity in several tissues late in development (11-13, 15). In mammals, CDP/CUT has been characterized primarily as a transcriptional repressor (16-19). The bulk of the results indicated that CDP/CUT expression or activity may be restricted to proliferating cells. Accordingly, many of the identified targets of CDP/CUT are genes that are repressed in proliferating precursor cells and are turned on as cells become terminally differentiated and CDP/CUT activity ceases (1, 3, 5, 16, 20-25). It was thus proposed that mammalian CDP/CUT proteins function as transcriptional repressors that inhibit the expression of terminal differentiation genes during early stages of differentiation (3, 16, 26). In addition, a role for CDP/CUT in cell cycle progression was suggested from the findings that CDP/CUT DNA-binding activity oscillates during the cell cycle, reaching a maximum at the end of G1 and during the S phase (27). CDP/CUT was shown to repress the p21waf1/cip1/sdi1 gene and also to bind to the promoters of various histone genes, which are regulated in a cell cycle-dependent manner (1, 27-32). Intriguingly, binding of CDP/CUT to histone gene promoters has been associated with both activation and repression of these genes, and it has been suggested that CDP/CUT activity could be modulated by Rb and Rb-related proteins (1, 28, 30-32).
Sequence homology between Drosophila and mammalian CUT proteins is limited to five evolutionarily conserved domains: a region predicted to form a coiled-coil structure; three regions of ~70 amino acids, the CUT repeats (CR1, CR2, and CR3, which share from 52 to 63% amino acid identity with each other); and a CUT-type homeodomain (HD) (4, 5). The high degree of conservation of CUT repeats suggested that they may have an important biochemical function. Indeed, CUT repeats were found to function as specific DNA-binding domains (33-36). CDP/CUT proteins therefore are unique in that they contain four DNA-binding domains: the CUT homeodomain and the three CUT repeats.
The available data suggest that CDP/CUT has the capability to bind to a wide range of DNA sequences. Reported binding sites include sequences related to CCAAT, ATCGAT, Sp1 sites, and AT-rich matrix attachment regions (23, 37-39). On the other hand, PCR-mediated site selection with GST fusion proteins containing various CDP/CUT DNA-binding domains led to the isolation of several types of sequences that could be aligned with either ATNNAT (mainly ATCGAT) or CCAAT. However, a sizable fraction of the selected sequences (~20%) diverged greatly from these consensus sequences and yet represented excellent binding sites (33, 34, 36). These results therefore indicate that CDP/CUT proteins can tolerate a certain degree of flexibility in their DNA targets. This property may not be unique to CDP/CUT proteins, as a similar relaxed sequence specificity was found for the GATA factors when submitted to PCR-mediated site selection (40, 41).
The evolutionary conservation of four DNA-binding domains within the same protein strongly suggests that this peculiar organization permits the execution of functions that could not otherwise be fulfilled. At this point, however, it is not well understood how CDP/CUT proteins interact with DNA. So far, DNA binding by CUT repeats has been investigated using glutathione S-transferase fusion proteins, which exist as dimers (33-36). It has not been convincingly demonstrated whether each CUT repeat can bind DNA as a monomer or whether it requires cooperation with another CUT repeat or with the CUT homeodomain. As a monomer, CR3 was shown to cooperate with the CUT homeodomain to bind to DNA with high affinity and specificity (35, 36), yet it is not known whether CR1 and CR2 can cooperate with each other or with the CUT homeodomain. If this were the case, we could extrapolate that various combinations of CUT repeats and the CUT homeodomain may confer on the protein the capacity to interact with a large spectrum of DNA sequences. Moreover, if CDP/CUT proteins were capable of binding simultaneously to multiple binding sites, then cooperativity should increase the overall affinity for a given DNA segment. Another issue that remains to be investigated is which of the four CDP/CUT DNA-binding domains are responsible for the CCAAT displacement activity and what the DNA sequence requirements for this activity are.
To begin to decipher the various modes of interaction of CDP/CUT with DNA, we have expressed each CUT repeat as a monomer, either alone or with another CUT repeat or the CUT homeodomain. As monomers, either two CUT repeats or one CUT repeat and the CUT homeodomain were required for efficient DNA binding. The DNA-binding properties of the most efficient combinations (CR1+2 and CR3HD) were characterized and compared with those of the full-length CDP/CUT protein. We then defined the DNA sequence requirement for the CCAAT displacement activity and investigated the contribution of each domain to this activity. In contrast to what was previously reported, our results demonstrate that CR1 and CR2 are responsible for the CCAAT displacement activity, without the participation of the CUT homeodomain. A model is presented to illustrate the known CDP/CUT DNA-binding activities.

MATERIALS AND METHODS
Plasmid Construction-Plasmids for expression of histidine-tagged fusion proteins were prepared by inserting various fragments from the human CDP/CUT cDNA (GenBank/EBI Data Bank accession number M74099) into the bacterial expression vector pET-15b (Novagen) (5): CR1, nt 1605-2019 into the XhoI site of pET-15b; CR2, nt 2861-3153 into the XhoI site of pET-15b; CR1HD and CR2HD, a fragment encoding the homeodomain (nt 3737-3949) placed in frame at the carboxyl terminus of His-tagged CR1 and CR2 into the vectors described above, respectively; CR1+2, nt 1694-3127 into the BamHI site of pET-15b; and CR2+3, nt 2861-3737 into the BamHI site of pET-15b. For CR1-L-L-HD, three separate fragments (nt 1694-2853, 3071-3409, and 3673-3963) were placed in frame into the XhoI site of pET-15b. For CR2-L-HD, two fragments (nt 2853-3409 and 3673-3963) were placed in frame into the BamHI site of pET-15b. The MBP-CR3 and MBP-CR3HD vectors have previously been described (36). MBP-CR1 was prepared by inserting nt 1605-2019 into the pMal-C2 plasmid (New England Biolabs Inc.). For expression of the full-length CDP/CUT protein in Sf9 insect cells, nt 27-5101 were inserted into pBlueBac His2b (Invitrogen). The resulting plasmid was cotransfected with a helper plasmid to obtain baculoviruses expressing CDP/CUT.
Expression and Purification of CDP/CUT Fusion Proteins-pET-15b- and pMal-C2-derived vectors were introduced into the BL21(DE3) or DH5 strain of Escherichia coli and induced with isopropyl-β-D-thiogalactopyranoside. Sf9 insect cells were infected with baculovirus encoding His-tagged CDP/CUT and incubated for 3 days. The fusion proteins were purified by affinity chromatography following procedures recommended by the suppliers (Invitrogen).
Electrophoretic Mobility Shift Assay (EMSA)-EMSAs were performed with either 10 ng of purified fusion protein or 5 μg of nuclear extract from mammalian cells. The samples were incubated at room temperature for 5 min in a final volume of 30 μl of 25 mM NaCl, 10 mM Tris, pH 7.5, 1 mM MgCl2, 5 mM EDTA, pH 8.0, 5% glycerol, and 1 mM dithiothreitol with 30 ng of poly(dI·dC) and 3 μg of bovine serum albumin as nonspecific competitors. End-labeled double-stranded oligonucleotides (~10 pg) were added and further incubated for 15 min at room temperature. Samples were loaded on a 5 or 4% polyacrylamide gel (30:1) and separated by electrophoresis at 8 V/cm in 0.5× Tris borate/EDTA. Gels were dried and visualized by autoradiography.
PCR-mediated Random Site Selection-Binding site selections were performed essentially as described previously using 50 ng of His-tagged CR1+2 fusion protein in the first selection cycle and then 10 ng in the subsequent cycles (36, 42). The sequence of the oligonucleotide used was 5'-AGACCTGCAGTCTGC(N)15CTGTCGTCTAGAGGA-3'. Protein·DNA complexes were separated from the free oligonucleotides by electrophoresis on a 5% polyacrylamide gel (30:1). After the fifth cycle, the PCR products were digested with PstI and XbaI and cloned into the plasmid pBluescript KS (Stratagene). Sequencing of the inserts was performed with the T7 polymerase sequencing kit (U. S. Biochemical Corp.).
Calculation of DNA-binding Affinity-To determine the dissociation constant (K_D), EMSAs were performed essentially as described above, but using a fixed amount of DNA (10 pM) and a wide range of protein concentrations with the following modifications: <10 pM DNA was used, and protein and DNA were incubated for 15 min at room temperature. The binding affinity (K_D) was calculated using the method described by Carey (43, 44). The amounts of free and bound DNA were quantitated by scanning of the autoradiograms on a PhosphorImager (Fuji). Scintillation counting of the excised bands in one case gave similar results. The data were plotted as the fraction of free DNA versus log of protein concentration. Since the protein concentrations did not take into account the fraction of inactive proteins, which was estimated in independent experiments to be <30% in each case, our data are referred to as the apparent dissociation constant (K_D(app)).
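As a rough illustration of how a K_D(app) can be read from a plot of free DNA versus log protein concentration, the sketch below assumes simple 1:1 binding with the probe at trace concentration, so that the fraction of free DNA is K_D / (K_D + [protein]) and K_D(app) is the protein concentration at which half of the probe is free. This is not a reproduction of the cited method; the function names and the synthetic titration are hypothetical, and the half-point is simply interpolated on a log10 axis.

```python
import math

def fraction_free(protein_molar, kd_molar):
    """Fraction of probe left unbound, assuming 1:1 binding and trace DNA."""
    return kd_molar / (kd_molar + protein_molar)

def estimate_kd_app(protein_molar, free_fraction):
    """Interpolate, on a log10 axis, the protein concentration at 50% free DNA."""
    points = sorted(zip(protein_molar, free_fraction))
    for (p_lo, f_lo), (p_hi, f_hi) in zip(points, points[1:]):
        # Look for the pair of neighboring points that brackets the half-point.
        if f_lo >= 0.5 >= f_hi and f_lo != f_hi:
            t = (f_lo - 0.5) / (f_lo - f_hi)
            log_kd = math.log10(p_lo) + t * (math.log10(p_hi) - math.log10(p_lo))
            return 10 ** log_kd
    raise ValueError("titration does not cross 50% free DNA")

# Synthetic titration in half-log steps from 1e-11 M to 1e-7 M.
# Illustrative values only; these are not measurements from the paper.
ASSUMED_KD = 1.6e-9
concs = [10 ** (e / 2) for e in range(-22, -13)]
free = [fraction_free(p, ASSUMED_KD) for p in concs]
kd_est = estimate_kd_app(concs, free)
print(f"estimated K_D(app) = {kd_est:.2e} M")
```

Because the interpolation is linear between sampled points, the estimate is only as good as the spacing of the titration around the half-point; a nonlinear fit of the full curve would be the more careful choice for real data.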
On (k_on) and Off (k_off) Rate Determination-To estimate the on rate, the reaction was performed at 4°C. 50 ng of purified protein was preincubated at 4°C for 30 min in EMSA binding buffer prior to adding 1 ng of radiolabeled probe. Once the probe was added to the reaction mixture, an aliquot was taken and loaded on a nondenaturing polyacrylamide gel at different time points in the presence of electric current. To estimate the off rate, 50 ng of purified protein was incubated with 1 ng of radiolabeled probe for 15 min. A 1000-fold excess of the unlabeled probe was added to the reaction mixture, and aliquots were taken and loaded on a nondenaturing polyacrylamide gel at different time points in the presence of electric current.
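The off-rate experiment above yields the amount of retarded complex remaining at successive times after competitor is added. A minimal sketch of how such a time course could be turned into a rate constant is given below, assuming first-order dissociation (bound(t) = bound(0) · exp(-k_off · t)) and no rebinding of labeled probe. The paper reports the kinetics qualitatively and does not describe such a fit; the function names and the time course are hypothetical.

```python
import math

def fit_k_off(times_min, bound_fraction):
    """Estimate k_off (per minute) as minus the least-squares slope of ln(bound) versus time."""
    points = [(t, math.log(b)) for t, b in zip(times_min, bound_fraction) if b > 0]
    if len(points) < 2:
        raise ValueError("need at least two time points with detectable complex")
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in points)
    return -sxy / sxx

def half_life_min(k_off_per_min):
    """Half-life of the complex for first-order dissociation."""
    return math.log(2) / k_off_per_min

# Hypothetical time course (minutes after adding unlabeled competitor).
# Generated from an assumed k_off of 0.7 per minute; not data from the paper.
times = [0.0, 0.5, 1.0, 2.0, 4.0]
bound = [math.exp(-0.7 * t) for t in times]
k_off = fit_k_off(times, bound)
print(f"k_off = {k_off:.3f} per min, half-life = {half_life_min(k_off):.2f} min")
```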
Preparation of Nuclear Cell Extract-Monolayers of NIH3T3 cells were grown in Dulbecco's modified Eagle's medium supplemented with 10% fetal bovine serum. Nuclear extracts were prepared according to the procedure of Lee et al. (45), except that nuclei were obtained by submitting cells to three freeze/thaw cycles in buffer containing 10 mM Hepes, pH 7.9, 10 mM KCl, 1.5 mM MgCl2, and 1 mM dithiothreitol.
Methylation Interference Assay-This assay was performed essentially as described previously (46). The probe was prepared from the pBluescript KS vector, whose polylinker includes an ATCGAT motif. To label the upper or lower DNA strand, the plasmid was digested with either EcoRI or XhoI, end-labeled with [γ-32P]ATP using T4 polynucleotide kinase, and digested with the second enzyme (XhoI or EcoRI). The probes were purified from a nondenaturing acrylamide gel by passive elution. The labeled probes were partially methylated at purines using dimethyl sulfate. 20,000 cpm of partially methylated probe was used in EMSAs as described above. The free and retarded DNAs were visualized by autoradiography and purified by passive elution. The DNAs were cleaved at guanine and adenine bases using 0.2 M NaOH. The digested products were analyzed on a 12% denaturing polyacrylamide gel.

Monomers of CUT Repeats Do Not Efficiently Bind to DNA-The DNA-binding properties of CUT repeats have previously been investigated using GST fusion proteins (33-36, 47). Since GST fusion proteins exist as dimers, these prior studies did not resolve whether individual CUT repeats can bind to DNA (36, 48, 49). To address this issue, we expressed each CUT repeat as a fusion protein with a histidine tag and assessed their DNA-binding properties in EMSAs using 29FP oligonucleotides encoding a CDP/CUT consensus binding site (ATCGAT). Representative results using 10 ng of His-tagged fusion proteins are shown in Fig. 1B. Similar results were obtained when MBP was fused to CR1, CR3, or CR3HD (data not shown). When expressed as monomeric fusion proteins, the CUT repeats and the CUT homeodomain were not able to bind efficiently to DNA (Fig. 1B, lanes 1-4). We then asked whether various combinations of CUT repeats and the CUT homeodomain could bind to DNA as monomers. CR1+2, but not CR2+3, was able to bind to DNA (Fig. 1B, lanes 5 and 6). In addition, each CUT repeat bound to DNA when expressed together with the CUT homeodomain: CR1HD, CR2HD, and CR3HD (Fig. 1B, lanes 7-9). In the latter fusion protein, CR3 and the CUT homeodomain were maintained in the arrangement in which they exist in the CDP/CUT protein. Thus, the result obtained with the CR3HD fusion protein most likely reflects the activity of these domains in the context of the CDP/CUT protein. However, the same cannot be said of CR1HD and CR2HD because, in these fusion proteins, the homeodomain has been brought in close proximity to CR1 or CR2, unlike in the original CDP/CUT protein. To verify whether CR1 and CR2 could cooperate with the CUT homeodomain in the context of the entire CDP/CUT protein, we engineered larger fusion proteins in which CR2 and/or CR3 had been precisely deleted, keeping intact the linker regions between CUT repeats: CR1-L-L-HD and CR2-L-HD (Fig. 1A).
These two fusion proteins were able to bind to DNA, indicating that CR1 and CR2 can cooperate with the CUT homeodomain even when positioned at a distance from each other in the primary sequence (Fig. 1B, lanes 10 and 11). Whether such cooperation can occur in the context of a full-length CDP/CUT protein that contains CR3 could not be demonstrated because any DNA sequence that is recognized by CR1HD or CR2HD is also bound by CR3HD (see "Discussion"). In summary, when expressed as monomers, individual CUT repeats cannot bind to DNA, but can do so in conjunction with another CUT repeat or the CUT homeodomain. Therefore, at least two DNA-binding domains are necessary for DNA binding, and the combinations that work are CR1+2, CR1HD, CR2HD, and CR3HD.

CR1+2 Binds to Sequences Containing Two C(A/G)AT Sites, Organized Either as Direct or Inverted Repeats and Separated by a Variable Number of Nucleotides-We have assessed the DNA-binding specificity of CR1+2 by performing polymerase chain reaction-mediated random oligonucleotide selection using oligonucleotides containing random bases at 15 positions. Selected sequences were then analyzed to find the best alignment and to derive a consensus binding site. Strikingly, all selected binding sites contained a CAAT or CGAT sequence almost invariably positioned at the beginning of the random sequence, indicating that C(A/G)AT is a preferred binding site, but that the flanking, nonrandom sequences also contribute to the binding site (Fig. 2A). For this reason, we have included the last 3 nonrandom bases in the compilation of selected sequences. In addition, all selected binding sites contained a second C(A/G)AT sequence (although sometimes with one mismatch) either as a direct or inverted repeat and positioned at a variable distance from the first C(A/G)AT site. Consequently, the selected sequences have been organized in two groups on the basis of whether the C(A/G)AT sites occur as direct or inverted repeats. Thus, the derived consensus binding sites for CR1+2 include two half-sites organized as direct or inverted repeats. The consensus binding site for CR1+2 is quite different from the ATCGAT consensus that was previously identified using GST fusion proteins (33, 34, 36). However, we noted that the juxtaposition of two CGAT sites, as in the case of sequences where n = 0, reconstitutes the ATCGAT consensus binding site. Such a sequence would be expected to be recognized by CR3HD, whereas other sequences where n is >0 would be bound by CR1+2 only. Indeed, CR1+2 bound well to oligonucleotides where the C(A/G)AT sites were separated by either 0, 5, or 6 bases, but CR3HD recognized only the N0 oligonucleotide (Fig. 2B).

FIG. 2. Compilation of sequences selected by CR1+2.
A, DNA sequences were selected using purified His-tagged CR1+2 and PCR-generated random oligonucleotides as described under "Materials and Methods." After five rounds of selection, selected oligonucleotides were cloned and analyzed by DNA sequencing. The DNA sequences were organized to fit the best possible alignment. On the basis of the frequency of each nucleotide at each position, a core consensus sequence was deduced. B, radiolabeled oligonucleotides encoding sequences selected by CR1+2 were incubated with 100 ng of the indicated fusion protein at room temperature for 15 min and resolved on a nondenaturing polyacrylamide gel. C, radiolabeled oligonucleotides encoding the CR1+2 consensus site (CAAT-CGAT) were incubated with decreasing concentrations of monomers of CR1, CR2, or CR1+2 at room temperature for 15 min and resolved on a nondenaturing polyacrylamide gel.
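The half-site arrangement described for CR1+2 can be made concrete with a small scanner. The sketch below is an illustration only: it looks for C(A/G)AT on the top strand and for its reverse complement, AT(C/T)G, which marks a half-site on the bottom strand, then reports each non-overlapping pair as a direct or inverted repeat together with the spacer length n. The function names and the two test sequences are hypothetical and are not the oligonucleotides used in the paper; mismatched half-sites are not handled.

```python
import re

# Lookaheads allow overlapping matches (e.g. the palindromic ATCGAT).
HALF_SITE = re.compile(r"(?=(C[AG]AT))")      # half-site on the top strand
HALF_SITE_RC = re.compile(r"(?=(AT[CT]G))")   # half-site on the bottom strand

def find_half_sites(seq):
    """Return sorted (start, orientation) tuples for every C(A/G)AT half-site."""
    seq = seq.upper()
    hits = [(m.start(), "+") for m in HALF_SITE.finditer(seq)]
    hits += [(m.start(), "-") for m in HALF_SITE_RC.finditer(seq)]
    return sorted(hits)

def classify_pairs(seq):
    """List (arrangement, spacer_length) for each pair of non-overlapping half-sites."""
    hits = find_half_sites(seq)
    pairs = []
    for i, (start1, strand1) in enumerate(hits):
        for start2, strand2 in hits[i + 1:]:
            spacer = start2 - (start1 + 4)  # bases between the two 4-bp half-sites
            if spacer < 0:
                continue  # overlapping half-sites are not counted as a pair
            arrangement = "direct" if strand1 == strand2 else "inverted"
            pairs.append((arrangement, spacer))
    return pairs

# Two CGAT half-sites with no spacer (n = 0) contain the ATCGAT motif.
print(classify_pairs("CGATCGAT"), "ATCGAT" in "CGATCGAT")
# A CAAT half-site and an inverted ATTG half-site separated by 5 bases.
print(classify_pairs("CAATGGGGGATTG"))
```

The first call illustrates the point made in the text: placing two CGAT half-sites side by side recreates ATCGAT, which is why an n = 0 sequence would be expected to be bound by CR3HD as well as by CR1+2.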
Monomers of CR1 or CR2 Do Not Bind to a Recognition Site That Is Optimal for CR1+2-We showed in Fig. 1B that monomers of CR1 or CR2 could not bind efficiently to oligonucleotides containing the ATCGAT motif. This failure of CUT repeat monomers to bind DNA could have resulted from the fact that the ATCGAT site is not optimal for CUT repeats. To resolve this issue, we performed EMSAs with oligonucleotides encoding a consensus binding site for CR1+2. Monomers of CR1 or CR2 were compared with the CR1+2 bipartite domain. Although CR1+2 generated a strong retarded complex in the nanomolar range, a monomer of CR1 produced only a faint retarded complex when present in the 10^-7 M range, and the CR2 monomer did not exhibit any DNA binding (Fig. 2C). Altogether, the results in Figs. 1B and 2C demonstrate that CUT repeats do not efficiently bind to DNA as monomers.
CR1+2 Binds to the ATCGAT Binding Site More Efficiently when a Second CGAT Motif Is Also Present-In previous studies, the universal CUT repeat consensus binding site was defined as ATCGAT (33, 34, 36). Indeed, CR1+2 bound well to the 29FP oligonucleotide containing this motif (Fig. 1, lane 5). We reasoned that this oligonucleotide was well recognized by CR1+2 because it contained one perfect CGAT half-site (within the ATCGAT motif) and one imperfect inverted half-site (CGGT) immediately upstream (Fig. 3A). To test this hypothesis, we generated two sets of oligonucleotides that differed within the second half-site. In one case, a point mutation generated a second, perfect CGAT site. As shown in Fig. 3, CR1+2 bound more efficiently to this sequence (compare A and B). In the other case, two point mutations created a GGGC motif, which diverged completely from the CGAT consensus. CR1+2 bound the least efficiently to this sequence (Fig. 3, compare C with A and B). In contrast, the affinity of CR3HD for the three sets of oligonucleotides did not vary significantly (Fig. 3, compare D-F). These results demonstrate that the affinity of CR1+2 for a given DNA sequence depends on the presence of two half-sites that conform to the C(A/G)AT consensus.
CR1+2 and CR3HD Exhibit Similar DNA-binding Affinity, but Very Different DNA-binding Kinetics-Since the CUT repeats and the CUT homeodomain exist in their normal configuration in the CR1+2 and CR3HD fusion proteins, we decided to characterize further their DNA-binding properties. The DNA-binding affinity of each fusion protein for its consensus binding site was measured by calculating the K_D(app) as described under "Materials and Methods" (Fig. 4). The K_D(app) values were calculated by plotting the fraction of free DNA versus the log of protein concentration. CR3HD and CR1+2 exhibited similar K_D(app) values for their respective consensus binding sites: 1.6 × 10^-9 and 1.1 × 10^-9 M, respectively (Fig. 4 and Table I).
Whereas the off rate of CR1+2 was <1 min, the CR3HD·DNA complex was much more stable (Fig. 5, A and C). We considered the possibility that the low stability of the CR1+2·DNA complex was due to the fact that this binding site was not optimal. We thus recalculated the off rate, this time using the N5 oligonucleotides containing the CCAAT-ATTG sequence (note that ATTG is the reverse complement of CAAT). Again, the retarded complex had completely disappeared 1 min after addition of the specific competitor DNA (Fig. 5B). We conclude that a fast off rate is an intrinsic property of CR1+2. The CUT homeodomain must be responsible for the increased stability of the protein·DNA complex since CR1HD and CR2HD exhibited slower off rates than CR1+2 (Fig. 5, D, E, and A, respectively).

FIG. 3. CR1+2, but not CR3HD, binds better to ATCGAT oligonucleotides that contain an additional CGAT motif.
Radiolabeled oligonucleotides encoding the ATCGAT site with various flanking sequences were incubated with decreasing amounts of purified fusion proteins at room temperature for 15 min and resolved on a nondenaturing polyacrylamide gel. In A and D, the original 29FP oligonucleotide containing the ATCGAT site also includes the sequence CGGT, which closely resembles the CGAT consensus half-site for CR1+2. In B, C, E, and F, point mutations were introduced in the second half-site either to produce a perfect CGAT (B and E) or to make a degenerate half-site, GGGC (C and F).

Another striking difference was observed regarding the on rate. Whereas an equilibrium was reached within the first minute in the reaction with CR1+2, the intensity of the retarded complex continued to increase for at least 15 min in the case of CR3HD (Fig. 5, F and G). Thus, CR1+2 rapidly associates with DNA, whereas formation of CR3HD·DNA complexes takes place much more slowly. As a result, even though the CR1+2·DNA complex was very unstable, the rapidity with which CR1+2 bound to DNA helps explain why its K_D(app) was not higher than that of CR3HD. In summary, CR1+2 and CR3HD exhibited strikingly different DNA-binding kinetics: CR1+2 rapidly bound to DNA, but just as rapidly dissociated from it; in contrast, CR3HD slowly associated with DNA, but remained bound to it for a much longer period of time.
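The reasoning above, that a fast on rate can offset a fast off rate, follows from the standard relation K_D = k_off / k_on for simple reversible 1:1 binding. The short sketch below illustrates this with two hypothetical binders whose rate constants differ 100-fold in both directions; the numbers are chosen for illustration only and are not rate constants measured in the paper.

```python
import math

def kd_from_rates(k_off_per_s, k_on_per_molar_s):
    """Equilibrium dissociation constant (M) for simple reversible 1:1 binding."""
    return k_off_per_s / k_on_per_molar_s

# Hypothetical rate constants: a fast-on/fast-off binder and a slow-on/slow-off binder.
fast_binder = kd_from_rates(k_off_per_s=1e-1, k_on_per_molar_s=1e8)
slow_binder = kd_from_rates(k_off_per_s=1e-3, k_on_per_molar_s=1e6)

# Both ratios describe the same equilibrium affinity despite very different kinetics.
print(fast_binder, slow_binder, math.isclose(fast_binder, slow_binder))
```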
CR3HD Wraps around the DNA More than CR1+2-To understand the difference in DNA-binding kinetics between CR1+2 and CR3HD, we performed DNA methylation interference assays. The most obvious differences were at the second and fifth positions of the ATCGAT core, as indicated by the stars in Fig. 6A. Methylation at these positions interfered with CR3HD (but not CR1+2) binding. In contrast, methylation of the adjacent guanine residue interfered with both CR1+2 and CR3HD binding. Since dimethyl sulfate is known to methylate adenine at N-3 within the minor groove and guanine at N-7 within the major groove, these results suggest that CR3HD makes contact within the minor and major grooves at these positions, whereas CR1+2 makes contact within the major groove only. To confirm that CR1+2 and CR3HD make contact within the major groove at the CG positions within the ATCGAT core, we performed EMSAs using oligonucleotides in which the G:C base pairs were replaced with C:I. The structure of the C:I base pair differs from that of the C:G base pair only within the minor groove. Thus, these mutations should not affect DNA binding by CR1+2 and CR3HD if indeed these proteins bind within the major groove at these positions. This is exactly what was observed: CR1+2 and CR3HD efficiently bound to a DNA molecule containing the ATCIAT sequence in place of ATCGAT (Fig. 6B, compare lanes 1 and 3 and lanes 4 and 6). In contrast, a C:I base pair differs from a T:A base pair only within the major groove. Thus, replacing the T:A and A:T base pairs with C:I and I:C at the second and fifth positions of the ATCGAT core, respectively, should abolish DNA binding if CR1+2 and CR3HD normally make contact within the major groove at these positions. Indeed, CR1+2 and CR3HD did not bind efficiently to a DNA molecule containing the ACCGIT sequence instead of ATCGAT (Fig. 6B, compare lanes 1 and 2 and lanes 4 and 5).

FIG. 5. Off and on rates of CR1+2, CR1HD, CR2HD, and CR3HD.
A-E, 100 ng of the indicated fusion proteins was incubated with radiolabeled oligonucleotides at room temperature until equilibrium was reached (15 min). A 1000-fold molar excess of unlabeled oligonucleotides was added; and at the indicated time points, aliquots of the mixture were taken and analyzed by EMSA as described in the legend to Fig. 1. Oligonucleotides encoding the ATCGAT site were used in A and C-E, whereas the N5 site, CAAT-ATTG (see Fig. 2B), was used in B. F and G, 100 ng of either CR1+2 or CR3HD fusion protein, respectively, was mixed with radiolabeled oligonucleotides encoding the ATCGAT site. The incubation took place at 4°C; and at the indicated times, aliquots of the mixture were taken and analyzed by EMSA as described in the legend to Fig. 1.

FIG. 6. Methylation interference assay of CR1+2 and CR3HD.
A, a DNA fragment containing the ATCGAT site was end-labeled within the upper or lower strand, partially methylated, and used in EMSAs with either CR1+2 or CR3HD fusion proteins. The free and bound DNA molecules were separated by polyacrylamide gel electrophoresis and purified. Free DNA (F), bound DNA (B), and the probe before EMSA (P) were digested with NaOH and resolved on a 12% denaturing polyacrylamide gel. B, the ATCGAT binding site was mutated as shown, and the mutated oligonucleotides were tested by EMSA with CR1+2 (lanes 1-3) and CR3HD (lanes 4-6). C, shown is a diagram of the contact points made by CR1+2 and CR3HD within the major and minor grooves of the ATCGAT core sequence. Note that in contrast to CR1+2, CR3HD makes contact within the minor groove at the second and fifth positions.
Altogether, the results from methylation interference assays and EMSAs with mutated binding sites demonstrate that CR1+2 and CR3HD make contact within the major groove at positions 2-5 of ATCGAT and within the minor groove at positions 1 and 6 of ATCGAT (Fig. 6C). The main difference was at the second and fifth positions of ATCGAT. CR3HD makes contact within the major and minor grooves at these positions, whereas CR1+2 makes contacts only within the major groove. Thus, CR3HD wraps around the DNA, at least to some extent, at these positions, whereas CR1+2 interacts with only one side of the double helix. These differences in DNA-binding contacts are likely to explain the higher stability of the CR3HD·DNA complex compared with CR1+2.
CR1HD Binds Preferably to an ATCAAT Site-The DNA binding studies presented in Fig. 1 demonstrated that CR1 can cooperate with the CUT homeodomain to form a bipartite DNAbinding domain. These results raised the possibility that the CDP/CUT protein may exist in alternative conformations, one of which would favor the association of the homeodomain with CR1 instead of CR3. To investigate this possibility, we would first need to identify sequences that are recognized by CR1HD, but not by CR3HD. To this end, PCR-mediated site selection was performed with CR1HD. The derived consensus binding site was ATCAAT, although a few of the selected sequences included different bases at the two central positions of the ATNNAT core (data not shown). The same sequences were previously reported to be selected at high frequency by CR3HD (36). Therefore, as of yet, we do not know a DNA sequence that is recognized uniquely by CR1HD. As a result, it is not possible to verify whether CDP/CUT may exist in different conformational states. Full-length CDP/CUT Exhibits Limited Cooperativity between CR1ϩ2 and CR3HD-We next investigated the DNAbinding properties of the full-length CDP/CUT protein. To this end, the protein was purified as a histidine-tagged fusion protein using a baculovirus expression system. A Coomassie stain of the purified protein is presented (Fig. 7A). Since CR1ϩ2 and CR3HD can bind to distinct DNA sequences, we verified whether the full-length protein would bind with higher affinity to a DNA molecule containing two binding sites, one for CR1ϩ2 and one for CR3HD. EMSAs were performed with a series of double-stranded oligonucleotides containing either one or two binding sites separated by varying distances. Oligonucleotides were designed such that their total lengths did not vary by more than just a few base pairs. 
Judging from the intensity of the retarded complexes, the full-length CDP/CUT protein exhibited the highest affinity for a DNA molecule (N 19 ) containing two binding sites separated by 19 base pairs (calculated from the center of each binding site) (Fig. 7B). The K D(app) values indicated that the affinity was ϳ3-fold higher for N 19 (K D(app) ϭ 0.5 ϫ 10 Ϫ9 M) than for the single ATCGAT site (K D(app) ϭ 1.6 ϫ 10 Ϫ9 M). In contrast, CR1ϩ2 or CR3HD, when expressed separately, did not bind significantly better to the DNA molecule containing two sites (Table I). These results demonstrate that the full-length CDP/CUT protein is capable of cooperativity, albeit to a limited extent. The possible reasons for the lack of extensive cooperativity will be addressed under "Discussion." The NF-Y-binding Site (CCAAT) Is Recognized by CR1ϩ2, but Not by CR3HD, CR2HD, or CR23HD-We asked which of the CDP/CUT DNA-binding domains were able to bind to a CCAAT site. First, EMSAs were performed using CR1ϩ2 or CR3HD fusion proteins and oligonucleotides encoding the ATC-GAT site (29FP), the CCAAT site, or both the CCAAT site and a CGAT site. The rationale for the design of the latter oligonucleotides was that the optimal binding sequence for CR1ϩ2, as determined in Fig. 2, includes not just one but two C(A/G)AT sites. Although both CR1ϩ2 and CR3HD were able to bind to the ATCGAT sequence, only CR1ϩ2 efficiently bound to the single CCAAT site (Fig. 8, A-F). More important, replacement of 2 bases to create a second C(A/G)AT site (CCAATϩCGAT) greatly increased the affinity of CR1ϩ2 for the oligonucleotides (Fig. 8, B and C; see also Table I). Thus, CR1ϩ2 binds with higher affinity than CR3HD to a prototype CCAAT site, and the affinity of CR1ϩ2 for this site is further increased when a second C(A/G)AT site is present. 
These results are in agreement with the findings of several studies that reported the presence of more than one CCAAT-like sequence within promoters that are regulated by the CCAAT displacement activity (1, 3, 5, 17, 22, 24, 28, 30, 50-52). More important, fusion proteins containing CR2 and the CUT homeodomain (CR2HD and CR23HD) did not efficiently bind to the CCAAT sites, although they could bind to ATCGAT (Fig. 8, G-L). Thus, in contrast to previous claims, the CCAAT displacement activity does not involve cooperation between CR2 and the CUT homeodomain (33). Altogether, these results suggest that the CCAAT displacement activity of CDP/CUT is provided by CR1+2 and is optimized by the proximity of another half-site, CAAT or CGAT.

Full-length CDP/CUT Displays DNA-binding Kinetics and Specificity Similar to Those of CR1+2-To confirm that CR1+2 is active in the context of the full-length CDP/CUT protein, we tested whether the protein could bind to the CCAAT and CCAAT+CGAT DNA sequences, two sites that are well recognized by CR1+2, but not by CR3HD. The full-length CDP/CUT protein efficiently bound to these DNA sequences (Fig. 8, N and O; see also Table I). Moreover, CDP/CUT displayed DNA-binding kinetics essentially similar to those of CR1+2 (Fig. 9, A and B). It rapidly formed a complex with DNA, and it rapidly dissociated from it. Altogether, these results suggest that the CR1+2 domains are active in the context of the full-length protein and play an important role in determining the DNA-binding kinetics and specificity of CDP/CUT.
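For readers less familiar with binding kinetics, the relationship between on and off rates and apparent affinity can be sketched as follows. This is a textbook approximation that assumes simple, reversible 1:1 binding of protein (P) to DNA (D); the rate constants k_on and k_off are illustrative symbols and were not determined in this study.

```latex
% Assumption (not from this study): simple reversible 1:1 binding, P + D <-> PD
\[
  K_D = \frac{k_{\mathrm{off}}}{k_{\mathrm{on}}}
\]
% After addition of a large excess of unlabeled competitor, rebinding of the
% labeled probe is negligible and the labeled complex decays as
\[
  \frac{[PD](t)}{[PD](0)} = e^{-k_{\mathrm{off}}\,t},
  \qquad
  t_{1/2} = \frac{\ln 2}{k_{\mathrm{off}}}
\]
```

Under this assumption, a protein with both a fast on rate and a fast off rate can still display a nanomolar K_D(app), because the equilibrium constant depends only on the ratio of the two rates.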
CR1+2 Is Responsible for the CCAAT Displacement Activity of CDP/CUT-The results accumulated so far suggested that the CCAAT displacement activity would require the presence of another half-site (CAAT or CGAT) and would involve competition by CR1+2. To confirm these hypotheses, we analyzed binding to the CCAAT site using nuclear extracts from NIH3T3 cells. With a probe that contained CCAAT plus a CGAT motif, two main retarded complexes were observed (Fig. 10A, lane 2). The fast migrating complex disappeared in the presence of anti-NF-Ya antibodies, whereas antibodies against CDP/CUT caused a supershift of the slower migrating complex, demonstrating that these two complexes contain NF-Y and CDP/CUT, respectively (Fig. 10A, lanes 3 and 4). The slow migrating CDP/CUT complex was not observed with a probe containing a simple CCAAT site, indicating that CDP/CUT does not efficiently bind to a CCAAT site in the absence of a second half-site (Fig. 10A, lanes 6-8). In contrast, with a CCAAT-CGAT probe, overexpression of CDP/CUT led to an increase in the intensity of the slow migrating complex with a corresponding decrease in the NF-Y complex (Fig. 10B, lanes 1 and 2). As expected, the CCAAT displacement activity was abolished by adding an excess of unlabeled oligonucleotides containing a CDP/CUT consensus binding site (Fig. 10B, lanes 2 and 3).
FIG. 9. On and off rates of full-length CDP/CUT. 100 ng of dephosphorylated full-length CDP/CUT was incubated with radiolabeled 29FP oligonucleotides encoding the ATCGAT site. A, the incubation took place at 4°C; and at the indicated times, aliquots of the mixture were taken and analyzed by EMSA as described in the legend to Fig. 1. B, the incubation took place at room temperature until equilibrium was reached (15 min). A 1000-fold molar excess of unlabeled oligonucleotides was added; and at the indicated time points, aliquots of the mixture were taken and analyzed by EMSA as described in the legend to Fig. 1. CC, coiled coil.
These results confirm that the CCAAT displacement activity is strengthened by the presence of a C(A/G)AT motif at close proximity. We then tested whether addition of purified CR1+2 or CR3HD to the nuclear extract would similarly compete with NF-Y. Purified CR3HD did not bind to the CCAAT-CGAT probe (Fig. 10C, lane 5), and addition of CR3HD to the nuclear extract did not significantly affect any of the retarded complexes (lanes 6-8).
In contrast, 1 ng of CR1+2 produced a strong retarded complex, and addition of the same amount of CR1+2 to the nuclear extract was sufficient to decrease both the CDP/CUT and NF-Y retarded complexes (Fig. 10C, lanes 2 and 3). Altogether, these results demonstrate that CR1 and CR2 are responsible for the CCAAT displacement activity of CDP/CUT.
DISCUSSION
CUT proteins belong to a novel class of homeodomain proteins that exhibit the unique feature of containing multiple DNA-binding domains. In previous studies, the DNA-binding properties of various GST-CUT repeat fusion proteins were analyzed (33-36, 47). However, GST fusion proteins were shown to exist as dimers, a property that may have affected their interaction with DNA (36, 48, 49). Moreover, how the native CUT protein interacts with DNA using four DNA-binding domains has not been thoroughly investigated. In this study, we analyzed monomeric histidine-tagged fusion proteins containing one or two CUT DNA-binding domains. We then investigated the properties of the full-length CDP/CUT protein, either as a purified protein from a baculovirus expression system or as an endogenous protein present in nuclear extracts from NIH3T3 cells. These experiments have established that a single CUT repeat does not efficiently bind to DNA as a monomer (Figs. 1B and 2C). However, a CUT repeat can efficiently bind to DNA in cooperation with either another CUT repeat or the CUT homeodomain. The combinations that worked best were CR1+2, CR1HD, CR2HD, and CR3HD. Thus, not only CR3 but also CR1 and CR2 can cooperate with the homeodomain, even when positioned at a distance from it. Evidently, the linker regions between the CUT repeats and the CUT homeodomain allow sufficient flexibility to permit interactions between different CUT repeats and the homeodomain. We can therefore envisage, at least in theory, multiple modes of DNA binding by CDP/CUT.
However, demonstrating that CR1 or CR2 can interact with the CUT homeodomain when CR3 is also present in the protein would require a binding site that is recognized by CR1HD or CR2HD, but not by CR3HD. Unfortunately, no such binding site has been identified, because the DNA-binding specificities of the various CUT repeats are not sufficiently different.
Although CR1 and CR2 were capable of cooperative DNA binding, CR2 and CR3 were not. This was surprising in light of the facts that CR2 was able to interact with either CR1 or the CUT homeodomain, that CR3 could interact with the CUT homeodomain, and that each CUT repeat can cooperate with itself when expressed as a GST homodimer (33, 34, 36). We have considered the possibility that the failure of CR2+3 to bind to DNA resulted from the fact that we did not use an optimal binding site for this particular combination. However, using PCR-mediated site selection, we have not been able to pull out high affinity binding sites for CR2+3, even after seven amplification/selection cycles. These results led us to conclude that some intrinsic properties of CR2 and CR3 or the linker region between them do not permit a functional association between these two domains.
In contrast, the linker region between CR1 and CR2 appears to be highly flexible since the optimal binding sites as defined in PCR-mediated site selection contained two C(A/G)AT sites organized either as direct or inverted repeats. Moreover, the distance between these two sites could be varied (within the limits of the random sequence) without affecting the efficiency of DNA binding. Thus, CR1 and CR2 can exist in multiple configurations in relation to one another. Although we did not determine the upper limit of the distance between two C(A/G)AT sites, it is likely that CR1+2 is able to accommodate a fairly large distance since cooperative binding to CCAAT sites at a distance has been reported for the gp91phox gene promoter (17).
The kinetics of DNA binding by CR1+2 and heterodimers made of one CUT repeat and the CUT homeodomain were very different. CR1+2 displayed rapid on and off rates, whereas much slower on and off rates were observed when the CUT homeodomain was tested in association with any of the CUT repeats. The results from methylation interference assays suggested that the differences in binding kinetics are due to the fact that CR1+2 binds the second and fifth positions of ATCGAT only within the major groove, whereas CR3HD is able to make contacts within both the major and minor grooves. This latter finding is in agreement with the crystal structure of the POU domain and POU homeodomain in association with their binding site (53, 54). In this complex, the POU homeodomain was found to wrap around the double helix and to make contact within the minor groove.
The results from PCR-mediated site selection indicated that CR1+2 binds with higher affinity to DNA sequences containing two C(A/G)AT sites in either orientation. Since the CAAT sequence is part of the NF-Y consensus binding site, our results suggested that CR1+2 would bind with high affinity to this site when a second CAAT or CGAT site was present. Indeed, our results confirmed this prediction. In contrast, CR3HD preferred ATCGAT or ATCAAT and did not bind if the CAAT sequence was not preceded by AT (36). Similarly, CR2HD and CR23HD efficiently bound to ATCGAT, but not to the NF-Y-binding site. These results are in disagreement with previous claims that the CCAAT displacement activity involves cooperation between CR2 and the CUT homeodomain (33). Our results clearly demonstrate that the CCAAT displacement activity involves CR1+2 without the participation of the CUT homeodomain. It is important to note, however, that the efficiency of the CCAAT displacement activity will not be the same on all promoters containing a CCAAT site and will rely on the presence of a second CAAT or CGAT site. The strength of the CCAAT displacement activity will be maximal when two perfect half-sites are present and will decrease when one or both half-sites diverge from the consensus binding site. In agreement with this prediction, we note that all promoters so far characterized as being regulated by a CCAAT displacement activity indeed contain two C(A/G)AT sites (Table II) (1, 3, 5, 17, 22, 24, 28, 30, 50-52). Moreover, in EMSAs with DNA sequences from these promoters, detection of a CDP/CUT retarded complex required that a fairly long piece of DNA be used as a probe. In retrospect, we interpret these results to mean that a single CCAAT site was not sufficient for CDP/CUT binding.
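The two-half-site requirement described above can be illustrated with a short sequence scan. The following Python sketch is an illustrative aid only and was not part of the analysis reported here: it locates C(A/G)AT half-sites on either strand of a sequence and lists each pair together with its orientation and spacing. The function names and the example sequences are hypothetical; the sequences are not the probes used in this study.

```python
import re

# C(A/G)AT half-site on the top strand, and its reverse complement AT(C/T)G,
# which marks a half-site on the bottom strand. Lookaheads allow overlapping hits.
FORWARD = re.compile(r"(?=(C[AG]AT))")
REVERSE = re.compile(r"(?=(AT[CT]G))")


def find_half_sites(seq):
    """Return sorted (position, strand) tuples for every C(A/G)AT half-site."""
    seq = seq.upper()
    hits = [(m.start(), "+") for m in FORWARD.finditer(seq)]
    hits += [(m.start(), "-") for m in REVERSE.finditer(seq)]
    return sorted(hits)


def classify_pairs(seq):
    """List each pair of half-sites as (pos_a, pos_b, orientation, spacing).

    Orientation is "direct" when both half-sites lie on the same strand and
    "inverted" otherwise; spacing is the start-to-start distance in base pairs.
    """
    hits = find_half_sites(seq)
    pairs = []
    for i, (pos_a, strand_a) in enumerate(hits):
        for pos_b, strand_b in hits[i + 1:]:
            orientation = "direct" if strand_a == strand_b else "inverted"
            pairs.append((pos_a, pos_b, orientation, pos_b - pos_a))
    return pairs


if __name__ == "__main__":
    # Hypothetical sequences, chosen only to show the three cases.
    for name, seq in [
        ("single CCAAT", "CCAAT"),
        ("CCAAT plus CGAT", "GGCCAATGGTTTTCGATGG"),
        ("CCAAT plus inverted CAAT", "CCAATGGGGATTGG"),
    ]:
        print(name, find_half_sites(seq), classify_pairs(seq))
```

A sequence carrying a lone CCAAT yields one half-site and no pair, whereas a sequence with a second CAAT or CGAT yields at least one direct or inverted pair. The scan says nothing about affinity; it only flags candidate sequences that meet the two-half-site criterion.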
Our results revealed that cooperation between various CDP/CUT DNA-binding domains can generate at least two DNA-binding activities with distinct binding kinetics and specificities (Fig. 11). On the one hand, CR1 and CR2 bind rapidly but transiently to sequences containing two C(A/G)AT sites in either orientation. On the other hand, CR3 and the CUT homeodomain can form a stable complex with the ATCGAT DNA sequence (Fig. 11). Moreover, we were able to show that the purified full-length protein binds with higher affinity to oligonucleotides containing two binding sites (Fig. 7). Thus, the two bipartite domains CR1+2 and CR3HD can cooperate, albeit weakly, to bind to DNA. Surprisingly, however, the full-length CUT protein bound in vitro with kinetics similar to those of CR1+2, with fast on and off rates (Fig. 9). Although a fast on rate was to be expected because of CR1+2, the simultaneous presence of CR3HD should have stabilized the protein on DNA. These results suggest the possibility that CR3HD is not very active in the context of the full-length CDP/CUT protein. This would help explain the lack of extensive cooperativity as noted above. Although such behavior by CDP/CUT in cells could be explained by invoking the phosphorylation of the homeodomain during the G1 phase of the cell cycle, the same explanation cannot hold in the case of the purified full-length protein that was dephosphorylated in vitro prior to DNA binding (27). We considered the possibility that CR1+2, in a manner analogous to high mobility group proteins, may impart a conformational change to DNA that would cause the quick release of CR3HD from its adjacent binding site (55-57). But when tested as individual proteins, CR1+2 and CR3HD were able to bind simultaneously to the N19 probe, which contains binding sites for both proteins (data not shown). Therefore, CR1+2 does not prevent the stable binding of CR3HD to an adjacent site.
Our results instead suggest that something else, perhaps some conformational constraint, prevents CR3HD from binding to DNA with high affinity in the context of the purified full-length protein; yet the same protein, when present within nuclear extracts in S phase, can stably bind to DNA (27). The molecular basis for this discrepancy is not known; however, our results clearly point to a difference in the behavior of the CDP/CUT protein in vitro and in nuclear extracts.