The NF-YB/NF-YC structure gives insight into DNA binding and transcription regulation by CCAAT factor NF-Y.

The heterotrimeric transcription factor NF-Y recognizes with high specificity and affinity the CCAAT regulatory element that is widely represented in promoters and enhancer regions. The CCAAT box acts in concert with neighboring elements, and its bending by NF-Y is thought to be a major mechanism required for transcription activation. We have solved the structure of the NF-YC/NF-YB subcomplex of NF-Y, which shows that the core domains of both proteins interact through histone fold motifs. This histone-like pair is closely related to the H2A/H2B and NC2α/NC2β families, with features that are both common to this class of proteins and unique to NF-Y. The structure, together with the modeling of the nonspecific interaction of NF-YC/NF-YB with DNA and of the full NF-Y/CCAAT box complex, highlights important structural features that account for different and possibly similar biological functions of the transcriptional regulators NF-Y and NC2. In particular, it emphasizes the role of the newly described αC helix of NF-YC, which is both important for NF-Y trimerization and a target for regulatory proteins, such as MYC and p53.

Transcription initiation by RNA polymerase II at class II gene promoters is a finely regulated process requiring the interplay of many different transcription factors (1). General transcription factors (GTFs), namely TFIIA, TFIIB, TFIID, TFIIE, TFIIF, and TFIIH, specifically recognize the core promoters, recruit the RNA polymerase, and help melt the DNA, thus enabling the initiation of transcription at the correct start site (2). Assembly of this preinitiation complex is controlled by a large set of transcriptional activators and repressors that recognize, in a sequence-specific way, DNA sequences located on proximal or distal enhancer regions of the promoters and function by contacting the GTFs either directly or indirectly, through co-activators and co-repressors (1).
The eukaryotic transcription factor NF-Y (also termed CBF) specifically recognizes the regulatory CCAAT element found in either orientation in the proximal and distal enhancer regions of many genes (3, 4). In higher eukaryotes, this element is found in about 30% of the promoters, preferentially in the −60/−100 region, and analysis of various CCAAT boxes showed that specific flanking sequences are required for efficient binding (5-7). NF-Y is a heterotrimeric complex composed of NF-YA, NF-YB, and NF-YC, which are all required for CCAAT binding (8). Each subunit contains a core region that has been highly conserved throughout evolution and that is sufficient for subunit interactions and CCAAT binding, whereas the flanking regions, which include the activation domains, are much less conserved (8-13). In yeast, the activation function is encoded by a fourth subunit with no apparent homologues in other species (14).
NF-YC and NF-YB core regions are homologous in sequence to histones H2A and H2B, respectively, and are required for heterodimerization, a prerequisite for NF-YA association and CCAAT binding (8, 15, 16). NF-YC and NF-YB show an even higher sequence similarity with subunits α and β of NC2 (16), a protein that represses TATA box-dependent transcription, while increasing the activity of the distal promoter element (17, 18). The recent structure of a NC2/TBP/TATA element complex confirmed that the NC2α and NC2β subunits interact through histone fold motifs and that NC2 recognizes the preformed TBP/TATA complex (19).
The NF-YA core domain is less than 60 amino acids long and is sufficient for DNA binding when complexed with NF-YC/NF-YB (9, 20, 21). In contrast to NF-YC and NF-YB, careful examination of available databases failed to reveal homologues of NF-YA. Several studies have divided the NF-YA core domain into two segments: an N-terminal domain responsible for NF-YC/NF-YB binding, and a C-terminal domain implicated in specific recognition of the CCAAT element (20-24).
Once the trimeric complex is formed, it binds DNA with very high specificity and affinity (6, 25). Specific recognition of the bases seems to involve both minor and major groove interactions, and circular permutation assays indicated that, upon binding, the DNA is bent by about 60-80° (26, 27). Footprinting and photocross-linking experiments have shown that the DNA is contacted by a subset of the three subunits at three different locations, spanning about 24-26 nucleotides on each strand (6, 28). In agreement with these results, it was shown that two CCAAT boxes cannot be occupied simultaneously, unless they are separated by at least 22-24 bp (27, 29).
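The two sequence-level constraints described above (the CCAAT element occurs in either orientation, and two boxes can be occupied simultaneously only when far enough apart) can be illustrated with a minimal Python sketch. The function names, the choice of measuring spacing between box start positions, and the default of 24 bp are illustrative assumptions, not part of the original study.

```python
def find_ccaat_boxes(seq):
    """Return (position, strand) for every CCAAT pentanucleotide.

    A box on the minus strand reads ATTGG on the plus strand.
    Positions are 0-based indices of the first base on the plus strand.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 4):
        window = seq[i:i + 5]
        if window == "CCAAT":
            hits.append((i, "+"))
        elif window == "ATTGG":
            hits.append((i, "-"))
    return hits


def co_occupiable(hits, min_spacing=24):
    """Pairs of boxes whose start positions are at least min_spacing bp apart.

    Assumption: spacing is measured start to start; the text only states
    a minimum separation of 22-24 bp.
    """
    pairs = []
    for a in range(len(hits)):
        for b in range(a + 1, len(hits)):
            if hits[b][0] - hits[a][0] >= min_spacing:
                pairs.append((hits[a], hits[b]))
    return pairs
```

Applied to a promoter sequence, `find_ccaat_boxes` lists candidate sites on both strands, and `co_occupiable` keeps only the pairs that satisfy the chosen spacing threshold.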
A major role of NF-Y is to act synergistically with other transcription factors for activation. The CCAAT box is generally found in the close vicinity of other promoter elements, and in many cases a precise distance is required for proper transcription. Evidence that this process requires CCAAT box bending and/or direct protein/protein interactions has been reported repeatedly (4). Several lines of evidence also indicate that NF-Y interacts directly with GTFs, especially TFIID (30-32). Additionally, NF-Y has been shown to be the target of regulatory proteins such as c-Myc (33) and p53.
We have started the structural characterization of transcription factor NF-Y and have solved the structure of the complex between the conserved regions of human NF-YB and NF-YC by x-ray crystallography. The structure was refined at 1.6 Å resolution and shows that both proteins interact through histone fold motifs in a head-to-tail fashion. The structure is very close to that of H2A/H2B and especially of NC2α/NC2β, but changes at the sequence and secondary structure level provide explanations for the various functional roles played by these different complexes. Based on this overall structural homology, which extends to the electrostatic properties, the interaction between the NF-YC/NF-YB dimer and DNA was modeled and further extended to the full NF-Y/CCAAT element complex, in agreement with several biochemical studies performed on NF-Y, including footprinting experiments and mutational analyses. EMSA experiments were also carried out, which emphasized the importance of the NF-YC/NF-YB histone-like pair in DNA binding and bending. Finally, the structure reveals an important element of secondary structure, the αC helix of NF-YC, which is not only involved in NF-YA binding but also plays a role in the regulatory pathway of important growth regulators such as MYC and p53.

EXPERIMENTAL PROCEDURES
Cloning, Expression, and Purification-The various constructs used for the co-expression study were amplified by standard PCR procedures. All NF-YB constructs were inserted in the pACYC184-11b vector (34), whereas all NF-YC constructs were inserted in the pET15b (Novagen) and pGEX4T-2 (Amersham Biosciences) vectors, using NdeI and BamHI restriction sites. Co-expression tests were carried out using a standard procedure described previously (34). For large scale expression, 6 × 1-liter cultures were grown, either in 2× LB medium for native complexes or in M9 medium supplemented with selenomethionine (Sigma) for selenomethionylated complexes. Cells were grown at 37°C to an absorbance of 0.3 at 600 nm, and the temperature was then switched to 25°C. Growth was continued until cells reached an absorbance of 0.8-1.0 at 600 nm. At this point, co-expression of the complex was induced by adding isopropyl-β-D-thiogalactopyranoside (Euromedex) to a final concentration of 1 mM, and cells were further grown overnight at 25°C. Cells were then collected by low speed centrifugation, resuspended in buffer A (10 mM Tris, pH 8.0; 400 mM NaCl), and lysed by sonication. The soluble fraction recovered by high speed centrifugation was mixed with 1 ml of Talon resin (Clontech) in the case of a His-tagged complex or 1 ml of glutathione-Sepharose 4B resin (Amersham Biosciences) in the case of a GST-tagged complex. After 1 h of incubation, the supernatant was removed and the resin washed extensively with buffer A. The resin was then resuspended in 2 ml of buffer A, and 5 units of bovine thrombin (Sigma) were added overnight at 4°C to cleave off the tag. The supernatant containing the soluble dimer was recovered and applied onto a Hiload 16/60 Superdex 75 gel filtration column (Amersham Biosciences) equilibrated with buffer B (buffer A + 2 mM 1,4-dithiothreitol, Roche Molecular Biochemicals). The purified complexes were concentrated on Microsep 10K Omega (Pall Filtron) to a final concentration of ~10-14 mg/ml as assayed with the Bio-Rad protein assay (Bio-Rad).
Crystallization-For crystallization of the NF-YC3/NF-YB3 complex, 2 µl of protein complex solution were mixed with an equal volume of the reservoir solution containing 0.1 M NaHepes (Sigma), pH 7.5, 0.2 M Mg(OAc)2 (Merck), and 10-14% PEG 4000 (Fluka). Crystals appeared after a few hours and continued to grow for a few days or weeks to reach a size of approximately 0.5 × 0.1 × 0.1 mm³. For the NF-YC2/NF-YB3 complex, the percentage of PEG 4000 had to be raised to 20-24% to obtain crystals that were smaller and more clustered than for the NF-YC3/NF-YB3 complex. Only the latter dimer was used for solving the phase problem with selenomethionylated proteins. For data collection, crystals were briefly transferred to a cryoprotectant solution of 0.05 M NaHepes, 0.1 M Mg(OAc)2, 0.2 M NaCl, 13 or 22% PEG 4000, 20% glycerol, and then quickly frozen in liquid ethane.
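Because the drop is made from equal volumes of protein and reservoir solution, every component starts at half its stock concentration before equilibration. The short Python sketch below works through that arithmetic. The component names are hypothetical labels, and the values of 12 mg/ml protein and 12% PEG 4000 are arbitrary midpoints of the ranges quoted above, chosen only for illustration.

```python
def mix_equal_volumes(protein_solution, reservoir_solution):
    """Concentrations right after mixing equal volumes of two solutions.

    Each argument maps a component name to its concentration (any unit).
    A component absent from one solution counts as 0 there, so every
    component ends up at the mean of its two starting concentrations.
    """
    components = set(protein_solution) | set(reservoir_solution)
    return {
        name: (protein_solution.get(name, 0.0)
               + reservoir_solution.get(name, 0.0)) / 2
        for name in components
    }


# Illustrative inputs: protein in buffer B, reservoir as described above.
protein = {"protein_mg_per_ml": 12.0, "NaCl_mM": 400.0,
           "Tris_mM": 10.0, "DTT_mM": 2.0}
reservoir = {"NaHepes_mM": 100.0, "MgOAc2_mM": 200.0,
             "PEG4000_percent": 12.0}
drop = mix_equal_volumes(protein, reservoir)
```

With these inputs the initial drop contains 6 mg/ml protein and 6% PEG 4000; the drop then concentrates toward the reservoir composition during equilibration.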
Structure Determination-Data collection on native and selenomethionylated crystals was carried out on beamline BM30A at the European Synchrotron Radiation Facility. A three-wavelength multiwavelength anomalous diffraction experiment with collection of data up to 1.8 Å resolution was first carried out, and native data sets for the NF-YC3/NF-YB3 and NF-YC2/NF-YB3 complexes were then collected at 1.57 and 1.67 Å resolution, respectively. All data were processed and scaled using Denzo/Scalepack (35). Five of the six selenium atoms were located using Shake and Bake (36). Their positions were refined within the phasing program SHARP (37), and the phases were further improved with the solvent flattening program SOLOMON (38).
Model Building, Refinement, and Modeling-Model building was carried out using the program TURBO-FRODO (39). The model built in the initial 1.9-Å resolution multiwavelength anomalous diffraction electron density map was further refined independently against both native data sets by several cycles of manual building and refinement using standard protocols within CNS (40). B-factor restraints for bonded main chain and side chain atoms were 1.5 and 2.0, respectively. B-factor restraints for angle-related main chain and side chain atoms were 2.0 and 2.5, respectively. The coordinates of the NF-YC2/NF-YB3 complex have been deposited in the Protein Data Bank with code 1N1J.
Superimposition of NF-YC/NF-YB, H2A/H2B, H3/H4, and NC2α/NC2β complexes was carried out using polyglycine models with our in-house program Superpose, and the transformations were applied onto the full models of the nucleosome core particle (Protein Data Bank code 1AOI) and of the NC2/TBP/TATA element complex (Protein Data Bank code 1JFI). The root mean square differences were obtained from the superimposition of the polyglycine models, removing additional residues but also helix α1-loop L1 in the case of superimpositions with H2A and H3, because these elements clearly have a different trajectory with respect to those of NF-YC or NC2α.
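The root mean square difference over a restricted set of equivalent positions can be sketched as follows. This is not the in-house Superpose program: the sketch assumes the two coordinate sets have already been superimposed and only shows how non-equivalent regions (for example helix α1-loop L1) can be left out of the calculation. The dictionary layout and the `exclude` parameter are illustrative assumptions.

```python
import math


def rmsd(coords_a, coords_b, exclude=()):
    """RMSD over matched, already superimposed backbone positions.

    coords_a, coords_b: dicts mapping a shared alignment index to (x, y, z).
    exclude: alignment indices to leave out of the comparison.
    Only indices present in both structures and not excluded are used.
    """
    skipped = set(exclude)
    shared = [k for k in coords_a if k in coords_b and k not in skipped]
    if not shared:
        raise ValueError("no common positions to compare")
    total = 0.0
    for k in shared:
        total += sum((p - q) ** 2 for p, q in zip(coords_a[k], coords_b[k]))
    return math.sqrt(total / len(shared))
```

Excluding a divergent segment lowers the reported value, which is why the set of positions used should always be stated alongside the number.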
Modeling of the NF-YC/NF-YB/DNA complex was made by extracting a DNA fragment from the structure of the nucleosome core particle once superimposed as described above. Replacement of the bases and modeling of the interaction between the NF-YA α-helices and the NF-YC/NF-YB/DNA complex were carried out manually in TURBO-FRODO. The coordinates of the model are available upon request.
Production of NF-YC Mutants and EMSA Experiments-NF-YC mutants were produced by PCR mutagenesis with the appropriate oligonucleotides in the backbone of the YC5 mutant (41). The recombinant His-tagged YC5 mutants were obtained in inclusion bodies from BL21 bacteria, renatured with equimolar amounts of NF-YB, and purified over nitrilotriacetic acid columns (29, 42). The resulting dimers were assayed in immunoprecipitations and EMSA experiments with recombinant NF-YA and monoclonal antibody 7 (42). Production and purification of NF-Y and off-rate EMSA experiments were done under conditions described previously (29, 43).

RESULTS
Structure Determination of the NF-YC/NF-YB Complex-All three subunits of NF-Y contain a core region that has been highly conserved throughout evolution and, in the case of NF-YC and NF-YB, that displays sequence homology to the histone fold motifs of H2A/NC2α and H2B/NC2β, respectively (Fig. 1). The core domains of NF-YB and NF-YC have been shown to be necessary and sufficient for DNA binding in the context of the trimeric complex (15, 16, 20, 41). However, less conserved stretches at their N and C termini seem to influence this process (29). The majority of histone fold proteins are produced in bacteria as insoluble material in inclusion bodies. We have studied the formation of the NF-YC/NF-YB pair with protein constructs of different lengths, by testing protein solubilization using the technique of co-expression in Escherichia coli (34). The results summarized in Table I indicate that only the evolutionarily conserved domains of NF-YC and NF-YB, but not the less conserved regions, are necessary for complex formation.
For the subsequent crystallization trials, four of the six soluble complexes obtained were used: NF-YC2/NF-YB3, NF-YC2/NF-YB4, NF-YC3/NF-YB3, and NF-YC3/NF-YB4 (see Table I). Small crystals were initially obtained with the NF-YC3/NF-YB3 pair. Further refinement of the crystallization conditions showed that crystals of NF-YC2/NF-YB3 could also be obtained. Both crystals belong to the same space group with the same cell parameters (Table II). Crystals of the seleno-methionylated NF-YC3/NF-YB3 complex were also grown and used for solving the phase problem by multiwavelength anomalous diffraction (44). An initial model was built manually into the experimental electron density map at 1.9 Å resolution and was further refined independently against native NF-YC3/NF-YB3 and NF-YC2/NF-YB3 data sets at 1.57 and 1.67 Å resolution, respectively. The final models include 87 residues of NF-YB, 78 residues of NF-YC, about 300 water molecules, and have R factors around 18% and R-free factors around 20%, with very good deviations from ideal geometry (Table II).
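For readers less familiar with the refinement statistics quoted above, the conventional crystallographic R factor is the summed absolute difference between observed and calculated structure factor amplitudes divided by the summed observed amplitudes, and R-free is the same quantity computed on a test set of reflections kept out of refinement. The Python sketch below shows this calculation under the assumption that the calculated amplitudes are already on the same scale as the observed ones; the function names and the boolean flag list are illustrative, not the CNS interface.

```python
def r_factor(f_obs, f_calc):
    """Crystallographic R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|)."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need two non-empty lists of equal length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    return numerator / sum(abs(o) for o in f_obs)


def r_work_and_r_free(f_obs, f_calc, free_flags):
    """Split reflections by free_flags and return (R_work, R_free).

    free_flags[i] is True when reflection i belongs to the test set
    that was excluded from refinement.
    """
    work = [(o, c) for o, c, flag in zip(f_obs, f_calc, free_flags) if not flag]
    test = [(o, c) for o, c, flag in zip(f_obs, f_calc, free_flags) if flag]
    r_work = r_factor([o for o, _ in work], [c for _, c in work])
    r_free = r_factor([o for o, _ in test], [c for _, c in test])
    return r_work, r_free
```

A small gap between the two values, as reported here (around 18% and 20%), is generally taken to indicate that the model is not overfitted to the working set.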
In NF-YB3, no density was observed for the first seven residues. Mass spectrometry revealed that all these residues except for the initial methionine were present in the protein used for crystallization and also in the crystals (data not shown). Thus, the N-terminal residues of NF-YB3, which point toward a solvent channel, are probably disordered. In NF-YC3, only the residual thrombin site residues Gly-Ser at the N terminus were not unambiguously found in density. In the case of the NF-YC2 construct, which is 16 residues longer than NF-YC3, no additional residues could be built at the N terminus either. Once again, mass spectrometry revealed that all the unobserved residues are present in the crystals (data not shown). Because the initial experimental phases were obtained for the NF-YC3/NF-YB3 complex, it could be assumed that the initial model was not good enough to provide phases for these residues. However, several loops in other parts of the structure, which could not be seen in the initial electron density map, appeared during refinement, whereas density at the N terminus of NF-YC2 never improved. Because a large solvent channel was found where the residues should be located, it seems reasonable to assume that these residues are disordered.
FIG. 1. NF-Y subunits sequence alignments. Sequence alignments of the core regions of NF-YA (A), NF-YB (B), and NF-YC (C) from human, Xenopus laevis, Drosophila melanogaster, Caenorhabditis elegans, Arabidopsis thaliana, Schizosaccharomyces pombe, and Saccharomyces cerevisiae. In the case of A. thaliana, one sequence was included for each subunit, but several different genes coding for each subunit are actually found in its genome (13). For NF-YB and NF-YC, the alignments also include the sequences from human H2B/NC2β and H2A/NC2α, respectively, and are based on the superposition of the three structures. Human NF-Y subunit numbering is used. A, fully and almost totally conserved residues of the NF-YA core domain are colored red and blue, respectively. Domains implicated in NF-YC/NF-YB (NF-YA1) and DNA (NF-YA2) binding are indicated, with important residues for each function boxed. B and C, residues fully conserved in all sequences, in the NF-Y/NC2 sequences, and only in the NF-Y sequences are colored red, green, and blue, respectively. Secondary structure elements (bars for α-helices and solid lines for coils), as observed in the structure, are colored orange above the alignments. Black solid lines indicate regions present in the crystals that are not seen in density. Intra-chain arginine-aspartate pairs have been schematically represented in red. Boxed residues indicate amino acids of the H2A/H2B and NC2α/NC2β pairs that hydrogen-bond directly to the DNA backbone with at least main chain atoms (red boxes) or only their side chains (blue boxes).
NF-YC/NF-YB Forms a Histone-like Pair-As expected, the core domains of NF-YB and NF-YC adopt a histone-like fold and interact in a head-to-tail fashion, forming a histone-like pair (Fig. 2A). Interestingly, comparison of the NF-YC/NF-YB, NC2α/NC2β, H2A/H2B, and also H3/H4 histone pairs reveals relatively few differences between their core histone motifs (helix α1-loop L1-helix α2-loop L2-helix α3; see Fig. 2B), in terms of both sequence identity (ranging from 10 to 20%) and pairwise main chain root mean square differences (ranging from 1.5 to 1.1 Å). That NF-YC/NF-YB belongs to the H2A/H2B family is confirmed by the presence of additional elements of secondary structure, at the C termini of both proteins, characteristic of H2A and H2B, although H3/H4 features are also observed (see below). The interactions between the various elements of secondary structure of NC2α/NC2β and the comparison of this pair with the H2A/H2B dimer have already been described at length (19). The conclusions mostly apply to NF-YC/NF-YB and will not be discussed further. Rather, we will focus on the differences and the specificities we observe.
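The pairwise sequence identities quoted above depend on how aligned positions are counted. A minimal Python sketch of one common convention is given below: identical residues divided by the number of aligned columns in which neither sequence has a gap. This convention and the function name are assumptions made for illustration; the original study does not state which denominator was used.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two rows of a pairwise alignment.

    Columns where either sequence has a gap are ignored, so the
    denominator is the number of residue-residue columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no residue-residue columns to compare")
    return 100.0 * identical / compared
```

For a structure-based alignment such as the one in Fig. 1, the two input strings would be the corresponding rows of the alignment, gaps included.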
One feature concerns the presence in both NF-YB and NF-YC of an intra-chain arginine-aspartate bidentate pair, which is found in histones H3 and H4 but not in H2A and H2B (45) (Fig. 2A). In NF-YC this pair is formed by residues Arg-93 (loop L2) and Asp-100 (helix α3), and both are absolutely conserved in the NF-YC family but in neither the H2A nor the NC2α family (Fig. 1C). In NF-YB, residues Arg-108 (loop L2) and Asp-115 (helix α3) form an identical pair and are also absolutely conserved throughout evolution. Once again, this pair is not conserved in NC2 and is replaced by an arginine/lysine-glutamate pair in H2B (Fig. 1B). In this latter case, however, the pair is not formed, and the arginine contacts the DNA; interestingly, this is not seen in H3 and H4, where the pairs are formed even if the arginines are in the vicinity of the DNA backbone (45).
Another specific feature is the presence in NF-YC of an absolutely conserved tryptophan at position 85, at the end of helix α2, sandwiched between loop L2 of NF-YC and loop L1 of NF-YB (Fig. 2). Such a bulky residue at this position clearly influences the overall structure of this region. Interestingly, this amino acid is not conserved in NC2α, and the conformation of L1 of NC2β is different from that of NF-YB. Because L2 of NC2α was not seen, it is impossible to compare it to that of NF-YC. The difference in L1 loop conformation certainly depends on loop length; the L1 loop of NC2β is in general one residue shorter, although in yeast it has the same length. The hydrophobic cores organizing these regions are rather different, with little difference observed in the rest of the structures. These cores may be the reason why NF-YB/NC2α and NF-YC/NC2β pairs cannot be formed (42). Notably, when both structures are superimposed, it is clear that steric clashes would occur between residues of NC2β L1 and Trp-85, which are not likely to be accommodated by conformational changes.
Characteristic of the H2B family, a long αC helix is found in NF-YB (Fig. 2). This helix is shorter than those of H2B and NC2β, but because additional residues at its C terminus seem to prevent crystallization, it is possible that it extends further. In NF-YC, a loop-short helix-loop motif is found C-terminal to the core histone fold (Figs. 2 and 3). A short αC helix is also found in H2A but is positioned rather differently; it packs against the C terminus of helix α3 of H2A on one side, and loop LC/start of helix αC of H2B on the other side, making few other interactions with the rest of the dimer (Fig. 2C). The packing is totally different in the case of NF-YC, where αC folds back onto α3 and participates in a large hydrophobic core formed by residues of α2 and α3 of NF-YC and α2 of NF-YB. The interactions between loop LC/helix αC of NF-YC and loop LC/helix αC of NF-YB are fewer than in H2A/H2B, where loop LC/start of helix αC of H2B is closer and interacts strongly with helix αC of H2A (Fig. 2C), especially with an absolutely conserved glutamate of H2A being fixed by the dipole effect of helix αC of H2B. Interestingly, the differences between these short αC helices extend further: in NF-YC this region folds not as an α-helix but as a 3₁₀-helix (for clarity, however, the term αC has been kept). For technical reasons, the sequence spanning this region in NC2α was replaced by unrelated residues, and this chimera was subsequently used during crystallization studies of the NC2/TBP/TATA element complex (19). Based upon the strong sequence homology between NF-YC and NC2α in the αC region (residues 109-114) and in the rest of the secondary structure elements participating in the hydrophobic core stabilizing it, we anticipate that a helix is also present in NC2α at the corresponding position.
Whether additional residues at the C terminus of this helix adopt the same loop conformation seen in NF-YC is not clear, as NF-YC and NC2α sequences tend to diverge from this point.
FIG. 2 (legend fragment; the opening text is missing). ... (47), NF-YC and NF-YB, have been colored orange and green, respectively, with elements of secondary structure indicated. Arginine-aspartate pairs are displayed, together with Trp-85 of NF-YC that may play an important role in the specific interaction between NF-YB and NF-YC, in comparison to NC2. B, stereo Cα traces of the superimposition of the NF-YC/NF-YB (orange), H2A/H2B (gray), NC2α/NC2β (blue), and H3/H4 (green) histone pairs. The tails of the histone proteins and the H3 αN helix have been removed for clarity. C, superimposition of NF-YC/NF-YB (orange) and H2A/H2B (gray) histone pairs. The tails of H2A and H2B have been removed for clarity. The elements of secondary structure showing major differences have been labeled. D, superimposition of NF-YC/NF-YB (orange) and NC2α/NC2β (blue) histone pairs. Helix α5 of NC2β has been removed for clarity. NF-YC Trp-85 is displayed.
Minimal DNA Fragment Required for Proper Binding by NF-Y-Previous footprinting experiments have shown that three regions of the CCAAT boxes of the pro-α2(I) and pro-α1(I) collagen promoters are protected upon NF-Y binding (6). We have performed EMSA experiments on the Ea promoter to assess the minimal DNA fragment required for proper NF-Y binding. The results of dose-response and off-rate experiments performed with full-length NF-Y as well as with a mutant containing the conserved domains of the three subunits are summarized in Table III. Essentially, all the protected DNA stretches are important for proper recognition by NF-Y. Partial removal of one of these regions generally leads to a decrease in binding (oligonucleotides Ea-6, 12-Ea, and 8-Ea-2). The effect observed is rather weak when considering the region farthest from the CCAAT pentanucleotide (see oligonucleotides Ea-4, Ea-6, and 8-Ea-2), but deletion of this site results in an almost complete loss of binding (oligonucleotide Ea-10). Moreover, the comparison between oligonucleotides of identical length (Ea-10 and 8-Ea-2) clearly shows the asymmetry in position of the CCAAT pentanucleotide on the minimal DNA fragment required for NF-Y binding.
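Off-rate EMSA data of this kind are often summarized by a dissociation rate constant and a half-life. The Python sketch below shows one way such numbers could be extracted, assuming simple first-order dissociation, so that the logarithm of the fraction of complex remaining falls linearly with time. The model, the function name, and the input format are assumptions made for illustration; the analysis actually used for Table III is described in the cited references, not here.

```python
import math


def fit_off_rate(times, fractions_bound):
    """Estimate k_off and half-life from an off-rate time course.

    Fits ln(fraction bound) versus time by ordinary least squares,
    assuming fraction(t) = fraction(0) * exp(-k_off * t).
    Returns (k_off, half_life) in units set by the time axis.
    """
    if len(times) != len(fractions_bound) or len(times) < 2:
        raise ValueError("need at least two matched time points")
    if any(f <= 0 for f in fractions_bound):
        raise ValueError("fractions must be positive to take logarithms")
    logs = [math.log(f) for f in fractions_bound]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    k_off = -sxy / sxx
    return k_off, math.log(2) / k_off
```

A shorter half-life for a truncated oligonucleotide than for the full-length probe would indicate that the removed DNA stretch contributes to complex stability.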
Mutational Analysis on NF-YC αC Residues-NF-YC helix αC has been shown to be important for NF-YA binding (15). We have mutated certain residues of this helix (Fig. 3 and Table IV), either solvent-exposed or buried in hydrophobic cores, to further characterize its role in dimer and trimer formation and DNA binding, using EMSA experiments. None of the mutations affected dimerization, showing that the helix does not play an important role in the interaction between NF-YB and NF-YC. The mutation of the solvent-exposed aspartate 112 into asparagine leads to a decrease in trimerization but not in DNA binding (Table IV), showing that this residue most probably plays a role in NF-YA binding but that in the presence of DNA the trimeric interaction is stabilized. Two mutations, F111S and L114T, were designed to destabilize the hydrophobic core in which the αC helix participates. From their positions, the F111S mutation should destabilize the overall hydrophobic core, whereas the L114T mutation should weaken the anchoring of helix αC to the rest of the dimer. Both F111S and L114T are indeed highly reduced in association with NF-YA. However, as for the D112N mutant, DNA binding was not affected, confirming that in the presence of DNA the interaction with NF-YA is stabilized. Two more radical mutations were performed on solvent-exposed isoleucine residues, I115P and I117P, that are outside helix αC. Interestingly, both mutations prevent trimerization but also DNA binding (Table IV). On the other hand, we mutated isoleucine 115 into a lysine, the only residue of human NC2α that is deviant within this conserved stretch. Contrary to the I115P mutant, the I115K mutant behaves like wild-type NF-YC in dimerization, trimerization, and DNA-binding assays (Table IV), showing that the conformation of the loop following helix αC is also important for NF-YA binding.
DISCUSSION
DNA Binding by NF-YC/NF-YB-The structure described here confirms that the NF-YC/NF-YB histone pair is structurally closely related to the H2A/H2B and NC2α/NC2β dimers and suggests that DNA binding by NF-YC/NF-YB might also be similar. Both H2A/H2B and NC2α/NC2β interact directly and non-specifically with DNA in a multiprotein context, within the histone octamer and with TBP, respectively. In these complexes, few direct protein/DNA contacts are made by the core histone motifs (Fig. 1). A stable interaction between these histone pairs and the DNA seems to require other protein stretches, e.g. the histone tails and helix α5 of NC2β. Such additional regions do not seem to exist in NF-YB and NF-YC, and it is clear that to obtain the remarkable specificity and affinity for the CCAAT sequence, NF-YA must stabilize the complex, although photocross-linking experiments have confirmed that all three subunits of NF-Y interact with DNA directly (28).
The DNA fragments recognized by H2A/H2B (within the nucleosome core particle) and NC2α/NC2β (in complex with TBP and the TATA element) have rather different conformations (in the latter case the DNA is strongly distorted upon TBP binding). However, both complexes display similar DNA binding properties (19), and the trajectory of these fragments on the surface of these complexes is extremely similar (Fig. 4A). The electrostatic properties of NF-YC/NF-YB are almost identical to those of H2A/H2B and NC2α/NC2β; thus, it is tempting to postulate that the CCAAT box would also follow an identical trajectory onto NF-YC/NF-YB (Fig. 4B).
Such an interaction was modeled by superimposing the NF-YC/NF-YB structure onto a H2A/H2B dimer from the nucleosome core particle (see "Experimental Procedures"). Upon modeling, no steric clashes between the NF-YC/NF-YB dimer and the DNA are observed (Fig. 4C). As in the nucleosome structure (45, 46), both the DNA interaction sites L1L2 (formed by loops L1 and L2 at both extremities of the dimer) and α1α1 (formed by both α1-helices) are able to make contacts with the DNA, and the dipoles of helices α1 and α2 also point toward phosphate groups. On the other hand, the two arginines of H2A and H2B penetrating into the minor groove have not been conserved. The same observation is true for Lys/Arg-29 of NC2α, the only residue directly contacting a base of the TATA element, which is replaced by an absolutely conserved methionine in NF-YC. In fact, careful inspection of the model could not identify any residue from the core domains of NF-YB and NF-YC that would be able to make specific contacts with a base of the CCAAT box. Moreover, in this model the histone dimer spans about 24-26 bp, which is in excellent agreement with biochemical data (7, 27, 29).
The interaction between NF-Y and the CCAAT box was further modeled by replacing the DNA bases found in the nucleosome structure by those of the pro-α2(I) collagen promoter. Two locations for the CCAAT box are possible (depending on the strand chosen as the plus strand) that are related by the pseudo 2-fold axis of the histone dimer (Fig. 4C). One of them agrees better with NF-YA binding (see below). This model is in good agreement with the footprinting and methylation interference experiments made on several promoters (6, 26). In particular, the L1L2 sites would be responsible for interacting with the protected sites at both extremities of the footprinted region and are actually sufficient to explain the protection from hydroxyl radical cleavage at these sites. Because these protein regions are not supposed to make specific contacts with the DNA, it would also explain why no interference by methylation has ever been observed at these locations. As for the α1α1 site, it only partially accounts for the central region footprint; specifically, the CCAAT element itself, the only region of the DNA where methylation interference occurs (6, 26), is not protected by any region of the dimer (Fig. 4C). This clearly suggests that this protection is brought by the third subunit, NF-YA.
The model further provides a good explanation for the results of our EMSA experiments. Indeed, both strands are contacted by the dimer at all the protected sites. Partial removal, on one strand only, of one of the external sites would lead to smaller effects in terms of binding, as is seen experimentally (Table III). On the other hand, complete removal of one of these sites on both strands should have a much more drastic effect, which is the case for oligonucleotide Ea-10. It is interesting to note that in the case of NC2, where the histone pair recognizes a preformed TBP/TATA element complex, the requirement for three interaction sites seems to be less stringent (19).
Recognition of NF-YA and CCAAT Box Binding by NF-Y-Many biochemical studies have attempted to decipher the set of interactions between yeast and mammalian NF-YA, NF-YC/NF-YB, and the CCAAT box (15, 16, 20-24, 29, 42). Most of these studies were mutational analyses, performed with either point or deletion mutations. Re-examination of all these data in the light of our structure and of the proposed model reveals that most of the mutants described can affect the dimer and/or the dimer/CCAAT interaction in one of three ways: (i) interfering with the packing of the NF-YC/NF-YB dimer; (ii) destroying the dipole effect of α-helices α1 and α2 supposed to fix phosphate groups; and (iii) abolishing interactions, or causing steric or electrostatic hindrance, between the histone dimer and the DNA. Essentially, the vast majority of the mutations, particularly those falling in the last two classes, favor the NF-YC/NF-YB/CCAAT element model. The remaining mutations that do not fall into these three classes have been further considered to model the interaction of NF-YA with both the histone dimer and the DNA.
Two regions of the core domain of NF-YA have been identified as follows: an N-terminal region (NF-YA1; residues 234-257) recognizing the NF-YC/NF-YB dimer, and a C-terminal region (NF-YA2; residues 269-289) responsible for specific recognition of the DNA (20-24) (Fig. 1A). The N terminus of NF-YA1 forms an α-helix in solution, and only residues on one side of this helix are functionally important (23). In NF-YB, mutations on helix α2 (E90R and S97R) were shown to influence NF-YA binding (16,22). In NF-YC, both helices α1 and αC have been shown to be important for NF-YA binding (15,42) (our mutational analyses). These three elements of secondary structure are on one side of the NF-YC/NF-YB dimer and form a groove where the NF-YA1 N-terminal α-helix could bind (Fig. 4D). Such an interaction was modeled, showing that functionally important residues of NF-YA1, such as Arg-245 and Arg-249, could contact residues at the surface of the dimer, including Glu-90 and Ser-97. Interestingly, in this model the conserved NF-YA Ile-246 would pack against Ile-117, which is solvent-exposed in the loop following NF-YC helix αC. We have shown that mutation of the latter isoleucine to proline completely abolishes NF-YA binding (Table IV), a result that further highlights the importance of the NF-YC C-terminal region in trimer formation. Intriguingly, the model does not provide an explanation for the fact that the D112N mutant is impaired in NF-YA recognition (Table IV), showing that NF-YA1 binding possibly requires interactions other than those mentioned above.
From secondary structure prediction analysis, the NF-YA2 domain can be divided into an α-helical N-terminal region (residues 269-281) and a small coiled C-terminal region (residues 282-289), as for NF-YA1 (data not shown). Several mutational experiments indicate that helix α1 of NF-YB influences DNA recognition by NF-YA2 (16,22,42) and suggest that these two regions interact directly, with NF-YB helix α1 possibly positioning the NF-YA2 α-helix in a correct orientation. We have already mentioned that modeling of the CCAAT box left two possible locations for this DNA sequence. Interestingly, NF-YB helix α1 is positioned exactly below one of these two locations, on its major groove side. Modeling of the interaction of the helical part of NF-YA2 with both NF-YB helix α1 and the CCAAT pentanucleotide was rendered difficult by the fact that essential residues of NF-YA2 might be involved in specific base recognition, in phosphate backbone recognition, or in interaction with NF-YB. Moreover, one cannot exclude that NF-YA binding distorts DNA into a conformation that would be preferred for proper recognition by NF-YC/NF-YB, a possibility that could not be accounted for by our model. However, as it stands, the model could fully explain the footprinting pattern observed for NF-Y on the collagen promoter and our EMSA experiments.
Several questions still remain. First, the orientation of the NF-YA2 helix is not known because the linker connecting NF-YA1 to NF-YA2 could either go through the space left between the NF-YC/NF-YB dimer and the DNA or cross over the DNA (Fig. 4D). Second, NF-Y was shown to bind in the minor groove (6,26), a fact that could possibly be explained by having the NF-YA linker region cross over the DNA, although it cannot be excluded that the supposed coiled C-terminal region of NF-YA2 could also play such a role. Third, the flanking regions of the CCAAT pentanucleotide are crucial for efficient binding of NF-Y. It is not known whether these bases can be recognized specifically by NF-Y, or are necessary for proper distortion of the DNA, or both. Finally, photocross-linking studies have shown that NF-YA and NF-YC interact more extensively with CCAAT elements than the model would explain (28). We suspect that this might be because of the interaction between the activation domains of these two proteins and the DNA. Such open questions, and possibly others, will only be answered with the determination of the structure of the quaternary complex.
Structural and Functional Differences with H2A/H2B and NC2α/NC2β. A lot of controversy arises from the fact that the NF-YC/NF-YB, H2A/H2B, and NC2α/NC2β histone pairs share sequence and structural similarity but have different functional roles inside the nucleus. This is clearly reinforced by the fact that their DNA binding characteristics are strongly conserved and that the NF-YC/NF-YB pair has been shown to interact with protein partners of the two other dimers. Indeed, NF-YC/NF-YB can associate with H3/H4, but not with H2A/H2B, to form higher order structures (43). After superposition of our dimer onto an H2A/H2B dimer of the core nucleosome particle, we looked at the possibility of forming (H3/H4)₂(NF-YC/NF-YB)₂ octamers reminiscent of the histone octamer. Clearly, such a hypothesis is not valid, as many steric clashes occur at the different interfaces between the pairs (data not shown). This result is in agreement with previous data and with the fact that NF-YC/NF-YB can also associate with formed nucleosomes, suggesting indeed that the interactions between these dimers are rather different (43).
Next, NF-YC/NF-YB has also been shown to interact in vitro with TBP but not with a preformed TBP/TATA element (30,42). In the NC2/TBP/TATA structure, NC2 makes relatively few protein-DNA contacts, which could also be formed by NF-YC/NF-YB, and it recognizes TBP on both sides at two locations, thus encircling the DNA with TBP (19). The strongest interaction corresponds to numerous contacts between the α5 helix of NC2β and the C-terminal domain of TBP, whereas at the other location an arginine side chain from TBP contacts a main chain carbonyl and is stacked against other side chains of NC2α. From sequence alignments, it is not clear whether NF-YB contains a fifth helix which might contact TBP as NC2β does. If there is any interaction between NF-YB and TBP, it would most probably be different. In the case of NF-YC, the replacement of the absolutely conserved Gly-28 in NC2α by Lys/Arg-59 would cause a strong steric hindrance, preventing interaction with TBP. In conclusion, and in agreement with experimental data, although the determinants for DNA binding in the core histone regions of NC2α/NC2β are conserved in NF-YC/NF-YB, recognition of TBP by this latter complex must be rather different and cannot be achieved in the context of a preformed TBP/TATA element, as observed with NC2.
The previous examples suggest the existence of specific determinants implicated in the functionality of each pair. Another aspect in which the NF-YC/NF-YB dimer might play a role independently of NF-YA association is related to the positive transcription function of NC2, recently unmasked on distal promoter elements, for which the histone folds are sufficient (18). The mechanistic details are poorly understood at present but are clearly independent of TBP binding and are possibly related to facilitation of correct connections between the DNA and the H3/H4-like TAFIIs within TFIID. Because NF-YC/NF-YB is known to interact with histone-like TAFs (32), it would be interesting to investigate whether NF-YC/NF-YB might, in this case, play essentially the same positive role as NC2 at the distal promoter element or whether other determinants make this process once again NC2-specific.
The NF-YC αC Region Is a Target for Regulatory Proteins. The distortion of the DNA by NF-Y, as modeled here, possibly coupled to the recruitment mediated by the NF-YA and NF-YC activation domains, would make it possible for other gene-specific regulatory factors (e.g. RFX, SREBPs, Sp1, and C/EBP) to come into close vicinity of the GTFs, thus facilitating transcription activation. More globally, the emergence in highly regulated promoters of several CCAAT elements located 30-40 bases apart clearly raises the question of the precise three-dimensional arrangement mediated by NF-Y, and of the requirement of such large DNA distortions for the recruitment of incoming positive as well as negative cofactors. Recent studies reveal that this process might be influenced by different regulatory proteins in a promoter-dependent way and that the nearly invariant NF-YC αC region, besides its role in NF-YA binding, is a target element for these proteins. First, it has been shown that the C terminus of the NF-YC core region is the docking site for c-MYC and that this interaction is absolutely necessary for transcriptional repression by c-MYC on the platelet-derived growth factor β-receptor (33). Second, p53 transcriptional repression on promoters having multiple CCAAT boxes has been shown to be dependent on NF-Y, and once again, the αC region of NF-YC is required for this process.² The different location of NF-YC helix αC, compared with that of H2A, reveals a unique specificity of the NF-Y and also most probably of the NC2 sub-family. The overall strong evolutionary sequence conservation between these two histone pairs raises the question of whether NF-Y and NC2, despite their different functional roles, could share common regulatory pathways. In this respect, it would be interesting to study whether c-MYC and p53 would have any effects on the known functions of NC2.