The yeast homeodomain protein MATalpha2 shows extended DNA binding specificity in complex with Mcm1.

The MATα2 (α2) repressor interacts with the Mcm1 protein to turn off a-cell type-specific genes in the yeast Saccharomyces cerevisiae. We compared five natural α2-Mcm1 sites with an α2-Mcm1 symmetric consensus site (AMSC) for their relative strength of repression and found that the AMSC functions slightly better than any of the natural sites. To further investigate the DNA binding specificity of α2 in complex with Mcm1, symmetric substitutions at each position in the α2 half-sites of AMSC were constructed and assayed for their effect on repression in vivo and DNA binding affinity in vitro. As expected, substitutions at positions in which there are base-specific contacts decrease the level of repression. Interestingly, substitutions at other positions, in which there are no apparent base-specific contacts made by the protein in the α2-DNA co-crystal structure, also significantly decrease repression. As an alternative method to examining the DNA binding specificity of α2, we performed in vitro α2 binding site selection experiments in the presence and absence of Mcm1. In the presence of Mcm1, the consensus sequences obtained were extended and more closely related to the natural α2 sites than the consensus sequence obtained in the absence of Mcm1. These results demonstrate that in the presence of Mcm1 the sequence specificity of α2 is extended to these positions.

Homeodomain proteins are a family of transcription factors involved in many developmental and cellular processes and have been found in almost every eukaryotic organism (1-4). The natural target sites for many homeodomain proteins are unknown; therefore, their DNA-binding sites have been defined through site selection experiments in vitro (5-9). Although these studies provide important information on homeodomain-DNA recognition in vitro, in some cases it appears that the homeodomain proteins do not function well at these sites in vivo (10-14). One possible explanation for this discrepancy is that in vivo the DNA binding specificity and affinity of some homeodomain proteins may be influenced by interactions with cofactors (13, 15-19). Since many of the studies that examine homeodomain binding sites have been done in the absence of cofactors, this may explain why in some cases sites selected in vitro may not be functional sites for the homeodomain protein complexes in vivo. To address this issue and to understand how homeodomain proteins recognize specific sites, we have investigated the DNA binding specificity of α2, a yeast homeodomain protein for which the natural target sites and cofactors are well known.
The α2 protein is involved in the regulatory system that specifies cell mating type in Saccharomyces cerevisiae (20-24). In diploid cells, the α2 and a1 proteins form a heterodimer to repress expression of haploid-specific genes (25,26). In haploid α cells and diploid cells, α2 interacts with a general transcription regulatory factor, Mcm1, to repress expression of a-specific genes (asg) (27-29). DNase I protection and deletion experiments of the promoter region of the STE6 gene revealed an operator site required for α2-mediated repression (23,30). Sequences similar to this site were also found in the promoters of four other asg, BAR1, STE2, MFA1, and MFA2 (20, 31-33). On its own, α2 binds in vitro to the STE6 operator site with moderate affinity (34). In the presence of Mcm1, the apparent α2 DNA binding affinity increases at least 100-fold, indicating that there are strong cooperative interactions between α2 and Mcm1 (28,29). α2 binds DNA as a dimer with each monomer flanking an Mcm1 dimer, which binds to the center of the α2-Mcm1 site. Mutations in either the α2 or Mcm1 binding site of the STE6 operator dramatically reduce the level of repression in vivo, demonstrating that binding by both proteins is required for repression (25,35,36).
The α2 protein (25 kDa, 210 amino acids) contains two structural domains (34). The N-terminal domain (residues 1-102) is required for dimerization and repression (37-39). The C-terminal domain (residues 132-188) contains a homeodomain, which binds DNA on its own in vitro but is not sufficient for repression (34,37). Flexible regions adjacent to the N terminus and C terminus of the α2 homeodomain are required for interaction with the Mcm1 and a1 cofactors, respectively (26, 40-44). The three-dimensional structure of the α2 homeodomain has been determined by NMR and x-ray crystallography studies (42,45,46). Although there is only 27% sequence identity between the α2 and Drosophila engrailed homeodomains, their overall structures are very similar (46,47). Moreover, these proteins bind DNA in a similar manner, and most of the conserved residues in the third helices of the homeodomains make identical contacts with DNA (46,47). The α2 protein therefore provides a good model for studies of homeodomain protein-DNA interactions.
In this paper, we have examined in detail the sequence requirements for the α2 homeodomain protein in repression of asg in vivo and in vitro. Our results indicate that, in complex with Mcm1, the sequence specificity of DNA binding by α2 is apparently more extended than on its own. These results suggest one explanation for why DNA recognition sites determined in vitro for proteins in the absence of their cofactors may not function as optimal sites in vivo.

MATERIALS AND METHODS
Plasmid Construction-Plasmids, which contain α2-Mcm1 binding sites (operators) found in the promoter regions of the asg (STE6, STE2, BAR1, MFA1, and MFA2 (21-24)), were constructed by inserting double-stranded oligonucleotides containing the operators with TCGA overhangs into the XhoI site between the TATA and UAS elements in the CYC1-lacZ promoter of pTBA23 (44). The α2-Mcm1 consensus symmetric site (AMSC) and mutant derivatives are symmetric and therefore can self-anneal, leaving TCGA overhangs for cloning into the reporter vector. The mutant sites are named by describing the original nucleotide, position, and substituted nucleotide in the top strand. For instance, T3A/A28T is a symmetric mutant in which the T at position 3 is mutated to A and the A at position 28 is mutated to T.
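The naming rule for symmetric mutants can be illustrated with a short Python sketch. It assumes, from name pairs such as T3A/A28T and C1G/G30C, that the numbered site is 30 bp long, so that position p pairs with position 31 - p and the partner substitution is the complement of the first; the function names are illustrative only.

```python
# Watson-Crick complements for the four DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Assumption: the numbered operator is 30 bp long (inferred from pairs
# such as T3A/A28T and C1G/G30C, where the two positions sum to 31).
SITE_LENGTH = 30


def symmetric_partner(mutation, site_length=SITE_LENGTH):
    """Return the mirror-image substitution for a top-strand mutation.

    A name such as "T3A" is parsed as original base, position, new base.
    The partner sits at the mirrored position and carries the
    complementary original and substituted bases.
    """
    original = mutation[0]
    position = int(mutation[1:-1])
    substituted = mutation[-1]
    mirror = site_length + 1 - position
    return f"{COMPLEMENT[original]}{mirror}{COMPLEMENT[substituted]}"


def symmetric_name(mutation, site_length=SITE_LENGTH):
    """Build the full name of a symmetric double mutant, e.g. T3A/A28T."""
    return f"{mutation}/{symmetric_partner(mutation, site_length)}"


if __name__ == "__main__":
    for half_site_change in ["T3A", "G4T", "T5G", "A6C"]:
        print(symmetric_name(half_site_change))
```

Under these assumptions the sketch reproduces the mutant names that appear in the text, which serves as a quick consistency check on the numbering.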
β-Galactosidase Assays-Plasmids containing different asg operators or AMSC sites were transformed into the yeast strain 246.1.1 (MATα trp1 leu2 ura3 his4) for β-galactosidase assays (48). The assays were performed as described (27). The level of repression was determined by comparing lacZ expression from a plasmid containing an α2-Mcm1 site with that from a plasmid lacking a site, pTBA23. Assays were performed with three independent transformants of each construct, and the β-galactosidase units were averaged. Standard deviations were within 20%.
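The repression calculation can be sketched in Python as the ratio of averaged reporter activity without an operator to that with an operator. The unit values below are hypothetical and chosen only to make the arithmetic easy to follow; they are not measurements from this study, and the helper names are illustrative.

```python
def mean(values):
    """Arithmetic mean of a list of measurements."""
    return sum(values) / len(values)


def fold_repression(units_without_site, units_with_site):
    """Fold repression: lacZ expression without a site divided by expression with it."""
    if units_with_site <= 0:
        raise ValueError("expression with the site must be positive")
    return units_without_site / units_with_site


# Hypothetical beta-galactosidase units from three independent
# transformants of each construct (illustrative values only).
no_site_units = [250.0, 260.0, 270.0]   # reporter lacking an operator
with_site_units = [1.9, 2.0, 2.1]       # reporter carrying an operator

fold = fold_repression(mean(no_site_units), mean(with_site_units))

if __name__ == "__main__":
    print(f"fold repression: {fold:.0f}")
```

Averaging the transformants before taking the ratio mirrors the order of operations described above; taking the ratio per transformant and then averaging would be a different, equally simple variant.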
Electrophoretic Mobility Shift Assays (EMSAs)-The relative α2-Mcm1 DNA binding affinities for the AMSC and mutant operators were determined by EMSA as described previously (43). Labeled DNA fragments containing α2-Mcm1 sites were incubated with purified α2-(92-210) and Mcm1-(1-96) proteins at the concentrations given in the figure legends at room temperature for 3 h. The α2 and Mcm1 proteins used in these experiments were purified as described previously and were greater than 90% pure as judged on Coomassie-stained SDS-polyacrylamide gels (40). All EMSA reactions were electrophoresed through a 6% native polyacrylamide gel in 0.5× TBE at 200 V for 2 h. Gels were then dried and exposed to phosphor screens, and images were scanned on a Molecular Dynamics PhosphorImager model 425.
Labeled DNA fragments used in EMSAs were synthesized by PCR amplification. Oligonucleotides W340 (5′-CACGCCTGGCGGATCTGC) and W341 (5′-GCCCACGCGTAGGCAATC), which anneal to the sequences on either side of the α2-Mcm1 site in the CYC1-lacZ promoter, were used as primers in PCR amplifications. W340 was kinased with [γ-32P]ATP and purified with a QIAquick spin column (Qiagen). PCR amplification was performed as follows: 94°C for 5 min; 94°C for 30 s, 48°C for 1 min, 72°C for 1 min for 35 cycles; and 72°C for 10 min. PCR products were purified on a 10% native polyacrylamide gel. The relative α2 DNA binding affinity for each site was calculated by comparing the percentage of fragment bound at different α2 protein concentrations. The cooperativity between α2 and Mcm1 was determined by comparing the α2 DNA binding affinity in the presence of Mcm1 with that in the absence of Mcm1.
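One way to turn a titration into a relative affinity is sketched below in Python. It assumes that the protein concentration giving 50% bound fragment is used as the comparison point and that this point is found by interpolating on a log concentration scale; the text does not specify the exact procedure, so this is only an illustration. The titration values are hypothetical.

```python
import math


def half_saturation(concentrations, fraction_bound):
    """Estimate the protein concentration at which 50% of the fragment is bound.

    Assumes concentrations are sorted in increasing order and interpolates
    linearly in log10(concentration) between the two bracketing points.
    """
    points = list(zip(concentrations, fraction_bound))
    for (c_lo, f_lo), (c_hi, f_hi) in zip(points, points[1:]):
        if f_lo <= 0.5 <= f_hi:
            if f_hi == f_lo:
                return c_lo
            t = (0.5 - f_lo) / (f_hi - f_lo)
            log_c = math.log10(c_lo) + t * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    raise ValueError("50% binding is not bracketed by the titration")


def fold_difference(half_point_a, half_point_b):
    """Fold by which site A binds more weakly than site B (ratio of half-points)."""
    return half_point_a / half_point_b


# Hypothetical titrations (molar concentrations, fraction of fragment bound).
reference_site = half_saturation([1e-9, 1e-8, 1e-7], [0.10, 0.50, 0.90])
mutant_site = half_saturation([1e-8, 1e-7, 1e-6], [0.10, 0.50, 0.90])

if __name__ == "__main__":
    print(f"mutant binds {fold_difference(mutant_site, reference_site):.0f}-fold more weakly")
```

The same ratio, applied to half-points measured with and without Mcm1, gives a simple estimate of cooperativity under the same assumptions.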
Random Binding Site Selection-The binding site selection assays were performed with oligonucleotide W896 (5′-gcccacgcgtaggcaatcgaattcN8tacctaattaggaagtcgacgcagatccgccaggcgtg), where N signifies a random base and the underlined nucleotides represent the core Mcm1 binding sequence. Nucleotides flanking the Mcm1 core binding site were designed to be imperfectly symmetric to ensure that the sites selected in the assay were not from contamination of a wild type sequence (...CATGTAATtacctaattaggta...) that was used for comparison of the DNA binding affinity. W896 was made double-stranded by filling in with Sequenase using the end-labeled W340 as a primer that anneals to the 3′ part of W896. EMSAs of the randomized double-stranded oligonucleotides were performed with α2-(92-210) as described above. DNA in the shifted band that is detectable at the lowest protein concentration was extracted from the dried gel slice (49). The isolated DNA was amplified by PCR using primer W341 and end-labeled primer W340 and then purified for the next round of selection. After six rounds, the purified DNA fragments were cloned into a T-overhang vector (50). The inserted selected sites were sequenced, and their DNA binding affinity was determined by EMSA and quantitated on a PhosphorImager. The α2 binding site selection in the presence of Mcm1 was performed by utilizing the same initial randomized oligonucleotide pool. In each round of selection, a titration of α2 initiated at 1 M and 0.5 M of Mcm1 were present in the EMSA reactions. After six rounds of selection, the selected sites were cloned and sequenced, and DNA binding affinity was measured as described above.
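The primer design described above can be checked with a few lines of Python. The sketch writes W896 as one continuous string with the N8 block expanded, then tests that W340 is complementary to the 3′ end of W896 (so it can prime the fill-in reaction) and that W341 matches the 5′ end. It also counts the distinct sequences possible in an 8-base random region. The variable names are illustrative.

```python
# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (case-insensitive input)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


W340 = "CACGCCTGGCGGATCTGC"
W341 = "GCCCACGCGTAGGCAATC"

# W896 written 5' to 3', with the eight random positions shown as N.
RANDOM_LENGTH = 8
W896 = (
    "gcccacgcgtaggcaatcgaattc"
    + "N" * RANDOM_LENGTH
    + "tacctaattaggaagtcgacgcagatccgccaggcgtg"
).upper()

# W340 primes the fill-in reaction if its reverse complement is the 3' end of W896.
w340_anneals = W896.endswith(reverse_complement(W340))

# W341 is identical to the 5' end of W896, so it primes on the filled-in strand.
w341_matches = W896.startswith(W341)

# Number of distinct sequences possible in the randomized region.
pool_complexity = 4 ** RANDOM_LENGTH

if __name__ == "__main__":
    print("W340 anneals to 3' end of W896:", w340_anneals)
    print("W341 matches 5' end of W896:", w341_matches)
    print("possible random sequences:", pool_complexity)
```

Both checks pass for the sequences as printed, which is consistent with W340 and W341 serving as the PCR primer pair in each round of selection.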

RESULTS
The α2-Mcm1 DNA-binding Sites from Five Natural asg Function as Repressor Sites in Either Orientation-The α2-Mcm1 binding sites have been identified in the promoter regions of five asg (STE6, BAR1, STE2, MFA1, and MFA2) (21-24). Although the natural sites are highly conserved, there are variations at some positions that may result in different levels of α2-Mcm1-mediated repression. To verify that these α2-Mcm1 sites are all functional repressor sites and to measure their relative strength for repression in the same promoter context, oligonucleotides containing the sites were inserted into the promoter region of a CYC1-lacZ reporter plasmid, and the level of expression from the promoter was assayed by measuring β-galactosidase activity (Fig. 1). The results indicate that although there are small variations in the level of repression between the different sites, all of the sites confer greater than 30-fold repression of lacZ expression.
The STE6 operator was shown to function as a repressor site in either orientation. There was, however, a small difference in the level of repression between the two directions (23). Since all five natural α2-Mcm1 sites are only partially symmetric dyads, we were interested in whether the orientation of the sites with respect to the start site of transcription affects the level of repression. To address this question, we compared the level of repression conferred by these sites in both orientations (Fig. 1). Although all five sites function in either orientation, there were slight differences in the levels of repression. These small differences may be due to the asymmetry of the sites or to the flanking sequences.
A consensus sequence of these sites is highly symmetric. To examine whether a symmetric sequence may function as a better repressor site, an α2-Mcm1 symmetric consensus site, which we call AMSC, was assayed for the repression activity it conferred in the context of the CYC1-lacZ reporter construct. We found that AMSC is a slightly better repressor site (1.3- to 4-fold) than any of the natural α2-Mcm1 sites (Fig. 1).
To determine whether the increase in repression conferred by the AMSC site is due to stronger DNA binding affinity of the α2-Mcm1 complex to the site, we compared the DNA binding affinity of α2-Mcm1 for AMSC and STE6 sites by EMSA. Our results indicated that α2-Mcm1 binds to the AMSC site slightly better (1.3-fold) than to the STE6 operator (Fig. 3). This result is consistent with our in vivo repression data and shows that the AMSC site functions as a better site for α2-Mcm1 binding and repression than the known natural asg operators.
Saturation Mutagenesis of the α2 Recognition Sequence in AMSC-The co-crystal structure of the α2 homeodomain bound to DNA shows that the protein makes base-specific contacts with positions T3, G4, and T5 in the major groove and with positions T8 (or A8) and T9 in the minor groove (Fig. 2A) (46). These positions are conserved in the α2 binding sites in each of the known natural asg. However, a comparison of the natural α2-Mcm1 sites also indicates that other positions such as positions 1, 2, 6, and 7, in which there are no apparent base-specific contacts in the co-crystal structure, are also highly conserved. This strong conservation among the natural sites suggests that there is a sequence specificity at these positions. To investigate if there are sequence requirements at these positions for α2 DNA binding and repression, a series of AMSC operators with symmetric base pair substitutions in both α2 half-sites were cloned into the CYC1-lacZ reporter promoter, and their effects on repression in vivo were measured using β-galactosidase assays (Fig. 2B). As expected, substitutions at positions T3, G4, and T5 result in a large decrease in repression of lacZ expression (approximately 100-fold). Substitutions at positions in which there are base-specific contacts in the minor groove, T8 and T9, also significantly reduce the level of repression. Surprisingly, substitutions at positions in which there are sugar-phosphate backbone contacts, but no base-specific contacts in the co-crystal structure, also dramatically reduce repression (Fig. 2B). For example, some substitutions at positions C1, A2, and A7, and most notably A6, reduce the level of repression 10-50-fold. These results show that there is additional sequence specificity at these positions for α2-Mcm1-mediated repression in vivo.
To correlate the repression data with DNA binding activity, we assayed α2-Mcm1 DNA binding affinity for the mutant operator sites by EMSA. Substitutions at positions in which there are base-specific contacts, such as T3A/A28T, G4T/C27A, and T5G/A26C, cause a large reduction in α2-Mcm1 DNA binding affinity (10-, 26-, and 40-fold, respectively) (Fig. 3A). Substitutions at positions in which there are no base-specific contacts in the co-crystal structure, such as C1G/G30C, A2C/T29G, A6C/T25G, and A7C/T24G, also affect the α2-Mcm1 DNA binding affinity but to a smaller degree, 2-, 3-, 8-, and 3-fold, respectively, compared with substitutions at positions T3, G4, and T5 (Fig. 3B). Although there are some differences between the absolute fold decreases of the in vivo and in vitro assays, all of the substitutions have similar effects. For example, those substitutions with large decreases in repression in vivo have large decreases in DNA binding affinity in vitro, and substitutions with small effects on the level of repression in vivo have small decreases in DNA binding affinity. The in vitro results therefore support the in vivo observations and show that many of these substitutions, even at those positions in which there are no apparent contacts with bases, affect α2 DNA binding affinity and repression.
Substitutions in One α2 Half-site Have Smaller Effects on Repression than the Symmetric Substitutions in Both α2 Half-sites-We have found that substitutions at many positions in the AMSC site affect repression in vivo. These results are different from those of a previous study, which showed that changes at positions 1, 2, 3, 6, 7, and 8 in the α2 homeodomain recognition sequence of the natural STE6 operator have only small effects on repression (36). One explanation for this discrepancy is that we have constructed operators with symmetric substitutions in both α2 half-sites of the AMSC instead of a single point mutation in one α2 half-site of the natural STE6 operator. To investigate this difference, we compared the effect on repression of single point substitutions in one α2 half-site in AMSC with symmetric mutations in both α2 half-sites (Fig. 4). An asymmetric substitution at a position in which there is a base-specific contact, such as G4A, leads to slightly higher repression than the symmetric substitution G4A/C27T. Asymmetric substitutions at other positions, such as A7, T8, and T9, have less effect on repression than symmetric substitutions in both half-sites. These results agree with the previous study and show that although the single mutations only have a small effect on repression, there is a sequence preference at these positions in the context of symmetric substitutions.

Fig. 2 legend (beginning missing): The DNA sequence shows one α2 half-site used in the co-crystal structure and is identical to the α2 sites of the AMSC. Arrows represent base-specific contacts, and lines with circles represent sugar-phosphate backbone contacts that are observed in a co-crystal structure of the α2 homeodomain binding to DNA (46). B, saturation mutagenesis of the α2 binding site in AMSC. The sequence at the top of the table shows one α2 half-site. The other half-site is symmetric to the sequence shown. Each value in the table refers to the percentage of the repression activity of AMSC conferred by the mutant operator that contains one pair of symmetric substitutions at that position in both α2 half-sites. The AMSC site conferred 130-fold repression of lacZ expression, which was defined as 100% repression. The values were derived from the averages of measurements from three independent transformants.
The α2 Homeodomain DNA-binding Site Selection in Vitro-The natural DNA-binding sites of many homeodomain proteins are unknown. One commonly used method to determine their target sites is through in vitro DNA-binding site selection experiments utilizing randomized oligonucleotides. One important question is how well sites selected through in vitro selection experiments correlate with the natural in vivo target sites. The mutagenesis experiments described above have precisely defined the sequence requirements for α2 binding. We therefore decided to use the site selection technique to determine whether it would identify a similar site. An oligonucleotide pool that contains an Mcm1 binding site adjacent to a randomized region was used in the site selection assay. After six rounds of selection, the sites were cloned and sequenced (Table I). An alignment of the sequences obtained in the selection arrived at a consensus site of TGT, which corresponds perfectly with the positions contacted by α2 in the major groove in the co-crystal structure (46). We have assayed the α2 DNA binding affinity to each site and have found that sites with high DNA binding affinity are closely related to the natural α2 recognition sequences (Fig. 1). If only those sites with moderate affinity (++) or better were considered, we obtained a consensus sequence of TGTAA, which closely matches the natural α2 sites. These results indicate that this in vitro technique can identify sites that correspond well with the natural ones.
Although α2 binds DNA on its own in vitro, it must interact with Mcm1 to repress asg in vivo. Previous studies have shown that the cooperative DNA binding by α2 and Mcm1 requires a specific spacing and orientation between their respective DNA-binding sites (34,35,44). We therefore analyzed whether the sites that were obtained from the in vitro selection experiments were able to be bound cooperatively by α2 and Mcm1 (Table I).
Our results show that only sites that have the proper spacing, orientation, and sequence between the α2 and Mcm1 recognition sites, such as sequences 1, 2, and 9, are bound cooperatively by α2 and Mcm1. On the other hand, those sites that do not have these sequence requirements are not bound cooperatively by α2 and Mcm1, although on their own both proteins bind to these sites with relatively high affinity. For example, sequences 3, 5, and 7 are not bound cooperatively by α2 and Mcm1 because they do not have the same orientation between the α2 and Mcm1 binding sites. Furthermore, sequences 6 and 8, which have the proper orientation and spacing between the α2 and Mcm1 binding sites, are not bound cooperatively by α2 and Mcm1 because a G or C is present at positions that are important for the α2-Mcm1 complex binding in vivo (Fig. 2B). These results indicate that the site selection assay is also able to screen for additional sequence requirements such as spacing and orientation between sites when a protein binds DNA in a complex with other cofactors.
The in vitro selection experiment, using α2 alone, defined a consensus sequence TGT that corresponds well to the α2 recognition core sequence in natural sites. This consensus site, however, does not extend to some positions that are conserved in the natural α2-Mcm1 sites and that we have shown are important for repression in vivo. In the presence of Mcm1, it appears that there are base-specific preferences at these positions. We therefore performed the α2 site selection experiment a second time with the same pool of random oligonucleotides in the presence of Mcm1. After six rounds of selection, we obtained 50 sequences, which we aligned in two groups according to the different spacing (5 or 6 bp) between the α2 recognition core site, TGT, and the core Mcm1 binding site, CCTAATTAGG (Table II). In each group, the sequences are listed based on the observed binding affinity of the α2-Mcm1 complex. Most of the selected α2 sites have the appropriate orientation and spacing between the α2 and Mcm1 sites. In this selection, one-third of the sites selected were sequence 1, which is not only the highest affinity site selected from the pool, but also exactly matches the α2 half-site that was used in the AMSC site (Figs. 1 and 2). Sequences that were selected from the pool (sequences 1, 4, 17, and 19) are identical to some of the natural α2 half-sites shown in Fig. 1. We obtained a consensus sequence of ATGTAAT for sites with 5-bp spacing between the α2 TGT core site and the Mcm1 site. This sequence perfectly matches the α2 half-site sequence in the AMSC site that we derived from an alignment of the natural asg operators. A slightly different consensus site, GTGTAADT (D represents A, G, or T), was obtained from selected sites with 6-bp spacing between the α2 TGT core site and the Mcm1 site. We have assayed one derivative of this consensus (CGTGTAAAT) for its ability to repress transcription of the CYC1-lacZ promoter in vivo and have found that the operator containing this sequence in each α2 half-site strongly represses the lacZ expression (45-fold repression).
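A degenerate consensus such as GTGTAADT can be tested against candidate sequences by expanding each IUPAC code into a character class. The Python sketch below does this with the standard `re` module. Only the codes needed here are included, and the function names are illustrative.

```python
import re

# Subset of the IUPAC nucleotide codes: the four bases, D (A, G, or T), and N (any base).
IUPAC = {
    "A": "A",
    "C": "C",
    "G": "G",
    "T": "T",
    "D": "[AGT]",
    "N": "[ACGT]",
}


def consensus_to_regex(consensus):
    """Compile an IUPAC consensus string into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in consensus.upper()))


def contains_consensus(sequence, consensus):
    """Return True if the sequence contains a match to the consensus."""
    return consensus_to_regex(consensus).search(sequence.upper()) is not None


if __name__ == "__main__":
    # The derivative assayed in vivo contains a match to the 6-bp-spacing consensus.
    print(contains_consensus("CGTGTAAAT", "GTGTAADT"))
    # A C at the degenerate position falls outside D and does not match.
    print(contains_consensus("CGTGTAACT", "GTGTAADT"))
```

The check confirms that CGTGTAAAT is one of the sequences covered by GTGTAADT, with A filling the degenerate D position.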
One notable difference between the selected sites with 5-bp spacing or 6-bp spacing between the α2 TGT sequence and the Mcm1 site is the base preferences at position 2. Selected sites with 5-bp spacing predominantly have an A at this position (23 of 34), while only 6 of 34 have a G at this position. On the other hand, selected sites with 6-bp spacing predominantly contain a G (13 of 16) at this position. These results suggest that there may be a difference in the base pair specificity at this position that depends on the spacing between the α2 and Mcm1 sites. The data in Fig. 2 show that in operators with 5-bp spacing an A at position 2 represses lacZ expression 2-fold better than a site with a G at this position. However, in sites with 6-bp spacing, we find that the site with a G at position 2 represses lacZ expression about the same (45-fold) as the site with an A at this position (51-fold). These results indicate that the sequence-specific requirements at position 2 are less stringent for sites with 6-bp spacing between the α2 and Mcm1 recognition sites than sites with 5-bp spacing.
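The position 2 counts quoted above can be converted to frequencies with a short Python sketch. The counts come from the text; the dictionary layout and function name are illustrative.

```python
from fractions import Fraction

# Counts of selected sites by spacing group and base at position 2, as quoted in the text.
POSITION_2_COUNTS = {
    "5-bp": {"A": 23, "G": 6, "total": 34},
    "6-bp": {"G": 13, "total": 16},
}


def base_fraction(spacing, base):
    """Fraction of selected sites in a spacing group with the given base at position 2."""
    group = POSITION_2_COUNTS[spacing]
    return Fraction(group[base], group["total"])


# The two spacing groups together account for all selected sequences.
total_sequences = sum(group["total"] for group in POSITION_2_COUNTS.values())

if __name__ == "__main__":
    for spacing, base in [("5-bp", "A"), ("5-bp", "G"), ("6-bp", "G")]:
        print(f"{spacing} spacing, {base} at position 2: {float(base_fraction(spacing, base)):.0%}")
    print("total selected sequences:", total_sequences)
```

The group totals sum to 50, which agrees with the number of sequences obtained after six rounds of selection.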

DISCUSSION
The five natural α2-Mcm1 binding sites that have been identified in the promoter regions of asg are highly conserved. We have examined the relative strength of these sites by comparing the level of repression mediated by these sites in the same promoter context. All of these sites confer strong repression of lacZ expression from a heterologous CYC1-lacZ promoter, although there are some differences in the relative strength of repression, with MFA1 > BAR1 > STE6 > MFA2 > STE2 (Fig. 1). The strength of repression mediated by these sites correlates with the degree of similarity to a consensus α2-Mcm1 binding site; i.e. the higher the sequence similarity to the consensus site, the stronger the repression. To further test this correlation, a symmetric consensus site (AMSC) was assayed in the same context and was found to confer better repression than any of the natural sites. The AMSC site was also bound cooperatively by the α2 and Mcm1 proteins with slightly higher affinity than the STE6 operator. The level of repression is therefore, at least in part, a function of the strength of α2-Mcm1 binding, and the higher the binding affinity, the greater the repression. Although the natural α2-Mcm1 operators are not optimal binding and repressor sites, it may not be biologically necessary for these sites to function as well as the AMSC site. For example, the transcriptional activator elements in the asg promoters may be significantly weaker than the CYC1 UAS elements of the reporter promoter used in this study. These weaker promoters would not require a repressor site as strong as the AMSC site to completely turn off expression of the genes. Alternatively, the weaker natural repressor sites may enable the cells to respond faster to switches in mating type and hence the cells would quickly derepress asg and be able to mate with MATα cells.
The α2 half-sites in AMSC are identical to one of the α2 half-sites used in determining the co-crystal structure (46). In the co-crystal complex, residues Ser-50, Asn-51, and Arg-54 in the α2 homeodomain make base-specific contacts in the major groove with T3, G4, and T5, and Arg-7 in the N-terminal arm of the homeodomain makes base-specific contacts in the minor groove with T8 and T9. As expected, our mutagenesis results show that mutations in T3, G4, T5, T8, and T9 dramatically reduce the level of α2-Mcm1-mediated repression in vivo. However, we also observed that substitutions at other positions, such as C1, A2, A6, and A7, in which there are no base-specific contacts in the α2 co-crystal structure, also significantly affect repression (Fig. 2B). These results suggest that specific base pairs are also required at these positions.
Recently, a ternary crystal structure of the a1 and α2 proteins bound to DNA has been solved (42). This structure was determined at a higher resolution than the previous α2 co-crystal structure, and portions of the α2 protein, most notably the N-terminal arm and the C-terminal tail extending from the homeodomain, are more ordered in the ternary complex. The α2 half-site in the ternary complex is identical to the α2 half-sites in the AMSC consensus sequence. In the ternary structure, besides base-specific contacts at positions 3, 4, 5, 8, and 9 that are present in the co-crystal structure, there are also additional base-specific contacts at positions 2, 4, 5, and 6. It is possible that in complex with Mcm1, α2 may make similar contacts to these positions, which would explain why substitutions of these base pairs have an effect on α2-Mcm1-mediated repression. For example, although there is no apparent base-specific contact to position 2 in the co-crystal structure, it has been shown in the structure of the a1-α2-DNA ternary complex that N-7 of A2 is contacted via a water-mediated hydrogen bond by Ser-50 of the α2 homeodomain (42). This position is strongly conserved among the α2-Mcm1 binding sites found upstream of asg, and of the 10 natural α2 half-sites, 8 contain an A and 2 contain a G at this position (Fig. 1). The observation that G, unlike C and T, functions almost as well as an A at this position is consistent with a model that in complex with Mcm1, α2 makes a similar base-specific contact to the N-7 group as is observed in the a1-α2-DNA ternary complex.
We have found that substitutions to T or A at positions A7, T8, and T9 have less effect on repression than substitutions to G or C (Figs. 2B and 4). It has been observed that A:T and T:A base pairs have a similar distribution of hydrogen bond donors and acceptors in the minor groove (51). Since in both crystal structures positions 8 and 9 are contacted in the minor groove by Arg-7, it is possible that this extended side chain is able to adjust to accommodate the slight alteration of the positions of the hydrogen bond acceptors when an A:T base pair is substituted for T:A at these positions. This model is supported by the observation that α2 binds on its own with almost equal affinity to sites with either T:A or A:T at these positions. However, substitutions from T to A at position 8 or 9 cause more than a 5-fold reduction in the level of α2-Mcm1-mediated repression (Fig. 2B). A portion of the effects of these substitutions may be due to the slight decrease in α2 DNA binding affinity. However, substitutions at these positions also affect Mcm1 binding to the site (52), and this decrease in affinity may account for most of the decrease that we observed in α2-Mcm1-mediated repression. Although no contacts were observed at position 7 in either structure, there is also an A or T preference at this base pair. It is possible that there may be base-specific contacts at this position in the α2-Mcm1-DNA complex. Alternatively, G or C substitutions at this position may interfere with the minor groove contacts at adjacent positions.
In the a1-α2-DNA ternary complex, there are only base-specific contacts in the minor groove at position 6; therefore, we might expect that the A to T substitution at this position would not greatly affect repression and DNA binding affinity. However, we observed that the T substitution at this position reduces the level of repression over 30-fold. If the Arg-4 side chain makes similar contacts in the α2-Mcm1-DNA complex as observed in the a1-α2-DNA ternary complex, the position of the side chain may be fixed by its contacts with base pairs 4 and 5 (42). Therefore, unlike Arg-7, the Arg-4 side chain may not be able to alter its position to accommodate the small changes for making a hydrogen bond with the T substitution at position 6. In addition, the Gly-5 peptide backbone amide makes a hydrogen bond contact to the O-2 of thymine on the bottom strand at position 6. To maintain the hydrogen bond, the position of the peptide backbone would have to be slightly altered in the A6 to T substitution. The repositioning of the backbone may in turn weaken or destroy multiple base-specific or sugar-phosphate backbone contacts that are made by other side chains in the N-terminal arm and therefore significantly reduce the level of repression. Alternatively, the substitution may sterically interfere with the precise position of the arm for making contacts with DNA. It has been shown that a small hydrophobic region preceding the N-terminal arm of the α2 homeodomain is required for cooperative DNA binding and protein-protein interactions with Mcm1 (44). It is possible that the interactions between the proteins fix the position of residues in the N-terminal arm so that additional contacts could be made in the minor groove that are not observed in either crystal structure. If these additional contacts are made, then that may partially contribute to the increase in α2 DNA binding specificity that is observed in the presence of Mcm1.
In summary, the high degree of sequence conservation at positions 1, 2, and 6 among the natural sites, along with our mutational analysis at these positions, shows that they play an important role in α2 DNA recognition. Our results are consistent with a model in which α2, in combination with Mcm1, makes contacts with the DNA that are similar to the contacts observed in the a1-α2-DNA ternary complex.
We have analyzed the DNA binding specificity of α2 in complex with Mcm1 by determining the effects of mutations within the AMSC site on repression. As an alternative approach to investigate the α2 DNA binding specificity, we have performed in vitro site selection experiments for the α2 homeodomain in the presence and absence of Mcm1. In the absence of Mcm1, we obtained an α2 binding consensus site of TGT, corresponding to positions in which there are base-specific contacts by the α2 homeodomain in the major groove (46). In the presence of Mcm1, we obtained two consensus sequences, ATGTAAT and GTGTAADT (D represents A, G, or T), according to the spacing between the α2 and Mcm1 sites. These consensus sequences show extended sequence specificity compared with the consensus sequence obtained from the site selection of α2 on its own. Furthermore, most sequences obtained from the second selection have the same orientation and spacing for the α2 and Mcm1 binding sites as is found in the natural α2-Mcm1 operators (Fig. 1, Table II). Among these selected sequences, four different sequences are identical to the natural α2 half-sites shown in Fig. 1. Our results demonstrate that the in vitro DNA site selection technique can be utilized not only to identify binding sites of individual proteins but also to further screen for optimal binding sites for a protein complex.
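To make the degenerate notation in these consensus sequences concrete, the following minimal Python sketch tests whether a candidate half-site matches a consensus written with IUPAC codes. It is an illustration only and was not part of the analysis reported here; the helper names `consensus_to_regex` and `matches_consensus` are hypothetical, and only the subset of IUPAC codes needed for the two consensus sequences above is included.

```python
import re

# Subset of the IUPAC nucleotide codes; D = A, G, or T, as defined in the text.
# N is included only for convenience and is not used by the consensus sequences.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "D": "[AGT]",
    "N": "[ACGT]",
}

def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a compiled regular expression."""
    return re.compile("".join(IUPAC[base] for base in consensus.upper()))

def matches_consensus(site, consensus):
    """Return True if the whole site matches the consensus, position by position."""
    return consensus_to_regex(consensus).fullmatch(site.upper()) is not None

if __name__ == "__main__":
    # The two consensus sequences reported for selection in the presence of Mcm1.
    for consensus in ("ATGTAAT", "GTGTAADT"):
        for site in ("ATGTAAT", "GTGTAAAT", "GTGTAACT"):
            print(consensus, site, matches_consensus(site, consensus))
```

Under this sketch, GTGTAAAT satisfies GTGTAADT because D admits A, whereas GTGTAACT does not because C is excluded at that position.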
Previous studies have shown that the relative position of the α2 and Mcm1 binding sites is somewhat flexible: although large changes in spacing are not functional, operators with 5 or 6 base pairs between the sites are bound cooperatively by the proteins and function as repressor elements in vivo (35,44). The flexibility of the spacing between the α2 and Mcm1 sites is evident among the natural operators, since the STE2, STE6, MFA1, and MFA2 sequences have a 5-bp space between the α2 and Mcm1 sites in one half-site and 6-bp spacing in the other half-site (Fig. 1). The fact that sites with both 5- and 6-bp spacing were selected from the random pool in the presence of Mcm1 further shows that binding by the α2-Mcm1 complex can accommodate either spacing. In contrast, the spacing requirements between the α2 and a1 binding sites of haploid-specific operators, as well as the positions of the binding sites in other homeodomain complexes such as the Drosophila Paired homodimer and the Hox-Pbx heterodimer, are rigidly fixed (43,53,54). These results suggest that either the protein-protein or protein-DNA interactions in the α2-Mcm1-DNA complex can adjust, to some extent, to accommodate the alterations in spacing between the binding sites.
Interestingly, in comparing the consensus sequences with 5- or 6-bp spacing between the α2 and Mcm1 sites, we noticed that there is a different preference for the base pair corresponding to position 2 in the AMSC site. In sites with 5-bp spacing an A is preferred (23 of 34), while sites with 6-bp spacing predominantly have a G at this position (13 of 16). We have determined that in sites with 5-bp spacing an A at position 2 results in 2-fold higher repression than a G, while in sites with 6-bp spacing, G functions as well as A. These results suggest that sites with 6-bp spacing have relaxed sequence specificity at this position in comparison with sites with 5-bp spacing. It is possible that to make the proper contacts with Mcm1 on operators with 6-bp spacing, α2 may have to alter the contacts with position 2 of the operator. In the a1-α2-DNA ternary complex structure, this base pair is contacted by Ser-50 of the α2 homeodomain via a water-mediated hydrogen bond to N-7 (42). The preference for purines at this position in operators with either 5- or 6-base pair spacing suggests that the contact to N-7 is made in both sets of operators. However, the fact that A is preferred to G in selected sites with 5-bp spacing indicates that there may be another base-specific contact to the A:T base pair at this position. In contrast, in operators with 6-bp spacing G functions as well as A, which suggests that this contact is not made in this set of operators. In other homeodomains, residue 50 makes either a direct or water-mediated hydrogen bond with the base pair corresponding to position 2, and this residue has been shown to have an important role in determining homeodomain DNA binding specificity (9,47,53,55,56).
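The position 2 preferences quoted above can be restated as proportions of the selected sites. The short Python sketch below simply converts the two counts given in the text into fractions; the dictionary layout and the `proportion` helper are hypothetical conveniences, and no statistical test is implied.

```python
from fractions import Fraction

# Counts quoted in the text: (sites carrying the preferred base, total sites in that class).
counts = {
    "5-bp spacing, A at position 2": (23, 34),
    "6-bp spacing, G at position 2": (13, 16),
}

def proportion(hits, total):
    """Exact fraction of selected sites that carry the preferred base."""
    return Fraction(hits, total)

if __name__ == "__main__":
    for label, (hits, total) in counts.items():
        frac = proportion(hits, total)
        print(f"{label}: {hits}/{total} = {float(frac):.3f}")
```

In both spacing classes the preferred base is found in roughly two-thirds or more of the selected sites.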
In summary, the in vitro site selection results support the conclusions drawn from our mutagenesis data that, in complex with Mcm1, the DNA binding specificity of the α2 protein extends to positions in which there are no apparent base-specific contacts in the co-crystal structure. Similar changes in binding specificity in the presence of cofactors have also been observed for other homeodomain proteins. For example, the optimal binding site for a Hox protein in complex with Pbx1 appears to be slightly different from the site selected for Hox binding on its own (54). Likewise, the DNA binding specificity of Oct-1 appears to change upon interaction with Bob1 (13). The fact that the α2 binding sites selected in the presence of Mcm1 are extended and better defined than in the absence of Mcm1 could explain why consensus sites for some DNA-binding proteins identified in the absence of their cofactors may not function well in vivo. Site selection in the absence of the cofactor would therefore not be able to define the sequence requirements for binding by a protein complex, such as the orientation and spacing between the binding sites of each protein, as well as the sequence specificity from additional contacts made by the proteins.