Altering the DNA-binding Specificity of the Yeast Matα2 Homeodomain Protein

Homeodomain proteins are a highly conserved class of DNA-binding proteins that are found in virtually every eukaryotic organism. The conserved mechanism that these proteins use to bind DNA suggests that there may be at least a partial DNA recognition code for this class of proteins. To test this idea, we have investigated the sequence-specific requirements for DNA binding and repression by the yeast α2 homeodomain protein in association with its cofactors, Mcm1 and Mata1. We have determined the contribution of each residue in the α2 homeodomain that contacts the DNA in the co-crystal structures of the protein. We have also engineered mutants in the α2 homeodomain to alter the DNA-binding specificity of the protein. Although we were unable to change the specificity of α2 by making substitutions at residues 47, 54, and 55, we were able to alter the DNA-binding specificity by making substitutions at residue 50 in the homeodomain. Because other homeodomain proteins show similar changes in specificity with substitutions at residue 50, these results suggest that there is at least a partial DNA recognition code at this position.
Homeodomain (HD) proteins are a large family of transcriptional regulatory proteins that control many diverse cellular and developmental processes (1). HD proteins have been found in organisms ranging from fungi to plants and humans, and their DNA-binding domains are strongly conserved. The structures of a large number of HDs have been determined alone and in complex with DNA, and they show remarkable structural similarity to one another (2-9). The HD is usually 60 residues long and consists of three α-helices that form a tight bundle (see Fig. 1). The third helix in the bundle, frequently termed the "recognition helix," lies in the major groove of the DNA and makes the majority of the base-specific and sugar phosphate backbone contacts. In most of the HD structures, there is also a flexible region, called the N-terminal arm, that extends from the N terminus of the first helix and wraps around the DNA to make base-specific and phosphate backbone contacts in the minor groove. The similarity in the mechanism of DNA binding among HD proteins is due in part to highly conserved residues, such as Trp48, Arg53, Lys55, and Lys57, that are found in virtually all HD proteins and together make a set of sugar phosphate backbone contacts with both strands of the DNA (see Fig. 1). These contacts are likely to be important for positioning the recognition helix within the major groove. Asn51 is absolutely conserved in all HD proteins and makes virtually identical base-specific contacts with a conserved adenine in the binding site of every HD structure that has been determined.
Given the conserved nature of the HD structure and DNA contacts among the more than 15 HDs that have been solved, along with the high sequence conservation in this family of proteins, it seems likely that most other HDs will fold and bind DNA in a similar manner. However, although HD proteins use a conserved mechanism to bind DNA, different HD proteins have very different binding specificities in vivo and in vitro. These differences in specificity are due to residues in helix 3 and the N-terminal arm that are less well conserved among the different HD proteins. For example, residue 50 in the recognition helix makes a base-specific contact in the major groove but, in contrast to Asn51, is relatively diverse among the different HD proteins (10). Biochemical and genetic studies suggest that residue 50 is important for dictating the preference for the dinucleotide immediately 5′ to the ATTA core sequence (5′-NNATTA-3′) that is present in many HD binding sites (11-17). These results suggest that a partial DNA recognition code may exist for residue 50 in the HD.
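To make the 5′-NNATTA-3′ notation concrete, the sketch below scans a DNA sequence on both strands for the ATTA core and reports the dinucleotide immediately 5′ to each occurrence, the position that residue 50 is thought to help specify. It is a plain string search written for illustration only; the function names are hypothetical and nothing here scores binding.

```python
# Hypothetical helper: report the dinucleotide 5' to each ATTA core (5'-NNATTA-3').
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def flanking_dinucleotides(seq: str, core: str = "ATTA"):
    """Return (strand, core_start, dinucleotide) for each core that has two 5' flanking bases.

    Positions are 0-based and counted 5'->3' along the strand on which the core was found.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        start = s.find(core)
        while start != -1:
            if start >= 2:  # need two bases upstream of the core
                hits.append((strand, start, s[start - 2:start]))
            start = s.find(core, start + 1)
    return hits


# An Engrailed-like TAATTA site embedded in arbitrary flanking sequence.
print(flanking_dinucleotides("GGTAATTAGG"))
# -> [('+', 4, 'TA'), ('-', 4, 'TA')]
```

Because this example site is palindromic, the same TA dinucleotide is found on both strands; an asymmetric site would report different flanks on each strand.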
Although many HD proteins bind specifically to DNA in vitro, they often interact with cofactors to bind with higher affinity and specificity to their sites in vivo. Interactions with different cofactors may affect the specificity of binding by the HD protein such that its in vivo target specificity differs from its apparent binding specificity in vitro. It is therefore important to examine the specificity of these proteins in vivo as well as in vitro. We chose to examine the binding specificity of the Matα2 HD protein from the yeast Saccharomyces cerevisiae because its interactions with its cofactors, the Mata1 and Mcm1 proteins, and its binding to its target sites have been well studied in vitro and in vivo. In the α cell type, α2 interacts with Mcm1, a member of the strongly conserved MADS box family of DNA-binding proteins (18). The α2 and Mcm1 proteins bind cooperatively as a heterotetrameric complex to conserved sites upstream of a-specific genes to repress transcription (19, 20). Mcm1 binds as a dimer to the center of a partially symmetric site and is flanked on either side by monomer binding sites for the α2 HD protein. Although α2 and Mcm1 can bind to these sites on their own in vitro, both proteins are required for repression in vivo. In diploid a/α cells, α2 binds with a1, another HD protein, to repress transcription of haploid-specific genes. The α2 and a1 proteins form a heterodimeric complex, with each HD binding to one half-site (21). The structures of the α2 HD bound to DNA on its own and in combination with a1 or Mcm1 have been determined (3, 4, 22, 23). The DNA sequence requirements for recognition by these complexes have also been determined in vivo and in vitro (24-28). These structural, genetic, and biochemical data provide excellent models for how the α2 protein recognizes its site alone and in combination with its cofactors (see Fig. 1).
In this study, we investigated whether the sequence recognition code that has been determined for other HD proteins (11-17) also applies to α2 in complex with its cofactors in vitro and in vivo.

MATERIALS AND METHODS
Plasmids-Transcription reporter plasmids with wild-type or mutant α2-Mcm1 or a1-α2 binding sites were previously constructed by inserting double-stranded oligonucleotides containing these sites with TCGA overhangs into the XhoI site between the UAS and TATA elements of the CYC1-lacZ promoter of pTBA23 (27-29). The wild-type α2-Mcm1 site used in these experiments is a symmetric site derived from a consensus of the wild-type sites found upstream of a-specific genes (27). Mutant derivatives of this site contain symmetric base pair substitutions in each of the α2 binding sites. Each mutant site is named by the original nucleotide, the position mutated, and the substituted nucleotide. For instance, T3A is a symmetric mutant in which T at position 3 is mutated to A, and A at the symmetric position 28 is mutated to T. The a1-α2 site used in these studies is derived from a consensus of the wild-type binding sites found upstream of haploid-specific genes (28). Mutant derivatives of the a1-α2 site contain base pair substitutions only in the α2 half-site, at the same relative positions as the mutations in the α2-Mcm1 site.
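The naming rule for the symmetric α2-Mcm1 mutants can be expressed as a small parser. This is a sketch under stated assumptions: the mirror-position rule (mirror = 31 - position) is inferred only from the single example given (position 3 pairs with position 28), the mirrored change is assumed to be the Watson-Crick complement of the named change, and the 30-nt test sequence is a hypothetical placeholder rather than the actual consensus site.

```python
import re

# Assumption: inferred from the T3A example, in which position 3 pairs with position 28.
MIRROR_SUM = 31
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
NAME_PATTERN = re.compile(r"^([ACGT])(\d+)([ACGT])$")


def symmetric_substitutions(name: str):
    """Expand a name such as 'T3A' into both substitutions of the symmetric mutant.

    Returns [(position, original_base, new_base), ...] with 1-based positions.
    """
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized mutant name: {name!r}")
    original, position, new = match.group(1), int(match.group(2)), match.group(3)
    mirror = MIRROR_SUM - position
    return [
        (position, original, new),
        (mirror, COMPLEMENT[original], COMPLEMENT[new]),  # complementary change on the mirror half
    ]


def apply_substitutions(site: str, substitutions):
    """Apply 1-based substitutions, checking that each original base matches the site."""
    bases = list(site)
    for position, original, new in substitutions:
        if bases[position - 1] != original:
            raise ValueError(f"expected {original} at position {position}, found {bases[position - 1]}")
        bases[position - 1] = new
    return "".join(bases)


# Hypothetical 30-nt placeholder with T at position 3 and A at position 28.
placeholder = "GCT" + "N" * 24 + "AGC"
subs = symmetric_substitutions("T3A")
print(subs)  # [(3, 'T', 'A'), (28, 'A', 'T')]
mutant_site = apply_substitutions(placeholder, subs)  # 'GCA' + 'N' * 24 + 'TGC'
```

For a1-α2 sites, where only the α2 half-site is changed, one would apply just the first substitution of the pair.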
Derivatives of plasmid pAV115, a yeast CEN LEU2 plasmid containing a 4.3-kilobase MATα locus with wild-type α2 or the S50I, S50Q, or N47I mutant, have been described (30). For comparison of the α2 HD with other HD structures, we have used the numbering system most commonly used for the 60-residue HD. Ser50 in the α2 HD therefore corresponds to Ser181 in the full-length α2 protein. Other mutants in the α2 HD were constructed by replacing fragments of pJM130 with double-stranded oligonucleotides containing the desired codon substitution (29). pJM130 is a derivative of pAV115 that contains a MATα2 gene with silent, unique restriction sites engineered within the region coding for the HD. The constructs were screened by restriction digestion and verified by sequence analysis. Each mutant is in the context of the full-length MATα2 gene and is expressed from the endogenous MATα2 promoter on a low-copy CEN plasmid.
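The offset between the two numbering systems follows from the single correspondence given above (HD Ser50 = full-length Ser181), so conversion is a constant shift of 131. A minimal sketch, assuming that offset holds across the whole HD with no insertions relative to the canonical 60-residue numbering:

```python
# Offset implied by HD Ser50 <-> full-length alpha2 Ser181 (assumed constant across the HD).
HD_OFFSET = 181 - 50  # 131


def hd_to_full_length(hd_position: int) -> int:
    """Convert canonical 60-residue HD numbering to full-length alpha2 numbering."""
    return hd_position + HD_OFFSET


def full_length_to_hd(full_position: int) -> int:
    """Convert full-length alpha2 numbering to canonical HD numbering."""
    return full_position - HD_OFFSET


print(hd_to_full_length(50))                         # 181 (Ser50 -> Ser181)
print(hd_to_full_length(1), hd_to_full_length(60))   # 132 191: HD span under this assumption
```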
The α2 mutants were expressed from derivatives of the bacterial expression vectors pJM163 (29) and pYJ195 (28), which contain an N-terminal 6-His-tagged MATα2 gene coding for the full-length protein or a C-terminal fragment of residues 123-210, respectively. Derivatives of pJM163 containing mutations in α2 were constructed by replacing the 490-base pair BglII-NheI fragment of pJM164 with the 570-base pair BglII-NheI fragments from the mutants in pJM130 (29). The mutant α2 C-terminal fragment expression vectors were constructed by cloning the BamHI-NheI fragments containing the α2 mutations from pJM130 into pYJ195 (28). All constructs were screened by restriction enzyme digestion and confirmed by sequence analysis.
β-Galactosidase Assays-The haploid AJ83 (MATa ura3 his3 leu2 trp1) and AJ126 (matΔ ura3 his3 leu2 trp1) yeast strains used in the transcription reporter experiments were described previously (30). Derivatives of the CEN LEU2 α2 expression plasmids containing the wild-type or mutant MATα2 genes were co-transformed with 2µ URA3 CYC1-lacZ reporter plasmids containing the appropriate α2-Mcm1 or a1-α2 binding sites. Cells were grown to mid-log phase, and β-galactosidase assays were performed as described (19). For each mutant binding site, β-galactosidase activities were measured from at least three independent transformants, and the values were averaged.
Electrophoretic Mobility Shift Assays (EMSAs)-The relative DNA-binding affinities of the different α2 proteins for the wild-type and mutant α2-Mcm1 and a1-α2 sites were determined by EMSAs. The α2 proteins used in the DNA binding assays for the α2-Mcm1 site are full-length proteins with 6 His residues fused to the N terminus. The α2 proteins used in the a1-α2 DNA binding assays are C-terminal fragments containing residues 123-210 with 6 His residues fused to the N terminus (28). The a1 protein used in these experiments is the full-length protein with 6 His residues fused to the C terminus and was expressed and purified from strain BL21(DE3) with plasmid pYJ173 (28). The α2 and a1 proteins were purified on nickel resin columns to >90% homogeneity according to the manufacturer's protocol (Novagen). The concentration of each protein was determined by Bradford assays (Bio-Rad) and then normalized and verified by Coomassie Blue staining of SDS-polyacrylamide gels.
DNA probes used in the EMSAs were synthesized by the polymerase chain reaction using 32P-end-labeled primers (31). Assays were performed in 20 mM Tris (pH 8.0), 1 mM EDTA, 5 mM MgCl2, 10 mg/ml bovine serum albumin (Fraction V), 5% glycerol, 0.1% Nonidet P-40, and 10 mg/ml sheared salmon sperm DNA. Proteins were diluted with buffer containing 50 mM Tris (pH 8.0), 1 mM EDTA, 500 mM NaCl, 10 mM 2-mercaptoethanol, and 10 mg/ml bovine serum albumin. For α2-Mcm1 assays, 3 µl of α2 was added to 27 µl of end-labeled fragment diluted in the assay buffer and incubated for 3 h at room temperature. For a1-α2 binding assays, 5 µl of α2-(123-210) and 5 µl of a1 were added to 40 µl of end-labeled fragment diluted in assay buffer and incubated for 1 h at room temperature. In the no-protein controls, 10 µl of protein dilution buffer was added instead of the α2 or a1 protein. A 20-µl portion of each reaction was loaded on a native 0.5× Tris borate/EDTA, 6% polyacrylamide gel and electrophoresed for 1.5 h at 200 V. Gels were dried and exposed to phosphor screens, and images were scanned on a Molecular Dynamics PhosphorImager. The relative α2 DNA-binding affinity for each site was calculated by comparing the percentage of fragment bound at different α2 protein concentrations.
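The final step above, comparing the fraction of probe bound across a protein titration, can be illustrated with a short sketch. It assumes bound and free band intensities have already been quantified from the PhosphorImager scan, takes the protein concentration giving half-maximal binding (interpolated on a log-concentration scale) as an apparent affinity, and reports affinity relative to wild type. The function names and the titration values in the example are hypothetical, not data from this study, and fitting a full binding isotherm would be an equally reasonable choice.

```python
import math


def fraction_bound(bound_counts: float, free_counts: float) -> float:
    """Fraction of labeled probe shifted into protein-DNA complexes."""
    return bound_counts / (bound_counts + free_counts)


def half_max_concentration(concentrations, fractions) -> float:
    """Protein concentration at 50% probe bound, interpolated on a log-concentration axis.

    Assumes binding rises with concentration and that the titration brackets 50% bound.
    """
    points = sorted(zip(concentrations, fractions))
    for (c1, f1), (c2, f2) in zip(points, points[1:]):
        if f1 <= 0.5 <= f2 and f2 > f1:
            t = (0.5 - f1) / (f2 - f1)
            return math.exp(math.log(c1) + t * (math.log(c2) - math.log(c1)))
    raise ValueError("50% binding is not bracketed by this titration")


def relative_affinity(half_max_wild_type: float, half_max_mutant: float) -> float:
    """Affinity relative to wild type (1.0 = wild type; below 1.0 = weaker binding)."""
    return half_max_wild_type / half_max_mutant


# Hypothetical titration (nM protein) and fraction-bound values, for illustration only.
concs = [1, 2, 4, 8, 16]
wild_type = half_max_concentration(concs, [0.10, 0.30, 0.50, 0.70, 0.90])
mutant = half_max_concentration(concs, [0.05, 0.10, 0.20, 0.40, 0.60])
print(f"{wild_type:.2f} {mutant:.2f} {relative_affinity(wild_type, mutant):.3f}")
# -> 4.00 11.31 0.354
```

Interpolating on log concentration suits a dilution series that doubles at each step, as in this hypothetical example.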

RESULTS
Residues in the α2 HD That Contact the DNA Are Required for DNA Binding and Repression-It was previously shown that residues in the α2 HD that make base-specific contacts with the DNA have important roles in DNA binding and transcriptional repression (30). However, many of these residues are not nearly as well conserved among the different HD proteins as the residues that contact the sugar phosphate backbone of the DNA. We were interested in whether the strong conservation of the backbone-contacting residues in different HD proteins reflects an essential role in DNA binding. To test the roles of the DNA-contacting side chains, we constructed alanine substitutions of each residue in α2 that contacts DNA in the co-crystal structures (Fig. 1) (3, 22, 23). To monitor the ability of the mutants to repress transcription in complex with Mcm1, the α2 mutants were transformed into a matΔ strain bearing an integrated CYC1-lacZ reporter containing an α2-Mcm1 binding site in the promoter. The ability of the α2 mutants to repress lacZ expression was measured by liquid β-galactosidase assays (Table I). The majority of the α2 alanine substitution mutants displayed significantly lower levels of repression in complex with Mcm1. Even substitutions of residues that contact DNA only indirectly through water-mediated contacts, such as L26A and N47A, had a large effect on repression in complex with Mcm1, suggesting that these residues make important contributions to DNA binding. Substitutions of residues in the N-terminal arm, such as Y3A, R4A, and G5A, appeared to have slightly less effect on repression than substitutions in the recognition helix. This suggests that the minor groove contacts made by these residues may not be as important for DNA binding as the major groove contacts made by residues in the recognition helix.
The level of repression by the α2 mutants in complex with a1 was monitored in a MATa strain with an integrated CYC1-lacZ reporter containing an a1-α2 binding site in the promoter (Table I). As we have previously observed, many of the substitutions of residues that contact the DNA have little or no effect on repression in complex with a1 (28, 30). The only substitutions that showed significant reductions in a1-α2-mediated repression were F8A and W48A. In addition to contacting the DNA, these side chains make numerous contacts with other residues in the hydrophobic core of the HD; therefore, alanine substitutions of these residues could affect the folding or stability of the protein (3, 22, 23). However, Western blot analysis of these mutant strains showed that the mutant proteins were present at the same level as the wild-type protein (data not shown), suggesting that these substitutions affect folding rather than protein stability. The observation that the other alanine substitutions in α2 do not affect repression in complex with a1 suggests that these mutations do not affect expression of the protein or its ability to repress transcription. Therefore, these substitutions most likely affect DNA binding in complex with Mcm1.
We have tested a number of the mutants for their ability to bind DNA on their own or in complex with Mcm1 or a1 in vitro by EMSAs. As expected from the in vivo results, mutants that show large decreases in the ability to repress transcription also show large decreases in their ability to bind DNA on their own or in complex with Mcm1 (28, 30) (data not shown). Mutants that had small decreases in the level of repression, such as S50A and R132A, showed only moderate reductions in DNA-binding affinity. The in vitro DNA binding data therefore agree well with the in vivo repression results and support the model that the mutants fail to repress transcription in complex with Mcm1 because they no longer bind DNA with wild-type affinity. These results show that most of the individual side chain contacts with the DNA observed in the α2 co-crystal structures are important for the DNA-binding activity of the protein.
Residue 50 in the α2 HD Has Relaxed Sequence Requirements-Among the alanine substitutions in α2, the S50A mutant showed one of the smallest decreases in repression with Mcm1 (Table I). However, in many other HD proteins, residue 50 is an important determinant of DNA-binding specificity (11-17). We were therefore interested in whether other amino acid substitutions of this residue would affect the DNA-binding affinity or specificity of the protein. Ser50 was substituted with amino acids that are often found at this position in other HD proteins (10), and the mutants were assayed for the ability to repress transcription of a promoter containing a wild-type α2-Mcm1 site (Table II). … requirements for a specific amino acid at residue 50 in α2. In contrast, mutants with larger or charged side chains, such as S50I, S50H, S50Q, S50R, S50K, and S50E, showed a complete loss of repression activity, presumably because these side chains sterically interfere with DNA binding by the protein. This result shows that there are specific amino acid requirements for this residue in α2. Interestingly, although Asn at residue 50 has not been found in any other HD protein (10), this substitution in α2 functioned almost as well as the wild-type protein.

Table I, footnote a: Values shown are the percent repression by each mutant, relative to the wild-type protein, of a CYC1-lacZ reporter promoter containing a wild-type α2-Mcm1 or a1-α2 binding site. For each sample, three independent transformants were assayed. For α2-Mcm1, wild-type α2 yielded 6.7 ± 0.5 β-galactosidase units, whereas a blank plasmid (no α2) yielded 154 ± 9.7 units. This gave a repression ratio of 23-fold, which was set at 100% repression. Fold repression values were determined for each sample in the same manner, with β-galactosidase units within 10% error. The wild-type repression ratio was then divided by the ratios determined for the mutant proteins to give percent repression values. Similar calculations were done for a1-α2, with wild-type α2 …

Fig. 1 (legend, partial): … (4, 23). The DNA sequence shown is one α2 half-site used in the structure of the a1-α2-DNA ternary complex and is identical to the wild-type consensus α2 binding sites used in this study. Arrows represent base-specific contacts, and lines with circles represent sugar phosphate backbone contacts observed in the co-crystal structures. Water-mediated contacts are indicated by a circled W. For comparison between the different HD proteins, residues are numbered by their position in the HD rather than in the full-length protein. B and C, models of the α2 HD bound to DNA derived from the crystal structure of the a1-α2-DNA ternary complex (4). B, base-specific contacts in the major groove (Ser50, Asn51, and Arg54) and minor groove (Arg4, Gly5, and Arg6) of the DNA are highlighted in black. C, a model showing the residues involved in phosphate backbone contacts with the DNA.
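The arithmetic in the Table I footnote can be checked with a short sketch. Fold repression is the β-galactosidase activity without α2 divided by the activity with α2 (154 / 6.7 ≈ 23-fold for wild type). The normalization used below (mutant fold repression divided by wild-type fold repression, times 100, so that wild type scores 100% and weaker repressors score lower) is an assumption about how the percent values are scaled, and the mutant activity in the example is hypothetical.

```python
def fold_repression(no_alpha2_units: float, with_alpha2_units: float) -> float:
    """Fold repression = reporter activity without alpha2 / activity with alpha2."""
    return no_alpha2_units / with_alpha2_units


def percent_repression(mutant_fold: float, wild_type_fold: float) -> float:
    """Assumed normalization: wild type = 100%, weaker repressors fall below 100%."""
    return 100.0 * mutant_fold / wild_type_fold


# Wild-type values from the Table I footnote (alpha2-Mcm1 reporter).
wt_fold = fold_repression(154.0, 6.7)
print(round(wt_fold))  # 23

# Hypothetical mutant giving 30.8 units against the same blank: about 5-fold repression.
mutant_fold = fold_repression(154.0, 30.8)
print(round(percent_repression(mutant_fold, wt_fold)))  # 22
```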
We next tested whether the decreases in the levels of repression by the Ser50 substitutions result from decreases in the DNA-binding affinity of the mutant proteins. Each of the α2 Ser50 mutants was expressed and purified from Escherichia coli, and its DNA-binding affinity was assayed by EMSAs. In general, the effects of the mutations on DNA-binding affinity in vitro correlated well with repression activity in vivo. For example, the S50N and S50A mutants repressed the lacZ reporter almost as well as the wild-type α2 protein, and these proteins bound to the wild-type site with affinity similar to that of the wild-type protein (Fig. 2A). Mutants that showed lower levels of repression, such as S50C, showed further reductions in binding affinity. Finally, mutants that completely failed to repress transcription, such as S50K, S50E, and S50R, showed a >50-fold decrease in DNA-binding affinity for the wild-type site. These results show that, provided the side chain is small, there are no stringent requirements at residue 50 in the α2 HD for binding to or repression of the wild-type site. However, there is discrimination against larger side chains at this position, presumably because of steric interference with the DNA.

Substitutions of Ser50 Have Different Sequence Preferences for Positions 2 and 3 in the α2-Mcm1 Binding Site-In the a1-α2 structure, the Ser50 side chain makes two water-mediated hydrogen bonds with the A2 and T3 base pairs in the α2 recognition sequence (5′-CATGTAA-3′) (4). These positions are strongly conserved among the natural α2-Mcm1 sites, and some of the base pair substitutions at these positions cause a >30-fold decrease in repression (27). To examine whether amino acid substitutions at residue 50 affect the sequence-specific preferences at positions 2 and 3 in complex with Mcm1, the α2 mutants were assayed for repression of mutant α2-Mcm1 sites containing base pair substitutions at these positions (Table II).
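The panel of single base pair substitutions at positions 2 and 3 of the α2 recognition sequence, and the names used for them in the text (e.g. A2G, T3G), can be generated mechanically. The sketch below works on the 7-nt half-site 5′-CATGTAA-3′ given above, numbering positions from 1 at the 5′ C; the helper names are hypothetical, and it only enumerates sequences without modeling the symmetric partner half-site in the full α2-Mcm1 site.

```python
HALF_SITE = "CATGTAA"  # alpha2 recognition sequence, positions numbered 1-7 from the 5' C
BASES = "ACGT"


def mutate(site: str, position: int, new_base: str):
    """Return (name, sequence) for a single substitution at a 1-based position."""
    original = site[position - 1]
    if new_base == original:
        raise ValueError("substitution must change the base")
    name = f"{original}{position}{new_base}"
    return name, site[: position - 1] + new_base + site[position:]


def substitution_panel(site: str, positions):
    """All single-base substitutions at the given positions, in A/C/G/T order."""
    return [
        mutate(site, pos, base)
        for pos in positions
        for base in BASES
        if base != site[pos - 1]
    ]


for name, seq in substitution_panel(HALF_SITE, [2, 3]):
    print(name, seq)
```

This prints the six sites A2C (CCTGTAA), A2G (CGTGTAA), A2T (CTTGTAA), T3A (CAAGTAA), T3C (CACGTAA), and T3G (CAGGTAA). Applying `mutate` twice, first A2G and then T3G, gives the CGGGTAA double mutant tested with S50K.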
The S50N mutant exhibited almost the same sequence preferences at positions 2 and 3 as the wild-type α2 protein. Among the mutant sites, S50N showed a preference for G at the second position and a slight preference for C over either G or A at the third position. Mutants with small amino acid side chain substitutions, such as S50A, S50G, and S50T, showed further decreases in repression with the mutant sites, indicating that despite somewhat relaxed requirements for a specific amino acid at this position for binding to the wild-type site, there is still sequence specificity for the base pairs contacted by this residue. Interestingly, there were slight differences in the sequence preferences at position 2 among the different amino acid substitutions. For example, the S50A mutant had a strong preference against T at the second position in the site, whereas the S50G mutant appeared to slightly prefer T over G or C at this position. The S50C mutant and the wild-type protein showed a slight difference in sequence specificity. These results indicate that each of these amino acid side chains has its own sequence-specific requirements and is able to at least partially discriminate among the mutant sites.
The α2 S50K and S50R Mutants Show Altered DNA-binding Specificity-Most of the residue 50 mutants with large or charged amino acid substitutions, such as S50I, S50H, S50Q, and S50E, showed very little repression activity with the wild-type or mutant binding sites (Table II). In contrast, the S50K mutant repressed promoters containing the A2T and T3G sites slightly better (~2.5-fold each) than it repressed the wild-type site, and it repressed a promoter containing the A2G site significantly better (10-fold). We further tested this mutant protein with the CGGGTAA site but found that the double GG mutant did not give a higher level of repression than the single mutation at position 2 in the β-galactosidase assay (data not shown). This result suggests that the effects of G substitutions at positions 2 and 3 on S50K binding may not be additive. The S50R mutant also repressed promoters containing sites with G or T substitutions at position 2 better (~3-fold) than a promoter containing the wild-type α2-Mcm1 site. Although repression of the mutant sites by these mutant proteins was not restored to wild-type levels, these results clearly indicate that both S50K and S50R have altered specificity for the mutant sites in complex with Mcm1 in vivo.
The S50K and S50R mutants were further tested for their DNA-binding affinity and specificity by EMSAs using different α2-Mcm1 binding sites. In general, the DNA-binding affinity of the S50K mutant protein for the different sites correlates well with the in vivo repression results. The S50K mutant bound to the A2G site with at least 5-fold higher affinity than to the wild-type site (Fig. 2B), consistent with the observation that this mutant represses promoters with the A2G site ~10-fold better than a promoter with the wild-type site. S50K showed a 2.5-fold increase in repression of a promoter with the T3G site, and we observed a similar modest increase in the DNA-binding affinity of the mutant protein for this site (data not shown). The S50R mutant also showed a significant increase in binding affinity for the A2G site over the wild-type site, which correlates well with the increase in repression by this mutant of a promoter containing this site (Fig. 2B). Taken together, the in vivo and in vitro experiments demonstrate that the S50K and S50R substitutions alter the DNA-binding specificity of the α2 HD.

Substitutions at Residue 50 in the α2 HD Do Not Affect a1-α2-mediated Repression-The results shown above indicate that amino acid substitutions of residue 50 alter the sequence specificity of α2 binding alone in vitro and in complex with Mcm1 in vivo. We have previously shown that substitutions of DNA-binding residues in the α2 HD have little or no effect on a1-α2-mediated repression or DNA-binding affinity, suggesting that these side chains do not make essential contributions to the complex (30). However, substitutions of base pairs contacted by these residues have a large effect on binding and repression, showing that there are sequence-specific requirements at these positions in the DNA site (28). Therefore, even though these residues are not essential for α2 binding in complex with a1, substitutions of the amino acids at these positions may alter the binding specificity of the complex. To investigate whether residue 50 has a role in α2 DNA recognition in complex with a1, we assayed several of the α2 mutants for their effects on binding to and repression of the wild-type and mutant a1-α2 sites. Plasmids containing α2 mutants with Ser50 replaced by Ala, Gln, Ile, Lys, or Arg were co-transformed into a MATa strain and assayed with derivatives of a CYC1-lacZ reporter plasmid containing substitutions at positions 2 and 3 in the α2 half-site of the a1-α2 site in the promoter (Fig. 3A). The S50A, S50Q, and S50I mutants had essentially wild-type levels of repression of a promoter with the consensus site, indicating that these substitutions do not greatly affect the binding affinity of the complex. The wild-type protein showed slightly reduced repression of promoters containing sites with base pair substitutions at positions 2 and 3 in the α2 half-site. Interestingly, both the S50A and S50I mutants discriminated among the different a1-α2 sites in roughly the same way as wild-type α2. These mutants were more tolerant of substitutions at position 2 than at position 3, with a base preference of A > T > C > G at position 2. However, the α2 S50Q mutant differed from the wild-type protein and preferred G over C at position 2.
In contrast to the S50A, S50I, and S50Q substitutions, the S50K mutant had significantly reduced repression with the wild-type consensus a1-α2 site. However, repression by S50K was restored to nearly wild-type levels with the A2G or T3G site (Fig. 3A). We next examined the DNA-binding affinity of the S50K mutant for the consensus and mutant a1-α2 sites (Fig. 3B). The S50K mutant bound the wild-type site with 20-fold lower affinity than wild-type α2. However, the S50K protein bound the A2G mutant site with approximately the same affinity as the wild-type protein binding to the consensus site. We conclude that the in vitro DNA-binding affinity of the S50K mutant for different a1-α2 sites corresponds to its in vivo repression activity and that this mutant has higher affinity for a1-α2 sites with G at position 2 or 3 than for the consensus site. Although the S50R mutant did not show as large a decrease in repression of the wild-type a1-α2 site as it did with the α2-Mcm1 site, it also showed changes in specificity, preferring G or T at position 2 over C. This shows that the S50K and S50R substitutions alter the DNA-binding specificity of the α2 HD in complex with both Mcm1 and a1.
Substitutions of Other Residues in the HD Are Unable to Alter α2 DNA-binding Specificity-The results shown above indicate that the DNA-binding specificity of α2 can be altered by making changes at residue 50 in the HD. We were therefore interested in whether it is possible to alter the binding specificity of α2 by making amino acid substitutions of other residues in the HD that contact DNA. We focused on residues that make base-specific contacts with the DNA and that vary among the different HD proteins (10). Arg54 in α2 makes a base-specific contact with the N-7 group of the G4 base in the co-crystal structures (Fig. 1) (3, 4, 23). However, in many other HD proteins, this residue is Ala and is not involved in a direct contact with the DNA. Instead, many of these HD proteins use Ile at position 47 in the HD to make a van der Waals contact with T on the other strand of DNA at position 4 in the site. In α2, there is an Asn side chain at position 47 that makes only an indirect contact with the DNA through a water molecule (4). We therefore tested whether the N47I and R54A amino acid substitutions in α2 would change the specificity of the protein from a G:C base pair at this position in the site to an A:T base pair. We constructed each of the single amino acid substitutions and the double mutant and assayed the ability of the mutants to repress promoters containing wild-type and G4A mutant α2-Mcm1 sites (Fig. 4A). The single and double mutants failed to repress promoters containing either the wild-type or mutant sites. These results show that the DNA-binding specificity conferred by these residues cannot be altered simply by swapping these amino acids. To ensure that the loss of repression correlates with a decrease in DNA binding, we examined the DNA-binding affinity of the mutant proteins for the wild-type and mutant sites. The N47I single mutant and the N47I/R54A double mutant were unable to bind to either the wild-type or mutant site (data not shown). The R54A mutant showed significantly reduced binding to the wild-type site but did not show any further decrease in binding to the mutant site (Fig. 4B). This result suggests that the R54A mutant has decreased affinity but relaxed specificity at position 4 in the site.
One possible explanation for the failure of the N47I/R54A double mutant to repress promoters with the G4A site is that residue 50 may play a greater role in DNA binding in HD proteins with Ile at position 47 and Ala at position 54. In the Drosophila Engrailed HD, residue 50 is Gln. We therefore made the N47I/S50Q/R54A triple amino acid substitution in α2 and assayed its ability to repress promoters containing the wild-type site and a mutant site that resembles the Engrailed binding site, TAATTA. However, this mutant was unable to repress reporters containing either the consensus α2-Mcm1 site or any of the chimeric Engrailed-Mcm1 sites (data not shown). This mutant also failed to show any binding activity for either the wild-type or mutant sites in vitro.
We also examined the role of Lys55 in DNA-binding specificity. In many HD proteins, residue 55 is a Lys that contacts the phosphate backbone. However, in the a1, Pbx1, and Extradenticle HD proteins, this residue is an Arg that makes a base-specific contact with G on the bottom strand at position 6 in the site (4, 32, 33). We therefore tested whether an α2 K55R mutant would show altered specificity for an A6C mutant site. The K55R mutant had roughly the same level of repression and DNA-binding affinity as the wild-type protein for both the wild-type and A6C mutant sites. This result suggests that the Arg side chain contacts the phosphate backbone rather than making a base-specific contact with position 6. Taken together with the results for the mutants at residues 47 and 54, these results suggest that, with the exception of residue 50, the specificity of DNA binding conferred by residues in the α2 HD cannot simply be switched by swapping the amino acids in the recognition helix.

DISCUSSION
The mechanism of DNA binding by HD proteins has been studied extensively in vitro to understand how they recognize specific DNA sequences. However, because many HD proteins interact with cofactors that may alter the affinity and specificity of the HD, it is also important to determine whether these proteins bind through similar mechanisms in vivo. We have examined DNA binding by the yeast α2 protein, a distant member of the HD family, to determine whether it binds DNA by a mechanism similar to that of other HD proteins. Many of the contacts that α2 makes with the DNA in the cocrystal structures are similar to those observed with other HD proteins (3,4,23). Our work shows that mutations in most of the residues contacting the DNA cause significant decreases in both repression with Mcm1 and DNA-binding affinity. We have shown that many of the side chains that contact the phosphate backbone are essential for DNA binding and repression, indicating that they play a critical role both in positioning the HD on the DNA and in providing binding energy.
Despite making direct contacts with the DNA in both the α2-Mcm1 and a1-α2 crystal structures, many of the residues in the N-terminal arm play only a weak role in repression with Mcm1. Biochemical, genetic, and structural studies have shown that residues in a flexible linker region adjacent to the N-terminal arm interact directly with Mcm1 (23,29,34). In addition, two residues in the N-terminal arm, Tyr 3 and His 6, contact Mcm1 directly. The protein-protein interactions made by these residues may substitute for strong DNA contacts by the N-terminal arm. Another explanation for the relatively weak effect of mutations in the N-terminal arm comes from the observation that the natural α2-Mcm1 sites vary considerably in the bases between the α2 and Mcm1 recognition sites (27). To bind cooperatively with Mcm1, the α2 N-terminal arm must accommodate differences both in the spacing between the sites and in the base-pair sequence. DNA contacts by the N-terminal arm may therefore need to be relatively weak, so that losing a contact to accommodate a site with alternate spacing does not significantly reduce DNA-binding affinity. Although the N-terminal residues do not contribute much to the binding affinity of the complex, they may still play an important role in determining its DNA-binding specificity. In the context of the consensus α2-Mcm1 site used in these studies, mutations at the base pairs contacted by these residues cause large decreases in DNA-binding affinity and repression, indicating that there are important base-specific contacts at these positions (27).
As observed previously, although many of the side chains in the α2 HD are important for repression in complex with Mcm1, substitutions of these residues have little effect on repression in combination with a1 (28,30). However, several of the alanine substitutions, such as W48A and F8A, significantly decrease a1-α2-mediated repression of the reporter promoter. In addition to directly contacting the DNA, both of these side chains are largely buried in the hydrophobic core of the HD. Because Western blot analysis showed that these mutants are expressed at roughly the same level as the wild-type protein, the loss of repression is unlikely to reflect reduced protein levels; more likely, the alanine substitutions alter the HD structure. Such structural changes would disrupt multiple DNA contacts, decreasing repression with a1.
As observed with the Engrailed HD, Ala substitution of Ser 50 in α2 causes only a slight decrease in DNA-binding affinity and repression (15). We have also found that substitutions with other small side chains, such as Cys, Thr, and Asn, have relatively minor effects on repression or DNA binding. The amino acid requirements at residue 50 therefore appear to be somewhat degenerate, tolerating substitutions with small amino acids. This degeneracy may arise in part because these contacts lie at the edge of the binding site, where the side chain can shift its conformation slightly to make more optimal contacts with the DNA. In support of this model, solution studies of the Antennapedia HD and crystallographic studies of the Even-skipped HD have shown that the Gln 50 side chain adopts multiple conformations when bound to DNA (8,35). These studies, along with other HD crystal structures, also show that water molecules often contact both residue 50 and the DNA, forming water-mediated protein-DNA contacts. In mutants with small side chains at residue 50, these water molecules may reposition to optimize contacts between the protein and DNA, as observed in the cocrystal structure of the Engrailed Q50A mutant (36). The smaller side chain in this mutant allows three extra water molecules to form a cage-like structure around the residue, and this water network mediates contacts between the protein and DNA. Many of the α2 Ser 50 substitutions with small side chains likely have additional or repositioned water molecules near the side chain. The specific positioning of these water molecules by each side chain may explain the subtle differences we observed in the base-pair specificity of some of the mutant proteins with small side chains.
In contrast to the smaller side chains, many of the larger side chains cannot substitute for Ser 50 in α2. These substitutions may sterically interfere with the protein-DNA interface, or the larger side chain may exclude water molecules that would otherwise mediate a contact between the protein and DNA. Because many HD proteins carry these larger side chains at residue 50, these proteins may dock on the DNA in a slightly different manner from α2 to accommodate them.
FIG. 4. Effects of substitutions at residues 47 and 54 in the α2 HD. A, repression assays were performed in a matΔ (AJ126) or MATa (AJ83) strain cotransformed with α2 mutants and the reporter promoters on separate plasmids. Values are percent repression relative to repression by the wild-type protein of the reporter promoter containing the wild-type site, calculated as described in the legend to Fig. 3. B, electrophoretic mobility shift of the wild-type (WT) and α2 R54A proteins bound to wild-type and mutant α2-Mcm1 sites. α2 concentrations ranged in 2-fold dilutions from 8 × 10⁻⁷ to 1.2 × 10⁻⁸ M.
Although our results suggest that the amino acid requirements for residue 50 in α2 are partially degenerate, genetic and biochemical studies have shown that residue 50 plays an important role in determining the sequence specificity of many HD proteins, enabling them to distinguish between different NNATTA sites (11-13, 15-17, 37). The most striking altered-specificity mutations usually involve substituting Lys at residue 50, as in the Drosophila Engrailed Q50K, Paired S50K, and Fushi tarazu Q50K mutants. In each case, the Lys-substituted protein preferentially binds the sequence GGATTA. We observed a similar change in DNA-binding specificity with the Lys substitution at residue 50 in the α2 HD. The α2 S50K mutant clearly prefers sites with a G base at positions adjacent to the recognition core, such as GTGTAA, AGGTAA, and GGGTAA, in both α2-Mcm1- and a1-α2-mediated repression, whereas the wild-type α2 protein prefers the ATGTAA site. In the crystal structure of the Engrailed Q50K mutant bound to DNA, the Lys 50 side chain makes two hydrogen bonds with the O-6 and N-7 groups of the first G of the GGATTA binding site and one hydrogen bond with O-6 of the second G (38).
These hydrogen bonds are likely to contribute much more binding energy than the van der Waals interactions between wild-type Gln 50 and the methyl group of the T base at the first position of the TAATTA site (2). In the case of α2, although we do not know whether the Lys side chain at residue 50 makes similar hydrogen bonds with the G base, it clearly makes a more favorable contact with G than with other bases at this position in the site. The altered preference of the α2 S50R mutant for sites with G at position 2 may likewise result from the Arg side chain making a single hydrogen bond with that G base. However, unlike the S50K mutant, the S50R mutant is unable to recognize a site with G at position 3. Both S50K and S50R show some preference for T at the second position in the site over the wild-type site. These side chains may make van der Waals contacts with the C-5 methyl group of a T base at this position.
Since the DNA-binding specificity of HD proteins can be altered by changing the amino acid at position 50, we tested whether it could also be altered by changing other residues that contact the DNA. We substituted Asn 47 and Arg 54 in the recognition helix of α2 with the amino acids found at the same positions in the Engrailed HD. However, these mutants are unable to bind DNA or to repress promoters containing either the mutant or the wild-type site. One explanation for this result is that, in addition to contacting the DNA, the Arg 54 side chain also hydrogen bonds with Asn 51. This contact may help position Asn 51, which makes a highly conserved contact with an adenine base in the site that is essential for DNA binding. A second explanation is that the Arg 4 side chain in the N-terminal arm of the α2 HD makes base-specific and phosphate backbone contacts in the minor groove at position 4 in the site (4). However, because the Arg 4 side chain makes only water-mediated contacts at this position, and an Ala substitution of this residue has only a weak effect on repression or DNA binding (Table I), we reasoned that this contact is unlikely to be very important for binding to the wild-type site. On the other hand, the G:C-to-A:T base pair substitution may disrupt the contacts made by Arg 4 and other residues in the N-terminal arm, which, together with weakened contacts in the major groove, may prevent the mutants from binding to the mutant site. Thus, we have been unable to alter the DNA-binding specificity of α2 by changing residues other than Ser 50 in the recognition helix. However, we have shown that changes at residue 50 alter the DNA-binding specificity of α2 in a manner similar to other HD proteins. These results suggest that there is at least a partial DNA recognition code for residue 50 in HD proteins.