Investigation of the Pyrimidine Preference by the c-Myb DNA-binding Domain at the Initial Base of the Consensus Sequence*

The principal determinant of the pyrimidine preference by the c-Myb DNA-binding domain at the initial base of the consensus sequence was investigated by mutating both the protein and the DNA base pair and analyzing binding with a filter binding assay. Amino acid residue 187 was shown to interact with the base at the pyrimidine position, as predicted from our previous complex structure. Unexpectedly, the pyrimidine preference is retained even in the Gly187 mutant, so the base specificity cannot originate principally in the direct-readout mechanism. Instead, it arises from an indirect-readout mechanism, namely the intrinsic “bendability” of the pyrimidine-purine step of the DNA duplex. A significant but rather small positive base pair roll is detectable in the conformation of DNA in complex with the c-Myb DNA-binding domain. Following the conventional chemical rules of the direct-readout mechanism, amino acid mutagenesis at position 187 yielded several new base preferences for the protein.

Specific interactions between proteins and DNA are critical to gene expression and regulation, so a general readout mechanism of the information encoded in DNA has been sought (1-3). However, a number of complex structures of DNA duplexes and proteins determined at atomic resolution have revealed that nature uses a great variety of readout mechanisms (4-11).
In most complex structures, the direct-readout mechanism is mediated by intermolecular hydrogen bond networks and hydrophobic interactions between DNA duplexes and proteins. The interaction modes have been classified into: (i) the intrinsic chemical features of bases and amino acids (1,3,12), and (ii) the stereochemical relations between the amino acids and the bases inside the DNA major grooves (12,13). In contrast, the indirect-readout mechanism works in several systems, such as the trp repressor/operator (4,5), where the DNA bases are specifically recognized by proteins without the use of particular hydrogen bonds or non-polar contacts. Instead, each sequence-dependent deformation of the DNA conformation stabilizes the characteristic geometry of the phosphate backbone, which directly interacts with the protein through polar contacts (6). Water molecules are often observed to mediate the specific interaction through additional hydrogen bonds (4,5). One common type of DNA deformation is a steep kink of the duplex (7-9), which substantially contributes to readout of the minor groove (8,9).
In general, a combination of the direct- and indirect-readout mechanisms results in specific base pair recognition. In other words, both the specific binding affinity and the DNA bending contribute to the free energy of complex formation (6,14). This situation has made it difficult to determine how each consensus base sequence is recognized by the corresponding protein, even when the precise complex structure is known.
The c-myb gene product (c-Myb) is a transcriptional activator that specifically binds to DNA fragments containing the consensus sequence PyAAC(G/T)G, where Py indicates a pyrimidine (15-17). The DNA-binding domain (DBD) of c-Myb consists of three imperfect 51- or 52-residue repeats (designated R1, R2, and R3 from the N terminus) (18-20). The last two repeats, R2 and R3, are sufficient for the recognition of the specific DNA sequences (20,21). NMR analysis revealed that both R2 and R3 contain three helices, and the third helix in each is a recognition helix (22-24). R2 and R3 are closely packed in the major groove, so that the two recognition helices directly contact each other to bind cooperatively to the specific base sequence.
In the complex of c-Myb R2R3 with the Myb-binding DNA sequence (MBS-I), the consensus A4, the counterpart guanine of C6, and the last G8 directly interact with Asn183 in R3, Lys182 in R3, and Lys128 in R2, respectively (Fig. 1) (23). The strong cooperativity between R2 and R3 originates from the putative polar interactions between the side chains of Glu132 and Asn179, and between those of Arg131 and Asp178. However, it is not clear why the initial Py, corresponding to the third base position in the MBS-I fragment, is preferred by c-Myb R2R3, although this Py3 is less specific than the other A4, A5, C6, and G8 sites in the consensus DNA sequence (17). In our NMR structure shown in Fig. 1, Ser187 is the only candidate that interacts with the T3 base, and this ability was suggested in our previous paper (23). The hydroxyl group in the Ser side chain could form a hydrogen bond with the O4 oxygen of the T3 base, either directly or through water molecules.
Thus far, the Myb-homologous DBD has been found in over 30 proteins from many species. An alignment of the DBDs shows that the Ser at position 187 is highly conserved in the animal sequences, whereas it is variable in the plant sequences (23,25).
Here, to investigate the role of Ser187 and the origin of this pyrimidine preference at the third base position, both Ser187 in the c-Myb R2R3 and the third T-A base pair in the 22-mer MBS-I fragment containing the Myb-binding site were substituted by other amino acids and other base pairs, respectively. The interactions between them were examined using a filter binding assay, whose efficiency has already been shown (17,26,27). The recognition mechanism will be discussed.

* The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
‡ To whom all correspondence should be addressed. Tel.: 81-6-872-8212; Fax: 81-6-872-8219.

EXPERIMENTAL PROCEDURES
Plasmids and Site-directed Mutagenesis - A DNA fragment encompassing R2R3 (Leu90-Val193) in the DNA-binding domain of c-Myb was amplified by polymerase chain reaction, using pact-c-myb (28) as the template and two synthetic primers, to generate an NcoI site and a BamHI site at the 5′- and the 3′-end of the amplified fragment, respectively. After digestion with NcoI and BamHI, the DNA fragment was cloned into pAR2156NcoI (17) to yield the expression plasmid, pRP23. An additional Met-Glu sequence was introduced at the N terminus of R2R3. Site-directed mutagenesis was performed by two-step polymerase chain reaction, as described by Higuchi (29). Here the name of each mutant protein is indicated as, for example, C130I/S187G for the simultaneous mutations that replace Cys130 with Ile and Ser187 with Gly.
Protein Expression and Purification - Escherichia coli BL21(DE3) was transformed with the wild type and mutant plasmids (30). Freshly precultivated cells were inoculated into growth medium containing 100 µg/ml ampicillin and were grown at 37°C. When the culture reached an A600 of about 0.4, isopropyl-1-thio-β-D-galactopyranoside was added to a final concentration of 0.5 mM. The cells were cultured at 22°C for another 12 h. The harvested cells were suspended in 50 mM Tris-HCl buffer (pH 7.8) containing 5 mM MgCl2, and were lysed by sonication at 4°C. After the cell debris was removed by centrifugation, ammonium sulfate was added to the supernatant to 50% saturation. After an incubation at 4°C for 1 h, the supernatant was dialyzed against 50 mM potassium phosphate buffer (pH 7.5) containing 200 mM NaCl, and was then applied to a phosphocellulose column (Whatman, P11). The purified fractions were pooled, and the buffer was exchanged to 100 mM potassium phosphate buffer (pH 7.5) containing 20 mM KCl. The protein concentrations were determined from the UV absorption at 280 nm, using a molar absorption coefficient of 3.7 × 10⁴ M⁻¹ cm⁻¹ (17).
CD Measurements - Circular dichroism (CD) spectra were measured at 20°C on a Jasco J-600 spectropolarimeter equipped with a water-circulating cell holder. The spectra were obtained in 100 mM potassium phosphate buffer (pH 7.5) containing 20 mM KCl, using a 0.2-cm optical path length cell. The protein concentration was 0.1 mg/ml. CD spectra between 200 and 250 nm were obtained using a scanning speed of 20 nm/min, a time response of 1 s, a bandwidth of 1 nm, and an average over 8 scans.
Preparation of Oligonucleotides - The 22-mer oligonucleotide CACCCTAACTGACACACATTCT, containing the Myb-binding site in the simian virus 40 enhancer sequence (MBS-I) (31), and the third base substituted variants were synthesized and purified by high performance liquid chromatography with a C18 reverse-phase column (Fig. 2). The purified DNA was suspended in STE (10 mM Tris-HCl (pH 8.0), 100 mM NaCl, 1 mM EDTA), and complementary strands were annealed and end-labeled with [γ-³²P]ATP (Amersham) using T4 polynucleotide kinase (Toyobo, Osaka, Japan). The labeled DNAs were purified by passage through spin columns (Pharmacia Biotech Inc., HR-300). Here the name of each variant DNA is indicated as, for example, [C3]MBS-I, for the substitution of the T-A base pair at the third position by a C-G base pair.
Filter Binding Assay - All filter binding assays for the protein-DNA binding were carried out essentially as described (32-34). [³²P]DNA and various amounts of the c-Myb R2R3 mutant proteins were incubated in 100 µl of binding buffer (100 mM potassium phosphate buffer (pH 7.5), 20 mM KCl, 0.1 mM EDTA, 500 µg/ml bovine serum albumin, and 5% (v/v) glycerol) on ice for 30 min. The final concentration of the [³²P]DNA in binding buffer was 0.4 nM, which was always lower than the K_d value. The incubated samples were filtered through a nitrocellulose membrane (Schleicher & Schuell, BA-85, 0.45 µm) in approximately 10 s with suction. The filters were dried and counted in a liquid scintillation counter. The equilibrium dissociation constants K_d were obtained from the binding titration curve by least-squares fitting of the normalized bound DNA (y) against the protein concentration (x), using the formula $y = x/(x + K_d)$.
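The fitting step can be illustrated with a minimal sketch. It assumes the single-site isotherm y = x/(x + K_d) quoted above, and it treats the total protein concentration as the free concentration, which is reasonable only because the labeled DNA is kept below K_d. The titration points are synthetic values generated from a hypothetical K_d of 2.0 nM, not measurements from this study, and the golden-section search is merely one convenient way to perform the least-squares minimization. The function names are illustrative.

```python
import math

def fraction_bound(protein_nM, kd_nM):
    """Single-site binding isotherm y = x / (x + K_d)."""
    return protein_nM / (protein_nM + kd_nM)

def sum_of_squares(kd_nM, xs, ys):
    """Sum of squared residuals between observed and predicted bound fractions."""
    return sum((y - fraction_bound(x, kd_nM)) ** 2 for x, y in zip(xs, ys))

def fit_kd(xs, ys, lo_nM=1e-3, hi_nM=1e4, tol=1e-10):
    """Least-squares estimate of K_d by golden-section search over log10(K_d).

    The search bounds (lo_nM, hi_nM) are assumptions chosen to bracket
    any plausible nanomolar-range K_d.
    """
    a, b = math.log10(lo_nM), math.log10(hi_nM)
    inv_phi = (math.sqrt(5) - 1) / 2
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if sum_of_squares(10 ** c, xs, ys) < sum_of_squares(10 ** d, xs, ys):
            b = d          # minimum lies in [a, d]
        else:
            a = c          # minimum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return 10 ** ((a + b) / 2)

# Synthetic, noise-free titration generated from a hypothetical K_d of 2.0 nM.
protein_nM = [0.25, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
bound = [fraction_bound(x, 2.0) for x in protein_nM]

kd_fit = fit_kd(protein_nM, bound)
print(f"fitted K_d = {kd_fit:.3f} nM")
```

With real data the bound fraction would first be normalized for the filter retention efficiency, and the fitted value would carry an experimental uncertainty that this sketch does not estimate.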

RESULTS
Prior to the mutational analyses of Ser187, Cys130 in R2, which is the only cysteine residue in the c-Myb R2R3 and is located at a position equivalent to an isoleucine in R3, was replaced with Ile to facilitate the protein purification and the DNA-binding assay (35). It was reported that this mutation has little effect on DNA binding (36). The affinity of the C130I mutant was also measured in our own assay system. It was almost equal to that of the wild type, and the pyrimidine preference at the third base position was maintained (Table I).
A series of 10 amino acids, Gly, Ala, Thr, Asn, Gln, Val, Leu, Lys, Arg, and Asp, were introduced into position 187 of the c-Myb R2R3, which is a Ser residue in the wild type. The purity of each mutant protein was about 95%, as monitored by SDS-polyacrylamide gel electrophoresis. All of the mutant proteins have secondary structure contents similar to the wild type, as confirmed by the CD spectra in the far UV region (Fig. 3). The perfect coincidence of all the spectra suggests that the global tertiary structures of the mutant proteins were not deformed.
The binding affinities of the mutants to the cognate 22-mer MBS-I fragment and the third base pair substituted variants were analyzed using the filter binding assay, and the results are summarized in Table I. All measurements were repeated at least twice, and typical experimental errors for the K_d value were less than 10%, although the retention efficiency was 20 ± 10% depending on the experimental conditions. From the methylation interference experiments (17) and the NMR analyses (23), the number of DNA duplexes bound per c-Myb molecule is considered to be one within the concentration range used in this assay. As already indicated in the previous experiments (17,26,27), the filter binding assay was validated for this investigation.
The C130I/S187G mutant protein binds to the cognate MBS-I about one-third as strongly as the standard C130I mutant. The relative binding free energy change for the replacement of Ser with Gly, calculated from the K_d values, is 0.65 kcal/mol. It should correspond to the free energy derived from the interaction between the Ser side chain and the T3 base. This Gly mutant preferentially binds to both the cognate [T3]MBS-I and the substituted [C3]MBS-I. That is, even when residue 187 has no side chain, the mutant protein prefers a pyrimidine at the third position, as do the wild type and the C130I mutant proteins.
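The relation between a ratio of K_d values and a free energy change can be checked with a short sketch. The temperature used to evaluate RT is not stated here, so 298.15 K is an assumption, and the function names are illustrative. Under that assumption, a threefold increase in K_d corresponds to about 0.65 kcal/mol.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1

def ddg_from_kd(kd_mutant, kd_reference, temp_k=298.15):
    """Relative binding free energy, RT ln(K_d,mutant / K_d,reference), in kcal/mol.

    temp_k defaults to 298.15 K, which is an assumption of this sketch.
    """
    return R_KCAL * temp_k * math.log(kd_mutant / kd_reference)

def fold_from_ddg(ddg_kcal, temp_k=298.15):
    """Inverse relation: fold change in K_d implied by a free energy difference."""
    return math.exp(ddg_kcal / (R_KCAL * temp_k))

# Only the ratio of the two K_d values matters, so arbitrary units are used.
ddg_threefold = ddg_from_kd(3.0, 1.0)
fold_for_065 = fold_from_ddg(0.65)

print(f"threefold weaker binding: {ddg_threefold:.2f} kcal/mol")
print(f"0.65 kcal/mol corresponds to a {fold_for_065:.1f}-fold change in K_d")
```

If RT were instead evaluated at the incubation temperature on ice (about 273 K), the same threefold ratio would give roughly 0.60 kcal/mol, so the choice of temperature changes the value only slightly.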
The substitutions of Ser187 by Ala (C130I/S187A), Thr (C130I/S187T), or Val (C130I/S187V) result in slightly reduced binding affinities, although the sequence specificity of the standard C130I is retained. In contrast, the C130I/S187N mutant preferentially binds to the [A3]MBS-I. The affinity for the A3 base is similar to that of the wild type, whereas those for the other three bases (T, C, and G) are greatly reduced, to approximately one-half to one-sixth. The specific interaction between the Asn residue and the A3 base closely follows the intrinsic chemical features. Interestingly, for the substitution by Gln, which is one methylene group longer than Asn, the C130I/S187Q mutant loses the preference for the A3 base. Also, in the case of the C130I/S187L mutant, in which Leu is one methylene group longer than Val, the binding affinity is greatly reduced.
The mutant proteins C130I/S187K and C130I/S187R, in which basic amino acids were introduced into position 187, specifically prefer to bind to the [G3]MBS-I and [C3]MBS-I variants. In contrast, for the substitution of Ser187 by acidic Asp (C130I/S187D), the binding affinity is severely reduced and the binding is no longer sequence-specific.

DISCUSSION
Thus far, many amino acid replacements in the c-Myb R2R3 have been created and assayed by specific DNA binding (37-39), and almost all of their effects have been explained by the specific polar contacts between the R2R3 and the DNA in the three-dimensional structure of the R2R3-DNA complex (23). The current mutational study clearly indicates that residue 187 in R2R3 is also able to interact with the T3 base, as estimated from the geometry of Ser187 in the NMR complex structure (23). This specific DNA-binding mode is very different from the telomeric DNA recognition by the yeast RAP1-DBD (40), whose amino acid sequence is weakly homologous to that of the c-Myb R2R3.
However, the substitution of Ser187 with Gly, Ala, or Val unexpectedly resulted in only about a 3-fold decrease in the binding affinity toward any base, which would be a consequence of a direct-readout mechanism, while the pyrimidine base preference at the third position in the MBS-I fragment was retained. Ser is thought to have weak specificity, because its side chain can act as either a hydrogen bond donor or an acceptor, and thus can bind to any base. Nevertheless, Ser187 of the R2R3 preferentially binds to the pyrimidine bases. If this interaction were attributable only to the direct-readout mechanism, then the substitution of Ser187 should have resulted in an over 100-fold reduction of the binding affinity and a loss of the sequence specificity, like the substitution of Lys128 by Ala (23), and those of Asn136 and Asn186 by Ala (38). These results suggest that the preference for the pyrimidine bases at the third position of MBS-I should occur primarily by an indirect-readout mechanism.
Table footnote a: An additional Met-Ala sequence was introduced at the N terminus of R2R3, which was used in the NMR experiment (23).
In our previous structural study, no distinct deformation of the global DNA conformation was observed (23). However, when the local bending of the DNA duplex was carefully analyzed in 25 NMR complex structures and the refined average structure (Protein Data Bank codes 1MSF and 1MSE, respectively), significantly positive roll angles were always observed between the third pyrimidine and the fourth purine, as indicated by an arrow in Fig. 4. Characteristic negative slides (−1.1 ± 0.3 Å) were also observed at the same pyrimidine-purine step, corresponding to positive rolling, while the twist angles at this step were 34.1 ± 2.3°, nearly equal to the twist angle in standard B-form DNA. Similar significant, positive rolls at pyrimidine-purine steps are general phenomena (42), observed in many complex crystal structures of repressors and homeodomains with the helix-turn-helix motif, as summarized in Table II. In every case, as a part of the consensus base sequence, the base pair roll bends the DNA so that the recognition helix is wrapped by the DNA duplex in the major groove (3,42). Consequently, a large contact area is created between the recognition helix and the DNA major groove, facilitating the preferable polar contacts between the protein side chains and the DNA phosphate backbone. The local roll in the MBS-I fragment may be associated with the small magnitude of observed bending in long DNA duplexes bound with the c-Myb R2R3 (52). This bending may be enhanced by other regions in the protein, like the transactivation domain.
Due to the intrinsic propeller-twist of the DNA base pairs, the pyrimidine-purine step has two stable conformations, with rolls of 0° and around 10° (53,54), arising from the physical requirements of the base stacking (55). There is negligible additional free energy cost required for the 10° roll at the pyrimidine-purine step, even for a free DNA duplex without a protein. This is the physical origin of the so-called "bendability" of kinked DNA duplexes, commonly observed in the minor groove readout mechanism (8,9). At the other pyrimidine-pyrimidine, purine-purine, and purine-pyrimidine steps, no such tendency toward a strongly bistable step is observed (53). In fact, the binding free energy differences between the pyrimidine bases and the purine bases at the third base position for the current Gly187, Ala187, and Val187 mutants are 0.4 ± 0.1 kcal/mol, as calculated from the dissociation constants in Table I. Fig. 5 shows the relative binding free energy changes ΔΔG with respect to the C130I/S187G mutant: $\Delta\Delta G = \Delta G_{\mathrm{bind}}(\text{mutant, third base N}) - \Delta G_{\mathrm{bind}}(\text{C130I/S187G, same third base N})$, where $\Delta G_{\mathrm{bind}} = RT \ln K_d$. Here, the difference was calculated while keeping the same third position base pair. We can now separate the bendability effect from the total binding free energies between the c-Myb R2R3 mutants and the variety of DNA sequences, unless the binding modes vary from the wild type. Positive and negative ΔΔG values correspond to a decrease and an increase of the binding affinity, respectively. They reflect the intrinsic chemical features of the amino acids and the bases after the DNA bending effect has been subtracted.
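The referencing scheme described above can be sketched as follows. The K_d values are hypothetical placeholders chosen only to make the arithmetic easy to follow, and they are not the values of Table I. The labels "reference" and "mutant_X", the function names, and the temperature of 298.15 K are likewise assumptions of this sketch. The point illustrated is that subtracting the reference protein's free energy for the same third base removes the contribution common to both proteins, such as the DNA bending term.

```python
import math

R_KCAL = 1.987e-3   # gas constant in kcal mol^-1 K^-1
TEMP_K = 298.15     # assumed temperature for RT

def dg_bind(kd_nM):
    """Binding free energy RT ln(K_d) in kcal/mol, with K_d converted from nM to M."""
    return R_KCAL * TEMP_K * math.log(kd_nM * 1e-9)

# Hypothetical K_d values (nM) for each third-position base; NOT data from Table I.
kd_nM = {
    "reference": {"T": 3.0, "C": 3.0, "A": 6.0, "G": 6.0},   # stands in for the Gly mutant
    "mutant_X":  {"T": 6.0, "C": 6.0, "A": 1.5, "G": 12.0},  # an arbitrary example mutant
}

def ddg_vs_reference(mutant, reference="reference"):
    """Per-base difference dG(mutant, N) - dG(reference, N) for the same third base N."""
    return {
        base: dg_bind(kd_nM[mutant][base]) - dg_bind(kd_nM[reference][base])
        for base in "TCAG"
    }

def purine_pyrimidine_gap(protein):
    """Mean dG for purines minus mean dG for pyrimidines at the third position."""
    dg = {base: dg_bind(kd) for base, kd in kd_nM[protein].items()}
    return (dg["A"] + dg["G"]) / 2 - (dg["T"] + dg["C"]) / 2

ddg_x = ddg_vs_reference("mutant_X")
gap_ref = purine_pyrimidine_gap("reference")

for base, value in ddg_x.items():
    sign = "weaker" if value > 0 else "stronger"
    print(f"mutant_X, base {base}: ddG = {value:+.2f} kcal/mol ({sign} than reference)")
print(f"reference purine-pyrimidine gap = {gap_ref:.2f} kcal/mol")
```

In this toy data set the reference protein binds purines twofold more weakly than pyrimidines, which corresponds to a gap of about 0.41 kcal/mol, and that gap cancels out of every ΔΔG value because each difference is taken at a fixed third base.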
For the Ala substitution, the binding affinity is increased as compared with Gly187, independent of the bases at the third position, probably due to the hydrophobic contacts. When the side chain volume is larger in the Val substitution, a similar binding affinity to the pyrimidines remains, but the affinity becomes neutral to the purines. Therefore, the volume of space created between residue 187 and the third base may allow at most the Val-pyrimidine pair, but the Val-purine pair would be slightly too large for the space. In fact, other amino acids, such as Leu and Gln, with larger side chain volumes than Val, significantly lack binding affinity, as indicated in Fig. 5. Moreover, the Val-, Leu-, and Gln-substituted mutants always have lower affinities for adenine than for guanine. This is also supported by the fact that the amino N6 of adenine occupies a larger volume than the oxygen O6 of guanine, which should be located at the position nearest to the side chain of residue 187.
From this consideration of the space volume around residue 187 and the third base, the native and the optimum interaction between Ser187 and T3 should be mediated by water molecules, as long as the binding mode is assumed to be the same in all of the mutant proteins and DNAs. In the Thr mutant, the disposition of the water molecules could be different from that in the wild type, thus yielding a slight decrease in the binding affinity. Since there is no possible conformation on the helix in which the methyl group of the Thr side chain would be able to access the methyl group in T3, as shown in a modeling study, a specific non-polar contact between the Thr mutant and T3 is not expected.
Following the conventional chemical rules for specific binding between amino acids and bases (1,3,12), the current Asn mutant specifically binds to the A3 base relative to the other bases, as indicated in Fig. 5. The Asn side chain is smaller than that of Val, and there should be enough space for the Asn-adenine pair, resulting in the formation of direct hydrogen bonds with a free energy gain of about 0.5 kcal/mol. In addition, the Lys and Arg mutant proteins prefer to bind to the G3 base. From their intrinsic chemical nature, both basic amino acids can bind to the guanine base almost exclusively by electrostatic interaction. In contrast, these mutant proteins bind to the [A3]MBS-I and [T3]MBS-I variants with only weak affinity, probably because of the bulky side chains of the amino acids, as in the Leu mutant. It is interesting that their long side chains seem to interact with the guanine base on the opposite side of C3. The acidic Asp substitution results in a severe reduction of DNA binding, to a level much lower than that of the Gly substitution, suggesting that the Asp side chain cannot interact with any base, including cytosine, in this geometry. Rather, the negative ionic charge may disturb other specific hydrogen bonds between the protein and the DNA.
The wild type protein and the C130I mutant with Ser187 bind to the cognate DNA most tightly among the mutant proteins, and their K_d values are in the nanomolar order. Generally, transcriptional regulator proteins bind to their target genes with greater affinity (57). These results are consistent with the conservation of Ser in position 187 of c-Myb among animal species (23). In contrast, among plant species, the amino acid in this position varies (25). This suggests that the recognition mode in the plant Myb homologues may be different from that of the c-Myb DBD from animal species. In fact, in the case of the yeast RAP1 domain 1, the corresponding Val409 residue does not interact with the DNA in the complex structure (40), although the free domain structure is similar to that of the c-Myb R3.
In conclusion, the current mutational analysis revealed that the pyrimidine preference of the native c-Myb DBD for the initial base of the consensus sequence originates principally in the intrinsic positive roll at the pyrimidine-purine step of the DNA duplex. For the purine-purine step, as much as 0.4 kcal/mol of additional free energy would be necessary, corresponding to the bendability. When these bending energies are separated, the conventional chemical rules between the amino acids and the bases are distinctively observed in the c-Myb R2R3 mutants.
It is still difficult to extract a definite "recognition code" from the variety of DNA information readout mechanisms. The situation becomes much more complicated when the DNA flexibility is considered. Only a screening technology, such as a phage display library (58-60), would be expected to reveal a novel, specific form of DNA recognition, instead of an artificial molecular design. However, based upon the complex structure and the mutational analysis, one may be able to dissect the sequence specific affinity into the DNA bendability and the specific interaction between the amino acids and the bases. Without this kind of precise analysis, we may never reach a complete understanding of the readout mechanism, nor produce any novel devices for molecular readout.

Fig. 5. Relative binding free energy changes (ΔΔG), referenced to the C130I/S187G mutant protein, for the binding of the Ser187-substituted R2R3 mutant proteins to the cognate MBS-I and its variants. Each ΔΔG for the particular base pair was calculated from the K_d values, as described under "Discussion." The side chain volumes (Å³) of the substituted amino acids (56) are indicated in the parentheses below the individual residues.