Overexpression, Purification, and Biophysical Characterization of the Heterodimerization Domain of the Core-binding Factor β Subunit*

Core-binding factors (CBF) are heteromeric transcription factors essential for several developmental processes, including hematopoiesis. CBFs contain a DNA-binding CBFα subunit and a non-DNA binding CBFβ subunit that increases the affinity of CBFα for DNA. We have developed a procedure for overexpressing and purifying full-length CBFβ as well as a truncated form containing the N-terminal 141 amino acids using a novel glutaredoxin fusion expression system. Substantial quantities of the CBFβ proteins can be produced in this manner allowing for their biophysical characterization. We show that the full-length and truncated forms of CBFβ bind to a CBFα·DNA complex with very similar affinities. Sedimentation equilibrium measurements show these proteins to be monomeric. Circular dichroism spectroscopy demonstrates that CBFβ is a mixed α/β protein and NMR spectroscopy shows that the truncated and full-length proteins are structurally similar and suitable for structure determination by NMR spectroscopy.

complex (1,2). CBFβ modulates the affinity of the CBFα subunit for DNA without establishing additional contacts on the DNA or changing the magnitude of DNA bending (2,13).
The amino acid sequence of CBFβ yields few clues to its tertiary structure or the mechanism by which it modulates the affinity of the CBFα subunit for DNA. The heterodimerization domain in CBFβ has been localized to its N-terminal 135 amino acids, which corresponds to a region of significant homology between CBFβ and its two Drosophila homologs, Brother (Bro) and Big Brother (Bgb) (13,14). A truncated CBFβ protein containing amino acids 1-141 (CBFβ(141)), which includes the region of homology to Bro and Bgb, stably associates with the CBFα subunit in vitro (1,14,15). Further truncation of the C terminus to amino acid 133 results in a protein that weakly associates with CBFα, and C-terminal truncation to amino acid 117 disrupts stable heterodimerization with CBFα (1,15).
The primary structures of CBFβ and its Drosophila homologs are not similar to those of any other proteins, and the mechanism by which CBFβ stabilizes the CBFα·DNA complex is unusual. The CBFβ subunit is an essential component of the CBF complex and is mutated in a significant percentage of human leukemias, making it both an interesting and important target for biophysical and structural analyses. In this study, we describe a novel system for expressing the CBFβ subunit in bacteria and a purification protocol with which we can obtain large amounts of homogeneous CBFβ protein. We confirm that a truncated form of CBFβ, CBFβ(141), demonstrated previously to contain an intact heterodimerization domain (1), binds to CBFα with an affinity very similar to that of the full-length CBFβ(187) protein. The isolated heterodimerization domain in CBFβ(141) also assumes a folded structure indistinguishable from that which it assumes in the context of the full-length CBFβ(187) protein. We also demonstrate that both the full-length CBFβ(187) and truncated CBFβ(141) proteins are monomeric and contain a mixture of α-helical and β-strand secondary structural elements.
* The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

EXPERIMENTAL PROCEDURES
Plasmid Construction-The pGRXCBFβ141 plasmid encoding the glutaredoxin-CBFβ(141) fusion protein was constructed in the following manner. DNA sequence corresponding to Escherichia coli glutaredoxin-1 was polymerase chain reaction amplified from the plasmid pMG524-GRX (25) using the following primers: 5′-CGGAATTCGGTTAAACCTACTTTCAGCG-3′ (S-GRX) and 5′-CGGGATCCCTTGTCATCGTCATCGGCGTCCAGATTTTCTTTCACC-3′ (AS-GRX). The sense primer (S-GRX) contains the recognition site for EcoRI at its 5′ end. The AS-GRX primer contains a restriction site for BamHI, followed by sequences encoding a cleavage site for the protease enterokinase (DDDDK). DNA encoding CBFβ(141) was amplified from the plasmid CBFβ(p21.5) (2) using the primers 5′-AGACGGATCCATGCCGCCGTCGTCCCGGAC-3′ (S-CBFβ141) and 5′-GGCCCAAGCTTTCACTGTTGTGCTAATGCATCTTCC-3′ (AS-CBFβ141). A restriction site for BamHI was included at the 5′ end of the sense primer (S-CBFβ141), and a HindIII site followed by a translational stop codon was added to the antisense primer (AS-CBFβ141). The polymerase chain reaction amplified fragments were digested with the appropriate restriction enzymes and subcloned between the EcoRI and HindIII sites of pMG524. The resultant plasmid was transformed into the bacterial strain N99cI for the purpose of DNA characterization and sequencing and into strain N4830 or AR58 for protein expression.
The pGRXCBFβ187 plasmid encoding the glutaredoxin protein fused to CBFβ(1-187) was generated by replacing the BamHI-HindIII fragment in pGRXCBFβ141 that encodes CBFβ(141) with a BamHI-HindIII fragment containing the open reading frame of CBFβ(1-187), which was excised from a previously described plasmid containing CBFβ(1-187) subcloned into pBluescript SK+ (2). A cleavage site for Factor Xa was subsequently introduced into this plasmid. Complementary oligonucleotides encoding a Factor Xa cleavage site flanked by BamHI sites (5′-GATCCATCGAAGGTCGTG-3′ and 5′-GATCCACGACCTTCGATG-3′) were annealed, phosphorylated, and subcloned into the BamHI site, between the enterokinase cleavage site and the open reading frame of CBFβ(187). The inserts in all plasmids were confirmed by sequencing. These plasmids are available upon request to the corresponding authors.
Expression of CBFβ(141) and CBFβ(187)-E. coli AR58 cells were transformed with the pGRXCBFβ141 plasmid, and a single colony was used to inoculate 10 ml of terrific broth (Sigma) containing 100 µg/ml carbenicillin. After overnight shaking at 200 rpm at 29°C, the culture was used to inoculate 1 liter of terrific broth (100 µg/ml carbenicillin). This culture was grown to an A600 of 1.5 and then induced by raising the temperature to 40°C and maintaining it at this temperature for 4 h. Cells were collected by centrifugation, resuspended in an equal weight of 10% sucrose, 50 mM Tris (pH 7.5), and frozen by dripping into liquid nitrogen. The frozen cells were stored at −70°C. For expression of full-length CBFβ, pGRXCBFβ187 was transformed into E. coli AR58, and cells were grown in the same manner as for CBFβ(141).
Purification of CBFβ(141)-All operations were carried out at 4°C. Pefabloc (1 mM, Boehringer Mannheim), EDTA (1 mM), and lysozyme (0.4 mg/ml) were added to the thawed bacterial cell pellet from a 1-liter culture. The solution was stirred gently for 30 min, and the cells were then further lysed by four passages through a French press at 18,000 p.s.i. The lysate was clarified by centrifugation at 11,000 × g for 10 min. The resulting supernatant was loaded onto a 2.5 × 25-cm column of DEAE-Sephacel (Pharmacia Biotech Inc.) equilibrated in 25 mM Tris-Cl (pH 7.5), 1 mM EDTA, 1 mM DTT at 1 ml/min and eluted from the column with a 1-liter linear gradient of 0-500 mM NaCl in the same buffer. Fractions containing the fusion protein were identified by the 2-hydroxyethyl disulfide assay, which detects the enzymatic activity of the glutaredoxin system as described by Holmgren (26). The active fractions were pooled and concentrated by ultrafiltration on a YM3 membrane (Amicon, Inc.) to 15 ml. The DEAE pool was loaded onto a 2.5 × 113-cm column of Sephacryl S-100 (Pharmacia) in 25 mM Tris-Cl (pH 7.5), 1 mM EDTA, 75 mM NaCl and eluted with the same buffer at 0.6 ml/min. The active fractions were again identified by the 2-hydroxyethyl disulfide assay, pooled, and concentrated by ultrafiltration to 15 ml. The resulting homogeneous fusion protein was cleaved by treatment with 90 units of enterokinase (Biozyme, San Diego) per A280 unit of fusion protein, with addition of DTT to 0.5 mM and incubation at 30°C for 20 h. The cleavage reaction was halted by addition of 0.5 mM Pefabloc. The resulting cleaved protein was exchanged into a lower ionic strength buffer (25 mM NaCl), loaded onto a 2.5 × 10-cm column of Q-Sepharose (Pharmacia) in 25 mM Tris-Cl (pH 7.5), 1 mM EDTA, 25 mM NaCl, 1 mM DTT, and eluted with a 500-ml linear gradient of 25-500 mM NaCl in the same buffer at 0.8 ml/min. The CBFβ fractions were identified by SDS-PAGE, pooled, and concentrated to 10 ml by ultrafiltration. From 1 liter of culture, a yield of 20 mg is typically obtained. For the ¹⁵N-labeled protein used for NMR studies, the Q-Sepharose pool was loaded onto a 2.5 × 36-cm column of Sephacryl S-100 in 25 mM potassium phosphate (pH 6.5), 0.1 mM EDTA, 1 mM DTT, 0.1% NaN3 and eluted at 0.8 ml/min. The protein was concentrated by ultrafiltration.
Purification of CBFβ(187)-All operations were carried out at 4°C. The Grx-CBFβ(187) fusion protein was purified in the same manner as described above for the Grx-CBFβ(141) fusion protein. Following fractionation on DEAE-Sephacel and Sephacryl S-100, the resulting homogeneous Grx-CBFβ(187) fusion protein was cleaved by treatment with 3.5 units of Factor Xa (Pharmacia) per A280 unit of fusion protein in a buffer containing 25 mM Tris-Cl (pH 7.5), 1 mM EDTA, 75 mM NaCl, 1 mM CaCl2, and no DTT. Secondary cleavage at an undesired Factor Xa-sensitive site was minimized by carrying out the reaction for 6 h at 22°C. The cleavage reaction was halted by addition of 4 mM Pefabloc with a subsequent 40-min incubation at 37°C. The cleaved protein was exchanged into a lower ionic strength buffer (25 mM NaCl) by ultrafiltration, loaded onto a 2.5 × 10-cm column of Q-Sepharose in 25 mM Tris-Cl (pH 7.5), 1 mM EDTA, 25 mM NaCl, 1 mM DTT, and eluted with a 500-ml linear gradient from 25 to 500 mM NaCl in the same buffer at 0.8 ml/min. The CBFβ(187) fractions were identified by SDS-PAGE, pooled, and concentrated to 4 ml by ultrafiltration. The concentrated protein was loaded onto a 2.5 × 113-cm column of Sephacryl S-100 (Pharmacia) in 25 mM Tris (pH 7.5), 1 mM EDTA, 1 mM DTT, 75 mM NaCl and eluted with the same buffer to separate the full-length protein from the product of the secondary cleavage reaction.
SDS-Polyacrylamide and Isoelectric Focusing Gel Electrophoresis-SDS-polyacrylamide gel electrophoresis was carried out using 15% Ready Gels from Bio-Rad with Coomassie staining as described by the manufacturer. Isoelectric focusing gel electrophoresis was carried out on a Pharmacia Phast System using IEF 3-9 gels, and Coomassie staining was performed as described by the manufacturer.
Molar Extinction Coefficients-The molar extinction coefficients for the CBFβ(141) and CBFβ(187) proteins were calculated on the basis of the tryptophan, tyrosine, and cystine content of the proteins as described by Pace et al. (27). A value of 18,500 M⁻¹ cm⁻¹ was calculated for both proteins.
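The composition-based calculation can be sketched as follows. This is a minimal illustration, assuming the per-residue absorptivities at 280 nm that are commonly quoted for this method (5,500, 1,490, and 125 M⁻¹ cm⁻¹ for tryptophan, tyrosine, and cystine). The residue counts used in the example are hypothetical placeholders, not taken from the CBFβ sequence; they are chosen only to show how a value close to 18,500 M⁻¹ cm⁻¹ can arise.

```python
# Per-residue molar absorptivities at 280 nm (M^-1 cm^-1).
# Assumption: the commonly quoted values for the composition-based method.
EPS_TRP = 5500
EPS_TYR = 1490
EPS_CYSTINE = 125


def extinction_280(n_trp, n_tyr, n_cystine=0):
    """Estimate the molar extinction coefficient from residue counts."""
    return n_trp * EPS_TRP + n_tyr * EPS_TYR + n_cystine * EPS_CYSTINE


def concentration_molar(a280, epsilon, path_cm=1.0):
    """Beer-Lambert law: c = A / (epsilon * l)."""
    return a280 / (epsilon * path_cm)


# Hypothetical composition (NOT the real CBF-beta counts): 2 Trp, 5 Tyr, no cystine.
eps = extinction_280(n_trp=2, n_tyr=5)
conc = concentration_molar(a280=1.845, epsilon=eps)
print(eps, conc)
```

With an extinction coefficient in hand, the second function converts a measured A280 into a molar protein concentration for the binding and spectroscopic experiments.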
MALDI Mass Spectrometry-MALDI mass spectrometry was carried out by Dr. Gary Siuzdak at the mass spectrometry laboratory at the Beckman Center for Chemical Sciences, Scripps Research Institute. Samples were dialyzed into 20 mM ammonium acetate (pH 7), 0.5 mM DTT prior to analysis.
N-terminal Sequencing-To confirm the site of cleavage by enterokinase, N-terminal sequencing of the first five residues was carried out at the Biotechnology Resource Laboratory located at Yale University. A sequence of Xaa-Xaa-Met-Pro-Arg was obtained showing that cleavage had occurred at the correct site; however, the sequence of the first two amino acids was not determined with certainty.
Measurement of Dissociation Constants-The affinities of CBFβ(141) and CBFβ(187) for a CBFα·DNA complex were measured by electrophoretic mobility shift assay. The source of CBFα2 was a fragment comprising amino acids 41-214 of the murine CBFα2 protein, encompassing the DNA-binding Runt domain, expressed in bacteria and purified as described previously (28). Binding conditions were chosen such that the concentration of DNA was >10-fold above the Kd of the CBFα2 Runt domain for the DNA site (~1 × 10⁻⁸ M DNA), and the concentration of active CBFα2 Runt domain was >10-fold below the Kd of CBFβ for the CBFα2·DNA complex (5 × 10⁻¹⁰ M CBFα2). A fixed amount of DNA and the Runt domain, along with various amounts of CBFβ proteins, were used in binding reactions, and the complexes were resolved by electrophoretic mobility shift assay. Assay conditions and methods for quantitation were the same as described previously (15). The data are plotted as the percentage of CBFα2·CBFβ·DNA complex versus the concentration of CBFβ, and the Kd is defined as the concentration of CBFβ at 50% saturation (Kaleidagraph, Synergy Software). 100% ternary complex was defined as the point of saturation, and 0% ternary complex was taken from the background at the corresponding position in the absence of added CBFβ.
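The definition of Kd as the CBFβ concentration at 50% saturation corresponds to a single-site hyperbolic isotherm, fraction bound = [CBFβ]/(Kd + [CBFβ]), which holds when the Runt domain concentration is well below Kd as arranged above. The sketch below shows one way such a curve could be fit. It is not the Kaleidagraph procedure used here: it substitutes a simple least-squares scan over log-spaced trial Kd values, and the titration data are synthetic.

```python
def fraction_bound(conc, kd):
    """Single-site isotherm; valid when the titrated partner is << Kd."""
    return conc / (kd + conc)


def fit_kd(concs, fractions, lo=1e-10, hi=1e-5, steps=2000):
    """Scan log-spaced trial Kd values and keep the least-squares best."""
    best_kd, best_sse = None, float("inf")
    for i in range(steps + 1):
        kd = lo * (hi / lo) ** (i / steps)
        sse = sum((f - fraction_bound(c, kd)) ** 2
                  for c, f in zip(concs, fractions))
        if sse < best_sse:
            best_kd, best_sse = kd, sse
    return best_kd


# Synthetic, noise-free titration generated with a hypothetical Kd of 2e-8 M.
concs = [1e-9, 3e-9, 1e-8, 3e-8, 1e-7, 3e-7, 1e-6]
true_kd = 2e-8
fractions = [fraction_bound(c, true_kd) for c in concs]

kd_est = fit_kd(concs, fractions)
print(f"estimated Kd = {kd_est:.2e} M")
```

The grid scan is chosen only to keep the example dependency-free; a nonlinear least-squares routine would normally be used and would also return an uncertainty on Kd.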
Sedimentation Equilibrium-High speed (29) sedimentation equilibrium experiments were conducted at 20°C in a Beckman XL-I ultracentrifuge at rotor speeds of 15,000, 20,000, 25,000, and 30,000 rpm (CBFβ(141)) or 10,000, 15,000, 20,000, and 25,000 rpm (CBFβ(187)) using absorbance detection. For each sample, three cell loading concentrations were examined, from 0.25 to ~1 mg/ml in 25 mM Tris-Cl (pH 7.5), 150 mM KCl, 1 mM DTT. Data were fit using NONLIN (30). Partial specific volumes and solvent density (1.00960 g/ml) were calculated as described (31). For CBFβ(141), the calculated partial specific volume (0.7225 ml/g) was adjusted for the peptide charge (−3 at pH 7.5) by decreasing the molar volume by 25 ml/mol of charge, to 0.7185 ml/g. An additional uncertainty of 0.004 ml/g in the partial specific volume was included when calculating the molecular weights (32,33). For CBFβ(187), the calculated partial specific volume (0.7161 ml/g) also was reduced for its expected charge (−4 at pH 7.5) by 0.004 ml/g, and the same additional uncertainty was included in the molecular weight calculations.
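For a single ideal species, the equilibrium concentration profile follows c(r) = c(r0) exp[M(1 − v̄ρ)ω²(r² − r0²)/(2RT)], so the slope of ln c against r² gives the molecular weight once the partial specific volume and solvent density are fixed. The sketch below illustrates that relationship only; it is not the global nonlinear fit performed by NONLIN. The partial specific volume and density are the values quoted above, while the radii, loading concentration, and molecular weight are hypothetical inputs.

```python
import math

R_GAS = 8.314e7  # gas constant in erg mol^-1 K^-1 (cgs units, radii in cm)


def omega(rpm):
    """Angular velocity in rad/s."""
    return 2.0 * math.pi * rpm / 60.0


def profile(r_cm, r0_cm, c0, mol_wt, vbar, rho, rpm, temp_k):
    """Equilibrium concentration at radius r for a single ideal species."""
    sigma = mol_wt * (1.0 - vbar * rho) * omega(rpm) ** 2 / (R_GAS * temp_k)
    return c0 * math.exp(sigma * (r_cm ** 2 - r0_cm ** 2) / 2.0)


def mol_wt_from_slope(slope, vbar, rho, rpm, temp_k):
    """Invert slope = d(ln c)/d(r^2) = M (1 - vbar*rho) omega^2 / (2RT)."""
    return 2.0 * R_GAS * temp_k * slope / ((1.0 - vbar * rho) * omega(rpm) ** 2)


# vbar and rho from the text; M, radii and c0 are hypothetical.
vbar, rho, rpm, temp_k = 0.7185, 1.00960, 25000, 293.15
r1, r2 = 6.9, 7.1
c1 = profile(r1, r1, 0.3, 16700.0, vbar, rho, rpm, temp_k)
c2 = profile(r2, r1, 0.3, 16700.0, vbar, rho, rpm, temp_k)

slope = (math.log(c2) - math.log(c1)) / (r2 ** 2 - r1 ** 2)
m_est = mol_wt_from_slope(slope, vbar, rho, rpm, temp_k)
print(round(m_est))
```

Because the buoyancy term (1 − v̄ρ) is small, an uncertainty of 0.004 ml/g in the partial specific volume propagates into a noticeable uncertainty in the molecular weight, which is why it is carried through the calculation.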
Circular Dichroism Spectroscopy-Circular dichroism spectra were collected at 20°C on a Jasco 715 spectrometer calibrated using 10-camphorsulfonic acid. Mean amide Δε values were calculated using the known protein sequence. The protein solutions were dialyzed extensively against 25 mM potassium phosphate (pH 7.5), 0.1 mM EDTA, 0.5 mM DTT prior to CD measurements. Quartz cells with a 0.05-mm path length were used for measurements in the far ultraviolet (176-260 nm). The data were corrected by subtraction of a spectrum of the buffer alone. A total of eight scans were recorded at 1 nm resolution from 265 to 175 nm for both protein and buffer at a rate of 10 nm/min with a 16-s response time. The resulting data for 178-260 nm were fit using the variable selection protocol of Johnson and co-worker (34,35) using software provided by Dr. Johnson. Three proteins at a time were removed from the 33-protein data base, and the resulting 5456 combinations were examined for total percentage of secondary structure and root mean square error. Eleven combinations were finally selected, all of which had root mean square error values less than 0.20 (CBFβ(141)) and 0.25 (CBFβ(187)).
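The figure of 5456 combinations is simply the number of ways to leave three proteins out of a 33-protein reference set, C(33,3). The short sketch below confirms the count and shows how the reduced reference sets could be enumerated; the integer indices are stand-ins for the reference proteins, and the actual fitting performed by the variable selection software is not reproduced.

```python
import math
from itertools import combinations

N_REFERENCE = 33  # proteins in the reference data base
N_REMOVED = 3     # proteins left out per trial

# Number of distinct leave-three-out trials.
n_trials = math.comb(N_REFERENCE, N_REMOVED)
print(n_trials)


def reduced_sets(n_reference=N_REFERENCE, n_removed=N_REMOVED):
    """Yield the indices retained in each leave-n-out reference set."""
    everything = set(range(n_reference))
    for removed in combinations(range(n_reference), n_removed):
        yield sorted(everything - set(removed))


first_set = next(reduced_sets())
print(len(first_set))
```

Each reduced set would then be used to fit the measured spectrum, and the fits ranked by total secondary structure and root mean square error as described above.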
NMR Spectroscopy-All measurements were performed on a Varian UnityPlus 500 NMR spectrometer equipped with an actively shielded triple-resonance probe from Nalorac Corporation and pulsed field gradients. Measurements were carried out at 20°C with solutions of the proteins in 25 mM potassium phosphate (pH 6.5), 0.1 mM EDTA, 0.1% sodium azide, 5 mM DTT. ¹⁵N-¹H HSQC spectra were recorded using the gradient sensitivity-enhanced HSQC sequence (36). The number of complex points and acquisition times for these experiments were ¹⁵N (F1), 128 points, 79.3 ms and ¹H (F2), 1024 points, 157 ms.
For measurement of ¹⁵N T1 and T2 values, the pulse sequences of Farrow et al. (37) optimized for minimal saturation of water were employed. A recycle delay of 1.0 s was used between acquisitions to ensure sufficient recovery of NH magnetization (38). ... To generate pure absorptive two-dimensional line shapes, the N- and P-type signals recorded for each t1 point were stored separately to carry out the necessary addition and subtraction of free induction decays and 90° phase correction as described previously (36). The necessary data manipulations were carried out using software written in-house. All other data processing was carried out using the program PROSA (39).
The intensities of the peaks in the two-dimensional spectra were analyzed by measuring the peak heights using the integration routine in the program XEASY (40). The uncertainties in the measured peak intensities were set equal to the root-mean-square base-line noise of the spectra. The T1 and T2 values were determined by fitting the measured peak heights to the following two-parameter function:
$$I(t) = I_0 \exp(-t/T_{1,2}) \qquad (1)$$
in which I(t) is the intensity at time t and I0 is the intensity at time zero. T1 and T2 values and uncertainties were determined by nonlinear least squares fitting of the experimental data points to the monoexponential decay given in Equation 1 using the Levenberg-Marquardt algorithm (41) to minimize the χ² goodness-of-fit parameter. The goodness-of-fit of the data to Equation 1 was evaluated by comparison of the calculated χ² value with tabulated values of χ² at the 95% confidence level as described previously (42). Initial estimates of the correlation times for CBFβ(141) and CBFβ(187) were obtained from a 10% trimmed mean of the T1/T2 ratio (43) for those backbone amides with T1 and T2 values deemed adequate by the goodness-of-fit analysis described above.
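The two steps described here, fitting each decay to the monoexponential I(t) = I0 exp(−t/T) and then taking a trimmed mean of the per-residue T1/T2 ratios, can be sketched as below. Two simplifications are assumed: the fit uses log-linear least squares rather than the Levenberg-Marquardt minimization of χ² employed in this work, and the 10% trimmed mean is taken to discard 10% of the values from each end of the sorted list. All numerical inputs are synthetic.

```python
import math


def fit_monoexponential(times, intensities):
    """Log-linear least squares for ln I = ln I0 - t/T; returns (I0, T)."""
    n = len(times)
    ys = [math.log(v) for v in intensities]
    mean_t = sum(times) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, ys))
    slope = sxy / sxx
    i0 = math.exp(mean_y - slope * mean_t)
    return i0, -1.0 / slope


def trimmed_mean(values, fraction=0.10):
    """Mean after dropping `fraction` of the values from each tail (assumed)."""
    ordered = sorted(values)
    k = int(len(ordered) * fraction)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


# Synthetic decay with I0 = 100 and T = 0.07 s.
delays = [0.01, 0.03, 0.05, 0.09, 0.13]
heights = [100.0 * math.exp(-t / 0.07) for t in delays]
i0_fit, t_fit = fit_monoexponential(delays, heights)

# Synthetic T1/T2 ratios with one outlier that the trimming removes.
ratios = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
ratio_mean = trimmed_mean(ratios)
print(i0_fit, t_fit, ratio_mean)
```

Trimming limits the influence of residues whose ratios are distorted by internal motion or chemical exchange, which would otherwise bias the estimate of the overall correlation time.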

RESULTS
Construction of Glutaredoxin Fusion Protein Vector-The biophysical and structural studies we wish to carry out to characterize CBFβ require large quantities of homogeneous protein. Attempts to overexpress CBFβ in a number of different vectors including glutathione S-transferase fusion vectors met with limited success. To overcome these difficulties and provide the milligram quantities of CBFβ necessary for biophysical and structural studies, we have engineered a novel fusion vector based on E. coli glutaredoxin. Based on the high level of overexpression obtained for E. coli glutaredoxin (25), we reasoned that a fusion vector with CBFβ fused to the C-terminal end of glutaredoxin should also give very high levels of expression. Fig. 1A illustrates the construct we have employed in which the coding sequence for E. coli glutaredoxin-1 has been fused to that of CBFβ(141) via a linker containing the DDDDK recognition sequence for enterokinase. The DDDDK sequence permits the release of CBFβ(141) from the fusion protein via proteolysis with enterokinase. Two additional amino acids (Gly-Ser) are retained at the N-terminal end due to the BamHI restriction site used for subcloning purposes. A similar fusion vector has been constructed with the related protein thioredoxin (44); however, this vector employs tryptophan for induction, making it unsuitable for labeling proteins with ¹³C and ¹⁵N for NMR spectroscopy. Our Grx fusion vector is heat-inducible, based on the heat stability of a mutant repressor (25), thus avoiding these difficulties.
For reasons described below, it was necessary to replace the enterokinase site in the vector utilized for expression of CBFβ(187). This alternate vector was constructed by subcloning a double stranded oligonucleotide encoding a Factor Xa cleavage site (IEGR) between the cleavage site for enterokinase and the CBFβ(187) open reading frame (Fig. 1B).
Purification of CBFβ(141) and CBFβ(187)-Utilizing the pGRXCBFβ141 fusion vector, we obtain high levels of expression (Fig. 2, lane 3) of protein that is virtually completely soluble (Fig. 2, lanes 4 and 5). Fig. 2 illustrates a typical purification of CBFβ(141) starting with cells from a 1-liter bacterial culture. After lysis of the cells and clarification by centrifugation, the Grx-CBFβ(141) fusion protein can be purified close to homogeneity in two steps via ion-exchange chromatography on DEAE-Sephacel and size exclusion chromatography on Sephacryl S-100 (Fig. 2, lanes 6 and 7). The fusion protein-containing fractions were readily identified using the 2-hydroxyethyl disulfide assay specific for glutaredoxin (26), pooled, and concentrated by ultrafiltration.
Treatment of the Grx-CBFβ(141) fusion protein with enterokinase and subsequent ion-exchange chromatography on Q-Sepharose (Fig. 2, lanes 8 and 9) yields significant quantities of homogeneous CBFβ(141) protein (typically 20 mg from a 1-liter bacterial culture) as judged by SDS-PAGE as well as by isoelectric focusing gel electrophoresis. The conditions employed for the enterokinase cleavage of the Grx-CBFβ(141) fusion protein were optimized to obtain the highest yield of protein and utilize the minimum quantity of the somewhat costly enterokinase. Preliminary circular dichroism spectra of CBFβ(141) versus temperature showed that the protein begins to unfold above 30°C (data not shown), so the temperature employed for the cleavage reaction was lowered from 37 to 30°C, resulting in a substantial improvement in yield but requiring a longer reaction time. Optimization of the amount of enterokinase resulted in a final ratio of 90 units of enterokinase/A280 of fusion protein/ml, resulting in a total of 5000 units of enterokinase being employed for a typical purification from a 1-liter bacterial culture.
To confirm the N-terminal sequence and size of the resulting protein, both N-terminal sequencing and MALDI mass spectrometry were carried out on CBFβ(141). N-terminal sequencing could not identify the N-terminal two amino acids but confirmed the identity of the next three amino acids. MALDI mass spectrometry yielded a molecular mass of 16.74 kDa, in good agreement with the calculated value of 16.73 kDa for the 143-amino acid protein.
Overexpression and purification of CBFβ(187) were initially attempted utilizing the same construct used for the CBFβ(141) protein, namely Grx fused to CBFβ(187) with a linker containing an enterokinase site. Again, high levels of expression of the resulting fusion protein were obtained, and the Grx-CBFβ(187) protein could readily be purified utilizing the same two-step protocol employed for CBFβ(141). However, unlike the CBFβ(141) fusion protein, treatment of the CBFβ(187) fusion protein with enterokinase resulted in proteolytic degradation of CBFβ(187). To characterize the site of this cleavage, MALDI mass spectrometry of the cleaved protein was carried out. Assuming an intact N terminus based on the behavior of the CBFβ(141) protein in the enterokinase cleavage reaction, the observed molecular masses (17.4 and 18.6 kDa) indicated cleavage after residues 147 and 156 of CBFβ(187). The sequences of amino acids leading up through residues 147 and 156, respectively, are Phe-Glu-Glu-Ala-Arg and Glu-Phe-Glu-Asp-Arg, which apparently resemble the Asp-Asp-Asp-Asp-Lys recognition sequence for enterokinase closely enough that cleavage also occurs at these positions.
To overcome this secondary proteolysis, a Factor Xa cleavage site was inserted between the enterokinase cleavage site and the CBFβ(187) coding sequence. Again, high levels of fusion protein are obtained and can be purified via the same two-step protocol used previously. Cleavage with Factor Xa minimizes secondary cleavage of the protein (Fig. 3); however, there continues to be some secondary cleavage (Fig. 3, lane 4), indicative of a mobile protease-accessible region in the tail of the protein.
For this reason, cleavage reactions with Factor Xa cannot be allowed to go to completion, thus reducing the overall yield of CBFβ(187) obtained. Ion-exchange chromatography on Q-Sepharose is employed as for CBFβ(141) followed by size exclusion chromatography on Sephacryl S-100 to remove the residual degraded CBFβ(187) from the full-length protein. The resulting CBFβ(187) is homogeneous as judged by both SDS-PAGE (Fig. 3, lane 5) and isoelectric focusing gel electrophoresis.
... (1,15,16). To determine whether CBFβ(141) and the full-length CBFβ(187) have similar affinities for CBFα, we performed electrophoretic mobility shift assays to measure the Kd of both proteins for a CBFα·DNA complex. The source of CBFα used in these experiments consists of a 175-amino acid fragment containing the DNA binding Runt domain from the murine CBFα2 (AML1) protein, which was expressed and purified as described previously (28). The supershift of the Runt domain-DNA complex associated with binding of CBFβ has been employed to obtain equilibrium dissociation constants for the binding of CBFβ to a Runt domain-DNA complex (15). Fig. 4 shows gel shift data obtained from these measurements and the data employed for the calculation of the Kd values. Kd values of (4.5 ± 0.5) × 10⁻⁸ M and (1.8 ± 0.1) × 10⁻⁸ M (measured at 4°C) were obtained for the binding of CBFβ(141) and CBFβ(187) to a Runt domain-DNA complex, respectively. These values are very similar, demonstrating that virtually all the determinants necessary for binding to the Runt domain reside in the N-terminal 141 amino acids. This agrees well with the observed homology between mammalian CBFβ and its Drosophila homologs where there is a high degree of homology in the N-terminal 137 amino acids but very low homology in the C-terminal region of the proteins. The exact function of the C-terminal region is unknown at this time.
Sedimentation Equilibrium-To assess the solution oligomerization state of CBFβ(141) and CBFβ(187), sedimentation equilibrium measurements have been carried out. Both CBFβ(141) and CBFβ(187) sediment as single ideal species with apparent molecular weights of 16,600 ± 1200 (root mean square of the fit = 0.008 Å) and 25,700 ± 1000 (root mean square of the fit = 0.0011 Å), respectively. These values are in good agreement with the calculated molecular masses of the proteins (16.7 and 22.1 kDa); thus we conclude that both proteins are monomeric under these conditions. Because the two proteins do not aggregate to any observable extent up to the concentrations employed for the sedimentation equilibrium measurements, both proteins should be good candidates for structure determination by NMR spectroscopy.
CD Analysis-The far ultraviolet CD spectra for the two proteins have been analyzed to obtain estimates of the percentages of various secondary structures (Fig. 5) using the variable selection procedure of Johnson (34) and Manavalan and Johnson (35). Both CBFβ(141) and CBFβ(187) are mixed α/β proteins with CBFβ(141) having 21% α-helix (~30 residues) and 27% β-sheet (~39 residues) content, whereas CBFβ(187) displays 34% α-helix (~64 residues) and 21% β-sheet (~39 residues) content. The remainder of the residues in both proteins are partitioned between turn and random coil conformations with both proteins having a similar number of residues in random coil conformations, but CBFβ(187) having a larger number of residues in turn conformations. Based on the similar numbers of residues in β-sheet conformations, the similar biochemical activity, and the very similar NMR spectra of the two proteins (see below), it appears that the N-terminal 141 amino acids retain the same conformation in both CBFβ(141) and the full-length protein. Based on this, the increase in the number of α-helical residues (64 for CBFβ(187) versus 30 for CBFβ(141)) must correspond to residues in the C-terminal portion of CBFβ(187). This implies that this nonconserved C-terminal tail is almost completely α-helical.
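The residue counts quoted above follow from multiplying each fitted percentage by the chain length. The sketch below reproduces that arithmetic under two assumptions that are not stated explicitly in the text: that CBFβ(141) was counted as 143 residues (including the Gly-Ser left by the BamHI site) and CBFβ(187) as 187 residues. It then estimates what fraction of the 46 additional C-terminal residues the extra helical content would represent.

```python
def residues(percent, chain_length):
    """Convert a secondary-structure percentage into a residue count."""
    return round(percent / 100.0 * chain_length)


# Assumed chain lengths (see lead-in); percentages are from the CD fits.
LEN_141 = 143
LEN_187 = 187

helix_141 = residues(21, LEN_141)
sheet_141 = residues(27, LEN_141)
helix_187 = residues(34, LEN_187)
sheet_187 = residues(21, LEN_187)

# Extra helical residues attributed to the C-terminal tail (residues 142-187).
extra_helix = helix_187 - helix_141
tail_length = 187 - 141
tail_helix_fraction = extra_helix / tail_length

print(helix_141, sheet_141, helix_187, sheet_187)
print(extra_helix, tail_length, round(tail_helix_fraction, 2))
```

On these assumptions roughly three quarters of the tail would be helical. Given the uncertainty inherent in secondary structure estimates from CD, this is best read as an indication that the tail is predominantly helical rather than as a precise count.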
NMR Spectroscopy-¹⁵N-¹H gradient sensitivity-enhanced HSQC (36) spectra of ¹⁵N-labeled CBFβ(141) and CBFβ(187) have been recorded to assess the structural similarity of the two proteins and to assess their suitability for structure determination by NMR spectroscopy. Fig. 6 shows the ¹⁵N-¹H HSQC spectra of the two proteins. Both proteins could readily be concentrated to the ~1 mM concentration necessary for NMR spectroscopy and remained soluble for extended periods. Both proteins give well dispersed spectra characteristic of mixed α/β proteins. The spectra show a striking resemblance to one another in almost all regions, providing additional confirmation of the retention of the conformation of the N-terminal 141 amino acids in both proteins. In particular, the residues with downfield NH frequencies that generally correspond to amino acids in extended conformations in β-sheets (45) display virtual identity to one another.
To confirm that the two proteins remain monomeric at the high concentrations necessary for NMR spectroscopy, we have recorded ¹⁵N T1 and T2 relaxation data on both ¹⁵N-labeled samples. Assuming reasonably isotropic rotation of the protein, quite accurate estimates of the overall correlation time can be obtained from the ratio of the ¹⁵N T1 and T2 values (43). Fig. 7 shows the values of this ratio for 121 of the resolved NH cross-peaks observed in the HSQC spectrum of CBFβ(141) and 91 of the resolved NH cross-peaks observed in the HSQC spectrum of CBFβ(187). Based on these measurements, we obtain overall correlation times of 11.43 ± 0.03 ns for CBFβ(141) and 16.9 ± 0.4 ns for CBFβ(187). The value for CBFβ(141) is in good agreement with values obtained for a number of monomeric proteins of similar size (43,47-50); thus we conclude CBFβ(141) is also monomeric at the high concentrations employed for NMR spectroscopy. As the overall correlation time of an isotropically tumbling molecule should be proportional to its volume (51,52) and by inference its mass, the increase in the value observed for CBFβ(187) over CBFβ(141) is consistent with its increase in mass. We therefore conclude that CBFβ(187) also remains monomeric at these high concentrations.
DISCUSSION
The core binding factor proteins constitute a small family of transcription factors containing a DNA-binding α subunit and a non-DNA-binding β subunit (53). The Runt domain of CBFα is responsible for both DNA binding and heterodimerization with CBFβ. CBFβ binds to the Runt domain and is believed to induce a conformational change that results in an increase in the affinity of the Runt domain for DNA of approximately 6-fold (1,2). The three-dimensional structures of CBFβ and the Runt domain are unknown, and the primary sequences do not show any homology to other DNA-binding domains or protein-protein interaction domains.
Knockouts of the genes encoding CBFα2 and CBFβ in mice are embryonic lethal and lead to a complete blockage in definitive hematopoiesis (15-19), indicating that CBF plays a critical role in hematopoietic development. Mutations in the genes encoding CBFα2 and CBFβ are associated with a large number of leukemias (15,23). Biophysical characterization and structure determination of these proteins and their interactions with each other as well as with DNA should greatly facilitate the development of novel therapeutic strategies to treat the leukemias associated with the variant forms of these proteins.
We have developed a procedure for the high level overexpression and purification of full-length CBFβ, CBFβ(187), as well as a truncated form containing only the N-terminal 141 amino acids, CBFβ(141), using a novel protein fusion system employing E. coli glutaredoxin as the fusion partner. The use of the glutaredoxin protein as the fusion partner results in very high levels of expression of the proteins in a soluble form. Cleavage of the desired CBFβ product from the fusion protein has been effected via either an enterokinase or Factor Xa site in the fusion protein. Use of Factor Xa was necessary for the full-length protein because of the presence of two secondary sites of cleavage for enterokinase in the C-terminal portion of CBFβ(187). These sites have been identified by MALDI mass spectrometry of the cleaved products and do show a similarity to the Asp-Asp-Asp-Asp-Lys recognition sequence of enterokinase. The use of glutaredoxin fusions for overexpression may be generally applicable to many other proteins.
Figure legend (fragment): ... 1 mM); B, CBFβ(187) (0.6 mM). All spectra were recorded at 20°C in a buffer of 25 mM potassium phosphate (pH 6.5), 0.1 mM EDTA, 0.1% sodium azide, 5 mM DTT on a 500 MHz Varian UnityPlus NMR spectrometer.
We have utilized a number of different methods to characterize the biophysical properties of CBFβ(141) and CBFβ(187). Electrophoretic mobility shift assays have been utilized to measure the equilibrium binding constants for the binding of CBFβ(141) and CBFβ(187) to a Runt domain-DNA complex. We have shown that the binding affinities of the two proteins are very similar, confirming that the determinants for binding to the Runt domain reside in the N-terminal 141 amino acids. Sedimentation equilibrium and NMR relaxation measurements show both proteins to be monomeric species in solution and thus good candidates for structure determination by NMR spectroscopy. Circular dichroism spectroscopy shows that these proteins are mixed α/β proteins, with the additional C-terminal residues in CBFβ(187) being predominantly helical. This is quite interesting considering that the chromosomal inversion that disrupts the CBFB gene, inv(16)(p13;q22), results in the fusion of the N-terminal 165 amino acids of CBFβ to a smooth muscle myosin heavy chain protein encoded by the gene MYH11 (10,46). The portion of the smooth muscle myosin heavy chain fused to CBFβ corresponds to the α-helical coiled coil domain, resulting in a helical conformation for this portion of the fusion protein as well. NMR spectroscopy has been conducted on the two proteins to demonstrate their structural similarity and establish their suitability for structure determination by NMR spectroscopy. A structure determination by NMR spectroscopy of CBFβ(141) is currently being pursued.