Structures of zinc finger domains from transcription factor Sp1. Insights into sequence-specific protein-DNA recognition.

The carboxyl terminus of transcription factor Sp1 contains three contiguous Cys2-His2 zinc finger domains with the consensus sequence Cys-X2-4-Cys-X12-His-X3-His. We have used standard homonuclear two-dimensional NMR techniques to solve the solution structures of synthetic peptides corresponding to the last two zinc finger domains (Sp1f2 and Sp1f3, respectively) of Sp1. Our studies indicate a classical Cys2-His2 type fold for both the domains differing from each other primarily in the conformation of Cys-X2-Cys (β-type I turn) and Cys-X4-Cys (β-type II turn) elements. There are, however, no significant differences in the metal binding properties between the Cys-X4-Cys (Sp1f2) and Cys-X2-Cys (Sp1f3) subclasses of zinc fingers. The free solution structures of Sp1f2 and Sp1f3 are very similar to those of the analogous fingers of Zif268 bound to DNA. There is NMR spectral evidence suggesting that the Arg-Asp buttressing interaction observed in the Zif-268·DNA complex is also preserved in unbound Sp1f2 and Sp1f3. Modeling Sp1-DNA complex by overlaying the Sp1f2 and Sp1f3 structures on Zif268 fingers 1 and 2, respectively, predicts the role of key amino acid residues, the interference/protection data, and supports the model of Sp1-DNA interaction proposed earlier.

Synthesis of mRNA by RNA polymerase II requires the interaction of a large array of auxiliary transcription factors that recognize and bind to specific promoter DNA sequences located upstream of eukaryotic genes (1,2). These transcription factors regulate the initiation of transcription in a temporally ordered manner by assembling and engaging the active transcription complex. Consequently, many of the transcription factors have multiple domains responsible for sequence-specific DNA binding and transcriptional activation.
In order to understand the detailed roles played by each of the domains of sequence-specific transcription factors, efforts were made to fractionate the factors necessary to reconstitute transcriptional activity in vitro (3-5). These experiments resulted in the identification of one such promoter-specific transcription factor, Sp1, from HeLa cells (6-10). Sp1 enhances transcription from a variety of viral and cellular genes by binding to GC-rich decanucleotide recognition elements (GC boxes) within the 5′-flanking promoter sequences (10,11). Although Sp1 can bind and activate transcription from a single GC box sequence (12), Sp1 binding sites often occur as multiple repeats (6,9,10,13). However, Sp1 binds independently to each GC box sequence; physical interaction between adjacent Sp1 molecules is insufficient to give rise to cooperative DNA binding behavior (13). The multi-domain nature of Sp1 also facilitates Sp1-Sp1 interactions that occur in cases where Sp1 binding sites are widely separated (14-16). This self-association and DNA looping phenomenon are proposed to give rise to the observed transcriptional synergism or super-activation of Sp1 (14,17).
The construction of truncated Sp1 fragments allowed the localization of the Zn2+-dependent DNA binding region to the carboxyl terminus, which was shown by sequence analysis to contain three "zinc finger" domains (7,18-20). The Zn2+ domains found in Sp1 are analogous to those first identified in TFIIIA (21) and adopt the consensus sequence (F/Y/H)-X-C-X2-4-C-X3-F-X5-L-X2-H-X3-H-X5, in which the Cys and His residues are the metal binding residues (22,23). These domains are distinct from the Cys-rich motifs found in the steroid receptors (24), the yeast transcription factor GAL4 (25), or the Cys2-His-Cys motif observed in retroviral proteins (26). Structural modeling (27) and structural studies (28-35) show that Cys2-His2 domains contain two β-strands with the Cys residues located at the β-turn, and an α-helix containing the two His residues, oriented to coordinate Zn2+ in a tetrahedral fashion. This structural unit is now regarded as one of the major structural motifs involved in sequence-specific DNA binding and eukaryotic gene regulation.
Our general goal is to understand at the molecular level how the three zinc finger domains of Sp1 can bind with high affinity to a variety of GC box DNA sequences (10,11). We have previously described the overexpression, purification, and characterization of a 92-amino acid peptide, Sp1-Zn92, that contains the three zinc fingers of Sp1 (36). The DNA binding properties of Sp1-Zn92 were surveyed using a variety of techniques based on gel electrophoresis to quantitatively analyze its interaction with several native and modified DNA sites. Sp1-Zn92 was shown to mimic the DNA binding properties of native Sp1 and, through comparisons with results from other zinc finger systems, a model was developed to explain the distinctive DNA binding properties of Sp1. Our model serves as the starting point for detailed studies of the conformations of the individual domains of Sp1-Zn92 aimed at defining those molecular features that allow Sp1 to recognize promoter sequences that contain the asymmetric GGGCGG hexanucleotide core (GC box) with a consensus sequence of 5′-(G/T)GGGCG-
The distinguishing feature of Sp1-DNA binding is the high degree of sequence variability that is tolerated within the GC box with retention of high binding affinity (10,11,37). This raises interesting questions regarding the detailed molecular mechanism of the Sp1-DNA recognition process and the role that the individual fingers may play as "flexible-independent" reading domains in modulating binding to nonidentical DNA sites with nearly equal affinity. As a first step toward studying Sp1-DNA interactions, we have determined the solution structures of synthetic peptides corresponding to zinc finger domains two and three (N terminus to C terminus) of Sp1 using standard homonuclear NMR techniques. While zinc finger 3 belongs to the well defined Cys-X2-Cys structural subclass, zinc finger 2 is a member of the Cys-X4-Cys structural subclass, which has been defined for relatively few systems (32,33,35).
The refined solution structures of zinc fingers 2 and 3 are compared with each other and with other reported zinc finger structures. Putative DNA binding residues are identified, and the individual roles of conserved residues are analyzed in the context of our previous model (36) of Sp1-DNA interactions.

EXPERIMENTAL PROCEDURES
Peptide Synthesis-Peptides corresponding to zinc finger domains 2 and 3 of Sp1 (Sp1f2 and Sp1f3, respectively; sequences shown in Fig. 3) were prepared at the Peptide Synthesis Facility, Keck Foundation Biotechnology Resource Laboratory, Yale University, using solid phase N-tert-butyloxycarbonyl chemistry. Amino acids were coupled by activated esters, and final deprotection/cleavage was done using hydrogen fluoride. Purification of the peptides was performed using Vydac C18 reverse phase columns and a linear gradient from 0.05% trifluoroacetic acid to 80% acetonitrile, 0.05% trifluoroacetic acid. The final product was lyophilized and characterized by analytical reverse phase high performance liquid chromatography, amino acid analysis, and laser desorption mass spectrometry. Peptide samples for all studies were stored under an argon atmosphere.
Metal Binding Studies-The affinities of peptides Sp1f2 and Sp1f3 for Co2+ ion were determined using absorption spectroscopy (Perkin-Elmer Lambda 6 UV/VIS spectrophotometer) as a function of CoCl2 (99.999%, Aldrich) concentration in 50 mM HEPES, 50 mM NaCl, pH 8.0. All buffers were degassed using several freeze-pump-thaw cycles before use. The Co2+ dissociation constants were calculated using Equation 1 (38), where l0 = cobalt concentration in the buffer, α = Δ/Δmax (Δ = change in absorbance at 640 nm upon addition of cobalt and Δmax = maximum change in absorbance), KD = dissociation constant of the cobalt-peptide complex, and e0 = total concentration of the Co2+ binding site (peptide concentration).
Circular Dichroism of Sp1f2 and Sp1f3-CD spectra of Sp1f2 and Sp1f3 (100 μg/ml peptide, 5 mM Tris·HCl, pH 8.0, 8°C) were measured using an Aviv model 60DS spectropolarimeter. Spectra were recorded from 300 to 190 nm and averaged over five scans with a bandwidth of 1.50 nm, a scan step of 1.00 nm/point, and an averaging time of 10.0 s.
NMR Sample Preparation-Sp1f2 and Sp1f3 were dissolved in 0.5 ml of 25 mM Tris-d11 (Cambridge Isotope Laboratories), pH 7.5, containing 0.2% sodium azide (w/v) and 10% D2O (v/v), followed by the addition of ZnSO4 (20% molar excess). The final pH of the sample was adjusted to 5.90 (meter reading, uncorrected for isotope effect). The peptide concentrations were approximately 5 mM for each sample. All solutions were degassed by three freeze-pump-thaw cycles prior to protein dissolution, and all manipulations were carried out under an argon atmosphere. Samples were stored under an argon atmosphere and showed no degradation over the length of the NMR experiments.
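The form of Equation 1 is not reproduced in this text. As a hedged illustration of how the quantities defined under Metal Binding Studies relate, the sketch below assumes simple 1:1 mass-action binding, for which KD = (l0 - α·e0)(1 - α)/α. The function names and the example concentrations are hypothetical and are not taken from the paper.

```python
import math


def fraction_bound(l0, e0, kd):
    """Fraction of occupied sites (alpha) for 1:1 binding.

    l0: total Co2+ concentration, e0: total binding-site concentration,
    kd: dissociation constant (all in the same units, e.g. M).
    Assumes simple mass-action binding; this is an illustrative
    assumption, not necessarily the exact form of Equation 1.
    """
    s = l0 + e0 + kd
    # Smaller root of e0*alpha**2 - s*alpha + l0 = 0 (the physical one).
    return (s - math.sqrt(s * s - 4.0 * l0 * e0)) / (2.0 * e0)


def kd_from_point(l0, e0, alpha):
    """Invert the same relation: KD from a single titration point."""
    free_metal = l0 - alpha * e0
    return free_metal * (1.0 - alpha) / alpha


# Hypothetical titration point: 50 uM peptide, 50 uM total Co2+.
alpha = fraction_bound(50e-6, 50e-6, 1.2e-6)
```

In practice one would fit KD to the whole titration curve rather than invert a single point; the single-point inversion is shown only to make the relation between l0, e0, α, and KD explicit.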
NMR Methods-NMR spectra were acquired using either a Bruker AM500 or a GE Omega500 NMR spectrometer. Two-dimensional NOESY spectra were acquired with selective water presaturation (delays alternating with nutations for tailored excitation pulse (39), AM500; Shinnar-LeRoux pulse (40), Omega500) followed by the standard NOESY pulse train (41). An inversion pulse bracketed by homospoil pulses was used during the mixing time to minimize artifacts from the residual water resonance. Double quantum filtered correlation spectroscopy spectra were acquired with optimized phase cycling (42). Clean TOCSY spectra (43) were acquired using water saturation as given above, with MLEV17 (44) (AM500, Omega500) or decoupling in the presence of scalar interactions-2 (45) (Omega500) mixing schemes, followed by flipback and homospoil pulses for elimination of the rotating frame Overhauser effect and for water suppression. Quadrature detection in the indirect dimension was obtained using either time-proportional phase incrementation (AM500) or States-time-proportional phase incrementation (Omega500) (46,47). Spectra were typically acquired with 32 or 48 scans per t1 value for 1024 t1 values; the spectral width was typically 6000 Hz, and 2048 complex points were collected in the direct dimension. The free induction decays in both dimensions were multiplied by a phase-shifted sine bell apodization function, zero-filled, and Fourier-transformed to yield 2048 by 2048 matrices. All spectra were processed using the FELIX 2.30 software package (Biosym, Inc.).
Structure Calculations-The hybrid distance geometry dynamical simulated annealing protocol within the X-PLOR software package (48) was used for structure calculations. Interproton distances were calculated from cross-peak volumes derived from two-dimensional NOESY spectra recorded with a 200-ms mixing time, using a NH-NH (i, i + 1) distance of 2.8 Å as an internal standard. Upper and lower bounds were set equal to ±6% of the square of the distance calculated. For nonstereospecific assignments, distance constraints were applied to a pseudo atom situated at the geometric center of the nuclei, and the distance bounds were appropriately expanded, based on the known amino acid geometries. The experimental constraints were represented in the form of an asymmetric internuclear pseudo energy, having a minimum at the distance constraint, an infinite harmonic wall to the lower bound side, and a harmonic function making a transition to a zero slope asymptote to the upper bound side. The soft square potential function used in these calculations had a maximum potential of 50 kcal/mol, a soft square exponent of 2, and a scaling factor of 25.
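The distance calibration described above can be sketched as follows. The sketch assumes the usual isolated spin pair approximation, in which cross-peak volume scales as r^-6; the text states only that volumes were converted to distances against the 2.8 Å NH-NH reference, so the r^-6 scaling and the function names here are assumptions for illustration. The bounds follow the stated rule of ±6% of the squared distance.

```python
def noe_distance(volume, ref_volume, ref_distance=2.8):
    """Interproton distance (in Å) from a NOESY cross-peak volume.

    Assumes volume is proportional to r**-6 (isolated spin pair
    approximation) and calibrates against a reference peak of known
    distance, here the NH-NH (i, i + 1) distance of 2.8 Å.
    """
    if volume <= 0 or ref_volume <= 0:
        raise ValueError("cross-peak volumes must be positive")
    return ref_distance * (ref_volume / volume) ** (1.0 / 6.0)


def distance_bounds(r, fraction=0.06):
    """Lower and upper bounds set to r -/+ fraction * r**2."""
    delta = fraction * r * r
    return r - delta, r + delta


# A peak 64 times weaker than the reference corresponds to twice the distance.
r = noe_distance(volume=1.0, ref_volume=64.0)
lower, upper = distance_bounds(r)
```

Because the tolerance grows with the square of the distance, weak long-range NOEs receive proportionally looser bounds than strong short-range ones.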
3JHNHα coupling constants for Sp1f2 were determined from a double quantum filtered correlation spectroscopy spectrum using absorptive and dispersive antiphase splittings (49). The coupling constants were converted to torsional angle constraints using the Karplus relationship, and these restraints were used during the refinement step of the Sp1f2 structure calculation protocol.
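As an illustration of the Karplus relationship mentioned above, the sketch below evaluates 3JHNHα as a function of the backbone φ angle. The text does not say which parameterization was used; the coefficients here (A = 6.4, B = -1.4, C = 1.9 Hz, with θ = φ - 60°) are one widely used set and are an assumption, as is the function name.

```python
import math


def karplus_3j_hnha(phi_deg, a=6.4, b=-1.4, c=1.9):
    """Predicted 3J(HN,Halpha) in Hz for a backbone phi angle in degrees.

    Uses J = A*cos(theta)**2 + B*cos(theta) + C with theta = phi - 60.
    The default coefficients are an assumed, commonly used
    parameterization, not necessarily the one used in the paper.
    """
    theta = math.radians(phi_deg - 60.0)
    cos_theta = math.cos(theta)
    return a * cos_theta ** 2 + b * cos_theta + c


# Helical phi (about -60 degrees) gives a small coupling,
# extended phi (about -120 degrees) a large one.
j_helix = karplus_3j_hnha(-60.0)
j_strand = karplus_3j_hnha(-120.0)
```

Because the curve is multivalued in φ, a measured coupling is normally translated into an allowed φ range rather than a single angle, which is why such data enter the calculation as torsional restraints.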
The structure calculations for Sp1f2 and Sp1f3 can be divided into three steps. In the first step, a set of substructures containing the backbone, Cβ, and Cγ atoms was embedded in Cartesian coordinate space using the distance geometry protocol. Next, the remaining atoms were added in an extended conformation and subjected to multiple rounds of simulated annealing. Finally, the distance geometry simulated annealing regularized structures were subjected to multiple rounds of simulated annealing refinement. All calculations were performed using an SGI R4000 workstation.
Zinc Coordination-Initial structure calculations were performed without incorporating the Zn2+ atom; inspection of these structures clearly identified the Cys2-His2 Zn2+ binding ligands. Furthermore, analysis of the data allowed the unambiguous assignment of the Nε atoms of the His residues as the heteroatoms coordinating the metal. Thereafter, Zn2+ was incorporated in structure calculations with an approximately tetrahedral geometry. Zinc-ligand bonds were assigned equilibrium distances of 2.30 Å [Zn-S] and 2.00 Å [Zn-N] (50) using artificial NOE constraints with a high weighting factor (300). The angles centered on the metal were constrained with tetrahedral equilibrium (harmonic potential). The Nε atoms of the His ligands were constrained to lie in the plane defined by the Cδ2, Cε1, and Zn2+ atoms.
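A simple way to check how close a calculated metal site is to the tetrahedral geometry described above is to list the six ligand-metal-ligand angles and compare them with the ideal value of about 109.5°. The sketch below is a generic geometry check with hypothetical function names; it is not part of the X-PLOR protocol used in the paper.

```python
import math
from itertools import combinations

# Ideal tetrahedral angle, arccos(-1/3), about 109.47 degrees.
TETRAHEDRAL = math.degrees(math.acos(-1.0 / 3.0))


def angle_deg(center, p, q):
    """Angle p-center-q in degrees for three (x, y, z) points."""
    v = [a - c for a, c in zip(p, center)]
    w = [a - c for a, c in zip(q, center)]
    dot = sum(x * y for x, y in zip(v, w))
    norm = math.sqrt(sum(x * x for x in v)) * math.sqrt(sum(y * y for y in w))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def ligand_angles(zinc, ligands):
    """All ligand-Zn-ligand angles (six for four ligands)."""
    return [angle_deg(zinc, p, q) for p, q in combinations(ligands, 2)]


def max_tetrahedral_deviation(zinc, ligands):
    """Largest absolute deviation from the ideal tetrahedral angle."""
    return max(abs(a - TETRAHEDRAL) for a in ligand_angles(zinc, ligands))
```

For a real structure the ligand coordinates would be the two Cys Sγ and two His Nε atoms; a small maximum deviation indicates a near-tetrahedral site.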
Final structures were subjected to additional rounds of energy minimization, once after removing any lower bound on distance constraints and once without any explicit metal geometry constraints. Energy minimization without lower bounds had negligible effect on the average geometry of the calculated structures and did not increase the RMSD within the final structures for both Sp1f2 and Sp1f3. The energy minimization without metal binding constraints preserved the configuration and geometry of the Cys and His ligands around the metal binding site, showing that metal-ligand constraints were consistent with the global energy minimum for the Sp1f2 and Sp1f3 structures.
The average coordinates and the RMSD values were calculated within X-PLOR, and the family of structures was visualized and overlaid using the software package Midas Plus Version 1.9. The structures with averaged coordinates (Sp1f2·avg, Sp1f3·avg) were subjected to a final round of simulated annealing refinement (refine.inp protocol of X-PLOR) to relieve bad contacts and irregular covalent geometry which might have arisen due to geometrical averaging. These average-refined structures (Sp1f2·avg, Sp1f3·avg) were used to identify potential hydrogen bonds and to model Sp1f2-DNA and Sp1f3-DNA interactions. The criteria for hydrogen bonds are that the distance between N of NH and O of CO (NH···O) be less than 3.4 Å and the angle N-O-C be larger than 110° (51).
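The hydrogen bond criteria stated above (N···O distance below 3.4 Å and N-O-C angle above 110°) translate directly into a geometric test. The sketch below is a minimal illustration with a hypothetical function name; coordinates are plain (x, y, z) tuples in Å.

```python
import math


def is_backbone_hbond(n, o, c, max_dist=3.4, min_angle=110.0):
    """Apply the two stated criteria to one NH...O=C pair.

    n: amide nitrogen, o: carbonyl oxygen, c: carbonyl carbon.
    Returns True when the N-O distance is below max_dist (Å) and the
    N-O-C angle (measured at the oxygen) is above min_angle (degrees).
    """
    dist = math.dist(n, o)
    if dist >= max_dist:
        return False
    on = [a - b for a, b in zip(n, o)]  # vector O -> N
    oc = [a - b for a, b in zip(c, o)]  # vector O -> C
    dot = sum(x * y for x, y in zip(on, oc))
    cosine = dot / (dist * math.dist(c, o))
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cosine))))
    return angle > min_angle
```

Applying such a test to each member of a structure family gives the fraction of conformers in which a given hydrogen bond is present.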
Optical titration experiments allowed the determination of Co2+ dissociation constants (KD(Co)) for Sp1f2 (KD(Co, Sp1f2) = 1.2 × 10^-6) and Sp1f3 (KD(Co, Sp1f3) = 2.1 × 10^-6). These KD values are consistent with values previously reported for zinc finger domains (62) and indicate a folded conformation for Sp1f2 and Sp1f3 in solution.
The positions and intensities of the d-d transitions are consistent with the formation of 1:1 Co2+-peptide complexes with the metal centers occupying tetrahedral or distorted tetrahedral environments for both Sp1f2 and Sp1f3 (54,56,57). The observed ligand field bands are analogous to those reported for Co2+ binding to the His2-Cys2 site in the second zinc finger domain of TFIIIA (56) and the gene 32 protein (58,59). However, the definitive identification of coordination geometry based on UV/VIS spectroscopy is difficult owing to the similarity of the electronic spectra of distorted tetrahedral and five-coordinate Co2+ complexes, especially when only band positions are considered (53,60,61).
The circular dichroism spectra of both peptides show negative ellipticities at 228 nm, large negative molar ellipticities at 208 nm, and positive ellipticities at 190 nm (data not shown) in the presence of Zn2+. These features are consistent with the presence of regular secondary structure elements (63) and further indicate that the peptides adopt a conformation typical of folded zinc finger domains.
NMR Sequential Assignments and Secondary Structure Determination-Standard procedures that utilize two-dimensional COSY, TOCSY, and NOESY NMR data (64) were used to determine sequential resonance assignments (Figs. 2, A and B). The short and medium range connectivity patterns and the 3JHNHα coupling constants, summarized in Fig. 3, are consistent with an α-helical stretch for residues 17-28 (Sp1f2) and residues 16-27 (Sp1f3). Furthermore, the added presence of i, i + 2 connectivities indicates a 3₁₀ helical conformation for the last helical turn of both zinc finger domains. This transition from an α-helix to a 3₁₀ helix is a general property shared by the His-X3-His subclass of zinc fingers (structures with His-X4-5-His spacing show no indication of a 3₁₀ helix) (29). In both Sp1f2 and Sp1f3, the helix terminates with the Gly residue two residues after the second metal binding His. While the COOH-terminal portion of both peptides produced NOE patterns characteristic of a classical α-helix, no long uninterrupted connectivity patterns characteristic of a classical β-strand were observed in either peptide. However, short segments preceding the first Cys residue and closely succeeding the second Cys residue gave strong CαH(i)-NH(i + 1) connectivities characteristic of an extended strand conformation. The observed NOE connectivities (Fig. 3) are also consistent with a turn among residues Trp7-Cys10 (Sp1f2) and residues Cys5-Cys8 (Sp1f3) connecting the two extended strands in each zinc finger domain.
Three-dimensional Structures-The backbone conformations of the Sp1f2 and Sp1f3 families are well defined by the NMR data, as shown in Fig. 4, A and B, and indicated by the average RMSDs for the peptides (Table I). As expected, the NH3+- and COO−-terminal residues are poorly defined. This is reflected by the low RMSD of 0.43 Å for the backbone from the first conserved hydrophobic residue (Phe) through the residue immediately succeeding the second metal coordinating His residue.

DISCUSSION
The calculated structures exhibit secondary structure elements consistent with the short range NOE connectivity patterns. The overall topology of both peptides conforms to the expected fold of Cys2-His2 zinc finger domains consisting of two antiparallel strands linked by a Cys-Cys loop followed by a reverse 90° turn and an α-helix containing the two zinc coordinating His residues.
Observed long range NOEs define the relative orientation of the secondary structure elements. For Sp1f2 the NOE constraints are consistent with an antiparallel orientation for the two β-strands extending from the turn encompassing Trp7-Cys10. Ninety percent of Sp1f2 structures are consistent with hydrogen bonds between Phe3 (CO) and Arg14 (NH) (distance NH···O = 2.9 Å, angle N-O-C = 163°) and between Cys5 (NH) and Lys12 (CO) (distance NH···O = 3.1 Å, angle N-O-C = 160°), stabilizing the antiparallel β-sheet. The remainder of the amide and carbonyl groups in this sheet region are oriented as if interacting with the solvent. Long range connectivities in Sp1f3 also define an antiparallel orientation for its two β-strands. The antiparallel β-sheet of Sp1f3, however, seems to be much more open with no backbone-backbone hydrogen bonds evident in the average refined structure, as observed previously in the case of ADR1 and human enhancer proteins (30,34).
Cys-Cys Loop-The results of NMR spectroscopy and Co2+ titration studies show that the increased size of the Cys-X4-Cys chelate of Sp1f2 does not dramatically alter either the local geometry or the metal affinity, relative to the more predominant Cys-X2-Cys subclass. There are, however, some differences in the turn conformations of the two zinc finger domains despite the fact that both turns require the two Cys residues to be positioned for tetrahedral metal binding. The Sp1f2 turn contains the residues Trp7 (i)-Ser8 (i + 1)-Tyr9 (i + 2)-Cys10 (i + 3), and the first metal coordinating Cys is completely excluded from the reverse turn (Fig. 5A). The turn is best classified as a β-type II structural element based on the dihedral angle values observed for the average, refined structure (Table II). Hydrogen bonds are observed between Trp7 (CO) and Cys10 (NH) (distance NH···O = 3.73 Å, angle N-O-C = 109°) in 50% of the structures. The average refined structure, Sp1f2·avg·min, also shows the Sγ of Cys10 within H-bonding distance of the backbone NH of Lys12 (distance NH···S = 3.4 Å, angle N-H-S = 124°). The turn conformation is further stabilized by the stacking of the Trp7 ring against the His27 imidazole ring (Fig. 6A). The orientation of the Trp indole ring with respect to His27 is borne out by the dramatic upfield shift for His27 β-protons due to ring current effects (Fig. 2A).
The Sp1f3 turn includes residues Cys5 (i)-Pro6 (i + 1)-Glu7 (i + 2)-Cys8 (i + 3) and is best classified as a β-type I structural element, based on dihedral angle values (65) (Table II). The Sp1f3 β-turn shows the expected hydrogen bond between CO of residue i (Cys5) and NH of residue i + 3 (Cys8) in all calculated structures (distance NH···O = 3.2 Å, angle N-O-C = 120°). In addition, the structures place the Sγ of Cys5 within hydrogen bonding distance of the backbone NH groups of Glu7 (distance NH···S = 3.6 Å, angle N-H-S = 136°) and Cys8 (distance NH···S = 3.6 Å, angle N-H-S = 157°), thus further stabilizing the turn conformation (Fig. 5B). These NH···S hydrogen bond geometries are similar to those observed in ferredoxin and rubredoxin (66). Also, the putative NH···S hydrogen bonds in Sp1f3 correspond to the and bonds in the context of the SPXX motif (51), except that the Ser residue O atom is replaced by a Cys5 S atom. The ψ angle for the i + 2 Glu7 in Sp1f3 differs significantly from the expected value of 0° (Table II) (67), perhaps due to the fact that Glu7 is involved in a long range hydrophobic interaction with His23.
While it is not common for NH···S hydrogen bonds to occur in Cys residues involved in disulfide bridges, numerous NH···S bonds, as observed for Sp1f3, are a common feature in proteins coordinating metal ions (66). These bonds are hypothesized to play an important role in stabilizing the ligand arrangement required for metal coordination, thereby minimizing the entropy change caused by metal coordination to the apo protein.
Hydrophobic Core-Packing of the β-sheet and α-helix of zinc finger domains against each other forms a hydrophobic core and places the conserved Cys and His residues toward the interior of the domain in a position to coordinate a Zn2+ ion. The experimental NOE constraints unambiguously determine the absolute chirality around the zinc ion as S, following earlier convention (27). Several residues (Phe14 for Sp1f2; Phe12, Lys10, Glu7, and Ile22 for Sp1f3) serve to shield the zinc ion from solvent and may therefore stabilize the metal-ligand interaction by precluding close approach of alternative donor ligands. The occurrence of such hydrophobic shells surrounding metal binding sites is well known and is believed to play a key role not only by minimizing the change in conformational entropy upon metal binding through preordering of the primary coordination sphere but also by precluding alternative modes of metal binding through the reduction of heteroatoms in the vicinity of the primary coordination sphere.
In addition to the expected packing interactions between aromatic and other hydrophobic side chains (Figs. 6, A and B and 7, A and B), alkyl methylene groups of certain long chain polar residues seem to be involved in hydrophobic interactions as well. For instance, in Sp1f2, the alkyl chain of Lys12 packs against the central Phe14 and His23, whereas the polar amine is oriented toward the solvent. Similarly, methylene groups of Glu19 and Lys24 (not visible) pack against Phe14 and His23, respectively, with their charged groups pointing outwards (Figs. 6A and 7A). A similar arrangement is seen for the side chain of Lys10 in Sp1f3 (Figs. 6B and 7B). Thus, despite their relatively small size, the zinc finger domains achieve a relatively high degree of packing and are stable as mini-globular domains in the presence of zinc and other divalent metal ions (Fig. 7, A and B).
Sp1-DNA Interactions-In the Zif268·DNA complex crystal structure the zinc fingers bind DNA by docking their α-helices in the major groove such that each zinc finger makes contact with the G-rich strand of the appropriate cognate 3-base pair subsite using residues at positions −1, +3, or +6 relative to the start of the α-helix (32). Fingers 1 and 3 of Zif268 use Arg residues −1 and +6 to contact the underlined residues of the GCG subsite, and finger 2 uses residues Arg (−1) and His (+3) to contact the underlined bases of the GGG subsite. An analysis of the Sp1 sequence, considering the residues analogous to the ones involved in DNA binding in the Zif268 structure, reveals striking similarities between the zinc fingers of Sp1 and Zif268 (36). In light of these similarities, we proposed a model for interaction of Sp1 with DNA based on the Zif268/DNA cocrystal structure which envisaged Sp1f2, by analogy to Zif268 finger 1, using Arg (−1) (residue 16) and Arg (+6) (residue 22) for GCG recognition and Sp1f3, by analogy to Zif268 finger 2, using Arg (−1) (residue 14) and His (+3) (residue 17) for GGG recognition (Fig. 8) (36,68).

Fig. 4. A, stereoview of the 20 energy minimized conformers used to represent the solution structure of Sp1f2 (last residue not shown). The structures were overlaid to obtain the best match for the backbone for residues 3-27. All backbone heavy atoms including side chain heavy atoms of metal coordinating Cys and His residues are shown. B, stereoview of the 20 energy minimized conformers used to represent the solution structure of Sp1f3 (last residue not shown). The structures were overlaid to obtain the best match for the backbone for residues 3-25. All backbone atoms including side chain heavy atoms of metal coordinating Cys and His residues are shown.
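The helix-relative numbering used in the Sp1-DNA Interactions paragraph (positions −1, +3, +6) can be mapped onto peptide residue numbers with a small helper. The sketch below is illustrative only: the function name is hypothetical, and the anchor residues for position +1 (17 for Sp1f2, 15 for Sp1f3) are inferred from the position/residue pairs quoted in the text, with no position 0 in this convention.

```python
# Residue occupying helix position +1 in each peptide. These anchors are
# inferred from pairs quoted in the text, e.g. Arg(-1) = residue 16 and
# Arg(+6) = residue 22 in Sp1f2; Arg(-1) = 14 and His(+3) = 17 in Sp1f3.
PLUS_ONE_RESIDUE = {"Sp1f2": 17, "Sp1f3": 15}


def helix_position_to_residue(peptide, position):
    """Convert a helix-relative position to a peptide residue number.

    The convention has no position 0: -1 immediately precedes +1.
    """
    if position == 0:
        raise ValueError("helix-relative numbering has no position 0")
    anchor = PLUS_ONE_RESIDUE[peptide]
    if position > 0:
        return anchor + position - 1
    return anchor + position


# Proposed base-contacting positions for each finger.
sp1f2_contacts = [helix_position_to_residue("Sp1f2", p) for p in (-1, 6)]
sp1f3_contacts = [helix_position_to_residue("Sp1f3", p) for p in (-1, 3)]
```

With these anchors the helper reproduces the residue numbers given in the text for the proposed contacts.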
To test the validity of the above-mentioned model and to gain further insight into the role of individual amino acid residues, we superimposed the average refined structures of Sp1f2 and Sp1f3 on Zif268 fingers 1 and 2 in the Zif268-DNA crystal structure (69). As required by the model, the structures of Sp1f2 and Sp1f3 were found to be very similar to Zif268 fingers 1 and 2, respectively (Fig. 9, A and B). The backbone RMSD value for residues 3-25 of Sp1f3 (excluding the residue immediately succeeding the second metal binding Cys) and the corresponding 22 residues of Zif268 finger 2 is 0.90 Å. The backbones deviate significantly at the residue succeeding the second Cys, due to the difference in φ angles at this position. Sp1f3 contains a Pro residue at this site that has its φ angle restricted to −60°, whereas most zinc fingers, including Zif268 finger 2, exhibit a positive φ angle for the residue immediately succeeding the second Cys location (69). The best overlay of Sp1f2 and Zif-268 finger 1 from the first conserved hydrophobic residue to the second His residue (excluding the four residues in the Cys-Cys loop) gives an RMSD value of only 0.68 Å. The residues of the Cys-X4-Cys loop were excluded from the RMSD calculation since these residues were stated to be ill-defined in the crystal structure (32).
Overlay of Sp1f2 with Zif-268 finger 1 in the co-crystal structure almost exactly overlaps the backbone atoms of their respective α-helices, and even though the side chains of residues −1 (Arg16) and +6 (Arg22) of Sp1f2 are not constrained in the NMR structures, their backbone α carbons are positioned such that the side chains start out pointing toward the major groove of DNA (Fig. 9A) in a manner that is consistent with their proposed interaction with DNA bases, which would be the underlined guanines of the GCG subsite if we assume that Sp1f2 docks in the major groove in an orientation similar to that observed for Zif268 finger 1 in the crystal structure.³ This mode of interaction is also consistent with the protection/interference data (70) and the complete conservation of the underlined guanines of the subsite GCG in all Sp1 sites identified to date (10). In contrast to Arg (−1) and Arg (+6), Glu (+3) (residue 19) of Sp1f2, whose side chain is relatively well defined in the NMR structures (average RMSD of side chain carbon atoms = 1.07 Å), does not point toward the major groove but instead packs its methylene β-protons against the central Phe in a manner similar to the Zif268 Glu (+3) residue (Figs. 6A and 7A), suggesting that it may not interact directly with DNA. In fact, when all 20 structures of Sp1f2 were overlaid on Zif268 finger 1, as described before, the Glu (+3) side chain did not come within interacting distance of any base of DNA in 19 of the 20 models generated. This is consistent with the reported absence of methylation protection of the middle C position of the 3-base pair GCG subsite. Gln (+5) (residue 21) is another hydrophilic residue in the α-helix that is relatively well defined in the NMR structures (average RMSD of side chain carbon atoms = 1.08 Å) and does not point toward the major groove. This residue, instead, folds back along the backbone (Figs. 6A and 7A) and appears to place its side chain amide proton within hydrogen bonding distance of the Ser17 side chain Oγ and carbonyl O atoms, although it is difficult to clearly identify the interacting partner. Thus we see that the model generated by the overlay and the side chain packing arrangement is consistent with Arg16 and Arg22 interacting with DNA bases but tends to preclude the possibility of the other two polar residues on the α-helix (Glu19 and Gln21) being involved in direct base recognition.
Similar overlay of Sp1f3 with Zif268 finger 2 (Fig. 9B) positions Arg (−1) (residue 14) and His (+3) (residue 17) pointing toward the major groove, consistent with proposed DNA con-

³ Overlay of the Sp1f2·avg structure on Zif268 finger 1 places the terminal NH protons of the relatively well defined Lys12 (average RMSD of side chain carbons until Cδ = 0.75 Å) within hydrogen bonding distance of the 5′-phosphate of base pair 7, analogous to the Zif268 structure (Figs. 6A and 9A). Additionally, overlays of both Sp1f2 and Sp1f3 with Zif fingers 1 and 2, respectively, in the Zif268-DNA complex place the Nε proton of the first metal coordinating His within hydrogen bonding distance of the 5′-phosphate of base pairs 4 and 7, respectively, as observed for the corresponding positions in the Zif268 complex (Fig. 9, A and B). The preservation of these contacts with the DNA backbone (which have been proposed to serve to precisely position the α-helix with respect to their DNA sites in the major groove (32)) upon superpositioning of the structures suggests that the orientation and docking mode of Sp1f2 and Sp1f3 in the major groove of DNA must be very similar to that observed for the analogous Zif268 zinc fingers. (71).
Arg-Asp Interaction-The oxygens of the carboxylate group of Asp (+2) were found to be in a hydrogen bond-salt bridge interaction with the Nε group of Arg (−1) in all three zinc fingers of the Zif268 crystal structure. There is NMR spectral evidence suggesting that a similar interaction exists between the analogous side chains Arg16-Asp18 of Sp1f2 and Arg14-Asp16 of Sp1f3 even in the absence of DNA (69). In Sp1f3 the NεH proton of Arg14 gives intense TOCSY cross-peaks with neighboring protons in the side chain and is shifted downfield to 8.02 ppm from the random-coil value of 7.20 ppm (Fig. 2B), suggesting that the Arg14 NεH proton is protected from exchange and probably involved in a hydrogen bond. Arg14 is also the only long hydrophilic side chain to show a large chemical shift difference between the two diastereotopic methylene protons of the terminal CH2 group (Δppm (CδH2) = 0.50 at 5°C) (Fig. 2B). This large chemical shift difference indicates a well defined solution conformation for the Arg14 side chain. Furthermore, inspection of the NMR spectra revealed a moderately strong NOE between the CγH proton of Arg14 and the CβH proton of Asp16 (this NOE was not included in the structure calculations). These observations taken together strongly suggest that the Arg14 NεH proton is involved in a hydrogen bond-like interaction, in all probability with the side chain of Asp16, even though this interaction is not apparent in all the individual structures due to lack of NOE constraints. The Asp18 of Sp1f2 is also capable of having a similar interaction with Arg16. Again the terminal NεH of Arg16 (Δppm (CδH2) = 0.06 at 15°C) (Fig. 2A) gives an intense and considerably downfield shifted resonance in the NMR spectra, indicating that the Arg-Asp interaction also exists in Sp1f2 free in solution, unbound to DNA. This Arg-Asp interaction is presumed to stabilize the long side chain of Arg and enhance the specificity of the arginine-guanine interaction.
The presence of this interaction in Sp1f 2 and Sp1f 3 further implicates Arg (Ϫ1) of both domains in DNA binding.
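The downfield shift quoted for the Arg14 NεH proton is the difference between the observed and random-coil chemical shifts. The short Python sketch below reproduces that arithmetic from the two values given in the text; the function name is illustrative only.

```python
def shift_deviation(observed_ppm, random_coil_ppm):
    """Deviation of an observed chemical shift from its random-coil value.

    Positive values mean the resonance is shifted downfield.
    """
    return round(observed_ppm - random_coil_ppm, 2)


# Values quoted in the text for the Arg14 NεH proton of Sp1f3.
arg14_ne_h = shift_deviation(8.02, 7.20)
print(f"Arg14 NεH downfield shift: {arg14_ne_h:.2f} ppm")
```

The result is a downfield shift of 0.82 ppm relative to the random-coil position.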
We have also acquired NMR spectra and obtained backbone assignments of an overexpressed peptide fragment containing both zinc finger domains 2 and 3 (Sp1f23). Sp1f23 thus represents two-thirds of the DNA binding domain of Sp1 and has been shown to be capable of binding DNA in a Zn2+-dependent, sequence-specific manner.⁴ We found that, for the most part, the NMR spectrum of the Sp1f23 construct is close to the sum of the NMR spectra of Sp1f2 and Sp1f3 (except for the residues in the linker between the two domains), indicating negligible domain-domain interactions while free in solution. Since chemical shifts are very sensitive to local structure, this further supports the idea that the single finger structures remain relevant in the context of larger domains and can serve as reasonable models for understanding the mode of sequence-specific DNA interaction of entire multifinger constructs. Furthermore, this essentially allows the transfer of Sp1f2 and Sp1f3 assignments to the larger Sp1f23 fragment, thereby greatly facilitating assignment of the entire DNA binding domain of Sp1 in a modular fashion.
Conclusions-It is our aim to understand the chemical basis of the unique Sp1 DNA recognition process at the molecular level. Toward this objective, we have solved the solution structures of synthetic peptides corresponding to zinc finger domains 2 and 3 of Sp1, using homonuclear two-dimensional NMR spectroscopy. Circular dichroism studies and Co2+ titration experiments show that both peptides assume a folded conformation around a tetrahedral metal center with no significant differences in the metal binding affinities between the Cys-X4-Cys (Sp1f2) and Cys-X2-Cys (Sp1f3) subclasses. Sp1f2 has a stable β-type II turn between the two strands, with the first Cys residue excluded from the turn motif due to the longer -X4- loop. Sp1f3 contains the sequence Cys-Pro-Glu-Cys-Pro, in which the first Pro causes the turn to closely resemble the SPXX motif both in geometry and hydrogen bonding pattern. The second Pro forces the φ angle at that position to be fixed at about −60 degrees, contrary to the expected positive value at this position. This may be the reason for the antiparallel β-sheet being more open in Sp1f3. The NMR solution structures show several relatively well defined polar side chains making hydrophobic contacts with the central apolar residues via their alkyl methylene groups or aromatic rings while pointing their charged atoms away toward the solution. Such residues include Lys12 and His23 of Sp1f2 and Lys10 and His21 of Sp1f3. It is interesting to note that polar groups of residues corresponding to these very positions show interactions with DNA backbone phosphates in the Zif268/DNA co-crystal structure. Since these side chains seem to have a relatively fixed orientation even free in solution, these interactions could play an important role in correctly docking and orienting the zinc fingers in the major groove of DNA.
⁴ X. Cao, unpublished results.

Dihedral angles
The dihedral angle values are for the average minimized structures of Sp1f2 and Sp1f3 (Sp1f2·avg and Sp1f3·avg, respectively). Residues i + 1 and i + 2 correspond to Pro6 and Glu7, respectively, for Sp1f3 and to Ser8 and Tyr9, respectively, for Sp1f2.
             φ(i+1)   ψ(i+1)   φ(i+2)   ψ(i+2)
Sp1f3        −62°     −13°     −79°     −49°
β-Type II    −60°     120°     80°      0°
Sp1f2        −86°     142°     76°      5°

FIG. 6. A, the average, refined structure of Sp1f2 (Sp1f2·avg·min) showing orientations of well defined aromatic and apolar residues involved in hydrophobic packing (Phe3, Phe14, Leu20, Trp7), relatively well defined polar residues packing against the hydrophobic core via their alkyl methylene groups (Lys12, Glu19), long polar residues of the α-helix (Arg16, Asp18, Gln21, Arg22, Arg25), and the metal binding residues (Cys5, Cys10, His23, His27). B, the average, refined structure of Sp1f3 (Sp1f3·avg·min) showing orientations of well defined aromatic and apolar residues involved in hydrophobic packing (Phe3, Phe12, Leu18, Glu7), relatively well defined polar residue packing against the hydrophobic core via its alkyl methylene group (Lys10), long polar residues of the α-helix
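The turn types can be checked against the dihedral angles listed above by comparing each measured set with idealized β-turn values. The Python sketch below does this with a root-mean-square angular deviation. The type II reference values are those given in the table. The type I values (−60°, −30°, −90°, 0°) are the standard textbook ones and are an addition here, as is the choice of RMS deviation as the criterion. The row beginning −62° is taken to be Sp1f3, an inference from the table caption.

```python
import math

# Idealized (phi(i+1), psi(i+1), phi(i+2), psi(i+2)) values in degrees.
# Type II matches the reference row in the table; the type I values are
# standard textbook numbers added here, not taken from the source.
IDEAL_TURNS = {
    "I": (-60.0, -30.0, -90.0, 0.0),
    "II": (-60.0, 120.0, 80.0, 0.0),
}


def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def rms_deviation(observed, ideal):
    """Root-mean-square angular deviation over the four turn dihedrals."""
    squares = [angle_diff(o, i) ** 2 for o, i in zip(observed, ideal)]
    return math.sqrt(sum(squares) / len(squares))


def classify_turn(observed):
    """Return the ideal turn type with the smallest RMS deviation."""
    return min(IDEAL_TURNS, key=lambda t: rms_deviation(observed, IDEAL_TURNS[t]))


# Dihedral angles as listed in the table. Assigning the -62 degree row to
# Sp1f3 is an inference from the caption, not a label present in the source.
sp1f2 = (-86.0, 142.0, 76.0, 5.0)
sp1f3 = (-62.0, -13.0, -79.0, -49.0)

for name, angles in (("Sp1f2", sp1f2), ("Sp1f3", sp1f3)):
    print(name, "is closest to beta-turn type", classify_turn(angles))
```

With these inputs the nearest ideal type is II for Sp1f2 and I for Sp1f3, in line with the Cys-X4-Cys and Cys-X2-Cys assignments given in the abstract.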
The comparison of NMR spectra of Sp1f23 with those of Sp1f2 and Sp1f3 supports the idea that zinc fingers fold as independent, noninteracting entities whose structures remain relevant in the context of the entire protein bound to DNA. The free solution structures of Sp1f2 and Sp1f3 are very similar to those of the analogous zinc finger domains of Zif268 bound to DNA. Modeling the Sp1-DNA complex by overlaying the Sp1f2 and Sp1f3 structures on Zif268 fingers 1 and 2, respectively, predicts the role of key amino acid residues and the interference/protection data and is consistent with the model of Sp1-DNA interaction proposed earlier. Interestingly, the Arg-Asp buttressing interaction observed in the Zif268/DNA crystal structure also seems to be preserved in Sp1f2 and Sp1f3 free in solution. The presence of this interaction in single zinc fingers without DNA further strengthens the emerging theme that zinc fingers are preformed, prearranged motifs, ready to interact with DNA even at the level of individual subdomains. Thus we expect only a very small entropic cost to be associated with sequence-specific recognition of DNA by Sp1 zinc fingers two and three.

FIG. 7. A, space-filling model of Sp1f2·avg·min with backbone atoms (dark gray), side chain carbons and nitrogens of the metal binding and other hydrophobic residues (light gray), side chain carbons and nitrogens of long polar residues (white), side chain hydrogens (white), and zinc atom (black) illustrating the well packed mini-globular nature of the zinc finger domain. Note how the putative DNA binding residues Arg16 (−1) and Arg22 (+6) point toward the exterior, whereas the side chains of Lys12 and Glu19 pack in with Phe14, and Gln21 is folded in along the backbone. Lys12 and His23 are well positioned to make proposed DNA phosphate contacts via their exposed nitrogen atoms³ (marked with asterisks). B, space-filling model of Sp1f3·avg·min (same coloring scheme as Fig. 9A) illustrating the well packed mini-globular nature of the zinc finger domain. Putative DNA binding residues are Arg14 (−1) and His17 (+3). Lys10 and His21 are proposed to contact the DNA backbone via their exposed nitrogen atoms (marked with asterisks).

FIG. 9. A, stereo presentation of the superposition of Sp1f2·avg·min (see "Experimental Procedures") and Zif268 finger 1. Residues 3-27 (excluding the loop between the two Cys residues) of Sp1f2·avg·min (dark) were superimposed on the corresponding residues of Zif268 finger 1 (light). B, stereo presentation of the superposition of Sp1f3·avg·min (see "Experimental Procedures") and Zif268 finger 2. Residues 3-25 (excluding Pro9) of Sp1f3·avg·min (dark) were superimposed on the corresponding residues of Zif268 finger 2 (light).
In conclusion, the structures of Sp1f2 and Sp1f3 presented above are important steps toward understanding the DNA binding domain of Sp1. These data offer insight into both the structural features of zinc fingers and the mechanisms of sequence-specific interaction of Sp1 with DNA. Work is in progress using both mutagenesis and NMR spectroscopy to further characterize and understand the Sp1-DNA recognition process and define the chemical basis of the unique features of Sp1 binding.