Target-sequence recognition by separate-type Cys2/His2 zinc finger proteins in plants.

The EPF family is a group of transcription factors containing canonical Cys2/His2-type zinc finger motifs that were first discovered in plants. These zinc finger proteins are characterized by two zinc fingers that are separated by spacers of various lengths, which are much longer than typical spacers (HC-link) in cluster-type zinc finger proteins. We describe here direct evidence that the two zinc fingers make contact with two tandemly repeated AGT core sequences that are separated by about 13 base pairs, by contrast to the cluster-type zinc finger proteins that bind to contiguous triplet sequences. DNA binding affinities were sensitive to the spaces between the core sequences, and the sensitivity to the spacing was greatly affected by the DNA sequence between the core sequences, with GC-rich sequences endowing much higher specificity than AT-rich sequences. Among the members of the EPF family, EPF1 was less sensitive to the spacing than EPF2-5. These results suggest that EPFs recognize their cognate target DNAs not only by the sequence of the core sites but also by the spacing between the core sites and, moreover, that different members in the EPF family distinguish their specific target genes by reference to these two parameters. This represents a unique type of target-sequence recognition among Cys2/His2-type zinc finger transcription factors. In addition, site-directed mutagenesis studies demonstrated that the two zinc fingers contribute synergistically to the binding to DNA, indicating that both fingers are necessary for the high affinity DNA binding.

The Cys2/His2 zinc finger proteins, first discovered in the transcription factor IIIA of Xenopus (1), represent an important class of eukaryotic regulatory proteins. To date, more than 200 different cDNAs have been found to encode the classical zinc finger motif, and many of their products have been shown to play central roles in development and in general transcription (2,3). Typically, more than one zinc finger forms a cluster within the DNA binding domain of transcription factors, and the fingers are separated by short spacers of seven amino acids, known as HC-links (4). Each zinc finger serves as a relatively independent module for binding to DNA, and more than one module is necessary for high affinity binding to DNA (5). There is little or no interaction between adjacent zinc fingers, and the linker region is flexible. These cluster-type zinc finger transcription factors interact with contiguous sets of triplet sequences, with one zinc finger module making contact with one triplet (6-10).
The EPF family is a group of DNA-binding proteins with two Cys2/His2-type zinc finger motifs, which we isolated from Petunia. One member of the family, EPF1, was originally isolated as a petal-specific DNA-binding protein that might regulate the petal-specific expression of the gene for 5-enolpyruvylshikimate-3-phosphate synthase (EPSPS) (11). The zinc finger motifs of other members (EPF2-4, EPF2-5, and EPF2-7) exhibit similarity with those of EPF1, but they make up a subfamily different from EPF1 in terms of the entire protein structure (12). Each member of the EPF family is preferentially expressed in different floral organs. Thus, EPFs are candidates for floral organ-specific transcription factors that control flower development by acting within a network of transcription factors.
All of the members of the EPF family have two canonical Cys2/His2-type (C-X2-C-X3-F-X5-L-X2-H-X3-H) zinc fingers that are separated by spacers of various lengths. The spacers range from 31 amino acids in EPF2-4 to 61 amino acids in EPF1, by contrast to zinc finger proteins of the cluster type that have orderly short spacings. The unique structure of the EPF proteins raises the question of whether the two fingers are also spatially separated in the tertiary configuration in the DNA binding domain and, moreover, whether each zinc finger makes contact with a separated set of nucleotides in the target DNA. If this were so, the next question would be whether each protein with a spacer of a different length recognizes different spacings in the target DNA, thereby recognizing its cognate target gene.
To address these issues, we characterized DNA-protein interaction using wild-type and mutated recombinant EPF proteins. Our analyses clearly demonstrated that the EPF zinc finger proteins make contact with two tandem core sequences that are spatially separated. The DNA binding affinity was found to be sensitive to the spacing between the core sequences, and the sensitivity was dependent on the GC content in the spacer region. Our data also suggest that each member of the EPF family has a specific preference for a particular spacing. The recognition of spacings might be a mechanism that augments target-sequence specificity by compensating for the limited number of nucleotides that can be recognized by only two zinc fingers. Site-directed mutagenesis of the EPF protein revealed that the two zinc fingers act synergistically in the binding to DNA in spite of their separated configuration in the protein structure. Our observations suggest a unique mechanism for target-sequence recognition by the EPF family among Cys2/His2-type zinc finger transcription factors.

EXPERIMENTAL PROCEDURES
Construction of Plasmids-A truncated form of the gene for EPF2-5 (residues 66-210) was created for expression in a rabbit reticulocyte lysate system as follows. The DNA fragment was amplified using an upstream primer (GGCCTCTAGAACAATGGGAACTACACCCGGTTCAACTGATACTACT) that contained a restriction site for XbaI and an initiation codon and a downstream primer (GGCCCTCGAGTCAACTTCCACTATGACCACCGCCGTCACGGTG) that contained a restriction site for XhoI and a stop codon. The amplified DNA fragment was digested with XbaI and XhoI, then inserted into the pBC SK(-) (Stratagene, La Jolla, CA) vector to yield pBC EPF2-5ZF. Full-length versions of EPF2-5 and EPF1 were also created by essentially the same procedure.
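The primer features described above can be checked with a short script. The following is a minimal sketch, not part of the original procedure: it assumes the hyphens printed inside the primer sequences are line-break artifacts, and the helper names (describe_upstream, describe_downstream, codons_from) are hypothetical. The recognition sequences used are TCTAGA for XbaI and CTCGAG for XhoI.

```python
# Primer sequences as given in the text, written without line-break hyphens
# (assumption: those hyphens are typesetting artifacts, not part of the primers).
UPSTREAM = "GGCCTCTAGAACAATGGGAACTACACCCGGTTCAACTGATACTACT"
DOWNSTREAM = "GGCCCTCGAGTCAACTTCCACTATGACCACCGCCGTCACGGTG"

# Recognition sequences of the two enzymes named in the text.
SITES = {"XbaI": "TCTAGA", "XhoI": "CTCGAG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


def codons_from(seq, start):
    """Split seq into complete codons, reading from index start."""
    return [seq[i:i + 3] for i in range(start, len(seq) - 2, 3)]


def describe_upstream(primer):
    """Locate the XbaI site and the first ATG after it; check the frame is open."""
    site = primer.find(SITES["XbaI"])
    atg = primer.find("ATG", site + len(SITES["XbaI"]))
    codons = codons_from(primer, atg)
    return {
        "site_index": site,
        "atg_index": atg,
        "open_frame": not any(codon in STOP_CODONS for codon in codons),
    }


def describe_downstream(primer):
    """Locate the XhoI site and read the codon that follows it on the coding strand."""
    site = primer.find(SITES["XhoI"])
    end = site + len(SITES["XhoI"])
    # The downstream primer is antisense, so the stop codon appears reverse-complemented.
    return {"site_index": site, "stop_codon": reverse_complement(primer[end:end + 3])}


if __name__ == "__main__":
    print(describe_upstream(UPSTREAM))
    print(describe_downstream(DOWNSTREAM))
```

Under these assumptions, both restriction sites sit directly after a four-base GGCC clamp, the ATG follows the XbaI site, and the triplet after the XhoI site reads as a stop codon on the coding strand.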
For production of the truncated form of EPF2-5 in Escherichia coli, the same region of the EPF2-5 gene as described above was amplified by polymerase chain reaction to create additional restriction sites, using an upstream primer that contained a SmaI site and a downstream primer that contained an XhoI site. The products of the polymerase chain reaction were digested with SmaI and XhoI and then inserted between the XmnI and SalI sites in the expression vector pMAL-c2 (New England Biolabs, Beverly, MA), in-frame with the maltose-binding protein (MBP) gene upstream of the XmnI site to yield pMAL-EPF2-5.
Site-directed Mutagenesis-Mutations were introduced into the coding region for the EPF2-5ZF by the recombinant polymerase chain reaction method (13). Amplification by polymerase chain reaction was carried out with Pfu DNA polymerase (Stratagene), with pBC EPF2-5ZF as the template, in two separate reactions: mutant forward primers and an M13-21 primer were used for the carboxyl-terminal portion of the protein, and mutant reverse primers complementary to the mutant forward primers and an M13 reverse primer were used for the amino-terminal portion. One-tenth each of the products from the two reactions was combined, and the second polymerase chain reaction was carried out with these templates, which were complementary to each other at one end, using the M13 reverse primer and the M13-21 primer. Reaction products were digested with XbaI and XhoI and then inserted into the pBC SK(-) vector. All constructs were confirmed by DNA sequencing.
Transcription and Translation in Vitro-pBC EPF2-5ZF and its mutant derivatives were digested with XhoI, treated with proteinase K, extracted with phenol:chloroform (1:1, v/v), and then purified with a Wizard DNA cleanup kit (Promega Corp., Madison, WI). The templates were transcribed with T3 RNA polymerase from the T3 promoter in the pBC SK(-) vector, and the transcripts were purified by extraction with phenol:chloroform (1:1, v/v) and ethanol precipitation. One μg of transcripts was translated in a rabbit reticulocyte lysate in the presence of [35S]methionine (1000 Ci/mmol; Amersham Corp.). The sizes and amounts of the products were checked by electrophoresis in a 12.5% SDS-polyacrylamide gel and subsequent autoradiography. Then equivalent amounts of the products were used for subsequent gel-shift experiments. The wild-type EPF2-5ZF protein produced in this system is referred to hereinafter as EPF2-5ZF-R.
Full-length EPF2-5 and EPF1 were also expressed in the rabbit reticulocyte lysate in a similar manner. The proteins are referred to as EPF2-5-R and EPF1-R, respectively.
Expression of a Truncated Form of EPF2-5 in E. coli and Its Purification-pMAL-EPF2-5ZF was introduced into E. coli strain TB1. The transformants were grown at 37 °C in LB medium with ampicillin (0.1 mg/ml) until A600 reached 0.5, and then isopropyl-β-D-thiogalactopyranoside was added to 0.3 mM, followed by incubation for 2 more h to induce the expression of the MBP-EPF2-5ZF fusion protein. Cells were harvested and lysed by sonication, and the resulting cell lysate was loaded onto an amylose-resin affinity column. The column was washed with buffer (10 mM sodium phosphate, pH 7.2, 1 M NaCl, 0.25% Tween 20, 10 mM β-mercaptoethanol, 0.5 mM phenylmethylsulfonyl fluoride, and 10 μM zinc acetate), and the proteins that bound to the column were eluted with the same buffer plus 10 mM maltose. The eluted proteins were concentrated and subjected to proteolytic digestion with factor Xa (1%, w/w; New England Biolabs) at 30 °C for 16 h, which removed the amino-terminal MBP domain from the fusion protein to leave a truncated form of EPF2-5 with no junction sequence (EPF2-5ZF-E). The digestion mixture was loaded onto a Cibacron Blue 3GA column (Sigma) that had been equilibrated with 20 mM Tris-HCl, pH 7.5, 1 mM dithiothreitol, 10 μM zinc acetate, and 50 mM NaCl, and the bound proteins were eluted with a linear gradient of NaCl from 50 mM to 1 M in the buffer. The fractions containing EPF2-5ZF-E (0.8-1.0 M NaCl) were pooled and concentrated. Protein concentrations were determined by the method of Bradford (14).
DNA Binding Assays-All DNA binding reactions were carried out in 25 mM HEPES-KOH, pH 7.6, 40 mM KCl, 0.1% Nonidet P-40, 0.01 mM ZnCl2, 10 μg/ml poly(dI·dC), and 0.1 mM dithiothreitol. Gel-shift assays were performed with 10,000 cpm of end-labeled probe and about 1 μl of the product of in vitro translation. After incubation for 20 min at room temperature, the mixtures were subjected to electrophoresis in a 0.7% agarose/3% polyacrylamide gel, as described previously (11).
To determine the dissociation constant for the binding of EPF2-5ZF to its target DNA, we incubated 0.1 μg of unlabeled probe DNA with EPF2-5ZF-E. After electrophoresis in a 10% polyacrylamide gel in 0.5× TBE buffer (0.09 M Tris borate, pH 8.0, 0.002 M EDTA), the gel was stained with ethidium bromide, and then free and bound DNA were quantified by scanning the gel with a CCD camera-based densitometer (ATTO, Tokyo, Japan) under UV light (254 nm).
Hydroxyl radical footprinting was performed as described by Tullius and Dombroski (15) with modifications. A 32P-end-labeled probe (200,000 cpm) was incubated with 5 ng of EPF2-5ZF-E in 30 μl of binding buffer for 20 min. Hydroxyl radical reactions were started by the addition of 2 μl each of 0.45% H2O2, 0.15 mM [(NH4)2Fe(SO4)2]-EDTA, and 50 mM sodium ascorbate, and the reactions were quenched after 2 min by the addition of 5 μl of 0.1 M thiourea. The reaction products were subjected to electrophoresis in a 0.7% agarose/3% polyacrylamide gel, and DNA·protein complexes (1:1) were eluted from the gel with 0.5 M ammonium acetate after autoradiography. The eluate was extracted twice with phenol and once with ether, and DNA was precipitated twice with ethanol. The purified digestion products were separated on a 12% polyacrylamide-urea sequencing gel. Autoradiography was carried out with an image scanner (BAS 2000; Fuji Photo Film, Kanagawa).

RESULTS
EPF2-5 Makes Contact with Two Separate AGT Motifs-The amino acid sequences of the two zinc finger regions in EPF2-5 and EPF1 are shown in Fig. 1. There is substantial sequence similarity between the two zinc fingers of each protein and also between those of the two proteins, in particular around the strongly conserved QALGGH sequence. One of the features of the structures of these proteins is that the two fingers are separated by spacers of different lengths, with 44 and 61 amino acids in EPF2-5 and EPF1, respectively, between the second His in the first finger and the first Cys in the second finger. The products of the genes for both EPF1 and EPF2-5 were originally found to bind to a tetramer of the EP1 sequence (TGATTTTGACAGTGTCACCTT), the binding sequence of a petal-specific nuclear factor within the promoter region of the EPSPS gene (11,12). Subsequently, a tetramer of the truncated form of EP1 (EP1S: TTGACAGTGTCAC) was found to have much stronger affinity for these proteins. This observation led us to speculate that the spaces between each unit of the binding sequence in the multimers might be a critical factor in the difference in binding affinities for the two probes and, moreover, that the long spacer region between the two zinc fingers might be responsible for the recognition of the spacing on the DNA.
To characterize the interactions of the EPFs and their target DNAs, we first determined minimal binding sequences by scanning mutation of the target DNA sequences using a truncated form of EPF2-5 that has been expressed in a rabbit reticulocyte in vitro translation system (EPF2-5ZF-R). Assuming from previous observations that each one of the two zinc fingers interacts with one unit of the EP1S, we used two tandemly repeated sequences as probes and competitors. An initial competition experiment with competitors with two-base mutations revealed that mutations in the CAGT sequence severely reduced the competition (M2 and M3), whereas those in the region that flanked the CAGT sequence had little effect (M1, M4, and M5; Fig. 2A). Subsequent experiments with one-base mutations revealed that mutation of any one base in the sequence AGT (M7, M8, and M9) reduced the binding, whereas a mutation in the preceding C (M6) had little effect. Furthermore, substitution of all other bases outside the three bases had little effect on the competition (M18; Fig. 2B). These results indicate that the tandemly repeated AGT is the minimal core sequence for the high affinity binding of EPF2-5.
For further characterization of the protein·DNA complex using larger amounts of the EPF2-5ZF protein, we overexpressed a fusion protein with MBP in E. coli and subsequently released the EPF2-5ZF domain from the MBP domain by proteolytic cleavage. Fig. 3 shows that the protein was purified to near homogeneity. The apparent dissociation constant (Kd), determined using EPF2-5ZF-E and a dimer of the EP1S sequence as a probe, was 120 nM (Fig. 4). Although this value indicates that the affinity of interaction is relatively low compared with other DNA-binding proteins, it is similar to that of the binding of Tramtrack protein from Drosophila, which contains two clustered zinc fingers, to its native binding site (~400 nM) (16).
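As an illustration of how an apparent dissociation constant relates to the free and bound DNA fractions measured on a gel, the following minimal sketch assumes a simple 1:1 equilibrium between one protein molecule and one probe molecule. The text does not state how the value in Fig. 4 was fitted, so this is not necessarily the procedure used; the function names and the example concentrations are hypothetical.

```python
import math


def fraction_bound(protein_total, dna_total, kd):
    """Fraction of DNA in complex for P + D <-> PD, all concentrations in the same unit.

    Solves [PD]^2 - (Pt + Dt + Kd)[PD] + Pt*Dt = 0 for the physically meaningful root.
    """
    s = protein_total + dna_total + kd
    complex_conc = (s - math.sqrt(s * s - 4.0 * protein_total * dna_total)) / 2.0
    return complex_conc / dna_total


def kd_from_fraction_bound(protein_total, dna_total, theta):
    """Apparent Kd from the measured fraction of bound DNA (theta), 1:1 binding assumed."""
    complex_conc = theta * dna_total
    free_protein = protein_total - complex_conc
    free_dna = dna_total - complex_conc
    return free_protein * free_dna / complex_conc


if __name__ == "__main__":
    # Hypothetical totals in nM; only the 120 nM Kd comes from the text.
    theta = fraction_bound(protein_total=200.0, dna_total=50.0, kd=120.0)
    print(round(theta, 3), round(kd_from_fraction_bound(200.0, 50.0, theta), 1))
```

When the probe concentration is far below Kd, half of the probe is bound at a total protein concentration equal to Kd, which is the usual way such a value is read from a titration.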
Hydroxyl radical footprinting provides high-resolution details of a protein·DNA complex because of the small size of the DNA-cleavage reagent. We used this technique to obtain additional information about the EPF·DNA complex. A mixture of EPF2-5ZF-E and a probe DNA containing two tandemly repeated EP1S sequences was subjected to the hydroxyl radical reaction, and EPF2-5ZF·DNA complexes (1:1) were purified on and eluted from a gel. A sequencing gel revealed clear footprints at two tandem AGT sequences that were separated by 13 bp in the top strand and complementary sequences (ACT) in the opposite strand (Fig. 5). We had previously excluded the possibility that EPF binds to DNA as a multimer (12), and therefore, our results indicated that one molecule of EPF2-5ZF bound to two separate AGT core sequences, with each independent zinc finger making contact with one AGT sequence. The bands of DNA between the core sequences in the footprint, in particular those adjacent to the AGT core sequences, were somewhat weaker in intensity than those outside the core sequences, suggesting some interaction of the DNA with the spacer region between the two zinc fingers. We have not yet determined the core binding sequences of the other members of the EPF family. However, at least EPF1 and EPF2-4 are very likely to bind to the same AGT core sequences because these two proteins bind to the EP1S tetramer with similar affinity to that of EPF2-5, and the amino acid sequences in the zinc finger motifs of the respective proteins are very similar (12).
EPF Recognizes the Spacing between Two Core Sequences-Next, we examined whether the spacing between the core sequences in the DNA affects the binding affinity. We used probes with different spacings, in which AT or GC repeats were inserted between the EP1S units. The results showed that EPF2-5 was moderately sensitive to the spacing when the spacer region was AT-rich (Fig. 6a). The affinity was highest with spacings of 13 and 15 bp. However, suboptimal spacings were well tolerated. By contrast, when the spacer sequences were GC-rich, the recognition of the spacing was much stricter than with the AT-rich sequences (Fig. 6b), and the binding affinity showed a sharp peak at a spacing of 13 bp (Fig. 6c).
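The spacing between core sequences in a probe can be computed directly from its sequence. The sketch below is illustrative only: it scans the top strand for AGT cores in tandem copies of the EP1S unit quoted earlier and reports the distance between consecutive cores, which equals the center-to-center spacing because all cores have the same length. The function names are hypothetical, and the two-base "GC" insert is an invented example, not one of the probes used in Fig. 6.

```python
EP1S = "TTGACAGTGTCAC"  # truncated EP1 unit quoted in the text
CORE = "AGT"            # minimal core sequence identified by scanning mutation


def core_positions(seq, core=CORE):
    """Return the start index of every (possibly overlapping) core occurrence."""
    positions = []
    index = seq.find(core)
    while index != -1:
        positions.append(index)
        index = seq.find(core, index + 1)
    return positions


def core_spacings(seq, core=CORE):
    """Distances in bp between consecutive cores (start to start = center to center)."""
    positions = core_positions(seq, core)
    return [later - earlier for earlier, later in zip(positions, positions[1:])]


def dimer_with_spacer(spacer=""):
    """Two EP1S units joined by an optional inserted spacer (hypothetical probe design)."""
    return EP1S + spacer + EP1S


if __name__ == "__main__":
    print(core_spacings(dimer_with_spacer()))      # unmodified EP1S dimer
    print(core_spacings(dimer_with_spacer("GC")))  # hypothetical two-base insert
    print(core_spacings(EP1S * 4))                 # EP1S tetramer
```

Because each EP1S unit is 13 bases long and contains a single AGT, a plain tandem repeat places the cores 13 bp apart, which matches the footprinting result and the optimal spacing for EPF2-5.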
These results indicate that EPF2-5 recognizes the spacing between the two core sequences in the target DNA and that the sequence between the cores, presumably its AT or GC content, greatly affects the sensitivity of the protein to the spacing. The strict sensitivity to the spacing, in particular with the GC-rich spacers, suggests that the protein structure in the spacer region of the EPF2-5 protein is relatively rigid and that the spatial configuration of the two fingers is relatively stable.
The difference in the sensitivity to AT- and GC-rich spacers was presumably due to differences in the flexibility of the DNA in the spacer region. Another possibility is that the direction of DNA bending upon the binding of the EPF protein might be responsible for the difference. In other words, the direction of the DNA bending with AT-rich spacers might be more favorable for binding with suboptimal spacing. However, we did not detect any DNA bending with spacers of either base composition (data not shown). Therefore, it appears that the difference in structural flexibility is the likely cause of the differences in sensitivity.
In the DNA binding experiments in Fig. 6, we used full-length EPF2-5 proteins instead of the truncated proteins that were used in the other experiments. The gel-shift assays indicated that the binding affinities of both forms were similar, which suggests that there is no additional determinant of DNA sequence specificity other than the zinc finger region, at least with the probes used in this work.
Each EPF Protein Has a Different Preference for a Particular Spacing-To compare the recognition of the spacing by different members of the EPF family, the same series of the "spacing-mutant" probes was examined for binding to EPF1. As shown in Fig. 6d, the binding affinity was similarly high with probes with spacing from 12 to 16 bp, even with GC-rich spacers. This recognition of the spacing was very relaxed as compared to that by EPF2-5, which showed stronger specificity for a spacing of 13 bp. This difference could be attributable to differences in the number and the sequence of the amino acids in the spacer regions between the two fingers in the two proteins.
Although the difference in the recognition of spacing between EPF1 and EPF2-5 was rather minor, it is quite possible that each member of the EPF family distinguishes its target genes from differences in spacing, as well as from the primary sequences of the three nucleotides recognized by each finger. This possibility will be discussed below with reference to newly identified members of this protein family with spacer regions of very different lengths between the two fingers.

FIG. 5. The probes were incubated with EPF2-5ZF-E protein and subjected to hydroxyl radical reactions; bound fractions were purified by the gel-shift procedure as described under "Experimental Procedures." The purified reaction products (lanes B) were separated on a 12% polyacrylamide-urea sequencing gel together with hydroxyl radical-treated probes prepared in the absence of the protein (lanes F) and products of Maxam-Gilbert G sequencing reactions (lanes G) for the Top and Bottom strands. Strongly protected regions are indicated by bars beside the sequences.

Two Zinc Fingers Contribute Synergistically to the Binding to DNA-A common perception is that a protein must have a minimum of two fingers for high affinity DNA binding by Cys2/His2-type zinc finger proteins. However, in the EPF family, two zinc fingers appear to make contact with core DNA sequences even more independently than they do in proteins with clustered zinc fingers. To address this issue, we first tested a probe in which one of the two core sequences was disrupted by a base substitution. As shown in Fig. 7a, the affinity of binding to this probe was severely reduced (lane 2) as compared to that with two wild-type core sequences (lane 1). Furthermore, substitution of the first histidine to asparagine in the first zinc finger (Fig. 7b, lane 1) or the second zinc finger (Fig. 7b, lane 2), which should disrupt one of the zinc finger structures, abolished the binding almost completely.
These results indicate that two fingers are necessary for the high affinity DNA binding and that the two zinc fingers contribute synergistically to the overall binding to DNA, in spite of the apparently more independent nature of each zinc finger as compared to that of cluster-type zinc finger proteins.

DISCUSSION
The structure of EPF zinc finger proteins is unique in that each molecule has two zinc finger motifs that are separated by a spacer of variable length, and all of the zinc finger motifs have a conserved QALGGH motif. These unusual structural features led us to investigate the way in which EPF proteins recognize their target sequences. We obtained direct evidence that the proteins bind to two tandemly repeated but separated AGT core sequences, with each finger making contact with one of the core sequences. The binding affinity was greatly affected by the spacing between the two core sequences, in particular when the spacer region was a GC-rich sequence. These observations strongly suggest that the two fingers in the EPF proteins are spatially separated in the tertiary structures and that the structure of the peptide in the spacer region between the two zinc fingers is considerably rigid.
The sensitivity to the spacing was clearly different among the members of the EPF family. EPF2-5 had a relatively high specificity, preferring a spacing of 13 bp, whereas EPF1 bound with similar affinity when the spacing varied between 12 and 16 bp. The difference in tolerance to the spacing is most likely due to the difference in the flexibility of the spacer region between the two fingers. This difference ought to be reflected in the recognition of target genes by these two members of the family. With respect to transcriptional regulation, it seems likely that some of the target genes of EPF2-5 and EPF1 are common but that EPF1 regulates the expression of additional target genes that have promoter elements with shorter and longer spacing between the AGT cores.
Recently, we cloned cDNAs for more than 30 zinc finger proteins with the QALGGH motif. Most of these cDNAs encode proteins with two zinc fingers, and moreover, the spacing between the two fingers is highly variable, from 19 amino acids for the shortest to 232 for the longest. It seems likely that these proteins, with different spacing between the two fingers, recognize different spacing between the core target sequences in their target DNAs. The presence of such molecules strongly suggests that recognition of the spacing in target sequences is one of the parameters for the discrimination of cognate target genes by EPF-type zinc finger transcription factors, in addition to the recognition of the three nucleotides by each zinc finger. In view of the limited number of nucleotides recognizable by two zinc fingers, i.e. six, the recognition of spacing is most likely to be the mechanism that compensates for the low sequence specificity. This putative recognition of spacing is reminiscent of the discrimination of target sequences by nuclear receptor-type transcription factors that contain Cys2/Cys2 zinc finger motifs. The nuclear receptors are known to recognize the spacing in the target sequence, thereby distinguishing cognate promoter sequences from those of other members of the protein family (17,18). However, these factors bind to DNA as dimers, with each subunit binding to a half-site in the repeats, and this binding is clearly different from that by the EPF family, whose members bind to tandemly repeated sequences as a monomer.

FIG. 6. Recognition of the spacing between the two core sequences in the target DNA. a, probes containing four tandem repeats of the EP1S sequences with various spacings between each unit were tested for binding to EPF2-5-R. The probes contained TA × 2n (n = 1-4) sequences between each unit of the EP1S or a two-base deletion from both ends of the EP1S. The spacing between the centers of the two AGT sequences is shown at the top of each lane. The probes were end-labeled and subjected to gel-shift assays with EPF2-5-R. b, probes containing GC × 4n (n = 1-2) sequences instead of AT × 2n in a were tested for binding to EPF2-5-R. c, probes containing spacers of GC repeats differing sequentially by one nucleotide between the two EP1S sequences in a dimer were tested for the binding to EPF2-5-R. d, the same set of the probes as in c was tested for the binding to EPF1. Positions of shifted bands are indicated by arrows. The faint bands with lower mobility in some lanes are due to the binding of two protein molecules to the probe.

FIG. 7. Two fingers are necessary for high affinity DNA binding. a, effect of a mutation in one half-site of the tandem repeats in the target DNA. A wild-type probe (lane 1) and a mutant probe with an AGT-to-CGT mutation in the second core sequence in the dimer of the tandem repeats (lane 2) were tested for binding to EPF2-5ZF-R. b, effects of disruption of either one of two zinc fingers on DNA binding. The first His residue in either of the two zinc fingers of EPF2-5ZF-R was replaced by Asp by site-directed mutagenesis, as described under "Experimental Procedures." The mutated proteins carrying a His-to-Asp mutation in the first finger (lane 1) and in the second finger (lane 2), respectively, and wild-type protein (lane 3) were expressed in a rabbit reticulocyte lysate system and subjected to gel-shift assays with a tetramer of the EP1S as a probe. Positions of shifted bands are shown by arrows. The weaker band with lower mobility is due to the binding of two protein molecules to the probe DNA.
There is another characteristic structural feature of the zinc finger region of the EPF family: a strongly conserved sequence (the QALGGH motif) within the region that corresponds to the α-helical region in several zinc finger proteins that have previously been characterized. This region, which is strongly conserved in the classical zinc finger motifs of many transcription factors, has been shown to be a DNA recognition surface that faces the target DNA. In a subclass of zinc finger proteins, such as Sp1, Krox 20, and Zif268 (6-10), three positions in the α-helical region, marked a, b, and c, have been found to participate in major interactions with nucleotides (C-X-X-C-X-X-X-F-X-Xa-X-X-Xb-L-X-Xc-H-X-X-X-H). Mutational and crystallographic studies have shown that each of these amino acids recognizes one nucleotide in the triplet of the target sequence. A statistical study revealed that the three positions are highly variable among many zinc finger motifs (19); its author proposed, as a general rule, that the most variable amino acids within a highly conserved region are at base-recognition positions. In the EPF family, the strong conservation of the QALGGH motif in the corresponding region suggests that this region serves as a DNA-recognition surface. Phage display selection from a library of zinc fingers showed that Ala at position b might participate in base recognition (20); however, the second Gly at position c is unlikely to make contact with bases, at least via its side chain, because this amino acid has no side chain. If one or two amino acids in this region participate in the recognition of nucleotides, the fixed sequence of the QALGGH motif in more than 30 proteins limits still further the diversity of the target sequences that can be recognized by the members of the EPF family. Taking these factors into account, we cannot neglect the importance of the length of the spacings.
We are in the process of analyzing the three-dimensional structure of a complex between EPF2-5 and its target DNA by x-ray crystallography and two-dimensional nuclear magnetic resonance. The information from these analyses should reveal the amino acids involved in the recognition of the DNA. We also expect our physicochemical analyses to reveal the role of the spacer region between the two fingers in the contacts with DNA. The optimal spacing of 13 bp between the AGT core sequences for the binding of EPF2-5 is longer than one helical turn (10.5 bp). It will be of interest to determine whether the spacer region lies along the major groove of DNA, as has been shown for cluster-type zinc finger proteins, such as GLI (21), or whether it makes contact with the core sequences across the major groove.
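The geometric point about the 13-bp spacing can be made concrete with a small calculation. This is a rough sketch for idealized, straight B-form DNA: the 10.5 bp per turn figure is the one quoted above, whereas the rise of 3.4 Å per base pair is a textbook value assumed here, and the function names are hypothetical.

```python
BP_PER_TURN = 10.5  # helical repeat quoted in the text
RISE_PER_BP = 3.4   # assumed B-DNA rise per base pair, in angstroms


def rotation_offset_deg(spacing_bp, bp_per_turn=BP_PER_TURN):
    """Angle around the helix axis between two sites, modulo one full turn."""
    return (spacing_bp / bp_per_turn * 360.0) % 360.0


def axial_distance(spacing_bp, rise=RISE_PER_BP):
    """Distance along the helix axis between two sites, in angstroms."""
    return spacing_bp * rise


if __name__ == "__main__":
    for spacing in (12, 13, 16):
        print(spacing, round(rotation_offset_deg(spacing), 1), round(axial_distance(spacing), 1))
```

Under these assumptions, two cores 13 bp apart are offset by roughly a quarter turn around the helix in addition to one full turn, so the second core does not face the same side of the DNA as the first.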
The SUPERMAN gene is an important regulator of development in Arabidopsis thaliana. Recessive mutations of this gene cause extra stamens to form interior to the normal third whorl stamens (22). This gene was recently cloned and found to encode a protein that contained only one classical zinc finger with the QALGGH motif (23). Several other genes (At ZFP family) in Arabidopsis that encode proteins with one zinc finger with the QALGGH motif have also been reported (24). So far, all the Cys 2 /His 2 -type zinc finger proteins that have been found in plants have the QALGGH motifs in spite of the different strategies used for cloning their genes, suggesting that single and double zinc finger proteins that contain the QALGGH motif form a major class of transcription factors in plants. Target sequences for these single-finger proteins have not yet been identified, although the AP3 and PI genes are potential candidates for the target genes of the SUPERMAN protein.
With respect to DNA binding, our results indicate that one finger recognizes only three nucleotides; therefore, one finger is obviously insufficient for target-sequence specificity. Additionally, we have demonstrated that more than one finger is necessary for high affinity DNA binding. Two possibilities can account for the recognition of target sequence by the single-finger proteins: 1) some other region of the protein is involved in the recognition of DNA, probably in combination with the zinc finger; and 2) the protein binds to DNA as a dimer or in association with other sequence-specific DNA-binding proteins. These possibilities should be taken into consideration when the target sequences are explored. The presence of a basic leucine-zipper motif in these proteins, which might be a domain involved in protein-protein interactions, supports the second possibility.