Development of zinc finger domains for recognition of the 5’-ANN-3’ family of DNA sequences and their use in the construction of artificial transcription factors

In previous studies we have developed Cys(2)-His(2) zinc finger domains that specifically recognized each of the 16 5'-GNN-3' DNA target sequences and could be used to assemble six-finger proteins that bind 18-base pair DNA sequences (Beerli, R. R., Dreier, B., and Barbas, C. F., III (2000) Proc. Natl. Acad. Sci. U. S. A. 97, 1495-1500). Such proteins provide the basis for the construction of artificial transcription factors to study gene/function relationships in the post-genomic era. Central to the universal application of this approach is the development of zinc finger domains that specifically recognize each of the 64 possible DNA triplets. Here we describe the construction of a novel phage display library that enables the selection of zinc finger domains recognizing the 5'-ANN-3' family of DNA sequences. Library selections provided domains that in most cases showed binding specificity for the 3-base pair target site that they were selected to bind. These zinc finger domains were used to construct 6-finger proteins that specifically bound their 18-base pair target site with affinities in the pM to low nM range. When fused to regulatory domains, these proteins containing various numbers of 5'-ANN-3' domains were capable of specific transcriptional regulation of a reporter gene and the endogenous human ERBB-2 and ERBB-3 genes. These results suggest that modular DNA recognition by zinc finger domains is not limited to the 5'-GNN-3' family of DNA sequences and can be extended to the 5'-ANN-3' family. The domains characterized in this work provide for the rapid construction of artificial transcription factors, thereby greatly increasing the number of sequences and genes that can be targeted by DNA-binding proteins built from pre-defined zinc finger domains.


Introduction
The study of protein-DNA interactions is central to our understanding of the regulation of genes and the flow of genetic information characteristic of life. One practical application of the development of a protein-DNA recognition system is the construction of artificial transcription factors that might be used to purposefully regulate gene expression. We have demonstrated that gene expression can be specifically altered through the use of designed polydactyl zinc finger transcription factors that bind 18 basepairs (bps) of DNA sequence. Because of their extended DNA recognition site, these proteins have the potential to be genome-specific transcriptional regulators (1,2). Targeting of only 9 bps of sequence can also result in gene regulation wherein chromatin structure provides for an additional level of specificity (3,4). Because a universal system for gene regulation would provide many new opportunities in basic and applied biology and medicine, the development of such a system is of considerable interest.
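A rough calculation illustrates why an 18-bp site has the potential to be genome-specific whereas a 9-bp site generally does not. The sketch below assumes a random-sequence model with equal base frequencies and a genome of about 3 × 10^9 bp; both are simplifying assumptions introduced here for illustration, not values taken from this work.

```python
# Back-of-the-envelope estimate. Assumptions (not from the paper):
# random sequence, equal base frequencies, genome of ~3e9 bp.
GENOME_BP = 3_000_000_000


def expected_occurrences(site_length: int, genome_bp: int = GENOME_BP) -> float:
    """Approximate expected number of exact matches to one site on both strands."""
    possible_sites = 4 ** site_length
    return 2 * genome_bp / possible_sites


for n in (9, 18):
    print(f"{n} bp: {4 ** n:,} possible sites, "
          f"~{expected_occurrences(n):.3g} expected occurrences by chance")
```

Under these assumptions a 9-bp site is expected to recur many times by chance, while an 18-bp site is expected less than once, which is consistent with the additional specificity attributed to chromatin structure for 9-bp targets.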
Two key features have made Cys2-His2 zinc finger domains the most promising DNA recognition motifs for the construction of artificial transcription factors: modular structure and modular recognition. Each domain consists of approximately 30 amino acids and folds into a ββα structure stabilized by hydrophobic interactions and the chelation of a zinc ion by the conserved Cys2-His2 residues (5,6). DNA recognition of typically 3 bps is provided by presentation of the α-helix into the major groove of DNA. Binding of longer DNA sequences is achieved by covalent tandem repeats of these domains. We have previously reported the phage display selection of zinc finger domains that recognize each of the 5'-GNN-3' DNA subsites and the refinement of these domains by site-directed mutagenesis (7,8). These domains can be assembled to create polydactyl zinc finger proteins that recognize extended 18 bp DNA sequences (1,2). DNA addresses of this length have the potential to be unique within any genome. In addition to imposing constitutive transcriptional regulation on endogenous genes, these transcription factors can be made hormone-dependent by fusion to designed ligand-binding domains prepared from a variety of nuclear hormone receptors (9). To allow for the rapid construction of zinc finger-based transcription factors that bind any DNA sequence, it is [...] residues of the helix are important in domain stability and in some cases for DNA recognition.
Leucine, for example, is the relatively conserved amino acid residue typically found at helical position 4 in zinc fingers of this type. The side chain of this residue packs into the hydrophobic core of the domain and is believed to be key in stabilizing the domain. Positions 1 and 5 of the α-helix have been shown to make direct or water-mediated contacts with the phosphate backbone of the DNA. A particularly important role in base recognition, as it relates to the development of modular recognition domains that bind non-5'-GNN-3' sequences, is played by the side chain of residue 2.
In Zif268, aspartic acid is found at position 2 of each α-helix and plays a role in determining base specificity at two positions. By making a pair of buttressing hydrogen bonds with Arg -1 within the same helix, this residue acts to stabilize the Arg -1 to 3' guanine interaction within the 3 bp recognition site of the domain. This conclusion is supported by structural observations and mutagenesis studies (8,10,11,23). It is the role this residue plays in specifying base identity at another position that limits domain modularity in recognition. In finger 3 for example, the carboxylate of Asp 2 can accept a hydrogen bond from the N4 of cytosine or the N6 of adenine that is base-paired to the 5' guanine or thymine, respectively, of the finger-2 subsite (10,11,23). Consequently, Zif268 does not discriminate well between 5'-GGG-3' and 5'-TGG-3' at the finger-2 subsite and recognition of 5'-AGG-3' or 5'-CGG-3' is precluded. A similar interaction is seen between the Asp 2 of finger-2 and the corresponding base within the finger-1 recognition site. This cross-strand contact to a base outside the canonical three-nucleotide recognition site effectively restricts the identity of the base that can be recognized at the 5' position of the preceding finger's binding site. Thus, while modularity of DNA recognition is a key feature of zinc finger domains, domain-independent modular interaction is not always complete. This type of constraint to zinc finger specificity and modularity, referred to as the target site overlap problem, has been the subject of much discussion (24)(25)(26).
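The target site overlap constraint described above can be written as a simple rule. The following is a minimal sketch that encodes only the single contact discussed in this paragraph (Asp at position 2 of the neighboring finger tolerating a 5' guanine or thymine in the adjacent subsite); the function name and interface are hypothetical, and a `True` result means only that this particular contact does not exclude the subsite.

```python
# 5' bases of a subsite that remain compatible when the neighboring finger
# carries Asp at helix position 2 (per the text: guanine or thymine).
ASP2_COMPATIBLE = frozenset("GT")


def asp2_permits(subsite: str, neighbor_position2: str) -> bool:
    """Return False if an Asp 2 cross-strand contact would preclude this subsite.

    subsite: 3-bp subsite written 5' to 3'.
    neighbor_position2: one-letter residue at position 2 of the neighboring
    finger whose helix contacts the base pair at the subsite's 5' position.
    """
    five_prime = subsite.upper()[0]
    if neighbor_position2.upper() != "D":
        return True  # no Asp 2, so this particular restriction does not apply
    return five_prime in ASP2_COMPATIBLE


for site in ("GGG", "TGG", "AGG", "CGG"):
    print(site, "with Asp 2:", asp2_permits(site, "D"),
          "| with Gly 2:", asp2_permits(site, "G"))
```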

Multitarget Specificity Assay and Gel mobility shift analysis
The zinc finger-coding sequence was subcloned from pComb3H into a modified bacterial expression vector pMal-c2 (New England Biolabs) (35). After transformation into XL1-Blue (Stratagene), the zinc finger-maltose-binding protein (MBP) fusions were expressed by addition of 1 mM isopropyl β-D-thiogalactoside (IPTG). Freeze/thaw extracts of these bacterial cultures were applied in 1:2 serial dilutions to 96-well plates coated with streptavidin (Pierce), and were tested for DNA-binding specificity against each of the sixteen 5'-GAT ANN GCG-3' target sites.
Detection occurred by addition of alkaline phosphatase substrate (Sigma), and the OD405 was determined by a microtiter plate reader with SOFTMAX2.35 (Molecular Devices).
Gel shift analysis was performed with purified protein (Protein Fusion and Purification System, New England Biolabs) essentially as described (7).
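The sixteen target sites of the form 5'-GAT ANN GCG-3' used in the multitarget specificity assay can be enumerated mechanically. The sketch below is illustrative only; the function and parameter names are hypothetical, and the flanking triplets are taken from the target pattern stated above.

```python
from itertools import product

BASES = "ACGT"


def ann_targets(five_prime_flank: str = "GAT", three_prime_flank: str = "GCG") -> list[str]:
    """List all 16 sites 5'-<flank> ANN <flank>-3', written 5' to 3'."""
    return [
        f"{five_prime_flank} A{middle}{last} {three_prime_flank}"
        for middle, last in product(BASES, repeat=2)
    ]


targets = ann_targets()
for site in targets:
    print(f"5'-{site}-3'")
```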

Construction of polydactyl zinc finger proteins
Three-finger proteins were constructed by finger-2 stitchery using the SP1C framework as described (1). The proteins generated in this work contained helices recognizing 5'-GNN-3' DNA sequences (7), as well as 5'-ANN-3' and 5'-TAG-3' helices described here. Six-finger proteins were assembled via compatible XmaI and BsrFI restriction sites. Analysis of DNA-binding properties was performed using freeze/thaw extracts from IPTG-induced bacteria.
For the analysis of the capability of these proteins to regulate gene expression they were fused to the activation domain VP64 or repression domain KRAB of Kox-1 as described earlier ((1,2); VP64: tetrameric repeat of the herpes simplex virus VP16 minimal activation domain) and subcloned into pcDNA3 (Invitrogen) or the retroviral pMX-IRES-GFP vector ((36); IRES, internal ribosome-entry site; GFP, green fluorescent protein).

Computer modeling
Computer models were generated using InsightII (Molecular Simulations, Inc.

Library construction and selection
Selections of one of our previously reported phage display libraries for modular zinc finger domains that bind to 3 bp DNA sites containing 5' nucleotides other than guanine or thymine have met with no success (data not shown). This phage display library (7) [...] (8,23). The affinity of C7.GAT, measured by gel mobility shift analysis, was found to be relatively low, ~400 nM as compared to 0.5 nM for C7 (7). In addition, one domain was selected from this library against a finger-2 subsite 5'-TAG-3'. The amino acid sequence for this helix was identified as RED-N-LHT (Fig. 3z).
The most interesting observation was the selection of amino acid residues in position 6 of the α-helices since this residue typically specifies binding to the 5' nucleotide of the 3 bp subsite.
In contrast to recognition of a 5' guanine where a direct base contact is achieved by Arg or Lys in position 6 of the helix, no direct interaction has been observed in protein/DNA complexes for any other nucleotide in the 5' position (11-18). Selection of domains against finger-2 subsites of the type 5'-GNN-3' had previously generated domains containing only Arg 6, which directly contacts the 5' guanine (7). In analogy with guanine specification, one could assume that the recognition of 5' adenine could be achieved by certain amino acid residues in position 6 of the α-helix. However, unlike the results for 5'-GNN-3' zinc finger domains, selections of the phage display library against finger-2 subsites of the 5'-ANN-3' type identified domains containing a variety of amino acid residues: Ala 6, Arg 6, Asn 6, Asp 6, Lys 6, Glu 6, Thr 6 or Val 6 (Fig. 2).

Characterization of zinc finger domains that bind to the 5'-ANN-3' family of DNA sequences
Finger-2 variants of C7.GAT were subcloned into a bacterial expression vector as fusions with maltose-binding protein (MBP) and proteins were expressed by induction with 1 mM IPTG (35). Proteins were tested by enzyme-linked immunosorbent assay (ELISA) against each of the [...] (Fig. 3k) showed cross-reactivity for the middle nucleotide that was reduced by a Leu 5 to Thr 5 substitution (Fig. 3aa).

Generation of polydactyl proteins containing 5'-ANN-3' zinc finger domains
We have previously demonstrated that endogenous and transgene regulation can be achieved with 6-finger proteins containing zinc finger domains specifically recognizing 5'-[...] (Fig. 5a). Activation was specific and no regulation of the reporter containing 6x2C7-binding sites was observed (Fig. 5b). Further, transfection of a p2C7-VP64 expression construct (35) activated luciferase expression only when the promoter contained 6x2C7-binding sites (Fig. 5b), but not when the promoter contained the 5xAart-binding sites (Fig. 5a) [...] subjected to flow cytometry to analyze the expression levels of ErbB-2 and ErbB-3 (Fig. 6).

Discussion
Zinc finger proteins of the Cys2-His2 type have shown promise as versatile DNA-binding devices that would be essential components of a universal system for gene regulation (1-4,35).
Ideally, zinc finger proteins could be readily constructed to bind any DNA sequence; however, information regarding zinc finger/DNA interactions is constrained to just a few of the 64 possible 5'-NNN-3' DNA subsites. Structural analysis of several domains that may specify a 5' nucleotide other than guanine has not revealed any specific interaction from position 6 of the α-helix (11-18,22). Thus at present it is not possible to directly design zinc finger domains that specifically bind any given 3 bp DNA subsite.
While phage display selection coupled with refinement by site-directed mutagenesis has provided domains specifically recognizing each of the 16 DNA triplets of the 5'-GNN-3' type [...] (40). This contrasts with the present study, where Asn 6 was frequently selected in finger-2 domains for 5' adenine recognition (Fig. 2). These domains did indeed generally favor binding to a 5' adenine with some cross-reactivity to a 5' guanine as shown by multi-target ELISA (Fig. 3c, 3k, 3n, 3o, and 3t). Zinc finger/DNA recognition, as illustrated by these and many other examples, is more complex than a simple amino acid to base code.
Phage display selection of modular zinc finger domains that bind to subsites containing a 5' adenine or cytosine from our previously described finger-2 library based on the 3-finger protein C7 (7) failed due to the limitations imposed by Asp 2 of finger 3 of this protein which makes a cross-subsite contact to the nucleotide complementary to the 5' position of the finger-2 subsite (Fig. 1A). In the library reported here, this contact was eliminated by exchanging finger 3 of C7 with a domain lacking Asp 2 yielding C7.GAT (Fig. 1A). In most cases, novel 3-finger proteins that bound finger-2 subsites of the 5'-ANN-3' type were obtained. For subsites 5'-AGC-3' and 5'-ATC-3', however, no tight binding proteins were identified. This was not expected since domains that bind the subsites 5'-GGC-3' and 5'-GTC-3' were previously selected and shown to exhibit excellent DNA-binding specificity and affinity for their target sites (7). One potential explanation for this might be the limited randomization strategy used here based on VNS codons that do not encode for the aromatic amino acids. This limited randomization strategy was chosen since within the domains selected for 5'-GNN-3' recognition, no aromatic amino acid residues were selected even though they were included in the randomization strategy (7). Several naturally occurring zinc finger domains do indeed contain aromatic residues, for example finger 2 of CFII2 (VKD-Y-LTK; (21) (12)). It is tempting therefore to speculate that aromatic amino acid residues might be important for the recognition of subsites 5'-AGC-3' and 5'-ATC-3'. Alternatively, high-affinity interactions with these particular subsites might not be possible with these proteins but at this time we believe this to be an unlikely explanation.
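The consequence of restricting randomization to VNS codons can be checked directly against the standard genetic code. The sketch below expands the IUPAC degenerate codon VNS (V = A, C, or G; N = any base; S = C or G) and lists which amino acids it cannot encode. The helper names are hypothetical, and the codon table is the standard code written in the conventional TCAG order.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... (TCAG order).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC degenerate base symbols used in the VNS randomization scheme.
DEGENERATE = {"V": "ACG", "N": "ACGT", "S": "CG"}


def expand(pattern: str) -> list[str]:
    """Expand a degenerate codon pattern into all concrete codons."""
    return ["".join(c) for c in product(*(DEGENERATE[s] for s in pattern))]


vns_codons = expand("VNS")
encoded = {CODON_TABLE[c] for c in vns_codons}
missing = set(AMINO_ACIDS) - encoded  # includes "*" for stop codons

print(len(vns_codons), "codons encode", len(encoded), "amino acids")
print("not encoded:", sorted(missing))
```

Because every codon for phenylalanine, tyrosine, and tryptophan begins with T, excluding T from the first codon position removes all three aromatic residues; cysteine and the three stop codons are excluded for the same reason.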
In recent years it has become clear that the recognition helix of Cys2-His2 zinc finger domains can adopt different orientations relative to DNA in order to achieve optimal binding (26). However, the orientation of the helix in this region may be partially restricted by interactions involving the zinc ion, His 7, and the phosphate backbone since these interactions are frequently observed in structural studies (Fig. 7a). Comparative studies of zinc finger/DNA complexes have led to the conclusion that the Cα atom of position 6 is usually 8.8 ± 0.8 Å away from the nearest heavy atom of the 5' nucleotide in the DNA subsite. This distance is most readily bridged by the long side chains presented by the amino acids Arg 6 or Lys 6 that most typically provide for 5' guanine specification (26). [...] (14).
In the present study, eight different amino acid residues were selected at position 6 of finger-2 of the C7.GAT library for recognition of DNA subsites of the 5'-ANN-3' type: Ala 6, Arg 6, Asn 6, Asp 6, Glu 6, Lys 6, Thr 6, and Val 6 (Fig. 2). Selection of a wide range of residues at this position is consistent with studies from other laboratories where positions within adjacent fingers were randomized (28,29). These studies selected amino acid residues Tyr, Val, Thr, Asn, Lys, Glu and Leu, as well as Gly, Ser and Arg, but not Ala, for 5' adenine recognition. In addition, a sequential phage display selection strategy identified several 5'-ANN-3'-binding fingers and evaluated their specificity using target site selections. Arg, Ala, and Thr in position 6 of the helix were demonstrated to predominantly specify 5' adenine recognition (31). Further, Thr 6 was identified by target site selections of finger 5 of Gfi-1 (QSS-N-LIT) that binds the subsite 5'-AAA-3' to specify a 5' adenine (19). In combination with the data presented here there appears to be a complex but nonrandom relationship between the amino acid residue in position 6 and 5' [...] Interestingly, the expected lack of 5' specification by amino acid residues that present short side chains in position 6 of the α-helix is only partially supported by the binding data.
While helices such as RRD-A-LNV (Fig. 3m) and the finger-2 helix RSD-H-LTT of C7.GAT ( Fig. 1B) did indeed show essentially no 5' specificity, helix DSG-N-LRV (Fig. 3b) displayed excellent specificity for a 5' adenine, while TSH-G-LTT (Fig. 3v) was specific for 5' adenine or guanine. Other helices with position-6 residues of this type displayed varying degrees of 5' specificity, but typically excluded 5' thymine recognition (Fig. 3). Since it is unlikely that the position-6 residue makes a direct base contact, the observed binding patterns must result from other binding mechanisms. Possibilities include the involvement of bound water, local sequence-specific DNA structure changes, and overlapping interactions from neighboring domains. The latter possibility is disfavored, however, because the residue in position 2 of finger 3 (which is frequently observed to contact the neighboring site) is glycine in the parental protein C7.GAT, and because 5' thymine was not excluded by RRD-A-LNV (Fig. 3m) or RSD-H-LTT (Fig. 1B).
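Recognition helices in this paper are written as seven residues in the form XXX-X-XXX. The sketch below maps such a string onto helix positions -1, 1, 2, 3, 4, 5, and 6. This mapping is an inference from the text (for example, SPA-D-LTR is described as carrying Ser -1, Ala 2, and Arg 6, and leucine is described as typical at position 4), and the function name is hypothetical.

```python
# Helix positions assumed to correspond to the seven residues of the
# XXX-X-XXX notation (inferred from the text, not stated explicitly).
POSITIONS = (-1, 1, 2, 3, 4, 5, 6)


def parse_helix(helix: str) -> dict[int, str]:
    """Map a helix such as 'SPA-D-LTR' to {position: one-letter residue}."""
    residues = helix.replace("-", "").upper()
    if len(residues) != len(POSITIONS):
        raise ValueError(f"expected 7 residues, got {len(residues)}: {helix!r}")
    return dict(zip(POSITIONS, residues))


# Helices discussed in the paragraph above, all with short side chains at position 6.
for helix in ("RRD-A-LNV", "RSD-H-LTT", "DSG-N-LRV", "TSH-G-LTT"):
    positions = parse_helix(helix)
    print(helix, "| position 6:", positions[6], "| position 4:", positions[4])
```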
However, Asn 6 also seemed to impart specificity for both adenine and guanine in some cases (Fig. 3n, 3o and 3t) suggesting an interaction with the N7 common to both nucleotides. To further study this question, finger-2 mutants containing Asn 6 to Gln 6 amino acid exchanges were constructed for pAAG (Fig. 3o), pAGG (Fig. 3t), and pATT (Fig. 3n). Analysis of these proteins in multi-target ELISA studies showed a shift in the recognition of the 5' nucleotide from adenine towards guanine and cytosine ( Fig. 3x and 3y). The mutant pmATT failed to bind DNA at all (data not shown), while the parental protein containing Asn 6 showed excellent binding specificity for its target site. These results indicate that the longer side chain of Gln 6 may sterically interfere with the binding of the protein, explaining why this residue was not selected by phage display.
The final position 6 residue to be considered is Arg 6 . It was somewhat surprising that Arg 6 was selected so frequently on 5'-ANN-3' targets since in previous studies it was unanimously selected to recognize a 5' guanine with high specificity (7). However, in the present study Arg 6 primarily specified 5' adenine ( Fig. 3e, f, h and s), with cross reactivity to a 5' guanine in some cases ( Fig. 3q and 3r). Computer modeling of a helix binding to 5'-ACA-3' (SPA-D-LTR; Fig. 3e) based on the coordinates of finger 1 (QSG-S-LTR) of a Zif268 variant bound to 5'-GCA-3' (18), suggested that Arg 6 could easily adopt a configuration that allowed it to make a cross-strand hydrogen bond to O4 of a thymine base-paired to the 5' adenine ( Fig. 7c and d). In fact, Arg 6 could bind with good geometry to both the O4 of thymine and the O6 of a guanine base-paired to a middle cytosine. Such an interaction is consistent with the fact that Arg 6 was selected almost unanimously when the target sequence was 5'-ACN-3'. The notion that arginine can facilitate multiple interactions is compelling. Several lysines in TFIIIA were observed by NMR to be conformationally flexible (41), and Gln -1 also behaves in a manner which suggests flexibility (8). Arginine has more rotatable bonds and more hydrogen bonding potential than lysine or glutamine and it is attractive to speculate that Arg 6 is not limited to specification of only a 5' guanine.
The amino acid residues selected in positions -1 and 3 in the present study were typically analogous to those identified in their 5'-GNN-3' binding counterparts (7) with two exceptions.
Ser -1 was selected for pACA, recognizing a 3' adenine (Fig. 3e and 3q) and His -1 was selected for pAGT and pATT, recognizing a 3' thymine (Fig. 3k, 3n). While Gln -1 was frequently used to specify a 3' adenine in subsites of the 5'-GNN-3' type, a new element of 3' adenine recognition is suggested in the present study involving Ser -1 selected in domains recognizing the 5'-ACA-3' subsite (Fig. 2). Computer modeling was used to study the interactions of this helix with DNA.
Models suggested that Ala 2, co-selected in the helix SPA-D-LTR (Fig. 3e), can potentially make a van der Waals contact with the methyl group of the thymine base-paired to 3' adenine (Fig. 7b and c) and that Ser -1 potentially makes a hydrogen bond with the 3' adenine (Fig. 7b).
Additional evidence that Ala 2 might also be directly involved in specification of the binding site of this protein is that helix SPA-D-LTR (Fig. 3e) is strongly specific for 3' adenine while SHS-D-LVR (Fig. 3q) is not. Gln -1 is often sufficient for 3' adenine recognition. However, data from our previous studies suggested that the side chain of Gln -1 can adopt multiple conformations, enabling, for example, recognition of 3' thymine (8,18,42). It is therefore intriguing to speculate that Ala 2 in combination with Ser -1 may provide an alternative means for specifying a 3' adenine.
Another interaction not observed in our 5'-GNN-3' study is the cooperative recognition [...] (Fig. 3n). Asn 2 in this helix has the potential not only to hydrogen bond with 3' thymine but also with the adenine base-paired to it (Fig. 7e and f). His -1 was also found within the helix selected to bind 5'-AGT-3' (HRT-T-LLN; Fig. 3k) in combination with a Thr 2.
Residue Thr 2 might be involved in a similar recognition mechanism as Ser 2 .
The examples discussed above demonstrate that it is difficult at the present time to understand why amino acid residues like alanine, valine, or threonine in position 6 of the α-helix assist in the recognition of a 5' adenine. These helices may simply not sterically exclude an adenine in the 5' position of the triplet. It is reasonable to consider that the surrounding domains, as well as bound water molecules and local DNA structure, might influence the DNA-binding properties of these domains, as has been discussed in detail (24-26,43). However, the domains characterized in this study that contain Asn 6 and Arg 6 more likely specify 5' adenine recognition by direct interaction with the nucleotide, as discussed above.

[Figure panels not recoverable from the extracted text. Surviving axis labels: bases A, C, G, T grouped under AAx, ACx, AGx, and ATx; panel labels xNN (four panels), xNG, and xNT.]