Structure-Function Analysis of the THAP Zinc Finger of THAP1, a Large C2CH DNA-binding Module Linked to Rb/E2F Pathways

THAP1, the founding member of a previously uncharacterized large family of cellular proteins (THAP proteins), is a sequence-specific DNA-binding factor that has recently been shown to regulate cell proliferation through modulation of pRb/E2F cell cycle target genes. THAP1 shares its DNA-binding THAP zinc finger domain with Drosophila P element transposase, zebrafish E2F6, and several nematode proteins interacting genetically with the retinoblastoma protein pRb. In this study, we report the three-dimensional structure and structure-function relationships of the THAP zinc finger of human THAP1. Deletion mutagenesis and multidimensional NMR spectroscopy revealed that the THAP domain of THAP1 is an atypical zinc finger of ∼80 residues, distinguished by the presence between the C2CH zinc coordinating residues of a short antiparallel β-sheet interspersed by a long loop-helix-loop insertion. Alanine scanning mutagenesis of this loop-helix-loop motif resulted in the identification of a number of critical residues for DNA recognition. NMR chemical shift perturbation analysis was used to further characterize the residues involved in DNA binding. The combination of the mutagenesis and NMR data allowed the mapping of the DNA binding interface of the THAP zinc finger to a highly positively charged area harboring multiple lysine and arginine residues. Together, these data represent the first structure-function analysis of a functional THAP domain, with demonstrated sequence-specific DNA binding activity. They also provide a structural framework for understanding DNA recognition by this atypical zinc finger, which defines a novel family of cellular factors linked to cell proliferation and pRb/E2F cell cycle pathways in humans, fish, and nematodes.

Zinc finger proteins represent the most abundant class of DNA-binding proteins in the human genome. Zinc fingers have been defined as small, functional, independently folded domains that require coordination of a zinc atom to stabilize their structure (1). The zinc finger superfamily includes the C2H2-type zinc finger, a compact ∼30-amino acid DNA-binding module repeated in multiple copies in the protein structure (2,3), the C4-type zinc finger found in the GATA family of transcription factors (4), and the zinc-coordinating DNA-binding domain of nuclear hormone receptors (5). We recently described an atypical zinc finger motif, characterized by a large C2CH module (Cys-X2-4-Cys-X35-53-Cys-X2-His) with a spacing of up to 53 amino acids between the zinc-coordinating C2 and CH residues (6). This motif, designated the THAP domain or THAP zinc finger, defines a previously uncharacterized large family of cellular factors with more than 100 distinct members in the animal kingdom (6,7). We showed that the THAP domain of THAP1, the prototype of the THAP family (8), possesses zinc-dependent sequence-specific DNA binding activity and recognizes an 11-nucleotide consensus DNA target sequence (THABS, for THAP1 binding sequence) (7), considerably larger than the 3-4-nucleotide motifs typically recognized by classical C2H2 zinc fingers (2,7). Interestingly, the consensus C2CH signature of the THAP domain was identified in the sequence-specific DNA-binding domain of Drosophila P element transposase, suggesting that the THAP zinc finger constitutes a novel example of a DNA-binding domain shared between cellular proteins and transposons, which are mobile genomic parasites (6,9).
Although the biological roles of cellular THAP proteins remain largely unknown, recent data support an important function in cell proliferation and cell cycle control. We found that human THAP1 is an endogenous physiological regulator of endothelial cell proliferation and G1/S cell cycle progression that modulates expression of several pRb/E2F cell cycle target genes. In addition, we identified RRM1, a G1/S-regulated gene required for S-phase DNA synthesis, as a direct transcriptional target of endogenous THAP1 (10). These data provided the first links in mammals between THAP proteins, cell proliferation, and pRb/E2F cell cycle pathways, and complemented genetic data previously obtained in model organisms. Indeed, in zebrafish and other fish species, the ortholog of the cell cycle transcription factor E2F6, a repressor of E2F-dependent transcription during S phase (11), was found to contain a THAP zinc finger at its N terminus (7). In the nematode Caenorhabditis elegans, five distinct THAP zinc finger proteins (LIN-36, LIN-15B, LIN-15A, HIM-17, and GON-14) (7) were shown to interact genetically with LIN-35/Rb, the sole C. elegans retinoblastoma homolog (12-16). Among these, GON-14 appeared to function as a positive regulator of cell proliferation, because cell division defects were observed in the intestine, gonad, and vulva of gon-14 null mutants (16). In contrast, LIN-36 and LIN-15B, initially characterized for their role in the specification of vulval cell fates (synthetic Multivulva class B genes, synMuvB) (12,13), were found to function as inhibitors of the G1/S cell cycle transition (14). LIN-36 behaved most similarly to LIN-35/Rb and EFL-1/E2F, the ortholog of the mammalian cell cycle transcription factors E2F4/5, and was therefore proposed to act in a transcriptional repressor complex with these factors to repress G1/S control genes (14,17,18).
However, LIN-36, LIN-15B, and THAP1 were not found in the evolutionarily conserved pRb/E2F protein complexes (DREAM or DRM complexes) recently described in Drosophila, C. elegans, and human cells, which contain pRb/p130, E2F4/5, DP, and five other synMuvB gene products (LIN-9, LIN-37, LIN-52, LIN-53, and LIN-54) (19-22). This suggests that THAP zinc finger proteins may function in distinct transcriptional regulatory complexes to regulate E2F target genes. Although not associated with Rb complexes, THAP zinc finger proteins may still act at the level of chromatin regulation, because several C. elegans THAP family members interact genetically with components of diverse chromatin-modifying and/or chromatin-remodeling complexes, including members of the Nucleosome Remodeling and Deacetylase (NuRD) complexes and components of the Tip60/NuA4 histone acetyltransferase complex (12-16, 23, 24). In addition, the human THAP7 protein has been shown to interact with chromatin-modifying enzymes (25). Together, these observations indicate that, in humans and in model organisms alike, THAP zinc finger proteins appear to be critical regulators of cell proliferation and cell cycle progression that likely act at the level of chromatin regulation.
Solution structures of the THAP domains from two previously uncharacterized proteins, human THAP2 and C. elegans CtBP, have recently been reported (THAP2, PDB code 2D8R; CtBP, PDB code 2JM3 (26)). However, sequence-specific DNA binding has not yet been demonstrated for either domain. Here, we report the first structure-function analysis of a functional THAP domain, the THAP zinc finger of human THAP1. The three-dimensional structure of the domain was determined by multidimensional NMR spectroscopy, and its DNA binding interface was characterized by a combination of alanine scanning mutagenesis and NMR chemical shift perturbation analysis. Together, these data provide a better understanding of the structure-function relationships of this atypical zinc finger.
Protein Expression and Purification-For NMR experiments, recombinant THAP domains of human THAP1 (Met1-Lys90 and Met1-Phe81) were produced as His-tagged fusion proteins in Escherichia coli BL21(DE3). To obtain unlabeled samples, cells were grown in LB medium at 37 °C to an A600 of 0.8 before induction with 1 mM isopropyl-1-thio-β-D-galactopyranoside; ZnCl2 was added at this step (final concentration, 0.1 mM). The 15N/13C-labeled THAP domain (Met1-Lys90) and the 15N-labeled THAP domain (Met1-Phe81) were expressed in minimal (M9) medium containing 15NH4Cl and 15N Celtone and either [13C]glucose or [12C]glucose, respectively. Proteins were purified on a Ni-NTA column (HiTrap, Amersham Biosciences) followed by gel-filtration chromatography on Sephadex G75 (Amersham Biosciences). After digestion with thrombin (Novagen), proteins were further purified on a gel filtration column. NMR samples were concentrated to 0.4-1.7 mM in 50 mM deuterated Tris-HCl, pH 6.8, 1 mM DTT, containing either 10 mM NaCl (for protein structure determination) or 250 mM NaCl (for DNA binding studies).
For gel shift assays, the recombinant THAP domain of human THAP1 was produced as previously described (7). The recombinant THAP domains of THAP2, THAP3, Ce-CtBP, and GON-14 were produced in E. coli strain BL21trxB(DE3) transformed with the corresponding pET-21c-THAP domain expression vectors. Protein expression and purification were performed according to the manufacturer's instructions (Novagen, Madison, WI), as previously described (7). The purity of the THAP domains was assessed by SDS-PAGE, and protein concentrations were determined using the Bradford protein assay (Bio-Rad). Full-length wild-type and mutant THAP1 proteins were synthesized in vitro in rabbit reticulocyte lysate (RRL) from the corresponding pcDNA3 expression vectors using the TNT-T7 kit (Promega). Protein production was performed in the presence of [35S]methionine and verified by SDS-PAGE and autoradiography.
Surface Plasmon Resonance Experiments-DNA interaction kinetics were investigated by surface plasmon resonance (SPR) using a four-channel BIAcore 3000 optical biosensor (BIAcore AB, Uppsala, Sweden). Biotinylated single-stranded DNA probes were immobilized on a streptavidin-coated sensor chip (BIAcore SA sensor chip) in HBS-EP buffer (10 mM Hepes, pH 7.4, 150 mM NaCl, 3 mM EDTA, 0.0005% Surfactant P20) (BIAcore AB). All immobilization steps were performed at a final DNA concentration of 100 ng/ml and a flow rate of 2 µl/min. Complementary DNA strands were hybridized in HBS-EP buffer supplemented with 200 mM NaCl. Biotinylated oligonucleotides and complementary DNA strands were purchased from MWG Biotech.
Binding analyses were performed with multiple injections of the THAP domain (Met1-Lys90) at different protein concentrations over the immobilized surfaces at 25 °C. A second DNA probe with an unrelated sequence was used as a control. All samples were diluted in running buffer (50 mM Tris, pH 6.8, 250 mM NaCl, 1 mM DTT) and injected over the sensor surface for 4 min at a flow rate of 20 µl/min. No binding was observed to the control probe. The SPR signal was therefore analyzed as difference sensorgrams between the two DNA sequences immobilized on separate channels of the sensor chip (the signal from the unrelated control DNA was subtracted).
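The reference subtraction described above can be sketched as a point-by-point difference between the two channels. The text does not state which kinetic model was fitted to the difference sensorgrams; the 1:1 Langmuir association model below is an assumption (a common default for a single binding site), and the function names and rate constants are hypothetical.

```python
import math
from typing import List


def subtract_reference(specific: List[float], reference: List[float]) -> List[float]:
    """Difference sensorgram: specific (THABS) channel minus control channel, point by point."""
    if len(specific) != len(reference):
        raise ValueError("both channels must be sampled at the same time points")
    return [s - r for s, r in zip(specific, reference)]


def langmuir_association(t_s: float, conc_m: float, ka: float, kd: float, rmax: float) -> float:
    """Response (RU) at time t_s (s) during injection for a 1:1 model A + B <-> AB.

    ka in 1/(M*s), kd in 1/s, conc_m in M. The plateau equals
    rmax * C / (C + KD) with KD = kd / ka.
    """
    k_obs = ka * conc_m + kd                 # observed association rate constant
    r_eq = rmax * ka * conc_m / k_obs        # steady-state response at this concentration
    return r_eq * (1.0 - math.exp(-k_obs * t_s))
```

With hypothetical values ka = 1e5 M⁻¹s⁻¹ and kd = 1e-3 s⁻¹ (KD = 10 nM), a 100 nM injection would plateau at about 91% of Rmax, which is why several concentrations are injected to define the binding curve.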
NMR Spectroscopy-NMR experiments were recorded at 296 K on Bruker Avance 800, 700, and 600 MHz spectrometers. Backbone and side-chain resonances (1H, 15N, and 13C) of the THAP domain (Met1-Lys90) were assigned using a set of heteronuclear experiments (27). These assignments were partially transferred to assign the 1H and 15N resonances of the THAP domain (Met1-Phe81), using a combination of homonuclear and 15N heteronuclear spectra. φ/ψ torsion angles were derived using TALOS (28) and the 3J(HN-Hα) coupling constants obtained from a three-dimensional HNHA experiment (29). Four X-Pro peptide bonds were identified as trans on the basis of strong NOEs between the Hδ protons of each Pro residue and the Hα proton of the preceding residue (30), and this was confirmed by 13C chemical shifts (31). From characteristic NOEs, Pro26 was identified as cis (30). Fifteen stereospecific assignments of Hβ methylene protons were obtained from the DQF-COSY spectrum (32). NMR data were processed using TOPSPIN (Bruker) and NMRPipe (33) and analyzed using XEASY (34) and NMRView (35). 15N relaxation data were recorded at 296 K on a 0.9 mM protein sample at 10 mM and 250 mM NaCl using standard pulse sequences. Heteronuclear NOEs were determined from two 15N HSQC spectra recorded with and without a 3-s 1H presaturation period, with a recycling delay of 5 s (36).
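Two of the quantities above are simple to compute. The sketch below, a minimal illustration rather than the processing actually used, takes the heteronuclear NOE as the ratio of peak intensities with and without 1H presaturation, and evaluates a Karplus relation linking 3J(HN-Hα) to the backbone φ angle. The Karplus coefficients (6.51, -1.76, 1.60 Hz, with θ = φ - 60°) are a commonly used HNHA calibration and are an assumption here; the text does not say which parameterization was applied, and TALOS itself predicts φ/ψ from chemical shifts.

```python
import math


def heteronuclear_noe(i_sat: float, i_unsat: float) -> float:
    """{1H}-15N heteronuclear NOE: peak intensity with 1H presaturation / without.

    Values around 0.8 indicate restricted backbone motion; low or negative values
    indicate flexible segments with ps-ns mobility, such as an unstructured tail.
    """
    if i_unsat == 0:
        raise ValueError("reference (unsaturated) intensity must be non-zero")
    return i_sat / i_unsat


def karplus_3j_hn_ha(phi_deg: float, a: float = 6.51, b: float = -1.76, c: float = 1.60) -> float:
    """3J(HN-Halpha) in Hz for backbone angle phi (degrees), using theta = phi - 60.

    The default coefficients are an assumed HNHA calibration, not values from this study.
    """
    cos_t = math.cos(math.radians(phi_deg - 60.0))
    return a * cos_t ** 2 + b * cos_t + c
```

With these defaults, an α-helical φ of -60° gives about 4.1 Hz and a β-strand φ of -120° about 9.9 Hz, which is why small and large coupling constants help distinguish helical from extended residues.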
Structure Calculations-To solve the structure of the THAP domain (Met1-Phe81), distance restraints were extracted from integration of two-dimensional 1H and three-dimensional 15N-edited NOESY spectra. Secondary structure elements were derived from analysis of coupling constants, identification of slowly exchanging amide protons, and characteristic NOEs. To maintain well-defined secondary structure elements, hydrogen bond restraints were added, with bounds of 1.8-2.4 Å on the hydrogen-acceptor oxygen distance and 2.3-3.2 Å on the donor nitrogen-acceptor oxygen distance.
Preliminary structure calculations run with either Nδ1 or Nε2 of the His57 ring as the zinc ligand identified Nε2 as the zinc-bound atom. Subsequent structural refinement included a zinc ion together with restraints defining tetrahedral coordination (3,37). Structures were calculated with a torsion angle dynamics simulated annealing protocol in the CNS software suite (38). From 500 initial structures, 20 were accepted on the basis of the following criteria: low total energy, no distance violation larger than 0.2 Å, and no torsion angle violation greater than 2°. Their structural quality was analyzed using PROCHECK (39).
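The acceptance criteria above can be written as a simple filter. This sketch assumes that each calculated model has already been summarized by its total energy and its largest distance and torsion violations (the `Model` record is hypothetical, not a CNS output format), that a model is accepted only if it passes both violation thresholds, and that the 20 lowest-energy survivors are kept. It also checks a single hydrogen-bond restraint against the bounds quoted in the Structure Calculations paragraph.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec = Tuple[float, float, float]


@dataclass
class Model:
    name: str
    total_energy: float            # kcal/mol, lower is better
    max_distance_violation: float  # Angstrom
    max_torsion_violation: float   # degrees


def select_models(models: List[Model], n: int = 20,
                  max_dist_viol: float = 0.2, max_tors_viol: float = 2.0) -> List[Model]:
    """Drop models violating either threshold, then keep the n lowest-energy ones."""
    accepted = [m for m in models
                if m.max_distance_violation <= max_dist_viol
                and m.max_torsion_violation <= max_tors_viol]
    return sorted(accepted, key=lambda m: m.total_energy)[:n]


def hbond_restraint_satisfied(h: Vec, n: Vec, o: Vec,
                              ho_bounds: Tuple[float, float] = (1.8, 2.4),
                              no_bounds: Tuple[float, float] = (2.3, 3.2)) -> bool:
    """True if both the H...O and N...O distances (Angstrom) lie within their bounds."""
    d_ho = math.dist(h, o)
    d_no = math.dist(n, o)
    return ho_bounds[0] <= d_ho <= ho_bounds[1] and no_bounds[0] <= d_no <= no_bounds[1]
```

Filtering before ranking means a low-energy model with a 0.3 Å violation is never accepted, which matches the reading that all three criteria must hold at once.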
NMR Chemical Shift Perturbation Analysis-The 14-bp duplex DNA containing THABS was reconstituted by hybridizing the oligonucleotides 5′-CAAGTATGGGCAAG-3′ and 5′-CTTGCCCATACTTG-3′ in a 1:1 ratio. For the NMR titration, two-dimensional 15N HSQC spectra of the THAP domain (0.4 mM) were collected at 296 K and 250 mM NaCl after each incremental addition of lyophilized DNA. One-dimensional 1H spectra were recorded after each DNA addition, and the DNA/protein ratio was followed by integrating the protein and DNA signals in these spectra. A DNA fragment with an unrelated sequence was reconstituted by hybridizing the oligonucleotides 5′-GATTTGCATTTTAA-3′ and 5′-TTAAAATGCAAATC-3′ and added to the THAP domain following the same procedure. Normalized chemical shift changes were calculated as:

$$\Delta\delta = \left[(\Delta\delta_{\mathrm{HN}})^2 + (0.154\,\Delta\delta_{\mathrm{N}})^2\right]^{1/2}$$
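The normalized shift change above is straightforward to compute per residue, and the two oligonucleotide pairs can be checked for perfect complementarity. This is an illustrative sketch; the helper names are hypothetical.

```python
import math

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA strand written 5' -> 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def combined_shift_change(d_hn_ppm: float, d_n_ppm: float, n_scale: float = 0.154) -> float:
    """Normalized amide shift change in ppm: sqrt(d_HN^2 + (0.154 * d_N)^2)."""
    return math.hypot(d_hn_ppm, n_scale * d_n_ppm)


# Each strand pair listed in the text should form a perfect 14-bp duplex.
for top, bottom in [("CAAGTATGGGCAAG", "CTTGCCCATACTTG"),
                    ("GATTTGCATTTTAA", "TTAAAATGCAAATC")]:
    print(top, "pairs with", bottom, ":", reverse_complement(top) == bottom)
```

The 0.154 factor down-weights the 15N shift so that 1H and 15N changes contribute on a comparable scale; residues whose combined change exceeds a cutoff (0.2 ppm in the docking protocol) are then flagged.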
Model Building of the Complex between the THAP Zinc Finger and the THABS Sequence-Computational docking of the THABS DNA target onto the THAP zinc finger (Met1-Phe81) of THAP1 was performed using HADDOCK 1.3 (High Ambiguity Driven DOCKing) (40) in conjunction with CNS (38). Docking used an ensemble of THAP1 THAP zinc finger structures and the 14-bp double-stranded DNA containing THABS, built as a B-DNA template using Insight II (Accelrys). The active residues used to define the ambiguous interaction restraints (AIRs) were residues with a relative solvent accessibility higher than 40% (calculated with NACCESS; Hubbard and Thornton, University College London) that either displayed a chemical shift perturbation higher than 0.2 ppm upon DNA binding or caused loss of DNA binding when mutated. Accordingly, the active residues were Lys24, Glu37, Arg42, Lys46, and Thr48. Solvent-accessible residues that were surface neighbors of the active residues were defined as passive residues: Lys34, Glu35, Ala38, Arg41, Lys43, Asn44, Lys49, and Tyr50.
For the DNA, the bases of the core GGCA motif of the THABS sequence, together with the upstream thymine, were defined as active residues, based on previously reported scanning mutagenesis data (7).
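The residue-selection rule for the ambiguous interaction restraints can be expressed as follows. This is a hypothetical sketch: the inputs (relative accessibility in percent, shift perturbation in ppm, the set of residues whose mutation abolished binding, and a map of surface neighbors) are assumed to be computed beforehand, and applying the same 40% accessibility cutoff to passive residues is an assumption, since the text only describes them as solvent accessible.

```python
from typing import Dict, List, Set


def select_active_residues(rel_access: Dict[int, float], csp: Dict[int, float],
                           mutation_loss: Set[int], min_access: float = 40.0,
                           min_csp: float = 0.2) -> List[int]:
    """Active: accessibility > min_access (%) AND (CSP > min_csp ppm OR mutation abolishes binding)."""
    active = []
    for res, acc in rel_access.items():
        if acc <= min_access:
            continue  # buried residues cannot contact DNA directly
        if csp.get(res, 0.0) > min_csp or res in mutation_loss:
            active.append(res)
    return sorted(active)


def select_passive_residues(active: List[int], rel_access: Dict[int, float],
                            surface_neighbors: Dict[int, Set[int]],
                            min_access: float = 40.0) -> List[int]:
    """Passive: accessible surface neighbors of active residues that are not active themselves."""
    active_set = set(active)
    passive: Set[int] = set()
    for res in active:
        for nb in surface_neighbors.get(res, set()):
            if nb not in active_set and rel_access.get(nb, 0.0) > min_access:
                passive.add(nb)
    return sorted(passive)
```

Requiring surface exposure for every active residue is what separates residues such as Arg29 and Phe45, whose mutation likely disrupts the hydrophobic core, from residues that can plausibly touch the DNA.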
Docking was performed on an SGI cluster equipped with 40 processors. Starting from 15 THAP1 THAP domain structures and a model of the THABS target in B-DNA conformation, 1000 rigid-body solutions were generated. The best 200 solutions according to the HADDOCK rigid-body score were selected for semi-flexible refinement in torsion angle space, and these 200 structures were finally refined in explicit water. The final ensemble of 200 solutions was analyzed and clustered based on a pairwise r.m.s.d. matrix calculated over the backbone atoms.
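Clustering from a pairwise r.m.s.d. matrix can be done greedily: take the model with the most neighbors within a cutoff as a cluster center, assign those neighbors to its cluster, and repeat on the remaining models. The sketch below illustrates this neighbor-count approach; the cutoff and minimum cluster size are hypothetical parameters, and it is not presented as HADDOCK's own implementation.

```python
from typing import List, Sequence


def greedy_cluster(rmsd: Sequence[Sequence[float]], cutoff: float,
                   min_size: int = 4) -> List[List[int]]:
    """Cluster models from a symmetric pairwise RMSD matrix (Angstrom).

    Repeatedly picks the unassigned model with the most unassigned neighbors
    within `cutoff`, makes it and those neighbors a cluster, and removes them.
    Stops when the best remaining group is smaller than `min_size`.
    """
    remaining = set(range(len(rmsd)))
    clusters: List[List[int]] = []
    while remaining:
        # neighbor lists restricted to models that are still unassigned
        neighbors = {i: [j for j in remaining if j != i and rmsd[i][j] <= cutoff]
                     for i in remaining}
        # ties broken by the lowest model index, for reproducibility
        center = max(sorted(remaining), key=lambda i: len(neighbors[i]))
        members = [center] + sorted(neighbors[center])
        if len(members) < min_size:
            break
        clusters.append(members)
        remaining -= set(members)
    return clusters
```

Because the largest cluster is extracted first, cluster order reflects population; in practice clusters are then ranked by their average docking score.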

Biophysical Characterization of the THAP Domain of THAP1 and Identification of a Shorter Functional Fragment-The THAP domain was originally assigned to the first 90 N-terminal residues of THAP1 (6). A corresponding fragment was initially expressed and purified for surface plasmon resonance (SPR) and NMR studies. Inductively coupled plasma-mass spectrometry (ICP-MS) experiments indicated that the domain contains a zinc ion (data not shown). SPR experiments (Fig. 1, A and B) showed that the THAP zinc finger of THAP1 binds to a 14-bp THAP domain binding site (THABS DNA probe) that includes the GGCA core motif at positions 7-10, previously found to be critical for recognition by the THAP domain of THAP1 (7).
A series of triple-resonance NMR experiments allowed us to unambiguously assign residues 3-63 and 74-77 of the domain. However, residues 64-73 and 78-90 in the C-terminal tail could only be partially assigned because of a severe lack of sequential connectivities. These data, together with characteristic negative heteronuclear NOE values (data not shown), indicated that the C-terminal tail of the domain is unstructured. These results led us to search for the minimal functional THAP zinc finger. The cysteine residues at the N terminus of the THAP domain (THAP1 Cys5 and Cys10) have previously been shown to be required for the functional activity of the domain, for both human THAP1 (7) and Drosophila P element transposase (41), thus defining the N-terminal boundary of the domain. In contrast, nothing was known about the requirements at the C terminus downstream of the conserved AVPTIF motif (6), which contains the essential Pro78 residue (7). Alanine scanning mutagenesis was therefore performed and revealed that residues 82-90 are not required for the DNA binding activity of the THAP domain of THAP1 (data not shown). This was confirmed using a THAP1 deletion mutant, THAP1Δ82-90, which exhibited activity similar to that of wild-type THAP1 in EMSA experiments (data not shown). In addition, the recombinant THAP domains (Met1-Phe81) and (Met1-Lys90) exhibited similar activities in EMSA experiments, indicating that residues 1-81 are sufficient for sequence-specific DNA binding to the THABS motif (Fig. 1C). A shorter deletion mutant (Met1-Phe63) was also tested by NMR and EMSA, but its HSQC spectrum corresponded to that of an unfolded protein and the fragment had no DNA binding activity (data not shown). Therefore, the recombinant THAP domain (Met1-Phe81) was selected for all structural studies.

FEBRUARY 15, 2008 • VOLUME 283 • NUMBER 7

JOURNAL OF BIOLOGICAL CHEMISTRY 4355
NMR Solution Structure of the THAP Zinc Finger of Human THAP1-We solved the three-dimensional solution structure of the zinc-bound THAP domain (Met1-Phe81) using distance restraints extracted from two- and three-dimensional NOESY spectra. For the DNA-free THAP domain, spectra were recorded in a buffer containing 10 mM NaCl. Under these conditions, 1539 distance restraints obtained from two-dimensional 1H NOESY and three-dimensional 15N HSQC-NOESY spectra, together with 104 angle restraints, were used in the calculations (Table 1).
The core of the THAP zinc finger of THAP1 adopts a βαβ fold (Fig. 2, A-D). Residues Cys5, Cys10, Cys54, and His57 form a single zinc-binding site. The chain begins with a long loop L1 (residues Gln3-Ser21), which precedes the first β-strand (β1; residues Phe22-Lys24). This portion is followed by a second loop L2 (Phe25-Lys32), which continues into an α-helix H1 (residues Cys33-Val40). An additional loop L3 (residues Arg41-Ser51) is followed by the second β-strand (β2; residues Ser52-Cys54), which runs antiparallel to β1. The zinc-binding site is completed by a short 3₁₀ helix (H2; residues Ser55-His57). A second 3₁₀ helix (H3; residues Pro60-Phe63) is followed by a flexible loop L4 (Lys64-Asn68) that continues into an extended region (Asn69-Pro78), followed by an additional 3₁₀ helix (H4; residues Thr79-Phe81) that forms part of the AVPTIF motif (6). Some regions of the loops are relatively well defined (L1: 6-16; L2: 26-32; L3: 46-50; L4: 69-78), with restricted mobility as judged by the heteronuclear NOEs, whereas the other parts of the loops are less ordered and display mobility on the ps-ns time scale (Fig. 2C and supplemental Fig. S1). In particular, the beginning of loop L4 displays a high degree of structural disorder, although its heteronuclear NOEs are not much lower than those of loop L1. This disorder reflects the scarcity of NOEs involving this region, despite an extensive search of the NOESY maps.
Only a few amide protons are protected, as observed in H2O/D2O exchange experiments. They mainly correspond to residues located within helix H1 (Glu35, Trp36, Glu37, Ala39, Val40) and the two-stranded β-sheet (Ile53, Cys54), as well as a few additional residues (Ser6, Ala7, Cys10, Asn12) in the vicinity of the zinc-binding site. Apart from these residues, the THAP domain amide protons exchange rapidly with the solvent (data not shown).
A structure-based sequence alignment of the THAP zinc finger of THAP1 with representative THAP domains is shown in Fig. 2E. Besides the strictly conserved C2CH motif that provides the zinc ligands, the THAP domain is defined by a C-terminal AVPTIF motif and four residues (Pro26, Trp36, Phe58, and Pro78; THAP1 numbering) that are invariant in more than one hundred THAP domain sequences and are absolutely required for DNA binding activity (6,7). The unique tryptophan (Trp36), located in α-helix H1, is a key element of the THAP zinc finger structure and constitutes the anchoring residue that makes hydrophobic contacts with the conserved Phe58 residue (Fig. 2D) and with the surrounding aromatic residues Phe25 and Phe63. In addition, NOEs are detected between Trp36, at the center of the hydrophobic core, and the two invariant prolines (Pro26 in loop L2 and Pro78 in the AVPTIF motif). Both prolines display strongly upfield-shifted resonances because of their proximity to Trp36. NOEs are also observed between Phe81 in the AVPTIF motif and Ala39-Val40 in helix H1 (data not shown). Therefore, the AVPTIF motif appears to play an essential role in folding of the THAP zinc finger by bringing the C terminus close to α-helix H1 (Fig. 2D).
The THAP Zinc Fingers Share the Same Three-dimensional Fold but Not the Same DNA Target Sequence-Comparison of the structures of the THAP zinc fingers from human THAP1, human THAP2, and C. elegans CtBP revealed that the overall fold and the packing around the tetrahedral zinc-coordinating site are similar for the three THAP domains (Fig. 3A). The structural homology is higher between the THAP domains of THAP1 and THAP2, as expected for these closely related sequences (48% identity). The solution structure of the THAP domain of THAP1 can be superimposed onto that of THAP2 over 80 equivalent Cα atoms with an r.m.s.d. of 2.8 Å. A weaker match is found for the superimposition of the THAP domain of THAP1 onto that of CtBP, with 66 equivalent Cα atoms superimposed at an r.m.s.d. of 3.1 Å; notably, the sequence identity between these two domains is only 27%. The core fold, consisting of the antiparallel β-sheet whose two strands are separated by a loop-helix-loop insertion, is conserved among the three THAP domains. The C terminus, however, displays structural variability: the THAP domain of THAP1 shows two additional short 3₁₀ helices, H3 and H4, encompassing residues 60-63 and 79-81, respectively, whereas the THAP domains of THAP2 and CtBP each display an additional two-stranded antiparallel β-sheet. In the THAP2 structure, this second antiparallel β-sheet is formed by residues corresponding to Phe63-Lys64 and Leu71-Leu72 in THAP1. In the CtBP structure, by contrast, the second antiparallel β-sheet involves residues corresponding to THAP1 Ala76-Val77 in the AVPTIF motif and to the two residues that follow the motif, Leu82-Cys83. Because the Met1-Phe81 fragment of THAP1 retains its capacity to bind DNA (Fig. 1C), the second β-sheet observed in the THAP domain of CtBP is unlikely to be important for the molecular scaffold of the THAP zinc finger.
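Percent identities such as the 48% and 27% quoted above depend on the counting convention. A minimal sketch, assuming identity is counted over aligned columns in which neither sequence has a gap (the text does not specify its convention); the example sequences in the usage note are hypothetical.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity between two aligned sequences of equal length ('-' marks a gap).

    Identity is counted over columns where neither sequence has a gap; other
    conventions (e.g. dividing by the shorter ungapped length) give different values.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aln_a, aln_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    same = sum(1 for a, b in pairs if a == b)
    return 100.0 * same / len(pairs)
```

For instance, the hypothetical alignment "AC-DE" / "ACQDF" has four ungapped columns, three of them identical, giving 75%. Dividing instead by the full alignment length would give a lower value, so identities from different sources are comparable only under the same convention.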
Notably, the flexible loop L4 (residues 64-68) between H3 and H4 in THAP1 is also observed in THAP2 but is absent in CtBP, because it corresponds to an 8-residue insertion relative to the THAP domain of CtBP. Despite these differences in secondary structure elements, residues in the C-terminal region occupy largely equivalent positions in the three structures, and residues Ala76-Phe81, which form part of the AVPTIF motif in THAP1, occupy positions identical to those of the corresponding residues in THAP2.
Because different THAP zinc fingers appear to share a similar three-dimensional fold, we then asked whether they might also recognize the same DNA target sequence. Among the 12 human THAP proteins, THAP2 and THAP3 are the most closely related to THAP1; for instance, the THAP domains of THAP1 and THAP3 share up to 50% identity (6). We therefore tested the ability of these two proteins to bind the THABS motif specifically recognized by THAP1 (7). As shown in Fig. 3B, the recombinant THAP domains of THAP2 and THAP3 did not bind the THABS probe in gel shift assays, whereas the recombinant THAP domain of THAP1, used under the same conditions, bound strongly to the THABS sequence. These results were confirmed using in vitro translated full-length THAP proteins, which provided an independent source of THAP domains: in contrast to THAP1, full-length THAP2 and THAP3 did not bind the THABS probe in gel shift assays (data not shown).
Liew et al. (26) recently reported binding of the THAP domain of C. elegans CtBP (Ce-CtBP) to the THABS sequence recognized by human THAP1. However, their gel shift assays were performed in the absence of competitor DNA, and we considered the possibility that their observations reflected nonspecific DNA binding by the THAP domain of Ce-CtBP. We therefore performed gel shift assays with the recombinant Ce-CtBP THAP domain in the presence or absence of the synthetic nonspecific competitor DNA poly(dI/dC). In agreement with Liew et al. (26), we observed binding of the Ce-CtBP THAP domain to the THABS motif in the absence of competitor (Fig. 3C). However, no specific protein-DNA complex was observed in the presence of the poly(dI/dC) competitor. Similar results were obtained with the recombinant THAP domain of another C. elegans THAP protein, the cell proliferation and developmental regulator GON-14 (Fig. 3C). In contrast, the THAP domain of THAP1 bound strongly to the THABS sequence in the presence of competitor (Fig. 3C). We concluded that the Ce-CtBP and GON-14 THAP zinc fingers do not bind specifically to the THABS motif recognized by THAP1. Together with the findings on human THAP2 and THAP3, these results indicate that different THAP zinc fingers share the same three-dimensional fold but not the same DNA target sequence.
Structure-Function Analysis of the THAP Zinc Finger of THAP1 by Site-directed Mutagenesis-We have previously shown that the eight invariant residues that define the THAP domain are absolutely required for DNA binding activity (7). To gain further insight into the role of other residues in DNA recognition, thirty additional residues were individually mutated to alanine, and the resulting mutants were tested in gel shift assays (Fig. 4, A-D and Table 2). These included twenty-four consecutive residues (Leu27 to Ser52) from the long loop-helix-loop motif (L2-H1-L3) inserted into the antiparallel β-sheet, one of the most distinctive features of the THAP zinc finger (Fig. 2). Single-point mutation of the invariant Trp36 at the center of the α-helix was used as a control and, as expected, completely abolished the interaction. As with Trp36, mutation of Lys24, Arg29, Arg42, Phe45, or Thr48 led to a complete loss of DNA binding activity, whereas mutation of Lys11, Leu27, Glu37, Val40, or Tyr50 decreased but did not abrogate the interaction of the THAP zinc finger with its THABS DNA target sequence (Fig. 4, A-D and Table 2). Triple alanine mutations of Thr28-Arg29-Pro30, Arg41-Arg42-Lys43, and Pro47-Thr48-Lys49 confirmed the importance of these regions for the DNA binding activity of the THAP domain of THAP1 (Fig. 4, C and D and Table 2). Interestingly, triple alanine mutation of Tyr50, Ser51, and Ser52 revealed a critical role for these residues that was less apparent in the single point mutants. Mapping of the essential residues onto the structure of the THAP1 THAP domain revealed that Arg29 and Phe45 make contacts with the hydrophobic core of the domain. Indeed, Arg29 in loop L2 makes several NOE contacts with protons of Leu32 in loop L2, Trp36 in helix H1, and Ala76 in the AVPTIF motif (Fig. 4E). Phe45, in loop L3, gives NOEs to Val40 in helix H1 and to Phe25 in loop L2 (Fig. 4E). Therefore, the loss of DNA binding upon mutation of Arg29 and Phe45 could be due to disruption of the local structure.

FIGURE 4. … Phe45, or Thr48 abrogates DNA binding activity of the THAP zinc finger of THAP1. EMSA were performed with the 36-bp THABS probe and wild-type (wt) or the indicated mutant THAP1 proteins, in the presence of poly(dI/dC) and salmon sperm DNA competitors. The previously described Trp36A mutant (7) was included as a control for loss of DNA binding activity. Wt + Ab, supershift experiment with anti-THAP1 antibody demonstrating the specificity of the protein-DNA complexes; RRL, unprogrammed rabbit reticulocyte lysate; black arrowhead, THAP1-THABS DNA complex; white arrowhead, antibody-THAP1-THABS DNA complex; asterisks, nonspecific complexes. E, view illustrating the interactions of Arg29 and Phe45 (red) with Leu32, Trp36, and Ala76, and with Phe25 and Val40, respectively (green). F, representations of the electrostatic surface potential of the THAP domain (Met1-Phe81) showing the exposed residues found to be critical for DNA binding in site-directed mutagenesis experiments (underlined) or that undergo a chemical shift change of more than 0.2 ppm (asterisk). Exposed positively charged residues are labeled on the surface in yellow, other residues in black. The two corresponding ribbon representations are shown in gray for clarity.

In contrast, Lys24, Arg42, and Thr48 are exposed at the surface of the THAP domain. Interestingly, they map onto a highly positively charged area of the domain, created by several exposed lysine and arginine side chains and consistent with a DNA-interaction surface (Fig. 4F). These data suggest that these three residues, Lys24, Arg42, and Thr48, may be directly involved in DNA binding.
Identification of the DNA Binding Interface of the THAP Zinc Finger by NMR Chemical Shift Perturbation Analysis
To further characterize the DNA binding interface, binding of the THAP domain of THAP1 to the THABS DNA target sequence was probed by NMR chemical shift perturbation analysis. The 15N-labeled THAP domain, dissolved in NMR buffer containing 250 mM NaCl, was titrated with a 14-bp DNA duplex containing THABS. In the absence of DNA, the spectrum recorded at 250 mM NaCl was similar to that recorded at 10 mM NaCl, except for a few peaks that shifted slightly (data not shown). In addition, rotational correlation times (τc) determined from 15N longitudinal and transverse relaxation times were 6.03 ± 0.1 ns at 10 mM NaCl and 6.89 ± 0.4 ns at 250 mM NaCl, indicating that the protein is monomeric under both salt conditions (data not shown). With increasing concentrations of the THABS oligonucleotide, several cross-peaks were significantly affected during the titration (Fig. 5A). A similar two-dimensional 15N HSQC spectrum recorded in the presence of an unrelated 14-bp DNA duplex did not reveal any significant shifts (data not shown). The chemical shift perturbations observed with the specific THABS sequence did not change further once the DNA:protein ratio exceeded 1:1, suggesting a 1:1 binding stoichiometry, in agreement with SPR experiments (Fig. 1, A and B).
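Two quantitative steps in this paragraph can be made explicit with a short Python sketch. The first function uses the standard rigid-rotor approximation τc ≈ √(6T1/T2 − 7)/(4πνN) to estimate τc from 15N relaxation times; the 15N frequency νN is not given in this section, so the 60.8 MHz value below (a 600 MHz spectrometer) is an assumption. The second function gives the bound fraction for a simple 1:1 model that accounts for ligand depletion, illustrating why perturbations stop changing beyond a 1:1 DNA:protein ratio only when the dissociation constant is well below the protein concentration. The relaxation times, concentrations, and Kd in the usage lines are hypothetical, not values from this study.

```python
import math


def tau_c_from_relaxation(t1_s: float, t2_s: float, nu_n_hz: float) -> float:
    """Estimate the rotational correlation time (s) from 15N T1 and T2 (s).

    Rigid-rotor approximation: tau_c ~ sqrt(6*T1/T2 - 7) / (4*pi*nu_N),
    where nu_N is the 15N Larmor frequency in Hz.
    """
    ratio = t1_s / t2_s
    if ratio <= 7.0 / 6.0:
        raise ValueError("T1/T2 ratio too small for this approximation")
    return math.sqrt(6.0 * ratio - 7.0) / (4.0 * math.pi * nu_n_hz)


def fraction_bound(protein_m: float, dna_m: float, kd_m: float) -> float:
    """Fraction of protein in a 1:1 protein-DNA complex (exact quadratic solution).

    All arguments are total molar concentrations (kd_m is the dissociation constant).
    """
    b = protein_m + dna_m + kd_m
    complex_m = (b - math.sqrt(b * b - 4.0 * protein_m * dna_m)) / 2.0
    return complex_m / protein_m


if __name__ == "__main__":
    NU_N_HZ = 60.8e6  # assumed 15N frequency (600 MHz spectrometer)
    # Hypothetical T1/T2 values chosen only to exercise the formula.
    print(f"tau_c = {tau_c_from_relaxation(0.70, 0.15, NU_N_HZ) * 1e9:.2f} ns")

    protein = 0.5e-3  # hypothetical 0.5 mM protein
    kd = 1e-6         # hypothetical 1 uM dissociation constant
    for ratio in (0.25, 0.5, 1.0, 1.5, 2.0):
        f = fraction_bound(protein, ratio * protein, kd)
        print(f"DNA:protein {ratio:.2f}:1 -> fraction bound {f:.3f}")
```

With the hypothetical tight-binding Kd, the bound fraction is already about 0.96 at 1:1 and close to 1 at 2:1, which matches a plateau at 1:1; with a Kd comparable to the protein concentration, the curve would keep rising past 1:1.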
During the titration, most of the affected signals behaved as in fast exchange, i.e. a single cross-peak at a chemical shift intermediate between those of the free and bound forms (Fig. 5A). Signals with the largest chemical shift changes behaved as in slow exchange, with two peaks corresponding to the free and bound forms whose intensities were proportional to the free/bound ratio. Finally, the

TABLE 2
Alanine-scanning mutagenesis of the THAP zinc finger of human THAP1. DNA binding activity of the different mutants was tested using gel-shift assays. The eight evolutionarily conserved residues, which define the THAP zinc finger and have previously been shown to be essential for DNA binding (7)

(Fig. 5B). Several protein residues experienced significant chemical shift perturbations (Δδ greater than 0.2 ppm), and these were mainly organized into three different patches. The first one includes residues Ala7, Tyr8, Lys11

DISCUSSION
We report here the three-dimensional structure and a structure-function analysis of the sequence-specific DNA-binding THAP zinc finger of human THAP1, the prototype of a novel family of cellular factors involved in pRb/E2F cell cycle pathways. We recently demonstrated that THAP1 is a physiological regulator of cell proliferation. Silencing of THAP1 by RNA interference in human primary endothelial cells inhibited G1/S cell cycle progression and down-modulated several pRb/E2F cell cycle target genes, including RRM1, a gene activated at the G1/S transition and essential for S-phase DNA synthesis (10). We showed that the THAP zinc finger of THAP1 recognizes a consensus THAP1-binding site in the RRM1 promoter and that endogenous THAP1 associates with this site in vivo, indicating that RRM1 is a direct target gene of THAP1. The solution structure of the THAP domain of THAP1 is therefore the first structure of a THAP zinc finger with demonstrated biochemical activity as a sequence-specific DNA-binding domain (7) and with a known biological function, i.e. recruitment of THAP1 to the pRb/E2F target gene RRM1 (10). In contrast, although the structures of the THAP domains from human THAP2 (PDB code 2D8R) and C. elegans CtBP have been determined (26), these two proteins have not been functionally characterized, and it is not yet known whether their THAP domains possess sequence-specific DNA binding properties.
The structure of the THAP zinc finger differs from those of other DNA-binding modules belonging to the zinc finger superfamily. For instance, its βαβ topology and the long spacing between the two pairs of zinc ligands (up to 53 residues in some THAP domains) distinguish the THAP zinc finger from the classical DNA-binding C2H2 zinc finger, which exhibits a ββα topology with a shorter spacing (10-12 residues) between the two pairs of zinc-coordinating residues (3). The position of the zinc is also noteworthy: in the classical zinc finger, the zinc atom plays a central structural role by coordinating four ligands that anchor one end of the helix to one end of the β-sheet, whereas in the C2CH THAP motif, the zinc is not buried in the interior of the protein; it links the N terminus of the domain to the second β-strand without involving the α-helix, which is distal to the zinc ion. The long loop-helix-loop insertion in the two-stranded antiparallel β-sheet is one of the most distinctive features of the THAP zinc finger. It explains the atypical spacing of the C2 and CH residues in the C2CH zinc-coordinating module and the relatively large size of the THAP domain (∼80 residues) compared with the C2H2 zinc finger (∼30 residues). These features are not found in other classes of zinc-coordinating DNA-binding modules. Surprisingly, however, the THAP zinc finger exhibits structural similarities with a protein-protein interaction module, the Zinc Finger-Associated Domain (ZAD, PDB code 1PZW) of the Drosophila transcription factor Grauzone (26,42). These structural homologies include a similar loop-helix-loop insertion, between the zinc-coordinating residues, into the two-stranded antiparallel β-sheet. However, despite these similarities in their molecular scaffolds, the THAP zinc finger and the ZAD domain are linked to different functions. 
The ZAD domain mediates protein-protein interactions and exhibits a highly negative electrostatic potential, inconsistent with DNA-binding properties (42). In contrast, the THAP zinc finger of THAP1 functions as a sequence-specific DNA-binding module with a highly positively charged surface (Fig. 4F).
Our previous mutagenesis studies revealed that mutation of any of the eight residues that define the THAP zinc finger motif (including the C2CH residues) abrogates DNA binding activity of the domain (7). Mapping these residues onto the THAP zinc finger structure indicates that they play an essential role in the folding of the domain (Fig. 2D). In the present study, we identified five additional residues that are essential for DNA binding activity. Two of these residues (Arg29 and Phe45) could play a structural role by anchoring the loops of the loop-helix-loop motif to the hydrophobic core of the domain, potentially limiting the motions of these loops (Fig. 4E). The three other essential residues (Lys24, Arg42, and Thr48) are exposed on the positively charged surface of the THAP domain of THAP1 (Fig. 4F) and are therefore less likely to contribute to the folding or structure of the domain; rather, they may play a direct role in DNA binding. Although Lys24 and Arg42 do not display significant chemical shift changes upon DNA binding, Thr48 is clearly affected and undergoes the largest chemical shift change, consistent with direct involvement in DNA interactions (Fig. 5B). Furthermore, Thr48 is poorly conserved among THAP domains (Fig. 2E), suggesting a key role in binding specificity.
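The 0.2 ppm criterion used to flag perturbed residues, such as Thr48, can be sketched as a simple filter. This section does not state how 1H and 15N changes were combined into a single Δδ, so the sketch assumes one common weighted form, Δδ = √(ΔδH² + (α·ΔδN)²), with α = 0.2 chosen for illustration; the residue labels and shift values are made up and are not data from Fig. 5B.

```python
import math

# Assumption: the 15N weighting factor is not given in this section;
# 0.2 is one common choice. The 0.2 ppm cutoff is taken from the text.
N_WEIGHT = 0.2
CUTOFF_PPM = 0.2


def combined_shift(d_h_ppm: float, d_n_ppm: float, n_weight: float = N_WEIGHT) -> float:
    """Weighted combined 1H/15N chemical shift change (ppm)."""
    return math.sqrt(d_h_ppm ** 2 + (n_weight * d_n_ppm) ** 2)


def perturbed_residues(
    shifts: dict[str, tuple[float, float]], cutoff: float = CUTOFF_PPM
) -> list[tuple[str, float]]:
    """Residues whose combined shift change exceeds the cutoff, largest change first."""
    scored = [(res, combined_shift(dh, dn)) for res, (dh, dn) in shifts.items()]
    hits = [(res, dd) for res, dd in scored if dd > cutoff]
    return sorted(hits, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up (1H, 15N) shift differences for hypothetical residues.
    example = {"res_A": (0.30, 1.0), "res_B": (0.05, 0.5), "res_C": (0.10, 1.5)}
    for res, dd in perturbed_residues(example):
        print(f"{res}: {dd:.3f} ppm")
```

Ranking the hits by Δδ is what singles out the residue with the largest change; with a different weighting factor, residues near the cutoff may enter or leave the list.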
Surprisingly, with the exception of Trp36, mutation of residues located in the helix of the loop-helix-loop motif did not abrogate DNA binding. Therefore, although the α-helix is the protein structural element most commonly used for DNA recognition in zinc fingers and other types of DNA-binding domains (2,43,44), our results suggest that the α-helix of the THAP zinc finger may not be the main DNA recognition element. Instead, the positively charged region is likely to play a key role through electrostatic contacts, together with more specific contacts involving loop residues, namely in loop L3 of the loop-helix-loop motif (Arg42 and Thr48) and in loop L4 at the C terminus. Based on the NMR and mutagenesis data, a model of the complex between the THAP zinc finger of THAP1 and its DNA target was built using HADDOCK 1.3 (40) (Fig. 6). The proposed model shows good shape complementarity between the loop-helix-loop motif (L2-H1-L3) and the DNA. The helix does not appear to be the major recognition element; instead, it lies along the DNA so that the two loops flanking it fit into the DNA grooves. Remarkably, loop L3 enters the major groove, and in particular the side chain of Thr48 makes a polar contact with the GGCA core. Although EMSAs allowed us to define critical residues in this loop, these qualitative assays may be too insensitive to detect the contribution of other residues to DNA binding. Additional experiments may reveal, for instance, a role for Lys46, a residue that undergoes significant chemical shift changes upon DNA binding (Fig. 5) and is predicted to be in close proximity to the DNA in the model of the protein-DNA complex (Fig. 6).
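The docking step turns the NMR and mutagenesis data into distance restraints. In HADDOCK, as described by its authors, such data enter as ambiguous interaction restraints (AIRs): for each "active" residue, an effective distance, summed as r^-6 over all atom pairs between that residue and the partner's active and passive residues, must stay below an upper bound (2 Å by default). The sketch below computes only that distance term in plain Python; it is not HADDOCK code, the function names are hypothetical, and which residues were defined as active or passive for the THAP1-THABS model is not stated in this section.

```python
import math
from typing import Sequence

Coord = tuple[float, float, float]


def effective_distance(residue_atoms: Sequence[Coord], partner_atoms: Sequence[Coord]) -> float:
    """r^-6-summed effective distance (Angstrom) between one residue and a set of partner atoms."""
    if not residue_atoms or not partner_atoms:
        raise ValueError("both atom sets must be non-empty")
    total = 0.0
    for a in residue_atoms:
        for b in partner_atoms:
            d = math.dist(a, b)
            if d == 0.0:
                return 0.0  # coincident atoms: restraint trivially satisfied
            total += d ** -6
    return total ** (-1.0 / 6.0)


def air_violation(residue_atoms: Sequence[Coord], partner_atoms: Sequence[Coord],
                  upper_bound: float = 2.0) -> float:
    """Amount (Angstrom) by which the effective distance exceeds the upper bound; 0 if satisfied."""
    return max(0.0, effective_distance(residue_atoms, partner_atoms) - upper_bound)
```

Because of the r^-6 sum, the effective distance is always shorter than the closest single contact would suggest when several atoms are nearby, and one close contact is enough to satisfy the restraint. This is what leaves the restraint ambiguous about which atoms actually touch.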
Interestingly, genetic data obtained in C. elegans for the THAP family members LIN-36 and HIM-17 have revealed several single-point mutations that affect the functional activity of the THAP zinc finger. Most of these mutations concern residues that are critical for the folding of the domain. For instance, a mutation of the second Cys of the C2CH motif was found in one of the THAP domains of HIM-17 (15), while two independent mutations were found in the last Pro residue of the THAP motif in the LIN-36 protein (13). However, other mutations affect residues that do not appear to be part of the hydrophobic core of the domain, and these may correspond to surface-exposed residues directly involved in DNA binding. Finally, a double alanine mutation of residues His18 and Cys22 in the THAP zinc finger of Drosophila P element transposase (corresponding to THAP1 residues Lys24 and Arg29) has previously been shown to abrogate sequence-specific DNA binding activity of the protein (41). This suggests that the essential residues identified in the present study are likely to be critical for the functional activity of other THAP zinc fingers as well.
In this study, we show that different THAP zinc fingers, despite sharing some structural homology, do not recognize the same DNA target sequence. We found that recombinant THAP domains from human THAP2 and THAP3 and from C. elegans CtBP and GON-14 do not exhibit sequence-specific DNA binding activity toward the DNA sequence motif recognized by human THAP1 (Fig. 3). Although Ce-CtBP and GON-14 were able to bind the THAP1 target sequence, this binding was completely eliminated in the presence of nonspecific competitor DNA. Together with the observation that distinct THAP domain sequences within a single species share less than 50% identity (7), this suggests that each THAP zinc finger may possess its own specific DNA-binding site. This possibility is further supported by the observation that the DNA target sequence of the THAP zinc finger of THAP1 shares no homology with the AT-rich motif recognized by the THAP zinc finger of P element transposase (7,45). However, we cannot exclude at this stage that some THAP zinc fingers may lack sequence specificity or even DNA binding activity and may instead function as protein-protein interaction modules.
Finally, protein-protein interactions mediated by other domains of the THAP proteins may be critical to increase the DNA binding activity of the THAP zinc finger, which appears to be relatively weak. In this respect, the C-terminal coiled-coil domain found in THAP1, as well as in several other human THAP proteins, may enhance the affinity of the full-length protein for DNA by allowing dimerization or multimerization. Future studies will help to resolve these issues and will provide important new insights into the structure and functions of THAP zinc finger proteins in both humans and animal model organisms.

FIGURE 6. A proposed model of the complex between the THAP zinc finger of THAP1 (gray, pink, yellow) and its DNA target (green, blue) in stereo view. Residues whose backbone chemical shifts change by more than 0.2 ppm upon DNA binding are colored pink. Side chains of residues found to be critical in site-directed mutagenesis experiments are depicted in yellow. The side chain of Lys46 is colored pink. The THABS DNA molecule is colored green, except for the bases of the GGCA core motif, which are shown in blue. The THABS molecule was docked onto the THAP zinc finger of THAP1 using HADDOCK 1.3 (High Ambiguity Driven Protein Docking) (40).