A Genome Scale Location Analysis of Human Staf/ZNF143-binding Sites Suggests a Widespread Role for Human Staf/ZNF143 in Mammalian Promoters*

Staf was originally identified as the transcriptional activator of Xenopus tRNASec and small nuclear (sn) RNA-type genes. Recently, transcription of seven human (h) protein coding genes was reported to be activated by the human ortholog hStaf/ZNF143. Here we have used a combined in silico and biochemical approach to identify 1175 conserved hStaf/ZNF143-binding sites (SBS) distributed in 938 promoters of four mammalian genomes. The SBS shows a significant positional preference and occurs mostly within 200 bp upstream of the transcription start site. Chromatin immunoprecipitation assays with 295 of the promoters established that 90% contain bona fide SBS. By extrapolating the values of this mapping to the full sizes of the mammalian genomes, we can infer the existence of at least 2500 SBS distributed in 2000 promoters. This unexpectedly large number strongly suggests that SBS constitutes one of the most widespread transcription factor-binding sites in mammalian promoters. Furthermore, we demonstrated that the presence of the SBS alone is sufficient to direct expression of a luciferase reporter gene, suggesting that hStaf/ZNF143 can per se recruit the transcription machinery.

The vertebrate transcription factor Staf was originally identified in Xenopus laevis as the transcriptional activator of the tRNASec gene (1). Since then, Staf has also been implicated in transcriptional activation of snRNA and snRNA-type genes (2,3). Staf binds to the Staf-binding site (SBS), a sequence identified in promoters of many vertebrate snRNA and snRNA-type genes transcribed by RNA polymerases II or III. Seven contiguous zinc fingers of the C2-H2 type, located in the central part of the protein, contain the DNA-binding domain. Binding site selection experiments identified the 18-bp YWCCCRNMATSCMYYRCR (Y, W, R, N, M, and S stand for T/C, A/T, A/G, any nucleotide, A/C, and G/C, respectively) or TACCCATAATGCATYGCG sequences as the Staf consensus binding sites, depending on whether moderately or highly stringent conditions were employed (2,4). Compared with the consensus sequences, known Staf-binding sites revealed a high degree of divergence. This is well illustrated by the absence of either the 5′ part in the Xenopus tRNASec or the 3′ part in the human U6 snRNA sites (5). Not all seven zinc fingers are necessary for the binding of Staf to the SBS. For example, zinc finger 7 does not establish base-specific contacts in Staf-DNA complexes, and the requirement for zinc finger 1 is flexible because it contacts the DNA at the Xenopus tRNASec but not the human U6 snRNA Staf motif (5). The nonutilization of zinc finger 1 at the human U6 promoter enables the simultaneous binding of Staf and Oct-1 to their cognate DNA motifs; Oct-1 is another factor involved in transcriptional activation of the human U6 snRNA gene (5).
ZNF76 and ZNF143 are two human homologs of Xenopus Staf, with ZNF143 being the ortholog and ZNF76 a likely paralog (6-8). ZNF143 is conserved in vertebrates and urochordates, such as Ciona intestinalis, whereas ZNF76 is only present in vertebrates. In this study, human Staf will be referred to as hStaf/ZNF143. Moreover, Staf possesses the capacity to stimulate chloramphenicol acetyltransferase expression from a synthetic mRNA promoter (SBS-tkCAT), containing SBS linked to the basal promoter of the thymidine kinase gene fused to the chloramphenicol acetyltransferase reporter (9). The presence of two physically and functionally distinct activation domains constitutes the molecular basis underlying the abilities of hStaf/ZNF143 to stimulate transcription from either snRNA-type or mRNA promoters. Indeed, although a 93-amino acid domain made of four repeat units is specialized for activating transcription from an mRNA promoter, a segment of only 18 amino acids acts specifically on snRNA and snRNA-type promoters (10). To date, only the following seven protein coding genes have been described to be regulated by hStaf/ZNF143: the cytosolic chaperonin containing t-complex polypeptide 1 (TCP1) (11); the interferon regulatory factor (IRF3) (12); the neuronal nitric-oxide synthase (NOS1) (13); the transaldolase (TALDO1) (14); the aldehyde reductase (AKR1A1) (15); the mitochondrial ribosomal protein S11 (MRPS11) (16); and the synaptobrevin-like 1 (SYBL1) (17).
In this work, we describe the results of an analysis aimed at identifying hStaf/ZNF143-binding sites across mammalian genomes. We have used in silico genome-wide analysis to identify 1175 SBS distributed in 938 promoters, among which 295 were chosen for experimental validation by chromatin immunoprecipitation (ChIP). Semi-quantitative PCR confirmed the binding of hStaf/ZNF143 to 266 (90%) of the tested loci. Most of the identified SBS were located near the transcription start site and were tightly associated with CpG islands. Extrapolated to the full sizes of the mammalian genomes, this partial mapping revealed an unexpectedly large number of SBS in the promoters of protein coding genes, with at least 2500 SBS distributed in 2000 promoters. Furthermore, this study established that the presence of SBS alone is sufficient to stimulate the expression of a luciferase reporter gene, suggesting that hStaf/ZNF143 can recruit the transcription machinery and up-regulate transcription of an extraordinarily high number of genes.

EXPERIMENTAL PROCEDURES
Bioinformatics-Human promoters containing putative SBS were obtained by searching transcription factor-binding sites at DBTSS (18,19). DBTSS version 4 (2003 human hg16 assembly) was queried with the Staf-binding sites listed in Table 1. Alignments of the human and mouse promoter sequences provided by DBTSS were examined for Staf-binding site conservation in the mouse. Using the human promoter sequences containing the SBS motif as the query, we looked for motif conservation in mouse (for promoters not present in DBTSS), rat, and dog genomes based on the vertebrate Multiz alignment (BLAT search at the UCSC Genome Bioinformatics Site). Assignment of the identified genes to categories of functional groups was performed using information from NCBI Entrez gene and Gene Ontology provided by GOA.
The proximity of CpG islands to the identified SBS was found by examining at DBTSS the results of the search for transcription factor-binding sites. The coordinates of the human Staf-binding sites (based on the May 2004 human hg17) were extracted by BLAT search at the UCSC Genome Bioinformatics Site, using the corresponding sequence. Positions of the Staf-binding sites in human and mouse promoters were obtained by searching DBTSS for transcription factor-binding sites.
Identification of additional Staf-binding sites from the work of Xie et al. (20) was performed as follows. The coordinates of the discovered motifs corresponding to sequences ACTAYRNNNCCCR, ACTACNNNNCCC, and ACTACNNNTCCCR (Xie motifs), in direct and reverse orientations, were obtained online. Sequences located 3′ to the Xie motifs were extracted with the Human Genome Browser. 3′-Extended Xie motifs were visually examined for the presence of an associated Staf-binding site. Conservation of the newly identified Staf motifs, in the mouse, rat, and dog genomes, was evaluated by the vertebrate Multiz alignment. Sequence logos were created using WebLogo (21).
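Because the Xie motifs were considered in both direct and reverse orientations, a degenerate motif has to be reverse-complemented with its IUPAC ambiguity codes taken into account. The following is a minimal sketch of that step only; the function and table names are illustrative and are not taken from the tools used in this study.

```python
# IUPAC complements for the nucleotide codes that appear in this paper.
IUPAC_COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A",
    "R": "Y", "Y": "R",   # purine <-> pyrimidine
    "M": "K", "K": "M",   # A/C <-> G/T
    "W": "W", "S": "S",   # A/T and G/C are self-complementary
    "N": "N",
}


def reverse_complement(motif: str) -> str:
    """Return the reverse complement of a (possibly degenerate) DNA motif."""
    return "".join(IUPAC_COMPLEMENT[base] for base in reversed(motif.upper()))


# The three Xie motifs quoted in the text.
XIE_MOTIFS = ["ACTAYRNNNCCCR", "ACTACNNNNCCC", "ACTACNNNTCCCR"]

for motif in XIE_MOTIFS:
    print(motif, "->", reverse_complement(motif))
```

Applying the function twice returns the original motif, which is a convenient sanity check on the complement table.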
Transfection and Luciferase Assays-COS-7 cells were transfected by the calcium phosphate co-precipitation procedure with 1 μg of reporter construct, 0.5 μg of pCH110 plasmid as the internal control, and carrier DNA to bring up the total DNA content to 10 μg/plate. After 24 h, cells were lysed, and the β-galactosidase activity was measured as described previously (10). The luciferase assay was performed as recommended by the manufacturer. The luciferase activity was normalized to the β-galactosidase activity. Each transfection experiment was done in triplicate.
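The normalization described above can be sketched as follows. This is an illustration only: the readings are invented numbers, the function names are hypothetical, and averaging the per-plate ratios over the triplicate is an assumption about how replicates were combined.

```python
from statistics import mean


def normalized_activity(luciferase, beta_gal):
    """Mean luciferase/beta-galactosidase ratio over replicate plates."""
    if len(luciferase) != len(beta_gal):
        raise ValueError("each plate needs one luciferase and one beta-gal reading")
    return mean(luc / gal for luc, gal in zip(luciferase, beta_gal))


def relative_activity(reporter, empty_vector):
    """Normalized reporter activity expressed relative to the empty vector."""
    return reporter / empty_vector


# Invented triplicate readings (arbitrary units), for illustration only.
empty = normalized_activity([100.0, 200.0, 400.0], [1.0, 2.0, 4.0])
reporter = normalized_activity([2000.0, 4000.0, 8000.0], [1.0, 2.0, 4.0])

print(f"relative activity: {relative_activity(reporter, empty):.1f}-fold")
```

Dividing by the internal control first corrects for plate-to-plate differences in transfection efficiency before constructs are compared.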
ChIP Assay-The ChIP procedure was essentially as described in Weinmann and Farnham (22) with a few modifications. HeLa cells (10^6 cells/ml) were treated with 1% formaldehyde at room temperature for 10 min. The reaction was stopped by addition of glycine to a final concentration of 125 mM. Cells were washed once with ice-cold phosphate-buffered saline containing protease inhibitors, scraped, centrifuged at 1500 rpm for 10 min, then resuspended in cell lysis buffer (5 mM HEPES-NaOH, pH 8, 85 mM KCl, 0.5% Nonidet P-40, protease inhibitors), and kept on ice for 10 min. They were homogenized 10 times with a Dounce homogenizer, and the resultant homogenates were centrifuged at 4000 rpm for 10 min to pellet the nuclei. Nuclei were resuspended in 1.5 ml of shearing buffer (50 mM Tris-HCl, pH 8, 10 mM EDTA, 1% SDS, protease inhibitors). Nuclear lysates were sonicated on ice to an average chromatin length of 0.5-1 kb (three pulses of 20 s at a setting of 25 with a Vibra-cell sonicator; Bioblock Scientific) and then centrifuged at 14,000 rpm for 12 min. The supernatant was incubated with 10 volumes of ChIP buffer (50 mM HEPES-NaOH, pH 7.9, 140 mM NaCl, 1% Triton X-100, 0.01% SDS, protease inhibitors) with protein A-Sepharose for 2 h at 4°C. After protein A-Sepharose removal, the precleared lysates were used as the soluble chromatin for ChIP. Chromatin was incubated at 4°C overnight with 3.5 μg of anti-hStaf/ZNF143. A control without antibody was included. Immune complexes were recovered by incubation for 2 h at 4°C with protein A-Sepharose previously blocked in buffer containing 2 mg/ml bovine serum albumin and 1 mg/ml salmon testis DNA. Immune complexes were washed four times sequentially in the following buffers (shearing buffer; wash buffer 1: 50 mM HEPES-NaOH, pH 7.9, 300 mM NaCl, 1% Triton X-100, 0.1% SDS; wash buffer 2: 50 mM HEPES-NaOH, pH 7.9, 500 mM NaCl, 1% Triton X-100, 0.1% SDS; wash buffer 3: 50 mM Tris-HCl, pH 8, 250 mM LiCl, 0.5% Nonidet P-40, 1 mM EDTA, 1% deoxycholate; TE buffer). The DNA-protein complex was eluted in 400 μl of elution buffer (100 mM NaHCO3, 1% SDS) at room temperature for 15 min. The formaldehyde cross-link was reversed by overnight incubation at 65°C in the presence of 190 mM NaCl. Immunoprecipitated DNA was released by treatment with 100 μg of proteinase K for 1 h at 45°C in the presence of 10 mM EDTA and 40 mM Tris-HCl, pH 6.5. The DNA was purified by phenol extraction and ethanol-precipitated. The rabbit polyclonal antipeptide antibody against a C-terminal epitope of the Xenopus Staf protein (5) was used for ChIP.
PCR Analysis-Purified DNA was analyzed in 25-μl PCRs with primer pairs flanking the predicted SBS and in the presence of 3 μCi of [α-32P]dCTP (3000 Ci/mmol). PCR products were 140-400 bp in length. Half of the samples were resolved by native PAGE and revealed by autoradiography. Typically, 1/500 and 1/2000 of the immunoprecipitated DNA were used for PCR analysis. Decreasing amounts of input DNA (1/10,000, 1/25,000, and 1/100,000) were used to determine the linear range of the PCR for each primer pair. Cycling parameters were 95°C for 3 min, 35 cycles at 95°C for 30 s, 52-72°C (depending on each primer pair) for 30 s, 72°C for 30 s, and 72°C for 5 min. For the negative controls of ChIP assay, we used the PP1 to PP4 primer pairs hybridizing to unique regions lying 2.4, 2.1, 6.5, and 2.5 kbp upstream of the tRNASec, U4 ATAC, glyceraldehyde-3-phosphate dehydrogenase (GAPDH), and Budding Uninhibited by Benzimidazoles 1 (BUB1B) genes, respectively. The primer sequences used in this study are available on request.

In Silico Identification of 893 Potential hStaf/ZNF143-binding Sites across Human and Mouse Protein Gene Promoters-Our strategy for identifying human promoters directly bound by Staf was based on the pre-selection of candidates showing phylogenetic conservation of the SBS site in orthologous mammalian promoters, followed by the experimental analysis of a large number of individual sites with a semi-quantitative ChIP assay. The pre-selected promoters were identified by searching potential human transcription factor-binding sites at the DBTSS promoter data base (18,19) that are conserved in orthologous mouse promoters. The DBTSS data base was originally constructed from a collection of experimentally determined transcription start sites (TSS) of human genes and is suitable for comparative analysis of human and mouse orthologous promoters. Version 4 used in this study was based on human hg16 and contains 8793 human promoter sequences. From the scrutiny of seven known target genes, we could localize the Staf-binding sites in promoters lying in the proximity of the TSS. Therefore, the analysis was further limited to sequences spanning 1000 bp upstream and 200 bp downstream of the TSS. Eleven 18-bp-long sequences (A1 to K1 in Table 1) were used as queries for the first round of motif search. The 18-bp YYCCCANNRNRCNNYRCR A1 sequence is the consensus derived by comparing the eight Staf motifs identified in seven human and mouse protein coding genes (11-17). It is more degenerate than the Staf consensus sequences derived from binding site selection experiments under moderately stringent conditions (see Introduction). With respect to A1, the B1 to K1 sequences bear 1-, 2-, or 3-bp variations (Table 1). These changes were chosen from the constraints observed at several positions in binding site selection experiments (2,4).
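To make the search criteria concrete, the sketch below scans one promoter sequence for the degenerate A1 query within the window of 1000 bp upstream and 200 bp downstream of the TSS. It is not the DBTSS search itself: the function names, the 0-based TSS index convention, and the toy sequence are assumptions made for illustration, and only the strand that is supplied is scanned.

```python
import re

# Regular-expression classes for the IUPAC codes used in the queries.
IUPAC_REGEX = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "M": "[AC]", "K": "[GT]", "N": "[ACGT]",
}

A1_QUERY = "YYCCCANNRNRCNNYRCR"  # 18-bp consensus quoted in the text


def iupac_to_regex(motif):
    """Compile a degenerate motif; the lookahead reports overlapping hits."""
    body = "".join(IUPAC_REGEX[base] for base in motif.upper())
    return re.compile("(?=(" + body + "))")


def scan_promoter(seq, tss_index, motif=A1_QUERY, upstream=1000, downstream=200):
    """Return (position relative to the TSS, matched site) for each hit.

    tss_index is the 0-based index of the TSS in seq (an assumed convention);
    negative positions are upstream of the TSS.
    """
    start = max(0, tss_index - upstream)
    end = min(len(seq), tss_index + downstream)
    window = seq[start:end].upper()
    pattern = iupac_to_regex(motif)
    return [(start + m.start() - tss_index, m.group(1))
            for m in pattern.finditer(window)]


# Toy promoter: one site matching A1, placed 50 bp upstream of the TSS.
site = "TTCCCAGAATGCATTGCG"
toy_promoter = "G" * 50 + site + "G" * 50
print(scan_promoter(toy_promoter, tss_index=100))
```

Sites in the reverse orientation could be found by scanning with the reverse complement of the query as well.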
Among the DBTSS entries, a high number of promoters scored positive with one or more potential Staf-binding sites within the predetermined boundaries, identifying a total of 1488 potential binding sites with 648 motifs (43.5%) conserved in the orthologous mouse promoters (Table 1 and supplemental Table S1). Table 1 indicates that the score distribution is not uniform, the highest ones being obtained for the A1 and F1 sequences with 185 and 97 human/mouse conserved hits, the lowest score arising from the H1 sequence with 13 human/mouse conserved hits. Generally, a good correlation was observed at one given position between score distribution and the nucleotide preference obtained from binding site selection experiments (2,4). For example, binding site selection indicated that a C occupied position 3 in 95% of the sequences, whereas a T occurred in only 3% of the cases (2). In the DBTSS screen, the same distribution was observed. The query sequence carrying a C at position 3 (A1) produced 185 conserved hits between human and mouse, a single C to T change (H1) leading to only 13 hits (Table 1). Interestingly, sequence alignments of putative SBS show that residues preferentially occupying positions 7-11, 13-16, and 18 (Fig. 1A, right panel), corresponding to N, R, or Y in the queried sequences in Table 1, are identical to those derived from the binding site selection experiments (2). This strongly suggests that the identified sequences represent bona fide SBS.
Inspection of the sequences adjacent to the potential SBS revealed the presence of the 7-bp ACTACAN motif, or a 1-2-bp variant, among 305 (47%) of the 648 identified SBS (Table 1; Fig. 1A, left panel; supplemental Table S1). It lies immediately 5′ to the SBS. Therefore, to identify novel SBS that would harbor the ACTACAN motif, we performed a second round of motif search at DBTSS with the A2 to F2, H2, and J2 motifs as queries. These 19-bp motifs are arranged in the following way (see Table 2). The 5′ part is now composed of the 7-bp submotif ACTACAN, the 3′ part corresponding to bp 1-12 of the A1 to F1, H1, and J1 sequences. This second round generated 459 hits with 183 occurrences conserved in the orthologous mouse promoters but not identified by the A1 to K1 queries. It is striking to observe that the consensus sequence derived for the 6-bp sequence directly adjacent to the 3′ part of the newly identified motifs, although not included in the queries, bears high sequence similarity to sequence CC(C/T)GCG of the A1 to K1 motifs (compare positions 13-18 in Fig. 1, A and B; note that the heights of the letters are not directly proportional to occurrences). Again, this strongly argues for the characterization of genuine binding sites. Next, to characterize promoters containing at least two potential SBS, we performed a third search at DBTSS (LA1-LK1, Table 3). This round was implemented with two simultaneous queries: (i) one with the previous A1-K1 motifs (sequence motif 1 in Table 3); (ii) the second one with only the newly identified ACTACAN motif (sequence motif 2 in Table 3). The bioinformatic search was followed by visual inspection of the occurrences to detect the presence of potential SBS linked to the ACTACAN motif. In this way, we could identify 21 additional motifs that are conserved in orthologous mouse promoters (Table 3, Fig. 1C, and supplemental Table S1). Finally, the sequences of the identified promoters were visually examined for unidentified sequence elements that might serve as Staf-binding sites. This inspection enabled detection of 41 additional sites conserved in orthologous mouse promoters (named X in Fig. 1D and supplemental Table S1). To extend the phylogenetic comparison, a comparative analysis was performed in the orthologous promoters of the rat and dog genomes. Among the 893 identified motifs that we found conserved in the human and mouse genomes, 772 orthologous regions were detected in the rat and canine genomes. Interestingly, sequence comparisons revealed that 753 (97%) are also conserved in these two genomes. In 18 cases, the SBS is conserved in one of the two genomes, and in one single case only (NM_003085) they both lack it (supplemental Table S1). Altogether, our comparative search identified 893 SBS conserved in mouse and residing in 716 promoters of human protein coding genes. Furthermore, about one-half of the discovered SBS were associated with the newly identified ACTACAN motif. Both the SBS and the associated ACTACAN motifs are also conserved with a very high prevalence in the rat and dog genomes.
Identification of 282 Other Putative hStaf/ZNF143-binding Sites-A recent report described the systematic discovery of regulatory motifs in human promoters by comparing several mammalian genomes (20). Among the 174 conserved motifs that the authors discovered in the promoters of protein coding genes, the 13-bp ACTAYRNNNCCCR M4 sequence is particularly interesting because ACTAYRN shows a strong match to the SBS-adjacent ACTACAN submotif, and NNCCCR is highly similar to the 5′ part of the SBS. We thus sought to use the M4 motif to uncover new SBS by searching for the presence of SBS overlapping the M4 motif. This led us to discover 282 new potential SBS (see logo sequence derived from sequence comparisons, Fig. 1E). Among these, 23 are located in promoters previously identified by our search at DBTSS, and the other 259 SBS are located in 222 new promoters containing this site (Fig. 2A). The 282 sites are conserved in the rat and dog genomes, with 273 (96.8%) occurring also in the mouse genome (supplemental Table S1). Collectively, our in silico analysis resulted in the identification of 1175 conserved putative SBS harbored in 938 promoters of mammalian protein coding genes (Fig. 2A). Considering the results of this phylogenetic footprinting, it is likely that a high proportion of the identified SBS represent target sites for hStaf/ZNF143.
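The totals quoted above follow from the counts reported for each search round. The short tally below simply re-adds the figures given in the text; the variable names are illustrative.

```python
# Human/mouse conserved SBS found in each DBTSS search round (from the text).
dbtss_rounds = {
    "A1-K1 queries": 648,
    "ACTACAN-containing queries (second round)": 183,
    "two-motif search (LA1-LK1)": 21,
    "visual inspection (X)": 41,
}
dbtss_sbs = sum(dbtss_rounds.values())          # SBS identified at DBTSS

xie_sbs = 282                                   # SBS overlapping the M4 motif
total_sbs = dbtss_sbs + xie_sbs

dbtss_promoters = 716                           # promoters identified at DBTSS
new_xie_promoters = 222                         # promoters not found at DBTSS
total_promoters = dbtss_promoters + new_xie_promoters

# Conservation of the 772 rat/dog orthologous regions: both, one, or neither.
rat_dog_regions = 753 + 18 + 1

print(dbtss_sbs, total_sbs, total_promoters, rat_dog_regions)
```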
Characteristic Features of the Identified SBS and SBS-containing Promoters-Two features characterize a transcription factor-binding site as follows: first, its position, and second, its orientation relative to the gene. Fig. 2B (see also supplemental Table S1) shows the number of human SBS, with assigned positions, plotted versus their position relative to the TSS. We found that 673 (62%) of the 1077 SBS with assigned positions occur within 200 bp upstream of the TSS. Similar values were obtained with the SBS identified in mouse promoters, consistent with the involvement of this motif in initiation of transcription. We observed that 32% of the human/mouse orthologous SBS are positioned at the same distance from the TSS but that a difference of up to 50 bp occurs in 47% of the cases. It is very likely that this variation is related to the lack of accuracy in determining the TSS of numerous genes.
Another feature of the SBS motif that we detected is its property to appear as multiple copies. Overall, 188 (20%) of the 938 mammalian promoters we have identified contained multiple SBS lying within 500 bp (Fig. 2C). The majority (150) contain two SBS, with three SBS being found in 31 promoters; four genes contain four SBS (CDK5 (cyclin-dependent kinase 5), ENY2 (enhancer of yellow 2 homolog), NUDCD1 (NudC domain containing 1), and PRCC (papillary renal cell carcinoma)); two genes have five SBS (LIG1 (DNA ligase 1) and BANP (BTG3 associated nuclear protein)); one single gene represents a unique situation with six SBS (NCLN (nicalin homolog)). Fig. 3 displays the multiple sequence alignment of the nicalin homolog promoter region across the four mammalian genomes. The very high degree of sequence conservation of the six potential SBS appears clearly and stands out from the nonconserved flanking sequences.
Two interesting features regarding the physical arrangement of the SBS came from our study. First, of the 1175 identified SBS, a bias occurs toward a direct orientation with respect to the gene as follows: 702 (60%) are in direct orientation; among promoters possessing two SBS, 41% are in direct orientation, and 13% are in reverse orientation, and the remainder have a combination of both. Second, we identified SBS in potential bidirectional promoters, with 146 promoters (15%) arranged in a head-to-head configuration within 1 kbp (supplemental Table S1). 624 (87%) of the 716 human SBS-containing promoters that we identified at DBTSS overlap CpG islands, sequences known to be associated with active promoters (23, 24) (supplemental Table S1). In general, the promoters we found were GC-rich, and we could not recognize any significant TATA box sequence. TATA box consensus sequences could be observed in only 19 (5%) of the 355 examined promoters, suggesting that they are not a characteristic feature of SBS-containing gene promoters. Summarized in supplemental Table S1 are various features of the identified SBS and promoters, including identification number of the human promoters, gene name and symbol, position of the SBS in human and mouse promoters, coordinates of SBS on human chromosomes, motif orientation, association with CpG islands, sequence of the SBS motif with the associated 5Ј submotif, alignment of the mouse and human motifs, and motif conservation in rat and dog genomes.
Identification of Staf-binding sites by searching transcription factor-binding sites at DBTSS
Query names, sequences of the query, and hits of human versus mouse SBS conservation are indicated. N, R, and Y indicate any nucleotide, A or G, and T or C, respectively.

Query name Sequence New human/mouse hits

Occupancy of Endogenous Genes by hStaf/ZNF143-To determine whether hStaf/ZNF143 actually binds to the conserved motifs, we tested by ChIP assays 430 of the sites identified in silico and contained in 295 different promoters (listed in supplemental Table 1; Fig. 4A). The 295 promoters represent a statistically valid sample because they include 107 (56%) promoters with multiple sites (multiple site promoter (MSP)) and 188 (25%) with unique sites (unique site promoter (USP)). The tested MSP corresponded essentially to genes with an attributed function. The tested USP were selected at random from the results of the different queries at DBTSS. To obtain statistically valid data, the sample of tested promoters containing SBS with and without the ACTACAN motif was directly proportional to the outcome of the various interrogation results. ChIP analysis was performed in HeLa cells with antibodies directed against hStaf/ZNF143. The recovered DNA was analyzed by semi-quantitative PCR with primer pairs spanning promoters with either single or multiple SBS. Each DNA sequence was amplified in parallel reactions with each of the three DNA templates purified from the following: 1) anti-hStaf/ZNF143 ChIP; 2) control ChIP; and 3) input chromatin. For each PCR, we tested two dilutions of DNA immunoprecipitated with anti-hStaf/ZNF143 (Fig. 4, B-G, lanes 1 and 2; supplemental Fig. S1) or no antibody (Fig. 4, B-G, lanes 3 and 4; supplemental Fig. S1). Also, serial dilutions of input material were analyzed to demonstrate that the PCR was quantitative within a linear range of amplification (Fig. 4, B-G, lanes 5-7; supplemental Fig. S1). The positive controls for the ChIP assay included genes previously reported to respond to hStaf/ZNF143, the tRNASec, synaptobrevin-like 1, aldehyde reductase, and t-complex polypeptide 1 (1,11,15,17). As expected (Fig. 4B), a specific signal of higher intensity than in the no antibody control was obtained with the DNA immunoprecipitated with anti-hStaf/ZNF143 (compare lanes 1 and 2 with lanes 3 and 4 in Fig. 4B). In contrast, no specific signal could be obtained with the primer pairs PP1 to PP4 amplifying DNA sequences lying several kbp upstream of the tRNASec, U4 ATAC, GAPDH, and BUB1B genes because these remote regions were not expected to interact with hStaf/ZNF143 (compare lanes 1 and 2 with lanes 3 and 4 in Fig. 4C). Among the 295 PCR amplifications, 29 (9.8%) were close to the background (supplemental Table S1 and Fig. S1). This is illustrated in Fig. 4 Fig. 4F). We observed an identical percentage of positive results regardless of the sequence used for identifying the motifs. Furthermore, no bias was introduced whether promoters with or without the ACTACAN motif were employed. By inspecting the SBS sequences that overlap the M4 motif of Xie et al. (20) (Xie in supplemental Table S1), about 15% had the highly conserved C12 substituted by A, G, or T (compare sequence logos in Fig. 1, A and E). ChIP experiments with five of them (A12, GenBank™ accession numbers NM_001320 and NM_006185; G12, GenBank™ accession numbers NM_004748 and NM_012257; T12, GenBank™ accession number NM_005111; supplemental Fig. S1) showed that sequence variation at position 12 did not influence the binding of hStaf/ZNF143.
Taken together, we could confirm that 266 (90%) of the promoters tested did harbor true hStaf/ZNF143-binding sites, underpinning the robustness of the computational screens. Moreover, these findings strongly suggest that the hStaf/ ZNF143-binding sites identified in silico constitute high prevalence bona fide direct targets of hStaf/ZNF143.
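The validation rate can be recomputed from the counts given above. The sketch below also attaches a Wilson score interval to the proportion; that interval is an illustrative addition and was not part of the reported analysis, and it treats the tested promoters as independent trials, which is an assumption.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for about 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z ** 2 / trials
    center = (p + z ** 2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z ** 2 / (4 * trials ** 2)) / denom
    return center - half, center + half


confirmed, tested = 266, 295          # promoters positive by ChIP / promoters tested
rate = confirmed / tested
low, high = wilson_interval(confirmed, tested)

print(f"{rate:.1%} confirmed; approximate 95% interval {low:.1%} to {high:.1%}")
```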

hStaf/ZNF143-binding Sites in the Human Genome
The SBS Motifs Are Sufficient to Direct Transcriptional Activation without Associated Basal Promoter Elements-Only 5% of the promoters identified in this study contain a putative TATA box. To assess whether the SBS alone has the capacity to drive transcription, we tested its ability to up-regulate a luciferase reporter gene in a context deprived of known basal promoter elements. Reporters were constructed that contained three identical copies of wild type (WT SBS) or mutant SBS (mut-1 SBS, mut-2 SBS; Fig. 5A) directly fused to the luciferase coding sequence. In these constructs, the SBS reside at −91, −149, and −182 bp upstream of the luciferase initiation codon. The effect of those same mutations was evaluated in a previous work but in a different context, with mut-1 leading to a moderately reduced ability to bind Staf, whereas mut-2 affected its binding much more severely (1). Transfection of the WT reporter into COS-7 cells resulted in efficient transcriptional activation (compare empty vector and WT SBS in Fig. 5B). In contrast, the reporters bearing mutations in the central (mut-1 SBS) or in the 5′ part (mut-2 SBS) of the motif resulted in 8- and 22-fold reduction of transcriptional levels, respectively (Fig. 5B, compare WT SBS, mut-1 SBS, and mut-2 SBS). Taken together, these results indicate that the SBS is sufficient to direct expression of a reporter gene in the absence of known basal promoter elements.
To further evaluate the role of the ACTACAN motif, the effects of mutations therein were evaluated on the activity of a luciferase reporter gene directed only by the SBS and ACTACAN motifs. We constructed three different reporters containing, directly fused to the luciferase coding sequence, two identical copies of the following: 1) the wild-type ACTACAN motif and SBS (WT ACTACAN and WT SBS); 2) the mutated ACTACAN and WT SBS (mut ACTACAN and WT SBS); 3) the mutated ACTACAN motif and mutated SBS (mut ACTACAN and mut SBS) (Fig. 5A). In these constructs, the SBS resides at −189 and −303 bp upstream of the luciferase translation initiation codon. The construct containing both mutated motifs (mut ACTACAN and mut SBS) was unable to drive transcription above the basal level of the empty vector (Fig. 5C). In contrast, the presence of the WT ACTACAN associated with the WT SBS produced a 49-fold induction of the luciferase activity (Fig. 5C). Moreover, mutation of the ACTACAN motif alone reduced the transcriptional activity to about 50% of the WT level (Fig. 5C). Collectively, these results demonstrated the functional importance of the ACTACAN motif associated with the SBS.

DISCUSSION
The hStaf/ZNF143 Transcription Factor-binding Site, One of the Most Widespread in the Human Genome-We have performed a large scale analysis to evaluate the binding of hStaf/ZNF143 to mammalian promoters containing the SBS consensus element and variants therein. Our data revealed that the protein binds to a strikingly large number of genes, suggesting a significant diversity in the ensuing transcriptional response. In total, we have defined a set of 1175 evolutionarily conserved hStaf/ZNF143-binding sites in mammalian genomes and distributed in 938 promoters. Direct experimental validation by ChIP on 295 promoters yielded 90% success, indicating that very few of the identified sites were false positives. Moreover, negative results with ChIP are not necessarily synonymous with the absence of the protein from the promoter, this technique being dependent on factors such as accessibility of the antibody to its epitope (25). Importantly, the high rate of success demonstrates the robustness of the in silico approach applied here.
Previously, hStaf/ZNF143-binding sites were identified by studying individual genes. At the onset of this work, only seven promoters of protein coding genes were recognized as hStaf/ZNF143 targets (11-17). Among the 938 promoters identified in this study, 716 were obtained by queries at DBTSS (version 4), which contains only 8793 human promoters. Extrapolation to the 25,000 human genes (26) yielded a minimum estimate of 2500 SBS in 2000 promoters of protein coding genes. This value is very similar to the number of Sp1-binding sites in protein coding genes (27). Furthermore, the number of identified targets is most likely underestimated because our screen was based solely on the SBS consensus and its variants, whereas other variant sites can also be bound by the protein. For example, we could not identify the IRF3 and AKR1A1 promoters present in DBTSS and known as hStaf/ZNF143-dependent (12,15) because the SBS sequences in these promoters do not fit the queried sequences used in the various screens.
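The extrapolation can be reproduced with a simple proportional scaling. The sketch below is one reading of how the estimate may have been reached; the text gives the inputs and the rounded result but not the calculation itself, and the use of the 893 SBS found at DBTSS as the site count to be scaled is an assumption.

```python
DBTSS_PROMOTERS = 8793        # human promoters in DBTSS version 4
HUMAN_GENES = 25_000          # number of human genes used for the extrapolation

sbs_promoters_in_dbtss = 716  # SBS-containing promoters found at DBTSS
sbs_in_dbtss = 893            # SBS found at DBTSS (assumed basis for scaling)

# Scale the DBTSS counts to the full set of human genes.
scale = HUMAN_GENES / DBTSS_PROMOTERS
estimated_promoters = sbs_promoters_in_dbtss * scale
estimated_sbs = sbs_in_dbtss * scale

print(f"about {estimated_sbs:.0f} SBS in about {estimated_promoters:.0f} promoters")
```

Under these assumptions the scaled values fall slightly above 2500 SBS and 2000 promoters, in line with the minimum estimate quoted in the text.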
FIGURE 4. A, the results of the ChIP and PCR assays performed on 107 promoters containing at least two SBS and 188 promoters with one SBS are indicated. B-G, the binding of endogenous hStaf/ZNF143 to genomic sites in HeLa cells was analyzed by ChIP. Genomic DNA fragments, recovered from input material or immunoprecipitated with hStaf/ZNF143 antibody or no antibody, were subjected to semiquantitative PCR amplification in the presence of [α-32P]dCTP with specific primer pairs. Lanes 1-4, serial dilutions of DNA immunoprecipitated with anti-hStaf/ZNF143 or no antibody, respectively. Lanes 5-7, serial dilutions of input material were analyzed to demonstrate that the assays were within the linear range of PCR amplification. Lane 8 (NC, negative control), PCR lacking chromatin DNA template. Positive (B) and negative (C) controls for ChIP assays. Promoters used are indicated on the right of B. PCR products generated with the PP1-PP4 primer pairs were 235, 211, 174, and 226 bp long and originate from unique regions 2.4, 2.1, 6.5, and 2.5 kbp upstream of the tRNASec, U4 ATAC, GAPDH, and BUB1B genes, respectively. Typical positive and negative results obtained from promoter regions containing multiple SBS are depicted in D and E, respectively. Similar experiments with those arising from promoters containing one single SBS are represented in F and G, respectively. Promoters are indicated by identification number and gene name.

About 58% of the identified SBS were found associated with the functional ACTACAN submotif, although our earlier work never detected it in the SBS of snRNA and snRNA-type genes (2,28). It may well be that this submotif serves as a binding site for an unknown transcription factor. It might also induce a particular structure in the DNA that optimizes the binding of hStaf/ZNF143 to its cognate sequence.
The ACTACAN sequence is particularly interesting in the light of a recent work describing the systematic discovery of motifs in human promoters by comparing several mammalian genomes (20). This promoter analysis yielded candidate motifs, including most of the previously known transcription factor-binding sites but also novel motifs. The top six of the discovered motifs and associated factors are M1, NRF-1; M2, MYC; M3, ELK-1; M4, unknown factor; M5, NFY; M6, Sp1, with motif conservation scores (MCS) of 107.8, 85.3, 80.4, 69.5, 64.6, and 63.9, respectively. Motifs with high MCS are both highly conserved and frequently occurring in the promoters of protein coding genes. The M4 motif ACTAYRNNNCCCR thus has a higher MCS, reflecting conservation and occurrence frequency, than the NFY and Sp1 motifs. Our work established clearly that ACTAYRN indeed corresponds to the ACTACAN sequence we identified and that NNCCCR represents SBS residues 1-6 (Fig. 1A, right panel). Xie et al. (20) did not report residues 7-18 of the SBS as a conserved motif. This is very likely because of the lower sequence conservation in the 3′ part of the SBS. Because M4 captures only those SBS that carry the ACTACAN submotif, and 42% of the identified SBS lacked it, the global MCS for the hStaf/ZNF143-binding site (with and without ACTACAN) should be higher than the MCS of M4. This strongly suggests that the hStaf/ZNF143-binding site represents one of the most frequently occurring transcription factor-binding sites in the human genome.
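The correspondence between M4 and the SBS can be checked position by position using the degenerate-base code defined in the Introduction (Y, W, R, N, M, and S). The Python sketch below is only an illustration: the helper names are ours, the 18-bp consensus is taken as YWCCCRNMATSCMYYRCR, and the test sequence is an arbitrary made-up string, not a genomic site.

```python
import re

# Degenerate-base code as defined in the Introduction, plus the four plain bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "Y": "CT", "W": "AT", "R": "AG", "M": "AC", "S": "CG", "N": "ACGT",
}


def is_special_case(specific, general):
    """True if every sequence allowed by `specific` is also allowed by `general`."""
    if len(specific) != len(general):
        return False
    return all(set(IUPAC[s]) <= set(IUPAC[g]) for s, g in zip(specific, general))


def motif_regex(motif):
    """Compile a degenerate motif into a regular expression over A, C, G, T."""
    return re.compile("".join(f"[{IUPAC[base]}]" for base in motif))


M4 = "ACTAYRNNNCCCR"                  # motif M4 reported in (20)
SBS_CONSENSUS = "YWCCCRNMATSCMYYRCR"  # 18-bp consensus from binding site selection

# The 5' seven positions of M4 accommodate ACTACAN; the 3' six positions
# accommodate SBS residues 1-6.
actacan_fits = is_special_case("ACTACAN", M4[:7])
sbs_start_fits = is_special_case(SBS_CONSENSUS[:6], M4[7:])

# Scanning an arbitrary (hypothetical) sequence for M4 occurrences.
hits = [m.start() for m in motif_regex(M4).finditer("GGACTACAATTCCCAGG")]
```

A subset test is used rather than string equality because both motifs are degenerate: the question is whether one pattern is a special case of the other, not whether the two are written identically.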

hStaf/ZNF143-binding Sites in the Human Genome
Identified SBS Are Located Upstream TAF Sites-Promoters with SBS showed high enrichment in CpG islands. The percentage of CpG-associated promoters (87%) is equivalent to the percentage (88%) of CpG associated with active promoters (29). Notably, we did not find the TATA box to be significantly present in SBS-associated promoters, consistent with the TATA box not being a general promoter element in mammalian genes (29). Although the presence of an hStaf/ZNF143-binding site does not necessarily imply a direct effect of the protein on the expression of the gene, it is difficult to imagine that hundreds of hStaf/ZNF143 sites lying in the proximity of TSS could represent random and functionally meaningless occurrences. In this respect, we have shown that introduction of an SBS is sufficient to up-regulate transcription of a luciferase reporter gene, suggesting that hStaf/ZNF143 can directly or indirectly recruit the transcription machinery.
FIGURE 5. Importance of the SBS and associated ACTACAN motif on transcription of luciferase reporter genes. A, schematic diagrams of the reporter genes used in the transfection assays. Mutations are underlined; they were introduced in each SBS. B, transcription assays in COS-7 cells with the empty luciferase (LUC) vector, WT SBS, mut-1 SBS, and mut-2 SBS luciferase reporter genes. C, transcription assays in COS-7 cells with the empty vector; mut ACTACAN, mut SBS; mut ACTACAN, WT SBS; and WT ACTACAN, WT SBS luciferase reporter genes. The relative transcription activities are indicated, representing the luciferase activity/empty vector ratio.
FIGURE 6. Categorization of the hStaf/ZNF143 target genes. Genes targeted by hStaf/ZNF143, whose function is known or inferred, were classified according to the function of the encoded protein.
DECEMBER 29, 2006 • VOLUME 281 • NUMBER 52 JOURNAL OF BIOLOGICAL CHEMISTRY 39961

We examined the results of Kim et al. (29), who performed genome-wide location analysis experiments to identify gene promoters associated with TAF1, the largest subunit of TFIID. They concluded that a total of 9328 nonredundant gene promoters are occupied by TAF1. We found that 346 hStaf/ZNF143-bound promoters also harbor TAF1-binding sites. Comparison of the positions of SBS and TAF1-binding sites within a given promoter established that 94.5% of SBS are found upstream of the TAF1 locations. Analysis of the distribution of the SBS and TAF1 sites revealed the interesting feature that SBS are essentially upstream of the transcription start site, whereas TAF1 sites are mostly found downstream of it (Fig. 2D). Moreover, the number of common promoter targets between TFIID and hStaf/ZNF143 far exceeds a random overlap between two similar groups, suggesting that hStaf/ZNF143 functions with TFIID in regulating gene expression. In this respect, we showed that SBS are sufficient to up-regulate a reporter gene on their own, and it is tempting to speculate that hStaf/ZNF143 can directly recruit TFIID.
SBS and Bidirectional Promoters-It is well established that bidirectional promoters are relatively common in mammalian genomes: more than 10% of genes are arranged in divergent pairs whose TSS are separated by less than 1000 bp (30,31). In 15% of the identified promoters (146 promoters distributed in 73 pairs), a common SBS is present in two promoters in a head-to-head orientation. Further studies are needed to elucidate the direct involvement of hStaf/ZNF143 in this particular gene organization.
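The head-to-head criterion can be made concrete as follows. This Python sketch is a simplified illustration and not the procedure used in this study: it considers only neighboring TSS on the same chromosome, takes the 1000-bp limit from the definition above, and runs on hypothetical gene records (names and coordinates are invented).

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    chrom: str
    tss: int     # coordinate of the transcription start site
    strand: str  # "+" or "-"


def head_to_head_pairs(genes, max_gap=1000):
    """Return (minus-strand gene, plus-strand gene) pairs with divergent TSS.

    A pair qualifies when a minus-strand TSS lies immediately upstream (at a
    lower coordinate) of a plus-strand TSS and the two are less than
    `max_gap` bp apart, so that the genes are transcribed away from each other.
    """
    by_chrom = {}
    for gene in genes:
        by_chrom.setdefault(gene.chrom, []).append(gene)

    pairs = []
    for chrom_genes in by_chrom.values():
        ordered = sorted(chrom_genes, key=lambda g: g.tss)
        for left, right in zip(ordered, ordered[1:]):
            divergent = left.strand == "-" and right.strand == "+"
            if divergent and right.tss - left.tss < max_gap:
                pairs.append((left.name, right.name))
    return pairs


# Hypothetical records, for illustration only.
example = [
    Gene("geneA", "chr1", 5000, "-"),
    Gene("geneB", "chr1", 5400, "+"),  # 400 bp from geneA, divergent
    Gene("geneC", "chr1", 9000, "+"),
    Gene("geneD", "chr2", 100, "-"),
    Gene("geneE", "chr2", 2100, "+"),  # divergent but 2000 bp apart
]
pairs = head_to_head_pairs(example)
```

An SBS located between the two TSS of such a pair would count as common to both promoters.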
Functional Classification of the hStaf/ZNF143 Target Genes-Among the identified promoters, 30% were associated with genes of unknown function. To determine whether hStaf/ZNF143 is specifically associated with genes of a particular functional category, we categorized the genes according to their functional annotation. As shown in Fig. 6, the identified genes are distributed across all the functional categories. The prominent classes are as follows: (i) DNA-binding proteins and transcription factors, which represent 23% of the total; (ii) protein synthesis, degradation, turnover, and modification (21% of the total). We note that many of the identified genes are indeed important for cell growth.
In summary, our ChIP-coupled comparative genomic analysis has delivered a first version of the human catalog of hStaf/ZNF143 regulatory motifs in gene promoters. This work demonstrates that combining in silico and experimental studies is instrumental in analyzing gene promoters.